[GitHub] HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2019-01-17 Thread GitBox
HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r248857901
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/deploy/history/HistoryAppStatusListener.scala
 ##
 @@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.config.History._
+import org.apache.spark.scheduler.SparkListenerExecutorAdded
+import org.apache.spark.scheduler.cluster.ExecutorInfo
+import org.apache.spark.status.{AppStatusListener, AppStatusSource, 
ElementTrackingStore}
+
+private[spark] class HistoryAppStatusListener(
+kvstore: ElementTrackingStore,
+conf: SparkConf,
+live: Boolean,
+appStatusSource: Option[AppStatusSource] = None,
+lastUpdateTime: Option[Long] = None)
+  extends AppStatusListener(kvstore, conf, live, appStatusSource, 
lastUpdateTime) {
+
+  override def onExecutorAdded(event: SparkListenerExecutorAdded): Unit = {
+val execInfo = event.executorInfo
+val newExecInfo = new ExecutorInfo(execInfo.executorHost, 
execInfo.totalCores,
+  renewLogUrls(execInfo), execInfo.attributes)
+
+super.onExecutorAdded(event.copy(executorInfo = newExecInfo))
+  }
+
+  def renewLogUrls(execInfo: ExecutorInfo): Map[String, String] = {
+val oldLogUrlMap = execInfo.logUrlMap
+val attributes = execInfo.attributes
+
+conf.get(CUSTOM_EXECUTOR_LOG_URL) match {
+  case Some(logUrlPattern) =>
+val pattern = "\\{\\{([A-Za-z0-9_\\-]+)\\}\\}".r
+
+val allPatterns = 
pattern.findAllMatchIn(logUrlPattern).map(_.group(1)).toSet
+val allPatternsExceptFileName = allPatterns.filter(_ != "FILE_NAME")
+val allAttributeKeys = attributes.keys.toSet
+val allAttributeKeysExceptLogFiles = allAttributeKeys.filter(_ != 
"LOG_FILES")
+
+if 
(allPatternsExceptFileName.diff(allAttributeKeysExceptLogFiles).nonEmpty) {
+  logFailToRenewLogUrls("some of required attributes are missing in 
app's event log.",
+allPatternsExceptFileName, allAttributeKeys)
+  return oldLogUrlMap
+} else if (allPatterns.contains("FILE_NAME") && 
!allAttributeKeys.contains("LOG_FILES")) {
+  logFailToRenewLogUrls("'FILE_NAME' parameter is provided, but file 
information is " +
+"missing in app's event log.", allPatternsExceptFileName, 
allAttributeKeys)
+  return oldLogUrlMap
+}
+
+var replacingUrl = logUrlPattern
+
+allPatternsExceptFileName.foreach { pattern =>
+  // we already checked the existence of attribute when comparing keys
+  replacingUrl = replacingUrl.replace(s"{{$pattern}}", 
attributes(pattern))
+}
 
 Review comment:
   Yeah nice suggestion. Still need to familiarize with functional programming. 
:) Will address.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2019-01-17 Thread GitBox
HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r248856862
 
 

 ##
 File path: 
core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala
 ##
 @@ -1046,6 +1201,29 @@ class FsHistoryProviderSuite extends SparkFunSuite with 
BeforeAndAfter with Matc
 conf
   }
 
+  private def createTestExecutorInfo(
+  appId: String,
+  user: String,
+  executorSeqNum: Int,
+  includingLogFiles: Boolean = true): ExecutorInfo = {
+val host = s"host$executorSeqNum"
+val container = s"container$executorSeqNum"
+val cluster = s"cluster$executorSeqNum"
+val logUrlPrefix = s"http://$host:/$appId/$container/origin;
+
+val executorLogUrlMap = Map("stdout" -> s"$logUrlPrefix/stdout",
+  "stderr" -> s"$logUrlPrefix/stderr")
+
+val executorAttributes = if (includingLogFiles) {
+  Map("LOG_FILES" -> "stdout,stderr", "CONTAINER_ID" -> container,
+"CLUSTER_ID" -> cluster, "USER" -> user)
+} else {
+  Map("CONTAINER_ID" -> container, "CLUSTER_ID" -> cluster, "USER" -> user)
+}
 
 Review comment:
   Looks cleaner! Will address.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2019-01-17 Thread GitBox
HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r248856694
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/deploy/history/HistoryAppStatusListener.scala
 ##
 @@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.config.History._
+import org.apache.spark.scheduler.SparkListenerExecutorAdded
+import org.apache.spark.scheduler.cluster.ExecutorInfo
+import org.apache.spark.status.{AppStatusListener, AppStatusSource, 
ElementTrackingStore}
+
+private[spark] class HistoryAppStatusListener(
+kvstore: ElementTrackingStore,
+conf: SparkConf,
+live: Boolean,
+appStatusSource: Option[AppStatusSource] = None,
+lastUpdateTime: Option[Long] = None)
+  extends AppStatusListener(kvstore, conf, live, appStatusSource, 
lastUpdateTime) {
+
+  override def onExecutorAdded(event: SparkListenerExecutorAdded): Unit = {
+val execInfo = event.executorInfo
+val newExecInfo = new ExecutorInfo(execInfo.executorHost, 
execInfo.totalCores,
+  renewLogUrls(execInfo), execInfo.attributes)
+
+super.onExecutorAdded(event.copy(executorInfo = newExecInfo))
+  }
+
+  def renewLogUrls(execInfo: ExecutorInfo): Map[String, String] = {
 
 Review comment:
   `replaceLogUrls` sounds better. Will address.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2019-01-15 Thread GitBox
HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r248156970
 
 

 ##
 File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
 ##
 @@ -57,6 +58,12 @@ private[yarn] class ExecutorRunnable(
   var rpc: YarnRPC = YarnRPC.create(conf)
   var nmClient: NMClient = _
 
+  val clusterId: Option[String] = try {
 
 Review comment:
   I took a look at `Utils.tryLog` and that doesn't seem to fit my intention, 
since it logs error message which end users could get a wrong sign (as well as 
it catches exception too broadly). If `RM cluster id` is a mandatory config we 
may need to take a different approach (like fail-fast). If that is optional, I 
think we should not leave an error log.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2019-01-15 Thread GitBox
HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r248156970
 
 

 ##
 File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
 ##
 @@ -57,6 +58,12 @@ private[yarn] class ExecutorRunnable(
   var rpc: YarnRPC = YarnRPC.create(conf)
   var nmClient: NMClient = _
 
+  val clusterId: Option[String] = try {
 
 Review comment:
   I took a look at `Utils.tryLog` and that doesn't seem to fit my intention, 
since it logs error message which end users could get a wrong sign. If `RM 
cluster id` is a mandatory config we may need to take a different approach 
(like fail-fast). If that is optional, I think we should not leave an error log.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2019-01-15 Thread GitBox
HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r248155705
 
 

 ##
 File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
 ##
 @@ -57,6 +58,12 @@ private[yarn] class ExecutorRunnable(
   var rpc: YarnRPC = YarnRPC.create(conf)
   var nmClient: NMClient = _
 
+  val clusterId: Option[String] = try {
 
 Review comment:
   Nice suggestion. Sorry I missed this while dealing with other stuff. Will 
address.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2019-01-12 Thread GitBox
HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r247335594
 
 

 ##
 File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
 ##
 @@ -246,13 +253,57 @@ private[yarn] class ExecutorRunnable(
   sys.env.get("SPARK_USER").foreach { user =>
 val containerId = ConverterUtils.toString(c.getId)
 val address = c.getNodeHttpAddress
-val baseUrl = 
s"$httpScheme$address/node/containerlogs/$containerId/$user"
 
-env("SPARK_LOG_URL_STDERR") = s"$baseUrl/stderr?start=-4096"
-env("SPARK_LOG_URL_STDOUT") = s"$baseUrl/stdout?start=-4096"
+val customLogUrl = sparkConf.get(config.CUSTOM_LOG_URL)
+
+val envNameToFileNameMap = Map("SPARK_LOG_URL_STDERR" -> "stderr",
+  "SPARK_LOG_URL_STDOUT" -> "stdout")
+val logUrls = ExecutorRunnable.buildLogUrls(customLogUrl, httpScheme, 
address,
+  clusterId, containerId, user, envNameToFileNameMap)
+logUrls.foreach { case (envName, url) =>
+  env(envName) = url
+}
   }
 }
 
 env
   }
 }
+
+private[yarn] object ExecutorRunnable {
+  def buildLogUrls(
+  logUrlPattern: String,
+  httpScheme: String,
+  nodeHttpAddress: String,
+  clusterId: Option[String],
+  containerId: String,
+  user: String,
+  envNameToFileNameMap: Map[String, String]): Map[String, String] = {
+val optionalPathVariable: Map[String, Option[String]] = 
Map("{{ClusterId}}" -> clusterId)
+val pathVariables: Map[String, String] = Map("{{HttpScheme}}" -> 
httpScheme,
 
 Review comment:
   My intention was to separate parameters by `mandatory` and `optional`. 
Mandatory parameters should be available for all the cases, whereas optional 
parameters are available only when Hadoop/YARN has configured like that.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2018-12-11 Thread GitBox
HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r240817633
 
 

 ##
 File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
 ##
 @@ -246,13 +253,57 @@ private[yarn] class ExecutorRunnable(
   sys.env.get("SPARK_USER").foreach { user =>
 val containerId = ConverterUtils.toString(c.getId)
 val address = c.getNodeHttpAddress
-val baseUrl = 
s"$httpScheme$address/node/containerlogs/$containerId/$user"
 
-env("SPARK_LOG_URL_STDERR") = s"$baseUrl/stderr?start=-4096"
-env("SPARK_LOG_URL_STDOUT") = s"$baseUrl/stdout?start=-4096"
+val customLogUrl = sparkConf.get(config.CUSTOM_LOG_URL)
+
+val envNameToFileNameMap = Map("SPARK_LOG_URL_STDERR" -> "stderr",
+  "SPARK_LOG_URL_STDOUT" -> "stdout")
+val logUrls = ExecutorRunnable.buildLogUrls(customLogUrl, httpScheme, 
address,
+  clusterId, containerId, user, envNameToFileNameMap)
+logUrls.foreach { case (envName, url) =>
+  env(envName) = url
+}
   }
 }
 
 env
   }
 }
+
+private[yarn] object ExecutorRunnable {
+  def buildLogUrls(
+logUrlPattern: String,
+httpScheme: String,
+nodeHttpAddress: String,
+clusterId: Option[String],
+containerId: String,
+user: String,
+envNameToFileNameMap: Map[String, String]): Map[String, String] = {
 
 Review comment:
   Ah yes I just confused indent rule between method parameters and return... 
Nice catch. Will address.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2018-12-11 Thread GitBox
HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r240817431
 
 

 ##
 File path: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnLogUrlSuite.scala
 ##
 @@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import org.apache.spark.SparkFunSuite
+
+class YarnLogUrlSuite extends SparkFunSuite {
+
+  private val testHttpScheme = "https://;
+  private val testNodeHttpAddress = "nodeManager:1234"
+  private val testContainerId = "testContainer"
+  private val testUser = "testUser"
+  private val testEnvNameToFileNameMap = Map("TEST_ENV_STDOUT" -> "stdout",
+"TEST_ENV_STDERR" -> "stderr")
+
+  test("Custom log URL - leverage all patterns, all values for patterns are 
available") {
+val logUrlPattern = 
"{{HttpScheme}}{{NodeHttpAddress}}/logs/clusters/{{ClusterId}}" +
+  "/containers/{{ContainerId}}/users/{{User}}/files/{{FileName}}"
+
+val clusterId = Some("testCluster")
+
+val logUrls = ExecutorRunnable.buildLogUrls(logUrlPattern, testHttpScheme, 
testNodeHttpAddress,
+  clusterId, testContainerId, testUser, testEnvNameToFileNameMap)
+
+val expectedLogUrls = testEnvNameToFileNameMap.map { case (envName, 
fileName) =>
+  envName -> 
(s"$testHttpScheme$testNodeHttpAddress/logs/clusters/${clusterId.get}" +
+s"/containers/$testContainerId/users/$testUser/files/$fileName")
+}
+
+assert(logUrls === expectedLogUrls)
+  }
+
+  test("Custom log URL - optional pattern is not used in log URL") {
+// here {{ClusterId}} is excluded in this pattern
+val logUrlPattern = 
"{{HttpScheme}}{{NodeHttpAddress}}/logs/containers/{{ContainerId}}" +
+  "/users/{{User}}/files/{{FileName}}"
+
+// suppose the value of {{ClusterId}} pattern is not available
+val clusterId = None
+
+// This should not throw an exception: the value for optional pattern is 
not available
+// but we also don't use the pattern in log URL.
+val logUrls = ExecutorRunnable.buildLogUrls(logUrlPattern, testHttpScheme, 
testNodeHttpAddress,
+  clusterId, testContainerId, testUser, testEnvNameToFileNameMap)
+
+val expectedLogUrls = testEnvNameToFileNameMap.map { case (envName, 
fileName) =>
+  envName -> 
(s"$testHttpScheme$testNodeHttpAddress/logs/containers/$testContainerId" +
+s"/users/$testUser/files/$fileName")
+}
+
+assert(logUrls === expectedLogUrls)
+  }
+
+  test("Custom log URL - optional pattern is used in log URL but the value " +
+"is not present") {
 
 Review comment:
   Will address.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2018-12-10 Thread GitBox
HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r240446659
 
 

 ##
 File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
 ##
 @@ -246,13 +246,56 @@ private[yarn] class ExecutorRunnable(
   sys.env.get("SPARK_USER").foreach { user =>
 val containerId = ConverterUtils.toString(c.getId)
 val address = c.getNodeHttpAddress
-val baseUrl = 
s"$httpScheme$address/node/containerlogs/$containerId/$user"
 
-env("SPARK_LOG_URL_STDERR") = s"$baseUrl/stderr?start=-4096"
-env("SPARK_LOG_URL_STDOUT") = s"$baseUrl/stdout?start=-4096"
+sparkConf.get(config.CUSTOM_LOG_URL) match {
+  case Some(customUrl) =>
+val pathVariables = 
ExecutorRunnable.buildPathVariables(httpScheme, address,
+  YarnConfiguration.getClusterId(conf), containerId, user)
+val envNameToFileNameMap = Map("SPARK_LOG_URL_STDERR" -> "stderr",
+  "SPARK_LOG_URL_STDOUT" -> "stdout")
+val logUrls = ExecutorRunnable.replaceLogUrls(customUrl, 
pathVariables,
+  envNameToFileNameMap)
+
+logUrls.foreach { case (envName, url) =>
+  env(envName) = url
+}
+  case None =>
+val baseUrl = 
s"$httpScheme$address/node/containerlogs/$containerId/$user"
+env("SPARK_LOG_URL_STDERR") = s"$baseUrl/stderr?start=-4096"
+env("SPARK_LOG_URL_STDOUT") = s"$baseUrl/stdout?start=-4096"
+  }
   }
 }
 
 env
   }
 }
+
+private[yarn] object ExecutorRunnable {
+  val LOG_URL_PATTERN_HTTP_SCHEME = "{{HttpScheme}}"
 
 Review comment:
   Ah OK. I'm in favor of avoiding to use string constant directly, but not 
strong opinion on it. Will address.
   
   And yes I can put them in a single method, but placing a new method into 
class will bring unnecessary burden to the test code, since ExecutorRunnable 
receives lots of parameters to be instantiated.
   
   If we want to add an end-to-end test (instantiating YARN cluster and running 
executors) we still need to instantiate ExecutorRunnable (I think we are 
already covering it from here [1]), but if we just want to make sure the logic 
works properly, we might want to keep this as new object and add a test against 
the object to avoid instantiating ExecutorRunnable. WDYT?
   
   1. 
https://github.com/apache/spark/blob/05cf81e6de3d61ddb0af81cd179665693f23351f/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala#L442-L461


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2018-12-10 Thread GitBox
HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r240446659
 
 

 ##
 File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
 ##
 @@ -246,13 +246,56 @@ private[yarn] class ExecutorRunnable(
   sys.env.get("SPARK_USER").foreach { user =>
 val containerId = ConverterUtils.toString(c.getId)
 val address = c.getNodeHttpAddress
-val baseUrl = 
s"$httpScheme$address/node/containerlogs/$containerId/$user"
 
-env("SPARK_LOG_URL_STDERR") = s"$baseUrl/stderr?start=-4096"
-env("SPARK_LOG_URL_STDOUT") = s"$baseUrl/stdout?start=-4096"
+sparkConf.get(config.CUSTOM_LOG_URL) match {
+  case Some(customUrl) =>
+val pathVariables = 
ExecutorRunnable.buildPathVariables(httpScheme, address,
+  YarnConfiguration.getClusterId(conf), containerId, user)
+val envNameToFileNameMap = Map("SPARK_LOG_URL_STDERR" -> "stderr",
+  "SPARK_LOG_URL_STDOUT" -> "stdout")
+val logUrls = ExecutorRunnable.replaceLogUrls(customUrl, 
pathVariables,
+  envNameToFileNameMap)
+
+logUrls.foreach { case (envName, url) =>
+  env(envName) = url
+}
+  case None =>
+val baseUrl = 
s"$httpScheme$address/node/containerlogs/$containerId/$user"
+env("SPARK_LOG_URL_STDERR") = s"$baseUrl/stderr?start=-4096"
+env("SPARK_LOG_URL_STDOUT") = s"$baseUrl/stdout?start=-4096"
+  }
   }
 }
 
 env
   }
 }
+
+private[yarn] object ExecutorRunnable {
+  val LOG_URL_PATTERN_HTTP_SCHEME = "{{HttpScheme}}"
 
 Review comment:
   Ah OK. I'm in favor of avoiding to use string constant directly, but not 
strong opinion on it. Will address.
   
   And yes I can put them in a single method, but placing a new method into 
class will bring unnecessary burden to the test code, since ExecutorRunnable 
receives lots of parameters to be instantiated.
   
   If we want to add an end-to-end test (instantiating YARN cluster and running 
executors) we still need to instantiate ExecutorRunnable (I think we are 
already covering it from here [1]), but if we just want to make sure the logic 
works properly, we might want to keep this as new object and add a test against 
the object to avoid instantiating ExecutorRunnable. WDYT?
   
   1. 
https://github.com/apache/spark/blob/05cf81e6de3d61ddb0af81cd179665693f23351f/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala#L460


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2018-12-10 Thread GitBox
HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r240444758
 
 

 ##
 File path: docs/running-on-yarn.md
 ##
 @@ -430,6 +430,21 @@ To use a custom metrics.properties for the application 
master and executors, upd
   See spark.yarn.config.gatewayPath.
   
 
+
+  spark.yarn.custom.log.url
+  (none)
+  
+  Specifies custom spark log url for supporting external log service rather 
than NodeManager webapp address.
+  Spark will support some path variables via patterns. Supported patterns and 
allocated values are below: 
+  
+  * `{{HttpScheme}}`: `http`/`https` according to YARN HTTP policy. 
(Configured via `yarn.http.policy`)
+  * `{{NodeHttpAddress}}`: HTTP URI of the node on Container. 
 
 Review comment:
   My bad. It is `host:port` instead of URI which can be retrieved from 
`container.getNodeHttpAddress`. The representation of `node on container` is 
borrowed from javadoc of this method, but I'm OK to use anything more clarified.
   
   Will address.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2018-12-10 Thread GitBox
HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r240444139
 
 

 ##
 File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
 ##
 @@ -246,13 +246,56 @@ private[yarn] class ExecutorRunnable(
   sys.env.get("SPARK_USER").foreach { user =>
 val containerId = ConverterUtils.toString(c.getId)
 val address = c.getNodeHttpAddress
-val baseUrl = 
s"$httpScheme$address/node/containerlogs/$containerId/$user"
 
-env("SPARK_LOG_URL_STDERR") = s"$baseUrl/stderr?start=-4096"
-env("SPARK_LOG_URL_STDOUT") = s"$baseUrl/stdout?start=-4096"
+sparkConf.get(config.CUSTOM_LOG_URL) match {
+  case Some(customUrl) =>
+val pathVariables = 
ExecutorRunnable.buildPathVariables(httpScheme, address,
+  YarnConfiguration.getClusterId(conf), containerId, user)
+val envNameToFileNameMap = Map("SPARK_LOG_URL_STDERR" -> "stderr",
+  "SPARK_LOG_URL_STDOUT" -> "stdout")
+val logUrls = ExecutorRunnable.replaceLogUrls(customUrl, 
pathVariables,
+  envNameToFileNameMap)
+
+logUrls.foreach { case (envName, url) =>
+  env(envName) = url
+}
+  case None =>
+val baseUrl = 
s"$httpScheme$address/node/containerlogs/$containerId/$user"
 
 Review comment:
   Yes it will remove the branch. Will address.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2018-12-10 Thread GitBox
HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r240436930
 
 

 ##
 File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
 ##
 @@ -246,13 +246,56 @@ private[yarn] class ExecutorRunnable(
   sys.env.get("SPARK_USER").foreach { user =>
 val containerId = ConverterUtils.toString(c.getId)
 val address = c.getNodeHttpAddress
-val baseUrl = 
s"$httpScheme$address/node/containerlogs/$containerId/$user"
 
-env("SPARK_LOG_URL_STDERR") = s"$baseUrl/stderr?start=-4096"
-env("SPARK_LOG_URL_STDOUT") = s"$baseUrl/stdout?start=-4096"
+sparkConf.get(config.CUSTOM_LOG_URL) match {
+  case Some(customUrl) =>
+val pathVariables = 
ExecutorRunnable.buildPathVariables(httpScheme, address,
+  YarnConfiguration.getClusterId(conf), containerId, user)
+val envNameToFileNameMap = Map("SPARK_LOG_URL_STDERR" -> "stderr",
+  "SPARK_LOG_URL_STDOUT" -> "stdout")
+val logUrls = ExecutorRunnable.replaceLogUrls(customUrl, 
pathVariables,
+  envNameToFileNameMap)
+
+logUrls.foreach { case (envName, url) =>
+  env(envName) = url
+}
+  case None =>
+val baseUrl = 
s"$httpScheme$address/node/containerlogs/$containerId/$user"
+env("SPARK_LOG_URL_STDERR") = s"$baseUrl/stderr?start=-4096"
+env("SPARK_LOG_URL_STDOUT") = s"$baseUrl/stdout?start=-4096"
+  }
   }
 }
 
 env
   }
 }
+
+private[yarn] object ExecutorRunnable {
+  val LOG_URL_PATTERN_HTTP_SCHEME = "{{HttpScheme}}"
+  val LOG_URL_PATTERN_NODE_HTTP_ADDRESS = "{{NodeHttpAddress}}"
+  val LOG_URL_PATTERN_CLUSTER_ID = "{{ClusterId}}"
+  val LOG_URL_PATTERN_CONTAINER_ID = "{{ContainerId}}"
+  val LOG_URL_PATTERN_USER = "{{User}}"
+  val LOG_URL_PATTERN_FILE_NAME = "{{FileName}}"
+
+  def buildPathVariables(httpScheme: String, nodeHttpAddress: String, 
clusterId: String,
 
 Review comment:
   I guess it's allowed when it fits within two-lines, but no problem to change 
it. Will address.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org