GitHub user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6166#discussion_r30837452
  
    --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClusterSchedulerBackend.scala ---
    @@ -53,4 +62,65 @@ private[spark] class YarnClusterSchedulerBackend(
           logError("Application attempt ID is not set.")
           super.applicationAttemptId
         }
    +
    +  override def getDriverLogUrls: Option[Map[String, String]] = {
    +    var yarnClientOpt: Option[YarnClient] = None
    +    var driverLogs: Option[Map[String, String]] = None
    +    try {
    +      val yarnConf = new YarnConfiguration(sc.hadoopConfiguration)
    +      val containerId = YarnSparkHadoopUtil.get.getContainerId
    +      yarnClientOpt = Some(YarnClient.createYarnClient())
    +      yarnClientOpt.foreach { yarnClient =>
    +        yarnClient.init(yarnConf)
    +        yarnClient.start()
    +
    +        // For newer versions of YARN, we can find the HTTP address for a given node by getting a
    +        // container report for a given container. But container reports came only in Hadoop 2.4,
    +        // so we basically have to get the node reports for all nodes and find the one which runs
    +        // this container. For that we have to compare the node's host against the current host.
    +        // Since the host can have multiple addresses, we need to compare against all of them to
    +        // find out if one matches.
    +
    +        // Get all the addresses of this node.
    +        val addresses =
    +          NetworkInterface.getNetworkInterfaces.asScala
    +            .flatMap(_.getInetAddresses.asScala)
    +            .toSeq
    +
    +        // Find a node report that matches one of the addresses
    +        val nodeReport =
    +          yarnClient.getNodeReports(NodeState.RUNNING).asScala.find { x =>
    +            val host = x.getNodeId.getHost
    +            addresses.exists { address =>
    +              address.getHostAddress == host ||
    +                address.getHostName == host ||
    +                address.getCanonicalHostName == host
    +            }
    +          }
    +
    +        // Build the HTTP address for the node and build the URL for the logs.
    +        nodeReport.foreach { report =>
    +          val httpAddress = report.getHttpAddress
    +          // lookup appropriate http scheme for container log urls
    +          val yarnHttpPolicy = yarnConf.get(
    +            YarnConfiguration.YARN_HTTP_POLICY_KEY,
    +            YarnConfiguration.YARN_HTTP_POLICY_DEFAULT
    +          )
    +          val user = Utils.getCurrentUserName()
    +          val httpScheme = if (yarnHttpPolicy == "HTTPS_ONLY") "https://" else "http://"
    +          val baseUrl = s"$httpScheme$httpAddress/node/containerlogs/$containerId/$user"
    --- End diff --
    
    Hari and I discussed offline how this works when you've got multiple
containers on a node. It is a bit confusing, so I suggested adding a comment
here, something like: "The nodeReport gives us the httpAddress for the
NodeManager, which may be shared by more than one container on that node. But
we know the URL points at the driver's container because we use the
containerId as well."
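
    To illustrate the point, here is a minimal sketch (not the code in this PR) of why the containerId is what disambiguates the URL. The object name `ContainerLogUrlSketch`, the helper `baseLogUrl`, and every host, container, and user value are hypothetical; only the URL layout and the "HTTPS_ONLY" check are taken from the diff above.

```scala
object ContainerLogUrlSketch {

  // Hypothetical helper mirroring the URL layout in the diff:
  //   <scheme><NodeManager http address>/node/containerlogs/<containerId>/<user>
  def baseLogUrl(
      yarnHttpPolicy: String,
      nodeHttpAddress: String,
      containerId: String,
      user: String): String = {
    val httpScheme = if (yarnHttpPolicy == "HTTPS_ONLY") "https://" else "http://"
    s"$httpScheme$nodeHttpAddress/node/containerlogs/$containerId/$user"
  }

  def main(args: Array[String]): Unit = {
    // Assumed values: one NodeManager address shared by two containers on the same node.
    val nodeHttpAddress = "node1.example.com:8042"
    val driverContainer = "container_1_0001_01_000001"
    val otherContainer = "container_1_0001_01_000002"

    val driverUrl = baseLogUrl("HTTP_ONLY", nodeHttpAddress, driverContainer, "alice")
    val otherUrl = baseLogUrl("HTTP_ONLY", nodeHttpAddress, otherContainer, "alice")

    // Same host and port, but the containerId path segment makes each URL unique.
    println(driverUrl)
    println(otherUrl)
    assert(driverUrl != otherUrl)
  }
}
```

    Both URLs share the NodeManager's host and port, so the node report alone cannot tell the containers apart; the containerId segment is what selects the driver's logs.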


