Hi Spark developers,

For Spark running on YARN, I would like to be able to find out, from the logs, which container an executor is running in. I haven't found a way to do this, not even with the Spark UI: neither the Executors tab nor the stage information page shows the container id. I was thinking of modifying the log messages in YarnAllocator so the executor id is logged when a container starts, as follows:
@@ -494,7 +494,8 @@ private[yarn] class YarnAllocator(
       val containerId = container.getId
       val executorId = executorIdCounter.toString
       assert(container.getResource.getMemory >= resource.getMemory)
-      logInfo(s"Launching container $containerId on host $executorHostname")
+      logInfo(s"Launching container $containerId on host $executorHostname " +
+        s"for executor with ID $executorId")

       def updateInternalState(): Unit = synchronized {
         numExecutorsRunning += 1
@@ -528,7 +529,8 @@ private[yarn] class YarnAllocator(
             updateInternalState()
           } catch {
             case NonFatal(e) =>
-              logError(s"Failed to launch executor $executorId on container $containerId", e)
+              logError(s"Failed to launch executor $executorId on container $containerId " +
+                s"for executor with ID $executorId", e)
               // Assigned container should be released immediately to avoid unnecessary resource
               // occupation.
               amClient.releaseAssignedContainer(containerId)

Do you think this is a good idea, or is there a better way to achieve this?

Thanks in advance,

Juan
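P.S. To illustrate what this would buy us: with the proposed log line in place, the executor-to-container mapping could be recovered from the application master log with a simple grep. The log line below is made up to match the format the patch would produce, and the container/executor ids in it are hypothetical:

```shell
# Hypothetical AM log line in the format the proposed patch would emit
log='INFO YarnAllocator: Launching container container_1234567890123_0001_01_000002 on host worker1 for executor with ID 1'

# Find the container id for executor 1: match the line for that executor,
# then extract the container_<...> token
echo "$log" | grep 'for executor with ID 1$' | grep -o 'container_[0-9_]*'
# → container_1234567890123_0001_01_000002
```

In a real deployment the same grep would run over the AM log retrieved from the ResourceManager UI or, once the application finishes, via `yarn logs -applicationId <app id>`.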