[jira] [Commented] (FLINK-10223) TaskManagers should log their ResourceID during startup
[ https://issues.apache.org/jira/browse/FLINK-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613974#comment-16613974 ]

ASF GitHub Bot commented on FLINK-10223:

asfgit closed pull request #6679: [FLINK-10223][LOG]Logging with resourceId during taskmanager startup
URL: https://github.com/apache/flink/pull/6679

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java b/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java
index eb6df19a4ed..1ce29af00a0 100644
--- a/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java
+++ b/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java
@@ -710,6 +710,7 @@ private RegistrationResponse registerTaskExecutorInternal(
     WorkerRegistration registration =
         new WorkerRegistration<>(taskExecutorGateway, newWorker, dataPort, hardwareDescription);
+    log.info("Registering TaskManager {} ({}) at ResourceManager", taskExecutorResourceId, taskExecutorAddress);
     taskExecutors.put(taskExecutorResourceId, registration);
     taskManagerHeartbeatManager.monitorTarget(taskExecutorResourceId, new HeartbeatTarget() {
diff --git a/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java b/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java
index afe22dea1a4..5c1f420dd54 100644
--- a/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java
+++ b/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java
@@ -349,6 +349,8 @@ public static TaskExecutor startTaskManager(
     checkNotNull(rpcService);
     checkNotNull(highAvailabilityServices);
+    LOG.info("Starting TaskManager with ResourceID: {}", resourceID);
+
     InetAddress remoteAddress = InetAddress.getByName(rpcService.getAddress());
     TaskManagerServicesConfiguration taskManagerServicesConfiguration =
diff --git a/flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala b/flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala
index 0988730689a..b58a67c1c07 100644
--- a/flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala
+++ b/flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala
@@ -350,7 +350,7 @@ class JobManager(
     hardwareInformation, numberOfSlots) =>
     // we are being informed by the ResourceManager that a new task manager is available
-    log.debug(s"RegisterTaskManager: $msg")
+    log.info(s"RegisterTaskManager: $msg")
     val taskManager = sender()
diff --git a/flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala b/flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala
index c04084c55f4..2008ad87d56 100644
--- a/flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala
+++ b/flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala
@@ -1831,7 +1831,7 @@ object TaskManager {
     taskManagerClass: Class[_ <: TaskManager])
     : Unit = {
-LOG.info("Starting TaskManager")
+LOG.info(s"Starting TaskManager with ResourceID: $resourceID")
     // Bring up the TaskManager actor system first, bind it to the given address.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> TaskManagers should log their ResourceID during startup
> ---
>
> Key: FLINK-10223
> URL: https://issues.apache.org/jira/browse/FLINK-10223
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Coordination
> Affects Versions: 1.5.3, 1.6.1, 1.7.0
> Reporter: Konstantin Knauf
> Assignee: Gary Yao
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.1, 1.7.0, 1.5.4
>
>
> To debug exceptions like "org.apache.flink.util.FlinkException: The assigned
> slot was removed." in the master
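
The `log.info` call added to `ResourceManager.java` in the diff above writes one line per registering TaskManager on the master side. A minimal sketch of how such lines could be turned into a lookup table from ResourceID to address follows. The class `RegistrationLogIndex`, its method, and the sample address are hypothetical and only for illustration; only the message text `Registering TaskManager {} ({}) at ResourceManager` comes from the diff, and nothing is assumed about the rest of the log layout.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Flink. */
final class RegistrationLogIndex {

    // Message text taken from the diff; group 1 is the ResourceID, group 2 the address.
    private static final Pattern REGISTRATION =
        Pattern.compile("Registering TaskManager (\\S+) \\((.*?)\\) at ResourceManager");

    /** Maps each registered ResourceID to the address it registered with. */
    static Map<String, String> index(List<String> logLines) {
        Map<String, String> addressByResourceId = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = REGISTRATION.matcher(line);
            if (m.find()) {
                addressByResourceId.put(m.group(1), m.group(2));
            }
        }
        return addressByResourceId;
    }

    public static void main(String[] args) {
        // Sample lines are made up; the address format is an assumption.
        List<String> lines = Arrays.asList(
            "Registering TaskManager tm-1 (akka.tcp://flink@host-a:6122/user/taskmanager_0) at ResourceManager",
            "some unrelated log line");
        System.out.println(index(lines));
    }
}
```

Lines that do not contain the registration message are skipped, so the whole master log can be passed in unfiltered.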
[jira] [Commented] (FLINK-10223) TaskManagers should log their ResourceID during startup
[ https://issues.apache.org/jira/browse/FLINK-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613422#comment-16613422 ]

ASF GitHub Bot commented on FLINK-10223:

GJL commented on issue #6679: [FLINK-10223][LOG]Logging with resourceId during taskmanager startup
URL: https://github.com/apache/flink/pull/6679#issuecomment-420992511

LGTM, merging.

> TaskManagers should log their ResourceID during startup
> ---
>
> Key: FLINK-10223
> URL: https://issues.apache.org/jira/browse/FLINK-10223
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Coordination
> Affects Versions: 1.5.3, 1.6.1, 1.7.0
> Reporter: Konstantin Knauf
> Assignee: aitozi
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.1, 1.7.0, 1.5.4
>
>
> To debug exceptions like "org.apache.flink.util.FlinkException: The assigned
> slot was removed." in the master container, it is often helpful to
> know which slot was provided by which TaskManager. The only way to relate
> slots to TaskManagers right now seems to be to enable DEBUG logging for
> `org.apache.flink.runtime.jobmaster.slotpool.SlotPool`.
> This would be solved if each TaskManager logged its `ResourceID`
> during startup, as the `SlotID` mainly consists of the `ResourceID` of the
> providing TaskManager. For Mesos and YARN the `ResourceID` has an intrinsic
> meaning, but for a stand-alone or containerized setup the `ResourceID` is
> just a random ID.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
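
The issue description states that the `SlotID` mainly consists of the `ResourceID` of the providing TaskManager. A minimal sketch of what that means for log correlation follows. It assumes, purely for illustration, that a slot identifier is rendered as `<ResourceID>_<slotNumber>`; the class `SlotIdSketch` and its method are hypothetical and not part of Flink.

```java
/** Hypothetical helper for illustration; not part of Flink. */
final class SlotIdSketch {

    /**
     * Returns the ResourceID portion of a slot identifier.
     * Assumption: the identifier has the form "<ResourceID>_<slotNumber>".
     * The last underscore is used as the separator because a ResourceID
     * (for example a YARN container ID) may itself contain underscores.
     */
    static String resourceIdOf(String slotId) {
        int sep = slotId.lastIndexOf('_');
        if (sep <= 0 || sep == slotId.length() - 1) {
            throw new IllegalArgumentException(
                "Not of the form <ResourceID>_<slotNumber>: " + slotId);
        }
        // Throws NumberFormatException (an IllegalArgumentException) if the suffix is not a number.
        Integer.parseInt(slotId.substring(sep + 1));
        return slotId.substring(0, sep);
    }

    public static void main(String[] args) {
        // The ResourceID here is the one shown in the pull request's sample log output.
        System.out.println(resourceIdOf("13d3e0bab7dcae1d979665acb5afaa31_0"));
    }
}
```

With the ResourceID extracted from a slot identifier in a master-side exception, the matching TaskManager can be found by searching the TaskManager logs for the same ID.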
[jira] [Commented] (FLINK-10223) TaskManagers should log their ResourceID during startup
[ https://issues.apache.org/jira/browse/FLINK-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611364#comment-16611364 ]

ASF GitHub Bot commented on FLINK-10223:

Aitozi commented on a change in pull request #6679: [FLINK-10223][LOG]Logging with resourceId during taskmanager startup
URL: https://github.com/apache/flink/pull/6679#discussion_r216853295

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java ##

@@ -349,6 +349,8 @@ public static TaskExecutor startTaskManager(
     checkNotNull(rpcService);
     checkNotNull(highAvailabilityServices);
+    LOG.info("Starting taskManager {}", resourceID);

Review comment:
Agreed, thanks for your suggestion. I have updated the PR according to the comments.
[jira] [Commented] (FLINK-10223) TaskManagers should log their ResourceID during startup
[ https://issues.apache.org/jira/browse/FLINK-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610845#comment-16610845 ]

ASF GitHub Bot commented on FLINK-10223:

yanghua commented on a change in pull request #6679: [FLINK-10223][LOG]Logging with resourceId during taskmanager startup
URL: https://github.com/apache/flink/pull/6679#discussion_r216725459

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java ##

@@ -349,6 +349,8 @@ public static TaskExecutor startTaskManager(
     checkNotNull(rpcService);
     checkNotNull(highAvailabilityServices);
+    LOG.info("Starting taskManager {}", resourceID);

Review comment:
Agree @Clark. In addition, `taskManager` -> `TaskManager` looks better to me.
[jira] [Commented] (FLINK-10223) TaskManagers should log their ResourceID during startup
[ https://issues.apache.org/jira/browse/FLINK-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610358#comment-16610358 ]

ASF GitHub Bot commented on FLINK-10223:

Clark commented on a change in pull request #6679: [FLINK-10223][LOG]Logging with resourceId during taskmanager startup
URL: https://github.com/apache/flink/pull/6679#discussion_r216607537

## File path: flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala ##

@@ -1831,7 +1831,7 @@ object TaskManager {
     taskManagerClass: Class[_ <: TaskManager])
     : Unit = {
-LOG.info("Starting TaskManager")
+LOG.info(s"Starting TaskManager $resourceID")

Review comment:
Same as above.
[jira] [Commented] (FLINK-10223) TaskManagers should log their ResourceID during startup
[ https://issues.apache.org/jira/browse/FLINK-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610357#comment-16610357 ]

ASF GitHub Bot commented on FLINK-10223:

Clark commented on a change in pull request #6679: [FLINK-10223][LOG]Logging with resourceId during taskmanager startup
URL: https://github.com/apache/flink/pull/6679#discussion_r216607446

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java ##

@@ -349,6 +349,8 @@ public static TaskExecutor startTaskManager(
     checkNotNull(rpcService);
     checkNotNull(highAvailabilityServices);
+    LOG.info("Starting taskManager {}", resourceID);

Review comment:
I think `LOG.info("Starting taskManager with ResourceID: {}", resourceID)` would be more explicit. What do you think?
[jira] [Commented] (FLINK-10223) TaskManagers should log their ResourceID during startup
[ https://issues.apache.org/jira/browse/FLINK-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610076#comment-16610076 ]

ASF GitHub Bot commented on FLINK-10223:

Aitozi commented on issue #6679: [FLINK-10223][LOG]Logging with resourceId during taskmanager startup
URL: https://github.com/apache/flink/pull/6679#issuecomment-420133979

cc @GJL
[jira] [Commented] (FLINK-10223) TaskManagers should log their ResourceID during startup
[ https://issues.apache.org/jira/browse/FLINK-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609935#comment-16609935 ]

aitozi commented on FLINK-10223:

[~gjy] I have pushed the PR. Could you please take a look? Thanks.
[jira] [Commented] (FLINK-10223) TaskManagers should log their ResourceID during startup
[ https://issues.apache.org/jira/browse/FLINK-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609933#comment-16609933 ]

ASF GitHub Bot commented on FLINK-10223:

Aitozi opened a new pull request #6679: [FLINK-10223][LOG]Logging with resourceId during taskmanager startup
URL: https://github.com/apache/flink/pull/6679

## What is the purpose of the change

Log the resourceId during TaskManager startup, to help find the specific TaskManager when an exception occurs in a certain TaskManager.

## Brief change log

- add two minor log statements

## Verifying this change

Verified by running a local cluster:

```
Jying@Jying:flink-1.7-SNAPSHOT$ cat log/flink-Jying-taskexecutor-0-Jying-Pro-MacBook.local.log | grep "Starting taskManager"
2018-09-11 07:55:29,457 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Starting taskManager 13d3e0bab7dcae1d979665acb5afaa31
```
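
The verification step in the pull request description greps the TaskManager log for the startup line. The same check can be sketched programmatically, as below. This is only an illustration: the class `ResourceIdLogScan` and its method are hypothetical, and the sample line combines the timestamp and logger name from the pull request's sample output with the message wording of the merged change (`Starting TaskManager with ResourceID: ...`), so it is a constructed example rather than real log output.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Flink. */
final class ResourceIdLogScan {

    // Message wording as merged; nothing is assumed about the rest of the log layout.
    private static final Pattern STARTUP =
        Pattern.compile("Starting TaskManager with ResourceID: (\\S+)");

    /** Returns the ResourceID if the line is a TaskManager startup line. */
    static Optional<String> extractResourceId(String logLine) {
        Matcher m = STARTUP.matcher(logLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Constructed sample line (see the note above); not copied from a real log.
        String line = "2018-09-11 07:55:29,457 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - "
            + "Starting TaskManager with ResourceID: 13d3e0bab7dcae1d979665acb5afaa31";
        System.out.println(extractResourceId(line).orElse("<not found>"));
    }
}
```

Matching on the message text alone keeps the sketch independent of the configured log pattern, at the cost of breaking if the wording changes in a later version.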
[jira] [Commented] (FLINK-10223) TaskManagers should log their ResourceID during startup
[ https://issues.apache.org/jira/browse/FLINK-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608506#comment-16608506 ]

Gary Yao commented on FLINK-10223:

[~aitozi] What is the state of this ticket?