[ https://issues.apache.org/jira/browse/SPARK-22342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429950#comment-16429950 ]
Susan X. Huynh commented on SPARK-22342:
----------------------------------------

Good news: I found the root cause of the multiple registration bug, and it is not a Spark bug. It is caused by a bug in libmesos: "using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop", https://issues.apache.org/jira/browse/MESOS-8171 . This bug leads to the multiple SUBSCRIBE calls seen in the driver logs. Upgrading the libmesos bundle in my Docker image to a version with this patch fixed the issue.

cc [~skonto]

> refactor schedulerDriver registration
> -------------------------------------
>
> Key: SPARK-22342
> URL: https://issues.apache.org/jira/browse/SPARK-22342
> Project: Spark
> Issue Type: Improvement
> Components: Mesos
> Affects Versions: 2.2.0
> Reporter: Stavros Kontopoulos
> Priority: Major
>
> This is an umbrella issue for working on:
> https://github.com/apache/spark/pull/13143
> and handle the multiple re-registration issue which invalidates an offer.
> To test:
> dcos spark run --verbose --name=spark-nohive --submit-args="--driver-cores 1 --conf spark.cores.max=1 --driver-memory 512M --class org.apache.spark.examples.SparkPi http://.../spark-examples_2.11-2.2.0.jar"
>
> master log:
> I1020 13:49:05.000000 3087 master.cpp:6618] Updating info for framework 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3085 hierarchical.cpp:303] Added framework 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3085 hierarchical.cpp:412] Deactivated framework 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3090 hierarchical.cpp:380] Activated framework 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3087 master.cpp:2974] Subscribing framework Spark Pi with checkpointing disabled and capabilities [ ]
> I1020 13:49:05.000000 3087 master.cpp:6618] Updating info for framework 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3087 master.cpp:3083] Framework 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697 failed over
> I1020 13:49:05.000000 3087 master.cpp:2894] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:05.000000 3087 master.cpp:2894] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:05.000000 3087 master.cpp:2894] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:05.000000 3087 master.cpp:2894] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:05.000000 3087 master.cpp:2974] Subscribing framework Spark Pi with checkpointing disabled and capabilities [ ]
> I1020 13:49:05.000000 3087 master.cpp:6618] Updating info for framework 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3087 master.cpp:3083] Framework 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697 failed over
> I1020 13:49:05.000000 3087 master.cpp:7662] Sending 6 offers to framework 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:05.000000 3087 master.cpp:2974] Subscribing framework Spark Pi with checkpointing disabled and capabilities [ ]
> I1020 13:49:05.000000 3087 master.cpp:6618] Updating info for framework 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3087 master.cpp:3083] Framework 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697 failed over
> I1020 13:49:05.000000 3087 master.cpp:9159] Removing offer 9764beab-c90a-4b4f-b0ff-44c187851b34-O10039
> I1020 13:49:05.000000 3087 master.cpp:9159] Removing offer 9764beab-c90a-4b4f-b0ff-44c187851b34-O10038
> I1020 13:49:05.000000 3087 master.cpp:9159] Removing offer 9764beab-c90a-4b4f-b0ff-44c187851b34-O10037
> I1020 13:49:05.000000 3087 master.cpp:9159] Removing offer 9764beab-c90a-4b4f-b0ff-44c187851b34-O10036
> I1020 13:49:05.000000 3087 master.cpp:9159] Removing offer 9764beab-c90a-4b4f-b0ff-44c187851b34-O10035
> I1020 13:49:05.000000 3087 master.cpp:9159] Removing offer 9764beab-c90a-4b4f-b0ff-44c187851b34-O10034
> I1020 13:49:05.000000 3087 master.cpp:2894] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:05.000000 3087 master.cpp:2974] Subscribing framework Spark Pi with checkpointing disabled and capabilities [ ]
> I1020 13:49:05.000000 3087 master.cpp:6618] Updating info for framework 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3087 master.cpp:3083] Framework 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697 failed over
> I1020 13:49:05.000000 3087 master.cpp:2974] Subscribing framework Spark Pi with checkpointing disabled and capabilities [ ]
> I1020 13:49:05.000000 3087 master.cpp:6618] Updating info for framework 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3087 master.cpp:3083] Framework 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697 failed over
> I1020 13:49:05.000000 3087 master.cpp:2974] Subscribing framework Spark Pi with checkpointing disabled and capabilities [ ]
> I1020 13:49:05.000000 3087 master.cpp:6618] Updating info for framework 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3087 master.cpp:3083] Framework 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697 failed over
> I1020 13:49:06.000000 3084 master.cpp:7662] Sending 6 offers to framework 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:06.000000 3089 http.cpp:1166] HTTP GET for /master/slaves from 10.0.4.84:37398 with User-Agent='Go-http-client/1.1'
>
> driver log:
> 17/10/20 13:49:07 INFO MesosCoarseGrainedSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
> 17/10/20 13:49:07 DEBUG SparkContext: Adding shutdown hook
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10035 on slave with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S2. Requirements were not met for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10036 on slave with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S3. Requirements were not met for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10037 on slave with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S0. Requirements were not met for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10038 on slave with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S1. Requirements were not met for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10039 on slave with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S6. Requirements were not met for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10034 on slave with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S5. Requirements were not met for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10035 on slave with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S2. Requirements were not met for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10036 on slave with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S3. Requirements were not met for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10037 on slave with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S0. Requirements were not met for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10038 on slave with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S1. Requirements were not met for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10039 on slave with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S6. Requirements were not met for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Accepting offer: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10034 with attributes: Map() allocation info: role: "*"
> ...
> 17/10/20 13:49:08 INFO MesosCoarseGrainedSchedulerBackend: Mesos task 0 is now TASK_LOST
> 17/10/20 13:49:08 INFO MesosCoarseGrainedSchedulerBackend: taskId has executorId:
> 17/10/20 13:49:08 INFO MesosCoarseGrainedSchedulerBackend: taskId has message:Task launched with invalid offers: Offer 9764beab-c90a-4b4f-b0ff-44c187851b34-O10034 is no longer valid
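
For anyone checking whether their own cluster shows the same symptom, the pattern in the quoted logs (repeated SUBSCRIBE calls, offers removed by the master, then a task lost because its offer "is no longer valid") can be looked for with a short script. The following is only a rough sketch, not part of Spark or Mesos: the function names and the sample data are made up for illustration, the regular expressions are modeled on the log lines quoted above, and it assumes each log entry sits on a single unwrapped line.

```python
import re
from collections import Counter

# Patterns modeled on the master and driver log lines quoted in this issue.
# Assumption: one log entry per line (no mail-style wrapping or "> " splits).
SUBSCRIBE_RE = re.compile(r"Received SUBSCRIBE call for framework '([^']+)' at (\S+)")
REMOVED_RE = re.compile(r"Removing offer (\S+)")
INVALID_RE = re.compile(r"Offer (\S+) is no longer valid")


def subscribe_counts(master_lines):
    """Count SUBSCRIBE calls per (framework name, scheduler endpoint)."""
    counts = Counter()
    for line in master_lines:
        match = SUBSCRIBE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def removed_then_invalid(master_lines, driver_lines):
    """Offer IDs the master removed that the driver later reported as invalid."""
    removed = set()
    for line in master_lines:
        removed.update(REMOVED_RE.findall(line))
    invalid = set()
    for line in driver_lines:
        invalid.update(INVALID_RE.findall(line))
    return removed & invalid


# Hypothetical sample: a few entries copied from the logs above.
SCHEDULER = "scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697"
MASTER_SAMPLE = [
    "I1020 13:49:05.000000 3087 master.cpp:2894] Received SUBSCRIBE call for "
    "framework 'Spark Pi' at " + SCHEDULER,
    "I1020 13:49:05.000000 3087 master.cpp:2894] Received SUBSCRIBE call for "
    "framework 'Spark Pi' at " + SCHEDULER,
    "I1020 13:49:05.000000 3087 master.cpp:9159] Removing offer "
    "9764beab-c90a-4b4f-b0ff-44c187851b34-O10034",
]
DRIVER_SAMPLE = [
    "17/10/20 13:49:08 INFO MesosCoarseGrainedSchedulerBackend: taskId has "
    "message:Task launched with invalid offers: Offer "
    "9764beab-c90a-4b4f-b0ff-44c187851b34-O10034 is no longer valid",
]

if __name__ == "__main__":
    for (framework, endpoint), n in subscribe_counts(MASTER_SAMPLE).items():
        print(f"{framework} at {endpoint}: {n} SUBSCRIBE call(s)")
    print("removed, then used:", sorted(removed_then_invalid(MASTER_SAMPLE, DRIVER_SAMPLE)))
```

More than one SUBSCRIBE call from the same scheduler endpoint within the same second, together with a non-empty intersection from `removed_then_invalid`, would be consistent with the behaviour described in MESOS-8171, though it does not prove that bug is the cause.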