[ 
https://issues.apache.org/jira/browse/SPARK-22342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429950#comment-16429950
 ] 

Susan X. Huynh commented on SPARK-22342:
----------------------------------------

Good news: I found the root cause of the multiple registration bug, and it is 
not a Spark bug. It is caused by a bug in libmesos: "using a failoverTimeout of 
0 with Mesos native scheduler client can result in infinite subscribe loop", 
https://issues.apache.org/jira/browse/MESOS-8171 . This bug leads to the 
multiple SUBSCRIBE calls seen in the driver logs. Upgrading the libmesos bundle 
in my Docker image to a version with this patch fixed the issue. cc [~skonto]
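
To check whether a given master log shows the same symptom, the repeated SUBSCRIBE calls can be counted per scheduler. The following is only a rough sketch: the function names `unwrap` and `count_subscribes` are made up for illustration, and the regular expression assumes the "Received SUBSCRIBE call for framework '...' at ..." wording that appears in the master log quoted below (other Mesos versions may log it differently).

```python
import re
from collections import Counter

# Assumed log wording, taken from the master log quoted in this issue.
SUBSCRIBE_RE = re.compile(
    r"Received SUBSCRIBE call for\s+framework\s+'([^']+)'\s+at\s+(\S+)"
)


def unwrap(text):
    """Drop the '> ' mail-quote prefix and join wrapped lines into one string."""
    lines = (re.sub(r"^>\s?", "", line).strip() for line in text.splitlines())
    return " ".join(line for line in lines if line)


def count_subscribes(master_log):
    """Return a Counter keyed by (framework name, scheduler address)."""
    return Counter(SUBSCRIBE_RE.findall(unwrap(master_log)))
```

A count well above one for the same scheduler address within a single second, as in the log below, would be consistent with the subscribe loop described in MESOS-8171, although it does not prove that cause on its own.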

> refactor schedulerDriver registration
> -------------------------------------
>
>                 Key: SPARK-22342
>                 URL: https://issues.apache.org/jira/browse/SPARK-22342
>             Project: Spark
>          Issue Type: Improvement
>          Components: Mesos
>    Affects Versions: 2.2.0
>            Reporter: Stavros Kontopoulos
>            Priority: Major
>
> This is an umbrella issue for working on:
> https://github.com/apache/spark/pull/13143
> and for handling the multiple re-registration issue, which invalidates an
> offer.
> To test:
>  dcos spark run --verbose --name=spark-nohive  --submit-args="--driver-cores 
> 1 --conf spark.cores.max=1 --driver-memory 512M --class 
> org.apache.spark.examples.SparkPi http://.../spark-examples_2.11-2.2.0.jar";
> master log:
> I1020 13:49:05.000000  3087 master.cpp:6618] Updating info for framework 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000  3085 hierarchical.cpp:303] Added framework 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000  3085 hierarchical.cpp:412] Deactivated framework 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000  3090 hierarchical.cpp:380] Activated framework 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000  3087 master.cpp:2974] Subscribing framework Spark Pi 
> with checkpointing disabled and capabilities [  ]
> I1020 13:49:05.000000  3087 master.cpp:6618] Updating info for framework 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000  3087 master.cpp:3083] Framework 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark 
> Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697 failed 
> over
> I1020 13:49:05.000000  3087 master.cpp:2894] Received SUBSCRIBE call for 
> framework 'Spark Pi' at 
> scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:05.000000  3087 master.cpp:2894] Received SUBSCRIBE call for 
> framework 'Spark Pi' at 
> scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:05.000000 3087 master.cpp:2894] Received SUBSCRIBE call for 
> framework 'Spark Pi' at 
> scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:05.000000 3087 master.cpp:2894] Received SUBSCRIBE call for 
> framework 'Spark Pi' at 
> scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:05.000000 3087 master.cpp:2974] Subscribing framework Spark Pi 
> with checkpointing disabled and capabilities [ ]
> I1020 13:49:05.000000 3087 master.cpp:6618] Updating info for framework 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3087 master.cpp:3083] Framework 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark 
> Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697 failed 
> over
> I1020 13:49:05.000000 3087 master.cpp:7662] Sending 6 offers to framework 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark 
> Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:05.000000 3087 master.cpp:2974] Subscribing framework Spark Pi 
> with checkpointing disabled and capabilities [ ]
> I1020 13:49:05.000000 3087 master.cpp:6618] Updating info for framework 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3087 master.cpp:3083] Framework 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark 
> Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697 failed 
> over
> I1020 13:49:05.000000 3087 master.cpp:9159] Removing offer 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-O10039
> I1020 13:49:05.000000 3087 master.cpp:9159] Removing offer 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-O10038
> I1020 13:49:05.000000 3087 master.cpp:9159] Removing offer 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-O10037
> I1020 13:49:05.000000 3087 master.cpp:9159] Removing offer 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-O10036
> I1020 13:49:05.000000 3087 master.cpp:9159] Removing offer 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-O10035
> I1020 13:49:05.000000 3087 master.cpp:9159] Removing offer 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-O10034
> I1020 13:49:05.000000 3087 master.cpp:2894] Received SUBSCRIBE call for 
> framework 'Spark Pi' at 
> scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:05.000000 3087 master.cpp:2974] Subscribing framework Spark Pi 
> with checkpointing disabled and capabilities [ ]
> I1020 13:49:05.000000 3087 master.cpp:6618] Updating info for framework 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3087 master.cpp:3083] Framework 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark 
> Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697 failed 
> over
> I1020 13:49:05.000000 3087 master.cpp:2974] Subscribing framework Spark Pi 
> with checkpointing disabled and capabilities [ ]
> I1020 13:49:05.000000 3087 master.cpp:6618] Updating info for framework 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3087 master.cpp:3083] Framework 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark 
> Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697 failed 
> over
> I1020 13:49:05.000000 3087 master.cpp:2974] Subscribing framework Spark Pi 
> with checkpointing disabled and capabilities [ ]
> I1020 13:49:05.000000 3087 master.cpp:6618] Updating info for framework 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3087 master.cpp:3083] Framework 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark 
> Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697 failed 
> over
> I1020 13:49:06.000000 3084 master.cpp:7662] Sending 6 offers to framework 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark 
> Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:06.000000 3089 http.cpp:1166] HTTP GET for /master/slaves from 
> 10.0.4.84:37398 with User-Agent='Go-http-client/1.1'
> driver log:
> 17/10/20 13:49:07 INFO MesosCoarseGrainedSchedulerBackend: SchedulerBackend 
> is ready for scheduling beginning after reached minRegisteredResourcesRatio: 
> 0.0
> 17/10/20 13:49:07 DEBUG SparkContext: Adding shutdown hook
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a 
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10035 on slave 
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S2. Requirements were not met 
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a 
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10036 on slave 
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S3. Requirements were not met 
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a 
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10037 on slave 
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S0. Requirements were not met 
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a 
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10038 on slave 
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S1. Requirements were not met 
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a 
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10039 on slave 
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S6. Requirements were not met 
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a 
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10034 on slave 
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S5. Requirements were not met 
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a 
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10035 on slave 
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S2. Requirements were not met 
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a 
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10036 on slave 
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S3. Requirements were not met 
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a 
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10037 on slave 
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S0. Requirements were not met 
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a 
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10038 on slave 
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S1. Requirements were not met 
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a 
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10039 on slave 
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S6. Requirements were not met 
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Accepting offer: 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-O10034 with attributes: Map() allocation 
> info: role: "*"
> ...
> 17/10/20 13:49:08 INFO MesosCoarseGrainedSchedulerBackend: Mesos task 0 is 
> now TASK_LOST
> 17/10/20 13:49:08 INFO MesosCoarseGrainedSchedulerBackend: taskId has 
> executorId:
> 17/10/20 13:49:08 INFO MesosCoarseGrainedSchedulerBackend: taskId has 
> message:Task launched with invalid offers: Offer 
> 9764beab-c90a-4b4f-b0ff-44c187851b34-O10034 is no longer valid
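
In the logs above, the master removes offer ...-O10034 at 13:49:05 and the driver accepts that same offer ID at 13:49:07, which is followed by TASK_LOST with "no longer valid". A quick way to look for this pattern is to cross-reference the two logs. This is a minimal sketch under stated assumptions: `unwrap` and `stale_accepts` are hypothetical helper names, the patterns assume the "Removing offer" and "Accepting offer:" wording shown above, and timestamps are ignored, so it only flags offer IDs that appear in both places.

```python
import re

# Assumed log wording, taken from the master and driver logs quoted above.
REMOVED_RE = re.compile(r"Removing offer\s+(\S+)")
ACCEPTED_RE = re.compile(r"Accepting offer:\s+(\S+)")


def unwrap(text):
    """Drop the '> ' mail-quote prefix and join wrapped lines into one string."""
    lines = (re.sub(r"^>\s?", "", line).strip() for line in text.splitlines())
    return " ".join(line for line in lines if line)


def stale_accepts(master_log, driver_log):
    """List offer IDs the driver accepted that the master log shows as removed.

    Timestamps are not compared; this is only a coarse cross-check.
    """
    removed = set(REMOVED_RE.findall(unwrap(master_log)))
    accepted = ACCEPTED_RE.findall(unwrap(driver_log))
    return [offer for offer in accepted if offer in removed]
```

Because the check ignores ordering, a hit should be confirmed by reading the timestamps around the matching lines.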


