> On Sept. 13, 2016, 5:11 p.m., Zameer Manji wrote:
> > I support this change as a developer.
> >
> > As an operator I am scared.
> >
> > What happens to an existing cluster if we don't set `framework_name`? Will
> > it register another framework_id (bad), or will it fail to register
> > (better)?
>
> Santhosh Kumar Shanmugham wrote:
>     The restarting framework will be treated like a scheduler fail-over.
>
> Zameer Manji wrote:
>     The release notes in this patch say:
>
>         Update default value of command line option `-framework_name` to
>         'aurora'. Please be aware that depending on your usage of Mesos,
>         this will be a backward incompatible change.
>
>     I'm trying to understand the implications of the backwards
>     incompatibility. Will the scheduler fail to register, or will it register
>     under a new framework id (and then lose track of previous tasks)?
>
> Joshua Cohen wrote:
>     Santhosh, did you verify this in vagrant with a scheduler that already
>     had tasks running? If it is backwards compatible then we can probably
>     adjust the release notes?
>
> Santhosh Kumar Shanmugham wrote:
>     Results from testing in a Vagrant cluster:
>
>     Renaming framework from 'TwitterScheduler' to 'Aurora':
>
>     The framework re-registers after restart (treated by the master as
>     failover), gets the same framework-id, and performs task reconciliation,
>     thereby restoring the tasks.
>
>     I0914 16:48:28.408182 9815 master.cpp:1297] Giving framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (TwitterScheduler) at scheduler-75517c8f-5913-49e9-8cc4-342a78c9bbcb@192.168.33.7:8083 3weeks to failover
>     I0914 16:48:28.408226 9815 hierarchical.cpp:382] Deactivated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
>     E0914 16:48:28.408617 9819 process.cpp:2105] Failed to shutdown socket with fd 28: Transport endpoint is not connected
>     I0914 16:48:43.722126 9813 master.cpp:2424] Received SUBSCRIBE call for framework 'Aurora' at scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083
>     I0914 16:48:43.722190 9813 master.cpp:2500] Subscribing framework Aurora with checkpointing enabled and capabilities [ REVOCABLE_RESOURCES, GPU_RESOURCES ]
>     I0914 16:48:43.722225 9813 master.cpp:2564] Updating info for framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
>     I0914 16:48:43.722256 9813 master.cpp:2577] Framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (Aurora) at scheduler-75517c8f-5913-49e9-8cc4-342a78c9bbcb@192.168.33.7:8083 failed over
>     I0914 16:48:43.722429 9813 hierarchical.cpp:348] Activated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
>     I0914 16:48:43.722595 9813 master.cpp:5709] Sending 1 offers to framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (Aurora) at scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083
>     I0914 16:49:44.204677 9812 master.cpp:5447] Performing explicit task state reconciliation for 1 tasks of framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (Aurora) at scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083
>
>     Rolling back framework name to 'TwitterScheduler' from 'Aurora':
>
>     Same here.
>
>     I0914 16:51:33.203495 9812 master.cpp:1297] Giving framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (Aurora) at scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083 3weeks to failover
>     I0914 16:51:33.203526 9812 hierarchical.cpp:382] Deactivated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
>     I0914 16:51:49.614074 9813 master.cpp:2424] Received SUBSCRIBE call for framework 'TwitterScheduler' at scheduler-6fa8b819-aed9-42e1-9c6c-3e4be2f62500@192.168.33.7:8083
>     I0914 16:51:49.614215 9813 master.cpp:2500] Subscribing framework TwitterScheduler with checkpointing enabled and capabilities [ REVOCABLE_RESOURCES, GPU_RESOURCES ]
>     I0914 16:51:49.614312 9813 master.cpp:2564] Updating info for framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
>     I0914 16:51:49.614359 9813 master.cpp:2577] Framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (TwitterScheduler) at scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083 failed over
>     I0914 16:51:49.614977 9813 hierarchical.cpp:348] Activated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
>     I0914 16:51:49.615170 9813 master.cpp:5709] Sending 1 offers to framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (TwitterScheduler) at scheduler-6fa8b819-aed9-42e1-9c6c-3e4be2f62500@192.168.33.7:8083
>     I0914 16:52:50.315119 9812 master.cpp:5447] Performing explicit task state reconciliation for 1 tasks of framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (TwitterScheduler) at scheduler-6fa8b819-aed9-42e1-9c6c-3e4be2f62500@192.168.33.7:8083
>
>     Restarting the scheduler after updating the config to 'TwitterScheduler'
>     from 'Aurora':
>
>     Rename did not take effect. The master re-registered the framework to the
>     same id and performed a task reconciliation.
>
>     I0914 20:11:49.178103 28171 master.cpp:1297] Giving framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (Aurora) at scheduler-c42cd8cf-09a0-4d81-a947-094c4fac601e@192.168.33.7:8083 3weeks to failover
>     I0914 20:11:49.178138 28171 hierarchical.cpp:382] Deactivated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
>     E0914 20:11:49.183275 28178 process.cpp:2105] Failed to shutdown socket with fd 29: Transport endpoint is not connected
>     I0914 20:12:33.277560 28177 master.cpp:2424] Received SUBSCRIBE call for framework 'Aurora' at scheduler-6dcb9baa-503f-44a9-9df6-79da717f3a1c@192.168.33.7:8083
>     I0914 20:12:33.277710 28177 master.cpp:2500] Subscribing framework Aurora with checkpointing enabled and capabilities [ REVOCABLE_RESOURCES, GPU_RESOURCES ]
>     I0914 20:12:33.277753 28177 master.cpp:2564] Updating info for framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
>     I0914 20:12:33.277784 28177 master.cpp:2577] Framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (Aurora) at scheduler-c42cd8cf-09a0-4d81-a947-094c4fac601e@192.168.33.7:8083 failed over
>     I0914 20:12:33.277961 28177 hierarchical.cpp:348] Activated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
>     I0914 20:12:33.278136 28177 master.cpp:5709] Sending 1 offers to framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (Aurora) at scheduler-6dcb9baa-503f-44a9-9df6-79da717f3a1c@192.168.33.7:8083
>     I0914 20:13:33.848175 28175 master.cpp:5447] Performing explicit task state reconciliation for 1 tasks of framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (Aurora) at scheduler-6dcb9baa-503f-44a9-9df6-79da717f3a1c@192.168.33.7:8083
>
>     In all the above cases the running task was not affected and was
>     available in the UI after the scheduler restarted.
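
For anyone who wants to repeat this check against their own master log, here is a minimal sketch (not part of the patch). It assumes the log line shape seen in the excerpts above, where the framework id is followed by the framework name in parentheses; the function names are hypothetical and only illustrate the idea of confirming that one framework id was logged under both the old and the new name.

```python
import re

# Assumption: the master prints "framework <id> (<name>)" as in the quoted
# excerpts, and framework ids end in a 4-digit suffix such as "-0000".
FRAMEWORK_RE = re.compile(r"[Ff]ramework (\S+-\d{4}) \(([^)]+)\)")


def names_by_framework_id(log_lines):
    """Map each framework id to the distinct names it was logged under, in order."""
    seen = {}
    for line in log_lines:
        for framework_id, name in FRAMEWORK_RE.findall(line):
            names = seen.setdefault(framework_id, [])
            if name not in names:
                names.append(name)
    return seen


def rename_kept_framework_id(log_lines, old_name, new_name):
    """Return True if a single framework id appears under both names."""
    return any(
        old_name in names and new_name in names
        for names in names_by_framework_id(log_lines).values()
    )


if __name__ == "__main__":
    import sys

    # Usage: python check_rename.py <master log> <old name> <new name>
    path, old, new = sys.argv[1:4]
    with open(path) as log:
        kept = rename_kept_framework_id(log, old, new)
    print("same framework id kept" if kept else "no shared framework id found")
```

A `True` result only says the id was reused across the rename; whether tasks were actually restored still has to be confirmed from the reconciliation lines or the UI.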

Update on the last case (restarting the scheduler with the old framework name):

Rename *does* take effect. The master re-registered the framework to the same id and performed a task reconciliation.

I0914 20:34:58.059640 28176 master.cpp:1297] Giving framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (Aurora) at scheduler-4a7c21b7-5d90-4218-936e-4142051b3444@192.168.33.7:8083 3weeks to failover
I0914 20:34:58.059675 28176 hierarchical.cpp:382] Deactivated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
I0914 20:35:23.447479 28175 master.cpp:2424] Received SUBSCRIBE call for framework 'TwitterScheduler' at scheduler-cea31751-7cb5-46b2-8208-f9ab1d4fe86c@192.168.33.7:8083
I0914 20:35:23.447573 28175 master.cpp:2500] Subscribing framework TwitterScheduler with checkpointing enabled and capabilities [ REVOCABLE_RESOURCES, GPU_RESOURCES ]
I0914 20:35:23.447592 28175 master.cpp:2564] Updating info for framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
I0914 20:35:23.447615 28175 master.cpp:2577] Framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (TwitterScheduler) at scheduler-4a7c21b7-5d90-4218-936e-4142051b3444@192.168.33.7:8083 failed over
I0914 20:35:23.447777 28175 hierarchical.cpp:348] Activated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
I0914 20:35:23.447968 28175 master.cpp:5709] Sending 1 offers to framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (TwitterScheduler) at scheduler-cea31751-7cb5-46b2-8208-f9ab1d4fe86c@192.168.33.7:8083
I0914 20:36:24.069891 28173 master.cpp:5447] Performing explicit task state reconciliation for 1 tasks of framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (TwitterScheduler) at scheduler-cea31751-7cb5-46b2-8208-f9ab1d4fe86c@192.168.33.7:8083


- Santhosh Kumar


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51874/#review148816
-----------------------------------------------------------


On Sept. 13, 2016, 5:18 p.m., Santhosh Kumar Shanmugham wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51874/
> -----------------------------------------------------------
>
> (Updated Sept. 13, 2016, 5:18 p.m.)
>
>
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
>
>
> Bugs: AURORA-1688
>     https://issues.apache.org/jira/browse/AURORA-1688
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Change framework_name default value from 'TwitterScheduler' to 'aurora'
>
>
> Diffs
> -----
>
>   RELEASE-NOTES.md ad2c68a6defe07c94480d7dee5b1496b50dc34e5
>   src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java 8a386bd208956eb0c8c2f48874b0c6fb3af58872
>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 97677f24a50963178a123b420d7ac136e4fde3fe
>
> Diff: https://reviews.apache.org/r/51874/diff/
>
>
> Testing
> -------
>
> ./build-support/jenkins/build.sh
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
>
>
> Thanks,
>
> Santhosh Kumar Shanmugham
>
>