> On Sept. 14, 2016, 3:48 p.m., Maxim Khutornenko wrote:
> > src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java, line 82
> > <https://reviews.apache.org/r/51874/diff/5/?file=1498686#file1498686line82>
> >
> > Did you try to roll back to a pre-0.15 scheduler while changing the framework name? Trying to see if we can drop this 'backwards incompatible' statement now.
>
> Santhosh Kumar Shanmugham wrote:
> Tested "roll-forward" (to Aurora) and "roll-back" (via release and config change) (to TwitterScheduler) on Aurora-0.14 (depends on Mesos-0.27.2) and Aurora-0.15 (depends on Mesos-0.28.2). The master was able to re-register the framework with the same "id" and the running tasks continued to make progress. (See details in the testing section.)
>
> However, I could not roll back the scheduler from 0.15 to 0.14 from source inside vagrant. "aurorabuild all" started to complain with the message "Could not satisfy all requirements for mesos.native==0.27.2".
>
> Santhosh Kumar Shanmugham wrote:
> Tested changing the framework_name on Aurora 0.14, 0.15 and master. Dropping the comment about 'backward incompatible'.

Just to be clear, you tested this change against a single Mesos master version, right? Could you share which version of Mesos that was?


- Zameer


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51874/#review148988
-----------------------------------------------------------


On Sept. 14, 2016, 5:33 p.m., Santhosh Kumar Shanmugham wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51874/
> -----------------------------------------------------------
>
> (Updated Sept. 14, 2016, 5:33 p.m.)
>
>
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
>
>
> Bugs: AURORA-1688
> https://issues.apache.org/jira/browse/AURORA-1688
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Change framework_name default value from 'TwitterScheduler' to 'Aurora'
>
>
> Diffs
> -----
>
> RELEASE-NOTES.md ad2c68a6defe07c94480d7dee5b1496b50dc34e5
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java 8a386bd208956eb0c8c2f48874b0c6fb3af58872
> src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 97677f24a50963178a123b420d7ac136e4fde3fe
>
> Diff: https://reviews.apache.org/r/51874/diff/
>
>
> Testing
> -------
>
> ./build-support/jenkins/build.sh
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
>
> Testing to ensure backward compatibility:
>
> # HEAD of master:
>
> # Case 1: Rolling forward does not impact running tasks:
> Renaming framework from 'TwitterScheduler' to 'Aurora':
>
> The framework re-registers after restart (treated by master as failover) and gets the same framework-id. Running tasks remain unaffected.
>
> Master log:
> I0914 16:48:28.408182 9815 master.cpp:1297] Giving framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (TwitterScheduler) at scheduler-75517c8f-5913-49e9-8cc4-342a78c9bbcb@192.168.33.7:8083 3weeks to failover
> I0914 16:48:28.408226 9815 hierarchical.cpp:382] Deactivated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
> E0914 16:48:28.408617 9819 process.cpp:2105] Failed to shutdown socket with fd 28: Transport endpoint is not connected
> I0914 16:48:43.722126 9813 master.cpp:2424] Received SUBSCRIBE call for framework 'Aurora' at scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083
> I0914 16:48:43.722190 9813 master.cpp:2500] Subscribing framework Aurora with checkpointing enabled and capabilities [ REVOCABLE_RESOURCES, GPU_RESOURCES ]
> I0914 16:48:43.722225 9813 master.cpp:2564] Updating info for framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
> I0914 16:48:43.722256 9813 master.cpp:2577] Framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (Aurora) at scheduler-75517c8f-5913-49e9-8cc4-342a78c9bbcb@192.168.33.7:8083 failed over
> I0914 16:48:43.722429 9813 hierarchical.cpp:348] Activated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
> I0914 16:48:43.722595 9813 master.cpp:5709] Sending 1 offers to framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (Aurora) at scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083
>
> Scheduler log:
> I0914 16:48:44.157 [Thread-10, MesosSchedulerImpl:151] Registered with ID value: "071c44a1-b4d4-4339-a727-03a79f725851-0000"
> , master: id: "461b98b8-63e1-40e3-96fd-cb62420945ae"
> ip: 119646400
> port: 5050
> pid: "master@192.168.33.7:5050"
> hostname: "aurora.local"
> version: "1.0.0"
> address {
> hostname: "aurora.local"
> ip: "192.168.33.7"
> port: 5050
> }
>
> # Case 2: Rolling backward does not impact running tasks:
> Rolling back framework name from 'Aurora' to 'TwitterScheduler':
>
> The framework re-registers after restart (treated by
> master as failover) and gets the same framework-id. Running tasks remain unaffected.
>
> Master log:
> I0914 16:51:33.203495 9812 master.cpp:1297] Giving framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (Aurora) at scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083 3weeks to failover
> I0914 16:51:33.203526 9812 hierarchical.cpp:382] Deactivated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
> I0914 16:51:49.614074 9813 master.cpp:2424] Received SUBSCRIBE call for framework 'TwitterScheduler' at scheduler-6fa8b819-aed9-42e1-9c6c-3e4be2f62500@192.168.33.7:8083
> I0914 16:51:49.614215 9813 master.cpp:2500] Subscribing framework TwitterScheduler with checkpointing enabled and capabilities [ REVOCABLE_RESOURCES, GPU_RESOURCES ]
> I0914 16:51:49.614312 9813 master.cpp:2564] Updating info for framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
> I0914 16:51:49.614359 9813 master.cpp:2577] Framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (TwitterScheduler) at scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083 failed over
> I0914 16:51:49.614977 9813 hierarchical.cpp:348] Activated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
> I0914 16:51:49.615170 9813 master.cpp:5709] Sending 1 offers to framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (TwitterScheduler) at scheduler-6fa8b819-aed9-42e1-9c6c-3e4be2f62500@192.168.33.7:8083
>
> Scheduler log:
> I0914 16:51:50.249 [Thread-10, MesosSchedulerImpl:151] Registered with ID value: "071c44a1-b4d4-4339-a727-03a79f725851-0000"
> , master: id: "461b98b8-63e1-40e3-96fd-cb62420945ae"
> ip: 119646400
> port: 5050
> pid: "master@192.168.33.7:5050"
> hostname: "aurora.local"
> version: "1.0.0"
> address {
> hostname: "aurora.local"
> ip: "192.168.33.7"
> port: 5050
> }
>
> # Case 3: Restarting with old framework_name (rolling back config) does not impact running tasks:
> Restarting the scheduler after updating the config from 'Aurora' to
> 'TwitterScheduler':
>
> Rename takes effect. The master re-registered the framework with the same id. Running tasks remain unaffected.
>
> Master log:
> I0914 20:34:58.059640 28176 master.cpp:1297] Giving framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (Aurora) at scheduler-4a7c21b7-5d90-4218-936e-4142051b3444@192.168.33.7:8083 3weeks to failover
> I0914 20:34:58.059675 28176 hierarchical.cpp:382] Deactivated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
> I0914 20:35:23.447479 28175 master.cpp:2424] Received SUBSCRIBE call for framework 'TwitterScheduler' at scheduler-cea31751-7cb5-46b2-8208-f9ab1d4fe86c@192.168.33.7:8083
> I0914 20:35:23.447573 28175 master.cpp:2500] Subscribing framework TwitterScheduler with checkpointing enabled and capabilities [ REVOCABLE_RESOURCES, GPU_RESOURCES ]
> I0914 20:35:23.447592 28175 master.cpp:2564] Updating info for framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
> I0914 20:35:23.447615 28175 master.cpp:2577] Framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (TwitterScheduler) at scheduler-4a7c21b7-5d90-4218-936e-4142051b3444@192.168.33.7:8083 failed over
> I0914 20:35:23.447777 28175 hierarchical.cpp:348] Activated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
> I0914 20:35:23.447968 28175 master.cpp:5709] Sending 1 offers to framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (TwitterScheduler) at scheduler-cea31751-7cb5-46b2-8208-f9ab1d4fe86c@192.168.33.7:8083
>
> Scheduler log:
> I0914 20:35:24.000 [Thread-10, MesosSchedulerImpl:151] Registered with ID value: "071c44a1-b4d4-4339-a727-03a79f725851-0000
> "
> , master: id: "848618fb-714d-4b00-ad80-950f6bdc70c6"
> ip: 119646400
> port: 5050
> pid: "master@192.168.33.7:5050"
> hostname: "aurora.local"
> version: "1.0.0"
> address {
> hostname: "aurora.local"
> ip: "192.168.33.7"
> port: 5050
> }
>
> # Testing on older versions that use Mesos 0.28 (Aurora 0.15) and Mesos 0.27 (Aurora 0.14)
>
> # Aurora Version: 0.14
> # Initial version (TwitterScheduler)
> # https://git-wip-us.apache.org/repos/asf?p=aurora.git;a=commit;h=b0b598088847630f37c3f995db98a8edf9520b7e
>
> git reset --hard b0b598088847630f37c3f995db98a8edf9520b7e # reset HEAD to v0.14
> vagrant destroy
> vagrant up
>
> vagrant ssh -c "aurora job create devcluster/www-data/prod/hello aurora/examples/jobs/hello_world.aurora" # start some job
>
> Verify the framework name (TwitterScheduler) and id (some id - XXX)
> http://192.168.33.7:5050/#/frameworks
> Verify the task is running
> http://192.168.33.7:8081/scheduler/www-data/prod/hello
>
> vagrant@aurora:~$ sudo grep 'Registered with ID value' /var/log/upstart/aurora-scheduler.log
> I0914 23:26:52.095 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "317cd38a-edc1-4168-8c20-ca81d8306e04-0000"
>
> # Roll forward
> git apply -3 ~/Downloads/rb51874.patch # apply the framework name default change
> vagrant ssh -c "aurorabuild scheduler" # rebuild
>
> Verify the framework name (Aurora) and id (same id - XXX)
> http://192.168.33.7:5050/#/frameworks
> Verify the task is still running
> http://192.168.33.7:8081/scheduler/www-data/prod/hello
>
> vagrant@aurora:~$ sudo grep 'Registered with ID value' /var/log/upstart/aurora-scheduler.log
> I0914 23:26:52.095 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "317cd38a-edc1-4168-8c20-ca81d8306e04-0000"
> I0914 23:33:19.336 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "317cd38a-edc1-4168-8c20-ca81d8306e04-0000"
>
> # Roll backward
> git stash
> vagrant ssh -c "aurorabuild scheduler" # rebuild
>
> Verify the framework name (TwitterScheduler) and id (same id - XXX)
> http://192.168.33.7:5050/#/frameworks
> Verify the task is still running
> http://192.168.33.7:8081/scheduler/www-data/prod/hello
>
> vagrant@aurora:~$ sudo grep 'Registered with ID value' /var/log/upstart/aurora-scheduler.log
> I0914 23:26:52.095 [Thread-11, MesosSchedulerImpl:151] Registered with ID value:
> "317cd38a-edc1-4168-8c20-ca81d8306e04-0000"
> I0914 23:33:19.336 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "317cd38a-edc1-4168-8c20-ca81d8306e04-0000"
> I0914 23:35:28.734 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "317cd38a-edc1-4168-8c20-ca81d8306e04-0000"
>
> # Roll forward again
> git apply -3 ~/Downloads/rb51874.patch # apply the framework name default change
> vagrant ssh -c "aurorabuild scheduler" # rebuild
>
> Verify the framework name (Aurora) and id (same id - XXX)
> http://192.168.33.7:5050/#/frameworks
> Verify the task is still running
> http://192.168.33.7:8081/scheduler/www-data/prod/hello
>
> vagrant@aurora:~$ sudo grep 'Registered with ID value' /var/log/upstart/aurora-scheduler.log
> I0914 23:26:52.095 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "317cd38a-edc1-4168-8c20-ca81d8306e04-0000"
> I0914 23:33:19.336 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "317cd38a-edc1-4168-8c20-ca81d8306e04-0000"
> I0914 23:35:28.734 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "317cd38a-edc1-4168-8c20-ca81d8306e04-0000"
> I0914 23:36:29.195 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "317cd38a-edc1-4168-8c20-ca81d8306e04-0000"
>
> # Restart with old framework name
> vagrant ssh
> sudo vim /etc/init/aurora-scheduler.conf
> # add -framework_name=TwitterScheduler after "exec bin/aurora-scheduler" and save
> sudo stop aurora-scheduler
> sudo start aurora-scheduler
>
> vagrant@aurora:~$ sudo grep 'Registered with ID value' /var/log/upstart/aurora-scheduler.log
> I0914 23:26:52.095 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "317cd38a-edc1-4168-8c20-ca81d8306e04-0000"
> I0914 23:33:19.336 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "317cd38a-edc1-4168-8c20-ca81d8306e04-0000"
> I0914 23:35:28.734 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "317cd38a-edc1-4168-8c20-ca81d8306e04-0000"
> I0914 23:36:29.195 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "317cd38a-edc1-4168-8c20-ca81d8306e04-0000"
> I0914 23:39:46.118 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "317cd38a-edc1-4168-8c20-ca81d8306e04-0000"
>
> Verify the framework name (TwitterScheduler) and id (same id - XXX)
> http://192.168.33.7:5050/#/frameworks
> Verify the task is still running
> http://192.168.33.7:8081/scheduler/www-data/prod/hello
>
> # Aurora Version: 0.15
>
> # Initial Version
> # https://git-wip-us.apache.org/repos/asf?p=aurora.git;a=commit;h=e870884fb30bc4d960aa5ed4901df679edbafb34
> git reset --hard e870884fb30bc4d960aa5ed4901df679edbafb34
>
> vagrant destroy
> vagrant up
>
> # start some job
> vagrant ssh -c "aurora job create devcluster/www-data/prod/hello aurora/examples/jobs/hello_world.aurora"
>
> vagrant@aurora:~$ sudo grep 'Registered with ID value' /var/log/upstart/aurora-scheduler.log
> I0915 00:08:11.136 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "308d7661-6bb1-4936-86b4-a01158bfa06b-0000"
>
> vagrant@aurora:~$ aurora job status devcluster
> INFO] Retrieving jobs for role None
> INFO] Checking status of devcluster/www-data/prod/hello
> Active tasks (1):
> Task role: www-data, env: prod, name: hello, instance: 0, status: RUNNING on 192.168.33.7
> CPU: 1.0 core(s), RAM: 128 MB, Disk: 128 MB
> events:
> 2016-09-15 00:08:36 PENDING: None
> 2016-09-15 00:08:37 ASSIGNED: None
> 2016-09-15 00:08:39 STARTING: Initializing sandbox.
> 2016-09-15 00:08:39 RUNNING: None
> Inactive tasks (0):
>
> # Roll forward:
> git stash pop
> vagrant ssh -c "aurorabuild scheduler"
>
> vagrant@aurora:~$ sudo grep 'Registered with ID value' /var/log/upstart/aurora-scheduler.log
> I0915 00:08:11.136 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "308d7661-6bb1-4936-86b4-a01158bfa06b-0000"
> I0915 00:12:33.395 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "308d7661-6bb1-4936-86b4-a01158bfa06b-0000"
> vagrant@aurora:~$ aurora job status devcluster
> INFO] Retrieving jobs for role None
> INFO] Checking status of devcluster/www-data/prod/hello
> Active tasks (1):
> Task role: www-data, env: prod, name: hello, instance: 0, status: RUNNING on 192.168.33.7
> CPU: 1.0 core(s), RAM: 128 MB, Disk: 128 MB
> events:
> 2016-09-15 00:08:36 PENDING: None
> 2016-09-15 00:08:37 ASSIGNED: None
> 2016-09-15 00:08:39 STARTING: Initializing sandbox.
> 2016-09-15 00:08:39 RUNNING: None
> Inactive tasks (0):
>
> # Rollback
> git stash
> vagrant ssh -c "aurorabuild scheduler"
>
> vagrant@aurora:~$ sudo grep 'Registered with ID value' /var/log/upstart/aurora-scheduler.log
> I0915 00:08:11.136 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "308d7661-6bb1-4936-86b4-a01158bfa06b-0000"
> I0915 00:12:33.395 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "308d7661-6bb1-4936-86b4-a01158bfa06b-0000"
> I0915 00:14:49.374 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "308d7661-6bb1-4936-86b4-a01158bfa06b-0000"
> vagrant@aurora:~$ aurora job status devcluster
> INFO] Retrieving jobs for role None
> INFO] Checking status of devcluster/www-data/prod/hello
> Active tasks (1):
> Task role: www-data, env: prod, name: hello, instance: 0, status: RUNNING on 192.168.33.7
> CPU: 1.0 core(s), RAM: 128 MB, Disk: 128 MB
> events:
> 2016-09-15 00:08:36 PENDING: None
> 2016-09-15 00:08:37 ASSIGNED: None
> 2016-09-15 00:08:39 STARTING: Initializing sandbox.
> 2016-09-15 00:08:39 RUNNING: None
> Inactive tasks (0):
>
> # Roll forward:
> git stash pop
> vagrant ssh -c "aurorabuild scheduler"
>
> vagrant@aurora:~$ sudo grep 'Registered with ID value' /var/log/upstart/aurora-scheduler.log
> I0915 00:08:11.136 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "308d7661-6bb1-4936-86b4-a01158bfa06b-0000"
> I0915 00:12:33.395 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "308d7661-6bb1-4936-86b4-a01158bfa06b-0000"
> I0915 00:14:49.374 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "308d7661-6bb1-4936-86b4-a01158bfa06b-0000"
> I0915 00:16:14.004 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "308d7661-6bb1-4936-86b4-a01158bfa06b-0000"
> vagrant@aurora:~$ aurora job status devcluster
> INFO] Retrieving jobs for role None
> INFO] Checking status of devcluster/www-data/prod/hello
> Active tasks (1):
> Task role: www-data, env: prod, name: hello, instance: 0, status: RUNNING on 192.168.33.7
> CPU: 1.0 core(s), RAM: 128 MB, Disk: 128 MB
> events:
> 2016-09-15 00:08:36 PENDING: None
> 2016-09-15 00:08:37 ASSIGNED: None
> 2016-09-15 00:08:39 STARTING: Initializing sandbox.
> 2016-09-15 00:08:39 RUNNING: None
> Inactive tasks (0):
>
> # Restart with old framework name
> vagrant ssh
> sudo vim /etc/init/aurora-scheduler.conf
> # add -framework_name=TwitterScheduler after "exec bin/aurora-scheduler" and save
> sudo stop aurora-scheduler
> sudo start aurora-scheduler
>
> vagrant@aurora:~$ sudo grep 'Registered with ID value' /var/log/upstart/aurora-scheduler.log
> I0915 00:08:11.136 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "308d7661-6bb1-4936-86b4-a01158bfa06b-0000"
> I0915 00:12:33.395 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "308d7661-6bb1-4936-86b4-a01158bfa06b-0000"
> I0915 00:14:49.374 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "308d7661-6bb1-4936-86b4-a01158bfa06b-0000"
> I0915 00:16:14.004 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "308d7661-6bb1-4936-86b4-a01158bfa06b-0000"
> I0915 00:18:16.200 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "308d7661-6bb1-4936-86b4-a01158bfa06b-0000"
> vagrant@aurora:~$ aurora job status devcluster
> INFO] Retrieving jobs for role None
> INFO] Checking status of devcluster/www-data/prod/hello
> Active tasks (1):
> Task role: www-data, env: prod, name: hello, instance: 0, status: RUNNING on 192.168.33.7
> CPU: 1.0 core(s), RAM: 128 MB, Disk: 128 MB
> events:
> 2016-09-15 00:08:36 PENDING: None
> 2016-09-15 00:08:37 ASSIGNED: None
> 2016-09-15 00:08:39 STARTING: Initializing sandbox.
> 2016-09-15 00:08:39 RUNNING: None
> Inactive tasks (0):
>
>
> Thanks,
>
> Santhosh Kumar Shanmugham
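
The "same id after every restart" check in the quoted testing notes is done by reading the grep output by eye. It could probably be scripted instead. What follows is only a minimal sketch, assuming the scheduler log keeps the 'Registered with ID value: "..."' format shown above and is read from the actual log file rather than from a re-wrapped mail. The function names registered_ids and framework_id_is_stable are hypothetical and not part of Aurora.

```python
import re

# Matches the registration lines shown in the quoted grep output.
# Assumption: the framework ID follows 'value: ' in double quotes; stopping at
# whitespace also tolerates a closing quote that lands on the next line.
REGISTERED = re.compile(r'Registered with ID value: "([^"\s]+)')


def registered_ids(log_text):
    """Return every framework ID the scheduler registered with, in log order."""
    return REGISTERED.findall(log_text)


def framework_id_is_stable(log_text):
    """True if at least one registration was logged and all IDs are identical."""
    ids = registered_ids(log_text)
    return bool(ids) and len(set(ids)) == 1


if __name__ == "__main__":
    # Two registration lines copied from the Aurora 0.14 run quoted above.
    sample = "\n".join([
        'I0914 23:26:52.095 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "317cd38a-edc1-4168-8c20-ca81d8306e04-0000"',
        'I0914 23:33:19.336 [Thread-11, MesosSchedulerImpl:151] Registered with ID value: "317cd38a-edc1-4168-8c20-ca81d8306e04-0000"',
    ])
    print(registered_ids(sample))
    print(framework_id_is_stable(sample))
```

Pointing this at the contents of /var/log/upstart/aurora-scheduler.log after each roll-forward or roll-back would flag a changed framework ID without anyone having to compare the lines manually.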