Since this topic is now 16 emails long, I'll make it 17 to try to summarize, address concerns, and see if we can get some consensus. Lately our email discussions get to the point where there are many, many emails back and forth, but we aren't really getting anywhere (either for or against).

I think what Rawlin is proposing is as follows:

- Create a Traffic Ops database backup as part of the Traffic Ops upgrade process
- Perform the upgrade as usual using `db/admin.pl upgrade` and `goose up`
- Keep the ability to do `goose down` as-is for whatever one-off reason we may have
- Add functionality to Traffic Ops such that if a `yum downgrade` is performed, the user also has the ability to run a command like `db/admin.pl restart` or something like that to restore the database to the post upgraded version
- How exactly we do the backup/restore is a different issue for a different topic; we are just trying to get basic consensus that this idea makes the product better than it is today.

So far, the concerns have been:

- What if data has changed since the upgrade
- We still need the ability to do a goose down
- Backups/Restores should only be done by a DBA
- We should test out the downgrade scenarios.

To address these concerns:

>> What if data has changed since the upgrade

It is pretty reasonable to assume that if you are doing a complete downgrade, it is because you found a serious issue within a short time of doing your upgrade. It is pretty unlikely that someone is going to upgrade and then find such a serious issue a week later. It is my opinion that we should be writing software to support the 80-90% of scenarios and not the 10-20% of scenarios. Yeah, it could happen, but that doesn't mean it will. Also, if someone is concerned about losing data, they can perform a downgrade the same way we do today, by running `goose down` X times (once for each migration added).

>> We still need the ability to do a goose down

Rawlin has already stated that we will not be losing this ability. We will just be adding the ability to do a wholesale downgrade of the database.

>> Backups/Restores should only be done by a DBA

Not all of our users are DBAs, and I think it is completely reasonable to provide the ability to do backups and restores for the purpose of downgrading to a known good state. If we are going to break someone's software, we should do whatever we can to help them fix it. We are not requiring that users use this process, and they are more than welcome to perform their own backups if they prefer.

>> We should test out the downgrade scenarios.

Yeah sure, but we know that we have a less than ideal solution today with just goose down. I think what Rawlin proposes is in addition to `goose down` and will be used only in situations where a full downgrade of Traffic Ops is required. I don't think we need to test every single scenario to know that this is an area we can improve upon.

As for the replacing goose conversation, we should not hijack this thread to discuss that. If we want to propose changing it, then I would A) go dig up the old thread we had on it and address the concerns from that thread in a new proposal and B) submit a new mailing list topic.

So, can we please get some consensus on this topic (either for or against)? Basically, are we for or against the idea? We don't need to bring up every single possible design decision or edge case now, just agree that the idea is a good one or isn't.

You already know where I stand, but for the sake of clarity, I think this is a good idea and greatly helps our ability to do downgrades.

Thanks,
Dave

On Mon, Oct 22, 2018 at 8:59 AM Rawlin Peters <[email protected]> wrote:

> On Fri, Oct 19, 2018 at 12:14 PM Dewayne Richardson <[email protected]>
> wrote:
> >
> > I'm -1 until someone tests out the downgrade scenarios. My vote would be
> > to keep the goose-like downgrade options (and potentially improve
> > db/admin.pl if needed to allow more rollback options if needed).
>
> Can you elaborate on what kind of testing of downgrade scenarios you'd
> like to see?
>
> Ideally, I think we need to at least run SQL upgrade migrations on
> every PR submitted (if the PR doesn't have an upgrade migration, the
> test would still just run all the pre-existing ones), followed by N
> SQL downgrade migrations (N=number of migrations added in the PR).
> This should be a jenkins job that spins up a postgres docker container
> with the "Kabletown" data then runs the upgrade+downgrade migrations.
> If the migrations fail, the PR tests fail. This would give us somewhat
> of a guarantee that the SQL migrations actually run.
>
> - Rawlin
>
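
For anyone who wants to picture the check Rawlin describes in the quoted message, here is a rough sketch of just the ordering logic: run the upgrade once, then run N single-step downgrades, and fail as soon as any step fails. It is not an existing Traffic Ops or Jenkins script. The names `roundtrip_plan` and `run_plan` are made up for illustration, the bare `goose up` / `goose down` commands are taken from this thread without any of the path or environment flags a real setup would likely need, and starting the postgres container with the "Kabletown" data is assumed to happen elsewhere.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def roundtrip_plan(
    new_migrations: int,
    up_cmd: Sequence[str] = ("goose", "up"),      # assumption: bare command from the thread
    down_cmd: Sequence[str] = ("goose", "down"),  # assumption: bare command from the thread
) -> List[List[str]]:
    """Return the commands to run: one upgrade, then N single-step downgrades."""
    if new_migrations < 0:
        raise ValueError("new_migrations must be >= 0")
    return [list(up_cmd)] + [list(down_cmd) for _ in range(new_migrations)]


def run_plan(plan: Sequence[Sequence[str]], runner: Optional[Runner] = None) -> bool:
    """Run each command in order; stop and return False on the first non-zero exit."""
    if runner is None:
        # Default runner really executes the command; tests can inject a fake one.
        runner = lambda argv: subprocess.run(list(argv)).returncode
    for argv in plan:
        if runner(argv) != 0:
            return False
    return True


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    def dry_runner(argv: Sequence[str]) -> int:
        print("would run:", " ".join(argv))
        return 0

    # Pretend the PR under test adds two migrations.
    ok = run_plan(roundtrip_plan(2), runner=dry_runner)
    print("migrations ok" if ok else "migrations failed")
```

Keeping the runner injectable means the ordering can be checked without a database, and a real job would only swap in the actual command lines for its environment.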
