Since this topic is now 16 emails long, I'll make it 17 to try to do some
summarization, address concerns, and see if we can get some consensus.
Lately our email discussions tend to reach a point where there are many,
many emails back and forth, but we aren't really getting anywhere (either
for or against).

I think what Rawlin is proposing is as follows:
- Create a Traffic Ops database backup as part of the Traffic Ops upgrade
process
- Perform the upgrade as usual using `db/admin.pl upgrade` and `goose up`
- Keep the ability to do `goose down` as-is for whatever one-off reason we
may have
- Add functionality to Traffic Ops such that if a `yum downgrade` is
performed, the user also has the ability to run a command like `db/admin.pl
restart` or something like that to restore the database to the pre-upgrade
version
- How exactly we do the backup/restore is a different issue for a different
topic; we are just trying to get basic consensus that this idea makes the
product better than it is today.
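
For illustration only, the flow in the list above could be sketched roughly
as below. This is a minimal Python sketch under assumptions that are mine and
not part of the proposal: the backup is taken with `pg_dump` and restored with
`pg_restore`, the database name and backup directory are placeholders, and the
helper names (`upgrade_with_backup`, `restore_after_downgrade`) are
hypothetical. Commands are handed to a caller-supplied `run` function, so
nothing here touches a real database by itself.

```python
import datetime
from pathlib import Path
from typing import Callable, List, Sequence

# A runner receives an argv-style command; the caller decides how to execute it.
Runner = Callable[[Sequence[str]], None]


def backup_command(db_name: str, backup_file: Path) -> List[str]:
    # Assumption: pg_dump custom-format archive, which pg_restore can read.
    return ["pg_dump", "--format=custom", f"--file={backup_file}", db_name]


def restore_command(db_name: str, backup_file: Path) -> List[str]:
    # Assumption: drop and recreate objects from the pre-upgrade archive.
    return ["pg_restore", "--clean", "--if-exists",
            f"--dbname={db_name}", str(backup_file)]


def upgrade_with_backup(db_name: str, backup_dir: Path, run: Runner) -> Path:
    """Back up first, then upgrade as usual. Returns the backup file path."""
    stamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
    backup_file = backup_dir / f"{db_name}-pre-upgrade-{stamp}.dump"
    run(backup_command(db_name, backup_file))  # step 1: backup
    run(["db/admin.pl", "upgrade"])            # step 2: normal upgrade
    return backup_file


def restore_after_downgrade(db_name: str, backup_file: Path, run: Runner) -> None:
    """After a package downgrade, put the pre-upgrade database back."""
    run(restore_command(db_name, backup_file))
```

In practice `run` could be something like
`lambda cmd: subprocess.run(cmd, check=True)`; passing it in keeps the sketch
easy to dry-run by collecting the commands in a list instead.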

So far, the concerns have been:
- What if data has changed since the upgrade
- We still need the ability to do a goose down
- Backups/Restores should only be done by a DBA
- We should test out the downgrade scenarios.

To address these concerns:
>> What if data has changed since the upgrade
It is pretty reasonable to assume that if you are doing a complete
downgrade it is because you found a serious issue within a short time
period of doing your upgrade.  It is pretty unlikely that someone is going
to upgrade and then find such a serious issue a week after upgrading.  It
is my opinion that we should be writing software to support the 80-90% of
scenarios and not the 10-20% of scenarios. Yeah, it could happen, but that
doesn't mean it will.  Also, if someone is concerned with losing data they
can perform a downgrade in the same way we have today, by running `goose
down` X times (once for each migration added).
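
As a small illustration of that existing path, here is a minimal Python
sketch of "one `goose down` per migration added". The helper name
`roll_back_migrations` is hypothetical, the `run` callback is supplied by the
caller, and any flags goose needs for a particular environment are left out
on purpose.

```python
from typing import Callable, Sequence

# A runner receives an argv-style command; the caller decides how to execute it.
Runner = Callable[[Sequence[str]], None]


def roll_back_migrations(migrations_added: int, run: Runner) -> None:
    """Run `goose down` once for each migration the upgrade added."""
    if migrations_added < 0:
        raise ValueError("migrations_added must not be negative")
    for _ in range(migrations_added):
        # Each invocation rolls back one migration, so data written since the
        # upgrade is kept wherever the down migration keeps it.
        run(["goose", "down"])
```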

>> We still need the ability to do a goose down
Rawlin has already stated that we will not be losing this ability.  We will
just be adding the ability to do a wholesale downgrade of the database.

>> Backups/Restores should only be done by a DBA
Not all of our users are DBAs and I think it is completely reasonable to
provide the ability to do backup and restores for the purpose of
downgrading to a known good state.  If we are going to break someone's
software, we should do whatever we can to help them fix it.  We are not
requiring that users use this process and they are more than welcome to
perform their own backups if they prefer.

>> We should test out the downgrade scenarios.
Yeah, sure, but we know that we have a less than ideal solution today with
just goose down.  I think what Rawlin proposes is in addition to `goose
down` and will be used only in situations where a full downgrade of Traffic
Ops is required.  I don't think we need to test every single scenario to
know that this is an area we can improve upon.

As for the replacing goose conversation, we should not hijack this thread
to discuss that.  If we want to propose changing it then I would A) go dig
up the old thread we had on it and address the concerns from that thread in
a new proposal and B) submit a new mailing list topic.

So, can we please get some consensus on this topic (either for or
against)?  Basically are we for or against the idea?  We don't need to
bring up every single possible design decision or edge case now, just agree
that the idea is a good one or isn't.

You already know where I stand, but for the sake of clarity, I think this
is a good idea and greatly helps our ability to do downgrades.

Thanks,
Dave


On Mon, Oct 22, 2018 at 8:59 AM Rawlin Peters <[email protected]>
wrote:

> On Fri, Oct 19, 2018 at 12:14 PM Dewayne Richardson <[email protected]>
> wrote:
> >
> > I'm -1 until someone tests out the downgrade scenarios.  My vote would be
> > to keep the goose-like downgrade options (and potentially improve db/
> > admin.pl if needed to allow more rollback options if needed).
>
> Can you elaborate on what kind of testing of downgrade scenarios you'd
> like to see?
>
> Ideally, I think we need to at least run SQL upgrade migrations on
> every PR submitted (if the PR doesn't have an upgrade migration, the
> test would still just run all the pre-existing ones), followed by N
> SQL downgrade migrations (N=number of migrations added in the PR).
> This should be a jenkins job that spins up a postgres docker container
> with the "Kabletown" data then runs the upgrade+downgrade migrations.
> If the migrations fail, the PR tests fail. This would give us somewhat
> of a guarantee that the SQL migrations actually run.
>
> - Rawlin
>
