Re: Let’s discuss database upgrades

Ron Wheeler Tue, 29 Dec 2015 08:41:24 -0800

As an old-timer but a new CloudStack user, it strikes me as a bit odd that changes to the database are allowed within a minor version change.

This seems to cause a lot more problems than it solves.

It could delay the release of someone's pet enhancement or bug fix, but the idea of not being able to upgrade from 4.5.3 to 4.6.2 is frightening. The prospect of having upgrade scripts for 4.5.2 to 4.6.0, 4.6.1 and 4.6.2, as well as a separate upgrade from 4.5.3 to 4.6.2, and similar scripts for 4.5.4, 4.5.5, etc. to 4.6.2, 4.6.3, 4.6.4 and so on, is unpleasant. This would have to continue until someone says that 4.5.x is dead and no upgrade scripts to new 4.6.x releases will be available.

In projects that I have run, a change to the database required a new major release, so a single conversion would take one from 4.5.x to 4.6.x.


The nice thing about release numbers is that one never runs out!

Ron



On 29/12/2015 10:08 AM, Daan Hoogland wrote:

CCYY == YYYY

On Tue, Dec 29, 2015 at 3:06 PM, Rafael Weingärtner <
[email protected]> wrote:

I also liked the date format; what did you mean by CCYY?



Where I think we might have a problem is with commits/PRs that end up creating files with the same names. Then we would have to agree upon a way to solve those conflicts, such as appending an extra character to indicate the sequence to be followed, or adding more data such as HH and mm to the naming convention (YYYY-MM-DD-HH-mm).


I liked Wido's suggestion: we could just remove the “-” from “YYYY-MM-DD-HH-mm” and use the value as an integer (YYYYMMDDHHmm).
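Here is a minimal sketch of that idea. It assumes each routine carries a stamp in the form YYYY-MM-DD-HH-mm. The class and method names are hypothetical, not existing CloudStack code. One detail is worth noting: a 12-digit value such as 201512291030 is larger than a 32-bit Java int can hold (2147483647), so it has to be stored as a long.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not CloudStack code): turns a "YYYY-MM-DD-HH-mm" stamp
// into a number that sorts chronologically.
class TimestampVersion {

    // Drops the dashes and parses the remaining 12 digits. The result does not
    // fit in a 32-bit int (max 2147483647), so it is returned as a long.
    static long toVersion(String stamp) {
        if (!stamp.matches("\\d{4}-\\d{2}-\\d{2}-\\d{2}-\\d{2}")) {
            throw new IllegalArgumentException("expected YYYY-MM-DD-HH-mm: " + stamp);
        }
        return Long.parseLong(stamp.replace("-", ""));
    }

    public static void main(String[] args) {
        List<Long> versions = new ArrayList<>();
        for (String stamp : new String[] {"2016-01-03-09-15", "2015-12-29-10-30", "2015-12-29-08-05"}) {
            versions.add(toVersion(stamp));
        }
        Collections.sort(versions);
        System.out.println(versions); // [201512290805, 201512291030, 201601030915]
    }
}
```

Two routines stamped in the same minute would still collide, so a tie-breaking rule, such as the extra sequence character mentioned above, is still needed.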



It seems that we are reaching a consensus. I would love to hear back from
other devs though, especially committers.



BTW: do I have permission to create a page on the wiki so I can add
everything we discuss and agree upon here? This way, we could add that page
to the guidelines for devs creating PRs and committers reviewing and
merging them.

On Tue, Dec 29, 2015 at 12:00 PM, Wido den Hollander <[email protected]>
wrote:


On 29-12-15 14:46, Daan Hoogland wrote:

Wido, Rafael,

I like the date format, but then of course CCYY-MM-DD. I can still think of ways to screw that up (or the plain int ;)

20151229 is a valid integer which you can simply use to compare with.

100, 101, 102 or 20151229, 20160103, 20160104, I don't care that much.

My point is that the database version should be separated from the code
base.

Wido

On Tue, Dec 29, 2015 at 1:40 PM, Rafael Weingärtner <
[email protected]> wrote:

Wido, that is true, you are right; the naming of upgrade routines can use a numeric value independent of the version number. The numeric value can be a simple integer that is incremented each time a routine is added, or a timestamp of when the routine was added. The point is that we would have to link a version to a number. That would enable us to use flywaydb.

To use that approach I think we might need to break compatibility, as you pointed out earlier, but I believe that the benefits of an improved way to manage upgrade routines will compensate for breaking compatibility.

On Tue, Dec 29, 2015 at 10:25 AM, Wido den Hollander <[email protected]>
wrote:


On 29-12-15 13:21, Rafael Weingärtner wrote:

I got your point Daan.

Well, what if we linked a version of ACS to a timestamp in the format DD.MM.YYYY?

In that case you could also say:

ACS 4.6.0 == db ver X

You don't have to say ver >= X, you can also say ver = X.

We could then use a timestamp in the same format to name upgrade routines. This way, the idea of running all of the routines in between versions during upgrades could be applied.
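The field order matters, which is presumably why CCYY-MM-DD is suggested elsewhere in this thread. With the day first, the digits do not sort chronologically; with the year first, they do. Here is a small illustration. The helper names are hypothetical and the dates are arbitrary examples.

```java
// Hypothetical illustration: the same DD.MM.YYYY stamp read as a number
// day-first (not chronological) versus year-first (chronological).
class DateOrdering {

    // "29.12.2015" -> 29122015
    static int dayFirst(String ddMmYyyy) {
        return Integer.parseInt(ddMmYyyy.replace(".", ""));
    }

    // "29.12.2015" -> 20151229
    static int yearFirst(String ddMmYyyy) {
        String[] parts = ddMmYyyy.split("\\.");
        return Integer.parseInt(parts[2] + parts[1] + parts[0]);
    }

    public static void main(String[] args) {
        String older = "29.12.2015";
        String newer = "03.01.2016";
        // Day-first compares 29122015 with 3012016, so the older routine looks newer.
        System.out.println(dayFirst(older) < dayFirst(newer));   // false
        // Year-first compares 20151229 with 20160103, which is the correct order.
        System.out.println(yearFirst(older) < yearFirst(newer)); // true
    }
}
```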

Same goes for giving all database changes a simple numeric int which
keeps incrementing each time a change is applied ;)

Wido

On Tue, Dec 29, 2015 at 10:03 AM, Daan Hoogland <[email protected]> wrote:

Rafael,

On Tue, Dec 29, 2015 at 12:22 PM, Rafael Weingärtner <
[email protected]> wrote:

Thanks, Daan and Wido, for your contributions; I will discuss them as follows.

Daan, about the idea of per-commit upgrades: do you mean that we put each change to the database introduced by a PR/commit in a different file (upgrade routine) per ACS version?
So we would have V_480_A.sql (for one PR), V_480_B.sql (for another PR) and so forth.

If that is the case, we can achieve that using a simple naming convention, as I suggested. Each developer who needs to change or add something in the database creates a separate upgrade routine and gives it an execution order to be followed by Flywaydb. I think that could help RMs track and isolate problems, right?

Yes, with one little caveat. At the time of implementing a feature/PR, we do not know which version it will end up in, so a name containing the version would not be ideal.

Hi Wido, now I understand your example.
I understand your worry about upgrade paths, and that is the point I want to discuss and solve. In your example, we release 4.6.0 and later 4.5.3. You said that there would be no upgrade path from 4.5.3 to 4.6.0. Well, today that is what happens. However, if we change the technology we use to upgrade the database (using a tool such as Flywaydb) and define a standard for creating upgrade routines, that would not be a problem.

As I wrote in my first email, to go from one version to another we should be able to run all of the upgrade routines in between them (including the upgrade routine of the target version). Take the case where we release 4.6.0 and then 4.5.3. If someone upgrades to 4.5.3 from any other version and then wants to upgrade to 4.6.0, that would not be a problem: it would be a matter of running only the upgrade routine of version 4.6.0. We do not need to create upgrade paths explicitly; they should be implicit in our upgrade conventions.
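The implicit-path rule can be sketched as follows. Keep the routines whose release version is greater than the installed version and not greater than the target, then run them in ascending order. The sketch assumes routines are tagged with dotted release versions compared numerically, and `UpgradePathSelector` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical selector: picks the routines needed to go from the installed
// version to the target version, with no hand-written upgrade path.
class UpgradePathSelector {

    // Compares dotted versions numerically, component by component ("4.10.0" > "4.9.0").
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Returns routine versions v with current < v <= target, in ascending order.
    static List<String> between(List<String> routineVersions, String current, String target) {
        List<String> selected = new ArrayList<>();
        for (String v : routineVersions) {
            if (compare(v, current) > 0 && compare(v, target) <= 0) {
                selected.add(v);
            }
        }
        selected.sort(UpgradePathSelector::compare);
        return selected;
    }

    public static void main(String[] args) {
        List<String> routines = List.of("4.5.1", "4.5.2", "4.5.3", "4.6.0");
        // A 4.5.3 installation only needs the 4.6.0 routine to reach 4.6.0.
        System.out.println(between(routines, "4.5.3", "4.6.0")); // [4.6.0]
        System.out.println(between(routines, "4.5.1", "4.6.0")); // [4.5.2, 4.5.3, 4.6.0]
    }
}
```

The sketch only works if the 4.6.0 routine does not repeat a change already made by a 4.5.x routine, or is written to tolerate it. Otherwise an installation coming from 4.5.1 would apply that change twice. That is the release-ordering problem Wido raises in this thread.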

About creating versions of the code that rely on some version of the database: I do not like that much, because of the compatibility issues that might arise. For instance, let’s say version X of ACS depends on version >= Y of the database. If I upgrade the database to version Y + 1 or Y + 2, the same ACS version has to keep running nice and shiny. My worry is that this may bring some complications, for example when we want to remove columns that are no longer used, or to improve data structures.

I normally see the database version and the code base tied in a 1-to-1 mapping. Maybe I am having trouble identifying the benefits of that change.

Thanks for your time ;)

On Tue, Dec 29, 2015 at 8:15 AM, Wido den Hollander <[email protected]> wrote:


On 28-12-15 21:34, Rafael Weingärtner wrote:

Hi Wido, Rohit,
I have just read the feature suggestion.

Wido, I am not trying to complicate things, quite the opposite. I just illustrated a simple thing that can happen, and is happening, and pointed out how it can be easily solved.

About the release of .Z versions, more frequent releases and so on, I do not want to mix topics. Let’s keep this thread strictly about database upgrades.
I do not want to start the release discussion, but what I meant is that we are trying to find a technical solution to something that might be solved more easily by just changing the way we release.

4.6.0 is released and afterwards 4.5.3 is released. How does somebody upgrade from 4.5.3 to 4.6.0? He can't, since the 4.6.0 code doesn't support that path.

So my idea is to split the database version from the code version. The code requires database version >= X, and during boot it simply checks that.
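Wido's boot-time check could look roughly like the sketch below. It assumes the database stores a single integer schema version (the 100, 101, 102 scheme mentioned in this thread) and that each code release declares the version it needs. The class, constant and message are hypothetical. The `exact` flag covers the alternative of requiring ver = X rather than ver >= X.

```java
// Hypothetical startup guard: the management server refuses to start when the
// schema is older than required (or, in exact mode, different from it).
class SchemaVersionGuard {

    static final int REQUIRED_DB_VERSION = 104; // illustrative value only

    static boolean isCompatible(int dbVersion, int required, boolean exact) {
        return exact ? dbVersion == required : dbVersion >= required;
    }

    static void checkOnBoot(int dbVersion, boolean exact) {
        if (!isCompatible(dbVersion, REQUIRED_DB_VERSION, exact)) {
            throw new IllegalStateException("Database schema version " + dbVersion
                    + " does not satisfy required version " + REQUIRED_DB_VERSION
                    + "; run the migration tool first");
        }
    }

    public static void main(String[] args) {
        checkOnBoot(105, false); // a newer schema is accepted in ">=" mode
        try {
            checkOnBoot(103, false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In `>=` mode a release must keep working against newer schemas, which is the compatibility burden Rafael objects to in his reply.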

The database migration tool can indeed do the DB migration; it doesn't have to be the mgmt server that does the upgrade.

Now, about the FS. I agree with Rohit that we should have only one way of managing database upgrades and creation. I just do not like the idea of creating a tool that works as a wrapper around frameworks/tools such as flywaydb. I think those frameworks already work pretty well as they are, and I would rather maintain configuration than wrapper code.

I personally like the way ACS works during upgrades (I just do not like the code itself and how things are structured). As a system administrator, I like to change the version in “/etc/apt/sources.list.d/cloudstack.list” and run "apt-get update" and "apt-get install" from the command line. I do not see the need to add another tool to the mix that is just a wrapper.

If I update the ACS code to 4.7.0, why would I leave the database schema at an older version? And if we want to version DB schemas and application code separately while somehow maintaining compatibility between them, that would bring a whole other level of complexity to the code; I think we should avoid that.

Flywaydb can be easily integrated with everything we have now: we could have a maven profile for developers and integrate it into the ACS bootstrap using its API as a Spring bean. Therefore, we could remove the current “DatabaseUpgradeChecker”, “DbUpgrade” and other classes that aim to do that. We could even use flywaydb to create the schema the first time ACS boots and retire the “cloudstack-setup-database” script, or at least make it less complicated, using it just to configure the database URL and users.

The point is that to use Flywaydb we would have to agree upon a convention for creating the routines (Java and SQL) that execute upgrades. Moreover, with a tool such as Flywaydb we do not need to worry about upgrade paths. As I wrote in the email that started this thread, the upgrade has to be straightforward: to go to a version, we run all of the upgrade routines between the current version and the desired one. Our job is to create upgrade routines that work and to name them properly. The tool's job is to check the current version, the desired one and the upgrades it needs to run, and to execute everything properly.
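The split Rafael describes (developers write and name routines, the tool decides what to run) can be sketched with an in-memory stand-in for the history table that tools like Flyway keep in the database. This is a toy model under that assumption, not Flyway's implementation or API, and `ToyMigrator` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (not Flyway's implementation): routines keyed by an ordered integer
// version, plus a list that stands in for the schema history table.
class ToyMigrator {

    private final TreeMap<Integer, Runnable> routines = new TreeMap<>();
    private final List<Integer> history = new ArrayList<>();

    // Developers' side: register a routine under a unique, positive version number.
    void register(int version, Runnable routine) {
        if (version <= 0) {
            throw new IllegalArgumentException("version must be positive: " + version);
        }
        if (routines.putIfAbsent(version, routine) != null) {
            throw new IllegalArgumentException("duplicate version " + version);
        }
    }

    // Highest applied version, or 0 for an empty database.
    int currentVersion() {
        return history.isEmpty() ? 0 : history.get(history.size() - 1);
    }

    // Tool's side: run every routine above the current version in ascending order
    // and record each one, so calling migrate() again does nothing.
    List<Integer> migrate() {
        List<Integer> ran = new ArrayList<>();
        for (Map.Entry<Integer, Runnable> entry : routines.tailMap(currentVersion(), false).entrySet()) {
            entry.getValue().run();
            history.add(entry.getKey());
            ran.add(entry.getKey());
        }
        return ran;
    }

    public static void main(String[] args) {
        ToyMigrator migrator = new ToyMigrator();
        migrator.register(101, () -> System.out.println("running routine 101"));
        migrator.register(100, () -> System.out.println("running routine 100"));
        System.out.println(migrator.migrate()); // runs 100, then 101; prints [100, 101]
        migrator.register(102, () -> System.out.println("running routine 102"));
        System.out.println(migrator.migrate()); // only the new routine; prints [102]
        System.out.println(migrator.migrate()); // nothing pending; prints []
    }
}
```

This sketch silently skips a routine registered later with a number below the current version. That is the 4.5.3-after-4.7.0 case discussed further down the thread; Flyway, for example, exposes an `outOfOrder` setting for it.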

Yes, indeed. I just wanted to start a discussion on whether we should version the database separately from the code.

Additionally, I do not see the need to break compatibility as Rohit suggested in the FS. In my opinion, everything we have up to today can be migrated to the new structure I proposed, and if we use a tool such as Flywaydb, I even volunteer to do it. The only things we have to discuss and agree upon are the naming conventions for upgrade routines, where to put them, and the configuration for flywaydb.

Thanks for your contribution and time.


On Mon, Dec 28, 2015 at 2:10 PM, Rohit Yadav <[email protected]> wrote:

Hi Rafael and Wido,

Thanks for starting a conversation in this regard. I could not pursue the Chimp tool due to other $dayjob work, though it’s good to see that some discussion has started again. Hope we’ll solve this in 2016.

In my opinion, we first need to separate the database init/migration tooling from the mgmt server. Right now the mgmt server does DB migrations when it starts and finds a code/DB version mismatch. Second, we need to make sure that we’re using the same code/tool to deploy the database. Right now, users use the cloudstack-setup-database Python tool while developers use the maven/java DatabaseCreator activated by the -Ddeploydb flag.

After we’ve addressed these two issues, we can look into how to support the minor releases workflow (or decide to do something else, like not supporting .Z releases as Wido mentioned). We can also see whether we can or want to use an existing migration tool, or write a wrapper tool, “chimp”, that uses existing tools (some of those, like flywaydb, are mentioned in the Chimp FS). To allow users to go back and forth between DB schemas/versions, we’ll also need some new DB migration conventions/versioning/rules/static-checking, and rules on how developers should write such paths (forward and reverse), etc.

The best approach I figured at the time was to decide that we’ll use the previous DB upgrade path mechanism until a certain CloudStack version (say 4.8.0), and after that we’ll use the new approach or tooling to upgrade/downgrade DB schemas (thereby retiring the old DB upgrade path mess).
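Rohit's cutover idea can be sketched as splitting one upgrade into two legs: the existing upgrade path mechanism up to the cutover release, and the new tooling after it. The 4.8.0 cutover comes from his example; the class and the step descriptions are hypothetical. Flyway's baseline feature, which marks an existing schema as a starting version, addresses a similar situation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner: legacy upgrade path up to a cutover release,
// new migration tooling from there on.
class CutoverPlan {

    static final String CUTOVER = "4.8.0"; // example cutover from the thread

    // Numeric, component-by-component comparison of dotted versions.
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Describes which mechanism handles each leg of an upgrade.
    static List<String> plan(String installed, String target) {
        List<String> steps = new ArrayList<>();
        if (compare(installed, target) >= 0) {
            return steps; // nothing to upgrade
        }
        if (compare(installed, CUTOVER) < 0) {
            // Leg 1: old mechanism, but never past the cutover.
            String legacyEnd = compare(target, CUTOVER) < 0 ? target : CUTOVER;
            steps.add("legacy upgrade path " + installed + " -> " + legacyEnd);
        }
        if (compare(target, CUTOVER) > 0) {
            // Leg 2: new tooling, starting at the cutover or the installed version.
            String newStart = compare(installed, CUTOVER) > 0 ? installed : CUTOVER;
            steps.add("new migration tooling " + newStart + " -> " + target);
        }
        return steps;
    }

    public static void main(String[] args) {
        // [legacy upgrade path 4.5.3 -> 4.8.0, new migration tooling 4.8.0 -> 4.9.0]
        System.out.println(plan("4.5.3", "4.9.0"));
        // [new migration tooling 4.8.0 -> 4.9.0]
        System.out.println(plan("4.8.0", "4.9.0"));
    }
}
```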

Rohit Yadav
Software Architect, ShapeBlue


On 28-Dec-2015, at 9:10 PM, Wido den Hollander <[email protected]> wrote:



On 28-12-15 16:21, Rafael Weingärtner wrote:

Thanks for your contribution Wido,
I have not seen Rohit’s email; I will take a look at it.

Ok, he has a FS here:

https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Chimp

About database schema changes happening only in X.Y, I also agree with you (that is a convention we could all agree on, and, as with coding and release procedures, we could have a wiki page for it). However, I think we might still have scripts in X.Y.Z versions that add data to a table such as “guest_os_hypervisor”.

Yes, that is true. A bugfix could be an addition to the database, but we have to prevent that as much as possible.

The issue with managing such scripts is this: if we are at a version such as 4.7.0 and a new script emerges in version 4.5.3, we would have to decide whether or not to run it. I would rather not run such scripts. If they add something to the code base, those changes should also be applied to master, and as a consequence they will be available in a future update.

I understand, but this is where our release cycle becomes the problem. It is because we make X.Y.Z releases that we run into these kinds of problems.

If we as a project simply did not release .Z versions, we would be fine as well ;)

You can try to complicate things with technical solutions, or, if we release every two or three weeks, we don't run into these kinds of situations :)

We might even cut the database version loose from the code version. The database version is simply 100, 101, 102, 103, 104, 105, and a code version requires a certain version of the database.

Wido

On Mon, Dec 28, 2015 at 12:50 PM, Wido den Hollander <[email protected]> wrote:


On 28-12-15 14:16, Rafael Weingärtner wrote:

Hi all devs,
First of all, sorry for the long text, but I hope we can start a discussion here and improve that part of ACS.

A while ago I came across the code that Apache CloudStack (ACS) uses to upgrade from one version to a newer one, and it did not seem to be a good way to execute our upgrades. Therefore, I decided to spend some time searching for alternatives.

I think we all saw that happen once or more :)

I have read some material about versioning the scripts used to upgrade the database (DB) of a system, and went through some frameworks that could help us.

The software engineering literature firmly states that we have to version DB scripts as we do the application's source code, using the baseline approach. Fortunately, we were not that bad on this point: we already version our routines for DB upgrades (.sql and .java). So it seemed that we simply had not adopted a practical approach to help us during DB upgrades.

From my reading and from looking at the ACS source code, I raised the following requirement:
• We should be able to write more than one routine to upgrade to a version; those routines can be written in Java and SQL. We might have more than one routine to be executed for each version, and we should be able to define an order of execution. Additionally, to go to a higher version, we have to run all of the routines from lower versions first, until we reach the desired version.

We could also add another requirement, downgrading from a version, which we currently do not support. With that comes my first question for discussion:
• Do we want/need a method to downgrade from a version to a previous one?
I personally do not care. People should create a backup PRIOR to an upgrade. If the upgrade fails, they can restore the backup.

I found an explanation for not supporting downgrades, and I liked it:

http://flywaydb.org/documentation/faq.html#downgrade

So, here is what I devised for us.
First, the bureaucracy part: our migrations basically occur in three (3) steps. First we have a "prepare script", then a cleanup script, and finally the migration per se, which is written in Java. At least, that is what we can expect when reading the interface “com.cloud.upgrade.dao.DbUpgrade”.

Additionally, our scripts follow the naming convention schema-<currentVersion>to<desiredVersion>. IMHO this may cause some confusion: at first sight we may think that the same version could have different paths to a higher version, which in practice does not happen. Instead of <currentVersion>to<version> we could simply use V_<numberOfVersion>_<sequential>.<fileExtension>. Given that, we have to execute all of the V_<version> scripts that are greater than the current version, up to and including the version we want to upgrade to.
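Here is a sketch of how a tool could order files named with the proposed V_<version>_<sequence> convention. It parses the number and the sequence letter, keeps the versions above the installed one and up to the target, and sorts by number and then by letter. The parsing rules (digits-only version, one letter, .sql or .java) are assumptions, and `ConventionParser` is a hypothetical name. Flyway's documented default naming is different (V<version>__<description>.sql, with a double underscore), so the convention would have to be adapted to it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for V_<version>_<sequence letter>.<sql|java> file names.
class ConventionParser {

    private static final Pattern NAME = Pattern.compile("V_(\\d+)_([A-Za-z])\\.(sql|java)");

    static final class Routine {
        final String file;
        final int version;   // e.g. 410 for 4.1.0
        final char sequence; // 'A', 'B', 'C', ... within one version

        Routine(String file, int version, char sequence) {
            this.file = file;
            this.version = version;
            this.sequence = sequence;
        }
    }

    static Routine parse(String file) {
        Matcher m = NAME.matcher(file);
        if (!m.matches()) {
            throw new IllegalArgumentException("does not follow the convention: " + file);
        }
        // The letter is upper-cased so that "b" and "B" sort the same way.
        return new Routine(file, Integer.parseInt(m.group(1)),
                Character.toUpperCase(m.group(2).charAt(0)));
    }

    // Keeps current < version <= target, sorted by version and then by sequence letter.
    static List<String> executionOrder(List<String> files, int current, int target) {
        List<Routine> selected = new ArrayList<>();
        for (String file : files) {
            Routine routine = parse(file);
            if (routine.version > current && routine.version <= target) {
                selected.add(routine);
            }
        }
        selected.sort(Comparator.comparingInt((Routine r) -> r.version)
                .thenComparingInt(r -> r.sequence));
        List<String> order = new ArrayList<>();
        for (Routine routine : selected) {
            order.add(routine.file);
        }
        return order;
    }

    public static void main(String[] args) {
        List<String> files = List.of("V_420_A.sql", "V_410_C.java", "V_410_A.sql",
                "V_430_A.sql", "V_410_B.sql");
        // [V_410_A.sql, V_410_B.sql, V_410_C.java, V_420_A.sql]
        System.out.println(executionOrder(files, 400, 420));
    }
}
```

Because the version is compared as a plain number, a release such as 4.1.10 (4110) would sort after 4.2.0 (420). That is a limitation of squeezing the release into digits, not only of this sketch.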

To clarify what I am saying, I will use an example. Let’s say we have just installed ACS and run cloudstack-setup-database. That command will create a database schema at version 4.0.0. To upgrade that schema to version 4.3.0 (just an example; it could be any other version), ACS will use the following mapping:

_upgradeMap.put("4.0.0", new DbUpgrade[] {new Upgrade40to41(), new Upgrade410to420(), new Upgrade420to421(), new Upgrade421to430()});

After loading the mapping, ACS will execute the scripts defined in each of the upgrade path classes, and the migration code per se.

Now, let’s say we rename the “.sql” scripts to the pattern mentioned. We would have the following scripts, which are the ones that upgrade to versions in the interval 4.0.0 to 4.3.0 (including 4.3.0, since that is the goal version):


- schema-40to410, can be named to: V_410_A.sql
- schema-40to410-cleanup, can be named to: V_410_B.sql
- schema-410to420, can be named to: V_420_A.sql
- schema-410to420-cleanup, can be named to: V_420_B.sql
- schema-420to421, can be named to: V_421_A.sql
- schema-421to430, can be named to: V_430_A.sql
- schema-421to430-cleanup, can be named to: V_430_B.sql
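The renaming in the list above is mechanical, so it could be scripted. Below is a hypothetical helper. It assumes the legacy pattern schema-<from>to<to>[-cleanup] with an optional .sql extension, keeps the target version digits, and assigns A to the main script and B to the cleanup script, as in the examples. The Java migration classes (the C routines) would still be extracted by hand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical rename helper: schema-<from>to<to>[-cleanup][.sql] -> V_<to>_<A|B>.sql
class LegacyRename {

    private static final Pattern LEGACY =
            Pattern.compile("schema-(\\d+)to(\\d+)(-cleanup)?(?:\\.sql)?");

    static String rename(String legacy) {
        Matcher m = LEGACY.matcher(legacy);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a legacy upgrade script: " + legacy);
        }
        String target = m.group(2);                       // e.g. "410"
        char sequence = m.group(3) == null ? 'A' : 'B';   // cleanup runs second
        return "V_" + target + "_" + sequence + ".sql";
    }

    public static void main(String[] args) {
        for (String name : new String[] {"schema-40to410", "schema-40to410-cleanup", "schema-420to421.sql"}) {
            System.out.println(name + " -> " + rename(name));
        }
    }
}
```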


Additionally, all of the Java code would have to follow the same convention. For instance, we have “com.cloud.upgrade.dao.Upgrade40to41”, which has some Java code to migrate from 4.0.0 to 4.1.0. The idea is to extract that migration code to a Java class named V_410_C.java, so that the SQL scripts (V_410_A and V_410_B) are executed before the Java code.

To go from a lower version (4.0.0) to a higher one (4.3.0), we have to run all of the migration routines of the intermediate versions. That is what we are already doing, but we do all of it manually.

Bottom line, I think we could simply use the convention V_<numberOfVersion>_<sequential>.<fileExtension> to name upgrade routines. That would make it easier to use a framework to help us with that process.

Additionally, I believe that we should always assume that, to go from a lower version to a higher one, we run all of the scripts that exist between them. What do you guys think of that?

That seems good to me. But we still have to prevent database changes in an X.Y.Z release, since that is branched off to a different branch.

IMHO database changes should only happen in X.Y releases.
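Wido's rule could be enforced at build time, along the lines of the static checking Rohit mentions. Here is a hypothetical sketch: given a release version and the migration files that release adds, it fails when a patch (.Z) release adds any. How the list of added files is obtained (for example from a VCS diff) is left out. A real check would probably need an allow-list for data-only scripts, such as the “guest_os_hypervisor” additions discussed in this thread.

```java
import java.util.List;

// Hypothetical build-time check: X.Y.Z releases with Z != 0 may not add migrations.
class PatchReleaseSchemaCheck {

    // True when the version has a non-zero third component, e.g. 4.5.3.
    static boolean isPatchRelease(String version) {
        String[] parts = version.split("\\.");
        return parts.length >= 3 && Integer.parseInt(parts[2]) != 0;
    }

    // Throws if a patch release adds schema migrations.
    static void check(String releaseVersion, List<String> addedMigrations) {
        if (isPatchRelease(releaseVersion) && !addedMigrations.isEmpty()) {
            throw new IllegalStateException("Release " + releaseVersion
                    + " is an X.Y.Z release but adds migrations: " + addedMigrations);
        }
    }

    public static void main(String[] args) {
        check("4.7.0", List.of("V_470_A.sql")); // allowed: X.Y release
        check("4.5.3", List.of());              // allowed: no schema change
        try {
            check("4.5.3", List.of("V_453_A.sql"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```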

After the bureaucracy, we can discuss tools. If we use that convention to name migration (upgrade) routines, we can start thinking about tools to support our migration process. I found two (2) promising ones: Liquibase and Flywaydb (both seem to be under the Apache license, but the first one has an enterprise version?!). After reading the documentation and some usage examples, I found flywaydb easier and simpler to use.

Which tools do you know of that we could use to help us manage database upgrades, without needing to code the upgrade paths?

After that, I think we should decide whether to create another project/component to take care of migrations, or just add the tool as a dependency of a project such as “cloud-framework-db” and start using it.

The “cloud-framework-db” project seems to focus on other things, such as managing transactions and generating SQL from annotations (?!? That should be a topic for another discussion). Therefore, I would rather create a new project with the specific goal of managing ACS DB upgrades. I would also move all of the routines (SQL and Java) to this new project.

This project would be a module of the CloudStack project, and it would execute the upgrade routines at the startup of ACS.

I believe that moving from a homemade solution to one that is more established and used by other communities would be the way to go.

I can volunteer to create a PR with the aforementioned changes, using flywaydb to manage our upgrades. However, I would prefer to have a good discussion with other devs first, before starting to code.

Do you have suggestions or points that should be raised before we start working on that?

Rohit suggested Chimp earlier this year:

http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201508.mbox/%[email protected]%3E

The thread is called: "[DISCUSS] Let's fix CloudStack Upgrades and DB migrations with CloudStack Chimp"

Maybe there is something good in there.

--
Rafael Weingärtner

Regards.




--
Rafael Weingärtner



--
Daan



--
Rafael Weingärtner



--
Rafael Weingärtner



--
Ron Wheeler
President
Artifact Software Inc
email: [email protected]
skype: ronaldmwheeler
phone: 866-970-2435, ext 102
