Some more info:

It seems Sqitch wants to address the issue of multiple branches receiving
the same patches, and of upgrading from one branch to another, which is
something we want to support: https://github.com/theory/sqitch/issues/200

Regards,

Erik.

On Wed, Aug 26, 2015 at 9:13 PM, Erik Huelsmann <[email protected]> wrote:

> Hi,
>
> Hi all,
>>
>> There are a few issues that I have with the way we currently handle
>> database schema changes.
>>
>> Current situation
>>
>> ============
>>
>> We have a main schema file for the table definitions and a number of
>> modules with stored procedures grouped by "subject".
>>
>> In addition to that, there's a "Fixes" file, which contains all incremental
>> updates to the schema since some version.
>>
>> When I want to change the schema as part of our development, I need to
>> change the schema definition files *and* I need to add the schema changes
>> to the Fixes.sql file.
>>
>>
>> I agree this is less than ideal.  I think we need to think of a
>> comprehensive tooling approach and there are a few reasons we haven't done
>> this (mainly because I think getting the tooling right is a lot of effort
>> and there have in the past been other urgent things that needed more effort
>> at the beginning).  However I think this is growing in urgency as we
>> continue development.
>>
> Right. I'm merely summarizing for those who didn't know the current
> situation and to support my line of reasoning. (I don't think you think
> that I am, but for the record: I'm not criticizing the current situation.)
>
>
>> Problems with the current approach
>>
>> ==========================
>>
>> The current approach has (IMO) several problems:
>>
>> * Loading the "Fixes.sql" file, even in a production environment, produces
>> a truckload of ERRORs, because the schema may already contain some or all
>> of the fixes in the file, leading to aborted transactions
>>
>> * Making adjustments in multiple places can lead to incompleteness on
>> either side
>>
>> * Fixes.sql is loaded twice: before and after loading the stored
>> procedures, making it unsuitable for some kinds of fixes
>>
>> * In order to change types or parameter lists of stored procedures, they
>> need to be dropped before being (re)created, resulting in DROP TYPE and
>> DROP FUNCTION all over the place.
>>
>> Especially the high number of ERRORs in the log file when creating a new
>> database is disconcerting to new users. I myself am running into the DROP
>> TYPE and DROP FUNCTION problem more and more often as my refactorings for
>> 1.5 and 1.4-mc reach deeper into the system.
>>
>> Additional requirements
>>
>> =================
>>
>> * Besides solving the problems above, some users have expressed the
>> desire to extend LedgerSMB locally. However, the way we currently upgrade
>> the system removes all customizations (at least from 1.2->1.3->1.4; maybe
>> not 1.4->1.5). If our changed approach could help address this requirement,
>> all the better.
>>
>> * The solution we choose should preferably work with the branching and
>> merging strategy we use in our development cycle.
>>
>> Agreed on all the above.
>>
>
> Good. Then we have good reasons to change :-)
>
>
>> Possible solutions
>>
>> =============
>>
>> One thing that I think we should do is separate development from
>> production when loading the database. That is to say: creating a new
>> database is a different step from upgrading a production database, which
>> in turn is very different from working on a development database. Our
>> processes should provide the best fit for each.
>>
>>
>> I am not sure I understand what you mean by this.  Do you mean that we
>> need an extra build step?  I would not be opposed to that.  But if that
>> isn't what you mean, could you be more specific?
>>
>
> Well, other than the standard Perl steps "copy to blib" and "manify POD",
> we don't really have build steps of our own. I'm indeed proposing that we
> have a build step. I envision it to be something like this:
>
> * Create PostgreSQL database
> * Play Sqitch scripts
> * Dump schema from the database
>
> and then consider the dumped schema a build artifact which doesn't need
> version control. This step would be executed before creating a distribution
> archive.
>
> The pre-distribution step should, if we go with Sqitch, also include
> compilation of a Sqitch bundle, so we don't distribute development
> progress, branches, or whatever else might be in the Sqitch repository
> (which is, as far as I understand it, stored in the development tree).
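>
> Concretely, I imagine the whole build / pre-distribution step boiling down
> to something like this (a rough, untested sketch; the database and file
> names are made up):
>
>     createdb lsmb_build                    # scratch database for the build
>     sqitch deploy db:pg:lsmb_build         # play all Sqitch changes
>     pg_dump --schema-only --no-owner lsmb_build \
>         > sql/ledgersmb-schema.sql         # the dumped-schema build artifact
>     sqitch bundle                          # self-contained plan + scripts
>                                            # for the distribution archive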
>
>
>> Proposed solution to the duplicate change problem
>>
>> =====================================
>>
>> In the past we have been talking about adding a "build" step to
>> LedgerSMB. This step could be used to produce a single schema file for
>> quickly setting up a new database (company). This schema file would be a
>> build artifact and no longer be version controlled itself. It'd be the
>> result of a full schema creation step, including running the Fixes.sql file.
>>
>> Additionally, this step could be used to deliver an up-to-date
>> doc/database/ directory with current database documentation.
>>
>> I like that idea.
>>
>> While this step may feel like it inhibits development, one thing I'm
>> thinking is that we may not need to require it to be executed on a
>> development system, except when testing the production deployment.
>>
>> And testing the build system.
>>
>
> Exactly. I think that the Travis-CI tests (c/sh)ould include the
> generation of the schema and testing off of that.
>
>
>> Proposed solution to the slew of errors from Fixes.sql
>>
>> =======================================
>>
>> There are actually a number of solutions here, as I see it, all of them
>> revolving around the idea that every schema change should be applied once.
>> Basically, I see 2 categories of solutions:
>>
>> 1. Do it ourselves
>>
>> 2. Re-use the work of others
>>
>> The benefit of (1) is that we get full control and no further
>> dependencies for development or production. However, the downside is that
>> we get to do all the thinking and problem solving as well.
>>
>> In both categories I see similar solutions available:
>>
>> a. Numbered changes
>>
>> b. Named changes
>>
>> and in (1) I see a solution that's not available in category (2):
>>
>> c. Use PostgreSQL EXTENSIONs
>>
>> As for category (2), I haven't looked too far around yet, but I did find
>> sqitch (http://sqitch.org/); Sqitch offers a command line tool for the
>> development workflow. Additionally, it provides facilities for deploying
>> only the necessary changes with releases *and* it provides an API which we
>> can use to upgrade the database from one (patch) release to another.
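>>
>> For anyone who hasn't used it: as far as I understand the docs, the
>> day-to-day workflow revolves around named changes, roughly like this
>> (a sketch; the change name and target database are made up):
>>
>>     sqitch add budget_tables -n 'Add budget tables'
>>         # creates deploy/, revert/ and verify/ scripts to fill in
>>     sqitch deploy db:pg:lsmb_dev      # apply all not-yet-deployed changes
>>     sqitch revert db:pg:lsmb_dev --to @HEAD^   # roll back the last change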
>>
>> PG Extensions are definitely in the "reuse work of others" category but
>> the system solves a different problem than sqitch.
>>
>
> This is an interesting difference in views and we should explore it a bit.
>
>
>> Indeed the two together could be a very powerful combination.
>>
>> PG Extensions would allow us to ship db components (whether table sets or
>> something else) in a versioned way.  There is likely to be a learning curve
>> here and it adds a major build dependency for people installing from our
>> source packages.  But it provides some really nice features including
>> versioning and dependency tracking/enforcement.
>>
>> What PG Extensions would allow us to do would be to effectively ship
>> versioned pieces of our db schema and have an orderly way of upgrading them.
>>
>
> Ok. So far, I'm not sure I see the difference with Sqitch, apart from the
> fact that Sqitch is something we'd have to depend on and extensions are
> built into PostgreSQL. One thing where I can see a difference (though my
> current estimate is that we wouldn't be using it) is managing *external*
> dependencies. But with Sqitch, we can manage internal dependencies, so, if
> that's what we're using extensions for, what would be your envisioned
> difference?
>
>
>>   For things like stored procedure sections, we could always drop the
>> extension and load a new version.  But it would make it very clean.  And we
>> could even break up parts of the db into modules that could be
>> independently versioned.
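>>
>> Roughly, and with made-up extension names, that drop-and-reload cycle, plus
>> an in-place upgrade of a storage extension, would look like:
>>
>>     psql -d lsmb_company -c "DROP EXTENSION IF EXISTS lsmb_gl_procs"
>>     psql -d lsmb_company -c "CREATE EXTENSION lsmb_gl_procs VERSION '1.5.0'"
>>     psql -d lsmb_company -c "ALTER EXTENSION lsmb_storage UPDATE TO '1.5.1'"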
>>
>
>> Getting this tooling to work right is not a minor project however.  It
>> definitely cannot be done in a minor release because of the additional
>> dependencies, and I think we'd need a policy against adding extensions to
>> core in major versions.  However it would also provide a nice platform for
>> third party integration and reuse as well.
>>
>> Each of our modules could be an extension and separately versioned.
>>
>
> Ok. I like the technical side of that, but I'm not sure we need that at
> this point. I *do* see value for this in terms of optional modules. Take
> the way we used to have Budgetting as an extension in 1.3 (although that
> solves only 20% of the problem, because hooking Budgetting up to the
> web application was actually a bigger problem than getting the tables
> installed).
>
>> For branching and merging, the extensions would have a number specific to
>> development head and could be re-numbered during the build process.  For
>> modules, all we really need to do is drop extension/create extension if the
>> version number has changed.
>>
>> We'd probably want to reserve a number series for "next in version
>> series."  I am thinking that using 999 would be a good one, so 1.4.999
>> would be "whatever next comes in 1.4" and this would allow us to install
>> and run out of the development tree.
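>>
>> As a sketch (with a made-up module name), the development tree would then
>> carry an extension control file along the lines of:
>>
>>     cat > lsmb_gl_procs.control <<'EOF'
>>     # development tree: "whatever next comes in 1.4"
>>     default_version = '1.4.999'
>>     relocatable = false
>>     EOF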
>>
>
> Hmmm. Sqitch - like version control systems - does this by itself (in this
> case by generating a hash, like Git and other DVCSes).
>
>> As for sqitch, I have no experience with it yet. It's supposed to work
>> well with branching and merging. One thing it *does* do is integrate with
>> version control tools and it is meant to integrate with the project's
>> repository. Ideally when merging and branching, no additional processing is
>> required to integrate the changes from branches. However, I'm not exactly
>> sure that's what happens (given
>> https://groups.google.com/forum/#!searchin/sqitch-users/merge/sqitch-users/GXqgt7nJ_1k/Vvg-r1HOEqMJ)
>> but I think the referenced link is about reordering changes which have
>> already been partially deployed. What I like about Sqitch is that it
>> integrates with the VC system, but is VC-system agnostic: some of us can
>> use it with Git while others can keep using Hg-git as they currently do.
>>
>> Ok so if we ship our storage schema as an extension (or group of
>> extensions) we still need to have ordered versioning going on.
>>
>
> Well, it seems to me that that's what Sqitch does, *unless* you have a
> database which is running some development version. From tag to tag (which
> would be from release to release), ordering doesn't seem to matter. The
> problem the page seems to refer to - but I should probably subscribe to the
> mailing list and ask this question - is a database which is left in an
> intermediary state.
>
>> I am thinking that sqitch may allow us to do the grunt work of the schema
>> change scripts as a part of the build phase (we have to strip out
>> begin/commit statements since create/alter extension adds those), but it
>> seems like it would be very helpful in managing that side.
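>>
>> As a rough sketch of that stripping during the build (file names made up),
>> something like:
>>
>>     # fold a sqitch deploy script into an extension install script,
>>     # dropping the transaction statements CREATE/ALTER EXTENSION adds itself
>>     sed -e '/^BEGIN;/d' -e '/^COMMIT;/d' \
>>         deploy/budget_tables.sql >> lsmb_storage--1.5.0.sql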
>>
>
> Yea. Maybe this too is a question to ask Sqitch's mailing list.
>
>> What I *am* sure about is that (1) will be completely change-oriented.
>> *Maybe* we can use sqitch with the current module-based source code
>> organization.
>>
>> If we go with extensions we don't need to.  We just bump the version
>> number on each change of a module and keep a catalog of what is required
>> (and tests to verify of course).  We can then drop/create extensions as
>> needed for the procedural modules, and we can generate migration scripts
>> for the storage extension(s).  The big difference is that instead of one
>> db version we would have a bunch of components that were versioned and
>> likely a requirement that these match on login.
>>
>
> Hmm. Somehow that doesn't sound like a really attractive outlook to me:
> more things that (apparently) can go wrong.
>
>> If we're going to go with Sqitch, I don't think it's a good idea to
>> switch just then and there, but test it to build up some experience and
>> then choose a well defined point in time to start using it.
>>
>> Agreed.  Maybe start on plugins?
>>
>
> Actually, I was thinking of starting with 1.5 *after* it's been upgraded
> from 1.4? However, that would mean that *everybody* would need to migrate
> through 1.5.0, because that's the moment we'll move to Sqitch? (Or we could
> choose 1.6 or ...?)
>
>
>> So, there are 2 proposals here. What do you say?
>>
>> I don't see how we can get away from doing some things ourselves, so I
>> guess the question is what and where to reuse other work.
>>
>> So my preference would be sqitch and pg extensions.  In my mind this
>> would require a two-step build process for dbs:
>>
>> 1.  Build the schema extensions using sqitch
>> 2.  Install all pg extensions using make.
>>
>
> Ok. Is this on releases only, or with our day-to-day work on the tree?
>
> My view on using Sqitch for everything in our schema would be to replace
> the current schema loading with a call to Sqitch's API. Of course we have
> to do that ourselves, but it's not in the category that I'd call "do
> something ourselves", because all we have to do is create the code to
> delegate to "the other project's code". But figuring out *how* to do that,
> yes, that's our work.
>
>> The first is a developer-only task (sort of like running YACC when
>> building PostgreSQL -- not even required when installing from source
>> packages) so there isn't a user-visible dependency there.
>>
>> The second though does impose some user dependencies in that pg_config
>> and other pg development tools must be installed before  the build can be
>> done.  Of course for those installing debs, these would be done when
>> building debs rather than when installing them.
>>
>
> If we use sqitch only, we also need it as a dependency, both for
> developers and for end users. However, we probably don't need to teach our
> users about it if we simply use it as a library.
>
>> So that is my preference and why.
>>
> Thanks for your response!
>
> With my reaction above, I'm wondering how to get to a next step. We
> probably need to be practical in some sense and just start *somewhere*. But
> the question would be *where* and *with what* -- in the sense that the
> *what* question seems to be a choice between (as we now know them) (a)
> Sqitch or (b) Sqitch+Extensions.
>
>
> Looking forward to further discussion,
>
> --
> Bye,
>
> Erik.
>
> http://efficito.com -- Hosted accounting and ERP.
> Robust and Flexible. No vendor lock-in.
>



-- 
Bye,

Erik.

http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.