On 2/26/13 7:08 PM, Olemis Lang wrote:
On 2/26/13, Jure Zitnik <[email protected]> wrote:
On 2/26/13 10:01 AM, Olemis Lang wrote:
[....]
I started upgrade scenario discussion in a separate thread last week,
subject '[BEP-0003] Custom (3rd party plugin) table upgrade to
multi-product'.
yes, I know... but at this point I guess we'd better follow up in here.
Indeed, my last msg was a rewritten version of the reply I was about
to post in there. If you find some inconsistencies and typos in there,
now you know the cause.
ok :)
jftr, I understand the reasoning behind trying to find another way and
not go with the O(p) plugin tables. Please note that the current
approach was proposed in BEP-0003 a couple of months ago and that all
work done until now has followed that design.
Yes, I get it. I'm just trying to push a little because an O(p) order
of magnitude is not encouraging... so any effort we make towards
getting rid of it will be positive afaics.
I agree, we need to discuss this for the reasons known. And I agree that
if we find another way it'll definitely have a positive effect.
The option of adding 'product'
column to 3rd party plugin tables (the same way as for system tables)
was assessed when preparing the BEP-0003 database translation proposal.
It was dropped for the reasons stated below.
Well, I know it was discussed, but it was not until now that I figured
out a way that *might* work and was not discussed before. Given the
importance of the subject, IMO we'd better take a look at it now, from
the beginning.
ok
Again, IMO that is not doable mainly because we don't know
(semantically) what's in those tables.
Maybe we don't need to do so ;)
We only need to know that all plugin data will be repeated per-product.
Adding an extra product column extending the key should be enough.
We don't know the schema,
If we could reverse engineer the DB schema at run time (e.g. build the
very same trac.db.Table schema classes and the like), then my
reasoning goes as follows.
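Just to sketch what such run-time reflection could build on (assuming
the sqlite backend; the table name below is made up), the raw
ingredients are already exposed by the DB itself:

SELECT name, sql FROM sqlite_master WHERE type='table';
PRAGMA table_info(plugin_example);  -- columns, types and pk positions
PRAGMA index_list(plugin_example);  -- indexes and their uniqueness flags

Other backends offer the same through their own catalogs
(e.g. information_schema).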
Heh, that sounds a bit like a 'run-time SQL to ORM mapper' to me.
Let's consider a DB D(T, R, I) like this:
T(k, u, c) where k = key columns, u = unique columns, c = all other columns
R(T1, k1, T2, k2) represents relationships between two such tables
I(T, i) is an index defined on table T for a set of columns i
G(T) will be the set of tables holding global resources after migration
P(T) will be the set of tables holding product-specific resources
after migration
Considering this, non-MP-aware plugin tables will always belong in P.
I'm proposing to migrate them by applying a transformation
U : D(T, R, I) -> D'(T', R', I') like this:
T'(k + ('product',), u, c) for all T in P
T'(k, u, c) for all T in G
R'(T1, k1 + ('product',), T2, k2 + ('product',)) for all T1, T2 both in P
R'(T1, k1, T2, k2) otherwise
I'(T', i + ('product',)) for I(T, i) and T in P
I'(T', i) for I(T, i) and T in G
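For illustration, here is how applying U would look for a hypothetical
non-MP-aware plugin table (table and column names are made up):

-- Before migration: T(k=(realm, id), u=(), c=(data,)) and I(T, (realm,))
CREATE TABLE plugin_example (
    realm TEXT,
    id    TEXT,
    data  TEXT,
    PRIMARY KEY (realm, id)
);
CREATE INDEX plugin_example_realm_idx ON plugin_example (realm);

-- After migration, for T in P: key and index are extended with 'product'
CREATE TABLE plugin_example (
    product TEXT,
    realm   TEXT,
    id      TEXT,
    data    TEXT,
    PRIMARY KEY (product, realm, id)
);
CREATE INDEX plugin_example_realm_idx ON plugin_example (product, realm);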
In my understanding, this is exactly what we have been doing with the
system tables. So basically you're proposing to handle custom (3rd-party,
non-multi-product-aware plugin) tables in a very similar way (except for
the proposed DDL changes below), if not exactly the same way, as we do
the system tables. Ok, with the global/product tables combo if it turns
out they're required at all.
The trick about relationships is already available in our SQL
translator thanks to the (SELECT * FROM T WHERE product='prefix')
hack. So e.g. when executing joins all tables will be at the same
level, i.e. the ON T1.product = T2.product constraint will be implicit
because the aforementioned statements imply
T1.product = T2.product = 'prefix'. OTOH relationships involving
global resources will not scope those tables using the product prefix.
So table relationships, up to this point, will be consistent in both cases.
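For example (reusing the made-up plugin_example table), a join written
by a non-MP-aware plugin:

SELECT v.data FROM plugin_example v JOIN ticket t ON v.id = t.id;

would be translated in the scope of product 'X' roughly as:

SELECT v.data
  FROM (SELECT * FROM plugin_example WHERE product='X') v
  JOIN (SELECT * FROM ticket WHERE product='X') t
    ON v.id = t.id;

so v.product = t.product = 'X' holds without any explicit ON constraint.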
Agree with this part.
[...]
I'll only care about the DB stuff because that's actually what really
matters and is the cause of this bottleneck.
I'd say that the DB stuff matters, but we can't just ignore other issues.
yes, I'm trying to focus on this by abstracting away all other details.
E.g. creating files is important but has no bearing when it comes to
the DB schema. That's what I mean.
ok, let's leave other details off the table for now.
[...]
How would we be handling indexes, unique keys and other possible
constraints, etc. in any of the above cases? By automatically adding
the 'product' column to unique constraints? I don't believe there's a
general rule that we could apply in regards to (but not limited to)
constraints...
Why not? Think of it this way. You have an identical DB schema in two
separate environments E1, E2 before the MP upgrade, let's say Trac
environments. You'll have index I(T, i) defined in both envs. Now
abstract away the limitations imposed by the underlying Trac
architecture, the DB, etc... and think of this scenario as the *data
we have in this universe of discourse*; if you want, add a dummy
product prefix E1 to all data in E1 and another to all data in E2.
That will be the only artifact you'll have to differentiate both sets
of data once you erase the Trac env, the DB and everything else in
your mind. In the end relational models are plain numbers. Everything
else is a distraction.
From a relational perspective, what do you have in place?
You'll have a single concept related to table T working exactly like a
compound index i + ('product',). The same will hold for constraints
and so on. So the only thing we'd need to do is to implement the
relational equivalent we'd have in such a scenario, but inside the DB.
The 'product' column does not add anything new to the scene;
neither semantics, nor new constraints, nor anything else (unless
I'm missing something obvious that I'm hoping you'll be pointing out
soon ;)
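To make that concrete with a hypothetical unique constraint (names
made up): in two isolated envs E1 and E2 you'd have the same index twice,

CREATE UNIQUE INDEX plugin_tags_uq ON plugin_tags (realm, name);  -- once in E1, once in E2

and in the merged MP database the relational equivalent is simply

CREATE UNIQUE INDEX plugin_tags_uq ON plugin_tags (product, realm, name);

which still enforces uniqueness within each product's row sub-space,
adding nothing new semantically.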
Maybe there are some minor inconsistencies in the definition of the
upgrade transformation I mentioned above, but in general the idea is:
modify the MP DB schema so that product-specific data will be modeled
exactly the same as if everything were in two different isolated
environments; the 'product' column will not make a particularly
important difference other than merely multiplying resources.
The fact that under certain circumstances plugins should be able to
promote resources to be global is out of scope in this discussion.
We cannot offer anything better than what plugin authors might offer
unless they take advantage of our new interfaces and architecture.
However, afaics we can really represent plugin data exactly the same
as if it were in two isolated environments by using a single table.
I understand the idea, that's what's been implemented for the system
tables. But there is a huge difference between system and custom tables.
In the first case, we have complete control over things (schema,
install, upgrade and such). In the case of plugins and custom tables,
unfortunately, we don't have any idea of what the plugin might do (or
not do) during the install/upgrade process. We can't assume it's going
to be the CREATE/SELECT/INSERT/SELECT sequence you described below. It
might actually be *anything*. Including DROPs, for example. More below...
[...]
I'm strongly against recording SQLs
and doing replays.
I'm not saying we should record any SQL and repeat it later. Please
read below.
Ok, the 'SQL replay' was a misunderstanding then.
For example, how would the MP system (or whoever we choose to do the
replays) replay the following really simple SQL, per-product:
SELECT * FROM component WHERE name='Product A component';
assuming that 'Product A component' is only defined in product 'A' ...
Good challenge. I take it that this will be inside one of the upgrade
methods of a non-MP-aware plugin, and therefore all the SQL is generic,
i.e. not translated.
Everything I'll mention will happen in the global upgrade loop: just
once on new env creation and env upgrade, and later many times on
product creation.
Let's make it a bit more complicated to cover the whole thing. Let
the process be:
CREATE TABLE new_table [...]
SELECT * FROM component WHERE name='Product A component'
INSERT INTO new_table (x, y) VALUES [...]
SELECT whatever FROM anywhere WHERE anything [...]
UPDATE existing_table SET [...] WHERE [...]
Another challenge: let's say the upgrade sequence (for example) of a
non-multi-product plugin goes like this:
CREATE TABLE new_temp_table [...., newcol] CONSTRAINT(...+newcol)
INSERT INTO new_temp_table (c1, c2, c3, newcol) SELECT c1, c2, c3,
'whatever' FROM existing_custom_table
DROP TABLE existing_custom_table
CREATE TABLE existing_custom_table [...., newcol] CONSTRAINT(...+newcol)
INSERT INTO existing_custom_table (c1, c2, c3, newcol) SELECT *
FROM new_temp_table
DROP TABLE new_temp_table
Now, afaicr sqlite doesn't support index or constraint changes on
existing tables, so this would be a 'normal' sequence for adding a new
column to a custom table and including it in the constraint (or index,
for that matter).
The INSERT will reuse the values retrieved by the SELECT statement.
So basically I'm saying this is what should happen:
1. The global plugin upgrade method is invoked, hence the component is
instantiated in global env scope.
2. CREATE TABLE and DDL statements will be transformed to prepare the
table to store data for different products, by adding the 'product'
column and everything else required to make everything work as if we
had two isolated environments with the same schema (which you already
know, as it's in the body of the SQL DDL statement). The SQL
translator is responsible for making this work this way.
3. The SELECT statement will be scoped using the global env prefix
(i.e. product='') so the component in the global env will be looked up
as expected.
4. The INSERT statement will be scoped using the global env prefix
(i.e. product='') so the product prefix will be appended as expected.
5. Any other SELECT statement will be scoped using the global env
prefix (i.e. product='') as expected.
6. The UPDATE statement will be scoped using the global env prefix
(i.e. product='') so the product prefix will be appended as expected.
No SQL is recorded.
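A rough sketch of what the translated statements might look like in
global scope (illustrative only; the actual translator output may differ):

CREATE TABLE new_table (product, x, y, [...]);                -- step 2
SELECT * FROM (SELECT * FROM component WHERE product='') c
  WHERE c.name='Product A component';                         -- step 3
INSERT INTO new_table (product, x, y) VALUES ('', [...]);     -- step 4
UPDATE existing_table SET [...] WHERE product='' AND [...];   -- step 6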
Steps 3-6 are exactly what we're doing now for system tables.
I don't think step 2 is doable unless we find out how it should work
for the sequence above.
Afterwards, while still in the global upgrade loop, this so-called
entity we'll have in any case will detect that components C1, C2, C3
were upgraded in global scope (i.e. needs_upgrade and similar methods
returned True). The *so-called entity* may be either EnvironmentSetup
itself, or a component registered as a setup participant (e.g. the MP
system), or... well... anything else we might invent running inside
the global upgrade loop, though we shouldn't need such a thing. C1,
C2, C3 will all be non-MP-aware components. So it will instantiate
product envs inside a loop and replay *the upgrade procedure* once
again, something like this:
for p in ALL_PRODUCTS:
    product_env = ProductEnvironment(env, p)
    for cls in [C1, C2, C3]:
        # instantiate each non-MP-aware component in product env scope
        c = cls(product_env)
        c.upgrade_proc(...)
A couple of days ago, in either this or another thread, I proposed
invoking the component upgrade within a product environment, as this
would enable the translator to properly kick in and translate component
tables. It'd also 'automagically' solve some of the dirty details that
we took off the table above.
We might take more precautions here, like using DB transactions,
etc... it's just a simplified version I sketched so that you get my
point.
Notice a couple of things:
- DB connections used by the upgrade proc inside the loop
will be scoped in product context; that will make a difference
I assume that 'scoped in product context' actually means that the
upgrade proc will be executed from within the product environment/context.
- upgrade_proc is the very same method we mentioned above.
- Setup participants in
ok (something missing in the last sentence though)
So, to the point: `upgrade_proc` will do things as follows.
1. The component upgrade method is invoked, now in product scope, hence
the component is instantiated in product env scope.
+1, this is what I've been advocating since the beginning of this discussion.
2. CREATE TABLE and DDL statements will not be executed because, as a
precondition, the global setup already took the necessary precautions
and the schema will be ready. The SQL translator, together with our
product DB context managers, is responsible for making this work this
way.
This is a problem; see the sequence above. We don't have any way of
knowing which DDLs, and for which tables, should be turned off and
which not.
3. The SELECT statement will be scoped using the product env prefix
(i.e. product='MYPRODUCT') so the component in the product env will be
looked up as expected.
4. The INSERT statement will be scoped using the product env prefix
(i.e. product='MYPRODUCT') so the product prefix will be appended as
expected.
5. Any other SELECT statement will be scoped using the product prefix
(i.e. product='MYPRODUCT') as expected, so queries will still operate
on the DB row sub-space belonging to the target product. Full look-up
will work for global tables, as translation happens on a per-table
basis.
6. The UPDATE statement will be scoped using the global env prefix
(i.e. product='') so the product prefix will be appended as expected.
No product prefix for global tables since they won't be translated at
all.
Correction, step 6 should also be scoped using product prefix.
This is what we've been doing for system tables (in product scope), as
mentioned above.
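For completeness, a sketch of the product-scope translation with that
correction applied (illustrative only, product prefix 'X'):

SELECT * FROM (SELECT * FROM component WHERE product='X') c
  WHERE c.name=[...];                                          -- step 3
INSERT INTO new_table (product, x, y) VALUES ('X', [...]);     -- step 4
UPDATE existing_table SET [...] WHERE product='X' AND [...];   -- step 6, corrected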
Cheers,
Jure