"Jay Blanchard" <[EMAIL PROTECTED]> wrote on 12/30/2004
07:59:35 AM:
> [snip]
> "atomically" from my application's point of view means: I have a
> snapshot of my data (thanks to InnoDB it does not change invisibly
> during my session). I send a query to the database. I get the result /
> changed data. Everything between sending the query and receiving the
> result I expect to be handled at once. I know MySQL cannot handle the
> query as one big block behind the scenes, but I wonder why I / my
> application has to know about these internals in this case.
>
> > UPDATE statements work with the JOIN order and update each table in
> > succession. Dependent on the number of rows affected you may see both
> > tables locked (transactionally) but note a delay in the update of
> > table1.
>
> My application does not see this and that's my point. In the case of
> this query I have to know that there is more than the data before the
> query and the data after the query. There is some intermediate state
> that I must be aware of.
> [/snip]
>
> Since I am not aware of your application language I can only speak to
> the C-style languages, including scripting languages like PHP. Any of
> these will see the query as the sum of its parts, not as individual
> actions. If you need to see those things you will have to write separate
> queries for each event you wish to see.
>
Jay,
I think my problem with the behavior you found documented (not in the
MySQL docs, but in Paul's book?) is exactly that: an UPDATE doesn't work
on the ENTIRE JOIN as a whole but on each table individually.
Here is a repost of Kai's test case:
CREATE TABLE table1 (
  ID int(10) unsigned NOT NULL default '0',
  condition int(10) unsigned NOT NULL default '0'
) TYPE=InnoDB;

INSERT INTO table1 VALUES (1,7);
INSERT INTO table1 VALUES (2,8);

CREATE TABLE table2 (
  ID int(10) unsigned NOT NULL default '0',
  ID_table1 int(10) unsigned NOT NULL default '0',
  value int(10) unsigned NOT NULL default '0'
) TYPE=InnoDB;

INSERT INTO table2 VALUES (1,1,5);
INSERT INTO table2 VALUES (2,2,2);
If I do:
SELECT * from table1 INNER JOIN table2 on table1.ID= table2.ID_table1;
I get:
+----+-----------+----+-----------+-------+
| ID | condition | ID | ID_table1 | value |
+----+-----------+----+-----------+-------+
|  1 |         7 |  1 |         1 |     5 |
|  2 |         8 |  2 |         2 |     2 |
+----+-----------+----+-----------+-------+
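For what it's worth, the SELECT itself is uncontroversial; the result
above can be reproduced with Python's sqlite3 module standing in for
MySQL (a sketch only; TYPE=InnoDB has no SQLite equivalent, so plain
tables are used):

```python
import sqlite3

# Build Kai's test tables in an in-memory SQLite database
# (SQLite standing in for MySQL; the InnoDB TYPE clause is dropped).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE table1 (ID INTEGER NOT NULL DEFAULT 0,
                         condition INTEGER NOT NULL DEFAULT 0);
    INSERT INTO table1 VALUES (1, 7);
    INSERT INTO table1 VALUES (2, 8);
    CREATE TABLE table2 (ID INTEGER NOT NULL DEFAULT 0,
                         ID_table1 INTEGER NOT NULL DEFAULT 0,
                         value INTEGER NOT NULL DEFAULT 0);
    INSERT INTO table2 VALUES (1, 1, 5);
    INSERT INTO table2 VALUES (2, 2, 2);
""")

rows = cur.execute(
    "SELECT * FROM table1 INNER JOIN table2 "
    "ON table1.ID = table2.ID_table1"
).fetchall()
print(rows)  # [(1, 7, 1, 1, 5), (2, 8, 2, 2, 2)]
```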
Which is what I expect. However, when you do an UPDATE on that same JOIN,
the engine breaks the JOIN back into its individual pieces and then
serializes the changes between those pieces.

UPDATE table1
INNER JOIN table2 ON table2.ID_table1=table1.ID
SET table2.value=table2.value-IF(table1.condition=7,1,0),
    table1.condition=8
WHERE table1.ID IN (1,2)

fails to update table2.value because by the time it gets to that column
(as part of the internal JOIN table set), table1.condition is already 8
(even though that change was listed AFTER the conditional IF() in the SET
clause).
This worked:

UPDATE table1
LEFT JOIN table2 ON table2.ID_table1=table1.ID
    AND table1.condition=7
SET table2.value=table2.value-1,
    table1.condition=8
WHERE table1.ID IN (1,2);

and so did:

UPDATE table2
INNER JOIN table1 ON table2.ID_table1=table1.ID
SET table2.value=table2.value-IF(table1.condition=7,1,0),
    table1.condition=8
WHERE table1.ID IN (1,2);
I think that's improper behavior. I agree with Kai: the changes should
not be applied serially but in parallel, based on the initial values of
the JOINed tables.
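To make the difference concrete, here is a rough model in plain Python
(my own illustration, not MySQL internals): run_update applies the two
SET clauses one table at a time in a given order, while the "parallel"
version reads every input value as of the start of the statement:

```python
# Rows from Kai's test case: table1.ID 1 has condition 7, and the
# matching table2 row has value 5.
table1 = {"condition": 7}
table2 = {"value": 5}

def run_update(order):
    """Apply the two SET clauses table by table, in the given order,
    mimicking serialized evaluation (illustrative model only)."""
    t1, t2 = dict(table1), dict(table2)
    for table in order:
        if table == "table1":
            t1["condition"] = 8
        else:
            # table2's SET reads table1.condition *as it is now*
            t2["value"] -= 1 if t1["condition"] == 7 else 0
    return t1, t2

# table1 updated first: the IF() already sees condition = 8, no decrement
_, t2 = run_update(["table1", "table2"])
print(t2["value"])  # 5  (the failing UPDATE table1 INNER JOIN table2 ...)

# table2 updated first: the IF() still sees condition = 7, so decrement
_, t2 = run_update(["table2", "table1"])
print(t2["value"])  # 4  (the working UPDATE table2 INNER JOIN table1 ...)

# Snapshot ("parallel") semantics: every expression reads the values
# as they were at statement start, regardless of table order.
snap = dict(table1)
t2_parallel = {"value": table2["value"] - (1 if snap["condition"] == 7 else 0)}
print(t2_parallel["value"])  # 4
```

Under snapshot semantics the answer is 4 no matter which table comes
first in the JOIN, which is the behavior Kai and I expected.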
I also disagree with this behavior:
SET @testvalue=5;
Select @testvalue:=19, @testvalue+1 as bump;
+----------------+------+
| @testvalue:=19 | bump |
+----------------+------+
|             19 |   20 |
+----------------+------+
On entry to the SELECT statement, @testvalue has a value of 5; on exit it
has the value 19. The value of "bump" SHOULD be 6, because that is one
more than the value @testvalue had at the beginning of the
statement/transaction, but it isn't. This means that calculations are
serialized in the order they are listed in the SELECT statement, and that
the results of both SELECT and UPDATE statements are sensitive to column
order.
SET @testvalue=5;
Select @testvalue+1 as bump, @testvalue:=19;
+------+----------------+
| bump | @testvalue:=19 |
+------+----------------+
|    6 |             19 |
+------+----------------+
(not the same results, but I only changed the column order)
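The same left-to-right rule can be modeled outside SQL. In this sketch
(plain Python, purely illustrative), each select-list expression is a
function evaluated in order against a shared, mutable variable store, and
it reproduces both result sets:

```python
def eval_select(exprs, variables):
    """Evaluate select-list expressions left to right against a shared,
    mutable variable store -- a model of how MySQL appears to handle
    @variables (illustrative only)."""
    return [expr(variables) for expr in exprs]

# Column order 1: @testvalue:=19 first, then @testvalue+1 as bump
vars1 = {"testvalue": 5}
row1 = eval_select(
    [lambda v: v.__setitem__("testvalue", 19) or v["testvalue"],  # @testvalue:=19
     lambda v: v["testvalue"] + 1],                               # bump
    vars1,
)
print(row1)  # [19, 20]

# Column order 2: @testvalue+1 as bump first, then @testvalue:=19
vars2 = {"testvalue": 5}
row2 = eval_select(
    [lambda v: v["testvalue"] + 1,                                # bump
     lambda v: v.__setitem__("testvalue", 19) or v["testvalue"]], # @testvalue:=19
    vars2,
)
print(row2)  # [6, 19]
```

Because the second expression reads whatever the first one left behind,
swapping the columns changes "bump" from 20 to 6, exactly as in the two
result sets above.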
I hope I am not so naive that Kai and I are the only two who see a
problem with this behavior. Some of you may think this is a "feature" but
I do not like it, because it means that calculation errors can occur
simply by inverting column positions (putting one before the other)
within a statement. If anyone has links to any pages on database theory
that explain why it is proper to serialize the calculations WITHIN a
single SQL statement, I need to read them, because I have been operating
under some mistaken beliefs for a long time.
I have also long thought that:
UPDATE table1 LEFT JOIN table2 ...
would be OPERATIONALLY equivalent to:
UPDATE table2 RIGHT JOIN table1 ...
However, with this calculation serialization behavior, I fear it won't be.
Now, I get worried because I think I read (but can't seem to find) a
comment in the code or on the internals list that mentioned converting
RIGHT joins to LEFT joins internally as a means of optimizing a query
plan's evaluation tree. If that happens and you coded a RIGHT join
(because you wanted a particular order of formula execution) then you
will also have to specify STRAIGHT_JOIN in order to override the
optimizer just to get your formulas to work consistently.
This is a real "house of cards" and I am a bit uncomfortable with the
situation. I am also surprised this hasn't come up more often. I will be
forwarding this to the internals list to get more eyes on the issue.
Hopefully one of them can explain why this is "correct" behavior, and I
can settle down and be more deliberate in how I construct my statements.
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine