"Jay Blanchard" <[EMAIL PROTECTED]> wrote on 12/30/2004
07:59:35 AM:
> [snip]
> "atomically" from my application's point of view means: I have a
> snapshot of my data (thanks to InnoDB it does not change invisibly
> during my session). I send a query to the database. I get the result /
> changed data. Everything between sending the query and receiving the
> result I expect to be handled at once. I know MySQL cannot handle the
> query as one big block behind the scenes, but I wonder why I / my
> application has to know about these internals in this case.
>
> > UPDATE statements work with the JOIN order and update each table in
> > succession. Dependent on the number of rows affected you may see both
> > tables locked (transactionally) but note a delay in the update of
> > table1.
>
> My application does not see this and that's my point. In the case of
> this query I have to know that there is more than the data before the
> query and the data after the query. There is some intermediate state
> that I must be aware of.
> [/snip]
>
> Since I am not aware of your application language I can only speak to
> the C-style languages, including scripting languages like PHP. Any of
> these will see the query as the sum of its parts, not as individual
> actions. If you need to see those things you will have to write separate
> queries for each event you wish to see.
>
Jay,
I think my problem with the behavior you found documented (not in the
MySQL docs, but in Paul's book?) is exactly that: an UPDATE doesn't work
on the ENTIRE JOIN as a whole but on each table individually.
Here is a repost of Kai's test case:
CREATE TABLE table1 (
  ID int(10) unsigned NOT NULL default '0',
  condition int(10) unsigned NOT NULL default '0'
) TYPE=InnoDB;

INSERT INTO table1 VALUES (1,7);
INSERT INTO table1 VALUES (2,8);

CREATE TABLE table2 (
  ID int(10) unsigned NOT NULL default '0',
  ID_table1 int(10) unsigned NOT NULL default '0',
  value int(10) unsigned NOT NULL default '0'
) TYPE=InnoDB;

INSERT INTO table2 VALUES (1,1,5);
INSERT INTO table2 VALUES (2,2,2);
If I do:
SELECT * from table1 INNER JOIN table2 on table1.ID= table2.ID_table1;
I get:
+----+-----------+----+-----------+-------+
| ID | condition | ID | ID_table1 | value |
+----+-----------+----+-----------+-------+
|  1 |         7 |  1 |         1 |     5 |
|  2 |         8 |  2 |         2 |     2 |
+----+-----------+----+-----------+-------+
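For what it's worth, the SELECT itself is uncontroversial; the result
above can be reproduced with Python's sqlite3 module standing in for
MySQL (a sketch only; TYPE=InnoDB has no SQLite equivalent, so plain
tables are used):

```python
import sqlite3

# Build Kai's test tables in an in-memory SQLite database
# (SQLite standing in for MySQL; the InnoDB TYPE clause is dropped).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE table1 (ID INTEGER NOT NULL DEFAULT 0,
                         condition INTEGER NOT NULL DEFAULT 0);
    INSERT INTO table1 VALUES (1, 7);
    INSERT INTO table1 VALUES (2, 8);
    CREATE TABLE table2 (ID INTEGER NOT NULL DEFAULT 0,
                         ID_table1 INTEGER NOT NULL DEFAULT 0,
                         value INTEGER NOT NULL DEFAULT 0);
    INSERT INTO table2 VALUES (1, 1, 5);
    INSERT INTO table2 VALUES (2, 2, 2);
""")

rows = cur.execute(
    "SELECT * FROM table1 INNER JOIN table2 "
    "ON table1.ID = table2.ID_table1"
).fetchall()
print(rows)  # [(1, 7, 1, 1, 5), (2, 8, 2, 2, 2)]
```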
Which is what I expect. However, when you do an UPDATE on that same JOIN,
the engine breaks the JOIN back into its individual pieces and then
serializes the changes between those pieces.

UPDATE table1
INNER JOIN table2 ON table2.ID_table1=table1.ID
SET table2.value=table2.value-IF(table1.condition=7,1,0),
    table1.condition=8
WHERE table1.ID IN (1,2)

fails to update table2.value because by the time it gets to that column
(as part of the internal JOIN table set), table1.condition is already 8
(even though that change was listed AFTER the conditional IF() in the SET
clause).
This worked:

UPDATE table1
LEFT JOIN table2 ON table2.ID_table1=table1.ID
    AND table1.condition=7
SET table2.value=table2.value-1,
    table1.condition=8
WHERE table1.ID IN (1,2);

and so did:

UPDATE table2
INNER JOIN table1 ON table2.ID_table1=table1.ID
SET table2.value=table2.value-IF(table1.condition=7,1,0),
    table1.condition=8
WHERE table1.ID IN (1,2);
I think that's improper behavior. I agree with Kai: the changes should
not be applied serially but in parallel, based on the initial values of
the JOINed tables.
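To make the difference concrete, here is a rough model in plain Python
(my own illustration, not MySQL internals): run_update applies the two
SET clauses one table at a time in a given order, while the "parallel"
version reads every input value as of the start of the statement:

```python
# Rows from Kai's test case: table1.ID 1 has condition 7, and the
# matching table2 row has value 5.
table1 = {"condition": 7}
table2 = {"value": 5}

def run_update(order):
    """Apply the two SET clauses table by table, in the given order,
    mimicking serialized evaluation (illustrative model only)."""
    t1, t2 = dict(table1), dict(table2)
    for table in order:
        if table == "table1":
            t1["condition"] = 8
        else:
            # table2's SET reads table1.condition *as it is now*
            t2["value"] -= 1 if t1["condition"] == 7 else 0
    return t1, t2

# table1 updated first: the IF() already sees condition = 8, no decrement
_, t2 = run_update(["table1", "table2"])
print(t2["value"])  # 5  (the failing UPDATE table1 INNER JOIN table2 ...)

# table2 updated first: the IF() still sees condition = 7, so decrement
_, t2 = run_update(["table2", "table1"])
print(t2["value"])  # 4  (the working UPDATE table2 INNER JOIN table1 ...)

# Snapshot ("parallel") semantics: every expression reads the values
# as they were at statement start, regardless of table order.
snap = dict(table1)
t2_parallel = {"value": table2["value"] - (1 if snap["condition"] == 7 else 0)}
print(t2_parallel["value"])  # 4
```

Under snapshot semantics the answer is 4 no matter which table comes
first in the JOIN, which is the behavior Kai and I expected.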
I also disagree with this behavior:
SET @testvalue=5;
Select @testvalue:=19, @testvalue+1 as bump;
+----------------+------+
| @testvalue:=19 | bump |
+----------------+------+
|             19 |   20 |
+----------------+------+
On entry to the SELECT statement, @testvalue has a value of 5; on exit it
has the value 19. The value of "bump" SHOULD be 6, because that is one
more than the value @testvalue had at the beginning of the
statement/transaction, but it isn't. This means that calculations are
serialized in the order they are listed in the SELECT statement, and that
the results of both SELECT and UPDATE statements are sensitive to column
order.
SET @testvalue=5;
Select @testvalue+1 as bump, @testvalue:=19;
+------+----------------+
| bump | @testvalue:=19 |
+------+----------------+
|    6 |             19 |
+------+----------------+
(not the same results, but I only changed the column order)
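The same left-to-right rule can be modeled outside SQL. In this sketch
(plain Python, purely illustrative), each select-list expression is a
function evaluated in order against a shared, mutable variable store, and
it reproduces both result sets:

```python
def eval_select(exprs, variables):
    """Evaluate select-list expressions left to right against a shared,
    mutable variable store -- a model of how MySQL appears to handle
    @variables (illustrative only)."""
    return [expr(variables) for expr in exprs]

# Column order 1: @testvalue:=19 first, then @testvalue+1 as bump
vars1 = {"testvalue": 5}
row1 = eval_select(
    [lambda v: v.__setitem__("testvalue", 19) or v["testvalue"],  # @testvalue:=19
     lambda v: v["testvalue"] + 1],                               # bump
    vars1,
)
print(row1)  # [19, 20]

# Column order 2: @testvalue+1 as bump first, then @testvalue:=19
vars2 = {"testvalue": 5}
row2 = eval_select(
    [lambda v: v["testvalue"] + 1,                                # bump
     lambda v: v.__setitem__("testvalue", 19) or v["testvalue"]], # @testvalue:=19
    vars2,
)
print(row2)  # [6, 19]
```

Because the second expression reads whatever the first one left behind,
swapping the columns changes "bump" from 20 to 6, exactly as in the two
result sets above.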
I hope I am not so naive that Kai and I are the only two who see a
problem with this behavior. Some of you may think this is a "feature" but
I do not like it, because it means that calculation errors can occur
simply by inverting column positions (putting one before the other)
within a statement. If anyone has links to any pages on database theory
that explain why it is proper to serialize the calculations WITHIN a
single SQL statement, I need to read them, because I have been operating
under some mistaken beliefs for a long time.
I have also long thought that:
UPDATE table1 LEFT JOIN table2 ...
would be OPERATIONALLY equivalent to:
UPDATE table2 RIGHT JOIN table1 ...
However, with this calculation serialization behavior, I fear it won't be.
Now, I get worried because I think I read (but can't seem to find) a
comment in the code or on the internals list that mentioned converting
RIGHT joins to LEFT joins internally as a means of optimizing a query
plan's evaluation tree. If that happens and you coded a RIGHT join
(because you wanted a particular order of formula execution) then you
will also have to specify STRAIGHT_JOIN in order to override the
optimizer just to get your formulas to work consistently.
This is a real "house of cards" and I am a bit uncomfortable with the
situation. I am also surprised this hasn't come up more often. I will be
forwarding this to the internals list to get more eyes on the issue.
Hopefully one of them can explain why this is "correct" behavior, and I
can settle down and be more deliberate in how I construct my statements.
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine