"Jay Blanchard" <[EMAIL PROTECTED]> wrote on 12/30/2004 07:59:35 AM:
> [snip]
> "atomically" from my application's point of view means: I have a
> snapshot of my data (due to InnoDB this does not change hiddenly
> during my session). I send a query to the database. I get the
> result / changed data. Everything between sending the query and
> receiving the result I expect to be handled at once. I know MySQL
> cannot handle the query as one big block behind the doors, but I
> wonder why I / my application has to know about these internals in
> this case.
>
> > UPDATE statements work with the JOIN order and update each table
> > in succession. Dependent on the number of rows affected you may
> > see both tables locked (transactionally) but note a delay in the
> > update of table1.
>
> My application does not see this, and that's my point. In the case
> of this query I have to know that there is more than the data before
> the query and the data after the query. There is some intermediate
> state that I must be aware of.
> [/snip]
>
> Since I am not aware of your application language I can only speak
> to the C-style languages, including scripting languages like PHP.
> Any of these will see the query as the sum of its parts, not as
> individual actions. If you need to see those things you will have to
> write separate queries for each event you wish to see.

Jay, I think my problem with the behavior you found documentation for
(not in the MySQL docs, but in Paul's book?) is that it documents that
an UPDATE doesn't work on the ENTIRE JOIN as a whole but on each table
individually.
Here is a repost of Kai's test case:

CREATE TABLE table1 (
  ID int(10) unsigned NOT NULL default '0',
  condition int(10) unsigned NOT NULL default '0'
) TYPE=InnoDB;

INSERT INTO table1 VALUES (1,7);
INSERT INTO table1 VALUES (2,8);

CREATE TABLE table2 (
  ID int(10) unsigned NOT NULL default '0',
  ID_table1 int(10) unsigned NOT NULL default '0',
  value int(10) unsigned NOT NULL default '0'
) TYPE=InnoDB;

INSERT INTO table2 VALUES (1,1,5);
INSERT INTO table2 VALUES (2,2,2);

If I do:

SELECT * FROM table1 INNER JOIN table2 ON table1.ID = table2.ID_table1;

I get:

+----+-----------+----+-----------+-------+
| ID | condition | ID | ID_table1 | value |
+----+-----------+----+-----------+-------+
|  1 |         7 |  1 |         1 |     5 |
|  2 |         8 |  2 |         2 |     2 |
+----+-----------+----+-----------+-------+

which is what I expect. However, when you do an UPDATE on that same
JOIN, the engine breaks the JOIN back into its individual pieces and
then serializes the changes between those pieces. This statement:

UPDATE table1
INNER JOIN table2 ON table2.ID_table1 = table1.ID
SET table2.value = table2.value - IF(table1.condition = 7, 1, 0),
    table1.condition = 8
WHERE table1.ID IN (1,2);

fails to update table2.value because by the time the engine gets to
that column (as part of the internal JOIN table set), table1.condition
is already 8 (even though that change was listed AFTER the conditional
IF() in the SET clause).

This worked:

UPDATE table1
LEFT JOIN table2 ON table2.ID_table1 = table1.ID
    AND table1.condition = 7
SET table2.value = table2.value - 1,
    table1.condition = 8
WHERE table1.ID IN (1,2);

and so did:

UPDATE table2
INNER JOIN table1 ON table2.ID_table1 = table1.ID
SET table2.value = table2.value - IF(table1.condition = 7, 1, 0),
    table1.condition = 8
WHERE table1.ID IN (1,2);

I think that's improper behavior. I agree with Kai: the changes should
not occur serially but in parallel, based on the initial values of the
JOINed tables.
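To make the two semantics concrete, here is a minimal sketch in Python
(an assumed model of the behavior described above, not MySQL's actual
implementation). The tables are plain dicts mirroring Kai's test data;
"serialized" updates one table of the join after the other, while
"parallel" evaluates every SET expression against a snapshot taken
before the statement ran:

```python
import copy

def fresh_tables():
    # Kai's test data: table1(ID, condition), table2(ID, ID_table1, value)
    table1 = {1: {"condition": 7}, 2: {"condition": 8}}
    table2 = {1: {"ID_table1": 1, "value": 5},
              2: {"ID_table1": 2, "value": 2}}
    return table1, table2

def update_serialized():
    # Tables are updated one after the other in join order, so the
    # table2 expression sees table1.condition AFTER it became 8.
    t1, t2 = fresh_tables()
    for row in t1.values():
        row["condition"] = 8
    for row in t2.values():
        if t1[row["ID_table1"]]["condition"] == 7:  # never true now
            row["value"] -= 1
    return t1, t2

def update_parallel():
    # Every SET expression is evaluated against a snapshot of the
    # data as it stood before the statement began.
    t1, t2 = fresh_tables()
    snap = copy.deepcopy(t1)
    for row in t1.values():
        row["condition"] = 8
    for row in t2.values():
        if snap[row["ID_table1"]]["condition"] == 7:  # true for row 1
            row["value"] -= 1
    return t1, t2

_, t2_serial = update_serialized()
_, t2_parallel = update_parallel()
print([r["value"] for r in t2_serial.values()])    # [5, 2] -- no decrement
print([r["value"] for r in t2_parallel.values()])  # [4, 2] -- row 1 decremented
```

Under the serialized model, table2.value never changes, which matches
the failing UPDATE above; under the snapshot model it would.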
I also disagree with this behavior:

SET @testvalue=5;
SELECT @testvalue:=19, @testvalue+1 AS bump;

+----------------+------+
| @testvalue:=19 | bump |
+----------------+------+
|             19 |   20 |
+----------------+------+

On entry to the SELECT statement, @testvalue has a value of 5; on exit
it will have the value 19. The value of "bump" SHOULD be 6, because
that is one more than the value @testvalue had at the beginning of the
statement/transaction, but it isn't. This means that calculations are
serialized in the order they are listed in the SELECT statement, and
that the results of both SELECT and UPDATE statements depend on column
order:

SET @testvalue=5;
SELECT @testvalue+1 AS bump, @testvalue:=19;

+------+----------------+
| bump | @testvalue:=19 |
+------+----------------+
|    6 |             19 |
+------+----------------+

(not the same results, but I only changed the column order)

I hope I am not so naive that Kai and I are the only two who see a
problem with this behavior. Some of you may think this is a "feature",
but I do not like it, because it means that calculation errors will
occur simply by inverting column positions (putting one before the
other) within a statement. If anyone has links to any pages on database
theory that explain why it is proper to serialize the calculations
WITHIN a single SQL statement, I need to read them, because I have been
operating under some mistaken beliefs for a long time.

I have also long thought that:

UPDATE table1 LEFT JOIN table2 ...

would be OPERATIONALLY equivalent to:

UPDATE table2 RIGHT JOIN table1 ...

However, with this calculation serialization behavior, I fear it won't
be. Now I get worried, because I think I read (but can't seem to find)
a comment in the code or on the internals list that mentioned
converting RIGHT joins to LEFT joins internally as a means of
optimizing a query plan's evaluation tree.
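The user-variable surprise can be sketched the same way. Below is a
small Python model (again an assumption about the mechanism, not MySQL
source) in which each select-list expression is evaluated left to
right against a mutable variable store, so an assignment is visible to
every expression to its right:

```python
def evaluate_select(exprs, variables):
    """Evaluate expression callables left to right, sharing one
    mutable variable store -- the left-to-right, side-effecting
    model described above."""
    return [expr(variables) for expr in exprs]

def assign_19(vars):   # models  @testvalue := 19
    vars["testvalue"] = 19
    return vars["testvalue"]

def bump(vars):        # models  @testvalue + 1
    return vars["testvalue"] + 1

# SELECT @testvalue:=19, @testvalue+1   (assignment first)
print(evaluate_select([assign_19, bump], {"testvalue": 5}))  # [19, 20]

# SELECT @testvalue+1, @testvalue:=19   (assignment last)
print(evaluate_select([bump, assign_19], {"testvalue": 5}))  # [6, 19]
```

Swapping the two callables reproduces the two result tables above:
the "bump" column reads whatever value the variable holds at the
moment that column is reached, not a pre-statement snapshot.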
If that happens and you coded a RIGHT join (because you wanted a
particular order of formula execution), then you will also have to
specify STRAIGHT_JOIN to override the optimizer just to get your
formulas to evaluate consistently. This is a real "house of cards" and
I am a bit uncomfortable with the situation. I am also surprised this
hasn't come up more often.

I will be forwarding this to the internals list to get more eyes on
the issue. Hopefully one of them can explain why this is "correct"
behavior and I can settle down and just be more deliberate in how I
construct my statements.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine