Re: Using <=> in WHERE vs HAVING clause
[EMAIL PROTECTED] wrote: > > > This is what I wound up going with: > > > > > > SELECT b.id, > > > if(a.a <=> b.a, NULL, b.a), > > > if(a.b <=> b.b, NULL, b.b), > > > if(a.c <=> b.c, NULL, b.c), > > > (NOT (a.a <=> b.a) AND > > > (a.b <=> b.b) AND > > > (a.c <=> b.c)) as DiffTest > > > FROM testC as c > > > LEFT JOIN testB as b ON c.id = b.id > > > LEFT JOIN testA as a ON c.id = a.id > > > HAVING DiffTest = 1 > > > > > > Your query runs twice as fast as this one, but all of the > > > if()'s in the SELECT portion allow me to determine which > > > columns were different between the two tables, which is > > > important for the next step of the process that I'm > > > working with. > > > ...snip... OK, That's a different assumption than I was working from before. I thought you wanted to flag the whole row as being different if any field between them was different or if the row was new. But, then, you said you needed to detect the diffs. I guess I need to slow down and read before I respond next time ;-) Naw, you were working with the info that I'd given. My boiled down test case was a bit too boiled down, removed some of the real info that I needed. I need to both flag the row as changed and figure out what column changed. My original query was intended to figure out if I'd found a bug or not. I've filed it in http://bugs.mysql.com waiting to see what happens next. H. After re-reviewing your earlier post. I wonder if we don't need to move the NOT to the outside of the parentheses You see, I am not completely sure of the order of operations... Will the ANDs happen first then the NOT or will the engine NOT the first term then AND the NOT-ed results with the other terms I am just not sure. If we write it this way, there can be no doubt: You're right, typo on my part. The query should read: SELECT b.id, if(a.a <=> b.a, NULL, b.a), if(a.b <=> b.b, NULL, b.b), if(a.c <=> b.c, NULL, b.c), (NOT (a.a <=> b.a AND (a.b <=> b.b AND (a.c <=> b.c)) as DiffTest FROM testC as c LEFT JOIN testB as b ON c.id = b.id LEFT JOIN testA as a ON c.id = a.id HAVING DiffTest = 1 Well, I am stumped trying to make it any faster. Maybe changing the first LEFT JOIN to an INNER JOIN, or caching the results of the first JOIN in a temp table (splitting up the query into two steps). Other than that, I think we have it as good as we are going to get it. Yep, it's running smoothly now. I'd rather do the comparison in the WHERE clause as it feels cleaner, but this works. Know what? I just realized we should have been CC:-ing the list this whole time. That way everyone else would have been able to contribute and learn, too. It's a hard call to make, between pestering uninterested parties and teaching those who never see the intermediate steps in figuring out a problem. Posting the answers is always a good thing though. Rene -- Rene Churchill Specializing in Internet Databases, Astute Computing, LLC. Custom Programming and Phone: 802-244-5151 Webmaster support Fax: 802-244-5512 http://www.astutecomputing.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Using <=> in WHERE vs HAVING clause
Rene Churchill <[EMAIL PROTECTED]> wrote on 02/22/2005 04:23:47 PM: > [EMAIL PROTECTED] wrote: > > Rene Churchill <[EMAIL PROTECTED]> wrote on 02/22/2005 03:39:05 PM: > > > Hi Shawn, > > > > > > This is what I wound up going with: > > > > > > SELECT b.id, > > > if(a.a <=> b.a, NULL, b.a), > > > if(a.b <=> b.b, NULL, b.b), > > > if(a.c <=> b.c, NULL, b.c), > > > (NOT (a.a <=> b.a) AND > > > (a.b <=> b.b) AND > > > (a.c <=> b.c)) as DiffTest > > > FROM testC as c > > > LEFT JOIN testB as b ON c.id = b.id > > > LEFT JOIN testA as a ON c.id = a.id > > > HAVING DiffTest = 1 > > > > > > Your query runs twice as fast as this one, but all of the > > > if()'s in the SELECT portion allow me to determine which > > > columns were different between the two tables, which is > > > important for the next step of the process that I'm > > > working with. > > > > > > Thanks, > > >Rene > > > > > > > Why not do both? I only had the one term in my SELECT clause because I > > didn't know what other data you needed. Try this as a hybrid: > > > > SELECT b.id, > >if(a.a <=> b.a, NULL, b.a), > >if(a.b <=> b.b, NULL, b.b), > >if(a.c <=> b.c, NULL, b.c), > >ISNULL(a.id) as DiffTest > > FROM testC c > > INNER JOIN testB b > >on b.id = c.id > > LEFT JOIN testA a > >on a.id = b.id > >AND a.a <=> b.a > >AND a.b <=> b.b > >AND a.c <=> b.c > > > > That way you get both the speed of a direct comparison (and tighter > > joins) and the additional information you need for your next step. > > I don't believe that will work because if a.id is NULL then so > are a.a, a.b and a.c since the LEFT JOIN failed it's conditions. > That would flag all three columns as being different, even if > only one of them was. > > In the example data I gave, all of the rows were missing from > table testA, but that was just a boiled down test case to > demonstrate the problems I was having between the WHERE and > HAVING clauses. > >Rene > > -- > Rene Churchill Specializing in Internet Databases, > Astute Computing, LLC. Custom Programming and > Phone: 802-244-5151 Webmaster support > Fax: 802-244-5512 > http://www.astutecomputing.com OK, That's a different assumption than I was working from before. I thought you wanted to flag the whole row as being different if any field between them was different or if the row was new. But, then, you said you needed to detect the diffs. I guess I need to slow down and read before I respond next time ;-) H. After re-reviewing your earlier post. I wonder if we don't need to move the NOT to the outside of the parentheses You see, I am not completely sure of the order of operations... Will the ANDs happen first then the NOT or will the engine NOT the first term then AND the NOT-ed results with the other terms I am just not sure. If we write it this way, there can be no doubt: SELECT b.id, if(a.a <=> b.a, NULL, b.a), if(a.b <=> b.b, NULL, b.b), if(a.c <=> b.c, NULL, b.c), (NOT ((a.a <=> b.a) AND (a.b <=> b.b) AND (a.c <=> b.c))) as DiffTest FROM testC as c LEFT JOIN testB as b ON c.id = b.id LEFT JOIN testA as a ON c.id = a.id HAVING DiffTest = 1 Well, I am stumped trying to make it any faster. Maybe changing the first LEFT JOIN to an INNER JOIN, or caching the results of the first JOIN in a temp table (splitting up the query into two steps). Other than that, I think we have it as good as we are going to get it. Know what? I just realized we should have been CC:-ing the list this whole time. That way everyone else would have been able to contribute and learn, too. Shawn Green Database Administrator Unimin Corporation - Spruce Pine
Re: Using <=> in WHERE vs HAVING clause
Rene Churchill <[EMAIL PROTECTED]> wrote on 02/22/2005 09:21:29 AM: > Good evening folks, I'm seeing some odd behavior in MySQL 4.0.21 > running on Mac OS X 10.3.7 > > I'm trying to compare two identical tables and find the rows > that are new/modified. I can't use a timestamp column because > the "new" table is constantly regenerated. So I'm using a large > WHERE clause and the <=> operator to detect changes. (BTW, it > would be very nice if there was a NULL safe not equal operator) > > I get empty sets returned when I use <=> in the WHERE clause, > which seems wrong to me. However if I switch to a HAVING clause, > it works as expected. Is this a bug or am I doing something > funky here? > > Note, the testC table is used because in full query, the testA > and testB tables will have 100k+ rows and I need to effeciently > narrow the scope down to the ~250 rows that I'm interested in. > > Thanks, > Rene > > > Test Script: > > create table testA (id int, a int, b int, c int); > create table testB (id int, a int, b int, c int); > create table testC (id int); > insert into testB values(1,1,1,1),(2,2,2,2),(3,3,3,3),(4,4,4,4); > insert into testC values(2),(3),(4); > > > WHERE query: > > select testA.a, testB.a, testA.b, testB.b, testA.c, testB.c > from testC > left join testA on testC.id = testA.id > left join testB on testC.id = testB.id > where NOT ( testA.a <=> testB.a AND > testA.b <=> testB.b AND > testA.c <=> testB.c); > Empty set (0.00 sec) > > > HAVING query: > > select testA.a, testB.a, testA.b, testB.b, testA.c, testB.c > from testC > left join testA on testC.id = testA.id > left join testB on testC.id = testB.id > having NOT ( testA.a <=> testB.a AND > testA.b <=> testB.b AND > testA.c <=> testB.c); > > +--+--+--+--+--+--+ > | a| a| b| b| c| c| > +--+--+--+--+--+--+ > | NULL |2 | NULL |2 | NULL |2 | > | NULL |3 | NULL |3 | NULL |3 | > | NULL |4 | NULL |4 | NULL |4 | > +--+--+--+--+--+--+ > 3 rows in set (0.00 sec) > > > > -- > Rene Churchill Specializing in Internet Databases, > Astute Computing, LLC. Custom Programming and > Phone: 802-244-5151 Webmaster support > Fax: 802-244-5512 > http://www.astutecomputing.com > Have you considered moving your match/don't match conditions to your ON clause? It's easy to detect non-matches if you search for a null value where one is not supposed to be (as in a PK or FK column) in the OUTER table of an OUTER JOIN. Assuming that testB is your "new" table and testA has your "old" values. I think that you want to put a list of IDs in testC (a good idea to minimize work) and determine which rows of testB (that match the IDs in testC) do not have matches in testA. Here is how I would write the query. I would look for the matches ("equals" computes much faster than "not equals") then detect where the matches didn't happen. SELECT B.ID FROM testC c INNER JOIN testB b on b.id = c.id LEFT JOIN testA a on a.id = b.id AND a.a <=> b.a AND a.b <=> b.b AND a.c <=> b.c WHERE a.id is null; Let me know how it works. Shawn Green Database Administrator Unimin Corporation - Spruce Pine
Using <=> in WHERE vs HAVING clause
Good evening folks, I'm seeing some odd behavior in MySQL 4.0.21 running on Mac OS X 10.3.7 I'm trying to compare two identical tables and find the rows that are new/modified. I can't use a timestamp column because the "new" table is constantly regenerated. So I'm using a large WHERE clause and the <=> operator to detect changes. (BTW, it would be very nice if there was a NULL safe not equal operator) I get empty sets returned when I use <=> in the WHERE clause, which seems wrong to me. However if I switch to a HAVING clause, it works as expected. Is this a bug or am I doing something funky here? Note, the testC table is used because in full query, the testA and testB tables will have 100k+ rows and I need to effeciently narrow the scope down to the ~250 rows that I'm interested in. Thanks, Rene Test Script: create table testA (id int, a int, b int, c int); create table testB (id int, a int, b int, c int); create table testC (id int); insert into testB values(1,1,1,1),(2,2,2,2),(3,3,3,3),(4,4,4,4); insert into testC values(2),(3),(4); WHERE query: select testA.a, testB.a, testA.b, testB.b, testA.c, testB.c from testC left join testA on testC.id = testA.id left join testB on testC.id = testB.id where NOT ( testA.a <=> testB.a AND testA.b <=> testB.b AND testA.c <=> testB.c); Empty set (0.00 sec) HAVING query: select testA.a, testB.a, testA.b, testB.b, testA.c, testB.c from testC left join testA on testC.id = testA.id left join testB on testC.id = testB.id having NOT ( testA.a <=> testB.a AND testA.b <=> testB.b AND testA.c <=> testB.c); +--+--+--+--+--+--+ | a| a| b| b| c| c| +--+--+--+--+--+--+ | NULL |2 | NULL |2 | NULL |2 | | NULL |3 | NULL |3 | NULL |3 | | NULL |4 | NULL |4 | NULL |4 | +--+--+--+--+--+--+ 3 rows in set (0.00 sec) -- Rene Churchill Specializing in Internet Databases, Astute Computing, LLC. Custom Programming and Phone: 802-244-5151 Webmaster support Fax: 802-244-5512 http://www.astutecomputing.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Using <=> in WHERE vs HAVING clause
Good evening folks, I'm seeing some odd behavior in MySQL 4.0.21 running on Mac OS X 10.3.7 I'm trying to compare two identical tables and find the rows that are new/modified. I can't use a timestamp column because the "new" table is constantly regenerated. So I'm using a large WHERE clause and the <=> operator to detect changes. (BTW, it would be very nice if there was a NULL safe not equal operator) I get empty sets returned when I use <=> in the WHERE clause, which seems wrong to me. However if I switch to a HAVING clause, it works as expected. Is this a bug or am I doing something funky here? Note, the testC table is used because in full query, the testA and testB tables will have 100k+ rows and I need to effeciently narrow the scope down to the ~250 rows that I'm interested in. Thanks, Rene Test Script: create table testA (id int, a int, b int, c int); create table testB (id int, a int, b int, c int); create table testC (id int); insert into testB values(1,1,1,1),(2,2,2,2),(3,3,3,3),(4,4,4,4); insert into testC values(2),(3),(4); WHERE query: select testA.a, testB.a, testA.b, testB.b, testA.c, testB.c from testC left join testA on testC.id = testA.id left join testB on testC.id = testB.id where NOT ( testA.a <=> testB.a AND testA.b <=> testB.b AND testA.c <=> testB.c); Empty set (0.00 sec) HAVING query: select testA.a, testB.a, testA.b, testB.b, testA.c, testB.c from testC left join testA on testC.id = testA.id left join testB on testC.id = testB.id having NOT ( testA.a <=> testB.a AND testA.b <=> testB.b AND testA.c <=> testB.c); +--+--+--+--+--+--+ | a| a| b| b| c| c| +--+--+--+--+--+--+ | NULL |2 | NULL |2 | NULL |2 | | NULL |3 | NULL |3 | NULL |3 | | NULL |4 | NULL |4 | NULL |4 | +--+--+--+--+--+--+ 3 rows in set (0.00 sec) -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]