Re: Using <=> in WHERE vs HAVING clause

2005-02-22 Thread Rene Churchill
[EMAIL PROTECTED] wrote:
 > >  > This is what I wound up going with:
 > >  >
 > >  > SELECT b.id,
 > >  > if(a.a <=> b.a, NULL, b.a),
 > >  > if(a.b <=> b.b, NULL, b.b),
 > >  > if(a.c <=> b.c, NULL, b.c),
 > >  > (NOT (a.a <=> b.a) AND
 > >  >  (a.b <=> b.b) AND
 > >  >  (a.c <=> b.c)) as DiffTest
 > >  > FROM testC as c
 > >  > LEFT JOIN testB as b ON c.id = b.id
 > >  > LEFT JOIN testA as a ON c.id = a.id
 > >  > HAVING DiffTest = 1
 > >  >
 > >  > Your query runs twice as fast as this one, but all of the
 > >  > if()'s in the SELECT portion allow me to determine which
 > >  > columns were different between the two tables, which is
 > >  > important for the next step of the process that I'm
 > >  > working with.
 > >  >
...snip...
OK, That's a different assumption than I was working from before. I 
thought you wanted to flag the whole row as being different if any field 
between them was different or if the row was new. But, then, you said 
you needed to detect the diffs. I guess I need to slow down and read 
before I respond next time ;-)
Naw, you were working with the info that I'd given.  My boiled
down test case was a bit too boiled down, removed some of the
real info that I needed.  I need to both flag the row as changed
and figure out what column changed.
My original query was intended to figure out if I'd found a bug
or not.  I've filed it in http://bugs.mysql.com  waiting to see
what happens next.
H. After re-reviewing your earlier post. I wonder if we don't 
need to move the NOT to the outside of the parentheses You see, I am 
not completely sure of the order of operations... Will the ANDs happen 
first then the NOT or will the engine NOT the first term then AND the 
NOT-ed results with the other terms I am just not sure. If we write 
it this way, there can be no doubt:
You're right, typo on my part.  The query should read:
SELECT b.id,
   if(a.a <=> b.a, NULL, b.a),
   if(a.b <=> b.b, NULL, b.b),
   if(a.c <=> b.c, NULL, b.c),
 (NOT (a.a <=> b.a AND
  (a.b <=> b.b AND
  (a.c <=> b.c)) as DiffTest
FROM testC as c
LEFT JOIN testB as b ON c.id = b.id
LEFT JOIN testA as a ON c.id = a.id
HAVING DiffTest = 1
Well, I am stumped trying to make it any faster. Maybe changing the 
first LEFT JOIN to an INNER JOIN, or caching the results of the first 
JOIN in a temp table  (splitting up the query into two steps). Other 
than that, I think we have it as good as we are going to get it.
Yep, it's running smoothly now.  I'd rather do the comparison
in the WHERE clause as it feels cleaner, but this works.
Know what? I just realized we should have been CC:-ing the list this 
whole time. That way everyone else would have been able to contribute 
and learn, too.
It's a hard call to make, between pestering uninterested parties
and teaching those who never see the intermediate steps in figuring
out a problem.  Posting the answers is always a good thing though.
Rene
--
Rene Churchill   Specializing in Internet Databases,
Astute Computing, LLC.   Custom Programming and
Phone: 802-244-5151  Webmaster support
Fax: 802-244-5512
http://www.astutecomputing.com
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: Using <=> in WHERE vs HAVING clause

2005-02-22 Thread SGreen
Rene Churchill <[EMAIL PROTECTED]> wrote on 02/22/2005 04:23:47 PM:

> [EMAIL PROTECTED] wrote:
> > Rene Churchill <[EMAIL PROTECTED]> wrote on 02/22/2005 03:39:05 
PM:
> >  > Hi Shawn,
> >  >
> >  > This is what I wound up going with:
> >  >
> >  > SELECT b.id,
> >  > if(a.a <=> b.a, NULL, b.a),
> >  > if(a.b <=> b.b, NULL, b.b),
> >  > if(a.c <=> b.c, NULL, b.c),
> >  > (NOT (a.a <=> b.a) AND
> >  >  (a.b <=> b.b) AND
> >  >  (a.c <=> b.c)) as DiffTest
> >  > FROM testC as c
> >  > LEFT JOIN testB as b ON c.id = b.id
> >  > LEFT JOIN testA as a ON c.id = a.id
> >  > HAVING DiffTest = 1
> >  >
> >  > Your query runs twice as fast as this one, but all of the
> >  > if()'s in the SELECT portion allow me to determine which
> >  > columns were different between the two tables, which is
> >  > important for the next step of the process that I'm
> >  > working with.
> >  >
> >  > Thanks,
> >  >Rene
> >  >
> > 
> > Why not do both?  I only had the one term in my SELECT clause because 
I 
> > didn't know what other data you needed. Try this as a hybrid:
> > 
> > SELECT b.id,
> >if(a.a <=> b.a, NULL, b.a),
> >if(a.b <=> b.b, NULL, b.b),
> >if(a.c <=> b.c, NULL, b.c),
> >ISNULL(a.id) as DiffTest
> > FROM testC c
> > INNER JOIN testB b
> >on b.id = c.id
> > LEFT JOIN testA a
> >on a.id = b.id
> >AND a.a <=> b.a
> >AND a.b <=> b.b
> >AND a.c <=> b.c
> > 
> > That way you get both the speed of a direct comparison (and tighter 
> > joins) and the additional information you need for your next step. 
> 
> I don't believe that will work because if a.id is NULL then so
> are a.a, a.b and a.c since the LEFT JOIN failed it's conditions.
> That would flag all three columns as being different, even if
> only one of them was.
> 
> In the example data I gave, all of the rows were missing from
> table testA, but that was just a boiled down test case to
> demonstrate the problems I was having between the WHERE and
> HAVING clauses.
> 
>Rene
> 
> -- 
> Rene Churchill   Specializing in Internet Databases,
> Astute Computing, LLC.   Custom Programming and
> Phone: 802-244-5151  Webmaster support
> Fax: 802-244-5512
> http://www.astutecomputing.com

OK, That's a different assumption than I was working from before. I 
thought you wanted to flag the whole row as being different if any field 
between them was different or if the row was new. But, then, you said you 
needed to detect the diffs. I guess I need to slow down and read before I 
respond next time ;-)

H. After re-reviewing your earlier post. I wonder if we don't need 
to move the NOT to the outside of the parentheses You see, I am not 
completely sure of the order of operations... Will the ANDs happen first 
then the NOT or will the engine NOT the first term then AND the NOT-ed 
results with the other terms I am just not sure. If we write it this 
way, there can be no doubt:

SELECT b.id,
if(a.a <=> b.a, NULL, b.a),
if(a.b <=> b.b, NULL, b.b),
if(a.c <=> b.c, NULL, b.c),
(NOT ((a.a <=> b.a) AND
(a.b <=> b.b) AND
(a.c <=> b.c))) as DiffTest
FROM testC as c
LEFT JOIN testB as b ON c.id = b.id
LEFT JOIN testA as a ON c.id = a.id
HAVING DiffTest = 1

Well, I am stumped trying to make it any faster. Maybe changing the first 
LEFT JOIN to an INNER JOIN, or caching the results of the first JOIN in a 
temp table  (splitting up the query into two steps). Other than that, I 
think we have it as good as we are going to get it.

Know what? I just realized we should have been CC:-ing the list this whole 
time. That way everyone else would have been able to contribute and learn, 
too. 

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine


Re: Using <=> in WHERE vs HAVING clause

2005-02-22 Thread SGreen
Rene Churchill <[EMAIL PROTECTED]> wrote on 02/22/2005 09:21:29 AM:

> Good evening folks, I'm seeing some odd behavior in MySQL 4.0.21
> running on Mac OS X 10.3.7
> 
> I'm trying to compare two identical tables and find the rows
> that are new/modified.  I can't use a timestamp column because
> the "new" table is constantly regenerated.  So I'm using a large
> WHERE clause and the <=> operator to detect changes.  (BTW, it
> would be very nice if there was a NULL safe not equal operator)
> 
> I get empty sets returned when I use <=> in the WHERE clause,
> which seems wrong to me.  However if I switch to a HAVING clause,
> it works as expected.  Is this a bug or am I doing something
> funky here?
> 
> Note, the testC table is used because in full query, the testA
> and testB tables will have 100k+ rows and I need to effeciently
> narrow the scope down to the ~250 rows that I'm interested in.
> 
> Thanks,
>  Rene
> 
> 
> Test Script:
> 
> create table testA (id int, a int, b int, c int);
> create table testB (id int, a int, b int, c int);
> create table testC (id int);
> insert into testB values(1,1,1,1),(2,2,2,2),(3,3,3,3),(4,4,4,4);
> insert into testC values(2),(3),(4);
> 
> 
> WHERE query:
> 
>  select testA.a, testB.a, testA.b, testB.b, testA.c, testB.c
>   from testC
> left join testA on testC.id = testA.id
> left join testB on testC.id = testB.id
>  where NOT ( testA.a <=> testB.a AND
>  testA.b <=> testB.b AND
>  testA.c <=> testB.c);
> Empty set (0.00 sec)
> 
> 
> HAVING query:
> 
> select testA.a, testB.a, testA.b, testB.b, testA.c, testB.c
>   from testC
> left join testA on testC.id = testA.id
> left join testB on testC.id = testB.id
> having NOT ( testA.a <=> testB.a AND
>  testA.b <=> testB.b AND
>  testA.c <=> testB.c);
> 
> +--+--+--+--+--+--+
> | a| a| b| b| c| c|
> +--+--+--+--+--+--+
> | NULL |2 | NULL |2 | NULL |2 |
> | NULL |3 | NULL |3 | NULL |3 |
> | NULL |4 | NULL |4 | NULL |4 |
> +--+--+--+--+--+--+
> 3 rows in set (0.00 sec)
> 
> 
> 
> -- 
> Rene Churchill   Specializing in Internet Databases,
> Astute Computing, LLC.   Custom Programming and
> Phone: 802-244-5151  Webmaster support
> Fax: 802-244-5512
> http://www.astutecomputing.com
> 

Have you considered moving your match/don't match conditions to your ON 
clause? It's easy to detect non-matches if you search for a null value 
where one is not supposed to be (as in a PK or FK column) in the OUTER 
table of an OUTER JOIN.

Assuming that testB is your "new" table and testA has your "old" values. I 
think that you want to put a list of IDs in testC (a good idea to minimize 
work) and determine which rows of testB (that match the IDs in testC) do 
not have matches in testA.

Here is how I would write the query. I would look for the matches 
("equals" computes much faster than "not equals") then detect where the 
matches didn't happen.

SELECT B.ID
FROM testC c
INNER JOIN testB b
on b.id = c.id
LEFT JOIN testA a
on a.id = b.id
AND a.a <=> b.a
AND a.b <=> b.b
AND a.c <=> b.c
WHERE a.id is null;


Let me know how it works.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine


Using <=> in WHERE vs HAVING clause

2005-02-22 Thread Rene Churchill
Good evening folks, I'm seeing some odd behavior in MySQL 4.0.21
running on Mac OS X 10.3.7
I'm trying to compare two identical tables and find the rows
that are new/modified.  I can't use a timestamp column because
the "new" table is constantly regenerated.  So I'm using a large
WHERE clause and the <=> operator to detect changes.  (BTW, it
would be very nice if there was a NULL safe not equal operator)
I get empty sets returned when I use <=> in the WHERE clause,
which seems wrong to me.  However if I switch to a HAVING clause,
it works as expected.  Is this a bug or am I doing something
funky here?
Note, the testC table is used because in full query, the testA
and testB tables will have 100k+ rows and I need to effeciently
narrow the scope down to the ~250 rows that I'm interested in.
Thanks,
Rene
Test Script:
create table testA (id int, a int, b int, c int);
create table testB (id int, a int, b int, c int);
create table testC (id int);
insert into testB values(1,1,1,1),(2,2,2,2),(3,3,3,3),(4,4,4,4);
insert into testC values(2),(3),(4);
WHERE query:
select testA.a, testB.a, testA.b, testB.b, testA.c, testB.c
 from testC
left join testA on testC.id = testA.id
left join testB on testC.id = testB.id
where NOT ( testA.a <=> testB.a AND
testA.b <=> testB.b AND
testA.c <=> testB.c);
Empty set (0.00 sec)
HAVING query:
   select testA.a, testB.a, testA.b, testB.b, testA.c, testB.c
 from testC
left join testA on testC.id = testA.id
left join testB on testC.id = testB.id
   having NOT ( testA.a <=> testB.a AND
testA.b <=> testB.b AND
testA.c <=> testB.c);
+--+--+--+--+--+--+
| a| a| b| b| c| c|
+--+--+--+--+--+--+
| NULL |2 | NULL |2 | NULL |2 |
| NULL |3 | NULL |3 | NULL |3 |
| NULL |4 | NULL |4 | NULL |4 |
+--+--+--+--+--+--+
3 rows in set (0.00 sec)

--
Rene Churchill   Specializing in Internet Databases,
Astute Computing, LLC.   Custom Programming and
Phone: 802-244-5151  Webmaster support
Fax: 802-244-5512
http://www.astutecomputing.com
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Using <=> in WHERE vs HAVING clause

2005-02-21 Thread Rene Churchill
Good evening folks, I'm seeing some odd behavior in MySQL 4.0.21
running on Mac OS X 10.3.7
I'm trying to compare two identical tables and find the rows
that are new/modified.  I can't use a timestamp column because
the "new" table is constantly regenerated.  So I'm using a large
WHERE clause and the <=> operator to detect changes.  (BTW, it
would be very nice if there was a NULL safe not equal operator)
I get empty sets returned when I use <=> in the WHERE clause,
which seems wrong to me.  However if I switch to a HAVING clause,
it works as expected.  Is this a bug or am I doing something
funky here?
Note, the testC table is used because in full query, the testA
and testB tables will have 100k+ rows and I need to effeciently
narrow the scope down to the ~250 rows that I'm interested in.
Thanks,
Rene
Test Script:
create table testA (id int, a int, b int, c int);
create table testB (id int, a int, b int, c int);
create table testC (id int);
insert into testB values(1,1,1,1),(2,2,2,2),(3,3,3,3),(4,4,4,4);
insert into testC values(2),(3),(4);
WHERE query:
select testA.a, testB.a, testA.b, testB.b, testA.c, testB.c
 from testC
left join testA on testC.id = testA.id
left join testB on testC.id = testB.id
where NOT ( testA.a <=> testB.a AND
testA.b <=> testB.b AND
testA.c <=> testB.c);
Empty set (0.00 sec)
HAVING query:
   select testA.a, testB.a, testA.b, testB.b, testA.c, testB.c
 from testC
left join testA on testC.id = testA.id
left join testB on testC.id = testB.id
   having NOT ( testA.a <=> testB.a AND
testA.b <=> testB.b AND
testA.c <=> testB.c);
+--+--+--+--+--+--+
| a| a| b| b| c| c|
+--+--+--+--+--+--+
| NULL |2 | NULL |2 | NULL |2 |
| NULL |3 | NULL |3 | NULL |3 |
| NULL |4 | NULL |4 | NULL |4 |
+--+--+--+--+--+--+
3 rows in set (0.00 sec)


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]