Re: Half Hour Sub-query in MySQL vs. 5 Seconds in VFP?

2009-06-22 Thread Matt Neimeyer
On Fri, Jun 19, 2009 at 11:27 AM, Brent Baisleybrentt...@gmail.com wrote:
 It sounds like you want to use spatial indexes, but they only became
 available in v4.1
 http://dev.mysql.com/doc/refman/5.0/en/create-index.html
 http://dev.mysql.com/doc/refman/5.0/en/using-a-spatial-index.html

That feels like the right thing (spatial calculations = spatial
indexes?) but I looked at the docs and my head exploded. Can anyone
recommend a good book that takes me through it gently?

That said I'm intreged by the MBRContains and the Polygon functions...
If I read those right I could create a simplified circle (probably
just an octogon) to help eliminate false positives in the corners
when using a plain square as the enclosure.

 You don't have to do any re-architecture to change you subquery to a join:
 SELECT custzip FROM customers
 JOIN
 (SELECT zip FROM
 zipcodes WHERE degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+
 cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude*60*1.1515
  5) AS zips
 ON custzip=zip

Will that work after a where clause? Multiple Times? For example...
(pseudo-code...)

SELECT * FROM customers WHERE saleslastyear  10
   JOIN (SELECT zip FROM etc) AS zips ON custzip=zip
   JOIN (SELECT MAX(date) FROM phonecalls) AS LastCalledOn ON custid=custid

Just from thinking about that... I assume that the only limitation is
that in a subselect you can do something like WHERE NOT IN (select
etc) but with a JOIN you are assuming a positive relationship? For
example using the JOIN methods above there isn't a way to simply do
AND custid NOT IN (SELECT custid FROM ordersplacedthisyear) other
than doing exactly that and adding this clause to the saleslastyear
clause. (In this particular case a column lastorderdate in customer
that was programatically updated on ordering would also be useful but
I'm thinking examples here... ;) )

I've never seen JOIN used outside of a traditional SELECT t1.*,t2.*
FROM table1 AS t1 LEFT JOIN table2 AS t2 ON t1.id=t2.id type of
structure so I kinda feel like I have a new toy...

Thanks!

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org



Re: Half Hour Sub-query in MySQL vs. 5 Seconds in VFP?

2009-06-19 Thread Dan Nelson
In the last episode (Jun 18), Matt Neimeyer said:
 I'm converting a PHP app from using Visual FoxPro as the database backend
 to using MySQL as the backend.  I'm testing on MySQL 4.1.22 on Mac OSX
 10.4.  The end application will be deployed cross platform and to both 4.x
 and 5.x MySQL servers.
 
 This query returned 21 records in .27 seconds:
 
 SELECT zip FROM zipcodes WHERE
 degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+
 cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude*60*1.1515
  5

Ouch.  You might want to calculate the rectange enclosing your target
distance, add an index on lat (and/or long), and add the rectangle check to
your where clause: WHERE latitude BETWEEN lt1 AND lt2 AND longitude BETWEEN
ln2 AND ln2.  That way mysql can use the index to pare down most of the rows
without having to call all those trig functions for every zipcode.
 
 This query returned 21442 records in 1.08 seconds:
 
SELECT custzip FROM customers
 
 This query is still running half an hour later, with a Time of 2167
 and a State of Sending Data (according to the mysql process list)
 
SELECT custzip FROM customers WHERE custzip IN (SELECT zip FROM
 zipcodes WHERE degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+
 cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude*60*1.1515
  5)
 
 When I try to EXPLAIN the query it gives me the following...
 
 id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
 1,PRIMARY,customers,index,NULL,cw_custzip,30,NULL,21226,Using where; Using 
 index
 2,DEPENDENT SUBQUERY,zipcodes,ALL,NULL,NULL,NULL,NULL,42144,Using where

Neither mysql 4 or 5 are very smart when it comes to subqueries.  Unless
your inner query is dead simple, mysql assumes it's a dependent subquery and
runs it once per row in your outer query.  You might want to try mysql 6 and
see if it does any better.  For example, here are explain plans for mysql 5
and 6 for the following query on the famous Oracle emp sample table:

select ename from emp where mgr in 
 (select empno from emp where ename in ('scott'));
+---+
| ename |
+---+
| ADAMS | 
+---+
1 row in set (0.00 sec)

Mysql 5.1.30:
+++---+-+---+-+-+--+--+-+
| id | select_type| table | type| possible_keys | key | 
key_len | ref  | rows | Extra   |
+++---+-+---+-+-+--+--+-+
|  1 | PRIMARY| emp   | ALL | NULL  | NULL| 
NULL| NULL |   14 | Using where |
|  2 | DEPENDENT SUBQUERY | emp   | unique_subquery | PRIMARY,ENAME | PRIMARY | 
4   | func |1 | Using where |
+++---+-+---+-+-+--+--+-+

Note that it didn't use an index on the outer query, and had to examine all
14 rows.  It even used the wrong index on the inner query :)

Mysql 6.0.11:
++-+---+--+---+---+-++--+---+
| id | select_type | table | type | possible_keys | key   | key_len | ref   
 | rows | Extra |
++-+---+--+---+---+-++--+---+
|  1 | PRIMARY | emp   | ref  | PRIMARY,ENAME | ENAME | 13  | const 
 |1 | Using index condition |
|  1 | PRIMARY | emp   | ref  | MGR   | MGR   | 5   | 
test.emp.EMPNO |2 |   |
++-+---+--+---+---+-++--+---+

Note that the queries have flipped and aren't nested anymore (id is 1 on
both queries).  The first query uses the ename index and estimates it will
return one row.  The second query uses the mgr index based on the empno
value returned by the first query and estimates it will return 2 rows.  Much
better :)
 
-- 
Dan Nelson
dnel...@allantgroup.com

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org



Re: Half Hour Sub-query in MySQL vs. 5 Seconds in VFP?

2009-06-19 Thread Matt Neimeyer
 SELECT zip FROM zipcodes WHERE
 degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+
 cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude*60*1.1515
  5

 Ouch.  You might want to calculate the rectange enclosing your target
 distance, add an index on lat (and/or long), and add the rectangle check to
 your where clause: WHERE latitude BETWEEN lt1 AND lt2 AND longitude BETWEEN
 ln2 AND ln2.  That way mysql can use the index to pare down most of the rows
 without having to call all those trig functions for every zipcode.

I like this idea the best (it always bothered me running a query that
involved multiple mathmatical functions).

So... Here's the scratch php code I ended up with... Anyone see any
problems with it? The only problem I see is that I think the old code
was more circular this will be a square (within the limits of a
square on a non-spherical earth... etc.. etc..) ... so there will be
more zip codes included in the corners. If there are too many
complaints about that I might look at some sort of overlapping
rectangle scheme instead of a square.

function ChangeInLatitude($Miles) { return rad2deg($Miles/3960); }
function ChangeInLongitude($Lat, $Miles) { return
rad2deg($Miles/3960*cos(deg2rad($Lat))); }

$Miles = 5;

$OriginalLat = 39.0788994;
$OriginalLon = -77.1227036;

$ChangeInLat = ChangeInLatitude($Miles);
$ChangeInLon = ChangeInLongitude($OriginalLat, $Miles);

$MinLat = $OriginalLat-$ChangeInLat;
$MaxLat = $OriginalLat+$ChangeInLat;

$MinLon = $OriginalLon-$ChangeInLon;
$MaxLon = $OriginalLon+$ChangeInLon;

My only other question is... when I explained the new query... On the
dependent subquery it says possible keys are zip, longitude and
latitude but it used zip. It seems like a better index would be
longitude or latitude? On the primary query, even though there is an
index on custzip it doesn't say it's using ANY indexes. I should
probably leave well enough alone... but I'm curious.

Thanks again!

Matt

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org



Re: Half Hour Sub-query in MySQL vs. 5 Seconds in VFP?

2009-06-19 Thread Brent Baisley
It sounds like you want to use spatial indexes, but they only became
available in v4.1
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
http://dev.mysql.com/doc/refman/5.0/en/using-a-spatial-index.html

You would need to switch your table type from InnoDB to MyISAM, which
is fairly easy with ALTER TABLE. But that should allow you to drop all
your calculations in the query.

You don't have to do any re-architecture to change you subquery to a join:
SELECT custzip FROM customers
JOIN
(SELECT zip FROM
zipcodes WHERE degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+
cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude*60*1.1515
 5) AS zips
ON custzip=zip

Often times that simple change speeds things up considerably in MySQL.
An explain should show it has a DERIVED TABLE if I recall correctly.

Brent Baisley

On Thu, Jun 18, 2009 at 9:06 PM, Matt Neimeyerm...@neimeyer.org wrote:
 I'm converting a PHP app from using Visual FoxPro as the database
 backend to using MySQL as the backend. I'm testing on MySQL 4.1.22 on
 Mac OSX 10.4. The end application will be deployed cross platform and
 to both 4.x and 5.x MySQL servers.

 This query returned 21 records in .27 seconds.

   SELECT zip FROM zipcodes WHERE
 degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+
 cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude*60*1.1515
  5

 This query returned 21442 records in 1.08 seconds.

   SELECT custzip FROM customers

 This query is still running half an hour later, with a Time of 2167
 and a State of Sending Data (according to the mysql process list)

   SELECT custzip FROM customers WHERE custzip IN (SELECT zip FROM
 zipcodes WHERE degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+
 cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude*60*1.1515
  5)

 When I try to EXPLAIN the query it gives me the following...

 id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
 1,PRIMARY,customers,index,NULL,cw_custzip,30,NULL,21226,Using where; Using 
 index
 2,DEPENDENT SUBQUERY,zipcodes,ALL,NULL,NULL,NULL,NULL,42144,Using where

 If it matters both tables are INNODB and both customers.custzip and
 zipcodes.zip are indexed. We used a program called DBConvert from
 DMSoft to convert the data so it's exactly the same on both the VFP
 side and the MySQL side. With all that in mind... VFP returns the
 exact same query in 5-10 seconds and that includes render time in the
 web browser.

 By comparison... the query WHERE id IN (SELECT id FROM phone WHERE
 phonedate = '2001-01-01' AND phonedate = '2009-06-18') returns
 almost instantly.

 I'm at a complete loss... The suggestions I've seen online for
 optimizing Dependent Subquery's basically revolve around changing it
 from a sub-query to a join but that would require more
 re-architecturing than I want to do... (Unless I'm forced) Especially
 since more than a few of those solutions suggested precalculating the
 distance between zipcodes which only works if the distances are known
 (only allow 10, 50 and 100 mile radi for example)

 Any ideas?

 Thanks in advance!

 Matt

 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:    http://lists.mysql.com/mysql?unsub=brentt...@gmail.com



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org



Re: Half Hour Sub-query in MySQL vs. 5 Seconds in VFP?

2009-06-19 Thread Peter Brawley

Matt,


This query is still running half an hour later, with a Time of 2167
and a State of Sending Data (according to the mysql process list)
  SELECT custzip FROM customers WHERE custzip IN ( ...


For explanation  alternatives see The unbearable slowness of IN() at 
http://localhost/artful/infotree/queries.php.


PB

-

Matt Neimeyer wrote:

I'm converting a PHP app from using Visual FoxPro as the database
backend to using MySQL as the backend. I'm testing on MySQL 4.1.22 on
Mac OSX 10.4. The end application will be deployed cross platform and
to both 4.x and 5.x MySQL servers.

This query returned 21 records in .27 seconds.

   SELECT zip FROM zipcodes WHERE
degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+
cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude*60*1.1515
 5

This query returned 21442 records in 1.08 seconds.

   SELECT custzip FROM customers

This query is still running half an hour later, with a Time of 2167
and a State of Sending Data (according to the mysql process list)

   SELECT custzip FROM customers WHERE custzip IN (SELECT zip FROM
zipcodes WHERE degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+
cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude*60*1.1515
 5)

When I try to EXPLAIN the query it gives me the following...

id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,PRIMARY,customers,index,NULL,cw_custzip,30,NULL,21226,Using where; Using index
2,DEPENDENT SUBQUERY,zipcodes,ALL,NULL,NULL,NULL,NULL,42144,Using where

If it matters both tables are INNODB and both customers.custzip and
zipcodes.zip are indexed. We used a program called DBConvert from
DMSoft to convert the data so it's exactly the same on both the VFP
side and the MySQL side. With all that in mind... VFP returns the
exact same query in 5-10 seconds and that includes render time in the
web browser.

By comparison... the query WHERE id IN (SELECT id FROM phone WHERE
phonedate = '2001-01-01' AND phonedate = '2009-06-18') returns
almost instantly.

I'm at a complete loss... The suggestions I've seen online for
optimizing Dependent Subquery's basically revolve around changing it
from a sub-query to a join but that would require more
re-architecturing than I want to do... (Unless I'm forced) Especially
since more than a few of those solutions suggested precalculating the
distance between zipcodes which only works if the distances are known
(only allow 10, 50 and 100 mile radi for example)

Any ideas?

Thanks in advance!

Matt

  




No virus found in this incoming message.
Checked by AVG - www.avg.com 
Version: 8.5.364 / Virus Database: 270.12.78/2185 - Release Date: 06/18/09 05:53:00


  


Re: Half Hour Sub-query in MySQL vs. 5 Seconds in VFP?

2009-06-19 Thread Walter Heck - OlinData.com
Peter,


On Thu, Jun 18, 2009 at 9:27 PM, Peter
Brawleypeter.braw...@earthlink.net wrote:
 For explanation  alternatives see The unbearable slowness of IN() at
 http://localhost/artful/infotree/queries.php.

you prolly meant to not post a url pointing at your local copy of your
website. This works better for most of us:
http://www.artfulsoftware.com/infotree/queries.php ;)

Walter



-- 
Walter Heck, Engineer @ Open Query (http://openquery.com)
Affordable Training and ProActive Support for MySQL  related technologies

Follow our blog at http://openquery.com/blog/
OurDelta: free enhanced builds for MySQL @ http://ourdelta.org

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org



Half Hour Sub-query in MySQL vs. 5 Seconds in VFP?

2009-06-18 Thread Matt Neimeyer
I'm converting a PHP app from using Visual FoxPro as the database
backend to using MySQL as the backend. I'm testing on MySQL 4.1.22 on
Mac OSX 10.4. The end application will be deployed cross platform and
to both 4.x and 5.x MySQL servers.

This query returned 21 records in .27 seconds.

   SELECT zip FROM zipcodes WHERE
degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+
cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude*60*1.1515
 5

This query returned 21442 records in 1.08 seconds.

   SELECT custzip FROM customers

This query is still running half an hour later, with a Time of 2167
and a State of Sending Data (according to the mysql process list)

   SELECT custzip FROM customers WHERE custzip IN (SELECT zip FROM
zipcodes WHERE degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+
cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude*60*1.1515
 5)

When I try to EXPLAIN the query it gives me the following...

id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,PRIMARY,customers,index,NULL,cw_custzip,30,NULL,21226,Using where; Using index
2,DEPENDENT SUBQUERY,zipcodes,ALL,NULL,NULL,NULL,NULL,42144,Using where

If it matters both tables are INNODB and both customers.custzip and
zipcodes.zip are indexed. We used a program called DBConvert from
DMSoft to convert the data so it's exactly the same on both the VFP
side and the MySQL side. With all that in mind... VFP returns the
exact same query in 5-10 seconds and that includes render time in the
web browser.

By comparison... the query WHERE id IN (SELECT id FROM phone WHERE
phonedate = '2001-01-01' AND phonedate = '2009-06-18') returns
almost instantly.

I'm at a complete loss... The suggestions I've seen online for
optimizing Dependent Subquery's basically revolve around changing it
from a sub-query to a join but that would require more
re-architecturing than I want to do... (Unless I'm forced) Especially
since more than a few of those solutions suggested precalculating the
distance between zipcodes which only works if the distances are known
(only allow 10, 50 and 100 mile radi for example)

Any ideas?

Thanks in advance!

Matt

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org



Re: Half Hour Sub-query in MySQL vs. 5 Seconds in VFP?

2009-06-18 Thread Johnny Withers
I often find doing the IN (subquery...) is really slow versus doing a join:

SELECT cutzip
FROM customers
INNER JOIN zipcodes ON customers.zipcode=zipcodes.zip
WHERE
degrees(acos(sin(radians(39.0788994))
*
sin(radians(latitude))
+
cos(radians(39.0788994))
*
cos(radians(latitude))
*
cos(radians(-77.1227036-longitude))
)
)
*60
*1.1515
 5

That query may have un-matched ()'s, not sure. hard to tell =)

Try a join.

-jw

On Thu, Jun 18, 2009 at 8:06 PM, Matt Neimeyer m...@neimeyer.org wrote:

 I'm converting a PHP app from using Visual FoxPro as the database
 backend to using MySQL as the backend. I'm testing on MySQL 4.1.22 on
 Mac OSX 10.4. The end application will be deployed cross platform and
 to both 4.x and 5.x MySQL servers.

 This query returned 21 records in .27 seconds.

   SELECT zip FROM zipcodes WHERE
 degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+

 cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude*60*1.1515
  5

 This query returned 21442 records in 1.08 seconds.

   SELECT custzip FROM customers

 This query is still running half an hour later, with a Time of 2167
 and a State of Sending Data (according to the mysql process list)

   SELECT custzip FROM customers WHERE custzip IN (SELECT zip FROM
 zipcodes WHERE
 degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+

 cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude*60*1.1515
  5)

 When I try to EXPLAIN the query it gives me the following...

 id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
 1,PRIMARY,customers,index,NULL,cw_custzip,30,NULL,21226,Using where; Using
 index
 2,DEPENDENT SUBQUERY,zipcodes,ALL,NULL,NULL,NULL,NULL,42144,Using where

 If it matters both tables are INNODB and both customers.custzip and
 zipcodes.zip are indexed. We used a program called DBConvert from
 DMSoft to convert the data so it's exactly the same on both the VFP
 side and the MySQL side. With all that in mind... VFP returns the
 exact same query in 5-10 seconds and that includes render time in the
 web browser.

 By comparison... the query WHERE id IN (SELECT id FROM phone WHERE
 phonedate = '2001-01-01' AND phonedate = '2009-06-18') returns
 almost instantly.

 I'm at a complete loss... The suggestions I've seen online for
 optimizing Dependent Subquery's basically revolve around changing it
 from a sub-query to a join but that would require more
 re-architecturing than I want to do... (Unless I'm forced) Especially
 since more than a few of those solutions suggested precalculating the
 distance between zipcodes which only works if the distances are known
 (only allow 10, 50 and 100 mile radi for example)

 Any ideas?

 Thanks in advance!

 Matt

 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/mysql?unsub=joh...@pixelated.net




-- 
-
Johnny Withers
601.209.4985
joh...@pixelated.net