Matthew,

>...is there a benefit to doing the INNER JOIN in the FROM clause
>rather than creating a join condition in the WHERE clause? I rewrote
>the query a bit using the WHERE join condition and noticed a slight
>performance hit on this particular query...

There are (at least) three benefits to putting JOINs in the FROM clause: clarity for you and anyone who reads it, maintainability by you or anyone who follows you, and you've removed one kind of guessing from the query engine's joblist.

>Also, I am doing some pretty hefty multi-dimensional analysis on the data
>such as getting all the hits for each of m-variables with n-values each within
>a date range that also contain certain other varname/value combinations.
>Now if I am doing multiple different queries on this data, using different fields
>of the table in each one, is it preferable to create an index for each query, or
>make one uber-index that can be used as a swiss-army knife for at least a
>couple of them? I don't want to slow down INSERT calls if I can help it,
>but I don't want my queries to take 5 minutes each either...


If you can get one multi-col index to work for most queries, you're golden. Failing that ...

>Perhaps the solution is two different databases, one without indexes for inserts,
>one with indexes for pulling the data back out? Then I can use indexes when
>I need to and don't have the performance hit when I need to insert data.


... an OLTP db for updates, an OLAP db for reports will often SYA.

Yes, winnow down starting at the top.

PB

Mathew Ray wrote:

Many Thanks Peter, I appreciate your response.

Played around with the indexes, and modified the query a bit more to match the campaignId of the value first and got a 2000x performance increase from the original query...now it takes .03 seconds on average where it used to take 60.

One question though: is there a benefit to doing the INNER JOIN in the FROM clause rather than creating a join condition in the WHERE clause? I rewrote the query a bit using the WHERE join condition and noticed a slight performance hit on this particular query...

Also, I am doing some pretty hefty multi-dimensional analysis on the data such as getting all the hits for each of m-variables with n-values each within a date range that also contain certain other varname/value combinations. Now if I am doing multiple different queries on this data, using different fields of the table in each one, is it preferable to create an index for each query, or make one uber-index that can be used as a swiss-army knife for at least a couple of them? I don't want to slow down INSERT calls if I can help it, but I don't want my queries to take 5 minutes each either...

Perhaps the solution is two different databases, one without indexes for inserts, one with indexes for pulling the data back out? Then I can use indexes when I need to and don't have the performance hit when I need to insert data.

The last question I have regards EXPLAIN results...
Am I correct in believing that each row corresponds to a different comparison in the WHERE clause, and that each row analysis is done on each of the rows before it (associative)? So if rows is 10,10,10 for 3 joins, then in actuality 1000 rows will be examined? If this is the case, then I imagine it is better to winnow to the smallest possible set first and then go from there trying to get as few rows examined as theoretically possible given the constraints of the query.


~Mathew



Peter Brawley wrote:

 >I have  a gut feeling that this kind of join should be able to be
 >done with similar speed without having to use a temp table

Yep but remember the query engine uses one index per table so without seeing your EXPLAIN output I'd try indexing ...
the data table on name_id,value_id,campaign_id,
the names table on id,name,
the values table on id,value,
then writing the query as ...
SELECT COUNT(*)
FROM data
INNER JOIN names ON data.name_id=names.id
INNER JOIN values ON data.value_id=values.id
WHERE data.campaign_id = 22
AND names.name = 'content'
AND values.value = 'index'


PB

-----
Mathew Ray wrote:

Newbie on the list here having a bit of confusion at the moment why an INNER JOIN is taking so long... I have replaced a few column names to make it a bit more succinct:


SELECT COUNT(*) FROM data, values, names

WHERE data.campaign_id = 22
AND names.name = 'content'
AND values.value = 'index'

AND data.name_id = names.id
AND data.value_id = value.id;


This query should pull out all of the index content from the data table for campaign 22. name_id and value_id are indexed, as are the name and value fields of the names and values tables. campaign_id is also indexed in the data table and each name and value is unique per campaign. The vardata dataset for this campaign that has around 163000 entries and the above query takes nearly a minute to run. Total size of data table is around 3 million records.


On the same machine, the following query takes roughly 2 seconds to run:


CREATE TEMPORARY TABLE IF NOT EXISTS names_temp SELECT names.id as var_id, values.id as val_id FROM values, names

WHERE names.campaign_id = 22
AND values.campaign_id = names.campaign_id

AND names.name = 'content'
AND values.value = 'index';

SELECT COUNT(*)
FROM vardata, names_temp
WHERE vardata.varNameId = names_temp.var_id
AND vardata.varValueId = names_temp.val_id;


After looking at EXPLAIN for both, I understand that the latter is faster because it is doing lookups based on constant values, but I have a gut feeling that this kind of join should be able to be done with similar speed without having to use a temp table... Is there any way to optimize the performance of the join query without having to go with the two-query option?












--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.300 / Virus Database: 265.8.8 - Release Date: 2/14/2005


-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]



Reply via email to