Re: join speed vs. 2 queries

2005-02-15 Thread Mathew Ray
Many thanks, Peter, I appreciate your response.
I played around with the indexes and modified the query a bit more to 
match the campaignId of the value first, and got a 2000x performance 
increase over the original query... it now takes 0.03 seconds on average 
where it used to take 60.

One question though: is there a benefit to doing the INNER JOIN in the 
FROM clause rather than creating a join condition in the WHERE clause? I 
rewrote the query a bit using the WHERE join condition and noticed a 
slight performance hit on this particular query...

Also, I am doing some pretty hefty multi-dimensional analysis on the 
data, such as getting all the hits for each of m variables with n values 
each, within a date range, that also contain certain other varname/value 
combinations. Now, if I am doing multiple different queries on this data, 
using different fields of the table in each one, is it preferable to 
create an index for each query, or to make one uber-index that can be 
used as a Swiss-army knife for at least a couple of them? I don't want to 
slow down INSERT calls if I can help it, but I don't want my queries to 
take 5 minutes each either...

Perhaps the solution is two different databases, one without indexes for 
inserts and one with indexes for pulling the data back out? Then I can 
use indexes when I need to and won't have the performance hit when I 
need to insert data.

The last question I have regards EXPLAIN results...
Am I correct in believing that each row corresponds to a different 
comparison in the WHERE clause, and that each row's analysis is applied 
to the rows produced by the ones before it (multiplicatively)? So if the 
rows column shows 10, 10, 10 for 3 joins, then in actuality about 1000 
rows will be examined? If this is the case, then I imagine it is better 
to winnow down to the smallest possible set first and then go from there, 
trying to get as few rows examined as theoretically possible given the 
constraints of the query.
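
For reference, a rough way to check that reasoning (query shape borrowed 
from the suggestion quoted below; the actual output depends on the schema 
and indexes, and `values` is backticked only because VALUES is a reserved 
word):

EXPLAIN SELECT COUNT(*)
FROM data
  INNER JOIN names ON data.name_id = names.id
  INNER JOIN `values` ON data.value_id = `values`.id
WHERE data.campaign_id = 22
  AND names.name = 'content'
  AND `values`.value = 'index';
-- For a nested-loop plan, the product of the values in the rows column
-- is a rough estimate of rows examined, so 10, 10, 10 across three
-- tables suggests on the order of 1000.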

~Mathew

Peter Brawley wrote:
 I have  a gut feeling that this kind of join should be able to be
 done with similar speed without having to use a temp table
Yep, but remember the query engine uses one index per table, so without 
seeing your EXPLAIN output I'd try indexing ...
  the data table on name_id,value_id,campaign_id,
  the names table on id,name,
  the values table on id,value,
then writing the query as ...
 SELECT COUNT(*)
  FROM data
   INNER JOIN names ON data.name_id=names.id
   INNER JOIN values ON data.value_id=values.id
 WHERE data.campaign_id = 22
   AND names.name = 'content'
   AND values.value = 'index'

PB


Re: join speed vs. 2 queries

2005-02-15 Thread Peter Brawley
Mathew,
...is there a benefit to doing the INNER JOIN in the FROM clause
rather than creating a join condition in the WHERE clause? I rewrote
the query a bit using the WHERE join condition and noticed a slight
performance hit on this particular query...
There are (at least) three benefits to putting JOINs in the FROM clause: 
clarity for you and anyone who reads it, maintainability by you or 
anyone who follows you, and one less kind of guessing on the query 
engine's job list.
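
For illustration, the two forms below express the same inner join (table 
and column names are the placeholders used elsewhere in this thread); the 
practical difference is mainly readability:

-- Join condition mixed into the WHERE clause (comma join):
SELECT COUNT(*)
FROM data, names
WHERE data.campaign_id = 22
  AND names.name = 'content'
  AND data.name_id = names.id;

-- Same join spelled out in the FROM clause, keeping join conditions
-- separate from row filters:
SELECT COUNT(*)
FROM data
  INNER JOIN names ON data.name_id = names.id
WHERE data.campaign_id = 22
  AND names.name = 'content';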

Also, I am doing some pretty hefty multi-dimensional analysis on the 
data, such as getting all the hits for each of m variables with n values 
each, within a date range, that also contain certain other varname/value 
combinations. Now, if I am doing multiple different queries on this data, 
using different fields of the table in each one, is it preferable to 
create an index for each query, or to make one uber-index that can be 
used as a Swiss-army knife for at least a couple of them? I don't want to 
slow down INSERT calls if I can help it, but I don't want my queries to 
take 5 minutes each either...

If you can get one multi-col index to work for most queries, you're 
golden. Failing that  ...

Perhaps the solution is two different databases, one without indexes for 
inserts and one with indexes for pulling the data back out? Then I can 
use indexes when I need to and won't have the performance hit when I 
need to insert data.

... an OLTP db for updates, an OLAP db for reports will often SYA.
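
One low-tech approximation of that split on a single server, if 
replication or a second box isn't available (data_report is a made-up 
name; this is only a sketch):

-- Keep the insert-heavy table lean; maintain an indexed copy for reporting.
CREATE TABLE data_report LIKE data;
ALTER TABLE data_report ADD INDEX ix_report (campaign_id, name_id, value_id);
-- Refresh off-peak, then point the analysis queries at data_report.
TRUNCATE TABLE data_report;
INSERT INTO data_report SELECT * FROM data;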
Yes, winnow down starting at the top.
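
On the one-multi-column-index question above: winnowing early mostly 
comes down to having an index whose leading column matches the most 
selective condition, and MySQL can use any leftmost prefix of a composite 
index, so one index can serve several related queries (index name is 
made up; a sketch only):

ALTER TABLE data ADD INDEX ix_campaign_name_value (campaign_id, name_id, value_id);
-- Usable when filtering on campaign_id alone, on campaign_id + name_id,
-- or on all three columns; a query that filters only on value_id would
-- still need its own index.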
PB

join speed vs. 2 queries

2005-02-14 Thread Mathew Ray
Newbie on the list here, having a bit of confusion at the moment about 
why an INNER JOIN is taking so long... I have replaced a few column names 
to make it a bit more succinct:

SELECT COUNT(*)
FROM data, values, names
WHERE data.campaign_id = 22
  AND names.name = 'content'
  AND values.value = 'index'
  AND data.name_id = names.id
  AND data.value_id = values.id;
This query should pull out all of the index content from the data table 
for campaign 22. name_id and value_id are indexed, as are the name and 
value fields of the names and values tables. campaign_id is also indexed 
in the data table, and each name and value is unique per campaign. The 
data set for this campaign has around 163,000 entries, and the above 
query takes nearly a minute to run. The total size of the data table is 
around 3 million records.

On the same machine, the following two-query approach takes roughly 2 
seconds to run:

CREATE TEMPORARY TABLE IF NOT EXISTS names_temp
SELECT names.id AS var_id, values.id AS val_id
FROM values, names
WHERE names.campaign_id = 22
  AND values.campaign_id = names.campaign_id
  AND names.name = 'content'
  AND values.value = 'index';

SELECT COUNT(*)
FROM data, names_temp
WHERE data.name_id = names_temp.var_id
  AND data.value_id = names_temp.val_id;
After looking at EXPLAIN for both, I understand that the latter is 
faster because it is doing lookups based on constant values, but I have 
 a gut feeling that this kind of join should be able to be done with 
similar speed without having to use a temp table... Is there any way to 
optimize the performance of the join query without having to go with the 
two-query option?


--
Thanks,
Mathew
..
Mathew J. Ray
Sr. Interactive Developer
IQ Television Group
..



Re: join speed vs. 2 queries

2005-02-14 Thread Peter Brawley
I have  a gut feeling that this kind of join should be able to be
done with similar speed without having to use a temp table
Yep, but remember the query engine uses one index per table, so without 
seeing your EXPLAIN output I'd try indexing ...
  the data table on name_id,value_id,campaign_id,
  the names table on id,name,
  the values table on id,value,
then writing the query as ...
 SELECT COUNT(*)
  FROM data
   INNER JOIN names ON data.name_id=names.id
   INNER JOIN values ON data.value_id=values.id
 WHERE data.campaign_id = 22
   AND names.name = 'content'
   AND values.value = 'index'
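
In DDL terms those suggestions might look roughly like this (index names 
are made up, and `values` is backticked only because VALUES is a reserved 
word; the real tables presumably have different names):

ALTER TABLE data ADD INDEX ix_data_nvc (name_id, value_id, campaign_id);
ALTER TABLE names ADD INDEX ix_names_id_name (id, name);
ALTER TABLE `values` ADD INDEX ix_values_id_value (id, value);
-- Column order matters: the leading columns are the ones the optimizer
-- can use to narrow the search.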

PB


