Re: Self Join Performance

2006-07-12 Thread Brent Baisley

Interesting setup. You're using one more join than you need to. Your query 
should look like this:
SELECT
DISTINCT(fvr.DocumentID)
FROM
FieldValueRelation fvr
INNER JOIN
FieldValueRelation fvr2
ON  fvr.DocumentID = fvr2.DocumentID
AND fvr2.FieldValueID = '1569'
WHERE  fvr1.FieldValueID = '1344'

Your query was so slow because you were first selecting ALL the records from FieldValueRelation, then using a join to filter out the 
records. You should be using a WHERE clause to filter out the first data set.


- Original Message - 
From: Andrew Wood [EMAIL PROTECTED]

To: mysql@lists.mysql.com
Sent: Tuesday, July 11, 2006 9:50 PM
Subject: Self Join Performance



Hello everyone!

I've got a few questions regarding optimizing self-joins.
So I've got these three tables:

mysql describe FieldName;
+-++--+-+-++
| Field   | Type   | Null | Key | Default | Extra  |
+-++--+-+-++
| FieldNameID | bigint(20) | NO   | PRI | NULL| auto_increment |
| Name| char(255)  | YES  | | NULL||
+-++--+-+-++
2 rows in set (0.02 sec)

mysql describe FieldValue;
+--++--+-+-++
| Field| Type   | Null | Key | Default | Extra  |
+--++--+-+-++
| FieldValueID | bigint(20) | NO   | PRI | NULL| auto_increment |
| FieldNameID  | bigint(20) | NO   | MUL | NULL||
| Value| char(255)  | YES  | MUL | NULL||
+--++--+-+-++
3 rows in set (0.00 sec)

mysql describe FieldValueRelation;
+--+-+--+-+-+---+
| Field| Type| Null | Key | Default | Extra |
+--+-+--+-+-+---+
| FieldValueID | bigint(20)  | NO   | MUL | NULL|   |
| DocumentID   | bigint(20)  | NO   | MUL | NULL|   |
| FieldSetID   | tinyint(11) | YES  | | NULL|   |
+--+-+--+-+-+---+
3 rows in set (0.00 sec)


I've ran a script to populate the database with a decent set of
randomized data (~7 million DocumentIDs).  The following query may
clear things up a bit:


mysql select * from FieldName, FieldValue, FieldValueRelation where
DocumentID = '700' and FieldValue.FieldValueID =
FieldValueRelation.FieldValueID and FieldName.FieldNameID =
FieldValue.FieldNameID;
+-+---+--+-+--+--+++
| FieldNameID | Name  | FieldValueID | FieldNameID | Value|
FieldValueID | DocumentID | FieldSetID |
+-+---+--+-+--+--+++
|   1 | Account   |  6737063 |   1 | 88116010 |
 6737063 |700 |   NULL |
|   2 | FirstName | 1344 |   2 | Noelle   |
1344 |700 |  1 |
|   3 | LastName  | 1569 |   3 | Shea |
1569 |700 |  1 |
+-+---+--+-+--+--+++
3 rows in set (0.00 sec)



So here's my question:  I want to be able to find all of the rows in
the FieldValueRelation table that have both a FirstName of 'Noelle'
and a LastName of 'Shea'.  I've tried a few things, all of which have
less than stellar performance.  I'm guesstimating that I'll have to do
a self join on the FieldValueRelation table. Alas, it's pretty darn
slow:

SELECT
DISTINCT(fvr.DocumentID)
FROM
FieldValueRelation fvr
INNER JOIN
FieldValueRelation fvr1
ON fvr.DocumentID = fvr1.DocumentID
AND fvr1.FieldValueID = '1344'
INNER JOIN
FieldValueRelation fvr2
ON  fvr.DocumentID = fvr2.DocumentID
AND fvr2.FieldValueID = '1569'

And when I execute it:

++
| DocumentID |
++
| 162955 |
| 721704 |
| 993290 |
|1606157 |
|2459823 |
|2759626 |
|3949779 |
|5192230 |
|5753563 |
|6616602 |
|700 |
++
11 rows in set (37.33 sec)

37 seconds is a bit long for what I'd like to use this schema for. So
here's the counts for the matching FieldValueIDs (if this helps):

mysql select count(*) from FieldValueRelation where FieldValueID = '1569';
+--+
| count(*) |
+--+
| 7753 |
+--+
1 row in set (0.05 sec)

mysql select count(*) from FieldValueRelation where FieldValueID = '1344';
+--+
| count(*) |
+--+
| 7357 |
+--+
1 row in set (0.01 sec)


Can anyone shed any light on this?

--
Cheers,
-Andrew


All generalizations, with the possible exception of this one, are false.
- Kurt Goedel

--
MySQL General Mailing List
For list archives: http

Self Join Performance

2006-07-11 Thread Andrew Wood

Hello everyone!

I've got a few questions regarding optimizing self-joins.
So I've got these three tables:

mysql describe FieldName;
+-++--+-+-++
| Field   | Type   | Null | Key | Default | Extra  |
+-++--+-+-++
| FieldNameID | bigint(20) | NO   | PRI | NULL| auto_increment |
| Name| char(255)  | YES  | | NULL||
+-++--+-+-++
2 rows in set (0.02 sec)

mysql describe FieldValue;
+--++--+-+-++
| Field| Type   | Null | Key | Default | Extra  |
+--++--+-+-++
| FieldValueID | bigint(20) | NO   | PRI | NULL| auto_increment |
| FieldNameID  | bigint(20) | NO   | MUL | NULL||
| Value| char(255)  | YES  | MUL | NULL||
+--++--+-+-++
3 rows in set (0.00 sec)

mysql describe FieldValueRelation;
+--+-+--+-+-+---+
| Field| Type| Null | Key | Default | Extra |
+--+-+--+-+-+---+
| FieldValueID | bigint(20)  | NO   | MUL | NULL|   |
| DocumentID   | bigint(20)  | NO   | MUL | NULL|   |
| FieldSetID   | tinyint(11) | YES  | | NULL|   |
+--+-+--+-+-+---+
3 rows in set (0.00 sec)


I've ran a script to populate the database with a decent set of
randomized data (~7 million DocumentIDs).  The following query may
clear things up a bit:


mysql select * from FieldName, FieldValue, FieldValueRelation where
DocumentID = '700' and FieldValue.FieldValueID =
FieldValueRelation.FieldValueID and FieldName.FieldNameID =
FieldValue.FieldNameID;
+-+---+--+-+--+--+++
| FieldNameID | Name  | FieldValueID | FieldNameID | Value|
FieldValueID | DocumentID | FieldSetID |
+-+---+--+-+--+--+++
|   1 | Account   |  6737063 |   1 | 88116010 |
 6737063 |700 |   NULL |
|   2 | FirstName | 1344 |   2 | Noelle   |
1344 |700 |  1 |
|   3 | LastName  | 1569 |   3 | Shea |
1569 |700 |  1 |
+-+---+--+-+--+--+++
3 rows in set (0.00 sec)



So here's my question:  I want to be able to find all of the rows in
the FieldValueRelation table that have both a FirstName of 'Noelle'
and a LastName of 'Shea'.  I've tried a few things, all of which have
less than stellar performance.  I'm guesstimating that I'll have to do
a self join on the FieldValueRelation table. Alas, it's pretty darn
slow:

SELECT
DISTINCT(fvr.DocumentID)
FROM
FieldValueRelation fvr
INNER JOIN
FieldValueRelation fvr1
ON fvr.DocumentID = fvr1.DocumentID
AND fvr1.FieldValueID = '1344'
INNER JOIN
FieldValueRelation fvr2
ON  fvr.DocumentID = fvr2.DocumentID
AND fvr2.FieldValueID = '1569'

And when I execute it:

++
| DocumentID |
++
| 162955 |
| 721704 |
| 993290 |
|1606157 |
|2459823 |
|2759626 |
|3949779 |
|5192230 |
|5753563 |
|6616602 |
|700 |
++
11 rows in set (37.33 sec)

37 seconds is a bit long for what I'd like to use this schema for. So
here's the counts for the matching FieldValueIDs (if this helps):

mysql select count(*) from FieldValueRelation where FieldValueID = '1569';
+--+
| count(*) |
+--+
| 7753 |
+--+
1 row in set (0.05 sec)

mysql select count(*) from FieldValueRelation where FieldValueID = '1344';
+--+
| count(*) |
+--+
| 7357 |
+--+
1 row in set (0.01 sec)


Can anyone shed any light on this?

--
Cheers,
-Andrew


All generalizations, with the possible exception of this one, are false.
- Kurt Goedel

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]