Have you tried moving the "GroupID not IN ()" into your JOIN clause?
SELECT *
FROM News
JOIN FileGroup
ON News.FileID = FileGroup.FileID
AND FileGroup.GroupID NOT IN (638)
WHERE News.Category = 7
GROUP BY News.FileID
ORDER BY News.Date DESC
LIMIT 100
I think, though, that because you are excluding only a small part of your
FileGroup table, the optimizer might think that more than the "magic
threshold" (somewhere around 30%) number of records will be returned and
may still revert to a table scan. Another option could be to select all of
the rows from FileGroup where the rows are IN the group to a temp table.
Then LEFT JOIN the temp table to news and check for null values in the
temporary table.
CREATE TEMPORARY TABLE tmpFG
SELECT FileID
FROM FileGroup
WHERE Group IN (638)
ALTER TABLE tmpFG ADD KEY(FileID)
SELECT *
FROM News n
JOIN FileGroup fg
ON n.FileID = fg.FileID
LEFT JOIN tmpFG tfg
on fg.FileID = n.FileID
WHERE n.Category = 7
AND tfg.fileID is null
GROUP BY n.FileID
ORDER BY n.Date DESC
LIMIT 100
DROP TABLE tmpFG
I suggest this as it is sometimes faster to find something then exclude
what you find than to "not find" it the first time.
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
Chris Elsworth <[EMAIL PROTECTED]> wrote on 07/19/2004 11:59:43
AM:
> Hello,
>
> I wonder if someone could shed some light on a problem that's been
bothering
> me for months. Please bear with me as I explain it..
>
> I have two tables in question:
>
> CREATE TABLE `News` (
> `FileID` int(8) unsigned NOT NULL auto_increment,
> `Subject` char(200) NOT NULL default '',
> `Category` tinyint(3) unsigned NOT NULL default '0',
> `SubCategory` smallint(5) unsigned NOT NULL default '0',
> `Date` int(10) unsigned NOT NULL default '0',
> PRIMARY KEY (`FileID`),
> KEY `S_D` (`SubCategory`,`Date`),
> KEY `C_D_P` (`Category`,`Date`,`PostID`),
> KEY `C_P_D` (`Category`,`PostID`,`Date`,
> KEY `Subject` (`Subject`(10)),
> KEY `C_D` (`Category`,`Date`),
> FULLTEXT KEY `ft_Subject` (`Subject`)
> ) TYPE=MyISAM
>
> CREATE TABLE `FileGroup` (
> `FileID` int(8) unsigned NOT NULL default '0',
> `GroupID` smallint(5) unsigned NOT NULL default '0',
> PRIMARY KEY (`GroupID`,`FileID`),
> KEY `F_G` (`FileID`,`GroupID`),
> ) TYPE=MyISAM
>
>
> News has about 2.5m rows and FileGroup has 3.1m rows.
> For each row in News, there are one or more corresponding rows in
> FileGroup, determining in which newsgroups each News article appears.
>
> So, typically:
> mysql> select * from News where FileID = 26222004\G
> *************************** 1. row ***************************
> FileID: 26222004
> Subject: some_usenet_post
> Category: 7
> SubCategory: 706
> Date: 1090239675
>
> mysql> select * from FileGroup where FileID = 26222004\G
> *************************** 1. row ***************************
> FileID: 26222004
> GroupID: 638
> *************************** 2. row ***************************
> FileID: 26222004
> GroupID: 799
>
>
> The problem occurs when I want to get News rows that do not appear
> in a specific group or set of groups. I also only want one row per News
> article, not one row per group, so I have a GROUP BY (I could maybe use
> DISTINCT too but they'd do pretty much the same) in there.
>
> I end up with something like this:
>
> SELECT * FROM News
> JOIN FileGroup ON (News.FileID = FileGroup.FileID)
> WHERE GroupID NOT IN (638) AND Category = 7
> GROUP BY News.FileID ORDER BY Date DESC
> LIMIT 100
>
> The resulting explain:
> +-----------+------+-----------------------------+-------+---------
> +-------------+--------+----------------------------------------------+
> | table | type | possible_keys | key | key_len |
> ref | rows | Extra |
> +-----------+------+-----------------------------+-------+---------
> +-------------+--------+----------------------------------------------+
> | News | ref | PRIMARY,C_D_P,C_P_D,C_D | C_P_D | 1 |
> const | 595494 | Using where; Using temporary; Using filesort |
> | FileGroup | ref | F_G | F_G | 4 |
> News.FileID | 1 | Using where |
> +-----------+------+-----------------------------+-------+---------
> +-------------+--------+----------------------------------------------+
>
> MySQL is being forced to create a temporary table because of the GROUP
BY,
> and it pretty much seems to end up scanning the entire table - queries
are
> taking upwards of 30 seconds. In the queries that I can remove the JOIN
(I
> don't always need it, because I don't always need to exclude items in
> certain groups) it flies, because the temporary table and filesort
pretty
> much always go; I can fiddle with indexes to make that always the case;
the
> GROUP BY can go too, since the rows will always be unique.
>
> I've typed myself out now so I'll keep the actual question short :)
> What can I do about this? Is there a more efficient way to store this
data
> to avoid these horrific queries?
>
> If I can supply any more relevant information I'll be only too pleased
to.
>
> Thanks for any hints in advance.
>
> --
> Chris
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
>