After Shawn's guidance, I tried inserting with the temporary table method 
using this:

CREATE TEMPORARY TABLE tmpStats (KEY(product_id))
  SELECT TP.thread_id, COUNT(TP.thread_id) AS num_posts, MAX(TP.post_date) AS 
last_update
  FROM thread_post AS TP
  INNER JOIN thread_link AS TL ON TP.thread_id = TL.thread_id
  WHERE 
    TL.category_id = 123456 AND TP.post_date > 1999
  GROUP BY TP.thread_id;
            

SELECT TS.num_posts, TS.last_update, TS.thread_id, T.thread_name, 
i.thread_image
  FROM tmpStats AS TS
  INNER JOIN thread AS T ON T.tread_id = TS.thread_id
  INNER JOIN image AS i    ON i.thread_id = TS.thread_id AND i.display_type = 
'thumbnail'
  ORDER BY TS.num_posts DESC 
  LIMIT 0, 20;

I did some testing and here is the performance on various categories in the 
form of:
#Rows - Time for insert statement - Time for select statement
-----------------------
310 rows - 1.56 sec - 0.20 sec
1964 - 4.71 - 0.08
1264 - 1.98 - 0.17
51677- 43.31- 0.12

Then I went back to the old way but it was about 3 to 5 times slower than the 
above!!  I couldn't think of any reason for the extreme slowness (it wasn't 
that slow before) so I restarted mysql (service mysql restart).

Somewhat surprisingly, queries started running much faster.  After the 
restart I got these stats:
#Rows - INSERT - SELECT
----------------------
680 -  1.91 - 0.18
1373 - 1.77 - 0.10
4518 - 2.99 - 0.04
6131 - 5.29 - 0.08
6938 - 2.86 - 0.27
6993 - 3.69 - 0.04
9133 - 10.45 - 0.02
18793 - 9.80 - 0.02
24783 - 6.36 - 0.02


The old non-temp table query produced the following:
#Rows - SELECT TIME
------------------
317 - 1.78
388 - 0.89
3721 - 1.93
6025 - 1.83
51677 - 8.54

Neither query seems to be blazing fast.  It's also strange to me that 
restarting mysql would have such a performance benefit.
The server is a dedicated mysql server: dual 2 GHz Xeon with 2GB RAM, no 
raid, no slaves (yet).

Seems like queries involving only a few thousand rows should execute 
faster..???

- John

----------------------------------------
>I know you said this was a translation of your original query. Assuming 
that it is a faithful translation, I have the following suggestions:

>Do not enclose numbers with quotes (category_id is a number, right? No 
quotes are needed)

>You do not need include the table "category" in this query. You select no 
data from it and refer to it only using it's category_id. That value you 
already have on the "thread_link" table so you do not need any other 
tables to be able to use it.

>Rev 1:

>SELECT 
  Count(TP.thread_ID) as num_posts
        , Max(TP.post_date) as latest_date
        , T.thread_ID
        , T.thread_name
        , i.post_image
FROM thread as T
INNER JOIN thread_link as TL
        ON TL.thread_id = t.thread_id
        AND TL.category_id = 8759
INNER JOIN thread_post as TP
        ON TP.thread_id = t.thread_id
        AND tp.rating > 0
INNER JOIN thread_image as i
        ON i.thread_id = T.thread_id
        AND i.display_type = 'thumbnail'

>You need the count of # of posts and the latest date based on the Thread 
ID, and you know which threads to aggregate based on the category it's in. 
So, you could start by collecting into a temp table only the basic 
information you need for your report. This minimizes the size of the 
intermediate tables so that the GROUP BY  can go much faster.   Then join 
to your temp table any other tables that you need in order to fill in the 
rest of your columns

>Rev 2:

>CREATE TEMPORARY TABLE tmpStats (KEY(thread_id))
SELECT 
        TP.thread_ID
        Count(TP.thread_ID) as num_posts
        , Max(TP.post_date) as latest_date
FROM thread_posts as TP
INNER JOIN thread_link TL
        ON TP.thread_id = TL.thread_id
WHERE 
        TL.category_ID = 8759
GROUP BY 1
ORDER BY 2,3 DESC;

>SELECT ts.num_posts, ts.latest_date, ts.thread_id, t.thread_name, 
i.post_image
FROM tmpStats ts
INNER JOIN thread t
        on t.thread_id = ts.thread_id
INNER JOIN thread_image i
        on i.thread_id = ts.thread_id;

drop table tmpStats;

>You get the same results as the "all-in-one" query  but by breaking it 
into smaller steps, you save the engine a "metric butt-load" (trust me, 
it's a rather large unit of measure) of intermediate processing. Just the 
difference in joining 20  records (and not the entire thread_posts table) 
to the thread and thread_image tables  will save you several seconds.

>Shawn Green
>Database Administrator
>Unimin Corporation - Spruce Pine



>[EMAIL PROTECTED] wrote on 09/09/2004 11:36:18 AM:

> I'm having a bit of a problem with a query that takes a very long 
> time (up to 1 minute) when many matching rows are found.
> The tables are all indexed and the explain seems to indicate that 
> mysql is using the indexes but it is still painfully slow:
> 
> mysql> SELECT COUNT(TP.thread_id) AS num_posts, MAX(TP.post_date) AS
> latest_date,T.thread_id, T.thread_name, i.post_image
> FROM category AS C
>     -> INNER JOIN thread_link AS TL USE INDEX(CAT_INDEX) ON C.
> category_id = TL.category_id 
>     -> INNER JOIN thread AS T ON TL.thread_id = T.thread_id
>     -> INNER JOIN thread_image AS i ON TL.thread_id = i.thread_id
>     -> INNER JOIN thread_post AS TP USE INDEX(thread_id) ON TL.
> thread_id = TP.thread_id
>     -> WHERE C.category_id =  '8759'  AND i.display_type = 
> 'thumbnail' AND TP.rating > 0
>     -> GROUP  BY TL.thread_id
>     -> ORDER  BY  'num_posts' DESC 
>     -> LIMIT 0 , 20;
> 
> ... results ...
> 20 rows in set (37.37 sec)
> 
> The above query is a hypothetical query (hint: I'm not doing a forum
> db) but it pretty much matches what I'm doing. 
> In this case there are many categories 'C' and a thread 'T' can be 
> in multiple categories that link together with a thread_link 'TL'. 
> Thread posts 'TP' contain the individual posts within a thread topic.
> 
> I want to select the top 20 thread topics 'T', for a particular 
> category based on the number of posts within that thread 'TP'.  I 
> also want to calculate the latest post date (when the last post was 
added).
> 
> The tables work fine, the results are fine... it just ISN'T FAST! 
> Especially if there are a lot of threads for that particular category.
> 
> Here is the explain data:
> +-------+--------+---------------+------------+---------
> +---------------------+------+---------------------------------+
> | table | type   | possible_keys | key        | key_len | ref 
> | rows | Extra    |
> +-------+--------+---------------+------------+---------
> +---------------------+------+---------------------------------+
> | C     | const  | PRIMARY       | PRIMARY    |       4 | const 
> |    1 | Using temporary; Using filesort |
> | TL    | ref    | CAT_INDEX     | CAT_INDEX  |       4 | const 
> |  105 | Using where                     |
> | T     | eq_ref | PRIMARY       | PRIMARY    |       4 | TL.
> product_id       |    1 |                                 |
> | i     | eq_ref | PRIMARY       | PRIMARY    |       5 | TL.
> thread_id,const |    1 | Using where                     |
> | TP    | ref    | thread_id    | thread_id |       4 | TL.thread_id
> |    2 | Using where                  |
> +-------+--------+---------------+------------+---------
> +---------------------+------+---------------------------------+
> 5 rows in set (0.00 sec)
> 
> I think the problem may be with 'Using Temporary; Using Filesort' 
> probably due to the GROUP BY???
> 
> Here are the indexes:
> thread: 
> - thread_id (PK)
> 
> thread_link: 
> - thread_id, category_id (combined PK)
> - category_id (CAT_INDEX)
> 
> thread_post:
> - thread_id, post_id (combined PK)
> 
> category:
> - category_id
> 
> thread_image:
> - thread_id,display_type (combined PK)
> - thread_id (INDEX)
> 
> Major kudos to whom ever can help me out with this!!
> - John

Reply via email to