"Octavian Rasnita" <[EMAIL PROTECTED]> wrote on 07/19/2005 02:45:58 AM:

> From: <[EMAIL PROTECTED]>
> Subject: Re: more queries vs a bigger one
> 
> 
> > Hello,
> > approx. how long does it take your "big" query to run as it is now? 
Are
> > these queries appending a table? or are they buiding a result (from a
> > "chain" of queries)?  Have you tried separating them out?  Any 
difference?
> > -sam
> >
> >
> 
> That query takes more than 2 minutes.
> I have a table with less than 90.000 records, and this is the bigger 
table
> from the database, so the query is very slow, because as you may see, 
the
> tables are not very big.
> 
> But that query might return more than 9.000 records, even though I limit 
it
> to first 30.
> 
> I will split it into more smaller queries and I will report if it works
> faster.
> 
> The query is something like the example below, but I don't have it here
> right now to send it to the list, but I will send it soon:
> 
> select a.pre_title, a.title, a.post_title, substring(a.body, 1,
> n.preview_size) as preview, n.title as publication, a.id_category, 
cs.label,
> count(aco.hash_articles) as comments, count(act.hash_articles) as 
counter
> from articles a
> left join newspapers n on(a.id_newspapers=n.id)
> left join sections s on(a.id_sections=s.id and s.id_newspapers=n.id)
> left join sections_categories cs on(a.id_categories=cs.id)
> inner join articles_counters act on(a.body_hash=act.hash_articles)
> inner join articles_comments aco on(a.body_hash=aco.hash_articles)
> where a.id_category=20
> and a.date between '2005-01-01' and '2005-12-31'
> group by a.body_hash order by rand();
> 
> The table newspapers has only 20 records. The table sections has under 
300
> records, but the tables articles_counter and articles_comments might 
have
> many records... hundread of thousands.... millions.
> 
> Thanks.
> 
> Teddy
> 

This is your original query (above) unwrapped and tabified:
select a.pre_title
        , a.title
        , a.post_title
        , substring(a.body, 1, n.preview_size) as preview
        , n.title as publication
        , a.id_category
        , cs.label
        , count(aco.hash_articles) as comments
        , count(act.hash_articles) as counter
from articles a
left join newspapers n 
        on(a.id_newspapers=n.id)
left join sections s 
        on(a.id_sections=s.id 
                and s.id_newspapers=n.id)
left join sections_categories cs 
        on(a.id_categories=cs.id)
inner join articles_counters act 
        on(a.body_hash=act.hash_articles)
inner join articles_comments aco 
        on(a.body_hash=aco.hash_articles)
where a.id_category=20
        and a.date between '2005-01-01' and '2005-12-31'
group by a.body_hash 
order by rand();

I see ONE GLARING problem with this query right away:If you weren't using 
MySQL, this would be an illegal query. 

Your GROUP BY clause does not contain enough columns and the column it 
*does* contain doesn't appear in your SELECT clause. MySQL has a way of 
making queries like this "work" but your results are not guaranteed to be 
deterministic.

Another issue that is killing your speed is your ORDER BY RAND();

I believe it would be much faster to break this into 3 smaller queries: 
first to pick your articles, second to get your statistics, the third will 
combine the first two with some additional information to provide your 
finished report. I am assuming that `articles`.`body_hash` has an index on 
it and is unique. I assume it is unique and indexed because you are using 
it as a foreign key to the `articles_counters` and `articles_comments` 
tables. If you do not have an index on `articles`.`body_hash`, add one 
then try your query again. If it is still too slow, try this query:

/* start query */
CREATE TEMPORARY TABLE tmpArtHash (key(body_hash, rndval))
SELECT body_hash, rand() as rndval
FROM articles
WHERE id_category=20
        AND `date` between '2005-01-01' and '2005-12-31';

CREATE TEMPORARY TABLE tmpArtCounters (key(body_hash))
SELECT th.body_hash
        , count(aco.hash_articles) as comments
        , count(act.hash_articles) as counter
FROM tmpArtHash th
LEFT JOIN articles_counters act 
        on th.body_hash=act.hash_articles
LEFT JOIN articles_comments aco 
        on th.body_hash=aco.hash_articles
GROUP BY th.body_hash

SELECT a.pre_title
        , a.title
        , a.post_title
        , substring(a.body, 1, n.preview_size) as preview
        , n.title as publication
        , a.id_category
        , cs.label
        , tc.comments
        , tc.counter
FROM tmpArtHash th
INNER JOIN articles a
        on a.body_hash = th.body_hash
left join newspapers n 
        on a.id_newspapers=n.id
left join sections s 
        on a.id_sections=s.id 
                and s.id_newspapers=n.id
left join sections_categories cs 
        on a.id_categories=cs.id
LEFT JOIN tmpArtCounters tc
        ON tc.body_hash = th.body_hash
order by th.rndval;

DROP TEMPORARY TABLE tmpArtHash, tmpArtCounter;
/* end query */

Let us know how fast it works. It should take less than 3 seconds or so 
(even for 100,000 records or more).

Reply via email to