[sqlalchemy] Re: Optimizing a slow query

2008-06-09 Thread EricHolmberg



On Jun 8, 5:09 am, beewee [EMAIL PROTECTED] wrote:
 Hi,

 thanks for your answers.

  Other improvements would include (as previously stated by Michael)
  would be to make sure you have indexed all of the items in your WHERE,
  ORDER BY, and ON clauses.

 I created this index:
 create index viewforum on forum_post (topic_id, id);
 Is this right?

 As a
  quick check, try reducing the 140,000 offset to 0

 If I do so, the query is really fast (0.01 sec), but with a big offset
 it still takes more than 10 seconds :/
 This is what EXPLAIN says now:http://paste.pocoo.org/show/64838/
 I executed OPTIMIZE TABLE forum_post but it didn't speed up the
 queries.

That narrows the problem down to the sub-select then, since this
select will only return 15 rows, but it has to run through a minimum
of 140,015 rows to get those 15.  To verify this is the issue, try
doing the sub-select by itself and doing and explain on it to verify
that is the issue.

Simplified query:
explain SELECT ...  FROM (SELECT ...  FROM forum_post  WHERE 9250 =
forum_post.topic_id ORDER BY forum_post.id   LIMIT 14, 15) AS
anon_1 LEFT OUTER JOIN ... ON anon_1.forum_post_id =
forum_attachment_1.post_id LEFT OUTER JOIN ... ON
anon_1.forum_post_author_id = portal_user_1.id ORDER BY
anon_1.forum_post_id, forum_attachment_1.id, portal_user_1.id;

Try to do just the sub-select:
SELECT ...  FROM forum_post  WHERE 9250 = forum_post.topic_id ORDER BY
forum_post.id   LIMIT 14, 15;

and an explain on it:
SELECT ...  FROM forum_post  WHERE 9250 = forum_post.topic_id ORDER BY
forum_post.id   LIMIT 14, 15


Doing a quick sanity check on my system, a similar query to this
subselect with a LIMIT 14, 15 takes about 2 seconds after a server
restart and about 0.750 seconds once the indexes have been cached in
memory.  This is on an old 2.8 GHz Xeon with Hyperthreading with a
software RAID filesystem that runs about 150 MB/s and 4 GB of memory
for mysql with a dabase index size of 8 GB.  So really, your 10-second
query time is only 5 times slower which could be explained by not
having enough memory devoted to the correct MySQL buffers, so MySQL
starts having to wait on the hard drive.  Take a look at some tuning
parameters that depend upon the database engine that you are using.

  If you're still out of ideas, consider putting a timestamp on the
  records and indexing the timestamp.  You can then do a query based
  upon a time range which I know is fast since I do it all the time on a
  750-million record stock database.

 Can you explain this a little bit more? The forum_post table already
 has a time stamp (pub_date = the date when the post was written), but
 I have no idea how to filter the records using this column to just get
 the posts of a specific page of a specific topic.

If you index the timestamp, then you can provide this as a filter.
For example, in your sub-select, if you limit it to posts to the topic
within the last 30 days, then your sub-select query could easily only
have to deal with 5,000 records instead of 140,000.  For example:

From:
SELECT ...  FROM forum_post  WHERE 9250 = forum_post.topic_id ORDER BY
forum_post.id   LIMIT 14, 15

To:
SELECT ...  FROM forum_post  WHERE 9250 = forum_post.topic_id AND
pub_date = DATE_SUB(NOW(), INTERVAL 31 DAY) ORDER BY forum_post.id
LIMIT 15

This should greatly limit the number of rows that the database must
deal with internally.  Also, keep in mind that it is very important
that you make sure that the DATE_SUB calculation can be done just
once, and not for every row.  For example, pub_date = DATE_SUB(NOW(),
INTERVAL 31 DAY) will be very fast, but TO_DAYS(pub_date) -
TO_DAYS(NOW())  31 will be extremely slow since the database will
have to do the calculation for every row.

Regards,

Eric
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: Optimizing a slow query

2008-06-08 Thread beewee

Hi,

thanks for your answers.

 Other improvements would include (as previously stated by Michael)
 would be to make sure you have indexed all of the items in your WHERE,
 ORDER BY, and ON clauses.
I created this index:
create index viewforum on forum_post (topic_id, id);
Is this right?

As a
 quick check, try reducing the 140,000 offset to 0
If I do so, the query is really fast (0.01 sec), but with a big offset
it still takes more than 10 seconds :/
This is what EXPLAIN says now: http://paste.pocoo.org/show/64838/
I executed OPTIMIZE TABLE forum_post but it didn't speed up the
queries.

 If you're still out of ideas, consider putting a timestamp on the
 records and indexing the timestamp.  You can then do a query based
 upon a time range which I know is fast since I do it all the time on a
 750-million record stock database.
Can you explain this a little bit more? The forum_post table already
has a time stamp (pub_date = the date when the post was written), but
I have no idea how to filter the records using this column to just get
the posts of a specific page of a specific topic.

Benjamin Wiegand


On 7 Jun., 16:22, EricHolmberg [EMAIL PROTECTED] wrote:
 It looks like your sub-select (before the joins) is processing up to
 140,015 records, so that will slow things down since the database may
 not optimize that sub-selection based upon your outer joins.  As a
 quick check, try reducing the 140,000 offset to 0 (I know this won't
 work for your application, but do this as a sanity check) in your
 limit statement.  If that works, then try refactoring your database
 (potentially with a timestamp like I explain later).

 subselect:  (... WHERE forum_post.topic_id = 9250 ORDER BY
 forum_post.id   LIMIT 14, 15)

 Other improvements would include (as previously stated by Michael)
 would be to make sure you have indexed all of the items in your WHERE,
 ORDER BY, and ON clauses.  Also, check to see if you can do arrange
 your web page such that you can reduce the amount of information
 required initially which will reduce your query time.

 Really, 198,398 records is pretty small, so even with a filesort, I
 wouldn't expect this to take very long.  Have you double-checked that
 your server's file system is running okay, that your DB tables aren't
 fragemented, etc?  You can handle most of this optimization by doing
 an OPTIMIZE table_name; query in MySQL.

 If you're still out of ideas, consider putting a timestamp on the
 records and indexing the timestamp.  You can then do a query based
 upon a time range which I know is fast since I do it all the time on a
 750-million record stock database.

 -Eric
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: Optimizing a slow query

2008-06-08 Thread Paul Johnston

Hi,

create index viewforum on forum_post (topic_id, id);
  

You probably want:

create index forum_post_topic_id on forum_post (topic_id);

Paul


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: Optimizing a slow query

2008-06-08 Thread beewee

 create index viewforum on forum_post (topic_id, id);

 You probably want:

 create index forum_post_topic_id on forum_post (topic_id);
are you sure? A key on topic_id and id makes my query much faster for
low offsets, while without it the query takes even for low offsets 
10 seconds.

Benjamin Wiegand
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: Optimizing a slow query

2008-06-08 Thread Paul Johnston

Hi,

create index forum_post_topic_id on forum_post (topic_id);


are you sure? A key on topic_id and id makes my query much faster for
low offsets, while without it the query takes even for low offsets 
10 seconds.
  

I didn't follow the beginning of this thread, so I'm not sure exactly 
your query, but if you're searching on topic_id, I'd expect the index I 
suggested to help (although I'm not sure how much), and the combined 
index you mentioned to make almost no difference. Did you try the one I 
suggested?

But from your results, I wonder if the slowness is caused by something 
else, not particularly by the lack of an index.

Paul


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: Optimizing a slow query

2008-06-07 Thread EricHolmberg

It looks like your sub-select (before the joins) is processing up to
140,015 records, so that will slow things down since the database may
not optimize that sub-selection based upon your outer joins.  As a
quick check, try reducing the 140,000 offset to 0 (I know this won't
work for your application, but do this as a sanity check) in your
limit statement.  If that works, then try refactoring your database
(potentially with a timestamp like I explain later).

subselect:  (... WHERE forum_post.topic_id = 9250 ORDER BY
forum_post.id   LIMIT 14, 15)

Other improvements would include (as previously stated by Michael)
would be to make sure you have indexed all of the items in your WHERE,
ORDER BY, and ON clauses.  Also, check to see if you can do arrange
your web page such that you can reduce the amount of information
required initially which will reduce your query time.

Really, 198,398 records is pretty small, so even with a filesort, I
wouldn't expect this to take very long.  Have you double-checked that
your server's file system is running okay, that your DB tables aren't
fragemented, etc?  You can handle most of this optimization by doing
an OPTIMIZE table_name; query in MySQL.

If you're still out of ideas, consider putting a timestamp on the
records and indexing the timestamp.  You can then do a query based
upon a time range which I know is fast since I do it all the time on a
750-million record stock database.

-Eric
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: Optimizing a slow query

2008-06-06 Thread beewee

Hi,

I merged the two tables into a single one, changed the mapping but the
query still needs more than 10 seconds :/
That's how EXPLAIN looks like now: http://paste.pocoo.org/show/63542/
I think the problem is this one:
|  2 | DERIVED | forum_post | ALL|
forum_post_topic_id  | forum_post_topic_id  | 4   |
NULL| 198398 | Using filesort
|
do you have any idea, why mysql is using filesort for this and how I
can get mysql not to search through 198398 rows?
These are the indexes of forum_post: http://paste.pocoo.org/show/63543/

Benjamin Wiegand

On 5 Jun., 20:33, Michael Bayer [EMAIL PROTECTED] wrote:
 for starters I'd combine post_table and post_text_table into  
 onenot much is accomplished there by having two tables.  Also make  
 sure forum_post.topic_id is indexed.

 On Jun 5, 2008, at 1:37 PM, beewee wrote:



  Hi,

  we're writing a bulleting board using sqlalchemy at the moment, but we
  have the problem, that the database query for viewing a topic is quite
  slow for big topics.
  These are the relevant table definitions and mappings:
 http://paste.pocoo.org/show/62703/
  This is the query that is slow:http://paste.pocoo.org/show/62706/
  This is what EXPLAIN says:http://paste.pocoo.org/show/62708/
  Executing this query needs up to 25 seconds on our test server that's
  just idling. On our productive servers (which aren't idling, of
  course) phpbb is able to execute an adequate query much faster. May
  you can help us speeding up this query? I don't know what to improve,
  since all parts of the query already use a key (except the derived
  one), but unfortunately I have a quite small knowledge of improving
  database queries.

  Thank you very much,

  Benjamin Wiegand
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: Optimizing a slow query

2008-06-05 Thread Michael Bayer

for starters I'd combine post_table and post_text_table into  
onenot much is accomplished there by having two tables.  Also make  
sure forum_post.topic_id is indexed.


On Jun 5, 2008, at 1:37 PM, beewee wrote:


 Hi,

 we're writing a bulleting board using sqlalchemy at the moment, but we
 have the problem, that the database query for viewing a topic is quite
 slow for big topics.
 These are the relevant table definitions and mappings:
 http://paste.pocoo.org/show/62703/
 This is the query that is slow: http://paste.pocoo.org/show/62706/
 This is what EXPLAIN says: http://paste.pocoo.org/show/62708/
 Executing this query needs up to 25 seconds on our test server that's
 just idling. On our productive servers (which aren't idling, of
 course) phpbb is able to execute an adequate query much faster. May
 you can help us speeding up this query? I don't know what to improve,
 since all parts of the query already use a key (except the derived
 one), but unfortunately I have a quite small knowledge of improving
 database queries.

 Thank you very much,

 Benjamin Wiegand

 


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---