Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
Here is the explain plan of the new query (same as original but with changes to the Date subquery to use ::date instead of to_char to truncate the time portion), when it is run after doing these 2 sets first: set work_mem='8MB'; set enable_nestloop = false; explain plan (8.4): http://explain.depesz.com/s/tw8 thanks again for the help. Julie -Original Message- From: t...@fuzzy.cz [mailto:t...@fuzzy.cz] Sent: Monday, March 21, 2011 11:33 AM To: Davenport, Julie Cc: Merlin Moncure; Tomas Vondra; pgsql-general@postgresql.org Subject: RE: [GENERAL] query taking much longer since Postgres 8.4 upgrade Incredible! Setting enable_nestloop off temporarily for the run of this script made it run in less than a minute (had been running in 10 or 11 minutes). I think you have found a solution for many of my slow running scripts that use these same type of joins. Thanks again. Julie Nice. Can you post EXPLAIN ANALYZE again, so that we can see why this plan was evaluated as as more expensive before disabling nested loops? regards Tomas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
On 22 Mar 2011, at 24:20, F. BROUARD / SQLpro wrote: Try this : 1) rewrite your query as is : Indeed, a join is probably more efficient than a big IN-list. Good point ;) select course_id AS EXTERNAL_COURSE_KEY, user_id AS EXTERNAL_PERSON_KEY, 'Student' AS ROLE, 'Y' AS AVAILABLE_IND from course_user_link AS CUL INNER JOIN course_control AS CC ON CUL.course_id = CC.course_id where CUL.instructor = false AND CC.course_begin_date::date IN ( '20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307' ) and CC.course_delivery LIKE 'O%' and CC.course_cross_section IS NULL and NOT EXISTS(SELECT * FROM instr_as_stutemp AS IAS WHERE C.user_id = IAS.user_id) You could write this last condition as a LEFT OUTER JOIN even, where valid records match IAS.user_id IS NULL. 2) prefix all tables by your SQL schema (public by default) I don't think that will matter much, it might shave off a tiny bit of planner execution time if tables aren't in the first schema in the search_path, but otherwise not worth the hassle. 3) create theses indexes (if not) : I think the usual convention is to suffix with _idx instead of prefixing with x_. It's what automatically created indexes do anyway. That's a matter of personal preference though. CREATE INDEX X_CUL_INS_CRS_UID ON course_user_link (instructor, course_id, user_id); CREATE INDEX X_CC_CDV_CCS_CBD_CID ON course_control (course_delivery, course_cross_section, course_begin_date, course_id); If queries where cross_section IS NULL (especially in combination with the other fields in this index) are very common, while the opposite is quite rare, you may want to add a WHERE-clause with that condition to this index. CREATE INDEX X_IAS ON IAS_UID ON instr_as_stutemp (user_id); 4) beware of using reserved words for the name of a database object like ROLE ! Good advise, but not really needed in the case of aliases I think. There's also the possibility to quote those fields as identifiers (which also makes them case-sensitive, so beware!) - in this case that would be ROLE. Le 16/03/2011 16:49, Davenport, Julie a écrit : select course_id AS EXTERNAL_COURSE_KEY, user_id AS EXTERNAL_PERSON_KEY, 'Student' AS ROLE, 'Y' AS AVAILABLE_IND from course_user_link where instructor = false and course_id in ( select course_id from course_control where to_char(course_begin_date,'MMDD') IN ( '20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307' ) and course_delivery LIKE 'O%' and course_cross_section IS NULL ) and user_id not in (select user_id from instr_as_stutemp) -- Frédéric BROUARD - expert SGBDR et SQL - MVP SQL Server - 06 11 86 40 66 Le site sur le langage SQL et les SGBDR : http://sqlpro.developpez.com Enseignant Arts Métiers PACA, ISEN Toulon et CESI/EXIA Aix en Provence Audit, conseil, expertise, formation, modélisation, tuning, optimisation *** http://www.sqlspot.com * -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general Alban Hertroys -- Screwing up is an excellent way to attach something to the ceiling. !DSPAM:737,4d8852ad651346607679948! -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
Incredible! Setting enable_nestloop off temporarily for the run of this script made it run in less than a minute (had been running in 10 or 11 minutes). I think you have found a solution for many of my slow running scripts that use these same type of joins. Thanks again. Julie Nice. Can you post EXPLAIN ANALYZE again, so that we can see why this plan was evaluated as as more expensive before disabling nested loops? regards Tomas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
On Mon, Mar 21, 2011 at 11:32 AM, t...@fuzzy.cz wrote: Incredible! Setting enable_nestloop off temporarily for the run of this script made it run in less than a minute (had been running in 10 or 11 minutes). I think you have found a solution for many of my slow running scripts that use these same type of joins. Thanks again. Julie Nice. Can you post EXPLAIN ANALYZE again, so that we can see why this plan was evaluated as as more expensive before disabling nested loops? well the problem is obvious -- the planner is estimating ~ 250 loops, when it in fact has to do ~ 60k. That's a two orders of magnitude miss. merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
On Mon, Mar 21, 2011 at 11:32 AM, t...@fuzzy.cz wrote: Incredible! Setting enable_nestloop off temporarily for the run of this script made it run in less than a minute (had been running in 10 or 11 minutes). I think you have found a solution for many of my slow running scripts that use these same type of joins. Thanks again. Julie Nice. Can you post EXPLAIN ANALYZE again, so that we can see why this plan was evaluated as as more expensive before disabling nested loops? well the problem is obvious -- the planner is estimating ~ 250 loops, when it in fact has to do ~ 60k. That's a two orders of magnitude miss. Yeah, you're right, although I think the estimate is 1 loop vs. 2882 loops in reality. The 250 vs. 60k is related to the result set. Tomas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
Incredible! Setting enable_nestloop off temporarily for the run of this script made it run in less than a minute (had been running in 10 or 11 minutes). I think you have found a solution for many of my slow running scripts that use these same type of joins. Thanks again. Julie -Original Message- From: Merlin Moncure [mailto:mmonc...@gmail.com] Sent: Friday, March 18, 2011 1:18 PM To: Davenport, Julie Cc: Tomas Vondra; pgsql-general@postgresql.org Subject: Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade On Fri, Mar 18, 2011 at 10:42 AM, Davenport, Julie jdavenp...@ctcd.edu wrote: This helped, is now down from 14.9 min to 10.9 min to run the entire script. Thanks. can you try disabling nestloop and see what happens? In the session, before running the query, isssue: set enable_nestloop = false; merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
Try this : 1) rewrite your query as is : select course_id AS EXTERNAL_COURSE_KEY, user_id AS EXTERNAL_PERSON_KEY, 'Student' AS ROLE, 'Y' AS AVAILABLE_IND from course_user_link AS CUL INNER JOIN course_control AS CC ON CUL.course_id = CC.course_id where CUL.instructor = false AND CC.course_begin_date::date IN ( '20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307' ) and CC.course_delivery LIKE 'O%' and CC.course_cross_section IS NULL and NOT EXISTS(SELECT * FROM instr_as_stutemp AS IAS WHERE C.user_id = IAS.user_id) 2) prefix all tables by your SQL schema (public by default) 3) create theses indexes (if not) : CREATE INDEX X_CUL_INS_CRS_UID ON course_user_link (instructor, course_id, user_id); CREATE INDEX X_CC_CDV_CCS_CBD_CID ON course_control (course_delivery, course_cross_section, course_begin_date, course_id); CREATE INDEX X_IAS ON IAS_UID ON instr_as_stutemp (user_id); 4) beware of using reserved words for the name of a database object like ROLE ! A + Le 16/03/2011 16:49, Davenport, Julie a écrit : select course_id AS EXTERNAL_COURSE_KEY, user_id AS EXTERNAL_PERSON_KEY, 'Student' AS ROLE, 'Y' AS AVAILABLE_IND from course_user_link where instructor = false and course_id in ( select course_id from course_control where to_char(course_begin_date,'MMDD') IN ( '20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307' ) and course_delivery LIKE 'O%' and course_cross_section IS NULL ) and user_id not in (select user_id from instr_as_stutemp) -- Frédéric BROUARD - expert SGBDR et SQL - MVP SQL Server - 06 11 86 40 66 Le site sur le langage SQL et les SGBDR : http://sqlpro.developpez.com Enseignant Arts Métiers PACA, ISEN Toulon et CESI/EXIA Aix en Provence Audit, conseil, expertise, formation, modélisation, tuning, optimisation *** http://www.sqlspot.com * -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
Dne 18.3.2011 19:18, Merlin Moncure napsal(a): On Fri, Mar 18, 2011 at 10:42 AM, Davenport, Julie jdavenp...@ctcd.edu wrote: This helped, is now down from 14.9 min to 10.9 min to run the entire script. Thanks. can you try disabling nestloop and see what happens? In the session, before running the query, isssue: set enable_nestloop = false; Yes, that'd be interesting. And provide 'exaplain analyze' as before (using explain.depesz.com), please. And a bit unrelated recommendation - based on the settings (cost constants, work_mem etc.) it seems guess you have a default untuned postgresql.conf. Is that right, Julie? In this case you can significantly improve the load performance by several settings: 1) increase checkpoint_segments (default is 3, use 12 or something like that - depends on the if there are warnings about checkpoint segments in the log) 2) increase wal_buffers (just set it to 16MB and forget it) The effect depends on the amount of data loaded and other things, but it's worth a try. regards Tomas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
This helped, is now down from 14.9 min to 10.9 min to run the entire script. Thanks. Dne 17.3.2011 19:29, Davenport, Julie napsal(a): I have not yet had time to try Tomas' suggestion of bumping up the work_mem first (trying to figure out how to do that from within a coldfusion script). Many thanks for all your help guys! Well, just execute this 'SQL query' just like the other ones set work_mem='8MB' and it will increase the amount of memory for that connection. Tomas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
Dne 18.3.2011 16:42, Davenport, Julie napsal(a): This helped, is now down from 14.9 min to 10.9 min to run the entire script. Thanks. Still, it's way slower than the 8.0 :-( regards Tomas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
On Fri, Mar 18, 2011 at 10:42 AM, Davenport, Julie jdavenp...@ctcd.edu wrote: This helped, is now down from 14.9 min to 10.9 min to run the entire script. Thanks. can you try disabling nestloop and see what happens? In the session, before running the query, isssue: set enable_nestloop = false; merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
FYI, I implemented Pavel's suggestion to use: course_begin_date::date IN ( '20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307' ) instead of to_char(course_begin_date,'MMDD') IN ( '20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307' ) and it did help significantly. The overall script (where there are several queries like this one) was taking 7.5 mins on Postgres 8.0 and initially took 20 mins on 8.4; but now after this change (::date) it only takes 14.9 mins. Progress! I have not yet had time to try Tomas' suggestion of bumping up the work_mem first (trying to figure out how to do that from within a coldfusion script). Many thanks for all your help guys! Julie -Original Message- From: Pavel Stehule [mailto:pavel.steh...@gmail.com] Sent: Thursday, March 17, 2011 12:13 AM To: Davenport, Julie Cc: Tomas Vondra; pgsql-general@postgresql.org Subject: Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade 2011/3/16 Davenport, Julie jdavenp...@ctcd.edu: Yes, the column course_begin_date is a timestamp, so that would not work in this instance, but I will keep that in mind for future use elsewhere. I agree, there are ways to rewrite this query, just wondering which is best to take advantage of 8.4. Thanks much. ok, sorry, do column_course_begin::date = ... :) Pavel -Original Message- From: Tomas Vondra [mailto:t...@fuzzy.cz] Sent: Wednesday, March 16, 2011 4:40 PM To: Pavel Stehule Cc: pgsql-general@postgresql.org; Davenport, Julie Subject: Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade Dne 16.3.2011 22:31, Pavel Stehule napsal(a): 2011/3/16 Tomas Vondra t...@fuzzy.cz: Dne 16.3.2011 21:38, Davenport, Julie napsal(a): OK, I did the explain analyze on both sides (using a file for output instead) and used the tool you suggested. 8.0 - http://explain.depesz.com/s/Wam 8.4 - http://explain.depesz.com/s/asJ Great, that's exactly what I asked for. I'll repost that to the mailing list so that the others can check it too. When I run the queries I get 59,881 rows on the 8.0 side and 59,880 on the 8.4 side, which is what I expect because 8.4 side was updated a couple hours later and some minor changes make sense. Hm, obviously both versions got the row estimates wrong, but the 8.4 difference (200x) is much bigger that the 8.0 (10x). This might be one of the reasons why a different plan is chosen. the expression to_char(course_begin_date, 'MMDD'::text) = '20101025'::text should be a problem much better is test on equality in date domain like: course_begin_date = to_date('20101025', 'MMDD') this is faster and probably better estimated Which is not going to work if the course_begin_date column is a timestamp, because of the time part. But yes, there are several ways to improve this query, yet it does not explain why the 8.4 is so much slower. Tomas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
Dne 17.3.2011 19:29, Davenport, Julie napsal(a): I have not yet had time to try Tomas' suggestion of bumping up the work_mem first (trying to figure out how to do that from within a coldfusion script). Many thanks for all your help guys! Well, just execute this 'SQL query' just like the other ones set work_mem='8MB' and it will increase the amount of memory for that connection. Tomas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] query taking much longer since Postgres 8.4 upgrade
When I run the following query in Postgres 8.0, it runs in 61,509.372 ms When I run it in Postgres 8.4, it runs in 397,857.472 ms Here is the query: select course_id AS EXTERNAL_COURSE_KEY, user_id AS EXTERNAL_PERSON_KEY, 'Student' AS ROLE, 'Y' AS AVAILABLE_IND from course_user_link where instructor = false and course_id in ( select course_id from course_control where to_char(course_begin_date,'MMDD') IN ( '20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307' ) and course_delivery LIKE 'O%' and course_cross_section IS NULL ) and user_id not in (select user_id from instr_as_stutemp) (table instr_as_stutemp has just one column and only 4 rows) What new feature of Postgres 8.4 would be making the query run so much more slowly? Is there a better way to rewrite the query for 8.4 to make it run faster? Many thanks, Julie
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
Output of explain (and as likely, explain analyze) for this would be helpful. A On Wed, Mar 16, 2011 at 10:49:24AM -0500, Davenport, Julie wrote: When I run the following query in Postgres 8.0, it runs in 61,509.372 ms When I run it in Postgres 8.4, it runs in 397,857.472 ms Here is the query: select course_id AS EXTERNAL_COURSE_KEY, user_id AS EXTERNAL_PERSON_KEY, 'Student' AS ROLE, 'Y' AS AVAILABLE_IND from course_user_link where instructor = false and course_id in ( select course_id from course_control where to_char(course_begin_date,'MMDD') IN ( '20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307' ) and course_delivery LIKE 'O%' and course_cross_section IS NULL ) and user_id not in (select user_id from instr_as_stutemp) (table instr_as_stutemp has just one column and only 4 rows) What new feature of Postgres 8.4 would be making the query run so much more slowly? Is there a better way to rewrite the query for 8.4 to make it run faster? Many thanks, Julie -- Andrew Sullivan a...@crankycanuck.ca -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
When I run the following query in Postgres 8.0, it runs in 61,509.372 ms When I run it in Postgres 8.4, it runs in 397,857.472 ms As Andrew already pointed out, we need to se EXPLAIN ANALYZE output from both machines to see why this happens. Are you running both queries on the same data, or is there much more data in 8.4? Have you analyzed the tables recently (or is autovacuum running)? BTW I see two possible issues with this query: 1) There's not a suitable index on course_begin_date, i.e. there's no index at all or the index is not on the expression used in the query. 2) There's not a suitable index on course_delivery (it needs to use the proper operator class). regards Tomas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
On Wed, Mar 16, 2011 at 10:49 AM, Davenport, Julie jdavenp...@ctcd.edu wrote: When I run the following query in Postgres 8.0, it runs in 61,509.372 ms When I run it in Postgres 8.4, it runs in 397,857.472 ms Here is the query: select course_id AS EXTERNAL_COURSE_KEY, user_id AS EXTERNAL_PERSON_KEY, 'Student' AS ROLE, 'Y' AS AVAILABLE_IND from course_user_link where instructor = false and course_id in ( select course_id from course_control where to_char(course_begin_date,'MMDD') IN ( '20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307' ) and course_delivery LIKE 'O%' and course_cross_section IS NULL ) and user_id not in (select user_id from instr_as_stutemp) (table instr_as_stutemp has just one column and only 4 rows) What new feature of Postgres 8.4 would be making the query run so much more slowly? Is there a better way to rewrite the query for 8.4 to make it run faster? another common problem following upgrades are locale issues -- what is your setting for lc_collate? merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
On Wed, Mar 16, 2011 at 2:14 PM, Davenport, Julie jdavenp...@ctcd.edu wrote: Hello Merlin, Thank you very much for your reply. I don't see any setting for lc_collate. I assume it would be in postgresql.conf file if it were there? These are the only lc_... settings I see in postgresql.conf: lc_messages = 'en_US.UTF-8' # locale for system error message lc_monetary = 'en_US.UTF-8' # locale for monetary formatting lc_numeric = 'en_US.UTF-8' # locale for number formatting lc_time = 'en_US.UTF-8' # locale for time formatting Am I looking in the wrong place? Thanks much, Julie Julie A. Davenport julie.davenp...@ctcd.edu -Original Message- From: Merlin Moncure [mailto:mmonc...@gmail.com] Sent: Wednesday, March 16, 2011 1:37 PM To: Davenport, Julie Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade On Wed, Mar 16, 2011 at 10:49 AM, Davenport, Julie jdavenp...@ctcd.edu wrote: When I run the following query in Postgres 8.0, it runs in 61,509.372 ms When I run it in Postgres 8.4, it runs in 397,857.472 ms Here is the query: select course_id AS EXTERNAL_COURSE_KEY, user_id AS EXTERNAL_PERSON_KEY, 'Student' AS ROLE, 'Y' AS AVAILABLE_IND from course_user_link where instructor = false and course_id in ( select course_id from course_control where to_char(course_begin_date,'MMDD') IN ( '20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307' ) and course_delivery LIKE 'O%' and course_cross_section IS NULL ) and user_id not in (select user_id from instr_as_stutemp) (table instr_as_stutemp has just one column and only 4 rows) What new feature of Postgres 8.4 would be making the query run so much more slowly? Is there a better way to rewrite the query for 8.4 to make it run faster? another common problem following upgrades are locale issues -- what is your setting for lc_collate? from psql, do: show lc_collate; more than likely, your lc_collate is set to UTF8, that means that where a like 'foo%' will not use index, which is starting to sound like your problem. unfortunately, database collation is only settable when database is created (or more typically with initdb). merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
Dne 16.3.2011 20:32, Merlin Moncure napsal(a): On Wed, Mar 16, 2011 at 2:14 PM, Davenport, Julie jdavenp...@ctcd.edu wrote: Hello Merlin, Thank you very much for your reply. I don't see any setting for lc_collate. I assume it would be in postgresql.conf file if it were there? These are the only lc_... settings I see in postgresql.conf: lc_messages = 'en_US.UTF-8' # locale for system error message lc_monetary = 'en_US.UTF-8' # locale for monetary formatting lc_numeric = 'en_US.UTF-8' # locale for number formatting lc_time = 'en_US.UTF-8'# locale for time formatting Am I looking in the wrong place? Thanks much, Julie Julie A. Davenport julie.davenp...@ctcd.edu -Original Message- From: Merlin Moncure [mailto:mmonc...@gmail.com] Sent: Wednesday, March 16, 2011 1:37 PM To: Davenport, Julie Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade On Wed, Mar 16, 2011 at 10:49 AM, Davenport, Julie jdavenp...@ctcd.edu wrote: When I run the following query in Postgres 8.0, it runs in 61,509.372 ms When I run it in Postgres 8.4, it runs in 397,857.472 ms Here is the query: select course_id AS EXTERNAL_COURSE_KEY, user_id AS EXTERNAL_PERSON_KEY, 'Student' AS ROLE, 'Y' AS AVAILABLE_IND from course_user_link where instructor = false and course_id in ( select course_id from course_control where to_char(course_begin_date,'MMDD') IN ( '20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307' ) and course_delivery LIKE 'O%' and course_cross_section IS NULL ) and user_id not in (select user_id from instr_as_stutemp) (table instr_as_stutemp has just one column and only 4 rows) What new feature of Postgres 8.4 would be making the query run so much more slowly? Is there a better way to rewrite the query for 8.4 to make it run faster? another common problem following upgrades are locale issues -- what is your setting for lc_collate? from psql, do: show lc_collate; more than likely, your lc_collate is set to UTF8, that means that where a like 'foo%' will not use index, which is starting to sound like your problem. That probably is not the root cause here, according to a description I've received off the list. There are no indexes at all (which is an issue on it's own), but it means this kind of issue is not possible. Or maybe there's a different lc_collate with much more complex rules? That might result in very CPU-expensive operations, and as there's a lot of sequential scans etc. (i.e. a lot of rows to process). regards Tomas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
Dne 16.3.2011 21:38, Davenport, Julie napsal(a): OK, I did the explain analyze on both sides (using a file for output instead) and used the tool you suggested. 8.0 - http://explain.depesz.com/s/Wam 8.4 - http://explain.depesz.com/s/asJ Great, that's exactly what I asked for. I'll repost that to the mailing list so that the others can check it too. When I run the queries I get 59,881 rows on the 8.0 side and 59,880 on the 8.4 side, which is what I expect because 8.4 side was updated a couple hours later and some minor changes make sense. Hm, obviously both versions got the row estimates wrong, but the 8.4 difference (200x) is much bigger that the 8.0 (10x). This might be one of the reasons why a different plan is chosen. Anyway both versions underestimate the course_control subquery, as they believe there will be 1 row only, but in reality there's 2882 of them :-( After your first email I did a vacuum full analyze on the 8.4 side on each of the tables in the schema that the views cover, then ran the query again, and it took even longer - up from 397,857 ms to 412,862 ms. Another query that was taking 597248 ms before the vacuum/analyze took 617526 ms after. I don't understand why, but this is generally the experience we've had with vacuum/analyze on these particular tables. We do large numbers of deletes and inserts to them every day, so I would think they would benefit from it. OK, so the tables were in a quite good shape - not bloated etc. The slight increase is negligible I guess, the vacuum probably removed the data from shared buffers or something like that. I did do a vacuum full analyze on instr_as_stutemp before the explain analyze, as you suggested. OK, now the row estimate is correct Seq Scan on instr_as_stutemp (cost=0.00..1.04 rows=4 width=9) (actual time=0.052..0.098 rows=4 loops=1) I will consider the indexes and do some benchmark testing (I have considered the 'drop-load-reindex' trick in the past). I'm sure increasing maintenance_work_mem will help. OK. But the question why the plan changed this way still remains unanswered (or maybe it does and I don't see it). One thing I've noticed is this difference in estimates: 8.0: Subquery Scan course_control (cost=9462700.13..9492043.53 rows=1 width=32) (actual time=43368.204..45795.239 rows=2882 loops=1) * Filter: (((to_char(course_begin_date, 'MMDD'::text) = '20100412'::text) OR (to_char(course_begin_date, 'MMDD'::text) = ... '20110307'::text)) AND ((course_delivery)::text ~~ 'O%'::text) AND (course_cross_section IS NULL)) 8.4: Subquery Scan course_control (cost=18710.12..548966.51 rows=1 width=32) (actual time=1632.403..4438.949 rows=2882 loops=1) * Filter: ((course_control.course_cross_section IS NULL) AND ((course_control.course_delivery)::text ~~ 'O%'::text) AND (to_char(course_control.course_begin_date, 'MMDD'::text) = ANY ('{20100412,20100510,...,20110110,20110207,20110307}'::text[]))) I.e. both verions use seqscan, both estimate the same number of rows (incorrectly), yet the estimated cost is very different (9492043 vs. 548966). Maybe the cost estimation really changed between 8.0 and 8.4, but just for sure - what are the cost values? I mean what is set for those config values: cpu_index_tuple_cost cpu_operator_cost cpu_tuple_cost random_page_cost seq_page_cost work_mem (use 'show' to see the actual value). regards Tomas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
2011/3/16 Tomas Vondra t...@fuzzy.cz: Dne 16.3.2011 21:38, Davenport, Julie napsal(a): OK, I did the explain analyze on both sides (using a file for output instead) and used the tool you suggested. 8.0 - http://explain.depesz.com/s/Wam 8.4 - http://explain.depesz.com/s/asJ Great, that's exactly what I asked for. I'll repost that to the mailing list so that the others can check it too. When I run the queries I get 59,881 rows on the 8.0 side and 59,880 on the 8.4 side, which is what I expect because 8.4 side was updated a couple hours later and some minor changes make sense. Hm, obviously both versions got the row estimates wrong, but the 8.4 difference (200x) is much bigger that the 8.0 (10x). This might be one of the reasons why a different plan is chosen. the expression to_char(course_begin_date, 'MMDD'::text) = '20101025'::text should be a problem much better is test on equality in date domain like: course_begin_date = to_date('20101025', 'MMDD') this is faster and probably better estimated Regards Pavel Stehule -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
Dne 16.3.2011 22:31, Pavel Stehule napsal(a): 2011/3/16 Tomas Vondra t...@fuzzy.cz: Dne 16.3.2011 21:38, Davenport, Julie napsal(a): OK, I did the explain analyze on both sides (using a file for output instead) and used the tool you suggested. 8.0 - http://explain.depesz.com/s/Wam 8.4 - http://explain.depesz.com/s/asJ Great, that's exactly what I asked for. I'll repost that to the mailing list so that the others can check it too. When I run the queries I get 59,881 rows on the 8.0 side and 59,880 on the 8.4 side, which is what I expect because 8.4 side was updated a couple hours later and some minor changes make sense. Hm, obviously both versions got the row estimates wrong, but the 8.4 difference (200x) is much bigger that the 8.0 (10x). This might be one of the reasons why a different plan is chosen. the expression to_char(course_begin_date, 'MMDD'::text) = '20101025'::text should be a problem much better is test on equality in date domain like: course_begin_date = to_date('20101025', 'MMDD') this is faster and probably better estimated Which is not going to work if the course_begin_date column is a timestamp, because of the time part. But yes, there are several ways to improve this query, yet it does not explain why the 8.4 is so much slower. Tomas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
OK, so the cost constants are equal in both versions (the only difference is due to change of the default value). Just out of curiosity, have you tried to throw a bit more work_mem at the query? Try something like 8MB or 16MB so - just do this db=$ set work_mem=8192 and then run the query (the change is valid in that session only, the other sessions will still use 1MB). Most of the sorts was performed on-disk insted of in memory, and it might result in better plan. regards Tomas Dne 16.3.2011 22:40, Davenport, Julie napsal(a): Tomas, Here are the settings on the 8.0 side: srn_mst=# show cpu_index_tuple_cost; cpu_index_tuple_cost -- 0.001 (1 row) srn_mst=# show cpu_operator_cost; cpu_operator_cost --- 0.0025 (1 row) srn_mst=# show cpu_tuple_cost; cpu_tuple_cost 0.01 (1 row) srn_mst=# show random_page_cost; random_page_cost -- 4 (1 row) srn_mst=# show seq_page_cost; ERROR: unrecognized configuration parameter seq_page_cost srn_mst=# show work_mem; work_mem -- 1024 (1 row) Here are the settings on the 8.4 side: srn_mst=# show cpu_index_tuple_cost; cpu_index_tuple_cost -- 0.005 (1 row) srn_mst=# show cpu_operator_cost; cpu_operator_cost --- 0.0025 (1 row) srn_mst=# show cpu_tuple_cost; cpu_tuple_cost 0.01 (1 row) srn_mst=# show random_page_cost; random_page_cost -- 4 (1 row) srn_mst=# show seq_page_cost; seq_page_cost --- 1 (1 row) srn_mst=# show work_mem; work_mem -- 1MB (1 row) Thanks, Julie -Original Message- From: Tomas Vondra [mailto:t...@fuzzy.cz] Sent: Wednesday, March 16, 2011 4:23 PM To: pgsql-general@postgresql.org Cc: Davenport, Julie Subject: Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade Dne 16.3.2011 21:38, Davenport, Julie napsal(a): OK, I did the explain analyze on both sides (using a file for output instead) and used the tool you suggested. 8.0 - http://explain.depesz.com/s/Wam 8.4 - http://explain.depesz.com/s/asJ Great, that's exactly what I asked for. I'll repost that to the mailing list so that the others can check it too. When I run the queries I get 59,881 rows on the 8.0 side and 59,880 on the 8.4 side, which is what I expect because 8.4 side was updated a couple hours later and some minor changes make sense. Hm, obviously both versions got the row estimates wrong, but the 8.4 difference (200x) is much bigger that the 8.0 (10x). This might be one of the reasons why a different plan is chosen. Anyway both versions underestimate the course_control subquery, as they believe there will be 1 row only, but in reality there's 2882 of them :-( After your first email I did a vacuum full analyze on the 8.4 side on each of the tables in the schema that the views cover, then ran the query again, and it took even longer - up from 397,857 ms to 412,862 ms. Another query that was taking 597248 ms before the vacuum/analyze took 617526 ms after. I don't understand why, but this is generally the experience we've had with vacuum/analyze on these particular tables. We do large numbers of deletes and inserts to them every day, so I would think they would benefit from it. OK, so the tables were in a quite good shape - not bloated etc. The slight increase is negligible I guess, the vacuum probably removed the data from shared buffers or something like that. I did do a vacuum full analyze on instr_as_stutemp before the explain analyze, as you suggested. OK, now the row estimate is correct Seq Scan on instr_as_stutemp (cost=0.00..1.04 rows=4 width=9) (actual time=0.052..0.098 rows=4 loops=1) I will consider the indexes and do some benchmark testing (I have considered the 'drop-load-reindex' trick in the past). I'm sure increasing maintenance_work_mem will help. OK. But the question why the plan changed this way still remains unanswered (or maybe it does and I don't see it). One thing I've noticed is this difference in estimates: 8.0: Subquery Scan course_control (cost=9462700.13..9492043.53 rows=1 width=32) (actual time=43368.204..45795.239 rows=2882 loops=1) * Filter: (((to_char(course_begin_date, 'MMDD'::text) = '20100412'::text) OR (to_char(course_begin_date, 'MMDD'::text) = ... '20110307'::text)) AND ((course_delivery)::text ~~ 'O%'::text) AND (course_cross_section IS NULL)) 8.4: Subquery Scan course_control (cost=18710.12..548966.51 rows=1 width=32) (actual time=1632.403..4438.949 rows=2882 loops=1) * Filter: ((course_control.course_cross_section IS NULL) AND ((course_control.course_delivery)::text ~~ 'O%'::text) AND (to_char
Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
2011/3/16 Davenport, Julie jdavenp...@ctcd.edu: Yes, the column course_begin_date is a timestamp, so that would not work in this instance, but I will keep that in mind for future use elsewhere. I agree, there are ways to rewrite this query, just wondering which is best to take advantage of 8.4. Thanks much. ok, sorry, do column_course_begin::date = ... :) Pavel -Original Message- From: Tomas Vondra [mailto:t...@fuzzy.cz] Sent: Wednesday, March 16, 2011 4:40 PM To: Pavel Stehule Cc: pgsql-general@postgresql.org; Davenport, Julie Subject: Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade Dne 16.3.2011 22:31, Pavel Stehule napsal(a): 2011/3/16 Tomas Vondra t...@fuzzy.cz: Dne 16.3.2011 21:38, Davenport, Julie napsal(a): OK, I did the explain analyze on both sides (using a file for output instead) and used the tool you suggested. 8.0 - http://explain.depesz.com/s/Wam 8.4 - http://explain.depesz.com/s/asJ Great, that's exactly what I asked for. I'll repost that to the mailing list so that the others can check it too. When I run the queries I get 59,881 rows on the 8.0 side and 59,880 on the 8.4 side, which is what I expect because 8.4 side was updated a couple hours later and some minor changes make sense. Hm, obviously both versions got the row estimates wrong, but the 8.4 difference (200x) is much bigger that the 8.0 (10x). This might be one of the reasons why a different plan is chosen. the expression to_char(course_begin_date, 'MMDD'::text) = '20101025'::text should be a problem much better is test on equality in date domain like: course_begin_date = to_date('20101025', 'MMDD') this is faster and probably better estimated Which is not going to work if the course_begin_date column is a timestamp, because of the time part. But yes, there are several ways to improve this query, yet it does not explain why the 8.4 is so much slower. Tomas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general