Re: Regarding performance regression on specific query
Thanks for the comment. We also have several performance regression cases that we found from TPC-C benchmark. Since those queries were not executed on empty relation, they will be more interesting. We will report to pgsql-performance mailing list next time. Jinho Jung From: Tom Lane Sent: Saturday, November 24, 2018 3:32:41 PM To: Amit Langote Cc: Jung, Jinho; pgsql-hack...@postgresql.org Subject: Re: Regarding performance regression on specific query Amit Langote writes: > On 2018/11/20 2:49, Jung, Jinho wrote: >> [ assorted queries ] > I noticed that these two are fixed by running ANALYZE in the database in > which these queries are run. That didn't help much for me. What did help was increasing join_collapse_limit and from_collapse_limit to not limit the join search space --- on queries with as many input relations as these, you're really at the mercy of whether the given query structure represents a good join order if you don't. In general I can't get even a little bit excited about the quality of the plans selected for these examples, as they all involve made-up restriction and join clauses that the planner isn't going to have the slightest clue about. The observations boil down to "9.4 made one set of arbitrary plan choices, while v10 made a different set of arbitrary plan choices, and on these particular examples 9.4 got lucky and 10 didn't". Possibly also worth noting is that running these in an empty database is in itself kind of a worst case, because many of the tables are empty to start with (or the restriction/join clauses pass no rows), and so the fastest runtime tends to go to plans of the form "nestloop with empty relation on the outside and all the expensive stuff on the inside". (Observe all the "(never executed)" notations in the EXPLAIN output.) This kind of plan wins only when the outer rel is actually empty, otherwise it can easily lose big, and therefore PG's planner is intentionally designed to discount the case entirely. We never believe that a relation is empty, unless we can mathematically prove that, and our cost estimates are never made with an eye to exploiting such cases. This contributes a lot to the random-chance nature of which plan is actually fastest; the planner isn't expecting "(never executed)" to happen and doesn't prefer plans that will win if it does happen. regards, tom lane
Re: Regarding performance regression on specific query
Thanks for the test. We are wondering how ANALYZE mitigated regression from query "1.sql" and "4.sql". We followed this procedure but still observe performance regression: 1) run ANALYZE on used table_name analyze pg_catalog.pg_ts_parser; analyze information_schema.column_options; analyze pg_catalog.pg_aggregate; analyze pg_catalog.pg_inherits; analyze pg_catalog.pg_aggregate; analyze pg_catalog.pg_rewrite; analyze pg_catalog.pg_stat_user_indexes; analyze pg_catalog.pg_stat_user_tables; analyze pg_catalog.pg_attribute; analyze information_schema.column_privileges; analyze pg_catalog.pg_user_mapping; analyze pg_catalog.pg_type; analyze pg_catalog.pg_shseclabel; analyze pg_catalog.pg_statio_sys_sequences; analyze information_schema.role_routine_grants; analyze pg_catalog.pg_type; analyze information_schema.user_mapping_options; analyze pg_catalog.pg_stat_xact_sys_tables; 2) execute the same query We have more cases. Do you think we should report them through the bug report website? (https://www.postgresql.org/account/login/?next=/account/submitbug/) Jinho Jung From: Amit Langote Sent: Tuesday, November 20, 2018 2:47:54 AM To: Jung, Jinho; pgsql-hack...@postgresql.org Subject: Re: Regarding performance regression on specific query Hi, On 2018/11/20 2:49, Jung, Jinho wrote: > Execution time > = > 1.sql > 10.6 : 469 ms > 9.4.20: 10 ms > > 4.sql > 10.6 : 34019 ms > 9.4.20: 0.4 ms I noticed that these two are fixed by running ANALYZE in the database in which these queries are run. > 20.sql > 10.6 : 2791 ms > 9.4.20: 61 ms This one may be suffering from a more serious planning issue, as doing ANALYZE didn't help for this one. Will have to look closely at how the plan is changing for worse. Thanks, Amit
Regarding performance regression on specific query
Hello, I am Jinho Jung, PhD student at Georgia Tech and I am conducting DB performance research. I am sending our evaluation result to get the feedback regarding our findings. We found several performance regression queries. Attached files are three of them that we confirmed performance regression (in terms of query execution time) between v10.6 and v9.4.20. Hope you can test these queries and give us feedback. For your information, we are also sending the execution time measured on our machine. Execution time = 1.sql 10.6 : 469 ms 9.4.20: 10 ms 4.sql 10.6 : 34019 ms 9.4.20: 0.4 ms 20.sql 10.6 : 2791 ms 9.4.20: 61 ms Evaluation setup == 1) compile 9.4.20 and 10.6 from released source code (https://www.postgresql.org/ftp/source/) 2) without importing additional data, just run the attached queries We appreciate you taking time for test! Best regards, Jinho Jung 1.sql Description: 1.sql 4.sql Description: 4.sql 20.sql Description: 20.sql
Re: Regarding query minimizer (simplifier)
Hello Tom, Sorry for the misleading. Could you try these two queries? I made the query even slower in latest version of postgres. These are information about how we set up evaluation environment and query result. Thanks, Jinho Jung Install Multiple version of DBs in one machine == # Install 10.5 $ wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add - $ sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ xenial-pgdg main" > /etc/apt/sources.list.d/pgdg_xenial.list' $ sudo apt update $ sudo apt-get install postgresql-10 # Install 9.6 $ sudo apt-get install postgresql-9.6 # Install 9.5 $ sudo apt-get install postgresql-9.5 # Install 9.4 $ sudo apt-get install postgresql-9.4 postgresql-contrib-9.4 libpq-dev postgresql-server-dev-9.4 # check $ pg_lsclusters Original regression query == explain analyze select 1 from information_schema.role_usage_grants as ref_2, lateral ( select max((null)) over (partition by ref_3.amopfamily) as c8 from pg_catalog.pg_amop as ref_3 ) as subq_0 ; ORIGINAL querying time on old version(9.4/9.5): 5.7ms on latest version(10): 91.76ms CORRELATED query to maximize error === explain analyze select * from information_schema.role_usage_grants f1 where grantor = ( select max(ref_2.grantor) from information_schema.role_usage_grants as ref_2, lateral ( select max((null)) over (partition by ref_3.amopfamily) as c8 from pg_catalog.pg_amop as ref_3 ) as subq_0 where ref_2.object_catalog = f1.object_catalog ) ; CORRELATED querying time on old version(9.4/9.5): 0.6s on latest version(10): 113s 188 times slower From: Tom Lane Sent: Saturday, October 13, 2018 5:59:06 PM To: Jung, Jinho Cc: pgsql-hackers@lists.postgresql.org Subject: Re: Regarding query minimizer (simplifier) "Jung, Jinho" writes: > Hello, I am Jinho Jung, Phd Student from GeorgiaTech, and try to find any SQL > queries that cause performance regression. While conducting evaluation, I > found an interesting query which makes x80 times slower execution in version > 10.5 than version 9.4. Please see the attached files, if you are interested. Hm, testing this in the regression database, it seems pretty speedy across all supported branches, and indeed slower in 9.4 than later branches (~25 ms vs ~10 ms). It seems likely that you're testing in a very different database, perhaps one with many more tables ... but if you don't explain the test scenario, we aren't going to have much luck investigating. regards, tom lane query_regression Description: query_regression query_much_regression Description: query_much_regression
Regarding query minimizer (simplifier)
Hello, I am Jinho Jung, Phd Student from GeorgiaTech, and try to find any SQL queries that cause performance regression. While conducting evaluation, I found an interesting query which makes x80 times slower execution in version 10.5 than version 9.4. Please see the attached files, if you are interested. To analyze the root cause of problem, I had to find minimal query that contains same problem. (i.e., same regression issue) The initial query was about 14K and I should manually remove part of query. (it took about 30 minutes to get 500 bytes query) So I am wondering if there are tools to automatically simplify statements to analyze the performance issue. Thanks, Jinho Jung query6_manually_minimize Description: query6_manually_minimize query6_original Description: query6_original