Re: [PERFORM] Speed of exist
2013/2/19 Bastiaan Olij:
> I've tried that with the same result. One subquery works beautifully,
> two subqueries with an OR and it starts to do a sequential scan...

Try rewriting the OR as two SELECTs joined by UNION ALL.

Pavel
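A sketch of that rewrite, using the table and column names from the original
query (the NOT EXISTS in the second branch keeps rows that match both Table2
and Table3 from appearing twice; a plain UNION instead of UNION ALL would also
deduplicate, at the cost of an extra sort or hash):
--
select Column1, Column2, Column3
from Table1
where exists (select 1 from Table2 where Table2.ForeignKey = Table1.PrimaryKey)
union all
select Column1, Column2, Column3
from Table1
where exists (select 1 from Table3 where Table3.ForeignKey = Table1.PrimaryKey)
  and not exists (select 1 from Table2 where Table2.ForeignKey = Table1.PrimaryKey)
--
Each branch can then use the index on its own foreign key column independently,
instead of forcing the planner to consider both tables at once.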
Re: [PERFORM] Speed of exist
Hi Andy,

I've tried that with the same result. One subquery works beautifully,
two subqueries with an OR and it starts to do a sequential scan...

Thanks,

Bastiaan Olij

On 19/02/13 6:31 PM, Andy wrote:
> Limit the sub-queries to 1, i.e.:
>
> select 1 from Table2 where Table2.ForeignKey = Table1.PrimaryKey fetch
> first 1 rows only
>
> Andy.
Re: [PERFORM] Speed of exist
Limit the sub-queries to 1, i.e.:

select 1 from Table2 where Table2.ForeignKey = Table1.PrimaryKey fetch
first 1 rows only

Andy.

On 19.02.2013 07:34, Bastiaan Olij wrote:
> Is there a way to speed this up or an alternative way of selecting
> records from Table1 which have related records in Table2 or Table3 which
> is faster?

--
Andy Gumbrecht
Research & Development
Orpro Vision GmbH
Hefehof 24, 31785, Hameln
+49 (0) 5151 809 44 21
andy.gumbre...@orprovision.com
www.orprovision.com
[PERFORM] Speed of exist
Hi All,

Hope someone can help me a little bit here:

I've got a query like the following:
--
select Column1, Column2, Column3
from Table1
where exists (select 1 from Table2 where Table2.ForeignKey = Table1.PrimaryKey)
   or exists (select 1 from Table3 where Table3.ForeignKey = Table1.PrimaryKey)
--

Looking at the query plan it is doing a sequential scan on both Table2
and Table3.

If I remove one of the subqueries and turn the query into:
--
select Column1, Column2, Column3
from Table1
where exists (select 1 from Table2 where Table2.ForeignKey = Table1.PrimaryKey)
--

It is nicely doing an index scan on the index that is on Table2.ForeignKey.

As Table2 and Table3 are rather large, the first query takes minutes
while the second query takes 18ms.

Is there a way to speed this up, or an alternative way of selecting
records from Table1 which have related records in Table2 or Table3 which
is faster?

Kindest Regards,

Bastiaan Olij
Re: [PERFORM] High CPU usage / load average after upgrading to Ubuntu 12.04
On 19/02/13 13:39, Josh Berkus wrote:
> All tests except pgBench show this system as superfast. Bonnie++ and DD
> tests are good (200 to 300mb/s), and test_fsync shows 14K/second.
> Basically it has no issues until checkpoint kicks in, at which time the
> entire system basically halts for the duration of the checkpoint.
>
> At this point, we've tried 3.2.0.26, 3.2.0.27, 3.4.0, and tried updating
> the RAID driver, and changing the IO scheduler. Nothing seems to affect
> the behavior. Testing using Ext4 (instead of XFS) next.

Might be worth looking at your vm.dirty_ratio, vm.dirty_background_ratio
and friends settings. We managed to choke up a system with 16x SSD by
leaving them at their defaults...

Cheers

Mark
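For reference, a sketch of the kind of settings being suggested (the values
are purely illustrative, not a recommendation; lower thresholds make the
kernel start flushing dirty pages earlier and in smaller bursts instead of
accumulating gigabytes and dumping them all at once when a checkpoint forces
writeback):

# /etc/sysctl.conf (illustrative values)
vm.dirty_background_bytes = 268435456   # begin background writeback at ~256MB dirty
vm.dirty_bytes = 1073741824             # force blocking writeback at ~1GB dirty
# or the older ratio-based variants:
# vm.dirty_background_ratio = 5
# vm.dirty_ratio = 10

# apply without a reboot:
sysctl -p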
Re: [PERFORM] High CPU usage / load average after upgrading to Ubuntu 12.04
On Mon, Feb 18, 2013 at 5:39 PM, Josh Berkus wrote:
> Scott,
>
>> So do you have generally slow IO, or is it fsync behavior etc?
>
> All tests except pgBench show this system as superfast. Bonnie++ and DD
> tests are good (200 to 300mb/s), and test_fsync shows 14K/second.
> Basically it has no issues until checkpoint kicks in, at which time the
> entire system basically halts for the duration of the checkpoint.

I assume you've made attempts at write levelling to reduce the impact of
checkpoints etc.
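In PostgreSQL terms, spreading checkpoint writes is usually done with settings
along these lines in postgresql.conf (values shown only as an illustration;
checkpoint_segments applies to the 9.x releases current at the time of this
thread):

checkpoint_segments = 64              # allow more WAL between checkpoints
checkpoint_timeout = 15min            # let time, not WAL volume, trigger most checkpoints
checkpoint_completion_target = 0.9    # spread the writes over ~90% of the interval
log_checkpoints = on                  # log start/end so stalls can be correlated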
Re: [PERFORM] High CPU usage / load average after upgrading to Ubuntu 12.04
> Did you try turning barriers on or off *manually* (explicitly)? With
> LSI and barriers *on* and ext4 I had less-optimal performance. With
> Linux MD or (some) 3Ware configurations I had no performance hit.

They're off in fstab.

/dev/sdd1 on /data type xfs (rw,noatime,nodiratime,nobarrier)

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [PERFORM] High CPU usage / load average after upgrading to Ubuntu 12.04
On Mon, Feb 18, 2013 at 6:39 PM, Josh Berkus wrote:
> All tests except pgBench show this system as superfast. Bonnie++ and DD
> tests are good (200 to 300mb/s), and test_fsync shows 14K/second.
> Basically it has no issues until checkpoint kicks in, at which time the
> entire system basically halts for the duration of the checkpoint.
>
> At this point, we've tried 3.2.0.26, 3.2.0.27, 3.4.0, and tried updating
> the RAID driver, and changing the IO scheduler. Nothing seems to affect
> the behavior. Testing using Ext4 (instead of XFS) next.

Did you try turning barriers on or off *manually* (explicitly)? With
LSI and barriers *on* and ext4 I had less-optimal performance. With
Linux MD or (some) 3Ware configurations I had no performance hit.

--
Jon
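For anyone wanting to toggle this explicitly, the mount options differ by
filesystem; a sketch (the device and mount point are site-specific, and
disabling barriers is only safe behind a battery- or flash-backed write
cache):

# /etc/fstab
# xfs: barriers are on by default; disable explicitly with nobarrier
/dev/sdd1  /data  xfs   rw,noatime,nodiratime,nobarrier  0 0
# ext4: barrier=1 turns barriers on explicitly, barrier=0 turns them off
/dev/sdd1  /data  ext4  rw,noatime,barrier=0             0 0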
Re: [PERFORM] High CPU usage / load average after upgrading to Ubuntu 12.04
Scott,

> So do you have generally slow IO, or is it fsync behavior etc?

All tests except pgBench show this system as superfast. Bonnie++ and DD
tests are good (200 to 300mb/s), and test_fsync shows 14K/second.
Basically it has no issues until checkpoint kicks in, at which time the
entire system basically halts for the duration of the checkpoint.

For that matter, if I run a pgbench and halt it just before checkpoint
kicks in, I get around 12000TPS, which is what I'd expect on this system.

At this point, we've tried 3.2.0.26, 3.2.0.27, 3.4.0, and tried updating
the RAID driver, and changing the IO scheduler. Nothing seems to affect
the behavior. Testing using Ext4 (instead of XFS) next.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
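For anyone trying to reproduce this kind of measurement, a minimal pgbench run
with checkpoint logging might look like the following (the scale factor,
client count, duration and the "bench" database name are illustrative only):

# one-off setup: create and populate a test database
createdb bench
pgbench -i -s 100 bench

# run 8 clients for 10 minutes; with log_checkpoints = on in postgresql.conf,
# checkpoint start/end lines in the server log can be matched against any
# collapse in throughput
pgbench -c 8 -j 4 -T 600 bench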
Re: [PERFORM] High CPU usage / load average after upgrading to Ubuntu 12.04
So, our drop in performance is now clearly due to pathological OS behavior
during checkpoints. Still trying to pin down what's going on, but it's not
system load; it's clearly related to the IO system.

Anyone else see this? I'm getting it both on 3.2 and 3.4. We're using LSI
Megaraid.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [PERFORM] slow query plans caused by under-estimation of CTE cardinality
Vitalii wrote:
> Since cte is already an optimization fence, you can go further and make
> it temporary table.
> Create table;analyze;select should make optimizer's work much easier.

Thanks Vitalii - yes, you are right, and I have used that technique on other
cases like this. However, for this one, the entire query must be executed as
a single query so that it is based on a consistent snapshot (in the
Multiversion Concurrency Control sense) of the base table data. Using the
temp table technique would allow a commit to occur which would be invisible
to the part of the query that builds the temp table but visible to the
remaining part of the query. I know I could set Repeatable Read for the
transaction to ensure the consistency, but that causes other concurrency
problems, as this query is part of a fairly long-running transaction. I
really just want this one query to avoid "dangerous" plans (meaning plans
that rely too much on an estimate of cardinality of ~1 being really correct).

I also forgot to show the fragment of the "good" plan (obtained by corrupting
the statistics). It demonstrates how effective the hash join is in
comparison - 20 minutes reduced down to 1 second for this join.

   ->  Hash Join  (cost=0.80..1.51 rows=1 width=588) (actual time=1227.517..1693.792 rows=105984 loops=1)
         Hash Cond: ((winnum.subnet_id = binoptasc.subnet_id) AND (winnum.option_code = binoptasc.option_code) AND ((winnum.option_discriminator)::text = (binoptasc.option_discriminator)::text) AND (winnum.net_rel_level = binoptasc.net_rel_level))
         Buffers: shared hit=386485 read=364
         ->  CTE Scan on winning_option_nums winnum  (cost=0.00..0.40 rows=20 width=536) (actual time=1174.558..1222.542 rows=62904 loops=1)
               Buffers: shared hit=386485 read=364
         ->  Hash  (cost=0.40..0.40 rows=20 width=584) (actual time=52.933..52.933 rows=111308 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 8644kB
               ->  CTE Scan on subnet_inhrt_options_asc binoptasc  (cost=0.00..0.40 rows=20 width=584) (actual time=0.001..21.651 rows=111308 loops=1)

John
Re: [PERFORM] slow query plans caused by under-estimation of CTE cardinality
Since cte is already an optimization fence, you can go further and make it a
temporary table. Create table; analyze; select should make the optimizer's
work much easier.

On 18 Feb 2013 18:45, "John Lumby" wrote:
> One CTE in particular has a cardinality estimate of 1 (I guess the actual
> estimate is nearer zero and rounded up) but the actual count is over 10^5.
> The planner puts this CTE as inner of a nested loop accessed by a simple
> linear CTE scan and the full query then takes over 20 minutes.
> [...]
> Would it be reasonable to provide a new Planner Configuration option:
> enable_nestloop_cte_inner (boolean)
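A minimal sketch of that pattern. The table names below are illustrative only
(some_base_table stands in for the real recursive CTE body, and
subnet_inhrt_options_asc is treated as an ordinary relation here just for the
example); the key step is the ANALYZE, which gives the planner real row counts
and column statistics for the materialized intermediate result:

BEGIN;

-- materialize what used to be the first CTE (hypothetical body shown)
CREATE TEMP TABLE winning_option_nums_tmp ON COMMIT DROP AS
SELECT subnet_id, option_code, option_discriminator, net_rel_level
FROM   some_base_table               -- stands in for the real CTE definition
WHERE  net_rel_level > 0;

-- gather real statistics on the materialized result
ANALYZE winning_option_nums_tmp;

-- the rest of the query now joins against the temp table instead of the CTE,
-- so the planner knows its true cardinality
SELECT winnum.*
FROM   winning_option_nums_tmp winnum
JOIN   subnet_inhrt_options_asc binoptasc
       ON  binoptasc.subnet_id   = winnum.subnet_id
       AND binoptasc.option_code = winnum.option_code;

COMMIT;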
[PERFORM] slow query plans caused by under-estimation of CTE cardinality
On 2012-10-09 23:09:21 Tom Lane wrote:
>
> re subject Why am I getting great/terrible estimates with these CTE queries?
> You're assuming the case where the estimate is better is better for a
> reason ... but it's only better as a result of blind dumb luck. The
> outer-level query planner doesn't know anything about the CTE's output
> except the estimated number of rows --- in particular, it doesn't drill
> down to find any statistics about the join column.
>

I am also struggling with a problem involving CTEs, although in my case it is
caused by huge *under*-estimation of cardinality rather than *over*-estimation.
The statement is quite complex and the problem arises because there is a chain
of RECURSIVE CTEs, each defined as a query involving an earlier CTE and more
tables. Eventually there is no hope for making a good cardinality estimate.

One CTE in particular has a cardinality estimate of 1 (I guess the actual
estimate is nearer zero and rounded up) but the actual count is over 10^5.
The planner puts this CTE as inner of a nested loop accessed by a simple
linear CTE scan, and the full query then takes over 20 minutes.

   ->  Nested Loop  (cost=0.00..0.06 rows=1 width=588) (actual time=2340.421..1201593.856 rows=105984 loops=1)
         Join Filter: ((winnum.subnet_id = binoptasc.subnet_id) AND (winnum.option_code = binoptasc.option_code) AND ((winnum.option_discriminator)::text = (binoptasc.option_discriminator)::text) AND (winnum.net_rel_level = binoptasc.net_rel_level))
         Rows Removed by Join Filter: 7001612448
         Buffers: shared hit=2290941
         ->  CTE Scan on winning_option_nums winnum  (cost=0.00..0.02 rows=1 width=536) (actual time=2338.422..2543.684 rows=62904 loops=1)
               Buffers: shared hit=2290941
         ->  CTE Scan on subnet_inhrt_options_asc binoptasc  (cost=0.00..0.02 rows=1 width=584) (actual time=0.000..9.728 rows=111308 loops=62904)

Whereas, (by altering various statistics to be very wrong) the entire query
runs in 21 seconds.

There have been several debates about how to address situations like this,
where no practical non-query-specific statistics-gathering scheme can ever
hope to gather enough statistics to model the later derived tables. E.g. the
frowned-on SELECTIVITY clause and ideas for query-specific statistics.

Meanwhile, I have one other suggestion aimed specifically at problematic CTEs.
Would it be reasonable to provide a new Planner Configuration option:

enable_nestloop_cte_inner (boolean)
    Enables or disables the query planner's use of nested-loop join plans in
    which a CTE is the inner. It is impossible to suppress such nested-loop
    joins entirely, but turning this variable off discourages the planner from
    using one if there are other methods available, such as sorting the CTE
    for merge-join or hashing it for hash-join. The default is on.

John
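(No such option exists today; the closest existing workaround is to disable
nested loops for just the problematic statement, which is a blunter instrument
because it affects every join in the query, e.g.:

BEGIN;
SET LOCAL enable_nestloop = off;   -- lasts only until COMMIT/ROLLBACK
-- run the problematic WITH RECURSIVE ... query here
COMMIT;

The planner will still fall back to a nested loop when no other join method is
possible, which is the same caveat the proposed option describes.)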