Re: [PERFORM] mysql to postgresql, performance questions
On Sun, Mar 21, 2010 at 9:14 PM, Dave Crooke dcro...@gmail.com wrote:
Note however that Oracle offers full transactionality and does in-place row updates.

There is more than one way to do it. There's no free lunch. If you do MVCC you have to maintain multiple versions of the same row.

merlin

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] mysql to postgresql, performance questions
On Mon, 22 Mar 2010 12:14:51 +0100, Merlin Moncure mmonc...@gmail.com wrote:
On Sun, Mar 21, 2010 at 9:14 PM, Dave Crooke dcro...@gmail.com wrote: Note however that Oracle offers full transactionality and does in-place row updates.
There is more than one way to do it. There's no free lunch.

MVCC : VACUUM
Oracle : Rollback Segments
MyISAM : no concurrency/transactions

It's all about which compromise suits you ;)
Re: [PERFORM] too complex query plan for not exists query and multicolumn indexes
On Fri, 19 Mar 2010, Stephen Frost wrote:
...it has to go to an external on-disk sort (see later on, and how to fix that).

This was covered on this list a few months ago, in
http://archives.postgresql.org/pgsql-performance/2009-08/msg00184.php and
http://archives.postgresql.org/pgsql-performance/2009-08/msg00189.php
There seemed to be some consensus that allowing a materialise in front of an index scan might have been a good change. Was there any movement on this front?

Limit (cost=66681.50..66681.50 rows=1 width=139) (actual time=7413.489..7413.489 rows=1 loops=1)
  ->  Merge Anti Join (cost=40520.17..66681.50 rows=367793 width=139) (actual time=3705.078..7344.256 rows=101 loops=1)
        Merge Cond: ((f1.user_id = f2.ref_id) AND (f1.ref_id = f2.user_id))
        ->  Index Scan using user_ref on friends f1 (cost=0.00..26097.86 rows=2818347 width=139) (actual time=0.093..1222.592 rows=1917360 loops=1)
        ->  Materialize (cost=40520.17..40555.40 rows=2818347 width=8) (actual time=3704.977..5043.347 rows=1990148 loops=1)
              ->  Sort (cost=40520.17..40527.21 rows=2818347 width=8) (actual time=3704.970..4710.703 rows=1990148 loops=1)
                    Sort Key: f2.ref_id, f2.user_id
                    Sort Method: external merge  Disk: 49576kB
                    ->  Seq Scan on friends f2 (cost=0.00..18143.18 rows=2818347 width=8) (actual time=0.015..508.797 rows=2818347 loops=1)
Total runtime: 7422.516 ms

If you had an index on ref_id,user_id (as well as the one on user_id,ref_id), it'd probably be able to do in-order index traversals on both and be really fast... But then updates would be more expensive, of course, since it'd have more indexes to maintain.

That isn't necessarily so, until the issue referred to in the above linked messages is resolved. It depends.

Matthew

--
I've run DOOM more in the last few days than I have the last few months. I just love debugging ;-) -- Linus Torvalds
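The extra index discussed above could be created along these lines (a sketch using the table and column names from the plan in the thread; as noted, it adds index-maintenance cost on every write):

```sql
-- Second index with the columns swapped, so the f2 side of the merge
-- anti join can also be read in (ref_id, user_id) order without an
-- external sort. The index name is illustrative.
CREATE INDEX user_ref_rev ON friends (ref_id, user_id);
ANALYZE friends;
```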
Re: [PERFORM] too complex query plan for not exists query and multicolumn indexes
Matthew Wakeling matt...@flymine.org writes:
On Fri, 19 Mar 2010, Stephen Frost wrote: ...it has to go to an external on-disk sort (see later on, and how to fix that).
This was covered on this list a few months ago, in http://archives.postgresql.org/pgsql-performance/2009-08/msg00184.php and http://archives.postgresql.org/pgsql-performance/2009-08/msg00189.php There seemed to be some consensus that allowing a materialise in front of an index scan might have been a good change. Was there any movement on this front?

Yes, 9.0 will consider plans like

Merge Join (cost=0.00..14328.70 rows=100 width=488)
  Merge Cond: (a.four = b.hundred)
  ->  Index Scan using fouri on tenk1 a (cost=0.00..1635.62 rows=1 width=244)
  ->  Materialize (cost=0.00..1727.16 rows=1 width=244)
        ->  Index Scan using tenk1_hundred on tenk1 b (cost=0.00..1702.16 rows=1 width=244)

Some experimentation shows that it won't insert the materialize unless quite a bit of re-fetching is predicted (ie neither side of the join is unique). We might need to tweak the cost parameters once we get some field experience with it.

regards, tom lane
Re: [PERFORM] GiST index performance
On Sat, 20 Mar 2010, Yeb Havinga wrote:
The gist virtual pages would then match more the original blocksizes that were used in Guttman's R-tree paper (first google result, then figure 4.5). Since the nature/characteristics of the underlying datatypes and keys is not changed, it might be that with the disk pages getting larger, gist indexing has therefore become unexpectedly inefficient.

Yes, that is certainly a factor. For example, the page size for bioseg which we use here is 130 entries, which is very excessive, and doesn't allow very deep trees. On the other hand, it means that a single disc seek performs quite a lot of work.

But I am also not really into the core gist code, but do have a motivation to dive into it (more than 200% performance increase in Matthew's test case). However I'd like to verify for community support before working on it.

I'd also love to dive into the core gist code, but am rather daunted by it. I believe that there is something there that is taking more time than I can account for. The indexing algorithm itself is good.

Matthew

--
The problem with defending the purity of the English language is that English is about as pure as a cribhouse whore. We don't just borrow words; on occasion, English has pursued other languages down alleyways to beat them unconscious and rifle their pockets for new vocabulary. - James Nicoll
Re: [PERFORM] GiST index performance
Matthew Wakeling wrote:
On Sat, 20 Mar 2010, Yeb Havinga wrote: The gist virtual pages would then match more the original blocksizes that were used in Guttman's R-tree paper (first google result, then figure 4.5). Since the nature/characteristics of the underlying datatypes and keys is not changed, it might be that with the disk pages getting larger, gist indexing has therefore become unexpectedly inefficient.
Yes, that is certainly a factor. For example, the page size for bioseg which we use here is 130 entries, which is very excessive, and doesn't allow very deep trees. On the other hand, it means that a single disc seek performs quite a lot of work.

Yeah, I only did in-memory fitting tests and wondered about increased I/Os. However I bet that even for bigger-than-RAM DBs, the benefit of having to fan out to fewer pages still outweighs the over-general non-leaf nodes and might still result in fewer disk I/Os.

I redid some earlier benchmarking with other datatypes with a 1kB block size and also multicolumn gist, and the multicolumn variant had an even greater benefit than the single column indexes, both equality and range scans (like execution times down to 20% of original). If gist is important to you, I really recommend doing a test with 1kB blocks.

regards, Yeb Havinga
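For anyone wanting to reproduce Yeb's 1kB-block test: the page size is a compile-time option, so it requires a separate build and a fresh initdb. A sketch only; a cluster built this way is on-disk incompatible with a default 8kB build, so use a throwaway installation:

```
# Build a test PostgreSQL with 1kB pages for index benchmarking.
# --with-blocksize takes the size in kB; the default is 8.
./configure --with-blocksize=1
make
make install
```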
Re: [PERFORM] GiST index performance
On Mon, 22 Mar 2010, Yeb Havinga wrote:
Yes, that is certainly a factor. For example, the page size for bioseg which we use here is 130 entries, which is very excessive, and doesn't allow very deep trees. On the other hand, it means that a single disc seek performs quite a lot of work.
Yeah, I only did in-memory fitting tests and wondered about increased I/Os. However I bet that even for bigger-than-RAM DBs, the benefit of having to fan out to fewer pages still outweighs the over-general non-leaf nodes and might still result in fewer disk I/Os. I redid some earlier benchmarking with other datatypes with a 1kB block size and also multicolumn gist, and the multicolumn variant had an even greater benefit than the single column indexes, both equality and range scans (like execution times down to 20% of original). If gist is important to you, I really recommend doing a test with 1kB blocks.

Purely from a disc seek count point of view, assuming an infinite CPU speed and infinite disc transfer rate, the larger the index pages the better. The number of seeks per fetch will be equivalent to the depth of the tree.

If you take disc transfer rate into account, the break-even point is when you spend an equal time transferring as seeking, which places the page size around 500kB on a modern disc, assuming RAID stripe alignment doesn't make that into two seeks instead of one.

However, for efficient CPU usage, the ideal page size for a tree index is much smaller - between two and ten entries, depending on the type of the data.

There may be some mileage in reorganising indexes into a two-level system. That is, have an index format where the page size is 512kB or similar, but each page is internally a CPU-efficient tree itself. However, this is beyond the scope of the problem of speeding up gist.

Matthew

--
If you let your happiness depend upon how somebody else feels about you, now you have to control how somebody else feels about you. -- Abraham Hicks
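Matthew's ~500kB break-even figure can be sanity-checked with a quick calculation. The seek time and transfer rate below are assumed round numbers for a circa-2010 disk, not measurements from the thread:

```python
# Break-even page size: the point where the time spent transferring the
# page equals the time spent seeking to it.
seek_time_s = 0.008          # assumed ~8 ms average seek
transfer_rate_bps = 60e6     # assumed ~60 MB/s sustained transfer

break_even_bytes = seek_time_s * transfer_rate_bps
print(f"break-even page size ~ {break_even_bytes / 1024:.0f} kB")
# roughly 470 kB with these assumptions, in line with the figure above
```

Faster modern transfer rates push the break-even larger, which is Matthew's point: seek-bound workloads favour big pages, CPU-bound ones favour small.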
Re: [PERFORM] mysql to postgresql, performance questions
Absolutely ...

- for fixed size rows with a lot of small updates, Oracle wins. BTW, as of Oracle 9 they're called UNDO tablespaces
- for lots of transactions and freely mixing transactions of all sizes, MVCC tables (Postgres) wins
- if you just want a structured filesystem and don't have integrity requirements or a lot of updates, MyISAM wins

For our app, Oracle would be the best, but it isn't strictly necessary so Postgres wins on price ;-)

Cheers
Dave

On Mon, Mar 22, 2010 at 6:15 AM, Pierre C li...@peufeu.com wrote:
On Mon, 22 Mar 2010 12:14:51 +0100, Merlin Moncure mmonc...@gmail.com wrote: On Sun, Mar 21, 2010 at 9:14 PM, Dave Crooke dcro...@gmail.com wrote: Note however that Oracle offers full transactionality and does in-place row updates. There is more than one way to do it. There's no free lunch.
MVCC : VACUUM
Oracle : Rollback Segments
MyISAM : no concurrency/transactions
It's all about which compromise suits you ;)
Re: [PERFORM] Building multiple indexes concurrently
It seems to me that a separate partition / tablespace would be a much simpler approach.

On Mar 17, 2010, at 5:18 PM, Hannu Krosing wrote:
On Wed, 2010-03-17 at 16:49 -0400, Greg Smith wrote:
Alvaro Herrera wrote:
Andres Freund escribió: I find it much easier to believe such issues exist on tables in contrast to indexes. The likelihood of getting sequential accesses on an index is small enough on a big table to make it unlikely to matter much.
Vacuum walks indexes sequentially, for one.
That and index-based range scans were the main two use-cases I was concerned would be degraded by interleaving index builds, compared with doing them in succession.
I guess that tweaking file systems to allocate in bigger chunks helps here? I know that xfs can be tuned in that regard, but how about other common file systems like ext3?

- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
Re: [PERFORM] Building multiple indexes concurrently
On Thu, 2010-03-18 at 16:12 -0400, Justin Pitts wrote:
It seems to me that a separate partition / tablespace would be a much simpler approach.

Do you mean a separate partition / tablespace for _each_ index built concurrently?

-- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
Re: [PERFORM] Building multiple indexes concurrently
Yes.

On Mar 18, 2010, at 5:20 PM, Hannu Krosing wrote:
On Thu, 2010-03-18 at 16:12 -0400, Justin Pitts wrote: It seems to me that a separate partition / tablespace would be a much simpler approach.
Do you mean a separate partition / tablespace for _each_ index built concurrently?
Re: [PERFORM] too complex query plan for not exists query and multicolumn indexes
Message from Corin wakath...@gmail.com at 03-19-2010 01:26:35 PM -- ***snip
The intention of the query is to find rows with no partner row. The offset and limit are just to ignore the time needed to send the result to the client.
---
I don't understand the point of OFFSET; LIMIT will accomplish the same thing. PG will still execute the query; the only difference is PG will skip the step of counting through the first million rows before returning a record.
---
SELECT * FROM friends AS f1 WHERE NOT EXISTS (SELECT 1 FROM friends AS f2 WHERE f1.user_id=f2.ref_id AND f1.ref_id=f2.user_id) OFFSET 100 LIMIT 1

Mysql uses this query plan:
1 PRIMARY f1 index NULL user_ref 8 NULL 2818860 Using where; Using index
2 DEPENDENT SUBQUERY f2 ref user_ref user_ref 8 f1.ref_id,f1.user_id 1 Using index
Time: 9.8s
---
If that's a query explain in Mysql, it's worthless. The above has no information; it does not tell us how long each step is taking, let alone what it was thinking it would take to make the query work.
---
Postgres uses this query plan:

Limit (cost=66681.50..66681.50 rows=1 width=139) (actual time=7413.489..7413.489 rows=1 loops=1)
  ->  Merge Anti Join (cost=40520.17..66681.50 rows=367793 width=139) (actual time=3705.078..7344.256 rows=101 loops=1)
        Merge Cond: ((f1.user_id = f2.ref_id) AND (f1.ref_id = f2.user_id))
        ->  Index Scan using user_ref on friends f1 (cost=0.00..26097.86 rows=2818347 width=139) (actual time=0.093..1222.592 rows=1917360 loops=1)
        ->  Materialize (cost=40520.17..40555.40 rows=2818347 width=8) (actual time=3704.977..5043.347 rows=1990148 loops=1)
              ->  Sort (cost=40520.17..40527.21 rows=2818347 width=8) (actual time=3704.970..4710.703 rows=1990148 loops=1)
                    Sort Key: f2.ref_id, f2.user_id
                    Sort Method: external merge  Disk: 49576kB
                    ->  Seq Scan on friends f2 (cost=0.00..18143.18 rows=2818347 width=8) (actual time=0.015..508.797 rows=2818347 loops=1)
Total runtime: 7422.516 ms
---
We can see each step PG takes and make informed decisions about what part of the query is slow. We can see the sorting of the rows takes most of the time.
---
It's already faster, which is great, but I wonder why the query plan is that complex.

It's not complex; it's showing you all the steps, which Mysql is not showing you.

I read in the pgsql docs that using a multicolumn key is almost never needed and only a waste of cpu/space. So I dropped the multicolumn key and added two separate keys instead:

Where is that at??? I don't recall reading that. PG will only use indexes that match the where/join conditions exactly.

CREATE INDEX ref1 ON friends USING btree (ref_id);
CREATE INDEX user1 ON friends USING btree (user_id);

New query plan:

Limit (cost=70345.04..70345.04 rows=1 width=139) (actual time=43541.709..43541.709 rows=1 loops=1)
  ->  Merge Anti Join (cost=40520.27..70345.04 rows=367793 width=139) (actual time=3356.694..43467.818 rows=101 loops=1)
        Merge Cond: (f1.user_id = f2.ref_id)
        Join Filter: (f1.ref_id = f2.user_id)
---
Take note: the merge has changed. It now joins on f1.user_id=f2.ref_id, then filters the results down using the AND condition. Put the index back.
---
        ->  Index Scan using user1 on friends f1 (cost=0.00..26059.79 rows=2818347 width=139) (actual time=0.031..1246.668 rows=1917365 loops=1)
        ->  Materialize (cost=40520.17..40555.40 rows=2818347 width=8) (actual time=3356.615..14941.405 rows=130503729 loops=1)
---
Take note of what happened here: rows=130503729. This is because the join is no longer limited as it was before. Did you run this query against Mysql with the same kind of indexes???
---
              ->  Sort (cost=40520.17..40527.21 rows=2818347 width=8) (actual time=3356.611..4127.435 rows=1990160 loops=1)
                    Sort Key: f2.ref_id
                    Sort Method: external merge  Disk: 49560kB
                    ->  Seq Scan on friends f2 (cost=0.00..18143.18 rows=2818347 width=8) (actual time=0.012..496.174 rows=2818347 loops=1)
Total runtime: 43550.187 ms

I also wonder why it makes a difference when adding a LIMIT clause to the subselect in an EXISTS subselect. Shouldn't pgsql always stop after finding a row?
In mysql it makes no difference in speed; pgsql even gets slower when adding a LIMIT to the EXISTS subselect (I hoped it would get faster?!).

Limits are applied last, after all the major work is done.

SELECT * FROM friends AS f1 WHERE NOT EXISTS (SELECT 1 FROM friends AS f2 WHERE f1.user_id=f2.ref_id AND f1.ref_id=f2.user_id LIMIT 1) OFFSET 100 LIMIT 1

Limit (cost=6389166.19..6389172.58 rows=1 width=139) (actual time=54540.356..54540.356 rows=1 loops=1)
  ->  Seq Scan on friends f1 (cost=0.00..9003446.87 rows=1409174 width=139) (actual time=0.511..54460.006 rows=101 loops=1)
        Filter:
Re: [PERFORM] Block at a time ...
On Mar 17, 2010, at 9:41 AM, Craig James wrote:
On 3/17/10 2:52 AM, Greg Stark wrote:
On Wed, Mar 17, 2010 at 7:32 AM, Pierre Cli...@peufeu.com wrote: I was thinking in something like that, except that the factor I'd use would be something like 50% or 100% of current size, capped at (say) 1 GB.

This turns out to be a bad idea. One of the first things Oracle DBAs are told to do is change this default setting to allocate some reasonably large fixed size rather than scaling upwards. This might be mostly due to Oracle's extent-based space management, but I'm not so sure. Recall that the filesystem is probably doing some rounding itself. If you allocate 120kB it's probably allocating 128kB itself anyways. Having two layers rounding up will result in odd behaviour.

In any case I was planning on doing this a while back. Then I ran some experiments and couldn't actually demonstrate any problem. ext2 seems to do a perfectly reasonable job of avoiding this problem. All the files were mostly large contiguous blocks after running some tests -- IIRC running pgbench.

This is one of the more-or-less solved problems in Unix/Linux. Ext* file systems have a reserve, usually of 10% of the disk space, that nobody except root can use. It's not for root; it's because with 10% of the disk free, you can almost always do a decent job of allocating contiguous blocks and get good performance. Unless Postgres has some weird problem that Linux has never seen before (and that wouldn't be unprecedented...), there's probably no need to fool with file-allocation strategies.

Craig

It's fairly easy to break. Just do a parallel import with say, 16 concurrent tables being written to at once. Result? Fragmented tables.
[PERFORM] PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster
I previously posted 'forcing index scan on query produces 16x faster' and it seemed that the consensus was that the 8.0.x series had an issue. I have upgraded to the highest practical version for our distro. But we seem to have the same issue. If I force 'enable_seqscan' off, our actual time is 9ms, whereas with 'enable_seqscan' on the performance is 2200ms (the good news is the Seq Scan query on 8.2 is 1/2 the time of the 8.0 query). The paste is below - I reloaded the table from scratch after the 8.2 upgrade. Then I ran a 'REINDEX DATABASE' and a 'VACUUM ANALYZE' (then ran some queries and reran the vac analyze).

postream=> SELECT version();
version
-
PostgreSQL 8.2.11 on i386-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)
(1 row)

postream=> SET enable_seqscan = false;
SET
postream=> EXPLAIN ANALYZE
postream-> SELECT si.group1_id as name, sum(si.qty) as count, sum(si.amt) as amt
postream->   FROM salesitems si, sales s, sysstrings
postream->  WHERE si.id = s.id
postream->    AND si.group1_id != ''
postream->    AND si.group1_id IS NOT NULL
postream->    AND NOT si.void
postream->    AND NOT s.void
postream->    AND NOT s.suspended
postream->    AND s.tranzdate >= (cast('2010-02-15' as date) + cast(sysstrings.data as time))
postream->    AND s.tranzdate < ((cast('2010-02-15' as date) + 1) + cast(sysstrings.data as time))
postream->    AND sysstrings.id='net/Console/Employee/Day End Time'
postream->  GROUP BY name;
QUERY PLAN
--
HashAggregate (cost=426973.65..426973.86 rows=14 width=35) (actual time=9.424..9.438 rows=12 loops=1)
  ->  Nested Loop (cost=0.01..426245.31 rows=97113 width=35) (actual time=0.653..6.954 rows=894 loops=1)
        ->  Nested Loop (cost=0.01..2416.59 rows=22477 width=4) (actual time=0.595..2.150 rows=225 loops=1)
              ->  Index Scan using sysstrings_pkey on sysstrings (cost=0.00..8.27 rows=1 width=182) (actual time=0.110..0.112 rows=1 loops=1)
                    Index Cond: (id = 'net/Console/Employee/Day End Time'::text)
              ->  Index Scan using sales_tranzdate_index on sales s (cost=0.01..1846.40 rows=22477 width=12) (actual time=0.454..1.687 rows=225 loops=1)
                    Index Cond: ((s.tranzdate >= ('2010-02-15'::date + (sysstrings.data)::time without time zone)) AND (s.tranzdate < ('2010-02-16'::date + (sysstrings.data)::time without time zone)))
                    Filter: ((NOT void) AND (NOT suspended))
        ->  Index Scan using salesitems_pkey on salesitems si (cost=0.00..18.54 rows=25 width=39) (actual time=0.007..0.013 rows=4 loops=225)
              Index Cond: (si.id = s.id)
              Filter: (((group1_id)::text <> ''::text) AND (group1_id IS NOT NULL) AND (NOT void))
Total runtime: 9.585 ms
(12 rows)

postream=> SET enable_seqscan = true;
SET
postream=> EXPLAIN ANALYZE
postream-> SELECT si.group1_id as name, sum(si.qty) as count, sum(si.amt) as amt
postream->   FROM salesitems si, sales s, sysstrings
postream->  WHERE si.id = s.id
postream->    AND si.group1_id != ''
postream->    AND si.group1_id IS NOT NULL
postream->    AND NOT si.void
postream->    AND NOT s.void
postream->    AND NOT s.suspended
postream->    AND s.tranzdate >= (cast('2010-02-15' as date) + cast(sysstrings.data as time))
postream->    AND s.tranzdate < ((cast('2010-02-15' as date) + 1) + cast(sysstrings.data as time))
postream->    AND sysstrings.id='net/Console/Employee/Day End Time'
postream->  GROUP BY name;
QUERY PLAN

HashAggregate (cost=38315.09..38315.30 rows=14 width=35) (actual time=2206.531..2206.545 rows=12 loops=1)
  ->  Hash Join (cost=2697.55..37586.74 rows=97113 width=35) (actual time=2128.070..2204.048 rows=894 loops=1)
        Hash Cond: (si.id = s.id)
        ->  Seq Scan on salesitems si (cost=0.00..30578.15 rows=890646 width=39) (actual time=0.047..1487.688 rows=901281 loops=1)
              Filter: (((group1_id)::text <> ''::text) AND (group1_id IS NOT NULL) AND (NOT void))
        ->  Hash (cost=2416.59..2416.59 rows=22477 width=4) (actual time=1.823..1.823 rows=225 loops=1)
              ->  Nested Loop (cost=0.01..2416.59 rows=22477 width=4) (actual time=0.477..1.592 rows=225 loops=1)
                    ->  Index Scan using
Re: [PERFORM] PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster
Christian Brink cbr...@r-stream.com writes:
->  Nested Loop (cost=0.01..2416.59 rows=22477 width=4) (actual time=0.595..2.150 rows=225 loops=1)
      ->  Index Scan using sysstrings_pkey on sysstrings (cost=0.00..8.27 rows=1 width=182) (actual time=0.110..0.112 rows=1 loops=1)
            Index Cond: (id = 'net/Console/Employee/Day End Time'::text)
      ->  Index Scan using sales_tranzdate_index on sales s (cost=0.01..1846.40 rows=22477 width=12) (actual time=0.454..1.687 rows=225 loops=1)
            Index Cond: ((s.tranzdate >= ('2010-02-15'::date + (sysstrings.data)::time without time zone)) AND (s.tranzdate < ('2010-02-16'::date + (sysstrings.data)::time without time zone)))
            Filter: ((NOT void) AND (NOT suspended))

The fundamental reason why you're getting a bad plan choice is the factor-of-100 estimation error here. I'm not sure you can do a whole lot about that without rethinking the query --- in particular I would suggest trying to get rid of the non-constant range bounds. You're apparently already plugging in an external variable for the date, so maybe you could handle the time of day similarly instead of joining to sysstrings for it.

regards, tom lane
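One hedged way to apply Tom's suggestion: fetch the day-end time first, then plug both range bounds in as constants, so the planner can estimate the tranzdate selectivity from its histogram. The 04:00 literal below is purely illustrative; the real value comes from the sysstrings lookup:

```sql
-- Step 1: look up the cutoff once (application-side or a prior query).
SELECT data FROM sysstrings
 WHERE id = 'net/Console/Employee/Day End Time';

-- Step 2: run the report with constant bounds (times shown are made up).
SELECT si.group1_id AS name, sum(si.qty) AS count, sum(si.amt) AS amt
  FROM salesitems si
  JOIN sales s ON si.id = s.id
 WHERE si.group1_id <> ''
   AND si.group1_id IS NOT NULL
   AND NOT si.void AND NOT s.void AND NOT s.suspended
   AND s.tranzdate >= timestamp '2010-02-15 04:00'
   AND s.tranzdate <  timestamp '2010-02-16 04:00'
 GROUP BY si.group1_id;
```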
Re: [PERFORM] Block at a time ...
This is one of the more-or-less solved problems in Unix/Linux. Ext* file systems have a reserve, usually of 10% of the disk space, that nobody except root can use. It's not for root; it's because with 10% of the disk free, you can almost always do a decent job of allocating contiguous blocks and get good performance. Unless Postgres has some weird problem that Linux has never seen before (and that wouldn't be unprecedented...), there's probably no need to fool with file-allocation strategies.
Craig
It's fairly easy to break. Just do a parallel import with say, 16 concurrent tables being written to at once. Result? Fragmented tables.

Delayed allocation (ext4, XFS) helps a lot for concurrent writing at a medium-high rate (a few megabytes per second and up) when lots of data can sit in the cache and be flushed/allocated as big contiguous chunks. I'm pretty sure ext4/XFS would pass your parallel import test. However, if you have files like tables (and indexes) or logs that grow slowly over time (something like a few megabytes per hour or less), after a few days/weeks/months, horrible fragmentation is an almost guaranteed result on many filesystems (NTFS being perhaps the absolute worst).
Re: [PERFORM] Block at a time ...
This is why pre-allocation is a good idea if you have the space.

Tom, what about a really simple command in a forthcoming release of PG that would just preformat a 1GB file at a time? This is what I've always done, scripted, with Oracle (ALTER TABLESPACE foo ADD DATAFILE ) rather than relying on its autoextender when performance has been a concern.

Cheers
Dave

On Mon, Mar 22, 2010 at 3:55 PM, Pierre C li...@peufeu.com wrote:
This is one of the more-or-less solved problems in Unix/Linux. Ext* file systems have a reserve, usually of 10% of the disk space, that nobody except root can use. It's not for root; it's because with 10% of the disk free, you can almost always do a decent job of allocating contiguous blocks and get good performance. Unless Postgres has some weird problem that Linux has never seen before (and that wouldn't be unprecedented...), there's probably no need to fool with file-allocation strategies.
Craig
It's fairly easy to break. Just do a parallel import with say, 16 concurrent tables being written to at once. Result? Fragmented tables.
Delayed allocation (ext4, XFS) helps a lot for concurrent writing at a medium-high rate (a few megabytes per second and up) when lots of data can sit in the cache and be flushed/allocated as big contiguous chunks. I'm pretty sure ext4/XFS would pass your parallel import test. However, if you have files like tables (and indexes) or logs that grow slowly over time (something like a few megabytes per hour or less), after a few days/weeks/months, horrible fragmentation is an almost guaranteed result on many filesystems (NTFS being perhaps the absolute worst).
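As an aside, the kind of up-front preallocation Dave scripts in Oracle can be done per-file on POSIX systems via fallocate; a minimal sketch using Python's wrapper (file name and size are arbitrary, and os.posix_fallocate is Unix-only):

```python
import os

SIZE = 16 * 1024 * 1024  # preallocate 16 MB up front (arbitrary figure)

fd = os.open("prealloc.dat", os.O_CREAT | os.O_RDWR, 0o644)
try:
    # Ask the filesystem to reserve the whole extent in one call,
    # instead of letting the file grow block by block under
    # concurrent writers (which is what causes the fragmentation).
    os.posix_fallocate(fd, 0, SIZE)
    allocated = os.fstat(fd).st_size
finally:
    os.close(fd)
os.remove("prealloc.dat")
print(f"allocated {allocated} bytes")
```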
Re: [PERFORM] Block at a time ...
On Mon, Mar 22, 2010 at 6:47 PM, Scott Carey sc...@richrelevance.com wrote:
It's fairly easy to break. Just do a parallel import with say, 16 concurrent tables being written to at once. Result? Fragmented tables.

Fwiw, I did do some investigation about this at one point and could not demonstrate any significant fragmentation. But that was on Linux -- different filesystem implementations would have different success rates. And there could be other factors as well, such as how full the filesystem is or how old it is.

-- greg
Re: [PERFORM] default_statistics_target
Hi Greg,

Thanks for the insight. How much more of a server's resources will be consumed by an ANALYZE with default_statistics_target = 100?

We have two environments hosting the same data. One is our live server, which serves the web site, and this hosts our published data, not more than 200 - 300 tables.

PRODUCTION: The data warehouse consisting of our published data, as well as our input resources which are transformed via ETL processes into our published data. It is these input resources which currently consist of about 8,000 tables and growing. They don't really require analysis, as they are typically read linearly, and rarely more than once. They are kept for auditing and rollbacks.

LIVE: Hosts just the published data, copied over from the production server. Because the data does not get written to very often, older stats from ANALYZE are likely to still be valid. Our concern is that with the older setting of default_statistics_target = 10 it has not gone deep enough into these tables (numbering in the millions of rows) to really represent the data distribution properly.

Given that it looks like you're running 8.3 from past messages I've seen from you, I'd also be concerned that you've overrun your max_fsm_pages, so that VACUUM is growing increasingly ineffective for you, and that's contributing to your headache.

Below are the config values of our production server (those not listed are stubbed out). Sadly, in an attempt to improve the server's performance, someone wiped out all of the changes I had made to date, along with comments indicating previous values, reason for the change, etc. What do they call that again? Oh, yeah. Documentation.

# CENTOS 5.4
# Linux mdx_octo 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
# pgsql 8.3.10, 8 CPUs, 48GB RAM
# RAID 10, 4 Disks
autovacuum = on # Enable autovacuum subprocess? 'on'
autovacuum_analyze_scale_factor = 0.05 # fraction of table size before analyze
autovacuum_analyze_threshold = 1000
autovacuum_naptime = 1min # time between autovacuum runs
autovacuum_vacuum_cost_delay = 50 # default vacuum cost delay for autovacuum
autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum
autovacuum_vacuum_threshold = 1000
bgwriter_lru_maxpages = 100 # 0-1000 max buffers written/round
checkpoint_segments = 128 # in logfile segments, min 1, 16MB each
checkpoint_warning = 290s # 0 is off
client_min_messages = debug1 # values in order of decreasing detail:
datestyle = 'iso, mdy'
default_statistics_target = 250 # range 1-1000
default_text_search_config = 'pg_catalog.english'
lc_messages = 'C' # locale for system error message
lc_monetary = 'C' # locale for monetary formatting
lc_numeric = 'C' # locale for number formatting
lc_time = 'C' # locale for time formatting
listen_addresses = '*' # what IP address(es) to listen on;
log_destination = 'stderr' # Valid values are combinations of
log_error_verbosity = verbose # terse, default, or verbose messages
log_line_prefix = '%t ' # special values:
log_min_error_statement = debug1 # values in order of decreasing detail:
log_min_messages = debug1 # values in order of decreasing detail:
logging_collector = on # Enable capturing of stderr and csvlog
maintenance_work_mem = 256MB
max_connections = 100 # (change requires restart)
max_fsm_relations = 1000 # min 100, ~70 bytes each
max_locks_per_transaction = 128 # min 10
port = 5432 # (change requires restart)
shared_buffers = 4096MB
shared_preload_libraries = '$libdir/plugins/plugin_debugger.so' # (change requires restart)
track_counts = on
vacuum_cost_delay = 5 # 0-1000 milliseconds
wal_buffers = 4MB
wal_sync_method = open_sync
work_mem = 64MB

Carlo

Greg Smith g...@2ndquadrant.com wrote in message news:4b9e33af.2020...@2ndquadrant.com...
Carlo Stonebanks wrote: The whole topic of messing with stats makes my head spin, but I am concerned about some horridly performing queries that have had bad row estimates, and others which always choose seq scans when indexes are available. Reading up on how to improve planner estimates, I have seen references to default_statistics_target being changed from the default of 10 to 100. Our DB is large, with thousands of tables.

Stop right there for a second. Are you sure autovacuum is working well here? With thousands of tables, it wouldn't surprise me to discover your planner estimates are wrong because there hasn't been a recent enough ANALYZE on the relevant tables. If you haven't already, take a look at pg_stat_user_tables and make sure that the tables that have the bad estimates have actually been analyzed recently. A look at the live/dead row counts there should be helpful as well. If all that's recent, but you're still getting bad estimates, only then would I suggest trying an increase to default_statistics_target. In the situation where
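The pg_stat_user_tables check Greg describes can be sketched along these lines (a sketch only; the n_live_tup/n_dead_tup columns assume 8.3, which is what Carlo is running):

```sql
-- Which tables haven't been analyzed recently, and how many
-- live vs. dead rows does the stats collector report for them?
SELECT relname,
       last_analyze,
       last_autoanalyze,
       n_live_tup,
       n_dead_tup
FROM pg_stat_user_tables
ORDER BY last_autoanalyze NULLS LAST
LIMIT 20;
```

If only a handful of columns have bad estimates, ALTER TABLE ... ALTER COLUMN ... SET STATISTICS followed by ANALYZE lets you raise the target selectively rather than changing default_statistics_target for everything.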
[PERFORM] Got that new server, now it's time for config!
Here we go again! Based on recommendations made here, I got my client to migrate off of our Windows 2003 Server x64 box to a new Linux box.

# CENTOS 5.4
# Linux mdx_octo 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
# pgsql 8.3.10, 8 CPUs, 48GB RAM
# RAID 10, 4 Disks

Below are the config values of this production server (those not listed are stubbed out). Sadly, in an attempt to improve the server's performance, someone wiped out all of the changes I had made to date, along with comments indicating previous values, the reason for the change, etc. This is a data warehouse production server, used for ETL. 500 GB database, approx 8,000 tables and growing, although the vast majority of them are the original import resource tables and are rarely accessed. The actual core data is about 200 tables, consisting of millions of rows. Data importing and content management is done via a 15,000-line TCL import script and application base (as this is ETL with fuzzy logic, not just COPY... FROM...). So, we have the hardware, we have the O/S - but I think our config leaves much to be desired. Typically, our planner makes bad decisions, picking seq scan over index scan, where index scan has a better result. Can anyone see any obvious faults?
Carlo

autovacuum = on
autovacuum_analyze_scale_factor = 0.05
autovacuum_analyze_threshold = 1000
autovacuum_naptime = 1min
autovacuum_vacuum_cost_delay = 50
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 1000
bgwriter_lru_maxpages = 100
checkpoint_segments = 128
checkpoint_warning = 290s
client_min_messages = debug1
datestyle = 'iso, mdy'
default_statistics_target = 250
default_text_search_config = 'pg_catalog.english'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
listen_addresses = '*'
log_destination = 'stderr'
log_error_verbosity = verbose
log_line_prefix = '%t '
log_min_error_statement = debug1
log_min_messages = debug1
logging_collector = on
maintenance_work_mem = 256MB
max_connections = 100
max_fsm_relations = 1000
max_locks_per_transaction = 128
port = 5432
shared_buffers = 4096MB
shared_preload_libraries = '$libdir/plugins/plugin_debugger.so'
track_counts = on
vacuum_cost_delay = 5
wal_buffers = 4MB
wal_sync_method = open_sync
work_mem = 64MB
Re: [PERFORM] PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster
Not to beat a dead horse excessively, but I think the below is a pretty good argument for index hints? I know the general optimizer wants to be highest priority (I very much agree with this), but I think there are fully legitimate cases like the below. Asking the user to rewrite the query in an unnatural way (or to change optimizer params that may work for 99% of queries) is, IMO, not a good thing. Given that the postgres optimizer can never be perfect (as it will never have the perfect knowledge necessary for a perfect decision), I would request that index hints be reconsidered (for 9.0?). I know many users (myself included) are doing this in a very rudimentary way by disabling particular access types on a per-session basis: set enable_seqscan=off; set enable_hashjoin=off; QUERY; set enable_seqscan=on; set enable_hashjoin=on; ... I'd hack up a patch if I had the time at least =)

Best regards, Patrick

-Original Message- From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Tom Lane Sent: Monday, March 22, 2010 12:22 PM To: Christian Brink Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster

Christian Brink cbr...@r-stream.com writes:

-> Nested Loop (cost=0.01..2416.59 rows=22477 width=4) (actual time=0.595..2.150 rows=225 loops=1)
   -> Index Scan using sysstrings_pkey on sysstrings (cost=0.00..8.27 rows=1 width=182) (actual time=0.110..0.112 rows=1 loops=1)
      Index Cond: (id = 'net/Console/Employee/Day End Time'::text)
   -> Index Scan using sales_tranzdate_index on sales s (cost=0.01..1846.40 rows=22477 width=12) (actual time=0.454..1.687 rows=225 loops=1)
      Index Cond: ((s.tranzdate >= ('2010-02-15'::date + (sysstrings.data)::time without time zone)) AND (s.tranzdate < ('2010-02-16'::date + (sysstrings.data)::time without time zone)))
      Filter: ((NOT void) AND (NOT suspended))

The fundamental reason why you're getting a bad plan choice is the
factor-of-100 estimation error here. I'm not sure you can do a whole lot about that without rethinking the query --- in particular I would suggest trying to get rid of the non-constant range bounds. You're apparently already plugging in an external variable for the date, so maybe you could handle the time of day similarly instead of joining to sysstrings for it. regards, tom lane
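As an aside on Patrick's per-session workaround: wrapping the toggles in a transaction with SET LOCAL limits the blast radius, since the settings revert automatically at commit or rollback, so a forgotten reset can't poison later queries on the same connection (a sketch of the pattern, not an endorsement of fighting the planner):

```sql
BEGIN;
SET LOCAL enable_seqscan = off;   -- reverts at COMMIT/ROLLBACK
SET LOCAL enable_hashjoin = off;
-- ... run the one query that needs the nudge ...
COMMIT;
```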
Re: [PERFORM] Got that new server, now it's time for config!
On 3/22/10 4:36 PM, Carlo Stonebanks wrote: Here we go again! Can anyone see any obvious faults? Carlo

maintenance_work_mem = 256MB
I'm not sure how large your individual tables are, but you might want to bump this value up to get faster vacuums.

max_fsm_relations = 1000
I think this will definitely need to be increased.

work_mem = 64MB
Most data warehousing loads I can think of will need more work_mem, but this depends on how large the data sets you plan to sort are.
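Worth noting that work_mem applies per sort or hash operation, not per connection, so a session-level bump for the warehouse-sized sorts is often safer than a large global default (the value below is purely illustrative):

```sql
-- In the ETL session only; the global default stays modest.
SET work_mem = '512MB';
SHOW work_mem;  -- confirm the session value
-- EXPLAIN ANALYZE of the big sort may now report an in-memory
-- "Sort Method: quicksort" instead of "external merge  Disk: ...".
```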
Re: [PERFORM] Block at a time ...
On 3/22/10 11:47 AM, Scott Carey wrote: On Mar 17, 2010, at 9:41 AM, Craig James wrote: On 3/17/10 2:52 AM, Greg Stark wrote: On Wed, Mar 17, 2010 at 7:32 AM, Pierre Cli...@peufeu.com wrote: I was thinking of something like that, except that the factor I'd use would be something like 50% or 100% of current size, capped at (say) 1 GB. This turns out to be a bad idea. One of the first things Oracle DBAs are told to do is change this default setting to allocate some reasonably large fixed size rather than scaling upwards. This might be mostly due to Oracle's extent-based space management, but I'm not so sure. Recall that the filesystem is probably doing some rounding itself. If you allocate 120kB it's probably allocating 128kB itself anyway. Having two layers rounding up will result in odd behaviour. In any case I was planning on doing this a while back. Then I ran some experiments and couldn't actually demonstrate any problem. ext2 seems to do a perfectly reasonable job of avoiding this problem. All the files were mostly large contiguous blocks after running some tests -- IIRC running pgbench. This is one of the more-or-less solved problems in Unix/Linux. Ext* file systems usually have a reserve of 10% of the disk space that nobody except root can use. It's not for root; it's because with 10% of the disk free, you can almost always do a decent job of allocating contiguous blocks and get good performance. Unless Postgres has some weird problem that Linux has never seen before (and that wouldn't be unprecedented...), there's probably no need to fool with file-allocation strategies. Craig It's fairly easy to break. Just do a parallel import with, say, 16 concurrent tables being written to at once. Result? Fragmented tables. Is this from real-life experience? With fragmentation, there's a point of diminishing return. A couple of head-seeks now and then hardly matter.
My recollection is that even when there are lots of concurrent processes running that are all making files larger and larger, the Linux file system still can do a pretty good job of allocating mostly-contiguous space. It doesn't just dumbly allocate from some list, but rather tries to allocate in a way that results in pretty good contiguousness (if that's a word). On the other hand, this is just from reading discussion groups like this one over the last few decades, I haven't tried it... Craig
Re: [PERFORM] Got that new server, now it's time for config!
Carlo Stonebanks wrote: So, we have the hardware, we have the O/S - but I think our config leaves much to be desired. Typically, our planner makes bad decisions, picking seq scan over index scan, where index scan has a better result.

You're not setting effective_cache_size, so I wouldn't expect it to ever choose an index scan given the size of your data set. The planner thinks that anything bigger than 128MB isn't likely to fit in RAM by default, which favors sequential scans. That parameter should probably be 24GB on your server, so it's off by more than two orders of magnitude.

wal_sync_method = open_sync

This is a scary setting to be playing with on Linux when using ext3 filesystems due to general kernel bugginess in this area. See http://archives.postgresql.org/pgsql-hackers/2007-10/msg01310.php for an example. I wouldn't change this from the default in your position if using that filesystem. I'd drastically increase effective_cache_size, put wal_sync_method back to the default, and then see how things go for a bit before tweaking anything else. Nothing else jumped out as bad in your configuration besides the extremely high logging levels; I haven't looked at it that carefully yet though.

-- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
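Greg's two suggestions amount to a postgresql.conf fragment along these lines (a sketch: 24GB is his shared_buffers-plus-OS-cache estimate for this particular 48GB box, and fdatasync is, as far as I know, the stock wal_sync_method default on Linux that he's asking Carlo to restore):

```
# Planner's estimate of how much data the OS plus shared_buffers can cache;
# roughly half of RAM is a common starting point on a dedicated 48GB server.
effective_cache_size = 24GB

# Back to the platform default (fdatasync on Linux), away from open_sync.
wal_sync_method = fdatasync
```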