Re: [PERFORM] Best OS for Postgres 8.2
On Tue, 8 May 2007, Greg Smith wrote: On Tue, 8 May 2007, Luke Lonergan wrote: From discussions with the developers, the biggest issue is a technical one: the Linux VFS layer makes the [ZFS] port difficult.

Difficult on two levels. First you'd have to figure out how to make it work at all; then you'd have to reshape it into a form that would be acceptable to the Linux kernel developers, who haven't seemed real keen on the idea so far.

given that RAID, snapshots, etc. are already in the Linux kernel, I suspect that what will need to happen is for the filesystem to be ported without those features, and then the userspace tools (that manipulate the volumes) ported to use the things already in the kernel.

The standard article I'm sure you've already seen this week on this topic is Jeff Bonwick's at http://blogs.sun.com/bonwick/entry/rampant_layering_violation

yep, that sounds like what I've been hearing. What the ZFS (and reiserfs4) folks haven't been wanting to hear from the Linux kernel devs is that the devs are interested in having all these neat features available for use with all filesystems (and the Linux kernel has a _lot_ of filesystems available); with Solaris you basically have UFS and ZFS, so it's not as big a deal.

What really bugged me was his earlier article linked to there, where he talks about how ZFS eliminates the need for hardware RAID controllers: http://blogs.sun.com/bonwick/entry/raid_z While there may be merit to that idea for some applications, like situations where you have a pig of a RAID5 volume, that's just hype for database writes. "We issue the SYNCHRONIZE CACHE command to the disks after pushing all data in a transaction group" -- see, that would be the part the hardware controller is needed to accelerate. If you really care about whether your data hit disk, there is no way to break the RPM barrier without hardware support.
The fact that he misunderstands such a fundamental point makes me wonder what other gigantic mistakes might be buried in his analysis. I've seen similar comments from some of the linux kernel devs, they've used low-end raid controllers with small processors on them and think that a second core/socket in the main system to run software raid on is better. David Lang ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] estimating the need for VACUUM FULL and REINDEX
Heikki Linnakangas heikki 'at' enterprisedb.com writes: Guillaume Cottenceau wrote: According to documentation[1], VACUUM FULL's only benefit is returning unused disk space to the operating system; am I correct in assuming there's also the benefit of optimizing the performance of scans, because rows are physically compacted on the disk? That's right.

Ok. Then I think the documentation should probably be updated? It seems to totally miss this benefit. We've been hit by degrading performance, probably because of too-infrequent VACUUM ANALYZE, and in this situation it seems that the two solutions are either VACUUM FULL or dumping and recreating the database. Maybe this situation should be described in the documentation. On this list, everyone always says you should VACUUM ANALYZE frequently, but little is done to consider the case where we have to deal with an existing database on which this hasn't been done properly.

With that in mind, I've tried to estimate how much benefit would be brought by running VACUUM FULL, from the output of VACUUM VERBOSE. However, it seems that, for example, the removable rows reported by each VACUUM VERBOSE run are actually reused by VACUUM, so that is not what I'm looking for.

Take a look at contrib/pgstattuple. If a table has a high percentage of free space, VACUUM FULL will compact that out.

Thanks a lot. I've followed this path, and I think it should be said that free_space must also be large compared to 8K - free_percent can be large even on already-compacted tables with very few tuples.

Then according to documentation[2], REINDEX has some benefit when all but a few index keys on a page have been deleted, because the page remains allocated (thus, I assume, it improves index scan performance; am I correct?). However, again I'm unable to estimate the expected benefit. With a slightly modified version of a query found in documentation[3] to see the pages used by a relation[4], I'm able to see that the index data from a given table...
See pgstatindex, in the same contrib module. The number you're looking for is avg_leaf_density. REINDEX will bring that to 90% (with the default fill factor), so if it's much lower than that, REINDEX will help. Whoops, it seems this was not available in the pgstattuple of pg 7.4 :/ -- Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36
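The heuristics discussed above can be sketched as a small decision helper. This is only an illustration of the thread's advice, fed by numbers you would read out of pgstattuple and pgstatindex; the specific thresholds are assumptions of mine, not official guidance.

```python
PAGE_SIZE = 8192  # PostgreSQL block size

def needs_vacuum_full(free_space_bytes, free_percent,
                      pct_threshold=30.0, min_free=10 * PAGE_SIZE):
    """Decide from pgstattuple-style output whether VACUUM FULL would
    compact much.  As noted in the thread, free_percent alone misleads
    on tiny tables: a nearly empty 8 KB page reads as a huge percentage,
    so also require the absolute free space to be large compared to 8K.
    Thresholds are illustrative."""
    return free_percent > pct_threshold and free_space_bytes > min_free

def needs_reindex(avg_leaf_density, target=90.0, slack=20.0):
    """pgstatindex's avg_leaf_density: REINDEX rebuilds leaf pages to
    ~90% density with the default fill factor, so a value far below
    that suggests index bloat worth rebuilding."""
    return avg_leaf_density < target - slack

# A tiny, already-compact table: high percentage but trivial free space.
tiny = needs_vacuum_full(free_space_bytes=4000, free_percent=50.0)
# A genuinely bloated table: both the percentage and the bytes are large.
bloated = needs_vacuum_full(free_space_bytes=200 * 1024 * 1024,
                            free_percent=45.0)
```

Here `tiny` comes out False and `bloated` True, matching Guillaume's point that the percentage must be read together with the absolute free space.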
Re: [PERFORM] Best OS for Postgres 8.2
On Wed, May 09, 2007 at 01:57:51AM -0700, [EMAIL PROTECTED] wrote: given that RAID, snapshots, etc are already in the linux kernel, I suspect that what will need to happen is for the filesystem to be ported without those features and then the userspace tools (that manipulate the volumes ) be ported to use the things already in the kernel. Well, part of the idea behind ZFS is that these parts are _not_ separated in layers -- for instance, the filesystem can push data down to the RAID level to determine the stripe size used. Whether this is a good idea is of course hotly debated, but I don't think you can port just the filesystem part and call it a day. /* Steinar */ -- Homepage: http://www.sesse.net/
Re: [PERFORM] Best OS for Postgres 8.2
On Wed, 9 May 2007, Steinar H. Gunderson wrote: Well, part of the idea behind ZFS is that these parts are _not_ separated in layers -- for instance, the filesystem can push data down to the RAID level to determine the stripe size used.

There's nothing preventing this from happening if they are separate layers either. There are some performance implications of the separate layers, but until someone has the ability to do head-to-head comparisons it's hard to say which approach will win (in theory the lack of layers makes for faster code, but in practice the fact that each layer is gone over by experts looking for ways to optimize it may overwhelm the layering overhead).

Whether this is a good idea is of course hotly debated, but I don't think you can port just the filesystem part and call it a day.

Oh, I'm absolutely sure that doing so won't satisfy people (and would generate howls of outrage from some parts), but having watched other groups try to get things into the kernel that the kernel devs felt were layering violations, I think that's what will ultimately happen. David Lang
Re: [PERFORM] What's The Difference Between VACUUM and VACUUM ANALYZE?
Alvaro Herrera [EMAIL PROTECTED] writes: Steinar H. Gunderson wrote: Shoot me if I'm wrong here, but doesn't VACUUM ANALYZE check _all_ tuples, as compared to the random selection employed by ANALYZE? You are wrong, but it won't be me the one to shoot you. There have been noises towards making the ANALYZE portion use the same scan that VACUUM already does, but nobody has written the code (it would be useful for some kinds of stats). I think it does for the count of total records in the table. But not for the rest of the stats. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: FW: [PERFORM]
[EMAIL PROTECTED] writes: with a standard 7200 rpm drive ~150 transactions/sec sounds about right. To really speed things up you want to get a disk controller with a battery-backed cache so that the writes don't need to hit the disk to be safe. Note that this is only if you're counting transactions/sec in a single session. You can get much more if you have many sessions, since they can all commit together in a single disk I/O. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
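The "RPM barrier" mentioned in both threads is just arithmetic: a synchronous commit waits for at least one platter rotation, so a lone session tops out near rpm/60 commits per second, while group commit lets concurrent sessions share one WAL flush. A back-of-envelope sketch (the session count is a hypothetical example, and the model ignores seek time and controller caches):

```python
def commits_per_sec(rpm, sessions=1):
    """Upper bound on synchronous commits/sec for a single spinning disk:
    each flush costs at least one rotation, so a single session is capped
    near rpm/60.  With group commit, each rotation can retire one commit
    per concurrently waiting session."""
    rotations_per_sec = rpm / 60.0
    return rotations_per_sec * sessions

single = commits_per_sec(7200)        # ~120/s, in line with the ~150 tx/s quoted
grouped = commits_per_sec(7200, 10)   # ten sessions sharing each flush
```

A battery-backed write cache sidesteps the bound entirely, because the "flush" completes as soon as the data is in the controller's RAM.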
Re: [PERFORM] Nested loops overpriced
Am Dienstag, 8. Mai 2007 17:53 schrieb Tom Lane: Hmm, I'd have expected it to discount the repeated indexscans a lot more than it seems to be doing for you. As an example in the regression database, note what happens to the inner indexscan cost estimate when the number of outer tuples grows: I can reproduce your results in the regression test database. 8.2.1 and 8.2.4 behave the same. I checked the code around cost_index(), and the assumptions appear to be correct (at least this query doesn't produce wildly unusual data). Apparently, however, the caching effects are much more significant than the model takes into account. -- Peter Eisentraut http://developer.postgresql.org/~petere/
[PERFORM] Apparently useless bitmap scans
There's another odd thing about this plan from yesterday. Query:

SELECT eh_subj.header_body AS subject, count(distinct eh_from.header_body)
FROM email
  JOIN mime_part USING (email_id)
  JOIN email_header eh_subj USING (email_id, mime_part_id)
  JOIN email_header eh_from USING (email_id, mime_part_id)
WHERE eh_subj.header_name = 'subject'
  AND eh_from.header_name = 'from'
  AND mime_part_id = 0
  AND (time >= timestamp '2007-05-05 17:01:59'
   AND time < timestamp '2007-05-05 17:01:59' + interval '60 min')
GROUP BY eh_subj.header_body;

Plan:

GroupAggregate (cost=1920309.81..1920534.21 rows=11220 width=184) (actual time=5349.493..5587.536 rows=35000 loops=1)
  ->  Sort (cost=1920309.81..1920337.86 rows=11220 width=184) (actual time=5349.427..5392.110 rows=35000 loops=1)
        Sort Key: eh_subj.header_body
        ->  Nested Loop (cost=15576.58..1919555.05 rows=11220 width=184) (actual time=537.938..5094.377 rows=35000 loops=1)
              ->  Nested Loop (cost=15576.58..475387.23 rows=11020 width=120) (actual time=537.858..4404.330 rows=35000 loops=1)
                    ->  Nested Loop (cost=15576.58..430265.44 rows=11092 width=112) (actual time=537.768..4024.184 rows=35000 loops=1)
                          ->  Bitmap Heap Scan on email_header eh_from (cost=15576.58..16041.55 rows=107156 width=104) (actual time=537.621..1801.032 rows=280990 loops=1)
                                Recheck Cond: ((mime_part_id = 0) AND (header_name = 'from'::text))
                                ->  BitmapAnd (cost=15576.58..15576.58 rows=160 width=0) (actual time=500.006..500.006 rows=0 loops=1)
                                      ->  Bitmap Index Scan on dummy_index (cost=0.00..3724.22 rows=107156 width=0) (actual time=85.025..85.025 rows=280990 loops=1)
                                      ->  Bitmap Index Scan on idx__email_header__from_local (cost=0.00..5779.24 rows=107156 width=0) (actual time=173.006..173.006 rows=280990 loops=1)
                                      ->  Bitmap Index Scan on dummy2_index (cost=0.00..5992.25 rows=107156 width=0) (actual time=174.463..174.463 rows=280990 loops=1)
                          ->  Index Scan using email_pkey on email (cost=0.00..3.85 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=280990)
                                Index Cond: (email.email_id = eh_from.email_id)
                                Filter: ((time >= '2007-05-05 17:01:59'::timestamp without time zone) AND (time < '2007-05-05 18:01:59'::timestamp without time zone))
                    ->  Index Scan using mime_part_pkey on mime_part (cost=0.00..4.06 rows=1 width=12) (actual time=0.005..0.006 rows=1 loops=35000)
                          Index Cond: ((email.email_id = mime_part.email_id) AND (mime_part.mime_part_id = 0))
              ->  Index Scan using idx__email_header__email_id__mime_part_id on email_header eh_subj (cost=0.00..130.89 rows=13 width=104) (actual time=0.009..0.015 rows=1 loops=35000)
                    Index Cond: ((email.email_id = eh_subj.email_id) AND (0 = eh_subj.mime_part_id))
                    Filter: (header_name = 'subject'::text)
Total runtime: 5625.024 ms

I'm wondering what it wants to achieve with these three index scans:

->  Bitmap Index Scan on dummy_index (cost=0.00..3724.22 rows=107156 width=0) (actual time=85.025..85.025 rows=280990 loops=1)
->  Bitmap Index Scan on idx__email_header__from_local (cost=0.00..5779.24 rows=107156 width=0) (actual time=173.006..173.006 rows=280990 loops=1)
->  Bitmap Index Scan on dummy2_index (cost=0.00..5992.25 rows=107156 width=0) (actual time=174.463..174.463 rows=280990 loops=1)

The indexes in question are:

CREATE INDEX dummy_index ON email_header ((555)) WHERE mime_part_id = 0 AND header_name = 'from';
CREATE INDEX dummy2_index ON email_header (substr(header_body,5)) WHERE mime_part_id = 0 AND header_name = 'from';
CREATE INDEX idx__email_header__from_local ON email_header (get_localpart(header_body)) WHERE mime_part_id = 0 AND header_name = 'from';

It appears to want to use these indexes to get the restriction AND eh_from.header_name = 'from' AND mime_part_id = 0 from the query, but why does it need three of them to do it, when all of them have the same predicate and none of them has an indexed expression that appears in the query? There are more partial indexes with the
[PERFORM] Cannot make GIN intarray index be used by the planner
Hello all, I am trying to move from a GiST intarray index to a GIN intarray index, but my GIN index is not being used by the planner. The normal query is like this:

select * from sourcetablewith_int4 where ARRAY[myint] <@ myint_array and some_other_filters

(with the GiST index everything works fine, but the GIN index is not being used). If I create the same table populating it with text[] data, like

select myint_array::text[] as myint_array_as_textarray into newtablewith_text from sourcetablewith_int4

and then create a GIN index using this new text[] column, the planner starts to use the index and queries run with great speed when the query looks like this:

select * from newtablewith_text where ARRAY['myint'] <@ myint_array_as_textarray and some_other_filters

Where can the problem be with the _int4 GIN index in this constellation? By now enable_seqscan is set to off in the configuration. With best regards, -- Valentine Gogichashvili
[PERFORM] Poor performance with queries using clause: sth IN (...)
That's an email from my friend. Any hints? Original Message Subject: bug Date: Wed, 09 May 2007 15:03:00 +0200 From: Michal Postupalski To: Andrzej Zawadzki We've just upgraded our database from 8.1 to 8.2 and we are grief-stricken about very poor performance of queries using the clause: sth IN (...). As we can see, any such query is translated to sth = ANY ('{}'::bpchar[]) and it takes much more time because it doesn't use indexes. Why? How can we speed up these queries? I've just read "Performance of IN (...) vs. = ANY array[...]" on the pgsql-performance mailing list and I didn't find any solutions. Can anybody tell me what I can do to force postgres to use indexes? If there isn't any solution, I'm afraid we will have to downgrade to the previous version, 8.1. Example:

SELECT count(*)
FROM kredytob b, kredyty k
WHERE true
  AND b.kredytid = k.id
  AND '' IN ('', upper(b.nazwisko))
  AND '' IN ('', upper(b.imie))
  AND '7804485' IN ('', b.pesel)
  AND '' IN ('', upper(trim(b.dowseria)))
  AND '' IN ('', b.dowosnr)
  AND 0 IN (0, b.typkred)
  AND k.datazwrot IS NULL;

regards... Michał Postupalski
Re: [PERFORM] Cannot make GIN intarray index be used by the planner
Do you have both indexes (GiST, GIN) on the same table ? Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Re: [PERFORM] Cannot make GIN intarray index be used by the planner
I have experimented quite a lot. The first thing I did when starting the attempt to move from GiST to GIN was to drop the GiST index and create a brand new GIN index... after that did not bring results, I started to create all these tables with different sets of indexes and so on... So the answer to the question is: no, there is only a GIN index on the table. Thank you in advance, Valentine On 5/9/07, Oleg Bartunov [EMAIL PROTECTED] wrote: Do you have both indexes (GiST, GIN) on the same table ? -- ვალენტინ გოგიჩაშვილი Valentine Gogichashvili
Re: [PERFORM] Nested loops overpriced
I'm having something weird too... Look:

Nested Loop Left Join (cost=93.38..7276.26 rows=93 width=58) (actual time=99.211..4804.525 rows=2108 loops=1)
  ->  Hash Join (cost=93.38..3748.18 rows=93 width=4) (actual time=0.686..20.632 rows=45 loops=1)
        Hash Cond: ((u.i)::text = (m.i)::text)
        ->  Seq Scan on u (cost=0.00..2838.80 rows=10289 width=4) (actual time=0.010..7.813 rows=10291 loops=1)
        ->  Hash (cost=87.30..87.30 rows=30 width=7) (actual time=0.445..0.445 rows=45 loops=1)
              ->  Index Scan using m_pkey on m (cost=0.00..87.30 rows=30 width=7) (actual time=0.046..0.371 rows=45 loops=1)
                    Index Cond: (t = 1613)
                    Filter: ((a)::text = 'Y'::text)
  ->  Index Scan using s_pkey on s (cost=0.00..37.33 rows=3 width=58) (actual time=19.864..106.198 rows=47 loops=45)
        Index Cond: ((u.i = s.u) AND ((s.p)::text = '4'::text) AND (s.t = 1613) AND ((s.c)::text = 'cmi.core.total_time'::text))
Total runtime: 4805.975 ms

And disabling all the joins Tom said:

Nested Loop Left Join (cost=0.00..16117.12 rows=93 width=58) (actual time=2.706..168.556 rows=2799 loops=1)
  ->  Nested Loop (cost=0.00..13187.94 rows=93 width=4) (actual time=2.622..125.739 rows=50 loops=1)
        ->  Seq Scan on u (cost=0.00..2838.80 rows=10289 width=4) (actual time=0.012..9.863 rows=10291 loops=1)
        ->  Index Scan using m_pkey on m (cost=0.00..0.80 rows=1 width=7) (actual time=0.009..0.009 rows=0 loops=10291)
              Index Cond: ((m.t = 1615) AND ((u.i)::text = (m.i)::text))
              Filter: ((a)::text = 'Y'::text)
  ->  Index Scan using s_pkey on s (cost=0.00..31.09 rows=2 width=58) (actual time=0.047..0.778 rows=56 loops=50)
        Index Cond: ((u.i = s.u) AND ((s.p)::text = '4'::text) AND (s.t = 1615) AND ((s.c)::text = 'cmi.core.total_time'::text))
Total runtime: 169.836 ms

I had PostgreSQL 8.2.3 on x86_64-redhat-linux-gnu, shared_buffers at 1640MB, effective_cache_size at 5400MB and 8GB of RAM, where all shared_buffers blocks are used (pg_buffercache, relfilenode IS NOT NULL).
Note that even when I set default_statistics_target to 500 and run ANALYZE s;, I cannot get the estimated row count of the index scan on s close to the actual rows. Could it be related? -- Daniel Cristian Cruz
Re: [PERFORM] Nested loops overpriced
Daniel Cristian Cruz [EMAIL PROTECTED] writes:

->  Nested Loop (cost=0.00..13187.94 rows=93 width=4) (actual time=2.622..125.739 rows=50 loops=1)
      ->  Seq Scan on u (cost=0.00..2838.80 rows=10289 width=4) (actual time=0.012..9.863 rows=10291 loops=1)
      ->  Index Scan using m_pkey on m (cost=0.00..0.80 rows=1 width=7) (actual time=0.009..0.009 rows=0 loops=10291)

That's not discounting the nested loop for cache effect at all! What is your effective_cache_size for this? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PERFORM] Nested loops overpriced
2007/5/9, Gregory Stark [EMAIL PROTECTED]: That's not discounting the nested loop for cache effect at all! What is your effective_cache_size for this?

effective_cache_size is 5400MB. I forgot to mention a modification to the costs: cpu_tuple_cost = 0.2, which forced the use of indexes. I set it to 0.01 and the plan gets an index scan on m before the hash on u, and is 15% slower:

Hash Cond: ((u.i)::text = (m.i)::text)
->  Seq Scan on u (cost=0.00..2838.80 rows=10289 width=4) (actual time=0.007..6.138 rows=10292 loops=1)
->  Hash (cost=87.30..87.30 rows=30 width=7) (actual time=0.185..0.185 rows=50 loops=1)
      ->  Index Scan using m_pkey on m (cost=0.00..87.30 rows=30 width=7) (actual time=0.021..0.144 rows=50 loops=1)
            Index Cond: (t = 1615)
            Filter: ((a)::text = 'Y'::text)

I'm still confused, since I didn't understand what "That's not discounting the nested loop for cache effect at all!" could mean... Thanks for the help. -- Daniel Cristian Cruz
Re: [PERFORM] Nested loops overpriced
Am Mittwoch, 9. Mai 2007 16:11 schrieb Tom Lane: Well, there's something funny going on here. You've got for instance

->  Index Scan using email_pkey on email (cost=0.00..3.85 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=280990)
      Index Cond: (email.email_id = eh_from.email_id)
      Filter: ((time >= '2007-05-05 17:01:59'::timestamp without time zone) AND (time < '2007-05-05 18:01:59'::timestamp without time zone))

on the inside of a nestloop whose outer side is predicted to return 107156 rows. That should've been discounted to *way* less than 3.85 cost units per iteration.

This is the new plan with 8.2.4. It's still got the same problem, though.

GroupAggregate (cost=5627064.21..5627718.73 rows=32726 width=184) (actual time=4904.834..5124.585 rows=35000 loops=1)
  ->  Sort (cost=5627064.21..5627146.03 rows=32726 width=184) (actual time=4904.771..4947.892 rows=35000 loops=1)
        Sort Key: eh_subj.header_body
        ->  Nested Loop (cost=0.00..5624610.06 rows=32726 width=184) (actual time=0.397..4628.141 rows=35000 loops=1)
              ->  Nested Loop (cost=0.00..1193387.12 rows=28461 width=120) (actual time=0.322..3960.360 rows=35000 loops=1)
                    ->  Nested Loop (cost=0.00..1081957.26 rows=28648 width=112) (actual time=0.238..3572.023 rows=35000 loops=1)
                          ->  Index Scan using dummy_index on email_header eh_from (cost=0.00..13389.15 rows=280662 width=104) (actual time=0.133..1310.248 rows=280990 loops=1)
                          ->  Index Scan using email_pkey on email (cost=0.00..3.79 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=280990)
                                Index Cond: (email.email_id = eh_from.email_id)
                                Filter: ((time >= '2007-05-05 17:01:59'::timestamp without time zone) AND (time < '2007-05-05 18:01:59'::timestamp without time zone))
                    ->  Index Scan using mime_part_pkey on mime_part (cost=0.00..3.88 rows=1 width=12) (actual time=0.005..0.006 rows=1 loops=35000)
                          Index Cond: ((email.email_id = mime_part.email_id) AND (mime_part.mime_part_id = 0))
              ->  Index Scan using idx__email_header__email_id__mime_part_id on email_header eh_subj (cost=0.00..155.47 rows=18 width=104) (actual time=0.009..0.014 rows=1 loops=35000)
                    Index Cond: ((email.email_id = eh_subj.email_id) AND (0 = eh_subj.mime_part_id))
                    Filter: (header_name = 'subject'::text)
Total runtime: 5161.390 ms

Are you using any nondefault planner settings?

random_page_cost = 3
effective_cache_size = 384MB

How big are these tables, anyway?

email          35 MB
email_header  421 MB
mime_part      37 MB

Everything is analyzed, vacuumed, and reindexed. -- Peter Eisentraut http://developer.postgresql.org/~petere/
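The repeated-indexscan discount Tom refers to comes from estimating how many *distinct* pages all the inner scans touch combined: after the first few iterations, most fetches hit pages already in cache. A sketch of that idea, using the Mackert-Lohman style approximation simplified to the case where the table fits in effective_cache_size (this is an illustrative model, not PostgreSQL's exact code path, and the page count is a made-up example):

```python
def pages_fetched(tuples_fetched, pages_in_table):
    """Expected number of distinct pages actually read when fetching N
    tuples at random from a T-page relation that fits in cache
    (Mackert-Lohman approximation).  Grows sublinearly in N and is
    capped at T: once every page has been touched, further probes are
    cache hits."""
    T, N = float(pages_in_table), float(tuples_fetched)
    return min(2.0 * T * N / (2.0 * T + N), T)

T = 4500                            # assumed page count of the inner table
total = pages_fetched(280990, T)    # 280990 outer rows, one probe each
per_scan = total / 280990           # amortized pages read per iteration
```

With numbers like the plan above, `total` saturates at the table size and `per_scan` drops to a small fraction of a page, which is why the 3.85-cost-units-per-iteration estimate looks undiscounted.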
ZFS and Postgresql - WASRe: [PERFORM] Best OS for Postgres 8.2
Hello Ian, I have done some testing with PostgreSQL and ZFS on Solaris 10 11/06. While I work for Sun, I don't claim to be a ZFS expert (for that matter, not even Solaris or PostgreSQL). Let's first look at the scenarios of how PostgreSQL can be deployed on Solaris. First the Solaris options:

1. UFS with default setup (which is a buffered file system)
2. UFS with the forcedirectio option (unbuffered file system)
3. ZFS by default (128K recordsize, with checksums but no compression)
4. ZFS with compression (default compression using an LZ* algorithm; now even a gzip algorithm is supported)

(For simplicity I am not considering RAID levels here, since that increases the number of scenarios quite a bit, and I'm also skipping Solaris Volume Manager - the legacy volume management capabilities in Solaris.)

Now for the postgresql.conf options:

a. wal_sync_method set to the default - maps to open_datasync
b. wal_sync_method set to fdatasync (assuming checkpoint_segments and wal_buffers are high already)

(These are my test results based on the way I used the workload, and your mileage will vary.) So with these configurations I found the following:

1a. Default UFS with default wal_sync_method - sucks for me, mostly using pgbench or EAStress-type workloads
1b. Default UFS with fdatasync - works well, especially after increasing segmapsize from the default 12% to higher values
2a. forcedirectio with default wal_sync_method - works well, but then is limited to hardware disk performance (in a way it's good to have a RAID controller with a big write cache for it; one advantage is lower system CPU utilization)
2b. Didn't see a huge difference from 2a in this case
3a. Better than 1a, but still limited
3b. Better even than 3a and 1b, but CPU utilization seemed higher
4a. Didn't test this out
4b. Hard to say, since in my case I wasn't disk bound (per se) but CPU bound.
The compression helps when the number of IOs to the disk is high; it helps cut that down at the cost of CPU cycles. Overall, ZFS seems to improve performance with PostgreSQL on Solaris 10, with somewhat increased system time compared to UFS. (So the final result depends on the metrics by which you measure performance :-) ) (The ZFS engineers are constantly improving performance, and I have seen the improvements from the Solaris 10 1/06 release to my current setup.) Of course I haven't compared against any other OS. If someone has already done that, I would be interested in knowing the results.

Now comes the thing that I am still exploring: * Do we checksum the WAL? I guess we do, which means that we are now doing double checksumming on the data: once in ZFS and once in PostgreSQL. ZFS does allow checksumming to be turned off (but only on newly allocated blocks). But of course the philosophical question is where it should be done (ZFS or PostgreSQL). ZFS checksumming gives the ability to correct the data on a bad checksum if you use mirrored devices. PostgreSQL doesn't give that ability and in case of an error would fail. (I don't know the exact behavior of PostgreSQL when it encounters a failed checksum.) Hope this helps. Regards, Jignesh

李彦 Ian Li wrote: In #postgresql on freenode, somebody once mentioned that ZFS from Solaris helps the performance of pgsql a lot, so does anyone have information about that? Steve Atkins wrote: On May 7, 2007, at 2:55 PM, David Levy wrote: Hi, I am about to order a new server for my Postgres cluster. I will probably get a Dual Xeon Quad Core instead of my current Dual Xeon. Which OS would you recommend to optimize Postgres behaviour (i/o access, multithreading, etc)? I am hesitating between Fedora Core 6, CentOS and Debian. Can anyone help with this? Well, all three you mention are much the same, just with a different badge on the box, as far as performance is concerned.
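Scenario (b) above corresponds to postgresql.conf settings along these lines. The specific values are illustrative assumptions, not numbers from the post; "high already" is whatever fits your WAL volume.

```
# postgresql.conf sketch for scenario (b) - values are examples only
wal_sync_method     = fdatasync
checkpoint_segments = 64        # "high already", per the post
wal_buffers         = 1MB
```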
They're all going to be a moderately recent Linux kernel, with your choice of filesystems, so any choice between them is going to be driven more by available staff and support or personal preference. I'd probably go CentOS 5 over Fedora just because Fedora doesn't get supported for very long - more of an issue with a dedicated database box with a long lifespan than your typical desktop or interchangeable webserver. I might also look at Solaris 10, though. I've yet to play with it much, but it seems nice, and I suspect it might manage 8 cores better than current Linux setups. Cheers, Steve Regards Ian
Re: ZFS and Postgresql - WASRe: [PERFORM] Best OS for Postgres 8.2
Jignesh Shah escribió: Now comes the thing that I am still exploring: * Do we checksum the WAL? I guess we do, which means that we are now doing double checksumming on the data: once in ZFS and once in PostgreSQL. ZFS does allow checksumming to be turned off (but only on newly allocated blocks). But of course the philosophical question is where it should be done (ZFS or PostgreSQL). Checksums on WAL are not optional in Postgres, because AFAIR they are used to determine when it should stop recovering. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
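Alvaro's point can be shown with a toy model: recovery replays WAL records in order and treats the first record whose stored CRC doesn't match its contents as the end of the log, which is exactly what a half-written record at the crash point looks like. This is only an illustration of the principle (real WAL uses a CRC over the record header and body, not plain crc32 over a payload):

```python
import zlib

def replay_wal(records):
    """Toy end-of-recovery detection: records are (payload, stored_crc)
    pairs; replay until a record's CRC doesn't verify, then stop.  The
    mismatching record marks the torn tail left by the crash."""
    replayed = []
    for payload, stored_crc in records:
        if zlib.crc32(payload) & 0xFFFFFFFF != stored_crc:
            break  # garbage tail: this is where recovery ends
        replayed.append(payload)
    return replayed

good = [(b"rec1", zlib.crc32(b"rec1")), (b"rec2", zlib.crc32(b"rec2"))]
# A torn record: flip the CRC bits so it is guaranteed not to verify.
torn = good + [(b"half-written", zlib.crc32(b"half-written") ^ 0xFFFFFFFF)]
```

Because the checksum doubles as the end-of-log marker, it can't be turned off the way ZFS's block checksums can; the two serve different purposes even though the data is checksummed twice.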
Re: [PERFORM] Best OS for Postgres 8.2
On May 8, 2007, at 2:59 AM, [EMAIL PROTECTED] wrote: One issue with journaling filesystems: if you journal the data as well as the metadata, you end up with a very reliable setup, but it means that all your data needs to be written twice, once to the journal and once to the final location. The write to the journal can be slightly faster than a normal write to the final location (the journal is a sequential write to an existing file), but the need to write twice can effectively cut your disk I/O bandwidth in half when doing heavy writes. Worse, when you end up writing more than will fit in the journal (128MB is the max for ext3), the entire system then needs to stall while the journal gets cleared to make space for the additional writes. That's why you want to mount ext3 partitions used with PostgreSQL with data=writeback. Some folks will also use a small filesystem for pg_xlog and mount that as ext2. -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
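A sketch of what that might look like in /etc/fstab (device names and mount points are illustrative; pg_xlog would be symlinked out of the data directory onto its own partition):

```
# PostgreSQL data directory: journal metadata only, not data
/dev/sdb1  /var/lib/pgsql/data     ext3  noatime,data=writeback  0 2
# pg_xlog on its own small partition; WAL write ordering makes ext2 safe here
/dev/sdc1  /var/lib/pgsql/pg_xlog  ext2  noatime                 0 2
```

The data=writeback mode avoids journaling file contents, which is acceptable because Postgres already guarantees data integrity through its own WAL.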
Re: [PERFORM] How to Find Cause of Long Vacuum Times - NOOB Question
On May 7, 2007, at 11:10 PM, Yudhvir Singh Sidhu wrote: Jim Nasby wrote: On May 5, 2007, at 5:57 PM, Yudhvir Singh Sidhu wrote: Problem: Vacuum times jump up from 45 minutes, or 1:30, to 6+ hours overnight, once every 1 to 3 months. Solutions tried: db truncate - brings vacuum times down. Reindexing brings vacuum times down. Does it jump up to 6+ hours just once and then come back down? Or once at 6+ hours does it stay there? Getting that kind of change in vacuum time sounds a lot like you suddenly didn't have enough maintenance_work_mem to remember all the dead tuples in one pass; increasing that setting might bring things back in line (you can increase it on a per-session basis, too). Also, have you considered vacuuming during the day, perhaps via autovacuum? If you can vacuum more often you'll probably get less bloat. You'll probably want to experiment with the vacuum_cost_delay settings to reduce the impact of vacuuming during the day (try setting vacuum_cost_delay to 20 as a starting point). It ramps up and I have to run a db truncate to bring it back down. On some machines it creeps up, on others it spikes. I have seen it climb from 6 to 12 to 21 in 3 consecutive days. Well, what's one to do? I have maintenance_work_mem set to 32768 - is that enough? Depends on how many dead rows there are to be vacuumed. If there's a lot, you could certainly be exceeding maintenance_work_mem. If you look closely at the output of VACUUM VERBOSE you'll see the indexes for a particular table being scanned more than once if all the dead rows can't fit into maintenance_work_mem. I vacuum daily. If you've got high update rates, that very likely might not be often enough. I just turned vacuum verbose on on one of the systems and will find out tomorrow what it shows me. I plan on playing with the max_fsm_* settings tomorrow. And I'll keep you guys up to date. The tail end of vacuumdb -av will tell you exactly how much room is needed in the FSM.
-- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
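The per-session approach Jim mentions can be sketched like this (the values echo the numbers from the thread and the table name is illustrative; size maintenance_work_mem to your actual dead-tuple volume):

```sql
-- Raise maintenance_work_mem for this session only, so the
-- maintenance window gets more memory without a server-wide change.
SET maintenance_work_mem = 98304;   -- in kB: 96MB for this session
SET vacuum_cost_delay = 20;         -- in ms: throttle vacuum's I/O impact
VACUUM VERBOSE ANALYZE mytable;     -- repeated index scans here mean
                                    -- maintenance_work_mem is still too small
```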
Re: [PERFORM] Nested loops overpriced
Peter Eisentraut [EMAIL PROTECTED] writes: Are you using any nondefault planner settings? random_page_cost = 3, effective_cache_size = 384MB. How big are these tables, anyway? email 35 MB, email_header 421 MB, mime_part 37 MB. Hmmm ... I see at least part of the problem, which is that email_header is joined twice in this query, which means that it's counted twice in figuring the total volume of pages competing for cache space. So the thing thinks cache space is oversubscribed nearly 3X when in reality the database is fully cached. I remember having dithered about whether to try to avoid counting the same physical relation more than once in total_table_pages, but this example certainly suggests that we shouldn't. Meanwhile, do the estimates get better if you set effective_cache_size to 1GB or so? To return to your original comment: if you're trying to model a situation with a fully cached database, I think it's sensible to set random_page_cost = seq_page_cost = 0.1 or so. You had mentioned having to decrease them to 0.02, which seems unreasonably small to me too, but maybe with the larger effective_cache_size you won't have to go that far. regards, tom lane
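For a fully cached database, Tom's suggestion amounts to something like this in postgresql.conf (values follow the thread; tune effective_cache_size to your actual RAM):

```
# Database fits in RAM: tell the planner cache is plentiful and that
# random reads cost about the same as sequential ones.
effective_cache_size = 1GB      # was 384MB
seq_page_cost = 0.1
random_page_cost = 0.1          # keep equal to seq_page_cost when fully cached
```

Keeping the two page costs equal, rather than driving random_page_cost toward extreme values like 0.02, models the "everything is in cache" case without distorting other cost comparisons.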
Re: [PERFORM] Cannot make GIN intarray index be used by the planner
[cc'ing to pgsql-hackers since this is looking like a contrib/intarray bug] Valentine Gogichashvili [EMAIL PROTECTED] writes: here is the DT That works fine for me in 8.2: regression=# explain SELECT id, (myintarray_int4) FROM myintarray_table_nonulls WHERE ARRAY[8] @ myintarray_int4; QUERY PLAN -- Index Scan using idx_nonnulls_myintarray_int4_gin on myintarray_table_nonulls (cost=0.00..8.27 rows=1 width=36) Index Cond: ('{8}'::integer[] @ myintarray_int4) (2 rows) What I am betting is that you've installed contrib/intarray in this database and that's bollixed things up somehow. In particular, intarray tries to take over the position of default gin opclass for int4[], and the opclass that it installs as default has operators named just like the built-in ones. If somehow your query is using pg_catalog.@ instead of intarray's public.@, then the planner wouldn't think the index is relevant. In a quick test your example still works with intarray installed, because what it's really created is public.@ (integer[], integer[]) which is an exact match and therefore takes precedence over the built-in pg_catalog.@ (anyarray, anyarray). But if for example you don't have public in your search_path then the wrong operator would be chosen. Please look at the pg_index entry for your index, eg select * from pg_index where indexrelid = 'versionA.idx_nonnulls_myintarray_int4_gin'::regclass; and see whether the index opclass is the built-in one or not. Note to hackers: we've already discussed that intarray shouldn't be trying to take over the default gin opclass, but I am beginning to wonder if it still has a reason to live at all. We should at least consider removing the redundant operators to avoid risks like this one. regards, tom lane
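A sketch of the check Tom suggests, joined out to the opclass name and schema so it's obvious at a glance whether the built-in or the intarray opclass ended up on the index (index and schema names follow the example above):

```sql
-- Which operator class does the single-column GIN index actually use?
SELECT ic.relname AS index_name,
       op.opcname AS opclass_name,
       ns.nspname AS opclass_schema   -- pg_catalog = built-in, public = intarray
FROM pg_index i
JOIN pg_class ic ON ic.oid = i.indexrelid
JOIN pg_opclass op ON op.oid = i.indclass[0]   -- first (only) index column
JOIN pg_namespace ns ON ns.oid = op.opcnamespace
WHERE i.indexrelid = 'versionA.idx_nonnulls_myintarray_int4_gin'::regclass;
```

It's also worth checking SHOW search_path in the failing session, since per the analysis above the wrong @ operator gets picked when public isn't on the path.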
[PERFORM] Vacuum Times - Verbose and maintenance_work_mem
How do you specify a log file for vacuum verbose to send info to? I have verbose turned on but cannot see any log messages. I have upped the maintenance_work_mem setting from 32768 to 98304. This is on a 4 GB, 3.2 GHz Xeon, dual core, dual cpu with HTT turned on. I hope that helps with vacuum times. PG 8.0.9 on UFS2, FreeBSD 6.2 prerelease. -- Yudhvir Singh Sidhu 408 375 3134 cell
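One likely explanation: VACUUM VERBOSE reports as INFO messages to the client, not to the server log, so the simplest approach, assuming the nightly vacuum runs from a cron script, is to capture the client output (path is illustrative):

```
# VACUUM VERBOSE details arrive on the client side; capture them there
vacuumdb --all --verbose --analyze > /var/log/pgsql/vacuum.log 2>&1
```

Alternatively, setting log_min_messages = info in postgresql.conf should make the INFO lines land in the server log as well.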
[PERFORM] Performance Woes
Hi, I have several databases. They are each about 35GB in size and have about 10.5K relations (count from pg_stat_all_tables) in them. pg_class is about 26k rows and the data directory contains about 70k files. These are busy machines; they run about 50 transactions per second (approx. insert/update/delete about 500 rows per second). We started getting errors about the number of open file descriptors: 2007-05-09 03:07:50.083 GMT 1146975740: LOG: 53000: out of file descriptors: Too many open files; release and retry 2007-05-09 03:07:50.083 GMT 1146975740: CONTEXT: SQL statement insert ….. PL/pgSQL function trigfunc_whatever line 50 at execute statement 2007-05-09 03:07:50.083 GMT 1146975740: LOCATION: BasicOpenFile, fd.c:471 2007-05-09 03:07:50.091 GMT 0: LOG: 0: duration: 12.362 ms 2007-05-09 03:07:50.091 GMT 0: LOCATION: exec_simple_query, postgres.c:1090 So we decreased max_files_per_process to 800. This took care of the error *BUT* about quadrupled the IO wait that is happening on the machine. It went from a peak of about 50% to peaks of over 200% (4 processor machines, 4 gigs ram, raid). The load on the machine remained constant. I am really trying to get an understanding of exactly what this setting is, what exactly is out of file descriptors, and how I can fix that. I need to bring that IO back down. Thanks for any help. Ralph
Re: [PERFORM] Performance Woes
I have several databases. They are each about 35GB in size and have about 10.5K relations (count from pg_stat_all_tables) in them. pg_class is about 26k rows and the data directory contains about 70k files. These are busy machines; they run about 50 transactions per second (approx. insert/update/delete about 500 rows per second). We started getting errors about the number of open file descriptors: 2007-05-09 03:07:50.083 GMT 1146975740: LOG: 53000: out of file descriptors: Too many open files; release and retry 2007-05-09 03:07:50.083 GMT 1146975740: CONTEXT: SQL statement insert ….. PL/pgSQL function trigfunc_whatever line 50 at execute statement 2007-05-09 03:07:50.083 GMT 1146975740: LOCATION: BasicOpenFile, fd.c:471 2007-05-09 03:07:50.091 GMT 0: LOG: 0: duration: 12.362 ms 2007-05-09 03:07:50.091 GMT 0: LOCATION: exec_simple_query, postgres.c:1090 So we decreased max_files_per_process to 800. This took care of the error **BUT** about quadrupled the IO wait that is happening on the machine. It went from a peak of about 50% to peaks of over 200% (4 processor machines, 4 gigs ram, raid). The load on the machine remained constant. What version of Pg/OS? What is your hardware config? I had seen these errors with earlier versions of Pg 7.4.x, which were fixed in later releases according to the changelogs.
Re: [PERFORM] Performance Woes
2007-05-09 03:07:50.083 GMT 1146975740: LOCATION: BasicOpenFile, fd.c:471 2007-05-09 03:07:50.091 GMT 0: LOG: 0: duration: 12.362 ms 2007-05-09 03:07:50.091 GMT 0: LOCATION: exec_simple_query, postgres.c:1090 So we decreased max_files_per_process to 800. This took care of the error **BUT** about quadrupled the IO wait that is happening on the machine. It went from a peak of about 50% to peaks of over 200% (4 processor machines, 4 gigs ram, raid). The load on the machine remained constant. Sounds to me like you just need to up the total amount of open files allowed by the operating system. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Re: [PERFORM] Performance Woes
Hello, You likely need to increase your file-max parameters using sysctl.conf. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
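On Linux that would look something like the following in /etc/sysctl.conf (the value is illustrative; size it to roughly max_files_per_process times the expected number of backends, plus headroom for the rest of the system; on FreeBSD the knob is kern.maxfiles instead):

```
# Allow ~1000 FDs per backend across ~100 backends, plus slack
fs.file-max = 150000
```

Apply with sysctl -p, then max_files_per_process can go back up to a level that covers the working set of open tables.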
Re: [PERFORM] Performance Woes
On Wed, 2007-05-09 at 17:29 -0700, Joshua D. Drake wrote: 2007-05-09 03:07:50.083 GMT 1146975740: LOCATION: BasicOpenFile, fd.c:471 2007-05-09 03:07:50.091 GMT 0: LOG: 0: duration: 12.362 ms 2007-05-09 03:07:50.091 GMT 0: LOCATION: exec_simple_query, postgres.c:1090 So we decreased max_files_per_process to 800. This took care of the error **BUT** about quadrupled the IO wait that is happening on the machine. It went from a peak of about 50% to peaks of over 200% (4 processor machines, 4 gigs ram, raid). The load on the machine remained constant. Sounds to me like you just need to up the total amount of open files allowed by the operating system. It looks more like the opposite; here are the docs for max_files_per_process: Sets the maximum number of simultaneously open files allowed to each server subprocess. The default is one thousand files. If the kernel is enforcing a safe per-process limit, you don't need to worry about this setting. But on some platforms (notably, most BSD systems), the kernel will allow individual processes to open many more files than the system can really support when a large number of processes all try to open that many files. If you find yourself seeing Too many open files failures, try reducing this setting. This parameter can only be set at server start. To me, that means that his machine is allowing the new FD to be created, but then can't really support that many, so it gives an error. Ralph, how many connections do you have open at once? It seems like the machine perhaps just can't handle that many FDs in all of those processes at once. That is a lot of tables. Maybe a different OS will handle it better? Maybe there's some way that you can use fewer connections and then the OS could still handle it? Regards, Jeff Davis
Re: [PERFORM] Performance Woes
Ralph Mason wrote: I have several databases. They are each about 35GB in size and have about 10.5K relations (count from pg_stat_all_tables) in them. pg_class is about 26k rows and the data directory contains about 70k files. These are busy machines; they run about 50 transactions per second (approx. insert/update/delete about 500 rows per second). Is it always the same trigger the problematic one? Is it just PL/pgSQL, or do you have something else? Something that may be trying to open additional files, for example? Something that may be trying to open files behind your back? PL/Perl with funky operators or temp files? Also, what PG version is this? So we decreased max_files_per_process to 800. This took care of the error *BUT* about quadrupled the IO wait that is happening on the machine. It went from a peak of about 50% to peaks of over 200% (4 processor machines, 4 gigs ram, raid). The load on the machine remained constant. The max_files_per_process setting controls how many actual file descriptors each process is allowed to have. Postgres internally uses virtual file descriptors, which normally have one kernel file descriptor open each. However, if your transactions need to access lots of files, the VFDs will close their kernel FDs to allow other VFDs to open theirs. So it sounds like your transaction has more than 800 files open. The extra IO wait could be caused by the additional system calls to open and close those files as needed. I would actually expect it to cause extra system load (as opposed to user) rather than IO, but I'm not sure. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [PERFORM] Performance Woes
Jeff Davis [EMAIL PROTECTED] writes: On Wed, 2007-05-09 at 17:29 -0700, Joshua D. Drake wrote: Sounds to me like you just need to up the total amount of open files allowed by the operating system. It looks more like the opposite, here's the docs for max_files_per_process: I think Josh has got the right advice. The manual is just saying that you can reduce max_files_per_process to avoid the failure, but it's not making any promises about the performance penalty for doing that. Apparently Ralph's app needs a working set of between 800 and 1000 open files to have reasonable performance. That is a lot of tables. Maybe a different OS will handle it better? Maybe there's some way that you can use fewer connections and then the OS could still handle it? Also, it might be worth rethinking the database structure to reduce the number of tables. But for a quick-fix, increasing the kernel limit seems like the easiest answer. regards, tom lane
[PERFORM] Background vacuum
Dear list, I'm running postgres on a tomcat server. The vacuum is run every hour (cronjob), which leads to a performance drop for the tomcat applications. I played around with the renice command and I think it is possible to reduce this effect with a renice. The problem is how to figure out the PID of the postmaster performing the vacuum (automated). Has anybody a nice solution to change process priority? A shell script, maybe even for java? best regards Dani
Re: [PERFORM] Background vacuum
Daniel Haensse wrote: Dear list, I'm running postgres on a tomcat server. The vacuum is run every hour (cronjob), which leads to a performance drop for the tomcat applications. I played around with the renice command and I think it is possible to reduce this effect with a renice. The problem is how to figure out the PID of the postmaster performing the vacuum (automated). Has anybody a nice solution to change process priority? A shell script, maybe even for java? While this may technically work, I think it misses a key point. 'nice' (at least the versions I'm familiar with) does not adjust I/O priority, and VACUUM is bogging things down because of the extra strain on I/O; CPU usage shouldn't really be much of a factor. Instead, I would recommend looking at vacuum_cost_delay and the related settings to make vacuum lower priority than the queries you care about. This should be a cleaner solution for you. -Dan
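The cost-based settings Dan refers to can be sketched in postgresql.conf (the delay value echoes the suggestion earlier in this thread; see the manual for the full set of vacuum_cost_* knobs):

```
# Throttle manual VACUUM so it yields I/O to foreground queries:
# vacuum sleeps vacuum_cost_delay ms each time its accumulated
# page-access cost reaches vacuum_cost_limit.
vacuum_cost_delay = 20      # ms; 0 (the default) disables the throttling
vacuum_cost_limit = 200     # accumulated cost that triggers the sleep
```

Unlike renice, this throttles the vacuum's disk accesses directly, which is where the contention with the tomcat queries actually comes from.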