Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL
On 10/5/06, Carlo Stonebanks <[EMAIL PROTECTED]> wrote:
> Hi Merlin,
>
> Here are the results. The query returned more rows (65 vs 12) because
> of the vague postal_code.

right. interestingly, the index didn't work properly anyways.
regardless, this is easily solvable, but it looks like we might be
looking in the wrong place. do we have a multi-column index on
facility_address(facility_id, address_id)? did you run analyze?

> In reality, we would have to modify the postal_code logic to take
> advantage of full zip codes when they were available, not
> unconditionally truncate them.
>
> Carlo
>
> explain analyze select
>   f.facility_id, fa.facility_address_id, a.address_id,
>   f.facility_type_code, f.name, a.address, a.city, a.state_code,
>   a.postal_code, a.country_code
> from mdx_core.facility as f
> join mdx_core.facility_address as fa on fa.facility_id = f.facility_id
> join mdx_core.address as a on a.address_id = fa.address_id
> where (a.country_code, a.state_code, mdx_core.zip_trunc(a.postal_code)) =
>   ('US', 'IL', mdx_core.zip_trunc('60640-5759'))
> order by facility_id
>
> "Sort (cost=6474.78..6474.84 rows=25 width=103) (actual time=217.279..217.311 rows=65 loops=1)"
> "  Sort Key: f.facility_id"
> "  -> Nested Loop (cost=2728.54..6474.20 rows=25 width=103) (actual time=35.828..217.059 rows=65 loops=1)"
> "     -> Hash Join (cost=2728.54..6384.81 rows=25 width=72) (actual time=35.801..216.117 rows=65 loops=1)"
> "        Hash Cond: ("outer".address_id = "inner".address_id)"
> "        -> Seq Scan on facility_address fa (cost=0.00..3014.68 rows=128268 width=12) (actual time=0.007..99.072 rows=128268 loops=1)"
> "        -> Hash (cost=2728.50..2728.50 rows=19 width=64) (actual time=33.618..33.618 rows=39 loops=1)"
> "           -> Bitmap Heap Scan on address a (cost=48.07..2728.50 rows=19 width=64) (actual time=2.569..33.491 rows=39 loops=1)"
> "              Recheck Cond: ((country_code = 'US'::bpchar) AND ((state_code)::text = 'IL'::text))"
> "              Filter: (mdx_core.zip_trunc(postal_code) = '60640'::text)"
> "              -> Bitmap Index Scan on address_country_state_zip_trunc_idx (cost=0.00..48.07 rows=3846 width=0) (actual time=1.783..1.783 rows=3554 loops=1)"
> "                 Index Cond: ((country_code = 'US'::bpchar) AND ((state_code)::text = 'IL'::text))"
> "     -> Index Scan using facility_pkey on facility f (cost=0.00..3.56 rows=1 width=35) (actual time=0.009..0.010 rows=1 loops=65)"
> "        Index Cond: ("outer".facility_id = f.facility_id)"
> "Total runtime: 217.520 ms"

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
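[Editor's note: for readers following along, Merlin's two suggestions above (the truncating function with its expression index, and the multi-column index on facility_address) might look roughly as follows. The function body is a sketch, since the real mdx_core.zip_trunc was never posted; the index names are illustrative.]

```sql
-- Hypothetical reconstruction of the zip_trunc helper: keep the first
-- five characters of a US ZIP so '60640-5759' and '60640' compare equal.
-- It must be declared IMMUTABLE to be usable in an expression index.
CREATE OR REPLACE FUNCTION mdx_core.zip_trunc(text) RETURNS text AS
$$ SELECT substr($1, 1, 5) $$
LANGUAGE sql IMMUTABLE;

-- Expression index matching the rewritten WHERE clause.
CREATE INDEX address_country_state_zip_trunc_idx
    ON mdx_core.address (country_code, state_code, mdx_core.zip_trunc(postal_code));

-- The multi-column index Merlin asks about, intended to let the join to
-- facility_address avoid the seq scan seen in the plan above.
CREATE INDEX facility_address_fid_aid_idx
    ON mdx_core.facility_address (facility_id, address_id);

-- Refresh statistics so the planner sees the new indexes' selectivity.
ANALYZE mdx_core.address;
ANALYZE mdx_core.facility_address;
```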
Re: [PERFORM] UPDATE becomes mired / win32
I'm pretty sure that the table was empty before doing the load, but I
gave this a shot. It didn't have an impact on the results. The behavior
also persists across a dump/reload of the table into a new install on a
different machine. IIRC dump/reload rebuilds indexes from scratch.

Steve

At 01:13 PM 10/4/2006, [EMAIL PROTECTED] wrote:
> > The table, VOTER, contains 3,090,013 rows and each row is about 120
> > bytes wide. It's loaded via a batch process in one shot, and the load
> > is followed by a VACUUM FULL ANALYZE. Its structure is shown at the
> > bottom of the message.
>
> if the table wasn't empty before and has indices defined, try a
> "REINDEX TABLE VOTER" before running the update. i had a similar case
> where an often-updated table was vacuumed regularly, but the indices
> grew and grew and grew. in my case the table - even when empty and
> vacuum full'ed - was 1.2gb according to pgadmin, due to (outdated)
> indices. a reindex fixed all my performance issues.
>
> - thomas
Re: [PERFORM] UPDATE becomes mired / win32
Both commands seem to be saturating the disk. There's nothing else
running in the database, and all of the locks have 't' in the granted
column, which I'm assuming means they're not blocked.

According to the statistics, the original table is 889 MB with 911 MB
of indexes, whereas the copy is 1021 MB with no space used by indexes.

Steve

At 03:28 PM 10/4/2006, Tom Lane wrote:
> Steve Peterson <[EMAIL PROTECTED]> writes:
> > If I run the statement:
> > (1): UPDATE voter SET gender = 'U';
> > on the table in this condition, the query effectively never ends --
> > I've allowed it to run for 12-14 hours before giving up.
> > ...
> > When (1) is running, the machine is very nearly idle, with no
> > postmaster taking more than 1 or 2% of the CPU.
>
> Is the disk busy? If neither CPU nor I/O are saturated, then it's a
> good bet that the UPDATE isn't actually running at all, but is waiting
> for a lock somewhere. Have you looked into pg_locks to check for a
> conflicting lock?
>
> regards, tom lane
Re: [PERFORM] pg_trgm indexes giving bad estimations?
Ben <[EMAIL PROTECTED]> writes:
> How can I get the planner to not expect so many rows to be returned?

Write an estimation function for the pg_trgm operator(s). (Send in a
patch if you do!) I see that % is using "contsel", which is only a
stub, and would likely be wrong for % even if it weren't.

> A possibly related question is: because pg_trgm lets me set the
> matching threshold of the % operator, how does that affect the planner?

It hasn't a clue about that.

regards, tom lane
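[Editor's note: Tom's point about contsel can be confirmed from the system catalogs; a query along these lines (a sketch — exact output columns and the operator list depend on your version and installed extensions) shows which restriction estimator each % operator is wired to.]

```sql
-- List the % operator(s) and their selectivity estimator functions.
-- "contsel" is a stub that returns a constant selectivity, which is why
-- the planner's row estimates don't react to the actual search pattern.
SELECT oprname,
       oprleft::regtype  AS left_type,
       oprright::regtype AS right_type,
       oprrest           AS restriction_estimator
FROM pg_operator
WHERE oprname = '%';
```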
[PERFORM] pg_trgm indexes giving bad estimations?
I've got another query I'm trying to optimize:

select aj.album
from public.track t
join public.albumjoin aj on (aj.track = t.id)
join (select id from public.albummeta am where tracks between 10 and 14) lam
  on (lam.id = aj.album)
where (t.name % '01New OrderEvil Dust' or t.name % '04OrbitalOpen Mind')
group by aj.album
having count(*) >= 9.6;

This gives an expensive (but still reasonable) plan of:

 HashAggregate (cost=76523.64..76602.25 rows=4492 width=4)
   Filter: ((count(*))::numeric >= 9.6)
   -> Hash Join (cost=63109.73..76501.18 rows=4492 width=4)
        Hash Cond: ("outer".id = "inner".album)
        -> Bitmap Heap Scan on albummeta am (cost=1810.10..9995.34 rows=187683 width=4)
             Recheck Cond: ((tracks >= 10) AND (tracks <= 14))
             -> Bitmap Index Scan on albummeta_tracks_index (cost=0.00..1810.10 rows=187683 width=0)
                  Index Cond: ((tracks >= 10) AND (tracks <= 14))
        -> Hash (cost=61274.03..61274.03 rows=10243 width=4)
             -> Nested Loop (cost=163.87..61274.03 rows=10243 width=4)
                  -> Bitmap Heap Scan on track t (cost=163.87..28551.33 rows=10243 width=4)
                       Recheck Cond: (((name)::text % '01New OrderEvil Dust'::text) OR ((name)::text % '04OrbitalOpen Mind'::text))
                       -> BitmapOr (cost=163.87..163.87 rows=10248 width=0)
                            -> Bitmap Index Scan on track_name_trgm_idx (cost=0.00..81.93 rows=5124 width=0)
                                 Index Cond: ((name)::text % '01New OrderEvil Dust'::text)
                            -> Bitmap Index Scan on track_name_trgm_idx (cost=0.00..81.93 rows=5124 width=0)
                                 Index Cond: ((name)::text % '04OrbitalOpen Mind'::text)
                  -> Index Scan using albumjoin_trackindex on albumjoin aj (cost=0.00..3.18 rows=1 width=8)
                       Index Cond: (aj.track = "outer".id)
(19 rows)

Unfortunately, when I modify this example to use a more typical number
of trigram searches or'd together (anywhere from 10 to 20), the planner
thinks the bitmap heap scan on track t will return a lot of rows, and
so reverts to doing a sequential scan of albumjoin for the next table
join. That would make sense IF there were a lot of rows returned by the
bitmap index scans.

But here is where the planner gets it really wrong, if I'm reading it
right. It seems to think both my index scans will return 5124 rows,
when, in reality, it's a lot less:

select count(*) from public.track where name % '01New OrderEvil Dust';
 count
-------
    20
(1 row)

select count(*) from public.track where name % '04OrbitalOpen Mind';
 count
-------
   123
(1 row)

How can I get the planner to not expect so many rows to be returned?

A possibly related question is: because pg_trgm lets me set the
matching threshold of the % operator, how does that affect the planner?
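[Editor's note: the threshold Ben mentions is controlled with pg_trgm's set_limit()/show_limit() functions; something like the following, where the 0.5 value is purely illustrative.]

```sql
-- Show the current similarity threshold used by the % operator
-- (the pg_trgm default is 0.3).
SELECT show_limit();

-- Raise it so % matches fewer, more similar rows. Note that, per Tom's
-- reply in this thread, the planner's row estimates do NOT take this
-- setting into account: the estimate stays the same either way.
SELECT set_limit(0.5);

SELECT count(*) FROM public.track WHERE name % '01New OrderEvil Dust';
```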
Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL
Hi Merlin,

Here are the results. The query returned more rows (65 vs 12) because
of the vague postal_code. In reality, we would have to modify the
postal_code logic to take advantage of full zip codes when they were
available, not unconditionally truncate them.

Carlo

explain analyze select
  f.facility_id, fa.facility_address_id, a.address_id,
  f.facility_type_code, f.name, a.address, a.city, a.state_code,
  a.postal_code, a.country_code
from mdx_core.facility as f
join mdx_core.facility_address as fa on fa.facility_id = f.facility_id
join mdx_core.address as a on a.address_id = fa.address_id
where (a.country_code, a.state_code, mdx_core.zip_trunc(a.postal_code)) =
  ('US', 'IL', mdx_core.zip_trunc('60640-5759'))
order by facility_id

"Sort (cost=6474.78..6474.84 rows=25 width=103) (actual time=217.279..217.311 rows=65 loops=1)"
"  Sort Key: f.facility_id"
"  -> Nested Loop (cost=2728.54..6474.20 rows=25 width=103) (actual time=35.828..217.059 rows=65 loops=1)"
"     -> Hash Join (cost=2728.54..6384.81 rows=25 width=72) (actual time=35.801..216.117 rows=65 loops=1)"
"        Hash Cond: ("outer".address_id = "inner".address_id)"
"        -> Seq Scan on facility_address fa (cost=0.00..3014.68 rows=128268 width=12) (actual time=0.007..99.072 rows=128268 loops=1)"
"        -> Hash (cost=2728.50..2728.50 rows=19 width=64) (actual time=33.618..33.618 rows=39 loops=1)"
"           -> Bitmap Heap Scan on address a (cost=48.07..2728.50 rows=19 width=64) (actual time=2.569..33.491 rows=39 loops=1)"
"              Recheck Cond: ((country_code = 'US'::bpchar) AND ((state_code)::text = 'IL'::text))"
"              Filter: (mdx_core.zip_trunc(postal_code) = '60640'::text)"
"              -> Bitmap Index Scan on address_country_state_zip_trunc_idx (cost=0.00..48.07 rows=3846 width=0) (actual time=1.783..1.783 rows=3554 loops=1)"
"                 Index Cond: ((country_code = 'US'::bpchar) AND ((state_code)::text = 'IL'::text))"
"     -> Index Scan using facility_pkey on facility f (cost=0.00..3.56 rows=1 width=35) (actual time=0.009..0.010 rows=1 loops=65)"
"        Index Cond: ("outer".facility_id = f.facility_id)"
"Total runtime: 217.520 ms"

""Merlin Moncure"" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> On 10/4/06, Carlo Stonebanks <[EMAIL PROTECTED]> wrote:
>> > can you do explain analyze on the two select queries on either side of
>> > the union separately? the subquery is correctly written and unlikely
>> > to be a problem (in fact, good style imo). so lets have a look at
>> > both sides of facil query and see where the problem is.
>>
>> Sorry for the delay, the server was down yesterday and couldn't get
>> anything.
>>
>> I have modified the sub-queries a little, trying to get the index scans
>> to fire - all the tables involved here are large enough to benefit from
>> index scans over sequential scans. I am mystified as to why PART 1 is
>> giving me:
>>
>> "Seq Scan on facility_address fa (cost=0.00..3014.68 rows=128268 width=12)
>> (actual time=0.007..99.033 rows=128268 loops=1)"
>
> not sure on this, lets go back to that.
>
>> into account that perhaps the import row is using the 5-number US ZIP,
>> not the 9-number USZIP+4
>>
>> where
>>   a.country_code = 'US'
>>   and a.state_code = 'IL'
>>   and a.postal_code like '60640-5759'||'%'
>> order by facility_id
>
> 1. create a small function, sql preferred, which truncates the zip code
> to 5 digits or reduces it to so-called 'fuzzy' matching criteria. lets
> call it zip_trunc(text) and make it immutable, which it is. write this
> in sql, not tcl if possible (trust me).
>
> create index address_idx on address(country_code, state_code,
> zip_trunc(postal_code));
>
> rewrite the above where clause as
>
> where (a.country_code, a.state_code, zip_trunc(postal_code)) = ('US',
> 'IL', zip_trunc('60640-5759'));
>
> try it out, then lets see how it goes and then we can take a look at
> any seqscan issues.
>
> merlin
Re: [PERFORM] any hope for my big query?
On Fri, 29 Sep 2006, Jim C. Nasby wrote:
> There's no join criteria for umdb.node... is that really what you want?

Unfortunately, yes, it is. I've taken in all of everybody's helpful
advice (thanks!) and reworked things a little, and now I'm left with
this expensive nugget:

select aj.album
from (select seconds-1 as a, seconds+1 as b from node where node.dir = 6223) n
join public.track t on (t.length between n.a*1000 and n.b*1000)
join public.albumjoin aj on (aj.track = t.id)
join (select id from public.albummeta am where tracks between 3 and 7) lam
  on (lam.id = aj.album)
group by aj.album
having count(*) >= 4;

...which comes out to be:

 HashAggregate (cost=90.69..904909.99 rows=31020 width=4)
   Filter: (count(*) >= 4)
   -> Nested Loop (cost=428434.81..897905.17 rows=1307904 width=4)
        Join Filter: (("inner".length >= (("outer".seconds - 1) * 1000)) AND ("inner".length <= (("outer".seconds + 1) * 1000)))
        -> Index Scan using node_dir on node (cost=0.00..3.46 rows=17 width=4)
             Index Cond: (dir = 6223)
        -> Materialize (cost=428434.81..438740.01 rows=692420 width=8)
             -> Hash Join (cost=210370.58..424361.39 rows=692420 width=8)
                  Hash Cond: ("outer".id = "inner".track)
                  -> Seq Scan on track t (cost=0.00..128028.41 rows=5123841 width=8)
                  -> Hash (cost=205258.53..205258.53 rows=692420 width=8)
                       -> Hash Join (cost=6939.10..205258.53 rows=692420 width=8)
                            Hash Cond: ("outer".album = "inner".id)
                            -> Seq Scan on albumjoin aj (cost=0.00..88918.41 rows=5123841 width=8)
                            -> Hash (cost=6794.51..6794.51 rows=57834 width=4)
                                 -> Bitmap Heap Scan on albummeta am (cost=557.00..6794.51 rows=57834 width=4)
                                      Recheck Cond: ((tracks >= 3) AND (tracks <= 7))
                                      -> Bitmap Index Scan on albummeta_tracks_index (cost=0.00..557.00 rows=57834 width=0)
                                           Index Cond: ((tracks >= 3) AND (tracks <= 7))
(19 rows)

I'm surprised (though probably just because I'm ignorant) that it would
have so much sequential scanning in there. For instance, because n is
going to have at most a couple dozen rows, it seems that instead of
scanning all of public.track, it should be able to convert my "t.length
between a and b" clause to some between statements or'd together. Or at
least, it would be nice if the planner could do that. :)
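[Editor's note: the rewrite the poster is wishing for can be done by hand — fetch the handful of rows from node first, then issue the track lookup with explicit constant ranges so an index on track(length), if one exists, becomes usable. A sketch, with the literal ranges (200,202) and (350,352) standing in for whatever step 1 actually returns.]

```sql
-- Step 1: get the few (a, b) second-ranges for the directory.
SELECT seconds - 1 AS a, seconds + 1 AS b
FROM node
WHERE dir = 6223;

-- Step 2: plug the returned values back in as literal ranges. With
-- constants instead of a join filter, the planner can consider an index
-- scan on track(length) rather than a seq scan over all of track.
SELECT aj.album
FROM public.track t
JOIN public.albumjoin aj ON aj.track = t.id
JOIN public.albummeta am ON am.id = aj.album
WHERE am.tracks BETWEEN 3 AND 7
  AND (t.length BETWEEN 200 * 1000 AND 202 * 1000
       OR t.length BETWEEN 350 * 1000 AND 352 * 1000)
GROUP BY aj.album
HAVING count(*) >= 4;
```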
Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL
On 10/4/06, Carlo Stonebanks <[EMAIL PROTECTED]> wrote:
> > can you do explain analyze on the two select queries on either side of
> > the union separately? the subquery is correctly written and unlikely
> > to be a problem (in fact, good style imo). so lets have a look at
> > both sides of facil query and see where the problem is.
>
> Sorry for the delay, the server was down yesterday and couldn't get
> anything.
>
> I have modified the sub-queries a little, trying to get the index scans
> to fire - all the tables involved here are large enough to benefit from
> index scans over sequential scans. I am mystified as to why PART 1 is
> giving me:
>
> "Seq Scan on facility_address fa (cost=0.00..3014.68 rows=128268 width=12)
> (actual time=0.007..99.033 rows=128268 loops=1)"

not sure on this, lets go back to that.

> into account that perhaps the import row is using the 5-number US ZIP,
> not the 9-number USZIP+4
>
> where
>   a.country_code = 'US'
>   and a.state_code = 'IL'
>   and a.postal_code like '60640-5759'||'%'
> order by facility_id

1. create a small function, sql preferred, which truncates the zip code
to 5 digits or reduces it to so-called 'fuzzy' matching criteria. lets
call it zip_trunc(text) and make it immutable, which it is. write this
in sql, not tcl if possible (trust me).

create index address_idx on address(country_code, state_code,
zip_trunc(postal_code));

rewrite the above where clause as

where (a.country_code, a.state_code, zip_trunc(postal_code)) = ('US',
'IL', zip_trunc('60640-5759'));

try it out, then lets see how it goes and then we can take a look at
any seqscan issues.

merlin
Re: [PERFORM] Multi-key index not being used - bug?
[Tom Lane - Wed at 04:33:54PM -0400]
> > We have indices on the users_id field and the (users_id, created)-tuple.
>
> Neither of those indexes can provide the sort order the query is asking
> for.

Ah; that's understandable - the planner has two options: do an index
traversal without any extra sorting, or take everything out and then
sort it. What I'd like postgres to do is to traverse the index and do
some sorting for every unique value of created. Maybe such a feature
can be found in future releases - like Postgres 56.3? ;-)
Re: [PERFORM] Multi-key index not being used - bug?
Tobias Brox <[EMAIL PROTECTED]> writes:
> NBET=> explain select * from account_transaction where users_id=123456
> order by created desc, id desc limit 10;

> We have indices on the users_id field and the (users_id, created)-tuple.

Neither of those indexes can provide the sort order the query is asking
for.

regards, tom lane
Re: [PERFORM] UPDATE becomes mired / win32
Steve Peterson <[EMAIL PROTECTED]> writes:
> If I run the statement:
> (1): UPDATE voter SET gender = 'U';
> on the table in this condition, the query effectively never ends --
> I've allowed it to run for 12-14 hours before giving up.
> ...
> When (1) is running, the machine is very nearly idle, with no
> postmaster taking more than 1 or 2% of the CPU.

Is the disk busy? If neither CPU nor I/O are saturated, then it's a
good bet that the UPDATE isn't actually running at all, but is waiting
for a lock somewhere. Have you looked into pg_locks to check for a
conflicting lock?

regards, tom lane
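[Editor's note: Tom's suggestion can be checked with a query along these lines (a sketch — the exact set of pg_locks columns varies slightly by version). Any row returned is a lock request that is waiting.]

```sql
-- List ungranted lock requests: the blocked backend's UPDATE will sit
-- idle, consuming almost no CPU or I/O, until the conflicting lock held
-- by another session is released.
SELECT locktype,
       relation::regclass AS rel,
       transactionid,
       pid,
       mode
FROM pg_locks
WHERE NOT granted;
```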
Re: [PERFORM] Unsubscribe
On Wed, Oct 04, 2006 at 08:30:03AM -0700, Joshua D. Drake wrote:
> They don't follow what is largely considered standard amongst lists,
> which is to have list information at the bottom of each e-mail.

In my experience such a footer doesn't do much to prevent people
sending unsubscribe messages to the list.

Mike Stone
Re: [PERFORM] Multi-key index not being used - bug?
Thanks Tobias. The difference here, though, in terms of your database,
is that I am doing a query to select the most recent transaction for
EACH user at once, not just for one user. If I do a similar query to
yours to get the last transaction for a single user, my query is fast
like yours. It's when I'm doing a query to get the results for all
users at once that it is slow. If you try a query to get the most
recent transaction of all users at once, you will run into the same
problem I am having.

Graham.

Tobias Brox wrote:
> Look at this:
>
> NBET=> explain select * from account_transaction where users_id=123456
> order by created desc limit 10;
>                           QUERY PLAN
> ---------------------------------------------------------------
>  Limit (cost=0.00..27.40 rows=10 width=213)
>    -> Index Scan Backward using account_transaction_on_user_and_timestamp
>       on account_transaction (cost=0.00..1189.19 rows=434 width=213)
>         Index Cond: (users_id = 123456)
> (3 rows)
>
> NBET=> explain select * from account_transaction where users_id=123456
> order by created desc, id desc limit 10;
>                           QUERY PLAN
> ---------------------------------------------------------------
>  Limit (cost=1114.02..1114.04 rows=10 width=213)
>    -> Sort (cost=1114.02..1115.10 rows=434 width=213)
>         Sort Key: created, id
>         -> Index Scan using account_transaction_by_users_id on
>            account_transaction (cost=0.00..1095.01 rows=434 width=213)
>              Index Cond: (users_id = 123456)
> (5 rows)
>
> In case the explains don't explain themselves well enough: we have a
> transaction table with ID (primary key, serial), created (a timestamp)
> and a users_id. Some of the users have generated thousands of
> transactions, and the above query is a simplified version of the query
> used to show the users their last transactions. Since we have a large
> user base hammering our servers with this request, the speed is
> significant.
>
> We have indices on the users_id field and the (users_id,
> created)-tuple. The timestamp is set by the application and has a
> resolution of 1 second - so there may easily be several transactions
> sharing the same timestamp, but this is an exception, not the rule.
>
> I suppose the developers needed to add the ID to the sort list to work
> around a bug, but still preferred to have the primary sorting by
> created to be able to use the index. One workaround here is to order
> only by id desc and create a new index on (users_id, id) - but I
> really don't like adding more indices to the transaction table.

--
Graham Davis
Refractions Research Inc.
[EMAIL PROTECTED]
Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL
> can you do explain analyze on the two select queries on either side of
> the union separately? the subquery is correctly written and unlikely
> to be a problem (in fact, good style imo). so lets have a look at
> both sides of facil query and see where the problem is.

Sorry for the delay, the server was down yesterday and couldn't get
anything.

I have modified the sub-queries a little, trying to get the index scans
to fire - all the tables involved here are large enough to benefit from
index scans over sequential scans. I am mystified as to why PART 1 is
giving me:

"Seq Scan on facility_address fa (cost=0.00..3014.68 rows=128268 width=12)
(actual time=0.007..99.033 rows=128268 loops=1)"

which I assume is for the:

"join mdx_core.facility_address as fa on fa.facility_id = f.facility_id"

Then again, I am not sure how to read the EXPLAIN ANALYSE performance
numbers.

The other part of the UNION (PART 2) I have also modified, I think it's
working nicely. Let me know if I'm mistaken in thinking that!

The one remaining problem is that the UNION of these two sub-queries
has a column which is a call to a custom TCL function that does a
lexical analysis on the results, ranking the result names by their
proximity to the imported name. This definitely eats up the
performance, and I hope that my decision to call this function on the
results of the union (assuming union deletes redundant rows) is the
correct one.

Thanks!

Carlo

/* PART 1.
   The redundant expression "facility_address_id is NULL" was removed
   because only an OUTER join would have made this meaningful. We use
   only INNER joins in this sub-query.
   Both facility_address and address have seq scans, even though there
   is an index for facility_address(facility_id) and an index for
   address(country_code, postal_code, address). The "like" operator
   appears to be making things expensive. This is used because we have
   to take into account that perhaps the import row is using the
   5-number US ZIP, not the 9-number USZIP+4 standard (although this is
   not the case in this sample).
*/
explain analyse select
  f.facility_id, fa.facility_address_id, a.address_id,
  f.facility_type_code, f.name, a.address, a.city, a.state_code,
  a.postal_code, a.country_code
from mdx_core.facility as f
join mdx_core.facility_address as fa on fa.facility_id = f.facility_id
join mdx_core.address as a on a.address_id = fa.address_id
where a.country_code = 'US'
  and a.state_code = 'IL'
  and a.postal_code like '60640-5759'||'%'
order by facility_id

"Sort (cost=6392.50..6392.50 rows=1 width=103) (actual time=189.133..189.139 rows=12 loops=1)"
"  Sort Key: f.facility_id"
"  -> Nested Loop (cost=2732.88..6392.49 rows=1 width=103) (actual time=14.006..188.967 rows=12 loops=1)"
"     -> Hash Join (cost=2732.88..6388.91 rows=1 width=72) (actual time=13.979..188.748 rows=12 loops=1)"
"        Hash Cond: ("outer".address_id = "inner".address_id)"
"        -> Seq Scan on facility_address fa (cost=0.00..3014.68 rows=128268 width=12) (actual time=0.004..98.867 rows=128268 loops=1)"
"        -> Hash (cost=2732.88..2732.88 rows=1 width=64) (actual time=6.430..6.430 rows=3 loops=1)"
"           -> Bitmap Heap Scan on address a (cost=62.07..2732.88 rows=1 width=64) (actual time=2.459..6.417 rows=3 loops=1)"
"              Recheck Cond: ((country_code = 'US'::bpchar) AND ((state_code)::text = 'IL'::text))"
"              Filter: ((postal_code)::text ~~ '60640-5759%'::text)"
"              -> Bitmap Index Scan on address_country_state_postal_code_address_idx (cost=0.00..62.07 rows=3846 width=0) (actual time=1.813..1.813 rows=3554 loops=1)"
"                 Index Cond: ((country_code = 'US'::bpchar) AND ((state_code)::text = 'IL'::text))"
"     -> Index Scan using facility_pkey on facility f (cost=0.00..3.56 rows=1 width=35) (actual time=0.012..0.013 rows=1 loops=12)"
"        Index Cond: ("outer".facility_id = f.facility_id)"
"Total runtime: 189.362 ms"

/* PART 2 - can you see anything that could work faster? */
explain analyse select
  f.facility_id, null as facility_address_id, null as address_id,
  f.facility_type_code, f.name, null as address,
  f.default_city as city, f.default_state_code as state_code,
  f.default_postal_code as postal_code,
  f.default_country_code as country_code
from mdx_core.facility as f
left outer join mdx_core.facility_address as fa
  on fa.facility_id = f.facility_id
where fa.facility_address_id is null
  and f.default_country_code = 'US'
  and f.default_state_code = 'IL'
  and '60640-5759' like f.default_postal_code||'%'

"Nested Loop Left Join (cost=0.00..6042.41 rows=32 width=73) (actual time=14.923..14.923 rows=0 loops=1)"
" Filter: ("
Re: [PERFORM] UPDATE becomes mired / win32
> The table, VOTER, contains 3,090,013 rows and each row is about 120
> bytes wide. It's loaded via a batch process in one shot, and the load
> is followed by a VACUUM FULL ANALYZE. Its structure is shown at the
> bottom of the message.

if the table wasn't empty before and has indices defined, try a
"REINDEX TABLE VOTER" before running the update. i had a similar case
where an often-updated table was vacuumed regularly, but the indices
grew and grew and grew. in my case the table - even when empty and
vacuum full'ed - was 1.2gb according to pgadmin, due to (outdated)
indices. a reindex fixed all my performance issues.

- thomas
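[Editor's note: Thomas's diagnosis can be checked before reindexing; something like the following compares heap and index sizes for the table, then rebuilds the indexes. The relname pattern is illustrative and assumes the indexes are named after the table.]

```sql
-- Compare heap size vs. index sizes; an index far larger than the data
-- it covers is the bloat symptom described above.
SELECT relname, relkind, relpages,
       pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname LIKE 'voter%'
ORDER BY relpages DESC;

-- Rebuild every index on the table from scratch, discarding dead space.
REINDEX TABLE voter;

-- Refresh planner statistics afterwards.
ANALYZE voter;
```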
[PERFORM] Multi-key index not being used - bug?
Look at this:

NBET=> explain select * from account_transaction where users_id=123456
order by created desc limit 10;
                          QUERY PLAN
---------------------------------------------------------------
 Limit (cost=0.00..27.40 rows=10 width=213)
   -> Index Scan Backward using account_transaction_on_user_and_timestamp
      on account_transaction (cost=0.00..1189.19 rows=434 width=213)
        Index Cond: (users_id = 123456)
(3 rows)

NBET=> explain select * from account_transaction where users_id=123456
order by created desc, id desc limit 10;
                          QUERY PLAN
---------------------------------------------------------------
 Limit (cost=1114.02..1114.04 rows=10 width=213)
   -> Sort (cost=1114.02..1115.10 rows=434 width=213)
        Sort Key: created, id
        -> Index Scan using account_transaction_by_users_id on
           account_transaction (cost=0.00..1095.01 rows=434 width=213)
             Index Cond: (users_id = 123456)
(5 rows)

In case the explains don't explain themselves well enough: we have a
transaction table with ID (primary key, serial), created (a timestamp)
and a users_id. Some of the users have generated thousands of
transactions, and the above query is a simplified version of the query
used to show the users their last transactions. Since we have a large
user base hammering our servers with this request, the speed is
significant.

We have indices on the users_id field and the (users_id,
created)-tuple. The timestamp is set by the application and has a
resolution of 1 second - so there may easily be several transactions
sharing the same timestamp, but this is an exception, not the rule.

I suppose the developers needed to add the ID to the sort list to work
around a bug, but still preferred to have the primary sorting by
created to be able to use the index. One workaround here is to order
only by id desc and create a new index on (users_id, id) - but I really
don't like adding more indices to the transaction table.
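[Editor's note: a workaround not spelled out in the thread is a three-column index matching the full sort key. With users_id pinned by the WHERE clause, a backward scan of such an index returns rows already ordered by (created DESC, id DESC), so no sort step is needed — at the cost of yet another index on the table, which the poster was hoping to avoid. The index name is illustrative.]

```sql
-- Index whose trailing columns match the ORDER BY: for a fixed
-- users_id, scanning this index backward yields rows in
-- (created DESC, id DESC) order directly.
CREATE INDEX account_transaction_user_created_id
    ON account_transaction (users_id, created, id);

-- The plan for this should now show an Index Scan Backward with no
-- Sort node above it.
EXPLAIN
SELECT * FROM account_transaction
WHERE users_id = 123456
ORDER BY created DESC, id DESC
LIMIT 10;
```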
[PERFORM] UPDATE becomes mired / win32
I'm having an interesting (perhaps anomalous) variability in UPDATE
performance on a table in my database, and wanted to see if there was
any interest in looking further before I destroy the evidence and move
on.

The table, VOTER, contains 3,090,013 rows and each row is about 120
bytes wide. It's loaded via a batch process in one shot, and the load
is followed by a VACUUM FULL ANALYZE. Its structure is shown at the
bottom of the message.

If I run the statement:

(1): UPDATE voter SET gender = 'U';

on the table in this condition, the query effectively never ends --
I've allowed it to run for 12-14 hours before giving up. The plan for
that statement is:

Seq Scan on voter (cost=0.00..145117.38 rows=3127738 width=120)

However, if I do the following:

(2): CREATE TABLE voter_copy AS SELECT * FROM voter;
(3): UPDATE voter_copy SET gender = 'U';

the query is much faster --

Seq Scan on voter_copy (cost=0.00..96231.35 rows=3090635 width=120)
(actual time=108.056..43203.696 rows=3090013 loops=1)
Total runtime: 117315.731 ms

When (1) is running, the machine is very nearly idle, with no
postmaster taking more than 1 or 2% of the CPU. When (3) is running,
about 60% CPU utilization occurs. The same behavior occurs if the table
is dumped and reloaded.

My environment is Windows XP SP2 and I'm on PostgreSQL 8.1.4 installed
via the msi installer. Hardware is an Athlon 2000+ at 1.67GHz, with 1G
RAM. The database is hosted on a Seagate Barracuda 7200.10 connected
via a FastTrak 4300 without any RAID configuration. dd shows a write
speed of 39 MB/s and a read speed of 44 MB/s.

The server configuration deviates from the default in these statements:

fsync = off
shared_buffers = 25000
work_mem = 5
maintenance_work_mem = 10

CREATE TABLE voter (
  voter_id int4,
  sos_voter_id varchar(20),
  household_id int4,
  first_name varchar(40),
  middle_name varchar(40),
  last_name varchar(40),
  name_suffix varchar(10),
  phone_number varchar(10),
  bad_phone_no bool,
  registration_date date,
  birth_year int4,
  gender char(1),
  pri_ind char(1),
  gen_1992_primary_party char(1),
  council_votes int2,
  primary_votes int2,
  council_primary_votes int2,
  special_votes int2,
  presidential_votes int2,
  votes int2,
  absentee_votes int2,
  last_voted_date date,
  first_voted_date date,
  rating char(1),
  score float4,
  general_votes int2
) WITHOUT OIDS;
Re: [PERFORM] Poor performance on very simple query ?
Hi, Tobias,

Tobias Brox wrote:
> How can you have a default value on a primary key?

Just declare the column with both a default value and a primary key
constraint. It makes sense when the default value is calculated instead
of a constant, by calling a function that generates the key. In fact,
the SERIAL type does nothing but define a sequence and then use
nextval('sequencename') as the default.

HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org
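[Editor's note: Markus's point about SERIAL can be spelled out. The two table definitions below are approximately equivalent — SERIAL additionally marks the sequence as owned by the column, so it is dropped along with it; the names t1/t2 are illustrative.]

```sql
-- The shorthand:
CREATE TABLE t1 (
    id SERIAL PRIMARY KEY
);

-- Roughly what it expands to: a sequence, plus a plain integer primary
-- key column whose default is calculated by calling nextval().
CREATE SEQUENCE t2_id_seq;
CREATE TABLE t2 (
    id integer PRIMARY KEY DEFAULT nextval('t2_id_seq')
);
```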
Re: [PERFORM] Unsubscribe
Steve Atkins wrote:
> On Oct 4, 2006, at 9:00 AM, Joshua D. Drake wrote:
>>> I also don't care about that argument in this situation. People
>>> ignorantly posting an unsubscribe to the list get this kind of
>>> response because it's an annoyance to the list users,
>>
>> Over time, especially now, we will see many more "users" versus
>> "developers". Most "users" will never know how (nor should they have
>> to) read email headers.
>
> They should know how to participate in mailing lists. That's unrelated
> to whether you're a developer or a user. The same webpage where you
> subscribe to a mailing list, you can also unsubscribe. This is not
> some weird technical voodoo, just that some people prefer to waste a
> thousand other people's time rather than spend a minute or two of
> their own. Fortunately, they're a tiny minority.

I believe that if you could get an honest response, you'd find a lot of
these folks are just plain lazy. They don't want to recall how to
unsubscribe and figure sending mail to the list will get the required
result.

--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
 - Benjamin Franklin
Re: [PERFORM] Unsubscribe
On Oct 4, 2006, at 9:00 AM, Joshua D. Drake wrote: I also don't care about that argument in this situation. People ignorantly posting an unsubscribe to the list get this kind of response because it's an annoyance to the list users. Over time, especially now, we will see many more "users" versus "developers". Most "users" will never know how to (nor should they have to) read email headers. They should know how to participate in mailing lists. That's unrelated to whether you're a developer or a user. The same webpage where you subscribe to a mailing list, you can also unsubscribe. This is not some weird technical voodoo, just that some people prefer to waste a thousand other people's time rather than spend a minute or two of their own. Fortunately, they're a tiny minority. benefits long-term list users the most. The real question is: which is more annoying to list users, the occasional unsubscribe posted to the list (with accompanying responses), or a one-line footer on each post providing a link to unsubscribe instructions? Good point, because I guarantee you every time someone pulls this elitist dung about email headers, I am going to step in and say something. So if you want to shut me up, let's get the footer added. Judging from experience on other lists, it won't help. The tiny minority of people who are unable to unsubscribe will continue to be unable to unsubscribe. It won't hurt, though. Cheers, Steve ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Unsubscribe
On Wed, 2006-10-04 at 18:02, Csaba Nagy wrote: > > If we didn't want to add it for each list we could just add a link here: > > > > http://www.postgresql.org/community/lists/subscribe OK, now that I've had a second look at that page, it does contain unsubscription info... but it's well hidden from a cursory glance... the caption is a big "Subscribe to Lists"; you wouldn't think at first glance that the form is actually used to unsubscribe too, would you? So maybe it's just that the text should be more explicit about what it actually does... Cheers, Csaba. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Unsubscribe
I'd prefer to have a short footer link called something like "Mailing List Page" which would take you to a page where you could subscribe, unsubscribe, or view the archives. I think that making the link short and also making it a quick shortcut away from the archives tips the scales in terms of utility vs. annoyance. One of the tips that shows up in the footers today is just a link to the archives anyway. -- Mark Lewis On Wed, 2006-10-04 at 11:28 -0500, Bruno Wolff III wrote: > On Wed, Oct 04, 2006 at 08:30:03 -0700, > "Joshua D. Drake" <[EMAIL PROTECTED]> wrote: > > > > Although I 100% agree with you Bruno, it should be noted that our lists > > are a closed box for most people. They don't follow what is largely > > considered standard amongst lists which is to have list information at > > the bottom of each e-mail. > > There are reasons you don't want to do that. Footers work OK for single > part email messages. They don't make so much sense in multipart messages. > You can probably take a crap shoot and add the footer to the first > text/plain part and not break things. This won't work so well for multipart > alternative messages that have text/plain and text/html parts. You could > also try to insert a footer in to the html part, but thats a bit trickier > since you can't just put it at the end. > > However, since the postgres lists are mostly just using text/plain parts > for message bodies and there are already footers being used to distribute > tips, it wouldn't make things significantly worse to add unsubscribe > information as well. > > I would prefer just making the unsubscribe instructions easy to find on > the web. > > ---(end of broadcast)--- > TIP 5: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Unsubscribe
> I would prefer just making the unsubscribe instructions easy to find on > the web. They actually reasonably are. If you go to www->community/support->lists Sincerely, Joshua D. Drake > > ---(end of broadcast)--- > TIP 5: don't forget to increase your free space map settings > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Unsubscribe
D'Arcy J.M. Cain wrote: > On Wed, 04 Oct 2006 09:00:45 -0700 > "Joshua D. Drake" <[EMAIL PROTECTED]> wrote: >> So if you want to shut me up, lets get the footer added. > > Of course, that doesn't fix the problem 100%. I am on lists that do > show that info in the footer and people still send unsubscribe messages > to the list. Sure, but which is more helpful? A reply that snips everything but the footer that has those instructions, or a reply that shows email headers that look basically like some weird code to users? > > By the way, mailman has a nice feature that sends messages that look > like admin requests (such as unsubscribe) to the admin. That cuts down > on the noise quite a bit. Well you don't have to convince me to use mailman ;)... but the *ahem* list administrators of this project would rather have their toenails removed with a dull spoon. Sincerely, Joshua D. Drake > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Unsubscribe
On Wed, 04 Oct 2006 09:00:45 -0700 "Joshua D. Drake" <[EMAIL PROTECTED]> wrote: > So if you want to shut me up, lets get the footer added. Of course, that doesn't fix the problem 100%. I am on lists that do show that info in the footer and people still send unsubscribe messages to the list. By the way, mailman has a nice feature that sends messages that look like admin requests (such as unsubscribe) to the admin. That cuts down on the noise quite a bit. -- D'Arcy J.M. Cain | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Unsubscribe
On Wed, Oct 04, 2006 at 08:30:03 -0700, "Joshua D. Drake" <[EMAIL PROTECTED]> wrote: > > Although I 100% agree with you Bruno, it should be noted that our lists > are a closed box for most people. They don't follow what is largely > considered standard amongst lists which is to have list information at > the bottom of each e-mail. There are reasons you don't want to do that. Footers work OK for single-part email messages. They don't make so much sense in multipart messages. You can probably take a chance and add the footer to the first text/plain part and not break things. This won't work so well for multipart/alternative messages that have text/plain and text/html parts. You could also try to insert a footer into the html part, but that's a bit trickier since you can't just put it at the end. However, since the postgres lists are mostly just using text/plain parts for message bodies and there are already footers being used to distribute tips, it wouldn't make things significantly worse to add unsubscribe information as well. I would prefer just making the unsubscribe instructions easy to find on the web. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Unsubscribe
> If we didn't want to add it for each list we could just add a link here: > > http://www.postgresql.org/community/lists/subscribe +1 When I want to unsubscribe from a list (very rare in my case, I don't subscribe in the first place if I'm not sure I want to get it), I start by looking where I subscribed... so the above suggestion might work quite well even for lazy subscribers, they'll have their unsubscription info right where they started the subscription process, no more searching needed. Cheers, Csaba ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Unsubscribe
> I also don't care about that argument in this situation. People > ignorantly posting an unsubscribe to the list get this kind of response > because it's an annoyance to the list users, Over time, especially now, we will see many more "users" versus "developers". Most "users" will never know how to (nor should they have to) read email headers. > benefits long-term list users the most. The real question is: which is > more annoying to list users, the occasional unsubscribe posted to the > list (with accompanying responses), or a one-line footer on each post > providing a link to unsubscribe instructions? Good point, because I guarantee you every time someone pulls this elitist dung about email headers, I am going to step in and say something. So if you want to shut me up, let's get the footer added. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Unsubscribe
> [Joshua] >> It is ridiculous that this community expects people to read email >> headers to figure out how to unsubscribe from our lists. > > I always check the headers when I want to unsubscribe from any mailing > list, and I think most people on this list have above-average knowledge > of such technical details. Of course, on a list with this many > recipients there will always be some exceptions ... I would consider myself to have above-average knowledge of such technical details, and I didn't know the list information was in the headers until recently (the last time all of this came up). Now, I of course did know that there were headers, and I can use them to diagnose problems, but I was unaware of an RFC that explicitly stated how the headers were supposed to be sent for mailing lists. However, that is beside the point. It is still ridiculous to expect anyone to read the headers just to unsubscribe from a list. If we didn't want to add it for each list we could just add a link here: http://www.postgresql.org/community/lists/subscribe Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Unsubscribe
This seems to be the nearly unanimous response to people posting an unsubscribe request to the postgres mailing lists. I emphatically agree with the argument - people should know better than that, and the information included in the e-mail headers should be more than sufficient. Every conceivable avenue of discovering how to unsubscribe, other than the list software attaching a footer on each e-mail, is available to pursue. I also don't care about that argument in this situation. People ignorantly posting an unsubscribe to the list get this kind of response because it's an annoyance to the list users, not necessarily because we care about educating that particular person. The posts obviously don't help future unsubscribers who aren't willing to track the information down anyway. We should be addressing this from the standpoint of what benefits long-term list users the most. The real question is: which is more annoying to list users, the occasional unsubscribe posted to the list (with accompanying responses), or a one-line footer on each post providing a link to unsubscribe instructions? Bruno Wolff III wrote: On Wed, Oct 04, 2006 at 10:03:00 +0200, Luc Delgado <[EMAIL PROTECTED]> wrote: Please unsubscribe me! Thank you! If you really can't figure out how to unsubscribe from a list, you should contact the list owner, not the list. The list members can't unsubscribe you (and it isn't their job to) and the owner may not be subscribed to the list. The convention for lists is that adding '-owner' to the local part of the list email address will be an address for the owner. A good place to search to find out how to unsubscribe to a list is to search for the mailing lists using google. Usually the information on how to subscribe and unsubscribe are in the same place and you were able to find out how to subscribe in the first place, so you should be able to figure out how to unsubscribe by yourself as well. 
Also, it would be better to have a message foot saying how to unsubscribe. No, the standard is that the list information is kept in the headers so that it can be extracted by mail clients that care to. There is an RFC describing these headers. They are supplied by the mailing list software used for the Postgres mailing lists. Have your mail client display full headers for one of the list messages to get the instructions from there. -- Nolan Cafferky Software Developer IT Department RBS Interactive [EMAIL PROTECTED]
Re: [PERFORM] Unsubscribe
To be a bit constructive, could it be an idea to add unsubscribe information as one of the standard trailer tips? Then unsubscribe info wouldn't appear in every mail, but often enough for people considering unsubscribing. To be totally non-constructive, let me add a bit to the noise below: [Bruno] > > If you really can't figure out how to unsubscribe from a list, you should > > contact the list owner, not the list. The list members can't unsubscribe you > > (and it isn't their job to) and the owner may not be subscribed to the > > list. If he can't find out how to unsubscribe from the list, how can he be expected to figure out the owner address? [Joshua] > It is ridiculous that this community expects people to read email > headers to figure out how to unsubscribe from our lists. I always check the headers when I want to unsubscribe from any mailing list, and I think most people on this list have above-average knowledge of such technical details. Of course, on a list with this many recipients there will always be some exceptions ... ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Unsubscribe
Bruno Wolff III wrote: > On Wed, Oct 04, 2006 at 10:03:00 +0200, > Luc Delgado <[EMAIL PROTECTED]> wrote: >> Please unsubscribe me! Thank you! > > If you really can't figure out how to unsubscribe from a list, you should > contact the list owner, not the list. The list members can't unsubscribe you > (and it isn't their job to) and the owner may not be subscribed to the > list. Although I 100% agree with you Bruno, it should be noted that our lists are a closed box for most people. They don't follow what is largely considered standard amongst lists which is to have list information at the bottom of each e-mail. It is ridiculous that this community expects people to read email headers to figure out how to unsubscribe from our lists. > The convention for lists is that adding '-owner' to the local part > of the list email address will be an address for the owner. > A good place to search to find out how to unsubscribe to a list is to search > for the mailing lists using google. Usually the information on how to > subscribe > and unsubscribe are in the same place and you were able to find out how > to subscribe in the first place, so you should be able to figure out how > to unsubscribe by yourself as well. Nobody should have to work that hard to unsubscribe. > >> >> Also, it would be better to have a message foot saying how to unsubscribe. Yes, it definitely would. > > No, the standard is that the list information is kept in the headers so that > it can be extracted by mail clients that care to. There is an RFC describing > these headers. They are supplied by the mailing list software used for the > Postgres mailing lists. Have your mail client display full headers for one > of the list messages to get the instructions from there. Who the heck cares what the RFC says. The RFC is irrelevant if the mail clients don't support it. The clients that are most widely in use, do not support unsubscribing from lists via the email headers. Joshua D. 
Drake > > ---(end of broadcast)--- > TIP 9: In versions below 8.0, the planner will ignore your desire to >choose an index scan if your joining column's datatypes do not >match > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Unsubscribe
On Wed, Oct 04, 2006 at 10:03:00 +0200, Luc Delgado <[EMAIL PROTECTED]> wrote: > > Please unsubscribe me! Thank you! If you really can't figure out how to unsubscribe from a list, you should contact the list owner, not the list. The list members can't unsubscribe you (and it isn't their job to) and the owner may not be subscribed to the list. The convention for lists is that adding '-owner' to the local part of the list email address will be an address for the owner. A good way to find out how to unsubscribe from a list is to search for the mailing list using Google. Usually the information on how to subscribe and unsubscribe is in the same place, and since you were able to find out how to subscribe in the first place, you should be able to figure out how to unsubscribe by yourself as well. > > Also, it would be better to have a message foot saying how to unsubscribe. No, the standard is that the list information is kept in the headers so that it can be extracted by mail clients that care to. There is an RFC describing these headers. They are supplied by the mailing list software used for the Postgres mailing lists. Have your mail client display full headers for one of the list messages to get the instructions from there. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] simple case using index on windows but not on linux
I think I am being stupid now. The '>' query was returning so many rows (87% of the rows in the table) that a seq scan was of course the best way. Sorry - all is now working and the problem was the locale issue. Thanks so much for your help everyone. -- Simon Godden ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] simple case using index on windows but not on linux
Can you post EXPLAIN ANALYSE for the LIKE and <> queries that should be using the index? With enable_seqscan on and off please. OK - I don't know what happened, but now my linux installation is behaving like the windows one. I honestly don't know what changed, which I know doesn't help people determine the cause of my issue But I still have a problem with > and <, on both environments. Now, both LIKE and = are using the index with no options on it. But the other operators are not. Firstly, with enable_seqscan on: orderstest=# explain analyze select c_number from t_order where c_number like '1%'; QUERY PLAN - Index Scan using t_order_c_number on t_order (cost=0.00..3.01 rows=1 width=11) (actual time=0.167..0.610 rows=100 loops=1) Index Cond: (((c_number)::text >= '1'::character varying) AND ((c_number)::text < '2'::character varying)) Filter: ((c_number)::text ~~ '1%'::text) Total runtime: 0.921 ms (4 rows) orderstest=# explain analyze select c_number from t_order where c_number > '0001'; QUERY PLAN --- Seq Scan on t_order (cost=0.00..18312.50 rows=878359 width=11) (actual time=1.102..4364.704 rows=878000 loops=1) Filter: ((c_number)::text > '0001'::text) Total runtime: 6431.968 ms (3 rows) And now with enable_seqscan off: orderstest=# explain analyze select c_number from t_order where c_number like '1%'; QUERY PLAN - Index Scan using t_order_c_number on t_order (cost=0.00..3.01 rows=1 width=11) (actual time=0.245..0.674 rows=100 loops=1) Index Cond: (((c_number)::text >= '1'::character varying) AND ((c_number)::text < '2'::character varying)) Filter: ((c_number)::text ~~ '1%'::text) Total runtime: 0.971 ms (4 rows) (Just the same) orderstest=# explain analyze select c_number from t_order where c_number > '0001'; QUERY PLAN Index Scan using t_order_c_number on t_order (cost=0.00..22087.31 rows=878912 width=11) (actual time=0.230..3504.909 rows=878000 loops=1) Index Cond: ((c_number)::text > '0001'::text) Total runtime: 5425.931 ms (3 rows) (Now using the index but 
getting awful performance out of it - how's that?) The difference seems to be whether it is treating the index condition as 'character varying' or 'text'. Basically, can I do > < >= <= on a varchar without causing a seq-scan? -- Simon Godden ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] simple case using index on windows but not on linux
> -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of > > lc_collate is C, as are all the other lc settings. > > I have run the analyze commands. > > Still the same. That is strange. I figured it had to be related to the locale and the LIKE operator. I'm not an expert on these locale issues, but I'd be curious to see if it would start using an index if you added an index like this: CREATE INDEX test_index ON t_order (c_number varchar_pattern_ops); Dave ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] simple case using index on windows but not on linux
simon godden wrote: On 10/4/06, Richard Huxton wrote: simon godden wrote: From psql, a "show all" command will list all your config settings and let you check the lc_xxx values are correct. lc_collate is C, as are all the other lc settings. I have run the analyze commands. Still the same. Can you post EXPLAIN ANALYSE for the LIKE and <> queries that should be using the index? With enable_seqscan on and off please. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] PostgreSQL Caching
On Tue, 2006-10-03 at 18:29 -0700, Tomeh, Husam wrote: > >> * When any session updates the data that already in shared > buffer, > >>does Postgres synchronize the data both disk and shared buffers area > >> immediately ? > > Not necessarily true. When a block is modified in the shared buffers, > the modified block is written to the Postgres WAL log. A periodic DB > checkpoint is performed to flush the modified blocks in the shared > buffers to the data files. Postgres 8.0 and beyond have a process called bgwriter that continually flushes dirty buffers to disk, to minimize the work that needs to be done at checkpoint time. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
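[Editor's note: for reference, the bgwriter's pacing is controlled by a handful of settings; the parameter names below are from the 8.1-era configuration and may differ in other versions, so treat this as a sketch.]

```sql
-- Inspect the background writer's pacing (parameter names per 8.1;
-- values are changed in postgresql.conf, not per-session).
SHOW bgwriter_delay;        -- sleep between bgwriter rounds, in milliseconds
SHOW bgwriter_lru_maxpages; -- max dirty buffers written per round
SHOW checkpoint_timeout;    -- max time between automatic checkpoints
```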
Re: [PERFORM] simple case using index on windows but not on linux
On 10/4/06, Richard Huxton wrote: simon godden wrote: From psql, a "show all" command will list all your config settings and let you check the lc_xxx values are correct. lc_collate is C, as are all the other lc settings. I have run the analyze commands. Still the same. -- Simon Godden ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] PostgreSQL Caching
On Wed, 2006-10-04 at 07:38 -0500, Dave Dutcher wrote: > > -Original Message- > > From: [EMAIL PROTECTED] > > [mailto:[EMAIL PROTECTED] On Behalf Of > > Adnan DURSUN > > i want to be can read an execution plan when > > i look at it. > > So, is there any doc about how it should be read ? > > > You are asking how to read the output from EXPLAIN? This page is a good > place to start: > > http://www.postgresql.org/docs/8.1/interactive/performance-tips.html Robert Treat's Explaining Explain presentation from OSCON is also very good: http://redivi.com/~bob/oscon2005_pgsql_pdf/OSCON_Explaining_Explain_Public.pdf#search=%22%22explaining%20explain%22%22 Brad. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] simple case using index on windows but not on linux
simon godden wrote: I did that, e.g. initdb --locale=C, re-created all my data and have exactly the same problem. I have two indexes, one with no options, and one with the varchar operator options. So the situation now is: If I do a like query it uses the index with the varchar options; If I do a = query, it uses the index with no options; If I do a < or > or any other operator, it reverts back to a seq-scan! I am on FC5 - any further ideas? Did I need to do anything specific about collating sequence? I thought that the --locale=C would set that for all options. From psql, a "show all" command will list all your config settings and let you check the lc_xxx values are correct. Make sure you've analysed the database after restoring, otherwise it will have bad stats available. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
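[Editor's note: if "show all" is too noisy, the relevant settings can be checked individually, e.g.:]

```sql
-- Check the cluster's collation settings directly; these are fixed
-- at initdb time.  "C" means plain byte-order comparisons, which is
-- what lets a default btree index serve LIKE 'prefix%' scans.
SHOW lc_collate;
SHOW lc_ctype;
```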
Re: [PERFORM] PostgreSQL Caching
> -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of > Adnan DURSUN > i want to be can read an execution plan when > i look at it. > So, is there any doc about how it should be read ? You are asking how to read the output from EXPLAIN? This page is a good place to start: http://www.postgresql.org/docs/8.1/interactive/performance-tips.html ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL
Hi, Alex, Alex Stapleton wrote: >> explain analyze is more helpful because it prints the times. > > You can always use the \timing flag in psql ;) Have you ever tried EXPLAIN ANALYZE? \timing gives you one total timing, but EXPLAIN ANALYZE gives you timings for sub-plans, including real row counts etc. HTH, Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in Europe! www.ffii.org www.nosoftwarepatents.org ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
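[Editor's note: to illustrate the difference Markus describes, reusing the t_order table from elsewhere in this issue:]

```sql
-- \timing in psql reports a single client-side total per statement.
-- EXPLAIN ANALYZE instead annotates every plan node with its own
-- actual time and row count, which is what you need for tuning:
EXPLAIN ANALYZE
SELECT c_number
FROM t_order
WHERE c_number LIKE '0001%';
```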
Re: [PERFORM] slow queue-like empty table
[Csaba Nagy - Thu at 10:45:35AM +0200] > So you should check for "idle in transaction" sessions, those are bad... > or any other long running transaction. Thank you (and others) for pointing this out, you certainly set us on the right track. We did have a few unclosed transactions; transactions not being ended by "rollback" or "commit". We've been fixing this, beating up the programmers responsible, and continuing to monitor. I don't think it's only due to those queue-like tables; we've really seen a significant improvement on the graphs showing load and cpu usage on the database server after we killed all the "idle in transaction" sessions. I can safely relax for some weeks yet before I need to do more optimization work :-) (oh, btw, we didn't really beat up the programmers ... the geographical distances are too big ;-) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
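[Editor's note: a sketch of the check Csaba describes, using the statistics view as it looked in the 8.x era. It requires stats_command_string = on, and the column names (procpid, current_query) changed in later releases.]

```sql
-- List sessions sitting "idle in transaction" -- these hold back
-- VACUUM and are a common cause of bloated queue-like tables.
SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY query_start;
```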
Re: [PERFORM] simple case using index on windows but not on linux
simon godden wrote: If the index isn't used, then we have problem #3. I think this is what you are actually seeing. Your locale is something other than "C" and PG doesn't know how to use like with indexes. Read up on operator classes or change your locale. http://www.postgresql.org/docs/8.1/static/indexes-opclass.html Aha - that sounds like it - this is the output from locale LANG=en_US.UTF-8 LC_CTYPE="en_US.UTF-8" .. I guess it cannot determine the collating sequence? It can, but isn't sure that it can rely on LIKE 'A%' being the same as >= 'A' and < 'B' (not always true). Re-creating the index with the right opclass will tell it this is the case. I'm not too familiar with unix locale issues - does this output match your problem description? OK - quick intro to locales. Create a file /tmp/sortthis containing the following: ---begin file--- BBB CCC AAA A CAT A DOG ACAT ---end file--- Now run "sort /tmp/sortthis". You'll probably see spaces get ignored. Now run "LANG=C sort /tmp/sortthis". You'll probably see a traditional ASCII ("C") sort. If not try LC_COLLATE rather than LANG. Can you explain how to change my locale to 'C'? (I'm quite happy for you to tell me to RTFM, as I know this is not a linux user mailing list :) You'll want to dump your databases and re-run initdb with a locale of "C" (or no locale). See: http://www.postgresql.org/docs/8.1/static/app-initdb.html That will mean all sorting will be on ASCII value. The problem is that the database picks up the operating-system's default locale when you install it from package. Not always what you want, but then until you understand the implications you can't really decide one way or the other. HTH -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] simple case using index on windows but not on linux
simon godden wrote: (Sending again because I forgot to reply to all) On 10/4/06, Heikki Linnakangas <[EMAIL PROTECTED]> wrote: You can increase the max shared memory size if you have root access. See http://www.postgresql.org/docs/8.1/interactive/kernel-resources.html#SYSVIPC-PARAMETERS Scroll down for Linux-specific instructions. Thanks for the link. Are you saying that the shared memory size is the issue here? Please can you explain how it would cause a seq scan rather than an index scan. I would like to understand the issue before making changes. It *might* be shared-memory settings. It's almost certainly something to do with setup. If you have the same data and the same query and can reliably produce different results then something else must be different. If you look at the explain output from both, PG knows the seq-scan is going to be expensive (cost=20835) so the Linux box either 1. Doesn't have the index (and you say it does, so it's not this). 2. Thinks the index will be even more expensive. 3. Can't use the index at all. Issue "set enable_seqscan=false" and then run your explain analyse. If your query uses the index, what is the estimated cost? If the estimated cost is larger than a seq-scan that would indicate your configuration settings are badly out-of-range. If the index isn't used, then we have problem #3. I think this is what you are actually seeing. Your locale is something other than "C" and PG doesn't know how to use like with indexes. Read up on operator classes or change your locale. http://www.postgresql.org/docs/8.1/static/indexes-opclass.html -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
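[Editor's note: the diagnostic Richard suggests looks like this in psql. enable_seqscan is session-local, so nothing needs to be reset in the config file.]

```sql
-- Temporarily discourage sequential scans for this session only,
-- then compare the estimated cost of the forced index plan.
SET enable_seqscan = false;
EXPLAIN ANALYZE SELECT c_number FROM t_order WHERE c_number LIKE '0001%';
SET enable_seqscan = true;
```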
Re: [PERFORM] simple case using index on windows but not on linux
On 10/4/06, Richard Huxton wrote: Issue "set enable_seqscan=false" and then run your explain analyse. If your query uses the index, what is the estimated cost? If the estimated cost is larger than a seq-scan that would indicate your configuration settings are badly out-of-range. I did that and it still used seq-scan. If the index isn't used, then we have problem #3. I think this is what you are actually seeing. Your locale is something other than "C" and PG doesn't know how to use like with indexes. Read up on operator classes or change your locale. http://www.postgresql.org/docs/8.1/static/indexes-opclass.html Aha - that sounds like it - this is the output from locale LANG=en_US.UTF-8 LC_CTYPE="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_PAPER="en_US.UTF-8" LC_NAME="en_US.UTF-8" LC_ADDRESS="en_US.UTF-8" LC_TELEPHONE="en_US.UTF-8" LC_MEASUREMENT="en_US.UTF-8" LC_IDENTIFICATION="en_US.UTF-8" LC_ALL= I guess it cannot determine the collating sequence? I'm not too familiar with unix locale issues - does this output match your problem description? Can you explain how to change my locale to 'C'? (I'm quite happy for you to tell me to RTFM, as I know this is not a linux user mailing list :) -- Simon Godden ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] simple case using index on windows but not on linux
(Sending again because I forgot to reply to all)

On 10/4/06, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
> You can increase the max shared memory size if you have root access. See
> http://www.postgresql.org/docs/8.1/interactive/kernel-resources.html#SYSVIPC-PARAMETERS
> Scroll down for Linux-specific instructions.

Thanks for the link. Are you saying that the shared memory size is the issue here? Please can you explain how it would cause a seq scan rather than an index scan. I would like to understand the issue before making changes.

--
Simon Godden
Re: [PERFORM] simple case using index on windows but not on linux
simon godden wrote:
> The only major difference between the hardware is that the windows
> machine has 2gb RAM and a setting of 1 shared memory pages, whereas the
> linux machine has 756Mb RAM and a setting of 3000 shared memory pages
> (max. shared memory allocation of 32Mb). I can't see any other
> differences in configuration.

You can increase the max shared memory size if you have root access. See
http://www.postgresql.org/docs/8.1/interactive/kernel-resources.html#SYSVIPC-PARAMETERS
Scroll down for Linux-specific instructions.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
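[Editor's note: for reference, the server-side half of this can be inspected from psql; the kernel limit itself (SHMMAX) has to be raised via the OS as described at the link above. A sketch, assuming the 3000-page figure quoted above refers to shared_buffers:]

```sql
-- shared_buffers is allocated from System V shared memory at startup;
-- if it exceeds the kernel's SHMMAX, the postmaster refuses to start.
SHOW shared_buffers;
-- On the Linux box described above this would report 3000, i.e. 3000
-- pages of 8 kB each, roughly 24 MB - just under the 32 MB kernel limit.
```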
[PERFORM] Unsubscribe
Please unsubscribe me! Thank you! Also, it would be better to have a message footer saying how to unsubscribe.
[PERFORM] simple case using index on windows but not on linux
I have a simple case, selecting on a LIKE where clause over a single column that has an index on it. On Windows it uses the index - on Linux it does not. I have exactly the same schema and data in each, and I have run the necessary analyze commands on both.

Windows is running 8.1.4.
Linux is running from RPM postgresql-server-8.1.4-1.FC5.1.

There are 1 million rows in the table - a number I would expect to lower the score of a sequential scan for the planner. There is an index on 'c_number'.

On Windows I get this:

orderstest=# explain analyze select * from t_order where c_number like '0001%';
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using t_order_c_number on t_order  (cost=0.00..26.53 rows=928 width=43) (actual time=0.029..2.857 rows=1000 loops=1)
   Index Cond: (((c_number)::text >= '0001'::character varying) AND ((c_number)::text < '0002'::character varying))
   Filter: ((c_number)::text ~~ '0001%'::text)
 Total runtime: 4.572 ms
(4 rows)

Great - the index is used, and the query is lightning fast.

On Linux I get this:

orderstest=# explain analyze select c_number from t_order where c_number like '0001%';
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Seq Scan on t_order  (cost=0.00..20835.00 rows=983 width=11) (actual time=1.364..1195.064 rows=1000 loops=1)
   Filter: ((c_number)::text ~~ '0001%'::text)
 Total runtime: 1197.312 ms
(3 rows)

I just can't use this level of performance in my application.

On my Linux box, the only way I can get it to use the index is to use the = operator. If I use anything else, a seq scan is used. Disabling sequence scans in the config has no effect. It still does not use the index for anything other than an = comparison.
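[Editor's note: one workaround not raised in the thread - an assumption on my part, not something the posters suggested. The Windows plan shows the planner rewriting LIKE '0001%' into a range condition; that rewrite can be done by hand. Plain btree comparisons use the index in any locale, so this should work on the UTF-8 Linux box too, provided the collation orders these digit-only strings the same way C does:]

```sql
-- Express the anchored prefix match as an explicit range. Ordinary
-- >= / < comparisons can use the existing btree index regardless of
-- locale; keeping the LIKE clause re-filters the rows for correctness
-- in case the range is slightly wider under the server's collation.
EXPLAIN ANALYZE
SELECT *
FROM t_order
WHERE c_number >= '0001'
  AND c_number <  '0002'
  AND c_number LIKE '0001%';
```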
Here is a dump of the table description:

orderstest=# \d t_order
               Table "public.t_order"
        Column         |          Type          | Modifiers
-----------------------+------------------------+-----------
 id                    | bigint                 | not null
 c_number              | character varying(255) |
 customer_id           | bigint                 |
 origincountry_id      | bigint                 |
 destinationcountry_id | bigint                 |
Indexes:
    "t_order_pkey" PRIMARY KEY, btree (id)
    "t_order_c_number" btree (c_number)
    "_3" btree (destinationcountry_id)
    "_4" btree (origincountry_id)
    "_5" btree (customer_id)
Foreign-key constraints:
    "fk9efdd3a33dbb666c" FOREIGN KEY (destinationcountry_id) REFERENCES go_country(id)
    "fk9efdd3a37d3dd384" FOREIGN KEY (origincountry_id) REFERENCES go_country(id)
    "fk9efdd3a38654c9d3" FOREIGN KEY (customer_id) REFERENCES t_party(id)

That dump is exactly the same on both machines.

The only major difference between the hardware is that the Windows machine has 2gb RAM and a setting of 1 shared memory pages, whereas the Linux machine has 756Mb RAM and a setting of 3000 shared memory pages (max. shared memory allocation of 32Mb). I can't see any other differences in configuration. Disk throughput on both is reasonable (40Mb/second buffered reads).

Can anyone explain the difference in the planner behaviour on the two systems, using what appears to be the same version of Postgres?

--
Simon Godden