Re: [PERFORM] How long should it take to insert 200,000 records?
On Tue, 2007-02-06 at 01:35, Karen Hill wrote: [snip] So far I've been sitting here for about 2 million ms waiting for it to complete, and I'm not sure how many inserts postgres is doing per second.

One way is to run analyze verbose on the target table and see how many pages it has, then do it again 1 minute later and check how many pages it grew. Then multiply the page increase by the records-per-page ratio you can get from the same analyze's output, and you'll get an estimated growth rate. Of course this will only work if you didn't have lots of free space in the table to start with... if you do have lots of free space, you can still estimate the growth based on the analyze results, but it will be more complicated.

In any case, it would be very nice to have more tools to attach to running queries and see how they are doing... starting with what exactly they are doing (are they in RI checks, maybe?), the actual execution plan they are using, and how much of their work they've done... it would help a lot in debugging performance problems.

Cheers, Csaba.

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
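The same estimate can be read from pg_class instead of eyeballing the analyze output. A minimal sketch, assuming a hypothetical table name and remembering that relpages/reltuples are only as fresh as the last ANALYZE:

```sql
-- Run ANALYZE, sample pg_class, wait a minute, repeat, and diff.
ANALYZE VERBOSE my_target_table;

SELECT relpages,
       reltuples,
       reltuples / NULLIF(relpages, 0) AS rows_per_page
FROM   pg_class
WHERE  relname = 'my_target_table';

-- estimated rows/sec ~= (relpages_now - relpages_before)
--                       * rows_per_page / elapsed_seconds
```

As Csaba notes, this only works cleanly when the table had little free space to begin with; otherwise new rows land in existing pages and the page count understates the insert rate.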
Re: [PERFORM] Tuning
Tuners, always be aware that results on Windows may be totally different!

My main customer is running PostgreSQL 8.1 on MINIMUM shared buffers:

max_connections = 100
shared_buffers = 200    # min 16 or max_connections*2, 8KB each

I changed it to this value from the very low default 2, and the system is responding better; especially after fixing the available memory setting within the planner. ... the frustrating part is, I could not replicate this behaviour with pg_bench :(

Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
fx 01212-5-13695179
-
Python: the only language with more web frameworks than keywords.
Re: [PERFORM] optimizing a geo_distance() proximity query (example and benchmark)
On 2/6/07, Mark Stosberg [EMAIL PROTECTED] wrote: It's also notable that the units used are meters, not miles like geo_distance(). That's what the magic number of 16093.44 is -- 10 miles converted to meters. When I benchmarked this query against the old geo_distance() variation, it was about 200 times faster (~100 ms before vs ~0.5 ms after). However, my next step was to try a more real-world query that involved a more complex where clause and a couple of table joins. So far, that result is coming out /slower/ with the new approach, even though the index is being used. I believe this may be because of the additional results found that are outside of the sphere, but inside the cube. This causes additional rows that need processing in the joined tables. Could someone post an example of how to further refine this so the results more closely match what geo_distance() returns?

I agree with bruno... the extra time is probably not what you are thinking... please post explain analyze results, etc. However bruno's ratio, while correct, does not tell the whole story, because you have to recheck the distance to every point in the returned set.

There is a small optimization you can make. The query you wrote automatically excludes points outside a certain box. You can also include points in the set which fall inside the largest box that fits in the circle:

select * from zipcodes
 where earth_box('(436198.322855334, 4878562.8732218, 4085386.43843934)'::cube, inner_radius) @ earth_coords
    or ( earth_box('(436198.322855334, 4878562.8732218, 4085386.43843934)'::cube, 16093.44) @ earth_coords
         and geo_dist... );

you can also choose to omit the earth_coords column and calculate it on the fly... there is no real performance hit for this but it does make the sql a bit ugly.

merlin
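Merlin's inner_radius is not pinned down in the post. One hedged way to choose it is the half-side of the largest cube inscribed in the 10-mile sphere, radius/sqrt(3), so that every point inside the inner box is guaranteed inside the circle and needs no exact distance recheck (this ignores any cube-distance vs great-circle subtlety):

```sql
-- Hypothetical sketch: inner-box half-side for a 16093.44 m
-- (10 mile) search radius.  Points inside a box of this half-side
-- are at most 16093.44 m from the center.
SELECT 16093.44 / sqrt(3) AS inner_radius;   -- about 9292 meters
```

Points in the outer box but outside the inner box are the only ones that still need the geo_distance() recheck.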
Re: [PERFORM] explain analyze output for review (was: optimizing a geo_distance()...)
Bruno Wolff III wrote: Some people here may be able to tell you more if you show us explain analyze output.

Here is my explain analyze output. Some brief context of what's going on: the goal is to find Pets Near You. We join the pets table on the shelters table to get a zipcode, and then join the shelters table with earth_distance to get the coordinates of the zipcode. (Is there any significant penalty for using a varchar vs an int for a join?)

I've been investigating partial indexes for the pets table. It has about 300,000 rows, but only about 10 are active, and those are the ones we care about. Queries are also frequently made on males vs females, dogs vs cats, or specific ages, and those specific cases seem like possible candidates for partial indexes as well. I played with that approach some, but had trouble coming up with anything that benchmarked faster.

If I'm reading the explain analyze output correctly myself, nearly all of the time spent is related to the 'pets' table, but I can't see what to do about it. Help appreciated!
Mark

Nested Loop  (cost=11.82..29.90 rows=1 width=0) (actual time=37.601..1910.787 rows=628 loops=1)
  ->  Nested Loop  (cost=6.68..20.73 rows=1 width=24) (actual time=35.525..166.547 rows=1727 loops=1)
        ->  Bitmap Heap Scan on pets  (cost=6.68..14.71 rows=1 width=4) (actual time=35.427..125.594 rows=1727 loops=1)
              Recheck Cond: (((sex)::text = 'f'::text) AND (species_id = 1))
              Filter: ((pet_state)::text = 'available'::text)
              ->  BitmapAnd  (cost=6.68..6.68 rows=2 width=0) (actual time=33.398..33.398 rows=0 loops=1)
                    ->  Bitmap Index Scan on pets_sex_idx  (cost=0.00..3.21 rows=347 width=0) (actual time=14.739..14.739 rows=35579 loops=1)
                          Index Cond: ((sex)::text = 'f'::text)
                    ->  Bitmap Index Scan on pet_species_id_idx  (cost=0.00..3.21 rows=347 width=0) (actual time=16.779..16.779 rows=48695 loops=1)
                          Index Cond: (species_id = 1)
        ->  Index Scan using shelters_pkey on shelters  (cost=0.00..6.01 rows=1 width=28) (actual time=0.012..0.014 rows=1 loops=1727)
              Index Cond: ("outer".shelter_id = shelters.shelter_id)
  ->  Bitmap Heap Scan on earth_distance  (cost=5.14..9.15 rows=1 width=9) (actual time=0.984..0.984 rows=0 loops=1727)
        Recheck Cond: ((cube_enlarge(('(-2512840.11676572, 4646218.19036629, 3574817.21369166)'::cube)::cube, 160930.130863421::double precision, 3) @ earth_distance.earth_coords) AND (("outer".postal_code_for_joining)::text = (earth_distance.zipcode)::text))
        ->  BitmapAnd  (cost=5.14..5.14 rows=1 width=0) (actual time=0.978..0.978 rows=0 loops=1727)
              ->  Bitmap Index Scan on earth_coords_idx  (cost=0.00..2.15 rows=42 width=0) (actual time=0.951..0.951 rows=1223 loops=1727)
                    Index Cond: (cube_enlarge(('(-2512840.11676572, 4646218.19036629, 3574817.21369166)'::cube)::cube, 160930.130863421::double precision, 3) @ earth_coords)
              ->  Bitmap Index Scan on earth_distance_zipcode_idx  (cost=0.00..2.74 rows=212 width=0) (actual time=0.015..0.015 rows=1 loops=1727)
                    Index Cond: (("outer".postal_code_for_joining)::text = (earth_distance.zipcode)::text)
Total runtime: 1913.099 ms
Re: [PERFORM] explain analyze output for review (was: optimizing a geo_distance()...)
If I'm reading this correctly, 89% of the query time is spent doing an index scan of earth_coords_idx. Scanning pets is only taking 6% of the total time.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mark Stosberg
Sent: Tuesday, February 06, 2007 8:40 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] explain analyze output for review (was: optimizing a geo_distance()...)

[snip -- quoted message and explain analyze output repeated above]
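Adam's percentages come from multiplying each node's per-loop actual time by its loops count. A back-of-the-envelope check against the posted plan (the exact shares depend on where you draw node boundaries, so these land close to, though not exactly on, his 89%/6% figures):

```sql
-- earth_coords_idx bitmap index scan: 0.951 ms/loop * 1727 loops,
-- vs. the pets bitmap heap scan at 125.594 ms total (1 loop),
-- out of 1913.099 ms total runtime.
SELECT round(0.951 * 1727)                  AS earth_coords_ms,   -- 1642
       round(0.951 * 1727 / 1913.099 * 100) AS earth_coords_pct,  -- 86
       round(125.594 / 1913.099 * 100)      AS pets_pct;          -- 7
```

Either way the conclusion is the same: the repeated cube_enlarge index probe, executed once per outer row, dominates the runtime.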
Re: [PERFORM] explain analyze output for review (was: optimizing a geo_distance()...)
On Tue, Feb 06, 2007 at 09:39:54 -0500, Mark Stosberg [EMAIL PROTECTED] wrote: I've been investigating partial indexes for the pets table. It has about 300,000 rows, but only about 10 are active, and those are the ones we care about. Queries are also frequently made on males vs females, dogs vs cats

It probably won't pay to make partial indexes on sex or species (at least for the popular ones), as you aren't likely to save enough by eliminating only half the cases to make up for maintaining another index. A partial index for active rows probably does make sense.

or specific ages, and those specific cases seem like possible candidates for partial indexes as well. I played with that approach some, but had trouble coming up with anything that benchmarked faster.
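A minimal sketch of Bruno's suggestion, using the column names and value visible in the posted plan (the index name and the column combination are assumptions, not something from the thread; the WHERE clause must match the query's predicate for the planner to use it):

```sql
-- Partial index covering only the small "active" slice of pets.
CREATE INDEX pets_available_idx
    ON pets (species_id, sex)
 WHERE pet_state = 'available';
```

This avoids the BitmapAnd over the full-table sex and species indexes seen in the plan, since the index only contains rows the query can return.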
Re: [PERFORM] How long should it take to insert 200,000 records?
On Tue, 2007-02-06 at 10:40, Merlin Moncure wrote: On 2/6/07, Scott Marlowe [EMAIL PROTECTED] wrote: On Mon, 2007-02-05 at 18:35, Karen Hill wrote: I have a pl/pgsql function that is inserting 200,000 records for testing purposes. What is the expected time frame for this operation on a pc with 1/2 a gig of ram and a 7200 RPM disk? The processor is a 2ghz cpu. So far I've been sitting here for about 2 million ms waiting for it to complete, and I'm not sure how many inserts postgres is doing per second. That really depends. Doing 200,000 inserts as individual transactions will be fairly slow. Since PostgreSQL generally runs in autocommit mode, this means that if you didn't expressly begin a transaction, you are in fact inserting each row as a transaction. i.e. this: I think OP is doing insertion inside a pl/pgsql loop...transaction is implied here. Yeah, I noticed that about 10 seconds after hitting send... :)
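Scott's "i.e. this:" example was truncated in the archive. A hedged sketch of the contrast he is describing (table name hypothetical):

```sql
-- Under autocommit, each INSERT is its own transaction:
-- one commit, and typically one fsync, per row.  Slow.
INSERT INTO test (i) VALUES (1);
INSERT INTO test (i) VALUES (2);

-- Wrapped in one explicit transaction: a single commit at the end.
BEGIN;
INSERT INTO test (i) VALUES (1);
INSERT INTO test (i) VALUES (2);
COMMIT;
```

As Merlin points out, this distinction doesn't apply to the OP's case: a pl/pgsql function body already runs inside a single transaction.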
Re: [PERFORM] How long should it take to insert 200,000 records?
On 2/6/07, Scott Marlowe [EMAIL PROTECTED] wrote: [snip] Yeah, I noticed that about 10 seconds after hitting send... :)

actually, I get the stupid award also because RI check to unindexed column is not possible :) (this haunts deletes, not inserts).

merlin
Re: [PERFORM] How long should it take to insert 200,000 records?
On Tue, 2007-02-06 at 12:01 -0500, Merlin Moncure wrote: [snip] actually, I get the stupid award also because RI check to unindexed column is not possible :) (this haunts deletes, not inserts).

Sure it's possible:

CREATE TABLE parent (col1 int4);
-- insert many millions of rows into parent
CREATE TABLE child (col1 int4 REFERENCES parent(col1));
-- insert many millions of rows into child, very very slowly.

- Mark Lewis
Re: [PERFORM] How long should it take to insert 200,000 records?
On Tue, Feb 06, 2007 at 10:31:26 -0800, Mark Lewis [EMAIL PROTECTED] wrote: Sure it's possible: CREATE TABLE parent (col1 int4); -- insert many millions of rows into parent CREATE TABLE child (col1 int4 REFERENCES parent(col1)); -- insert many millions of rows into child, very very slowly.

I don't think Postgres allows this. You don't have to have an index on the child table, but you do in the parent table. Quote from http://developer.postgresql.org/pgdocs/postgres/sql-createtable.html: The referenced columns must be the columns of a unique or primary key constraint in the referenced table.
Re: [PERFORM] How long should it take to insert 200,000 records?
On 2/6/07, Mark Lewis [EMAIL PROTECTED] wrote: actually, I get the stupid award also because RI check to unindexed column is not possible :) (this haunts deletes, not inserts). Sure it's possible: CREATE TABLE parent (col1 int4); -- insert many millions of rows into parent CREATE TABLE child (col1 int4 REFERENCES parent(col1)); -- insert many millions of rows into child, very very slowly.

the database will not allow you to create an RI link out unless the parent table has a primary key/unique constraint, which the database backs with an index... and you can't even trick it afterwards by dropping the constraint. it's the other direction, when you cascade forward, where you can have a problem. this is most common with a delete, but can also happen on an update of a table's primary key with child tables referencing it.

merlin
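The forward-cascade problem Merlin describes has a standard fix: PostgreSQL creates no index on the referencing side automatically, so you add one yourself. A sketch against Mark's example tables (index name assumed):

```sql
-- Without this, every DELETE (or PK update) on parent must scan
-- child sequentially to find the referencing rows.
CREATE INDEX child_col1_idx ON child (col1);
```

With the index in place, the RI trigger's lookup into child becomes an index probe instead of a sequential scan per deleted parent row.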
Re: [PERFORM] How long should it take to insert 200,000 records?
On Tue, 2007-02-06 at 14:06 -0500, Merlin Moncure wrote: [snip] it's the other direction, when you cascade forwards when you can have a problem. this is most common with a delete, but can also happen on an update of a table's primary key with child tables referencing it.

Hmmm, should check my SQL before hitting send I guess. Well, at least you no longer have to wear the stupid award, Merlin :)

-- Mark Lewis
Re: [PERFORM] index scan through a subquery
Tom Lane wrote: I need the lovely index scan, but my table is hidden behind a view, and all I get is the ugly sequential scan. Any ideas on how to convince the optimizer to unfold the subquery properly? You should provide some context in this sort of gripe, like which PG version you're using. But I'm going to guess that it's 8.2.x, because 8.1.x gets it right :-(. Try the attached.

Good guess; I was indeed talking about the current release rather than the previous release. Also, apologies for the tone of my post: I was attempting to be jovial, but in retrospect, I see how it reads as a gripe, which I guess evoked your frowny-face emoticon. Thanks for the quick response, elegant fix, and ongoing excellent work!

Cheers, Bill

Index: planagg.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/optimizer/plan/planagg.c,v
retrieving revision 1.25
diff -c -r1.25 planagg.c
*** planagg.c	9 Jan 2007 02:14:13 -0000	1.25
--- planagg.c	6 Feb 2007 06:30:23 -0000
***************
*** 70,75 ****
--- 70,76 ----
  optimize_minmax_aggregates(PlannerInfo *root, List *tlist, Path *best_path)
  {
  	Query	   *parse = root->parse;
+ 	FromExpr   *jtnode;
  	RangeTblRef *rtr;
  	RangeTblEntry *rte;
  	RelOptInfo *rel;
***************
*** 102,115 ****
  	 * We also restrict the query to reference exactly one table, since join
  	 * conditions can't be handled reasonably.  (We could perhaps handle a
  	 * query containing cartesian-product joins, but it hardly seems worth the
! 	 * trouble.)
  	 */
! 	Assert(parse->jointree != NULL && IsA(parse->jointree, FromExpr));
! 	if (list_length(parse->jointree->fromlist) != 1)
! 		return NULL;
! 	rtr = (RangeTblRef *) linitial(parse->jointree->fromlist);
! 	if (!IsA(rtr, RangeTblRef))
  		return NULL;
  	rte = rt_fetch(rtr->rtindex, parse->rtable);
  	if (rte->rtekind != RTE_RELATION || rte->inh)
  		return NULL;
--- 103,121 ----
  	 * We also restrict the query to reference exactly one table, since join
  	 * conditions can't be handled reasonably.  (We could perhaps handle a
  	 * query containing cartesian-product joins, but it hardly seems worth the
! 	 * trouble.)  However, the single real table could be buried in several
! 	 * levels of FromExpr.
  	 */
! 	jtnode = parse->jointree;
! 	while (IsA(jtnode, FromExpr))
! 	{
! 		if (list_length(jtnode->fromlist) != 1)
! 			return NULL;
! 		jtnode = linitial(jtnode->fromlist);
! 	}
! 	if (!IsA(jtnode, RangeTblRef))
  		return NULL;
+ 	rtr = (RangeTblRef *) jtnode;
  	rte = rt_fetch(rtr->rtindex, parse->rtable);
  	if (rte->rtekind != RTE_RELATION || rte->inh)
  		return NULL;
Re: [PERFORM] explain analyze output for review (was: optimizing a geo_distance()...)
Aqua Data Studio has a graphical explain built into it. It supports most rdbms including postgres. It's what I use to performance tune DB2. http://www.aquafold.com/

Index ANDing would suit you here. You have 3 tables with 3 possible indexes, and it sounds like the query is doing table scans where it needs to use indexes. If your version of postgres does not support index ANDing, another way around this is to create a view and then index the view (if indexing views is possible in postgres). Another possible solution is inserting your data into a single table and then indexing that table. The initial cost is consuming, however if you use triggers on your parent tables to automatically insert data into the new table it becomes almost hands free.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mark Stosberg
Sent: Tuesday, February 06, 2007 8:40 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] explain analyze output for review (was: optimizing a geo_distance()...)

[snip -- quoted message and explain analyze output repeated above]
Re: [PERFORM] explain analyze output for review (was: optimizing a geo_distance()...)
what is the size of that index? Have you considered breaking the index into components, ie more than one index on the table?

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Adam Rich
Sent: Tuesday, February 06, 2007 8:54 AM
To: 'Mark Stosberg'; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] explain analyze output for review (was: optimizing a geo_distance()...)

If I'm reading this correctly, 89% of the query time is spent doing an index scan of earth_coords_idx. Scanning pets is only taking 6% of the total time.

[snip -- further quoted message and explain analyze output repeated above]
Re: [PERFORM] How long should it take to insert 200,000 records?
What is your row size? Have you checked to see what your current inserts per second are? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Scott Marlowe Sent: Tuesday, February 06, 2007 10:56 AM To: Merlin Moncure Cc: Karen Hill; pgsql-performance@postgresql.org Subject: Re: [PERFORM] How long should it take to insert 200,000 records? On Tue, 2007-02-06 at 10:40, Merlin Moncure wrote: On 2/6/07, Scott Marlowe [EMAIL PROTECTED] wrote: On Mon, 2007-02-05 at 18:35, Karen Hill wrote: I have a pl/pgsql function that is inserting 200,000 records for testing purposes. What is the expected time frame for this operation on a pc with 1/2 a gig of ram and a 7200 RPM disk? The processor is a 2ghz cpu. So far I've been sitting here for about 2 million ms waiting for it to complete, and I'm not sure how many inserts postgres is doing per second. That really depends. Doing 200,000 inserts as individual transactions will be fairly slow. Since PostgreSQL generally runs in autocommit mode, this means that if you didn't expressly begin a transaction, you are in fact inserting each row as a transaction. i.e. this: I think OP is doing insertion inside a pl/pgsql loop...transaction is implied here. Yeah, I noticed that about 10 seconds after hitting send... :) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster PRIVILEGED AND CONFIDENTIAL This email transmission contains privileged and confidential information intended only for the use of the individual or entity named above. If the reader of the email is not the intended recipient or the employee or agent responsible for delivering it to the intended recipient, you are hereby notified that any use, dissemination or copying of this email transmission is strictly prohibited by the sender. If you have received this transmission in error, please delete the email and immediately notify the sender via the email return address or mailto:[EMAIL PROTECTED] Thank you. 
Re: [PERFORM] How long should it take to insert 200,000 records?
On Feb 5, 9:33 pm, [EMAIL PROTECTED] (Tom Lane) wrote:
> Karen Hill [EMAIL PROTECTED] writes:
> > I have a pl/pgsql function that is inserting 200,000 records for
> > testing purposes. What is the expected time frame for this operation
> > on a pc with 1/2 a gig of ram and a 7200 RPM disk?
>
> I think you have omitted a bunch of relevant facts. Bare INSERT is
> reasonably quick:
>
> regression=# create table foo (f1 int);
> CREATE TABLE
> regression=# \timing
> Timing is on.
> regression=# insert into foo select x from generate_series(1,200000) x;
> INSERT 0 200000
> Time: 5158.564 ms
>
> (this on a not-very-fast machine) but if you weigh it down with a ton
> of index updates, foreign key checks, etc, it could get slow ... also
> you haven't mentioned what else that plpgsql function is doing.

The postgres version is 8.2.1 on Windows. The pl/pgsql function is
inserting to an updatable view (basically two tables).

CREATE TABLE foo1 ( );

CREATE TABLE foo2 ( );

CREATE VIEW viewfoo AS ( );

CREATE RULE ruleFoo AS ON INSERT TO viewfoo DO INSTEAD ( );

CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$
BEGIN
FOR i IN 1..200000 LOOP
    INSERT INTO viewfoo (x) VALUES (i);
END LOOP;
END;
$$ LANGUAGE plpgsql;
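Tom's generate_series example above also suggests a set-based alternative to the 200,000-iteration loop. A hedged sketch (assuming viewfoo exposes a single integer column x and its INSTEAD rule handles multi-row inserts — neither is shown in the elided schema):

```sql
-- Sketch, not from the thread: one set-based INSERT through the view
-- replaces 200,000 separate INSERT statements inside the plpgsql loop.
INSERT INTO viewfoo (x)
SELECT g FROM generate_series(1, 200000) AS g;
```

Even inside a single transaction, one multi-row statement avoids 200,000 round trips through the executor and the rewrite rule, so it is usually noticeably faster than a row-at-a-time loop.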
Re: [PERFORM] How long should it take to insert 200,000 records?
Karen Hill wrote:
> The postgres version is 8.2.1 on Windows. The pl/pgsql function is
> inserting to an updatable view (basically two tables).
> [ sketch of schema ]

Sorry - but we probably need *still* more detail! The definition of
viewfoo is likely to be critical. For instance, a simplified variant of
your setup does 200000 inserts in ~5s on my PIII Tualatin machine:

CREATE TABLE foo1 (x INTEGER);

CREATE VIEW viewfoo AS SELECT * FROM foo1;

CREATE RULE ruleFoo AS ON INSERT TO viewfoo DO INSTEAD (
    INSERT INTO foo1 VALUES (new.x);
);

CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$
BEGIN
FOR i IN 1..200000 LOOP
    INSERT INTO viewfoo (x) VALUES (i);
END LOOP;
END;
$$ LANGUAGE plpgsql;

postgres=# \timing
postgres=# SELECT functionFoo();
 functionfoo
-------------

(1 row)

Time: 4659.477 ms

postgres=# SELECT count(*) FROM viewfoo;
 count
--------
 200000
(1 row)

Cheers

Mark
Re: [PERFORM] How long should it take to insert 200,000 records?
Karen Hill [EMAIL PROTECTED] writes:
> On Feb 5, 9:33 pm, [EMAIL PROTECTED] (Tom Lane) wrote:
> > I think you have omitted a bunch of relevant facts.
>
> The postgres version is 8.2.1 on Windows. The pl/pgsql function is
> inserting to an updatable view (basically two tables).
> [ sketch of schema ]

I think the problem is probably buried in the parts you left out. Can
you show us the full schemas for those tables, as well as the rule
definition? The plpgsql function itself can certainly go a lot faster
than what you indicated. On my slowest active machine:

regression=# create table viewfoo(x int);
CREATE TABLE
regression=# CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$
BEGIN
FOR i IN 1..200000 LOOP
    INSERT INTO viewfoo (x) VALUES (i);
END LOOP;
END;
$$ LANGUAGE plpgsql;
CREATE FUNCTION
regression=# \timing
Timing is on.
regression=# select functionFoo();
 functionfoo
-------------

(1 row)

Time: 16939.667 ms

regards, tom lane
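For scale, the timings quoted in this thread convert to rows per second as follows (back-of-the-envelope arithmetic, not figures from the thread itself):

```python
# Throughput implied by the two timings quoted in the thread:
# Tom's bare set-based INSERT and his plpgsql loop, 200,000 rows each.
rows = 200_000

timings_ms = [
    ("bare INSERT ... generate_series", 5158.564),
    ("plpgsql row-at-a-time loop", 16939.667),
]

for label, ms in timings_ms:
    rate = rows / (ms / 1000.0)
    print(f"{label}: {rate:.0f} rows/sec")
```

Both figures are orders of magnitude faster than Karen's still-running function (2 million ms and counting), which supports Tom's suspicion that the cost is hiding in the elided view, rule, and table definitions rather than in the loop itself.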