Re: [PERFORM] How long should it take to insert 200,000 records?

2007-02-06 Thread Csaba Nagy
On Tue, 2007-02-06 at 01:35, Karen Hill wrote:
 [snip] So far I've been sitting here for about 2 million ms
 waiting for it to complete, and I'm not sure how many inserts postgres
 is doing per second.

One way is to run ANALYZE VERBOSE on the target table and see how many
pages it has, then do it again 1 minute later and check how many pages it
grew. Multiply the page increase by the records-per-page ratio you can get
from the same ANALYZE output, and you'll get an estimated growth rate. Of
course this will only work if you didn't have lots of free space in the
table to start with... if you do have lots of free space, you can still
estimate the growth based on the ANALYZE results, but it will be more
complicated.
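
For example (a rough sketch -- the table name and all numbers here are
made up):

ANALYZE VERBOSE my_target_table;
-- INFO:  "my_target_table": scanned 3000 of 3000 pages, containing 210000
--        live rows ...
--
-- Run it again a minute later. If the table grew from 2900 to 3000 pages,
-- and ANALYZE reports about 70 rows per page (210000 rows / 3000 pages),
-- the estimate is (3000 - 2900) * 70 = 7000 rows/minute, i.e. roughly
-- 117 inserts/second.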


In any case, it would be very nice to have more tools to attach to
running queries and see how they are doing... starting with what exactly
they are doing (are they in RI checks, maybe?), the actual execution
plan they are using, and how far along they are in their work... it would
help a lot in debugging performance problems.

Cheers,
Csaba.





Re: [PERFORM] Tuning

2007-02-06 Thread Harald Armin Massa

Tuners,

Always be aware that results on Windows may be totally different!

My main customer is running PostgreSQL 8.1 on MINIMUM shared_buffers:

max_connections = 100
shared_buffers = 200    # min 16 or max_connections*2, 8KB each

I changed it to this value from the very low default, and the system
is responding better; especially after fixing the available memory setting
within the planner.

... the frustrating part is, I could not replicate this behaviour with
pgbench :(

Harald

--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
fx 01212-5-13695179
-
Python: the only language with more web frameworks than keywords.


Re: [PERFORM] optimizing a geo_distance() proximity query (example and benchmark)

2007-02-06 Thread Merlin Moncure

On 2/6/07, Mark Stosberg [EMAIL PROTECTED] wrote:

It's also notable that the units used are meters, not miles like
geo_distance(). That's what the magic number of 16093.44 is-- 10 miles
converted to meters.

When I benchmarked this query against the old geo_distance() variation,
it was about 200 times faster (~100ms vs .5ms).

However, my next step was to try a more real-world query that involved
a more complex where clause and a couple of table joins. So far, that
result is coming out /slower/ with the new approach, even though the
index is being used. I believe this may be because of the additional
results found that are outside of the sphere, but inside the cube. This
causes additional rows that need processing in the joined tables.

Could someone post an example of how to further refine this so the
results more closely match what geo_distance() returns?


I agree with Bruno... the extra time is probably not what you are
thinking... please post explain analyze results, etc.  However, Bruno's
ratio, while correct, does not tell the whole story, because you have to
recheck the distance to every point in the returned set.

There is a small optimization you can make.  The query you wrote
automatically excludes points outside a certain box.  You can also
unconditionally include points inside the largest box that fits in the
circle (those are guaranteed to be within the radius):

select * from zipcodes
where
  earth_box('(436198.322855334, 4878562.8732218, 4085386.43843934)'::cube,
            inner_radius) @ earth_coords
  or
  (
    earth_box('(436198.322855334, 4878562.8732218, 4085386.43843934)'::cube,
              16093.44) @ earth_coords
    and
    geo_dist...
  );
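
Filled out, the recheck branch might look something like this (just a
sketch: cube_distance() stands in for the truncated distance call above,
and inner_radius is assumed to be the 16093.44 m radius divided by
sqrt(3), the largest cube that still fits inside the sphere; note that
cube_distance() is straight-line rather than great-circle distance, so
the match is still slightly looser than geo_distance()):

select * from zipcodes
where
  -- points in the largest box inside the circle need no distance recheck
  earth_box('(436198.322855334, 4878562.8732218, 4085386.43843934)'::cube,
            16093.44 / sqrt(3)) @ earth_coords
  or
  (
    -- points in the box covering the circle get an explicit recheck
    earth_box('(436198.322855334, 4878562.8732218, 4085386.43843934)'::cube,
              16093.44) @ earth_coords
    and
    cube_distance('(436198.322855334, 4878562.8732218, 4085386.43843934)'::cube,
                  earth_coords) <= 16093.44
  );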

you can also choose to omit the earth_coords column and calculate it
on the fly...there is no real performance hit for this but it does
make the sql a bit ugly.
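
For the on-the-fly variant, something like this might work (again a
sketch; it assumes the zipcodes table carries lat/lon columns and that
the earthdistance module is installed):

create index zipcodes_earth_idx on zipcodes
  using gist (ll_to_earth(lat, lon));

select * from zipcodes
where earth_box('(436198.322855334, 4878562.8732218, 4085386.43843934)'::cube,
                16093.44) @ ll_to_earth(lat, lon);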

merlin



Re: [PERFORM] explain analyze output for review (was: optimizing a geo_distance()...)

2007-02-06 Thread Mark Stosberg
Bruno Wolff III wrote:

 Some people here may be able to tell you more if you show us explain
 analyze output.

Here is my explain analyze output, with some brief context of what's going
on. The goal is to find "Pets Near You".

We join the pets table to the shelters table to get a zipcode, and then
join the shelters table to earth_distance to get the coordinates of the
zipcode.  ( Is there any significant penalty for using a varchar vs an int
for a join? )

I've been investigating partial indexes for the pets table. It has about
300,000 rows, but only about 10% are active, and those are the ones we
care about. Queries are also frequently made on males vs females, dogs vs
cats, or specific ages, and those specific cases seem like possible
candidates for partial indexes as well. I played with that approach some,
but had trouble coming up with anything that benchmarked faster.

If I'm reading the explain analyze output correctly, nearly all of the
time spent is related to the 'pets' table, but I can't see what to do
about it.

Help appreciated!

  Mark

Nested Loop  (cost=11.82..29.90 rows=1 width=0) (actual time=37.601..1910.787 rows=628 loops=1)
   ->  Nested Loop  (cost=6.68..20.73 rows=1 width=24) (actual time=35.525..166.547 rows=1727 loops=1)
         ->  Bitmap Heap Scan on pets  (cost=6.68..14.71 rows=1 width=4) (actual time=35.427..125.594 rows=1727 loops=1)
               Recheck Cond: (((sex)::text = 'f'::text) AND (species_id = 1))
               Filter: ((pet_state)::text = 'available'::text)
               ->  BitmapAnd  (cost=6.68..6.68 rows=2 width=0) (actual time=33.398..33.398 rows=0 loops=1)
                     ->  Bitmap Index Scan on pets_sex_idx  (cost=0.00..3.21 rows=347 width=0) (actual time=14.739..14.739 rows=35579 loops=1)
                           Index Cond: ((sex)::text = 'f'::text)
                     ->  Bitmap Index Scan on pet_species_id_idx  (cost=0.00..3.21 rows=347 width=0) (actual time=16.779..16.779 rows=48695 loops=1)
                           Index Cond: (species_id = 1)
         ->  Index Scan using shelters_pkey on shelters  (cost=0.00..6.01 rows=1 width=28) (actual time=0.012..0.014 rows=1 loops=1727)
               Index Cond: (outer.shelter_id = shelters.shelter_id)
   ->  Bitmap Heap Scan on earth_distance  (cost=5.14..9.15 rows=1 width=9) (actual time=0.984..0.984 rows=0 loops=1727)
         Recheck Cond: ((cube_enlarge(('(-2512840.11676572, 4646218.19036629, 3574817.21369166)'::cube)::cube, 160930.130863421::double precision, 3) @ earth_distance.earth_coords) AND ((outer.postal_code_for_joining)::text = (earth_distance.zipcode)::text))
         ->  BitmapAnd  (cost=5.14..5.14 rows=1 width=0) (actual time=0.978..0.978 rows=0 loops=1727)
               ->  Bitmap Index Scan on earth_coords_idx  (cost=0.00..2.15 rows=42 width=0) (actual time=0.951..0.951 rows=1223 loops=1727)
                     Index Cond: (cube_enlarge(('(-2512840.11676572, 4646218.19036629, 3574817.21369166)'::cube)::cube, 160930.130863421::double precision, 3) @ earth_coords)
               ->  Bitmap Index Scan on earth_distance_zipcode_idx  (cost=0.00..2.74 rows=212 width=0) (actual time=0.015..0.015 rows=1 loops=1727)
                     Index Cond: ((outer.postal_code_for_joining)::text = (earth_distance.zipcode)::text)
 Total runtime: 1913.099 ms





Re: [PERFORM] explain analyze output for review (was: optimizing a geo_distance()...)

2007-02-06 Thread Adam Rich

If I'm reading this correctly, 89% of the query time is spent
doing an index scan of earth_coords_idx (the earth_distance bitmap scan
runs 1727 times at roughly 0.98 ms per loop, about 1.7 s of the 1.91 s
total).  Scanning pets is only taking 6% of the total time.




-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Mark
Stosberg
Sent: Tuesday, February 06, 2007 8:40 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] explain analyze output for review (was:
optimizing a geo_distance()...)


[Mark's message and explain analyze output snipped -- quoted in full above]


Re: [PERFORM] explain analyze output for review (was: optimizing a geo_distance()...)

2007-02-06 Thread Bruno Wolff III
On Tue, Feb 06, 2007 at 09:39:54 -0500,
  Mark Stosberg [EMAIL PROTECTED] wrote:
 
 I've been investigating partial indexes for the pets table. It has about
 300,000 rows, but only about 10 are active, and those are the ones we
 are care about. Queries are also frequently made on males vs females, dogs vs 
 cats

It probably won't pay to make partial indexes on sex or species (at least
for the popular ones), as you aren't likely to save enough by eliminating only
half the cases to make up for maintaining another index. A partial index for
active rows probably does make sense.
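
For instance, a partial index along these lines (an untested sketch; the
column name and the 'available' value are taken from the explain analyze
output earlier in the thread):

CREATE INDEX pets_available_species_idx ON pets (species_id)
  WHERE pet_state = 'available';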

 or specific ages, and those specific cases seem like possible candidates for 
 partial indexes
 as well. I played with that approach some, but had trouble coming up with any 
 thing that
 benchmarked faster.



Re: [PERFORM] How long should it take to insert 200,000 records?

2007-02-06 Thread Scott Marlowe
On Tue, 2007-02-06 at 10:40, Merlin Moncure wrote:
 On 2/6/07, Scott Marlowe [EMAIL PROTECTED] wrote:
  On Mon, 2007-02-05 at 18:35, Karen Hill wrote:
   I have a pl/pgsql function that is inserting 200,000 records for
   testing purposes.  What is the expected time frame for this operation
   on a pc with 1/2 a gig of ram and a 7200 RPM disk?   The processor is
   a 2ghz cpu.  So far I've been sitting here for about 2 million ms
   waiting for it to complete, and I'm not sure how many inserts postgres
   is doing per second.
 
  That really depends.  Doing 200,000 inserts as individual transactions
  will be fairly slow.  Since PostgreSQL generally runs in autocommit
  mode, this means that if you didn't expressly begin a transaction, you
  are in fact inserting each row as a transaction.  i.e. this:
 
 I think OP is doing insertion inside a pl/pgsql loop...transaction is
 implied here. 

Yeah, I noticed that about 10 seconds after hitting send... :)
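
(For reference, the contrast behind the quoted "i.e. this:" above, as an
untested sketch on a hypothetical table: under autocommit each INSERT is
its own transaction and waits for its own commit, whereas an explicit
transaction commits once for the whole batch.)

-- one transaction per row:
INSERT INTO foo (f1) VALUES (1);
INSERT INTO foo (f1) VALUES (2);
-- ...

-- versus one transaction for the whole batch:
BEGIN;
INSERT INTO foo (f1) VALUES (1);
INSERT INTO foo (f1) VALUES (2);
-- ...
COMMIT;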



Re: [PERFORM] How long should it take to insert 200,000 records?

2007-02-06 Thread Merlin Moncure

On 2/6/07, Scott Marlowe [EMAIL PROTECTED] wrote:

On Tue, 2007-02-06 at 10:40, Merlin Moncure wrote:
 On 2/6/07, Scott Marlowe [EMAIL PROTECTED] wrote:
  On Mon, 2007-02-05 at 18:35, Karen Hill wrote:
   I have a pl/pgsql function that is inserting 200,000 records for
   testing purposes.  What is the expected time frame for this operation
   on a pc with 1/2 a gig of ram and a 7200 RPM disk?   The processor is
   a 2ghz cpu.  So far I've been sitting here for about 2 million ms
   waiting for it to complete, and I'm not sure how many inserts postgres
   is doing per second.
 
  That really depends.  Doing 200,000 inserts as individual transactions
  will be fairly slow.  Since PostgreSQL generally runs in autocommit
  mode, this means that if you didn't expressly begin a transaction, you
  are in fact inserting each row as a transaction.  i.e. this:

 I think OP is doing insertion inside a pl/pgsql loop...transaction is
 implied here.

Yeah, I noticed that about 10 seconds after hitting send... :)


actually, I get the stupid award also, because an RI check against an
unindexed column is not possible :)  (this haunts deletes, not inserts).

merlin



Re: [PERFORM] How long should it take to insert 200,000 records?

2007-02-06 Thread Mark Lewis
On Tue, 2007-02-06 at 12:01 -0500, Merlin Moncure wrote:
 On 2/6/07, Scott Marlowe [EMAIL PROTECTED] wrote:
  On Tue, 2007-02-06 at 10:40, Merlin Moncure wrote:
   On 2/6/07, Scott Marlowe [EMAIL PROTECTED] wrote:
On Mon, 2007-02-05 at 18:35, Karen Hill wrote:
 I have a pl/pgsql function that is inserting 200,000 records for
 testing purposes.  What is the expected time frame for this operation
 on a pc with 1/2 a gig of ram and a 7200 RPM disk?   The processor is
 a 2ghz cpu.  So far I've been sitting here for about 2 million ms
 waiting for it to complete, and I'm not sure how many inserts postgres
 is doing per second.
   
That really depends.  Doing 200,000 inserts as individual transactions
will be fairly slow.  Since PostgreSQL generally runs in autocommit
mode, this means that if you didn't expressly begin a transaction, you
are in fact inserting each row as a transaction.  i.e. this:
  
   I think OP is doing insertion inside a pl/pgsql loop...transaction is
   implied here.
 
  Yeah, I noticed that about 10 seconds after hitting send... :)
 
 actually, I get the stupid award also because RI check to unindexed
 column is not possible :)  (this haunts deletes, not inserts).

Sure it's possible:

CREATE TABLE parent (col1 int4);
-- insert many millions of rows into parent
CREATE TABLE child  (col1 int4 REFERENCES parent(col1));
-- insert many millions of rows into child, very very slowly.


- Mark Lewis





Re: [PERFORM] How long should it take to insert 200,000 records?

2007-02-06 Thread Bruno Wolff III
On Tue, Feb 06, 2007 at 10:31:26 -0800,
  Mark Lewis [EMAIL PROTECTED] wrote:
 
 Sure it's possible:
 
 CREATE TABLE parent (col1 int4);
 -- insert many millions of rows into parent
 CREATE TABLE child  (col1 int4 REFERENCES parent(col1));
 -- insert many millions of rows into child, very very slowly.

I don't think Postgres allows this. You don't have to have an index on the
child table, but you do on the parent table.
Quote from http://developer.postgresql.org/pgdocs/postgres/sql-createtable.html:
The referenced columns must be the columns of a unique or primary key
constraint in the referenced table.
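
For example, with a plain (non-unique) parent column the CREATE TABLE of
the child is rejected outright (hypothetical names; error text roughly
from memory):

CREATE TABLE parent (col1 int4);
CREATE TABLE child  (col1 int4 REFERENCES parent(col1));
-- ERROR:  there is no unique constraint matching given keys for
--         referenced table "parent"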



Re: [PERFORM] How long should it take to insert 200,000 records?

2007-02-06 Thread Merlin Moncure

On 2/6/07, Mark Lewis [EMAIL PROTECTED] wrote:

 actually, I get the stupid award also because RI check to unindexed
 column is not possible :)  (this haunts deletes, not inserts).

Sure it's possible:

CREATE TABLE parent (col1 int4);
-- insert many millions of rows into parent
CREATE TABLE child  (col1 int4 REFERENCES parent(col1));
-- insert many millions of rows into child, very very slowly.


the database will not allow you to create an RI link out unless the
parent table has a primary key/unique constraint, which the database
backs with an index... and you can't even trick it afterwards by
dropping the constraint.

it's the other direction, when you cascade forward, where you can have
a problem.  this is most common with a delete, but it can also happen on
an update of a table's primary key with child tables referencing it.
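
A sketch of that direction (hypothetical tables): the foreign key column
on the child side has no index, so every delete on the parent has to scan
the child table for the RI check.

CREATE TABLE parent (col1 int4 PRIMARY KEY);
CREATE TABLE child  (col1 int4 REFERENCES parent(col1));
-- load many millions of rows into both ...
DELETE FROM parent WHERE col1 = 42;            -- slow: child.col1 has no index
CREATE INDEX child_col1_idx ON child (col1);   -- turns the RI check into an index scan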

merlin



Re: [PERFORM] How long should it take to insert 200,000 records?

2007-02-06 Thread Mark Lewis
On Tue, 2007-02-06 at 14:06 -0500, Merlin Moncure wrote:
 On 2/6/07, Mark Lewis [EMAIL PROTECTED] wrote:
   actually, I get the stupid award also because RI check to unindexed
   column is not possible :)  (this haunts deletes, not inserts).
 
  Sure it's possible:
 
  CREATE TABLE parent (col1 int4);
  -- insert many millions of rows into parent
  CREATE TABLE child  (col1 int4 REFERENCES parent(col1));
  -- insert many millions of rows into child, very very slowly.
 
 the database will not allow you to create a RI link out unless the
 parent table has a primary key/unique constraint, which the database
 backs with an indexand you can't even trick it afterwards by
 dropping the constraint.
 
 it's the other direction, when you cascade forwards when you can have
 a problem.  this is most common with a delete, but can also happen on
 an update of a table's primary key with child tables referencing it.
 

Hmmm, should check my SQL before hitting send I guess.  Well, at least
you no longer have to wear the stupid award, Merlin :)

-- Mark Lewis



Re: [PERFORM] index scan through a subquery

2007-02-06 Thread Bill Howe
Tom Lane wrote:
 I need the lovely index scan, but my table is hidden behind a view, and
 all I get is the ugly sequential scan.  Any ideas on how to convince the
 optimizer to unfold the subquery properly?
 
 You should provide some context in this sort of gripe, like which PG
 version you're using.  But I'm going to guess that it's 8.2.x, because
 8.1.x gets it right :-(.  Try the attached.

Good guess; I was indeed talking about the current release rather than
the previous release.

Also, apologies for the tone of my post: I was attempting to be jovial,
but in retrospect, I see how it reads as a gripe,  which I guess
evoked your frowny-face emoticon.

Thanks for the quick response, elegant fix, and ongoing excellent work!

Cheers,
Bill

Index: planagg.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/optimizer/plan/planagg.c,v
retrieving revision 1.25
diff -c -r1.25 planagg.c
*** planagg.c	9 Jan 2007 02:14:13 -0000	1.25
--- planagg.c	6 Feb 2007 06:30:23 -0000
***************
*** 70,75 ****
--- 70,76 ----
  optimize_minmax_aggregates(PlannerInfo *root, List *tlist, Path *best_path)
  {
  	Query	   *parse = root->parse;
+ 	FromExpr   *jtnode;
  	RangeTblRef *rtr;
  	RangeTblEntry *rte;
  	RelOptInfo *rel;
***************
*** 102,115 ****
  	 * We also restrict the query to reference exactly one table, since join
  	 * conditions can't be handled reasonably.  (We could perhaps handle a
  	 * query containing cartesian-product joins, but it hardly seems worth the
! 	 * trouble.)
  	 */
! 	Assert(parse->jointree != NULL && IsA(parse->jointree, FromExpr));
! 	if (list_length(parse->jointree->fromlist) != 1)
! 		return NULL;
! 	rtr = (RangeTblRef *) linitial(parse->jointree->fromlist);
! 	if (!IsA(rtr, RangeTblRef))
  		return NULL;
  	rte = rt_fetch(rtr->rtindex, parse->rtable);
  	if (rte->rtekind != RTE_RELATION || rte->inh)
  		return NULL;
--- 103,121 ----
  	 * We also restrict the query to reference exactly one table, since join
  	 * conditions can't be handled reasonably.  (We could perhaps handle a
  	 * query containing cartesian-product joins, but it hardly seems worth the
! 	 * trouble.)  However, the single real table could be buried in several
! 	 * levels of FromExpr.
  	 */
! 	jtnode = parse->jointree;
! 	while (IsA(jtnode, FromExpr))
! 	{
! 		if (list_length(jtnode->fromlist) != 1)
! 			return NULL;
! 		jtnode = linitial(jtnode->fromlist);
! 	}
! 	if (!IsA(jtnode, RangeTblRef))
  		return NULL;
+ 	rtr = (RangeTblRef *) jtnode;
  	rte = rt_fetch(rtr->rtindex, parse->rtable);
  	if (rte->rtekind != RTE_RELATION || rte->inh)
  		return NULL;



Re: [PERFORM] explain analyze output for review (was: optimizing a geo_distance()...)

2007-02-06 Thread Hiltibidal, Robert
Aqua Data Studio has a graphical explain built into it. It supports most
RDBMSs, including Postgres. It's what I use to performance-tune DB2.
http://www.aquafold.com/

Index ANDing would suit you here.

You have 3 tables with 3 possible indexes, and it sounds like the query
is doing table scans where it needs to use indexes.

If your version of Postgres does not support index ANDing, another way
around this is to create a view and then index the view (if indexing
views is possible in Postgres).

Another possible solution is inserting your data into a single table and
then indexing that table. The initial cost is time-consuming; however, if
you use triggers on your parent tables to automatically insert data into
the new table, it becomes almost hands-free.
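
Something along these lines, as a very rough sketch (the flat table and
its columns are hypothetical; pets, shelters, and earth_distance are the
tables from earlier in the thread):

CREATE TABLE pet_search (
    pet_id       int,
    zipcode      varchar,
    earth_coords cube
);

CREATE OR REPLACE FUNCTION pet_search_sync() RETURNS trigger AS $$
BEGIN
    -- copy the new pet plus its shelter's coordinates into the flat table
    INSERT INTO pet_search (pet_id, zipcode, earth_coords)
    SELECT NEW.pet_id, s.postal_code_for_joining, e.earth_coords
    FROM shelters s
    JOIN earth_distance e ON e.zipcode = s.postal_code_for_joining
    WHERE s.shelter_id = NEW.shelter_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER pet_search_insert
    AFTER INSERT ON pets
    FOR EACH ROW EXECUTE PROCEDURE pet_search_sync();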




-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Mark
Stosberg
Sent: Tuesday, February 06, 2007 8:40 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] explain analyze output for review (was:
optimizing a geo_distance()...)

[Mark's message and explain analyze output snipped -- quoted in full above]



Re: [PERFORM] explain analyze output for review (was: optimizing a geo_distance()...)

2007-02-06 Thread Hiltibidal, Robert
What is the size of that index?

Have you considered breaking the index into components, i.e. more than one
index on the table?

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Adam Rich
Sent: Tuesday, February 06, 2007 8:54 AM
To: 'Mark Stosberg'; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] explain analyze output for review (was:
optimizing a geo_distance()...)


[Adam's reply and Mark's original message snipped -- both quoted in full above]



Re: [PERFORM] How long should it take to insert 200,000 records?

2007-02-06 Thread Hiltibidal, Robert
What is your row size?

Have you checked to see what your current inserts per second are?
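
One rough way to check the insert rate (a sketch; the table name is
hypothetical, and it assumes row-level stats are enabled) is to sample the
statistics collector twice and divide by the elapsed time:

SELECT n_tup_ins FROM pg_stat_user_tables WHERE relname = 'foo1';
-- wait 60 seconds, run it again, and divide the difference by 60 to get
-- inserts/second for that table (requires stats_row_level = on).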



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Scott
Marlowe
Sent: Tuesday, February 06, 2007 10:56 AM
To: Merlin Moncure
Cc: Karen Hill; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] How long should it take to insert 200,000
records?

[Scott Marlowe's message snipped -- quoted in full above]









Re: [PERFORM] How long should it take to insert 200,000 records?

2007-02-06 Thread Karen Hill
On Feb 5, 9:33 pm, [EMAIL PROTECTED] (Tom Lane) wrote:
 Karen Hill [EMAIL PROTECTED] writes:
  I have a pl/pgsql function that is inserting 200,000 records for
  testing purposes.  What is the expected time frame for this operation
  on a pc with 1/2 a gig of ram and a 7200 RPM disk?

 I think you have omitted a bunch of relevant facts.  Bare INSERT is
 reasonably quick:

 regression=# create table foo (f1 int);
 CREATE TABLE
 regression=# \timing
 Timing is on.
 regression=# insert into foo select x from generate_series(1,200000) x;
 INSERT 0 200000
 Time: 5158.564 ms
 regression=#

 (this on a not-very-fast machine) but if you weigh it down with a ton
 of index updates, foreign key checks, etc, it could get slow ...
 also you haven't mentioned what else that plpgsql function is doing.


The postgres version is 8.2.1 on Windows.   The pl/pgsql function is
inserting to an updatable view (basically two tables).

CREATE TABLE foo1
(


) ;

CREATE TABLE foo2
(

);

CREATE VIEW viewfoo AS
(

);
CREATE RULE ruleFoo AS ON INSERT TO viewfoo DO INSTEAD
(

);

CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$
BEGIN
FOR i in 1..200000 LOOP
INSERT INTO viewfoo (x) VALUES (x);
END LOOP;
END;
$$ LANGUAGE plpgsql;





Re: [PERFORM] How long should it take to insert 200,000 records?

2007-02-06 Thread Mark Kirkwood

Karen Hill wrote:



The postgres version is 8.2.1 on Windows.   The pl/pgsql function is
inserting to an updatable view (basically two tables).

CREATE TABLE foo1
(


) ;

CREATE TABLE foo2
(

);

CREATE VIEW viewfoo AS
(

);
CREATE RULE ruleFoo AS ON INSERT TO viewfoo DO INSTEAD
(

);

CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$
BEGIN
FOR i in 1..200000 LOOP
INSERT INTO viewfoo (x) VALUES (x);
END LOOP;
END;
$$ LANGUAGE plpgsql;




Sorry - but we probably need *still* more detail! - the definition of
viewfoo is likely to be critical. For instance, a simplified variant of
your setup does 200000 inserts in 5s on my PIII Tualatin machine:


CREATE TABLE foo1 (x INTEGER);

CREATE VIEW viewfoo AS SELECT * FROM foo1;

CREATE RULE ruleFoo AS ON INSERT TO viewfoo DO INSTEAD
(
 INSERT INTO foo1 VALUES (new.x);
);

CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$
BEGIN
FOR i in 1..200000 LOOP
INSERT INTO viewfoo (x) VALUES (i);
END LOOP;
END;
$$ LANGUAGE plpgsql;


postgres=# \timing
postgres=# SELECT functionFoo() ;
 functionfoo
-------------

(1 row)

Time: 4659.477 ms

postgres=# SELECT count(*) FROM viewfoo;
 count
--------
 200000
(1 row)

Cheers

Mark



Re: [PERFORM] How long should it take to insert 200,000 records?

2007-02-06 Thread Tom Lane
Karen Hill [EMAIL PROTECTED] writes:
 On Feb 5, 9:33 pm, [EMAIL PROTECTED] (Tom Lane) wrote:
 I think you have omitted a bunch of relevant facts.

 The postgres version is 8.2.1 on Windows.   The pl/pgsql function is
 inserting to an updatable view (basically two tables).
 [ sketch of schema ]

I think the problem is probably buried in the parts you left out.  Can
you show us the full schemas for those tables, as well as the rule
definition?  The plpgsql function itself can certainly go a lot faster
than what you indicated.  On my slowest active machine:

regression=# create table viewfoo(x int);
CREATE TABLE
regression=# CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$
BEGIN
FOR i in 1..200000 LOOP
INSERT INTO viewfoo (x) VALUES (i);
END LOOP;
END;
$$ LANGUAGE plpgsql;
CREATE FUNCTION
regression=# \timing
Timing is on.
regression=# select functionFoo();
 functionfoo 
-------------
 
(1 row)

Time: 16939.667 ms
regression=# 

regards, tom lane
