[PERFORM] An unwanted seqscan

2007-02-14 Thread Brian Herlihy
Hi,

I am having trouble understanding why a seqscan is chosen for this query.

In practice the seqscan is very expensive, whereas the nested loop is usually 
quite fast, even with several hundred rows returned from meta_keywords_url.

The server is running version 8.1.3, and both tables were analyzed recently.  
meta_keywords contains around 25% dead rows, meta_keywords_url contains no dead 
rows.

I have included the query written both as a subquery and as a join.

Thanks for any assistance!
Brian



live=> explain select * from meta_keywords where url_id in (select url_id from meta_keywords_url where host = 'postgresql.org');
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Hash IN Join  (cost=1755.79..545380.52 rows=9442 width=29)
   Hash Cond: ("outer".url_id = "inner".url_id)
   ->  Seq Scan on meta_keywords  (cost=0.00..507976.54 rows=7110754 width=29)
   ->  Hash  (cost=1754.35..1754.35 rows=576 width=4)
         ->  Bitmap Heap Scan on meta_keywords_url  (cost=11.02..1754.35 rows=576 width=4)
               Recheck Cond: ((host)::text = 'postgresql.org'::text)
               ->  Bitmap Index Scan on meta_keywords_url_host_path  (cost=0.00..11.02 rows=576 width=0)
                     Index Cond: ((host)::text = 'postgresql.org'::text)
(8 rows)

live=> set enable_seqscan=off;
SET
live=> explain select * from meta_keywords where url_id in (select url_id from meta_keywords_url where host = 'postgresql.org');
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Nested Loop  (cost=1755.79..3161748.83 rows=9442 width=29)
   ->  HashAggregate  (cost=1755.79..1761.55 rows=576 width=4)
         ->  Bitmap Heap Scan on meta_keywords_url  (cost=11.02..1754.35 rows=576 width=4)
               Recheck Cond: ((host)::text = 'postgresql.org'::text)
               ->  Bitmap Index Scan on meta_keywords_url_host_path  (cost=0.00..11.02 rows=576 width=0)
                     Index Cond: ((host)::text = 'postgresql.org'::text)
   ->  Index Scan using meta_keywords_url_id on meta_keywords  (cost=0.00..5453.28 rows=2625 width=29)
         Index Cond: (meta_keywords.url_id = "outer".url_id)
(8 rows)

live=> explain select * from meta_keywords join meta_keywords_url using (url_id) where host = 'postgresql.org';
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Hash Join  (cost=1758.52..543685.43 rows=9297 width=107)
   Hash Cond: ("outer".url_id = "inner".url_id)
   ->  Seq Scan on meta_keywords  (cost=0.00..506859.29 rows=6994929 width=28)
   ->  Hash  (cost=1757.08..1757.08 rows=577 width=83)
         ->  Bitmap Heap Scan on meta_keywords_url  (cost=11.02..1757.08 rows=577 width=83)
               Recheck Cond: ((host)::text = 'postgresql.org'::text)
               ->  Bitmap Index Scan on meta_keywords_url_host_path  (cost=0.00..11.02 rows=577 width=0)
                     Index Cond: ((host)::text = 'postgresql.org'::text)
(8 rows)

live=> set enable_seqscan=off;
SET
live=> explain select * from meta_keywords join meta_keywords_url using (url_id) where host = 'postgresql.org';
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..3348211.21 rows=9297 width=107)
   ->  Index Scan using meta_keywords_url_host_path on meta_keywords_url  (cost=0.00..2230.24 rows=577 width=83)
         Index Cond: ((host)::text = 'postgresql.org'::text)
   ->  Index Scan using meta_keywords_url_id on meta_keywords  (cost=0.00..5765.81 rows=2649 width=28)
         Index Cond: (meta_keywords.url_id = "outer".url_id)
(5 rows)







Re: [PERFORM] An unwanted seqscan

2007-02-14 Thread Tom Lane
Brian Herlihy [EMAIL PROTECTED] writes:
 I am having trouble understanding why a seqscan is chosen for this query.

As far as anyone can see from this output, the planner's decisions are
correct: it prefers the plans with the smaller estimated cost.  If you
want us to take an interest, provide some more context --- EXPLAIN
ANALYZE output for starters.
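
For example, the requested output can be gathered by re-running the query from
the original post under EXPLAIN ANALYZE (a minimal sketch; timings and row
counts will of course be specific to the installation):

explain analyze
select * from meta_keywords
 where url_id in (select url_id from meta_keywords_url
                  where host = 'postgresql.org');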

regards, tom lane



Re: [PERFORM] quad or dual core Intel CPUs

2007-02-14 Thread Claus Guttesen

I am about to pull the trigger on a new machine after analyzing some
discussions I posted here last year.  I've been trying to spec out a reliable
and powerful enough machine where I won't have to replace it for some time.
Currently I've been using a dual Xeon 3.06ghz with 4GB of ram and utilizing a
RAID 1+0 configuration over a total of 6 SCSI disks aside from the OS partition.
We have about 10GB of data and will probably scale at about 1GB per month. We
currently average about 200 queries per second and the 15 minute load average
is about .30.  I am running FreeBSD 6.1.

At the end of last year, I specced out a new machine to replace this box. At
that time, the quad core 2.66ghz were not available from my vendor and I was
not planning to go that route. Now that they are available, I am considering
the option. The main question here is whether FreeBSD 6.X and PostgreSQL 8.1
would be able to take advantage of the quad core and perform better than the
3.0Ghz dual core.  The reason I ask is due to some conflicting benchmarking
results I see posted on the spec.org website.

Here is the full specification of the new box I hope to build and run FreeBSD
6.X and PostgreSQL on:

- SuperMicro Dual Xeon X7DBE+ motherboard
  + 2 x Quad Core X5355 2.66Ghz
OR
  + 2 x Dual Core 5160 3.0Ghz

- 8 x 1GB PC2-4200 fully buffered DIMM

- LSI MegaRAID SAS 8408E w/BBU 256MB

- 16 x 73GB SAS disk

So, question #1, to go dual core or quad core?  Quad core in theory seems to
scale the machine's processing potential by almost a factor of two.


We recently migrated from a four way opteron @ 2 GHz with 8 GB to a
four way woodcrest @ 3 GHz (HP DL380 G5) with 16 GB ram. I also
upgraded FreeBSD from 6.0 to 6.2 and did a minor upgrade of postgresql
from 7.4.9 to 7.4.12. The change was tremendous; the first few hours
after it went into production I had to double-check that our website
worked, since the load was way below 1 whereas the load had been
almost 100 during peak.

I don't have any financial ties to HP, but building a server from
scratch may not be worth it; rather than spending time assembling all
the different parts yourself, I would suggest you get a server from one
vendor who builds it according to your specs.

The DL380 (also) has a 256 MB bbc controller, the NIC works flawlessly
with FreeBSD 6.2, all parts are well integrated, and the front bay can
accommodate 8 x 146 GB SAS drives. This server is well suited as a
postgresql-server.

Approx. 200 requests a sec. should be a problem unless the queries are heavy.

regards
Claus



[PERFORM] Benchmarking PGSQL?

2007-02-14 Thread Krishna Kumar

Hello All,
I'm a performance engineer, quite interested in getting deep into the PGSQL
performance enhancement effort. In that regard, I have the following
questions:
1. Is there a benchmarking setup that I can access online?
2. What benchmarks are we running for performance numbers?
3. What are the current issues related to performance?
4. Where can I start with the PGSQL performance effort?

Thanks a lot,
Krishna


Re: [PERFORM] Benchmarking PGSQL?

2007-02-14 Thread Shoaib Mir

Have you tried pgbench yet?

--
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)

On 2/14/07, Krishna Kumar [EMAIL PROTECTED] wrote:


Hello All,
I'm a performance engineer, quite interested in getting deep into the
PGSQL performance enhancement effort. In that regard, I have the following
questions :
1. Is there a benchmarking setup, that I can access online?
2. What benchmarks are we running , for performance numbers?
3. What are the current issues, related to performance?
4. Where can I start, with the PGSQL performance effort?

Thanks a lot,
Krishna



Re: [PERFORM] cube operations slower than geo_distance() on production server

2007-02-14 Thread Merlin Moncure

On 2/14/07, Tom Lane [EMAIL PROTECTED] wrote:

There are two things wrong here: first, that the estimated row count is
only 20% of actual; it should certainly not be that far off for such a
simple condition.  I wonder if your vacuum/analyze procedures are
actually working.  Second, you mentioned somewhere along the line that
'available' pets are about 10% of all the entries, which means that this
indexscan is more than likely entirely counterproductive: it would be
cheaper to ignore this index altogether.


I think switching the index on pet_state to a composite on (pet_state,
species_id) might help too.
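
For instance (a sketch of that composite index; the index name is made up):

create index pets_pet_state_species_id_idx on pets (pet_state, species_id);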

or even better:

create function is_pet_available(text) returns bool as
$$
 select $1='available';
$$ language sql immutable;

create index pets_available_species_idx on
pets(is_pet_available(pet_state), species_id);

refactor your query to something similar to:

SELECT * FROM
(
SELECT
cube_distance(q.earth_coords, s.earth_coords)/1609.344 as radius
FROM pets
JOIN shelters_active as shelters USING (shelter_id)
JOIN zipcodes s ON shelters.postal_code_for_joining = s.zipcode
JOIN zipcodes q ON q.zipcode = '90210'
WHERE
  is_pet_available(pet_state)
  AND species_id = 1
  AND earth_box(q.earth_coords, 10*1609.344) @ s.earth_coords
) p order by radius

merlin



Re: [PERFORM] Benchmarking PGSQL?

2007-02-14 Thread Luke Lonergan
Here's one:

Insert performance is limited to about 10-12 MB/s no matter how fast the
underlying I/O hardware.  Bypassing the WAL (write ahead log) only boosts
this to perhaps 20 MB/s.  We've found that the biggest time consumer in the
profile is the collection of routines that "convert to datum".

You can perform the test using any dataset; you might consider using the
TPC-H benchmark kit with a data generator available at www.tpc.org.  Just
generate some data, load the schema, then perform some COPY statements,
INSERT INTO SELECT FROM and CREATE TABLE AS SELECT.
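
For instance, the three load paths mentioned above could be exercised along
these lines (table names are only illustrative; lineitem_copy is assumed to
already exist with the same schema):

-- bulk load from a generated flat file
COPY lineitem FROM '/tmp/lineitem.tbl' WITH DELIMITER '|';
-- INSERT ... SELECT into a pre-created table
INSERT INTO lineitem_copy SELECT * FROM lineitem;
-- CREATE TABLE AS SELECT
CREATE TABLE lineitem_ctas AS SELECT * FROM lineitem;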

- Luke


On 2/14/07 2:00 AM, Krishna Kumar [EMAIL PROTECTED] wrote:

 Hello All, 
 I'm a performance engineer, quite interested in getting deep into the PGSQL
 performance enhancement effort. In that regard, I have the following questions
 : 
 1. Is there a benchmarking setup, that I can access online?
 2. What benchmarks are we running , for performance numbers?
 3. What are the current issues, related to performance?
 4. Where can I start, with the PGSQL performance effort?
 
 Thanks a lot, 
 Krishna 
 




Re: [PERFORM] Benchmarking PGSQL?

2007-02-14 Thread Merlin Moncure

On 2/14/07, Luke Lonergan [EMAIL PROTECTED] wrote:


 Here's one:

 Insert performance is limited to about 10-12 MB/s no matter how fast the
underlying I/O hardware.  Bypassing the WAL (write ahead log) only boosts
this to perhaps 20 MB/s.  We've found that the biggest time consumer in the
profile is the collection of routines that convert to datum.

 You can perform the test using any dataset, you might consider using the
TPC-H benchmark kit with a data generator available at www.tpc.org.  Just
generate some data, load the schema, then perform some COPY statements,
INSERT INTO SELECT FROM and CREATE TABLE AS SELECT.


I am curious what your take is on the maximum insert performance, in
MB/sec, of large bytea columns (toasted), and how much, if any, Greenplum
was able to advance this over the baseline.  I am asking on behalf of
another interested party.  Interested in numbers broken down per core
on an 8-core quad system and also in aggregate.

merlin



[PERFORM] reindex vs 'analyze' (was: Re: cube operations slower than geo_distance() on production server)

2007-02-14 Thread Mark Stosberg

Merlin Moncure wrote:

On 2/14/07, Tom Lane [EMAIL PROTECTED] wrote:

There are two things wrong here: first, that the estimated row
count is only 20% of actual; it should certainly not be that far
off for such a simple condition. I wonder if your vacuum/analyze
procedures are actually working. Second, you mentioned somewhere
along the line that 'available' pets are about 10% of all the
entries, which means that this indexscan is more than likely
entirely counterproductive: it would be cheaper to ignore this
index altogether.


Tom,

Thanks for the generosity of your time. We are using  8.1.3 currently. I 
have read there are some performance improvements in 8.2, but we have 
not started evaluating that yet.


Your suggestion about the pet_state index was right on. I tried 
ANALYZE on it, but still got the same bad estimate. However, I then 
used REINDEX on that index, and that fixed the estimate accuracy, 
which made the query run faster! The cube search now benchmarks faster 
than the old search in production, taking about two-thirds of the time of the 
old one.


Any ideas why the manual REINDEX did something that analyze didn't? It 
makes me wonder if there is other tuning like this to do.


Attached is the EXPLAIN ANALYZE output from the most recent run, after the REINDEX.


I think switching the index on pet_state to a composite on (pet_state,
species_id) might help too.

or even better:

create function is_pet_available(text) returns bool as
$$
 select $1='available';
$$ language sql immutable;

create index pets_available_species_idx on
pets(is_pet_available(pet_state), species_id);


Merlin,

Thanks for this suggestion. It is not an approach I had used before, and 
I was interested to try it. However, the new index didn't get chosen. 
(Perhaps I would need to drop the old one?) Tom's suggestions did help, 
though. I'll follow up on that in just a moment.




refactor your query to something similar to:

SELECT * FROM
(
SELECT
cube_distance(q.earth_coords, s.earth_coords)/1609.344 as radius
FROM pets
JOIN shelters_active as shelters USING (shelter_id)
JOIN zipcodes s ON shelters.postal_code_for_joining = s.zipcode
JOIN zipcodes q ON q.zipcode = '90210'
WHERE
  is_pet_available(pet_state)
  AND species_id = 1
  AND earth_box(q.earth_coords, 10*1609.344) @ s.earth_coords
) p order by radius

merlin



Sort  (cost=5276.93..5277.00 rows=28 width=64) (actual time=1981.830..1984.415 rows=1344 loops=1)
  Sort Key: (cube_distance(public.zipcodes.earth_coords, public.zipcodes.earth_coords) / 1609.344::double precision)
  ->  Nested Loop  (cost=291.32..5276.26 rows=28 width=64) (actual time=24.080..1976.479 rows=1344 loops=1)
        ->  Nested Loop  (cost=2.15..575.79 rows=11 width=68) (actual time=2.637..34.067 rows=131 loops=1)
              ->  Nested Loop  (cost=2.15..153.48 rows=42 width=73) (actual time=1.939..3.972 rows=240 loops=1)
                    ->  Index Scan using zipcodes_pkey on zipcodes  (cost=0.00..3.01 rows=1 width=32) (actual time=0.283..0.287 rows=1 loops=1)
                          Index Cond: ((zipcode)::text = '90210'::text)
                    ->  Bitmap Heap Scan on zipcodes  (cost=2.15..149.84 rows=42 width=41) (actual time=1.403..2.323 rows=240 loops=1)
                          Recheck Cond: (cube_enlarge(("outer".earth_coords)::cube, 16093.4357308298::double precision, 3) @ zipcodes.earth_coords)
                          ->  Bitmap Index Scan on zip_earth_coords_idx  (cost=0.00..2.15 rows=42 width=0) (actual time=1.377..1.377 rows=240 loops=1)
                                Index Cond: (cube_enlarge(("outer".earth_coords)::cube, 16093.4357308298::double precision, 3) @ zipcodes.earth_coords)
              ->  Index Scan using shelters_postal_code_for_joining_idx on shelters  (cost=0.00..10.03 rows=2 width=12) (actual time=0.064..0.118 rows=1 loops=240)
                    Index Cond: ((shelters.postal_code_for_joining)::text = ("outer".zipcode)::text)
                    Filter: ((shelter_state)::text = 'active'::text)
        ->  Bitmap Heap Scan on pets  (cost=289.17..426.86 rows=35 width=4) (actual time=14.362..14.746 rows=10 loops=131)
              Recheck Cond: ((pets.shelter_id = "outer".shelter_id) AND ((pets.pet_state)::text = 'available'::text))
              Filter: (species_id = 1)
              ->  BitmapAnd  (cost=289.17..289.17 rows=35 width=0) (actual time=14.219..14.219 rows=0 loops=131)
                    ->  Bitmap Index Scan on pets_shelter_id_idx  (cost=0.00..3.89 rows=256 width=0) (actual time=0.188..0.188 rows=168 loops=131)
                          Index Cond: (pets.shelter_id = "outer".shelter_id)
                    ->  Bitmap Index Scan on pets_pet_state_idx  (cost=0.00..285.02 rows=41149 width=0) (actual time=22.043..22.043 rows=40397 loops=82)
                          Index Cond: ((pet_state)::text = 'available'::text)
 

Re: [PERFORM] quad or dual core Intel CPUs

2007-02-14 Thread Kenji Morishige
Thanks Claus, that's good news!
I'm having a reputable vendor build the box and test it for me before
delivering.  The bottom line of your message, did you mean 'should not be a
problem'?  I wonder what the main reason for your improvement was; your RAM was
increased by a factor of 2, but 4-way Opteron vs 4-way Woodcrest performance
must not be that significant.

-Kenji


 We recently migrated from a four way opteron @ 2 GHz with 8 GB to a
 four way woodcrest @ 3 GHz (HP DL380 G5) with 16 GB ram. I also
 upgraded FreeBSD from 6.0 to 6.2 and did a minor upgrade of postgresql
 from 7.4.9 to 7.4.12. The change was tremendous, the first few hours
 of after it went into production I had to doublecheck that our website
 worked, since the load was way below 1 whereas the load had been
 almost 100 during peak.
 
 I don't have any financial ties to HP but building a server from
 scratch may not be worth it, rather than spending time assemling all
 the different parts yourself I would suggest you get a server from one
 vendor who build a server according to your specs.
 
 The DL380 (also) has a 256 MB bbc controller, the nic works flawlessly
 with FreeBSD 6.2, all parts are well integrated, the frontbay can
 accomodate 8 146 GB SAS drives. This server is wellsuited as a
 postgresql-server.
 
 Approx. 200 reqest a sec. should be a problem unless the queries are heavy.
 
 regards
 Claus



Re: [PERFORM] reindex vs 'analyze' (was: Re: cube operations slower than geo_distance() on production server)

2007-02-14 Thread Tom Lane
Mark Stosberg [EMAIL PROTECTED] writes:
 Your suggestion about the pet_state index was right on. I tried 
 Analyze on it, but still got the same bad estimate. However, I then 
 used reindex on that index, and that fixed the estimate accuracy, 
 which made the query run faster!

No, the estimate is about the same, and so is the plan.  The data seems
to have changed though --- on Monday you had

->  Bitmap Index Scan on pets_pet_state_idx  (cost=0.00..562.50 rows=39571 width=0) (actual time=213.620..213.620 rows=195599 loops=82)
      Index Cond: ((pet_state)::text = 'available'::text)
 
and now it's

 ->  Bitmap Index Scan on pets_pet_state_idx  (cost=0.00..285.02 rows=41149 width=0) (actual time=22.043..22.043 rows=40397 loops=82)
       Index Cond: ((pet_state)::text = 'available'::text)

Don't tell me you got 155000 pets adopted out yesterday ... what
happened here?

[ thinks... ] One possibility is that those were dead but
not-yet-vacuumed rows.  What's your vacuuming policy on this table?
(A bitmap-index-scan plan node will count dead rows as returned,
unlike all other plan node types, since we haven't actually visited
the heap yet...)

regards, tom lane



Re: [PERFORM] Proximity query with GIST and row estimation

2007-02-14 Thread Guillaume Smet

Paul,

On 2/14/07, Paul Ramsey [EMAIL PROTECTED] wrote:

You'll find that PostGIS does a pretty good job of selectivity
estimation.


PostGIS is probably what I'm going to experiment with in the future. The
only problem is that it's really big for a very basic need.
With my current method, I don't even have to create a new column: I
directly create a functional index, so it's really easy to use.
Using PostGIS requires creating a new column and triggers to maintain
it, and installing PostGIS of course. That's why it was not my first
choice.
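
For reference, the functional-index approach described above can look roughly
like this (table and column names are hypothetical; it assumes the contrib
cube/earthdistance modules):

create index places_ll_idx on places using gist (ll_to_earth(lat, lon));

select * from places
 where earth_box(ll_to_earth(48.86, 2.35), 5000) @ ll_to_earth(lat, lon);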

Thanks for your answer.

--
Guillaume



Re: [PERFORM] Benchmarking PGSQL?

2007-02-14 Thread Luke Lonergan
Hi Merlin,

On 2/14/07 8:20 AM, Merlin Moncure [EMAIL PROTECTED] wrote:

 I am curious what is your take on the maximum insert performance, in
 mb/sec of large bytea columns (toasted), and how much if any greenplum
 was able to advance this over the baseline.  I am asking on behalf of
 another interested party.  Interested in numbers broken down per core
 on 8 core quad system and also aggreate.

Our approach is to attach a segment to each core, so we scale INSERT
linearly with the number of cores.  So the per-core limit we live with is the
10-20MB/s observed here.  We'd like to improve that so that we get better
performance with smaller machines.

We have demonstrated insert performance of 670 MB/s, 2.4TB/hour for
non-toasted columns using 3 load machines against 120 cores.  This rate was
load machine limited.

WRT toasted bytea columns we haven't done any real benchmarking of those.
Do you have a canned benchmark we can run?

- Luke 





Re: [PERFORM] quad or dual core Intel CPUs

2007-02-14 Thread Claus Guttesen

Approx. 200 reqest a sec. should be a problem unless the queries are heavy.


Thanks Claus thats good news!
I'm having a reputable vendor build the box and test it for me before
delivering.  The bottom line of your message, did you mean 'should be not a
problem'?  I wonder what the main reason for your improvement, your ram was
increased by a factor of 2, but 4 way opteron vs 4 way woodcrest performance
must not be that significant.


Sorry, the line should read 'should *not* be a problem', pardon for
the confusion. So 200 queries/s should be fine, probably won't make
the server sweat.

I'm not sure what contributed most to the decrease when the load went
from approx. 100 during peak to less than 1! Since the db-server is
such a vital part of our infrastructure, I was reluctant to upgrade
it while the load was below 10. But in November and December - when we
have our busiest time - our website slowed to a crawl, so phasing
in a new server was an easy decision.

The woodcrest is a better performer compared to the current opteron,
the ciss-disk-controller also has 256 MB cache compared to the 64 MB
LSI-logic controller in the former db-server, and FreeBSD 6.2 is also a
better performer than 6.0, but I haven't done any benchmarking on the
same hardware.

regards
Claus



Re: [PERFORM] reindex vs 'analyze'

2007-02-14 Thread Mark Stosberg
On Wed, Feb 14, 2007 at 01:07:23PM -0500, Tom Lane wrote:
 Mark Stosberg [EMAIL PROTECTED] writes:
  Your suggestion about the pet_state index was right on. I tried 
  Analyze on it, but still got the same bad estimate. However, I then 
  used reindex on that index, and that fixed the estimate accuracy, 
  which made the query run faster!
 
 No, the estimate is about the same, and so is the plan.  The data seems
 to have changed though --- on Monday you had
 
 ->  Bitmap Index Scan on pets_pet_state_idx  (cost=0.00..562.50 rows=39571 width=0) (actual time=213.620..213.620 rows=195599 loops=82)
       Index Cond: ((pet_state)::text = 'available'::text)
  
 and now it's
 
  ->  Bitmap Index Scan on pets_pet_state_idx  (cost=0.00..285.02 rows=41149 width=0) (actual time=22.043..22.043 rows=40397 loops=82)
        Index Cond: ((pet_state)::text = 'available'::text)
 
 Don't tell me you got 155000 pets adopted out yesterday ... what
 happened here?

That seemed to be the difference that the reindex made. The number of
rows in the table and the number marked available are roughly
unchanged.

select count(*) from pets;
 count
--------
 304951
(1 row)

select count(*) from pets where pet_state = 'available';
 count
-------
 39857
(1 row)

It appears just about 400 were marked as adopted yesterday. 

 [ thinks... ] One possibility is that those were dead but
 not-yet-vacuumed rows.  What's your vacuuming policy on this table?

It gets vacuum analyzed every two hours throughout most of the day. Once
nightly we vacuum analyze everything, but most of the time we just do
this table.
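
For reference, that routine amounts to roughly the following two commands:

VACUUM ANALYZE pets;   -- every two hours, just this table
VACUUM ANALYZE;        -- nightly, the whole database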

 (A bitmap-index-scan plan node will count dead rows as returned,
 unlike all other plan node types, since we haven't actually visited
 the heap yet...)

Thanks again for your help, Tom.

Mark

--
 . . . . . . . . . . . . . . . . . . . . . . . . . . . 
   Mark StosbergPrincipal Developer  
   [EMAIL PROTECTED] Summersault, LLC 
   765-939-9301 ext 202 database driven websites
 . . . . . http://www.summersault.com/ . . . . . . . .



Re: [PERFORM] JOIN to a VIEW makes a real slow query

2007-02-14 Thread Chuck D.

On Tuesday 13 February 2007 14:51, Tom Lane wrote:
 Chuck D. [EMAIL PROTECTED] writes:
  It is still using that sequence scan on the view after the APPEND for the
  us_city and world_city table.  Any reason why the view won't use the
  indexes when it is JOINed to another table but it will when the view is
  queried without a JOIN?  I should have mentioned this is v8.1.4.

 8.1 isn't bright enough for that.  Should work in 8.2 though.


   regards, tom lane

Upgraded to 8.2.3 in my spare time here - went from the packaged binary that 
came with Ubuntu to compiling from source.  Haven't tuned it yet, but what do 
you think about this join on the view?


cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM   m_user AS mu
cmi-# LEFT JOIN geo.city_vw AS ci ON (mu.city_id = ci.city_id)
cmi-# ;
                                          QUERY PLAN
------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..17.76 rows=10614 width=486) (actual time=0.109..0.113 rows=1 loops=1)
   Join Filter: (mu.city_id = ci.city_id)
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=72) (actual time=0.015..0.017 rows=1 loops=1)
   ->  Append  (cost=0.00..16.72 rows=2 width=422) (actual time=0.073..0.075 rows=1 loops=1)
         ->  Index Scan using pk_us_city on us_city  (cost=0.00..8.28 rows=1 width=222) (actual time=0.032..0.032 rows=0 loops=1)
               Index Cond: (mu.city_id = us_city.city_id)
         ->  Index Scan using world_city_pk on world_city  (cost=0.00..8.44 rows=1 width=422) (actual time=0.040..0.042 rows=1 loops=1)
               Index Cond: (mu.city_id = world_city.city_id)
 Total runtime: 0.359 ms
(9 rows)


From 65 seconds down to less than 1 ms.  Pretty good huh?  Nice call Tom.  

Now I'll have to find some time to do the production server before this app 
goes up.
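
The view itself isn't shown in the thread, but given the Append over us_city
and world_city in the plan it is presumably a UNION ALL of the two tables,
roughly along these lines (column lists are guesses):

CREATE VIEW geo.city_vw AS
SELECT city_id, city_name FROM us_city
UNION ALL
SELECT city_id, city_name FROM world_city;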





Re: [PERFORM] How long should it take to insert 200,000 records?

2007-02-14 Thread Lou O'Quin
unless you specify otherwise, every insert carries its own transaction
begin/commit.  That's a lot of overhead for a single insert, no?  Why
not use a single transaction for, say, each 1000 inserts?  That would
strike a nice balance of security with efficiency.

pseudo code for the insert:

Begin Transaction;
FOR i in 1..200000 LOOP
  INSERT INTO viewfoo (x) VALUES (x);
  IF i % 1000 = 0 THEN
    Commit Transaction;
    Begin Transaction;
  END IF;
END LOOP;
Commit Transaction;
End


This approach should speed up things dramatically. 


 
 Karen Hill [EMAIL PROTECTED] 2/6/2007 2:39 PM 
On Feb 5, 9:33 pm, [EMAIL PROTECTED] (Tom Lane) wrote:
 Karen Hill [EMAIL PROTECTED] writes:
  I have a pl/pgsql function that is inserting 200,000 records for
  testing purposes.  What is the expected time frame for this
operation
  on a pc with 1/2 a gig of ram and a 7200 RPM disk?

 I think you have omitted a bunch of relevant facts.  Bare INSERT is
 reasonably quick:

 regression=# create table foo (f1 int);
 CREATE TABLE
 regression=# \timing
 Timing is on.
 regression=# insert into foo select x from generate_series(1,200000) x;
 INSERT 0 200000
 Time: 5158.564 ms
 regression=#

 (this on a not-very-fast machine) but if you weigh it down with a ton
 of index updates, foreign key checks, etc, it could get slow ...
 also you haven't mentioned what else that plpgsql function is doing.


The postgres version is 8.2.1 on Windows.   The pl/pgsql function is
inserting to an updatable view (basically two tables).

CREATE TABLE foo1
(


) ;

CREATE TABLE foo2
(

);

CREATE VIEW viewfoo AS
(

);
CREATE RULE ruleFoo AS ON INSERT TO viewfoo DO INSTEAD
(

);

CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$
BEGIN
FOR i in 1..200000 LOOP
INSERT INTO viewfoo (x) VALUES (x);
END LOOP;
END;
$$ LANGUAGE plpgsql;





Re: [PERFORM] An unwanted seqscan

2007-02-14 Thread Brian Herlihy
Hi Tom,

Sorry, I didn't ask the right question.  I meant to ask "Why does it estimate a 
smaller cost for the seqscan?"

With some further staring I was able to find the bad estimate and fix it by 
increasing the relevant statistics target.
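
For the archives, that kind of fix looks roughly like this (the column and the
statistics target value here are only illustrative; the post doesn't say which
column was adjusted):

ALTER TABLE meta_keywords_url ALTER COLUMN url_id SET STATISTICS 500;
ANALYZE meta_keywords_url;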

Thanks,
Brian

----- Original Message -----
From: Tom Lane [EMAIL PROTECTED]
To: Brian Herlihy [EMAIL PROTECTED]
Cc: Postgresql Performance pgsql-performance@postgresql.org
Sent: Wednesday, 14 February, 2007 4:53:54 PM
Subject: Re: [PERFORM] An unwanted seqscan 

Brian Herlihy [EMAIL PROTECTED] writes:
 I am having trouble understanding why a seqscan is chosen for this query.

As far as anyone can see from this output, the planner's decisions are
correct: it prefers the plans with the smaller estimated cost.  If you
want us to take an interest, provide some more context --- EXPLAIN
ANALYZE output for starters.

regards, tom lane




