Re: [PERFORM] Slow query with planner row strange estimation

2010-07-12 Thread Dimitri
It's probably one of the cases when having HINTS in PostgreSQL may be
very helpful..

SELECT /*+ enable_nestloop=off */ ... FROM ...

will just fix this query without impacting other queries and without
adding any additional instructions into the application code..

So, why is there such resistance to implementing hints within SQL
queries in PG?..
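Absent hints, the closest thing available today is a transaction-scoped SET, which at least confines the change to a single statement; a minimal sketch (the SELECT stands in for whatever query misbehaves):

```sql
-- Transaction-scoped workaround available today: SET LOCAL reverts
-- automatically at COMMIT/ROLLBACK, so other queries are unaffected.
BEGIN;
SET LOCAL enable_nestloop = off;
SELECT ...;  -- the problematic query goes here
COMMIT;
```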

Rgds,
-Dimitri


On 7/9/10, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jul 9, 2010 at 6:13 AM, damien hostin damien.hos...@axege.com
 wrote:
 Have you tried running ANALYZE on the production server?

 You might also want to try ALTER TABLE ... SET STATISTICS to a large
 value on some of the join columns involved in the query.
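 A minimal sketch of that suggestion, with hypothetical table and column names (the per-column upper limit in 8.4 is 10000):

```sql
-- Raise the statistics target for a join column, then re-gather stats.
ALTER TABLE fact_table ALTER COLUMN join_key SET STATISTICS 1000;
ANALYZE fact_table;
```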

 Hello,

 Before comparing the test case on the two machines, I ran ANALYZE on the
 whole database and looked at the pg_stats table to see if the columns'
 stats changed, but on the production server the stats never became as good
 as on the desktop computer. I set statistics to 1 on the columns used by
 the join, ran ANALYZE (which takes a 300-row sample), then looked at the
 stats. The stats are not as good as on the desktop: the row count is
 nearly the same, but only 1 or 2 values are found.

 The data are not balanced the same way on the two computers:
 - Desktop: 12000 rows, with 6000 involved in the query (50%),
 - Production (actually a dev/test server): 6 million rows, with 6000
 involved in the query (0.1%).
 The columns used in the query are nullable, and in the 5994000 other rows
 that are not involved in the query these columns are null.

 I don't know if the statistics target is a % or a number of values to
 obtain,

 It's a number of values to obtain.

 but even set at max (1), it didn't manage to collect good stats (for
 this particular query).

 I think there's a cutoff where it won't collect values unless they
 occur significantly more often than the average frequency.  I wonder
 if that might be biting you here: without the actual values in the MCV
 table, the join selectivity estimates probably aren't too good.
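 One way to check whether the interesting values made it into the MCV list is to look at pg_stats directly; a sketch with hypothetical names:

```sql
-- most_common_vals/most_common_freqs are what the join selectivity
-- estimate draws on; if the relevant values are absent, the planner
-- falls back to n_distinct-based guesses.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'fact_table'
  AND attname   = 'join_key';
```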

 As I don't know what more to do, my conclusion is that the data needs to
 be better balanced for ANALYZE to gather better stats. But if there is a
 way to improve the stats/query with this badly balanced data, I'm open to
 it!

 I hope that in real production, data will never be loaded this way. If
 this happened we would maybe set enable_nestloop to off, but I don't
 think it's a good solution; other queries would have a chance of getting
 slower.

 Yeah, that usually works out poorly.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise Postgres Company

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




[Fwd: Re: [PERFORM] Slow query with planner row strange estimation]

2010-07-12 Thread damien hostin


--
HOSTIN Damien - Equipe RD
Tel:+33(0)4 63 05 95 40
Société Axège
23 rue Saint Simon
63000 Clermont Ferrand
www.axege.com

---BeginMessage---

Robert Haas wrote:

On Wed, Jul 7, 2010 at 10:39 AM, damien hostin damien.hos...@axege.com wrote:
  

Hello again,

At last, I checked the same query with the same data on my desktop computer.
Just after loading the data, the queries were slow; I launched a VACUUM
ANALYZE, which collected good stats on the main table, and the query became
quick (~200 ms). Now a classic SATA-disk computer is faster than our little
monster server!!



Have you tried running ANALYZE on the production server?

You might also want to try ALTER TABLE ... SET STATISTICS to a large
value on some of the join columns involved in the query.

  

Hello,

Before comparing the test case on the two machines, I ran ANALYZE on the 
whole database and looked at the pg_stats table to see if the columns' 
stats changed, but on the production server the stats never became as good 
as on the desktop computer. I set statistics to 1 on the columns used by 
the join, ran ANALYZE (which takes a 300-row sample), then looked at the 
stats. The stats are not as good as on the desktop: the row count is nearly 
the same, but only 1 or 2 values are found.


The data are not balanced the same way on the two computers:
- Desktop: 12000 rows, with 6000 involved in the query (50%),
- Production (actually a dev/test server): 6 million rows, with 6000 
involved in the query (0.1%).
The columns used in the query are nullable, and in the 5994000 other rows 
that are not involved in the query these columns are null.


I don't know if the statistics target is a % or a number of values to 
obtain, but even set at max (1), it didn't manage to collect good 
stats (for this particular query).
As I don't know what more to do, my conclusion is that the data needs to 
be better balanced for ANALYZE to gather better stats. But if 
there is a way to improve the stats/query with this badly balanced data, 
I'm open to it!


I hope that in real production, data will never be loaded this way. If 
this happened we would maybe set enable_nestloop to off, but I don't think 
it's a good solution; other queries would have a chance of getting slower.



Thanks for helping

--
HOSTIN Damien - Equipe RD
Tel:+33(0)4 63 05 95 40
Société Axège
23 rue Saint Simon
63000 Clermont Ferrand
www.axege.com


---End Message---



Re: [Fwd: Re: [PERFORM] Slow query with planner row strange estimation]

2010-07-12 Thread damien hostin

-Oops, sorry for the spam-





Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-12 Thread Matthew Wakeling

On Sat, 10 Jul 2010, Tom Lane wrote:

Doesn't pgpool do this?


No, and in fact that's exactly why the proposed implementation isn't
ever going to be in core: it's not possible to do it portably.


I'm surprised. Doesn't apache httpd do this? Does it have to do a whole 
load of non-portable stuff? It seems to work on a whole load of platforms.


Matthew

--
I would like to think that in this day and age people would know better than
to open executables in an e-mail. I'd also like to be able to flap my arms
and fly to the moon.-- Tim Mullen



Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-12 Thread Craig Ringer
On 12/07/10 17:45, Matthew Wakeling wrote:
 
 I'm surprised. Doesn't apache httpd do this? Does it have to do a whole
 load of non-portable stuff? It seems to work on a whole load of platforms.

A lot of what Apache HTTPd does is handled via the Apache Portable
Runtime (APR). It contains a lot of per-platform handlers for various
functionality.

http://apr.apache.org/docs/apr/1.4/modules.html

I don't know if the socket passing is provided as part of APR or is part
of Apache HTTPd its self, but I wouldn't be at all surprised if it was
in APR.

Personally I'm now swayed by arguments presented here that trying to
push pooling into core isn't really desirable, and that better
packaging/bundling of existing solutions would be better.

Perhaps documenting the pluses/minuses of the current pooling options
and providing clear recommendations on which to use for different use
cases would help, since half the trouble is users not knowing they need
a pool or being confused as to which to select.

This discussion reminds me a bit of Hibernate's built-in client-side
connection pool. It has one, but it's an unloved stepchild that even the
Hibernate devs suggest should be avoided in favour of a couple of
external 3rd-party options.

A built-in pool seems like a great idea, but there are multiple existing
ones because they solve different problems in different ways. Unless a
built-in one could solve ALL those needs, or be so vastly simpler (due
to code re-use, easier configuration, etc) that it's worth building one
that won't fit everyone's needs, then it's best to stick to the existing
external options.

So rather than asking "should core have a connection pool", perhaps
what's needed is to ask "what can an in-core pool do that an external
pool cannot do?"

Admission control / resource limit features would be great to have in
core, and can't really be done fully in external modules ... but could
be designed in ways that would allow external poolers to add
functionality on top. Josh Berkus has made some good points on why this
isn't as easy as it looks, though:


http://it.toolbox.com/blogs/database-soup/admission-control-and-its-discontents-39895

--
Craig Ringer



[PERFORM] now() gives same time within the session

2010-07-12 Thread Atul.Goel
Hi,

I need to log the start and end times of the procedures in a table, but the 
start and end times are the same. This is how I recreated the issue.

create table test_time (time timestamp);
delete from  test_time;
insert into test_time select now();
SELECT pg_sleep(10);
insert into test_time select now();
SELECT pg_sleep(10);
insert into test_time select now();
SELECT pg_sleep(10);
insert into test_time select now();
SELECT pg_sleep(10);
select * from test_time;

2010-07-12 12:43:40.509746
2010-07-12 12:43:40.509746
2010-07-12 12:43:40.509746
2010-07-12 12:43:40.509746
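
As the replies point out, now() is fixed at transaction start. A sketch of the same test using clock_timestamp(), which is evaluated at each call, would show distinct values:

```sql
-- Same experiment, but with a per-call timestamp instead of the
-- transaction-start timestamp.
create table test_time2 (time timestamp);
insert into test_time2 select clock_timestamp();
SELECT pg_sleep(10);
insert into test_time2 select clock_timestamp();
select * from test_time2;
-- the two rows now differ by roughly the sleep interval
```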

Atul Goel
SENIOR DEVELOPER

Global DataPoint
Middlesex House, 34-42 Cleveland Street
London W1T 4LB, UK
T: +44 (0)20 7079 4827
M: +44 (0)7846765098
www.globaldatapoint.comhttp://www.globaldatapoint.com/

This e-mail is confidential and should not be used by anyone who is not the 
original intended recipient. Global DataPoint Limited does not accept liability 
for any statements made which are clearly the sender's own and not expressly 
made on behalf of Global DataPoint Limited. No contracts may be concluded on 
behalf of Global DataPoint Limited by means of e-mail communication. Global 
DataPoint Limited Registered in England and Wales with registered number 
3739752 Registered Office Middlesex House, 34-42 Cleveland Street, London W1T 
4LB


Re: [PERFORM] now() gives same time within the session

2010-07-12 Thread A. Kretschmer
In response to atul.g...@globaldatapoint.com :
 Hi,
 
  
 
 I need to log the start and end time of the procedures in a table. But the
 start and end time are same. This is how I recreated the issue.
 
  
 
 create table test_time (time timestamp);
 
 delete from  test_time;
 
 insert into test_time select now();


Use timeofday() instead, now() returns the transaction starting time.

BEGIN
test=*# select now();
  now
---
 2010-07-12 13:13:28.907043+02
(1 row)

test=*# select timeofday();
  timeofday
--
 Mon Jul 12 13:13:36.187703 2010 CEST
(1 row)

test=*# select now();
  now
---
 2010-07-12 13:13:28.907043+02
(1 row)

test=*#


Regards, Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: - Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99



[PERFORM] PostgreSQL PITR - more doubts

2010-07-12 Thread Jayadevan M
Hello all,
One doubt about how PostgreSQL PITR works. Let us say I have all the 
archived WALs for the past week with 
archive_command = 'cp -i %p /home/postgres/archive/%f </dev/null' 
I took a base backup last night. If I try to recover the server today 
after 
copying the base backup from yesterday and providing 
restore_command = 'cp /home/postgres/archive/%f %p'
does PostgreSQL go through all of the past week's archived WALs, or 
can it figure out that the base backup is from yesterday and skip 
a large number of archived WALs, starting only from file xxx? 
Either way, are there ways to speed up the restore process?
Regards,
Jayadevan 





DISCLAIMER: 

The information in this e-mail and any attachment is intended only for 
the person to whom it is addressed and may contain confidential and/or 
privileged material. If you have received this e-mail in error, kindly 
contact the sender and destroy all copies of the original communication. 
IBS makes no warranty, express or implied, nor guarantees the accuracy, 
adequacy or completeness of the information contained in this email or any 
attachment and is not liable for any errors, defects, omissions, viruses 
or for resultant loss or damage, if any, direct or indirect.








Re: [PERFORM] PostgreSQL PITR - more doubts

2010-07-12 Thread Jesper Krogh

On 2010-07-12 13:23, Jayadevan M wrote:

Hello all,
One doubt about how PostgreSQL PITR works. Let us say I have all the
archived WALs for the past week with
archive_command = 'cp -i %p /home/postgres/archive/%f </dev/null'
I took a base backup last night. If I try to recover the server today
after
copying the base backup from yesterday and providing
restore_command = 'cp /home/postgres/archive/%f %p'
does PostgreSQL go through all the past week's archived WALS or
it can figure out that the base backup is from yesterday, so skip
a large number of archived WALs and start only from file xxx ?


Yes, it starts from where it needs to, assuming you
did a pg_start_backup() before you took your base backup?
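
For reference, a sketch of the 8.x-era base backup sequence being assumed here (the label is arbitrary):

```sql
SELECT pg_start_backup('nightly_base');
-- ... copy the data directory with cp/rsync/tar at the OS level ...
SELECT pg_stop_backup();
-- pg_stop_backup() also writes the WAL-segment .backup history file
-- that recovery later uses to find its starting point.
```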

--
Jesper



Re: [PERFORM] PostgreSQL PITR - more doubts

2010-07-12 Thread Jayadevan M
 Yes, it starts from where it needs to. Assuming you
did a pg_start_backup() before you did your base backup?

Thanks. I did. 
It uses files like 000B00D9.0020.backup to get the 
necessary information?

Regards,
Jayadevan













Re: [PERFORM] now() gives same time within the session

2010-07-12 Thread Atul.Goel
Sure thanks a lot.

Regards,
Atul Goel

-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of A. Kretschmer
Sent: 12 July 2010 12:15
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] now() gives same time within the session

In response to atul.g...@globaldatapoint.com :
 Hi,



 I need to log the start and end time of the procedures in a table. But the
 start and end time are same. This is how I recreated the issue.



 create table test_time (time timestamp);

 delete from  test_time;

 insert into test_time select now();


Use timeofday() instead, now() returns the transaction starting time.

BEGIN
test=*# select now();
  now
---
 2010-07-12 13:13:28.907043+02
(1 row)

test=*# select timeofday();
  timeofday
--
 Mon Jul 12 13:13:36.187703 2010 CEST
(1 row)

test=*# select now();
  now
---
 2010-07-12 13:13:28.907043+02
(1 row)

test=*#


Regards, Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: - Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99




Re: [PERFORM] PostgreSQL PITR - more doubts

2010-07-12 Thread Kevin Grittner
Jayadevan M jayadevan.maym...@ibsplc.com wrote:
 
 Yes, it starts from where it needs to. Assuming you
 did a pg_start_backup() before you did your base backup?
 
 Thanks. I did. 
 It uses files like 000B00D9.0020.backup to get
 the necessary information?
 
Yeah, since it's a text file, you can easily have a look at what is
stored there.  It's based on when pg_start_backup and pg_stop_backup
were run.
 
-Kevin



Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-12 Thread Kevin Grittner
Craig Ringer cr...@postnewspapers.com.au wrote:
 
 So rather than asking should core have a connection pool perhaps
 what's needed is to ask what can an in-core pool do that an
 external pool cannot do?
 
(1)  It can prevent the most pessimal performance problems resulting
from lack of an external connection pool (or a badly configured one)
by setting a single GUC.  Configuration tools could suggest a good
value during initial setup.
 
(2)  It can be used without installing and configuring a more
sophisticated and complex product.
 
(3)  It might reduce latency because it avoids having to receive,
parse, and resend data in both directions -- eliminating one hop. 
I know the performance benefit would usually accrue to the external
connection pooler, but there might be some circumstances where a
built-in pool could win.
 
(4)  It's one more checkbox which can be ticked off on some RFPs.
 
That said, I fully agree that if we can include good documentation
on the external poolers and we can get packagers to include poolers
in their distribution, that gets us a much bigger benefit.  A
built-in solution would only be worthwhile if it was simple enough
and lightweight enough not to be a burden on execution time or
maintenance.  Maybe that's too big an if.
 
-Kevin



Re: [PERFORM] now() gives same time within the session

2010-07-12 Thread Rob Wultsch
On Mon, Jul 12, 2010 at 4:15 AM, A. Kretschmer
andreas.kretsch...@schollglas.com wrote:
 In response to atul.g...@globaldatapoint.com :
 Hi,



 I need to log the start and end time of the procedures in a table. But the
 start and end time are same. This is how I recreated the issue.



 create table test_time (time timestamp);

 delete from  test_time;

 insert into test_time select now();


 Use timeofday() instead, now() returns the transaction starting time.


Is this part of the SQL standard?

-- 
Rob Wultsch
wult...@gmail.com



Re: [PERFORM] now() gives same time within the session

2010-07-12 Thread Thom Brown
On 12 July 2010 14:11, Rob Wultsch wult...@gmail.com wrote:
 On Mon, Jul 12, 2010 at 4:15 AM, A. Kretschmer
 andreas.kretsch...@schollglas.com wrote:
 In response to atul.g...@globaldatapoint.com :
 Hi,



 I need to log the start and end time of the procedures in a table. But the
 start and end time are same. This is how I recreated the issue.



 create table test_time (time timestamp);

 delete from  test_time;

 insert into test_time select now();


 Use timeofday() instead, now() returns the transaction starting time.


 Is this part of the SQL standard?


I don't believe it is.  See
http://www.postgresql.org/docs/current/static/functions-datetime.html#FUNCTIONS-DATETIME-CURRENT
for more info.

Thom



Re: [PERFORM] now() gives same time within the session

2010-07-12 Thread A. Kretschmer
In response to Rob Wultsch :
 On Mon, Jul 12, 2010 at 4:15 AM, A. Kretschmer
 andreas.kretsch...@schollglas.com wrote:
  Use timeofday() instead, now() returns the transaction starting time.
 
 
 Is this part of the SQL standard?

Don't know, sorry.


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: - Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99



Re: [PERFORM] now() gives same time within the session

2010-07-12 Thread Kenneth Marshall
On Mon, Jul 12, 2010 at 06:11:31AM -0700, Rob Wultsch wrote:
 On Mon, Jul 12, 2010 at 4:15 AM, A. Kretschmer
 andreas.kretsch...@schollglas.com wrote:
  In response to atul.g...@globaldatapoint.com :
  Hi,
 
 
 
  I need to log the start and end time of the procedures in a table. But the
  start and end time are same. This is how I recreated the issue.
 
 
 
  create table test_time (time timestamp);
 
  delete from test_time;
 
  insert into test_time select now();
 
 
  Use timeofday() instead, now() returns the transaction starting time.
 
 
 Is this part of the SQL standard?
 
No, see section 9.9.4 of the manual.

Cheers,
Ken



[PERFORM] Exists, limit and alternate plans

2010-07-12 Thread Віталій Тимчишин
Hello.

Today I've found out strange results for query below.
select version();
 version

--
 PostgreSQL 8.4.2 on amd64-portbld-freebsd8.0, compiled by GCC cc (GCC)
4.2.1 20070719  [FreeBSD], 64-bit

--Original query:
explain analyze select exists(select * from investor i where i.company_id =
this_.id) from COMPANY this_ order by this_.rank desc, this_.id asc limit
10;
 Limit  (cost=0.00..50.67 rows=10 width=16) (actual time=144.489..144.556
rows=10 loops=1)
   -  Index Scan using comp_rank_id on company this_
 (cost=0.00..34616009.08 rows=6831169 width=16) (actual
time=144.484..144.524 rows=10 loops=1)
 SubPlan 1
   -  Index Scan using company_invs on investor i  (cost=0.00..9.52
rows=2 width=0) (never executed)
 Index Cond: ((company_id)::bigint = $0)
 SubPlan 2
   -  Seq Scan on investor i  (cost=0.00..1836.17 rows=41717
width=8) (actual time=0.006..72.364 rows=41722 loops=1)
 Total runtime: 144.975 ms
(8 rows)

--set enable_seqscan=false;
explain analyze select exists(select * from investor i where i.company_id =
this_.id) from COMPANY this_ order by this_.rank desc, this_.id asc limit
10;
 Limit  (cost=0.00..50.67 rows=10 width=16) (actual time=0.045..0.177
rows=10 loops=1)
   -  Index Scan using comp_rank_id on company this_
 (cost=0.00..34616009.08 rows=6831169 width=16) (actual time=0.041..0.146
rows=10 loops=1)
 SubPlan 1
   -  Index Scan using company_invs on investor i  (cost=0.00..9.52
rows=2 width=0) (actual time=0.007..0.007 rows=1 loops=10)
 Index Cond: ((company_id)::bigint = $0)
 SubPlan 2
   -  Seq Scan on investor i  (cost=100.00..1001836.17
rows=41717 width=8) (never executed)
 Total runtime: 0.253 ms
(8 rows)

--limit inside exists
explain analyze select exists(select * from investor i where i.company_id =
this_.id limit 1) from COMPANY this_ order by this_.rank desc, this_.id asc
limit 10;
 Limit  (cost=0.00..50.67 rows=10 width=16) (actual time=0.052..0.219
rows=10 loops=1)
   -  Index Scan using comp_rank_id on company this_
 (cost=0.00..34616009.08 rows=6831169 width=16) (actual time=0.049..0.189
rows=10 loops=1)
 SubPlan 1
   -  Limit  (cost=0.00..4.76 rows=1 width=422) (actual
time=0.011..0.011 rows=1 loops=10)
 -  Index Scan using company_invs on investor i
 (cost=0.00..9.52 rows=2 width=422) (actual time=0.007..0.007 rows=1
loops=10)
   Index Cond: ((company_id)::bigint = $0)
 Total runtime: 0.291 ms
(7 rows)

So, my Qs:
1) Do we really have alternative plans for SubPlan that are selected at
runtime? Wow.
2) Why Seq scan plan is selected by default? Is it because of outer limit
not being applied when calculating costs for subplans at runtime?
3) Why does limit inside exists helps? Is it simply because new
alternative logic in not applied for complex case?

-- 
Best regards,
 Vitalii Tymchyshyn


Re: [PERFORM] Exists, limit and alternate plans

2010-07-12 Thread Tom Lane
Віталій Тимчишин tiv...@gmail.com writes:
 So, my Qs:
 1) Do we really have alternative plans for SubPlan that are selected at
 runtime? Wow.

Yup, see the AlternativeSubPlan stuff.

 2) Why Seq scan plan is selected by default? Is it because of outer limit
 not being applied when calculating costs for subplans at runtime?

It's currently driven off the estimated rowcount for the parent plan
node --- 6831169 in your example.  The subplan cannot see that the
parent plan node will be terminated short of full execution, so it
thinks that hashing the whole investor table will be a win.
Obviously it'd be nice to improve that for cases like upper LIMITs.

 3) Why does limit inside exists helps?

I think it defeats the applicability of the hash-the-whole-subtable
approach.

regards, tom lane



Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-12 Thread Hannu Krosing
On Mon, 2010-07-12 at 18:58 +0800, Craig Ringer wrote:
 On 12/07/10 17:45, Matthew Wakeling wrote:
  
  I'm surprised. Doesn't apache httpd do this? Does it have to do a whole
  load of non-portable stuff? It seems to work on a whole load of platforms.
 
 A lot of what Apache HTTPd does is handled via the Apache Portable
 Runtime (APR). It contains a lot of per-platform handlers for various
 functionality.
 
 http://apr.apache.org/docs/apr/1.4/modules.html
 
 I don't know if the socket passing is provided as part of APR or is part
 of Apache HTTPd its self, but I wouldn't be at all surprised if it was
 in APR.
 
 Personally I'm now swayed by arguments presented here that trying to
 push pooling into core isn't really desirable, and that better
 packaging/bundling of existing solutions would be better.

Better packaging/bundling of existing solutions is good in its own
right, whether there will eventually be some support for pooling in core
or not.

 Perhaps documenting the pluses/minuses of the current pooling options
 and providing clear recommendations on which to use for different use
 cases would help, since half the trouble is users not knowing they need
 a pool or being confused as to which to select.

 This discussion reminds me a bit of Hibernate's built-in client-side
 connection pool. It has one, but it's an unloved stepchild that even the
 Hibernate devs suggest should be avoided in favour of a couple of
 external 3rd party options.

Yes, pooling _is_ often better handled as a (set of) separate options,
just because of the reason that here one size does definitely not fit
all;

An efficient in-core pooler would probably look very much like pgbouncer
running in a separate thread spawned by the postmaster anyway.

Let's hope there will be some support in core for having user defined
helper processes soon(ish), so tweaking pgbouncer to run as one will be
reasonably easy :)

 A built-in pool seems like a great idea, but there are multiple existing
 ones because they solve different problems in different ways. Unless a
 built-in one could solve ALL those needs, or be so vastly simpler (due
 to code re-use, easier configuration, etc) that it's worth building one
 that won't fit everyone's needs, then it's best to stick to the existing
 external options.
 
 So rather than asking should core have a connection pool perhaps
 what's needed is to ask what can an in-core pool do that an external
 pool cannot do?

Probably nothing. OTOH there are some things that an external pool can
do that a built-in one can't, like running on a separate host and
pooling more than 32000 client connections there.

Cascaded pooling also seems impossible with built-in pooling.

 Admission control / resource limit features would be great to have in
 core, and can't really be done fully in external modules ... but could
 be designed in ways that would allow external poolers to add
 functionality on top. Josh Berkus has made some good points on why this
 isn't as easy as it looks, though:
 
 
 http://it.toolbox.com/blogs/database-soup/admission-control-and-its-discontents-39895
 
 --
 Craig Ringer
 


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] performance on new linux box

2010-07-12 Thread Greg Smith

Ryan Wexler wrote:
One question I do have: this card has a setting called Read Policy 
which apparently helps with sequential reads.  Do you think that is 
something I should enable?


Linux will do some amount of read-ahead in a similar way on its own.  
You can run blockdev --getra and blockdev --setra on each disk device on 
the system to see the settings and increase them.  I've found tweaking 
there, where you can control exactly the amount of read-ahead, to be 
more effective than relying on the less tunable Read Policy modes in 
RAID cards that do something similar.  That said, it doesn't seem to 
hurt to use both on the LSI card you have; giving the controller more 
information there for its use in optimizing how it caches things, by 
changing to the more aggressive Read Policy setting, hasn't ever 
degraded results significantly when I've tried.
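
A sketch of the blockdev commands mentioned (the device name is a placeholder; --setra takes 512-byte sectors):

```shell
# show current read-ahead for the device, then raise it (requires root)
blockdev --getra /dev/sda
blockdev --setra 4096 /dev/sda   # 4096 sectors = 2 MiB read-ahead
```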


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] Need help in performance tuning.

2010-07-12 Thread Bruce Momjian
Craig Ringer wrote:
 It'll need to separate running queries from running processes, or
 start threading backends, so that one way or the other a single query
 can benefit from the capabilities of multiple CPUs. The same separation,
 or a move to async I/O, might be needed to get one query to concurrently
 read multiple partitions of a table, or otherwise get maximum benefit
 from high-capacity I/O subsystems when running just a few big, expensive
 queries.
 
 Otherwise I'm wondering if PostgreSQL will begin really suffering in
 performance on workloads where queries are big and expensive but there
 are relatively few of them running at a time.

Agreed.  We certainly are going to have to go in that direction someday.
We have TODO items for these.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + None of us is going to be here forever. +



Re: [PERFORM] Slow query with planner row strange estimation

2010-07-12 Thread phb07


Dimitri wrote:

It's probably one of the cases when having HINTS in PostgreSQL may be
very helpful..

SELECT /*+ enable_nestloop=off */ ... FROM ...

will just fix this query without impacting other queries and without
adding any additional instructions into the application code..

So, why is there such resistance to implementing hints within SQL
queries in PG?..

Rgds,
-Dimitri

  

+1.
Another typical case where it would be helpful is setting the 
cursor_tuple_fraction GUC variable for a specific statement, without 
being obliged to issue two SET statements, one before the SELECT and the 
other after.
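
For illustration, the current workaround scopes the change inside a 
transaction with SET LOCAL (the cursor and table names here are 
hypothetical):

```sql
BEGIN;
-- SET LOCAL reverts automatically at COMMIT/ROLLBACK, so no explicit
-- second SET is needed -- but the statement still cannot carry the
-- setting by itself, which is the point being made above.
SET LOCAL cursor_tuple_fraction = 1.0;  -- plan for total runtime, not first rows
DECLARE big_cur CURSOR FOR SELECT * FROM some_large_table;
FETCH 100 FROM big_cur;
COMMIT;
```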



On 7/9/10, Robert Haas robertmh...@gmail.com wrote:
  

On Fri, Jul 9, 2010 at 6:13 AM, damien hostin damien.hos...@axege.com
wrote:


Have you tried running ANALYZE on the production server?

You might also want to try ALTER TABLE ... SET STATISTICS to a large
value on some of the join columns involved in the query.


Hello,

Before comparing the test case on the two machines, I ran ANALYZE on the
whole database and looked at the pg_stats table to see if anything changed
for those columns, but on the production server the stats never became as
good as on the desktop computer. I set the statistics target to 10000 on the
columns used by the join, ran ANALYZE, which takes a 3,000,000-row sample,
then looked at the stats. They are not as good as on the desktop: the row
count is nearly the same, but only 1 or 2 values are found.

The data are not balanced the same way on the two computers:
- Desktop: 12,000 rows, with 6,000 implicated in the query (50%),
- Production (actually a dev/test server): 6 million rows, with 6,000
implicated in the query (0.1%).
The columns used in the query are nullable, and in the 5,994,000 other rows
not implicated in the query these columns are null.

I don't know if the statistics target is a percentage or a number of values
to obtain,
  

It's a number of values to obtain.
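
For reference, the per-column knob under discussion looks like this (the 
table and column names are hypothetical):

```sql
-- Raise the statistics target for one join column, then re-sample.
-- ANALYZE samples roughly 300 * target rows, so a target of 10000
-- reads about 3 million rows.
ALTER TABLE fact_table ALTER COLUMN join_key SET STATISTICS 10000;
ANALYZE fact_table;

-- Inspect what the sample actually captured for that column.
SELECT n_distinct, null_frac, most_common_vals
FROM pg_stats
WHERE tablename = 'fact_table' AND attname = 'join_key';
```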



but even set at the max (10000), it didn't manage to collect good stats (for
this particular query).
  

I think there's a cutoff where it won't collect values unless they
occur significantly more often than the average frequency.  I wonder
if that might be biting you here: without the actual values in the MCV
table, the join selectivity estimates probably aren't too good.



As I don't know what more to do, my conclusion is that the data need to be
better balanced for ANALYZE to gather better stats. But if there is a way to
improve the stats/query with these badly balanced data, I'm open to it!

I hope that in real production the data will never be loaded this way. If it
happened we would maybe set enable_nestloop to off, but I don't think that's
a good solution; other queries could get slower.
  

Yeah, that usually works out poorly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company




Regards.
Philippe Beaudoin.



Re: [PERFORM] partition queries hitting all partitions even though check key is specified

2010-07-12 Thread Josh Berkus
On 9/2/09 10:05 AM, Kevin Kempter wrote:
 On Wednesday 02 September 2009 09:02:27 Scott Marlowe wrote:
 On Wed, Sep 2, 2009 at 8:52 AM, Kevin Kempter kev...@consistentstate.com 
 wrote:
 Hi all;

 I can't figure out why we're scanning all of our partitions.

I don't think extract() is immutable, which would pretty much invalidate
your check constraints as far as constraint exclusion (CE) is concerned.

I suggest feeding the actual numeric values to the check constraints.
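
A sketch of that pattern, using literal bounds on the partition key 
instead of extract() (the table names are hypothetical, following the 
inheritance-partitioning style of the 8.x docs):

```sql
CREATE TABLE measurement (logdate date NOT NULL, reading int);

-- Literal date bounds are immutable, so constraint exclusion can
-- prove which children a query cannot touch.
CREATE TABLE measurement_2009_09 (
    CHECK (logdate >= DATE '2009-09-01' AND logdate < DATE '2009-10-01')
) INHERITS (measurement);

-- With constraint_exclusion enabled, a query such as
--   SELECT * FROM measurement
--   WHERE logdate >= DATE '2009-09-02' AND logdate < DATE '2009-09-03';
-- should scan only the one matching child.
```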

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [PERFORM] partition queries hitting all partitions even though check key is specified

2010-07-12 Thread Joshua D. Drake
On Mon, 2010-07-12 at 22:01 -0500, Josh Berkus wrote:
 On 9/2/09 10:05 AM, Kevin Kempter wrote:
  On Wednesday 02 September 2009 09:02:27 Scott Marlowe wrote:
  On Wed, Sep 2, 2009 at 8:52 AM, Kevin Kempter kev...@consistentstate.com 
  wrote:
  Hi all;
 
  I can't figure out why we're scanning all of our partitions.
 
 I don't think extract() is immutable, which would pretty much invalidate
 your check constraints as far as CE is concerned.

Correct.

Joshua D. Drake


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering

