Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory

2012-04-29 Thread Toby Corkindale

On 27/04/12 09:33, Tom Lane wrote:

Toby Corkindale <toby.corkind...@strategicdata.com.au> writes:

I've created a bit of a test case now.
There's a Perl script here:
http://dryft.net/postgres/


AFAICT, what is happening is that we're repeating the planning of that
messy nest of views for each child table of foo.  For most of the
children the planner eventually decides that the join degenerates to
nothing because of constraint exclusion, but not until it's expended a
fair amount of time and memory space per child.

I looked at whether we could improve that by having inheritance_planner
use a temporary memory context per child, but that doesn't look very
practical: it would add a good deal of extra data-copying overhead,
and some of the data structures involved are not easily copiable.

The general scheme of replanning per child might be questioned as well,
but IMO it's fairly important given the looseness of inheritance
restrictions --- it's not unlikely that you *need* different plans for
different children.  We might be able to reconsider that approach
whenever we invent an explicit concept of partitioned tables, since
presumably the partitions would all be essentially alike.

In the meantime, the best advice I can come up with is to reconsider
whether you need so many partitions.  That mechanism is really designed
for only a dozen or two partitions at most.



Hi Tom,
Thanks for looking into this, I appreciate you spending the time.

The system I've come up with for partitioning this data requires quite a 
lot of partitions - say thirty to seventy - but I didn't realise it 
would cause trouble down the line, so I'll see if it can be reworked to 
reduce the number.


For what it's worth, the actual query that was blowing out to gigabytes 
was only hitting a couple of dozen partitions per table it was touching 
- but it was hitting three such tables, about sixteen times (!) each.


I'm still curious why I can do a SELECT * FROM complexview without 
using much memory, but an UPDATE foo FROM complexview exhausts all 
the memory.


Thanks,
Toby



Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory

2012-04-29 Thread Rob Sargent

On 04/29/2012 07:19 PM, Toby Corkindale wrote:

[snip]
I'm still curious why I can do a SELECT * FROM complexview 
without using much memory, but an UPDATE foo FROM complexview 
exhausts all the memory.


Thanks,
Toby


Does

UPDATE foo set values
where foo.id in (select id from complexview...)

also swallow the memory?




Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory

2012-04-29 Thread Tom Lane
Toby Corkindale <toby.corkind...@strategicdata.com.au> writes:
 I'm still curious why I can do a SELECT * FROM complexview without 
 using much memory, but an UPDATE foo FROM complexview exhausts all 
 the memory.

The former only requires the complexview to get planned once.

regards, tom lane
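
To make the contrast concrete, a minimal sketch (foo and complexview are
from the test case; the status and id columns here are illustrative):

-- planned once: the view is expanded a single time
EXPLAIN SELECT * FROM complexview;

-- planned once per child: the view is re-planned for every child
-- table of foo that the UPDATE might touch
EXPLAIN UPDATE foo SET status = 'x'
FROM complexview v
WHERE foo.id = v.id;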



Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory

2012-04-29 Thread Toby Corkindale

On 30/04/12 11:26, Rob Sargent wrote:

On 04/29/2012 07:19 PM, Toby Corkindale wrote:

[snip]

Does

UPDATE foo set values
where foo.id in (select id from complexview...)

also swallow the memory?


Yes, definitely. (See an earlier post of mine for several variations on 
the query)


However a two-stage process doesn't, ie.
create temp table t as select id from complexview;
update foo where id in (select id from t);
(or the same thing with FROM)
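
Spelled out, that two-stage workaround looks something like this (a sketch;
the id and status columns are illustrative):

-- stage 1: run the complex view exactly once, materialising the ids
create temp table matching_ids as
  select id from complexview;

-- stage 2: the update references only the temp table, so the complex
-- view is not re-planned for each child of the target table
update foo set status = 'done'
where id in (select id from matching_ids);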




Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory

2012-04-26 Thread Toby Corkindale

On 26/04/12 15:30, Tom Lane wrote:

Toby Corkindale <toby.corkind...@strategicdata.com.au> writes:

On 26/04/12 13:11, Tom Lane wrote:

Well, if you were to provide a reproducible test case, somebody might be
motivated to look into it.  There could be a memory leak in the planner
somewhere, but without a test case it's not very practical to go look
for it.



Would a Perl-based script that built up a database like that be a useful
test case for you?


Yeah, sure, just something that somebody else can run to duplicate the
problem.


For what it's worth, I discovered something quite interesting. The
memory usage only blows out when I do an update based on the results of
the query.


Hm, is the update target an inheritance tree?


The target is the parent table of a bunch of partitions.
The actual rows being updated live in those child tables.

Toby



Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory

2012-04-26 Thread Tom Lane
Toby Corkindale <toby.corkind...@strategicdata.com.au> writes:
 On 26/04/12 15:30, Tom Lane wrote:
 Hm, is the update target an inheritance tree?

 The target is the parent table of a bunch of partitions.

How many would a bunch be, exactly?  I'm fairly sure that the complex
view would get re-planned for each target table ...

regards, tom lane



Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory

2012-04-26 Thread Toby Corkindale

On 26/04/12 16:58, Tom Lane wrote:

Toby Corkindale <toby.corkind...@strategicdata.com.au> writes:

On 26/04/12 15:30, Tom Lane wrote:

Hm, is the update target an inheritance tree?



The target is the parent table of a bunch of partitions.


How many would a bunch be, exactly?  I'm fairly sure that the complex
view would get re-planned for each target table ...


The table being updated (line) has 57 child tables.
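
(Given the 3GB+ of memory I was seeing used, that works out to upwards of
50MB of planner memory per child.)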

-Toby



Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory

2012-04-26 Thread Toby Corkindale

On 26/04/12 17:16, Toby Corkindale wrote:

On 26/04/12 16:58, Tom Lane wrote:

Toby Corkindale <toby.corkind...@strategicdata.com.au> writes:

On 26/04/12 15:30, Tom Lane wrote:

Hm, is the update target an inheritance tree?



The target is the parent table of a bunch of partitions.


How many would a bunch be, exactly? I'm fairly sure that the complex
view would get re-planned for each target table ...


The table being updated (line) has 57 child tables.


Although we are specifying a value for the column on which they are 
partitioned, ie.

CREATE TABLE line (file_id int, lineno int, status text,  -- types assumed
   PRIMARY KEY (file_id, lineno));
CREATE TABLE line_1 ( CHECK (file_id=1) ) INHERITS (line);
CREATE TABLE line_2 ( CHECK (file_id=2) ) INHERITS (line);

UPDATE line SET status = 'something'
FROM complex_view cv
WHERE cv.file_id = 2 AND line.file_id = 2;
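
As a quick sanity check that those CHECK constraints are usable for pruning,
an EXPLAIN of a plain SELECT should show only the matching child being
scanned (constraint_exclusion defaults to 'partition' on 9.1):

SET constraint_exclusion = partition;
EXPLAIN SELECT * FROM line WHERE file_id = 2;
-- expected: a scan of line (the parent is always included) plus
-- line_2; the other children are excluded at plan time.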



Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory

2012-04-26 Thread Toby Corkindale

On 26/04/12 13:11, Tom Lane wrote:

Toby Corkindale <toby.corkind...@strategicdata.com.au> writes:

Just wondering if anyone else has thoughts on this?



I'm still suspicious that this is a bug.


Well, if you were to provide a reproducible test case, somebody might be
motivated to look into it.  There could be a memory leak in the planner
somewhere, but without a test case it's not very practical to go look
for it.


Hi,
I've created a bit of a test case now.
There's a Perl script here:
http://dryft.net/postgres/

Running it will create a test database that's populated with quite a lot 
of schemas and partitioned tables, and a few views.


Running EXPLAIN on the query at the end of that script added ~700MB to 
the server-side postgres process.


It's not the same as the 3.4GB I've seen on our bigger data warehouse, 
but maybe it's enough to help?


Let me know if I can help elaborate further,

Toby



Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory

2012-04-26 Thread Tom Lane
Toby Corkindale <toby.corkind...@strategicdata.com.au> writes:
 I've created a bit of a test case now.
 There's a Perl script here:
 http://dryft.net/postgres/

AFAICT, what is happening is that we're repeating the planning of that
messy nest of views for each child table of foo.  For most of the
children the planner eventually decides that the join degenerates to
nothing because of constraint exclusion, but not until it's expended a
fair amount of time and memory space per child.

I looked at whether we could improve that by having inheritance_planner
use a temporary memory context per child, but that doesn't look very
practical: it would add a good deal of extra data-copying overhead,
and some of the data structures involved are not easily copiable.

The general scheme of replanning per child might be questioned as well,
but IMO it's fairly important given the looseness of inheritance
restrictions --- it's not unlikely that you *need* different plans for
different children.  We might be able to reconsider that approach
whenever we invent an explicit concept of partitioned tables, since
presumably the partitions would all be essentially alike.

In the meantime, the best advice I can come up with is to reconsider
whether you need so many partitions.  That mechanism is really designed
for only a dozen or two partitions at most.

regards, tom lane



[GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory

2012-04-25 Thread Toby Corkindale
Hi,
I'm hitting some peculiar behaviour. I'm currently on Pg 9.1.3 on a 64-bit 
Debian system.

I have a database which is moderately large - 20 GByte or so - with the data 
split across dozens of tables, which are themselves partitioned.
Queries are usually only run against fairly small, partitioned sets of data.

These queries generally run fairly fast. Performance is not a problem.

However Postgres is chewing up huge amounts of memory just to create the query 
plan!

For example, even if I just run
EXPLAIN SELECT a_column FROM a_view
WHERE partition_id = 1;

Then the postgres backend process takes several seconds to return and, in the 
worst example here, is hogging more than 3GByte once it comes back. (It doesn't 
free that up until you close the connection.)
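
One crude way to watch this from the outside (a sketch; the ps invocation
assumes Linux):

SELECT pg_backend_pid();   -- in the psql session under test
-- then, in a shell, before and after running the EXPLAIN:
--   $ ps -o rss= -p <pid>    (the backend's resident set size, in kB)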

The query plan that comes back does seem quite convoluted, but then, the view 
is a query run over about eight other views, each of which is pulling data from 
a few other views. The actual underlying data being touched is only *a few 
dozen* small rows.

As I said, the query runs fast enough... however, it only takes a handful of 
these queries running in separate connections to exhaust the database server's 
memory - especially since the memory isn't returned until the end of the 
connection, and these connections typically stay up for a while.

I wondered if there's anything I can do to reduce this memory usage? And, is 
this a bug?

I've posted the output of the query plan here: https://gist.github.com/2487097


Thanks in advance,
Toby



Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory

2012-04-25 Thread Willy-Bas Loos
Stacking views is a bad practice. It usually means that you are making the
db do a lot of unnecessary work, scanning tables more than once when you
don't even need them.
According to your description, you have 3 layers of views on partitioned
tables.
I can imagine that that leaves the planner with a lot of possible query
plans, a lot of interaction and a lot of statistics to read.

Do you have any special settings for the statistics on these tables?
And could you please post the non-default settings in your postgresql.conf
file?
$ grep ^[^#] /etc/postgresql/9.1/main/postgresql.conf|grep -e ^[^[:space:]]

That would be helpful to see whether you have any statistics or planner
settings altered.

Cheers,

WBL


On Wed, Apr 25, 2012 at 8:18 AM, Toby Corkindale
<toby.corkind...@strategicdata.com.au> wrote:

 [snip]




-- 
Quality comes from focus and clarity of purpose -- Mark Shuttleworth


Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory

2012-04-25 Thread Toby Corkindale
Hi Willy-Bas,
Thanks for your reply.

I realise that stacking the views up like this complicates matters, but the 
actual views are fairly simple queries, and each one individually is only 
looking at a few dozen rows. (Eg. selecting min, max or average value from a 
small set, grouped by one column)
From the point of view of creating reporting queries, it's a nice and logical 
way to build up a query, and we didn't think it would present any problems... 
and even on a well-populated database, the query runs very fast. It's just the 
astounding amount of memory used that presents difficulties.

Looking at the postgresql.conf for non-default settings, the notable ones are:

max_connections = 200
ssl = false
shared_buffers = 256MB
max_prepared_transactions = 16
# although they aren't used for the group of queries in question
maintenance_work_mem = 128MB
# work_mem is left at default of 1MB
effective_io_concurrency = 2
random_page_cost = 3.0
effective_cache_size = 512MB
geqo = on
geqo_threshold = 12
geqo_effort = 7

Some other things are non-default, like checkpoints, streaming-replication 
stuff, but those shouldn't have any effect.

The memory settings (shared buffers, effective cache) might seem to be set 
quite conservatively at the moment, given the memory available in the machine 
-- but since we can exhaust that memory with just a few connections, it seems 
fair.

Cheers,
Toby

- Original Message -
From: Willy-Bas Loos <willy...@gmail.com>
To: Toby Corkindale <toby.corkind...@strategicdata.com.au>
Cc: pgsql-general <pgsql-general@postgresql.org>
Sent: Wednesday, 25 April, 2012 6:05:37 PM
Subject: Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory


[snip]




Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory

2012-04-25 Thread Willy-Bas Loos
Would it be possible to reproduce the same query without using any views?
You could see the difference in memory usage.

If that doesn't explain it, try also without inheritance, by using the ONLY
keyword (and UNION ALL), as sketched below.
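
A sketch of that, using the line tables described elsewhere in this thread
(one UNION ALL arm per child):

-- inherited form:
--   select * from line where file_id = 2;
-- hand-rolled form, bypassing the inheritance planner:
select * from only line   where file_id = 2
union all
select * from only line_1 where file_id = 2
union all
select * from only line_2 where file_id = 2;
-- ...and so on, one arm for each remaining child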

If it's really only a couple of rows, you might as well post a dump
somewhere? Then I could reproduce it.

WBL


On Wed, Apr 25, 2012 at 10:51 AM, Toby Corkindale
<toby.corkind...@strategicdata.com.au> wrote:

 [snip]




-- 
Quality comes from focus and clarity of purpose -- Mark Shuttleworth


Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory

2012-04-25 Thread Toby Corkindale
- Original Message -
 From: Willy-Bas Loos <willy...@gmail.com>
 To: Toby Corkindale <toby.corkind...@strategicdata.com.au>
 Cc: pgsql-general <pgsql-general@postgresql.org>
 Sent: Wednesday, 25 April, 2012 7:16:50 PM
 Subject: Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory
 
 Would it be possible to reproduce the same query without using any
 views?
 You could see the difference in memory usage.

It is possible to do it in stages instead, and if done that way, memory usage 
(and explain plans) are quite normal.
However, re-coding everything that way would be very time-consuming, so I would 
rather avoid it.

 If that doesn't explain it, try also without inheritance, by using the
 ONLY keyword (and UNION ALL).

I have tried something similar to that (accessing partition slices directly). 
It didn't change anything.


 If it's really only a couple of rows, you might as well post a dump
 somewhere? Then I could reproduce it.

Well, the data touched by the query is only a handful of rows, but it's a 
handful of rows in 20GB of *other* rows.
Although the query plan correctly estimates that it's only going to touch a few.
I'm pretty sure we didn't see this crazy memory usage in earlier testing, 
before the database started to get populated.

Toby

 
 [snip]

Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory

2012-04-25 Thread Toby Corkindale

Hi,
Just wondering if anyone else has thoughts on this?

I'm still suspicious that this is a bug.

If I run EXPLAIN (or the query itself) on a database that has all the 
schemas and tables created, but only the relevant data touched by the 
query loaded, then everything is fine.


The query plan is still hundreds of lines long, but running it doesn't 
use much RAM. So I think that eliminates work_mem-related issues.


It really does seem like it's purely the query plan itself that is 
consuming all the memory.


Has anyone else seen this?

Thanks,
Toby


On 25/04/12 16:18, Toby Corkindale wrote:

[snip]






Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory

2012-04-25 Thread Tom Lane
Toby Corkindale <toby.corkind...@strategicdata.com.au> writes:
 Just wondering if anyone else has thoughts on this?

 I'm still suspicious that this is a bug.

Well, if you were to provide a reproducible test case, somebody might be
motivated to look into it.  There could be a memory leak in the planner
somewhere, but without a test case it's not very practical to go look
for it.

regards, tom lane



Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory

2012-04-25 Thread Toby Corkindale

On 26/04/12 13:11, Tom Lane wrote:

Toby Corkindale <toby.corkind...@strategicdata.com.au> writes:

Just wondering if anyone else has thoughts on this?



I'm still suspicious that this is a bug.


Well, if you were to provide a reproducible test case, somebody might be
motivated to look into it.  There could be a memory leak in the planner
somewhere, but without a test case it's not very practical to go look
for it.


Hi Tom,
Thanks for responding.. I'm trying to work on a test case, but it's 
quite tricky.
It'll need to be something like a script that generates a tonne of 
partitions at the very least. I don't know if the actual amount of data 
in the partitions is part of the problem or not.
Would a Perl-based script that built up a database like that be a useful 
test case for you?



For what it's worth, I discovered something quite interesting. The 
memory usage only blows out when I do an update based on the results of 
the query. But not if I just select the results on their own, nor if I 
do the update using those values on its own.


ie.

Method #1, uses all the memory and doesn't return it:
  explain update line set status = 'foo'
  where file_id=725 and line.lineno in (
select line from complex_view
where file_id=725
  );


Method #2, also uses all the memory:
  explain update line set status = 'foo'
  from complex_view v
  where line.lineno = v.line
  and line.file_id=725
  and v.file_id=725;


Method #3, which uses next to no memory:
  explain select line from complex_view
  where file_id=725;


Method #4, which also uses next to no memory:
  explain create temp table foo as
  select line from complex_view
  where file_id=725;

  update line set status = 'foo'
  from foo
  where line.lineno=foo.line
    and file_id=725;


-Toby



Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory

2012-04-25 Thread Tom Lane
Toby Corkindale <toby.corkind...@strategicdata.com.au> writes:
 On 26/04/12 13:11, Tom Lane wrote:
 Well, if you were to provide a reproducible test case, somebody might be
 motivated to look into it.  There could be a memory leak in the planner
 somewhere, but without a test case it's not very practical to go look
 for it.

 Would a Perl-based script that built up a database like that be a useful 
 test case for you?

Yeah, sure, just something that somebody else can run to duplicate the
problem.

 For what it's worth, I discovered something quite interesting. The 
 memory usage only blows out when I do an update based on the results of 
 the query.

Hm, is the update target an inheritance tree?

regards, tom lane
