Re: [PERFORM] Hardware suggestions for maximum read performance

2013-05-07 Thread Jeff Janes
On Thu, May 2, 2013 at 6:35 PM, Scott Marlowe wrote:

> On Thu, May 2, 2013 at 5:11 PM, Mike McCann  wrote:
> > Hello,
> >
> > We are in the fortunate situation of having more money than time to help
> > solve our PostgreSQL 9.1 performance problem.
> >
> > Our server hosts databases that are about 1 GB in size with the largest
> > tables having order 10 million 20-byte indexed records. The data are loaded
> > once and then read from a web app and other client programs.  Some of the
> > queries execute ORDER BY on the results. There are typically less than a
> > dozen read-only concurrent connections to any one database.
>

I wouldn't count on this being a problem that can be fixed merely by
throwing money at it.

How many rows does any one of these queries need to access and then ORDER
BY?

...

>
> > HP ProLiant DL360p Gen 8
> > Dual Intel Xeon 2.4GHz 4-core E5-2609 CPUs
> > 64GB RAM
> > 2x146GB 15K SAS hard drives
> > 3x200GB SATA SLC SSDs
> > + the usual accessories (optical drive, rail kit, dual power supplies)
>
> If your DB is 1G and will grow to 10G, then the IO shouldn't be any
> problem, as the whole db should be cached in memory.



But it can take a surprisingly long time to get it cached in the first
place, from a cold start.

If that is the problem, pg_prewarm could help.
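
A minimal sketch of that approach, assuming the pg_prewarm extension is
available (it ships in contrib in later releases); the relation names below
are placeholders, not the poster's schema:

  -- Placeholder relation names; loads data into shared_buffers ahead of queries.
  CREATE EXTENSION IF NOT EXISTS pg_prewarm;
  SELECT pg_prewarm('big_table');        -- warm the table's heap
  SELECT pg_prewarm('big_table_pkey');   -- indexes can be prewarmed too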


Cheers,

Jeff


Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-07 Thread Anne Rosset
Thanks Igor.
I am going to test with pgbouncer. Will let you know.

Thanks,
Anne



-Original Message-
From: Igor Neyman [mailto:iney...@perceptron.com] 
Sent: Monday, May 06, 2013 7:04 PM
To: Anne Rosset; k...@rice.edu
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Deterioration in performance when query executed in multi threads



From: Anne Rosset [aros...@collab.net]
Sent: Monday, May 06, 2013 5:51 PM
To: Igor Neyman; k...@rice.edu
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Deterioration in performance when query executed in multi threads

Hi Igor,
Result with enable_nestloop off:

------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=49946.49..58830.02 rows=1 width=181) (actual time=2189.474..2664.888 rows=180 loops=1)
   Hash Cond: ((item.created_by_id)::text = (sfuser.id)::text)
   ->  Hash Join  (cost=49470.50..58345.53 rows=1 width=167) (actual time=1931.870..2404.745 rows=180 loops=1)
         Hash Cond: ((relationship.origin_id)::text = (sfuser2.id)::text)
         ->  Hash Join  (cost=48994.51..57869.52 rows=1 width=153) (actual time=1927.603..2400.334 rows=180 loops=1)
               Hash Cond: ((relationship.target_id)::text = (artifact.id)::text)
               ->  Seq Scan on relationship  (cost=0.00..7973.38 rows=240435 width=19) (actual time=0.036..492.442 rows=241285 loops=1)
                     Filter: ((NOT is_deleted) AND ((relationship_type_name)::text = 'ArtifactAssignment'::text))
               ->  Hash  (cost=48994.49..48994.49 rows=1 width=154) (actual time=1858.350..1858.350 rows=180 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 34kB
                     ->  Hash Join  (cost=47260.54..48994.49 rows=1 width=154) (actual time=1836.495..1858.151 rows=180 loops=1)
                           Hash Cond: ((field_value4.id)::text = (artifact.customer_fv)::text)
                           ->  Seq Scan on field_value field_value4  (cost=0.00..1443.78 rows=77378 width=9) (actual time=22.104..30.694 rows=77378 loops=1)
                           ->  Hash  (cost=47260.52..47260.52 rows=1 width=163) (actual time=1814.005..1814.005 rows=180 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 35kB
                                 ->  Hash Join  (cost=45526.57..47260.52 rows=1 width=163) (actual time=1790.908..1813.780 rows=180 loops=1)
                                       Hash Cond: ((field_value3.id)::text = (artifact.category_fv)::text)
                                       ->  Seq Scan on field_value field_value3  (cost=0.00..1443.78 rows=77378 width=15) (actual time=0.002..9.262 rows=77378 loops=1)
                                       ->  Hash  (cost=45526.55..45526.55 rows=1 width=166) (actual time=1790.505..1790.505 rows=180 loops=1)
                                             Buckets: 1024  Batches: 1  Memory Usage: 36kB
                                             ->  Hash Join  (cost=43792.60..45526.55 rows=1 width=166) (actual time=1768.362..1790.304 rows=180 loops=1)
                                                   Hash Cond: ((field_value.id)::text = (artifact.group_fv)::text)
                                                   ->  Seq Scan on field_value  (cost=0.00..1443.78 rows=77378 width=9) (actual time=0.002..8.687 rows=77378 loops=1)
                                                   ->  Hash  (cost=43792.58..43792.58 rows=1 width=175) (actual time=1767.928..1767.928 rows=180 loops=1)
                                                         Buckets: 1024  Batches: 1  Memory Usage: 38kB
                                                         ->  Hash Join  (cost=42058.63..43792.58 rows=1 width=175) (actual time=1499.822..1767.734 rows=180 loops=1)
                                                               Hash Cond: ((field_value2.id)::text = (artifact.status_fv)::text)
                                                               ->  Seq Scan on field_value field_value2  (cost=0.00..1443.78 rows=77378 width=15) (actual time=0.002..261.082 rows=77378 loops=1)
                                                               ->  Hash  (cost=42058.61..42058.61 rows=1 width=178) (actual time=1492.707..1492.707 rows=180 loops=1)
                                                                     Buckets: 1024  Batches: 1  Memory Usage: 38kB
                                                                     ->  Hash Join  (cost=18039.59..42058.61 rows=1 width=178) (actual time=1175.659..1492.482 rows=180 loops=1)
                                                                           Hash Cond: ((item.id)::text = (artifact.id)::text)
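
For reference, a generic illustration of how a plan like the one above is
typically captured with the planner setting disabled for a single
transaction (this is not the thread's actual query, which is not shown in
this excerpt):

  -- Generic illustration only; any join works for demonstration purposes.
  BEGIN;
  SET LOCAL enable_nestloop = off;   -- reverts automatically at COMMIT/ROLLBACK
  EXPLAIN ANALYZE
  SELECT c.relname, a.attname
  FROM pg_class c
  JOIN pg_attribute a ON a.attrelid = c.oid;
  COMMIT;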
 

Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-07 Thread Mark Kirkwood

On 07/05/13 19:33, Simon Riggs wrote:

On 7 May 2013 07:32, Mark Kirkwood  wrote:

On 07/05/13 18:10, Simon Riggs wrote:


On 7 May 2013 01:23,   wrote:


I'm thinking that a variant of (2) might be simpler to implement:

(I think Matt C essentially beat me to this suggestion - he originally
discovered this issue). It is probably good enough for only *new* plans to
react to the increased/increasing number of in progress rows. So this would
require backends doing significant numbers of row changes to either directly
update pg_statistic or report their in progress numbers to the stats
collector. The key change here is that the partial execution numbers would
need to be sent. Clearly one would need to avoid doing this too often (!) -
possibly only when the number of changed rows >
autovacuum_analyze_scale_factor proportion of the relation concerned, or
similar.



Are you loading using COPY? Why not break down the load into chunks?



INSERT - but we could maybe work around it by chunking the INSERT. However
that *really* breaks the idea that in SQL you just say what you want, not how
the database engine should do it! And more practically, it means that the
most obvious and clear way to add your new data has nasty side effects, and
you have to tiptoe around muttering secret incantations to make things work
well :-)


Yes, we'd need to break up SQL statements into pieces and use external
transaction snapshots to do that.
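
A minimal sketch of the manual chunking workaround being discussed, with
placeholder table and column names; each batch runs as its own transaction,
optionally followed by an ANALYZE so subsequent plans see the changed row
counts:

  -- Placeholder names; split one big INSERT ... SELECT into smaller batches.
  INSERT INTO target_table
  SELECT * FROM staging_table WHERE id BETWEEN 1 AND 100000;
  ANALYZE target_table;   -- optional: refresh stats before the next batch

  INSERT INTO target_table
  SELECT * FROM staging_table WHERE id BETWEEN 100001 AND 200000;
  ANALYZE target_table;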


I'm still thinking that making postgres smarter about having current stats
for getting the actual optimal plan is the best solution.


I agree.

The challenge now is to come up with something that actually works;
most of the ideas have been very vague and ignore the many downsides.
The hard bit is the analysis and balanced thinking, not the
developing.



Yeah - seeing likely downsides can be a bit tricky too. I'll have a play 
with some prototyping ideas, since this is actually an area of postgres 
(analyze/stats collector) that I've fiddled with before :-)


Cheers

Mark





Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-07 Thread Simon Riggs
On 7 May 2013 07:32, Mark Kirkwood  wrote:
> On 07/05/13 18:10, Simon Riggs wrote:
>>
>> On 7 May 2013 01:23,   wrote:
>>
>>> I'm thinking that a variant of (2) might be simpler to implement:
>>>
>>> (I think Matt C essentially beat me to this suggestion - he originally
>>> discovered this issue). It is probably good enough for only *new* plans
>>> to
>>> react to the increased/increasing number of in progress rows. So this
>>> would require backends doing significant numbers of row changes to either
>>> directly update pg_statistic or report their in progress numbers to the
>>> stats collector. The key change here is the partial execution numbers
>>> would need to be sent. Clearly one would need to avoid doing this too
>>> often (!) - possibly only when number of changed rows >
>>> autovacuum_analyze_scale_factor proportion of the relation concerned or
>>> similar.
>>
>>
>> Are you loading using COPY? Why not break down the load into chunks?
>>
>
> INSERT - but we could maybe work around it by chunking the INSERT. However
> that *really* breaks the idea that in SQL you just say what you want, not
> how the database engine should do it! And more practically, it means that
> the most obvious and clear way to add your new data has nasty side effects,
> and you have to tiptoe around muttering secret incantations to make things
> work well :-)

Yes, we'd need to break up SQL statements into pieces and use external
transaction snapshots to do that.

> I'm still thinking that making postgres smarter about having current stats
> for getting the actual optimal plan is the best solution.

I agree.

The challenge now is to come up with something that actually works;
most of the ideas have been very vague and ignore the many downsides.
The hard bit is the analysis and balanced thinking, not the
developing.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-07 Thread Matt Clarkson

On Tue, 2013-05-07 at 18:32 +1200, Mark Kirkwood wrote:
> On 07/05/13 18:10, Simon Riggs wrote:
> > On 7 May 2013 01:23,   wrote:
> >
> >> I'm thinking that a variant of (2) might be simpler to implement:
> >>
> >> (I think Matt C essentially beat me to this suggestion - he originally
> >> discovered this issue). It is probably good enough for only *new* plans to
> >> react to the increased/increasing number of in progress rows. So this
> >> would require backends doing significant numbers of row changes to either
> >> directly update pg_statistic or report their in progress numbers to the
> >> stats collector. The key change here is the partial execution numbers
> >> would need to be sent. Clearly one would need to avoid doing this too
> >> often (!) - possibly only when number of changed rows >
> >> autovacuum_analyze_scale_factor proportion of the relation concerned or
> >> similar.
> >
> > Are you loading using COPY? Why not break down the load into chunks?
> >
> 
> INSERT - but we could maybe work around it by chunking the INSERT. However
> that *really* breaks the idea that in SQL you just say what you want,
> not how the database engine should do it! And more practically, it means
> that the most obvious and clear way to add your new data has nasty side
> effects, and you have to tiptoe around muttering secret incantations to
> make things work well :-)

We also had the same problem with an UPDATE altering the data
distribution in such a way that trivial but frequently executed queries
caused massive server load until auto-analyze sorted out the stats.
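
A common stopgap for that situation (a minimal sketch with placeholder
names, not the actual schema involved) is to run ANALYZE explicitly right
after the bulk change rather than waiting for autovacuum:

  -- Placeholder table and column names; the point is the explicit ANALYZE.
  UPDATE accounts
     SET status = 'archived'
   WHERE last_login < now() - interval '2 years';
  ANALYZE accounts;   -- refresh stats so trivial queries get sensible plans again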

-- 
Matt Clarkson
Catalyst.Net Limited



