Re: [PERFORM] Slow BLOBs restoring

2010-12-08 Thread Vlad Arkhipov
I have investigated this issue a bit further: the -j option is what slows 
down BLOB restoration. It is about 1000x slower when this option is 
specified. Does anybody plan to fix it?

I have encountered a problem while restoring the database. There is a 
table that contains XML data (BLOB), ~3,000,000 records, ~5.5 GB of 
data. pg_restore has been running for a week without any considerable 
progress. There are plenty of lines like these in the log:


pg_restore: processing item 3125397 BLOB 10001967
pg_restore: executing BLOB 10001967

CPU usage is always at 100%. The total database size is about 100 GB, and 
it restores in an hour or so without BLOBs.





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance under contention

2010-12-08 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Yeah, that was my concern, too, though Tom seems skeptical (perhaps
 rightly).  And I'm not really sure why the PROCLOCKs need to be in a
 hash table anyway - if we know the PROC and LOCK we can surely look up
 the PROCLOCK pretty expensively by following the PROC SHM_QUEUE.

 Err, pretty INexpensively.

There are plenty of scenarios in which a proc might hold hundreds or
even thousands of locks.  pg_dump, for example.  You do not want to be
doing seq search there.
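
For a rough sense of scale, a backend's lock count is visible from SQL; 
the following illustrative query (not from the original mail, but using 
only standard catalog objects) counts the locks held by the current 
backend:

-- run inside a transaction that has touched many relations,
-- e.g. partway through a pg_dump-style schema walk
SELECT count(*) FROM pg_locks WHERE pid = pg_backend_pid();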

Now, it's possible that you could avoid *ever* needing to search for a
specific PROCLOCK, in which case eliminating the hash calculation
overhead might be worth it.  Of course, you'd still have to replicate
all the space-management functionality of a shared hash table.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow BLOBs restoring

2010-12-08 Thread Tom Lane
Vlad Arkhipov arhi...@dc.baikal.ru writes:
 I have investigated this issue a bit further: the -j option is what slows 
 down BLOB restoration. It is about 1000x slower when this option is specified. 

Are you by any chance restoring from an 8.3 or older pg_dump file made
on Windows?  If so, it's a known issue.

 Does anybody plan to fix it?

Not without a complete reproducible example ... and not at all if it's
the known problem.  The fix for that is to update pg_dump to 8.4 or
later.
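
For anyone hitting the known problem, the workaround is along these lines 
(paths and file names here are illustrative only): take a fresh 
custom-format dump with an 8.4-or-later pg_dump, then restore it in 
parallel:

# dump with a >= 8.4 pg_dump; custom format (-Fc) is required for -j
/usr/pgsql-8.4/bin/pg_dump -Fc -f mydb.dump mydb
# parallel restore with 4 jobs
pg_restore -j 4 -d mydb mydb.dump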

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] hashed subplan 5000x slower than two sequential operations

2010-12-08 Thread Bryce Nesbitt
Can you help me understand how to optimize the following?  There's a 
subplan which in this case returns 3 rows,
but it is really expensive:



=
explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
 OR contexts.context_key IN
(SELECT collection_data.context_key
FROM collection_data
 WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;
   
                                QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=83054.41..443755.45 rows=261077 width=4) (actual time=4362.143..6002.808 rows=28 loops=1)
   Hash Cond: (articles.context_key = contexts.context_key)
   ->  Seq Scan on articles  (cost=0.00..345661.91 rows=522136 width=4) (actual time=0.558..3953.002 rows=517356 loops=1)
         Filter: indexed
   ->  Hash  (cost=69921.25..69921.25 rows=800493 width=4) (actual time=829.501..829.501 rows=31 loops=1)
         ->  Seq Scan on contexts  (cost=14.31..69921.25 rows=800493 width=4) (actual time=1.641..829.339 rows=31 loops=1)
               Filter: ((parent_key = 392210) OR (hashed subplan))
               SubPlan
                 ->  Index Scan using collection_data_context_key_index on collection_data  (cost=0.00..14.30 rows=6 width=4) (actual time=0.018..0.023 rows=3 loops=1)
                       Index Cond: (collection_context_key = 392210)
 Total runtime: 6002.976 ms
(11 rows)


=
explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210 OR contexts.parent_key IN 
(392210,392210,395073,1304250))

AND articles.indexed
;
  QUERY PLAN
---
 Nested Loop  (cost=14.35..1863.85 rows=94 width=4) (actual time=0.098..1.038 rows=57 loops=1)
   ->  Bitmap Heap Scan on contexts  (cost=14.35..572.57 rows=288 width=4) (actual time=0.079..0.274 rows=59 loops=1)
         Recheck Cond: ((parent_key = 392210) OR (parent_key = ANY ('{392210,392210,395073,1304250}'::integer[])))
         ->  BitmapOr  (cost=14.35..14.35 rows=288 width=0) (actual time=0.066..0.066 rows=0 loops=1)
               ->  Bitmap Index Scan on parent_key_idx  (cost=0.00..3.07 rows=58 width=0) (actual time=0.028..0.028 rows=28 loops=1)
                     Index Cond: (parent_key = 392210)
               ->  Bitmap Index Scan on parent_key_idx  (cost=0.00..11.13 rows=231 width=0) (actual time=0.035..0.035 rows=87 loops=1)
                     Index Cond: (parent_key = ANY ('{392210,392210,395073,1304250}'::integer[]))
   ->  Index Scan using article_key_idx on articles  (cost=0.00..4.47 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=59)
         Index Cond: (articles.context_key = contexts.context_key)
         Filter: articles.indexed
 Total runtime: 1.166 ms
(12 rows)

production=> explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
 OR contexts.context_key IN
(SELECT collection_data.context_key
FROM collection_data
 WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;


=
# select version();
PostgreSQL 8.3.4 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2 
20061115 (prerelease) (Debian 4.1.1-21)



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Group commit and commit delay/siblings

2010-12-08 Thread Simon Riggs
On Mon, 2010-12-06 at 23:52 -0500, Greg Smith wrote:
 Jignesh Shah wrote:
  On Tue, Dec 7, 2010 at 1:55 AM, Tom Lane t...@sss.pgh.pa.us wrote:

  I could have sworn we'd refactored that to something like
 bool ThereAreAtLeastNActiveBackends(int n)
  which could drop out of the loop as soon as it'd established what we
  really need to know...I'd suggest that we just improve the
  coding so that we don't scan ProcArray at all when commit_siblings is 0.
 
  (I do agree with improving the docs to warn people away from assuming
  this is a knob to frob mindlessly.)
  
  In that case I propose that we support commit_siblings=0, which is not
  currently supported. The minimal value for commit_siblings is currently
  1. If we support commit_siblings=0 then it should short-circuit that
  function call, which is often what I do in my tests with commit_delay.

 
 Everybody should be happy now:  attached patch refactors the code to 
 exit as soon as the siblings count is exceeded, short-circuits with no 
 scanning of ProcArray if the minimum is 0, and allows setting the 
 siblings to 0 to enable that shortcut:

Minor patch, no downsides. Docs checked. Committed.
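
For reference, the settings discussed here look like this in 
postgresql.conf (the values are illustrative, not recommendations; as 
noted above, this is not a knob to frob mindlessly):

commit_delay = 10     # microseconds to wait before flushing WAL at commit
commit_siblings = 5   # only delay if at least this many other transactions
                      # are active; with this patch, 0 skips the ProcArray
                      # scan entirely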

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] hashed subplan 5000x slower than two sequential operations

2010-12-08 Thread Shrirang Chitnis
Bryce,

The two queries are different:

You are looking for contexts.context_key in the first query:

WHERE (contexts.parent_key = 392210
  OR contexts.context_key IN
 (SELECT collection_data.context_key
 FROM collection_data
  WHERE collection_data.collection_context_key = 392210)


but the second query has contexts.parent_key:

WHERE (contexts.parent_key = 392210 OR contexts.parent_key IN
(392210,392210,395073,1304250))

Is contexts.context_key an indexed field? contexts.parent_key certainly 
seems to be.


HTH,


Shrirang Chitnis
Sr. Manager, Applications Development
HOV Services


Office: (866) 808-0935 Ext: 39210
shrirang.chit...@hovservices.com
www.hovservices.com

The information contained in this message, including any attachments, is 
attorney privileged and/or confidential information intended only for the use 
of the individual or entity named as addressee.  The review, dissemination, 
distribution or copying of this communication by or to anyone other than the 
intended addressee is strictly prohibited.  If you have received this 
communication in error, please immediately notify the sender by replying to the 
message and destroy all copies of the original message.


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Bryce Nesbitt
Sent: Thursday, December 09, 2010 12:24 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] hashed subplan 5000x slower than two sequential operations

Can you help me understand how to optimize the following?  There's a
subplan which in this case returns 3 rows,
but it is really expensive:



=
explain analyze SELECT contexts.context_key
FROM contexts
 JOIN articles
 ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
  OR contexts.context_key IN
 (SELECT collection_data.context_key
 FROM collection_data
  WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;

                                QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=83054.41..443755.45 rows=261077 width=4) (actual time=4362.143..6002.808 rows=28 loops=1)
   Hash Cond: (articles.context_key = contexts.context_key)
   ->  Seq Scan on articles  (cost=0.00..345661.91 rows=522136 width=4) (actual time=0.558..3953.002 rows=517356 loops=1)
         Filter: indexed
   ->  Hash  (cost=69921.25..69921.25 rows=800493 width=4) (actual time=829.501..829.501 rows=31 loops=1)
         ->  Seq Scan on contexts  (cost=14.31..69921.25 rows=800493 width=4) (actual time=1.641..829.339 rows=31 loops=1)
               Filter: ((parent_key = 392210) OR (hashed subplan))
               SubPlan
                 ->  Index Scan using collection_data_context_key_index on collection_data  (cost=0.00..14.30 rows=6 width=4) (actual time=0.018..0.023 rows=3 loops=1)
                       Index Cond: (collection_context_key = 392210)
 Total runtime: 6002.976 ms
(11 rows)


=
explain analyze SELECT contexts.context_key
FROM contexts
 JOIN articles
 ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210 OR contexts.parent_key IN
(392210,392210,395073,1304250))
AND articles.indexed
;
   QUERY PLAN
---
 Nested Loop  (cost=14.35..1863.85 rows=94 width=4) (actual time=0.098..1.038 rows=57 loops=1)
   ->  Bitmap Heap Scan on contexts  (cost=14.35..572.57 rows=288 width=4) (actual time=0.079..0.274 rows=59 loops=1)
         Recheck Cond: ((parent_key = 392210) OR (parent_key = ANY ('{392210,392210,395073,1304250}'::integer[])))
         ->  BitmapOr  (cost=14.35..14.35 rows=288 width=0) (actual time=0.066..0.066 rows=0 loops=1)
               ->  Bitmap Index Scan on parent_key_idx  (cost=0.00..3.07 rows=58 width=0) (actual time=0.028..0.028 rows=28 loops=1)
                     Index Cond: (parent_key = 392210)
               ->  Bitmap Index Scan on parent_key_idx  (cost=0.00..11.13 rows=231 width=0) (actual time=0.035..0.035 rows=87 loops=1)
                     Index Cond: (parent_key = ANY ('{392210,392210,395073,1304250}'::integer[]))
   ->  Index Scan using article_key_idx on articles  (cost=0.00..4.47 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=59)
         Index Cond: (articles.context_key = contexts.context_key)
         Filter: articles.indexed
 Total runtime: 1.166 ms
(12 rows)

production=> explain analyze SELECT contexts.context_key
FROM 

Re: [PERFORM] hashed subplan 5000x slower than two sequential operations

2010-12-08 Thread Bryce Nesbitt

Shrirang Chitnis wrote:

Bryce,
The two queries are different:
   
Ah, due to a mistake.  The first version with the hashed subplan is from 
production.

The second version should have read:


production=> SELECT collection_data.context_key FROM collection_data 
WHERE collection_data.collection_context_key = 392210;
 context_key
-------------
      392210
      395073
     1304250
(3 rows)
production=> explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210 OR contexts.context_key IN 
(392210,395073,1304250))

AND articles.indexed
;
 QUERY PLAN
-
 Nested Loop  (cost=12.32..414.41 rows=20 width=4) (actual time=0.112..0.533 rows=28 loops=1)
   ->  Bitmap Heap Scan on contexts  (cost=12.32..135.13 rows=62 width=4) (actual time=0.079..0.152 rows=31 loops=1)
         Recheck Cond: ((parent_key = 392210) OR (context_key = ANY ('{392210,392210,395073,1304250}'::integer[])))
         ->  BitmapOr  (cost=12.32..12.32 rows=62 width=0) (actual time=0.070..0.070 rows=0 loops=1)
               ->  Bitmap Index Scan on parent_key_idx  (cost=0.00..3.07 rows=58 width=0) (actual time=0.029..0.029 rows=28 loops=1)
                     Index Cond: (parent_key = 392210)
               ->  Bitmap Index Scan on contexts_pkey  (cost=0.00..9.22 rows=4 width=0) (actual time=0.037..0.037 rows=4 loops=1)
                     Index Cond: (context_key = ANY ('{392210,392210,395073,1304250}'::integer[]))
   ->  Index Scan using article_key_idx on articles  (cost=0.00..4.49 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=31)
         Index Cond: (articles.context_key = contexts.context_key)
         Filter: articles.indexed
 Total runtime: 0.614 ms
(12 rows)







production=> explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
 OR contexts.context_key IN
(SELECT collection_data.context_key
FROM collection_data
 WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;
   
                                QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=83054.41..443755.45 rows=261077 width=4) (actual time=3415.609..6737.863 rows=28 loops=1)
   Hash Cond: (articles.context_key = contexts.context_key)
   ->  Seq Scan on articles  (cost=0.00..345661.91 rows=522136 width=4) (actual time=0.038..4587.914 rows=517416 loops=1)
         Filter: indexed
   ->  Hash  (cost=69921.25..69921.25 rows=800493 width=4) (actual time=926.965..926.965 rows=31 loops=1)
         ->  Seq Scan on contexts  (cost=14.31..69921.25 rows=800493 width=4) (actual time=2.113..926.794 rows=31 loops=1)
               Filter: ((parent_key = 392210) OR (hashed subplan))
               SubPlan
                 ->  Index Scan using collection_data_context_key_index on collection_data  (cost=0.00..14.30 rows=6 width=4) (actual time=0.084..0.088 rows=3 loops=1)
                       Index Cond: (collection_context_key = 392210)
 Total runtime: 6738.042 ms
(11 rows)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] hashed subplan 5000x slower than two sequential operations

2010-12-08 Thread Marc Mamin


Hello,

Are the tables freshly analyzed, with a sufficient default_statistics_target?

You may get a better plan by rewriting the query as a UNION to get 
rid of the OR clause.
Something like (not tested):

SELECT contexts.context_key
FROM contexts
 JOIN articles
 ON (articles.context_key=contexts.context_key)
WHERE contexts.parent_key = 392210
AND articles.indexed


UNION
SELECT foo.context_key
FROM
(
  SELECT contexts.context_key
  FROM contexts JOIN collection_data ON (contexts.context_key = 
collection_data.context_key)
  WHERE collection_data.collection_context_key = 392210
) foo JOIN articles ON (articles.context_key = foo.context_key)
WHERE articles.indexed
;


I've had a similar problem where there was no way for the planner to notice 
that the query would systematically return very few rows. In that case, my last 
resort was to disable some planner methods within the given transaction.

regards,

Marc Mamin

-Original Message-
From: pgsql-performance-ow...@postgresql.org on behalf of Shrirang Chitnis
Sent: Wed 12/8/2010 8:05
To: Bryce Nesbitt; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] hashed subplan 5000x slower than two sequential 
operations
 
Bryce,

The two queries are different:

You are looking for contexts.context_key in the first query:

WHERE (contexts.parent_key = 392210
  OR contexts.context_key IN
 (SELECT collection_data.context_key
 FROM collection_data
  WHERE collection_data.collection_context_key = 392210)


but the second query has contexts.parent_key:

WHERE (contexts.parent_key = 392210 OR contexts.parent_key IN
(392210,392210,395073,1304250))

Is contexts.context_key an indexed field? contexts.parent_key certainly 
seems to be.


HTH,


Shrirang Chitnis
Sr. Manager, Applications Development
HOV Services


Office: (866) 808-0935 Ext: 39210
shrirang.chit...@hovservices.com
www.hovservices.com

The information contained in this message, including any attachments, is 
attorney privileged and/or confidential information intended only for the use 
of the individual or entity named as addressee.  The review, dissemination, 
distribution or copying of this communication by or to anyone other than the 
intended addressee is strictly prohibited.  If you have received this 
communication in error, please immediately notify the sender by replying to the 
message and destroy all copies of the original message.


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Bryce Nesbitt
Sent: Thursday, December 09, 2010 12:24 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] hashed subplan 5000x slower than two sequential operations

Can you help me understand how to optimize the following?  There's a
subplan which in this case returns 3 rows,
but it is really expensive:



=
explain analyze SELECT contexts.context_key
FROM contexts
 JOIN articles
 ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
  OR contexts.context_key IN
 (SELECT collection_data.context_key
 FROM collection_data
  WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;

                                QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=83054.41..443755.45 rows=261077 width=4) (actual time=4362.143..6002.808 rows=28 loops=1)
   Hash Cond: (articles.context_key = contexts.context_key)
   ->  Seq Scan on articles  (cost=0.00..345661.91 rows=522136 width=4) (actual time=0.558..3953.002 rows=517356 loops=1)
         Filter: indexed
   ->  Hash  (cost=69921.25..69921.25 rows=800493 width=4) (actual time=829.501..829.501 rows=31 loops=1)
         ->  Seq Scan on contexts  (cost=14.31..69921.25 rows=800493 width=4) (actual time=1.641..829.339 rows=31 loops=1)
               Filter: ((parent_key = 392210) OR (hashed subplan))
               SubPlan
                 ->  Index Scan using collection_data_context_key_index on collection_data  (cost=0.00..14.30 rows=6 width=4) (actual time=0.018..0.023 rows=3 loops=1)
                       Index Cond: (collection_context_key = 392210)
 Total runtime: 6002.976 ms
(11 rows)


=
explain analyze SELECT contexts.context_key
FROM contexts
 JOIN articles
 ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210 OR contexts.parent_key IN
(392210,392210,395073,1304250))
AND articles.indexed
;
   QUERY PLAN

Re: [PERFORM] hashed subplan 5000x slower than two sequential operations

2010-12-08 Thread Tom Lane
Shrirang Chitnis shrirang.chit...@hovservices.com writes:
 Bryce,
 The two queries are different:

I suspect the second one is a typo and not what he really wanted.

 WHERE (contexts.parent_key = 392210
   OR contexts.context_key IN
  (SELECT collection_data.context_key
  FROM collection_data
   WHERE collection_data.collection_context_key = 392210)

The only really effective way the planner knows to optimize an
IN (sub-SELECT) is to turn it into a semi-join, which is not possible
here because of the unrelated OR clause.  You might consider replacing
this with a UNION of two scans of contexts.  (And yes, I know it'd be
nicer if the planner did that for you.)

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] hashed subplan 5000x slower than two sequential operations

2010-12-08 Thread Marc Mamin

Another point: would a conditional index help?

on articles (context_key) where indexed
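
Spelled out as DDL, that would be something like the following (the 
index name is made up):

CREATE INDEX articles_context_key_indexed_idx
    ON articles (context_key)
    WHERE indexed;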

regards,

-Original Message-
From: pgsql-performance-ow...@postgresql.org on behalf of Marc Mamin
Sent: Wed 12/8/2010 9:06
To: Shrirang Chitnis; Bryce Nesbitt; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] hashed subplan 5000x slower than two sequential 
operations
 


Hello,

Are the tables freshly analyzed, with a sufficient default_statistics_target?

You may get a better plan by rewriting the query as a UNION to get 
rid of the OR clause.
Something like (not tested):

SELECT contexts.context_key
FROM contexts
 JOIN articles
 ON (articles.context_key=contexts.context_key)
WHERE contexts.parent_key = 392210
AND articles.indexed


UNION
SELECT foo.context_key
FROM
(
  SELECT contexts.context_key
  FROM contexts JOIN collection_data ON (contexts.context_key = 
collection_data.context_key)
  WHERE collection_data.collection_context_key = 392210
) foo JOIN articles ON (articles.context_key = foo.context_key)
WHERE articles.indexed
;


I've had a similar problem where there was no way for the planner to notice 
that the query would systematically return very few rows. In that case, my last 
resort was to disable some planner methods within the given transaction.

regards,

Marc Mamin

-Original Message-
From: pgsql-performance-ow...@postgresql.org on behalf of Shrirang Chitnis
Sent: Wed 12/8/2010 8:05
To: Bryce Nesbitt; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] hashed subplan 5000x slower than two sequential 
operations
 
Bryce,

The two queries are different:

You are looking for contexts.context_key in the first query:

WHERE (contexts.parent_key = 392210
  OR contexts.context_key IN
 (SELECT collection_data.context_key
 FROM collection_data
  WHERE collection_data.collection_context_key = 392210)


but the second query has contexts.parent_key:

WHERE (contexts.parent_key = 392210 OR contexts.parent_key IN
(392210,392210,395073,1304250))

Is contexts.context_key an indexed field? contexts.parent_key certainly 
seems to be.


HTH,


Shrirang Chitnis
Sr. Manager, Applications Development
HOV Services


Office: (866) 808-0935 Ext: 39210
shrirang.chit...@hovservices.com
www.hovservices.com

The information contained in this message, including any attachments, is 
attorney privileged and/or confidential information intended only for the use 
of the individual or entity named as addressee.  The review, dissemination, 
distribution or copying of this communication by or to anyone other than the 
intended addressee is strictly prohibited.  If you have received this 
communication in error, please immediately notify the sender by replying to the 
message and destroy all copies of the original message.


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Bryce Nesbitt
Sent: Thursday, December 09, 2010 12:24 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] hashed subplan 5000x slower than two sequential operations

Can you help me understand how to optimize the following?  There's a
subplan which in this case returns 3 rows,
but it is really expensive:



=
explain analyze SELECT contexts.context_key
FROM contexts
 JOIN articles
 ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
  OR contexts.context_key IN
 (SELECT collection_data.context_key
 FROM collection_data
  WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;

                                QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=83054.41..443755.45 rows=261077 width=4) (actual time=4362.143..6002.808 rows=28 loops=1)
   Hash Cond: (articles.context_key = contexts.context_key)
   ->  Seq Scan on articles  (cost=0.00..345661.91 rows=522136 width=4) (actual time=0.558..3953.002 rows=517356 loops=1)
         Filter: indexed
   ->  Hash  (cost=69921.25..69921.25 rows=800493 width=4) (actual time=829.501..829.501 rows=31 loops=1)
         ->  Seq Scan on contexts  (cost=14.31..69921.25 rows=800493 width=4) (actual time=1.641..829.339 rows=31 loops=1)
               Filter: ((parent_key = 392210) OR (hashed subplan))
               SubPlan
                 ->  Index Scan using collection_data_context_key_index on collection_data  (cost=0.00..14.30 rows=6 width=4) (actual time=0.018..0.023 rows=3 loops=1)
                       Index Cond: (collection_context_key = 392210)
 Total runtime: 6002.976 ms
(11 rows)


=
explain analyze SELECT 

Re: [PERFORM] hashed subplan 5000x slower than two sequential operations

2010-12-08 Thread Pavel Stehule
2010/12/8 Tom Lane t...@sss.pgh.pa.us:
 Shrirang Chitnis shrirang.chit...@hovservices.com writes:
 Bryce,
 The two queries are different:

 I suspect the second one is a typo and not what he really wanted.

 WHERE (contexts.parent_key = 392210
       OR contexts.context_key IN
          (SELECT collection_data.context_key
          FROM collection_data
           WHERE collection_data.collection_context_key = 392210)

 The only really effective way the planner knows to optimize an
 IN (sub-SELECT) is to turn it into a semi-join, which is not possible
 here because of the unrelated OR clause.  You might consider replacing
 this with a UNION of two scans of contexts.  (And yes, I know it'd be
 nicer if the planner did that for you.)

I remember a similar case from 9 years ago.

slow variant:

WHERE pk = C1 OR pk IN (SELECT .. FROM .. WHERE some = C2)

I had to rewrite to form

WHERE pk IN (SELECT .. FROM .. WHERE some = C2 UNION ALL SELECT C1)
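
Applied to the query from this thread, that rewrite would look something 
like this (untested sketch; it folds the parent_key branch into the IN 
subquery the same way):

SELECT contexts.context_key
FROM contexts
JOIN articles ON (articles.context_key = contexts.context_key)
WHERE contexts.context_key IN
      (SELECT collection_data.context_key
       FROM collection_data
       WHERE collection_data.collection_context_key = 392210
       UNION ALL
       SELECT c2.context_key
       FROM contexts c2
       WHERE c2.parent_key = 392210)
  AND articles.indexed;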

Regards

Pavel Stehule



                        regards, tom lane

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] hashed subplan 5000x slower than two sequential operations

2010-12-08 Thread Bryce Nesbitt
Marc Mamin wrote:
 Hello,
 are the tables freshly analyzed, with a sufficient default_statistics_target ?

autovacuum = on                 # Enable autovacuum subprocess?  'on'
autovacuum_naptime = 5min       # time between autovacuum runs
default_statistics_target = 150 # range 1-1000
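
If the row estimate on contexts is the issue, the statistics target can 
also be raised per column and the table re-analyzed; a sketch (the value 
500 is arbitrary):

ALTER TABLE contexts ALTER COLUMN parent_key SET STATISTICS 500;
ANALYZE contexts;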




  
 You may get a better plan by rewriting the query as a UNION to get rid
 of the OR clause.
 Something like (not tested):

It is way better:


EXPLAIN ANALYZE SELECT contexts.context_key
FROM contexts
 JOIN articles
 ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210)
AND articles.indexed

UNION
SELECT collection_data.context_key
FROM collection_data
JOIN articles ON (articles.context_key=collection_data.context_key)
WHERE collection_data.collection_context_key = 392210
AND articles.indexed;


                                QUERY PLAN
------------------------------------------------------------------------------
 Unique  (cost=418.50..418.61 rows=22 width=4) (actual time=0.582..0.671 rows=28 loops=1)
   ->  Sort  (cost=418.50..418.55 rows=22 width=4) (actual time=0.579..0.608 rows=28 loops=1)
         Sort Key: contexts.context_key
         Sort Method:  quicksort  Memory: 26kB
         ->  Append  (cost=0.00..418.01 rows=22 width=4) (actual time=0.042..0.524 rows=28 loops=1)
               ->  Nested Loop  (cost=0.00..376.46 rows=19 width=4) (actual time=0.040..0.423 rows=28 loops=1)
                     ->  Index Scan using parent_key_idx on contexts  (cost=0.00..115.20 rows=58 width=4) (actual time=0.021..0.082 rows=28 loops=1)
                           Index Cond: (parent_key = 392210)
                     ->  Index Scan using article_key_idx on articles  (cost=0.00..4.49 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=28)
                           Index Cond: (public.articles.context_key = contexts.context_key)
                           Filter: public.articles.indexed
               ->  Nested Loop  (cost=0.00..41.32 rows=3 width=4) (actual time=0.043..0.043 rows=0 loops=1)
                     ->  Index Scan using collection_data_context_key_index on collection_data  (cost=0.00..14.30 rows=6 width=4) (actual time=0.012..0.015 rows=3 loops=1)
                           Index Cond: (collection_context_key = 392210)
                     ->  Index Scan using article_key_idx on articles  (cost=0.00..4.49 rows=1 width=4) (actual time=0.006..0.006 rows=0 loops=3)
                           Index Cond: (public.articles.context_key = collection_data.context_key)
                           Filter: public.articles.indexed
 Total runtime: 0.812 ms








Re: [PERFORM] hashed subplan 5000x slower than two sequential operations

2010-12-08 Thread Bryce Nesbitt

Marc Mamin wrote:
 Another point: would a conditional index help?
 on articles (context_key) where indexed

No.

production=> select count(*),indexed from articles group by indexed;
 count  | indexed
--------+---------
 517433 | t
 695814 | f

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance under contention

2010-12-08 Thread Robert Haas
2010/12/8 Tom Lane t...@sss.pgh.pa.us:
 Robert Haas robertmh...@gmail.com writes:
 Yeah, that was my concern, too, though Tom seems skeptical (perhaps
 rightly).  And I'm not really sure why the PROCLOCKs need to be in a
 hash table anyway - if we know the PROC and LOCK we can surely look up
 the PROCLOCK pretty expensively by following the PROC SHM_QUEUE.

 Err, pretty INexpensively.

 There are plenty of scenarios in which a proc might hold hundreds or
 even thousands of locks.  pg_dump, for example.  You do not want to be
 doing seq search there.

 Now, it's possible that you could avoid *ever* needing to search for a
 specific PROCLOCK, in which case eliminating the hash calculation
 overhead might be worth it.

That seems like it might be feasible.  The backend that holds the lock
ought to be able to find out whether there's a PROCLOCK by looking at
the LOCALLOCK table, and the LOCALLOCK has a pointer to the PROCLOCK.
It's not clear to me whether there's any other use case for doing a
lookup for a particular combination of PROC A + LOCK B, but I'll have
to look at the code more closely.

 Of course, you'd still have to replicate
 all the space-management functionality of a shared hash table.

Maybe we ought to revisit Markus Wanner's wamalloc.  Although given
our recent discussions, I'm thinking that you might want to try to
design any allocation system so as to minimize cache line contention.
For example, you could hard-allocate each backend 512 bytes of
dedicated shared memory in which to record the locks it holds.  If it
needs more, it allocates additional 512 byte chunks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance under contention

2010-12-08 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 2010/12/8 Tom Lane t...@sss.pgh.pa.us:
 Now, it's possible that you could avoid *ever* needing to search for a
 specific PROCLOCK, in which case eliminating the hash calculation
 overhead might be worth it.

 That seems like it might be feasible.  The backend that holds the lock
 ought to be able to find out whether there's a PROCLOCK by looking at
 the LOCALLOCK table, and the LOCALLOCK has a pointer to the PROCLOCK.

Hm, that is a real good point.  Those shared memory data structures
predate the invention of the local lock tables, and I don't think we
looked real hard at whether we should rethink the fundamental
representation in shared memory given the additional local state.
The issue though is whether any other processes ever need to look
at a proc's PROCLOCKs.  I think at least deadlock detection does.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Hardware recommendations

2010-12-08 Thread Benjamin Krajmalnik
I need to build a new high performance server to replace our current production 
database server.
The current server is a SuperMicro 1U with 2 RAID-1 containers (one for data, 
one for log, SAS - data is 600GB, Logs 144GB), 16GB of RAM, running 2 quad core 
processors (E5405 @ 2GHz), Adaptec 5405 Controller with BBU.  I am already 
having serious I/O bottlenecks with iostat -x showing extended periods where 
the disk subsystem on the data partition (the one with all the random i/o) at 
over 85% busy.  The system is running FreeBSD 7.2 amd64 and PostgreSQL 8.4.4 on 
amd64-portbld-freebsd7.2, compiled by GCC cc (GCC) 4.2.1 20070719  [FreeBSD], 
64-bit.
Currently I have about 4GB of shared memory allocated to PostgreSQL.  Database 
is currently about 80GB, with about 60GB being in partitioned tables which get 
rotated nightly to purge old data (sort of like a circular buffer of historic 
data).

I was looking at one of the machines which Aberdeen has (the X438), and was 
planning on something along the lines of 96GB RAM with 16 SAS drives (15K).  
If I create a RAID 10 (stripe of mirrors), leaving 2 hot spares, should I still 
place the logs in a separate RAID-1 mirror, or can they be left on the same 
RAID-10 container?
On the processor front, are there advantages to going to X series processors as 
opposed to the E series (especially since I am I/O bound)?  Is anyone running 
this type of hardware, especially on FreeBSD?  Any opinions, especially 
concerning the Areca controllers which they use?

The new box would ideally be built with the latest released version of FreeBSD, 
PG 9.x.  Also, is anyone running the 8.x series of FreeBSD with PG 9 in a high 
throughput production environment?  I will be upgrading one of our test servers 
in one week to this same configuration to test out, but just wanted to make 
sure there aren't any caveats others have experienced, especially as it 
pertains to the autovacuum not launching worker processes, which I have 
experienced.

Best regards,

Benjamin 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware recommendations

2010-12-08 Thread Andy
If you are IO-bound, you might want to consider using SSD.

A single SSD could easily give you more IOPS than 16 15k SAS in RAID 10.
 
--- On Wed, 12/8/10, Benjamin Krajmalnik k...@servoyant.com wrote:

 From: Benjamin Krajmalnik k...@servoyant.com
 Subject: [PERFORM] Hardware recommendations
 To: pgsql-performance@postgresql.org
 Date: Wednesday, December 8, 2010, 6:03 PM
 I need to build a new high
 performance server to replace our current production
 database server.
 The current server is a SuperMicro 1U with 2 RAID-1
 containers (one for data, one for log, SAS - data is 600GB,
 Logs 144GB), 16GB of RAM, running 2 quad core processors
 (E5405 @ 2GHz), Adaptec 5405 Controller with BBU.  I am
 already having serious I/O bottlenecks with iostat -x
 showing extended periods where the disk subsystem on the
 data partition (the one with all the random i/o) at over 85%
 busy.  The system is running FreeBSD 7.2 amd64 and
 PostgreSQL 8.4.4 on amd64-portbld-freebsd7.2, compiled by
 GCC cc (GCC) 4.2.1 20070719  [FreeBSD], 64-bit.
 Currently I have about 4GB of shared memory allocated to
 PostgreSQL.  Database is currently about 80GB, with about
 60GB being in partitioned tables which get rotated nightly
 to purge old data (sort of like a circular buffer of
 historic data).
 
 I was looking at one of the machines which Aberdeen has
 (the X438), and was planning  on something along the lines
 of 96GB RAM with 16 SAS drives (15K).  If I create a RAID
 10 (stripe of mirrors), leaving 2 hot spares, should I still
 place the logs in a separate RAID-1 mirror, or can they be
 left on the same RAID-10 container?
 On the processor front, are there advantages to going to X
 series processors as opposed to the E series (especially
 since I am I/O bound)?  Is anyone running this type of
 hardware, specially on FreeBSD?  Any opinions, especially
 concerning the Areca controllers which they use?
 
 The new box would ideally be built with the latest released
 version of FreeBSD, PG 9.x.  Also, is anyone running the
 8.x series of FreeBSD with PG 9 in a high throughput
 production environment?  I will be upgrading one of our
 test servers in one week to this same configuration to test
 out, but just wanted to make sure there aren't any caveats
 others have experienced, especially as it pertains with the
 autovacuum not launching worker processes which I have
 experienced.
 
 Best regards,
 
 Benjamin 
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-08 Thread Pierre C

The hardware it
is running on is fairly good, dual Xeon CPUs, 4 GB of RAM, Raid 5.


For a database you'd want to consider replacing the RAID5 with a RAID1 (or  
RAID10). RAID5 is slow for small random updates, which are common in  
databases. Since you probably have enough harddisks anyway, this won't  
cost you. Linux or FreeBSD would also be better choices for postgres  
rather than Windows.


Also, as said, your issue looks very much like a problem in the way your  
application communicates with postgres: if it takes postgres 5 ms to  
process the query and your application gets the result 8 seconds later,  
there is a problem. Note that SQL Server probably takes just a few ms for  
such a simple query too, so you're not really benchmarking SQL Server  
either.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware recommendations

2010-12-08 Thread alaricd


Sent from my android device.

-Original Message-
From: Benjamin Krajmalnik k...@servoyant.com
To: pgsql-performance@postgresql.org
Sent: Wed, 08 Dec 2010 17:14
Subject: [PERFORM] Hardware recommendations


Re: [PERFORM] Hardware recommendations

2010-12-08 Thread Benjamin Krajmalnik
John,

The platform is a network monitoring system, so we have quite a lot of 
inserts/updates (every data point has at least one record insert as well as at 
least 3 record updates).  The management GUI has a lot of selects.  We are 
refactoring the database to some degree to aid in the performance, since the 
performance degradations are correlated to the number of users viewing the 
system GUI.
My biggest concern with SSD drives is their life expectancy, as well as our 
need for relatively high capacity.  From a purely scalability perspective, this 
setup will need to support terabytes of data.  I suppose I could use 
tablespaces to keep the most accessed data on SSD drives and the rest on 
regular drives.
As I stated, I am moving to RAID 10, and was just wondering if the logs should 
still be moved off to different spindles, or whether leaving them on the 
RAID-10 will be fine and not affect performance.

 -Original Message-
 From: John W Strange [mailto:john.w.stra...@jpmchase.com]
 Sent: Wednesday, December 08, 2010 4:32 PM
 To: Benjamin Krajmalnik; pgsql-performance@postgresql.org
 Subject: RE: Hardware recommendations
 
 Ben,
 
 It would help if you could tell us a bit more about the read/write mix
 and transaction requirements. *IF* you are heavy writes I would suggest
 moving off the RAID1 configuration to a RAID10 setup.  I would highly
 suggest looking at SLC based solid state drives or if your budget has
 legs, look at fusionIO drives.
 
 We currently have several setups with two FusionIO Duo cards that
  produce > 2GB/sec reads, and over 1GB/sec writes.  They are pricey
 but, long term cheaper for me than putting SAN in place that can meet
 that sort of performance.
 
 It all really depends on your workload:
 
 http://www.fusionio.com/products/iodrive/ - BEST in slot currently
 IMHO.
 http://www.intel.com/design/flash/nand/extreme/index.htm?wapkw=(X25-E)
 - not a bad alternative.
 
 There are other SSD controllers on the market but I have experience
 with both so I can recommend both as well.
 
 - John
 
 
 
 -Original Message-
 From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
 ow...@postgresql.org] On Behalf Of Benjamin Krajmalnik
 Sent: Wednesday, December 08, 2010 5:04 PM
 To: pgsql-performance@postgresql.org
 Subject: [PERFORM] Hardware recommendations
 
 I need to build a new high performance server to replace our current
 production database server.
 The current server is a SuperMicro 1U with 2 RAID-1 containers (one for
 data, one for log, SAS - data is 600GB, Logs 144GB), 16GB of RAM,
 running 2 quad core processors (E5405 @ 2GHz), Adaptec 5405 Controller
 with BBU.  I am already having serious I/O bottlenecks with iostat -x
 showing extended periods where the disk subsystem on the data partition
 (the one with all the random i/o) at over 85% busy.  The system is
 running FreeBSD 7.2 amd64 and PostgreSQL 8.4.4 on amd64-portbld-
 freebsd7.2, compiled by GCC cc (GCC) 4.2.1 20070719  [FreeBSD], 64-bit.
 Currently I have about 4GB of shared memory allocated to PostgreSQL.
 Database is currently about 80GB, with about 60GB being in partitioned
 tables which get rotated nightly to purge old data (sort of like a
 circular buffer of historic data).
 
 I was looking at one of the machines which Aberdeen has (the X438), and
 was planning  on something along the lines of 96GB RAM with 16 SAS
 drives (15K).  If I create a RAID 10 (stripe of mirrors), leaving 2 hot
 spares, should I still place the logs in a separate RAID-1 mirror, or
 can they be left on the same RAID-10 container?
 On the processor front, are there advantages to going to X series
 processors as opposed to the E series (especially since I am I/O
 bound)?  Is anyone running this type of hardware, specially on
 FreeBSD?  Any opinions, especially concerning the Areca controllers
 which they use?
 
 The new box would ideally be built with the latest released version of
 FreeBSD, PG 9.x.  Also, is anyone running the 8.x series of FreeBSD
 with PG 9 in a high throughput production environment?  I will be
 upgrading one of our test servers in one week to this same
 configuration to test out, but just wanted to make sure there aren't
 any caveats others have experienced, especially as it pertains with the
 autovacuum not launching worker processes which I have experienced.
 
 Best regards,
 
 Benjamin
 
 --
 Sent via pgsql-performance mailing list (pgsql-
 performa...@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 This communication is for informational purposes only. It is not
 intended as an offer or solicitation for the purchase or sale of
 any financial instrument or as an official confirmation of any
 transaction. All market prices, data and other information are not
 warranted as to completeness or accuracy and are subject to change
 without notice. Any comments or statements made herein do not
 necessarily reflect those of JPMorgan Chase 

Re: [PERFORM] Hardware recommendations

2010-12-08 Thread John W Strange
Ben,

It would help if you could tell us a bit more about the read/write mix and 
transaction requirements. *IF* you are heavy writes I would suggest moving off 
the RAID1 configuration to a RAID10 setup.  I would highly suggest looking at 
SLC based solid state drives or if your budget has legs, look at fusionIO 
drives.

We currently have several setups with two FusionIO Duo cards that produce 
> 2GB/sec reads, and over 1GB/sec writes.  They are pricey but, long term, 
cheaper for me than putting a SAN in place that can meet that sort of 
performance.

It all really depends on your workload:

http://www.fusionio.com/products/iodrive/ - BEST in slot currently IMHO.
http://www.intel.com/design/flash/nand/extreme/index.htm?wapkw=(X25-E) - not a 
bad alternative.

There are other SSD controllers on the market but I have experience with both 
so I can recommend both as well.

- John



-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Benjamin Krajmalnik
Sent: Wednesday, December 08, 2010 5:04 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Hardware recommendations

I need to build a new high performance server to replace our current production 
database server.
The current server is a SuperMicro 1U with 2 RAID-1 containers (one for data, 
one for log, SAS - data is 600GB, Logs 144GB), 16GB of RAM, running 2 quad core 
processors (E5405 @ 2GHz), Adaptec 5405 Controller with BBU.  I am already 
having serious I/O bottlenecks with iostat -x showing extended periods where 
the disk subsystem on the data partition (the one with all the random i/o) at 
over 85% busy.  The system is running FreeBSD 7.2 amd64 and PostgreSQL 8.4.4 on 
amd64-portbld-freebsd7.2, compiled by GCC cc (GCC) 4.2.1 20070719  [FreeBSD], 
64-bit.
Currently I have about 4GB of shared memory allocated to PostgreSQL.  Database 
is currently about 80GB, with about 60GB being in partitioned tables which get 
rotated nightly to purge old data (sort of like a circular buffer of historic 
data).

I was looking at one of the machines which Aberdeen has (the X438), and was 
planning  on something along the lines of 96GB RAM with 16 SAS drives (15K).  
If I create a RAID 10 (stripe of mirrors), leaving 2 hot spares, should I still 
place the logs in a separate RAID-1 mirror, or can they be left on the same 
RAID-10 container?
On the processor front, are there advantages to going to X series processors as 
opposed to the E series (especially since I am I/O bound)?  Is anyone running 
this type of hardware, specially on FreeBSD?  Any opinions, especially 
concerning the Areca controllers which they use?

The new box would ideally be built with the latest released version of FreeBSD, 
PG 9.x.  Also, is anyone running the 8.x series of FreeBSD with PG 9 in a high 
throughput production environment?  I will be upgrading one of our test servers 
in one week to this same configuration to test out, but just wanted to make 
sure there aren't any caveats others have experienced, especially as it 
pertains with the autovacuum not launching worker processes which I have 
experienced.

Best regards,

Benjamin 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
This communication is for informational purposes only. It is not
intended as an offer or solicitation for the purchase or sale of
any financial instrument or as an official confirmation of any
transaction. All market prices, data and other information are not
warranted as to completeness or accuracy and are subject to change
without notice. Any comments or statements made herein do not
necessarily reflect those of JPMorgan Chase & Co., its subsidiaries
and affiliates.

This transmission may contain information that is privileged,
confidential, legally privileged, and/or exempt from disclosure
under applicable law. If you are not the intended recipient, you
are hereby notified that any disclosure, copying, distribution, or
use of the information contained herein (including any reliance
thereon) is STRICTLY PROHIBITED. Although this transmission and any
attachments are believed to be free of any virus or other defect
that might affect any computer system into which it is received and
opened, it is the responsibility of the recipient to ensure that it
is virus free and no responsibility is accepted by JPMorgan Chase & 
Co., its subsidiaries and affiliates, as applicable, for any loss
or damage arising in any way from its use. If you received this
transmission in error, please immediately contact the sender and
destroy the material in its entirety, whether in electronic or hard
copy format. Thank you.

Please refer to http://www.jpmorgan.com/pages/disclosures for
disclosures relating to European legal entities.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to 

Re: [PERFORM] Hardware recommendations

2010-12-08 Thread Marti Raudsepp
On Thu, Dec 9, 2010 at 01:26, Andy angelf...@yahoo.com wrote:
 If you are IO-bound, you might want to consider using SSD.

 A single SSD could easily give you more IOPS than 16 15k SAS in RAID 10.

Are there any that don't risk your data on power loss, AND are cheaper
than SAS RAID 10?

Regards,
Marti

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware recommendations

2010-12-08 Thread Andy


  If you are IO-bound, you might want to consider using SSD.
 
  A single SSD could easily give you more IOPS than 16 15k SAS in RAID 10.
 
 Are there any that don't risk your data on power loss, AND are cheaper
 than SAS RAID 10?

Vertex 2 Pro has a built-in supercapacitor to save data on power loss. It's 
spec'd at 50K IOPS and a 200GB one costs around $1,000.


  

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware recommendations

2010-12-08 Thread alaricd


Sent from my android device.

-Original Message-
From: Benjamin Krajmalnik k...@servoyant.com
To: pgsql-performance@postgresql.org
Sent: Wed, 08 Dec 2010 17:14
Subject: [PERFORM] Hardware recommendations


Re: [PERFORM] Performance under contention

2010-12-08 Thread Robert Haas
2010/12/8 Tom Lane t...@sss.pgh.pa.us:
> Robert Haas robertmh...@gmail.com writes:
>> 2010/12/8 Tom Lane t...@sss.pgh.pa.us:
>>> Now, it's possible that you could avoid *ever* needing to search for a
>>> specific PROCLOCK, in which case eliminating the hash calculation
>>> overhead might be worth it.
>
>> That seems like it might be feasible.  The backend that holds the lock
>> ought to be able to find out whether there's a PROCLOCK by looking at
>> the LOCALLOCK table, and the LOCALLOCK has a pointer to the PROCLOCK.
>
> Hm, that is a real good point.  Those shared memory data structures
> predate the invention of the local lock tables, and I don't think we
> looked real hard at whether we should rethink the fundamental
> representation in shared memory given the additional local state.
> The issue though is whether any other processes ever need to look
> at a proc's PROCLOCKs.  I think at least deadlock detection does.

Sure, but it doesn't use the hash table to do it.  All the PROCLOCKs
for any given LOCK are in a linked list; we just walk it.
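
For reference, the walk in question looks roughly like this in the lock
manager (a sketch patterned on the backend's SHM_QUEUE idiom; the names
are approximate, not verbatim source):

/* Visit every PROCLOCK attached to a given LOCK by walking the lock's
 * shared-memory list -- no hash probe needed. */
SHM_QUEUE  *procLocks = &(lock->procLocks);
PROCLOCK   *proclock;

proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
                                     offsetof(PROCLOCK, lockLink));
while (proclock)
{
    /* ... examine proclock->holdMask, proclock->tag.myProc, etc. ... */
    proclock = (PROCLOCK *) SHMQueueNext(procLocks, &proclock->lockLink,
                                         offsetof(PROCLOCK, lockLink));
}

A hash probe, by contrast, has to assemble and hash a tag for the
(PROC, LOCK) pair first, which is exactly the per-lookup overhead being
discussed.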

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware recommendations

2010-12-08 Thread Scott Marlowe
On Wed, Dec 8, 2010 at 5:03 PM, Benjamin Krajmalnik k...@servoyant.com wrote:
> John,
>
> The platform is a network monitoring system, so we have quite a lot of
> inserts/updates (every data point has at least one record insert as well as
> at least 3 record updates).  The management GUI has a lot of selects.  We are
> refactoring the database to some degree to aid in the performance, since the
> performance degradations are correlated to the number of users viewing the
> system GUI.

Scalability here may be better addressed by having something like hot
read-only slaves for the users who want to view data.

> My biggest concern with SSD drives is their life expectancy,

Generally that's not a big issue, especially as the SSDs get larger.
Being able to survive a power loss without corruption is more of an
issue, so if you go SSD get ones with a supercapacitor that can write
out the data before power down.

> as well as our need for relatively high capacity.

Ahhh, capacity is where SSDs start to lose out quickly.  Cheap 10k SAS
drives and less so 15k drives are way less per gigabyte than SSDs, and
you can only fit so many SSDs onto a single controller / in a single
cage before you're broke.

> From a purely scalability perspective, this setup will need to support
> terabytes of data.  I suppose I could use table spaces to use the most
> accessed data in SSD drives and the rest on regular drives.
> As I stated, I am moving to RAID 10, and was just wondering if the logs
> should still be moved off to different spindles, or will leaving them on the
> RAID10 be fine and not affect performance.

With a battery-backed caching RAID controller, it's more important to have
the pg_xlog files on a different partition than on a different RAID set.
I.e. you can have one big RAID set, and set aside the first 100G or so for
pg_xlog.  This has to do with fsync behaviour.  In Linux this is a known
issue; I'm not sure how much of one it would be in BSD.  But you should
test for fsync contention.
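
A crude way to test that (a sketch only; PostgreSQL also ships a
test_fsync tool for more thorough measurement) is to time a stream of
small fsynced writes on a scratch file placed on the candidate partition:

/* Rough fsync-rate probe: 1000 8kB write+fsync cycles. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
    char        buf[8192];
    struct timeval t0, t1;
    int         i;
    int         fd = open("scratch.dat", O_WRONLY | O_CREAT, 0600);

    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    memset(buf, 0, sizeof(buf));

    gettimeofday(&t0, NULL);
    for (i = 0; i < 1000; i++)
    {
        if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
            perror("write");
        if (fsync(fd) != 0)
            perror("fsync");
    }
    gettimeofday(&t1, NULL);

    printf("%.1f fsyncs/sec\n",
           1000.0 / ((t1.tv_sec - t0.tv_sec) +
                     (t1.tv_usec - t0.tv_usec) / 1e6));

    close(fd);
    return 0;
}

Run it on the shared RAID set while the data partition is busy, then on
the proposed pg_xlog partition, and compare the rates.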

As for the Areca controllers, I haven't tested them with the latest
drivers or firmware, but we would routinely get 180 to 460 days of
uptime between lockups on our 1680s we installed 2.5 or so years ago.
Of the two brand new LSI  controllers we installed this summer,
we've had one fail already.  However, the database didn't get
corrupted so not too bad.  My preference still leans towards the
Areca, but no RAID controller is perfect and infallible.

Performance-wise the Areca is still faster than the LSI, and the
newer faster LSI just didn't work with our quad 12 core AMD mobo.
Note that all of that hardware was brand new, so things may have
improved by now.  I have to say Aberdeen took great care of us in
getting the systems up and running.

As for CPUs, almost any modern CPU will do fine.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow BLOBs restoring

2010-12-08 Thread Vlad Arkhipov

08.12.2010 22:46, Tom Lane writes:
> Are you by any chance restoring from an 8.3 or older pg_dump file made
> on Windows?  If so, it's a known issue.

No, I tried Linux only.


> Not without a complete reproducible example ... and not at all if it's
> the known problem.  The fix for that is to update pg_dump to 8.4 or
> later.
I think you can reproduce it. First I created a database full of many
BLOBs on Postgres 8.4.5. Then I created a dump:


pg_dump -F c test > test.backup8

It took about 15 minutes. Then I tried to restore it on Postgres 8.

pg_restore -v -d test2 -j 2 test.backup8

It restored in 18 minutes. Then I restored it to Postgres 9.0.1, it took 
20 minutes. Then I created a dump there:


/usr/pgsql-9.0/bin/pg_dump -F c test > test.backup9

It took 25 minutes. Finally I tried to restore it and got what I've 
already described:


/usr/pgsql-9.0/bin/pg_restore -v -d test2 -j 2 test.backup9

However if I remove the option '-j', the database restores in 45 minutes.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] libpq vs ODBC

2010-12-08 Thread Divakar Singh
Is there any performance penalty when I use ODBC library vs using libpq?

 Best Regards,
Divakar


Re: [PERFORM] libpq vs ODBC

2010-12-08 Thread Alex Goncharov
,--- You/Divakar (Wed, 8 Dec 2010 20:31:30 -0800 (PST)) *
| Is there any performance penalty when I use ODBC library vs using libpq?

In general, yes.

In degenerate cases when most of the work happens in the server, no.

You need to measure in the context of your specific application.
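
To make that concrete, here is a minimal libpq timing harness -- just a
sketch: the conninfo string "dbname=test", the table "t" (assumed to
exist, e.g. CREATE TABLE t(x int)), and the row count are made-up
placeholders.

/*
 * Time N trivial INSERTs over one connection, wall-clock.
 * Build with:  cc demo.c -lpq  (plus -I/-L paths from pg_config).
 */
#include <stdio.h>
#include <sys/time.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn     *conn = PQconnectdb("dbname=test");
    struct timeval t0, t1;
    int         i;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    gettimeofday(&t0, NULL);
    for (i = 0; i < 10000; i++)
    {
        PGresult   *res = PQexec(conn, "INSERT INTO t VALUES (1)");

        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "insert failed: %s", PQerrorMessage(conn));
        PQclear(res);
    }
    gettimeofday(&t1, NULL);

    printf("10000 inserts: %.3f sec\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6);

    PQfinish(conn);
    return 0;
}

Run the same loop through your ODBC driver and compare; the gap is mostly
per-call overhead, so it shrinks as each statement does more server-side
work.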

-- Alex -- alex-goncha...@comcast.net --

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] libpq vs ODBC

2010-12-08 Thread Divakar Singh
So it means there will be a visible impact if the nature of the DB
interaction is DB insert/select. We do that mostly in my app.
The performance difference would be negligible if the query is
server-intensive, where execution time is far more than the time taken by
e.g. the communication interface or transaction handling.
Am I right?

 Best Regards,
Divakar





From: Alex Goncharov alex-goncha...@comcast.net
To: Divakar Singh dpsma...@yahoo.com
Cc: pgsql-performance@postgresql.org
Sent: Thu, December 9, 2010 10:31:17 AM
Subject: Re: [PERFORM] libpq vs ODBC


Re: [PERFORM] Hardware recommendations

2010-12-08 Thread mark

-Original Message-
From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Andy
Sent: Wednesday, December 08, 2010 5:24 PM
To: Marti Raudsepp
Cc: pgsql-performance@postgresql.org; Benjamin Krajmalnik
Subject: Re: [PERFORM] Hardware recommendations



>>> If you are IO-bound, you might want to consider using SSD.
>>>
>>> A single SSD could easily give you more IOPS than 16 15k SAS in RAID 10.
>>
>> Are there any that don't risk your data on power loss, AND are cheaper
>> than SAS RAID 10?
>
> Vertex 2 Pro has a built-in supercapacitor to save data on power loss. It's
> spec'd at 50K IOPS and a 200GB one costs around $1,000.


Viking offers 6Gbps SAS physical connector SSD drives as well - with a
supercapacitor.

I have not seen any official pricing yet, but I would suspect it would be in
the same ballpark.

I am currently begging to get some for eval. I will let everyone know if I
swing that and can post numbers.

-mark


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow BLOBs restoring

2010-12-08 Thread Tom Lane
Vlad Arkhipov arhi...@dc.baikal.ru writes:
> 08.12.2010 22:46, Tom Lane writes:
>> Are you by any chance restoring from an 8.3 or older pg_dump file made
>> on Windows?  If so, it's a known issue.

> No, I tried Linux only.

OK, then it's not the missing-data-offsets issue.

> I think you can reproduce it. First I created a database full of many
> BLOBs on Postgres 8.4.5. Then I created a dump:

Oh, you should have said how many was "many".  I had tried with several
thousand large blobs yesterday and didn't see any problem.  However,
with several hundred thousand small blobs, indeed it gets pretty slow
as soon as you use -j.

oprofile shows all the time is going into reduce_dependencies during the
first loop in restore_toc_entries_parallel (ie, before we've actually
started doing anything in parallel).  The reason is that for each blob,
we're iterating through all of the several hundred thousand TOC entries,
uselessly looking for anything that depends on the blob.  And to add
insult to injury, because the blobs are all marked as SECTION_PRE_DATA,
we don't get to parallelize at all.  I think we won't get to parallelize
the blob data restoration either, since all the blob data is hidden in a
single TOC entry :-(

So the short answer is "don't bother to use -j in a mostly-blobs restore",
because it isn't going to help you in 9.0.

One fairly simple, if ugly, thing we could do about this is skip calling
reduce_dependencies during the first loop if the TOC object is a blob,
effectively assuming that nothing could depend on a blob (see the sketch
after the list below).  But that does
nothing about the point that we're failing to parallelize blob
restoration.  Right offhand it seems hard to do much about that without
some changes to the archive representation of blobs.  Some things that
might be worth looking at for 9.1:

* Add a flag to TOC objects saying this object has no dependencies,
to provide a generalized and principled way to skip the
reduce_dependencies loop.  This is only a good idea if pg_dump knows
that or can cheaply determine it at dump time, but I think it can.

* Mark BLOB TOC entries as SECTION_DATA, or somehow otherwise make them
parallelizable.  Also break the BLOBS data item apart into an item per
BLOB, so that that part's parallelizable.  Maybe we should combine the
metadata and data for each blob into one TOC item --- if we don't, it
seems like we need a dependency, which will put us back behind the
eight-ball.  I think the reason it's like this is we didn't originally
have a separate TOC item per blob; but now that we added that to support
per-blob ACL data, the monolithic BLOBS item seems pretty pointless.
(Another thing that would have to be looked at here is the dependency
between a BLOB and any BLOB COMMENT for it.)
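
For concreteness, the short-term skip would look roughly like this in the
first pass over the TOC (an illustrative sketch only -- the names
approximate pg_backup_archiver.c, and this is not a verbatim patch):

TocEntry   *te;

/* Blobs are assumed to have no dependents, so don't make
 * reduce_dependencies scan the whole TOC on their behalf. */
for (te = AH->toc->next; te != AH->toc; te = te->next)
{
    if (strcmp(te->desc, "BLOB") == 0)
        continue;
    reduce_dependencies(AH, te);
}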

Thoughts?

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] libpq vs ODBC

2010-12-08 Thread Alex Goncharov
,--- You/Divakar (Wed, 8 Dec 2010 21:17:22 -0800 (PST)) *
| So it means there will be visible impact if the nature of DB interaction
| is DB insert/select. We do that mostly in my app.

You can't say "a visible impact" unless you can measure it in your
specific application.

Let's say ODBC takes 10 times the .001 sec libpq needs.  Is this a
"visible impact"?

| Performance difference would be negligible if the query is server intensive
| where execution time is far more than time taken by e.g. communication
| interface or transaction handling.
| Am I right?

You've got to measure -- there are too many variables to give you the
answer you are trying to get.

To a different question, "Would I use ODBC to work with PostgreSQL if
I had the option of using libpq?", I'd certainly answer "No".

You'd need to have the option of using libpq, though.  ODBC takes care
of a lot of difficult details for you, and libpq's higher performance
may turn out to be a loss for you, in your specific situation.

-- Alex -- alex-goncha...@comcast.net --


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] libpq vs ODBC

2010-12-08 Thread Divakar Singh
Hmm. If I understand it correctly, your argument is valid from a
performance point of view.
But in practical scenarios it would make more sense to use ODBC if the
difference is only 5% or so, because it opens up so many choices of
databases for me.
Do we have any published data in this area?


 Best Regards,
Divakar





From: Alex Goncharov alex-goncha...@comcast.net
To: Divakar Singh dpsma...@yahoo.com
Cc: alex-goncha...@comcast.net; pgsql-performance@postgresql.org
Sent: Thu, December 9, 2010 11:21:26 AM
Subject: Re: [PERFORM] libpq vs ODBC


Re: [PERFORM] libpq vs ODBC

2010-12-08 Thread Pavel Stehule
Hello

2010/12/9 Divakar Singh dpsma...@yahoo.com:
> Hmm. If I understand it correctly, your argument is valid from a
> performance point of view.
> But in practical scenarios it would make more sense to use ODBC if the
> difference is only 5% or so, because it opens up so many choices of
> databases for me.
> Do we have any published data in this area?


It depends on your environment - VB or VBA has no native driver, so you
have to use ODBC. The overhead of ODBC, ADO or ADO.NET is insignificant
for almost any task, so people use it. Performance problems show up only
in some special tasks, and then it is necessary to use stored procedures.

Regards

Pavel Stehule


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance