Re: [PERFORM] adding foreign key constraint locks up table

2011-01-04 Thread Florian Weimer
* Tom Lane:

>> Do you mean that the ALTER query and subsequent queries are shown as
>> "waiting" in pg_stat_activity?  In this case, I'm also wondering why
>> this is inecessary.
>
> ALTER ADD FOREIGN KEY must lock both tables to add triggers to them.

But why is such a broad lock needed?  If the table was created in the
current transaction and is empty, the contents of the foreign key
table should not matter.

-- 
Florian Weimer
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] concurrent IO in postgres?

2011-01-04 Thread Greg Smith

Jeff Janes wrote:

There are parameters governing how likely it is that bgwriter falls
behind in the first place, though.

http://www.postgresql.org/docs/9.0/static/runtime-config-resource.html

In particular bgwriter_lru_maxpages could be made bigger and/or
bgwriter_delay smaller.
  


Also, one of the structures used for caching the list of fsync requests 
the background writer is handling, the thing that results in backend 
writes when it can't keep up, is proportional to the size of 
shared_buffers on the server.  Setting that tunable to a reasonable size 
and lowering bgwriter_delay are two things that help most for the 
background writer to keep up with overall load rather than having 
backends write their own buffers.  And the way checkpoints in PostgreSQL 
work, having more backend writes is generally not a performance 
improving change, even though it does have the property that it gets 
more processes writing at once.


The thread opening post here really didn't discuss if any PostgreSQL 
server tuning or OS tuning was done to try and optimize performance.  
The usual list at 
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server is 
normally a help.


At the kernel level, the #1 thing I find necessary to get decent bulk 
performance in a lot of situations is proper read-ahead.  On Linux for 
example, you must get the OS doing readahead to compensate for the fact 
that PostgreSQL is issuing requests in a serial sequence.  It's going to 
ask for block #1, then block #2, then block #3, etc.  If the OS doesn't 
start picking up on that pattern and reading blocks 4, 5, 6, etc. before 
the server asks for them, to keep the disk fully occupied and return the 
database data fast from the kernel buffers, you'll never reach the full 
potential even of a regular hard drive.  And the default readahead on 
Linux is far too low for modern hardware.



But bulk copy binary might use a nondefault allocation strategy, and I
don't know enough about that part of the code to assess the
interaction of that with bgwriter.
  


It's documented pretty well in src/backend/storage/buffer/README , 
specifically the "Buffer Ring Replacement Strategy" section.  Sequential 
scan reads, VACUUM, COPY IN, and CREATE TABLE AS SELECT are the 
operations that get one of the more specialized buffer replacement 
strategies.  These all use the same basic approach, which is to re-use a 
ring of data rather than running rampant over the whole buffer cache.  
The main thing different between them is the size of the ring.  Inside 
freelist.c the GetAccessStrategy code lets you see the size you get in 
each of these modes.


Since PostgreSQL reads and writes through the OS buffer cache in 
addition to its own shared_buffers pool, this whole ring buffer thing 
doesn't protect the OS cache from being trashed by a big bulk 
operation.  Your only real defense there is to make shared_buffers large 
enough that it retains a decent chunk of data even in the wake of that.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL

2011-01-04 Thread Greg Smith

nextage Tech wrote:

How good are certmagic.com practice exams for PostgreSQL
 exam? My friends told me they are pretty good and he has passed many
exams with their material. Let me know guys!
  


Well, since the PostgreSQL CE 8 Silver exam itself isn't very well 
defined as an industry certification goes, whether or not Certmagic's 
prep helps you pass or not isn't too exciting to talk about.  Unlike 
most of the commercial databases, there is no official certification 
available for PostgreSQL from the vendor, as there is no real vendor 
here.  SRA does their CE certification, EnterpriseDB has some 
certification exams related to their training, and we're happy to print 
certificates for students who pass our training classes too.  But 
without any standardized testing guidelines that go beyond tests 
developed by individual training companies, PostgreSQL certification 
really doesn't result in the same sort of credibility that, say, Oracle 
or Cisco certification does.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] adding foreign key constraint locks up table

2011-01-04 Thread Tom Lane
Florian Weimer  writes:
> * Tom Lane:
>> ALTER ADD FOREIGN KEY must lock both tables to add triggers to them.

> But why is such a broad lock needed?  If the table was created in the
> current transaction and is empty, the contents of the foreign key
> table should not matter.

It's not about content, it's about having reproducible results.  We
cannot commit an ADD TRIGGER operation when there are table-modifying
queries already in progress, because they might (will) fail to notice
the trigger.  If you don't believe this is a problem, consider the
following sequence of events:

1. Session 1 issues "DELETE FROM pk WHERE true".  It fetches the table
definition, sees there are no triggers, and begins to execute the
DELETE.  Now it goes to sleep for awhile.

2. Session 2 issues ALTER TABLE fk ADD FOREIGN KEY pk.  If it doesn't
take a lock on pk that would exclude the concurrent DELETE, it can fall
through and commit before session 1 makes any more progress.

3. Session 2 inserts some rows in fk.  They are valid since the matching
rows in pk are valid (and not yet even marked for deletion).

4. Session 1 wakes up and finishes its DELETE.  Not knowing there is any
committed trigger on pk, it performs no FK checking.

Now you have rows in fk that violate the foreign key constraint.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Question: BlockSize > 8192 with FusionIO

2011-01-04 Thread Merlin Moncure
On Mon, Jan 3, 2011 at 9:13 PM, Greg Smith  wrote:
> Strange, John W wrote:
>>
>> Has anyone had a chance to recompile and try larger a larger blocksize
>> than 8192 with pSQL 8.4.x?
>
> While I haven't done the actual experiment you're asking about, the problem
> working against you here is how WAL data is used to protect against partial
> database writes.  See the documentation for full_page_writes at
> http://www.postgresql.org/docs/current/static/runtime-config-wal.html
>  Because full size copies of the blocks have to get written there, attempts
> to chunk writes into larger pieces end up requiring a correspondingly larger
> volume of writes to protect against partial writes to those pages.  You
> might get a nice efficiency gain on the read side, but the situation when
> under a heavy write load (the main thing you have to be careful about with
> these SSDs) is much less clear.

most flash drives, especially mlc flash, use huge blocks anyways on
physical level.  the numbers claimed here
(http://www.fusionio.com/products/iodrive/)  (141k write iops) are
simply not believable without write buffering.  i didn't see any note
of how fault tolerance is maintained through the buffer (anyone
know?).

assuming they do buffer, i would expect a smaller blocksize would be
better/easier on the ssd, since this would mean less gross writing,
higher maximum throughput, and less wear and tear on the flash; the
advantages of the larger blocksize are very hardware driven and
already managed by the controller.

if they don't buffer (again, I'm very skeptical this is the case), a
larger block size, possibly even a much larger block size (like 256k)
would be an interesting test.  i'm pretty skeptical about the fusion
i/o product generally, because i don't think the sata interface is a
bottleneck save for read caching, and the o/s is already buffering
reads.  the storage medium is still the bottleneck for the most part
(even if it's much faster at certain things).   note fusion is still
useful for some things, but the nice is narrower than it looks on the
surface.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Same stament sometime fast, something slow

2011-01-04 Thread Fernando Mertins
Hi there, this is my first post.

I have a PostgreSQL 8.3.6, compiled by Visual C++ build 1400 running on
Windows with virtual machine, 2 GB ram, and configured the postgresql.conf
file to log statements duration >= 500 ms.

And I have this query/log entry:
2011-01-03 23:06:29 BRT LOG:  duration: 2843.000 ms  statement: SELECT
DESCRICAO FROM CURSO WHERE CODCURSO = 2

My question is, this same query executes many times a day and many times
fast/normal, but why in some cases its run slowly? Especialy because the
"CODCURSO" column is PK and this table has only 3 registers (tiny table).

Thank you in advance!
Fernando


Re: [PERFORM] Same stament sometime fast, something slow

2011-01-04 Thread Kevin Grittner
Fernando Mertins  wrote:
 
> I have a PostgreSQL 8.3.6
 
You should consider upgrading to the latest minor release:
 
http://www.postgresql.org/support/versioning
 
http://www.postgresql.org/docs/8.3/static/release.html
 
> My question is, this same query executes many times a day and many
> times fast/normal, but why in some cases its run slowly? Especialy
> because the "CODCURSO" column is PK and this table has only 3
> registers (tiny table).
 
Two common causes for this are blocking and overloading the I/O
system at checkpoint.  You might want to turn on logging of
checkpoints to see if this happens only during checkpoints.  See this
page for techniques to look at blocking:
 
http://wiki.postgresql.org/wiki/Lock_Monitoring
 
If neither of these helps, please review this page and post again
with more details:
 
http://wiki.postgresql.org/wiki/Guide_to_reporting_problems
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Question: BlockSize > 8192 with FusionIO

2011-01-04 Thread Ben Chobot

On Jan 4, 2011, at 8:48 AM, Merlin Moncure wrote:

> 
> most flash drives, especially mlc flash, use huge blocks anyways on
> physical level.  the numbers claimed here
> (http://www.fusionio.com/products/iodrive/)  (141k write iops) are
> simply not believable without write buffering.  i didn't see any note
> of how fault tolerance is maintained through the buffer (anyone
> know?).

FusionIO buffers. They have capacitors onboard to protect against crashing and 
power failure. They passed our crash attempts to corrupt writes to them before 
we put them into production, for whatever that's worth, but they do take a long 
time to come back online after an unclean shutdown.
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Question: BlockSize > 8192 with FusionIO

2011-01-04 Thread Strange, John W
This has gotten a lot better with the 2.x drivers as well.

I'm completely aware of the FusionIO and it's advantages/disadvantages.. I'm 
currently getting the following pgbench results but still only hitting the 
array for about 800MB/sec, short of the 3GB/sec that it's capable of.  This is 
simply a trash DB for us to store results in for short periods of time.  If 
something bad was to happen we can regenerate the results.  So performance with 
limited risk is what we are looking to achieve. 

asgp...@ash01_riskresults $ pgbench -v -j 4 -t 20 -c 16 -h localhost -p 
4410 pgbench_1
starting vacuum...end.
starting vacuum pgbench_accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 16
number of threads: 4
number of transactions per client: 20
number of transactions actually processed: 320/320
tps = 16783.841042 (including connections establishing)
tps = 16785.592722 (excluding connections establishing)


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Ben Chobot
Sent: Tuesday, January 04, 2011 12:37 PM
To: Merlin Moncure
Cc: pgsql-performance@postgresql.org Performance
Subject: Re: [PERFORM] Question: BlockSize > 8192 with FusionIO


On Jan 4, 2011, at 8:48 AM, Merlin Moncure wrote:

> 
> most flash drives, especially mlc flash, use huge blocks anyways on
> physical level.  the numbers claimed here
> (http://www.fusionio.com/products/iodrive/)  (141k write iops) are
> simply not believable without write buffering.  i didn't see any note
> of how fault tolerance is maintained through the buffer (anyone
> know?).

FusionIO buffers. They have capacitors onboard to protect against crashing and 
power failure. They passed our crash attempts to corrupt writes to them before 
we put them into production, for whatever that's worth, but they do take a long 
time to come back online after an unclean shutdown.
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
This communication is for informational purposes only. It is not
intended as an offer or solicitation for the purchase or sale of
any financial instrument or as an official confirmation of any
transaction. All market prices, data and other information are not
warranted as to completeness or accuracy and are subject to change
without notice. Any comments or statements made herein do not
necessarily reflect those of JPMorgan Chase & Co., its subsidiaries
and affiliates.

This transmission may contain information that is privileged,
confidential, legally privileged, and/or exempt from disclosure
under applicable law. If you are not the intended recipient, you
are hereby notified that any disclosure, copying, distribution, or
use of the information contained herein (including any reliance
thereon) is STRICTLY PROHIBITED. Although this transmission and any
attachments are believed to be free of any virus or other defect
that might affect any computer system into which it is received and
opened, it is the responsibility of the recipient to ensure that it
is virus free and no responsibility is accepted by JPMorgan Chase &
Co., its subsidiaries and affiliates, as applicable, for any loss
or damage arising in any way from its use. If you received this
transmission in error, please immediately contact the sender and
destroy the material in its entirety, whether in electronic or hard
copy format. Thank you.

Please refer to http://www.jpmorgan.com/pages/disclosures for
disclosures relating to European legal entities.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Question: BlockSize > 8192 with FusionIO

2011-01-04 Thread Strange, John W
Test,

Sorry trying to fix why my email is getting formatted to bits when posting to 
the list.

- John

-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Strange, John W
Sent: Tuesday, January 04, 2011 1:01 PM
To: Ben Chobot; Merlin Moncure
Cc: pgsql-performance@postgresql.org Performance
Subject: Re: [PERFORM] Question: BlockSize > 8192 with FusionIO

This has gotten a lot better with the 2.x drivers as well.



I'm completely aware of the FusionIO and it's advantages/disadvantages.. I'm 
currently getting the following pgbench results but still only hitting the 
array for about 800MB/sec, short of the 3GB/sec that it's capable of.  This is 
simply a trash DB for us to store results in for short periods of time.  If 
something bad was to happen we can regenerate the results.  So performance with 
limited risk is what we are looking to achieve. 



asgp...@ash01_riskresults $ pgbench -v -j 4 -t 20 -c 16 -h localhost -p 
4410 pgbench_1

starting vacuum...end.

starting vacuum pgbench_accounts...end.

transaction type: TPC-B (sort of)

scaling factor: 1

query mode: simple

number of clients: 16

number of threads: 4

number of transactions per client: 20

number of transactions actually processed: 320/320

tps = 16783.841042 (including connections establishing)

tps = 16785.592722 (excluding connections establishing)





-Original Message-

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Ben Chobot

Sent: Tuesday, January 04, 2011 12:37 PM

To: Merlin Moncure

Cc: pgsql-performance@postgresql.org Performance

Subject: Re: [PERFORM] Question: BlockSize > 8192 with FusionIO





On Jan 4, 2011, at 8:48 AM, Merlin Moncure wrote:



> 

> most flash drives, especially mlc flash, use huge blocks anyways on

> physical level.  the numbers claimed here

> (http://www.fusionio.com/products/iodrive/)  (141k write iops) are

> simply not believable without write buffering.  i didn't see any note

> of how fault tolerance is maintained through the buffer (anyone

> know?).



FusionIO buffers. They have capacitors onboard to protect against crashing and 
power failure. They passed our crash attempts to corrupt writes to them before 
we put them into production, for whatever that's worth, but they do take a long 
time to come back online after an unclean shutdown.

-- 

Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)

To make changes to your subscription:

http://www.postgresql.org/mailpref/pgsql-performance

This communication is for informational purposes only. It is not
intended as an offer or solicitation for the purchase or sale of
any financial instrument or as an official confirmation of any
transaction. All market prices, data and other information are not
warranted as to completeness or accuracy and are subject to change
without notice. Any comments or statements made herein do not
necessarily reflect those of JPMorgan Chase & Co., its subsidiaries
and affiliates.



This transmission may contain information that is privileged,
confidential, legally privileged, and/or exempt from disclosure
under applicable law. If you are not the intended recipient, you
are hereby notified that any disclosure, copying, distribution, or
use of the information contained herein (including any reliance
thereon) is STRICTLY PROHIBITED. Although this transmission and any
attachments are believed to be free of any virus or other defect
that might affect any computer system into which it is received and
opened, it is the responsibility of the recipient to ensure that it
is virus free and no responsibility is accepted by JPMorgan Chase &
Co., its subsidiaries and affiliates, as applicable, for any loss
or damage arising in any way from its use. If you received this
transmission in error, please immediately contact the sender and
destroy the material in its entirety, whether in electronic or hard
copy format. Thank you.



Please refer to http://www.jpmorgan.com/pages/disclosures for
disclosures relating to European legal entities.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
This communication is for informational purposes only. It is not
intended as an offer or solicitation for the purchase or sale of
any financial instrument or as an official confirmation of any
transaction. All market prices, data and other information are not
warranted as to completeness or accuracy and are subject to change
without notice. Any comments or statements made herein do not
necessarily reflect those of JPMorgan Chase & Co., its subsidiaries
and affiliates.

This transmission may contain information that is privileged,
confidential, legally privileged, and/or exempt from disclosure
under applicable law. If you are not the intende

Re: [PERFORM] Same stament sometime fast, something slow

2011-01-04 Thread Craig Ringer

On 01/05/2011 05:03 AM, Fernando Mertins wrote:

Hi there, this is my first post.

I have a PostgreSQL 8.3.6, compiled by Visual C++ build 1400 running on
Windows with virtual machine

  

What kind of VM host? where? what else is on the same host?

Your most likely culprit is I/O contention from other guests on the same 
host, possibly combined with I/O queuing policies on the host that 
favour throughput over request latency.


Checkpoints might also be a factor.

--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Same stament sometime fast, something slow

2011-01-04 Thread Greg Smith

Kevin Grittner wrote:

Two common causes for this are blocking and overloading the I/O
system at checkpoint.  You might want to turn on logging of
checkpoints to see if this happens only during checkpoints.  See this
page for techniques to look at blocking:
 
http://wiki.postgresql.org/wiki/Lock_Monitoring
  


I just updated this to mention use of log_lock_waits to help here.  
Looking for patterns in log_min_duration_statement, log_checkpoints, and 
log_lock_waits entries, seeing which tend to happen at the same time, is 
the usual helpful trio to investigate when having intermittent slow 
queries.  Of course, with Windows running on a VM, there's a hundred 
other things that could be causing this completely unrelated to the 
database.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Question: BlockSize > 8192 with FusionIO

2011-01-04 Thread Scott Carey

On Jan 4, 2011, at 8:48 AM, Merlin Moncure wrote:

> On Mon, Jan 3, 2011 at 9:13 PM, Greg Smith  wrote:
>> Strange, John W wrote:
>>> 
>>> Has anyone had a chance to recompile and try larger a larger blocksize
>>> than 8192 with pSQL 8.4.x?
>> 
>> While I haven't done the actual experiment you're asking about, the problem
>> working against you here is how WAL data is used to protect against partial
>> database writes.  See the documentation for full_page_writes at
>> http://www.postgresql.org/docs/current/static/runtime-config-wal.html
>>  Because full size copies of the blocks have to get written there, attempts
>> to chunk writes into larger pieces end up requiring a correspondingly larger
>> volume of writes to protect against partial writes to those pages.  You
>> might get a nice efficiency gain on the read side, but the situation when
>> under a heavy write load (the main thing you have to be careful about with
>> these SSDs) is much less clear.
> 
> most flash drives, especially mlc flash, use huge blocks anyways on
> physical level.  the numbers claimed here
> (http://www.fusionio.com/products/iodrive/)  (141k write iops) are
> simply not believable without write buffering.  i didn't see any note
> of how fault tolerance is maintained through the buffer (anyone
> know?).


Flash may have very large erase blocks -- 4k to 16M, but you can write to it at 
much smaller block sizes sequentially.

It has to delete a block in bulk, but it can write to an erased block bit by 
bit, sequentially (512 or 4096 bytes typically, but some is 8k and 16k).

Older MLC NAND flash could be written to at a couple bytes at a time -- but 
drives today incorporate too much EEC and use larger chunks to do that.  The 
minimum write size now is caused by the EEC requirements and not the physical 
NAND flash requirements.  

So, buffering isn't that big of a requirement with the current LBA > Physical 
translations which change all writes -- random or not -- to sequential writes 
in one erase block.
 But performance if waiting for the write to complete will not be all that 
good, especially with MLC.  Turn off the buffer on an Intel SLC drive for 
example, and write IOPS is cut by 1/3 or more -- to 'only' 1000 or so iops.
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance