Re: [HACKERS] Escaping from blocked send() reprised.

2014-10-02 Thread Kyotaro HORIGUCHI
  Sorry, I missed this message and only caught up when reading your CF
  status mail. I've attached three patches:
 
  Could you let me know how to get the CF status mail?
 
 I think he meant this email I sent last weekend:
 
 http://www.postgresql.org/message-id/542672d2.3060...@vmware.com

I see, that's what I also received. Thank you.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Typo fixes in src/backend/rewrite/rewriteHandler.c

2014-10-02 Thread Etsuro Fujita
Here are the comments in process_matched_tle() in rewriteHandler.c.

883  * such nodes; consider
884  *  UPDATE tab SET col.fld1.subfld1 = x, col.fld2.subfld2 = y
885  * The two expressions produced by the parser will look like
886  *  FieldStore(col, fld1, FieldStore(placeholder, subfld1, x))
887  *  FieldStore(col, fld2, FieldStore(placeholder, subfld2, x))

I think the second one is not correct and should be

FieldStore(col, fld2, FieldStore(placeholder, subfld2, y))

Likewise,

891  *  FieldStore(FieldStore(col, fld1,
892  *FieldStore(placeholder, subfld1, x)),
893  * fld2, FieldStore(placeholder, subfld2, x))

should be

FieldStore(FieldStore(col, fld1,
  FieldStore(placeholder, subfld1, x)),
   fld2, FieldStore(placeholder, subfld2, y))

Patch attached.

Thanks,

Best regards,
Etsuro Fujita
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index cb65c05..93fda07 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -883,13 +883,13 @@ process_matched_tle(TargetEntry *src_tle,
 	 *		UPDATE tab SET col.fld1.subfld1 = x, col.fld2.subfld2 = y
 	 * The two expressions produced by the parser will look like
 	 *		FieldStore(col, fld1, FieldStore(placeholder, subfld1, x))
-	 *		FieldStore(col, fld2, FieldStore(placeholder, subfld2, x))
+	 *		FieldStore(col, fld2, FieldStore(placeholder, subfld2, y))
 	 * However, we can ignore the substructure and just consider the top
 	 * FieldStore or ArrayRef from each assignment, because it works to
 	 * combine these as
 	 *		FieldStore(FieldStore(col, fld1,
 	 *			  FieldStore(placeholder, subfld1, x)),
-	 *   fld2, FieldStore(placeholder, subfld2, x))
+	 *   fld2, FieldStore(placeholder, subfld2, y))
 	 * Note the leftmost expression goes on the inside so that the
 	 * assignments appear to occur left-to-right.
 	 *

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-02 Thread Heikki Linnakangas

On 10/02/2014 02:52 AM, Peter Geoghegan wrote:

Having been surprisingly successful at advancing our understanding of
arguments for and against various approaches to value locking, I
decided to try the same thing out elsewhere. I have created a
general-purpose UPSERT wiki page.

The page is: https://wiki.postgresql.org/wiki/UPSERT


Thanks!

Could you write down all of the discussed syntaxes, using a notation 
similar to the one we use in the manual, with examples of how to use them? 
And some examples of what is possible with some syntaxes and not with 
others? That would make it a lot easier to compare them.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins

2014-10-02 Thread Heikki Linnakangas

On 10/02/2014 03:20 AM, Kevin Grittner wrote:

My only concern from the benchmarks is that it seemed like there
was a statistically significant increase in planning time:

unpatched plan time average: 0.450 ms
patched plan time average:   0.536 ms

That *might* just be noise, but it seems likely to be real.  For
the improvement in run time, I'd put up with an extra 86us in
planning time per hash join; but if there's any way to shave some
of that off, all the better.


The patch doesn't modify the planner at all, so it would be rather 
surprising if it increased planning time. I'm willing to just write that 
off as noise.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins

2014-10-02 Thread Tomas Vondra
On 2 October 2014, 2:20, Kevin Grittner wrote:
 Tomas Vondra t...@fuzzy.cz wrote:
 On 12.9.2014 23:22, Robert Haas wrote:

 My first thought is to revert to NTUP_PER_BUCKET=1, but it's
 certainly arguable. Either method, though, figures to be better than
 doing nothing, so let's do something.

 OK, but can we commit the remaining part first? It'll certainly
 interact with this proposed part, and it's easier to tweak when the code
 is already there, instead of rebasing over and over.

 The patch applied and built without problem, and passed `make
 check-world`.  The only thing that caught my eye was the addition
 of debug code using printf() instead of logging at a DEBUG level.
 Is there any reason for that?

Not really. IIRC the main reason is that the other code in nodeHash.c uses
the same approach.

 I still need to work through the (rather voluminous) email threads
 and the code to be totally comfortable with all the issues, but
 if Robert and Heikki are comfortable with it, I'm not gonna argue.

;-)

 Preliminary benchmarks look outstanding!  I tried to control
 carefully to ensure consistent results (by dropping, recreating,
 vacuuming, analyzing, and checkpointing before each test), but
 there were surprising outliers in the unpatched version.  It turned
 out that it was because slight differences in the random samples
 caused different numbers of buckets for both unpatched and patched,
 but the patched version dealt with the difference gracefully while
 the unpatched version's performance fluctuated randomly.

Can we get a bit more detail on the test cases? I did a number of tests
while working on the patch, and I generally tested two cases:

(a) well-estimated queries (i.e. with nbucket matching NTUP_PER_BUCKET)

(b) mis-estimated queries, with various levels of accuracy (say, 10x -
1000x misestimates)

From the description, it seems you only tested (b) - is that correct?

The thing is that while the resize is very fast and happens only once,
it's not perfectly free. In my tests, this was always more than
compensated by the speedups (even in the weird cases reported by Stephen
Frost), so I think we're safe here.

Also, I definitely recommend testing with different hash table sizes (say,
work_mem=256MB and the hash table just barely fitting in without batching).
The thing is that the effect of CPU caches is very different for small and
large hash tables. (This is not about work_mem alone, but about how much
memory is used by the hash table; according to the results you posted, it
never gets over ~4.5MB.)


You tested against current HEAD, right? This patch was split into three
parts, two of which were already committed (45f6240a and 8cce08f1). The
logic of the patch was that this part might take some time/memory, but
it's compensated for by these other changes. Maybe running the tests on the
original code would be interesting?

Although, if this last part of the patch actually improves the performance
on its own, we're fine - it'll improve it even more compared to the old
code (especially before lowering NTUP_PER_BUCKET from 10 to 1).
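
To make the misestimation point concrete, here is a rough standalone sketch
(a toy model, not the actual ExecChooseHashTableSize code; the numbers are
made up): the bucket count is derived from the *estimated* row count, so an
underestimate directly translates into long bucket chains unless the hash
table is resized at execution time.

#include <stdio.h>

#define NTUP_PER_BUCKET 1

static long
choose_nbuckets(double estimated_tuples)
{
	long		nbuckets = 1;

	/* round up to a power of two */
	while (nbuckets < estimated_tuples / NTUP_PER_BUCKET)
		nbuckets <<= 1;
	return nbuckets;
}

int
main(void)
{
	double		actual = 10 * 1000 * 1000;	/* rows actually hashed */
	long		nbuckets = choose_nbuckets(actual / 100);	/* 100x underestimate */

	printf("%ld buckets, ~%.0f tuples per bucket chain\n",
		   nbuckets, actual / nbuckets);
	return 0;
}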

 My only concern from the benchmarks is that it seemed like there
 was a statistically significant increase in planning time:

 unpatched plan time average: 0.450 ms
 patched plan time average:   0.536 ms

 That *might* just be noise, but it seems likely to be real.  For
 the improvement in run time, I'd put up with an extra 86us in
 planning time per hash join; but if there's any way to shave some
 of that off, all the better.

I agree with Heikki that this is probably noise, because the patch does
not mess with the planner at all.

The only thing I can think of is adding a few fields into
HashJoinTableData. Maybe this makes the structure too large to fit into a
cacheline, or something?

Tomas




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-02 Thread Peter Geoghegan
On Thu, Oct 2, 2014 at 12:07 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Could you write down all of the discussed syntaxes, using a notation similar
 to the one we use in the manual, with examples of how to use them? And some
 examples of what is possible with some syntaxes and not with others? That
 would make it a lot easier to compare them.

I've started off by adding varied examples of the use of the existing
proposed syntax. I'll expand on this soon.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Escaping from blocked send() reprised.

2014-10-02 Thread Kyotaro HORIGUCHI
Hello,

  I propose the attached patch. It adds a new flag ImmediateDieOK, which is a
  weaker form of ImmediateInterruptOK that only allows handling a pending
  die-signal in the signal handler.
  
  Robert, others, do you see a problem with this?
 
 Per se I don't have a problem with it. There does exist the problem that
 the user doesn't get an error message in more cases though. On the other
 hand it's bad if any user can prevent the database from restarting.
 
  Over IM, Robert pointed out that it's not safe to jump out of a signal
  handler with siglongjmp, when we're inside library calls, like in a callback
  called by OpenSSL. But even with current master branch, that's exactly what
  we do. In secure_raw_read(), we set ImmediateInterruptOK = true, which means
  that any incoming signal will be handled directly in the signal handler,
  which can mean elog(ERROR). Should we be worried? OpenSSL might get confused
  if control never returns to the SSL_read() or SSL_write() function that
  called secure_raw_read().
 
 But this is imo prohibitive. Yes, we're doing it for a long while. But
 no, that's not ok. It actually prompted me into prototyping the latch
 thing (in some other thread). I don't think existing practice justifies
 expanding it further.

I see; in that case, this approach seems basically
applicable. But if I understand correctly, this patch seems not
to return out of the OpenSSL code even when the latch was found to be
set in secure_raw_write/read.  I tried setting errno = ECONNRESET
and it went well, but it seems like a bad practice.

secure_raw_write(Port *port, const void *ptr, size_t len)
{
	n = send(port->sock, ptr, len, 0);

	if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
	{
		w = WaitLatchOrSocket(MyProc->procLatch, ...

		if (w & WL_LATCH_SET)
		{
			ResetLatch(MyProc->procLatch);
			/*
			 * Force a return, so interrupts can be processed when not
			 * (possibly) underneath a ssl library.
			 */
			errno = EINTR;
			(return n;  // n is negative)


my_sock_write(BIO *h, const char *buf, int size)
{
	res = secure_raw_write(((Port *) h->ptr), buf, size);
	BIO_clear_retry_flags(h);
	if (res <= 0)
	{
		if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
		{
			BIO_set_retry_write(h);

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication identifiers, take 3

2014-10-02 Thread Heikki Linnakangas

On 09/23/2014 09:24 PM, Andres Freund wrote:

I've previously started two threads about replication identifiers. Check
http://archives.postgresql.org/message-id/20131114172632.GE7522%40alap2.anarazel.de
and
http://archives.postgresql.org/message-id/20131211153833.GB25227%40awork2.anarazel.de
.

They've also been discussed in the course of another thread:
http://archives.postgresql.org/message-id/20140617165011.GA3115%40awork2.anarazel.de


And even earlier here:
http://www.postgresql.org/message-id/flat/1339586927-13156-10-git-send-email-and...@2ndquadrant.com#1339586927-13156-10-git-send-email-and...@2ndquadrant.com
The thread branched a lot; the relevant branch is the one with the subject 
"[PATCH 10/16] Introduce the concept that wal has a 'origin' node"



== Identify the origin of changes ==

Say you're building a replication solution that allows two nodes to
insert into the same table on two nodes. Ignoring conflict resolution
and similar fun, one needs to prevent the same change being replayed
over and over. In logical replication the changes to the heap have to
be WAL logged, and thus the *replay* of changes from a remote node
produce WAL which then will be decoded again.

To avoid that it's very useful to tag individual changes/transactions
with their 'origin'. I.e. mark changes that have been directly
triggered by the user sending SQL as originating 'locally' and changes
originating from replaying another node's changes as originating
somewhere else.

If that origin is exposed to logical decoding output plugins they can
easily check whether to stream out the changes/transactions or not.


It is possible to do this by adding extra columns to every table and
store the origin of a row in there, but that a) permanently needs
storage b) makes things much more invasive.


An origin column in the table itself helps tremendously to debug issues 
with the replication system. In many if not most scenarios, I think 
you'd want to have that extra column, even if it's not strictly required.



What I've previously suggested (and which works well in BDR) is to add
the internal id to the XLogRecord struct. There's 2 free bytes of
padding that can be used for that purpose.


Adding a field to XLogRecord for this feels wrong. This is for *logical* 
replication - why do you need to mess with something as physical as the 
WAL record format?


And who's to say that a node ID is the most useful piece of information 
for a replication system to add to the WAL header. I can easily imagine 
that you'd want to put a changeset ID or something else in there, 
instead. (I mentioned another example of this in 
http://www.postgresql.org/message-id/4fe17043.60...@enterprisedb.com)


If we need additional information added to WAL records, for extensions, 
then that should be made in an extensible fashion. IIRC (I couldn't find 
a link right now), when we discussed the changes to heap_insert et al 
for wal_level=logical, I already argued back then that we should make it 
possible for extensions to annotate WAL records, with things like "this 
is the primary key", or whatever information is needed for conflict 
resolution, or handling loops. I don't like it that we're adding little 
pieces of information to the WAL format, bit by bit.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Escaping from blocked send() reprised.

2014-10-02 Thread Andres Freund
On 2014-10-02 17:47:39 +0900, Kyotaro HORIGUCHI wrote:
 Hello,
 
   I propose the attached patch. It adds a new flag ImmediateDieOK, which is 
   a
   weaker form of ImmediateInterruptOK that only allows handling a pending
   die-signal in the signal handler.
   
   Robert, others, do you see a problem with this?
  
  Per se I don't have a problem with it. There does exist the problem that
  the user doesn't get an error message in more cases though. On the other
  hand it's bad if any user can prevent the database from restarting.
  
   Over IM, Robert pointed out that it's not safe to jump out of a signal
   handler with siglongjmp, when we're inside library calls, like in a 
   callback
   called by OpenSSL. But even with current master branch, that's exactly 
   what
   we do. In secure_raw_read(), we set ImmediateInterruptOK = true, which 
   means
   that any incoming signal will be handled directly in the signal handler,
   which can mean elog(ERROR). Should we be worried? OpenSSL might get 
   confused
   if control never returns to the SSL_read() or SSL_write() function that
   called secure_raw_read().
  
  But this is imo prohibitive. Yes, we're doing it for a long while. But
  no, that's not ok. It actually prompted me into prototyping the latch
  thing (in some other thread). I don't think existing practice justifies
  expanding it further.
 
 I see; in that case, this approach seems basically
 applicable. But if I understand correctly, this patch seems not
 to return out of the OpenSSL code even when the latch was found to be
 set in secure_raw_write/read.

Correct. That's why I think it's the way forward. There's several
problems now where the inability to do real things while reading/writing
is a problem.

  I tried setting errno = ECONNRESET
 and it went well, but it seems like a bad practice.

Where and why did you do that?

 secure_raw_write(Port *port, const void *ptr, size_t len)
 {
   n = send(port->sock, ptr, len, 0);

   if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
   {
     w = WaitLatchOrSocket(MyProc->procLatch, ...

     if (w & WL_LATCH_SET)
     {
       ResetLatch(MyProc->procLatch);
       /*
        * Force a return, so interrupts can be processed when not
        * (possibly) underneath a ssl library.
        */
       errno = EINTR;
       (return n;  // n is negative)


 my_sock_write(BIO *h, const char *buf, int size)
 {
   res = secure_raw_write(((Port *) h->ptr), buf, size);
   BIO_clear_retry_flags(h);
   if (res <= 0)
   {
     if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
     {
       BIO_set_retry_write(h);

Hm, this seems to be, besides one comment, the code from the last patch in my
series. Do you have a particular question about it?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication identifiers, take 3

2014-10-02 Thread Andres Freund
On 2014-10-02 11:49:31 +0300, Heikki Linnakangas wrote:
 On 09/23/2014 09:24 PM, Andres Freund wrote:
 I've previously started two threads about replication identifiers. Check
 http://archives.postgresql.org/message-id/20131114172632.GE7522%40alap2.anarazel.de
 and
 http://archives.postgresql.org/message-id/20131211153833.GB25227%40awork2.anarazel.de
 .
 
 They've also been discussed in the course of another thread:
 http://archives.postgresql.org/message-id/20140617165011.GA3115%40awork2.anarazel.de
 
 And even earlier here:
 http://www.postgresql.org/message-id/flat/1339586927-13156-10-git-send-email-and...@2ndquadrant.com#1339586927-13156-10-git-send-email-and...@2ndquadrant.com
 The thread branched a lot; the relevant branch is the one with the subject
 "[PATCH 10/16] Introduce the concept that wal has a 'origin' node"

Right. Long time ago already ;)

 == Identify the origin of changes ==
 
 Say you're building a replication solution that allows two nodes to
 insert into the same table on two nodes. Ignoring conflict resolution
 and similar fun, one needs to prevent the same change being replayed
 over and over. In logical replication the changes to the heap have to
 be WAL logged, and thus the *replay* of changes from a remote node
 produce WAL which then will be decoded again.
 
 To avoid that it's very useful to tag individual changes/transactions
 with their 'origin'. I.e. mark changes that have been directly
 triggered by the user sending SQL as originating 'locally' and changes
 originating from replaying another node's changes as originating
 somewhere else.
 
 If that origin is exposed to logical decoding output plugins they can
 easily check whether to stream out the changes/transactions or not.
 
 
 It is possible to do this by adding extra columns to every table and
 store the origin of a row in there, but that a) permanently needs
 storage b) makes things much more invasive.
 
 An origin column in the table itself helps tremendously to debug issues with
 the replication system. In many if not most scenarios, I think you'd want to
 have that extra column, even if it's not strictly required.

I don't think you'll have much success convincing actual customers of
that. It's one thing to increase the size of the WAL stream a bit; it's
entirely different to persistently increase the size of all their
tables.

 What I've previously suggested (and which works well in BDR) is to add
 the internal id to the XLogRecord struct. There's 2 free bytes of
 padding that can be used for that purpose.
 
 Adding a field to XLogRecord for this feels wrong. This is for *logical*
 replication - why do you need to mess with something as physical as the WAL
 record format?

XLogRecord isn't all that physical. It doesn't encode anything in that
regard but the fact that there are backup blocks in the record. It's
essentially just an implementation detail of logging. Whether that's
physical or logical doesn't really matter much.

There's basically two primary reasons I think it's a good idea to add it
there:

a) There are many different types of records where it's useful to add the
   origin. Adding the information to all of these will make things more
   complicated, use more space, and be more fragile. And I'm pretty
   sure that the number of things people will want to expose over
   logical replication will increase.

   I know of at least two things that have at least some working code:
   Exposing 2PC to logical decoding to allow optionally synchronous
   replication, and allowing to send transactional/nontransactional
   'messages' via the WAL without writing to a table.

   Now, we could add a framework to attach general information to every
   record - but I have a very hard time seeing how this will be of
   comparable complexity *and* efficiency.

b) It's dead simple with a pretty darn low cost. Both from a runtime as
   well as a maintenance perspective.

c) There needs to be crash recovery interaction anyway to compute the
   state of how far replication succeeded before crashing. So it's not
   like we could make this completely extensible without core code
   knowing.
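
For reference, here is roughly what that "two padding bytes" proposal
amounts to on the 9.4-era record header (an illustrative sketch only, not
committed code; the RepOriginId/xl_origin_id names are assumptions, and
the usual PostgreSQL typedefs such as uint16, TransactionId, RmgrId,
XLogRecPtr and pg_crc32 are taken as given):

typedef uint16 RepOriginId;

#define InvalidRepOriginId	0	/* locally originated changes */

typedef struct XLogRecord
{
	uint32		xl_tot_len;		/* total len of entire record */
	TransactionId xl_xid;		/* xact id */
	uint32		xl_len;			/* total len of rmgr data */
	uint8		xl_info;		/* flag bits */
	RmgrId		xl_rmid;		/* resource manager for this record */
	RepOriginId	xl_origin_id;	/* was: 2 unused bytes of alignment padding */
	XLogRecPtr	xl_prev;		/* ptr to previous record in log */
	pg_crc32	xl_crc;			/* CRC for this record */
} XLogRecord;

An output plugin could then compare xl_origin_id against the local id and
skip records from other origins before doing any decoding work, which is
the kind of pre-decoding filter described above.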

 And who's to say that a node ID is the most useful piece of information for
 a replication system to add to the WAL header. I can easily imagine that
 you'd want to put a changeset ID or something else in there, instead. (I
 mentioned another example of this in
 http://www.postgresql.org/message-id/4fe17043.60...@enterprisedb.com)

I'm on board with adding an extensible facility to attach data to
successful transactions. There've been at least two people asking me
directly about how to e.g. attach user information to transactions.

I don't think that's equivalent to what I'm talking about here
though. One important thing about this proposal is that it allows
completely skipping (nearly, except cache inval) all records with an
uninteresting origin id *before* decoding them. Without having to keep
any per-transaction state about 'uninteresting' 

Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-02 Thread Andres Freund
On 2014-10-01 18:19:05 +0200, Ilya Kosmodemiansky wrote:
 I have a patch which is actually not commitfest-ready now, but it
 always better to start discussing proof of concept having some patch
 instead of just an idea.

That's a good way to start work on a topic like this.

 From an Oracle DBA's point of view, currently we have a lack of
 performance diagnostics tools.

Not just from an Oracle DBA POV ;). Generally.

So I'm happy to see some focus on this!

 By that, they principally mean an
 Oracle Wait Interface analogue. The basic idea is to have counters or
 sensors all around the database kernel to measure what a particular
 backend is currently waiting for and how long/how often it waits.

Yes, I can see that. I'm not sure whether lwlocks are the primary point
I'd start with though. In many cases you'll wait on so called
'heavyweight' locks too...

 Suppose we have a PostgreSQL instance under heavy write workload, but
 we do not know any details. We could poll the pg_stat_lwlock function
 from time to time, and it would say pid n1 is currently in
 WALWriteLock and pid n2 in WALInsertLock. That means we should think
 about write-ahead log tuning. Or pid n1 is in some clog-related
 LWLock, which means we need to move clog to a ramdisk. This is a simplistic
 example, but it shows how useful LWLock tracing could be for DBAs.
 An even better idea is to collect the daily LWLock distribution, find the
 most frequent locks, etc.

I think it's more complicated than that - but I also think it'd be a
great help for DBAs and us postgres hackers.

 The idea of this patch is to trace LWLocks with the lowest possible
 performance impact. We put an integer lwLockID into the procarray; when
 acquiring an LWLock we store its id there, and we can then poll the
 procarray using a function to see whether a particular pid holds an LWLock.

But a backend can hold more than one lwlock at the same time? I don't
think that's something we can ignore.

 Not
 perfect, but if we sometimes see that somebody consumes a lot of a particular
 LWLock, we could investigate the matter more precisely using
 another tool. Something like that was implemented in the attached
 patch:
 
 issuing pgbench  -c 50 -t 1000 -j 50
 
 we have something like that:
 
 postgres=# select now(),* from pg_stat_lwlock ;
               now               | lwlockid | pid  
 --------------------------------+----------+------
  2014-10-01 15:11:43.848765+02  |       57 | 4257
 (1 row)

Hm. So you just collect the lwlockid and the pid? That doesn't sound
particularly interesting to me. In my opinion, you'd need at least:
* pid
* number of exclusive/shared acquisitions
* number of exclusive/shared acquisitions that had to wait
* total wait time of exclusive/shared acquisitions
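
As a rough illustration of the kind of per-backend bookkeeping that implies
(a sketch only; the struct and field names are assumptions, not part of any
patch, and uint64 is taken as the usual PostgreSQL typedef), roughly 48
bytes of counters per lwlock:

typedef struct LWLockWaitStats
{
	uint64		sh_acquire_count;	/* shared acquisitions */
	uint64		ex_acquire_count;	/* exclusive acquisitions */
	uint64		sh_wait_count;		/* shared acquisitions that had to wait */
	uint64		ex_wait_count;		/* exclusive acquisitions that had to wait */
	uint64		sh_wait_time_us;	/* total shared wait time, in microseconds */
	uint64		ex_wait_time_us;	/* total exclusive wait time, in microseconds */
} LWLockWaitStats;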

 Questions.
 
 1. I've decided to put pg_stat_lwlock into extension pg_stat_lwlock
 (simply for test purposes). Is it OK, or better to implement it
 somewhere inside pg_catalog or in another extension (for example
 pg_stat_statements)?

I personally am doubtful that it makes much sense to move this into an
extension. It'll likely be tightly enough interlinked to backend code
that I don't see the point. But I'd not be surprised if others feel
differently.

I generally don't think you'll get interesting data without a fair bit
of additional work.

The first problem that comes to my mind about collecting enough data is
that we have a very large number of lwlocks (fixed_number + 2 *
shared_buffers). One 'trivial' way of implementing this is to have a per
backend array collecting the information, and then a shared one
accumulating data from it over time. But I'm afraid that's not going to
fly :(. Hm. With the above sets of stats that'd be ~50MB per backend...
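
As a back-of-the-envelope check of that figure, under assumed sizes (a
sketch, not a measurement: shared_buffers = 4GB and roughly 48 bytes of
counters per lwlock, as in the struct above):

#include <stdio.h>

int
main(void)
{
	long long	buffers = 4LL * 1024 * 1024 * 1024 / 8192;	/* 4GB of 8kB pages */
	long long	nlocks = 2 * buffers + 10000;	/* 2 per buffer + assumed fixed locks */
	long long	bytes = nlocks * 48;			/* six 8-byte counters per lwlock */

	printf("%lld lwlocks, ~%.1f MB of stats per backend\n",
		   nlocks, bytes / (1024.0 * 1024.0));
	return 0;
}

That prints about 48.5 MB for roughly a million lwlocks, which is the order
of magnitude of the concern above.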

Perhaps we should somehow encode this differently for individual lwlock
tranches? It's far less problematic to collect all this information for
all but the buffer lwlocks...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Escaping from blocked send() reprised.

2014-10-02 Thread Kyotaro HORIGUCHI
Hi,

   But this is imo prohibitive. Yes, we're doing it for a long while. But
   no, that's not ok. It actually prompted me into prototyping the latch
   thing (in some other thread). I don't think existing practice justifies
   expanding it further.
  
  I see; in that case, this approach seems basically
  applicable. But if I understand correctly, this patch seems not
  to return out of the OpenSSL code even when the latch was found to be
  set in secure_raw_write/read.
 
 Correct. That's why I think it's the way forward. There's several
 problems now where the inability to do real things while reading/writing
 is a problem.
 
   I tried setting errno = ECONNRESET
  and it went well, but it seems like a bad practice.
 
 Where and why did you do that?

The patch in this message:

http://www.postgresql.org/message-id/20140828.214704.93968088.horiguchi.kyot...@lab.ntt.co.jp

The reason for setting errno (instead of a separate variable) is to
trick OpenSSL (or my_sock_write? I forget which) into behaving as if
the underlying send(2) had returned an error that cannot be retried,
i.e. none of the retryable errnos
(EINTR/EWOULDBLOCK/EAGAIN). In my faint memory, merely avoiding
BIO_set_retry_write() in my_sock_write() doesn't work as expected,
but it might be enough that my_sock_write returns -1 and doesn't
set BIO_set_retry_write().

The reason for ECONNRESET is that none of the other errnos possible for
send(2)(*1) seems to fit the real situation, and the
blocked situation resembles a reset connection in the sense that it
cannot continue to work due to an external condition; ECONNRESET
is also used in be_tls_write() in a similar way.

Come to think of it, maybe setting ECONNRESET is not so evil?


  secure_raw_write(Port *port, const void *ptr, size_t len)
  {
    n = send(port->sock, ptr, len, 0);

    if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
    {
      w = WaitLatchOrSocket(MyProc->procLatch, ...

      if (w & WL_LATCH_SET)
      {
        ResetLatch(MyProc->procLatch);
        /*
         * Force a return, so interrupts can be processed when not
         * (possibly) underneath a ssl library.
         */
        errno = EINTR;
        (return n;  // n is negative)


  my_sock_write(BIO *h, const char *buf, int size)
  {
    res = secure_raw_write(((Port *) h->ptr), buf, size);
    BIO_clear_retry_flags(h);
    if (res <= 0)
    {
      if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
      {
        BIO_set_retry_write(h);
 
 Hm, this seems to be, besides one comment, the code from the last patch in my
 series. Do you have a particular question about it?

I didn't have a particular question about it. This is cited only
in order to show the route to the retry.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench throttling latency limit

2014-10-02 Thread Heikki Linnakangas

On 09/15/2014 08:46 PM, Fabien COELHO wrote:



I'm not sure I like the idea of printing a percentage.  It might be
unclear what the denominator was if somebody feels the urge to work
back to the actual number of skipped transactions.  I mean, I guess
it's probably just the value you passed to -R, so maybe that's easy
enough, but then why bother dividing in the first place?  The user can
do that easily enough if they want the data that way.


 Indeed skipped and late per second may have an unclear denominator. If
 you divide by the time, the unit would be tps, but 120 tps performance
 including 20 late tps, plus 10 skipped tps... I do not think that is very
 clear. Reporting tps for transactions *not* performed looks strange.

 Maybe late transactions could be given as a percentage of all processed
 transactions in the interval. But for skipped, a percentage of what? The
 only number that would make sense is the total number of transactions
 scheduled in the interval, but that would mean that the denominator for
 late would be different from the denominator for skipped, which is
 basically incomprehensible.


Hmm. I guess the absolute number makes sense, if you expect that there 
are normally zero skipped transactions, or at least a very small number. 
It's more like a good or no good indicator. Ok, I'm fine with that.


The version I'm now working on prints output like this:


$ ./pgbench -T10 -P1  --rate=1600 --latency-limit=10
starting vacuum...end.
progress: 1.0 s, 1579.0 tps, lat 2.973 ms stddev 2.493, lag 2.414 ms, 4 skipped
progress: 2.0 s, 1570.0 tps, lat 2.140 ms stddev 1.783, lag 1.599 ms, 0 skipped
progress: 3.0 s, 1663.0 tps, lat 2.372 ms stddev 1.742, lag 1.843 ms, 4 skipped
progress: 4.0 s, 1603.2 tps, lat 2.435 ms stddev 2.247, lag 1.902 ms, 13 skipped
progress: 5.0 s, 1540.9 tps, lat 1.845 ms stddev 1.270, lag 1.303 ms, 0 skipped
progress: 6.0 s, 1588.0 tps, lat 1.630 ms stddev 1.003, lag 1.097 ms, 0 skipped
progress: 7.0 s, 1577.0 tps, lat 2.071 ms stddev 1.445, lag 1.517 ms, 0 skipped
progress: 8.0 s, 1669.9 tps, lat 2.375 ms stddev 1.917, lag 1.846 ms, 0 skipped
progress: 9.0 s, 1636.0 tps, lat 2.801 ms stddev 2.354, lag 2.250 ms, 5 skipped
progress: 10.0 s, 1606.1 tps, lat 2.751 ms stddev 2.117, lag 2.197 ms, 2 skipped
transaction type: TPC-B (sort of)
scaling factor: 5
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 16034
number of transactions skipped: 28 (0.174 %)
number of transactions above the 10.0 ms latency limit: 70 (0.436 %)
latency average: 2.343 ms
latency stddev: 1.940 ms
rate limit schedule lag: avg 1.801 (max 9.994) ms
tps = 1603.370819 (including connections establishing)
tps = 1603.619536 (excluding connections establishing)


Those progress lines are 79 or 80 characters wide, so they *just* fit in 
an 80-char terminal. Of course, if any of the printed numbers were 
higher, it would not fit. I don't see how to usefully make it more 
terse, though. I think we can live with this - these days it shouldn't 
be a huge problem to enlarge the terminal to make the output fit.


Here are new patches, again the first one is just refactoring, and the 
second one contains this feature. I'm planning to commit the first one 
shortly, and the second one later after people have had a chance to look 
at it.


Greg: As the author of pgbench-tools, what do you think of this patch? 
The log file format, in particular.


- Heikki

From 512fde5dc3fde5fc1368b3bf0c09e3ea8e022fad Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakan...@iki.fi
Date: Thu, 2 Oct 2014 12:58:14 +0300
Subject: [PATCH 1/2] Refactor pgbench log-writing code to a separate function.

The doCustom function was incredibly long; this makes it a little bit more
readable.
---
 contrib/pgbench/pgbench.c | 340 +++---
 1 file changed, 169 insertions(+), 171 deletions(-)

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 087e0d3..c14a577 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -347,6 +347,9 @@ static char *select_only = {
 static void setalarm(int seconds);
 static void *threadRun(void *arg);
 
+static void doLog(TState *thread, CState *st, FILE *logfile, instr_time *now,
+	  AggVals *agg);
+
 static void
 usage(void)
 {
@@ -1016,6 +1019,16 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVa
 	PGresult   *res;
 	Command   **commands;
 	bool		trans_needs_throttle = false;
+	instr_time	now;
+
+	/*
+	 * gettimeofday() isn't free, so we get the current timestamp lazily the
+	 * first time it's needed, and reuse the same value throughout this
+	 * function after that. This also ensures that e.g. the calculated latency
+	 * reported in the log file and in the totals are the same. Zero means
+	 * not set yet.
+	 */
+	INSTR_TIME_SET_ZERO(now);
 
 top:
 	commands = sql_files[st->use_file];
@@ -1049,10 +1062,10 @@ top:
 
 	if (st->sleeping)
 	{

Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-02 Thread Ilya Kosmodemiansky
On Thu, Oct 2, 2014 at 11:50 AM, Andres Freund and...@2ndquadrant.com wrote:
 Not just from a oracle DBA POV ;). Generally.

sure

 By that, they principally mean an
 Oracle Wait Interface analogue. The basic idea is to have counters or
 sensors all around the database kernel to measure what a particular
 backend is currently waiting for and how long/how often it waits.

 Yes, I can see that. I'm not sure whether lwlocks are the primary point
 I'd start with though. In many cases you'll wait on so called
 'heavyweight' locks too...


I'm trying to kill two birds with one stone: do some preparatory work
on the main large topic and deliver some convenience for LWLock
diagnostics. Maybe I'm wrong, but it seems to me it is a much easier
task to advocate for a more desired feature: we have some diagnostics
tools for heavyweight locks, and they are better than what we have for lwlocks.



 Suppose we have a PostgreSQL instance under heavy write workload, but
 we do not know any details. We could poll the pg_stat_lwlock function
 from time to time, and it would say pid n1 is currently in
 WALWriteLock and pid n2 in WALInsertLock. That means we should think
 about write-ahead log tuning. Or pid n1 is in some clog-related
 LWLock, which means we need to move clog to a ramdisk. This is a simplistic
 example, but it shows how useful LWLock tracing could be for DBAs.
 An even better idea is to collect the daily LWLock distribution, find the
 most frequent locks, etc.

 I think it's more complicated than that - but I also think it'd be a
 great help for DBAs and us postgres hackers.


Sure, it is more complicated; the example is simplistic, just to show the point.


 The idea of this patch is to trace LWLocks with the lowest possible
 performance impact. We put an integer lwLockID into the procarray; when
 acquiring an LWLock we store its id there, and we can then poll the
 procarray using a function to see whether a particular pid holds an LWLock.

 But a backend can hold more than one lwlock at the same time? I don't
 think that's something we can ignore.


Yes, this is one of the next steps. I have not figured out yet how to do
it less painfully than LWLOCK_STATS does.


 I personally am doubtful that it makes much sense to move this into an
 extension. It'll likely be tightly enough interlinked to backend code
 that I don't see the point. But I'd not be surprised if others feel
 differently.


That's why I asked this question, and also because I have no idea where
exactly to put these functions inside the backend if not into an extension.
But there are probably more important tasks in this work than
moving the function into core, so I could do that later if it turns out to be
necessary.


 I generally don't think you'll get interesting data without a fair bit
 of additional work.

Sure

 The first problem that comes to my mind about collecting enough data is
 that we have a very large number of lwlocks (fixed_number + 2 *
 shared_buffers). One 'trivial' way of implementing this is to have a per
 backend array collecting the information, and then a shared one
 accumulating data from it over time. But I'm afraid that's not going to
 fly :(. Hm. With the above sets of stats that'd be ~50MB per backend...

 Perhaps we should somehow encode this differently for individual lwlock
 tranches? It's far less problematic to collect all this information for
 all but the buffer lwlocks...

That is a good point. There are actually two things to keep in mind:
i) user interface, ii) implementation

i) Personally, as a DBA, I do not see much sense in an unaggregated list
of pid, lwlockid, wait_time or something like that.

Much better to have aggregation by pid and lwlockid, for instance:
- pid
- lwlockid
- lwlockname
- total_count (or the number of exclusive/shared acquisitions that had to
wait, as you suggest; since we have a lot of lwlocks I am doubtful
about how important the information about non-waiting lwlocks is)

ii) Am I correct that you suggest going through MainLWLockTranche and
retrieving all available lwlock information into some structure like the
lwLockCell structure I've used in my patch? Something like a hash of
lwlockid -> usagecount?


Regards,
Ilya


 Greetings,

 Andres Freund

 --
  Andres Freund http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-02 Thread Ilya Kosmodemiansky
On Thu, Oct 2, 2014 at 5:25 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 It's not at all clear to me that a DTrace-like (or perf-based, rather)
 approach is unsafe, slow, or unsuitable for production use.

 With appropriate wrapper tools I think we could have quite a useful
 library of perf-based diagnostics and tracing tools for PostgreSQL.

It is not actually very slow; the overhead is quite reasonable given that
we want such comprehensive performance diagnostics. Regarding stability, I
have had a couple of issues where Postgres crashed with dtrace and did
not crash without it. Most of them were on FreeBSD, which is still in use
by many people, and were actually caused by FreeBSD's dtrace, but for me
that is quite enough to have doubts about keeping a dtrace-aware build in
production.

OK, OK - maybe things have changed in the last couple of years or will
change soon. Still, dtrace/perf is good enough for those who are
familiar with it, but you need a really convenient wrapper to make an
Oracle/DB2 DBA happy with such an approach.


 Resolving lock IDs to names might be an issue, though.

I am afraid it is


 --
  Craig Ringer   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] proposal: doc: simplify examples of dynamic SQL

2014-10-02 Thread Pavel Stehule
Hi

There are a few less readable examples of dynamic SQL in the plpgsql docs,

like:

EXECUTE 'SELECT count(*) FROM '
|| tabname::regclass
|| ' WHERE inserted_by = $1 AND inserted = $2'
   INTO c
   USING checked_user, checked_date;

or

EXECUTE 'UPDATE tbl SET '
|| quote_ident(colname)
|| ' = $'
|| newvalue
|| '$ WHERE key = '
|| quote_literal(keyvalue);

We can show examples based on the format function only:

EXECUTE format('SELECT count(*) FROM %I'
   ' WHERE inserted_by = $1 AND inserted = $2',
tabname)
   INTO c
   USING checked_user, checked_date;

or

EXECUTE format('UPDATE tbl SET %I = %L WHERE key = %L',
               colname, newvalue, keyvalue);
or

EXECUTE format('UPDATE tbl SET %I = %L WHERE key = $1',
               colname, newvalue)
  USING keyvalue;

The old examples are very instructive, but a little bit less readable and maybe
too complex for beginners.

Opinions?

Regards

Pavel


Re: [HACKERS] pgcrypto: PGP signatures

2014-10-02 Thread Marko Tiikkaja

On 10/2/14 1:47 PM, Heikki Linnakangas wrote:

I looked at this briefly, and was surprised that there is no support for
signing a message without encrypting it. Is that intentional? Instead of
adding a function to encrypt and sign a message, I would have expected
this to just add a new function for signing, and you could then pass it
an already-encrypted blob, or plaintext.


Yes, that's intentional.  The signatures are part of the encrypted data 
here, so you can't look at a message and determine who sent it.


There was brief discussion about this upthread (though no one probably 
added any links to those discussions into the commit fest app), and I 
still think that both types of signing would probably be valuable.  But 
this patch is already quite big, and I really have no desire to work on 
this "sign anything" functionality.  The pieces are there, though, so if 
someone wants to do it, I don't see why they couldn't.



.marko


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-02 Thread Stephen Frost
* Craig Ringer (cr...@2ndquadrant.com) wrote:
  The patch https://commitfest.postgresql.org/action/patch_view?id=885
  (discussion starts here I hope -
  http://www.postgresql.org/message-id/4fe8ca2c.3030...@uptime.jp)
  demonstrates performance problems; LWLOCK_STAT, LOCK_DEBUG and
  DTrace-like approaches are slow, unsafe for production use and a bit
  clumsy for a DBA to use.
 
 It's not at all clear to me that a DTrace-like (or perf-based, rather)
 approach is unsafe, slow, or unsuitable for production use.

I've certainly had it take production systems down (perf, specifically),
so I'd definitely consider it unsafe.  I wouldn't say it's unusable,
but it's certainly not what we should have as the end-goal for PG.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-02 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
  1. I've decided to put pg_stat_lwlock into extension pg_stat_lwlock
  (simply for test purposes). Is it OK, or better to implement it
  somewhere inside pg_catalog or in another extension (for example
  pg_stat_statements)?
 
 I personally am doubtful that it makes much sense to move this into an
 extension. It'll likely be tightly enough interlinked to backend code
 that I don't see the point. But I'd not be surprised if others feel
 differently.

I agree that this doesn't make sense as an extension.

 I generally don't think you'll get interesting data without a fair bit
 of additional work.

I'm not sure about this..

 The first problem that comes to my mind about collecting enough data is
 that we have a very large number of lwlocks (fixed_number + 2 *
 shared_buffers). One 'trivial' way of implementing this is to have a per
 backend array collecting the information, and then a shared one
 accumulating data from it over time. But I'm afraid that's not going to
 fly :(. Hm. With the above sets of stats that'd be ~50MB per backend...

I was just going to suggest exactly this- a per-backend array which then
gets pushed into a shared area periodically.  Taking up 50MB per backend
is quite a bit though. :/

 Perhaps we should somehow encode this differently for individual lwlock
 tranches? It's far less problematic to collect all this information for
 all but the buffer lwlocks...

Yeah, that seems like it would at least be a good approach to begin
with.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Replication identifiers, take 3

2014-10-02 Thread Robert Haas
On Thu, Oct 2, 2014 at 4:49 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 An origin column in the table itself helps tremendously to debug issues with
 the replication system. In many if not most scenarios, I think you'd want to
 have that extra column, even if it's not strictly required.

I like a lot of what you wrote here, but I strongly disagree with this
part.  A good replication solution shouldn't require changes to the
objects being replicated.  The triggers that Slony and other current
logical replication solutions use are a problem not only because
they're slow (although that is a problem) but also because they
represent a user-visible wart that people who don't otherwise care
about the fact that their database is being replicated have to be
concerned with.  I would agree that some people might, for particular
use cases, want to include origin information in the table that the
replication system knows about, but it shouldn't be required.

When you look at the replication systems that we have today, you've
basically got streaming replication, which is high-performance and
fairly hands-off (at least once you get it set up properly; that part
can be kind of a bear) but can't cross versions let alone database
systems and requires that the slaves be strictly read-only.  Then on
the flip side you've got things like Slony, Bucardo, and others.  Some
of these allow multi-master; all of them at least allow table-level
determination of which server has the writable copy.  Nearly all of
them are cross-version and some even allow replication into
non-PostgreSQL systems.  But they are slow *and administratively
complex*.  If we're able to get something that feels like streaming
replication from a performance and administrative complexity point of
view but can be cross-version and allow at least some writes on
slaves, that's going to be an epic leap forward for the project.

In my mind, that means it's got to be completely hands-off from a
schema design point of view: you should be able to start up a database
and design it however you want, put anything you like into it, and
then decide later that you want to bolt logical replication on top of
it, just as you can for streaming physical replication.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_background (and more parallelism infrastructure patches)

2014-10-02 Thread Robert Haas
On Wed, Oct 1, 2014 at 4:56 PM, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Mon, Sep 29, 2014 at 12:05 PM, Stephen Frost sfr...@snowman.net wrote:
  Perhaps I'm just being a bit over the top, but all this per-character
  work feels a bit ridiculous..  When we're using MAXIMUM_ALIGNOF, I
  suppose it's not so bad, but is there no hope to increase that and make
  this whole process more efficient?  Just a thought.

 I'm not sure I understand what you're getting at here.

 Was just thinking that we might be able to work out what needs to be
 done without having to actually do it on a per-character basis.  That
 said, I'm not sure it's really worth the effort given that we're talking
 about at most 8 bytes currently.  I had originally been thinking that we
 might increase the minimum size as it might make things more efficient,
 but it's not clear that'd really end up being the case either and,
 regardless, it's probably not worth worrying about at this point.

I'm still not entirely sure we're on the same page.  Most of the data
movement for shm_mq is done via memcpy(), which I think is about as
efficient as it gets.  The detailed character-by-character handling
only really comes up when shm_mq_send() is passed multiple chunks with
lengths that are not a multiple of MAXIMUM_ALIGNOF.  Then we have to
fiddle a bit more.  So making MAXIMUM_ALIGNOF bigger would actually
cause us to do more fiddling, not less.

When I originally designed this queue, I had the idea in mind that
life would be simpler if the queue head and tail pointers always
advanced in multiples of MAXIMUM_ALIGNOF.  That didn't really work out
as well as I'd hoped; maybe I would have been better off having the
queue pack everything in tightly and ignore alignment.  However, there
is some possible efficiency advantage of the present system: when a
message fits in the queue without wrapping, shm_mq_receive() returns a
pointer to the message, and the caller can assume that message is
properly aligned.  If the queue didn't maintain alignment internally,
you'd need to do a copy there before accessing anything inside the
message that might be aligned.  Granted, we don't have any code that
takes advantage of that yet, at least not in core, but the potential
is there.  Still, I may have made the wrong call.  But, I don't think
it's the of this patch to rework the whole framework; we can do that
in the future after this has a few more users and the pros and cons of
various approaches are (hopefully) more clear.  It's not a complex
API, so substituting a different implementation later on probably
wouldn't be too hard.
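
To illustrate the alignment property being described (a toy sketch only,
not the real shm_mq code; the ring layout and names here are assumptions):

#include <stdint.h>
#include <string.h>

#define MAXIMUM_ALIGNOF 8
#define MAXALIGN(LEN) \
	(((uintptr_t) (LEN) + (MAXIMUM_ALIGNOF - 1)) & ~((uintptr_t) (MAXIMUM_ALIGNOF - 1)))

typedef struct toy_queue
{
	size_t		write_off;		/* always kept a multiple of MAXIMUM_ALIGNOF */
	char		buf[8192];		/* ring storage, itself maxaligned */
} toy_queue;

/*
 * Append one length-prefixed message, advancing write_off by maxaligned
 * amounts only.  Because of that, a reader that finds a message which does
 * not wrap around the end of the buffer can be handed a pointer straight
 * into buf[], and that pointer is already suitably aligned.  (Wraparound
 * and flow control, which the real queue handles, are ignored here.)
 */
static void
toy_send(toy_queue *q, const void *data, size_t len)
{
	memcpy(q->buf + q->write_off, &len, sizeof(len));
	memcpy(q->buf + q->write_off + MAXALIGN(sizeof(len)), data, len);
	q->write_off += MAXALIGN(sizeof(len)) + MAXALIGN(len);
}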

Anyway, it sounds like you're on board with me committing the first
patch of the series, so I'll do that next week absent objections.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_background (and more parallelism infrastructure patches)

2014-10-02 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
 On Wed, Oct 1, 2014 at 4:56 PM, Stephen Frost sfr...@snowman.net wrote:
  Was just thinking that we might be able to work out what needs to be
  done without having to actually do it on a per-character basis.  That
  said, I'm not sure it's really worth the effort given that we're talking
  about at most 8 bytes currently.  I had originally been thinking that we
  might increase the minimum size as it might make things more efficient,
  but it's not clear that'd really end up being the case either and,
  regardless, it's probably not worth worrying about at this point.
 
 I'm still not entirely sure we're on the same page.  Most of the data
 movement for shm_mq is done via memcpy(), which I think is about as
 efficient as it gets.

Right- agreed.  I had originally thought we were doing things on a
per-MAXIMUM_ALIGNOF-basis somewhere else, but that appears to be an
incorrect assumption (which I'm glad for).

 The detailed character-by-character handling
 only really comes up when shm_mq_send() is passed multiple chunks with
 lengths that are not a multiple of MAXIMUM_ALIGNOF.  Then we have to
 fiddle a bit more.  So making MAXIMUM_ALIGNOF bigger would actually
 cause us to do more fiddling, not less.

Sorry- those were two independent items.  Regarding the per-character
work, I was thinking we could work out the number of characters which
need to be moved and then use memcpy directly rather than doing the
per-character work, but as noted, we shouldn't be going through that
loop very often or for very many iterations anyway, and we have to deal
with moving forward through the iovs, so we'd still have to have a loop
there regardless.

 When I originally designed this queue, I had the idea in mind that
 life would be simpler if the queue head and tail pointers always
 advanced in multiples of MAXIMUM_ALIGNOF.  That didn't really work out
 as well as I'd hoped; maybe I would have been better off having the
 queue pack everything in tightly and ignore alignment.  However, there
 is some possible efficiency advantage of the present system: when a
 message fits in the queue without wrapping, shm_mq_receive() returns a
 pointer to the message, and the caller can assume that message is
 properly aligned.  If the queue didn't maintain alignment internally,
 you'd need to do a copy there before accessing anything inside the
 message that might be aligned.  Granted, we don't have any code that
 takes advantage of that yet, at least not in core, but the potential
 is there.  Still, I may have made the wrong call.  But, I don't think
  it's the job of this patch to rework the whole framework; we can do that
 in the future after this has a few more users and the pros and cons of
 various approaches are (hopefully) more clear.  It's not a complex
 API, so substituting a different implementation later on probably
 wouldn't be too hard.

Agreed.

 Anyway, it sounds like you're on board with me committing the first
 patch of the series, so I'll do that next week absent objections.

Works for me.

Thanks!

Stephen


signature.asc
Description: Digital signature


[HACKERS] Log notice that checkpoint is to be written on shutdown

2014-10-02 Thread Michael Banck
Hi,

we have seen repeatedly that users can be confused about why PostgreSQL
is not shutting down even though they requested it.  Usually, this is
because `log_checkpoints' is not enabled and the final checkpoint is
being written, delaying shutdown. As no message besides "shutting down"
is written to the server log in this case, we even had users believing
the server was hanging and pondering killing it manually.

In order to alert those users that a checkpoint is being written, I
propose to add a log message "waiting for checkpoint ..." on shutdown,
even if log_checkpoints is disabled, as this particular checkpoint might
be important information.

I've attached a trivial patch for this, should it be added to the next
commitfest?


Cheers,

Michael

-- 
Michael Banck
Projektleiter / Berater
Tel.: +49 (2161) 4643-171
Fax:  +49 (2161) 4643-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5a4dbb9..78483ca 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8085,10 +8085,14 @@ CreateCheckPoint(int flags)
 
 	/*
 	 * If enabled, log checkpoint start.  We postpone this until now so as not
-	 * to log anything if we decided to skip the checkpoint.
+	 * to log anything if we decided to skip the checkpoint.  If we are shutting
+	 * down and checkpoints are not being logged, add a log message that a
+	 * checkpoint is to be written and shutdown is potentially delayed.
 	 */
 	if (log_checkpoints)
 		LogCheckpointStart(flags, false);
+	else if (flags & CHECKPOINT_IS_SHUTDOWN)
+		ereport(LOG, (errmsg("waiting for checkpoint ...")));
 
 	TRACE_POSTGRESQL_CHECKPOINT_START(flags);
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time measurement format - more human readable

2014-10-02 Thread Bogdan Pilch
 On 9/29/14, 1:08 AM, Andres Freund wrote:
 On 2014-09-28 20:32:30 -0400, Gregory Smith wrote:
 There are already a wide range of human readable time interval output
 formats available in the database; see the list at 
 http://www.postgresql.org/docs/current/static/datatype-datetime.html#INTERVAL-STYLE-OUTPUT-TABLE
 He's talking about psql's \timing...
 
 I got that.  My point was that even though psql's timing report is
 kind of a quick thing hacked into there, if it were revised I'd
 expect two things will happen eventually:
 
 -Asking if any of the interval conversion code can be re-used
 for this purpose, rather than adding yet another custom-to-one-code-path
 standard.
 
 -Asking if this should really just be treated like a full interval
 instead, and then overlapping with a significant amount of that
 baggage so that you have all the existing format choices.

That's actually a good idea.
So what you're saying is that if I come up with some nice way of
setting customized time output format, keeping the default the way it
is now, then it would be worth considering?

Now I understand why it says that a discussion is recommended before
implementing and posting. ;-)

bogdan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-10-02 Thread Robert Haas
On Thu, Oct 2, 2014 at 9:54 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Alvaro Herrera wrote:
 So in essence what we're going to do is that the balance mechanism
 considers only tables that don't have per-table configuration options;
 for those that do, we will use the values configured there without any
 changes.

 I'll see about implementing this and making sure it finds its way to
 9.4beta3.

 Here's a patch that makes it work as proposed.

 How do people feel about back-patching this?  On one hand it seems
 there's a lot of fear of changing autovacuum behavior in back branches,
 because for many production systems it has carefully been tuned; on the
 other hand, it seems hard to believe that anyone has tuned the system to
 work sanely given how insanely per-table options behave in the current
 code.

I agree with both of those arguments.  I have run into very few
customers who have used the autovacuum settings to customize behavior
for particular tables, and anyone who hasn't should see no change
(right?), so my guess is that the practical impact of the change will
be pretty limited.  On the other hand, it's a clear behavior change.
Someone could have set the per-table limit to something enormous and
never suffered from that setting because it has basically no effect as
things stand right now today; and that person might get an unpleasant
surprise when they update.

I would at least back-patch it to 9.4.  I could go either way on
whether to back-patch into older branches.  I lean mildly in favor of
it at the moment, but with considerable trepidation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] port/atomics/arch-*.h are missing from installation

2014-10-02 Thread Kohei KaiGai
I got the following error when I try to build my extension
against the latest master branch.

Are the port/atomics/*.h files forgotten on make install?

[kaigai@magro pg_strom]$ make
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -g -fpic -Wall -O0
-DPGSTROM_DEBUG=1 -I. -I./ -I/usr/local/pgsql/include/server
-I/usr/local/pgsql/include/internal -D_GNU_SOURCE   -c -o shmem.o
shmem.c
In file included from /usr/local/pgsql/include/server/storage/barrier.h:21:0,
 from shmem.c:18:
/usr/local/pgsql/include/server/port/atomics.h:65:36: fatal error:
port/atomics/arch-x86.h: No such file or directory
 # include "port/atomics/arch-x86.h"
^
compilation terminated.
make: *** [shmem.o] Error 1


Even though the source directory has header files...

[kaigai@magro sepgsql]$ find ./src | grep atomics
./src/include/port/atomics
./src/include/port/atomics/generic-xlc.h
./src/include/port/atomics/arch-x86.h
./src/include/port/atomics/generic-acc.h
./src/include/port/atomics/arch-ppc.h
./src/include/port/atomics/generic.h
./src/include/port/atomics/arch-hppa.h
./src/include/port/atomics/generic-msvc.h
./src/include/port/atomics/arch-ia64.h
./src/include/port/atomics/generic-sunpro.h
./src/include/port/atomics/arch-arm.h
./src/include/port/atomics/generic-gcc.h
./src/include/port/atomics/fallback.h
./src/include/port/atomics.h
./src/backend/port/atomics.c

the install destination has only atomics.h

[kaigai@magro sepgsql]$ find /usr/local/pgsql/include | grep atomics
/usr/local/pgsql/include/server/port/atomics.h

The attached patch is probably the right remedy.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


pgsql-v9.5-fixup-makefile-for-atomics.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-10-02 Thread Andres Freund
On 2014-09-26 10:28:21 -0400, Robert Haas wrote:
 On Fri, Sep 26, 2014 at 8:55 AM, Oskari Saarenmaa o...@ohmu.fi wrote:
  So you think a read barrier is the same thing as an acquire barrier
  and a write barrier is the same as a release barrier?  That would be
  surprising.  It's certainly not true in general.
 
  The above doc describes the difference: read barrier requires loads before
  the barrier to be completed before loads after the barrier - an acquire
  barrier is the same, but it also requires loads to be complete before stores
  after the barrier.
 
  Similarly write barrier requires stores before the barrier to be completed
  before stores after the barrier - a release barrier is the same, but it also
  requires loads before the barrier to be completed before stores after the
  barrier.
 
  So acquire is read + loads-before-stores and release is write +
  loads-before-stores.
 
 Hmm.  My impression was that an acquire barrier means that loads and
 stores can migrate forward across the barrier but not backward; and
 that a release barrier means that loads and stores can migrate
 backward across the barrier but not forward.

It's actually more complex than that :(

Simple things first:

Oracle's definition seems pretty iron clad:
http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html
__machine_acq_barrier is a clear superset of __machine_r_barrier and
__machine_rel_barrier is a clear superset of __machine_w_barrier

And that's what we're essentially discussing, no? That said, there seems
to be no reason to avoid using __machine_r/w_barrier().


But for the reason why I defined pg_read_barrier/write_barrier to
__atomic_thread_fence(__ATOMIC_ACQUIRE/RELEASE):

The C11/C++11 definition it's made for is hellishly hard to
understand. There are very subtle differences between acquire/release
operations and acquire/release fences. 29.8.2/7.17.4 seem to be the relevant
parts of the standards. I think it essentially guarantees the mapping
we're talking about, but it's not entirely clear.

The way acquire/release fences are defined is that they form a
'synchronizes-with' relationship with each other. Which would, I think,
be sufficient given that without a release like operation on the other
thread a read/wrie barrier isn't worth much. But there's a rub in that
it requires a atomic operation involved somehere to give that guarantee.

I *did* check that the emitted code on relevant architectures is sane,
but that doesn't guarantee anything for the future.

Therefore I'm proposing to replace it with __ATOMIC_ACQ_REL which is
definitely guaranteeing what we need, even if superfluously heavy on some
platforms. It still is significantly more efficient than
__sync_synchronize() which is what was used before. I.e. it generates no
code on x86 (MFENCE otherwise), and only a lwsync on PPC (hwsync
otherwise, although I don't know why) and similar on ia64.

As a reference, relevant standard sections are:
C11: 5.1.2.4 5); 7.17.4
C++11: 29.3; 1.10
Not that we can rely on those, but I think it's a good thing to orient on.

 I'm actually not really sure what this means unless the barrier also
 does something in and of itself.

 For example, consider this:
 
 some stuff
 CAS(lock, 0, 1) // i am an acquire barrier
 more stuff
 lock = 0 // i am a release barrier
 even more stuff
 
 If the CAS() and lock = 0 instructions were FULL barriers, then we'd
 be saying that the stuff that happens in the critical section needs to
 be exactly more stuff.  But if they are acquire and release
 barriers, respectively, then the CPU is allowed to move some stuff
 or even more stuff into the critical section; but what it can't do
 is move more stuff out.

 Now if you just have a naked acquire barrier that is not doing
 anything itself, I don't really know what the semantics of that should
 be.

Which is why these acquire/release fences, in contrast to
acquire/release operations, have more guarantees... You put your finger
right onto the spot.

 Say I want to appear to only change things while flag is 1, so I
 write this code:
 
 flag = 1
 acquire barrier
 things++
 release barrier
 flag = 0
 
 With the definition you (and Oracle) propose

As written above, I don't think that applies to oracle's definition?

 this won't work, because
 there's nothing to keep the modification of things from being
 reordered before flag = 1.  What good is that?  Apparently, I don't
 have any idea!

I hope it's a bit clearer now?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling shared buffer eviction

2014-10-02 Thread Andres Freund
On 2014-09-25 10:42:29 -0400, Robert Haas wrote:
 On Thu, Sep 25, 2014 at 10:24 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2014-09-25 10:22:47 -0400, Robert Haas wrote:
  On Thu, Sep 25, 2014 at 10:14 AM, Andres Freund and...@2ndquadrant.com 
  wrote:
   That leads me to wonder: Have you measured different, lower, number of
   buffer mapping locks? 128 locks is, if we'd as we should align them
   properly, 8KB of memory. Common L1 cache sizes are around 32k...
 
  Amit has some results upthread showing 64 being good, but not as good
  as 128.  I haven't verified that myself, but have no reason to doubt
  it.
 
  How about you push the spinlock change and I crosscheck the partition
  number on a multi socket x86 machine? Seems worthwile to make sure that
  it doesn't cause problems on x86. I seriously doubt it'll, but ...
 
 OK.

Given that the results look good, do you plan to push this?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling shared buffer eviction

2014-10-02 Thread Robert Haas
On Thu, Oct 2, 2014 at 10:36 AM, Andres Freund and...@2ndquadrant.com wrote:
 OK.

 Given that the results look good, do you plan to push this?

By this, you mean the increase in the number of buffer mapping
partitions to 128, and a corresponding increase in MAX_SIMUL_LWLOCKS?

If so, and if you don't have any reservations, yeah I'll go change it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling shared buffer eviction

2014-10-02 Thread Andres Freund
On 2014-10-02 10:40:30 -0400, Robert Haas wrote:
 On Thu, Oct 2, 2014 at 10:36 AM, Andres Freund and...@2ndquadrant.com wrote:
  OK.
 
  Given that the results look good, do you plan to push this?
 
 By this, you mean the increase in the number of buffer mapping
 partitions to 128, and a corresponding increase in MAX_SIMUL_LWLOCKS?

Yes. Now that I think about it I wonder if we shouldn't define 
MAX_SIMUL_LWLOCKS like
#define MAX_SIMUL_LWLOCKS   (NUM_BUFFER_PARTITIONS + 64)
or something like that?

 If so, and if you don't have any reservations, yeah I'll go change it.

Yes, I'm happy with going forward.


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-10-02 Thread Robert Haas
On Thu, Oct 2, 2014 at 10:34 AM, Andres Freund and...@2ndquadrant.com wrote:
 It's actually more complex than that :(

 Simple things first:

 Oracle's definition seems pretty iron clad:
 http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html
 __machine_acq_barrier is a clear superset of __machine_r_barrier and
 __machine_rel_barrier is a clear superset of __machine_w_barrier

 And that's what we're essentially discussing, no? That said, there seems
 to be no reason to avoid using __machine_r/w_barrier().

So let's use those, then.

 But for the reason why I defined pg_read_barrier/write_barrier to
 __atomic_thread_fence(__ATOMIC_ACQUIRE/RELEASE):

 The C11/C++11 definition it's made for is hellishly hard to
 understand. There's very subtle differences between acquire/release
 operation and acquire/release fences. 29.8.2/7.17.4 seems to be the relevant
 parts of the standards. I think it essentially guarantees the mapping
 we're talking about, but it's not entirely clear.

 The way acquire/release fences are defined is that they form a
 'synchronizes-with' relationship with each other. Which would, I think,
 be sufficient given that without a release like operation on the other
 thread a read/wrie barrier isn't worth much. But there's a rub in that
 it requires a atomic operation involved somehere to give that guarantee.

 I *did* check that the emitted code on relevant architectures is sane,
 but that doesn't guarantee anything for the future.

 Therefore I'm proposing to replace it with __ATOMIC_ACQ_REL which is
 definitely guaranteeing what we need, even if superflously heavy on some
 platforms. It still is significantly more efficient than
 __sync_synchronize() which is what was used before. I.e. it generates no
 code on x86 (MFENCE otherwise), and only a lwsync on PPC (hwsync
 otherwise, although I don't know why) and similar on ia64.

A full barrier on x86 should be an mfence, right?  With only a
compiler barrier, you have loads ordered with respect to loads and
stores ordered with respect to stores, but the load/store ordering
isn't fully defined.
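
For reference, the distinction in gcc/x86-64 terms, sketched with made-up
macro names (a compiler barrier only constrains the compiler; a full barrier
needs an actual fencing instruction):

#define my_compiler_barrier()	__asm__ __volatile__ ("" ::: "memory")
#define my_full_barrier()		__asm__ __volatile__ ("mfence" ::: "memory")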

 Which is why these acquire/release fences, in contrast to
 acquire/release operations, have more guarantees... You put your finger
 right onto the spot.

But, uh, we still don't seem to know what those guarantees actually ARE.

 Say I want to appear to only change things while flag is 1, so I
 write this code:

 flag = 1
 acquire barrier
 things++
 release barrier
 flag = 0

 With the definition you (and Oracle) propose
 this won't work, because
 there's nothing to keep the modification of things from being
 reordered before flag = 1.  What good is that?  Apparently, I don't
 have any idea!

 As written above, I don't think that applies to oracle's definition?

Oracle's definition doesn't look sufficient there.  The acquire
barrier guarantees that the load operations before the barrier will be
completed before the load and store operations after the barrier, but
the only operation before the barrier is a store, not a load, so it
guarantees nothing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling shared buffer eviction

2014-10-02 Thread Robert Haas
On Thu, Oct 2, 2014 at 10:44 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-10-02 10:40:30 -0400, Robert Haas wrote:
 On Thu, Oct 2, 2014 at 10:36 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  OK.
 
  Given that the results look good, do you plan to push this?

 By this, you mean the increase in the number of buffer mapping
 partitions to 128, and a corresponding increase in MAX_SIMUL_LWLOCKS?

 Yes. Now that I think about it I wonder if we shouldn't define 
 MAX_SIMUL_LWLOCKS like
 #define MAX_SIMUL_LWLOCKS   (NUM_BUFFER_PARTITIONS + 64)
 or something like that?

Nah.  That assumes NUM_BUFFER_PARTITIONS will always be the biggest
thing, and I don't see any reason to assume that, even if we're making
it true for now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] port/atomics/arch-*.h are missing from installation

2014-10-02 Thread Andres Freund
Hi,

On 2014-10-02 23:33:36 +0900, Kohei KaiGai wrote:
 I got the following error when I try to build my extension
 towards the latest master branch.
 
 Is the port/atomics/*.h files forgotten on make install?

You're right.

 The attached patch is probably right remedy.

I've changed the order to be alphabetic, but otherwise it looks
good. Pushed.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-10-02 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Thu, Oct 2, 2014 at 9:54 AM, Alvaro Herrera alvhe...@2ndquadrant.com 
 wrote:
  Alvaro Herrera wrote:
  So in essence what we're going to do is that the balance mechanism
  considers only tables that don't have per-table configuration options;
  for those that do, we will use the values configured there without any
  changes.
 
  I'll see about implementing this and making sure it finds its way to
  9.4beta3.
 
  Here's a patch that makes it work as proposed.
 
  How do people feel about back-patching this?  On one hand it seems
  there's a lot of fear of changing autovacuum behavior in back branches,
  because for many production systems it has carefully been tuned; on the
  other hand, it seems hard to believe that anyone has tuned the system to
  work sanely given how insanely per-table options behave in the current
  code.
 
 I agree with both of those arguments.  I have run into very few
 customers who have used the autovacuum settings to customize behavior
 for particular tables, and anyone who hasn't should see no change
 (right?), so my guess is that the practical impact of the change will
 be pretty limited.  On the other hand, it's a clear behavior change.
 Someone could have set the per-table limit to something enormous and
 never suffered from that setting because it has basically no effect as
 things stand right now today; and that person might get an unpleasant
 surprise when they update.
 
 I would at least back-patch it to 9.4.  I could go either way on
 whether to back-patch into older branches.  I lean mildly in favor of
 it at the moment, but with considerable trepidation.

I'm fine with putting it into 9.4.  I'm not sure that I see the value in
changing the back-branches and then having to deal with the well, if
you're on 9.3.5 then X, but if you're on 9.3.6 then Y or having to
figure out how to deal with the documentation for this.

Has there been any thought as to what pg_upgrade should do..?

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Log notice that checkpoint is to be written on shutdown

2014-10-02 Thread David G Johnston
Michael Banck-2 wrote
 Hi,
 
 we have seen repeatedly that users can be confused about why PostgreSQL
 is not shutting down even though they requested it.  Usually, this is
 because `log_checkpoints' is not enabled and the final checkpoint is
 being written, delaying shutdown. As no message besides shutting down
 is written to the server log in this case, we even had users believing
 the server was hanging and pondering killing it manually.
 
 In order to alert those users that a checkpoint is being written, I
 propose to add a log message waiting for checkpoint ... on shutdown,
 even if log_checkpoints is disabled, as this particular checkpoint might
 be important information.
 
 I've attached a trivial patch for this, should it be added to the next
 commitfest?

Peeking at this provokes a couple of novice questions:

While apparently it is impossible to have a checkpoint to log at recovery
end, the decision to use the bitwise-OR instead of the local shutdown bool
seemed odd at first...

LogCheckpointStart creates the log entry unconditionally - leaving the
caller responsible for evaluating log_checkpoints - while LogCheckpointEnd
has the log_checkpoints condition built into it.  I mention this because by
the same argument advocating the logging of the checkpoint start we should
also log its end.  In order to do that here, though, we have to change
log_checkpoints before calling LogCheckpointEnd.  Now, since we're going to
shut down anyway that seems benign (and forecloses on any need to set it back
to the prior value) but it's still ugly.

Just some thoughts...

The rationale makes perfect sense.  +1

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Log-notice-that-checkpoint-is-to-be-written-on-shutdown-tp5821394p5821417.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-10-02 Thread Andres Freund
On 2014-10-02 10:55:06 -0400, Robert Haas wrote:
 On Thu, Oct 2, 2014 at 10:34 AM, Andres Freund and...@2ndquadrant.com wrote:
  It's actually more complex than that :(
 
  Simple things first:
 
  Oracle's definition seems pretty iron clad:
  http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html
  __machine_acq_barrier is a clear superset of __machine_r_barrier and
  __machine_rel_barrier is a clear superset of __machine_w_barrier
 
  And that's what we're essentially discussing, no? That said, there seems
  to be no reason to avoid using __machine_r/w_barrier().
 
 So let's use those, then.

Right, I've never contended that.

  But for the reason why I defined pg_read_barrier/write_barrier to
  __atomic_thread_fence(__ATOMIC_ACQUIRE/RELEASE):
 
  The C11/C++11 definition it's made for is hellishly hard to
  understand. There's very subtle differences between acquire/release
  operation and acquire/release fences. 29.8.2/7.17.4 seems to be the relevant
  parts of the standards. I think it essentially guarantees the mapping
  we're talking about, but it's not entirely clear.
 
  The way acquire/release fences are defined is that they form a
  'synchronizes-with' relationship with each other. Which would, I think,
  be sufficient given that without a release like operation on the other
  thread a read/wrie barrier isn't worth much. But there's a rub in that
  it requires a atomic operation involved somehere to give that guarantee.
 
  I *did* check that the emitted code on relevant architectures is sane,
  but that doesn't guarantee anything for the future.
 
  Therefore I'm proposing to replace it with __ATOMIC_ACQ_REL which is
  definitely guaranteeing what we need, even if superflously heavy on some
  platforms. It still is significantly more efficient than
  __sync_synchronize() which is what was used before. I.e. it generates no
  code on x86 (MFENCE otherwise), and only a lwsync on PPC (hwsync
  otherwise, although I don't know why) and similar on ia64.
 
 A fully barrier on x86 should be an mfence, right?

Right. I've not talked about changing full barrier semantics. What I was
referring to is that until the atomics patch we always redefine
read/write barriers to be full barriers when using gcc intrinsics.

 With only a compiler barrier, you have loads ordered with respect to
 loads and stores ordered with respect to stores, but the load/store
 ordering isn't fully defined.

Yes.

  Which is why these acquire/release fences, in contrast to
  acquire/release operations, have more guarantees... You put your finger
  right onto the spot.
 
 But, uh, we still don't seem to know what those guarantees actually ARE.

Paired together they form a synchronized-with relationship. Problem #1
is that the standard's language isn't, to me at least, clear if there's
not some case where that's not the case. Problem #2 is that our current
README.barrier definition doesn't actually require barriers to be
paired. Which imo is bad, but still a fact.

The definition of ACQ_REL is pretty clearly sufficient imo: "Full
barrier in both directions and synchronizes with acquire loads and
release stores in another thread".

  Say I want to appear to only change things while flag is 1, so I
  write this code:
 
  flag = 1
  acquire barrier
  things++
  release barrier
  flag = 0
 
  With the definition you (and Oracle) propose
  this won't work, because
  there's nothing to keep the modification of things from being
  reordered before flag = 1.  What good is that?  Apparently, I don't
  have any idea!
 
  As written above, I don't think that applies to oracle's definition?
 
 Oracle's definition doesn't look sufficient there.

Perhaps I'm just not understanding what you want to show with this
example. This started as a discussion of comparing acquire/release with
read/write barriers, right? Or are you generally wondering about the
point acquire/release barriers?

 The acquire
 barrier guarantees that the load operations before the barrier will be
 completed before the load and store operations after the barrier, but
 the only operation before the barrier is a store, not a load, so it
 guarantees nothing.

Well, 'acquire' operations always have to be related to a load. That's why
standalone 'acquire fences' or 'acquire barriers' are more heavyweight
than just an acquiring read.

And realistically, in the above example, you'd have to read flag to see
that it's not already 1, right?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling shared buffer eviction

2014-10-02 Thread Andres Freund
On 2014-10-02 10:56:05 -0400, Robert Haas wrote:
 On Thu, Oct 2, 2014 at 10:44 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2014-10-02 10:40:30 -0400, Robert Haas wrote:
  On Thu, Oct 2, 2014 at 10:36 AM, Andres Freund and...@2ndquadrant.com 
  wrote:
   OK.
  
   Given that the results look good, do you plan to push this?
 
  By this, you mean the increase in the number of buffer mapping
  partitions to 128, and a corresponding increase in MAX_SIMUL_LWLOCKS?
 
  Yes. Now that I think about it I wonder if we shouldn't define 
  MAX_SIMUL_LWLOCKS like
  #define MAX_SIMUL_LWLOCKS   (NUM_BUFFER_PARTITIONS + 64)
  or something like that?
 
 Nah.  That assumes NUM_BUFFER_PARTITIONS will always be the biggest
 thing, and I don't see any reason to assume that, even if we're making
 it true for now.

The reason I'm suggesting it is that NUM_BUFFER_PARTITIONS (and
NUM_LOCK_PARTITIONS) are the cases where we can expect many lwlocks to
be held at the same time. It doesn't seem friendly to require users
experimenting with changing this to know about a define that's private
to lwlock.c.
But I'm fine with doing this not at all or separately - there's no need
to actually do it together. 

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-10-02 Thread Robert Haas
On Thu, Oct 2, 2014 at 11:18 AM, Andres Freund and...@2ndquadrant.com wrote:
 So let's use those, then.

 Right, I've never contended that.

OK, cool.

 A fully barrier on x86 should be an mfence, right?

 Right. I've not talked about changing full barrier semantics. What I was
 referring to is that until the atomics patch we always redefine
 read/write barriers to be full barriers when using gcc intrinsics.

OK, got it.  If there's a cheaper way to tell gcc loads before loads
or stores before stores, I'm fine with doing that for those cases.

  Which is why these acquire/release fences, in contrast to
  acquire/release operations, have more guarantees... You put your finger
  right onto the spot.

 But, uh, we still don't seem to know what those guarantees actually ARE.

 Paired together they form a synchronized-with relationship. Problem #1
 is that the standard's language isn't, to me at least, clear if there's
 not some case where that's not the case. Problem #2 is that our current
 README.barrier definition doesn't actually require barriers to be
 paired. Which imo is bad, but still a fact.

I don't know what a synchronized-with relationship means.

Also, I pretty much designed those definitions to match what Linux
does.  And it doesn't require that either, though it says that in most
cases it will work out that way.

 The definition of ACQ_REL is pretty clearly sufficient imo: Full
 barrier in both directions and synchronizes with acquire loads and
 release stores in another thread..

I dunno.  What's an acquire load?  What's a release store?  I know
what loads and stores are; I don't know what the adjectives mean.

 The acquire
 barrier guarantees that the load operations before the barrier will be
 completed before the load and store operations after the barrier, but
 the only operation before the barrier is a store, not a load, so it
 guarantees nothing.

 Well, 'acquire' operations always have to related to a load.That's why
 standalone 'acquire fences' or 'acquire barriers' are more heavyweight
 than just a acquiring read.

Again, I can't judge any of this, because you haven't defined the
terms anywhere.

 And realistically, in the above example, you'd have to read flag to see
 that it's not already 1, right?

Not necessarily.  You could be the only writer.  Think about the way
the backend entries in the stats system work.  The point of setting
the flag may be for other people to know whether the data is in the
middle of being modified.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-10-02 Thread Alvaro Herrera
Stephen Frost wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:

  I agree with both of those arguments.  I have run into very few
  customers who have used the autovacuum settings to customize behavior
  for particular tables, and anyone who hasn't should see no change
  (right?), so my guess is that the practical impact of the change will
  be pretty limited.  On the other hand, it's a clear behavior change.
  Someone could have set the per-table limit to something enormous and
  never suffered from that setting because it has basically no effect as
  things stand right now today; and that person might get an unpleasant
  surprise when they update.
  
  I would at least back-patch it to 9.4.  I could go either way on
  whether to back-patch into older branches.  I lean mildly in favor of
  it at the moment, but with considerable trepidation.
 
 I'm fine with putting it into 9.4.  I'm not sure that I see the value in
 changing the back-branches and then having to deal with the well, if
 you're on 9.3.5 then X, but if you're on 9.3.6 then Y or having to
 figure out how to deal with the documentation for this.

Well, the value obviously is that we would fix the bug that Mark
Kirkwood reported that started this thread.

Basically, if you are on 9.3.5 or earlier any per-table options for
autovacuum cost delay will misbehave (meaning: any such table will be
processed with settings flattened according to balancing of the standard
options, _not_ the configured ones).  If you are on 9.3.6 or newer they
will behave as described in the docs.

 Has there been any thought as to what pg_upgrade should do..?

Yes, I'm thinking there's nothing it should do.

-- 
Álvaro Herrera      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Log notice that checkpoint is to be written on shutdown

2014-10-02 Thread Michael Banck
Hi,

Am Donnerstag, den 02.10.2014, 08:17 -0700 schrieb David G Johnston:
 Michael Banck-2 wrote 
  I've attached a trivial patch for this, should it be added to the next
  commitfest?
 
 Peeking at this provokes a couple of novice questions:
 
 While apparently it is impossible to have a checkpoint to log at recovery
 end the decision to use the bitwise-OR instead of the local shutdown bool
 seemed odd at first...

Good point, using `shutdown' is probably better.  I guess that local
bool had escaped my notice when I first had a look at this a while ago.

 LogCheckpointStart creates the log entry unconditionally - leaving the
 caller responsible for evaluating log_checkpoints - while LogCheckpointEnd
 has the log_checkpoints condition built into it. 

I noticed this as well.  My guess would be that this is because
LogCheckpointEnd() also updates the BgWriterStats statistics, and should
do that every time, not just when the checkpoint is getting logged.
Whether or not log_checkpoints is set (and logging should happen) is
evaluated directly after that. 

Some refactoring of this might be useful (or there might be a very good
reason why this was done like this), but this seems outside the scope of
this patch.  AIUI, shutdown will not be further delayed after the
checkpoint is finished, so no further logging is required for the
purpose of this patch.

 The rationale makes perfect sense.  +1

Cool!

I have attached an updated patch using the `shutdown' bool.


Michael

-- 
Michael Banck
Projektleiter / Berater
Tel.: +49 (2161) 4643-171
Fax:  +49 (2161) 4643-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5a4dbb9..f2716ae 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8085,10 +8085,14 @@ CreateCheckPoint(int flags)
 
 	/*
 	 * If enabled, log checkpoint start.  We postpone this until now so as not
-	 * to log anything if we decided to skip the checkpoint.
+	 * to log anything if we decided to skip the checkpoint.  If we are shutting
+	 * down and checkpoints are not being logged, add a log message that a
+	 * checkpoint is to be written and shutdown is potentially delayed.
 	 */
 	if (log_checkpoints)
 		LogCheckpointStart(flags, false);
+	else if (shutdown)
+		ereport(LOG, (errmsg("waiting for checkpoint ...")));
 
 	TRACE_POSTGRESQL_CHECKPOINT_START(flags);
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] DDL Damage Assessment

2014-10-02 Thread Dimitri Fontaine
Hi fellow hackers,

I would like to work on a new feature allowing our users to assess the
amount of trouble they will run into when running a DDL script on their
production setups, *before* actually getting their services down.

The main practical example I can offer here is the ALTER TABLE command.
Recent releases are including very nice optimisations to it, so much so
that it's becoming increasingly hard to answer some very basic
questions:

  - what kind of locks will be taken? (exclusive, shared)
  - on what objects? (foreign keys, indexes, sequences, etc)
  - will the table have to be rewritten? the indexes?

Of course the docs are answering parts of those, but in particular the
table rewriting rules are complex enough that “accidental DBAs” will
fail to predict if the target data type is binary coercible to the
current one.

Questions:

 1. Do you agree that a systematic way to report what a DDL command (or
script, or transaction) is going to do on your production database
is a feature we should provide to our growing user base?

 2. What do you think such a feature should look like?

 3. Does it make sense to support the whole set of DDL commands from the
get go (or ever) when most of them are only taking locks in their
own pg_catalog entry anyway?

Provided that we are able to converge towards a common enough answer to
those questions, I propose to hack my way around and send patches to
have it (the common answer) available in the next PostgreSQL release.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Fabrízio de Royes Mello
On Thu, Oct 2, 2014 at 1:30 PM, Dimitri Fontaine dimi...@2ndquadrant.fr
wrote:

 Hi fellow hackers,

 I would like to work on a new feature allowing our users to assess the
 amount of trouble they will run into when running a DDL script on their
 production setups, *before* actually getting their services down.

 The main practical example I can offer here is the ALTER TABLE command.
 Recent releases are including very nice optimisations to it, so much so
 that it's becoming increasingly hard to answer some very basic
 questions:

   - what kind of locks will be taken? (exclusive, shared)
   - on what objects? (foreign keys, indexes, sequences, etc)
   - will the table have to be rewritten? the indexes?

 Of course the docs are answering parts of those, but in particular the
 table rewriting rules are complex enough that “accidental DBAs” will
 fail to predict if the target data type is binary coercible to the
 current one.

 Questions:

  1. Do you agree that a systematic way to report what a DDL command (or
 script, or transaction) is going to do on your production database
 is a feature we should provide to our growing user base?

  2. What do you think such a feature should look like?

  3. Does it make sense to support the whole set of DDL commands from the
 get go (or ever) when most of them are only taking locks in their
 own pg_catalog entry anyway?

 Provided that we are able to converge towards a common enough answer to
 those questions, I propose to hack my way around and send patches to
 have it (the common answer) available in the next PostgreSQL release.


What you are proposing is some kind of dry-run with verbose output?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Claudio Freire
On Thu, Oct 2, 2014 at 1:46 PM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:
 On Thu, Oct 2, 2014 at 1:30 PM, Dimitri Fontaine dimi...@2ndquadrant.fr
 wrote:

 Hi fellow hackers,

 I would like to work on a new feature allowing our users to assess the
 amount of trouble they will run into when running a DDL script on their
 production setups, *before* actually getting their services down.

 The main practical example I can offer here is the ALTER TABLE command.
 Recent releases are including very nice optimisations to it, so much so
 that it's becoming increasingly hard to answer some very basic
 questions:

   - what kind of locks will be taken? (exclusive, shared)
   - on what objects? (foreign keys, indexes, sequences, etc)
   - will the table have to be rewritten? the indexes?

 Of course the docs are answering parts of those, but in particular the
 table rewriting rules are complex enough that “accidental DBAs” will
 fail to predict if the target data type is binary coercible to the
 current one.

 Questions:

  1. Do you agree that a systematic way to report what a DDL command (or
 script, or transaction) is going to do on your production database
 is a feature we should provide to our growing user base?

  2. What do you think such a feature should look like?

  3. Does it make sense to support the whole set of DDL commands from the
 get go (or ever) when most of them are only taking locks in their
 own pg_catalog entry anyway?

 Provided that we are able to converge towards a common enough answer to
 those questions, I propose to hack my way around and send patches to
 have it (the common answer) available in the next PostgreSQL release.


 What you are proposing is some kind of dry-run with verbose output?


EXPLAIN ALTER TABLE ?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Harold Giménez
I think the main issue is when a table rewrite is triggered on a DDL
command on a large table, as this is what frequently leads to
unavailability. The idea of introducing a NOREWRITE keyword to DDL
commands then came up (credit: Peter Geoghegan). When the NOREWRITE
keyword is used and the DDL statement would rewrite the table, the
command errors and exits.

This would allow ORM and framework authors to include the NOREWRITE
option by default, only to be disabled on a per-statement basis by the
developer, once they have assessed that it may be safe or otherwise
they still want to proceed with this. The workflow for an app
developer then becomes:

* Write offending data migration (eg: add a column with a NOT NULL
constraint and default value)
* Test it locally, either by running automated test suite or running on staging
* See that it fails because of NOREWRITE option
* Assess situation. If it's a small table, or I still want to ignore,
override the option. Or rewrite migration to avoid rewrite.
* Repeat

I like this a lot just because it's simple, limited in scope, and can
be easily integrated into ORMs saving users hours of downtime and
frustration.

Thoughts?

On Thu, Oct 2, 2014 at 9:46 AM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:


 On Thu, Oct 2, 2014 at 1:30 PM, Dimitri Fontaine dimi...@2ndquadrant.fr
 wrote:

 Hi fellow hackers,

 I would like to work on a new feature allowing our users to assess the
 amount of trouble they will run into when running a DDL script on their
 production setups, *before* actually getting their services down.

 The main practical example I can offer here is the ALTER TABLE command.
 Recent releases are including very nice optimisations to it, so much so
 that it's becoming increasingly hard to answer some very basic
 questions:

   - what kind of locks will be taken? (exclusive, shared)
   - on what objects? (foreign keys, indexes, sequences, etc)
   - will the table have to be rewritten? the indexes?

 Of course the docs are answering parts of those, but in particular the
 table rewriting rules are complex enough that “accidental DBAs” will
 fail to predict if the target data type is binary coercible to the
 current one.

 Questions:

  1. Do you agree that a systematic way to report what a DDL command (or
 script, or transaction) is going to do on your production database
 is a feature we should provide to our growing user base?

  2. What do you think such a feature should look like?

  3. Does it make sense to support the whole set of DDL commands from the
 get go (or ever) when most of them are only taking locks in their
 own pg_catalog entry anyway?

 Provided that we are able to converge towards a common enough answer to
 those questions, I propose to hack my way around and send patches to
 have it (the common answer) available in the next PostgreSQL release.


 What you are proposing is some kind of dry-run with verbose output?

 Regards,

 --
 Fabrízio de Royes Mello
 Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling shared buffer eviction

2014-10-02 Thread Heikki Linnakangas

On 10/02/2014 05:40 PM, Robert Haas wrote:

On Thu, Oct 2, 2014 at 10:36 AM, Andres Freund and...@2ndquadrant.com wrote:

OK.


Given that the results look good, do you plan to push this?


By this, you mean the increase in the number of buffer mapping
partitions to 128, and a corresponding increase in MAX_SIMUL_LWLOCKS?


Hmm, do we actually ever need to hold all the buffer partition locks at 
the same time? At a quick search for NUM_BUFFER_PARTITIONS in the code, 
I couldn't find any place where we'd do that. I bumped up 
NUM_BUFFER_PARTITIONS to 128, but left MAX_SIMUL_LWLOCKS at 100, and did 
make check. It passed.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling shared buffer eviction

2014-10-02 Thread Andres Freund
On 2014-10-02 20:04:58 +0300, Heikki Linnakangas wrote:
 On 10/02/2014 05:40 PM, Robert Haas wrote:
 On Thu, Oct 2, 2014 at 10:36 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
 OK.
 
 Given that the results look good, do you plan to push this?
 
 By this, you mean the increase in the number of buffer mapping
 partitions to 128, and a corresponding increase in MAX_SIMUL_LWLOCKS?
 
 Hmm, do we actually ever need to hold all the buffer partition locks at the
 same time? At a quick search for NUM_BUFFER_PARTITIONS in the code, I
 couldn't find any place where we'd do that. I bumped up
 NUM_BUFFER_PARTITIONS to 128, but left MAX_SIMUL_LWLOCKS at 100, and did
 make check. It passed.

Do a make check-world and it'll hopefully fail ;). Check
pg_buffercache_pages.c.

I'd actually quite like to have a pg_buffercache version that, at least
optionally, doesn't do this, but that's a separate thing.
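
For anyone else following along, the pg_buffercache pattern in question holds
every buffer mapping partition lock at once while it snapshots the buffer
headers, roughly like this (paraphrased sketch; the accessor name varies
between versions):

#include "postgres.h"
#include "storage/buf_internals.h"
#include "storage/lwlock.h"

static void
lock_all_buffer_partitions(void)
{
	int			i;

	/* grab every partition lock in share mode before scanning the buffers */
	for (i = 0; i < NUM_BUFFER_PARTITIONS; i++)
		LWLockAcquire(BufMappingPartitionLockByIndex(i), LW_SHARED);
}

static void
unlock_all_buffer_partitions(void)
{
	int			i;

	/* release in reverse order, matching pg_buffercache's style */
	for (i = NUM_BUFFER_PARTITIONS; --i >= 0;)
		LWLockRelease(BufMappingPartitionLockByIndex(i));
}

That is why MAX_SIMUL_LWLOCKS has to be at least NUM_BUFFER_PARTITIONS.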

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Joe Conway
On 10/02/2014 11:30 AM, Dimitri Fontaine wrote:
 Questions:
 
 1. Do you agree that a systematic way to report what a DDL command
 (or script, or transaction) is going to do on your production
 database is a feature we should provide to our growing user base?

+1

I really like the idea and would find it useful/time-saving

 2. What do you think such a feature should look like?

Elsewhere on this thread EXPLAIN was suggested. That makes a certain
amount of sense.

Maybe something like EXPLAIN IMPACT [...]

 3. Does it make sense to support the whole set of DDL commands from
 the get go (or ever) when most of them are only taking locks in
 their own pg_catalog entry anyway?

Yes, I think it should cover all commands that can have an
availability impact.

Joe


-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-10-02 Thread Andres Freund
On 2014-10-02 11:35:32 -0400, Robert Haas wrote:
 On Thu, Oct 2, 2014 at 11:18 AM, Andres Freund and...@2ndquadrant.com wrote:
   Which is why these acquire/release fences, in contrast to
   acquire/release operations, have more guarantees... You put your finger
   right onto the spot.
 
  But, uh, we still don't seem to know what those guarantees actually ARE.
 
  Paired together they form a synchronized-with relationship. Problem #1
  is that the standard's language isn't, to me at least, clear if there's
  not some case where that's not the case. Problem #2 is that our current
  README.barrier definition doesn't actually require barriers to be
  paired. Which imo is bad, but still a fact.
 
 I don't know what a synchronized-with relationship means.

I'm using the standard's language here, given that I'm trying to reason
about its behaviour...

What it means is that if you have a matching pair of acquire/release
operations or barriers/fences everything that happened *before* the last
release fence will be visible *after* executing the next acquire
operation in a different thread-of-execution. And 'after' is defined in
the way that is true if the 'acquiring' thread can see the result of the
'releasing' operation.
I.e. no loads after the acquire can see values from before the release.

My problem with the definition in the standard is that it's not
particularly clear how acquire fences *without* an underlying explicit
atomic operation are defined in the standard.

I checked gcc's current code and it's fine in that regard. Also other
popular concurrent open source stuff like
http://git.qemu.org/?p=qemu.git;a=blob;f=include/qemu/atomic.h;hb=HEAD
does precisely what I'm talking about:

100 #ifndef smp_wmb
101 #ifdef __ATOMIC_RELEASE
102 #define smp_wmb()   __atomic_thread_fence(__ATOMIC_RELEASE)
103 #else
104 #define smp_wmb()   __sync_synchronize()
105 #endif
106 #endif
107
108 #ifndef smp_rmb
109 #ifdef __ATOMIC_ACQUIRE
110 #define smp_rmb()   __atomic_thread_fence(__ATOMIC_ACQUIRE)
111 #else
112 #define smp_rmb()   __sync_synchronize()
113 #endif
114 #endif

The commit that added it
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=5444e768ee1abe6e021bece19a9a932351f88c88
was written by one gcc guy and reviewed by another one...

So I think we can be pretty sure that gcc's __atomic_thread_fence()
behaves like we want. We probably have to be a bit more careful about
extending that definition (by including atomic.h and doing
atomic_thread_fence(memory_order_acquire)) to use general C11. Which is
probably a couple years away anyway.

 Also, I pretty much designed those definitions to match what Linux
 does.  And it doesn't require that either, though it says that in most
 cases it will work out that way.

My point is that read barriers aren't particularly meaningful
without a defined store order from another thread/process. Without any
form of pairing you don't have that. The writing side could just have
reordered the writes in a way you didn't want them.  And the kernel docs
do say "A lack of appropriate pairing is almost certainly an error". But
since read barriers also pair with lock releases operations, that's
normally not a big problem.

  The definition of ACQ_REL is pretty clearly sufficient imo: Full
  barrier in both directions and synchronizes with acquire loads and
  release stores in another thread..
 
 I dunno.  What's an acquire load?  What's a release store?  I know
 what loads and stores are; I don't know what the adjectives mean.

An acquire load is either an explicit atomic load (tas, cmpxchg, etc
also count) or a normal load combined with an acquire barrier. The symmetric
definition is true for release store.

(so, on x86 every load/store that prevents compiler reordering is
essentially an acquire/release store)

  And realistically, in the above example, you'd have to read flag to see
  that it's not already 1, right?
 
 Not necessarily.  You could be the only writer.  Think about the way
 the backend entries in the stats system work.  The point of setting
 the flag may be for other people to know whether the data is in the
 middle of being modified.

So you're thinking about something seqlock-like... Isn't the problem
then that you actually don't want acquire semantics, but release or
write barrier semantics on that store? The acquire/read barrier part
would be on the reader side, no?
I'm still unsure what you want to show with that example?
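
A minimal sketch of the seqlock-ish pairing being described, assuming a
hypothetical SharedSlot struct and using the existing pg_write_barrier()/
pg_read_barrier() macros (write barriers on the writing side, read barriers
on the reading side):

#include "postgres.h"
#include "storage/barrier.h"

typedef struct
{
	volatile int	changecount;	/* odd while an update is in progress */
	int				value;
} SharedSlot;						/* hypothetical shared state */

static void
writer_update(SharedSlot *slot, int newval)
{
	slot->changecount++;	/* now odd: update in progress */
	pg_write_barrier();		/* order the count store before the data store */
	slot->value = newval;
	pg_write_barrier();		/* order the data store before the final count store */
	slot->changecount++;	/* even again: update complete */
}

static bool
reader_read(SharedSlot *slot, int *out)
{
	int			before = slot->changecount;

	pg_read_barrier();		/* order the first count load before the data load */
	*out = slot->value;
	pg_read_barrier();		/* order the data load before the second count load */
	return (before % 2 == 0) && before == slot->changecount;
}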

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Joshua D. Drake


On 10/02/2014 09:30 AM, Dimitri Fontaine wrote:


Questions:

  1. Do you agree that a systematic way to report what a DDL command (or
 script, or transaction) is going to do on your production database
 is a feature we should provide to our growing user base?


I would say it is late to the game and a great feature.



  2. What do you think such a feature should look like?



I liked the other post that said: EXPLAIN ALTER TABLE or whatever.
Heck, it could even be useful to have EXPLAIN ANALYZE ALTER TABLE in
case people want to run it on staging/test/dev environments to judge impact.



  3. Does it make sense to support the whole set of DDL commands from the
 get go (or ever) when most of them are only taking locks in their
 own pg_catalog entry anyway?


I would think that introducing this incrementally makes sense.

JD



--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
If we send our children to Caesar for their education, we should
 not be surprised when they come back as Romans.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] TAP test breakage on MacOS X

2014-10-02 Thread Robert Haas
make check-world dies ingloriously for me, like this:

/bin/sh ../../../config/install-sh -c -d tmp_check/log
make -C ../../..
DESTDIR='/Users/rhaas/pgsql/src/bin/initdb'/tmp_check/install install
'/Users/rhaas/pgsql/src/bin/initdb'/tmp_check/log/install.log 21
cd .  TESTDIR='/Users/rhaas/pgsql/src/bin/initdb'
PATH=/Users/rhaas/pgsql/src/bin/initdb/tmp_check/install/Users/rhaas/project/bin:$PATH
DYLD_LIBRARY_PATH='/Users/rhaas/pgsql/src/bin/initdb/tmp_check/install/Users/rhaas/project/lib'
PGPORT='65432' prove --ext='.pl' -I ../../../src/test/perl/ --verbose
t/
t/001_initdb.pl ..
1..14
1..3
ok 1 - initdb --help exit code 0
ok 2 - initdb --help goes to stdout
ok 3 - initdb --help nothing to stderr
ok 1 - initdb --help
1..3
ok 1 - initdb --version exit code 0
ok 2 - initdb --version goes to stdout
ok 3 - initdb --version nothing to stderr
ok 2 - initdb --version
1..2
ok 1 - initdb with invalid option nonzero exit code
ok 2 - initdb with invalid option prints error message
# Looks like your test exited with 256 just after 2.
not ok 3 - initdb options handling

#   Failed test 'initdb options handling'
#   at /opt/local/lib/perl5/5.12.5/Test/Builder.pm line 229.
ok 4 - basic initdb
ok 5 - existing data directory
ok 6 - nosync
ok 7 - sync only
ok 8 - sync missing data directory
ok 9 - existing empty data directory
ok 10 - separate xlog directory
ok 11 - relative xlog directory not allowed
ok 12 - existing empty xlog directory
ok 13 - existing nonempty xlog directory
ok 14 - select default dictionary
# Looks like you failed 1 test of 14.
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/14 subtests

Test Summary Report
---
t/001_initdb.pl (Wstat: 256 Tests: 14 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
Files=1, Tests=14, 23 wallclock secs ( 0.02 usr  0.01 sys +  9.57 cusr
 3.37 csys = 12.97 CPU)
Result: FAIL
make[2]: *** [check] Error 1
make[1]: *** [check-initdb-recurse] Error 2
make: *** [check-world-src/bin-recurse] Error 2

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling shared buffer eviction

2014-10-02 Thread Robert Haas
On Thu, Oct 2, 2014 at 1:07 PM, Andres Freund and...@2ndquadrant.com wrote:
 Do a make check-world and it'll hopefully fail ;). Check
 pg_buffercache_pages.c.

Yep.  Committed, with an update to the comments in lwlock.c to allude
to the pg_buffercache issue.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proper query implementation for Postgresql driver

2014-10-02 Thread Robert Haas
On Tue, Sep 30, 2014 at 1:20 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 Frankly, I suggest dropping simple entirely and using only the
 parse/bind/describe/execute flow in the v3 protocol.

The last time I checked, that was significantly slower.

http://www.postgresql.org/message-id/ca+tgmoyjkfnmrtmhodwhnoj1jwcgzs_h1r70ercecrwjm65...@mail.gmail.com

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Stephen Frost
* Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote:
  1. Do you agree that a systematic way to report what a DDL command (or
 script, or transaction) is going to do on your production database
 is a feature we should provide to our growing user base?

I definitely like the idea of such a 'dry-run' kind of operation to get
an idea of what would happen.

  2. What do you think such a feature should look like?

My thinking is that this would be implemented as a new kind of read-only
transaction type.

  3. Does it make sense to support the whole set of DDL commands from the
 get go (or ever) when most of them are only taking locks in their
 own pg_catalog entry anyway?

On the fence about this one..  In general, I'd say yes, but I've not
looked at every case and I imagine there are DDL commands which really
aren't all that interesting for this case.

 Provided that we are able to converge towards a common enough answer to
 those questions, I propose to hack my way around and send patches to
 have it (the common answer) available in the next PostgreSQL release.

That feels a bit ambitious, given that we've not really nailed down
the feature definition yet, but I do like where you're going. :)

Thanks!

Stephen




Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Stephen Frost
* Harold Giménez (har...@heroku.com) wrote:
 I think the main issue is when a table rewrite is triggered on a DDL
 command on a large table, as this is what frequently leads to
 unavailability. The idea of introducing a NOREWRITE keyword to DDL
 commands then came up (credit: Peter Geoghegan). When the NOREWRITE
 keyword is used and the DDL statement would rewrite the table, the
 command errors and exits.
 
 This would allow ORM and framework authors to include the NOREWRITE
 option by default, only to be disabled on a per-statement basis by the
 developer, once they have assessed that it may be safe or otherwise
 they still want to proceed with this. The workflow for an app
 developer then becomes:
 
 * Write offending data migration (eg: add a column with a NOT NULL
 constraint and default value)
 * Test it locally, either by running automated test suite or running on 
 staging
 * See that it fails because of NOREWRITE option
 * Assess situation. If it's a small table, or I still want to ignore,
 override the option. Or rewrite migration to avoid rewrite.
 * Repeat
 
 I like this a lot just because it's simple, limited in scope, and can
 be easily integrated into ORMs saving users hours of downtime and
 frustration.
 
 Thoughts?

Not against it, but feels like an independent thing to consider- what
Devrim is suggesting is broader and encompasses the issue of locks,
which are certainly important to consider also.

In short, seems like having both would be worthwhile.

Thanks,

Stephen




Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Stephen Frost
* Joshua D. Drake (j...@commandprompt.com) wrote:
   2. What do you think such a feature should look like?
 
 I liked the other post that said: EXPLAIN ALTER TABLE or whatever.
 Heck it could even be useful to have EXPLAIN ANALZYE ALTER TABLE
 in case people want to run it on staging/test/dev environments to
 judge impact.

The downside of the 'explain' approach is that the script then has to be
modified to put 'explain' in front of everything and then you have to go
through each statement and consider it.  Having a 'dry-run' transaction
type which then produces a report at the end feels like it'd be both
easier to assess the overall implications, and less error-prone as you
don't have to prefix every statement with 'explain'.  It might even be
possible to have the local view of post-alter statements be available
inside of this 'dry-run' option - that is, if you add a column in the
transaction then the column exists for the following commands, so it
doesn't just error out.  Having 'explain whatever' wouldn't give you
that, and so you really wouldn't be able to have whole scripts run by
just pre-pending each command with 'explain'.
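
To make that concrete (hypothetical syntax - no such EXPLAIN variant for
ALTER exists today - and reusing the "emp" example table from elsewhere
in this thread):

EXPLAIN ALTER TABLE emp ADD COLUMN foo2 jsonb;
EXPLAIN UPDATE emp SET foo2 = '{}';  -- fails: the first EXPLAIN never
                                     -- actually added the column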

   3. Does it make sense to support the whole set of DDL commands from the
  get go (or ever) when most of them are only taking locks in their
  own pg_catalog entry anyway?
 
 I would think that introducing this incrementally makes sense.

Agreed.

Thanks,

Stephen




Re: [HACKERS] NEXT VALUE FOR sequence

2014-10-02 Thread Robert Haas
On Thu, Oct 2, 2014 at 7:27 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 SQL:2003 introduced the function NEXT VALUE FOR sequence. Google
 tells me that at least DB2, SQL Server and a few niche databases
 understand it so far.  As far as I can tell there is no standardised
 equivalent of currval and setval (but I only have access to second
 hand information about the standard, like articles and the manuals of
 other products).

 Here is a starter patch to add it.  To avoid a shift/reduce conflict,
 I had to reclassify the keyword NEXT.  I admit that I don't fully
 understand the consequences of that change!  Please let me know if you
 think this could fly.

 Looks correct. Of course, it's annoying to have to reserve the NEXT keyword
 (as a type_func_name_keyword, not fully reserved).

 One way to avoid that is to collapse NEXT VALUE FOR into a single token in
 parser.c. We do that for a few other word pairs: NULLS FIRST, NULLS LAST,
 WITH TIME and WITH ORDINALITY. In this case you'd need to look-ahead three
 tokens, not two, but I guess that'd be doable.

Those kinds of hacks are not scalable.  It's not too bad right now
because NULLS, FIRST, and LAST are all rarely-used keywords and
there's rarely a reason for FIRST and LAST to follow NULLS except in
the exact context we care about.  But the more we extend those hacks,
the more likely it is that the lexer will smash the tokens in some
case where the user actually meant something else.  Hacking the lexer
to get around grammar conflicts doesn't actually fix whatever
intrinsic semantic conflict exists; it just keeps bison from knowing
about it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Josh Berkus

 Questions:
 
  1. Do you agree that a systematic way to report what a DDL command (or
 script, or transaction) is going to do on your production database
 is a feature we should provide to our growing user base?

Yes.

  2. What do you think such a feature should look like?

As with others, I think EXPLAIN is a good way to do this without adding
a keyword.  So you'd do:

EXPLAIN
ALTER TABLE 

... and it would produce a bunch of actions, available in either text or
JSON formats.  For example:

{ "locks" : [ { "lock_type": "relation",
                "relation": "table1",
                "lock_mode": "ACCESS EXCLUSIVE" },
              { "lock_type": "transaction" },
              { "lock_type": "catalog",
                "catalogs": ["pg_class", "pg_attribute", "pg_statistic"],
                "lock_mode": "EXCLUSIVE" } ],
  "writes" : [ { "object": "relation files",
                 "action": "rewrite" },
               { "object": "catalogs",
                 "action": "update" } ]
}

... etc.  Would need a lot of refinement, but you get the idea.

  3. Does it make sense to support the whole set of DDL commands from the
 get go (or ever) when most of them are only taking locks in their
 own pg_catalog entry anyway?

Well, eventually we'd want to support all of them just to avoid having
things be weird for users.  However, here's a priority order:

ALTER TABLE
CREATE TABLE
DROP TABLE
ALTER VIEW
CREATE VIEW
CREATE INDEX
DROP INDEX

... since all of the above can have unexpected secondary effects on
locking.  For example, if you create a table with FKs it will take an
ACCESS EXCLUSIVE lock on the FK targets.  And if you DROP a partition,
it takes an A.E. lock on the parent table.
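
For instance (table names invented; the exact lock mode reported will
depend on the server version), the secondary locking can be watched from
a second session while the first one is still open:

-- session 1
BEGIN;
CREATE TABLE orders (
    id      bigint PRIMARY KEY,
    cust_id bigint REFERENCES customers (id)  -- also locks "customers"
);

-- session 2, before session 1 commits or rolls back
SELECT c.relname, l.mode, l.granted
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE c.relname IN ('orders', 'customers');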

 Provided that we are able to converge towards a common enough answer to
 those questions, I propose to hack my way around and send patches to
 have it (the common answer) available in the next PostgreSQL release.

Great!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-02 Thread Peter Geoghegan
On Thu, Oct 2, 2014 at 12:54 AM, Peter Geoghegan p...@heroku.com wrote:
 I've started off by adding varied examples of the use of the existing
 proposed syntax. I'll expand on this soon.

I spent some time today expanding on the details, and commenting on
the issues around the custom syntax (exactly what it does, issues
around ambiguous conflict-on unique indexes, etc).

Also, Challenges, issues has been updated - I've added some
secondary concerns. These are non-contentious items that I think are
of concern, and would like to highlight now.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread José Luis Tallón
On 10/02/2014 06:30 PM, Dimitri Fontaine wrote:
 Hi fellow hackers,
 [snip]
 Questions:

  1. Do you agree that a systematic way to report what a DDL command (or
 script, or transaction) is going to do on your production database
 is a feature we should provide to our growing user base?

Yes, please

  2. What do you think such a feature should look like?
EXPLAIN [(verbose, format)] [DDL_COMMAND]

as in:
EXPLAIN (verbose on, format text, impact on)
ALTER TABLE emp
ADD COLUMN foo2 jsonb NOT NULL DEFAULT '{}';

where the output would include something like:

...
EXCLUSIVE LOCK ON TABLE emp;                                        // due to IMPACT on
REWRITE TABLE emp due to adding column foo2 (default='{}'::jsonb)   // due to VERBOSE on
...


 3. Does it make sense to support the whole set of DDL commands from the
 get go (or ever) when most of them are only taking locks in their
 own pg_catalog entry anyway?

For completeness sake, yes.
But, unless the impact and verbose modifiers are specified, most
would be quite self-explanatory:

EXPLAIN (verbose on, impact on) TRUNCATE TABLE emp;
Execution plan:
- EXCLUSIVE LOCK ON TABLE emp;

- truncate index: II (file=N)              // N = relfilenode
- truncate main fork: N (tablespace: T)    // N = relfilenode
- truncate visibility map

- RELEASE LOCK ON TABLE emp;

Summary: Z pages ( MMM MB ) would be freed

versus a simple:
EXPLAIN TRUNCATE TABLE emp;
Execution plan:
- truncate index: emp_pkey
- truncate index: emp_foo2_idx
- truncate relation emp


 Provided that we are able to converge towards a common enough answer to
 those questions, I propose to hack my way around and send patches to
 have it (the common answer) available in the next PostgreSQL release.


Sounds very good, indeed.
Count on me as tester :)


--
José Luis Tallón




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-02 Thread Bruce Momjian
On Tue, Sep 30, 2014 at 02:57:43PM -0700, Josh Berkus wrote:
 I don't know that that is the *expectation*.  However, I personally
 would find it *acceptable* if it meant that we could get efficient merge
 semantics on other aspects of the syntax, since my primary use for MERGE
 is bulk loading.
 
 Regardless, I don't think there's any theoretical way to support UPSERT
 without a unique constraint.  Therefore eventual support of this would
 require a full table lock.  Therefore having it use the same command as
 UPSERT with a unique constraint is a bit of a booby trap for users.
 This is a lot like the ADD COLUMN with a default rewrites the whole
 table booby trap which hundreds of our users complain about every
 month.  We don't want to add more such unexpected consequences for users.

I think if we use the MERGE command for this feature we would need to
use a non-standard keyword to specify that we want OLTP/UPSERT
functionality.  That would allow us to mostly use the MERGE standard
syntax without having surprises about non-standard behavior.  I am
thinking of how CONCURRENTLY changes the behavior of some commands.
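
Just to sketch the idea (nothing like this is implemented, and the extra
keyword is purely illustrative), the standard MERGE shape plus a
concurrency-changing modifier might look like:

MERGE CONCURRENTLY INTO tab t
USING (VALUES (1, 'x')) AS v(id, val) ON t.id = v.id
WHEN MATCHED THEN UPDATE SET val = v.val
WHEN NOT MATCHED THEN INSERT (id, val) VALUES (v.id, v.val);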

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Steven Lembark

 EXPLAIN ALTER TABLE ?

Good thing: People recognize it.
Bad thing:  People might not be able to tell the difference between
a DDL and DML result.

What about EXPLAIN DDL ...?

The extra keyword (DDL) makes it a bit more explicit that the 
results are not comparable to the standard explain output.

-- 
Steven Lembark 3646 Flora Pl
Workhorse Computing   St Louis, MO 63110
lemb...@wrkhors.com  +1 888 359 3508


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/pgSQL 2

2014-10-02 Thread Steven Lembark
On Mon, 01 Sep 2014 12:00:48 +0200
Marko Tiikkaja ma...@joh.to wrote:

 create a new language.

There are enough problems with SQL in general, enough alternatives
proposed over time that it might be worth coming up with something
that Just Works.

-- 
Steven Lembark 3646 Flora Pl
Workhorse Computing   St Louis, MO 63110
lemb...@wrkhors.com  +1 888 359 3508


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/pgSQL 2

2014-10-02 Thread Steven Lembark

 Python2 - Python3 would've been a lot less painful if you could mark,
 on a module-by-module basis, whether a module was python2 or python3
 code. It wasn't very practical for Python because python code can reach
 deep into the guts of unrelated objects discovered at runtime  - it can
 add/replace member functions, even hot-patch bytecode. That's not
 something we allow in PL/PgSQL, though; from the outside a PL/PgSQL
 function is pretty opaque to callers.

Perl does this with "use VERSION". Currently this guarantees that
the compiler is a minimum version and also turns OFF later versions'
keywords. 

At that point someone could turn on/off the appropriate syntax
by module or code block. If you never turn on v2.0 you never get the
new behavior; after that people can adjust the amount and location 
of later code to their own taste.

-- 
Steven Lembark 3646 Flora Pl
Workhorse Computing   St Louis, MO 63110
lemb...@wrkhors.com  +1 888 359 3508


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Claudio Freire
On Thu, Oct 2, 2014 at 4:40 PM, Stephen Frost sfr...@snowman.net wrote:
 * Joshua D. Drake (j...@commandprompt.com) wrote:
   2. What do you think such a feature should look like?

 I liked the other post that said: EXPLAIN ALTER TABLE or whatever.
 Heck it could even be useful to have EXPLAIN ANALZYE ALTER TABLE
 in case people want to run it on staging/test/dev environments to
 judge impact.

 The downside of the 'explain' approach is that the script then has to be
 modified to put 'explain' in front of everything and then you have to go
 through each statement and consider it.  Having a 'dry-run' transaction
 type which then produces a report at the end feels like it'd be both
 easier to assess the overall implications, and less error-prone as you
 don't have to prefex every statement with 'explain'.  It might even be
 possible to have the local view of post-alter statements be available
 inside of this 'dry-run' option- that is, if you add a column in the
 transaction then the column exists to the following commands, so it
 doesn't just error out.  Having 'explain whatever' wouldn't give you
 that and so you really wouldn't be able to have whole scripts run by
 just pre-pending each command with 'explain'.

That sounds extremely complex. You'd have to implement the fake
columns, foreign keys, indexes, etc on most execution nodes, the
planner, and even system views.

IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs
locks. I don't think you can simulate the side effects without locks,
so getting the local view of changes will be extremely difficult
unless you limit the scope considerably.
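
I.e. something like this (table name invented) - the statements really
run and take their normal locks; only the effects are thrown away at the
end:

BEGIN;
ALTER TABLE accounts ADD COLUMN note text;   -- takes ACCESS EXCLUSIVE
UPDATE accounts SET note = '';               -- sees the new column
ROLLBACK;                                    -- nothing kept, but others waited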


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Peter Geoghegan
On Thu, Oct 2, 2014 at 12:40 PM, Stephen Frost sfr...@snowman.net wrote:
 The downside of the 'explain' approach is that the script then has to be
 modified to put 'explain' in front of everything and then you have to go
 through each statement and consider it.  Having a 'dry-run' transaction
 type which then produces a report at the end feels like it'd be both
 easier to assess the overall implications, and less error-prone as you
 don't have to prefex every statement with 'explain'.  It might even be
 possible to have the local view of post-alter statements be available
 inside of this 'dry-run' option- that is, if you add a column in the
 transaction then the column exists to the following commands, so it
 doesn't just error out.  Having 'explain whatever' wouldn't give you
 that and so you really wouldn't be able to have whole scripts run by
 just pre-pending each command with 'explain'.

It's kind of tricky to implement a patch to figure this out ahead of
time. Some of the actual lock acquisitions are well hidden, in terms
of how the code is structured. In other cases, it may not even be
possible to determine ahead of time exactly what locks will be taken.

As Harold mentioned, another idea along the same lines would be to
decorate DDL with a NOWAIT no locking assertion and/or no rewrite
assertion.  Basically, if this DDL (or perhaps any DDL, if this is
implemented as a GUC instead) necessitates a table rewrite (and
requires an AccessExclusiveLock), throw an error. That's the case that
most people care about.

This may not even be good enough, though. Consider:

Session 1 is a long running transaction. Maybe it's a spurious
idle-in-transaction situation, but it could also be totally
reasonable. It holds an AccessShareLock on some relation, as long
running transactions are inclined to do.

Session 2 is our migration. It needs an AccessExclusiveLock to ALTER
TABLE on the same relation (or whatever). But it doesn't need a
rewrite, which is good. It comes along and attempts to acquire the
lock, blocking on session 1.

Session 3 is an innocent bystander. It goes to query the same table in
an ordinary, routine way - a SELECT statement. Even though session 2's
lock is not granted yet, session 3 is not at liberty to skip the queue
and get its own AccessShareLock. The effect is about the same as if
session 2 did need to hold an AccessExclusiveLock for ages: read
queries block for a long time. And yet, in theory session 2's impact
on production should not be minimal, if we consider something like
EXPLAIN output.
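
In SQL terms (table name invented), the sequence is roughly:

-- session 1
BEGIN;
SELECT count(*) FROM t;          -- ACCESS SHARE, held until commit
-- session 2
ALTER TABLE t ADD COLUMN c int;  -- wants ACCESS EXCLUSIVE, waits for session 1
-- session 3
SELECT * FROM t LIMIT 1;         -- queues behind session 2's ungranted lock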

Why is NOWAIT only supported for SET TABLESPACE? I guess it's just a
particularly bad case. NOWAIT might be the wrong thing for DDL
generally.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Peter Geoghegan
On Thu, Oct 2, 2014 at 1:37 PM, Peter Geoghegan p...@heroku.com wrote:
 And yet, in theory session 2's impact
 on production should not be minimal, if we consider something like
 EXPLAIN output.


Should have been minimal, I mean.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Claudio Freire
On Thu, Oct 2, 2014 at 5:37 PM, Peter Geoghegan p...@heroku.com wrote:
 Session 3 is an innocent bystander. It goes to query the same table in
 an ordinary, routine way - a SELECT statement. Even though session 2's
 lock is not granted yet, session 3 is not at liberty to skip the queue
 and get its own AccessShareLock. The effect is about the same as if
 session 2 did need to hold an AccessExclusiveLock for ages: read
 queries block for a long time. And yet, in theory session 2's impact
 on production should not be minimal, if we consider something like
 EXPLAIN output.

The explain would show the AccessExclusiveLock, so it would be enough
for a heads-up to kill all idle-in-transaction sessions holding locks on
the target relation (if killable, or just wait).

Granted, it's something that's not easily automatable, whereas a nowait is.

However, rather than nowait, I'd prefer cancellable semantics, that
would cancel voluntarily if any other transaction requests a
conflicting lock, like autovacuum does.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Claudio Freire
On Thu, Oct 2, 2014 at 6:00 PM, Peter Geoghegan p...@heroku.com wrote:
 Granted, it's something that's not easily automatable, whereas a nowait is.

 However, rather than nowait, I'd prefer cancellable semantics, that
 would cancel voluntarily if any other transaction requests a
 conflicting lock, like autovacuum does.

 I think the problem you'll have with NOWAIT is: you have an error from
 having to wait...what now? Do you restart? I imagine this would
 frequently result in what is effectively lock starvation. Any old
 AccessShareLock-er is going to make our migration tool restart. We'll
 never finish.

I've done that manually (throw the DDL, and cancel if it takes more
than a couple of seconds) on modest but relatively busy servers with
quite some success.
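
FWIW, since 9.3 much the same thing can be expressed declaratively with
lock_timeout (the statement below is only an example):

SET lock_timeout = '2s';
ALTER TABLE emp ADD COLUMN foo2 jsonb;  -- errors out instead of queueing
                                        -- indefinitely behind other locks
RESET lock_timeout;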


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Stephen Frost
* Claudio Freire (klaussfre...@gmail.com) wrote:
 On Thu, Oct 2, 2014 at 4:40 PM, Stephen Frost sfr...@snowman.net wrote:
  The downside of the 'explain' approach is that the script then has to be
  modified to put 'explain' in front of everything and then you have to go
  through each statement and consider it.  Having a 'dry-run' transaction
  type which then produces a report at the end feels like it'd be both
  easier to assess the overall implications, and less error-prone as you
  don't have to prefex every statement with 'explain'.  It might even be
  possible to have the local view of post-alter statements be available
  inside of this 'dry-run' option- that is, if you add a column in the
  transaction then the column exists to the following commands, so it
  doesn't just error out.  Having 'explain whatever' wouldn't give you
  that and so you really wouldn't be able to have whole scripts run by
  just pre-pending each command with 'explain'.
 
 That sounds extremely complex. You'd have to implement the fake
 columns, foreign keys, indexes, etc on most execution nodes, the
 planner, and even system views.

Eh?  We have MVCC catalog access.

 IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs
 locks. I don't think you can simulate the side effects without locks,

Why?  If you know the transaction is going to roll back and you only add
entries to the catalog which aren't visible to any other transactions
than your own, and you make sure that nothing you do actually writes
data out which is visible to other transactions..

 so getting the local view of changes will be extremely difficult
 unless you limit the scope considerably.

I agree that there may be complexities, but I'm not sure this is really
the issue..

Thanks,

Stephen




Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Peter Geoghegan
On Thu, Oct 2, 2014 at 1:52 PM, Claudio Freire klaussfre...@gmail.com wrote:
 The explain would show the AccessExclusiveLock, so it would be enough
 for a heads-up to kill all idle-in-transaction holding locks on the
 target relation (if killable, or just wait).

I think that there are very few problems with recognizing when an
AccessExclusiveLock is needed or not needed. The exceptions to the
rule that DDL needs such a lock are narrow enough that I have a hard
time believing that most people think about it, or even need to think
about it. I wish that wasn't the case, but it is.

 Granted, it's something that's not easily automatable, whereas a nowait is.

 However, rather than nowait, I'd prefer cancellable semantics, that
 would cancel voluntarily if any other transaction requests a
 conflicting lock, like autovacuum does.

I think the problem you'll have with NOWAIT is: you have an error from
having to wait...what now? Do you restart? I imagine this would
frequently result in what is effectively lock starvation. Any old
AccessShareLock-er is going to make our migration tool restart. We'll
never finish.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Claudio Freire
On Thu, Oct 2, 2014 at 6:03 PM, Stephen Frost sfr...@snowman.net wrote:
 * Claudio Freire (klaussfre...@gmail.com) wrote:
 On Thu, Oct 2, 2014 at 4:40 PM, Stephen Frost sfr...@snowman.net wrote:
  The downside of the 'explain' approach is that the script then has to be
  modified to put 'explain' in front of everything and then you have to go
  through each statement and consider it.  Having a 'dry-run' transaction
  type which then produces a report at the end feels like it'd be both
  easier to assess the overall implications, and less error-prone as you
  don't have to prefex every statement with 'explain'.  It might even be
  possible to have the local view of post-alter statements be available
  inside of this 'dry-run' option- that is, if you add a column in the
  transaction then the column exists to the following commands, so it
  doesn't just error out.  Having 'explain whatever' wouldn't give you
  that and so you really wouldn't be able to have whole scripts run by
  just pre-pending each command with 'explain'.

 That sounds extremely complex. You'd have to implement the fake
 columns, foreign keys, indexes, etc on most execution nodes, the
 planner, and even system views.

 Eh?  We have MVCC catalog access.

And that needs locks, especially if you modify the underlying filesystem layout.

 IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs
 locks. I don't think you can simulate the side effects without locks,

 Why?  If you know the transaction is going to roll back and you only add
 entries to the catalog which aren't visible to any other transactions
 than your own, and you make sure that nothing you do actually writes
 data out which is visible to other transactions..

But that's not the scope. If you want a dry-run of table-rewriting
DDL, or DDL interspersed with DML like:

alter table blargh add foo integer;
update blargh set foo = coalesce(bar, baz);

You really cannot hope not to have to write data. The above is also
the case with defaulted columns btw.
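
For the defaulted-column case that means even a single statement forces
the writes, e.g. (same invented table):

ALTER TABLE blargh ADD COLUMN foo integer NOT NULL DEFAULT 0;
-- on the releases discussed here this rewrites every row of blargh,
-- holding an ACCESS EXCLUSIVE lock for the duration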

 so getting the local view of changes will be extremely difficult
 unless you limit the scope considerably.

 I agree that there may be complexities, but I'm not sure this is really
 the issue..

In essence, if you want MVCC catalog access without AEL, you're in for
a rough ride. I'm not as experienced with pg's core as you, so you
tell me, but I imagine it will be the case.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-02 Thread Peter Geoghegan
On Thu, Oct 2, 2014 at 1:10 PM, Bruce Momjian br...@momjian.us wrote:
 I think if we use the MERGE command for this feature we would need to
 use a non-standard keyword to specify that we want OLTP/UPSERT
 functionality.  That would allow us to mostly use the MERGE standard
 syntax without having surprises about non-standard behavior.  I am
 thinking of how CONCURRENTLY changes the behavior of some commands.

That would leave you without a real general syntax. It'd also make it
harder to have certain aspects of an UPSERT be explicit (there is no
conventional join involved here - everything goes through a unique
index). Adding the magic keyword would break certain other parts of the
statement, so you'd need exact rules for what worked where. I see no
advantage, and considerable disadvantages.

Note that I've documented a lot of this stuff here:

https://wiki.postgresql.org/wiki/UPSERT

Mapping the join thing onto which unique index you want to make the
UPSERT target is very messy. There are a lot of corner cases. It's
quite ticklish.
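
A tiny example of the ambiguity (hypothetical table):

CREATE TABLE users (
    id    integer UNIQUE,
    email text    UNIQUE,
    name  text
);

A MERGE-style join condition alone doesn't say whether a conflict on id
or a conflict on email is the one the upsert should arbitrate on.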

Please add to it if you think we've missed something.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP test breakage on MacOS X

2014-10-02 Thread Peter Eisentraut
On 10/2/14 3:19 PM, Robert Haas wrote:
 1..2
 ok 1 - initdb with invalid option nonzero exit code
 ok 2 - initdb with invalid option prints error message
 # Looks like your test exited with 256 just after 2.
 not ok 3 - initdb options handling
 
 #   Failed test 'initdb options handling'
 #   at /opt/local/lib/perl5/5.12.5/Test/Builder.pm line 229.

Is this repeatable?  Bisectable?  Has it ever worked?  Have you tried a
clean build?  Did you upgrade something in your operating system?

It appears to work everywhere else.

If none of this gets us closer to an answer, I can try to produce a
patch that produces more details for such failures.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Peter Geoghegan
On Thu, Oct 2, 2014 at 2:04 PM, Claudio Freire klaussfre...@gmail.com wrote:
 I've done that manually (throw the DDL, and cancel if it takes more
 than a couple of seconds) on modest but relatively busy servers with
 quite some success.


Fair enough, but that isn't the same as NOWAIT. It's something we'd
have a hard time coming up with a general-purpose timeout for.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP test breakage on MacOS X

2014-10-02 Thread Andres Freund
On 2014-10-02 17:09:43 -0400, Peter Eisentraut wrote:
 On 10/2/14 3:19 PM, Robert Haas wrote:
  1..2
  ok 1 - initdb with invalid option nonzero exit code
  ok 2 - initdb with invalid option prints error message
  # Looks like your test exited with 256 just after 2.
  not ok 3 - initdb options handling
  
  #   Failed test 'initdb options handling'
  #   at /opt/local/lib/perl5/5.12.5/Test/Builder.pm line 229.
 
 Is this repeatable?  Bisectable?  Has it ever worked?  Have you tried a
 clean build?  Did you upgrade something in your operating system?
 
 It appears to work everywhere else.
 
 If none of this gets us closer to an answer, I can try to produce a
 patch that produces more details for such failures.

FWIW, the current amount of detail on errors is clearly insufficient.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Stephen Frost
* Peter Geoghegan (p...@heroku.com) wrote:
 On Thu, Oct 2, 2014 at 12:40 PM, Stephen Frost sfr...@snowman.net wrote:
  The downside of the 'explain' approach is that the script then has to be
  modified to put 'explain' in front of everything and then you have to go
  through each statement and consider it.  Having a 'dry-run' transaction
  type which then produces a report at the end feels like it'd be both
  easier to assess the overall implications, and less error-prone as you
  don't have to prefex every statement with 'explain'.  It might even be
  possible to have the local view of post-alter statements be available
  inside of this 'dry-run' option- that is, if you add a column in the
  transaction then the column exists to the following commands, so it
  doesn't just error out.  Having 'explain whatever' wouldn't give you
  that and so you really wouldn't be able to have whole scripts run by
  just pre-pending each command with 'explain'.
 
 It's kind of tricky to implement a patch to figure this out ahead of
 time. Some of the actual lock acquisitions are well hidden, in terms
 of how the code is structured. In others cases, it may not even be
 possible to determine ahead of time exactly what locks will be taken.

I was thinking this would be a new kind of transaction and we'd have to
teach parts of the system about it- yes, that's pretty invasive, but
it's at least one approach to consider.

 As Harold mentioned, another idea along the same lines would be to
 decorate DDL with a NOWAIT no locking assertion and/or no rewrite
 assertion.  Basically, if this DDL (or perhaps any DDL, if this is
 implemented as a GUC instead) necessitates a table rewrite (and
 requires an AccessExclusiveLock), throw an error. That's the case that
 most people care about.

The problem I see with this approach is outlined above.  I agree that it
may be independently valuable, but I don't see it as being a solution to
the issue.

 This may not even be good enough, though. Consider:
 
 Session 1 is a long running transaction. Maybe it's a spurious
 idle-in-transaction situation, but it could also be totally
 reasonable. It holds an AccessShareLock on some relation, as long
 running transactions are inclined to do.
 
 Session 2 is our migration. It needs an AccessExclusiveLock to ALTER
 TABLE on the same relation (or whatever). But it doesn't need a
 rewrite, which is good. It comes along and attempts to acquire the
 lock, blocking on session 1.
 
 Session 3 is an innocent bystander. It goes to query the same table in
 an ordinary, routine way - a SELECT statement. Even though session 2's
 lock is not granted yet, session 3 is not at liberty to skip the queue
 and get its own AccessShareLock. The effect is about the same as if
 session 2 did need to hold an AccessExclusiveLock for ages: read
 queries block for a long time. And yet, in theory session 2's impact
 on production should not be minimal, if we consider something like
 EXPLAIN output.
 
 Why is NOWAIT only supported for SET TABLESPACE? I guess it's just a
 particularly bad case. NOWAIT might be the wrong thing for DDL
 generally.

I agree that this is a concern, but this feels to me like a next step
on top of the "assess the locks required" transaction type which I am
trying to outline.  Specifically, having a way to take the report of
what locks are going to be required and then actually attempt to acquire
them all (or fail if any can't be granted immediately) would be a
natural next step and a way to start off the actual migration script-
either all get acquired and the script runs to completion, or a lock
isn't granted and the whole thing fails immediately without anything
actually being done.

Thanks,

Stephen




Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Andres Freund
On 2014-10-02 17:03:59 -0400, Stephen Frost wrote:
 * Claudio Freire (klaussfre...@gmail.com) wrote:
  On Thu, Oct 2, 2014 at 4:40 PM, Stephen Frost sfr...@snowman.net wrote:
   The downside of the 'explain' approach is that the script then has to be
   modified to put 'explain' in front of everything and then you have to go
   through each statement and consider it.  Having a 'dry-run' transaction
   type which then produces a report at the end feels like it'd be both
   easier to assess the overall implications, and less error-prone as you
   don't have to prefex every statement with 'explain'.  It might even be
   possible to have the local view of post-alter statements be available
   inside of this 'dry-run' option- that is, if you add a column in the
   transaction then the column exists to the following commands, so it
   doesn't just error out.  Having 'explain whatever' wouldn't give you
   that and so you really wouldn't be able to have whole scripts run by
   just pre-pending each command with 'explain'.
  
  That sounds extremely complex. You'd have to implement the fake
  columns, foreign keys, indexes, etc on most execution nodes, the
  planner, and even system views.
 
 Eh?  We have MVCC catalog access.

So you want to modify the catalog without actually doing the
corresponding actions? That'll be awfully invasive, with changes all
over the backend. We'll need to remove error checks (like for the
existence of relfilenodes), remove rewriting, and such.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Andres Freund
On 2014-10-02 13:49:36 -0300, Claudio Freire wrote:
 EXPLAIN ALTER TABLE ?

I don't think that'll work - there's already EXPLAIN for some CREATE. At
least CREATE TABLE ... AS, CREATE VIEW ... AS and SELECT INTO.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Stephen Frost
* Claudio Freire (klaussfre...@gmail.com) wrote:
 On Thu, Oct 2, 2014 at 6:03 PM, Stephen Frost sfr...@snowman.net wrote:
  That sounds extremely complex. You'd have to implement the fake
  columns, foreign keys, indexes, etc on most execution nodes, the
  planner, and even system views.
 
  Eh?  We have MVCC catalog access.
 
 And that needs locks, especially if you modify the underlying filesystem 
 layout.

And we wouldn't be doing that, certainly.  It's a dry-run.

  IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs
  locks. I don't think you can simulate the side effects without locks,
 
  Why?  If you know the transaction is going to roll back and you only add
  entries to the catalog which aren't visible to any other transactions
  than your own, and you make sure that nothing you do actually writes
  data out which is visible to other transactions..
 
 But that's not the scope. If you want a dry-run of table-rewriting
 DDL, or DDL interspersed with DML like:
 
 alter table blargh add foo integer;
 update blargh set foo = coalesce(bar, baz);
 
 You really cannot hope not to have to write data. The above is also
 the case with defaulted columns btw.

The point is to not write anything which is visible to other
transactions, which means we'd have to put DML into some different
'mode' which doesn't actually write where other processes might be
looking.  I'm not saying it's trivial to do, but I don't think it's
impossible either.  We might also be able to simply get away with
short-circuiting them and not actually writing anything (and the same
for reading data..).  What would probably be useful is to review actual
migration scripts and see if this would really work.  I know they'd work
for at least a subset of the migration scripts that I've dealt with
before, though certainly not all of them.

  so getting the local view of changes will be extremely difficult
  unless you limit the scope considerably.
 
  I agree that there may be complexities, but I'm not sure this is really
  the issue..
 
 In essence, if you want MVCC catalog access without AEL, you're in for
 a rough ride. I'm not as experienced with pg's core as you, so you
 tell me, but I imagine it will be the case.

It's not clear to me what you're getting at as the 'rough' part,
exactly..

Thanks,

Stephen




Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2014-10-02 17:03:59 -0400, Stephen Frost wrote:
   That sounds extremely complex. You'd have to implement the fake
   columns, foreign keys, indexes, etc on most execution nodes, the
   planner, and even system views.
  
  Eh?  We have MVCC catalog access.
 
 So you want to modify the catalog without actually doing the
 corresponding actions? That'll be heck of invasive. With changes all
 over the backend. We'll need to remove error checks (like for the
 existance of relfilenodes), remove rewriting, and such.

Yeah, I was getting at it being rather invasive earlier.  It really
depends on exactly what we'd support in this mode, which would depend on
just what would be invasive and what wouldn't, I expect.  I dislike the
idea of not being able to actually run a real migration script though as
anything else opens the very real possibility that the real script and
the 'explain' script don't do the same thing, making this capability not
nearly as useful..

Thanks,

Stephen




Re: [HACKERS] Assertion failure in syncrep.c

2014-10-02 Thread Simon Riggs
On 18 September 2014 07:32, Pavan Deolasee pavan.deola...@gmail.com wrote:

 564 /*
 565  * Set state to complete; see SyncRepWaitForLSN() for discussion
 of
 566  * the various states.
 567  */
 568 thisproc-syncRepState = SYNC_REP_WAIT_COMPLETE;
 569
 570 /*
 571  * Remove thisproc from queue.
 572  */
 573 SHMQueueDelete((thisproc-syncRepLinks));

Yes, looks like a bugette to me also.

Unlikely to occur except when no network is involved, and even more
luckily no effect at all when Assert is not enabled. So not a priority
fix.

But fix looks trivial, just to switch around the two statements above.

I'll backpatch tomorrow.

Thanks

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-02 Thread Bruce Momjian
On Thu, Oct  2, 2014 at 02:08:30PM -0700, Peter Geoghegan wrote:
 On Thu, Oct 2, 2014 at 1:10 PM, Bruce Momjian br...@momjian.us wrote:
  I think if we use the MERGE command for this feature we would need to
  use a non-standard keyword to specify that we want OLTP/UPSERT
  functionality.  That would allow us to mostly use the MERGE standard
  syntax without having surprises about non-standard behavior.  I am
  thinking of how CONCURRENTLY changes the behavior of some commands.
 
 That would leave you without a real general syntax. It'd also make
 having certain aspects of an UPSERT more explicit be a harder goal
 (there is no conventional join involved here - everything goes through
 a unique index). Adding the magic keyword would break certain other
 parts of the statement, so you'd have exact rules for what worked
 where. I see no advantage, and considerable disadvantages.
 
 Note that I've documented a lot of this stuff here:
 
 https://wiki.postgresql.org/wiki/UPSERT
 
 Mapping the join thing onto which unique index you want to make the
 UPSERT target is very messy. There are a lot of corner cases. It's
 quite ticklish.
 
 Please add to it if you think we've missed something.

OK, it is was just an idea I wanted to point out, and if it doesn't
work, it more clearly cements that we need UPSERT _and_ MERGE.

Josh was pointing out that we don't want to surprise our users, so I
suggested an additional keyword, which addresses his objections, but as
you said, if that standard MERGE syntax doesn't give us what we want,
then that is the fatal objection to using only MERGE.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Claudio Freire
On Thu, Oct 2, 2014 at 6:19 PM, Stephen Frost sfr...@snowman.net wrote:
 * Claudio Freire (klaussfre...@gmail.com) wrote:
 On Thu, Oct 2, 2014 at 6:03 PM, Stephen Frost sfr...@snowman.net wrote:
  That sounds extremely complex. You'd have to implement the fake
  columns, foreign keys, indexes, etc on most execution nodes, the
  planner, and even system views.
 
  Eh?  We have MVCC catalog access.

 And that needs locks, especially if you modify the underlying filesystem 
 layout.

 And we wouldn't be doing that, certainly.  It's a dry-run.

...

 (...) We might also be able to simply get away with
 short-circuiting them and not actually writing anything (and the same
 for reading data..).  What would probably be useful is to review actual
 migration scripts and see if this would really work.  I know they'd work
 for at least a subset of the migration scripts that I've dealt with
 before, and also not all of them.

I believe, for it to be reasonably unintrusive, you'd have to abort at
the first need to actually read/write data.

Most of my migration scripts, especially the ones that would benefit
from this, require some of that, but that's just my personal common
practice, not a general case.

  IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs
  locks. I don't think you can simulate the side effects without locks,
 
  Why?  If you know the transaction is going to roll back and you only add
  entries to the catalog which aren't visible to any other transactions
  than your own, and you make sure that nothing you do actually writes
  data out which is visible to other transactions..

 But that's not the scope. If you want a dry-run of table-rewriting
 DDL, or DDL interspersed with DML like:

 alter table blargh add foo integer;
 update blargh set foo = coalesce(bar, baz);

 You really cannot hope not to have to write data. The above is also
 the case with defaulted columns btw.

 The point is to not write anything which is visible to other
 transactions, which means we'd have to put DML into some different
 'mode' which doesn't actually write where other processes might be
 looking.  I'm not saying it's trivial to do, but I don't think it's
 impossible either.  (...)

No, I don't think it's impossible either. Just very, very
time-consuming. Both in developing the patch and in its eventual
maintenance.

TBH, a separate read-only transaction like explain alter would also be
quite difficult to keep in sync with actual alter logic, unless it's
handled by the same code (unlikely in that form).

  so getting the local view of changes will be extremely difficult
  unless you limit the scope considerably.
 
  I agree that there may be complexities, but I'm not sure this is really
  the issue..

 In essence, if you want MVCC catalog access without AEL, you're in for
 a rough ride. I'm not as experienced with pg's core as you, so you
 tell me, but I imagine it will be the case.

 It's not clear to me what you're getting at as the 'rough' part,
 exactly..

A lot of work, touching most of the codebase, and hard to maintain in the end.

Unless you limit the scope, as you said, just touching the catalog,
not data, could be doable. But it would act as an implicit
NOREWRITE.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Dimitri Fontaine
Alvaro Herrera alvhe...@2ndquadrant.com writes:
   - will the table have to be rewritten? the indexes?

 Please give my DDL deparsing patch a look.  There is a portion there
 about deparsing ALTER TABLE specifically; what it does is save a list of
 subcommands, and for each of them we either report the OID of the object
 affected (for example in ADD CONSTRAINT), or a column number (for ALTER
 COLUMN RENAME, say).  It sounds like you would like to have some extra
 details returned: for instance the does the whole of it require a table
 rewrite bit.  It sounds like it can be trivially returned in the JSON

Some years ago when working on the Event Trigger framework we did
mention providing some interesting events, such as a TableRewrite Event.

In between what you're saying here and what Harold and Peter Geoghegan
are mentioning (basically that dealing with table rewrites is 90% of
the need for them), it could be that the best way to have at it would be
to add that Event in the Event Trigger mechanism.

We could also add an AccessExclusiveLock Event that would fire just
before actually taking the lock, allowing people to RAISE EXCEPTION in
that case, or to maybe just do the LOCK … NOWAIT themselves in the
trigger.

For the locking parts, best would be to do the LOCK … NOWAIT dance for
all the tables touched by the DDL migration script. The Event Trigger
approach will not solve that, unfortunately.
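
Roughly (table names invented):

BEGIN;
LOCK TABLE emp, emp_audit IN ACCESS EXCLUSIVE MODE NOWAIT;  -- fail fast on any conflict
ALTER TABLE emp ADD COLUMN foo2 jsonb;
ALTER TABLE emp_audit ADD COLUMN foo2 jsonb;
COMMIT;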

Regards,
-- 
Dimitri Fontaine06 63 07 10 78
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Jan Wieck

On 10/02/2014 01:15 PM, Joe Conway wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 10/02/2014 11:30 AM, Dimitri Fontaine wrote:

Questions:

1. Do you agree that a systematic way to report what a DDL command
(or script, or transaction) is going to do on your production
database is a feature we should provide to our growing user base?


+1

I really like the idea and would find it useful/time-saving


2. What do you think such a feature should look like?


Elsewhere on this thread EXPLAIN was suggested. That makes a certain
amount of sense.

Maybe something like EXPLAIN IMPACT [...]


3. Does it make sense to support the whole set of DDL commands from
the get go (or ever) when most of them are only taking locks in
their own pg_catalog entry anyway?


Yes, I think it should cover all commands that can have an
availability impact.


In principle I agree with the sentiment. However, that full coverage is 
a nice goal, seldom achieved.


The real question is at what level of information, returned to the user, 
does this feature become user friendly?


It is one thing to provide information of the kind of

TAKE ACCESS EXCLUSIVE LOCK ON TABLE foo
TEST EVERY ROW IN TABLE foo FOR FK (a, b) IN bar (id1, id2)

That information is useful, but only to an experienced DBA who knows 
their schema and data to a certain degree. The majority of users, I 
fear, will not be able to even remotely guesstimate if that will need 
seconds or hours.


There needs to be more detailed information for those cases and I believe 
that tackling them one at a time in depth will lead to more useful 
results than trying to cover a lot but shallowly.



My $.02
Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for updating src/timezone

2014-10-02 Thread Tom Lane
John Cochran j69coch...@gmail.com writes:
 As it is, I've finished checking the differences between the postgres and
 IANA code for zic.c after editing both to eliminate non-functional style
 differences such as indentation, function prototypes, comparing strchr
 results against NULL or 0, etc. It looks like the only differences from the
 stock code are support for the -P option implemented by Tom Lane and the
 substitution of the postgres code for getopt instead of the unix default as
 well as a few minor changes in some defaults and minor modifications to
 deal with Windows. Overall, rather small and trivial.

 Additionally, I discovered a little piece of low hanging fruit. The
 Makefile for the timezone code links together zic.o ialloc.o scheck.o and
 localtime.o in order to make zic.  Additionally, zic.c has a stub
 implementation of pg_open_tzfile() in order to resolve a linkage issue with
 the localtime.o module for zic.
 Well, as it turns out, localtime.o doesn't supply ANY symbols that zic
 needs and therefore can be omitted entirely from the list of object files
 comprising zic. Which in turn means that the stub implementation of
 pg_open_tzfile can be removed from the postgres version of zic.c.  I'm not
 bothering to submit a patch involving this since that patch will be quite
 short lived given my objective to bring the code up to date with the 2014e
 version of the IANA code. But I am submitting a bug report to IANA on the
 Makefile since the unneeded linkage with localtime.o is still in the 2014e
 code on their site.

John, have you made any further progress on this since July?

The urgency of updating our timezone code has risen quite a bit for me,
because while testing an update of the data files to tzdata2014h I became
aware that the -P option is failing to print a noticeable number of
zone abbreviations that clearly exist in the data files.  Since the -P
hack itself is so simple, it's hard to come to any other conclusion than
that the rest of the timezone code is buggy --- presumably because it's
four years behind what IANA is targeting with the data files.

I spent a little bit of time looking at just applying the diffs between
tzcode2010c and tzcode2014h to our code, but the diffs are large enough
that that seems both tedious and quite error-prone.  I think we'd be
better advised to push forward with the idea of trying to absorb
more-or-less-unmodified versions of the timezone code files.  (I'm
particularly disillusioned with the idea that we'll keep on pgindent'ing
that code --- the effort-to-reward ratio even to keep the comments
readable just ain't very good.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NEXT VALUE FOR sequence

2014-10-02 Thread Thomas Munro
On 3 October 2014 00:01, Thomas Munro mu...@ip9.org wrote:
 On 2 October 2014 14:48, Tom Lane t...@sss.pgh.pa.us wrote:
 Thomas Munro mu...@ip9.org writes:
 SQL:2003 introduced the function NEXT VALUE FOR sequence. Google
 tells me that at least DB2, SQL Server and a few niche databases
 understand it so far.  As far as I can tell there is no standardised
 equivalent of currval and setval (but I only have access to second
 hand information about the standard, like articles and the manuals of
 other products).

 Have you checked the archives about this?  My recollection is that one
 reason it's not in there (aside from having to reserve NEXT) is that
 the standard-mandated semantics are not the same as nextval().

 Right, I found the problem: "If there are multiple instances of next value
 expressions specifying the same sequence generator within a single
 SQL-statement, all those instances return the same value for a
 given row processed by that SQL-statement."  This was discussed in a thread
 from 2002 [1].

 So the first step would be to add a standard-conforming function into which
 to transform the standard's syntax.

 I found the text in the 20nn draft specification and it didn't seem 
 immediately
 clear what 'statement' should mean, for example what if your statement calls
 pl/pgsql which contains further statements, and what if triggers, default
 expressions, etc are invoked?  I suppose one approach would be to use command
 IDs as the scope.  Do you think the following change would make sense?

 In struct SeqTableData (from sequence.c), add a member last_command_id.
 When you call the new function, let's say nextval_for_command(regclass),
 if last_command_id matches GetCommandId() then it behaves like currval_oid
 and returns last, otherwise it behaves like nextval_oid, and updates
 last_command_id to the current command ID.

Actually scratch that, it's not about statements, it's about "rows processed
by that SQL-statement".  I will think about how that could be interpreted
and implemented...
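
To make the difference concrete, here is a sketch (the table and sequence
names are made up, and NEXT VALUE FOR is of course not implemented yet):

  CREATE SEQUENCE s;
  CREATE TABLE t (a bigint, b bigint);

  -- With nextval(), the two calls may return different values in one row:
  INSERT INTO t SELECT nextval('s'), nextval('s') FROM generate_series(1, 3);

  -- Under the spec's wording, both references below would have to return
  -- the same value for any given row, i.e. a = b in every inserted row:
  -- INSERT INTO t SELECT NEXT VALUE FOR s, NEXT VALUE FOR s
  --   FROM generate_series(1, 3);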

Best regards,
Thomas Munro


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE IF NOT EXISTS INDEX

2014-10-02 Thread Marti Raudsepp
On Wed, Oct 1, 2014 at 2:42 PM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:
 So, what's the correct/best grammar?
 CREATE [ IF NOT EXISTS ] [ UNIQUE ] INDEX index_name
 or
 CREATE [ UNIQUE ] INDEX [ IF NOT EXISTS ] index_name

I've elected myself as the reviewer for this patch. Here are some
preliminary comments...

I agree with José. The 2nd is more consistent given the other syntaxes:
  CREATE { TABLE | SCHEMA | EXTENSION | ... } IF NOT EXISTS name ...
It's also compatible with SQLite's grammar:
https://www.sqlite.org/lang_createindex.html

Do we want to enforce an order on the keywords or allow both?
  CREATE INDEX IF NOT EXISTS CONCURRENTLY foo ...
  CREATE INDEX CONCURRENTLY IF NOT EXISTS foo ...

It's probably very rare to use both keywords at the same time, so I'd
prefer only the 2nd, unless someone else chimes in.
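
For the archives, with that preferred ordering the statements would look
like this (index and table names are made up, and this is of course pending
the patch):

  CREATE UNIQUE INDEX IF NOT EXISTS orders_ref_idx ON orders (ref);
  CREATE INDEX CONCURRENTLY IF NOT EXISTS orders_created_idx ON orders (created_at);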

Documentation: I would prefer if the explanation were consistent with
the description for ALTER TABLE/EXTENSION; just copy it and replace
"relation" with "index".

+ ereport(NOTICE,
+ (errcode(ERRCODE_DUPLICATE_TABLE),
+  errmsg("relation \"%s\" already exists, skipping",
+ indexRelationName)));

1. Clearly "relation" should be "index".
2. Use ERRCODE_DUPLICATE_OBJECT not TABLE

+ if (n->if_not_exists && n->idxname == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+  errmsg("IF NOT EXISTS requires that you name the index."),

I think ERRCODE_SYNTAX_ERROR makes more sense; it's something that we
decided we *don't want* to support.

- write_msg(NULL, "reading row-security enabled for table \"%s\"",
+ write_msg(NULL, "reading row-security enabled for table \"%s\"\n",

???

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NEXT VALUE FOR sequence

2014-10-02 Thread Tom Lane
Thomas Munro mu...@ip9.org writes:
 On 2 October 2014 14:48, Tom Lane t...@sss.pgh.pa.us wrote:
 Have you checked the archives about this?  My recollection is that one
 reason it's not in there (aside from having to reserve NEXT) is that
 the standard-mandated semantics are not the same as nextval().

 Right, I found the problem: "If there are multiple instances of next value
 expressions specifying the same sequence generator within a single
 SQL-statement, all those instances return the same value for a
 given row processed by that SQL-statement."  This was discussed in a thread
 from 2002 [1].

Wow, it was that far back?  No wonder I didn't remember the details.

 I suppose one approach would be to use command
 IDs as the scope.

The spec clearly says one value per row, not one per statement; so
command ID is very definitely not the right thing.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Joe Conway

On 10/02/2014 06:43 PM, Jan Wieck wrote:
 The real question is at what level of information, returned to the
 user, does this feature become user friendly?
 
 It is one thing to provide information of the kind of
 
 TAKE ACCESS EXCLUSIVE LOCK ON TABLE foo TEST EVERY ROW IN TABLE
 foo FOR FK (a, b) IN bar (id1, id2)
 
 That information is useful, but only to an experienced DBA who
 knows their schema and data to a certain degree. The majority of
 users, I fear, will not be able to even remotely guesstimate if
 that will need seconds or hours.
 
 There needs to be more detail information for those cases and I
 believe that tackling them one at a time in depth will lead to more
 useful results than trying to cover a lot but shallow.

Perhaps.

This and other posts on this thread make me wonder if some kind of
extension to the stats collector and pg_stat_statements might be a good
place to start?

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting,  24x7 Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-02 Thread Peter Geoghegan
On Thu, Oct 2, 2014 at 5:08 PM, Greg Smith
greg.sm...@crunchydatasolutions.com wrote:
 When 9.4 is already giving a more than 100% gain on this targeted test case,
 I can't see that chasing after maybe an extra 10% is worth having yet
 another GUC around.  Especially when it will probably take multiple tuning
 steps before you're done anyway; we don't really know the rest of them yet;
 and when we do, we probably won't need a GUC to cope with them in the end
 anyway.

Agreed. I think that prior to 9.4, the logging performance of Postgres
was very competitive when compared to other systems. At this stage,
it's probably extremely fast by any standard. Amit's work on only
WAL-logging the modified portion of UPDATEs helps here too.

I tend to believe that the next big round of performance gains can be
had by working on the buffer manager, and B-Tree indexes. At some
point we should work on prefix compression within B-Tree leaf pages.
We should also work on adding abbreviated keys to B-Tree internal
pages. Doing so should almost remove the benefit of using the C
locale, because most comparisons needed for index scans can use
comparisons implemented as nothing more than a memcmp() (note that
internal pages have values that are naturally heterogeneous, so this
will work well).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-10-02 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Alvaro Herrera wrote:
  Basically, if you are on 9.3.5 or earlier any per-table options for
  autovacuum cost delay will misbehave (meaning: any such table will be
  processed with settings flattened according to balancing of the standard
  options, _not_ the configured ones).  If you are on 9.3.6 or newer they
  will behave as described in the docs.
 
 Another thing to note is that if you have configured a table to have
 cost_limit *less* than the default (say 150 instead of the default 200),
 the balance system will again break that and process the table at 200
 instead; in other words, the balancing system has completely broken the
 ability to tweak the cost system for individual tables in autovacuum.

That's certainly pretty ugly.

 With the v5 patch, the example tables above will be vacuumed at exactly
 5000 and 150 instead.  The more complex patch I produced earlier would
 have them vacuumed at something like 4900 and 100 instead, so you
 wouldn't exceed the total of 5000.  I think there is some value to that
 idea, but it seems the complexity of managing this is too high.

Agreed.

 I am rather surprised that nobody has reported this problem before.  I
 am now of the mind that this is clearly a bug that should be fixed all
 the way back.

I'm coming around to that also; however, should we worry about users who
set per-table settings and then simply forgot about them?  I suppose
that won't matter too much unless the table is really active, and if it
is, they've probably already set it to zero.
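
For reference, the kind of per-table tweak being discussed is just a storage
parameter, e.g. (table names made up, values echoing the examples above):

  ALTER TABLE small_but_hot SET (autovacuum_vacuum_cost_limit = 150);
  ALTER TABLE big_append_only SET (autovacuum_vacuum_cost_limit = 5000);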

Thanks,

Stephen




Re: [HACKERS] Patch to add support of IF NOT EXISTS to others CREATE statements

2014-10-02 Thread Marti Raudsepp
On Tue, Aug 26, 2014 at 4:20 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 04/14/2014 10:31 PM, Fabrízio de Royes Mello wrote:
 The attached patch contains CINE for sequences.

 I just strip this code from the patch rejected before.

 Committed with minor changes

Hmm, the CommitFest app lists Marko Tiikkaja as the reviewer, but I
can't find his review anywhere...

The documentation claims:
CREATE [ IF NOT EXISTS ] SEQUENCE name
But grammar implements it the other way around:
CREATE SEQUENCE IF NOT EXISTS name;

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE IF NOT EXISTS INDEX

2014-10-02 Thread Marti Raudsepp
On Fri, Oct 3, 2014 at 2:15 AM, Marti Raudsepp ma...@juffo.org wrote:
 + ereport(NOTICE,
 + (errcode(ERRCODE_DUPLICATE_TABLE),
 +  errmsg("relation \"%s\" already exists, skipping",
 + indexRelationName)));

 1. Clearly "relation" should be "index".
 2. Use ERRCODE_DUPLICATE_OBJECT not TABLE

My bad, this code is OK. The current code already uses "relation" and
TABLE elsewhere because indexes share the same namespace with tables.

+ /*
+  * Throw an exception when IF NOT EXISTS is used without a named
+  * index
+  */

I'd say "without an index name". And the line goes beyond 80 characters.

I would also move this check to after all the attributes have been
assigned, rather than splitting the assignments in half.

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP test breakage on MacOS X

2014-10-02 Thread Robert Haas
On Thu, Oct 2, 2014 at 5:09 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 10/2/14 3:19 PM, Robert Haas wrote:
 1..2
 ok 1 - initdb with invalid option nonzero exit code
 ok 2 - initdb with invalid option prints error message
 # Looks like your test exited with 256 just after 2.
 not ok 3 - initdb options handling

 #   Failed test 'initdb options handling'
 #   at /opt/local/lib/perl5/5.12.5/Test/Builder.pm line 229.

 Is this repeatable?  Bisectable?  Has it ever worked?  Have you tried a
 clean build?  Did you upgrade something in your operating system?

I don't think it's ever worked.  I just didn't get around to reporting
it before now.

 If none of this gets us closer to an answer, I can try to produce a
 patch that produces more details for such failures.

A test that fails for no reason that can be gleaned from the output is
not an improvement over not having a test at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog and replication slots

2014-10-02 Thread Michael Paquier
On Thu, Oct 2, 2014 at 12:44 AM, Andres Freund and...@2ndquadrant.com
wrote:

 I pushed the first part.

Thanks. Attached is a rebased version of patch 2, implementing the actual
feature. One thing I noticed with more testing is that if --create is used
and the destination folder does not exist, pg_receivexlog would create the
slot and then exit with an error. This does not look user-friendly, so I
changed the logic a bit to check for the destination folder before creating
any slot. This results in a bit of refactoring, but this way the behavior is
more intuitive.
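
In other words, with this in place one no longer has to create the slot by
hand first with something like (slot name made up):

  SELECT pg_create_physical_replication_slot('archive_slot');

before pointing pg_receivexlog at it with --slot.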
Regards,
-- 
Michael
From e3ee4b918b8122d414c03207fdd50f2472dc292e Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Mon, 1 Sep 2014 20:53:45 +0900
Subject: [PATCH] Support for replslot creation and drop in pg_receivexlog

Using the new actions --create and --drop that are similarly present
in pg_recvlogical, a user can respectively create and drop a replication
slot that can be used afterwards when fetching WALs.
---
 doc/src/sgml/ref/pg_receivexlog.sgml   |  29 +++
 src/bin/pg_basebackup/pg_receivexlog.c | 154 +
 2 files changed, 168 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/ref/pg_receivexlog.sgml b/doc/src/sgml/ref/pg_receivexlog.sgml
index 5916b8f..69aa0af 100644
--- a/doc/src/sgml/ref/pg_receivexlog.sgml
+++ b/doc/src/sgml/ref/pg_receivexlog.sgml
@@ -72,6 +72,35 @@ PostgreSQL documentation
   <title>Options</title>
 
   <para>
+    <application>pg_receivexlog</application> can run in one of two following
+    modes, which control physical replication slot:
+
+    <variablelist>
+
+     <varlistentry>
+      <term><option>--create</option></term>
+      <listitem>
+       <para>
+        Create a new physical replication slot with the name specified in
+        <option>--slot</option>.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--drop</option></term>
+      <listitem>
+       <para>
+        Drop the replication slot with the name specified in
+        <option>--slot</option>, then exit.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+
+   </para>
+
+   <para>
 The following command-line options control the location and format of the
 output.
 
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 171cf43..8c4f752 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -38,11 +38,15 @@ static int	noloop = 0;
 static int	standby_message_timeout = 10 * 1000;		/* 10 sec = default */
 static int	fsync_interval = 0; /* 0 = default */
 static volatile bool time_to_abort = false;
+static bool do_create_slot = false;
+static bool do_drop_slot = false;
 
 
 static void usage(void);
+static DIR* get_destination_dir(char *dest_folder);
+static void close_destination_dir(DIR *dest_dir, char *dest_folder);
 static XLogRecPtr FindStreamingStart(uint32 *tli);
-static void StreamLog();
+static void StreamLog(void);
 static bool stop_streaming(XLogRecPtr segendpos, uint32 timeline,
 			   bool segment_finished);
 
@@ -78,6 +82,9 @@ usage(void)
 	printf(_("  -w, --no-password  never prompt for password\n"));
 	printf(_("  -W, --password force password prompt (should happen automatically)\n"));
 	printf(_("  -S, --slot=SLOTNAME    replication slot to use\n"));
+	printf(_("\nOptional actions:\n"));
+	printf(_("  --create   create a new replication slot (for the slot's name see --slot)\n"));
+	printf(_("  --drop drop the replication slot (for the slot's name see --slot)\n"));
 	printf(_("\nReport bugs to pgsql-b...@postgresql.org.\n"));
 }
 
@@ -118,6 +125,44 @@ stop_streaming(XLogRecPtr xlogpos, uint32 timeline, bool segment_finished)
 	return false;
 }
 
+
+/*
+ * Get destination directory.
+ */
+static DIR*
+get_destination_dir(char *dest_folder)
+{
+	DIR *dir;
+
+	Assert(dest_folder != NULL);
+	dir = opendir(dest_folder);
+	if (dir == NULL)
+	{
+		fprintf(stderr, _("%s: could not open directory \"%s\": %s\n"),
+				progname, basedir, strerror(errno));
+		disconnect_and_exit(1);
+	}
+
+	return dir;
+}
+
+
+/*
+ * Close existing directory.
+ */
+static void
+close_destination_dir(DIR *dest_dir, char *dest_folder)
+{
+	Assert(dest_dir != NULL  dest_folder != NULL);
+	if (closedir(dest_dir))
+	{
+		fprintf(stderr, _("%s: could not close directory \"%s\": %s\n"),
+				progname, dest_folder, strerror(errno));
+		disconnect_and_exit(1);
+	}
+}
+
+
 /*
  * Determine starting location for streaming, based on any existing xlog
  * segments in the directory. We start at the end of the last one that is
@@ -134,13 +179,7 @@ FindStreamingStart(uint32 *tli)
 	uint32		high_tli = 0;
 	bool		high_ispartial = false;
 
-	dir = opendir(basedir);
-	if (dir == NULL)
-	{
-		fprintf(stderr, _("%s: could not open directory \"%s\": %s\n"),
-				progname, basedir, strerror(errno));
-		disconnect_and_exit(1);
-	}
+	dir = get_destination_dir(basedir);
 
 	while (errno = 0, (dirent = 

[HACKERS] GiST splitting on empty pages

2014-10-02 Thread Andrew Gierth
This is from Bug #11555, which is still in moderation as I type this
(analysis was done via IRC).

The GiST insertion code appears to have no length checks at all on the
inserted entry. index_form_tuple checks for length <= 8191, with the
default blocksize, but obviously a tuple less than 8191 bytes may
still not fit on the page due to page header info etc.

However, the gist insertion code assumes that if the tuple doesn't fit,
then it must have to split the page, with no check whether the page is
already empty.

So this crashes with infinite recursion in gistSplit (which also lacks
a check_stack_depth() call):

create extension pg_trgm;

create table t1 (a text);
create index on t1 using gist (a gist_trgm_ops);

insert into t1 values (
 'vvsehbxxlezgwtyvbgdyburmhxmuzbwvwoaepxbbbaiyuguwnxvprmbzkotkqqfnegruds'
 'wftedolykzvfonsqndehouxuibazwdrtjlynzjlkihqxvjnimrpbmnvupvtlzlejxdwwmh'
 'hvpxtkggstyivlvqgkmawmlbvjerfrzmnokgyrnrllagwwxdgjddwofrxjidbiqowbvusi'
 'mdumkrihuprxsnmyekhnojvexsftmybzcwlmntuijlfcyracciqqrmuoeairzkqbgcouvi'
 'cfthhszhvbplshxmwcnetnokovmdpimrnfzuxzcsaseszcfvetaxgoivjuzzclqprqkopn'
 'hqgmjgoocsicqpqylatkzvvqlmhwbwjjmpvvwkkyctatirstsldsgzismqonmxxzntvkdf'
 'ifzizharbsdfkjetcrjqfwocvcvqmywuevvevyjgiozkfialfpnarjqdinymibqlem'
 'qakzgtofeuoeftutulclpkynxgoostaitkizewfirunxnhqhttsiervbxkpqdqyxbhxfdc'
 'nvwbskiirbckkgbfizqypuoorpvovzqiunjnxswpuyaefbkobbmrvbgmrbbmbsvwffjcxf'
 'ssesxjtiyvjkmemsrdusqvklspqbsohkhlcevwmtrveeaqwrurjknuwfkngcbnnjzpnvma'
 'odvsiwjfnewxpjslocyveajsjjhxeuxsxtlgqvldzhbortagvybazlsjuagyueqsycyoxj'
 'swrtljnlpikrjjccswczuxelpdnorlyjhpdszqdozngjxilqfoqalumaxapplnzscclctp'
 'rtdxdagorlchmocypayepjrpcusowldnfgkihrxzcagoojndjmizwzoyugmqsqeyxpgege'
 'ejelytulxeyfdufsszzfqrvupskwxrbbafnyzhkwlicpchivhhaxywsopdlnumpusctrje'
 'ovmqlpytlfamdziwnxzyltlaodciummihzzsoxmadmmgluczscxdwiekmgsgsfpaeostme'
 'tprfwcazbtbzwyibiuhbbahqalftfryyhpeeseduxftudcvmwdoxdvodgtxllvktkoxdta'
 'xrgqmjtiwqlknpfctmwqyhliawxyzrywienvogdkwovkeanxmjnkrztsvrqviprquflimp'
 'tjeouiphfcrtnisgaoxrjfgbwahijuxddbsxkhfqjwjwfcdgrbxagdcdekmoekshmkfwsl'
 'mbivynyctpdrqkutnzdaohkgpwqvsihfkpajczlwoonfziynibnwjxczttumcbrnrswtri'
 'qgxelwmjjvlwruuutnoozqpregjbaajrhhvsdicndnkvhepbseprvfjzmsamtkearzsuiu'
 'ilhsgpwwqoafgvkpuwhujbenbwnuqvoygwrnlnjccjhiesyyogtyhymiuzclvrkbobpapy'
 'crhjalcykreepmdbvyaxkvpuxdwmdllfcspdesbpjguysyaowbmhwbcufyhiksonleqpws'
 'ffyzerxefufrcctexydegnxvajzrywjebiegzckfxqxzsqdpohuvusrvbrmanwepeivelg'
 'jiwhhoxlemszimraisrvterytwhpasvkarrhgptlclklyblhuccnhumbqtqrllcldutkkn'
 'vmyfyxhkecmhqubcvsvmkgxmsbgllqyhdxmbuulzwygmtipoakakqywjadvltusxrfymzk'
 'mwjsjcayqbirlzpiipmebfyucqabcampwvigxieoknfwnvfvlranxyiaoibringfjolgxq'
 'uhdaeqwjmhamvxldxzlzqunxawmmdjcyrgzvxvfjcfwydhbbhmbxhovhlhtoqwnicmeahj'
 'jkpgitojuvwvtekomvwfkncxvfkzfrjpcyvlskvmfrizwsoiokoyxqwsvhsazbpbalmsvh'
 'fbznavgoeuystwjpoexhfwjvxkgkdcridrwdncsrxrkqntgbydjdzszwcfgghyolqlodnh'
 'ukyfblyhnwkwajpzgsfnynlnybynfmuzxseyddfrapnaycafugsstdfsfefkqaknsplwsq'
 'ntgbufdukybcrugxnmbsxrsielxqiqhwjnxdtbydzzgqunnhzoawgsflecbmtjjcxggqhe'
 'tgeaxynkgmzgjgzordrtqkdznaftqnyktdkrcxcikbouiniarathkxgyxmsnzrytuikwfm'
 'eqotkxgtxxtrfeomclyvzymxrggcdmpicebmbifyfzpldexgqvbptnnlutnxfdfihhuipa'
 'hvaxgdbdkszliszvetpsrvvxddeymuytpyrvctzmlyytrxovreojzjhcnlazgzsvykrbdq'
 'nopmhgjwcbaqlaasdneemkdfgcpxdtoqhddoknlmomdzmdrprvtegxkmzajctytacxpmka'
 'zzncyzgqpxmjcsgmfgmojfndgpawckwbjjeijlzzjmilfpxkwkzfqmjxbjteuqfeaknjvm'
 'iezrqegnodynjpasmbbffwvlavwnfraowzfmdmaspygyograisgopcaqxwednerkexwijw'
 'azvhyjnpkwiqkxsloqhsuvwlfbjbjtykturrefhpcpfnnyybpftjaqvfsfhbygmraejekq'
 'umfzztyxuocoydftixzqzxwlpxpyczowmuwlnuiiilxgocaxaaozxklnialkaagmucyixh'
 'qgsnnhqnfqntpleaymbkxckdpfgnnduejnrwuikayytokyoilqtisdmhisvwwpafcscxan'
 'xylrnvpcebsxjlbvtkoogkegqhzsfzgdyrulnknslgqusrqmbebhpfofnnysnewlvqxjal'
 'cmrshkjxwkcxsrdhwquujhzftvwqexbgjtyqdioatqxliatfrnabvzhoueeybgflzecdmq'
 'dghbsqclvuyvvtudiohm'
);

Suggested fixes (probably all of these are appropriate):

1. gistSplit should have check_stack_depth()

2. gistSplit should probably refuse to split anything if called with
only one item (which is the value being inserted).

3. somewhere before reaching gistSplit it might make sense to check
explicitly (e.g. in gistFormTuple) whether the tuple will actually
fit on a page.

4. pg_trgm probably should do something more sensible with large leaf
items, but this is a peripheral issue since ultimately the gist core
must enforce these limits rather than rely on the opclass.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP test breakage on MacOS X

2014-10-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 make check-world dies ingloriously for me, like this:

FWIW, it works fine for me on my Mac laptop, using the Perl 5.16.2 that
comes standard with OSX 10.9.5.  I did have to install IPC::Run from
CPAN though.

 #   Failed test 'initdb options handling'
 #   at /opt/local/lib/perl5/5.12.5/Test/Builder.pm line 229.

This output seems to be pretty clear proof that you're not using
Apple's Perl.  What is it exactly (where did you get it from)?

Also, noticing that what you're using is evidently Perl 5.12, I'm
wondering whether our TAP test scripts require a fairly new Perl version.
I recall some of my Salesforce colleagues griping that the TAP scripts
didn't work with older Perls.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to add support of IF NOT EXISTS to others CREATE statements

2014-10-02 Thread Fabrízio de Royes Mello
On Thu, Oct 2, 2014 at 9:38 PM, Marti Raudsepp ma...@juffo.org wrote:

 On Tue, Aug 26, 2014 at 4:20 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
  On 04/14/2014 10:31 PM, Fabrízio de Royes Mello wrote:
  The attached patch contains CINE for sequences.
 
  I just strip this code from the patch rejected before.
 
  Committed with minor changes

 Hmm, the CommitFest app lists Marko Tiikkaja as the reviewer, but I
 can't find his review anywhere...


Maybe he had no time to review it.


 The documentation claims:
 CREATE [ IF NOT EXISTS ] SEQUENCE name
 But grammar implements it the other way around:
 CREATE SEQUENCE IF NOT EXISTS name;


You are correct. Fix attached.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello
diff --git a/doc/src/sgml/ref/create_sequence.sgml b/doc/src/sgml/ref/create_sequence.sgml
index 7292c3f..9e364ff 100644
--- a/doc/src/sgml/ref/create_sequence.sgml
+++ b/doc/src/sgml/ref/create_sequence.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
 <refsynopsisdiv>
<synopsis>
-CREATE [ TEMPORARY | TEMP ] [ IF NOT EXISTS ] SEQUENCE <replaceable class="parameter">name</replaceable> [ INCREMENT [ BY ] <replaceable class="parameter">increment</replaceable> ]
+CREATE [ TEMPORARY | TEMP ] SEQUENCE [ IF NOT EXISTS ] <replaceable class="parameter">name</replaceable> [ INCREMENT [ BY ] <replaceable class="parameter">increment</replaceable> ]
 [ MINVALUE <replaceable class="parameter">minvalue</replaceable> | NO MINVALUE ] [ MAXVALUE <replaceable class="parameter">maxvalue</replaceable> | NO MAXVALUE ]
 [ START [ WITH ] <replaceable class="parameter">start</replaceable> ] [ CACHE <replaceable class="parameter">cache</replaceable> ] [ [ NO ] CYCLE ]
 [ OWNED BY { <replaceable class="parameter">table_name</replaceable>.<replaceable class="parameter">column_name</replaceable> | NONE } ]

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread Jim Nasby

On 10/2/14, 2:43 PM, Josh Berkus wrote:

Questions:

  1. Do you agree that a systematic way to report what a DDL command (or
 script, or transaction) is going to do on your production database
 is a feature we should provide to our growing user base?

Yes.

+1

  2. What do you think such a feature should look like?

As with others, I think EXPLAIN is a good way to do this without adding
a keyword.  So you'd do:

EXPLAIN
ALTER TABLE 

I'm thinking it would be better to have something you could set at a session 
level, so you don't have to stick EXPLAIN in front of all your DDL.
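
Something along these lines, say (the GUC name below is purely hypothetical,
just to illustrate the idea):

  SET ddl_report_impact = on;   -- hypothetical setting, not implemented
  ALTER TABLE foo ALTER COLUMN bar TYPE bigint;   -- would emit the impact report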

As for the dry-run idea, I don't think that's really necessary. I've never seen anyone 
serious who doesn't have a development environment, which is where you would simply 
deploy the real DDL using verbose mode and see what the underlying commands 
actually do.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

