Re: [HACKERS] Logging of PAM Authentication Failure

2013-05-15 Thread Amit Langote
Attached is a patch based on the description in my previous mail.
It will still need revision, since the error case of an authentication
timeout on the server has to be handled; probably in simple_prompt()?


--
Amit Langote




Re: [HACKERS] Fast promotion failure

2013-05-15 Thread Kyotaro HORIGUCHI
Hello,

> > >> Is the point of this discussion that the patch may leave out some
> > >> glitch about the timing of timeline-related changes, and Heikki saw a
> > >> symptom of that?
> > >
> > > AFAIU, the committed patch has some gap in the overall scenario, which is
> > > the fast promotion issue.
> > 
> > Right, the fast promotion issue is still there.
> > 
> > Just to get us all on the same page again: Does anyone see a problem
> > with a fresh git checkout, with the fast-promotion-quick-fix.patch
> > applied?
> > (http://www.postgresql.org/message-id/51894942.4080...@vmware.com). If
> > you do, please speak up. As far as I know, the already-committed patch,
> > together with fast-promotion-quick-fix.patch, should fix all known
> > issues (*).

Shared XLogCtl->ThisTimeLineID is written and read without the
spinlock fencing used for some other XLogCtl members. Could that
break the coherency of its memory image between different processors?
If that is the cause, it is quite plausible that I cannot reproduce
the trouble. I didn't see the issue even without
fast-promotion-quick-fix.patch.
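
For illustration, a minimal sketch of what reading that field under the
existing info_lck spinlock would look like, following the volatile-pointer
idiom used for other XLogCtl members (the function name here is invented;
whether such fencing is actually needed is exactly the open question):

TimeLineID
GetSharedThisTimeLineID(void)
{
    /* use volatile pointer to prevent code rearrangement */
    volatile XLogCtlData *xlogctl = XLogCtl;
    TimeLineID  tli;

    SpinLockAcquire(&xlogctl->info_lck);
    tli = xlogctl->ThisTimeLineID;
    SpinLockRelease(&xlogctl->info_lck);

    return tli;
}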

> The patch provided will unnecessarily call InitXLOGAccess() twice for the
> end-of-recovery checkpoint. It doesn't matter w.r.t. performance, but the
> purpose of calling LocalSetXLogInsertAllowed() and InitXLOGAccess() will be
> almost the same, or am I missing something?
> 
> One more thing: I think after fast promotion, it should either set the
> timeline or give an error in the CreateCheckPoint() function before it
> reaches the check mentioned by you in your initial mail:
> if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0)
>     elog(ERROR, "can't create a checkpoint during recovery");
> Shouldn't it set the timeline in the above check (RecoveryInProgress()), or
> when RecoveryInProgress() is called before CreateCheckPoint()?

Considering the checkpointer: it calls RecoveryInProgress() far
earlier than that, in the wait loop in CheckpointerMain() where it
decides between doing a checkpoint and a restartpoint. So
InitXLOGAccess() has already been done by the time a checkpoint is
chosen there for the first time. And before that, ThisTimeLineID in
the startup process gets incremented and is copied onto
XLogCtl->ThisTimeLineID before xlogctl->SharedRecoveryInProgress
becomes false in StartupXLOG().  I don't think it is possible for the
checkpointer to run on an older timeline ID, on condition that all
processes see exactly the same memory image.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Better LWLocks with compare-and-swap (9.4)

2013-05-15 Thread Daniel Farina
On Wed, May 15, 2013 at 3:08 PM, Daniel Farina  wrote:
> On Mon, May 13, 2013 at 5:50 AM, Heikki Linnakangas
>  wrote:
>> pgbench -S is such a workload. With 9.3beta1, I'm seeing this profile, when
>> I run "pgbench -S -c64 -j64 -T60 -M prepared" on a 32-core Linux machine:
>>
>> -  64.09%  postgres  postgres   [.] tas
>>- tas
>>   - 99.83% s_lock
>>  - 53.22% LWLockAcquire
>> + 99.87% GetSnapshotData
>>  - 46.78% LWLockRelease
>>   GetSnapshotData
>> + GetTransactionSnapshot
>> +   2.97%  postgres  postgres   [.] tas
>> +   1.53%  postgres  libc-2.13.so   [.] 0x119873
>> +   1.44%  postgres  postgres   [.] GetSnapshotData
>> +   1.29%  postgres  [kernel.kallsyms]  [k] arch_local_irq_enable
>> +   1.18%  postgres  postgres   [.] AllocSetAlloc
>> ...
>>
>> So, on this test, a lot of time is wasted spinning on the mutex of
>> ProcArrayLock. If you plot a graph of TPS vs. # of clients, there is a
>> surprisingly steep drop in performance once you go beyond 29 clients
>> (attached, pgbench-lwlock-cas-local-clients-sets.png, red line). My theory
>> is that after that point all the cores are busy, and processes start to be
>> sometimes context switched while holding the spinlock, which kills
>> performance.

I accidentally dropped some important words from the end of Heikki's
mail, which makes my reply pretty bizarre to read at the outset.
Apologies.  He wrote:

>> [...] Has anyone else seen that pattern?

> I have; I also used Linux perf to come to this conclusion, and my
> determination was similar: a system was undergoing increasingly heavy
> load, in this case with processes >> number of processors.  It was
> also a phase-change type of event: at one moment everything would be
> going great, but once a critical threshold was hit, s_lock would
> consume an enormous amount of CPU time.  I figured preemption while in
> the spinlock was to blame at the time, given the extreme nature.
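
For context, the compare-and-swap approach in the thread subject replaces
the spinlock-protected update of the LWLock's shared/exclusive counts with
a single atomic operation, so a process can never be descheduled while
"holding" the mutex. Here is a minimal sketch using the GCC __sync builtins
of that era; the state layout and names are invented for illustration, not
Heikki's actual patch:

#include <stdint.h>
#include <stdbool.h>

#define LW_EXCLUSIVE_FLAG ((uint32_t) 1 << 31)

static bool
lwlock_attempt_shared(volatile uint32_t *state)
{
    uint32_t    old = *state;

    for (;;)
    {
        uint32_t    seen;

        if (old & LW_EXCLUSIVE_FLAG)
            return false;       /* held exclusively; caller must sleep */

        /* atomically do "*state = old + 1" iff *state still equals old */
        seen = __sync_val_compare_and_swap(state, old, old + 1);
        if (seen == old)
            return true;        /* acquired in shared mode */
        old = seen;             /* lost the race; retry with fresh value */
    }
}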




Re: [HACKERS] Logging of PAM Authentication Failure

2013-05-15 Thread Amit Langote
Sorry that I am writing separate emails on the same topic.
I seem to have a solution that accomplishes what we are trying to do
without much change to the existing libpq interface (especially in
what to expect about return values, and in the connection state we are
in when we return from connectDBComplete() and PQconnectPoll()).

Roughly, the following changes are required:

1] in src/bin/psql/startup.c, main()

if (PQstatus(pset.db) == CONNECTION_BAD &&
    PQconnectionNeedsPassword(pset.db) &&
    password == NULL &&
    pset.getPassword != TRI_NO)
{
    password = simple_prompt(password_prompt, 100, false);
    /* How would this detect authentication_timeout and exit
     * accordingly? */
    PQsendPassword(pset.db, password);
}

And there is no do {...} while (new_pass); loop, unlike the current code.

2] in src/interfaces/libpq/fe-connect.c, a new function:
void PQsendPassword(PGconn *conn, char *password)   /* suggest a better name? */

void
PQsendPassword(PGconn *conn, char *password)
{
    conn->pgpass = password;
    conn->status = CONNECTION_SENDING_PASSWORD; /* suggest a better name
                                                 * for the status? */

    (void) connectDBComplete(conn);
}


3] in src/interfaces/libpq/fe-connect.c, connectDBComplete(PGconn *conn):
no change required. :-)

4] in src/interfaces/libpq/fe-connect.c, PQconnectPoll(PGconn *conn)

a) add a new case to both switches (the one before and the one after keep_going:)

/* These are writing states, so we just proceed. */
case CONNECTION_STARTED:
case CONNECTION_MADE:
case CONNECTION_SENDING_PASSWORD:
    break;
...
...
keep_going:
...
...
case CONNECTION_SENDING_PASSWORD:
    {
        /*
         * Note that conn->pghost must be non-NULL if we are going to
         * avoid the Kerberos code doing a hostname look-up.
         */
        if (pg_fe_sendauth(areq, conn) != STATUS_OK)
        {
            conn->errorMessage.len = strlen(conn->errorMessage.data);
            goto error_return;
        }
        conn->errorMessage.len = strlen(conn->errorMessage.data);

        /*
         * Just make sure that any data sent by pg_fe_sendauth is
         * flushed out.  Although this theoretically could block, it
         * really shouldn't since we don't send large auth responses.
         */
        if (pqFlush(conn))
            goto error_return;

        /* Now go read the server's response to the password just sent. */
        conn->status = CONNECTION_AWAITING_RESPONSE;
        return PGRES_POLLING_READING;
    }

5] in src/interfaces/libpq/libpq-fe.h, add a new intermediate connection state

/*
 * Although it is okay to add to these lists, values which become unused
 * should never be removed, nor should constants be redefined - that would
 * break compatibility with existing code.
 */

typedef enum
{
    CONNECTION_OK,
    CONNECTION_BAD,
    /* Non-blocking mode only below here */

    /*
     * The existence of these should never be relied upon - they should only
     * be used for user feedback or similar purposes.
     */
    CONNECTION_STARTED,             /* Waiting for connection to be made. */
    CONNECTION_MADE,                /* Connection OK; waiting to send. */
    CONNECTION_AWAITING_RESPONSE,   /* Waiting for a response from the
                                     * postmaster. */
    CONNECTION_AUTH_OK,             /* Received authentication; waiting for
                                     * backend startup. */
    CONNECTION_SETENV,              /* Negotiating environment. */
    CONNECTION_SSL_STARTUP,         /* Negotiating SSL. */
    CONNECTION_NEEDED,              /* Internal state: connect() needed */
    CONNECTION_SENDING_PASSWORD
} ConnStatusType;



As you can probably see, this requires minimal libpq changes:

1] Add one more connection state: CONNECTION_SENDING_PASSWORD
2] Add one more function: PQsendPassword(PGconn*, char*)
3] Modify PQconnectPoll() to handle the intermediate
CONNECTION_SENDING_PASSWORD state for clients which use
PQsendPassword() to send a password the user entered mid-way through a
connection sequence, over the existing connection.
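
For what it's worth, here is a rough sketch of how a libpq application
other than psql might use the proposed call to avoid the
drop-and-reconnect; prompt_for_password() stands in for whatever means the
application uses to collect a password:

PGconn *conn = PQconnectdb("host=localhost dbname=test");

if (PQstatus(conn) == CONNECTION_BAD &&
    PQconnectionNeedsPassword(conn))
{
    char        password[100];

    prompt_for_password(password, sizeof(password));    /* app-provided */
    PQsendPassword(conn, password); /* finish auth on the same connection */
}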

Comments?



Re: [HACKERS] plperl segfault in plperl_trusted_init() on kfreebsd

2013-05-15 Thread Christoph Berg
Re: Tom Lane 2013-05-15 <244.1368675...@sss.pgh.pa.us>
> You sure that wholesale override of CFLAGS is a good idea?
> This report doesn't show what flags the rest of your system was built
> with, but seems like there could be a compatibility issue.  Or maybe
> you prevented plperl from being built with -fpic, or something like
> that.

That was because the plain "./configure" version (for a minimal way to
reproduce) didn't build with debug symbols. The original gcc line from the
Debian build log is:

cd build && ../configure \
   --with-tcl \
   --with-perl \
   --with-python \
   --with-pam \
   --with-krb5 \
   --with-gssapi \
   --with-openssl \
   --with-libxml \
   --with-libxslt \
   --with-ldap \
   --with-tclconfig=/usr/lib/tcl8.5 \
   --with-tkconfig=/usr/lib/tk8.5 \
   --with-includes=/usr/include/tcl8.5 \
   PYTHON=/usr/bin/python \
   --mandir=/usr/share/postgresql/9.3/man \
   --docdir=/usr/share/doc/postgresql-doc-9.3 \
   --sysconfdir=/etc/postgresql-common \
   --datarootdir=/usr/share/ \
   --datadir=/usr/share/postgresql/9.3 \
   --bindir=/usr/lib/postgresql/9.3/bin \
   --libdir=/usr/lib/ \
   --libexecdir=/usr/lib/postgresql/ \
   --includedir=/usr/include/postgresql/ \
   --enable-nls \
   --enable-integer-datetimes \
   --enable-thread-safety \
   --enable-debug \
   --disable-rpath \
   --with-ossp-uuid \
   --with-gnu-ld \
   --with-pgport=5432 \
   --with-system-tzdata=/usr/share/zoneinfo \
   CFLAGS='-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -pie -I/usr/include/mit-krb5' \
   LDFLAGS='-Wl,-z,relro -Wl,-z,now -Wl,--as-needed -L/usr/lib/mit-krb5 -L/usr/lib/x86_64-kfreebsd-gnu/mit-krb5'


/usr/bin/make -C plperl all
make[5]: Entering directory 
`/build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/src/pl/plperl'
msgfmt -o po/cs.mo 
/build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl/po/cs.po
[...]
'/usr/bin/perl' 
/build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl/text2macro.pl
 --strip='^(\#.*|\s*)$' 
/build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl/plc_perlboot.pl
 
/build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl/plc_trusted.pl
 > perlchunks.h
'/usr/bin/perl' 
/build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl/plperl_opmask.pl
 plperl_opmask.h
gcc -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat 
-Werror=format-security -fPIC -pie -I/usr/include/mit-krb5 -Wall 
-Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
-Wendif-labels -Wmissing-format-attribute -Wformat-security 
-fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. 
-I/build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl
 -I../../../src/include 
-I/build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/include
 -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2  
-I/usr/include/tcl8.5 -I/usr/lib/perl/5.14/CORE  -c -o plperl.o 
/build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl/plperl.c
'/usr/bin/perl' /usr/share/perl/5.14/ExtUtils/xsubpp -typemap 
/usr/share/perl/5.14/ExtUtils/typemap 
/build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl/SPI.xs
 >SPI.c
gcc -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat 
-Werror=format-security -fPIC -pie -I/usr/include/mit-krb5 -Wall 
-Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
-Wendif-labels -Wmissing-format-attribute -Wformat-security 
-fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. 
-I/build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl
 -I../../../src/include 
-I/build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/include
 -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2  
-I/usr/include/tcl8.5 -I/usr/lib/perl/5.14/CORE  -c -o SPI.o SPI.c
'/usr/bin/perl' /usr/share/perl/5.14/ExtUtils/xsubpp -typemap 
/usr/share/perl/5.14/ExtUtils/typemap 
/build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl/Util.xs
 >Util.c
gcc -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat 
-Werror=format-security -fPIC -pie -I/usr/include/mit-krb5 -Wall 
-Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
-Wendif-labels -Wmissing-format-attribute -Wformat-security 
-fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. 
-I/build/buildd-postgresql-9.3_9.3~beta1-2-

Re: [HACKERS] commit fest schedule for 9.4

2013-05-15 Thread Boszormenyi Zoltan

On 2013-05-15 20:05, Andrew Dunstan wrote:


On 05/15/2013 02:00 PM, Josh Berkus wrote:

Obviously we need a meta-manager who makes sure we have managers, and is
able to notice that a manager is MIA and needs to be replaced (or at least
backed up).

And then a meta-meta-manager to make sure that the meta-manager is
meta-managing.

And an Inspector General.  Anyone have Danny Kaye's phone number?



Or Gogol's?


You have to learn the dialing chant to call them... ;-)



cheers

andrew





--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/





Re: [HACKERS] Logging of PAM Authentication Failure

2013-05-15 Thread Amit Langote
>>> I created a patch which enables it to use the existing connection in
>>> such a case (unlike what we currently do). It modifies
>>> connectDBComplete() and PQconnectPoll() to also include states
>>> pertaining to password being accepted from the user. That is, the
>>> state machine in PQconnectPoll() is further extended to include a
>>> connection state called CONNECTION_ASKING_PASSWORD which is entered
>>> when server sends AUTH_REQ_MD5 or AUTH_REQ_PASSWORD auth requests.
>>
>> Great! The new client state seems to be effective also for MD5.  But
>> it seems to break existing libpq clients doing the same authentication
>> sequence as the current psql. Some means would be necessary to switch the
>> behavior when a password is not previously provided but is needed by the
>> server, or to make the first half of the connection sequence
>> compatible with the current sequence - in other words, when the user
>> finds that status is CONNECTION_BAD and
>> PQconnectionNeedsPassword() == true, the user should be able to throw
>> away the connection and make a new connection providing the password,
>> and also be able to send the password on the existing connection.
>
> The first half of the connection sequence remains the same except for one
> change: in PQconnectPoll(), when in case CONNECTION_AWAITING_RESPONSE,
> if server sends md5/password authentication request, it returns
> PGRES_POLLING_WAITING_PASSWORD, which, back in connectDBComplete()
> sets conn->password = true and conn->status =
> CONNECTION_ASKING_PASSWORD. Back in main(), this causes a password
> prompt and then the second half of the connection sequence. Hence
> pg_fe_sendauth() is not called in this first half unless it's a
> different authentication method than md5 and password.

One more thing that I forgot to mention is that the connection sequence
would enter CONNECTION_ASKING_PASSWORD in the first half only if the
password is currently not set to a non-empty value, that is, if
(conn->pgpass == NULL || conn->pgpass[0] == '\0') is true. I was
wondering what would happen to other applications using libpq
when they return from connectDBComplete() with conn->status set to
CONNECTION_ASKING_PASSWORD, provided they have not set conn->pgpass to
a non-empty value. If they currently handle this based on
CONNECTION_BAD, then this change does them no good. In fact there
needs to be a way for them to get CONNECTION_BAD.
Since this whole patch is about not having to drop-and-reconnect *in
case of a password prompt*, how it changes libpq for other applications
also needs to be addressed here, especially for the md5/password
authentication cases. Currently, any attempt to connect using an empty or
NULL password results in CONNECTION_BAD for all libpq-based clients.
Thoughts?




Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-05-15 Thread Jon Nelson
On Wed, May 15, 2013 at 10:17 PM, Alvaro Herrera
 wrote:
> Jon Nelson escribió:
>> On Wed, May 15, 2013 at 4:46 PM, Jon Nelson  
>> wrote:
>
>> > That's true. I originally wrote the patch using fallocate(2). What
>> > would be appropriate here? Should I switch on the return value and the
>> > six (6) or so relevant error codes?
>>
>> I addressed this, hopefully in a reasonable way.
>
> Would it work to just assign the value you got from posix_fallocate (if
> nonzero) to errno and then use %m in the errmsg() call in ereport()?

That strikes me as a better way. I'll work something up soon.
Thanks!

--
Jon




Re: [HACKERS] plperl segfault in plperl_trusted_init() on kfreebsd

2013-05-15 Thread Tom Lane
Christoph Berg  writes:
> Here's what I did:

> ./configure
> cd src && make
> cd pl/plperl && make clean && make CFLAGS="-g"
> make check

You sure that wholesale override of CFLAGS is a good idea?
This report doesn't show what flags the rest of your system was built
with, but seems like there could be a compatibility issue.  Or maybe
you prevented plperl from being built with -fpic, or something like
that.

I tend to use
make PROFILE="-g"
for this sort of tweak to the compile flags --- that adds the flags
you specify, without removing any.

regards, tom lane




Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-05-15 Thread Alvaro Herrera
Jon Nelson escribió:
> On Wed, May 15, 2013 at 4:46 PM, Jon Nelson  wrote:

> > That's true. I originally wrote the patch using fallocate(2). What
> > would be appropriate here? Should I switch on the return value and the
> > six (6) or so relevant error codes?
> 
> I addressed this, hopefully in a reasonable way.

Would it work to just assign the value you got from posix_fallocate (if
nonzero) to errno and then use %m in the errmsg() call in ereport()?
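
Concretely, that suggestion amounts to something like the following
sketch (the variable names and surrounding context are borrowed loosely
from xlog.c; not a tested patch):

int         rc = posix_fallocate(fd, 0, XLogSegSize);

if (rc != 0)
{
    /* posix_fallocate() returns the error code instead of setting errno */
    errno = rc;
    ereport(ERROR,
            (errcode_for_file_access(),
             errmsg("could not allocate file \"%s\": %m", path)));
}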

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-05-15 Thread Jon Nelson
On Wed, May 15, 2013 at 4:46 PM, Jon Nelson  wrote:
> On Wed, May 15, 2013 at 4:34 PM, Andres Freund  wrote:
..
>> Some quick comments, without thinking about this:
>
> Thank you for the kind feedback.
>
>> * needs a configure check for posix_fallocate. The current version will
>>   e.g. fail to compile on Windows or many other non-Linux systems. Check
>>   how it's done for posix_fadvise.

The following patch includes the changes to configure.in.
I had to make other changes (not included here) because my local
system uses autoconf 2.69, but I did test this successfully.
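
To illustrate the shape of the guarded code, here is a self-contained
sketch assuming configure defines the conventional HAVE_POSIX_FALLOCATE
symbol (the segment size constant and the fallback loop are simplified
for illustration):

#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define WAL_SEG_SIZE (16 * 1024 * 1024)     /* illustrative 16MB segment */

/* Returns 0 on success, or an errno-style error code. */
static int
zero_fill_wal_segment(int fd)
{
#ifdef HAVE_POSIX_FALLOCATE
    /* reserve the whole segment in one call; returns the error code */
    return posix_fallocate(fd, 0, WAL_SEG_SIZE);
#else
    char        zbuffer[8192];
    int         offset;

    /* traditional fallback: write zeros block by block */
    memset(zbuffer, 0, sizeof(zbuffer));
    for (offset = 0; offset < WAL_SEG_SIZE; offset += sizeof(zbuffer))
    {
        errno = 0;
        if (write(fd, zbuffer, sizeof(zbuffer)) != (ssize_t) sizeof(zbuffer))
            return (errno != 0) ? errno : ENOSPC;
    }
    return 0;
#endif
}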

> That's true. I originally wrote the patch using fallocate(2). What
> would be appropriate here? Should I switch on the return value and the
> six (6) or so relevant error codes?

I addressed this, hopefully in a reasonable way.

--
Jon


fallocate.patch-v2
Description: Binary data



Re: [HACKERS] Heap truncation without AccessExclusiveLock (9.4)

2013-05-15 Thread Andres Freund
On 2013-05-15 18:35:35 +0300, Heikki Linnakangas wrote:
> Truncating a heap at the end of vacuum, to release unused space back to
> the OS, currently requires taking an AccessExclusiveLock. Although it's only
> held for a short duration, it can be enough to cause a hiccup in query
> processing while it's held. Also, if there is a continuous stream of queries
> on the table, autovacuum never succeeds in acquiring the lock, and thus the
> table never gets truncated.
> 
> I'd like to eliminate the need for AccessExclusiveLock while truncating.

Couldn't we "just" take the extension lock and then walk backwards from
the rechecked end of the relation, trying ConditionalLockBufferForCleanup()
on the buffers?
For every page we manage to lock, we check whether it is still empty. When
we find a page that we couldn't lock or that isn't empty, or once we have
locked a sufficient number of pages, we truncate.
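
A rough sketch of that idea, glossing over buffer pinning details and the
exact interaction with the extension lock (standard bufmgr calls, used
loosely):

BlockNumber nblocks = RelationGetNumberOfBlocks(rel);   /* recheck under lock */
BlockNumber new_nblocks = nblocks;

while (new_nblocks > 0)
{
    Buffer      buf = ReadBuffer(rel, new_nblocks - 1);

    if (!ConditionalLockBufferForCleanup(buf))
    {
        ReleaseBuffer(buf);
        break;              /* page is in use; stop walking backwards */
    }
    if (!PageIsEmpty(BufferGetPage(buf)))
    {
        UnlockReleaseBuffer(buf);
        break;              /* found data; cannot truncate past here */
    }
    UnlockReleaseBuffer(buf);
    new_nblocks--;          /* page is empty; extend the truncatable range */
}

if (new_nblocks < nblocks)
    RelationTruncate(rel, new_nblocks);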

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Extent Locks

2013-05-15 Thread Stephen Frost
All,

  Starting a new thread to avoid hijacking Heikki's original, but..

* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
> Truncating a heap at the end of vacuum, to release unused space back to
> the OS, currently requires taking an AccessExclusiveLock. Although
> it's only held for a short duration, it can be enough to cause a
> hiccup in query processing while it's held. Also, if there is a
> continuous stream of queries on the table, autovacuum never succeeds
> in acquiring the lock, and thus the table never gets truncated.

  Extent locking suffers from very similar problems and we really need
  to improve this situation.  With today's fast I/O systems and massive
  numbers of CPUs in a single system, it's absolutely trivial to have a
  whole slew of processes trying to add data to a single relation and
  that access getting nearly serialized due to everyone waiting on the
  extent lock.

  Perhaps one really simple approach would be to increase the size of
  the extent which is created in relation to the size of the relation.
  I've no clue what level of effort is involved there but I'm hoping
  such an approach would help.  I've long thought that it'd be very neat
  if we could simply give each bulk-inserter process their own 1G chunk
  to insert directly into w/o having to talk to anyone else.  The
  creation of the specific 1G piece could, hopefully, be made atomic
  easily (either thanks to the OS or with our own locking), etc, etc.
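
  To picture the first of those ideas - scaling the extension size with
  the relation - here's a sketch where the growth policy numbers are
  invented purely for illustration:

static BlockNumber
extension_size(BlockNumber current_blocks)
{
    /* extend by ~1/32 of the current size, clamped to [1 block, 1GB] */
    BlockNumber n = current_blocks / 32;
    BlockNumber max = (BlockNumber) ((1024 * 1024 * 1024) / BLCKSZ);

    if (n < 1)
        n = 1;
    if (n > max)
        n = max;
    return n;
}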

  I'm sure it's many bricks shy of a load, but I wanted to raise the
  issue, again, as I've seen it happening on yet another high-volume
  write-intensive system.

Thanks,

Stephen




Re: [HACKERS] "on existing update" construct

2013-05-15 Thread Dev Kumkar
Basically I was referring to this link http://mbk.projects.postgresql.org

Any suggestions here?

Thanks!


Re: [HACKERS] Heap truncation without AccessExclusiveLock (9.4)

2013-05-15 Thread Tom Lane
Robert Haas  writes:
> I've been thinking for a while that we need some other system for
> managing other kinds of invalidations.  For example, suppose we want
> to cache relation sizes in blocks.  So we allocate 4kB of shared
> memory, interpreted as an array of 512 8-byte entries.  Whenever you
> extend a relation, you hash the relfilenode and take the low-order 9
> bits of the hash value as an index into the array.  You increment that
> value either under a spinlock or perhaps using fetch-and-add where
> available.

I'm not sure I believe the details of that.

1. 4 bytes is not enough to store the exact identity of the table that
the cache entry belongs to, so how do you disambiguate?

2. If you don't find an entry for your target rel in the cache, aren't
you still going to have to do an lseek?

Having said that, the idea of specialized caches in shared memory seems
plenty reasonable to me.

One thing that's bothering me about Heikki's proposal is that it's not
clear that it's a *cache*; that is, I don't see the fallback logic to
use when there's no entry for a relation for lack of room.

regards, tom lane




Re: [HACKERS] "on existing update" construct

2013-05-15 Thread Dev Kumkar
> Please do not top-post on the PostgreSQL lists. See
> <http://idallen.com/topposting.html>
> Also, note that pgsql-hackers is entirely the wrong list for this sort of
> question. You should have been asking on pgsql-general.
Apologies for the top-post!
Well, I thought pgsql-hackers might suggest some new solution; I was just
going by the description of the mailing list. It's worth looking at
http://stackoverflow.com/questions/1109061/insert-on-duplicate-update-postgresql

Let me know and I will post this question on pgsql-general.

> It's been on the TODO list forever.  It's harder to implement well than
> you would think.
Yes, I think it would be nice to see this in one of the upcoming releases.

Thanks!


Re: [HACKERS] Heap truncation without AccessExclusiveLock (9.4)

2013-05-15 Thread Robert Haas
On Wed, May 15, 2013 at 7:10 PM, Tom Lane  wrote:
>> Another problem is that sinval resets are bad for performance, and
>> anything we do that pushes more messages through sinval will increase
>> the frequency of resets.
>
> I've been thinking that we should increase the size of the sinval ring;
> now that we're out from under SysV shmem size limits, it wouldn't be
> especially painful to do that.  That's not terribly relevant to this
> issue though.  I agree that we don't want an sinval message per relation
> extension, no matter what the ring size is.

I've been thinking for a while that we need some other system for
managing other kinds of invalidations.  For example, suppose we want
to cache relation sizes in blocks.  So we allocate 4kB of shared
memory, interpreted as an array of 512 8-byte entries.  Whenever you
extend a relation, you hash the relfilenode and take the low-order 9
bits of the hash value as an index into the array.  You increment that
value either under a spinlock or perhaps using fetch-and-add where
available.

On the read side, every backend can cache the length of as many
relations as it wants.  But before relying on a cached value, it must
index into the shared array and see whether the value has been
updated.  On 64-bit systems, this requires no lock, only a barrier,
and some 32-bit systems have special instructions that can be used for
an 8-byte atomic read, and hence could avoid the lock as well.  This
would almost certainly be cheaper than doing an lseek every time,
although maybe not by enough to matter.  At least on Linux, the
syscall seems to be pretty cheap.
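
A minimal sketch of that scheme (names invented; the GCC __sync builtins
stand in for whatever fetch-and-add and barrier primitives would actually
be used):

#include <stdint.h>
#include <stdbool.h>

#define REL_EXT_COUNTERS 512    /* 4kB of shared memory / 8-byte entries */

static volatile uint64_t rel_ext_counter[REL_EXT_COUNTERS];

/* writer side: bump the bucket's counter whenever a relation is extended */
static void
note_relation_extended(uint32_t relfilenode_hash)
{
    __sync_fetch_and_add(&rel_ext_counter[relfilenode_hash % REL_EXT_COUNTERS], 1);
}

/* reader side: a backend's cached size is usable only if the counter
 * hasn't moved since the size was cached */
static bool
cached_size_still_valid(uint32_t relfilenode_hash, uint64_t counter_when_cached)
{
    __sync_synchronize();       /* barrier before checking shared memory */
    return rel_ext_counter[relfilenode_hash % REL_EXT_COUNTERS] ==
           counter_when_cached;
}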

Now, a problem with this is that we keep doing things that make it
hard for people to run very low memory instances of PostgreSQL.  So
potentially whether or not we allocate space for this could be
controlled by a GUC.  Or maybe the structure could be made somewhat
larger and shared among multiple caching needs.

I'm not sure whether this idea can be adapted to do what Heikki is
after.  But I think these kinds of techniques are worth thinking about
as we look for ways to further improve performance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Heap truncation without AccessExclusiveLock (9.4)

2013-05-15 Thread Tom Lane
Robert Haas  writes:
> On Wed, May 15, 2013 at 11:35 AM, Heikki Linnakangas
>  wrote:
>> To not slow down common backend
>> operations, the values (or lack thereof) are cached in relcache. To sync the
>> relcache when the values change, there will be a new shared cache
>> invalidation event to force backends to refresh the cached watermark values.

> AFAIK, the sinval mechanism isn't really well-designed to ensure that
> these kinds of notifications arrive in a timely fashion.

Yeah; currently it's only meant to guarantee that you see updates that
were protected by obtaining a heavyweight lock with which your own lock
request conflicts.  It will *not* work for the usage Heikki proposes,
at least not without sprinkling sinval queue checks into a lot of places
where they aren't now.  And as you say, the side-effects of that would
be worrisome.

> Another problem is that sinval resets are bad for performance, and
> anything we do that pushes more messages through sinval will increase
> the frequency of resets.

I've been thinking that we should increase the size of the sinval ring;
now that we're out from under SysV shmem size limits, it wouldn't be
especially painful to do that.  That's not terribly relevant to this
issue though.  I agree that we don't want an sinval message per relation
extension, no matter what the ring size is.

regards, tom lane




Re: [HACKERS] pg_dump versus defaults on foreign tables

2013-05-15 Thread Josh Berkus

> In any case, that restriction isn't breaking pg_dump so I don't feel
> a need to change it during beta.

Sure.  Maybe put it on the stack for 9.4.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] "on existing update" construct

2013-05-15 Thread Josh Berkus
On 05/15/2013 02:22 PM, Dev Kumkar wrote:
> Well, yes, a stored procedure; I was also thinking about a pre-insert
> trigger.
> 
> I'm sure folks here must have gone through this and have alternatives. I
> was also reading about merge_by_key.

It's been on the TODO list forever.  It's harder to implement well than
you would think.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] pg_dump versus defaults on foreign tables

2013-05-15 Thread Tom Lane
Josh Berkus  writes:
>> Comments?

> I personally think that FOREIGN should be optional on most commands
> where ambiguity is unlikely.  In addition to ALTER, there's no
> particularly good reason why "DROP TABLE myforeigntable" doesn't work.

Well, it's certainly bizarre that we allow ALTER TABLE to work for most
non-table relation types (eg views, sequences) but specifically exclude
foreign tables, which after all are *called* tables.  So I think that's
just plain wrong.

I'm less excited about changing the DROP case, because there is
precedent extending way back for being anal about that case, eg
we do this at least as far back as 7.4:

regression=# create sequence s1;
CREATE SEQUENCE
regression=# drop table s1;
ERROR:  "s1" is not a table
HINT:  Use DROP SEQUENCE to remove a sequence.

I'm not against being laxer about DROP, but it ought to be a
thought-through change that considers what to do for all relation types
not just foreign tables.  In particular, should it follow the precedent
of ALTER TABLE that we allow "ALTER TABLE" on non-tables but if you use
another spelling such as "ALTER SEQUENCE" then it has to match the
relation type?

In any case, that restriction isn't breaking pg_dump so I don't feel
a need to change it during beta.

regards, tom lane




Re: [HACKERS] Better handling of archive_command problems

2013-05-15 Thread Peter Geoghegan
On Wed, May 15, 2013 at 3:46 AM, Robert Haas  wrote:
> One possible objection to this line of attack is that, IIUC, waits to
> acquire a LWLock are non-interruptible.  If someone tells PostgreSQL
> to wait for some period of time before performing each WAL write,
> other backends that grab the WALWriteLock will not respond to query
> cancels during that time.

I don't see any reasonable way to make LWLocks care about interrupts
(using all 3 possible underlying semaphore implementations, no less).
As it says within LWLockAcquire:

/*
 * Lock out cancel/die interrupts until we exit the code section protected
 * by the LWLock.  This ensures that interrupts will not interfere with
 * manipulations of data structures in shared memory.
 */
HOLD_INTERRUPTS();

We've been pretty judicious about placing CHECK_FOR_INTERRUPTS() calls
in the right places, but it's still quite possible to see the server
take multiple seconds - perhaps even as many as 10 - to respond to an
interrupt (a psql SIGINT). Now, I didn't have enough of an interest at
the times I noticed this to figure out exactly why that may have been,
or to somehow characterize it, but I don't accept that it would
violate some Postgres precept if this setting also resulted in
interrupts taking multiple seconds. I'd go so far as to let the user
make the throttling sleep take as long as they like, even though this
admittedly would sort of break such a precept.

There is a setting called zero_damaged_pages, and enabling it causes
data loss. I've seen cases where it was enabled within postgresql.conf
for years.

> Worse, the locks have a tendency to back up.

Well, yes, sleeping on WALWriteLock has some fairly bad consequences
for performance. That's sort of the idea. The bar is rather low here,
to my mind (the state of the art currently is "deal with it / monitor it
yourself, and risk a PANIC shutdown if you fail"). I think it's worth
noting that at least one other "Enterprise" RDBMS has a similar
feature.

The only situation that I can see where a backend would acquire
WALWriteLock alongside WALInsertLock is when a segment switch needs to
occur (i.e. a XLOG_SWITCH record is being inserted). If that's the
case, no one is going to be able to insert WAL for as long as the
sleep to throttle occurs anyway, so there is no additional harm done.

> Now despite all that, I can see this being useful enough that Heroku
> might want to insert a very small patch into their version of
> PostgreSQL to do it this way, and just live with the downsides.  But
> anything that can propagate non-interruptible waits across the entire
> system does not sound to me like a feature that is sufficiently
> polished that we want to expose it to users less sophisticated than
> Heroku (i.e. nearly all of them).  If we do this, I think we ought to
> find a way to make the waits interruptible, and to insert them in
> places where they really don't interfere with read-only backends.

It would be nice to be able to be sure that CLogControlLock could not
be held for multiple seconds as a result of this. However, I don't see
any reason to let the perfect be the enemy of the good, or at least
the better. Just how likely is it that the scenario you describe will
affect reads in the real world? In any case, this is a problem in its
own right.

I don't intend to make any promises about how this throttling will
affect read queries, except perhaps something vague and loose.

> I'd probably also argue that we ought to try to design it such that the
> GUC can be in MB/s rather than delay/WAL writer cycle.

That'd certainly be better, but I can't see a way of doing it without
adding a whole bunch of mechanism to some important codepaths, like
within XLogWrite(), which would be quite a hard sell.

-- 
Peter Geoghegan




Re: [HACKERS] Better LWLocks with compare-and-swap (9.4)

2013-05-15 Thread Daniel Farina
On Mon, May 13, 2013 at 5:50 AM, Heikki Linnakangas
 wrote:
> pgbench -S is such a workload. With 9.3beta1, I'm seeing this profile, when
> I run "pgbench -S -c64 -j64 -T60 -M prepared" on a 32-core Linux machine:
>
> -  64.09%  postgres  postgres   [.] tas
>- tas
>   - 99.83% s_lock
>  - 53.22% LWLockAcquire
> + 99.87% GetSnapshotData
>  - 46.78% LWLockRelease
>   GetSnapshotData
> + GetTransactionSnapshot
> +   2.97%  postgres  postgres   [.] tas
> +   1.53%  postgres  libc-2.13.so   [.] 0x119873
> +   1.44%  postgres  postgres   [.] GetSnapshotData
> +   1.29%  postgres  [kernel.kallsyms]  [k] arch_local_irq_enable
> +   1.18%  postgres  postgres   [.] AllocSetAlloc
> ...
>
> So, on this test, a lot of time is wasted spinning on the mutex of
> ProcArrayLock. If you plot a graph of TPS vs. # of clients, there is a
> surprisingly steep drop in performance once you go beyond 29 clients
> (attached, pgbench-lwlock-cas-local-clients-sets.png, red line). My theory
> is that after that point all the cores are busy, and processes start to be
> sometimes context switched while holding the spinlock, which kills
> performance.

I have; I also used Linux perf to come to this conclusion, and my
determination was similar: a system was undergoing increasingly heavy
load, in this case with processes >> number of processors.  It was
also a phase-change type of event: at one moment everything would be
going great, but once a critical threshold was hit, s_lock would
consume an enormous amount of CPU time.  I figured preemption while in
the spinlock was to blame at the time, given the extreme nature.




Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-05-15 Thread Jon Nelson
On Wed, May 15, 2013 at 4:34 PM, Andres Freund  wrote:
> Hi,
>
> On 2013-05-15 16:26:15 -0500, Jon Nelson wrote:
>> >> I have written up a patch to use posix_fallocate in new WAL file
>> >> creation, including configuration by way of a GUC variable, but I've
>> >> not contributed to the PostgreSQL project before. Therefore, I'm
>> >> fairly certain the patch is not formatted properly or conforms to the
>> >> appropriate style guides. Currently, the patch is based on 9.2, and is
>> >> quite small in size - 3.6KiB.
>>
>> I have re-based and reformatted the code, and basic testing shows a
>> fairly significant reduction in WAL-file creation time.
>> I ran 'make test' and did additional local testing without issue.
>> Therefore, I am attaching the patch. I will try to add it to the
>> commitfest page.
>
> Some quick comments, without thinking about this:

Thank you for the kind feedback.

> * needs a configure check for posix_fallocate. The current version will
>   e.g. fail to compile on Windows or many other non-Linux systems. Check
>   how it's done for posix_fadvise.

I will address as soon as I am able.

> * Is wal file creation performance actually relevant? Is the performance
>   of a system running on fallocate()d wal files any different?

In my limited testing, I noticed a drop of approx. 100ms per WAL file.
I do not have a good idea for how to really stress the WAL-file
creation area without calling pg_start_backup and pg_stop_backup over
and over (with archiving enabled).

However, a file allocated with fallocate is (supposed to be) less
fragmented than one created by the traditional means.

> * According to the man page posix_fallocate doesn't set errno but rather
>   returns the error code.

That's true. I originally wrote the patch using fallocate(2). What
would be appropriate here? Should I switch on the return value and the
six (6) or so relevant error codes?

> * I wonder whether we ever want to actually disable this? Afair the libc
>   contains emulation for posix_fadvise if the filesystem doesn't support
>   it.

I know that glibc does, but I don't know about other libc implementations.

--
Jon




Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-05-15 Thread Andres Freund
Hi,

On 2013-05-15 16:26:15 -0500, Jon Nelson wrote:
> >> I have written up a patch to use posix_fallocate in new WAL file
> >> creation, including configuration by way of a GUC variable, but I've
> >> not contributed to the PostgreSQL project before. Therefore, I'm
> >> fairly certain the patch is not formatted properly or conforms to the
> >> appropriate style guides. Currently, the patch is based on 9.2, and is
> >> quite small in size - 3.6KiB.
> 
> I have re-based and reformatted the code, and basic testing shows a
> fairly significant reduction in WAL-file creation time.
> I ran 'make test' and did additional local testing without issue.
> Therefore, I am attaching the patch. I will try to add it to the
> commitfest page.

Some quick comments, without thinking about this:

* needs a configure check for posix_fallocate. The current version will
  e.g. fail to compile on Windows or many other non-Linux systems. Check
  how it's done for posix_fadvise.
* Is wal file creation performance actually relevant? Is the performance
  of a system running on fallocate()d wal files any different?
* According to the man page posix_fallocate doesn't set errno but rather
  returns the error code.
* I wonder whether we ever want to actually disable this? Afair the libc
  contains emulation for posix_fadvise if the filesystem doesn't support
  it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-05-15 Thread Jon Nelson
On Tue, May 14, 2013 at 9:43 PM, Robert Haas  wrote:
> On Mon, May 13, 2013 at 9:54 PM, Jon Nelson  wrote:
>> Pertinent to another thread titled
>> [HACKERS] corrupt pages detected by enabling checksums
>> I hope to explore the possibility of using fallocate (or
>> posix_fallocate) for new WAL file creation.
>>
>> Most modern Linux filesystems support fast fallocate/posix_fallocate,
>> reducing extent fragmentation (where extents are used) and frequently
>> offering a pretty significant speed improvement. In my tests, using
>> posix_fallocate (followed by pg_fsync) is at least 28 times quicker
>> than using the current method (which writes zeroes followed by
>> pg_fsync).
>>
>> I have written up a patch to use posix_fallocate in new WAL file
>> creation, including configuration by way of a GUC variable, but I've
>> not contributed to the PostgreSQL project before. Therefore, I'm
>> fairly certain the patch is not formatted properly or conforms to the
>> appropriate style guides. Currently, the patch is based on 9.2, and is
>> quite small in size - 3.6KiB.

I have re-based and reformatted the code, and basic testing shows a
fairly significant reduction in WAL-file creation time.
I ran 'make test' and did additional local testing without issue.
Therefore, I am attaching the patch. I will try to add it to the
commitfest page.


--
Jon


0001-enhance-GUC-and-xlog-with-wal_use_fallocate-boolean-.patch
Description: Binary data



Re: [HACKERS] "on existing update" construct

2013-05-15 Thread Dev Kumkar
Well, yes, a stored procedure; I was also thinking about a pre-insert
trigger.

I'm sure folks here must have gone through this and have alternatives. I
was also reading about merge_by_key.

On Wed, May 15, 2013 at 12:51 PM, Daniel Farina  wrote:

> On Wed, May 15, 2013 at 11:44 AM, Dev Kumkar 
> wrote:
> > Hello,
> >
> > Is there an alternative of Sybase "on existing update" construct in
> pgsql.
> >
> > "ON DUPLICATE KEY UPDATE" doesn't work.
> >
> > Thanks in advance!
>
> No, you'll have to either handle this in the application or use a
> stored procedure at this time.  The omission of such a construct from
> psql's "\h" command and the manual is not in error.
>


Re: [HACKERS] Heap truncation without AccessExclusiveLock (9.4)

2013-05-15 Thread Robert Haas
On Wed, May 15, 2013 at 11:35 AM, Heikki Linnakangas
 wrote:
> Shared memory space is limited, but we only need the watermarks for any
> in-progress truncations. Let's keep them in shared memory, in a small
> fixed-size array. That limits the number of concurrent truncations that can
> be in-progress, but that should be ok.

Would it only limit the number of concurrent truncations that can be
in progress *due to vacuum*?  Or would it limit the total number of
concurrent truncations?  Because a table could have arbitrarily
many inheritance children, and you might try to truncate the whole
thing at once...

> To not slow down common backend
> operations, the values (or lack thereof) are cached in relcache. To sync the
> relcache when the values change, there will be a new shared cache
> invalidation event to force backends to refresh the cached watermark values.
> A backend (vacuum) can ensure that all backends see the new value by first
> updating the value in shared memory, sending the sinval message, and waiting
> until everyone has received it.

AFAIK, the sinval mechanism isn't really well-designed to ensure that
these kinds of notifications arrive in a timely fashion.  There's no
particular bound on how long you might have to wait.  Pretty much all
inner loops have CHECK_FOR_INTERRUPTS(), but they definitely do not
all have AcceptInvalidationMessages(), nor would that be safe or
practical.  The sinval code sends catchup interrupts, but only for the
purpose of preventing sinval overflow, not for timely receipt.

Another problem is that sinval resets are bad for performance, and
anything we do that pushes more messages through sinval will increase
the frequency of resets.  Now if those are operations are things that
are relatively uncommon, it's not worth worrying about - but if it's
something that happens on every relation extension, I think that's
likely to cause problems.  That could leave to wrapping the sinval
queue around in a fraction of a second, leading to system-wide sinval
resets.  Ouch.

> With the watermarks, truncation works like this:
>
> 1. Set soft watermark to the point where we think we can truncate the
> relation. Wait until everyone sees it (send sinval message, wait).

I'm also concerned about how you plan to synchronize access to this
shared memory arena.  I thought about implementing a relation size
cache during the 9.2 cycle, to avoid the overhead of the approximately
1 gazillion lseek calls we do under e.g. a pgbench workload.  But the
thing is, at least on Linux, the system calls are pretty cheap, and on
kernels >= 3.2, they are lock-free.  On earlier kernels, there's a
spinlock acquire/release cycle for every lseek, and performance tanks
with >= 44 cores.  That spinlock is around a single memory fetch, so a
spinlock or lwlock around the entire array would presumably be a lot
worse.

It seems to me that under this system, everyone who would under
present circumstances invoke lseek() would have to first have to query
this shared memory area, and then if they miss (which is likely, since
most of the time there won't be a truncation in progress) they'll
still have to do the lseek.  So even if there's no contention problem,
there could still be a raw loss of performance.  I feel like I might
be missing a trick though; it seems like somehow we ought to be able
to cache the relation size for long periods of time when no extension
is in progress.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_dump versus defaults on foreign tables

2013-05-15 Thread Josh Berkus
Tom,

> Comments?

I personally think that FOREIGN should be optional on most commands
where ambiguity is unlikely.  In addition to ALTER, there's no
particularly good reason why "DROP TABLE myforeigntable" doesn't work.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] pg_dump versus defaults on foreign tables

2013-05-15 Thread Tom Lane
While fooling with the issue of serial columns for foreign tables,
I noticed that pg_dump will sometimes try to restore a default
expression with a command like this:

ALTER TABLE ONLY rem ALTER COLUMN f1 SET DEFAULT 
nextval('rem_f1_seq'::regclass);

which when applied to a foreign table gets you

ERROR:  "rem" is a foreign table
HINT:  Use ALTER FOREIGN TABLE instead.

Now, I suppose we could kluge up pg_dump to emit ALTER FOREIGN TABLE
instead, but I'd like to object to this degree of anal retentivity.
There is *no* good reason to refuse this command, and plenty of
precedent in favor of being laxer.  Most ALTER TABLE variants have
historically allowed any relation type for which the action is sensible.
As an example, three lines earlier in this same dump file I find

ALTER TABLE public.rem_f1_seq OWNER TO postgres;

for which we are not bleating that you have to use ALTER SEQUENCE
instead (and we'd better not start, since this usage is well embedded
in existing dump files).

Comments?

regards, tom lane




Re: [HACKERS] Parallel Sort

2013-05-15 Thread Claudio Freire
On Wed, May 15, 2013 at 3:04 PM, Noah Misch  wrote:
> On Tue, May 14, 2013 at 12:15:24PM -0300, Claudio Freire wrote:
>> You know what would be a low-hanging fruit that I've been thinking
>> would benefit many of my own queries?
>>
>> "Parallel" sequential scan nodes. Even if there's no real parallelism
>> involved, when a query has to scan the same table at multiple nodes,
>> if it's big, it would be worth parallelizing the scans to transform
>> them into synchro scans.
>>
>> I have absolutely no idea how this would work easily without forked
>> workers, because the scans might be buried in more complex execution
>> trees. But still, it's worth considering, that parallelizing may
>> benefit more than core usage.
>>
>> If execution nodes could be paused at arbitrary points, a "parallel
>> scan" node could pause one branch that has consumed the circular
>> buffer, letting another branches consume their part, and thus
>> "parallelizing" branch execution. But this would be perhaps more
>> complex than simply forking.
>
> Execution nodes do pause between every output tuple, at least nominally.
> Still, given the architecture of our executor and the planner work to
> implement such a thing, I would not classify it as low-hanging fruit.  It
> would primarily apply to a plan with independent sequential scans of the same
> large (relative to total memory) relation.  I'm sure that comes up, but it
> doesn't strike me as typical.


I found it rather typical of some of my workloads, though that's
probably not the case globally.

It would be rather easier if a node could pause without returning rows. I
think ATM, not returning any rows means the node is finished doing its
scan. The nodes that would have to be pausable like this wouldn't be
sequential scans, but sorts, hashes, and in general those that take a
long time to start returning rows.

So, a plan that goes like:

  Seq on A -> Sort -> Merge -> result
  Seq on A -> Sort --/

Would be turned into:

  Seq on A -> Step Sort -> Parallel Merge -> result
  Seq on A -> Step Sort --/

Or even maybe

  Seq on A -> Sort -> Tee X -> Parallel Merge
 X --/

I think Tee and Parallel Merge should be doable with the current
infrastructure, because they don't require pausing without returning
any tuples. Not sure how many meters above ground that is, or how many
gotchas might be involved. But it's been circling in my head for a
while.




Re: [HACKERS] streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-15 Thread Heikki Linnakangas

On 15.05.2013 22:50, Benedikt Grundmann wrote:

On Wed, May 15, 2013 at 2:50 PM, Heikki Linnakangas wrote:
The subject says 9.2.3. Are you sure you're running 9.2.4 on all the
servers? There was a fix to a bug related to starting a standby server from
a filesystem snapshot. I don't think it was quite the case you have, but
pretty close.


So this is delightfully embarrassing. I just went back to double-check, and:

- primary box is 9.2.3
- standby is 9.2.4
- testing is 9.2.4

I guess that alone could possibly explain it?


Hmm, no, it should still work. There haven't been any changes in the WAL 
format. I do recommend upgrading the primary, of course, but I don't 
really see how that would explain what you're seeing.


- Heikki




Re: [HACKERS] streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-15 Thread Benedikt Grundmann
On Wed, May 15, 2013 at 2:50 PM, Heikki Linnakangas  wrote:

> On 15.05.2013 15:42, David Powers wrote:
>
>> First, thanks for the replies.  This sort of thing is frustrating and hard
>> to diagnose at a distance, and any help is appreciated.
>>
>> Here is some more background:
>>
>> We have 3 9.2.4 databases using the following setup:
>>
>
> The subject says 9.2.3. Are you sure you're running 9.2.4 on all the
> servers? There was a fix to a bug related to starting a standby server from
> a filesystem snapshot. I don't think it was quite the case you have, but
> pretty close.
>
>
So this is delightfully embarrassing. I just went back to double-check, and:

- primary box is 9.2.3
- standby is 9.2.4
- testing is 9.2.4

I guess that alone could possibly explain it?

Thanks,

Bene


Re: [HACKERS] "on existing update" construct

2013-05-15 Thread Daniel Farina
On Wed, May 15, 2013 at 11:44 AM, Dev Kumkar  wrote:
> Hello,
>
> Is there an alternative of Sybase "on existing update" construct in pgsql.
>
> "ON DUPLICATE KEY UPDATE" doesn't work.
>
> Thanks in advance!

No, you'll have to either handle this in the application or use a
stored procedure at this time.  The omission of such a construct from
psql's "\h" command and the manual is not in error.




Re: [HACKERS] counting algorithm for incremental matview maintenance

2013-05-15 Thread Josh Berkus
Kevin,

It's fairly common for matviews to be constructed such that updates to
them are strictly appends.  For example, a matview which has a daily
summary would just get appended to each day, and existing rows would not
change barring a major historical database cleanup.

It seems like we could ... and ought to ... optimize for this pattern
somehow for incremental updates.  That is, if the user knows that we're
going to be only appending new rows and not modifying any old ones, s/he
ought to be able to tell the database that somehow and avoid the
overhead of checking.  While the overhead of checking a count wouldn't
be that high for a few hundred rows, I've dealt with matviews which were
thousands to millions of rows.
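
To make the pattern concrete, a hypothetical daily-summary matview (the
sales table and its columns are invented for illustration):

CREATE MATERIALIZED VIEW daily_sales AS
    SELECT date_trunc('day', sold_at) AS day,
           count(*) AS n_sales,
           sum(amount) AS total_amount
    FROM sales
    GROUP BY 1;

Once a day is over its row never changes, so an incremental refresh
that knew that could simply append the newest day's rows instead of
re-checking counts for every existing row.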

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Parallel Sort

2013-05-15 Thread Peter Geoghegan
On Wed, May 15, 2013 at 11:32 AM, Peter Geoghegan  wrote:
> I think that this effort could justify itself independently of any
> attempt to introduce parallelism to in-memory sorting. I abandoned a
> patch to introduce timsort to Postgres, because I knew that there was
> no principled way to reap the benefits.

Just for the record, I attach a patch that introduces a timsort_arg
function as a drop-in replacement for quicksort_arg (including
replacing all of the specializations that went into 9.2). It has been
rebased against master. For what it's worth, if anyone wanted to pick
this up, that would be fine with me.

Don't be fooled by the superficial regression test failures. The tests
in question are subtly wrong, because they rely on a certain ordering
that isn't explicitly requested. Timsort is stable, whereas quicksort
generally isn't stable (our implementation certainly isn't).
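
To illustrate the kind of hidden ordering dependency I mean (this is a
contrived sketch, not taken from the actual regression tests):

CREATE TABLE t (k int, v text);
INSERT INTO t VALUES (1, 'a'), (1, 'b'), (0, 'c');

SELECT * FROM t ORDER BY k;     -- order of the two k = 1 rows is
                                -- unspecified; a stable sort preserves
                                -- input order, quicksort need not
SELECT * FROM t ORDER BY k, v;  -- fully specified; any correct sort
                                -- returns the same result

The second form is what order-sensitive tests ought to use.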

-- 
Peter Geoghegan


timsort.2013_05_15.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-15 Thread Heikki Linnakangas

On 15.05.2013 15:42, David Powers wrote:

First, thanks for the replies.  This sort of thing is frustrating and hard
to diagnose at a distance, and any help is appreciated.

Here is some more background:

We have 3 9.2.4 databases using the following setup:


The subject says 9.2.3. Are you sure you're running 9.2.4 on all the 
servers? There was a fix to a bug related to starting a standby server 
from a filesystem snapshot. I don't think it was quite the case you 
have, but pretty close.



- A primary box
- A standby box running as a hot streaming replica from the primary
- A testing box restored nightly from a static backup

As noted, the static backup is taken off of the standby by taking an LVM
snapshot of the database filesystem and rsyncing.  I don't think it's a
likely problem but the rsync leverages the previous backup (using
--link-dest) to make the rsync faster and the resulting backup smaller.


I guess that should work, as long as you make a full copy when you 
restore the backup, and never start a postgres server directly against a 
backup that contains the hardlinks.



We've been using the same system for quite some time, but previously (~ 1
month ago) had been taking the backup off of the primary (still using the
LVM snapshot).  The replication is a recent addition, and a very helpful
one.  LVM snapshots aren't lightweight in the face of writes and in some
circumstances a long running rsync would spike the IO load on the
production box.

Results of some additional tests:

After the user noticed that the test restore showed the original problem we
ran `vacuum analyze` on all three testing databases thinking that it had a
good chance of quickly touching most of the underlying files.  That gave us
errors on two of the testing restores similar to:

ERROR:  invalid page header in block 5427 of relation base/16417/199732075


Huh, that's surprising. You have to be quite unlucky to end up with a
corrupt page header, even if there's something wrong with the underlying
storage or backup routines. Could you grab a copy of that block? 
Something like


dd skip=5427 bs=8192 count=1 if=base/16417/199732075 of=corrupt-block

I'm especially curious what the first 20 bytes or so look like. I'm 
guessing that it's all zeros, while some later part of the page contains 
data. Ie. the page was torn at some point in the process.



I'm also running the vacuum analyze on the production machines to double
check that the base databases don't have a subtle corruption that simply
hasn't been noticed.  They run with normal autovacuum settings, so I
suspect that they are fine/this won't show anything because we should have
seen this from the autovacuum daemon before.


Vacuum analyze doesn't always scan the whole table with default
settings, so that's no proof that there are no missing pages. "set
vacuum_freeze_table_age=0; vacuum <table>" will force a full scan of the
table.



I'm happy to share the scripts we use for the backup/restore process if the
information above isn't enough, as well as the logs - though the postgres
logs don't seem to contain much of interest (the database system doesn't
really get involved).


Yeah, please share the scripts. What you've described seems correct to 
me, but maybe there's something I'm missing.



I also have the rsyncs of the failed snapshots available and could restore
them for testing purposes.  It's also easy to look in them (they are just
saved as normal directories on a big SAN) if I know what to look for.


Great. Not sure what to look at right now, but good that you have kept 
the evidence.


- Heikki




Re: [HACKERS] Parallel Sort

2013-05-15 Thread Andres Freund
On 2013-05-13 10:28:59 -0400, Noah Misch wrote:
> Each worker needs to make SnapshotNow visibility decisions coherent with the
> master.  For sorting, this allows us to look up comparison functions, even
> when the current transaction created or modified those functions.  This will
> also be an essential building block for any parallelism project that consults
> user tables.  Implementing this means copying the subtransaction stack and the
> combocid hash to each worker.  For the sake of completeness, we should also
> copy the global MVCC snapshot data (sorting probably won't care).  It also
> means forbidding, while a parallel task is in flight, operations that affect
> the transaction state:

Btw, if you assume you can simply copy a snapshot from the normal
backend to the worker backend to make visibility decisions in the
general case: You're wrong. Unfortunately you need in-memory state to
make sense of combocids...

Not impossible to solve, but you should be aware of the issue.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] "on existing update" construct

2013-05-15 Thread Dev Kumkar
Hello,

Is there an alternative of Sybase "on existing update" construct in pgsql.

"ON DUPLICATE KEY UPDATE" doesn't work.

Thanks in advance!

Regards - Dev


Re: [HACKERS] commit fest schedule for 9.4

2013-05-15 Thread Josh Berkus
On 05/15/2013 11:05 AM, Andrew Dunstan wrote:
> 
> On 05/15/2013 02:00 PM, Josh Berkus wrote:
>>> Obviously we need a meta-manager who makes sure we have managers, and is
>>> able to notice that a manager is MIA and needs replaced (or at least
>>> backed-up).

Actually, on a more serious note, we could simply assign a backup CFM
(CFM-b) for each CF.  The backup CFM would jump in if nobody has heard
from the CFM for several days.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"

2013-05-15 Thread Cédric Villemain
On Wednesday, May 15, 2013 at 16:43:17, Andrew Dunstan wrote:
> On 05/15/2013 10:05 AM, Tom Lane wrote:
> > Peter Eisentraut  writes:
> >> That said, I'm obviously outnumbered here.  What about the following
> >> compromise:  Use the configure-selected install program inside
> >> PostgreSQL (which we can test easily), and use install-sh under
> >> USE_PGXS?  Admittedly, the make install time of extensions is probably
> >> not an issue.
> > 
> > That works for me, since as you say we can easily fix any such bugs
> > in the core code.  The scary thing about this for extension authors
> > is that they may very well see no bug in their own testing, only to
> > have their packages fall over in the wild.  We shouldn't make each
> > author who's copied that code rediscover the problem for themselves
> > that expensively.
> 
> +1, although I will be renovating the Makefiles for all my extensions
> along the lines of my previous email.

pgxs.mk has some old rules to replace $libdir (and maybe a few others).
Maybe we can try to find out what makes sense for most extension authors
and add rules/targets to pgxs.mk to reduce the useless copy/paste in
extension Makefiles.

So what's desirable?
-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation




Re: [HACKERS] Parallel Sort

2013-05-15 Thread Peter Geoghegan
On Mon, May 13, 2013 at 7:28 AM, Noah Misch  wrote:
> We should decide whether to actually sort in parallel based on the comparator
> cost and the data size.  The system currently has no information on comparator
> cost: bt*cmp (and indeed almost all built-in functions) all have procost=1,
> but bttextcmp is at least 1000x slower than btint4cmp.

I think that this effort could justify itself independently of any
attempt to introduce parallelism to in-memory sorting. I abandoned a
patch to introduce timsort to Postgres, because I knew that there was
no principled way to reap the benefits. Unless you introduce
parallelism, it's probably going to be virtually impossible to come up
with an algorithm that does in-memory sorting faster (in terms of the
amount of system time taken) than a highly optimized quicksort when
sorting integers. But sorting types with really expensive comparators
(even considerably more expensive than bttextcmp) for
pass-by-reference Datums (where the memory locality advantage of
quicksort doesn't really help so much) makes timsort much more
compelling. That's why it's used for Python lists.
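
Noah's procost observation is easy to check from the catalogs, by the
way; both comparators currently claim the default cost:

SELECT proname, procost
FROM pg_proc
WHERE proname IN ('btint4cmp', 'bttextcmp');

  proname  | procost
-----------+---------
 btint4cmp |       1
 bttextcmp |       1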


-- 
Peter Geoghegan




Re: [HACKERS] counting algorithm for incremental matview maintenance

2013-05-15 Thread Kevin Grittner
Merlin Moncure  wrote:
> Kevin Grittner  wrote:
>> Merlin Moncure  wrote:
>>
>>> #1 issue I have with current matview functionality is locking.
>>> currently refresh takes out an access exclusive lock.  so,
>>> question is, do you think your proposal will be such that it
>>> will no longer require taking out full lock for refresh
>>> purposes (either incremental or otherwise)?
>>
>> The right thread for *that* question is "Differential
>> (transactional) REFRESH"; however, I might as well say here that
>> I don't think we want to get rid of the (faster) version that
>> just replaces the current heap when we add the (slower) option
>> to REFRESH it transactionally.
>
> sorry, didn't notice that thread.  agreed, that seems good
> candidate for user input to refresh command to manage the
> tradeoff.
>
> well, do you expect the application of differential refresh to be
> automatic?

I expect considerable bikeshedding on this, but my preference would
be to allow syntax for specifying which technique is desired on
the REFRESH command, and in the absence of specification I would
prefer that it default to the current technique for a matview which
is not yet populated and the transactional technique for a
populated matview.  We could associate some property with the
matview for default REFRESH technique, but I don't know whether
that's worth the trouble.
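
Purely as a strawman for that bikeshedding, and with entirely
hypothetical syntax, it might look something like:

REFRESH MATERIALIZED VIEW mv;               -- default chosen by state
REFRESH MATERIALIZED VIEW mv WITH REPLACE;  -- force heap replacement
REFRESH MATERIALIZED VIEW mv CONCURRENTLY;  -- force transactional refresh

None of the above exists today, of course.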

> lockless + differential refresh would be game changer in terms of
> how I build up data for analytics.

Yeah, it's definitely something I would have liked to have in the
initial release, but I tried to keep the scope very limited given
how little time there was left in the release cycle when I got a
chance to start on it.  Adding this seemed to be just a little too
much.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Sort

2013-05-15 Thread Noah Misch
On Wed, May 15, 2013 at 12:26:52PM +0530, Amit Kapila wrote:
> On Monday, May 13, 2013 7:59 PM Noah Misch wrote:
> > We can allocate a small amount of permanent shared memory for
> > coordination
> > among a group of processes, but sorting will benefit from a region as
> > large as
> > maintenance_work_mem.  Expect on-demand memory sharing.
> 
> Will the shared memory used for coordinating tuples between master and
> worker be fixed or varying depending on size of tuples to be sorted or
> number of workers associated.
> If it is varying, then it can sometimes encounter a situation where the
> required memory is not available, and in that case it has to revert to
> serial sorting

> How will the parallel sorting tasks be divided and assigned to each worker?

I haven't selected answers for those details, yet.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Parallel Sort

2013-05-15 Thread Noah Misch
On Wed, May 15, 2013 at 08:12:34AM +0900, Michael Paquier wrote:
> The concept of clause parallelism for backend worker is close to the
> concept of clause shippability introduced in Postgres-XC. In the case of
> XC, the equivalent of the master backend is a backend located on a node
> called Coordinator that merges and organizes results fetched in parallel
> from remote nodes where data scans occur (on nodes called Datanodes). The
> backends used for tuple scans across Datanodes share the same data
> visibility as they use the same snapshot and transaction ID as the backend
> on Coordinator. This is different from the parallelism as there is no idea
> of snapshot import to worker backends.

Worker backends would indeed share snapshot and XID.

> However, the code in XC planner used for clause shippability evaluation is
> definitely worth looking at just considering the many similarities it
> shares with parallelism when evaluating if a given clause can be executed
> on a worker backend or not. It would be a waste to implement the same
> thing twice if there is code already available.

Agreed.  Local parallel query is very similar to distributed query; the
specific IPC cost multipliers differ, but that's about it.  I hope we can
benefit from XC's experience in this area.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] commit fest schedule for 9.4

2013-05-15 Thread Andrew Dunstan


On 05/15/2013 02:00 PM, Josh Berkus wrote:

Obviously we need a meta-manager who makes sure we have managers, and is
able to notice that a manager is MIA and needs replaced (or at least
backed-up).

And then a meta-meta-manager to make sure that the meta-manager is
meta-managing.

And an Inspector General.  Anyone have Danny Kaye's phone number?



Or Gogol's?

cheers

andrew




Re: [HACKERS] Parallel Sort

2013-05-15 Thread Noah Misch
On Tue, May 14, 2013 at 12:15:24PM -0300, Claudio Freire wrote:
> You know what would be a low-hanging fruit that I've been thinking
> would benefit many of my own queries?
> 
> "Parallel" sequential scan nodes. Even if there's no real parallelism
> involved, when a query has to scan the same table at multiple nodes,
> if it's big, it would be worth parallelizing the scans to transform
> them into synchro scans.
> 
> I have absolutely no idea how this would work easily without forked
> workers, because the scans might be buried in more complex execution
> trees. But still, it's worth considering, that parallelizing may
> benefit more than core usage.
> 
> If execution nodes could be paused at arbitrary points, a "parallel
> scan" node could pause one branch that has consumed the circular
> buffer, letting another branches consume their part, and thus
> "parallelizing" branch execution. But this would be perhaps more
> complex than simply forking.

Execution nodes do pause between every output tuple, at least nominally.
Still, given the architecture of our executor and the planner work to
implement such a thing, I would not classify it as low-hanging fruit.  It
would primarily apply to a plan with independent sequential scans of the same
large (relative to total memory) relation.  I'm sure that comes up, but it
doesn't strike me as typical.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] commit fest schedule for 9.4

2013-05-15 Thread Josh Berkus

> Obviously we need a meta-manager who makes sure we have managers, and is
> able to notice that a manager is MIA and needs replaced (or at least
> backed-up).

And then a meta-meta-manager to make sure that the meta-manager is
meta-managing.

And an Inspector General.  Anyone have Danny Kaye's phone number?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] commit fest schedule for 9.4

2013-05-15 Thread Alvaro Herrera
Josh Berkus wrote:
> On 05/15/2013 10:30 AM, Alvaro Herrera wrote:
> > Uhm.  If I've been anything wrt the 9.3 schedule, happy is not it.  We
> > completely failed to manage it in any kind of sane way.  I vote +1 for
> > keeping the same commitfest schedule this year, but please let's do
> > everyone a favor and make sure we get some more (stricter?) management
> > of the commitfests this time.  Not closing the November commitfest in
> > March would be appreciated, I think.
> 
> Well, we could actually *follow* the schedule we outlined at the
> Developer Meeting last year, including the triage periods.
> 
> I'll also say:
> * we need to assign CF managers at least 2 weeks in advance of each CF *
> we need to replace them if they get too busy to follow-through,
> * and the last CF needs two managers.

Obviously we need a meta-manager who makes sure we have managers, and is
able to notice that a manager is MIA and needs replaced (or at least
backed-up).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-15 Thread David Powers
First, thanks for the replies.  This sort of thing is frustrating and hard
to diagnose at a distance, and any help is appreciated.

Here is some more background:

We have 3 9.2.4 databases using the following setup:

- A primary box
- A standby box running as a hot streaming replica from the primary
- A testing box restored nightly from a static backup

As noted, the static backup is taken off of the standby by taking an LVM
snapshot of the database filesystem and rsyncing.  I don't think it's a
likely problem but the rsync leverages the previous backup (using
--link-dest) to make the rsync faster and the resulting backup smaller.
 Each database is ~1.5T, so this is necessary to keep static backup times
reasonable.

We've been using the same system for quite some time, but previously (~ 1
month ago) had been taking the backup off of the primary (still using the
LVM snapshot).  The replication is a recent addition, and a very helpful
one.  LVM snapshots aren't lightweight in the face of writes and in some
circumstances a long running rsync would spike the IO load on the
production box.

Results of some additional tests:

After the user noticed that the test restore showed the original problem we
ran `vacuum analyze` on all three testing databases thinking that it had a
good chance of quickly touching most of the underlying files.  That gave us
errors on two of the testing restores similar to:

ERROR:  invalid page header in block 5427 of relation base/16417/199732075


In the meantime I modified the static backup procedure to shut standby
completely down before taking the LVM snapshot and am trying a restore
using that snapshot now.  I'll test that using the same vacuum analyze
test, and if that passes, a full vacuum.

I'm also running the vacuum analyze on the production machines to double
check that the base databases don't have a subtle corruption that simply
hasn't been noticed.  They run with normal autovacuum settings, so I
suspect that they are fine/this won't show anything because we should have
seen this from the autovacuum daemon before.

I'm happy to share the scripts we use for the backup/restore process if the
information above isn't enough, as well as the logs - though the postgres
logs don't seem to contain much of interest (the database system doesn't
really get involved).

I also have the rsyncs of the failed snapshots available and could restore
them for testing purposes.  It's also easy to look in them (they are just
saved as normal directories on a big SAN) if I know what to look for.

-David


On Wed, May 15, 2013 at 2:24 AM, Heikki Linnakangas  wrote:

> On 14.05.2013 23:47, Benedikt Grundmann wrote:
>
>> The only thing that is *new* is that we took the snapshot from the
>>
>> streaming replica.  So again my best guess as of now is that if the
>> database crashes while it is in streaming standby a invalid disk state can
>> result during during the following startup (in rare and as of now unclear
>> circumstances).
>>
>
> A bug is certainly possible. There isn't much detail here to debug with,
> I'm afraid. Can you share the full logs on all three systems? I'm
> particularly interested.
>
>
>  You seem to be quite convinced that it must be LVM; can you elaborate why?
>>
>
> Well, you said that there was a file in the original filesystem, but not
> in the snapshot. If you didn't do anything in between, then surely the
> snapshot is broken, if it skipped a file. Or was the file created in the
> original filesystem after the snapshot was taken? You probably left out
> some crucial details on how exactly the snapshot and rsync are performed.
> Can you share the scripts you're using?
>
> Can you reproduce this problem with a new snapshot? Do you still have the
> failed snapshot unchanged?
>
> - Heikki
>


Re: [HACKERS] commit fest schedule for 9.4

2013-05-15 Thread Josh Berkus
On 05/15/2013 10:30 AM, Alvaro Herrera wrote:
> Uhm.  If I've been anything wrt the 9.3 schedule, happy is not it.  We
> completely failed to manage it in any kind of sane way.  I vote +1 for
> keeping the same commitfest schedule this year, but please let's do
> everyone a favor and make sure we get some more (stricter?) management
> of the commitfests this time.  Not closing the November commitfest in
> March would be appreciated, I think.

Well, we could actually *follow* the schedule we outlined at the
Developer Meeting last year, including the triage periods.

I'll also say:
* we need to assign CF managers at least 2 weeks in advance of each CF *
we need to replace them if they get too busy to follow-through,
* and the last CF needs two managers.

We went pretty far off schedule this year because we didn't manage the
process.  Especially given the rudimentary tools we have to work with,
we can't afford to do that.

I'd be happy to manage the first CF on June 15th, if nobody else wants to.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] commit fest schedule for 9.4

2013-05-15 Thread Alvaro Herrera
Heikki Linnakangas wrote:
> On 14.05.2013 05:34, Peter Eisentraut wrote:
> >In the last two years, the first commit fest started in June, which is
> >about a month from now.  If we are going to do that again, we should
> >clarify that as soon as possible.  And if we are not, then we should
> >also clarify that, because some people are probably expecting that we
> >are.
> >
> >So, any thoughts on the commit fest schedule, any need for deviations,
> >or same old?
> 
> Same old sounds good to me. We released 9.3beta1 at the same time
> this year that we released 9.2beta1 last year. I've been quite happy
> with the 9.3 schedule. There are a few items on the 9.3 Open Items
> list, but I would expect them to be resolved in the next week or
> two, so that barring any major new issues, we could release 9.3beta2
> at the end of May, like we released 9.2beta2 last year.

Uhm.  If I've been anything wrt the 9.3 schedule, happy is not it.  We
completely failed to manage it in any kind of sane way.  I vote +1 for
keeping the same commitfest schedule this year, but please let's do
everyone a favor and make sure we get some more (stricter?) management
of the commitfests this time.  Not closing the November commitfest in
March would be appreciated, I think.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] counting algorithm for incremental matview maintenance

2013-05-15 Thread Merlin Moncure
On Wed, May 15, 2013 at 11:33 AM, Kevin Grittner  wrote:
> Merlin Moncure  wrote:
>
>> #1 issue I have with current matview functionality is locking.
>> currently refresh takes out an access exclusive lock.  so,
>> question is, do you think your proposal will be such that it will
>> no longer require taking out full lock for refresh purposes
>> (either incremental or otherwise)?
>
> The right thread for *that* question is "Differential
> (transactional) REFRESH"; however, I might as well say here that I
> don't think we want to get rid of the (faster) version that just
> replaces the current heap when we add the (slower) option to
> REFRESH it transactionally.

sorry, didn't notice that thread.   agreed, that seems a good candidate
for user input to the refresh command to manage the tradeoff.

well, do you expect the application of differential refresh to be
automatic?  lockless + differential refresh would be game changer in
terms of how I build up data for analytics.

merlin




Re: [HACKERS] Proposed TODO: add support for "any" for PL/PythonU and PL/Perl

2013-05-15 Thread Manlio Perillo

Il 14/05/2013 20:24, Josh Berkus ha scritto:
> Hackers,
> 
> I'd like to add the following todo items to the TODO list:
> 
> PL/Python:
> * add support for anyelement and anyarray to PL/Python
> * add support for VARIADIC "ANY" to PL/Python
> 
> PL/Perl:
> * add support for anyelement and anyarray to PL/Perl
> * add support for VARIADIC "ANY" to PL/Perl
> 
> The reason for this is that both Python and Perl are loosely typed
> languages,

I don't know about Perl, but, to be precise, Python is not a loosely
typed language.  It is a strongly typed dynamic language.

> [...]


Regards   Manlio Perillo




Re: [HACKERS] counting algorithm for incremental matview maintenance

2013-05-15 Thread Kevin Grittner
Merlin Moncure  wrote:

> #1 issue I have with current matview functionality is locking.
> currently refresh takes out an access exclusive lock.  so,
> question is, do you think your proposal will be such that it will
> no longer require taking out full lock for refresh purposes
> (either incremental or otherwise)?

The right thread for *that* question is "Differential
(transactional) REFRESH"; however, I might as well say here that I
don't think we want to get rid of the (faster) version that just
replaces the current heap when we add the (slower) option to
REFRESH it transactionally.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] counting algorithm for incremental matview maintenance

2013-05-15 Thread Merlin Moncure
On Tue, May 14, 2013 at 2:52 PM, Kevin Grittner  wrote:
> In surveying the literature on $subject, I find that most of the
> theoretical work related to how to incrementally update
> materialized views based on the matview declaration was published
> between 1988 and 1993.  The best paper I have been able to find on
> the topic was published in ACM SIGMOD in 1993[1], and covers two
> algorithms: counting and DRed.  The former should be very fast for
> non-recursive views, but not very efficient for recursive views.
> The latter algorithm is the other way around -- it looks like it
> will do well with recursive views but generally be slower for
> non-recursive views.
>
> It does not seem feasible to me to implement both techniques in a
> single one-year PostgreSQL release.  In fact, if we have trouble
> getting everyone onto the same page early, we might have to settle
> for trying to just get some infrastructure into place, without
> anything to actually make use of it.  That would be unfortunate,
> since Oracle added incremental maintenance of matviews to their
> existing feature in 1999, and have been improving it regularly
> since then.  Many other products also have mature implementations
> of this, and there seems to be a lot of demand for it in
> PostgreSQL.  In the best of circumstances it will take years for us
> to catch up on this front.

#1 issue I have with current matview functionality is locking.
currently refresh takes out an access exclusive lock.  so, question
is, do you think your proposal will be such that it will no longer
require taking out full lock for refresh purposes (either incremental
or otherwise)?

merlin




Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"

2013-05-15 Thread David Fetter
On Wed, May 15, 2013 at 09:51:15AM -0400, Peter Eisentraut wrote:
> On 5/14/13 10:38 AM, Cédric Villemain wrote:
> > If I follow your example, then I would rewrite http://manager.pgxn.org/howto
> 
> Oh that's where that is coming from?  Well that example has all kinds of
> problems.

Would you be so kind as to point out same, or better still, to write
up what you think of as a better example intended for the same
audience?  I'm sure the PGXN people would be delighted to put
something better up there.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Logging of PAM Authentication Failure

2013-05-15 Thread Amit Langote
On Wed, May 15, 2013 at 11:04 PM, Kyotaro HORIGUCHI
 wrote:
>> Is it right that it is only in the case a password prompt is needed
>> that a new connection is created after dropping the just-failed
>> connection?
>
> It's quite doubtful. :-p The sequence seems fragile, to say the
> least. Inserting password request state into the client-side state
> machine looks quite reasonable.

Looking at the current code (well, pseudo-code!):

do
{
    new_pass = false;

    if (CONNECTION_BAD && NEEDS_PASSWORD &&
        password == NULL && !FORCE_NO_PASSWORD)
    {
        PQfinish();
        password = simple_prompt();
        new_pass = true;
    }
} while (new_pass);

So, it looks like the loop will be repeated only if an authentication
method requiring the user to enter a password is encountered in
PQconnectPoll(), namely AUTH_REQ_MD5 & AUTH_REQ_PASSWORD. As you can
see in the following code fragment from pg_fe_sendauth(), which
apparently sets conn->password_needed:

case AUTH_REQ_MD5:
case AUTH_REQ_PASSWORD:
    conn->password_needed = true;
    if (conn->pgpass == NULL || conn->pgpass[0] == '\0')
    {
        printfPQExpBuffer(&conn->errorMessage,
                          PQnoPasswordSupplied);
        return STATUS_ERROR;
    }
    if (pg_password_sendauth(conn, conn->pgpass, areq) != STATUS_OK)
    {
        printfPQExpBuffer(&conn->errorMessage,
                          "fe_sendauth: error sending password authentication\n");
        return STATUS_ERROR;
    }
    break;

this seems to be the only code path that causes conn->password_needed
to be set to true. So, these seem to be the only cases when a prompt
will be provided and new_pass would become true, causing the
drop-and-reconnect by repetition of the loop. Am I missing some other
case when this might happen?

>> I created a patch which enables it to use the existing connection in
>> such a case (unlike what we currently do). It modifies
>> connectDBComplete() and PQconnectPoll() to also include states
>> pertaining to password being accepted from the user. That is, the
>> state machine in PQconnectPoll() is further extended to include a
>> connection state called CONNECTION_ASKING_PASSWORD which is entered
>> when server sends AUTH_REQ_MD5 or AUTH_REQ_PASSWORD auth requests.
>
> Great! The new client state seems to be effective also for MD5.  But
> it seems to break existing libpq clients doing the same authentication
> sequence as current psql. Some means would be necessary to switch the
> behavior when password is not previously provided but needed by the
> server, or make the first half of the connection sequence to be
> compatible to the current sequence - in other words - It should be
> that when the user finds status is CONNECTION_BAD and
> PQconnectionNeedsPassword() == true, the user can throw away the
> connection and make new connection providing password, and also can
> send password on existing connection.

The first half of the connection sequence remains the same except for one
change: in PQconnectPoll(), when in case CONNECTION_AWAITING_RESPONSE,
if server sends md5/password authentication request, it returns
PGRES_POLLING_WAITING_PASSWORD, which, back in connectDBComplete()
sets conn->password = true and conn->status =
CONNECTION_ASKING_PASSWORD. Back in main(), this causes a password
prompt and then the second half of the connection sequence. Hence
pg_fe_sendauth() is not called in this first half unless it's a
different authentication method than md5 and password.

>
> the old style
>
> |   db = PQconnect();
> |   if (PQstatus(db) == CONNECTION_BAD && PQconnectionNeedsPassword(db))
> |   {
> | PQfinish(db);
> | value[..] = password = ;
> | db = PQconnect();
> | if (PQstatus(db) == CONNECTION_BAD)
> |
>
> and the new style
>
> |   db = PQconnect();
> |   if (PQconnectionNeedsPassword(db))
> |  PQsendPassword(db, password);
> |   if (PQstatus(db) == CONNECTION_BAD)
> |  
>
> should be standing together.

I see this accounts for CONNECTION_BAD (if any) in the first half. But
this CONNECTION_BAD has reasons other than conn->password_needed, as far
as I can imagine, since conn->password_needed would only be set in
connectDBComplete() on PGRES_POLLING_WAITING_PASSWORD. So, this
CONNECTION_BAD would require some different processing. Thoughts?

> Where PQsendPassword is a combined function of PQcopyPassword and
> PQcontinuedbConnect. If the only purpose of these functions is sending
> the password, then they need not be separate.
>
> What do you think for the compatibility and simpler API.

I think one function PQsendPassword(PGconn*, char *) would be
sufficient which would contain the code of both PQcopyPassword() and
PQcontinuedbConnect(). I would complete the connection sequence by
running its second half.

>> The backend waits for the password until authentication timeout
>> happens in which case the client can not send the password anymore
>> since the backend has exited due to authentication timeout.

[HACKERS] Heap truncation without AccessExclusiveLock (9.4)

2013-05-15 Thread Heikki Linnakangas

Truncating a heap at the end of vacuum, to release unused space back to
the OS, currently requires taking an AccessExclusiveLock. Although it's 
only held for a short duration, it can be enough to cause a hiccup in 
query processing while it's held. Also, if there is a continuous stream 
of queries on the table, autovacuum never succeeds in acquiring the 
lock, and thus the table never gets truncated.


I'd like to eliminate the need for AccessExclusiveLock while truncating.

Design
--

In shared memory, keep two watermarks: a "soft" truncation watermark, 
and a "hard" truncation watermark. If there is no truncation in 
progress, the values are not set and everything works like today.


The soft watermark is the relation size (ie. number of pages) that 
vacuum wants to truncate the relation to. Backends can read pages above 
the soft watermark normally, but should refrain from inserting new 
tuples there. However, it's OK to update a page above the soft 
watermark, including adding new tuples, if the page is not completely 
empty (vacuum will check and not truncate away non-empty pages). If a 
backend nevertheless has to insert a new tuple to an empty page above 
the soft watermark, for example if there is no more free space in any 
lower-numbered pages, it must grab the extension lock, and update the 
soft watermark while holding it.


The hard watermark is the point above which there is guaranteed to be no 
tuples. A backend must not try to read or write any pages above the hard 
watermark - it should be thought of as the end of file for all practical 
purposes. If a backend needs to write above the hard watermark, ie. to 
extend the relation, it must first grab the extension lock, and raise 
the hard watermark.


The hard watermark is always >= the soft watermark.

Shared memory space is limited, but we only need the watermarks for any 
in-progress truncations. Let's keep them in shared memory, in a small 
fixed-size array. That limits the number of concurrent truncations that 
can be in-progress, but that should be ok. To not slow down common 
backend operations, the values (or lack thereof) are cached in relcache. 
To sync the relcache when the values change, there will be a new shared 
cache invalidation event to force backends to refresh the cached 
watermark values. A backend (vacuum) can ensure that all backends see 
the new value by first updating the value in shared memory, sending the 
sinval message, and waiting until everyone has received it.


With the watermarks, truncation works like this:

1. Set soft watermark to the point where we think we can truncate the 
relation. Wait until everyone sees it (send sinval message, wait).


2. Scan the pages to verify they are still empty.

3. Grab extension lock. Set hard watermark to current soft watermark (a 
backend might have inserted a tuple and raised the soft watermark while 
we were scanning). Release lock.


4. Wait until everyone sees the new hard watermark.

5. Grab extension lock.

6. Check (or wait) that there are no pinned buffers above the current 
hard watermark. (a backend might have a scan in progress that started 
before any of this, still holding a buffer pinned, even though it's empty.)


7. Truncate relation to the current hard watermark.

8. Release extension lock.


If a backend inserts a new tuple before step 2, the vacuum scan will see 
it. If it's inserted after step 2, the backend's cached soft watermark 
is already up-to-date, and thus the backend will update the soft 
watermark before the insert. Thus after the vacuum scan has finished the 
scan at step 2, all pages above the current soft watermark must still be 
empty.



Implementation details
--

There are three kinds of access to a heap page:

A) As a target for new tuple.
B) Following an index pointer, ctid or similar.
C) A sequential scan (and bitmap heap scan?)


To refrain from inserting new tuples to non-empty pages above the soft 
watermark (A), RelationGetBufferForTuple() is modified to check the soft 
watermark (and raise it if necessary).


An index scan (B) should never try to read beyond the hard watermark,
because there are no tuples above it, and thus there should be no 
pointers to pages above it either.


A sequential scan (C) must refrain from reading beyond the hard 
watermark. This can be implemented by always checking the (cached) hard
watermark value before stepping to the next page.



Truncation during hot standby is a lot simpler: set soft and hard 
watermarks to the truncation point, wait until everyone sees the new 
values, and truncate the relation.



Does anyone see a flaw in this?

- Heikki




Re: [HACKERS] postgres_fdw foreign tables and serial columns

2013-05-15 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> The generic issue there is whether we can allow column defaults to be
> evaluated on the remote end.  The original postgres_fdw submission tried
> to support that, but it had enough bugs and logical inconsistencies that
> I ended up ripping that out before commit.  There's a good deal of
> discussion about that in the archives (in January or February IIRC).

Yeah, I watched much of that go by- just couldn't follow it entirely at
the time.  Still..

> However, when and if we do allow that to work, I'd still say that it's
> reasonable for "SERIAL" to mean local creation of the default value.

I agree with this; all I was trying to get at is that we shouldn't close
off any doors to eventually providing a way for defaults to be pushed to
the remote.

> If you want a remotely-supplied default to work, you'd not put a DEFAULT
> clause into the local definition; and SERIAL is essentially a shorthand
> for a DEFAULT clause.

Agreed.

> Yeah, I think the possibility of such a workaround was one of the
> reasons we decided it was okay to support only locally-computed
> defaults for now.

Right, and, of course, a simple trigger on the remote table would
probably work just fine too.

Thanks,

Stephen




Re: [HACKERS] postgres_fdw foreign tables and serial columns

2013-05-15 Thread Tom Lane
Stephen Frost  writes:
> I'm alright with allowing it and making it the 'default', but I can
> certainly see use-cases for having a sequence on the remote side which
> is used for new values for that table and I'd suggest that we figure out
> a way to support that.

The generic issue there is whether we can allow column defaults to be
evaluated on the remote end.  The original postgres_fdw submission tried
to support that, but it had enough bugs and logical inconsistencies that
I ended up ripping that out before commit.  There's a good deal of
discussion about that in the archives (in January or February IIRC).

However, when and if we do allow that to work, I'd still say that it's
reasonable for "SERIAL" to mean local creation of the default value.
If you want a remotely-supplied default to work, you'd not put a DEFAULT
clause into the local definition; and SERIAL is essentially a shorthand
for a DEFAULT clause.
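
Spelled out with the example from upthread, the shorthand would expand
to roughly the following (assuming the error check under discussion is
removed, so that the OWNED BY link is allowed):

CREATE SEQUENCE local_foo_id_seq;

CREATE FOREIGN TABLE local_foo (
    id integer NOT NULL DEFAULT nextval('local_foo_id_seq')
) SERVER test_server OPTIONS (table_name 'foo');

ALTER SEQUENCE local_foo_id_seq OWNED BY local_foo.id;

The default is evaluated locally, exactly as for any other
locally-defined DEFAULT.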

> At first blush, with 'simple' writable views, perhaps that can just be a
> view definition on the remote side which doesn't include that column and
> therefore that column won't be sent to the remote side explicitly but,
> but the view, running on the remote, would turn around and pick up the
> default value for any fields which aren't in the view definition when
> inserting into the table underneath.

Yeah, I think the possibility of such a workaround was one of the
reasons we decided it was okay to support only locally-computed
defaults for now.

regards, tom lane




Re: [HACKERS] postgres_fdw foreign tables and serial columns

2013-05-15 Thread Stephen Frost
* Andrew Dunstan (and...@dunslane.net) wrote:
> On 05/15/2013 10:27 AM, Tom Lane wrote:
> >You can certainly argue that it might be silly to have a local sequence
> >generating the default value for insertions into a remote table; but
> >on the other hand, it might not be silly, depending on usage scenario.
> >I don't think the database should be enforcing a policy choice like
> >that.
> 
> Presumably if it's not appropriate they won't define it as a serial
> column. If Postgres is the primary source of the data rather than
> the foreign handler then it makes plenty of sense to have a serial
> column, I should think.
> 
> So +1 for allowing it.

I'm alright with allowing it and making it the 'default', but I can
certainly see use-cases for having a sequence on the remote side which
is used for new values for that table and I'd suggest that we figure out
a way to support that.

At first blush, with 'simple' writable views, perhaps that can just be a
view definition on the remote side which doesn't include that column and
therefore that column won't be sent to the remote side explicitly, but
the view, running on the remote, would turn around and pick up the
default value for any fields which aren't in the view definition when
inserting into the table underneath.  I'm not suggesting that as
something we do for the user, but perhaps we could include a note along
these lines in the docs for users who need the default evaluated on the
remote?  And come up with a better way to handle it in the future
(perhaps an option in the foreign table definition?).  This, of course,
presumes that the solution I've described actually works, iow, not
tested. :)
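
Concretely (and equally untested), assuming the remote table is
foo(id serial, val text), the workaround might look like:

-- on the remote server: a simple view, auto-updatable in 9.3,
-- that hides the serial column
CREATE VIEW foo_noid AS SELECT val FROM foo;

-- on the local server: point the foreign table at the view
CREATE FOREIGN TABLE local_foo (val text)
    SERVER test_server OPTIONS (table_name 'foo_noid');

-- inserts never mention id, so the remote DEFAULT fills it in
INSERT INTO local_foo (val) VALUES ('x');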

Thanks,

Stephen




Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"

2013-05-15 Thread Andrew Dunstan


On 05/15/2013 10:05 AM, Tom Lane wrote:

Peter Eisentraut  writes:

That said, I'm obviously outnumbered here.  What about the following
compromise:  Use the configure-selected install program inside
PostgreSQL (which we can test easily), and use install-sh under
USE_PGXS?  Admittedly, the make install time of extensions is probably
not an issue.

That works for me, since as you say we can easily fix any such bugs
in the core code.  The scary thing about this for extension authors
is that they may very well see no bug in their own testing, only to
have their packages fall over in the wild.  We shouldn't make each
author who's copied that code rediscover the problem for themselves
that expensively.



+1, although I will be renovating the Makefiles for all my extensions 
along the lines of my previous email.


cheers

andrew




Re: [HACKERS] postgres_fdw foreign tables and serial columns

2013-05-15 Thread Andrew Dunstan


On 05/15/2013 10:27 AM, Tom Lane wrote:

You can certainly argue that it might be silly to have a local sequence
generating the default value for insertions into a remote table; but
on the other hand, it might not be silly, depending on usage scenario.
I don't think the database should be enforcing a policy choice like
that.


Presumably if it's not appropriate they won't define it as a serial 
column. If Postgres is the primary source of the data rather than the 
foreign handler then it makes plenty of sense to have a serial column, I 
should think.


So +1 for allowing it.

cheers

andrew





Re: [HACKERS] postgres_fdw foreign tables and serial columns

2013-05-15 Thread Tom Lane
Albe Laurenz  writes:
> Tom Lane wrote:
>> "Nicholson, Brad (Toronto, ON, CA)"  writes:
>>> [ this error message sucks: ]
>>> test=# create foreign table local_foo (id serial) server test_server 
>>> options (table_name 'foo');
>>> ERROR:  referenced relation "local_foo" is not a table

>> Yeah, I'd noticed that myself.  We could probably tweak the code to
>> issue a different error message and/or add a HINT if the serial's
>> parent relation is a foreign table.  I'm not exactly sure what it
>> should say though.  Thoughts?

> HINT: Serial columns can only be defined for local tables.  Use "integer" or 
> "bigint" instead.

Actually ... wait a minute.  Why *don't* we allow SERIAL columns in
foreign tables?  That made sense before, but now that we support column
defaults for them, I don't see any good reason for this prohibition.

The behavior, if we just remove this error check, would be that we'd
create a local sequence and it would become the source of default values
for insertions into the foreign table.  Since we've already committed to
the decision that column defaults are evaluated locally, this is
perfectly consistent.

You can certainly argue that it might be silly to have a local sequence
generating the default value for insertions into a remote table; but
on the other hand, it might not be silly, depending on usage scenario.
I don't think the database should be enforcing a policy choice like
that.

regards, tom lane




Re: [HACKERS] postgres_fdw foreign tables and serial columns

2013-05-15 Thread Albe Laurenz
Tom Lane wrote:
> "Nicholson, Brad (Toronto, ON, CA)"  writes:
>> [ this error message sucks: ]
>> test=# create foreign table local_foo (id serial) server test_server options 
>> (table_name 'foo');
>> ERROR:  referenced relation "local_foo" is not a table
> 
> Yeah, I'd noticed that myself.  We could probably tweak the code to
> issue a different error message and/or add a HINT if the serial's
> parent relation is a foreign table.  I'm not exactly sure what it
> should say though.  Thoughts?

HINT: Serial columns can only be defined for local tables.  Use "integer" or 
"bigint" instead.

Yours,
Laurenz Albe




Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"

2013-05-15 Thread Cédric Villemain
On Tuesday, May 14, 2013 at 15:08:42, Andrew Dunstan wrote:
> On 05/14/2013 07:59 AM, Peter Eisentraut wrote:
> > On 5/14/13 4:17 AM, Marti Raudsepp wrote:
> >> On Tue, May 14, 2013 at 5:27 AM, Peter Eisentraut  
wrote:
> >>> On Tue, 2013-05-14 at 04:12 +0300, Marti Raudsepp wrote:
>  It's caused by this common pattern in extension makefiles:
>  DATA = $(wildcard sql/*--*.sql) sql/$(EXTENSION)--$(EXTVERSION).sql
> >>> 
> >>> What is the point of this?  Why have the wildcard and then the
> >>> non-wildcard term?
> >> 
> >> Because the non-wildcard file is built by the same Makefile (it's
> >> copied from the sql/$(EXTENSION).sql file). If it wasn't there, a
> >> "make install" from a clean checkout would miss this file.
> > 
> > If it's built, then it should be listed in DATA_built.
> 
> So, AIUI, leaving aside stylistic arguments about use of wildcards, one
> solution could be:
> 
> DATA_built = sql/$(EXTENSION)--$(EXTVERSION).sql
> DATA = $(filter-out sql/$(EXTENSION)--$(EXTVERSION).sql, $(wildcard
> sql/*--*.sql))
> 
> Is that right?

Yes.
-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation




Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"

2013-05-15 Thread Tom Lane
Peter Eisentraut  writes:
> That said, I'm obviously outnumbered here.  What about the following
> compromise:  Use the configure-selected install program inside
> PostgreSQL (which we can test easily), and use install-sh under
> USE_PGXS?  Admittedly, the make install time of extensions is probably
> not an issue.

That works for me, since as you say we can easily fix any such bugs
in the core code.  The scary thing about this for extension authors
is that they may very well see no bug in their own testing, only to
have their packages fall over in the wild.  We shouldn't make each
author who's copied that code rediscover the problem for themselves
that expensively.

> (The affected extensions will still be buggy because users can still
> substitute their own install programs.  We're just hiding the issue for
> a while.)

I'm not following this argument.  The old makefile coding explicitly set

INSTALL = $(SHELL) $(top_srcdir)/config/install-sh -c

The only way I can see that that would be overridden is an explicit

make INSTALL=/my/random/script install

and surely we cannot guarantee that things don't break when you do
something like that.  There are hundreds of ways you can break
the build if you mess with makefile variables.  So I'm not prepared
to call it a bug if that doesn't work.

regards, tom lane




Re: [HACKERS] Logging of PAM Authentication Failure

2013-05-15 Thread Kyotaro HORIGUCHI
> Is it right that it is only in the case a password prompt is needed
> that a new connection is created after dropping the just-failed
> connection?

It's quite doubtful. :-p The sequence seems fragile, to say the
least. Inserting password request state into the client-side state
machine looks quite reasonable.

> I created a patch which enables it to use the existing connection in
> such a case (unlike what we currently do). It modifies
> connectDBComplete() and PQconnectPoll() to also include states
> pertaining to password being accepted from the user. That is, the
> state machine in PQconnectPoll() is further extended to include a
> connection state called CONNECTION_ASKING_PASSWORD which is entered
> when server sends AUTH_REQ_MD5 or AUTH_REQ_PASSWORD auth requests.

Great! The new client state seems to be effective also for MD5.  But
it seems to break existing libpq clients that perform the same
authentication sequence as current psql.  Some means would be necessary
to switch the behavior when a password is not provided up front but is
needed by the server, or to make the first half of the connection
sequence compatible with the current sequence - in other words, when
the user finds that the status is CONNECTION_BAD and
PQconnectionNeedsPassword() == true, the user should be able to throw
away the connection and make a new connection providing the password,
and also be able to send the password on the existing connection.

the old style

|   db = PQconnect();
|   if (PQstatus(db) == CONNECTION_BAD && PQconnectionNeedsPassword(db))
|   {
| PQfinish(db);
| value[..] = password = ;
| db = PQconnect();
| if (PQstatus(db) == CONNECTION_BAD)
|

and the new style

|   db = PQconnect();
|   if (PQconnectionNeedsPassword(db))
|  PQsendPassword(db, password);
|   if (PQstatus(db) == CONNECTION_BAD)
|  

should be standing together.

Here, PQsendPassword is a combination of PQcopyPassword and
PQcontinuedbConnect.  If the only purpose of these functions is sending
the password, then they don't need to be separate.

What do you think about the compatibility issue and the simpler API?

> These two request types require a password to be entered by the user.
> There is a new PostgresPollingStatusType value called
> PGRES_POLLING_WAITING_PASSWORD which is the polling status while a
> password is being entered by the user.
>
> When user enters the password the PQconnectPoll() continues forward in
> CONNECTION_ASKING_PASSWORD wherein it sends the password to the server
> (by calling pg_fe_sendauth() and this time with a potentially correct
> password) and later goes back to CONNECTION_AWAITING_RESPONSE to read
> server's response to the password just entered where it either
> receives authorization OK or error response thus completing the
> connection start-up process.
>
> The backend waits for the password until authentication timeout
> happens in which case the client can not send the password anymore
> since the backend has exited due to authentication timeout. I wonder
> if this is one of the reasons why this has not already been
> implemented?
>
> Comments?

Hmmm. In the current implementation no backend is waiting while the
client is reading the password, so the authentication timeout guards
only against hostile clients.  Conversely, the new sequence can enforce
a true authentication timeout: it results in an error after the
password prompt is left sitting for 60 seconds.  I suppose that is the
more desirable behavior, in spite of the poor message:

| Password: 
| psql: fe_sendauth: error sending password authentication

The point at issue now seems to be how to inform the client of the
timeout while it is reading the password, especially given that using
threads or SIGALRM is off the table.

Providing a password input function in libpq, like the one below, might
make it viable using select(2).

PQsendPassword(prompt="Password: ", in_fd = stdin)
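
For illustration, here is a minimal sketch of how such a function might
work; read_password_or_abort() is hypothetical, not an existing libpq
API, and a real implementation would also disable terminal echo with
tcgetattr()/tcsetattr().  The idea is to select(2) on both the input fd
and the connection socket (e.g. from PQsocket()), so that if the server
gives up first - say, on authentication timeout - the prompt is
abandoned instead of blocking forever:

#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static int
read_password_or_abort(const char *prompt, int in_fd, int conn_fd,
                       char *buf, size_t buflen)
{
    fd_set  rfds;
    int     maxfd = (in_fd > conn_fd) ? in_fd : conn_fd;
    ssize_t n;

    fprintf(stderr, "%s", prompt);
    fflush(stderr);

    FD_ZERO(&rfds);
    FD_SET(in_fd, &rfds);
    FD_SET(conn_fd, &rfds);

    /* Block until either the user types or the server socket stirs. */
    if (select(maxfd + 1, &rfds, NULL, NULL, NULL) <= 0)
        return -1;                      /* select() failure */

    if (FD_ISSET(conn_fd, &rfds))
        return -1;                      /* server spoke, or hung up, first */

    n = read(in_fd, buf, buflen - 1);
    if (n <= 0)
        return -1;
    buf[n] = '\0';
    buf[strcspn(buf, "\r\n")] = '\0';   /* strip the trailing newline */
    return 0;
}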

Any thoughts?

regards,

--
Kyotaro Horiguchi


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"

2013-05-15 Thread Peter Eisentraut
On 5/14/13 10:38 AM, Cédric Villemain wrote:
> If I follow your example, then I would rewrite http://manager.pgxn.org/howto

Oh, that's where that is coming from?  Well, that example has all kinds
of problems.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"

2013-05-15 Thread Peter Eisentraut
On 5/14/13 5:45 PM, Tom Lane wrote:
> We changed to using install-sh unconditionally back in 2001 because
> we had too many problems with system-provided scripts that didn't do
> what we expected.  I see very little reason to believe that the
> compatibility problems have disappeared since then, and in fact this
> complaint seems to me to be sufficient to refute that thesis.

The compatibility issues in 2001 were completely different ones and were
explicitly resolved with a new configure check (which is used worldwide
by thousands of packages, note).  Let's not confuse the issue.

> The argument that the system-provided program might be faster carries
> very little weight for me --- "make install" is fast enough already.

It is not for me.  Note also that make install is part of test runs.

> It's not worth making a bunch of extension authors jump through hoops,
> whether their style was bad or not.

Well, I consider that this is not a style issue.  It's an issue of
widespread bugginess caused by uninformed copy-and-paste, and I'm glad
we found it.  Considering the widespread crappiness in PostgreSQL
extension makefiles, I don't consider it a problem that a few things
need to be fixed.

That said, I'm obviously outnumbered here.  What about the following
compromise:  Use the configure-selected install program inside
PostgreSQL (which we can test easily), and use install-sh under
USE_PGXS?  Admittedly, the make install time of extensions is probably
not an issue.

(The affected extensions will still be buggy because users can still
substitute their own install programs.  We're just hiding the issue for
a while.)



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw foreign tables and serial columns

2013-05-15 Thread Atri Sharma


Sent from my iPad

On 15-May-2013, at 18:56, Tom Lane  wrote:

> "Nicholson, Brad (Toronto, ON, CA)"  writes:
>> [ this error message sucks: ]
>> test=# create foreign table local_foo (id serial) server test_server options 
>> (table_name 'foo');  
>>  
>> ERROR:  referenced relation "local_foo" is not a table
> 
> Yeah, I'd noticed that myself.  We could probably tweak the code to
> issue a different error message and/or add a HINT if the serial's
> parent relation is a foreign table.  I'm not exactly sure what it
> should say though.  Thoughts?
> 

+1 for the HINT message addition.

Regards,

Atri

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw foreign tables and serial columns

2013-05-15 Thread Tom Lane
"Nicholson, Brad (Toronto, ON, CA)"  writes:
> [ this error message sucks: ]
> test=# create foreign table local_foo (id serial) server test_server options 
> (table_name 'foo');   
> ERROR:  referenced relation "local_foo" is not a table

Yeah, I'd noticed that myself.  We could probably tweak the code to
issue a different error message and/or add a HINT if the serial's
parent relation is a foreign table.  I'm not exactly sure what it
should say though.  Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"

2013-05-15 Thread Robert Haas
On Tue, May 14, 2013 at 10:30 PM, Stephen Frost  wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> I still think we should revert 9db7ccae2000524b72a4052352cbb5407fb53b02.
>> The argument that the system-provided program might be faster carries
>> very little weight for me --- "make install" is fast enough already.
>> It's not worth making a bunch of extension authors jump through hoops,
>> whether their style was bad or not.
>
> +1.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logging of PAM Authentication Failure

2013-05-15 Thread Amit Langote
> Please add patches here so they don't get forgotten:
>
> https://commitfest.postgresql.org/action/commitfest_view/open
>
> Do we really need to add *2* new libpq functions just to support this?

I will add the patches to the commitfest after reviewing them a bit to
see whether we can do away with creating more new functions than
necessary, and then make the appropriate changes.

--
Amit Langote


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] postgres_fdw foreign tables and serial columns

2013-05-15 Thread Nicholson, Brad (Toronto, ON, CA)
Hi,

I'm kicking the tires on the 9.3 postgres_fdw stuff - I'm not sure if this is
an issue or intended behavior, but it was pretty confusing based on the error
message that was output.  If you try creating a foreign table with a column of
the serial data type, it comes back with a "referenced relation is not a
table" error.  If you change the data type in the foreign table to integer,
then it works.  I completely understand why this is needed - but it tripped me
up for a while.  At the very least, can I suggest adding something in the
documentation about serial columns (if it is not an issue)?

test=# create table foo (id serial);
CREATE TABLE

test=# create foreign table local_foo (id serial) server test_server options 
(table_name 'foo');   
ERROR:  referenced relation "local_foo" is not a table

test=# create foreign table local_foo (id integer) server test_server options 
(table_name 'foo');
CREATE FOREIGN TABLE

Brad


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposed TODO: add support for "any" for PL/PythonU and PL/Perl

2013-05-15 Thread Pavel Stehule
+1

2013/5/15 Robert Haas :
> On Tue, May 14, 2013 at 2:24 PM, Josh Berkus  wrote:
>> Hackers,
>>
>> I'd like to add the following todo items to the TODO list:
>>
>> PL/Python:
>> * add support for anyelement and anyarray to PL/Python
>> * add support for VARIADIC "ANY" to PL/Python
>>
>> PL/Perl:
>> * add support for anyelement and anyarray to PL/Perl
>> * add support for VARIADIC "ANY" to PL/Perl
>>
>> The reason for this is that both Python and Perl are loosely typed
>> languages, and deal with variables as polymorphic.  Mapping Postgres
>> polymorphic parameters to functions in these PLs couldn't be more
>> natural.  I know from my part that support for VARIADIC "ANY" in
>> PL/Python would save me a bunch of shell function writing.
>>
>> I don't personally intend to hack these out, but they seem like good
>> things to put on the TODO list as good tasks for new hackers (and maybe
>> GSOC students) to take on.
>
> +1.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposed TODO: add support for "any" for PL/PythonU and PL/Perl

2013-05-15 Thread Robert Haas
On Tue, May 14, 2013 at 2:24 PM, Josh Berkus  wrote:
> Hackers,
>
> I'd like to add the following todo items to the TODO list:
>
> PL/Python:
> * add support for anyelement and anyarray to PL/Python
> * add support for VARIADIC "ANY" to PL/Python
>
> PL/Perl:
> * add support for anyelement and anyarray to PL/Perl
> * add support for VARIADIC "ANY" to PL/Perl
>
> The reason for this is that both Python and Perl are loosely typed
> languages, and deal with variables as polymorphic.  Mapping Postgres
> polymorphic parameters to functions in these PLs couldn't be more
> natural.  I know from my part that support for VARIADIC "ANY" in
> PL/Python would save me a bunch of shell function writing.
>
> I don't personally intend to hack these out, but they seem like good
> things to put on the TODO list as good tasks for new hackers (and maybe
> GSOC students) to take on.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logging of PAM Authentication Failure

2013-05-15 Thread Robert Haas
On Tue, May 14, 2013 at 11:20 AM, Amit Langote  wrote:
> Hello,
>
> Is it right that it is only in the case a password prompt is needed
> that a new connection is created after dropping the just-failed
> connection?
> I created a patch which enables it to use the existing connection in
> such a case (unlike what we currently do). It modifies
> connectDBComplete() and PQconnectPoll() to also include states
> pertaining to password being accepted from the user. That is, the
> state machine in PQconnectPoll() is further extended to include a
> connection state called CONNECTION_ASKING_PASSWORD which is entered
> when server sends AUTH_REQ_MD5 or AUTH_REQ_PASSWORD auth requests.
> These two request types require a password to be entered by the user.
> There is a new PostgresPollingStatusType value called
> PGRES_POLLING_WAITING_PASSWORD which is the polling status while a
> password is being entered by the user.
>
> When user enters the password the PQconnectPoll() continues forward in
> CONNECTION_ASKING_PASSWORD wherein it sends the password to the server
> (by calling pg_fe_sendauth() and this time with a potentially correct
> password) and later goes back to CONNECTION_AWAITING_RESPONSE to read
> server's response to the password just entered where it either
> receives authorization OK or error response thus completing the
> connection start-up process.
>
> The backend waits for the password until authentication timeout
> happens in which case the client can not send the password anymore
> since the backend has exited due to authentication timeout. I wonder
> if this is one of the reasons why this has not already been
> implemented?
>
> Comments?

Please add patches here so they don't get forgotten:

https://commitfest.postgresql.org/action/commitfest_view/open

Do we really need to add *2* new libpq functions just to support this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Better handling of archive_command problems

2013-05-15 Thread Robert Haas
On Tue, May 14, 2013 at 12:23 AM, Daniel Farina  wrote:
> On Mon, May 13, 2013 at 3:02 PM, Peter Geoghegan  wrote:
>> Has anyone else thought about approaches to mitigating the problems
>> that arise when an archive_command continually fails, and the DBA must
>> manually clean up the mess?
>
> Notably, the most common problem in this vein suffered at Heroku has
> nothing to do with archive_command failing, and everything to do with
> the ratio of block device write performance (hence, backlog) versus
> the archiving performance.  When CPU is uncontended it's not a huge
> deficit, but it is there and it causes quite a bit of stress.
>
> Archive commands failing are definitely a special case there, where it
> might be nice to bring write traffic to exactly zero for a time.

One possible objection to this line of attack is that, IIUC, waits to
acquire a LWLock are non-interruptible.  If someone tells PostgreSQL
to wait for some period of time before performing each WAL write,
other backends that grab the WALWriteLock will not respond to query
cancels during that time.  Worse, the locks have a tendency to back
up.  What I have observed is that if WAL isn't flushed in a timely
fashion, someone will try to grab WALWriteLock while holding
WALInsertLock.  Now anyone who attempts to insert WAL is in a
non-interruptible wait.  If the system is busy, it won't be long
before someone tries to extend pg_clog, and to do that they'll try to
grab WALInsertLock while holding CLogControlLock.  At that point, any
CLOG lookup that misses in the already-resident pages will send that
backend into a non-interruptible wait. I have seen cases where this
pile-up occurs during a heavy pgbench workload and paralyzes the
entire system, including any read-only queries, until the WAL write
completes.
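
To make the chain concrete, here is a schematic of that pile-up using
9.3-era lock names.  This is illustrative pseudocode with stub
declarations, not the actual backend code paths:

typedef enum { WALInsertLock, WALWriteLock, CLogControlLock } LWLockId;
typedef enum { LW_SHARED, LW_EXCLUSIVE } LWLockMode;
extern void LWLockAcquire(LWLockId lock, LWLockMode mode);  /* non-interruptible wait */

/* Backend A: flushing WAL while still holding the insert lock */
void backend_A(void)
{
    LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
    LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);   /* stalls behind the delayed write */
}

/* Backend B: extending pg_clog */
void backend_B(void)
{
    LWLockAcquire(CLogControlLock, LW_EXCLUSIVE);
    LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);  /* blocked by A; cancels are ignored */
}

/* Backend C: read-only query whose clog lookup misses the resident pages */
void backend_C(void)
{
    LWLockAcquire(CLogControlLock, LW_SHARED);   /* blocked by B: even reads stall */
}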

Now despite all that, I can see this being useful enough that Heroku
might want to insert a very small patch into their version of
PostgreSQL to do it this way, and just live with the downsides.  But
anything that can propagate non-interruptible waits across the entire
system does not sound to me like a feature that is sufficiently
polished that we want to expose it to users less sophisticated than
Heroku (i.e. nearly all of them).  If we do this, I think we ought to
find a way to make the waits interruptible, and to insert them in
places where they really don't interfere with read-only backends.  I'd
probably also argue that we ought to try to design it such that the
GUC can be in MB/s rather than delay/WAL writer cycle.
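
For what it's worth, an MB/s-style throttle might reduce to something
like the sketch below, where wal_write_rate_mb_s stands in for the
hypothetical GUC and pg_usleep() is the existing backend sleep
primitive; real code would subtract the time the write itself took:

extern void pg_usleep(long microsec);   /* existing backend primitive */

/* Hedged sketch: convert a hypothetical MB/s GUC into a per-write delay. */
static void
throttle_wal_write(int bytes_written, int wal_write_rate_mb_s)
{
    if (wal_write_rate_mb_s > 0)
    {
        /* how long this write "should" take at the configured rate */
        double budget_usec = (double) bytes_written * 1000000.0
            / ((double) wal_write_rate_mb_s * 1024.0 * 1024.0);

        pg_usleep((long) budget_usec);
    }
}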

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Dead regression tests

2013-05-15 Thread Heikki Linnakangas
There are a few .sql files in src/test/regress/sql/ that AFAICS are
not used for anything:


drop.sql

Looking at the history, this was used for some kind of cleanup in the 
ancient regress.sh script. It has been unused since Nov 2000, when 
regress.sh was removed.


euc_cn.sql
euc_jp.sql
euc_kr.sql
euc_tw.sql
mule_internal.sql

These are duplicates of the scripts in src/test/mb/sql/.

sql_ascii.sql

This is similar to the euc_* and mule_internal tests, but is not present 
in src/test/mb/sql.  Looking at the expected output, had this test ever 
been run, it would've failed since at least 2001, when the error message 
recorded in the expected output, "Relation '%s' does not exist", was 
changed in the code.


hs_primary_extremes.sql

This was introduced as part of the hot standby patch. I don't see any 
reference to how it's supposed to be used. Simon?



Barring objections, I'm going to just remove all of the above, except 
for hs_primary_extremes.sql.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: option --application_name for psql

2013-05-15 Thread Pavel Stehule
2013/5/14 Pavel Stehule :
> 2013/5/14 Magnus Hagander :
>> On Tue, May 14, 2013 at 9:36 PM, Pavel Stehule  
>> wrote:
>>> 2013/5/14 Magnus Hagander :
 psql in "connection string mode"
>>>
>>> Hi Magnus, can me send some link?
>>
>> http://www.postgresql.org/docs/9.2/static/app-psql.html
>>
>> "If this parameter contains an = sign or starts with a valid URI
>> prefix (postgresql:// or postgres://), it is treated as a conninfo
>> string. See Section 31.1 for more information."
>
> I was blind, thank you

After the comments from Erik and Magnus, my proposal no longer makes sense.

Regards

Pavel

>
> Pavel
>
>>
>>
>> --
>>  Magnus Hagander
>>  Me: http://www.hagander.net/
>>  Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Slicing TOAST

2013-05-15 Thread Pavel Golub
Hello, Heikki.

You wrote:

HL> On 14.05.2013 21:36, Josh Berkus wrote:
>>
>>> I'm proposing this now as a possible GSoC project; I don't propose to
>>> actively work on it myself.
>>
>> The deadline for submitting GSOC projects (by students) was a week ago.
>>   So is this a project suggestion for next year ...?


HL> I've been thinking, we should already start collecting ideas for next 
HL> year, and collect them throughout the year. I know I come up with some
HL> ideas every now and then, but when it's time for another GSoC, I can't
HL> remember any of them.

HL> I just created a GSoC2014 ideas pages on the wiki, for collecting these:
HL> https://wiki.postgresql.org/wiki/GSoC_2014. Let's keep the ideas coming,
HL> throughout the year.

Good idea! It reminds me of a feature proposed by Pavel Stehule a
while ago, here:
http://www.postgresql.org/message-id/banlktini+chgkfnyjkf1rshsq2kmkts...@mail.gmail.com

It's about streaming functionality for the BYTEA type. But I think
streaming should be added to BYTEA, TEXT, and VARCHAR without a length
specifier too.

As Pavel stated: "A very large bytea are limited by query size -
processing long query needs too RAM".  This is plainly true, and it
came up in a project of one of my clients: he used bytea to store
images, with the text format in PQexec, which as you know doubles or
triples the size of the data.
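
As an aside, that particular blowup can already be avoided today by
sending the bytea as a binary parameter instead of escaping it into
the query text.  Here is a minimal sketch using libpq's PQexecParams();
the table images(data bytea), the buffer, and the connection are
assumptions of the example:

#include <stdio.h>
#include <libpq-fe.h>

/* Insert image_buf/image_len into images(data bytea) without text escaping. */
static int
insert_image(PGconn *conn, const char *image_buf, int image_len)
{
    const char *paramValues[1];
    int         paramLengths[1];
    int         paramFormats[1];
    PGresult   *res;

    paramValues[0] = image_buf;
    paramLengths[0] = image_len;
    paramFormats[0] = 1;        /* 1 = binary: no textual doubling/tripling */

    res = PQexecParams(conn,
                       "INSERT INTO images (data) VALUES ($1)",
                       1,       /* one parameter */
                       NULL,    /* let the server infer the parameter type */
                       paramValues, paramLengths, paramFormats,
                       0);      /* text format is fine for the result */
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "INSERT failed: %s", PQerrorMessage(conn));
        PQclear(res);
        return -1;
    }
    PQclear(res);
    return 0;
}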

Some more details from Pavel:

There are a few disadvantages of LO compared to bytea, so there are
requests for a "smarter" API for bytea.

A significant problem is the different implementation of LO for people
who have to port applications to PostgreSQL from Oracle or DB2.  There
are some JDBC issues too.

For me, the main disadvantage of LO is that everything lives in one
shared space.  Bytea removes this disadvantage, but it is slower for
lengths > 20 MB.  It could be really practical to have the possibility
to insert some large fields in a second, non-SQL stream.  The same
applies when a large bytea is read.


I'm not sure if the whole project is simple enough for GSoC, but I
suppose it could be split up somehow.

PS: Should we start a separate thread for proposals? I spent an hour
searching before I found the mention of the GSoC 2014 wiki page.


HL> - Heikki





-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers