Re: [HACKERS] Logging of PAM Authentication Failure
Attached herewith is a patch based on the description in my previous mail. This patch will need revision, since the error situation in case of an authentication timeout on the server still needs to be handled; probably in simple_prompt()?

--
Amit Langote
Re: [HACKERS] Fast promotion failure
Hello,

>>>> Is the point of this discussion that the patch may leave out some
>>>> glitch about the timing of timeline-related changes, and Heikki saw a
>>>> manifestation of that?
>>>
>>> AFAIU, the committed patch has some gap in the overall scenario, which
>>> is the fast promotion issue.
>>
>> Right, the fast promotion issue is still there.
>>
>> Just to get us all on the same page again: Does anyone see a problem
>> with a fresh git checkout, with the fast-promotion-quick-fix.patch
>> applied?
>> (http://www.postgresql.org/message-id/51894942.4080...@vmware.com). If
>> you do, please speak up. As far as I know, the already-committed patch,
>> together with fast-promotion-quick-fix.patch, should fix all known
>> issues (*).

Shared XLogCtl->ThisTimeLineID is written and read without being fenced by a spinlock, unlike some other XLogCtl members. Can this break the coherency of its memory image between different processors? If that is the cause, it is quite reasonable that I cannot reproduce the trouble. I didn't see the issue even without fast-promotion-quick-fix.patch.

> The patch provided will unnecessarily call InitXLOGAccess() 2 times for the
> end-of-recovery checkpoint. It doesn't matter w.r.t. performance, but the
> purpose of calling LocalSetXLogInsertAllowed() and InitXLOGAccess() will be
> almost the same, or am I missing something?
>
> One more thing: I think after fast promotion, it should either set the
> timeline or give an error in the CreateCheckPoint() function before it
> reaches the check mentioned by you in your initial mail.
>     if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0)
>         elog(ERROR, "can't create a checkpoint during recovery");
> Shouldn't it set the timeline in the above check (RecoveryInProgress()), or
> when RecoveryInProgress() is called before CreateCheckPoint()?

Thinking of the checkpointer, it calls RecoveryInProgress() far earlier than that, in the wait loop in CheckpointerMain() where it decides whether to do a checkpoint or a restartpoint. So InitXLOGAccess() has already been done by the time a checkpoint is chosen there for the first time. And before that, ThisTimeLineID in the startup process gets incremented and is copied onto XLogCtl->ThisTimeLineID before xlogctl->SharedRecoveryInProgress becomes false in StartupXLOG(). I don't think it is possible for the checkpointer to run on an older timeline ID, on condition that all processes see exactly the same memory image.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
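For reference, reads of other XLogCtl fields are fenced by the info_lck spinlock. A minimal sketch of what a fenced read of ThisTimeLineID could look like (the type and lock names are the backend's; this exact accessor is hypothetical):

    /*
     * Hypothetical accessor: read the shared timeline ID under the
     * info_lck spinlock, as is done for other XLogCtl members, so the
     * value is fenced against reordering between processors.
     */
    static TimeLineID
    GetSharedThisTimeLineID(void)
    {
        /* use volatile pointer to prevent code rearrangement */
        volatile XLogCtlData *xlogctl = XLogCtl;
        TimeLineID  tli;

        SpinLockAcquire(&xlogctl->info_lck);
        tli = xlogctl->ThisTimeLineID;
        SpinLockRelease(&xlogctl->info_lck);

        return tli;
    }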
Re: [HACKERS] Better LWLocks with compare-and-swap (9.4)
On Wed, May 15, 2013 at 3:08 PM, Daniel Farina wrote:
> On Mon, May 13, 2013 at 5:50 AM, Heikki Linnakangas wrote:
>> pgbench -S is such a workload. With 9.3beta1, I'm seeing this profile, when
>> I run "pgbench -S -c64 -j64 -T60 -M prepared" on a 32-core Linux machine:
>>
>> -  64.09%  postgres  postgres           [.] tas
>>    - tas
>>       - 99.83% s_lock
>>          - 53.22% LWLockAcquire
>>             + 99.87% GetSnapshotData
>>          - 46.78% LWLockRelease
>>               GetSnapshotData
>>             + GetTransactionSnapshot
>> +   2.97%  postgres  postgres           [.] tas
>> +   1.53%  postgres  libc-2.13.so       [.] 0x119873
>> +   1.44%  postgres  postgres           [.] GetSnapshotData
>> +   1.29%  postgres  [kernel.kallsyms]  [k] arch_local_irq_enable
>> +   1.18%  postgres  postgres           [.] AllocSetAlloc
>> ...
>>
>> So, on this test, a lot of time is wasted spinning on the mutex of
>> ProcArrayLock. If you plot a graph of TPS vs. # of clients, there is a
>> surprisingly steep drop in performance once you go beyond 29 clients
>> (attached, pgbench-lwlock-cas-local-clients-sets.png, red line). My theory
>> is that after that point all the cores are busy, and processes start to be
>> sometimes context switched while holding the spinlock, which kills
>> performance.

I accidentally dropped some important last words of Heikki's from my quote of his mail, which makes my reply pretty bizarre to understand at the outset. Apologies. He wrote:

>> [...] Has anyone else seen that pattern?

> I have, I also used linux perf to come to this conclusion, and my
> determination was similar: a system was undergoing increasingly heavy
> load, in this case with processes >> number of processors. It was
> also a phase-change type of event: at one moment everything would be
> going great, but once a critical threshold was hit, s_lock would
> consume an enormous amount of CPU time. I figured preemption while in
> the spinlock was to blame at the time, given the extreme nature.
Re: [HACKERS] Logging of PAM Authentication Failure
Sorry that I am writing separate emails on the same topic. I seem to have a solution that allows us to accomplish what we are trying to, without much change to the existing libpq interface (especially what to expect about return values and the connection state we are in when we return from connectDBComplete() and PQconnectPoll()). Roughly, the following changes are required:

1] In src/bin/psql/startup.c, main():

    if (PQstatus(pset.db) == CONNECTION_BAD &&
        PQconnectionNeedsPassword(pset.db) &&
        password == NULL &&
        pset.getPassword != TRI_NO)
    {
        /* How would this detect authentication_timeout and exit accordingly? */
        password = simple_prompt(password_prompt, 100, false);
        PQsendPassword(pset.db, password);
    }

And there is no do {...} while (new_pass); unlike the current code.

2] In src/interfaces/libpq/fe-connect.c, a new function:

    void PQsendPassword(PGconn *conn, char *password)   /* suggest a better name? */

    void
    PQsendPassword(PGconn *conn, char *password)
    {
        conn->pgpass = password;
        /* suggest a better name for the status? */
        conn->status = CONNECTION_SENDING_PASSWORD;
        (void) connectDBComplete(conn);
    }

3] In src/interfaces/libpq/fe-connect.c, connectDBComplete(PGconn *conn): no change required. :-)

4] In src/interfaces/libpq/fe-connect.c, PQconnectPoll(PGconn *conn):

a) Add a new case to both switches (the one before and the one after keep_going:):

        /* These are writing states, so we just proceed. */
        case CONNECTION_STARTED:
        case CONNECTION_MADE:
        case CONNECTION_SENDING_PASSWORD:
            break;
    ...
    keep_going:
    ...
        case CONNECTION_SENDING_PASSWORD:
        {
            /*
             * Note that conn->pghost must be non-NULL if we are going to
             * avoid the Kerberos code doing a hostname look-up.
             */
            if (pg_fe_sendauth(areq, conn) != STATUS_OK)
            {
                conn->errorMessage.len = strlen(conn->errorMessage.data);
                goto error_return;
            }
            conn->errorMessage.len = strlen(conn->errorMessage.data);

            /*
             * Just make sure that any data sent by pg_fe_sendauth is
             * flushed out. Although this theoretically could block, it
             * really shouldn't since we don't send large auth responses.
             */
            if (pqFlush(conn))
                goto error_return;

            /* Now go read the server's response to the password just sent. */
            conn->status = CONNECTION_AWAITING_RESPONSE;
            return PGRES_POLLING_READING;
        }

5] In src/interfaces/libpq/libpq-fe.h, add a new intermediate connection state:

    /*
     * Although it is okay to add to these lists, values which become unused
     * should never be removed, nor should constants be redefined - that would
     * break compatibility with existing code.
     */
    typedef enum
    {
        CONNECTION_OK,
        CONNECTION_BAD,
        /* Non-blocking mode only below here */

        /*
         * The existence of these should never be relied upon - they should only
         * be used for user feedback or similar purposes.
         */
        CONNECTION_STARTED,             /* Waiting for connection to be made. */
        CONNECTION_MADE,                /* Connection OK; waiting to send. */
        CONNECTION_AWAITING_RESPONSE,   /* Waiting for a response from the
                                         * postmaster. */
        CONNECTION_AUTH_OK,             /* Received authentication; waiting for
                                         * backend startup. */
        CONNECTION_SETENV,              /* Negotiating environment. */
        CONNECTION_SSL_STARTUP,         /* Negotiating SSL. */
        CONNECTION_NEEDED,              /* Internal state: connect() needed */
        CONNECTION_SENDING_PASSWORD
    } ConnStatusType;

As you can probably see, this requires minimal libpq changes:

1] Add one more connection state: CONNECTION_SENDING_PASSWORD
2] Add one more function: PQsendPassword(PGconn*, char*)
3] Modify PQconnectPoll() to handle an intermediate CONNECTION_SENDING_PASSWORD state for clients which use PQsendPassword() to send a password that the user entered in the middle of a connection sequence, over the existing connection.

Comments?
Re: [HACKERS] plperl segfault in plperl_trusted_init() on kfreebsd
Re: Tom Lane 2013-05-15 <244.1368675...@sss.pgh.pa.us>
> You sure that wholesale override of CFLAGS is a good idea?
> This report doesn't show what flags the rest of your system was built
> with, but seems like there could be a compatibility issue. Or maybe
> you prevented plperl from being built with -fpic, or something like
> that.

That was because the plain "./configure" version (for a minimal way to reproduce) didn't build with debug symbols. The original gcc line from the Debian build log is:

cd build && ../configure \
    --with-tcl \
    --with-perl \
    --with-python \
    --with-pam \
    --with-krb5 \
    --with-gssapi \
    --with-openssl \
    --with-libxml \
    --with-libxslt \
    --with-ldap \
    --with-tclconfig=/usr/lib/tcl8.5 \
    --with-tkconfig=/usr/lib/tk8.5 \
    --with-includes=/usr/include/tcl8.5 \
    PYTHON=/usr/bin/python \
    --mandir=/usr/share/postgresql/9.3/man \
    --docdir=/usr/share/doc/postgresql-doc-9.3 \
    --sysconfdir=/etc/postgresql-common \
    --datarootdir=/usr/share/ \
    --datadir=/usr/share/postgresql/9.3 \
    --bindir=/usr/lib/postgresql/9.3/bin \
    --libdir=/usr/lib/ \
    --libexecdir=/usr/lib/postgresql/ \
    --includedir=/usr/include/postgresql/ \
    --enable-nls \
    --enable-integer-datetimes \
    --enable-thread-safety \
    --enable-debug \
    --disable-rpath \
    --with-ossp-uuid \
    --with-gnu-ld \
    --with-pgport=5432 \
    --with-system-tzdata=/usr/share/zoneinfo \
    CFLAGS='-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -pie -I/usr/include/mit-krb5' \
    LDFLAGS='-Wl,-z,relro -Wl,-z,now -Wl,--as-needed -L/usr/lib/mit-krb5 -L/usr/lib/x86_64-kfreebsd-gnu/mit-krb5'

/usr/bin/make -C plperl all
make[5]: Entering directory `/build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/src/pl/plperl'
msgfmt -o po/cs.mo /build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl/po/cs.po
[...]
'/usr/bin/perl' /build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl/text2macro.pl --strip='^(\#.*|\s*)$' /build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl/plc_perlboot.pl /build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl/plc_trusted.pl > perlchunks.h
'/usr/bin/perl' /build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl/plperl_opmask.pl plperl_opmask.h
gcc -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -pie -I/usr/include/mit-krb5 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. -I/build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl -I../../../src/include -I/build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/include -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/tcl8.5 -I/usr/lib/perl/5.14/CORE -c -o plperl.o /build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl/plperl.c
'/usr/bin/perl' /usr/share/perl/5.14/ExtUtils/xsubpp -typemap /usr/share/perl/5.14/ExtUtils/typemap /build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl/SPI.xs >SPI.c
gcc -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -pie -I/usr/include/mit-krb5 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. -I/build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl -I../../../src/include -I/build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/include -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/tcl8.5 -I/usr/lib/perl/5.14/CORE -c -o SPI.o SPI.c
'/usr/bin/perl' /usr/share/perl/5.14/ExtUtils/xsubpp -typemap /usr/share/perl/5.14/ExtUtils/typemap /build/buildd-postgresql-9.3_9.3~beta1-2-kfreebsd-amd64-3i_y9K/postgresql-9.3-9.3~beta1/build/../src/pl/plperl/Util.xs >Util.c
gcc -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -pie -I/usr/include/mit-krb5 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. -I/build/buildd-postgresql-9.3_9.3~beta1-2-
Re: [HACKERS] commit fest schedule for 9.4
On 2013-05-15 20:05, Andrew Dunstan wrote:
> On 05/15/2013 02:00 PM, Josh Berkus wrote:
>> Obviously we need a meta-manager who makes sure we have managers, and is
>> able to notice that a manager is MIA and needs replaced (or at least
>> backed-up).
>
> And then a meta-meta-manager to make sure that the meta-manager is
> meta-managing. And an Inspector General. Anyone have Danny Kaye's phone
> number?

Or Gogol's? You have to learn the dialing chant to call them... ;-)

> cheers
>
> andrew

--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
Re: [HACKERS] Logging of PAM Authentication Failure
>>> I created a patch which enables it to use the existing connection in
>>> such a case (unlike what we currently do). It modifies
>>> connectDBComplete() and PQconnectPoll() to also include states
>>> pertaining to password being accepted from the user. That is, the
>>> state machine in PQconnectPoll() is further extended to include a
>>> connection state called CONNECTION_ASKING_PASSWORD which is entered
>>> when server sends AUTH_REQ_MD5 or AUTH_REQ_PASSWORD auth requests.
>>
>> Great! The new client state seems to be effective also for MD5. But
>> it seems to break existing libpq clients doing the same authentication
>> sequence as current psql. Some means would be necessary to switch the
>> behavior when a password is not previously provided but needed by the
>> server, or to make the first half of the connection sequence compatible
>> with the current sequence - in other words - it should be that when the
>> user finds status is CONNECTION_BAD and
>> PQconnectionNeedsPassword() == true, the user can throw away the
>> connection and make a new connection providing the password, and can
>> also send the password on the existing connection.
>
> The first half of the connection sequence remains the same except for one
> change: in PQconnectPoll(), when in case CONNECTION_AWAITING_RESPONSE,
> if the server sends an md5/password authentication request, it returns
> PGRES_POLLING_WAITING_PASSWORD, which, back in connectDBComplete(),
> sets conn->password = true and conn->status =
> CONNECTION_ASKING_PASSWORD. Back in main(), this causes a password
> prompt and then the second half of the connection sequence. Hence
> pg_fe_sendauth() is not called in this first half unless it's a
> different authentication method than md5 and password.

One more thing that I forgot to mention is that the connection sequence enters CONNECTION_ASKING_PASSWORD in the first half only if the password is currently not set to a non-empty value, that is, only if (conn->pgpass == NULL || conn->pgpass[0] == '\0') is true.

I was wondering what would be the case for other applications using libpq when they return from connectDBComplete() with conn->status set to CONNECTION_ASKING_PASSWORD, provided they have not set conn->pgpass to a non-empty value. If they are currently handling this based on CONNECTION_BAD, then this change does no good to them. In fact, there needs to be a way for them to get CONNECTION_BAD. Since this whole patch is about not having to drop-and-reconnect *in case of a password prompt*, how it changes libpq for other applications also needs to be addressed here, especially for the md5/password authentication cases. Currently, any attempt to connect using an empty or NULL password results in CONNECTION_BAD for all libpq-based clients.

Thoughts?
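To make the compatibility concern concrete, here is a minimal sketch of what a client using the proposed interface could look like. CONNECTION_ASKING_PASSWORD and PQsendPassword() are the names from the patch under discussion, not released libpq, and prompt_password() is a hypothetical stand-in for psql's simple_prompt():

    #include <stdio.h>
    #include "libpq-fe.h"

    extern char *prompt_password(void);     /* stand-in for simple_prompt() */

    PGconn *
    connect_with_prompt(const char *conninfo)
    {
        PGconn     *conn = PQconnectdb(conninfo);

        /*
         * Proposed behavior: the server asked for a password, so send it
         * on the same connection, instead of today's PQfinish() plus a
         * second PQconnectdb() with the password in the conninfo.
         */
        if (PQstatus(conn) == CONNECTION_ASKING_PASSWORD)
            PQsendPassword(conn, prompt_password());

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return NULL;
        }
        return conn;
    }

An existing application that only checks for CONNECTION_BAD plus PQconnectionNeedsPassword() would never take the new branch, which is exactly the compatibility question raised above.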
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On Wed, May 15, 2013 at 10:17 PM, Alvaro Herrera wrote:
> Jon Nelson escribió:
>> On Wed, May 15, 2013 at 4:46 PM, Jon Nelson wrote:
>>> That's true. I originally wrote the patch using fallocate(2). What
>>> would be appropriate here? Should I switch on the return value and the
>>> six (6) or so relevant error codes?
>>
>> I addressed this, hopefully in a reasonable way.
>
> Would it work to just assign the value you got from posix_fallocate (if
> nonzero) to errno and then use %m in the errmsg() call in ereport()?

That strikes me as a better way. I'll work something up soon. Thanks!

--
Jon
Re: [HACKERS] plperl segfault in plperl_trusted_init() on kfreebsd
Christoph Berg writes:
> Here's what I did:
> ./configure
> cd src && make
> cd pl/plperl && make clean && make CFLAGS="-g"
> make check

You sure that wholesale override of CFLAGS is a good idea? This report doesn't show what flags the rest of your system was built with, but seems like there could be a compatibility issue. Or maybe you prevented plperl from being built with -fpic, or something like that.

I tend to use

    make PROFILE="-g"

for this sort of tweak to the compile flags --- that adds the flags you specify, without removing any.

            regards, tom lane
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
Jon Nelson escribió:
> On Wed, May 15, 2013 at 4:46 PM, Jon Nelson wrote:
>> That's true. I originally wrote the patch using fallocate(2). What
>> would be appropriate here? Should I switch on the return value and the
>> six (6) or so relevant error codes?
>
> I addressed this, hopefully in a reasonable way.

Would it work to just assign the value you got from posix_fallocate (if nonzero) to errno and then use %m in the errmsg() call in ereport()?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
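A minimal sketch of that error path, assuming the fd, path variable, and segment size from the surrounding WAL-file-creation code (posix_fallocate really does return the error code rather than setting errno, and %m formats errno):

    /*
     * posix_fallocate() returns the error code directly instead of
     * setting errno, so copy it into errno and let %m format it.
     */
    int     rc = posix_fallocate(fd, 0, XLogSegSize);

    if (rc != 0)
    {
        errno = rc;
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not allocate file \"%s\": %m", tmppath)));
    }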
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On Wed, May 15, 2013 at 4:46 PM, Jon Nelson wrote:
> On Wed, May 15, 2013 at 4:34 PM, Andres Freund wrote:
..
>> Some quick comments, without thinking about this:
>
> Thank you for the kind feedback.
>
>> * needs a configure check for posix_fallocate. The current version will
>>   e.g. fail to compile on windows or many other non linux systems. Check
>>   how it's done for posix_fadvise.

The following patch includes the changes to configure.in. I had to make other changes (not included here) because my local system uses autoconf 2.69, but I did test this successfully.

>> That's true. I originally wrote the patch using fallocate(2). What
>> would be appropriate here? Should I switch on the return value and the
>> six (6) or so relevant error codes?
>
> I addressed this, hopefully in a reasonable way.

--
Jon

Attachment: fallocate.patch-v2 (binary data)
Re: [HACKERS] Heap truncation without AccessExclusiveLock (9.4)
On 2013-05-15 18:35:35 +0300, Heikki Linnakangas wrote:
> Truncating a heap at the end of vacuum, to release unused space back to
> the OS, currently requires taking an AccessExclusiveLock. Although it's only
> held for a short duration, it can be enough to cause a hiccup in query
> processing while it's held. Also, if there is a continuous stream of queries
> on the table, autovacuum never succeeds in acquiring the lock, and thus the
> table never gets truncated.
>
> I'd like to eliminate the need for AccessExclusiveLock while truncating.

Couldn't we "just" take the extension lock and then walk backwards from the rechecked end of the relation, ConditionalLockBufferForCleanup()ing the buffers? For every page we lock that way, we check whether it is still empty. When we find a page that we couldn't lock, or that isn't empty, or when we have locked a sufficient number of pages, we truncate.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
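A rough sketch of that loop, under the stated assumptions (the buffer-manager calls are the real APIs, but the control flow is only the idea from this mail, with all locking-order and recovery details omitted):

    /*
     * Walk backwards from the verified end of the relation, conditionally
     * cleanup-locking each buffer; stop at the first page we cannot lock
     * or that is not empty, then truncate to the lowest empty page found.
     * A real implementation would hold the cleanup locks until the
     * truncation is done; this sketch releases them early for brevity.
     */
    BlockNumber nblocks;
    BlockNumber truncate_to;
    BlockNumber blkno;

    nblocks = RelationGetNumberOfBlocks(rel);   /* recheck under extension lock */
    truncate_to = nblocks;

    for (blkno = nblocks; blkno > 0; blkno--)
    {
        Buffer      buf = ReadBuffer(rel, blkno - 1);

        if (!ConditionalLockBufferForCleanup(buf))
        {
            ReleaseBuffer(buf);
            break;              /* someone is using this page; stop here */
        }
        if (!PageIsEmpty(BufferGetPage(buf)))
        {
            UnlockReleaseBuffer(buf);
            break;              /* page has data again; stop here */
        }
        truncate_to = blkno - 1;    /* page is empty and cleanup-locked */
        UnlockReleaseBuffer(buf);
    }

    if (truncate_to < nblocks)
        RelationTruncate(rel, truncate_to);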
[HACKERS] Extent Locks
All,

Starting a new thread to avoid hijacking Heikki's original, but..

* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
> Truncating a heap at the end of vacuum, to release unused space back to
> the OS, currently requires taking an AccessExclusiveLock. Although
> it's only held for a short duration, it can be enough to cause a
> hiccup in query processing while it's held. Also, if there is a
> continuous stream of queries on the table, autovacuum never succeeds
> in acquiring the lock, and thus the table never gets truncated.

Extent locking suffers from very similar problems, and we really need to improve this situation. With today's fast I/O systems, and massive numbers of CPUs in a single system, it's absolutely trivial to have a whole slew of processes trying to add data to a single relation, and to have that access get nearly serialized because everyone is waiting on the extent lock.

Perhaps one really simple approach would be to increase the size of the extent which is created, in relation to the size of the relation. I've no clue what level of effort is involved there, but I'm hoping such an approach would help. I've long thought that it'd be very neat if we could simply give each bulk-inserter process their own 1G chunk to insert directly into w/o having to talk to anyone else. The creation of the specific 1G piece could, hopefully, be made atomic easily (either thanks to the OS or with our own locking), etc, etc.

I'm sure it's many bricks shy of a load, but I wanted to raise the issue, again, as I've seen it happening on yet another high-volume write-intensive system.

    Thanks,

        Stephen
Re: [HACKERS] "on existing update" construct
Basically I was referring to this link: http://mbk.projects.postgresql.org

Any suggestions here? Thanks!
Re: [HACKERS] Heap truncation without AccessExclusiveLock (9.4)
Robert Haas writes:
> I've been thinking for a while that we need some other system for
> managing other kinds of invalidations. For example, suppose we want
> to cache relation sizes in blocks. So we allocate 4kB of shared
> memory, interpreted as an array of 512 8-byte entries. Whenever you
> extend a relation, you hash the relfilenode and take the low-order 9
> bits of the hash value as an index into the array. You increment that
> value either under a spinlock or perhaps using fetch-and-add where
> available.

I'm not sure I believe the details of that.

1. 4 bytes is not enough to store the exact identity of the table that the cache entry belongs to, so how do you disambiguate?

2. If you don't find an entry for your target rel in the cache, aren't you still going to have to do an lseek?

Having said that, the idea of specialized caches in shared memory seems plenty reasonable to me. One thing that's bothering me about Heikki's proposal is that it's not clear that it's a *cache*; that is, I don't see the fallback logic to use when there's no entry for a relation for lack of room.

            regards, tom lane
Re: [HACKERS] "on existing update" construct
> Please do not top-post on the PostgreSQL lists. See
> <http://idallen.com/topposting.html>

> Also, note that pgsql-hackers is entirely the wrong list for this sort
> of question. You should have been asking on pgsql-general.

Apologies for the top-post! Well, I thought pgsql-hackers would suggest some new solution; I was just going by the description of the mailing list. It's worth looking into http://stackoverflow.com/questions/1109061/insert-on-duplicate-update-postgresql

Let me know and I will post this question on pgsql-general.

> It's been on the TODO list forever. It's harder to implement well than
> you would think.

Yes, I think it would be nice to see this in one of the upcoming releases.

Thanks!
Re: [HACKERS] Heap truncation without AccessExclusiveLock (9.4)
On Wed, May 15, 2013 at 7:10 PM, Tom Lane wrote:
>> Another problem is that sinval resets are bad for performance, and
>> anything we do that pushes more messages through sinval will increase
>> the frequency of resets.
>
> I've been thinking that we should increase the size of the sinval ring;
> now that we're out from under SysV shmem size limits, it wouldn't be
> especially painful to do that. That's not terribly relevant to this
> issue though. I agree that we don't want an sinval message per relation
> extension, no matter what the ring size is.

I've been thinking for a while that we need some other system for managing other kinds of invalidations. For example, suppose we want to cache relation sizes in blocks. So we allocate 4kB of shared memory, interpreted as an array of 512 8-byte entries. Whenever you extend a relation, you hash the relfilenode and take the low-order 9 bits of the hash value as an index into the array. You increment that value either under a spinlock or perhaps using fetch-and-add where available.

On the read side, every backend can cache the length of as many relations as it wants. But before relying on a cached value, it must index into the shared array and see whether the value has been updated. On 64-bit systems, this requires no lock, only a barrier, and some 32-bit systems have special instructions that can be used for an 8-byte atomic read, and hence could avoid the lock as well. This would almost certainly be cheaper than doing an lseek every time, although maybe not by enough to matter. At least on Linux, the syscall seems to be pretty cheap.

Now, a problem with this is that we keep doing things that make it hard for people to run very low memory instances of PostgreSQL. So potentially whether or not we allocate space for this could be controlled by a GUC. Or maybe the structure could be made somewhat larger and shared among multiple caching needs.

I'm not sure whether this idea can be adapted to do what Heikki is after. But I think these kinds of techniques are worth thinking about as we look for ways to further improve performance.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
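A minimal sketch of the counter-array scheme described above (nothing like this exists in the tree; every name here is invented, and the read side uses the spinlock variant even though, as noted, a bare 8-byte load plus barrier would do on 64-bit systems):

    /*
     * Hypothetical 4kB shared array: 512 counters indexed by the
     * low-order 9 bits of the relfilenode hash. Extension bumps the
     * counter; readers revalidate a cached relation size whenever
     * "their" counter has changed.
     */
    #define RELSIZE_INVAL_SLOTS 512

    typedef struct RelSizeInvalCtl
    {
        slock_t     mutex;          /* or per-slot fetch-and-add */
        uint64      counters[RELSIZE_INVAL_SLOTS];
    } RelSizeInvalCtl;

    static RelSizeInvalCtl *RelSizeInval;   /* lives in shared memory */

    static inline int
    relsize_slot(RelFileNode *rnode)
    {
        uint32      h = DatumGetUInt32(hash_any((unsigned char *) rnode,
                                                sizeof(RelFileNode)));

        return h & (RELSIZE_INVAL_SLOTS - 1);   /* low-order 9 bits */
    }

    /* Writer side: called whenever a relation is extended. */
    void
    RelSizeInvalBump(RelFileNode *rnode)
    {
        SpinLockAcquire(&RelSizeInval->mutex);
        RelSizeInval->counters[relsize_slot(rnode)]++;
        SpinLockRelease(&RelSizeInval->mutex);
    }

    /* Reader side: a cached size is usable only if the counter is
     * unchanged since the size was cached; otherwise fall back to
     * lseek() and re-cache. */
    bool
    RelSizeCacheStillValid(RelFileNode *rnode, uint64 counter_when_cached)
    {
        uint64      now;

        SpinLockAcquire(&RelSizeInval->mutex);
        now = RelSizeInval->counters[relsize_slot(rnode)];
        SpinLockRelease(&RelSizeInval->mutex);

        return now == counter_when_cached;
    }

Note that hash collisions across the 512 slots only cause spurious revalidation, never a stale read, which is what makes the tiny fixed-size array workable.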
Re: [HACKERS] Heap truncation without AccessExclusiveLock (9.4)
Robert Haas writes:
> On Wed, May 15, 2013 at 11:35 AM, Heikki Linnakangas wrote:
>> To not slow down common backend
>> operations, the values (or lack thereof) are cached in relcache. To sync the
>> relcache when the values change, there will be a new shared cache
>> invalidation event to force backends to refresh the cached watermark values.
>
> AFAIK, the sinval mechanism isn't really well-designed to ensure that
> these kinds of notifications arrive in a timely fashion.

Yeah; currently it's only meant to guarantee that you see updates that were protected by obtaining a heavyweight lock with which your own lock request conflicts. It will *not* work for the usage Heikki proposes, at least not without sprinkling sinval queue checks into a lot of places where they aren't now. And as you say, the side-effects of that would be worrisome.

> Another problem is that sinval resets are bad for performance, and
> anything we do that pushes more messages through sinval will increase
> the frequency of resets.

I've been thinking that we should increase the size of the sinval ring; now that we're out from under SysV shmem size limits, it wouldn't be especially painful to do that. That's not terribly relevant to this issue though. I agree that we don't want an sinval message per relation extension, no matter what the ring size is.

            regards, tom lane
Re: [HACKERS] pg_dump versus defaults on foreign tables
> In any case, that restriction isn't breaking pg_dump so I don't feel
> a need to change it during beta.

Sure. Maybe put it on the stack for 9.4.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] "on existing update" construct
On 05/15/2013 02:22 PM, Dev Kumkar wrote:
> Well ya, stored procedure; also was thinking about a pre-insert trigger.
>
> Am sure folks here must have gone thru this and had alternatives. Also was
> reading about merge_by_key.

It's been on the TODO list forever. It's harder to implement well than you would think.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] pg_dump versus defaults on foreign tables
Josh Berkus writes:
>> Comments?

> I personally think that FOREIGN should be optional on most commands
> where ambiguity is unlikely. In addition to ALTER, there's no
> particularly good reason why "DROP TABLE myforeigntable" doesn't work.

Well, it's certainly bizarre that we allow ALTER TABLE to work for most non-table relation types (eg views, sequences) but specifically exclude foreign tables, which after all are *called* tables. So I think that's just plain wrong.

I'm less excited about changing the DROP case, because there is precedent extending way back for being anal about that case, eg we do this at least as far back as 7.4:

regression=# create sequence s1;
CREATE SEQUENCE
regression=# drop table s1;
ERROR:  "s1" is not a table
HINT:  Use DROP SEQUENCE to remove a sequence.

I'm not against being laxer about DROP, but it ought to be a thought-through change that considers what to do for all relation types, not just foreign tables. In particular, should it follow the precedent of ALTER TABLE, in that we allow "ALTER TABLE" on non-tables but if you use another spelling such as "ALTER SEQUENCE" then it has to match the relation type?

In any case, that restriction isn't breaking pg_dump so I don't feel a need to change it during beta.

            regards, tom lane
Re: [HACKERS] Better handling of archive_command problems
On Wed, May 15, 2013 at 3:46 AM, Robert Haas wrote:
> One possible objection to this line of attack is that, IIUC, waits to
> acquire a LWLock are non-interruptible. If someone tells PostgreSQL
> to wait for some period of time before performing each WAL write,
> other backends that grab the WALWriteLock will not respond to query
> cancels during that time.

I don't see any reasonable way to make LWLocks care about interrupts (using all 3 possible underlying semaphore implementations, no less). As it says within LWLockAcquire:

    /*
     * Lock out cancel/die interrupts until we exit the code section protected
     * by the LWLock.  This ensures that interrupts will not interfere with
     * manipulations of data structures in shared memory.
     */
    HOLD_INTERRUPTS();

We've been pretty judicious about placing CHECK_FOR_INTERRUPTS() calls in the right places, but it's still quite possible to see the server take multiple seconds - perhaps even as many as 10 - to respond to an interrupt (by psql SIGINT). Now, I didn't have enough of an interest at the times I noticed this to figure out exactly why that may have been, or to somehow characterize it, but I don't accept that it's a violation of some Postgres precept that this setting could result in interrupts taking multiple seconds, and maybe even as many as 10. I'd go so far as to let the user make the throttling sleep take as long as they like, even though this admittedly would sort of break such a precept. There is a setting called zero_damaged_pages, and enabling it causes data loss. I've seen cases where it was enabled within postgresql.conf for years.

> Worse, the locks have a tendency to back up.

Well, yes, sleeping on WALWriteLock has some fairly bad consequences for performance. That's sort of the idea. The bar is rather low here, to my mind (the state of the art currently is deal with it/monitor it yourself, and risk a PANIC shutdown if you fail). I think it's worth noting that at least one other "Enterprise" RDBMS has a similar feature.

The only situation that I can see where a backend would acquire WALWriteLock alongside WALInsertLock is when a segment switch needs to occur (i.e. a XLOG_SWITCH record is being inserted). If that's the case, no one is going to be able to insert WAL for as long as the sleep to throttle occurs anyway, so there is no additional harm done.

> Now despite all that, I can see this being useful enough that Heroku
> might want to insert a very small patch into their version of
> PostgreSQL to do it this way, and just live with the downsides.  But
> anything that can propagate non-interruptible waits across the entire
> system does not sound to me like a feature that is sufficiently
> polished that we want to expose it to users less sophisticated than
> Heroku (i.e. nearly all of them).  If we do this, I think we ought to
> find a way to make the waits interruptible, and to insert them in
> places where they really don't interfere with read-only backends.

It would be nice to be able to be sure that CLogControlLock could not be held for multiple seconds as a result of this. However, I don't see any reason to let the perfect be the enemy of the good, or at least the better. Just how likely is it that the scenario you describe will affect reads in the real world? In any case, this is a problem in its own right. I don't intend to make any promises about how this throttling will affect read queries, except perhaps something vague and loose.

> I'd probably also argue that we ought to try to design it such that the
> GUC can be in MB/s rather than delay/WAL writer cycle.

That'd certainly be better, but I can't see a way of doing it without adding a whole bunch of mechanism to some important codepaths, like within XLogWrite(), which would be quite a hard sell.

--
Peter Geoghegan
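For concreteness, a sketch of the kind of change being discussed - a hypothetical wal_write_delay GUC consulted before each WAL flush. The GUC name and the placement are illustrative only; no such setting exists:

    /*
     * Hypothetical throttle, sketched against the write path used by
     * XLogFlush(): wal_write_delay is an invented GUC in milliseconds.
     * Sleeping *before* WALWriteLock is taken means the lock itself is
     * at least not held during the delay, though waiters still queue up
     * behind the slowed-down writer.
     */
    if (wal_write_delay > 0)
        pg_usleep(wal_write_delay * 1000L);     /* pg_usleep takes microseconds */

    LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
    /* ... existing write/flush of WAL up to the requested LSN ... */
    LWLockRelease(WALWriteLock);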
Re: [HACKERS] Better LWLocks with compare-and-swap (9.4)
On Mon, May 13, 2013 at 5:50 AM, Heikki Linnakangas wrote:
> pgbench -S is such a workload. With 9.3beta1, I'm seeing this profile, when
> I run "pgbench -S -c64 -j64 -T60 -M prepared" on a 32-core Linux machine:
>
> -  64.09%  postgres  postgres           [.] tas
>    - tas
>       - 99.83% s_lock
>          - 53.22% LWLockAcquire
>             + 99.87% GetSnapshotData
>          - 46.78% LWLockRelease
>               GetSnapshotData
>             + GetTransactionSnapshot
> +   2.97%  postgres  postgres           [.] tas
> +   1.53%  postgres  libc-2.13.so       [.] 0x119873
> +   1.44%  postgres  postgres           [.] GetSnapshotData
> +   1.29%  postgres  [kernel.kallsyms]  [k] arch_local_irq_enable
> +   1.18%  postgres  postgres           [.] AllocSetAlloc
> ...
>
> So, on this test, a lot of time is wasted spinning on the mutex of
> ProcArrayLock. If you plot a graph of TPS vs. # of clients, there is a
> surprisingly steep drop in performance once you go beyond 29 clients
> (attached, pgbench-lwlock-cas-local-clients-sets.png, red line). My theory
> is that after that point all the cores are busy, and processes start to be
> sometimes context switched while holding the spinlock, which kills
> performance.

I have, I also used linux perf to come to this conclusion, and my determination was similar: a system was undergoing increasingly heavy load, in this case with processes >> number of processors. It was also a phase-change type of event: at one moment everything would be going great, but once a critical threshold was hit, s_lock would consume an enormous amount of CPU time. I figured preemption while in the spinlock was to blame at the time, given the extreme nature.
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On Wed, May 15, 2013 at 4:34 PM, Andres Freund wrote:
> Hi,
>
> On 2013-05-15 16:26:15 -0500, Jon Nelson wrote:
>>>> I have written up a patch to use posix_fallocate in new WAL file
>>>> creation, including configuration by way of a GUC variable, but I've
>>>> not contributed to the PostgreSQL project before. Therefore, I'm
>>>> fairly certain the patch is not formatted properly or conforms to the
>>>> appropriate style guides. Currently, the patch is based on 9.2, and is
>>>> quite small in size - 3.6KiB.
>>
>> I have re-based and reformatted the code, and basic testing shows a
>> reduction in WAL-file creation time of a fairly significant amount.
>> I ran 'make test' and did additional local testing without issue.
>> Therefore, I am attaching the patch. I will try to add it to the
>> commitfest page.
>
> Some quick comments, without thinking about this:

Thank you for the kind feedback.

> * needs a configure check for posix_fallocate. The current version will
>   e.g. fail to compile on windows or many other non linux systems. Check
>   how it's done for posix_fadvise.

I will address as soon as I am able.

> * Is wal file creation performance actually relevant? Is the performance
>   of a system running on fallocate()d wal files any different?

In my limited testing, I noticed a drop of approx. 100ms per WAL file. I do not have a good idea for how to really stress the WAL-file creation area without calling pg_start_backup and pg_stop_backup over and over (with archiving enabled). However, a file allocated with fallocate is (supposed to be) less fragmented than one created by the traditional means.

> * According to the man page posix_fallocate doesn't set errno but rather
>   returns the error code.

That's true. I originally wrote the patch using fallocate(2). What would be appropriate here? Should I switch on the return value and the six (6) or so relevant error codes?

> * I wonder whether we ever want to actually disable this? Afair the libc
>   contains emulation for posix_fadvise if the filesystem doesn't support
>   it.

I know that glibc does, but I don't know about other libc implementations.

--
Jon
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
Hi,

On 2013-05-15 16:26:15 -0500, Jon Nelson wrote:
>>> I have written up a patch to use posix_fallocate in new WAL file
>>> creation, including configuration by way of a GUC variable, but I've
>>> not contributed to the PostgreSQL project before. Therefore, I'm
>>> fairly certain the patch is not formatted properly or conforms to the
>>> appropriate style guides. Currently, the patch is based on 9.2, and is
>>> quite small in size - 3.6KiB.
>
> I have re-based and reformatted the code, and basic testing shows a
> reduction in WAL-file creation time of a fairly significant amount.
> I ran 'make test' and did additional local testing without issue.
> Therefore, I am attaching the patch. I will try to add it to the
> commitfest page.

Some quick comments, without thinking about this:

* needs a configure check for posix_fallocate. The current version will
  e.g. fail to compile on windows or many other non linux systems. Check
  how it's done for posix_fadvise.
* Is wal file creation performance actually relevant? Is the performance
  of a system running on fallocate()d wal files any different?
* According to the man page posix_fallocate doesn't set errno but rather
  returns the error code.
* I wonder whether we ever want to actually disable this? Afair the libc
  contains emulation for posix_fadvise if the filesystem doesn't support
  it.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
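As a sketch, the configure-guarded call site could end up looking something like this, modelled on how posix_fadvise is handled. HAVE_POSIX_FALLOCATE would come from a new AC_CHECK_FUNCS([posix_fallocate]) test in configure.in; wal_use_fallocate is the GUC name from the attached patch; zero_fill_segment() is a stand-in for the existing write loop:

    #ifdef HAVE_POSIX_FALLOCATE
        if (wal_use_fallocate)
        {
            /* fast path: let the filesystem reserve the whole segment */
            int     rc = posix_fallocate(fd, 0, XLogSegSize);

            if (rc != 0)
            {
                errno = rc;     /* returns the code; doesn't set errno */
                /* ereport(...) with %m, per the earlier suggestion */
            }
        }
        else
    #endif
        {
            /* portable fallback: the existing zero-filling write loop */
            zero_fill_segment(fd);  /* stand-in name for that loop */
        }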
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On Tue, May 14, 2013 at 9:43 PM, Robert Haas wrote:
> On Mon, May 13, 2013 at 9:54 PM, Jon Nelson wrote:
>> Pertinent to another thread titled
>> [HACKERS] corrupt pages detected by enabling checksums
>> I hope to explore the possibility of using fallocate (or
>> posix_fallocate) for new WAL file creation.
>>
>> Most modern Linux filesystems support fast fallocate/posix_fallocate,
>> reducing extent fragmentation (where extents are used) and frequently
>> offering a pretty significant speed improvement. In my tests, using
>> posix_fallocate (followed by pg_fsync) is at least 28 times quicker
>> than using the current method (which writes zeroes followed by
>> pg_fsync).
>>
>> I have written up a patch to use posix_fallocate in new WAL file
>> creation, including configuration by way of a GUC variable, but I've
>> not contributed to the PostgreSQL project before. Therefore, I'm
>> fairly certain the patch is not formatted properly or conforms to the
>> appropriate style guides. Currently, the patch is based on 9.2, and is
>> quite small in size - 3.6KiB.

I have re-based and reformatted the code, and basic testing shows a reduction in WAL-file creation time of a fairly significant amount. I ran 'make test' and did additional local testing without issue. Therefore, I am attaching the patch. I will try to add it to the commitfest page.

--
Jon

Attachment: 0001-enhance-GUC-and-xlog-with-wal_use_fallocate-boolean-.patch (binary data)
Re: [HACKERS] "on existing update" construct
Well ya, stored procedure; also was thinking about a pre-insert trigger.

Am sure folks here must have gone thru this and had alternatives. Also was reading about merge_by_key.

On Wed, May 15, 2013 at 12:51 PM, Daniel Farina wrote:
> On Wed, May 15, 2013 at 11:44 AM, Dev Kumkar wrote:
>> Hello,
>>
>> Is there an alternative of Sybase's "on existing update" construct in pgsql?
>>
>> "ON DUPLICATE KEY UPDATE" doesn't work.
>>
>> Thanks in advance!
>
> No, you'll have to either handle this in the application or use a
> stored procedure at this time.  The omission of such a construct from
> psql's "\h" command and the manual is not in error.
Re: [HACKERS] Heap truncation without AccessExclusiveLock (9.4)
On Wed, May 15, 2013 at 11:35 AM, Heikki Linnakangas wrote:
> Shared memory space is limited, but we only need the watermarks for any
> in-progress truncations. Let's keep them in shared memory, in a small
> fixed-size array. That limits the number of concurrent truncations that can
> be in-progress, but that should be ok.

Would it only limit the number of concurrent truncations that can be in progress *due to vacuum*? Or would it limit the TOTAL number of concurrent truncations? Because a table could have arbitrarily many inheritance children, and you might try to truncate the whole thing at once...

> To not slow down common backend
> operations, the values (or lack thereof) are cached in relcache. To sync the
> relcache when the values change, there will be a new shared cache
> invalidation event to force backends to refresh the cached watermark values.
> A backend (vacuum) can ensure that all backends see the new value by first
> updating the value in shared memory, sending the sinval message, and waiting
> until everyone has received it.

AFAIK, the sinval mechanism isn't really well-designed to ensure that these kinds of notifications arrive in a timely fashion. There's no particular bound on how long you might have to wait. Pretty much all inner loops have CHECK_FOR_INTERRUPTS(), but they definitely do not all have AcceptInvalidationMessages(), nor would that be safe or practical. The sinval code sends catchup interrupts, but only for the purpose of preventing sinval overflow, not for timely receipt.

Another problem is that sinval resets are bad for performance, and anything we do that pushes more messages through sinval will increase the frequency of resets. Now if those operations are things that are relatively uncommon, it's not worth worrying about - but if it's something that happens on every relation extension, I think that's likely to cause problems. That could lead to wrapping the sinval queue around in a fraction of a second, leading to system-wide sinval resets. Ouch.

> With the watermarks, truncation works like this:
>
> 1. Set soft watermark to the point where we think we can truncate the
> relation. Wait until everyone sees it (send sinval message, wait).

I'm also concerned about how you plan to synchronize access to this shared memory arena. I thought about implementing a relation size cache during the 9.2 cycle, to avoid the overhead of the approximately 1 gazillion lseek calls we do under e.g. a pgbench workload. But the thing is, at least on Linux, the system calls are pretty cheap, and on kernels >= 3.2, they are lock-free. On earlier kernels, there's a spinlock acquire/release cycle for every lseek, and performance tanks with >= 44 cores. That spinlock is around a single memory fetch, so a spinlock or lwlock around the entire array would presumably be a lot worse.

It seems to me that under this system, everyone who would under present circumstances invoke lseek() would first have to query this shared memory area, and then if they miss (which is likely, since most of the time there won't be a truncation in progress) they'll still have to do the lseek. So even if there's no contention problem, there could still be a raw loss of performance. I feel like I might be missing a trick though; it seems like somehow we ought to be able to cache the relation size for long periods of time when no extension is in progress.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
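To make the proposal concrete, a sketch of what the small fixed-size shared array of in-progress truncations could look like (RelFileNode, BlockNumber, and slock_t are real types; the struct, its name, and the slot limit are illustrative only):

    /*
     * Hypothetical shared-memory table of in-progress truncations: one
     * slot per truncation, holding the soft and hard watermarks for the
     * relation being truncated, per Heikki's proposal.
     */
    #define MAX_CONCURRENT_TRUNCATIONS 8    /* invented limit */

    typedef struct TruncationWatermark
    {
        RelFileNode rnode;          /* relation being truncated; unused
                                     * slots marked invalid */
        BlockNumber softWatermark;  /* point above which vacuum hopes to
                                     * truncate */
        BlockNumber hardWatermark;  /* no valid data at or above this */
    } TruncationWatermark;

    typedef struct TruncationShmemData
    {
        slock_t     mutex;          /* protects the slots below */
        TruncationWatermark slots[MAX_CONCURRENT_TRUNCATIONS];
    } TruncationShmemData;

A backend extending a relation would only need to consult this structure when a slot matches its relation, which is Robert's point above: the common case is a miss, so the lookup must be very cheap.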
Re: [HACKERS] pg_dump versus defaults on foreign tables
Tom,

> Comments?

I personally think that FOREIGN should be optional on most commands where ambiguity is unlikely. In addition to ALTER, there's no particularly good reason why "DROP TABLE myforeigntable" doesn't work.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
[HACKERS] pg_dump versus defaults on foreign tables
While fooling with the issue of serial columns for foreign tables, I noticed that pg_dump will sometimes try to restore a default expression with a command like this:

ALTER TABLE ONLY rem ALTER COLUMN f1 SET DEFAULT nextval('rem_f1_seq'::regclass);

which when applied to a foreign table gets you

ERROR:  "rem" is a foreign table
HINT:  Use ALTER FOREIGN TABLE instead.

Now, I suppose we could kluge up pg_dump to emit ALTER FOREIGN TABLE instead, but I'd like to object to this degree of anal retentivity. There is *no* good reason to refuse this command, and plenty of precedent in favor of being laxer. Most ALTER TABLE variants have historically allowed any relation type for which the action is sensible. As an example, three lines earlier in this same dump file I find

ALTER TABLE public.rem_f1_seq OWNER TO postgres;

for which we are not bleating that you have to use ALTER SEQUENCE instead (and we'd better not start, since this usage is well embedded in existing dump files).

Comments?

            regards, tom lane
Re: [HACKERS] Parallel Sort
On Wed, May 15, 2013 at 3:04 PM, Noah Misch wrote:
> On Tue, May 14, 2013 at 12:15:24PM -0300, Claudio Freire wrote:
>> You know what would be a low-hanging fruit that I've been thinking
>> would benefit many of my own queries?
>>
>> "Parallel" sequential scan nodes. Even if there's no real parallelism
>> involved, when a query has to scan the same table at multiple nodes,
>> if it's big, it would be worth parallelizing the scans to transform
>> them into synchro scans.
>>
>> I have absolutely no idea how this would work easily without forked
>> workers, because the scans might be buried in more complex execution
>> trees. But still, it's worth considering that parallelizing may
>> benefit more than core usage.
>>
>> If execution nodes could be paused at arbitrary points, a "parallel
>> scan" node could pause one branch that has consumed the circular
>> buffer, letting other branches consume their part, and thus
>> "parallelizing" branch execution. But this would be perhaps more
>> complex than simply forking.
>
> Execution nodes do pause between every output tuple, at least nominally.
> Still, given the architecture of our executor and the planner work to
> implement such a thing, I would not classify it as low-hanging fruit. It
> would primarily apply to a plan with independent sequential scans of the
> same large (relative to total memory) relation. I'm sure that comes up,
> but it doesn't strike me as typical.

I found it rather typical of some of my workloads, but it could probably not be the case globally.

It would be rather easier if nodes could pause without returning rows. I think at the moment, not returning any rows means the node is finished doing its scan. The nodes that would have to be pausable like this wouldn't be sequential scans, but sorts, hashes, and in general those that take a long time to start returning rows.

So, a plan that goes like:

Seq on A -> Sort -> Merge -> result
Seq on A -> Sort --/

would be turned into:

Seq on A -> Step Sort -> Parallel Merge -> result
Seq on A -> Step Sort --/

Or maybe even:

Seq on A -> Sort -> Tee X -> Parallel Merge -> result
                    X --/

I think Tee and Parallel Merge should be doable with the current infrastructure, because they don't require pausing without returning any tuples.

Not sure how many meters above ground that is, or how many gotchas might be involved. But it's been circling in my head for a while.
Re: [HACKERS] streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)
On 15.05.2013 22:50, Benedikt Grundmann wrote:
> On Wed, May 15, 2013 at 2:50 PM, Heikki Linnakangas wrote:
>> The subject says 9.2.3. Are you sure you're running 9.2.4 on all the
>> servers? There was a fix to a bug related to starting a standby server
>> from a filesystem snapshot. I don't think it was quite the case you
>> have, but pretty close.
>
> So this is delightfully embarrassing. I just went back to double check,
> and:
> - primary box is 9.2.3
> - standby is 9.2.4
> - testing is 9.2.4
>
> I guess that alone could possibly explain it?

Hmm, no, it should still work. There haven't been any changes in the WAL format. I do recommend upgrading the primary, of course, but I don't really see how that would explain what you're seeing.

- Heikki
Re: [HACKERS] streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)
On Wed, May 15, 2013 at 2:50 PM, Heikki Linnakangas wrote:
> On 15.05.2013 15:42, David Powers wrote:
>> First, thanks for the replies. This sort of thing is frustrating and hard
>> to diagnose at a distance, and any help is appreciated.
>>
>> Here is some more background:
>>
>> We have 3 9.2.4 databases using the following setup:
>
> The subject says 9.2.3. Are you sure you're running 9.2.4 on all the
> servers? There was a fix to a bug related to starting a standby server from
> a filesystem snapshot. I don't think it was quite the case you have, but
> pretty close.

So this is delightfully embarrassing. I just went back to double check, and:

- primary box is 9.2.3
- standby is 9.2.4
- testing is 9.2.4

I guess that alone could possibly explain it?

Thanks,

Bene
Re: [HACKERS] "on existing update" construct
On Wed, May 15, 2013 at 11:44 AM, Dev Kumkar wrote:
> Hello,
>
> Is there an alternative of Sybase's "on existing update" construct in pgsql?
>
> "ON DUPLICATE KEY UPDATE" doesn't work.
>
> Thanks in advance!

No, you'll have to either handle this in the application or use a stored procedure at this time. The omission of such a construct from psql's "\h" command and the manual is not in error.
Re: [HACKERS] counting algorithm for incremental matview maintenance
Kevin,

It's fairly common for matviews to be constructed such that updates to them are strictly appends. For example, a matview which has a daily summary would just get appended to each day, and existing rows would not change barring a major historical database cleanup.

It seems like we could ... and ought to ... optimize for this pattern somehow for incremental updates. That is, if the user knows that we're going to be only appending new rows and not modifying any old ones, s/he ought to be able to tell the database that somehow and avoid the overhead of checking.

While the overhead of checking a count wouldn't be that high for a few hundred rows, I've dealt with matviews which were thousands to millions of rows.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Parallel Sort
On Wed, May 15, 2013 at 11:32 AM, Peter Geoghegan wrote:
> I think that this effort could justify itself independently of any
> attempt to introduce parallelism to in-memory sorting. I abandoned a
> patch to introduce timsort to Postgres, because I knew that there was
> no principled way to reap the benefits.

Just for the record, I attach a patch that introduces a timsort_arg function as a drop-in replacement for quicksort_arg (including replacing all of the specializations that went into 9.2). It has been rebased against master. For what it's worth, if anyone wanted to pick this up, that would be fine with me.

Don't be fooled by the superficial regression test failures. The tests in question are subtly wrong, because they rely on a certain ordering that isn't explicitly requested. Timsort is stable, whereas quicksort generally isn't stable (our implementation certainly isn't).

--
Peter Geoghegan

Attachment: timsort.2013_05_15.patch.gz (gzip compressed data)
Re: [HACKERS] streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)
On 15.05.2013 15:42, David Powers wrote:
> First, thanks for the replies. This sort of thing is frustrating and hard to diagnose at a distance, and any help is appreciated.
>
> Here is some more background:
>
> We have 3 9.2.4 databases using the following setup:

The subject says 9.2.3. Are you sure you're running 9.2.4 on all the servers? There was a fix to a bug related to starting a standby server from a filesystem snapshot. I don't think it was quite the case you have, but pretty close.

> - A primary box
> - A standby box running as a hot streaming replica from the primary
> - A testing box restored nightly from a static backup
>
> As noted, the static backup is taken off of the standby by taking an LVM snapshot of the database filesystem and rsyncing. I don't think it's a likely problem but the rsync leverages the previous backup (using --link-dest) to make the rsync faster and the resulting backup smaller.

I guess that should work, as long as you make a full copy when you restore the backup, and never start a postgres server directly against a backup that contains the hardlinks.

> We've been using the same system for quite some time, but previously (~ 1 month ago) had been taking the backup off of the primary (still using the LVM snapshot). The replication is a recent addition, and a very helpful one. LVM snapshots aren't lightweight in the face of writes and in some circumstances a long running rsync would spike the IO load on the production box.
>
> Results of some additional tests: After the user noticed that the test restore showed the original problem we ran `vacuum analyze` on all three testing databases thinking that it had a good chance of quickly touching most of the underlying files. That gave us errors on two of the testing restores similar to:
>
> ERROR: invalid page header in block 5427 of relation base/16417/199732075

Huh, that's surprising. You have to be quite unlucky to end up with a corrupt page header, even if there's something wrong with the underlying storage or backup routines. Could you grab a copy of that block? Something like:

    dd skip=5427 bs=8192 count=1 if=base/16417/199732075 of=corrupt-block

I'm especially curious what the first 20 bytes or so look like. I'm guessing that it's all zeros, while some later part of the page contains data. I.e. the page was torn at some point in the process.

> I'm also running the vacuum analyze on the production machines to double check that the base databases don't have a subtle corruption that simply hasn't been noticed. They run with normal autovacuum settings, so I suspect that they are fine/this won't show anything because we should have seen this from the autovacuum daemon before.

Vacuum analyze doesn't always scan the whole table with default settings, so that's no proof that there are no missing pages. "set vacuum_freeze_table_age=0; vacuum <table>" will force a full scan of the table.

> I'm happy to share the scripts we use for the backup/restore process if the information above isn't enough, as well as the logs - though the postgres logs don't seem to contain much of interest (the database system doesn't really get involved).

Yeah, please share the scripts. What you've described seems correct to me, but maybe there's something I'm missing.

> I also have the rsyncs of the failed snapshots available and could restore them for testing purposes. It's also easy to look in them (they are just saved as normal directories on a big SAN) if I know what to look for.

Great. Not sure what to look at right now, but good that you have kept the evidence.
- Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
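For anyone following along, the inspection Heikki suggests amounts to something like this sketch (the relfilenode and block number are taken from the error message above; the od invocation is illustrative):

    # Dump the reportedly corrupt 8K block, then hex-dump its beginning;
    # the page header lives in the first couple dozen bytes.
    dd skip=5427 bs=8192 count=1 if=base/16417/199732075 of=corrupt-block
    od -A d -t x1 corrupt-block | head -4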
Re: [HACKERS] Parallel Sort
On 2013-05-13 10:28:59 -0400, Noah Misch wrote: > Each worker needs to make SnapshotNow visibility decisions coherent with the > master. For sorting, this allows us to look up comparison functions, even > when the current transaction created or modified those functions. This will > also be an essential building block for any parallelism project that consults > user tables. Implementing this means copying the subtransaction stack and the > combocid hash to each worker. For the sake of completeness, we should also > copy the global MVCC snapshot data (sorting probably won't care). It also > means forbidding, while a parallel task is in flight, operations that affect > the transaction state: Btw, if you assume you can simply copy a snapshot from the normal backend to the worker backend to make visibility decisions in the general case: You're wrong. Unfortunately you need in-memory state to make sense of combocids... Not impossible to solve, but you should be aware of the issue. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
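For readers who haven't bumped into this: when a tuple is created and deleted in the same transaction, its header stores a single "combo" command id, and the (cmin, cmax) pair it encodes lives only in the owning backend's memory - conceptually something like the sketch below (illustrative names, not the real combocid.c definitions):

    /*
     * Illustrative sketch only -- not the actual combocid.c structures.
     * A combo cid stored in a tuple header is an index into an array
     * like this one, which is backend-local: copying a snapshot to a
     * worker copies neither this array nor the hash used to build it,
     * so the worker cannot recover the real (cmin, cmax) pair on its own.
     */
    typedef struct ComboCidPair
    {
        uint32      cmin;   /* command id that inserted the tuple */
        uint32      cmax;   /* command id that deleted it */
    } ComboCidPair;

    static ComboCidPair *comboCids;   /* grows as combo cids are assigned */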
[HACKERS] "on existing update" construct
Hello, Is there an alternative to the Sybase "on existing update" construct in pgsql? "ON DUPLICATE KEY UPDATE" doesn't work. Thanks in advance! Regards - Dev
Re: [HACKERS] commit fest schedule for 9.4
On 05/15/2013 11:05 AM, Andrew Dunstan wrote: > > On 05/15/2013 02:00 PM, Josh Berkus wrote: >>> Obviously we need a meta-manager who makes sure we have managers, and is >>> able to notice that a manager is MIA and needs replaced (or at least >>> backed-up). Actually, on a more serious basis, we could simply assign a backup CFM (CFM-b) for each CF. The backup CFM would jump in if nobody has heard from the CFM for several days. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"
On Wednesday, May 15, 2013 at 16:43:17, Andrew Dunstan wrote: > On 05/15/2013 10:05 AM, Tom Lane wrote: > > Peter Eisentraut writes: > >> That said, I'm obviously outnumbered here. What about the following > >> compromise: Use the configure-selected install program inside > >> PostgreSQL (which we can test easily), and use install-sh under > >> USE_PGXS? Admittedly, the make install time of extensions is probably > >> not an issue. > > > > That works for me, since as you say we can easily fix any such bugs > > in the core code. The scary thing about this for extension authors > > is that they may very well see no bug in their own testing, only to > > have their packages fall over in the wild. We shouldn't make each > > author who's copied that code rediscover the problem for themselves > > that expensively. > > +1, although I will be renovating the Makefiles for all my extensions > along the lines of my previous email. pgxs.mk has some old rules to replace $libdir (and maybe a few others). Maybe we can try to find what makes sense for most extension authors and add rules/targets to pgxs.mk to reduce the useless copy/paste in extension Makefiles. So what's desirable? -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation
Re: [HACKERS] Parallel Sort
On Mon, May 13, 2013 at 7:28 AM, Noah Misch wrote: > We should decide whether to actually sort in parallel based on the comparator > cost and the data size. The system currently has no information on comparator > cost: bt*cmp (and indeed almost all built-in functions) all have procost=1, > but bttextcmp is at least 1000x slower than btint4cmp. I think that this effort could justify itself independently of any attempt to introduce parallelism to in-memory sorting. I abandoned a patch to introduce timsort to Postgres, because I knew that there was no principled way to reap the benefits. Unless you introduce parallelism, it's probably going to be virtually impossible to come up with an algorithm that does in-memory sorting faster (in terms of the amount of system time taken) than a highly optimized quicksort when sorting integers. But sorting types with really expensive comparators (even considerably more expensive than bttextcmp) for pass-by-reference Datums (where the memory locality advantage of quicksort doesn't really help so much) makes timsort much more compelling. That's why it's used for Python lists. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
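The cost-model gap Noah mentions is visible directly in the catalogs:

    -- Both comparators advertise the same cost, despite the roughly
    -- 1000x difference in real execution cost noted above.
    SELECT proname, procost
    FROM pg_proc
    WHERE proname IN ('btint4cmp', 'bttextcmp');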
Re: [HACKERS] counting algorithm for incremental matview maintenance
Merlin Moncure wrote: > Kevin Grittner wrote: >> Merlin Moncure wrote: >> >>> #1 issue I have with current matview functionality is locking. >>> currently refresh takes out an access exclusive lock. so, >>> question is, do you think your proposal will be such that it >>> will no longer require taking out full lock for refresh >>> purposes (either incremental or otherwise)? >> >> The right thread for *that* question is "Differential >> (transactional) REFRESH"; however, I might as well say here that >> I don't think we want to get rid of the (faster) version that >> just replaces the current heap when we add the (slower) option >> to REFRESH it transactionally. > > sorry, didn't notice that thread. agreed, that seems good > candidate for user input to refresh command to manage the > tradeoff. > > well, do you expect the application of differential refresh to be > automatic? I expect considerable bikeshedding on this, but my preference would be to allow syntax for specifying which technique is desired on the REFRESH command, and in the absence of specification I would prefer that it default to the current technique for a matview which is not yet populated and the transactional technique for a populated matview. We could associate some property with the matview for default REFRESH technique, but I don't know whether that's worth the trouble. > lockless + differential refresh would be game changer in terms of > how I build up data for analytics. Yeah, it's definitely something I would have liked to have in the initial release, but I tried to keep the scope very limited given how little time there was left in the release cycle when I got a chance to start on it. Adding this seemed to be just a little too much. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
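To make the bikeshedding concrete, such syntax might look something like this - entirely hypothetical option names, nothing implemented:

    -- Hypothetical syntax, for illustration only.
    REFRESH MATERIALIZED VIEW mv;                    -- default chosen by matview state
    REFRESH MATERIALIZED VIEW mv WITH REPLACE;       -- fast heap swap, strong lock
    REFRESH MATERIALIZED VIEW mv WITH DIFFERENTIAL;  -- slower, transactional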
Re: [HACKERS] Parallel Sort
On Wed, May 15, 2013 at 12:26:52PM +0530, Amit Kapila wrote: > On Monday, May 13, 2013 7:59 PM Noah Misch wrote: > > We can allocate a small amount of permanent shared memory for > > coordination > > among a group of processes, but sorting will benefit from a region as > > large as > > maintenance_work_mem. Expect on-demand memory sharing. > > Will the shared memory used for coordinating tuples between master and > worker be fixed or varying depending on size of tuples to be sorted or > number of workers associated. > If it is varying, then it can sometimes encounter situation where required > memory is not available and in that case it has to revert to serial sorting > How will the parallel sorting tasks be divided and assigned to each worker? I haven't selected answers for those details, yet. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Sort
On Wed, May 15, 2013 at 08:12:34AM +0900, Michael Paquier wrote: > The concept of clause parallelism for backend worker is close to the > concept of clause shippability introduced in Postgres-XC. In the case of > XC, the equivalent of the master backend is a backend located on a node > called Coordinator that merges and organizes results fetched in parallel > from remote nodes where data scans occur (on nodes called Datanodes). The > backends used for tuple scans across Datanodes share the same data > visibility as they use the same snapshot and transaction ID as the backend > on Coordinator. This is different from the parallelism as there is no idea > of snapshot import to worker backends. Worker backends would indeed share snapshot and XID. > However, the code in XC planner used for clause shippability evaluation is > definitely worth looking at just considering the many similarities it > shares with parallelism when evaluating if a given clause can be executed > on a worker backend or not. It would be a waste to implement twice the same > thing is there is code already available. Agreed. Local parallel query is very similar to distributed query; the specific IPC cost multipliers differ, but that's about it. I hope we can benefit from XC's experience in this area. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] commit fest schedule for 9.4
On 05/15/2013 02:00 PM, Josh Berkus wrote: Obviously we need a meta-manager who makes sure we have managers, and is able to notice that a manager is MIA and needs replaced (or at least backed-up). And then a meta-meta-manager to make sure that the meta-manager is meta-managing. And an Inspector General. Anyone have Danny Kaye's phone number? Or Gogol's? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Sort
On Tue, May 14, 2013 at 12:15:24PM -0300, Claudio Freire wrote: > You know what would be a low-hanging fruit that I've been thinking > would benefit many of my own queries? > > "Parallel" sequential scan nodes. Even if there's no real parallelism > involved, when a query has to scan the same table at multiple nodes, > if it's big, it would be worth parallelizing the scans to transform > them into synchro scans. > > I have absolutely no idea how this would work easily without forked > workers, because the scans might be buried in more complex execution > trees. But still, it's worth considering, that parallelizing may > benefit more than core usage. > > If execution nodes could be paused at arbitrary points, a "parallel > scan" node could pause one branch that has consumed the circular > buffer, letting another branches consume their part, and thus > "parallelizing" branch execution. But this would be perhaps more > complex than simply forking. Execution nodes do pause between every output tuple, at least nominally. Still, given the architecture of our executor and the planner work to implement such a thing, I would not classify it as low-hanging fruit. It would primarily apply to a plan with independent sequential scans of the same large (relative to total memory) relation. I'm sure that comes up, but it doesn't strike me as typical. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
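An invented example of the plan shape in question - one query that can end up with two independent sequential scans of the same large table:

    -- Illustrative only: both join inputs may be planned as separate
    -- seq scans of "events"; synchronizing them would cut the I/O
    -- roughly in half without any real parallel execution.
    SELECT cur.id
    FROM events cur
    JOIN events prev ON prev.id = cur.id - 1
    WHERE prev.status <> cur.status;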
Re: [HACKERS] commit fest schedule for 9.4
> Obviously we need a meta-manager who makes sure we have managers, and is > able to notice that a manager is MIA and needs replaced (or at least > backed-up). And then a meta-meta-manager to make sure that the meta-manager is meta-managing. And an Inspector General. Anyone have Danny Kaye's phone number? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] commit fest schedule for 9.4
Josh Berkus wrote: > On 05/15/2013 10:30 AM, Alvaro Herrera wrote: > > Uhm. If I've been anything wrt the 9.3 schedule, happy is not it. We > > completely failed to manage it in any kind of sane way. I vote +1 for > > keeping the same commitfest schedule this year, but please let's do > > everyone a favor and make sure we get some more (stricter?) management > > of the commitfests this time. Not closing the November commitfest in > > March would be appreciated, I think. > > Well, we could actually *follow* the schedule we outlined at the > Developer Meeting last year, including the triage periods. > > I'll also say: > * we need to assign CF managers at least 2 weeks in advance of each CF * > we need to replace them if they get too busy to follow-through, > * and the last CF needs two managers. Obviously we need a meta-manager who makes sure we have managers, and is able to notice that a manager is MIA and needs replaced (or at least backed-up). -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)
First, thanks for the replies. This sort of thing is frustrating and hard to diagnose at a distance, and any help is appreciated. Here is some more background: We have 3 9.2.4 databases using the following setup: - A primary box - A standby box running as a hot streaming replica from the primary - A testing box restored nightly from a static backup As noted, the static backup is taken off of the standby by taking an LVM snapshot of the database filesystem and rsyncing. I don't think it's a likely problem but the rsync leverages the previous backup (using --link-dest) to make the rsync faster and the resulting backup smaller. Each database is ~1.5T, so this is necessary to keep static backup times reasonable. We've been using the same system for quite some time, but previously (~ 1 month ago) had been taking the backup off of the primary (still using the LVM snapshot). The replication is a recent addition, and a very helpful one. LVM snapshots aren't lightweight in the face of writes and in some circumstances a long running rsync would spike the IO load on the production box. Results of some additional tests: After the user noticed that the test restore showed the original problem we ran `vacuum analyze` on all three testing databases thinking that it had a good chance of quickly touching most of the underlying files. That gave us errors on two of the testing restores similar to: ERROR: invalid page header in block 5427 of relation base/16417/199732075 In the meantime I modified the static backup procedure to shut standby completely down before taking the LVM snapshot and am trying a restore using that snapshot now. I'll test that using the same vacuum analyze test, and if that passes, a full vacuum. I'm also running the vacuum analyze on the production machines to double check that the base databases don't have a subtle corruption that simply hasn't been noticed. They run with normal autovacuum settings, so I suspect that they are fine/this won't show anything because we should have seen this from the autovacuum daemon before. I'm happy to share the scripts we use for the backup/restore process if the information above isn't enough, as well as the logs - though the postgres logs don't seem to contain much of interest (the database system doesn't really get involved). I also have the rsyncs of the failed snapshots available and could restore them for testing purposes. It's also easy to look in them (they are just saved as normal directories on a big SAN) if I know what to look for. -David On Wed, May 15, 2013 at 2:24 AM, Heikki Linnakangas wrote: > On 14.05.2013 23:47, Benedikt Grundmann wrote: > >> The only thing that is *new* is that we took the snapshot from the >> >> streaming replica. So again my best guess as of now is that if the >> database crashes while it is in streaming standby a invalid disk state can >> result during during the following startup (in rare and as of now unclear >> circumstances). >> > > A bug is certainly possible. There isn't much detail here to debug with, > I'm afraid. Can you share the full logs on all three systems? I'm > particularly interest > > > You seem to be quite convinced that it must be LVM can you elaborate why? >> > > Well, you said that there was a file in the original filesystem, but not > in the snapshot. If you didn't do anything in between, then surely the > snapshot is broken, if it skipped a file. Or was the file created in the > original filesystem after the snapshot was taken? 
> You probably left out some crucial details on how exactly the snapshot and rsync are performed. Can you share the scripts you're using? Can you reproduce this problem with a new snapshot? Do you still have the failed snapshot unchanged? - Heikki
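Until the actual scripts are posted, the procedure described in this thread corresponds roughly to the sketch below; every path, volume name, and size is a placeholder, not the real script:

    #!/bin/sh
    # Placeholder sketch of the described backup procedure.
    TODAY=/backups/$(date +%F)
    PREV=/backups/$(date -d yesterday +%F)

    # Snapshot the standby's database volume, then mount it read-only.
    lvcreate --snapshot --size 50G --name pgsnap /dev/vg0/pgdata
    mount -o ro /dev/vg0/pgsnap /mnt/pgsnap

    # --link-dest hardlinks files unchanged since the previous backup,
    # which is the detail discussed above.
    rsync -a --link-dest="$PREV" /mnt/pgsnap/ "$TODAY"/

    umount /mnt/pgsnap
    lvremove -f /dev/vg0/pgsnap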
Re: [HACKERS] commit fest schedule for 9.4
On 05/15/2013 10:30 AM, Alvaro Herrera wrote: > Uhm. If I've been anything wrt the 9.3 schedule, happy is not it. We > completely failed to manage it in any kind of sane way. I vote +1 for > keeping the same commitfest schedule this year, but please let's do > everyone a favor and make sure we get some more (stricter?) management > of the commitfests this time. Not closing the November commitfest in > March would be appreciated, I think. Well, we could actually *follow* the schedule we outlined at the Developer Meeting last year, including the triage periods. I'll also say:
* we need to assign CF managers at least 2 weeks in advance of each CF
* we need to replace them if they get too busy to follow-through
* and the last CF needs two managers
We went pretty far off schedule this year because we didn't manage the process. Especially given the rudimentary tools we have to work with, we can't afford to do that. I'd be happy to manage the first CF on June 15th, if nobody else wants to. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] commit fest schedule for 9.4
Heikki Linnakangas wrote: > On 14.05.2013 05:34, Peter Eisentraut wrote: > >In the last two years, the first commit fest started in June, which is > >about a month from now. If we are going to do that again, we should > >clarify that as soon as possible. And if we are not, then we should > >also clarify that, because some people are probably expecting that we > >are. > > > >So, any thoughts on the commit fest schedule, any need for deviations, > >or same old? > > Same old sounds good to me. We released 9.3beta1 at the same time > this year that we released 9.2beta1 last year. I've been quite happy > with the 9.3 schedule. There are a few items on the 9.3 Open Items > list, but I would expect them to be resolved in the next week or > two, so that barring any major new issues, we could release 9.3beta2 > at the end of May, like we released 9.2beta2 last year. Uhm. If I've been anything wrt the 9.3 schedule, happy is not it. We completely failed to manage it in any kind of sane way. I vote +1 for keeping the same commitfest schedule this year, but please let's do everyone a favor and make sure we get some more (stricter?) management of the commitfests this time. Not closing the November commitfest in March would be appreciated, I think. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] counting algorithm for incremental matview maintenance
On Wed, May 15, 2013 at 11:33 AM, Kevin Grittner wrote: > Merlin Moncure wrote: > >> #1 issue I have with current matview functionality is locking. >> currently refresh takes out an access exclusive lock. so, >> question is, do you think your proposal will be such that it will >> no longer require taking out full lock for refresh purposes >> (either incremental or otherwise)? > > The right thread for *that* question is "Differential > (transactional) REFRESH"; however, I might as well say here that I > don't think we want to get rid of the (faster) version that just > replaces the current heap when we add the (slower) option to > REFRESH it transactionally. sorry, didn't notice that thread. agreed, that seems good candidate for user input to refresh command to manage the tradeoff. well, do you expect the application of differential refresh to be automatic? lockless + differential refresh would be game changer in terms of how I build up data for analytics. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposed TODO: add support for "any" for PL/PythonU and PL/Perl
On 14/05/2013 20:24, Josh Berkus wrote: > Hackers, > > I'd like to add the following todo items to the TODO list: > > PL/Python: > * add support for anyelement and anyarray to PL/Python > * add support for VARIADIC "ANY" to PL/Python > > PL/Perl: > * add support for anyelement and anyarray to PL/Perl > * add support for VARIADIC "ANY" to PL/Perl > > The reason for this is that both Python and Perl are loosely typed > languages, I don't know about Perl, but, to be precise, Python is not a loosely typed language. It is a strongly typed dynamic language. > [...] Regards Manlio Perillo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
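The distinction is easy to demonstrate in plain Python:

    # Strong typing: values never coerce implicitly across types.
    "1" + 1            # raises TypeError

    # Dynamic typing: a name may be rebound to a value of another type.
    x = 1
    x = "one"          # legal; the type belongs to the value, not the name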
Re: [HACKERS] counting algorithm for incremental matview maintenance
Merlin Moncure wrote: > #1 issue I have with current matview functionality is locking. > currently refresh takes out an access exclusive lock. so, > question is, do you think your proposal will be such that it will > no longer require taking out full lock for refresh purposes > (either incremental or otherwise)? The right thread for *that* question is "Differential (transactional) REFRESH"; however, I might as well say here that I don't think we want to get rid of the (faster) version that just replaces the current heap when we add the (slower) option to REFRESH it transactionally. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] counting algorithm for incremental matview maintenance
On Tue, May 14, 2013 at 2:52 PM, Kevin Grittner wrote: > In surveying the literature on $subject, I find that most of the > theoretical work related to how to incrementally update > materialized views based on the matview declaration was published > between 1988 and 1993. The best paper I have been able to find on > the topic was published in ACM SIGMOD in 1993[1], and covers two > algorithms: counting and DRed. The former should be very fast for > non-recursive views, but not very efficient for recursive views. > The latter algorithm is the other way around -- it looks like it > will do well with recursive views but generally be slower for > non-recursive views. > > It does not seem feasible to me to implement both techniques in a > single one-year PostgreSQL release. In fact, if we have trouble > getting everyone onto the same page early, we might have to settle > for trying to just get some infrastructure into place, without > anything to actually make use of it. That would be unfortunate, > since Oracle added incremental maintenance of matviews to their > existing feature in 1999, and have been improving it regularly > since then. Many other products also have mature implementations > of this, and there seems to be a lot of demand for it in > PostgreSQL. In the best of circumstances it will take years for us > to catch up on this front. #1 issue I have with current matview functionality is locking. currently refresh takes out an access exclusive lock. so, question is, do you think your proposal will be such that it will no longer require taking out full lock for refresh purposes (either incremental or otherwise)? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"
On Wed, May 15, 2013 at 09:51:15AM -0400, Peter Eisentraut wrote: > On 5/14/13 10:38 AM, Cédric Villemain wrote: > > If I follow your example, then I would rewrite http://manager.pgxn.org/howto > > Oh that's where that is coming from? Well that example has all kinds of > problems. Would you be so kind as to point out same, or better still, to write up what you think of as a better example intended for the same audience? I'm sure the PGXN people would be delighted to put something better up there. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logging of PAM Authentication Failure
On Wed, May 15, 2013 at 11:04 PM, Kyotaro HORIGUCHI wrote:
>> Is it right that it is only in the case a password prompt is needed that a new connection is created after dropping the just-failed connection?
>
> It's quite doubtful. :-p The sequence seems fragile to say the least. Inserting password request state into the client-side state machine looks quite reasonable.

Looking at the current code (well, pseudo-code!):

    do {
        new_pass = false;
        if (CONNECTION_BAD && NEEDS_PASSWORD && password == NULL &&
            !FORCE_NO_PASSWORD)
        {
            PQfinish();
            password = simple_prompt();
            new_pass = true;
        }
    } while (new_pass);

So, it looks like the loop will be repeated only if an authentication method requiring the user to enter a password is encountered in PQconnectPoll(), which are AUTH_REQ_MD5 & AUTH_REQ_PASSWORD. As you can see in the following code fragment from pg_fe_sendauth(), which apparently sets conn->password_needed:

    case AUTH_REQ_MD5:
    case AUTH_REQ_PASSWORD:
        conn->password_needed = true;
        if (conn->pgpass == NULL || conn->pgpass[0] == '\0')
        {
            printfPQExpBuffer(&conn->errorMessage,
                              PQnoPasswordSupplied);
            return STATUS_ERROR;
        }
        if (pg_password_sendauth(conn, conn->pgpass, areq) != STATUS_OK)
        {
            printfPQExpBuffer(&conn->errorMessage,
                "fe_sendauth: error sending password authentication\n");
            return STATUS_ERROR;
        }
        break;

this seems to be the only code path that causes conn->password_needed to be set to true. So, these seem to be the only cases when a prompt will be provided and new_pass would become true, causing the drop-and-reconnect by repetition of the loop. Am I missing some other case when this might happen?

>> I created a patch which enables it to use the existing connection in such a case (unlike what we currently do). It modifies connectDBComplete() and PQconnectPoll() to also include states pertaining to password being accepted from the user. That is, the state machine in PQconnectPoll() is further extended to include a connection state called CONNECTION_ASKING_PASSWORD which is entered when server sends AUTH_REQ_MD5 or AUTH_REQ_PASSWORD auth requests.
>
> Great! The new client state seems to be effective also for MD5. But it seems to break existing libpq clients doing the same authentication sequence as current psql. Some means would be necessary to switch the behavior when a password is not previously provided but needed by the server, or to make the first half of the connection sequence compatible with the current sequence - in other words - it should be that when the user finds status is CONNECTION_BAD and PQconnectionNeedsPassword() == true, the user can throw away the connection and make a new connection providing the password, and can also send the password on the existing connection.

The first half of the connection sequence remains the same except for one change: in PQconnectPoll(), when in case CONNECTION_AWAITING_RESPONSE, if the server sends an md5/password authentication request, it returns PGRES_POLLING_WAITING_PASSWORD, which, back in connectDBComplete(), sets conn->password = true and conn->status = CONNECTION_ASKING_PASSWORD. Back in main(), this causes a password prompt and then the second half of the connection sequence. Hence pg_fe_sendauth() is not called in this first half unless it's a different authentication method than md5 and password.

> the old style
>
> | db = PQconnect();
> | if (PQstatus(db) == CONNECTION_BAD && PQconnectionNeedsPassword(db))
> | {
> |   PQfinish(db);
> |   value[..] = password = ;
> |   db = PQconnect();
> |   if (PQstatus(db) == CONNECTION_BAD)
> |
>
> and the new style
>
> | db = PQconnect();
> | if (PQconnectionNeedsPassword(db))
> |   PQsendPassword(db, password);
> | if (PQstatus(db) == CONNECTION_BAD)
> |
>
> should be standing together.

I see this accounts for CONNECTION_BAD (if any) in the first half. But this CONNECTION_BAD has other reasons than conn->password_needed as far as I can imagine, since conn->password_needed would only be set in connectDBComplete() in PGRES_POLLING_WAITING_PASSWORD. So, this CONNECTION_BAD would require some different processing. Thoughts?

> Where PQsendPassword is a combined function of PQcopyPassword and PQcontinuedbConnect. If the only purpose of these functions is sending the password, then do they need to be separate functions?
>
> What do you think about the compatibility and the simpler API?

I think one function PQsendPassword(PGconn*, char *) would be sufficient, which would contain the code of both PQcopyPassword() and PQcontinuedbConnect(). I would complete the connection sequence by running its second half.

>> The backend waits for the password until authentication timeout happens in which case the client can not send the password anymore since the backend has exited due to authentication timeout.
[HACKERS] Heap truncation without AccessExclusiveLock (9.4)
Truncating a heap at the end of vacuum, to release unused space back to the OS, currently requires taking an AccessExclusiveLock. Although it's only held for a short duration, it can be enough to cause a hiccup in query processing while it's held. Also, if there is a continuous stream of queries on the table, autovacuum never succeeds in acquiring the lock, and thus the table never gets truncated. I'd like to eliminate the need for AccessExclusiveLock while truncating.

Design
------

In shared memory, keep two watermarks: a "soft" truncation watermark, and a "hard" truncation watermark. If there is no truncation in progress, the values are not set and everything works like today.

The soft watermark is the relation size (ie. number of pages) that vacuum wants to truncate the relation to. Backends can read pages above the soft watermark normally, but should refrain from inserting new tuples there. However, it's OK to update a page above the soft watermark, including adding new tuples, if the page is not completely empty (vacuum will check and not truncate away non-empty pages). If a backend nevertheless has to insert a new tuple to an empty page above the soft watermark, for example if there is no more free space in any lower-numbered pages, it must grab the extension lock, and update the soft watermark while holding it.

The hard watermark is the point above which there is guaranteed to be no tuples. A backend must not try to read or write any pages above the hard watermark - it should be thought of as the end of file for all practical purposes. If a backend needs to write above the hard watermark, ie. to extend the relation, it must first grab the extension lock, and raise the hard watermark.

The hard watermark is always >= the soft watermark.

Shared memory space is limited, but we only need the watermarks for any in-progress truncations. Let's keep them in shared memory, in a small fixed-size array. That limits the number of concurrent truncations that can be in-progress, but that should be ok. To not slow down common backend operations, the values (or lack thereof) are cached in relcache. To sync the relcache when the values change, there will be a new shared cache invalidation event to force backends to refresh the cached watermark values. A backend (vacuum) can ensure that all backends see the new value by first updating the value in shared memory, sending the sinval message, and waiting until everyone has received it.

With the watermarks, truncation works like this:

1. Set soft watermark to the point where we think we can truncate the relation. Wait until everyone sees it (send sinval message, wait).
2. Scan the pages to verify they are still empty.
3. Grab extension lock. Set hard watermark to current soft watermark (a backend might have inserted a tuple and raised the soft watermark while we were scanning). Release lock.
4. Wait until everyone sees the new hard watermark.
5. Grab extension lock.
6. Check (or wait) that there are no pinned buffers above the current hard watermark. (a backend might have a scan in progress that started before any of this, still holding a buffer pinned, even though it's empty.)
7. Truncate relation to the current hard watermark.
8. Release extension lock.

If a backend inserts a new tuple before step 2, the vacuum scan will see it. If it's inserted after step 2, the backend's cached soft watermark is already up-to-date, and thus the backend will update the soft watermark before the insert. Thus, after vacuum has finished the scan at step 2, all pages above the current soft watermark must still be empty.

Implementation details
----------------------

There are three kinds of access to a heap page:

A) As a target for new tuple.
B) Following an index pointer, ctid or similar.
C) A sequential scan (and bitmap heap scan?)

To refrain from inserting new tuples to non-empty pages above the soft watermark (A), RelationGetBufferForTuple() is modified to check the soft watermark (and raise it if necessary).

An index scan (B) should never try to read beyond the hard watermark, because there are no tuples above it, and thus there should be no pointers to pages above it either.

A sequential scan (C) must refrain from reading beyond the hard watermark. This can be implemented by always checking the (cached) hard watermark value before stepping to the next page.

Truncation during hot standby is a lot simpler: set soft and hard watermarks to the truncation point, wait until everyone sees the new values, and truncate the relation.

Does anyone see a flaw in this?

- Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
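The shared-memory bookkeeping described above could conceptually look like the sketch below; all names are invented for illustration and are not from any actual patch:

    /* Conceptual sketch only; invented names. A small fixed-size array
     * of these would live in shared memory, one slot per in-progress
     * truncation, with the values cached in each backend's relcache and
     * refreshed via the new sinval message described above. */
    typedef struct TruncationWatermark
    {
        Oid         relid;      /* relation with a truncation in progress */
        BlockNumber softLimit;  /* no new tuples at/above this block */
        BlockNumber hardLimit;  /* treat as EOF; always >= softLimit */
    } TruncationWatermark;

    #define MAX_INPROGRESS_TRUNCATIONS 8    /* arbitrary illustrative cap */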
Re: [HACKERS] postgres_fdw foreign tables and serial columns
* Tom Lane (t...@sss.pgh.pa.us) wrote: > The generic issue there is whether we can allow column defaults to be > evaluated on the remote end. The original postgres_fdw submission tried > to support that, but it had enough bugs and logical inconsistencies that > I ended up ripping that out before commit. There's a good deal of > discussion about that in the archives (in January or February IIRC). Yeah, I watched much of that go by- just couldn't follow it entirely at the time. Still.. > However, when and if we do allow that to work, I'd still say that it's > reasonable for "SERIAL" to mean local creation of the default value. I agree with this; all I was trying to get at is that we shouldn't close off any doors to eventually providing a way for defaults to be pushed to the remote. > If you want a remotely-supplied default to work, you'd not put a DEFAULT > clause into the local definition; and SERIAL is essentially a shorthand > for a DEFAULT clause. Agreed. > Yeah, I think the possibility of such a workaround was one of the > reasons we decided it was okay to support only locally-computed > defaults for now. Right, and, of course, a simple trigger on the remote table would probably work just fine too. Thanks, Stephen
Re: [HACKERS] postgres_fdw foreign tables and serial columns
Stephen Frost writes: > I'm alright with allowing it and making it the 'default', but I can > certainly see use-cases for having a sequence on the remote side which > is used for new values for that table and I'd suggest that we figure out > a way to support that. The generic issue there is whether we can allow column defaults to be evaluated on the remote end. The original postgres_fdw submission tried to support that, but it had enough bugs and logical inconsistencies that I ended up ripping that out before commit. There's a good deal of discussion about that in the archives (in January or February IIRC). However, when and if we do allow that to work, I'd still say that it's reasonable for "SERIAL" to mean local creation of the default value. If you want a remotely-supplied default to work, you'd not put a DEFAULT clause into the local definition; and SERIAL is essentially a shorthand for a DEFAULT clause. > At first blush, with 'simple' writable views, perhaps that can just be a > view definition on the remote side which doesn't include that column and > therefore that column won't be sent to the remote side explicitly but, > but the view, running on the remote, would turn around and pick up the > default value for any fields which aren't in the view definition when > inserting into the table underneath. Yeah, I think the possibility of such a workaround was one of the reasons we decided it was okay to support only locally-computed defaults for now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
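Concretely, if the prohibition were removed, the SERIAL shorthand on a foreign table would presumably expand along these lines (a sketch reusing the names from this thread):

    -- Roughly what SERIAL stands for: the sequence and the DEFAULT are
    -- local, and only the computed value travels to the remote side.
    CREATE SEQUENCE local_foo_id_seq;
    CREATE FOREIGN TABLE local_foo (
        id integer NOT NULL DEFAULT nextval('local_foo_id_seq')
    ) SERVER test_server OPTIONS (table_name 'foo');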
Re: [HACKERS] postgres_fdw foreign tables and serial columns
* Andrew Dunstan (and...@dunslane.net) wrote: > On 05/15/2013 10:27 AM, Tom Lane wrote: > >You can certainly argue that it might be silly to have a local sequence > >generating the default value for insertions into a remote table; but > >on the other hand, it might not be silly, depending on usage scenario. > >I don't think the database should be enforcing a policy choice like > >that. > > Presumably if it's not appropriate they won't define it as a serial > column. If Postgres is the primary source of the data rather than > the foreign handler then it makes plenty of sense to have a serial > column, I should think. > > So +1 for allowing it. I'm alright with allowing it and making it the 'default', but I can certainly see use-cases for having a sequence on the remote side which is used for new values for that table and I'd suggest that we figure out a way to support that. At first blush, with 'simple' writable views, perhaps that can just be a view definition on the remote side which doesn't include that column and therefore that column won't be sent to the remote side explicitly, but the view, running on the remote, would turn around and pick up the default value for any fields which aren't in the view definition when inserting into the table underneath. I'm not suggesting that as something we do for the user, but perhaps we could include a note along these lines in the docs for users who need the default evaluated on the remote? And come up with a better way to handle it in the future (perhaps an option in the foreign table definition?). This, of course, presumes that the solution I've described actually works, iow, not tested. :) Thanks, Stephen
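That (untested) workaround might look like the following sketch, with all object names invented:

    -- On the remote server: hide the serial column behind a simple view;
    -- inserts through the view never mention "id", so the base table's
    -- own default fills it in remotely.
    CREATE VIEW foo_noid AS SELECT payload FROM foo;

    -- Locally: point the foreign table at the view instead of the table.
    CREATE FOREIGN TABLE local_foo (payload text)
        SERVER test_server OPTIONS (table_name 'foo_noid');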
Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"
On 05/15/2013 10:05 AM, Tom Lane wrote: Peter Eisentraut writes: That said, I'm obviously outnumbered here. What about the following compromise: Use the configure-selected install program inside PostgreSQL (which we can test easily), and use install-sh under USE_PGXS? Admittedly, the make install time of extensions is probably not an issue. That works for me, since as you say we can easily fix any such bugs in the core code. The scary thing about this for extension authors is that they may very well see no bug in their own testing, only to have their packages fall over in the wild. We shouldn't make each author who's copied that code rediscover the problem for themselves that expensively. +1, although I will be renovating the Makefiles for all my extensions along the lines of my previous email. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw foreign tables and serial columns
On 05/15/2013 10:27 AM, Tom Lane wrote: You can certainly argue that it might be silly to have a local sequence generating the default value for insertions into a remote table; but on the other hand, it might not be silly, depending on usage scenario. I don't think the database should be enforcing a policy choice like that. Presumably if it's not appropriate they won't define it as a serial column. If Postgres is the primary source of the data rather than the foreign handler then it makes plenty of sense to have a serial column, I should think. So +1 for allowing it. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw foreign tables and serial columns
Albe Laurenz writes: > Tom Lane wrote: >> "Nicholson, Brad (Toronto, ON, CA)" writes: >>> [ this error message sucks: ] >>> test=# create foreign table local_foo (id serial) server test_server >>> options (table_name 'foo'); >>> ERROR: referenced relation "local_foo" is not a table >> Yeah, I'd noticed that myself. We could probably tweak the code to >> issue a different error message and/or add a HINT if the serial's >> parent relation is a foreign table. I'm not exactly sure what it >> should say though. Thoughts? > HINT: Serial columns can only be defined for local tables. Use "integer" or > "bigint" instead. Actually ... wait a minute. Why *don't* we allow SERIAL columns in foreign tables? That made sense before, but now that we support column defaults for them, I don't see any good reason for this prohibition. The behavior, if we just remove this error check, would be that we'd create a local sequence and it would become the source of default values for insertions into the foreign table. Since we've already committed to the decision that column defaults are evaluated locally, this is perfectly consistent. You can certainly argue that it might be silly to have a local sequence generating the default value for insertions into a remote table; but on the other hand, it might not be silly, depending on usage scenario. I don't think the database should be enforcing a policy choice like that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw foreign tables and serial columns
Tom Lane wrote: > "Nicholson, Brad (Toronto, ON, CA)" writes: >> [ this error message sucks: ] >> test=# create foreign table local_foo (id serial) server test_server options >> (table_name 'foo'); >> ERROR: referenced relation "local_foo" is not a table > > Yeah, I'd noticed that myself. We could probably tweak the code to > issue a different error message and/or add a HINT if the serial's > parent relation is a foreign table. I'm not exactly sure what it > should say though. Thoughts? HINT: Serial columns can only be defined for local tables. Use "integer" or "bigint" instead. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"
On Tuesday, May 14, 2013 at 15:08:42, Andrew Dunstan wrote: > On 05/14/2013 07:59 AM, Peter Eisentraut wrote: > > On 5/14/13 4:17 AM, Marti Raudsepp wrote: > >> On Tue, May 14, 2013 at 5:27 AM, Peter Eisentraut wrote: > >>> On Tue, 2013-05-14 at 04:12 +0300, Marti Raudsepp wrote: > It's caused by this common pattern in extension makefiles: > DATA = $(wildcard sql/*--*.sql) sql/$(EXTENSION)--$(EXTVERSION).sql > >>> > >>> What is the point of this? Why have the wildcard and then the > >>> non-wildcard term? > >> > >> Because the non-wildcard file is built by the same Makefile (it's > >> copied from the sql/$(EXTENSION).sql file). If it wasn't there, a > >> "make install" from a clean checkout would miss this file. > > > > If it's built, then it should be listed in DATA_built. > > So, AIUI, leaving aside stylistic arguments about use of wildcards, one > solution could be: > > DATA_built = sql/$(EXTENSION)--$(EXTVERSION).sql > DATA = $(filter-out sql/$(EXTENSION)--$(EXTVERSION).sql, $(wildcard > sql/*--*.sql)) > > Is that right? Yes. -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation
Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"
Peter Eisentraut writes: > That said, I'm obviously outnumbered here. What about the following > compromise: Use the configure-selected install program inside > PostgreSQL (which we can test easily), and use install-sh under > USE_PGXS? Admittedly, the make install time of extensions is probably > not an issue. That works for me, since as you say we can easily fix any such bugs in the core code. The scary thing about this for extension authors is that they may very well see no bug in their own testing, only to have their packages fall over in the wild. We shouldn't make each author who's copied that code rediscover the problem for themselves that expensively. > (The affected extensions will still be buggy because users can still > substitute their own install programs. We're just hiding the issue for > a while.) I'm not following this argument. The old makefile coding explicitly set INSTALL = $(SHELL) $(top_srcdir)/config/install-sh -c The only way I can see that that would be overridden is an explicit make INSTALL=/my/random/script install and surely we cannot guarantee that things don't break when you do something like that. There are hundreds of ways you can break the build if you mess with makefile variables. So I'm not prepared to call it a bug if that doesn't work. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logging of PAM Authentication Failure
> Is it right that it is only in the case a password prompt is needed that a new connection is created after dropping the just-failed connection?

It's quite doubtful. :-p The sequence seems fragile to say the least. Inserting password request state into the client-side state machine looks quite reasonable.

> I created a patch which enables it to use the existing connection in such a case (unlike what we currently do). It modifies connectDBComplete() and PQconnectPoll() to also include states pertaining to password being accepted from the user. That is, the state machine in PQconnectPoll() is further extended to include a connection state called CONNECTION_ASKING_PASSWORD which is entered when server sends AUTH_REQ_MD5 or AUTH_REQ_PASSWORD auth requests.

Great! The new client state seems to be effective also for MD5. But it seems to break existing libpq clients doing the same authentication sequence as current psql. Some means would be necessary to switch the behavior when a password is not previously provided but needed by the server, or to make the first half of the connection sequence compatible with the current sequence - in other words - it should be that when the user finds status is CONNECTION_BAD and PQconnectionNeedsPassword() == true, the user can throw away the connection and make a new connection providing the password, and can also send the password on the existing connection.

the old style

| db = PQconnect();
| if (PQstatus(db) == CONNECTION_BAD && PQconnectionNeedsPassword(db))
| {
|   PQfinish(db);
|   value[..] = password = ;
|   db = PQconnect();
|   if (PQstatus(db) == CONNECTION_BAD)
|

and the new style

| db = PQconnect();
| if (PQconnectionNeedsPassword(db))
|   PQsendPassword(db, password);
| if (PQstatus(db) == CONNECTION_BAD)
|

should be standing together.

Where PQsendPassword is a combined function of PQcopyPassword and PQcontinuedbConnect. If the only purpose of these functions is sending the password, then do they need to be separate functions?

What do you think about the compatibility and the simpler API?

> These two request types require a password to be entered by the user. There is a new PostgresPollingStatusType value called PGRES_POLLING_WAITING_PASSWORD which is the polling status while a password is being entered by the user.
>
> When the user enters the password, PQconnectPoll() continues forward in CONNECTION_ASKING_PASSWORD wherein it sends the password to the server (by calling pg_fe_sendauth(), this time with a potentially correct password) and later goes back to CONNECTION_AWAITING_RESPONSE to read the server's response to the password just entered, where it either receives an authorization OK or an error response, thus completing the connection start-up process.
>
> The backend waits for the password until authentication timeout happens in which case the client can not send the password anymore since the backend has exited due to authentication timeout. I wonder if this is one of the reasons why this has not already been implemented?
>
> Comments?

Hmm. In the current implementation, the server is not running while the client is reading the password, so the authentication timeout guards only against hostile clients. Conversely, the new sequence can enforce a true authentication timeout: it results in an error after the password prompt has been left for 60 seconds. I suppose that is more desirable behavior in spite of the poor message:

| Password:
| psql: fe_sendauth: error sending password authentication

The point at issue now seems to be how to inform the client of the timeout while it is reading the password, especially given that using threads or SIGALRM is prohibited. Providing a password input function in libpq like the one below might make it viable using select(2).

    PQsendPassword(prompt = "Password: ", in_fd = stdin)

Any thoughts?

regards, -- Kyotaro Horiguchi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
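A minimal sketch of the select(2) idea - illustrative C only, not a proposed libpq API:

    #include <stdio.h>
    #include <sys/select.h>

    /*
     * Illustrative only: wait up to timeout_secs for the user to start
     * typing on stdin, so the client can surface a server-side
     * authentication timeout instead of blocking forever at the prompt.
     * Returns 1 if stdin became readable, 0 on timeout, -1 on error.
     */
    static int
    wait_for_password_input(int timeout_secs)
    {
        fd_set          readfds;
        struct timeval  tv;

        FD_ZERO(&readfds);
        FD_SET(fileno(stdin), &readfds);
        tv.tv_sec = timeout_secs;
        tv.tv_usec = 0;

        return select(fileno(stdin) + 1, &readfds, NULL, NULL, &tv);
    }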
Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"
On 5/14/13 10:38 AM, Cédric Villemain wrote: > If I follow your example, then I would rewrite http://manager.pgxn.org/howto Oh that's where that is coming from? Well that example has all kinds of problems. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"
On 5/14/13 5:45 PM, Tom Lane wrote: > We changed to using install-sh unconditionally back in 2001 because > we had too many problems with system-provided scripts that didn't do > what we expected. I see very little reason to believe that the > compatibility problems have disappeared since then, and in fact this > complaint seems to me to be sufficient to refute that thesis. The compatibility issues in 2001 were completely different ones and were explicitly resolved with a new configure check (which is used worldwide by thousands of packages, note). Let's not confuse the issue. > The argument that the system-provided program might be faster carries > very little weight for me --- "make install" is fast enough already. It is not for me. Note also that make install is part of test runs. > It's not worth making a bunch of extension authors jump through hoops, > whether their style was bad or not. Well, I consider that this is not a style issue. It's an issue of wide-spread bugginess caused by uninformed copy-and-paste, and I'm glad we found it. Considering the widespread crappiness in PostgreSQL extension makefiles, I don't consider it a problem that a few things need to be fixed. That said, I'm obviously outnumbered here. What about the following compromise: Use the configure-selected install program inside PostgreSQL (which we can test easily), and use install-sh under USE_PGXS? Admittedly, the make install time of extensions is probably not an issue. (The affected extensions will still be buggy because users can still substitute their own install programs. We're just hiding the issue for a while.) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw foreign tables and serial columns
Sent from my iPad

On 15-May-2013, at 18:56, Tom Lane wrote: > "Nicholson, Brad (Toronto, ON, CA)" writes: >> [ this error message sucks: ] >> test=# create foreign table local_foo (id serial) server test_server options >> (table_name 'foo'); >> >> ERROR: referenced relation "local_foo" is not a table > > Yeah, I'd noticed that myself. We could probably tweak the code to > issue a different error message and/or add a HINT if the serial's > parent relation is a foreign table. I'm not exactly sure what it > should say though. Thoughts?

+1 for the HINT message addition.

Regards, Atri -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw foreign tables and serial columns
"Nicholson, Brad (Toronto, ON, CA)" writes: > [ this error message sucks: ] > test=# create foreign table local_foo (id serial) server test_server options > (table_name 'foo'); > ERROR: referenced relation "local_foo" is not a table Yeah, I'd noticed that myself. We could probably tweak the code to issue a different error message and/or add a HINT if the serial's parent relation is a foreign table. I'm not exactly sure what it should say though. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"
On Tue, May 14, 2013 at 10:30 PM, Stephen Frost wrote: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> I still think we should revert 9db7ccae2000524b72a4052352cbb5407fb53b02. >> The argument that the system-provided program might be faster carries >> very little weight for me --- "make install" is fast enough already. >> It's not worth making a bunch of extension authors jump through hoops, >> whether their style was bad or not. > > +1. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logging of PAM Authentication Failure
> Please add patches here so they don't get forgotten: > > https://commitfest.postgresql.org/action/commitfest_view/open > > Do we really need to add *2* new libpq functions just to support this? I will add the patches to the commitfest after reviewing them a bit to see if we can do away with creating more new functions than necessary, and make the appropriate changes. -- Amit Langote -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] postgres_fdw foreign tables and serial columns
Hi, I'm kicking the tires on the 9.3 postgres_fdw stuff - I'm not sure if this is an issue or intended behavior, but it was pretty confusing based on the error message that was output. If you try creating a foreign table with a reference to a serial data type, it comes back with a "referenced relation is not a table" error. If you change the data type in the referenced table to integer - then it works. Completely understand why this is needed - but it tripped me up for a while. At the very least, can I suggest adding something in the documentation about serial columns (if it is not an issue)? test=# create table foo (id serial); CREATE TABLE test=# create foreign table local_foo (id serial) server test_server options (table_name 'foo'); ERROR: referenced relation "local_foo" is not a table test=# create foreign table local_foo (id integer) server test_server options (table_name 'foo'); CREATE FOREIGN TABLE Brad -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
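(Until that's sorted out, a workaround sketch — untested, and the sequence name is invented — is to create the sequence yourself and attach it through a default expression:

test=# create sequence local_foo_id_seq;
test=# create foreign table local_foo (id integer not null default nextval('local_foo_id_seq')) server test_server options (table_name 'foo');

This keeps the serial-like behavior while sidestepping the owned-sequence machinery that serial triggers.)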
Re: [HACKERS] Proposed TODO: add support for "any" for PL/PythonU and PL/Perl
+1 2013/5/15 Robert Haas : > On Tue, May 14, 2013 at 2:24 PM, Josh Berkus wrote: >> Hackers, >> >> I'd like to add the following todo items to the TODO list: >> >> PL/Python: >> * add support for anyelement and anyarray to PL/Python >> * add support for VARIADIC "ANY" to PL/Python >> >> PL/Perl: >> * add support for anyelement and anyarray to PL/Perl >> * add support for VARIADIC "ANY" to PL/Perl >> >> The reason for this is that both Python and Perl are loosely typed >> languages, and deal with variables as polymorphic. Mapping Postgres >> polymorphic parameters to functions in these PLs couldn't be more >> natural. I know from my part that support for VARIADIC "ANY" in >> PL/Python would save me a bunch of shell function writing. >> >> I don't personally intend to hack these out, but they seem like good >> things to put on the TODO list as good tasks for new hackers (and maybe >> GSOC students) to take on. > > +1. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposed TODO: add support for "any" for PL/PythonU and PL/Perl
On Tue, May 14, 2013 at 2:24 PM, Josh Berkus wrote: > Hackers, > > I'd like to add the following todo items to the TODO list: > > PL/Python: > * add support for anyelement and anyarray to PL/Python > * add support for VARIADIC "ANY" to PL/Python > > PL/Perl: > * add support for anyelement and anyarray to PL/Perl > * add support for VARIADIC "ANY" to PL/Perl > > The reason for this is that both Python and Perl are loosely typed > languages, and deal with variables as polymorphic. Mapping Postgres > polymorphic parameters to functions in these PLs couldn't be more > natural. I know from my part that support for VARIADIC "ANY" in > PL/Python would save me a bunch of shell function writing. > > I don't personally intend to hack these out, but they seem like good > things to put on the TODO list as good tasks for new hackers (and maybe > GSOC students) to take on. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
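To make the TODO items concrete, the sort of declaration they would enable looks like this — hypothetical, since VARIADIC "any" is exactly what the PLs reject today, and the args spelling inside the body is an assumption:

create function pyformat(variadic args "any") returns text as $$
    return ", ".join(str(a) for a in args)
$$ language plpythonu;

-- desired usage, mixing argument types freely:
select pyformat(1, 'two', 3.0);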
Re: [HACKERS] Logging of PAM Authentication Failure
On Tue, May 14, 2013 at 11:20 AM, Amit Langote wrote: > Hello, > > Is it right that it is only in the case a password prompt is needed > that a new connection is created after dropping the just-failed > connection? > I created a patch which enables it to use the existing connection in > such a case (unlike what we currently do). It modifies > connectDBComplete() and PQconnectPoll() to also include states > pertaining to password being accepted from the user. That is, the > state machine in PQconnectPoll() is further extended to include a > connection state called CONNECTION_ASKING_PASSWORD which is entered > when server sends AUTH_REQ_MD5 or AUTH_REQ_PASSWORD auth requests. > These two request types require a password to be entered by the user. > There is a new PostgresPollingStatusType value called > PGRES_POLLING_WAITING_PASSWORD which is the polling status while a > password is being entered by the user. > > When user enters the password the PQconnectPoll() continues forward in > CONNECTION_ASKING_PASSWORD wherein it sends the password to the server > (by calling pg_fe_sendauth() and this time with a potentially correct > password) and later goes back to CONNECTION_AWAITING_RESPONSE to read > server's response to the password just entered where it either > receives authorization OK or error response thus completing the > connection start-up process. > > The backend waits for the password until authentication timeout > happens in which case the client can not send the password anymore > since the backend has exited due to authentication timeout. I wonder > if this is one of the reasons why this has not already been > implemented? > > Comments? Please add patches here so they don't get forgotten: https://commitfest.postgresql.org/action/commitfest_view/open Do we really need to add *2* new libpq functions just to support this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
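To visualize the proposed flow from an application's side, a sketch — everything except the state/status names described above is assumed; PQsendPassword, prompt_password, and wait_for_socket are not real libpq API:

/* Hypothetical use of the proposed async-password states. */
PGconn *conn = PQconnectStart("host=localhost dbname=test");
PostgresPollingStatusType status = PGRES_POLLING_WRITING;

while (status != PGRES_POLLING_OK && status != PGRES_POLLING_FAILED)
{
    if (status == PGRES_POLLING_WAITING_PASSWORD)
    {
        /* Reuse the same connection instead of PQfinish() + reconnect. */
        char   *pw = prompt_password("Password: ");

        PQsendPassword(conn, pw);           /* proposed API */
    }
    else
        wait_for_socket(PQsocket(conn), status);    /* select()/poll() */

    status = PQconnectPoll(conn);
}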
Re: [HACKERS] Better handling of archive_command problems
On Tue, May 14, 2013 at 12:23 AM, Daniel Farina wrote: > On Mon, May 13, 2013 at 3:02 PM, Peter Geoghegan wrote: >> Has anyone else thought about approaches to mitigating the problems >> that arise when an archive_command continually fails, and the DBA must >> manually clean up the mess? > > Notably, the most common problem in this vein suffered at Heroku has > nothing to do with archive_command failing, and everything to do with > the ratio of block device write performance (hence, backlog) versus > the archiving performance. When CPU is uncontended it's not a huge > deficit, but it is there and it causes quite a bit of stress. > > Archive commands failing are definitely a special case there, where it > might be nice to bring write traffic to exactly zero for a time. One possible objection to this line of attack is that, IIUC, waits to acquire a LWLock are non-interruptible. If someone tells PostgreSQL to wait for some period of time before performing each WAL write, other backends that grab the WALWriteLock will not respond to query cancels during that time. Worse, the locks have a tendency to back up. What I have observed is that if WAL isn't flushed in a timely fashion, someone will try to grab WALWriteLock while holding WALInsertLock. Now anyone who attempts to insert WAL is in a non-interruptible wait. If the system is busy, it won't be long before someone tries to extend pg_clog, and to do that they'll try to grab WALInsertLock while holding CLogControlLock. At that point, any CLOG lookup that misses in the already-resident pages will send that backend into a non-interruptible wait. I have seen cases where this pile-up occurs during a heavy pgbench workload and paralyzes the entire system, including any read-only queries, until the WAL write completes. Now despite all that, I can see this being useful enough that Heroku might want to insert a very small patch into their version of PostgreSQL to do it this way, and just live with the downsides. But anything that can propagate non-interruptible waits across the entire system does not sound to me like a feature that is sufficiently polished that we want to expose it to users less sophisticated than Heroku (i.e. nearly all of them). If we do this, I think we ought to find a way to make the waits interruptible, and to insert them in places where they really don't interfere with read-only backends. I'd probably also argue that we ought to try to design it such that the GUC can be in MB/s rather than delay/WAL writer cycle. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
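If someone does pursue the MB/s direction, here is a minimal sketch of the shape it could take — none of these names exist in PostgreSQL, and the placement relative to WALWriteLock is precisely the open question above:

/*
 * Hypothetical only.  Derive the delay from a target byte rate
 * (a GUC in MB/s), and sleep in short interruptible slices *before*
 * WALWriteLock is taken, so blocked backends stay cancellable.
 * CHECK_FOR_INTERRUPTS(), pg_usleep() and Min() are real backend
 * primitives; everything else is assumed.
 */
static void
ThrottleWALWrite(Size bytes_to_write, double wal_write_limit_mb_per_sec)
{
    if (wal_write_limit_mb_per_sec <= 0)
        return;                 /* throttling disabled */

    /* How long should writing this many bytes take at the target rate? */
    double  target_secs = bytes_to_write /
        (wal_write_limit_mb_per_sec * 1024.0 * 1024.0);
    long    remaining = (long) (target_secs * 1000000.0);   /* usec */

    while (remaining > 0)
    {
        long    slice = Min(remaining, 10000);  /* 10 ms slices */

        CHECK_FOR_INTERRUPTS();     /* honor query cancel / die */
        pg_usleep(slice);
        remaining -= slice;
    }
}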
[HACKERS] Dead regression tests
There are a few .sql files in src/test/regress/sql/ that AFAICS are not used for anything:

drop.sql
  Looking at the history, this was used for some kind of cleanup in the ancient regress.sh script. It has been unused since Nov 2000, when regress.sh was removed.

euc_cn.sql, euc_jp.sql, euc_kr.sql, euc_tw.sql, mule_internal.sql
  These are duplicates of the scripts in src/test/mb/sql/.

sql_ascii.sql
  This is similar to the euc_* and mule_internal tests, but is not present in src/test/mb/sql. Looking at the expected output, had this test ever been run, it would've failed since at least 2001, when the error message memorized in the expected output, "Relation '%s' does not exist", was changed in the code.

hs_primary_extremes.sql
  This was introduced as part of the hot standby patch. I don't see any reference to how it's supposed to be used. Simon?

Barring objections, I'm going to just remove all of the above, except for hs_primary_extremes.sql.

- Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: option --application_name for psql
2013/5/14 Pavel Stehule : > 2013/5/14 Magnus Hagander : >> On Tue, May 14, 2013 at 9:36 PM, Pavel Stehule >> wrote: >>> 2013/5/14 Magnus Hagander : psql in "connection string mode" >>> >>> Hi Magnus, can you send me a link? >> >> http://www.postgresql.org/docs/9.2/static/app-psql.html >> >> "If this parameter contains an = sign or starts with a valid URI >> prefix (postgresql:// or postgres://), it is treated as a conninfo >> string. See Section 31.1 for more information." > > I was blind, thank you

After the comments from Erik and Magnus, my proposal makes no sense.

Regards Pavel > > Pavel > >> >> >> -- >> Magnus Hagander >> Me: http://www.hagander.net/ >> Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
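(For the archives, the existing capability that makes the option redundant: application_name is already accepted as a conninfo parameter, in both keyword and URI form, e.g.

psql "host=localhost dbname=test application_name=myreport"
psql "postgresql://localhost/test?application_name=myreport"

so no separate command-line switch is needed.)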
Re: [HACKERS] Slicing TOAST
Hello, Heikki. You wrote: HL> On 14.05.2013 21:36, Josh Berkus wrote: >> >>> I'm proposing this now as a possible GSoC project; I don't propose to >>> actively work on it myself. >> >> The deadline for submitting GSOC projects (by students) was a week ago. >> So is this a project suggestion for next year ...? HL> I've been thinking, we should already start collecting ideas for next HL> year, and collect them throughout the year. I know I come up with some HL> ideas every now and then, but when it's time for another GSoC, I can't HL> remember any of them. HL> I just created a GSoC2014 ideas pages on the wiki, for collecting these: HL> https://wiki.postgresql.org/wiki/GSoC_2014. Let's keep the ideas coming, HL> throughout the year.

Good idea! It reminds me of a feature proposed by Pavel Stehule a while ago here: http://www.postgresql.org/message-id/banlktini+chgkfnyjkf1rshsq2kmkts...@mail.gmail.com

It's about streaming functionality for the BYTEA type. But I think streaming should be added to BYTEA, TEXT, and VARCHAR without a length specifier too. As Pavel stated: "A very large bytea is limited by query size - processing a long query needs too much RAM". This is absolutely true, and it came up suddenly in a project of one of my clients, because he used bytea for storing images, with text format in PQexec, which as you know doubles or triples the size of the data.

Some more details from Pavel: There are a few disadvantages of LO compared to bytea, so there are requests for a "smarter" API for bytea. A significant problem is the different implementation of LO for people who have to port applications to PostgreSQL from Oracle or DB2. There are some JDBC issues too. For me, the main disadvantage of LO is the single space shared by all large objects. Bytea removes this disadvantage, but it is slower for lengths > 20 MB. It could be really practical to be able to insert some large fields in a second, non-SQL stream. The same applies when a large bytea is read.

I'm not sure if the whole project is simple enough for GSoC, but I suppose it could be split up somehow.

P.S. Should we start a separate thread for proposals? It took me an hour to find the mention of the GSoC 2014 wiki page.

HL> - Heikki -- With best wishes, Pavel mailto:pa...@gf.microolap.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
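For comparison, the streaming that large objects already provide — and that this proposal wants for bytea — looks roughly like this with the existing libpq LO API (a sketch with error handling trimmed; the lo_* calls are real API, the helper around them is assumed):

#include <stdio.h>
#include <libpq-fe.h>
#include <libpq/libpq-fs.h>     /* INV_WRITE */

/* Stream a file into a large object in 8 kB chunks: client memory use
 * stays constant regardless of payload size, which is exactly the
 * property bytea-in-a-query lacks.  Error handling trimmed. */
static Oid
stream_file_to_lo(PGconn *conn, FILE *src)
{
    char    buf[8192];
    size_t  n;

    PQclear(PQexec(conn, "BEGIN"));     /* the LO API needs a transaction */

    Oid     loid = lo_creat(conn, INV_WRITE);
    int     fd = lo_open(conn, loid, INV_WRITE);

    while ((n = fread(buf, 1, sizeof(buf), src)) > 0)
        lo_write(conn, fd, buf, n);

    lo_close(conn, fd);
    PQclear(PQexec(conn, "COMMIT"));
    return loid;
}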