Re: [HACKERS] Speed dblink using alternate libpq tuple storage
On Thu, Feb 16, 2012 at 02:24:19PM +0900, Kyotaro HORIGUCHI wrote:
> As far as I see, an out-of-memory in getAnotherTuple() makes
> conn->result->resultStatus = PGRES_FATAL_ERROR, and pqParseInput[23]()
> consequently skips the succeeding 'D' messages. When an exception is
> raised within the row processor, pg_conn->inCursor is always left at a
> consistent position and result->resultStatus == PGRES_TUPLES_OK.
>
> The choices of the libpq user at that point are:
>
> - Continue to read the succeeding tuples: call PQgetResult() to read
>   the 'D' messages and hand them to the row processor one by one.
>
> - Throw away the remaining results: call pqClearAsyncResult() and
>   pqSaveErrorResult(), then call PQgetResult() to skip over the
>   succeeding 'D' messages. (Of course the user can't do that in the
>   current implementation.)

There is also a third choice, which may be even more popular than those
two - PQfinish().

> To let users select the second choice (I think this is the major
> case), we only need to provide and export a new PQ* function to do
> that, I think.

I think we already have such a function - PQsetRowProcessor().
Considering that the user can use it to install a skipping callback, or
simply set some flag in its own per-connection state, I suspect the need
is not that big.

But if you want to set the error state for skipping, I would instead
generalize PQsetRowProcessorErrMsg() to support setting the error state
from outside the callback. That would also help external processing with
'return 2'. But I would keep the requirement that row processing must be
ongoing - standalone error setting does not make sense - so the name can
stay.

There seem to be two ways to do it:

1) Replace the PGresult under PGconn. This seems to require that
   PQsetRowProcessorErrMsg take PGconn as argument instead of PGresult.
   This also means the callback should get PGconn as argument. Kind of
   makes sense, even.

2) Convert the current partial PGresult to error state. That also makes
   sense; the current use of ->rowProcessorErrMsg to transport the
   message so the error can be set later in the caller feels weird.

I guess I slightly prefer 2) to 1).

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Google Summer of Code? Call for mentors.
On Wed, Feb 15, 2012 at 10:18 PM, Josh Berkus j...@agliodbs.com wrote:
> Hackers,
>
> The call is now open for Google Summer of Code. If you are interested
> in being a GSoC mentor this summer, please reply to this email. I want
> to gauge whether or not we should participate this summer.

I'll play, but I think I'm going to have to be very sure about any
project before I agree to mentor it this year (not that there was
anything wrong with my student last year - quite the opposite, in fact -
I just need to be sure I'm spending my time on the right things).

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)
On Thu, Feb 16, 2012 at 5:02 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
> On 15.02.2012 18:52, Fujii Masao wrote:
>> On Thu, Feb 16, 2012 at 1:01 AM, Heikki Linnakangas
>> heikki.linnakan...@enterprisedb.com wrote:
>>> Are you still seeing this failure with the latest patch I posted
>>> (http://archives.postgresql.org/message-id/4f38f5e5.8050...@enterprisedb.com)?
>>
>> Yes. Just to be safe, I again applied the latest patch to HEAD,
>> compiled that, and tried the same test. Then, unfortunately, I got
>> the same failure again. I ran configure with '--enable-debug'
>> '--enable-cassert' 'CPPFLAGS=-DWAL_DEBUG', and make with the -j 2
>> option. When I ran the test with wal_debug = on, I got the following
>> assertion failure:
>>
>> LOG: INSERT @ 0/17B3F90: prev 0/17B3F10; xid 998; len 31: Heap - insert: rel 1663/12277/16384; tid 0/197
>> STATEMENT: create table t (i int); insert into t values(generate_series(1,1)); delete from t
>> LOG: INSERT @ 0/17B3FD0: prev 0/17B3F50; xid 998; len 31: Heap - insert: rel 1663/12277/16384; tid 0/198
>> STATEMENT: create table t (i int); insert into t values(generate_series(1,1)); delete from t
>> TRAP: FailedAssertion(!(((bool) (((void*)(&(target->tid)) != ((void *)0)) && ((&(target->tid))->ip_posid != 0)))), File: "heapam.c", Line: 5578)
>> LOG: xlog bg flush request 0/17B4000; write 0/17A6000; flush 0/179D5C0
>> LOG: xlog bg flush request 0/17B4000; write 0/17B; flush 0/17B
>> LOG: server process (PID 16806) was terminated by signal 6: Abort trap
>>
>> This might be related to the original problem which Jeff and I saw.
>
> That's strange. I made a fresh checkout, too, and applied the patch,
> but still can't reproduce. I used the attached script to test it.
> It's surprising that the crash happens when the records are inserted,
> not at recovery. I don't see anything obviously wrong there, so could
> you please take a look around in gdb and see if you can get a clue
> what's going on? What's the stack trace?

According to the above log messages, one strange thing is that the
location of a WAL record (i.e., 0/17B3F90) is not the same as the "prev"
location carried by the following WAL record (i.e., 0/17B3F50). Is this
intentional?

BTW, when I ran the test on my Ubuntu box, I could not reproduce the
problem. I could reproduce it only on MacOS.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)
On Thu, Feb 16, 2012 at 6:15 PM, Fujii Masao masao.fu...@gmail.com wrote:
> [...]
> According to the above log messages, one strange thing is that the
> location of the WAL record (i.e., 0/17B3F90) is not the same as the
> previous location of the following WAL record (i.e., 0/17B3F50). Is
> this intentional? BTW, when I ran the test on my Ubuntu, I could not
> reproduce the problem. I could reproduce the problem only in MacOS.

+	nextslot = Insert->nextslot;
+	if (NextSlotNo(nextslot) == lastslot)
+	{
+		/*
+		 * Oops, we've caught our tail and the oldest slot is still in use.
+		 * Have to wait for it to become vacant.
+		 */
+		SpinLockRelease(&Insert->insertpos_lck);
+		WaitForXLogInsertionSlotToBecomeFree();
+		goto retry;
+	}
+	myslot = XLogCtl->XLogInsertSlots[nextslot];
+	nextslot = NextSlotNo(nextslot);

nextslot can reach NumXLogInsertSlots here, which would be a bug, I
guess. When I applied a quick fix and ran the test, I could not
reproduce the problem any more. I'm not sure whether this is really the
cause of the problem, though.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Designing an extension for feature-space similarity search
On Thu, Feb 16, 2012 at 12:34 AM, Jay Levitt jay.lev...@gmail.com wrote:
> - But a dimension might be in any domain, not just floats
> - The distance along each dimension is a domain-specific function

What exact domains do you expect? Some domains could appear to be quite
hard for index-based similarity search using GiST (for example, sets,
strings, etc.).

-- 
With best regards,
Alexander Korotkov.
Re: [HACKERS] 16-bit page checksums for 9.2
On Thursday, February 16, 2012 12:16:31, Simon Riggs wrote:
> v8 attached

Maybe the docs should include what will happen if the checksum is not
correct?

-- 
Albert Cervera i Areny
http://www.NaN-tic.com
Tel: +34 93 553 18 03
http://twitter.com/albertnan
http://www.nan-tic.com/blog
Re: [HACKERS] 16-bit page checksums for 9.2
On Thu, Feb 16, 2012 at 6:16 AM, Simon Riggs si...@2ndquadrant.com wrote:
> v8 attached

It's hard to believe that this version has been tested terribly
thoroughly, because it doesn't compile.

+	LockBufHdr(buf);
+
+	/*
+	 * Run PageGetLSN while holding header lock.
+	 */
+	recptr = BufferGetLSN(buf);
+
+	/* To check if block content changes while flushing. - vadim 01/17/97 */
+	buf->flags &= ~BM_JUST_DIRTIED;
+	UnlockBufHdr(buf);

This doesn't seem very helpful. It's obvious even without the comment
that we're running PageGetLSN while holding the header lock. What's not
obvious, and what the comment should be explaining, is why we're doing
that.

+	/*
+	 * If we're in recovery we cannot dirty a page because of a hint.
+	 * We can set the hint, just not dirty the page as a result, so
+	 * the hint is lost when we evict the page or shut down.
+	 *
+	 * See long discussion in bufpage.c
+	 */
+	if (RecoveryInProgress())
+		return;

Doesn't this seem awfully bad for performance on Hot Standby servers? I
agree that it fixes the problem with un-WAL-logged pages there, but I
seem to recall some recent complaining about performance features that
work on the master but not the standby. Durable hint bits are one such
feature.

+	 * Basically, we simply prevent the checkpoint WAL record from
+	 * being written until we have marked the buffer dirty. We don't
+	 * start the checkpoint flush until we have marked dirty, so our
+	 * checkpoint must flush the change to disk successfully or the
+	 * checkpoint never gets written, so crash recovery will fix.
+	 *
+	 * It's possible we may enter here without an xid, so it is
+	 * essential that CreateCheckpoint waits for virtual transactions
+	 * rather than full transactionids.
+	 */
+	MyPgXact->delayChkpt = delayChkpt = true;

I am slightly worried that this expansion in the use of this mechanism
(formerly called inCommit, for those following along at home) could lead
to checkpoint starvation. Suppose we've got one or two large table scans
wandering along, setting hint bits, and now suddenly it's time to
checkpoint. How long will it take the checkpoint process to find a time
when nobody's got delayChkpt set?

+ #define PageSetChecksum(page) \
+ do \
+ { \
+	PageHeader p = (PageHeader) page; \
+	p->pd_flags |= PD_PAGE_VERSION_PLUS1; \
+	p->pd_flags |= PD_CHECKSUM1; \
+	p->pd_flags &= ~PD_CHECKSUM2; \
+	p->pd_verify.pd_checksum16 = PageCalcChecksum16(page); \
+ } while (0);
+
+ /* ensure any older checksum info is overwritten with watermark */
+ #define PageResetVersion(page) \
+ do \
+ { \
+	PageHeader p = (PageHeader) page; \
+	if (!PageHasNoChecksum(p)) \
+	{ \
+		p->pd_flags &= ~PD_PAGE_VERSION_PLUS1; \
+		p->pd_flags &= ~PD_CHECKSUM1; \
+		p->pd_flags &= ~PD_CHECKSUM2; \
+		PageSetPageSizeAndVersion(p, BLCKSZ, PG_PAGE_LAYOUT_VERSION); \
+	} \
+ } while (0);

So, when the page has a checksum, PD_CHECKSUM2 is not set, and when it
doesn't have a checksum, PD_CHECKSUM2 is also not set? What good does
that do?

  * PageGetPageSize
  *	Returns the page size of a page.
  *
! * Since PageSizeIsValid() when pagesize == BLCKSZ, just written BLCKSZ.
! * This can be called on any page, initialised or not, in or out of buffers.
! * You might think this can vary at runtime but you'd be wrong, since pages
! * frequently need to occupy buffers and pages are copied from one to another
! * so there are many hidden assumptions that this simple definition is true.
  */
! #define PageGetPageSize(page) (BLCKSZ)

I think I agree that the current definition of PageGetPageSize seems
unlikely to come anywhere close to allowing us to cope with multiple
page sizes, but I think this method of disabling it is a hack. The
callers that want to know how big the page really is should just use
BLCKSZ instead of this macro, and those that want to know how big the
page THINKS it is (notably contrib/pageinspect) need a way to get that
information.
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] run GUC check hooks on RESET
On Wed, Feb 15, 2012 at 5:36 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
> Agreed. I'm not sure we want to change the message text at all in 9.1.
> Translations and all that.

Agreed. I think we definitely don't want to do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Should we add crc32 in libpgport?
On Fri, Feb 3, 2012 at 7:33 PM, Daniel Farina dan...@heroku.com wrote:
> Ah, yes, I think my optimizations were off when building, or
> something. I didn't get such verbosity at first, and then I remember
> doing something slightly different and then getting a lot of output.
> I didn't pay attention to the build size. I will investigate.
[...]
> I agree, I was about to say "what about a preprocessor hack..." after
> the last paragraph, but saw you beat me to the punch. I'll have a look
> soon.

Ping!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
Shigeru Hanada wrote:
> Thanks for the review. Attached patches are revised version, though
> only fdw_helper_v5.patch is unchanged.

Two questions:
- Is it on purpose that you can specify all SSL client options except
  sslcompression?
- Since a rescan is done by rewinding the cursor, is it necessary to
  have any other remote isolation level than READ COMMITTED? There is
  only one query issued per transaction.

Yours,
Laurenz Albe
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
On Thu, Feb 16, 2012 at 05:49:34PM +0900, Kyotaro HORIGUCHI wrote:
> I added the function PQskipRemainingResult() and use it in dblink.
> This reduces the number of executed try-catch blocks from the number
> of rows to one per query in dblink.

This implementation is wrong - you must not simply call PQgetResult()
when the connection is in async mode. And the resulting PGresult must be
freed.

Please just make PQsetRowProcessorErrMsg() callable from outside the
callback. That also makes it clear to code that sees the final PGresult
what happened. As a bonus, this simply makes the function more robust
and less special.

Although it's easy to create some PQsetRowSkipFlag() function that will
silently skip the remaining rows, I think such logic is best left to the
user to handle. It creates an unnecessary special case when handling the
final PGresult, so in the big picture it creates confusion.

> This patch is based on the patch above and composed in the same
> manner - the main three patches include all modifications, and the
> '2' patch separately.

I think there is no need to carry the early-exit patch separately. It is
visible in the archives and nobody has screamed about it yet, so I guess
it's acceptable. Also, it's so simple that there is no point in spending
time rebasing it.

> diff --git a/src/interfaces/libpq/#fe-protocol3.c# b/src/interfaces/libpq/#fe-protocol3.c#

There is some backup file in your git repo.

-- 
marko
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
I found a strange behavior with v10. Is it available to reproduce?

ftbl is declared as follows:

postgres=# select * FROM ftbl;
 a |  b
---+-----
 1 | aaa
 2 | bbb
 3 | ccc
 4 | ddd
 5 | eee
(5 rows)

I tried to raise an error on the remote side:

postgres=# select * FROM ftbl WHERE 100 / (a - 3) > 0;
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
! \q

Its call trace was:

(gdb) bt
#0  0x0031030810a4 in free () from /lib64/libc.so.6
#1  0x7f2caa620bd9 in PQclear (res=0x2102500) at fe-exec.c:679
#2  0x7f2caa83c4db in execute_query (node=0x20f20a0) at pgsql_fdw.c:722
#3  0x7f2caa83c64a in pgsqlIterateForeignScan (node=0x20f20a0) at pgsql_fdw.c:402
#4  0x005c120f in ForeignNext (node=0x20f20a0) at nodeForeignscan.c:50
#5  0x005a9b37 in ExecScanFetch (recheckMtd=ForeignRecheck, accessMtd=ForeignNext, node=0x20f20a0) at execScan.c:82
#6  ExecScan (node=0x20f20a0, accessMtd=ForeignNext, recheckMtd=ForeignRecheck) at execScan.c:132
#7  0x005a2128 in ExecProcNode (node=0x20f20a0) at execProcnode.c:441
#8  0x0059edc2 in ExecutePlan (dest=0x210f280, direction=optimized out, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, planstate=0x20f20a0, estate=0x20f1f88) at execMain.c:1449

This is the PG_CATCH block at execute_query(). fetch_result() raises an
error, which is then caught so the PGresult can be released. Although
res should be NULL at this point, PQclear was called with a non-NULL
value according to the call trace.

More strangely, I injected elog(INFO, ...) to show the value of res at
this point. Then it became impossible to reproduce once I tried to show
the pointer with elog(INFO, "res = %p", res);

Why does res have a non-NULL value, even though it was cleared prior to
fetch_result() and an error was raised within that function?

Thanks,

On Thu, Feb 16, 2012 at 13:41, Shigeru Hanada shigeru.han...@gmail.com wrote:
> Kaigai-san,
>
> Thanks for the review. Attached patches are revised version, though
> only fdw_helper_v5.patch is unchanged.
>
> (2012/02/16 0:09), Kohei KaiGai wrote:
>> [memory context of tuple store]
>> It calls tuplestore_begin_heap() under the memory context of
>> festate->scan_cxt at pgsqlBeginForeignScan.
>
> Yes, it's because tuplestore uses the context which was current when
> tuplestore_begin_heap was called. I want to use a per-scan context
> for the tuplestore, to keep its content tuples alive through the
> scan.
>
>> On the other hand, tuplestore_gettupleslot() is called under the
>> memory context of festate->tuples.
>
> Yes, result tuples to be returned to the executor should be allocated
> in the per-scan context and live until the next IterateForeignScan
> (or EndForeignScan), because such a tuple will be released via
> ExecClearTuple in the next IterateForeignScan call. If we don't
> switch to the per-scan context, the result tuple is allocated in the
> per-tuple context and causes a double-free and server crash.
>
>> I could not find a callback function being invoked on errors, so I
>> suspect the memory objects acquired within tuplestore_begin_heap()
>> shall be leaked, even though it is my suggestion to create a
>> sub-context under the existing one.
>
> How did you confirm that no callback function is invoked on errors? I
> think that memory objects acquired within tuplestore_begin_heap (I
> guess you mean excluding stored tuples, right?) are released during
> cleanup of the aborted transaction. I tested that by adding
> elog(ERROR) to the tail of store_result() for an intentional error,
> and executed a large query 100 times in a session. I saw the VIRT
> value (via the top command) come down to a constant level after every
> query.
>
>> In my opinion, it is a good choice to use es_query_cxt of the
>> supplied EState. What does prevent to apply this per-query memory
>> context?
>
> Ah, I've confused context management of pgsql_fdw... I fixed
> pgsql_fdw to create a per-scan context as a child of es_query_cxt in
> BeginForeignScan, and use it for the tuplestore of the scan. So, the
> tuplestore and its contents are released correctly at EndForeignScan,
> or at cleanup of the aborted transaction in the error case.
>
>> You mention about PGresult being malloc()'ed. However, it seems to
>> me fetch_result() and store_result() once copy the contents of the
>> malloc()'ed area to the palloc()'ed area, and the PGresult is
>> released on an error using a PG_TRY() ... PG_CATCH() block.
>
> During thinking about this comment, I found a double-free bug of
> PGresult in execute_query, thanks :) But, sorry, I'm not sure what
> the concern you show here is. The reason why tuples are copied from
> the malloc'ed area to the palloc'ed area is to release the PGresult
> before returning from the IterateForeignScan call. The reason why a
> PG_TRY block is used is to make sure that the PGresult is released
> before jumping back upstream in the error case.
>
>> [Minor comments]
>> Please set NULL to the sql variable at begin_remote_tx(). Compiler
>> raises a warning
Re: [HACKERS] Designing an extension for feature-space similarity search
Alexander Korotkov wrote:
> On Thu, Feb 16, 2012 at 12:34 AM, Jay Levitt jay.lev...@gmail.com wrote:
>> - But a dimension might be in any domain, not just floats
>> - The distance along each dimension is a domain-specific function
>
> What exact domains do you expect? Some domains could appear to be
> quite hard for index-based similarity search using GiST (for example,
> sets, strings etc.).

Oh, nothing nearly so complex, and (to Tom's point) no composite types,
either. Right now we have demographics like gender, geolocation, and
birthdate; I think any domain will be a type that's easily expressible
in linear terms.

I was thinking in domains rather than types because there isn't one
distance function for date or float; me.birthdate <-> you.birthdate is
normalized to a different curve than now() <-> posting_date, and
raw_score <-> raw_score would differ from z_score <-> z_score. It would
have been elegant to express that distance with <->, but since domains
can't have operators, I can create distance(this, other) functions for
each domain. It just won't look as pretty!

Jay
[HACKERS] possible new option for wal_sync_method
When running Postgres on a single ext3 filesystem on Linux, we find that
the attached simple patch gives a significant performance benefit (7-8%
in the numbers below). The patch adds a new option for wal_sync_method,
called open_direct. With this option, the WAL is always opened with
O_DIRECT (but not O_SYNC or O_DSYNC). For Linux, the use of only
O_DIRECT should be correct. All WAL logs are fully allocated before
being used, and the WAL buffers are 8K-aligned, so all direct writes are
guaranteed to complete before returning. (See
http://lwn.net/Articles/348739/)

The advantage of using O_DIRECT is that there is no fsync/fdatasync()
call. All of the other wal_sync_methods use fsync/fdatasync(), either
explicitly or implicitly (via the O_SYNC and O_DSYNC options).
fsync/fdatasync can be very slow on ext3, because it seems to have to
always wait for the current filesystem meta-data transaction to
complete, even if that meta-data operation is completely unrelated to
the file being fsync'ed. There can be many metadata operations happening
on the data files, so the WAL log fsync can end up waiting for metadata
operations on the data files. Since O_DIRECT does not do any
fsync/fdatasync operation, it avoids this bottleneck and can finish more
quickly on average. The open_sync and open_dsync options do not have
this benefit, because they do the equivalent of an fsync/fdatasync after
every WAL write.

For the open_sync and open_dsync options, O_DIRECT is used for writes
only if the xlog will not need to be consumed by the archiver or hot
standby. I am not keying the open_direct behavior on whether
XLogIsNeeded() is true, because we see a performance gain even when
archiving is enabled (using a simple script that copies and compresses
the log segments).
For a 2-processor, 50-warehouse DBT2 run on SLES 11, I get the following
NOTPM results:

 wal_sync_method   fdatasync   open_direct   open_sync
 archiving off:        17076         18481       17094
 archiving on:         15704         16923       15898

Do folks have any interest in this change, or comments on its
usefulness/correctness? It would be just an extra option for
wal_sync_method that users can try out, and it has benefits for certain
configurations.

Dan

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 266c0de..a830a01 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -122,6 +122,7 @@ const struct config_enum_entry sync_method_options[] = {
 #ifdef OPEN_DATASYNC_FLAG
 	{"open_datasync", SYNC_METHOD_OPEN_DSYNC, false},
 #endif
+	{"open_direct", SYNC_METHOD_OPEN_DIRECT, false},
 	{NULL, 0, false}
 };
@@ -1925,7 +1926,8 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch)
 	 * fsync more than one file.
 	 */
 	if (sync_method != SYNC_METHOD_OPEN &&
-		sync_method != SYNC_METHOD_OPEN_DSYNC)
+		sync_method != SYNC_METHOD_OPEN_DSYNC &&
+		sync_method != SYNC_METHOD_OPEN_DIRECT)
 	{
 		if (openLogFile >= 0 &&
 			!XLByteInPrevSeg(LogwrtResult.Write, openLogId, openLogSeg))
@@ -8958,6 +8960,15 @@ get_sync_bit(int method)
 		case SYNC_METHOD_OPEN_DSYNC:
 			return OPEN_DATASYNC_FLAG | o_direct_flag;
 #endif
+		case SYNC_METHOD_OPEN_DIRECT:
+			/*
+			 * Open the log with the O_DIRECT flag only. O_DIRECT guarantees
+			 * that data is written to disk when the IO completes if and
+			 * only if the file is fully allocated. Fortunately, the log
+			 * files are always fully allocated by XLogFileInit() (or are
+			 * recycled from a fully-allocated log).
+			 */
+			return O_DIRECT;
 		default:
 			/* can't happen (unless we are out of sync with option array) */
 			elog(ERROR, "unrecognized wal_sync_method: %d", method);
@@ -9031,6 +9042,7 @@ issue_xlog_fsync(int fd, uint32 log, uint32 seg)
 #endif
 		case SYNC_METHOD_OPEN:
 		case SYNC_METHOD_OPEN_DSYNC:
+		case SYNC_METHOD_OPEN_DIRECT:
 			/* write synced it already */
 			break;
 		default:
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 400c52b..97acde5 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -564,3 +564,4 @@
 #--
 
 # Add settings for extensions here
+wal_sync_method = open_direct
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index f8aecef..b888ee7 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -83,6 +83,7 @@ typedef struct XLogRecord
 #define SYNC_METHOD_OPEN		2	/* for O_SYNC */
 #define SYNC_METHOD_FSYNC_WRITETHROUGH	3
 #define SYNC_METHOD_OPEN_DSYNC	4	/* for O_DSYNC */
+#define SYNC_METHOD_OPEN_DIRECT	5	/* for O_DIRECT */
 
 extern int	sync_method;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
2012年2月16日13:41 Shigeru Hanada shigeru.han...@gmail.com: Kaigai-san, Thanks for the review. Attached patches are revised version, though only fdw_helper_v5.patch is unchanged. (2012/02/16 0:09), Kohei KaiGai wrote: [memory context of tuple store] It calls tuplestore_begin_heap() under the memory context of festate-scan_cxt at pgsqlBeginForeignScan. Yes, it's because tuplestore uses a context which was current when tuplestore_begin_heap was called. I want to use per-scan context for tuplestore, to keep its content tuples alive through the scan. On the other hand, tuplestore_gettupleslot() is called under the memory context of festate-tuples. Yes, result tuples to be returned to executor should be allocated in per-scan context and live until next IterateForeignScan (or EndForeignScan), because such tuple will be released via ExecClearTuple in next IterateForeignScan call. If we don't switch context to per-scan context, result tuple is allocated in per-tuple context and cause double-free and server crash. I could not find a callback functions being invoked on errors, so I doubt the memory objects acquired within tuplestore_begin_heap() shall be leaked, even though it is my suggestion to create a sub-context under the existing one. How do you confirmed that no callback function is invoked on errors? I think that memory objects acquired within tuplestore_begin_heap (I guess you mean excluding stored tuples, right?) are released during cleanup of aborted transaction. I tested that by adding elog(ERROR) to the tail of store_result() for intentional error, and execute large query 100 times in a session. I saw VIRT value (via top command) comes down to constant level after every query. Oops, I overlooked the point where MessageContext and its children get reset. However, as its name, I don't believe it was right usage of memory context. As the latest version doing, es_query_cxt is the right way to acquire memory object with per-query duration. 
In my opinion, it is a good choice to use es_query_cxt of the supplied EState. What does prevent to apply this per-query memory context? Ah, I've confused context management of pgsql_fdw... I fixed pgsql_fdw to create per-scan context as a child of es_query_cxt in BeginForeignScan, and use it for tuplestore of the scan. So, tuplestore and its contents are released correctly at EndForeignScan, or cleanup of aborted transaction in error case. I believe it is right direction. You mention about PGresult being malloc()'ed. However, it seems to me fetch_result() and store_result() once copy the contents on malloc()'ed area to the palloc()'ed area, and PQresult is released on an error using PG_TRY() ... PG_CATCH() block. During thinking about this comment, I found double-free bug of PGresult in execute_query, thanks :) Unfortunately, I found the strange behavior around this code. I doubt an interaction between longjmp and compiler optimization, but it is not certain right now. I'd like to push this patch to committer reviews after this problem got closed. Right now, I don't have comments on this patch any more. But, sorry, I'm not sure what the concern you show here is. The reason why copying tuples from malloc'ed area to palloc'ed area is to release PGresult before returning from the IterateForeingScan call. The reason why using PG_TRY block is to sure that PGresult is released before jump back to upstream in error case. [Minor comments] Please set NULL to sql variable at begin_remote_tx(). Compiler raises a warnning due to references of uninitialized variable, even though the code path never run. Fixed. BTW, just out of curiosity, which compiler do you use? My compiler ,gcc (GCC) 4.6.0 20110603 (Red Hat 4.6.0-10) on Fedora 15, doesn't warn it. I uses Fedora 16, and GCC 4.6.2. [kaigai@iwashi pgsql_fdw]$ gcc --version gcc (GCC) 4.6.2 20111027 (Red Hat 4.6.2-1) It is not a matter related to compiler version, but common manner in PostgreSQL code. 
You can likely find source code comments like /* keep compiler quiet */. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
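The per-query vs. per-scan context discussion above rests on one property of PostgreSQL's memory contexts: deleting (or resetting) a parent releases every child context too, which is why hanging the per-scan context off es_query_cxt guarantees cleanup even when an error aborts the scan before EndForeignScan runs. A toy stand-in in plain C can sketch that property — all names here are hypothetical; the real server API is the MemoryContext machinery (AllocSetContextCreate and friends), not this:

```c
#include <stdlib.h>

/* Toy "memory context": owns its allocations and its child contexts.
 * ctx_delete() frees the whole subtree, mimicking how deleting the
 * per-query context reclaims a per-scan child context automatically. */
typedef struct Ctx {
    void **chunks;              /* allocations owned by this context */
    int nchunks, maxchunks;
    struct Ctx **children;      /* child contexts, freed with the parent */
    int nchildren, maxchildren;
} Ctx;

static int freed_chunks;        /* visible effect, for the example below */

static Ctx *ctx_create(Ctx *parent)
{
    Ctx *c = calloc(1, sizeof(Ctx));
    if (parent) {
        if (parent->nchildren == parent->maxchildren) {
            parent->maxchildren = parent->maxchildren ? parent->maxchildren * 2 : 4;
            parent->children = realloc(parent->children,
                                       parent->maxchildren * sizeof(Ctx *));
        }
        parent->children[parent->nchildren++] = c;
    }
    return c;
}

static void *ctx_alloc(Ctx *c, size_t size)
{
    if (c->nchunks == c->maxchunks) {
        c->maxchunks = c->maxchunks ? c->maxchunks * 2 : 4;
        c->chunks = realloc(c->chunks, c->maxchunks * sizeof(void *));
    }
    void *p = malloc(size);
    c->chunks[c->nchunks++] = p;
    return p;
}

static void ctx_delete(Ctx *c)
{
    for (int i = 0; i < c->nchildren; i++)
        ctx_delete(c->children[i]);     /* children are released first */
    for (int i = 0; i < c->nchunks; i++) {
        free(c->chunks[i]);
        freed_chunks++;
    }
    free(c->children);
    free(c->chunks);
    free(c);
}
```

With a per-scan context created under the query context, deleting the query context frees the scan's tuples whether or not the scan ended normally — which is the error-path guarantee being discussed.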
Re: [HACKERS] Designing an extension for feature-space similarity search
Tom Lane wrote: Jay Levitt jay.lev...@gmail.com writes: - I'm not sure how to represent arbitrary column-like features without reinventing the wheel and putting a database in the database. ISTM you could define a composite type and then create operators and an operator class over that type. If you were trying to make a btree opclass there might be a conflict with the built-in record_ops opclass, but since you're only interested in GIST I don't see any real roadblocks. Perfect. Composite types are exactly what I need here; the application can declare its composite type and provide distance functions for each member, and the extension can use those to calculate similarity. How do I introspect the composite type's pg_class to see what it contains? I assume there's a better way than SPI on system catalogs :) Should I be using systable_* functions from genam, or is there an in-memory tree? I feel like funcapi gets me partway there but there's magic in the middle. Can you think of any code that would serve as a sample, maybe whatever creates the output for psql's \d? The main potential disadvantage of this is that you'd have the standard tuple header as overhead in index entries --- but maybe the entries are large enough that that doesn't matter, and in any case you could probably make use of the GIST compress method to get rid of most of the header. Maybe convert to MinimalTuple, for instance, if you want to still be able to leverage existing support code for field extraction. Probably not worth it to save the 8 bytes; we're starting out at about 20 floats per row. But good to know for later optimization... - Can domains have operators, or are operators defined on types? I think the current state of play is that you can have such things but the system will only consider them for exact type matches, so you might need more explicit casts than you ordinarily would. 
However, we only support domains over base types, not composites, so this isn't really going to be a profitable direction for you anyway. Actually, as mentioned to Alexander, I'm thinking of domains per feature, not for the overall tuple, so birthdate <-> birthdate differs from now() <-> posting_date. Sounds like that might work - I'll play. - Does KNN-GiST run into problems when <-> returns values that don't make sense in the physical world? Wouldn't surprise me. In general, non-strict index operators are a bad idea. However, if the indexed entities are records, it would be entirely your own business how you handled individual fields being NULL. Yeah, that example conflated NULLs in the feature fields (we don't know your birthdate) with <-> on the whole tuple. Oops. I guess I can just test this by verifying that KNN-GiST ordered by distance returns the same results as without the index. Thanks for your help here. Jay -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
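The plan above boils down to: each feature of the composite supplies its own distance function, and the overall distance combines them. A minimal sketch in plain C (all names are made up; the real thing would be GiST support functions over the composite type) — note the squared combination is enough for KNN-style ordering, so no square root is needed:

```c
#include <math.h>

/* Per-feature distance functions, combined into one record distance.
 * wrap_dist() illustrates a feature whose distance is not plain
 * subtraction: a day-of-year that wraps around at year's end. */
typedef double (*feature_dist)(double a, double b);

static double num_dist(double a, double b)
{
    return fabs(a - b);
}

static double wrap_dist(double a, double b)
{
    double d = fabs(a - b);
    return d <= 183.0 ? d : 366.0 - d;   /* shorter way around the year */
}

/* Squared Euclidean combination of the per-feature distances. */
static double record_dist2(const double *x, const double *y,
                           feature_dist *fns, int nfeatures)
{
    double sum = 0.0;
    for (int i = 0; i < nfeatures; i++) {
        double d = fns[i](x[i], y[i]);
        sum += d * d;
    }
    return sum;
}
```

Ordering by the squared distance gives the same nearest-neighbor ranking as the true distance, which is all an ORDER BY ... &lt;-&gt; style scan needs.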
Re: [HACKERS] patch for parallel pg_dump
On Wed, Feb 8, 2012 at 7:56 PM, Joachim Wieland j...@mcknight.de wrote: So we at least need to press on far enough to get to that point. Sure, let me know if I can help you with something. Alright. I think (hope) that I've pushed this far enough to serve the needs of parallel pg_dump. The error handling is still pretty grotty and might need a bit more surgery to handle the parallel case, but I think that making this actually not ugly will require eliminating the Archive/ArchiveHandle distinction, which is probably a good thing to do but, as you point out, maybe not the first priority right now. Can you provide an updated patch? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] possible new option for wal_sync_method
Hi, On Thursday, February 16, 2012 06:18:23 PM Dan Scales wrote: When running Postgres on a single ext3 filesystem on Linux, we find that the attached simple patch gives significant performance benefit (7-8% in numbers below). The patch adds a new option for wal_sync_method, which is open_direct. With this option, the WAL is always opened with O_DIRECT (but not O_SYNC or O_DSYNC). For Linux, the use of only O_DIRECT should be correct. All WAL logs are fully allocated before being used, and the WAL buffers are 8K-aligned, so all direct writes are guaranteed to complete before returning. (See http://lwn.net/Articles/348739/) I don't think that behaviour is safe in the face of write caches in the IO path. Linux takes care to issue flush/barrier instructions when necessary if you issue an fsync/fdatasync, but to my knowledge it does not when O_DIRECT is used (that would suck performance-wise). I think that behaviour is safe if you have no externally visible write caching enabled, but that's not exactly easy knowledge to get/document. Why should there otherwise be any performance difference between O_DIRECT|O_SYNC and O_DIRECT in the WAL write case? There is no metadata that needs to be written, and I have a hard time imagining that the check for whether there is metadata is that expensive. I guess a more interesting case would be comparing O_DIRECT|O_SYNC with O_DIRECT + fdatasync() or even O_DIRECT + sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER). Any special reason you've done that comparison on ext3? Especially with data=ordered, its behaviour regarding syncs is pretty insane performance-wise. Ext4 would be a bit more interesting... Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
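For readers following the O_DIRECT discussion, here is a hedged sketch (plain C, Linux-oriented, hypothetical file name) of the write pattern being debated: try a direct, 8K-aligned write first, and fall back to a buffered write plus fdatasync when the filesystem refuses direct I/O — fdatasync being the durability the direct-only path is assumed to provide by itself when no write cache sits in the I/O path:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for O_DIRECT on glibc */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* platform without O_DIRECT: skip direct path */
#endif

/* Write one 8K-aligned, WAL-style block.  Tries O_DIRECT first (the
 * proposed open_direct behaviour); on failure falls back to a buffered
 * write followed by fdatasync().  Returns 0 on success, -1 on error. */
static int wal_style_write(const char *path)
{
    const size_t blksz = 8192;
    void *buf;
    if (posix_memalign(&buf, blksz, blksz) != 0)
        return -1;
    memset(buf, 'x', blksz);

    int rc = -1;
    int fd = O_DIRECT ? open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600) : -1;
    if (fd >= 0 && write(fd, buf, blksz) == (ssize_t) blksz)
        rc = 0;                         /* direct write succeeded */
    if (fd >= 0)
        close(fd);

    if (rc != 0) {                      /* fallback: buffered + fdatasync */
        fd = open(path, O_WRONLY | O_CREAT, 0600);
        if (fd >= 0) {
            if (write(fd, buf, blksz) == (ssize_t) blksz && fdatasync(fd) == 0)
                rc = 0;
            close(fd);
        }
    }
    free(buf);
    return rc;
}
```

The aligned buffer (posix_memalign) is what makes the direct write legal; whether the direct path alone is durable on a given stack is exactly the open question in the thread.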
Re: [HACKERS] [trivial patch] typo in doc/src/sgml/sepgsql.sgml
Alvaro Herrera alvhe...@commandprompt.com writes: If you copied from a pager, those tend to expand tabs to spaces, so the patch gets mangled at that point. At least less does that. OTOH if you :r the patch file, it works fine. You should be able to add an “inline” attachment too, and the receiver should then be able to both see it inline in the email and store it as a file. Tom uses inline attachments a lot, and apparently very few MUAs allow both features there (gnus is perfectly fine here). Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Designing an extension for feature-space similarity search
Jay Levitt jay.lev...@gmail.com writes: Perfect. Composite types are exactly what I need here; the application can declare its composite type and provide distance functions for each member, and the extension can use those to calculate similarity. How do I introspect the composite type's pg_class to see what it contains? I assume there's a better way than SPI on system catalogs :) Definitely. Take a look at record_out, record_cmp, and sibling functions on generic composites (src/backend/utils/adt/rowtypes.c). You might or might not feel like wiring in more assumptions than those have about the possible contents of the record type. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] possible new option for wal_sync_method
On Thu, Feb 16, 2012 at 19:18, Dan Scales sca...@vmware.com wrote: fsync/fdatasync can be very slow on ext3, because it seems to have to always wait for the current filesystem meta-data transaction to complete, even if that meta-data operation is completely unrelated to the file being fsync'ed. Use the data=writeback mount option to remove this restriction. This is actually the suggested setting for PostgreSQL file systems: http://www.postgresql.org/docs/current/static/wal-intro.html (Note that this is unsafe for some other applications, so I wouldn't use it on the root file system) Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] possible new option for wal_sync_method
On 2/16/12 9:18 AM, Dan Scales wrote: Do folks have any interest in this change, or comments on its usefulness/correctness? It would be just an extra option for wal_sync_method that users can try out and has benefits for certain configurations. Does it have any benefit on Ext4/XFS/Btrfs? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] proposal: copybytea command for psql
A while ago I went looking for nice ways to export an unencoded bytea value using psql, see http://people.planetpostgresql.org/andrew/index.php?/archives/196-Clever-trick-challenge.html. Regina Obe is also in want of a solution for this: http://www.postgresonline.com/journal/archives/243-PSQL-needs-a-better-way-of-outputting-bytea-to-binary-files.html. It seems like what we need is a psql command for it, something like: \copybytea (select query_returning_one_bytea) to /path/to/file Does anyone have a better solution or any objection to this feature? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: copybytea command for psql
Andrew Dunstan and...@dunslane.net writes: A while ago I went looking for nice ways to export an unencoded bytea value using psql, see http://people.planetpostgresql.org/andrew/index.php?/archives/196-Clever-trick-challenge.html. Regina Obe is also in want of a solution for this: http://www.postgresonline.com/journal/archives/243-PSQL-needs-a-better-way-of-outputting-bytea-to-binary-files.html. It seems like what we need is a psql command for it, something like: \copybytea (select query_returning_one_bytea) to /path/to/file Does anyone have a better solution or any objection to this feature? It seems awfully narrow. In the first place, why restrict it to bytea? In the second, that syntax is going to cause serious headaches, not least because backslash commands can't extend across multiple lines. The idea that comes to mind for me, if you want to connect this up to SELECT and not COPY, is some variant of \g that implies (1) pull back the data as binary not text, and (2) dump it to the target file with absolutely no recordseps, fieldseps, etc; just the bytes, ma'am. It might be worth thinking of (1) and (2) as separately invokable features, or then again it might not. I also wonder how this might interact with Peter E's recent commit for null-byte separators. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
On Thu, Feb 16, 2012 at 12:42 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Fixed, I've spent quite some time refining the API. That is better. What could still be done here is to create a cache (along the lines of attoptcache.c) that stores a big array with one slot for each supported command type. The first time a command is executed in a particular session, we construct a cache entry for that command type, storing the OIDs of the before and after triggers. Then, instead of having to do a catalog scan to notice that there are no triggers for a given command type, we can just reference into the array and see, oh, we probed before and found no triggers. CacheRegisterSyscacheCallback or some other mechanism can be used to flush all the cached information whenever we notice that pg_cmdtrigger has been updated. Now, I don't know for sure that this is necessary without performance testing, but I think we ought to try to figure that out. This version doesn't apply cleanly for me; there is a conflict in functioncmds.c, which precludes my easily benchmarking it. It strikes me that this patch introduces a possible security hole: it allows command triggers to be installed by the database owner, but that seems like it might allow the database owner to usurp the privileges of any user who runs one of these commands in his or her database, including the superuser. Perhaps that could be fixed by running command triggers as the person who created them rather than as the superuser, but that seems like it might lead to other kinds of privilege-escalation bugs. If I install a command trigger that prevents all DDL, how do I uninstall it? Or say I'm the superuser and I want to undo something my disgruntled DBA did before he quit. I would much prefer to have DropCmdTrigStmt wedge itself into the existing DropStmt infrastructure which I just recently worked so hard on. 
If you do that, you should find that you can then easily also support comments on command triggers, security labels on command triggers (though I don't know if that's useful), and the ability to include command triggers in extensions. I am a bit worried about supporting command triggers on statements that do internal transaction control, such as VACUUM and CREATE INDEX CONCURRENTLY. Obviously that's very useful and I'd like to have it, but there's a problem: if the AFTER trigger errors out, it won't undo the whole command. That might be very surprising. BEFORE triggers seem OK, and AFTER triggers might be OK too but we at least need to think hard about how to document that. I think it would be better to bail on trying to use CREATE TRIGGER and DROP TRIGGER as a basis for this functionality, and instead create completely new toplevel statements CREATE COMMAND TRIGGER and DROP COMMAND TRIGGER. Then, you could decide that all command triggers live in the same namespace, and therefore to get rid of the command trigger called bob you can just say DROP COMMAND TRIGGER bob, without having to specify the type of command it applies to. It's still clear that you're dropping a *command* trigger because that's right in the statement name, whereas it would make me a bit uneasy to decide that DROP TRIGGER bob means a command trigger just by virtue of the fact that no table name was specified. That would probably also make it easier to accomplish the above-described goal of integrating this into the DropStmt infrastructure. + if (!superuser()) + if (!pg_database_ownercheck(MyDatabaseId, GetUserId())) + aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_DATABASE, + get_database_name(MyDatabaseId)); The separate superuser check is superfluous; pg_database_ownercheck() already handles that. Can we call InitCommandContext in some centralized place, rather than separately at lots of different call sites? 
I am confused why this is adding a new file called dumpcatalog.c which looks suspiciously similar to some existing pg_dump code I've been staring at all day. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
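The lazily-built per-command cache Robert sketches could look roughly like this standalone toy — slot layout, names, and the faked catalog probe are all hypothetical; the real version would key on command tags, scan pg_cmdtrigger, and register its flush function via CacheRegisterSyscacheCallback:

```c
#include <stdbool.h>

/* One slot per supported command type, filled lazily on first lookup
 * and flushed wholesale when the catalog changes. */
#define NUM_COMMANDS 4          /* hypothetical number of command types */

typedef struct {
    bool valid;                 /* has this slot been probed yet? */
    bool has_triggers;          /* cached result of the probe */
} CmdTrigCacheEntry;

static CmdTrigCacheEntry cache[NUM_COMMANDS];
static int probe_count;         /* counts simulated catalog scans */

/* Stand-in for a pg_cmdtrigger scan; pretends only command 2 has one. */
static bool probe_catalog(int cmd)
{
    probe_count++;
    return cmd == 2;
}

static bool command_has_triggers(int cmd)
{
    if (!cache[cmd].valid) {            /* first lookup: do the scan */
        cache[cmd].has_triggers = probe_catalog(cmd);
        cache[cmd].valid = true;
    }
    return cache[cmd].has_triggers;     /* later lookups hit the cache */
}

/* What a syscache-invalidation callback would do on catalog update. */
static void flush_cmdtrig_cache(void)
{
    for (int i = 0; i < NUM_COMMANDS; i++)
        cache[i].valid = false;
}
```

The payoff is exactly the one described: the common case — no triggers on this command type — costs one array lookup instead of a catalog scan, and a flush merely marks every slot stale.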
Re: [HACKERS] proposal: copybytea command for psql
On 02/16/2012 03:32 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: A while ago I went looking for nice ways to export an unencoded bytea value using psql, see http://people.planetpostgresql.org/andrew/index.php?/archives/196-Clever-trick-challenge.html. Regina Obe is also in want of a solution for this: http://www.postgresonline.com/journal/archives/243-PSQL-needs-a-better-way-of-outputting-bytea-to-binary-files.html. It seems like what we need is a psql command for it, something like: \copybytea (select query_returning_one_bytea) to /path/to/file Does anyone have a better solution or any objection to this feature? It seems awfully narrow. In the first place, why restrict it to bytea? In the second, that syntax is going to cause serious headaches, not least because backslash commands can't extend across multiple lines. The idea that comes to mind for me, if you want to connect this up to SELECT and not COPY, is some variant of \g that implies (1) pull back the data as binary not text, and (2) dump it to the target file with absolutely no recordseps, fieldseps, etc; just the bytes, ma'am. It might be worth thinking of (1) and (2) as separately invokable features, or then again it might not. I also wonder how this might interact with Peter E's recent commit for null-byte separators. Oh, nice idea. Say \g{bn}, where b was for binary fetch/output and n was for no recordseps etc? That looks like a winner. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
Excerpts from Dimitri Fontaine's message of Thu Feb 16 14:42:26 -0300 2012: Hi, Please find attached version 8 of the patch, which fixes most of your complaints. Hi, I didn't like the new cmdtrigger.h file. It's included by a lot of other headers, and it's also itself including execnodes.h and parsenodes.h which means practically the whole lot of the source tree is included in turn. If you could split it, so that the struct definition is in a different file that's only included by the few .c files that actually use that struct, I'd be much happier. ... after looking at it more closely, I think only this line needs to be in a separate file: typedef struct CommandContextData *CommandContext; and that file is included by other headers; they don't need the function declarations or the struct definition. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
Hi, First answer now, new patch version tomorrow. Robert Haas robertmh...@gmail.com writes: That is better. Cool :) What could still be done here is to create a cache (along the lines of attoptcache.c) that stores a big array with one slot for each supported command type. The first time a command is executed in a particular session, we construct a cache entry for that command type, storing the OIDs of the before and after triggers. Then, instead of having to do a catalog scan to notice that there are no triggers for a given command type, we can just reference into the array and see, oh, we probed before and found no triggers. CacheRegisterSyscacheCallback or some other mechanism can be used to flush all the cached information whenever we notice that pg_cmdtrigger has been updated. I guess it's another spelling for a catalog cache, so I'll look at what it takes to build either a full catcache or this array cache you're talking about tomorrow. Now, I don't know for sure that this is necessary without performance testing, but I think we ought to try to figure that out. This version doesn't apply cleanly for me; there is a conflict in functioncmds.c, which precludes my easily benchmarking it. That means I need to update my master branch and merge the conflicts. You could also test right from my github branch too, I guess. https://github.com/dimitri/postgres https://github.com/dimitri/postgres/tree/command_triggers It strikes me that this patch introduces a possible security hole: it allows command triggers to be installed by the database owner, but that seems like it might allow the database owner to usurp the privileges of any user who runs one of these commands in his or her database, including the superuser. Perhaps that could be fixed by running command triggers as the person who created them rather than as the superuser, but that seems like it might lead to other kinds of privilege-escalation bugs. We could decide command triggers are superuser only. 
Security is not something I'm very strong at, so I'll leave it up to you to decide. If I install a command trigger that prevents all DDL, how do I uninstall it? Or say I'm the superuser and I want to undo something my disgruntled DBA did before he quit. Good catch, I guess we need to remove creating and dropping a command trigger from the list of supported commands in the ANY COMMAND list. I would much prefer to have DropCmdTrigStmt wedge itself into the existing DropStmt infrastructure which I just recently worked so hard on. If you do that, you should find that you can then easily also support comments on command triggers, security labels on command triggers (though I don't know if that's useful), and the ability to include command triggers in extensions. Ah yes, that too was on the TODO list, I just forgot about it. I still remember the merge conflicts when that patch went in, you know… :) I am a bit worried about supporting command triggers on statements that do internal transaction control, such as VACUUM and CREATE INDEX CONCURRENTLY. Obviously that's very useful and I'd like to have it, but there's a problem: if the AFTER trigger errors out, it won't undo the whole command. That might be very surprising. BEFORE triggers seem OK, and AFTER triggers might be OK too but we at least need to think hard about how to document that. I think we should limit the support to BEFORE command triggers only. It was unclear to me how to solve the problem for the AFTER command case, and if you're unclear too, then there aren't that many open questions. I think it would be better to bail on trying to use CREATE TRIGGER and DROP TRIGGER as a basis for this functionality, and instead create completely new toplevel statements CREATE COMMAND TRIGGER and DROP COMMAND TRIGGER. 
Then, you could decide that all command triggers live in the same namespace, and therefore to get rid of the command trigger called bob you can just say DROP COMMAND TRIGGER bob, without having to specify the type of command it applies to. It's I have no strong feeling about that. Would that require that COMMAND be more reserved than it currently is (UNRESERVED_KEYWORD)? still clear that you're dropping a *command* trigger because that's right in the statement name, whereas it would make me a bit uneasy to decide that DROP TRIGGER bob means a command trigger just by virtue of the fact that no table name was specified. That would probably also make it easier to accomplish the above-described goal of integrating this into the DropStmt infrastructure. The other thing is that you might want to drop the trigger from only one command, of course we could support both syntax. We could also add support for the following one: DROP TRIGGER bob ON ALL COMMANDS; As ALL is already a reserved word, that will work. + if (!superuser()) + if (!pg_database_ownercheck(MyDatabaseId, GetUserId())) +
Re: [HACKERS] Command Triggers
Alvaro Herrera alvhe...@commandprompt.com writes: I didn't like the new cmdtrigger.h file. It's included by a lot of other headers, and it's also itself including execnodes.h and parsenodes.h which means practically the whole lot of the source tree is included in turn. If you could split it, so that the struct definition is in a different file that's only included by the few .c files that actually use that struct, I'd be much happier. I didn't realize that, thanks for reviewing! ... after looking at it more closely, I think only this line needs to be in a separate file: typedef struct CommandContextData *CommandContext; and that file is included by other headers; they don't need the function declarations or the struct definition. I'll look into that tomorrow then. The same trick is already applied to Relation and RelationData (resp. in src/include/utils/relcache.h and src/include/utils/rel.h), and only now I understand why :) Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Bug in intarray?
Hi, On a French PostgreSQL web forum, one of our users asked about a curious behaviour of the intarray extension. This query: SELECT ARRAY[-1,3,1] & ARRAY[1, 2]; should give {1} as a result. But, on HEAD (and according to his tests, on 9.0.6 and 9.1.2), it appears to give an empty array. Digging into this issue, another user (Julien Rouhaud) made an interesting comment on this line of code: if (i + j == 0 || (i + j > 0 && *(dr - 1) != db[j])) (line 159 of contrib/intarray/_int_tool.c, current HEAD) Apparently, the code tries to check the current value of the right-side array against the previous value of the resulting array. Which clearly cannot work if there is no previous value in the resulting array. So I worked on a patch to fix this, as I think it is a bug (but I may be wrong). Patch is attached and fixes the issue AFAICT. Thanks. -- Guillaume http://blog.guillaume.lelarge.info http://www.dalibo.com diff --git a/contrib/intarray/_int_tool.c b/contrib/intarray/_int_tool.c index 79f018d..4d7a1f2 100644 --- a/contrib/intarray/_int_tool.c +++ b/contrib/intarray/_int_tool.c @@ -159,7 +159,7 @@ inner_int_inter(ArrayType *a, ArrayType *b) i++; else if (da[i] == db[j]) { - if (i + j == 0 || (i + j > 0 && *(dr - 1) != db[j])) + if (i + j == 0 || (i + j > 0 && (dr - ARRPTR(r)) == 0) || (i + j > 0 && *(dr - 1) != db[j])) *dr++ = db[j]; i++; j++; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
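The bug and the fix are easier to see in a standalone version of the merge loop (plain C, outside the server; it mirrors inner_int_inter's behaviour on pre-sorted inputs, which the real function guarantees by sorting first). The key point is that the "have we emitted anything yet?" test must look at the output pointer, not at i + j, which is nonzero as soon as either input has been advanced:

```c
#include <stddef.h>

/* Intersection of two sorted int arrays, deduplicated, written to r.
 * Returns the result length.  The dr == r test is the corrected form
 * of intarray's i + j == 0 check: only dereference *(dr - 1) once
 * something has actually been emitted. */
static size_t int_inter(const int *da, size_t na,
                        const int *db, size_t nb, int *r)
{
    size_t i = 0, j = 0;
    int *dr = r;

    while (i < na && j < nb)
    {
        if (da[i] < db[j])
            i++;
        else if (da[i] == db[j])
        {
            /* emit db[j] unless it duplicates the last emitted value */
            if (dr == r || *(dr - 1) != db[j])
                *dr++ = db[j];
            i++;
            j++;
        }
        else
            j++;
    }
    return (size_t) (dr - r);
}
```

On the reported input (sorted to {-1,1,3} and {1,2}), the first match is found after i has already advanced past -1, so i + j is nonzero and the buggy code reads uninitialized memory at *(dr - 1); the pointer test avoids that.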
Re: [HACKERS] run GUC check hooks on RESET
Tom Lane t...@sss.pgh.pa.us wrote: The main thing I would be worried about is whether you're sure that you have separated the RESET-as-a-command case from the cases where we actually are rolling back to a previous state. It looks good to me. I added a few regression tests for that. Robert Haas robertmh...@gmail.com wrote: +1 for such a comment. Added. The attached patch includes these. If it seems close, I'd be happy to come up with a version for the 9.1 branch. Basically it looks like that means omitting the changes to variable.c (which only changed message text and made the order of steps on related GUCs more consistent), and capturing a different out file for the expected directory. -Kevin *** a/src/backend/commands/variable.c --- b/src/backend/commands/variable.c *** *** 600,616 check_XactIsoLevel(char **newval, void **extra, GucSource source) if (newXactIsoLevel != XactIsoLevel) { ! if (FirstSnapshotSet) { GUC_check_errcode(ERRCODE_ACTIVE_SQL_TRANSACTION); ! GUC_check_errmsg("SET TRANSACTION ISOLATION LEVEL must be called before any query"); return false; } ! /* We ignore a subtransaction setting it to the existing value. */ ! if (IsSubTransaction()) { GUC_check_errcode(ERRCODE_ACTIVE_SQL_TRANSACTION); ! GUC_check_errmsg("SET TRANSACTION ISOLATION LEVEL must not be called in a subtransaction"); return false; } /* Can't go to serializable mode while recovery is still active */ --- 600,616 if (newXactIsoLevel != XactIsoLevel) { ! /* We ignore a subtransaction setting it to the existing value. */ ! if (IsSubTransaction()) { GUC_check_errcode(ERRCODE_ACTIVE_SQL_TRANSACTION); ! GUC_check_errmsg("cannot set transaction isolation level in a subtransaction"); return false; } ! if (FirstSnapshotSet) { GUC_check_errcode(ERRCODE_ACTIVE_SQL_TRANSACTION); ! 
GUC_check_errmsg("transaction isolation level must be set before any query"); return false; } /* Can't go to serializable mode while recovery is still active */ *** *** 665,677 check_transaction_deferrable(bool *newval, void **extra, GucSource source) if (IsSubTransaction()) { GUC_check_errcode(ERRCODE_ACTIVE_SQL_TRANSACTION); ! GUC_check_errmsg("SET TRANSACTION [NOT] DEFERRABLE cannot be called within a subtransaction"); return false; } if (FirstSnapshotSet) { GUC_check_errcode(ERRCODE_ACTIVE_SQL_TRANSACTION); ! GUC_check_errmsg("SET TRANSACTION [NOT] DEFERRABLE must be called before any query"); return false; } --- 665,677 if (IsSubTransaction()) { GUC_check_errcode(ERRCODE_ACTIVE_SQL_TRANSACTION); ! GUC_check_errmsg("cannot set transaction deferrable state within a subtransaction"); return false; } if (FirstSnapshotSet) { GUC_check_errcode(ERRCODE_ACTIVE_SQL_TRANSACTION); ! GUC_check_errmsg("transaction deferrable state must be set before any query"); return false; } *** a/src/backend/utils/misc/guc.c --- b/src/backend/utils/misc/guc.c *** *** 5042,5047 config_enum_get_options(struct config_enum * record, const char *prefix, --- 5042,5051 * * If value is NULL, set the option to its default value (normally the * reset_val, but if source == PGC_S_DEFAULT we instead use the boot_val). + * When we reset a value we can't assume that the old value is valid, but must + * call the check hook if present. This is because the validity of a change + * might depend on context. For example, transaction isolation may not be + * changed after the transaction has run a query, even by a RESET command. * * action indicates whether to set the value globally in the session, locally * to the current top transaction, or just for the duration of a function call. *** *** 5287,5302 set_config_option(const char *name, const char *value, name))); return 0; } - if (!call_bool_check_hook(conf, newval, newextra, - source, elevel)) -
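The point of the patch — RESET must run the same check hook as SET, because a change's validity depends on context, not just on the value — can be sketched with a toy GUC (all names here are made up; the real machinery lives in guc.c and variable.c):

```c
#include <stdbool.h>

/* Toy context-sensitive check hook, mirroring the transaction_isolation
 * case: once the first snapshot is taken, neither SET nor RESET may
 * change the value. */
static bool first_snapshot_set;     /* stand-in for FirstSnapshotSet */

static int iso_level;               /* current value */
static const int iso_reset_val = 0; /* session default */

static bool check_iso(int newval)
{
    if (newval != iso_level && first_snapshot_set)
        return false;   /* "must be set before any query" */
    return true;
}

static bool set_iso(int newval)
{
    if (!check_iso(newval))
        return false;
    iso_level = newval;
    return true;
}

/* The fix in miniature: RESET goes through the same check hook as SET,
 * rather than assuming the old (reset) value is still valid to apply. */
static bool reset_iso(void)
{
    return set_iso(iso_reset_val);
}
```

Without routing the reset through check_iso(), a RESET issued mid-transaction would silently change the isolation level after a query had already run — exactly the hole the patch closes.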
Re: [HACKERS] Command Triggers
On Thu, Feb 16, 2012 at 4:21 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: We could decide command triggers are superuser only. Security is not something I'm very strong at, so I'll leave it up to you to decide. That's certainly the easiest option. If you don't feel passionate about spending a lot of energy figuring out how to make it secure, then I suggest we just restrict it to superusers until someone does. If I install a command trigger that prevents all DDL, how do I uninstall it? Or say I'm the superuser and I want to undo something my disgruntled DBA did before he quit. Good catch, I guess we need to remove creating and dropping a command trigger from the list of supported commands in the ANY COMMAND list. Another option would be to add a PGC_SUSET boolean GUC that can be used to disable command triggers. I think that might be more flexible, not to mention useful for recursion prevention. I am a bit worried about supporting command triggers on statements that do internal transaction control, such as VACUUM and CREATE INDEX CONCURRENTLY. Obviously that's very useful and I'd like to have it, but there's a problem: if the AFTER trigger errors out, it won't undo the whole command. That might be very surprising. BEFORE triggers seem OK, and AFTER triggers might be OK too but we at least need to think hard about how to document that. I think we should limit the support to BEFORE command triggers only. It was unclear to me how to solve the problem for the AFTER command case, and if you're unclear too, then there aren't that many open questions. Works for me. I think it would be better to bail on trying to use CREATE TRIGGER and DROP TRIGGER as a basis for this functionality, and instead create completely new toplevel statements CREATE COMMAND TRIGGER and DROP COMMAND TRIGGER. 
Then, you could decide that all command triggers live in the same namespace, and therefore to get rid of the command trigger called bob you can just say DROP COMMAND TRIGGER bob, without having to specify the type of command it applies to. It's I have no strong feeling about that. Would that require that COMMAND be more reserved than it currently is (UNRESERVED_KEYWORD)? It shouldn't. still clear that you're dropping a *command* trigger because that's right in the statement name, whereas it would make me a bit uneasy to decide that DROP TRIGGER bob means a command trigger just by virtue of the fact that no table name was specified. That would probably also make it easier to accomplish the above-described goal of integrating this into the DropStmt infrastructure. The other thing is that you might want to drop the trigger from only one command; of course we could support both syntaxes. We could also add support for the following one: DROP TRIGGER bob ON ALL COMMANDS; Uh, hold on. Creating a trigger on multiple commands ought to only create one entry in pg_cmdtrigger. If you drop it, you drop the whole thing. Changing which commands the trigger applies to would be the job for ALTER, not DROP. But note that we have no similar functionality for regular triggers, so I don't think we really need it here either. Can we call InitCommandContext in some centralized place, rather than separately at lots of different call sites? Then you have to either make the current command context a backend private global variable, or amend even more call sites to pass it down. The global variable idea does not work, see RemoveRelations() and RemoveObjects() which are handling an array of command contexts. So do you prefer lots of InitCommandContext() or adding another parameter to almost all functions called by standard_ProcessUtility()? Blech. Maybe we should just have CommandFiresTriggers initialize it if not already done? 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
Robert Haas robertmh...@gmail.com writes: On Thu, Feb 16, 2012 at 4:21 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: That's certainly the easiest option. If you don't feel passionate about spending a lot of energy figuring out how to make it secure, then I suggest we just restrict it to superusers until someone does. Works for me. If I install a command trigger that prevents all DDL, how do I uninstall it? Or say I'm the superuser and I want to undo something my disgruntled DBA did before he quit. Good catch, I guess we need to exclude creating and dropping a command trigger from the list of commands covered by ANY COMMAND. Another option would be to add a PGC_SUSET boolean GUC that can be used to disable command triggers. I think that might be more flexible, not to mention useful for recursion prevention. Wait, we already have ALTER TRIGGER bob ON ANY COMMAND SET DISABLED; which I had forgotten about in the previous answer, so I think we're good as it is. That's how I prevented recursion in some of my tests (not included in the regress tests; I was using INSTEAD OF). DROP TRIGGER bob ON ALL COMMANDS; Uh, hold on. Creating a trigger on multiple commands ought to only create one entry in pg_cmdtrigger. If you drop it, you drop the whole thing. Changing which commands the trigger applies to would be the job for ALTER, not DROP. But note that we have no similar functionality for regular triggers, so I don't think we really need it here either. Why would we do it that way (a single entry for multiple commands)? The way it is now, it's only syntactic sugar, so I think it's easier to implement, document and use. So do you prefer lots of InitCommandContext() or adding another parameter to almost all functions called by standard_ProcessUtility()? Blech. Maybe we should just have CommandFiresTriggers initialize it if not already done? 
Thing is, in a vast number of cases (almost all of the ALTER OBJECT name, namespace and owner commands) you don't have the Node * parse tree any more from the place where you check for CommandFiresTriggers(), so you can't initialize there. That's part of the fun. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Designing an extension for feature-space similarity search
Tom Lane wrote: - Can domains have operators, or are operators defined on types? I think the current state of play is that you can have such things but the system will only consider them for exact type matches, so you might need more explicit casts than you ordinarily would. Turns out it's even smarter than that; it seems to coerce when it's unambiguous:

create domain birthdate as date;
create function date_dist(birthdate, birthdate) returns integer
  as $$ select 123; $$ language sql;
create operator <-> (procedure = date_dist,
  leftarg = birthdate, rightarg = birthdate);

select '2012-01-01'::birthdate <-> '2012-01-01'::birthdate;  -- 123
select '2012-01-01'::date <-> '2012-01-01'::date;            -- 123

create domain activity_date as date;
create function date_dist(activity_date, activity_date) returns integer
  as $$ select 432; $$ language sql;
create operator <-> (procedure = date_dist,
  leftarg = activity_date, rightarg = activity_date);

select '2012-01-01'::activity_date <-> '2012-01-01'::activity_date;  -- 432
select '2012-01-01'::birthdate <-> '2012-01-01'::birthdate;          -- 123
select '2012-01-01'::date <-> '2012-01-01'::date;
-- ERROR: operator is not unique: date <-> date
[HACKERS] Qual evaluation cost estimates for GIN indexes
I looked into the complaint here of poor estimation for GIN indexscans: http://archives.postgresql.org/pgsql-performance/2012-02/msg00028.php At first glance it sounds like a mistake in selectivity estimation, but it isn't: the rowcount estimates are pretty nearly dead on. The problem is in the planner's estimate of the cost of executing the @@ operator. We have pg_proc.procost set to 1 for ts_match_vq, but actually it's a good deal more expensive than that. Some experimentation suggests that @@ might be about 500 times as expensive as a simple integer comparison. I don't propose pushing its procost up that much, but surely at least 10 would be appropriate, maybe even 100. However ... if you just alter pg_proc.procost in Marc's example, the planner *still* picks a seqscan, even though its estimate of the seqscan cost surely does go up. The reason is that its estimate of the GIN indexscan cost goes up just as much, since we charge one qual eval cost per returned tuple in gincostestimate. It is easy to tell from the actual runtimes that that is not what's happening in a GIN indexscan; we are not re-executing the @@ operator for every tuple. But the planner's cost model doesn't know that. There are a couple of issues that would have to be addressed to make this better: 1. If we shouldn't charge procost per row, what should we charge? It's probably reasonable to assume that the primitive GIN-index-entry comparison operations have cost equal to one cpu_operator_cost regardless of what's assigned to the user-visible operators, but I'm not convinced that that's sufficient to model complicated operators. It might be okay to charge that much per index entry visited rather than driving it off the number of heap tuples returned. 
The code in gincostestimate goes to considerable lengths to estimate the number of index pages fetched, and it seems like it might be able to derive the number of index entries visited too, but it's not trying to account for any CPU costs at the moment. 2. What about lossy operators, or lossy bitmap scans? If either of those things happen, we *will* re-execute the @@ operator at visited tuples, so discounting its high cost would be a mistake. But the planner has no information about either effect, since we moved all support for lossiness to runtime. I think it was point #2 that led us to not consider these issues before. But now that we've seen actual cases where the planner makes a poor decision because it's not modeling this effect, I think we ought to try to do something about it. I haven't got time to do anything about this for 9.2, and I bet you don't either, but it ought to be on the TODO list to try to improve this. BTW, an entirely different line of thought is "why on earth is @@ so frickin expensive, when it's comparing already-processed tsvectors with only a few entries to an already-processed tsquery with only one entry??". This test case suggests to me that there's something unnecessarily slow in there, and a bit of micro-optimization effort might be well repaid. regards, tom lane
Re: [HACKERS] Qual evaluation cost estimates for GIN indexes
I wrote: BTW, an entirely different line of thought is "why on earth is @@ so frickin expensive, when it's comparing already-processed tsvectors with only a few entries to an already-processed tsquery with only one entry??". This test case suggests to me that there's something unnecessarily slow in there, and a bit of micro-optimization effort might be well repaid. Oh, scratch that: a bit of oprofiling shows that while the tsvectors aren't all that long, they are long enough to get compressed, and most of the runtime is going into pglz_decompress, not @@ itself. So this goes back to the known issue that the planner ought to try to account for detoasting costs. regards, tom lane
Re: [HACKERS] Designing an extension for feature-space similarity search
Jay Levitt jay.lev...@gmail.com writes: Tom Lane wrote: - Can domains have operators, or are operators defined on types? I think the current state of play is that you can have such things but the system will only consider them for exact type matches, so you might need more explicit casts than you ordinarily would. Turns out it's even smarter than that; it seems to coerce when it's unambiguous: Yeah, that sort of case will probably work all right. The thing to keep in mind is that operators/functions declared to take domains are definitely second-class citizens, and will lose out in many somewhat-ambiguous cases where an operator on a base type could get selected due to the ambiguity resolution rules. For your application it will probably help if you can pick an operator name that's not in use for any operation on the base type(s). Also, it's conceivable that it won't matter much to you, since I gather these operators will mostly get invoked behind the scenes and not directly written in queries; it should not be too hard to avoid ambiguity in mechanically-generated references. (There was a good deal of chatter some years ago about trying to improve support of functions taking domains, but nothing's gotten done yet.) regards, tom lane
Re: [HACKERS] possible new option for wal_sync_method
Good point, thanks. From the ext3 source code, it looks like ext3_sync_file() does a blkdev_issue_flush(), which issues a flush to the block device, whereas simple direct IO does not. So that would make this wal_sync_method option less useful, since, as you say, the user would have to know if the block device is doing write caching. For the numbers I reported, I don't think the performance gain is from not doing the block device flush. The system being measured is a Fibre Channel disk which should have a fully nonvolatile disk array. And measurements using systemtap show that blkdev_issue_flush() always takes only in the microsecond range. I think the overhead is still from the fact that ext3_sync_file() waits for the current in-flight transaction if there is one (and does an explicit device flush if there is no transaction to wait for). I do think there are lots of metadata operations happening on the data files (especially for a growing database), so the WAL log commit is waiting for unrelated data operations. It would be nice if there were a simple file system operation that just flushed the cache of the block device containing the filesystem (i.e. just does the blkdev_issue_flush() and not the other things in ext3_sync_file()). The ext4_sync_file() code looks fairly similar, so I think it may have the same problem, though I can't be positive. In that case, this wal_sync_method option might help ext4 as well. With respect to sync_file_range(), the Linux code that I'm looking at doesn't really seem to indicate that there is a device flush (since it never calls an f_op->fsync_file operation). So sync_file_range() may not be as useful as thought. By the way, all the numbers were measured with the data=writeback, barrier=1 options for ext3. I don't think that I have seen a significant difference with the DBT2 workload for the ext3 option data=ordered. 
I will measure all these numbers again tonight, but with barrier=0, so as to try to confirm that the write flush itself isn't costing a lot for this configuration. Dan - Original Message - From: Andres Freund and...@anarazel.de To: pgsql-hackers@postgresql.org Cc: Dan Scales sca...@vmware.com Sent: Thursday, February 16, 2012 10:32:09 AM Subject: Re: [HACKERS] possible new option for wal_sync_method Hi, On Thursday, February 16, 2012 06:18:23 PM Dan Scales wrote: When running Postgres on a single ext3 filesystem on Linux, we find that the attached simple patch gives a significant performance benefit (7-8% in the numbers below). The patch adds a new option for wal_sync_method, which is open_direct. With this option, the WAL is always opened with O_DIRECT (but not O_SYNC or O_DSYNC). For Linux, the use of only O_DIRECT should be correct. All WAL logs are fully allocated before being used, and the WAL buffers are 8K-aligned, so all direct writes are guaranteed to complete before returning. (See http://lwn.net/Articles/348739/) I don't think that behaviour is safe in the face of write caches in the IO path. Linux takes care to issue flush/barrier instructions when necessary if you issue an fsync/fdatasync, but to my knowledge it does not when O_DIRECT is used (that would suck performancewise). I think that behaviour is safe if you have no externally visible write caching enabled, but that's not exactly easy-to-get/document knowledge. Why should there otherwise be any performance difference between O_DIRECT|O_SYNC and O_DIRECT in the WAL write case? There is no metadata that needs to be written, and I have a hard time imagining that the check whether there is metadata is that expensive. I guess a more interesting case would be comparing O_DIRECT|O_SYNC with O_DIRECT + fdatasync() or even O_DIRECT + sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER) Any special reason you've done that comparison on ext3? 
Especially with data=ordered its behaviour regarding syncs is pretty insane performancewise. Ext4 would be a bit more interesting... Andres
Re: [HACKERS] Bug in intarray?
Guillaume Lelarge guilla...@lelarge.info writes: This query: SELECT ARRAY[-1,3,1] & ARRAY[1, 2]; should give {1} as a result. But, on HEAD (and according to his tests, on 9.0.6 and 9.1.2), it appears to give an empty array. Definitely a bug, and I'll bet it goes all the way back. Digging on this issue, another user (Julien Rouhaud) made an interesting comment on this line of code: if (i + j == 0 || (i + j > 0 && *(dr - 1) != db[j])) (line 159 of contrib/intarray/_int_tool.c, current HEAD) Apparently, the code tries to check the current value of the right-side array against the previous value of the resulting array. Which clearly cannot work if there is no previous value in the resulting array. So I worked on a patch to fix this, as I think it is a bug (but I may be wrong). Patch is attached and fixes the issue AFAICT. Yeah, this code is bogus, but it's also pretty unreadable. I think it's better to get rid of the inconsistently-used pointer arithmetic and the fundamentally wrong/irrelevant test on i+j, along the lines of the attached. regards, tom lane

diff --git a/contrib/intarray/_int_tool.c b/contrib/intarray/_int_tool.c
index 79f018d333b1d6810aa7fd2996c158b41b0263fb..132d15316054d842593468f191eca9053846f706 100644
*** a/contrib/intarray/_int_tool.c
--- b/contrib/intarray/_int_tool.c
*** inner_int_inter(ArrayType *a, ArrayType *b)
*** 140,146 ****
  			   *db,
  			   *dr;
  	int			i,
! 				j;
  
  	if (ARRISEMPTY(a) || ARRISEMPTY(b))
  		return new_intArrayType(0);
--- 140,147 ----
  			   *db,
  			   *dr;
  	int			i,
! 				j,
! 				k;
  
  	if (ARRISEMPTY(a) || ARRISEMPTY(b))
  		return new_intArrayType(0);
*** 152,166 ****
  	r = new_intArrayType(Min(na, nb));
  	dr = ARRPTR(r);
  
! 	i = j = 0;
  	while (i < na && j < nb)
  	{
  		if (da[i] < db[j])
  			i++;
  		else if (da[i] == db[j])
  		{
! 			if (i + j == 0 ||
! 				(i + j > 0 && *(dr - 1) != db[j]))
! 				*dr++ = db[j];
  			i++;
  			j++;
  		}
--- 153,167 ----
  	r = new_intArrayType(Min(na, nb));
  	dr = ARRPTR(r);
  
! 	i = j = k = 0;
  	while (i < na && j < nb)
  	{
  		if (da[i] < db[j])
  			i++;
  		else if (da[i] == db[j])
  		{
! 			if (k == 0 || dr[k - 1] != db[j])
! 				dr[k++] = db[j];
  			i++;
  			j++;
  		}
*** 168,180 ****
  		else
  			j++;
  	}
  
! 	if ((dr - ARRPTR(r)) == 0)
  	{
  		pfree(r);
  		return new_intArrayType(0);
  	}
  	else
! 		return resize_intArrayType(r, dr - ARRPTR(r));
  }
  
  void
--- 169,181 ----
  		else
  			j++;
  	}
  
! 	if (k == 0)
  	{
  		pfree(r);
  		return new_intArrayType(0);
  	}
  	else
! 		return resize_intArrayType(r, k);
  }
  
  void
diff --git a/contrib/intarray/expected/_int.out b/contrib/intarray/expected/_int.out
index 6ed3cc6ced096e2e97665e76ceba7fc139415029..4080b9633fe98a91861684e0d82f1297c21b91af 100644
*** a/contrib/intarray/expected/_int.out
--- b/contrib/intarray/expected/_int.out
*** SELECT '{123,623,445}'::int[] & '{1623,623}';
*** 137,142 ****
--- 137,148 ----
   {623}
  (1 row)
  
+ SELECT '{-1,3,1}'::int[] & '{1,2}';
+  ?column?
+ ----------
+  {1}
+ (1 row)
+ 
  --test query_int
  SELECT '1'::query_int;
   query_int
diff --git a/contrib/intarray/sql/_int.sql b/contrib/intarray/sql/_int.sql
index b60e936dc520d8308bd16e50681f061dc86e2dfe..216c5c58d615a7cd1a5fe3714c9a5c91fb255138 100644
*** a/contrib/intarray/sql/_int.sql
--- b/contrib/intarray/sql/_int.sql
*** SELECT '{123,623,445}'::int[] | 623;
*** 24,29 ****
--- 24,30 ----
  SELECT '{123,623,445}'::int[] | 1623;
  SELECT '{123,623,445}'::int[] | '{1623,623}';
  SELECT '{123,623,445}'::int[] & '{1623,623}';
+ SELECT '{-1,3,1}'::int[] & '{1,2}';
  
  --test query_int
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
(2012/02/17 0:15), Albe Laurenz wrote: Shigeru Hanada wrote: Thanks for the review. Attached patches are revised versions, though only fdw_helper_v5.patch is unchanged. Two questions: - Is it on purpose that you can specify all SSL client options except sslcompression? No, just an oversight. Good catch. - Since a rescan is done by rewinding the cursor, is it necessary to have any other remote isolation level than READ COMMITTED? There is only one query issued per transaction. If multiple foreign tables on a foreign server are used in a local query, multiple queries are executed in a remote transaction. So IMO isolation levels are useful even if a remote query is executed only once. Regards, -- Shigeru Hanada
Re: [HACKERS] Qual evaluation cost estimates for GIN indexes
On Thu, Feb 16, 2012 at 6:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: BTW, an entirely different line of thought is "why on earth is @@ so frickin expensive, when it's comparing already-processed tsvectors with only a few entries to an already-processed tsquery with only one entry??". This test case suggests to me that there's something unnecessarily slow in there, and a bit of micro-optimization effort might be well repaid. Oh, scratch that: a bit of oprofiling shows that while the tsvectors aren't all that long, they are long enough to get compressed, and most of the runtime is going into pglz_decompress, not @@ itself. So this goes back to the known issue that the planner ought to try to account for detoasting costs. This issue of detoasting costs comes up a lot, specifically in reference to @@. I wonder if we shouldn't try to apply some quick-and-dirty hack in time for 9.2, like maybe charging random_page_cost for every row or every attribute we think will require detoasting. That's obviously going to be an underestimate in many if not most cases, but it would probably still be an improvement over assuming that detoasting is free. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Qual evaluation cost estimates for GIN indexes
Robert Haas robertmh...@gmail.com writes: This issue of detoasting costs comes up a lot, specifically in reference to @@. I wonder if we shouldn't try to apply some quick-and-dirty hack in time for 9.2, like maybe charging random_page_cost for every row or every attribute we think will require detoasting. That's obviously going to be an underestimate in many if not most cases, but it would probably still be an improvement over assuming that detoasting is free. Well, you can't theorize without data, to misquote Sherlock. We'd need to have some stats on which to base "we think this will require detoasting". I guess we could teach ANALYZE to compute and store fractions "percent of entries in this column that are compressed" and "percent that are stored out-of-line", and then hope that those percentages apply to the subset of entries that a given query will visit, and thereby derive a number of operations to multiply by whatever we think the cost-per-detoast-operation is. It's probably all do-able, but it seems way too late to be thinking about this for 9.2. We've already got a ton of new stuff that needs to be polished and tuned... regards, tom lane
Re: [HACKERS] Command Triggers
On Thu, Feb 16, 2012 at 5:38 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Another option would be to add a PGC_SUSET boolean GUC that can be used to disable command triggers. I think that might be more flexible, not to mention useful for recursion prevention. Wait, we already have ALTER TRIGGER bob ON ANY COMMAND SET DISABLED; which I had forgotten about in the previous answer, so I think we're good as it is. That's how I prevented recursion in some of my tests (not included in the regress tests; I was using INSTEAD OF). Eh, so what happens then if someone sets a command trigger on ALTER TRIGGER? DROP TRIGGER bob ON ALL COMMANDS; Uh, hold on. Creating a trigger on multiple commands ought to only create one entry in pg_cmdtrigger. If you drop it, you drop the whole thing. Changing which commands the trigger applies to would be the job for ALTER, not DROP. But note that we have no similar functionality for regular triggers, so I don't think we really need it here either. Why would we do it that way (a single entry for multiple commands)? The way it is now, it's only syntactic sugar, so I think it's easier to implement, document and use. Well, for one thing, it's consistent with how we handle it for regular triggers. For two things, if you create an object named bob, you expect to end up with an object named bob - not 47 objects (or whatever) that are all named bob. Also, suppose you create a trigger on ALL COMMANDS, and then a new version of PG adds a new command. When you dump and reload, do you expect to end up with a trigger on all commands that existed in the old version, or all the commands that exist in the new version? Or conversely, suppose we get rid of a command in a future release. How will we handle that? I can't think of another example of where a CREATE command creates multiple objects like that. So do you prefer lots of InitCommandContext() or adding another parameter to almost all functions called by standard_ProcessUtility()? Blech. 
Maybe we should just have CommandFiresTriggers initialize it if not already done? Thing is, in a vast number of cases (almost all of the ALTER OBJECT name, namespace and owner commands) you don't have the Node * parse tree any more from the place where you check for CommandFiresTriggers(), so you can't initialize there. That's part of the fun. Hmm, I'll look at this in more detail next time through. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)
On Mon, Feb 13, 2012 at 8:37 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 13.02.2012 01:04, Jeff Janes wrote: Attached is my quick and dirty attempt to set XLP_FIRST_IS_CONTRECORD. I have no idea if I did it correctly, in particular if calling GetXLogBuffer(CurrPos) twice is OK or if GetXLogBuffer has side effects that make that a bad thing to do. I'm not proposing it as the real fix, I just wanted to get around this problem in order to do more testing. Thanks. That's basically the right approach. Attached patch contains a cleaned-up version of that. Got another problem: when I ran pg_stop_backup to take an online backup, it got stuck until I had generated a new WAL record. This happens because, in the patch, when pg_stop_backup forces a switch to a new WAL file, the old WAL file is not marked as archivable until the next WAL record has been inserted, but pg_stop_backup keeps waiting for that WAL file to be archived. OTOH, without the patch, the WAL file is marked as archivable as soon as the WAL file switch occurs. So, in short, the patch seems to handle the WAL file switch logic incorrectly. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
[HACKERS] MySQL search query is not executing in Postgres DB
In MySQL the below query executes properly. SELECT * FROM Table-name WHERE (Table.ID LIKE '1%') But when I try to execute the above query in Postgres, I get the following exception: org.postgresql.util.PSQLException: ERROR: operator does not exist: integer ~~ unknown Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts. If I convert the same query to SELECT * FROM Table-name WHERE CAST(Table.ID as TEXT) LIKE '1%', it gets executed directly in the Postgres DB. But I need some way to implicitly type cast in the DB, which would allow me to execute the MySQL query without any exception, because I remember there is a way for an integer-to-boolean implicit type cast. Please refer to the following link. http://archives.postgresql.org/pgsql-general/2011-01/msg00866.php Thanks in advance. -- View this message in context: http://postgresql.1045698.n5.nabble.com/MySQL-search-query-is-not-executing-in-Postgres-DB-tp5491531p5491531.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] Qual evaluation cost estimates for GIN indexes
Hi. First, thanks for looking at this. Apart from GIN indexes and full-text search being really good in our applications, this also points to the exact places where they can be improved. On 2012-02-17 00:15, Tom Lane wrote: I looked into the complaint here of poor estimation for GIN indexscans: http://archives.postgresql.org/pgsql-performance/2012-02/msg00028.php I think this is the exact same issue: http://archives.postgresql.org/pgsql-hackers/2011-11/msg01754.php At first glance it sounds like a mistake in selectivity estimation, but it isn't: the rowcount estimates are pretty nearly dead on. The problem is in the planner's estimate of the cost of executing the @@ operator. We have pg_proc.procost set to 1 for ts_match_vq, but actually it's a good deal more expensive than that. Some experimentation suggests that @@ might be about 500 times as expensive as a simple integer comparison. I don't propose pushing its procost up that much, but surely at least 10 would be appropriate, maybe even 100. However ... if you just alter pg_proc.procost in Marc's example, the planner *still* picks a seqscan, even though its estimate of the seqscan cost surely does go up. The reason is that its estimate of the GIN indexscan cost goes up just as much, since we charge one qual eval cost per returned tuple in gincostestimate. It is easy to tell from the actual runtimes that that is not what's happening in a GIN indexscan; we are not re-executing the @@ operator for every tuple. But the planner's cost model doesn't know that. There is something about lossy vs. non-lossy: if the index result is lossy, then it would need to execute the @@ operator on each tuple and de-toast the toasted stuff and go all the way. If it isn't, then at least count() on a GIN index should be able to utilize an index-only scan now? I've had a significant amount of struggle over the years in this corner, and the patch that went in for gincostestimate brought a huge set of problems to the ground, but not all. 
Other related threads: http://archives.postgresql.org/pgsql-performance/2010-05/msg00031.php (ts_match_vq cost in discussion) http://archives.postgresql.org/pgsql-performance/2010-05/msg00266.php I don't think I have ever seen the actual run-time of any @@ query be faster going through the seqscan than going through the index, not even if it is pulling nearly all the tuples out (test case that tries to go in that corner): http://archives.postgresql.org/pgsql-performance/2009-10/msg00393.php And I think it is due to a couple of real-world things: 1) The tsvector column is typically toasted. 2) The selected columns are typically in the main table. 3) The gin-index search + pulling the main table is in fact a measurably cheaper operation than pulling main+toast, uncompressing the toast and applying ts_match_vq, even in the most favourable case for the seqscan. Another real-world thing is that since the tsvector column is in toast and isn't read when performing a bitmap heap scan, in addition to the decompress cost it is almost never hot in memory either, making its actual runtime even worse. The same problems hit an index scan on another key when filtering on a @@ operator, but I think I got around most of them by bumping both the cost of @@ and the limit in the query to 10K instead of the 200 actually wanted. I do think I have been digging sufficiently in this corner and can fairly easily test and craft test examples that will demonstrate the challenges (a few are attached in the above links). Thanks for digging in this corner. Let me know if I can help, although my actual coding skills are sparse (at best). -- Jesper
Re: [HACKERS] MySQL search query is not executing in Postgres DB
On 17.02.2012 07:33, premanand wrote: In MySQL the below query is executing properly. SELECT * FROM Table-name WHERE (Table.ID LIKE '1%') But when i try to execute the above query in Postgres, i get the following Exception org.postgresql.util.PSQLException: ERROR: operator does not exist: integer ~~ unknown Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts. If i convert the same query SELECT * FROM Table-name WHERE CAST(Table.ID as TEXT) LIKE '1%' . This gets executed directly in Postgres DB. But i need some query which implicitly type cast in DB, which allows me to execute the MySQL query without any Exception. Because i remember there is a way for integer to boolean implicit type cast. Please refer the following link. http://archives.postgresql.org/pgsql-general/2011-01/msg00866.php You can use CREATE CAST (http://www.postgresql.org/docs/current/static/sql-createcast.html). Or you can create the operator integer ~~ text with CREATE FUNCTION + CREATE OPERATOR. The latter would match fewer cases, which would reduce the chances of introducing subtle bugs elsewhere in your application. Of course, the best fix would be to change your queries. It's quite sloppy to rely on integer LIKE text without an explicit cast in the query. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] WIP: Collecting statistics on CSV file data
Hi Hanada-san, Sorry for the late response. (2012/02/10 22:05), Shigeru Hanada wrote: (2011/12/15 11:30), Etsuro Fujita wrote: (2011/12/14 15:34), Shigeru Hanada wrote: I think this patch could be marked as Ready for committer with some minor fixes. Please find attached a revised patch (v6.1). I've tried to make pgsql_fdw work with this feature, and found that a few static functions needed to be exported to implement an ANALYZE handler in short-cut style. The short-cut style means generating statistics (pg_class and pg_statistic) for foreign tables without retrieving sample data from the foreign server. That's great! Here is my review. The patch applies with some modifications and compiles cleanly. But regression tests on subqueries failed, in addition to the role-related tests discussed earlier. While I've not looked at the patch in detail, I have some comments: 1. The patch might need code to handle the irregular case where ANALYZE-related catalog data such as attstattarget is different between the local and the remote. (Although we don't have options to set such data on a foreign table in ALTER FOREIGN TABLE.) For example, while attstattarget = -1 for some column on the local, attstattarget = 0 for that column on the remote, meaning that there can be no stats available for that column. In such a case it would be better to inform the user of it. 2. It might be better for the FDW to estimate the costs of a remote query for itself, without doing EXPLAIN, if stats were available using this feature. While this approach is less accurate than the EXPLAIN approach, due to the lack of information such as seq_page_cost or random_page_cost on the remote, it is cheaper! I think such information may be added to the generic options for a foreign table, which may have been previously discussed. 3. In implementing the ANALYZE handler, the hardest part was copying anyarray values from remote to local. 
If we can make it common in core, it would help FDW authors who want to implement an ANALYZE handler without retrieving sample rows from the remote server. +1 from me. Best regards, Etsuro Fujita