Re: [HACKERS] Speed dblink using alternate libpq tuple storage
On Thu, Feb 16, 2012 at 02:24:19PM +0900, Kyotaro HORIGUCHI wrote:
> As far as I see, an out-of-memory in getAnotherTuple() makes
> conn->result->resultStatus = PGRES_FATAL_ERROR, and pqParseInput[23]()
> consequently skips the succeeding 'D' messages. When an exception is
> raised within the row processor, pg_conn->inCursor is always left at a
> consistent position and result->resultStatus == PGRES_TUPLES_OK.
>
> The choices of the libpq user at that point are:
>
> - Continue to read the succeeding tuples: call PQgetResult() to read
>   the 'D' messages and hand them to the row processor one by one.
>
> - Throw away the remaining results: call pqClearAsyncResult() and
>   pqSaveErrorResult(), then call PQgetResult() to skip over the
>   succeeding 'D' messages. (Of course the user can't do that in the
>   current implementation.)

There is also a third choice, which may be even more popular than those
two - PQfinish().

> To let users select the second choice (I think this is the major
> case), we only need to provide and export a new PQ* function to do
> that, I think.

I think we already have such a function - PQsetRowProcessor().
Considering that the user can use it to install a skipping callback, or
simply set some flag in its own per-connection state, I suspect the need
is not that big.

But if you want to set the error state for skipping, I would instead
generalize PQsetRowProcessorErrMsg() to support setting the error state
from outside the callback. That would also help external processing with
'return 2'. But I would keep the requirement that row processing must be
ongoing - standalone error setting does not make sense - so the name can
stay.

There seem to be two ways to do it:

1) Replace the PGresult under PGconn. This seems to require that
   PQsetRowProcessorErrMsg take PGconn as argument instead of PGresult.
   This also means the callback should get PGconn as argument. Kind of
   makes sense, even.

2) Convert the current partial PGresult to error state. That also makes
   sense; the current use of ->rowProcessorErrMsg to transport the
   message so the error can be set later in the caller feels weird.

I guess I slightly prefer 2) to 1).

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Google Summer of Code? Call for mentors.
On Wed, Feb 15, 2012 at 10:18 PM, Josh Berkus j...@agliodbs.com wrote:
> Hackers,
>
> The call is now open for Google Summer of Code. If you are interested
> in being a GSoC mentor this summer, please reply to this email. I want
> to gauge whether or not we should participate this summer.

I'll play, but I think I'm going to have to be very sure about any
project before I agree to mentor it this year (not that there was
anything wrong with my student last year - quite the opposite, in fact -
I just need to be sure I'm spending my time on the right things).

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)
On Thu, Feb 16, 2012 at 5:02 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
> On 15.02.2012 18:52, Fujii Masao wrote:
>> On Thu, Feb 16, 2012 at 1:01 AM, Heikki Linnakangas
>> heikki.linnakan...@enterprisedb.com wrote:
>>> Are you still seeing this failure with the latest patch I posted
>>> (http://archives.postgresql.org/message-id/4f38f5e5.8050...@enterprisedb.com)?
>>
>> Yes. Just to be safe, I again applied the latest patch to HEAD,
>> compiled that, and tried the same test. Then, unfortunately, I got
>> the same failure again. I ran configure with '--enable-debug'
>> '--enable-cassert' 'CPPFLAGS=-DWAL_DEBUG', and make with the -j 2
>> option. When I ran the test with wal_debug = on, I got the following
>> assertion failure:
>>
>> LOG: INSERT @ 0/17B3F90: prev 0/17B3F10; xid 998; len 31: Heap - insert: rel 1663/12277/16384; tid 0/197
>> STATEMENT: create table t (i int); insert into t values(generate_series(1,1)); delete from t
>> LOG: INSERT @ 0/17B3FD0: prev 0/17B3F50; xid 998; len 31: Heap - insert: rel 1663/12277/16384; tid 0/198
>> STATEMENT: create table t (i int); insert into t values(generate_series(1,1)); delete from t
>> TRAP: FailedAssertion(!(((bool) (((void*)(&(target->tid)) != ((void *)0)) && ((&(target->tid))->ip_posid != 0)))), File: "heapam.c", Line: 5578)
>> LOG: xlog bg flush request 0/17B4000; write 0/17A6000; flush 0/179D5C0
>> LOG: xlog bg flush request 0/17B4000; write 0/17B; flush 0/17B
>> LOG: server process (PID 16806) was terminated by signal 6: Abort trap
>>
>> This might be related to the original problem which Jeff and I saw.
>
> That's strange. I made a fresh checkout, too, and applied the patch,
> but still can't reproduce. I used the attached script to test it.
> It's surprising that the crash happens when the records are inserted,
> not at recovery. I don't see anything obviously wrong there, so could
> you please take a look around in gdb and see if you can get a clue
> what's going on? What's the stack trace?

According to the above log messages, one strange thing is that the
location of a WAL record (i.e., 0/17B3F90) is not the same as the "prev"
location carried by the following WAL record (i.e., 0/17B3F50). Is this
intentional?

BTW, when I ran the test on my Ubuntu box, I could not reproduce the
problem. I could reproduce it only on MacOS.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)
On Thu, Feb 16, 2012 at 6:15 PM, Fujii Masao masao.fu...@gmail.com wrote:
> [...]
> According to the above log messages, one strange thing is that the
> location of the WAL record (i.e., 0/17B3F90) is not the same as the
> previous location of the following WAL record (i.e., 0/17B3F50). Is
> this intentional? BTW, when I ran the test on my Ubuntu, I could not
> reproduce the problem. I could reproduce the problem only in MacOS.

+	nextslot = Insert->nextslot;
+	if (NextSlotNo(nextslot) == lastslot)
+	{
+		/*
+		 * Oops, we've caught our tail and the oldest slot is still in use.
+		 * Have to wait for it to become vacant.
+		 */
+		SpinLockRelease(&Insert->insertpos_lck);
+		WaitForXLogInsertionSlotToBecomeFree();
+		goto retry;
+	}
+	myslot = XLogCtl->XLogInsertSlots[nextslot];
+	nextslot = NextSlotNo(nextslot);

nextslot can reach NumXLogInsertSlots here, which would be a bug, I
guess. When I applied a quick fix and ran the test, I could not
reproduce the problem any more. I'm not sure whether this is really the
cause of the problem, though.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Designing an extension for feature-space similarity search
On Thu, Feb 16, 2012 at 12:34 AM, Jay Levitt jay.lev...@gmail.com wrote:
> - But a dimension might be in any domain, not just floats
> - The distance along each dimension is a domain-specific function

What exact domains do you expect? Some domains could appear to be quite
hard for index-based similarity search using GiST (for example, sets,
strings, etc.).

-- 
With best regards,
Alexander Korotkov.
Re: [HACKERS] 16-bit page checksums for 9.2
On Thursday, February 16, 2012 12:16:31, Simon Riggs wrote:
> v8 attached

Maybe the docs should include what will happen if the checksum is not
correct?

-- 
Albert Cervera i Areny
http://www.NaN-tic.com
Tel: +34 93 553 18 03
http://twitter.com/albertnan
http://www.nan-tic.com/blog
Re: [HACKERS] 16-bit page checksums for 9.2
On Thu, Feb 16, 2012 at 6:16 AM, Simon Riggs si...@2ndquadrant.com wrote:
> v8 attached

It's hard to believe that this version has been tested terribly
thoroughly, because it doesn't compile.

+	LockBufHdr(buf);
+
+	/*
+	 * Run PageGetLSN while holding header lock.
+	 */
+	recptr = BufferGetLSN(buf);
+
+	/* To check if block content changes while flushing. - vadim 01/17/97 */
+	buf->flags &= ~BM_JUST_DIRTIED;
+	UnlockBufHdr(buf);

This doesn't seem very helpful. It's obvious even without the comment
that we're running PageGetLSN while holding the header lock. What's not
obvious, and what the comment should be explaining, is why we're doing
that.

+	/*
+	 * If we're in recovery we cannot dirty a page because of a hint.
+	 * We can set the hint, just not dirty the page as a result, so
+	 * the hint is lost when we evict the page or shut down.
+	 *
+	 * See long discussion in bufpage.c
+	 */
+	if (RecoveryInProgress())
+		return;

Doesn't this seem awfully bad for performance on Hot Standby servers? I
agree that it fixes the problem with un-WAL-logged pages there, but I
seem to recall some recent complaining about performance features that
work on the master but not the standby. Durable hint bits are one such
feature.

+	 * Basically, we simply prevent the checkpoint WAL record from
+	 * being written until we have marked the buffer dirty. We don't
+	 * start the checkpoint flush until we have marked dirty, so our
+	 * checkpoint must flush the change to disk successfully or the
+	 * checkpoint never gets written, so crash recovery will fix.
+	 *
+	 * It's possible we may enter here without an xid, so it is
+	 * essential that CreateCheckpoint waits for virtual transactions
+	 * rather than full transactionids.
+	 */
+	MyPgXact->delayChkpt = delayChkpt = true;

I am slightly worried that this expansion in the use of this mechanism
(formerly called inCommit, for those following along at home) could lead
to checkpoint starvation. Suppose we've got one or two large table scans
wandering along, setting hint bits, and now suddenly it's time to
checkpoint. How long will it take the checkpoint process to find a time
when nobody's got delayChkpt set?

+ #define PageSetChecksum(page) \
+ do \
+ { \
+	PageHeader p = (PageHeader) page; \
+	p->pd_flags |= PD_PAGE_VERSION_PLUS1; \
+	p->pd_flags |= PD_CHECKSUM1; \
+	p->pd_flags &= ~PD_CHECKSUM2; \
+	p->pd_verify.pd_checksum16 = PageCalcChecksum16(page); \
+ } while (0);
+
+ /* ensure any older checksum info is overwritten with watermark */
+ #define PageResetVersion(page) \
+ do \
+ { \
+	PageHeader p = (PageHeader) page; \
+	if (!PageHasNoChecksum(p)) \
+	{ \
+		p->pd_flags &= ~PD_PAGE_VERSION_PLUS1; \
+		p->pd_flags &= ~PD_CHECKSUM1; \
+		p->pd_flags &= ~PD_CHECKSUM2; \
+		PageSetPageSizeAndVersion(p, BLCKSZ, PG_PAGE_LAYOUT_VERSION); \
+	} \
+ } while (0);

So, when the page has a checksum, PD_CHECKSUM2 is not set, and when it
doesn't have a checksum, PD_CHECKSUM2 is also not set? What good does
that do?

  * PageGetPageSize
  *	Returns the page size of a page.
  *
! * Since PageSizeIsValid() when pagesize == BLCKSZ, just written BLCKSZ.
! * This can be called on any page, initialised or not, in or out of buffers.
! * You might think this can vary at runtime but you'd be wrong, since pages
! * frequently need to occupy buffers and pages are copied from one to another
! * so there are many hidden assumptions that this simple definition is true.
  */
! #define PageGetPageSize(page) (BLCKSZ)

I think I agree that the current definition of PageGetPageSize seems
unlikely to come anywhere close to allowing us to cope with multiple
page sizes, but I think this method of disabling it is a hack. The
callers that want to know how big the page really is should just use
BLCKSZ instead of this macro, and those that want to know how big the
page THINKS it is (notably contrib/pageinspect) need a way to get that
information.
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] run GUC check hooks on RESET
On Wed, Feb 15, 2012 at 5:36 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
> Agreed. I'm not sure we want to change the message text at all in 9.1.
> Translations and all that.

Agreed. I think we definitely don't want to do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Should we add crc32 in libpgport?
On Fri, Feb 3, 2012 at 7:33 PM, Daniel Farina dan...@heroku.com wrote:
> Ah, yes, I think my optimizations were off when building, or
> something. I didn't get such verbosity at first, and then I remember
> doing something slightly different and then getting a lot of output.
> I didn't pay attention to the build size. I will investigate.
[...]
> I agree, I was about to say "what about a preprocessor hack..." after
> the last paragraph, but saw you beat me to the punch. I'll have a look
> soon.

Ping!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
Shigeru Hanada wrote:
> Thanks for the review. Attached patches are revised version, though
> only fdw_helper_v5.patch is unchanged.

Two questions:
- Is it on purpose that you can specify all SSL client options except
  sslcompression?
- Since a rescan is done by rewinding the cursor, is it necessary to
  have any other remote isolation level than READ COMMITTED? There is
  only one query issued per transaction.

Yours,
Laurenz Albe
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
On Thu, Feb 16, 2012 at 05:49:34PM +0900, Kyotaro HORIGUCHI wrote:
> I added the function PQskipRemainingResult() and use it in dblink.
> This reduces the number of executed try-catch blocks from the number
> of rows to one per query in dblink.

This implementation is wrong - you must not simply call PQgetResult()
when the connection is in async mode. And the resulting PGresult must be
freed.

Please just make PQsetRowProcessorErrMsg() callable from outside the
callback. That also makes it clear to code that sees the final PGresult
what happened. As a bonus, this simply makes the function more robust
and less special.

Although it's easy to create some PQsetRowSkipFlag() function that will
silently skip the remaining rows, I think such logic is best left to the
user to handle. It creates an unnecessary special case when handling the
final PGresult, so in the big picture it creates confusion.

> This patch is based on the patch above and composed in the same
> manner - the main three patches include all modifications, and the
> '2' patch separately.

I think there is no need to carry the early-exit patch separately. It is
visible in the archives and nobody has screamed about it yet, so I guess
it's acceptable. Also, it's so simple that there is no point in spending
time rebasing it.

> diff --git a/src/interfaces/libpq/#fe-protocol3.c# b/src/interfaces/libpq/#fe-protocol3.c#

There is some backup file in your git repo.

-- 
marko
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
I found a strange behavior with v10. Is it available to reproduce?

ftbl is declared as follows:

postgres=# select * FROM ftbl;
 a |  b
---+-----
 1 | aaa
 2 | bbb
 3 | ccc
 4 | ddd
 5 | eee
(5 rows)

I tried to raise an error on the remote side:

postgres=# select * FROM ftbl WHERE 100 / (a - 3) > 0;
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
! \q

Its call trace was:

(gdb) bt
#0  0x0031030810a4 in free () from /lib64/libc.so.6
#1  0x7f2caa620bd9 in PQclear (res=0x2102500) at fe-exec.c:679
#2  0x7f2caa83c4db in execute_query (node=0x20f20a0) at pgsql_fdw.c:722
#3  0x7f2caa83c64a in pgsqlIterateForeignScan (node=0x20f20a0) at pgsql_fdw.c:402
#4  0x005c120f in ForeignNext (node=0x20f20a0) at nodeForeignscan.c:50
#5  0x005a9b37 in ExecScanFetch (recheckMtd=ForeignRecheck, accessMtd=ForeignNext, node=0x20f20a0) at execScan.c:82
#6  ExecScan (node=0x20f20a0, accessMtd=ForeignNext, recheckMtd=ForeignRecheck) at execScan.c:132
#7  0x005a2128 in ExecProcNode (node=0x20f20a0) at execProcnode.c:441
#8  0x0059edc2 in ExecutePlan (dest=0x210f280, direction=optimized out, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, planstate=0x20f20a0, estate=0x20f1f88) at execMain.c:1449

This is the PG_CATCH block at execute_query(). fetch_result() raises an
error, which is then caught so the PGresult can be released. Although
res should be NULL at this point, PQclear was called with a non-NULL
value according to the call trace.

More strangely, I injected elog(INFO, ...) to show the value of res at
this point. Then it became impossible to reproduce once I tried to show
the pointer with elog(INFO, "res = %p", res);

Why does res have a non-NULL value, even though it was cleared prior to
fetch_result() and an error was raised within that function?

Thanks,

On Thu, Feb 16, 2012 at 13:41, Shigeru Hanada shigeru.han...@gmail.com wrote:
> Kaigai-san,
>
> Thanks for the review. Attached patches are revised version, though
> only fdw_helper_v5.patch is unchanged.
>
> (2012/02/16 0:09), Kohei KaiGai wrote:
>> [memory context of tuple store]
>> It calls tuplestore_begin_heap() under the memory context of
>> festate->scan_cxt at pgsqlBeginForeignScan.
>
> Yes, it's because tuplestore uses the context which was current when
> tuplestore_begin_heap was called. I want to use a per-scan context
> for the tuplestore, to keep its content tuples alive through the
> scan.
>
>> On the other hand, tuplestore_gettupleslot() is called under the
>> memory context of festate->tuples.
>
> Yes, result tuples to be returned to the executor should be allocated
> in the per-scan context and live until the next IterateForeignScan
> (or EndForeignScan), because such a tuple will be released via
> ExecClearTuple in the next IterateForeignScan call. If we don't
> switch to the per-scan context, the result tuple is allocated in the
> per-tuple context and causes a double-free and server crash.
>
>> I could not find a callback function being invoked on errors, so I
>> suspect the memory objects acquired within tuplestore_begin_heap()
>> shall be leaked, even though it is my suggestion to create a
>> sub-context under the existing one.
>
> How did you confirm that no callback function is invoked on errors? I
> think that memory objects acquired within tuplestore_begin_heap (I
> guess you mean excluding stored tuples, right?) are released during
> cleanup of the aborted transaction. I tested that by adding
> elog(ERROR) to the tail of store_result() for an intentional error,
> and executed a large query 100 times in a session. I saw the VIRT
> value (via the top command) come down to a constant level after every
> query.
>
>> In my opinion, it is a good choice to use es_query_cxt of the
>> supplied EState. What does prevent to apply this per-query memory
>> context?
>
> Ah, I've confused context management of pgsql_fdw... I fixed
> pgsql_fdw to create a per-scan context as a child of es_query_cxt in
> BeginForeignScan, and use it for the tuplestore of the scan. So, the
> tuplestore and its contents are released correctly at EndForeignScan,
> or at cleanup of the aborted transaction in the error case.
>
>> You mention about PGresult being malloc()'ed. However, it seems to
>> me fetch_result() and store_result() once copy the contents of the
>> malloc()'ed area to the palloc()'ed area, and the PGresult is
>> released on an error using a PG_TRY() ... PG_CATCH() block.
>
> During thinking about this comment, I found a double-free bug of
> PGresult in execute_query, thanks :) But, sorry, I'm not sure what
> the concern you show here is. The reason why tuples are copied from
> the malloc'ed area to the palloc'ed area is to release the PGresult
> before returning from the IterateForeignScan call. The reason why a
> PG_TRY block is used is to make sure that the PGresult is released
> before jumping back upstream in the error case.
>
>> [Minor comments]
>> Please set NULL to the sql variable at begin_remote_tx(). Compiler
>> raises a warning
Re: [HACKERS] Designing an extension for feature-space similarity search
Alexander Korotkov wrote:
> On Thu, Feb 16, 2012 at 12:34 AM, Jay Levitt jay.lev...@gmail.com wrote:
>> - But a dimension might be in any domain, not just floats
>> - The distance along each dimension is a domain-specific function
>
> What exact domains do you expect? Some domains could appear to be
> quite hard for index-based similarity search using GiST (for example,
> sets, strings etc.).

Oh, nothing nearly so complex, and (to Tom's point) no composite types,
either. Right now we have demographics like gender, geolocation, and
birthdate; I think any domain will be a type that's easily expressible
in linear terms.

I was thinking in domains rather than types because there isn't one
distance function for date or float; me.birthdate <-> you.birthdate is
normalized to a different curve than now() <-> posting_date, and
raw_score <-> raw_score would differ from z_score <-> z_score. It would
have been elegant to express that distance with <->, but since domains
can't have operators, I can create distance(this, other) functions for
each domain. It just won't look as pretty!

Jay
[HACKERS] possible new option for wal_sync_method
When running Postgres on a single ext3 filesystem on Linux, we find that
the attached simple patch gives a significant performance benefit (7-8%
in the numbers below). The patch adds a new option for wal_sync_method,
called open_direct. With this option, the WAL is always opened with
O_DIRECT (but not O_SYNC or O_DSYNC). For Linux, the use of only
O_DIRECT should be correct. All WAL logs are fully allocated before
being used, and the WAL buffers are 8K-aligned, so all direct writes are
guaranteed to complete before returning. (See
http://lwn.net/Articles/348739/)

The advantage of using O_DIRECT is that there is no fsync/fdatasync()
call. All of the other wal_sync_methods use fsync/fdatasync(), either
explicitly or implicitly (via the O_SYNC and O_DSYNC options).
fsync/fdatasync can be very slow on ext3, because it seems to have to
always wait for the current filesystem meta-data transaction to
complete, even if that meta-data operation is completely unrelated to
the file being fsync'ed. There can be many metadata operations happening
on the data files, so the WAL log fsync can end up waiting for metadata
operations on the data files. Since O_DIRECT does not do any
fsync/fdatasync operation, it avoids this bottleneck and can finish more
quickly on average. The open_sync and open_dsync options do not have
this benefit, because they do the equivalent of an fsync/fdatasync after
every WAL write.

For the open_sync and open_dsync options, O_DIRECT is used for writes
only if the xlog will not need to be consumed by the archiver or hot
standby. I am not keying the open_direct behavior on whether
XLogIsNeeded() is true, because we see a performance gain even when
archiving is enabled (using a simple script that copies and compresses
the log segments).
For a 2-processor, 50-warehouse DBT2 run on SLES 11, I get the following
NOTPM results:

 wal_sync_method   fdatasync   open_direct   open_sync
 archiving off:        17076         18481       17094
 archiving on:         15704         16923       15898

Do folks have any interest in this change, or comments on its
usefulness/correctness? It would be just an extra option for
wal_sync_method that users can try out, and it has benefits for certain
configurations.

Dan

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 266c0de..a830a01 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -122,6 +122,7 @@ const struct config_enum_entry sync_method_options[] = {
 #ifdef OPEN_DATASYNC_FLAG
 	{"open_datasync", SYNC_METHOD_OPEN_DSYNC, false},
 #endif
+	{"open_direct", SYNC_METHOD_OPEN_DIRECT, false},
 	{NULL, 0, false}
 };
@@ -1925,7 +1926,8 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch)
 	 * fsync more than one file.
 	 */
 	if (sync_method != SYNC_METHOD_OPEN &&
-		sync_method != SYNC_METHOD_OPEN_DSYNC)
+		sync_method != SYNC_METHOD_OPEN_DSYNC &&
+		sync_method != SYNC_METHOD_OPEN_DIRECT)
 	{
 		if (openLogFile >= 0 &&
 			!XLByteInPrevSeg(LogwrtResult.Write, openLogId, openLogSeg))
@@ -8958,6 +8960,15 @@ get_sync_bit(int method)
 		case SYNC_METHOD_OPEN_DSYNC:
 			return OPEN_DATASYNC_FLAG | o_direct_flag;
 #endif
+		case SYNC_METHOD_OPEN_DIRECT:
+			/*
+			 * Open the log with the O_DIRECT flag only. O_DIRECT guarantees
+			 * that data is written to disk when the IO completes if and
+			 * only if the file is fully allocated. Fortunately, the log
+			 * files are always fully allocated by XLogFileInit() (or are
+			 * recycled from a fully-allocated log).
+			 */
+			return O_DIRECT;
 		default:
 			/* can't happen (unless we are out of sync with option array) */
 			elog(ERROR, "unrecognized wal_sync_method: %d", method);
@@ -9031,6 +9042,7 @@ issue_xlog_fsync(int fd, uint32 log, uint32 seg)
 #endif
 		case SYNC_METHOD_OPEN:
 		case SYNC_METHOD_OPEN_DSYNC:
+		case SYNC_METHOD_OPEN_DIRECT:
 			/* write synced it already */
 			break;
 		default:
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 400c52b..97acde5 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -564,3 +564,4 @@
 #--
 
 # Add settings for extensions here
+wal_sync_method = open_direct
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index f8aecef..b888ee7 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -83,6 +83,7 @@ typedef struct XLogRecord
 #define SYNC_METHOD_OPEN		2	/* for O_SYNC */
 #define SYNC_METHOD_FSYNC_WRITETHROUGH	3
 #define SYNC_METHOD_OPEN_DSYNC	4	/* for O_DSYNC */
+#define SYNC_METHOD_OPEN_DIRECT	5	/* for O_DIRECT */
 
 extern int	sync_method;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
2012年2月16日13:41 Shigeru Hanada shigeru.han...@gmail.com: Kaigai-san, Thanks for the review. Attached patches are revised version, though only fdw_helper_v5.patch is unchanged. (2012/02/16 0:09), Kohei KaiGai wrote: [memory context of tuple store] It calls tuplestore_begin_heap() under the memory context of festate-scan_cxt at pgsqlBeginForeignScan. Yes, it's because tuplestore uses a context which was current when tuplestore_begin_heap was called. I want to use per-scan context for tuplestore, to keep its content tuples alive through the scan. On the other hand, tuplestore_gettupleslot() is called under the memory context of festate-tuples. Yes, result tuples to be returned to executor should be allocated in per-scan context and live until next IterateForeignScan (or EndForeignScan), because such tuple will be released via ExecClearTuple in next IterateForeignScan call. If we don't switch context to per-scan context, result tuple is allocated in per-tuple context and cause double-free and server crash. I could not find a callback functions being invoked on errors, so I doubt the memory objects acquired within tuplestore_begin_heap() shall be leaked, even though it is my suggestion to create a sub-context under the existing one. How do you confirmed that no callback function is invoked on errors? I think that memory objects acquired within tuplestore_begin_heap (I guess you mean excluding stored tuples, right?) are released during cleanup of aborted transaction. I tested that by adding elog(ERROR) to the tail of store_result() for intentional error, and execute large query 100 times in a session. I saw VIRT value (via top command) comes down to constant level after every query. Oops, I overlooked the point where MessageContext and its children get reset. However, as its name, I don't believe it was right usage of memory context. As the latest version doing, es_query_cxt is the right way to acquire memory object with per-query duration. 
In my opinion, it is a good choice to use es_query_cxt of the supplied EState. What does prevent to apply this per-query memory context? Ah, I've confused context management of pgsql_fdw... I fixed pgsql_fdw to create per-scan context as a child of es_query_cxt in BeginForeignScan, and use it for tuplestore of the scan. So, tuplestore and its contents are released correctly at EndForeignScan, or cleanup of aborted transaction in error case. I believe it is right direction. You mention about PGresult being malloc()'ed. However, it seems to me fetch_result() and store_result() once copy the contents on malloc()'ed area to the palloc()'ed area, and PQresult is released on an error using PG_TRY() ... PG_CATCH() block. During thinking about this comment, I found double-free bug of PGresult in execute_query, thanks :) Unfortunately, I found the strange behavior around this code. I doubt an interaction between longjmp and compiler optimization, but it is not certain right now. I'd like to push this patch to committer reviews after this problem got closed. Right now, I don't have comments on this patch any more. But, sorry, I'm not sure what the concern you show here is. The reason why copying tuples from malloc'ed area to palloc'ed area is to release PGresult before returning from the IterateForeingScan call. The reason why using PG_TRY block is to sure that PGresult is released before jump back to upstream in error case. [Minor comments] Please set NULL to sql variable at begin_remote_tx(). Compiler raises a warnning due to references of uninitialized variable, even though the code path never run. Fixed. BTW, just out of curiosity, which compiler do you use? My compiler ,gcc (GCC) 4.6.0 20110603 (Red Hat 4.6.0-10) on Fedora 15, doesn't warn it. I uses Fedora 16, and GCC 4.6.2. [kaigai@iwashi pgsql_fdw]$ gcc --version gcc (GCC) 4.6.2 20111027 (Red Hat 4.6.2-1) It is not a matter related to compiler version, but common manner in PostgreSQL code. 
You can likely find source code comments like /* keep compiler quiet */. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
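The per-query vs. per-scan context discussion above rests on one property of PostgreSQL's memory contexts: deleting (or resetting) a parent releases every child context too, which is why hanging the per-scan context off es_query_cxt guarantees cleanup even when an error aborts the scan before EndForeignScan runs. A toy stand-in in plain C can sketch that property — all names here are hypothetical; the real server API is the MemoryContext machinery (AllocSetContextCreate and friends), not this:

```c
#include <stdlib.h>

/* Toy "memory context": owns its allocations and its child contexts.
 * ctx_delete() frees the whole subtree, mimicking how deleting the
 * per-query context reclaims a per-scan child context automatically. */
typedef struct Ctx {
    void **chunks;              /* allocations owned by this context */
    int nchunks, maxchunks;
    struct Ctx **children;      /* child contexts, freed with the parent */
    int nchildren, maxchildren;
} Ctx;

static int freed_chunks;        /* visible effect, for the example below */

static Ctx *ctx_create(Ctx *parent)
{
    Ctx *c = calloc(1, sizeof(Ctx));
    if (parent) {
        if (parent->nchildren == parent->maxchildren) {
            parent->maxchildren = parent->maxchildren ? parent->maxchildren * 2 : 4;
            parent->children = realloc(parent->children,
                                       parent->maxchildren * sizeof(Ctx *));
        }
        parent->children[parent->nchildren++] = c;
    }
    return c;
}

static void *ctx_alloc(Ctx *c, size_t size)
{
    if (c->nchunks == c->maxchunks) {
        c->maxchunks = c->maxchunks ? c->maxchunks * 2 : 4;
        c->chunks = realloc(c->chunks, c->maxchunks * sizeof(void *));
    }
    void *p = malloc(size);
    c->chunks[c->nchunks++] = p;
    return p;
}

static void ctx_delete(Ctx *c)
{
    for (int i = 0; i < c->nchildren; i++)
        ctx_delete(c->children[i]);     /* children are released first */
    for (int i = 0; i < c->nchunks; i++) {
        free(c->chunks[i]);
        freed_chunks++;
    }
    free(c->children);
    free(c->chunks);
    free(c);
}
```

With a per-scan context created under the query context, deleting the query context frees the scan's tuples whether or not the scan ended normally — which is the error-path guarantee being discussed.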
Re: [HACKERS] Designing an extension for feature-space similarity search
Tom Lane wrote: Jay Levitt jay.lev...@gmail.com writes: - I'm not sure how to represent arbitrary column-like features without reinventing the wheel and putting a database in the database. ISTM you could define a composite type and then create operators and an operator class over that type. If you were trying to make a btree opclass there might be a conflict with the built-in record_ops opclass, but since you're only interested in GIST I don't see any real roadblocks. Perfect. Composite types are exactly what I need here; the application can declare its composite type and provide distance functions for each member, and the extension can use those to calculate similarity. How do I introspect the composite type's pg_class to see what it contains? I assume there's a better way than SPI on system catalogs :) Should I be using systable_* functions from genam, or is there an in-memory tree? I feel like funcapi gets me partway there but there's magic in the middle. Can you think of any code that would serve as a sample, maybe whatever creates the output for psql's \d? The main potential disadvantage of this is that you'd have the standard tuple header as overhead in index entries --- but maybe the entries are large enough that that doesn't matter, and in any case you could probably make use of the GIST compress method to get rid of most of the header. Maybe convert to MinimalTuple, for instance, if you want to still be able to leverage existing support code for field extraction. Probably not worth it to save the 8 bytes; we're starting out at about 20 floats per row. But good to know for later optimization... - Can domains have operators, or are operators defined on types? I think the current state of play is that you can have such things but the system will only consider them for exact type matches, so you might need more explicit casts than you ordinarily would. 
However, we only support domains over base types, not composites, so this isn't really going to be a profitable direction for you anyway. Actually, as mentioned to Alexander, I'm thinking of domains per feature, not for the overall tuple, so birthdate <-> birthdate differs from now() <-> posting_date. Sounds like that might work - I'll play. - Does KNN-GiST run into problems when <-> returns values that don't make sense in the physical world? Wouldn't surprise me. In general, non-strict index operators are a bad idea. However, if the indexed entities are records, it would be entirely your own business how you handled individual fields being NULL. Yeah, that example conflated NULLs in the feature fields (we don't know your birthdate) with <-> on the whole tuple. Oops. I guess I can just test this by verifying that KNN-GiST ordered by distance returns the same results as without the index. Thanks for your help here. Jay -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
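The plan above boils down to: each feature of the composite supplies its own distance function, and the overall distance combines them. A minimal sketch in plain C (all names are made up; the real thing would be GiST support functions over the composite type) — note the squared combination is enough for KNN-style ordering, so no square root is needed:

```c
#include <math.h>

/* Per-feature distance functions, combined into one record distance.
 * wrap_dist() illustrates a feature whose distance is not plain
 * subtraction: a day-of-year that wraps around at year's end. */
typedef double (*feature_dist)(double a, double b);

static double num_dist(double a, double b)
{
    return fabs(a - b);
}

static double wrap_dist(double a, double b)
{
    double d = fabs(a - b);
    return d <= 183.0 ? d : 366.0 - d;   /* shorter way around the year */
}

/* Squared Euclidean combination of the per-feature distances. */
static double record_dist2(const double *x, const double *y,
                           feature_dist *fns, int nfeatures)
{
    double sum = 0.0;
    for (int i = 0; i < nfeatures; i++) {
        double d = fns[i](x[i], y[i]);
        sum += d * d;
    }
    return sum;
}
```

Ordering by the squared distance gives the same nearest-neighbor ranking as the true distance, which is all an ORDER BY ... &lt;-&gt; style scan needs.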
Re: [HACKERS] patch for parallel pg_dump
On Wed, Feb 8, 2012 at 7:56 PM, Joachim Wieland j...@mcknight.de wrote: So we at least need to press on far enough to get to that point. Sure, let me know if I can help you with something. Alright. I think (hope) that I've pushed this far enough to serve the needs of parallel pg_dump. The error handling is still pretty grotty and might need a bit more surgery to handle the parallel case, but I think that making this actually not ugly will require eliminating the Archive/ArchiveHandle distinction, which is probably a good thing to do but, as you point out, maybe not the first priority right now. Can you provide an updated patch? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] possible new option for wal_sync_method
Hi, On Thursday, February 16, 2012 06:18:23 PM Dan Scales wrote: When running Postgres on a single ext3 filesystem on Linux, we find that the attached simple patch gives significant performance benefit (7-8% in numbers below). The patch adds a new option for wal_sync_method, which is open_direct. With this option, the WAL is always opened with O_DIRECT (but not O_SYNC or O_DSYNC). For Linux, the use of only O_DIRECT should be correct. All WAL logs are fully allocated before being used, and the WAL buffers are 8K-aligned, so all direct writes are guaranteed to complete before returning. (See http://lwn.net/Articles/348739/) I don't think that behaviour is safe in the face of write caches in the IO path. Linux takes care to issue flush/barrier instructions when necessary if you issue an fsync/fdatasync, but to my knowledge it does not when O_DIRECT is used (that would suck performance-wise). I think that behaviour is safe if you have no externally visible write caching enabled, but that's not exactly easy knowledge to get/document. Why should there otherwise be any performance difference between O_DIRECT|O_SYNC and O_DIRECT in the WAL write case? There is no metadata that needs to be written, and I have a hard time imagining that the check for whether there is metadata is that expensive. I guess a more interesting case would be comparing O_DIRECT|O_SYNC with O_DIRECT + fdatasync() or even O_DIRECT + sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER). Any special reason you've done that comparison on ext3? Especially with data=ordered, its behaviour regarding syncs is pretty insane performance-wise. Ext4 would be a bit more interesting... Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
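For readers following the O_DIRECT discussion, here is a hedged sketch (plain C, Linux-oriented, hypothetical file name) of the write pattern being debated: try a direct, 8K-aligned write first, and fall back to a buffered write plus fdatasync when the filesystem refuses direct I/O — fdatasync being the durability the direct-only path is assumed to provide by itself when no write cache sits in the I/O path:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for O_DIRECT on glibc */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* platform without O_DIRECT: skip direct path */
#endif

/* Write one 8K-aligned, WAL-style block.  Tries O_DIRECT first (the
 * proposed open_direct behaviour); on failure falls back to a buffered
 * write followed by fdatasync().  Returns 0 on success, -1 on error. */
static int wal_style_write(const char *path)
{
    const size_t blksz = 8192;
    void *buf;
    if (posix_memalign(&buf, blksz, blksz) != 0)
        return -1;
    memset(buf, 'x', blksz);

    int rc = -1;
    int fd = O_DIRECT ? open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600) : -1;
    if (fd >= 0 && write(fd, buf, blksz) == (ssize_t) blksz)
        rc = 0;                         /* direct write succeeded */
    if (fd >= 0)
        close(fd);

    if (rc != 0) {                      /* fallback: buffered + fdatasync */
        fd = open(path, O_WRONLY | O_CREAT, 0600);
        if (fd >= 0) {
            if (write(fd, buf, blksz) == (ssize_t) blksz && fdatasync(fd) == 0)
                rc = 0;
            close(fd);
        }
    }
    free(buf);
    return rc;
}
```

The aligned buffer (posix_memalign) is what makes the direct write legal; whether the direct path alone is durable on a given stack is exactly the open question in the thread.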
Re: [HACKERS] [trivial patch] typo in doc/src/sgml/sepgsql.sgml
Alvaro Herrera alvhe...@commandprompt.com writes: If you copied from a pager, those tend to expand tabs to spaces, so the patch gets mangled at that point. At least less does that. OTOH if you :r the patch file, it works fine. You should be able to add an “inline” attachment too, and the receiver should then be able to both see it inline in the email and store it as a file. Tom uses inline attachments a lot, and apparently very few MUAs allow both features there (gnus is perfectly fine here). Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Designing an extension for feature-space similarity search
Jay Levitt jay.lev...@gmail.com writes: Perfect. Composite types are exactly what I need here; the application can declare its composite type and provide distance functions for each member, and the extension can use those to calculate similarity. How do I introspect the composite type's pg_class to see what it contains? I assume there's a better way than SPI on system catalogs :) Definitely. Take a look at record_out, record_cmp, and sibling functions on generic composites (src/backend/utils/adt/rowtypes.c). You might or might not feel like wiring in more assumptions than those have about the possible contents of the record type. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] possible new option for wal_sync_method
On Thu, Feb 16, 2012 at 19:18, Dan Scales sca...@vmware.com wrote: fsync/fdatasync can be very slow on ext3, because it seems to have to always wait for the current filesystem meta-data transaction to complete, even if that meta-data operation is completely unrelated to the file being fsync'ed. Use the data=writeback mount option to remove this restriction. This is actually the suggested setting for PostgreSQL file systems: http://www.postgresql.org/docs/current/static/wal-intro.html (Note that this is unsafe for some other applications, so I wouldn't use it on the root file system) Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] possible new option for wal_sync_method
On 2/16/12 9:18 AM, Dan Scales wrote: Do folks have any interest in this change, or comments on its usefulness/correctness? It would be just an extra option for wal_sync_method that users can try out and has benefits for certain configurations. Does it have any benefit on Ext4/XFS/Btrfs? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] proposal: copybytea command for psql
A while ago I went looking for nice ways to export an unencoded bytea value using psql, see http://people.planetpostgresql.org/andrew/index.php?/archives/196-Clever-trick-challenge.html. Regina Obe is also in want of a solution for this: http://www.postgresonline.com/journal/archives/243-PSQL-needs-a-better-way-of-outputting-bytea-to-binary-files.html. It seems like what we need is a psql command for it, something like: \copybytea (select query_returning_one_bytea) to /path/to/file Does anyone have a better solution or any objection to this feature? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: copybytea command for psql
Andrew Dunstan and...@dunslane.net writes: A while ago I went looking for nice ways to export an unencoded bytea value using psql, see http://people.planetpostgresql.org/andrew/index.php?/archives/196-Clever-trick-challenge.html. Regina Obe is also in want of a solution for this: http://www.postgresonline.com/journal/archives/243-PSQL-needs-a-better-way-of-outputting-bytea-to-binary-files.html. It seems like what we need is a psql command for it, something like: \copybytea (select query_returning_one_bytea) to /path/to/file Does anyone have a better solution or any objection to this feature? It seems awfully narrow. In the first place, why restrict it to bytea? In the second, that syntax is going to cause serious headaches, not least because backslash commands can't extend across multiple lines. The idea that comes to mind for me, if you want to connect this up to SELECT and not COPY, is some variant of \g that implies (1) pull back the data as binary not text, and (2) dump it to the target file with absolutely no recordseps, fieldseps, etc; just the bytes, ma'am. It might be worth thinking of (1) and (2) as separately invokable features, or then again it might not. I also wonder how this might interact with Peter E's recent commit for null-byte separators. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
On Thu, Feb 16, 2012 at 12:42 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Fixed, I've spent quite some time refining the API. That is better. What could still be done here is to create a cache (along the lines of attoptcache.c) that stores a big array with one slot for each supported command type. The first time a command is executed in a particular session, we construct a cache entry for that command type, storing the OIDs of the before and after triggers. Then, instead of having to do a catalog scan to notice that there are no triggers for a given command type, we can just reference into the array and see, oh, we probed before and found no triggers. CacheRegisterSyscacheCallback or some other mechanism can be used to flush all the cached information whenever we notice that pg_cmdtrigger has been updated. Now, I don't know for sure that this is necessary without performance testing, but I think we ought to try to figure that out. This version doesn't apply cleanly for me; there is a conflict in functioncmds.c, which precludes my easily benchmarking it. It strikes me that this patch introduces a possible security hole: it allows command triggers to be installed by the database owner, but that seems like it might allow the database owner to usurp the privileges of any user who runs one of these commands in his or her database, including the superuser. Perhaps that could be fixed by running command triggers as the person who created them rather than as the superuser, but that seems like it might lead to other kinds of privilege-escalation bugs. If I install a command trigger that prevents all DDL, how do I uninstall it? Or say I'm the superuser and I want to undo something my disgruntled DBA did before he quit. I would much prefer to have DropCmdTrigStmt wedge itself into the existing DropStmt infrastructure which I just recently worked so hard on. 
If you do that, you should find that you can then easily also support comments on command triggers, security labels on command triggers (though I don't know if that's useful), and the ability to include command triggers in extensions. I am a bit worried about supporting command triggers on statements that do internal transaction control, such as VACUUM and CREATE INDEX CONCURRENTLY. Obviously that's very useful and I'd like to have it, but there's a problem: if the AFTER trigger errors out, it won't undo the whole command. That might be very surprising. BEFORE triggers seem OK, and AFTER triggers might be OK too but we at least need to think hard about how to document that. I think it would be better to bail on trying to use CREATE TRIGGER and DROP TRIGGER as a basis for this functionality, and instead create completely new toplevel statements CREATE COMMAND TRIGGER and DROP COMMAND TRIGGER. Then, you could decide that all command triggers live in the same namespace, and therefore to get rid of the command trigger called bob you can just say DROP COMMAND TRIGGER bob, without having to specify the type of command it applies to. It's still clear that you're dropping a *command* trigger because that's right in the statement name, whereas it would make me a bit uneasy to decide that DROP TRIGGER bob means a command trigger just by virtue of the fact that no table name was specified. That would probably also make it easier to accomplish the above-described goal of integrating this into the DropStmt infrastructure. + if (!superuser()) + if (!pg_database_ownercheck(MyDatabaseId, GetUserId())) + aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_DATABASE, + get_database_name(MyDatabaseId)); The separate superuser check is superfluous; pg_database_ownercheck() already handles that. Can we call InitCommandContext in some centralized place, rather than separately at lots of different call sites? 
I am confused why this is adding a new file called dumpcatalog.c which looks suspiciously similar to some existing pg_dump code I've been staring at all day. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
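The lazily-built per-command cache Robert sketches could look roughly like this standalone toy — slot layout, names, and the faked catalog probe are all hypothetical; the real version would key on command tags, scan pg_cmdtrigger, and register its flush function via CacheRegisterSyscacheCallback:

```c
#include <stdbool.h>

/* One slot per supported command type, filled lazily on first lookup
 * and flushed wholesale when the catalog changes. */
#define NUM_COMMANDS 4          /* hypothetical number of command types */

typedef struct {
    bool valid;                 /* has this slot been probed yet? */
    bool has_triggers;          /* cached result of the probe */
} CmdTrigCacheEntry;

static CmdTrigCacheEntry cache[NUM_COMMANDS];
static int probe_count;         /* counts simulated catalog scans */

/* Stand-in for a pg_cmdtrigger scan; pretends only command 2 has one. */
static bool probe_catalog(int cmd)
{
    probe_count++;
    return cmd == 2;
}

static bool command_has_triggers(int cmd)
{
    if (!cache[cmd].valid) {            /* first lookup: do the scan */
        cache[cmd].has_triggers = probe_catalog(cmd);
        cache[cmd].valid = true;
    }
    return cache[cmd].has_triggers;     /* later lookups hit the cache */
}

/* What a syscache-invalidation callback would do on catalog update. */
static void flush_cmdtrig_cache(void)
{
    for (int i = 0; i < NUM_COMMANDS; i++)
        cache[i].valid = false;
}
```

The payoff is exactly the one described: the common case — no triggers on this command type — costs one array lookup instead of a catalog scan, and a flush merely marks every slot stale.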
Re: [HACKERS] proposal: copybytea command for psql
On 02/16/2012 03:32 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: A while ago I went looking for nice ways to export an unencoded bytea value using psql, see http://people.planetpostgresql.org/andrew/index.php?/archives/196-Clever-trick-challenge.html. Regina Obe is also in want of a solution for this: http://www.postgresonline.com/journal/archives/243-PSQL-needs-a-better-way-of-outputting-bytea-to-binary-files.html. It seems like what we need is a psql command for it, something like: \copybytea (select query_returning_one_bytea) to /path/to/file Does anyone have a better solution or any objection to this feature? It seems awfully narrow. In the first place, why restrict it to bytea? In the second, that syntax is going to cause serious headaches, not least because backslash commands can't extend across multiple lines. The idea that comes to mind for me, if you want to connect this up to SELECT and not COPY, is some variant of \g that implies (1) pull back the data as binary not text, and (2) dump it to the target file with absolutely no recordseps, fieldseps, etc; just the bytes, ma'am. It might be worth thinking of (1) and (2) as separately invokable features, or then again it might not. I also wonder how this might interact with Peter E's recent commit for null-byte separators. Oh, nice idea. Say \g{bn}, where b was for binary fetch/output and n was for no recordseps etc? That looks like a winner. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
Excerpts from Dimitri Fontaine's message of Thu Feb 16 14:42:26 -0300 2012: Hi, Please find attached version 8 of the patch, which fixes most of your complaints. Hi, I didn't like the new cmdtrigger.h file. It's included by a lot of other headers, and it's also itself including execnodes.h and parsenodes.h which means practically the whole lot of the source tree is included in turn. If you could split it, so that the struct definition is in a different file that's only included by the few .c files that actually use that struct, I'd be much happier. ... after looking at it more closely, I think only this line needs to be in a separate file: typedef struct CommandContextData *CommandContext; and that file is included by other headers; they don't need the function declarations or the struct definition. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
Hi, First answer now, new patch version tomorrow. Robert Haas robertmh...@gmail.com writes: That is better. Cool :) What could still be done here is to create a cache (along the lines of attoptcache.c) that stores a big array with one slot for each supported command type. The first time a command is executed in a particular session, we construct a cache entry for that command type, storing the OIDs of the before and after triggers. Then, instead of having to do a catalog scan to notice that there are no triggers for a given command type, we can just reference into the array and see, oh, we probed before and found no triggers. CacheRegisterSyscacheCallback or some other mechanism can be used to flush all the cached information whenever we notice that pg_cmdtrigger has been updated. I guess it's another spelling for a catalog cache, so I'll look at what it takes to build either a full catcache or this array cache you're talking about tomorrow. Now, I don't know for sure that this is necessary without performance testing, but I think we ought to try to figure that out. This version doesn't apply cleanly for me; there is a conflict in functioncmds.c, which precludes my easily benchmarking it. That means I need to update my master branch and merge the conflicts. You could also test right from my github branch too, I guess. https://github.com/dimitri/postgres https://github.com/dimitri/postgres/tree/command_triggers It strikes me that this patch introduces a possible security hole: it allows command triggers to be installed by the database owner, but that seems like it might allow the database owner to usurp the privileges of any user who runs one of these commands in his or her database, including the superuser. Perhaps that could be fixed by running command triggers as the person who created them rather than as the superuser, but that seems like it might lead to other kinds of privilege-escalation bugs. We could decide command triggers are superuser only. 
Security is not something I'm very strong at, so I'll leave it up to you to decide. If I install a command trigger that prevents all DDL, how do I uninstall it? Or say I'm the superuser and I want to undo something my disgruntled DBA did before he quit. Good catch, I guess we need to remove creating and dropping a command trigger from the list of supported commands in the ANY COMMAND list. I would much prefer to have DropCmdTrigStmt wedge itself into the existing DropStmt infrastructure which I just recently worked so hard on. If you do that, you should find that you can then easily also support comments on command triggers, security labels on command triggers (though I don't know if that's useful), and the ability to include command triggers in extensions. Ah yes, that too was on the TODO list, I just forgot about it. I still remember the merge conflicts when that patch went in, you know… :) I am a bit worried about supporting command triggers on statements that do internal transaction control, such as VACUUM and CREATE INDEX CONCURRENTLY. Obviously that's very useful and I'd like to have it, but there's a problem: if the AFTER trigger errors out, it won't undo the whole command. That might be very surprising. BEFORE triggers seem OK, and AFTER triggers might be OK too but we at least need to think hard about how to document that. I think we should limit the support to BEFORE command triggers only. It was unclear to me how to solve the problem for the AFTER command case, and if you're unclear too, then there aren't that many open questions. I think it would be better to bail on trying to use CREATE TRIGGER and DROP TRIGGER as a basis for this functionality, and instead create completely new toplevel statements CREATE COMMAND TRIGGER and DROP COMMAND TRIGGER. 
Then, you could decide that all command triggers live in the same namespace, and therefore to get rid of the command trigger called bob you can just say DROP COMMAND TRIGGER bob, without having to specify the type of command it applies to. It's I have no strong feeling about that. Would that require that COMMAND be more reserved than it currently is (UNRESERVED_KEYWORD)? still clear that you're dropping a *command* trigger because that's right in the statement name, whereas it would make me a bit uneasy to decide that DROP TRIGGER bob means a command trigger just by virtue of the fact that no table name was specified. That would probably also make it easier to accomplish the above-described goal of integrating this into the DropStmt infrastructure. The other thing is that you might want to drop the trigger from only one command, of course we could support both syntax. We could also add support for the following one: DROP TRIGGER bob ON ALL COMMANDS; As ALL is already a reserved word, that will work. + if (!superuser()) + if (!pg_database_ownercheck(MyDatabaseId, GetUserId())) +
Re: [HACKERS] Command Triggers
Alvaro Herrera alvhe...@commandprompt.com writes: I didn't like the new cmdtrigger.h file. It's included by a lot of other headers, and it's also itself including execnodes.h and parsenodes.h which means practically the whole lot of the source tree is included in turn. If you could split it, so that the struct definition is in a different file that's only included by the few .c files that actually use that struct, I'd be much happier. I didn't realize that, thanks for reviewing! ... after looking at it more closely, I think only this line needs to be in a separate file: typedef struct CommandContextData *CommandContext; and that file is included by other headers; they don't need the function declarations or the struct definition. I'll look into that tomorrow then. The same trick is already applied to Relation and RelationData (resp. in src/include/utils/relcache.h and src/include/utils/rel.h), and only now I understand why :) Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Bug in intarray?
Hi, On a French PostgreSQL web forum, one of our users asked about a curious behaviour of the intarray extension. This query: SELECT ARRAY[-1,3,1] & ARRAY[1, 2]; should give {1} as a result. But, on HEAD (and according to his tests, on 9.0.6 and 9.1.2), it appears to give an empty array. Digging into this issue, another user (Julien Rouhaud) made an interesting comment on this line of code: if (i + j == 0 || (i + j > 0 && *(dr - 1) != db[j])) (line 159 of contrib/intarray/_int_tool.c, current HEAD) Apparently, the code tries to check the current value of the right-side array against the previous value of the resulting array. Which clearly cannot work if there is no previous value in the resulting array. So I worked on a patch to fix this, as I think it is a bug (but I may be wrong). Patch is attached and fixes the issue AFAICT. Thanks. -- Guillaume http://blog.guillaume.lelarge.info http://www.dalibo.com diff --git a/contrib/intarray/_int_tool.c b/contrib/intarray/_int_tool.c index 79f018d..4d7a1f2 100644 --- a/contrib/intarray/_int_tool.c +++ b/contrib/intarray/_int_tool.c @@ -159,7 +159,7 @@ inner_int_inter(ArrayType *a, ArrayType *b) i++; else if (da[i] == db[j]) { - if (i + j == 0 || (i + j > 0 && *(dr - 1) != db[j])) + if (i + j == 0 || (i + j > 0 && (dr - ARRPTR(r)) == 0) || (i + j > 0 && *(dr - 1) != db[j])) *dr++ = db[j]; i++; j++; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
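The bug and the fix are easier to see in a standalone version of the merge loop (plain C, outside the server; it mirrors inner_int_inter's behaviour on pre-sorted inputs, which the real function guarantees by sorting first). The key point is that the "have we emitted anything yet?" test must look at the output pointer, not at i + j, which is nonzero as soon as either input has been advanced:

```c
#include <stddef.h>

/* Intersection of two sorted int arrays, deduplicated, written to r.
 * Returns the result length.  The dr == r test is the corrected form
 * of intarray's i + j == 0 check: only dereference *(dr - 1) once
 * something has actually been emitted. */
static size_t int_inter(const int *da, size_t na,
                        const int *db, size_t nb, int *r)
{
    size_t i = 0, j = 0;
    int *dr = r;

    while (i < na && j < nb)
    {
        if (da[i] < db[j])
            i++;
        else if (da[i] == db[j])
        {
            /* emit db[j] unless it duplicates the last emitted value */
            if (dr == r || *(dr - 1) != db[j])
                *dr++ = db[j];
            i++;
            j++;
        }
        else
            j++;
    }
    return (size_t) (dr - r);
}
```

On the reported input (sorted to {-1,1,3} and {1,2}), the first match is found after i has already advanced past -1, so i + j is nonzero and the buggy code reads uninitialized memory at *(dr - 1); the pointer test avoids that.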
Re: [HACKERS] run GUC check hooks on RESET
Tom Lane t...@sss.pgh.pa.us wrote: The main thing I would be worried about is whether you're sure that you have separated the RESET-as-a-command case from the cases where we actually are rolling back to a previous state. It looks good to me. I added a few regression tests for that. Robert Haas robertmh...@gmail.com wrote: +1 for such a comment. Added. The attached patch includes these. If it seems close, I'd be happy to come up with a version for the 9.1 branch. Basically it looks like that means omitting the changes to variable.c (which only changed message text and made the order of steps on related GUCs more consistent), and capturing a different out file for the expected directory. -Kevin *** a/src/backend/commands/variable.c --- b/src/backend/commands/variable.c *** *** 600,616 check_XactIsoLevel(char **newval, void **extra, GucSource source) if (newXactIsoLevel != XactIsoLevel) { ! if (FirstSnapshotSet) { GUC_check_errcode(ERRCODE_ACTIVE_SQL_TRANSACTION); ! GUC_check_errmsg("SET TRANSACTION ISOLATION LEVEL must be called before any query"); return false; } ! /* We ignore a subtransaction setting it to the existing value. */ ! if (IsSubTransaction()) { GUC_check_errcode(ERRCODE_ACTIVE_SQL_TRANSACTION); ! GUC_check_errmsg("SET TRANSACTION ISOLATION LEVEL must not be called in a subtransaction"); return false; } /* Can't go to serializable mode while recovery is still active */ --- 600,616 if (newXactIsoLevel != XactIsoLevel) { ! /* We ignore a subtransaction setting it to the existing value. */ ! if (IsSubTransaction()) { GUC_check_errcode(ERRCODE_ACTIVE_SQL_TRANSACTION); ! GUC_check_errmsg("cannot set transaction isolation level in a subtransaction"); return false; } ! if (FirstSnapshotSet) { GUC_check_errcode(ERRCODE_ACTIVE_SQL_TRANSACTION); ! 
GUC_check_errmsg("transaction isolation level must be set before any query"); return false; } /* Can't go to serializable mode while recovery is still active */ *** *** 665,677 check_transaction_deferrable(bool *newval, void **extra, GucSource source) if (IsSubTransaction()) { GUC_check_errcode(ERRCODE_ACTIVE_SQL_TRANSACTION); ! GUC_check_errmsg("SET TRANSACTION [NOT] DEFERRABLE cannot be called within a subtransaction"); return false; } if (FirstSnapshotSet) { GUC_check_errcode(ERRCODE_ACTIVE_SQL_TRANSACTION); ! GUC_check_errmsg("SET TRANSACTION [NOT] DEFERRABLE must be called before any query"); return false; } --- 665,677 if (IsSubTransaction()) { GUC_check_errcode(ERRCODE_ACTIVE_SQL_TRANSACTION); ! GUC_check_errmsg("cannot set transaction deferrable state within a subtransaction"); return false; } if (FirstSnapshotSet) { GUC_check_errcode(ERRCODE_ACTIVE_SQL_TRANSACTION); ! GUC_check_errmsg("transaction deferrable state must be set before any query"); return false; } *** a/src/backend/utils/misc/guc.c --- b/src/backend/utils/misc/guc.c *** *** 5042,5047 config_enum_get_options(struct config_enum * record, const char *prefix, --- 5042,5051 * * If value is NULL, set the option to its default value (normally the * reset_val, but if source == PGC_S_DEFAULT we instead use the boot_val). + * When we reset a value we can't assume that the old value is valid, but must + * call the check hook if present. This is because the validity of a change + * might depend on context. For example, transaction isolation may not be + * changed after the transaction has run a query, even by a RESET command. * * action indicates whether to set the value globally in the session, locally * to the current top transaction, or just for the duration of a function call. *** *** 5287,5302 set_config_option(const char *name, const char *value, name))); return 0; } - if (!call_bool_check_hook(conf, newval, newextra, - source, elevel)) -
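The point of the patch — RESET must run the same check hook as SET, because a change's validity depends on context, not just on the value — can be sketched with a toy GUC (all names here are made up; the real machinery lives in guc.c and variable.c):

```c
#include <stdbool.h>

/* Toy context-sensitive check hook, mirroring the transaction_isolation
 * case: once the first snapshot is taken, neither SET nor RESET may
 * change the value. */
static bool first_snapshot_set;     /* stand-in for FirstSnapshotSet */

static int iso_level;               /* current value */
static const int iso_reset_val = 0; /* session default */

static bool check_iso(int newval)
{
    if (newval != iso_level && first_snapshot_set)
        return false;   /* "must be set before any query" */
    return true;
}

static bool set_iso(int newval)
{
    if (!check_iso(newval))
        return false;
    iso_level = newval;
    return true;
}

/* The fix in miniature: RESET goes through the same check hook as SET,
 * rather than assuming the old (reset) value is still valid to apply. */
static bool reset_iso(void)
{
    return set_iso(iso_reset_val);
}
```

Without routing the reset through check_iso(), a RESET issued mid-transaction would silently change the isolation level after a query had already run — exactly the hole the patch closes.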
Re: [HACKERS] Command Triggers
On Thu, Feb 16, 2012 at 4:21 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: We could decide command triggers are superuser only. Security is not something I'm very strong at, so I'll leave it up to you to decide. That's certainly the easiest option. If you don't feel passionate about spending a lot of energy figuring out how to make it secure, then I suggest we just restrict it to superusers until someone does. If I install a command trigger that prevents all DDL, how do I uninstall it? Or say I'm the superuser and I want to undo something my disgruntled DBA did before he quit. Good catch, I guess we need to remove creating and dropping a command trigger from the list of supported commands in the ANY COMMAND list. Another option would be to add a PGC_SUSET boolean GUC that can be used to disable command triggers. I think that might be more flexible, not to mention useful for recursion prevention. I am a bit worried about supporting command triggers on statements that do internal transaction control, such as VACUUM and CREATE INDEX CONCURRENTLY. Obviously that's very useful and I'd like to have it, but there's a problem: if the AFTER trigger errors out, it won't undo the whole command. That might be very surprising. BEFORE triggers seem OK, and AFTER triggers might be OK too but we at least need to think hard about how to document that. I think we should limit the support to BEFORE command triggers only. It was unclear to me how to solve the problem for the AFTER command case, and if you're unclear too, then there aren't that many open questions. Works for me. I think it would be better to bail on trying to use CREATE TRIGGER and DROP TRIGGER as a basis for this functionality, and instead create completely new toplevel statements CREATE COMMAND TRIGGER and DROP COMMAND TRIGGER. 
Then, you could decide that all command triggers live in the same namespace, and therefore to get rid of the command trigger called bob you can just say DROP COMMAND TRIGGER bob, without having to specify the type of command it applies to. It's I have no strong feeling about that. Would that require that COMMAND be more reserved than it currently is (UNRESERVED_KEYWORD)? It shouldn't. still clear that you're dropping a *command* trigger because that's right in the statement name, whereas it would make me a bit uneasy to decide that DROP TRIGGER bob means a command trigger just by virtue of the fact that no table name was specified. That would probably also make it easier to accomplish the above-described goal of integrating this into the DropStmt infrastructure. The other thing is that you might want to drop the trigger from only one command; of course we could support both syntaxes. We could also add support for the following one: DROP TRIGGER bob ON ALL COMMANDS; Uh, hold on. Creating a trigger on multiple commands ought to only create one entry in pg_cmdtrigger. If you drop it, you drop the whole thing. Changing which commands the trigger applies to would be the job for ALTER, not DROP. But note that we have no similar functionality for regular triggers, so I don't think we really need it here either. Can we call InitCommandContext in some centralized place, rather than separately at lots of different call sites? Then you have to either make the current command context a backend private global variable, or amend even more call sites to pass it down. The global variable idea does not work, see RemoveRelations() and RemoveObjects() which are handling an array of command contexts. So do you prefer lots of InitCommandContext() or adding another parameter to almost all functions called by standard_ProcessUtility()? Blech. Maybe we should just have CommandFiresTriggers initialize it if not already done? 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
Robert Haas robertmh...@gmail.com writes: On Thu, Feb 16, 2012 at 4:21 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: That's certainly the easiest option. If you don't feel passionate about spending a lot of energy figuring out how to make it secure, then I suggest we just restrict it to superusers until someone does. Works for me. If I install a command trigger that prevents all DDL, how do I uninstall it? Or say I'm the superuser and I want to undo something my disgruntled DBA did before he quit. Good catch, I guess we need to exclude creating and dropping a command trigger from the list of commands covered by ANY COMMAND. Another option would be to add a PGC_SUSET boolean GUC that can be used to disable command triggers. I think that might be more flexible, not to mention useful for recursion prevention. Wait, we already have ALTER TRIGGER bob ON ANY COMMAND SET DISABLED; which I had forgotten about in the previous answer, so I think we're good as it is. That's how I prevented recursion in some of my tests (not included in the regress tests; I was using INSTEAD OF). DROP TRIGGER bob ON ALL COMMANDS; Uh, hold on. Creating a trigger on multiple commands ought to only create one entry in pg_cmdtrigger. If you drop it, you drop the whole thing. Changing which commands the trigger applies to would be the job for ALTER, not DROP. But note that we have no similar functionality for regular triggers, so I don't think we really need it here either. Why would we do it that way (a single entry for multiple commands)? The way it is now, it's only syntactic sugar, so I think it's easier to implement, document and use. So do you prefer lots of InitCommandContext() or adding another parameter to almost all functions called by standard_ProcessUtility()? Blech. Maybe we should just have CommandFiresTriggers initialize it if not already done? 
Thing is, in a vast number of cases (almost all of the ALTER OBJECT name, namespace and owner commands) you don't have the Node * parse tree any more from the place where you check for CommandFiresTriggers(), so you can't initialize there. That's part of the fun. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Designing an extension for feature-space similarity search
Tom Lane wrote: - Can domains have operators, or are operators defined on types? I think the current state of play is that you can have such things but the system will only consider them for exact type matches, so you might need more explicit casts than you ordinarily would. Turns out it's even smarter than that; it seems to coerce when it's unambiguous:

create domain birthdate as date;
create function date_dist(birthdate, birthdate) returns integer
  as $$ select 123; $$ language sql;
create operator <-> (procedure = date_dist,
  leftarg = birthdate, rightarg = birthdate);

select '2012-01-01'::birthdate <-> '2012-01-01'::birthdate;  -- 123
select '2012-01-01'::date <-> '2012-01-01'::date;            -- 123

create domain activity_date as date;
create function date_dist(activity_date, activity_date) returns integer
  as $$ select 432; $$ language sql;
create operator <-> (procedure = date_dist,
  leftarg = activity_date, rightarg = activity_date);

select '2012-01-01'::activity_date <-> '2012-01-01'::activity_date;  -- 432
select '2012-01-01'::birthdate <-> '2012-01-01'::birthdate;          -- 123
select '2012-01-01'::date <-> '2012-01-01'::date;
-- ERROR: operator is not unique: date <-> date
[HACKERS] Qual evaluation cost estimates for GIN indexes
I looked into the complaint here of poor estimation for GIN indexscans: http://archives.postgresql.org/pgsql-performance/2012-02/msg00028.php At first glance it sounds like a mistake in selectivity estimation, but it isn't: the rowcount estimates are pretty nearly dead on. The problem is in the planner's estimate of the cost of executing the @@ operator. We have pg_proc.procost set to 1 for ts_match_vq, but actually it's a good deal more expensive than that. Some experimentation suggests that @@ might be about 500 times as expensive as a simple integer comparison. I don't propose pushing its procost up that much, but surely at least 10 would be appropriate, maybe even 100. However ... if you just alter pg_proc.procost in Marc's example, the planner *still* picks a seqscan, even though its estimate of the seqscan cost surely does go up. The reason is that its estimate of the GIN indexscan cost goes up just as much, since we charge one qual eval cost per returned tuple in gincostestimate. It is easy to tell from the actual runtimes that that is not what's happening in a GIN indexscan; we are not re-executing the @@ operator for every tuple. But the planner's cost model doesn't know that. There are a couple of issues that would have to be addressed to make this better: 1. If we shouldn't charge procost per row, what should we charge? It's probably reasonable to assume that the primitive GIN-index-entry comparison operations have cost equal to one cpu_operator_cost regardless of what's assigned to the user-visible operators, but I'm not convinced that that's sufficient to model complicated operators. It might be okay to charge that much per index entry visited rather than driving it off the number of heap tuples returned. 
The code in gincostestimate goes to considerable lengths to estimate the number of index pages fetched, and it seems like it might be able to derive the number of index entries visited too, but it's not trying to account for any CPU costs at the moment. 2. What about lossy operators, or lossy bitmap scans? If either of those things happen, we *will* re-execute the @@ operator at visited tuples, so discounting its high cost would be a mistake. But the planner has no information about either effect, since we moved all support for lossiness to runtime. I think it was point #2 that led us to not consider these issues before. But now that we've seen actual cases where the planner makes a poor decision because it's not modeling this effect, I think we ought to try to do something about it. I haven't got time to do anything about this for 9.2, and I bet you don't either, but it ought to be on the TODO list to try to improve this. BTW, an entirely different line of thought is "why on earth is @@ so frickin expensive, when it's comparing already-processed tsvectors with only a few entries to an already-processed tsquery with only one entry??". This test case suggests to me that there's something unnecessarily slow in there, and a bit of micro-optimization effort might be well repaid. regards, tom lane
Re: [HACKERS] Qual evaluation cost estimates for GIN indexes
I wrote: BTW, an entirely different line of thought is "why on earth is @@ so frickin expensive, when it's comparing already-processed tsvectors with only a few entries to an already-processed tsquery with only one entry??". This test case suggests to me that there's something unnecessarily slow in there, and a bit of micro-optimization effort might be well repaid. Oh, scratch that: a bit of oprofiling shows that while the tsvectors aren't all that long, they are long enough to get compressed, and most of the runtime is going into pglz_decompress, not @@ itself. So this goes back to the known issue that the planner ought to try to account for detoasting costs. regards, tom lane
Re: [HACKERS] Designing an extension for feature-space similarity search
Jay Levitt jay.lev...@gmail.com writes: Tom Lane wrote: - Can domains have operators, or are operators defined on types? I think the current state of play is that you can have such things but the system will only consider them for exact type matches, so you might need more explicit casts than you ordinarily would. Turns out it's even smarter than that; it seems to coerce when it's unambiguous: Yeah, that sort of case will probably work all right. The thing to keep in mind is that operators/functions declared to take domains are definitely second-class citizens, and will lose out in many somewhat-ambiguous cases where an operator on a base type could get selected due to the ambiguity resolution rules. For your application it will probably help if you can pick an operator name that's not in use for any operation on the base type(s). Also, it's conceivable that it won't matter much to you, since I gather these operators will mostly get invoked behind the scenes and not directly written in queries; it should not be too hard to avoid ambiguity in mechanically-generated references. (There was a good deal of chatter some years ago about trying to improve support of functions taking domains, but nothing's gotten done yet.) regards, tom lane
Re: [HACKERS] possible new option for wal_sync_method
Good point, thanks. From the ext3 source code, it looks like ext3_sync_file() does a blkdev_issue_flush(), which issues a flush to the block device, whereas simple direct IO does not. So that would make this wal_sync_method option less useful, since, as you say, the user would have to know if the block device is doing write caching. For the numbers I reported, I don't think the performance gain is from not doing the block device flush. The system being measured is a Fibre Channel disk which should have a fully nonvolatile disk array. And measurements using systemtap show that blkdev_issue_flush() always takes only in the microsecond range. I think the overhead is still from the fact that ext3_sync_file() waits for the current in-flight transaction if there is one (and does an explicit device flush if there is no transaction to wait for). I do think there are lots of metadata operations happening on the data files (especially for a growing database), so the WAL log commit is waiting for unrelated data operations. It would be nice if there were a simple file system operation that just flushed the cache of the block device containing the filesystem (i.e. just does the blkdev_issue_flush() and not the other things in ext3_sync_file()). The ext4_sync_file() code looks fairly similar, so I think it may have the same problem, though I can't be positive. In that case, this wal_sync_method option might help ext4 as well. With respect to sync_file_range(), the Linux code that I'm looking at doesn't really seem to indicate that there is a device flush (since it never calls an f_op->fsync_file operation). So sync_file_range() may not be as useful as thought. By the way, all the numbers were measured with the data=writeback, barrier=1 options for ext3. I don't think that I have seen a significant difference with the DBT2 workload for the ext3 option data=ordered. 
I will measure all these numbers again tonight, but with barrier=0, so as to try to confirm that the write flush itself isn't costing a lot for this configuration. Dan - Original Message - From: Andres Freund and...@anarazel.de To: pgsql-hackers@postgresql.org Cc: Dan Scales sca...@vmware.com Sent: Thursday, February 16, 2012 10:32:09 AM Subject: Re: [HACKERS] possible new option for wal_sync_method Hi, On Thursday, February 16, 2012 06:18:23 PM Dan Scales wrote: When running Postgres on a single ext3 filesystem on Linux, we find that the attached simple patch gives a significant performance benefit (7-8% in the numbers below). The patch adds a new option for wal_sync_method, which is open_direct. With this option, the WAL is always opened with O_DIRECT (but not O_SYNC or O_DSYNC). For Linux, the use of only O_DIRECT should be correct. All WAL logs are fully allocated before being used, and the WAL buffers are 8K-aligned, so all direct writes are guaranteed to complete before returning. (See http://lwn.net/Articles/348739/) I don't think that behaviour is safe in the face of write caches in the IO path. Linux takes care to issue flush/barrier instructions when necessary if you issue an fsync/fdatasync, but to my knowledge it does not when O_DIRECT is used (that would suck performancewise). I think that behaviour is safe if you have no externally visible write caching enabled, but that's not exactly easy-to-get/document knowledge. Why should there otherwise be any performance difference between O_DIRECT|O_SYNC and O_DIRECT in the WAL write case? There is no metadata that needs to be written, and I have a hard time imagining that the check whether there is metadata is that expensive. I guess a more interesting case would be comparing O_DIRECT|O_SYNC with O_DIRECT + fdatasync() or even O_DIRECT + sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER) Any special reason you've done that comparison on ext3? 
Especially with data=ordered its behaviour regarding syncs is pretty insane performancewise. Ext4 would be a bit more interesting... Andres
Re: [HACKERS] Bug in intarray?
Guillaume Lelarge guilla...@lelarge.info writes: This query: SELECT ARRAY[-1,3,1] & ARRAY[1, 2]; should give {1} as a result. But, on HEAD (and according to his tests, on 9.0.6 and 9.1.2), it appears to give an empty array. Definitely a bug, and I'll bet it goes all the way back. Digging on this issue, another user (Julien Rouhaud) made an interesting comment on this line of code: if (i + j == 0 || (i + j > 0 && *(dr - 1) != db[j])) (line 159 of contrib/intarray/_int_tool.c, current HEAD) Apparently, the code tries to check the current value of the right-side array against the previous value of the resulting array. Which clearly cannot work if there is no previous value in the resulting array. So I worked on a patch to fix this, as I think it is a bug (but I may be wrong). Patch is attached and fixes the issue AFAICT. Yeah, this code is bogus, but it's also pretty unreadable. I think it's better to get rid of the inconsistently-used pointer arithmetic and the fundamentally wrong/irrelevant test on i+j, along the lines of the attached. regards, tom lane

diff --git a/contrib/intarray/_int_tool.c b/contrib/intarray/_int_tool.c
index 79f018d333b1d6810aa7fd2996c158b41b0263fb..132d15316054d842593468f191eca9053846f706 100644
*** a/contrib/intarray/_int_tool.c
--- b/contrib/intarray/_int_tool.c
*** inner_int_inter(ArrayType *a, ArrayType *b)
*** 140,146 ****
  			   *db,
  			   *dr;
  	int			i,
! 				j;
  
  	if (ARRISEMPTY(a) || ARRISEMPTY(b))
  		return new_intArrayType(0);
--- 140,147 ----
  			   *db,
  			   *dr;
  	int			i,
! 				j,
! 				k;
  
  	if (ARRISEMPTY(a) || ARRISEMPTY(b))
  		return new_intArrayType(0);
*** 152,166 ****
  	r = new_intArrayType(Min(na, nb));
  	dr = ARRPTR(r);
  
! 	i = j = 0;
  	while (i < na && j < nb)
  	{
  		if (da[i] < db[j])
  			i++;
  		else if (da[i] == db[j])
  		{
! 			if (i + j == 0 ||
! 				(i + j > 0 && *(dr - 1) != db[j]))
! 				*dr++ = db[j];
  			i++;
  			j++;
  		}
--- 153,167 ----
  	r = new_intArrayType(Min(na, nb));
  	dr = ARRPTR(r);
  
! 	i = j = k = 0;
  	while (i < na && j < nb)
  	{
  		if (da[i] < db[j])
  			i++;
  		else if (da[i] == db[j])
  		{
! 			if (k == 0 || dr[k - 1] != db[j])
! 				dr[k++] = db[j];
  			i++;
  			j++;
  		}
*** 168,180 ****
  		else
  			j++;
  	}
  
! 	if ((dr - ARRPTR(r)) == 0)
  	{
  		pfree(r);
  		return new_intArrayType(0);
  	}
  	else
! 		return resize_intArrayType(r, dr - ARRPTR(r));
  }
  
  void
--- 169,181 ----
  		else
  			j++;
  	}
  
! 	if (k == 0)
  	{
  		pfree(r);
  		return new_intArrayType(0);
  	}
  	else
! 		return resize_intArrayType(r, k);
  }
  
  void
diff --git a/contrib/intarray/expected/_int.out b/contrib/intarray/expected/_int.out
index 6ed3cc6ced096e2e97665e76ceba7fc139415029..4080b9633fe98a91861684e0d82f1297c21b91af 100644
*** a/contrib/intarray/expected/_int.out
--- b/contrib/intarray/expected/_int.out
*** SELECT '{123,623,445}'::int[] & '{1623,623}';
*** 137,142 ****
--- 137,148 ----
   {623}
  (1 row)
  
+ SELECT '{-1,3,1}'::int[] & '{1,2}';
+  ?column?
+ ----------
+  {1}
+ (1 row)
+ 
  --test query_int
  SELECT '1'::query_int;
   query_int
diff --git a/contrib/intarray/sql/_int.sql b/contrib/intarray/sql/_int.sql
index b60e936dc520d8308bd16e50681f061dc86e2dfe..216c5c58d615a7cd1a5fe3714c9a5c91fb255138 100644
*** a/contrib/intarray/sql/_int.sql
--- b/contrib/intarray/sql/_int.sql
*** SELECT '{123,623,445}'::int[] | 623;
*** 24,29 ****
--- 24,30 ----
  SELECT '{123,623,445}'::int[] | 1623;
  SELECT '{123,623,445}'::int[] | '{1623,623}';
  SELECT '{123,623,445}'::int[] & '{1623,623}';
+ SELECT '{-1,3,1}'::int[] & '{1,2}';
  
  --test query_int
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
(2012/02/17 0:15), Albe Laurenz wrote: Shigeru Hanada wrote: Thanks for the review. Attached patches are revised versions, though only fdw_helper_v5.patch is unchanged. Two questions: - Is it on purpose that you can specify all SSL client options except sslcompression? No, just an oversight. Good catch. - Since a rescan is done by rewinding the cursor, is it necessary to have any other remote isolation level than READ COMMITTED? There is only one query issued per transaction. If multiple foreign tables on a foreign server are used in a local query, multiple queries are executed in a remote transaction. So IMO isolation levels are useful even if a remote query is executed only once. Regards, -- Shigeru Hanada
Re: [HACKERS] Qual evaluation cost estimates for GIN indexes
On Thu, Feb 16, 2012 at 6:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: BTW, an entirely different line of thought is "why on earth is @@ so frickin expensive, when it's comparing already-processed tsvectors with only a few entries to an already-processed tsquery with only one entry??". This test case suggests to me that there's something unnecessarily slow in there, and a bit of micro-optimization effort might be well repaid. Oh, scratch that: a bit of oprofiling shows that while the tsvectors aren't all that long, they are long enough to get compressed, and most of the runtime is going into pglz_decompress, not @@ itself. So this goes back to the known issue that the planner ought to try to account for detoasting costs. This issue of detoasting costs comes up a lot, specifically in reference to @@. I wonder if we shouldn't try to apply some quick-and-dirty hack in time for 9.2, like maybe charging random_page_cost for every row or every attribute we think will require detoasting. That's obviously going to be an underestimate in many if not most cases, but it would probably still be an improvement over assuming that detoasting is free. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Qual evaluation cost estimates for GIN indexes
Robert Haas robertmh...@gmail.com writes: This issue of detoasting costs comes up a lot, specifically in reference to @@. I wonder if we shouldn't try to apply some quick-and-dirty hack in time for 9.2, like maybe charging random_page_cost for every row or every attribute we think will require detoasting. That's obviously going to be an underestimate in many if not most cases, but it would probably still be an improvement over assuming that detoasting is free. Well, you can't theorize without data, to misquote Sherlock. We'd need to have some stats on which to base "we think this will require detoasting". I guess we could teach ANALYZE to compute and store fractions "percent of entries in this column that are compressed" and "percent that are stored out-of-line", and then hope that those percentages apply to the subset of entries that a given query will visit, and thereby derive a number of operations to multiply by whatever we think the cost-per-detoast-operation is. It's probably all do-able, but it seems way too late to be thinking about this for 9.2. We've already got a ton of new stuff that needs to be polished and tuned... regards, tom lane
Re: [HACKERS] Command Triggers
On Thu, Feb 16, 2012 at 5:38 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Another option would be to add a PGC_SUSET boolean GUC that can be used to disable command triggers. I think that might be more flexible, not to mention useful for recursion prevention. Wait, we already have ALTER TRIGGER bob ON ANY COMMAND SET DISABLED; which I had forgotten about in the previous answer, so I think we're good as it is. That's how I prevented recursion in some of my tests (not included in the regress tests; I was using INSTEAD OF). Eh, so what happens then if someone sets a command trigger on ALTER TRIGGER? DROP TRIGGER bob ON ALL COMMANDS; Uh, hold on. Creating a trigger on multiple commands ought to only create one entry in pg_cmdtrigger. If you drop it, you drop the whole thing. Changing which commands the trigger applies to would be the job for ALTER, not DROP. But note that we have no similar functionality for regular triggers, so I don't think we really need it here either. Why would we do it that way (a single entry for multiple commands)? The way it is now, it's only syntactic sugar, so I think it's easier to implement, document and use. Well, for one thing, it's consistent with how we handle it for regular triggers. For two things, if you create an object named bob, you expect to end up with an object named bob - not 47 objects (or whatever) that are all named bob. Also, suppose you create a trigger on ALL COMMANDS, and then a new version of PG adds a new command. When you dump and reload, do you expect to end up with a trigger on all commands that existed in the old version, or all the commands that exist in the new version? Or conversely, suppose we get rid of a command in a future release. How will we handle that? I can't think of another example of where a CREATE command creates multiple objects like that. So do you prefer lots of InitCommandContext() or adding another parameter to almost all functions called by standard_ProcessUtility()? Blech. 
Maybe we should just have CommandFiresTriggers initialize it if not already done? Thing is, in a vast number of cases (almost all of the ALTER OBJECT name, namespace and owner commands) you don't have the Node * parse tree any more from the place where you check for CommandFiresTriggers(), so you can't initialize there. That's part of the fun. Hmm, I'll look at this in more detail next time through. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)
On Mon, Feb 13, 2012 at 8:37 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 13.02.2012 01:04, Jeff Janes wrote: Attached is my quick and dirty attempt to set XLP_FIRST_IS_CONTRECORD. I have no idea if I did it correctly, in particular if calling GetXLogBuffer(CurrPos) twice is OK or if GetXLogBuffer has side effects that make that a bad thing to do. I'm not proposing it as the real fix, I just wanted to get around this problem in order to do more testing. Thanks. That's basically the right approach. Attached patch contains a cleaned-up version of that. Got another problem: when I ran pg_stop_backup to take an online backup, it got stuck until I had generated a new WAL record. This happens because, in the patch, when pg_stop_backup forces a switch to a new WAL file, the old WAL file is not marked as archivable until the next WAL record has been inserted, but pg_stop_backup keeps waiting for that WAL file to be archived. OTOH, without the patch, the WAL file is marked as archivable as soon as the WAL file switch occurs. So, in short, the patch seems to handle the WAL file switch logic incorrectly. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
[HACKERS] MySQL search query is not executing in Postgres DB
In MySQL the below query executes properly. SELECT * FROM Table-name WHERE (Table.ID LIKE '1%') But when I try to execute the above query in Postgres, I get the following exception: org.postgresql.util.PSQLException: ERROR: operator does not exist: integer ~~ unknown Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts. If I convert the same query to SELECT * FROM Table-name WHERE CAST(Table.ID as TEXT) LIKE '1%', it gets executed directly in the Postgres DB. But I need some way to implicitly type cast in the DB, which would allow me to execute the MySQL query without any exception, because I remember there is a way for an integer-to-boolean implicit type cast. Please refer to the following link. http://archives.postgresql.org/pgsql-general/2011-01/msg00866.php Thanks in advance. -- View this message in context: http://postgresql.1045698.n5.nabble.com/MySQL-search-query-is-not-executing-in-Postgres-DB-tp5491531p5491531.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] Qual evaluation cost estimates for GIN indexes
Hi. First, thanks for looking at this. Apart from GIN indexes and full-text search being really good in our applications, this also points to the exact places where they can be improved. On 2012-02-17 00:15, Tom Lane wrote: I looked into the complaint here of poor estimation for GIN indexscans: http://archives.postgresql.org/pgsql-performance/2012-02/msg00028.php I think this is the exact same issue: http://archives.postgresql.org/pgsql-hackers/2011-11/msg01754.php At first glance it sounds like a mistake in selectivity estimation, but it isn't: the rowcount estimates are pretty nearly dead on. The problem is in the planner's estimate of the cost of executing the @@ operator. We have pg_proc.procost set to 1 for ts_match_vq, but actually it's a good deal more expensive than that. Some experimentation suggests that @@ might be about 500 times as expensive as a simple integer comparison. I don't propose pushing its procost up that much, but surely at least 10 would be appropriate, maybe even 100. However ... if you just alter pg_proc.procost in Marc's example, the planner *still* picks a seqscan, even though its estimate of the seqscan cost surely does go up. The reason is that its estimate of the GIN indexscan cost goes up just as much, since we charge one qual eval cost per returned tuple in gincostestimate. It is easy to tell from the actual runtimes that that is not what's happening in a GIN indexscan; we are not re-executing the @@ operator for every tuple. But the planner's cost model doesn't know that. There is something about lossy vs. non-lossy: if the index result is lossy, then it would need to execute the @@ operator on each tuple and de-toast the toasted stuff and go all the way. If it isn't, then at least count() on a GIN index should be able to utilize an index-only scan now? I've had a significant amount of struggle over the years in this corner, and the patch that went in for gincostestimate brought a huge set of problems to the ground, but not all. 
Other related threads: http://archives.postgresql.org/pgsql-performance/2010-05/msg00031.php (ts_match_vq cost in discussion) http://archives.postgresql.org/pgsql-performance/2010-05/msg00266.php I don't think I have ever seen the actual run-time of any @@ query be faster going through the seqscan than going through the index, not even if it is pulling nearly all the tuples out (test case that tries to go in that corner): http://archives.postgresql.org/pgsql-performance/2009-10/msg00393.php And I think it is due to a couple of real-world things: 1) The tsvector column is typically toasted. 2) The selected columns are typically in the main table. 3) The gin-index search + pulling the main table is in fact a measurably cheaper operation than pulling main+toast, uncompressing the toast and applying ts_match_vq, even in the most favourable case for the seqscan. Another real-world thing is that since the tsvector column is in toast and isn't read when performing a bitmap heap scan, in addition to the decompress cost it is almost never hot in memory either, making its actual runtime even worse. The same problems hit an index scan on another key when filtering on a @@ operator, but I think I got around most of them by bumping both the cost of @@ and the limit in the query to 10K instead of the 200 actually wanted. I do think I have been digging sufficiently in this corner and can fairly easily test and craft test examples that will demonstrate the challenges (a few are attached in the above links). Thanks for digging in this corner. Let me know if I can help, although my actual coding skills are sparse (at best). -- Jesper
Re: [HACKERS] MySQL search query is not executing in Postgres DB
On 17.02.2012 07:33, premanand wrote: In MySQL the below query is executing properly. SELECT * FROM Table-name WHERE (Table.ID LIKE '1%') But when i try to execute the above query in Postgres, i get the following Exception org.postgresql.util.PSQLException: ERROR: operator does not exist: integer ~~ unknown Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts. If i convert the same query SELECT * FROM Table-name WHERE CAST(Table.ID as TEXT) LIKE '1%' . This gets executed directly in Postgres DB. But i need some query which implicitly type cast in DB, which allows me to execute the MySQL query without any Exception. Because i remember there is a way for integer to boolean implicit type cast. Please refer the following link. http://archives.postgresql.org/pgsql-general/2011-01/msg00866.php You can use CREATE CAST (http://www.postgresql.org/docs/current/static/sql-createcast.html). Or you can create the operator integer ~~ text with CREATE FUNCTION + CREATE OPERATOR. The latter would match fewer cases, which would reduce the chances of introducing subtle bugs elsewhere in your application. Of course, the best fix would be to change your queries. It's quite sloppy to rely on integer LIKE text without an explicit cast in the query. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] WIP: Collecting statistics on CSV file data
Hi Hanada-san, Sorry for the late response. (2012/02/10 22:05), Shigeru Hanada wrote: (2011/12/15 11:30), Etsuro Fujita wrote: (2011/12/14 15:34), Shigeru Hanada wrote: I think this patch could be marked as Ready for committer with some minor fixes. Please find attached a revised patch (v6.1). I've tried to make pgsql_fdw work with this feature, and found that a few static functions needed to be exported to implement an ANALYZE handler in short-cut style. The short-cut style means generating statistics (pg_class and pg_statistic) for foreign tables without retrieving sample data from the foreign server. That's great! Here is my review. The patch applies with some modifications and compiles cleanly. But regression tests on subqueries failed, in addition to the role-related tests discussed earlier. While I've not looked at the patch in detail, I have some comments: 1. The patch might need code to handle the irregular case where ANALYZE-related catalog data such as attstattarget is different between the local and the remote. (Although we don't have options to set such data on a foreign table in ALTER FOREIGN TABLE.) For example, while attstattarget = -1 for some column on the local, attstattarget = 0 for that column on the remote, meaning that there can be no stats available for that column. In such a case it would be better to inform the user of it. 2. It might be better for the FDW to estimate the costs of a remote query for itself, without doing EXPLAIN, if stats were available using this feature. While this approach is less accurate than the EXPLAIN approach, due to the lack of information such as seq_page_cost or random_page_cost on the remote, it is cheaper! I think such information may be added to the generic options for a foreign table, which may have been previously discussed. 3. In implementing the ANALYZE handler, the hardest part was copying anyarray values from remote to local. 
If we can make it common in core, it would help FDW authors who want to implement an ANALYZE handler without retrieving sample rows from the remote server. +1 from me. Best regards, Etsuro Fujita