Re: [HACKERS] psql: show only failed queries

2014-06-30 Thread Rajeev rastogi
On 26 June 2014 11:53, Samrat Revagade Wrote:

 I am sending updated patch - the buggy statement is printed via the more logical 
 psql_error function instead of printf

Thank you for updating the patch; I really appreciate your efforts.

Now, everything is good from my side.
* it applies cleanly to the current git master
* includes the necessary docs
* I think it is a very good and necessary feature.

If Kumar Rajeev Rastogi does not have any extra comments, then I think the patch is
ready for committer.

I have reviewed this patch. Please find my review comments below:

1. A command start-up option (e.g. -a/--echo-all for ECHO=all) is not provided
for the new functionality.

2. The new command start-up option should be added in psql --help as well as
in the documentation.

Also, as I understand it, this new option is a kind of subset of the existing option
(ECHO=query), so shouldn't we display the
query string in the same format as it was printed earlier?
Though I also feel that prefixing the query with the word STATEMENT will be helpful
to grep, but at the same time I am worried
about inconsistency with the existing option.

Thanks and Regards,
Kumar Rajeev Rastogi


Re: [HACKERS] pg_xlogdump --stats

2014-06-30 Thread Abhijit Menon-Sen
At 2014-06-30 05:19:10 +, dilip.ku...@huawei.com wrote:

 I have started reviewing the patch.

Thanks.

 1. Patch applied to git head cleanly.
 2. Compiled on Linux -- some warnings, the same as mentioned by furuyao 

I've attached an updated version of the patch which should fix the
warnings by using %zu.
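As a quick illustration (a minimal sketch, not part of the patch): %zu is the
C99 length modifier for size_t, so it prints correctly on both 32- and 64-bit
platforms where a cast to int or unsigned long would warn:

    #include <stdio.h>

    /* printing a size_t with %zu avoids -Wformat warnings */
    static void
    report_len(size_t len)
    {
        printf("record length: %zu\n", len);
    }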

 3. Some compilation error in windows
 .\contrib\pg_xlogdump\pg_xlogdump.c(1002) : error C2065: 'optional_argument' 
 : undeclared identifier
 .\contrib\pg_xlogdump\pg_xlogdump.c(1002) : error C2099: initializer is not a 
 constant
 
 optional_argument should be added to getopt_long.h file for windows.

Hmm. I have no idea what to do about this. I did notice when I wrote the
code that nothing else used optional_argument, but I didn't realise that
it wouldn't work on Windows.

It may be that the best thing to do would be to avoid using
optional_argument altogether, and have separate --stats and
--stats-per-record options. Thoughts?

 Please fix these issues and send the updated patch.

I've also attached a separate patch to factor out the identify_record
function into rm_identify functions in the individual rmgr files, so
that the code can live next to the respective _desc functions.

This allows us to remove a number of #includes from pg_xlogdump, and
makes the code a bit more straightforward to read.
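To sketch the shape of that refactoring (hypothetical field layout, abridged
rather than quoted from the patch): each rmgr's desc file gains an identify
callback next to its desc callback,

    /* rmgrdesc.h, sketch only */
    typedef struct RmgrDescData
    {
        const char *rm_name;
        void        (*rm_desc) (StringInfo buf, XLogRecord *record);
        const char *(*rm_identify) (uint8 info);   /* new: name the record type */
    } RmgrDescData;

so pg_xlogdump can ask the rmgr for a record-type name instead of keeping one
large switch of its own.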

-- Abhijit
diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c
index 824b8c3..1d3e664 100644
--- a/contrib/pg_xlogdump/pg_xlogdump.c
+++ b/contrib/pg_xlogdump/pg_xlogdump.c
@@ -15,12 +15,28 @@
 #include <dirent.h>
 #include <unistd.h>
 
+#include "access/clog.h"
+#include "access/gin_private.h"
+#include "access/gist_private.h"
+#include "access/heapam_xlog.h"
+#include "access/heapam_xlog.h"
+#include "access/multixact.h"
+#include "access/nbtree.h"
+#include "access/spgist_private.h"
+#include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xlogreader.h"
 #include "access/transam.h"
+#include "catalog/pg_control.h"
+#include "catalog/storage_xlog.h"
+#include "commands/dbcommands.h"
+#include "commands/sequence.h"
+#include "commands/tablespace.h"
 #include "common/fe_memutils.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "storage/standby.h"
+#include "utils/relmapper.h"
 
 
 static const char *progname;
@@ -41,6 +57,8 @@ typedef struct XLogDumpConfig
 	int			stop_after_records;
 	int			already_displayed_records;
 	bool		follow;
+	bool		stats;
+	bool		stats_per_record;
 
 	/* filter options */
 	int			filter_by_rmgr;
@@ -48,6 +66,20 @@ typedef struct XLogDumpConfig
 	bool		filter_by_xid_enabled;
 } XLogDumpConfig;
 
+typedef struct Stats
+{
+	uint64		count;
+	uint64		rec_len;
+	uint64		fpi_len;
+} Stats;
+
+typedef struct XLogDumpStats
+{
+	uint64		count;
+	Stats		rmgr_stats[RM_NEXT_ID];
+	Stats		record_stats[RM_NEXT_ID][16];
+} XLogDumpStats;
+
 static void
 fatal_error(const char *fmt,...)
 __attribute__((format(PG_PRINTF_ATTRIBUTE, 1, 2)));
@@ -322,6 +354,39 @@ XLogDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
 }
 
 /*
+ * Store per-rmgr and per-record statistics for a given record.
+ */
+static void
+XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats, XLogRecPtr ReadRecPtr, XLogRecord *record)
+{
+	RmgrId		rmid;
+	uint8  		recid;
+
+	if (config->filter_by_rmgr != -1 &&
+		config->filter_by_rmgr != record->xl_rmid)
+		return;
+
+	if (config->filter_by_xid_enabled &&
+		config->filter_by_xid != record->xl_xid)
+		return;
+
+	rmid = record->xl_rmid;
+	recid = (record->xl_info & ~XLR_INFO_MASK) >> 4;
+
+	stats->count++;
+
+	stats->rmgr_stats[rmid].count++;
+	stats->rmgr_stats[rmid].rec_len += record->xl_len;
+	stats->rmgr_stats[rmid].fpi_len +=
+		(record->xl_tot_len - record->xl_len - SizeOfXLogRecord);
+
+	stats->record_stats[rmid][recid].count++;
+	stats->record_stats[rmid][recid].rec_len += record->xl_len;
+	stats->record_stats[rmid][recid].fpi_len +=
+		(record->xl_tot_len - record->xl_len - SizeOfXLogRecord);
+}
+
+/*
  * Print a record to stdout
  */
 static void
@@ -380,6 +445,476 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogRecPtr ReadRecPtr, XLogRecord
 	}
 }
 
+/*
+ * Given an xl_rmid and the high bits of xl_info, returns a string
+ * describing the record type.
+ */
+static const char *
+identify_record(RmgrId rmid, uint8 info)
+{
+	const RmgrDescData *desc = &RmgrDescTable[rmid];
+	const char *rec;
+
+	rec = psprintf("0x%x", info);
+
+	switch (rmid)
+	{
+		case RM_XLOG_ID:
+			switch (info)
+			{
+				case XLOG_CHECKPOINT_SHUTDOWN:
+					rec = "CHECKPOINT_SHUTDOWN";
+					break;
+				case XLOG_CHECKPOINT_ONLINE:
+					rec = "CHECKPOINT_ONLINE";
+					break;
+				case XLOG_NOOP:
+					rec = "NOOP";
+					break;
+				case XLOG_NEXTOID:
+					rec = "NEXTOID";
+					break;
+				case XLOG_SWITCH:
+					rec = "SWITCH";
+					break;
+				case XLOG_BACKUP_END:
+					rec = "BACKUP_END";
+					break;
+				case XLOG_PARAMETER_CHANGE:
+					rec = "PARAMETER_CHANGE";
+					break;
+				case XLOG_RESTORE_POINT:
+					rec = "RESTORE_POINT";
+					break;
+				case XLOG_FPW_CHANGE:
+	

Re: [HACKERS] pg_xlogdump --stats

2014-06-30 Thread Abhijit Menon-Sen
At 2014-06-30 12:05:06 +0530, a...@2ndquadrant.com wrote:

 It may be that the best thing to do would be to avoid using
 optional_argument altogether, and have separate --stats and
 --stats-per-record options. Thoughts?

That's what I've done in the attached patch, except I've called the new
option --record-stats. Both options now use no_argument. This should
apply on top of the diff I posted a little while ago.

-- Abhijit
commit cc9422aa71ef0b507c634282272be3fd15c39c0b
Author: Abhijit Menon-Sen a...@2ndquadrant.com
Date:   Mon Jun 30 12:15:54 2014 +0530

Introduce --record-stats to avoid use of optional_argument

diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c
index 47838d4..1853b47 100644
--- a/contrib/pg_xlogdump/pg_xlogdump.c
+++ b/contrib/pg_xlogdump/pg_xlogdump.c
@@ -609,7 +609,8 @@ main(int argc, char **argv)
 		{"timeline", required_argument, NULL, 't'},
 		{"xid", required_argument, NULL, 'x'},
 		{"version", no_argument, NULL, 'V'},
-		{"stats", optional_argument, NULL, 'z'},
+		{"stats", no_argument, NULL, 'z'},
+		{"record-stats", no_argument, NULL, 'Z'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -643,7 +644,7 @@ main(int argc, char **argv)
 		goto bad_argument;
 	}
 
-	while ((option = getopt_long(argc, argv, "be:?fn:p:r:s:t:Vx:z::",
+	while ((option = getopt_long(argc, argv, "be:?fn:p:r:s:t:Vx:zZ",
 								 long_options, &optindex)) != -1)
 	{
 		switch (option)
@@ -738,18 +739,10 @@ main(int argc, char **argv)
 				break;
 			case 'z':
 				config.stats = true;
-				config.stats_per_record = false;
-				if (optarg)
-				{
-					if (strcmp(optarg, "record") == 0)
-						config.stats_per_record = true;
-					else if (strcmp(optarg, "rmgr") != 0)
-					{
-						fprintf(stderr, "%s: unrecognised argument to --stats: %s\n",
-								progname, optarg);
-						goto bad_argument;
-					}
-				}
+				break;
+			case 'Z':
+				config.stats = true;
+				config.stats_per_record = true;
 				break;
 			default:
 				goto bad_argument;
diff --git a/doc/src/sgml/pg_xlogdump.sgml b/doc/src/sgml/pg_xlogdump.sgml
index d9f4a6a..bfd9eb9 100644
--- a/doc/src/sgml/pg_xlogdump.sgml
+++ b/doc/src/sgml/pg_xlogdump.sgml
@@ -181,12 +181,22 @@ PostgreSQL documentation
 
  <varlistentry>
   <term><option>-z</option></term>
-  <term><option>--stats[=record]</option></term>
+  <term><option>--stats</option></term>
   <listitem>
    <para>
     Display summary statistics (number and size of records and
-    full-page images) instead of individual records. Optionally
-    generate statistics per-record instead of per-rmgr.
+    full-page images per rmgr) instead of individual records.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
+  <term><option>-Z</option></term>
+  <term><option>--record-stats</option></term>
+  <listitem>
+   <para>
+    Display summary statistics (number and size of records and
+    full-page images) instead of individual records.
    </para>
   </listitem>
 </varlistentry>



Re: [HACKERS] psql: show only failed queries

2014-06-30 Thread Pavel Stehule
2014-06-30 8:17 GMT+02:00 Rajeev rastogi rajeev.rast...@huawei.com:

  On 26 June 2014 11:53, Samrat Revagade Wrote:



  I am sending updated patch - the buggy statement is printed via the more
 logical psql_error function instead of printf



 Thank you for updating the patch; I really appreciate your efforts.



 Now, everything is good from my side.

 * it applies cleanly to the current git master

 * includes the necessary docs

 * I think it is a very good and necessary feature.



 If Kumar Rajeev Rastogi does not have any extra comments, then I
 think the patch is ready for committer.



 I have reviewed this patch. Please find my review comments below:

 1. A command start-up option (e.g. -a/--echo-all for ECHO=all) is not
 provided for the new functionality.

Not all options entered via psql variables have a corresponding psql
command-line option. I plan to add a new description to the --help-variables list.

If necessary I can add a long option --echo-errors; I didn't find a good
char for a short option. Any ideas?




  2. The new command start-up option should be added in psql --help as
 well as in the documentation.

Depends on the previous point.





 Also, as I understand it, this new option is a kind of subset of the
 existing option (ECHO=query), so shouldn't we display

 the query string in the same format as it was printed earlier?

 Though I also feel that prefixing the query with the word STATEMENT will
 be helpful to grep, but at the same time I am worried

 about inconsistency with the existing option.


That is the question, and I don't have a strong feeling about what should be
preferred. But I am still inclined to prefer the variant with the STATEMENT
prefix. The mode with -a is used for a different purpose than the
show-errors-only mode - and output with the prefix is much more consistent
with a log entry and the displayed error. So I agree there is a potential
inconsistency (but the output is not defined anywhere), but this output is
more practical when you are concentrating on error processing.

Regards

Pavel




 Thanks and Regards,

 Kumar Rajeev Rastogi



Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2014-06-30 Thread Abhijit Menon-Sen
At 2014-06-29 22:25:54 +0530, a...@2ndquadrant.com wrote:

 I think the really right thing to do would be to have two separate
 columns, one with "all", "sameuser", "samerole", "replication", or
 empty; and the other an array of database names.

After sleeping on it, I realised that the code would return '{all}' for
the keyword all in pg_hba.conf, but '{"all"}' for a database literally
named "all". So it's not exactly ambiguous, but I don't think it's
especially useful for callers.

I think having two columns would work. The columns could be called
"database" and "database_list", and "user" and "user_list", respectively.

The "database" column may contain one of "all", "sameuser", "samegroup",
or "replication"; if it's empty, "database_list" will contain an array of
database names. Then ('all', '{}') and ('', '{all}') are easily separated.
Likewise for "user" and "user_list".
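A minimal sketch of how the split could look at parse time, assuming hba.c's
HbaToken (token string plus a was-it-quoted flag) and PostgreSQL's List API;
the helper name is hypothetical:

    /* route one database token into the keyword column or the name array */
    static void
    classify_db_token(HbaToken *tok, const char **keyword, List **names)
    {
        if (!tok->quoted &&
            (strcmp(tok->string, "all") == 0 ||
             strcmp(tok->string, "sameuser") == 0 ||
             strcmp(tok->string, "samerole") == 0 ||
             strcmp(tok->string, "samegroup") == 0 ||
             strcmp(tok->string, "replication") == 0))
            *keyword = tok->string;                /* -> ('all', '{}') */
        else
            *names = lappend(*names, tok->string); /* -> ('', '{all}') */
    }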

I've marked this patch returned with feedback and moved it to the
August CF after discussion with Vaishnavi.

-- Abhijit




Re: [HACKERS] SQL access to database attributes

2014-06-30 Thread Pavel Stehule
2014-06-29 21:09 GMT+02:00 Tom Lane t...@sss.pgh.pa.us:

 Vik Fearing vik.fear...@dalibo.com writes:
  On 06/21/2014 10:11 PM, Pavel Stehule wrote:
  Is there any reason why the incompatible change CONNECTION_LIMIT
  instead of CONNECTION LIMIT is acceptable? Is decreasing the parser size
  by about 1% good enough to justify breaking compatibility?

  How is compatibility broken?  The grammar still accepts the old way, I
  just changed the documentation to promote the new way.

 While I agree that this patch wouldn't break backwards compatibility,
 I don't really see what the argument is for changing the recommended
 spelling of the command.

 The difficulty with doing what you've done here is that it creates
 unnecessary cross-version incompatibilities; for example a 9.5 psql
 being used against a 9.4 server would tab-complete the wrong spelling
 of the option.  Back-patching would change the set of versions for
 which the problem exists, but it wouldn't remove the problem altogether.
 And in fact it'd add new problems, e.g. pg_dumpall output from a 9.3.5
 pg_dumpall failing to load into a 9.3.4 server.  This is not the kind of
 change we customarily back-patch anyway.

 So personally I'd have just made connection_limit be an undocumented
 internal equivalent for CONNECTION LIMIT, and kept the latter as the
 preferred spelling, with no client-side changes.


+1

There is no important reason to make hard changes at this moment.

Pavel




 regards, tom lane



[HACKERS] Escaping from blocked send() reprised.

2014-06-30 Thread Kyotaro HORIGUCHI
Hello. I have received inquiries related to blocked communication
several times in recent weeks, with different symptoms. Then I
found this message in the archive:

http://postgresql.1045698.n5.nabble.com/Escaping-a-blocked-sendto-syscall-without-causing-a-restart-td5740855.html

 Subject: Escaping a blocked sendto() syscall without causing a restart

Tom Lane commented in reply:

 Offhand it looks to me like most signals would kick the backend off the
 send() call ... but it would loop right back and try again.  See
 internal_flush() in pqcomm.c.  (If you're using SSL, this diagnosis
 may or may not apply.)
 
 We can't do anything except repeat the send attempt if the client
 connection is to be kept in a sane state.
(snipped)
  And I'm not at all sure if we could get it to work in SSL mode...

That's true for timeouts that should continue the connection,
say, statement_timeout, but focusing on intentional backend
termination, I think it does no harm to break the connection
abruptly, even if it is on SSL. On the other hand it still seems
preferable to keep the connection when it is not blocked. The following
expression would detect such a blocked state just before the
next send(2), after the previous try was interrupted by a signal:

(ProcDiePending && select(1, NULL, fd, NULL, '1 sec') == 0)

Finally, pg_terminate_backend() works even when send is blocked,
for both SSL and non-SSL connections, after a one-second delay with
this patch (break_socket_blocking_on_termination_v1.patch).
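To illustrate the check (a minimal sketch under the same assumptions - a
single non-SSL socket and the retry loop in internal_flush() - not the patch
itself):

    #include <stdbool.h>
    #include <sys/select.h>

    /*
     * After send(2) was interrupted by a signal: is the socket still
     * unwritable?  select() returning 0 means the one-second timeout
     * expired without the socket becoming writable.
     */
    static bool
    socket_still_blocked(int sock)
    {
        fd_set          wfds;
        struct timeval  tv;

        FD_ZERO(&wfds);
        FD_SET(sock, &wfds);
        tv.tv_sec = 1;
        tv.tv_usec = 0;

        return select(sock + 1, NULL, &wfds, NULL, &tv) == 0;
    }

    /*
     * Conceptually, in the retry loop:
     *     if (ProcDiePending && socket_still_blocked(sock))
     *         CHECK_FOR_INTERRUPTS();   -- exit instead of retrying send()
     */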

Nevertheless, statement_timeout of course cannot be made effective
by this method, since that would break the consistency of the client
protocol. It would need a change in the client protocol to add an
out-of-band mechanism or something, maybe.

Any suggestions?



Attached patches are:

  - break_socket_blocking_on_termination_v1.patch : the patch to
    break the blocked state of send(2) for pg_terminate_backend().

  - socket_block_test.patch : debug printing and an enlarged send
    buffer in libpq, for reproducing the blocked situation.



Some points of discussion follow.

 Discussion about the appropriateness of looking at
 ProcDiePending there and calling CHECK_FOR_INTERRUPTS()
 on seeing it.

I am somewhat uneasy about these things, but the most we can do
seems to be replacing ProcDiePending here with some other
variable, say ImmediatelyExitFromBlockedState, and somehow going back
up through the normal return path. An additional try-catch seems
able to do that, but it looks like no benefit for the added complexity.



 Discussion on breaking the connection, especially for SSL

Breaking an SSL connection in my_sock_write() causes the
following message on the client side if it is still alive and resumes
receiving from the connection; this seems to show that the client
handles the event properly.

| SSL SYSCALL error: EOF detected
| The connection to the server was lost. Attempting reset: Succeeded.



 Discussion on the reliability of select(2)

This method is not a perfect solution, since select(2)
sometimes gives a wrong oracle about whether the following
send(2) will block.

Even so, as far as I can see, select(2) just after exiting a
send(2) blocked by a signal always seems to say 'write will
block', so what this patch does seems to cover most cases, except
when some amount of socket buffer is vacated just before the
following select(). The second chance to exit from a blocked send(2)
won't come after that (before one more pg_terminate_backend()?).

Removing the select(2) from the condition (that is, calling
CHECK_FOR_INTERRUPTS() whenever ProcDiePending is true)
prevents such a possibility, in exchange for killing a 'really
live' connection, but IMHO that's no problem for intentional server
termination.

A more reliable measure for this would be non-blocking I/O, but that
changes more of the code.



  Reproducing the situation.

Another possible question would be about the likelihood of such
blocking, or how to reproduce the situation. I found that send(2)
on CentOS 6.5 somehow, in most cases, successfully sends the
remaining data on the retry after being interrupted by a signal while
blocked on a full buffer, in spite of no change in the environment.

So reproducing the stuck situation is rather difficult on the
server as-is. But the situation can be reproduced quite easily
with some cheating, that is, enlarging the PQ send buffer, say by ten
times.

Applying the attached testing patch (socket_block_test.patch),
the following steps will reproduce the stuck situation.

 1. Run a select which returns a big result, and enter Ctrl-Z just
    after invoking it.

   cl $ psql -h localhost postgres
   cl postgres=# select 1 from generate_series(0, 999);
   cl ^Z
   cl [4]+  Stopped psql -h localhost postgres

 2. Watch the server get stuck.

  The server starts printing lines like the following to the console
  after a while, then stops. The number enclosed in square
  brackets is the PID of the server.

   sv  [8809] (bare) rest = 0 / 81920 bytes, ProcDiePending = 0

 3. Do 

Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-30 Thread Craig Ringer
On 06/29/2014 02:19 AM, Matheus de Oliveira wrote:
 Hi Hackers,
 
 I have worked on that patch a little more. So now I have functional
 patch (although still WIP) attached. The feature works as following:
 
 - Added a boolean parameter only_temp_files to pg_tablespace.spcoptions;
 - This parameter can be set to true only during CREATE TABLESPACE, not
 on ALTER TABLESPACE (I have thought of ways of implementing the latter,
 and I'd like to discuss it more later);
 - On the creation of relations, it is checked if it is a
 temporary-tablespace, and an error occurs when it is and the relation is
 not temporary (temp table or index on a temp table);
 - When a temporary file (either relation file or sort/agg file) is
 created inside a temporary tablespace, the entire directory structure
 is created on-demand (e.g. if
 pg_tblspc/oid/TABLESPACE_VERSION_DIRECTORY is missing, it is created
 on demand) it is done on OpenTemporaryFileInTablespace, at fd.c (I
 wonder if shouldn't we do that for any tablespace) and on
 TablespaceCreateDbspace, at tablespace.c.

Right now PostgreSQL appears to rely on the absence of the tablespace
directory as a flag to tell it "don't start up, something's badly wrong
here".

If the user creates the tablespace directory, it figures they at least
know what they're doing and carries on starting up with the empty
tablespace. It's treated as an admin statement - "I know it's gone, it's
not coming back, just carry on as best you can".

If Pg were to auto-create the directory, then if (say) a mount of a
tablespace dir failed at boot, Pg would still happily start up and might
create files in the tablespace, despite there being inaccessible
tables/indexes/etc on the not-mounted volume that's *supposed* to be the
tablespace storage. That'd create plenty of mess.

So no, I don't think Pg should auto-create the tablespace directory if
it's missing for non-temporary tablespaces. Not unless the current (less
than friendly) behaviour around lost tablespaces is replaced with
something like an `ignore_missing_tablespaces` or
`drop_missing_tablespaces` GUC - akin to our existing
`zero_missing_pages` recovery GUC.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] inherit support for foreign tables

2014-06-30 Thread Ashutosh Bapat
I checked that it's reporting the right tableoid now.

BTW, why aren't you using the tlist passed to this function? I guess
create_scan_plan() passes tlist after processing it, so that should be used
rather than rel->reltargetlist.


On Mon, Jun 30, 2014 at 12:22 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp
 wrote:

 (2014/06/24 16:30), Etsuro Fujita wrote:

 (2014/06/23 18:35), Ashutosh Bapat wrote:


  Selecting tableoid on the parent causes an error, "ERROR:  cannot extract
 system attribute from virtual tuple". The foreign table has an OID which
 can be reported as tableoid for the rows coming from that foreign table.
 Do we want to do that?


 No.  I think it's a bug.  I'll fix it.


 Done.  I think this is because create_foreignscan_plan() makes reference
  to attr_needed, which isn't computed for inheritance children.  To avoid
 this, I've modified create_foreignscan_plan() to see reltargetlist and
 baserestrictinfo, instead of attr_needed.  Please find attached an updated
 version of the patch.


 Sorry for the delay.

 Best regards,
 Etsuro Fujita




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] replicating DROP commands across servers

2014-06-30 Thread Abhijit Menon-Sen
Hi.

I thought I'd review this patch, since pgaudit uses the
pg_event_trigger_dropped_objects function.

I went through the patch line by line, and I don't really have anything
to say about it. I notice that there are some XXX/FIXME comments in the
code, but it's not clear if those need to (or can be) fixed before the
code is committed.

Everything else looks good. I think this is ready for committer.

-- Abhijit




Re: [HACKERS] IMPORT FOREIGN SCHEMA statement

2014-06-30 Thread Ronan Dunklau
On Monday 30 June 2014 16:43:38, Michael Paquier wrote:
 On Thu, Jun 19, 2014 at 11:00 PM, Michael Paquier michael.paqu...@gmail.com
  wrote:
   This seems to be related to re-using the table-name between
 
  invocations. The
 
   attached patch should fix point 2. As for point 1, I don't know the
 
  cause for
 
   it. Do you have a reproducible test case ?
  
   Sure. I'll try harder when looking in more detail at the patch for
   postgres_fdw :)
 
  With v2, I have tried randomly some of the scenarios of v1 plus some
  extras, but was not able to reproduce it.
 
   I'll look into the patch for postgres_fdw later.
 
  And here are some comments about it, when applied on top of the other 2
  patches.
  1) Code compiles without warning, regression tests pass.
  2) In fetch_remote_tables, the palloc for the parameters should be
  done after the number of parameters is determined. In the case of
  IMPORT_ALL, some memory is wasted for nothing.
  3) Current code is not able to import default expressions for a table.
  A little bit of processing with pg_get_expr would be necessary:
  select pg_catalog.pg_get_expr(adbin, adrelid) from pg_attrdef;
  There are of course bonus cases like SERIAL columns coming immediately
  to my mind but it would be possible to determine if a given column is
  serial with pg_get_serial_sequence.
  This would be a good addition for the FDW IMO.
  4) The same applies of course to the constraint name: CREATE FOREIGN
  TABLE foobar (a int CONSTRAINT toto NOT NULL) for example.
  5) A bonus idea: enable retrieval of default expressions and the null/not
  null switch by default, but add options to disable each of them
  individually.
  6) Defining once PGFDW_IMPORTRESULT_NUMCOLS at the top of
  postgres_fdw.c without undefining would be perfectly fine.
  7) In postgresImportForeignSchema, the palloc calls and the
  definitions of the variables used to save the results should be done
  within the for loop.
  8) At quick glance, the logic of postgresImportForeignSchema looks
  awkward... I'll have a second look with a fresher mind later on this
  one.

 While having a second look at the core patch, I have found myself
 re-hacking it, fixing a couple of bugs and adding things that have been
 missing in the former implementation:
 - Deletions of unnecessary structures to simplify code and make it cleaner
 - Fixed a bug related to the management of the local schema name. An FDW was
 free to set the schema name where local tables are created; this should not
 be the case.
 - Improved documentation, examples and other things, fixed doc padding for
 example
 - Added some missing stuff in equalfuncs.c and copyfuncs.c
 - Some other things.
 With that, core patch looks pretty nice actually, and I think that we
 should let a committer have a look at this part at least for this CF.

 Also, the postgres_fdw portion has been updated based on the previous core
 patch modified, using a version that Ronan sent me, which has addressed the
 remarks I sent before. This patch is still lacking documentation, some
 cleanup, and regression tests are broken, but it can be used to test the
 core feature. I unfortunately don't have much time today but I am sending
 this patch either way to let people play with IMPORT SCHEMA if I don't come
 back to it before.

The regression tests fail because of a typo in pg_type.h: BOOLARRAYOID should
be defined to 1000, not 1003 (which clashes against NAMEARRAYOID).

What do you think should be documented, and where?


 Regards,

--
Ronan Dunklau
http://dalibo.com - http://dalibo.org

From 1cc2922e9087f2d852d2b1196314d7e35dce42a3 Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Mon, 30 Jun 2014 16:36:47 +0900
Subject: [PATCH 2/2] Add support of IMPORT SCHEMA for postgres_fdw

---
 contrib/postgres_fdw/expected/postgres_fdw.out |  62 ++
 contrib/postgres_fdw/postgres_fdw.c| 259 +
 contrib/postgres_fdw/sql/postgres_fdw.sql  |  30 +++
 src/include/catalog/pg_type.h  |   2 +
 4 files changed, 353 insertions(+)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2e49ee3..07e6c11 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2834,3 +2834,65 @@ NOTICE:  NEW: (13,test triggered !)
  (0,27)
 (1 row)

+-- Test IMPORT FOREIGN SCHEMA statement
+CREATE schema import_destination;
+CREATE schema import_source;
+CREATE TABLE import_source.t1 (c1 int, c2 varchar NOT NULL);
+CREATE TABLE import_source.t2 (c1 int, c2 varchar);
+CREATE TABLE import_source.t3 (c1 int, c2 varchar);
+IMPORT FOREIGN SCHEMA import_source FROM SERVER loopback INTO import_destination;
+\det+ import_destination;
+List of foreign tables
+   Schema   | Table |  Server  |  FDW Options   | Description

Re: [HACKERS] psql: show only failed queries

2014-06-30 Thread Rajeev rastogi
On 30 June 2014 12:24, Pavel Stehule Wrote:

I have reviewed this patch. Please find my review comments below:

1. A command start-up option (e.g. -a/--echo-all for ECHO=all) is not provided
for the new functionality.
Not all options entered via psql variables have a corresponding psql option.
I plan to add a new description to the --help-variables list.
If necessary I can add a long option --echo-errors; I didn't find a good char
for a short option. Any ideas?

But the new option we are adding is on the track of an existing option, so better
we add a start-up option for this also.
Yeah, the long option --echo-errors seems fine to me also. For the short option, I
think we can use "-b", standing for "blunder". This is the closest one I could think
of.


2. The new command start-up option should be added in psql --help as well
as in the documentation.
Depends on the previous point.
Right.
Also, as I understand it, this new option is a kind of subset of the existing option
(ECHO=query), so shouldn't we display the
query string in the same format as it was printed earlier?
Though I also feel that prefixing the query with the word STATEMENT will be helpful
to grep, but at the same time I am worried
about inconsistency with the existing option.

That is the question, and I don't have a strong feeling about what should be
preferred. But I am still inclined to prefer the variant with the STATEMENT prefix.
The mode with -a is used for a different purpose than the show-errors-only mode -
and output with the prefix is much
 more consistent with a log entry and the displayed error. So I agree there
 is a potential inconsistency (but the output is not defined anywhere), but this
 output is more practical when you are concentrating on error processing.
Yeah right, I just wanted to raise the point to provoke other thoughts, to see if
anyone has a different opinion. If there is no objection from others, we can go ahead
with the current prefixing approach.
Thanks and Regards,
Kumar Rajeev Rastogi


Re: [HACKERS] better atomics - v0.5

2014-06-30 Thread Andres Freund
On 2014-06-30 11:04:53 +0530, Amit Kapila wrote:
 On Sun, Jun 29, 2014 at 2:54 PM, Andres Freund and...@2ndquadrant.com wrote:
  On 2014-06-29 11:53:28 +0530, Amit Kapila wrote:
   On Sat, Jun 28, 2014 at 1:48 PM, Andres Freund and...@2ndquadrant.com
   I think it is better to have some discussion about it. Apart from this,
   today I noticed atomic read/write doesn't use any barrier semantics
   and just rely on volatile.
 
  Yes, intentionally so. It's often important to get/set the current value
  without the cost of emitting a memory barrier. It just has to be a
  recent value  and it actually has to come from memory.
 
 I agree with you that we don't want to incur the cost of memory barrier
 for get/set, however it should not be at the cost of correctness.

I really can't follow here. A volatile read/store forces it to go to
memory without barriers. The ABIs of all platforms we work with
guarantee that 4-byte stores/reads are atomic - we've been relying on
that for a long while.
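For illustration, a minimal sketch (local names, not the patch's actual
definitions) of what such an unlocked read/write looks like:

    #include <stdint.h>

    typedef struct
    {
        volatile uint32_t value;
    } my_atomic_u32;

    /* forced to memory by volatile; atomic per the platform ABI for
     * aligned 4-byte accesses; emits no memory barrier, so no ordering */
    static inline uint32_t
    my_atomic_read(my_atomic_u32 *p)
    {
        return p->value;
    }

    static inline void
    my_atomic_write(my_atomic_u32 *p, uint32_t v)
    {
        p->value = v;
    }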

  And that's actually
  enforced by the current definition - I think?
 
 Yeah, this is the only point which I am trying to ensure; that's why I sent
 you a few links in a previous mail where I had some suspicion that
 just doing get/set with volatile might lead to correctness issues in some
 cases.

But this isn't something this patch started doing - we've been doing
that for a long while?

 Some another Note, I found today while reading more on it which suggests
 that my previous observation is right:
 
 Within a thread of execution, accesses (reads and writes) to all volatile
 objects are guaranteed to not be reordered relative to each other, but this
 order is not guaranteed to be observed by another thread, since volatile
 access does not establish inter-thread synchronization.
 http://en.cppreference.com/w/c/atomic/memory_order

That's just saying there are no ordering guarantees with volatile
stores/reads. I don't see a contradiction?
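A sketch of the classic case where ordering (not atomicity) is what matters;
the barrier call named in the comment is illustrative:

    #include <stdint.h>

    volatile uint32_t data;
    volatile uint32_t flag;

    void
    writer(void)
    {
        data = 42;
        /* a write barrier is needed here on weakly-ordered CPUs, e.g.
         * pg_write_barrier(); otherwise a reader that sees flag == 1
         * may still read the old value of data */
        flag = 1;
    }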

b) It's only supported from Vista onwards. Afaik we still support XP.
   #ifndef pg_memory_barrier_impl
   #define pg_memory_barrier_impl() MemoryBarrier()
   #endif
  
   The MemoryBarrier() macro used is also supported only from Vista onwards,
   so I think reasoning similar to that for MemoryBarrier() can apply to
   this as well.
 
  I think that's odd because barrier.h has been using MemoryBarrier()
  without a version test for a long time now. I guess everyone uses a new
  enough Visual Studio.
 
 Yeah, or it might be that the people who are not using a new enough version
 don't have a use case that can hit a problem due to MemoryBarrier().

Well, with barrier.h as it stands they'd get a compiler error if it
indeed is unsupported? But I think there's something else going on -
msft might just be removing XP from its documentation. Postgres supports
building with Visual Studio 2005, and MemoryBarrier() seems to be
supported by it.
I think we'd otherwise have seen problems since 9.1, where barrier.h was introduced.

 In this case, I have a question for you.
 
 Un-patched usage  in barrier.h is as follows:
 ..
 #elif defined(__ia64__) || defined(__ia64)
 #define pg_compiler_barrier() _Asm_sched_fence()
 #define pg_memory_barrier() _Asm_mf()
 
 #elif defined(WIN32_ONLY_COMPILER)
 /* Should work on both MSVC and Borland. */
 #include <intrin.h>
 #pragma intrinsic(_ReadWriteBarrier)
 #define pg_compiler_barrier() _ReadWriteBarrier()
 #define pg_memory_barrier() MemoryBarrier()
 #endif
 
 If I understand the current define mechanism in barrier.h correctly,
 it will have a different definition for Itanium processors even on Windows.

Either no one has ever tested postgres on Itanium Windows (quite
possible), because afaik _Asm_sched_fence() doesn't work on
Windows/MSVC, or Windows doesn't define __ia64__/__ia64. Those macros
are for HP's acc on HPUX.

 However the patch defines as below:
 
 #if defined(HAVE_GCC__ATOMIC_INT32_CAS) || defined(HAVE_GCC__SYNC_INT32_CAS)
 # include storage/atomics-generic-gcc.h
 #elif defined(WIN32_ONLY_COMPILER)
 # include storage/atomics-generic-msvc.h
 
 What I understand from the above is that the defines in
 storage/atomics-generic-msvc.h will override any previous defines
 for compiler/memory barriers, and _ReadWriteBarrier()/MemoryBarrier()
 will always be used on Windows.

Well, the memory barrier is surrounded by #ifndef
pg_memory_barrier_impl. The compiler barrier can't reasonably be defined
earlier since it's a compiler thing, not an architecture thing.
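To make that concrete, a sketch of the fallback structure being described
(abridged, not the exact patch text):

    /* in the MSVC-specific atomics header */
    #include <intrin.h>
    #pragma intrinsic(_ReadWriteBarrier)
    #define pg_compiler_barrier_impl()  _ReadWriteBarrier()

    /* define the memory barrier only if no earlier header already did */
    #ifndef pg_memory_barrier_impl
    #define pg_memory_barrier_impl()    MemoryBarrier()
    #endif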

 6.
 pg_atomic_compare_exchange_u32()

 It is better to have comments above this and all other related
   functions.
   Okay, generally code has such comments on top of the function
   definition rather than with the declaration.
 
  I don't want to have it on the _impl functions because they're
  duplicated for the individual platforms and will just become out of
  date...
 
 Sure, I was also not asking about the _impl functions.  What I was asking
 for at this point was to have comments on top of the definition of
 

[HACKERS] uninitialized values in revised prepared xact code

2014-06-30 Thread Andres Freund
Hi,

I've just rerun valgrind for the first time in a while and saw the
following splat. My guess is it exists since bb38fb0d43c, but that's
blindly guessing:

==2049== Use of uninitialised value of size 8
==2049==at 0x4FE66D: EndPrepare (twophase.c:1063)
==2049==by 0x4F231B: PrepareTransaction (xact.c:2217)
==2049==by 0x4F2A38: CommitTransactionCommand (xact.c:2676)
==2049==by 0x79013E: finish_xact_command (postgres.c:2408)
==2049==by 0x78DE97: exec_simple_query (postgres.c:1062)
==2049==by 0x791FDD: PostgresMain (postgres.c:4010)
==2049==by 0x71B13B: BackendRun (postmaster.c:4113)
==2049==by 0x71A86D: BackendStartup (postmaster.c:3787)
==2049==by 0x71714C: ServerLoop (postmaster.c:1566)
==2049==by 0x716804: PostmasterMain (postmaster.c:1219)
==2049==by 0x679405: main (main.c:219)
==2049==  Uninitialised value was created by a stack allocation
==2049==at 0x4FE16C: StartPrepare (twophase.c:942)
==2049==
==2049== Syscall param write(buf) points to uninitialised byte(s)
==2049==at 0x5C69640: __write_nocancel (syscall-template.S:81)
==2049==by 0x4FE6AE: EndPrepare (twophase.c:1064)
==2049==by 0x4F231B: PrepareTransaction (xact.c:2217)
==2049==by 0x4F2A38: CommitTransactionCommand (xact.c:2676)
==2049==by 0x79013E: finish_xact_command (postgres.c:2408)
==2049==by 0x78DE97: exec_simple_query (postgres.c:1062)
==2049==by 0x791FDD: PostgresMain (postgres.c:4010)
==2049==by 0x71B13B: BackendRun (postmaster.c:4113)
==2049==by 0x71A86D: BackendStartup (postmaster.c:3787)
==2049==by 0x71714C: ServerLoop (postmaster.c:1566)
==2049==by 0x716804: PostmasterMain (postmaster.c:1219)
==2049==by 0x679405: main (main.c:219)
==2049==  Address 0x64694ed is 1,389 bytes inside a block of size 8,192 alloc'd
==2049==at 0x4C27B8F: malloc (vg_replace_malloc.c:298)
==2049==by 0x8E766E: AllocSetAlloc (aset.c:853)
==2049==by 0x8E8E04: MemoryContextAllocZero (mcxt.c:627)
==2049==by 0x8A54D3: AtStart_Inval (inval.c:704)
==2049==by 0x4F1DFC: StartTransaction (xact.c:1841)
==2049==by 0x4F28D1: StartTransactionCommand (xact.c:2529)
==2049==by 0x7900A7: start_xact_command (postgres.c:2383)
==2049==by 0x78DAF4: exec_simple_query (postgres.c:860)
==2049==by 0x791FDD: PostgresMain (postgres.c:4010)
==2049==by 0x71B13B: BackendRun (postmaster.c:4113)
==2049==by 0x71A86D: BackendStartup (postmaster.c:3787)
==2049==by 0x71714C: ServerLoop (postmaster.c:1566)
==2049==  Uninitialised value was created by a stack allocation
==2049==at 0x4FE16C: StartPrepare (twophase.c:942)

It's probably just padding - twophase.c:1063 is the CRC32 computation of
the record data.
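If padding is indeed the culprit, one common fix - sketched here as a
hypothetical, not a claim about what twophase.c should do - is to zero the
stack-allocated struct before filling it, so the padding bytes that later get
checksummed and written are defined:

    TwoPhaseFileHeader hdr;

    memset(&hdr, 0, sizeof(hdr));   /* padding bytes now initialised */
    hdr.magic = TWOPHASE_MAGIC;
    /* ... fill the remaining fields as StartPrepare() does ... */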

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Extending MSVC scripts to support --with-extra-version

2014-06-30 Thread Asif Naeem
Thank you for sharing the updated patch. I have compared the MSVC and
configure-generated builds, i.e.:

MacOSX (--with-extra-version -30JUN)

pc1dotnetpk:inst asif$ ./bin/psql -d postgres
 psql (9.5devel-30JUN)
 Type "help" for help.
 postgres=# select version();
                                  version
 --------------------------------------------------------------------------
  PostgreSQL 9.5devel-30JUN on x86_64-apple-darwin13.2.0, compiled by Apple
 LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn), 64-bit
 (1 row)
 pc1dotnetpk:inst asif$ ./bin/initdb -V
 initdb (PostgreSQL) 9.5devel-30JUN


Windows 7 64-bit (extraver = '-30JUN')

C:\PG\postgresql\inst_withpatch_v2_extra-version>bin\psql -d postgres
 psql (9.5devel-30JUN)
 WARNING: Console code page (437) differs from Windows code page (1252)
  8-bit characters might not work correctly. See psql reference
  page "Notes for Windows users" for details.
 Type "help" for help.
 postgres=# select version();
                                version
 ----------------------------------------------------------------------
  PostgreSQL 9.5devel-30JUN, compiled by Visual C++ build 1600, 64-bit
 (1 row)
 C:\PG\postgresql\inst_withpatch_v2_extra-version>bin\initdb.exe -V
 initdb (PostgreSQL) 9.5devel-30JUN


Patch looks good to me. I think it is ready for committer. Thanks.

Regards,
Muhammad Asif Naeem


On Fri, Jun 27, 2014 at 5:00 AM, Michael Paquier michael.paqu...@gmail.com
wrote:




 On Fri, Jun 27, 2014 at 8:26 AM, Asif Naeem anaeem...@gmail.com wrote:

 I have spent some time reviewing the code. It applies well, and the PG master
 branch builds fine with extraver set or kept undefined.

 Thanks for reviewing that.


 I have observed the following output applying the patch i.e.

 It seems that the extraver information only appears when the version()
 function is used. If we use -V (--version) with the pg utilities/binaries, it
 does not include the additionally provided information.

 You are right. The first version of this patch updated PG_VERSION_STR but
 not PG_VERSION, which is the string used by all the binaries to report the
 version.


  Can you please explain how I can achieve similar functionality via the
 configure script (that can be used on Unix-like OSes/MinGW), or is it intended
 as a Windows-specific requirement? Thanks.

 Sure, you can do the equivalent with plain configure like that:
 ./configure --with-extra-version=-foo --prefix=/to/path/
 And here is the output that I get with such options on OSX for example:
 $ psql -c 'select substring(version(), 1, 52)'
                       substring
 ------------------------------------------------------
  PostgreSQL 9.5devel-foo on x86_64-apple-darwin13.2.0
 (1 row)
 $ initdb --version
 initdb (PostgreSQL) 9.5devel-foo

 With the new patch attached, the following output is generated for an MSVC
 build:
 $ psql -c 'select version()'
                               version
 ---------------------------------------------------------------------
  PostgreSQL 9.5devel-foo, compiled by Visual C++ build 1600, 64-bit
 (1 row)
 $ initdb --version
 initdb (PostgreSQL) 9.5devel-foo

 Regards,
 --
 Michael



Re: [HACKERS] pg_receivexlog add synchronous mode

2014-06-30 Thread furuyao
 Thanks for the review!
 
 +	if (secs <= 0)
 +		secs = 1;	/* Always sleep at least 1 sec */
 +
 +	sleeptime = secs * 1000 + usecs / 1000;
 
 The above is the code which caused that problem. 'usecs' should have been
 reset to zero when 'secs' was rounded up to 1 second, but it was not. Attached
 is the updated version of the patch.
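To illustrate the fix, a sketch of the corrected rounding (not the literal
patch text):

    if (secs <= 0)
    {
        secs = 1;       /* sleep at least one second */
        usecs = 0;      /* discard the stale microseconds as well */
    }
    sleeptime = secs * 1000 + usecs / 1000;     /* milliseconds */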
Thank you for the refactored v2 patch.
I did a review of the patch. 

1. applied cleanly, and compilation was without warnings and errors
2. all regression tests passed OK
3. sleeptime is OK when --status-interval is set to 1

Regards,

--
Furuya Osamu



Re: [HACKERS] Autonomous Transaction (WIP)

2014-06-30 Thread Abhijit Menon-Sen
If I understand correctly, the design of this patch has already been
considered earlier and rejected. So I guess the patch should also be
marked rejected?

-- Abhijit




Re: [HACKERS] inherit support for foreign tables

2014-06-30 Thread Etsuro Fujita

(2014/06/30 17:47), Ashutosh Bapat wrote:

I checked that it's reporting the right tableoid now.


Thank you for the check.


BTW, why aren't you using the tlist passed to this function? I guess
create_scan_plan() passes tlist after processing it, so that should be
used rather than rel->reltargetlist.


I think that that would be maybe OK, but I think that it would not be 
efficient to use the list to compute attrs_used, because the tlist would 
have more information than rel->reltargetlist in cases where the tlist 
is built through build_physical_tlist().


Thanks,

Best regards,
Etsuro Fujita



On Mon, Jun 30, 2014 at 12:22 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp mailto:fujita.ets...@lab.ntt.co.jp wrote:

(2014/06/24 16:30), Etsuro Fujita wrote:

(2014/06/23 18:35), Ashutosh Bapat wrote:


Selecting tableoid on parent causes an error, ERROR:
  cannot extract
system attribute from virtual tuple. The foreign table has
an OID which
can be reported as tableoid for the rows coming from that
foreign table.
Do we want to do that?


No.  I think it's a bug.  I'll fix it.


Done.  I think this is because create_foreignscan_plan() makes
reference to attr_needed, which isn't computed for inheritance
children.  To avoid this, I've modified create_foreignscan_plan() to
see reltargetlist and baserestrictinfo, instead of attr_needed.
  Please find attached an updated version of the patch.


Sorry for the delay.

Best regards,
Etsuro Fujita




--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company





Re: [HACKERS] psql: show only failed queries

2014-06-30 Thread Pavel Stehule
2014-06-30 11:20 GMT+02:00 Rajeev rastogi rajeev.rast...@huawei.com:

  On 30 June 2014 12:24, Pavel Stehule Wrote:



 I have reviewed this patch. Please find my review comments below:

 1. A command start-up option (e.g. -a/--echo-all for ECHO=all) is not
 provided for the new functionality.

 Not all options entered via psql variables have a corresponding psql
 option. I plan to add a new description to the --help-variables list.

 If necessary I can add a long option --echo-errors; I didn't find a good
 char for a short option. Any ideas?



 But the new option we are adding is on the track of an existing option, so
 better we add a start-up option for this also.

 Yeah, the long option --echo-errors seems fine to me also. For the short
 option, I think we can use "-b", standing for "blunder". This is the closest one
 I could think of.


Fixed.

See the attachment, please.




 2. The new command start-up option should be added in psql --help as
 well as in the documentation.

 Depends on the previous point.

 Right.

 Also, as I understand it, this new option is a kind of subset of the existing
 option (ECHO=query), so shouldn't we display

 the query string in the same format as it was printed earlier?

 Though I also feel that prefixing the query with the word STATEMENT will be
 helpful to grep, but at the same time I am worried

 about inconsistency with the existing option.



 That is the question, and I don't have a strong feeling about what should be
 preferred. But I am still inclined to prefer the variant with the STATEMENT
 prefix. The mode with -a is used for a different purpose than the show-errors-only
 mode - and output with the prefix is much

  more consistent with a log entry and the displayed error. So I agree there
 is a potential inconsistency (but the output is not defined anywhere), but this
 output is more practical when you are concentrating on error processing.

 Yeah right, I just wanted to raise the point to provoke other thoughts, to see
 if anyone has a different opinion. If there is no objection from others, we can go
 ahead with the current prefixing approach.

ok, we can wait two days

Regards

Pavel





  Thanks and Regards,

 Kumar Rajeev Rastogi

commit e05f23c69db53cefa0c3438c5eaeec11e5fa05b0
Author: Pavel Stehule pavel.steh...@gooddata.com
Date:   Mon Jun 30 12:39:42 2014 +0200

new -b, --echo-errors option

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index ee6ec3a..ccaf1c3 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -73,6 +73,18 @@ PostgreSQL documentation
     </varlistentry>
 
     <varlistentry>
+      <term><option>-b</option></term>
+      <term><option>--echo-errors</option></term>
+      <listitem>
+      <para>
+      Print a failed SQL commands to standard error output. This is
+      equivalent to setting the variable <varname>ECHO</varname> to
+      <literal>errors</literal>.
+      </para>
+      </listitem>
+    </varlistentry>
+
+    <varlistentry>
       <term><option>-c <replaceable class="parameter">command</replaceable></option></term>
       <term><option>--command=<replaceable class="parameter">command</replaceable></option></term>
       <listitem>
@@ -2812,7 +2824,8 @@ bar
         <literal>queries</literal>,
         <application>psql</application> merely prints all queries as
         they are sent to the server. The switch for this is
-        <option>-e</option>.
+        <option>-e</option>. If set to <literal>error</literal> then only
+        failed queries are displayed.
         </para>
         </listitem>
       </varlistentry>
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 60169a2..6551b15 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -995,6 +995,9 @@ SendQuery(const char *query)
 		results = NULL;			/* PQclear(NULL) does nothing */
 	}
 
+	if (!OK && pset.echo == PSQL_ECHO_ERRORS)
+		psql_error("STATEMENT:  %s\n", query);
+
 	/* If we made a temporary savepoint, possibly release/rollback */
 	if (on_error_rollback_savepoint)
 	{
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index 3aa3c16..0128060 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -87,6 +87,7 @@ usage(void)
 
 	printf(_("\nInput and output options:\n"));
 	printf(_("  -a, --echo-all           echo all input from script\n"));
+	printf(_("  -b  --echo-errors        echo failed commands sent to server\n"));
 	printf(_("  -e, --echo-queries       echo commands sent to server\n"));
 	printf(_("  -E, --echo-hidden        display queries that internal commands generate\n"));
 	printf(_("  -L, --log-file=FILENAME  send session log to file\n"));
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 0a60e68..453d6c8 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -31,6 +31,7 @@ typedef enum
 {
 	PSQL_ECHO_NONE,
 	PSQL_ECHO_QUERIES,
+	PSQL_ECHO_ERRORS,
 	PSQL_ECHO_ALL
 } PSQL_ECHO;
 
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index 45653a1..3ca3f1e 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -354,6 +354,7 @@ parse_psql_options(int argc, char *argv[], struct adhoc_opts * options)
 		{command, 

Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-30 Thread Fujii Masao
On Sun, Jun 29, 2014 at 11:18 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-06-29 19:44:21 +0530, Abhijit Menon-Sen wrote:
 At 2014-06-27 00:51:02 +0200, p...@2ndquadrant.com wrote:
 
  Also based on Alvaro's comment, I replaced the scanf parsing code with
  strtoul(l) function.

 As far as I can tell (from the thread and a quick look at the patch),
 your latest version of the patch addresses all the review comments.

 Should I mark this ready for committer now?

 Well, we haven't really agreed on a way forward yet. Robert disagrees
 that the feature is independently useful and thinks it might be
 dangerous for some users. I don't agree with either of those points, but
 that doesn't absolve the patch from the objection. I think tentatively
 there have been more people in favor, but it's not clear-cut imo.

So what's the use case for this feature? If it's for normal operation,
using pg_resetxlog for that is a bit dangerous because it can corrupt
the database easily.

Regards,

-- 
Fujii Masao




Re: [HACKERS] psql: show only failed queries

2014-06-30 Thread Abhijit Menon-Sen
At 2014-06-30 12:48:30 +0200, pavel.steh...@gmail.com wrote:

 +  <para>
 +  Print a failed SQL commands to standard error output. This is
 +  equivalent to setting the variable <varname>ECHO</varname> to
 +  <literal>errors</literal>.

No "a", just "Print failed SQL commands …".

 -<option>-e</option>.
 +<option>-e</option>. If set to <literal>error</literal> then only
 +failed queries are displayed.

Should be "errors" here, not "error".

   printf(_("  -a, --echo-all   echo all input from script\n"));
 + printf(_("  -b  --echo-errors    echo failed commands sent to server\n"));
   printf(_("  -e, --echo-queries   echo commands sent to server\n"));

Should have a comma after -b to match the other options. Also I would remove
"sent to server" from the description: "echo failed commands" is fine.

Otherwise looks fine. I see no reason to wait for further feedback, so
I'll mark this ready for committer if you make the above corrections.

At some point, you should probably also update your --help-variables
patch to add this new value to the description of ECHO.

-- Abhijit




Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-30 Thread Matheus de Oliveira
On Mon, Jun 30, 2014 at 5:14 AM, Craig Ringer cr...@2ndquadrant.com wrote:

 Right now PostgreSQL appears to rely on the absence of the tablespace
 directory as a flag to tell it don't start up, something's badly wrong
 here.


Well, in fact the behavior is just to give a LOG message:

LOG:  could not open tablespace directory
"pg_tblspc/100607/PG_9.3_201306121": No such file or directory

But it starts fine. No?

I ask because I'm relying on that. My patch does nothing on startup, so in the
absence of the tablespace directory, the above LOG is shown even for a
temporary tablespace, and it tries to create the directory when any
process tries to use it. As we don't have catalog access on startup, it
will require me to use some kind of _init fork for the tablespace if we want it
to be re-created during startup. Should we bother about this?


If the user creates the tablespace directory, it figures they at least
 know what they're doing and carries on starting up with the empty
 tablespace. It's treated as an admin statement - I know it's gone, it's
 not coming back, just carry on as best you can.


Well, I think for a tablespace like that it is okay to create it on-demand
(note that I have done this only for temporary tablespaces, not ordinary ones).


  If Pg were to auto-create the directory, then if (say) a mount of a
 tablespace dir failed at boot, Pg would still happily start up and might
 create files in the tablespace, despite there being inaccessible
 tables/indexes/etc on the not-mounted volume that's *supposed* to be the
 tablespace storage. That'd create plenty of mess.

 So no, I don't think Pg should auto-create the tablespace directory if
 it's missing for non-temporary tablespaces.


Ok. I agree. Let's create only for temporary ones (as done by the patch
now).


 Not unless the current (less
 than friendly) behaviour around lost tablespaces is replaced with
 something like an `ignore_missing_tablespaces` or
 `drop_missing_tablespaces` GUC - akin to our existing
 `zero_missing_pages` recovery GUC.


You meant `zero_damaged_pages`, I think. I don't think that is really
necessary in this case.

Anyway, I was thinking about some way for this tablespace to also allow
creation of non-temporary indexes, and after a crash we could mark such
indexes as `NOT VALID` (perhaps `zero_damaged_pages` could be of help
here?!). That should be part of another patch though, and would require
more thought. But I think that would be a great improvement for some
environments, like those on AWS with their ephemeral SSDs.

Regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-30 Thread Andres Freund
On 2014-06-30 19:57:58 +0900, Fujii Masao wrote:
 On Sun, Jun 29, 2014 at 11:18 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2014-06-29 19:44:21 +0530, Abhijit Menon-Sen wrote:
  At 2014-06-27 00:51:02 +0200, p...@2ndquadrant.com wrote:
  
   Also based on Alvaro's comment, I replaced the scanf parsing code with
   strtoul(l) function.
 
  As far as I can tell (from the thread and a quick look at the patch),
  your latest version of the patch addresses all the review comments.
 
  Should I mark this ready for committer now?
 
  Well, we haven't really agreed on a way forward yet. Robert disagrees
  that the feature is independently useful and thinks it might be
  dangerous for some users. I don't agree with either of those points, but
  that doesn't absolve the patch from the objection. I think tentatively
  there have been more people in favor, but it's not clear-cut imo.
 
 So what's the use case for this feature? If it's for normal operation,
 using pg_resetxlog for that is a bit dangerous because it can corrupt
 the database easily.

a) Mark a database as not being the same. Currently if you promote two
   databases, e.g. to shard out, they'll continue to have the same system
   identifier. Which really sucks, especially because timeline ids will
   often increase synchronously.

b) For data recovery it's sometimes useful to create a new database
   (with the same catalog state) and replay all WAL. For that you need to
   reset the system identifier. I've done so hacking up resetxlog
   before.

c) We already allow to set pretty much all aspects of the control file
   via resetxlog - there seems little point of not having the ability to
   change the system identifier.

d) In a logical replication scenario one way to identify individual
   nodes is via the system identifier. If you want to convert a
   basebackup into logical standby one sensible way to do so is to
   create a logical replication slots *before* promoting a physical
   backup to guarantee that slot is able to stream out all changes. If
   the slot names contain the consumer's system identifier you need to
   know the new system identifier beforehand.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] inherit support for foreign tables

2014-06-30 Thread Ashutosh Bapat
On Mon, Jun 30, 2014 at 4:17 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp
wrote:

 (2014/06/30 17:47), Ashutosh Bapat wrote:

 I checked that it's reporting the right tableoid now.


 Thank you for the check.


  BTW, why aren't you using the tlist passed to this function? I guess
 create_scan_plan() passes tlist after processing it, so that should be
 used rather than rel->reltargetlist.


 I think that that would be maybe OK, but I think that it would not be
 efficient to use the list to compute attrs_used, because the tlist would
 have more information than rel->reltargetlist in cases where the tlist is
 built through build_physical_tlist().


In that case, we can call build_relation_tlist() for foreign tables.



 Thanks,

 Best regards,
 Etsuro Fujita


 On Mon, Jun 30, 2014 at 12:22 PM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp mailto:fujita.ets...@lab.ntt.co.jp wrote:

 (2014/06/24 16:30), Etsuro Fujita wrote:

 (2014/06/23 18:35), Ashutosh Bapat wrote:


 Selecting tableoid on parent causes an error, ERROR:
   cannot extract
 system attribute from virtual tuple. The foreign table has
 an OID which
 can be reported as tableoid for the rows coming from that
 foreign table.
 Do we want to do that?


 No.  I think it's a bug.  I'll fix it.


 Done.  I think this is because create_foreignscan_plan() makes
 reference to attr_needed, which isn't computed for inheritance
 children.  To avoid this, I've modified create_foreignscan_plan() to
 see reltargetlist and baserestrictinfo, instead of attr_needed.
   Please find attached an updated version of the patch.


 Sorry for the delay.

 Best regards,
 Etsuro Fujita




 --
 Best Wishes,
 Ashutosh Bapat
 EnterpriseDB Corporation
 The Postgres Database Company




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] psql: show only failed queries

2014-06-30 Thread Pavel Stehule
2014-06-30 13:01 GMT+02:00 Abhijit Menon-Sen a...@2ndquadrant.com:

 At 2014-06-30 12:48:30 +0200, pavel.steh...@gmail.com wrote:
 
  +  <para>
  +  Print a failed SQL commands to standard error output. This is
  +  equivalent to setting the variable <varname>ECHO</varname> to
  +  <literal>errors</literal>.

 No "a", just "Print failed SQL commands …".

  -<option>-e</option>.
  +<option>-e</option>. If set to <literal>error</literal> then only
  +failed queries are displayed.

 Should be "errors" here, not "error".

    printf(_("  -a, --echo-all   echo all input from script\n"));
  + printf(_("  -b  --echo-errors    echo failed commands sent to server\n"));
    printf(_("  -e, --echo-queries   echo commands sent to server\n"));

 Should have a comma after -b to match other options. Also I would remove
 "sent to server" from the description: "echo failed commands" is fine.


fixed



 Otherwise looks fine. I see no reason to wait for further feedback, so
 I'll mark this ready for committer if you make the above corrections.

 At some point, you should probably also update your --help-variables
 patch to add this new value to the description of ECHO.


I have it in my TODO. But I don't want to introduce a dependency between
these patches - so once the first patch is committed, I will update the
second patch.

Regards

Pavel



 -- Abhijit

commit 998203a2e35643430059c4d3490a176a830cfee2
Author: Pavel Stehule pavel.steh...@gooddata.com
Date:   Mon Jun 30 12:39:42 2014 +0200

new -b, --echo-errors option

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index ee6ec3a..2c4cc96 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -73,6 +73,18 @@ PostgreSQL documentation
     </varlistentry>
 
     <varlistentry>
+      <term><option>-b</option></term>
+      <term><option>--echo-errors</option></term>
+      <listitem>
+      <para>
+      Print failed SQL commands to standard error output. This is
+      equivalent to setting the variable <varname>ECHO</varname> to
+      <literal>errors</literal>.
+      </para>
+      </listitem>
+    </varlistentry>
+
+    <varlistentry>
       <term><option>-c <replaceable class="parameter">command</replaceable></option></term>
       <term><option>--command=<replaceable class="parameter">command</replaceable></option></term>
       <listitem>
@@ -2812,7 +2824,8 @@ bar
         <literal>queries</literal>,
         <application>psql</application> merely prints all queries as
         they are sent to the server. The switch for this is
-        <option>-e</option>.
+        <option>-e</option>. If set to <literal>errors</literal> then only
+        failed queries are displayed.
         </para>
         </listitem>
       </varlistentry>
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 60169a2..6551b15 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -995,6 +995,9 @@ SendQuery(const char *query)
 		results = NULL;			/* PQclear(NULL) does nothing */
 	}
 
+	if (!OK && pset.echo == PSQL_ECHO_ERRORS)
+		psql_error("STATEMENT:  %s\n", query);
+
 	/* If we made a temporary savepoint, possibly release/rollback */
 	if (on_error_rollback_savepoint)
 	{
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index 3aa3c16..f8f000f 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -87,6 +87,7 @@ usage(void)
 
 	printf(_("\nInput and output options:\n"));
 	printf(_("  -a, --echo-all           echo all input from script\n"));
+	printf(_("  -b, --echo-errors        echo failed commands\n"));
 	printf(_("  -e, --echo-queries       echo commands sent to server\n"));
 	printf(_("  -E, --echo-hidden        display queries that internal commands generate\n"));
 	printf(_("  -L, --log-file=FILENAME  send session log to file\n"));
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 0a60e68..453d6c8 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -31,6 +31,7 @@ typedef enum
 {
 	PSQL_ECHO_NONE,
 	PSQL_ECHO_QUERIES,
+	PSQL_ECHO_ERRORS,
 	PSQL_ECHO_ALL
 } PSQL_ECHO;
 
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index 45653a1..3ca3f1e 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -354,6 +354,7 @@ parse_psql_options(int argc, char *argv[], struct adhoc_opts * options)
 		{"command", required_argument, NULL, 'c'},
 		{"dbname", required_argument, NULL, 'd'},
 		{"echo-queries", no_argument, NULL, 'e'},
+		{"echo-errors", no_argument, NULL, 'b'},
 		{"echo-hidden", no_argument, NULL, 'E'},
 		{"file", required_argument, NULL, 'f'},
 		{"field-separator", required_argument, NULL, 'F'},
@@ -402,6 +403,9 @@ parse_psql_options(int argc, char *argv[], struct adhoc_opts * options)
 			case 'A':
 				pset.popt.topt.format = PRINT_UNALIGNED;
 				break;
+			case 'b':
+				SetVariable(pset.vars, "ECHO", "errors");
+				break;
 			case 'c':
 				options->action_string = pg_strdup(optarg);
 				if (optarg[0] == '\\')
@@ -720,6 +724,8 @@ echo_hook(const char *newval)
 		pset.echo = PSQL_ECHO_NONE;
 	else if (strcmp(newval, "queries") == 0)
 		pset.echo = 

Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max

2014-06-30 Thread Asif Naeem
Hi Haribabu,

I am not able to apply latest patch on REL9_4_STABLE or master branch i.e.

pc1dotnetpk:postgresql asif$ git apply
 ~/core/min_max_support_for_inet_datatypes/inet_agg_v4.patch
 fatal: unrecognized input


pc1dotnetpk:postgresql asif$ patch -p0 <
 ~/core/min_max_support_for_inet_datatypes/inet_agg_v4.patch
 can't find file to patch at input line 3
 Perhaps you used the wrong -p or --strip option?
 The text leading up to this was:
 --------------------------
 |*** a/src/backend/utils/adt/network.c
 |--- b/src/backend/utils/adt/network.c
 --------------------------
 File to patch:


Is there any other utility required to apply the patch? Can you please
guide? Thanks.

Regards,
Muhammad Asif Naeem


On Thu, Jun 5, 2014 at 6:28 AM, Haribabu Kommi kommi.harib...@gmail.com
wrote:

 On Thu, Jun 5, 2014 at 9:12 AM, Andres Freund and...@2ndquadrant.com
 wrote:
  Hi,
 
  On 2014-06-04 10:37:48 +1000, Haribabu Kommi wrote:
  Thanks for the test. Patch is re-based to the latest head.
 
  Did you look at the source of the conflict? Did you intentionally mark
  the functions as leakproof and reviewed that that truly is the case? Or
  was that caused by copy & paste?

 Yes, it is a copy & paste mistake. I didn't know much about that parameter.
 Thanks for the information. I changed it.

 Regards,
 Hari Babu
 Fujitsu Australia






Re: [HACKERS] Extending MSVC scripts to support --with-extra-version

2014-06-30 Thread Michael Paquier



On Mon, Jun 30, 2014 at 7:05 PM, Asif Naeem anaeem...@gmail.com wrote:
 Patch looks good to me. I think it is ready for committer. Thanks.
Thanks for the extra checks and for taking the time to review it.
-- 
Michael
On Mon, Jun 30, 2014 at 7:05 PM, Asif Naeem anaeem...@gmail.com wrote:

 Thank you for sharing updated patch. I have compared it with MSVC and
 configure generated builds, i.e.

 MacOSX (--with-extra-version -30JUN)

 pc1dotnetpk:inst asif$ ./bin/psql -d postgres
 psql (9.5devel-30JUN)
 Type "help" for help.
 postgres=# select version();
                                  version
 --------------------------------------------------------------------------
  PostgreSQL 9.5devel-30JUN on x86_64-apple-darwin13.2.0, compiled by
  Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn), 64-bit
 (1 row)

 pc1dotnetpk:inst asif$ ./bin/initdb -V
 initdb (PostgreSQL) 9.5devel-30JUN

 Windows7 64bit (extraver = -30JUN)

 C:\PG\postgresql\inst_withpatch_v2_extra-version>bin\psql -d postgres
 psql (9.5devel-30JUN)
 WARNING: Console code page (437) differs from Windows code page (1252)
          8-bit characters might not work correctly. See psql reference
          page "Notes for Windows users" for details.
 Type "help" for help.
 postgres=# select version();
                                version
 ----------------------------------------------------------------------
  PostgreSQL 9.5devel-30JUN, compiled by Visual C++ build 1600, 64-bit
 (1 row)

 C:\PG\postgresql\inst_withpatch_v2_extra-version>bin\initdb.exe -V
 initdb (PostgreSQL) 9.5devel-30JUN

 Patch looks good to me. I think it is ready for committer. Thanks.

 Regards,
 Muhammad Asif Naeem

 On Fri, Jun 27, 2014 at 5:00 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:

  On Fri, Jun 27, 2014 at 8:26 AM, Asif Naeem anaeem...@gmail.com wrote:
   I have spent some time reviewing the code. It applies well and the PG
   master branch builds fine with extraver set or left undefined.

  Thanks for reviewing that.

   I have observed the following output applying the patch, i.e. it seems
   that extraver information only appears when the version() function is
   being used. If we use -V (--version) with pg utilities/binaries, it
   does not include the additional provided information.

  You are right. The first version of this patch updates PG_VERSION_STR
  but not PG_VERSION, which is the string used for all the binaries to
  report the version.

   Can you please guide how I can perform similar functionality via the
   configure script (that can be used on Unix-like OS/MinGW), or is it
   intended as a Windows-specific requirement? Thanks.

  Sure, you can do the equivalent with plain configure like that:
  ./configure --with-extra-version=-foo --prefix=/to/path/
  And here is the output that I get with such options on OSX for example:

  $ psql -c "select substring(version(), 1, 52)"
                        substring
  ------------------------------------------------------
   PostgreSQL 9.5devel-foo on x86_64-apple-darwin13.2.0
  (1 row)
  $ initdb --version
  initdb (PostgreSQL) 9.5devel-foo

  With the new patch attached, the following output is generated for an
  MSVC build:
  $ psql -c "select version()"
                                version
  ---------------------------------------------------------------------
   PostgreSQL 9.5devel-foo, compiled by Visual C++ build 1600, 64-bit
  (1 row)
  $ initdb --version
  initdb (PostgreSQL) 9.5devel-foo

  Regards,
  --
  Michael




Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max

2014-06-30 Thread Abhijit Menon-Sen
At 2014-06-30 16:35:45 +0500, anaeem...@gmail.com wrote:

 pc1dotnetpk:postgresql asif$ patch -p0 <
  ~/core/min_max_support_for_inet_datatypes/inet_agg_v4.patch
  can't find file to patch at input line 3
  Perhaps you used the wrong -p or --strip option?
  The text leading up to this was:
  --------------------------
  |*** a/src/backend/utils/adt/network.c
  |--- b/src/backend/utils/adt/network.c
  --------------------------

You need to use patch -p1 to strip off the a/ and b/ prefixes.

-- Abhijit




Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-06-30 Thread Robert Haas
On Sat, Jun 28, 2014 at 9:41 AM, Andres Freund and...@2ndquadrant.com wrote:
 No, I think it's going to be *much* simpler than that.  How about I
 take a crack at this next week and then either (a) I'll see why it's a
 bad idea and we can go from there or (b) you can review what I come up
 with and tell me why it sucks?

 Ok. I think that's going in the wrong direction (duplication of
 nontrivial knowledge), but maybe I'm taking a too 'purist' approach
 here. Prove me wrong :)

I'm not sure what you'll think of this, but see attached.  I think
newer releases of Sun Studio support that GCC way of doing a barrier,
but I don't know how to test for that, so at the moment that uses the
fallback "put it in a function" approach, along with HPPA non-GCC and
Univel CC.  Those things are obscure enough that I don't care about
making them less performant.  I think AIX is actually OK as-is; if
_check_lock() is a compiler barrier but _clear_lock() is not, that
would be pretty odd.

  How are you suggesting we deal with the generic S_UNLOCK
  case without having a huge ifdef?
  Why do we build an abstraction layer (barrier.h) and then not use it?

 Because (1) the abstraction doesn't fit very well unless we do a lot
 of additional work to build acquire and release barriers for every
 platform we support and

 Meh. Something like the (untested):
 #if !defined(pg_release_barrier) && defined(pg_read_barrier) && \
 	defined(pg_write_barrier)
 #define pg_release_barrier() do { pg_read_barrier(); pg_write_barrier(); } \
 	while (0)
 #elif !defined(pg_release_barrier)
 #define pg_release_barrier() pg_memory_barrier()
 #endif

 before the fallback definition of pg_read/write_barrier should suffice?

That doesn't sound like a good idea.  For example, on PPC, a read
barrier is lwsync, and so is a write barrier.  That would also suck on
any architecture where either a read or write barrier gets implemented
as a full memory barrier, though I guess it might be smart enough to
optimize away most of the cost of the second of two barriers in
immediate succession.
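
To illustrate the PPC case concretely (definitions paraphrased, not
copied from barrier.h):

/*
 * On PPC both the read and the write barrier are lwsync, so the proposed
 * fallback emits two lwsyncs back to back where a dedicated release
 * barrier would need only one.
 */
#if defined(__powerpc__) || defined(__ppc__)
#define ppc_read_barrier()	__asm__ __volatile__ ("lwsync" : : : "memory")
#define ppc_write_barrier()	__asm__ __volatile__ ("lwsync" : : : "memory")

/* the fallback under discussion: two back-to-back lwsyncs */
#define naive_release_barrier() \
	do { ppc_read_barrier(); ppc_write_barrier(); } while (0)

/* what a dedicated release barrier would be: a single lwsync */
#define ppc_release_barrier()	__asm__ __volatile__ ("lwsync" : : : "memory")
#endif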

I think if we want to use barrier.h as the foundation for this, we're
going to need to define a new set of things in there that have acquire
and release semantics, or just use full barriers for everything -
which would be wasteful on, e.g., x86.  And I don't particularly see
the point in going to a lot of work to invent those new abstractions
everywhere when we can just tweak s_lock.h in place a bit and be done
with it.  Making those files interdependent doesn't strike me as a
win.

 (2) I don't have much confidence that we can
 depend on the spinlock fallback for barriers without completely
 breaking obscure platforms, and I'd rather make a more minimal set of
 changes.

 Well, it's the beginning of the cycle. And we're already depending on
 barriers for correctness (and it's not getting less), so I don't really
 see what avoiding barrier usage buys us except harder to find breakage?

If we use barriers to implement any facility other than spinlocks, I
have reasonable confidence that we won't break things more than they
already are broken today, because I think the fallback to
acquire-and-release a spinlock probably works, even though it's likely
terrible for performance.  I have significantly less confidence that
trying to implement spinlocks in terms of barriers is going to be
reliable.
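
For context, the spinlock-based fallback in question is essentially the
following (slightly paraphrased):

/*
 * When a platform has no native memory barrier implementation, acquiring
 * and releasing a dummy spinlock is used as a slow but presumed-correct
 * full memory barrier.
 */
extern slock_t dummy_spinlock;

#define pg_memory_barrier() \
	do { S_LOCK(&dummy_spinlock); S_UNLOCK(&dummy_spinlock); } while (0)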

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c
index efe1b43..38dc34d 100644
--- a/src/backend/storage/lmgr/s_lock.c
+++ b/src/backend/storage/lmgr/s_lock.c
@@ -154,6 +154,17 @@ s_lock(volatile slock_t *lock, const char *file, int line)
 	return delays;
 }
 
+#ifdef USE_DEFAULT_S_UNLOCK
+void
+s_unlock(slock_t *lock)
+{
+#ifdef TAS_ACTIVE_WORD
+	*TAS_ACTIVE_WORD(lock) = -1;
+#else
+	*lock = 0;
+#endif
+}
+#endif
 
 /*
  * Set local copy of spins_per_delay during backend startup.
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index 895abe6..f1a89dc 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -55,14 +55,16 @@
  *	on Alpha TAS() will fail if interrupted.  Therefore a retry loop must
  *	always be used, even if you are certain the lock is free.
  *
- *	Another caution for users of these macros is that it is the caller's
- *	responsibility to ensure that the compiler doesn't re-order accesses
- *	to shared memory to precede the actual lock acquisition, or follow the
- *	lock release.  Typically we handle this by using volatile-qualified
- *	pointers to refer to both the spinlock itself and the shared data
- *	structure being accessed within the spinlocked critical section.
- *	That fixes it because compilers are not allowed to re-order accesses
- *	to volatile objects relative to other such accesses.
+ *	It is the responsibility of these macros to make sure that the compiler
+ *	does not re-order accesses to shared 

Re: [HACKERS] IMPORT FOREIGN SCHEMA statement

2014-06-30 Thread Michael Paquier
On Mon, Jun 30, 2014 at 6:01 PM, Ronan Dunklau ronan.dunk...@dalibo.com
wrote:

 BOOLARRAYOID should be defined to 1000, not 1003 (which clashes against
 NAMEARRAYOID).

Oops, a bad copy-paste. Thanks for catching that.

I had a more detailed look at the postgres_fdw patch:
1) Having an example of options usable with postgres_fdw would be a good
thing to test the FDW API more extensively. What about simply a
disable_nulls option to disable NULL/NOT NULL creation on the imported
tables?
2) (Should have noticed that earlier, sorry) I don't see any advantages but
only complications in using PQexecParams to launch the queries checking
schema existence remotely and fetching back table definitions, as we use
them only once per import. Why not remove the parameter part and build
plain query strings using StringInfo (see the sketch after this list)?
This would have the merit to really simplify fetch_remote_tables.
3) If you add an option, let's get rid of PGFDW_IMPORTRESULT_NUMCOLS as the
number of result columns would change.
4) Instead of using a double-layer of arrays when processing results, I
think that it would make the whole process more readable to have one array
for each result column (attname, atttype, atttypmod, attnotnull)
initialized each time a new row is processed and then process through them
to get the array items. I imagine that a small function doing the array
parsing called on each array result would bring more readability as well to
the process flow.
5) This could be costly with large sets of tables in LIMIT TO... I imagine
that we can live with this O(n*m) nested scan as LIMIT TO should not get
big..
foreach(lc1, table_names)
{
	found = false;
	looked_up = ((RangeVar *) lfirst(lc1))->relname;
	foreach(lc2, tables)
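
As a sketch of what I mean for point 2 (the function name and query
contents are made up here, not taken from the patch):

/*
 * Sketch only, with assumed names: build the remote catalog query as a
 * plain string with StringInfo instead of going through PQexecParams.
 */
#include "postgres.h"
#include "lib/stringinfo.h"
#include "utils/builtins.h"

static char *
build_import_query(const char *remote_schema)
{
	StringInfoData buf;

	initStringInfo(&buf);
	appendStringInfo(&buf,
					 "SELECT table_name, column_name, data_type "
					 "FROM information_schema.columns "
					 "WHERE table_schema = %s "
					 "ORDER BY table_name, ordinal_position",
					 quote_literal_cstr(remote_schema));
	return buf.data;	/* palloc'd; caller pfree()s when done */
}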

What do you think should be documented, and where?


Two things coming to my mind:
- If postgres_fdw can use options with IMPORT, they should be explicitly
listed in the section FDW Options of postgres_fdw.
- Documenting that postgres_fdw support IMPORT FOREIGN SCHEMA. A simple
paragraph in the main section of postgres_fdw containing descriptions would
be enough.

Once we get that done, I'll do another round of review on this patch and I
think that we will be able to mark it as ready for committer.

Regards,
-- 
Michael


Re: [HACKERS] Priority table or Cache table

2014-06-30 Thread Beena Emerson
On Tue, Jun 3, 2014 at 9:50 AM, Haribabu Kommi kommi.harib...@gmail.com
wrote:

 Sorry for the late reply. Thanks for the test.
 Please find the re-based patch with a temp fix for correcting the problem.
 I will submit a proper patch fix later.


Please note that the new patch still gives assertion error:

TRAP: FailedAssertion("!(buf->freeNext != (-2))", File: "freelist.c", Line:
178)
psql:load_test.sql:5: connection to server was lost


Hence, the patch was installed with assertions off.

I also ran the test script after making the same configuration changes that
you have specified. I found that I was not able to get the same performance
difference that you have reported.

Following table lists the tps in each scenario and the % increase in
performance.

Threads   Head   Patched   Diff
1         1669   1718        3%
2         2844   3195       12%
4         3909   4915       26%
8         7332   8329       14%

Kindly let me know if I am missing something.

--
Beena Emerson


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-06-30 Thread Robert Haas
On Fri, Jun 27, 2014 at 2:23 PM, Stephen Frost sfr...@snowman.net wrote:
 I disagree with Stephen's proposal that this should be in core, or
 that it needs its own dedicated syntax.  I think contrib is a great
 place for extensions like this.  That makes it a whole lot easier for
 people to develop customized vesions that meet particular needs they
 may have, and it avoids bloating core with a bunch of stuff that is
 only of interest to a subset of users.  We spent years getting sepgsql
 into a form that could be shipped in contrib without substantially
 burdening core, and I don't see why this feature should be handled any
 differently.

 I don't exactly see how an extension or contrib module is going to be
 able to have a reasonable catalog structure which avoids the risk that
 renaming an object (role, schema, table, column, policy) will break the
 configuration without being part of core.

At some level, I guess that's true, but a custom GUC would get you
per-role and per-database and a custom reloption could take it
further.  A security label provider specifically for auditing can
already associate custom metadata with just about any SQL-visible
object in the system.
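
For example, a minimal label provider for auditing could look like this
(a sketch only; the provider name, the accepted label, and the validation
logic are all made up):

/*
 * Sketch of a security label provider for auditing; it accepts only the
 * label "audit" (or label removal, seclabel == NULL).
 */
#include "postgres.h"
#include "commands/seclabel.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

static void
audit_object_relabel(const ObjectAddress *object, const char *seclabel)
{
	if (seclabel != NULL && strcmp(seclabel, "audit") != 0)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_NAME),
				 errmsg("invalid security label: \"%s\"", seclabel)));
}

void
_PG_init(void)
{
	register_label_provider("pgaudit", audit_object_relabel);
}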

 Add on to that the desire for
 audit control to be a grant'able right along the lines of:

 GRANT AUDIT ON TABLE x TO audit_role;

 which allows a user who is not the table owner or a superuser to manage
 the auditing.

As Tom would say, I think you just moved the goalposts into the next
county.  I don't know off-hand what that syntax is supposed to allow,
and I don't think it's necessary for pgaudit to implement that (or
anything else that you may want) in order to be a viable facility.

 In fact, considering how much variation there is likely to be between
 auditing requirements at different sites, I'm not sure this should
 even be in contrib at this point.  It's clearly useful, but there is
 no requirement that every useful extension must be bundled with the
 core server, and in fact doing so severely limits our ability to make
 further changes.

 For my part, I do view auditing as being a requirement we should be
 addressing through core- and not just for the reasons mentioned above.
 We might be able to manage all of the above through an extension,
 however, as we continue to add capabilities and features, new types and
 methods, ways to extend the system, and more, auditing needs to keep
 pace and be considered a core feature as much as the GRANT system is.

I think the fact that pgaudit does X and you think it should do Y is a
perfect example of why we're nowhere close to being ready to push
anything into core.  We may very well want to do that someday, but not
yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RLS Design

2014-06-30 Thread Robert Haas
On Sun, Jun 29, 2014 at 3:42 PM, Stephen Frost sfr...@snowman.net wrote:
  An interesting question we haven't much considered is: who can set up
  policies and add them to users?  Maybe we should flip this around, and
  instead of adding users to policies, we should exempt users from
  policies.
 
  CREATE POLICY p1;
 
  And then, if they own p1 and t1, they can do:
 
  ALTER TABLE t1 SET POLICY p1 TO t1_p1_quals;
  (or maybe we should associate it to the policy instead of the table:
  ALTER POLICY p1 SET TABLE t1 TO t1_p1_quals)
 
  And then the policy applies to everyone who doesn't have the grantable
  EXEMPT privilege on the policy.  The policy owner and superuser have
  that privilege by default and it can be handed out to others like
  this:
 
  GRANT EXEMPT ON POLICY p1 TO snowden;
 
  Then users who have row_level_security=on will bypass RLS if possible,
  and otherwise it will be applied.  Users who have
  row_level_security=off will bypass RLS if possible, and otherwise
  error.  And users who have row_level_security=force will apply RLS
  even if they are entitled to bypass it.

 That's interesting. I need to think some more about what that means.

 I'm not a fan of the EXEMPT approach..

Just out of curiosity, why not?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-06-30 Thread Andres Freund
Hi,

On 2014-06-30 08:03:40 -0400, Robert Haas wrote:
 On Sat, Jun 28, 2014 at 9:41 AM, Andres Freund and...@2ndquadrant.com wrote:
  No, I think it's going to be *much* simpler than that.  How about I
  take a crack at this next week and then either (a) I'll see why it's a
  bad idea and we can go from there or (b) you can review what I come up
  with and tell me why it sucks?
 
  Ok. I think that's going in the wrong direction (duplication of
  nontrivial knowledge), but maybe I'm taking a too 'purist' approach
  here. Prove me wrong :)
 
 I'm not sure what you'll think of this, but see attached.

I think it continues in the tradition of making s_lock.h ever harder to
follow. But it's still better than what we have now from a correctness
perspective.

  I think
 newer releases of Sun Studio support that GCC way of doing a barrier,
 but I don't know how to test for that, so at the moment that uses the
 fallback "put it in a function" approach,

Sun Studio's support for the gcc way is new (some update to Sun Studio
12), so that's probably not sufficient.
#include <mbarrier.h> and __compiler_barrier()/__machine_rel_barrier()
should do the trick for spinlocks. That seems to have existed for
longer.

Solaris seems to run with TSO enabled for SPARC (whereas linux uses RMO,
relaxed memory ordering), so it's probably fine to just use the compiler
barrier.
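
Something like the following, presumably (untested sketch):

/*
 * Untested sketch based on mbarrier.h: under TSO a compiler barrier
 * before the releasing store should be enough on Solaris/SPARC.
 */
#include <mbarrier.h>

#define S_UNLOCK(lock) \
	do { __compiler_barrier(); *(lock) = 0; } while (0)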

 along with HPPA non-GCC and
 Univel CC.  Those things are obscure enough that I don't care about
 making them less performance.

Fine with me.

  I think AIX is actually OK as-is; if _check_lock() is a compiler
 barrier but _clear_lock() is not, that would be pretty odd.

Agreed.

   How are you suggesting we deal with the generic S_UNLOCK
   case without having a huge ifdef?
   Why do we build an abstraction layer (barrier.h) and then not use it?
 
  Because (1) the abstraction doesn't fit very well unless we do a lot
  of additional work to build acquire and release barriers for every
  platform we support and
 
  Meh. Something like the (untested):
  #if !defined(pg_release_barrier) && defined(pg_read_barrier) && \
  	defined(pg_write_barrier)
  #define pg_release_barrier() do { pg_read_barrier(); pg_write_barrier(); } \
  	while (0)
  #elif !defined(pg_release_barrier)
  #define pg_release_barrier() pg_memory_barrier()
  #endif
 
  before the fallback definition of pg_read/write_barrier should suffice?
 
 That doesn't sound like a good idea.  For example, on PPC, a read
 barrier is lwsync, and so is a write barrier.  That would also suck on
 any architecture where either a read or write barrier gets implemented
 as a full memory barrier, though I guess it might be smart enough to
 optimize away most of the cost of the second of two barriers in
 immediate succession.

Well, that's why I suggested only doing it if we haven't got a
pg_release_barrier() defined. And fallback to memory_barrier() directly
if read/write barriers are implemented using it so we don't have two
memory barriers in a row.

 I think if we want to use barrier.h as the foundation for this, we're
 going to need to define a new set of things in there that have acquire
 and release semantics, or just use full barriers for everything -
 which would be wasteful on, e.g., x86.  And I don't particularly see
 the point in going to a lot of work to invent those new abstractions
 everywhere when we can just tweak s_lock.h in place a bit and be done
 with it.  Making those files interdependent doesn't strike me as a
 win.

We'll need release semantics in more places than just s_lock.h. Anything
that acts like a lock without using spinlocks actually needs
acquire/release semantics. The lwlock patch only gets away with it
because the atomic operations implementation happens to act as an
acquire or full memory barrier.

  (2) I don't have much confidence that we can
  depend on the spinlock fallback for barriers without completely
  breaking obscure platforms, and I'd rather make a more minimal set of
  changes.
 
  Well, it's the beginning of the cycle. And we're already depending on
  barriers for correctness (and it's not getting less), so I don't really
  see what avoiding barrier usage buys us except harder to find breakage?
 
 If we use barriers to implement any facility other than spinlocks, I
 have reasonable confidence that we won't break things more than they
 already are broken today, because I think the fallback to
 acquire-and-release a spinlock probably works, even though it's likely
 terrible for performance.  I have significantly less confidence that
 trying to implement spinlocks in terms of barriers is going to be
 reliable.

So you believe we have a reliable barrier implementation - but you don't
believe it's reliable enough for spinlocks? If we'd *not* use the
barrier fallback for spinlock release if, and only if, it's implemented
via spinlocks, I don't see why we'd be worse off?

 +#if !defined(S_UNLOCK)
 +#if defined(__INTEL_COMPILER)
 +#define S_UNLOCK(lock)   \
 + do { 

Re: [HACKERS] RLS Design

2014-06-30 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Sun, Jun 29, 2014 at 3:42 PM, Stephen Frost sfr...@snowman.net wrote:
   An interesting question we haven't much considered is: who can set up
   policies and add then to users?  Maybe we should flip this around, and
   instead of adding users to policies, we should exempt users from
   policies.
  
   CREATE POLICY p1;
  
   And then, if they own p1 and t1, they can do:
  
   ALTER TABLE t1 SET POLICY p1 TO t1_p1_quals;
   (or maybe we should associate it to the policy instead of the table:
   ALTER POLICY p1 SET TABLE t1 TO t1_p1_quals)
  
   And then the policy applies to everyone who doesn't have the grantable
   EXEMPT privilege on the policy.  The policy owner and superuser have
   that privilege by default and it can be handed out to others like
   this:
  
   GRANT EXEMPT ON POLICY p1 TO snowden;
  
   Then users who have row_level_security=on will bypass RLS if possible,
   and otherwise it will be applied.  Users who have
   row_level_security=off will bypass RLS if possible, and otherwise
   error.  And users who have row_level_security=force will apply RLS
   even if they are entitled to bypass it.
 
  That's interesting. I need to think some more about what that means.
 
  I'm not a fan of the EXEMPT approach..
 
 Just out of curiosity, why not?

I don't see it as really solving the flexibility need and it feels quite
a bit more complicated to reason about.  Would someone who is EXEMPT
from one policy on a given table still have other policies on that table
applied to them?  Would a user be able to be EXEMPT from multiple
policies?  I feel like that's what you're suggesting with this approach,
otherwise I don't see it as really different from the 'DIRECT SELECT'
privilege discussed previously..

Thanks,

Stephen




Re: [HACKERS] inherit support for foreign tables

2014-06-30 Thread Tom Lane
Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 Done.  I think this is because create_foreignscan_plan() makes reference 
 to attr_needed, which isn't computed for inheritance children.

I wonder whether it isn't time to change that.  It was coded like that
originally only because calculating the values would've been a waste of
cycles at the time.  But this is at least the third place where it'd be
useful to have attr_needed for child rels.

regards, tom lane




Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-06-30 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
 On Fri, Jun 27, 2014 at 2:23 PM, Stephen Frost sfr...@snowman.net wrote:
  I don't exactly see how an extension or contrib module is going to be
  able to have a reasonable catalog structure which avoids the risk that
  renaming an object (role, schema, table, column, policy) will break the
  configuration without being part of core.
 
 At some level, I guess that's true, but a custom GUC would get you
 per-role and per-database and a custom reloption could take it
 further.  A security label provider specifically for auditing can
 already associate custom metadata with just about any SQL-visible
 object in the system.

Ugh.  Yes, we could implement a lot of things using per-object GUCs and
reloptions.  We could do the same for columns with custom attoptions.
For my part, that design looks absolutely horrible to me for
more-or-less everything.  I certainly don't feel like it's the solution
which extension authors are looking for and will be happy with, nor do I
feel it's the right answer for our users.

  Add on to that the desire for
  audit control to be a grant'able right along the lines of:
 
  GRANT AUDIT ON TABLE x TO audit_role;
 
  which allows a user who is not the table owner or a superuser to manage
  the auditing.
 
 As Tom would say, I think you just moved the goalposts into the next
 county.  I don't know off-hand what that syntax is supposed to allow,
 and I don't think it's necessary for pgaudit to implement that (or
 anything else that you may want) in order to be a viable facility.

Perhaps I should have used the same argument on the RLS thread..
Instead, I realized that you and others were right- we don't want to
implement and commit something which would make adding the features
being asked for very difficult to do in the future.  The point of the
command above is to allow a role *other* than the table owner to
control the auditing on the table.  It's important in certain places
that the auditor have the ability to control the auditing and *only*
the auditing- not drop the table or change its definition.

I'm also not asking for this to be implemented today in pgaudit but
rather to point out that we should have a design which at least allows
us to get to the point where these features can be added.  Perhaps
there's a path to allowing the control I'm suggesting here with the
existing pgaudit design and with pgaudit as an extension- if so, please
speak up and outline it sufficiently that we can understand that path
and know that we're not putting ourselves into a situation where we
won't be able to support that flexibility.

  For my part, I do view auditing as being a requirement we should be
  addressing through core- and not just for the reasons mentioned above.
  We might be able to manage all of the above through an extension,
  however, as we continue to add capabilities and features, new types and
  methods, ways to extend the system, and more, auditing needs to keep
  pace and be considered a core feature as much as the GRANT system is.
 
 I think the fact that pgaudit does X and you think it should do Y is a
 perfect example of why we're nowhere close to being ready to push
 anything into core.  We may very well want to do that someday, but not
 yet.

That's fine- but don't push something in which will make it difficult to
add these capabilities later (and, to be clear, I'm not asking out of
pipe dreams and wishes but rather after having very specific discussions
with users and reviewing documents like NIST 800-53, which is publically
available for anyone to peruse).

Thanks,

Stephen




Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-06-30 Thread Robert Haas
On Mon, Jun 30, 2014 at 9:26 AM, Andres Freund and...@2ndquadrant.com wrote:
 I think it continues in the tradition of making s_lock.h ever harder to
 follow. But it's still better than what we have now from a correctness
 perspective.

Well, as you and I have discussed before, someday we probably need to
get rid of s_lock.h or at least refactor it heavily, but let's do one
thing at a time.  I think we're eventually going to require every
platform that wants to run PostgreSQL to have working barriers and
atomics, and then we can rewrite s_lock.h into something much shorter
and cleaner, but I am opposed to doing that today, because even if we
don't care about people running obscure proprietary compilers on weird
platforms, there are still lots of people running older GCC versions.
For right now, I think the prudent thing to do is keep s_lock.h on
life support.

  I think
 newer releases of Sun Studio support that GCC way of doing a barrier,
 but I don't know how to test for that, so at the moment that uses the
 fallback put it in a function approach,

 Sun studio's support for the gcc way is new (some update to sun studio 12), 
 so that's
 probably not sufficient.
 #include mbarrier.h and __compiler_barrier()/__machine_rel_barrier()
 should do the trick for spinlocks. That seems to have existed for
 longer.

Can you either link to relevant documentation or be more specific
about what needs to be done here?

 Solaris seems to run with TSO enabled for SPARC (whereas linux uses RMO,
 relaxed memory ordering), so it's probably fine to just use the compiler
 barrier.

If it isn't, that's a change that has nothing to do with making
spinlock operations compiler barriers and everything to do with fixing
a bug in the existing implementation.

 So you believe we have a reliable barrier implementation - but you don't
 believe it's reliable enough for spinlocks? If we'd *not* use the
 barrier fallback for spinlock release if, and only if, it's implemented
 via spinlocks, I don't see why we'd be worse off?

I can't parse this sentence because it's not clear to me exactly
which part the "not" applies to, and I think we're talking past each
other a bit, too.  Basically, every platform we support today has a
spinlock implementation that is supposed to prevent CPU reordering
across the acquire and release operations but might not prevent
compiler reordering.  IOW, S_LOCK() should be a CPU acquire fence, and
S_UNLOCK() should be a CPU release fence.  Now, we want to make these
operations compiler fences as well, and AIUI your proposal for that is
to make NEW_S_UNLOCK(lock) = out of line function call + S_LOCK(dummy)
+ S_UNLOCK(dummy) + S_UNLOCK(lock).  My proposal is to make
NEW_S_UNLOCK(lock) = out of line function call + S_UNLOCK(lock), which
I think is strictly better.  There's zip to guarantee that adding
S_LOCK(dummy) + S_UNLOCK(dummy) is doing anything we want in there,
and it's definitely going to be worse for performance.

The other thing that I don't like about your proposal has to do with
the fact that the support matrix for barrier.h and s_lock.h are not
identical.  If s_lock.h is confusing to you today, making it further
intertwined with barrier.h is not going to make things better; at
least, that confuses the crap out of me.  Perhaps the only good thing
to be said about this mess is that, right now, the dependency is in
just one direction: barrier.h depends on s_lock.h, and not the other
way around.  At some future point we may well want to reverse the
direction of that dependency, but to do that we need barrier.h to have
a working barrier implementation for every platform that s_lock.h
supports.  Maybe we'll accomplish that by adding to barrier.h and maybe
we'll accomplish that by subtracting from s_lock.h, but the short path
to getting this issue fixed is to be agnostic to that question.

 1)
 Afaics ARM will fall back to this for older gccs - and ARM is weakly
 ordered. I think I'd just also use swpb to release the lock. Something
 like
 #define S_INIT_LOCK(lock)   (*(lock) = 0)

 #define S_UNLOCK s_unlock
 static __inline__ void
 s_unlock(volatile slock_t *lock)
 {
 	register slock_t _res = 0;

 	__asm__ __volatile__(
 		"	swpb %0, %0, [%2]	\n"
 	:	"+r"(_res), "+m"(*lock)
 	:	"r"(lock)
 	:	"memory");
 	Assert(_res == 1); // lock should be held by us
 }

 2)
 Afaics it's also wrong for sparc on linux (and probably the BSDs) where
 relaxed memory ordering is used.

 3)
 Also looks wrong on mips which needs a full memory barrier.

You're mixing apples and oranges again.  If the existing definitions
aren't CPU barriers, they're already broken, and we should fix that
and back-patch.  On the other hand, the API contract change making
these into compiler barriers is a master-only change.  I think for
purposes of this patch we should assume that the existing code is
sufficient to prevent CPU reordering and just focus on fixing 

Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-06-30 Thread Abhijit Menon-Sen
At 2014-06-30 09:39:29 -0400, sfr...@snowman.net wrote:

 I certainly don't feel like it's the solution which extension authors
 are looking for and will be happy with

I don't know if there are any other extension authors involved in this
discussion, but I'm not shedding any tears over the idea. (That may be
because I see operational compatibility with 9.[234] as a major plus,
not a minor footnote.)

  As Tom would say, I think you just moved the goalposts into
  the next county.

(And they're not even the same distance apart any more. ;-)

 That's fine- but don't push something in which will make it difficult
 to add these capabilities later

I've been trying to understand why a pgaudit extension (which already
exists) will make it difficult to add a hypothetical GRANT AUDIT ON
goalpost TO referee syntax later. About the only thing I've come up
with is people complaining about having to learn the new syntax when
they were used to the old one.

Surely that's not the sort of thing you mean? (You've mentioned
pg_upgrade and backwards compatibility too, and I don't really
understand those either.)

-- Abhijit




Re: [HACKERS] RLS Design

2014-06-30 Thread Robert Haas
On Mon, Jun 30, 2014 at 9:42 AM, Stephen Frost sfr...@snowman.net wrote:
  I'm not a fan of the EXEMPT approach..

 Just out of curiosity, why not?

 I don't see it as really solving the flexibility need and it feels quite
 a bit more complicated to reason about.  Would someone who is EXEMPT
 from one policy on a given table still have other policies on that table
 applied to them?

Yes; otherwise, EXEMPT couldn't be granted by non-superusers, and the
whole point of that proposal was to come up with something that would
be clearly safe for ordinary users to use.

 Would a user be able to be EXEMPT from multiple
 policies?

Yes, clearly.  It would be a privilege on the policy object, so
different objects can have different privileges.

 I feel like that's what you're suggesting with this approach,
 otherwise I don't see it as really different from the 'DIRECT SELECT'
 privilege discussed previously..

Right.  If you took that away, it wouldn't be different.

The number of possible approaches here has expanded beyond what I can
keep in my head; I'm assuming you are planning to think this over and
propose something comprehensive, or maybe Dean or someone else will do
that.  But I'm not sure that all the approaches proposed would make it
safe for non-superusers to use RLS, and I think it would be good if
they could.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-06-30 Thread Stephen Frost
Abhijit,

* Abhijit Menon-Sen (a...@2ndquadrant.com) wrote:
 At 2014-06-30 09:39:29 -0400, sfr...@snowman.net wrote:
  I certainly don't feel like it's the solution which extension authors
  are looking for and will be happy with
 
 I don't know if there are any other extension authors involved in this
 discussion, but I'm not shedding any tears over the idea. (That may be
 because I see operational compatibility with 9.[234] as a major plus,
 not a minor footnote.)

We're certainly not going to include this in the -contrib packages for
9.4-or-earlier, so I've not really been thinking about those concerns,
no.  Indeed, if you want that to be supported by packagers, they'd
probably be happier with it being an independent extension distributed
through pgxn or similar..  Having an extension for 9.4-and-earlier which
isn't in -contrib and then moving it to -contrib for 9.5 is at least an
annoyance when it comes to packaging.

  That's fine- but don't push something in which will make it difficult
  to add these capabilities later
 
 I've been trying to understand why a pgaudit extension (which already
 exists) will make it difficult to add a hypothetical GRANT AUDIT ON
 goalpost TO referee syntax later. About the only thing I've come up
 with is people complaining about having to learn the new syntax when
 they were used to the old one.

I do think people will complain a bit about such a change, but I expect
that to be on the level of grumblings rather than real issues.

 Surely that's not the sort of thing you mean? (You've mentioned
 pg_upgrade and backwards compatibility too, and I don't really
 understand those either.)

Where would you store the information about the GRANTs?  In the
reloptions too?  Or would you have to write pg_upgrade to detect if
pgaudit had been installed and had played with the reloptions field to
then extract out the values from it into proper columns or an actual
table, not just for grants but for any other custom reloptions which
were used?  Were pgaudit to grow any tables, functions, etc, we'd have
to address how those would be handled by pg_upgrade also.  I don't think
that's an issue currently, but we'd have to be darn careful to make sure
it doesn't happen or we end up with the same upgrade issues hstore has.

We've absolutely had push-back regarding breaking things for people who
have installed extensions from -contrib by moving those things into core
without a very clearly defined way to *not* break them.  There have been
proposals in the past to explicitly allocate/reserve OIDs to address
this issue but those haven't been successful.

I'm not saying that I know it's going to be impossible- I'm asking
that we consider how this might move into core properly in a way that
will make it possible to pg_upgrade.  Part of that does include
considering what the design of the core solution will offer and what
feature set it will have..

Thanks,

Stephen




Re: [HACKERS] RLS Design

2014-06-30 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Mon, Jun 30, 2014 at 9:42 AM, Stephen Frost sfr...@snowman.net wrote:
  I don't see it as really solving the flexibility need and it feels quite
  a bit more complicated to reason about.  Would someone who is EXEMPT
  from one policy on a given table still have other policies on that table
  applied to them?
 
 Yes; otherwise, EXEMPT couldn't be granted by non-superusers, and the
 whole point of that proposal was to come up with something that would
 be clearly safe for ordinary users to use.

I'm confused on this part- granting EXEMPT and/or DIRECT SELECT would
definitely need to be supported for a non-superuser, provided they have
the appropriate rights on the object involved (either the policy or
the table, depending on where we feel that definition should go).

  Would a user be able to be EXEMPT from multiple
  policies?
 
 Yes, clearly.  It would be a privilege on the policy object, so
 different objects can have different privileges.

Ok..  then I'm not entirely sure how this is different from Dean's
proposal except that it's a way of defining the inverse, which we don't
do anywhere else in the system today..

  I feel like that's what you're suggesting with this approach,
  otherwise I don't see it as really different from the 'DIRECT SELECT'
  privilege discussed previously..
 
 Right.  If you took that away, it wouldn't be different.

Ok.

 The number of possible approaches here has expanded beyond what I can
 keep in my head; I'm assuming you are planning to think this over and
 propose something comprehensive, or maybe Dean or someone else will do
 that.  But I'm not sure that all the approaches proposed would make it
 safe for non-superusers to use RLS, and I think it would be good if
 they could.

I've been thinking about it quite a bit over the past few days (weeks?)
and trying to continue to outline the proposals as they've changed.
I'll try and work up another comprehensive email which covers the
options currently under discussion as I understand them.  Allowing
non-superuser to use RLS is absolutely key to this in any case- it'd be
great if you could voice any specific concerns you see there.  We've
already been working through the GUCs previously discussed, as I feel
those will be necessary for any of these approaches (in particular the
bypass RLS-or-error GUC which pg_dump will enable by default).

Thanks,

Stephen




Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-06-30 Thread Robert Haas
On Mon, Jun 30, 2014 at 9:39 AM, Stephen Frost sfr...@snowman.net wrote:
 I think the fact that pgaudit does X and you think it should do Y is a
 perfect example of why we're nowhere close to being ready to push
 anything into core.  We may very well want to do that someday, but not
 yet.

 That's fine- but don't push something in which will make it difficult to
 add these capabilities later (and, to be clear, I'm not asking out of
 pipe dreams and wishes but rather after having very specific discussions
 with users and reviewing documents like NIST 800-53, which is publically
 available for anyone to peruse).

I don't think that's a valid objection.  If we someday have auditing
in core, and if it subsumes what pgaudit does, then whatever
interfaces pgaudit implements can be replaced with wrappers around the
core functionality, just as we did for text search.

But personally, I think this patch deserves to be reviewed on its own
merits, and not the extent to which it satisfies your requirements, or
those of NIST 800-53.  As I said before, I think auditing is a
complicated topic and there's no guarantee that one solution will be
right for everyone.  As long as we keep those solutions out of core,
there's no reason that multiple solutions can't coexist; people can
pick the one that best meets their requirements.  As soon as we start
talking about something putting into core, the bar is a lot higher,
because we're not going to put two auditing solutions into core, so if
we do put one in, it had better be the right thing for everybody.  I
don't even think we should be considering that at this point; I think
the interesting (and under-discussed) question on this thread is
whether it even makes sense to put this into contrib.  That means we
need some review of the patch for what it is, which there hasn't been
much of, yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal for CSN based snapshots

2014-06-30 Thread Simon Riggs
On 12 May 2014 17:14, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 05/12/2014 06:26 PM, Andres Freund wrote:

 With the new commit-in-progress status in clog, we won't need the
 sub-committed clog status anymore. The commit-in-progress status will
 achieve the same thing.

 Wouldn't that cause many spurious waits? Because commit-in-progress
 needs to be waited on, but a sub-committed xact surely not?


 Ah, no. Even today, a subxid isn't marked as sub-committed, until you commit
 the top-level transaction. The sub-commit state is a very transient state
 during the commit process, used to make the commit of the sub-transactions
 and the top-level transaction appear atomic. The commit-in-progress state
 would be a similarly short-lived state. You mark the subxids and the top xid
 as commit-in-progress just before the XLogInsert() of the commit record, and
 you replace them with the real LSNs right after XLogInsert().

My understanding is that we aim to speed up the rate of snapshot reads,
which may make it feasible to store autonomous transactions in shared
memory, hopefully DSM.

The above mechanism sounds like it might slow down transaction commit.
Could we prototype that and measure the speed?

More generally, do we have any explicit performance goals or
acceptance criteria for this patch?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] delta relations in AFTER triggers

2014-06-30 Thread Robert Haas
On Sat, Jun 28, 2014 at 10:35 AM, Kevin Grittner kgri...@ymail.com wrote:
 David Fetter da...@fetter.org wrote:
 On Sat, Jun 21, 2014 at 11:06:26AM -0700, Kevin Grittner wrote:

 Here is v2.

 I've taken the liberty of making an extension that uses this.
 Preliminary tests indicate a 10x performance improvement over the
 user-space hack I did that's similar in functionality.

 Wow, this goes well beyond what I expected for a review!  Thanks!

 As I said in an earlier post, I think that this is best committed
 as a series of patches, one for the core portion and one for each
 PL which implements the ability to use the transition (delta)
 relations in AFTER triggers.  Your extension covers the C trigger
 angle, and it seems to me to be worth committing to contrib as a
 sample of how to use this feature in C.

 It is very encouraging that you were able to use this without
 touching what I did in core, and that it runs 10x faster than the
 alternatives before the patch.

 Because this review advances the patch so far, it may be feasible
 to get it committed in this CF.  I'll see what is needed to get
 there and maybe have a patch toward that end in a few days.  The
 minimum that would require, IMV, is a plpgsql implementation,
 moving the new pg_trigger columns to the variable portion of the
 record so they can be null capable, more docs, and regression
 tests.

Not to rain on your parade, but this patch hasn't really had a serious
code review yet.  Performance testing is good, but it's not the same
thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-06-30 Thread Andres Freund
On 2014-06-30 10:05:44 -0400, Robert Haas wrote:
 On Mon, Jun 30, 2014 at 9:26 AM, Andres Freund and...@2ndquadrant.com wrote:
  I think it continues in the tradition of making s_lock.h ever harder to
  follow. But it's still better than what we have now from a correctness
  perspective.
 
 Well, as you and I have discussed before, someday we probably need to
 get rid of s_lock.h or at least refactor it heavily, but let's do one
 thing at a time.

Well, that task gets harder by making it more complex...

(will answer separately about sun studio)

  Solaris seems to run with TSO enabled for SPARC (whereas linux uses RMO,
  relaxed memory ordering), so it's probably fine to just use the compiler
  barrier.
 
 If it isn't, that's a change that has nothing to do with making
 spinlock operations compiler barriers and everything to do with fixing
 a bug in the existing implementation.

Well, it actually has. If we start to remove volatiles from critical
sections the compiler will schedule stores closer to the spinlock
release and reads more freely - making externally visible ordering
violations more likely. Especially on itanic.

So, I agree that we need to fix unlocks that aren't sufficiently strong
memory barriers, but it becomes more urgent once we stop relying on
volatile inside critical sections.

  Basically, every platform we support today has a
 spinlock implementation that is supposed to prevent CPU reordering
 across the acquire and release operations but might not prevent
 compiler reordering.  IOW, S_LOCK() should be a CPU acquire fence, and
 S_UNLOCK() should be a CPU release fence.

Well, I think s_lock.h falls woefully short of that goal on several
platforms. Scary.

 Now, we want to make these
 operations compiler fences as well, and AIUI your proposal for that is
 to make NEW_S_UNLOCK(lock) = out of line function call + S_LOCK(dummy)
 + S_UNLOCK(dummy) + S_UNLOCK(lock).

My proposal was to use barrier.h provided barriers as long as it
provides a native implementation. If neither a compiler nor a memory
barrier is provided my proposal was to fall back to the memory barrier
emulation we already have, but move it out of line (to avoid reordering
issues). The reason for doing so is that the release *has* to be a
release barrier.

To avoid issues with recursion, the S_UNLOCK() used in the out-of-line
memory barrier implementation uses a modified S_UNLOCK that doesn't
include a barrier.

 My proposal is to make
 NEW_S_UNLOCK(lock) = out of line function call + S_UNLOCK(lock), which
 I think is strictly better.  There's zip to guarantee that adding
 S_LOCK(dummy) + S_UNLOCK(dummy) is doing anything we want in there,
 and it's definitely going to be worse for performance.

Uh? At the very least doing a S_LOCK() guarantees that we're doing some
sort of memory barrier, which yours doesn't at all. That'd actually fix
the majority of platforms with borked release semantics because pretty
much all tas() implementations are full barriers.
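
(To make the two proposals concrete, a hedged sketch in C; the function
names are illustrative only, not actual s_lock.h code, and the dummy
lock is assumed to be unlocked when zero-initialized:)

/* Proposal A: cycle a dummy spinlock so its TAS() supplies a fence */
static void
s_unlock_via_dummy(volatile slock_t *lock)
{
	static volatile slock_t dummy;

	S_LOCK(&dummy);			/* tas() is (on most platforms) a full barrier */
	S_UNLOCK(&dummy);
	S_UNLOCK(lock);			/* the actual release */
}

/*
 * Proposal B: out-of-line call only; a compiler barrier, but with no
 * guaranteed CPU fence before the release.
 */
static void
s_unlock_out_of_line(volatile slock_t *lock)
{
	S_UNLOCK(lock);
}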

 The other thing that I don't like about your proposal has to do with
 the fact that the support matrix for barrier.h and s_lock.h are not
 identical.  If s_lock.h is confusing to you today, making it further
 intertwined with barrier.h is not going to make things better; at
 least, that confuses the crap out of me.

Uh. It actually *removed* the confusing edge of the dependency. It's
rather confusing that memory barriers rely on spinlocks and not the
other way round. I think we actually should do that unconditionally,
independent of any other changes. The only reasons barrier.h includes
s_lock.h are that dummy_spinlock is declared there and that the memory
barrier is declared inline.

  1)
  Afaics ARM will fall back to this for older gccs - and ARM is weakly
  ordered. I think I'd just also use swpb to release the lock. Something
  like
  #define S_INIT_LOCK(lock)   ((*(lock)) = 0)
 
  #define S_UNLOCK s_unlock
  static __inline__ void
  s_unlock(volatile slock_t *lock)
  {
  register slock_t _res = 0;
 
  __asm__ __volatile__(
 	"swpb	%0, %0, [%2]\n"
  :	"+r"(_res), "+m"(*lock)
  :	"r"(lock)
  :	"memory");
  Assert(_res == 1); // lock should be held by us
  }
 
  2)
  Afaics it's also wrong for sparc on linux (and probably the BSDs) where
  relaxed memory ordering is used.
 
  3)
  Also looks wrong on mips which needs a full memory barrier.
 
 You're mixing apples and oranges again.

No, I'm not.

 If the existing definitions
 aren't CPU barriers, they're already broken, and we should fix that
 and back-patch.

Yea, and that gets harder if we do it after master has changed
incompatibly. And, as explained above, we need to fix S_UNLOCK() to be a
memory barrier before we can remove volatiles - which is the goal of
your patch, no?

 On the other hand, the API contract change making
 these into compiler barriers is a master-only change.  I think for
 purposes of this patch we should assume that the existing code is
 sufficient to prevent CPU reordering and just focus on fixing compiler
 ordering problems.  Whatever needs to change on the CPU-reordering end
 of things is a separate commit.

I'm not arguing against that it should be a separate commit. Just that
the proper memory barrier bit has to come first.

Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-06-30 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Mon, Jun 30, 2014 at 9:39 AM, Stephen Frost sfr...@snowman.net wrote:
  I think the fact that pgaudit does X and you think it should do Y is a
  perfect example of why we're nowhere close to being ready to push
  anything into core.  We may very well want to do that someday, but not
  yet.
 
  That's fine- but don't push something in which will make it difficult to
  add these capabilities later (and, to be clear, I'm not asking out of
  pipe dreams and wishes but rather after having very specific discussions
  with users and reviewing documents like NIST 800-53, which is publically
  available for anyone to peruse).
 
 I don't think that's a valid objection.  If we someday have auditing
 in core, and if it subsumes what pgaudit does, then whatever
 interfaces pgaudit implements can be replaced with wrappers around the
 core functionality, just as we did for text search.

I actually doubt that we'd be willing to do what we did for text search
again.  Perhaps I'm wrong, which would be great, but I doubt it.

 But personally, I think this patch deserves to be reviewed on its own
 merits, and not the extent to which it satisfies your requirements, or
 those of NIST 800-53.  

eh?  The focus of this patch is to add auditing to PG and having very
clearly drawn auditing requirements of a very large customer isn't
relevant?  I don't follow that logic at all.  In fact, I thought the
point of pgaudit was specifically to help address the issues which are
being raised from this section of our user base (perhaps not those who
follow NIST, but at least those organizations with similar interests..).

 As I said before, I think auditing is a
 complicated topic and there's no guarantee that one solution will be
 right for everyone.  As long as we keep those solutions out of core,
 there's no reason that multiple solutions can't coexist; people can
 pick the one that best meets their requirements.  As soon as we start
 talking about something putting into core, the bar is a lot higher,
 because we're not going to put two auditing solutions into core, so if
 we do put one in, it had better be the right thing for everybody.

I agree that auditing is a complicated topic.  Given the dearth of
existing auditing solutions, I seriously doubt we're going to actually
see more than one arise.  I'd much rather focus on an in-core solution
which allows the mechanism by which audit logs are produced to be
supplied through an extension (perhaps through hooks in the right
places), but the actual definition of *what* gets audited to be defined
through SQL with
a permissions system that works with the rest of the system.

What I'm getting at is this- sure, different users will want to have
their audit logs be in different formats (JSON, XML, CSV, whatever), or
sent via different means (logged to files, sent to syslog, forwarded on
to a RabbitMQ system, splunk, etc), but the definition of what gets
audited and allowing individuals other than the owners to be able to
modify the auditing is much more clear-cut to me as there's only so many
different objects in the system and only so many operations which can be
performed on them..  This strikes me as similar to security labels, as
was discussed previously.

 I
 don't even think we should be considering that at this point; I think
 the interesting (and under-discussed) question on this thread is
 whether it even makes sense to put this into contrib.  That means we
 need some review of the patch for what it is, which there hasn't been
 much of, yet.

Evidently I wasn't very clear on this point but my concerns are about it
going into contrib, which is what's been proposed, because I worry about
an eventual upgrade path from contrib to an in-core solution.  I'm also
a bit nervous that a contrib solution might appear 'good enough' to
enough committers that the push-back on an in-core solution would lead
to us never having one.  These concerns are, perhaps, not very palatable
but I feel they're real enough to bring up.

contrib works quite well for stand-alone sets of functions which don't,
and never will, have any in-core direct grammar or catalog support.
While pgaudit doesn't have those today, I'd like to see *auditing* in
PG, eventually, which has both a real grammar and a catalog definition.

Thanks,

Stephen




Re: [HACKERS] delta relations in AFTER triggers

2014-06-30 Thread David Fetter
On Mon, Jun 30, 2014 at 11:03:06AM -0400, Robert Haas wrote:
 On Sat, Jun 28, 2014 at 10:35 AM, Kevin Grittner kgri...@ymail.com wrote:
  David Fetter da...@fetter.org wrote:
  On Sat, Jun 21, 2014 at 11:06:26AM -0700, Kevin Grittner wrote:
 
  Here is v2.
 
  I've taken the liberty of making an extension that uses this.
  Preliminary tests indicate a 10x performance improvement over the
  user-space hack I did that's similar in functionality.
 
  Wow, this goes well beyond what I expected for a review!  Thanks!
 
  As I said in an earlier post, I think that this is best committed
  as a series of patches, one for the core portion and one for each
  PL which implements the ability to use the transition (delta)
  relations in AFTER triggers.  Your extension covers the C trigger
  angle, and it seems to me to be worth committing to contrib as a
  sample of how to use this feature in C.
 
  It is very encouraging that you were able to use this without
  touching what I did in core, and that it runs 10x faster than the
  alternatives before the patch.
 
  Because this review advances the patch so far, it may be feasible
  to get it committed in this CF.  I'll see what is needed to get
  there and maybe have a patch toward that end in a few days.  The
  minimum that would require, IMV, is a plpgsql implementation,
  moving the new pg_trigger columns to the variable portion of the
  record so they can be null capable, more docs, and regression
  tests.
 
 Not to rain on your parade, but this patch hasn't really had a serious
 code review yet.  Performance testing is good, but it's not the same
 thing.

Happy to help with that, too.

What I wanted to start with is whether there was even rudimentary
functionality, which I established by writing that extension.  I
happened to notice, basically as a sanity check, that doing this via
tuplestores happened, at least in one case, to be quicker than doing
it in user space with temp tables.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Escaping from blocked send() reprised.

2014-06-30 Thread Robert Haas
On Mon, Jun 30, 2014 at 4:13 AM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 Hello, I have received inquiries related to blocked communication
 several times in recent weeks, with different symptoms. Then I
 found this message in the archives,

 http://postgresql.1045698.n5.nabble.com/Escaping-a-blocked-sendto-syscall-without-causing-a-restart-td5740855.html

 Subject: Escaping a blocked sendto() syscall without causing a restart

 Mr. Tom Lane gave a comment replying to it,

 Offhand it looks to me like most signals would kick the backend off the
 send() call ... but it would loop right back and try again.  See
 internal_flush() in pqcomm.c.  (If you're using SSL, this diagnosis
 may or may not apply.)

 We can't do anything except repeat the send attempt if the client
 connection is to be kept in a sane state.
 (snipped)
  And I'm not at all sure if we could get it to work in SSL mode...

 That's true for timeouts that should continue the connection,
 say, statement_timeout, but focusing on intentional backend
 termination, I think it does no harm to break it off abruptly,
 even if it was on SSL. On the other hand it still seems
 preferable to keep the connection when not blocked. The following
 expression would detect such a blocking state just before the
 next send(2), after the previous attempt was interrupted by a signal.

 (ProcDiePending && select(1, NULL, fd, NULL, '1 sec') == 0)

 Finally, pg_terminate_backend() works even when send is blocked,
 for both SSL and non-SSL connections, after a 1 second delay with
 this patch (break_socket_blocking_on_termination_v1.patch).
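
(Expanding the quoted expression into a self-contained sketch; this is
not the patch itself, and the select()-based probe is an assumption
about how the check could be packaged:)

#include "postgres.h"

#include <sys/select.h>

#include "miscadmin.h"		/* for ProcDiePending */

/*
 * After send() was interrupted by a signal, decide whether the socket
 * is genuinely blocked before retrying: true when termination was
 * requested and the socket stayed unwritable for a one-second grace
 * period.
 */
static bool
send_appears_blocked(int sock)
{
	fd_set		wfds;
	struct timeval tv = {1, 0};	/* one-second grace period */

	if (!ProcDiePending)
		return false;			/* no termination requested */

	FD_ZERO(&wfds);
	FD_SET(sock, &wfds);

	/* select() returning 0 means the timeout expired, socket still full */
	return select(sock + 1, NULL, &wfds, NULL, &tv) == 0;
}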

 Nevertheless, statement_timeout of course cannot be made effective
 by this method, since it breaks the consistency of the client
 protocol. That would need a change in the client protocol to add an
 out-of-band mechanism or something, maybe.

 Any suggestions?

You should probably add your patch here, so it doesn't get forgotten about:

https://commitfest.postgresql.org/action/commitfest_view/open

We're focused on reviewing patches for the current CommitFest, so your
patch might not get attention right away.  A couple of general
thoughts on this topic:

1. I think it's the case that there are platforms around where a
signal won't cause send() to return EINTR and I'd be entirely
unsurprised if SSL_write() doesn't necessarily return EINTR in that
case.  I'm not sure what, if anything, we can do about that.

2. I think it would be reasonable to try to kill off the connection
without notifying the client if we're unable to send the data to the
client in a reasonable period of time.  But I'm unsure what a
reasonable period of time means.  This patch would basically do it
after no delay at all, which seems like it might be too aggressive.
However, I'm not sure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Array of composite types returned from python

2014-06-30 Thread Behn, Edward (EBEHN)
Just writing to check in.  

I haven't done anything to look into allowing arrays of composites for input
to a PL/Python function.  I made the submitted modification for a specific
project that I'm working on that involves python code that returns data
structures. 

I also have no idea about a more efficient way to convert composite
elements. 
 -Ed 


-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Sunday, June 29, 2014 4:54 PM
To: Abhijit Menon-Sen
Cc: Sim Zacks; Behn, Edward (EBEHN); pgsql-hackers@postgresql.org; Peter
Eisentraut
Subject: Re: [HACKERS] Array of composite types returned from python

Abhijit Menon-Sen a...@2ndquadrant.com writes:
 3) This is such a simple change with no new infrastructure code 
 (PLyObject_ToComposite already exists). Can you think of a reason why 
 this wasn't done until now? Was it a simple miss or purposefully 
 excluded?

 This is not an authoritative answer: I think the infrastructure was 
 originally missing, but was later added in #bc411f25 for OUT parameters.
 Perhaps it was overlooked at the time that the same code would suffice 
 for this earlier-missing case. (I've Cc:ed Peter E. in case he has any
 comments.)

 I think the patch is ready for committer.

I took a quick look at this; not really a review either, but I have a couple
comments.

1. While I think the patch does what it intends to, it's a bit distressing
that it will invoke the information lookups in PLyObject_ToComposite over
again for *each element* of the array.  We probably ought to quantify that
overhead to see if it's bad enough that we need to do something about
improving caching, as speculated in the comment in PLyObject_ToComposite.

2. I wonder whether the no-composites restriction in plpy.prepare (see lines
133ff in plpy_spi.c) could be removed as easily.

regards, tom lane




Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-06-30 Thread Robert Haas
On Mon, Jun 30, 2014 at 11:17 AM, Andres Freund and...@2ndquadrant.com wrote:
  Solaris seems to run with TSO enabled for SPARC (whereas linux uses RMO,
  relaxed memory ordering), so it's probably fine to just use the compiler
  barrier.

 If it isn't, that's a change that has nothing to do with making
 spinlock operations compiler barriers and everything to do with fixing
 a bug in the existing implementation.

 Well, it actually has. If we start to remove volatiles from critical
 sections the compiler will schedule stores closer to the spinlock
 release and reads more freely - making externally visible ordering
 violations more likely. Especially on itanic.

Granted.

 Now, we want to make these
 operations compiler fences as well, and AIUI your proposal for that is
 to make NEW_S_UNLOCK(lock) = out of line function call + S_LOCK(dummy)
 + S_UNLOCK(dummy) + S_UNLOCK(lock).

 My proposal was to use barrier.h provided barriers as long as it
 provides a native implementation. If neither a compiler nor a memory
 barrier is provided my proposal was to fall back to the memory barrier
 emulation we already have, but move it out of line (to avoid reordering
 issues). The reason for doing so is that the release *has* to be a
 release barrier.

What do you mean by the memory barrier emulation we already have?
The only memory barrier emulation we have, to my knowledge, is a
spinlock acquire-release cycle.  But trying to use a spinlock
acquire-release to shore up problems with the spinlock release
implementation makes my head explode.
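
(For reference, the emulation being discussed is essentially this
pattern; a simplified sketch, not the literal barrier.h text:)

static volatile slock_t dummy_spinlock;

static void
emulated_memory_barrier(void)
{
	S_LOCK(&dummy_spinlock);	/* tas() supplies an acquire (often full) fence */
	S_UNLOCK(&dummy_spinlock);	/* release the dummy lock again */
}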

 On the other hand, the API contract change making
 these into compiler barriers is a master-only change.  I think for
 purposes of this patch we should assume that the existing code is
 sufficient to prevent CPU reordering and just focus on fixing compiler
 ordering problems.  Whatever needs to change on the CPU-reordering end
 of things is a separate commit.

 I'm not arguing against that it should be a separate commit. Just that
 the proper memory barrier bit has to come first.

I feel like you're speaking somewhat imprecisely in an area where very
precise speech is needed to avoid confusion.   If you're saying that
you think we should fix the S_UNLOCK() implementations that fail to
prevent CPU-ordering before we change all the S_UNLOCK()
implementations to prevent compiler-reordering, then that is fine with
me; otherwise, I don't understand what you're getting at here.  Do you
want to propose a patch that does *only* that first part before we go
further here?  Do you want me to try to put together such a patch
based on the details mentioned on this thread?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 9.4 logical replication - walsender keepalive replies

2014-06-30 Thread Steve Singer

In 9.4 we've added the below block of code to walsender.c:

/*
 * We only send regular messages to the client for full decoded
 * transactions, but a synchronous replication and walsender shutdown
 * possibly are waiting for a later location. So we send pings
 * containing the flush location every now and then.
*/
if (MyWalSnd->flush < sentPtr && !waiting_for_ping_response)
{
WalSndKeepalive(true);
waiting_for_ping_response = true;
}


I am finding that my logical replication reader is spending a tremendous
amount of time sending feedback to the server because a keepalive reply
was requested.  My flush pointer is smaller than sentPtr, which I see as
the normal case (the client hasn't confirmed all the WAL it has been
sent).  My client queues the records it receives and only confirms when
it actually processes the record.


So the sequence looks something like

Server Sends LSN 0/1000
Server Sends LSN 0/2000
Server Sends LSN 0/3000
Client confirms LSN 0/2000


I don't see why all these keepalive replies are needed in this case
(the timeout value is bumped way up; it's the above block that is
triggering the reply request, not something related to the timeout).


Steve






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] better atomics - v0.5

2014-06-30 Thread Robert Haas
On Sat, Jun 28, 2014 at 4:25 AM, Andres Freund and...@2ndquadrant.com wrote:
  To make pin/unpin et al safe without acquiring the spinlock the pincount
  and the flags had to be moved into one variable so that when
  incrementing the pincount you automatically get the flagbits back. If
  it's currently valid all is good, otherwise the spinlock needs to be
  acquired. But for that to work you need to set flags while the spinlock
  is held.
 
  Possibly you can come up with ways to avoid that, but I am pretty damn
  sure that the code will be less robust by forbidding atomics inside a
  critical section, not the contrary. It's a good idea to avoid it, but
  not at all costs.
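
(A rough illustration of the scheme described above, using C11 atomics;
the bit layout and names are invented, not actual bufmgr code:)

#include <stdatomic.h>
#include <stdbool.h>

/* Invented layout: one flag bit in the high bit, pin count below it */
#define BUF_FLAG_VALID		(1u << 31)
#define BUF_PINCOUNT_MASK	(BUF_FLAG_VALID - 1)

/*
 * Lock-free fast path: bumping the pin count returns the flag bits in
 * the same atomic operation, so no spinlock is needed when the buffer
 * is already valid.
 */
static bool
pin_buffer_fast(atomic_uint *state)
{
	unsigned	old = atomic_fetch_add(state, 1);

	if (old & BUF_FLAG_VALID)
		return true;			/* pinned without touching the spinlock */

	atomic_fetch_sub(state, 1);	/* undo; caller falls back to the spinlock */
	return false;
}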

 You might be right, but I'm not convinced.

 Why? Anything I can do to convince you of this? Note that other users of
 atomics (e.g. the Linux kernel) widely use atomics inside
 spinlock-protected regions.

The Linux kernel isn't a great example, because they can ensure that
they aren't preempted while holding the spinlock.  We can't, and
that's a principal part of my concern here.

More broadly, it doesn't seem consistent.  I think that other projects
also sometimes write code that acquires a spinlock while holding
another spinlock, and we don't do that; in fact, we've elevated that
to a principle, regardless of whether it performs well in practice.
In some cases, the CPU instruction that we issue to acquire that
spinlock might be the exact same instruction we'd issue to manipulate
an atomic variable.  I don't see how it can be right to say that a
spinlock-protected critical section is allowed to manipulate an atomic
variable with cmpxchg, but not allowed to acquire another spinlock
with cmpxchg.  Either acquiring the second spinlock is OK, in which
case our current coding rule is wrong, or acquiring the atomic
variable is not OK, either.  I don't see how we can have that both
ways.

 What I'm basically afraid of is that this will work fine in many cases
 but have really ugly failure modes.  That's pretty much what happens
 with spinlocks already - the overhead is insignificant at low levels
 of contention but as the spinlock starts to become contended the CPUs
 all fight over the cache line and performance goes to pot.  ISTM that
 making the spinlock critical section significantly longer by putting
 atomic ops in there could appear to win in some cases while being very
 bad in others.

 Well, I'm not saying it's something I suggest doing all the time. But if
 using an atomic op in the slow path allows you to remove the spinlock
 from 99% of the cases I don't see it having a significant impact.
 In most scenarios (where atomics aren't emulated, i.e. platforms we
 expect to be used in heavily concurrent cases) the spinlock and the atomic
 will be on the same cacheline making stalls much less likely.

And this again is my point: why can't we make the same argument about
two spinlocks situated on the same cache line?  I don't have a bit of
trouble believing that doing the same thing with a couple of spinlocks
could sometimes work out well, too, but Tom is adamantly opposed to
that.  I don't want to just make an end run around that issue without
understanding whether he has a valid point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Better partial index-only scans

2014-06-30 Thread Tom Lane
Joshua Yanovski pythones...@gmail.com writes:
 Proof of concept initial patch for enabling index only scans for
 partial indices even when an attribute is not in the target list, as
 long as it is only used in restriction clauses that can be proved by
 the index predicate.  This also works for index quals, though they
 still can't be used in the target list.  However, this patch may be
 inefficient since it duplicates effort that is currently delayed until
 after the best plan is chosen.

I took a quick look at this.  I think it's logically incorrect to exclude
Vars used only in index quals from the set that the index has to return,
since you can't know at this stage whether the index is lossy (ie, might
report xs_recheck = TRUE at runtime).  While this is moot for btree
indexes, it's not moot for SPGiST indexes which also support index-only
scans today.

In principle we could extend the AM and opclass API and demand that AMs
tell us whether they might return xs_recheck = TRUE.  However, I'm pretty
hesitant to change the opclass APIs for this purpose; it'd likely break
third-party code.  Moreover, an advantage of confining the patch to
considering only partial-index quals is that you could skip all the added
work for non-partial indexes, which would probably largely solve the
added-planning-time problem.

 It also includes a minor
 fix in the same code in createplan.c to make sure we're explicitly
 comparing an empty list to NIL, but I can take that out if that's not
 considered in scope.

I don't think the existing code is poor style there.  There are certainly
hundreds of other cases where we treat != NULL or != NIL as implicit
(though of course also other places where we don't).

 ... as I see it performance could improve in any combination of five
 ways:
 * Improve the performance of determining which clauses can't be
 discarded (e.g. precompute some information about equivalence classes
 for index predicates, mess around with the order in which we check the
 clauses to make it fail faster, switch to real union-find data
 structures for equivalence classes).
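
(For readers unfamiliar with the reference: a "real union-find"
structure here means the classic disjoint-set forest; a minimal sketch
with path halving, unrelated to any existing planner code:)

typedef struct
{
	int		   *parent;			/* parent[x] == x means x is a root */
} DisjointSet;

static int
ds_find(DisjointSet *s, int x)
{
	while (s->parent[x] != x)
	{
		s->parent[x] = s->parent[s->parent[x]];	/* path halving */
		x = s->parent[x];
	}
	return x;
}

static void
ds_union(DisjointSet *s, int a, int b)
{
	s->parent[ds_find(s, a)] = ds_find(s, b);
}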

This is certainly possible, though rather open-ended, and it's not clear
that it really fixes the objection (ie, if you speed these things up then
you still have a performance discrepancy from adding the tests earlier).

 * Take advantage of work we do here to speed things up elsewhere (e.g.
 if this does get chosen as the best plan we don't need to recompute
 the same information in create_indexscan_plan).

That would likely be worth doing if we do this, but it will only buy
back a small part of the cost, since the whole problem here is we'd
be doing this work for all indexes and not only the eventually selected
one.

 * Delay determining whether to use an index scan or index only scan
 until after cost analysis somehow.  I'm not sure exactly what this
 would entail.

That seems impossible to me, since the whole point of an index-only
scan is that it's a lot cheaper than a regular one.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-06-30 Thread Andres Freund
On 2014-06-30 11:38:31 -0400, Robert Haas wrote:
 On Mon, Jun 30, 2014 at 11:17 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  Now, we want to make these
  operations compiler fences as well, and AIUI your proposal for that is
  to make NEW_S_UNLOCK(lock) = out of line function call + S_LOCK(dummy)
  + S_UNLOCK(dummy) + S_UNLOCK(lock).
 
  My proposal was to use barrier.h provided barriers as long as it
  provides a native implementation. If neither a compiler nor a memory
  barrier is provided my proposal was to fall back to the memory barrier
  emulation we already have, but move it out of line (to avoid reordering
  issues). The reason for doing so is that the release *has* to be a
  release barrier.
 
 What do you mean by the memory barrier emulation we already have?
 The only memory barrier emulation we have, to my knowledge, is a
 spinlock acquire-release cycle.

Yes.

Unrelated, but why haven't we defined it as if (TAS(&dummy))
S_UNLOCK(&dummy);? That's just about as strong and less of a performance
drain?
Hm, I guess platforms that do an unlocked test first would be
problematic :/

 But trying to use a spinlock
 acquire-release to shore up problems with the spinlock release
 implementation makes my head explode.

Well, it actually makes some sense. Nearly any TAS() implementation is
going to have some memory barrier semantics - so using a TAS() as
fallback makes sense. That's why we're relying on it for use in memory
barrier emulation after all.

Also note that barrier.h assumes that a S_LOCK/S_UNLOCK acts as a
compiler barrier - which really isn't guaranteed by the current contract
of s_lock.h. Although the tas() implementations look safe.

  On the other hand, the API contract change making
  these into compiler barriers is a master-only change.  I think for
  purposes of this patch we should assume that the existing code is
  sufficient to prevent CPU reordering and just focus on fixing compiler
  ordering problems.  Whatever needs to change on the CPU-reordering end
  of things is a separate commit.
 
  I'm not arguing against that it should be a separate commit. Just that
  the proper memory barrier bit has to come first.
 
 I feel like you're speaking somewhat imprecisely in an area where very
 precise speech is needed to avoid confusion.   If you're saying that
 you think we should fix the S_UNLOCK() implementations that fail to
 prevent CPU-ordering before we change all the S_UNLOCK()
 implementations to prevent compiler-reordering, then that is fine with
 me;

Yes, that's what I think is needed.

 Do you want to propose a patch that does *only* that first part before
 we go further here?  Do you want me to try to put together such a
 patch based on the details mentioned on this thread?

I'm fine with either - we're both going to be flying pretty blind :/.

I think the way S_UNLOCK's release memory barrier semantics are fixed
might influence the way the compiler barrier issue can be solved. Fixing
the release semantics will involve going through all tas()
implementations and see whether the default release semantics are
ok. The ones with broken semantics will need to grow their own
S_UNLOCK. I am wondering if that commit shouldn't just remove the
default S_UNLOCK and move it to the tas() implementation sites.



Please don't forget that I started this thread because I found the
current implementation lacking: s_lock has neither sane memory-release
nor compiler-release semantics... So it's not surprising that I
want to solve both :)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2014-06-30 Thread Kevin Grittner
David Fetter da...@fetter.org wrote:

 It's missing a few pieces like surfacing transition table names. 
 I'll work on those.  Also, it's not clear to me how to access the
 pre- and post- relations at the same time, this being necessary
 for many use cases.  I guess I need to think more about how that
 would be done.

If you're going to do any work on the C extension, please start
from the attached.  I renamed it to something which seemed more
meaningful (to me, at least), and cleaned up some cosmetic issues. 
The substance is the same.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/contrib/Makefile b/contrib/Makefile
index b37d0dd..bcd7c28 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -48,6 +48,7 @@ SUBDIRS = \
 		postgres_fdw	\
 		seg		\
 		spi		\
+		transition_tables \
 		tablefunc	\
 		tcn		\
 		test_decoding	\
diff --git a/contrib/transition_tables/.gitignore b/contrib/transition_tables/.gitignore
new file mode 100644
index 000..5dcb3ff
--- /dev/null
+++ b/contrib/transition_tables/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/transition_tables/Makefile b/contrib/transition_tables/Makefile
new file mode 100644
index 000..8a75350
--- /dev/null
+++ b/contrib/transition_tables/Makefile
@@ -0,0 +1,20 @@
+# contrib/transition_tables/Makefile
+
+MODULE_big = transition_tables
+OBJS = transition_tables.o
+
+EXTENSION = transition_tables
+DATA = transition_tables--1.0.sql
+
+REGRESS = transition_tables
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/transition_tables
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/transition_tables/expected/transition_tables.out b/contrib/transition_tables/expected/transition_tables.out
new file mode 100644
index 000..7dfed35
--- /dev/null
+++ b/contrib/transition_tables/expected/transition_tables.out
@@ -0,0 +1,18 @@
+CREATE EXTENSION transition_tables;
+CREATE TABLE IF NOT EXISTS e(
+i INT
+);
+CREATE TRIGGER statement_dml_e
+AFTER INSERT OR UPDATE OR DELETE ON e
+REFERENCING
+OLD TABLE AS old_e
+NEW TABLE AS new_e
+FOR EACH STATEMENT
+EXECUTE PROCEDURE transition_tables();
+INSERT INTO e(i)
+SELECT * FROM generate_series(1,10000);
+NOTICE:  Total change: 50005000
+UPDATE e SET i=i+1;
+NOTICE:  Total change: 10000
+DELETE FROM e WHERE i < 5000;
+NOTICE:  Total change: -12497499
diff --git a/contrib/transition_tables/sql/transition_tables.sql b/contrib/transition_tables/sql/transition_tables.sql
new file mode 100644
index 000..04f8bd7
--- /dev/null
+++ b/contrib/transition_tables/sql/transition_tables.sql
@@ -0,0 +1,20 @@
+CREATE EXTENSION transition_tables;
+
+CREATE TABLE IF NOT EXISTS e(
+i INT
+);
+
+CREATE TRIGGER statement_dml_e
+AFTER INSERT OR UPDATE OR DELETE ON e
+REFERENCING
+OLD TABLE AS old_e
+NEW TABLE AS new_e
+FOR EACH STATEMENT
+EXECUTE PROCEDURE transition_tables();
+
+INSERT INTO e(i)
+SELECT * FROM generate_series(1,10000);
+
+UPDATE e SET i=i+1;
+
+DELETE FROM e WHERE i < 5000;
diff --git a/contrib/transition_tables/transition_tables--1.0.sql b/contrib/transition_tables/transition_tables--1.0.sql
new file mode 100644
index 000..6c1625e
--- /dev/null
+++ b/contrib/transition_tables/transition_tables--1.0.sql
@@ -0,0 +1,10 @@
+/* contrib/transition_tables--1.0.sql */
+
+
+-- Complain if script is sourced in psql, rather than via CREATE EXTENSION.
+\echo Use "CREATE EXTENSION transition_tables" to load this file. \quit
+
+CREATE FUNCTION transition_tables()
+RETURNS pg_catalog.trigger
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
diff --git a/contrib/transition_tables/transition_tables.c b/contrib/transition_tables/transition_tables.c
new file mode 100644
index 000..2b9264c
--- /dev/null
+++ b/contrib/transition_tables/transition_tables.c
@@ -0,0 +1,144 @@
+/*-
+ *
+ * transition_tables.c
+ *		sample of accessing trigger transition tables from a C trigger
+ *
+ * Portions Copyright (c) 2011-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * 		contrib/transition_tables.c
+ *
+ *-
+ */
+
+#include postgres.h
+
+#include commands/trigger.h
+#include utils/rel.h
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(transition_tables);
+
+Datum
+transition_tables(PG_FUNCTION_ARGS)
+{
+	TriggerData		*trigdata = (TriggerData *) fcinfo->context;
+	TupleDesc		tupdesc;
+	TupleTableSlot	*slot;
+	Tuplestorestate	*new_tuplestore;
+	Tuplestorestate	*old_tuplestore;
+	int64			delta = 0;
+
+	if (!CALLED_AS_TRIGGER(fcinfo))
+		ereport(ERROR,
+

[HACKERS] Does changing attribute order breaks something?

2014-06-30 Thread geohas


Hi!

Are there any known issues when changing the attnums?

like :

airport=# update pg_attribute set attnum = 3 where attname = 'abc' and
attrelid = 18328;
UPDATE 1
airport=# update pg_attribute set attnum = 2 where attname = 'foo' and
attrelid = 18328;
UPDATE 1

Does something break, or are there other ways to change the order of
columns?

regards

geohas



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-06-30 Thread Robert Haas
On Mon, Jun 30, 2014 at 12:20 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-06-30 11:38:31 -0400, Robert Haas wrote:
 On Mon, Jun 30, 2014 at 11:17 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  Now, we want to make these
  operations compiler fences as well, and AIUI your proposal for that is
  to make NEW_S_UNLOCK(lock) = out of line function call + S_LOCK(dummy)
  + S_UNLOCK(dummy) + S_UNLOCK(lock).
 
  My proposal was to use barrier.h provided barriers as long as it
  provides a native implementation. If neither a compiler nor a memory
  barrier is provided my proposal was to fall back to the memory barrier
  emulation we already have, but move it out of line (to avoid reordering
  issues). The reason for doing so is that the release *has* to be a
  release barrier.

 What do you mean by the memory barrier emulation we already have?
 The only memory barrier emulation we have, to my knowledge, is a
 spinlock acquire-release cycle.

 Yes.

 Unrelated, but why haven't we defined it as if (TAS(&dummy))
 S_UNLOCK(&dummy);? That's just about as strong and less of a performance
 drain?
 Hm, I guess platforms that do an unlocked test first would be
 problematic :/

Yes; also, there's no requirement for S_LOCK() to be defined in terms
of TAS(), and thus no requirement for TAS() to exist at all.

 But trying to use a spinlock
 acquire-release to shore up problems with the spinlock release
 implementation makes my head explode.

 Well, it actually makes some sense. Nearly any TAS() implementation is
 going to have some memory barrier semantics - so using a TAS() as
 fallback makes sense. That's why we're relying on it for use in memory
 barrier emulation after all.

As far as I know, all of our TAS() implementations prevent CPU
reordering in the acquire direction.  It is not clear that they
provide CPU-reordering guarantees adequate for the release direction,
even when paired with whatever S_UNLOCK() implementation they're mated
with.  And it's quite clear that many of them aren't adequate to
prevent compiler-reordering in any case.

 Also note that barrier.h assumes that a S_LOCK/S_UNLOCK acts as a
 compiler barrier - which really isn't guaranteed by the current contract
 of s_lock.h. Although the tas() implementations look safe.

Ugh, you're right.  That should really be moved out-of-line.

  On the other hand, the API contract change making
  these into compiler barriers is a master-only change.  I think for
  purposes of this patch we should assume that the existing code is
  sufficient to prevent CPU reordering and just focus on fixing compiler
  ordering problems.  Whatever needs to change on the CPU-reordering end
  of things is a separate commit.
 
  I'm not arguing against that it should be a separate commit. Just that
  the proper memory barrier bit has to come first.

 I feel like you're speaking somewhat imprecisely in an area where very
 precise speech is needed to avoid confusion.   If you're saying that
 you think we should fix the S_UNLOCK() implementations that fail to
 prevent CPU-ordering before we change all the S_UNLOCK()
 implementations to prevent compiler-reordering, then that is fine with
 me;

 Yes, that's what I think is needed.

OK, let's do it that way then.

 Do you want to propose a patch that does *only* that first part before
 we go further here?  Do you want me to try to put together such a
 patch based on the details mentioned on this thread?

 I'm fine with either - we're both going to be flying pretty blind :/.

 I think the way S_UNLOCK's release memory barrier semantics are fixed
 might influence the way the compiler barrier issue can be solved.

I think I'm starting to understand the terminological confusion: to
me, a memory barrier means a fence against both the compiler and the
CPU.  I now think you're using it to mean a fence against the CPU, as
distinct from a fence against the compiler.  That had me really
confused in some previous replies.

 Fixing
 the release semantics will involve going through all tas()
 implementations and see whether the default release semantics are
 ok. The ones with broken semantics will need to grow their own
 S_UNLOCK. I am wondering if that commit shouldn't just remove the
 default S_UNLOCK and move it to the tas() implementation sites.

So, I think that here you are talking about CPU behavior rather than
compiler behavior.  Next paragraph is on that basis.

The implementations that don't currently have their own S_UNLOCK() are
i386, x86_64, Itanium, ARM without GCC atomics, S/390 zSeries, Sparc,
Linux m68k, VAX, m32r, SuperH, non-Linux m68k, Univel CC, HP/UX
non-GCC, Sun Studio, and WIN32_ONLY_COMPILER.  Because most of those
are older platforms, I'm betting that more of them than not are OK; I
think we should confine ourselves to trying to fix the ones we're sure
are wrong, which if I understand you correctly are ARM without GCC
atomics, Sparc, and MIPS.  I think it'd be better to just add copies
of S_UNLOCK to those three rather than duplicate the definition in
the other 12 places.

Re: [HACKERS] better atomics - v0.5

2014-06-30 Thread Andres Freund
On 2014-06-30 12:15:06 -0400, Robert Haas wrote:
 More broadly, it doesn't seem consistent.  I think that other projects
 also sometimes write code that acquires a spinlock while holding
 another spinlock, and we don't do that; in fact, we've elevated that
 to a principle, regardless of whether it performs well in practice.

It's actually more than a principle: it's now required for correctness,
because otherwise the semaphore-based spinlock implementation will
deadlock.

 In some cases, the CPU instruction that we issue to acquire that
 spinlock might be the exact same instruction we'd issue to manipulate
 an atomic variable.  I don't see how it can be right to say that a
 spinlock-protected critical section is allowed to manipulate an atomic
 variable with cmpxchg, but not allowed to acquire another spinlock
 with cmpxchg.  Either acquiring the second spinlock is OK, in which
 case our current coding rule is wrong, or acquiring the atomic
 variable is not OK, either.  I don't see how we can have that both
 ways.

The no-nested-spinlocks thing used to be a guideline, one nobody had big
problems with because it didn't prohibit solutions to problems. Since
guidelines are more useful when simple, it's "no nested spinlocks!".

Also those two cmpxchg's aren't the same:

The CAS for spinlock acquisition will only succeed if the lock isn't
already held. If the lock holder sleeps you'll have to wait for it to
wake up.

In contrast, CASs for lock-free (or lock-reduced) algorithms won't
be blocked if the last manipulation was done by a backend that's now
sleeping. It'll possibly loop once, get the new value, and retry the
CAS. Yes, it can still take a couple of iterations under heavy
concurrency, but you don't have the problem of the lock holder going to
sleep.
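
(A hedged illustration of the contrast, using C11 atomics rather than
PostgreSQL's own primitives:)

#include <stdatomic.h>

/* Spinlock acquisition: no progress until the holder releases */
static void
spin_acquire(atomic_int *lock)
{
	int			expected = 0;

	while (!atomic_compare_exchange_weak(lock, &expected, 1))
		expected = 0;			/* holder asleep? we just keep spinning */
}

/* Lock-free update: a failed CAS reloads the value and retries at once */
static void
lockfree_add(atomic_int *counter, int delta)
{
	int			old = atomic_load(counter);

	while (!atomic_compare_exchange_weak(counter, &old, old + delta))
		;						/* 'old' was refreshed by the failed CAS */
}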

Now, you can argue that the spinlock-based fallbacks make that
difference moot, but I think that's just the price for an emulation that
fringe platforms will have to pay. They aren't platforms used under
heavy concurrency.

  What I'm basically afraid of is that this will work fine in many cases
  but have really ugly failure modes.  That's pretty much what happens
  with spinlocks already - the overhead is insignificant at low levels
  of contention but as the spinlock starts to become contended the CPUs
  all fight over the cache line and performance goes to pot.  ISTM that
  making the spinlock critical section significantly longer by putting
  atomic ops in there could appear to win in some cases while being very
  bad in others.
 
  Well, I'm not saying it's something I suggest doing all the time. But if
  using an atomic op in the slow path allows you to remove the spinlock
  from 99% of the cases I don't see it having a significant impact.
  In most scenarios (where atomics aren't emulated, i.e. platforms we
  expect to used in heavily concurrent cases) the spinlock and the atomic
  will be on the same cacheline making stalls much less likely.
 
 And this again is my point: why can't we make the same argument about
 two spinlocks situated on the same cache line?

Because it's not relevant? Where and why do we want to acquire two
spinlocks that are on the same cacheline?
As far as I know there's never been an actual need for nested
spinlocks. So the guideline can be as restrictive as it is because
the price is low and there's no benefit in making it more complex.


I think this line of discussion is pretty backwards. The reason I (and
seemingly Heikki) want to use atomics is that it can reduce contention
significantly. That contention is today in good part based on
spinlocks.  Your argument is basically that doing things inside the
spinlock that can take more than a fixed number of cycles can increase
contention. But the changes we've talked about only make sense if they
significantly reduce spinlock contention in the first place - the
potential for a couple of iterations while holding the lock had better
not be relevant for proposed patches.

I don't think a blanket rule makes sense here.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] better atomics - v0.5

2014-06-30 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 ... this again is my point: why can't we make the same argument about
 two spinlocks situated on the same cache line?  I don't have a bit of
 trouble believing that doing the same thing with a couple of spinlocks
 could sometimes work out well, too, but Tom is adamantly opposed to
 that.

I think you might be overstating my position here.  What I'm concerned
about is that we be sure that spinlocks are held for a sufficiently short
time that it's very unlikely that we get pre-empted while holding one.
I don't have any particular bright line about how short a time that is,
but more than a few instructions worries me.  As you say, the Linux
kernel is a bad example to follow because it hasn't got a risk of losing
its timeslice while holding a spinlock.

The existing coding rules discourage looping (though I might be okay
with a small constant loop count), and subroutine calls (mainly because
somebody might add $random_amount_of_work to the subroutine if they
don't realize it can be called while holding a spinlock).  Both of these
rules are meant to reduce the risk that a short interval balloons
into a long one due to unrelated coding changes.

The existing coding rules also discourage spinlocking within a spinlock,
and the reason for that is that there's no very clear upper bound to the
time required to obtain a spinlock, so that there would also be no clear
upper bound to the time you're holding the original one (thus possibly
leading to cascading performance problems).

So ISTM the question we ought to be asking is whether atomic operations
have bounded execution time, or more generally what the distribution of
execution times is likely to be.  I'd be OK with an answer that includes
"sometimes it can be long" so long as "sometimes" is pretty damn seldom.
Spinlocks have a nonzero risk of taking a long time already, since we
can't entirely prevent the possibility of losing our timeslice while
holding one.  The issue here is just to be sure that that happens seldom
enough that it doesn't cause performance problems.  If we fail to do that
we might negate all the desired performance improvements from adopting
atomic ops in the first place.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Better partial index-only scans

2014-06-30 Thread Tom Lane
Abhijit Menon-Sen a...@2ndquadrant.com writes:
 I don't think it's fair to mark this as returned with feedback without a
 more detailed review (I think of returned with feedback as providing a
 concrete direction to follow). I've set it back to needs review.

 Does anyone else want to look at this patch?

I offered a few comments.  I think it might also be useful to try to
quantify the worst-case performance cost for this, which would arise
for a single-table query on a table with lots of indexes.  Don't
have time for that myself though.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] better atomics - v0.5

2014-06-30 Thread Robert Haas
On Mon, Jun 30, 2014 at 12:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The existing coding rules also discourage spinlocking within a spinlock,
 and the reason for that is that there's no very clear upper bound to the
 time required to obtain a spinlock, so that there would also be no clear
 upper bound to the time you're holding the original one (thus possibly
 leading to cascading performance problems).

Well, as Andres points out, commit
daa7527afc2274432094ebe7ceb03aa41f916607 made that more of an ironclad
requirement than a suggestion.  If it's sometimes OK to acquire a
spinlock from within a spinlock, then that commit was badly misguided;
but the design of that patch was pretty much driven by your insistence
that that was never, ever OK.  If that was wrong, then we should
reconsider that whole commit more broadly; but if it was right, then
the atomics patch shouldn't drill a hole through it to install an
exception just for emulating atomics.

I'm personally not convinced that we're approaching this topic in the
right way.  I'm not convinced that it's at all reasonable to try to
emulate atomics on platforms that don't have them.  I would punt the
problem into the next layer and force things like lwlock.c to have
fallback implementations at that level that don't require atomics, and
remove those fallback implementations if and when we move the
goalposts so that all supported platforms must have working atomics
implementations.  People who write code that uses atomics are not
likely to think about how those algorithms will actually perform when
those atomics are merely emulated, and I suspect that means that in
practice platforms that have only emulated atomics are going to
regress significantly vs. the status quo today.  If this patch goes in
the way it's written right now, then I think it basically amounts to
throwing platforms that don't have working atomics emulation under the
bus, and my vote is that if we're going to do that, we should do it
explicitly: "Sorry, we now only support platforms with working atomics.
You don't have any, so you can't run PostgreSQL. Have a nice day."

If we *don't* want to make that decision, then I'm not convinced that
the approach this patch takes is in any way viable, completely aside
from this particular question.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous Transaction (WIP)

2014-06-30 Thread Pavel Stehule
2014-06-30 12:38 GMT+02:00 Abhijit Menon-Sen a...@2ndquadrant.com:

 If I understand correctly, the design of this patch has already been
 considered earlier and rejected. So I guess the patch should also be
 marked rejected?


I didn't find a related message.

?

Regards

Pavel



 -- Abhijit



Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-06-30 Thread Andres Freund
On 2014-06-30 12:46:29 -0400, Robert Haas wrote:
  But trying to use a spinlock
  acquire-release to shore up problems with the spinlock release
  implementation makes my head explode.
 
  Well, it actually makes some sense. Nearly any TAS() implementation is
  going to have some memory barrier semantics - so using a TAS() as
  fallback makes sense. That's why we're relying on it for use in memory
  barrier emulation after all.
 
 As far as I know, all of our TAS() implementations prevent CPU
 reordering in the acquire direction.  It is not clear that they
 provide CPU-reordering guarantees adequate for the release direction,
 even when paired with whatever S_UNLOCK() implementation they're mated
 with.


 And it's quite clear that many of them aren't adequate to prevent
 compiler-reordering in any case.

I actually don't think there currently are tas() implementations that
aren't compiler barriers? Maybe UNIVEL/unixware. A bit of luck.

  Also note that barrier.h assumes that a S_LOCK/S_UNLOCK acts as a
  compiler barrier - which really isn't guaranteed by the current contract
  of s_lock.h. Although the tas() implementations look safe.
 
 Ugh, you're right.  That should really be moved out-of-line.

Good. Then we already lose the compile-time interdependency from
barrier.h to s_lock.h - although the fallback will have a runtime
dependency.

  Do you want to propose a patch that does *only* that first part before
  we go further here?  Do you want me to try to put together such a
  patch based on the details mentioned on this thread?
 
  I'm fine with either - we're both going to flying pretty blind :/.
 
  I think the way S_UNLOCK's release memory barrier semantics are fixed
  might influence the way the compiler barrier issue can be solved.
 
 I think I'm starting to understand the terminological confusion: to
 me, a memory barrier means a fence against both the compiler and the
 CPU.  I now think you're using it to mean a fence against the CPU, as
 distinct from a fence against the compiler.  That had me really
 confused in some previous replies.

Oh. Let's henceforth define 'fence' as the cache coherency thingy and
read/write/release/acquire/memory barrier as the combination?

  Fixing
  the release semantics will involve going through all tas()
  implementations and see whether the default release semantics are
  ok. The ones with broken semantics will need to grow their own
  S_UNLOCK. I am wondering if that commit shouldn't just remove the
  default S_UNLOCK and move it to the tas() implementation sites.
 
 So, I think that here you are talking about CPU behavior rather than
 compiler behavior.  Next paragraph is on that basis.

Yes, I am.

 The implementations that don't currently have their own S_UNLOCK() are
 i386
 x86_64

TSO, thus fine.

 Itanium

As a special case volatile stores emit release/acquire fences unless
specific compiler flags are used. Thus safe.

 ARM without GCC atomics

Borked.

 S/390 zSeries

Strongly ordered.

 Sparc
 Sun Studio

Borked. At least in some setups.

 Linux m68k

At least Linux doesn't support SMP for m68k, so cache coherency
shouldn't be a problem.

 VAX

I don't really know, but I don't care. The NetBSD people's statements about
VAX SMP support didn't increase my concern for VAX SMP.

 m32r

No idea.

 SuperH

Not officially supported (as it's not in installation.sgml), don't care :)

 non-Linux m68k

Couldn't figure out if anybody supports SMP here. Doesn't look like it.

 Univel CC

Don't care.

 HP/UX non-GCC

Itanium's volatile semantics should work.

 and WIN32_ONLY_COMPILER

Should be fine due to TSO on x86 and itanic volatiles.

 Because most of those
 are older platforms, I'm betting that more of them than not are OK; I
 think we should confine ourselves to trying to fix the ones we're sure
 are wrong

Sounds sane.

, which if I understand you correctly are ARM without GCC
 atomics, Sparc, and MIPS.

I have to revise my statement on MIPS, it actually looks safe. I seem to
have missed that it has its own S_UNLOCK.

Do I see it correctly that !(defined(__mips__) && !defined(__sgi)) isn't
supported?

 I think it'd be better to just add copies
 of S_UNLOCK to those three rather than duplicate the definition in
 the other 12 places.

Ok.
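
To illustrate, a sketch of what a per-platform release-correct S_UNLOCK
could look like where the compiler provides the __atomic builtins (the
older compilers at issue will instead need hand-written asm, like the
swpb-based version later in this thread):

#define S_UNLOCK(lock)	__atomic_store_n((lock), 0, __ATOMIC_RELEASE)

That is, a plain store with release semantics, so writes inside the
critical section can't be reordered past the unlocking store.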

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Does changing attribute order breaks something?

2014-06-30 Thread Tom Lane
geohas li...@hasibether.at writes:
 Are there any known issues when changing the attnums?

 airport=# update pg_attribute set attnum = 3 where attname = 'abc' and
 attrelid = 18328;
 UPDATE 1
 airport=# update pg_attribute set attnum = 2 where attname = 'foo' and
 attrelid = 18328;
 UPDATE 1

I assume you quickly found out that that totally broke your table,
so why are you asking?

regards, tom lane




Re: [HACKERS] better atomics - v0.5

2014-06-30 Thread Andres Freund
On 2014-06-30 13:15:23 -0400, Robert Haas wrote:
 On Mon, Jun 30, 2014 at 12:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  The existing coding rules also discourage spinlocking within a spinlock,
  and the reason for that is that there's no very clear upper bound to the
  time required to obtain a spinlock, so that there would also be no clear
  upper bound to the time you're holding the original one (thus possibly
  leading to cascading performance problems).
 
 Well, as Andres points out, commit
 daa7527afc2274432094ebe7ceb03aa41f916607 made that more of an ironclad
 requirement than a suggestion.  If it's sometimes OK to acquire a
 spinlock from within a spinlock, then that commit was badly misguided;
 but the design of that patch was pretty much driven by your insistence
 that that was never, ever OK.  If that was wrong, then we should
 reconsider that whole commit more broadly; but if it was right, then
 the atomics patch shouldn't drill a hole through it to install an
 exception just for emulating atomics.

Well, since nobody has a problem with the rule of not having nested
spinlocks, and since the problems aren't the same, I'm failing to see why
we should reconsider this.

My recollection was that we primarily discussed whether it'd be a good
idea to add code to Assert() when spinlocks are nested to which Tom
responded that it's not necessary because critical sections should be
short enough to immediately see that that's not a problem. Then we
argued a bit back and forth around that.

 I'm personally not convinced that we're approaching this topic in the
 right way.  I'm not convinced that it's at all reasonable to try to
 emulate atomics on platforms that don't have them.  I would punt the
 problem into the next layer and force things like lwlock.c to have
 fallback implementations at that level that don't require atomics, and
 remove those fallback implementations if and when we move the
 goalposts so that all supported platforms must have working atomics
 implementations.

If we're requiring an atomics-less implementation for lwlock.c,
bufmgr.c et al., I'll stop working on those features (and by extension on
atomics). It's an utter waste of resources and maintainability to try to
maintain parallel high-level locking algorithms in several pieces of the
codebase. The nonatomic versions will stagnate and won't actually work
under load. I'd rather spend my time on something useful. The
spinlock-based atomics emulation is *small*. It's easy to reason about
correctness there.
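
For concreteness, a minimal sketch of what the spinlock-backed emulation
amounts to - the naming loosely follows the patch, but the struct layout
here is assumed rather than quoted from it:

typedef struct pg_atomic_uint32
{
	slock_t		mutex;			/* protects value where no real atomics */
	volatile uint32 value;
} pg_atomic_uint32;

static uint32
pg_atomic_fetch_add_u32_emu(pg_atomic_uint32 *ptr, int32 add_)
{
	uint32		old;

	SpinLockAcquire(&ptr->mutex);
	old = ptr->value;
	ptr->value += add_;
	SpinLockRelease(&ptr->mutex);

	return old;
}

A handful of wrappers of this shape is the whole fallback, which is why
auditing it for correctness is easy.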

If we go that way, we've let a couple of absolutely fringe
platforms hold postgres, as actually used in reality, hostage. I think
that's insane.

 People who write code that uses atomics are not
 likely to think about how those algorithms will actually perform when
 those atomics are merely emulated, and I suspect that means that in
 practice platforms that have only emulated atomics are going to
 regress significantly vs. the status quo today.

I think you're overstating the likely performance penalty for
nonparallel platforms/workloads here quite a bit. The platforms without
a chance of getting atomics implemented aren't ones where that's a
likely thing. Yes, you'll see a regression if you run a readonly pgbench
over a 4 node NUMA system - but it's not large. You won't see much
improved performance in comparison to 9.4, but I think that's primarily
it.

Also it's not like this is something new - the semaphore based fallback
*sucks* but we still have it.

*Many* features in new major versions regress performance for fringe
platforms. That's normal.

 If this patch goes in
 the way it's written right now, then I think it basically amounts to
 throwing platforms that don't have working atomics emulation under the
 bus

Why? Nobody is going to run 9.5 under high performance workloads on such
platforms. They'll be so IO and RAM starved that the spinlocks won't be
the bottleneck.

, and my vote is that if we're going to do that, we should do it
 explicitly: "sorry, we now only support platforms with working atomics.
 You don't have any, so you can't run PostgreSQL.  Have a nice day."

I'd be fine with that. I don't think we're gaining much from those
platforms.
*But* I don't really see why it's required at this point. The fallback
works, and unless you're using semaphore-based spinlocks the performance
isn't bad.
The lwlock code doesn't really get slower/faster in comparison to before
when the atomics implementation is backed by semaphores. It's pretty
much the same number of syscalls whether you use the atomics
implementation or the current one.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-30 Thread Alvaro Herrera
Andres Freund wrote:

 c) We already allow setting pretty much all aspects of the control file
    via resetxlog - there seems little point in not having the ability to
    change the system identifier.

I think it's pretty much a given that pg_resetxlog is a tool that can
have disastrous effects if used lightly.  If people change their sysid
wrongly, they're no worse off than if they change their multixact
counters and start getting failures because the old values stored in
data cannot be resolved anymore (the counter has already wrapped around).
Or if they remove all the XLOG they have since the latest crash.  From
that POV, I don't think the objection that "but this can be used to
corrupt data!" has any value.

Also on the other hand pg_resetxlog is already a tool for PG hackers to
fool around and test things.  In a way, being able to change values in
pg_control is useful to many of us; this is only an extension of that.

If we only had bricks and mortar, I think we would have a tool to
display and tweak pg_control separately from emptying pg_xlog, rather
than this odd separation between pg_controldata and pg_resetxlog, each
of which does a mixture of those things.  But we have a wall two-thirds
done already, so it seems to make more sense to me to continue forward
rather than tear it down and start afresh.  This patch is a natural
extension of what we already have.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] better atomics - v0.5

2014-06-30 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I'm personally not convinced that we're approaching this topic in the
 right way.  I'm not convinced that it's at all reasonable to try to
 emulate atomics on platforms that don't have them.  I would punt the
 problem into the next layer and force things like lwlock.c to have
 fallback implementations at that level that don't require atomics, and
 remove those fallback implementations if and when we move the
 goalposts so that all supported platforms must have working atomics
 implementations.  People who write code that uses atomics are not
 likely to think about how those algorithms will actually perform when
 those atomics are merely emulated, and I suspect that means that in
 practice platforms that have only emulated atomics are going to
 regress significantly vs. the status quo today.

I think this is a valid objection, and I for one am not prepared to
say that we no longer care about platforms that don't have atomic ops
(especially not if it's not a *very small* set of atomic ops).

Also, just because a platform claims to have atomic ops doesn't mean that
those ops perform well.  If there's a kernel trap involved, they don't,
at least not for our purposes.  We're only going to be bothering with
installing atomic-op code in places that are contention bottlenecks
for us already, so we are not going to be happy with the results for any
atomic-op implementation that's not industrial strength.  This is one
reason why I'm extremely suspicious of depending on gcc's intrinsics for
this; that will not make the issue go away, only make it beyond our
power to control.

regards, tom lane




Re: [HACKERS] better atomics - v0.5

2014-06-30 Thread Andres Freund
On 2014-06-30 12:54:10 -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  ... this again is my point: why can't we make the same argument about
  two spinlocks situated on the same cache line?  I don't have a bit of
  trouble believing that doing the same thing with a couple of spinlocks
  could sometimes work out well, too, but Tom is adamantly opposed to
  that.
 
 I think you might be overstating my position here.  What I'm concerned
 about is that we be sure that spinlocks are held for a sufficiently short
 time that it's very unlikely that we get pre-empted while holding one.
 I don't have any particular bright line about how short a time that is,
 but more than a few instructions worries me.  As you say, the Linux
 kernel is a bad example to follow because it hasn't got a risk of losing
 its timeslice while holding a spinlock.

Well, they actually have preemptible and irq-interruptible versions for
some of these locks, but whatever :).

 So ISTM the question we ought to be asking is whether atomic operations
 have bounded execution time, or more generally what the distribution of
 execution times is likely to be.  I'd be OK with an answer that includes
 "sometimes it can be long" so long as "sometimes" is pretty damn
 seldom.

I think it ranges from 'never' to 'sometimes, but pretty darn
seldom'. Depending on the platform and emulation.

And, leaving the emulation and badly written code aside, there's no
issue with another backend sleeping while the atomic variable needs to
be modified.

 Spinlocks have a nonzero risk of taking a long time already, since we
 can't entirely prevent the possibility of losing our timeslice while
 holding one.  The issue here is just to be sure that that happens seldom
 enough that it doesn't cause performance problems.  If we fail to do that
 we might negate all the desired performance improvements from adopting
 atomic ops in the first place.

I think individual patches will have to stand up to a test like
this. Modifying an atomic variable inside a spinlock is only going to be
worth it if it allows avoiding spinlocks altogether 95%+ of the
time.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] better atomics - v0.5

2014-06-30 Thread Andres Freund
On 2014-06-30 19:44:47 +0200, Andres Freund wrote:
 On 2014-06-30 13:15:23 -0400, Robert Haas wrote:
  People who write code that uses atomics are not
  likely to think about how those algorithms will actually perform when
  those atomics are merely emulated, and I suspect that means that in
  practice platforms that have only emulated atomics are going to
  regress significantly vs. the status quo today.
 
 I think you're overstating the likely performance penalty for
 nonparallel platforms/workloads here quite a bit. The platforms without
 a chance of getting atomics implemented aren't ones where that's a
 likely thing. Yes, you'll see a regression if you run a readonly pgbench
 over a 4 node NUMA system - but it's not large. You won't see much
 improved performance in comparison to 9.4, but I think that's primarily
 it.

To quantify this, on my 2 socket xeon E5520 workstation - which is too
small to heavily show the contention problems in pgbench -S - the
numbers are:

pgbench -M prepared -c 16 -j 16 -T 10 -S (best of 5, noticeable variability)
master: 152354.294117
lwlocks-atomics: 170528.784958
lwlocks-atomics-spin: 159416.202578

pgbench -M prepared -c 1 -j 1 -T 10 -S (best of 5, noticeable variability)
master: 16273.702191
lwlocks-atomics: 16768.712433
lwlocks-atomics-spin: 16744.913083

So, there really isn't a problem with the emulation. It's not actually
that surprising - the absolute number of atomic ops is pretty much the
same. Where we earlier took a spinlock to increment shared state, we now
still take one.

I expect that repeating this on the 4 socket machine will show a large
gap between lwlocks-atomics and the other two. The other two will be
about the same, with lwlocks-atomics-spin remaining a bit better than
master.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Cluster name in ps output

2014-06-30 Thread Josh Berkus
Abhijit, all:

If we're adding log_line_prefix support for cluster_name (something I
think is a good idea), we need to also add it to CSVLOG.  So, where do
we put it in CSVLog?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Cluster name in ps output

2014-06-30 Thread Andres Freund
On 2014-06-30 12:10:53 -0700, Josh Berkus wrote:
 Abhijit, all:
 
 If we're adding log_line_prefix support for cluster_name (something I
 think is a good idea), we need to also add it to CSVLOG.  So, where do
 we put it in CSVLog?

That was shot down (unfortunately imo), and I don't think anybody is
actively working on it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PATCH: Allow empty targets in unaccent dictionary

2014-06-30 Thread Tom Lane
Abhijit Menon-Sen a...@2ndquadrant.com writes:
 I've attached a patch to contrib/unaccent as outlined in my review the
 other day.

I went to commit this, and while testing I realized that the current
implementation of unaccent_lexize is only capable of coping with src
strings that are single characters in the current encoding.  I'm not
sure exactly how complicated it would be to fix that, but it might
not be terribly difficult.  (Overlapping matches would be the main
complication, likely.)

Anyway, this raises the question of whether the current patch is
actually a desirable way to do things, or whether it would be better if
the unaccenting rules were like base-char accent-char -> base-char.
Presumably, that would require more rules, but it would prevent deletion
of a standalone accent-char, which might or might not be a good thing.
Also, if there are any contexts where the right translation of an
accent-char depends on the base-char, you couldn't do it with the
patch as it stands.  I don't know any languages that use separate
accent-chars, so I'm not qualified to opine on whether this is important.

It's not unlikely that we want this patch *and* an improvement that allows
multi-character src strings, but I held off committing for the moment
until somebody weighs in with an opinion about that.

In any case, the patch failed to update the user-facing docs
(unaccent.sgml) so we need some more work in that department.  The
user-facing docs are already pretty weak in that they fail to explain the
whitespace rules, much less clarify that the src must be exactly a single
character.

If we don't improve the code to allow multi-character src, I wonder
whether the rule-reading code shouldn't be improved to forbid such
cases.  I'm also wondering why it silently ignores malformed lines
instead of bleating.  But that's more or less orthogonal to this patch.

Lastly, I didn't especially like the coding details of either proposed
patch, and rewrote it as attached.  I didn't see a need to introduce
a state 5, and I also didn't agree with changing the initial values
of trg/trglen to be meaningful.  Right now, those variables are
initialized only to keep the compiler from bleating; the initial values
aren't actually used anywhere.  It didn't seem like a good idea for
the trgxxx variables to be different from the srcxxx variables in that
respect.

regards, tom lane

diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c
index a337df6..5a31f85 100644
*** a/contrib/unaccent/unaccent.c
--- b/contrib/unaccent/unaccent.c
*** initTrie(char *filename)
*** 104,114 ****
  
  			while ((line = tsearch_readline(trst)) != NULL)
  			{
! /*
!  * The format of each line must be "src trg" where src and trg
!  * are sequences of one or more non-whitespace characters,
!  * separated by whitespace.  Whitespace at start or end of
!  * line is ignored.
   */
  int			state;
  char	   *ptr;
--- 104,124 ----
  
  			while ((line = tsearch_readline(trst)) != NULL)
  			{
! /*----------
!  * The format of each line must be "src" or "src trg", where
!  * src and trg are sequences of one or more non-whitespace
!  * characters, separated by whitespace.  Whitespace at start
!  * or end of line is ignored.  If trg is omitted, an empty
!  * string is used as the replacement.
!  *
!  * We use a simple state machine, with states
!  *	0	initial (before src)
!  *	1	in src
!  *	2	in whitespace after src
!  *	3	in trg
!  *	4	in whitespace after trg
!  *	-1	syntax error detected (line will be ignored)
!  *----------
   */
  int			state;
  char	   *ptr;
*** initTrie(char *filename)
*** 160,166 
  	}
  }
  
! if (state >= 3)
  	rootTrie = placeChar(rootTrie,
  		 (unsigned char *) src, srclen,
  		 trg, trglen);
--- 170,183 
  	}
  }
  
! if (state == 1 || state == 2)
! {
! 	/* trg was omitted, so use "" */
! 	trg = "";
! 	trglen = 0;
! }
! 
! if (state > 0)
  	rootTrie = placeChar(rootTrie,
  		 (unsigned char *) src, srclen,
  		 trg, trglen);
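
For concreteness, a hypothetical rules file using the extended syntax -
a two-field line replaces src with trg, a one-field line (the new case)
replaces src with the empty string:

À	A
Á	A
´

Here the last line deletes a standalone acute accent character.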



Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C

2014-06-30 Thread Christian Ullrich
* From: Noah Misch [mailto:n...@leadboat.com]

 I liked the proposal here; was there a problem with it?
 http://www.postgresql.org/message-
 id/ca+tgmoz3ake4enctmqmzsykc_0pjl_u4c_x47ge48uy1upb...@mail.gmail.com

You're referring to the suggestion of accepting and ignoring the option on
non-Windows, right? I can do that, I just don't see the point as long as
pg_ctl has a separate code path (well, #ifdef) for Windows anyway.

 The pg_upgrade test suite and the $(prove_check)-based test suites rely on
 their pg_ctl-started postmasters receiving any console ^C.  pg_ctl
 deserves a --foreground or --no-background option for callers that prefer
 the current behavior.  That, or those tests need a new way to launch the
 postmaster.

Ah. More good news. Just to confirm, this is only about the tests, right,
not the code they are testing?

If so, is there even a way to run either on Windows? The pg_upgrade test
suite is a shell script, and prove_check is defined in Makefile.global and 
definitely not applicable to Windows.

-- 
Christian




Re: [HACKERS] PATCH: Allow empty targets in unaccent dictionary

2014-06-30 Thread Abhijit Menon-Sen
At 2014-06-30 15:19:17 -0400, t...@sss.pgh.pa.us wrote:

 Anyway, this raises the question of whether the current patch is
 actually a desirable way to do things, or whether it would be better
 if the unaccenting rules were like base-char accent-char -
 base-char.

It might be useful to be able to write such rules, but it would be
highly impractical to do so instead of being able to single out
accent-chars for removal.

In all the languages I'm familiar with that use such accent-chars, any
accent-char would form a valid combination with nearly every base-char,
unlike European languages where you don't have to worry about k-umlaut,
say. Also, a standalone accent-char would always be meaningless.

(These accent-chars don't actually exist independently in the syllabary
that a Hindi speaker might learn in school: they're combining forms of
vowels and are treated differently from characters in practice.)

 Also, if there are any contexts where the right translation of an
 accent-char depends on the base-char, you couldn't do it with the
 patch as it stands.

I can't think of a satisfactory example at the moment, but that sounds
entirely plausible.

 It's not unlikely that we want this patch *and* an improvement that
 allows multi-character src strings

I think it's enough to apply just this patch, but I wouldn't object to
doing both if it were easy. It's not clear to me if that's true after a
quick glance at the code, but I'll look again when I'm properly awake.

 Lastly, I didn't especially like the coding details of either proposed
 patch, and rewrote it as attached.

:-)

-- Abhijit




Re: [HACKERS] better atomics - v0.5

2014-06-30 Thread Andres Freund
On 2014-06-30 13:45:52 -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  I'm personally not convinced that we're approaching this topic in the
  right way.  I'm not convinced that it's at all reasonable to try to
  emulate atomics on platforms that don't have them.  I would punt the
  problem into the next layer and force things like lwlock.c to have
  fallback implementations at that level that don't require atomics, and
  remove those fallback implementations if and when we move the
  goalposts so that all supported platforms must have working atomics
  implementations.  People who write code that uses atomics are not
  likely to think about how those algorithms will actually perform when
  those atomics are merely emulated, and I suspect that means that in
  practice platforms that have only emulated atomics are going to
  regress significantly vs. the status quo today.
 
 I think this is a valid objection, and I for one am not prepared to
 say that we no longer care about platforms that don't have atomic ops
 (especially not if it's not a *very small* set of atomic ops).

It's still TAS(), cmpxchg(), xadd. With TAS/xadd possibly implemented
via cmpxchg (which quite possibly *is* the native implementation for
$arch).
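
For illustration, the standard way xadd falls out of cmpxchg - a sketch
using a GCC __sync builtin, not code from the patch:

static inline uint32
xadd_via_cmpxchg(volatile uint32 *ptr, uint32 add_)
{
	uint32		old;

	do
	{
		old = *ptr;
	} while (!__sync_bool_compare_and_swap(ptr, old, old + add_));

	return old;					/* value before the add, as xadd does */
}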

 Also, just because a platform claims to have atomic ops doesn't mean that
 those ops perform well.  If there's a kernel trap involved, they don't,
 at least not for our purposes.

Yea. I think if we start to use 8byte atomics at some later point we're
going to have to prohibit e.g. older arms from using them, even if the
compiler claims they're supported. But that's just one #define.

 This is one
 reason why I'm extremely suspicious of depending on gcc's intrinsics for
 this; that will not make the issue go away, only make it beyond our
 power to control.

The patch as submitted makes it easy to provide a platform specific
implementation of the important routines if it's likely that it's better
than the intrinsics provided one.

I'm not too worried about the intrinsics though - the number of projects
relying on them these days is quite large.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C

2014-06-30 Thread Noah Misch
On Mon, Jun 30, 2014 at 07:28:03PM +, Christian Ullrich wrote:
 * From: Noah Misch [mailto:n...@leadboat.com]
 
  I liked the proposal here; was there a problem with it?
  http://www.postgresql.org/message-
  id/ca+tgmoz3ake4enctmqmzsykc_0pjl_u4c_x47ge48uy1upb...@mail.gmail.com
 
 You're referring to the suggestion of accepting and ignoring the option on
 non-Windows, right? I can do that, I just don't see the point as long as
 pg_ctl has a separate code path (well, #ifdef) for Windows anyway.

Yes.  We normally recognize platform-specific options on every platform.  For
example, the effective_io_concurrency GUC exists on all platforms, but you can
only change it on platforms where it matters.  In that light, one could argue
for raising an error for --background on non-Windows systems.  I don't have a
strong opinion on raising an error vs. ignoring the option, but I do think the
outcome of --background should be distinguishable from the outcome of
--sometypo on all platforms.

  The pg_upgrade test suite and the $(prove_check)-based test suites rely on
  their pg_ctl-started postmasters receiving any console ^C.  pg_ctl
  deserves a --foreground or --no-background option for callers that prefer
  the current behavior.  That, or those tests need a new way to launch the
  postmaster.
 
 Ah. More good news. Just to confirm, this is only about the tests, right,
 not the code they are testing?

Yes; the consequence of ignoring ^C is that the test postmaster would persist
indefinitely after the ^C.  The system under test doesn't care per se, but
future test runs might fail strangely due to the conflicting postmaster.  The
user running the tests won't appreciate the clutter in his process list.

 If so, is there even a way to run either on Windows? The pg_upgrade test
 suite is a shell script, and prove_check is defined in Makefile.global and 
 definitely not applicable to Windows.

contrib/pg_upgrade/test.sh works under MSYS.  Perhaps $(prove_check) currently
doesn't, but that's something to be fixed, not relied upon.  There's a clone
of contrib/pg_upgrade/test.sh in src/tools/msvc/vcregress.pl, and I expect it
would have the same problem.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins

2014-06-30 Thread Tomas Vondra
Hi,

attached is v5 of the patch. The main change is that scaling the number
of buckets is done only once, after the initial hash table is built. The
main advantage of this is lower cost. This also allowed some cleanup of
unnecessary code.

However, this new patch causes warning like this:

WARNING:  temporary file leak: File 231 still referenced

I've been eyeballing the code for a while now, but I still fail to see
where this comes from :-( Any ideas?

regards
Tomas
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 0d9663c..db3a953 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1900,18 +1900,20 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 		if (es->format != EXPLAIN_FORMAT_TEXT)
 		{
 			ExplainPropertyLong("Hash Buckets", hashtable->nbuckets, es);
+			ExplainPropertyLong("Original Hash Buckets",
+								hashtable->nbuckets_original, es);
 			ExplainPropertyLong("Hash Batches", hashtable->nbatch, es);
 			ExplainPropertyLong("Original Hash Batches",
 								hashtable->nbatch_original, es);
 			ExplainPropertyLong("Peak Memory Usage", spacePeakKb, es);
 		}
-		else if (hashtable->nbatch_original != hashtable->nbatch)
+		else if ((hashtable->nbatch_original != hashtable->nbatch) || (hashtable->nbuckets_original != hashtable->nbuckets))
 		{
 			appendStringInfoSpaces(es->str, es->indent * 2);
 			appendStringInfo(es->str,
-							 "Buckets: %d  Batches: %d (originally %d)  Memory Usage: %ldkB\n",
-							 hashtable->nbuckets, hashtable->nbatch,
-							 hashtable->nbatch_original, spacePeakKb);
+							 "Buckets: %d (originally %d)  Batches: %d (originally %d)  Memory Usage: %ldkB\n",
+							 hashtable->nbuckets, hashtable->nbuckets_original,
+							 hashtable->nbatch, hashtable->nbatch_original, spacePeakKb);
 		}
 		else
 		{
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 589b2f1..fbd721d 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -37,8 +37,10 @@
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 
+bool enable_hashjoin_bucket = true;
 
 static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
+static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
 static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
 	  int mcvsToUse);
 static void ExecHashSkewTableInsert(HashJoinTable hashtable,
@@ -48,6 +50,33 @@ static void ExecHashSkewTableInsert(HashJoinTable hashtable,
 static void ExecHashRemoveNextSkewBucket(HashJoinTable hashtable);
 
 
+/*
+ * Compute appropriate size for hashtable given the estimated size of the
+ * relation to be hashed (number of rows and average row width).
+ *
+ * This is exported so that the planner's costsize.c can use it.
+ */
+
+/* Target bucket loading (tuples per bucket) */
+#define NTUP_PER_BUCKET			10
+
+/* Multiple of NTUP_PER_BUCKET triggering the increase of nbuckets.
+ * 
+ * Once we reach the threshold we double the number of buckets, and we
+ * want to get 1.0 on average (to get NTUP_PER_BUCKET on average). That
+ * means these two equations should hold
+ * 
+ *   b = 2a (growth)
+ *   (a + b)/2 = 1  (oscillate around NTUP_PER_BUCKET)
+ * 
+ * which means b=1.333 (a = b/2). If we wanted a higher threshold, we
+ * could grow the nbuckets to (4*nbuckets), thus using (b=4a) for
+ * growth, leading to (b=1.6). Or (b=8a) giving 1.78 etc.
+ * 
+ * Let's start with doubling the bucket count, i.e. 1.333. */
+#define NTUP_GROW_COEFFICIENT	1.333
+#define NTUP_GROW_THRESHOLD		(NTUP_PER_BUCKET * NTUP_GROW_COEFFICIENT)
+
 /* ----------------------------------------------------------------
  *		ExecHash
  *
@@ -126,6 +155,23 @@ MultiExecHash(HashState *node)
 		}
 	}
 
+	/* If average number of tuples per bucket is over the defined threshold,
+	 * increase the number of buckets to get below it. */
+	if (enable_hashjoin_bucket) {
+
+		double batchTuples = (hashtable->totalTuples / hashtable->nbatch);
+
+		if (batchTuples > (hashtable->nbuckets * NTUP_GROW_THRESHOLD)) {
+
+#ifdef HJDEBUG
+			printf("Increasing nbuckets to %d (average per bucket = %.1f)\n",
+				   hashtable->nbuckets, (batchTuples / hashtable->nbuckets));
+#endif
+			ExecHashIncreaseNumBuckets(hashtable);
+
+		}
+	}
+
 	/* must provide our own instrumentation support */
 	if (node->ps.instrument)
 		InstrStopNode(node->ps.instrument, hashtable->totalTuples);
@@ -271,6 +317,7 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 	 */
 	hashtable = (HashJoinTable) palloc(sizeof(HashJoinTableData));
 	hashtable->nbuckets = nbuckets;
+	hashtable->nbuckets_original = nbuckets;
 	hashtable->log2_nbuckets = log2_nbuckets;
 	hashtable->buckets = NULL;
 	hashtable->keepNulls = keepNulls;
@@ -375,17 +422,6 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 	return hashtable;
 }
 
-
-/*
- * Compute appropriate size for hashtable given the estimated size of the
- * relation to be hashed (number of rows and average row width).
- *
- * This 

Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C

2014-06-30 Thread Christian Ullrich
* From: Noah Misch [mailto:n...@leadboat.com]

 On Mon, Jun 30, 2014 at 07:28:03PM +, Christian Ullrich wrote:

  * From: Noah Misch [mailto:n...@leadboat.com]

   I liked the proposal here; was there a problem with it?
   http://www.postgresql.org/message-
   id/ca+tgmoz3ake4enctmqmzsykc_0pjl_u4c_x47ge48uy1upb...@mail.gmail.co
   m
 
  You're referring to the suggestion of accepting and ignoring the
  option on non-Windows, right? I can do that, I just don't see the
  point as long as pg_ctl has a separate code path (well, #ifdef) for
  Windows anyway.
 
 Yes.  We normally recognize platform-specific options on every platform.
 For example, the effective_io_concurrency GUC exists on all platforms, but
 you can only change it on platforms where it matters.  In that light, one
 could argue for raising an error for --background on non-Windows systems.
 I don't have a strong opinion on raising an error vs. ignoring the option,
 but I do think the outcome of --background should be distinguishable from
 the outcome of --sometypo on all platforms.

I'm convinced, will change to --sometypo treatment.

   The pg_upgrade test suite and the $(prove_check)-based test suites
   rely on their pg_ctl-started postmasters receiving any console ^C.
   pg_ctl deserves a --foreground or --no-background option for callers
   that prefer the current behavior.  That, or those tests need a new
   way to launch the postmaster.
 
  Ah. More good news. Just to confirm, this is only about the tests,
  right, not the code they are testing?
 
 Yes; the consequence of ignoring ^C is that the test postmaster would
 persist indefinitely after the ^C.  The system under test doesn't care per

No it won't, please see below.

  If so, is there even a way to run either on Windows? The pg_upgrade
  test suite is a shell script, and prove_check is defined in
  Makefile.global and definitely not applicable to Windows.
 
 contrib/pg_upgrade/test.sh works under MSYS.  Perhaps $(prove_check)
 currently doesn't, but that's something to be fixed, not relied upon.

Yes. That doesn't matter, though.

 There's a clone of contrib/pg_upgrade/test.sh in
 src/tools/msvc/vcregress.pl, and I expect it would have the same problem.

Oh, right. I didn't notice that because it doesn't mention upgradecheck in its 
usage message.

I think I know where we're talking past each other. You may be assuming that 
kill(postmaster, SIGINT) would be affected by my patch. It would not. 
PostgreSQL's signal emulation on Windows works completely separately from the
actual Windows analogue of signals in console processes. My patch drops those
console events, but the emulated signals still work the same way as before.

When Ctrl-C is pressed in a console window, pg_console_handler() in 
backend/port/win32/signal.c is called with a CTRL_C_EVENT, and converts that to 
an emulated SIGINT (unless I tell it not to). That emulated SIGINT is then 
processed as usual. pg_ctl -m fast stop sends the emulated SIGINT directly.
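
For reference, the Windows mechanism involved - a hedged sketch, not the
actual pg_console_handler code:

#include <windows.h>

static BOOL WINAPI
example_console_handler(DWORD dwCtrlType)
{
	if (dwCtrlType == CTRL_C_EVENT)
		return TRUE;			/* swallow the event: no emulated SIGINT */
	return FALSE;				/* let the next handler (or default) run */
}

/* installed with SetConsoleCtrlHandler(example_console_handler, TRUE) */

Returning TRUE tells Windows the event was handled, which gives the
"drop these events" behavior described above; kill()-style emulated
signals never pass through this path at all.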

Anyway, I just ran upgradecheck, and it reported success without leaving any 
stale postmasters around.

-- 
Christian




Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-06-30 Thread Jeff Janes
On Fri, Jun 27, 2014 at 4:10 AM, Dilip kumar dilip.ku...@huawei.com wrote:
...

 Updated patch is attached in the mail..

Thanks Dilip.

I get a compiler warning when building on Windows.  When I started
looking into that, I see that two files have too much code duplication
between them:

src/bin/scripts/vac_parallel.c   (new file)
src/bin/pg_dump/parallel.c  (existing file)

In particular, pgpipe is almost an exact duplicate between them,
except the copy in vac_parallel.c has fallen behind changes made to
parallel.c.  (Those changes would have fixed the Windows warnings).  I
think that this function (and perhaps other parts as
well--exit_horribly for example) need to be refactored into a common
file that both files can include.  I don't know where the best place
for that would be, though.  (I haven't done this type of refactoring
myself.)

Also, there are several places in the patch which use spaces for
indentation where tabs are called for by the coding style. It looks
like you may have copied the code from one terminal window and pasted
it into another one, converting tabs to spaces in the process.  This
makes it hard to evaluate the amount of code duplication.

In some places the code spins in a tight loop while waiting for a
worker process to become free.  If I strace the process, I got a long
list of selects with 0 time outs:

select(13, [6 8 10 12], NULL, NULL, {0, 0}) = 0 (Timeout)

I have not tried to track down the code that causes it.  I did notice
that vacuumdb spends an awful lot of time at the top of the Linux
"top" output, and this is probably why.
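
For reference, the usual fix for that pattern is to give select() a NULL
timeout so it blocks until a worker connection becomes readable, instead
of polling - a sketch with assumed fd bookkeeping, not code from the
patch:

#include <sys/select.h>

static int
wait_for_any_worker(const int *fds, int nworkers)
{
	fd_set		rmask;
	int			maxfd = -1;
	int			i;

	FD_ZERO(&rmask);
	for (i = 0; i < nworkers; i++)
	{
		FD_SET(fds[i], &rmask);
		if (fds[i] > maxfd)
			maxfd = fds[i];
	}

	/* NULL timeout: sleep in the kernel until a descriptor is ready */
	return select(maxfd + 1, &rmask, NULL, NULL, NULL);
}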


Cheers,

Jeff




[HACKERS] Fresh initdb contains a few deleted B-Tree pages

2014-06-30 Thread Peter Geoghegan
I would like to put on the record that there are a few deleted B-Tree
pages in a freshly initdb'd database. As my btreecheck tool shows:

mgd=# select bt_index_verify(indexrelid::regclass::text) from pg_index;
NOTICE:  00000: page 12 of index "pg_attribute_relid_attnam_index" is deleted
LOCATION:  bt_page_verify_worker, btreecheck.c:206
NOTICE:  00000: page 13 of index "pg_attribute_relid_attnam_index" is deleted
LOCATION:  bt_page_verify_worker, btreecheck.c:206
NOTICE:  00000: page 14 of index "pg_attribute_relid_attnam_index" is deleted
LOCATION:  bt_page_verify_worker, btreecheck.c:206
NOTICE:  00000: page 9 of index "pg_attribute_relid_attnum_index" is deleted
LOCATION:  bt_page_verify_worker, btreecheck.c:206

(contrib/pageinspect's bt_page_items() will show a similar message if
this is done manually)

This is not a new-to-9.4 issue. I am not suggesting that it's a real
problem that requires immediate fixing, but it is suboptimal. We may
wish to fix this someday.

-- 
Peter Geoghegan




Re: [HACKERS] WAL format and API changes (9.5)

2014-06-30 Thread Fujii Masao
On Tue, Jul 1, 2014 at 5:51 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Thanks for the review!


 On 06/27/2014 09:12 AM, Michael Paquier wrote:

 Looking at this patch, it does not compile when assertions are enabled
 because of this assertion in heapam.c:
 Assert(recdata == data + datalen);
 It seems like a thinko as removing it makes the code compile.


 Fixed.


 Some comments about the code:
 1) In src/backend/access/transam/README:
 's/no further action is no required/no further action is required/g'


 Fixed.


 2) Description of XLogBeginInsert, XLogHasBlockRef, XLogGetBlockRef and
 XLogGetBlockRefIds are missing in src/backend/access/transam/README.


 Added descriptions of XLogBeginInsert and XLogHasBlockRef. I'm not sure
 what to do with the latter two. They're not really intended for use by redo
 routines, but for things like pg_xlogdump that want to extract information
 about any record type.


 3) There are a couple of code blocks marked as FIXME and BROKEN. There are
 as well some NOT_USED blocks.


 Yeah. The FIXMEs and BROKEN blocks are all in rm_desc routines. They mostly
 stand for code that used to print block numbers or relfilenodes, which
 doesn't make much sense to do in an ad hoc way in rmgr-specific desc
 routines anymore. Will need to go through them all and decide what's the
 important information to print for each record type.

 The few NOT_USED blocks are around code in redo routines that extract some
 information from the WAL record, that isn't needed anymore. We could remove
 the fields from the WAL records altogether, but might be useful to keep for
 debugging purposes.


   4) Current patch crashes when running make check in
 contrib/test_decoding.
 Looking closer, wal_level = logical is broken:
 =# show wal_level;
   wal_level
 ---
   logical
 (1 row)
 =# create database foo;
 The connection to the server was lost. Attempting reset: Failed.


 Fixed.


 Looking even closer, log_heap_new_cid is broken:


 Fixed.


  6) XLogRegisterData creates an XLogRecData entry and appends it in one of
  the static XLogRecData entries if buffer_id >= 0 or to
  rdata_head/rdata_tail if buffer_id = -1 (for permanent data related to the
 WAL record). XLogRegisterXLogRecData does something similar, but with an
 entry of XLogRecData already built. What do you think about removing
 XLogRegisterXLogRecData and keeping only XLogRegisterData. This makes the
 interface simpler, and XLogRecData could be kept private in xlog.c. This
 would need more work especially on gin side where I am seeing for example
 constructLeafRecompressWALData directly building a XLogRecData entry.


 Hmm. I considered that, but punted because I didn't want to rewrite all the
 places that currently use XLogRecDatas in less straightforward ways. I kept
 XLogRegisterXLogRecData as an escape hatch for those.

 But I bit the bullet now, and got rid of all the XLogRegisterXLogRecData()
 calls. XLogRecData struct is now completely private to xlog.c.

 Another big change in the attached patch version is that XLogRegisterData()
 now *copies* the data to a working area, instead of merely adding a
 XLogRecData entry to the chain. This simplifies things in xlog.c, now that
 the XLogRecData concept is not visible to the rest of the system. This is a
 subtle semantic change for the callers: you can no longer register a struct
 first, and fill the fields afterwards.

 That adds a lot of memcpys to the critical path, which could be bad for
 performance. In practice, I think it's OK, or even a win, because a typical
 WAL record payload is very small. But I haven't measured that. If it becomes
 an issue, we could try to optimize, and link larger data blocks to the rdata
 chain, but memcpy small chunks of data. There are a few WAL record
 types that don't fit in a small statically allocated buffer, so I provided a
 new function, XLogEnsureSpace(), that can be called before XLogBegin() to
 allocate a large-enough working area.
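
To spell out the new calling convention - a sketch with an invented
record type; only XLogBeginInsert/XLogRegisterData/XLogInsert come from
the above:

xl_foo		xlrec;			/* invented struct, for illustration */
XLogRecPtr	recptr;

xlrec.flags = FOO_FLAG;		/* fields must be filled *before* registering */

XLogBeginInsert();
XLogRegisterData((char *) &xlrec, sizeof(xlrec));	/* data is copied here */

xlrec.flags = OTHER_FLAG;	/* too late: this change will NOT be logged */

recptr = XLogInsert(RM_FOO_ID, XLOG_FOO_OP);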

 These changes required changing a couple of places where WAL records are
 created:

 * ginPlaceToPage. This function has a strange split of responsibility
 between ginPlaceToPage and dataPlaceToPage/entryPlaceToPage. ginPlaceToPage
 calls dataPlaceToPage or entryPlaceToPage, depending on what kind of a tree
 it is, and dataPlaceToPage or entryPlaceToPage contributes some data to the
 WAL record. Then ginPlaceToPage adds some more, and finally calls
 XLogInsert(). I simplified the SPLIT case so that we now just create full
 page images of both page halves. We were already logging all the data on
 both page halves, but we were doing it in a smarter way, knowing what kind
 of pages they were. For example, for an entry tree page, we included an
 IndexTuple struct for every tuple on the page, but didn't include the item
 pointers. A straight full page image takes more space, but not much, so I
 think in any real application it's going to be lost in noise. (again,
 haven't measured it though)


 8) 

Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-06-30 Thread Andres Freund
On 2014-06-30 19:22:59 +0200, Andres Freund wrote:
 On 2014-06-30 12:46:29 -0400, Robert Haas wrote:
 , which if I understand you correctly are ARM without GCC
  atomics, Sparc, and MIPS.
 
 I have to revise my statement on MIPS, it actually looks safe. I seem to
 have missed that it has its own S_UNLOCK.

So, here's my first blind attempt at fixing these. Too tired to think
much more about it. Sparc's configurable cache coherency drives me
nuts. Linux apparently switched somewhere in 2011 from RMO (relaxed
memory order) to TSO (total store order), solaris always used TSO, but
the BSDs don't. Man.

Accordingly there's a somewhat ugly thingy attached. I don't think this
is really ready, but it's what I can come up with today.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 691839dc8fbb78134b530096d33d8a1307231eef Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 1 Jul 2014 00:11:17 +0200
Subject: [PATCH] Fix spinlock implementations for some sparc and arm
 platforms.

When compiling postgres on arm using an older gcc that doesn't yet
understand __sync_lock_test_and_set(), the default S_UNLOCK routine
was used. Unfortunately that's not correct for ARM's memory model. The
support for older gccs is using the swpb instruction (available for
both armv5 and most armv6) - unfortunately there's not a common
barrier instruction for those architecture versions. So instead
implement unlock using swpb again; that's possibly a bit slower but
correct.

Some Sparc CPUs can be run in various coherence models, ranging from
RMO (relaxed) over PSO (partial) to TSO (total). Solaris has always
run CPUs in TSO mode, but linux didn't use to and the various *BSDs
still don't. Unfortunately the sparc TAS/S_UNLOCK were only correct
under TSO. Fix that by adding the necessary memory barrier
instructions. On sparcv8+, which should be all relevant CPUs, these
are treated as NOPs if the current consistency model doesn't require
the barriers.

Blindly written, so possibly not working.

They didn't use to use correct acquire/
---
 src/backend/port/tas/sunstudio_sparc.s |  2 +
 src/include/storage/s_lock.h   | 74 +-
 2 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/src/backend/port/tas/sunstudio_sparc.s b/src/backend/port/tas/sunstudio_sparc.s
index 486b7be..dcf54e2 100644
--- a/src/backend/port/tas/sunstudio_sparc.s
+++ b/src/backend/port/tas/sunstudio_sparc.s
@@ -37,6 +37,8 @@ pg_atomic_cas:
 	!
 	!   http://cvs.opensolaris.org/source/xref/on/usr/src/lib/libc/sparc/threads/sparc.il
 	!
+	! NB: We're assuming we're running on a TSO system here - solaris
+	! userland luckily always has done so.
 
 #if defined(__sparcv9) || defined(__sparcv8plus)
 	cas [%o0],%o2,%o1
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index 895abe6..9aa9e8e 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -341,6 +341,30 @@ tas(volatile slock_t *lock)
 	return (int) _res;
 }
 
+/*
+ * Implement unlocking using swpb as well, that guarantees release
+ * semantics. Only some armv5 models have the data synchronization barrier
+ * instructions - but those need it. Luckily swpb always works.
+ */
+#define S_UNLOCK(lock)		s_unlock(lock)
+
+static __inline__ void
+s_unlock(volatile slock_t *lock)
+{
+	register slock_t _res = 0;
+
+	__asm__ __volatile__(
+			"	swpb 	%0, %0, [%2]	\n"
+:		"+r"(_res), "+m"(*lock)
+:		"r"(lock)
+:		"memory");
+
+	/* lock should have been held */
+	Assert(_res == 1);
+}
+
+#define S_INIT_LOCK(lock)	(*(lock) = 0)
+
 #endif	 /* HAVE_GCC_INT_ATOMICS */
 #endif	 /* __arm__ */
 
@@ -393,6 +417,12 @@ tas(volatile slock_t *lock)
 
 
 #if defined(__sparc__)		/* Sparc */
+/*
+ * Solaris has always run sparc processors in TSO (total store) mode, but
+ * linux didn't use to and the *BSDs still don't. So, be careful about
+ * acquire/release semantics. The CPU will treat superfluous membars as NOPs,
+ * so it's just code space.
+ */
 #define HAS_TEST_AND_SET
 
 typedef unsigned char slock_t;
@@ -414,9 +444,51 @@ tas(volatile slock_t *lock)
 :		"=r"(_res), "+m"(*lock)
 :		"r"(lock)
 :		"memory");
+#if defined(__sparcv7)
+	/*
+	 * No stbar or membar available, luckily no actually produced hardware
+	 * requires a barrier
+	 */
+#elif defined(__sparcv8)
+	/* stbar is available (and required for both PSO, RMO), membar isn't */
+	__asm__ __volatile__ ("stbar	\n":::"memory");
+#else
+	/*
+	 * #StoreLoad (RMO) | #StoreStore (PSO, RMO) are the appropriate release
+	 * barrier for sparcv8 upwards.
+	 */
+	__asm__ __volatile__ ("membar #StoreLoad | #StoreStore \n":::"memory");
+#endif
 	return (int) _res;
 }
 
+#if defined(__sparcv7)
+/*
+ * No stbar or membar available, luckily no actually produced hardware
+ * requires a barrier
+ */
+#define S_UNLOCK(lock)		(*((volatile slock_t *) (lock)) = 0)
+#elif defined(__sparcv8)
+/* stbar is available (and required 

Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins

2014-06-30 Thread Tomas Vondra
On 30.6.2014 23:12, Tomas Vondra wrote:
 Hi,
 
 attached is v5 of the patch. The main change is that scaling the number
 of buckets is done only once, after the initial hash table is built. The
 main advantage of this is lower cost. This also allowed some cleanup of
 unnecessary code.
 
 However, this new patch causes warning like this:
 
 WARNING:  temporary file leak: File 231 still referenced
 
 I've been eyeballing the code for a while now, but I still fail to see
 where this comes from :-( Any ideas?

Meh, the patches are wrong - I hadn't realized the tight coupling
between buckets/batches in ExecHashGetBucketAndBatch:

  *bucketno = hashvalue & (nbuckets - 1);
  *batchno = (hashvalue >> hashtable->log2_nbuckets) & (nbatch - 1);

The previous patches worked mostly by pure luck (the nbuckets was set
only in the first batch), but once I moved the code to the end, it
started failing. And by "worked" I mean "didn't throw an error", but
probably returned bogus results with (nbatch > 1).

However, ISTM this nbuckets-nbatch coupling is not really necessary,
it's only constructed this way to assign independent batch/bucket. So I
went and changed the code like this:

  *bucketno = hashvalue & (nbuckets - 1);
  *batchno = (hashvalue >> (32 - hashtable->log2_nbatch));

I.e. using the top bits for batchno, low bits for bucketno (as before).
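
A worked example with made-up numbers - nbuckets = 1024 (log2_nbuckets =
10) and nbatch = 4 (log2_nbatch = 2):

uint32	hashvalue = 0xDEADBEEF;

/* old scheme: batchno comes from the bits just above the bucket bits,
 * so it silently moves whenever log2_nbuckets changes */
uint32	bucketno_old = hashvalue & (1024 - 1);		/* bits 0-9 */
uint32	batchno_old = (hashvalue >> 10) & (4 - 1);	/* bits 10-11 */

/* new scheme: batchno comes from the top bits, independent of nbuckets */
uint32	bucketno_new = hashvalue & (1024 - 1);		/* bits 0-9 */
uint32	batchno_new = hashvalue >> (32 - 2);		/* bits 30-31 */

With the new scheme, growing nbuckets after the number of batches has
been decided leaves every tuple's batch assignment unchanged.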

Hopefully I got it right this time. At least it seems to be working for
cases that failed before (no file leaks, proper rowcounts so far).

regards
Tomas
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 0d9663c..db3a953 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1900,18 +1900,20 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 		if (es->format != EXPLAIN_FORMAT_TEXT)
 		{
 			ExplainPropertyLong("Hash Buckets", hashtable->nbuckets, es);
+			ExplainPropertyLong("Original Hash Buckets",
+								hashtable->nbuckets_original, es);
 			ExplainPropertyLong("Hash Batches", hashtable->nbatch, es);
 			ExplainPropertyLong("Original Hash Batches",
 								hashtable->nbatch_original, es);
 			ExplainPropertyLong("Peak Memory Usage", spacePeakKb, es);
 		}
-		else if (hashtable->nbatch_original != hashtable->nbatch)
+		else if ((hashtable->nbatch_original != hashtable->nbatch) || (hashtable->nbuckets_original != hashtable->nbuckets))
 		{
 			appendStringInfoSpaces(es->str, es->indent * 2);
 			appendStringInfo(es->str,
-							 "Buckets: %d  Batches: %d (originally %d)  Memory Usage: %ldkB\n",
-							 hashtable->nbuckets, hashtable->nbatch,
-							 hashtable->nbatch_original, spacePeakKb);
+							 "Buckets: %d (originally %d)  Batches: %d (originally %d)  Memory Usage: %ldkB\n",
+							 hashtable->nbuckets, hashtable->nbuckets_original,
+							 hashtable->nbatch, hashtable->nbatch_original, spacePeakKb);
 		}
 		else
 		{
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 589b2f1..48cca62 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -37,8 +37,10 @@
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 
+bool enable_hashjoin_bucket = true;
 
 static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
+static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
 static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
 	  int mcvsToUse);
 static void ExecHashSkewTableInsert(HashJoinTable hashtable,
@@ -48,6 +50,33 @@ static void ExecHashSkewTableInsert(HashJoinTable hashtable,
 static void ExecHashRemoveNextSkewBucket(HashJoinTable hashtable);
 
 
+/*
+ * Compute appropriate size for hashtable given the estimated size of the
+ * relation to be hashed (number of rows and average row width).
+ *
+ * This is exported so that the planner's costsize.c can use it.
+ */
+
+/* Target bucket loading (tuples per bucket) */
+#define NTUP_PER_BUCKET			10
+
+/* Multiple of NTUP_PER_BUCKET triggering the increase of nbuckets.
+ * 
+ * Once we reach the threshold we double the number of buckets, and we
+ * want to get 1.0 on average (to get NTUP_PER_BUCKET on average). That
+ * means these two equations should hold
+ * 
+ *   b = 2a (growth)
+ *   (a + b)/2 = 1  (oscillate around NTUP_PER_BUCKET)
+ * 
+ * which means b=1.333 (a = b/2). If we wanted a higher threshold, we
+ * could grow the nbuckets to (4*nbuckets), thus using (b=4a) for
+ * growth, leading to (b=1.6). Or (b=8a) giving 1.78 etc.
+ * 
+ * Let's start with doubling the bucket count, i.e. 1.333. */
+#define NTUP_GROW_COEFFICIENT	1.333
+#define NTUP_GROW_THRESHOLD		(NTUP_PER_BUCKET * NTUP_GROW_COEFFICIENT)
+
 /* ----------------------------------------------------------------
  *		ExecHash
  *
@@ -116,6 +145,7 @@ MultiExecHash(HashState *node)
 				/* It's a skew tuple, so put it into that hash table */
 				ExecHashSkewTableInsert(hashtable, slot, hashvalue,
 										bucketNumber);
+				hashtable->skewTuples += 1;
 			}
 			else
 			{
@@ -126,6 +156,23 @@ MultiExecHash(HashState *node)
 		

Re: [HACKERS] PATCH: Allow empty targets in unaccent dictionary

2014-06-30 Thread Tom Lane
Abhijit Menon-Sen a...@2ndquadrant.com writes:
 At 2014-06-30 15:19:17 -0400, t...@sss.pgh.pa.us wrote:
 Anyway, this raises the question of whether the current patch is
 actually a desirable way to do things, or whether it would be better
  if the unaccenting rules were like base-char accent-char ->
  base-char.

 It might be useful to be able to write such rules, but it would be
 highly impractical to do so instead of being able to single out
 accent-chars for removal.

On reflection, if we were thinking of this as a general
substring-replacement mechanism rather than just a de-accenter
(and why shouldn't we think of it that way?), then clearly both
multi-character source strings and zero-character substitute strings
could be of value.  The fact that the existing patch only fixes
one of those omissions isn't a strike against it.

regards, tom lane




Re: [HACKERS] heap vacuum cleanup locks

2014-06-30 Thread Greg Stark
Zombie thread arise!

I was searching for old threads on a specific problem and came across
this patch that was dropped due to concerns about SnapshotNow:

On Wed, Nov 2, 2011 at 3:19 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jun 7, 2011 at 3:24 PM, Greg Stark gsst...@mit.edu wrote:
 Well it's super-exclusive-vacuum-lock avoidance techniques. Why
 shouldn't it make more sense to try to reduce the frequency and impact
 of the single-purpose outlier in a non-critical-path instead of
 burdening every other data reader with extra overhead?

 I think Robert's plan is exactly right though I would phrase it
 differently. We should get the exclusive lock, freeze/kill any xids
 and line pointers, then if the pin-count is 1 do the compaction.

 I wrote a really neat patch to do this today...  and then, as I
 thought about it some more, I started to think that it's probably
 unsafe.  Here's the trouble: with this approach, we assume that it's
 OK to change the contents of the line pointer while holding only an
 exclusive lock on the buffer.  But there is a good deal of code out
 there that thinks it's OK to examine a line pointer with only a pin on
 the buffer (no lock).  You need a content lock to scan the item
 pointer array, but once you've identified an item of interest, you're
 entitled to assume that it won't be modified while you hold a buffer
 pin.  Now, if you've identified a particular tuple as being visible to
 your scan, then you might think that VACUUM shouldn't be removing it
 anyway.  But I think that's only true for MVCC scans - for example,
 what happens under SnapshotNow semantics?

Well now that you've done all that amazing work eliminating
SnapshotNow maybe this patch deserves another look? I see
anti-wraparound vacuums more and more often as database transaction
velocity increases and vacuum takes longer and longer as database
sizes increase. Having a faster vacuum that can skip vacuuming pages
but is still guaranteed to freeze every page is pretty tempting.

Of course, the freeze epoch in the header obviating the need for
freezing is an even more attractive goal, but AIUI we're not even sure
that's going to work, and this is a nice easy win.

  But then, on third
 thought, if you've also got an MVCC snapshot taken before the start of
 the SnapshotNow scan, you are probably OK, because your advertised
 xmin should prevent anyone from removing anything anyway, so how do
 you actually provoke a failure?

 Anyway, I'm attaching the patch, in case anyone has any ideas on where
 to go with this.

 I'm really wishing we had more bits in the vm. It looks like we could use:
  - contains not-all-visible tuples
  - contains not-frozen xids
  - in need of compaction

Another idea would be to store the upper 2-4 bits of the oldest xid in
the page. That would let you determine whether it's in need of an
anti-wraparound vacuum.
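
Purely illustrative, not from any posted patch: with 32-bit
TransactionIds, a check using only the top two bits of a page's oldest
xid could look roughly like this (the function name and the
two-quarter-epoch threshold are invented for the sketch):

    /* Hypothetical sketch: estimate whether a page's oldest xid is far
     * enough behind the current xid to need freezing soon, using only
     * the top 2 bits (the "quarter-epoch band") of each xid. */
    static inline bool
    page_needs_freeze_soon(TransactionId oldest_xid, TransactionId next_xid)
    {
        uint32      page_band = ((uint32) oldest_xid) >> 30;
        uint32      cur_band = ((uint32) next_xid) >> 30;

        /* freeze once the page is half the xid space (2 bands) behind */
        return ((cur_band - page_band) & 0x3) >= 2;
    }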


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-06-30 Thread Alvaro Herrera
Jeff Janes wrote:

 In particular, pgpipe is almost an exact duplicate between them,
 except the copy in vac_parallel.c has fallen behind changes made to
 parallel.c.  (Those changes would have fixed the Windows warnings).  I
 think that this function (and perhaps other parts as
 well--exit_horribly for example) need to refactored into a common
 file that both files can include.  I don't know where the best place
 for that would be, though.  (I haven't done this type of refactoring
 myself.)

I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos.
Maybe we should move pgpipe back to src/port and have pg_dump and this
new thing use that.  I'm not sure about the rest of duplication in
vac_parallel.c; there might be a lot in common with what
pg_dump/parallel.c does too.  Having two copies of code is frowned upon
for good reasons.  This patch introduces 1200 lines of new code in
vac_parallel.c, ugh.

If we really require 1200 lines to get parallel vacuum working for
vacuumdb, I would question the wisdom of this effort.  To me, it seems
better spent improving autovacuum to cover whatever it is that this
patch is supposed to be good for --- or maybe just enable having a shell
script that launches multiple vacuumdb instances in parallel ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2014-06-30 Thread Dave McGuire
On 06/29/2014 03:10 PM, Patrick Finnegan wrote:
 And it also runs on the 11/780 which can have multiple CPUs... but I've
 never seen support for using more than one CPU (and the NetBSD page
 still says NetBSD/vax can only make use of one CPU on multi-CPU
 machines).  If that has changed, I'd love to hear about it.  Support
 for my VAX 6000 would also be nice...

  It changed well over a decade ago, if memory serves.  The specific
work was done on a VAX-8300 or -8350.  I'm pretty sure the 11/780's
specific flavor of SMP is not supported.  (though I do have a pair of
11/785s here...wanna come hack? ;))

   -Dave

-- 
Dave McGuire, AK4HZ/3
New Kensington, PA


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2014-06-30 Thread Dave McGuire
On 06/29/2014 10:24 AM, Tom Lane wrote:
 Is there anyone in the NetBSD/VAX community who would be willing to
 host a PG buildfarm member?
 
   I could put together a simh-based machine (i.e., fast) on a vm, if
 nobody else has stepped up for this.
 
 No other volunteers have emerged, so if you'd do that it'd be great.

  Ok, I am certainly willing to do it.  Though I haven't used PostgreSQL
in quite a while, I ran it A LOT back when its query language was
PostQUEL, and later when it was known as Postgres95.  It'd give me a
serious warm fuzzy to be able to support the project in some way.

 Dave McGuire, AK4HZ/3
 New Kensington, PA
 
 Hey, right up the river from here!

  Come on up and hack!  There's always something neat going on around
here.  Ever run a PDP-11?  B-)

 -Dave

-- 
Dave McGuire, AK4HZ/3
New Kensington, PA


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2014-06-30 Thread Patrick Finnegan
On Sun, Jun 29, 2014 at 3:12 PM, Dave McGuire mcgu...@neurotica.com wrote:

 On 06/29/2014 03:10 PM, Patrick Finnegan wrote:
  And it also runs on the 11/780 which can have multiple CPUs... but I've
  never seen support for using more than one CPU (and the NetBSD page
  still says NetBSD/vax can only make use of one CPU on multi-CPU
  machines).  If that has changed, I'd love to hear about it.  Support
  for my VAX 6000 would also be nice...

   It changed well over a decade ago, if memory serves.  The specific
 work was done on a VAX-8300 or -8350.  I'm pretty sure the 11/780's
 specific flavor of SMP is not supported.  (though I do have a pair of
 11/785s here...wanna come hack? ;))


If it works, someone should update the documentation. :)

Which flavor of 11/78x MP? The official DEC kind (which is really just two
computers with a block of shared memory) or the Purdue kind (which isn't
quite SMP, but actually shares the system bus)?

Pat


Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2014-06-30 Thread Anders Magnusson


Dave McGuire wrote 2014-06-29 21:01:

On 06/29/2014 02:58 PM, Patrick Finnegan wrote:

Last I checked, NetBSD doesn't support any sort of multiprocessor VAX.
  Multiprocessor VAXes exist, but you're stuck with either Ultrix or VMS
on them.

   Hi Pat, it's good to see your name in my inbox.

   NetBSD ran on multiprocessor BI-bus VAXen many, many years ago.  Is
that support broken?

I made it run on an 8300 once, in the early days of NetBSD MP.  I planned
to make it run on both 8800 and 6300, but due to lack of docs neither of
those came true.
So, unless someone has an 8300 to test on (just over 1 VUPS per CPU, not
much), current state is unknown.  It worked last time I tested :-)


-- Ragge


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2014-06-30 Thread Patrick Finnegan
On Sun, Jun 29, 2014 at 3:01 PM, Dave McGuire mcgu...@neurotica.com wrote:

 On 06/29/2014 02:58 PM, Patrick Finnegan wrote:
  Last I checked, NetBSD doesn't support any sort of multiprocessor VAX.
   Multiprocessor VAXes exist, but you're stuck with either Ultrix or VMS
  on them.

   Hi Pat, it's good to see your name in my inbox.


Hi! :)

  NetBSD ran on multiprocessor BI-bus VAXen many, many years ago.  Is
 that support broken?


And it also runs on the 11/780 which can have multiple CPUs... but I've
never seen support for using more than one CPU (and the NetBSD page still
says NetBSD/vax can only make use of one CPU on multi-CPU machines).  If
that has changed, I'd love to hear about it.  Support for my VAX 6000 would
also be nice...

Pat


Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2014-06-30 Thread Patrick Finnegan
Last I checked, NetBSD doesn't support any sort of multiprocessor VAX.
 Multiprocessor VAXes exist, but you're stuck with either Ultrix or VMS on
them.

Pat


On Sun, Jun 29, 2014 at 2:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Dave McGuire mcgu...@neurotica.com writes:
  On 06/29/2014 10:54 AM, Andres Freund wrote:
  Maybe I'm just not playful enough, but keeping a platform alive so we
  can run postgres in a simulator seems a bit, well, pointless.

On the in a simulator matter: It's important to keep in mind that
  there are more VAXen out there than just simulated ones.  I'm offering
  up a simulated one here because I can spin it up in a dedicated VM on a
  VMware host that's already running and I already have power budget for.
   I could just as easily run it on real hardware...there are, at last
  count, close to forty real-iron VAXen here, but only a few of those are
  running 24/7.  I'd happily bring up another one to do Postgres builds
  and testing, if someone will send me the bucks to pay for the additional
  power and cooling.  (that is a real offer)

 Well, the issue from our point of view is that a lot of what we care about
 testing is extremely low-level hardware behavior, like whether spinlocks
 work as expected across processors.  It's not clear that a simulator would
 provide a sufficiently accurate emulation.

 OTOH, the really nasty issues like cache coherency rules don't arise in
 single-processor systems.  So unless you have a multiprocessor VAX
 available to spin up, a simulator may tell us as much as we'd learn
 anyway.

 (If you have got one, maybe some cash could be found --- we do have
 project funds available, and I think they'd be well spent on testing
 purposes.  I don't make those decisions though.)

 regards, tom lane


Re: [HACKERS] Cluster name in ps output

2014-06-30 Thread Andres Freund
Hi,

On 2014-06-28 15:08:45 +0200, Andres Freund wrote:
 Otherwise it looks good to me.

So, I'd looked at it with an eye towards committing it and found some
more things. I've now
* added the restriction that the cluster name can only be ASCII. It's
  shown from several clusters with differing encodings, and it's shown
  in process titles, so that seems wise.
* moved everything to the LOGGING_WHAT category
* added minimal documentation to monitoring.sgml
* expanded the config.sgml entry to mention the restriction on the name.
* Changed the GUC's short description

I'll leave the patch sit a while before actually committing it.

I also think, but haven't done so, we should add a double colon after
the cluster name, so it's not:

postgres: server1 stats collector process
but
postgres: server1: stats collector process

Comments?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From cdef775befc2a770e3ca29c434ae42666375358b Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Sun, 29 Jun 2014 11:40:53 +0200
Subject: [PATCH] Add cluster_name GUC which will be included in process titles
 if set.

When running several postgres clusters on one OS instance it's often
inconveniently hard to identify which postgres process belongs to
which instance.

Add the cluster_name GUC which will be included as part of the process
titles if set. With that, processes can be more easily identified using
tools like 'ps'.

Thomas Munro, with some adjustments by me and review by a host of people.
---
 doc/src/sgml/config.sgml  | 23 ++
 doc/src/sgml/monitoring.sgml  | 16 +++
 src/backend/utils/misc/guc.c  | 28 +++
 src/backend/utils/misc/postgresql.conf.sample |  3 ++-
 src/backend/utils/misc/ps_status.c| 22 +++--
 src/include/utils/guc.h   |  1 +
 6 files changed, 86 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e3d1c62..11d552e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4131,6 +4131,29 @@ local0.*/var/log/postgresql
   /listitem
  /varlistentry
 
+ varlistentry id=guc-cluster-name xreflabel=cluster_name
+  termvarnamecluster_name/varname (typestring/type)/term
+  indexterm
+   primaryvarnamecluster_name/ configuration parameter/primary
+  /indexterm
+  listitem
+   para
+Sets the cluster name that appears in the process title for all
+processes in this cluster. The name can be any string of less than
+symbolNAMEDATALEN/ characters (64 characters in a standard
+build). Only printable ASCII characters may be used in the
+varnamecluster_name/varname value. Other characters will be
+replaced with question marks (literal?/literal).  No name is shown
+if this parameter is set to the empty string literal''/ (which is
+the default). This parameter can only be set at server start.
+   /para
+   para
+The process title is typically viewed using programs like
+applicationps/ or, on Windows, applicationProcess Explorer/.
+   /para
+  /listitem
+ /varlistentry
+
  varlistentry
   termvarnamedebug_print_parse/varname (typeboolean/type)
   indexterm
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 69d99e7..88f22a1 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -92,6 +92,22 @@ postgres: replaceableuser/ replaceabledatabase/ replaceablehost/ re
   /para
 
   para
+   If xref linkend=guc-cluster-name has been configured, the
+   cluster name will also be shown in commandps/ output:
+screen
+$ psql -c 'SHOW cluster_name'
+ cluster_name
+--
+ server1
+(1 row)
+
+$ ps aux|grep server1
+postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: server1 logger process
+...
+/screen
+  /para
+
+  para
If you have turned off xref linkend=guc-update-process-title then the
activity indicator is not updated; the process title is set only once
when a new process is launched.  On some platforms this saves a measurable
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6902c23..3a31a75 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -198,6 +198,7 @@ static void assign_effective_io_concurrency(int newval, void *extra);
 static void assign_pgstat_temp_directory(const char *newval, void *extra);
 static bool check_application_name(char **newval, void **extra, GucSource source);
 static void assign_application_name(const char *newval, void *extra);
+static bool check_cluster_name(char **newval, void **extra, GucSource source);
 static const char *show_unix_socket_permissions(void);
 static const char 
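
The diff is truncated above. Given the check_cluster_name() declaration
and the documented behavior of replacing non-printable-ASCII characters
with question marks, the check hook plausibly looks something like the
following sketch (not the actual patch body):

    static bool
    check_cluster_name(char **newval, void **extra, GucSource source)
    {
        char       *p;

        /* Permit only printable ASCII; replace anything else with '?'. */
        for (p = *newval; *p; p++)
        {
            if (*p < 32 || *p > 126)
                *p = '?';
        }

        return true;
    }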

Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2014-06-30 Thread John Klos

Well, the issue from our point of view is that a lot of what we care about
testing is extremely low-level hardware behavior, like whether spinlocks
work as expected across processors.  It's not clear that a simulator would
provide a sufficiently accurate emulation.

OTOH, the really nasty issues like cache coherency rules don't arise in
single-processor systems.  So unless you have a multiprocessor VAX
available to spin up, a simulator may tell us as much as we'd learn
anyway.

(If you have got one, maybe some cash could be found --- we do have
project funds available, and I think they'd be well spent on testing
purposes.  I don't make those decisions though.)


Depending on how often you'd like the system to try to run a compile, I'd 
be happy to run it on a VAXstation 4000/60. It runs bulk package builds 
for pkgsrc, but we could do a compile every week or so (every day would 
really eat into cycles for other packages).


John


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] better atomics - v0.5

2014-06-30 Thread Noah Misch
On Mon, Jun 30, 2014 at 07:44:47PM +0200, Andres Freund wrote:
 On 2014-06-30 13:15:23 -0400, Robert Haas wrote:
  I'm personally not convinced that we're approaching this topic in the
  right way.  I'm not convinced that it's at all reasonable to try to
  emulate atomics on platforms that don't have them.  I would punt the
  problem into the next layer and force things like lwlock.c to have
  fallback implementations at that level that don't require atomics, and
  remove those fallback implementations if and when we move the
  goalposts so that all supported platforms must have working atomics
  implementations.
 
 If we're requiring an atomics-less implementation for lwlock.c,
 bufmgr.c et al. I'll stop working on those features (and by extension on
 atomics). It's an utter waste of resources and maintainability trying to
 maintain parallel high level locking algorithms in several pieces of the
 codebase. The nonatomic versions will stagnate and won't actually work
 under load. I'd rather spend my time on something useful. The spinlock
 based atomics emulation is *small*. It's easy to reason about
 correctness there.
 
 If we're going that way, we've made a couple of absolute fringe
 platforms hold postgres, as actually used in reality, hostage. I think
 that's insane.

I agree completely.

  If this patch goes in
  the way it's written right now, then I think it basically amounts to
  throwing platforms that don't have working atomics emulation under the
  bus, and my vote is that if we're going to do that, we should do it
  explicitly: sorry, we now only support platforms with working atomics.
  You don't have any, so you can't run PostgreSQL.  Have a nice day.

I might share that view if a platform faced a huge and easily-seen performance
regression, say a 40% regression to 5-client performance.  I don't expect the
shortfall to reach that ballpark for any atomics-exploiting algorithm we'll
adopt, and your (Andres's) latest measurements corroborate that.
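
For scale, the spinlock-based emulation Andres describes above really is
tiny — a minimal sketch using PostgreSQL's existing slock_t spinlocks;
the _emulated names are invented here and this is not the code from the
patch itself:

    #include "storage/spin.h"

    /* Each emulated atomic embeds its own spinlock; SpinLockInit() must
     * be called on the lock before first use. */
    typedef struct
    {
        slock_t         lock;
        volatile uint32 value;
    } pg_atomic_uint32_emulated;

    static uint32
    pg_atomic_fetch_add_u32_emulated(pg_atomic_uint32_emulated *ptr, int32 add)
    {
        uint32      old;

        SpinLockAcquire(&ptr->lock);
        old = ptr->value;
        ptr->value += add;
        SpinLockRelease(&ptr->lock);

        return old;
    }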

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Allow empty targets in unaccent dictionary

2014-06-30 Thread Tom Lane
Abhijit Menon-Sen a...@2ndquadrant.com writes:
 At 2014-06-30 15:19:17 -0400, t...@sss.pgh.pa.us wrote:
 It's not unlikely that we want this patch *and* an improvement that
 allows multi-character src strings

 I think it's enough to apply just this patch, but I wouldn't object to
 doing both if it were easy. It's not clear to me if that's true after a
 quick glance at the code, but I'll look again when I'm properly awake.

I went ahead and committed this patch, and also some further work to
fix the multicharacter-source problem.  I took it on myself to make the
code issue warnings about misformatted lines, too.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C

2014-06-30 Thread Noah Misch
On Mon, Jun 30, 2014 at 09:16:45PM +, Christian Ullrich wrote:
 * From: Noah Misch [mailto:n...@leadboat.com]
  Yes; the consequence of ignoring ^C is that the test postmaster would
  persist indefinitely after the ^C.  The system under test doesn't care per
 
 No it won't, please see below.

 I think I know where we're talking past each other. You may be assuming that 
 kill(postmaster, SIGINT) would be affected by my patch. It would not. 
 PostgreSQL's signal emulation on Windows works completely separately from the 
actual Windows analog of signals in console processes. My patch drops these 
 analogous events, but the emulated signals still work the same way as 
 before.

I wasn't assuming otherwise.

 When Ctrl-C is pressed in a console window, pg_console_handler() in 
 backend/port/win32/signal.c is called with a CTRL_C_EVENT, and converts that 
 to an emulated SIGINT (unless I tell it not to). That emulated SIGINT is then 
 processed as usual. pg_ctl -m fast stop sends the emulated SIGINT directly.

The main point of this patch is to have pg_console_handler() in the postmaster
ignore ^C when pg_ctl start had started this postmaster.  Right?

 Anyway, I just ran upgradecheck, and it reported success without leaving any 
 stale postmasters around.

By the time the test reports success, it has already shut down the postmaster
cleanly.  I'm focusing on the case where the user decides halfway through the
test run to cancel it.  Currently, ^C will reach both the test driver and the
postmaster, and everything will shut down; no pg_ctl stop is involved.  What
happens if you issue "vcregress upgradecheck" and strike ^C between the
"Setting up data for upgrading" and "Dumping old cluster" messages?
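
For concreteness, the conversion being discussed — a hedged sketch of
such a console handler, where the started_by_pg_ctl flag and how it gets
set are assumptions, not the actual patch:

    static BOOL WINAPI
    pg_console_handler(DWORD dwCtrlType)
    {
        if (dwCtrlType == CTRL_C_EVENT ||
            dwCtrlType == CTRL_BREAK_EVENT ||
            dwCtrlType == CTRL_CLOSE_EVENT)
        {
            /* hypothetical: drop ^C if this postmaster came from pg_ctl */
            if (dwCtrlType == CTRL_C_EVENT && started_by_pg_ctl)
                return TRUE;            /* swallow the event; emulated
                                         * signals are unaffected */
            pg_queue_signal(SIGINT);    /* convert to emulated SIGINT */
            return TRUE;
        }
        return FALSE;
    }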

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] inherit support for foreign tables

2014-06-30 Thread Etsuro Fujita

(2014/06/30 20:17), Ashutosh Bapat wrote:

On Mon, Jun 30, 2014 at 4:17 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:

(2014/06/30 17:47), Ashutosh Bapat wrote:

BTW, why aren't you using the tlist passed to this function? I guess
create_scan_plan() passes tlist after processing it, so that
should be
used rather than rel->reltargetlist.

I think that that would be maybe OK, but I think that it would not
be efficient to use the list to compute attrs_used, because the
tlist would have more information than rel->reltargetlist in cases
where the tlist is built through build_physical_tlist().

In that case, we can call build_relation_tlist() for foreign tables.


Do you mean build_physical_tlist()?

Yeah, we can call build_physical_tlist() (and do that in some cases), 
but if we call the function, it would generate a tlist that contains all 
Vars in the relation, not only those Vars actually needed by the query 
(ie, Vars in reltargetlist), and thus it would take more cycles to 
compute attrs_used from the tlist than from reltargetlist.  That's what I 
wanted to say.
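
Concretely, the cheap computation being argued for — deriving attrs_used
directly from reltargetlist, in the style postgres_fdw already uses; a
sketch with illustrative variable names:

    Bitmapset  *attrs_used = NULL;
    ListCell   *lc;

    /* Collect only the attributes the query actually references.
     * (Restriction clauses would be scanned the same way.) */
    foreach(lc, rel->reltargetlist)
        pull_varattnos((Node *) lfirst(lc), rel->relid, &attrs_used);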


Thanks,

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] inherit support for foreign tables

2014-06-30 Thread Etsuro Fujita
(2014/06/30 22:48), Tom Lane wrote:
 Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 Done.  I think this is because create_foreignscan_plan() makes reference
 to attr_needed, which isn't computed for inheritance children.
 
 I wonder whether it isn't time to change that.  It was coded like that
 originally only because calculating the values would've been a waste of
 cycles at the time.  But this is at least the third place where it'd be
 useful to have attr_needed for child rels.

+1 for calculating attr_needed for child rels.  (I was wondering too.)

I'll create a separate patch for it.

Thanks,

Best regards,
Etsuro Fujita


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

