Re: [HACKERS] review: plpgsql return a row-expression

2012-11-23 Thread Asif Rehman
Hi,

I forgot to add documentation changes in the earlier patch. In the attached
patch, I have added documentation as well as fixed other issues you raised.

Regards,
--Asif

On Thu, Nov 22, 2012 at 10:47 PM, Asif Rehman asifr.reh...@gmail.com wrote:

 Thanks for the review Pavel. I have taken care of the points you raised.
 Please see the updated patch.

 Regards,
 --Asif


 On Wed, Nov 21, 2012 at 1:47 AM, Pavel Stehule pavel.steh...@gmail.com wrote:

 related to
 http://archives.postgresql.org/message-id/caauglxwpedfwae6dajmf7sxewfusa0f68p07retbbpf_fsa...@mail.gmail.com

 * patched and compiled without warnings

 * All 133 tests passed.


 but

 I don't like

 * call invalid function from anonymous block - it is messy (regress
 tests) - there is no reason to do it

 +create or replace function foo() returns footype as $$
 +declare
 +  v record;
 +  v2 record;
 +begin
 +  v := (1, 'hello');
 +  v2 := (1, 'hello');
 +  return (v || v2);
 +end;
 +$$ language plpgsql;
 +DO $$
 +declare
 +  v footype;
 +begin
 +  v := foo();
 +  raise info 'x = %', v.x;
 +  raise info 'y = %', v.y;
 +end; $$;
 +ERROR:  operator does not exist: record || record
 +LINE 1: SELECT (v || v2)
 +  ^
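
 A more direct shape for that test might be a sketch like the following, assuming
 the patch's row-expression RETURN works as advertised (foo_ok and the expected
 output are illustrative only, not taken from the patch):

 create or replace function foo_ok() returns footype as $$
 begin
   return (1, 'hello');   -- row expression returned directly, no broken helper needed
 end;
 $$ language plpgsql;

 select x, y from foo_ok();   -- expected: x = 1, y = hello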

 * there is some performance issue

 create or replace function fx2(a int)
 returns footype as $$
 declare x footype;
 begin
   x = (10,20);
   return x;
 end;
 $$ language plpgsql;

 postgres=# select sum(fx2.x) from generate_series(1,10) g(i),
 lateral fx2(i);
sum
 -
  100
 (1 row)

 Time: 497.129 ms

 returns footype as $$
 begin
   return (10,20);
 end;
 $$ language plpgsql;

 postgres=# select sum(fx2.x) from generate_series(1,10) g(i),
 lateral fx2(i);
sum
 -
  100
 (1 row)

 Time: 941.192 ms

 the following code has the same functionality and is faster

 if (stmt->expr != NULL)
 {
     if (estate->retistuple)
     {
         TupleDesc       tupdesc;
         Datum           retval;
         Oid             rettype;
         bool            isnull;
         int32           tupTypmod;
         Oid             tupType;
         HeapTupleHeader td;
         HeapTupleData   tmptup;

         retval = exec_eval_expr(estate,
                                 stmt->expr,
                                 &isnull,
                                 &rettype);

         /* Source must be of RECORD or composite type */
         if (!type_is_rowtype(rettype))
             ereport(ERROR,
                     (errcode(ERRCODE_DATATYPE_MISMATCH),
                      errmsg("cannot return non-composite value from composite type returning function")));

         if (!isnull)
         {
             /* Source is a tuple Datum, so safe to do this: */
             td = DatumGetHeapTupleHeader(retval);
             /* Extract rowtype info and find a tupdesc */
             tupType = HeapTupleHeaderGetTypeId(td);
             tupTypmod = HeapTupleHeaderGetTypMod(td);
             tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);

             /* Build a temporary HeapTuple control structure */
             tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
             ItemPointerSetInvalid(&(tmptup.t_self));
             tmptup.t_tableOid = InvalidOid;
             tmptup.t_data = td;

             estate->retval = PointerGetDatum(heap_copytuple(&tmptup));
             estate->rettupdesc = CreateTupleDescCopy(tupdesc);
             ReleaseTupleDesc(tupdesc);
         }

         estate->retisnull = isnull;
     }



 * it is too restrictive (maybe) - almost all plpgsql statements do
 automatic conversions (IO conversions when necessary)

 create type footype2 as (a numeric, b varchar)

 postgres=# create or replace function fx3(a int)
 returns footype2 as
 $$
 begin
   return (1000.22234213412342143,'ewqerwqre');
 end;
 $$ language plpgsql;
 CREATE FUNCTION
 postgres=# select fx3(10);
 ERROR:  returned record type does not match expected record type
 DETAIL:  Returned type unknown does not match expected type character
 varying in column 2.
 CONTEXT:  PL/pgSQL function fx3(integer) while casting return value to
 function's return type
 postgres=#
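
 For comparison, a version with explicit casts should go through, since the
 literals are no longer of unknown type (a sketch; fx3b is just an illustrative
 name, not part of the patch):

 create or replace function fx3b(a int)
 returns footype2 as
 $$
 begin
   return (1000.22234213412342143::numeric, 'ewqerwqre'::varchar);
 end;
 $$ language plpgsql;

 select fx3b(10);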

 * it doesn't support RETURN NEXT

 postgres=# create or replace function fx4()
 postgres-# returns setof footype as $$
 postgres$# begin
 postgres$#   for i in 1..10
 postgres$#   loop
 postgres$# return next 

Re: [HACKERS] PQconninfo function for libpq

2012-11-23 Thread Boszormenyi Zoltan

On 2012-11-23 06:30, Fujii Masao wrote:

On Thu, Nov 22, 2012 at 10:05 PM, Boszormenyi Zoltan z...@cybertec.at wrote:

On 2012-11-22 12:44, Magnus Hagander wrote:

Also, a question was buried in the other review which is - are we OK
to remove the requiressl parameter. Both these patches do so, because
the code becomes much simpler if we can do that. It has been
deprecated since 7.2. Is it OK to remove it, or do we need to put
back
in the more complex code to deal with both?

Just going to highlight that we're looking for at least one third
party to comment on this :)


Yes, me too. A +1 for removing wouldn't count from me. ;-)

+1


Thanks. :-)




The second one is the product of what caught my attention while
I was looking at pg_receivexlog. The current coding may write
beyond the end of the allocated arrays and the password may
overwrite a previously set keyword/value pair.

ISTM that such a problem doesn't happen at all because argcount is
incremented as follows.

if (dbhost)
argcount++;
if (dbuser)
argcount++;
if (dbport)
argcount++;


Right, forget about the second patch.



Regards,




--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL

2012-11-23 Thread Amit Kapila
On Thursday, November 22, 2012 10:09 PM Fujii Masao wrote:
 On Thu, Nov 22, 2012 at 9:38 PM, Amit kapila amit.kap...@huawei.com
 wrote:
  Patch to implement SET PERSISTENT command is attached with this mail.
  Now it can be reviewed.
 
 I got the following compile warnings:
 xact.c:1847: warning: implicit declaration of function
 'AtEOXact_UpdateAutoConfFiles'
 guc.c:5134: warning: enumeration value 'PGC_ENUM' not handled in switch
 
 SET PERSISTENT name TO value failed, though SET PERSISTENT name =
 value
 succeeded.
 
 =# SET PERSISTENT synchronous_commit TO 'local';
 ERROR:  syntax error at or near TO
 LINE 1: SET PERSISTENT synchronous_commit TO 'local';
 =# SET PERSISTENT synchronous_commit = 'local';
 SET
 
 SET PERSISTENT succeeded even if invalid setting value was set.
 
 =# SET PERSISTENT synchronous_commit = 'hoge';
 SET
 
 SET PERSISTENT syntax should be explained by \help SET psql command.

Thank you. I shall take care of above in next patch and do more test.
 
 When I remove postgresql.auto.conf, SET PERSISTENT failed.
 
 =# SET PERSISTENT synchronous_commit = 'local';
 ERROR:  failed to open postgresql.auto.conf file
 
There can be 2 ways to handle this: either we recreate the
postgresql.auto.conf file or give an error.
I am not sure, if a user tries to delete internal files, what the exact
behavior should be.
Any suggestion?

 We should implement RESET PERSISTENT? Otherwise, there is no way
 to get rid of the parameter setting from postgresql.auto.conf, via SQL.
 Also
 We should implement SET PERSISTENT name TO DEFAULT?

Till now, I have not implemented this in the patch, thinking that it can be
done as a 2nd part once the basic stuff is ready.
However, I think you are right: without one of RESET PERSISTENT or SET
PERSISTENT name TO DEFAULT, it is difficult for the user
to get rid of a parameter.
Will SET PERSISTENT name TO DEFAULT be sufficient, or do you think both are
necessary, given that RESET PERSISTENT would internally need
to behave similarly?

For implementation of SET PERSISTENT name TO DEFAULT, there can be 2 ways
1) Delete the entry from postgresql.auto.conf
2) Update the entry value in postgresql.auto.conf to default value

In case we just do an update, the user might not be able to modify
postgresql.conf manually once the command is executed.
So I think delete is the better option. Let me know if you think otherwise or
have some other idea to achieve it.
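
As a rough sketch, assuming the syntax ultimately settled on matches what is
being discussed here (neither of the last two forms exists in the patch yet):

SET PERSISTENT synchronous_commit = 'local';    -- writes the setting to postgresql.auto.conf
SET PERSISTENT synchronous_commit TO DEFAULT;   -- proposed: delete (or reset) the entry
RESET PERSISTENT synchronous_commit;            -- alternative proposed spelling of the same thing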

 Is it helpful to output the notice message like 'Run pg_reload_conf() or
 restart the server if you want new settings to take effect' always after
 SET PERSISTENT command?

Not sure, because if someone uses the SET PERSISTENT command, he should know
its effect and behavior.
I think it's good to put this in the user manual along with the PERSISTENT
option explanation.

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: plpgsql return a row-expression

2012-11-23 Thread Pavel Stehule
Hello

2012/11/23 Asif Rehman asifr.reh...@gmail.com:
 Hi,

 I forgot to add documentation changes in the earlier patch. In the attached
 patch, I have added documentation as well as fixed other issues you raised.


ok

so

* applied and compiled cleanly
* All 133 tests passed.
* there are docs
* there are the necessary regress tests
* automatic conversion works like a plpgsql user expects
* there are no performance issues now
* code respects our coding standards
+ the code removes redundant lines

I have no objection

this patch is ready to commit!

Regards

Pavel




 Regards,
 --Asif


 On Thu, Nov 22, 2012 at 10:47 PM, Asif Rehman asifr.reh...@gmail.com
 wrote:

 Thanks for the review Pavel. I have taken care of the points you raised.
 Please see the updated patch.

 Regards,
 --Asif


 On Wed, Nov 21, 2012 at 1:47 AM, Pavel Stehule pavel.steh...@gmail.com
 wrote:

 related to
 http://archives.postgresql.org/message-id/caauglxwpedfwae6dajmf7sxewfusa0f68p07retbbpf_fsa...@mail.gmail.com

 * patched and compiled without warnings

 * All 133 tests passed.


 but

 I don't like

 * call invalid function from anonymous block - it is messy (regress
 tests) - there is no reason to do it

 +create or replace function foo() returns footype as $$
 +declare
 +  v record;
 +  v2 record;
 +begin
 +  v := (1, 'hello');
 +  v2 := (1, 'hello');
 +  return (v || v2);
 +end;
 +$$ language plpgsql;
 +DO $$
 +declare
 +  v footype;
 +begin
 +  v := foo();
 +  raise info 'x = %', v.x;
 +  raise info 'y = %', v.y;
 +end; $$;
 +ERROR:  operator does not exist: record || record
 +LINE 1: SELECT (v || v2)
 +  ^

 * there is some performance issue

 create or replace function fx2(a int)
 returns footype as $$
 declare x footype;
 begin
   x = (10,20);
   return x;
 end;
 $$ language plpgsql;

 postgres=# select sum(fx2.x) from generate_series(1,10) g(i),
 lateral fx2(i);
sum
 -
  100
 (1 row)

 Time: 497.129 ms

 returns footype as $$
 begin
   return (10,20);
 end;
 $$ language plpgsql;

 postgres=# select sum(fx2.x) from generate_series(1,10) g(i),
 lateral fx2(i);
sum
 -
  100
 (1 row)

 Time: 941.192 ms

 the following code has the same functionality and is faster

 if (stmt->expr != NULL)
 {
     if (estate->retistuple)
     {
         TupleDesc       tupdesc;
         Datum           retval;
         Oid             rettype;
         bool            isnull;
         int32           tupTypmod;
         Oid             tupType;
         HeapTupleHeader td;
         HeapTupleData   tmptup;

         retval = exec_eval_expr(estate,
                                 stmt->expr,
                                 &isnull,
                                 &rettype);

         /* Source must be of RECORD or composite type */
         if (!type_is_rowtype(rettype))
             ereport(ERROR,
                     (errcode(ERRCODE_DATATYPE_MISMATCH),
                      errmsg("cannot return non-composite value from composite type returning function")));

         if (!isnull)
         {
             /* Source is a tuple Datum, so safe to do this: */
             td = DatumGetHeapTupleHeader(retval);
             /* Extract rowtype info and find a tupdesc */
             tupType = HeapTupleHeaderGetTypeId(td);
             tupTypmod = HeapTupleHeaderGetTypMod(td);
             tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);

             /* Build a temporary HeapTuple control structure */
             tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
             ItemPointerSetInvalid(&(tmptup.t_self));
             tmptup.t_tableOid = InvalidOid;
             tmptup.t_data = td;

             estate->retval = PointerGetDatum(heap_copytuple(&tmptup));
             estate->rettupdesc = CreateTupleDescCopy(tupdesc);
             ReleaseTupleDesc(tupdesc);
         }

         estate->retisnull = isnull;
     }



 * it is too restrictive (maybe) - almost all plpgsql statements do
 automatic conversions (IO conversions when necessary)

 create type footype2 as (a numeric, b varchar)

 postgres=# create or replace function fx3(a int)
 returns footype2 as
 $$
 begin
   return (1000.22234213412342143,'ewqerwqre');
 end;
 $$ language plpgsql;
 CREATE FUNCTION
 postgres=# select fx3(10);
 ERROR:  returned record type does not match expected record type
 DETAIL:  Returned type unknown does not match expected type 

Re: [HACKERS] PQconninfo function for libpq

2012-11-23 Thread Magnus Hagander
On Fri, Nov 23, 2012 at 6:30 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Nov 22, 2012 at 10:05 PM, Boszormenyi Zoltan z...@cybertec.at wrote:
 On 2012-11-22 12:44, Magnus Hagander wrote:
 Also, a question was buried in the other review which is - are we OK
 to remove the requiressl parameter. Both these patches do so, because
 the code becomes much simpler if we can do that. It has been
 deprecated since 7.2. Is it OK to remove it, or do we need to put
 back
 in the more complex code to deal with both?

 Just going to highlight that we're looking for at least one third
 party to comment on this :)


 Yes, me too. A +1 for removing wouldn't count from me. ;-)

 +1

Actually, with the cleaner code that resulted from the rewrite,
reintroducing it turned out to be pretty darn simple - if I'm not
missing something. PFA a patch that comes *with* requiressl=n
support, without making the code that much more complex.

It also fixes the fact that pg_service.conf was broken by the previous one.

It also includes the small fixes from Zoltan's latest round (the one
that was for libpq, not the one for pg_receivexlog that turned out to
be wrong).

And a pgindent run :)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


PQconninfo-mha-2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-11-23 Thread Amit Kapila
Shouldn't that data be in the shared buffers if not the OS cache, and hence
approximately the same IO will be required?

 

I don't think so, as the data in the OS cache and PG shared buffers don't have
any direct relation; the OS can flush its buffers based on its scheduling
algorithm.

 

Let us try to see by example:

Total RAM - 22G

Database size - 16G

 

Case -1 (Shared Buffers - 5G)

a. Load all the files in OS buffers. Chances are good that all 16G data
will be there in OS buffers as OS has still 17G of memory available.

b. Try to load all in Shared buffers. Last 5G will be there in shared
buffers.

c. Chances are high that access to the remaining 11G will not lead to IO
as it will be in OS buffers.

 

Case -2 (Shared Buffers - 10G)

a. Load all the files in OS buffers. In the best case OS buffers can
contain 10-12G of data as the OS has 12G of memory available.

b. Try to load all in Shared buffers. Last 10G will be there in shared
buffers.

c. Now, as there is no direct correlation of data between shared buffers and
OS buffers, whenever PG has to access any data which is not there in shared
buffers, there is a good chance it will lead to IO.
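
One way to see what is actually sitting in PostgreSQL's shared buffers (the OS
cache is not visible from SQL) is the pg_buffercache contrib module; a rough
sketch, with counts in 8kB pages:

CREATE EXTENSION pg_buffercache;

SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;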

 

 

 Again, the drop in the performance is so severe that it seems worth
investigating that further, especially because you can reproduce it
reliably.

 

   Yes, I agree that it is worth investigating, but IMO this is a different
problem which might not be addressed by the patch in discussion.

The 2 reasons I can think of for the dip in performance when shared buffers
increase beyond a certain threshold percentage of RAM are:

 a. either the algorithm of buffer management has some bottleneck

   b. or the way data is managed in shared buffers and the OS buffer cache

 

The point I want to make is explained at the link below as well:

http://blog.kimiensoftware.com/2011/05/postgresql-vs-oracle-differences-4-shared-memory-usage-257

 

So if the above is true, I think the performance will recover if, in the test,
shared buffers are set to 16G. I shall try that setting in a test run.

 

With Regards,

Amit Kapila.

 



Plugging fd leaks (was Re: [HACKERS] Switching timeline over streaming replication)

2012-11-23 Thread Heikki Linnakangas

On 15.11.2012 17:16, Heikki Linnakangas wrote:

On 15.11.2012 16:55, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

This is a fairly general issue, actually. Looking around, I can see at
least two similar cases in existing code, with BasicOpenFile, where we
will leak file descriptors on error:


Um, don't we automatically clean those up during transaction abort?


Not the ones allocated with PathNameOpenFile or BasicOpenFile. Files
allocated with AllocateFile() and OpenTemporaryFile() are cleaned up at
abort.


If we don't, we ought to think about that, not about cluttering calling
code with certain-to-be-inadequate cleanup in error cases.


Agreed. Cleaning up at end-of-xact won't help walsender or other
non-backend processes, though, because they don't do transactions. But a
top-level ResourceOwner that's reset in the sigsetjmp() cleanup routine
would work.


This is what I came up with. It adds a new function, OpenFile, that 
returns a raw file descriptor like BasicOpenFile, but the file 
descriptor is associated with the current subtransaction and 
automatically closed at end-of-xact, in the same way that AllocateFile 
and AllocateDir work. In other words, OpenFile is to open() what 
AllocateFile is to fopen(). BasicOpenFile is unchanged, it returns a raw 
fd and it's solely the caller's responsibility to close it, but many of 
the places that used to call BasicOpenFile now use the safer OpenFile 
function instead.


This patch plugs three existing fd (or virtual fd) leaks:

1. copy_file() - fixed by using OpenFile instead of BasicOpenFile
2. XLogFileInit() - fixed by adding close() calls to the error cases. 
Can't use OpenFile here because the fd is supposed to persist over 
transaction boundaries.
3. lo_import/lo_export - fixed by using OpenFile instead of 
PathNameOpenFile.


In addition, this replaces many BasicOpenFile() calls with OpenFile() 
that were not leaking, because the code meticulously closed the file on 
error. That wasn't strictly necessary, but IMHO it's good for robustness.


One thing I'm not too fond of is the naming. I'm calling the new 
functions OpenFile and CloseFile. There's some danger of confusion 
there, as the function to close a virtual file opened with 
PathNameOpenFile is called FileClose. OpenFile is really the same kind 
of operation as AllocateFile and AllocateDir, but returns an unbuffered 
fd. So it would be nice if it was called AllocateSomething, too. But 
AllocateFile is already taken. And I don't much like the Allocate* 
naming for these anyway, you really would expect the name to contain open.


Do we want to backpatch this? We've had zero complaints, but this seems 
fairly safe to backpatch, and at least the leak in copy_file() can be 
quite annoying. If you run out of disk space in CREATE DATABASE, the 
target file is kept open even though it's deleted, so the space isn't 
reclaimed until you disconnect.


- Heikki
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index dd69c23..cd60dd8 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -531,7 +531,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruFlush fdata)
 		int			i;
 
 		for (i = 0; i < fdata->num_files; i++)
-			close(fdata->fd[i]);
+			CloseFile(fdata->fd[i]);
 	}
 
 	/* Re-acquire control lock and update page state */
@@ -593,7 +593,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
 	 * SlruPhysicalWritePage).	Hence, if we are InRecovery, allow the case
 	 * where the file doesn't exist, and return zeroes instead.
 	 */
-	fd = BasicOpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
+	fd = OpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
 	if (fd < 0)
 	{
 		if (errno != ENOENT || !InRecovery)
@@ -614,7 +614,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
 	{
 		slru_errcause = SLRU_SEEK_FAILED;
 		slru_errno = errno;
-		close(fd);
+		CloseFile(fd);
 		return false;
 	}
 
@@ -623,11 +623,11 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
 	{
 		slru_errcause = SLRU_READ_FAILED;
 		slru_errno = errno;
-		close(fd);
+		CloseFile(fd);
 		return false;
 	}
 
-	if (close(fd))
+	if (CloseFile(fd))
 	{
 		slru_errcause = SLRU_CLOSE_FAILED;
 		slru_errno = errno;
@@ -740,7 +740,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruFlush fdata)
 		 * don't use O_EXCL or O_TRUNC or anything like that.
 		 */
 		SlruFileName(ctl, path, segno);
-		fd = BasicOpenFile(path, O_RDWR | O_CREAT | PG_BINARY,
+		fd = OpenFile(path, O_RDWR | O_CREAT | PG_BINARY,
 		   S_IRUSR | S_IWUSR);
 		if (fd < 0)
 		{
@@ -773,7 +773,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruFlush fdata)
 		slru_errcause = SLRU_SEEK_FAILED;
 		slru_errno = errno;
 		if (!fdata)
-			close(fd);
+			CloseFile(fd);
 		return false;
 	}
 
@@ -786,7 +786,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruFlush fdata)
 		slru_errcause = 

Re: [HACKERS] fix ecpg core dump when there's a very long struct variable name in .pgc file

2012-11-23 Thread Michael Meskes
On Thu, Nov 22, 2012 at 06:09:20PM +0800, Chen Huajun wrote:
 When using a struct variable whose name is very long, such as 12KB,
 in a .pgc source file,
 ecpg will core dump because of a buffer overflow when precompiling the .pgc file.

How on earth did you run into this? :)

I absolutely agree that this had better be fixed and just committed the second
version of your patch.

Thanks.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] pg_ping utility

2012-11-23 Thread Michael Paquier
On Sat, Nov 17, 2012 at 2:48 AM, Phil Sorber p...@omniti.com wrote:

 On Thu, Nov 15, 2012 at 10:55 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  On Fri, Nov 16, 2012 at 12:34 PM, Phil Sorber p...@omniti.com wrote:
  On Thu, Nov 15, 2012 at 9:23 PM, Michael Paquier
  michael.paqu...@gmail.com wrote:
   3) Having an output close to what ping actually does would also be nice,
   the current output like Accepting/Rejecting Connections are not that
 
  Could you be more specific? Are you saying you don't want to see
  accepting/rejecting info output?
 
  OK sorry.
 
  I meant something like that for an accessible server:
  $ pg_ping -c 3 -h server.com
  PING server.com (192.168.1.3)
  accept from 192.168.1.3: icmp_seq=0 time=0.241 ms
  accept from 192.168.1.3: icmp_seq=0 time=0.240 ms
  accept from 192.168.1.3: icmp_seq=0 time=0.242 ms
 
  Like that for a rejected connection:
  reject from 192.168.1.3: icmp_seq=0 time=0.241 ms
 
  Like that for a timeout:
  timeout from 192.168.1.3: icmp_seq=0
  Then print 1 line for each ping taken to stdout.

 How does icmp_seq fit into this? Or was that an oversight?

 Also, in standard ping if you don't pass -c it will continue to loop
 until interrupted. Would you suggest that pg_ping mimic that, or that
 we add an additional flag for that behavior?

 FWIW, I would use 'watch' with the existing output for cases that I
 would need something like that.

We have waited a couple of days for feedback on this feature. So based on all
the comments provided by everybody on this thread, perhaps we should move
on and implement the options that would make pg_ping a better wrapper for
PQping. Comments?
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-11-23 Thread Kevin Grittner
Merlin Moncure wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 Merlin Moncure mmonc...@gmail.com wrote:

 Hm, I bet it's possible (although probably not easy) to deduce
 volatility from the function body...maybe through the validator.
 If you could do that (perhaps warning in cases where you can't),
 then the performance regression-inducing-argument (which I agree
 with) becomes greatly ameliorated.

 For about the last 10 years the Wisconsin Courts have been parsing
 SQL code to generate Java query classes, including stored
 procedures, and determining information like this. For example,
 we set a readOnly property for the query class by examining the
 statements in the procedure and the readOnly status of each called
 procedure. It wasn't that hard for us, and I'm not sure what would
 make it much harder in PostgreSQL, if we can do it where a parse
 tree for the function is handy.
 
 hm, what do you do about 'after the fact' changes to things the
 procedure body is pointing to? what would the server have to do?

We did a regeneration of the whole set near the end of each release
cycle, and that or smaller changes as needed during the release
cycle. Of course, we didn't have any equivalent of pg_depend.
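
To make the kind of deduction being discussed concrete, here is a sketch (the
function and table names are made up): a body that only reads could in
principle be deduced STABLE, while a body that writes has to stay VOLATILE.

-- only reads: could in principle be deduced STABLE
create function lookup_rate(cur text) returns numeric language sql as
  $$ select rate from exchange_rates where currency = $1 $$;

-- writes: has to stay VOLATILE no matter how it is declared
create function log_hit(p text) returns void language sql as
  $$ insert into hits(page) values ($1) $$;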

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [pgsql-cluster-hackers] Question: Can i cut NON-HOT chain Pointers if there are no concurrent updates?

2012-11-23 Thread Markus Wanner
Henning,

On 11/23/2012 03:17 PM, Henning Mälzer wrote:
 Can somebody help me?

Sure, but you might get better answers on the -hackers mailing list. I'm
redirecting there. The cluster-hackers list is pretty low-volume and has few
subscribers, I think.

 Question:
 What would be the loss if I cut NON-HOT chain pointers, meaning I set
 t_ctid=t_self in the case where it points to a tuple on another page?

READ COMMITTED would stop working correctly in the face of concurrent
transactions, I guess. See the fine manual:

http://www.postgresql.org/docs/devel/static/transaction-iso.html#XACT-READ-COMMITTED

The problem essentially boils down to: READ COMMITTED transactions need
to learn about tuples *newer* than what their snapshot would see.
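
A two-session sketch of the scenario (the accounts table is hypothetical):

-- Session A:
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;  -- creates a newer tuple
                                                           -- version, possibly on another page

-- Session B, READ COMMITTED, running concurrently:
UPDATE accounts SET balance = balance + 1 WHERE id = 1;
-- B blocks on A's row lock.  When A commits, B follows t_ctid from the version
-- its snapshot saw to the newest committed version and re-checks the WHERE
-- clause there (EvalPlanQual).  With the chain cut, B could not find that tuple.

-- Session A:
COMMIT;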

 I am working on a project based on postgres (PostgreSQL) 8.5devel with the
 code from several master's theses before me.

Care to elaborate a bit? Can (part of) that code be released under an
open license?

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-11-23 Thread Alvaro Herrera
Jan,

Are you posting an updated patch?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)

2012-11-23 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 If the bgworker developer gets really tense about this stuff (or
 anything at all, really), they can create a completely new sigmask and
 do sigaddset() etc.  Since this is all C code, we cannot keep them from
 doing anything, really; I think what we need to provide here is just a
 framework to ease development of simple cases.

An important point here is that if a bgworker does need to do its own
signal manipulation --- for example, installing custom signal handlers
--- it would be absolutely catastrophic for us to unblock signals before
reaching worker-specific code; signals might arrive before the process
had a chance to fix their handling.  So I'm against Heikki's auto-unblock
proposal.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Re] [Re] Re: [HACKERS] PANIC: could not write to log file

2012-11-23 Thread Tom Lane
Cyril VELTER cyril.vel...@metadys.com writes:
I am following up on my previous message. Just got two more crashes today, very
 similar to the first ones:

 PANIC:  could not write to log file 118, segment 237 at offset 2686976, 
 length 5578752: Invalid argument
 STATEMENT:  COMMIT
 PANIC:  could not write to log file 118, segment 227 at offset 9764864, 
 length 57344: Invalid argument
 STATEMENT:  COMMIT

 for the record, the first ones are 

 PANIC:  could not write to log file 117, segment 117 at offset 5660672, 
 length 4096000: Invalid argument
 STATEMENT:  COMMIT
 PANIC:  could not write to log file 118, segment 74 at offset 12189696, 
 length 475136: Invalid argument
 STATEMENT:  COMMIT

Hm, those write lengths seem rather large.  What have you got
wal_buffers set to?  Does it help if you back it off to whatever you
were using in the 8.2 installation?  (The default back in 8.2 days
seems to have been 64kB, if that helps.)
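
For comparison, something along these lines can be used to check and then lower
the setting (a sketch; wal_buffers requires a server restart to change):

SHOW wal_buffers;              -- value currently in effect
-- then, in postgresql.conf:
--   wal_buffers = 64kB        -- roughly the old 8.2-era default mentioned above
-- and restart the server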

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL

2012-11-23 Thread Fujii Masao
On Fri, Nov 23, 2012 at 6:56 PM, Amit Kapila amit.kap...@huawei.com wrote:
 When I remove postgresql.auto.conf, SET PERSISTENT failed.

 =# SET PERSISTENT synchronous_commit = 'local';
 ERROR:  failed to open postgresql.auto.conf file

 There can be 2 ways to handle this, either we recreate the
 postgresql.auto.conf file or give error.
 I am not sure, if a user tries to delete internal files, what the exact
 behavior should be.
 Any suggestion?

I prefer to recreate it.

$PGDATA/config_dir is specified in include_dir by default. Users might
create and remove the configuration files in that directory many times.
So I'm not surprised even if a user accidentally removes
postgresql.auto.conf in that directory. Also users might purposely remove
that file to reset all the settings by SET PERSISTENT. So I think that SET
PERSISTENT should handle the case where postgresql.auto.conf doesn't
exist.

We might be able to expect that postgresql.auto.conf is not deleted by
a user if it's in $PGDATA/global or base directory.

 We should implement RESET PERSISTENT? Otherwise, there is no way
 to get rid of the parameter setting from postgresql.auto.conf, via SQL.
 Also
 We should implement SET PERSISTENT name TO DEFAULT?

 Till now, I have not implemented this in patch, thinking that it can be done
 as a 2nd part if basic stuff is ready.
 However I think you are right without one of RESET PERSISTENT or SET
 PERSISTENT name TO DEFAULT, it is difficult for user
 to get rid of parameter.
 Will SET PERSISTENT name TO DEFAULT be sufficient or do you think both are
 necessary, because RESET PERSISTENT also internally might need
 to behave similarly.

 For implementation of SET PERSISTENT name TO DEFAULT, there can be 2 ways
 1) Delete the entry from postgresql.auto.conf
 2) Update the entry value in postgresql.auto.conf to default value

Both seem to be useful. I think that SET ... TO DEFAULT is suitable for
2), and RESET PERSISTENT ... is suitable for 1).


Another comment is:

What happens if the server crashes while SET PERSISTENT is writing the
setting to the file? Would a partial write occur, and would a restart of the
server then fail because of a corrupted postgresql.auto.conf?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Further pg_upgrade analysis for many tables

2012-11-23 Thread Bruce Momjian
On Thu, Nov 15, 2012 at 07:05:00PM -0800, Jeff Janes wrote:
 On Wed, Nov 14, 2012 at 11:49 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Jeff Janes jeff.ja...@gmail.com writes:
  On Thu, Nov 8, 2012 at 9:50 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  There are at least three ways we could whack that mole: ...
 
  * Keep a separate list (or data structure of your choice) so that
  relcache entries created in the current xact could be found directly
  rather than having to scan the whole relcache.  That'd add complexity
  though, and could perhaps be a net loss for cases where the relcache
  isn't so bloated.
 
  Maybe a static list that can overflow, like the ResourceOwner/Lock
  table one recently added.  The overhead of that should be very low.
 
  Are the three places where need_eoxact_work = true; the only places
  where things need to be added to the new structure?
 
  Yeah.  The problem is not so much the number of places that do that,
  as that places that flush entries from the relcache would need to know
  to remove them from the separate list, else you'd have dangling
  pointers.
 
 If the list is of hash-tags rather than pointers, all we would have to
 do is ignore entries that are not still in the hash table, right?
 
 
  On a related thought, it is a shame that create temp table on commit
  drop sets need_eoxact_work, because by the time we get to
  AtEOXact_RelationCache upon commit, the entry is already gone and so
  there is no actual work to do (unless a non-temp table was also
  created).  But on abort, the entry is still there.  I don't know if
  there is an opportunity for optimization there for people who use temp
  tables a lot.  If we go with a caching list, that would render it moot
  unless they use so many as to routinely overflow the cache.

I added the attached C comment last year to mention why temp tables are
not as isolated as we think, and can't be optimized as much as you would
think.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
commit f458c90bff45ecae91fb55ef2b938af37d977af3
Author: Bruce Momjian br...@momjian.us
Date:   Mon Sep 5 22:08:14 2011 -0400

Add C comment about why we send cache invalidation messages for
session-local objects.

diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
new file mode 100644
index 337fe64..98dc3ad
*** a/src/backend/utils/cache/inval.c
--- b/src/backend/utils/cache/inval.c
*** ProcessCommittedInvalidationMessages(Sha
*** 812,817 
--- 812,821 
   * about CurrentCmdInvalidMsgs too, since those changes haven't touched
   * the caches yet.
   *
+  * We still send invalidation messages for session-local objects to other
+  * backends because, while other backends cannot see any tuples, they can
+  * drop tables that are session-local to another session.
+  * 
   * In any case, reset the various lists to empty.  We need not physically
   * free memory here, since TopTransactionContext is about to be emptied
   * anyway.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] False indication in pg_stat_replication.sync_state

2012-11-23 Thread Heikki Linnakangas

On 09.11.2012 15:28, Fujii Masao wrote:

On Fri, Nov 9, 2012 at 4:06 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

Fujii Masao wrote:

On Fri, Oct 19, 2012 at 10:29 PM, Fujii Masao masao.fu...@gmail.com wrote:



However, I've forgotten to treat other three portions in
walsender.c and syncrep.c also does XLogRecPtrIsInvalid(XLogPtr
which comes from WAL receiver).  This new patch includes the
changes for them.


Good catch.


Will any committer pick this up?


If not, please add to next commitfest so that we don't forget.


Yep, I added this to the next CF. This is just a bug fix, so please feel free to
pick it up even before the CF.


Committed, thanks.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: Deparsing DDL command strings

2012-11-23 Thread Pavel Stehule
Hello

2012/11/22 Dimitri Fontaine dimi...@2ndquadrant.fr:
 Hi,

 Thank you Pavel for reviewing my patch! :)

 Pavel Stehule pavel.steh...@gmail.com writes:
 * issues

 ** missing doc

 Yes. I wanted to reach an agreement on what we need the deparsing for.
 You can read the last comment we made about it at the following link; I
 didn't get any answer to my proposal.

   http://archives.postgresql.org/message-id/m2391fu79m@2ndquadrant.fr

 In that email, I said:
 Also, I'm thinking that publishing the normalized command string is
 something that must be maintained in the core code, whereas the external
 stable format might be done as a contrib extension, coded in C, working
 with the Node *parsetree. Because it needs to output a compatible format
 when applied to different major versions of PostgreSQL, I think it suits
 quite well the model of C coded extensions.

 I'd like to hear what people think about that position. I could be
 working on the docs in the meantime, I guess, but a non-trivial amount of what
 I'm going to document is to be thrown away if we end up deciding that we
 don't want the normalized command string…


I agree with you: for PL languages just plain text is the best format -
it is enough for all tasks that, as far as I can imagine, can be solved by
any supported PL. And for complex tasks an exported parse tree is perfect.

My starting point is strategy, so everything should be possible from
PL/pgSQL, which is our most used PL - and there any complex format is
bad; most of the work is doable via plain text and regular expressions.

There is a precedent - EXPLAIN - we support more formats, but in PL we
usually use just the plain format.
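
For instance, something in this direction is already expressible with plain
text from PL/pgSQL (a sketch using only TG_EVENT and TG_TAG; the normalized
command string being discussed is the piece that does not exist yet):

create or replace function report_ddl() returns event_trigger as $$
begin
  raise notice 'event: %, tag: %', TG_EVENT, TG_TAG;
end;
$$ language plpgsql;

create event trigger report_ddl_end
  on ddl_command_end
  execute procedure report_ddl();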



 ** statements are not deparsed for the ddl_command_start event

 Yes, that's by design, and that's why we have the ddl_command_trace
 event alias too. Tom is right in saying that until the catalogs are
 fully populated we can not rewrite most of the command string if we want
 normalized output (types, constraints, namespaces, all the jazz).


ok

 ** CREATE FUNCTION is not supported

 Not yet. The goal of this patch in this CF is to get a green light on
 providing a full implementation of what information to add to event
 triggers and in particular about rewriting command strings.

I don't have an objection. Does anyone else?

Regards

Pavel


 That said, if you think it would help you as a reviewer, I may attack
 the CREATE FUNCTION support right now.

 * some wrong in deparsing - doesn't support constraints

 postgres=# alter table a add column bbbsss int check (bbbsss > 0);
 NOTICE:  event: ddl_command_start, context: TOPLEVEL, tag: ALTER
 TABLE, operation: ALTER, type: TABLE, schema: NULL, name: NULL
 NOTICE:  command: NULL
 NOTICE:  event: ddl_command_end, context: TOPLEVEL, tag: ALTER TABLE,
 operation: ALTER, type: TABLE, schema: public, name: a
 NOTICE:  command: ALTER TABLE public.a ADD COLUMN bbbsss pg_catalog.int4, ;
 ALTER TABLE

 Took me some time to figure out how the constraint processing works in
 CREATE TABLE, and apparently I didn't finish ALTER TABLE support yet.
 Working on that, thanks.




 Regards,
 --
 Dimitri Fontaine
 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] False indication in pg_stat_replication.sync_state

2012-11-23 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 Committed, thanks.

Doesn't seem to have been pushed to master?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] False indication in pg_stat_replication.sync_state

2012-11-23 Thread Heikki Linnakangas

On 23.11.2012 19:55, Tom Lane wrote:

Heikki Linnakangashlinnakan...@vmware.com  writes:

Committed, thanks.


Doesn't seem to have been pushed to master?


On purpose. Per commit message:


9.3 doesn't have this problem because XLogRecPtr is now a single 64-bit
integer, so XLogRecPtrIsInvalid() does the right thing. Apply to 9.2, and
9.1 where pg_stat_replication view was introduced.


I considered applying it to master anyway, just to keep the branches in 
sync. But XLogRecPtrIsInvalid(var) seems slightly more readable than 
XLByteEQ(var, InvalidXLogRecPtr), so I decided not to.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-11-23 Thread Amit kapila
On Friday, November 23, 2012 11:15 AM Pavan Deolasee wrote:
On Thu, Nov 22, 2012 at 2:05 PM, Amit Kapila amit.kap...@huawei.com wrote:



Sorry, I haven't followed this thread at all, but the numbers (43171 and 
57920) in the last two runs of @mv-free-list for 32 clients look like 
aberrations, no?  I wonder if
that's skewing the average.

Yes, that is one of the main reasons, but it is consistent across all runs that
for 32 clients or above these kinds of numbers are observed.
Even Jeff pointed out something similar in one of his mails and suggested
running the tests such that the first test runs “with patch” and then “without
patch”.
After doing what he suggested, the observations are still similar.


Are we convinced that the jump that we are seeing is a real one then ? 

  Still not convinced, as the data has been collected only in my setup. 

I'm a bit surprised because it happens only with the patch and only for 32 
clients. How would you explain that ?

The reason this patch can improve performance is reduced contention for 
BufFreeListLock and the partition lock (which BufferAlloc takes a. to remove an 
old page from the buffer or b. to see if the block is already in the buffer pool) in 
backends. As the number of backends increases, the chances of improved 
performance are much better. In particular, for 32 clients, when the tests run for 
a longer time the results are not that skewed.

For 32 clients, as mentioned in the previous mail, when the test ran for 1 hour, 
the difference is not very skewed.

 32 clients / 32 threads for 1 hour
               @mv-free-lst    @9.3devl
 Single-run:   9842.019229     8050.357981 

 I also looked at the the Results.htm file down thread. There seem to be a 
 steep degradation when the shared buffers are increased from 5GB to 10GB, 
 both with and 
 without the patch. Is that expected ? If so, isn't that worth investigating 
 and possibly even fixing before we do anything else ?

 The reason for the decrease in performance is that when shared buffers are 
 increased from 5GB to 10GB, the I/O starts, as after the increase the OS 
 buffers cannot
 hold all the data.


Shouldn't that data be in the shared buffers if not the OS cache and hence 
approximately same IO will be required?

I don't think so, as the data in the OS cache and PG shared buffers don't have any 
direct relation; the OS can flush its buffers based on its scheduling algorithm.

Let us try to see by example:
Total RAM - 22G
Database size - 16G

Case -1 (Shared Buffers - 5G)
a. Load all the files in OS buffers. Chances are good that all 16G data will be 
there in OS buffers as OS has still 17G of memory available.
b. Try to load all in Shared buffers. Last 5G will be there in shared buffers.
c. Chances are high that remaining 11G buffers access will not lead to IO as 
they will be in OS buffers.

Case -2 (Shared Buffers - 10G)
a. Load all the files in OS buffers. In the best case OS buffers can contain 10-12G of 
data as the OS has 12G of memory available.
b. Try to load all in Shared buffers. Last 10G will be there in shared buffers.
c. Now, as there is no direct correlation of data between shared buffers and OS 
buffers, whenever PG has to access any data
   which is not there in shared buffers, there is a good chance it will 
lead to IO.


 Again, the drop in the performance is so severe that it seems worth 
 investigating that further, especially because you can reproduce it reliably.

   Yes, I agree that it is worth investigating, but IMO this is a different 
problem which might not be addressed by the patch in discussion. 
The 2 reasons I can think of for the dip in performance when shared buffers 
increase beyond a certain threshold percentage of RAM are, 
   a. either the algorithm of buffer management has some bottleneck
   b. or the way data is managed in shared buffers and the OS buffer cache

Any Suggestion/Comments?

With Regards,
Amit Kapila.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-11-23 Thread Josh Kupershmidt
On Wed, Nov 21, 2012 at 5:48 AM, Karl O. Pinc k...@meme.com wrote:

 On Fri, Sep 21, 2012 at 8:54 AM, Karl O. Pinc k...@meme.com wrote:
  On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

 OT:
 After looking at the code I found a number of  conflicting
 option combinations are not tested for or rejected.   I can't
 recall what they are now.  The right way to fix these would be
 to send in a separate patch for each conflict all attached
 to one email/commitfest item?  Or one patch that just gets
 adjusted until it's right?

Typically I think it's easiest for the reviewer+committer to consider
a bunch of such similar changes altogether in one patch, rather than
in a handful of separate patches, though that's just my own
preference.

  
 
  More serious objections were raised regarding semantics.
 
  What if, instead, the initial structure looked like:
 
  CREATE TABLE schemaA.foo
(id PRIMARY KEY, data INT);
 
  CREATE TABLE schemaB.bar
(id INT CONSTRAINT bar_on_foo REFERENCES foo
   , moredata INT);
 
  With a case like this, in most real-world situations, you'd
  have to use pg_restore with --disable-triggers if you wanted
  to use --data-only and --truncate-tables.  The possibility of
  foreign key referential integrity corruption is obvious.

 Why would --disable-triggers be used in this example? I don't think
 you could use --truncate-tables to restore only table foo, because
 you would get:

   ERROR:  cannot truncate a table referenced in a foreign key
 constraint

 (At least, I think so, not having tried with the actual patch.) You
 could use --disable-triggers when restoring bar.

 I tried it and you're quite right.  (I thought I'd tried this
 before, but clearly I did not -- proving the utility of the review
 process.  :-)  My assumption was that since triggers
 were turned off that constraints, being triggers, would be off as
 well and therefore tables with foreign keys could be truncated.
 Obviously not, since I get the above error.

 I just tried it.  --disable-triggers does not disable constraints.

Just to be clear, I believe the problem in this example is that
--disable-triggers does not disable any foreign key constraints of
other tables when you are restoring a single table. So with:

  pg_restore -1 --truncate-tables --disable-triggers --table=foo \
  --data-only my.dump ...

you would get the complaint

  ERROR:  cannot truncate a table referenced in a foreign key constraint

which is talking about bar's referencing foo, and there was no

  ALTER TABLE bar DISABLE TRIGGER ALL

done, since bar isn't being restored. I don't have a quibble with
this existing behavior -- you are instructing pg_restore to only mess
with bar, after all. See below, though, for a comparison of --clean
and --truncate-tables when restoring foo and bar together.

 For your first example, --truncate-tables seems to have some use, so
 that the admin isn't forced to recreate various views which may be
 dependent on the table. (Though it's not too difficult to work around
 this case today.)

 As an aside: I never have an issue with this, having planned ahead.
 I'm curious what the not-too-difficult work-around is that you have
 in mind.  I'm not coming up with a tidy way to, e.g, pull a lot
 of views out of a dump.

Well, for the first example, there was one table and only one view
which depended on the table, so manually editing the list file like
so:

  pg_restore --list --file=my.dump  output.list
  # manually edit file output.list, select only entries pertaining
  # to the table and dependent view
  pg_restore -1 --clean --use-list=output.list ...

isn't too arduous, though it would become so if you had more dependent
views to worry about.

I'm willing to count this use-case as a usability win for
--truncate-tables, since with that option the restore can be boiled
down to a short and sweet:

  pg_restore -1 --data-only --truncate-tables --table=mytable my.dump ...

Though note this may not prove practical for large tables, since
--data-only leaves constraints and indexes intact during the restore,
and can be massively slower compared to adding the constraints only
after the data has been COPYed in, as pg_restore otherwise does.

 For the second example involving FKs, I'm a little bit more hesitant
 about  endorsing the use of --truncate-tables combined with
 --disable-triggers to potentially allow bogus FKs. I know this is
 possible to some extent today using the --disable-triggers option,
 but
 it makes me nervous to be adding a mode to pg_restore if it's
 primarily designed to work together with --disable-triggers as a
 larger foot-gun.

 This is the crux of the issue, and where I was thinking of
 taking this patch.  I very often am of the mindset that
 foreign keys are no more or less special than other
 more complex data integrity rules enforced with triggers.
 (I suppose others might say the same regards to integrity
 enforced at the application level.)  This naturally
 inclines me to think that 

Re: [HACKERS] review: Deparsing DDL command strings

2012-11-23 Thread Dimitri Fontaine
Pavel Stehule pavel.steh...@gmail.com writes:
 * some wrong in deparsing - doesn't support constraints

 postgres=# alter table a add column bbbsss int check (bbbsss > 0);
 NOTICE:  event: ddl_command_start, context: TOPLEVEL, tag: ALTER
 TABLE, operation: ALTER, type: TABLE, schema: NULL, name: NULL
 NOTICE:  command: NULL
 NOTICE:  event: ddl_command_end, context: TOPLEVEL, tag: ALTER TABLE,
 operation: ALTER, type: TABLE, schema: public, name: a
 NOTICE:  command: ALTER TABLE public.a ADD COLUMN bbbsss pg_catalog.int4, ;
 ALTER TABLE

So apparently, to be able to decipher what ALTER TABLE actually did,
you need more than just hooking yourself in after transformAlterTableStmt()
has been done, because the interesting stuff happens in the alter
table work queue; see ATController in src/backend/commands/tablecmds.c
for details.

Exporting that data, I'm now able to implement constraint rewriting this
way:

alter table a add column bbbsss int check (bbbsss > 0);
NOTICE:  AT_AddConstraint: a_bbbsss_check
NOTICE:  event: ddl_command_end, context: TOPLEVEL, tag: ALTER TABLE, 
operation: ALTER, type: TABLE, schema: public, name: a
NOTICE:  command: ALTER TABLE public.a ADD COLUMN bbbsss pg_catalog.int4, ADD 
CONSTRAINT a_bbbsss_check CHECK ((bbbsss > 0));
ALTER TABLE

I did publish that work on the github repository (that I rebase every
time I'm pulling from master, beware):

  https://github.com/dimitri/postgres/compare/evt_add_info

This implementation also shows to me that it's not possible to get the
command string from the parsetree directly nor after the DDL transform
step if you want such things as the constraint name that's been produced
automatically by the backend. And I don't see any way to implement that
from an extension, without first patching the backend.

As that's the kind of code we want to be able to break at will in
between releases (or to fix an important bug in a minor update), I think
we need to have the facility to provide users with the normalized
command string in core.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add -Wlogical-op to standard compiler options?

2012-11-23 Thread Peter Eisentraut
On Wed, 2012-11-21 at 11:46 -0800, Jeff Janes wrote:
 Using gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4),  I get dozens of
 warnings, all apparently coming from somewhere in the MemSet macro. 

OK, reverted.

It looks like they dialed it down to a more useful volume in GCC 4.5,
but that doesn't really help us.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-11-23 Thread Kevin Grittner
Alvaro Herrera wrote:

 Are you posting an updated patch?

Well, there wasn't exactly a consensus on what should change, so I'll
throw some initial review comments out even though I still need to
check some things in the code and do more testing.

The patch applied cleanly, compiled without warnings, and passed all
tests in `make check-world`.

The previous discussion seemed to me to achieve consensus on the need
for the feature, but a general dislike for adding so many new GUC
settings to configure it. I concur on that. I agree with the feelings
of others that just using deadlock_timeout / 10 (probably clamped to
a minimum of 10ms) would be good instead of
autovacuum_truncate_lock_check. I agree that the values of the other
two settings probably aren't too critical as long as they are fairly
reasonable, which I would define as being 20ms to 100ms with
retries lasting no more than 2 seconds.  I'm inclined to suggest a
total time of deadlock_timeout * 2 clamped to 2 seconds, but maybe
that is over-engineering it.

There was also a mention of possibly inserting a vacuum_delay_point()
call outside of the AccessExclusiveLock. I don't feel strongly about
it, but if the page scanning cost can be tracked easily, I guess it
is better to do that.

Other than simplifying the code based on eliminating the new GUCs,
the coding issues I found were:

TRUE and FALSE were used as literals.  Since true and false are used
about 10 times as often and current code in the modified files is
mixed, I would recommend the lowercase form. We decided it wasn't
worth the trouble to convert the minority usage over, but I don't see
any reason to add new instances of the minority case.

In LockHasWaiters() the partition lock is acquired using
LW_EXCLUSIVE. I don't see why that can't be LW_SHARED. Was there a
reason for this, or was it just not changed after copy/paste?

I still need to review the timing calls, since I'm not familiar with
them so it wasn't immediately obvious to me whether they were being
used correctly. I have no reason to believe that they aren't, but
feel I should check.

Also, I want to do another pass looking just for off-by-one errors on
blkno. Again, I have no reason to believe that there is a problem; it
just seems like it would be a particularly easy type of mistake to
make and miss when a key structure has this field:

  BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */

And I want to test more.

But if a new version could be created based on the above, that would
be great. Chances seem relatively slim at this point that there will
be much else to change, if anything.

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] splitting *_desc routines

2012-11-23 Thread Alvaro Herrera
Robert Haas wrote:
 On Thu, Oct 25, 2012 at 4:25 AM, Simon Riggs si...@2ndquadrant.com wrote:
  I'd put these in a separate directory to avoid annoyance. Transam is
  already too large.
 
 +1.  A further advantage of that is that it means that that everything
 in that new directory will be intended to end up as
 all-separately-compilable, which should make it easier to remember
 what the rules are, and easier to design an automated framework that
 continuously verifies that it still works that way.

So incorporating these ideas, the layout now looks like this:

./commands/seq_desc.c
./commands/dbase_desc.c
./commands/tablespace_desc.c
./catalog/storage_desc.c
./utils/cache/relmap_desc.c
./access/rmgrdesc/mxact_desc.c
./access/rmgrdesc/spgdesc.c
./access/rmgrdesc/xact_desc.c
./access/rmgrdesc/heapdesc.c
./access/rmgrdesc/tupdesc.c
./access/rmgrdesc/xlog_desc.c
./access/rmgrdesc/gistdesc.c
./access/rmgrdesc/clog_desc.c
./access/rmgrdesc/hashdesc.c
./access/rmgrdesc/gindesc.c
./access/rmgrdesc/nbtdesc.c
./storage/ipc/standby_desc.c

There are two files that are two subdirs down from the backend
directory:

./storage/ipc/standby_desc.c
./utils/cache/relmap_desc.c

We could keep them in there, or we could alternatively create more
rmgrdesc subdirs for them, so

./storage/rmgrdesc/standby_desc.c
./utils/rmgrdesc/relmap_desc.c

This approach appears more future-proof if we ever want to create desc
routines in other subdirs in storage/ and utils/, though it's a bit
annoying to have subdirs that will only hold a single file each (and a
very tiny file to boot).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Further pg_upgrade analysis for many tables

2012-11-23 Thread Jeff Janes
On Thu, Nov 15, 2012 at 7:05 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Wed, Nov 14, 2012 at 11:49 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Jeff Janes jeff.ja...@gmail.com writes:
 On Thu, Nov 8, 2012 at 9:50 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 There are at least three ways we could whack that mole: ...

 * Keep a separate list (or data structure of your choice) so that
 relcache entries created in the current xact could be found directly
 rather than having to scan the whole relcache.  That'd add complexity
 though, and could perhaps be a net loss for cases where the relcache
 isn't so bloated.

 Maybe a static list that can overflow, like the ResourceOwner/Lock
 table one recently added.  The overhead of that should be very low.

 Are the three places where need_eoxact_work = true; the only places
 where things need to be added to the new structure?

 Yeah.  The problem is not so much the number of places that do that,
 as that places that flush entries from the relcache would need to know
 to remove them from the separate list, else you'd have dangling
 pointers.

 If the list is of hash-tags rather than pointers, all we would have to
 do is ignore entries that are not still in the hash table, right?


I've attached a proof-of-concept patch to implement this.

I got rid of need_eoxact_work entirely and replaced it with a short
list that fulfills the functions of indicating that work is needed,
and suggesting which rels might need that work.  There is no attempt
to prevent duplicates, nor to remove invalidated entries from the
list.   Invalid entries are skipped when the hash entry is not found,
and processing is idempotent so duplicates are not a problem.

Formally speaking, if MAX_EOXACT_LIST were 0, so that the list
overflowed the first time it was accessed, then it would be identical
to the current behavior of having only a flag.  So formally all I did
was increase the max from 0 to 10.
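
To make the shape of this concrete, here is a stripped-down sketch of
the mechanism (names and details are simplified, so treat it as an
illustration of the idea rather than the patch code itself):

/* Simplified sketch of the overflowable end-of-xact work list.
 * Assumes the usual backend environment (postgres.h provides Oid). */
#include "postgres.h"

#define MAX_EOXACT_LIST 10

static Oid  eoxact_list[MAX_EOXACT_LIST];
static int  eoxact_list_len = 0;    /* -1 means the list has overflowed */

/* Remember that this relation may need end-of-xact work. */
static void
eoxact_list_add(Oid relid)
{
    if (eoxact_list_len < 0)
        return;                 /* already overflowed; old flag-only behavior */
    if (eoxact_list_len < MAX_EOXACT_LIST)
        eoxact_list[eoxact_list_len++] = relid;
    else
        eoxact_list_len = -1;   /* overflow: fall back to full relcache scan */
}

/* At end of transaction: if the list overflowed, scan the whole
 * RelationIdCache as before; otherwise look up only the listed OIDs,
 * silently skipping any that are no longer in the hash table. */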

I wasn't so sure about the idempotent nature of subtransaction
processing, so I chickened out and left that part alone.  I know of no
workflow for which that was a bottleneck.

AtEOXact_release is oddly indented because that makes the patch
smaller and easier to read.

This makes the non -1 restore of large dumps very much faster (and
makes them faster than -1 restores, as well).

I added a "create temp table foo (x integer) on commit drop;" line to
the default pgbench transaction and tested that.  I was hoping to see
a performance improvement there as well (the transaction has ~110
entries in the RelationIdCache at eoxact each time), but the
performance was too variable (probably due to the intense IO it
causes) to detect any changes.  At least it is not noticeably slower.
If I hack pgbench to bloat the RelationIdCache by touching 20,000
useless tables as part of the connection start up process, then this
patch does show a win.

It is not obvious what value to set the MAX list size to.  Since this
array is only allocated once per backend, and since it is not groveled
through to invalidate relations at each invalidation, there is no
particular reason it must be small.  But if the same table is assigned
new filenodes (or forced index lists, whatever those are) repeatedly
within a transaction, the list could become bloated with duplicate
entries, potentially becoming even larger than the hash table whose
scan it is intended to short-cut.

In any event, 10 seems to be large enough to overcome the currently
known bottleneck.  Maybe 100 would be a more principled number, as
that is about where the list could start to become as big as the
initial size of the RelationIdCache table.

I don't think this patch replaces having some mechanism for
restricting how large RelationIdCache can get, or for evicting its
least-recently-used entries, as Robert suggested.  But this approach
seems easier to implement and agree upon, and it doesn't preclude
doing other optimizations later.

Cheers,

Jeff


relcache_list_v1.patch
Description: Binary data



Re: [HACKERS] splitting *_desc routines

2012-11-23 Thread Simon Riggs
On 23 November 2012 22:52, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 ./access/rmgrdesc/xact_desc.c
 ./access/rmgrdesc/heapdesc.c

Could we have either all underscores _ or none at all for the naming?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] splitting *_desc routines

2012-11-23 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 So incorporating these ideas, the layout now looks like this:

 ./commands/seq_desc.c
 ./commands/dbase_desc.c
 ./commands/tablespace_desc.c
 ./catalog/storage_desc.c
 ./utils/cache/relmap_desc.c
 ./access/rmgrdesc/mxact_desc.c
 ./access/rmgrdesc/spgdesc.c
 ./access/rmgrdesc/xact_desc.c
 ./access/rmgrdesc/heapdesc.c
 ./access/rmgrdesc/tupdesc.c
 ./access/rmgrdesc/xlog_desc.c
 ./access/rmgrdesc/gistdesc.c
 ./access/rmgrdesc/clog_desc.c
 ./access/rmgrdesc/hashdesc.c
 ./access/rmgrdesc/gindesc.c
 ./access/rmgrdesc/nbtdesc.c
 ./storage/ipc/standby_desc.c

FWIW, I'd vote for dumping all of these into *one* rmgrdesc directory
(which may as well be under access/ since that's where the xlog code is),
regardless of where the corresponding replay code is in the source tree.
I don't think splitting them up into half a dozen directories adds
anything except confusion.  If you want, the header comment for each
file could mention where the corresponding replay code lives.

regards, tom lane




Use of fsync; was Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-23 Thread Bruce Momjian
 On Mon, Nov 19, 2012 at 12:11:26PM -0800, Jeff Janes wrote:

[ Sorry for the delay in replying.]

 On Wed, Nov 14, 2012 at 3:55 PM, Bruce Momjian br...@momjian.us wrote:
  On Mon, Nov 12, 2012 at 10:29:39AM -0800, Jeff Janes wrote:
 
  Is turning off synchronous_commit enough?  What about turning off fsync?
 
  I did some testing with the attached patch on a magnetic disk with no
  BBU that turns off fsync;
 
 With which file system? I wouldn't expect you to see a benefit with
 ext2 or ext3; it seems to be a peculiarity of ext4 that it doesn't
 group fsyncs of new file creations but rather does each one serially.
 Whether it is worth applying a fix that is only needed for that one
 file system, I don't know.  The trade-offs are not all that clear to
 me yet.

That only ext4 shows the difference seems possible.

   I got these results
 
    # tables   sync_com=off   fsync=off
           1          15.90       13.51
        1000          26.09       24.56
        2000          33.41       31.20
        4000          57.39       57.74
        8000         102.84      116.28
       16000         189.43      207.84
 
  It shows fsync=off faster for < 4k, and slower for > 4k.  Not sure why
  this is the case, but perhaps the buffering of the fsync is actually
  faster than doing a no-op fsync.
 
 synchronous-commit=off turns off not only the fsync at each commit,
 but also the write-to-kernel at each commit; so it is not surprising
 that it is faster at large scale.  I would specify both
 synchronous-commit=off and fsync=off.

I would like to see actual numbers showing synchronous-commit=off is
also useful if we use fsync=off.

  When I'm doing a pg_upgrade with thousands of tables, the shutdown
  checkpoint after restoring the dump to the new cluster takes a very
  long time, as the writer drains its operation table by opening and
  individually fsync-ing thousands of files.  This takes about 40 ms per
  file, which I assume is a combination of slow lap-top disk drive, and
  a strange deal with ext4 which makes fsyncing a recently created file
  very slow.   But even with faster hdd, this would still be a problem
  if it works the same way, with every file needing 4 rotations to be
  fsynced, and this happening serially.
 
  Is this with the current code that does synchronous_commit=off?  If not,
  can you test to see if this is still a problem?
 
 Yes, it is with synchronous_commit=off. (or if it wasn't originally,
 it is now, with the same result)
 
 Applying your fsync patch does solve the problem for me on ext4.
 Having the new cluster be on ext3 rather than ext4 also solves the
 problem, without the need for a patch; but it would be nice to be more
 friendly to ext4, which is popular even though not recommended.

Do you have numbers with synchronous-commit=off, fsync=off, and both, on
ext4?

  Anyway, the reason I think turning fsync off might be reasonable is
  that as soon as the new cluster is shut down, pg_upgrade starts
  overwriting most of those just-fsynced file with other files from the
  old cluster, and AFAICT makes no effort to fsync them.  So until there
  is a system-wide sync after the pg_upgrade finishes, your new cluster
  is already in mortal danger anyway.
 
  pg_upgrade does a cluster shutdown before overwriting those files.
 
 Right.  So as far as the cluster is concerned, those files have been
 fsynced.  But then the next step is to go behind the cluster's back and
 replace those fsynced files with different files, which may or may not
 have been fsynced.  This is what makes me think the new cluster is in
 mortal danger.  Not only have the new files perhaps not been fsynced,
 but the cluster is not even aware of this fact, so you can start it
 up, and then shut it down, and it still won't bother to fsync them,
 because as far as it is concerned they already have been.
 
 Given that, how much extra danger would be added by having the new
 cluster schema restore run with fsync=off?
 
 In any event, I think the documentation should caution that the
 upgrade should not be deemed to be a success until after a system-wide
 sync has been done.  Even if we use the link rather than copy method,
 are we sure that that is safe if the directories recording those links
 have not been fsynced?

OK, the above is something I have been thinking about, and obviously you
have too.  If you change fsync from off to on in a cluster, and restart
it, there is no guarantee that the dirty pages you read from the kernel
are actually on disk, because Postgres doesn't know they are dirty. 
They probably will be pushed to disk by the kernel in less than one
minute, but still, it doesn't seem reliable. Should this be documented
in the fsync section?

Again, another reason not to use fsync=off, though your example of the
file copy is a good one.  As you stated, this is a problem with the file
copy/link, independent of how Postgres handles the files.  We can tell
people to use 'sync' as root on Unix, but what about Windows?
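
For what it's worth, on POSIX systems making a single copied or linked
file durable means fsyncing both the file and the directory that holds
its entry, roughly as in the sketch below (purely illustrative, not
pg_upgrade code, and with no direct Windows equivalent):

#include <fcntl.h>
#include <libgen.h>
#include <string.h>
#include <unistd.h>

/* Illustrative only: fsync a file and then the directory containing it. */
static int
fsync_file_and_dir(const char *path)
{
    char    dirbuf[1024];
    int     fd;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    close(fd);

    /* dirname() may modify its argument, so work on a copy */
    strncpy(dirbuf, path, sizeof(dirbuf) - 1);
    dirbuf[sizeof(dirbuf) - 1] = '\0';

    fd = open(dirname(dirbuf), O_RDONLY);
    if (fd < 0)
        return -1;
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}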


[HACKERS] Re: In pg_upgrade, copy fsm, vm, and extent files by checking for fi

2012-11-23 Thread Bruce Momjian
On Wed, Nov 14, 2012 at 06:28:41PM -0500, Bruce Momjian wrote:
 On Wed, Nov 14, 2012 at 06:15:30PM -0500, Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   On Wed, Nov 14, 2012 at 05:39:29PM -0500, Tom Lane wrote:
   You would have been better off keeping the array and sorting it so you
   could use binary search, instead of passing the problem off to the
   filesystem.
  
   Well, testing showed using open() was a big win.
  
  ... on the filesystem you tested on.  I'm concerned that it might not
  look so good on other platforms.
 
 True. I am on ext3.  So I need to generate a proof-of-concept patch and
 have others test it?

OK, test patch attached.  The patch will only work if your database uses
a single tablespace.  Doing multiple tablespaces seemed too complex for
the test patch.

Here are my results:

# tables        git   bsearch patch
       1      11.16           10.99
    1000      19.13           19.26
    2000      26.78           27.11
    4000      43.81           42.15
    8000      79.96           77.38
   16000     165.26          162.24
   32000     378.18          368.49
   64000    1083.35         1086.77
 
As you can see, the bsearch code doesn't seem to make much difference.
Sometimes it is faster, other times slower --- it seems to be just
measurement noise.  This code uses the same method of file lookup as git
head, meaning it looks for specific files rather than patterns --- this
simplified the bsearch code.

Can anyone see a consistent improvement with the bsearch patch? 
Attached is my test shell script.  There is no reason we can't use
bsearch(), except not using it allows us to remove the pg_upgrade
directory listing function.
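
For reference, the bsearch side of the comparison amounts to sorting
the directory listing once and probing it, along these lines (a sketch
of the general approach, not the actual patch code; namelist/count are
as returned by load_directory()):

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

static int
cmp_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* Sort the listing once, up front. */
static void
sort_listing(char **namelist, int count)
{
    qsort(namelist, count, sizeof(char *), cmp_names);
}

/* Then each existence check is a binary search instead of an open(). */
static bool
name_in_listing(char **namelist, int count, const char *name)
{
    return bsearch(&name, namelist, count, sizeof(char *), cmp_names) != NULL;
}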
 
-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/file.c b/contrib/pg_upgrade/file.c
new file mode 100644
index d8cd8f5..a5d92c6
*** a/contrib/pg_upgrade/file.c
--- b/contrib/pg_upgrade/file.c
*** copy_file(const char *srcfile, const cha
*** 221,226 
--- 221,281 
  #endif
  
  
+ /*
+  * load_directory()
+  *
+  * Read all the file names in the specified directory, and return them as
+  * an array of char * pointers.  The array address is returned in
+  * *namelist, and the function result is the count of file names.
+  *
+  * To free the result data, free each (char *) array member, then free the
+  * namelist array itself.
+  */
+ int
+ load_directory(const char *dirname, char ***namelist)
+ {
+ 	DIR		   *dirdesc;
+ 	struct dirent *direntry;
+ 	int			count = 0;
+ 	int			allocsize = 64;		/* initial array size */
+ 
+ 	*namelist = (char **) pg_malloc(allocsize * sizeof(char *));
+ 
+ 	if ((dirdesc = opendir(dirname)) == NULL)
+ 		pg_log(PG_FATAL, "could not open directory \"%s\": %s\n",
+ 			   dirname, getErrorText(errno));
+ 
+ 	while (errno = 0, (direntry = readdir(dirdesc)) != NULL)
+ 	{
+ 		if (count >= allocsize)
+ 		{
+ 			allocsize *= 2;
+ 			*namelist = (char **)
+ 		pg_realloc(*namelist, allocsize * sizeof(char *));
+ 		}
+ 
+ 		(*namelist)[count++] = pg_strdup(direntry->d_name);
+ 	}
+ 
+ #ifdef WIN32
+ 	/*
+ 	 * This fix is in mingw cvs (runtime/mingwex/dirent.c rev 1.4), but not in
+ 	 * released version
+ 	 */
+ 	if (GetLastError() == ERROR_NO_MORE_FILES)
+ 		errno = 0;
+ #endif
+ 
+ 	if (errno)
+ 		pg_log(PG_FATAL, "could not read directory \"%s\": %s\n",
+ 			   dirname, getErrorText(errno));
+ 
+ 	closedir(dirdesc);
+ 
+ 	return count;
+ }
+ 
+ 
  void
  check_hard_link(void)
  {
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
new file mode 100644
index ace56e5..0248d40
*** a/contrib/pg_upgrade/pg_upgrade.h
--- b/contrib/pg_upgrade/pg_upgrade.h
***
*** 7,12 
--- 7,13 
  
  #include <unistd.h>
  #include <assert.h>
+ #include <dirent.h>
  #include <sys/stat.h>
  #include <sys/time.h>
  
*** const char *setupPageConverter(pageCnvCt
*** 365,370 
--- 366,372 
  typedef void *pageCnvCtx;
  #endif
  
+ int			load_directory(const char *dirname, char ***namelist);
  const char *copyAndUpdateFile(pageCnvCtx *pageConverter, const char *src,
    const char *dst, bool force);
  const char *linkAndUpdateFile(pageCnvCtx *pageConverter, const char *src,
diff --git a/contrib/pg_upgrade/relfilenode.c b/contrib/pg_upgrade/relfilenode.c
new file mode 100644
index 7dbaac9..b5b7380
*** a/contrib/pg_upgrade/relfilenode.c
--- b/contrib/pg_upgrade/relfilenode.c
***
*** 18,24 
  static void transfer_single_new_db(pageCnvCtx *pageConverter,
  	   FileNameMap *maps, int size);
  static void transfer_relfile(pageCnvCtx *pageConverter, FileNameMap *map,
! 			 const char *suffix);
  
  
  /*
--- 18,24 
  static void transfer_single_new_db(pageCnvCtx *pageConverter,
  	   FileNameMap *maps, int size);
  static void transfer_relfile(pageCnvCtx *pageConverter, FileNameMap 

Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL

2012-11-23 Thread Amit kapila
On Friday, November 23, 2012 10:10 PM Fujii Masao wrote:
On Fri, Nov 23, 2012 at 6:56 PM, Amit Kapila amit.kap...@huawei.com wrote:
 When I remove postgresql.auto.conf, SET PERSISTENT failed.

 =# SET PERSISTENT synchronous_commit = 'local';
 ERROR:  failed to open postgresql.auto.conf file

 There can be two ways to handle this: either we recreate the
 postgresql.auto.conf file or we raise an error.
 I am not sure what the exact behavior should be if a user tries to delete
 internal files.
 Any suggestions?

 I prefer to recreate it.

$PGDATA/config_dir is specified in include_dir by default. Users might
create and remove the configuration files in that directory many times.
So I'm not surprised even if a user accidentally removes
postgresql.auto.conf in that directory. Also users might purposely remove
that file to reset all the settings by SET PERSISTENT. 

One of the suggestions in this mail chain was that if the above happens, then
at restart it should display/log a message.
However, I think that if such a situation happens, we should provide some
mechanism to users so that settings made through the command continue to
work.

 So I think that SET
PERSISTENT should handle the case where postgresql.auto.conf doesn't
exist.

If we directly create a new file, the user might not be aware that the
settings made by previous commands will no longer take effect.
So how about also displaying a NOTICE when SET PERSISTENT has to create the
file internally?
The notice should say that the settings made by previous commands were lost
due to manual deletion of the internal file.


We might be able to expect that postgresql.auto.conf is not deleted by
a user if it's in $PGDATA/global or base directory.

So do you mean that if we don't find the file in config_dir, then we should
search in $PGDATA/global or the base directory?
Can't we mandate to the user that, even if it is deleted, the file is only
expected in config_dir?


 We should implement RESET PERSISTENT? Otherwise, there is no way
 to get rid of the parameter setting from postgresql.auto.conf, via SQL.
 Also
 We should implement SET PERSISTENT name TO DEFAULT?

 Till now, I have not implemented this in the patch, thinking that it can be
 done as a second part once the basic stuff is ready.
 However, I think you are right: without one of RESET PERSISTENT or SET
 PERSISTENT name TO DEFAULT, it is difficult for the user to get rid of a
 parameter.
 Will SET PERSISTENT name TO DEFAULT be sufficient, or do you think both are
 necessary, given that RESET PERSISTENT might internally need to behave
 similarly?

 For implementation of SET PERSISTENT name TO DEFAULT, there can be 2 ways
 1) Delete the entry from postgresql.auto.conf
 2) Update the entry value in postgresql.auto.conf to default value

Both seem to be useful. I think that SET ... TO DEFAULT is suitable for
2), and RESET PERSISTENT ... is suitable for 1).

For other commands, the behavior of SET ... TO DEFAULT and RESET ... is the
same.  In the docs it is mentioned that RESET is an alternative spelling for
SET ... TO DEFAULT.

However, it can still be done the way you are suggesting.
Does anyone else object? Otherwise, I will go ahead with it.

 Another comment is:

 What happens if the server crashes while SET PERSISTENT is writing the
 setting to the file? A partial write occurs and restart of the server would 
 fail
 because of corrupted postgresql.auto.conf?

This situation will not happen, as the SET PERSISTENT command first writes to
a .lock file and then, at commit time,
renames it to .auto.conf.
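
A minimal sketch of that write-then-rename sequence (illustrative only;
the file names follow the description above, but this is not the patch
code) would be:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write the complete new contents to a temporary file, fsync it, and
 * atomically rename it into place, so a crash can never leave a
 * half-written postgresql.auto.conf behind. */
static int
write_auto_conf(const char *contents)
{
    const char *tmpname = "postgresql.auto.conf.lock";
    const char *destname = "postgresql.auto.conf";
    size_t      len = strlen(contents);
    int         fd;

    fd = open(tmpname, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, contents, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        unlink(tmpname);
        return -1;
    }
    close(fd);

    /* rename() is atomic on POSIX: readers see either the old or new file */
    return rename(tmpname, destname);
}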

With Regards,
Amit Kapila.
