Re: [HACKERS] pg_upgrade segfaults when given an invalid PGSERVICE value
On 13-03-26 12:40 AM, Tom Lane wrote:
Bruce Momjian br...@momjian.us writes: On Mon, Mar 25, 2013 at 07:07:42PM -0400, Tom Lane wrote: Well, plan B would be to invent a replacement function that does have the ability to return an error message, but that seems like a lot of work for a problem that's so marginal that it wasn't noticed till now. (It's not so much creating the function that worries me, it's fixing clients to use it.) Plan C would be to redefine a bogus value of PGSERVICE as not an error, period.

Given all of these poor options, is defining a PQconndefaults() failure as perhaps out of memory or a service file problem really not better?

Uh ... no. In the first place, what evidence have you got that those are (and will continue to be) the only two possible causes? In the second place, this still requires changing every client of PQconndefaults(), even if it's only to the extent of fixing their error message texts. If we're going to do that, I'd rather ask them to change to a more future-proof solution.

So to summarise:

Plan A: The first patch I attached for pg_upgrade, plus documentation changes, and changing the other places that call PQconndefaults() to accept failures from either out of memory or an invalid PGSERVICE.

Plan B: Create a new function, PQconndefaults2(char *errorBuffer) or something similar, that returns error information to the caller.

Plan C: PQconndefaults() just ignores an invalid service, but connection attempts still fail because the other callers of conninfo_add_defaults continue to treat a service lookup failure as an error. This is the second patch I sent.

Plan D: Service lookup failures are always ignored by conninfo_add_defaults. If you attempt to connect with a bad PGSERVICE set, it behaves as if no PGSERVICE value was set. I don't think anyone has explicitly proposed this yet.

Plan D is the only option that I'm opposed to; it would affect a lot more applications than just the ones that call PQconndefaults(), and I feel it would confuse users. I'm not convinced plan B is worth the effort of having to maintain two versions of PQconndefaults() for a while to fix a corner case.
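For Plan A, the only burden on callers is to stop assuming a specific cause for a NULL return. A minimal sketch of what a defensive PQconndefaults() caller looks like (illustrative only; this is not taken from any of the utilities mentioned in this thread):

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PQconninfoOption *opts = PQconndefaults();
        PQconninfoOption *opt;

        if (opts == NULL)
        {
            /*
             * NULL can currently mean out of memory *or* a failed service
             * lookup (e.g. an invalid PGSERVICE), so the message should not
             * claim a specific cause.
             */
            fprintf(stderr, "could not get default connection options\n");
            return 1;
        }

        for (opt = opts; opt->keyword != NULL; opt++)
            printf("%s = %s\n", opt->keyword, opt->val ? opt->val : "");

        PQconninfoFree(opts);
        return 0;
    }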
Re: [HACKERS] pg_upgrade segfaults when given an invalid PGSERVICE value
On 13-03-20 05:54 PM, Tom Lane wrote: Steve Singer ssin...@ca.afilias.info writes: From a end-user expectations point of view I am okay with somehow marking the structure returned by PQconndefaults in a way that the connect calls will later fail. Unless the program changes the value of PGSERVICE, surely all subsequent connection attempts will fail for the same reason, regardless of what PQconndefaults returns? regards, tom lane So your proposing we do something like the attached patch? Where we change conninfo_add_defaults to ignore an invalid PGSERVICE if being called by PQconndefaults() but keep the existing behaviour in other contexts where it is actually being used to establish a connection? In this case even if someone takes the result of PQconndefaults and uses that to build connection options for a new connection it should fail when it does the pgservice lookup when establishing the connection. That sounds reasonable to me. Steve diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c index ed67759..c1d8d69 100644 *** a/contrib/pg_upgrade/server.c --- b/contrib/pg_upgrade/server.c *** check_pghost_envvar(void) *** 300,306 /* Get valid libpq env vars from the PQconndefaults function */ start = PQconndefaults(); ! for (option = start; option-keyword != NULL; option++) { if (option-envvar (strcmp(option-envvar, PGHOST) == 0 || --- 300,309 /* Get valid libpq env vars from the PQconndefaults function */ start = PQconndefaults(); ! if (start == NULL) ! { ! pg_log(PG_FATAL,can not get default connection options out of memory\n); ! } for (option = start; option-keyword != NULL; option++) { if (option-envvar (strcmp(option-envvar, PGHOST) == 0 || diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index eea9c6b..0a9a29e 100644 *** a/src/interfaces/libpq/fe-connect.c --- b/src/interfaces/libpq/fe-connect.c *** static PQconninfoOption *conninfo_array_ *** 347,353 const char *const * values, PQExpBuffer errorMessage, bool use_defaults, int expand_dbname); static bool conninfo_add_defaults(PQconninfoOption *options, ! PQExpBuffer errorMessage); static PQconninfoOption *conninfo_uri_parse(const char *uri, PQExpBuffer errorMessage, bool use_defaults); static bool conninfo_uri_parse_options(PQconninfoOption *options, --- 347,354 const char *const * values, PQExpBuffer errorMessage, bool use_defaults, int expand_dbname); static bool conninfo_add_defaults(PQconninfoOption *options, ! PQExpBuffer errorMessage, ! bool ignore_missing_service); static PQconninfoOption *conninfo_uri_parse(const char *uri, PQExpBuffer errorMessage, bool use_defaults); static bool conninfo_uri_parse_options(PQconninfoOption *options, *** PQconndefaults(void) *** 875,881 connOptions = conninfo_init(errorBuf); if (connOptions != NULL) { ! if (!conninfo_add_defaults(connOptions, errorBuf)) { PQconninfoFree(connOptions); connOptions = NULL; --- 876,882 connOptions = conninfo_init(errorBuf); if (connOptions != NULL) { ! if (!conninfo_add_defaults(connOptions, errorBuf,true)) { PQconninfoFree(connOptions); connOptions = NULL; *** conninfo_parse(const char *conninfo, PQE *** 4243,4249 */ if (use_defaults) { ! if (!conninfo_add_defaults(options, errorMessage)) { PQconninfoFree(options); return NULL; --- 4244,4250 */ if (use_defaults) { ! if (!conninfo_add_defaults(options, errorMessage,false)) { PQconninfoFree(options); return NULL; *** conninfo_array_parse(const char *const * *** 4395,4401 */ if (use_defaults) { ! 
if (!conninfo_add_defaults(options, errorMessage)) { PQconninfoFree(options); return NULL; --- 4396,4402 */ if (use_defaults) { ! if (!conninfo_add_defaults(options, errorMessage,false)) { PQconninfoFree(options); return NULL; *** conninfo_array_parse(const char *const * *** 4416,4422 * error condition here --- we just leave the option's value as NULL. */ static bool ! conninfo_add_defaults(PQconninfoOption *options, PQExpBuffer errorMessage) { PQconninfoOption *option; char *tmp; --- 4417,4424 * error condition here --- we just leave the option's value as NULL. */ static bool ! conninfo_add_defaults(PQconninfoOption *options, PQExpBuffer errorMessage, ! bool ignore_missing_service) { PQconninfoOption *option; char *tmp; *** conninfo_add_defaults(PQconninfoOption * *** 4425,4431 * If there's a service spec, use it to obtain any not-explicitly-given * parameters. */ ! if (parseServiceInfo(options, errorMessage) != 0) return false
Re: [HACKERS] pg_upgrade segfaults when given an invalid PGSERVICE value
On 13-03-20 02:17 PM, Bruce Momjian wrote:
On Wed, Mar 20, 2013 at 01:30:20PM -0400, Tom Lane wrote: While this surely isn't the nicest answer, it doesn't seem totally unreasonable to me. A bad service name indeed does not contribute anything to the set of defaults available.

I think the concern is that the services file could easily change the defaults that are used for connecting, though you could argue that the real defaults for a bad service entry are properly returned.

Yes, my concern is that if I have a typo in the value of PGSERVICE I don't want to end up getting a connection to localhost instead of an error. From an end-user expectations point of view I am okay with somehow marking the structure returned by PQconndefaults in a way that the connect calls will later fail.
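For context, the behaviour being defended here is that an actual connection attempt with a bogus PGSERVICE fails with an error rather than silently falling back to built-in defaults. A minimal sketch (it assumes PGSERVICE is set in the environment to a nonexistent service name):

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /* An empty conninfo string lets libpq pick up PGSERVICE and friends. */
        PGconn *conn = PQconnectdb("");

        if (PQstatus(conn) != CONNECTION_OK)
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));

        PQfinish(conn);
        return 0;
    }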
Re: [HACKERS] pg_upgrade segfaults when given an invalid PGSERVICE value
On 13-03-18 09:17 PM, Bruce Momjian wrote:
On Mon, Mar 18, 2013 at 12:08:09PM -0400, Steve Singer wrote: If you try running pg_upgrade with the PGSERVICE environment variable set to some invalid/non-existent service, pg_upgrade segfaults:

Program received signal SIGSEGV, Segmentation fault.
0x0040bdd1 in check_pghost_envvar () at server.c:304
304         for (option = start; option->keyword != NULL; option++)
(gdb) p start
$5 = (PQconninfoOption *) 0x0

PQconndefaults can return NULL if it has issues. The attached patch prints a minimally useful error message. I don't see a good way of getting a more useful error message out of PQconndefaults(). I checked this against master but it was reported to me as an issue in 9.2.

Well, that's interesting. There is no mention of PQconndefaults() returning NULL except for out of memory:

    Returns a connection options array. This can be used to determine all
    possible PQconnectdb options and their current default values. The
    return value points to an array of PQconninfoOption structures, which
    ends with an entry having a null keyword pointer. The null pointer is
    returned if memory could not be allocated. Note that the current
    default values (val fields) will depend on environment variables and
    other context. Callers must treat the connection options data as
    read-only.

Looking at libpq/fe-connect.c::PQconndefaults(), it calls conninfo_add_defaults(), which has this:

    /*
     * If there's a service spec, use it to obtain any not-explicitly-given
     * parameters.
     */
    if (parseServiceInfo(options, errorMessage) != 0)
        return false;

so it is clearly possible for PQconndefaults() to return NULL for service file failures. The questions are:

* Is this what we want? What other choices do we have?

I don't think PQconndefaults() should continue on as if PGSERVICE wasn't set in the environment after a failure from parseServiceInfo.

* Should we document this?

Yes, the documentation should indicate that PQconndefaults() can return NULL for more than just memory failures.

* Should we change this to just throw a warning?

How would we communicate warnings from PQconndefaults() back to the caller?

Also, it seems pg_upgrade isn't the only utility that is confused: contrib/postgres_fdw/option.c and contrib/dblink/dblink.c think PQconndefaults() returning NULL means out of memory and report that as the error string. bin/scripts/pg_isready.c and contrib/pg_upgrade/server.c have no check for a NULL return. libpq/test/uri-regress.c knows to throw a generic error message. So, we have some decisions and work to do.
[HACKERS] pg_upgrade segfaults when given an invalid PGSERVICE value
If you try running pg_upgrade with the PGSERVICE environment variable set to some invalid/non-existent service, pg_upgrade segfaults:

Program received signal SIGSEGV, Segmentation fault.
0x0040bdd1 in check_pghost_envvar () at server.c:304
304         for (option = start; option->keyword != NULL; option++)
(gdb) p start
$5 = (PQconninfoOption *) 0x0

PQconndefaults can return NULL if it has issues. The attached patch prints a minimally useful error message. I don't see a good way of getting a more useful error message out of PQconndefaults(). I checked this against master but it was reported to me as an issue in 9.2.

Steve

diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
index ed67759..f2e4d63 100644
*** a/contrib/pg_upgrade/server.c
--- b/contrib/pg_upgrade/server.c
*** check_pghost_envvar(void)
*** 300,306
  	/* Get valid libpq env vars from the PQconndefaults function */
  	start = PQconndefaults();
! 	for (option = start; option->keyword != NULL; option++)
  	{
  		if (option->envvar && (strcmp(option->envvar, "PGHOST") == 0 ||
--- 300,309
  	/* Get valid libpq env vars from the PQconndefaults function */
  	start = PQconndefaults();
! 	if (start == NULL)
! 	{
! 		pg_log(PG_FATAL, "can not get default connection options\n");
! 	}
  	for (option = start; option->keyword != NULL; option++)
  	{
  		if (option->envvar && (strcmp(option->envvar, "PGHOST") == 0 ||
Re: [HACKERS] transforms
On 13-03-03 08:13 PM, Josh Berkus wrote: This (creating the extensions) works fine for me on a Ubuntu 10.x system Now if only we can work out the combinatorics issue ...

The plpython2u extensions worked fine, but I haven't been able to get this to work with plpython3u (python 3.1).

create extension hstore_plpython3u;
ERROR: could not load library /usr/local/pgsql93git/lib/hstore_plpython3.so: /usr/local/pgsql93git/lib/hstore_plpython3.so: undefined symbol: _Py_NoneStruct

Peter mentioned in the submission that the unit tests don't pass with python3; it doesn't work for me outside the tests either.

Steve
Re: [HACKERS] transforms
On 13-03-03 06:15 PM, Josh Berkus wrote: transforms=# create extension hstore_plperl; ERROR: could not load library /home/josh/pg93/lib/postgresql/hstore_plperl.so: /home/josh/pg93/lib/postgresql/hstore_plperl.so: undefined symbol: hstoreUniquePairs STATEMENT: create extension hstore_plperl; This surprised me, because make check for the extensions passed fine. Oh, this is on Ubuntu 12.10, not OSX. So possibly the fixes you made to fix linking on OSX broke other platforms. This (creating the extensions) works fine for me on a Ubuntu 10.x system template1=# create database test; CREATE DATABASE template1=# \c test You are now connected to database test as user ssinger. test=# create extension hstore; CREATE EXTENSION test=# create extension hstore_plpythonu; ERROR: required extension plpythonu is not installed STATEMENT: create extension hstore_plpythonu; ERROR: required extension plpythonu is not installed test=# create extension plpythonu; CREATE EXTENSION test=# create extension hstore_plpythonu; CREATE EXTENSION test=# test=# create extension plperl; CREATE EXTENSION test=# create extension hstore_plperl; CREATE EXTENSION test=# create extension plperlu; CREATE EXTENSION test=# create extension hstore_plperlu; CREATE EXTENSION test=# -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] json generation enhancements
On 13-02-25 05:37 PM, Andrew Dunstan wrote: On 02/24/2013 01:09 AM, Steve Singer wrote: On 13-01-11 11:03 AM, Andrew Dunstan wrote: On 01/11/2013 11:00 AM, Andrew Dunstan wrote: I have not had anyone follow up on this, so I have added docs and will add this to the commitfest. Recap: This adds the following: json_agg(anyrecord) - json to_json(any) - json hstore_to_json(hstore) - json (also used as a cast) hstore_to_json_loose(hstore) - json Also, in json generation, if any non-builtin type has a cast to json, that function is used instead of the type's output function. This time with a patch. Here is a review of this patch., Overview - This patch adds a set of functions to convert types to json. Specifically * An aggregate that take the elements and builds up a json array. * Conversions from an hstore to json For non-builtin types the text conversion is used unless a cast has specifically been defined from the type to json. There was some discussion last year on this patch that raised three issues a) Robert was concerned that if someone added a cast from a non-core type like citext to json that it would change the behaviour of this function. No one else offered an opinion on this at the time. I don't see this as a problem, if I add a cast between citext (or any other non-core datatype) to json I would expect it to effect how that datatype is generated as a json object in any functions that generate json. I don't see why this would be surprising behaviour. If I add a cast between my datatype and json to generate a json representation that differs from the textout representation then I would expect that representation to be used in the json_agg function as well. b) There was some talk in the json bikeshedding thread that json_agg() should be renamed to collect_json() but more people preferred json_agg(). c) Should an aggregate of an empty result produce NULL or an empty json element. Most people who commented on the thread seemed to feel that NULL is preferred because it is consistent with other aggregates I feel that the functionality of this patch is fine. Testing --- When I try running the regression tests with this patch I get: creating template1 database in /usr/local/src/postgresql/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not create unique index pg_proc_oid_index DETAIL: Key (oid)=(3171) is duplicated. child process exited with exit code 1 oid 3170, 3171 and 3172 appears to be used by lo_tell64 and lo_lseek64 If I fix those up the regression tests pass. I tried using the new functions in a few different ways and everything worked as expected. Code Review --- in contrib/hstore/hstore_io.c + /* not an int - try a double */ + double doubleres = strtod(src-data,endptr); + if (*endptr == '\0') + is_number = true; + else if (false) + { + /* shut the compiler up about unused variables */ + longres = (long) doubleres; + longres = longres / 2; I dislike that we have to do this to avoid compiler warnings. I am also worried the some other compiler might decide that the else if (false) is worthy of a warning. Does anyone have cleaner way of getting rid of the warning we get if we don't store the strtol/strtod result? Should we do something like: (void) ( strtol(src-data,endptr,10)+1); Other than that I don't see any issues. Thanks for the review. Revised patch attached with fresh OIDs and numeric detection code cleaned up along the lines you suggest. The opr_test unit test still fails. I think you forgot to include your update to pg_aggregate.h. See the attached diff. 
Other than that it looks fine. Craig is satisfied with the casting behaviour and no one else has objected to it. I think you're good to commit this.

Steve

diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h
index 3c5f59b..6fb10a9 100644
*** a/src/include/catalog/pg_aggregate.h
--- b/src/include/catalog/pg_aggregate.h
*** DATA(insert ( 3538 string_agg_transfn st
*** 233,239
  DATA(insert ( 3545 bytea_string_agg_transfn bytea_string_agg_finalfn 0 2281 _null_ ));
  /* json */
! DATA(insert ( 3172 json_agg_transfn json_agg_finalfn 0 2281 _null_ ));
  /*
   * prototypes for functions in pg_aggregate.c
--- 233,239
  DATA(insert ( 3545 bytea_string_agg_transfn bytea_string_agg_finalfn 0 2281 _null_ ));
  /* json */
! DATA(insert ( 3175 json_agg_transfn json_agg_finalfn 0 2281 _null_ ));
  /*
   * prototypes for functions
Re: [HACKERS] json generation enhancements
On 13-01-11 11:03 AM, Andrew Dunstan wrote: On 01/11/2013 11:00 AM, Andrew Dunstan wrote: I have not had anyone follow up on this, so I have added docs and will add this to the commitfest. Recap: This adds the following: json_agg(anyrecord) - json to_json(any) - json hstore_to_json(hstore) - json (also used as a cast) hstore_to_json_loose(hstore) - json Also, in json generation, if any non-builtin type has a cast to json, that function is used instead of the type's output function. This time with a patch. Here is a review of this patch., Overview - This patch adds a set of functions to convert types to json. Specifically * An aggregate that take the elements and builds up a json array. * Conversions from an hstore to json For non-builtin types the text conversion is used unless a cast has specifically been defined from the type to json. There was some discussion last year on this patch that raised three issues a) Robert was concerned that if someone added a cast from a non-core type like citext to json that it would change the behaviour of this function. No one else offered an opinion on this at the time. I don't see this as a problem, if I add a cast between citext (or any other non-core datatype) to json I would expect it to effect how that datatype is generated as a json object in any functions that generate json. I don't see why this would be surprising behaviour. If I add a cast between my datatype and json to generate a json representation that differs from the textout representation then I would expect that representation to be used in the json_agg function as well. b) There was some talk in the json bikeshedding thread that json_agg() should be renamed to collect_json() but more people preferred json_agg(). c) Should an aggregate of an empty result produce NULL or an empty json element. Most people who commented on the thread seemed to feel that NULL is preferred because it is consistent with other aggregates I feel that the functionality of this patch is fine. Testing --- When I try running the regression tests with this patch I get: creating template1 database in /usr/local/src/postgresql/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not create unique index pg_proc_oid_index DETAIL: Key (oid)=(3171) is duplicated. child process exited with exit code 1 oid 3170, 3171 and 3172 appears to be used by lo_tell64 and lo_lseek64 If I fix those up the regression tests pass. I tried using the new functions in a few different ways and everything worked as expected. Code Review --- in contrib/hstore/hstore_io.c + /* not an int - try a double */ + double doubleres = strtod(src-data,endptr); + if (*endptr == '\0') + is_number = true; + else if (false) + { + /* shut the compiler up about unused variables */ + longres = (long) doubleres; + longres = longres / 2; I dislike that we have to do this to avoid compiler warnings. I am also worried the some other compiler might decide that the else if (false) is worthy of a warning. Does anyone have cleaner way of getting rid of the warning we get if we don't store the strtol/strtod result? Should we do something like: (void) ( strtol(src-data,endptr,10)+1); Other than that I don't see any issues. Steve cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
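As a standalone illustration of the (void)-cast approach suggested above, the number check could be structured roughly like this (a sketch only, not the actual hstore_io.c code; empty-string and overflow handling are ignored):

    #include <stdlib.h>
    #include <stdbool.h>

    /*
     * Sketch: decide whether a string looks like a number by letting
     * strtol/strtod do the parsing and only inspecting endptr.  The parsed
     * values are discarded with an explicit (void) cast so the compiler
     * does not complain about unused results.
     */
    static bool
    looks_like_number(const char *s)
    {
        char   *endptr;

        (void) strtol(s, &endptr, 10);
        if (*endptr == '\0')
            return true;            /* parsed cleanly as an integer */

        (void) strtod(s, &endptr);
        return (*endptr == '\0');   /* parsed cleanly as a double? */
    }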
Re: [HACKERS] PL/Python result object str handler
On 13-01-07 09:58 PM, Peter Eisentraut wrote: By implementing a str handler for the result object, it now prints something like PLyResult status=5 nrows=2 rows=[{'foo': 1, 'bar': '11'}, {'foo': 2, 'bar': '22'}] Patch attached for review.

Here is a review:

This patch adds a function that pl/python functions can call to convert a query result hash into a string suitable for debug purposes. The use case for this feature is primarily debugging and logging; I feel that this is useful since a lot of debugging of stored functions is usually done with print/elog style debugging.

There was already some discussion on the thread about whether the number of rows printed should be limited; the consensus seemed to be 'no', since someone would be unhappy with any limit and printing everything is the same behaviour you get with the standard python print.

I've tested this with python 2.6 and 3.1 and it seems to work as described. I've looked through the code and everything looks fine.

The patch includes no documentation. Adding a few lines to the Utility Functions section of the plpython documentation so people know about this feature would be good.

Other than that I think it is fine to commit. I am setting this as ready for committer; I assume you'll commit this yourself and that you can add a paragraph to the docs as you commit it.

Steve
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 13-01-28 06:17 AM, Andres Freund wrote: Hi, 3. Pass the delete (with no key values) onto the replication client and let it deal with it (see 1 and 2) Hm. While I agree that nicer behaviour would be good I think the real enforcement should happen on a higher level, e.g. with event triggers or somesuch. It seems way too late to do anything about it when we're already decoding. The transaction will already have committed... Ideally the first line of enforcement would be with event triggers. The thing with user-level mechanisms for enforcing things is that they sometimes can be disabled or by-passed. I don't have a lot of sympathy for people who do this but I like the idea of at least having the option coding defensively to detect the situation and whine to the user. How do you plan on dealing with sequences? I don't see my plugin being called on sequence changes and I don't see XLOG_SEQ_LOG listed in DecodeRecordIntoReorderBuffer. Is there a reason why this can't be easily added? I basically was hoping for Simon's sequence-am to get in before doing anything real here. That didn't really happen yet. I am not sure whether there's a real usecase in decoding normal XLOG_SEQ_LOG records, their content isn't all that easy to interpet unless youre rather familiar with pg's innards. So, adding support wouldn't hard from a technical pov but it seems the semantics are a bit hard to nail down. Also what do we want to do about TRUNCATE support. I could always leave a TRUNCATE trigger in place that logged the truncate to a sl_truncates and have my replication daemon respond to the insert on a sl_truncates table by actually truncating the data on the replica. I have planned to add some generic table_rewrite handling, but I have to admit I haven't thought too much about it yet. Currently if somebody rewrites a table, e.g. with an ALTER ... ADD COLUMN .. DEFAULT .. or ALTER COLUMN ... USING ..., you will see INSERTs into a temporary table. That basically seems to be a good thing, but the user needs to be told about that ;) I've spent some time this weekend updating my prototype plugin that generates slony 2.2 style COPY output. I have attached my progress here (also https://github.com/ssinger/slony1-engine/tree/logical_repl). I have not gotten as far as modifying slon to act as a logical log receiver, or made a version of the slony apply trigger that would process these changes. I only gave it a quick look and have a couple of questions and remarks. The way you used the options it looks like youre thinking of specifying all the tables as options? I would have thought those would get stored queried locally and only something like the 'replication set' name or such would be set as an option. The way slony works today is that the list of tables to pull for a SYNC comes from the subscriber because the subscriber might be behind the provider, where a table has been removed from the set in the meantime. The subscriber still needs to receive data from that table until it is caught up to the point where that removal happens. Having a time-travelled version of a user table (sl_table) might fix that problem but I haven't yet figured out how that needs to work with cascading (since that is a feature of slony today I can't ignore the problem). I'm also not sure how that will work with table renames. Today if the user renames a table inside of an EXECUTE SCRIPT slony will update the name of the table in sl_table. This type of change wouldn't be visible (yet) in the time-travelled catalog. 
There might be a solution to this yet, but I haven't figured it out. Sticking with what slony does today seemed easier as a first step.

Iterating over a list with

    for (i = 0; i < options->length; i = i + 2)
    {
        DefElem *def_schema = (DefElem *) list_nth(options, i);

is not a good idea btw, that's quadratic in complexity ;)

Thanks, I'll rewrite this to walk a list of ListCell objects with next.

In the REORDER_BUFFER_CHANGE_UPDATE case I suggest using relation->rd_primary, just as in the DELETE case; that should always give you a consistent candidate key in an efficient manner.

I haven't looked into the details of what is involved in setting up a subscription with the snapshot exporting. That hopefully shouldn't be too hard... At least that's the idea :P

I couldn't get the options on the START REPLICATION command to parse, so I just hard coded some list building code in the init method. I do plan on passing the list of tables to replicate from the replica to the plugin (because this list comes from the replica). Passing what could be a few thousand table names as a list of arguments is a bit ugly, and I admit my list processing code is rough. Does this make us want to reconsider the format of the option_list?

Yea, something's screwed up there, sorry. Will push a fix later today.

I guess I should provide an opinion on whether I think that the patch in this CF, if committed
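For reference, a linear rewrite of that loop might look something like this (a rough fragment against the backend List API of that era, with illustrative variable names; it assumes the options arrive as schema/table pairs as in the prototype plugin):

    ListCell   *cell = list_head(options);

    while (cell != NULL)
    {
        DefElem    *def_schema = (DefElem *) lfirst(cell);
        ListCell   *next = lnext(cell);
        DefElem    *def_table;

        if (next == NULL)
            break;              /* odd number of options; ignore the last one */
        def_table = (DefElem *) lfirst(next);

        /* ... record (def_schema, def_table) as a table to replicate ... */

        cell = lnext(next);
    }

This walks the list once, so it is O(n) instead of the O(n^2) behaviour of calling list_nth() inside the loop.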
Re: [HACKERS] logical changeset generation v4
On 13-01-28 06:23 AM, Andres Freund wrote: The CF is also there to find UI warts and such, so something like this seems perfectly fine. Even more so as it doesn't look like this will get into 9.3 anyway. I wanted to add such an option, but I was too lazy^Wbusy to think about the semantics. Your current syntax doesn't really allow arguments to be specified in a nice way. I was thinking of -o name=value and allowing multiple specifications of -o to build the option string. Any arguments against that?

Multiple -o options sound fine to me.

 	/* Initiate the replication stream at specified location */
-	snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION '%s' %X/%X",
-			 slot, (uint32) (startpos >> 32), (uint32) startpos);
+	snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION '%s' %X/%X (%s)",
+			 slot, (uint32) (startpos >> 32), (uint32) startpos, plugin_opts);

ISTM that (%s) shouldn't be specified when there are no options, but as the options need to be pre-escaped anyway, that looks like a non-problem in a bit more complete implementation.

Greetings, Andres Freund
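A minimal sketch of how repeated -o name=value arguments could be folded into one option string (the helper name and memory handling here are purely illustrative and not part of the posted patch; out-of-memory handling is omitted):

    #include <stdlib.h>
    #include <string.h>
    #include <stdio.h>

    /*
     * Hypothetical helper: append one -o NAME=VALUE argument to a
     * comma-separated option string, so -o can be given more than once.
     */
    static char *
    append_plugin_option(char *opts, const char *newopt)
    {
        char   *result;

        if (opts == NULL || opts[0] == '\0')
            return strdup(newopt);

        result = malloc(strlen(opts) + strlen(newopt) + 2);
        sprintf(result, "%s,%s", opts, newopt);
        free(opts);
        return result;
    }

Each 'o' case in the getopt loop would then just do plugin_opts = append_plugin_option(plugin_opts, optarg).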
Re: [HACKERS] Event Triggers: adding information
On 13-01-26 11:11 PM, Robert Haas wrote: On Fri, Jan 25, 2013 at 11:58 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: My understanding is that if the command string we give to event triggers is ambiguous (sub-object names, schema qualifications, etc), it comes useless for logical replication use. I'll leave it to the consumers of that to speak up now. Yeah, that's probably true. I think it might be useful for other purposes, but I think we need a bunch of infrastructure we don't have yet to make logical replication of DDL a reality. I agree. Does anyone have a specific use case other than DDL replication where an ambiguous command string would be useful? Even for use cases like automatically removing a table from replication when it is dropped, I would want to be able to determine which table is being dropped unambiguously. Could I determine that from an oid? I suspect so, but parsing a command string and then trying to figure out the table from the search_path doesn't sound very appealing. Well, the point is that if you have a function that maps a parse tree onto an object name, any API or ABI changes can be reflected in an updated definition for that function. So suppose I have the command CREATE TABLE public.foo (a int). And we have a call pg_target_object_namespace(), which will return public given the parse tree for the foregoing command. And we have a call pg_target_object_name(), which will return foo. We can whack around the underlying parse tree representation all we want and still not break anything - because any imaginable parse tree representation will allow the object name and object namespace to be extracted. Were that not possible it could scarcely be called a parse tree any longer. How do you get the fully qualified type of the first column? col1=pg_target_get_column(x, 0) pg_target_get_type(col1); or something similar. I think that could work but we would be adding a lot of API functions to get all the various bits of info one would want the API to expose. I also suspect executing triggers that had to make lots of function calls to walk a tree would be much slower than an extension that could just walk the parse-tree or some other abstract tree like structure. Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v4
On 13-01-22 11:30 AM, Andres Freund wrote: Hi, I pushed a new rebased version (the xlogreader commit made it annoying to merge). The main improvements are * way much coherent code internally for intializing logical rep * explicit control over slots * options for logical replication Exactly what is the syntax for using that. My reading your changes to repl_gram.y make me think that any of the following should work (but they don't). START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1') ERROR: syntax error: unexpected character ( START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1' 'val1') ERROR: syntax error: unexpected character ( START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1','opt2') ERROR: syntax error: unexpected character ( I'm also attaching a patch to pg_receivellog that allows you to specify these options on the command line. I'm not saying I think that it is appropriate to be adding more bells and whistles to the utilities two weeks into the CF but I found this useful for testing so I'm sharing it. Greetings, Andres Freund -- Andres Freundhttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 176087bacec6cbf0b86e4ffeb918f41b4a5b8d7a Mon Sep 17 00:00:00 2001 From: Steve Singer ssin...@ca.afilias.info Date: Sun, 27 Jan 2013 12:24:33 -0500 Subject: [PATCH] allow pg_receivellog to pass plugin options from the command line to the plugin --- src/bin/pg_basebackup/pg_receivellog.c | 14 ++ 1 files changed, 10 insertions(+), 4 deletions(-) diff --git a/src/bin/pg_basebackup/pg_receivellog.c b/src/bin/pg_basebackup/pg_receivellog.c index 04bedbe..30b3cea 100644 --- a/src/bin/pg_basebackup/pg_receivellog.c +++ b/src/bin/pg_basebackup/pg_receivellog.c @@ -54,7 +54,7 @@ static XLogRecPtr startpos; static bool do_init_slot = false; static bool do_start_slot = false; static bool do_stop_slot = false; - +static const char * plugin_opts=; static void usage(void); static void StreamLog(); @@ -84,6 +84,7 @@ usage(void) printf(_( -s, --status-interval=INTERVAL\n time between status packets sent to server (in seconds)\n)); printf(_( -S, --slot=SLOTuse existing replication slot SLOT instead of starting a new one\n)); + printf(_( -o --options=OPTIONS A comma separated list of options to the plugin\n)); printf(_(\nAction to be performed:\n)); printf(_( --init initiate a new replication slot (for the slotname see --slot)\n)); printf(_( --startstart streaming in a replication slot (for the slotname see --slot)\n)); @@ -264,8 +265,8 @@ StreamLog(void) slot); /* Initiate the replication stream at specified location */ - snprintf(query, sizeof(query), START_LOGICAL_REPLICATION '%s' %X/%X, - slot, (uint32) (startpos 32), (uint32) startpos); + snprintf(query, sizeof(query), START_LOGICAL_REPLICATION '%s' %X/%X (%s), + slot, (uint32) (startpos 32), (uint32) startpos,plugin_opts); res = PQexec(conn, query); if (PQresultStatus(res) != PGRES_COPY_BOTH) { @@ -560,6 +561,7 @@ main(int argc, char **argv) {init, no_argument, NULL, 1}, {start, no_argument, NULL, 2}, {stop, no_argument, NULL, 3}, + {options,required_argument,NULL,'o'}, {NULL, 0, NULL, 0} }; int c; @@ -584,7 +586,7 @@ main(int argc, char **argv) } } - while ((c = getopt_long(argc, argv, f:nvd:h:p:U:wWP:s:S:, + while ((c = getopt_long(argc, argv, f:nvd:h:p:U:wWP:s:S:o:, long_options, option_index)) != -1) { switch (c) @@ -659,6 +661,10 @@ main(int argc, char **argv) case 3: do_stop_slot = true; break; + case 'o': +if(optarg != NULL) + plugin_opts = pg_strdup(optarg); +break; /* action */ default: -- 1.7.0.4 -- Sent via pgsql-hackers 
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 13-01-24 11:15 AM, Steve Singer wrote: On 13-01-24 06:40 AM, Andres Freund wrote: Fair enough. I am also working on a user of this infrastructure but that doesn't help you very much. Steve Singer seemed to make some stabs at writing an output plugin as well. Steve, how far did you get there? I was able to get something that generated output for INSERT statements in a format similar to what a modified slony apply trigger would want. This was with the list of tables to replicate hard-coded in the plugin. This was with the patchset from the last commitfest.I had gotten a bit hung up on the UPDATE and DELETE support because slony allows you to use an arbitrary user specified unique index as your key. It looks like better support for tables with a unique non-primary key is in the most recent patch set. I am hoping to have time this weekend to update my plugin to use parameters passed in on the init and other updates in the most recent version. If I make some progress I will post a link to my progress at the end of the weekend. My big issue is that I have limited time to spend on this. A few more comments; In decode.c DecodeDelete + if (r-xl_len = (SizeOfHeapDelete + SizeOfHeapHeader)) + { + elog(DEBUG2, huh, no primary key for a delete on wal_level = logical?); + return; + } + I think we should be passing delete's with candidate key data logged to the plugin. If the table isn't a replicated table then ignoring the delete is fine. If the table is a replicated table but someone has deleted the unique index from the table then the plugin will receive INSERT changes on the table but not DELETE changes. If this happens the plugin would have any way of knowing that it is missing delete changes. If my plugin gets passed a DELETE change record but with no key data then my plugin could do any of 1. Start screaming for help (ie log errors) 2. Drop the table from replication 3. Pass the delete (with no key values) onto the replication client and let it deal with it (see 1 and 2) Also, 'huh' isn't one of our standard log message phrases :) How do you plan on dealing with sequences? I don't see my plugin being called on sequence changes and I don't see XLOG_SEQ_LOG listed in DecodeRecordIntoReorderBuffer. Is there a reason why this can't be easily added? Also what do we want to do about TRUNCATE support. I could always leave a TRUNCATE trigger in place that logged the truncate to a sl_truncates and have my replication daemon respond to the insert on a sl_truncates table by actually truncating the data on the replica. I've spent some time this weekend updating my prototype plugin that generates slony 2.2 style COPY output. I have attached my progress here (also https://github.com/ssinger/slony1-engine/tree/logical_repl). I have not gotten as far as modifying slon to act as a logical log receiver, or made a version of the slony apply trigger that would process these changes. I haven't looked into the details of what is involved in setting up a subscription with the snapshot exporting. I couldn't get the options on the START REPLICATION command to parse so I just hard coded some list building code in the init method. I do plan on pasing the list of tables to replicate from the replica to the plugin (because this list comes from the replica). Passing what could be a few thousand table names as a list of arguments is a bit ugly and I admit my list processing code is rough. Does this make us want to reconsider the format of the option_list ? 
I guess should provide an opinion on if I think that the patch in this CF, if committed could be used to act as a source for slony instead of the log trigger. The biggest missing piece I mentioned in my email yesterday, that we aren't logging the old primary key on row UPDATEs. I don't see building a credible replication system where you don't allow users to update any column of a row. The other issues I've raised (DecodeDelete hiding bad deletes, replication options not parsing for me) look like easy fixes no wal decoding support for sequences or truncate are things that I could work around by doing things much like slony does today. The SYNC can still capture the sequence changes in a table (where the INSERT's would be logged) and I can have a trigger capture truncates. I mostly did this review from the point of view of someone trying to use the feature, I haven't done a line-by-line review of the code. I suspect Andres can address these issues and get an updated patch out during this CF. I think a more detailed code review by someone more familiar with postgres internals will reveal a handful of other issues that hopefully can be fixed without a lot of effort. If this were the only patch in the commitfest I would encourage Andres to push to get these changes done. If the standard for CF4 is that a patch needs to be basically in a commitable state
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 13-01-24 11:15 AM, Steve Singer wrote: On 13-01-24 06:40 AM, Andres Freund wrote: Fair enough. I am also working on a user of this infrastructure but that doesn't help you very much. Steve Singer seemed to make some stabs at writing an output plugin as well. Steve, how far did you get there?

I was able to get something that generated output for INSERT statements in a format similar to what a modified slony apply trigger would want. This was with the list of tables to replicate hard-coded in the plugin. This was with the patchset from the last commitfest. I had gotten a bit hung up on the UPDATE and DELETE support because slony allows you to use an arbitrary user specified unique index as your key. It looks like better support for tables with a unique non-primary key is in the most recent patch set. I am hoping to have time this weekend to update my plugin to use parameters passed in on the init and other updates in the most recent version. If I make some progress I will post a link to my progress at the end of the weekend. My big issue is that I have limited time to spend on this.

This isn't a complete review, just a few questions I've hit so far that I thought I'd ask to see if I'm missing something related to updates.

*** a/src/include/catalog/index.h
--- b/src/include/catalog/index.h
*** extern bool ReindexIsProcessingHeap(Oid
*** 114,117
--- 114,121
  extern bool ReindexIsProcessingIndex(Oid indexOid);
  extern Oid IndexGetRelation(Oid indexId, bool missing_ok);
+ extern void relationFindPrimaryKey(Relation pkrel, Oid *indexOid,
+ 			int16 *nratts, int16 *attnums, Oid *atttypids,
+ 			Oid *opclasses);
+
  #endif   /* INDEX_H */

I don't see this defined anywhere; could it be left over from a previous version of the patch?

In decode.c DecodeUpdate:

+
+ 	/*
+ 	 * FIXME: need to get/save the old tuple as well if we want primary key
+ 	 * changes to work.
+ 	 */
+ 	change->newtuple = ReorderBufferGetTupleBuf(reorder);

I also don't see any code in heap_update to find and save the old primary key values like you added to heap_delete. You didn't list "Add ability to change the primary key on an UPDATE" in the TODO, so I'm wondering if I'm missing something. Is there another way I can get the primary key values for the old_tuple?

Also, I think the name of the test contrib module was changed but you didn't update the makefile. This fixes it:

diff --git a/contrib/Makefile b/contrib/Makefile
index 1cc30fe..36e6bfe 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -50,7 +50,7 @@ SUBDIRS = \
 	tcn \
 	test_parser \
 	test_decoding \
-	test_logical_replication \
+	test_logical_decoding \
 	tsearch2 \
 	unaccent \
 	vacuumlo \
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 13-01-24 06:40 AM, Andres Freund wrote: Fair enough. I am also working on a user of this infrastructure but that doesn't help you very much. Steve Singer seemed to make some stabs at writing an output plugin as well. Steve, how far did you get there? I was able to get something that generated output for INSERT statements in a format similar to what a modified slony apply trigger would want. This was with the list of tables to replicate hard-coded in the plugin. This was with the patchset from the last commitfest.I had gotten a bit hung up on the UPDATE and DELETE support because slony allows you to use an arbitrary user specified unique index as your key. It looks like better support for tables with a unique non-primary key is in the most recent patch set. I am hoping to have time this weekend to update my plugin to use parameters passed in on the init and other updates in the most recent version. If I make some progress I will post a link to my progress at the end of the weekend. My big issue is that I have limited time to spend on this. BTW, why does all the transaction reordering stuff has to be in core? It didn't use to, but people argued pretty damned hard that no undecoded data should ever allowed to leave the postgres cluster. And to be fair it makes writing an output plugin *way* much easier. Check http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/test_decoding/test_decoding.c;hb=xlog-decoding-rebasing-cf4 If you skip over tuple_to_stringinfo(), which is just pretty generic scaffolding for converting a whole tuple to a string, writing out the changes in some format by now is pretty damn simple. I think we will find that the replication systems won't be the only users of this feature. I have often seen systems that have a logging requirement for auditing purposes or to log then reconstruct the sequence of changes made to a set of tables in order to feed a downstream application. Triggers and a journaling table are the traditional way of doing this but it should be pretty easy to write a plugin to accomplish the same thing that should give better performance. If the reordering stuff wasn't in core this would be much harder. How much of this infrastructure is to support replicating DDL changes? IOW, if we drop that requirement, how much code can we slash? Unfortunately I don't think too much unless we add in other code that allows us to check whether the current definition of a table is still the same as it was back when the tuple was logged. Any other features or requirements that could be dropped? I think it's clear at this stage that this patch is not going to be committed as it is. If you can reduce it to a fraction of what it is now, that fraction might have a chance. Otherwise, it's just going to be pushed to the next commitfest as whole, and we're going to be having the same doubts and discussions then. One thing that reduces complexity is to declare the following as unsupported: - CREATE TABLE foo(data text); - DECODE UP TO HERE; - INSERT INTO foo(data) VALUES(very-long-to-be-externally-toasted-tuple); - DROP TABLE foo; - DECODE UP TO HERE; but thats just a minor thing. 
I think what we can do more realistically than to chop of required parts of changeset extraction is to start applying some of the preliminary patches independently: - the relmapper/relfilenode changes + pg_relation_by_filenode(spc, relnode) should be independently committable if a bit boring - allowing walsenders to connect to a database possibly needs an interface change but otherwise it should be fine to go in independently. It also has other potential use-cases, so I think thats fair. - logging xl_running_xact's more frequently could also be committed independently and makes sense independently as it allows a standby to enter HS faster if the master is busy - Introducing InvalidCommandId should be relatively uncontroversial. The fact that no invalid value for command ids exists is imo an oversight - the *Satisfies change could be applied and they are imo ready but there's no use-case for it without the rest, so I am not sure whether theres a point - currently not separately available, but we could add wal_level=logical independently. There would be no user of it, but it would be partial work. That includes the relcache support for keeping track of the primary key which already is available separately. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Event Triggers: adding information
On 13-01-24 05:43 AM, Dimitri Fontaine wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Jan 21, 2013 at 12:27 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: - Context exposing and filtering

I'm not feeling very sanguine about any of this. I feel strongly that we should try to do something that's more principled than just throwing random junk into ProcessUtility.

The goal is to allow for advanced users of that feature to get at the sequence, constraints and index names that have been picked automatically by PostgreSQL when doing some CREATE TABLE statement that involves them:

CREATE TABLE foo (id serial PRIMARY KEY, f1 text);

Andres and Steve, as you're the next ones to benefit directly from that whole feature and at different levels, do you have a strong opinion on whether you really need that to operate at the logical level?

I haven't been following this thread very closely and I'm not exactly sure what you're asking.

If the question is what a replication system would need to replicate DDL commands, then I need a method of translating whatever the DDL trigger is passed into either the 'CREATE TABLE public.foo(id serial primary key, f1 text)' or some set of equivalent commands that will allow me to create the same objects on the replica. I don't have any brilliant insight on how I would go about doing it. Honestly I haven't spent a lot of time thinking about it in the past 8 months.

If you're asking what I would need for a trigger to automatically add a new table to replication, then I think I would only need to know (or be able to obtain) the fully qualified name of the table, and then in another trigger call be given the fully qualified name of the sequence. The triggers would need to be able to do DML to configuration tables. I don't need to know that a table and sequence are connected if all I want to do is replicate them.
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)
On 13-01-21 02:28 AM, Andres Freund wrote: I haven't removed it from the patch afaik, so it would be great to get a profile here! Its only for xlogdump, but that tool helped me immensely and I don't want to maintain it independently... Here is the output from tprof Here is the baseline: Average time 37862 Total Ticks For All Processes (./postgres_perftest) = 19089 Subroutine Ticks% Source Address Bytes == = == == === = .AllocSetAlloc 2270 0.99 aset.c 3bfb84b0 .base_yyparse 1217 0.53 gram.c fe644 168ec .text 1015 0.44 copyfuncs.c 11bfb4 4430 .MemoryContextAllocZero 535 0.23 mcxt.c 3b990 f0 .MemoryContextAlloc 533 0.23 mcxt.c 3b780 e0 .check_stack_depth 392 0.17 postgres.c 568ac100 .core_yylex 385 0.17 gram.c fb5b4 1c90 .expression_tree_walker 347 0.15 nodeFuncs.c 50da4750 .AllocSetFree308 0.13 aset.c 3c9981b0 .grouping_planner242 0.11 planner.c 2399f0 1d10 .SearchCatCache 198 0.09 catcache.c 7ec20550 ._SPI_execute_plan 195 0.09 spi.c 2f0c187c0 .pfree 195 0.09 mcxt.c 3b3b0 70 query_dependencies_walker185 0.08 setrefs.c 2559441b0 .GetSnapshotData 183 0.08 procarray.c 69efc460 .query_tree_walker 176 0.08 nodeFuncs.c 50ae4210 .strncpy 168 0.07 strncpy.s ba080130 .fmgr_info_cxt_security 166 0.07 fmgr.c 3f7b0850 .transformStmt 159 0.07 analyze.c 29091c 12d0 .text141 0.06 parse_collate.c 28ddf8 1 .ExecInitExpr137 0.06 execQual.c 17f18c 15f0 .fix_expr_common 132 0.06 setrefs.c 2557e4160 .standard_ExecutorStart 127 0.06 execMain.c 1d9a00940 .GetCachedPlan 125 0.05 plancache.c ce664310 .strcpy 121 0.05 noname 3bd401a8 With your changes (same test as I described before) Average: 37938 Total Ticks For All Processes (./postgres_perftest) = 19311 Subroutine Ticks% Source Address Bytes == = == == === = .AllocSetAlloc 2162 2.17 aset.c 3bfb84b0 .base_yyparse 1242 1.25 gram.c fdc7c 167f0 .text 1028 1.03 copyfuncs.c 11b4d0 4210 .palloc 553 0.56 mcxt.c 3b4c8 d0 .MemoryContextAllocZero 509 0.51 mcxt.c 3b9e8 f0 .core_yylex 413 0.41 gram.c fac2c 1c60 .check_stack_depth 404 0.41 postgres.c 56730100 .expression_tree_walker 320 0.32 nodeFuncs.c 50d28750 .AllocSetFree261 0.26 aset.c 3c9981b0 ._SPI_execute_plan 232 0.23 spi.c 2ee6247c0 .GetSnapshotData 221 0.22 procarray.c 69d54460 .grouping_planner211 0.21 planner.c 237b60 1cf0 .MemoryContextAlloc 190 0.19 mcxt.c 3b738 e0 .query_tree_walker 184 0.18 nodeFuncs.c 50a68210 .SearchCatCache 182 0.18 catcache.c 7ea08550 .transformStmt 181 0.18 analyze.c 28e774 12d0 query_dependencies_walker180 0.18 setrefs.c 25397c1b0 .strncpy 175 0.18 strncpy.s b9a60130 .MemoryContextCreate 167 0.17 mcxt.c 3bad8160 .pfree 151 0.15 mcxt.c 3b208 70 .strcpy 150 0.15 noname 3bd401a8 .fmgr_info_cxt_security 146 0.15 fmgr.c 3f790850 .text132 0.13 parse_collate.c 28bc50 1 .ExecInitExpr125 0.13 execQual.c 17de28 15e0 .expression_tree_mutator 124 0.12 nodeFuncs.c 53268 1080 .strcmp 122 0.12 noname 1e44158 .fix_expr_common 117 0.12 setrefs.c 25381c160 Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)
On 13-01-21 12:15 PM, Andres Freund wrote: On 2013-01-21 11:59:18 -0500, Steve Singer wrote: On 13-01-21 02:28 AM, Andres Freund wrote: I haven't removed it from the patch afaik, so it would be great to get a profile here! Its only for xlogdump, but that tool helped me immensely and I don't want to maintain it independently... Here is the output from tprof Here is the baseline: Any chance to do the test ontop of HEAD? The elog stuff has gone in only afterwards and might actually effect this enough to be relevant. HEAD(:fb197290) Average: 28877 .AllocSetAlloc 1442 1.90 aset.c 3a9384b0 .base_yyparse 1220 1.61 gram.c fdbc0 168ec .MemoryContextAlloc 485 0.64 mcxt.c 3a154 e0 .core_yylex 407 0.54 gram.c fab30 1c90 .AllocSetFree320 0.42 aset.c 3b3181b0 .MemoryContextAllocZero 316 0.42 mcxt.c 3a364 f0 .grouping_planner271 0.36 planner.c 2b0ce8 1d10 .strncpy 256 0.34 strncpy.s b8ca0130 .expression_tree_walker 222 0.29 nodeFuncs.c 4f734750 ._SPI_execute_plan 221 0.29 spi.c 2fb230840 .SearchCatCache 182 0.24 catcache.c 7d870550 .GetSnapshotData 161 0.21 procarray.c 68acc460 .fmgr_info_cxt_security 155 0.20 fmgr.c 3e130850 .pfree 152 0.20 mcxt.c 39d84 70 .expression_tree_mutator 151 0.20 nodeFuncs.c 51c84 1170 .check_stack_depth 142 0.19 postgres.c 5523c100 .text126 0.17 parse_collate.c 233d90 1 .transformStmt 125 0.16 analyze.c 289e88 12c0 .ScanKeywordLookup 123 0.16 kwlookup.c f7ab4154 .new_list120 0.16 list.c 40f74 b0 .subquery_planner115 0.15 planner.c 2b29f8dc0 .GetCachedPlan 115 0.15 plancache.c cdab0310 .ExecInitExpr114 0.15 execQual.c 17e690 15f0 .set_plan_refs 113 0.15 setrefs.c 252630cb0 .standard_ExecutorStart 110 0.14 execMain.c 1d9264940 .heap_form_tuple 107 0.14 heaptuple.c 177c842f0 .query_tree_walker 105 0.14 nodeFuncs.c 4f474210 HEAD + patch: Average: 29035 Total Ticks For All Processes (./postgres_perftest) = 15044 Subroutine Ticks% Source Address Bytes == = == == === = .AllocSetAlloc 1422 1.64 aset.c 3a9384b0 .base_yyparse 1201 1.38 gram.c fd1e8 167f0 .palloc 470 0.54 mcxt.c 39e04 d0 .core_yylex 364 0.42 gram.c fa198 1c60 .MemoryContextAllocZero 282 0.33 mcxt.c 3a324 f0 ._SPI_execute_plan 250 0.29 spi.c 2f8c18840 .AllocSetFree244 0.28 aset.c 3b3181b0 .strncpy 234 0.27 strncpy.s b86a0130 .expression_tree_walker 229 0.26 nodeFuncs.c 4f698750 .grouping_planner176 0.20 planner.c 2aea84 1d30 .SearchCatCache 170 0.20 catcache.c 7d638550 .GetSnapshotData 168 0.19 procarray.c 68904460 .assign_collations_walker155 0.18 parse_collate.c 231f4c7e0 .subquery_planner141 0.16 planner.c 2b07b4dc0 .fmgr_info_cxt_security 141 0.16 fmgr.c 3e110850 .check_stack_depth 140 0.16 postgres.c 550a0100 .ExecInitExpr138 0.16 execQual.c 17d2f8 15e0 .pfree 137 0.16 mcxt.c 39b44 70 .transformStmt 132 0.15 analyze.c 287dec 12c0 .new_list128 0.15 list.c 40f44 90 .expression_tree_mutator 125 0.14 nodeFuncs.c 51bd8 1080 .preprocess_expression 118 0.14 planner.c 2adf541a0 .strcmp 118 0.14 noname 1e44158 .standard_ExecutorStart 115 0.13 execMain.c 1d77c0940 .MemoryContextAlloc 111 0.13 mcxt.c 3a074 e0 .set_plan_refs 109 0.13 setrefs.c 2506c4ca0 Otherwise I have to say I am a bit confused by the mighty difference in costs attributed to AllocSetAlloc given the amount of calls should be exactly the same and the number of trampoline function calls should also stay the same. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)
On 13-01-09 03:07 PM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: Well, I *did* benchmark it as noted elsewhere in the thread, but that's obviously just one machine (E5520 x 2) with one rather restricted workload (pgbench -S -jc 40 -T60). At least it's rather palloc heavy. Here are the numbers:

before: 101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665
after: 101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992

So on my system if there is a difference, its positive (0.12%).

pgbench-based testing doesn't fill me with a lot of confidence for this --- its numbers contain a lot of communication overhead, not to mention that pgbench itself can be a bottleneck. It struck me that we have a recent test case that's known to be really palloc-intensive, namely Pavel's example here: http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwat8e2bd2pwnkm7i7...@mail.gmail.com

I set up a non-cassert build of commit 78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that reduced the data-copying overhead for that). On my Fedora 16 machine (dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2)) I get a runtime for Pavel's example of 17023 msec (average over five runs). I then applied oprofile and got a breakdown like this:

samples  %
108409  84.5083  /home/tgl/testversion/bin/postgres
13723   10.6975  /lib64/libc-2.14.90.so
3153     2.4579  /home/tgl/testversion/lib/postgresql/plpgsql.so

samples  %        symbol name
10960    10.1495  AllocSetAlloc
6325      5.8572  MemoryContextAllocZeroAligned
6225      5.7646  base_yyparse
3765      3.4866  copyObject
2511      2.3253  MemoryContextAlloc
2292      2.1225  grouping_planner
2044      1.8928  SearchCatCache
1956      1.8113  core_yylex
1763      1.6326  expression_tree_walker
1347      1.2474  MemoryContextCreate
1340      1.2409  check_stack_depth
1276      1.1816  GetCachedPlan
1175      1.0881  AllocSetFree
1106      1.0242  GetSnapshotData
1106      1.0242  _SPI_execute_plan
1101      1.0196  extract_query_dependencies_walker

I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt. Now I get an average runtime of 1 ms, a full 2% faster, which is a bit astonishing, particularly because the oprofile results haven't moved much:

107642  83.7427  /home/tgl/testversion/bin/postgres
14677   11.4183  /lib64/libc-2.14.90.so
3180     2.4740  /home/tgl/testversion/lib/postgresql/plpgsql.so

samples  %        symbol name
10038     9.3537  AllocSetAlloc
6392      5.9562  MemoryContextAllocZeroAligned
5763      5.3701  base_yyparse
4810      4.4821  copyObject
2268      2.1134  grouping_planner
2178      2.0295  core_yylex
1963      1.8292  palloc
1867      1.7397  SearchCatCache
1835      1.7099  expression_tree_walker
1551      1.4453  check_stack_depth
1374      1.2803  _SPI_execute_plan
1282      1.1946  MemoryContextCreate
1187      1.1061  AllocSetFree
...
653       0.6085  palloc0
...
552       0.5144  MemoryContextAlloc

The number of calls of AllocSetAlloc certainly hasn't changed at all, so how did that get faster? I notice that the postgres executable is about 0.2% smaller, presumably because a whole lot of inlined fetches of CurrentMemoryContext are gone. This makes me wonder if my result is due to chance improvements of cache line alignment for inner loops. I would like to know if other people get comparable results on other hardware (non-Intel hardware would be especially interesting). If this result holds up across a range of platforms, I'll withdraw my objection to making palloc a plain function. regards, tom lane

Sorry for the delay, I only read this thread today.
I just tried Pavel's test on a POWER5 machine with an older version of gcc (see the grebe buildfarm animal for details):

78a5e738e: 37874.855 (average of 6 runs)
78a5e738 + palloc.h + mcxt.c: 38076.8035

The functions do seem to slightly slow things down on POWER. I haven't bothered to run oprofile or tprof to get a breakdown of the functions since Andres has already removed this from his patch. Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v4
On 13-01-14 08:38 PM, Andres Freund wrote: Hi everyone, Here is the newest version of logical changeset generation. 2) Currently the logical replication infrastructure assigns a 'slot-id' when a new replica is set up. That slot id isn't really nice (e.g. id-321578-3). It also requires that [18] keeps state in a global variable to make writing regression tests easy. I think it would be better to make the user specify those replication slot ids, but I am not really sure about it.

Shortly after trying out the latest version I hit the following scenario:
1. I started pg_receivellog but mistyped the name of my plugin
2. It looped and used up all of my logical replication slots

I killed pg_receivellog and restarted it with the correct plugin name but it won't do anything because I have no free slots. I can't free the slots with -F because I have no clue what the names of the slots are. I can figure the names out by looking in pg_llog, but my replication program can't do that, so it won't be able to clean up from a failed attempt.

I agree with you that we should make the user program specify a slot; we eventually might want to provide a view that shows the currently allocated slots. For a logical-based slony I would just generate the slot name based on the remote node id. If walsender generates the slot name then I would need to store a mapping between slot names and slons so when a slon restarted it would know which slot to resume using. I'd have to use a table in the slony schema on the remote database for this. There would always be a risk of losing track of a slot id if the slon crashed after getting the slot number but before committing the mapping on the remote database.

3) Currently no options can be passed to an output plugin. I am thinking about making INIT_LOGICAL_REPLICATION 'plugin' accept the now widely used ('option' ['value'], ...) syntax and pass that to the output plugin's initialization function.

I think we discussed this last CF, I like this idea.

4) Does anybody object to:
-- allocate a permanent replication slot
INIT_LOGICAL_REPLICATION 'plugin' 'slotname' (options);
-- stream data
START_LOGICAL_REPLICATION 'slotname' 'recptr';
-- deallocate a permanent replication slot
FREE_LOGICAL_REPLICATION 'slotname';

+1

5) Currently it's only allowed to access catalog tables; it's fairly trivial to extend this to additional tables if you can accept some (noticeable but not too big) overhead for modifications on those tables. I was thinking of making that an option for tables, that would be useful for replication solutions' configuration tables.

I think this will make the life of anyone developing a new replication system easier. Slony has a lot of infrastructure for allowing slonik scripts to wait for configuration changes to propagate everywhere before making other configuration changes because you can get race conditions. If I were designing a new replication system and I had this feature then I would try to use it to come up with a simpler model of propagating configuration changes.

Andres Freund Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
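To make item 4 concrete, the walsender command sequence a slon (or any other consumer) might issue would look roughly like this; the plugin name, slot name and option below are invented, and the exact grammar is still what is being discussed above:

  -- allocate a permanent replication slot, naming the output plugin and its options
  INIT_LOGICAL_REPLICATION 'slony_output' 'slon_node_2' ('cluster' 'mycluster');

  -- stream changes, resuming from a previously confirmed WAL position
  START_LOGICAL_REPLICATION 'slon_node_2' '0/17F5D58';

  -- deallocate the slot when the subscriber is dropped
  FREE_LOGICAL_REPLICATION 'slon_node_2';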
[HACKERS] AIX buildfarm member
The only animal in the buildfarm running AIX is grebe (http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=grebe&br=HEAD). It is likely that the server running this animal will go away sometime this year and the machine replacing it isn't running AIX. If someone else in the community is running PostgreSQL on AIX then it would be good if they set up a buildfarm member, perhaps with a more recent version of AIX. Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding exported base snapshot
On 12-12-12 06:20 AM, Andres Freund wrote: Possible solutions: 1) INIT_LOGICAL_REPLICATION waits for an answer from the client that confirms that logical replication initialization is finished. Before that the walsender connection cannot be used for anything else. 2) we remove the snapshot as soon as any other command is received; this way the replication connection stays usable, e.g. to issue a START_LOGICAL_REPLICATION in parallel to the initial data dump. In that case the snapshot would have to be imported *before* the next command was received, as SET TRANSACTION SNAPSHOT requires the source transaction to be still open.

Option 2 sounds more flexible. Is it more difficult to implement?

No, I don't think so. It's a bit more intrusive in that it requires knowledge about logical replication in more parts of walsender, but it should be ok. Note btw, that my description of 1) was easy to misunderstand. The "that" in "Before that the walsender connection cannot be used for anything else" is the answer from the client, not the usage of the exported snapshot. Once the snapshot has been imported into other session(s) the source doesn't need to be alive anymore. Does that explanation change anything?

I think I understood you were saying the walsender connection can't be used for anything else (ie streaming WAL) until the exported snapshot has been imported. I think your clarification is still consistent with this? With option 2 I can still get the option 1 behaviour by not sending the next command to the walsender until I am done importing the snapshot. However if I want to start processing WAL before the snapshot has been imported I can do that with option 2. I am not sure I would need to do that, I'm just saying having the option is more flexible.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
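For what it's worth, the control flow I have in mind for option 2 would look roughly like this (the snapshot identifier and the pg_dump switch are only placeholders for whatever INIT_LOGICAL_REPLICATION ends up returning and supporting):

  -- walsender connection
  INIT_LOGICAL_REPLICATION 'my_plugin';
  -- ... returns an initial LSN, a slot identifier and a snapshot id such as 0333-1

  -- ordinary connection(s), before the walsender gets its next command
  BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  SET TRANSACTION SNAPSHOT '0333-1';
  -- ... dump the initial data here (or run pg_dump --snapshot 0333-1) ...
  COMMIT;

  -- walsender connection, once the snapshot has been imported elsewhere
  START_LOGICAL_REPLICATION 'slotname' 'initial-lsn';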
Re: [HACKERS] Logical decoding exported base snapshot
On 12-12-11 06:52 PM, Andres Freund wrote: Hi, Problem 1: One problem I see is that while exporting a snapshot solves the visibility issues of the table's contents it does not protect against schema changes. I am not sure whether that's a problem. If somebody runs a CLUSTER or something like that, the table's contents will be preserved including MVCC semantics. That's fine. The more problematic cases I see are TRUNCATE, DROP and ALTER TABLE. Imagine the following:

S1: INIT_LOGICAL_REPLICATION
S1: get snapshot: 0333-1
S2: ALTER TABLE foo ALTER COLUMN blub TYPE text USING (blub::text);
S3: pg_dump --snapshot 0333-1
S1: START_LOGICAL_REPLICATION

In that case the pg_dump would dump foo using the schema *after* the ALTER TABLE but showing only rows visible to our snapshot. After START_LOGICAL_REPLICATION all changes after the xlog position from INIT_LOGICAL_REPLICATION will be returned though - including all the tuples from the ALTER TABLE and potentially - if some form of schema capturing was in place - the ALTER TABLE itself. The copied schema would have the new format already though. Does anybody see that as a problem or is it just a case of PEBKAC? One argument for the latter is that that's already a problematic case for normal pg_dumps. It's just that the window is a bit larger here.

Is there any way to detect this situation as part of the pg_dump? If I could detect this, abort my pg_dump and then go and get a new snapshot, then I don't see this as a big deal. I can live with telling users "don't do DDL-like things while subscribing a new node; if you do, the subscription will restart". I am less keen on telling users "don't do DDL-like things while subscribing a new node or the results will be unpredictable".

I'm trying to convince myself whether I will be able to take a pg_dump from an exported snapshot plus the changes made in between the snapshot id and some later time and turn the results into a consistent database. What if something like this comes along:

INIT REPLICATION
insert into foo (id,bar) values (1,2);
alter table foo drop column bar;
pg_dump --snapshot

The schema I get as part of the pg_dump won't have bar because it has been dropped, even though it will have the rows that existed with bar. I then go to process the INSERT statement. It will have a WAL record with column data for bar and the logical replication replay will look up the catalog rows from before the alter table, so it will generate a logical INSERT record with bar. That will fail on the replica. I'm worried that it will be difficult to programmatically stitch together the inconsistent snapshot from the pg_dump plus the logical records generated in between the snapshot and the dump (along with any record of the DDL if it exists).

Problem 2: Control Flow. To be able to do a SET TRANSACTION SNAPSHOT the source transaction needs to be alive. That's currently solved by exporting the snapshot in the walsender connection that did the INIT_LOGICAL_REPLICATION. The question is how long should we preserve that snapshot? There is no requirement - and there *cannot* be one in the general case, the slot needs to be usable after a restart - that START_LOGICAL_REPLICATION has to be run in the same replication connection as INIT. Possible solutions: 1) INIT_LOGICAL_REPLICATION waits for an answer from the client that confirms that logical replication initialization is finished. Before that the walsender connection cannot be used for anything else.
2) we remove the snapshot as soon as any other command is received; this way the replication connection stays usable, e.g. to issue a START_LOGICAL_REPLICATION in parallel to the initial data dump. In that case the snapshot would have to be imported *before* the next command was received, as SET TRANSACTION SNAPSHOT requires the source transaction to be still open. Opinions?

Option 2 sounds more flexible. Is it more difficult to implement?

Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 11/14] Introduce wal decoding via catalog timetravel
On 12-12-03 07:42 AM, Andres Freund wrote: Two things: 1) Which exact options are you using for pg_receivellog? Not -d replication by any chance?

Yes, that is exactly what I'm doing. Using a real database name instead makes this go away. Thanks.

This seems to produce exactly that kind of error message. I added some error checking around that. Pushed. Thanks! Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 11/14] Introduce wal decoding via catalog timetravel
On 12-12-03 09:48 AM, Andres Freund wrote: On 2012-12-03 09:35:55 -0500, Steve Singer wrote: On 12-12-03 07:42 AM, Andres Freund wrote: Yes, that is exactly what I'm doing. Using a real database name instead makes this go away. Was using replication an accident or do you think it makes sense in some way?

The 'replication' line in pg_hba.conf made me think that the database name had to be replication for walsender connections.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
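For anyone else who trips over the same thing: in pg_hba.conf the word replication is a keyword that matches walsender/streaming-replication connections, it is not a database name. A setup that allows both kinds of connections therefore needs separate lines, roughly like this (the user and address values are made up):

  # matches physical replication (walsender) connections
  host    replication    repuser    192.168.1.0/24    md5
  # matches ordinary connections to the database used for logical decoding
  host    mydb           repuser    192.168.1.0/24    md5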
Re: [HACKERS] [PATCH 11/14] Introduce wal decoding via catalog timetravel
On 12-11-14 08:17 PM, Andres Freund wrote: I am getting errors like the following when I try to use either your test_decoding plugin or my own (which does even less than yours):

LOG: database system is ready to accept connections
LOG: autovacuum launcher started
WARNING: connecting to
WARNING: Initiating logical rep
LOG: computed new xmin: 773
LOG: start reading from 0/17F5D58, scrolled back to 0/17F4000
LOG: got new xmin 773 at 25124280
LOG: found initial snapshot (via running xacts). Done: 1
WARNING: reached consistent point, stopping!
WARNING: Starting logical replication
LOG: start reading from 0/17F5D58, scrolled back to 0/17F4000
LOG: found initial snapshot (via running xacts). Done: 1
FATAL: cannot read pg_class without having selected a database
TRAP: FailedAssertion(!(SHMQueueEmpty((MyProc->myProcLocks[i]))), File: proc.c, Line: 759)

This seems to be happening under the calls at reorderbuffer.c:832: if (!SnapBuildHasCatalogChanges(NULL, xid, change->relnode))

The sequence of events I do is:
1. start pg_receivellog
2. run a checkpoint
3. Attach to the walsender process with gdb
4. Start a new client connection with psql and do 'INSERT INTO a values (1)' twice.
(skipping step 3 doesn't make a difference)

This introduces several things:
* 'reorderbuffer' module which reassembles transactions from a stream of interspersed changes
* 'snapbuilder' which builds catalog snapshots so that tuples from wal can be understood
* logging more data into wal to facilitate logical decoding
* wal decoding into a reorderbuffer
* shared library output plugins with 5 callbacks: init, begin, change, commit
* walsender infrastructure to stream out changes and to keep the global xmin low enough
* INIT_LOGICAL_REPLICATION $plugin; waits till a consistent snapshot is built and returns
  * initial LSN
  * replication slot identifier
  * id of a pg_export() style snapshot
* START_LOGICAL_REPLICATION $id $lsn; streams out changes
* uses named output plugins for output specification

Todo:
* testing infrastructure (isolationtester)
* persistence/spilling to disk of built snapshots, longrunning transactions
* user docs
* more frequent lowering of xmins
* more docs about the internals
* support for user declared catalog tables
* actual exporting of initial pg_export snapshots after INIT_LOGICAL_REPLICATION
* own shared memory segment instead of piggybacking on walsender's
* nicer interface between snapbuild.c, reorderbuffer.c, decode.c and the outside.
* more frequent xl_running_xid's so xmin can be upped more frequently
* add STOP_LOGICAL_REPLICATION $id

---
 src/backend/access/heap/heapam.c                |  280 +-
 src/backend/access/transam/xlog.c               |    1 +
 src/backend/catalog/index.c                     |   74 ++
 src/backend/replication/Makefile                |    2 +
 src/backend/replication/logical/Makefile        |   19 +
 src/backend/replication/logical/decode.c        |  496 ++
 src/backend/replication/logical/logicalfuncs.c  |  247 +
 src/backend/replication/logical/reorderbuffer.c | 1156 +++
 src/backend/replication/logical/snapbuild.c     | 1144 ++
 src/backend/replication/repl_gram.y             |   32 +-
 src/backend/replication/repl_scanner.l          |    2 +
 src/backend/replication/walsender.c             |  566 ++-
 src/backend/storage/ipc/procarray.c             |   23 +
 src/backend/storage/ipc/standby.c               |    8 +-
 src/backend/utils/cache/inval.c                 |    2 +-
 src/backend/utils/cache/relcache.c              |    3 +-
 src/backend/utils/misc/guc.c                    |   11 +
 src/backend/utils/time/tqual.c                  |  249 +
 src/bin/pg_controldata/pg_controldata.c         |    2 +
 src/include/access/heapam_xlog.h                |   23 +
 src/include/access/transam.h                    |    5 +
 src/include/access/xlog.h                       |    3 +-
 src/include/catalog/index.h                     |    4 +
 src/include/nodes/nodes.h                       |    2 +
 src/include/nodes/replnodes.h                   |   22 +
 src/include/replication/decode.h                |   21 +
 src/include/replication/logicalfuncs.h          |   44 +
 src/include/replication/output_plugin.h         |   76 ++
 src/include/replication/reorderbuffer.h         |  284 ++
 src/include/replication/snapbuild.h             |  128 +++
 src/include/replication/walsender.h             |    1 +
 src/include/replication/walsender_private.h     |   34 +-
 src/include/storage/itemptr.h                   |    3 +
 src/include/storage/sinval.h                    |    2 +
 src/include/utils/tqual.h                       |   31 +-
 35 files changed, 4966 insertions(+), 34 deletions(-)
 create mode 100644 src/backend/replication/logical/Makefile
 create mode 100644 src/backend/replication/logical/decode.c
Re: [HACKERS] logical changeset generation v3 - Source for Slony
First, you can add me to the list of people saying 'wow', I'm impressed. The approach I am taking to reviewing this is to try and answer the following questions:

1) How might a future version of slony be able to use logical replication as described by your patch and design documents, and what would that look like?
2) What functionality is missing from the patch set that would stop me from implementing or prototyping the above?

Connecting slon to the remote postgresql
========================================

Today the slony remote listener thread queries a bunch of events from sl_event for a batch of SYNC events. Then the remote helper thread queries data from sl_log_1 and sl_log_2. I see this changing: instead, the slony remote listener thread would connect to the remote system and get a logical replication stream.

1) Would slony connect as a normal client connection and call something like 'select pg_slony_process_xlog(...)' to get a bunch of logical replication change records to process, OR
2) Would slony connect as a replication connection, similar to how the pg_receivelog program does today, and then process the logical changeset outputs instead of writing them to a file (as pg_receivelog does)?

It seems that the second approach is what is encouraged. I think we would put a lot of the pg_receivelog functionality into slon and it would issue a command like INIT_LOGICAL_REPLICATION 'slony' to use the slony logical replication plugin. Slon would also have to provide feedback to the walsender about what it has processed so the origin database knows what catalog snapshots can be expired. Based on eyeballing pg_receivelog.c it seems that about half the code in the 700 line file is related to command line arguments etc, and the other half is related to looping over the copy out stream, sending feedback and other things that we would need to duplicate in slon.

pg_receivelog.c has this comment:

/*
 * We have to use postgres.h not postgres_fe.h here, because there's so much
 * backend-only stuff in the XLOG include files we need. But we need a
 * frontend-ish environment otherwise. Hence this ugly hack.
 */

This looks like more of a carryover from pg_receivexlog.c. From what I can tell we can eliminate the postgres.h include if we also eliminate the utils/datetime.h and utils/timestamp.h includes and instead add in:

#include "postgres_fe.h"
#define POSTGRES_EPOCH_JDATE 2451545
#define UNIX_EPOCH_JDATE 2440588
#define SECS_PER_DAY 86400
#define USECS_PER_SEC INT64CONST(1000000)
typedef int64 XLogRecPtr;
#define InvalidXLogRecPtr 0

If there is a better way of getting these defines someone should speak up. I recall that in the past slon actually did include postgres.h and it caused some issues (I think with MSVC win32 builds). Since pg_receivelog.c will be used as a starting point/sample for third parties to write client programs it would be better if it didn't encourage client programs to include postgres.h.

The Slony Output Plugin
=======================

Once we've modified slon to connect as a logical replication client we will need to write a slony plugin. As I understand the plugin API:

* A walsender is processing through WAL records; each time it sees a COMMIT WAL record it will call my plugin's .begin, .change (for each change in the transaction) and .commit callbacks.
* The plugin for a particular stream/replication client will see one transaction at a time passed to it in commit order. It won't see .change(t1) followed by .change(t2), followed by a second .change(t1).
The reorder buffer code hides me from all that complexity (yah).

From a slony point of view I think the output of the plugin will be rows, suitable to be passed to COPY IN, of the form: origin_id, table_namespace, table_name, command_type, cmd_updatencols, command_args. This is basically the Slony 2.2 sl_log format minus a few columns we no longer need (txid, actionseq). command_args is a postgresql text array of column=value pairs, ie [ {id=1},{name='steve'},{project='slony'} ].

I don't think our output plugin will be much more complicated than the test_decoding plugin. I suspect we will want to give it the ability to filter out non-replicated tables. We will also have to filter out change records that didn't originate on the local node and that aren't part of a cascaded subscription. Remember that in a two node cluster slony will have connections from A to B and from B to A even if user tables only flow one way. Data that is replicated from A into B will show up in the WAL stream for B.

Exactly how we do this filtering is an open question. I think the output plugin will at a minimum need to know:

a) What the slony node id is of the node it is running on. This is easy to figure out if the output plugin is able/allowed to query its database. Will this be possible? I would expect to be able to query the database as it exists now (at plugin invocation time), not as it existed in the past
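To illustrate the row format described above, a single decoded UPDATE might leave the output plugin as one COPY-ready row along these lines (the table, columns and values are invented, and the pair encoding follows the {column,value,column,value,...} style discussed later in this thread):

  origin_id | table_namespace | table_name | command_type | cmd_updatencols | command_args
  1         | public          | accounts   | U            | 1               | {id,42,balance,100.00}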
Re: [HACKERS] logical changeset generation v3 - Source for Slony
On 12-11-18 11:07 AM, Andres Freund wrote: Hi Steve! I think we should provide some glue code to do this, otherwise people will start replicating all the bugs I hacked into this... More seriously: I think we should have support code here, no user will want to learn the intricacies of feedback messages and such. Where that would live? No idea. libpglogicalrep.so?

I wholeheartedly agree.

It should also be cleaned up a fair bit before others copy it, should we not go for having some client side library. Imo the library could very roughly be something like:

state = SetupStreamingLLog(replication-slot, ...);
while ((message = StreamingLLogNextMessage(state)))
{
    write(outfd, message->data, message->length);
    if (received_100_messages)
    {
        fsync(outfd);
        StreamingLLogConfirm(message);
    }
}

Although I guess that's not good enough because StreamingLLogNextMessage would be blocking, but that shouldn't be too hard to work around.

How about we pass a timeout value to StreamingLLogNextMessage(..) where it returns if no data is available after the timeout, to give the caller a chance to do something else.

This is basically the Slony 2.2 sl_log format minus a few columns we no longer need (txid, actionseq). command_args is a postgresql text array of column=value pairs. Ie [ {id=1},{name='steve'},{project='slony'} ]. It seems to me that that makes escaping unnecessarily complicated, but given you already have all the code... ;)

When I look at the actual code/representation we picked it is closer to {column1,value1,column2,value2...}

I don't think our output plugin will be much more complicated than the test_decoding plugin. Good. That's the idea ;). Are you ok with the interface as it is now or would you like to change something?

I'm going to think about this some more and maybe try to write an example plugin before I can say anything with confidence.

Yes. We will also need something like that. If you remember the first prototype we sent to the list, it included the concept of an 'origin_node' in the wal record. I think you actually reviewed that one ;) That was exactly aimed at something like this... Since then my thoughts about how the origin_id looks like have changed a bit:

- origin id is internally still represented as an uint32/Oid
  - never visible outside of wal/system catalogs
- externally visible it gets
  - assigned an uuid
  - optionally assigned a user defined name
- user settable (permissions?) origin when executing sql:
  - SET change_origin_uuid = 'uuid';
  - SET change_origin_name = 'user-settable-name';
  - defaults to the local node
- decoding callbacks get passed the origin of a change
  - txn->{origin_uuid, origin_name, origin_internal?}
- the init decoding callback can setup an array of interesting origins, so the others don't even get the ReorderBuffer treatment

I have to thank the discussion on -hackers and a march through Prague with Marko here...

So would the uuid and optional name assignment be done in the output plugin or somewhere else? When/how does the uuid get generated and where do we store it so the same uuid gets returned when postgres restarts? Slony today stores all this type of stuff in user-level tables and user-level functions (because it has no other choice). What is the connection between these values and the 'slot-id' in your proposal for the init arguments? Does the slot-id need to be the external uuid of the other end or is there no direct connection?
Today slony allows us to replicate between two databases in the same postgresql cluster (I use this for testing all the time). Slony also allows for two different 'slony clusters' to be set up in the same database (or so I'm told, I don't think I have ever tried this myself). Plugin functions that let me query the local database and then return the uuid and origin_name would work in this model.

+1 on being able to mark the 'change origin' in a SET command when the replication process is pushing data into the replica.

Exactly how we do this filtering is an open question, I think the output plugin will at a minimum need to know: a) What the slony node id is of the node it is running on. This is easy to figure out if the output plugin is able/allowed to query its database. Will this be possible? I would expect to be able to query the database as it exists now (at plugin invocation time), not as it existed in the past when the WAL was generated. In addition to the node ID I can see us wanting to be able to query other slony tables (sl_table, sl_set etc...)

Hm. There is no fundamental reason not to allow normal database access to the current database but it won't be all that cheap, so doing it frequently is not a good idea. The reason it's not cheap is that you basically need to tear down the postgres internal caches if you switch the timestream in which you are working. Would go something like: TransactionContext =
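If the change origin does become settable per session as sketched above, the apply side of a cascaded or multi-node setup could tag everything it writes so that the decoding plugin on that node can filter it out again. A minimal sketch, assuming the hypothetical GUC names from the list above and a made-up node name:

  -- session used by the replication apply process, applying data that
  -- originated on the node named 'nodeB'
  BEGIN;
  SET LOCAL change_origin_name = 'nodeB';
  -- ... apply the replicated INSERT/UPDATE/DELETE statements here ...
  COMMIT;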
Re: [HACKERS] [PATCH 09/14] Adjust all *Satisfies routines to take a HeapTuple instead of a HeapTupleHeader
On 12-11-14 08:17 PM, Andres Freund wrote: For the regular satisfies routines this is needed in prepareation of logical decoding. I changed the non-regular ones for consistency as well. The naming between htup, tuple and similar is rather confused, I could not find any consistent naming anywhere. This is preparatory work for the logical decoding feature which needs to be able to get to a valid relfilenode when checking the visibility of a tuple.

I have taken a look at this patch. The patch does what it says: it changes a bunch of HeapTupleSatisfies routines to take a HeapTuple structure instead of a HeapTupleHeader. It also sets the HeapTuple.t_tableOid value before calling these routines. The patch does not modify these routines to actually do anything useful with the additional data fields, though it does add some assertions to make sure t_tableOid is set.

The patch compiles cleanly and the unit tests pass. This patch does not seem to depend on any of the other patches in this set and applies cleanly against master. The patch doesn't actually add any functionality; unless someone sees a reason for complaining about this that I don't see, then I think it can be committed.

Steve

---
 contrib/pgrowlocks/pgrowlocks.c      |  2 +-
 src/backend/access/heap/heapam.c     | 13 ++
 src/backend/access/heap/pruneheap.c  | 16 ++--
 src/backend/catalog/index.c          |  2 +-
 src/backend/commands/analyze.c       |  3 ++-
 src/backend/commands/cluster.c       |  2 +-
 src/backend/commands/vacuumlazy.c    |  3 ++-
 src/backend/storage/lmgr/predicate.c |  2 +-
 src/backend/utils/time/tqual.c       | 50 +---
 src/include/utils/snapshot.h         |  4 +--
 src/include/utils/tqual.h            | 20 +++
 11 files changed, 83 insertions(+), 34 deletions(-)
Re: [HACKERS] [RFC][PATCH] wal decoding, attempt #2 - Design Documents (really attached)
On 12-10-15 04:51 PM, Andres Freund wrote: Well, as a crosscheck, could you list your requirements? Do you need anything more than outputting data in a format compatible to what's stored in sl_log_*? You wouldn't have sl_actionseq, everything else should be there (well, you would need to do lookups to get the tableid, but that's not really much of a problem). The results would be ordered in complete transactions, in commit order. I guess the other tables would stay as they are as they contain the added value of slony? Greetings,

I actually had spent some time a few weeks ago looking over the documents and code. I never did get around to writing a review as elegant as Peter's. I have not seen any red flags that make me think that what you're proposing wouldn't be suitable for slony, but sometimes you don't see details until you start implementing something.

My initial approach to modifying slony to work with this might be something like:

* Leave sl_event as is for non-SYNC events; slon would still generate SYNC events in sl_event.

* We would modify the remote_worker thread in slon so that instead of selecting from sl_event it would get the next 'committed' transaction from your apply cache. For each ApplyChange record we would check to see if it is an insert into sl_event; if so we would trigger our existing event processing logic based on the contents of the ev_type column.

* If the change involves an insert/update/delete/truncate to a replicated table we would translate that change into SQL and apply it on the replica; we would not commit changes on the replica until we encounter a SYNC being added to sl_event for the current origin.

* SQL will be applied in a slightly different order than slony does today. Today if two concurrent transactions are inserting into the same replicated table and they commit one after the other, there is a good chance that the apply order on the replica will also be intermixed (assuming both commits were in between two SYNC events). My thinking is that we would just replay them one after the other on the replica in commit order. (Slony doesn't use commit order because we don't have it, not because we don't like it.) This would mean we do away with tracking the action id.

* If a node is configured as a 'forwarder' node it would store the processed output of each ApplyChange record in a table on the replica. If a slon is pulling data from a non-origin (ie if remoteWorkerThread_1 is pulling data from node 2) then it would need to query this table instead of calling the functions that process the ApplyCache contents.

* To subscribe a node we would generate a SYNC event on the provider and do the copy_set. We would keep track of that SYNC event. The remote worker would then ignore any data that comes before that SYNC event when it starts pulling data from the apply cache.

* DDL events in 2.2+ go into sl_ddl_script (or something like that); when we see INSERT commands to that table we would know to then apply the DDL on the node.

* We would need to continue to populate sl_confirm because knowing what SYNC events have already been processed by a node is pretty important in a MOVE SET or FAILOVER. It is possible that we might need to still track the xip lists of each SYNC for MOVE SET/FAILOVER but I'm not sure why/why not.

This is all easier said than implemented.

Steve

Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 8/8] Introduce wal decoding via catalog timetravel
On 12-10-11 06:27 PM, Josh Berkus wrote: On 10/10/12 7:26 PM, Bruce Momjian wrote: How does Slony write its changes without causing serialization replay conflicts? Since nobody from the Slony team answered this: a) Slony replicates *rows*, not *statements*

True, but the proposed logical replication also would replicate rows, not the original statements. I don't consider this proposal to be an example of 'statement' replication like pgpool is. If the original SQL was 'update foo set x=x+1 where id > 10' there will be a WAL record to decode for each row modified in the table. In a million row table I'd expect the replica will have to apply a million records (whether they be binary heap tuples or SQL statements).

b) Slony uses serializable mode internally for row replication

Actually, recent versions of slony apply transactions against the replica in read committed mode. Older versions used serializable mode, but with the SSI changes in 9.1 we found slony tended to have serialization conflicts with itself on the slony internal tables, resulting in a lot of aborted transactions. When slony applies changes on a replica table it does so in a single transaction. Slony finds a set of transactions that committed on the master in between two SYNC events. It then applies all of the rows changed by any of those transactions as part of a single transaction on the replica. Chris's post explains this in more detail. Conflicts with user transactions on the replica are possible.

c) it's possible (though difficult) for creative usage to get Slony into a deadlock situation

FWIW, I have always assumed that it is impossible (even theoretically) to have statement-based replication without some constraints on the statements you can run, or some replication failures. I think we should expect 9.3's logical replication out-the-gate to have some issues and impose constraints on users, and improve with time but never be perfect. The design Andres and Simon have advanced already eliminates a lot of the common failure cases (now(), random(), nextval()) suffered by pgPool and similar tools. But remember, this feature doesn't have to be *perfect*, it just has to be *better* than the alternatives.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
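To make the distinction concrete, here is roughly what that fan-out looks like (the table, operator and values are only illustrative, following the example above):

  -- executed once on the origin
  update foo set x = x + 1 where id > 10;

  -- what gets decoded from WAL and replayed on the replica is one change
  -- per modified row, conceptually something like:
  --   update foo set x = 8 where id = 11;
  --   update foo set x = 3 where id = 12;
  --   ... one record for every row the statement touched ...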
Re: [HACKERS] canceling autovacuum task woes
On 12-07-24 01:48 PM, Robert Haas wrote: I am running into a lot of customer situations where the customer reports that canceling autovacuum task shows up in the logs, and it's unclear whether this is happening often enough to matter, and even more unclear what's causing it.

Could autovacuum be compacting a lot of space at the end of the table? This is described in the thread http://archives.postgresql.org/message-id/4d8df88e.7080...@yahoo.com

Me: So, do you know what table it's getting cancelled on?
Customer: Nope.
Me: Are you running any DDL commands anywhere in the cluster?
Customer: Nope, absolutely none.
Me: Well you've got to be running something somewhere or it wouldn't be having a lock conflict.
Customer: OK, well I don't know of any. What should I do?

It would be awfully nice if the process that does the cancelling would provide the same kind of reporting that we do for a deadlock: the relevant lock tag, the PID of the process sending the cancel, and the query string. Personally, I'm starting to have a sneaky suspicion that there is something actually broken here - that is, that there are lock conflicts involved here other than the obvious one (SHARE UPDATE EXCLUSIVE on the table) that are allowing autovac to get cancelled more often than we realize. But whether that's true or not, the current logging is wholly inadequate. Thoughts? Anybody else have this problem?

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes
On 12-06-21 04:37 AM, Andres Freund wrote: Hi Steve, Thanks! Attached is a detailed review of the patch. Very good analysis, thanks! Another reason why we cannot easily do 1) is that subtransactions aren't discernible from top-level transactions before the top-level commit happens; we can only properly merge in the right order (by sorting via lsn) once we have seen the commit record which includes a list of all committed subtransactions.

Based on that and your comments further down in your reply (and that no one spoke up and disagreed with you) it sounds like doing (1) isn't going to be practical.

I also don't think 1) would be particularly welcome by people trying to replicate into foreign systems. They could still sort the changes into transaction groups before applying to the foreign system.

I planned to have some cutoff 'max_changes_in_memory_per_txn' value. If it has been reached for one transaction all existing changes are spilled to disk. New changes again can be kept in memory till it's reached again.

Do you want max_changes_in_memory_per_txn or do you want to put a limit on the total amount of memory that the cache is able to use? How are you going to tell a DBA to tune max_changes_in_memory_per_txn? They know how much memory their system has and that they can devote to the apply cache versus other things; giving them guidance on estimating how many open transactions they might have at a point in time and how many WAL change records each transaction generates seems like a step backwards from the progress we've been making in getting Postgresql to be easier to tune. The maximum number of transactions that could be open at a time is governed by max_connections on the master at the time the WAL was generated, so I don't even see how the machine processing the WAL records could autotune/guess that.

We need to support serializing the cache for crash recovery + shutdown of the receiving side as well. Depending on how we do the wal decoding we will need it more frequently...

Have you described your thoughts on crash recovery on another thread? I am thinking that this module would have to serialize some state every time it calls cache->commit() to ensure that consumers don't get invoked twice on the same transaction. If the apply module is making changes to the same backend that the apply cache serializes to, then both the state for the apply cache and the changes that committed changes/transactions make will be persisted (or not persisted) together.

What if I am replicating from x86 to x86_64 via an apply module that does textout conversions?

  x86 --> proxy --> x86_64
  WAL --> apply cache (proxy catalog) --> apply module (textout) --> SQL statements

How do we ensure that the commits are all visible (or not visible) on the catalog on the proxy instance used for decoding WAL, the destination database, and the state + spill files of the apply cache stay consistent in the event of a crash of either the proxy or the target? I don't think you can (unless we consider two-phase commit, and I'd rather we didn't). Can we come up with a way of avoiding the need for them to be consistent with each other?

I think apply modules will need to be able to be passed the same transaction twice (once before the crash and again after) and come up with a way of deciding if that transaction has a) been applied to the translation/proxy catalog and b) been applied on the replica instance. How is the walreceiver going to decide which WAL segments it needs to re-process after a crash?
I would want to see more of these details worked out before we finalize the interface between the apply cache and the apply modules and how the serialization works.

Code Review
===========

applycache.h
------------

+typedef struct ApplyCacheTupleBuf
+{
+    /* position in preallocated list */
+    ilist_s_node node;
+
+    HeapTupleData tuple;
+    HeapTupleHeaderData header;
+    char data[MaxHeapTupleSize];
+} ApplyCacheTupleBuf;

Each ApplyCacheTupleBuf will be about 8k (BLCKSZ) big no matter how big the data in the transaction is? Wouldn't workloads with inserts of lots of small rows in a transaction eat up lots of memory that is allocated but stores nothing? The only alternative I can think of is dynamically allocating these and I don't know what the cost/benefit of that overhead will be versus spilling to disk sooner.

+ * FIXME: better name
+ */
+ApplyCacheChange*
+ApplyCacheGetChange(ApplyCache*);

How about:
ApplyCacheReserveChangeStruct(..)
ApplyCacheReserveChange(...)
ApplyCacheAllocateChange(...)
as ideas?

+/*
+ * Return an unused ApplyCacheChange struct
+ */
+void
+ApplyCacheReturnChange(ApplyCache*, ApplyCacheChange*);
Re: [HACKERS] [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes
On 12-06-13 07:28 AM, Andres Freund wrote: From: Andres Freund and...@anarazel.de The individual changes need to be identified by an xid. The xid can be a subtransaction or a toplevel one; at commit those can be reintegrated by doing a k-way mergesort between the individual transactions. Callbacks for apply_begin, apply_change and apply_commit are provided to retrieve complete transactions. Missing: - spill-to-disk - correct subtransaction merge, current behaviour is simple/wrong - DDL handling (?) - resource usage controls

Here is an initial review of the ApplyCache patch.

This patch provides a module for taking actions in the WAL stream, grouping the actions by transaction, and then passing these change records to a set of plugin functions.

For each transaction it encounters it keeps a list of the actions in that transaction. The ilist included in an earlier patch is used; changes resulting from that patch review would affect the code here but not in a way that changes the design. When the module sees a commit for a transaction it calls the apply_change callback for each change.

I can think of three ways that a replication system like this could try to apply transactions:

1) Each time it sees a new transaction it could open up a new transaction on the replica and make that change. It leaves the transaction open and goes on applying the next change (which might be for the current transaction or might be for another one). When it comes across a commit record it would then commit the transaction. If 100 concurrent transactions were open on the origin then 100 concurrent transactions will be open on the replica.

2) Determine the commit order of the transactions, group all the changes for a particular transaction together and apply them in that order for the transaction that committed first, commit that transaction and then move on to the transaction that committed second.

3) Group the transactions in a way that you move the replica from one consistent snapshot to another. This is what Slony and Londiste do because they don't have the commit order or commit timestamps. Built-in replication can do better.

This patch implements option (2). If we had a way of implementing option (1) efficiently, would we be better off?

Option (2) requires us to put unparsed WAL data (HeapTuples) in the apply cache. You can't translate this to an independent LCR until you call the apply_change callback (which happens once the commit is encountered). The reason for this is because some of the changes might be DDL (or things generated by a DDL trigger) that will change the translation catalog, so you can't translate the HeapData to LCR's until you're at a stage where you can update the translation catalog. In both cases you might need to see later WAL records before you can convert an earlier one into an LCR (ie TOAST).

Some of my concerns with the apply cache are:

Big transactions (bulk loads, mass updates) will be cached in the apply cache until the commit comes along. One issue Slony has with bulk operations is that the replicas can't start processing the bulk INSERT until after it has committed. If it takes 10 hours to load the data on the master it will take another 10 hours (at best) to load the data into the replica (20 hours after you start the process). With binary streaming replication your replica is done processing the bulk update shortly after the master is.

Long running transactions can sit in the cache for a long time.
When you spill to disk we would want the long running but inactive ones spilled to disk first. This is solvable but adds to the complexity of this module, how were you planning on managing which items of the list get spilled to disk? The idea that we can safely reorder the commands into transactional groupings works (as far as I know) today because DDL commands get big heavy locks that are held until the end of the transaction. I think Robert mentioned earlier in the parent thread that maybe some of that will be changed one day. The downsides of (1) that I see are: We would want a single backend to keep open multiple transactions at once. How hard would that be to implement? Would subtransactions be good enough here? Applying (or even translating WAL to LCR's) the changes in parallel across transactions might complicate the catalog structure because each concurrent transaction might need its own version of the catalog (or can you depend on the locking at the master for this? I think you can today) With approach (1) changes that are part of a rolledback transaction would have more overhead because you would call apply_change on them. With approach (1) a later component could still group the LCR's by transaction before applying by running the LCR's through a data structure very similar to the ApplyCache. I think I need more convincing that approach (2), what this patch implements, is the best way doing
Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node
On 12-06-18 07:30 AM, Andres Freund wrote: Hrmpf #666. I will go through the series commit-by-commit again to make sure everything compiles again. Reordering this late definitely wasn't a good idea... I pushed a rebased version with all those fixups (and removal of the zeroRecPtr patch).

Where did you push that rebased version to? I don't see an attachment, or an updated patch in the commitfest app, and your repo at http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=summary hasn't been updated in 5 days.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node
On 12-06-18 11:50 AM, Andres Freund wrote: Hi Simon, I think we need to agree on the parameter name. It currently is 'multimaster_node_id'. In the discussion with Steve we got to replication_node_id. I don't particularly like either. Other suggestions?

Other things that come to mind (for naming this parameter in the postgresql.conf):
node_id
origin_node_id
local_node_id

I wished we had some flag bits available before as well. I find 256 nodes a pretty low value to start with; 4096 sounds better, so I would be happy with 4 flag bits. I think for cascading setups and such you want to add node ids for every node, not only masters... Any opinions from others on this?

256 sounds a bit low to me as well. Sometimes the use case of a retail chain comes up where people want each store to have a postgresql instance and replicate back to a central office. I can think of many chains with more than 256 stores.

Thanks, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node
On 12-06-13 01:27 PM, Andres Freund wrote: The previous mail contained a patch with a mismerge caused by reordering commits. Corrected version attached. Thanks to Steve Singer for noticing this quickly.

Attached is a more complete review of this patch.

I agree that we will need to identify the node a change originated at. We will not only want this for multi-master support but it might also be very helpful once we introduce things like cascaded replicas. Using a 16 bit integer for this purpose makes sense to me.

This patch (with the previous numbered patches already applied) still doesn't compile:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -I../../../../src/include -D_GNU_SOURCE -c -o xact.o xact.c
xact.c: In function 'xact_redo_commit':
xact.c:4678: error: 'xl_xact_commit' has no member named 'origin_lsn'
make[4]: *** [xact.o] Error 1

Your complete patch set did compile. origin_lsn gets added as part of your 12th patch. Managing so many related patches is going to be a pain, but it beats one big patch. I don't think this patch actually requires the origin_lsn change.

Code Review
-----------

src/backend/utils/misc/guc.c

@@ -1598,6 +1600,16 @@ static struct config_int ConfigureNamesInt[] =
 },
 {
+    {"multimaster_node_id", PGC_POSTMASTER, REPLICATION_MASTER,
+        gettext_noop("node id for multimaster."),
+        NULL
+    },
+    &guc_replication_origin_id,
+    InvalidMultimasterNodeId, InvalidMultimasterNodeId, MaxMultimasterNodeId,
+    NULL, assign_replication_node_id, NULL
+},

I'd rather see us refer to this as the 'node id for logical replication' over the multimaster node id. I think that terminology will be less controversial. Multi-master means different things to different people and it is still unclear what forms of multi-master we will have in-core. For example, most people don't consider slony to be multi-master replication. If a future version of slony were to feed off logical replication (instead of triggers) then I think it would need this node id to determine which node a particular change has come from. The description inside the gettext call should probably be "Sets the node id for ..." to be consistent with the descriptions of the rest of the GUCs.

BootStrapXLOG in xlog.c creates a XLogRecord structure; should it set xl_origin_id to InvalidMultimasterNodeId? WriteEmptyXLOG in pg_resetxlog.c might also need to set xl_origin_id to a well defined value. I think InvalidMultimasterNodeId should be safe even for a no-op record from a node that actually has a node_id set on real records.

backend/replication/logical/logical.c:

XLogRecPtr current_replication_origin_lsn = {0, 0};

This variable isn't used/referenced by this patch; it probably belongs as part of the later patch.

Steve

Andres
Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture
On 12-06-15 04:03 PM, Robert Haas wrote: On Thu, Jun 14, 2012 at 4:13 PM, Andres Freund and...@2ndquadrant.com wrote: I don't plan to throw in loads of conflict resolution smarts. The aim is to get to the place where all the infrastructure is there so that a MM solution can be built by basically plugging in a conflict resolution mechanism. Maybe providing a very simple one. I think without in-core support its really, really hard to build a sensible MM implementation. Which doesn't mean it has to live entirely in core. Of course, several people have already done it, perhaps most notably Bucardo. Anyway, it would be good to get opinions from more people here. I am sure I am not the only person with an opinion on the appropriateness of trying to build a multi-master replication solution in core or, indeed, the only person with an opinion on any of these other issues.

This sounds like a good place for me to chime in.

I feel that in-core support to capture changes and turn them into change records that can be replayed on other databases, without relying on triggers and log tables, would be good to have. I think we want something flexible enough that people can write consumers of the LCRs to do conflict resolution for multi-master, but I am not sure that the conflict resolution support actually belongs in core.

Most of the complexity of slony (both in terms of lines of code, and issues people encounter using it) comes not from the log triggers or replay of the logged data but from the configuration of the cluster. Controlling things like:

* Which tables replicate from a node to which other nodes
* How you change the cluster configuration on a running system (adding nodes, removing nodes, moving the origin of a table, adding tables to replication etc...)

This is the harder part of the problem. I think we need to first get the infrastructure committed (that the current patch set deals with) for capturing, transporting and translating the LCR's into the system before we get too caught up in the configuration aspects. I think we will have a hard time agreeing on behaviours for some of that other stuff that are both flexible enough for most use cases and simple enough for administrators. I'd like to see in-core support for a lot of that stuff but I'm not holding my breath.

It is not good for those other opinions to be saved for a later date.

Hm. Yes, you could do that. But I have to say I don't really see a point. Maybe the fact that I do envision multimaster systems at some point is clouding my judgement though as its far less easy in that case. Why? I don't think that particularly changes anything. It also complicates the wal format as you now need to specify whether you transport a full or a primary-key only tuple... Why? If the schemas are in sync, the target knows what the PK is perfectly well. If not, you're probably in trouble anyway.

I think though that we do not want to enforce that mode of operation for tightly coupled instances. For those I was thinking of using command triggers to synchronize the catalogs. One of the big screwups of the current replication solutions is exactly that you cannot sensibly do DDL, which is not a big problem if you have a huge system with loads of different databases and very knowledgeable people et al. but at the beginning it really sucks. I have no problem with making one of the nodes the schema master in that case.
Also I would like to avoid the overhead of the proxy instance for use-cases where you really want one node replicated as fully as possible with the slight exception of being able to have summing tables, different indexes et al. In my view, a logical replication solution is precisely one in which the catalogs don't need to be in sync. If the catalogs have to be in sync, it's not logical replication. ISTM that what you're talking about is sort of a hybrid between physical replication (pages) and logical replication (tuples) - you want to ship around raw binary tuple data, but not entire pages. The problem with that is it's going to be tough to make robust. Users could easily end up with answers that are total nonsense, or probably even crash the server. I see three catalogs in play here. 1. The catalog on the origin 2. The catalog on the proxy system (this is the catalog used to translate the WAL records to LCR's). The proxy system will need essentially the same pgsql binaries (same architecture, important complie flags etc..) as the origin 3. The catalog on the destination system(s). The catalog 2 must be in sync with catalog 1, catalog 3 shouldn't need to be in-sync with catalog 1. I think catalogs 2 and 3 are combined in the current patch set (though I haven't yet looked at the code closely). I think the performance optimizations Andres has implemented to update tuples through low-level functions should be left for later and that we should be generating SQL in the apply cache so we don't start
Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture
On 12-06-13 07:27 AM, Andres Freund wrote: Its also available in the 'cabal-rebasing' branch on git.postgresql.org/users/andresfreund/postgres.git . That branch will modify history though. That branch has a merge error in f685a11ce43b9694cbe61ffa42e396c9fbc32b05 gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -I../../../../src/include -D_GNU_SOURCE -c -o xact.o xact.c xact.c:4684: error: expected identifier or ‘(’ before ‘’ token xact.c:4690:46: warning: character constant too long for its type xact.c:4712:46: warning: character constant too long for its type xact.c:4719: error: expected identifier or ‘(’ before ‘’ token xact.c:4740:46: warning: character constant too long for its type make[4]: *** [xact.o] Error 1 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Online base backup from the hot-standby
On 12-01-17 05:38 AM, Fujii Masao wrote: On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masao masao.fu...@gmail.com wrote: The amount of code changes to allow pg_basebackup to make a backup from the standby seems to be small. So I ended up merging those changes and the infrastructure patch. WIP patch attached. But I'd be happy to split the patch again if you want. Attached is the updated version of the patch. I wrote the limitations of standby-only backup in the document and changed the error messages.

Here is my review of this version of the patch. I think this patch has been in every CF for 9.2 and I feel it is getting close to being committed. The only issue of significance is a crash I encountered while testing, see below. I am fine with including the pg_basebackup changes in the patch; it also makes testing some of the other changes possible. The documentation updates you have are good. I don't see any issues looking at the code.

Testing Review

I encountered this on my first replica (the one based on the master). I am not sure if it is related to this patch; it happened after the pg_basebackup against the replica finished. TRAP: FailedAssertion(!(((xid) != ((TransactionId) 0))), File: twophase.c, Line: 1238) LOG: startup process (PID 1) was terminated by signal 6: Aborted LOG: terminating any other active server processes

A little earlier this postmaster had printed: LOG: restored log file 0001001F from archive LOG: restored log file 00010020 from archive cp: cannot stat `/usr/local/pgsql92git/archive/00010021': No such file or directory LOG: unexpected pageaddr 0/1900 in log file 0, segment 33, offset 0 cp: cannot stat `/usr/local/pgsql92git/archive/00010021': No such file or directory

I have NOT been able to replicate this error and I am not sure exactly what I had done in my testing prior to that point.

In another test run I had
- set full_page_writes=off and did a checkpoint
- started the pg_basebackup
- set full_page_writes=on and did a HUP + some database activity that might have forced a checkpoint.
I got this message from pg_basebackup: ./pg_basebackup -D ../data3 -l foo -h localhost -p 5438 pg_basebackup: could not get WAL end position from server

I point this out because the message is different than the normal "could not initiate base backup: FATAL: WAL generated with full_page_writes=off" that I normally see. We might want to add a PQerrorMessage(conn) to pg_basebackup to print the error details. Since this patch didn't actually change pg_basebackup I don't think you're required to improve the error messages in it. I am just mentioning this because it came up in testing.

The rest of the tests I did involving changing full_page_writes with/without checkpoints and sighups and promoting the replica seemed to work as expected. Regards,
Re: [HACKERS] static or dynamic libpgport
On 11-12-09 11:13 AM, Andrew Dunstan wrote: Recently I attempted to build an external package (pg_bulkload) against the latest Fedora packages. Unfortunately this fails, as pgxs adds -lpgport to any link line for an executable, and the corresponding libpgport.a isn't there. And in fact, pg_bulkload does use some of the functionality there (e.g. pg_strncasecmp), so just stripping -lpgport out doesn't work either. This happened because Fedora packaging guidelines http://fedoraproject.org/wiki/Packaging:Guidelines#Packaging_Static_Libraries are strongly against shipping static libraries, and so all the PostgreSQL static libraries are excluded from the distribution (and I believe there are similar restrictions for RHEL). Of these libraries, I believe the only one that is *only* built as a static library is libpgport. Is there any good reason why we shouldn't build and install a dynamic libpgport.so?

+1

We've struggled with slony and pgport because so many users have had problems with pgport not being included in some distributions. It has some useful functions; I think recent versions of slony use it on win32 but not elsewhere. We have had at least one patch floating around that conditionally includes certain small behaviours in slony depending on whether pgport is available, based on a configure check.

What package would a shared libpgport be installed with? Slony requires a server + headers to build, but slon and slonik only have a runtime dependency on libpq (I don't know if anyone installs slon/slonik on a machine without a postgresql server, but you could).

Steve cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpython SPI cursors
On 11-11-23 01:58 PM, Jan Urbański wrote: On 20/11/11 19:14, Steve Singer wrote: On 11-10-15 07:28 PM, Jan Urbański wrote: Hi, attached is a patch implementing the usage of SPI cursors in PL/Python. Currently when trying to process a large table in PL/Python you have to slurp it all into memory (that's what plpy.execute does). J I found a few bugs (see my testing section below) that will need fixing + a few questions about the code

Responding now to all questions and attaching a revised patch based on your comments.

I've looked over the revised version of the patch and it now seems fine. Ready for committer.

Do we like the name plpy.cursor or would we rather call it something like plpy.execute_cursor(...) or plpy.cursor_open(...) or plpy.create_cursor(...) Since we will be mostly stuck with the API once we release 9.2 this is worth some opinions on. I like cursor() but if anyone disagrees now is the time.

We use plpy.subtransaction() to create Subxact objects, so I thought plpy.cursor() would be most appropriate.

This patch does not provide a wrapper around SPI_cursor_move. The patch is useful without that and I don't see anything that precludes someone else adding that later if they see a need.

My idea is to add keyword arguments to plpy.cursor() that will allow you to decide whether you want a scrollable cursor and after that provide a move() method.

The patch includes documentation updates that describe the new feature. The Database Access page doesn't provide an API-style list of database access functions like the plperl http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page does. I think the organization of the perl page is clearer than the python one and we should think about doing some documentation refactoring. That should be done as a separate patch and shouldn't be a barrier to committing this one.

Yeah, the PL/Python docs are a bit chaotic right now. I haven't yet summoned force to overhaul them.

in PLy_cursor_plan line 4080 + PG_TRY(); + { + Portal portal; + char *volatile nulls; + volatile int j; I am probably not seeing a code path or misunderstanding something about the setjmp/longjmp usages but I don't see why nulls and j need to be volatile here.

It looked like you could drop volatile there (and in PLy_spi_execute_plan, where this is copied from (did I mention there's quite some code duplication in PL/Python?)) but digging in git I found this commit: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2789b7278c11785750dd9d2837856510ffc67000 that added the original volatile qualification, so I guess there's a reason.

line 444 PLy_cursor(PyObject *self, PyObject *args) + { + char *query; + PyObject *plan; + PyObject *planargs = NULL; + + if (PyArg_ParseTuple(args, "s", &query)) + return PLy_cursor_query(query); + Should query be freed with PyMem_free()?

No, PyArg_ParseTuple returns a string on the stack. I checked that repeatedly creating a cursor with a plan argument does not leak memory and that adding PyMem_Free there promptly leads to a segfault.

I tested both python 2.6 and 3 on a Linux system [test cases demonstrating bugs]

Turns out it's a really bad idea to store pointers to Portal structures, because they get invalidated by the subtransaction abort hooks. I switched to storing the cursor name and looking it up in the appropriate hash table every time it's used. The examples you sent (which I included as regression tests) now cause a ValueError to be raised with a message stating that the cursor has been created in an aborted subtransaction.
Not sure about the wording of the error message, though. Thanks again for the review! Cheers, Jan
Re: [HACKERS] plpython SPI cursors
On 11-10-15 07:28 PM, Jan Urbański wrote: Hi, attached is a patch implementing the usage of SPI cursors in PL/Python. Currently when trying to process a large table in PL/Python you have to slurp it all into memory (that's what plpy.execute does). J

I found a few bugs (see my testing section below) that will need fixing + a few questions about the code

Overview Feature Review --- This patch adds cursor support to plpython. SPI cursors will allow a plpython function to read and process a large result set without having to read it all into memory at once. This is a good thing. Without this patch I think you could accomplish the same thing using SQL DECLARE CURSOR and SQL FETCH. This feature allows you to use a python cursor as an iterator, resulting in much cleaner python code than the SQL FETCH approach. I think the feature is worth having.

Usability Review -- The patch adds the user methods cursor=plpy.cursor(query_or_plan) cursor.fetch(100) cursor.close() Do we like the name plpy.cursor or would we rather call it something like plpy.execute_cursor(...) or plpy.cursor_open(...) or plpy.create_cursor(...) Since we will be mostly stuck with the API once we release 9.2 this is worth some opinions on. I like cursor() but if anyone disagrees now is the time. This patch does not provide a wrapper around SPI_cursor_move. The patch is useful without that and I don't see anything that precludes someone else adding that later if they see a need.

Documentation Review - The patch includes documentation updates that describe the new feature. The Database Access page doesn't provide an API-style list of database access functions like the plperl http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page does. I think the organization of the perl page is clearer than the python one and we should think about doing some documentation refactoring. That should be done as a separate patch and shouldn't be a barrier to committing this one.

Code Review in PLy_cursor_plan line 4080 + PG_TRY(); + { + Portal portal; + char *volatile nulls; + volatile int j; + + if (nargs > 0) + nulls = palloc(nargs * sizeof(char)); + else + nulls = NULL; + + for (j = 0; j < nargs; j++) + { + PyObject *elem; I am probably not seeing a code path or misunderstanding something about the setjmp/longjmp usages but I don't see why nulls and j need to be volatile here.

line 444 PLy_cursor(PyObject *self, PyObject *args) + { + char *query; + PyObject *plan; + PyObject *planargs = NULL; + + if (PyArg_ParseTuple(args, "s", &query)) + return PLy_cursor_query(query); + Should query be freed with PyMem_free()?

Testing --- I tested both python 2.6 and 3 on a Linux system

create or replace function x() returns text as $$
cur=None
try:
    with plpy.subtransaction():
        cur=plpy.cursor('select generate_series(1,1000)')
        rows=cur.fetch(10);
        plpy.execute('select f()')
except plpy.SPIError:
    rows=cur.fetch(10);
    return rows[0]['generate_series']
return 'none'
$$ language plpythonu;
select x();

crashes the backend test=# select x(); The connection to the server was lost.
Attempting reset: LOG: server process (PID 3166) was terminated by signal 11: Segmentation fault The below test gives me a strange error message: create or replace function x1() returns text as $$ plan=None try: with plpy.subtransaction(): plpy.execute('CREATE TEMP TABLE z AS select generate_series(1,1000)') plan=plpy.prepare('select * FROM z') plpy.execute('select * FROM does_not_exist') except plpy.SPIError, e: cur=plpy.cursor(plan) rows=cur.fetch(10) return rows[0]['generate_series'] return '1' $$ language plpythonu; select x1(); test=# select x1() test-# ; ERROR: TypeError: Expected sequence of 82187072 arguments, got 0: NULL CONTEXT: Traceback (most recent call last): PL/Python function x1, line 9, in module cur=plpy.cursor(plan) PL/Python function x1 STATEMENT: select x1() I was expecting an error from the function just a bit more useful one. Performance --- I did not do any specific performance testing but I don't see this patch as having any impact to performance -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
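To make the proposed API concrete, here is a minimal sketch of how it reads in practice, assuming the plpy.cursor()/fetch()/close() methods described in the review above; the table name big_table is made up for illustration and is not part of the patch:

create or replace function count_big_rows() returns bigint as $$
total = 0
cur = plpy.cursor('select id from big_table')
while True:
    rows = cur.fetch(100)          # pull 100 rows at a time instead of slurping everything
    if not rows:
        break
    total += len(rows)
cur.close()
return total
$$ language plpythonu;

The point of the loop is that only one 100-row batch is ever held in memory, unlike plpy.execute(), which materializes the whole result set.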
Re: [HACKERS] Online base backup from the hot-standby
On 11-10-31 12:11 AM, Jun Ishiduka wrote: Agreed. I'll extract FPW stuff from the patch that I submitted, and revise it as the infrastructure patch. The changes of pg_start_backup() etc that Ishiduka-san did are also a server-side infrastructure. I will extract them as another infrastructure one. Ishiduka-san, if you have time, feel free to try the above, barring objection. Done. Changed the name of the patch. Modifications So changed to the positioning of infrastructure, * Removed the documentation. * changed to an error when you run pg_start/stop_backup() on the standby.

Here is my stab at reviewing this version of the patch.

Submission --- The purpose of this version of the patch is to provide some infrastructure needed for backups from the slave without having to solve some of the usability issues raised in previous versions of the patch. This patch applied fine to earlier versions of head but it doesn't today. Simon moved some of the code touched by this patch as part of the xlog refactoring. Please post an updated/rebased version of the patch.

I think the purpose of this patch is to provide a) the code changes to record changes to the fpw state of the master in WAL, and b) tracking of the state of FPW while in recovery mode. This version of the patch is NOT intended to allow SQL calls to pg_start_backup() on slaves to work. This patch lays the infrastructure for another patch (which I haven't seen) to allow pg_basebackup to do a base backup from a slave assuming fpw=on has been set on the master (my understanding of this patch is that it puts into place all of the pieces required for the pg_basebackup patch to detect if fpw!=on and abort). The consensus upthread was to get this infrastructure in and figure out a safe+usable way of doing a slave backup without pg_basebackup later.

The patch seems to do what I expect of it. I don't see any issues with most of the code changes in this patch. However I admit that even after reviewing many versions of this patch I still am not familiar enough with the recovery code to comment on a lot of the details.

One thing I did see: In pg_ctl.c ! if (stat(recovery_file, &statbuf) != 0) ! print_msg(_("WARNING: online backup mode is active\n" ! "Shutdown will not complete until pg_stop_backup() is called.\n\n")); ! else ! print_msg(_("WARNING: online backup mode is active if you can connect as a superuser to server\n" ! "If so, shutdown will not complete until pg_stop_backup() is called.\n\n"));

I am having difficulty understanding what this error message is trying to tell me. I think it is telling me (based on the code comments) that if I can't connect to the server because the server is not yet accepting connections then I shouldn't worry about anything. However if the server is accepting connections then I need to login and call pg_stop_backup(). Maybe: "WARNING: online backup mode is active. If your server is accepting connections then you must connect as superuser and run pg_stop_backup() before shutdown will complete."

I will wait on attempting to test the patch until you have sent a version that applies against the current HEAD. Regards. Jun Ishizuka NTT Software Corporation TEL:045-317-7018 E-Mail: ishizuka@po.ntts.co.jp
Re: [HACKERS] pg_dump 9.1.1 hanging (collectSecLabels gets 0 labels)
On 11-11-09 06:35 PM, Tom Lane wrote: Steve Singerssin...@ca.afilias.info writes: I've tracked the issue down to collectSecLabels in pg_dump.c SELECT label, provider, classoid, objoid, objsbid FROM pg_catalog.pg_seclabel; returns 0 rows. The code in collectSecLabels() is not prepared to deal with a zero row result and tries to malloc 0 bytes. pg_seclabel is almost always empty, so I'm not convinced that you've identified your problem correctly. regards, tom lane The attached patch seems to fix the issue. The man page for malloc on AIX is pretty clear on what happens when you try to malloc 0 bytes. It returns NULL. diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index fce9d3b..9e31767 100644 *** a/src/bin/pg_dump/pg_dump.c --- b/src/bin/pg_dump/pg_dump.c *** findSecLabels(Archive *fout, Oid classoi *** 11760,11766 /* Get security labels if we didn't already */ if (nlabels 0) nlabels = collectSecLabels(fout, labels); ! /* * Do binary search to find some item matching the object. */ --- 11760,11770 /* Get security labels if we didn't already */ if (nlabels 0) nlabels = collectSecLabels(fout, labels); ! if (nlabels == 0) ! { ! *items=NULL; ! return 0; ! } /* * Do binary search to find some item matching the object. */ *** collectSecLabels(Archive *fout, SecLabel *** 11858,11875 i_objsubid = PQfnumber(res, objsubid); ntups = PQntuples(res); ! ! labels = (SecLabelItem *) malloc(ntups * sizeof(SecLabelItem)); ! ! for (i = 0; i ntups; i++) { ! labels[i].label = PQgetvalue(res, i, i_label); ! labels[i].provider = PQgetvalue(res, i, i_provider); ! labels[i].classoid = atooid(PQgetvalue(res, i, i_classoid)); ! labels[i].objoid = atooid(PQgetvalue(res, i, i_objoid)); ! labels[i].objsubid = atoi(PQgetvalue(res, i, i_objsubid)); } /* Do NOT free the PGresult since we are keeping pointers into it */ destroyPQExpBuffer(query); --- 11862,11889 i_objsubid = PQfnumber(res, objsubid); ntups = PQntuples(res); ! if ( ntups == 0) { ! labels = NULL; } + else + { + labels = (SecLabelItem *) malloc(ntups * sizeof(SecLabelItem)); + if (labels == NULL ) + { + write_msg(NULL, out of memory); + exit(1); + } + for (i = 0; i ntups; i++) + { + labels[i].label = PQgetvalue(res, i, i_label); + labels[i].provider = PQgetvalue(res, i, i_provider); + labels[i].classoid = atooid(PQgetvalue(res, i, i_classoid)); + labels[i].objoid = atooid(PQgetvalue(res, i, i_objoid)); + labels[i].objsubid = atoi(PQgetvalue(res, i, i_objsubid)); + } + } /* Do NOT free the PGresult since we are keeping pointers into it */ destroyPQExpBuffer(query); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump 9.1.1 hanging (collectSecLabels gets 0 labels)
On 11-11-10 02:00 PM, Tom Lane wrote: Steve Singer ssin...@ca.afilias.info writes: The man page for malloc on AIX is pretty clear on what happens when you try to malloc 0 bytes. It returns NULL. Yes, that's a pretty common behavior for malloc(0). It should not cause a problem here AFAICS. ... Oh, I see, the problem is that labels[-1] might not compare to labels[0] the way we want. I think only the first hunk of your patch is actually necessary. regards, tom lane

Yes, the problem is still fixed if I only apply the first hunk. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_dump 9.1.1 hanging (collectSecLabels gets 0 labels)
We have a cluster running 9.1.1 where pg_dump hangs when we try to dump a database inside of the cluster. The server is running AIX. I can see this on a clean cluster where we do an initdb, followed by a createdb, and try running pg_dump. I've tracked the issue down to collectSecLabels in pg_dump.c. SELECT label, provider, classoid, objoid, objsubid FROM pg_catalog.pg_seclabel; returns 0 rows. The code in collectSecLabels() is not prepared to deal with a zero row result and tries to malloc 0 bytes. I am not yet sure if the problem is that my pg_seclabel is empty or if the issue is in collectSecLabels() or if collectSecLabels shouldn't even be called. Has anyone seen something similar? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
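A minimal standalone sketch of the hazard described above, assuming AIX-style malloc(0) semantics; the variable names are illustrative and this is not the pg_dump code itself:

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    int     ntups = 0;          /* e.g. PQntuples() for an empty pg_seclabel */
    char  **labels;

    /*
     * On AIX (and any conforming libc), malloc(0) may return NULL, so a
     * zero-row result has to be handled explicitly rather than treated as
     * an out-of-memory failure.
     */
    if (ntups == 0)
        labels = NULL;
    else if ((labels = malloc(ntups * sizeof(char *))) == NULL)
    {
        fprintf(stderr, "out of memory\n");
        return 1;
    }

    printf("collected %d security labels\n", ntups);
    free(labels);               /* free(NULL) is defined to be a no-op */
    return 0;
}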
Re: [HACKERS] Online base backup from the hot-standby
On 11-10-25 02:44 AM, Heikki Linnakangas wrote: With pg_basebackup, we have a fighting chance of getting this right, because we have more control over how the backup is made. For example, we can co-operate with the buffer manager to avoid torn-pages, eliminating the need for full_page_writes=on, and we can include a control file with the correct end-of-backup location automatically, without requiring user intervention. pg_basebackup is less flexible than the pg_start/stop_backup method, and unfortunately you're more likely to need the flexibility in a more complicated setup with a hot standby server and all, but making the generic pg_start/stop_backup method work seems infeasible at the moment. Would pg_basebackup be able to work with the buffer manager on the slave to avoid full_page_writes=on needing to be set on the master? (the point of this is to be able to take the base backup without having the backup program contact the master). If so could pg_start_backup() not just put the buffer manager into the same state? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Online base backup from the hot-standby
On 11-10-11 11:17 AM, Jun Ishiduka wrote: Done. Updated patch attached.

I have taken Jun's latest patch and applied it on top of Fujii's most recent patch. I did some testing with the result, but nothing thorough enough to stumble on any race conditions.

Some testing notes -- select pg_start_backup('x'); ERROR: full_page_writes on master is set invalid at least once since latest checkpoint

I think this error should be rewritten as ERROR: full_page_writes on master has been off at some point since latest checkpoint We should be using 'off' instead of 'invalid' since that is what the user sets it to.

I switched full_page_writes=on on the master and did a pg_start_backup() on slave1. Then I switched full_page_writes=off on the master and did a reload + checkpoint. I was able to then do my backup of slave1, copy the control file, and pg_stop_backup(). When I did the test slave2 started okay, but is this safe? Do we need a warning from pg_stop_backup() that is printed if it is detected that full_page_writes was turned off on the master during the backup period?

Code Notes - *** 6865,6870 --- 6871,6886 /* Pre-scan prepared transactions to find out the range of XIDs present */ oldestActiveXID = PrescanPreparedTransactions(NULL, NULL); + /* + * The startup updates FPW in shaerd-memory after REDO. However, it must + * perform before writing the WAL of the CHECKPOINT. The reason is that + * it uses a value of fpw in shared-memory when it writes a WAL of its + * CHECKPOTNT. + */ Minor typo above at 'CHECKPOTNT'

If my concern about full page writes being switched to off in the middle of a backup is unfounded then I think this patch is ready for a committer. They can clean up the two editorial changes when they apply the patches. If do_pg_stop_backup is going to need some logic to recheck the full page write status then an updated patch is required. Regards. Jun Ishizuka NTT Software Corporation TEL:045-317-7018 E-Mail: ishizuka@po.ntts.co.jp
Re: [HACKERS] Online base backup from the hot-standby
On 11-09-26 10:56 PM, Fujii Masao wrote: Looks weired. Though the WAL record starting from 0/6000298 was read successfully, then re-fetch of the same record fails at the end of recovery. One possible cause is the corruption of archived WAL file. What restore_command on the standby and archive_command on the master are you using? Could you confirm that there is no chance to overwrite archive WAL files in your environment? I tried to reproduce this problem several times, but I could not. Could you provide the test case which reproduces the problem? This is the test procedure I'm trying today, I wasn't able to reproduce the crash. What I was doing the other day was similar but I can't speak to unintentional differences. I have my master server data port=5439 wal_level=hot_standby archive_mode=on archive_command='cp -i %p /usr/local/pgsql92git/archive/%f' hot_standby=on I then run select pg_start_backup('foo'); $ rm -r ../data2 $ cp -r ../data ../data2 $ rm ../data2/postmaster.pid select pg_stop_backup(); I edit data2/postgresql.conf so port=5438 I commented out archive_mode and archive_command (or at least today I did) recovery.conf is standby_mode='on' primary_conninfo='host=127.0.0.1 port=5439 user=ssinger dbname=test' restore_command='cp /usr/local/pgsql92git/archive/%f %p' I then start up the second cluster. On it I run select pg_start_backup('1'); $ rm -r ../data3 $ rm -r ../archive2 $ cp -r ../data2 ../data3 $ cp ../data2/global/pg_control ../data3/global select pg_stop_backup(); I edit ../data2/postgresql.conf port=5437 archive_mode=on # (change requires restart) archive_command='cp -i %p /usr/local/pgsql92git/archive2/%f' recovery.conf is standby_mode='on' primary_conninfo='host=127.0.0.1 port=5439 user=ssinger dbname=test' restore_command='cp /usr/local/pgsql92git/archive/%f %p' trigger_file='/tmp/3' $ postgres -D ../data3 The first time I did this postgres came up quickly. $ touch /tmp/3 worked fine. I then stopped data3 $ rm -r ../data3 on data 2 I run pg_start_backup('1') $ cp -r ../data2 ../data3 $ cp ../data2/global/pg_control ../data3/global select pg_stop_backup() # on data2 $ rm ../data3/postmaster.pid vi ../data3/postgresql.conf # same changes as above for data3 vi ../data3/recovery.conf # same as above for data 3 postgres -D ../data3 This time I got ./postgres -D ../data3 LOG: database system was interrupted while in recovery at log time 2011-09-27 22:04:17 GMT HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. LOG: entering standby mode cp: cannot stat `/usr/local/pgsql92git/archive/0001000C': No such file or directory LOG: redo starts at 0/C20 LOG: record with incorrect prev-link 0/958 at 0/CB0 cp: cannot stat `/usr/local/pgsql92git/archive/0001000C': No such file or directory LOG: streaming replication successfully connected to primary FATAL: the database system is starting up FATAL: the database system is starting up LOG: consistent recovery state reached at 0/CE8 LOG: database system is ready to accept read only connections In order to get the database to come in read only mode I manually issued a checkpoint on the master (data) shortly after the checkpoint command the data3 instance went to read only mode. 
then touch /tmp/3 trigger file found: /tmp/3 FATAL: terminating walreceiver process due to administrator command cp: cannot stat `/usr/local/pgsql92git/archive/0001000C': No such file or directory LOG: record with incorrect prev-link 0/9000298 at 0/C0002F0 cp: cannot stat `/usr/local/pgsql92git/archive/0001000C': No such file or directory LOG: redo done at 0/C000298 cp: cannot stat `/usr/local/pgsql92git/archive/0001000C': No such file or directory cp: cannot stat `/usr/local/pgsql92git/archive/0002.history': No such file or directory LOG: selected new timeline ID: 2 cp: cannot stat `/usr/local/pgsql92git/archive/0001.history': No such file or directory LOG: archive recovery complete LOG: database system is ready to accept connections LOG: autovacuum launcher started It looks like data3 is still pulling files with the recovery command after it sees the touch file (is this expected behaviour?) $ grep archive ../data3/postgresql.conf #wal_level = minimal# minimal, archive, or hot_standby #archive_mode = off# allows archiving to be done archive_mode=on archive_command='cp -i %p /usr/local/pgsql92git/archive2/%f' I have NOT been able to make postgres crash during a recovery (today). It is *possible* that on some of my runs the other day I had skipped changing the archive command on data3 to write to archive2 instead of archive. I have also today not been able to get it to attempt to restore the same WAL file twice. If a base backup is in progress on a recovery database
Re: [HACKERS] Online base backup from the hot-standby
On 11-09-22 09:24 AM, Fujii Masao wrote: On Wed, Sep 21, 2011 at 11:50 AM, Fujii Masaomasao.fu...@gmail.com wrote: 2011/9/13 Jun Ishidukaishizuka@po.ntts.co.jp: Update patch. Changes: * set 'on' full_page_writes by user (in document) * read FROM: XX in backup_label (in xlog.c) * check status when pg_stop_backup is executed (in xlog.c) Thanks for updating the patch. Before reviewing the patch, to encourage people to comment and review the patch, I explain what this patch provides: Attached is the updated version of the patch. I refactored the code, fixed some bugs, added lots of source code comments, improved the document, but didn't change the basic design. Please check this patch, and let's use this patch as the base if you agree with that. I have looked at both Jun's patch from Sept 13 and Fujii's updates to the patch. I agree that Fujii's updated version should be used as the basis for changes going forward. My comments below refer to that version (unless otherwise noted). In backup.sgml the new section titled Making a Base Backup during Recovery I would prefer to see some mention in the title that this procedure is for standby servers ie Making a Base Backup from a Standby Database. Users who have setup a hot-standby database should be familiar with the 'standby' terminology. I agree that the during recovery description is technically correct but I'm not sure someone who is looking through the manual for instructions on making a base backup from here standby will realize this is the section they should read. Around line 969 where you give an example of copying the control file I would be a bit clearer that this is an example command. Ie (Copy the pg_control file from the cluster directory to the global sub-directory of the backup. For example cp $PGDATA/global/pg_control /mnt/server/backupdir/global) Testing Notes - I created a standby server from a base backup of another standby server. On this new standby server I then 1. Ran pg_start_backup('3'); and left the psql connection open 2. touch /tmp/3 -- my trigger_file ssinger@ssinger-laptop:/usr/local/pgsql92git/bin$ LOG: trigger file found: /tmp/3 FATAL: terminating walreceiver process due to administrator command LOG: restored log file 00010006 from archive LOG: record with zero length at 0/60002F0 LOG: restored log file 00010006 from archive LOG: redo done at 0/6000298 LOG: restored log file 00010006 from archive PANIC: record with zero length at 0/6000298 LOG: startup process (PID 19011) was terminated by signal 6: Aborted LOG: terminating any other active server processes WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. The new postmaster (the one trying to be promoted) dies. This is somewhat repeatable. If a base backup is in progress on a recovery database and that recovery database is promoted to master, following the promotion (if you don't restart the postmaster). I see select pg_stop_backup(); ERROR: database system status mismatches between pg_start_backup() and pg_stop_backup() If you restart the postmaster this goes away. 
When the postmaster leaves recovery mode I think it should abort an existing base backup so pg_stop_backup() will say no backup in progress, or give an error message on pg_stop_backup() saying that the base backup won't be usable. The above error doesn't really tell the user why there is a mismatch. - In my testing a few times I got into a situation where a standby server coming from a recovery target took a while to finish recovery (this is on a database with no activity). Then when i tried promoting that server to master I got LOG: trigger file found: /tmp/3 FATAL: terminating walreceiver process due to administrator command LOG: restored log file 00010009 from archive LOG: restored log file 00010009 from archive LOG: redo done at 0/9E8 LOG: restored log file 00010009 from archive PANIC: unexpected pageaddr 0/600 in log file 0, segment 9, offset 0 LOG: startup process (PID 1804) was terminated by signal 6: Aborted LOG: terminating any other active server processes It is *possible* I mixed up the order of a step somewhere since my testing isn't script based. A standby server that 'looks' okay but can't actually be promoted is dangerous. This version of the patch (I was testing the Sept 22nd version) seems less stable than how I remember the version from the July CF. Maybe I'm just testing it harder or maybe something has been broken. In the current
Re: [HACKERS] postgesql-9.0.4 compile on AIX 6.1 using gcc 4.4.6
On 11-08-30 07:58 AM, Weiss, Wilfried wrote: Hello, I am just trying to compile postgresql-9.0.4 on AIX 6100-06-03-1048 using gcc 4.4.6. Unfortunately that was not all. There was also: [Bug target/46072] AIX linker chokes on debug info for uninitialized static variables This is an IBM bug in AIX's assembler (as) which causes corrupt object code that crashes when trying to execute it. As far as I know IBM has still not delivered a fix for this. It seems that they are not interested in this as IBM's xlc is not using the assembler to create object code. Does anyone know whether there is an alternate way to compile postgresql on AIX 6.1 using gcc??? I appreciate even the smallest hint!

I have compiled 9.0.4 on AIX 5.3 with GCC 4.1.1 without any issues. (Well, the regression tests hit an issue on REL9_0_STABLE builds that they don't hit with more recent branches, but that is due to a makefile related issue that I should post about in a different thread.) The buildfarm member grebe (http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=grebe&br=HEAD) does this. I do not have access to an AIX 6.1 machine.

Regards WW http://www.pilkington.com/nsg/disclaimer.htm
[HACKERS] skip WAL on COPY patch
The attached patch adds an option to the COPY command to skip writing WAL when the following conditions are all met: 1) The table is empty (zero size on disk) 2) The copy command can obtain an access exclusive lock on the table with out blocking. 3) The WAL isn't needed for replication For example COPY a FROM '/tmp/a.txt' (SKIP_WAL); A non-default option to the copy command is required because the copy will block out any concurrent access to the table which would be undesirable in some cases and is different from the current behaviour. This can safely be done because if the transaction does not commit the empty version of the data files are still available. The COPY command already skips WAL if the table was created in the current transaction. There was a discussion on something similar before[1] but I didn't see any discussion of having it only obtain the lock if it can do so without waiting (nor could I find in the archives what happened to that patch). I'm not attached to the SKIP_WAL vs LOCK as the option 1- see http://archives.postgresql.org/pgsql-patches/2005-12/msg00206.php Steve diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index a73b022..3a0e521 100644 *** a/doc/src/sgml/ref/copy.sgml --- b/doc/src/sgml/ref/copy.sgml *** COPY { replaceable class=parameterta *** 42,47 --- 42,48 FORCE_QUOTE { ( replaceable class=parametercolumn/replaceable [, ...] ) | * } FORCE_NOT_NULL ( replaceable class=parametercolumn/replaceable [, ...] ) | ENCODING 'replaceable class=parameterencoding_name/replaceable' + SKIP_WAL /synopsis /refsynopsisdiv *** COPY { replaceable class=parameterta *** 293,298 --- 294,312 for more details. /para /listitem + /varlistentry + varlistentry + termliteralSKIP_WAL//term + listitem + para + Specifies that the writing of WAL should be skipped if possible. + WAL can be skipped if the table being copied into is empty and + if an exclusive lock can be obtained without waiting. If this + option is specified and WAL is skipped then the transaction will + hold an exclusive lock on the table being copied until the transaction + commits. 
+ /para + /listitem /varlistentry /variablelist diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 528a3a1..bd81a4b 100644 *** a/src/backend/commands/copy.c --- b/src/backend/commands/copy.c *** *** 29,34 --- 29,35 #include commands/defrem.h #include commands/trigger.h #include executor/executor.h + #include commands/tablecmds.h #include libpq/libpq.h #include libpq/pqformat.h #include mb/pg_wchar.h *** *** 37,42 --- 38,44 #include parser/parse_relation.h #include rewrite/rewriteHandler.h #include storage/fd.h + #include storage/lmgr.h #include tcop/tcopprot.h #include utils/acl.h #include utils/builtins.h *** typedef struct CopyStateData *** 120,125 --- 122,128 bool *force_quote_flags; /* per-column CSV FQ flags */ List *force_notnull; /* list of column names */ bool *force_notnull_flags; /* per-column CSV FNN flags */ + bool skip_wal;/* skip WAL if able */ /* these are just for error messages, see CopyFromErrorCallback */ const char *cur_relname; /* table name for error messages */ *** ProcessCopyOptions(CopyState cstate, *** 965,970 --- 968,978 errmsg(argument to option \%s\ must be a valid encoding name, defel-defname))); } + else if (strcmp(defel-defname,skip_wal) == 0) + { + + cstate-skip_wal=true; + } else ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), *** CopyFrom(CopyState cstate) *** 1910,1915 --- 1918,1957 if (!XLogIsNeeded()) hi_options |= HEAP_INSERT_SKIP_WAL; } + + /* + * if SKIP_WAL was requested we try to avoid writing + * WAL if the table is 0 bytes on disk (empty) and + * that we can obtain an exclusive lock on it without blocking. + * + */ + if(cstate-skip_wal !XLogIsNeeded() + ConditionalLockRelationOid(cstate-rel-rd_id,AccessExclusiveLock)) + { + + Datum size = DirectFunctionCall2(pg_relation_size, + ObjectIdGetDatum(cstate-rel-rd_id), + PointerGetDatum(cstring_to_text(main))); + if ( DatumGetInt64(size)==0) + { + /** + * The relation is empty + unused. + * truncate it so that if this transaction + * rollsback then the changes to the relation files + * will dissapear (the current relation files will + * remain untouched) + */ + truncate_relation(cstate-rel); + hi_options |= HEAP_INSERT_SKIP_FSM; + hi_options |= HEAP_INSERT_SKIP_WAL; + } + else + { + UnlockRelation(cstate-rel,AccessExclusiveLock); + } + + } + /* * We need a ResultRelInfo so we can use
Re: [HACKERS] skip WAL on COPY patch
On 11-08-23 04:17 PM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: What I think would be really interesting is a way to make this work when the table *isn't* empty. In other words, have a COPY option that (1) takes an exclusive lock on the table, (2) writes the data being inserted into new pages beyond the old EOF, and (3) arranges for crash recovery or transaction abort to truncate the table back to its previous length. Then you could do fast bulk loads even into a table that's already populated, so long as you don't mind that the table will be exclusive-locked and freespace within existing heap pages won't be reused. What are you going to do with the table's indexes? regards, tom lane

What about not updating the indexes during the copy operation and then doing an automatic rebuild of the indexes after the copy (but during the same transaction)? If you're only adding a few rows to a large table this wouldn't be what you want, but if you're only adding a few rows then a small amount of WAL isn't a big concern either. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Online base backup from the hot-standby
On 11-08-16 02:09 AM, Jun Ishiduka wrote: Thanks. This has the following two problems. * pg_start_backup() must set 'on' to full_page_writes of the master that is actual writing of the WAL, but not the standby. Is there any way to tell from the WAL segments if they contain the full page data? If so could you verify this on the second slave when it is brought up? Or can you track this on the first slave and produce an error in either pg_start_backup or pg_stop_backup() I see in xlog.h XLR_BKP_REMOVABLE, the comment above it says that this flag is used to indicate that the archiver can compress the full page blocks to non-full page blocks. I am not familiar with where in the code this actually happens but will this cause issues if the first standby is processing WAL files from the archive? * The standby doesn't need to connect to the master that's actual writing WAL. (Ex. Standby2 in Cascade Replication: Master - Standby1 - Standby2) I'm worried how I should clear these problems. Regards. Jun Ishizuka NTT Software Corporation TEL:045-317-7018 E-Mail: ishizuka@po.ntts.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] walprotocol.h vs frontends
On 11-08-15 10:00 AM, Peter Geoghegan wrote: Without commenting on what should be done in your specific case, I wonder whether it's time to fully retire the deprecated double representation of timestamps. Is anyone actually expected to rely on their availability when 9.2 is released? This also caused difficulties for Johann Oskarsson recently, during work on PL/Java.

This would mean that anyone using the floating point timestamps today won't be able to use pg_upgrade to upgrade to whichever version we remove them from. 8.3 had float based timestamps as the default and I suspect many installations with the default 8.3 settings have been upgraded via pg_upgrade to a 9.0 built with the old timestamp representation. Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] walprotocol.h vs frontends
On 11-08-15 12:33 PM, Peter Geoghegan wrote: On 15 August 2011 16:56, Steve Singer ssin...@ca.afilias.info wrote: This would mean that anyone using the floating point timestamps today won't be able to use pg_upgrade to upgrade to whichever version we remove them from. 8.3 had float based timestamps as the default and I suspect many installations with the default 8.3 settings have been upgraded via pg_upgrade to a 9.0 built with the old timestamp representation. Really? I find that slightly surprising, considering that a quick look at master's timestamp.c suggests that the choice to use the int64 representation over the double representation is entirely a case of compile time either/or. There is no apparent fall-back to the double representation available to binaries built without --disable-integer-datetimes.

I *think* the default on 8.3 was float based timestamps. If you want to upgrade a system running 8.3 (that uses float based timestamps) using pg_upgrade, you must compile 9.0 (or 8.4 or 9.1) with --disable-integer-datetimes. If at some point in the future you then want to upgrade to 9.2 with pg_upgrade you will again need to build 9.2 with --disable-integer-datetimes. If we remove the code for floating point representations of datetime then you won't be able to do that. Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
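For illustration, the build step being described would look roughly like this (the install prefix is just an example) when preparing a 9.x tree to pg_upgrade an 8.3 cluster that was initialized with float timestamps:

./configure --disable-integer-datetimes --prefix=/usr/local/pgsql90fp
make
make install
# pg_upgrade checks that the old and new clusters agree on the datetime
# storage format and refuses to run if they do not, which is why the new
# binaries must be built with the same setting as the old cluster.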
Re: [HACKERS] Online base backup from the hot-standby
On 11-07-07 09:22 PM, Jun Ishiduka wrote: As you proposed, adding new field which stores the backup end location taken from minRecoveryPoint, into pg_control sounds good idea. Update patch. Here is a review of the updated patch This version of the patch adds a field into pg_controldata that tries to store the source of the base backup while in recovery mode. I think your ultimate goal with this patch is to be able to take a backup of a running hot-standby slave and recover it as another instance. This patch seems to provide the ability to have the second slave stop recovery at minRecoveryPoint from the control file. My understanding of the procedure you want to get to to take base backups off a slave is 1. execute pg_start_backup('x') on the slave (*) 2. take a backup of the data dir 3. call pg_stop_backup() on the slave 4. Copy the control file on the slave This patch only addresses the recovery portions. * - I think your goal is to be able to run pg_start_backup() on the slave, the patch so far doesn't do this. If your goal was require this to be run on the master, then correct me. Code Review --- A few comments on the code *** postgresql/src/include/catalog/pg_control.h 2011-06-30 10:04:48.0 +0900 --- postgresql_with_patch/src/include/catalog/pg_control.h 2011-07-07 18:23:56.0 +0900 *** *** 142,147 --- 142,157 XLogRecPtr backupStartPoint; /* + * Postgres keeps where to take a backup server. + * + * backupserver is none , master or slave, its default is none. + * When postgres starts and it is none, it is updated to either master + * or slave with minRecoveryPoint of the backup server. + * When postgres reaches backup end location, it is updated to none. + */ + int backupserver; + + /* * Parameter settings that determine if the WAL can be used for archival * or hot standby. */ I don't think the above comment is very clear on what backupserver is. Perhaps /** * backupserver is used while postgresql is in recovery mode to * store the location of where the backup comes from. * When Postgres starts recovery operations * it is set to none. During recovery it is updated to either master, or slave * When recovery operations finish it is updated back to none **/ Also shouldn't backupServer be the enum type of 'BackupServer' not int? Other enums in the structure such as DBState are defined this way. Testing Review -- Since I can't yet call pg_start_backup or pg_stop_backup() on the slave I am calling them on the master. (I also did some testing where I didn't put the system into backup mode). I admit that I am not sure what to look for as an indication that the system isn't recovering to the correct point. In much of my testing I was just verifying that the slave started and my data 'looked' okay. I seem to get this warning in my logs when I start up the instance based on the slave backup. LOG: 0: database system was interrupted while in recovery at log time 2011-07-08 18:40:20 EDT HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target I'm wondering if this warning is a bit misleading to users because it is an expected message when starting up an instance based on a slave backup (because the slave was already in recovery mode). If I shutdown this instance and start it up again I keep getting the warning. My understanding of your patch is that there shouldn't be any risk of corruption in that case (assuming your patch has no bugs). 
Can/should we be suppressing this message when we detect that we are recovering from a slave backup?

The direction of the patch has changed a bit during this commit fest. I think it would be good to provide an update on the rest of the changes you plan for this to be a complete usable feature. That would make it easier to comment on something you missed versus something you're planning on dealing with in the next stage. Steve Regards. Jun Ishizuka NTT Software Corporation TEL:045-317-7018 E-Mail: ishizuka@po.ntts.co.jp
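For what it's worth, a sketch of the standby-backup procedure as I read the current proposal (the port and paths are examples only, matching the test setups used earlier in these threads):

psql -p 5438 -c "SELECT pg_start_backup('label')"     # on the standby
cp -r $PGDATA /mnt/server/backupdir                    # copy the data directory
rm /mnt/server/backupdir/postmaster.pid
psql -p 5438 -c "SELECT pg_stop_backup()"
# copy the control file last, so its minimum recovery point covers the copy
cp $PGDATA/global/pg_control /mnt/server/backupdir/global

Copying pg_control after pg_stop_backup() is the step that replaces the end-of-backup location a backup from the master would normally get.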
Re: [HACKERS] libpq SSL with non-blocking sockets
On 11-06-28 02:14 PM, Martin Pihlak wrote: Thanks for the review! I have since simplified the patch to assume that partial SSL writes are disabled -- according to SSL_write(3) this is the default behaviour. Now the SSL retry buffer only holds the data to be retried, the remainder is moved to the new outBuffer. That sounds okay. Does it make sense to add in a check to verify that SSL didn't send a partial write? I don't know how bad openssl is about changing default behaviours or if we are concerned about protecting against someone changing the SSL parameters. My inclination is that this isn't needed but I'll raise the issue. Fixed. New version of the patch attached. Otherwise this version of the patch looks good to me. The only testing I have done is running the test program you sent earlier on in the thread and verified that the regression tests all pass. Other than something like your test program I'm not sure how else this bug can be induced. Since the original patch was submitted as a WIP patch and this version wasn't sent until well into the commit fest I am not sure if it qualifies for a committer during this commitfest or if it needs to wait until the next one. regards, Martin
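To make the retry requirement concrete, here is a standalone sketch (not libpq code) of why the pending data has to be kept around: with OpenSSL's defaults (partial writes disabled, moving write buffers not allowed), a retry after SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE must pass the same buffer and length as the call that failed.

#include <openssl/ssl.h>

/* Returns the number of bytes written, or -1 on a hard error. */
static int
ssl_write_all(SSL *ssl, const char *buf, int len)
{
    for (;;)
    {
        int n = SSL_write(ssl, buf, len);

        if (n > 0)
            return n;       /* with partial writes disabled this is all of len */

        switch (SSL_get_error(ssl, n))
        {
            case SSL_ERROR_WANT_READ:
            case SSL_ERROR_WANT_WRITE:
                /*
                 * A non-blocking caller would return to its event loop here
                 * and later retry with the SAME buf and len; for brevity
                 * this sketch just loops.
                 */
                continue;
            default:
                return -1;
        }
    }
}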
Re: [HACKERS] libpq SSL with non-blocking sockets
On 11-06-28 02:14 PM, Martin Pihlak wrote: Hmm, I thought I thought about that. There was a check in the original patch: if (conn-sslRetryBytes || (conn-outCount - remaining) 0) So if the SSL retry buffer was emptied it would return 1 if there was something left in the regular output buffer. Or did I miss something there? The issue I saw in the original patch was that at that point in the function, sslRetryBytes could be zero (if the data was sent) but conn-outCount - remaining would be an incorrect value since remaining is the number of bytes left to send from sslRetryBuf NOT conn-outBuffer. This is no longer an issue in the updated patch. I will try to take a closer look at your updated patch in the next few days. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Online base backup from the hot-standby
On 11-06-28 01:52 AM, Jun Ishiduka wrote: Considering everything that has been discussed on this thread so far. Do you still think your patch is the best way to accomplish base backups from standby servers? If not what changes do you think should be made? I reconsider the way to not use pg_stop_backup(). Process of online base backup on standby server: 1. pg_start_backup('x'); 2. copy the data directory 3. copy *pg_control* Behavior while restore: * read Minimum recovery ending location of the copied pg_control. * use the value with the same purposes as the end-of-backup location. - When the value is equal to 0/0, this behavior do not do. This situation is to acquire backup from master server. The behaviour you describe above sounds okay to me, if someone else sees issues with this then they should speak up (ideally before you go off and write a new patch) I'm going to consolidate my other comments below so this can act as a more complete review. Usability Review - We would like to be able to perform base backups from slaves without having to call pg_start_backup() on the master. We can not currently do this. The patch tries to do this. From a useability point of view it would be nice if this could be done both manually with pg_start_backup() and with pg_basebackup. The main issue I have with the first patch you submitted is that it does not work for cases where you don't want to call pg_basebackup or you don't want to include the wal segments in the output of pg_basebackup. There are a number of these use-cases (examples include the wal is already available on an archive server, or you want to use filesystem/disk array level snapshots instead of tar) . I feel your above proposal to copy the control file as the last step in the basebackup and the get the minRecoveryEnding location from this solves these issues. It would be nice if things 'worked' when calling pg_basebackup against the slave (maybe by having perform_base_backup() resend the control file after it has sent the log?). Feature test Performance review - Skipped since a new patch is coming Coding Review -- I didn't look too closely at the code since a new patch that might change a lot of the code. I did like how you added comments to most of the larger code blocks that you added. Architecture Review --- There were some concerns with your original approach but using the control file was suggested by Heikki and seems sound to me. I'm marking this 'waiting for author' , if you don't think you'll be able to get a reworked patch out during this commitfest then you should move it to 'returned with feedback' Steve Jun Ishizuka NTT Software Corporation TEL:045-317-7018 E-Mail: ishizuka@po.ntts.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Online base backup from the hot-standby
On 11-06-24 12:41 AM, Jun Ishiduka wrote: The logic that not use pg_stop_backup() would be difficult, because pg_stop_backup() is used to identify minRecoveryPoint. Considering everything that has been discussed on this thread so far. Do you still think your patch is the best way to accomplish base backups from standby servers? If not what changes do you think should be made? Steve Jun Ishizuka NTT Software Corporation TEL:045-317-7018 E-Mail: ishizuka@po.ntts.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq SSL with non-blocking sockets
On 11-06-15 03:20 PM, Martin Pihlak wrote: Yes, that sounds like a good idea -- especially considering that COPY is not the only operation that can cause SSL_write retries. This is of course still work in progress, needs cleaning up and definitely more testing. But at this point before going any further, I'd really appreciate a quick review from resident libpq gurus.

Martin, I'm not a libpq guru but I've given your patch a look over. The idea of storing the original buffer in new members of the connection structure for later re-use seems like a good way to approach this. A few things I noticed (that you might be aware of since you mentioned it needs cleanup):

- The patch doesn't compile with non-SSL builds; the debug at the bottom of pqSendSome isn't in an #ifdef.

- I don't think you're handling the return code properly. Consider this case:
pqSendSome(some data) -> sslRetryBuf = some data, returns 1
pqSendSome(more data) -> it sends all of 'some data', returns 0
I think 1 should be returned because all of 'more data' still needs to be sent. I think returning a 0 will break PQsetnonblocking if you call it when there is data in both sslRetryBuf and outputBuffer. We might even want to try sending the data in outputBuffer after we've sent all the data sitting in sslRetryBuf.

If you close the connection with an outstanding sslRetryBuf you need to free it.

Other than running your test program I didn't do any testing with this patch. Steve regards, Martin
Re: [HACKERS] Online base backup from the hot-standby
On 11-06-23 02:41 AM, Jun Ishiduka wrote: Having received this mail, I notice my understanding of what Heikki is proposing was wrong. My understanding: Before: * I thought Heikki was proposing to execute the SQL (pg_start_backup('x'); copy the data directory; pg_stop_backup();) from the standby server against the primary server. - I disliked this. Now: * Heikki is proposing both not using pg_basebackup and not specifying -x. So he thinks: * Use the output of pg_stop_backup(). * Don't use the backup history file. Right? What I think he is proposing would not require using pg_stop_backup(), but you could also modify pg_stop_backup() to work as well. What do you think of this idea? Do you see how the patch can be reworked to accomplish this? Jun Ishizuka NTT Software Corporation TEL:045-317-7018 E-Mail: ishizuka@po.ntts.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for 9.2: enhanced errors
On 11-06-20 03:44 PM, Pavel Stehule wrote: Hello You need to update config.sgml at the same time you update this format. You need to append a ',' after application name but before constraintName. As it stands the CSV log has something like: .nbtinsert.c:433,psqla_pkey,public,a,a fixed The CSV log seems fine now. nbtinsert.c: pg_get_indrelation is named differently than everything else in this file (i.e. _bt...). My guess is that this function belongs somewhere else, but I don't know the code well enough to say where you should move it to. I renamed this function to IndexRelationGetParentRelation and moved it to relcache.c Thanks, it looks less out of place there than it did in nbtinsert.c I don't call quote_identifier on plain data error properties like table_name or schema_name (but I am open to arguments for it or against it). quote_identifier is used for column names, because there can be more than one name and a comma is used between the names - and this is consistent with pg_get_indexdef_columns. Regards Okay. Pavel Stehule I'm going to mark this as ready for a committer.
Re: [HACKERS] Online base backup from the hot-standby
On 11-06-14 02:52 AM, Jun Ishiduka wrote: I still think that's headed in the wrong direction. (http://archives.postgresql.org/pgsql-hackers/2011-05/msg01405.php) Please check these mails, and explain why they are headed in the wrong direction. (http://archives.postgresql.org/pgsql-hackers/2011-06/msg00209.php) (http://archives.postgresql.org/pgsql-hackers/2011-05/msg01566.php) Jun, I've been reviewing these threads as a start to reviewing your patch (I haven't yet looked at the patch). I *think* the concern is that: 1) Today you can do a backup by just calling pg_start_backup('x'), copying the data directory and calling pg_stop_backup(); you do not need to use pg_basebackup to create a backup. The solution you are proposing would require pg_basebackup to be used to build backups from standby servers. 2) If I run pg_basebackup but do not specify '-x' then no pg_xlog segments are included in the output; the relevant pg_xlog segments are in the archive from the master. I can see situations where you are already copying the archive to the remote site that the new standby will be created in, so you don't want to have to copy the pg_xlog segments twice over your network. What Heikki is proposing will work both when you aren't using pg_basebackup (as long as the output of pg_stop_backup() is somehow captured in a way that it can be read) and will also work with pg_basebackup when '-x' isn't specified. Steve Jun Ishizuka NTT Software Corporation TEL:045-317-7018 E-Mail: ishizuka@po.ntts.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
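For reference, the manual procedure referred to in point (1) looks roughly like this today (the label and the copy method are arbitrary examples, not anything mandated):

-- on the master, in any session
SELECT pg_start_backup('nightly');
-- outside SQL: copy the data directory with tar, rsync, or a filesystem snapshot
SELECT pg_stop_backup();
-- the WAL needed to make the copy consistent is then available from the archive,
-- so the copy itself does not have to include pg_xlog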
Re: [HACKERS] patch for 9.2: enhanced errors
On Sun, 19 Jun 2011, Pavel Stehule wrote: Maybe there is a second issue (a small one - performance - you have to call an output function), but I agree that this information is very interesting and can help. I am concerned about the performance impact of doing that. Not all constraints are on int4 columns. Some constraints might be on a geometry type that is megabytes in size, taking a substantial chunk of CPU and bandwidth to convert into a text representation and then send back to the client. I am open to any ideas in this direction. Regards Pavel best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for 9.2: enhanced errors
On 11-06-08 04:14 PM, Pavel Stehule wrote: Hello Attached patch implements a new erros's fields that describes table, colums related to error. This enhanced info is limited to constraints and RI. Here is my review of this patch Submission Review: The patch applies cleanly against master The patch does not include any documentation updates (see note below to update config.sgml) The patch does not include any unit tests. At a minimum it should add a few tests with verbosity set to verbose Usability Review The patch adds the ability to get more information about the reasons a query failed. Pavel indicates that this is a building block for a later patch. This sounds like a worthwhile goal, without this patch I don't see another good way of getting the details on what columns make up the constraint that fails, other than making additional queries into the catalog. This patch should not impact pg_dump or pg_upgrade. Pavel has submitted a related patch that adds support for this feature to plpgsql, in theory other pl's might want to use the information this patch exposes. Feature Test --- The error messages behave as described with \set verbosity verbose. I tried this using both the 8.3 and 9.0 versions of psql (against a postgresql server with this patch) and things worked as expected (the extra error messages did not show). I also tried the patched psql against an 8.3 backend and verified that we don't get strange behaviour going against an older backend with verbosity set. I tried adding multiple constraints to a table and inserting a row that violates them, only one of the constraints showed up in the error message, this is fine and consistent with the existing behaviour Consider this example of an error that gets generated ERROR: 23505: duplicate key value violates unique constraint A_pkey DETAIL: Key (a)=(1) already exists. LOCATION: _bt_check_unique, nbtinsert.c:433 CONSTRAINT: A_pkey SCHEMA: public TABLE: A COLUMN: a STATEMENT: insert into A values (1); I think that both the CONSTRAINT, and TABLE name should be double quoted like A_pkey is above. When doing this make sure you don't break the quoting in the CSV format log. Performance Review - I don't see this patch impacting performance, I did not conduct any performance tests. Coding Review - In tupdesc.c line 202 the existing code is performing a deep copy of ConstrCheck. Do you need to copy nkeys and conkey here as well? Then at line 250 ccname is freed but not conkey postgres_ext.h line 55 + #define PG_DIAG_SCHEMA_NAME's' + #define PG_DIAG_TABLE_NAME't' + #define PG_DIAG_COLUMN_NAMES'c' + #define PG_DIAG_CONSTRAINT_NAME 'n' The assignment of letters to parameters seems arbitrary to me, I don't have a different non-arbitrary mapping in mind but if anyone else does they should speak up. I think it will be difficult to change this after 9.2 goes out. elog.c: *** *** 2197,2202 --- 2299,2319 if (application_name) appendCSVLiteral(buf, application_name); + /* constraint_name */ + appendCSVLiteral(buf, edata-constraint_name); + appendStringInfoChar(buf, ','); + + /* schema name */ + appendCSVLiteral(buf, edata-schema_name); + appendStringInfoChar(buf, ','); You need to update config.sgml at the same time you update this format. You need to append a , after application name but before constraintName. As it stands the CSV log has something like: .nbtinsert.c:433,psqla_pkey,public,a,a nbtinsert.c pg_get_indrelation is named differently than everything else in this file (ie _bt...). 
My guess is that this function belongs somewhere else, but I don't know the code well enough to say where you should move it to. Everything I've mentioned above is a minor issue; I will move the patch to 'waiting for author' and wait for you to release an updated patch. Steve Singer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for 9.2: enhanced errors
On 11-06-18 06:36 PM, Steve Singer wrote: On 11-06-08 04:14 PM, Pavel Stehule wrote: Here is my review of this patch Submission Review: The patch applies cleanly against master. The patch does not include any documentation updates (see note below to update config.sgml). The patch does not include any unit tests; at a minimum it should add a few tests with verbosity set to verbose. On second thought, tests might not work. Is the only way to get the constraint details in verbose mode, where line numbers from the C file are also included? If so then this won't work for the regression tests. Having the diff comparison fail every time someone makes an unrelated change to a source file isn't what we want. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_listener in 9.0
On 11-06-01 09:30 AM, Christopher Browne wrote: On Wed, Jun 1, 2011 at 8:29 AM, Dave Page dp...@pgadmin.org wrote: On Wed, Jun 1, 2011 at 12:27 PM, Andrew Dunstan and...@dunslane.net wrote: The whole point of the revamp was that pg_listener was a major performance bottleneck and needed to go, and without it being gone we would not have got notification payloads. Yeah, I know why it was replaced. That doesn't mean we cannot provide an alternative interface to the same info though (other things might of course). I suspect you're pretty much out of luck. Not me - our users. Note that in Slony 2.1, there's a table called sl_components, which is used to capture the state of the various database connections, checking in as the various threads do their various actions. Also, slon and slonik try to report their respective application, so it can be reported in pg_stat_activity. Slony 2.1 also sets application_name. If this were a big deal for pgAdmin we could consider backporting the application_name change to 2.0.x for users running against 9.0. Slony also has a table called sl_nodelock that each slon process adds a row to on startup. This includes the backend pid for one of the connections. Slony 1.2, 2.0 and 2.1 all use sl_nodelock. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dblink crash on PPC
On 11-05-27 12:35 PM, Tom Lane wrote: grebe, which is also a PPC64 machine, isn't showing the bug. And I just failed to reproduce the problem on a RHEL6 PPC64 box. About to go try it on RHEL5, which has a gcc version much closer to what wombat says it's using, but I'm not very hopeful about that. I think the more likely thing to be keeping in mind is that Gentoo is a platform with poor quality control. regards, tom lane As another data point, the dblink regression tests work fine for me on a PPC32 debian (squeeze,gcc 4.4.5) based system. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] branching for 9.2devel
On 11-04-25 03:40 PM, Robert Haas wrote: At the risk of getting a bit cranky, you haven't participated in a material way in any CommitFest we've had in well over a year. AFAICS, the first, last, and only time you are listed in the CommitFest application is as co-reviewer of a patch in July 2009, which means that the last time you really had a major roll in this process was during the 8.4 cycle. So I'm really rather suspicious that you know what's wrong with the process and how to fix it better than the people who are involved currently. I think we need here is more input from the people who are regularly submitting and reviewing patches, and those who have tried recently but been turned off by some aspect of the process. I reviewed a handful of patches during commitfests during the 9.1 release. I think a commitfest lasting one week will make me review fewer patches not more.At the start of a commitfest I would volunteer to review a single patch knowing that it wouldn't be hard to find 4-6 hours to review the patch during the next few weeks. Once I was done with the first patch when I thought I'd have sometime in the next few days to review another patch I would pick another one off the list. At the start of the commitfest I wasn't concerned about picking up a patch because I knew I had lots of time to get to it. Near the end of the last commitfest I wasn't concerned about picking up an unassigned patch because there were so many patches people picked up earlier on in the commitfest waiting for review that I didn't think I'd get pressured on a patch I had only picked up a day or two ago. If I knew I was expected to turn a review around in a short window I think I would only pick up a patch if I was 90% sure I'd have time to get to it during the week. I sometimes can spend $work time on reviewing patches but I can rarely block time off or schedule when that will be, and the same somewhat applies to the patch reviews I do on my own time (postgresql stuff comes after other commitments). It is easy to say four weeks in a row I won't have time to review one patch this week it is harder to say I won't have time to review a single patch in the next month I also should add that sometimes I'd review a patch and the only issue from the review might be is this really how we want the thing to work? and the commitfest app doesn't have a good state for this. If patch needs feedback or a decision from the community and it sometimes isn't clear when enough silence or +1's justify moving it to a committer or bouncing the patch to be reworked. Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] JDBC connections to 9.1
I'm getting JDBC exceptions when I try to connect to 9.1 (master) with the postgresql-9.0-801.jdbc3.jar I don't have this issue with 9.0. There is nothing obvious at http://jdbc.postgresql.org or in the 9.1 alpha release notes that indicate a newer JDBC driver will be required. Have other people seen similar issues? If 9.1 does require a JDBC driver upgrade then it would be good if an updated driver was posted before we start encouraging people to test their applications with the beta. Caused by: org.postgresql.util.PSQLException: Protocol error. Session setup failed. at org.postgresql.core.v3.ConnectionFactoryImpl.readStartupMessages(ConnectionFactoryImpl.java:496) at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:112) at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:66) at org.postgresql.jdbc2.AbstractJdbc2Connection.init(AbstractJdbc2Connection.java:125) at org.postgresql.jdbc3.AbstractJdbc3Connection.init(AbstractJdbc3Connection.java:30) at org.postgresql.jdbc3.Jdbc3Connection.init(Jdbc3Connection.java:24) at org.postgresql.Driver.makeConnection(Driver.java:393) at org.postgresql.Driver.connect(Driver.java:267) at java.sql.DriverManager.getConnection(DriverManager.java:620) at java.sql.DriverManager.getConnection(DriverManager.java:200) at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.mozilla.javascript.MemberBox.invoke(MemberBox.java:161) ... 14 more -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JDBC connections to 9.1
On 11-04-18 09:44 AM, Tom Lane wrote: Steve Singerssin...@ca.afilias.info writes: I'm getting JDBC exceptions when I try to connect to 9.1 (master) with the postgresql-9.0-801.jdbc3.jar I don't have this issue with 9.0. Hmm, what shows up in the postmaster log? regards, tom lane LOG: unexpected EOF on client connection -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python custom exceptions for SPI
On 11-02-12 05:58 AM, Jan Urbański wrote: On 11/02/11 10:53, Jan Urbański wrote: On 10/02/11 22:26, Steve Singer wrote: Here's an updated patch with documentation. It's an incremental patch on top of the latest explicit-subxacts version. This looks fine. I've attached a one word documentation change to go o top of the patch. I'll let Peter decide if he likes those assert's or not. I don't have a good enough sense of if we often use asserts in that fashion elsewhere. Cheers, Jan diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml index aee54bf..4a90430 100644 *** a/doc/src/sgml/plpython.sgml --- b/doc/src/sgml/plpython.sgml *** $$ LANGUAGE plpythonu; *** 966,972 /programlisting /para para ! The actual class of the exception being raised corresponds to exact condition that caused the error (refer to xref linkend=errcodes-table for a list of possible conditions). The literalplpy.spiexceptions/literal module defines an exception class for --- 966,972 /programlisting /para para ! The actual class of the exception being raised corresponds to the exact condition that caused the error (refer to xref linkend=errcodes-table for a list of possible conditions). The literalplpy.spiexceptions/literal module defines an exception class for -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
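To illustrate what the spiexceptions module enables once the documentation lands, a small sketch (the table and function are made up; the class name assumes the CamelCase mapping of condition names that the patch describes):

CREATE FUNCTION insert_or_skip(val integer) RETURNS text AS $$
try:
    plpy.execute("INSERT INTO foo VALUES (%d)" % val)
except plpy.spiexceptions.UniqueViolation:
    return "duplicate, skipped"
return "inserted"
$$ LANGUAGE plpythonu;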
Re: [HACKERS] pl/python explicit subtransactions
On 11-02-10 05:20 AM, Jan Urbański wrote: D'oh, I was thinking about whether it's safe to skip the internal subxact if you're in an implicit one and somehow I always convinced myself that since you eventually close the explicit one, it is. Obviously my testing wasn't enough :( Attaching an updated patch with improved docs incorporating Steve's fixes, and fixes tests for not statring the implicit subxact. That actually makes the patch a bit smaller ;) OTOH I had to remove the section from the docs that claimed performance improvement due to only starting the explicit subxact... This version of the patch looks fine to me and seems to work as expected. Cheers, Jan
Re: [HACKERS] log_hostname and pg_stat_activity
On 11-02-10 10:13 AM, Robert Haas wrote: On Tue, Feb 1, 2011 at 1:33 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Feb 1, 2011 at 1:09 PM, Peter Eisentraut pete...@gmx.net wrote: On tis, 2011-01-18 at 19:24 -0500, Steve Singer wrote: However if I connect with a line in pg_hba that matches on an IP network then my client_hostname is always null unless log_hostname is set to true. This is consistent with the behavior you describe but I think the average user will find it a bit confusing. Having a column that is always null unless a GUC is set is less than ideal but I understand why log_hostname isn't on by default. Well, we have all these track_* variables, which also control what appears in the statistics views. After thinking about this some more, I think it might be better to be less cute and forget about the interaction with the pg_hba.conf hostname behavior. That is, the host name is set if and only if log_hostname is on. +1 for doing it that way. I think there are no outstanding issues with this patch of any significance, so I'm marking it Ready for Committer. Was there an updated version of this patch I missed? The original patch needed some sort of documentation saying that having something show up in the new pg_stat_activity columns is controlled by log_hostname. Above, Peter and you seem to agree that having the line matched in pg_hba be a controlling factor should be removed, but I haven't seen an updated patch that implements that. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] log_hostname and pg_stat_activity
On 11-02-10 10:32 AM, Robert Haas wrote: I was assuming those changes were sufficiently trivial that they could be made at commit-time, especially if Peter is committing it himself. Of course if he'd like a re-review, he can always post an updated patch, but I just thought that was overly pedantic in this particular case. Sounds reasonable. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python custom exceptions for SPI
On 11-02-10 03:13 PM, Jan Urbański wrote: On 10/02/11 20:24, Peter Eisentraut wrote: Here is the rest of my review. Submission Review --- Patch applies cleanly. Documentation is still outstanding but Jan has promised it soon. Usability Review --- We don't have this for plpython yet, though we have a similar idea in plpgsql. I think this feature is useful and worth having. The CamelCase naming of the exceptions is consistent with how the built-in python exceptions are named (camel case). Feature Test --- I did basic testing of the feature (catching a few exception types thrown from both direct SQL and prepared statements) and the feature worked as expected. Performance Impact The impact of mapping error codes to exception types shouldn't come into play unless an SPI error is returned, and with the hash it should still be minimal. Code Review - Ideally the char * members of ExceptionMap would be const, but since many versions of python take a non-const value to PyErr_NewException that won't work :( After you search for an exception in the hash you have: /* We really should find it, but just in case have a fallback */ Assert(entry != NULL); exc = entry ? entry->exc : PLy_exc_spi_error; I'm not sure the assert is needed here. Just falling back to the exception type seems reasonable, and more desirable than an assert, if somehow a new exception gets missed from the list. I don't feel that strongly on this. line 3575: PLy_elog(ERROR, Failed to add the spiexceptions module); Failed should be failed Other than that the patch looks fine to me. Updated again. Why do the error messages print spiexceptions.SyntaxError instead of plpy.spiexceptions.SyntaxError? Is this intentional or just the way it comes out of Python? That's how traceback.format_exception() works IIRC, which is what the Python interpreter uses and what PL/Python mimics in PLy_traceback. Please add some documentation. Not a list of all exceptions, but at least a paragraph that various kinds of specific exceptions may be generated, what package and module they are in, and how they relate. Sure, Steve already asked for docs in another thread, and I'm writing them. Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python explicit subtransactions
On 11-02-09 05:22 PM, Peter Eisentraut wrote: On tis, 2011-02-08 at 00:32 -0500, Steve Singer wrote: On 11-02-06 11:40 AM, Jan Urbański wrote: PFA an updated patch with documentation. Yeah, changed them. Those changes look fine. The tests now pass. I've attached a new version of the patch that fixes a few typos/wording issues I saw in the documentation. I also changed the link to the python reference manual section on context managers. I think it is better to link to that versus the original PEP. The documentation could probably still use more word-smithing but that can happen later. I'm marking this as ready for a committer. Is it necessarily a good idea that an explicit subtransaction disables the implicit sub-subtransactions? It might be conceivable that you'd still want to do some try/catch within explicit subtransactions. I had tested nested subtransactions but not a normal try/catch within a subtransaction. That sounds reasonable to allow. Unfortunately it leads to: test=# create table foo(a int4 primary key); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index foo_pkey for table foo CREATE TABLE test=# DO $$ test$# try: test$# with plpy.subtransaction(): test$# plpy.execute(insert into foo values(1)) test$# try: test$# plpy.execute(insert into foo values(1)) test$# except: test$# plpy.notice('inside exception') test$# except plpy.SPIError: test$# f=0 test$# $$ language plpythonu; TRAP: FailedAssertion(!(afterTriggers-query_depth == afterTriggers-depth_stack[my_level]), File: trigger.c, Line: 3846) NOTICE: inside exception CONTEXT: PL/Python anonymous code block server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postponing some large patches to 9.2
On 11-02-08 10:07 AM, Jan Urbański wrote: * custom SPI exceptions - I'd really like this one to go in, because it allows writing UPSERT-kind functions in PL/Python very easily, and it's just a handful of lines of code I will try to do a review of this one (probably tomorrow night) since I've reviewed many of the related patches. * don't remove arguments - a bugfix, really, and a very small one So from the above, I'd say custom datatype parsers could get rejected if noone feels like having a discussion about it for 9.1. Table functions, custom SPI exceptions and tracebacks are niceties that if postponed to 9.2 will just mean that many features less in 9.1. The rest is bordering on bugfixes, and I think should go in. Cheers, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postponing some large patches to 9.2
On 11-02-07 10:37 PM, Robert Haas wrote: - The PL/python extravaganza. I'm not really clear where we stand with this. There are a lot of patches here. Some of the patches have been committed a few others are ready (or almost ready) for a committer. The table function one is the only one in 'waiting for author' 4 of the patches haven't yet received any review. Jan Urbanski has been pretty good about posting updated patches as the dependent patches get updated. It would be good if a few people grabbed these. The individual patches tend to not be that large. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python explicit subtransactions
On 11-02-06 11:40 AM, Jan Urbański wrote: PFA an updated patch with documentation. Yeah, changed them. Those changes look fine. The tests now pass. I've attached a new version of the patch that fixes a few typos/wording issues I saw in the documentation. I also changed the link to the python reference manual section on context managers. I think it is better to link to that versus the original PEP. The documentation could probably still use more word-smithing but that can happen later. I'm marking this as ready for a committer. Thanks, Jan diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml index e05c293..9137ceb 100644 *** a/doc/src/sgml/plpython.sgml --- b/doc/src/sgml/plpython.sgml *** $$ LANGUAGE plpythonu; *** 943,949 /sect2 ! sect2 titleTrapping Errors/title para --- 943,949 /sect2 ! sect2 id=plpython-trapping titleTrapping Errors/title para *** $$ LANGUAGE plpythonu; *** 969,974 --- 969,1087 /sect2 /sect1 + sect1 id=plpython-subtransaction + titleExplicit subtransactions/title + para + Recovering from errors caused by database access as described + in xref linkend=plpython-trapping can lead to an undesirable situation + where some operations succeed before one of them fails and after recovering + from that error the data is left in an inconsistent state. PL/Python offers + a solution to this problem in the form of explicit subtransactions. + /para + + sect2 +titleSubtransaction context managers/title +para + Consider a function that implements a transfer between two accounts: + programlisting + CREATE FUNCTION transfer_funds() RETURNS void AS $$ + try: + plpy.execute(UPDATE accounts SET balance = balance - 100 WHERE account_name = 'joe') + plpy.execute(UPDATE accounts SET balance = balance + 100 WHERE account_name = 'mary') + except plpy.SPIError, e: + result = error transferring funds: %s % e.args + else: + result = funds transferred correctly + plpy.execute(INSERT INTO operations(result) VALUES ('%s') % result) + $$ LANGUAGE plpythonu; + /programlisting + If the second literalUPDATE/literal statement results in an exception + being raised, this function will report the error, but the result of the + first literalUPDATE/literal will nevertheless be committed. In other + words, the funds will be withdrawn from Joe's account, but will not be + transferred to Mary's account. +/para +para + To avoid such issues, you can wrap your literalplpy.execute/literal + calls in an explicit subtransaction. The literalplpy/literal module + provides a helper object to manage explicit subtransactions that gets + created with the literalplpy.subtransaction()/literal function. + Objects created by this function implement the ulink url=http://www.python.org/doc//current/library/stdtypes.html#context-manager-types;context manager interface/ulink. + + Using explicit subtransactions we can rewrite our function as: + programlisting + CREATE FUNCTION transfer_funds2() RETURNS void AS $$ + try: + with plpy.subtransaction(): + plpy.execute(UPDATE accounts SET balance = balance - 100 WHERE account_name = 'joe') + plpy.execute(UPDATE accounts SET balance = balance + 100 WHERE account_name = 'mary') + except plpy.SPIError, e: + result = error transferring funds: %s % e.args + else: + result = funds transferred correctly + plpy.execute(INSERT INTO operations(result) VALUES ('%s') % result) + $$ LANGUAGE plpythonu; + /programlisting + Note that the use of literaltry/catch/literal is still + required. 
Otherwise the exception would propagate to the top of the Python + stack and would cause the whole function to abort with + a productnamePostgreSQL/productname error. + The literaloperations/literal table would not have any row inserted + into it. The subtransaction context manager does not trap errors, it only + assures that all database operations executed inside its scope will be + atomically committed or rolled back. A rollback of the subtransaction + block occurs on any kind of exception exit, not only ones caused by + errors originating from database access. A regular Python exception raised + inside an explicit subtransaction block would also cause the + subtransaction to be rolled back. +/para +para + Another reason to use explicit subtransactions is that each + time literalplpy.execute/literal or literalplpy.prepare/literal is + used, it has to create its own internal subtransaction in order to be able + to recover from errors using the literaltry/catch/literal construct. If + you explicitly put your database access code in the scope of a +
Re: [HACKERS] pl/python explicit subtransactions
On 11-01-27 05:11 PM, Jan Urbański wrote: On 23/12/10 15:32, Jan Urbański wrote: Here's a patch implementing explicitly starting subtransactions mentioned in http://archives.postgresql.org/pgsql-hackers/2010-12/msg01991.php. It's an incremental patch on top of the spi-in-subxacts patch sent earlier. Updated to the spi-in-subxacts version sent earlier. Submission Review - The patch applies against master. Test updates are included. The patch doesn't include any documentation updates; the author did mention that he'll do these if it looks like the patch is going to be accepted. The plpython_subxact regression test you added is failing on both python3 and 2.4 for me. It seems to be creating functions with the same name twice, and the second time is failing with ERROR: function . already exists. I think this is just an issue with your expected files. Usability Review --- The patch implements a python context manager that allows plpython programs to control subtransactions with the python 'with' syntax. The patch implements what it describes. Using the subtransaction manager seems consistent with other examples of Python context managers. This feature seems useful for pl/python developers. The 'with' syntax was only officially added with python 2.6. I have confirmed that the patch does not break plpython going as far back as 2.5 and 2.4. I have no reason to think that earlier versions will be broken either; I just didn't test anything earlier than 2.4. I think this feature is useful for developing more complicated functions in pl/python, and we don't have an existing way of managing savepoints from pl/python. The context manager approach seems consistent with how recent python versions deal with this type of thing in other areas. Feature Test No issues found. Code Review PLy_abort_open_subtransactions(...) line 1215: ereport(WARNING, (errmsg(Forcibly aborting a subtransaction that has not been exited))); Forcibly should be forcibly (lower case) Similarly, in PLy_subxact_enter and PLy_subxact_exit a few PLy_exception_set calls start with an upper case character when I think we want them to be lower case. Other than that I don't see any issues. I am marking this as waiting for author since the documentation is still outstanding. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python SPI in subtransactions
On 11-01-27 04:33 PM, Jan Urbański wrote: Right, without the patch you can never catch errors originating from plpy.execute, so any error terminates the whole function, and so rolls back the statement. FWIW PL/Perl works the same: begin; create table foo(i int primary key); DO $$ spi_exec_query(insert into foo values ('1')); eval { spi_exec_query(insert into foo values ('1')); }; elog(LOG, $@) if $@; $$ language plperl; select * from foo; You will see a row in foo. AFAICS PL/Tcl also does it, but I don't have it complied it to try. It does consitute a behaviour change, but we didn't get any complains when the same thing happened for Perl. If we've made this type of behaviour change for pl/perl and no one complained then I don't see an issue with doing it for plpython (if anyone does they should speak up) I am finding the treatment of savepoints very strange. If as a function author I'm able to recover from errors then I'd expect (or maybe want) to be able to manage them through savepoints Ooops, you found a bug there. In the attached patch you get the same error (SPI_ERROR_TRANSACTION) as in master. Also added a unit test for that. I think you need to make the same change to PLy_spi_execute_plan. Try BEGIN; create table foo(a int4 primary key); DO $$ prep=plpy.prepare(savepoint save) plpy.execute(prep) r=plpy.execute(insert into foo values ('1')) try : r=plpy.execute(insert into foo values ('1')) except: prep=plpy.prepare(rollback to save) plpy.execute(prep) plpy.log(something went wrong) $$ language plpythonu; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python SPI in subtransactions
On 11-01-29 03:39 PM, Jan Urbański wrote: D'oh, you're right, thanks. Attached patch with fix. Curiosly, right now in master your example with plpy.prepare will result in savepoint being swallowed, but it's of course better to react with an error. Cheers, Jan This seems to fix it. You mentioned that you were going to add a few paragraphs to the documentation saying that you can now actually catch SPI errors. I haven't seen that yet. Other than that I don't see any issues with the patch and it should be ready for a committer. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python SPI in subtransactions
On 10-12-23 08:45 AM, Jan Urbański wrote: Here's a patch implementing a executing SPI in an subtransaction mentioned in http://archives.postgresql.org/pgsql-hackers/2010-12/msg01991.php. It's an incremental patch on top of the plpython-refactor patch sent eariler. Git branch for this patch: https://github.com/wulczer/postgres/tree/spi-in-subxacts. Without it the error handling in PL/Python is really broken, as we jump between from a saught longjmp back into Python without any cleanup. As an additional bonus you no longer get all the ugly unrecognized error in PLy_spi_execute_query errors. I see you've merged the changes from the refactoring branch down but haven't yet posted an updated patch. This review is based on 2f2b4a33bf344058620a5c684d1f2459e505c727 As a disclaimer, I have worked python before but not used plpython for anything substantial. Submission Review --- I think Jan intends to post an updated patch once the refactor branch has been dealt with. The patch updates the excepted results of the regression tests so they no longer expect the 'unrecognized error' warnings. No new unit test are added to verify that behavior changes will continue to function as intended (though they could be) No documentation updates are included. The current documentation is silent on the behaviour of plpython when SPI calls generate errors so this change doesn't invalidate any documentation but it would be nice if we described what effect SQL invoked through SPI from the functions have on the transaction. Maybe a Trapping Errors section? Usability Review --- Does the patch implement what it says: yes Do we want this: yes I think so. This patch allows pl/python functions to catch errors from the SQL they issue and deal with them as the function author sees fit. I see this being useful. A concern I have is that some users might find this surprising. For plpgsql the exception handling will rollback SQL from before the exception and I suspect the other PL's are the same. It would be good if a few more people chimed in on if they see this as a good idea. Another concern is that we are probably breaking some peoples code. Consider the following: BEGIN; create table foo(a int4 primary key); DO $$ r=plpy.execute(insert into foo values ('1')) try : r=plpy.execute(insert into foo values ('1')) except: plpy.log(something went wrong) $$ language plpython2u; select * FROM foo; a --- 1 (1 row) This code is going to behave different with the patch. Without the patch the select fails because a) the transaction has rolled back which rollsback both insert and the create table. With the patch the first row shows up in the select. How concerned are we with changing the behaviour of existing plpython functions? This needs discussion. I am finding the treatment of savepoints very strange. If as a function author I'm able to recover from errors then I'd expect (or maybe want) to be able to manage them through savepoints BEGIN; create table foo(a int4 primary key); DO $$ plpy.execute(savepoint save) r=plpy.execute(insert into foo values ('1')) try : r=plpy.execute(insert into foo values ('1')) except: plpy.execute(rollback to save) plpy.log(something went wrong) $$ language plpython2u; select * FROM foo; a --- 1 (1 row) when I wrote the above I was expecting either an error when I tried to use the savepoint (like in plpgsql) or for it rollback the insert. Without the patch I get PL/Python: plpy.SPIError: SPI_execute failed: SPI_ERROR_TRANSACTION This is much better than silently ignoring the command. 
I've only glanced at your transaction manager patch; from what I can tell it will give me another way of managing the inner transaction, but I'm not sure it will make the savepoint calls work (will it?). I also wonder how useful in practice this patch will be if the other patch doesn't also get applied (function authors will be stuck with all-or-nothing as their only option for error handling). Code Review --- I don't see any issues with the code. Cheers, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] log_hostname and pg_stat_activity
Here is my review for this patch Submission Review - Patch applies cleanly. - Patch does not include documentation changes. At a minimum: update the table that lists what pg_stat_activity and pg_stat_replication include in monitoring.sgml, but I propose more below. - No tests are included, but writing unit tests that depend on producing the same output involving the hostname of the client is not possible. Usability review See my comments below in the testing section. The patch does do what it says, but the log_hostname issue is a usability issue (it not being obvious why you get only nulls when log_hostname is turned off). Documenting it might be fine. If log_hostname were new to 9.1 I would encourage renaming it to something that implies it does more than just control logging output, but I don't see this as a good enough reason to rename a parameter. I think being able to see the hostnames that connections in pg_stat_activity come from is useful, and it is a feature we don't already have. Feature Test If my connection is authorized through a line in pg_hba that matches on a hostname then the column shows what I expect, even with log_hostname set to off. However if I connect with a line in pg_hba that matches on an IP network then my client_hostname is always null unless log_hostname is set to true. This is consistent with the behavior you describe, but I think the average user will find it a bit confusing. Having a column that is always null unless a GUC is set is less than ideal, but I understand why log_hostname isn't on by default. I would be inclined to add an entry for these views in catalogs.sgml where we can then give users a pointer that they need to set log_hostname to get anything out of this column. If I connect through unix sockets (say psql -h /tmp --port 5433) I was also expecting to see [local] in client_hostname, but I am just getting NULL. Since this lookup is static I don't see why it should be dependent on log_hostname (even with log_hostname set I'm not seeing [local]). I have not tested this with ipv6. Performance Review -- The lookup is done when the connection is established, not each time the view is queried. I don't see any performance issues. Coding Review - As Alvaro pointed out, BackendStatusShmemSize should be updated. To answer his question about why clientaddr works: clientaddr is a SockAddr, which is a structure not a pointer, so the data gets copied by value to beentry. That won't work for strings; I have no issues with how you're allocating space in beentry and then strlcpy'ing the values. I see no issues with the implementation. I'm marking this as 'Waiting for Author' pending documentation changes and maybe a fix for the behaviour with unix sockets Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
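For anyone trying the patch, the behaviour described above is easy to check with a query along these lines (9.1-era column names; client_hostname stays NULL for IP-matched connections unless log_hostname is on):

-- requires log_hostname = on for connections matched by IP in pg_hba.conf
SELECT procpid, usename, client_addr, client_hostname
FROM pg_stat_activity;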
Re: [HACKERS] Patch to add a primary key using an existing index
I've taken a look at this version of the patch. Submission Review This version of the patch applies cleanly to master. It matches your git repo and includes test + docs. Usability Review --- The command syntax now matches what was discussed during the last cf. The text of the notice: test=# alter table a add constraint acons unique using index aind2; NOTICE: ALTER TABLE / ADD UNIQUE USING INDEX will rename index aind2 to acons Documentation -- I've attached a patch (to be applied on top of your latest patch) with some editorial changes I'd recommend to your documentation. I feel it reads a bit clearer (but others should chime in if they disagree or have better wordings) A git tree with changes rebased to master + this change is available at https://github.com/ssinger/postgres/tree/ssinger/constraint_with_index Code Review --- src/backend/parser/parse_utilcmd.c: 1452 Your calling strdup on the attribute name. I don't have a good enough grasp on the code to be able to trace this through to where the memory gets free'd. Does it get freed? Should/could this be a call to pstrdup Feature Test - I wasn't able to find any issues in my testing I'm marking this as returned with feedback pending your answer on the possible memory leak above but I think the patch is very close to being ready. Steve Singer diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml index 83d2fbb..0b486ab 100644 *** a/doc/src/sgml/ref/alter_table.sgml --- b/doc/src/sgml/ref/alter_table.sgml *** ALTER TABLE replaceable class=PARAMETE *** 242,268 termliteralADD replaceable class=PARAMETERtable_constraint_using_index/replaceable/literal/term listitem para ! This form adds a new literalPRIMARY KEY/ or literalUNIQUE/ constraint to the table using an existing index. All the columns of the index will be included in the constraint. /para para ! The index should be UNIQUE, and should not be a firsttermpartial index/ ! or an firsttermexpressional index/. /para para ! This can be helpful in situations where one wishes to recreate or ! literalREINDEX/ the index of a literalPRIMARY KEY/ or a ! literalUNIQUE/ constraint, but dropping and recreating the constraint ! to acheive the effect is not desirable. See the illustrative example below. /para note para If a constraint name is provided then the index will be renamed to that ! name, else the constraint will be named to match the index name. /para /note --- 242,270 termliteralADD replaceable class=PARAMETERtable_constraint_using_index/replaceable/literal/term listitem para ! This form adds a new literalPRIMARY KEY/ or literalUNIQUE/literal constraint to the table using an existing index. All the columns of the index will be included in the constraint. /para para ! The index should be a UNIQUE index. A firsttermpartial index/firstterm ! or an firsttermexpressional index/firstterm is not allowed. /para para ! Adding a constraint using an existing index can be helpful in situations ! where you wishes to rebuild an index used for a ! literalPRIMARY KEY/literal or a literalUNIQUE/literal constraint, ! but dropping and recreating the constraint ! is not desirable. See the illustrative example below. /para note para If a constraint name is provided then the index will be renamed to that ! name of the constraint. Otherwise the constraint will be named to match ! the index name. /para /note -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
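For context, the workflow the documentation change is describing looks roughly like this (table, column and index names are made up; the syntax is the one shown in the patch under review):

-- rebuild the index behind a UNIQUE constraint without a long exclusive lock
CREATE UNIQUE INDEX CONCURRENTLY a_key_new ON a (x);
ALTER TABLE a DROP CONSTRAINT a_x_key;
ALTER TABLE a ADD CONSTRAINT a_x_key UNIQUE USING INDEX a_key_new;
-- per the NOTICE discussed above, a_key_new is renamed to match the constraint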
Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4
On 10-12-06 09:00 PM, Josh Berkus wrote: Steve, If you tell me which options to pgbench and which .conf file settings you'd like to see I can probably arrange to run some tests on AIX. Compile and run test_fsync in PGSRC/src/tools/fsync. Attached are runs against two different disk sub-systems from a server running AIX 5.3. The first one is against the local disks Loops = 1 Simple write: 8k write 60812.454/second Compare file sync methods using one write: open_datasync 8k write 162.160/second open_sync 8k write 158.472/second 8k write, fdatasync 158.157/second 8k write, fsync 45.382/second Compare file sync methods using two writes: 2 open_datasync 8k writes79.472/second 2 open_sync 8k writes80.095/second 8k write, 8k write, fdatasync 159.268/second 8k write, 8k write, fsync44.725/second Compare open_sync with different sizes: open_sync 16k write 162.017/second 2 open_sync 8k writes79.709/second Test if fsync on non-write file descriptor is honored: (If the times are similar, fsync() can sync data written on a different descriptor.) 8k write, fsync, close 45.361/second 8k write, close, fsync 36.311/second The below profile is from the same machine using an IBM DS 6800 SAN for storage. Loops = 1 Simple write: 8k write 75933.027/second Compare file sync methods using one write: open_datasync 8k write 2762.801/second open_sync 8k write 2453.822/second 8k write, fdatasync2867.331/second 8k write, fsync1094.048/second Compare file sync methods using two writes: 2 open_datasync 8k writes 1287.845/second 2 open_sync 8k writes 1332.084/second 8k write, 8k write, fdatasync 1966.411/second 8k write, 8k write, fsync 1048.354/second Compare open_sync with different sizes: open_sync 16k write2281.425/second 2 open_sync 8k writes 1401.561/second Test if fsync on non-write file descriptor is honored: (If the times are similar, fsync() can sync data written on a different descriptor.) 8k write, fsync, close 1298.404/second 8k write, close, fsync 1188.582/second -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4
On 10-12-06 06:56 PM, Greg Smith wrote: Tom Lane wrote: The various testing that's been reported so far is all for Linux and thus doesn't directly address the question of whether other kernels will have similar performance properties. Survey of some popular platforms: snip So my guess is that some small percentage of Windows users might notice a change here, and some testing on FreeBSD would be useful too. That's about it for platforms that I think anybody needs to worry about. If you tell me which options to pgbench and which .conf file settings you'd like to see I can probably arrange to run some tests on AIX. -- Greg Smith 2ndQuadrant usg...@2ndquadrant.comBaltimore, MD PostgreSQL Training, Services and Supportwww.2ndQuadrant.us PostgreSQL 9.0 High Performance:http://www.2ndQuadrant.com/books -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to add a primary key using an existing index
On 10-11-22 03:24 PM, Steve Singer wrote: On 10-11-22 09:37 AM, Gurjeet Singh wrote: On Sat, Nov 20, 2010 at 9:00 AM, Steve Singer ssinger...@sympatico.ca Almost fixed. I still get an unexpected difference. ! DETAIL: cannot create PRIMARY KEY/UNIQUE constraint with a non-unique index. CREATE UNIQUE INDEX rpi_idx2 ON rpi_test(a , b); -- should fail; WITH INDEX option specified more than once. ALTER TABLE rpi_test ADD PRIMARY KEY (a, b) --- 35,41 -- should fail; non-unique ALTER TABLE rpi_test ADD primary key(a, b) WITH (INDEX = 'rpi_idx1'); ERROR: rpi_idx1 is not a unique index ! DETAIL: Cannot create PRIMARY KEY/UNIQUE constraint using a non-unique index. The attached version of the patch gets your regression tests to pass. I'm going to mark this as ready for a committer. replace_pkey_index.revised2.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to add a primary key using an existing index
On 10-11-22 09:37 AM, Gurjeet Singh wrote: On Sat, Nov 20, 2010 at 9:00 AM, Steve Singer ssinger...@sympatico.ca mailto:ssinger...@sympatico.ca wrote: Submission Review: Tests The expected output for the regression tests you added don't match what I'm getting when I run the tests with your patch applied. I think you just need to regenerate the expected results they seem to be from a previous version of the patch (different error messages etc..). Fixed. Also modified one test to cover the case where constraint name is provided. Almost fixed. I still get an unexpected difference. ! DETAIL: cannot create PRIMARY KEY/UNIQUE constraint with a non-unique index. CREATE UNIQUE INDEX rpi_idx2 ON rpi_test(a , b); -- should fail; WITH INDEX option specified more than once. ALTER TABLE rpi_test ADD PRIMARY KEY (a, b) --- 35,41 -- should fail; non-unique ALTER TABLE rpi_test ADD primary key(a, b) WITH (INDEX = 'rpi_idx1'); ERROR: rpi_idx1 is not a unique index ! DETAIL: Cannot create PRIMARY KEY/UNIQUE constraint using a non-unique index. Documentation --- I was able to generate the docs. The ALTER TABLE page under the synopsis has ADD table_constraint where table_constraint is defined on the CREATE TABLE page. On the CREATE TABLE page table_constraint isn't defined as having the WITH , the WITH is part of index_parameters. I propose the alter table page instead have ADD table_constraint [index_parameters] where index_parameters also references the CREATE TABLE page like table_constraint. IMHO index_parameters is an optional component of table_constraint, and hence can't be mentioned here, at least not the way shown above. My reading of CREATE TABLE is that index_parameters is an optional parameter that comes after table_constraint and isn't part of table_constraint. Any other opinions? Everything else I mentioned seems fixed in this version gurjeet.singh @ EnterpriseDB - The Enterprise Postgres Company http://www.EnterpriseDB.com singh.gurj...@{ gmail | yahoo }.com Twitter/Skype: singh_gurjeet Mail sent from my BlackLaptop device
Re: [HACKERS] Patch to add a primary key using an existing index
On 10-11-07 01:54 PM, Gurjeet Singh wrote: Attached is the patch that extends the same feature for UNIQUE indexes. It also includes some doc changes for the ALTER TABLE command, but I could not verify the resulting changes since I don't have the doc-building infrastructure installed. Regards, Gurjeet, I've taken a stab at reviewing this. Submission Review: Tests The expected output for the regression tests you added don't match what I'm getting when I run the tests with your patch applied. I think you just need to regenerate the expected results they seem to be from a previous version of the patch (different error messages etc..). Documentation --- I was able to generate the docs. The ALTER TABLE page under the synopsis has ADD table_constraint where table_constraint is defined on the CREATE TABLE page. On the CREATE TABLE page table_constraint isn't defined as having the WITH , the WITH is part of index_parameters. I propose the alter table page instead have ADD table_constraint [index_parameters] where index_parameters also references the CREATE TABLE page like table_constraint. Usability Review Behaviour - I feel that if the ALTER TABLE ... renames the the index a NOTICE should be generated. We generate notices about creating an index for a new pkey. We should give them a notice that we are renaming an index on them. Coding Review: == Error Messages - in tablecmds your errdetail messages often don't start with a capital letter. I belive the preference is to have the errdetail strings start with a capital letter and end with a period. tablecmds.c - get_constraint_index_oid contains the check /* Currently only B-tree indexes are suupported for primary keys */ if (index_rel-rd_rel-relam != BTREE_AM_OID) elog(ERROR, \%s\ is not a B-Tree index, index_name); but above we already validate that the index is a unique index with another check. Today only B-tree indexes support unique constraints. If this changed at some point and we could have a unique index of some other type, would something in this patch need to be changed to support them? If we are only depending on the uniqueness property then I think this check is covered by the uniquness one higher in the function. Also note the typo in your comment above (suupported) Comments - index.c: Line 671 and 694. Your indentation changes make the comments run over 80 characters. If you end up submitting a new version of the patch I'd reformat those two comments. Other than those issues the patch looks good to me. Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: rollback sequence reset for TRUNCATE ... RESTART IDENTITY
On 10-11-17 03:00 PM, Marc Cousin wrote: The Wednesday 17 November 2010 19:41:19, Tom Lane wrote : Marc Cousincousinm...@gmail.com writes: - Does the feature work as advertised? Yes. It works consistently, isn't fooled by savepoints or multiple serials in a table, or concurrent transactions I think there's a rather nasty problem here, which is what to do with the cached nextval/currval state. As submitted, the patch does the same thing as ALTER SEQUENCE RESTART (to wit, clear any cached unissued nextval values, but don't touch currval) at the time of resetting the sequence. That's fine, but what if the transaction later rolls back? The cached state is untouched by rollback, so if the transaction had done any nextval()s meanwhile, the cache will be out of step with the rolled-back sequence contents. Yes, I completely missed testing with non default cache value. And it fails, of course, some values are generated a second time twice after a rollback I will look at addressing this in an updated patch. We never had to worry about this before because sequence operations didn't roll back, by definition. If we're going to add a situation where they do roll back, we need to consider the case. I think we can arrange to clear cached unissued values on the next attempt to nextval() the sequence, by dint of adding the relfilenode to SeqTable entries and clearing cached state whenever we note that it doesn't match the current relfilenode of the sequence. However, I'm unsure what ought to happen to currval. It doesn't seem too practical to try to roll it back to its pre-transaction value. Should we leave it alone (ie, possibly reflecting a value that was assigned inside the failed transaction)? The other alternative would be to clear it as though nextval had never been issued at all in the session. Should currval really be used after a failed transaction ? Right now, we can have a value that has been generated inside a rollbacked transaction too. I'd vote for leave it alone. I agree probably shouldn't be using curval after a failed transaction which is why having it return as if it hadn't been issued sounds like a more reasonable behaviour. If an application tries a currval following the rollback then at least the application won't get a bogus value. It is better to return an error than to let the application continuing on thinking it has a sequence value that won't be (or has not) been assigned to some other session. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
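To make the expectation concrete, this is the kind of sequence the patch is meant to handle (the table and values are made up; the comments describe the expected behaviour with the patch applied, not current released behaviour):

CREATE TABLE t (id serial PRIMARY KEY, v text);
INSERT INTO t (v) VALUES ('a'), ('b'), ('c');   -- t_id_seq has handed out 1..3
BEGIN;
TRUNCATE t RESTART IDENTITY;                    -- resets t_id_seq to 1
ROLLBACK;
INSERT INTO t (v) VALUES ('d');
SELECT max(id) FROM t;                          -- with the patch: 4, not a reused 1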
[HACKERS] Rollback sequence reset on TRUNCATE rollback patch
The attached patch modifies TRUNCATE ... RESTART IDENTITY so that if the transaction rolls back the restart of the sequence will also be rolled back. It follows the general outline discussed at http://archives.postgresql.org/pgsql-hackers/2008-05/msg00550.php of assigning a new reffilenode to the sequence. I will add this to the next commitfest. Steve diff --git a/doc/src/sgml/ref/truncate.sgml b/doc/src/sgml/ref/truncate.sgml index f32d255..137eade 100644 *** a/doc/src/sgml/ref/truncate.sgml --- b/doc/src/sgml/ref/truncate.sgml *** TRUNCATE [ TABLE ] [ ONLY ] replaceable *** 159,190 transaction does not commit. /para - warning -para - Any commandALTER SEQUENCE RESTART/ operations performed as a - consequence of using the literalRESTART IDENTITY/ option are - nontransactional and will not be rolled back on failure. To minimize - the risk, these operations are performed only after all the rest of - commandTRUNCATE/'s work is done. However, there is still a risk - if commandTRUNCATE/ is performed inside a transaction block that is - aborted afterwards. For example, consider - programlisting - BEGIN; - TRUNCATE TABLE foo RESTART IDENTITY; - COPY foo FROM ...; - COMMIT; - /programlisting - - If the commandCOPY/ fails partway through, the table data - rolls back correctly, but the sequences will be left with values - that are probably smaller than they had before, possibly leading - to duplicate-key failures or other problems in later transactions. - If this is likely to be a problem, it's best to avoid using - literalRESTART IDENTITY/, and accept that the new contents of - the table will have higher serial numbers than the old. -/para - /warning /refsect1 refsect1 --- 159,165 diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c index 04b0c71..4fb9093 100644 *** a/src/backend/commands/sequence.c --- b/src/backend/commands/sequence.c *** *** 35,41 #include utils/lsyscache.h #include utils/resowner.h #include utils/syscache.h ! /* * We don't want to log each fetching of a value from a sequence, --- 35,41 #include utils/lsyscache.h #include utils/resowner.h #include utils/syscache.h ! #include utils/snapmgr.h /* * We don't want to log each fetching of a value from a sequence, *** static void init_params(List *options, b *** 96,101 --- 96,104 static void do_setval(Oid relid, int64 next, bool iscalled); static void process_owned_by(Relation seqrel, List *owned_by); + static void init_seq_relation(Relation rel,TupleDesc tupDesc,Datum * value, + bool * null,List * owned_by); + /* * DefineSequence *** DefineSequence(CreateSeqStmt *seq) *** 109,119 CreateStmt *stmt = makeNode(CreateStmt); Oid seqoid; Relation rel; ! Buffer buf; ! Page page; ! sequence_magic *sm; ! HeapTuple tuple; ! TupleDesc tupDesc; Datum value[SEQ_COL_LASTCOL]; bool null[SEQ_COL_LASTCOL]; int i; --- 112,118 CreateStmt *stmt = makeNode(CreateStmt); Oid seqoid; Relation rel; ! TupleDesc tupDesc; Datum value[SEQ_COL_LASTCOL]; bool null[SEQ_COL_LASTCOL]; int i; *** DefineSequence(CreateSeqStmt *seq) *** 210,217 rel = heap_open(seqoid, AccessExclusiveLock); tupDesc = RelationGetDescr(rel); ! /* Initialize first page of relation with special magic number */ buf = ReadBuffer(rel, P_NEW); Assert(BufferGetBlockNumber(buf) == 0); --- 209,293 rel = heap_open(seqoid, AccessExclusiveLock); tupDesc = RelationGetDescr(rel); + init_seq_relation(rel, tupDesc,value,null,owned_by); + heap_close(rel, NoLock); + } ! /** ! * Resets the relation used by a sequence. ! * ! * The sequence is reset to its initial values, ! 
* the old sequence is left in place in case the ! * transaction rolls back. ! */ ! void ResetSequenceRelation(Oid seq_relid,List * options) ! { ! Relation seq_rel = relation_open(seq_relid,AccessExclusiveLock); ! SeqTable elm; ! Page page; ! Form_pg_sequence seq; ! Buffer buf; ! TupleDesc tupDesc; ! sequence_magic * sm; ! HeapTupleData tuple; ! ItemId lp; ! Datum * values; ! bool * isnull; ! ! /** ! * Read the old sequence. ! * ! */ ! init_sequence(seq_relid,elm,seq_rel); ! seq = read_info(elm,seq_rel,buf); ! page = BufferGetPage(buf); ! sm = (sequence_magic *) PageGetSpecialPointer(page); ! ! if (sm-magic != SEQ_MAGIC) ! elog(ERROR, bad magic number in sequence \%s\: %08X, ! RelationGetRelationName(seq_rel), sm-magic); ! ! lp = PageGetItemId(page, FirstOffsetNumber); ! Assert(ItemIdIsNormal(lp)); ! ! tuple.t_data = (HeapTupleHeader) PageGetItem(page, lp); ! tupDesc = RelationGetDescr(seq_rel); ! values=palloc(sizeof(Datum)*tupDesc-natts); ! isnull=palloc(sizeof(bool)*tupDesc-natts); !
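As a rough illustration of the user-visible behaviour this patch aims for (table and column names are made up for the example; the "before" behaviour is what the removed documentation warning described):

CREATE TABLE item (id serial, label text);
INSERT INTO item (label) VALUES ('x'), ('y');   -- sequence has advanced past 2

BEGIN;
TRUNCATE item RESTART IDENTITY;                 -- resets the owned sequence
ROLLBACK;                                       -- previously the reset survived the rollback

-- With the patch the reset sequence is written under a new relfilenode and the
-- old one is left in place, so rolling back restores the pre-TRUNCATE sequence
-- state and numbering continues where it left off instead of repeating 1.
INSERT INTO item (label) VALUES ('z');          -- gets id 3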
Re: [HACKERS] Review: Fix snapshot taking inconsistencies
On Sun, 10 Oct 2010, Marko Tiikkaja wrote: On 2010-10-07 5:21 AM +0300, Steve Singer wrote: Since no one else has proposed a better idea and the commit fest is ticking away, I think you should go ahead and do that.

Here's a new version of the patch, deprecating pg_parse_and_rewrite. I duplicated the parse/rewrite logic in the two places where pg_parse_and_rewrite is currently used, per comment from Tom.

That looks fine. I'm marking the patch as ready for a committer.

Regards, Marko Tiikkaja
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep at Oct 5
On 10-10-07 05:52 AM, Fujii Masao wrote: On Wed, Oct 6, 2010 at 4:06 PM, Simon Riggs si...@2ndquadrant.com wrote: The problem is how much WAL is stored on (any) node. Currently that is wal_keep_segments, which doesn't work very well, but I've seen no better ideas that cover all important cases.

What about allowing the master to read and send WAL from the archive? Regards,

Then you have to deal with telling the archive how long it needs to keep WAL segments, because the master might ask for them back. If the archive is remote from the master, there is also some extra network copying going on. It would be better to let the slave that is being reconfigured read the missing WAL from the archive.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers