[HACKERS] recapitulation: FOREACH-IN-ARRAY

2010-12-23 Thread Pavel Stehule
Hello

I reread the discussion on this topic. It looks like we can find
agreement on the following syntax:

FOREACH var [,var [..]] [SLICE number] IN expr
LOOP
END LOOP;

In the default mode, without the SLICE keyword, it will iterate over
every element, like unnest does. With the SLICE keyword, it iterates
over nested arrays. The depth can be chosen; I don't see a reason why
the depth should be dynamic, so it can be a constant.
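To make the proposed semantics concrete, here is a Python stand-in (illustrative only, not the PL/pgSQL implementation; `foreach` and `slice_depth` are invented names): the default mode flattens like unnest, while SLICE n yields the sub-arrays that still have n dimensions.

```python
def foreach(arr, slice_depth=0):
    # Number of nested list levels before a scalar element.
    def depth(a):
        d = 0
        while isinstance(a, list):
            d += 1
            a = a[0]
        return d

    # Default mode: visit every scalar element, like unnest().
    def flatten(a):
        for x in a:
            if isinstance(x, list):
                for y in flatten(x):
                    yield y
            else:
                yield x

    # SLICE n: yield sub-arrays that still have n dimensions.
    def slices(a, keep):
        if depth(a) == keep:
            yield a
        else:
            for x in a:
                for s in slices(x, keep):
                    yield s

    if slice_depth == 0:
        return list(flatten(arr))
    return list(slices(arr, slice_depth))

matrix = [[1, 2], [3, 4]]
print(foreach(matrix))                 # [1, 2, 3, 4]
print(foreach(matrix, slice_depth=1))  # [[1, 2], [3, 4]]
```

With SLICE 1 the loop variable receives one-dimensional sub-arrays; with no SLICE it receives scalars.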

The variable list is used only when the item is of record or row type.

I am not sure which is the better keyword - SLICE or SLICING?

Regards

Pavel Stehule

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Magnus Hagander
Here's a patch that changes walsender to require a special privilege
for replication instead of relying on superuser permissions. We
discussed this back before 9.0 was finalized, but IIRC we ran out of
time. The motivation being that you really want to use superuser as
little as possible - and since being a replication slave is a read
only role, it shouldn't require the maximum permission available in
the system.

Obviously the patch needs docs and some system views updates, which I
will add later. But I wanted to post what I have so far for a quick
review to confirm whether I'm on the right track or not... How it
works should be rather obvious - it adds WITH REPLICATION/NOREPLICATION
to the CREATE ROLE and ALTER ROLE commands, and then checks this when a
connection attempts to start the walsender.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
*** a/src/backend/commands/user.c
--- b/src/backend/commands/user.c
***
*** 94,99  CreateRole(CreateRoleStmt *stmt)
--- 94,100 
  	bool		createrole = false;		/* Can this user create roles? */
  	bool		createdb = false;		/* Can the user create databases? */
  	bool		canlogin = false;		/* Can this user login? */
+ 	bool		isreplication = false; /* Is this a replication role? */
  	int			connlimit = -1; /* maximum connections allowed */
  	List	   *addroleto = NIL;	/* roles to make this a member of */
  	List	   *rolemembers = NIL;		/* roles to be members of this role */
***
*** 107,112  CreateRole(CreateRoleStmt *stmt)
--- 108,114 
  	DefElem	   *dcreaterole = NULL;
  	DefElem	   *dcreatedb = NULL;
  	DefElem	   *dcanlogin = NULL;
+ 	DefElem	   *disreplication = NULL;
  	DefElem	   *dconnlimit = NULL;
  	DefElem	   *daddroleto = NULL;
  	DefElem	   *drolemembers = NULL;
***
*** 190,195  CreateRole(CreateRoleStmt *stmt)
--- 192,205 
  		 errmsg("conflicting or redundant options")));
  			dcanlogin = defel;
  		}
+ 		else if (strcmp(defel->defname, "isreplication") == 0)
+ 		{
+ 			if (disreplication)
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_SYNTAX_ERROR),
+ 						 errmsg("conflicting or redundant options")));
+ 			disreplication = defel;
+ 		}
  		else if (strcmp(defel->defname, "connectionlimit") == 0)
  		{
  			if (dconnlimit)
***
*** 247,252  CreateRole(CreateRoleStmt *stmt)
--- 257,264 
  		createdb = intVal(dcreatedb->arg) != 0;
  	if (dcanlogin)
  		canlogin = intVal(dcanlogin->arg) != 0;
+ 	if (disreplication)
+ 		isreplication = intVal(disreplication->arg) != 0;
  	if (dconnlimit)
  	{
  		connlimit = intVal(dconnlimit-arg);
***
*** 341,346  CreateRole(CreateRoleStmt *stmt)
--- 353,359 
  	/* superuser gets catupdate right by default */
  	new_record[Anum_pg_authid_rolcatupdate - 1] = BoolGetDatum(issuper);
  	new_record[Anum_pg_authid_rolcanlogin - 1] = BoolGetDatum(canlogin);
+ 	new_record[Anum_pg_authid_rolreplication - 1] = BoolGetDatum(isreplication);
  	new_record[Anum_pg_authid_rolconnlimit - 1] = Int32GetDatum(connlimit);
  
  	if (password)
***
*** 439,444  AlterRole(AlterRoleStmt *stmt)
--- 452,458 
  	int			createrole = -1;	/* Can this user create roles? */
  	int			createdb = -1;	/* Can the user create databases? */
  	int			canlogin = -1;	/* Can this user login? */
+ 	int			isreplication = -1; /* Is this a replication role? */
  	int			connlimit = -1; /* maximum connections allowed */
  	List	   *rolemembers = NIL;		/* roles to be added/removed */
  	char	   *validUntil = NULL;		/* time the login is valid until */
***
*** 450,455  AlterRole(AlterRoleStmt *stmt)
--- 464,470 
  	DefElem	   *dcreaterole = NULL;
  	DefElem	   *dcreatedb = NULL;
  	DefElem	   *dcanlogin = NULL;
+ 	DefElem	   *disreplication = NULL;
  	DefElem	   *dconnlimit = NULL;
  	DefElem	   *drolemembers = NULL;
  	DefElem	   *dvalidUntil = NULL;
***
*** 514,519  AlterRole(AlterRoleStmt *stmt)
--- 529,542 
  		 errmsg("conflicting or redundant options")));
  			dcanlogin = defel;
  		}
+ 		else if (strcmp(defel->defname, "isreplication") == 0)
+ 		{
+ 			if (disreplication)
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_SYNTAX_ERROR),
+ 						 errmsg("conflicting or redundant options")));
+ 			disreplication = defel;
+ 		}
  		else if (strcmp(defel->defname, "connectionlimit") == 0)
  		{
  			if (dconnlimit)
***
*** 556,561  AlterRole(AlterRoleStmt *stmt)
--- 579,586 
  		createdb = intVal(dcreatedb->arg);
  	if (dcanlogin)
  		canlogin = intVal(dcanlogin->arg);
+ 	if (disreplication)
+ 		isreplication = intVal(disreplication->arg);
  	if (dconnlimit)
  	{
  		connlimit = intVal(dconnlimit-arg);
***
*** 600,605  AlterRole(AlterRoleStmt *stmt)
--- 625,631 
  			  createrole < 0 &&
  			  createdb < 0 &&
  			  canlogin < 0 &&
+ 			  isreplication < 0 &&
  			  !dconnlimit &&
  			  !rolemembers &&
  			  !validUntil &&
***
*** 685,690  AlterRole(AlterRoleStmt 

Re: [HACKERS] pl/python improvements

2010-12-23 Thread Marti Raudsepp
On Thu, Dec 23, 2010 at 04:08, Jan Urbański wulc...@wulczer.org wrote:
  * providing custom exceptions for SPI errors, so you can catch only
 UniqueViolations and not have to muck around with SQLCODE

py-postgresql already has a mapping from error codes to Python
exceptions. I think it makes sense to re-use that, instead of
inventing new names.
https://github.com/jwp/py-postgresql/blob/v1.1/postgresql/exceptions.py

It also follows the Python convention of ending exception class names
with "Error": instead of UniqueViolation they have UniqueError, and
instead of InvalidTextRepresentation they have TextRepresentationError.

 Meanwhile the code
 is available at https://github.com/wulczer/postgres. You will find 10
 branches there, 9 correspond to these features, and the plpython
 branch is the sum of them all.

I tried building the plpython branch, but got an unrelated error. I
didn't investigate further for now...

make[3]: Entering directory
`/home/marti/src/postgresql-py/src/backend/bootstrap'
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -I. -I. -I../../../src/include -D_GNU_SOURCE   -c -o
bootparse.o bootparse.c
bootparse.y: In function ‘boot_yyparse’:
bootparse.y:224:16: error: too few arguments to function ‘heap_create’
../../../src/include/catalog/heap.h:37:17: note: declared here
bootparse.y:249:16: error: too few arguments to function
‘heap_create_with_catalog’
../../../src/include/catalog/heap.h:48:12: note: declared here
make[3]: *** [bootparse.o] Error 1


Regards,
Marti



Re: [HACKERS] pl/python improvements

2010-12-23 Thread Jan Urbański
On 23/12/10 12:16, Marti Raudsepp wrote:
 On Thu, Dec 23, 2010 at 04:08, Jan Urbański wulc...@wulczer.org wrote:
  * providing custom exceptions for SPI errors, so you can catch only
 UniqueViolations and not have to muck around with SQLCODE
 
 py-postgresql already has a mapping from error codes to Python
 exceptions. I think it makes sense to re-use that, instead of
 inventing new names.
 https://github.com/jwp/py-postgresql/blob/v1.1/postgresql/exceptions.py
 
 It also follows the Python convention of ending exception classes with
 Error, so instead of UniqueViolation they have UniqueError, instead
 of InvalidTextRepresentation, they have TextRepresentationError

Oh, I didn't know that. I see that it does some fancier things, like
defining an inheritance hierarchy for these exceptions and adding some
more into the mix.

The names I used are not really invented, they're just plpgsql condition
names from
http://www.postgresql.org/docs/current/static/errcodes-appendix.html
with underscores changed to camel case. Also, since they're
autogenerated from utils/errcodes.h they don't have any hierarchy, they
just all inherit from SPIError.

Sticking "Error" onto every one of them would result in things like
SubstringErrorError, so I'm not really sold on that. Basically I think
more PL/Python users will be familiar with the condition names as used
in PL/pgSQL than with the names from py-postgresql.
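The autogeneration Jan describes can be sketched like this (a Python illustration of the scheme, not PL/Python's actual code; the helper names are made up): each condition name from the errcodes appendix becomes a CamelCase class inheriting flatly from SPIError.

```python
class SPIError(Exception):
    """Flat base class; every generated exception inherits from it."""

def condition_to_class_name(condition):
    # unique_violation -> UniqueViolation, per the errcodes appendix.
    return ''.join(word.capitalize() for word in condition.split('_'))

def generate_exceptions(conditions):
    # Build one exception class per condition name, all rooted at SPIError.
    return {cond: type(condition_to_class_name(cond), (SPIError,), {})
            for cond in conditions}

spi_exceptions = generate_exceptions(
    ['unique_violation', 'invalid_text_representation'])

try:
    raise spi_exceptions['unique_violation']('duplicate key')
except SPIError as e:            # catching the base class still works
    print(type(e).__name__)      # UniqueViolation
```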

 Meanwhile the code
 is available at https://github.com/wulczer/postgres. You will find 10
 branches there, 9 correspond to these features, and the plpython
 branch is the sum of them all.
 
 I tried building the plpython branch, but got an unrelated error. I
 didn't investigate further for now...
 
 make[3]: Entering directory
 `/home/marti/src/postgresql-py/src/backend/bootstrap'
 gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
 -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
 -fwrapv -I. -I. -I../../../src/include -D_GNU_SOURCE   -c -o
 bootparse.o bootparse.c
 bootparse.y: In function ‘boot_yyparse’:
 bootparse.y:224:16: error: too few arguments to function ‘heap_create’
 ../../../src/include/catalog/heap.h:37:17: note: declared here
 bootparse.y:249:16: error: too few arguments to function
 ‘heap_create_with_catalog’
 ../../../src/include/catalog/heap.h:48:12: note: declared here
 make[3]: *** [bootparse.o] Error 1

I'm pretty sure it'll go away if you do make maintainer-clean or git
clean -dfx after switching to the plpython branch.

Cheers,
Jan



[HACKERS] Recovery conflict monitoring

2010-12-23 Thread Magnus Hagander
This patch adds counters and views to monitor hot standby generated
recovery conflicts. It extends the pg_stat_database view with one
column with the total number of conflicts, and also creates a new view
pg_stat_database_conflicts that contains a breakdown of exactly what
caused the conflicts.

Documentation is still pending, but comments in the meantime are of course appreciated ;)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
***
*** 502,508  CREATE VIEW pg_stat_database AS
  pg_stat_get_db_tuples_fetched(D.oid) AS tup_fetched,
  pg_stat_get_db_tuples_inserted(D.oid) AS tup_inserted,
  pg_stat_get_db_tuples_updated(D.oid) AS tup_updated,
! pg_stat_get_db_tuples_deleted(D.oid) AS tup_deleted
  FROM pg_database D;
  
  CREATE VIEW pg_stat_user_functions AS
--- 502,521 
  pg_stat_get_db_tuples_fetched(D.oid) AS tup_fetched,
  pg_stat_get_db_tuples_inserted(D.oid) AS tup_inserted,
  pg_stat_get_db_tuples_updated(D.oid) AS tup_updated,
! pg_stat_get_db_tuples_deleted(D.oid) AS tup_deleted,
! 	pg_stat_get_db_conflict_all(D.oid) AS conflicts
! FROM pg_database D;
! 
! CREATE VIEW pg_stat_database_conflicts AS
! SELECT
! D.oid AS datid,
! D.datname AS datname,
! pg_stat_get_db_conflict_database(D.oid) AS confl_database,
! pg_stat_get_db_conflict_tablespace(D.oid) AS confl_tablespace,
! pg_stat_get_db_conflict_lock(D.oid) AS confl_lock,
! pg_stat_get_db_conflict_snapshot(D.oid) AS confl_snapshot,
! pg_stat_get_db_conflict_bufferpin(D.oid) AS confl_bufferpin,
! pg_stat_get_db_conflict_startup_deadlock(D.oid) AS confl_deadlock
  FROM pg_database D;
  
  CREATE VIEW pg_stat_user_functions AS
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***
*** 57,62 
--- 57,63 
  #include "storage/ipc.h"
  #include "storage/pg_shmem.h"
  #include "storage/pmsignal.h"
+ #include "storage/procsignal.h"
  #include "utils/guc.h"
  #include "utils/memutils.h"
  #include "utils/ps_status.h"
***
*** 278,283  static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
--- 279,285 
  static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
  static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
  static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
+ static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len);
  
  
  /* 
***
*** 1314,1319  pgstat_report_analyze(Relation rel, bool adopt_counts,
--- 1316,1340 
  	pgstat_send(&msg, sizeof(msg));
  }
  
+ /* 
+  * pgstat_report_recovery_conflict() -
+  *
+  *  Tell the collector about a Hot Standby recovery conflict.
+  * 
+  */
+ void
+ pgstat_report_recovery_conflict(int reason)
+ {
+ 	PgStat_MsgRecoveryConflict msg;
+ 
+ 	if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+ 		return;
+ 
+ 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RECOVERYCONFLICT);
+ 	msg.m_databaseid = MyDatabaseId;
+ 	msg.m_reason = reason;
+ 	pgstat_send(&msg, sizeof(msg));
+ }
  
  /* --
   * pgstat_ping() -
***
*** 3053,3058  PgstatCollectorMain(int argc, char *argv[])
--- 3074,3083 
  	pgstat_recv_funcpurge((PgStat_MsgFuncpurge *) &msg, len);
  	break;
  
+ case PGSTAT_MTYPE_RECOVERYCONFLICT:
+ 	pgstat_recv_recoveryconflict((PgStat_MsgRecoveryConflict *) &msg, len);
+ 	break;
+ 
  default:
  	break;
  			}
***
*** 3129,3134  pgstat_get_db_entry(Oid databaseid, bool create)
--- 3154,3165 
  		result->n_tuples_updated = 0;
  		result->n_tuples_deleted = 0;
  		result->last_autovac_time = 0;
+ 		result->n_conflict_database = 0;
+ 		result->n_conflict_tablespace = 0;
+ 		result->n_conflict_lock = 0;
+ 		result->n_conflict_snapshot = 0;
+ 		result->n_conflict_bufferpin = 0;
+ 		result->n_conflict_startup_deadlock = 0;
  
  		memset(&hash_ctl, 0, sizeof(hash_ctl));
  		hash_ctl.keysize = sizeof(Oid);
***
*** 4204,4209  pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
--- 4235,4275 
  }
  
  /* --
+  * pgstat_recv_recoveryconflict() -
+  *
+  *  Process a RECOVERYCONFLICT message.
+  * --
+  */
+ static void
+ pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len)
+ {
+ 	PgStat_StatDBEntry *dbentry;
+ 	dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
+ 
+ 	switch (msg->m_reason)
+ 	{
+ 		case PROCSIG_RECOVERY_CONFLICT_DATABASE:
+ 			dbentry->n_conflict_database++;
+ 			break;
+ 		case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
+ 			dbentry->n_conflict_tablespace++;
+ 			break;
+ 		case PROCSIG_RECOVERY_CONFLICT_LOCK:
+ 			

[HACKERS] pl/python validator function

2010-12-23 Thread Jan Urbański
Here's a patch implementing the validator function mentioned in
http://archives.postgresql.org/pgsql-hackers/2010-12/msg01991.php. It's
an incremental patch on top of the plpython-refactor patch sent earlier.

Git branch for this patch:
https://github.com/wulczer/postgres/tree/validator.

Cheers,
Jan



Re: [HACKERS] pl/python validator function

2010-12-23 Thread Jan Urbański
On 23/12/10 14:41, Jan Urbański wrote:
 Here's a patch implementing the validator function mentioned in
 http://archives.postgresql.org/pgsql-hackers/2010-12/msg01991.php. It's
 an incremental patch on top of the plpython-refactor patch sent earlier.
 
 Git branch for this patch:
 https://github.com/wulczer/postgres/tree/validator.
 
 Cheers,
 Jan

Yes, here's a patch.

Jan

diff --git a/src/include/catalog/pg_pltemplate.h b/src/include/catalog/pg_pltemplate.h
index 6befcfa..2f1d8e1 100644
*** a/src/include/catalog/pg_pltemplate.h
--- b/src/include/catalog/pg_pltemplate.h
*** DATA(insert ( pltcl		t t pltcl_call_h
*** 72,79 
  DATA(insert ( "pltclu"		f f "pltclu_call_handler" _null_ _null_ "$libdir/pltcl" _null_ ));
  DATA(insert ( "plperl"		t t "plperl_call_handler" "plperl_inline_handler" "plperl_validator" "$libdir/plperl" _null_ ));
  DATA(insert ( "plperlu"		f f "plperl_call_handler" "plperl_inline_handler" "plperl_validator" "$libdir/plperl" _null_ ));
! DATA(insert ( "plpythonu"	f f "plpython_call_handler" "plpython_inline_handler" _null_ "$libdir/plpython" _null_ ));
! DATA(insert ( "plpython2u"	f f "plpython_call_handler" "plpython_inline_handler" _null_ "$libdir/plpython2" _null_ ));
! DATA(insert ( "plpython3u"	f f "plpython3_call_handler" "plpython3_inline_handler" _null_ "$libdir/plpython3" _null_ ));
  
  #endif   /* PG_PLTEMPLATE_H */
--- 72,79 
  DATA(insert ( "pltclu"		f f "pltclu_call_handler" _null_ _null_ "$libdir/pltcl" _null_ ));
  DATA(insert ( "plperl"		t t "plperl_call_handler" "plperl_inline_handler" "plperl_validator" "$libdir/plperl" _null_ ));
  DATA(insert ( "plperlu"		f f "plperl_call_handler" "plperl_inline_handler" "plperl_validator" "$libdir/plperl" _null_ ));
! DATA(insert ( "plpythonu"	f f "plpython_call_handler" "plpython_inline_handler" "plpython_validator" "$libdir/plpython" _null_ ));
! DATA(insert ( "plpython2u"	f f "plpython_call_handler" "plpython_inline_handler" "plpython_validator" "$libdir/plpython2" _null_ ));
! DATA(insert ( "plpython3u"	f f "plpython3_call_handler" "plpython3_inline_handler" "plpython3_validator" "$libdir/plpython3" _null_ ));
  
  #endif   /* PG_PLTEMPLATE_H */
diff --git a/src/pl/plpython/expected/plpython_error.out b/src/pl/plpython/expected/plpython_error.out
index 70890a8..a9e1f15 100644
*** a/src/pl/plpython/expected/plpython_error.out
--- b/src/pl/plpython/expected/plpython_error.out
***
*** 1,6 
--- 1,30 
  -- test error handling, i forgot to restore Warn_restart in
  -- the trigger handler once. the errors and subsequent core dump were
  -- interesting.
+ /* Flat out Python syntax error
+  */
+ CREATE FUNCTION python_syntax_error() RETURNS text
+ AS
+ '.syntaxerror'
+ LANGUAGE plpythonu;
+ ERROR:  could not compile PL/Python function python_syntax_error
+ DETAIL:  SyntaxError: invalid syntax (string, line 2)
+ /* With check_function_bodies = false the function should get defined
+  * and the error reported when called
+  */
+ SET check_function_bodies = false;
+ CREATE FUNCTION python_syntax_error() RETURNS text
+ AS
+ '.syntaxerror'
+ LANGUAGE plpythonu;
+ SELECT python_syntax_error();
+ ERROR:  could not compile PL/Python function python_syntax_error
+ DETAIL:  SyntaxError: invalid syntax (string, line 2)
+ /* Run the function twice to check if the hashtable entry gets cleaned up */
+ SELECT python_syntax_error();
+ ERROR:  could not compile PL/Python function python_syntax_error
+ DETAIL:  SyntaxError: invalid syntax (string, line 2)
+ RESET check_function_bodies;
  /* Flat out syntax error
   */
  CREATE FUNCTION sql_syntax_error() RETURNS text
diff --git a/src/pl/plpython/expected/plpython_record.out b/src/pl/plpython/expected/plpython_record.out
index c8c4f9d..770f764 100644
*** a/src/pl/plpython/expected/plpython_record.out
--- b/src/pl/plpython/expected/plpython_record.out
*** CREATE FUNCTION test_in_out_params_multi
*** 47,52 
--- 47,53 
   second out text, third out text) AS $$
  return first + '_record_in_to_out';
  $$ LANGUAGE plpythonu;
+ ERROR:  PL/Python functions cannot return type record
  CREATE FUNCTION test_inout_params(first inout text) AS $$
  return first + '_inout';
  $$ LANGUAGE plpythonu;
*** SELECT * FROM test_in_out_params('test_i
*** 299,305 
  
  -- this doesn't work yet :-(
  SELECT * FROM test_in_out_params_multi('test_in');
! ERROR:  PL/Python functions cannot return type record
  SELECT * FROM test_inout_params('test_in');
   first 
  ---
--- 300,309 
  
  -- this doesn't work yet :-(
  SELECT * FROM test_in_out_params_multi('test_in');
! ERROR:  function test_in_out_params_multi(unknown) does not exist
! LINE 1: SELECT * FROM test_in_out_params_multi('test_in');
!   ^
! HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
  SELECT * FROM test_inout_params('test_in');
   first 
  ---
diff --git a/src/pl/plpython/expected/plpython_types.out 

[HACKERS] pl/python SPI in subtransactions

2010-12-23 Thread Jan Urbański
Here's a patch implementing execution of SPI calls in a subtransaction,
as mentioned in
http://archives.postgresql.org/pgsql-hackers/2010-12/msg01991.php. It's
an incremental patch on top of the plpython-refactor patch sent earlier.

Git branch for this patch:
https://github.com/wulczer/postgres/tree/spi-in-subxacts.

Without it the error handling in PL/Python is really broken, as we jump
from a caught longjmp back into Python without any cleanup. As an
additional bonus you no longer get all the ugly "unrecognized error in
PLy_spi_execute_query" errors.
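The pattern is the standard savepoint dance. Here is a rough stand-in (plain Python with sqlite3 rather than SPI, so every name here is illustrative, not the patch's code) showing why running each query inside its own subtransaction makes errors catchable and keeps the session usable:

```python
import sqlite3

def execute_in_subxact(conn, sql):
    conn.execute("SAVEPOINT spi")                  # begin the inner transaction
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error as e:
        conn.execute("ROLLBACK TO SAVEPOINT spi")  # undo only the failed query
        conn.execute("RELEASE SAVEPOINT spi")
        raise RuntimeError(str(e))                 # surface a regular, catchable error
    conn.execute("RELEASE SAVEPOINT spi")          # commit the inner transaction
    return rows

conn = sqlite3.connect(":memory:")
conn.isolation_level = None                        # autocommit; we manage savepoints
conn.execute("CREATE TABLE t (x INTEGER)")
execute_in_subxact(conn, "INSERT INTO t VALUES (1)")
try:
    execute_in_subxact(conn, "this is not SQL")
except RuntimeError:
    pass                                           # caught cleanly; session still usable
print(execute_in_subxact(conn, "SELECT x FROM t"))  # [(1,)]
```

The C patch does the equivalent with BeginInternalSubTransaction / ReleaseCurrentSubTransaction / RollbackAndReleaseCurrentSubTransaction around each SPI call.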

Cheers,
Jan
diff --git a/src/pl/plpython/expected/plpython_error.out b/src/pl/plpython/expected/plpython_error.out
index 70890a8..7fc8337 100644
*** a/src/pl/plpython/expected/plpython_error.out
--- b/src/pl/plpython/expected/plpython_error.out
*** CREATE FUNCTION sql_syntax_error() RETUR
*** 8,15 
  'plpy.execute("syntax error")'
  LANGUAGE plpythonu;
  SELECT sql_syntax_error();
- WARNING:  plpy.SPIError: unrecognized error in PLy_spi_execute_query
- CONTEXT:  PL/Python function sql_syntax_error
ERROR:  plpy.SPIError: syntax error at or near "syntax"
  CONTEXT:  PL/Python function sql_syntax_error
  /* check the handling of uncaught python exceptions
--- 8,13 
*** CREATE FUNCTION exception_index_invalid_
*** 29,36 
  return rv[0]'
  	LANGUAGE plpythonu;
  SELECT exception_index_invalid_nested();
- WARNING:  plpy.SPIError: unrecognized error in PLy_spi_execute_query
- CONTEXT:  PL/Python function exception_index_invalid_nested
  ERROR:  plpy.SPIError: function test5(unknown) does not exist
  CONTEXT:  PL/Python function exception_index_invalid_nested
  /* a typo
--- 27,32 
*** return None
*** 47,54 
  '
  	LANGUAGE plpythonu;
  SELECT invalid_type_uncaught('rick');
- WARNING:  plpy.SPIError: unrecognized error in PLy_spi_prepare
- CONTEXT:  PL/Python function invalid_type_uncaught
ERROR:  plpy.SPIError: type "test" does not exist
  CONTEXT:  PL/Python function invalid_type_uncaught
  /* for what it's worth catch the exception generated by
--- 43,48 
*** return None
*** 70,77 
  '
  	LANGUAGE plpythonu;
  SELECT invalid_type_caught('rick');
- WARNING:  plpy.SPIError: unrecognized error in PLy_spi_prepare
- CONTEXT:  PL/Python function invalid_type_caught
NOTICE:  type "test" does not exist
  CONTEXT:  PL/Python function invalid_type_caught
   invalid_type_caught 
--- 64,69 
*** return None
*** 97,104 
  '
  	LANGUAGE plpythonu;
  SELECT invalid_type_reraised('rick');
- WARNING:  plpy.SPIError: unrecognized error in PLy_spi_prepare
- CONTEXT:  PL/Python function invalid_type_reraised
ERROR:  plpy.Error: type "test" does not exist
  CONTEXT:  PL/Python function invalid_type_reraised
  /* no typo no messing about
--- 89,94 
diff --git a/src/pl/plpython/plpython.c b/src/pl/plpython/plpython.c
index 67eb0f3..5b216b2 100644
*** a/src/pl/plpython/plpython.c
--- b/src/pl/plpython/plpython.c
*** typedef int Py_ssize_t;
*** 101,106 
--- 101,107 
  #include "nodes/makefuncs.h"
  #include "parser/parse_type.h"
  #include "tcop/tcopprot.h"
+ #include "access/xact.h"
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
*** PLy_spi_prepare(PyObject *self, PyObject
*** 2829,2834 
--- 2830,2836 
  	int			nargs;
  
  	MemoryContext volatile oldcontext = CurrentMemoryContext;
+ 	ResourceOwner volatile oldowner = CurrentResourceOwner;
  
  	if (!PyArg_ParseTuple(args, "s|O", &query, &list))
  	{
*** PLy_spi_prepare(PyObject *self, PyObject
*** 2852,2857 
--- 2854,2862 
  	plan->values = PLy_malloc(sizeof(Datum) * nargs);
  	plan->args = PLy_malloc(sizeof(PLyTypeInfo) * nargs);
  
+ 	BeginInternalSubTransaction(NULL);
+ 	MemoryContextSwitchTo(oldcontext);
+ 
  	PG_TRY();
  	{
  		int	i;
*** PLy_spi_prepare(PyObject *self, PyObject
*** 2931,2949 
  		if (plan->plan == NULL)
  			elog(ERROR, "SPI_saveplan failed: %s",
  				 SPI_result_code_string(SPI_result));
  	}
  	PG_CATCH();
  	{
  		ErrorData	*edata;
  
- 		MemoryContextSwitchTo(oldcontext);
- 		edata = CopyErrorData();
  		Py_DECREF(plan);
  		Py_XDECREF(optr);
! 		if (!PyErr_Occurred())
! 			PLy_exception_set(PLy_exc_spi_error,
! 							  "unrecognized error in PLy_spi_prepare");
! 		PLy_elog(WARNING, NULL);
  		PLy_exception_set(PLy_exc_spi_error, edata->message);
  		return NULL;
  	}
--- 2936,2978 
  		if (plan->plan == NULL)
  			elog(ERROR, "SPI_saveplan failed: %s",
  				 SPI_result_code_string(SPI_result));
+ 
+ 		/* Commit the inner transaction, return to outer xact context */
+ 		ReleaseCurrentSubTransaction();
+ 		MemoryContextSwitchTo(oldcontext);
+ 		CurrentResourceOwner = oldowner;
+ 
+ 		/*
+ 		 * AtEOSubXact_SPI() should not have popped any SPI context, but just
+ 		 * in case it did, make sure we remain connected.
+ 		 */
+ 		SPI_restore_connection();
  	}
  	PG_CATCH();
  	{
  		ErrorData	*edata;
  
  		

[HACKERS] Why is sorting on two columns so slower than sorting on one column?

2010-12-23 Thread Jie Li
Hi,

Here is the test table,

postgres=# \d big_wf
Table public.big_wf
 Column |  Type   | Modifiers
+-+---
 age| integer |
 id | integer |

postgres=# \dt+ big_wf
 List of relations
 Schema |  Name  | Type  |  Owner   |  Size  | Description
++---+--++-
 public | big_wf | table | workshop | 142 MB |


The first query sorting on one column:
postgres=# explain analyze select * from big_wf order by age;
   QUERY
PLAN
-
 Sort  (cost=565525.45..575775.45 rows=410 width=8) (actual
time=11228.155..16427.149 rows=410 loops=1)
   Sort Key: age
   Sort Method:  external sort  Disk: 72112kB
   ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8)
(actual time=6.196..4797.620 rows=410 loops=1)
 Total runtime: 19530.452 ms
(5 rows)

The second query sorting on two columns:
postgres=# explain analyze select * from big_wf order by age,id;
   QUERY
PLAN
-
 Sort  (cost=565525.45..575775.45 rows=410 width=8) (actual
time=37544.779..48206.702 rows=410 loops=1)
   Sort Key: age, id
   Sort Method:  external merge  Disk: 72048kB
   ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8)
(actual time=6.796..5518.663 rows=410 loops=1)
 Total runtime: 51258.000 ms
(5 rows)

The version is 9.0.1 and work_mem is 20MB. One special thing is that the
first column (age) of all the tuples has the same value, so the second
column (id) is always needed for comparison. While the first sort takes
only about 6 seconds, the second one takes over 30 seconds. Is this much
slower than expected? Is there any possible optimization?
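One plausible contributing factor (a sketch, not a measurement of PostgreSQL's sorter): when the first key never discriminates, every tuple comparison must fall through and compare the second column, so each comparison does roughly twice the work. Counting per-column comparisons with a Python comparator (all names here are illustrative) makes that visible:

```python
import random
from functools import cmp_to_key

def count_column_comparisons(rows):
    count = [0]
    def cmp(a, b):
        # Compare column by column, the way a multi-key sort does,
        # stopping at the first column that differs.
        for x, y in zip(a, b):
            count[0] += 1
            if x != y:
                return -1 if x < y else 1
        return 0
    sorted(rows, key=cmp_to_key(cmp))
    return count[0]

random.seed(0)
ids = random.sample(range(1000000), 20000)
distinct_age = [(i, i) for i in ids]   # first key decides immediately
same_age = [(42, i) for i in ids]      # every comparison falls through to id

print(count_column_comparisons(distinct_age))
print(count_column_comparisons(same_age))   # roughly twice as many
```

The switch from "external sort" to "external merge" in the second plan likely adds further cost on top of the pricier comparisons.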

Thanks,
Li Jie


[HACKERS] pl/python invalidate functions with composite arguments

2010-12-23 Thread Jan Urbański
Here's a patch that properly invalidates functions with composite type
arguments after the type changes, as mentioned in
http://archives.postgresql.org/pgsql-hackers/2010-12/msg01991.php. It's
an incremental patch on top of the plpython-refactor patch sent earlier.

Git branch for this patch:
https://github.com/wulczer/postgres/tree/invalidate-composites.

The general idea is for this to work (taken straight from the unit
tests, btw):

CREATE TABLE employee (
name text,
basesalary integer,
bonus integer
);
INSERT INTO employee VALUES ('John', 100, 10), ('Mary', 200, 10);

CREATE OR REPLACE FUNCTION test_composite_table_input(e employee)
RETURNS integer AS $$
return e['basesalary'] + e['bonus']
$$ LANGUAGE plpythonu;

SELECT name, test_composite_table_input(employee.*) FROM employee;
ALTER TABLE employee DROP bonus;
-- this fails
SELECT name, test_composite_table_input(employee.*) FROM employee;
ALTER TABLE employee ADD bonus integer;
UPDATE employee SET bonus = 10;
-- this works again
SELECT name, test_composite_table_input(employee.*) FROM employee;

It's a long-standing TODO item, and a generally good thing to do.
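One way to picture the invalidation (a hypothetical sketch, not the patch's C implementation; TypeCache and its methods are invented names): cache the compiled function together with a version snapshot of its composite argument types, and recompile whenever the snapshot no longer matches.

```python
class TypeCache:
    def __init__(self):
        self.versions = {}   # composite type name -> version counter
        self.compiled = {}   # function name -> (type snapshot, compiled body)

    def alter_type(self, typ):
        # An ALTER TABLE/TYPE bumps the version, like a relcache inval.
        self.versions[typ] = self.versions.get(typ, 0) + 1

    def get(self, func, argtypes, compile_fn):
        snapshot = tuple(self.versions.get(t, 0) for t in argtypes)
        entry = self.compiled.get(func)
        if entry is None or entry[0] != snapshot:   # stale: a type changed
            entry = (snapshot, compile_fn())
            self.compiled[func] = entry
        return entry[1]

cache = TypeCache()
builds = []
cache.get("test_composite_table_input", ["employee"], lambda: builds.append("compiled"))
cache.get("test_composite_table_input", ["employee"], lambda: builds.append("compiled"))
cache.alter_type("employee")   # e.g. ALTER TABLE employee DROP bonus;
cache.get("test_composite_table_input", ["employee"], lambda: builds.append("compiled"))
print(len(builds))   # 2: the second call reused the cached entry
```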

Cheers,
Jan
diff --git a/src/pl/plpython/expected/plpython_types.out b/src/pl/plpython/expected/plpython_types.out
index 982005b..cf7b7de 100644
*** a/src/pl/plpython/expected/plpython_types.out
--- b/src/pl/plpython/expected/plpython_types.out
*** SELECT * FROM test_type_conversion_array
*** 599,604 
--- 599,656 
  ERROR:  return value of function with array return type is not a Python sequence
  CONTEXT:  while creating return value
  PL/Python function test_type_conversion_array_error
+ ---
+ --- Composite types
+ ---
+ CREATE TABLE employee (
+ name text,
+ basesalary integer,
+ bonus integer
+ );
+ INSERT INTO employee VALUES ('John', 100, 10), ('Mary', 200, 10);
+ CREATE OR REPLACE FUNCTION test_composite_table_input(e employee) RETURNS integer AS $$
+ return e['basesalary'] + e['bonus']
+ $$ LANGUAGE plpythonu;
+ SELECT name, test_composite_table_input(employee.*) FROM employee;
+  name | test_composite_table_input 
+ --+
+  John |110
+  Mary |210
+ (2 rows)
+ 
+ ALTER TABLE employee DROP bonus;
+ SELECT name, test_composite_table_input(employee.*) FROM employee;
+ ERROR:  KeyError: 'bonus'
+ CONTEXT:  PL/Python function test_composite_table_input
+ ALTER TABLE employee ADD bonus integer;
+ UPDATE employee SET bonus = 10;
+ SELECT name, test_composite_table_input(employee.*) FROM employee;
+  name | test_composite_table_input 
+ --+
+  John |110
+  Mary |210
+ (2 rows)
+ 
+ CREATE TYPE named_pair AS (
+ i integer,
+ j integer
+ );
+ CREATE OR REPLACE FUNCTION test_composite_type_input(p named_pair) RETURNS integer AS $$
+ return sum(p.values())
+ $$ LANGUAGE plpythonu;
+ SELECT test_composite_type_input(row(1, 2));
+  test_composite_type_input 
+ ---
+  3
+ (1 row)
+ 
+ ALTER TYPE named_pair RENAME TO named_pair_2;
+ SELECT test_composite_type_input(row(1, 2));
+  test_composite_type_input 
+ ---
+  3
+ (1 row)
+ 
  --
  -- Prepared statements
  --
diff --git a/src/pl/plpython/expected/plpython_types_3.out b/src/pl/plpython/expected/plpython_types_3.out
index eeb23b7..e10b060 100644
*** a/src/pl/plpython/expected/plpython_types_3.out
--- b/src/pl/plpython/expected/plpython_types_3.out
*** SELECT * FROM test_type_conversion_array
*** 599,604 
--- 599,656 
  ERROR:  return value of function with array return type is not a Python sequence
  CONTEXT:  while creating return value
  PL/Python function test_type_conversion_array_error
+ ---
+ --- Composite types
+ ---
+ CREATE TABLE employee (
+ name text,
+ basesalary integer,
+ bonus integer
+ );
+ INSERT INTO employee VALUES ('John', 100, 10), ('Mary', 200, 10);
+ CREATE OR REPLACE FUNCTION test_composite_table_input(e employee) RETURNS integer AS $$
+ return e['basesalary'] + e['bonus']
+ $$ LANGUAGE plpython3u;
+ SELECT name, test_composite_table_input(employee.*) FROM employee;
+  name | test_composite_table_input 
+ --+
+  John |110
+  Mary |210
+ (2 rows)
+ 
+ ALTER TABLE employee DROP bonus;
+ SELECT name, test_composite_table_input(employee.*) FROM employee;
+ ERROR:  KeyError: 'bonus'
+ CONTEXT:  PL/Python function test_composite_table_input
+ ALTER TABLE employee ADD bonus integer;
+ UPDATE employee SET bonus = 10;
+ SELECT name, test_composite_table_input(employee.*) FROM employee;
+  name | test_composite_table_input 
+ --+
+  John |110
+  Mary |210
+ (2 rows)
+ 
+ CREATE TYPE named_pair AS (
+ i integer,
+ j 

[HACKERS] pl/python tracebacks

2010-12-23 Thread Jan Urbański
Here's a patch implementing traceback support for PL/Python, mentioned in
http://archives.postgresql.org/pgsql-hackers/2010-12/msg01991.php. It's
an incremental patch on top of the plpython-refactor patch sent earlier.

Git branch for this patch:
https://github.com/wulczer/postgres/tree/tracebacks.

It's a variant of
http://archives.postgresql.org/pgsql-patches/2006-02/msg00288.php with a
few more twists.

For errors originating from Python exceptions, the patch adds the
traceback as the message detail. It tries to mimic the behaviour of
Python's traceback.py module as closely as possible, including
interleaving stack frames with source code lines in the detail message.
Any Python developer should instantly recognize this kind of error
reporting; it looks almost the same as an error in the interactive
Python shell.
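For reference, the format being mimicked is what Python's own traceback module produces; here is a quick plain-Python sketch of the target output (this is illustration only, not the patch's C implementation):

```python
import traceback

def inner():
    raise NameError("global name 'nonsense' is not defined")

try:
    inner()
except NameError:
    # format_exc() interleaves "File ..., line N, in <func>" frame headers
    # with the offending source lines; that is the layout the patch
    # reproduces in the DETAIL field, using "PL/Python function ..." headers
    formatted = traceback.format_exc()

print(formatted)
```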

A future optimisation might be not splitting the procedure source each
time a traceback is generated, but for now it's probably not the most
important scenario to optimise for.

Cheers,
Jan
diff --git a/src/pl/plpython/expected/plpython_do.out b/src/pl/plpython/expected/plpython_do.out
index a21b088..fb0f0e5 100644
*** a/src/pl/plpython/expected/plpython_do.out
--- b/src/pl/plpython/expected/plpython_do.out
*** NOTICE:  This is plpythonu.
*** 3,6 
--- 3,9 
  CONTEXT:  PL/Python anonymous code block
  DO $$ nonsense $$ LANGUAGE plpythonu;
  ERROR:  NameError: global name 'nonsense' is not defined
+ DETAIL:  Traceback (most recent call last):
+   PL/Python anonymous code block, line 1, in module
+ nonsense 
  CONTEXT:  PL/Python anonymous code block
diff --git a/src/pl/plpython/expected/plpython_error.out b/src/pl/plpython/expected/plpython_error.out
index 70890a8..fe8a91f 100644
*** a/src/pl/plpython/expected/plpython_error.out
--- b/src/pl/plpython/expected/plpython_error.out
*** SELECT sql_syntax_error();
*** 11,16 
--- 11,19 
  WARNING:  plpy.SPIError: unrecognized error in PLy_spi_execute_query
  CONTEXT:  PL/Python function sql_syntax_error
  ERROR:  plpy.SPIError: syntax error at or near syntax
+ DETAIL:  Traceback (most recent call last):
+   PL/Python function sql_syntax_error, line 1, in module
+ plpy.execute(syntax error)
  CONTEXT:  PL/Python function sql_syntax_error
  /* check the handling of uncaught python exceptions
   */
*** CREATE FUNCTION exception_index_invalid(
*** 20,25 
--- 23,31 
  	LANGUAGE plpythonu;
  SELECT exception_index_invalid('test');
  ERROR:  IndexError: list index out of range
+ DETAIL:  Traceback (most recent call last):
+   PL/Python function exception_index_invalid, line 1, in module
+ return args[1]
  CONTEXT:  PL/Python function exception_index_invalid
  /* check handling of nested exceptions
   */
*** SELECT exception_index_invalid_nested();
*** 32,37 
--- 38,46 
  WARNING:  plpy.SPIError: unrecognized error in PLy_spi_execute_query
  CONTEXT:  PL/Python function exception_index_invalid_nested
  ERROR:  plpy.SPIError: function test5(unknown) does not exist
+ DETAIL:  Traceback (most recent call last):
+   PL/Python function exception_index_invalid_nested, line 1, in module
+ rv = plpy.execute(SELECT test5('foo'))
  CONTEXT:  PL/Python function exception_index_invalid_nested
  /* a typo
   */
*** SELECT invalid_type_uncaught('rick');
*** 50,55 
--- 59,67 
  WARNING:  plpy.SPIError: unrecognized error in PLy_spi_prepare
  CONTEXT:  PL/Python function invalid_type_uncaught
  ERROR:  plpy.SPIError: type test does not exist
+ DETAIL:  Traceback (most recent call last):
+   PL/Python function invalid_type_uncaught, line 3, in module
+ SD[plan] = plpy.prepare(q, [ test ])
  CONTEXT:  PL/Python function invalid_type_uncaught
  /* for what it's worth catch the exception generated by
   * the typo, and return None
*** SELECT invalid_type_reraised('rick');
*** 100,105 
--- 112,120 
  WARNING:  plpy.SPIError: unrecognized error in PLy_spi_prepare
  CONTEXT:  PL/Python function invalid_type_reraised
  ERROR:  plpy.Error: type test does not exist
+ DETAIL:  Traceback (most recent call last):
+   PL/Python function invalid_type_reraised, line 6, in module
+ plpy.error(str(ex))
  CONTEXT:  PL/Python function invalid_type_reraised
  /* no typo no messing about
   */
*** SELECT valid_type('rick');
*** 119,121 
--- 134,231 
   
  (1 row)
  
+ /* error in nested functions to get a traceback
+ */
+ CREATE FUNCTION nested_error() RETURNS text
+ 	AS
+ 'def fun1():
+ 	plpy.error(boom)
+ 
+ def fun2():
+ 	fun1()
+ 
+ def fun3():
+ 	fun2()
+ 
+ fun3()
+ return not reached
+ '
+ 	LANGUAGE plpythonu;
+ SELECT nested_error();
+ ERROR:  plpy.Error: boom
+ DETAIL:  Traceback (most recent call last):
+   PL/Python function nested_error, line 10, in module
+ fun3()
+   PL/Python function nested_error, line 8, in fun3
+ fun2()
+   PL/Python function nested_error, line 5, in fun2
+ fun1()
+   PL/Python function nested_error, 

[HACKERS] pl/python table functions

2010-12-23 Thread Jan Urbański
Here's a patch implementing table functions mentioned in
http://archives.postgresql.org/pgsql-hackers/2010-12/msg01991.php. It's
an incremental patch on top of the plpython-refactor patch sent earlier.

Git branch for this patch:
https://github.com/wulczer/postgres/tree/table-functions.

This allows functions with multiple OUT parameters to return either one
record or multiple records (RECORD or SETOF RECORD). There's one
inconvenience, which is that if you return a record that has fields of
composite types, the I/O functions for those types will be looked up on
each execution. Changing that would require some juggling of the
PL/Python structures, so I just left it at that.
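For illustration, the way a returned Python value is mapped onto the record's declared OUT fields (the regression tests below exercise dicts, tuples, lists and objects) can be modelled roughly like this; a plain-Python sketch of the semantics only, not the patch's C code:

```python
# Hypothetical pure-Python model of how PL/Python maps a return value
# onto a record's declared OUT field names.
def map_to_record(value, fieldnames):
    if value is None:
        # a NULL record: every field comes back NULL
        return tuple(None for _ in fieldnames)
    if isinstance(value, dict):
        # extra keys are ignored; missing keys treated as NULL here
        # (a simplification of this sketch)
        return tuple(value.get(name) for name in fieldnames)
    if isinstance(value, (tuple, list)):
        # positional: must match the declared field order
        return tuple(value)
    # anything else: fall back to attribute access
    return tuple(getattr(value, name, None) for name in fieldnames)

class Rec:
    first = 'foo'
    second = 1

assert map_to_record({'first': 'foo', 'second': 1, 'extra': 'x'},
                     ['first', 'second']) == ('foo', 1)
assert map_to_record(Rec, ['first', 'second']) == ('foo', 1)
```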

Note that returning just the composite type (or a set of them) does
cache the I/O funcs. You get the repeated lookups only if the function
returns an unnamed record that has a composite field among others, so
something like

CREATE FUNCTION x(OUT x table_type, OUT y integer) RETURNS RECORD

which I think is fairly uncommon.

Cheers,
Jan

diff --git a/src/pl/plpython/Makefile b/src/pl/plpython/Makefile
index 16d78ae..167393e 100644
*** a/src/pl/plpython/Makefile
--- b/src/pl/plpython/Makefile
*** REGRESS = \
*** 79,84 
--- 79,85 
  	plpython_types \
  	plpython_error \
  	plpython_unicode \
+ 	plpython_composite \
  	plpython_drop
  # where to find psql for running the tests
  PSQLDIR = $(bindir)
diff --git a/src/pl/plpython/expected/plpython_composite.out b/src/pl/plpython/expected/plpython_composite.out
index ...70a4571 .
*** a/src/pl/plpython/expected/plpython_composite.out
--- b/src/pl/plpython/expected/plpython_composite.out
***
*** 0 
--- 1,275 
+ CREATE FUNCTION multiout_simple(OUT i integer, OUT j integer) AS $$
+ return (1, 2)
+ $$ LANGUAGE plpythonu;
+ SELECT multiout_simple();
+  multiout_simple 
+ -
+  (1,2)
+ (1 row)
+ 
+ SELECT * FROM multiout_simple();
+  i | j 
+ ---+---
+  1 | 2
+ (1 row)
+ 
+ SELECT i, j + 2 FROM multiout_simple();
+  i | ?column? 
+ ---+--
+  1 |4
+ (1 row)
+ 
+ SELECT (multiout_simple()).j + 3;
+  ?column? 
+ --
+ 5
+ (1 row)
+ 
+ CREATE FUNCTION multiout_simple_setof(n integer = 1, OUT integer, OUT integer) RETURNS SETOF record AS $$
+ return [(1, 2)] * n
+ $$ LANGUAGE plpythonu;
+ SELECT multiout_simple_setof();
+  multiout_simple_setof 
+ ---
+  (1,2)
+ (1 row)
+ 
+ SELECT * FROM multiout_simple_setof();
+  column1 | column2 
+ -+-
+1 |   2
+ (1 row)
+ 
+ SELECT * FROM multiout_simple_setof(3);
+  column1 | column2 
+ -+-
+1 |   2
+1 |   2
+1 |   2
+ (3 rows)
+ 
+ CREATE FUNCTION multiout_record_as(typ text,
+first text, OUT first text,
+second integer, OUT second integer,
+retnull boolean) RETURNS record AS $$
+ if retnull:
+ return None
+ if typ == 'dict':
+ return { 'first': first, 'second': second, 'additionalfield': 'must not cause trouble' }
+ elif typ == 'tuple':
+ return ( first, second )
+ elif typ == 'list':
+ return [ first, second ]
+ elif typ == 'obj':
+ class type_record: pass
+ type_record.first = first
+ type_record.second = second
+ return type_record
+ $$ LANGUAGE plpythonu;
+ SELECT * from multiout_record_as('dict', 'foo', 1, 'f');
+  first | second 
+ ---+
+  foo   |  1
+ (1 row)
+ 
+ SELECT multiout_record_as('dict', 'foo', 1, 'f');
+  multiout_record_as 
+ 
+  (foo,1)
+ (1 row)
+ 
+ SELECT *, s IS NULL as snull from multiout_record_as('tuple', 'xxx', NULL, 'f') AS f(f, s);
+   f  | s | snull 
+ -+---+---
+  xxx |   | t
+ (1 row)
+ 
+ SELECT *, f IS NULL as fnull, s IS NULL as snull from multiout_record_as('tuple', 'xxx', 1, 't') AS f(f, s);
+  f | s | fnull | snull 
+ ---+---+---+---
+|   | t | t
+ (1 row)
+ 
+ SELECT * from multiout_record_as('obj', NULL, 10, 'f');
+  first | second 
+ ---+
+| 10
+ (1 row)
+ 
+ CREATE FUNCTION multiout_setof(n integer,
+OUT power_of_2 integer,
+OUT length integer) RETURNS SETOF record AS $$
+ for i in range(n):
+ power = 2 ** i
+ length = plpy.execute(select length('%d') % power)[0]['length']
+ yield power, length
+ $$ LANGUAGE plpythonu;
+ SELECT * FROM multiout_setof(3);
+  power_of_2 | length 
+ +
+   1 |  1
+   2 |  1
+   4 |  1
+ (3 rows)
+ 
+ SELECT multiout_setof(5);
+  multiout_setof 
+ 
+  (1,1)
+  (2,1)
+  (4,1)
+  (8,1)
+  (16,2)
+ (5 rows)
+ 
+ CREATE FUNCTION multiout_return_table() RETURNS TABLE (x integer, y text) AS $$
+ return [{'x': 4, 'y' :'four'},
+ {'x': 7, 'y' :'seven'},
+ {'x': 0, 'y' :'zero'}]
+ $$ LANGUAGE plpythonu;
+ SELECT * FROM multiout_return_table();
+  

Re: [HACKERS] Why is sorting on two columns so slower than sorting on one column?

2010-12-23 Thread Kenneth Marshall
On Thu, Dec 23, 2010 at 02:33:12AM -0500, Jie Li wrote:
 Hi,
 
 Here is the test table,
 
 postgres=# \d big_wf
 Table public.big_wf
  Column |  Type   | Modifiers
 +-+---
  age| integer |
  id | integer |
 
 postgres=# \dt+ big_wf
  List of relations
  Schema |  Name  | Type  |  Owner   |  Size  | Description
 ++---+--++-
  public | big_wf | table | workshop | 142 MB |
 
 
 The first query sorting on one column:
 postgres=# explain analyze select * from big_wf order by age;
QUERY
 PLAN
 -
  Sort  (cost=565525.45..575775.45 rows=410 width=8) (actual
 time=11228.155..16427.149 rows=410 loops=1)
Sort Key: age
Sort Method:  external sort  Disk: 72112kB
-  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8)
 (actual time=6.196..4797.620 rows=410 loops=1)
  Total runtime: 19530.452 ms
 (5 rows)
 
 The second query sorting on two columns:
 postgres=# explain analyze select * from big_wf order by age,id;
QUERY
 PLAN
 -
  Sort  (cost=565525.45..575775.45 rows=410 width=8) (actual
 time=37544.779..48206.702 rows=410 loops=1)
Sort Key: age, id
Sort Method:  external merge  Disk: 72048kB
-  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8)
 (actual time=6.796..5518.663 rows=410 loops=1)
  Total runtime: 51258.000 ms
 (5 rows)
 
 The verision is 9.0.1 and the work_mem is 20MB. One special thing is, the
 first column(age) of all the tuples are of the same value, so the second
 column(id) is always needed for comparison.  While the first sorting takes
 about only 6 seconds, the second one takes over 30 seconds,  Is this too
 much than expected? Is there any possible optimization ?
 
 Thanks,
 Li Jie

Hi Li,

If I understand your description, in the first query the sort does
not actually have to do anything because the column values for age
are all identical. In the second query, you actually need to sort
the values, which is why it takes longer. If the first column values
are all the same, then simply sorting by id alone would be faster.
You could also bump up work_mem for the query to perform the sort
in memory.
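The effect is easy to see by counting comparisons for both orderings; a rough plain-Python sketch (PostgreSQL's external sort works differently, but the comparison counts show the same asymmetry):

```python
import functools
import random

random.seed(0)
# age is always 1, mirroring the degenerate first column in the test table
rows = [(1, random.randrange(10**6)) for _ in range(10000)]
counts = {"age": 0, "age,id": 0}

def counting_cmp(label, ncols):
    def cmp(a, b):
        counts[label] += 1
        ka, kb = a[:ncols], b[:ncols]
        return (ka > kb) - (ka < kb)
    return cmp

sorted(rows, key=functools.cmp_to_key(counting_cmp("age", 1)))
sorted(rows, key=functools.cmp_to_key(counting_cmp("age,id", 2)))
# sorting on age alone never has to resolve a tie, so the input looks
# already sorted; sorting on (age, id) must actually order the random ids
```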

Regards,
Ken




[HACKERS] pl/python custom datatype parsers

2010-12-23 Thread Jan Urbański
Here's a patch implementing custom parsers for data types mentioned in
http://archives.postgresql.org/pgsql-hackers/2010-12/msg01991.php. It's
an incremental patch on top of the plpython-refactor patch sent earlier.

Git branch for this patch:
https://github.com/wulczer/postgres/tree/custom-parsers.

The idea has been discussed in
http://archives.postgresql.org/pgsql-hackers/2010-12/msg01307.php.

With that patch, when built with --with-python, the hstore module
includes code that adds a GUC called plpython.hstore.

This GUC should be set to the full name of the hstore datatype, for
instance plpython.hstore = 'public.hstore'.

If it is set, the datatype's OID is looked up and hstore sets up a
rendezvous variable called PLPYTHON_OID_PARSERS that points to two
functions that can convert a hstore Datum to a PyObject and back.

PL/Python, on the other hand, when it sees an argument with an unknown
type, tries to look up a rendezvous variable using the type's OID, and
if it finds one, it uses the parser functions pointed at by that variable.
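A rough model of that handshake, as a hypothetical plain-Python sketch (the OID value, the naive hstore parsing, and the dict standing in for the rendezvous variable are all illustrative, not the actual C API):

```python
# Stand-in for the PLPYTHON_OID_PARSERS rendezvous variable: maps a type
# OID to a (datum -> PyObject, PyObject -> datum) function pair.
rendezvous = {}

HSTORE_OID = 16385  # assumption: resolved from plpython.hstore at load time

def hstore_to_dict(text):
    # naive parser for 'k=>v' pairs; ignores quoting and whitespace
    return dict(kv.split("=>") for kv in text.split(","))

def dict_to_hstore(d):
    return ",".join("%s=>%s" % kv for kv in sorted(d.items()))

# hstore's _PG_init registers its conversion pair...
rendezvous[HSTORE_OID] = (hstore_to_dict, dict_to_hstore)

# ...and PL/Python looks it up when it meets an argument of that type
to_python, from_python = rendezvous[HSTORE_OID]
assert to_python("a=>3,b=>4") == {"a": "3", "b": "4"}
assert from_python({"b": "4"}) == "b=>4"
```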

Long story short, it works so:

LOAD 'hstore';
SET plpython.hstore = 'public.hstore';
CREATE FUNCTION pick_one(h hstore, key text) RETURNS hstore AS $$ return
{key: h[key]} $$ LANGUAGE plpythonu;
SELECT pick_one('a=>3,b=>4', 'b');
-- gives back a hstore 'b=>4'

There's some ugliness with how hstore's Makefile handles building it,
and I'm not sure what's needed to make it work with the Windows build
system. Also, documentation is missing. It's already usable, but if we
decide to commit that, I'll probably need some help with Windows and docs.

I first tried to make hstore generate a separate .so with that
functionality if --with-python was specified, but couldn't convince the
Makefile to do that. So if you configure the tree with --with-python,
hstore will link to libpython, maybe that's OK?

Cheers,
Jan

PS: of course, once committed we can add custom parsers for isbn,
citext, uuids, cubes, and other weird things.

J
diff --git a/contrib/hstore/Makefile b/contrib/hstore/Makefile
index e466b6f..dbeeb89 100644
*** a/contrib/hstore/Makefile
--- b/contrib/hstore/Makefile
*** top_builddir = ../..
*** 5,12 
  include $(top_builddir)/src/Makefile.global
  
  MODULE_big = hstore
  OBJS = hstore_io.o hstore_op.o hstore_gist.o hstore_gin.o hstore_compat.o \
! 	crc32.o
  
  DATA_built = hstore.sql
  DATA = uninstall_hstore.sql
--- 5,21 
  include $(top_builddir)/src/Makefile.global
  
  MODULE_big = hstore
+ 
  OBJS = hstore_io.o hstore_op.o hstore_gist.o hstore_gin.o hstore_compat.o \
! 	hstore_plpython.o crc32.o
! 
! ifeq ($(with_python),yes)
! 
! PG_CPPFLAGS := -I$(srcdir) -I$(top_builddir)/src/pl/plpython \
! 			$(python_includespec) -DHSTORE_PLPYTHON_SUPPORT
! SHLIB_LINK = $(python_libspec) $(python_additional_libs) \
! 		$(filter -lintl,$(LIBS)) $(CPPFLAGS)
! endif
  
  DATA_built = hstore.sql
  DATA = uninstall_hstore.sql
diff --git a/contrib/hstore/hstore.h b/contrib/hstore/hstore.h
index 8906397..6edfc70 100644
*** a/contrib/hstore/hstore.h
--- b/contrib/hstore/hstore.h
*** extern Pairs *hstoreArrayToPairs(ArrayTy
*** 174,179 
--- 174,182 
  #define HStoreExistsAllStrategyNumber	11
  #define HStoreOldContainsStrategyNumber 13		/* backwards compatibility */
  
+ /* PL/Python support */
+ extern void hstore_plpython_init(void);
+ 
  /*
   * defining HSTORE_POLLUTE_NAMESPACE=0 will prevent use of old function names;
   * for now, we default to on for the benefit of people restoring old dumps
diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index 0d6f0b6..92c8db9 100644
*** a/contrib/hstore/hstore_io.c
--- b/contrib/hstore/hstore_io.c
*** PG_MODULE_MAGIC;
*** 20,25 
--- 20,26 
  /* old names for C functions */
  HSTORE_POLLUTE(hstore_from_text, tconvert);
  
+ void _PG_init(void);
  
  typedef struct
  {
*** hstore_send(PG_FUNCTION_ARGS)
*** 1211,1213 
--- 1212,1220 
  
  	PG_RETURN_BYTEA_P(pq_endtypsend(buf));
  }
+ 
+ void
+ _PG_init(void)
+ {
+ 	hstore_plpython_init();
+ }
diff --git a/contrib/hstore/hstore_plpython.c b/contrib/hstore/hstore_plpython.c
index ...081a33e .
*** a/contrib/hstore/hstore_plpython.c
--- b/contrib/hstore/hstore_plpython.c
***
*** 0 
--- 1,249 
+ /*
+  * contrib/src/hstore_plpython.c
+  *
+  * bidirectional transformation between hstores and Python dictionary objects
+  */
+ 
+ /* Only build if PL/Python support is needed */
+ #if defined(HSTORE_PLPYTHON_SUPPORT)
+ 
+ #if defined(_MSC_VER) && defined(_DEBUG)
+ /* Python uses #pragma to bring in a non-default libpython on VC++ if
+  * _DEBUG is defined */
+ #undef _DEBUG
+ /* Also hide away errcode, since we load Python.h before postgres.h */
+ #define errcode __msvc_errcode
+ #include <Python.h>
+ #undef errcode
+ #define _DEBUG
+ #elif defined (_MSC_VER)
+ #define errcode __msvc_errcode
+ #include <Python.h>
+ #undef errcode
+ #else
+ #include <Python.h>
+ #endif
+ 
+ #include "postgres.h"
+ #include 

Re: [HACKERS] Why is sorting on two columns so slower than sorting on one column?

2010-12-23 Thread Marti Raudsepp
On Thu, Dec 23, 2010 at 09:33, Jie Li jay23j...@gmail.com wrote:
 While the first sorting takes
 about only 6 seconds, the second one takes over 30 seconds,  Is this too
 much than expected? Is there any possible optimization ?

If you're doing these queries often, you should:
CREATE INDEX ix_big_wf_age_id ON big_wf (age, id)

If that's still not fast enough, you can physically sort rows in the
table using the newly created index:
CLUSTER big_wf USING ix_big_wf_age_id;

Please post back your results. :)

Regards,
Marti



Re: [HACKERS] GiST insert algorithm rewrite

2010-12-23 Thread Heikki Linnakangas

On 21.12.2010 20:00, Heikki Linnakangas wrote:

One final version, with a bug fix wrt. root page split and some cleanup.
I'm planning to commit this before Christmas. It's a big patch, so
review would be much appreciated.


Committed. Phew! Review & testing is of course still appreciated, given 
how big and complicated this was.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Why is sorting on two columns so slower than sorting on one column?

2010-12-23 Thread Kenneth Marshall
On Thu, Dec 23, 2010 at 10:19:46PM +0800, Li Jie wrote:
 Hi Ken,
 
 Thanks for your tips! Yes it is the case, and I run another query sorting on 
 the second column whose values are random.
 
 postgres=# explain analyze select * from big_wf order by id;
   QUERY PLAN  
   
 -
 Sort  (cost=565525.45..575775.45 rows=410 width=8) (actual 
 time=25681.875..36458.824 rows=410 loops=1)
   Sort Key: id
   Sort Method:  external merge  Disk: 72048kB
   -  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8) (actual 
 time=8.595..5569.500 rows=410 loops=1)
 
 Now the sorting takes about 20 seconds, so it seems reasonable compared to 30 
 seconds, right? But one thing I'm confused is that, why is additional 
 comparison really so expensive?  Does it incur additional I/O? From the cost 
 model, it seems not, all the cost are the same (575775.45).
 
 Thanks,
 Li Jie

In the first query, the cost is basically the I/O cost to read the
table from disk. The actual sort does not do anything since the
sort values are the same. In the second query, the sort has to
swap things in memory/disk to get them in the correct order for
the result. This actually takes CPU and possibly additional I/O,
which is why it is slower. In the case of sorting by just the id
column, the size of the sorted values is smaller, which would need
fewer batches to complete the sort, since the data being sorted is
bigger than work_mem.

Cheers,
Ken



[HACKERS] pl/python explicit subtransactions

2010-12-23 Thread Jan Urbański
Here's a patch implementing explicitly starting subtransactions mentioned in
http://archives.postgresql.org/pgsql-hackers/2010-12/msg01991.php. It's
an incremental patch on top of the spi-in-subxacts patch sent earlier.

Git branch for this patch:
https://github.com/wulczer/postgres/tree/explicit-subxacts.

The idea has been proposed in
http://archives.postgresql.org/pgsql-hackers/2010-11/msg00122.php

This patch provides a subtransaction context manager, in the vein of
http://www.python.org/dev/peps/pep-0343/.

When inside an explicit subtransaction, SPI calls do not start another
one, so you pay the subxact start overhead only once, and you get atomic
behaviour.

For instance this:

with plpy.subxact():
    plpy.execute("insert into t values (1)")
    plpy.execute("insert into t values (2)")
    plpy.execute("ooops")

will not insert any rows into t. Just so you realise it, it *will* raise
the exception from the last execute; if you want to continue execution
you need to put your with block in a try/except. If the code starts a
subtransaction but fails to close it, PL/Python will forcibly roll it
back to leave the backend in a clean state.
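The context-manager protocol involved can be modelled in a few lines of plain Python; this is a sketch of the semantics only (a toy list-snapshot "rollback"), not the real C-side subtransaction machinery:

```python
# Toy "database" rolled back by snapshotting a list; models the
# __enter__/__exit__ behaviour of plpy.subxact().
class Subxact:
    def __init__(self, db):
        self.db = db

    def __enter__(self):
        self.snapshot = list(self.db)   # "start subtransaction"
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            self.db[:] = self.snapshot  # roll back on any exception
        # "commit" is a no-op here; returning False re-raises the error
        return False

db = []
try:
    with Subxact(db):
        db.append(1)
        db.append(2)
        raise RuntimeError("ooops")
except RuntimeError:
    pass

assert db == []  # the two inserts were rolled back
```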

The patch lacks user-facing documentation, I'll add that later if it
gets accepted. For more usage examples refer to the unit tests that the
patch adds.

Cheers,
Jan
diff --git a/src/pl/plpython/Makefile b/src/pl/plpython/Makefile
index 16d78ae..33dddc6 100644
*** a/src/pl/plpython/Makefile
--- b/src/pl/plpython/Makefile
*** REGRESS = \
*** 79,84 
--- 79,85 
  	plpython_types \
  	plpython_error \
  	plpython_unicode \
+ 	plpython_subxact \
  	plpython_drop
  # where to find psql for running the tests
  PSQLDIR = $(bindir)
diff --git a/src/pl/plpython/expected/README b/src/pl/plpython/expected/README
index f011528..362cc0d 100644
*** a/src/pl/plpython/expected/README
--- b/src/pl/plpython/expected/README
*** plpython_unicode_2.out		Python 2.2
*** 6,8 
--- 6,11 
  plpython_unicode_3.out		Python 2.3 through 3.1
  
  plpython_types_3.out		Python 3.1
+ 
+ plpython_subxact.out		Python 2.6 through 3.1
+ plpython_subxact_0.out		older Pythons that don't have the with statement
diff --git a/src/pl/plpython/expected/plpython_subxact.out b/src/pl/plpython/expected/plpython_subxact.out
index ...25a5a4b .
*** a/src/pl/plpython/expected/plpython_subxact.out
--- b/src/pl/plpython/expected/plpython_subxact.out
***
*** 0 
--- 1,321 
+ -- test explicit subtransaction starting
+ /* Test table to see if transactions get properly rolled back
+  */
+ CREATE TABLE subxact_tbl (
+ i integer
+ );
+ /* Explicit case for Python 2.6
+  */
+ CREATE FUNCTION subxact_test(what_error text = NULL) RETURNS text
+ AS $$
+ import sys
+ subxact = plpy.subxact()
+ subxact.__enter__()
+ exc = True
+ try:
+ try:
+ plpy.execute(insert into subxact_tbl values(1))
+ plpy.execute(insert into subxact_tbl values(2))
+ if what_error == SPI:
+ plpy.execute(insert into subxact_tbl values('oops'))
+ elif what_error == Python:
+ plpy.attribute_error
+ except:
+ exc = False
+ subxact.__exit__(*sys.exc_info())
+ raise
+ finally:
+ if exc:
+ subxact.__exit__(None, None, None)
+ $$ LANGUAGE plpythonu;
+ SELECT subxact_test();
+  subxact_test 
+ --
+  
+ (1 row)
+ 
+ SELECT * FROM subxact_tbl;
+  i 
+ ---
+  1
+  2
+ (2 rows)
+ 
+ TRUNCATE subxact_tbl;
+ SELECT subxact_test('SPI');
+ ERROR:  plpy.SPIError: invalid input syntax for integer: oops
+ CONTEXT:  PL/Python function subxact_test
+ SELECT * FROM subxact_tbl;
+  i 
+ ---
+ (0 rows)
+ 
+ TRUNCATE subxact_tbl;
+ SELECT subxact_test('Python');
+ ERROR:  AttributeError: 'module' object has no attribute 'attribute_error'
+ CONTEXT:  PL/Python function subxact_test
+ SELECT * FROM subxact_tbl;
+  i 
+ ---
+ (0 rows)
+ 
+ TRUNCATE subxact_tbl;
+ /* Context manager case for Python =2.6
+  */
+ CREATE FUNCTION subxact_ctx_test(what_error text = NULL) RETURNS text
+ AS $$
+ with plpy.subxact():
+ plpy.execute(insert into subxact_tbl values(1))
+ plpy.execute(insert into subxact_tbl values(2))
+ if what_error == SPI:
+ plpy.execute(insert into subxact_tbl values('oops'))
+ elif what_error == Python:
+ plpy.attribute_error
+ $$ LANGUAGE plpythonu;
+ SELECT subxact_ctx_test();
+  subxact_ctx_test 
+ --
+  
+ (1 row)
+ 
+ SELECT * FROM subxact_tbl;
+  i 
+ ---
+  1
+  2
+ (2 rows)
+ 
+ TRUNCATE subxact_tbl;
+ SELECT subxact_ctx_test('SPI');
+ ERROR:  plpy.SPIError: invalid input syntax for integer: oops
+ CONTEXT:  PL/Python function subxact_ctx_test
+ SELECT * FROM subxact_tbl;
+  i 
+ ---
+ (0 rows)
+ 
+ TRUNCATE subxact_tbl;
+ SELECT subxact_ctx_test('Python');
+ ERROR:  AttributeError: 'module' object has no attribute 'attribute_error'
+ CONTEXT:  PL/Python function subxact_ctx_test
+ SELECT * FROM subxact_tbl;
+  i 
+ ---
+ (0 rows)
+ 
+ TRUNCATE subxact_tbl;
+ 

[HACKERS] pl/python custom exceptions for SPI

2010-12-23 Thread Jan Urbański
Here's a patch implementing custom Python exceptions for SPI errors
mentioned in
http://archives.postgresql.org/pgsql-hackers/2010-12/msg01991.php. It's
an incremental patch on top of the explicit-subxacts patch sent earlier.

Git branch for this patch:
https://github.com/wulczer/postgres/tree/custom-spi-exceptions.

What the patch does is provide a Python exception for each error defined
in utils/errcodes.h, and then raise the corresponding exception when an
SPI call fails. The parsing of errcodes.h in the Makefile is a little
grotty and would probably have to be ported to the Windows build system,
which I have no idea about.

With this patch you can do:

from plpy import spiexceptions

try:
    plpy.execute("insert into t values (4)")
except spiexceptions.UniqueViolation:
    plpy.notice("unique constraint violation")
except spiexceptions.NotNullViolation:
    plpy.notice("not null constraint violation")

All exceptions inherit from plpy.SPIError, so code that just catches a
blanket SPIError will continue to work.

The patch lacks user-facing docs, if it gets accepted I'll write some.
Not sure if we should provide a table similar to
http://www.postgresql.org/docs/current/static/errcodes-appendix.html, or
just refer to that table and state that the rule is changing underscores
to camel case...
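The underscores-to-camel-case rule that the Makefile's sed implements can be stated as a one-liner; a plain-Python sketch of the naming convention (the build itself does this with sed over errcodes.h, not with Python):

```python
def errcode_to_classname(errcode):
    # ERRCODE_UNIQUE_VIOLATION -> UniqueViolation
    assert errcode.startswith("ERRCODE_")
    words = errcode[len("ERRCODE_"):].split("_")
    return "".join(w.capitalize() for w in words)

assert errcode_to_classname("ERRCODE_UNIQUE_VIOLATION") == "UniqueViolation"
assert errcode_to_classname("ERRCODE_NOT_NULL_VIOLATION") == "NotNullViolation"
```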

Also, I just realised that this patch does not really depend on the
explicit-subxacts change, but rather only on spi-in-subxacts, so if
need be I can generate it as an incremental changeset over the latter
and not the former.

Cheers,
Jan
diff --git a/src/pl/plpython/Makefile b/src/pl/plpython/Makefile
index 33dddc6..dd5b445 100644
*** a/src/pl/plpython/Makefile
--- b/src/pl/plpython/Makefile
*** rpathdir = $(python_libdir)
*** 38,44 
  
  NAME = plpython$(python_majorversion)
  OBJS = plpython.o
! 
  
  # Python on win32 ships with import libraries only for Microsoft Visual C++,
  # which are not compatible with mingw gcc. Therefore we need to build a
--- 38,44 
  
  NAME = plpython$(python_majorversion)
  OBJS = plpython.o
! SPIEXCEPTIONS = spiexceptions.h
  
  # Python on win32 ships with import libraries only for Microsoft Visual C++,
  # which are not compatible with mingw gcc. Therefore we need to build a
*** PSQLDIR = $(bindir)
*** 86,93 
  
  include $(top_srcdir)/src/Makefile.shlib
  
  
! all: all-lib
  
  install: all installdirs install-lib
  ifeq ($(python_majorversion),2)
--- 86,113 
  
  include $(top_srcdir)/src/Makefile.shlib
  
+ # A quite horrendous sed, but does the job. The steps are, in order:
+ # 1. Remove everything up to the line with Class 03. We only generate
+ #exceptions for errors, not for warnings or notices
+ # 2. Remove lines that don't define an error code
+ # 3. Change ERRCODE_XXX into { spiexceptions.ERRCODE_XY_Z, XY_Z, ERRCODE_XY_Z },
+ # 4. Leave an uppercase letter after a dot or a quote, change the rest
+ #into lowercase thus giving us
+ #{ spiexceptions.Errcode_xy_z, Xy_z, ERRCODE_XY_Z },
+ # 5. change lowercase letters after an underscore into uppercase, giving us
+ #{ spiexceptions.ErrcodeXyZ, XyZ, ERRCODE_XY_Z },
+ gen-spiexceptions:
+ 	echo /* autogenerated from utils/errcodes.h, do not edit */  $(SPIEXCEPTIONS)
+ 	sed -e '1,/Class 03/ d' \
+ -e '/^#define ERRCODE_.*MAKE_SQLSTATE/! d' \
+ -e 's|#define ERRCODE_\([^\t ]*\).*|{ spiexceptions.\1, \1, ERRCODE_\1 },|' \
+ -e 's|\([\.]\)\([A-Z]\)\([^]*\)|\1\2\L\3|g' \
+ -e 's|_\([a-z]\)|\u\1|g' \
+ 	$(top_srcdir)/src/include/utils/errcodes.h  $(SPIEXCEPTIONS)
  
! .PHONY: gen-spiexceptions
! 
! all: gen-spiexceptions all-lib
  
  install: all installdirs install-lib
  ifeq ($(python_majorversion),2)
*** clean distclean maintainer-clean: clean-
*** 138,143 
--- 158,164 
  	rm -f $(OBJS)
  	rm -rf results
  	rm -f regression.diffs regression.out
+ 	rm -f $(SPIEXCEPTIONS)
  ifeq ($(PORTNAME), win32)
  	rm -f python${pytverstr}.def
  endif
diff --git a/src/pl/plpython/expected/plpython_error.out b/src/pl/plpython/expected/plpython_error.out
index 7fc8337..718ebce 100644
*** a/src/pl/plpython/expected/plpython_error.out
--- b/src/pl/plpython/expected/plpython_error.out
*** CREATE FUNCTION sql_syntax_error() RETUR
*** 8,14 
  'plpy.execute(syntax error)'
  LANGUAGE plpythonu;
  SELECT sql_syntax_error();
! ERROR:  plpy.SPIError: syntax error at or near syntax
  CONTEXT:  PL/Python function sql_syntax_error
  /* check the handling of uncaught python exceptions
   */
--- 8,14 
  'plpy.execute(syntax error)'
  LANGUAGE plpythonu;
  SELECT sql_syntax_error();
! ERROR:  spiexceptions.SyntaxError: syntax error at or near syntax
  CONTEXT:  PL/Python function sql_syntax_error
  /* check the handling of uncaught python exceptions
   */
*** CREATE FUNCTION exception_index_invalid_
*** 27,33 
  return rv[0]'
  	LANGUAGE plpythonu;
  SELECT exception_index_invalid_nested();

Re: [HACKERS] Why is sorting on two columns so slower than sorting on one column?

2010-12-23 Thread Kenneth Marshall
On Thu, Dec 23, 2010 at 10:42:26PM +0800, Li Jie wrote:
 - Original Message - 
 From: Kenneth Marshall k...@rice.edu
 To: Li Jie jay23j...@gmail.com
 Cc: pgsql-hackers pgsql-hackers@postgresql.org
 Sent: Thursday, December 23, 2010 10:30 PM
 Subject: Re: [HACKERS] Why is sorting on two columns so slower than sorting on 
 one column?
 
 
  On Thu, Dec 23, 2010 at 10:19:46PM +0800, Li Jie wrote:
  Hi Ken,
  
  Thanks for your tips! Yes it is the case, and I run another query sorting 
  on the second column whose values are random.
  
  postgres=# explain analyze select * from big_wf order by id;
QUERY PLAN   
   
  -
  Sort  (cost=565525.45..575775.45 rows=410 width=8) (actual 
  time=25681.875..36458.824 rows=410 loops=1)
Sort Key: id
Sort Method:  external merge  Disk: 72048kB
-  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8) 
  (actual time=8.595..5569.500 rows=410 loops=1)
  
  Now the sorting takes about 20 seconds, so it seems reasonable compared to 
  30 seconds, right? But one thing I'm confused is that, why is additional 
  comparison really so expensive?  Does it incur additional I/O? From the 
  cost model, it seems not, all the cost are the same (575775.45).
  
  Thanks,
  Li Jie
  
  In the first query, the cost is basically the I/O cost to read the
  table from disk. The actual sort does not do anything since the
  sort values are the same. In the second query, the sort has to
  swap things in memory/disk to get them in the correct order for
  the result. This actually takes CPU and possibly additional I/O
  which is why it is slower. In the case of sorting by just the id
  column, the size of the sorted values is smaller which would need
  fewer batches to complete the sort since the sort is bigger than
  the work_mem.
  
  Cheers,
  Ken
 
 Hi Ken,
 
 Thanks for your analysis.
 
 But in the last query that sorts on id,  since the query selects all the 
 columns for output, the actual sorted size is the same, and the only 
 difference is the comparison cost. The query sorting on two columns needs to 
 do twice the comparison. Am I right?
 
 Thanks,
 Li Jie

I think you are right. Sorry for the confusion.

Ken



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 Here's a patch that changes walsender to require a special privilege
 for replication instead of relying on superuser permissions. We
 discussed this back before 9.0 was finalized, but IIRC we ran out of
 time. The motivation being that you really want to use superuser as
 little as possible - and since being a replication slave is a read
 only role, it shouldn't require the maximum permission available in
 the system.

Maybe it needn't require max permissions, but one of the motivations
for requiring superuserness was to prevent Joe User from sucking every
last byte of data out of your database (and into someplace he could
examine it at leisure).  This patch opens that barn door wide, because
so far as I can see, it allows anybody at all to grant the replication
privilege ... or revoke it, thereby breaking your replication setup.
I think only superusers should be allowed to change the flag.

regards, tom lane



Re: [HACKERS] Patch BUG #5103: pg_ctl -w (re)start fails with custom unix_socket_directory

2010-12-23 Thread Quan Zongliang
On Wed, 22 Dec 2010 21:02:35 -0500 (EST)
Bruce Momjian br...@momjian.us wrote:

 Alvaro Herrera wrote:
  Excerpts from Quan Zongliang's message of mar dic 21 18:36:11 -0300 2010:
   On Mon, 29 Nov 2010 10:29:17 -0300
   Alvaro Herrera alvhe...@commandprompt.com wrote:
   
  
I think the way this should work is that you call postmaster with a new
switch and it prints out its configuration, after reading the
appropriate config file(s).  That way it handles all the little details
such as figuring out the correct config file, hadle include files, etc.
This output would be presumably easier to parse and more trustworthy.
   
   Sorry for my late reply.
   
   I will check the source of postmaster.
  
  Actually Bruce Momjian is now working on a different fix:
  unix_socket_directory would be added to postmaster.pid, allowing pg_ctl
  to find it.
 
 Yes, I will apply this patch tomorrow and it will be in 9.1.  Thanks for
 the report.
 
Nice work.


-- 
Quan Zongliang quanzongli...@gmail.com



Re: [HACKERS] Why is sorting on two columns so slower than sorting on one column?

2010-12-23 Thread Tom Lane
Kenneth Marshall k...@rice.edu writes:
 On Thu, Dec 23, 2010 at 10:42:26PM +0800, Li Jie wrote:
  But in the last query that sorts on id, since the query selects all the
  columns for output, the actual sorted size is the same, and the only
  difference is the comparison cost. The query sorting on two columns needs to
  do twice as many comparisons. Am I right?

 I think you are right. Sorry for the confusion.

I doubt the cost of comparing two integers is the issue here; rather
it's more likely one of how many merge passes were needed.  You could
find out instead of just speculating by turning on trace_sort and
comparing the log outputs.
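For anyone wanting to try this, a session might look roughly as follows (trace_sort is only available when the server was built with TRACE_SORT defined, which stock builds enable by default):

```sql
-- Sketch: surface tuplesort's run/merge statistics for both queries
SET trace_sort = on;
SET client_min_messages = log;   -- so the LOG lines show up in psql
EXPLAIN ANALYZE SELECT * FROM big_wf ORDER BY age;
EXPLAIN ANALYZE SELECT * FROM big_wf ORDER BY age, id;
```

The resulting log lines report how many runs each external sort produced and how many merge passes it needed, which can then be compared between the two queries.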

regards, tom lane



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Robert Haas
On Thu, Dec 23, 2010 at 10:15 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 Here's a patch that changes walsender to require a special privilege
 for replication instead of relying on superuser permissions. We
 discussed this back before 9.0 was finalized, but IIRC we ran out of
 time. The motivation being that you really want to use superuser as
 little as possible - and since being a replication slave is a read
 only role, it shouldn't require the maximum permission available in
 the system.

 Maybe it needn't require max permissions, but one of the motivations
 for requiring superuser-ness was to prevent Joe User from sucking every
 last byte of data out of your database (and into someplace he could
 examine it at leisure).  This patch opens that barn door wide, because
 so far as I can see, it allows anybody at all to grant the replication
 privilege ... or revoke it, thereby breaking your replication setup.
 I think only superusers should be allowed to change the flag.

I haven't looked at the patch yet, but I think we should continue to
allow superuser-ness to be *sufficient* for replication - i.e.
superusers will automatically have the replication privilege just as
they do any other - and merely allow this as an option for when you
want to avoid doing it that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I haven't looked at the patch yet, but I think we should continue to
 allow superuser-ness to be *sufficient* for replication - i.e.
 superusers will automatically have the replication privilege just as
 they do any other - and merely allow this as an option for when you
 want to avoid doing it that way.

I don't particularly mind breaking that.  If we leave it as-is, we'll
be encouraging people to use superuser accounts for things that don't
need that, which can't be good from a security standpoint.

BTW, is it possible to set things up so that a REPLICATION account
can be NOLOGIN, thereby making it really hard to abuse for other
purposes?  Or does the login privilege check come too soon?

regards, tom lane



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Robert Haas
On Thu, Dec 23, 2010 at 10:54 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I haven't looked at the patch yet, but I think we should continue to
 allow superuser-ness to be *sufficient* for replication - i.e.
 superusers will automatically have the replication privilege just as
 they do any other - and merely allow this as an option for when you
 want to avoid doing it that way.

 I don't particularly mind breaking that.  If we leave it as-is, we'll
 be encouraging people to use superuser accounts for things that don't
 need that, which can't be good from a security standpoint.

And if we break it, we'll be adding an additional, mandatory step to
make replication work that isn't required today.  You might think
that's OK, but I think the majority opinion is that it's already
excessively complex.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Database file copy

2010-12-23 Thread Robert Haas
On Wed, Dec 22, 2010 at 7:35 PM, Srini Raghavan sixersr...@yahoo.com wrote:
 I have tested this and it works, and I am continuing to test it more. I
 would like validation of this idea from the experts and the community to
 make sure I haven't overlooked something obvious that might cause issues.

Interesting idea.  It seems like it might be possible to make this
work.  One obvious thing to watch out for is object ownership
information.  Roles are stored in pg_authid, which is a shared
catalog, so if you're unlucky you could manage to create a database
containing one or more objects that are owned by a role ID that doesn't
exist in pg_authid, which will probably break things all over the
place.  There could be other pitfalls as well but that's the only one
that's obvious to me off the top of my head...

I would strongly recommend basing this on the latest minor release of
PostgreSQL 9.0 rather than an outdated minor release of PostgreSQL
8.4.
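As a quick sanity check after such a copy, something along these lines could flag relations whose owner is missing from pg_authid (a sketch covering pg_class only; other catalogs with owner columns would need the same treatment):

```sql
-- Sketch: find relations whose owner OID has no row in pg_authid
SELECT c.oid::regclass AS relation, c.relowner AS missing_owner
FROM pg_class c
LEFT JOIN pg_authid a ON a.oid = c.relowner
WHERE a.oid IS NULL;
```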

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2010-12-23 Thread Tomas Vondra
On 20.12.2010 00:03, Tom Lane wrote:
 I wrote:
 That is not the number of interest.  The number of interest is that it's
 8 bytes added onto a struct that currently contains 11 of 'em; in other
 words a 9% increase in the size of the stats file, and consequently
 about a 9% increase in the cost of updating it.
 
 Wups, sorry, I was looking at the wrong struct.  It's actually an
 addition of 1 doubleword to a struct of 21 of 'em, or about 5%.
 That's still an awful lot in comparison to the prospective usefulness
 of the information.
 
   regards, tom lane
 

OK, so here goes the simplified patch - it tracks one reset timestamp
for the background writer and for each database.

regards
Tomas

PS: I've noticed Magnus posted a patch to track recovery conflicts,
adding a new view pg_stat_database_conflicts. I have not checked it yet
but it should not influence this patch.
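With the patch applied, the new columns would be queried like this (function and column names as they appear in the patch):

```sql
-- Per-database reset time, via the stats_reset column the patch adds
SELECT datname, stats_reset FROM pg_stat_database;

-- Background writer reset time, bumped by pg_stat_reset_shared('bgwriter')
SELECT stats_reset FROM pg_stat_bgwriter;
```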



Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2010-12-23 Thread Robert Haas
2010/12/23 Tomas Vondra t...@fuzzy.cz:
 On 20.12.2010 00:03, Tom Lane wrote:
 I wrote:
 That is not the number of interest.  The number of interest is that it's
 8 bytes added onto a struct that currently contains 11 of 'em; in other
 words a 9% increase in the size of the stats file, and consequently
 about a 9% increase in the cost of updating it.

 Wups, sorry, I was looking at the wrong struct.  It's actually an
 addition of 1 doubleword to a struct of 21 of 'em, or about 5%.
 That's still an awful lot in comparison to the prospective usefulness
 of the information.

                       regards, tom lane


 OK, so here goes the simplified patch - it tracks one reset timestamp
 for the background writer and for each database.

I think you forgot the attachment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] proposal : cross-column stats

2010-12-23 Thread Tomas Vondra
On 21.12.2010 16:54, Florian Pflug wrote:
 I think that maybe it'd be acceptable to scan a large portion of the
 table to estimate dist(A,B), *if* we must do so only once in a while. 
 But even with
 a full scan, how would we arrive at an estimate for dist(A,B)? Keeping all 
 values in memory,
 and spilling them into, say, an on-disk hash table adds even more overhead to 
 the already
 expensive full scan. Maybe using a bloom filter instead of a hash table could 
 avoid
 the spilling to disk, in exchange for a slightly less precise result...

I've read some basics about Bloom filters, and I'm not sure we can use
it in this case. I see two problems with this approach:

1) inability to predict the number of elements in advance

   To build a Bloom filter with limited error rate, you need to size it
   properly (number of hash functions and size of the bit array). But
   that's exactly the information we're looking for.

   I guess we could use the highest possible value (equal to the number
   of tuples) - according to wiki you need about 10 bits per element
   with a 1% error rate, i.e. roughly 10 Mbit (a bit over 1 MB) of
   memory for each million elements.

   That's not much but this needs to be done for each column separately
   and for the combination - for N columns you need (at least) N+1
   filters.

   Sure - increasing the error rate and using a different estimate to
   build the bloom filter would significantly decrease the memory
   requirements.

2) sampling the whole table

   A naive approach would be to sample the whole table each time, but
   for large tables that might be a bit of a problem. So I'm thinking
   about what alternatives are there ...

   One possibility is to build the Bloom filter incrementally and store
   it on disk, or something like that. I guess this could be done
   during VACUUM or something like that. Anyway there's an issue how to
   set the filter size initially (estimate the number of elements),
   just as in the previous section.

   Another possibility is to collect the data from just a small portion
   of a table and then use the result to estimate the number of distinct
   values for the whole table. But I'm not sure we can do this reliably;
   I see many traps in this.

But maybe I'm just missing something important.
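For the sizing question in point 1, the standard formulas at least bound the memory cost up front (a sketch; n and p here are an assumed element count and target false-positive rate):

```sql
-- m = -n*ln(p)/ln(2)^2 bits, k = -ln(p)/ln(2) hash functions
SELECT ceil(-n * ln(p) / ln(2)^2)        AS bits,
       ceil(-n * ln(p) / ln(2)^2) / n    AS bits_per_element,
       round(-ln(p) / ln(2))             AS hash_functions
FROM (VALUES (1000000.0, 0.01)) AS t(n, p);
```

At a 1% error rate this works out to roughly 9.6 bits per element, i.e. a bit over 1 MB per million elements.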

regards
Tomas



Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2010-12-23 Thread Tomas Vondra
On 23.12.2010 20:09, Robert Haas wrote:
 2010/12/23 Tomas Vondra t...@fuzzy.cz:
 On 20.12.2010 00:03, Tom Lane wrote:
 I wrote:
 That is not the number of interest.  The number of interest is that it's
 8 bytes added onto a struct that currently contains 11 of 'em; in other
 words a 9% increase in the size of the stats file, and consequently
 about a 9% increase in the cost of updating it.

 Wups, sorry, I was looking at the wrong struct.  It's actually an
 addition of 1 doubleword to a struct of 21 of 'em, or about 5%.
 That's still an awful lot in comparison to the prospective usefulness
 of the information.

   regards, tom lane


 OK, so here goes the simplified patch - it tracks one reset timestamp
 for the background writer and for each database.
 
 I think you forgot the attachment.

Yes, I did. Thanks!

Tomas

From da2154954a547c143cd0d130597411911f339c07 Mon Sep 17 00:00:00 2001
From: vampire vamp...@rimmer.reddwarf
Date: Thu, 23 Dec 2010 18:40:48 +0100
Subject: [PATCH] Track timestamp of the last stats reset (for each database and 
a background writer).

This is a simplified version of the previous patch, which kept a timestamp for each
database, table, index and function and for the background writer. This version
does not keep a timestamp for individual objects; instead, if stats for a
table/index/function are reset, the timestamp for the whole database is updated.
---
 doc/src/sgml/monitoring.sgml |   18 ++
 src/backend/catalog/system_views.sql |6 --
 src/backend/postmaster/pgstat.c  |   13 +
 src/backend/utils/adt/pgstatfuncs.c  |   26 ++
 src/include/catalog/pg_proc.h|4 
 src/include/pgstat.h |5 +
 6 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 5fd0213..99e091d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -600,6 +600,15 @@ postgres: <replaceable>user</replaceable> <replaceable>database</replaceable> <replaceable>host</replaceable> re
      </row>
 
      <row>
+      <entry><literal><function>pg_stat_get_db_stat_reset_time</function>(<type>oid</type>)</literal></entry>
+      <entry><type>timestamptz</type></entry>
+      <entry>
+       Time of the last reset of statistics for this database (as a result of executing
+       <function>pg_stat_reset</function> function) or any object within it (table or index).
+      </entry>
+     </row>
+
+     <row>
       <entry><literal><function>pg_stat_get_numscans</function>(<type>oid</type>)</literal></entry>
       <entry><type>bigint</type></entry>
       <entry>
@@ -1062,6 +1071,15 @@ postgres: <replaceable>user</replaceable> <replaceable>database</replaceable> <replaceable>host</replaceable> re
       <varname>bgwriter_lru_maxpages</varname> parameter
      </entry>
     </row>
+
+     <row>
+      <entry><literal><function>pg_stat_get_bgwriter_stat_reset_time()</function></literal></entry>
+      <entry><type>timestamptz</type></entry>
+      <entry>
+       Time of the last reset of statistics for the background writer (as a result of executing
+       <function>pg_stat_reset_shared('bgwriter')</function>)
+      </entry>
+     </row>
 
      <row>
       <entry><literal><function>pg_stat_get_buf_written_backend()</function></literal></entry>
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index 346eaaf..da92438 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -502,7 +502,8 @@ CREATE VIEW pg_stat_database AS
 pg_stat_get_db_tuples_fetched(D.oid) AS tup_fetched,
 pg_stat_get_db_tuples_inserted(D.oid) AS tup_inserted,
 pg_stat_get_db_tuples_updated(D.oid) AS tup_updated,
-pg_stat_get_db_tuples_deleted(D.oid) AS tup_deleted
+pg_stat_get_db_tuples_deleted(D.oid) AS tup_deleted,
+   pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
 FROM pg_database D;
 
 CREATE VIEW pg_stat_user_functions AS
@@ -538,7 +539,8 @@ CREATE VIEW pg_stat_bgwriter AS
 pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean,
 pg_stat_get_buf_written_backend() AS buffers_backend,
 pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
-pg_stat_get_buf_alloc() AS buffers_alloc;
+pg_stat_get_buf_alloc() AS buffers_alloc,
+   pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
 CREATE VIEW pg_user_mappings AS
 SELECT
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 856daa7..66e0b69 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3130,6 +3130,8 @@ pgstat_get_db_entry(Oid databaseid, bool create)
	result->n_tuples_deleted = 0;
	result->last_autovac_time = 0;
 
+	result->stat_reset_timestamp = GetCurrentTimestamp();
+
	memset(&hash_ctl, 0, sizeof(hash_ctl));
	hash_ctl.keysize = sizeof(Oid);
	hash_ctl.entrysize = sizeof(PgStat_StatTabEntry);
@@ -3408,6 

Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Magnus Hagander
On Thu, Dec 23, 2010 at 16:15, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 Here's a patch that changes walsender to require a special privilege
 for replication instead of relying on superuser permissions. We
 discussed this back before 9.0 was finalized, but IIRC we ran out of
 time. The motivation being that you really want to use superuser as
 little as possible - and since being a replication slave is a read
 only role, it shouldn't require the maximum permission available in
 the system.

 Maybe it needn't require max permissions, but one of the motivations
 for requiring superuser-ness was to prevent Joe User from sucking every
 last byte of data out of your database (and into someplace he could
 examine it at leisure).  This patch opens that barn door wide, because
 so far as I can see, it allows anybody at all to grant the replication
 privilege ... or revoke it, thereby breaking your replication setup.
 I think only superusers should be allowed to change the flag.

That was certainly not intentional - and doesn't work that way for me
at least, unless I broke it right before I submitted it.

oh hang on.. Yeah, it's allowing anybody *that has CREATE ROLE*
privilege to do it, I think. And I agree that's wrong and should be
fixed. But I can't see it allowing anybody at all to do it - am I
misreading the code?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Magnus Hagander
On Thu, Dec 23, 2010 at 16:57, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Dec 23, 2010 at 10:54 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I haven't looked at the patch yet, but I think we should continue to
 allow superuser-ness to be *sufficient* for replication - i.e.
 superusers will automatically have the replication privilege just as
 they do any other - and merely allow this as an option for when you
 want to avoid doing it that way.

 I don't particularly mind breaking that.  If we leave it as-is, we'll
 be encouraging people to use superuser accounts for things that don't
 need that, which can't be good from a security standpoint.

 And if we break it, we'll be adding an additional, mandatory step to
 make replication work that isn't required today.  You might think
 that's OK, but I think the majority opinion is that it's already
 excessively complex.

Most of the people I run across in the real world are rather surprised
how *easy* it is to set up, and not how complex. And tbh, the only
complexity complaints I've heard there are about the requirement to
start/backup/stop to get it up and running. I've always told everybody
to create a separate account to do it, and not heard a single comment
about that.

That said, how about a compromise in that we add the replication flag
by default to the initial superuser when it's created? That way, it's
at least possible to remove it if you want to. Would that address your
complexity concern?
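Under that compromise, setup would look roughly like this (a sketch using the syntax proposed in the patch; per Tom's point upthread, only a superuser would be allowed to flip the flag):

```sql
-- Create a dedicated, non-superuser account for the standby to connect as
CREATE ROLE standby LOGIN REPLICATION PASSWORD 'secret';

-- And, if desired, take the flag away from the initial superuser
ALTER ROLE postgres NOREPLICATION;
```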

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Why is sorting on two columns so slower than sorting on one column?

2010-12-23 Thread Li Jie
- Original Message - 
From: Kenneth Marshall k...@rice.edu
To: Li Jie jay23j...@gmail.com
Cc: pgsql-hackers pgsql-hackers@postgresql.org
Sent: Thursday, December 23, 2010 10:30 PM
Subject: Re: [HACKERS] Why is sorting on two columns so slower than sorting on one column?


 On Thu, Dec 23, 2010 at 10:19:46PM +0800, Li Jie wrote:
 Hi Ken,
 
 Thanks for your tips! Yes it is the case, and I run another query sorting on 
 the second column whose values are random.
 
 postgres=# explain analyze select * from big_wf order by id;
   QUERY PLAN 

 -
 Sort  (cost=565525.45..575775.45 rows=410 width=8) (actual 
 time=25681.875..36458.824 rows=410 loops=1)
   Sort Key: id
   Sort Method:  external merge  Disk: 72048kB
  ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8) (actual 
 time=8.595..5569.500 rows=410 loops=1)
 
 Now the sorting takes about 20 seconds, so it seems reasonable compared to 
 30 seconds, right? But one thing I'm confused about is why the additional 
 comparison is really so expensive. Does it incur additional I/O? From the cost 
 model, it seems not; all the costs are the same (575775.45).
 
 Thanks,
 Li Jie
 
 In the first query, the cost is basically the I/O cost to read the
 table from disk. The actual sort does not do anything since the
 sort values are the same. In the second query, the sort has to
 swap things in memory/disk to get them in the correct order for
 the result. This actually takes CPU and possibly additional I/O
 which is why it is slower. In the case of sorting by just the id
 column, the size of the sorted values is smaller, which needs
 fewer merge batches to complete the sort when the sort is bigger than
 work_mem.
 
 Cheers,
 Ken

Hi Ken,

Thanks for your analysis.

But in the last query that sorts on id, since the query selects all the
columns for output, the actual sorted size is the same, and the only difference
is the comparison cost. The query sorting on two columns needs to do twice as
many comparisons. Am I right?

Thanks,
Li Jie


Re: [HACKERS] Why is sorting on two columns so slower than sorting on one column?

2010-12-23 Thread Li Jie
Hi Marti,

Thanks for your help! I guess I understand what you mean: a clustered index 
will make sorting as cheap as a seq scan, right?

But what I meant is, is there any potential optimization in the backend 
implementation? Intuitively, if sorting on one column or on two columns incurs 
the same I/O cost, why should there be so much difference?

Thanks,
Li Jie

- Original Message - 
From: Marti Raudsepp ma...@juffo.org
To: Jie Li jay23j...@gmail.com
Cc: pgsql-hackers pgsql-hackers@postgresql.org
Sent: Thursday, December 23, 2010 10:17 PM
Subject: Re: [HACKERS] Why is sorting on two columns so slower than sorting on 
one column?


On Thu, Dec 23, 2010 at 09:33, Jie Li jay23j...@gmail.com wrote:
 While the first sort takes
 only about 6 seconds, the second one takes over 30 seconds. Is this more
 than expected? Is there any possible optimization?

If you're doing these queries often, you should:
CREATE INDEX ix_big_wf_age_id ON big_wf (age, id)

If that's still not fast enough, you can physically sort rows in the
table using the newly created index:
CLUSTER big_wf USING ix_big_wf_age_id;

Please post back your results. :)

Regards,
Marti


Re: [HACKERS] Why is sorting on two columns so slower than sorting on one column?

2010-12-23 Thread Li Jie
Hi Ken,

Thanks for your tips! Yes it is the case, and I run another query sorting on 
the second column whose values are random.

postgres=# explain analyze select * from big_wf order by id;
  QUERY PLAN

-
Sort  (cost=565525.45..575775.45 rows=410 width=8) (actual 
time=25681.875..36458.824 rows=410 loops=1)
  Sort Key: id
  Sort Method:  external merge  Disk: 72048kB
  ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8) (actual 
time=8.595..5569.500 rows=410 loops=1)

Now the sorting takes about 20 seconds, so it seems reasonable compared to 30 
seconds, right? But one thing I'm confused about is why the additional 
comparison is really so expensive. Does it incur additional I/O? From the cost 
model, it seems not; all the costs are the same (575775.45).

Thanks,
Li Jie

- Original Message - 
From: Kenneth Marshall k...@rice.edu
To: Jie Li jay23j...@gmail.com
Cc: pgsql-hackers pgsql-hackers@postgresql.org
Sent: Thursday, December 23, 2010 10:03 PM
Subject: Re: [HACKERS] Why is sorting on two columns so slower than sorting on one column?


 On Thu, Dec 23, 2010 at 02:33:12AM -0500, Jie Li wrote:
 Hi,
 
 Here is the test table,
 
 postgres=# \d big_wf
 Table public.big_wf
  Column |  Type   | Modifiers
 +-+---
  age| integer |
  id | integer |
 
 postgres=# \dt+ big_wf
  List of relations
  Schema |  Name  | Type  |  Owner   |  Size  | Description
 ++---+--++-
  public | big_wf | table | workshop | 142 MB |
 
 
 The first query sorting on one column:
 postgres=# explain analyze select * from big_wf order by age;
QUERY
 PLAN
 -
  Sort  (cost=565525.45..575775.45 rows=410 width=8) (actual
 time=11228.155..16427.149 rows=410 loops=1)
Sort Key: age
Sort Method:  external sort  Disk: 72112kB
 ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8)
 (actual time=6.196..4797.620 rows=410 loops=1)
  Total runtime: 19530.452 ms
 (5 rows)
 
 The second query sorting on two columns:
 postgres=# explain analyze select * from big_wf order by age,id;
QUERY
 PLAN
 -
  Sort  (cost=565525.45..575775.45 rows=410 width=8) (actual
 time=37544.779..48206.702 rows=410 loops=1)
Sort Key: age, id
Sort Method:  external merge  Disk: 72048kB
 ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8)
 (actual time=6.796..5518.663 rows=410 loops=1)
  Total runtime: 51258.000 ms
 (5 rows)
 
 The version is 9.0.1 and work_mem is 20MB. One special thing is that the
 first column (age) of all the tuples has the same value, so the second
 column (id) is always needed for comparison.  While the first sort takes
 only about 6 seconds, the second one takes over 30 seconds.  Is this more
 than expected? Is there any possible optimization?
 
 Thanks,
 Li Jie
 
 Hi Li,
 
 If I understand your description, in the first query the sort does
 not actually have to do anything because the column values for age
 are all degenerate. In the second query, you actually need to sort
 the values which is why it takes longer. If the first column values
 are the same, then simply sorting by id alone would be faster.
 You could also bump up work_mem for the query to perform the sort
 in memory.
 
 Regards,
 Ken



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Thu, Dec 23, 2010 at 16:15, Tom Lane t...@sss.pgh.pa.us wrote:
 I think only superusers should be allowed to change the flag.

 That was certainly not intentional - and doesn't work that way for me
 at least, unless I broke it right before I submitted it.

 oh hang on.. Yeah, it's allowing anybody *that has CREATE ROLE*
 privilege to do it, I think. And I agree that's wrong and should be
 fixed. But I can't see it allowing anybody at all to do it - am I
 misreading the code?

Ah, sorry, yeah there are probably CREATE ROLE privilege checks
somewhere upstream of here.  I was expecting to see a privilege check
added by the patch itself, and did not, so I complained.

regards, tom lane



[HACKERS] WIP: plpgsql - foreach in

2010-12-23 Thread Pavel Stehule
Hello

attached patch contains an implementation of iteration over an array:

Regards

Pavel Stehule
*** ./doc/src/sgml/plpgsql.sgml.orig	2010-12-09 08:20:08.0 +0100
--- ./doc/src/sgml/plpgsql.sgml	2010-12-23 21:05:51.459678695 +0100
***
*** 2238,2243 
--- 2238,2268 
  </para>
 </sect2>
  
+    <sect2 id="plpgsql-array-iterating">
+     <title>Looping Through an Array</title>
+<synopsis>
+<optional> &lt;&lt;<replaceable>label</replaceable>&gt;&gt; </optional>
+FOREACH <replaceable>target</replaceable> <optional> SLICE <replaceable>number</replaceable> </optional> IN <replaceable>expression</replaceable> LOOP
+    <replaceable>statements</replaceable>
+END LOOP <optional> <replaceable>label</replaceable> </optional>;
+</synopsis>
+
+<programlisting>
+CREATE FUNCTION sum(VARIADIC int[]) RETURNS int8 AS $$
+DECLARE
+  s int8 := 0; x int;
+BEGIN
+  FOREACH x IN $1
+  LOOP
+    s := s + x;
+  END LOOP;
+  RETURN s;
+END;
+$$ LANGUAGE plpgsql;
+</programlisting>
+
+    </sect2>
+ 
 <sect2 id="plpgsql-error-trapping">
  <title>Trapping Errors</title>
  
*** ./src/pl/plpgsql/src/gram.y.orig	2010-12-17 09:46:55.0 +0100
--- ./src/pl/plpgsql/src/gram.y	2010-12-23 21:24:56.604966977 +0100
***
*** 21,26 
--- 21,27 
  #include parser/parse_type.h
  #include parser/scanner.h
  #include parser/scansup.h
+ #include utils/array.h
  
  
  /* Location tracking support --- simpler than bison's default */
***
*** 134,139 
--- 135,141 
  			PLpgSQL_datum   *scalar;
  			PLpgSQL_rec *rec;
  			PLpgSQL_row *row;
+ 			int	slice;
  		}		forvariable;
  		struct
  		{
***
*** 178,184 
%type <ival>	assign_var
%type <var>		cursor_variable
%type <datum>	decl_cursor_arg
! %type <forvariable>	for_variable
%type <stmt>	for_control

%type <str>		any_identifier opt_block_label opt_label
--- 180,186 
%type <ival>	assign_var
%type <var>		cursor_variable
%type <datum>	decl_cursor_arg
! %type <forvariable>	for_variable foreach_control
%type <stmt>	for_control

%type <str>		any_identifier opt_block_label opt_label
***
*** 190,196 
  %type <stmt>	stmt_return stmt_raise stmt_execsql
  %type <stmt>	stmt_dynexecute stmt_for stmt_perform stmt_getdiag
  %type <stmt>	stmt_open stmt_fetch stmt_move stmt_close stmt_null
! %type <stmt>	stmt_case
  
  %type <list>	proc_exceptions
  %type <exception_block> exception_sect
--- 192,198 
  %type <stmt>	stmt_return stmt_raise stmt_execsql
  %type <stmt>	stmt_dynexecute stmt_for stmt_perform stmt_getdiag
  %type <stmt>	stmt_open stmt_fetch stmt_move stmt_close stmt_null
! %type <stmt>	stmt_case stmt_foreach_a
  
  %type <list>	proc_exceptions
  %type <exception_block> exception_sect
***
*** 264,269 
--- 266,272 
  %token <keyword>	K_FETCH
  %token <keyword>	K_FIRST
  %token <keyword>	K_FOR
+ %token <keyword>	K_FOREACH
  %token <keyword>	K_FORWARD
  %token <keyword>	K_FROM
  %token <keyword>	K_GET
***
*** 298,303 
--- 301,307 
  %token <keyword>	K_ROWTYPE
  %token <keyword>	K_ROW_COUNT
  %token <keyword>	K_SCROLL
+ %token <keyword>	K_SLICE
  %token <keyword>	K_SQLSTATE
  %token <keyword>	K_STRICT
  %token <keyword>	K_THEN
***
*** 739,744 
--- 743,750 
  		{ $$ = $1; }
  | stmt_for
  		{ $$ = $1; }
+ | stmt_foreach_a
+ 		{ $$ = $1; }
  | stmt_exit
  		{ $$ = $1; }
  | stmt_return
***
*** 1386,1391 
--- 1392,1455 
  	}
  ;
  
+ stmt_foreach_a		: opt_block_label K_FOREACH foreach_control K_IN expr_until_loop loop_body
+ 	{
+ 		PLpgSQL_stmt_foreach_a *new = palloc0(sizeof(PLpgSQL_stmt_foreach_a));
+ 		new->cmd_type = PLPGSQL_STMT_FOREACH_A;
+ 		new->lineno = plpgsql_location_to_lineno(@2);
+ 		new->label = $1;
+ 		new->expr = $5;
+ 		new->slice = $3.slice;
+ 		new->body = $6.stmts;
+ 
+ 		if ($3.rec)
+ 		{
+ 			new->rec = $3.rec;
+ 			check_assignable((PLpgSQL_datum *) new->rec, @3);
+ 		}
+ 		else if ($3.row)
+ 		{
+ 			new->row = $3.row;
+ 			check_assignable((PLpgSQL_datum *) new->row, @3);
+ 		}
+ 		else if ($3.scalar)
+ 		{
+ 			Assert($3.scalar->dtype == PLPGSQL_DTYPE_VAR);
+ 			new->var = (PLpgSQL_var *) $3.scalar;
+ 			check_assignable($3.scalar, @3);
+ 		}
+ 		else
+ 		{
+ 			ereport(ERROR,
+ 	(errcode(ERRCODE_SYNTAX_ERROR),
+ 	 errmsg("loop variable of loop over array must be a record, row variable, scalar variable or list of scalar variables"),
+ 			 parser_errposition(@3)));
+ 		}
+ 
+ 		check_labels($1, $6.end_label, $6.end_label_location);
+ 		$$ = (PLpgSQL_stmt *) new;
+ 	}
+ ;
+ 
+ foreach_control		: for_variable
+ 	{
+ 		$$ = $1;
+ 		$$.slice = 0;
+ 	}
+ | for_variable K_SLICE ICONST
+ 	{
+ 		$$ = $1;
+ 		$$.slice = $3;
+ 		if ($3 < 0 || $3 >= MAXDIM)
+ 			ereport(ERROR,
+ 	(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ 	 errmsg(number of slicing array 
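For readers following the grammar changes above, a hypothetical PL/pgSQL fragment using the proposed FOREACH syntax might look like this. This is only a sketch against the patched grammar - the variable names are invented, and it will not run on an unpatched 9.0 server:

```sql
DO $$
DECLARE
  a int[] := ARRAY[[1,2],[3,4]];
  x int;
  part int[];
BEGIN
  -- Default mode (no SLICE): iterate over every element, like unnest()
  FOREACH x IN a LOOP
    RAISE NOTICE 'element: %', x;   -- visits 1, 2, 3, 4
  END LOOP;

  -- SLICE 1: each iteration yields a one-dimensional sub-array
  FOREACH part SLICE 1 IN a LOOP
    RAISE NOTICE 'slice: %', part;  -- visits {1,2}, then {3,4}
  END LOOP;
END $$;
```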

Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Thu, Dec 23, 2010 at 16:57, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Dec 23, 2010 at 10:54 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I don't particularly mind breaking that.  If we leave it as-is, we'll
 be encouraging people to use superuser accounts for things that don't
 need that, which can't be good from a security standpoint.

 And if we break it, we'll be adding an additional, mandatory step to
 make replication work that isn't required today.  You might think
 that's OK, but I think the majority opinion is that it's already
 excessively complex.

 Most of the people I run across in the real world are rather surprised
 how *easy* it is to set up, and not how complex. And tbh, the only
 complexity complaints I've heard there are about the requirement to
 start/backup/stop to get it up and running. I've always told everybody
 to create a separate account to do it, and not heard a single comment
 about that.

FWIW, it seems unreasonable to me to expect that we will not be breaking
any part of a 9.0 replication configuration over the next release or
two.  We *knew* we were shipping a rough version that would require
refinements, and this is one of the planned refinements.

 That said, how about a compromise in that we add the replication flag
 by default to the initial superuser when it's created? That way, it's
 at least possible to remove it if you want to. Would that address your
 complexity concern?

It does nothing to address my security concern.  I want to discourage
people from using superuser accounts for this, full stop.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Stefan Kaltenbrunner

On 12/23/2010 08:59 PM, Magnus Hagander wrote:

On Thu, Dec 23, 2010 at 16:57, Robert Haas robertmh...@gmail.com  wrote:

On Thu, Dec 23, 2010 at 10:54 AM, Tom Lane t...@sss.pgh.pa.us  wrote:

Robert Haas robertmh...@gmail.com  writes:

I haven't looked at the patch yet, but I think we should continue to
allow superuser-ness to be *sufficient* for replication - i.e.
superusers will automatically have the replication privilege just as
they do any other - and merely allow this as an option for when you
want to avoid doing it that way.


I don't particularly mind breaking that.  If we leave it as-is, we'll
be encouraging people to use superuser accounts for things that don't
need that, which can't be good from a security standpoint.


And if we break it, we'll be adding an additional, mandatory step to
make replication work that isn't required today.  You might think
that's OK, but I think the majority opinion is that it's already
excessively complex.


Most of the people I run across in the real world are rather surprised
how *easy* it is to set up, and not how complex. And tbh, the only
complexity complaints I've heard there are about the requirement to
start/backup/stop to get it up and running. I've always told everybody
to create a separate account to do it, and not heard a single comment
about that.


I agree - people I talked to are fairly surprised that we are not using a 
dedicated replication role, but they are surprised at the complexity of 
actually initializing the replication (mostly the we cannot do a base 
backup over the replication connection misfeature)



Stefan



[HACKERS] log_hostname and pg_stat_activity

2010-12-23 Thread Peter Eisentraut
Somehow I fantasized that log_hostname would also turn
pg_stat_activity.client_addr into names instead of IP addresses.  It
doesn't, but perhaps it should?  It would be nice to be able to
configure an IP-address free setup.  Or would it be worth having a
separate configuration parameter for that?





Re: [HACKERS] log_hostname and pg_stat_activity

2010-12-23 Thread Magnus Hagander
On Thu, Dec 23, 2010 at 22:09, Peter Eisentraut pete...@gmx.net wrote:
 Somehow I fantasized that log_hostname would also turn
 pg_stat_activity.client_addr into names instead of IP addresses.  It
 doesn't, but perhaps it should?  It would be nice to be able to
 configure an IP-address free setup.  Or would it be worth having a
 separate configuration parameter for that?

It should certainly be renamed to something else if it does both, but
I don't see the point of having two separate parameters between them.
As long as you can use a cached version of the lookup, you're only
paying the price once, after all...

However, pg_stat_activity.client_addr is an inet field, not a text
string, so you'd have to invent a separate field for it I think...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] proposal : cross-column stats

2010-12-23 Thread Tomas Vondra
Dne 21.12.2010 19:03, Tomas Vondra napsal(a):
 My plan for the near future (a few weeks) is to build a simple 'module'
 with the ability to estimate the number of rows for a given condition.
 This could actually be quite useful as a stand-alone contrib module, as
 the users often ask how to get a number of rows fast (usually for paging).

Hm, I've been thinking about where to place this new module - I thought
pgfoundry might be the right place, but I've noticed it supports just
CVS. Well, that's a rather antiquated SCM, I think - I'm very happy with the
comfort provided by Git or SVN.

Is there some other place commonly used for pg-related projects? I've
been thinking about github or something like that ...

And I've finally set up the wiki-page about this effort - it's just a
summary of this whole thread.

http://wiki.postgresql.org/wiki/Cross_Columns_Stats

regards
Tomas



Re: [HACKERS] Database file copy

2010-12-23 Thread Srini Raghavan
Thank you very much for reviewing, appreciate the feedback.  As pointed out by 
you, it is always best to test it out with the latest version, so, I tested the 
same approach with postgres 9.0.2 on windows just now, and it works! 


I forgot to mention earlier that in addition to setting vacuum_freeze_table_age 
to 0, vacuum_freeze_min_age must also be set to 0 to reset xmin with the 
FrozenXid. 


And you were spot on with regards to permission issues with roles. I had been 
testing with the postgres account, which is a superuser and it always works.  
After the database files are copied over in the deploy instance, any object 
that 
had ownership set to a custom role gets messed up, and logging in as that user 
gives a permission denied error. But there is an easy fix to this. As the 
postgres user, I ran the 


alter table <objectname> owner to <rolename> 

command for every object, followed by 

grant all on <objecttype> <objectname> to <rolename> 

command for every object, which resolved the permission denied issue. Thanks 
for 
pointing this out. 
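One way to avoid typing those statements by hand is to generate them from the catalog. This is a sketch only - "newowner" is a placeholder role name, and the query covers ordinary tables; other relkinds (sequences, views) would need similar passes:

```sql
-- Emit an ALTER ... OWNER statement for every user table;
-- run the generated output by hand afterwards.
SELECT 'ALTER TABLE ' || quote_ident(n.nspname) || '.' || quote_ident(c.relname)
       || ' OWNER TO newowner;'
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema');
```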


Please let me know if you or anyone else thinks of any other potential issues. 
Thanks again for reviewing.

Srini



  

Re: [HACKERS] [PATCH] V3: Idle in transaction cancellation

2010-12-23 Thread Kevin Grittner
Andres Freund and...@anarazel.de wrote:
 
 I will try to read the thread and make a proposal for a more
 carefull implementation - just not today... I think the results
 would be interesting...
 
FWIW, the SSI patch that Dan and I are working on can't have a
guarantee that it is immediately safe to retry a transaction which
rolls back with a serialization failure (without potentially failing
again on exactly the same set of transactions) unless there is a
capability such as this Idle in transaction cancellation patch
would provide.  Safe immediate retry would be a nice guarantee for
SSI to provide.
 
That being the case, we may not be able to generate the final form
of the SSI patch until a patch for this issue is applied.  Obviously
I know that nobody is in a position to make any promises on this,
but I just thought I'd let people know that this issue could
possibly be on the critical path to a timely release -- at least if
that release will include SSI with the safe retry guarantee.  (At
least when I'm planning for a release, I like to know such
things)
 
-Kevin



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Josh Berkus
On 12/23/10 7:49 AM, Robert Haas wrote:
 I haven't looked at the patch yet, but I think we should continue to
 allow superuser-ness to be *sufficient* for replication - i.e.
 superusers will automatically have the replication privilege just as
 they do any other - and merely allow this as an option for when you
 want to avoid doing it that way.

Yes.  Currently I already create a separate replicator superuser just
so that I can simply track which connections belong to replication.  It
would be great if it could make the replicator user less than a superuser.

If we still make it possible for postgres to replicate, then we don't
add any complexity to the simplest setup.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Database file copy

2010-12-23 Thread Alvaro Herrera
Excerpts from Srini Raghavan's message of jue dic 23 18:55:20 -0300 2010:

 Please let me know if you or anyone think of any other potential issues. 
 Thanks 
 again for reviewing.

I think anything in the shared catalogs could be an issue (look for
tables with pg_class.relisshared=true).  I think you'll need to do
something about shared dependencies as well; not sure what.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 If we still make it possible for postgres to replicate, then we don't
 add any complexity to the simplest setup.

Well, that's one laudable goal here, but secure by default is another
one that ought to be taken into consideration.

regards, tom lane



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Josh Berkus
On 12/23/10 2:21 PM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 If we still make it possible for postgres to replicate, then we don't
 add any complexity to the simplest setup.
 
 Well, that's one laudable goal here, but secure by default is another
 one that ought to be taken into consideration.

I don't see how *not* granting the superuser replication permissions
makes things more secure.  The superuser can grant replication
permissions to itself, so why is suspending them by default beneficial?
 I'm not following your logic here.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Cannot compile Pg 9.0.2 with MinGW under Windows

2010-12-23 Thread Andrew Dunstan



On 12/23/2010 07:11 AM, Pavel Golub wrote:

Hello, Pgsql-bugs.

Tried to use MinGW under Windows to build the client libraries at least.
However, it failed at the ./configure --without-zlib stage.

Please find attached log file, stdout and stderr outputs.

The main problem here I suppose is
configure: WARNING: someheader.h: present but cannot be compiled

Please give me advice on this.
Thanks in advance



Your gcc doesn't look like others we have:

You have:

   gcc (GCC) 3.4.4 (msys special)
   Copyright (C) 2004 Free Software Foundation, Inc.
   This is free software; see the source for copying conditions.  There
   is NO
   warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
   PURPOSE.

   configure:3252: $? = 0
   configure:3259: gcc -v >&5
   Reading specs from /usr/lib/gcc/i686-pc-msys/3.4.4/specs
   Configured with: /home/cstrauss/build/gcc3/gcc-3.4.4/configure
   --prefix=/usr --sysconfdir=/etc --localstatedir=/var
   --infodir=/share/info --mandir=/share/man --libexecdir=/lib
   --enable-languages=c,c++ --disable-nls --enable-threads=posix
   --enable-sjlj-exceptions --enable-hash-synchronization
   --enable-libstdcxx-debug --with-newlib
   Thread model: posix



Buildfarm narwhal has:

   gcc.exe (GCC) 3.4.2 (mingw-special)
   Copyright (C) 2004 Free Software Foundation, Inc.
   This is free software; see the source for copying conditions.  There is NO
   warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

   configure:3252: $? = 0
   configure:3259: gcc -v >&5
   Reading specs from c:/MinGW/bin/../lib/gcc/mingw32/3.4.2/specs
   Configured with: ../gcc/configure --with-gcc --with-gnu-ld --with-gnu-as 
--host=mingw32 --target=mingw32 --prefix=/mingw --enable-threads --disable-nls 
--enable-languages=c,c++,f77,ada,objc,java --disable-win32-registry 
--disable-shared --enable-sjlj-exceptions --enable-libgcj --disable-java-awt 
--without-x --enable-java-gc=boehm --disable-libgcj-debug --enable-interpreter 
--enable-hash-synchronization --enable-libstdcxx-debug
   Thread model: win32
   gcc version 3.4.2 (mingw-special)





Buildfarm frogmouth has:

   gcc.exe (GCC) 4.5.0
   Copyright (C) 2010 Free Software Foundation, Inc.
   This is free software; see the source for copying conditions.  There is NO
   warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

   configure:3252: $? = 0
   configure:3259: gcc -v >&5
   Using built-in specs.
   COLLECT_GCC=c:\mingw\bin\gcc.exe
   COLLECT_LTO_WRAPPER=c:/mingw/bin/../libexec/gcc/mingw32/4.5.0/lto-wrapper.exe
   Target: mingw32
   Configured with: ../gcc-4.5.0/configure 
--enable-languages=c,c++,ada,fortran,objc,obj-c++ --disable-sjlj-exceptions 
--with-dwarf2 --enable-shared --enable-libgomp --disable-win32-registry 
--enable-libstdcxx-debug --enable-version-specific-runtime-libs 
--disable-werror --build=mingw32 --prefix=/mingw
   Thread model: win32


   gcc version 4.5.0 (GCC)




cheers

andrew



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 On 12/23/10 2:21 PM, Tom Lane wrote:
 Well, that's one laudable goal here, but secure by default is another
 one that ought to be taken into consideration.

 I don't see how *not* granting the superuser replication permissions
 makes things more secure.  The superuser can grant replication
 permissions to itself, so why is suspending them by default beneficial?
  I'm not following your logic here.

Well, the reverse of that is just as true: if we ship it without
replication permissions on the postgres user, people can change that if
they'd rather not create a separate role for replication.  But I think
we should encourage people to NOT do it that way.  Setting it up that
way by default hardly encourages use of a more secure arrangement.

regards, tom lane



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Stephen Frost
* Josh Berkus (j...@agliodbs.com) wrote:
 On 12/23/10 2:21 PM, Tom Lane wrote:
  Well, that's one laudable goal here, but secure by default is another
  one that ought to be taken into consideration.
 
 I don't see how *not* granting the superuser replication permissions
 makes things more secure.  The superuser can grant replication
 permissions to itself, so why is suspending them by default beneficial?
  I'm not following your logic here.

The point is that the *replication* role can't grant itself superuser
privs.  Having the replication role compromised isn't great, but if that
role is *also* a superuser, then the whole database server could be
compromised.  Encouraging users to continue to configure remote systems
with the ability to connect as a superuser when it's not necessary is a
bad idea.

One compromise would be to:

a) let superusers be granted the replication permission
b) have pg_dump assume that superusers have that permission when dumping
   from a version which pre-dates the replication grant
c) have pg_upgrade assume the superuser has that permission when
   upgrading
d) *not* grant replication to the default superuser 

A better alternative, imv, would be to just have a & d, and mention in
the release notes that users *should* create a dedicated replication
role which is *not* a superuser but *does* have the replication grant,
but if they don't want to change their existing configurations, they can
just grant the replication privilege to whatever role they're currently
using.

Thanks,

Stephen




Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Josh Berkus
On 12/23/10 2:33 PM, Stephen Frost wrote:
 A better alternative, imv, would be to just have a & d, and mention in
 the release notes that users *should* create a dedicated replication
 role which is *not* a superuser but *does* have the replication grant,
 but if they don't want to change their existing configurations, they can
 just grant the replication privilege to whatever role they're currently
 using.

Well, if we really want people to change their behavior then we need to
make it easy for them:

1) have a replication permission
2) *by default* create a replication user with the replication
permission when we initdb.
3) have an example line for a replication connection by the replication
user in the default pg_hba.conf (commented out).
4) change all our docs and examples to use that replication user.

If using the replication user is easier than any other path, people
will.  If it's harder, they won't.
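Using the WITH REPLICATION syntax from Magnus's patch, steps 1-3 could be sketched roughly as follows. The role name, password, and network range are placeholders, and the pg_hba.conf line assumes the existing "replication" virtual-database convention:

```sql
-- Steps 1-2: a dedicated replication role, deliberately not a superuser
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';

-- Step 3: the matching pg_hba.conf entry, shipped commented out:
--   host  replication  replicator  192.168.0.0/24  md5
```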

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 On 12/23/10 2:33 PM, Stephen Frost wrote:
 A better alternative, imv, would be to just have a & d, and mention in
 the release notes that users *should* create a dedicated replication
 role which is *not* a superuser but *does* have the replication grant,
 but if they don't want to change their existing configurations, they can
 just grant the replication privilege to whatever role they're currently
 using.

 Well, if we really want people to change their behavior then we need to
 make it easy for them:

 1) have a replication permission
 2) *by default* create a replication user with the replication
 permission when we initdb.

Yeah, I could see doing that ... the entry would be wasted if you're not
doing any replication, but one wasted catalog entry isn't much.

However, it'd be a real good idea for that role to be NOLOGIN if it's
there by default.

regards, tom lane



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Stephen Frost
* Josh Berkus (j...@agliodbs.com) wrote:
 1) have a replication permission

Right, that's what this patch is about.

 2) *by default* create a replication user with the replication
 permission when we initdb.

I'm not entirely sure about this one..  I'm not against it but I'm also
not really 'for' it. :)

 3) have an example line for a replication connection by the replication
 user in the default pg_hba.conf (commented out).

Sure.

 4) change all our docs and examples to use that replication user.

I don't have a problem with that.

Thanks,

Stephen




Re: [HACKERS] [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+

2010-12-23 Thread Josh Berkus

 Which means if we just export the macros, we would still not have caught
 this.  I would like to share all the defines --- I am just saying it
 isn't trivial.

I just called all the define variables manually rather than relying on
the macros.  Seemed to work fine.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 However, it'd be a real good idea for that role to be NOLOGIN if it's
 there by default.

Definitely.  I'd also suggest we WARN if someone creates a situation
where a role has both replication and login, and maybe prevent it
altogether; it's just a bad idea.

Thanks,

Stephen




Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Magnus Hagander
On Thu, Dec 23, 2010 at 23:44, Tom Lane t...@sss.pgh.pa.us wrote:
 Josh Berkus j...@agliodbs.com writes:
 On 12/23/10 2:33 PM, Stephen Frost wrote:
 A better alternative, imv, would be to just have a & d, and mention in
 the release notes that users *should* create a dedicated replication
 role which is *not* a superuser but *does* have the replication grant,
 but if they don't want to change their existing configurations, they can
 just grant the replication privilege to whatever role they're currently
 using.

 Well, if we really want people to change their behavior then we need to
 make it easy for them:

 1) have a replication permission
 2) *by default* create a replication user with the replication
 permission when we initdb.

 Yeah, I could see doing that ... the entry would be wasted if you're not
 doing any replication, but one wasted catalog entry isn't much.

 However, it'd be a real good idea for that role to be NOLOGIN if it's
 there by default.

That shouldn't be too hard - I'll look at making the patch do that to
make sure it *isn't* that hard ;)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Josh Berkus

 I'm not entirely sure about this one..  I'm not against it but I'm also
 not really 'for' it. :)

2 reasons: (a) if users need to CREATE USER, that *does* add an extra
step and they're more likely to just choose to grant to postgres
instead, (b) having a standard installed user (just like the user
postgres is standard) would make our docs, examples and tutorials much
easier to use.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Database file copy

2010-12-23 Thread Srini Raghavan
Thank you, that is a great point. 
 
Based on your suggestion, I wrote the following query:
 
select * from pg_class where relisshared=true order by relname
 
The above query returns 27 rows. I evaluated the impact on the following:
 
pg_auth_members - We create roles and memberships on each deploy instance, so 
this shouldn't be an issue.
 
pg_authid - As noted in my previous post, issuing alter and grant commands 
after 
file copy updates pg_authid with the correct information.
 
pg_database - not an issue, as we are creating the database on the deploy 
instance, we don't copy the database oid over from the master instance.
 
pg_db_role_setting - We don't have any database specific role settings. Even if 
we have a need in the future, we will set this up on the deploy instance, so, 
this shouldn't be an issue.
 
pg_pltemplate - We use plpgsql functions, and it works without any issues after 
file copy.
 
pg_shdepend - There is one SHARED_DEPENDENCY_PIN(p) entry in this system 
catalog, and the remaining are SHARED_DEPENDENCY_OWNER (o) entries. Since I am 
issuing an alter command to change the ownership after file copy to the 
appropriate role, this system catalog gets populated correctly. I wrote this 
query select oid,relname from pg_class where oid in (select objid from 
pg_shdepend) on the copied database, and it returns valid results, so this 
doesn't seem to be an issue. As the documentation states, postgres currently 
tracks the object-to-role dependencies, and it may track more types of 
dependencies in the future. Role dependencies have a fix as stated above, and 
when new dependencies come about, we will need to evaluate them.
 
pg_shdescription - stores optional comments, which we don't use.
 
pg_tablespace - we are looking to use the default tablespace at this time, 
which 
works. Need to evaluate the impact if we need to use custom tablespace.
 
The remaining entries are toast and index entries, which again should not have 
an impact.
 
Anything else? I am feeling confident about this after each review post. And, 
wherever I have said this shouldn't be an issue above, if you see any 
discrepancies, kindly highlight them.
 
Thanks
 
Srini


  

Re: [HACKERS] [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+

2010-12-23 Thread Josh Berkus
Greg, All:

Results for Solaris 10u8, on ZFS on a 7-drive attached storage array:

bash-3.00# ./test_fsync -f /dbdata/pgdata/test.out
Loops = 1

Simple write:
8k write  59988.002/second

Compare file sync methods using one write:
open_datasync 8k write  214.125/second
(unavailable: o_direct)
open_sync 8k write  222.155/second
(unavailable: o_direct)
8k write, fdatasync 214.086/second
8k write, fsync 215.035/second
(unavailable: fsync_writethrough)

Compare file sync methods using two writes:
2 open_datasync 8k writes   108.227/second
(unavailable: o_direct)
2 open_sync 8k writes   106.935/second
(unavailable: o_direct)
8k write, 8k write, fdatasync   205.525/second
8k write, 8k write, fsync   210.483/second
(unavailable: fsync_writethrough)

Compare open_sync with different sizes:
open_sync 16k write 211.481/second
2 open_sync 8k writes   106.202/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
8k write, fsync, close  207.499/second
8k write, close, fsync  213.656/second




-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Robert Haas
On Thu, Dec 23, 2010 at 5:59 PM, Josh Berkus j...@agliodbs.com wrote:

 I'm not entirely sure about this one..  I'm not against it but I'm also
 not really 'for' it. :)

 2 reasons: (a) if users need to CREATE USER, that *does* add an extra
 step and they're more likely to just choose to grant to postgres
 instead, (b) having a standard installed user (just like the user
 postgres is standard) would make our docs, examples and tutorials much
 easier to use.

If the user exists but is disabled by default, I'm not sure that
really provides any advantage over not having it at all.  And it
certainly can't be enabled by default.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Josh Berkus

 If the user exists but is disabled by default, I'm not sure that
 really provides any advantage over not having it at all.  And it
 certainly can't be enabled by default.

I was presuming that NOLOGIN wouldn't prevent replication connections
via the replication user.  So the user would be enabled as far as
replication was concerned.

And currently, there is no replication connection in the pg_hba.conf.
So that's not a change; in fact, having a sample one would be an
improvement.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] SQL/MED - file_fdw

2010-12-23 Thread Itagaki Takahiro
On Tue, Dec 21, 2010 at 21:32, Itagaki Takahiro
itagaki.takah...@gmail.com wrote:
 On Tue, Dec 21, 2010 at 20:14, Shigeru HANADA han...@metrosystems.co.jp 
 wrote:
 Attached is the revised version of file_fdw patch.  This patch is
 based on Itagaki-san's copy_export-20101220.diff patch.

 #1. Don't you have a per-tuple memory leak? I added GetCopyExecutorState()
 because the caller needs to reset the per-tuple context periodically.

Sorry, I found there is no memory leak here. The related comment is:
[execnodes.h]
 *  CurrentMemoryContext should be set to ecxt_per_tuple_memory before
 *  calling ExecEvalExpr() --- see ExecEvalExprSwitchContext().
I guess CurrentMemoryContext in the Iterate callback is a per-tuple context.
So, we don't have to export cstate->estate via GetCopyExecutorState().

 Or, if you eventually make a HeapTuple from values and nulls arrays,

ExecStoreVirtualTuple() seems to be better than the combination of
heap_form_tuple() and ExecStoreTuple() for the purpose. Could you try
to use slot->tts_values and slot->tts_isnull for NextCopyFrom() directly?

-- 
Itagaki Takahiro



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Florian Pflug
On Dec23, 2010, at 16:54 , Tom Lane wrote:
 BTW, is it possible to set things up so that a REPLICATION account
 can be NOLOGIN, thereby making it really hard to abuse for other
 purposes?  Or does the login privilege check come too soon?

Please don't. This violates the principle of least surprise big time!
And, what's worse, it violates it in such a way that people *underestimate*
the permissions a particular role has, which from a security POV is
the worst that can happen. You pointed out yourself that REPLICATION
is a powerful permission to have because it essentially grants you read
access to the whole cluster. By allowing roles to connect as WAL receivers
even if they have NOLOGIN set, you're giving this powerful permission
to a role a DBA might think is disabled!

Now, I *can* see that having roles which may only connect as WAL receivers,
but not issue queries, is worthwhile. However, please either

A) Invent a new flag for that, for example SQL/NOSQL. A pure replication
   role would have WITH REPLICATION NOSQL, while most others would have
   WITH NOREPLICATION SQL. It's debatable whether postgres should have
   WITH REPLICATION SQL or WITH NOREPLICATION SQL by default.

B) Forbid REPLICATION roles from connecting as anything other than WAL
   receivers altogether. It'd then probably be a good idea to prevent
   such roles from having SUPERUSER, CREATEDB, CREATEROLE and INHERIT
   set, as these flags wouldn't make any sense then.

The only downside of (B) that I can see is that it'd break setups that
work with 9.0.

best regards,
Florian Pflug




Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 On Dec23, 2010, at 16:54 , Tom Lane wrote:
 BTW, is it possible to set things up so that a REPLICATION account
 can be NOLOGIN, thereby making it really hard to abuse for other
 purposes?  Or does the login privilege check come too soon?

 Please don't. This violates the principle of least surprise big time!

How so?  (Please note I said *can be*, not *has to be*.)

The point of this is to ensure that if someone does illicitly come by
the replication role's password, they can't use it to log in.  They can
still steal all your data, but they can't actually get into the
database.  I don't see why it's a bad idea to configure things that way.

regards, tom lane



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Florian Pflug
On Dec24, 2010, at 04:16 , Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:
 On Dec23, 2010, at 16:54 , Tom Lane wrote:
 BTW, is it possible to set things up so that a REPLICATION account
 can be NOLOGIN, thereby making it really hard to abuse for other
 purposes?  Or does the login privilege check come too soon?
 
 Please don't. This violates the principle of least surprise big time!
 
 How so?  (Please note I said *can be*, not *has to be*.)

Because a DBA might ALTER ROLE replication WITH NOLOGIN, thinking he has
just disabled that role. Only to find out weeks later that he hasn't,
and that someone has been using that role to stream weeks' worth of
confidential data to who knows where.

The problem here is that you suggest NOLOGIN should mean "not allowed
to issue SQL commands", which really isn't what the name NOLOGIN
conveys. The concept itself is perfectly fine, but the name is dangerously
confusing.

 The point of this is to ensure that if someone does illicitly come by
 the replication role's password, they can't use it to log in.  They can
 still steal all your data, but they can't actually get into the
 database.  I don't see why it's a bad idea to configure things that way.

It's perfectly fine to configure things that way, and it is in fact what I would
do. I'd just prefer the name for that setting to convey its actual meaning,
which is why I suggested adding a SQL/NOSQL flag. (Or SESSION/NOSESSION,
or whatever.) Or, much simpler, prevent WITH REPLICATION roles from issuing
SQL commands altogether. That'd achieve your goal just as well and is way less
confusing.

best regards,
Florian Pflug




Re: [HACKERS] proposal : cross-column stats

2010-12-23 Thread Florian Pflug
On Dec23, 2010, at 20:39 , Tomas Vondra wrote:
 On 21.12.2010 16:54, Florian Pflug wrote:
 I think that maybe it'd be acceptable to scan a large portion of the
 table to estimate dist(A,B), *if* we must only do so only once in a while. 
 But even with
 a full scan, how would we arrive at an estimate for dist(A,B)? Keeping all 
 values in memory,
 and spilling them into, say, an on-disk hash table adds even more overhead 
 to the already
 expensive full scan. Maybe using a bloom filter instead of a hash table 
 could avoid
 the spilling to disk, in exchange for a slightly less precise result...
 
 I've read some basics about Bloom filters, and I'm not sure we can use
 one in this case. I see two problems with this approach:
 
 1) the impossibility of predicting the number of elements in advance
 
   To build a Bloom filter with a limited error rate, you need to size it
   properly (the number of hash functions and the size of the bit array).
   But that's exactly the information we're looking for.
 
   I guess we could use the highest possible value (equal to the number
   of tuples) - according to the wiki you need about 10 bits per element
   with 1% error, i.e. about 10MB of memory for each million elements.
Drat. I had expected these numbers to come out quite a bit lower than
that, at least for a higher error target. But even with a 10% false
positive rate, it's still 4.5MB per 1e6 elements. Still too much to
assume the filter will always fit into memory, I fear :-(
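For reference, the figures being discussed follow from the standard Bloom
filter sizing formula m/n = -ln(p) / (ln 2)^2 bits per element, with an
optimal hash count of k = (m/n) * ln 2. A quick sketch of the arithmetic
(nothing Postgres-specific is assumed here):

```python
import math

def bloom_bits_per_element(p):
    """Bits per element for an optimally sized Bloom filter with
    target false-positive rate p: m/n = -ln(p) / (ln 2)^2."""
    return -math.log(p) / (math.log(2) ** 2)

def bloom_megabytes(n_elements, p):
    """Total filter size in MB for n_elements at false-positive rate p."""
    return n_elements * bloom_bits_per_element(p) / 8 / (1024 ** 2)

for p in (0.01, 0.10):
    bits = bloom_bits_per_element(p)
    k = round(bits * math.log(2))  # optimal number of hash functions
    print(f"p={p}: {bits:.1f} bits/element, {k} hash functions, "
          f"{bloom_megabytes(10**6, p):.2f} MB per million elements")
```

This reproduces the "about 10 bits per element" figure at a 1% error rate;
note, though, that 10 bits per element works out to roughly 1.2MB per
million elements, not 10MB.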

 2) sampling the whole table
 
   A naive approach would be to sample the whole table each time, but
   for large tables that might be a bit of a problem. So I'm thinking
   about what alternatives there are ...


   One possibility is to build the Bloom filter incrementally and store
   it on disk, or something like that. I guess this could be done
   during VACUUM. Anyway, there's the issue of how to set the filter
   size initially (estimating the number of elements), just as in the
   previous section.
The filter size could be derived from the table's statistics target, or
be otherwise user-definable. We could also auto-resize once it gets too
full. But still, that all seems awfully complex :-(

   Another possibility is to collect the data from just a small portion
   of a table and then use the result to estimate the number of distinct
   values for the whole table. But I'm not sure we can do this reliably;
   I see many traps in this.
This is how it works currently. The problem with this approach is that
it gives you very few guarantees about how precise the result will be.
Extrapolating works very well for things like MCVs and histograms, because
there you're by definition interested mostly in values which occur often -
and thus appear with high probability in the relatively few rows you sample.
For the number of distinct values, however, this isn't true - if ndistinct
is an order of magnitude smaller than the number of rows, relatively few
rows can account for a large percentage of the distinct values...
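A toy simulation makes the point concrete (the row count, ndistinct,
distribution and sample size below are arbitrary assumptions, purely for
illustration): when ndistinct is large and each value occurs in only a
handful of rows, a small uniform sample sees only a small fraction of the
distinct values, so an estimator has very little to extrapolate from.

```python
import random

random.seed(42)

# A "table" of 100,000 rows with ndistinct = 10,000: each value shows up
# in roughly 10 rows, i.e. most distinct values are individually rare.
n_rows, n_distinct = 100_000, 10_000
table = [random.randrange(n_distinct) for _ in range(n_rows)]

# Sample 1% of the rows uniformly - a ballpark sampling fraction for
# ANALYZE-style statistics on a large table.
sample = random.sample(table, n_rows // 100)
d_sample = len(set(sample))

# The sample contains far fewer than 10,000 distinct values; the expected
# count is about 10,000 * (1 - 0.99**10), i.e. roughly 950.
print(f"true ndistinct: {n_distinct}, distinct in 1% sample: {d_sample}")
```

Naively scaling the sample's distinct count by the sampling fraction would
give ~95,000 (an order of magnitude too high), while assuming every sampled
value repeats throughout the table would give ~950 (an order of magnitude
too low) - which is exactly the weak-guarantee problem described above.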

Another idea would be to obtain the ndistinct values from an index somehow.
Postgres cannot currently scan an index in physical order, only in logical
order, due to locking considerations. But since we'd only be interested in
an estimate, maybe a scan in physical block order would work for ndistinct
estimates? Just a wild idea, mind you - I haven't checked at all whether
that'd be even remotely feasible.

best regards,
Florian Pflug





Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 The problem here is that you suggest NOLOGIN should mean Not allowed
 to issue SQL commands, which really isn't what the name NOLOGIN
 conveys.

No, it means not allowed to connect.  It's possible now to issue
commands as a NOLOGIN user, you just have to use SET ROLE to become the
user.  I think you're arguing about a design choice that was already
made some time ago.

regards, tom lane



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Robert Haas
On Thu, Dec 23, 2010 at 11:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Florian Pflug f...@phlo.org writes:
 The problem here is that you suggest NOLOGIN should mean Not allowed
 to issue SQL commands, which really isn't what the name NOLOGIN
 conveys.

 No, it means not allowed to connect.  It's possible now to issue
 commands as a NOLOGIN user, you just have to use SET ROLE to become the
 user.  I think you're arguing about a design choice that was already
 made some time ago.

I think I agree with Florian about the confusing-ness of the proposed
semantics.  Aren't you saying you want NOLOGIN to mean not allowed to
log in for the purposes of issuing SQL commands, but allowed to log in
for replication?  Uggh.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] pg_dump -X

2010-12-23 Thread Robert Haas
Back in 2006, we had this commit:

commit 2b25e1169f44368c120931787628d51731b5cc8c
Author: Peter Eisentraut pete...@gmx.net
Date:   Sat Oct 7 20:59:05 2006 +

The -X option in pg_dump was supposed to be a workaround for the lack of
portable long options.  But we have had portable long options for a long
time now, so this is obsolete.  Now people have added options which *only*
work with -X but not as regular long option, so I'm putting a stop to this:
-X is deprecated; it still works, but it has been removed from the
documentation, and please don't add more of them.

Since then, two additional -X options have crept in, doubtless due to
mimicry of the existing options without examination of the commit
logs.  I think we should either (a) remove the -X option altogether or
(b) change the comment so that it clearly states the same message that
appears here in the commit log, namely, that no new -X options are to
be created.  The existing comment says that -X is deprecated, but that
doesn't make it entirely 100% clear that the code isn't intended to be
further updated, at least judging by the results.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-23 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 I think I agree with Florian about the confusing-ness of the proposed
 semantics.  Aren't you saying you want NOLOGIN mean not allowed to
 log in for the purposes of issuing SQL commands, but allowed to log in
 for replication?  Uggh.

I like the general idea of a replication-only role or login.  Maybe
implementing that as a role w/ all the things that come along with it
being a role isn't right, but we don't want to have to reinvent all the
supported auth mechanisms (and please don't propose limiting the auth
options for the replication login!).  Is there a way we can leverage the
auth mechanisms, etc, while forcing the 'replication role' to only be
able to do what a 'replication role' should do?

Thanks,

Stephen




Re: [HACKERS] sepgsql contrib module

2010-12-23 Thread KaiGai Kohei
(2010/12/24 11:53), KaiGai Kohei wrote:
 There is one other issue to be discussed.
 We need a special form of regression test. Because SE-PostgreSQL
 makes access control decisions based on the security label of the peer
 process, we need to switch the psql process during the regression test.
 (So, I don't include test cases yet.)
 
 We have the 'runcon' command to launch a child process with a specified
 security label, as long as the security policy allows. If we could
 launch 'psql' by 'runcon' with a specified label, we can describe
 test cases on the existing 'make installcheck' framework.
 
 An idea is to add an option to pg_regress to launch the psql command
 with a specified wrapper program (like 'runcon').
 In this case, each contrib module invokes it via its REGRESS_OPTS setting.
 One thing to be considered is that the security label given to
 'runcon' needs to be flexible for each *.sql file.
 
The attached patch adds a --launcher=COMMAND option to pg_regress.
If a command is specified, pg_regress prepends the specified
command string in front of the psql command.

When this option is used, the psql process will be launched via
the launcher program. Of course, the launcher has the responsibility
to execute psql correctly.

This example shows a run of the regression test on the cube module.
It launches psql using 'runcon -l s0'.

  [kai...@saba cube]$ make installcheck REGRESS_OPTS=--launcher='runcon -l s0' 
--dbname=cube_regress
  make -C ../../src/test/regress pg_regress
  make[1]: Entering directory `/home/kaigai/repo/pgsql/src/test/regress'
  make -C ../../../src/port all
  make[2]: Entering directory `/home/kaigai/repo/pgsql/src/port'
  make[2]: Nothing to be done for `all'.
  make[2]: Leaving directory `/home/kaigai/repo/pgsql/src/port'
  make[1]: Leaving directory `/home/kaigai/repo/pgsql/src/test/regress'
  ../../src/test/regress/pg_regress --inputdir=. --psqldir=/usr/local/pgsql/bin 
--launcher='runcon -l s0' --dbname=cube_regress cube
  (using postmaster on Unix socket, default port)
  == dropping database cube_regress   ==
  DROP DATABASE
  == creating database cube_regress   ==
  CREATE DATABASE
  ALTER DATABASE
  == running regression test queries==
  test cube ... ok

  =
   All 1 tests passed.
  =

During the regression test, I checked the security context of the processes.

  [kai...@saba ~]$ env LANG=C pstree -Z
  systemd(`system_u:system_r:init_t:s0')
   :
   |-sshd(`unconfined_u:system_r:sshd_t:s0-s0:c0.c1023')
   |  |-sshd(`unconfined_u:system_r:sshd_t:s0-s0:c0.c1023')
   |  |  `-sshd(`unconfined_u:system_r:sshd_t:s0-s0:c0.c1023')
   |  | `-bash(`unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023')
   |  |`-make(`unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023')
   |  |   
`-pg_regress(`unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023')
   |  |  `-psql(`unconfined_u:unconfined_r:unconfined_t:s0')

It shows that the launcher program drops the c0.c1023 privileges at the end
of the security label of the processes between pg_regress and psql.

What do you think about this idea for implementing a regression test for
SE-PostgreSQL, or possibly other things that depend on environment variables?

Thanks,
-- 
KaiGai Kohei kai...@ak.jp.nec.com


pg_regress-launcher.patch
Description: application/octet-stream



Re: [HACKERS] pg_dump -X

2010-12-23 Thread J. Roeleveld
On Friday 24 December 2010 05:35:26 Robert Haas wrote:
 Back in 2006, we have this commit:
 
 commit 2b25e1169f44368c120931787628d51731b5cc8c
 Author: Peter Eisentraut pete...@gmx.net
 Date:   Sat Oct 7 20:59:05 2006 +
 
 The -X option in pg_dump was supposed to be a workaround for the lack
 of portable long options.  But we have had portable long options for a
 long time now, so this is obsolete.  Now people have added options which
 *only* work with -X but not as regular long option, so I'm putting a stop
 to this: -X is deprecated; it still works, but it has been removed from
 the documentation, and please don't add more of them.
 
 Since then, two additional -X options have crept in, doubtless due to
 mimicry of the existing options without examination of the commit
 logs.  I think we should either (a) remove the -X option altogether or
 (b) change the comment so that it clearly states the same message that
 appears here in the commit log, namely, that no new -X options are to
 be created.  The existing comment says that -X is deprecated, but that
 doesn't make it entirely 100% clear that the code isn't intended to be
 further updated, at least judging by the results.

Just a thought, but wouldn't a big warning that appears when someone actually
uses the -X option, informing the user that it's deprecated, prevent further
ones from appearing?

Are there situations where the -X option is actually required? Can't the code
be adapted to use different options than the -X one?

Also, unless Gentoo actually strips the man page and --help output (which I
seriously doubt), I do not see the -X option in the documentation.

--
Joost
