[HACKERS] PROPOSAL of xmlvalidate
Hi, I am working on a patch adding xmlvalidate() functionality. LibXML 2.7.7 improved DTD, XSD, and Relax-NG validation, so I am using that. I have the idea of creating a system table for holding DTDs, XSDs, and Relax-NGs (similar to Oracle). Is that a good idea? If so, how should that table be implemented? pg_attribute and pg_class have changed quite a bit since PostgreSQL 8.0. Code included below; work in progress.

bool
validate_xml_with_xsd(const char *xsd, const char *xml)
{
    bool        result = false;
    xmlSchemaPtr schema = NULL;     /* pointer to the parsed schema */
    xmlSchemaParserCtxtPtr ctxt;
    xmlDocPtr   doc;

    xmlLineNumbersDefault(1);       /* enable line numbers in error reports */

    /* parse the XSD schema */
    ctxt = xmlSchemaNewMemParserCtxt(xsd, strlen(xsd));
    xmlSchemaSetParserErrors(ctxt,
                             (xmlSchemaValidityErrorFunc) fprintf,
                             (xmlSchemaValidityWarningFunc) fprintf,
                             stderr);
    schema = xmlSchemaParse(ctxt);
    xmlSchemaFreeParserCtxt(ctxt);
    if (schema == NULL)
        elog(ERROR, "failed to parse XSD schema");

    /* parse the XML document */
    doc = xmlReadDoc((const xmlChar *) xml,
                     "http://www.w3.org/2001/XMLSchema", NULL, 0);
    if (doc == NULL)
    {
        elog(ERROR, "failed to load XML document from input");
    }
    else
    {
        xmlSchemaValidCtxtPtr vctxt;
        int         ret;

        vctxt = xmlSchemaNewValidCtxt(schema);
        xmlSchemaSetValidErrors(vctxt,
                                (xmlSchemaValidityErrorFunc) fprintf,
                                (xmlSchemaValidityWarningFunc) fprintf,
                                stderr);
        ret = xmlSchemaValidateDoc(vctxt, doc);
        if (ret == 0)
        {
            result = true;
            elog(WARNING, "validation succeeded");
        }
        else if (ret > 0)
        {
            result = false;
            elog(WARNING, "document is not valid");
        }
        else
        {
            result = false;
            elog(WARNING, "validation failed with internal error");
        }
        xmlSchemaFreeValidCtxt(vctxt);
        xmlFreeDoc(doc);
    }

    if (schema != NULL)
        xmlSchemaFree(schema);

    return result;
}

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER OBJECT any_name SET SCHEMA name
On Sat, Nov 27, 2010 at 2:17 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Thanks! The _oid variants will have to re-appear in the alter extension set schema patch, which is the last of the series. Meanwhile, I will have to merge head with the current extension patch (already overdue for a new version, waiting on purpose) so that it no longer includes the cfparser and execute from file patches too (which have changed a lot underneath it already). I expected as much, but at least at that point it will be possible to test that code. I'm not sure there's much precedent for dealing with in-commitfest patch dependencies, so here's a summary of what I think would ideally happen next (ordering counts):

1. cfparser https://commitfest.postgresql.org/action/patch_view?id=413 — ready for committer
2. pg_execute_from_file https://commitfest.postgresql.org/action/patch_view?id=414 — needs another review and maybe some more documentation
3. extensions https://commitfest.postgresql.org/action/patch_view?id=404 — needs review and minor updating; will need another version after merging the two previous patches
4. alter extension set schema https://commitfest.postgresql.org/action/patch_view?id=416 — needs review and a reviewer; will need a bitrot fix (and adding in the _oid variants); would be better to adjust once the 3 previous are in

Thanks, that's very helpful. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] contrib: auth_delay module
On Sat, Nov 27, 2010 at 2:44 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Thu, Nov 4, 2010 at 6:35 AM, Stephen Frost sfr...@snowman.net wrote: * Jan Urbański (wulc...@wulczer.org) wrote: On 04/11/10 14:09, Robert Haas wrote: Hmm, I wonder how useful this is given that restriction. As KaiGai mentioned, it's more to make bruteforcing difficult (read: time consuming), right? Which it would still do, since the attacker would be bumping up against max_connections. max_connections would be a DOS point, but that's no different from today. I haven't thought of a way to test this, so I guess I'll just ask. If the attacking client just waits a few milliseconds for a response and then drops the socket, opening a new one, will the server-side walking-dead process continue to be charged against max_connections until its sleep expires? I'm not sure, either. I suspect the answer is yes. I guess you could test this by writing a loop like this: while true; do psql connection parameters that will fail authentication; done ...and then hitting ^C every few seconds during execution. After doing that for a bit, run select * from pg_stat_activity or ps auxww | grep postgres in another window. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PROPOSAL of xmlvalidate
On 11/28/2010 05:33 AM, Tomáš Pospíšil wrote: Hi, I am working on a patch adding xmlvalidate() functionality. LibXML 2.7.7 improved DTD, XSD, and Relax-NG validation, so I am using that. I have the idea of creating a system table for holding DTDs, XSDs, and Relax-NGs (similar to Oracle). Is that a good idea? If so, how should that table be implemented? pg_attribute and pg_class have changed quite a bit since PostgreSQL 8.0. In the first place you need to tell us why you think it should go in a catalog table at all. Unless you intend to use some sort of typmod with the xml type, to indicate the validation object, it seems quite unnecessary. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PLy_malloc and plperl mallocs
On 28/11/10 05:23, Andrew Dunstan wrote: On 11/27/2010 10:28 PM, Tom Lane wrote: Jan Urbański wulc...@wulczer.org writes: I noticed that PL/Python uses a simple wrapper around malloc that does ereport(FATAL) if malloc returns NULL. I find it a bit harsh; don't we normally do ERROR if we run out of memory? And while looking at how PL/Perl does these things, I find that one failed malloc (in compile_plperl_function) throws an ERROR, and the rest (in plperl_spi_prepare) are simply unguarded... I guess PL/Python should stop throwing FATAL errors and PL/Perl should get its own malloc_or_ERROR helper and start using that. The real question is why they're not using palloc instead. Well, the stuff in plperl_spi_prepare needs to be allocated in a long-lived context. We could use palloc in TopMemoryContext instead, I guess. I'll do that for PL/Python for now. While on the topic of needless FATAL errors, if you try to create a Python 3 function in a session that already loaded Python 2, you get a FATAL error with an errhint of "Start a new session to use a different Python major version." If you raise a FATAL error, you don't really have much choice other than to start a new session, since the current one just got nuked, no? Should this be an ERROR? Cheers, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rethinking representation of sort/hash semantics in queries and plans
On Sat, Nov 27, 2010 at 10:02:33PM -0500, Tom Lane wrote: In recent discussions of the plan-tree representation for KNNGIST index scans, there seemed to be agreement that it'd be a good idea to explicitly represent the expected sort ordering of the output. While thinking about how best to do that, it struck me that there are some pretty horrendous impedance mismatches in the way we do things now. Different parts of the system use two different representations of sort ordering: * A sort operator (which can have '<' or '>' semantics) plus a nulls-first flag * A btree opfamily plus direction and nulls-first flags Sounds like a good idea to me. Quite aside from the performance issues, having one way to represent things will make it clearer what's going on and easier to extend in the future. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ Patriotism is when love of your own people comes first; nationalism, when hate for people other than your own comes first. - Charles de Gaulle
[HACKERS] [PATCH] Return command tag 'REPLACE X' for CREATE OR REPLACE statements.
Hi list, Often enough when developing PostgreSQL views and functions, I have pasted the CREATE OR REPLACE commands into the wrong window/shell and run them there without realizing that I'm creating a function in the wrong database instead of replacing one. Currently psql does not provide any feedback about which action really occurred. Only after writing this patch did I realize that I could instead raise a NOTICE, like the current IF EXISTS/IF NOT EXISTS clauses do. Is that a better way to solve this? This patch returns command tag CREATE X or REPLACE X for LANGUAGE/VIEW/RULE/FUNCTION. This is done by passing completionTag from ProcessUtility to more functions, and adding a 'bool *didUpdate' argument to some lower-level functions. I'm not sure if passing back the status in a bool* is considered good style, but this way all the functions look consistent. Regards, Marti

From b848a4129e31aa021059dab5a0a7ad139fe018b3 Mon Sep 17 00:00:00 2001
From: Marti Raudsepp ma...@juffo.org
Date: Sun, 28 Nov 2010 16:49:41 +0200
Subject: [PATCH] Return command tag 'REPLACE X' for CREATE OR REPLACE statements.

This affects CREATE OR REPLACE LANGUAGE/VIEW/RULE/FUNCTION
---
 src/backend/catalog/pg_aggregate.c  |  3 ++-
 src/backend/catalog/pg_proc.c       |  6 +-
 src/backend/commands/functioncmds.c | 16 +---
 src/backend/commands/proclang.c     | 30 ++
 src/backend/commands/view.c         | 18 ++
 src/backend/rewrite/rewriteDefine.c | 25 -
 src/backend/tcop/utility.c          |  8
 src/include/catalog/pg_proc_fn.h    |  3 ++-
 src/include/commands/defrem.h       |  2 +-
 src/include/commands/proclang.h     |  2 +-
 src/include/commands/view.h         |  2 +-
 src/include/rewrite/rewriteDefine.h |  5 +++--
 12 files changed, 88 insertions(+), 32 deletions(-)

diff --git a/src/backend/catalog/pg_aggregate.c b/src/backend/catalog/pg_aggregate.c
index eadf41f..31e91b4 100644
--- a/src/backend/catalog/pg_aggregate.c
+++ b/src/backend/catalog/pg_aggregate.c
@@ -229,7 +229,8 @@ AggregateCreate(const char *aggName,
 							  NIL,		/* parameterDefaults */
 							  PointerGetDatum(NULL),	/* proconfig */
 							  1,		/* procost */
-							  0);		/* prorows */
+							  0,		/* prorows */
+							  NULL);	/* didReplace */
 
 	/*
 	 * Okay to create the pg_aggregate entry.
diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
index d464979..e456139 100644
--- a/src/backend/catalog/pg_proc.c
+++ b/src/backend/catalog/pg_proc.c
@@ -85,7 +85,8 @@ ProcedureCreate(const char *procedureName,
 				List *parameterDefaults,
 				Datum proconfig,
 				float4 procost,
-				float4 prorows)
+				float4 prorows,
+				bool *didReplace)
 {
 	Oid			retval;
 	int			parameterCount;
@@ -650,6 +651,9 @@ ProcedureCreate(const char *procedureName,
 		AtEOXact_GUC(true, save_nestlevel);
 	}
 
+	if (didReplace)
+		*didReplace = is_update;
+
 	return retval;
 }
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index 2347cad..4e07c9d 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -770,7 +770,7 @@ interpret_AS_clause(Oid languageOid, const char *languageName,
  * Execute a CREATE FUNCTION utility statement.
  */
 void
-CreateFunction(CreateFunctionStmt *stmt, const char *queryString)
+CreateFunction(CreateFunctionStmt *stmt, const char *queryString, char *completionTag)
 {
 	char	   *probin_str;
 	char	   *prosrc_str;
@@ -791,7 +791,8 @@ CreateFunction(CreateFunctionStmt *stmt, const char *queryString)
 	Oid			requiredResultType;
 	bool		isWindowFunc,
 				isStrict,
-				security;
+				security,
+				didReplace;
 	char		volatility;
 	ArrayType  *proconfig;
 	float4		procost;
@@ -958,7 +959,16 @@ CreateFunction(CreateFunctionStmt *stmt, const char *queryString)
 					parameterDefaults,
 					PointerGetDatum(proconfig),
 					procost,
-					prorows);
+					prorows,
+					&didReplace);
+
+	if (completionTag)
+	{
+		if (didReplace)
+			strcpy(completionTag, "REPLACE FUNCTION");
+		else
+			strcpy(completionTag, "CREATE FUNCTION");
+	}
 }
diff --git a/src/backend/commands/proclang.c b/src/backend/commands/proclang.c
index 9d0d3b2..10d3cd9 100644
--- a/src/backend/commands/proclang.c
+++ b/src/backend/commands/proclang.c
@@ -51,7 +51,7 @@ typedef struct
 static void create_proc_lang(const char *languageName, bool replace,
 				 Oid languageOwner, Oid handlerOid, Oid inlineOid,
-				 Oid valOid, bool trusted);
+				 Oid valOid, bool trusted, bool *didReplace);
 static PLTemplate *find_language_template(const char *languageName);
 static void AlterLanguageOwner_internal(HeapTuple tup, Relation rel,
 							Oid newOwnerId);
@@ -62,7 +62,7 @@ static void AlterLanguageOwner_internal(HeapTuple tup, Relation rel,
  * -
 */
 void
-CreateProceduralLanguage(CreatePLangStmt *stmt)
Re: [HACKERS] [GENERAL] column-level update privs + lock table
On Fri, Nov 26, 2010 at 7:11 PM, Robert Haas robertmh...@gmail.com wrote: I'm not totally convinced that this is the correct behavior. It seems a bit surprising that UPDATE privilege on a single column is enough to lock out all SELECT activity from the table. It's actually a bit surprising that even full-table UPDATE privileges are enough to do this, but this change would allow people to block access to data they can neither see nor modify. That seems counterintuitive, if not a security hole. The way I see it, it's a Good Thing to encourage people to assign UPDATE privileges on tables only as minimally as possible. The damage that a poorly coded or malicious user can do with LOCK TABLE privileges is insignificant next to the damage they can do with more UPDATE privileges than they really need. Right now, we're basically encouraging admins to grant full-table update privileges when that's not really necessary. If, in the future, Postgres supports the ability to LOCK TABLE only on specific columns, I think we could refine this permissions check so that column-level update privileges only allowed the user to lock those columns. But I think this patch is a step in the right direction. Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Return command tag 'REPLACE X' for CREATE OR REPLACE statements.
Marti Raudsepp ma...@juffo.org writes: This patch returns command tag CREATE X or REPLACE X for LANGUAGE/VIEW/RULE/FUNCTION. This is done by passing completionTag from ProcessUtility to more functions, and adding a 'bool *didUpdate' argument to some lower-level functions. I'm not sure if passing back the status in a bool* is considered good style, but this way all the functions look consistent. This is going to break clients that expect commands to return the same command tag as they have in the past. I doubt that whatever usefulness is gained will outweigh the compatibility problems. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] column-level update privs + lock table
On Fri, 2010-11-26 at 19:11 -0500, Robert Haas wrote: 2010/11/25 KaiGai Kohei kai...@ak.jp.nec.com: (2010/10/16 4:49), Josh Kupershmidt wrote: [Moving to -hackers] On Fri, Oct 15, 2010 at 3:43 AM, Simon Riggssi...@2ndquadrant.com wrote: On Mon, 2010-10-11 at 09:41 -0400, Josh Kupershmidt wrote: On Thu, Oct 7, 2010 at 7:43 PM, Josh Kupershmidtschmi...@gmail.com wrote: I noticed that granting a user column-level update privileges doesn't allow that user to issue LOCK TABLE with any mode other than Access Share. Anyone think this could be added as a TODO? Seems so to me, but you raise on Hackers. Thanks, Simon. Attached is a simple patch to let column-level UPDATE privileges allow a user to LOCK TABLE in a mode higher than Access Share. Small doc. update and regression test update are included as well. Feedback is welcome. I checked your patch, then I'd like to mark it as ready for committer. The point of this patch is trying to solve an incompatible behavior between SELECT ... FOR SHARE/UPDATE and LOCK command. On ExecCheckRTEPerms(), it allows the required accesses when no columns are explicitly specified in the query and the current user has necessary privilege on one of columns within the target relation. If we stand on the perspective that LOCK command should take same privileges with the case when we use SELECT ... FOR SHARE/UPDATE without specifying explicit columns, like COUNT(*), the existing LOCK command seems to me odd. I think this patch fixes the behavior as we expected. I'm not totally convinced that this is the correct behavior. It seems a bit surprising that UPDATE privilege on a single column is enough to lock out all SELECT activity from the table. It's actually a bit surprising that even full-table UPDATE privileges are enough to do this, but this change would allow people to block access to data they can neither see nor modify. That seems counterintuitive, if not a security hole. This comment misses the point. 
A user can already lock every row of a table, if they choose, by issuing SELECT ... FOR SHARE/UPDATE, if they have update rights on a single column. So the patch does not increase the rights of the user, it merely allows it to happen in a rational way and in a way that makes SELECT and LOCK work the same. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ECPG question about PREPARE and EXECUTE
On Wed, Nov 10, 2010 at 11:44:52AM +0100, Boszormenyi Zoltan wrote: a question came to us in the form of a code example, which I shortened. Say, we have this structure: ... Any comment on why it isn't done? Missing feature. Originally the pure text-based statement copying wasn't able to cope with these, and then it simply wasn't implemented when we moved to server-side prepares, if my memory serves well. The same feature is implemented for cursor declaration/open, I think. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org Jabber: michael.meskes at googlemail dot com VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
Robert Haas robertmh...@gmail.com writes: On Sat, Nov 27, 2010 at 11:18 PM, Bruce Momjian br...@momjian.us wrote: Not sure that information moves us forward. If the postmaster cleared the memory, we would have COW in the child and probably be even slower. Well, we can determine the answers to these questions empirically. Not really. Per Bruce's description, a page would become COW the moment the postmaster touched (either write or read) any variable on it. Since we have no control over how the loader lays out static variables, the actual behavior of a particular build would be pretty random and subject to unexpected changes caused by seemingly unrelated edits. Also, the referenced URL only purports to describe the behavior of HPUX, which is not exactly a mainstream OS. I think it requires a considerable leap of faith to assume that all or even most platforms work the way this suggests, and not in the dumber fashion Andres suggested. Has anybody here actually looked at the relevant Linux or BSD kernel code? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PLy_malloc and plperl mallocs
Jan Urbański wulc...@wulczer.org writes: I'll do that for PL/Python for now. While on the topic of needless FATAL errors, if you try to create a Python 3 function in a session that already loaded Python 2, you get a FATAL error with an errhint of "Start a new session to use a different Python major version." If you raise a FATAL error, you don't really have much choice than to start a new session, since the current one just got nuked, no? Should this be an ERROR? I believe we decided that it had to be FATAL because the session could no longer be trusted to execute functions of the other python version either. Check the archives from when that patch was put in. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Report: Linux huge pages with Postgres
On Sat, 2010-11-27 at 14:27 -0500, Tom Lane wrote: This is discouraging; it certainly doesn't make me want to expend the effort to develop a production patch. Perhaps. Why do this only for shared memory? Surely the majority of memory accesses are to private memory, so being able to allocate private memory in a single huge page would be better for avoiding TLB cache misses. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Report: Linux huge pages with Postgres
Simon Riggs si...@2ndquadrant.com writes: On Sat, 2010-11-27 at 14:27 -0500, Tom Lane wrote: This is discouraging; it certainly doesn't make me want to expend the effort to develop a production patch. Perhaps. Why do this only for shared memory? There's no exposed API for causing a process's regular memory to become hugepages. Surely the majority of memory accesses are to private memory, so being able to allocate private memory in a single huge page would be better for avoiding TLB cache misses. It's not really about the number of memory accesses, it's about the number of TLB entries needed. Private memory is generally a lot smaller than shared, in a tuned PG installation. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Report: Linux huge pages with Postgres
On Sun, 2010-11-28 at 12:04 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Sat, 2010-11-27 at 14:27 -0500, Tom Lane wrote: This is discouraging; it certainly doesn't make me want to expend the effort to develop a production patch. Perhaps. Why do this only for shared memory? There's no exposed API for causing a process's regular memory to become hugepages. We could make all the palloc stuff into shared memory also (private shared memory that is). We're not likely to run out of 64-bit memory addresses any time soon. Surely the majority of memory accesses are to private memory, so being able to allocate private memory in a single huge page would be better for avoiding TLB cache misses. It's not really about the number of memory accesses, it's about the number of TLB entries needed. Private memory is generally a lot smaller than shared, in a tuned PG installation. Sure, but 4MB of memory is enough to require 1000 TLB entries, which is more than enough to blow the TLB even on a Nehalem. So the size of the memory we access is already big enough to blow the cache, even without shared buffers. If the majority of accesses are from private memory then the TLB cache will already be thrashed by the time we access shared buffers again. That is at least one possible explanation for the lack of benefit. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Report: Linux huge pages with Postgres
Simon Riggs si...@2ndquadrant.com writes: On Sun, 2010-11-28 at 12:04 -0500, Tom Lane wrote: There's no exposed API for causing a process's regular memory to become hugepages. We could make all the palloc stuff into shared memory also (private shared memory that is). We're not likely to run out of 64-bit memory addresses any time soon. Mph. It's still not going to work well enough to be useful, because the kernel design for hugepages assumes a pretty static number of them. That maps well to our use of shared memory, not at all well to process local memory. Sure, but 4MB of memory is enough to require 1000 TLB entries, which is more than enough to blow the TLB even on a Nehalem. That can't possibly be right. I'm sure the chip designers have heard of programs using more than 4MB. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
On Sun, Nov 28, 2010 at 11:41 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sat, Nov 27, 2010 at 11:18 PM, Bruce Momjian br...@momjian.us wrote: Not sure that information moves us forward. If the postmaster cleared the memory, we would have COW in the child and probably be even slower. Well, we can determine the answers to these questions empirically. Not really. Per Bruce's description, a page would become COW the moment the postmaster touched (either write or read) any variable on it. Since we have no control over how the loader lays out static variables, the actual behavior of a particular build would be pretty random and subject to unexpected changes caused by seemingly unrelated edits. Well, one big character array pretty much has to be laid out contiguously, and it would be pretty surprising (but not entirely impossible) to find that the linker randomly sprinkles symbols from other files in between consecutive definitions in the same source file. I think the next question to answer is to try to allocate blame for the memset/memcpy overhead between page faults and the zeroing itself. That seems like something we can easily measure by writing a test program that zeroes the same region twice and kicks out timing numbers. If, as you and Andres are arguing, the actual zeroing is minor, then we can forget this whole line of discussion and move on to other possible optimizations. If that turns out not to be true then we can worry about how best to avoid the zeroing. I have to believe that's a solvable problem; the question is whether there's a benefit. In a close race, I don't think we should get bogged down in micro-optimization here, both because micro-optimizations may not gain much and because what works well on one platform may not do much at all on another. The more general issue here is what to do about our high backend startup costs. 
Beyond trying to recycle backends for new connections, as I've previous proposed and with all the problems it entails, the only thing that looks promising here is to try to somehow cut down on the cost of populating the catcache and relcache, not that I have a very clear idea how to do that. This has to be a soluble problem because other people have solved it. To some degree we're a victim of our own flexible and extensible architecture here, but I find it pretty unsatisfying to just say, OK, well, we're slow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Report: Linux huge pages with Postgres
On Sun, Nov 28, 2010 at 02:32:04PM -0500, Tom Lane wrote: Sure, but 4MB of memory is enough to require 1000 TLB entries, which is more than enough to blow the TLB even on a Nehalem. That can't possibly be right. I'm sure the chip designers have heard of programs using more than 4MB. According to http://www.realworldtech.com/page.cfm?ArticleID=RWT040208182719&p=8 on the Core 2 chip there wasn't even enough TLB to cover the entire onboard cache. With Nehalem there are 2304 TLB entries on the chip, which cover at least the whole onboard cache, but only just. Memory access is expensive. I think if you got good statistics on how much time your CPU is waiting for memory it'd be pretty depressing. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ Patriotism is when love of your own people comes first; nationalism, when hate for people other than your own comes first. - Charles de Gaulle
Re: [HACKERS] Rethinking representation of sort/hash semantics in queries and plans
Tom Lane t...@sss.pgh.pa.us writes: If you look closely at what we're doing with sort operators (get_ordering_op_properties pretty much encapsulates this), it turns out that a sort operator is shorthand for three pieces of information: 1. btree opfamily OID 2. specific input datatype for the opfamily 3. ascending or descending direction So to fix these problems we'd need to replace sort operator OIDs in SortGroupClause and plan nodes with those three items. Obviously, this would be slightly bulkier, but the extra cost added to copying parse and plan trees should be tiny compared to the avoided syscache lookups. A possible compromise is to avoid storing the specific input datatype. My understanding is that opfamily+datatype gives an opclass. What about storing the opclass OID there? Other than that, cleaning up the current situation by having as good a view of the bigger picture as you have now sounds great. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
Robert Haas wrote: On Sat, Nov 27, 2010 at 11:18 PM, Bruce Momjian br...@momjian.us wrote: Not sure that information moves us forward. If the postmaster cleared the memory, we would have COW in the child and probably be even slower. Well, we can determine the answers to these questions empirically. I think some more scrutiny of the code with the points you and Andres and Tom have raised is probably in order, and probably some more benchmarking, too. I haven't had a chance to do that yet, however. Basically, my bet is that if you allocated a large zero-data variable in the postmaster but never accessed it from the postmaster, at most you would copy-on-write (COW) fault in two pages, one at the beginning that is shared with accessed variables, and one at the end. The remaining pages (4k default for x86) would be zero-filled and not COW shared. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
Robert Haas robertmh...@gmail.com writes: The more general issue here is what to do about our high backend startup costs. Beyond trying to recycle backends for new connections, as I've previous proposed and with all the problems it entails, the only thing that looks promising here is to try to somehow cut down on the cost of populating the catcache and relcache, not that I have a very clear idea how to do that. One comment to make here is that it would be a serious error to focus on the costs of just starting and stopping a backend; you have to think about cases where the backend does at least some useful work in between, and that means actually *populating* those caches (to some extent) not just initializing them. Maybe your wording above was chosen with that in mind, but I think onlookers might easily overlook the point. FWIW, today I've been looking into getting rid of the silliness in build_index_pathkeys whereby it reconstructs pathkey opfamily OIDs from sortops instead of just using the index opfamilies directly. It turns out that once you fix that, there is no need at all for relcache to cache per-index operator data (the rd_operator arrays) because that's the only code that uses 'em. I don't see any particular change in the runtime of the regression tests from ripping out that part of the cached data, but it ought to have at least some beneficial effect on real startup time. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rethinking representation of sort/hash semantics in queries and plans
Dimitri Fontaine dimi...@2ndquadrant.fr writes: Tom Lane t...@sss.pgh.pa.us writes: So to fix these problems we'd need to replace sort operator OIDs in SortGroupClause and plan nodes with those three items. Obviously, this would be slightly bulkier, but the extra cost added to copying parse and plan trees should be tiny compared to the avoided syscache lookups. My understanding is that opfamily+datatype gives an opclass. What about storing the opclass OID there? Then you'd just need to look up the opfamily again. Opclasses are too small a division to be useful. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] column-level update privs + lock table
On Sun, Nov 28, 2010 at 11:35 AM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, 2010-11-26 at 19:11 -0500, Robert Haas wrote: 2010/11/25 KaiGai Kohei kai...@ak.jp.nec.com: (2010/10/16 4:49), Josh Kupershmidt wrote: [Moving to -hackers] On Fri, Oct 15, 2010 at 3:43 AM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2010-10-11 at 09:41 -0400, Josh Kupershmidt wrote: On Thu, Oct 7, 2010 at 7:43 PM, Josh Kupershmidt schmi...@gmail.com wrote: I noticed that granting a user column-level update privileges doesn't allow that user to issue LOCK TABLE with any mode other than Access Share. Anyone think this could be added as a TODO? Seems so to me, but you raise on Hackers. Thanks, Simon. Attached is a simple patch to let column-level UPDATE privileges allow a user to LOCK TABLE in a mode higher than Access Share. Small doc. update and regression test update are included as well. Feedback is welcome. I checked your patch, then I'd like to mark it as ready for committer. The point of this patch is trying to solve an incompatible behavior between SELECT ... FOR SHARE/UPDATE and LOCK command. On ExecCheckRTEPerms(), it allows the required accesses when no columns are explicitly specified in the query and the current user has necessary privilege on one of columns within the target relation. If we stand on the perspective that LOCK command should take same privileges with the case when we use SELECT ... FOR SHARE/UPDATE without specifying explicit columns, like COUNT(*), the existing LOCK command seems to me odd. I think this patch fixes the behavior as we expected. I'm not totally convinced that this is the correct behavior. It seems a bit surprising that UPDATE privilege on a single column is enough to lock out all SELECT activity from the table. It's actually a bit surprising that even full-table UPDATE privileges are enough to do this, but this change would allow people to block access to data they can neither see nor modify. 
That seems counterintuitive, if not a security hole. This comment misses the point. A user can already lock every row of a table, if they choose, by issuing SELECT ... FOR SHARE/UPDATE, if they have update rights on a single column. So the patch does not increase the rights of the user, it merely allows it to happen in a rational way and in a way that makes SELECT and LOCK work the same. Locking every row of the table allows a user with UPDATE privileges to block all current UPDATE and DELETE statements, but it won't necessarily block INSERT statements (depending on unique indices) and it certainly won't block SELECT statements. This patch proposes to allow a user with update privileges on a single column to lock out ALL concurrent activity, reads and writes. So it is not by any definition making SELECT and LOCK work the same. What it IS doing is making column-level update permissions and table-level update permissions work the same way. After all, one might argue, if full-table update permissions allow a user to take an access exclusive lock, why not single-column update permissions? I think, though, that there is a reasonable argument to be made that a user who has been given UPDATE privileges on the entire table contents is more trusted than one who has privileges only on certain columns. The first user can presumably trash the entire table contents if he so desires; the second one can't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] SSI using rw-conflict lists
Well, it's been a productive holiday weekend. I've completed the switch of the SSI implementation from one conflict pointer in and one conflict pointer out per transaction to a list of conflicts between transactions. This eliminated all false positives in my dtester suite. The only test which still had any was the one based on the oft-cited receipting example, which had 210 permutations of how the statements could be interleaved, of which six actually created an anomaly; but about half the others were false positives because of the limits of the conflict pointers. For those following along, the initial (2008 ACM SIGMOD) paper on SSI used booleans to flag conflicts. Cahill's 2009 doctoral work improved the false positive rate by switching these to pointers. The obvious next step for anyone competent in the field was to go from simple pointers to lists. That was in my mind as a useful optimization from the beginning, but was brought to a head by Heikki and Jeff with their concerns about memory management -- I couldn't see a good way to fix that without first implementing the lists. Along the way I also implemented more aggressive clean-up of memory resources, including immediate complete clean-up of a transaction which is rolled back. The patch is now also poised to use Andres Freund's IDLE IN TRANSACTION cancellation code in such a way that SSI can guarantee that even the most pessimal load will not thrash to the point of no progress -- when this is done, some transaction which wrote data must successfully commit before transactions concurrent to it can be cancelled. These nice features required the conflict list. I need some advice, though. I had somehow had it in my head that fields not declared static had a shared copy among all PostgreSQL backends. When things didn't behave properly, I discovered the error of my ways, and moved them to a shared memory structure used by the SSI code, just to get things working. 
It looks kinda ugly at the moment, and I'm not sure what would be considered best practices to clean it up. (1) Should these be moved to ShmemVariableCache, or is it OK to leave them in this structure as long as I comment it adequately? (2) Would it be a good idea to create macros for referencing these fields? The current references are long and distracting to read, and would all need to change if the fields are moved to a different shared memory structure. The relevant commit diff is here: http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=bc3aba45e192afcd7776c68e28438991a3406db6 There's another issue involving clarity of code, versus correct behavior. When I got to the last few false positives, I found that they could only be eliminated by tracking one more datum for a transaction committed with a conflict out to another transaction. There was a field which was unused in the particular situations where we needed this new datum, so I made it work with the dual use of the existing field. (3) What would be the cleanest way to conditionally store values representing different things in one field of a structure, so that someone reading the code understands what's happening? The commit diff for this is here: http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=9425d7fa551a1433bdbec1de5cfb1bfa9f43da22 As always, any other tips or criticisms (or encouragement!) are welcome. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rethinking representation of sort/hash semantics in queries and plans
I wrote: (For some extra amusement, trace through where build_index_pathkeys' data comes from...) While I don't propose to implement right away the whole SortGroupClause and plan tree modification sketched above, I did look into fixing build_index_pathkeys so that it doesn't uselessly convert from opfamilies to sort operators and back again. The main reason this is relevant at the moment is that we could get rid of relcache.c's caching of operators related to indexes, which seems possibly useful in connection with the current discussion of backend startup time. What becomes apparent when you look closely at what that code is doing is that it's catering to the possibility of an amcanorder index access method that isn't btree. As things stand in HEAD, an add-on index access method would be recognized as supporting ordering so long as it registers the regular comparison operators (<, >, etc) with the same index strategy numbers as btree. The reason that it would work is that those operators would be found as the fwdsortop/revsortop entries for the index, and then looking up those operator OIDs in btree opfamilies would locate the corresponding btree opfamily OIDs, which is what you have to have to match to a pathkey's opfamily. In the attached draft patch that would no longer work, because the new access method would have to have the exact same opfamily OIDs as btree in order to match to btree-derived pathkeys --- and of course it can't have that, since access method is a property of an opfamily. Now, this loss of flexibility doesn't particularly bother me, because I know of no existing or contemplated btree-substitute access methods. If one did appear on the horizon, there are a couple of ways we could fix the problem, the cleanest being to let a non-btree opfamily declare that it sorts the same as btree opfamily so-and-so. 
Or we could fix it locally in plancat.c by performing the lookup-the-operators-and-then-the-btree-opfamilies dance on the fly when setting up IndexOptInfo for a non-btree amcanorder index. But I'm disinclined to write such code when there's no way to test it and no foreseeable possibility that it'll ever be used. Maybe we should just make plancat.c throw a not-implemented error if amcanorder is true but it's not btree. Thoughts? Anyone particularly opposed to pursuing an optimization of this kind at all? regards, tom lane

PS: the attached patch doesn't yet include removal of relcache rd_operator arrays, since that would just make the patch bigger without exposing any new issues.

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index d8fc12068fb6113c0baf8aa4e5941e0150ce3521..f73e0e6dc6007ce768828480f74621752aaf57d2 100644
*** a/src/backend/optimizer/path/indxpath.c
--- b/src/backend/optimizer/path/indxpath.c
*************** find_usable_indexes(PlannerInfo *root, R
*** 380,386 ****
  		 * how many of them are actually useful for this query.  This is not
  		 * relevant unless we are at top level.
  		 */
! 		index_is_ordered = OidIsValid(index->fwdsortop[0]);
  		if (index_is_ordered && possibly_useful_pathkeys &&
  			istoplevel && outer_rel == NULL)
  		{
--- 380,386 ----
  		 * how many of them are actually useful for this query.  This is not
  		 * relevant unless we are at top level.
  		 */
! 		index_is_ordered = (index->sortopfamily != NULL);
  		if (index_is_ordered && possibly_useful_pathkeys &&
  			istoplevel && outer_rel == NULL)
  		{
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 8af0c6dc482372244b15a8a5f5a4a562e34ac91b..6325655eed84759168a51aed3f712582ec9c1f00 100644
*** a/src/backend/optimizer/path/pathkeys.c
--- b/src/backend/optimizer/path/pathkeys.c
*************** static PathKey *make_canonical_pathkey(P
*** 36,47 ****
  					   EquivalenceClass *eclass, Oid opfamily,
  					   int strategy, bool nulls_first);
  static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
- static PathKey *make_pathkey_from_sortinfo(PlannerInfo *root,
- 						   Expr *expr, Oid ordering_op,
- 						   bool nulls_first,
- 						   Index sortref,
- 						   bool create_it,
- 						   bool canonicalize);
  static Var *find_indexkey_var(PlannerInfo *root, RelOptInfo *rel,
  				  AttrNumber varattno);
  static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);
--- 36,41 ----
*************** canonicalize_pathkeys(PlannerInfo *root,
*** 224,232 ****
  /*
   * make_pathkey_from_sortinfo
!  *	  Given an expression, a sortop, and a nulls-first flag, create
!  *	  a PathKey.  If canonicalize = true, the result is a canonical
!  *	  PathKey, otherwise not.  (But note it might be redundant anyway.)
   *
   * If the PathKey is being generated from a SortGroupClause, sortref should be
   * the SortGroupClause's SortGroupRef; otherwise zero.
--- 218,226 ----
  /*
   * make_pathkey_from_sortinfo
!  *	  Given an expression and sort-order information, create
Re: [HACKERS] Report: Linux huge pages with Postgres
On Sat, Nov 27, 2010 at 02:27:12PM -0500, Tom Lane wrote: We've gotten a few inquiries about whether Postgres can use huge pages under Linux. In principle that should be more efficient for large shmem regions, since fewer TLB entries are needed to support the address space. I spent a bit of time today looking into what that would take. My testing was done with current Fedora 13, kernel version 2.6.34.7-61.fc13.x86_64 --- it's possible some of these details vary across other kernel versions. You can test this with fairly minimal code changes, as illustrated in the attached not-production-grade patch. To select huge pages we have to include SHM_HUGETLB in the flags for shmget(), and we have to be prepared for failure (due to permissions or lack of allocated hugepages). I made the code just fall back to a normal shmget on failure. A bigger problem is that the shmem request size must be a multiple of the system's hugepage size, which is *not* a constant even though the test patch just uses 2MB as the assumed value. For a production-grade patch we'd have to scrounge the active value out of someplace in the /proc filesystem (ick). I would expect that you can just iterate through the size possibilities pretty quickly and just use the first one that works -- no /proc groveling. In addition to the code changes there are a couple of sysadmin requirements to make huge pages available to Postgres: 1. You have to configure the Postgres user as a member of the group that's permitted to allocate hugepage shared memory. I did this: sudo sh -c "id -g postgres >/proc/sys/vm/hugetlb_shm_group" For production use you'd need to put this in the PG initscript, probably, to ensure it gets re-set after every reboot and before PG is started. Since it would take advantage of them automatically, this would be just a normal DBA/admin task. 2. You have to manually allocate some huge pages --- there doesn't seem to be any setting that says just give them out on demand. 
I did this: sudo sh -c "echo 600 >/proc/sys/vm/nr_hugepages" which gave me a bit over 1GB of space reserved as huge pages. Again, this'd have to be done over again at each system boot. Same. For testing purposes, I figured that what I wanted to stress was postgres process swapping and shmem access. I built current git HEAD with --enable-debug and no other options, and tested with these non-default settings: shared_buffers = 1GB, checkpoint_segments = 50, fsync = off (fsync intentionally off since I'm not trying to measure disk speed). The test machine has two dual-core Nehalem CPUs. Test case is pgbench at -s 25; I ran several iterations of pgbench -c 10 -T 60 bench in each configuration. And the bottom line is: if there's any performance benefit at all, it's on the order of 1%. The best result I got was about 3200 TPS with hugepages, and about 3160 without. The noise in these numbers is more than 1% though. This is discouraging; it certainly doesn't make me want to expend the effort to develop a production patch. However, perhaps someone else can try to show a greater benefit under some other test conditions. regards, tom lane I would not really expect to see much benefit in the region that the normal TLB page size would cover with the typical number of TLB entries. 1GB of shared buffers would not be enough to cause TLB thrashing with most processors. Bump it to 8-32GB or more and if the queries use up TLB entries with local work_mem you should see some more value in the patch. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] contrib: auth_delay module
On Sun, Nov 28, 2010 at 5:38 AM, Robert Haas robertmh...@gmail.com wrote: On Sat, Nov 27, 2010 at 2:44 PM, Jeff Janes jeff.ja...@gmail.com wrote: I haven't thought of a way to test this, so I guess I'll just ask. If the attacking client just waits a few milliseconds for a response and then drops the socket, opening a new one, will the server-side walking-dead process continue to be charged against max_connections until its sleep expires? I'm not sure, either. I suspect the answer is yes. I guess you could test this by writing a loop like this: while true; do psql connection parameters that will fail authentication; done ...and then hitting ^C every few seconds during execution. After doing that for a bit, run select * from pg_stat_activity or ps auxww | grep postgres in another window. Right, I didn't think of using psql, I thought I'd have to wrangle my own socket code. I wrote up a perl script that spawns psql and immediately kills it. I quickly start getting psql: FATAL: sorry, too many clients already errors. And that condition doesn't clear until the sleep expires on the earliest ones spawned. So it looks like the max_connections is charged until the auth_delay expires. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
On Sun, Nov 28, 2010 at 3:53 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: The more general issue here is what to do about our high backend startup costs. Beyond trying to recycle backends for new connections, as I've previously proposed and with all the problems it entails, the only thing that looks promising here is to try to somehow cut down on the cost of populating the catcache and relcache, not that I have a very clear idea how to do that. One comment to make here is that it would be a serious error to focus on the costs of just starting and stopping a backend; you have to think about cases where the backend does at least some useful work in between, and that means actually *populating* those caches (to some extent) not just initializing them. Maybe your wording above was chosen with that in mind, but I think onlookers might easily overlook the point. I did have that in mind, but I agree the point is worth mentioning. So, for example, it wouldn't gain anything meaningful for us to postpone catcache initialization until someone executes a query. It would improve the synthetic benchmark, but that's it. FWIW, today I've been looking into getting rid of the silliness in build_index_pathkeys whereby it reconstructs pathkey opfamily OIDs from sortops instead of just using the index opfamilies directly. It turns out that once you fix that, there is no need at all for relcache to cache per-index operator data (the rd_operator arrays) because that's the only code that uses 'em. I don't see any particular change in the runtime of the regression tests from ripping out that part of the cached data, but it ought to have at least some beneficial effect on real startup time. Wow, that's great. The fact that it simplifies the code is probably the main point, but obviously whatever cycles we can save during startup (and ongoing operation) are all to the good. 
One possible way to get a real speedup here would be to look for ways to trim the number of catcaches. But I'm not too convinced there's much water to squeeze out of that rock. After our recent conversation about KNNGIST, it occurred to me to wonder whether there's really any point in pretending that a user can usefully add an AM, both due to hard-wired planner knowledge and due to lack of any sort of extensible XLOG support. If not, we could potentially turn pg_am into a hardcoded lookup table rather than a modifiable catalog, which would also likely be faster; and perhaps reference AMs elsewhere with characters rather than OIDs. But even if this were judged a sensible thing to do I'm not very sure that even a purpose-built synthetic benchmark would be able to measure the speedup. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
Robert Haas robertmh...@gmail.com writes: After our recent conversation about KNNGIST, it occurred to me to wonder whether there's really any point in pretending that a user can usefully add an AM, both due to hard-wired planner knowledge and due to lack of any sort of extensible XLOG support. If not, we could potentially turn pg_am into a hardcoded lookup table rather than a modifiable catalog, which would also likely be faster; and perhaps reference AMs elsewhere with characters rather than OIDs. But even if this were judged a sensible thing to do I'm not very sure that even a purpose-built synthetic benchmark would be able to measure the speedup. Well, the lack of extensible XLOG support is definitely a big handicap to building a *production* index AM as an add-on. But it's not such a handicap for development. And I don't believe that the planner is hardwired in any way that doesn't allow new index types. GIST and GIN have both been added successfully without kluging the planner. It does know a lot more about btree than other index types, but that doesn't mean you can't add a new index type that doesn't behave like btree; that's more reflective of where development effort has been spent. So I would consider the above idea a step backwards, and I doubt it would save anything meaningful anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] column-level update privs + lock table
(2010/11/27 9:11), Robert Haas wrote: 2010/11/25 KaiGai Kohei kai...@ak.jp.nec.com: (2010/10/16 4:49), Josh Kupershmidt wrote: [Moving to -hackers] On Fri, Oct 15, 2010 at 3:43 AM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2010-10-11 at 09:41 -0400, Josh Kupershmidt wrote: On Thu, Oct 7, 2010 at 7:43 PM, Josh Kupershmidt schmi...@gmail.com wrote: I noticed that granting a user column-level update privileges doesn't allow that user to issue LOCK TABLE with any mode other than Access Share. Anyone think this could be added as a TODO? Seems so to me, but you raise on Hackers. Thanks, Simon. Attached is a simple patch to let column-level UPDATE privileges allow a user to LOCK TABLE in a mode higher than Access Share. Small doc. update and regression test update are included as well. Feedback is welcome. I checked your patch, then I'd like to mark it as ready for committer. The point of this patch is trying to solve an incompatible behavior between SELECT ... FOR SHARE/UPDATE and LOCK command. On ExecCheckRTEPerms(), it allows the required accesses when no columns are explicitly specified in the query and the current user has necessary privilege on one of columns within the target relation. If we stand on the perspective that LOCK command should take same privileges with the case when we use SELECT ... FOR SHARE/UPDATE without specifying explicit columns, like COUNT(*), the existing LOCK command seems to me odd. I think this patch fixes the behavior as we expected. I'm not totally convinced that this is the correct behavior. It seems a bit surprising that UPDATE privilege on a single column is enough to lock out all SELECT activity from the table. It's actually a bit surprising that even full-table UPDATE privileges are enough to do this, but this change would allow people to block access to data they can neither see nor modify. That seems counterintuitive, if not a security hole. 
In my understanding, UPDATE privilege on a single column also allows locking out concurrent updates, even if the query only updates rows partially. Therefore, the current code considers UPDATE privilege on a single column enough to lock out the table. Right? My comment was from a standpoint that wants consistent behavior between SELECT ... FOR SHARE/UPDATE and the LOCK command. If we are concerned about this behavior, ExecCheckRTEPerms() might be a place we should also fix. Thanks, -- KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] contrib: auth_delay module
On Sun, Nov 28, 2010 at 5:41 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Sun, Nov 28, 2010 at 5:38 AM, Robert Haas robertmh...@gmail.com wrote: On Sat, Nov 27, 2010 at 2:44 PM, Jeff Janes jeff.ja...@gmail.com wrote: I haven't thought of a way to test this, so I guess I'll just ask. If the attacking client just waits a few milliseconds for a response and then drops the socket, opening a new one, will the server-side walking-dead process continue to be charged against max_connections until its sleep expires? I'm not sure, either. I suspect the answer is yes. I guess you could test this by writing a loop like this: while true; do psql connection parameters that will fail authentication; done ...and then hitting ^C every few seconds during execution. After doing that for a bit, run select * from pg_stat_activity or ps auxww | grep postgres in another window. Right, I didn't think of using psql, I thought I'd have to wrangle my own socket code. I wrote up a perl script that spawns psql and immediately kills it. I quickly start getting psql: FATAL: sorry, too many clients already errors. And that condition doesn't clear until the sleep expires on the earliest ones spawned. So it looks like the max_connections is charged until the auth_delay expires. Yeah. Avoiding that would be hard, and it's not clear that there's any demand. The demand for doing this much seems a bit marginal too, but there were several people who seemed to think it worth committing, so I did. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
On Sun, Nov 28, 2010 at 6:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: After our recent conversation about KNNGIST, it occurred to me to wonder whether there's really any point in pretending that a user can usefully add an AM, both due to hard-wired planner knowledge and due to lack of any sort of extensible XLOG support. If not, we could potentially turn pg_am into a hardcoded lookup table rather than a modifiable catalog, which would also likely be faster; and perhaps reference AMs elsewhere with characters rather than OIDs. But even if this were judged a sensible thing to do I'm not very sure that even a purpose-built synthetic benchmark would be able to measure the speedup. Well, the lack of extensible XLOG support is definitely a big handicap to building a *production* index AM as an add-on. But it's not such a handicap for development. Realistically, it's hard for me to imagine that anyone would go to the trouble of building it as a loadable module first and then converting it to part of core later on. That'd hardly be less work. And I don't believe that the planner is hardwired in any way that doesn't allow new index types. GIST and GIN have both been added successfully without kluging the planner. We have 9 boolean flags to indicate the capabilities (or lack thereof) of AMs, and we only have 4 AMs. It seems altogether plausible to assume that the next AM we add could require flags 10 and 11. Heck, I think KNNGIST is going to require another flag... which will likely never be set for any AM other than GIST. It does know a lot more about btree than other index types, but that doesn't mean you can't add a new index type that doesn't behave like btree; that's more reflective of where development effort has been spent. So I would consider the above idea a step backwards, and I doubt it would save anything meaningful anyway. That latter point, as far as I'm concerned, is the real nail in the coffin. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] contrib: auth_delay module
On Sun, Nov 28, 2010 at 3:57 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Nov 28, 2010 at 5:41 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Sun, Nov 28, 2010 at 5:38 AM, Robert Haas robertmh...@gmail.com wrote: On Sat, Nov 27, 2010 at 2:44 PM, Jeff Janes jeff.ja...@gmail.com wrote: I haven't thought of a way to test this, so I guess I'll just ask. If the attacking client just waits a few milliseconds for a response and then drops the socket, opening a new one, will the server-side walking-dead process continue to be charged against max_connections until its sleep expires? I'm not sure, either. I suspect the answer is yes. I guess you could test this by writing a loop like this: while true; do psql connection parameters that will fail authentication; done ...and then hitting ^C every few seconds during execution. After doing that for a bit, run select * from pg_stat_activity or ps auxww | grep postgres in another window. Right, I didn't think of using psql, I thought I'd have to wrangle my own socket code. I wrote up a perl script that spawns psql and immediately kills it. I quickly start getting psql: FATAL: sorry, too many clients already errors. And that condition doesn't clear until the sleep expires on the earliest ones spawned. So it looks like the max_connections is charged until the auth_delay expires. Yeah. Avoiding that would be hard, and it's not clear that there's any demand. The demand for doing this much seems a bit marginal too, but there were several people who seemed to think it worth committing, so I did. Oh, I wasn't complaining. I think that having max_connections be charged for the duration even if the socket is dropped is the only reasonable thing to do, and wanted to verify that it did happen. Otherwise the module wouldn't do a very good job at its purpose, the attacker would simply wait a few milliseconds and then assume it got the wrong password and kill the connection and start a new one. 
Preventing the brute force password attack by shunting it into a DOS attack instead seems reasonable. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] contrib: auth_delay module
On Sun, Nov 28, 2010 at 7:10 PM, Jeff Janes jeff.ja...@gmail.com wrote: Oh, I wasn't complaining. I think that having max_connections be charged for the duration even if the socket is dropped is the only reasonable thing to do, and wanted to verify that it did happen. Otherwise the module wouldn't do a very good job at its purpose, the attacker would simply wait a few milliseconds and then assume it got the wrong password and kill the connection and start new one. Good point. Preventing the brute force password attack by shunting it into a DOS attack instead seems reasonable. OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Report: Linux huge pages with Postgres
Kenneth Marshall k...@rice.edu writes: On Sat, Nov 27, 2010 at 02:27:12PM -0500, Tom Lane wrote: ... A bigger problem is that the shmem request size must be a multiple of the system's hugepage size, which is *not* a constant even though the test patch just uses 2MB as the assumed value. For a production-grade patch we'd have to scrounge the active value out of someplace in the /proc filesystem (ick). I would expect that you can just iterate through the size possibilities pretty quickly and just use the first one that works -- no /proc groveling. It's not really that easy, because (at least on the kernel version I tested) it's not the shmget that fails, it's the later shmat. Releasing and reacquiring the shm segment would require significant code restructuring, and at least on some platforms could produce weird failure cases --- I seem to recall having heard of kernels where the release isn't instantaneous, so that you could run up against SHMMAX for no apparent reason. Really you do want to scrape the value. 2. You have to manually allocate some huge pages --- there doesn't seem to be any setting that says just give them out on demand. I did this: sudo sh -c "echo 600 > /proc/sys/vm/nr_hugepages" which gave me a bit over 1GB of space reserved as huge pages. Again, this'd have to be done over again at each system boot. Same. The fact that hugepages have to be manually managed, and that any unaccounted-for pages represent completely wasted RAM, seems like a pretty large PITA to me. I don't see anybody buying into that for gains measured in single-digit percentages. 1GB of shared buffers would not be enough to cause TLB thrashing with most processors. Well, bigger cases would be useful to try, although Simon was claiming that the TLB starts to fall over at 4MB of working set. I don't have a large enough machine to try the sort of test you're suggesting, so if anyone thinks this is worth pursuing, there's the patch ... go test it.
regards, tom lane
Re: [HACKERS] profiling connection overhead
Robert Haas robertmh...@gmail.com writes: One possible way to get a real speedup here would be to look for ways to trim the number of catcaches. BTW, it's not going to help to remove catcaches that have a small initial size, as the pg_am cache certainly does. If the bucket zeroing cost is really something to minimize, it's only the caches with the largest nbuckets counts that are worth considering --- and we certainly can't remove those without penalty. regards, tom lane
Re: [HACKERS] [PATCH] Return command tag 'REPLACE X' for CREATE OR REPLACE statements.
On Sun, Nov 28, 2010 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: Marti Raudsepp ma...@juffo.org writes: This patch returns command tag CREATE X or REPLACE X for LANGUAGE/VIEW/RULE/FUNCTION. This is done by passing completionTag from ProcessUtility to more functions, and adding a 'bool *didUpdate' argument to some lower-level functions. I'm not sure if passing back the status in a bool* is considered good style, but this way all the functions look consistent. This is going to break clients that expect commands to return the same command tag as they have in the past. I doubt that whatever usefulness is gained will outweigh the compatibility problems. You complained about this when we changed the SELECT tag for 9.0 to include row-counts for CTAS etc. where it hadn't before. Have we gotten any complaints about that change breaking clients? I think more expressive command tags are in general a good thing. The idea that this particular change would be useful primarily for humans examining the psql output seems a bit weak to me, but I can easily see it being useful for programs. Right now a program has no reason to look at this command tag anyway; it'll always be the same. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] profiling connection overhead
BTW, this might be premature to mention pending some tests about mapping versus zeroing overhead, but it strikes me that there's more than one way to skin a cat. I still think the idea of statically allocated space sucks. But what if we rearranged things so that palloc0 doesn't consist of palloc-then-memset, but rather pushes the zeroing responsibility down into the allocator? In particular, I'm imagining that palloc0 with a sufficiently large space request --- more than a couple pages --- could somehow arrange to get space that's guaranteed zero already. And if the request isn't large, zeroing it isn't where our problem is anyhow. The most portable way to do that would be to use calloc instead of malloc, and hope that libc is smart enough to provide freshly-mapped space. It would be good to look and see whether glibc actually does so, of course. If not we might end up having to mess with sbrk for ourselves, and I'm not sure how pleasantly that interacts with malloc. Another question that would be worth asking here is whether the hand-baked MemSet macro still outruns memset on modern architectures. I think it's been quite a few years since that was last tested. regards, tom lane
Re: [HACKERS] [PATCH] Return command tag 'REPLACE X' for CREATE OR REPLACE statements.
Robert Haas robertmh...@gmail.com writes: I think more expressive command tags are in general a good thing. The idea that this particular change would be useful primarily for humans examining the psql output seems a bit weak to me, but I can easily see it being useful for programs. Right now a program has no reason to look at this command tag anyway; it'll always be the same. Hmm ... that's a good point, although I'm not sure that it's 100% true. regards, tom lane
Re: [HACKERS] Rethinking representation of sort/hash semantics in queries and plans
I wrote: Now, this loss of flexibility doesn't particularly bother me, because I know of no existing or contemplated btree-substitute access methods. If one did appear on the horizon, there are a couple of ways we could fix the problem, the cleanest being to let a non-btree opfamily declare that it sorts the same as btree opfamily so-and-so. Or we could fix it locally in plancat.c by performing the lookup-the-operators-and- then-the-btree-opfamilies dance on the fly when setting up IndexOptInfo for a non-btree amcanorder index. But I'm disinclined to write such code when there's no way to test it and no foreseeable possibility that it'll ever be used. Maybe we should just make plancat.c throw a not-implemented error if amcanorder is true but it's not btree. On further reflection the last seems like clearly the thing to do. PS: the attached patch doesn't yet include removal of relcache rd_operator arrays, since that would just make the patch bigger without exposing any new issues. And here's the complete patch. regards, tom lane diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index d8fc12068fb6113c0baf8aa4e5941e0150ce3521..f73e0e6dc6007ce768828480f74621752aaf57d2 100644 *** a/src/backend/optimizer/path/indxpath.c --- b/src/backend/optimizer/path/indxpath.c *** find_usable_indexes(PlannerInfo *root, R *** 380,386 * how many of them are actually useful for this query. This is not * relevant unless we are at top level. */ ! index_is_ordered = OidIsValid(index-fwdsortop[0]); if (index_is_ordered possibly_useful_pathkeys istoplevel outer_rel == NULL) { --- 380,386 * how many of them are actually useful for this query. This is not * relevant unless we are at top level. */ ! 
index_is_ordered = (index-sortopfamily != NULL); if (index_is_ordered possibly_useful_pathkeys istoplevel outer_rel == NULL) { diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c index 8af0c6dc482372244b15a8a5f5a4a562e34ac91b..6325655eed84759168a51aed3f712582ec9c1f00 100644 *** a/src/backend/optimizer/path/pathkeys.c --- b/src/backend/optimizer/path/pathkeys.c *** static PathKey *make_canonical_pathkey(P *** 36,47 EquivalenceClass *eclass, Oid opfamily, int strategy, bool nulls_first); static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys); - static PathKey *make_pathkey_from_sortinfo(PlannerInfo *root, - Expr *expr, Oid ordering_op, - bool nulls_first, - Index sortref, - bool create_it, - bool canonicalize); static Var *find_indexkey_var(PlannerInfo *root, RelOptInfo *rel, AttrNumber varattno); static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey); --- 36,41 *** canonicalize_pathkeys(PlannerInfo *root, *** 224,232 /* * make_pathkey_from_sortinfo ! * Given an expression, a sortop, and a nulls-first flag, create ! * a PathKey. If canonicalize = true, the result is a canonical ! * PathKey, otherwise not. (But note it might be redundant anyway.) * * If the PathKey is being generated from a SortGroupClause, sortref should be * the SortGroupClause's SortGroupRef; otherwise zero. --- 218,226 /* * make_pathkey_from_sortinfo ! * Given an expression and sort-order information, create a PathKey. ! * If canonicalize = true, the result is a canonical PathKey, ! * otherwise not. (But note it might be redundant anyway.) * * If the PathKey is being generated from a SortGroupClause, sortref should be * the SortGroupClause's SortGroupRef; otherwise zero. *** canonicalize_pathkeys(PlannerInfo *root, *** 240,285 */ static PathKey * make_pathkey_from_sortinfo(PlannerInfo *root, ! 
Expr *expr, Oid ordering_op, bool nulls_first, Index sortref, bool create_it, bool canonicalize) { - Oid opfamily, - opcintype; int16 strategy; Oid equality_op; List *opfamilies; EquivalenceClass *eclass; /* ! * An ordering operator fully determines the behavior of its opfamily, so ! * could only meaningfully appear in one family --- or perhaps two if one ! * builds a reverse-sort opfamily, but there's not much point in that ! * anymore. But EquivalenceClasses need to contain opfamily lists based ! * on the family membership of equality operators, which could easily be ! * bigger. So, look up the equality operator that goes with the ordering ! * operator (this should be unique) and get its membership. */ - - /* Find the operator in pg_amop --- failure shouldn't happen */ - if (!get_ordering_op_properties(ordering_op, - opfamily, opcintype, strategy)) - elog(ERROR, operator %u is not a valid ordering operator, -
Re: [HACKERS] Report: Linux huge pages with Postgres
On Mon, Nov 29, 2010 at 12:12 AM, Tom Lane t...@sss.pgh.pa.us wrote: I would expect that you can just iterate through the size possibilities pretty quickly and just use the first one that works -- no /proc groveling. It's not really that easy, because (at least on the kernel version I tested) it's not the shmget that fails, it's the later shmat. Releasing and reacquiring the shm segment would require significant code restructuring, and at least on some platforms could produce weird failure cases --- I seem to recall having heard of kernels where the release isn't instantaneous, so that you could run up against SHMMAX for no apparent reason. Really you do want to scrape the value. Couldn't we just round the shared memory allocation down to a multiple of 4MB? That would handle all older architectures where the size is 2MB or 4MB. I see online that IA64 supports larger page sizes up to 256MB, but then could we make it the user's problem if they change their hugepagesize to a larger value to pick a value of shared_buffers that will fit cleanly? We might need to rejigger things so that the shared memory segment is exactly the size of shared_buffers and any other shared data structures are in a separate segment though for that to work. -- greg
[HACKERS] Feature request: INSERT .... ON DUPLICATE UPDATE ....
Dear all, In our company we use both PostgreSQL and MySQL. Our developers, myself included, think that the INSERT ... ON DUPLICATE UPDATE clause is a much more user-friendly feature, so would you please add a similar feature to PostgreSQL? I know we can write a PROCEDURE or RULE to solve this problem, but compared to writing a PROCEDURE or RULE, it is apparently easier to use an INSERT ... ON DUPLICATE clause. Please consider our request to make PostgreSQL more powerful. Thanks all for the great work on PostgreSQL. Daojing.Zhou 2010/11/29
Re: [HACKERS] contrib: auth_delay module
On Sat, Nov 27, 2010 at 9:25 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Nov 25, 2010 at 1:18 AM, KaiGai Kohei kai...@ak.jp.nec.com wrote: The attached patch is revised version. - Logging part within auth_delay was removed. This module now focuses on injection of a few seconds delay on authentication failed. - Documentation parts were added like any other contrib modules. Committed, with a few adjustments. Per Fujii Masao's suggestion, I changed sleep() to pg_usleep(); I also changed the GUC to be reckoned in milliseconds rather than seconds. Thanks. I found the typo: - diff --git a/contrib/README b/contrib/README index 9e223ef..8a12cc1 100644 --- a/contrib/README +++ b/contrib/README @@ -30,7 +30,7 @@ adminpack - auth_delay Add a short delay after a failed authentication attempt, to make -make brute-force attacks on database passwords a bit harder. + brute-force attacks on database passwords a bit harder. by KaiGai Kohei kai...@ak.jp.nec.com auto_explain - - Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Report: Linux huge pages with Postgres
Greg Stark gsst...@mit.edu writes: On Mon, Nov 29, 2010 at 12:12 AM, Tom Lane t...@sss.pgh.pa.us wrote: Really you do want to scrape the value. Couldn't we just round the shared memory allocation down to a multiple of 4MB? That would handle all older architectures where the size is 2MB or 4MB. Rounding *down* will not work, at least not without extremely invasive changes to the shmem allocation code. Rounding up is okay, as long as you don't mind some possibly-wasted space. I see online that IA64 supports larger page sizes up to 256MB but then could we make it the user's problem if they change their hugepagesize to a larger value to pick a value of shared_buffers that will fit cleanly? We might need to rejigger things so that the shared memory segment is exactly the size of shared_buffers and any other shared data structures are in a separate segment though for that to work. Two shmem segments would be a pretty serious PITA too, certainly a lot more so than a few lines to read a magic number from /proc. But this is all premature pending a demonstration that there's enough potential gain here to be worth taking any trouble at all. The one set of numbers we have says otherwise. regards, tom lane
Re: [HACKERS] profiling connection overhead
On Mon, Nov 29, 2010 at 12:33 AM, Tom Lane t...@sss.pgh.pa.us wrote: The most portable way to do that would be to use calloc instead of malloc, and hope that libc is smart enough to provide freshly-mapped space. It would be good to look and see whether glibc actually does so, of course. If not we might end up having to mess with sbrk for ourselves, and I'm not sure how pleasantly that interacts with malloc. It's *supposed* to interact fine. The only thing I wonder is that I think malloc intentionally uses mmap for larger allocations but I'm not clear what the advantages are. Is it because it's a cheaper way to get zeroed bytes? Or just so that free has a hope of returning the allocations to the OS? Another question that would be worth asking here is whether the hand-baked MemSet macro still outruns memset on modern architectures. I think it's been quite a few years since that was last tested. I know glibc has some sexy memset macros for cases where the size is a constant. I'm not sure there's been much of an advance in the general case though. This would tend to imply we should consider going the other direction of having the caller of palloc0 do the zeroing instead. Or making palloc0 a macro which expands to include calling memset with the parameter inlined. -- greg
Re: [HACKERS] profiling connection overhead
Greg Stark gsst...@mit.edu writes: On Mon, Nov 29, 2010 at 12:33 AM, Tom Lane t...@sss.pgh.pa.us wrote: Another question that would be worth asking here is whether the hand-baked MemSet macro still outruns memset on modern architectures. I think it's been quite a few years since that was last tested. I know glibc has some sexy memset macros for cases where the size is a constant. I'm not sure there's been much of an advance in the general case though. This would tend to imply we should consider going the other direction of having the caller of palloc0 do the zeroing instead. Or making palloc0 a macro which expands to include calling memset with the parameter inlined. Well, that was exactly the reason why we did it the way we do it. However, I think it's probably only node allocations where the size is likely to be constant and hence result in a win. Perhaps we should implement makeNode() differently from the general case. regards, tom lane
Re: [HACKERS] Patch to add a primary key using an existing index
On Fri, Nov 26, 2010 at 05:58, Steve Singer ssin...@ca.afilias.info wrote: The attached version of the patch gets your regression tests to pass. I'm going to mark this as ready for a committer. I think we need more discussions about the syntax: ALTER TABLE table_name ADD PRIMARY KEY (...) WITH (INDEX='index_name') Issues: * WITH (...) is designed for storage parameters. I think treating INDEX as a special keyword in this way might be confusing. * 'index_name' needs to be single-quoted, but object identifiers should be double-quoted literals in normal cases. * The key specifier is a duplicated option because the index has its own keys. Do we need it? It might be for safety, but redundant. Note that the patch raises a reasonable error on conflict: ERROR: PRIMARY KEY/UNIQUE constraint definition does not match the index And, I found a bug: * USING INDEX TABLESPACE clause is silently ignored, even if the index uses another tablespace. After all, do we need a special syntax for the functionality? Reusing the WITH (...) syntax seems to be trouble for me. ADD PRIMARY KEY USING index_name might be a candidate, but we'd better reserve USING for non-btree PRIMARY KEY/UNIQUE indexes. Ideas and suggestions? -- Itagaki Takahiro
Re: [HACKERS] Assertion failure on hot standby
On Fri, 2010-11-26 at 01:11 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: That would mean running GetCurrentTransactionId() inside LockAcquire() if (lockmode >= AccessExclusiveLock && locktag->locktag_type == LOCKTAG_RELATION && !RecoveryInProgress()) (void) GetCurrentTransactionId(); Any objections to that fix? Could we have a wal level test in there too please? It's pretty awful in any case... Slightly neater version of same idea applied to resolve this. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] profiling connection overhead
On Sun, Nov 28, 2010 at 7:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: One possible way to get a real speedup here would be to look for ways to trim the number of catcaches. BTW, it's not going to help to remove catcaches that have a small initial size, as the pg_am cache certainly does. If the bucket zeroing cost is really something to minimize, it's only the caches with the largest nbuckets counts that are worth considering --- and we certainly can't remove those without penalty. Yeah, very true. What's a bit frustrating about the whole thing is that we spend a lot of time pulling data into the caches that's basically static and never likely to change anywhere, ever. I bet the number of people for whom (int4, int4) has any non-standard properties is somewhere between slim and none; and it might well be the case that formrdesc() is faster than reading the relcache init file, if we didn't need to worry about deviation from canonical. This is even more frustrating in the hypothetical situation where a backend can switch databases, because we have to blow away all of these cache entries that are 99.9% likely to be basically identical in the old and new databases. The relation descriptors for pg_class and pg_attribute are examples of things it would be nice to hardwire and never need to update. We are really pretty much screwed if there is any meaningful deviation from what is expected, but relpages, reltuples, and relfrozenxid - and maybe relacl or reloptions - can legitimately vary between databases. Maybe we could speed things up a bit if we got rid of the pg_attribute entries for the system attributes (except OID). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Patch to add a primary key using an existing index
On Sun, Nov 28, 2010 at 8:06 PM, Itagaki Takahiro itagaki.takah...@gmail.com wrote: On Fri, Nov 26, 2010 at 05:58, Steve Singer ssin...@ca.afilias.info wrote: The attached version of the patch gets your regression tests to pass. I'm going to mark this as ready for a committer. I think we need more discussions about the syntax: ALTER TABLE table_name ADD PRIMARY KEY (...) WITH (INDEX='index_name') Why not: ALTER TABLE table_name ADD PRIMARY KEY (...) INDEX index_name; -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [GENERAL] column-level update privs + lock table
2010/11/28 KaiGai Kohei kai...@ak.jp.nec.com: I'm not totally convinced that this is the correct behavior. It seems a bit surprising that UPDATE privilege on a single column is enough to lock out all SELECT activity from the table. It's actually a bit surprising that even full-table UPDATE privileges are enough to do this, but this change would allow people to block access to data they can neither see nor modify. That seems counterintuitive, if not a security hole. In my understanding, UPDATE privilege on a single column also allows lock out concurrent updating even if this query tries to update rows partially. Therefore, the current code considers UPDATE privilege on a single column is enough to lock out the table. Right? Against concurrent updates and deletes, yes. Against inserts that don't involve potential unique-index conflicts, and against selects, no. My comment was from a standpoint which wants consistent behavior between SELECT ... FOR and LOCK command. Again, nothing about this makes those consistent. If we concerned about this behavior, ExecCheckRTEPerms() might be a place where we also should fix. I don't understand what you're getting at here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Feature request: INSERT .... ON DUPLICATE UPDATE ....
On Sun, Nov 28, 2010 at 7:44 PM, Yourfriend doudou...@gmail.com wrote: In our company, we use both PostgreSQL and MySQL, our developers include me think that the INSERT ... ON DUPLICATE UPDATE clause is a much more user friendly function,so, would you please add this liked function in PostgreSQL, I know we can write PROCEDURE or RULE to solve this problem,but compare to write PROCEDURE or RULE, it's apparently more easy to add an INSERT... ON DUPLICATE clause, so, please consider our request to make PostgreSQL more power. I hope very much we'll support something like this some day, but no one is working on it presently. There is, however, ongoing work to support SQL-standard MERGE. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] contrib: auth_delay module
On Sun, Nov 28, 2010 at 7:45 PM, Fujii Masao masao.fu...@gmail.com wrote: Thanks. I found the typo: I only have one? :-) Thanks, fixed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [GENERAL] column-level update privs + lock table
(2010/11/29 10:43), Robert Haas wrote: 2010/11/28 KaiGai Kohei kai...@ak.jp.nec.com: I'm not totally convinced that this is the correct behavior. It seems a bit surprising that UPDATE privilege on a single column is enough to lock out all SELECT activity from the table. It's actually a bit surprising that even full-table UPDATE privileges are enough to do this, but this change would allow people to block access to data they can neither see nor modify. That seems counterintuitive, if not a security hole. In my understanding, UPDATE privilege on a single column also allows lock out concurrent updating even if this query tries to update rows partially. Therefore, the current code considers UPDATE privilege on a single column is enough to lock out the table. Right? Against concurrent updates and deletes, yes. Against inserts that don't involve potential unique-index conflicts, and against selects, no. My comment was from a standpoint which wants consistent behavior between SELECT ... FOR and LOCK command. Again, nothing about this makes those consistent. If we concerned about this behavior, ExecCheckRTEPerms() might be a place where we also should fix. I don't understand what you're getting at here. I thought the author was concerned about the inconsistency between them. (Perhaps I misunderstood his motivation?) What was the purpose that this patch tries to solve? In the first message of this topic, he was concerned as follows: I noticed that granting a user column-level update privileges doesn't allow that user to issue LOCK TABLE with any mode other than Access Share. Do we need to answer: Yes, it is a specification, so you need to grant table-level privileges instead? Thanks -- KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] profiling connection overhead
Robert Haas robertmh...@gmail.com writes: Yeah, very true. What's a bit frustrating about the whole thing is that we spend a lot of time pulling data into the caches that's basically static and never likely to change anywhere, ever. True. I wonder if we could do something like the relcache init file for the catcaches. Maybe we could speed things up a bit if we got rid of the pg_attribute entries for the system attributes (except OID). I used to have high hopes for that idea, but the column privileges patch broke it permanently. regards, tom lane
[HACKERS] On-the-fly index tuple deletion vs. hot_standby
I have a hot_standby system and use it to bear the load of various reporting queries that take 15-60 minutes each. In an effort to avoid long pauses in recovery, I set a vacuum_defer_cleanup_age constituting roughly three hours of the master's transactions. Even so, I kept seeing recovery pause for the duration of a long-running query. In each case, the culprit record was an XLOG_BTREE_DELETE arising from on-the-fly deletion of an index tuple. The attached test script demonstrates the behavior (on HEAD); the index tuple reclamation conflicts with a concurrent SELECT pg_sleep(600) on the standby. Since this inserting transaction aborts, HeapTupleSatisfiesVacuum reports HEAPTUPLE_DEAD independent of vacuum_defer_cleanup_age. We go ahead and remove the index tuples. On the standby, btree_xlog_delete_get_latestRemovedXid does not regard the inserting-transaction outcome, so btree_redo proceeds to conflict with snapshots having visibility over that transaction. Could we correctly improve this by teaching btree_xlog_delete_get_latestRemovedXid to ignore tuples of aborted transactions and tuples inserted and deleted within one transaction? Thanks, nm repro-btree-cleanup.sh Description: Bourne shell script
Re: [HACKERS] pg_execute_from_file review
On Fri, Nov 26, 2010 at 06:24, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Thanks for your review. Please find attached a revised patch where I've changed the internals of the function so that it's split in two and that the opr_sanity check passes, per comments from David Wheeler and Tom Lane. I have some comments and questions about pg_execute_from_file.v5.patch. Source code * OID=3627 is used by another function. Don't you expect 3927? * There is a compiler warning: genfile.c: In function ‘pg_execute_from_file_with_placeholders’: genfile.c:427: warning: unused variable ‘query_string’ * I'd like to ask native speakers whether from is needed in the names of pg_execute_from_file and pg_execute_from_query_string. Design and Implementation * pg_execute_from_file() can execute any files even if they are not in $PGDATA. OTOH, we restrict pg_read_file() to read such files. What will be our policy? Note that the contents of the file might be logged or returned to the client on errors. * Do we have any reasons to have pg_execute_from_file separated from pg_read_file? If we allow pg_read_file() to read files in $PGSHARE, pg_execute_from_file could be replaced with EXECUTE pg_read_file(). (Note that pg_execute_from_file is implemented to do so even now.) * I hope pg_execute_from_file (and pg_read_file) had an option to specify a character encoding of the file. Especially, SJIS is still used widely, but it is not a valid server encoding. -- Itagaki Takahiro
Re: [HACKERS] Patch to add a primary key using an existing index
On Sun, Nov 28, 2010 at 08:40:08PM -0500, Robert Haas wrote: On Sun, Nov 28, 2010 at 8:06 PM, Itagaki Takahiro itagaki.takah...@gmail.com wrote: On Fri, Nov 26, 2010 at 05:58, Steve Singer ssin...@ca.afilias.info wrote: The attached version of the patch gets your regression tests to pass. I'm going to mark this as ready for a committer. I think we need more discussions about the syntax: ALTER TABLE table_name ADD PRIMARY KEY (...) WITH (INDEX='index_name') Why not: ALTER TABLE table_name ADD PRIMARY KEY (...) INDEX index_name; +1 :) Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate