[HACKERS] How to use CreateFunctionStmt's RETURN TABLE?

2011-07-26 Thread _石头
Hello Guys,


   Lately, I came across the following syntax in src/backend/parser/gram.y:
   
  CreateFunctionStmt:
            CREATE opt_or_replace FUNCTION func_name func_args_with_defaults
            RETURNS func_return createfunc_opt_list opt_definition
                {
                    CreateFunctionStmt *n = makeNode(CreateFunctionStmt);
                    n->replace = $2;
                    n->funcname = $4;
                    n->parameters = $5;
                    n->returnType = $7;
                    n->options = $8;
                    n->withClause = $9;
                    $$ = (Node *)n;
                }
            | CREATE opt_or_replace FUNCTION func_name func_args_with_defaults
              RETURNS TABLE '(' table_func_column_list ')' createfunc_opt_list opt_definition
                {
                    CreateFunctionStmt *n = makeNode(CreateFunctionStmt);
                    n->replace = $2;
                    n->funcname = $4;
                    n->parameters = mergeTableFuncParameters($5, $9);
                    n->returnType = TableFuncTypeName($9);
                    n->returnType->location = @7;
                    n->options = $11;
                    n->withClause = $12;
                    $$ = (Node *)n;
                }
            | CREATE opt_or_replace FUNCTION func_name func_args_with_defaults
              createfunc_opt_list opt_definition
                {
                    CreateFunctionStmt *n = makeNode(CreateFunctionStmt);
                    n->replace = $2;
                    n->funcname = $4;
                    n->parameters = $5;
                    n->returnType = NULL;
                    n->options = $6;
                    n->withClause = $7;
                    $$ = (Node *)n;
                }
        ;


  I do not know how to use the second form: RETURNS TABLE '(' 
table_func_column_list ')' createfunc_opt_list opt_definition.


  Could someone help me write a simple example of this syntax? Thank 
you very much; I'm looking forward to your help!

Re: [HACKERS] How to use CreateFunctionStmt's RETURN TABLE?

2011-07-26 Thread Heikki Linnakangas

On 26.07.2011 10:22, _石头 wrote:

   I do not know how to use the second syntax:RETURNS TABLE '(' 
table_func_column_list ')' createfunc_opt_list opt_definition.


This is hardly a question related to PostgreSQL development; the 
pgsql-general mailing list would have been more appropriate.


This is documented at:
http://www.postgresql.org/docs/9.0/static/sql-createfunction.html

There's an example using that syntax on that page.
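
[Editor's note: a minimal example of that form, with invented table and column names, might look like this:]

```sql
-- A set-returning function declared with RETURNS TABLE; the parenthesized
-- column list is what gram.y parses as table_func_column_list.
CREATE OR REPLACE FUNCTION big_orders(min_total numeric)
RETURNS TABLE (order_id integer, order_total numeric) AS $$
    SELECT o.id, o.total
    FROM orders o
    WHERE o.total >= min_total;
$$ LANGUAGE sql STABLE;

-- Used like any table source:
SELECT * FROM big_orders(100.00);
```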

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Check constraints on partition parents only?

2011-07-26 Thread Nikhil Sontakke
  8.4 had this change:
 
  *
Force child tables to inherit CHECK constraints from parents
(Alex Hunsaker, Nikhil Sontakke, Tom)

  You're not the only one who occasionally bangs his head against it.


Sorry for the occasional head bumps :)


 Yeah.  I think it's good that there's a barrier to blindly dropping a
 constraint that may be important to have on children, but there should
 be a way to override that.


Hmmm, but then it does open up the possibility of naive users shooting
themselves in the foot. It can be easy to conjure up a
parent-only-constraint that does not gel too well with its children. And
that's precisely why this feature was added in the first place..

Regards,
Nikhils


Re: [HACKERS] vacuumlo patch

2011-07-26 Thread Aron
Here's another small change to the patch, it works fine for me and it quite
saved my day.

I try to submit the patch by email.


diff --git a/contrib/vacuumlo/vacuumlo.c b/contrib/vacuumlo/vacuumlo.c
index f6e2a28..1f88d72 100644
--- a/contrib/vacuumlo/vacuumlo.c
+++ b/contrib/vacuumlo/vacuumlo.c
@@ -48,6 +48,7 @@ struct _param
 	char	   *pg_host;
 	int			verbose;
 	int			dry_run;
+	int			transaction_limit;
 };
 
 int			vacuumlo(char *, struct _param *);
@@ -286,6 +287,8 @@ vacuumlo(char *database, struct _param * param)
 		}
 		else
 			deleted++;
+		if (param->transaction_limit != 0 && deleted >= param->transaction_limit)
+			break;
 	}
 	PQclear(res);
 
@@ -313,6 +316,7 @@ usage(const char *progname)
 	printf("  -h HOSTNAME  database server host or socket directory\n");
 	printf("  -n           don't remove large objects, just show what would be done\n");
 	printf("  -p PORT      database server port\n");
+	printf("  -l LIMIT     stop after removing LIMIT LOs\n");
 	printf("  -U USERNAME  user name to connect as\n");
 	printf("  -w           never prompt for password\n");
 	printf("  -W           force password prompt\n");
@@ -342,6 +346,7 @@ main(int argc, char **argv)
 	param.pg_port = NULL;
 	param.verbose = 0;
 	param.dry_run = 0;
+	param.transaction_limit = 0;
 
 	if (argc > 1)
 	{
@@ -359,7 +364,7 @@ main(int argc, char **argv)
 
 	while (1)
 	{
-		c = getopt(argc, argv, "h:U:p:vnwW");
+		c = getopt(argc, argv, "h:U:p:l:vnwW");
 		if (c == -1)
 			break;
 
@@ -395,6 +400,14 @@ main(int argc, char **argv)
 			}
 			param.pg_port = strdup(optarg);
 			break;
+		case 'l':
+			param.transaction_limit = strtol(optarg, NULL, 10);
+			if (param.transaction_limit < 0 || param.transaction_limit > 2147483647)
+			{
+				fprintf(stderr, "%s: invalid transaction limit number: %s, "
+						"valid range is from 0 (disabled) to 2147483647.\n",
+						progname, optarg);
+				exit(1);
+			}
+			break;
 		case 'h':
 			param.pg_host = strdup(optarg);
 			break;


--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/vacuumlo-patch-tp4628522p4634026.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



[HACKERS] Re: [HACKERS] How to use CreateFunctionStmt's RETURN TABLE?

2011-07-26 Thread _石头
thank you!
 
 
-- Original Message --
From: Heikki Linnakangas
Date: Tuesday, July 26, 2011, 3:58 PM
To: _石头
Cc: pgsql-hackers
Subject: Re: [HACKERS] How to use CreateFunctionStmt's RETURN TABLE?

 
On 26.07.2011 10:22, _石头 wrote:
I do not know how to use the second syntax:RETURNS TABLE '(' 
 table_func_column_list ')' createfunc_opt_list opt_definition.

This is hardly a question related to PostgreSQL development, 
pgsql-general mailing list would've been more appropriate.

This is documented at:
http://www.postgresql.org/docs/9.0/static/sql-createfunction.html

There's an example using that syntax on that page.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Florian Pflug
On Jul26, 2011, at 01:57 , Noah Misch wrote:
 On Mon, Jul 25, 2011 at 09:06:41PM +0200, Florian Pflug wrote:
 On Jul25, 2011, at 20:37 , Bernd Helmle wrote:
 Ah, but i got now what's wrong here: configure is confusing both libxml2
 installations, and a quick look into config.log proves that: it uses the
 xml2-config from the OSX libs (my $PATH has /usr in front of the bindir of
 MacPorts, though i seem to recall to have changed this in the past).
 
 Hm, but I still think there's a bug lurking there. Using a different libxml2
 version for the configure checks than for actual builds surely isn't good...
 
 From looking at configure.in, it seems that we use xml2-config to figure out
 the CFLAGS and LDFLAGS required to build and link against libxml. I guess we
 somehow end up not using these flags when we later test for
 xmlStructuredErrorContext, but do use them during the actual build. Or maybe
 the order of the -I and -L flags just ends up being different in the two 
 cases.
 
 I can reproduce similar behavior on GNU/Linux.  If my setup was sufficiently
 similar, Bernd's problematic build would have used this sequence of directives
 during both configuration and build:
 
  -I/usr/include/libxml2  -I/opt/local/include   -L/opt/local/lib
 
 The directories passed using --with-includes and --with-libraries took
 priority over those from xml2-config.  Since libxml2 headers live in a
 `libxml2' subdirectory, --with-includes=/opt/local/include did not affect
 finding them.  --with-libraries=/opt/local/lib *did* affect finding the
 library binaries, though.  Therefore, he built entirely against /usr headers
 and /opt/local libraries.

Right. Thanks for detailed analysis, Noah!

What's more, xml.c actually does attempt to protect against this situation
by calling LIBXML_TEST_VERSION in pg_xml_init_library().

But that check doesn't fire in our case, because it explicitly allows the
actual library version to be newer than the header file version, as long
as the major versions agree (the major version being 2 in this case).

So in essence, libxml promises ABI backwards-compatibility, but breaks
that promise when it comes to restoring the structured error context.

The only fix I can come up with is to explicitly test whether or not we're
able to restore the structured error context in pg_xml_init_library(). I'm
envisioning we'd do something like

xmlStructuredErrorFunc *saved_errfunc = xmlStructuredError;
#ifdef HAVE_XMLSTRUCTUREDERRORCONTEXT
void *saved_errctx = xmlStructuredErrorContext;
#else
void *saved_errctx = xmlGenericErrorContext;
#endif

xmlSetStructuredErrorFunc((void *) MAGIC, xml_errorHandler);

#ifdef HAVE_XMLSTRUCTUREDERRORCONTEXT
if (xmlStructuredErrorContext != (void *) MAGIC)
#else
if (xmlGenericErrorContext != (void *) MAGIC)
#endif
{
  ereport(ERROR,
          (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
           errmsg("unable to restore the libxml error context"),
           errhint("Postgres was probably built against libxml < 2.7.4 "
                   "but loaded a version >= 2.7.4 at runtime.")));
}

xmlSetStructuredErrorFunc(saved_errctx, saved_errfunc);

The downside of this is that a libxml version upgrade might break postgres,
of course. But by the time postgres 9.1 is released, 2.7.4 will be nearly 3
years old, so I don't know how likely it is that a distro would afterwards
decide to upgrade from a version earlier than that to a version later than that.

 We could rearrange things so the xml2-config -L
 flags (or lack thereof) take priority over a --with-libraries directory for
 the purpose of finding libxml2.

Hm, how would we do that just for the purpose of finding libxml? Do autotools
provide a way to specify per-library -L flags?

 As a side note, we don't add an rpath for libxml2 like we do for Perl and
 Python.  That doesn't matter on Darwin, but with GNU libc, it entails setting
 LD_LIBRARY_PATH or updating /etc/ld.so.conf to make the run time linker find
 the library binary used at build time.

Sounds like we should change that.

best regards,
Florian Pflug




Re: [HACKERS] write scalability

2011-07-26 Thread Robert Haas
On Mon, Jul 25, 2011 at 10:14 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 07/25/2011 04:07 PM, Robert Haas wrote:

 I did 5-minute pgbench runs with unlogged tables and with permanent
 tables, restarting the database server and reinitializing the tables
 between each run.

 Database scale?  One or multiple pgbench worker threads?  A reminder on the
 amount of RAM in the server would be helpful for interpreting the results
 too.

Ah, sorry.  scale = 100, so small.  pgbench invocation is:

pgbench -T $time -c $clients -j $clients

128GB RAM.

 The other thing I'd recommend if you're running more write-heavy tests is to
 turn off autovacuum.  Whether or not it kicks in depends on the test
 duration and the TPS rate, which adds a source of variability better avoided
 here. It also means that faster tests end up getting penalized by having it
 run near their end, which makes them no longer look like fast results.

Well, I kind of don't want to do that, because surely you wouldn't run
that way in production.  Anyway, it seems clear that nearly all the
contention is in CLogControlLock, ProcArrayLock, and WALInsertLock,
and I don't think shutting off autovacuum is going to change that.
The point of my benchmarking here may be a bit different from the
benchmarking you normally do: I'm not trying to figure out whether
I've got the best hardware, or how good it is, so I don't really care
about squeezing out every last tps.  I just need numbers that can be
reproduced quickly with different patches to see whether a given code
change helps and by how much.  I wasn't sure that 5-minute pgbench
runs would be good enough for that, but so far the results have been
stable enough that I'm fairly confident I'm seeing real effects rather
than random noise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Check constraints on partition parents only?

2011-07-26 Thread Robert Haas
On Tue, Jul 26, 2011 at 4:12 AM, Nikhil Sontakke
nikhil.sonta...@enterprisedb.com wrote:
 Yeah.  I think it's good that there's a barrier to blindly dropping a
 constraint that may be important to have on children, but there should
 be a way to override that.

 Hmmm, but then it does open up the possibility of naive users shooting
 themselves in the foot. It can be easy to conjure up a
 parent-only-constraint that does not gel too well with its children. And
 that's precisely why this feature was added in the first place..

Yeah, but I think we need to take that chance.  At the very least, we
need to support the equivalent of a non-inherited CHECK (false) on
parent tables.
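
[Editor's note: to make that concrete, the kind of thing being asked for might look like this — hypothetical sketch; with the inheritance setup of the day there is no supported way to write a parent-only constraint:]

```sql
-- Inheritance-based partitioning: all rows should live in children.
CREATE TABLE measurements (ts timestamptz, val int);
CREATE TABLE measurements_2011
    (CHECK (ts >= '2011-01-01' AND ts < '2012-01-01'))
    INHERITS (measurements);

-- Desired but unsupported here: a CHECK (false) that applies ONLY to the
-- parent, forbidding direct inserts without propagating to the children.
-- (Later PostgreSQL releases spell this "CHECK (false) NO INHERIT".)
```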

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] longstanding mingw warning

2011-07-26 Thread Andrew Dunstan


Why do we get this warning on Mingw?:

   x86_64-w64-mingw32-gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wformat-security -fno-strict-aliasing 
-fwrapv -g -I../../../../src/include 
-I/home/pgrunner/bf/root/HEAD/pgsql.3896/../pgsql/src/include 
-I../pgsql/src/include/port/win32 -DEXEC_BACKEND  -I/c/prog/mingwdep/include 
-I/home/pgrunner/bf/root/HEAD/pgsql.3896/../pgsql/src/include/port/win32 
-DBUILDING_DLL  -c -o mingwcompat.o 
/home/pgrunner/bf/root/HEAD/pgsql.3896/../pgsql/src/backend/port/win32/mingwcompat.c
   
c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.3896/../pgsql/src/backend/port/win32/mingwcompat.c:60:1:
 warning: 'RegisterWaitForSingleObject' redeclared without dllimport attribute: 
previous dllimport ignored


Can we get rid of it?

cheers

andrew




Re: [HACKERS] Check constraints on partition parents only?

2011-07-26 Thread Andrew Dunstan



On 07/26/2011 09:08 AM, Robert Haas wrote:

On Tue, Jul 26, 2011 at 4:12 AM, Nikhil Sontakke
nikhil.sonta...@enterprisedb.com  wrote:

Yeah.  I think it's good that there's a barrier to blindly dropping a
constraint that may be important to have on children, but there should
be a way to override that.

Hmmm, but then it does open up the possibility of naive users shooting
themselves in the foot. It can be easy to conjure up a
parent-only-constraint that does not gel too well with its children. And
that's precisely why this feature was added in the first place..

Yeah, but I think we need to take that chance.  At the very least, we
need to support the equivalent of a non-inherited CHECK (false) on
parent tables.


Indeed. I usually enforce that with a trigger that raises an exception, 
but of course that doesn't help at all with constraint exclusion, and I 
saw a result just a few weeks ago (I forget the exact details) where it 
appeared that the plan chosen was significantly worse because the parent 
table wasn't excluded, so there's a non-trivial downside to having 
this restriction.


cheers

andrew



Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 What's more, xml.c actually does attempt to protect against this situation
 by calling LIBXML_TEST_VERSION in pg_xml_init_library().

 But that check doesn't fire in our case, because it explicitly allows the
 actual library version to be newer than the header file version, as long
 as the major versions agree (the major version being 2 in this case).

 So in essence, libxml promises ABI backwards-compatibility, but breaks
 that promise when it comes to restoring the structured error context.

Fun.

 The only fix I can come up with is to explicitly test whether or not we're
 able to restore the structured error context in pg_xml_init_library().

Yeah, I think you are right.

 The downside of this is that a libxml version upgrade might break postgres,

Such an upgrade would break Postgres anyway --- better we are able to
detect and report it clearly than that we fail in bizarre ways.

 As a side note, we don't add an rpath for libxml2 like we do for Perl and
 Python.

 Sounds like we should change that.

I don't think so.  It would just be another headache for packagers ---
in Red Hat distros, at least, packages are explicitly forbidden from
containing any rpath settings.

regards, tom lane



Re: [HACKERS] psql: bogus descriptions displayed by \d+

2011-07-26 Thread Robert Haas
On Mon, Jul 25, 2011 at 10:29 PM, Josh Kupershmidt schmi...@gmail.com wrote:
 That seems like a good way to document this; patch for master updated.
 I avoided mucking with the documentation for COMMENT ON RULE and
 COMMENT ON TRIGGER this time; they both say table when they really
 mean table or view, but maybe trying to differentiate between
 table, table_or_view, and relation will make things overly
 complicated.

I think this is basically the right approach but I found what you did
here a bit wordy, so I simplified it, committed it, and back-patched
to 9.0 with suitable adjustment.  Hopefully I didn't simplify it into
a form you don't like.

 Also, a patch against master to:
  * get rid of the bogus Description outputs for \d+ sequence_name
 and \d+ index_name

 This part looks OK, but instead of doing a negative test (not-index,
 not-sequence) let's have it do a positive test, for the same types
 comment.c allows.

 Right, fixed.

Committed this part to head with minor tweaks.

 And while I'm messing with this, some further nitpicks about psql not
 addressed by these patches:
  * The Storage column for \d+ sequence_name is correct, I suppose,
 but repetitive

 I'm OK with removing that.

 Hrm, would it be better to keep that  Storage bit around in some
 non-repetitive form, maybe on its own line below the table output?

Well, I don't really see that it has any value.  I'd probably just
leave it the way it is, but if we're going to change something, I
would favor removing it over relocating it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Robert Haas
On Tue, Jul 26, 2011 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 As a side note, we don't add an rpath for libxml2 like we do for Perl and
 Python.

 Sounds like we should change that.

 I don't think so.  It would just be another headache for packagers ---
 in Red Hat distros, at least, packages are explicitly forbidden from
 containing any rpath settings.

So what do they do about Perl and Python?

Not sure I understand the virtue of being inconsistent here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Deferred partial/expression unique constraints

2011-07-26 Thread Robert Haas
On Mon, Jul 25, 2011 at 2:29 PM, Jeff Davis pg...@j-davis.com wrote:
 On Fri, 2011-07-22 at 23:35 +0300, Peter Eisentraut wrote:
 On ons, 2011-07-13 at 11:26 -0400, Tom Lane wrote:
  Our standard reason for not implementing UNIQUE constraints on
  expressions has been that then you would have a thing that claims to be
  a UNIQUE constraint but isn't representable in the information_schema
  views that are supposed to show UNIQUE constraints.  We avoid this
  objection in the current design by shoving all that functionality into
  EXCLUDE constraints, which are clearly outside the scope of the spec.

 I have never heard that reason before, and I think it's a pretty poor
 one.  There are a lot of other things that are not representable in the
 information schema.

+1.

 I think what Tom is saying is that the information_schema might appear
 inconsistent to someone following the spec.

 Can you give another example where we do something like that?

http://archives.postgresql.org/pgsql-bugs/2010-08/msg00374.php

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Florian Pflug
On Jul26, 2011, at 15:52 , Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:
 The only fix I can come up with is to explicitly test whether or not we're
 able to restore the structured error context in pg_xml_init_library().
 
 Yeah, I think you are right.
 
 The downside of this is that a libxml version upgrade might break postgres,
 
 Such an upgrade would break Postgres anyway --- better we are able to
 detect and report it clearly than that we fail in bizarre ways.

On further reflection, instead of checking whether we can restore the error
handler in pg_xml_init_library(), we could simply upgrade the elog(WARNING)
in pg_xml_done() to ereport(ERROR), and include a hint that explains the
probable cause.

The upside being that we only fail when we actually need to restore the
error handler. Since there's one caller (parse_xml_decl) who calls
pg_xml_init_library() but not pg_xml_init()/pg_xml_done(), the difference
isn't only academic. At least XML output will continue to work after
libxml is upgraded from < 2.7.4 to >= 2.7.4.

If nobody objects, I'll post a patch to that effect later today.

BTW, come to think of it, I believe the elog(ERROR) that you put into
xml_errorHandler should be upgraded to elog(FATAL). Once we longjmp()
out of libxml, it doesn't seem safe to re-enter it, so making the backend
exit seems to be the only safe thing to do.

best regards,
Florian Pflug





Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 On further reflection, instead of checking whether we can restore the error
 handler in pg_xml_init_library(), we could simply upgrade the elog(WARNING)
 in pg_xml_done() to ereport(ERROR), and include a hint that explains the
 probable cause.

 The upside being that we only fail when we actually need to restore the
 error handler. Since there's one caller (parse_xml_decl) who calls
 pg_xml_init_library() but not pg_xml_init()/pg_xml_done(), the difference
 isn't only academic. At least XML output will continue to work after
 libxml is upgraded from < 2.7.4 to >= 2.7.4.

Good point.  But what about failing in pg_xml_init?  That is, after
calling xmlSetStructuredErrorFunc, check that it set the variable we
expected it to set.

The purpose of the check in pg_xml_done is not to detect library issues,
but to detect omitted save/restore pairs and similar coding mistakes.
I don't want to upgrade it to an ERROR, and I don't want to confuse
people by hinting that the problem is with libxml.

regards, tom lane



Re: [HACKERS] vacuumlo patch

2011-07-26 Thread Alvaro Herrera
Excerpts from Aron's message of Tue Jul 26 04:18:55 -0400 2011:
 Here's another small change to the patch, it works fine for me and it quite
 saved my day.
 
 I try to submit the patch by email.

There's a problem with this patch: long lines are being wrapped by
your email client, which makes it impossible to apply.  I think it would
be safer if you sent it as an attachment instead of pasting it in the
body of the message.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Florian Pflug
On Jul26, 2011, at 16:22 , Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:
 On further reflection, instead of checking whether we can restore the error
 handler in pg_xml_init_library(), we could simply upgrade the elog(WARNING)
 in pg_xml_done() to ereport(ERROR), and include a hint that explains the
 probable cause.

 The upside being that we only fail when we actually need to restore the
 error handler. Since there's one caller (parse_xml_decl) who calls
 pg_xml_init_library() but not pg_xml_init()/pg_xml_done(), the difference
 isn't only academic. At least XML output will continue to work after
 libxml is upgraded from < 2.7.4 to >= 2.7.4.
 
 Good point.  But what about failing in pg_xml_init?  That is, after
 calling xmlSetStructuredErrorFunc, check that it set the variable we
 expected it to set.

Yeah, that's even better. Will do it that way.

What about the suggested upgrade of the elog(ERROR) in xml_errorHandler
to elog(FATAL)? Shall I do that too, or leave it for now?

 The purpose of the check in pg_xml_done is not to detect library issues,
 but to detect omitted save/restore pairs and similar coding mistakes.
 I don't want to upgrade it to an ERROR, and I don't want to confuse
 people by hinting that the problem is with libxml.

I can see the concern about possible confusion, and I agree that
pg_xml_init(), right after we set our error handler, is the most logical
place to test whether we can read it back.

I guess I don't really see the benefit of the check in pg_xml_done()
triggering a WARNING instead of an ERROR, but since we won't be hijacking
that check now anyway, I don't mind it being a WARNING either.

best regards,
Florian Pflug





Re: [HACKERS] Check constraints on partition parents only?

2011-07-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jul 26, 2011 at 4:12 AM, Nikhil Sontakke
 nikhil.sonta...@enterprisedb.com wrote:
 Hmmm, but then it does open up the possibility of naive users shooting
 themselves in the foot. It can be easy to conjure up a
 parent-only-constraint that does not gel too well with its children. And
 that's precisely why this feature was added in the first place..

 Yeah, but I think we need to take that chance.  At the very least, we
 need to support the equivalent of a non-inherited CHECK (false) on
 parent tables.

No, the right solution is to invent an actual concept of partitioned
tables, not to keep adding ever-weirder frammishes to inheritance so
that it can continue to provide an awkward, poorly-performing emulation
of them.

regards, tom lane



Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jul 26, 2011 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I don't think so.  It would just be another headache for packagers ---
 in Red Hat distros, at least, packages are explicitly forbidden from
 containing any rpath settings.

 So what do they do about Perl and Python?

Patch the source to not add rpath switches.

regards, tom lane



Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 What about the suggested upgrade of the elog(ERROR) in xml_errorHandler
 to elog(FATAL)? Shall I do that too, or leave it for now?

No objection here --- I had considered writing it that way myself.
I refrained for fear of making a bad situation even worse, but if
other people think FATAL would be the safer way, fine.

regards, tom lane



Re: [HACKERS] vacuumlo patch

2011-07-26 Thread Aron Wieck
Hi Tim,

you correctly assumed I was just trying to clean up the code. There was no 
reason to change the behavior, so please ignore my change to the patch.

Aron


On 26.07.2011, at 15:44, Tim Lewis wrote:

 Hi Aron,
 
 Thanks for the input. The small change you suggest would change the 
 behavior of the patch which I would prefer not to do as I have reasons for 
 the previous behavior.
 Because you gave no reasons and "stop after removing LIMIT LOs" was not 
 changed to "stop after attempting to remove LIMIT LOs", I suspect you were 
 just trying to keep the code clean.
 
 The difference:
 In your version of the patch vacuumlo will stop after N lo_unlink(OID) 
 attempts.
 The previous behavior of the patch is that vacuumlo will stop after N 
 successful lo_unlink(OID)s.
 
 If you have good reason for your behavior please add another flag so that it 
 is optional.
 There should be a clear distinction between counting vs not, and aborting 
 vs continuing when a lo_unlink(OID) is unsuccessful.
 
 
 On Tue, Jul 26, 2011 at 4:18 AM, Aron m...@eunice.de wrote:
 Here's another small change to the patch, it works fine for me and it quite
 saved my day.
 
 I try to submit the patch by email.
 
 
 diff --git a/contrib/vacuumlo/vacuumlo.c b/contrib/vacuumlo/vacuumlo.c
 index f6e2a28..1f88d72 100644
 --- a/contrib/vacuumlo/vacuumlo.c
 +++ b/contrib/vacuumlo/vacuumlo.c
 @@ -48,6 +48,7 @@ struct _param
char   *pg_host;
int verbose;
int dry_run;
 +   int transaction_limit;
  };
 
  intvacuumlo(char *, struct _param *);
 @@ -286,6 +287,8 @@ vacuumlo(char *database, struct _param * param)
}
else
deleted++;
 +		if (param->transaction_limit != 0 && deleted >= param->transaction_limit)
 +			break;
}
PQclear(res);
 
 @@ -313,6 +316,7 @@ usage(const char *progname)
 	printf("  -h HOSTNAME  database server host or socket directory\n");
 	printf("  -n           don't remove large objects, just show what would be done\n");
 	printf("  -p PORT      database server port\n");
 +	printf("  -l LIMIT     stop after removing LIMIT LOs\n");
 	printf("  -U USERNAME  user name to connect as\n");
 	printf("  -w           never prompt for password\n");
 	printf("  -W           force password prompt\n");
 @@ -342,6 +346,7 @@ main(int argc, char **argv)
param.pg_port = NULL;
param.verbose = 0;
param.dry_run = 0;
 +   param.transaction_limit = 0;
 
 	if (argc > 1)
{
 @@ -359,7 +364,7 @@ main(int argc, char **argv)
 
while (1)
{
 -		c = getopt(argc, argv, "h:U:p:vnwW");
 +		c = getopt(argc, argv, "h:U:p:l:vnwW");
if (c == -1)
break;
 
 @@ -395,6 +400,14 @@ main(int argc, char **argv)
}
param.pg_port = strdup(optarg);
break;
 +		case 'l':
 +			param.transaction_limit = strtol(optarg, NULL, 10);
 +			if (param.transaction_limit < 0 || param.transaction_limit > 2147483647)
 +			{
 +				fprintf(stderr, "%s: invalid transaction limit number: %s, "
 +						"valid range is from 0 (disabled) to 2147483647.\n",
 +						progname, optarg);
 +				exit(1);
 +			}
 +			break;
 		case 'h':
 			param.pg_host = strdup(optarg);
 			break;
 
 
 --
 View this message in context: 
 http://postgresql.1045698.n5.nabble.com/vacuumlo-patch-tp4628522p4634026.html
 Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
 
 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
 
 
 
 -- 
 
 Noodle
 Connecting People, Content  Capabilities within the Enterprise
 
 
 Toll Free: 866-258-6951 x 701
 tim.le...@vialect.com
 http://www.vialect.com
 
 Noodle is a product of Vialect Inc
 
 Follow Noodle on Twitter
 http://www.twitter.com/noodle_news
 



Re: [HACKERS] vacuumlo patch

2011-07-26 Thread Aron Wieck
Hi Tim,

I have to correct my previous answer: my change does not alter the behavior of
your patch significantly.

 The difference:
 In your version of the patch vacuumlo will stop after N lo_unlink(OID) 
 attempts.
 The previous behavior of the patch is that vacuumlo will stop after N 
 successful lo_unlink(OID)s.
 
 If you have good reason for your behavior please add another flag so that it 
 is optional.
 There should be a clear distinction between counting vs not, and aborting 
 vs continuing when a lo_unlink(OID) is unsuccessful.



if (param->dry_run == 0)
{
        if (lo_unlink(conn, lo) < 0)
        {
                fprintf(stderr, "\nFailed to remove lo %u: ", lo);
                fprintf(stderr, "%s", PQerrorMessage(conn));
        }
        else
                deleted++;
}
else
        deleted++;
if (param->transaction_limit != 0 && deleted >= param->transaction_limit)
        break;


The variable deleted is only incremented if a lo_unlink was successful, so my
patch only introduces a negligible overhead but no actual change in behavior.

I'm very grateful for your patch and I think it should be accepted as soon as
possible; one or two extra ifs do not matter to me.

Aron
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] vacuumlo patch

2011-07-26 Thread Aron Wieck
 Excerpts from Aron's message of mar jul 26 04:18:55 -0400 2011:
 Here's another small change to the patch, it works fine for me and it quite
 saved my day.
 
 I try to submit the patch by email.
 
 There's a problem with this patch: long lines are being wrapped by
 your email client, which makes it impossible to apply.  I think it would
 be safer if you sent it as an attachment instead of pasting it in the
 body of the message.

I posted using nabble, edited the post and uploaded the file as an attachment.

But here you go again, this time as an attachment.



vacuumlo.patch
Description: Binary data



Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Tom Lane
Tom Lane t...@sss.pgh.pa.us writes:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jul 26, 2011 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I don't think so.  It would just be another headache for packagers ---
 in Red Hat distros, at least, packages are explicitly forbidden from
 containing any rpath settings.

 So what do they do about Perl and Python?

 Patch the source to not add rpath switches.

No, I take that back.  What the RH packages do is configure with
--disable-rpath, and so long as that applies to this too, I'd
have no objection.

What I was mis-remembering is that there's a patch that *puts back*
plperl's rpath for libperl, because for some reason that no one has ever
satisfactorily explained to me, perl has an exemption from the distro
policy that loadable libraries must be installed so that they're known
to ldconfig.  Which of course is exactly why rpath isn't (supposed to
be) needed.

regards, tom lane



Re: [HACKERS] write scalability

2011-07-26 Thread Pavan Deolasee
On Tue, Jul 26, 2011 at 9:07 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jul 25, 2011 at 10:14 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 07/25/2011 04:07 PM, Robert Haas wrote:

 I did 5-minute pgbench runs with unlogged tables and with permanent
 tables, restarting the database server and reinitializing the tables
 between each run.

 Database scale?  One or multiple pgbench worker threads?  A reminder on the
 amount of RAM in the server would be helpful for interpreting the results
 too.

 Ah, sorry.  scale = 100, so small.  pgbench invocation is:


It might be worthwhile to test only with the accounts and history
tables, and also to increase the number of statements in a transaction.
Otherwise the tiny tables can quickly become a bottleneck.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
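As an editorial aside, the test Pavan suggests could be run with a custom pgbench script along these lines. This is a hypothetical sketch, not something from the thread: the script syntax follows pgbench of that era (`\setrandom`), and the database name "bench" and the scale-100 row count of 10,000,000 accounts are assumptions.

```shell
# Sketch: a pgbench script that touches only pgbench_accounts and
# pgbench_history, with two UPDATEs per transaction, so the tiny
# branches/tellers tables stay out of the picture.
cat > accounts_only.sql <<'EOF'
\setrandom aid 1 10000000
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = :aid;
\setrandom aid 1 10000000
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = :aid;
INSERT INTO pgbench_history (aid, delta, mtime) VALUES (:aid, 1, now());
END;
EOF
# Then run it with something like:
#   pgbench -n -c 32 -j 8 -T 300 -f accounts_only.sql bench
echo "wrote accounts_only.sql"
```

The `-n` flag skips vacuuming between runs; `-f` replaces the built-in TPC-B-like transaction with the custom script.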

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Check constraints on partition parents only?

2011-07-26 Thread David Fetter
On Tue, Jul 26, 2011 at 10:51:58AM -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Tue, Jul 26, 2011 at 4:12 AM, Nikhil Sontakke
  nikhil.sonta...@enterprisedb.com wrote:
  Hmmm, but then it does open up the possibility of naive users shooting
  themselves in the foot. It can be easy to conjure up a
  parent-only-constraint that does not gel too well with its children. And
  that's precisely why this feature was added in the first place..
 
  Yeah, but I think we need to take that chance.  At the very least, we
  need to support the equivalent of a non-inherited CHECK (false) on
  parent tables.
 
 No, the right solution is to invent an actual concept of partitioned
 tables, not to keep adding ever-weirder frammishes to inheritance so
 that it can continue to provide an awkward, poorly-performing emulation
 of them.

Other SQL engines have partitions of types list, range and hash, and
some can sub-partition.  I'm thinking it might be easiest to do the
first before adding layers of partition structure, although we should
probably bear in mind that such layers will eventually exist.

Does the wiki on this need updating?

http://wiki.postgresql.org/wiki/Table_partitioning

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] vacuumlo patch

2011-07-26 Thread Tim Lewis
Hi Aron,

Thanks for the input. The small change you suggest would change the
behavior of the patch, which I would prefer not to do, as I have reasons
for the previous behavior.
Because you gave no reasons, and "stop after removing LIMIT LOs" was not
changed to "stop after attempting to remove LIMIT LOs", I suspect you
were just trying to keep the code clean.

The difference:
In your version of the patch vacuumlo will stop after N lo_unlink(OID)
attempts.
The previous behavior of the patch is that vacuumlo will stop after N
successful lo_unlink(OID)s.

If you have a good reason for your behavior, please add another flag so
that it is optional.
There should be a clear distinction between counting vs. not counting,
and aborting vs. continuing, when a lo_unlink(OID) is unsuccessful.


On Tue, Jul 26, 2011 at 4:18 AM, Aron m...@eunice.de wrote:

 Here's another small change to the patch, it works fine for me and it quite
 saved my day.

 I try to submit the patch by email.


 diff --git a/contrib/vacuumlo/vacuumlo.c b/contrib/vacuumlo/vacuumlo.c
 index f6e2a28..1f88d72 100644
 --- a/contrib/vacuumlo/vacuumlo.c
 +++ b/contrib/vacuumlo/vacuumlo.c
 @@ -48,6 +48,7 @@ struct _param
        char       *pg_host;
        int         verbose;
        int         dry_run;
 +      int         transaction_limit;
  };

  int         vacuumlo(char *, struct _param *);
 @@ -286,6 +287,8 @@ vacuumlo(char *database, struct _param * param)
                        }
                        else
                                deleted++;
 +                      if (param->transaction_limit != 0 && deleted >= param->transaction_limit)
 +                              break;
                }
                PQclear(res);

 @@ -313,6 +316,7 @@ usage(const char *progname)
        printf("  -h HOSTNAME  database server host or socket directory\n");
        printf("  -n           don't remove large objects, just show what would be done\n");
        printf("  -p PORT      database server port\n");
 +      printf("  -l LIMIT     stop after removing LIMIT LOs\n");
        printf("  -U USERNAME  user name to connect as\n");
        printf("  -w           never prompt for password\n");
        printf("  -W           force password prompt\n");
 @@ -342,6 +346,7 @@ main(int argc, char **argv)
        param.pg_port = NULL;
        param.verbose = 0;
        param.dry_run = 0;
 +      param.transaction_limit = 0;

        if (argc > 1)
        {
 @@ -359,7 +364,7 @@ main(int argc, char **argv)

        while (1)
        {
 -              c = getopt(argc, argv, "h:U:p:vnwW");
 +              c = getopt(argc, argv, "h:U:p:l:vnwW");
                if (c == -1)
                        break;

 @@ -395,6 +400,14 @@ main(int argc, char **argv)
                        }
                        param.pg_port = strdup(optarg);
                        break;
 +              case 'l':
 +                      param.transaction_limit = strtol(optarg, NULL, 10);
 +                      if ((param.transaction_limit < 0) || (param.transaction_limit > 2147483647))
 +                      {
 +                              fprintf(stderr, "%s: invalid transaction limit number: %s, valid range is form 0(disabled) to 2147483647.\n", progname, optarg);
 +                              exit(1);
 +                      }
 +                      break;
                case 'h':
                        param.pg_host = strdup(optarg);
                        break;


 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/vacuumlo-patch-tp4628522p4634026.html
 Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 

Noodle
Connecting People, Content & Capabilities within the Enterprise


Toll Free: 866-258-6951 x 701
tim.le...@vialect.com
http://www.vialect.com

Noodle is a product of Vialect Inc

Follow Noodle on Twitter
http://www.twitter.com/noodle_news


Re: [HACKERS] Check constraints on partition parents only?

2011-07-26 Thread Jerry Sievers
Andrew Dunstan and...@dunslane.net writes:

 On 07/25/2011 10:31 PM, Jerry Sievers wrote:
 Hackers;

 I just noticed that somewhere between 8.2 and 8.4, an exception is
 raised trying to alter table ONLY some_partition_parent ADD CHECK
 (foo).



 8.4 had this change:

*

  Force child tables to inherit CHECK constraints from parents
  (Alex Hunsaker, Nikhil Sontakke, Tom)

  Formerly it was possible to drop such a constraint from a
  child table, allowing rows that violate the constraint to be
  visible when scanning the parent table. This was deemed
  inconsistent, as well as contrary to SQL standard.


 You're not the only one who occasionally bangs his head against it.

 cheers

 andrew

Thanks Andrew!...  Yeah, I figured it was a documented change but was too
lazy tonight to browse the release notes :-)

The previous behavior was convenient for me, but I agree, it probably
led to some confusion too.

That our version of partitioning can be overloaded like this, though, I
think adds power; a bit of which we lost in adding the restriction.
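For readers following along, a minimal illustration of the behavior change discussed in this thread (the table names are invented; the error text is as emitted by 8.4 and later):

```sql
-- A trivial inheritance pair:
CREATE TABLE parent (x int);
CREATE TABLE child () INHERITS (parent);

-- Since 8.4, a parent-only CHECK constraint is rejected:
ALTER TABLE ONLY parent ADD CHECK (x > 0);
-- ERROR:  constraint must be added to child tables too

-- Adding it without ONLY propagates the constraint to the children:
ALTER TABLE parent ADD CHECK (x > 0);
```

On 8.2 the ONLY form was accepted, which is the convenience Jerry is missing.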





-- 
Jerry Sievers
e: jerry.siev...@comcast.net
p: 305.321.1144



Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Noah Misch
On Tue, Jul 26, 2011 at 02:09:13PM +0200, Florian Pflug wrote:
 On Jul26, 2011, at 01:57 , Noah Misch wrote:
  We could rearrange things so the xml2-config -L
  flags (or lack thereof) take priority over a --with-libraries directory for
  the purpose of finding libxml2.
 
 Hm, how would we do that just for the purpose of finding libxml? Do autotools
 provide a way to specify per-library -L flags?

Autoconf (built-in macros, that is) and Automake do not get involved in that.  I
vaguely recall Libtool having support for it, but PostgreSQL does not use
Automake or Libtool.  I also spoke too loosely: -L is never per-library, but you
can emulate that by completing the search externally and passing a full path to
the linker.

Honestly, the benefit is probably too light to justify going to this trouble.

-- 
Noah Mischhttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] vacuumlo patch

2011-07-26 Thread Timothy D. F. Lewis
I'm not sure what David did for me, so as per Robert's suggestion I have
added this patch to the commit fest:
https://commitfest.postgresql.org/action/patch_view?id=609
I'm hoping I have not put this patch in more than one workflow:
http://wiki.postgresql.org/wiki/Submitting_a_Patch#Patch_review_and_commit

On Mon, Jul 25, 2011 at 7:28 PM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Jul 25, 2011 at 9:37 AM, Tim elatl...@gmail.com wrote:
  Updated the patch to also apply when the no-action flag is enabled.

 You may want to read this:

 http://wiki.postgresql.org/wiki/Submitting_a_Patch

 And add your patch here:

 https://commitfest.postgresql.org/action/commitfest_view/open

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Florian Pflug
On Jul26, 2011, at 18:04 , Noah Misch wrote:
 On Tue, Jul 26, 2011 at 02:09:13PM +0200, Florian Pflug wrote:
 On Jul26, 2011, at 01:57 , Noah Misch wrote:
 We could rearrange things so the xml2-config -L
 flags (or lack thereof) take priority over a --with-libraries directory for
 the purpose of finding libxml2.
 
 Hm, how would we do that just for the purpose of finding libxml? Do autotools
 provide a way to specify per-library -L flags?
 
 Autoconf (built-in macros, that is) and Automake do not get involved in that.  I
 vaguely recall Libtool having support for it, but PostgreSQL does not use
 Automake or Libtool.  I also spoke too loosely: -L is never per-library, but you
 can emulate that by completing the search externally and passing a full path to
 the linker.

Yeah, that was what I thought would be the only way too. I had the slight hope
that I had missed something, but unfortunately it seems I didn't :-(

 Honestly, the benefit is probably too light to justify going to this trouble.

Yep. It'd probably be weeks, if not months, until we made that work on all
supported platforms. And by the time it does, we'd probably have re-invented
a sizeable portion of libtool.

best regards,
Florian Pflug




Re: [HACKERS] write scalability

2011-07-26 Thread Robert Haas
On Tue, Jul 26, 2011 at 11:40 AM, Pavan Deolasee
pavan.deola...@gmail.com wrote:
 On Tue, Jul 26, 2011 at 9:07 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jul 25, 2011 at 10:14 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 07/25/2011 04:07 PM, Robert Haas wrote:

 I did 5-minute pgbench runs with unlogged tables and with permanent
 tables, restarting the database server and reinitializing the tables
 between each run.

 Database scale?  One or multiple pgbench worker threads?  A reminder on the
 amount of RAM in the server would be helpful for interpreting the results
 too.

 Ah, sorry.  scale = 100, so small.  pgbench invocation is:


 It might be worthwhile to test only with the accounts and history
 table and also increasing the number of statements in a transaction.
 Otherwise the tiny tables can quickly become a bottleneck.

What kind of bottleneck?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] vacuumlo patch

2011-07-26 Thread David Fetter
On Tue, Jul 26, 2011 at 12:18:31PM -0400, Timothy D. F. Lewis wrote:
 On Mon, Jul 25, 2011 at 7:28 PM, Robert Haas robertmh...@gmail.com wrote:
 
  On Mon, Jul 25, 2011 at 9:37 AM, Tim elatl...@gmail.com wrote:
   Updated the patch to also apply when the no-action flag is enabled.
 
  You may want to read this:
 
  http://wiki.postgresql.org/wiki/Submitting_a_Patch
 
  And add your patch here:
 
  https://commitfest.postgresql.org/action/commitfest_view/open
 
  --
  Robert Haas
  EnterpriseDB: http://www.enterprisedb.com
  The Enterprise PostgreSQL Company
 
 I'm not sure what David did for me so as per Robert's suggestion I have
 added this patch to the commit fest:
 https://commitfest.postgresql.org/action/patch_view?id=609
 I'm hoping I have not put this patch in more than one workflow:
 http://wiki.postgresql.org/wiki/Submitting_a_Patch#Patch_review_and_commit

I just put your name as you've asked me to phrase it into the pending
patches part of the PostgreSQL Weekly News.

Thanks for the contribution, and here's hoping your experience here
will be pleasant and productive :)

Cheers,
David.

p.s.  We don't top-post on the PostgreSQL mailing lists.  As with the
customary reply-to-all, it's a convention we've decided works well
for us. :)
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] write scalability

2011-07-26 Thread Pavan Deolasee
On Tue, Jul 26, 2011 at 12:24 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jul 26, 2011 at 11:40 AM, Pavan Deolasee
 pavan.deola...@gmail.com wrote:
 On Tue, Jul 26, 2011 at 9:07 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jul 25, 2011 at 10:14 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 07/25/2011 04:07 PM, Robert Haas wrote:

 I did 5-minute pgbench runs with unlogged tables and with permanent
 tables, restarting the database server and reinitializing the tables
 between each run.

 Database scale?  One or multiple pgbench worker threads?  A reminder on the
 amount of RAM in the server would be helpful for interpreting the results
 too.

 Ah, sorry.  scale = 100, so small.  pgbench invocation is:


 It might be worthwhile to test only with the accounts and history
 table and also increasing the number of statements in a transaction.
 Otherwise the tiny tables can quickly become a bottleneck.

 What kind of bottleneck?


So many transactions trying to update a small set of rows in a table.
Is that what we really want to measure? My thinking is that we might
see a different result if they are updating different parts of the
table and the transaction start/stop overhead is spread across a few
statements.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com



Re: [HACKERS] write scalability

2011-07-26 Thread Pavan Deolasee
On Tue, Jul 26, 2011 at 12:29 PM, Pavan Deolasee
pavan.deola...@gmail.com wrote:


 So many transactions trying to update a small set of rows in a table.
 Is that what we really want to measure ? My thinking is that we might
 see different result if they are updating different parts of the table
 and the transaction start/stop overhead is spread across few
 statements.


I think what I am suggesting is that the default pgbench test setup
would probably not give you good scalability as the number of clients
is increased, and one reason could be the contention on the small
tables. So it might be a good idea to get rid of those and see if we
get much better scalability and understand the other bottlenecks.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com



Re: [HACKERS] write scalability

2011-07-26 Thread Simon Riggs
On Tue, Jul 26, 2011 at 4:40 PM, Pavan Deolasee pavan.deola...@gmail.com wrote:
 On Tue, Jul 26, 2011 at 9:07 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jul 25, 2011 at 10:14 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 07/25/2011 04:07 PM, Robert Haas wrote:

 I did 5-minute pgbench runs with unlogged tables and with permanent
 tables, restarting the database server and reinitializing the tables
 between each run.

 Database scale?  One or multiple pgbench worker threads?  A reminder on the
 amount of RAM in the server would be helpful for interpreting the results
 too.

 Ah, sorry.  scale = 100, so small.  pgbench invocation is:


 It might be worthwhile to test only with the accounts and history
 table and also increasing the number of statements in a transaction.
 Otherwise the tiny tables can quickly become a bottleneck.

+1

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Robert Haas
On Mon, Jul 25, 2011 at 6:24 PM, Noah Misch n...@2ndquadrant.com wrote:
 On Fri, Jul 22, 2011 at 03:54:03PM -0400, Robert Haas wrote:
 On Fri, Jul 22, 2011 at 3:28 PM, Noah Misch n...@2ndquadrant.com wrote:
  This is attractive, and I don't see any problems with it.  (In theory, you 
  could
  hit a case where the load of resetState gives an ancient false just as 
  the
  counters wrap to match.  Given that the wrap interval is 100x as long 
  as the
  reset interval, I'm not worried about problems on actual silicon.)

 It's actually 262,144 times as long - see MSGNUMWRAPAROUND.

 Ah, so it is.

 It would be pretty easy to eliminate even the theoretical possibility
 of a race by getting rid of resetState altogether and using nextMsgNum
 = -1 to mean that.  Maybe I should go ahead and do that.

 Seems like a nice simplification.

On further reflection, I don't see that this helps: it just moves the
problem around.  With resetState as a separate variable, nextMsgNum is
never changed by anyone other than the owner, so we can never have a
stale load.  But if we overload nextMsgNum to also indicate whether
our state has been reset, then there's a race between when we load
nextMsgNum and when we load maxMsgNum (instead of code I posted
previously, which has a race between when we load resetState and when
we load maxMsgNum).  Now, as you say, it seems really, really
difficult to hit that in practice, but I don't see a way of getting
rid of the theoretical possibility without either (1) a spinlock or
(2) a fence.  (Of course, on x86, the fence could be optimized down to
a compiler barrier.)  I guess the question is "should we worry about
that?".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On further reflection, I don't see that this helps: it just moves the
 problem around.  With resetState as a separate variable, nextMsgNum is
 never changed by anyone other than the owner, so we can never have a
 stale load.  But if we overload nextMsgNum to also indicate whether
 our state has been reset, then there's a race between when we load
 nextMsgNum and when we load maxMsgNum (instead of code I posted
 previously, which has a race between when we load resetState and when
 we load maxMsgNum).  Now, as you say, it seems really, really
 difficult to hit that in practice, but I don't see a way of getting
 rid of the theoretical possibility without either (1) a spinlock or
 (2) a fence.  (Of course, on x86, the fence could be optimized down to
 a compiler barrier.)  I guess the question is should we worry about
 that?.

Uh, yes.  I've lost count of the number of times I've seen people hit
one-instruction-wide race condition windows, SIGSEGV crash based on
accessing only one byte past the valid data structure, etc etc.
The worst thing about this type of bug is that you can't reproduce the
failure when you try; doesn't mean your users won't see it.

regards, tom lane



Re: [HACKERS] problem with compiling beta3 on mingw32+WinXP

2011-07-26 Thread pasman pasmański
After reinstalling mingw is ok.


2011/7/25, pasman pasmański pasma...@gmail.com:
 After googling I found that MinGW's gcc works with 64-bit integers,
 but printf is incompatible :( . A possible workaround: include
 inttypes.h, define macros, and convert the printf strings:

 printf("%lld", (long long)100)

 2011/7/25, pasman pasmański pasma...@gmail.com:
 Hi.

 When I try to compile postgresql-9.1beta3 on mingw32, ./configure passes OK,
 but there is an error when I do make:


 gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
  -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv -I../../src/port
  -DFRONTEND -I../../src/include -I./src/include/port/win32 -DEXEC_BACKEND
  -I../../src/include/port/win32  -c -o crypt.o crypt.c
 In file included from crypt.c:44:0:
 ../../src/include/c.h:284:2: error: #error must have a working 64-bit integer datatype
 In file included from ../../src/include/c.h:851:0,
                  from crypt.c:44:
 ../../src/include/port.h:390:0: warning: fseeko redefined
 ../../src/include/pg_config_os.h:228:0: note: this is the location of the previous definition
 ../../src/include/port.h:391:0: warning: ftello redefined
 ../../src/include/pg_config_os.h:229:0: note: this is the location of the previous definition
 make[2]: *** [crypt.o] Error 1
 make[2]: Leaving directory `/home/rosinkr1/postgresql-9.1beta3/src/port'
 make[1]: *** [all-port-recurse] Error 2
 make[1]: Leaving directory `/home/rosinkr1/postgresql-9.1beta3/src'
 make: *** [all-src-recurse] Error 2

 Make version 3.81.
 Status file included.
 What is wrong ?


 
 pasman



 --
 
 pasman



-- 

pasman



Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Alvaro Herrera
Excerpts from Tom Lane's message of mar jul 26 13:43:08 -0400 2011:

 Uh, yes.  I've lost count of the number of times I've seen people hit
 one-instruction-wide race condition windows, SIGSEGV crash based on
 accessing only one byte past the valid data structure, etc etc.

I think you need a better locking protocol on that counter of yours.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] write scalability

2011-07-26 Thread Greg Smith

On 07/26/2011 12:33 PM, Pavan Deolasee wrote:

I think what I am suggesting is that the default pgbench test setup
would probably not give you a good scalability as number of clients
are increased and one reason could be the contention in the small
table. So it might be a good idea to get rid of that and see if we get
much better scalability and understand other bottlenecks.
   


You can easily see this form of contention pulling down results when the 
database itself is really small and the overall transaction rate is very 
high.  With Robert using a scale=100, no more than 80 clients, and 
transaction rates peaking at 10K TPS on a 24 core box, I wouldn't 
expect this form of contention to be a large issue.  It may be dropping 
results a few percent, but I'd be surprised if it was any more than that.


It's easy enough to use -N to skip the updates to the smaller tellers 
and branches table to pull that out of the way.  TPS will go up, because 
it's doing less per transaction.  That's not necessarily a better test 
case though, it's just a different one.  The regular case includes a lot 
of overwriting the same blocks in the hot branches and tellers tables.  
That effectively pushes a lot more I/O toward being likely to happen at 
checkpoint time.  Those tables rarely have any significant I/O outside 
of the checkpoint in the standard TPC-B like scenario, because their 
usage counts stay high most of the time.


Contention for small tables that are being heavily updated is still 
important to optimize too though.  Which type of test makes more sense 
depends on what aspect of performance you're trying to focus on.  I'll 
sometimes do a full pgbench-tools sweep (range of multiple scales and 
clients at each scale) in both regular write and -N write modes, just 
to collect the slightly different data each provides.  The form of 
checkpoint I/O spike you see at sync time is very different in the two 
cases, but both usage profiles are important to consider and optimize.  
That Robert has started with the regular case doesn't worry me too much 
now that I've seen the parameters he's using, he's not running it in a 
way where I'd expect branch/teller contention to be a major limiting 
factor on the results.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Simon Riggs
On Tue, Jul 26, 2011 at 6:36 PM, Robert Haas robertmh...@gmail.com wrote:

 Now, as you say, it seems really, really
 difficult to hit that in practice, but I don't see a way of getting
 rid of the theoretical possibility without either (1) a spinlock or
 (2) a fence.  (Of course, on x86, the fence could be optimized down to
 a compiler barrier.)  I guess the question is should we worry about
 that?.

Perhaps the answer lies in a different direction altogether?

Let me ask a few questions to stimulate a different solution

* Can we do this using an active technique (e.g. signals) rather than
a passive one (reading a counter?)

* Can we partition the sinval lock, so we have multiple copies? That
increases the task for those who trigger an invalidation, but will
relieve the pressure for most readers.

* Can we put the sinval info in a different place? e.g. inside each
lock partition.

* Why do we have a different mechanism for cache invalidation
internally (sinval) to the one we offer externally (LISTEN/NOTIFY)?
Why don't we have just one?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Florian Pflug
On Jul 26, 2011, at 17:07, Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:
 What about the suggested upgrade of the elog(ERROR) in xml_errorHandler
 to elog(FATAL)? Shall I do that too, or leave it for now?
 
 No objection here --- I had considered writing it that way myself.
 I refrained for fear of making a bad situation even worse, but if
 other people think FATAL would be the safer way, fine.

Patch attached.

pg_xml_init() now checks the error context after setting it, and
ereport(ERROR)s if the check fails. I've made it so that the errhint()
which suggests that compiling against libxml >= 2.7.3 but running
against < 2.7.3 might be the reason is only emitted if we actually
compiled against an older version. This is meant to avoid confusion
should there ever be another reason for that check to fail.

I've also changed the elog(ERROR) to elog(FATAL) in xml_errorHandler().

I've pondered whether to add a check to configure which verifies that
the headers match the libxml version exactly at compile time. In the end,
I didn't do that, for two reasons. First, there isn't anything wrong with
using older headers together with a newer libxml, so long as both versions
are either >= 2.7.3 or < 2.7.3. And second, with such a check in place,
the only way to exercise libxml's promised ABI compatibility is to upgrade
the libxml2 package after compiling postgres. That, however, is unlikely
to happen except on production servers, and so by adding the check we'd
increase the chance of ABI-compatibility problems remaining undetected
during development and testing.

best regards,
Florian Pflug


pg_xml_errctxcheck.patch
Description: Binary data



Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Alvaro Herrera
Excerpts from Simon Riggs's message of mar jul 26 14:11:19 -0400 2011:

 Let me ask a few questions to stimulate a different solution
 
 * Can we do this using an active technique (e.g. signals) rather than
 a passive one (reading a counter?)

Signals are already in use for special cases (queue is full), and I
think going through the kernel to achieve much more will lower
performance significantly.

 * Can we partition the sinval lock, so we have multiple copies? That
 increases the task for those who trigger an invalidation, but will
 relieve the pressure for most readers.

Not sure there's a way to meaningfully partition the queue.  In any
case, I think the problem being dealt with here is how to update the
read heads of the queue, not its contents.

 * Can we put the sinval info in a different place? e.g. inside each
 lock partition.

I don't think you can relate sinval messages to particular locks, which
would make this infeasible.  I think this bit is directly related to the
point above.

 * Why do we have a different mechanism for cache invalidation
 internally (sinval) to the one we offer externally (LISTEN/NOTIFY)?
 Why don't we have just one?

Performance.  Not sure there are other reasons as well; but just this
one is significant enough.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Simon Riggs
On Tue, Jul 26, 2011 at 7:24 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Simon Riggs's message of mar jul 26 14:11:19 -0400 2011:

 Let me ask a few questions to stimulate a different solution

 * Can we do this using an active technique (e.g. signals) rather than
 a passive one (reading a counter?)

 Signals are already in use for special cases (queue is full), and I
 think going through the kernel to achieve much more will lower
 performance significantly.

If there are no invalidations, there would be no signals. How would
zero signals decrease performance?


 * Can we partition the sinval lock, so we have multiple copies? That
 increases the task for those who trigger an invalidation, but will
 relieve the pressure for most readers.

 Not sure there's a way to meaningfully partition the queue.  In any
 case, I think the problem being dealt with here is how to update the
 read heads of the queue, not its contents.

I agree there's no meaningful way to partition the queue, but we can
store the information in more than one place to reduce the contention
of people requesting it.

Both those ideas are still on the table.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Robert Haas
On Tue, Jul 26, 2011 at 2:56 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, Jul 26, 2011 at 7:24 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 Excerpts from Simon Riggs's message of mar jul 26 14:11:19 -0400 2011:

 Let me ask a few questions to stimulate a different solution

 * Can we do this using an active technique (e.g. signals) rather than
 a passive one (reading a counter?)

 Signals are already in use for special cases (queue is full), and I
 think going through the kernel to achieve much more will lower
 performance significantly.

 If there are no invalidations, there would be no signals. How would
 zero signals decrease performance?

It wouldn't, although it might be bad in the case where there are lots
of temp tables being created and dropped.  I think the biggest problem
with signals is that they don't provide any meaningful synchronization
guarantees.  When you send somebody a signal, you don't really know
how long it's going to take for them to receive it.

 * Can we partition the sinval lock, so we have multiple copies? That
 increases the task for those who trigger an invalidation, but will
 relieve the pressure for most readers.

 Not sure there's a way to meaningfully partition the queue.  In any
 case, I think the problem being dealt with here is how to update the
 read heads of the queue, not its contents.

 I agree there's no meaningful way to partition the queue, but we can
 store the information in more than one place to reduce the contention
 of people requesting it.

I thought about that.  Basically, that saves you a factor of N in
contention on the read side (where N is the number of copies) and
costs you a factor of N on the write side (you have to update N
copies, taking a spinlock or lwlock for each).  In the limit, you
could do one copy of the counter per backend.

I think, though, that a lock-free implementation using memory barriers
is going to end up being the only real solution.  We might possibly
convince ourselves that we're OK with increasing the cost of
SIInsertDataEntries(), but any solution that involves replicating the
data is still based on doing at least some locking.  And I am pretty
well convinced that even one spinlock acquisition in
SIInsertDataEntries() is too many.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Tue, Jul 26, 2011 at 7:24 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 Signals are already in use for special cases (queue is full), and I
 think going through the kernel to achieve much more will lower
 performance significantly.

 If there are no invalidations, there would be no signals. How would
 zero signals decrease performance?

But if there *is* an invalidation (which is not a negligible case),
it'd get significantly slower.

regards, tom lane



Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Simon Riggs
On Tue, Jul 26, 2011 at 8:11 PM, Robert Haas robertmh...@gmail.com wrote:

 It wouldn't, although it might be bad in the case where there are lots
 of temp tables being created and dropped.

Do temp tables cause relcache invalidations?

That seems like something we'd want to change in itself.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] Pull up aggregate sublink (was: Parameterized aggregate subquery (was: Pull up aggregate subquery))

2011-07-26 Thread Yeb Havinga

On 2011-07-22 17:35, Hitoshi Harada wrote:

2011/7/23 Yeb Havinga yebhavi...@gmail.com:

Works like a charm :-). However, now there is always a copyObject of a
subquery even when the subquery is not safe for qual pushdown. The problem
with the previous issafe was that it was only assigned for
rel->baserestrictinfo != NIL. If it is assigned before the if statement, it
still works. See attached patch that avoids subquery copy for unsafe
subqueries, and also exits best_inner_subqueryscan before palloc of
different types in case of unsafe queries.


Ah, yeah, right. Too quick fix bloated my brain :P Thanks for testing!
I'll check it more.


A few days ago I read Tomas Vondra's blog post about dss tpc-h queries 
on PostgreSQL at 
http://fuzzy.cz/en/articles/dss-tpc-h-benchmark-with-postgresql/ - in 
which he showed how to manually pull up a dss subquery to get a large 
speed up. Initially I thought: cool, this is probably now handled by 
Hitoshi's patch, but it turns out the subquery type in the dss query is 
different.


The original and rewritten queries are below. The debug_print_plan 
output shows the subquery is called from an opexpr (l_quantity > 
subquery output) and the sublink type is EXPR_SUBLINK. Looking at the 
source code; pull_up_sublink only considers ANY and EXISTS sublinks. I'm 
wondering if this could be expanded to deal with EXPR sublinks. Clearly 
in the example Tomas has given this can be done. I'm wondering if there 
are show stoppers that prevent this to be possible in the general case, 
but can't think of any, other than the case of a sublink returning NULL 
and the opexpr is part of a larger OR expression or IS NULL; in which 
case it should not be pulled up, or perhaps it could be pulled up as 
outer join.


Thoughts?

regards,
Yeb


The original query:

tpch=# explain select
sum(l_extendedprice) / 7.0 as avg_yearly
from
lineitem,
part
where
p_partkey = l_partkey
and p_brand = 'Brand#13'
and p_container = 'JUMBO PKG'
and l_quantity > (
select
0.2 * avg(l_quantity)
from
lineitem
where
l_partkey = p_partkey
)
LIMIT 1;
 QUERY PLAN

 Limit  (cost=183345309.79..183345309.81 rows=1 width=8)
   ->  Aggregate  (cost=183345309.79..183345309.81 rows=1 width=8)
         ->  Hash Join  (cost=2839.99..183345307.76 rows=815 width=8)
               Hash Cond: (public.lineitem.l_partkey = part.p_partkey)
               Join Filter: (public.lineitem.l_quantity > (SubPlan 1))
               ->  Seq Scan on lineitem  (cost=0.00..68985.69 rows=2399869 width=17)
               ->  Hash  (cost=2839.00..2839.00 rows=79 width=4)
                     ->  Seq Scan on part  (cost=0.00..2839.00 rows=79 width=4)
                           Filter: ((p_brand = 'Brand#13'::bpchar) AND (p_container = 'JUMBO PKG'::bpchar))
               SubPlan 1
                 ->  Aggregate  (cost=74985.44..74985.46 rows=1 width=5)
                       ->  Seq Scan on lineitem  (cost=0.00..74985.36 rows=31 width=5)
                             Filter: (l_partkey = part.p_partkey)

manually rewritten variant:

tpch=# explain select
sum(l_extendedprice) / 7.0 as avg_yearly
from
lineitem,
part,
(SELECT l_partkey AS agg_partkey,
0.2 * avg(l_quantity) AS avg_quantity
  FROM lineitem GROUP BY l_partkey) part_agg
where
p_partkey = l_partkey
and agg_partkey = l_partkey
and p_brand = 'Brand#13'
and p_container = 'JUMBO PKG'
and l_quantity > avg_quantity
LIMIT 1;
   QUERY PLAN

 Limit  (cost=179643.88..179643.89 rows=1 width=8)
   ->  Aggregate  (cost=179643.88..179643.89 rows=1 width=8)
         ->  Hash Join  (cost=161865.21..178853.91 rows=315985 width=8)
               Hash Cond: (public.lineitem.l_partkey = part.p_partkey)
               Join Filter: (public.lineitem.l_quantity > ((0.2 * avg(public.lineitem.l_quantity))))
               ->  HashAggregate  (cost=80985.04..82148.65 rows=77574 width=9)
                     ->  Seq Scan on lineitem  (cost=0.00..68985.69 rows=2399869 width=9)
               ->  Hash  (cost=80849.63..80849.63 rows=2444 width=21)
                     ->  Hash Join  (cost=2839.99..80849.63 rows=2444 width=21)
                           Hash Cond: (public.lineitem.l_partkey = part.p_partkey)
                           ->  Seq Scan on lineitem  (cost=0.00..68985.69 rows=2399869 width=17)
                           ->  Hash  (cost=2839.00..2839.00 rows=79 width=4)
                                 ->  Seq Scan on part  (cost=0.00..2839.00 rows=79 width=4)
                                       Filter: ((p_brand = 'Brand#13'::bpchar) AND (p_container = 'JUMBO PKG'::bpchar))





Re: [HACKERS] WIP: Fast GiST index build

2011-07-26 Thread Alexander Korotkov
On Tue, Jul 26, 2011 at 9:24 PM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 That was a quite off-the-cuff remark, so I took the patch and culled out
 loaded-tree bookkeeping. A lot of other changes fell off from that, so it
 took me quite some time to get it working again, but here it is. This is a
 *lot* smaller patch, although that's partly explained by the fact that I
 left out some features: prefetching and the neighbor relocation code is
 gone.

 I'm pretty exhausted by this, so I just wanted to send this out without
 further analysis. Let me know if you have questions on the approach taken.
 I'm also not sure how this performs compared to your latest patch, I haven't
 done any performance testing. Feel free to use this as is, or as a source of
 inspiration :-).

I was also going to try to avoid keeping the loaded-tree hash. This might help
me a lot. Thanks.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Simon Riggs
On Tue, Jul 26, 2011 at 8:11 PM, Robert Haas robertmh...@gmail.com wrote:

 * Can we partition the sinval lock, so we have multiple copies? That
 increases the task for those who trigger an invalidation, but will
 relieve the pressure for most readers.

 Not sure there's a way to meaningfully partition the queue.  In any
 case, I think the problem being dealt with here is how to update the
 read heads of the queue, not its contents.

 I agree there's no meaningful way to partition the queue, but we can
 store the information in more than one place to reduce the contention
 of people requesting it.

 I thought about that.  Basically, that saves you a factor of N in
 contention on the read side (where N is the number of copies) and
 costs you a factor of N on the write side (you have to update N
 copies, taking a spinlock or lwlock for each).  In the limit, you
 could do one copy of the counter per backend.

 I think, though, that a lock-free implementation using memory barriers
 is going to end up being the only real solution.  We might possibly
 convince ourselves that we're OK with increasing the cost of
 SIInsertDataEntries(), but any solution that involves replicating the
 data is still based on doing at least some locking.  And I am pretty
 well convinced that even one spinlock acquisition in
 SIInsertDataEntries() is too many.

You might be right, but I think we have little knowledge of how some
memory barrier code you haven't written yet affects performance on
various architectures.

A spinlock per backend would cache very nicely, now you mention it. So
my money would be on the multiple copies.

It's not completely clear to me that updating N copies would be more
expensive. Accessing N low contention copies rather than 1
high-contention value might actually be a win.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Tom Lane
Noah Misch n...@2ndquadrant.com writes:
 On Thu, Jul 21, 2011 at 11:37:27PM -0400, Robert Haas wrote:
 I think I have a simpler idea, though:
 before acquiring any locks, just have SIGetDataEntries() do this:
 
 +   if (stateP->nextMsgNum == segP->maxMsgNum && !stateP->resetState)
 +   return 0;
 
 Patch (with comment explaining why I think this is OK) attached.  If
 the message numbers happen to be equal only because the counter has
 wrapped, then stateP-resetState will be true, so we'll still realize
 we need to do some work.

 This is attractive, and I don't see any problems with it.  (In theory, you 
 could
 hit a case where the load of resetState gives an ancient false just as the
 counters wrap to match.  Given that the wrap interval is 100x as long as 
 the
 reset interval, I'm not worried about problems on actual silicon.)

 +1 for doing this and moving on.

After some further reflection I believe this patch actually is pretty
safe, although Noah's explanation of why seems a bit confused.  The key
points are that (1) each of the reads is atomic, and (2) we should not
be able to see a value that is older than our last acquisition/release
of a shared memory lock.  These being granted, we will make a decision
that is correct, or at least was correct as of some time after that last
lock action.  As stated in the patch comments, we are not required to
promptly assimilate sinval actions on objects we don't hold any lock on,
so this seems sufficient.  In essence, we are relying on an assumed
prior visit to the lock manager to provide memory access synchronization
for the sinval no-work-to-do test.

The case Noah speculates about above amounts to supposing that this
reasoning fails to hold for the resetState value, and I don't see why
that would be true.  Even if the above scenario did manage to happen,
it would not mean that we missed the reset, just that we hadn't noticed
it *yet*.  And by hypothesis, none of the as-yet-not-seen catalog
updates are a problem for us.

BTW, the patch really ought to add something to the comment around
line 100.

regards, tom lane



Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 Patch attached.

Will check and apply this.

 I've pondered whether to add a check to configure which verifies that
 the headers match the libxml version exactly at compile time. In the end,
 I didn't do that, for two reasons. First, there isn't anything wrong with
 using older headers together with a newer libxml, so long as both versions
 are either = 2.7.3 or  2.7.3. And second, with such a check in place,
 the only way to exercise libxml's promised ABI compatibility is to upgrade
 the libxml 2package after compiling postgres. That, however, is unlikely
 to happen except on production servers, and so by adding the check we'd
 increase the chance of ABI-compatibility problems remaining undetected
 during development and testing.

I concur with *not* doing that.

regards, tom lane



Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Noah Misch
On Tue, Jul 26, 2011 at 03:40:32PM -0400, Tom Lane wrote:
 Noah Misch n...@2ndquadrant.com writes:
  On Thu, Jul 21, 2011 at 11:37:27PM -0400, Robert Haas wrote:
  I think I have a simpler idea, though:
  before acquiring any locks, just have SIGetDataEntries() do this:
  
  +   if (stateP->nextMsgNum == segP->maxMsgNum && !stateP->resetState)
  +   return 0;
  
  Patch (with comment explaining why I think this is OK) attached.  If
  the message numbers happen to be equal only because the counter has
  wrapped, then stateP->resetState will be true, so we'll still realize
  we need to do some work.
 
  This is attractive, and I don't see any problems with it.  (In theory, you 
  could
  hit a case where the load of resetState gives an ancient false just as the
  counters wrap to match.  Given that the wrap interval is 100x as long 
  as the
  reset interval, I'm not worried about problems on actual silicon.)
 
  +1 for doing this and moving on.
 
 After some further reflection I believe this patch actually is pretty
 safe, although Noah's explanation of why seems a bit confused.  The key
 points are that (1) each of the reads is atomic, and (2) we should not
 be able to see a value that is older than our last acquisition/release
 of a shared memory lock.  These being granted, we will make a decision
 that is correct, or at least was correct as of some time after that last
 lock action.  As stated in the patch comments, we are not required to
 promptly assimilate sinval actions on objects we don't hold any lock on,
 so this seems sufficient.  In essence, we are relying on an assumed
 prior visit to the lock manager to provide memory access synchronization
 for the sinval no-work-to-do test.
 
 The case Noah speculates about above amounts to supposing that this
 reasoning fails to hold for the resetState value, and I don't see why
 that would be true.  Even if the above scenario did manage to happen,
 it would not mean that we missed the reset, just that we hadn't noticed
 it *yet*.  And by hypothesis, none of the as-yet-not-seen catalog
 updates are a problem for us.

Here's the way it can fail:

1. Backend enters SIGetDataEntries() with main memory bearing stateP->resetState
= false, stateP->nextMsgNum = 500, segP->maxMsgNum = 505.  The CPU has those
latest stateP values in cache, but segP->maxMsgNum is *not* in cache.
2. Backend stalls for long time.  Meanwhile, other backends issue
MSGNUMWRAPAROUND - 5 invalidation messages.  Main memory bears
stateP->resetState = true, stateP->nextMsgNum = 500 - MSGNUMWRAPAROUND,
segP->maxMsgNum = 500.
3. Backend wakes up, uses its cached stateP values and reads segP->maxMsgNum =
500 from main memory.  The patch's test finds no need to reset or process
invalidation messages.

That's the theoretical risk I wished to illustrate.  Though this appears
possible on an abstract x86_64 system, I think it's unrealistic to suppose that
a dirty cache line could persist *throughout* the issuance of more than 10^9
invalidation messages on a concrete implementation.

A way to think about the problem is that our read of segP->maxMsgNum is too new.
If we had used a snapshot of values as of the most recent lock
acquisition/release, there would be no problem.  It's the mix of a new-enough
stateP with an all-too-new segP that yields the anomaly.

-- 
Noah Misch               http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Noah Misch
On Tue, Jul 26, 2011 at 01:36:26PM -0400, Robert Haas wrote:
 On Mon, Jul 25, 2011 at 6:24 PM, Noah Misch n...@2ndquadrant.com wrote:
  On Fri, Jul 22, 2011 at 03:54:03PM -0400, Robert Haas wrote:
  On Fri, Jul 22, 2011 at 3:28 PM, Noah Misch n...@2ndquadrant.com wrote:
   This is attractive, and I don't see any problems with it.  (In theory, 
   you could
   hit a case where the load of resetState gives an ancient false just as 
   the
   counters wrap to match.  Given that the wrap interval is 100x as 
   long as the
   reset interval, I'm not worried about problems on actual silicon.)
 
  It's actually 262,144 times as long - see MSGNUMWRAPAROUND.
 
  Ah, so it is.
 
  It would be pretty easy to eliminate even the theoretical possibility
  of a race by getting rid of resetState altogether and using nextMsgNum
  = -1 to mean that.  Maybe I should go ahead and do that.
 
  Seems like a nice simplification.
 
 On further reflection, I don't see that this helps: it just moves the
 problem around.

Alas, yes.

 With resetState as a separate variable, nextMsgNum is
 never changed by anyone other than the owner, so we can never have a
 stale load.

It's also changed when SICleanupQueue() decides to wrap everyone.  This could
probably be eliminated by using uint32 and letting overflow take care of wrap
implicitly, but ...

 But if we overload nextMsgNum to also indicate whether
 our state has been reset, then there's a race between when we load
 nextMsgNum and when we load maxMsgNum (instead of code I posted
 previously, which has a race between when we load resetState and when
 we load maxMsgNum).  Now, as you say, it seems really, really
 difficult to hit that in practice, but I don't see a way of getting
 rid of the theoretical possibility without either (1) a spinlock or
 (2) a fence.  (Of course, on x86, the fence could be optimized down to
 a compiler barrier.)

No new ideas come to mind, here.  We could migrate back toward your original
proposal of making the counter a non-wrapping uint64; Florian described some
nice techniques for doing atomic 64-bit reads on x86:

http://archives.postgresql.org/message-id/7c94c386-122f-4918-8624-a14352e8d...@phlo.org

I'm not personally acquainted with those approaches, but they sound promising.
Since the overlap between 32-bit installations and performance-sensitive
installations is all but gone, we could even just use a spinlock under 32-bit.

 I guess the question is should we worry about that?.

I'd lean toward no.  I share Tom's unease about writing off a race condition
as being too improbable, but this is quite exceptionally improbable.  On the
other hand, some of the fixes don't look too invasive.

-- 
Noah Misch               http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] XMLATTRIBUTES vs. values of type XML

2011-07-26 Thread Florian Pflug
Hi

While reviewing the (now applied) XPATH escaping patches, Radoslaw found one
case where the previous failure of XPATH to escape its return value was offset
by XMLATTRIBUTES' insistence on escaping all input values, even if they're
already of type XML.

To wit, if you do

  SELECT XMLELEMENT(NAME t, XMLATTRIBUTES('&amp;'::XML AS a))

you get

 xmlelement 

 <t a="&amp;amp;"/>

which is somewhat surprising. Especially since

  SELECT XMLELEMENT(NAME t, '&amp;'::XML);

gives

  xmlelement  
--
 <t>&amp;</t>

as one would expect. Now, it seems rather unlikely that a real application
would actually contain a query similar to the former one - you usually don't
store XML in the attributes of XML nodes. But since XPATH() returns XML, it's
not unlikely that an application would do

  SELECT XMLELEMENT(NAME ..., XMLATTRIBUTES((XPATH(...))[1] AS ...))

As it stands, on 9.2 the values returned by the XPath expression will be
escaped twice - once by XPATH and a second time by XMLATTRIBUTES, while on 9.1
the value will be escaped only once.

Since this seems to be the most likely situation in which XMLATTRIBUTES would
receive an input argument of type XML, and since it's not entirely logical for
XMLATTRIBUTES to unconditionally escape values already of type XML, I propose
to change the behaviour of XMLATTRIBUTES as follows.

Values not of type XML are treated as they always have been, i.e. all
special characters (<, >, &, ", ') are replaced by an entity reference (&lt;,
&gt;, &amp;, &quot;, &apos;).

Values of type XML are assumed to be already escaped, so '&' isn't treated as
a special character to avoid double escaping. The other special characters
(<, >, ", ') are escaped as usual.

The safety of this relies on the fact that '&' may never appear in a
well-formed XML document, except as part of an entity reference. This seems to
be the case, except if CDATA sections are involved, which may contain plain
ampersands. So the actual escaping rule would have to be a bit more complex
than I made it sound above - we'd need to detect CDATA sections, and
re-enable the escaping of '&' until the end of such a section.

To actually implement this, I'd remove the use of libxml from the
implementation of XMLELEMENT, and instead use our already existing
escape_xml() function, enhanced with the ability to handle the partial
escaping algorithm outlined above. We already only use libxml to escape
attribute values, so doing that isn't a radical departure from xml.c's ways.
As an added benefit, all the encoding-related problems of XMLELEMENT and
XMLATTRIBUTES would go away once libxml is removed from this code path. So
XPATH() would be the only remaining function which breaks in non-UTF-8
databases.

Comments? Thoughts? Suggestions?

best regards,
Florian Pflug






Re: [HACKERS] isolation test deadlocking on buildfarm member coypu

2011-07-26 Thread Alvaro Herrera
After committing Noah's patch to fix the isolation tests, there have
been two more failures in Rémi's machines pika and coypu:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pikadt=2011-07-24%2006%3A45%3A45
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=coypudt=2011-07-23%2021%3A50%3A54

In both cases, the failure is identical in fk-deadlock2:

== pgsql.20950/src/test/isolation/regression.diffs ===
*** /home/pgbuildfarm/workdir/HEAD/pgsql.20950/src/test/isolation/expected/fk-deadlock2.out  Sun Jul 24 08:46:44 2011
--- /home/pgbuildfarm/workdir/HEAD/pgsql.20950/src/test/isolation/results/fk-deadlock2.out  Sun Jul 24 15:11:42 2011
***
*** 22,29 
  step s2u1:  UPDATE B SET Col2 = 1 WHERE BID = 2; 
  step s1u2:  UPDATE B SET Col2 = 1 WHERE BID = 2;  waiting ...
  step s2u2:  UPDATE B SET Col2 = 1 WHERE BID = 2; 
- step s1u2: ... completed
  ERROR:  deadlock detected
  step s1c:  COMMIT; 
  step s2c:  COMMIT; 
  
--- 22,29 
  step s2u1:  UPDATE B SET Col2 = 1 WHERE BID = 2; 
  step s1u2:  UPDATE B SET Col2 = 1 WHERE BID = 2;  waiting ...
  step s2u2:  UPDATE B SET Col2 = 1 WHERE BID = 2; 
  ERROR:  deadlock detected
+ step s1u2: ... completed
  step s1c:  COMMIT; 
  step s2c:  COMMIT; 



I think the only explanation necessary for this is that one process
reports its status before the other one.  I think it would be enough to
add another variant of the expected file to fix this problem, but I
don't quite want to do that because we already have three of them, and I
think we would need to add one per existing expected, so we'd end up
with 6 expected files which would be a pain to work with.

Not quite sure what to do with this.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Tom Lane
Noah Misch n...@2ndquadrant.com writes:
 On Tue, Jul 26, 2011 at 03:40:32PM -0400, Tom Lane wrote:
 After some further reflection I believe this patch actually is pretty
 safe, although Noah's explanation of why seems a bit confused.

 Here's the way it can fail:

 1. Backend enters SIGetDataEntries() with main memory bearing
 stateP->resetState = false, stateP->nextMsgNum = 500, segP->maxMsgNum = 505.
 The CPU has those latest stateP values in cache, but segP->maxMsgNum is *not*
 in cache.
 2. Backend stalls for long time.  Meanwhile, other backends issue
 MSGNUMWRAPAROUND - 5 invalidation messages.  Main memory bears
 stateP->resetState = true, stateP->nextMsgNum = 500 - MSGNUMWRAPAROUND,
 segP->maxMsgNum = 500.
 3. Backend wakes up, uses its cached stateP values and reads segP->maxMsgNum =
 500 from main memory.  The patch's test finds no need to reset or process
 invalidation messages.

[ squint... ]  Hmm, you're right.  The case where this would break
things is if (some of) the five unprocessed messages relate to some
object we've just locked.  But the initial state you describe would be
valid right after obtaining such a lock.
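To make the failure mode concrete, here is a toy sketch pairing the cached values from step 1 with the fresh maxMsgNum from step 2. It assumes the fast-path test has the general shape "no reset pending, and no messages beyond the ones we've read"; the MSGNUMWRAPAROUND value is illustrative only (the real constant lives in sinvaladt.c):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative value only; the real constant is defined in sinvaladt.c. */
#define MSGNUMWRAPAROUND (1 << 30)

/*
 * Sketch of an unlocked fast-path test of the kind under discussion:
 * true means "nothing to reset and no unread messages".
 */
static bool
nothing_to_do(bool resetState, int nextMsgNum, int maxMsgNum)
{
    return !resetState && nextMsgNum == maxMsgNum;
}
```

With the stale cached stateP values (resetState = false, nextMsgNum = 500) and the freshly read maxMsgNum = 500, the test wrongly concludes there is nothing to do, while the coherent view in main memory (resetState = true, nextMsgNum = 500 - MSGNUMWRAPAROUND) says a reset is required.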

 That's the theoretical risk I wished to illustrate.  Though this appears
 possible on an abstract x86_64 system, I think it's unrealistic to suppose that
 a dirty cache line could persist *throughout* the issuance of more than 10^9
 invalidation messages on a concrete implementation.

Dirty cache line, maybe not, but what if the assembly code commands the
CPU to load those variables into CPU registers before doing the
comparison?  If they're loaded with maxMsgNum coming in last (or at
least after resetState), I think you can have the problem without any
assumptions about cache line behavior at all.  You just need the process
to lose the CPU at the right time.

If we marked the pointers volatile, we could probably ensure that the
assembly code tests resetState last, but that's not sufficient to avoid
the stale-cache-line risk.

regards, tom lane



Re: [HACKERS] Pull up aggregate sublink (was: Parameterized aggregate subquery (was: Pull up aggregate subquery))

2011-07-26 Thread Tom Lane
Yeb Havinga yebhavi...@gmail.com writes:
 A few days ago I read Tomas Vondra's blog post about dss tpc-h queries 
 on PostgreSQL at 
 http://fuzzy.cz/en/articles/dss-tpc-h-benchmark-with-postgresql/ - in 
 which he showed how to manually pull up a dss subquery to get a large 
 speed up. Initially I thought: cool, this is probably now handled by 
 Hitoshi's patch, but it turns out the subquery type in the dss query is 
 different.

Actually, I believe this example is the exact opposite of the
transformation Hitoshi proposes.  Tomas was manually replacing an
aggregated subquery by a reference to a grouped table, which can be
a win if the subquery would be executed enough times to amortize
calculation of the grouped table over all the groups (some of which
might never be demanded by the outer query).  Hitoshi was talking about
avoiding calculations of grouped-table elements that we don't need,
which would be a win in different cases.  Or at least that was the
thrust of his original proposal; I'm not sure where the patch went since
then.

This leads me to think that we need to represent both cases as the same
sort of query and make a cost-based decision as to which way to go.
Thinking of it as a pull-up or push-down transformation is the wrong
approach because those sorts of transformations are done too early to
be able to use cost comparisons.
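For concreteness, the two query shapes Tom contrasts look like this (table and column names invented for illustration):

```sql
-- Aggregated subquery: the aggregate is evaluated once per outer row,
-- but only for the groups the outer query actually demands.
SELECT o.id,
       (SELECT max(l.price) FROM lineitem l WHERE l.order_id = o.id)
FROM orders o;

-- Reference to a grouped table: the aggregate is computed once per group,
-- including groups the outer query may never demand.
SELECT o.id, g.max_price
FROM orders o
LEFT JOIN (SELECT order_id, max(price) AS max_price
           FROM lineitem
           GROUP BY order_id) g ON g.order_id = o.id;
```

Which form wins depends on how many outer rows there are relative to the number of groups, which is exactly the cost-based decision being argued for.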

regards, tom lane



Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Noah Misch
On Tue, Jul 26, 2011 at 05:05:15PM -0400, Tom Lane wrote:
 Noah Misch n...@2ndquadrant.com writes:
  That's the theoretical risk I wished to illustrate.  Though this appears
  possible on an abstract x86_64 system, I think it's unrealistic to suppose that
  a dirty cache line could persist *throughout* the issuance of more than 10^9
  invalidation messages on a concrete implementation.
 
 Dirty cache line, maybe not, but what if the assembly code commands the
 CPU to load those variables into CPU registers before doing the
 comparison?  If they're loaded with maxMsgNum coming in last (or at
 least after resetState), I think you can have the problem without any
 assumptions about cache line behavior at all.  You just need the process
 to lose the CPU at the right time.

True.  If the compiler places the resetState load first, you could hit the
anomaly by merely setting a breakpoint on the next instruction, waiting for
exactly MSGNUMWRAPAROUND messages to enqueue, and letting the backend continue.
I think, though, we should either plug that _and_ the cache incoherency case or
worry about neither.

-- 
Noah Misch                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Tom Lane
Noah Misch n...@2ndquadrant.com writes:
 On Tue, Jul 26, 2011 at 05:05:15PM -0400, Tom Lane wrote:
 Dirty cache line, maybe not, but what if the assembly code commands the
 CPU to load those variables into CPU registers before doing the
 comparison?  If they're loaded with maxMsgNum coming in last (or at
 least after resetState), I think you can have the problem without any
 assumptions about cache line behavior at all.  You just need the process
 to lose the CPU at the right time.

 True.  If the compiler places the resetState load first, you could hit the
 anomaly by merely setting a breakpoint on the next instruction, waiting for
 exactly MSGNUMWRAPAROUND messages to enqueue, and letting the backend continue.
 I think, though, we should either plug that _and_ the cache incoherency case or
 worry about neither.

How do you figure that?  The poor-assembly-code-order risk is both a lot
easier to fix and a lot higher probability.  Admittedly, it's still way
way down there, but you only need a precisely-timed sleep, not a
precisely-timed sleep *and* a cache line that somehow remained stale.

regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Add missing newlines at end of error messages

2011-07-26 Thread David Fetter
This seems like a mechanical check.  Should it be part of what gets
checked when people push?

Cheers,
David.
On Tue, Jul 26, 2011 at 08:30:16PM +, Peter Eisentraut wrote:
 Add missing newlines at end of error messages
 
 Branch
 --
 master
 
 Details
 ---
 http://git.postgresql.org/pg/commitdiff/e67efb01e886d69d40d1cd87fba4507e8bb1035e
 
 Modified Files
 --
 src/bin/psql/command.c |2 +-
 src/bin/psql/common.c  |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
 
 
 -- 
 Sent via pgsql-committers mailing list (pgsql-committ...@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-committers

-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] storing TZ along timestamps

2011-07-26 Thread Jim Nasby
On Jul 25, 2011, at 6:22 PM, Robert Haas wrote:
 On Mon, Jul 25, 2011 at 6:26 PM, Jim Nasby j...@nasby.net wrote:
 Hrm, don't we only pull in ZIC info on a reload? Or do we actually refer to 
 it dynamically all the time? Perhaps we can enforce that we'll only 
 recognize new TZ info as part of a config reload?
 
 Hmm.  That might work in theory, but I don't see any good way to
 update every database's tz table on each reload.

I'm assuming that the issue here is that multiple backends could be connected 
to the same database, and we don't want all of them to try to actually do the 
updates, only the first one that discovers the change. If that's the problem 
you foresee then perhaps it's a non-issue... if each backend only updates 
things that have actually changed, and they do that with race-free 'merge' 
logic, then only the first backend to attempt the update would end up finding 
actual work to do.
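The race-free merge Jim alludes to is usually written as a retry loop of the kind shown in the PL/pgSQL documentation's error-trapping example (sketch only; the table and function names here are invented):

```sql
-- Sketch: race-free merge via retry on unique_violation.
CREATE OR REPLACE FUNCTION merge_tz(p_name text) RETURNS void AS $$
BEGIN
    LOOP
        -- already present?  nothing to do
        PERFORM 1 FROM tz_names WHERE name = p_name;
        IF FOUND THEN
            RETURN;
        END IF;
        BEGIN
            INSERT INTO tz_names(name) VALUES (p_name);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- someone inserted concurrently; loop back and re-check
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```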

Or are you seeing a problem I'm missing?
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] storing TZ along timestamps

2011-07-26 Thread Christopher Browne
On Tue, Jul 26, 2011 at 6:45 PM, Jim Nasby j...@nasby.net wrote:
 On Jul 25, 2011, at 6:22 PM, Robert Haas wrote:
 On Mon, Jul 25, 2011 at 6:26 PM, Jim Nasby j...@nasby.net wrote:
 Hrm, don't we only pull in ZIC info on a reload? Or do we actually refer to 
 it dynamically all the time? Perhaps we can enforce that we'll only 
 recognize new TZ info as part of a config reload?

 Hmm.  That might work in theory, but I don't see any good way to
 update every database's tz table on each reload.

 I'm assuming that the issue here is that multiple backends could be connected 
 to the same database, and we don't want all of them to try to actually do the 
 updates, only the first one that discovers the change. If that's the problem 
 you foresee then perhaps it's a non-issue... if each backend only updates 
 things that have actually changed, and they do that with race-free 'merge' 
 logic, then only the first backend to attempt the update would end up finding 
 actual work to do.

 Or are you seeing a problem I'm missing?

What if 4 backends concurrently are the first ones to try to
simultaneously add South America/Ruritania, and...

1.  Connection #1 came in 'first', but rolls back its transaction.
2.  Connection #2 came in 'second', and also winds up rolling back its
transaction because the connection fails due to a network problem.
3.  Connection #3 actually completes.  But doesn't commit until after #4.
4.  Connection #4 started last, but turns out to COMMIT first.

The merge is a pretty bad one.  They all have to try to succeed, and
in some way that doesn't block things.

Perhaps the TZ values need to not be UNIQUE, but some process can come
in afterwards and rewrite to drop out non-unique values.  That's not
very nice either; that means you can't use a FK reference against the
TZ table.

Or you need to have something that comes in afterwards and repoints
tuples to the *real* TZ entry, which seems likely to be troublesome.

This just gets so ugly so fast; the attempt to save space by storing a
pointer to the TZ value is just filled with trouble (and potential
#fail).
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?



Re: [HACKERS] Check constraints on partition parents only?

2011-07-26 Thread Jim Nasby
On Jul 25, 2011, at 9:59 PM, Jerry Sievers wrote:
 That our version of partitioning can be overloaded like this though I
 think adds power.  A bit of which we lost adding the restriction.

That's why I'd be opposed to any partitioning scheme that removed the ability 
to have different fields in different children. We've found that ability to be 
very useful. Likewise, I think we need to have intelligent plans involving a 
parent table that's either completely empty or mostly empty.

As for dealing with inheritance and putting stuff on some children but not 
others, take a look at http://pgfoundry.org/projects/enova-tools/. There's a 
presentation there that discusses how we solved these issues and it includes 
the tools we created to do it. Note that we're close to releasing a cleaner 
version of that stuff, so if you decide to use it please ping me off-list if we 
haven't gotten the new stuff posted.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] storing TZ along timestamps

2011-07-26 Thread Jim Nasby
On Jul 26, 2011, at 5:56 PM, Christopher Browne wrote:
 I'm assuming that the issue here is that multiple backends could be 
 connected to the same database, and we don't want all of them to try to 
 actually do the updates, only the first one that discovers the change. If 
 that's the problem you foresee then perhaps it's a non-issue... if each 
 backend only updates things that have actually changed, and they do that 
 with race-free 'merge' logic, then only the first backend to attempt the 
 update would end up finding actual work to do.
 
 Or are you seeing a problem I'm missing?
 
 What if 4 backends concurrently are the first ones to try to
 simultaneously add South America/Ruritania, and...
 
 1.  Connection #1 came in 'first', but rolls back its transaction.
 2.  Connection #2 came in 'second', and also winds up rolling back its
 transaction because the connection fails due to a network problem.
 3.  Connection #3 actually completes.  But doesn't commit until after #4.
 4.  Connection #4 started last, but turns out to COMMIT first.

Ugh, I didn't realize that a reload would take effect in the middle of a 
transaction. That certainly kills what I proposed.

Though, now I'm wondering why this info would need to be in every database 
anyway... certainly this should be treated as global data, and if that's the 
case then only one process needs to update it. Though I'm not sure if it's 
possible for global data to be ACID.

Anyway, rather than continue this on-list, I'm going to get Alvaro to think 
about it in more detail and see what he comes up with.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] New partitioning WAS: Check constraints on partition parents only?

2011-07-26 Thread Josh Berkus
Jim,

 That's why I'd be opposed to any partitioning scheme that removed the ability 
 to have different fields in different children. We've found that ability to 
 be very useful. Likewise, I think we need to have intelligent plans involving 
 a parent table that's either completely empty or mostly empty.

Well, I don't think that anyone is proposing making constraint exclusion
go away.  However, we also need a new version of partitioning which
happens below the table level.  I don't agree that the new
partitioning needs -- at least at the start -- the level of flexibility
which CE gives the user.  In order to get simplicity, we have to
sacrifice flexibility.

In fact, I'd suggest extreme simplicity for the first version of this,
with just key partitioning.  That is:

CREATE TABLE table_name (
... cols ... )
PARTITION ON key_expression
[ AUTOMATIC CREATE ];

... where key_expression can be any immutable expression on one or
more columns of some_table.  This actually covers range partitioning as
well, provided that the ranges can be expressed as the results of an
expression (e.g. EXTRACT ('month' FROM date_processed ) ).

For the optional AUTOMATIC CREATE phrase, new values for key_expression
would result in the automatic creation of new partitions when they
appear (this has some potential deadlocking issues, so it's not ideal
for a lot of applications).  Otherwise, you'd create partitions manually:

CREATE PARTITION ON table_name KEY key_value;
DROP PARTITION ON table_name KEY key_value;

... where key_value is some valid value which could result from
key_expression.
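Put together, a monthly-partitioned table under this proposal might look like the following. To be clear, this is the hypothetical syntax being proposed, not something any PostgreSQL release accepts:

```sql
-- Hypothetical syntax from the proposal above.
CREATE TABLE measurements (
    reading    numeric,
    logged_at  timestamptz
)
PARTITION ON EXTRACT('month' FROM logged_at);

-- Without AUTOMATIC CREATE, partitions are managed by hand:
CREATE PARTITION ON measurements KEY 7;
DROP PARTITION ON measurements KEY 7;
```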

Yes, this is a very narrow and simplistic partitioning spec.  However,
it would cover 80% of the use cases I see in the field or on IRC, while
being 80% simpler than CE.  And CE would still be there for those who
need it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] psql: bogus descriptions displayed by \d+

2011-07-26 Thread Josh Kupershmidt
On Tue, Jul 26, 2011 at 9:53 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jul 25, 2011 at 10:29 PM, Josh Kupershmidt schmi...@gmail.com wrote:
 I think this is basically the right approach but I found what you did
 here a bit wordy, so I simplified it, committed it, and back-patched
 to 9.0 with suitable adjustment.  Hopefully I didn't simplify it into
 a form you don't like.

That looks fine. Minor grammar quibble about:

+  When commenting on a column,
+  <replaceable class="parameter">relation_name</replaceable> must refer
+  to a table, view, composite types, or foreign table.

"types" should probably be the singular "type".

  * get rid of the bogus Description outputs for \d+ sequence_name
 and \d+ index_name

 Committed this part to head with minor tweaks.

Thanks for the commit.

  * The Storage column for \d+ sequence_name is correct, I suppose,
 but repetitive

 I'm OK with removing that.

 Hrm, would it be better to keep that "Storage" bit around in some
 non-repetitive form, maybe on its own line below the table output?

 Well, I don't really see that it has any value.  I'd probably just
 leave it the way it is, but if we're going to change something, I
 would favor removing it over relocating it.

I notice the Storage information is also repeated for multi-column
indexes. I don't mind leaving this wart as-is for now, since
single-column indexes are probably the norm, and we would presumably
want to fix both types in one go.

Josh



Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Robert Haas
On Tue, Jul 26, 2011 at 3:34 PM, Simon Riggs si...@2ndquadrant.com wrote:
 You might be right, but I think we have little knowledge of how some
 memory barrier code you haven't written yet effects performance on
 various architectures.

 A spinlock per backend would cache very nicely, now you mention it. So
 my money would be on the multiple copies.

Maybe so, but you can see from the numbers in my OP that the results
still leave something to be desired.

 It's not completely clear to me that updating N copies would be more
 expensive. Accessing N low contention copies rather than 1
 high-contention value might actually be a win.

Yeah, I haven't tested that approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Robert Haas
On Tue, Jul 26, 2011 at 3:25 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, Jul 26, 2011 at 8:11 PM, Robert Haas robertmh...@gmail.com wrote:
 It wouldn't, although it might be bad in the case where there are lots
 of temp tables being created and dropped.

 Do temp tables cause relcache invalidations?

 That seems like something we'd want to change in itself.

I agree.  Unfortunately, I think it's a non-trivial fix.

I've also been wondering if we could avoid taking an
AccessExclusiveLock on a newly created (temporary?) table.  It seems
like no one should be able to see it until commit, at which point we'd
be releasing the lock anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] sinval synchronization considered harmful

2011-07-26 Thread Robert Haas
On Tue, Jul 26, 2011 at 4:38 PM, Noah Misch n...@2ndquadrant.com wrote:
 No new ideas come to mind, here.

OK, I have a new idea.  :-)

1. Add a new flag to each procState called something like timeToPayAttention.
2. Each call to SIInsertDataEntries() iterates over all the ProcStates
whose index is < lastBackend and sets stateP->timeToPayAttention =
TRUE for each.
3. At the beginning of SIGetDataEntries(), we do an unlocked if
(!stateP->timeToPayAttention) return 0.
4. Immediately following that if statement and before acquiring any
locks, we set stateP->timeToPayAttention = FALSE.

The LWLockRelease() in SIInsertDataEntries() will be sufficient to force
all of the stateP->timeToPayAttention writes out to main memory, and
the read side is OK because we either just took a lock (which acted as
a fence) or else there's a race anyway.  But unlike my previous
proposal, it doesn't involve *comparing* anything.  We needn't worry
about whether we could read two different values that are through
great misfortune out of sync because we're only reading one value.

If, by chance, the value is set to true just after we set it to false,
that won't mess us up either: we'll still read all the messages after
acquiring the lock.

Now, there's some downside to this approach - it involves doing O(N)
work for each invalidation message we receive.  But as long as it's
only O(N) stores and not O(N) lock acquisitions and releases, I think
that might be OK.
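The protocol can be sketched like this (names follow the proposal; the locking and memory-ordering details the real code would need are deliberately elided, so this only shows the flag discipline):

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BACKENDS 8

/* Sketch only: one per-backend flag, as proposed above. */
typedef struct
{
    bool timeToPayAttention;
} ProcState;

static ProcState procStates[MAX_BACKENDS];
static int lastBackend = MAX_BACKENDS;

/* Sender side (step 2): O(N) plain stores, one per backend. */
static void
si_broadcast(void)
{
    for (int i = 0; i < lastBackend; i++)
        procStates[i].timeToPayAttention = true;
}

/*
 * Reader side (steps 3 and 4): unlocked fast path.  Returns true when
 * the backend must go on to acquire the lock and read messages; clears
 * its own flag before doing so, so a flag set after the clear still
 * forces a future pass.
 */
static bool
si_must_read(ProcState *stateP)
{
    if (!stateP->timeToPayAttention)
        return false;                    /* step 3: nothing for us */
    stateP->timeToPayAttention = false;  /* step 4: clear before locking */
    return true;
}
```

Note how only a single value is read on the fast path, which is what removes the need to worry about two loads being out of sync.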

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
