[HACKERS] How to use CreateFunctionStmt's RETURN TABLE?
Hello Guys, Lately I saw this syntax in ./src/backend/parser/gram.y, as follows:

CreateFunctionStmt:
    CREATE opt_or_replace FUNCTION func_name func_args_with_defaults
    RETURNS func_return createfunc_opt_list opt_definition
        {
            CreateFunctionStmt *n = makeNode(CreateFunctionStmt);
            n->replace = $2;
            n->funcname = $4;
            n->parameters = $5;
            n->returnType = $7;
            n->options = $8;
            n->withClause = $9;
            $$ = (Node *) n;
        }
    | CREATE opt_or_replace FUNCTION func_name func_args_with_defaults
      RETURNS TABLE '(' table_func_column_list ')' createfunc_opt_list opt_definition
        {
            CreateFunctionStmt *n = makeNode(CreateFunctionStmt);
            n->replace = $2;
            n->funcname = $4;
            n->parameters = mergeTableFuncParameters($5, $9);
            n->returnType = TableFuncTypeName($9);
            n->returnType->location = @7;
            n->options = $11;
            n->withClause = $12;
            $$ = (Node *) n;
        }
    | CREATE opt_or_replace FUNCTION func_name func_args_with_defaults
      createfunc_opt_list opt_definition
        {
            CreateFunctionStmt *n = makeNode(CreateFunctionStmt);
            n->replace = $2;
            n->funcname = $4;
            n->parameters = $5;
            n->returnType = NULL;
            n->options = $6;
            n->withClause = $7;
            $$ = (Node *) n;
        }
    ;

I do not know how to use the second production: RETURNS TABLE '(' table_func_column_list ')' createfunc_opt_list opt_definition. Could someone help me write a simple example of this syntax? Thank you very much. Looking forward to your help!
Re: [HACKERS] How to use CreateFunctionStmt's RETURN TABLE?
On 26.07.2011 10:22, _ʯͷ wrote: I do not know how to use the second syntax:RETURNS TABLE '(' table_func_column_list ')' createfunc_opt_list opt_definition. This is hardly a question related to PostgreSQL development, pgsql-general mailing list would've been more appropriate. This is documented at: http://www.postgresql.org/docs/9.0/static/sql-createfunction.html There's an example using that syntax on that page. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
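The documented example is along these lines (a minimal sketch from the CREATE FUNCTION reference; the function name dup and its columns are just illustrative):

```sql
-- A set-returning function: RETURNS TABLE (...) declares the output
-- columns, which act like implicit OUT parameters of the function.
CREATE OR REPLACE FUNCTION dup(int)
RETURNS TABLE (f1 int, f2 text)
AS $$ SELECT $1, CAST($1 AS text) || ' is text' $$
LANGUAGE SQL;

-- The function is then used as a table source in FROM:
SELECT * FROM dup(42);
```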
Re: [HACKERS] Check constraints on partition parents only?
8.4 had this change: * Force child tables to inherit CHECK constraints from parents (Alex Hunsaker, Nikhil Sontakke, Tom) You're not the only one who occasionally bangs his head against it. Sorry for the occasional head bumps :) Yeah. I think it's good that there's a barrier to blindly dropping a constraint that may be important to have on children, but there should be a way to override that. Hmmm, but then it does open up the possibility of naive users shooting themselves in the foot. It can be easy to conjure up a parent-only-constraint that does not gel too well with its children. And that's precisely why this feature was added in the first place.. Regards, Nikhils
Re: [HACKERS] vacuumlo patch
Here's another small change to the patch; it works fine for me and it quite saved my day. I'll try to submit the patch by email.

diff --git a/contrib/vacuumlo/vacuumlo.c b/contrib/vacuumlo/vacuumlo.c
index f6e2a28..1f88d72 100644
--- a/contrib/vacuumlo/vacuumlo.c
+++ b/contrib/vacuumlo/vacuumlo.c
@@ -48,6 +48,7 @@ struct _param
 	char	   *pg_host;
 	int			verbose;
 	int			dry_run;
+	int			transaction_limit;
 };
 
 int			vacuumlo(char *, struct _param *);
@@ -286,6 +287,8 @@ vacuumlo(char *database, struct _param * param)
 			}
 			else
 				deleted++;
+			if (param->transaction_limit != 0 && deleted >= param->transaction_limit)
+				break;
 		}
 		PQclear(res);
@@ -313,6 +316,7 @@ usage(const char *progname)
 	printf("  -h HOSTNAME  database server host or socket directory\n");
 	printf("  -n           don't remove large objects, just show what would be done\n");
 	printf("  -p PORT      database server port\n");
+	printf("  -l LIMIT     stop after removing LIMIT LOs\n");
 	printf("  -U USERNAME  user name to connect as\n");
 	printf("  -w           never prompt for password\n");
 	printf("  -W           force password prompt\n");
@@ -342,6 +346,7 @@ main(int argc, char **argv)
 	param.pg_port = NULL;
 	param.verbose = 0;
 	param.dry_run = 0;
+	param.transaction_limit = 0;
 
 	if (argc > 1)
 	{
@@ -359,7 +364,7 @@ main(int argc, char **argv)
 	while (1)
 	{
-		c = getopt(argc, argv, "h:U:p:vnwW");
+		c = getopt(argc, argv, "h:U:p:l:vnwW");
 		if (c == -1)
 			break;
@@ -395,6 +400,14 @@ main(int argc, char **argv)
 			}
 				param.pg_port = strdup(optarg);
 				break;
+			case 'l':
+				param.transaction_limit = strtol(optarg, NULL, 10);
+				if ((param.transaction_limit < 0) || (param.transaction_limit > 2147483647))
+				{
+					fprintf(stderr, "%s: invalid transaction limit number: %s, valid range is from 0 (disabled) to 2147483647.\n", progname, optarg);
+					exit(1);
+				}
+				break;
 			case 'h':
 				param.pg_host = strdup(optarg);
 				break;

-- View this message in context: http://postgresql.1045698.n5.nabble.com/vacuumlo-patch-tp4628522p4634026.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
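As an aside, the strtol() call in the patch above ignores parse errors: with a NULL endptr trailing garbage is silently accepted, and overflow is only reported through errno, which the patch never checks. A more defensive parse might look like this (a standalone sketch, not part of the submitted patch; parse_limit is a hypothetical helper):

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a non-negative limit option; returns -1 on any parse or range error. */
long parse_limit(const char *s)
{
	char   *end;
	long	val;

	errno = 0;
	val = strtol(s, &end, 10);
	if (errno == ERANGE || end == s || *end != '\0')
		return -1;				/* overflow, empty string, or trailing junk */
	if (val < 0 || val > INT_MAX)
		return -1;				/* outside the option's valid range */
	return val;
}
```

Checking errno and the endptr this way rejects inputs like "12x" or out-of-range numbers before they are truncated by the assignment to an int field.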
[HACKERS] Re: [HACKERS] How to use CreateFunctionStmt's RETURN TABLE?
thank you! -- Original message -- From: Heikki Linnakangas; Sent: Tuesday, July 26, 2011, 3:58 PM; To: _??; Cc: pgsql-hackers; Subject: Re: [HACKERS] How to use CreateFunctionStmt's RETURN TABLE?
Re: [HACKERS] Another issue with invalid XML values
On Jul26, 2011, at 01:57 , Noah Misch wrote: On Mon, Jul 25, 2011 at 09:06:41PM +0200, Florian Pflug wrote: On Jul25, 2011, at 20:37 , Bernd Helmle wrote: Ah, but i got now what's wrong here: configure is confusing both libxml2 installations, and a quick look into config.log proves that: it uses the xml2-config from the OSX libs (my $PATH has /usr in front of the bindir of MacPorts, though i seem to recall to have changed this in the past). Hm, but I still think there's a bug lurking there. Using a different libxml2 version for the configure checks than for actual builds surely isn't good... From looking at configure.in, it seems that we use xml2-config to figure out the CFLAGS and LDFLAGS required to build and link against libxml. I guess we somehow end up not using these flags when we later test for xmlStructuredErrorContext, but do use them during the actual build. Or maybe the order of the -I and -L flags just ends up being different in the two cases. I can reproduce similar behavior on GNU/Linux. If my setup was sufficiently similar, Bernd's problematic build would have used this sequence of directives during both configuration and build: -I/usr/include/libxml2 -I/opt/local/include -L/opt/local/lib The directories passed using --with-includes and --with-libraries took priority over those from xml2-config. Since libxml2 headers live in a `libxml2' subdirectory, --with-includes=/opt/local/include did not affect finding them. --with-libraries=/opt/local/lib *did* affect finding the library binaries, though. Therefore, he built entirely against /usr headers and /opt/local libraries. Right. Thanks for detailed analysis, Noah! What's more, xml.c actually does attempt to protect against this situation by calling LIBXML_TEST_VERSION in pg_xml_init_library(). 
But that check doesn't fire in our case, because it explicitly allows the actual library version to be newer than the header file version, as long as the major versions agree (the major version being 2 in this case). So in essence, libxml promises ABI backwards-compatibility, but breaks that promise when it comes to restoring the structured error context. The only fix I can come up with is to explicitly test whether or not we're able to restore the structured error context in pg_xml_init_library(). I'm envisioning we'd do something like

xmlStructuredErrorFunc *saved_errfunc = xmlStructuredError;
#ifdef HAVE_XMLSTRUCTUREDERRORCONTEXT
void *saved_errctx = xmlStructuredErrorContext;
#else
void *saved_errctx = xmlGenericErrorContext;
#endif

xmlSetStructuredErrorFunc((void *) MAGIC, xml_errorHandler);
#ifdef HAVE_XMLSTRUCTUREDERRORCONTEXT
if (xmlStructuredErrorContext != MAGIC)
#else
if (xmlGenericErrorContext != MAGIC)
#endif
{
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("unable to restore the libxml error context"),
             errhint("Postgres was probably built against libxml < 2.7.4 but loaded a version >= 2.7.4 at runtime")));
}
xmlSetStructuredErrorFunc(saved_errctx, saved_errfunc);

The downside of this is that a libxml version upgrade might break postgres, of course. But by the time postgres 9.1 is released, 2.7.4 will be nearly 3 years old, so I dunno how likely it is that a distro would afterwards decide to upgrade from a version earlier than that to a version later than that. We could rearrange things so the xml2-config -L flags (or lack thereof) take priority over a --with-libraries directory for the purpose of finding libxml2. Hm, how would we do that just for the purpose of finding libxml? Do autotools provide a way to specify per-library -L flags? As a side note, we don't add an rpath for libxml2 like we do for Perl and Python.
That doesn't matter on Darwin, but with GNU libc, it entails setting LD_LIBRARY_PATH or updating /etc/ld.so.conf to make the run time linker find the library binary used at build time. Sounds like we should change that. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] write scalability
On Mon, Jul 25, 2011 at 10:14 PM, Greg Smith g...@2ndquadrant.com wrote: On 07/25/2011 04:07 PM, Robert Haas wrote: I did 5-minute pgbench runs with unlogged tables and with permanent tables, restarting the database server and reinitializing the tables between each run. Database scale? One or multiple pgbench worker threads? A reminder on the amount of RAM in the server would be helpful for interpreting the results too. Ah, sorry. scale = 100, so small. pgbench invocation is: pgbench -T $time -c $clients -j $clients. 128GB RAM. The other thing I'd recommend if you're running more write-heavy tests is to turn off autovacuum. Whether or not it kicks in depends on the test duration and the TPS rate, which adds a source of variability better avoided here. It also means that faster tests end up getting penalized by having it run near their end, which makes them no longer look like fast results. Well, I kind of don't want to do that, because surely you wouldn't run that way in production. Anyway, it seems clear that nearly all the contention is in CLogControlLock, ProcArrayLock, and WALInsertLock, and I don't think shutting off autovacuum is going to change that. The point of my benchmarking here may be a bit different from the benchmarking you normally do: I'm not trying to figure out whether I've got the best hardware, or how good it is, so I don't really care about squeezing out every last tps. I just need numbers that can be reproduced quickly with different patches to see whether a given code change helps and by how much. I wasn't sure that 5-minute pgbench runs would be good enough for that, but so far the results have been stable enough that I'm fairly confident I'm seeing real effects rather than random noise. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Check constraints on partition parents only?
On Tue, Jul 26, 2011 at 4:12 AM, Nikhil Sontakke nikhil.sonta...@enterprisedb.com wrote: Yeah. I think it's good that there's a barrier to blindly dropping a constraint that may be important to have on children, but there should be a way to override that. Hmmm, but then it does open up the possibility of naive users shooting themselves in the foot. It can be easy to conjure up a parent-only-constraint that does not gel too well with its children. And that's precisely why this feature was added in the first place.. Yeah, but I think we need to take that chance. At the very least, we need to support the equivalent of a non-inherited CHECK (false) on parent tables. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] longstanding mingw warning
Why do we get this warning on Mingw?: x86_64-w64-mingw32-gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv -g -I../../../../src/include -I/home/pgrunner/bf/root/HEAD/pgsql.3896/../pgsql/src/include -I../pgsql/src/include/port/win32 -DEXEC_BACKEND -I/c/prog/mingwdep/include -I/home/pgrunner/bf/root/HEAD/pgsql.3896/../pgsql/src/include/port/win32 -DBUILDING_DLL -c -o mingwcompat.o /home/pgrunner/bf/root/HEAD/pgsql.3896/../pgsql/src/backend/port/win32/mingwcompat.c c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.3896/../pgsql/src/backend/port/win32/mingwcompat.c:60:1: warning: 'RegisterWaitForSingleObject' redeclared without dllimport attribute: previous dllimport ignored Can we get rid of it? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Check constraints on partition parents only?
On 07/26/2011 09:08 AM, Robert Haas wrote: On Tue, Jul 26, 2011 at 4:12 AM, Nikhil Sontakke nikhil.sonta...@enterprisedb.com wrote: Yeah. I think it's good that there's a barrier to blindly dropping a constraint that may be important to have on children, but there should be a way to override that. Hmmm, but then it does open up the possibility of naive users shooting themselves in the foot. It can be easy to conjure up a parent-only-constraint that does not gel too well with its children. And that's precisely why this feature was added in the first place.. Yeah, but I think we need to take that chance. At the very least, we need to support the equivalent of a non-inherited CHECK (false) on parent tables. Indeed. I usually enforce that with a trigger that raises an exception, but of course that doesn't help at all with constraint exclusion, and I saw a result just a few weeks ago (I forget the exact details) where it appeared that the plan chosen was significantly worse because the parent table wasn't excluded, so there's a non-trivial downside from having this restriction. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
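The trigger workaround Andrew describes might look like this (a sketch; the table name measurement and the function name are illustrative, assuming a plain inheritance-based partition parent):

```sql
-- Reject direct inserts into the (intended-to-be-empty) partition parent.
CREATE FUNCTION parent_no_insert() RETURNS trigger AS $$
BEGIN
    RAISE EXCEPTION 'insert into partition parent % not allowed', TG_TABLE_NAME;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER no_insert
    BEFORE INSERT ON measurement        -- the parent table
    FOR EACH ROW EXECUTE PROCEDURE parent_no_insert();
```

As noted, this enforces emptiness at runtime but gives the planner no CHECK constraint to use for constraint exclusion, which is why a declarative non-inherited CHECK (false) would be preferable.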
Re: [HACKERS] Another issue with invalid XML values
Florian Pflug f...@phlo.org writes: What's more, xml.c actually does attempt to protect against this situation by calling LIBXML_TEST_VERSION in pg_xml_init_library(). But that check doesn't fire in our case, because it explicitly allows the actual library version to be newer than the header file version, as long as the major versions agree (the major version being 2 in this case). So in essence, libxml promises ABI backwards-compatibility, but breaks that promise when it comes to restoring the structured error context. Fun. The only fix I can come up with is to explicitly test whether or not we're able to restore the structured error context in pg_xml_init_library(). Yeah, I think you are right. The downside of this is that a libxml version upgrade might break postgres, Such an upgrade would break Postgres anyway --- better we are able to detect and report it clearly than that we fail in bizarre ways. As a side note, we don't add an rpath for libxml2 like we do for Perl and Python. Sounds like we should change that. I don't think so. It would just be another headache for packagers --- in Red Hat distros, at least, packages are explicitly forbidden from containing any rpath settings. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql: bogus descriptions displayed by \d+
On Mon, Jul 25, 2011 at 10:29 PM, Josh Kupershmidt schmi...@gmail.com wrote: That seems like a good way to document this; patch for master updated. I avoided mucking with the documentation for COMMENT ON RULE and COMMENT ON TRIGGER this time; they both say table when they really mean table or view, but maybe trying to differentiate between table, table_or_view, and relation will make things overly complicated. I think this is basically the right approach but I found what you did here a bit wordy, so I simplified it, committed it, and back-patched to 9.0 with suitable adjustment. Hopefully I didn't simplify it into a form you don't like. Also, a patch against master to: * get rid of the bogus Description outputs for \d+ sequence_name and \d+ index_name This part looks OK, but instead of doing a negative test (not-index, not-sequence) let's have it do a positive test, for the same types comment.c allows. Right, fixed. Committed this part to head with minor tweaks. And while I'm messing with this, some further nitpicks about psql not addressed by these patches: * The Storage column for \d+ sequence_name is correct, I suppose, but repetitive I'm OK with removing that. Hrm, would it be better to keep that Storage bit around in some non-repetitive form, maybe on its own line below the table output? Well, I don't really see that it has any value. I'd probably just leave it the way it is, but if we're going to change something, I would favor removing it over relocating it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another issue with invalid XML values
On Tue, Jul 26, 2011 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: As a side note, we don't add an rpath for libxml2 like we do for Perl and Python. Sounds like we should change that. I don't think so. It would just be another headache for packagers --- in Red Hat distros, at least, packages are explicitly forbidden from containing any rpath settings. So what do they do about Perl and Python? Not sure I understand the virtue of being inconsistent here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Deferred partial/expression unique constraints
On Mon, Jul 25, 2011 at 2:29 PM, Jeff Davis pg...@j-davis.com wrote: On Fri, 2011-07-22 at 23:35 +0300, Peter Eisentraut wrote: On ons, 2011-07-13 at 11:26 -0400, Tom Lane wrote: Our standard reason for not implementing UNIQUE constraints on expressions has been that then you would have a thing that claims to be a UNIQUE constraint but isn't representable in the information_schema views that are supposed to show UNIQUE constraints. We avoid this objection in the current design by shoving all that functionality into EXCLUDE constraints, which are clearly outside the scope of the spec. I have never heard that reason before, and I think it's a pretty poor one. There are a lot of other things that are not representable in the information schema. +1. I think what Tom is saying is that the information_schema might appear inconsistent to someone following the spec. Can you give another example where we do something like that? http://archives.postgresql.org/pgsql-bugs/2010-08/msg00374.php -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another issue with invalid XML values
On Jul26, 2011, at 15:52 , Tom Lane wrote: Florian Pflug f...@phlo.org writes: The only fix I can come up with is to explicitly test whether or not we're able to restore the structured error context in pg_xml_init_library(). Yeah, I think you are right. The downside of this is that a libxml version upgrade might break postgres, Such an upgrade would break Postgres anyway --- better we are able to detect and report it clearly than that we fail in bizarre ways. On further reflection, instead of checking whether we can restore the error handler in pg_xml_init_library(), we could simply upgrade the elog(WARNING) in pg_xml_done() to ereport(ERROR), and include a hint that explains the probable cause. The upside being that we only fail when we actually need to restore the error handler. Since there's one caller (parse_xml_decl) that calls pg_xml_init_library() but not pg_xml_init()/pg_xml_done(), the difference isn't only academic. At least XML output will continue to work after libxml is upgraded from < 2.7.4 to >= 2.7.4. If nobody objects, I'll post a patch to that effect later today. BTW, come to think of it, I believe the elog(ERROR) that you put into xml_errorHandler should be upgraded to elog(FATAL). Once we longjmp() out of libxml, it doesn't seem safe to re-enter it, so making the backend exit seems to be the only safe thing to do. best regards, Florian Pflug
Re: [HACKERS] Another issue with invalid XML values
Florian Pflug f...@phlo.org writes: On further reflection, instead of checking whether we can restore the error handler in pg_xml_init_library(), we could simply upgrade the elog(WARNING) in pg_xml_done() to ereport(ERROR), and include a hint that explains the probable cause. The upside being that we only fail when we actually need to restore the error handler. Since there's one caller (parse_xml_decl) that calls pg_xml_init_library() but not pg_xml_init()/pg_xml_done(), the difference isn't only academic. At least XML output will continue to work after libxml is upgraded from < 2.7.4 to >= 2.7.4. Good point. But what about failing in pg_xml_init? That is, after calling xmlSetStructuredErrorFunc, check that it set the variable we expected it to set. The purpose of the check in pg_xml_done is not to detect library issues, but to detect omitted save/restore pairs and similar coding mistakes. I don't want to upgrade it to an ERROR, and I don't want to confuse people by hinting that the problem is with libxml. regards, tom lane
Re: [HACKERS] vacuumlo patch
Excerpts from Aron's message of mar jul 26 04:18:55 -0400 2011: Here's another small change to the patch, it works fine for me and it quite saved my day. I try to submit the patch by email. There's a problem with this patch: long lines are being wrapped by your email client, which makes it impossible to apply. I think it would be safer if you sent it as an attachment instead of pasting it in the body of the message. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another issue with invalid XML values
On Jul26, 2011, at 16:22 , Tom Lane wrote: Florian Pflug f...@phlo.org writes: On further reflection, instead of checking whether we can restore the error handler in pg_xml_init_library(), we could simply upgrade the elog(WARNING) in pg_xml_done() to ereport(ERROR), and include a hint that explains the probable cause. The upside being that we only fail when we actually need to restore the error handler. Since there's one caller (parse_xml_decl) that calls pg_xml_init_library() but not pg_xml_init()/pg_xml_done(), the difference isn't only academic. At least XML output will continue to work after libxml is upgraded from < 2.7.4 to >= 2.7.4. Good point. But what about failing in pg_xml_init? That is, after calling xmlSetStructuredErrorFunc, check that it set the variable we expected it to set. Yeah, that's even better. Will do it that way. What about the suggested upgrade of the elog(ERROR) in xml_errorHandler to elog(FATAL)? Shall I do that too, or leave it for now? The purpose of the check in pg_xml_done is not to detect library issues, but to detect omitted save/restore pairs and similar coding mistakes. I don't want to upgrade it to an ERROR, and I don't want to confuse people by hinting that the problem is with libxml. I can see the concern about possible confusion, and I agree that pg_xml_init(), right after we set our error handler, is the most logical place to test whether we can read it back. I guess I don't really see the benefit of the check in pg_xml_done() triggering a WARNING instead of an ERROR, but since we won't be hijacking that check now anyway, I don't mind it being a WARNING either. best regards, Florian Pflug
Re: [HACKERS] Check constraints on partition parents only?
Robert Haas robertmh...@gmail.com writes: On Tue, Jul 26, 2011 at 4:12 AM, Nikhil Sontakke nikhil.sonta...@enterprisedb.com wrote: Hmmm, but then it does open up the possibility of naive users shooting themselves in the foot. It can be easy to conjure up a parent-only-constraint that does not gel too well with its children. And that's precisely why this feature was added in the first place.. Yeah, but I think we need to take that chance. At the very least, we need to support the equivalent of a non-inherited CHECK (false) on parent tables. No, the right solution is to invent an actual concept of partitioned tables, not to keep adding ever-weirder frammishes to inheritance so that it can continue to provide an awkward, poorly-performing emulation of them. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another issue with invalid XML values
Robert Haas robertmh...@gmail.com writes: On Tue, Jul 26, 2011 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: I don't think so. It would just be another headache for packagers --- in Red Hat distros, at least, packages are explicitly forbidden from containing any rpath settings. So what do they do about Perl and Python? Patch the source to not add rpath switches. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another issue with invalid XML values
Florian Pflug f...@phlo.org writes: What about the suggested upgrade of the elog(ERROR) in xml_errorHandler to elog(FATAL)? Shall I do that too, or leave it for now? No objection here --- I had considered writing it that way myself. I refrained for fear of making a bad situation even worse, but if other people think FATAL would be the safer way, fine. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] vacuumlo patch
Hi Tim, you correctly assumed I was just trying to clean up the code. There was no reason to change the behavior, so please ignore my change to the patch. Aron On 26.07.2011, at 15:44, Tim Lewis wrote: Hi Aron, Thanks for the input. The small change you suggest would change the behavior of the patch, which I would prefer not to do, as I have reasons for the previous behavior. Because you gave no reasons, and "stop after removing LIMIT LOs" was not changed to "stop after attempting to remove LIMIT LOs", I suspect you were just trying to keep the code clean. The difference: in your version of the patch vacuumlo will stop after N lo_unlink(OID) attempts. The previous behavior of the patch is that vacuumlo will stop after N successful lo_unlink(OID)s. If you have a good reason for your behavior, please add another flag so that it is optional. There should be a clear distinction between counting vs. not, and aborting vs. continuing when a lo_unlink(OID) is unsuccessful. On Tue, Jul 26, 2011 at 4:18 AM, Aron m...@eunice.de wrote: Here's another small change to the patch, it works fine for me and it quite saved my day. I try to submit the patch by email.
-- Noodle Connecting People, Content Capabilities within the Enterprise Toll Free: 866-258-6951 x 701 tim.le...@vialect.com http://www.vialect.com Noodle is a product of Vialect Inc Follow Noodle on Twitter http://www.twitter.com/noodle_news
Re: [HACKERS] vacuumlo patch
Hi Tim, I have to correct my previous answer: my change does not alter the behavior of your patch significantly. The difference: In your version of the patch vacuumlo will stop after N lo_unlink(OID) attempts. The previous behavior of the patch is that vacuumlo will stop after N successful lo_unlink(OID)s. If you have good reason for your behavior please add another flag so that it is optional. There should be a clear distinction between counting vs not, and aborting vs continuing when a lo_unlink(OID) is unsuccessful.

if (param->dry_run == 0)
{
    if (lo_unlink(conn, lo) < 0)
    {
        fprintf(stderr, "\nFailed to remove lo %u: ", lo);
        fprintf(stderr, "%s", PQerrorMessage(conn));
    }
    else
        deleted++;
}
else
    deleted++;
if (param->transaction_limit != 0 && deleted >= param->transaction_limit)
    break;

The variable deleted is only incremented if a lo_unlink was successful, so my patch only introduces a negligible overhead but no actual change in behavior. I'm very grateful for your patch and I think it should be accepted as soon as possible; one or two ifs do not matter to me. Aron
Re: [HACKERS] vacuumlo patch
Excerpts from Aron's message of mar jul 26 04:18:55 -0400 2011: Here's another small change to the patch, it works fine for me and it quite saved my day. I try to submit the patch by email. There's a problem with this patch: long lines are being wrapped by your email client, which makes it impossible to apply. I think it would be safer if you sent it as an attachment instead of pasting it in the body of the message. I posted using nabble, edited the post and uploaded the file as an attachment. But here you go again, this time as an attachment. vacuumlo.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another issue with invalid XML values
Tom Lane t...@sss.pgh.pa.us writes: Robert Haas robertmh...@gmail.com writes: On Tue, Jul 26, 2011 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: I don't think so. It would just be another headache for packagers --- in Red Hat distros, at least, packages are explicitly forbidden from containing any rpath settings. So what do they do about Perl and Python? Patch the source to not add rpath switches. No, I take that back. What the RH packages do is configure with --disable-rpath, and so long as that applies to this too, I'd have no objection. What I was mis-remembering is that there's a patch that *puts back* plperl's rpath for libperl, because for some reason that no one has ever satisfactorily explained to me, perl has an exemption from the distro policy that loadable libraries must be installed so that they're known to ldconfig. Which of course is exactly why rpath isn't (supposed to be) needed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] write scalability
On Tue, Jul 26, 2011 at 9:07 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jul 25, 2011 at 10:14 PM, Greg Smith g...@2ndquadrant.com wrote: On 07/25/2011 04:07 PM, Robert Haas wrote: I did 5-minute pgbench runs with unlogged tables and with permanent tables, restarting the database server and reinitializing the tables between each run. Database scale? One or multiple pgbench worker threads? A reminder on the amount of RAM in the server would be helpful for interpreting the results too. Ah, sorry. scale = 100, so small. pgbench invocation is: It might be worthwhile to test only with the accounts and history table and also increasing the number of statements in a transaction. Otherwise the tiny tables can quickly become a bottleneck. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Check constraints on partition parents only?
On Tue, Jul 26, 2011 at 10:51:58AM -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jul 26, 2011 at 4:12 AM, Nikhil Sontakke nikhil.sonta...@enterprisedb.com wrote: Hmmm, but then it does open up the possibility of naive users shooting themselves in the foot. It can be easy to conjure up a parent-only-constraint that does not gel too well with its children. And that's precisely why this feature was added in the first place.. Yeah, but I think we need to take that chance. At the very least, we need to support the equivalent of a non-inherited CHECK (false) on parent tables. No, the right solution is to invent an actual concept of partitioned tables, not to keep adding ever-weirder frammishes to inheritance so that it can continue to provide an awkward, poorly-performing emulation of them. Other SQL engines have partitions of types list, range and hash, and some can sub-partition. I'm thinking it might be easiest to do the first before adding layers of partition structure, although we should probably bear in mind that such layers will eventually exist. Does the wiki on this need updating? http://wiki.postgresql.org/wiki/Table_partitioning Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] vacuumlo patch
Hi Aron, Thanks for the input. The small change you suggest would change the behavior of the patch, which I would prefer not to do as I have reasons for the previous behavior. Because you gave no reasons, and "stop after removing LIMIT LOs" was not changed to "stop after attempting to remove LIMIT LOs", I suspect you were just trying to keep the code clean.

The difference: In your version of the patch vacuumlo will stop after N lo_unlink(OID) attempts. The previous behavior of the patch is that vacuumlo will stop after N successful lo_unlink(OID)s. If you have good reason for your behavior please add another flag so that it is optional. There should be a clear distinction between counting vs not, and aborting vs continuing when a lo_unlink(OID) is unsuccessful.

On Tue, Jul 26, 2011 at 4:18 AM, Aron m...@eunice.de wrote: Here's another small change to the patch, it works fine for me and it quite saved my day. I try to submit the patch by email.

diff --git a/contrib/vacuumlo/vacuumlo.c b/contrib/vacuumlo/vacuumlo.c
index f6e2a28..1f88d72 100644
--- a/contrib/vacuumlo/vacuumlo.c
+++ b/contrib/vacuumlo/vacuumlo.c
@@ -48,6 +48,7 @@ struct _param
 	char	   *pg_host;
 	int			verbose;
 	int			dry_run;
+	int			transaction_limit;
 };
 
 int		vacuumlo(char *, struct _param *);
@@ -286,6 +287,8 @@ vacuumlo(char *database, struct _param * param)
 			}
 			else
 				deleted++;
+			if (param->transaction_limit != 0 && deleted >= param->transaction_limit)
+				break;
 		}
 		PQclear(res);
@@ -313,6 +316,7 @@ usage(const char *progname)
 	printf("  -h HOSTNAME  database server host or socket directory\n");
 	printf("  -n           don't remove large objects, just show what would be done\n");
 	printf("  -p PORT      database server port\n");
+	printf("  -l LIMIT     stop after removing LIMIT LOs\n");
 	printf("  -U USERNAME  user name to connect as\n");
 	printf("  -w           never prompt for password\n");
 	printf("  -W           force password prompt\n");
@@ -342,6 +346,7 @@ main(int argc, char **argv)
 	param.pg_port = NULL;
 	param.verbose = 0;
 	param.dry_run = 0;
+	param.transaction_limit = 0;
 
 	if (argc > 1)
 	{
@@ -359,7 +364,7 @@ main(int argc, char **argv)
 	while (1)
 	{
-		c = getopt(argc, argv, "h:U:p:vnwW");
+		c = getopt(argc, argv, "h:U:p:l:vnwW");
 		if (c == -1)
 			break;
@@ -395,6 +400,14 @@ main(int argc, char **argv)
 			}
 			param.pg_port = strdup(optarg);
 			break;
+		case 'l':
+			param.transaction_limit = strtol(optarg, NULL, 10);
+			if ((param.transaction_limit < 0) || (param.transaction_limit > 2147483647))
+			{
+				fprintf(stderr, "%s: invalid transaction limit number: %s, valid range is from 0 (disabled) to 2147483647.\n", progname, optarg);
+				exit(1);
+			}
+			break;
 		case 'h':
 			param.pg_host = strdup(optarg);
 			break;
-- View this message in context: http://postgresql.1045698.n5.nabble.com/vacuumlo-patch-tp4628522p4634026.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Noodle Connecting People, Content Capabilities within the Enterprise Toll Free: 866-258-6951 x 701 tim.le...@vialect.com http://www.vialect.com Noodle is a product of Vialect Inc Follow Noodle on Twitter http://www.twitter.com/noodle_news
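A side note on the `-l` parsing in the patch: `strtol(optarg, NULL, 10)` silently accepts trailing garbage ("10x") and clamps overflow to LONG_MAX without reporting it. A hedged sketch of a stricter parse, along the lines the patch's range check intends (`parse_limit` is a hypothetical helper, not part of the patch):

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a transaction limit in the range 0 (disabled) .. INT_MAX,
 * returning -1 on any parse error.  Unlike strtol(optarg, NULL, 10),
 * this rejects empty input, trailing junk, and overflow, by checking
 * errno and the end pointer as the C standard specifies. */
static long parse_limit(const char *arg)
{
    char *end;
    long  val;

    errno = 0;
    val = strtol(arg, &end, 10);
    if (errno == ERANGE || end == arg || *end != '\0')
        return -1;              /* overflow, empty, or trailing junk */
    if (val < 0 || val > INT_MAX)
        return -1;              /* outside 0 (disabled) .. 2147483647 */
    return val;
}
```

The `errno = 0` before the call matters: strtol only sets errno on overflow and never clears it, so a stale ERANGE from an earlier call would otherwise be misread as a failure.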
Re: [HACKERS] Check constraints on partition parents only?
Andrew Dunstan and...@dunslane.net writes: On 07/25/2011 10:31 PM, Jerry Sievers wrote: Hackers; I just noticed that somewhere between 8.2 and 8.4, an exception is raised trying to alter table ONLY some_partition_parent ADD CHECK (foo). 8.4 had this change: * Force child tables to inherit CHECK constraints from parents (Alex Hunsaker, Nikhil Sontakke, Tom) Formerly it was possible to drop such a constraint from a child table, allowing rows that violate the constraint to be visible when scanning the parent table. This was deemed inconsistent, as well as contrary to SQL standard. You're not the only one who occasionally bangs his head against it. cheers andrew Thanks Andrew!... Yeah, I figured it was a documented change but was too lazy tonight to browse the release notes :-) The previous behavior was convenient for me, but I agree, it probably led to some confusion too. That our version of partitioning can be overloaded like this, though, I think adds power, a bit of which we lost by adding the restriction. -- Jerry Sievers e: jerry.siev...@comcast.net p: 305.321.1144 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another issue with invalid XML values
On Tue, Jul 26, 2011 at 02:09:13PM +0200, Florian Pflug wrote: On Jul26, 2011, at 01:57 , Noah Misch wrote: We could rearrange things so the xml2-config -L flags (or lack thereof) take priority over a --with-libraries directory for the purpose of finding libxml2. Hm, how would we do that just for the purpose of finding libxml? Do autotools provide a way to specify per-library -L flags? Autoconf (built-in macros, that is) and Automake do not get involved in that. I vaguely recall Libtool having support for it, but PostgreSQL does not use Automake or Libtool. I also spoke too loosely: -L is never per-library, but you can emulate that by completing the search externally and passing a full path to the linker. Honestly, the benefit is probably too light to justify going to this trouble. -- Noah Misch http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] vacuumlo patch
I'm not sure what David did for me so, as per Robert's suggestion, I have added this patch to the commit fest: https://commitfest.postgresql.org/action/patch_view?id=609 . I'm hoping I have not put this patch in more than one workflow: http://wiki.postgresql.org/wiki/Submitting_a_Patch#Patch_review_and_commit . On Mon, Jul 25, 2011 at 7:28 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jul 25, 2011 at 9:37 AM, Tim elatl...@gmail.com wrote: Updated the patch to also apply when the no-action flag is enabled. You may want to read this: http://wiki.postgresql.org/wiki/Submitting_a_Patch And add your patch here: https://commitfest.postgresql.org/action/commitfest_view/open -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Another issue with invalid XML values
On Jul26, 2011, at 18:04 , Noah Misch wrote: On Tue, Jul 26, 2011 at 02:09:13PM +0200, Florian Pflug wrote: On Jul26, 2011, at 01:57 , Noah Misch wrote: We could rearrange things so the xml2-config -L flags (or lack thereof) take priority over a --with-libraries directory for the purpose of finding libxml2. Hm, how would we do that just for the purpose of finding libxml? Do autotools provide a way to specify per-library -L flags? Autoconf (built-in macros, that is) and Automake do not get involved in that. I vaguely recall Libtool having support for it, but PostgreSQL does not use Automake or Libtool. I also spoke too loosely: -L is never per-library, but you can emulate that by completing the search externally and passing a full path to the linker. Yeah, that was what I thought would be the only way too. I had the slight hope that I had missed something, but unfortunately it seems I didn't :-( Honestly, the benefit is probably too light to justify going to this trouble. Yep. It'd probably be weeks, if not months, until we made that work on all supported platforms. And by the time it does, we'd probably have re-invented a sizeable portion of libtool. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] write scalability
On Tue, Jul 26, 2011 at 11:40 AM, Pavan Deolasee pavan.deola...@gmail.com wrote: On Tue, Jul 26, 2011 at 9:07 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jul 25, 2011 at 10:14 PM, Greg Smith g...@2ndquadrant.com wrote: On 07/25/2011 04:07 PM, Robert Haas wrote: I did 5-minute pgbench runs with unlogged tables and with permanent tables, restarting the database server and reinitializing the tables between each run. Database scale? One or multiple pgbench worker threads? A reminder on the amount of RAM in the server would be helpful for interpreting the results too. Ah, sorry. scale = 100, so small. pgbench invocation is: It might be worthwhile to test only with the accounts and history table and also increasing the number of statements in a transaction. Otherwise the tiny tables can quickly become a bottleneck. What kind of bottleneck? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] vacuumlo patch
On Tue, Jul 26, 2011 at 12:18:31PM -0400, Timothy D. F. Lewis wrote: On Mon, Jul 25, 2011 at 7:28 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jul 25, 2011 at 9:37 AM, Tim elatl...@gmail.com wrote: Updated the patch to also apply when the no-action flag is enabled. You may want to read this: http://wiki.postgresql.org/wiki/Submitting_a_Patch And add your patch here: https://commitfest.postgresql.org/action/commitfest_view/open -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company I'm not sure what David did for me so as per Roberts suggestion I have addedhttps://commitfest.postgresql.org/action/patch_view?id=609this patch to the commit fest. I'm hoping I have not put this patch in more than one workflowhttp://wiki.postgresql.org/wiki/Submitting_a_Patch#Patch_review_and_commit . I just put your name as you've asked me to phrase it into the pending patches part of the PostgreSQL Weekly News. Thanks for the contribution, and here's hoping your experience here will be pleasant and productive :) Cheers, David. p.s. We don't top-post on the PostgreSQL mailing lists. As with the customary reply-to-all, it's a convention we've decided works well for us. :) -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] write scalability
On Tue, Jul 26, 2011 at 12:24 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jul 26, 2011 at 11:40 AM, Pavan Deolasee pavan.deola...@gmail.com wrote: On Tue, Jul 26, 2011 at 9:07 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jul 25, 2011 at 10:14 PM, Greg Smith g...@2ndquadrant.com wrote: On 07/25/2011 04:07 PM, Robert Haas wrote: I did 5-minute pgbench runs with unlogged tables and with permanent tables, restarting the database server and reinitializing the tables between each run. Database scale? One or multiple pgbench worker threads? A reminder on the amount of RAM in the server would be helpful for interpreting the results too. Ah, sorry. scale = 100, so small. pgbench invocation is: It might be worthwhile to test only with the accounts and history table and also increasing the number of statements in a transaction. Otherwise the tiny tables can quickly become a bottleneck. What kind of bottleneck? Many transactions trying to update a small set of rows in a table. Is that what we really want to measure? My thinking is that we might see a different result if they are updating different parts of the table and the transaction start/stop overhead is spread across a few statements. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] write scalability
On Tue, Jul 26, 2011 at 12:29 PM, Pavan Deolasee pavan.deola...@gmail.com wrote: So many transactions trying to update a small set of rows in a table. Is that what we really want to measure? My thinking is that we might see different results if they are updating different parts of the table and the transaction start/stop overhead is spread across a few statements. What I am suggesting is that the default pgbench test setup would probably not give you good scalability as the number of clients is increased, and one reason could be the contention in the small table. So it might be a good idea to get rid of that and see if we get much better scalability and understand other bottlenecks. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] write scalability
On Tue, Jul 26, 2011 at 4:40 PM, Pavan Deolasee pavan.deola...@gmail.com wrote: On Tue, Jul 26, 2011 at 9:07 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jul 25, 2011 at 10:14 PM, Greg Smith g...@2ndquadrant.com wrote: On 07/25/2011 04:07 PM, Robert Haas wrote: I did 5-minute pgbench runs with unlogged tables and with permanent tables, restarting the database server and reinitializing the tables between each run. Database scale? One or multiple pgbench worker threads? A reminder on the amount of RAM in the server would be helpful for interpreting the results too. Ah, sorry. scale = 100, so small. pgbench invocation is: It might be worthwhile to test only with the accounts and history table and also increasing the number of statements in a transaction. Otherwise the tiny tables can quickly become a bottleneck. +1 -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sinval synchronization considered harmful
On Mon, Jul 25, 2011 at 6:24 PM, Noah Misch n...@2ndquadrant.com wrote: On Fri, Jul 22, 2011 at 03:54:03PM -0400, Robert Haas wrote: On Fri, Jul 22, 2011 at 3:28 PM, Noah Misch n...@2ndquadrant.com wrote: This is attractive, and I don't see any problems with it. (In theory, you could hit a case where the load of resetState gives an ancient false just as the counters wrap to match. Given that the wrap interval is 100x as long as the reset interval, I'm not worried about problems on actual silicon.) It's actually 262,144 times as long - see MSGNUMWRAPAROUND. Ah, so it is. It would be pretty easy to eliminate even the theoretical possibility of a race by getting rid of resetState altogether and using nextMsgNum = -1 to mean that. Maybe I should go ahead and do that. Seems like a nice simplification. On further reflection, I don't see that this helps: it just moves the problem around. With resetState as a separate variable, nextMsgNum is never changed by anyone other than the owner, so we can never have a stale load. But if we overload nextMsgNum to also indicate whether our state has been reset, then there's a race between when we load nextMsgNum and when we load maxMsgNum (instead of the code I posted previously, which has a race between when we load resetState and when we load maxMsgNum). Now, as you say, it seems really, really difficult to hit that in practice, but I don't see a way of getting rid of the theoretical possibility without either (1) a spinlock or (2) a fence. (Of course, on x86, the fence could be optimized down to a compiler barrier.) I guess the question is, should we worry about that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
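The ordering problem Robert describes can be sketched with C11 atomics (a hedged illustration, not the actual sinval code; the variable and function names are stand-ins). The point is that the reader's two loads must not be reordered relative to the writer's two stores, and acquire/release, which on x86 costs only a compiler barrier, is one way to express that:

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative stand-ins for the shared state being discussed. */
static atomic_int maxMsgNum;
static atomic_int resetState;

static void writer_advance(int newMax, int forceReset)
{
    if (forceReset)
        atomic_store_explicit(&resetState, 1, memory_order_relaxed);
    /* release store: the resetState write above becomes visible
     * no later than the new maxMsgNum */
    atomic_store_explicit(&maxMsgNum, newMax, memory_order_release);
}

static int reader_snapshot(int *wasReset)
{
    /* acquire load pairs with the release store, so the resetState
     * value read below cannot be staler than the maxMsgNum read here */
    int max = atomic_load_explicit(&maxMsgNum, memory_order_acquire);

    *wasReset = atomic_load_explicit(&resetState, memory_order_relaxed);
    return max;
}
```

Without the acquire/release pairing (plain loads and stores), the compiler or CPU is free to reorder the reader's loads, which is exactly the one-instruction-wide window being debated.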
Re: [HACKERS] sinval synchronization considered harmful
Robert Haas robertmh...@gmail.com writes: On further reflection, I don't see that this helps: it just moves the problem around. With resetState as a separate variable, nextMsgNum is never changed by anyone other than the owner, so we can never have a stale load. But if we overload nextMsgNum to also indicate whether our state has been reset, then there's a race between when we load nextMsgNum and when we load maxMsgNum (instead of the code I posted previously, which has a race between when we load resetState and when we load maxMsgNum). Now, as you say, it seems really, really difficult to hit that in practice, but I don't see a way of getting rid of the theoretical possibility without either (1) a spinlock or (2) a fence. (Of course, on x86, the fence could be optimized down to a compiler barrier.) I guess the question is, should we worry about that? Uh, yes. I've lost count of the number of times I've seen people hit one-instruction-wide race condition windows, SIGSEGV crashes based on accessing only one byte past the valid data structure, etc etc. The worst thing about this type of bug is that you can't reproduce the failure when you try; that doesn't mean your users won't see it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] problem with compiling beta3 on mingw32+WinXP
After reinstalling mingw it is ok.

2011/7/25, pasman pasmański pasma...@gmail.com: After googling I found that mingw's gcc works with 64-bit integers, but printf is incompatible :(. Possible workaround: include inttypes.h, define the format macros, and convert the printf strings: printf("%lld", (long long) 100)

2011/7/25, pasman pasmański pasma...@gmail.com: Hi. When I try to compile postgresql-beta3 on mingw32, ./configure passes ok, but there is an error when I do make:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv -I../../src/port -DFRONTEND -I../../src/include -I./src/include/port/win32 -DEXEC_BACKEND -I../../src/include/port/win32 -c -o crypt.o crypt.c
In file included from crypt.c:44:0:
../../src/include/c.h:284:2: error: #error must have a working 64-bit integer datatype
In file included from ../../src/include/c.h:851:0, from crypt.c:44:
../../src/include/port.h:390:0: warning: "fseeko" redefined
../../src/include/pg_config_os.h:228:0: note: this is the location of the previous definition
../../src/include/port.h:391:0: warning: "ftello" redefined
../../src/include/pg_config_os.h:229:0: note: this is the location of the previous definition
make[2]: *** [crypt.o] Error 1
make[2]: Leaving directory `/home/rosinkr1/postgresql-9.1beta3/src/port'
make[1]: *** [all-port-recurse] Error 2
make[1]: Leaving directory `/home/rosinkr1/postgresql-9.1beta3/src'
make: *** [all-src-recurse] Error 2

Make version 3.81. Status file included. What is wrong? pasman -- pasman -- pasman -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
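The workaround pasman mentions can be sketched as follows (an illustrative example, not from the thread; `format_int64` is a hypothetical helper): route 64-bit values through the `<inttypes.h>` format macros instead of a hard-coded length modifier, so the platform supplies whatever its printf actually understands (`%I64d` on native Windows CRTs, `%lld` on most others).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format an int64_t portably: PRId64 expands to the correct
 * conversion specifier for this platform's printf family, which is
 * exactly the problem with mingw's incompatible printf. */
static void format_int64(char *buf, size_t len, int64_t v)
{
    snprintf(buf, len, "%" PRId64, v);
}
```

PostgreSQL itself solves the same problem with its own INT64_FORMAT macro chosen by configure; the inttypes.h route shown here is the standard-C equivalent.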
Re: [HACKERS] sinval synchronization considered harmful
Excerpts from Tom Lane's message of mar jul 26 13:43:08 -0400 2011: Uh, yes. I've lost count of the number of times I've seen people hit one-instruction-wide race condition windows, SIGSEGV crash based on accessing only one byte past the valid data structure, etc etc. I think you need a better locking protocol on that counter of yours. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] write scalability
On 07/26/2011 12:33 PM, Pavan Deolasee wrote: I think what I am suggesting is that the default pgbench test setup would probably not give you a good scalability as number of clients are increased and one reason could be the contention in the small table. So it might be a good idea to get rid of that and see if we get much better scalability and understand other bottlenecks. You can easily see this form of contention pulling down results when the database itself is really small and the overall transaction rate is very high. With Robert using a scale=100, no more than 80 clients, and transaction rates peaking at 10K TPS on a 24 core box, I wouldn't expect this form of contention to be a large issue. It may be dropping results a few percent, but I'd be surprised if it was any more than that. It's easy enough to use -N to skip the updates to the smaller tellers and branches table to pull that out of the way. TPS will go up, because it's doing less per transaction. That's not necessarily a better test case though, it's just a different one. The regular case includes a lot of overwriting the same blocks in the hot branches and tellers tables. That effectively pushes a lot more I/O toward being likely to happen at checkpoint time. Those tables rarely have any significant I/O outside of the checkpoint in the standard TPC-B like scenario, because their usage counts stay high most of the time. Contention for small tables that are being heavily updated is still important to optimize too though. Which type of test makes more sense depends on what aspect of performance you're trying to focus on. I'll sometimes do a full pgbench-tools sweep (range of multiple scales and clients at each scale) in both regular write and -N write modes, just to collect the slightly different data each provides. The form of checkpoint I/O spike you see at sync time is very different in the two cases, but both usage profiles are important to consider and optimize. 
That Robert has started with the regular case doesn't worry me too much now that I've seen the parameters he's using; he's not running it in a way where I'd expect branch/teller contention to be a major limiting factor on the results. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sinval synchronization considered harmful
On Tue, Jul 26, 2011 at 6:36 PM, Robert Haas robertmh...@gmail.com wrote: Now, as you say, it seems really, really difficult to hit that in practice, but I don't see a way of getting rid of the theoretical possibility without either (1) a spinlock or (2) a fence. (Of course, on x86, the fence could be optimized down to a compiler barrier.) I guess the question is should we worry about that?. Perhaps the answer lies in a different direction altogether? Let me ask a few questions to stimulate a different solution * Can we do this using an active technique (e.g. signals) rather than a passive one (reading a counter?) * Can we partition the sinval lock, so we have multiple copies? That increases the task for those who trigger an invalidation, but will relieve the pressure for most readers. * Can we put the sinval info in a different place? e.g. inside each lock partition. * Why do we have a different mechanism for cache invalidation internally (sinval) to the one we offer externally (LISTEN/NOTIFY)? Why don't we have just one? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another issue with invalid XML values
On Jul26, 2011, at 17:07 , Tom Lane wrote: Florian Pflug f...@phlo.org writes: What about the suggested upgrade of the elog(ERROR) in xml_errorHandler to elog(FATAL)? Shall I do that too, or leave it for now? No objection here --- I had considered writing it that way myself. I refrained for fear of making a bad situation even worse, but if other people think FATAL would be the safer way, fine. Patch attached. pg_xml_init() now checks the error context after setting it, and ereport(ERROR)s if the check fails. I've made it so that the errhint() which suggests that compiling against libxml < 2.7.3 but running against >= 2.7.3 might be the reason is only emitted if we actually compiled against an older version. This is meant to avoid confusion should there ever be another reason for that check to fail. I've also changed the elog(ERROR) to elog(FATAL) in xml_errorHandler(). I've pondered whether to add a check to configure which verifies that the headers match the libxml version exactly at compile time. In the end, I didn't do that, for two reasons. First, there isn't anything wrong with using older headers together with a newer libxml, so long as both versions are either >= 2.7.3 or < 2.7.3. And second, with such a check in place, the only way to exercise libxml's promised ABI compatibility is to upgrade the libxml2 package after compiling postgres. That, however, is unlikely to happen except on production servers, and so by adding the check we'd increase the chance of ABI-compatibility problems remaining undetected during development and testing. best regards, Florian Pflug pg_xml_errctxcheck.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sinval synchronization considered harmful
Excerpts from Simon Riggs's message of mar jul 26 14:11:19 -0400 2011: Let me ask a few questions to stimulate a different solution * Can we do this using an active technique (e.g. signals) rather than a passive one (reading a counter?) Signals are already in use for special cases (queue is full), and I think going through the kernel to achieve much more will lower performance significantly. * Can we partition the sinval lock, so we have multiple copies? That increases the task for those who trigger an invalidation, but will relieve the pressure for most readers. Not sure there's a way to meaningfully partition the queue. In any case, I think the problem being dealt with here is how to update the read heads of the queue, not its contents. * Can we put the sinval info in a different place? e.g. inside each lock partition. I don't think you can relate sinval messages to particular locks, which would make this infeasible. I think this bit is directly related to the point above. * Why do we have a different mechanism for cache invalidation internally (sinval) to the one we offer externally (LISTEN/NOTIFY)? Why don't we have just one? Performance. Not sure there are other reasons as well; but just this one is significant enough. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sinval synchronization considered harmful
On Tue, Jul 26, 2011 at 7:24 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Simon Riggs's message of mar jul 26 14:11:19 -0400 2011: Let me ask a few questions to stimulate a different solution * Can we do this using an active technique (e.g. signals) rather than a passive one (reading a counter?) Signals are already in use for special cases (queue is full), and I think going through the kernel to achieve much more will lower performance significantly. If there are no invalidations, there would be no signals. How would zero signals decrease performance? * Can we partition the sinval lock, so we have multiple copies? That increases the task for those who trigger an invalidation, but will relieve the pressure for most readers. Not sure there's a way to meaningfully partition the queue. In any case, I think the problem being dealt with here is how to update the read heads of the queue, not its contents. I agree there's no meaningful way to partition the queue, but we can store the information in more than one place to reduce the contention of people requesting it. Both those ideas are still on the table. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sinval synchronization considered harmful
On Tue, Jul 26, 2011 at 2:56 PM, Simon Riggs si...@2ndquadrant.com wrote: On Tue, Jul 26, 2011 at 7:24 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Simon Riggs's message of mar jul 26 14:11:19 -0400 2011: Let me ask a few questions to stimulate a different solution * Can we do this using an active technique (e.g. signals) rather than a passive one (reading a counter?) Signals are already in use for special cases (queue is full), and I think going through the kernel to achieve much more will lower performance significantly. If there are no invalidations, there would be no signals. How would zero signals decrease performance? It wouldn't, although it might be bad in the case where there are lots of temp tables being created and dropped. I think the biggest problem with signals is that they don't provide any meaningful synchronization guarantees. When you send somebody a signal, you don't really know how long it's going to take for them to receive it. * Can we partition the sinval lock, so we have multiple copies? That increases the task for those who trigger an invalidation, but will relieve the pressure for most readers. Not sure there's a way to meaningfully partition the queue. In any case, I think the problem being dealt with here is how to update the read heads of the queue, not its contents. I agree there's no meaningful way to partition the queue, but we can store the information in more than one place to reduce the contention of people requesting it. I thought about that. Basically, that saves you a factor of N in contention on the read side (where N is the number of copies) and costs you a factor of N on the write side (you have to update N copies, taking a spinlock or lwlock for each). In the limit, you could do one copy of the counter per backend. I think, though, that a lock-free implementation using memory barriers is going to end up being the only real solution. 
We might possibly convince ourselves that we're OK with increasing the cost of SIInsertDataEntries(), but any solution that involves replicating the data is still based on doing at least some locking. And I am pretty well convinced that even one spinlock acquisition in SIInsertDataEntries() is too many. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
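The trade-off Robert describes — a factor of N less read-side contention bought with a factor of N more write-side work — can be made concrete with a toy model of the "one copy of the counter per backend" limit case (illustrative Python only; the real code would be C with spinlocks in shared memory, and the class below is invented for the sketch):

```python
import threading

class ReplicatedCounter:
    """Toy model of replicating maxMsgNum once per reader: the writer
    updates every copy (N lock acquisitions), while each reader touches
    only its own copy, whose lock is never contended by other readers."""
    def __init__(self, n_readers):
        self.copies = [0] * n_readers
        self.locks = [threading.Lock() for _ in range(n_readers)]

    def insert_messages(self, count):
        # Write side: cost grows linearly with the number of copies.
        for i, lock in enumerate(self.locks):
            with lock:
                self.copies[i] += count

    def read(self, reader_id):
        # Read side: only this reader and the writer ever take this lock.
        with self.locks[reader_id]:
            return self.copies[reader_id]

c = ReplicatedCounter(4)
c.insert_messages(5)
print([c.read(i) for i in range(4)])  # every reader sees 5
```

Robert's objection is visible in insert_messages(): even with N = 1, the writer still pays at least one lock acquisition, which is what the lock-free approach is meant to eliminate.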
Re: [HACKERS] sinval synchronization considered harmful
Simon Riggs si...@2ndquadrant.com writes: On Tue, Jul 26, 2011 at 7:24 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Signals are already in use for special cases (queue is full), and I think going through the kernel to achieve much more will lower performance significantly. If there are no invalidations, there would be no signals. How would zero signals decrease performance? But if there *is* an invalidation (which is not a negligible case), it'd get significantly slower. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sinval synchronization considered harmful
On Tue, Jul 26, 2011 at 8:11 PM, Robert Haas robertmh...@gmail.com wrote: It wouldn't, although it might be bad in the case where there are lots of temp tables being created and dropped. Do temp tables cause relcache invalidations? That seems like something we'd want to change in itself. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Pull up aggregate sublink (was: Parameterized aggregate subquery (was: Pull up aggregate subquery))
On 2011-07-22 17:35, Hitoshi Harada wrote: 2011/7/23 Yeb Havinga yebhavi...@gmail.com: Works like a charm :-). However, now there is always a copyObject of a subquery even when the subquery is not safe for qual pushdown. The problem with the previous issafe was that it was only assigned for rel->baserestrictinfo != NIL. If it is assigned before the if statement, it still works. See attached patch that avoids subquery copy for unsafe subqueries, and also exits best_inner_subqueryscan before palloc of different types in case of unsafe queries. Ah, yeah, right. Too quick fix bloated my brain :P Thanks for testing! I'll check it more. A few days ago I read Tomas Vondra's blog post about dss tpc-h queries on PostgreSQL at http://fuzzy.cz/en/articles/dss-tpc-h-benchmark-with-postgresql/ - in which he showed how to manually pull up a dss subquery to get a large speed up. Initially I thought: cool, this is probably now handled by Hitoshi's patch, but it turns out the subquery type in the dss query is different. The original and rewritten queries are below. The debug_print_plan output shows the subquery is called from an opexpr (< l_quantity, subquery output) and the sublink type is EXPR_SUBLINK. Looking at the source code, pull_up_sublink only considers ANY and EXISTS sublinks. I'm wondering if this could be expanded to deal with EXPR sublinks. Clearly in the example Tomas has given this can be done. I'm wondering if there are show stoppers that prevent this from being possible in the general case, but can't think of any, other than the case of a sublink returning NULL and the opexpr being part of a larger OR expression or IS NULL; in which case it should not be pulled up, or perhaps it could be pulled up as outer join. Thoughts? 
regards, Yeb The original query: tpch=# explain select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#13' and p_container = 'JUMBO PKG' and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey ) LIMIT 1; QUERY PLAN Limit (cost=183345309.79..183345309.81 rows=1 width=8) -> Aggregate (cost=183345309.79..183345309.81 rows=1 width=8) -> Hash Join (cost=2839.99..183345307.76 rows=815 width=8) Hash Cond: (public.lineitem.l_partkey = part.p_partkey) Join Filter: (public.lineitem.l_quantity < (SubPlan 1)) -> Seq Scan on lineitem (cost=0.00..68985.69 rows=2399869 width=17) -> Hash (cost=2839.00..2839.00 rows=79 width=4) -> Seq Scan on part (cost=0.00..2839.00 rows=79 width=4) Filter: ((p_brand = 'Brand#13'::bpchar) AND (p_container = 'JUMBO PKG'::bpchar)) SubPlan 1 -> Aggregate (cost=74985.44..74985.46 rows=1 width=5) -> Seq Scan on lineitem (cost=0.00..74985.36 rows=31 width=5) Filter: (l_partkey = part.p_partkey) manually rewritten variant: tpch=# explain select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part, (SELECT l_partkey AS agg_partkey, 0.2 * avg(l_quantity) AS avg_quantity FROM lineitem GROUP BY l_partkey) part_agg where p_partkey = l_partkey and agg_partkey = l_partkey and p_brand = 'Brand#13' and p_container = 'JUMBO PKG' and l_quantity < avg_quantity LIMIT 1; QUERY PLAN Limit (cost=179643.88..179643.89 rows=1 width=8) -> Aggregate (cost=179643.88..179643.89 rows=1 width=8) -> Hash Join (cost=161865.21..178853.91 rows=315985 width=8) Hash Cond: (public.lineitem.l_partkey = part.p_partkey) Join Filter: (public.lineitem.l_quantity < ((0.2 * avg(public.lineitem.l_quantity -> HashAggregate (cost=80985.04..82148.65 rows=77574 width=9) -> Seq Scan on lineitem (cost=0.00..68985.69 rows=2399869 width=9) -> Hash (cost=80849.63..80849.63 rows=2444 width=21) -> Hash Join (cost=2839.99..80849.63 rows=2444 width=21) Hash Cond: (public.lineitem.l_partkey = part.p_partkey) -> Seq Scan on 
lineitem (cost=0.00..68985.69 rows=2399869 width=17) -> Hash (cost=2839.00..2839.00 rows=79 width=4) -> Seq Scan on part (cost=0.00..2839.00 rows=79 width=4) Filter: ((p_brand = 'Brand#13'::bpchar) AND (p_container = 'JUMBO PKG'::bpchar))
Re: [HACKERS] WIP: Fast GiST index build
On Tue, Jul 26, 2011 at 9:24 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: That was a quite off-the-cuff remark, so I took the patch and culled out loaded-tree bookkeeping. A lot of other changes fell off from that, so it took me quite some time to get it working again, but here it is. This is a *lot* smaller patch, although that's partly explained by the fact that I left out some features: prefetching and the neighbor relocation code is gone. I'm pretty exhausted by this, so I just wanted to send this out without further analysis. Let me know if you have questions on the approach taken. I'm also not sure how this performs compared to your latest patch, I haven't done any performance testing. Feel free to use this as is, or as a source of inspiration :-). I also was going to try to evade keeping loaded-tree hash. This might help me a lot. Thanks. -- With best regards, Alexander Korotkov.
Re: [HACKERS] sinval synchronization considered harmful
On Tue, Jul 26, 2011 at 8:11 PM, Robert Haas robertmh...@gmail.com wrote: * Can we partition the sinval lock, so we have multiple copies? That increases the task for those who trigger an invalidation, but will relieve the pressure for most readers. Not sure there's a way to meaningfully partition the queue. In any case, I think the problem being dealt with here is how to update the read heads of the queue, not its contents. I agree there's no meaningful way to partition the queue, but we can store the information in more than one place to reduce the contention of people requesting it. I thought about that. Basically, that saves you a factor of N in contention on the read side (where N is the number of copies) and costs you a factor of N on the write side (you have to update N copies, taking a spinlock or lwlock for each). In the limit, you could do one copy of the counter per backend. I think, though, that a lock-free implementation using memory barriers is going to end up being the only real solution. We might possibly convince ourselves that we're OK with increasing the cost of SIInsertDataEntries(), but any solution that involves replication the data is still based on doing at least some locking. And I am pretty well convinced that even one spinlock acquisition in SIInsertDataEntries() is too many. You might be right, but I think we have little knowledge of how some memory barrier code you haven't written yet effects performance on various architectures. A spinlock per backend would cache very nicely, now you mention it. So my money would be on the multiple copies. It's not completely clear to me that updating N copies would be more expensive. Accessing N low contention copies rather than 1 high-contention value might actually be a win. 
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sinval synchronization considered harmful
Noah Misch n...@2ndquadrant.com writes: On Thu, Jul 21, 2011 at 11:37:27PM -0400, Robert Haas wrote: I think I have a simpler idea, though: before acquiring any locks, just have SIGetDataEntries() do this: + if (stateP->nextMsgNum == segP->maxMsgNum && !stateP->resetState) + return 0; Patch (with comment explaining why I think this is OK) attached. If the message numbers happen to be equal only because the counter has wrapped, then stateP->resetState will be true, so we'll still realize we need to do some work. This is attractive, and I don't see any problems with it. (In theory, you could hit a case where the load of resetState gives an ancient false just as the counters wrap to match. Given that the wrap interval is >100x as long as the reset interval, I'm not worried about problems on actual silicon.) +1 for doing this and moving on. After some further reflection I believe this patch actually is pretty safe, although Noah's explanation of why seems a bit confused. The key points are that (1) each of the reads is atomic, and (2) we should not be able to see a value that is older than our last acquisition/release of a shared memory lock. These being granted, we will make a decision that is correct, or at least was correct as of some time after that last lock action. As stated in the patch comments, we are not required to promptly assimilate sinval actions on objects we don't hold any lock on, so this seems sufficient. In essence, we are relying on an assumed prior visit to the lock manager to provide memory access synchronization for the sinval no-work-to-do test. The case Noah speculates about above amounts to supposing that this reasoning fails to hold for the resetState value, and I don't see why that would be true. Even if the above scenario did manage to happen, it would not mean that we missed the reset, just that we hadn't noticed it *yet*. And by hypothesis, none of the as-yet-not-seen catalog updates are a problem for us. 
BTW, the patch really ought to add something to the comment around line 100. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another issue with invalid XML values
Florian Pflug f...@phlo.org writes: Patch attached. Will check and apply this. I've pondered whether to add a check to configure which verifies that the headers match the libxml version exactly at compile time. In the end, I didn't do that, for two reasons. First, there isn't anything wrong with using older headers together with a newer libxml, so long as both versions are either <= 2.7.3 or > 2.7.3. And second, with such a check in place, the only way to exercise libxml's promised ABI compatibility is to upgrade the libxml2 package after compiling postgres. That, however, is unlikely to happen except on production servers, and so by adding the check we'd increase the chance of ABI-compatibility problems remaining undetected during development and testing. I concur with *not* doing that. regards, tom lane
Re: [HACKERS] sinval synchronization considered harmful
On Tue, Jul 26, 2011 at 03:40:32PM -0400, Tom Lane wrote: Noah Misch n...@2ndquadrant.com writes: On Thu, Jul 21, 2011 at 11:37:27PM -0400, Robert Haas wrote: I think I have a simpler idea, though: before acquiring any locks, just have SIGetDataEntries() do this: + if (stateP->nextMsgNum == segP->maxMsgNum && !stateP->resetState) + return 0; Patch (with comment explaining why I think this is OK) attached. If the message numbers happen to be equal only because the counter has wrapped, then stateP->resetState will be true, so we'll still realize we need to do some work. This is attractive, and I don't see any problems with it. (In theory, you could hit a case where the load of resetState gives an ancient false just as the counters wrap to match. Given that the wrap interval is >100x as long as the reset interval, I'm not worried about problems on actual silicon.) +1 for doing this and moving on. After some further reflection I believe this patch actually is pretty safe, although Noah's explanation of why seems a bit confused. The key points are that (1) each of the reads is atomic, and (2) we should not be able to see a value that is older than our last acquisition/release of a shared memory lock. These being granted, we will make a decision that is correct, or at least was correct as of some time after that last lock action. As stated in the patch comments, we are not required to promptly assimilate sinval actions on objects we don't hold any lock on, so this seems sufficient. In essence, we are relying on an assumed prior visit to the lock manager to provide memory access synchronization for the sinval no-work-to-do test. The case Noah speculates about above amounts to supposing that this reasoning fails to hold for the resetState value, and I don't see why that would be true. Even if the above scenario did manage to happen, it would not mean that we missed the reset, just that we hadn't noticed it *yet*. 
And by hypothesis, none of the as-yet-not-seen catalog updates are a problem for us. Here's the way it can fail: 1. Backend enters SIGetDataEntries() with main memory bearing stateP->resetState = false, stateP->nextMsgNum = 500, segP->maxMsgNum = 505. The CPU has those latest stateP values in cache, but segP->maxMsgNum is *not* in cache. 2. Backend stalls for a long time. Meanwhile, other backends issue MSGNUMWRAPAROUND - 5 invalidation messages. Main memory bears stateP->resetState = true, stateP->nextMsgNum = 500 - MSGNUMWRAPAROUND, segP->maxMsgNum = 500. 3. Backend wakes up, uses its cached stateP values and reads segP->maxMsgNum = 500 from main memory. The patch's test finds no need to reset or process invalidation messages. That's the theoretical risk I wished to illustrate. Though this appears possible on an abstract x86_64 system, I think it's unrealistic to suppose that a dirty cache line could persist *throughout* the issuance of more than 10^9 invalidation messages on a concrete implementation. A way to think about the problem is that our read of segP->maxMsgNum is too new. If we had used a snapshot of values as of the most recent lock acquisition/release, there would be no problem. It's the mix of a new-enough stateP with an all-too-new segP that yields the anomaly. -- Noah Misch http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
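Noah's interleaving can be mechanized. The sketch below (illustrative Python; the constant value is a stand-in chosen to match the ~10^9 figure quoted above, and the function name is invented) models a backend whose cached stateP values are stale while its read of segP->maxMsgNum is fresh, so the unlocked fast-path test wrongly concludes there is nothing to do:

```python
MSGNUMWRAPAROUND = 1 << 30  # stand-in matching the ~10^9 figure above

def no_work_fast_path(next_msg_num, reset_state, max_msg_num):
    # The proposed unlocked test at the top of SIGetDataEntries().
    return next_msg_num == max_msg_num and not reset_state

# Step 1: values the backend has cached: nextMsgNum=500, resetState=false.
cached_next, cached_reset = 500, False

# Step 2: while the backend stalls, other backends enqueue
# MSGNUMWRAPAROUND - 5 messages; after the wraparound adjustment, shared
# memory holds resetState=true, nextMsgNum=500-MSGNUMWRAPAROUND, maxMsgNum=500.
shared_max = (505 + (MSGNUMWRAPAROUND - 5)) - MSGNUMWRAPAROUND  # back to 500

# Step 3: the backend wakes, mixes its stale stateP values with a fresh
# maxMsgNum, and skips processing despite ~10^9 pending invalidations.
print(no_work_fast_path(cached_next, cached_reset, shared_max))  # True: anomaly
```

With a coherent view of shared memory (nextMsgNum already wrapped, resetState true), the same test correctly reports work to do — which is why the failure needs the stale/fresh mix Noah describes.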
Re: [HACKERS] sinval synchronization considered harmful
On Tue, Jul 26, 2011 at 01:36:26PM -0400, Robert Haas wrote: On Mon, Jul 25, 2011 at 6:24 PM, Noah Misch n...@2ndquadrant.com wrote: On Fri, Jul 22, 2011 at 03:54:03PM -0400, Robert Haas wrote: On Fri, Jul 22, 2011 at 3:28 PM, Noah Misch n...@2ndquadrant.com wrote: This is attractive, and I don't see any problems with it. (In theory, you could hit a case where the load of resetState gives an ancient false just as the counters wrap to match. Given that the wrap interval is 100x as long as the reset interval, I'm not worried about problems on actual silicon.) It's actually 262,144 times as long - see MSGNUMWRAPAROUND. Ah, so it is. It would be pretty easy to eliminate even the theoretical possibility of a race by getting rid of resetState altogether and using nextMsgNum = -1 to mean that. Maybe I should go ahead and do that. Seems like a nice simplification. On further reflection, I don't see that this helps: it just moves the problem around. Alas, yes. With resetState as a separate variable, nextMsgNum is never changed by anyone other than the owner, so we can never have a stale load. It's also changed when SICleanupQueue() decides to wrap everyone. This could probably be eliminated by using uint32 and letting overflow take care of wrap implicitly, but ... But if we overload nextMsgNum to also indicate whether our state has been reset, then there's a race between when we load nextMsgNum and when we load maxMsgNum (instead of code I posted previously, which has a race between when we load resetState and when we load maxMsgNum). Now, as you say, it seems really, really difficult to hit that in practice, but I don't see a way of getting rid of the theoretical possibility without either (1) a spinlock or (2) a fence. (Of course, on x86, the fence could be optimized down to a compiler barrier.) No new ideas come to mind, here. 
We could migrate back toward your original proposal of making the counter a non-wrapping uint64; Florian described some nice techniques for doing atomic 64-bit reads on x86: http://archives.postgresql.org/message-id/7c94c386-122f-4918-8624-a14352e8d...@phlo.org I'm not personally acquainted with those approaches, but they sound promising. Since the overlap between 32-bit installations and performance-sensitive installations is all but gone, we could even just use a spinlock under 32-bit. I guess the question is should we worry about that?. I'd lean toward no. I share Tom's unease about writing off a race condition as being too improbable, but this is quite exceptionally improbable. On the other hand, some of the fixes don't look too invasive. -- Noah Mischhttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
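The attraction of the non-wrapping-uint64 idea is that it removes the ambiguity the resetState flag exists to paper over: with a monotonically increasing counter, equality can only mean "caught up", never "exactly one wrap interval behind". A minimal sketch of that invariant (illustrative only; a 2^64 counter never wraps in practice):

```python
# With wrapping counters, next == max is ambiguous: caught up, or exactly
# MSGNUMWRAPAROUND messages behind. With a never-wrapping 64-bit counter,
# the number of pending messages is simply max - next, and zero means zero.
def pending_messages(next_msg_num, max_msg_num):
    assert max_msg_num >= next_msg_num  # monotonic, never wraps
    return max_msg_num - next_msg_num

print(pending_messages(500, 500))          # 0: genuinely caught up
print(pending_messages(500, 500 + 10**9))  # a wrap interval's worth of work
```

The cost, as the thread notes, is that 64-bit loads are not atomic on all 32-bit platforms, hence the pointer to Florian's atomic-read techniques or a fallback spinlock there.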
[HACKERS] XMLATTRIBUTES vs. values of type XML
Hi While reviewing the (now applied) XPATH escaping patches, Radoslaw found one case where the previous failure of XPATH to escape its return value was offset by XMLATTRIBUTES' insistence on escaping all input values, even if they're already of type XML. To wit, if you do SELECT XMLELEMENT(NAME t, XMLATTRIBUTES('&amp;'::XML AS a)) you get xmlelement -------------------- <t a="&amp;amp;"/> which is somewhat surprising. Especially since SELECT XMLELEMENT(NAME t, '&amp;'::XML); gives xmlelement ------------ <t>&amp;</t> as one would expect. Now, it seems rather unlikely that a real application would actually contain a query similar to the former one - you usually don't store XML in the attributes of XML nodes. But since XPATH() returns XML, it's not unlikely that an application would do SELECT XMLELEMENT(NAME ..., XMLATTRIBUTES((XPATH(...))[1] AS ...)) As it stands, on 9.2 the values returned by the XPath expression will be escaped twice - once by XPATH and a second time by XMLATTRIBUTES, while on 9.1 the value will be escaped only once. Since this seems to be the most likely situation in which XMLATTRIBUTES would receive an input argument of type XML, and since it's not entirely logical for XMLATTRIBUTES to unconditionally escape values already of type XML, I propose to change the behaviour of XMLATTRIBUTES as follows. Values not of type XML are treated as they always have been, i.e. all special characters (<, >, &, ", ') are replaced by an entity reference (&lt;, &gt;, &amp;, &quot;, &apos;). Values of type XML are assumed to be already escaped, so '&' isn't treated as a special character, to avoid double escaping. The other special characters (<, >, ", ') are escaped as usual. The safety of this relies on the fact that '&' may never appear in a well-formed XML document, except as part of an entity reference. This seems to be the case, except if CDATA sections are involved, which may contain plain ampersands. 
So the actual escaping rule would have to be a bit more complex than I made it sound above - we'd need to detect CDATA sections, and re-enable the escaping of '&' until the end of such a section. To actually implement this, I'd remove the use of libxml from the implementation of XMLELEMENT, and instead use our already existing escape_xml() function, enhanced with the ability to handle the partial escaping algorithm outlined above. We already only use libxml to escape attribute values, so doing that isn't a radical departure from xml.c's ways. As an added benefit, all the encoding-related problems of XMLELEMENT and XMLATTRIBUTES would go away once libxml is removed from this code path. So XPATH() would be the only remaining function which breaks in non-UTF-8 databases. Comments? Thoughts? Suggestions? best regards, Florian Pflug
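The proposed partial-escaping rule can be sketched as follows (illustrative Python; the function name is made up, the real implementation would live in escape_xml() in xml.c, and the CDATA refinement is reduced to a naive marker scan):

```python
def escape_attribute_value(value, value_is_xml):
    """Escape <, >, ", ' unconditionally. Escape & always for non-XML
    input; for XML-typed input, assume '&' is already escaped, except
    inside CDATA sections, where a bare '&' is legal."""
    entities = {'<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;'}
    out = []
    in_cdata = False
    for i, ch in enumerate(value):
        if value.startswith('<![CDATA[', i):
            in_cdata = True            # bare '&' is legal from here on
        elif in_cdata and value.startswith(']]>', i):
            in_cdata = False
        if ch == '&':
            # XML-typed input is assumed pre-escaped, except inside CDATA.
            out.append('&amp;' if (not value_is_xml or in_cdata) else '&')
        else:
            out.append(entities.get(ch, ch))
    return ''.join(out)

# Non-XML input: escaped exactly as today.
print(escape_attribute_value('&', False))     # &amp;
# XML-typed input: an entity reference is not escaped a second time.
print(escape_attribute_value('&amp;', True))  # &amp;
```

This would make the XPATH()-into-XMLATTRIBUTES case come out escaped exactly once on 9.2, matching the 9.1 behaviour described above.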
Re: [HACKERS] isolation test deadlocking on buildfarm member coypu
After committing Noah's patch to fix the isolation tests, there have been two more failures in Rémi's machines pika and coypu: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pika&dt=2011-07-24%2006%3A45%3A45 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=coypu&dt=2011-07-23%2021%3A50%3A54 In both cases, the failure is identical in fk-deadlock2: == pgsql.20950/src/test/isolation/regression.diffs === *** /home/pgbuildfarm/workdir/HEAD/pgsql.20950/src/test/isolation/expected/fk-deadlock2.out Sun Jul 24 08:46:44 2011 --- /home/pgbuildfarm/workdir/HEAD/pgsql.20950/src/test/isolation/results/fk-deadlock2.out Sun Jul 24 15:11:42 2011 *** *** 22,29 step s2u1: UPDATE B SET Col2 = 1 WHERE BID = 2; step s1u2: UPDATE B SET Col2 = 1 WHERE BID = 2; waiting ... step s2u2: UPDATE B SET Col2 = 1 WHERE BID = 2; - step s1u2: ... completed ERROR: deadlock detected step s1c: COMMIT; step s2c: COMMIT; --- 22,29 step s2u1: UPDATE B SET Col2 = 1 WHERE BID = 2; step s1u2: UPDATE B SET Col2 = 1 WHERE BID = 2; waiting ... step s2u2: UPDATE B SET Col2 = 1 WHERE BID = 2; ERROR: deadlock detected + step s1u2: ... completed step s1c: COMMIT; step s2c: COMMIT; I think the only explanation necessary for this is that one process reports its status before the other one. I think it would be enough to add another variant of the expected file to fix this problem, but I don't quite want to do that because we already have three of them, and I think we would need to add one per existing expected, so we'd end up with 6 expected files which would be a pain to work with. Not quite sure what to do with this. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] sinval synchronization considered harmful
Noah Misch n...@2ndquadrant.com writes: On Tue, Jul 26, 2011 at 03:40:32PM -0400, Tom Lane wrote: After some further reflection I believe this patch actually is pretty safe, although Noah's explanation of why seems a bit confused. Here's the way it can fail: 1. Backend enters SIGetDataEntries() with main memory bearing stateP->resetState = false, stateP->nextMsgNum = 500, segP->maxMsgNum = 505. The CPU has those latest stateP values in cache, but segP->maxMsgNum is *not* in cache. 2. Backend stalls for a long time. Meanwhile, other backends issue MSGNUMWRAPAROUND - 5 invalidation messages. Main memory bears stateP->resetState = true, stateP->nextMsgNum = 500 - MSGNUMWRAPAROUND, segP->maxMsgNum = 500. 3. Backend wakes up, uses its cached stateP values and reads segP->maxMsgNum = 500 from main memory. The patch's test finds no need to reset or process invalidation messages. [ squint... ] Hmm, you're right. The case where this would break things is if (some of) the five unprocessed messages relate to some object we've just locked. But the initial state you describe would be valid right after obtaining such a lock. That's the theoretical risk I wished to illustrate. Though this appears possible on an abstract x86_64 system, I think it's unrealistic to suppose that a dirty cache line could persist *throughout* the issuance of more than 10^9 invalidation messages on a concrete implementation. Dirty cache line, maybe not, but what if the assembly code commands the CPU to load those variables into CPU registers before doing the comparison? If they're loaded with maxMsgNum coming in last (or at least after resetState), I think you can have the problem without any assumptions about cache line behavior at all. You just need the process to lose the CPU at the right time. If we marked the pointers volatile, we could probably ensure that the assembly code tests resetState last, but that's not sufficient to avoid the stale-cache-line risk. 
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pull up aggregate sublink (was: Parameterized aggregate subquery (was: Pull up aggregate subquery))
Yeb Havinga yebhavi...@gmail.com writes: A few days ago I read Tomas Vondra's blog post about dss tpc-h queries on PostgreSQL at http://fuzzy.cz/en/articles/dss-tpc-h-benchmark-with-postgresql/ - in which he showed how to manually pull up a dss subquery to get a large speed up. Initially I thought: cool, this is probably now handled by Hitoshi's patch, but it turns out the subquery type in the dss query is different. Actually, I believe this example is the exact opposite of the transformation Hitoshi proposes. Tomas was manually replacing an aggregated subquery by a reference to a grouped table, which can be a win if the subquery would be executed enough times to amortize calculation of the grouped table over all the groups (some of which might never be demanded by the outer query). Hitoshi was talking about avoiding calculations of grouped-table elements that we don't need, which would be a win in different cases. Or at least that was the thrust of his original proposal; I'm not sure where the patch went since then. This leads me to think that we need to represent both cases as the same sort of query and make a cost-based decision as to which way to go. Thinking of it as a pull-up or push-down transformation is the wrong approach because those sorts of transformations are done too early to be able to use cost comparisons. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sinval synchronization considered harmful
On Tue, Jul 26, 2011 at 05:05:15PM -0400, Tom Lane wrote: Noah Misch n...@2ndquadrant.com writes: That's the theoretical risk I wished to illustrate. Though this appears possible on an abstract x86_64 system, I think it's unrealistic to suppose that a dirty cache line could persist *throughout* the issuance of more than 10^9 invalidation messages on a concrete implementation. Dirty cache line, maybe not, but what if the assembly code commands the CPU to load those variables into CPU registers before doing the comparison? If they're loaded with maxMsgNum coming in last (or at least after resetState), I think you can have the problem without any assumptions about cache line behavior at all. You just need the process to lose the CPU at the right time. True. If the compiler places the resetState load first, you could hit the anomaly by merely setting a breakpoint on the next instruction, waiting for exactly MSGNUMWRAPAROUND messages to enqueue, and letting the backend continue. I think, though, we should either plug that _and_ the cache incoherency case or worry about neither. -- Noah Mischhttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sinval synchronization considered harmful
Noah Misch n...@2ndquadrant.com writes: On Tue, Jul 26, 2011 at 05:05:15PM -0400, Tom Lane wrote: Dirty cache line, maybe not, but what if the assembly code commands the CPU to load those variables into CPU registers before doing the comparison? If they're loaded with maxMsgNum coming in last (or at least after resetState), I think you can have the problem without any assumptions about cache line behavior at all. You just need the process to lose the CPU at the right time. True. If the compiler places the resetState load first, you could hit the anomaly by merely setting a breakpoint on the next instruction, waiting for exactly MSGNUMWRAPAROUND messages to enqueue, and letting the backend continue. I think, though, we should either plug that _and_ the cache incoherency case or worry about neither. How do you figure that? The poor-assembly-code-order risk is both a lot easier to fix and a lot higher probability. Admittedly, it's still way way down there, but you only need a precisely-timed sleep, not a precisely-timed sleep *and* a cache line that somehow remained stale. regards, tom lane
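The load ordering under discussion can be pinned down with a compiler barrier. Below is a minimal sketch — not the actual sinvaladt.c code; the struct, macro, and function names are invented for illustration. The point is only that maxMsgNum must be loaded before resetState is checked: if the process sleeps between the two loads and a wraparound happens meanwhile, resetState will read as set and the stale maxMsgNum is never trusted. Loading resetState first admits the anomaly Noah describes.

```c
#include <assert.h>

/* Illustrative shared state; the real fields live in sinvaladt.c's
 * shared-memory structs and are read without holding a lock. */
typedef struct
{
    volatile int maxMsgNum;     /* global message counter */
    volatile int resetState;    /* set when a backend's queue wrapped */
} SharedState;

/* Compiler barrier (GCC-style): forbids the compiler from reordering
 * memory accesses across this point. */
#define pg_compiler_barrier()   __asm__ __volatile__("" ::: "memory")

/*
 * Return the number of messages this backend has not yet read, or -1
 * if it must discard its caches and resynchronize.
 */
static int
unread_messages(SharedState *s, int nextMsgNum)
{
    int max = s->maxMsgNum;     /* load maxMsgNum first ... */

    pg_compiler_barrier();      /* ... and keep the loads in this order */

    if (s->resetState)          /* ... then check for a reset */
        return -1;
    return max - nextMsgNum;
}
```

With `s = {10, 0}` and `nextMsgNum = 7` this reports 3 unread messages; once resetState is set it reports -1 regardless of the counter. A plain volatile qualifier alone would not be enough on a machine that reorders loads in hardware, which is why the cache-coherency question is a separate discussion.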
Re: [HACKERS] [COMMITTERS] pgsql: Add missing newlines at end of error messages
This seems like a mechanical check. Should it be part of what gets checked when people push? Cheers, David. On Tue, Jul 26, 2011 at 08:30:16PM +0000, Peter Eisentraut wrote: Add missing newlines at end of error messages Branch -- master Details --- http://git.postgresql.org/pg/commitdiff/e67efb01e886d69d40d1cd87fba4507e8bb1035e Modified Files -- src/bin/psql/command.c | 2 +- src/bin/psql/common.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) -- Sent via pgsql-committers mailing list (pgsql-committ...@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-committers -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] storing TZ along timestamps
On Jul 25, 2011, at 6:22 PM, Robert Haas wrote: On Mon, Jul 25, 2011 at 6:26 PM, Jim Nasby j...@nasby.net wrote: Hrm, don't we only pull in ZIC info on a reload? Or do we actually refer to it dynamically all the time? Perhaps we can enforce that we'll only recognize new TZ info as part of a config reload? Hmm. That might work in theory, but I don't see any good way to update every database's tz table on each reload. I'm assuming that the issue here is that multiple backends could be connected to the same database, and we don't want all of them to try to actually do the updates, only the first one that discovers the change. If that's the problem you foresee then perhaps it's a non-issue... if each backend only updates things that have actually changed, and they do that with race-free 'merge' logic, then only the first backend to attempt the update would end up finding actual work to do. Or are you seeing a problem I'm missing? -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] storing TZ along timestamps
On Tue, Jul 26, 2011 at 6:45 PM, Jim Nasby j...@nasby.net wrote: On Jul 25, 2011, at 6:22 PM, Robert Haas wrote: On Mon, Jul 25, 2011 at 6:26 PM, Jim Nasby j...@nasby.net wrote: Hrm, don't we only pull in ZIC info on a reload? Or do we actually refer to it dynamically all the time? Perhaps we can enforce that we'll only recognize new TZ info as part of a config reload? Hmm. That might work in theory, but I don't see any good way to update every database's tz table on each reload. I'm assuming that the issue here is that multiple backends could be connected to the same database, and we don't want all of them to try to actually do the updates, only the first one that discovers the change. If that's the problem you foresee then perhaps it's a non-issue... if each backend only updates things that have actually changed, and they do that with race-free 'merge' logic, then only the first backend to attempt the update would end up finding actual work to do. Or are you seeing a problem I'm missing? What if 4 backends concurrently are the first ones to try to simultaneously add South America/Ruritania, and... 1. Connection #1 came in 'first', but rolls back its transaction. 2. Connection #2 came in 'second', and also winds up rolling back its transaction because the connection fails due to a network problem. 3. Connection #3 actually completes. But doesn't commit until after #4. 4. Connection #4 started last, but turns out to COMMIT first. The merge is a pretty bad one. They all have to try to succeed, and in some way that doesn't block things. Perhaps the TZ values need to not be UNIQUE, but some process can come in afterwards and rewrite to drop out non-unique values. That's not very nice either; that means you can't use a FK reference against the TZ table. Or you need to have something that comes in afterwards and repoints tuples to the *real* TZ entry, which seems likely to be troublesome. 
This just gets so ugly so fast; the attempt to save space by storing a pointer to the TZ value is just filled with trouble (and potential #fail). -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
Re: [HACKERS] Check constraints on partition parents only?
On Jul 25, 2011, at 9:59 PM, Jerry Sievers wrote: That our version of partitioning can be overloaded like this though I think adds power. A bit of which we lost adding the restriction. That's why I'd be opposed to any partitioning scheme that removed the ability to have different fields in different children. We've found that ability to be very useful. Likewise, I think we need to have intelligent plans involving a parent table that's either completely empty or mostly empty. As for dealing with inheritance and putting stuff on some children but not others, take a look at http://pgfoundry.org/projects/enova-tools/. There's a presentation there that discusses how we solved these issues and it includes the tools we created to do it. Note that we're close to releasing a cleaner version of that stuff, so if you decide to use it please ping me off-list if we haven't gotten the new stuff posted. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] storing TZ along timestamps
On Jul 26, 2011, at 5:56 PM, Christopher Browne wrote: I'm assuming that the issue here is that multiple backends could be connected to the same database, and we don't want all of them to try to actually do the updates, only the first one that discovers the change. If that's the problem you foresee then perhaps it's a non-issue... if each backend only updates things that have actually changed, and they do that with race-free 'merge' logic, then only the first backend to attempt the update would end up finding actual work to do. Or are you seeing a problem I'm missing? What if 4 backends concurrently are the first ones to try to simultaneously add South America/Ruritania, and... 1. Connection #1 came in 'first', but rolls back its transaction. 2. Connection #2 came in 'second', and also winds up rolling back its transaction because the connection fails due to a network problem. 3. Connection #3 actually completes. But doesn't commit until after #4. 4. Connection #4 started last, but turns out to COMMIT first. Ugh, I didn't realize that a reload would take effect in the middle of a transaction. That certainly kills what I proposed. Though, now I'm wondering why this info would need to be in every database anyway... certainly this should be treated as global data, and if that's the case then only one process needs to update it. Though I'm not sure if it's possible for global data to be ACID. Anyway, rather than continue this on-list, I'm going to get Alvaro to think about it in more detail and see what he comes up with. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] New partitioning WAS: Check constraints on partition parents only?
Jim, That's why I'd be opposed to any partitioning scheme that removed the ability to have different fields in different children. We've found that ability to be very useful. Likewise, I think we need to have intelligent plans involving a parent table that's either completely empty or mostly empty. Well, I don't think that anyone is proposing making constraint exclusion go away. However, we also need a new version of partitioning which happens below the table level. I don't agree that the new partitioning needs -- at least at the start -- the level of flexibility which CE gives the user. In order to get simplicity, we have to sacrifice flexibility. In fact, I'd suggest extreme simplicity for the first version of this, with just key partitioning. That is: CREATE TABLE table_name ( ... cols ... ) PARTITION ON key_expression [ AUTOMATIC CREATE ]; ... where key_expression can be any immutable expression on one or more columns of some_table. This actually covers range partitioning as well, provided that the ranges can be expressed as the results of an expression (e.g. EXTRACT ('month' FROM date_processed ) ). For the optional AUTOMATIC CREATE phrase, new values for key_expression would result in the automatic creation of new partitions when they appear (this has some potential deadlocking issues, so it's not ideal for a lot of applications). Otherwise, you'd create partitions manually: CREATE PARTITION ON table_name KEY key_value; DROP PARTITION ON table_name KEY key_value; ... where key_value is some valid value which could result from key_expression. Yes, this is a very narrow and simplistic partitioning spec. However, it would cover 80% of the use cases I see in the field or on IRC, while being 80% simpler than CE. And CE would still be there for those who need it. -- Josh Berkus PostgreSQL Experts Inc. 
http://pgexperts.com
Re: [HACKERS] psql: bogus descriptions displayed by \d+
On Tue, Jul 26, 2011 at 9:53 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jul 25, 2011 at 10:29 PM, Josh Kupershmidt schmi...@gmail.com wrote: I think this is basically the right approach but I found what you did here a bit wordy, so I simplified it, committed it, and back-patched to 9.0 with suitable adjustment. Hopefully I didn't simplify it into a form you don't like. That looks fine. Minor grammar quibble about: + When commenting on a column, + <replaceable class="parameter">relation_name</replaceable> must refer + to a table, view, composite types, or foreign table. "types" should probably be the singular "type". * get rid of the bogus Description outputs for \d+ sequence_name and \d+ index_name Committed this part to head with minor tweaks. Thanks for the commit. * The Storage column for \d+ sequence_name is correct, I suppose, but repetitive I'm OK with removing that. Hrm, would it be better to keep that Storage bit around in some non-repetitive form, maybe on its own line below the table output? Well, I don't really see that it has any value. I'd probably just leave it the way it is, but if we're going to change something, I would favor removing it over relocating it. I notice the Storage information is also repeated for multi-column indexes. I don't mind leaving this wart as-is for now, since single-column indexes are probably the norm, and we would presumably want to fix both types in one go. Josh
Re: [HACKERS] sinval synchronization considered harmful
On Tue, Jul 26, 2011 at 3:34 PM, Simon Riggs si...@2ndquadrant.com wrote: You might be right, but I think we have little knowledge of how some memory barrier code you haven't written yet affects performance on various architectures. A spinlock per backend would cache very nicely, now you mention it. So my money would be on the multiple copies. Maybe so, but you can see from the numbers in my OP that the results still leave something to be desired. It's not completely clear to me that updating N copies would be more expensive. Accessing N low contention copies rather than 1 high-contention value might actually be a win. Yeah, I haven't tested that approach. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
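The "multiple copies" idea can be sketched as one padded copy of the counter per backend: the writer does O(N) plain stores, but each reader only ever touches its own cache line, so readers never contend with each other. This is purely illustrative — the names and layout below are invented for the sketch, not PostgreSQL's actual shared-memory structures, and all locking and fencing is elided.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BACKENDS 16
#define CACHE_LINE   64

/* One copy of the message counter per backend, padded so that no two
 * backends share a cache line (hypothetical layout). */
typedef struct
{
    struct
    {
        volatile uint32_t maxMsgNum;
        char              pad[CACHE_LINE - sizeof(uint32_t)];
    } copy[MAX_BACKENDS];
} SICounters;

/* Writer: O(nbackends) stores per message inserted. */
static void
si_advance(SICounters *c, int nbackends)
{
    for (int i = 0; i < nbackends; i++)
        c->copy[i].maxMsgNum++;
}

/* Reader: a single load from its own, uncontended cache line. */
static uint32_t
si_read(SICounters *c, int my_backend_id)
{
    return c->copy[my_backend_id].maxMsgNum;
}
```

The trade-off is exactly the one debated above: the writer's cost grows with the number of backends, but the hot read path stops bouncing a single shared cache line between every CPU.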
Re: [HACKERS] sinval synchronization considered harmful
On Tue, Jul 26, 2011 at 3:25 PM, Simon Riggs si...@2ndquadrant.com wrote: On Tue, Jul 26, 2011 at 8:11 PM, Robert Haas robertmh...@gmail.com wrote: It wouldn't, although it might be bad in the case where there are lots of temp tables being created and dropped. Do temp tables cause relcache invalidations? That seems like something we'd want to change in itself. I agree. Unfortunately, I think it's a non-trivial fix. I've also been wondering if we could avoid taking an AccessExclusiveLock on a newly created (temporary?) table. It seems like no one should be able to see it until commit, at which point we'd be releasing the lock anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] sinval synchronization considered harmful
On Tue, Jul 26, 2011 at 4:38 PM, Noah Misch n...@2ndquadrant.com wrote: No new ideas come to mind, here. OK, I have a new idea. :-) 1. Add a new flag to each procState called something like timeToPayAttention. 2. Each call to SIInsertDataEntries() iterates over all the ProcStates whose index is < lastBackend and sets stateP->timeToPayAttention = TRUE for each. 3. At the beginning of SIGetDataEntries(), we do an unlocked if (!stateP->timeToPayAttention) return 0. 4. Immediately following that if statement and before acquiring any locks, we set stateP->timeToPayAttention = FALSE. The LWLockRelease() in SIInsertDataEntries() will be sufficient to force all of the stateP->timeToPayAttention writes out to main memory, and the read side is OK because we either just took a lock (which acted as a fence) or else there's a race anyway. But unlike my previous proposal, it doesn't involve *comparing* anything. We needn't worry about whether we could read two different values that are through great misfortune out of sync because we're only reading one value. If, by chance, the value is set to true just after we set it to false, that won't mess us up either: we'll still read all the messages after acquiring the lock. Now, there's some downside to this approach - it involves doing O(N) work for each invalidation message we receive. But as long as it's only O(N) stores and not O(N) lock acquisitions and releases, I think that might be OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
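Assuming the flags are set on the insert side, the proposal sketches out roughly as below. Aside from timeToPayAttention, lastBackend, and maxMsgNum, all names and fields are simplified stand-ins for sinvaladt.c's real structures, and the locking and fences the email relies on are reduced to comments.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BACKENDS 8

/* Per-backend state; timeToPayAttention is the proposed flag. */
typedef struct
{
    volatile bool timeToPayAttention;
    int           nextMsgNum;   /* next message this backend will read */
} ProcState;

typedef struct
{
    int       lastBackend;      /* index bound of active backends */
    int       maxMsgNum;        /* global message counter */
    ProcState procState[MAX_BACKENDS];
} SISeg;

/* Insert side: queue a message, then flag every backend.  O(N) plain
 * stores; in the real code the subsequent LWLockRelease() would act as
 * the write fence that publishes them. */
static void
si_insert(SISeg *seg)
{
    seg->maxMsgNum++;
    for (int i = 0; i < seg->lastBackend; i++)
        seg->procState[i].timeToPayAttention = true;
}

/* Read side: unlocked early exit when the flag is clear.  The flag is
 * cleared *before* the messages are read, so a flag set again
 * concurrently only causes one harmless extra pass later. */
static int
si_get(SISeg *seg, int my_id)
{
    ProcState *stateP = &seg->procState[my_id];

    if (!stateP->timeToPayAttention)
        return 0;                       /* nothing new, no lock taken */
    stateP->timeToPayAttention = false; /* clear before reading */

    /* ... the real code would acquire SInvalReadLock about here ... */
    int n = seg->maxMsgNum - stateP->nextMsgNum;

    stateP->nextMsgNum = seg->maxMsgNum;
    return n;
}
```

Note how the scheme sidesteps the earlier ordering hazard: the reader consults a single boolean rather than comparing two separately loaded values, so there is nothing to get out of sync.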