Re: [HACKERS] narwhal and PGDLLIMPORT
On Sun, Feb 2, 2014 at 1:03 AM, Tom Lane wrote:
> I happened to notice today that the owner of buildfarm member narwhal
> is trying to revive it after a long time offline, but it's failing in
> the 9.3 branch (and not attempting to build HEAD, yet). The cause
> appears to be that contrib/postgres_fdw is referencing the DateStyle
> and IntervalStyle variables, which aren't marked PGDLLIMPORT.
>
> Hm, well, that would be an easy change ... but that code was committed
> last March. How is it that we didn't notice this long ago?
>
> What this seems to indicate is that narwhal is the only buildfarm
> animal that has a need for PGDLLIMPORT marks on global variables that
> are referenced by extensions. Furthermore, nobody has attempted to
> build 9.3 on a platform that needs that (or at least they've not
> reported failure to us).
>
> According to the buildfarm database, narwhal is running a gcc build on
> Windows 2003. That hardly seems like a mainstream use case. I could
> believe that it might be of interest to developers, but clearly no
> developers are actually running such a build.
>
> I think we should give serious consideration to desupporting this
> combination so that we can get rid of the plague of PGDLLIMPORT
> marks. Obviously this would depend on confirming that there are
> no more-interesting Windows build methods that require it --- but
> if there are any, I'd sure demand that there be an active buildfarm
> instance to keep us from breaking the case again in future.

No objection here - though I should point out that it's not been offline for a long time (aside from a couple of weeks in January) - it's been happily building most pre-9.2 branches for ages. 9.1 seems to be stuck, along with HEAD, and I forgot to add 9.3. I'm in the process of cleaning that up as time allows, but am happy to drop it instead if we no longer want to support anything that old. We certainly don't use anything resembling that config for the EDB installer builds.
I'm happy to replace it with something newer as time allows - what do we consider to be the biggest gap? I have an MSDN subscription, so can do any versions of Windows or VC++ (and obviously Mingw).

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Sat, Feb 1, 2014 at 1:41 PM, Andres Freund wrote:
>> However, I tested the
>> most recent revision from your git remote on the AWS instance.
>
> But that was before my fix, right. Except you managed to timetravel :)

Heh, okay. So Nathan Boley has generously made available a machine with 4 AMD Opteron 6272s. I've performed the same benchmark on that server. However, I thought it might be interesting to circle back and get some additional numbers for the AWS instance already tested - I'd like to see what it looks like after your recent tweaks to fix the regression. The single client performance of that instance seems to be markedly better than that of Nathan's server.

Tip: AWS command line tools + S3 are a great way to easily publish bulky pgbench-tools results, once you figure out how to correctly set your S3 bucket's security policy to allow public http. It has similar advantages to rsync, and just works with a minimum of fuss.

Anyway, I don't think that the new, third c3.8xlarge-rwlocks testset tells us much of anything:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/c38xlarge-rwlocks/

Here are the results of a benchmark on Nathan Boley's 64-core, 4-socket server:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/

Perhaps I should have gone past 64 clients, because in the document "Lock Scaling Analysis on Intel Xeon Processors" [1], Intel writes:

"This implies that with oversubscription (more threads running than available logical CPUs), the performance of spinlocks can depend heavily on the exact OS scheduler behavior, and may change drastically with operating system or VM updates."

I haven't bothered with higher client counts, though, because Andres noted it's the same with 90 clients on this AMD system. Andres: Do you see big problems when # clients < # logical cores on the affected Intel systems? There is only a marginal improvement in performance on this big 4-socket system.
Andres informs me privately that he has reproduced the problem on multiple new 4-socket Intel servers, so it seems reasonable to suppose that it's more or less an Intel thing. The Intel document [1] further notes:

"As the number of threads polling the status of a lock address increases, the time it takes to process those polling requests will increase. Initially, the latency to transfer data across socket boundaries will always be an order of magnitude longer than the on-chip cache-to-cache transfer latencies. Such cross-socket transfers, if they are not effectively minimized by software, will negatively impact the performance of any lock algorithm that depends on them."

So, I think it's fair to say, given what we now know from Andres' numbers and the numbers I got from Nathan's server, that this patch is closer to being something that addresses a particularly unfortunate pathology on many-socket Intel systems than it is to being a general performance optimization. Based on the above quoted passage, it isn't unreasonable to suppose other vendors or architectures could be affected, but that isn't in evidence. While I welcome the use of atomic operations in the context of LW_SHARED acquisition as general progress, I think that to the outside world my proposed messaging is more useful. It's not quite a bug-fix, but if you're using a many-socket Intel server, you're *definitely* going to want to use a PostgreSQL version that is unaffected. You may well not want to take on the burden of waiting for 9.4, or waiting for it to fully stabilize.

I note that Andres has a feature branch of this backported to Postgres 9.2, no doubt because of a request from a 2ndQuadrant customer. I have to wonder if we should think about making this available with a configure switch in one or more back branches.
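The polling behavior the Intel document describes can be illustrated with a toy test-and-test-and-set (TTAS) spinlock - to be clear, this is my own minimal sketch for illustration, not the LWLock patch or PostgreSQL's s_lock implementation. Threads spin on a plain read, which stays in the local cache in shared state, and only attempt the atomic exchange once the lock looks free; that reduces the cross-socket cache-line traffic a naive exchange loop generates.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Toy test-and-test-and-set (TTAS) spinlock. */
typedef struct { atomic_int locked; } ttas_lock;

static void ttas_acquire(ttas_lock *l)
{
    for (;;)
    {
        /* cheap local spin: a plain load causes no invalidation traffic */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
        /* lock looked free: try to grab it with one atomic exchange */
        if (!atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
            return;
    }
}

static void ttas_release(ttas_lock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Small demo: a shared counter protected by the lock. */
static ttas_lock demo_lock;
static long demo_counter;

static void *demo_worker(void *arg)
{
    (void) arg;
    for (int i = 0; i < 100000; i++)
    {
        ttas_acquire(&demo_lock);
        demo_counter++;
        ttas_release(&demo_lock);
    }
    return NULL;
}

/* Runs 4 threads and returns the final counter value. */
long ttas_demo(void)
{
    pthread_t   th[4];

    demo_counter = 0;
    atomic_store(&demo_lock.locked, 0);
    for (int i = 0; i < 4; i++)
        pthread_create(&th[i], NULL, demo_worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(th[i], NULL);
    return demo_counter;
}
```

Even with the cheap-spin refinement, every handover still moves the cache line between sockets, which is the cost the quoted passage is talking about.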
I think that the complete back-porting of the fsync request queue issue's fix in commit 758728 could be considered a precedent - that too was a fix for a really bad worst-case that was encountered fairly infrequently in the wild. It's sort of horrifying to have red-hot spinlocks in production, so that seems like the kind of thing we should make an effort to address for those running multi-socket systems. Of those running Postgres on new multi-socket systems, the reality is that the majority are running on Intel hardware. Unfortunately, everyone knows that Intel will soon be the only game in town when it comes to high-end x86_64 servers, which contributes to my feeling that we need to target back branches.

We should do something about the possible regression with older compilers using the fallback first, though. It would be useful to know more about the nature of the problem that made such an appreciable difference in Andres' original post. Again, through private e-mail, I saw perf profiles from affected servers and an unaffected though roughly comparable server (i.e. Nathan's 64 core AMD server). Andres observed that "stalled-cycles-frontend" and "stalled-cycles-backend" Linux perf event
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-01 21:04:44 -0500, Tom Lane wrote:
> Andres Freund writes:
> > I don't know why it didn't fail for postgres_fdw. Hm, maybe it's because
> > Mkvcbuild.pm links postgres_fdw with libpq, but not test_shm_mq?
>
> Hard to see how libpq as such could have anything to do with it.
> I wonder though if adding libpq causes some different link-time
> switch to be used. If so, could we arrange to use that switch
> all the time, and thus get rid of the need for PGDLLIMPORT marks?

Well, not because of libpq itself, but because it causes a def file to be used. I am not sure how they are autogenerated, but that might already be the cause. I fortunately have forgotten most of that ugliness.

Greetings,

Andres Freund

-- 
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] narwhal and PGDLLIMPORT
Andres Freund writes:
> I don't know why it didn't fail for postgres_fdw. Hm, maybe it's because
> Mkvcbuild.pm links postgres_fdw with libpq, but not test_shm_mq?

Hard to see how libpq as such could have anything to do with it. I wonder though if adding libpq causes some different link-time switch to be used. If so, could we arrange to use that switch all the time, and thus get rid of the need for PGDLLIMPORT marks?

regards, tom lane
Re: [HACKERS] narwhal and PGDLLIMPORT
Andres Freund writes:
> On 2014-02-01 20:03:58 -0500, Tom Lane wrote:
>> I think we should give serious consideration to desupporting this
>> combination so that we can get rid of the plague of PGDLLIMPORT
>> marks. Obviously this would depend on confirming that there are
>> no more-interesting Windows build methods that require it --- but
>> if there are any, I'd sure demand that there be an active buildfarm
>> instance to keep us from breaking the case again in future.
>
> Weren't there more recent cases of needing to add PGDLLIMPORTs?

Yeah, which makes the plot even thicker. If references to those variables failed, how did we not notice postgres_fdw's issue? I don't claim to know exactly what's going on, but I think we need to find out.

regards, tom lane
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-02 02:15:39 +0100, Andres Freund wrote:
> On 2014-02-01 20:03:58 -0500, Tom Lane wrote:
> > I think we should give serious consideration to desupporting this
> > combination so that we can get rid of the plague of PGDLLIMPORT
> > marks. Obviously this would depend on confirming that there are
> > no more-interesting Windows build methods that require it --- but
> > if there are any, I'd sure demand that there be an active buildfarm
> > instance to keep us from breaking the case again in future.
>
> Weren't there more recent cases of needing to add PGDLLIMPORTs? I
> certainly remember pushing code to Craig's jenkins instance which made
> me do so for recent msvc builds...
> E.g. 7d7eee8bb702d7796a0d7c5886c1f4685f2e2806 and
> 708c529c7fdeba9387825d746752fc6f439d781e just a few days ago seem to be
> some of those cases.

And indeed, currawong failed with a link time error:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=currawong&dt=2014-01-17%2013%3A30%3A21

I don't know why it didn't fail for postgres_fdw. Hm, maybe it's because Mkvcbuild.pm links postgres_fdw with libpq, but not test_shm_mq?

Greetings,

Andres Freund

-- 
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/01/2014 08:03 PM, Tom Lane wrote:
> I happened to notice today that the owner of buildfarm member narwhal
> is trying to revive it after a long time offline, but it's failing in
> the 9.3 branch (and not attempting to build HEAD, yet). The cause
> appears to be that contrib/postgres_fdw is referencing the DateStyle
> and IntervalStyle variables, which aren't marked PGDLLIMPORT.
>
> Hm, well, that would be an easy change ... but that code was committed
> last March. How is it that we didn't notice this long ago?
>
> What this seems to indicate is that narwhal is the only buildfarm
> animal that has a need for PGDLLIMPORT marks on global variables that
> are referenced by extensions. Furthermore, nobody has attempted to
> build 9.3 on a platform that needs that (or at least they've not
> reported failure to us).
>
> According to the buildfarm database, narwhal is running a gcc build on
> Windows 2003. That hardly seems like a mainstream use case. I could
> believe that it might be of interest to developers, but clearly no
> developers are actually running such a build.
>
> I think we should give serious consideration to desupporting this
> combination so that we can get rid of the plague of PGDLLIMPORT
> marks. Obviously this would depend on confirming that there are
> no more-interesting Windows build methods that require it --- but
> if there are any, I'd sure demand that there be an active buildfarm
> instance to keep us from breaking the case again in future.
>
> Thoughts?

I don't think that it's Windows 2003 that matters so much here as the fact that it's a very old version of gcc. For example, my fairly old buildfarm animal frogmouth, which will be decommissioned soon when Windows NT is totally unsupported, runs gcc 4.5.0, whereas narwhal runs 3.4.2. If anyone is really using such an old version, I think they should stop, immediately.

cheers

andrew
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-01 20:03:58 -0500, Tom Lane wrote:
> I think we should give serious consideration to desupporting this
> combination so that we can get rid of the plague of PGDLLIMPORT
> marks. Obviously this would depend on confirming that there are
> no more-interesting Windows build methods that require it --- but
> if there are any, I'd sure demand that there be an active buildfarm
> instance to keep us from breaking the case again in future.

Weren't there more recent cases of needing to add PGDLLIMPORTs? I certainly remember pushing code to Craig's jenkins instance which made me do so for recent msvc builds... E.g. 7d7eee8bb702d7796a0d7c5886c1f4685f2e2806 and 708c529c7fdeba9387825d746752fc6f439d781e just a few days ago seem to be some of those cases.

Greetings,

Andres Freund

-- 
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [PATCH] Support for pg_stat_archiver view
On Sun, Feb 2, 2014 at 2:43 AM, Gabriele Bartolini wrote:
> Hi Michael and Fujii,
>
> On 01/02/14 17:46, Fujii Masao wrote:
>> I think that it's OK to add that as a TODO item. There might be
>> a system where the speed of WAL archiving is slower than that of
>> WAL generation at peak time. The DBA of that system might want to
>> monitor the size of the archive queue.
>
> I agree that it is an interesting thing to do. The reason I didn't
> introduce it in the first place was that I did not want to make too many
> changes in this first attempt.

For the time being I have added an item in "Statistics Collector" about that.

>> We can implement this by just counting the files with the .ready
>> extension in the pg_xlog/archive_status directory. Or we can also
>> implement it by adding a new counter field to the pg_stat_archiver
>> structure, incrementing it whenever a .ready file is created, and
>> decrementing it whenever .ready is changed to .done.
>
> I would love to give it a shot at the next opportunity.

I'll be happy to look at it once you get something.

Regards,
-- 
Michael
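The file-counting approach Fujii describes could look roughly like this sketch. The function names are mine, for illustration only - a real implementation would live in the archiver/stats code and use the backend's directory-scanning helpers rather than raw POSIX calls:

```c
#include <dirent.h>
#include <stdbool.h>
#include <string.h>

/* True if 'name' ends with 'suffix' (e.g. ".ready"). */
bool
has_suffix(const char *name, const char *suffix)
{
    size_t      namelen = strlen(name);
    size_t      suflen = strlen(suffix);

    return namelen >= suflen &&
        strcmp(name + namelen - suflen, suffix) == 0;
}

/*
 * Count files ending in ".ready" in the given archive_status
 * directory; returns -1 if the directory cannot be opened.
 */
int
count_ready_files(const char *archive_status_dir)
{
    DIR        *dir = opendir(archive_status_dir);
    struct dirent *de;
    int         count = 0;

    if (dir == NULL)
        return -1;
    while ((de = readdir(dir)) != NULL)
    {
        if (has_suffix(de->d_name, ".ready"))
            count++;
    }
    closedir(dir);
    return count;
}
```

The counter-based alternative mentioned in the quote avoids the directory scan entirely, at the cost of keeping the counter consistent across crashes.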
[HACKERS] narwhal and PGDLLIMPORT
I happened to notice today that the owner of buildfarm member narwhal is trying to revive it after a long time offline, but it's failing in the 9.3 branch (and not attempting to build HEAD, yet). The cause appears to be that contrib/postgres_fdw is referencing the DateStyle and IntervalStyle variables, which aren't marked PGDLLIMPORT.

Hm, well, that would be an easy change ... but that code was committed last March. How is it that we didn't notice this long ago?

What this seems to indicate is that narwhal is the only buildfarm animal that has a need for PGDLLIMPORT marks on global variables that are referenced by extensions. Furthermore, nobody has attempted to build 9.3 on a platform that needs that (or at least they've not reported failure to us).

According to the buildfarm database, narwhal is running a gcc build on Windows 2003. That hardly seems like a mainstream use case. I could believe that it might be of interest to developers, but clearly no developers are actually running such a build.

I think we should give serious consideration to desupporting this combination so that we can get rid of the plague of PGDLLIMPORT marks. Obviously this would depend on confirming that there are no more-interesting Windows build methods that require it --- but if there are any, I'd sure demand that there be an active buildfarm instance to keep us from breaking the case again in future.

Thoughts?

regards, tom lane
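For readers outside the Windows port: on POSIX systems every non-static variable in the server is visible to a loaded extension, but in a PE/COFF build a variable defined in the server executable must be explicitly imported by the extension DLL, which is what the PGDLLIMPORT mark arranges. A minimal sketch of the mechanism follows - the stand-in initializer and accessor function are mine, not PostgreSQL's, and on a real Windows build the definition would of course live in the server, not next to the declaration:

```c
/* On Windows, importing a data symbol from another module requires
 * __declspec(dllimport) on the declaration the importer sees;
 * elsewhere the mark expands to nothing and ordinary linking applies. */
#ifdef _WIN32
#define PGDLLIMPORT __declspec(dllimport)
#else
#define PGDLLIMPORT
#endif

/* Header side - what an extension such as postgres_fdw sees: */
extern PGDLLIMPORT int DateStyle;

/* Backend side - in the real server this definition is in the core
 * binary; a stand-in value is used here so the sketch is self-contained: */
int DateStyle = 1;

/* An extension referencing the global, as postgres_fdw does: */
int
extension_reads_datestyle(void)
{
    return DateStyle;
}
```

Without the mark, a gcc-on-Windows or MSVC extension build fails at link time with an unresolved reference to the variable - which is exactly the narwhal symptom.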
Re: [HACKERS] jsonb and nested hstore
On 02/01/2014 06:15 PM, Andres Freund wrote:
> Hi,
>
> On 2014-02-01 18:13:42 -0500, Andrew Dunstan wrote:
>> [Long review]
>>
>> Most of these comments actually refer to Teodor and Oleg's code.
>>
>> I will attend to the parts that apply to my code.
>
> Well, somebody will need to address them nonetheless :/

Yes, of course, I didn't suggest otherwise.

cheers

andrew
Re: [HACKERS] jsonb and nested hstore
Hi,

On 2014-02-01 18:13:42 -0500, Andrew Dunstan wrote:
> [Long review]
>
> Most of these comments actually refer to Teodor and Oleg's code.
>
> I will attend to the parts that apply to my code.

Well, somebody will need to address them nonetheless :/

Greetings,

Andres Freund

-- 
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] jsonb and nested hstore
On 02/01/2014 05:20 PM, Andres Freund wrote:
> [Long review]

Most of these comments actually refer to Teodor and Oleg's code. I will attend to the parts that apply to my code.

Thanks for the review.

cheers

andrew
Re: [HACKERS] Postgresql for cygwin - 3rd
Andrew Dunstan writes:
> On 02/01/2014 05:12 PM, Marco Atzeri wrote:
>> is it possible the tsearch test never worked on en_US.utf8
>> but only on C locale ?
>
> Yes, that's more or less what I said, I thought.
> Maybe we need to test this on other Windows systems in non-C encodings.
> Let's make sure it's only a Cygwin problem. I'll work on that.
> You should try to concentrate on the things like prepared_xacts and
> isolation_test that we know don't work ONLY on Cygwin.

Please test the patch I just posted in the pgsql-bugs thread. It looks to me like there are bugs in both the C and non-C locale cases, but only the latter case would be exercised by our regression tests, since we don't use any non-ASCII characters in the tests.

regards, tom lane
Re: [HACKERS] Postgresql for cygwin - 3rd
On 02/01/2014 05:12 PM, Marco Atzeri wrote:
> On 01/02/2014 22:57, Andrew Dunstan wrote:
>> On 01/25/2014 01:23 PM, Andrew Dunstan wrote:
>>> * isolation tests fail with an indefinite hang on newer Cygwin
>>> * prepared_xacts test fails with an indefinite hang on newer Cygwin
>>>   if run in parallel with other tests
>>> * tsearch tests fail on non-C locale (or at least on en_US.utf8).
>>
>> It turns out this is actually an old bug, and can be reproduced on my
>> old Cygwin instance. I wonder if it's caused by faulty locale files?
>
> Andrew, is it possible the tsearch test never worked on en_US.utf8
> but only on C locale ?

Yes, that's more or less what I said, I thought. Maybe we need to test this on other Windows systems in non-C encodings. Let's make sure it's only a Cygwin problem. I'll work on that. You should try to concentrate on the things like prepared_xacts and isolation_test that we know don't work ONLY on Cygwin.

cheers

andrew
Re: [HACKERS] jsonb and nested hstore
On 2014-01-30 14:07:42 -0500, Andrew Dunstan wrote:
> +
> +    shows the functions that are
> +    available for creating json values.
> +    (see )
>
> -
> -JSON Support Functions
> +
> +   array_to_json
> +
> +   row_to_json
> +
> +   to_json
> +
> +   json_build_array
> +
> +   json_build_object
> +
> +   json_object
> +

Hm, why are you collecting the indexterms at the top, in contrast to the previous way of collecting them at the point of documentation?

> diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
> index 1ae9fa0..fd93d9b 100644
> --- a/src/backend/utils/adt/Makefile
> +++ b/src/backend/utils/adt/Makefile
> @@ -32,7 +32,8 @@ OBJS = acl.o arrayfuncs.o array_selfuncs.o array_typanalyze.o \
> 	tsquery_op.o tsquery_rewrite.o tsquery_util.o tsrank.o \
> 	tsvector.o tsvector_op.o tsvector_parser.o \
> 	txid.o uuid.o windowfuncs.o xml.o rangetypes_spgist.o \
> -	rangetypes_typanalyze.o rangetypes_selfuncs.o
> +	rangetypes_typanalyze.o rangetypes_selfuncs.o \
> +	jsonb.o jsonb_support.o

Odd, most OBJS lines are kept in alphabetical order, but that doesn't seem to be the case here.

> +/*
> + * for jsonb we always want the de-escaped value - that's what's in token
> + */
> +

Strange newline.

> +static void
> +jsonb_in_scalar(void *state, char *token, JsonTokenType tokentype)
> +{
> +	JsonbInState *_state = (JsonbInState *) state;
> +	JsonbValue	v;
> +
> +	v.size = sizeof(JEntry);
> +
> +	switch (tokentype)
> +	{
> +	...
> +	default:	/* nothing else should be here in fact */
> +		break;

Shouldn't this at least Assert(false) or something?
> +static void
> +recvJsonbValue(StringInfo buf, JsonbValue *v, uint32 level, int c)
> +{
> +	uint32		hentry = c & JENTRY_TYPEMASK;
> +
> +	if (hentry == JENTRY_ISNULL)
> +	{
> +		v->type = jbvNull;
> +		v->size = sizeof(JEntry);
> +	}
> +	else if (hentry == JENTRY_ISOBJECT || hentry == JENTRY_ISARRAY ||
> +			 hentry == JENTRY_ISCALAR)
> +	{
> +		recvJsonb(buf, v, level + 1, (uint32) c);
> +	}
> +	else if (hentry == JENTRY_ISFALSE || hentry == JENTRY_ISTRUE)
> +	{
> +		v->type = jbvBool;
> +		v->size = sizeof(JEntry);
> +		v->boolean = (hentry == JENTRY_ISFALSE) ? false : true;
> +	}
> +	else if (hentry == JENTRY_ISNUMERIC)
> +	{
> +		v->type = jbvNumeric;
> +		v->numeric = DatumGetNumeric(DirectFunctionCall3(numeric_recv, PointerGetDatum(buf),
> +						Int32GetDatum(0), Int32GetDatum(-1)));
> +
> +		v->size = sizeof(JEntry) * 2 + VARSIZE_ANY(v->numeric);

What's the *2 here?

> +static void
> +recvJsonb(StringInfo buf, JsonbValue *v, uint32 level, uint32 header)
> +{
> +	uint32		hentry;
> +	uint32		i;

This function and recvJsonbValue call each other recursively, afaics without any limit; shouldn't they check the stack depth?

> +	hentry = header & JENTRY_TYPEMASK;
> +
> +	v->size = 3 * sizeof(JEntry);

*3?

> +	if (hentry == JENTRY_ISOBJECT)
> +	{
> +		v->type = jbvHash;
> +		v->hash.npairs = header & JB_COUNT_MASK;
> +		if (v->hash.npairs > 0)
> +		{
> +			v->hash.pairs = palloc(sizeof(*v->hash.pairs) * v->hash.npairs);
> +

Hm, if I understand correctly, we're just allocating JB_COUNT_MASK (which is 0x0FFF) * sizeof(*v->hash.pairs) bytes here, without any crosschecks against the actual length of the data? So with a few bytes the server can be coaxed into allocating a gigabyte of data. Since this immediately calls another input routine, this can be done in a nested fashion, quickly OOMing the server. I think this and several other places really need a bit more input sanity checking.
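A minimal sketch of the kind of crosscheck being asked for here - the function name and parameters are hypothetical, not from the patch: before trusting a count read off the wire, bound it by what the remaining message bytes could possibly hold, and only then allocate.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical guard for binary-input parsing: a declared element
 * count is only plausible if the bytes remaining in the message
 * could contain at least that many minimum-size elements. Checking
 * this before palloc() prevents a tiny crafted message from
 * provoking a huge allocation.
 */
bool
count_is_plausible(size_t declared_count,
                   size_t min_bytes_per_elem,
                   size_t bytes_remaining)
{
    if (min_bytes_per_elem == 0)
        return false;           /* nonsensical element size */
    return declared_count <= bytes_remaining / min_bytes_per_elem;
}
```

With a check like this in place, the nested case is also bounded, because each recursion level can only claim as many elements as its slice of the message could actually contain.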
> +	for (i = 0; i < v->hash.npairs; i++)
> +	{
> +		recvJsonbValue(buf, &v->hash.pairs[i].key, level, pq_getmsgint(buf, 4));
> +		if (v->hash.pairs[i].key.type != jbvString)
> +			elog(ERROR, "jsonb's key could be only a string");

Shouldn't that be an ereport(ERRCODE_DATATYPE_MISMATCH)? Similar in a few other places.

> +char *
> +JsonbToCString(StringInfo out, char *in, int estimated_len)
> +{
> +	bool		first = true;
> +	JsonbIterator *it;
> +	int			type;
> +	JsonbValue	v;
> +	int			level = 0;
> +
> +	if (out == NULL)
> +		out = makeStringInfo();

Such a behaviour certainly deserves a documenting comment. Generally some more functions could use that.

> +	while ((type = JsonbIteratorGet(&it, &v, false)) != 0)
> +	{
> +reout
Re: [HACKERS] Postgresql for cygwin - 3rd
On 01/02/2014 22:57, Andrew Dunstan wrote:
> On 01/25/2014 01:23 PM, Andrew Dunstan wrote:
>> * isolation tests fail with an indefinite hang on newer Cygwin
>> * prepared_xacts test fails with an indefinite hang on newer Cygwin
>>    if run in parallel with other tests
>> * tsearch tests fail on non-C locale (or at least on en_US.utf8).
>
> It turns out this is actually an old bug, and can be reproduced on my
> old Cygwin instance. I wonder if it's caused by faulty locale files?

Andrew, is it possible the tsearch test never worked on en_US.utf8 but only on C locale ?

See my finding at http://www.postgresql.org/message-id/52ed5627.4070...@gmail.com

> And these are where we need help, especially from the Cygwin community.
> The fact that things that work on older Cygwins now fail is annoying.
>
> cheers
>
> andrew

Marco
Re: [HACKERS] Postgresql for cygwin - 3rd
On 01/25/2014 01:23 PM, Andrew Dunstan wrote:
> I have now tested the central part of the proposed changes on both old
> and new Cygwin installations, and they appear to work. I'm going to
> commit them and backpatch back to 9.0, which is where we currently have
> buildfarm coverage (and 8.4 will be at EOL in a few months anyway). That
> will take care of your rebase issue.

That part is done. Reliance on dllwrap is a thing of the past.

That leaves several issues to be handled:

* LDAP libraries - the way you have proposed surely isn't right. What we want is something more like this in the Makefile.global.in:

      ifeq ($(PORTNAME), cygwin)
      libpq_pgport += $(LDAP_LIBS_FE)
      endif

  Unless someone comes up with a better answer than this I'm going to commit it too.

* isolation tests fail with an indefinite hang on newer Cygwin

* prepared_xacts test fails with an indefinite hang on newer Cygwin if run in parallel with other tests

* tsearch tests fail on non-C locale (or at least on en_US.utf8). It turns out this is actually an old bug, and can be reproduced on my old Cygwin instance. I wonder if it's caused by faulty locale files?

And these are where we need help, especially from the Cygwin community. The fact that things that work on older Cygwins now fail is annoying.

cheers

andrew
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-02-01 13:40:20 -0800, Peter Geoghegan wrote:
> On Sat, Feb 1, 2014 at 4:57 AM, Andres Freund wrote:
> >> I'm looking at alternative options, because this is not terribly
> >> helpful. With those big caveats in mind, consider the results of the
> >> benchmark, which show the patch performing somewhat worse than the
> >> master baseline at higher client counts:
> >
> > I think that's actually something else. I'd tried to make some
> > definitions simpler, and that has, at least for the machine I have
> > occasional access to, pessimized things. I can't always run the tests
> > there, so I hadn't noticed before the repost.
>
> I should have been clearer on one point: The pre-rebased patch (actual
> patch series) [1] was applied on top of a commit from around the same
> period, in order to work around the bit rot.

Ah. Then I indeed wouldn't expect improvements.

> However, I tested the
> most recent revision from your git remote on the AWS instance.
>
> [1] http://www.postgresql.org/message-id/20131115194725.gg5...@awork2.anarazel.de

But that was before my fix, right. Except you managed to timetravel :)

Greetings,

Andres Freund

-- 
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Sat, Feb 1, 2014 at 4:57 AM, Andres Freund wrote:
>> I'm looking at alternative options, because this is not terribly
>> helpful. With those big caveats in mind, consider the results of the
>> benchmark, which show the patch performing somewhat worse than the
>> master baseline at higher client counts:
>
> I think that's actually something else. I'd tried to make some
> definitions simpler, and that has, at least for the machine I have
> occasional access to, pessimized things. I can't always run the tests
> there, so I hadn't noticed before the repost.

I should have been clearer on one point: The pre-rebased patch (actual patch series) [1] was applied on top of a commit from around the same period, in order to work around the bit rot. However, I tested the most recent revision from your git remote on the AWS instance.

[1] http://www.postgresql.org/message-id/20131115194725.gg5...@awork2.anarazel.de

-- 
Peter Geoghegan
Re: [HACKERS] [9.3 bug] disk space in pg_xlog increases during archive recovery
On 2014-01-24 22:31:17 +0900, MauMau wrote:
> From: "Fujii Masao"
> > On Wed, Jan 22, 2014 at 6:37 AM, Heikki Linnakangas
> >>> Thanks! The patch looks good to me. Attached is the updated version of
> >>> the patch. I added the comments.
>
> Thank you very much. Your comment looks great. I tested some recovery
> situations, and confirmed that WAL segments were kept/removed as intended.
> I'll update the CommitFest entry with this patch.

You haven't updated the patch's status so far, so I've now marked it as returned with feedback, as several people voiced serious doubts about the approach. If that's not accurate, please speak up.

> > The problem is, we might not be able to perform restartpoints more
> > aggressively even if we reduce checkpoint_timeout in the server under
> > archive recovery. Because the frequency of occurrence of restartpoints
> > depends on not only that checkpoint_timeout but also the checkpoints
> > which happened while the server was running.
>
> I haven't tried reducing checkpoint_timeout.

Did you try reducing checkpoint_segments? As I pointed out, at least if standby_mode is enabled, it will also trigger checkpoints, independently from checkpoint_timeout. If the issue is that you're not using standby_mode (if so, why?), then the fix maybe is to make that apply to a wider range of situations.

> I think we cannot take that approach, because we cannot suggest an
> appropriate checkpoint_timeout to users. That is, what checkpoint_timeout
> setting can we suggest so that WAL doesn't accumulate in pg_xlog/ more
> than in 9.1?

Well, <= 9.1's behaviour can lead to loss of a consistent database, so I don't think it's what we need to compare the current state to.

Greetings,

Andres Freund

-- 
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [9.3 bug] disk space in pg_xlog increases during archive recovery
On 2014-01-21 23:37:43 +0200, Heikki Linnakangas wrote: > On 01/21/2014 07:31 PM, Fujii Masao wrote: > >On Fri, Dec 20, 2013 at 9:21 PM, MauMau wrote: > >>From: "Fujii Masao" > >> > >>>! if (source == XLOG_FROM_ARCHIVE && StandbyModeRequested) > >>> > >>>Even when standby_mode is not enabled, we can use cascade replication and > >>>it needs the accumulated WAL files. So I think that > >>>AllowCascadeReplication() > >>>should be added into this condition. > >>> > >>>! snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG"); > >>>! XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo); > >>>! > >>>! if (restoredFromArchive) > >>> > >>>Don't we need to check !StandbyModeRequested and > >>>!AllowCascadeReplication() > >>>here? > >> > >>Oh, you are correct. Okay, done. > > > >Thanks! The patch looks good to me. Attached is the updated version of > >the patch. I added the comments. > > Sorry for reacting so slowly, but I'm not sure I like this patch. It's a > quite useful property that all the WAL files that are needed for recovery > are copied into pg_xlog, even when restoring from archive, even when not > doing cascading replication. It guarantees that you can restart the standby, > even if the connection to the archive is lost for some reason. I > intentionally changed the behavior for archive recovery too, when it was > introduced for cascading replication. Also, I think it's good that the > behavior does not depend on whether cascading replication is enabled - it's > a quite subtle difference. > > So, IMHO this is not a bug, it's a feature. Very much seconded. With the old behaviour it's possible to get into the situation that you cannot get your standby productive anymore if the archive server died. Since the archive server is often the primary that's imo unacceptable. I'd even go as far as saying the previous behaviour is a bug. 
> To solve the original problem of running out of disk space in archive
> recovery, I wonder if we should perform restartpoints more aggressively.
> We intentionally don't trigger restartpoints by checkpoint_segments, only
> checkpoint_timeout, but I wonder if there should be an option for that.
> MauMau, did you try simply reducing checkpoint_timeout, while doing
> recovery?

Hm, don't we actually trigger restartpoints based on checkpoint_segments already?

static int
XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
             XLogRecPtr targetRecPtr, char *readBuf, TimeLineID *readTLI)
{
    ...
    if (readFile >= 0 && !XLByteInSeg(targetPagePtr, readSegNo))
    {
        /*
         * Request a restartpoint if we've replayed too much xlog since the
         * last one.
         */
        if (StandbyModeRequested && bgwriterLaunched)
        {
            if (XLogCheckpointNeeded(readSegNo))
            {
                (void) GetRedoRecPtr();
                if (XLogCheckpointNeeded(readSegNo))
                    RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
            }
        }
    ...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] install libpq.dll in bin directory on Windows / Cygwin
On 02/01/2014 08:01 AM, Peter Eisentraut wrote:
> On 1/31/14, 12:47 PM, Andrew Dunstan wrote:
>> Hmm, looks like it got dropped. I think my original patch is probably
>> about the right thing to do, and given that it's already done by the
>> MSVC build it's a bug and should be backported, as Alvaro said in the
>> original discussion.
>>
>> I'll get on that shortly - since the patch was untested I'll need to do
>> a test first.
>
> I think this change is reasonable to make, but it should be in
> Makefile.shlib, not in libpq/Makefile. Makefile.shlib already knows about
> the difference between a link-time and a run-time dynamic library (not to
> speak of Windows vs. others), so the right change there will make it work
> for libpq, ecpg, and whatever else someone might come up with
> automatically.

In the end I went with the way I had suggested, because that's what the MSVC system does - it doesn't copy any other DLLs to the bin directory. So doing that seemed sane for backpatching, to bring the two build systems into sync. If you want to propose a better arrangement for the future, to include, say, ecpg DLLs, including changes to the MSVC system, we can discuss that separately.

cheers

andrew
Re: [HACKERS] What is happening on buildfarm member crake?
Robert Haas writes: > On Sat, Jan 25, 2014 at 5:04 PM, Tom Lane wrote: >> Yeah. If Robert's diagnosis is correct, and it sounds pretty plausible, >> then this is really just one instance of a bug that's probably pretty >> widespread in our signal handlers. Somebody needs to go through 'em >> all and look for touches of shared memory. > I haven't made a comprehensive study of every signal handler we have, > [ but here's how to fix procsignal_sigusr1_handler ] I've trawled all the remaining signal handlers (or at least everything declared with SIGNAL_ARGS, which hopefully is all of them). I find the following bugs: 1. A couple of signal handlers do if (MyWalSnd) SetLatch(&MyWalSnd->latch); which is fine as far as it goes, but the shutdown sequence in WalSndKill has exactly the same bug you just fixed in ProcKill: it needs to clear MyWalSnd before disowning the latch, not after. 2. WalRcvSigUsr1Handler and worker_spi_sighup fail to preserve errno. Will fix, but the latter bug is a tad disappointing considering that the coding rule about saving errno in signal handlers has been there for a *long* time. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Support for pg_stat_archiver view
Hi Michael and Fujii,

On 01/02/14 17:46, Fujii Masao wrote:
> I think that it's OK to add that as a TODO item. There might be a system
> where the speed of WAL archiving is slower than that of WAL generation at
> peak times. The DBA of that system might want to monitor the size of the
> archive queue.

I agree that it is an interesting thing to do. The reason I didn't introduce it in the first place was that I did not want to make too many changes in this first attempt.

> We can implement this by just counting the files with a .ready extension
> in the pg_xlog/archive_status directory. Or we can also implement it by
> adding a new counter field to the pg_stat_archiver structure, incrementing
> it whenever a .ready file is created, and decrementing it whenever .ready
> is changed to .done.

I would love to give it a shot at the next opportunity.

Thanks,
Gabriele

--
Gabriele Bartolini - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] [PATCH] Support for pg_stat_archiver view
On Sat, Feb 1, 2014 at 9:32 PM, Michael Paquier wrote: > On Wed, Jan 29, 2014 at 3:03 AM, Fujii Masao wrote: >> On Tue, Jan 28, 2014 at 3:42 AM, Fujii Masao wrote: >>> On Sat, Jan 25, 2014 at 3:19 PM, Michael Paquier >>> wrote: On Sat, Jan 25, 2014 at 5:41 AM, Fujii Masao wrote: > On Thu, Jan 23, 2014 at 4:10 PM, Michael Paquier > wrote: >> On Thu, Jan 9, 2014 at 6:36 AM, Gabriele Bartolini >> wrote: >>> Il 08/01/14 18:42, Simon Riggs ha scritto: Not sure I see why it needs to be an SRF. It only returns one row. >>> Attached is version 4, which removes management of SRF stages. >> >> I have been looking at v4 a bit more, and found a couple of small things: >> - a warning in pgstatfuncs.c >> - some typos and format fixing in the sgml docs >> - some corrections in a couple of comments >> - Fixed an error message related to pg_stat_reset_shared referring >> only to bgwriter and not the new option archiver. Here is how the new >> message looks: >> =# select pg_stat_reset_shared('popo'); >> ERROR: 22023: unrecognized reset target: "popo" >> HINT: Target must be "bgwriter" or "archiver" >> A new version is attached containing those fixes. I played also with >> the patch and pgbench, simulating some archive failures and successes >> while running pgbench and the reports given by pg_stat_archiver were >> correct, so I am marking this patch as "Ready for committer". > > I refactored the patch further. > > * Remove ArchiverStats global variable to simplify pgarch.c. > * Remove stats_timestamp field from PgStat_ArchiverStats struct because >it's not required. Thanks, pgstat_send_archiver is cleaner now. > > I have some review comments: > > +s.archived_wals, > +s.last_archived_wal, > +s.last_archived_wal_time, > +s.failed_attempts, > +s.last_failed_wal, > +s.last_failed_wal_time, > > The column names of pg_stat_archiver look not consistent at least to me. > What about the followings? 
> > archive_count > last_archived_wal > last_archived_time > fail_count > last_failed_wal > last_failed_time And what about archived_count and failed_count instead of respectively archive_count and failed_count? The rest of the names are better now indeed. > I think that it's time to rename all the variables related to > pg_stat_bgwriter. > For example, it's better to change PgStat_GlobalStats to PgStat_Bgwriter. > I think that it's okay to make this change as separate patch, though. Yep, this is definitely a different patch. > +char last_archived_wal[MAXFNAMELEN];/* last WAL file archived */ > +TimestampTz last_archived_wal_timestamp;/* last archival success > */ > +PgStat_Counter failed_attempts; > +char last_failed_wal[MAXFNAMELEN];/* last WAL file involved > in failure */ > Some hackers don't like the increase of the size of the statsfile. In > order to > reduce the size as much as possible, we should use the exact size of WAL > file > here instead of MAXFNAMELEN? The first versions of the patch used a more limited size length more inline with what you say: +#define MAX_XFN_CHARS40 But this is inconsistent with xlog_internal.h. >>> >>> Using MAX_XFN_CHARS instead of MAXFNAMELEN has a clear advantage >>> to reduce the size of the statistics file. Though I'm not sure how much this >>> change improve the performance of the statistics collector, basically >>> I'd like to >>> use MAX_XFN_CHARS here. >> >> I changed the patch in this way, fixed some existing bugs (e.g., >> correct the column >> names of pg_stat_archiver in rules.out), and then just committed it. > "Depesz" mentioned here that it would be interesting to have as well > in this view the size of archive queue: > http://www.depesz.com/2014/01/29/waiting-for-9-4-add-pg_stat_archiver-statistics-view/ > And it looks indeed interesting to have. What do you think about > adding it as a TODO item? I think that it's OK to add that as TODO item. 
There might be a system where the speed of WAL archiving is slower than that of WAL generation at peak times. The DBA of that system might want to monitor the size of the archive queue.

We can implement this by just counting the files with a .ready extension in the pg_xlog/archive_status directory. Or we can also implement it by adding a new counter field to the pg_stat_archiver structure, incrementing it whenever a .ready file is created, and decrementing it whenever .ready is changed to .done.

Regards,

--
Fujii Masao
Re: [HACKERS] Patch: show xid and xmin in pg_stat_activity and pg_stat_replication
Hi,

On 2014-02-01 17:03:46 +0100, Christian Kruse wrote:
> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
> index a37e6b6..bef5912 100644
> --- a/doc/src/sgml/monitoring.sgml
> +++ b/doc/src/sgml/monitoring.sgml
> @@ -629,6 +629,16 @@ postgres: user database host
> + transactionid
> + xid
> + The current transaction identifier.
> + The other entries refer to the current backend.

Maybe: "Toplevel transaction identifier of this backend, if any."

> +
> + xmin
> + xid
> + The current xmin value.
> +
> +

I don't think that link is correct, the xmin you're linking to is about a row's xmin, while the column you're documenting is the backend's current xmin horizon. Maybe: "The current backend's xmin horizon."

> query
> text
> Text of this backend's most recent query. If
> @@ -1484,6 +1494,11 @@ postgres: user database host
> + xmin
> + xid
> + The current xmin value.
> +
> +

Wrong link again. This should probably read "This standby's xmin horizon reported by hot_standby_feedback."

> @@ -2785,6 +2787,8 @@ pgstat_read_current_status(void)
>  volatile PgBackendStatus *beentry;
>  PgBackendStatus *localtable;
>  PgBackendStatus *localentry;
> + PGPROC *proc;
> + PGXACT *xact;

A bit hard to read from the diff only, but aren't they now unused?

>  char *localappname,
>  *localactivity;
>  int i;
> @@ -2848,6 +2852,8 @@ pgstat_read_current_status(void)
>  /* Only valid entries get included into the local array */
>  if (localentry->st_procpid > 0)
>  {
> + BackendIdGetTransactionIds(localentry->st_procpid,
> &localentry->transactionid, &localentry->xmin);
> +

That's a bit of a long line, try to keep it to 79 chars.

> /*
> + * BackendIdGetTransactionIds
> + *  Get the PGPROC structure for a backend, given the backend ID. Also
> + *  get the xid and xmin of the backend. The result may be out of date
> + *  arbitrarily quickly, so the caller must be careful about how this
> + *  information is used. NULL is returned if the backend is not active.
> + */
> +PGPROC *
> +BackendIdGetTransactionIds(int backendPid, TransactionId *xid, TransactionId *xmin)
> +{

Hm, why do you even return the PGPROC here? Not that it's problematic, but it seems a bit pointless. If you remove it you can remove a fair bit of the documentation. I think it should note instead that the two values can be out of whack with each other.

> + PGPROC *result = NULL;
> + ProcState *stateP;
> + SISeg *segP = shmInvalBuffer;
> + int index = 0;
> + PGXACT *xact;
> +
> + /* Need to lock out additions/removals of backends */
> + LWLockAcquire(SInvalWriteLock, LW_SHARED);
> +
> + if (backendPid > 0)
> + {
> +  for (index = 0; index < segP->lastBackend; index++)
> +  {
> +   if (segP->procState[index].procPid == backendPid)
> +   {
> +    stateP = &segP->procState[index];
> +    result = stateP->proc;
> +    xact = &ProcGlobal->allPgXact[result->pgprocno];
> +
> +    *xid = xact->xid;
> +    *xmin = xact->xmin;
> +
> +    break;
> +   }
> +  }
> + }
> +
> + LWLockRelease(SInvalWriteLock);
> +
> + return result;
> +}

Uh, why do we suddenly need the loop here? BackendIdGetProc() works without one, so this certainly shouldn't need it, or am I missing something? Note that Robert withdrew his complaint about relying on indexing via BackendId in CA+TgmoaFjcji=Hu9YhCSEL0Z116TY2zEKZf7Z5=da+bptey...@mail.gmail.com .

I also think you need to initialize *xid/*xmin to InvalidTransactionId if the proc hasn't been found because it exited, otherwise we'll report stale values.

Greetings,

Andres Freund
[HACKERS] IndexBuildHeapScan doesn't use page at a time mode
Hi,

While looking at a profile I noticed that non-concurrent index builds don't use the page-at-a-time mode. Since that causes more buffer lwlocks to be taken, that's a noticeable slowdown according to a completely unscientific hack. The reason for not using the page-at-a-time mode is that we only allow it for MVCC snapshots, but we're using SnapshotAny. Since we're doing additional visibility checks afterwards and since the table is locked exclusively, it seems safe to me to allow the page-at-a-time mode?

I don't plan to implement this right now, but I do plan to pick it up somewhere in the next cycle if nobody beats me to it.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?
On 2014-02-02 00:04:41 +0900, Michael Paquier wrote: > On Fri, Dec 13, 2013 at 2:47 AM, Tom Lane wrote: > > Andres Freund writes: > >> On 2013-12-12 11:55:51 -0500, Tom Lane wrote: > >>> I'm not, however, terribly thrilled with the suggestions to add implicit > >>> casts associated with this type. Implicit casts are generally dangerous. > > > >> It's a tradeof. Currently we have the following functions returning LSNs > >> as text: > >> * pg_current_xlog_location > >> * pg_current_xlog_insert_location > >> * pg_last_xlog_receive_location > >> * pg_last_xlog_replay_location > >> one view containing LSNs > >> * pg_stat_replication > >> and the following functions accepting LSNs as textual paramters: > >> * pg_xlog_location_diff > >> * pg_xlogfile_name > > > >> The question is how do we deal with backward compatibility when > >> introducing a LSN type? There might be some broken code around > >> monitoring if we simply replace the type without implicit casts. > > > > Given the limited usage, how bad would it really be if we simply > > made all those take/return the LSN type? As long as the type's > > I/O representation looks like the old text format, I suspect > > most queries wouldn't notice. I've asked around inside 2ndq and we could find one single problematic query, so it's really not too bad. > Are there some plans to awaken this patch (including changing the > output of the functions of xlogfuncs.c)? This would be useful for the > differential backup features I am looking at these days. I imagine > that it is too late for 9.4 though... I think we should definitely go ahead with the patch per-se, we just added another system view with lsns in it... I don't have a too strong opinion whether to do it in 9.4 or 9.5. It seems fairly low impact to me, and it's an old patch, but I personally don't have the tuits to refresh the patch right now. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?
On Fri, Dec 13, 2013 at 2:47 AM, Tom Lane wrote: > Andres Freund writes: >> On 2013-12-12 11:55:51 -0500, Tom Lane wrote: >>> I'm not, however, terribly thrilled with the suggestions to add implicit >>> casts associated with this type. Implicit casts are generally dangerous. > >> It's a tradeof. Currently we have the following functions returning LSNs >> as text: >> * pg_current_xlog_location >> * pg_current_xlog_insert_location >> * pg_last_xlog_receive_location >> * pg_last_xlog_replay_location >> one view containing LSNs >> * pg_stat_replication >> and the following functions accepting LSNs as textual paramters: >> * pg_xlog_location_diff >> * pg_xlogfile_name > >> The question is how do we deal with backward compatibility when >> introducing a LSN type? There might be some broken code around >> monitoring if we simply replace the type without implicit casts. > > Given the limited usage, how bad would it really be if we simply > made all those take/return the LSN type? As long as the type's > I/O representation looks like the old text format, I suspect > most queries wouldn't notice. Are there some plans to awaken this patch (including changing the output of the functions of xlogfuncs.c)? This would be useful for the differential backup features I am looking at these days. I imagine that it is too late for 9.4 though... Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: show xid and xmin in pg_stat_activity and pg_stat_replication
Hi,

On 31/01/14 11:24, Robert Haas wrote:
> > what do you think about the approach the attached patch implements?
> > I'm not really sure if this is what you had in mind, especially if
> > this is the right lock.
>
> The attached patch seems not to be attached, […]

*sighs* I'm at FOSDEM right now, I will send it as soon as I'm back home.

> […] but the right lock to use would be the same one
> BackendIdGetProc() is using. I'd add a new function
> BackendIdGetTransactionIds or something like that.

Good – that's exactly what I did (with slightly different naming).

> >> I also note that the docs seem to need some copy-editing:
> >>
> >> + The current xmin value.
>
> The link shouldn't include the period, and probably should also not
> include the word "value". I would make only the word "xmin" part of
> the link.

Thanks for the elaboration.

Best regards,

--
Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [bug fix] pg_ctl fails with config-only directory
Hi,

On 31/01/14 22:17, MauMau wrote:
> Thanks for reviewing the patch. Fixed. I'll add this revised patch to the
> CommitFest entry soon.

Looks fine to me. Set it to "waiting for commit."

Best regards,

--
Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
Hi,

On 01/02/14 02:45, Fujii Masao wrote:
> LOG: process 33662 still waiting for ShareLock on transaction
> 1011 after 1000.184 ms
> DETAIL: Process holding the lock: 33660. Request queue: 33662.
> [… snip …]
> LOG: process 33665 still waiting for ExclusiveLock on tuple (0,4)
> of relation 16384 of database 12310 after 1000.134 ms
> DETAIL: Process holding the lock: 33662. Request queue: 33665
>
> This log message says that the process 33662 is holding the lock, but
> it's not true.

As the messages say: the first wait is for a lock on the transaction, the second one for a lock on the tuple. These are two different locks, hence the two different holders and queues. So…

> Is this the intentional behavior?

Yes, I think so.

Best regards,

--
Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [bug fix] postgres.exe fails to start on Windows Server 2012 due to ASLR
From: "Craig Ringer" I'm reasonably persuaded that there's a need for this, though IFEO (see below) can be used at or after install-time as a workaround. I see. And I also found it effective as another workaround to set the below registry key. This disables ASLR for all executables, so it would be disliked. HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\MoveImages It looks like your patch sets the msbuild equivalent of the /DYNAMICBASE:NO flag (http://msdn.microsoft.com/en-us/library/bb384887.aspx). Is this known to work (or be silently ignored) back to Windows SDK 7.1? (I'll test it today and see). OK, I tried these versions at hand: * Windows SDK 7.1, which happens to be associated with Visual Studio 2010 on my PC link.exe accepts /dynamicbase:no, which takes effect. Without /dynamicbase:no, dumpbin /headers displays the lines: 8140 DLL characteristics Dynamic base NX compatible Terminal Server Aware On the other hand, with /dynamicbase:no, the second line above disappeared. * Visual Studio 2008 Express link.exe seems to silently ignore /dynamicbase:no. dumpbin /headers does not display "Dynamic base" regardless of whether /dynamicbase:no is specified. I don't think we need to worry about Force ASLR (http://support.microsoft.com/kb/2639308) as it seems it only applies when an application loads modules, unless an admin goes playing in the registry. Users facing this problem can work around it without code changes by setting IFEO in the registry to disable ASLR for postgres.exe. See: The problem is that users cannot discover the workaround. As more users use PostgreSQL on Windows 8/2012, similar questions would be asked in pgsql-general and pgsql-bugs. It may be reasonable for EDB to consider releasing updated installers that set IFEO flags to disable ASLR on postgres.exe . Not all users use PostgreSQL built by EnterpriseDB, so I think src\tools\msvc\build.bat should produce modules that are not affected by ASLR. 
Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LDAP: bugfix and deprecated OpenLDAP API
On 1/31/14, 6:32 PM, Bruce Momjian wrote:
> On Mon, Oct 21, 2013 at 01:31:26PM +, Albe Laurenz wrote:
>> Bind attempts to an LDAP server should time out after two seconds,
>> allowing additional lines in the service control file to be parsed
>> (which provide a fall back to a secondary LDAP server or default options).
>> The existing code failed to enforce that timeout during TCP connect,
>> resulting in a hang far longer than two seconds if the LDAP server
>> does not respond.
>
> Where are we on this patch?

From my perspective, we need someone who can test this.
Re: [HACKERS] install libpq.dll in bin directory on Windows / Cygwin
On 1/31/14, 12:47 PM, Andrew Dunstan wrote:
> Hmm, looks like it got dropped. I think my original patch is probably
> about the right thing to do, and given that it's already done by
> the MSVC build it's a bug and should be backported, as Alvaro said in
> the original discussion.
>
> I'll get on that shortly - since the patch was untested I'll need to do
> a test first.

I think this change is reasonable to make, but it should be in Makefile.shlib, not in libpq/Makefile. Makefile.shlib already knows about the difference between a link-time and a run-time dynamic library (not to speak of Windows vs. others), so the right change there will make it work for libpq, ecpg, and whatever else someone might come up with automatically.
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-01-31 17:52:58 -0800, Peter Geoghegan wrote:
> On Fri, Jan 31, 2014 at 1:54 AM, Andres Freund wrote:
> > I plan to split the atomics patch into smaller chunks before
> > reposting. Imo the "Convert the PGPROC->lwWaitLink list into a dlist
> > instead of open coding it." is worth being applied independently from
> > the rest of the series, it simplifies code and it fixes a bug...
>
> For things that require a format-patch series, I personally find it
> easier to work off a feature branch on a remote under the control of
> the patch author. The only reason that I don't do so myself is that I
> know that that isn't everyone's preference.

I do too, that's why I have a git branch for all but the most trivial patches.

> I have access to a large server for the purposes of benchmarking this.
> On the plus side, this is a box very much capable of exercising these
> bottlenecks: a 48 core AMD system, with 256GB of RAM. However, I had
> to instruct someone else on how to conduct the benchmark, since I
> didn't have SSH access, and the OS and toolchain were antiquated,
> particularly for this kind of thing. This is Linux kernel
> 2.6.18-274.3.1.el5 (RHEL5.7). The GCC version that Postgres was built
> with was 4.1.2. This is not what I'd hoped for; obviously I would have
> preferred to be able to act on your warning: "Please also note that
> due to the current state of the atomics implementation this likely
> will only work on a somewhat recent gcc and that the performance might
> be slightly worse than in the previous version because the atomic add
> is implemented using the CAS fallback". Even still, it might be
> marginally useful to get a sense of that cost.

I *think* it should actually work on gcc 4.1, I've since added the fallbacks I hadn't back when I wrote the above. I've added exactly those atomics that are needed for the scalable lwlock things (xadd, xsub (yes, that's essentially the same) and cmpxchg).
> I'm looking at alternative options, because this is not terribly > helpful. With those big caveats in mind, consider the results of the > benchmark, which show the patch performing somewhat worse than the > master baseline at higher client counts: I think that's actually something else. I'd tried to make some definitions simpler, and that has, at least for the machine I have occasional access to, pessimized things. I can't always run the tests there, so I hadn't noticed before the repost. I've pushed a preliminary relief to the git repository, any chance you could retry? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Support for pg_stat_archiver view
On Wed, Jan 29, 2014 at 3:03 AM, Fujii Masao wrote: > On Tue, Jan 28, 2014 at 3:42 AM, Fujii Masao wrote: >> On Sat, Jan 25, 2014 at 3:19 PM, Michael Paquier >> wrote: >>> On Sat, Jan 25, 2014 at 5:41 AM, Fujii Masao wrote: On Thu, Jan 23, 2014 at 4:10 PM, Michael Paquier wrote: > On Thu, Jan 9, 2014 at 6:36 AM, Gabriele Bartolini > wrote: >> Il 08/01/14 18:42, Simon Riggs ha scritto: >>> Not sure I see why it needs to be an SRF. It only returns one row. >> Attached is version 4, which removes management of SRF stages. > > I have been looking at v4 a bit more, and found a couple of small things: > - a warning in pgstatfuncs.c > - some typos and format fixing in the sgml docs > - some corrections in a couple of comments > - Fixed an error message related to pg_stat_reset_shared referring > only to bgwriter and not the new option archiver. Here is how the new > message looks: > =# select pg_stat_reset_shared('popo'); > ERROR: 22023: unrecognized reset target: "popo" > HINT: Target must be "bgwriter" or "archiver" > A new version is attached containing those fixes. I played also with > the patch and pgbench, simulating some archive failures and successes > while running pgbench and the reports given by pg_stat_archiver were > correct, so I am marking this patch as "Ready for committer". I refactored the patch further. * Remove ArchiverStats global variable to simplify pgarch.c. * Remove stats_timestamp field from PgStat_ArchiverStats struct because it's not required. >>> Thanks, pgstat_send_archiver is cleaner now. >>> I have some review comments: +s.archived_wals, +s.last_archived_wal, +s.last_archived_wal_time, +s.failed_attempts, +s.last_failed_wal, +s.last_failed_wal_time, The column names of pg_stat_archiver look not consistent at least to me. What about the followings? 
archive_count last_archived_wal last_archived_time fail_count last_failed_wal last_failed_time >>> And what about archived_count and failed_count instead of respectively >>> archive_count and failed_count? The rest of the names are better now >>> indeed. >>> I think that it's time to rename all the variables related to pg_stat_bgwriter. For example, it's better to change PgStat_GlobalStats to PgStat_Bgwriter. I think that it's okay to make this change as separate patch, though. >>> Yep, this is definitely a different patch. >>> +char last_archived_wal[MAXFNAMELEN];/* last WAL file archived */ +TimestampTz last_archived_wal_timestamp;/* last archival success */ +PgStat_Counter failed_attempts; +char last_failed_wal[MAXFNAMELEN];/* last WAL file involved in failure */ Some hackers don't like the increase of the size of the statsfile. In order to reduce the size as much as possible, we should use the exact size of WAL file here instead of MAXFNAMELEN? >>> The first versions of the patch used a more limited size length more >>> inline with what you say: >>> +#define MAX_XFN_CHARS40 >>> But this is inconsistent with xlog_internal.h. >> >> Using MAX_XFN_CHARS instead of MAXFNAMELEN has a clear advantage >> to reduce the size of the statistics file. Though I'm not sure how much this >> change improve the performance of the statistics collector, basically >> I'd like to >> use MAX_XFN_CHARS here. > > I changed the patch in this way, fixed some existing bugs (e.g., > correct the column > names of pg_stat_archiver in rules.out), and then just committed it. "Depesz" mentioned here that it would be interesting to have as well in this view the size of archive queue: http://www.depesz.com/2014/01/29/waiting-for-9-4-add-pg_stat_archiver-statistics-view/ And it looks indeed interesting to have. What do you think about adding it as a TODO item? 
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
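On the archive-queue idea at the end of the thread: the pending-archive count can already be approximated from outside the server by looking at the archiver's status directory. Below is a minimal sketch, assuming a standard 9.x data directory layout where each WAL segment awaiting archival has a `.ready` marker in `pg_xlog/archive_status`; the function name and the demo fixture are mine, not part of the patch:

```python
import os
import tempfile

def archive_queue_depth(pgdata):
    """Count WAL segments waiting to be archived.

    Each pending segment has a .ready marker in pg_xlog/archive_status
    until the archiver renames it to .done after a successful
    archive_command run, so the marker count approximates the queue depth.
    """
    status_dir = os.path.join(pgdata, "pg_xlog", "archive_status")
    return sum(name.endswith(".ready") for name in os.listdir(status_dir))

# Demo against a fake data directory (a real check would point at $PGDATA).
pgdata = tempfile.mkdtemp()
status = os.path.join(pgdata, "pg_xlog", "archive_status")
os.makedirs(status)
for marker in ("000000010000000000000001.done",
               "000000010000000000000002.ready",
               "000000010000000000000003.ready"):
    open(os.path.join(status, marker), "w").close()
print(archive_queue_depth(pgdata))  # two segments still queued
```

A pg_stat_archiver column would expose the same number from inside the server, without needing filesystem access to the data directory.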
Re: [HACKERS] [PATCH] pg_basebackup: progress report max once per second
On 31.01.2014 10:59, Sawada Masahiko wrote:
> On Tue, Jan 21, 2014 at 7:17 AM, Oskari Saarenmaa wrote:
>> On 18.11.2013 07:53, Sawada Masahiko wrote:
>>> On 13 Nov 2013, at 20:51, Mika Eloranta wrote:
>>>> Prevent excessive progress reporting that can grow to gigabytes of
>>>> output with large databases.
>>>
>>> I got an error with the following scenario.
>>>
>>> $ initdb -D data -E UTF8 --no-locale
>>> /* setting the replication parameters */
>>> $ pg_basebackup -D 2data
>>> Floating point exception
>>> LOG:  could not send data to client: Broken pipe
>>> ERROR:  base backup could not send data, aborting backup
>>> FATAL:  connection to client lost
>>
>> Attached is a rebased patch with a fix for the division-by-zero error,
>> plus some code style issues. I also moved the patch to the current
>> commitfest.
>
> Thank you for updating the patch! I have done a quick review of it.
> I didn't get any compile errors, and the patch works fine.
>
> I have one question. What is the purpose of the progress_report() calls
> that you added at the end of ReceiveUnpackTarFile() and ReceiveTarFile()?
> I could not understand the necessity of this code. The same goes for the
> force argument of progress_report(). If you would like to use the 'force'
> option, I think you should add a comment about it to the source code.

I think the idea in the new progress_report() call (with force == true) is
to make sure that there is at least one progress_report call that actually
writes the progress report. Otherwise the final report may go missing if it
gets suppressed by the time-based check. The force argument as used in the
new call skips that check.

/ Oskari
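The suppress-then-force behaviour described above can be sketched in a few lines of Python. This is an illustration of the idea only, not the actual pg_basebackup C code; the class, names, and output format are mine:

```python
import time

class ProgressReporter:
    """Emit at most one progress line per second, unless forced.

    Mirrors the idea in the patch: routine calls are rate-limited by a
    time check, and one final call with force=True guarantees the 100%
    line is written even if the timer would have suppressed it.
    """
    def __init__(self, interval=1.0):
        self.interval = interval
        self.last_report = 0.0  # monotonic timestamp of last emitted report
        self.lines = []

    def progress_report(self, done, total, force=False):
        now = time.monotonic()
        if not force and now - self.last_report < self.interval:
            return False  # suppressed by the time-based check
        self.last_report = now
        self.lines.append("%d/%d kB (%d%%)" % (done, total, 100 * done // total))
        return True

reporter = ProgressReporter()
for done in range(0, 1001, 100):
    reporter.progress_report(done, 1000)     # almost all suppressed
reporter.progress_report(1000, 1000, force=True)  # final report always shown
print(reporter.lines[-1])  # 1000/1000 kB (100%)
```

Without the forced call, a fast backup could finish inside the one-second window and never print its final line, which is exactly the lost-report case Oskari describes.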
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
The plot thickens... Looking at the next relation I see a similar symptom: a
single valid block at the end of several segments of NULs. This relation is
also a btree on the same table and has a header in the near vicinity of the
xlog:

d9de7pcqls4ib6=# select (page_header(get_raw_page('event_data_event_id_person_id', 'main', 3631161))).* ;
    lsn     | tli | flags | lower | upper | special | pagesize | version | prune_xid
------------+-----+-------+-------+-------+---------+----------+---------+-----------
 EA1/63A0A8 |   6 |     0 |  1180 |  3552 |    8176 |     8192 |       4 |         0
(1 row)

And indeed it's the very next xlog record:

[cur:EA1/638988, xid:1418089147, rmid:11(Btree), len/tot_len:18/5894, info:8, prev:EA1/637140] insert_leaf: s/d/r:1663/16385/1364767 tid 2746914/219
[cur:EA1/638988, xid:1418089147, rmid:11(Btree), len/tot_len:18/5894, info:8, prev:EA1/637140] bkpblock[1]: s/d/r:1663/16385/1364767 blk:2746914 hole_off/len:1180/2372
[cur:EA1/63A0A8, xid:1418089147, rmid:1(Transaction), len/tot_len:32/64, info:0, prev:EA1/638988] d/s:16385/1663 commit at 2014-01-21 05:41:11 UTC
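For anyone wanting to poke at a page like this without pageinspect, the fields in that page_header() row live in the first 24 bytes of the raw page. A hedged sketch, assuming the pre-9.3 PageHeaderData layout (pd_tli rather than pd_checksum) and a little-endian machine; the helper names are mine:

```python
import struct

# PageHeaderData layout for that era (little-endian):
# pd_lsn (xlogid u32, xrecoff u32), pd_tli u16, pd_flags u16,
# pd_lower u16, pd_upper u16, pd_special u16, pd_pagesize_version u16,
# pd_prune_xid u32 -- 24 bytes in total.
HEADER = struct.Struct("<IIHHHHHHI")

def parse_page_header(page):
    (xlogid, xrecoff, tli, flags, lower, upper,
     special, pagesize_version, prune_xid) = HEADER.unpack_from(page)
    return {
        "lsn": "%X/%X" % (xlogid, xrecoff),
        "tli": tli,
        "flags": flags,
        "lower": lower,
        "upper": upper,
        "special": special,
        "pagesize": pagesize_version & 0xFF00,  # size lives in the high bits
        "version": pagesize_version & 0x00FF,
        "prune_xid": prune_xid,
    }

# Rebuild the header from the quoted page_header() row and round-trip it.
raw = HEADER.pack(0xEA1, 0x63A0A8, 6, 0, 1180, 3552, 8176, 8192 | 4, 0)
print(parse_page_header(raw)["lsn"])  # EA1/63A0A8
```

The same unpacking against bytes read straight from the segment file is a quick way to cross-check what pageinspect reports on a suspect block.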
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
On Fri, Jan 31, 2014 at 8:21 PM, Tom Lane wrote:
> So on a filesystem that supports "holes" in files, I'd expect that the
> added segments would be fully allocated if XLogReadBufferExtended did the
> deed, but they'd be quite small if _mdfd_getseg did so. The du results you
> started with suggest that the former is the case, but could you verify
> that the filesystem this is on supports holes and that du will report only
> the actually allocated space when there's a hole?

Yup, it's xfs:

# dd if=/dev/zero seek=1M count=1 bs=1kB of=hole
1+0 records in
1+0 records out
1000 bytes (1.0 kB) copied, 5.7286e-05 s, 17.5 MB/s
# ls -ls hole
4 -rw-r--r-- 1 root root 1048577000 Feb  1 09:35 hole

# ls -ls 1261982.5*
1048576 -rw------- 1 ua8157t9mbut8r postgres 1073741824 Jan 14 12:51 1261982.5
1048576 -rw------- 1 ua8157t9mbut8r postgres 1073741824 Jan 23 02:05 1261982.50
1048576 -rw------- 1 ua8157t9mbut8r postgres 1073741824 Jan 23 02:07 1261982.51
1048576 -rw------- 1 ua8157t9mbut8r postgres 1073741824 Jan 23 02:09 1261982.52
1048576 -rw------- 1 ua8157t9mbut8r postgres 1073741824 Jan 23 02:10 1261982.53
 508680 -rw------- 1 ua8157t9mbut8r postgres  520888320 Jan 23 02:12 1261982.54

--
greg
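The same hole test can be reproduced without dd by comparing a file's apparent size against its allocated blocks: POSIX stat reports st_blocks in 512-byte units, which is what du adds up. A generic sketch, not xfs-specific (whether the seeked-over region stays unallocated depends on the filesystem):

```python
import os
import tempfile

def apparent_and_allocated(path):
    """Return (apparent size, allocated bytes) for a file.

    st_blocks counts 512-byte units, so a sparse file -- like a relation
    segment extended far past its last written block -- shows a large
    st_size but few allocated bytes, which is exactly what du reports.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512

# Punch a hole the same way the dd command above does: seek 1M * 1 kB out,
# then write a single kilobyte at the far end.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.seek(1024 * 1024 * 1000)
    f.write(b"\0" * 1000)
size, allocated = apparent_and_allocated(path)
print(size, allocated)  # size matches dd's 1048577000; allocated is far
                        # smaller on hole-aware filesystems
os.unlink(path)
```

On a filesystem without hole support, the two numbers would instead come out roughly equal, which is the distinction Tom's question hinges on.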
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
I thought I'd try out what I was in an immediate position to do without
having access to dedicated multi-socket hardware: a benchmark on AWS. This
was a "c3.8xlarge" instance, which is reportedly backed by Intel Xeon
E5-2680 processors. Since the Intel ARK website reports that these CPUs have
16 "threads" (8 cores + hyperthreading), and AWS's marketing material
indicates that this instance type has 32 "vCPUs", I inferred that the
underlying hardware had 2 sockets. However, reportedly that wasn't the case
when procfs was consulted, no doubt due to Xen hypervisor voodoo:

ubuntu@ip-10-67-128-2:~$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    32
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Stepping:              4
CPU MHz:               2800.074
BogoMIPS:              5600.14
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-31

I ran the benchmark on Ubuntu 13.10 Server, because it seemed to be the only
prominent "enterprise" x86_64 AMI (operating system image) that came with
GCC 4.8 as part of its standard toolchain. This exact setup is benchmarked
here:
http://www.phoronix.com/scan.php?page=article&item=amazon_ec2_c3&num=1
(Incidentally, some of the other benchmarks on that site use pgbench to
benchmark the Linux kernel, filesystems, disks and so on, e.g.:
http://www.phoronix.com/scan.php?page=news_item&px=NzI0NQ).

I was hesitant to benchmark using a virtualized system. There is a lot of
contradictory information about the overhead and/or noise added, which may
vary from one workload or hypervisor to the next. But needs must when the
devil drives, and all that. Besides, this kind of setup is very commercially
relevant these days, so it doesn't seem unreasonable to see how things work
out on an AWS instance that generally performs well for this workload.
Of course, I still want to replicate the big improvement you reported for
multi-socket systems, but you might have to wait a while for that, unless
some kindly benefactor who has a 4-socket server lying around would like to
help me out.

Results:
http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/c38xlarge-rwlocks/

You can drill down and find the postgresql.conf settings from the report.
There appears to be a modest improvement in transaction throughput. It's not
as large as the improvement you reported for your 2-socket workstation, but
it's there, just about.

--
Peter Geoghegan