Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD
Greg Stark writes:
> Hmm. The backtrace is here but I think it's lying about the specific line.
>
> #0  convert_one_string_to_scalar (value=0x7f20e9a3 " ",
>     rangelo=32, rangehi=122, 2132863395, 32, 122)
>     at selfuncs.c:3873
> #1  0x00435880 in convert_string_to_scalar (
>     value=0x7f20e990 "Aztec\n", ' ' , "Ct ",
>     scaledvalue=0x7fffdb44,
>     lobound=0x7f225bf4 "Audrey", ' ' , "Dr ",
>     scaledlobound=0x7fffdb34,
>     hibound=0x7f225c40 "Balmoral", ' ' , "Dr ",
>     scaledhibound=0x7fffdb3c, 2132863376, 2147474244, 2132958196,
>     2147474228, 2132958272, 2147474236) at selfuncs.c:3847
>
> Stepping through the code it looks like it actually happens on line
> 3882 when denom overflows.

Oh, interesting. The largest possible value of "base" is 256, and the code limits the amount of string it'll scan to 20 bytes, which means "denom" could reach at most 256^21 = 3.7e50. Perfectly fine with IEEE-math doubles, but not so much with other arithmetics.

We could hold the worst-case value to within the VAX range if we considered only about 14 bytes instead of 20. Probably that wouldn't lose much in terms of estimation accuracy, but given the lack of complaints to date, I'm not sure we should change it ...

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpointer continuous flushing
On Wed, Aug 19, 2015 at 12:13 PM, Fabien COELHO wrote:
>> Sure, I think what can help here is a testcase/'s (in form of script file
>> or some other form, to test this behaviour of patch) which you can write
>> and post here, so that others can use that to get the data and share it.
>
> Sure... note that I already did that on this thread, without any echo...
> but I can do it again...

Thanks. I have tried your scripts and found a problem while using the avg.py script:

grep 'progress:' test_medium4_FW_off.out | cut -d' ' -f4 | ./avg.py --limit=10 --length=300 : No such file or directory

I didn't get a chance to poke into the avg.py script (the command without avg.py works fine). The Python version on the machine I planned to test on is Python 2.7.5.

Today, while reading the first patch (checkpoint-continuous-flush-10-a), I gave some thought to the part of the patch below, which I would like to share with you.

+static int
+NextBufferToWrite(
+	TableSpaceCheckpointStatus *spcStatus, int nb_spaces,
+	int *pspace, int num_to_write, int num_written)
+{
+	int space = *pspace, buf_id = -1, index;
+
+	/*
+	 * Select a tablespace depending on the current overall progress.
+	 *
+	 * The progress ratio of each unfinished tablespace is compared to
+	 * the overall progress ratio to find one which is not in advance
+	 * (i.e. overall ratio > tablespace ratio,
+	 * i.e. tablespace written/to_write > overall written/to_write

Here, I think the above calculation can go wrong if a backend or the bgwriter starts writing buffers while the checkpoint is in progress. The tablespace "written" parameter won't be able to account for the buffers written by backends or the bgwriter. That may not be a big thing to worry about, but I find Heikki's version worth considering: he has not changed the overall idea of this patch, but the calculations are somewhat simpler and hence have less chance of going wrong.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
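The selection rule described in the quoted comment can be sketched standalone. The types and function below are hypothetical simplifications; the real patch works on TableSpaceCheckpointStatus entries and shared buffers:

```c
/* Hypothetical simplified per-tablespace checkpoint progress. */
typedef struct
{
    int written;   /* checkpoint buffers already written in this tablespace */
    int to_write;  /* total checkpoint buffers this tablespace must write */
} SpaceProgress;

/* Return the first unfinished tablespace whose progress ratio is not
 * ahead of the overall ratio, i.e.
 *   written/to_write <= num_written/num_to_write.
 * The comparison is done by cross-multiplication to avoid division.
 * Returns -1 if every tablespace is finished. */
static int
pick_lagging_space(const SpaceProgress *spaces, int nspaces,
                   int num_written, int num_to_write)
{
    for (int i = 0; i < nspaces; i++)
    {
        if (spaces[i].written >= spaces[i].to_write)
            continue;        /* this tablespace is already done */

        if ((long long) spaces[i].written * num_to_write <=
            (long long) num_written * spaces[i].to_write)
            return i;        /* lagging (or exactly on pace): pick it */
    }
    return -1;
}
```

Note that, per the concern raised in the message, buffers written concurrently by backends or the bgwriter never increment `written`, so these per-tablespace ratios can drift from what is actually on disk.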
Re: [HACKERS] allowing wal_level change at run time
On 8/20/15 3:50 PM, Andres Freund wrote:
>> But, under any scheme to set wal_level automatically, how will the
>> primary know whether it needs to use level archive or hot_standby?
>
> I'd have said archive_mode triggered archive and everything else
> hot_standby.

That might be a decent heuristic, but it's not correct if there is no way to override it. People are using archive-only replication with hot standby (for delayed replication, for example).
Re: [HACKERS] PATCH: numeric timestamp in log_line_prefix
On 08/22/2015 09:54 PM, Fabien COELHO wrote:
> Hello Tomas,
>
> Review of v2:
>
>> attached is a v2 of the patch, reworked based on the comments.
>
> The patch applies cleanly to head, it compiles, I tested it and it
> mostly works as expected, see below.
>
>> 1) fix the docs (explicitly say that it's a Unix epoch)
>
> I would add the word "numeric" in front of timestamp both in the doc
> and in the postgresql.conf.sample, as it justifies the chosen %n.

I think we're already using 'unix epoch' in the docs without explicitly stating that it's a numeric value, so I don't think we should use it here as it'd be inconsistent.

>> 2) handle 'padding' properly
>
> I tried that without success. ISTM that what is padded is the empty
> string, and the timestamp is just printed on its own without padding
> afterwards. I think that it should use a string buffer and then use
> the padding on the string, as case 'c' or 't' for instance.

Hmmm, I'm not entirely sure how exactly the padding is supposed to work (IIRC I've never used it), and I thought it behaved correctly. But maybe not - I think the safest thing is to copy what 't' does, so I've done that in the attached v3 of the patch.
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index e900dcc..8328733 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4630,6 +4630,11 @@ local0.*/var/log/postgresql no + %n + Time stamp with milliseconds (as a Unix epoch) + no + + %i Command tag: type of session's current command yes diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index 088c714..80ffdbd 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -2438,6 +2438,20 @@ log_line_prefix(StringInfo buf, ErrorData *edata) appendStringInfoString(buf, strfbuf); } break; + case 'n': +{ + struct timeval tv; + char strfbuf[128]; + + gettimeofday(&tv, NULL); + sprintf(strfbuf, "%ld.%.03d", tv.tv_sec, (int)(tv.tv_usec / 1000)); + + if (padding != 0) + appendStringInfo(buf, "%*s", padding, strfbuf); + else + appendStringInfoString(buf, strfbuf); +} +break; case 's': if (formatted_start_time[0] == '\0') setup_formatted_start_time(); diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index e5d275d..34abd17 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -425,6 +425,7 @@ # %p = process ID # %t = timestamp without milliseconds # %m = timestamp with milliseconds + # %n = timestamp with milliseconds (as a Unix epoch) # %i = command tag # %e = SQL state # %c = session ID -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
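The v3 strategy above, i.e. format the value first and then pad the finished string (as the 't' case does), can be sketched outside the server. format_epoch_ms below is a hypothetical helper, not the actual elog.c code:

```c
#include <stdio.h>
#include <sys/time.h>

/* Format the current time as a Unix epoch with milliseconds, then apply
 * printf-style padding to the whole finished string.  Padding an empty
 * string and appending the timestamp afterwards (the v2 bug) would pad
 * nothing useful. */
static void
format_epoch_ms(char *out, size_t outsz, int padding)
{
    struct timeval tv;
    char strfbuf[128];

    gettimeofday(&tv, NULL);
    snprintf(strfbuf, sizeof(strfbuf), "%ld.%03d",
             (long) tv.tv_sec, (int) (tv.tv_usec / 1000));

    if (padding != 0)
        snprintf(out, outsz, "%*s", padding, strfbuf);  /* pad the whole value */
    else
        snprintf(out, outsz, "%s", strfbuf);
}
```

A positive padding right-justifies (padding 20 yields a 20-character field around the roughly 14-character epoch string), and a negative padding left-justifies, matching printf's %*s semantics.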
Re: [HACKERS] exposing pg_controldata and pg_config as functions
On 08/20/2015 07:54 AM, Joe Conway wrote:
> On 08/20/2015 06:59 AM, Andrew Dunstan wrote:
>> I was a bit interested in pg_config information, for this reason:
>> I had a report of someone who had configured using --with-libxml
>> but the xml tests actually returned the results that are expected
>> without xml being configured. The regression tests thus passed,
>> but should not have. It occurred to me that if we had a test
>> like
>>
>>   select pg_config('configure') ~ '--with-libxml' as has_xml;
>>
>> in the xml tests then this failure mode would be detected.
>
> I've found use for them both in the past. A fair amount of bit-rot
> has set in, no doubt, and these were quick and dirty to begin with,
> but I have these hacks from a while back:
>
> https://github.com/jconway/pg_config

The attached implements pg_config as a backend SRF and a matching system view. A few notes:

1) The syntax is a bit different than what Andrew proposed:

8<
select setting ~ '--with-libxml' as has_xml
from pg_config
where name = 'CONFIGURE';
 has_xml
---------
 t
(1 row)
8<

In particular note that the name values are all upper case to be consistent with pg_config, and at least currently there is no version of the function which accepts a name as an argument (didn't seem worthwhile to me).

2) No docs or related regression test yet. I will do that if there is enough interest in this getting committed. So far no one except Andrew and I have chimed in.

3) Requires a catalog version bump (not in posted diff)

4) The static function cleanup_path() was borrowed from src/bin/pg_config/pg_config.c. It is a small and stable function (no change since 2010 AFAICS), so maybe not worth the effort, but I was wondering if it should be moved to src/common somewhere and shared.

I will add this to the next commitfest. Comments/feedback encouraged.
Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index ccc030f..0d2e8f1 100644 *** a/src/backend/catalog/system_views.sql --- b/src/backend/catalog/system_views.sql *** CREATE VIEW pg_timezone_abbrevs AS *** 425,430 --- 425,433 CREATE VIEW pg_timezone_names AS SELECT * FROM pg_timezone_names(); + CREATE VIEW pg_config AS + SELECT * FROM pg_config(); + -- Statistics views CREATE VIEW pg_stat_all_tables AS diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile index 7889101..49c6f08 100644 *** a/src/backend/utils/misc/Makefile --- b/src/backend/utils/misc/Makefile *** include $(top_builddir)/src/Makefile.glo *** 14,20 override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS) ! OBJS = guc.o help_config.o pg_rusage.o ps_status.o rls.o \ sampling.o superuser.o timeout.o tzparser.o # This location might depend on the installation directories. Therefore --- 14,33 override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS) ! # don't include subdirectory-path-dependent -I and -L switches
STD_CPPFLAGS := $(filter-out -I$(top_srcdir)/src/include -I$(top_builddir)/src/include,$(CPPFLAGS)) ! STD_LDFLAGS := $(filter-out -L$(top_builddir)/src/port,$(LDFLAGS)) ! override CPPFLAGS += -DVAL_CONFIGURE="\"$(configure_args)\"" ! override CPPFLAGS += -DVAL_CC="\"$(CC)\"" ! override CPPFLAGS += -DVAL_CPPFLAGS="\"$(STD_CPPFLAGS)\"" ! override CPPFLAGS += -DVAL_CFLAGS="\"$(CFLAGS)\"" ! override CPPFLAGS += -DVAL_CFLAGS_SL="\"$(CFLAGS_SL)\"" ! override CPPFLAGS += -DVAL_LDFLAGS="\"$(STD_LDFLAGS)\"" ! override CPPFLAGS += -DVAL_LDFLAGS_EX="\"$(LDFLAGS_EX)\"" ! override CPPFLAGS += -DVAL_LDFLAGS_SL="\"$(LDFLAGS_SL)\"" ! override CPPFLAGS += -DVAL_LIBS="\"$(LIBS)\"" ! ! OBJS = guc.o help_config.o pg_config.o pg_rusage.o ps_status.o rls.o \ sampling.o superuser.o timeout.o tzparser.o # This location might depend on the installation directories. Therefore diff --git a/src/backend/utils/misc/pg_config.c b/src/backend/utils/misc/pg_con
Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD
On 22 Aug 2015 18:02, "Tom Lane" wrote:
>
> Why not define infnan() and make it do the same thing as
> FloatExceptionHandler? Or was that what you meant?

That's exactly what I meant, instead of my quick hack to add a signal handler for sigill.

> > The hang is actually in the groupingset tests in
> > bipartite_match.c:hk_breadth_search(). ...
>
> Is it that function itself that's hanging, or is the problem that its
> caller expects it to ultimately return true, and it never does?

I think it never exits that function, but I'll try it again.

> I don't think we're really insisting on a true infinity here, only that
> get_float4_infinity() produce a large value that isinf() will recognize.
>
> I'm surprised that any of the hacks in src/port/isinf.c compile on Vax
> at all --- did you invent a new one?
>
> Also, I'd have thought that both get_floatX_infinity and get_floatX_nan
> would be liable to produce SIGFPE on non-IEEE machines. Did you mess
> with those?

I didn't do anything. There's no isinf.o in that directory, so I don't think anything got compiled. There are other files in src/port but not that one.

> > The other place where non-IEEE floats are causing problems internal to
> > postgres appears to be inside spgist -- even when planning queries
> > using spgist:
>
> That's pretty odd --- it does not look like spgcostestimate does anything
> very exciting. Can you get a stack trace showing where that FPE happens?

Hmm. The backtrace is here but I think it's lying about the specific line.
#0  convert_one_string_to_scalar (value=0x7f20e9a3 " ",
    rangelo=32, rangehi=122, 2132863395, 32, 122) at selfuncs.c:3873
#1  0x00435880 in convert_string_to_scalar (
    value=0x7f20e990 "Aztec\n", ' ' , "Ct ", scaledvalue=0x7fffdb44,
    lobound=0x7f225bf4 "Audrey", ' ' , "Dr ", scaledlobound=0x7fffdb34,
    hibound=0x7f225c40 "Balmoral", ' ' , "Dr ", scaledhibound=0x7fffdb3c,
    2132863376, 2147474244, 2132958196, 2147474228, 2132958272,
    2147474236) at selfuncs.c:3847

Stepping through the code it looks like it actually happens on line 3882 when denom overflows.

(gdb) n
3882            denom *= base;
3: denom = 1.666427615935998e+37
2: num = 0.37361810145459621
1: slen = 0
(gdb) n

Program received signal SIGFPE, Arithmetic exception.
convert_one_string_to_scalar (value=0x7f20e9a4 " ",
    rangelo=32, rangehi=122, 2132863396, 32, 122) at selfuncs.c:3873
Re: [HACKERS] PATCH: numeric timestamp in log_line_prefix
Hello Tomas,

Review of v2:

> attached is a v2 of the patch, reworked based on the comments.

The patch applies cleanly to head, it compiles, I tested it and it mostly works as expected, see below.

> 1) fix the docs (explicitly say that it's a Unix epoch)

I would add the word "numeric" in front of timestamp both in the doc and in the postgresql.conf.sample, as it justifies the chosen %n.

> 2) handle 'padding' properly

I tried that without success. ISTM that what is padded is the empty string, and the timestamp is just printed on its own without padding afterwards. I think that it should use a string buffer and then use the padding on the string, as case 'c' or 't' for instance.

> 3) get rid of timestamp_str - use appendString* methods directly

See above, I'm afraid it is necessary for padding, because there are two formatted fields.

> 4) support just the "with milliseconds" using '%n' escape sequence

Ok.

> All those changes are quite trivial. The only annoying bit is that both
> '%u' and '%e' are already used, so none of the obvious choices for
> 'Unix Epoch' are available. So I simply took (%m+1) which is %n.

It stands for "numeric" as well, so I think it is quite nice.

--
Fabien.
Re: [HACKERS] Error message with plpgsql CONTINUE
I had a few second thoughts about the wording of the error messages in this area. First, consider

create or replace function foo() returns void language plpgsql as $$
begin
  <<lab1>>
  loop
    exit lab1;  -- ok
  end loop;
  loop
    exit lab1;  -- not so ok
  end loop;
end$$;

ERROR:  label "lab1" does not exist
LINE 8:     exit lab1;  -- not so ok
            ^

This message seems confusing: label "lab1" does exist, it's just not attached to the right loop. In a larger function that might not be too obvious, and I can easily imagine somebody wasting some time before figuring out the cause of his problem. Given the way the namespace data structure works, I am not sure that we can realistically detect at line 8 that there was an instance of lab1 earlier, but perhaps we could word the error message to cover either possibility. Maybe something like "there is no label "foo" surrounding this statement"?

Second, consider

create or replace function foo() returns void language plpgsql as $$
begin
  <<lab1>>
  begin
    exit lab1;  -- ok
    exit;       -- not so ok
  end;
end$$;

ERROR:  EXIT cannot be used outside a loop
LINE 6:     exit;  -- not so ok
            ^

This is not too accurate, as shown by the fact that the first EXIT is accepted. Perhaps "EXIT without a label cannot be used outside a loop"?

I realize that this is pretty nitpicky, but if we're going to all the trouble of improving the error messages about these things, seems like we ought to be careful about what the messages actually say. I'm not married to these particular wordings though. Suggestions?

			regards, tom lane
Re: [HACKERS] PATCH: numeric timestamp in log_line_prefix
Hi all, attached is a v2 of the patch, reworked based on the comments. 1) fix the docs (explicitly say that it's a Unix epoch) 2) handle 'padding' properly 3) get rid of timestamp_str - use appendString* methods directly 4) support just the "with milliseconds" using '%n' escape sequence All those changes are quite trivial. The only annoying bit is that both '%u' and '%e' are already used, so none of the obvious choices for 'Unix Epoch' are available. So I simply took (%m+1) which is %n. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index e900dcc..8328733 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4630,6 +4630,11 @@ local0.*/var/log/postgresql no + %n + Time stamp with milliseconds (as a Unix epoch) + no + + %i Command tag: type of session's current command yes diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index 088c714..4520b9f 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -2438,6 +2438,18 @@ log_line_prefix(StringInfo buf, ErrorData *edata) appendStringInfoString(buf, strfbuf); } break; + case 'n': +{ + struct timeval tv; + gettimeofday(&tv, NULL); + + if (padding != 0) + appendStringInfo(buf, "%*s", padding, ""); + + appendStringInfo(buf, "%ld.%.03d", tv.tv_sec, + (int)(tv.tv_usec / 1000)); +} +break; case 's': if (formatted_start_time[0] == '\0') setup_formatted_start_time(); diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index e5d275d..34abd17 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -425,6 +425,7 @@ # %p = process ID # %t = timestamp without milliseconds # %m = timestamp with milliseconds + # %n = timestamp with milliseconds (as a Unix epoch) # %i = command tag # %e = SQL state # %c = session ID 
Re: [HACKERS] Test code is worth the space
On Tue, Aug 18, 2015 at 3:32 PM, David Fetter wrote: > On Tue, Aug 18, 2015 at 04:54:07PM +0100, Greg Stark wrote: > > On Tue, Aug 18, 2015 at 2:16 PM, David Fetter wrote: > > > I'm given to understand that this tight coupling is necessary for > > > performance. Are you saying that it could be unwound, or that > > > testing strategies mostly need to take it into account, or...? > > > > I'm just saying that we shouldn't expect to find a magic bullet test > > framework that solves all these problems. Without restructuring > > code, which I don't think is really feasible, we won't be able to > > have good unit test coverage for most existing code. > > > > It might be more practical to start using such a new tool for new > > code only. Then the new code could be structured in ways that allow > > the environment to be mocked more easily and the results observed > > more easily. > > Great! > > Do we have examples of such tools and code bases structured to > accommodate them that we'd like to use for reference, or at least for > inspiration? > +1 on that. It would be helpful to see successful examples. Especially ones written in C. I can't really figure out what success looks like just from reading the descriptions. Cheers, Jeff
Re: [HACKERS] Potential GIN vacuum bug
On Tue, Aug 18, 2015 at 8:59 AM, Robert Haas wrote:
> On Mon, Aug 17, 2015 at 5:41 PM, Jeff Janes wrote:
> > User backends attempt to take the lock conditionally, because otherwise
> > they would cause an autovacuum already holding the lock to cancel
> > itself, which seems quite bad.
> >
> > Note that this is a substantial behavior change, in that with this code
> > the user backends which find the list already being cleaned will just
> > add to the end of the pending list and go about their business. So if
> > things are added to the list faster than they can be cleaned up, the
> > size of the pending list can increase without bound.
> >
> > Under the existing code each concurrent user backend will try to clean
> > the pending list at the same time. The work doesn't parallelize, so
> > doing this just burns CPU (and possibly consumes up to
> > maintenance_work_mem for *each* backend), but it does serve to throttle
> > the insertion rate and so keep the list from growing without bound.
> >
> > This is just a proof-of-concept patch, because I don't know if this
> > approach is the right approach.
>
> I'm not sure if this is the right approach, but I'm a little wary of
> involving the heavyweight lock manager in this. If pending list
> cleanups are frequent, this could involve a lot of additional lock
> manager traffic, which could be bad for performance.

I don't think 10 cleanups a second should be a problem (and most of those would probably fail to acquire the lock, but I don't know if that would make a difference). If there were several hundred per second, I think you would have bigger problems than traffic through the lock manager. In that case, it is time to either turn off fastupdate or increase the pending list size.

As a mini-vacuum, it seems natural to me to hold a lock of the same nature as a regular vacuum, but just on the one index involved rather than the whole table.
> Even if they are infrequent, it seems like it would be more natural to
> handle this without some regime of locks and pins and buffer cleanup
> locks on the buffers that are storing the pending list, rather than a
> heavyweight lock on the whole relation. But I am just waving my hands
> wildly here.

I also thought of a buffer clean up lock on the pending list head buffer to represent the right to do a clean up, but with the proviso that once you have obtained the clean up lock, you can then drop the exclusive buffer content lock and continue to hold the conceptual lock just by maintaining the pin. I think that this would be semantically correct, but backends doing a cleanup would have to get the lock conditionally, and I think you would have too many chances for false failures where it bails out when the other party simply holds a pin. I guess I could implement it and see how it fares in my test case.

Cheers,

Jeff
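The conditional-acquire scheme under discussion can be sketched with an ordinary trylock standing in for a heavyweight lock on the index. The mutex and function names below are hypothetical demo code, not GIN's actual implementation:

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for a heavyweight lock on the index's pending-list cleanup. */
static pthread_mutex_t pending_cleanup_lock = PTHREAD_MUTEX_INITIALIZER;
static int n_cleanups = 0;

static void
demo_cleanup(void)
{
    n_cleanups++;            /* stands in for flushing the pending list */
}

/* (auto)vacuum: waits for the lock, since it must clean the list */
static void
vacuum_cleanup(void)
{
    pthread_mutex_lock(&pending_cleanup_lock);
    demo_cleanup();
    pthread_mutex_unlock(&pending_cleanup_lock);
}

/* user backend: conditional acquire -- if someone else is already
 * cleaning, just keep appending to the pending list and move on (which
 * is why the list can now grow without bound under sustained inserts) */
static bool
backend_maybe_cleanup(void)
{
    if (pthread_mutex_trylock(&pending_cleanup_lock) != 0)
        return false;        /* cleanup already in progress elsewhere */
    demo_cleanup();
    pthread_mutex_unlock(&pending_cleanup_lock);
    return true;
}
```

The conditional path is what avoids cancelling an autovacuum that already holds the lock, at the cost of losing the implicit throttling the old contend-and-duplicate behavior provided.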
Re: [HACKERS] (full) Memory context dump considered harmful
Tomas Vondra writes:
> One question regarding the proposed patch though - if I get it right
> (just looking at the diff), it simply limits the output to the first 100
> child contexts at each level independently. So if each of those 100
> child contexts has >100 child contexts of its own, we get 100x100
> lines. And so on, if the hierarchy is deeper. This probably is not
> addressable without introducing some global counter of printed contexts,
> and it may not be an issue at all (all the cases I can remember have a
> single huge context or many sibling contexts).

Right. The situation Stefan was complaining of, almost certainly, involved a huge number of children of the same context. This patch would successfully abbreviate that case, no matter where it happened in the context tree exactly. In principle, if you were getting that sort of expansion at multiple levels of the context tree concurrently, you could still get a mighty long context dump ... but I really doubt that would happen in practice. (And if it did happen, an overall limit on the number of contexts printed would hide the fact that it was happening, which wouldn't be desirable.)

>> One thing we could consider doing to improve the odds that it's fine
>> would be to rearrange things so that child contexts of the same
>> parent are more likely to be "similar" --- for example, break out
>> all relcache entries to be children of a RelCacheContext instead of
>> the generic CacheMemoryContext, likewise for cached plans, etc. But
>> I'm not yet convinced that'd be worth the trouble.

> That'd be nice but I see that as an independent improvement - it might
> improve the odds for internal contexts, but what about contexts coming
> from user code (e.g. custom aggregates)?

Yeah, cases like custom aggregates would be hard to classify.

			regards, tom lane
Re: [HACKERS] (full) Memory context dump considered harmful
On 08/22/2015 06:06 PM, Tom Lane wrote:
> Tomas Vondra writes:
>> Couldn't we make it a bit smarter to handle even cases like this? For
>> example we might first count/sum the child contexts, and then print
>> either all child contexts (if there are only a few of them) or just
>> those that are >5% of the total, 2x the average or something like that.
>
> That seems overly complicated. In the first place, you couldn't possibly
> implement that without two passes over the context set, which would be
> mighty expensive. (In the situations where this is getting run, most
> likely portions of the address space have been swapped out, and we'll
> have to swap them back in again. Bad enough to iterate over the whole
> process address space once, but twice?) In the second place, it would
> seem to require a much wider API between MemoryContextStats and the
> per-context-type stats methods, including for example a way to decide
> which numbers about a context were the most important ones. I'm already
> concerned about having had to expose a set of accumulatable numbers at
> all ... don't want the API to get into their semantics in that kind of
> detail.

Sure, the two passes seem quite annoying, although I'm not convinced it'd be a problem in practice - most modern systems I've been dealing with recently were configured with the 'no swapping' goal, using only a tiny swap for extreme cases (e.g. 256GB RAM with 4GB of swap). If only we had some memory accounting in place ;-) Then we wouldn't have to do the two passes ...

> As I said, based on my experience in looking at memory context dumps
> (and I've seen a few), a show-the-first-N-children heuristic probably
> will work fine. If we find out differently we can work on refining it,
> but I don't think a complex design is warranted at this stage.

OK, let's go with the simple approach. I'm still not quite convinced that having no GUC for turning this off is a good idea, though. From time to time we're dealing with OOM issues on customer systems with very limited access (e.g.
no gdb). In those cases the memory context dump is the only information we have initially, and it's usually quite difficult to get additional info. I agree that the 'first-N-children' heuristic is going to work in most cases, but if it doesn't, it'd be nice to have a way to force "full" stats in an ad-hoc manner (i.e. disable the limit before a query, etc.).

OTOH, now that I'm thinking about it, most OOM errors I've seen (at least on Linux) were either because of exceptionally large palloc() requests (e.g. because of varlena header corruption, overflow bugs, ...) or because of the OOM killer. In the first case the memory context stats are utterly useless because the issue has nothing to do with other memory contexts; in the latter case you don't get any stats at all because the process is simply killed.

One question regarding the proposed patch though - if I get it right (just looking at the diff), it simply limits the output to the first 100 child contexts at each level independently. So if each of those 100 child contexts has >100 child contexts of its own, we get 100x100 lines. And so on, if the hierarchy is deeper. This probably is not addressable without introducing some global counter of printed contexts, and it may not be an issue at all (all the cases I can remember have a single huge context or many sibling contexts).

> One thing we could consider doing to improve the odds that it's fine
> would be to rearrange things so that child contexts of the same
> parent are more likely to be "similar" --- for example, break out
> all relcache entries to be children of a RelCacheContext instead of
> the generic CacheMemoryContext, likewise for cached plans, etc. But
> I'm not yet convinced that'd be worth the trouble.

That'd be nice, but I see that as an independent improvement - it might improve the odds for internal contexts, but what about contexts coming from user code (e.g. custom aggregates)?
kind regards

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD
Greg Stark writes:
> On Thu, Aug 20, 2015 at 3:29 PM, Tom Lane wrote:
>> That seems worth poking into.
>
> Mea culpa. Not a planner crash but rather an overflow from exp(). It
> turns out exp() and other math library functions on Vax do not signal
> FPE but rather have a curious api that lets us catch the overflow by
> defining a function "infnan()" to call when it overflows. If we don't
> define that function then it executes an illegal instruction which
> generates SIGILL with errno set to EDOM (iirc). For the moment I've
> just attached our FPE handler to SIGILL and that's letting me run the
> tests without crashes. It's probably just silly make-work but it would
> be pretty easy to add a simple function to call our FPE handler
> directly to avoid having to have a SIGILL handler which seems like a
> bad idea in general.

Why not define infnan() and make it do the same thing as FloatExceptionHandler? Or was that what you meant?

> The hang is actually in the groupingset tests in
> bipartite_match.c:hk_breadth_search().
> Looking at that function it's not surprising that it doesn't work
> without IEEE floats given that the first line is
>     distance[0] = get_float4_infinity();
> And the return value of the function is
>     !isinf(distance[0]);

Is it that function itself that's hanging, or is the problem that its caller expects it to ultimately return true, and it never does?

I don't think we're really insisting on a true infinity here, only that get_float4_infinity() produce a large value that isinf() will recognize.

I'm surprised that any of the hacks in src/port/isinf.c compile on Vax at all --- did you invent a new one?

Also, I'd have thought that both get_floatX_infinity and get_floatX_nan would be liable to produce SIGFPE on non-IEEE machines. Did you mess with those?
> The other place where non-IEEE floats are causing problems internal to
> postgres appears to be inside spgist -- even when planning queries
> using spgist:

That's pretty odd --- it does not look like spgcostestimate does
anything very exciting.  Can you get a stack trace showing where that
FPE happens?

			regards, tom lane
Re: [HACKERS] (full) Memory context dump considered harmful
Tomas Vondra writes:
> On 08/21/2015 08:37 PM, Tom Lane wrote:
>> ... But suppose we add a parameter to memory context stats
>> collection that is the maximum number of child contexts to print *per
>> parent context*.  If there are more than that, summarize the rest as
>> per your suggestion.
>>
>> The case where you would lose important data is where the serious
>> bloat is in some specific child context that is after the first N
>> children of its direct parent.  I don't believe I've ever seen a case
>> where that was critical information as long as N isn't too tiny.

> Couldn't we make it a bit smarter to handle even cases like this? For
> example we might first count/sum the child contexts, and then print
> either all child contexts (if there are only a few of them) or just
> those that are >5% of the total, 2x the average or something like that.

That seems overly complicated.  In the first place, you couldn't
possibly implement that without two passes over the context set, which
would be mighty expensive.  (In the situations where this is getting
run, most likely portions of the address space have been swapped out,
and we'll have to swap them back in again.  Bad enough to iterate over
the whole process address space once, but twice?)  In the second place,
it would seem to require a much wider API between MemoryContextStats
and the per-context-type stats methods, including for example a way to
decide which numbers about a context were the most important ones.  I'm
already concerned about having had to expose a set of accumulatable
numbers at all ... don't want the API to get into their semantics in
that kind of detail.

As I said, based on my experience in looking at memory context dumps
(and I've seen a few), a show-the-first-N-children heuristic probably
will work fine.  If we find out differently we can work on refining it,
but I don't think a complex design is warranted at this stage.
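The show-the-first-N-children heuristic could look roughly like this. This is a simplified single-pass sketch with made-up types and names, not the actual MemoryContextStats() implementation:

```c
#include <stdio.h>

/*
 * Sketch of "print the first N children per parent, summarize the rest".
 * The Ctx struct is a hypothetical stand-in for MemoryContextData; the
 * real code walks firstchild/nextchild links in much the same shape.
 */
typedef struct Ctx
{
	const char *name;
	long		totalspace;		/* bytes allocated in this context */
	struct Ctx *firstchild;
	struct Ctx *nextchild;
} Ctx;

/* Prints a report; returns the number of report lines emitted. */
static int
ctx_stats(Ctx *ctx, int level, int max_children)
{
	Ctx		   *child;
	int			printed = 0;
	int			skipped = 0;
	long		skipped_bytes = 0;
	int			lines = 1;

	printf("%*s%s: %ld bytes\n", level * 2, "", ctx->name, ctx->totalspace);

	for (child = ctx->firstchild; child != NULL; child = child->nextchild)
	{
		if (printed < max_children)
		{
			/* recurse into the first N children as usual */
			lines += ctx_stats(child, level + 1, max_children);
			printed++;
		}
		else
		{
			/* accumulate the remainder into a one-line summary
			 * (counts only each skipped child's own space, not
			 * its grandchildren -- good enough for a sketch) */
			skipped++;
			skipped_bytes += child->totalspace;
		}
	}
	if (skipped > 0)
	{
		printf("%*s%d more child contexts containing %ld bytes\n",
			   (level + 1) * 2, "", skipped, skipped_bytes);
		lines++;
	}
	return lines;
}
```

A single traversal suffices, which matches Tom's objection to the two-pass alternative.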
One thing we could consider doing to improve the odds that it's fine
would be to rearrange things so that child contexts of the same parent
are more likely to be "similar" --- for example, break out all relcache
entries to be children of a RelCacheContext instead of the generic
CacheMemoryContext, likewise for cached plans, etc.  But I'm not yet
convinced that'd be worth the trouble.

			regards, tom lane
Re: [HACKERS] PATCH: numeric timestamp in log_line_prefix
Hi,

On 08/22/2015 08:39 AM, Fabien COELHO wrote:
> Hello Tomas,
>
>> from time to time I need to correlate PostgreSQL logs to other logs,
>> containing numeric timestamps - a prime example of that is pgbench.
>> With %t and %m that's not quite trivial, because of timezones etc.
>>
>> I propose adding two new log_line_prefix escape sequences - %T and
>> %M, doing the same thing as %t and %m, but formatting the value as a
>> number.
>>
>> Patch attached, I'll add it to CF 2015-06.
>
> Are you planing to update this patch? I was wanting to use it for some
> tests and figured out that it had stayed as a proposal, too bad.
>
> I would suggest to maybe follow Tom's %u idea and fix the
> implementation details wrt to comments received?

Yes, I plan to update it according to those comments.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
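The proposed %T/%M escapes boil down to printing the epoch value as a plain number, which is what makes correlation with pgbench's numeric timestamps trivial. A minimal sketch of the formatting step -- names and buffer handling are illustrative, not the patch's actual code:

```c
#include <stdio.h>
#include <sys/time.h>

/*
 * Hypothetical helper in the spirit of the proposed escapes:
 * with_millis = 0 mimics %T (epoch seconds, like %t but numeric),
 * with_millis = 1 mimics %M (epoch with milliseconds, like %m).
 */
static void
format_numeric_timestamp(const struct timeval *tv, char *buf, size_t buflen,
						 int with_millis)
{
	if (with_millis)
		snprintf(buf, buflen, "%ld.%03d",
				 (long) tv->tv_sec, (int) (tv->tv_usec / 1000));
	else
		snprintf(buf, buflen, "%ld", (long) tv->tv_sec);
}
```

No timezone conversion is involved, which is exactly the property the proposal is after.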
Re: [HACKERS] New functions
On 07/08/2015 01:15 PM, Дмитрий Воронин wrote:
> 07.07.2015, 18:34, "Michael Paquier":
>> Speaking of which, I have rewritten the patch as attached. This looks
>> way cleaner than the previous version submitted. Dmitry, does that
>> look fine for you? I am switching this patch as "Waiting on Author".
>
> Michael, hello. I'm looking your variant of patch. You create function
> ssl_extensions_info(), that gives information of SSL extensions in
> more informative view. So, it's cool.

Should check the return value of every OpenSSL call for errors. At
least BIO_new() could return NULL, but check all the docs of the others
too.

Are all the functions used in the patch available in all the versions
of OpenSSL we support?

Other than those little things, looks good to me.

- Heikki
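The return-value checking Heikki asks for follows the usual clean-up-on-failure pattern. A self-contained sketch with hypothetical allocators standing in for OpenSSL calls such as BIO_new(); in the backend each NULL check would raise an ereport(ERROR, ...) rather than return an error code:

```c
#include <stdlib.h>

/*
 * alloc_a/alloc_b are hypothetical stand-ins for OpenSSL constructors
 * like BIO_new(BIO_s_mem()), which return NULL on failure.  The 'fail'
 * flag just lets us exercise the error path deterministically.
 */
static void *alloc_a(void) { return malloc(16); }
static void *alloc_b(int fail) { return fail ? NULL : malloc(16); }

/* Returns 0 on success, -1 on failure; never leaks 'a' on the error path. */
static int
do_work(int fail_second)
{
	void	   *a = NULL;
	void	   *b = NULL;
	int			ret = -1;

	a = alloc_a();
	if (a == NULL)
		goto cleanup;

	b = alloc_b(fail_second);
	if (b == NULL)
		goto cleanup;			/* this is the check BIO_new() needs */

	/* ... use a and b ... */
	ret = 0;

cleanup:
	free(b);					/* free(NULL) is a no-op */
	free(a);
	return ret;
}
```

The point is simply that every constructor's result is tested before use, and earlier allocations are released on any failure.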
Re: [HACKERS] Reducing ClogControlLock contention
Hi,

Amit pinged me about my opinion of this patch. I don't really have
time/energy for an in-depth review right now, but since I'd looked
enough to see some troublesome points, I thought I'd write those down.

On 2015-06-30 08:02:25 +0100, Simon Riggs wrote:
> Proposal for improving this is to acquire the ClogControlLock in Shared
> mode, if possible.

I find that rather scary.  That requires all read and write paths in
clog.c and slru.c to only ever read memory locations once.  Otherwise
those reads may not get the same results.  That'd mean we'd need to add
indirections via volatile to all reads/writes.  It also requires that
we never read in 4 byte quantities.

> This is safe because people checking visibility of an xid must always run
> TransactionIdIsInProgress() first to avoid race conditions, which will
> always return true for the transaction we are currently committing. As a
> result, we never get concurrent access to the same bits in clog, which
> would require a barrier.

I don't think that's really sufficient.  There's a bunch of callers
that do lookups without such checks, e.g. in heapam.c.  It might be
safe due to other circumstances, but at the very least this is a
significant but subtle API revision.

> Two concurrent writers might access the same word concurrently, so we
> protect against that with a new CommitLock. We could partition that by
> pageno also, if needed.

To me it seems better to make this more integrated with slru.c.  Change
the locking so that the control lock (relevant for page mappings et al)
is different from the locks for reading/writing data.

> 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
> 	 * for any active write on the page slot to complete.  Otherwise our
> 	 * update could reach disk in that write, which will not do since we
> @@ -273,7 +290,9 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
> 	 * write-busy, since we don't care if the update reaches disk sooner than
> 	 * we think.
> 	 */
> -	slotno = SimpleLruReadPage(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
> +	if (!InRecovery)
> +		LWLockAcquire(CommitLock, LW_EXCLUSIVE);
> +	slotno = SimpleLruReadPage_ReadOnly(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
>
> 	/*
> 	 * Set the main transaction id, if any.
> @@ -312,6 +331,9 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
> 	ClogCtl->shared->page_dirty[slotno] = true;
>
> 	LWLockRelease(CLogControlLock);
> +
> +	if (!InRecovery)
> +		LWLockRelease(CommitLock);
> }

TransactionIdSetPageStatus() calls TransactionIdSetStatusBit(), which
writes an 8 byte variable (the lsn).  That's not safe.  Maybe write_ok
= XLogRecPtrIsInvalid(lsn) is trying to address that?  If so, I don't
see how.  A page is returned with only the shared lock held if it's in
VALID state before.  Even if that were changed, this'd be a mightily
subtle thing to do without a very fat comment.

> @@ -447,7 +447,8 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
>  * It is unspecified whether the lock will be shared or exclusive.
>  */
> int
> -SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
> +SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, bool write_ok,
> +						   TransactionId xid)
> {
> 	SlruShared	shared = ctl->shared;
> 	int			slotno;
> @@ -460,7 +461,9 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
> 	{
> 		if (shared->page_number[slotno] == pageno &&
> 			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
> -			shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS)
> +			shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS &&
> +			(write_ok ||
> +			 shared->page_status[slotno] != SLRU_PAGE_WRITE_IN_PROGRESS))
> 		{
> 			/* See comments for SlruRecentlyUsed macro */
> 			SlruRecentlyUsed(shared, slotno);
> @@ -472,7 +475,7 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
> 	LWLockRelease(shared->ControlLock);
> 	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
>
> -	return SimpleLruReadPage(ctl, pageno, true, xid);
> +	return SimpleLruReadPage(ctl, pageno, write_ok, xid);
> }

This function's name would definitely need to be changed, and it'll
need to be documented when/how it's safe to use write_ok = true.  Same
with slru.c's header.

A patch like this will need far more changes.  Every read and write
from a page has to be guaranteed to only be done once, otherwise you
can get 'effectively torn' values.  That is, you can't just do

static void
TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
...
	char	   *byteptr;
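The "read each location only once" requirement Andres describes is usually enforced by copying the shared value into a local through a volatile-qualified pointer. A minimal illustration of the idiom -- not the actual slru.c code, and no substitute for the locking the patch would still need:

```c
/*
 * Without the volatile-qualified access, the compiler may reload the
 * shared location at each use, so two tests of the "same" value could
 * observe different data under concurrent modification.  Forcing a
 * single load and working on the local copy avoids that.
 */
static int shared_status;		/* hypothetical stand-in for a clog page byte */

static int
read_status_once(void)
{
	/* exactly one load from shared memory; all later uses hit 'status' */
	int			status = *(volatile int *) &shared_status;

	return status;
}
```

Note this only addresses single-word reads; the 8-byte lsn write Andres flags needs more than this, since a plain 64-bit store can itself be torn on some platforms.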