Re: Fix example in partitioning documentation
On Tue, Sep 24, 2019 at 10:52:30AM +0900, Amit Langote wrote:
> As of v12, Append node is elided when there's a single subnode under
> it. An example in the partitioning documentation needs to be fixed to
> account for that change. Attached a patch.

Indeed, using the same example as the docs:

CREATE TABLE measurement (
    logdate   date not null,
    peaktemp  int,
    unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2016m07 PARTITION OF measurement (
    unitsales DEFAULT 0
) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');
SET enable_partition_pruning = on;
EXPLAIN SELECT count(*) FROM measurement WHERE logdate = DATE '2016-07-02';

I'll take care of committing that, however this will have to wait until
v12 RC1 is tagged.
--
Michael

signature.asc
Description: PGP signature
Re: Add "password_protocol" connection parameter to libpq
On Sat, Sep 21, 2019 at 11:24:30AM +0900, Michael Paquier wrote:
> And both make sense.
>
> +	 * Return true if channel binding was employed and the scram exchange
> upper('scram')?
>
> Except for this nit, it looks good to me.

For the archive's sake: this has been committed as of d6e612f.

-	 * We pretend that the connection is OK for the duration of these
-	 * queries.
+	 * We pretend that the connection is OK for the duration of
+	 * these queries.

The result had some noise diffs.  Perhaps some leftover from the
indentation run?  That's no big deal anyway.
--
Michael

signature.asc
Description: PGP signature
Re: [PATCH] Sort policies and triggers by table name in pg_dump.
On Mon, Sep 23, 2019 at 10:34:07PM +0100, Benjie Gillam wrote:
> The attached draft patch (made against `pg_dump_sort.c` on master) breaks
> ties for trigger and policy objects by using the table name, increasing
> the sort order stability.  I have compiled it and executed it against a
> number of local databases and it behaves as desired.

Could you provide a simple example of a schema (tables with some policies
and triggers), with the difference this generates for pg_dump, which shows
your point?

> I am new to PostgreSQL contribution and my C skills are rusty, so please
> let me know if I can improve the patch, or if there are areas of
> PostgreSQL that I have overlooked.

Your patch has two warnings because you are trying to map a policy info
pointer to a trigger info pointer:

pg_dump_sort.c:224:24: warning: initialization of 'TriggerInfo *' {aka
'struct _triggerInfo *'} from incompatible pointer type 'PolicyInfo *const'
{aka 'struct _policyInfo *const'} [-Wincompatible-pointer-types]
  224 |     TriggerInfo *tobj2 = *(PolicyInfo *const *) p2;
--
Michael

signature.asc
Description: PGP signature
Fix example in partitioning documentation
Hi,

As of v12, Append node is elided when there's a single subnode under it.
An example in the partitioning documentation needs to be fixed to account
for that change.  Attached a patch.

Thanks,
Amit

doc-partitioning.patch
Description: Binary data
Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node
On Mon, Sep 23, 2019 at 01:45:14PM +0200, Tomas Vondra wrote:
> On Mon, Sep 23, 2019 at 03:48:50PM +0800, Thunder wrote:
>> Is this an issue?
>> Can we fix like this?
>> Thanks!
>
> I do think it is a valid issue.  No opinion on the fix yet, though.

The report was sent on Saturday, so patience ;-)  And for some others it
was even a longer weekend.

Anyway, the problem can be reproduced if you apply the attached patch,
which introduces a failure point, and then run the following commands:

create table aa as select 1;
delete from aa;
\! touch /tmp/truncate_flag
vacuum aa;
\! rm /tmp/truncate_flag
vacuum aa; -- panic on standby

This also points out that there are other things to worry about than
interruptions.  For example, DropRelFileNodeLocalBuffers() could lead to
an ERROR, and this happens before the physical truncation is done but
after the WAL record is replayed on the standby, so any failure happening
at the truncation phase before the work is done would be a problem.
However, we are talking about failures which should not happen, and these
are elog() calls.  It would be tempting to add a critical section here,
but we could still have problems if we have a failure after the WAL record
has been flushed, which means that it would be replayed on the standby,
and the surrounding comments are clear about that.

In short, as a matter of safety, I'd like to think that what you are
suggesting is rather acceptable (aka hold interrupts before the WAL record
is written and release them after the physical truncate), so that the
truncation avoids the failures which are possible to avoid.

Do others have thoughts to share on the matter?
--
Michael

diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 3cc886f7fe..0a38a32734 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -290,6 +290,13 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 		XLogFlush(lsn);
 	}
 
+	{
+		struct stat stat_buf;
+
+		if (stat("/tmp/truncate_flag", &stat_buf) == 0)
+			elog(ERROR, "/tmp/truncate_flag is present");
+	}
+
 	/* Do the real work */
 	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
 }

signature.asc
Description: PGP signature
Re: Unwanted expression simplification in PG12b2
If the function was moved to the FROM clause, where it would be executed
as a lateral cross join instead of a target-list expression, how would
this affect the cost-based positioning of the Gather?

On 9/23/19, 8:59 AM, "Robert Haas" wrote:

    On Sun, Sep 22, 2019 at 7:47 AM Darafei "Komяpa" Praliaskouski wrote:
    > A heuristic I believe should help my case (and I can hardly imagine
    > how it can break others) is that in the presence of a Gather, all the
    > function calls that are parallel safe should be pushed into it.

    The cost of pushing data through the Sort is not necessarily
    insignificant.  Your functions are (IIUC) extremely expensive, so it's
    worth going to any length to reduce the time spent evaluating them.
    However, if someone has ||(text,text) in the tlist, that might be the
    wrong approach, because it's not saving much to compute that earlier,
    and it might make the sort a lot wider, especially if de-TOASTing is
    involved.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
RE: [bug fix??] Fishy code in tts_virtual_copyslot()
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> I temporarily changed the Assert to be "==" rather than "<=", and
> it still passed check-world, so evidently we are not testing any
> cases where the descriptors are of different lengths.  This explains
> the lack of symptoms.  It's still a bug though, so pushed.

Thank you for committing.

>> BTW, I thought that in PostgreSQL coding convention, local variables
>> should be defined at the top of blocks, but this function writes
>> "for (int natts;".
>
> Yeah, we've agreed to join the 21st century to the extent of allowing
> local for-loop variables.

That's good news.  It'll help a bit to code comfortably.

Regards
Takayuki Tsunakawa
Re: Cache lookup errors with functions manipulation object addresses
On Mon, Sep 23, 2019 at 09:15:24PM +0200, Dmitry Dolgov wrote:
> Thanks for the patch! I couldn't check it in action, since it looks like
> it doesn't apply anymore [1] (although after a quick check I'm not
> entirely sure why). Nevertheless I have a few short commentaries:

Thanks for the review.  There was a small conflict in objectaddress.h,
easy enough to solve.

> Here and in format_operator_extended the commentary says
>
> * Returns a palloc'd string.
>
> but now it's possible to return NULL, so I guess comments need to be
> adjusted, right?

Right.

> v16-0003-Eliminate-user-visible-cache-lookup-errors-for-o.patch
>
> -			appendStringInfo(&buffer, _("operator %s"),
> -							 format_operator(object->objectId));
> -			break;
> +			{
> +				char *oprname = format_operator_extended(object->objectId,
> +														 FORMAT_PROC_FORCE_NULL);
>
> Shouldn't it be FORMAT_OPERATOR_FORCE_NULL here?

Indeed, that's the case.  Please feel free to use the updated versions
attached.  These can apply on top of HEAD at 30d1379.
--
Michael

From dd4b4112eea12b4866aa2c6a33de57302b842c8a Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Tue, 24 Sep 2019 08:54:37 +0900
Subject: [PATCH v17 1/3] Add flag to format_type_extended to enforce
 NULL-ness

If the type scanned is undefined, the type format routines have two
behaviors depending on whether FORMAT_TYPE_ALLOW_INVALID is passed by
the caller:
- Generate an error.
- Return an undefined type name, "???" or "-".

The current interface is unhelpful for callers wishing to format a type
name properly while still making sure that the type is defined, as there
could be actual types matching the undefined strings.  In order to
counter that, add a new flag called FORMAT_TYPE_FORCE_NULL which returns
a NULL result instead of "???" or "-", and which does not generate an
error.  It will be used in future patches to improve the SQL interface
for object addresses.
---
 src/backend/utils/adt/format_type.c | 22 +++++++++++++++++-----
 src/include/utils/builtins.h        |  1 +
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/src/backend/utils/adt/format_type.c b/src/backend/utils/adt/format_type.c
index 6ae2a31345..076d2ca3f3 100644
--- a/src/backend/utils/adt/format_type.c
+++ b/src/backend/utils/adt/format_type.c
@@ -96,13 +96,16 @@ format_type(PG_FUNCTION_ARGS)
  * - FORMAT_TYPE_ALLOW_INVALID
  *			if the type OID is invalid or unknown, return ??? or such instead
  *			of failing
+ * - FORMAT_TYPE_FORCE_NULL
+ *			if the type OID is invalid or unknown, return NULL instead of ???
+ *			or such
  * - FORMAT_TYPE_FORCE_QUALIFY
  *			always schema-qualify type names, regardless of search_path
  *
  * Note that TYPEMOD_GIVEN is not interchangeable with "typemod == -1";
  * see the comments above for format_type().
  *
- * Returns a palloc'd string.
+ * Returns a palloc'd string, or NULL.
  */
 char *
 format_type_extended(Oid type_oid, int32 typemod, bits16 flags)
@@ -114,13 +117,20 @@ format_type_extended(Oid type_oid, int32 typemod, bits16 flags)
 	char	   *buf;
 	bool		with_typemod;
 
-	if (type_oid == InvalidOid && (flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
-		return pstrdup("-");
+	if (type_oid == InvalidOid)
+	{
+		if ((flags & FORMAT_TYPE_FORCE_NULL) != 0)
+			return NULL;
+		else if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
+			return pstrdup("-");
+	}
 
 	tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(type_oid));
 	if (!HeapTupleIsValid(tuple))
 	{
-		if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
+		if ((flags & FORMAT_TYPE_FORCE_NULL) != 0)
+			return NULL;
+		else if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
 			return pstrdup("???");
 		else
 			elog(ERROR, "cache lookup failed for type %u", type_oid);
@@ -143,7 +153,9 @@ format_type_extended(Oid type_oid, int32 typemod, bits16 flags)
 		tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(array_base_type));
 		if (!HeapTupleIsValid(tuple))
 		{
-			if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
+			if ((flags & FORMAT_TYPE_FORCE_NULL) != 0)
+				return NULL;
+			else if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
 				return pstrdup("???[]");
 			else
 				elog(ERROR, "cache lookup failed for type %u", type_oid);
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 937ddb7ef0..78071b7755 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -109,6 +109,7 @@ extern Datum numeric_float8_no_overflow(PG_FUNCTION_ARGS);
 #define FORMAT_TYPE_TYPEMOD_GIVEN	0x01	/* typemod defined by caller */
 #define FORMAT_TYPE_ALLOW_INVALID	0x02	/* allow invalid types */
 #define FORMAT_TYPE_FORCE_QUALIFY	0x04	/* force qualification of type */
+#define FORMAT_TYPE_FORCE_NULL		0x08	/* force NULL if undefined */
 
 extern char *format_type_extended(Oid type_oid, int32 typemod, bits16 flags);
 extern char *format_type_be(Oid type_oid);
-- 
2.23.0

From de4bae9cbdfb9d88c77221506f167dd4d84ebd57 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Tue, 24 Sep 2019 08:54:52 +0900
Subject: [PATCH v17 2/3] Refactor
Re: Attempt to consolidate reading of XLOG page
I spent a couple of hours on this patchset today.  I merged 0001 and
0002, and decided the result was still messier than I would have liked,
so I played with it a bit more -- see attached.  I think this is
committable, but I'm afraid it'll cause quite a few conflicts with the
rest of your series.

I had two gripes, which I feel are solved with my changes:

1. I didn't like that "dir" and "wal segment size" were part of the
"currently open segment" supporting struct.  It seemed that those two
were slightly higher-level, since they apply to every segment that's
going to be opened, not just the current one.  My first thought was to
put those as members of XLogReaderState, but that doesn't work because
the physical walsender.c code does not use xlogreader at all, even
though it is reading WAL.  Anyway, my solution was to create yet another
struct, which for everything that uses xlogreader is just part of that
state struct; and for walsender, it's just a separate one alongside
sendSeg.  All in all, this seems pretty clean.

2. Having the wal dir be #ifdef FRONTEND seemed out of place.  I know
the backend code does not use that, but eliding it is more "noisy" than
just setting it to NULL.  Also, the "Finalize the segment pointer"
thingy seemed out of place.  So my code passes the dir as an argument to
XLogReaderAllocate, and if it's NULL then we just don't allocate it.
Everybody else can use it to guide things.  This results in cleaner
code, because we don't have to handle it externally, which was causing
quite some pain to pg_waldump.

Note that the ws_dir member is a char array in the struct, not just a
pointer.  This saves trouble trying to allocate it (I mainly did it this
way because we don't have pstrdup_extended(MCXT_ALLOC_NO_OOM) ... yes,
this could be done with palloc+snprintf, but eh, that doesn't seem worth
the trouble.)

Separately from those two API-wise points, there was one bug which meant
that with your 0002+0003 the recovery tests did not pass -- a code
placement bug.  I suppose the bug disappears with later patches in your
series, which probably is why you didn't notice.  This is the fix for
that:

-	XLogRead(cur_page, state->seg.size, state->seg.tli, targetPagePtr,
-	state->seg.tli = pageTLI;
+	state->seg.ws_tli = pageTLI;
+	XLogRead(cur_page, state->segcxt.ws_segsize, state->seg.ws_tli,
+			 targetPagePtr, XLOG_BLCKSZ);
...

Also, yes, I renamed all the struct members.

If you don't have any strong dislikes for these changes, I'll push this
part and let you rebase the remains on top.

Regarding the other patches:

1. I think trying to do palloc(XLogReadError) is a bad idea ... for
example, if the read fails because of system pressure, we might return
"out of memory" during that palloc instead of the real read error.  This
particular problem you could forestall by changing to ErrorContext, but
I have the impression that it might be better to have the error struct
be stack-allocated in the caller's stack.  This forces you to limit the
message string to a maximum size (say 128 bytes, or maybe even 1000
bytes like MAX_ERRORMSG_LEN), but I don't have a problem with that.

2. Not a fan of the InvalidTimeLineID stuff offhand.  Maybe it's okay
... not convinced yet either way.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Attempt to consolidate reading of XLOG page
On 2019-Sep-23, Alvaro Herrera wrote:
> I spent a couple of hours on this patchset today.  I merged 0001 and
> 0002, and decided the result was still messier than I would have liked,
> so I played with it a bit more -- see attached.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 477709bbc2..0635a17eae 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1386,8 +1386,8 @@ XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len)
 	XLogReaderState *xlogreader;
 	char	   *errormsg;
 
-	xlogreader = XLogReaderAllocate(wal_segment_size, &read_local_xlog_page,
-									NULL);
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									&read_local_xlog_page, NULL);
 	if (!xlogreader)
 		ereport(ERROR,
 				(errcode(ERRCODE_OUT_OF_MEMORY),
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 501f46fd52..6c69eb6dd7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -885,8 +885,7 @@ static int	XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 						 int source, bool notfoundOk);
 static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
 static int	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
-						 int reqLen, XLogRecPtr targetRecPtr, char *readBuf,
-						 TimeLineID *readTLI);
+						 int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 										bool fetching_ckpt, XLogRecPtr tliRecPtr);
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
@@ -1195,7 +1194,8 @@ XLogInsertRecord(XLogRecData *rdata,
 			appendBinaryStringInfo(&recordBuf, rdata->data, rdata->len);
 
 			if (!debug_reader)
-				debug_reader = XLogReaderAllocate(wal_segment_size, NULL, NULL);
+				debug_reader = XLogReaderAllocate(wal_segment_size, NULL,
+												  NULL, NULL);
 
 			if (!debug_reader)
 			{
@@ -4296,7 +4296,7 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 				XLByteToSeg(xlogreader->latestPagePtr, segno, wal_segment_size);
 				offset = XLogSegmentOffset(xlogreader->latestPagePtr,
 										   wal_segment_size);
-				XLogFileName(fname, xlogreader->readPageTLI, segno,
+				XLogFileName(fname, xlogreader->seg.ws_tli, segno,
 							 wal_segment_size);
 				ereport(emode_for_corrupt_record(emode,
 												 RecPtr ? RecPtr : EndRecPtr),
@@ -6353,7 +6353,8 @@ StartupXLOG(void)
 
 	/* Set up XLOG reader facility */
 	MemSet(&private, 0, sizeof(XLogPageReadPrivate));
-	xlogreader = XLogReaderAllocate(wal_segment_size, &XLogPageRead, &private);
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									&XLogPageRead, &private);
 	if (!xlogreader)
 		ereport(ERROR,
 				(errcode(ERRCODE_OUT_OF_MEMORY),
@@ -7355,7 +7356,7 @@ StartupXLOG(void)
 	 * and we were reading the old WAL from a segment belonging to a higher
 	 * timeline.
 	 */
-	EndOfLogTLI = xlogreader->readPageTLI;
+	EndOfLogTLI = xlogreader->seg.ws_tli;
 
 	/*
 	 * Complain if we did not roll forward far enough to render the backup
@@ -11523,7 +11524,7 @@ CancelBackup(void)
  */
 static int
 XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
-			 XLogRecPtr targetRecPtr, char *readBuf, TimeLineID *readTLI)
+			 XLogRecPtr targetRecPtr, char *readBuf)
 {
 	XLogPageReadPrivate *private =
 		(XLogPageReadPrivate *) xlogreader->private_data;
@@ -11640,7 +11641,7 @@ retry:
 	Assert(targetPageOff == readOff);
 	Assert(reqLen <= readLen);
 
-	*readTLI = curFileTLI;
+	xlogreader->seg.ws_tli = curFileTLI;
 
 	/*
 	 * Check the page header immediately, so that we can retry immediately if
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index a66e3324b1..27c27303d6 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -68,8 +68,8 @@ report_invalid_record(XLogReaderState *state, const char *fmt,...)
  * Returns NULL if the xlogreader couldn't be allocated.
  */
 XLogReaderState *
-XLogReaderAllocate(int wal_segment_size, XLogPageReadCB pagereadfunc,
-				   void *private_data)
+XLogReaderAllocate(int wal_segment_size, const char *waldir,
+				   XLogPageReadCB pagereadfunc, void *private_data)
 {
 	XLogReaderState *state;
 
@@ -96,7 +96,10 @@ XLogReaderAllocate(int wal_segment_size, XLogPageReadCB pagereadfunc,
 		return NULL;
 	}
 
-	state->wal_segment_size = wal_segment_size;
+	/* Initialize segment info. */
+	WALOpenSegmentInit(&state->seg, &state->segcxt, wal_segment_size,
+					   waldir);
+
 	state->read_page = pagereadfunc;
 	/* system_identifier initialized to zeroes above */
 	state->private_data = private_data;
@@ -198,6 +201,23 @@ allocate_recordbuf(XLogReaderState *state, uint32 reclength)
 	return true;
 }
 
+/*
+ * Initialize the passed segment structs.
+ */
Re: scorpionfly needs more semaphores
On Mon Sep 23, 2019 at 8:34 PM Mikael Kjellström wrote:
> On 2019-09-23 00:29, Thomas Munro wrote:
>> On Wed, Sep 18, 2019 at 4:55 PM jungle boogie wrote:
>>> $ sysctl | ag kern.seminfo.semmni
>>> kern.seminfo.semmni=100
>>
>> It still seems to be happening.  Perhaps you need to increase semmns too?
>
> I have on my OpenBSD 6.5 /etc/sysctl.conf the following:
>
> kern.seminfo.semmni=200
> kern.seminfo.semmns=4000
>
> and it seems to work fine.
>
> /Mikael

Thanks! I've made these adjustments and we'll see what happens.

Thomas, no offense taken with your input from last week.  Your additional
input this week is welcomed and I'll pass it along to some folks.

Thank you all for continuing to monitor PostgreSQL building on OpenBSD.
Re: Efficient output for integer types
On Mon, Sep 23, 2019 at 01:16:36PM +0100, Andrew Gierth wrote:
>>>>> "David" == David Fetter writes:
>
>  David> + return pg_ltostr_zeropad(str, (uint32)0 - (uint32)value,
>  David> + minwidth - 1);
>
> No, this is just reintroducing the undefined behavior again.  Once the
> value has been converted to unsigned you can't cast it back to signed or
> pass it to a function expecting a signed value, since it will overflow
> in the INT_MIN case (and in this example would probably output '-'
> signs until it ran off the end of memory).
>
> Here's how I would do it:
>
> char *
> pg_ltostr_zeropad(char *str, int32 value, int32 minwidth)
> {
> 	int32		len;
> 	uint32		uvalue = value;
>
> 	Assert(minwidth > 0);
>
> 	if (value >= 0)
> 	{
> 		if (value < 100 && minwidth == 2)	/* Short cut for common case */
> 		{
> 			memcpy(str, DIGIT_TABLE + value * 2, 2);
> 			return str + 2;
> 		}
> 	}
> 	else
> 	{
> 		*str++ = '-';
> 		minwidth -= 1;
> 		uvalue = (uint32) 0 - uvalue;
> 	}
>
> 	len = pg_ultoa_n(uvalue, str);
> 	if (len >= minwidth)
> 		return str + len;
>
> 	memmove(str + minwidth - len, str, len);
> 	memset(str, '0', minwidth - len);
> 	return str + minwidth;
> }

Done pretty much that way.

> David> pg_ltostr(char *str, int32 value)
> David> + int32 len = pg_ultoa_n(value, str);
> David> + return str + len;
>
> This seems to have lost its handling of negative numbers entirely

Given the comment that precedes it and all the use cases in the code, I
changed the signature to take an unsigned integer instead.  It's pretty
clear that the intent was to add digits, and only digits, to the
passed-in string.

> (which doesn't say much for the regression test coverage)

I didn't see any obvious way to test functions not surfaced to SQL.
Should we have one?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

From a0eecd74d1dd3f6a4bb1a07f41740c4f6d34ceac Mon Sep 17 00:00:00 2001
From: David Fetter
Date: Sun, 15 Sep 2019 00:06:29 -0700
Subject: [PATCH v12] Make int4 and int8 operations more efficient
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.21.0"

This is a multi-part message in MIME format.
--2.21.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit

- Output routines now do more digits per iteration, and
- Code determines the number of decimal digits in int4/int8 efficiently
- Split off pg_ltoa_n from pg_ltoa
- Use same to make other functions shorter

diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c
index 651ade14dd..5c5b6d33b2 100644
--- a/src/backend/access/common/printsimple.c
+++ b/src/backend/access/common/printsimple.c
@@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 			case INT8OID:
 				{
 					int64		num = DatumGetInt64(value);
-					char		str[23];	/* sign, 21 digits and '\0' */
+					char		str[MAXINT8LEN + 1];
 
 					pg_lltoa(num, str);
 					pq_sendcountedtext(&buf, str, strlen(str), false);
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 0ff9394a2f..6230807906 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -27,8 +27,6 @@
 #include "utils/builtins.h"
 
 
-#define MAXINT8LEN		25
-
 typedef struct
 {
 	int64		current;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 70138feb29..5dfbe8dfcd 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -20,6 +20,68 @@
 
 #include "common/int.h"
 #include "utils/builtins.h"
+#include "port/pg_bitutils.h"
+
+/*
+ * A table of all two-digit numbers.  This is used to speed up decimal digit
+ * generation by copying pairs of digits into the final output.
+ */
+static const char DIGIT_TABLE[200] =
+"00" "01" "02" "03" "04" "05" "06" "07" "08" "09"
+"10" "11" "12" "13" "14" "15" "16" "17" "18" "19"
+"20" "21" "22" "23" "24" "25" "26" "27" "28" "29"
+"30" "31" "32" "33" "34" "35" "36" "37" "38" "39"
+"40" "41" "42" "43" "44" "45" "46" "47" "48" "49"
+"50" "51" "52" "53" "54" "55" "56" "57" "58" "59"
+"60" "61" "62" "63" "64" "65" "66" "67" "68" "69"
+"70" "71" "72" "73" "74" "75" "76" "77" "78" "79"
+"80" "81" "82" "83" "84" "85" "86" "87" "88" "89"
+"90" "91" "92" "93" "94" "95" "96" "97" "98" "99";
+
+/*
+ * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10
+ */
+static inline int
+decimalLength32(const uint32 v)
+{
+	int			t;
+	static const uint32 PowersOfTen[] =
+	{1, 10, 100,
+	 1000, 10000, 100000,
+	 1000000, 10000000, 100000000,
+
[PATCH] Sort policies and triggers by table name in pg_dump.
Hello all,

Currently pg_dump sorts most dumpable objects by priority, namespace,
name and then object ID.  Since triggers and RLS policies belong to
tables, there may be more than one with the same name within the same
namespace, leading to potential sorting discrepancies between databases
that only differ by object IDs.

The attached draft patch (made against `pg_dump_sort.c` on master) breaks
ties for trigger and policy objects by using the table name, increasing
the sort order stability.  I have compiled it and executed it against a
number of local databases and it behaves as desired.

I am new to PostgreSQL contribution and my C skills are rusty, so please
let me know if I can improve the patch, or if there are areas of
PostgreSQL that I have overlooked.

Kind regards,
Benjie Gillam

From 55b3845b746498a1544bc7ced16a369f4da3905f Mon Sep 17 00:00:00 2001
From: Benjie Gillam
Date: Mon, 23 Sep 2019 21:18:24 +0100
Subject: [PATCH] Sort policies and triggers by table name in pg_dump.
To: pgsql-hack...@postgresql.org

---
 src/bin/pg_dump/pg_dump_sort.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 31fc06a255..5ff7359dff 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -207,6 +207,28 @@ DOTypeNameCompare(const void *p1, const void *p2)
 		if (cmpval != 0)
 			return cmpval;
 	}
+	else if (obj1->objType == DO_POLICY)
+	{
+		PolicyInfo *pobj1 = *(PolicyInfo *const *) p1;
+		PolicyInfo *pobj2 = *(PolicyInfo *const *) p2;
+
+		/* Sort by table name */
+		cmpval = strcmp(pobj1->poltable->dobj.name,
+						pobj2->poltable->dobj.name);
+		if (cmpval != 0)
+			return cmpval;
+	}
+	else if (obj1->objType == DO_TRIGGER)
+	{
+		TriggerInfo *tobj1 = *(PolicyInfo *const *) p1;
+		TriggerInfo *tobj2 = *(PolicyInfo *const *) p2;
+
+		/* Sort by table name */
+		cmpval = strcmp(tobj1->tgtable->dobj.name,
+						tobj2->tgtable->dobj.name);
+		if (cmpval != 0)
+			return cmpval;
+	}
 
 	/* Usually shouldn't get here, but if we do, sort by OID */
 	return oidcmp(obj1->catId.oid, obj2->catId.oid);
-- 
2.17.1
Re: Efficient output for integer types
On Mon, Sep 23, 2019 at 10:28:09AM +0200, Tels wrote:
> Moin,
>
> On 2019-09-22 23:58, David Fetter wrote:
>> On Sat, Sep 21, 2019 at 07:29:25AM +0100, Andrew Gierth wrote:
>>> "David" == David Fetter writes:
>> Fixed.
>
> Good work, more performance is sure nice :)
>
> Noticed one more thing in the patch:
>
>> -		*start++ = *a;
>> -		*a-- = swap;
>> +		memcpy(pos - 2, DIGIT_TABLE + c, 2);
>> +		i += 2;
>>  	}
>> +	else
>> +		*a = (char) ('0' + value2);
>> +
>> +	return olength;
>> }
>
> The line "i += 2;" modifies i, but i is never used again nor returned.

I found a similar one in a similar function, and removed it, too.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: subscriptionCheck failures on nightjar
On 9/20/19 6:17 PM, Tom Lane wrote:
> Alvaro Herrera writes:
>> Uh .. I didn't think it was possible that we would build the same
>> snapshot file more than once.  Isn't that a waste of time anyway?  Maybe
>> we can fix the symptom by just not doing that in the first place?
>> I don't have a strategy to do that, but seems worth considering before
>> retiring the bf animals.
>
> The comment near the head of SnapBuildSerialize says
>
>  * there is an obvious race condition here between the time we stat(2)
>  * the file and us writing the file.  But we rename the file into place
>  * atomically and all files created need to contain the same data anyway,
>  * so this is perfectly fine, although a bit of a resource waste.  Locking
>  * seems like pointless complication.
>
> which seems like a reasonable argument.  Also, this is hardly the only
> place where we expect rename(2) to be atomic.  So I tend to agree with
> Andres that we should consider OSes with such a bug to be unsupported.
>
> Dromedary is running the last release of macOS that supports 32-bit
> hardware, so if we decide to kick that to the curb, I'd either shut
> down the box or put some newer Linux or BSD variant on it.

Well, nightjar is on FreeBSD 9.0, which is oldish.  I can replace it
before long with an 11-stable instance if that's appropriate.

cheers

andrew

-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Psql patch to show access methods info
On Wed, Sep 18, 2019 at 5:04 PM Alvaro Herrera wrote:
> On 2019-Sep-18, Alexander Korotkov wrote:
>> On Tue, Sep 17, 2019 at 9:01 PM Alvaro Herrera wrote:
>>> I think \dAf is just as critical as \dAo; the former lets you know
>>> which opfamilies you can use in CREATE INDEX, while the latter lets you
>>> know which operators would be helped by such an index.  (But, really,
>>> only if the opfamily name is printed in \d of the index, which we
>>> currently don't print unless it's non-default ... which is an omission
>>> that perhaps we should consider fixing).
>>
>> I think you have a point.  Will add \dAf command to the patch.
>
> Great, thanks.

Revised patch is attached.
1) It adds a \dAf[+] command showing opfamilies which belong to a given
   AM and have opclasses for a given datatype.
2) It restores the warning when running \dA[+] with 2 or more arguments.

Two questions are open for me:
1) Currently we allow filtering opfamilies by type, but the supported
   types aren't displayed.  Should we display the datatypes?  Should we
   aggregate them into a comma-separated list?
2) Given that we can now display the list of opfamilies, it would be
   reasonable to be able to see the list of opclasses belonging to a
   particular opfamily.  But currently \dAc doesn't have a filter by
   opfamily.  Should we implement this as a separate command?

I'll be very glad for feedback.

-- 
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

0001-Add-psql-AM-info-commands-v10.patch
Description: Binary data
Re: JSONPATH documentation
Hi!

On Mon, Sep 23, 2019 at 10:10 PM Steven Pousty wrote:
> Thanks for the education on the path spec.  Too bad it is in a zip doc --
> do you know of a place where it is publicly available so we can link to
> it?  Perhaps there is some document or page you think would be a good
> reference read for people who want to understand more?
> https://standards.iso.org/ittf/PubliclyAvailableStandards/c067367_ISO_IEC_TR_19075-6_2017.zip

Yes, this link looks good to me.  It's a technical report, not the
standard itself, so it may have some little divergences.  But it seems to
be the best free resource available, assuming the standard itself isn't
free.

> I am uncertain why JSONPath is considered part of the datatype any more
> than string functions are considered part of the character datatype:
> https://www.postgresql.org/docs/11/functions-string.html

Let me clarify my thoughts.  The SQL-level functions jsonb_path_*()
(table 9.49) are clearly not part of the jsonpath datatype.  But jsonpath
accessors (table 8.25), functions (table 9.44) and operators (table 9.45)
are used inside a jsonpath value.  So, technically, they are parts of the
jsonpath datatype.

P.S.  We don't use top-posting on these mailing lists.  Please use bottom
posting.  See https://en.wikipedia.org/wiki/Posting_style#Top-posting for
details.

-- 
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Proposal: Better query optimization for "NOT IN" clause
Hello, Just for information, there is some ongoing work on including this in core that may interest you ;o) See "NOT IN subquery optimization" https://www.postgresql.org/message-id/flat/1550706289606-0.post%40n3.nabble.com commitfest entry: https://commitfest.postgresql.org/24/2023/ and "Converting NOT IN to anti-joins during planning" https://www.postgresql.org/message-id/flat/CAKJS1f82pqjqe3WT9_xREmXyG20aOkHc-XqkKZG_yMA7JVJ3Tw%40mail.gmail.com commitfest entry: https://commitfest.postgresql.org/24/2020/ Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: Cache lookup errors with functions manipulation object addresses
> On Tue, Jul 2, 2019 at 9:28 AM Michael Paquier wrote: > > On Thu, Feb 21, 2019 at 04:40:13PM +0900, Michael Paquier wrote: > > Rebased version fixing some conflicts with HEAD. > > And rebased version for this stuff on HEAD (66c5bd3), giving visibly > v16. Thanks for the patch! I couldn't check it in action, since it looks like it doesn't apply anymore [1] (although after a quick check I'm not entirely sure why). Nevertheless I have a few short comments: v16-0001-Add-flag-to-format_type_extended-to-enforce-NULL.patch - if (type_oid == InvalidOid && (flags & FORMAT_TYPE_ALLOW_INVALID) != 0) - return pstrdup("-"); + if (type_oid == InvalidOid) + { + if ((flags & FORMAT_TYPE_FORCE_NULL) != 0) + return NULL; + else if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0) + return pstrdup("-"); + } Here and in format_operator_extended the comment says * Returns a palloc'd string. but now it's possible to return NULL, so I guess the comments need to be adjusted, right? v16-0003-Eliminate-user-visible-cache-lookup-errors-for-o.patch - appendStringInfo(, _("operator %s"), - format_operator(object->objectId)); - break; + { + char *oprname = format_operator_extended(object->objectId, + FORMAT_PROC_FORCE_NULL); Shouldn't it be FORMAT_OPERATOR_FORCE_NULL here? I'll continue reviewing the last patch in more detail. [1]: http://cfbot.cputube.org/patch_24_1947.log
Re: JSONPATH documentation
Hey there: Thanks for the education on the path spec. Too bad it is in a zip doc - do you know of a place where it is publicly available so we can link to it? Perhaps there is some document or page you think would be a good reference read for people who want to understand more? https://standards.iso.org/ittf/PubliclyAvailableStandards/c067367_ISO_IEC_TR_19075-6_2017.zip I am uncertain why JSONPath is considered part of the datatype any more so than string functions are considered part of the character datatype https://www.postgresql.org/docs/11/functions-string.html On Mon, Sep 23, 2019 at 11:07 AM Alexander Korotkov < a.korot...@postgrespro.ru> wrote: > On Mon, Sep 23, 2019 at 7:52 PM Steven Pousty > wrote: > > JSON Containment, JSONPath, and Transforms are means to work with JSONB > but not the actual datatype itself. Doc should be split into > > 1) Data type - how do declare, indexing, considerations when using it... > > 2) Ways to work with the data type - functions, containment, JSONPath... > > > > These can be separate pages or on the same page but they need to be > logically and physically separated > > According to your proposal, where jsonpath functions, operators and > accessors should be described in? On the one hand jsonpath functions > etc. are part of jsonpath datatype. On the other hand it's functions > we apply to jsonb documents. > > > There should also be a link to some of the original JSONPath spec > > https://goessner.net/articles/JsonPath/ > > We implement JSONPath according to SQL Standard 2016. Your link > provides earlier attempt to implement jsonpath. It's similar, but > some examples don't follow standard (and don't work in our > implementation). For instance '$.store.book[(@.length-1)].title' > should be written as '$.store.book[last - 1] .title'. > > > Thank you so much for putting so much work into the documentation! > Please let me know if there are others way you would like to me help with > the doc. > > Thank you! 
My main point is that we should put description of > jsonpath into single place. But we need to reach consensus on what > this place should be. > > -- > Alexander Korotkov > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company >
Re: scorpionfly needs more semaphores
On 2019-09-23 00:29, Thomas Munro wrote: On Wed, Sep 18, 2019 at 4:55 PM jungle boogie wrote:

$ sysctl | ag kern.seminfo.semmni
kern.seminfo.semmni=100

It still seems to be happening. Perhaps you need to increase semmns too? I have the following in /etc/sysctl.conf on my OpenBSD 6.5:

kern.seminfo.semmni=200
kern.seminfo.semmns=4000

and it seems to work fine. /Mikael
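The sizing behind such settings can be sketched quickly. The following back-of-the-envelope helper follows the rule in PostgreSQL's "Managing Kernel Resources" documentation (roughly one 17-slot semaphore set per 16 server processes); the exact process terms included in the sum vary by server version, so treat the defaults below as illustrative assumptions, not authoritative values:

```python
import math

# Rough lower bounds for kern.seminfo.semmni / semmns, based on the
# sizing rule in PostgreSQL's kernel-resources docs: one 17-slot
# semaphore set per 16 allowed server processes.  The worker-process
# terms below are assumptions; check your version's documentation.
def semaphore_needs(max_connections, autovacuum_max_workers=3,
                    max_wal_senders=10, max_worker_processes=8):
    procs = (max_connections + autovacuum_max_workers
             + max_wal_senders + max_worker_processes + 5)
    semmni = math.ceil(procs / 16)   # number of semaphore sets needed
    semmns = semmni * 17             # total semaphores needed
    return semmni, semmns

print(semaphore_needs(max_connections=100))  # (8, 136) with these defaults
```

With numbers like those, semmni=200 / semmns=4000 leaves ample headroom even for several clusters on one host.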
Re: JSONPATH documentation
On Mon, Sep 23, 2019 at 7:52 PM Steven Pousty wrote: > JSON Containment, JSONPath, and Transforms are means to work with JSONB but > not the actual datatype itself. Doc should be split into > 1) Data type - how to declare, indexing, considerations when using it... > 2) Ways to work with the data type - functions, containment, JSONPath... > > These can be separate pages or on the same page but they need to be logically > and physically separated According to your proposal, where should jsonpath functions, operators and accessors be described? On the one hand, jsonpath functions etc. are part of the jsonpath datatype. On the other hand, they are functions we apply to jsonb documents. > There should also be a link to some of the original JSONPath spec > https://goessner.net/articles/JsonPath/ We implement JSONPath according to the SQL:2016 standard. Your link points to an earlier attempt to implement jsonpath. It's similar, but some examples don't follow the standard (and don't work in our implementation). For instance, '$.store.book[(@.length-1)].title' should be written as '$.store.book[last - 1].title'. > Thank you so much for putting so much work into the documentation! Please let > me know if there are other ways you would like me to help with the doc. Thank you! My main point is that we should put the description of jsonpath into a single place. But we need to reach consensus on what that place should be. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
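To make the `last` subscript concrete, here is a toy evaluator for a tiny jsonpath-like subset, written in plain Python purely for illustration. This is not PostgreSQL's jsonpath engine, and the step representation is invented for the sketch; it only shows the SQL-standard `last` keyword (index of the final array element), which replaces goessner-style `(@.length-1)` arithmetic:

```python
# Toy evaluator for a tiny jsonpath-like subset, illustration only.
# A "step" is a dict key, an integer array subscript, or the
# keyword 'last' (SQL/JSON path: index of the last array element).
def eval_path(doc, steps):
    cur = doc
    for step in steps:
        if step == 'last':           # SQL-standard array subscript keyword
            cur = cur[len(cur) - 1]
        else:
            cur = cur[step]          # plain key or integer subscript
    return cur

store = {"store": {"book": [{"title": "A"}, {"title": "B"}, {"title": "C"}]}}
# roughly: $.store.book[last].title
print(eval_path(store, ["store", "book", "last", "title"]))  # C
```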
Re: overhead due to casting extra parameters with aggregates (over and over)
On Mon, Sep 23, 2019 at 12:53:36PM -0400, Tom Lane wrote: Tomas Vondra writes: I've been working on a custom aggregate, and I've ran into some fairly annoying overhead due to casting direct parameters over and over. I'm wondering if there's a way to eliminate this, somehow, without having to do an explicit cast. Imagine you have a simple aggregate: CREATE AGGREGATE tdigest_percentile(double precision, int, double precision[]) ( ... ); with two direct parameters (actually, I'm not sure that's the correct term, becuse this is not an ordered-set aggregate and [1] only talks about direct parameters in that context). Anyway, I'm talking about the extra parameters, after the 'double precision' value to aggregate. But you're not telling the system that those are direct parameters, at least not if you mean that they can only legitimately have one value across the whole query. As-is, they're just more aggregated arguments so we have to evaluate them again at each row. Understood. It's fairly messy that the SQL spec ties direct arguments to ordered-set aggregates; you'd think there'd be some value in treating those features as orthogonal. I'm not sure how we could wedge them into the syntax otherwise, though :-(. You could perhaps convert your aggregate to an ordered-set aggregate, but then you'd be paying for a sort that you don't need, IIUC. Yeah, having to do the sort (and keep all the data) is exactly what the tdigest is meant to eliminate, so making it an ordered-set aggregate is exactly the thing I don't want to do. Also, it disables parallel query, which is another reason not to do that. After a while, I've realized that the issue is casting - the CTE produces numeric[] array, and we do the cast to double precision[] on every call to the state transition function (and we do ~10M of those). 
The only reason that the CTE reference is cheap is that we understand that it's stable so we don't have to recompute it each time; otherwise you'd be moaning about that more than the cast. As you say, the short term workaround is to do the casting inside the sub-select. I think the long term fix is to generically avoid re-computing stable subexpressions. There was a patch for that a year or so ago but the author never finished it, AFAIR. Hmmm, yeah. I'll dig through the archives, although it's not a very high priority - it's more a thing that surprised/bugged me while working on the custom aggregate. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: overhead due to casting extra parameters with aggregates (over and over)
Tomas Vondra writes: > I've been working on a custom aggregate, and I've ran into some fairly > annoying overhead due to casting direct parameters over and over. I'm > wondering if there's a way to eliminate this, somehow, without having to > do an explicit cast. > Imagine you have a simple aggregate: > CREATE AGGREGATE tdigest_percentile(double precision, int, double > precision[]) > ( > ... > ); > with two direct parameters (actually, I'm not sure that's the correct > term, becuse this is not an ordered-set aggregate and [1] only talks > about direct parameters in that context). Anyway, I'm talking about the > extra parameters, after the 'double precision' value to aggregate. But you're not telling the system that those are direct parameters, at least not if you mean that they can only legitimately have one value across the whole query. As-is, they're just more aggregated arguments so we have to evaluate them again at each row. It's fairly messy that the SQL spec ties direct arguments to ordered-set aggregates; you'd think there'd be some value in treating those features as orthogonal. I'm not sure how we could wedge them into the syntax otherwise, though :-(. You could perhaps convert your aggregate to an ordered-set aggregate, but then you'd be paying for a sort that you don't need, IIUC. > After a while, I've realized that the issue is casting - the CTE > produces numeric[] array, and we do the cast to double precision[] on > every call to the state transition function (and we do ~10M of those). The only reason that the CTE reference is cheap is that we understand that it's stable so we don't have to recompute it each time; otherwise you'd be moaning about that more than the cast. As you say, the short term workaround is to do the casting inside the sub-select. I think the long term fix is to generically avoid re-computing stable subexpressions. There was a patch for that a year or so ago but the author never finished it, AFAIR. regards, tom lane
Re: JSONPATH documentation
JSON Containment, JSONPath, and Transforms are means to work with JSONB but not the actual datatype itself. Doc should be split into 1) Data type - how to declare, indexing, considerations when using it... 2) Ways to work with the data type - functions, containment, JSONPath... These can be separate pages or on the same page but they need to be logically and physically separated. There should also be a link to some of the original JSONPath spec https://goessner.net/articles/JsonPath/ Thank you so much for putting so much work into the documentation! Please let me know if there are other ways you would like me to help with the doc. On Sun, Sep 22, 2019 at 4:03 PM Alexander Korotkov < a.korot...@postgrespro.ru> wrote: > On Mon, Sep 23, 2019 at 1:03 AM Tom Lane wrote: > > Alexander Korotkov writes: > > > On Sun, Sep 22, 2019 at 9:18 PM Jeff Janes > wrote: > > > Currently description of jsonpath is divided between datatypes section > > > and functions and operators section. And yes, this looks cumbersome. > > > > Agreed, but ... > > > > > I think we should move the whole description to the one section. > > > Probably we should move jsonpath description to datatypes section > > > (assuming jsonpath is a datatype) leaving functions and operators > > > section with just SQL-level functions and operators. What do you > > > think? > > > > ... I don't think that's an improvement. We don't document detailed > > behavior of a datatype's functions in datatype.sgml, and this seems > > like it would be contrary to that layout. If anything, I'd merge > > the other way, with only a very minimal description of jsonpath > > (perhaps none?) in datatype.sgml. > > > > While we're whining about this, I find it very off-putting that > > the jsonpath stuff was inserted in the JSON functions section > > ahead of the actual JSON functions. I think it should have > > gone after them, because it feels like a barely-related interjection > > as it stands. 
Maybe there's even a case that it should be > > its own , after the "functions-json" section. > > Yes, I think moving the jsonpath description to its own is a good > idea. But I still think we should have the complete jsonpath description > in a single place. What about joining the jsonpath descriptions from > both the datatypes section and the functions and operators section into this > , leaving the datatypes section with something very brief? > > -- > Alexander Korotkov > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company > > >
Re: Global temporary tables
On Mon, Sep 23, 2019 at 9:57 AM Konstantin Knizhnik < k.knizh...@postgrespro.ru> wrote: > > > On 20.09.2019 19:43, Pavel Stehule wrote: > > > 1. I do not need them at all. >> 2. Eliminate catalog bloating. >> 3. Mostly needed for compatibility with Oracle (simplify porting,...). >> 4. Parallel query execution. >> 5. Can be used at replica. >> 6. More efficient use of resources (first of all memory). >> > > There can be another point, important for the cloud. Inside some clouds there are usually > two types of discs - persistent (slow) and ephemeral (fast). We > effectively used temp tables there because we moved the temp tablespace to > ephemeral discs. > > > Yes, I already heard this argument and agree with it. > I just want to note two things: > 1. My assumption is that in most cases the data of a temporary table can fit in > memory (certainly if we are not limiting them by temp_buffers = 8MB, but > store them in shared buffers), and so there is no need to write them to > persistent media at all. > 2. Global temp tables do not replace local temp tables, accessed > through local buffers. So if you want to use temporary storage, you will > always have a way to do it. > The question is whether we need to support two kinds of global temp tables > (with shared or private buffers) or just implement one of them. > That's valid only for OLTP; the OLAP world is totally different. Moreover, if all users use temporary tables - and you should account for that; it is one reason for global temp tables - then you need to multiply the size by max_connections. It's hard to say what is best from an implementation perspective, but it would be unfortunate if global temporary tables had different performance characteristics and configuration than local temporary tables. > > > I am missing one point in your list - developer's comfort - using temp tables > is just much more comfortable - you don't have to create them again and again. > Due to this behaviour it is possible to reduce @2, and @3 can be a nice side effect. 
If > you reduce @2 to zero, then @5 should be possible without any other. > > Sorry, I do not completely understand your point here. > You can use a normal (permanent) table and you will not have to create it > again and again. It is also possible to use it for storing temporary data > - you just need to truncate the table when the data is not needed any more. > Certainly you can not use the same table in more than one backend. Here is > the main advantage of temp tables - you can have storage for per-session > data and not worry about possible name conflicts. > You use temporary tables because you know the data is never shared between sessions. I don't remember any situation in which I designed temp tables with a different schema for different sessions. Using a global temp table is not effective - you are working with large tables, you need to use DELETE, etc. - so you cannot use a classic table as effectively as temp tables. > On the other hand, there are many cases where the format of temporary data > is not statically known: it is determined dynamically during program > execution. > In this case a local temp table provides the most convenient mechanism for > working with such data. > > This is why I think that we need to have both local and global temp > tables. > > Also I do not agree with your statement "If you reduce @2 to zero, then @5 > should be possible without any other". > In the solution implemented by Aleksander Alekseev the metadata of temporary > tables is kept in memory and does not affect the catalog at all. > But they still can not be used on a replica. > There are still some serious problems which need to be fixed to enable that: > allow insert/update/delete statements for read-only transactions, somehow > assign XIDs for them, implement savepoints and rollback of such > transactions. > All this was done in the last version of my patch. > Yes, it doesn't depend on whether we are using shared or private buffers > for temporary tables. The same approach can be implemented for both of them. 
> The question is whether we really need temp tables on a replica and if > so, do we need full transaction support for them, including rollbacks and > subtransactions. > Temporary tables (of any type) on a replica are an interesting feature that opens some possibilities. Some queries cannot be optimized and should be divided, with intermediate results stored in temporary tables, analysed (to get correct statistics), maybe indexed, after which the calculation can continue. Now you can do this only on the master. Moreover, on hot standby the data are read-only and there is no direct impact on the master (production), so you can run heavier calculations there. And temporary tables are a commonly used technique for fixing estimation errors. I don't think subtransactions, transactions and rollbacks are necessary for these tables. On the other hand, without them it is a half-cooked feature and can look pretty strange in the Postgres environment. I am very happy with how much work you are doing in this area; I did not have the courage to start this job, but I
Re: [proposal] de-TOAST'ing using a iterator
Paul Ramsey, do you have opinions to share about this patch? I think PostGIS might benefit from it. Thread starts here: https://postgr.es/m/cal-ogks_onzpc9m9bxpcztmofwulcfkyecekiagxzwrl8kx...@mail.gmail.com -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_upgrade check fails on Solaris 10
On 2019-Sep-23, Marina Polyakova wrote: > On 2019-09-18 17:36, Alvaro Herrera wrote: > > On 2019-Sep-17, Marina Polyakova wrote: > > > > > We got an error for pg_upgrade check on the branch REL_11_STABLE > > > (commit > > > 40ad4202513c72f5c1beeb03e26dfbc8890770c0) on Solaris 10 because IIUC > > > the > > > argument to the sed command is not enclosed in quotation marks (see > > > [1]): > > > > Hmm, I'm surprised it has taken this long to detect the problem. > > Looking at the buildfarm members [1], castoroides and protosciurus - IIUC > they do not check pg_upgrade. And I was the lucky one who ran the > branch with the latest commits on our buildfarm... Argh. But I meant "how come nobody runs pg_upgrade tests on old Solaris?" > > I have pushed it to all branches that have src/bin/pg_upgrade (namely, > > 9.5 onwards), thanks. I hope this won't make the msys/mingw machines > > angry ;-) > > Thank you! I ran pg_upgrade tests for MSYS, everything is fine. > > The branch REL9_4_STABLE (commit 8a17afe84be6fefe76d0d2f4d26c5ee075e64487) > has the same issue - according to the release table [2] it is still > supported, isn't it?... Yeah, but pg_upgrade is in contrib/ in 9.4, so nowhere near as good as from 9.5 onwards; and it's going to die in a couple of months anyway, so I'm not thrilled about fixing this there. If you *need* to have this fixed in 9.4, we can do that, but do you? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_upgrade check fails on Solaris 10
On 2019-09-18 17:36, Alvaro Herrera wrote: On 2019-Sep-17, Marina Polyakova wrote: Hello, hackers! We got an error for pg_upgrade check on the branch REL_11_STABLE (commit 40ad4202513c72f5c1beeb03e26dfbc8890770c0) on Solaris 10 because IIUC the argument to the sed command is not enclosed in quotation marks (see [1]): Hmm, I'm surprised it has taken this long to detect the problem. Looking at the buildfarm members [1], castoroides and protosciurus - IIUC they do not check pg_upgrade. And I was the lucky one who ran the branch with the latest commits on our buildfarm... Attached diff.patch fixes the problem. I have pushed it to all branches that have src/bin/pg_upgrade (namely, 9.5 onwards), thanks. I hope this won't make the msys/mingw machines angry ;-) Thank you! I ran pg_upgrade tests for MSYS, everything is fine. The branch REL9_4_STABLE (commit 8a17afe84be6fefe76d0d2f4d26c5ee075e64487) has the same issue - according to the release table [2] it is still supported, isn't it?... [1] https://buildfarm.postgresql.org/cgi-bin/show_members.pl [2] https://www.postgresql.org/support/versioning/ -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
overhead due to casting extra parameters with aggregates (over and over)
Hi, I've been working on a custom aggregate, and I've run into some fairly annoying overhead due to casting direct parameters over and over. I'm wondering if there's a way to eliminate this, somehow, without having to do an explicit cast. Imagine you have a simple aggregate: CREATE AGGREGATE tdigest_percentile(double precision, int, double precision[]) ( ... ); with two direct parameters (actually, I'm not sure that's the correct term, because this is not an ordered-set aggregate and [1] only talks about direct parameters in that context). Anyway, I'm talking about the extra parameters, after the 'double precision' value to aggregate. The last parameter is an array of values in [0.0,1.0], representing percentiles (similarly to what we do in percentile_cont). It's annoying to write literal values, so let's use a CTE to generate the array: WITH perc AS (SELECT array_agg(i/100.0) AS p FROM generate_series(1,99) s(i)) SELECT tdigest_percentile(random(), 100, (SELECT p FROM perc)) FROM generate_series(1,1000); which does work, but it's running for ~180 seconds. When used with an explicit array literal, it runs in ~1.6 seconds. SELECT tdigest_percentile(random(), 100, ARRAY[0.01, ..., 0.99]) FROM generate_series(1,1000); After a while, I've realized that the issue is casting - the CTE produces a numeric[] array, and we do the cast to double precision[] on every call to the state transition function (and we do ~10M of those). The cast is fairly expensive - much more expensive than the aggregate itself. The explicit literal ends up being the right type, so the whole query is much faster. And indeed, adding the explicit cast to the CTE query WITH perc AS (SELECT array_agg((i/100.0)::double precision) AS p FROM generate_series(1,99) s(i)) SELECT tdigest_percentile(random(), 100, (SELECT p FROM perc)) FROM generate_series(1,1000); does the trick - the query is ~1.6s again. 
I wonder if there's a chance to detect and handle this without having to do the cast over and over? I'm thinking that's not quite possible, because the value is not actually guaranteed to be the same for all calls (even though it is in the example I've given). But maybe we could flag the parameter somehow, to make it more like a direct parameter (which is only evaluated once). I don't really need those extra parameters in the transition function at all; it's fine to just pass them to the final function (and there should be far fewer calls to that). regards [1] https://www.postgresql.org/docs/current/sql-createaggregate.html -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
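The effect is easy to reproduce outside the database. Below is a minimal Python sketch (illustrative only, not PostgreSQL internals): Decimal stands in for numeric[], float for double precision[], and hoisting the conversion out of the per-row loop plays the role of the explicit ::double precision[] cast in the sub-select:

```python
from decimal import Decimal

# The constant "percentiles" argument, arriving as Decimals (numeric[]).
percentiles = [Decimal(i) / 100 for i in range(1, 100)]

def agg_cast_every_row(rows):
    acc = 0.0
    for v in rows:
        p = [float(x) for x in percentiles]  # cast repeated on every row
        acc += v * p[-1]
    return acc

def agg_cast_once(rows):
    p = [float(x) for x in percentiles]      # cast hoisted out of the loop
    acc = 0.0
    for v in rows:
        acc += v * p[-1]
    return acc

rows = [0.5] * 10_000
# Identical results; the per-row variant does ~10,000x more conversions.
print(agg_cast_every_row(rows) == agg_cast_once(rows))  # True
```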
Re: Wrong sentence in the README?
"Daniel Westermann (DWE)" writes: >> in the README, top level, there is this: >> PostgreSQL has many language interfaces, many of which are listed here: >> https://www.postgresql.org/download >> I don't think the download page lists any language interfaces, or am I missing >> something? >Not directly on that page, though if you drill down into the "software >catalogue" you can find them. >Since there's already a link to that same page a bit further down in >the file, I'm inclined to just remove the quoted sentence. Maybe >add something about "and related software" to the later link. Certainly >"language interfaces" is just one small part of that. Thank you, Tom. I'll try to come up with a patch for that in the next few days. Regards Daniel
Re: Commit fest 2019-09
Hello The fourth week of this commitfest begins with these numbers:

      statusstring      │ week1 │ week2 │ week3 │ week4
 ───────────────────────┼───────┼───────┼───────┼───────
  Needs review          │   165 │   138 │   116 │   118
  Waiting on Author     │    30 │    44 │    51 │    44
  Ready for Committer   │    11 │     5 │     8 │    11
  Returned with Feedback│     1 │     4 │     5 │     5
  Moved to next CF      │     2 │     4 │     4 │     4
  Committed             │    14 │    23 │    32 │    34
  Rejected              │     1 │     1 │     1 │     1
  Withdrawn             │     4 │     9 │    11 │    11

Progress last week is disappointing. I hope we manage to get more patches closed this week. In terms of bugfixes, this is how we started the month:

         status         │ week1 │ week4
 ───────────────────────┼───────┼───────
  Needs review          │    20 │    10
  Waiting on Author     │     4 │     6
  Ready for Committer   │     2 │     1
  Returned with Feedback│     0 │     1
  Moved to next CF      │     0 │     1
  Committed             │     2 │     6
  Withdrawn             │     1 │     2

We've not made a lot of progress here either, only four bugfix patches committed in three weeks :-( In terms of patch author participation in reviewing other people's patches, the metrics are decidedly disappointing. Strongest reviewers, counting replies on threads for patches other than one's own: * Michael Paquier * Álvaro Herrera (discounting "please rebase" / "marked WoA" pings) * Amit Kapila * Tom Lane Everyone is welcome to join the party! -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [proposal] de-TOAST'ing using a iterator
Alvaro Herrera 于2019年9月17日周二 上午5:51写道: > On 2019-Sep-10, Binguo Bao wrote: > > > +/* > > + * Support for de-TOASTing toasted value iteratively. "need" is a > pointer > > + * between the beginning and end of iterator's ToastBuffer. The marco > > + * de-TOAST all bytes before "need" into iterator's ToastBuffer. > > + */ > > +#define PG_DETOAST_ITERATE(iter, need) > \ > > + do { > > \ > > + Assert((need) >= (iter)->buf->buf && (need) <= > (iter)->buf->capacity); \ > > + while (!(iter)->done && (need) >= (iter)->buf->limit) { > \ > > + detoast_iterate(iter); > \ > > + } > > \ > > + } while (0) > > /* WARNING -- unaligned pointer */ > > #define PG_DETOAST_DATUM_PACKED(datum) \ > > pg_detoast_datum_packed((struct varlena *) DatumGetPointer(datum)) > > In broad terms this patch looks pretty good to me. I only have a small > quibble with this API definition in fmgr.h -- namely that it forces us > to export the definition of all the structs (that could otherwise be > private to toast_internals.h) in order to satisfy callers of this macro. > I am wondering if it would be possible to change detoast_iterate and > PG_DETOAST_ITERATE in a way that those details remain hidden -- I am > thinking something like "if this returns NULL, then iteration has > finished"; and relieve the macro from doing the "->buf->buf" and > "->buf->limit" checks. I think changing that would require a change in > how the rest of the code is structured around this (the textpos internal > function), but seems like it would be better overall. > > (AFAICS that would enable us to expose much less about the > iterator-related structs to detoast.h -- you should be able to move the > struct defs to toast_internals.h) > > Then again, it might be just wishful thinking, but it seems worth > considering at least. 
> > -- > Álvaro Herrera https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > I've tried to hide the details of the struct in patch v11 by checking the "need" pointer inside the detoast_iterate function. I also compared the performance of the two versions:

               patch v10    patch v11
 comp. beg.    1413ms       1489ms
 comp. end     24327ms      28011ms
 uncomp. beg.  1439ms       1432ms
 uncomp. end   25019ms      29007ms

We can see that v11 is about 15% slower than v10 on suffix queries, since those involve complete de-TOASTing and the detoast_iterate() function is called frequently in v11. Personally, I prefer patch v10. Its performance is superior, although it exposes some struct details. -- Best regards, Binguo Bao From 8f6381d272816175d3e681c9d1cd8c6b5f27f44f Mon Sep 17 00:00:00 2001 From: BBG Date: Tue, 4 Jun 2019 22:56:42 +0800 Subject: [PATCH] de-TOASTing using a iterator --- src/backend/access/common/detoast.c | 110 + src/backend/access/common/toast_internals.c | 355 src/backend/utils/adt/varlena.c | 29 ++- src/include/access/detoast.h| 32 +++ src/include/access/toast_internals.h| 75 ++ src/include/fmgr.h | 8 + 6 files changed, 603 insertions(+), 6 deletions(-) diff --git a/src/backend/access/common/detoast.c b/src/backend/access/common/detoast.c index c8b49d6..f437d27 100644 --- a/src/backend/access/common/detoast.c +++ b/src/backend/access/common/detoast.c @@ -290,6 +290,116 @@ heap_tuple_untoast_attr_slice(struct varlena *attr, } /* -- + * create_detoast_iterator - + * + * It only makes sense to initialize a de-TOAST iterator for external on-disk values. 
+ * + * -- + */ +DetoastIterator +create_detoast_iterator(struct varlena *attr) +{ + struct varatt_external toast_pointer; + DetoastIterator iter; + if (VARATT_IS_EXTERNAL_ONDISK(attr)) + { + iter = (DetoastIterator) palloc0(sizeof(DetoastIteratorData)); + iter->done = false; + + /* This is an externally stored datum --- initialize fetch datum iterator */ + iter->fetch_datum_iterator = create_fetch_datum_iterator(attr); + VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr); + if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer)) + { + iter->compressed = true; + + /* prepare buffer to received decompressed data */ + iter->buf = create_toast_buffer(toast_pointer.va_rawsize, false); + + /* initialize state for pglz_decompress_iterate() */ + iter->ctrl = 0; + iter->ctrlc = INVALID_CTRLC; + } + else + { + iter->compressed = false; + + /* point the buffer directly at the raw data */ + iter->buf = iter->fetch_datum_iterator->buf; + } + return iter; + } + else if (VARATT_IS_EXTERNAL_INDIRECT(attr)) + { + /* indirect pointer --- dereference it */ + struct varatt_indirect redirect; + +
Proposal: Better query optimization for "NOT IN" clause
One-line Summary: Better query optimization for "NOT IN" clause Business Use-case: Using x NOT IN (SELECT y FROM target) on extremely large tables could be made very fast. This might be necessary in order to introduce foreign keys where old systems relied on a user to type in a reference manually. Typically I am using NOT IN to find invalid foreign keys which were not protected by old database designs. More specifically, the following query would return what I am looking for: SELECT source, docno FROM tableA WHERE (source, docno) NOT IN (SELECT b.source, b.docno FROM tableB b) The query took a few hours (tableB contains 1.8m rows and tableA 7.8m rows) User impact with the change: A search for invalid foreign keys can be reduced from hours or days to seconds. The reasoning behind this implementation is as follows: If I can return all rows from "SELECT col1, col2, ... FROM tableA ORDER BY col1, col2" in a few seconds, then I should be able to calculate missing / NOT IN (or existing / IN) values in a few seconds as well. Implementation details: I have written a general purpose plpgsql function which returns a SETOF text[]. The function takes two table names with two sets of column names and optionally two filters on the different tables. Unfortunately I had to cast values to text to make this work in a plpgsql function, but I got the results I needed. My required SQL: SELECT source, docno FROM tableA WHERE (source, docno) NOT IN (SELECT b.source, b.docno FROM tableB b) The query took a few hours (tableB contains 1.8m rows and tableA 7.8m rows) I used my function as follows: SELECT DISTINCT a[1], a[2] FROM util.notIn('tableA', 'source, docno', 'tableB', 'source, docno', null, null) a ORDER BY a[1], a[2] This function returned what I needed in 51 seconds, where the NOT IN statement ran for hours. 
Before including the plpgsql source, let me explain what it does. The function builds up two SQL statements using the given values (remember I did not specify any filter on either of the two tables, so the last two parameters are null). The resulting statements are:

Primary: SELECT DISTINCT ARRAY[source::text, docno::text] FROM tableA WHERE source IS NOT NULL AND docno IS NOT NULL ORDER BY ARRAY[source::text, docno::text]

Secondary: SELECT DISTINCT ARRAY[source::text, docno::text] FROM tableB WHERE source IS NOT NULL AND docno IS NOT NULL ORDER BY ARRAY[source::text, docno::text]

(As you can see, I stripped out NULLs, because those I can check for easily with a simple separate query.)

Now I open two cursors and fetch the first record of both statements into separate variables. Since the FOUND variable is overwritten with every FETCH, I store the states in my own variables. Ordering the results is essential to the working of the function. A loop now advances one or both of the two result sets based on whether the current rows are equal: if they are not equal and the primary row sorts before the secondary row, the primary row is returned as a result; in either case, only the cursor whose current row is the lesser of the two is advanced. You can terminate early when the primary dataset has been exhausted; alternatively, if the secondary dataset reaches its end first, you can simply add all remaining rows from the primary dataset to the result.

The best implementation for this would be to detect a NOT IN (or IN) WHERE clause during the query optimization phase and use a sequential scan as done in the function. It is important that the IN / NOT IN subselect must not depend on the current row of the main SELECT. A real long-term solution would be to extend the SQL definition language to allow for a different type of join. To optimize IN, you can rewrite your query as an INNER JOIN (providing you join to a DISTINCT dataset). NOT IN, however, is not so easy (except for a LEFT JOIN testing for NULL in the secondary table).
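The two-cursor walk just described is essentially a sorted-merge anti-join. As an illustration (not the author's plpgsql, just a compact sketch of the same idea over pre-sorted, distinct inputs):

```python
# Sorted-merge anti-join sketch of the algorithm described above: both inputs
# are distinct and sorted; emit the rows of 'primary' that have no match in
# 'secondary'. This mirrors the two-cursor plpgsql walk, just over lists.

def not_in(primary, secondary):
    result = []
    j = 0
    for row in primary:
        # advance secondary while its current row sorts before the primary row
        while j < len(secondary) and secondary[j] < row:
            j += 1
        if j == len(secondary) or secondary[j] != row:
            result.append(row)      # no match: part of the NOT IN answer
    return result

a = [("S1", 1), ("S1", 2), ("S2", 7)]
b = [("S1", 2), ("S2", 9)]
print(not_in(a, b))   # [('S1', 1), ('S2', 7)]
```

Each input is scanned exactly once, which is why the runtime tracks the cost of the two ordered scans rather than the product of the table sizes.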
What I would propose is the following:

SELECT a.colA FROM TableA [LEFT | RIGHT] MISSING JOIN TableB ON TableB.colB = TableA.colA

CREATE OR REPLACE FUNCTION util.notIn(pTableA text, pFieldA text, pTableB text, pFieldB text, pFilterA text, pFilterB text)
RETURNS SETOF text[] AS
$BODY$
DECLARE
    vFieldsA text[];
    vFieldsB text[];
    vValueA text[];
    vValueB text[];
    vRowA record;
    vRowB record;
    vSQLA text;
    vSQLB text;
    vWhereA text;
    vWhereB text;
    vSelectA text;
    vSelectB text;
    vCursorA refcursor;
    vCursorB refcursor;
    vFirst integer;
    vLast integer;
    vNdx integer;
    vMoreA boolean;
    vMoreB boolean;
BEGIN
    IF pTableA
Re: Unwanted expression simplification in PG12b2
On Sun, Sep 22, 2019 at 7:47 AM Darafei "Komяpa" Praliaskouski wrote: > A heuristic I believe should help my case (and I hardly imagine how it can > break others) is that in presence of Gather, all the function calls that are > parallel safe should be pushed into it. The cost of pushing data through the Sort is not necessarily insignificant. Your functions are (IIUC) extremely expensive, so it's worth going to any length to reduce the time spent evaluating them. However, if someone has ||(text,text) in the tlist, that might be the wrong approach, because it's not saving much to compute that earlier and it might make the sort a lot wider, especially if de-TOASTing is involved. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Efficient output for integer types
> "David" == David Fetter writes:

David> + return pg_ltostr_zeropad(str, (uint32)0 - (uint32)value, minwidth - 1);

No, this is just reintroducing the undefined behavior again. Once the value has been converted to unsigned you can't cast it back to signed or pass it to a function expecting a signed value, since it will overflow in the INT_MIN case (and in this example would probably output '-' signs until it ran off the end of memory).

Here's how I would do it:

char *
pg_ltostr_zeropad(char *str, int32 value, int32 minwidth)
{
	int32		len;
	uint32		uvalue = value;

	Assert(minwidth > 0);

	if (value >= 0)
	{
		if (value < 100 && minwidth == 2)	/* Short cut for common case */
		{
			memcpy(str, DIGIT_TABLE + value * 2, 2);
			return str + 2;
		}
	}
	else
	{
		*str++ = '-';
		minwidth -= 1;
		uvalue = (uint32) 0 - uvalue;
	}

	len = pg_ultoa_n(uvalue, str);
	if (len >= minwidth)
		return str + len;

	memmove(str + minwidth - len, str, len);
	memset(str, '0', minwidth - len);
	return str + minwidth;
}

David> pg_ltostr(char *str, int32 value)
David> +	int32 len = pg_ultoa_n(value, str);
David> +	return str + len;

This seems to have lost its handling of negative numbers entirely (which doesn't say much for the regression test coverage).

-- Andrew (irc:RhodiumToad)
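To see why the unsigned negation above is safe even for INT_MIN, here is a small Python model of the same zero-padding logic (a sketch that mimics 32-bit unsigned wraparound; not the C code itself):

```python
# Python model of the negative-value handling above: negate in unsigned
# 32-bit arithmetic, so INT_MIN (-2**31) maps to 2**31 without any signed
# overflow. Hypothetical sketch for illustration, not the PostgreSQL code.
INT32_MIN, UINT32_MOD = -2**31, 2**32

def ltostr_zeropad(value: int, minwidth: int) -> str:
    assert minwidth > 0 and INT32_MIN <= value < 2**31
    sign = ""
    if value < 0:
        sign = "-"
        minwidth -= 1                        # the '-' consumes one column
        uvalue = (0 - value) % UINT32_MOD    # like (uint32) 0 - uvalue in C
    else:
        uvalue = value
    digits = str(uvalue)
    if len(digits) < minwidth:
        digits = "0" * (minwidth - len(digits)) + digits
    return sign + digits

print(ltostr_zeropad(7, 4))          # 0007
print(ltostr_zeropad(INT32_MIN, 2))  # -2147483648
```

The key point is that the magnitude is computed and formatted entirely in the unsigned domain, so no value is ever cast back to a signed type.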
Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node
On Mon, Sep 23, 2019 at 03:48:50PM +0800, Thunder wrote:
> Is this an issue? Can we fix it like this? Thanks!

I do think it is a valid issue. No opinion on the fix yet, though. The report was sent on Saturday, so patience ;-)

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Hi guys, HELP please
On Fri, Sep 20, 2019 at 09:21:59PM +, Castillo, Steven (Agile) wrote:
> Hi, I wonder if you guys can help me with this. I've been struggling with
> this query for almost a week and I haven't been able to tune it; it runs
> forever and I need it to run fast.

Hard to say, because all we have is an explain plan without any additional information (like the amount of data, the PostgreSQL version, or settings like work_mem). Maybe look at [1], which explains what to try, and also what to include in your question.

[1] https://wiki.postgresql.org/wiki/Slow_Query_Questions

Now, if I had to guess, I'd say this is a case of an underestimate causing a choice of nested loops. That's fairly deadly. In particular, I'm talking about this:

-> Seq Scan on t_territory_common tc (cost=0.00..6494012.54 rows=49 width=232)
   Filter: (((source)::text = 'DSCHED'::text) AND ... many conditions

How many rows does this return when you query just this table (with all the conditions)? Chances are those conditions are correlated, in which case the number of rows is much higher than 49 (possibly by orders of magnitude). If that's the case, you have multiple options:

1) create a temporary table, and then join it (it can be analyzed, so the estimates are likely much better)

2) disable nested loops for this query (useful for testing/investigation)

3) create extended statistics on those correlated columns (depends on which PostgreSQL version you use)

4) redo the table schema (e.g. have a special column representing the combination of those columns), so that there's just a single condition (and thus no misestimate due to correlation)

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
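A rough back-of-envelope model (all numbers below are hypothetical, chosen only for illustration) shows why such an underestimate is so deadly for a nested loop: its cost grows with the actual outer row count, not the estimated one, while a hash join stays roughly linear in both inputs:

```python
# Toy cost model: why a row-count underestimate makes a nested loop
# catastrophic. The constants are made up; only the scaling matters.

def nested_loop_cost(outer_rows: int, inner_scan_cost: float) -> float:
    """The inner side is rescanned once per outer row."""
    return outer_rows * inner_scan_cost

def hash_join_cost(outer_rows: int, inner_rows: int, per_row: float) -> float:
    """Both sides are scanned once; build + probe are roughly linear."""
    return (outer_rows + inner_rows) * per_row

inner_scan = 50.0   # hypothetical cost of one scan of the inner relation
per_row = 0.01      # hypothetical per-row cost for hash build/probe

planned = nested_loop_cost(49, inner_scan)        # what the planner expects
reality = nested_loop_cost(49_000, inner_scan)    # what actually happens
hashed = hash_join_cost(49_000, 100_000, per_row) # the plan it skipped

print(planned, reality, hashed)
```

With a 1000x underestimate, the nested loop's real cost is 1000x the planned one, which is why fixing the estimate (options 1 and 3 above) usually matters more than tuning anything else.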
Hi guys, HELP please
Hi, I wonder if you guys can help me with this. I've been struggling with this query for almost a week and I haven't been able to tune it; it runs forever and I need it to run fast.

Regards.

Steven Castillo

Time Release Plan.sql Description: Time Release Plan.sql Time Release Query.sql Description: Time Release Query.sql
Re: Attempt to consolidate reading of XLOG page
Alvaro Herrera wrote:

> I was confused by the struct name XLogSegment -- the struct is used to
> represent a WAL segment while it's kept open, rather than just a WAL
> segment in abstract. Also, now that we've renamed everything to use the
> term WAL, it seems wrong to use the name XLog for new structs. I
> propose the name WALOpenSegment for the struct, which solves both
> problems. (Its initializer function would get the name
> WALOpenSegmentInit.)
>
> Now, the patch introduces a callback for XLogRead, the type of which is
> called XLogOpenSegment. If we rename it from XLog to WAL, both names
> end up the same. I propose to rename the function type to
> WALSegmentOpen, which in a "noun-verb" view of the world, represents the
> action of opening a WAL segment.
>
> I attach a patch for all this renaming, on top of your series.

OK, thanks. In addition I renamed WalSndOpenSegment() to WalSndSegmentOpen() and read_local_xlog_page_open_segment() to read_local_xlog_page_segment_open().

> I wonder if each of those WALSegmentOpen callbacks should reset [at
> least some members of] the struct; they're already in charge of setting
> ->file, and apparently we're leaving the responsibility of setting the
> rest of the members to XLogRead. That seems weird. Maybe we should say
> that the CB should only open the segment and not touch the struct at all
> and XLogRead is in charge of everything. Perhaps the other way around
> -- the CB should set everything correctly ... I'm not sure which is
> best. But having half here and half there seems a recipe for confusion
> and bugs.

OK, I've changed the CB signature. Now it receives pointers to the two variables that it can change, while the "seg" argument is documented as read-only. To indicate that the CB should determine the timeline itself, I introduced a new constant InvalidTimeLineID; see the 0004 part.

> Another thing I didn't like much is that everything seems to assume that
> the only error possible from XLogRead is a read error. Maybe that's
> okay, because it seems to be the current reality, but it seemed odd.

In this case I only moved the ereport() code out of XLogRead() (so that both the backend and frontend can call the function). Given that the code to open a WAL segment is now in the callbacks, the only thing that XLogRead() can ereport is that read() failed.

BTW, I introduced one more structure, XLogReadError, in this patch version. I think it's better than adding error-specific fields to the WALOpenSegment structure.

-- Antonin Houska Web: https://www.cybertec-postgresql.com

>From 674fa97ef9df8b1fe875139aa7b43f605255d8cd Mon Sep 17 00:00:00 2001
From: Antonin Houska
Date: Mon, 23 Sep 2019 07:40:49 +0200
Subject: [PATCH 2/6] Remove TLI from some argument lists.

The timeline information is available to the caller via XLogReaderState. Now that XLogRead() is (sometimes) going to be responsible for determining the TLI, it would have to get a (TimeLineID *) argument too, just to be consistent with the current coding style. Since XLogRead() also updates other position-specific fields of XLogReaderState, it seems simpler to remove the output argument from XLogPageReadCB and always report the TLI via XLogReaderState.
---
 src/backend/access/transam/xlog.c              |  7 +++
 src/backend/access/transam/xlogreader.c        |  6 +++---
 src/backend/access/transam/xlogutils.c         | 11 ++-
 src/backend/replication/logical/logicalfuncs.c |  4 ++--
 src/backend/replication/walsender.c            |  2 +-
 src/bin/pg_rewind/parsexlog.c                  |  8 +++-
 src/bin/pg_waldump/pg_waldump.c                |  2 +-
 src/include/access/xlogreader.h                |  8 +++-
 src/include/access/xlogutils.h                 |  5 ++---
 src/include/replication/logicalfuncs.h         |  2 +-
 10 files changed, 25 insertions(+), 30 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b7ff004234..7a89dfed7f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -885,8 +885,7 @@
 static int	XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 						 int source, bool notfoundOk);
 static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
 static int	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
-						 int reqLen, XLogRecPtr targetRecPtr, char *readBuf,
-						 TimeLineID *readTLI);
+						 int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 										bool fetching_ckpt, XLogRecPtr tliRecPtr);
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
@@ -11523,7 +11522,7 @@ CancelBackup(void)
  */
 static int
 XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
-			 XLogRecPtr targetRecPtr, char *readBuf, TimeLineID *readTLI)
+			 XLogRecPtr targetRecPtr, char *readBuf)
 {
 	XLogPageReadPrivate *private =
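The division of labor being discussed (the open-segment callback only produces a file handle, while the generic read routine owns positioning and bookkeeping) can be illustrated with a toy model. The names mirror WALOpenSegment/WALSegmentOpen, but this is a hypothetical sketch, not the real C API:

```python
# Toy model of the callback split discussed above: the "segment open"
# callback only opens a segment, while the generic reader owns all the
# struct bookkeeping. Hypothetical sketch, not PostgreSQL code.
import io
from dataclasses import dataclass
from typing import BinaryIO, Callable, Optional

@dataclass
class WALOpenSegment:
    segno: int = -1
    file: Optional[BinaryIO] = None

def wal_read(seg: WALOpenSegment, start: int, count: int, seg_size: int,
             open_segment: Callable[[int], BinaryIO]) -> bytes:
    out = b""
    pos = start
    while len(out) < count:
        segno, off = divmod(pos, seg_size)
        if seg.file is None or seg.segno != segno:
            if seg.file is not None:
                seg.file.close()
            seg.file = open_segment(segno)  # callback only opens the segment
            seg.segno = segno               # the reader updates the bookkeeping
        seg.file.seek(off)
        chunk = seg.file.read(min(count - len(out), seg_size - off))
        if not chunk:
            raise IOError(f"read failed in segment {segno} at offset {off}")
        out += chunk
        pos += len(chunk)
    return out

# demo: two 4-byte "segments", one read crossing the segment boundary
segments = {0: b"abcd", 1: b"efgh"}
seg = WALOpenSegment()
data = wal_read(seg, 2, 4, 4, lambda n: io.BytesIO(segments[n]))
```

Keeping all mutation of the "open segment" state in one place is exactly the point Alvaro raised: with half the fields set by the callback and half by the reader, it is easy for the two sides to drift apart.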
Re: pgbench - allow to create partitioned tables
On Mon, Sep 23, 2019 at 11:58 AM Fabien COELHO wrote:
>
> Hello Amit,
>
> > It is better for a user to write a custom script for such cases.
>
> I kind-of agree, but IMHO this is not for pgbench to decide what is better
> for the user, nor to fail on a script that would not otherwise fail.
>
> > Because after that "select-only" or "simple-update" script doesn't
> > make any sense. [...]
>
> What makes sense in a benchmarking context may not be what you think. For
> instance, AFAICR, I already removed benevolent but misplaced guards which
> were preventing running scripts without queries: if one wants to look at
> pgbench overheads because they are wary that it may be too high, they
> really need to be allowed to run such scripts.
>
> This is not for us to decide, and as I already said, they do make sense if
> you want to test no-op overheads. Moreover, the problem pre-exists: if the
> user deletes the contents of pgbench_accounts, these scripts are no-ops,
> and we do not complain. The case of no partitions being attached is just a
> particular instance.
>
> > Having said that, I see your point and don't mind allowing such cases
> > until we don't have to write special checks in the code to support such
> > cases.
>
> Indeed, it is also simpler to not care about such issues in the code.

If you agree with this, then why haven't you changed the below check in the patch?

+ if (partition_method != PART_NONE)
+ printf("partition method: %s\npartitions: %d\n",
+        PARTITION_METHOD[partition_method], partitions);

This is exactly the thing bothering me. It won't be easy for others to understand why this check for partitioning information is different from other checks. For you or me, it might be okay as we have discussed this case, but it won't be apparent to others. This doesn't buy us much, so it is better to keep this code consistent with other places that check for partitions.

> > [...] Now, we can have a detailed comment in printResults to explain why
> > we have a different check there as compared to other code paths, or change
> > other code paths to have a similar check as printResults, but I am not
> > convinced of any of those options.
>
> Yep. ISTM that the current version is reasonable.
>
> > [...] I am talking about the call to append_fillfactor in
> > createPartitions() function. See, in my version, there are some
> > comments.
>
> Ok, I understand that you want a comment. Patch v15 does that.

Thanks!

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Efficient output for integer types
Moin,

On 2019-09-22 23:58, David Fetter wrote:
> On Sat, Sep 21, 2019 at 07:29:25AM +0100, Andrew Gierth wrote:
> > > "David" == David Fetter writes:
>
> Fixed.

Good work, more performance is sure nice :)

Noticed one more thing in the patch:

-		*start++ = *a;
-		*a-- = swap;
+		memcpy(pos - 2, DIGIT_TABLE + c, 2);
+		i += 2;
 	}
+	else
+		*a = (char) ('0' + value2);
+
+	return olength;
 }

The line "i += 2;" modifies i, but i is never used again nor returned.

Best regards,

Tels
Re: Global temporary tables
On 20.09.2019 19:43, Pavel Stehule wrote:

>> 1. I do not need them at all.
>> 2. Eliminate catalog bloating.
>> 3. Mostly needed for compatibility with Oracle (simplify porting, ...).
>> 4. Parallel query execution.
>> 5. Can be used at replica.
>> 6. More efficient use of resources (first of all memory).
>
> There can be other point important for cloud. Inside some cloud usually
> there are two types of discs - persistent (slow) and ephemeral (fast). We
> effectively used temp tables there because we moved temp tablespace to
> ephemeral discs.

Yes, I have already heard this argument and agree with it. I just want to note two things:

1. My assumption is that in most cases the data of a temporary table can fit in memory (certainly if we do not limit it with temp_buffers = 8MB, but store it in shared buffers), so there is no need to write it to persistent media at all.

2. Global temp tables do not replace local temp tables, which are accessed through local buffers. So if you want to use temporary storage, you will always have a way to do it. The question is whether we need to support two kinds of global temp tables (with shared or private buffers) or just implement one of them.

> I am missing one point in your list - developer comfort - using temp
> tables is just much more comfortable: you don't need to create them again
> and again. Due to this behaviour it is possible to reduce @2, and @3 can
> be a nice side effect. If you reduce @2 to zero, then @5 should be
> possible without anything else.

Sorry, I do not completely understand your point here. You can use a normal (permanent) table, and you will not have to create it again and again. It is also possible to use such a table for storing temporary data: just truncate it when the data is not needed any more. Certainly, you cannot use the same table in more than one backend. Here is the main advantage of temp tables: you can have per-session data storage and not worry about possible name conflicts.
From the other side, there are many cases where the format of temporary data is not statically known: it is determined dynamically during program execution. In this case a local temp table provides the most convenient mechanism for working with such data. This is why I think that we need to have both local and global temp tables.

Also, I do not agree with your statement "If you reduce @2 to zero, then @5 should be possible without any other". In the solution implemented by Aleksander Alekseev, the metadata of temporary tables is kept in memory and does not affect the catalog at all. But such tables still cannot be used at a replica. There are still some serious problems which need to be fixed to enable that: allowing insert/update/delete statements in read-only transactions, somehow assigning XIDs to them, and implementing savepoints and rollback for such transactions. All this was done in the last version of my patch. And yes, it does not depend on whether we are using shared or private buffers for temporary tables; the same approach can be implemented for both of them. The question is whether we really need temp tables at a replica and, if so, whether we need full transaction support for them, including rollbacks and subtransactions.

-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re:PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node
Is this an issue? Can we fix it like this? Thanks!

At 2019-09-22 00:38:03, "Thunder" wrote:

The steps to reproduce this issue:

1. Create a table:
   create table gist_point_tbl(id int4, p point);
   create index gist_pointidx on gist_point_tbl using gist(p);

2. Insert data:
   insert into gist_point_tbl (id, p) select g, point(g*10, g*10) from generate_series(1, 100) g;

3. Delete data:
   delete from gist_point_tbl;

4. Vacuum the table:
   vacuum gist_point_tbl;
   -- Send SIGINT to the vacuum process after the WAL record of the truncation is flushed but before the truncation is finished.
   -- We will receive the error message "ERROR: canceling statement due to user request".

5. Vacuum the table again:
   vacuum gist_point_tbl;
   -- The standby node crashes with the PANIC log "PANIC: WAL contains references to invalid pages".

The standby node succeeded in replaying the truncate record, but the master node did not truncate the file, so the standby will crash if the master later writes to blocks that were truncated on the standby.

I try to fix this issue by preventing query-cancel interrupts during truncation:

diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 5df4382b7e..04b696ae01 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -26,6 +26,7 @@
 #include "access/xlogutils.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
+#include "miscadmin.h"
 #include "storage/freespace.h"
 #include "storage/smgr.h"
 #include "utils/memutils.h"
@@ -248,6 +249,14 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	if (vm)
 		visibilitymap_truncate(rel, nblocks);

+	/*
+	 * When the master node flushes the WAL record of the truncation and then
+	 * receives a SIGINT that cancels the transaction before the truncation
+	 * itself, and the standby has already replayed that WAL record and done
+	 * the truncation, the standby node will crash when the master node writes
+	 * to blocks which were truncated on the standby. So we prevent
+	 * query-cancel interrupts here.
+	 */
+	HOLD_CANCEL_INTERRUPTS();
+
 	/*
 	 * We WAL-log the truncation before actually truncating, which means
 	 * trouble if the truncation fails. If we then crash, the WAL replay
@@ -288,6 +297,8 @@ RelationTruncate(Relation rel, BlockNumber nblocks)

 	/* Do the real work */
 	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
+
+	RESUME_CANCEL_INTERRUPTS();
 }
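The fix relies on PostgreSQL's hold/resume interrupt machinery. As a rough analogue (a hypothetical sketch, not PostgreSQL code), deferring a cancel request during a critical section looks like this:

```python
# Minimal analogue of HOLD_CANCEL_INTERRUPTS()/RESUME_CANCEL_INTERRUPTS():
# a cancel request arriving inside the critical section is remembered and
# only acted upon once the section completes. Hypothetical sketch for
# illustration, not PostgreSQL code.

class CancelState:
    def __init__(self):
        self.hold_count = 0
        self.pending = False
        self.cancelled = False

    def request_cancel(self):
        if self.hold_count > 0:
            self.pending = True      # defer, like the SIGINT in the bug report
        else:
            self.cancelled = True

    def hold(self):
        self.hold_count += 1

    def resume(self):
        self.hold_count -= 1
        if self.hold_count == 0 and self.pending:
            self.pending = False
            self.cancelled = True    # the deferred cancel fires only now

state = CancelState()
state.hold()
state.request_cancel()               # arrives mid-truncation: deferred
assert not state.cancelled           # the truncation completes uninterrupted
state.resume()
assert state.cancelled               # the cancel is honored afterwards
```

The point of the patch is exactly this ordering: once the truncation record has been WAL-logged, the cancel must not be able to fire until the local truncation has actually happened.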
Re: pgbench - allow to create partitioned tables
Hello Amit,

> It is better for a user to write a custom script for such cases.

I kind-of agree, but IMHO this is not for pgbench to decide what is better for the user, nor to fail on a script that would not otherwise fail.

> Because after that "select-only" or "simple-update" script doesn't
> make any sense. [...]

What makes sense in a benchmarking context may not be what you think. For instance, AFAICR, I already removed benevolent but misplaced guards which were preventing running scripts without queries: if one wants to look at pgbench overheads because they are wary that it may be too high, they really need to be allowed to run such scripts.

This is not for us to decide, and as I already said, they do make sense if you want to test no-op overheads. Moreover, the problem pre-exists: if the user deletes the contents of pgbench_accounts, these scripts are no-ops, and we do not complain. The case of no partitions being attached is just a particular instance.

> Having said that, I see your point and don't mind allowing such cases
> until we don't have to write special checks in the code to support such
> cases.

Indeed, it is also simpler to not care about such issues in the code.

> [...] Now, we can have a detailed comment in printResults to explain why
> we have a different check there as compared to other code paths, or change
> other code paths to have a similar check as printResults, but I am not
> convinced of any of those options.

Yep. ISTM that the current version is reasonable.

> [...] I am talking about the call to append_fillfactor in
> createPartitions() function. See, in my version, there are some
> comments.

Ok, I understand that you want a comment. Patch v15 does that.
-- Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index c857aa3cba..e3a0abb4c7 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -306,6 +306,31 @@ pgbench options d
+
+      --partitions=NUM
+
+        Create a partitioned pgbench_accounts table with
+        NUM partitions of nearly equal size for
+        the scaled number of accounts.
+        Default is 0, meaning no partitioning.
+
+
+      --partition-method=NAME
+
+        Create a partitioned pgbench_accounts table with
+        NAME method.
+        Expected values are range or hash.
+        This option requires that --partitions is set to non-zero.
+        If unspecified, default is range.
+
+
       --tablespace=tablespace

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ed7652bfbf..9e45f00fb5 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -186,6 +186,25 @@
 int64		latency_limit = 0;

 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;

+/*
+ * Number of "pgbench_accounts" partitions, found or to create.
+ * When creating, 0 is the default and means no partitioning.
+ * When running, this is the actual number of partitions.
+ */
+static int	partitions = 0;
+
+/* partitioning strategy for "pgbench_accounts" */
+typedef enum
+{
+	PART_NONE,					/* no partitioning */
+	PART_RANGE,					/* range partitioning */
+	PART_HASH					/* hash partitioning */
+} partition_method_t;
+
+static partition_method_t partition_method = PART_NONE;
+static const char *PARTITION_METHOD[] = {"none", "range", "hash"};
+
 /* random seed used to initialize base_random_sequence */
 int64		random_seed = -1;
@@ -617,6 +636,9 @@ usage(void)
 		   "  --foreign-keys           create foreign key constraints between tables\n"
 		   "  --index-tablespace=TABLESPACE\n"
 		   "                           create indexes in the specified tablespace\n"
+		   "  --partitions=NUM         partition pgbench_accounts in NUM parts (default: 0)\n"
+		   "  --partition-method=(range|hash)\n"
+		   "                           partition pgbench_accounts with this method (default: range)\n"
 		   "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tables        create tables as unlogged tables\n"
 		   "\nOptions to select what to run:\n"
@@ -3601,6 +3623,87 @@ initDropTables(PGconn *con)
 					 "pgbench_tellers");
 }

+/*
+ * add fillfactor percent option.
+ */
+static void
+append_fillfactor(char *opts, int len)
+{
+	/* as the default is 100, it could be removed in this case */
+	snprintf(opts + strlen(opts), len - strlen(opts),
+			 " with (fillfactor=%d)", fillfactor);
+}
+
+/*
+ * Create "pgbench_accounts" partitions if needed.
+ *
+ * This is the larger table of the pgbench default tpc-b like schema
+ * with a known size, so that it can be partitioned by range.
+ */
+static void
+createPartitions(PGconn *con)
+{
+	char		ff[64];
+
+	ff[0] = '\0';
+
+	/*
+	 * Per ddlinfo in initCreateTables below, fillfactor is needed on
+	 * table pgbench_accounts.
+	 */
+	append_fillfactor(ff, sizeof(ff));
+
+	Assert(partitions > 0);
+
+	fprintf(stderr, "creating %d partitions...\n",