Re: Fix example in partitioning documentation

2019-09-23 Thread Michael Paquier
On Tue, Sep 24, 2019 at 10:52:30AM +0900, Amit Langote wrote:
> As of v12, Append node is elided when there's a single subnode under
> it.  An example in the partitioning documentation needs to be fixed to
> account for that change.  Attached a patch.

Indeed, using the same example as the docs:
CREATE TABLE measurement (
 logdate date not null,
 peaktemp int,
 unitsales   int
 ) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2016m07
 PARTITION OF measurement (
 unitsales DEFAULT 0
 ) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');
SET enable_partition_pruning = on;
EXPLAIN SELECT count(*) FROM measurement WHERE logdate = DATE
'2016-07-02';

I'll take care of committing that, however this will have to wait
until v12 RC1 is tagged.
--
Michael




Re: Add "password_protocol" connection parameter to libpq

2019-09-23 Thread Michael Paquier
On Sat, Sep 21, 2019 at 11:24:30AM +0900, Michael Paquier wrote:
> And both make sense.
> 
> + * Return true if channel binding was employed and the scram exchange
> upper('scram')?
> 
> Except for this nit, it looks good to me.

For the archive's sake: this has been committed as of d6e612f.

-   * We pretend that the connection is OK for the duration of these
-   * queries.
+   * We pretend that the connection is OK for the duration of
+   * these queries.
The result had some noise diffs.  Perhaps some leftover from the
indentation run?  That's no big deal anyway.
--
Michael




Re: [PATCH] Sort policies and triggers by table name in pg_dump.

2019-09-23 Thread Michael Paquier
On Mon, Sep 23, 2019 at 10:34:07PM +0100, Benjie Gillam wrote:
> The attached draft patch (made against `pg_dump_sort.c` on master) breaks
> ties for trigger and policy objects by using the table name, increasing the
> sort order stability. I have compiled it and executed it against a number of
> local databases and it behaves as desired.

Could you provide a simple example schema (tables with some policies
and triggers), together with the difference this generates in the
pg_dump output, to show your point?
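
(For illustration, a minimal schema of the sort being asked about could
look like the sketch below; all object names are hypothetical and not
taken from the thread.)

CREATE TABLE t_a (id int);
CREATE TABLE t_b (id int);

-- Two policies sharing one name, attached to different tables.
CREATE POLICY p_same ON t_a USING (true);
CREATE POLICY p_same ON t_b USING (true);

CREATE FUNCTION trg_fn() RETURNS trigger LANGUAGE plpgsql
AS $$ BEGIN RETURN NEW; END $$;

-- Two triggers sharing one name, attached to different tables; without a
-- tie-breaker on the table name, their dump order can depend on OIDs.
CREATE TRIGGER trg_same BEFORE INSERT ON t_a
    FOR EACH ROW EXECUTE FUNCTION trg_fn();
CREATE TRIGGER trg_same BEFORE INSERT ON t_b
    FOR EACH ROW EXECUTE FUNCTION trg_fn();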

> I am new to PostgreSQL contribution and my C-skills are rusty, so please let
> me know if I can improve the patch, or if there are areas of PostgreSQL that
> I have overlooked.

Your patch has two warnings because you are trying to map a policy
info pointer to a trigger info pointer:
pg_dump_sort.c:224:24: warning: initialization of ‘TriggerInfo *’ {aka
‘struct _triggerInfo *’} from incompatible pointer type ‘PolicyInfo *
const’ {aka ‘struct _policyInfo * const’}
[-Wincompatible-pointer-types]
  224 |   TriggerInfo *tobj2 = *(PolicyInfo *const *) p2;  
--
Michael




Fix example in partitioning documentation

2019-09-23 Thread Amit Langote
Hi,

As of v12, Append node is elided when there's a single subnode under
it.  An example in the partitioning documentation needs to be fixed to
account for that change.  Attached a patch.

Thanks,
Amit


doc-partitioning.patch
Description: Binary data


Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node

2019-09-23 Thread Michael Paquier
On Mon, Sep 23, 2019 at 01:45:14PM +0200, Tomas Vondra wrote:
> On Mon, Sep 23, 2019 at 03:48:50PM +0800, Thunder wrote:
>> Is this an issue?
>> Can we fix like this?
>> Thanks!
>> 
> 
> I do think it is a valid issue. No opinion on the fix yet, though.
> The report was sent on saturday, so patience ;-)

And for some others it was even a longer weekend.  Anyway, the problem
can be reproduced if you apply the attached which introduces a failure
point, and then if you run the following commands:
create table aa as select 1;
delete from aa;
\! touch /tmp/truncate_flag
vacuum aa;
\! rm /tmp/truncate_flag
vacuum aa; -- panic on standby

This also points out that there are other things to worry about than
interruptions, as for example DropRelFileNodeLocalBuffers() could lead
to an ERROR, and this happens before the physical truncation is done
but after the WAL record is replayed on the standby, so any failures
happening at the truncation phase before the work is done would be a
problem.  However we are talking about failures which should not
happen and these are elog() calls.  It would be tempting to add a
critical section here, but we could still have problems if we have a
failure after the WAL record has been flushed, which means that it
would be replayed on the standby, and the surrounding comments are
clear about that.  In short, as a matter of safety I'd like to think
that what you are suggesting is rather acceptable (aka hold interrupts
before the WAL record is written and release them after the physical
truncate), so that the truncation avoids the failures that can be avoided.

Do others have thoughts to share on the matter?
--
Michael
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 3cc886f7fe..0a38a32734 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -290,6 +290,13 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 			XLogFlush(lsn);
 	}
 
+	{
+		struct stat stat_buf;
+
+		if (stat("/tmp/truncate_flag", &stat_buf) == 0)
+			elog(ERROR, "/tmp/truncate_flag is present");
+	}
+
 	/* Do the real work */
 	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
 }




Re: Unwanted expression simplification in PG12b2

2019-09-23 Thread Finnerty, Jim
If the function was moved to the FROM clause where it would be executed as a 
lateral cross join instead of a target list expression, how would this affect 
the cost-based positioning of the Gather?
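
(For readers following along, the two shapes being compared are roughly
the ones sketched below; the table, column and function names are
hypothetical.)

-- Expensive call in the target list.
SELECT id, expensive_fn(payload) AS result
  FROM big_table
 ORDER BY id;

-- Same call moved to the FROM clause as a lateral cross join.
SELECT t.id, f.result
  FROM big_table t
  CROSS JOIN LATERAL expensive_fn(t.payload) AS f(result)
 ORDER BY t.id;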

On 9/23/19, 8:59 AM, "Robert Haas"  wrote:

On Sun, Sep 22, 2019 at 7:47 AM Darafei "Komяpa" Praliaskouski
 wrote:
> A heuristic I believe should help my case (and I hardly imagine how it
> can break others) is that in presence of Gather, all the function calls
> that are parallel safe should be pushed into it.

The cost of pushing data through the Sort is not necessarily
insignificant.  Your functions are (IIUC) extremely expensive, so it's
worth going to any length to reduce the time spent evaluating them.
However, if someone has ||(text,text) in the tlist, that might be the
wrong approach, because it's not saving much to compute that earlier
and it might make the sort a lot wider, especially if de-TOASTing is
involved.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






RE: [bug fix??] Fishy code in tts_cirtual_copyslot()

2019-09-23 Thread Tsunakawa, Takayuki
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> I temporarily changed the Assert to be "==" rather than "<=", and
> it still passed check-world, so evidently we are not testing any
> cases where the descriptors are of different lengths.  This explains
> the lack of symptoms.  It's still a bug though, so pushed.

Thank you for committing.

> > BTW, I thought that in PostgreSQL coding convention, local variables
> should be defined at the top of blocks, but this function writes "for (int
> natts;".
> 
> Yeah, we've agreed to join the 21st century to the extent of allowing
> local for-loop variables.

That's good news.  It'll help a bit to code comfortably.


Regards
Takayuki Tsunakawa







Re: Cache lookup errors with functions manipulation object addresses

2019-09-23 Thread Michael Paquier
On Mon, Sep 23, 2019 at 09:15:24PM +0200, Dmitry Dolgov wrote:
> Thanks for the patch! I couldn't check it in action, since looks like it
> doesn't apply anymore [1] (although after a quick check I'm not entirely sure
> why). Nevertheless I have a few short commentaries:

Thanks for the review.  There was a small conflict in objectaddress.h,
easy enough to solve.

> Here and in format_operator_extened commentary says
> 
> * Returns a palloc'd string.
> 
> but now it's possible to return NULL, so I guess comments need to be adjusted,
> right?

Right.

> v16-0003-Eliminate-user-visible-cache-lookup-errors-for-o.patch
> 
> - appendStringInfo(&buffer, _("operator %s"),
> - format_operator(object->objectId));
> - break;
> + {
> + char *oprname = format_operator_extended(object->objectId,
> + FORMAT_PROC_FORCE_NULL);
> 
> Shouldn't it be FORMAT_OPERATOR_FORCE_NULL here?

Indeed, that's the case.

Please feel free to use the updated versions attached.  These can
apply on top of HEAD at 30d1379.
--
Michael
From dd4b4112eea12b4866aa2c6a33de57302b842c8a Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 24 Sep 2019 08:54:37 +0900
Subject: [PATCH v17 1/3] Add flag to format_type_extended to enforce NULL-ness

If a scanned type is undefined, the type format routines have two
behaviors depending on whether FORMAT_TYPE_ALLOW_INVALID is passed by the
caller:
- Generate an error
- Return an undefined type name, "???" or "-".

The current interface is unhelpful for callers that want to format a type
name properly but still make sure that the type is defined, as there
could be actual types whose names match the undefined-type strings.  In
order to counter that, add a new flag called FORMAT_TYPE_FORCE_NULL which
returns a NULL result instead of "???" or "-" and does not generate an
error.  It will be used in future patches to improve the SQL interface
for object addresses.
---
 src/backend/utils/adt/format_type.c | 22 +-
 src/include/utils/builtins.h|  1 +
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/src/backend/utils/adt/format_type.c b/src/backend/utils/adt/format_type.c
index 6ae2a31345..076d2ca3f3 100644
--- a/src/backend/utils/adt/format_type.c
+++ b/src/backend/utils/adt/format_type.c
@@ -96,13 +96,16 @@ format_type(PG_FUNCTION_ARGS)
  * - FORMAT_TYPE_ALLOW_INVALID
  *			if the type OID is invalid or unknown, return ??? or such instead
  *			of failing
+ * - FORMAT_TYPE_FORCE_NULL
+ *			if the type OID is invalid or unknown, return NULL instead of ???
+ *			or such
  * - FORMAT_TYPE_FORCE_QUALIFY
  *			always schema-qualify type names, regardless of search_path
  *
  * Note that TYPEMOD_GIVEN is not interchangeable with "typemod == -1";
  * see the comments above for format_type().
  *
- * Returns a palloc'd string.
+ * Returns a palloc'd string, or NULL.
  */
 char *
 format_type_extended(Oid type_oid, int32 typemod, bits16 flags)
@@ -114,13 +117,20 @@ format_type_extended(Oid type_oid, int32 typemod, bits16 flags)
 	char	   *buf;
 	bool		with_typemod;
 
-	if (type_oid == InvalidOid && (flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
-		return pstrdup("-");
+	if (type_oid == InvalidOid)
+	{
+		if ((flags & FORMAT_TYPE_FORCE_NULL) != 0)
+			return NULL;
+		else if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
+			return pstrdup("-");
+	}
 
 	tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(type_oid));
 	if (!HeapTupleIsValid(tuple))
 	{
-		if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
+		if ((flags & FORMAT_TYPE_FORCE_NULL) != 0)
+			return NULL;
+		else if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
 			return pstrdup("???");
 		else
 			elog(ERROR, "cache lookup failed for type %u", type_oid);
@@ -143,7 +153,9 @@ format_type_extended(Oid type_oid, int32 typemod, bits16 flags)
 		tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(array_base_type));
 		if (!HeapTupleIsValid(tuple))
 		{
-			if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
+			if ((flags & FORMAT_TYPE_FORCE_NULL) != 0)
+return NULL;
+			else if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
 return pstrdup("???[]");
 			else
 elog(ERROR, "cache lookup failed for type %u", type_oid);
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 937ddb7ef0..78071b7755 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -109,6 +109,7 @@ extern Datum numeric_float8_no_overflow(PG_FUNCTION_ARGS);
 #define FORMAT_TYPE_TYPEMOD_GIVEN	0x01	/* typemod defined by caller */
 #define FORMAT_TYPE_ALLOW_INVALID	0x02	/* allow invalid types */
 #define FORMAT_TYPE_FORCE_QUALIFY	0x04	/* force qualification of type */
+#define FORMAT_TYPE_FORCE_NULL		0x08	/* force NULL if undefined */
 extern char *format_type_extended(Oid type_oid, int32 typemod, bits16 flags);
 
 extern char *format_type_be(Oid type_oid);
-- 
2.23.0

From de4bae9cbdfb9d88c77221506f167dd4d84ebd57 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 24 Sep 2019 08:54:52 +0900
Subject: [PATCH v17 2/3] Refactor 

Re: Attempt to consolidate reading of XLOG page

2019-09-23 Thread Alvaro Herrera
I spent a couple of hours on this patchset today.  I merged 0001 and
0002, and decided the result was still messier than I would have liked,
so I played with it a bit more -- see attached.  I think this is
committable, but I'm afraid it'll cause quite a few conflicts with the
rest of your series.

I had two gripes, which I feel solved with my changes:

1. I didn't like that "dir" and "wal segment size" were part of the
"currently open segment" supporting struct.  It seemed that those two
were slightly higher-level, since they apply to every segment that's
going to be opened, not just the current one.

My first thought was to put those as members of XLogReaderState, but
that doesn't work because the physical walsender.c code does not use
xlogreader at all, even though it is reading WAL.  Anyway my solution
was to create yet another struct, which for everything that uses
xlogreader is just part of that state struct; and for walsender, it's
just a separate one alongside sendSeg.  All in all, this seems pretty
clean.

2. Having the wal dir be #ifdef FRONTEND seemed out of place.  I know
the backend code does not use that, but eliding it is more "noisy" than
just setting it to NULL.  Also, the "Finalize the segment pointer"
thingy seemed out of place.  So my code passes the dir as an argument to
XLogReaderAllocate, and if it's null then we just don't allocate it.
Everybody else can use it to guide things.  This results in cleaner
code, because we don't have to handle it externally, which was causing
quite some pain to pg_waldump.  

Note that ws_dir member is a char array in the struct, not just a
pointer.  This saves trouble trying to allocate it (I mainly did it this
way because we don't have pstrdup_extended(MCXT_ALLOC_NO_OOM) ... yes,
this could be made with palloc+snprintf, but eh, that doesn't seem worth
the trouble.)


Separately from those two API-wise points, there was one bug which meant
that with your 0002+0003 the recovery tests did not pass -- code
placement bug.  I suppose the bug disappears with later patches in your
series, which probably is why you didn't notice.  This is the fix for that:

-   XLogRead(cur_page, state->seg.size, state->seg.tli, targetPagePtr,
-   state->seg.tli = pageTLI;
+   state->seg.ws_tli = pageTLI;
+   XLogRead(cur_page, state->segcxt.ws_segsize, state->seg.ws_tli, targetPagePtr,
 XLOG_BLCKSZ);


... Also, yes, I renamed all the struct members.


If you don't have any strong dislikes for these changes, I'll push this
part and let you rebase the remains on top.


Regarding the other patches:

1. I think trying to do palloc(XLogReadError) is a bad idea ... for
example, if the read fails because of system pressure, we might return
"out of memory" during that palloc instead of the real read error.  This
particular problem you could forestall by changing to ErrorContext, but
I have the impression that it might be better to have the error struct
be stack-allocated on the caller's stack.  This forces you to limit the
message string to a maximum size (say 128 bytes or maybe even 1000 bytes
like MAX_ERRORMSG_LEN) but I don't have a problem with that.

2. Not a fan of the InvalidTimeLineID stuff offhand.  Maybe it's okay ...
not convinced yet either way.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Attempt to consolidate reading of XLOG page

2019-09-23 Thread Alvaro Herrera from 2ndQuadrant
On 2019-Sep-23, Alvaro Herrera wrote:

> I spent a couple of hours on this patchset today.  I merged 0001 and
> 0002, and decided the result was still messier than I would have liked,
> so I played with it a bit more -- see attached.


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 477709bbc2..0635a17eae 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1386,8 +1386,8 @@ XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len)
 	XLogReaderState *xlogreader;
 	char	   *errormsg;
 
-	xlogreader = XLogReaderAllocate(wal_segment_size, &read_local_xlog_page,
-	NULL);
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+	&read_local_xlog_page, NULL);
 	if (!xlogreader)
 		ereport(ERROR,
 (errcode(ERRCODE_OUT_OF_MEMORY),
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 501f46fd52..6c69eb6dd7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -885,8 +885,7 @@ static int	XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 		 int source, bool notfoundOk);
 static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
 static int	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
-		 int reqLen, XLogRecPtr targetRecPtr, char *readBuf,
-		 TimeLineID *readTLI);
+		 int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		bool fetching_ckpt, XLogRecPtr tliRecPtr);
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
@@ -1195,7 +1194,8 @@ XLogInsertRecord(XLogRecData *rdata,
 			appendBinaryStringInfo(&recordBuf, rdata->data, rdata->len);
 
 		if (!debug_reader)
-			debug_reader = XLogReaderAllocate(wal_segment_size, NULL, NULL);
+			debug_reader = XLogReaderAllocate(wal_segment_size, NULL,
+			  NULL, NULL);
 
 		if (!debug_reader)
 		{
@@ -4296,7 +4296,7 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 			XLByteToSeg(xlogreader->latestPagePtr, segno, wal_segment_size);
 			offset = XLogSegmentOffset(xlogreader->latestPagePtr,
 	   wal_segment_size);
-			XLogFileName(fname, xlogreader->readPageTLI, segno,
+			XLogFileName(fname, xlogreader->seg.ws_tli, segno,
 		 wal_segment_size);
 			ereport(emode_for_corrupt_record(emode,
 			 RecPtr ? RecPtr : EndRecPtr),
@@ -6353,7 +6353,8 @@ StartupXLOG(void)
 
 	/* Set up XLOG reader facility */
 	MemSet(&private, 0, sizeof(XLogPageReadPrivate));
-	xlogreader = XLogReaderAllocate(wal_segment_size, &XLogPageRead, &private);
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+	&XLogPageRead, &private);
 	if (!xlogreader)
 		ereport(ERROR,
 (errcode(ERRCODE_OUT_OF_MEMORY),
@@ -7355,7 +7356,7 @@ StartupXLOG(void)
 	 * and we were reading the old WAL from a segment belonging to a higher
 	 * timeline.
 	 */
-	EndOfLogTLI = xlogreader->readPageTLI;
+	EndOfLogTLI = xlogreader->seg.ws_tli;
 
 	/*
 	 * Complain if we did not roll forward far enough to render the backup
@@ -11523,7 +11524,7 @@ CancelBackup(void)
  */
 static int
 XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
-			 XLogRecPtr targetRecPtr, char *readBuf, TimeLineID *readTLI)
+			 XLogRecPtr targetRecPtr, char *readBuf)
 {
 	XLogPageReadPrivate *private =
 	(XLogPageReadPrivate *) xlogreader->private_data;
@@ -11640,7 +11641,7 @@ retry:
 	Assert(targetPageOff == readOff);
 	Assert(reqLen <= readLen);
 
-	*readTLI = curFileTLI;
+	xlogreader->seg.ws_tli = curFileTLI;
 
 	/*
 	 * Check the page header immediately, so that we can retry immediately if
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index a66e3324b1..27c27303d6 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -68,8 +68,8 @@ report_invalid_record(XLogReaderState *state, const char *fmt,...)
  * Returns NULL if the xlogreader couldn't be allocated.
  */
 XLogReaderState *
-XLogReaderAllocate(int wal_segment_size, XLogPageReadCB pagereadfunc,
-   void *private_data)
+XLogReaderAllocate(int wal_segment_size, const char *waldir,
+   XLogPageReadCB pagereadfunc, void *private_data)
 {
 	XLogReaderState *state;
 
@@ -96,7 +96,10 @@ XLogReaderAllocate(int wal_segment_size, XLogPageReadCB pagereadfunc,
 		return NULL;
 	}
 
-	state->wal_segment_size = wal_segment_size;
+	/* Initialize segment info. */
+	WALOpenSegmentInit(&state->seg, &state->segcxt, wal_segment_size,
+	   waldir);
+
 	state->read_page = pagereadfunc;
 	/* system_identifier initialized to zeroes above */
 	state->private_data = private_data;
@@ -198,6 +201,23 @@ allocate_recordbuf(XLogReaderState *state, uint32 reclength)
 	return true;
 }
 
+/*
+ * Initialize the passed segment structs.
+ */

Re: scorpionfly needs more semaphores

2019-09-23 Thread Jungle Boogie
On Mon Sep 23, 2019 at 8:34 PM Mikael Kjellström wrote:
> On 2019-09-23 00:29, Thomas Munro wrote:
> > On Wed, Sep 18, 2019 at 4:55 PM jungle boogie  
> > wrote:
> >> $ sysctl | ag kern.seminfo.semmni
> >> kern.seminfo.semmni=100
> > 
> > It still seems to be happening.  Perhaps you need to increase semmns too?
> 
> I have on my OpenBSD 6.5 /etc/sysctl.conf the following:
> 
> kern.seminfo.semmni=200
> kern.seminfo.semmns=4000
> 
> and it seems to work fine.

Thanks! I've made these adjustments and we'll see what happens.

Thomas,

No offense taken at your input from last week. Your additional input this
week is welcome, and I'll pass it along to some folks.

Thank you all for continuing to monitor PostgreSQL building on OpenBSD.

> 
> /Mikael




Re: Efficient output for integer types

2019-09-23 Thread David Fetter
On Mon, Sep 23, 2019 at 01:16:36PM +0100, Andrew Gierth wrote:
> > "David" == David Fetter  writes:
> 
>  David> + return pg_ltostr_zeropad(str, (uint32)0 - (uint32)value, minwidth - 
> 1);
> 
> No, this is just reintroducing the undefined behavior again. Once the
> value has been converted to unsigned you can't cast it back to signed or
> pass it to a function expecting a signed value, since it will overflow
> in the INT_MIN case. (and in this example would probably output '-'
> signs until it ran off the end of memory).
> 
> Here's how I would do it:
> 
> char *
> pg_ltostr_zeropad(char *str, int32 value, int32 minwidth)
> {
>   int32   len;
>   uint32  uvalue = value;
> 
>   Assert(minwidth > 0);
> 
>   if (value >= 0)
>   {
>   if (value < 100 && minwidth == 2) /* Short cut for common case 
> */
>   {
>   memcpy(str, DIGIT_TABLE + value*2, 2);
>   return str + 2;
>   }
>   }
>   else
>   {
>   *str++ = '-';
>   minwidth -= 1;
>   uvalue = (uint32)0 - uvalue;
>   }
>   
>   len = pg_ultoa_n(uvalue, str);
>   if (len >= minwidth)
>   return str + len;
> 
>   memmove(str + minwidth - len, str, len);
>   memset(str, '0', minwidth - len);
>   return str + minwidth;
> }

Done pretty much that way.

>  David>  pg_ltostr(char *str, int32 value)
>  David> + int32   len = pg_ultoa_n(value, str);
>  David> + return str + len;
> 
> This seems to have lost its handling of negative numbers entirely

Given the comment that precedes it and all the use cases in the code,
I changed the signature to take an unsigned integer instead. It's
pretty clear that the intent was to add digits and only digits to the
passed-in string.

> (which doesn't say much for the regression test coverage)

I didn't see any obvious way to test functions not surfaced to SQL.
Should we have one?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From a0eecd74d1dd3f6a4bb1a07f41740c4f6d34ceac Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Sun, 15 Sep 2019 00:06:29 -0700
Subject: [PATCH v12] Make int4 and int8 operations more efficient
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.21.0"

This is a multi-part message in MIME format.
--2.21.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


- Output routines now do more digits per iteration, and
- Code determines the number of decimal digits in int4/int8 efficiently
- Split off pg_ltoa_n from pg_ltoa
- Use same to make other functions shorter

diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c
index 651ade14dd..5c5b6d33b2 100644
--- a/src/backend/access/common/printsimple.c
+++ b/src/backend/access/common/printsimple.c
@@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 			case INT8OID:
 {
 	int64		num = DatumGetInt64(value);
-	char		str[23];	/* sign, 21 digits and '\0' */
+	char		str[MAXINT8LEN + 1];
 
 	pg_lltoa(num, str);
 	pq_sendcountedtext(, str, strlen(str), false);
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 0ff9394a2f..6230807906 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -27,8 +27,6 @@
 #include "utils/builtins.h"
 
 
-#define MAXINT8LEN		25
-
 typedef struct
 {
 	int64		current;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 70138feb29..5dfbe8dfcd 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -20,6 +20,68 @@
 
 #include "common/int.h"
 #include "utils/builtins.h"
+#include "port/pg_bitutils.h"
+
+/*
+ * A table of all two-digit numbers. This is used to speed up decimal digit
+ * generation by copying pairs of digits into the final output.
+ */
+static const char DIGIT_TABLE[200] =
+"00" "01" "02" "03" "04" "05" "06" "07" "08" "09"
+"10" "11" "12" "13" "14" "15" "16" "17" "18" "19"
+"20" "21" "22" "23" "24" "25" "26" "27" "28" "29"
+"30" "31" "32" "33" "34" "35" "36" "37" "38" "39"
+"40" "41" "42" "43" "44" "45" "46" "47" "48" "49"
+"50" "51" "52" "53" "54" "55" "56" "57" "58" "59"
+"60" "61" "62" "63" "64" "65" "66" "67" "68" "69"
+"70" "71" "72" "73" "74" "75" "76" "77" "78" "79"
+"80" "81" "82" "83" "84" "85" "86" "87" "88" "89"
+"90" "91" "92" "93" "94" "95" "96" "97" "98" "99";
+
+/*
+ * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10
+ */
+static inline int
+decimalLength32(const uint32 v)
+{
+	int			t;
+	static uint32	PowersOfTen[] =
+	{1, 10, 100,
+	 1000, 10000, 100000,
+	 1000000, 10000000, 100000000,
+	 1000000000};

[PATCH] Sort policies and triggers by table name in pg_dump.

2019-09-23 Thread Benjie Gillam
Hello all,

Currently pg_dump sorts most dumpable objects by priority, namespace, name
and then object ID. Since triggers and RLS policies belong to tables, there
may be more than one with the same name within the same namespace, leading to
potential sorting discrepancies between databases that only differ by object
IDs.

The attached draft patch (made against `pg_dump_sort.c` on master) breaks
ties for trigger and policy objects by using the table name, increasing the
sort order stability. I have compiled it and executed it against a number of
local databases and it behaves as desired.

I am new to PostgreSQL contribution and my C-skills are rusty, so please let
me know if I can improve the patch, or if there are areas of PostgreSQL that
I have overlooked.

Kind regards,

Benjie Gillam
From 55b3845b746498a1544bc7ced16a369f4da3905f Mon Sep 17 00:00:00 2001
From: Benjie Gillam 
Date: Mon, 23 Sep 2019 21:18:24 +0100
Subject: [PATCH] Sort policies and triggers by table name in pg_dump.
To: pgsql-hack...@postgresql.org

---
 src/bin/pg_dump/pg_dump_sort.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 31fc06a255..5ff7359dff 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -207,6 +207,28 @@ DOTypeNameCompare(const void *p1, const void *p2)
 		if (cmpval != 0)
 			return cmpval;
 	}
+	else if (obj1->objType == DO_POLICY)
+	{
+		PolicyInfo *pobj1 = *(PolicyInfo *const *) p1;
+		PolicyInfo *pobj2 = *(PolicyInfo *const *) p2;
+
+		/* Sort by table name */
+		cmpval = strcmp(pobj1->poltable->dobj.name,
+		pobj2->poltable->dobj.name);
+		if (cmpval != 0)
+			return cmpval;
+	}
+	else if (obj1->objType == DO_TRIGGER)
+	{
+		TriggerInfo *tobj1 = *(PolicyInfo *const *) p1;
+		TriggerInfo *tobj2 = *(PolicyInfo *const *) p2;
+
+		/* Sort by table name */
+		cmpval = strcmp(tobj1->tgtable->dobj.name,
+		tobj2->tgtable->dobj.name);
+		if (cmpval != 0)
+			return cmpval;
+	}
 
 	/* Usually shouldn't get here, but if we do, sort by OID */
 	return oidcmp(obj1->catId.oid, obj2->catId.oid);
-- 
2.17.1



Re: Efficient output for integer types

2019-09-23 Thread David Fetter
On Mon, Sep 23, 2019 at 10:28:09AM +0200, Tels wrote:
> Moin,
> 
> On 2019-09-22 23:58, David Fetter wrote:
> > On Sat, Sep 21, 2019 at 07:29:25AM +0100, Andrew Gierth wrote:
> > > > "David" == David Fetter  writes:
> 
> > Fixed.
> 
> Good work, more performance is sure nice :)
> 
> Noticed one more thing in the patch:
> 
> > -   *start++ = *a;
> > -   *a-- = swap;
> > +   memcpy(pos - 2, DIGIT_TABLE + c, 2);
> > +   i += 2;
> > }
> > +   else
> > +   *a = (char) ('0' + value2);
> > +
> > +   return olength;
> > }
> 
> The line "i += 2;" modifies i, but i is never used again nor returned.

I found a similar one in a similar function, and removed it, too.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: subscriptionCheck failures on nightjar

2019-09-23 Thread Andrew Dunstan


On 9/20/19 6:17 PM, Tom Lane wrote:
> Alvaro Herrera  writes:
>> Uh .. I didn't think it was possible that we would build the same
>> snapshot file more than once.  Isn't that a waste of time anyway?  Maybe
>> we can fix the symptom by just not doing that in the first place?
>> I don't have a strategy to do that, but seems worth considering before
>> retiring the bf animals.
> The comment near the head of SnapBuildSerialize says
>
>  * there is an obvious race condition here between the time we stat(2) the
>  * file and us writing the file. But we rename the file into place
>  * atomically and all files created need to contain the same data anyway,
>  * so this is perfectly fine, although a bit of a resource waste. Locking
>  * seems like pointless complication.
>
> which seems like a reasonable argument.  Also, this is hardly the only
> place where we expect rename(2) to be atomic.  So I tend to agree with
> Andres that we should consider OSes with such a bug to be unsupported.
>
> Dromedary is running the last release of macOS that supports 32-bit
> hardware, so if we decide to kick that to the curb, I'd either shut
> down the box or put some newer Linux or BSD variant on it.
>
>   



Well, nightjar is on FBSD 9.0 which is oldish. I can replace it before
long with an 11-stable instance if that's appropriate.


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Psql patch to show access methods info

2019-09-23 Thread Alexander Korotkov
On Wed, Sep 18, 2019 at 5:04 PM Alvaro Herrera  wrote:
> On 2019-Sep-18, Alexander Korotkov wrote:
>
> > On Tue, Sep 17, 2019 at 9:01 PM Alvaro Herrera  
> > wrote:
>
> > > I think \dAf is just as critical as \dAo; the former lets you know which
> > > opfamilies you can use in CREATE INDEX, while the latter lets you know
> > > which operators would be helped by such an index.  (But, really, only if
> > > the opfamily name is printed in \d of the index, which we currently
> > > don't print unless it's non-default ... which is an omission that
> > > perhaps we should consider fixing).
>
> > I think you have a point.  Will add \dAf command to the patch.
>
> Great, thanks.

Revised patch is attached.

1) It adds a \dAf[+] command showing opfamilies that belong to a given
AM and have opclasses for a given datatype (a rough catalog-level
equivalent is sketched below).
2) It restores the warning when running \dA[+] with 2 or more arguments.
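
(For readers unfamiliar with the catalogs, a hand-written approximation of
what such a listing boils down to might look like the query below; the
exact query used by \dAf lives in the attached patch and may differ, and
'btree' and 'integer' are just example inputs.)

SELECT am.amname, opf.opfname
  FROM pg_catalog.pg_opfamily opf
  JOIN pg_catalog.pg_am am ON am.oid = opf.opfmethod
 WHERE am.amname = 'btree'
   AND EXISTS (SELECT 1
                 FROM pg_catalog.pg_opclass opc
                WHERE opc.opcfamily = opf.oid
                  AND opc.opcintype = 'integer'::regtype)
 ORDER BY 1, 2;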

Two questions are open for me:

1) Currently we allow filtering opfamilies by type, but the supported
types aren't displayed.  Should we display the datatypes?  Should we
aggregate them into a comma-separated list?
2) Given that we can now display the list of opfamilies, it would be
reasonable to be able to see the list of opclasses belonging to a
particular opfamily.  But currently \dAc doesn't have a filter by opclass.
Should we implement this as a separate command?

I'll be very glad for feedback.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-Add-psql-AM-info-commands-v10.patch
Description: Binary data


Re: JSONPATH documentation

2019-09-23 Thread Alexander Korotkov
Hi!

On Mon, Sep 23, 2019 at 10:10 PM Steven Pousty  wrote:
> Thanks for the education on the path spec. Too bad it is in a zip doc - do 
> you know of a place where it is publicly available so we can link to it? 
> Perhaps there is some document or page you think would be a good reference 
> read for people who want to understand more?
> https://standards.iso.org/ittf/PubliclyAvailableStandards/c067367_ISO_IEC_TR_19075-6_2017.zip

Yes, this link looks good to me.  It's a technical report, not the standard
itself, so it may have some small divergences.  But it seems to be the best
free resource available, given that the standard itself isn't free.

> I am uncertain why JSONPath is considered part of the datatype any more so 
> than string functions are considered part of the character datatype
> https://www.postgresql.org/docs/11/functions-string.html

Let me clarify my thoughts.  The SQL-level functions jsonb_path_*() (table
9.49) are clearly not part of the jsonpath datatype.  But the jsonpath
accessors (table 8.25), functions (table 9.44) and operators (table
9.45) are used inside a jsonpath value.  So, technically they are part
of the jsonpath datatype.
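
(A minimal runnable sketch of that distinction, with a made-up document:
the accessor, filter and comparison operator all live inside the jsonpath
literal passed as the second argument, while jsonb_path_query() is just
the SQL-level entry point.)

SELECT jsonb_path_query(
    '{"items": [1, 2, 3, 4]}'::jsonb,
    '$.items[*] ? (@ > 2)');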

P.S.  We don't use top posting in mailing lists.  Please, use bottom
posting.  See https://en.wikipedia.org/wiki/Posting_style#Top-posting
for details.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Proposal: Better query optimization for "NOT IN" clause

2019-09-23 Thread legrand legrand
Hello,

Just for information, there is some ongoing work on including this in core
that may interest you ;o)

see "NOT IN subquery optimization"
https://www.postgresql.org/message-id/flat/1550706289606-0.post%40n3.nabble.com

commitfest entry: https://commitfest.postgresql.org/24/2023/

and "Converting NOT IN to anti-joins during planning"
https://www.postgresql.org/message-id/flat/CAKJS1f82pqjqe3WT9_xREmXyG20aOkHc-XqkKZG_yMA7JVJ3Tw%40mail.gmail.com

commitfest entry: https://commitfest.postgresql.org/24/2020/
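
(For context, the rewrite these patches aim to automate is essentially the
manual transformation sketched below; table and column names are
hypothetical, and the two forms are only equivalent when b.id is known to
be NOT NULL, which is exactly what makes automating it tricky.)

-- NOT IN form, which currently tends to produce a poor plan.
SELECT a.*
  FROM a
 WHERE a.id NOT IN (SELECT b.id FROM b);

-- NOT EXISTS form, which the planner already turns into an anti-join.
SELECT a.*
  FROM a
 WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id);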

Regards
PAscal



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: Cache lookup errors with functions manipulation object addresses

2019-09-23 Thread Dmitry Dolgov
> On Tue, Jul 2, 2019 at 9:28 AM Michael Paquier  wrote:
>
> On Thu, Feb 21, 2019 at 04:40:13PM +0900, Michael Paquier wrote:
> > Rebased version fixing some conflicts with HEAD.
>
> And rebased version for this stuff on HEAD (66c5bd3), giving visibly
> v16.

Thanks for the patch! I couldn't check it in action, since looks like it
doesn't apply anymore [1] (although after a quick check I'm not entirely sure
why). Nevertheless I have a few short commentaries:

v16-0001-Add-flag-to-format_type_extended-to-enforce-NULL.patch

- if (type_oid == InvalidOid && (flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
- return pstrdup("-");
+ if (type_oid == InvalidOid)
+ {
+ if ((flags & FORMAT_TYPE_FORCE_NULL) != 0)
+ return NULL;
+ else if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
+ return pstrdup("-");
+ }

Here and in format_operator_extened commentary says

* Returns a palloc'd string.

but now it's possible to return NULL, so I guess comments need to be adjusted,
right?

v16-0003-Eliminate-user-visible-cache-lookup-errors-for-o.patch

- appendStringInfo(&buffer, _("operator %s"),
- format_operator(object->objectId));
- break;
+ {
+ char *oprname = format_operator_extended(object->objectId,
+ FORMAT_PROC_FORCE_NULL);

Shouldn't it be FORMAT_OPERATOR_FORCE_NULL here?

I'll continue reviewing the last patch in more details.

[1]: http://cfbot.cputube.org/patch_24_1947.log




Re: JSONPATH documentation

2019-09-23 Thread Steven Pousty
Hey there:
Thanks for the education on the path spec. Too bad it is in a zip doc - do
you know of a place where it is publicly available so we can link to it?
Perhaps there is some document or page you think would be a good reference
read for people who want to understand more?
https://standards.iso.org/ittf/PubliclyAvailableStandards/c067367_ISO_IEC_TR_19075-6_2017.zip

I am uncertain why JSONPath is considered part of the datatype any more so
than string functions are considered part of the character datatype
https://www.postgresql.org/docs/11/functions-string.html


On Mon, Sep 23, 2019 at 11:07 AM Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Mon, Sep 23, 2019 at 7:52 PM Steven Pousty 
> wrote:
> > JSON Containment, JSONPath, and Transforms are means to work with JSONB
> but not the actual datatype itself. Doc should be split into
> > 1) Data type - how do declare, indexing, considerations when using it...
> > 2) Ways to work with the data type - functions, containment, JSONPath...
> >
> > These can be separate pages or on the same page but they need to be
> logically and physically separated
>
> According to your proposal, where jsonpath functions, operators and
> accessors should be described in?  On the one hand jsonpath functions
> etc. are part of jsonpath datatype.  On the other hand it's functions
> we apply to jsonb documents.
>
> > There should also be a link to some of the original JSONPath spec
> > https://goessner.net/articles/JsonPath/
>
> We implement JSONPath according to SQL Standard 2016.  Your link
> provides earlier attempt to implement jsonpath.  It's similar, but
> some examples don't follow standard (and don't work in our
> implementation).  For instance '$.store.book[(@.length-1)].title'
> should be written as '$.store.book[last - 1] .title'.
>
> > Thank you so much for putting so much work into the documentation!
> Please let me know if there are others way you would like to me help with
> the doc.
>
> Thank you!  My main point is that we should put description of
> jsonpath into single place.  But we need to reach consensus on what
> this place should be.
>
> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>


Re: scorpionfly needs more semaphores

2019-09-23 Thread Mikael Kjellström

On 2019-09-23 00:29, Thomas Munro wrote:
> On Wed, Sep 18, 2019 at 4:55 PM jungle boogie  wrote:
>> $ sysctl | ag kern.seminfo.semmni
>> kern.seminfo.semmni=100
>
> It still seems to be happening.  Perhaps you need to increase semmns too?


I have on my OpenBSD 6.5 /etc/sysctl.conf the following:

kern.seminfo.semmni=200
kern.seminfo.semmns=4000

and it seems to work fine.

/Mikael




Re: JSONPATH documentation

2019-09-23 Thread Alexander Korotkov
On Mon, Sep 23, 2019 at 7:52 PM Steven Pousty  wrote:
> JSON Containment, JSONPath, and Transforms are means to work with JSONB but 
> not the actual datatype itself. Doc should be split into
> 1) Data type - how do declare, indexing, considerations when using it...
> 2) Ways to work with the data type - functions, containment, JSONPath...
>
> These can be separate pages or on the same page but they need to be logically 
> and physically separated

According to your proposal, where jsonpath functions, operators and
accessors should be described in?  On the one hand jsonpath functions
etc. are part of jsonpath datatype.  On the other hand it's functions
we apply to jsonb documents.

> There should also be a link to some of the original JSONPath spec
> https://goessner.net/articles/JsonPath/

We implement JSONPath according to SQL Standard 2016.  Your link
provides earlier attempt to implement jsonpath.  It's similar, but
some examples don't follow standard (and don't work in our
implementation).  For instance '$.store.book[(@.length-1)].title'
should be written as '$.store.book[last - 1] .title'.
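
(As a runnable illustration of the standard bracket syntax, with a made-up
document: in SQL-standard jsonpath, "last" denotes the index of the final
array element, so no @.length arithmetic is needed.)

SELECT jsonb_path_query(
    '{"store": {"book": [{"title": "A"}, {"title": "B"}, {"title": "C"}]}}'::jsonb,
    '$.store.book[last].title');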

> Thank you so much for putting so much work into the documentation! Please let 
> me know if there are others way you would like to me help with the doc.

Thank you!  My main point is that we should put description of
jsonpath into single place.  But we need to reach consensus on what
this place should be.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: overhead due to casting extra parameters with aggregates (over and over)

2019-09-23 Thread Tomas Vondra

On Mon, Sep 23, 2019 at 12:53:36PM -0400, Tom Lane wrote:
> Tomas Vondra  writes:
>> I've been working on a custom aggregate, and I've ran into some fairly
>> annoying overhead due to casting direct parameters over and over. I'm
>> wondering if there's a way to eliminate this, somehow, without having to
>> do an explicit cast.
>>
>> Imagine you have a simple aggregate:
>>
>>   CREATE AGGREGATE tdigest_percentile(double precision, int, double precision[])
>>   (
>>     ...
>>   );
>>
>> with two direct parameters (actually, I'm not sure that's the correct
>> term, becuse this is not an ordered-set aggregate and [1] only talks
>> about direct parameters in that context). Anyway, I'm talking about the
>> extra parameters, after the 'double precision' value to aggregate.
>
> But you're not telling the system that those are direct parameters,
> at least not if you mean that they can only legitimately have one value
> across the whole query.  As-is, they're just more aggregated arguments
> so we have to evaluate them again at each row.

Understood.

> It's fairly messy that the SQL spec ties direct arguments to ordered-set
> aggregates; you'd think there'd be some value in treating those features
> as orthogonal.  I'm not sure how we could wedge them into the syntax
> otherwise, though :-(.  You could perhaps convert your aggregate to
> an ordered-set aggregate, but then you'd be paying for a sort that
> you don't need, IIUC.

Yeah, having to do the sort (and keep all the data) is exactly what the
tdigest is meant to eliminate, so making it an ordered-set aggregate is
exactly the thing I don't want to do. Also, it disables parallel query,
which is another reason not to do that.

>> After a while, I've realized that the issue is casting - the CTE
>> produces numeric[] array, and we do the cast to double precision[] on
>> every call to the state transition function (and we do ~10M of those).
>
> The only reason that the CTE reference is cheap is that we understand
> that it's stable so we don't have to recompute it each time; otherwise
> you'd be moaning about that more than the cast.  As you say, the short
> term workaround is to do the casting inside the sub-select.  I think the
> long term fix is to generically avoid re-computing stable subexpressions.
> There was a patch for that a year or so ago but the author never finished
> it, AFAIR.

Hmmm, yeah. I'll dig throgh the archives, although it's not a very high
priority - it's more a thing that surprised/bugged me while working on
the custom aggregate.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: overhead due to casting extra parameters with aggregates (over and over)

2019-09-23 Thread Tom Lane
Tomas Vondra  writes:
> I've been working on a custom aggregate, and I've ran into some fairly
> annoying overhead due to casting direct parameters over and over. I'm
> wondering if there's a way to eliminate this, somehow, without having to
> do an explicit cast.

> Imagine you have a simple aggregate:

>   CREATE AGGREGATE tdigest_percentile(double precision, int, double 
> precision[])
>   (
> ...
>   );

> with two direct parameters (actually, I'm not sure that's the correct
> term, becuse this is not an ordered-set aggregate and [1] only talks
> about direct parameters in that context). Anyway, I'm talking about the
> extra parameters, after the 'double precision' value to aggregate.

But you're not telling the system that those are direct parameters,
at least not if you mean that they can only legitimately have one value
across the whole query.  As-is, they're just more aggregated arguments
so we have to evaluate them again at each row.

It's fairly messy that the SQL spec ties direct arguments to ordered-set
aggregates; you'd think there'd be some value in treating those features
as orthogonal.  I'm not sure how we could wedge them into the syntax
otherwise, though :-(.  You could perhaps convert your aggregate to
an ordered-set aggregate, but then you'd be paying for a sort that
you don't need, IIUC.
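
(For reference, the built-in ordered-set aggregates show the behavior being
described: in the sketch below, with hypothetical table and column names
and x being a double precision column, the fractions array is a direct
argument evaluated once per group, while x is the aggregated argument
evaluated for every row.)

SELECT percentile_cont(ARRAY[0.25, 0.5, 0.75]) WITHIN GROUP (ORDER BY x)
  FROM samples;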

> After a while, I've realized that the issue is casting - the CTE
> produces numeric[] array, and we do the cast to double precision[] on
> every call to the state transition function (and we do ~10M of those).

The only reason that the CTE reference is cheap is that we understand
that it's stable so we don't have to recompute it each time; otherwise
you'd be moaning about that more than the cast.  As you say, the short
term workaround is to do the casting inside the sub-select.  I think the
long term fix is to generically avoid re-computing stable subexpressions.
There was a patch for that a year or so ago but the author never finished
it, AFAIR.

regards, tom lane




Re: JSONPATH documentation

2019-09-23 Thread Steven Pousty
JSON Containment, JSONPath, and Transforms are means to work with JSONB but
not the actual datatype itself. Doc should be split into
1) Data type - how do declare, indexing, considerations when using it...
2) Ways to work with the data type - functions, containment, JSONPath...

These can be separate pages or on the same page but they need to be
logically and physically separated

There should also be a link to some of the original JSONPath spec
https://goessner.net/articles/JsonPath/

Thank you so much for putting so much work into the documentation! Please
let me know if there are others way you would like to me help with the doc.

On Sun, Sep 22, 2019 at 4:03 PM Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Mon, Sep 23, 2019 at 1:03 AM Tom Lane  wrote:
> > Alexander Korotkov  writes:
> > > On Sun, Sep 22, 2019 at 9:18 PM Jeff Janes 
> wrote:
> > > Currently description of jsonpath is divided between datatypes section
> > > and functions and operators section.  And yes, this looks cumbersome.
> >
> > Agreed, but ...
> >
> > > I think we should move the whole description to the one section.
> > > Probably we should move jsonpath description to datatypes section
> > > (assuming jsonpath is a datatype) leaving functions and operators
> > > section with just SQL-level functions and operators.  What do you
> > > think?
> >
> > ... I don't think that's an improvement.  We don't document detailed
> > behavior of a datatype's functions in datatype.sgml, and this seems
> > like it would be contrary to that layout.  If anything, I'd merge
> > the other way, with only a very minimal description of jsonpath
> > (perhaps none?) in datatype.sgml.
> >
> > While we're whining about this, I find it very off-putting that
> > the jsonpath stuff was inserted in the JSON functions section
> > ahead of the actual JSON functions.  I think it should have
> > gone after them, because it feels like a barely-related interjection
> > as it stands.  Maybe there's even a case that it should be
> > its own , after the "functions-json" section.
>
> Yes, it think moving jsonpath description to own  is a good
> idea.  But I still think we should have complete jsonpath description
> in the single place.  What about joining jsonpath description from
> both datatypes section and functions and operators section into this
> , leaving datatypes section with something very brief?
>
> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
>


Re: Global temporary tables

2019-09-23 Thread Pavel Stehule
On Mon, Sep 23, 2019 at 9:57 AM Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:

>
>
> On 20.09.2019 19:43, Pavel Stehule wrote:
>
>
> 1. I do not need them at all.
>> 2. Eliminate catalog bloating.
>> 3. Mostly needed for compatibility with Oracle (simplify porting,...).
>> 4. Parallel query execution.
>> 5. Can be used at replica.
>> 6. More efficient use of resources (first of all memory).
>>
>
> There can be other point important for cloud. Inside some cloud usually
> there are two types of discs - persistent (slow) and ephemeral (fast). We
> effectively used temp tables there because we moved temp tablespace to
> ephemeral discs.
>
>
> Yes, I already heard this argument and agree with it.
> I just want to notice two things:
> 1. My assumption is that in most cases data of temporary table can fit in
> memory (certainly if we are not limiting them by temp_buffers = 8MB, but
> store in shared buffers) and so there is on need to write them to the
> persistent media at all.
> 2. Global temp tables do not substitute local temp tables, accessed
> through local buffers. So if you want to use temporary storage, you will
> always have a way to do it.
> The question is whether we need to support two kinds of global temp tables
> (with shared or private buffers) or just implement one of them.
>

It's valid only for OLTP.  The OLAP world is totally different. Moreover,
if all users use temporary tables - and you should account for that, which
is one reason for global temp tables - then you need to multiply the size
by max_connections.

It's hard to say what is best from an implementation perspective, but it
would be unfortunate if global temporary tables had different performance
characteristics and configuration than local temporary tables.

>
>
> I missing one point in your list - developer's comfort - using temp tables
> is just much more comfortable - you don't need create it again, again, ..
> Due this behave is possible to reduce @2 and @3 can be nice side effect. If
> you reduce @2 to zero, then @5 should be possible without any other.
>
> Sorry, I do not completely understand your point here
> You can use normal (permanent) table and you will not have to create them
> again and again. It is also possible to use them for storing temporary data
> - just need to truncate table when data is not needed any more.
> Certainly you can not use the same table in more than one backend. Here is
> the main advantage of temp tables - you can have storage of per-session
> data and do not worry about possible name conflicts.
>
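
(A concrete illustration of that per-session behavior, with hypothetical
names: each backend has its own pg_temp schema, so concurrent sessions can
create identically-named temp tables without any conflict.)

CREATE TEMP TABLE session_scratch (id int, payload text);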

You use temporary tables because you know you never share data between
sessions. I don't remember any situation where I designed temp tables with
a different schema for different sessions.

Using a global temp table is not effective - you work with large tables,
you need to use DELETE, etc. - so you cannot use a classic table as
effectively as temp tables.


> From the other side: there are many cases where format of temporary data
> is not statically known: it is determined dynamically during program
> execution.
> In this case local temp table provides the most convenient mechanism for
> working with such data.
>
> This is why I think that ewe need to have both local and global temp
> tables.
>
> Also I do not agree with your statement "If you reduce @2 to zero, then @5
> should be possible without any other".
> In the solution implemented by Aleksander Alekseev metadata of temporary
> tables is kept in memory and not affecting catalog at all.
> But them still can not be used at replica.
> There are still some serious problems which need to be fixed to able it:
> allow insert/update/delete statements for read-only transactions, somehow
> assign XIDs for them, implement savepoints and rollback of such
> transactions.
> All this was done in the last version of my patch.
> Yes, it doesn't depend on whether we are using shared or private buffers
> for temporary tables. The same approach can be implemented for both of them.
> The question is whether we are really need temp tables at replica and if
> so, do we need full transaction support for them, including rollbacks,
> subtransactions.
>

Temporary tables (of any type) on a replica are an interesting feature that
opens some possibilities. Some queries cannot be optimized as a whole and
should be divided, with some intermediate results stored in temporary
tables, analysed (to get correct statistics), maybe indexed, and after that
the calculation can continue. Right now you can do this only on the master.
Moreover, on a hot standby the data are read only and there is no direct
impact on the master (production), so you can do some heavier calculations
there. And temporary tables are a commonly used technique to fix estimation
errors.
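
(The pattern being referred to, with hypothetical table and column names:
materialize an intermediate result into a temp table, ANALYZE it so the
planner gets real statistics, then continue the calculation against it.)

CREATE TEMP TABLE heavy_step AS
SELECT customer_id, sum(amount) AS total
  FROM orders
 GROUP BY customer_id;

ANALYZE heavy_step;

SELECT c.name, h.total
  FROM customers c
  JOIN heavy_step h USING (customer_id)
 WHERE h.total > 1000;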

I don't think subtransactions, transactions and rollbacks are necessary for
these tables. On the other hand, without them it is a half-cooked feature,
and it can look pretty strange in the PostgreSQL environment.

I am very happy with how much work you are doing in this area; I did not
have the courage to start this job, but I 

Re: [proposal] de-TOAST'ing using a iterator

2019-09-23 Thread Alvaro Herrera
Paul Ramsey, do you have opinions to share about this patch?  I think
PostGIS might benefit from it.  Thread starts here:

https://postgr.es/m/cal-ogks_onzpc9m9bxpcztmofwulcfkyecekiagxzwrl8kx...@mail.gmail.com

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_upgrade check fails on Solaris 10

2019-09-23 Thread Alvaro Herrera
On 2019-Sep-23, Marina Polyakova wrote:

> On 2019-09-18 17:36, Alvaro Herrera wrote:
> > On 2019-Sep-17, Marina Polyakova wrote:
> > 

> > > We got an error for pg_upgrade check on the branch REL_11_STABLE
> > > (commit
> > > 40ad4202513c72f5c1beeb03e26dfbc8890770c0) on Solaris 10 because IIUC
> > > the
> > > argument to the sed command is not enclosed in quotation marks (see
> > > [1]):
> > 
> > Hmm, I'm surprised it has taken this long to detect the problem.
> 
> Looking at the members of buildfarm [1] castoroides and protosciurus  - IIUC
> they do not check pg_upgrade. And I was that lucky one who have run the
> branch with the latest commits at our buildfarm...

Argh.

But I meant "how come nobody runs pg_upgrade tests on old Solaris?"

> > I have pushed it to all branches that have src/bin/pg_upgrade (namely,
> > 9.5 onwards), thanks.  I hope this won't make the msys/mingw machines
> > angry ;-)
> 
> Thank you! I ran pg_upgrade tests for MSYS, everything is fine.
> 
> The branch REL9_4_STABLE (commit 8a17afe84be6fefe76d0d2f4d26c5ee075e64487)
> has the same issue - according to the release table [2] it is still
> supported, isn't it?...

Yeah, but pg_upgrade is in contrib/ in 9.4, so nowhere as good as from
9.5 onwards; and it's going to die in a couple of months anyway, so I'm
not thrilled about fixing this there.

If you *need* to have this fixed in 9.4, we can do that, but do you?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_upgrade check fails on Solaris 10

2019-09-23 Thread Marina Polyakova

On 2019-09-18 17:36, Alvaro Herrera wrote:
> On 2019-Sep-17, Marina Polyakova wrote:
>
>> Hello, hackers!
>>
>> We got an error for pg_upgrade check on the branch REL_11_STABLE (commit
>> 40ad4202513c72f5c1beeb03e26dfbc8890770c0) on Solaris 10 because IIUC the
>> argument to the sed command is not enclosed in quotation marks (see [1]):
>
> Hmm, I'm surprised it has taken this long to detect the problem.

Looking at the members of buildfarm [1] castoroides and protosciurus - IIUC
they do not check pg_upgrade. And I was that lucky one who have run the
branch with the latest commits at our buildfarm...

>> Attached diff.patch fixes the problem.
>
> I have pushed it to all branches that have src/bin/pg_upgrade (namely,
> 9.5 onwards), thanks.  I hope this won't make the msys/mingw machines
> angry ;-)


Thank you! I ran pg_upgrade tests for MSYS, everything is fine.

The branch REL9_4_STABLE (commit 
8a17afe84be6fefe76d0d2f4d26c5ee075e64487) has the same issue - according 
to the release table [2] it is still supported, isn't it?...


[1] https://buildfarm.postgresql.org/cgi-bin/show_members.pl
[2] https://www.postgresql.org/support/versioning/

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




overhead due to casting extra parameters with aggregates (over and over)

2019-09-23 Thread Tomas Vondra

Hi,

I've been working on a custom aggregate, and I've ran into some fairly
annoying overhead due to casting direct parameters over and over. I'm
wondering if there's a way to eliminate this, somehow, without having to
do an explicit cast.

Imagine you have a simple aggregate:

 CREATE AGGREGATE tdigest_percentile(double precision, int, double precision[])
 (
   ...
 );

with two direct parameters (actually, I'm not sure that's the correct
term, because this is not an ordered-set aggregate and [1] only talks
about direct parameters in that context). Anyway, I'm talking about the
extra parameters, after the 'double precision' value being aggregated.

The last parameter is an array of values in [0.0,1.0], representing
percentiles (similarly to what we do in percentile_cont). It's annoying
to write literal values, so let's use a CTE to generate the array:

 WITH
   perc AS (SELECT array_agg(i/100.0) AS p
FROM generate_series(1,99) s(i))
 SELECT
   tdigest_percentile(random(), 100, (SELECT p FROM perc))
 FROM generate_series(1,10000000);

which does work, but it's running for ~180 seconds. When used with an
explicit array literal, it runs in ~1.6 seconds.

 SELECT tdigest_percentile(random(), 100, ARRAY[0.01, ..., 0.99])
 FROM generate_series(1,10000000);

After a while, I've realized that the issue is casting - the CTE
produces a numeric[] array, and we do the cast to double precision[] on
every call to the state transition function (and we do ~10M of those).
The cast is fairly expensive - much more expensive than the aggregate
itself. The explicit literal ends up being the right type, so the whole
query is much faster.

And indeed, adding the explicit cast to the CTE query

 WITH
   perc AS (SELECT array_agg((i/100.0)::double precision) AS p
FROM generate_series(1,99) s(i))
 SELECT
   tdigest_percentile(random(), 100, (SELECT p FROM perc))
 FROM generate_series(1,10000000);

does the trick - the query is ~1.6s again.

I wonder if there's a chance to detect and handle this without having to
do the cast over and over? I'm thinking that's not quite possible,
because the value is not actually guaranteed to be the same for all
calls (even though it's the case for the example I've given).
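
For example (table and column names made up, just to illustrate), nothing
stops a query from supplying a different array on every row, in which case
the cast genuinely has to happen for each call:

 SELECT tdigest_percentile(val, 100, percentiles::double precision[])
 FROM samples;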

But maybe we could flag the parameter somehow, to make it behave more like
a direct parameter (which is only evaluated once). I don't really need
those extra parameters in the transition function at all; it's fine to
just pass them to the final function (and there should be far fewer calls
to that).

regards


[1] https://www.postgresql.org/docs/current/sql-createaggregate.html

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Wrong sentence in the README?

2019-09-23 Thread Daniel Westermann (DWE)
"Daniel Westermann (DWE)"  writes:
>> in the README, top level, there is this:

>> PostgreSQL has many language interfaces, many of which are listed here:
>> https://www.postgresql.org/download

>> I don't think the download page lists any language interfaces or do I miss 
>> something?

>Not directly on that page, though if you drill down into the "software
>catalogue" you can find them.

>Since there's already a link to that same page a bit further down in
>the file, I'm inclined to just remove the quoted sentence.  Maybe
>add something about "and related software" to the later link.  Certainly
>"language interfaces" is just one small part of that.

Thank you, Tom. I'll try to come up with a patch for that in the next few days.

Regards
Daniel


Re: Commit fest 2019-09

2019-09-23 Thread Alvaro Herrera
Hello

The fourth week of this commitfest begins with these numbers:

  statusstring  │ week1 │ week2 │ week3 │ week4
┼───┼───┼───┼───
 Needs review   │   165 │   138 │   116 │   118
 Waiting on Author  │30 │44 │51 │44
 Ready for Committer│11 │ 5 │ 8 │11
 Returned with Feedback │ 1 │ 4 │ 5 │ 5
 Moved to next CF   │ 2 │ 4 │ 4 │ 4
 Committed  │14 │23 │32 │34
 Rejected   │ 1 │ 1 │ 1 │ 1
 Withdrawn  │ 4 │ 9 │11 │11

Progress last week is disappointing.  I hope we manage to get more patches
closed this week.

In terms of bugfixes, this is how we started the month:

   status   │ week1 │ week4 
┼───┼───
 Needs review   │20 │10
 Waiting on Author  │ 4 │ 6
 Ready for Committer│ 2 │ 1
 Returned with Feedback │ 0 │ 1
 Moved to next CF   │ 0 │ 1
 Committed  │ 2 │ 6
 Withdrawn  │ 1 │ 2

We've not made a lot of progress here either, only four bugfix patches
committed in three weeks :-(

In terms of patch author participation in reviewing other people's
patches, the metrics are decidedly disappointing.  Strongest reviewers,
counting replies on threads for patches other than one's own:

* Michael Paquier
* Álvaro Herrera (discounting "please rebase" / "marked WoA" pings)
* Amit Kapila
* Tom Lane

Everyone is welcome to join the party!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [proposal] de-TOAST'ing using a iterator

2019-09-23 Thread Binguo Bao
Alvaro Herrera  wrote on Tue, Sep 17, 2019 at 5:51 AM:

> On 2019-Sep-10, Binguo Bao wrote:
>
> > +/*
> > + * Support for de-TOASTing toasted value iteratively. "need" is a pointer
> > + * between the beginning and end of iterator's ToastBuffer. The macro
> > + * de-TOASTs all bytes before "need" into the iterator's ToastBuffer.
> > + */
> > +#define PG_DETOAST_ITERATE(iter, need)                                      \
> > + do {                                                                       \
> > +     Assert((need) >= (iter)->buf->buf && (need) <= (iter)->buf->capacity); \
> > +     while (!(iter)->done && (need) >= (iter)->buf->limit) {                \
> > +         detoast_iterate(iter);                                             \
> > +     }                                                                      \
> > + } while (0)
> >  /* WARNING -- unaligned pointer */
> >  #define PG_DETOAST_DATUM_PACKED(datum) \
> >   pg_detoast_datum_packed((struct varlena *) DatumGetPointer(datum))
>
> In broad terms this patch looks pretty good to me.  I only have a small
> quibble with this API definition in fmgr.h -- namely that it forces us
> to export the definition of all the structs (that could otherwise be
> private to toast_internals.h) in order to satisfy callers of this macro.
> I am wondering if it would be possible to change detoast_iterate and
> PG_DETOAST_ITERATE in a way that those details remain hidden -- I am
> thinking something like "if this returns NULL, then iteration has
> finished"; and relieve the macro from doing the "->buf->buf" and
> "->buf->limit" checks.  I think changing that would require a change in
> how the rest of the code is structured around this (the textpos internal
> function), but seems like it would be better overall.
>
> (AFAICS that would enable us to expose much less about the
> iterator-related structs to detoast.h -- you should be able to move the
> struct defs to toast_internals.h)
>
> Then again, it might be just wishful thinking, but it seems worth
> considering at least.
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

I've tried to hide the details of the struct in patch v11 by checking the
"need" pointer inside the detoast_iterate() function. I also compared the
performance of the two versions.

                 patch v10    patch v11
  comp. beg.     1413ms       1489ms
  comp. end      24327ms      28011ms
  uncomp. beg.   1439ms       1432ms
  uncomp. end    25019ms      29007ms

We can see that v11 is about 15% slower than v10 on suffix queries, since
those involve complete de-TOASTing and the detoast_iterate() function is
called frequently in v11.

Personally, I prefer patch v10. Its performance is superior, although it
exposes some struct details.

-- 
Best regards,
Binguo Bao
From 8f6381d272816175d3e681c9d1cd8c6b5f27f44f Mon Sep 17 00:00:00 2001
From: BBG 
Date: Tue, 4 Jun 2019 22:56:42 +0800
Subject: [PATCH] de-TOASTing using a iterator

---
 src/backend/access/common/detoast.c | 110 +
 src/backend/access/common/toast_internals.c | 355 
 src/backend/utils/adt/varlena.c |  29 ++-
 src/include/access/detoast.h|  32 +++
 src/include/access/toast_internals.h|  75 ++
 src/include/fmgr.h  |   8 +
 6 files changed, 603 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/common/detoast.c b/src/backend/access/common/detoast.c
index c8b49d6..f437d27 100644
--- a/src/backend/access/common/detoast.c
+++ b/src/backend/access/common/detoast.c
@@ -290,6 +290,116 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 }
 
 /* --
+ * create_detoast_iterator -
+ *
+ * It only makes sense to initialize a de-TOAST iterator for external on-disk values.
+ *
+ * --
+ */
+DetoastIterator
+create_detoast_iterator(struct varlena *attr)
+{
+	struct varatt_external toast_pointer;
+	DetoastIterator iter;
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
+	{
+		iter = (DetoastIterator) palloc0(sizeof(DetoastIteratorData));
+		iter->done = false;
+
+		/* This is an externally stored datum --- initialize fetch datum iterator */
+		iter->fetch_datum_iterator = create_fetch_datum_iterator(attr);
+		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+		if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
+		{
+			iter->compressed = true;
+
+			/* prepare buffer to received decompressed data */
+			iter->buf = create_toast_buffer(toast_pointer.va_rawsize, false);
+
+			/* initialize state for pglz_decompress_iterate() */
+			iter->ctrl = 0;
+			iter->ctrlc = INVALID_CTRLC;
+		}
+		else
+		{
+			iter->compressed = false;
+
+			/* point the buffer directly at the raw data */
+			iter->buf = iter->fetch_datum_iterator->buf;
+		}
+		return iter;
+	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		/* indirect pointer --- dereference it */
+		struct varatt_indirect redirect;
+
+		

Proposal: Better query optimization for "NOT IN" clause

2019-09-23 Thread John Bester

  
  
One-line Summary:
Better query optimization for "NOT IN" clause



Business Use-case:
Using x NOT IN (SELECT y FROM target) on extremely large tables can
be done very fast. This might be necessary in order to introduce foreign
keys where old systems relied on a user to type in a reference manually.
Typically I am using NOT IN to find invalid foreign keys which were not
protected by old database designs. More specifically, the following query
would return what I am looking for:

SELECT source, docno
FROM tableA
WHERE (source, docno) NOT IN (SELECT b.source, b.docno FROM tableB b)

The query took a few hours (tableB contains 1.8m rows and tableA 7.8m rows).



User impact with the change:
A search for invalid foreign keys can be reduced from hours or days to
seconds. The reasoning behind this implementation is as follows: if I can
return all rows from "SELECT col1, col2, ... FROM tableA ORDER BY col1,
col2" in a few seconds, then calculating the missing / NOT IN (or existing
/ IN) values should be possible in a few seconds as well.



Implementation details:
I have written a general-purpose plpgsql function which returns a
SETOF text[]. The function takes two table names with two sets of column
names and, optionally, two filters on the respective tables. Unfortunately
I had to cast values to text to make this work in a plpgsql function, but
I got the results I needed.

My required SQL:
SELECT source, docno
FROM tableA
WHERE (source, docno) NOT IN (SELECT b.source, b.docno FROM tableB b)

The query took a few hours (tableB contains 1.8m rows and tableA 7.8m rows).

I used my function as follows:

SELECT DISTINCT a[1], a[2]
FROM util.notIn('tableA', 'source, docno', 'tableB', 'source, docno', null, null) a
ORDER BY a[1], a[2]

This function returned what I needed in 51 seconds, whereas the NOT IN
statement ran for hours.

Before including the plpgsql source, let me explain what it does:
The function builds up two SQL statements using the given values
(remember I did not specify
any filter on either of the two tables - the last two parameters are
null).
Resulting statements are:

Primary: SELECT DISTINCT ARRAY[source::text, docno::text] FROM
tableA WHERE source IS NOT NULL AND docno IS NOT NULL ORDER BY
ARRAY[source::text, docno::text]
Secondary: SELECT DISTINCT ARRAY[source::text, docno::text] FROM
tableB WHERE source IS NOT NULL AND docno IS NOT NULL ORDER BY
ARRAY[source::text, docno::text]

(As you can see, I stripped out NULLs, because those I can check for
easily with a simple query.)
Now I open two cursors and fetch the first record of both statements into
separate variables. Since the FOUND variable is overwritten with every
FETCH, I store the states in my own variables. Ordering the results is
important, since the function relies on it.
A loop now advances one or both of the two result sets based on whether
the current rows are equal or not (if they are not equal, a row is returned
when the primary row < secondary row), and only fetches from the dataset
whose current row is the lesser of the two. You can terminate early when
the end of the primary dataset has been reached; alternatively, if the
secondary dataset reaches its end, you can simply add all remaining rows
from the primary dataset to the result.

The best implementation for this would be to detect a NOT IN (or IN)
WHERE clause during the query optimization phase and use a sequential scan
as done in the function. It is important that the IN / NOT IN select
statement must not be dependent on the current row of the main SELECT.

A real long-term solution would actually be to extend the SQL language to
allow for a different type of join. To optimize IN, you could rewrite your
query as an INNER JOIN (provided you join to a DISTINCT dataset). However,
NOT IN is not so easy (except for a LEFT JOIN testing for NULL in the
secondary table). What I would propose is the following:

SELECT TableA.colA
FROM TableA
[LEFT | RIGHT] MISSING JOIN TableB ON TableB.colB = TableA.colA
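
For comparison, the closest equivalents available today are the LEFT JOIN
form mentioned above and a NOT EXISTS, which the planner can execute as an
anti-join. A sketch on the example's tables (note that, unlike NOT IN, these
are only equivalent once NULL keys are stripped out, which the approach
above already does):

SELECT a.source, a.docno
FROM tableA a
WHERE NOT EXISTS (SELECT 1 FROM tableB b
                  WHERE b.source = a.source AND b.docno = a.docno);

-- or the LEFT JOIN form:
SELECT a.source, a.docno
FROM tableA a
LEFT JOIN tableB b ON b.source = a.source AND b.docno = a.docno
WHERE b.source IS NULL;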

CREATE OR REPLACE FUNCTION util.notIn(pTableA text, pFieldA text,
pTableB text, pFieldB text, pFilterA text, pFilterB text) RETURNS
SETOF text[] AS
$BODY$
DECLARE
  vFieldsA text[];
  vFieldsB text[];
  vValueA text[];
  vValueB text[];
  vRowA record;
  vRowB record;
  vSQLA text;
  vSQLB text;
  vWhereA text;
  vWhereB text;
  vSelectA text;
  vSelectB text;
  vCursorA refcursor;
  vCursorB refcursor;
  vFirst integer;
  vLast integer;
  vNdx integer;
  vMoreA boolean;
  vMoreB boolean;
BEGIN
  IF pTableA 

Re: Unwanted expression simplification in PG12b2

2019-09-23 Thread Robert Haas
On Sun, Sep 22, 2019 at 7:47 AM Darafei "Komяpa" Praliaskouski
 wrote:
> A heuristic I believe should help my case (and I hardly imagine how it can 
> break others) is that in presence of Gather, all the function calls that are 
> parallel safe should be pushed into it.

The cost of pushing data through the Sort is not necessarily
insignificant.  Your functions are (IIUC) extremely expensive, so it's
worth going to any length to reduce the time spent evaluating them.
However, if someone has ||(text,text) in the tlist, that might be the
wrong approach, because it's not saving much to compute that earlier
and it might make the sort a lot wider, especially if de-TOASTing is
involved.
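
To illustrate with made-up names: in something like

  SELECT expensive_fn(geom), a || b
  FROM t
  ORDER BY id;

pushing expensive_fn() below the Gather is a clear win, but evaluating
a || b below the Sort mostly just makes every row passed through the sort
wider.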

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Efficient output for integer types

2019-09-23 Thread Andrew Gierth
> "David" == David Fetter  writes:

 David> + return pg_ltostr_zeropad(str, (uint32)0 - (uint32)value, minwidth - 1);

No, this is just reintroducing the undefined behavior again. Once the
value has been converted to unsigned you can't cast it back to signed or
pass it to a function expecting a signed value, since it will overflow
in the INT_MIN case. (and in this example would probably output '-'
signs until it ran off the end of memory).

Here's how I would do it:

char *
pg_ltostr_zeropad(char *str, int32 value, int32 minwidth)
{
    int32       len;
    uint32      uvalue = value;

    Assert(minwidth > 0);

    if (value >= 0)
    {
        if (value < 100 && minwidth == 2)   /* Short cut for common case */
        {
            memcpy(str, DIGIT_TABLE + value*2, 2);
            return str + 2;
        }
    }
    else
    {
        *str++ = '-';
        minwidth -= 1;
        uvalue = (uint32)0 - uvalue;
    }

    len = pg_ultoa_n(uvalue, str);
    if (len >= minwidth)
        return str + len;

    memmove(str + minwidth - len, str, len);
    memset(str, '0', minwidth - len);
    return str + minwidth;
}

 David>  pg_ltostr(char *str, int32 value)
 David> +   int32   len = pg_ultoa_n(value, str);
 David> +   return str + len;

This seems to have lost its handling of negative numbers entirely (which
doesn't say much for the regression test coverage)

-- 
Andrew (irc:RhodiumToad)




Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node

2019-09-23 Thread Tomas Vondra

On Mon, Sep 23, 2019 at 03:48:50PM +0800, Thunder wrote:

Is this an issue?
Can we fix it like this?
Thanks!



I do think it is a valid issue. No opinion on the fix yet, though.
The report was sent on Saturday, so patience ;-)

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Hi guys, HELP please

2019-09-23 Thread Tomas Vondra

On Fri, Sep 20, 2019 at 09:21:59PM +, Castillo, Steven (Agile)
wrote:

Hi,

I wonder if you guys can help me with this, I've been struggling with
this query for almost a week and I haven't been able to tune it, it
runs forever and I need it to run fast.



Hard to say, because all we have is an explain without any additional
information (like amount of data, PostgreSQL version, settings like
work_mem). Maybe look at [1] which explains what to try, and also what
to include in your question.

[1] https://wiki.postgresql.org/wiki/Slow_Query_Questions

Now, if I had to guess, I'd say this is a case of underestimate, causing
a choice of nested loops. That's fairly deadly.

In particular, I'm talking about this:

->  Seq Scan on t_territory_common tc  (cost=0.00..6494012.54 rows=49 width=232)
Filter: (((source)::text = 'DSCHED'::text) AND ... many conditions  


How many rows does this return when you query just this table (with all
the conditions)? Chances are those conditions are correlated, in which
case the number of rows is much higher than 49 (possibly by orders of
magnitude).

If that's the case, you have multiple options:

1) create a temporary table, and then join against it (it can be analyzed,
so the estimates are likely much better)

2) disable nested loops for this query (useful for testing/investigation)

3) create extended statistics on those correlated columns (depends on
which PostgreSQL version you use) - see the sketch after this list

4) redo the table schema (e.g. have a special column representing the
combination of those columns), so that there's just a single condition
(thus no misestimate due to correlation)
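
For (3), a minimal sketch (PostgreSQL 10 or later) - the second column name
here is made up, use whichever columns from that WHERE clause are actually
correlated:

  CREATE STATISTICS t_territory_common_stats (dependencies)
    ON source, territory_type FROM t_territory_common;
  ANALYZE t_territory_common;

And for (2), just for testing:

  SET enable_nestloop = off;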


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Hi guys, HELP please

2019-09-23 Thread Castillo, Steven (Agile)
Hi,

I wonder if you guys can help me with this. I've been struggling with this 
query for almost a week and I haven't been able to tune it; it runs forever 
and I need it to run fast.

Regards.

Steven Castillo



Time Release Plan.sql
Description: Time Release Plan.sql


Time Release Query.sql
Description: Time Release Query.sql


Re: Attempt to consolidate reading of XLOG page

2019-09-23 Thread Antonin Houska
Alvaro Herrera  wrote:

> I was confused by the struct name XLogSegment -- the struct is used to
> represent a WAL segment while it's kept open, rather than just a WAL
> segment in abstract.  Also, now that we've renamed everything to use the
> term WAL, it seems wrong to use the name XLog for new structs.  I
> propose the name WALOpenSegment for the struct, which solves both
> problems.  (Its initializer function would get the name
> WALOpenSegmentInit.)
> 
> Now, the patch introduces a callback for XLogRead, the type of which is
> called XLogOpenSegment.  If we rename it from XLog to WAL, both names
> end up the same.  I propose to rename the function type to
> WALSegmentOpen, which in a "noun-verb" view of the world, represents the
> action of opening a WAL segment.
> 
> I attach a patch for all this renaming, on top of your series.

ok, thanks.

In addition I renamed WalSndOpenSegment() to WalSndSegmentOpen() and
read_local_xlog_page_open_segment() to read_local_xlog_page_segment_open()

> I wonder if each of those WALSegmentOpen callbacks should reset [at
> least some members of] the struct; they're already in charge of setting
> ->file, and apparently we're leaving the responsibility of setting the
> rest of the members to XLogRead.  That seems weird.  Maybe we should say
> that the CB should only open the segment and not touch the struct at all
> and XLogRead is in charge of everything.  Perhaps the other way around
> -- the CB should set everything correctly ... I'm not sure which is
> best.  But having half here and half there seems a recipe for confusion
> and bugs.

ok, I've changed the CB signature. Now it receives pointers to the two
variables that it can change, while the "seg" argument is documented as
read-only. To indicate that the CB should determine the timeline itself, I
introduced a new constant InvalidTimeLineID, see the 0004 part.

> Another thing I didn't like much is that everything seems to assume that
> the only error possible from XLogRead is a read error.  Maybe that's
> okay, because it seems to be the current reality, but it seemed odd.

In this case I only moved the ereport() code out of XLogRead() (so that
both backend and frontend can call the function). Given that the code to
open a WAL segment is now in the callbacks, the only thing that XLogRead()
can ereport is that read() failed. BTW, I introduced one more structure,
XLogReadError, in this patch version. I think it's better than adding
error-specific fields to the WALOpenSegment structure.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

>From 674fa97ef9df8b1fe875139aa7b43f605255d8cd Mon Sep 17 00:00:00 2001
From: Antonin Houska 
Date: Mon, 23 Sep 2019 07:40:49 +0200
Subject: [PATCH 2/6] Remove TLI from some argument lists.

The timeline information is available to caller via XLogReaderState. Now that
XLogRead() is gonna be (sometimes) responsible for determining the TLI, it
would have to be added the (TimeLineID *) argument too, just to be consistent
with the current coding style. Since XLogRead() updates also other
position-specific fields of XLogReaderState, it seems simpler if we remove the
output argument from XLogPageReadCB and always report the TLI via
XLogReaderState.
---
 src/backend/access/transam/xlog.c  |  7 +++
 src/backend/access/transam/xlogreader.c|  6 +++---
 src/backend/access/transam/xlogutils.c | 11 ++-
 src/backend/replication/logical/logicalfuncs.c |  4 ++--
 src/backend/replication/walsender.c|  2 +-
 src/bin/pg_rewind/parsexlog.c  |  8 +++-
 src/bin/pg_waldump/pg_waldump.c|  2 +-
 src/include/access/xlogreader.h|  8 +++-
 src/include/access/xlogutils.h |  5 ++---
 src/include/replication/logicalfuncs.h |  2 +-
 10 files changed, 25 insertions(+), 30 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b7ff004234..7a89dfed7f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -885,8 +885,7 @@ static int	XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 		 int source, bool notfoundOk);
 static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
 static int	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
-		 int reqLen, XLogRecPtr targetRecPtr, char *readBuf,
-		 TimeLineID *readTLI);
+		 int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		bool fetching_ckpt, XLogRecPtr tliRecPtr);
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
@@ -11523,7 +11522,7 @@ CancelBackup(void)
  */
 static int
 XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
-			 XLogRecPtr targetRecPtr, char *readBuf, TimeLineID *readTLI)
+			 XLogRecPtr targetRecPtr, char *readBuf)
 {
 	XLogPageReadPrivate *private =

Re: pgbench - allow to create partitioned tables

2019-09-23 Thread Amit Kapila
On Mon, Sep 23, 2019 at 11:58 AM Fabien COELHO  wrote:
>
>
> Hello Amit,
>
> > It is better for a user to write a custom script for such cases.
>
> I kind-of agree, but IMHO this is not for pgbench to decide what is better
> for the user and to fail on a script that would not fail.
>
> > Because after that "select-only" or "simple-update" script doesn't
> > make any sense. [...].
>
> What makes sense in a benchmarking context may not be what you think. For
> instance, AFAICR, I already removed benevolent but misplaced guards which
> were preventing running scripts without queries: if one wants to look at
> pgbench overheads because they are worried that it may be too high, they
> really need to be allowed to run such scripts.
>
> This is not for us to decide, and as I already said they do if you want to
> test no-op overheads. Moreover the problem pre-exists: if the user deletes
> the contents of pgbench_accounts these scripts are no-op, and we do not
> complain. The no partition attached is just a particular case.
>
> > Having said that, I see your point and don't mind allowing such cases
> > until we don't have to write special checks in the code to support such
> > cases.
>
> Indeed, it is also simpler to not care about such issues in the code.
>

If you agree with this, then why haven't you changed the below check in the patch:
+ if (partition_method != PART_NONE)
+ printf("partition method: %s\npartitions: %d\n",
+PARTITION_METHOD[partition_method], partitions);

This is exactly the thing bothering me.  It won't be easy for others
to understand why this check for partitioning information is different
from other checks.  For you or me, it might be okay as we have
discussed this case, but it won't be apparent to others.  This doesn't
buy us much, so it is better to keep this code consistent with other
places that check for partitions.

> > [...] Now, we can have a detailed comment in printResults to explain why
> > we have a different check there as compare to other code paths or change
> > other code paths to have a similar check as printResults, but I am not
> > convinced of any of those options.
>
> Yep. ISTM that the current version is reasonable.
>
> > [...] I am talking about the call to append_fillfactor in
> > createPartitions() function.  See, in my version, there are some
> > comments.
>
> Ok, I understand that you want a comment. Patch v15 does that.
>

Thanks!



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Efficient output for integer types

2019-09-23 Thread Tels

Moin,

On 2019-09-22 23:58, David Fetter wrote:

On Sat, Sep 21, 2019 at 07:29:25AM +0100, Andrew Gierth wrote:

> "David" == David Fetter  writes:



Fixed.


Good work, more performance is sure nice :)

Noticed one more thing in the patch:


-   *start++ = *a;
-   *a-- = swap;
+   memcpy(pos - 2, DIGIT_TABLE + c, 2);
+   i += 2;
}
+   else
+   *a = (char) ('0' + value2);
+
+   return olength;
}


The line "i += 2;" modifies i, but i is never used again nor returned.

Best regards,

Tels




Re: Global temporary tables

2019-09-23 Thread Konstantin Knizhnik



On 20.09.2019 19:43, Pavel Stehule wrote:


1. I do not need them at all.
2. Eliminate catalog bloating.
3. Mostly needed for compatibility with Oracle (simplify porting,...).
4. Parallel query execution.
5. Can be used at replica.
6. More efficient use of resources (first of all memory).


There can be another point important for the cloud. Inside some clouds 
there are usually two types of disks - persistent (slow) and ephemeral 
(fast). We used temp tables effectively there because we moved the temp 
tablespace to ephemeral disks.


Yes, I have already heard this argument and agree with it.
I just want to note two things:
1. My assumption is that in most cases the data of a temporary table can fit 
in memory (certainly if we are not limiting it to temp_buffers = 8MB, but 
store it in shared buffers), and so there is no need to write it to 
persistent media at all.
2. Global temp tables do not replace local temp tables, which are accessed 
through local buffers. So if you want to use temporary storage, you will 
always have a way to do it.
The question is whether we need to support two kinds of global temp 
tables (with shared or private buffers) or just implement one of them.




I am missing one point in your list - developer's comfort - using temp 
tables is just much more comfortable - you don't need to create them again 
and again. Due to this behaviour it is possible to reduce @2, and @3 can be 
a nice side effect. If you reduce @2 to zero, then @5 should be possible 
without anything else.



Sorry, I do not completely understand your point here.
You can use a normal (permanent) table and you will not have to create it 
again and again. It is also possible to use it for storing temporary data - 
you just need to truncate the table when the data is no longer needed.
Certainly you cannot use the same table in more than one backend. Here 
lies the main advantage of temp tables - you get storage for per-session 
data without having to worry about possible name conflicts.


On the other hand, there are many cases where the format of the temporary 
data is not statically known: it is determined dynamically during program 
execution.
In this case a local temp table provides the most convenient mechanism for 
working with such data.


This is why I think that we need to have both local and global temp tables.

Also, I do not agree with your statement "If you reduce @2 to zero, then 
@5 should be possible without anything else".
In the solution implemented by Aleksander Alekseev the metadata of temporary 
tables is kept in memory and does not affect the catalog at all.

But they still cannot be used on a replica.
There are still some serious problems which need to be fixed to enable that: 
allowing insert/update/delete statements in read-only transactions, somehow 
assigning XIDs for them, and implementing savepoints and rollback of such 
transactions.

All this was done in the last version of my patch.
Yes, it doesn't depend on whether we are using shared or private buffers 
for temporary tables. The same approach can be implemented for both of them.
The question is whether we really need temp tables on a replica and, if 
so, whether we need full transaction support for them, including rollbacks 
and subtransactions.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re:PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node

2019-09-23 Thread Thunder
Is this an issue? 
Can we fix it like this?
Thanks!






At 2019-09-22 00:38:03, "Thunder"  wrote:

The steps to reproduce this issue:
1. Create a table
create table gist_point_tbl(id int4, p point);
create index gist_pointidx on gist_point_tbl using gist(p);
2. Insert data
insert into gist_point_tbl (id, p) select g,point(g*10, g*10) from 
generate_series(1, 100) g;
3. Delete data
 delete from gist_point_tbl;
4. Vacuum table
vacuum gist_point_tbl;
-- Send SIGINT to the vacuum process after the WAL record of the truncation is 
flushed but before the truncation is finished
-- We will receive the error message "ERROR:  canceling statement due to user 
request"
5. Vacuum table again
vacuum gist_point_tbl;
-- The standby node crashed and the PANIC log is "PANIC:  WAL contains 
references to invalid pages"


The standby node succeeds in replaying the truncate record, but the master node 
doesn't truncate the file; the standby will crash if the master node later writes 
to blocks that were truncated on the standby node.
I try to fix the issue by preventing query cancel interrupts during the truncation.


diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 5df4382b7e..04b696ae01 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -26,6 +26,7 @@
 #include "access/xlogutils.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
+#include "miscadmin.h"
 #include "storage/freespace.h"
 #include "storage/smgr.h"
 #include "utils/memutils.h"
@@ -248,6 +249,14 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
if (vm)
visibilitymap_truncate(rel, nblocks);


+   /*
+    * If the master flushes the WAL record of the truncation but is then
+    * cancelled (SIGINT) before actually truncating, a standby that replayed
+    * the record will crash once the master later writes to blocks that were
+    * truncated on the standby. So we prevent query cancel interrupts here.
+    */
+   HOLD_CANCEL_INTERRUPTS();
+
/*
 * We WAL-log the truncation before actually truncating, which means
 * trouble if the truncation fails. If we then crash, the WAL replay
@@ -288,6 +297,8 @@ RelationTruncate(Relation rel, BlockNumber nblocks)


/* Do the real work */
smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
+
+   RESUME_CANCEL_INTERRUPTS();
 }




 

Re: pgbench - allow to create partitioned tables

2019-09-23 Thread Fabien COELHO


Hello Amit,


It is better for a user to write a custom script for such cases.


I kind-of agree, but IMHO this is not for pgbench to decide what is better 
for the user and to fail on a script that would not fail.



Because after that "select-only" or "simple-update" script doesn't
make any sense. [...].


What makes sense in a benchmarking context may not be what you think. For 
instance, AFAICR, I already removed benevolent but misplaced guards which 
were preventing running scripts without queries: if one wants to look at 
pgbench overheads because they are worried that it may be too high, they 
really need to be allowed to run such scripts.


This is not for us to decide, and as I already said they do if you want to 
test no-op overheads. Moreover the problem pre-exists: if the user deletes 
the contents of pgbench_accounts these scripts are no-op, and we do not 
complain. The no partition attached is just a particular case.


Having said that, I see your point and don't mind allowing such cases 
until we don't have to write special checks in the code to support such 
cases.


Indeed, it is also simpler to not care about such issues in the code.

[...] Now, we can have a detailed comment in printResults to explain why 
we have a different check there as compared to other code paths or change 
other code paths to have a similar check as printResults, but I am not 
convinced of any of those options.


Yep. ISTM that the current version is reasonable.

[...] I am talking about the call to append_fillfactor in 
createPartitions() function.  See, in my version, there are some 
comments.


Ok, I understand that you want a comment. Patch v15 does that.

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index c857aa3cba..e3a0abb4c7 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -306,6 +306,31 @@ pgbench  options  d
   
  
 
+ 
+  --partitions=NUM
+  
+   
+Create a partitioned pgbench_accounts table with
+NUM partitions of nearly equal size for
+the scaled number of accounts.
+Default is 0, meaning no partitioning.
+   
+  
+ 
+
+ 
+  --partition-method=NAME
+  
+   
+Create a partitioned pgbench_accounts table with
+NAME method.
+Expected values are range or hash.
+This option requires that --partitions is set to non-zero.
+If unspecified, default is range.
+   
+  
+ 
+
  
   --tablespace=tablespace
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ed7652bfbf..9e45f00fb5 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -186,6 +186,25 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
+/*
+ * Number of "pgbench_accounts" partitions, found or to create.
+ * When creating, 0 is the default and means no partitioning.
+ * When running, this is the actual number of partitions.
+ */
+static int	partitions = 0;
+
+/* partitioning strategy for "pgbench_accounts" */
+typedef enum
+{
+	PART_NONE,		/* no partitioning */
+	PART_RANGE,	/* range partitioning */
+	PART_HASH		/* hash partitioning */
+}
+			partition_method_t;
+
+static partition_method_t partition_method = PART_NONE;
+static const char *PARTITION_METHOD[] = {"none", "range", "hash"};
+
 /* random seed used to initialize base_random_sequence */
 int64		random_seed = -1;
 
@@ -617,6 +636,9 @@ usage(void)
 		   "  --foreign-keys   create foreign key constraints between tables\n"
 		   "  --index-tablespace=TABLESPACE\n"
 		   "   create indexes in the specified tablespace\n"
+		   "  --partitions=NUM partition pgbench_accounts in NUM parts (default: 0)\n"
+		   "  --partition-method=(range|hash)\n"
+		   "   partition pgbench_accounts with this method (default: range)\n"
 		   "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tablescreate tables as unlogged tables\n"
 		   "\nOptions to select what to run:\n"
@@ -3601,6 +3623,87 @@ initDropTables(PGconn *con)
 	 "pgbench_tellers");
 }
 
+/*
+ * add fillfactor percent option.
+ */
+static void
+append_fillfactor(char *opts, int len)
+{
+	/* as default is 100, it could be removed in this case */
+	snprintf(opts + strlen(opts), len - strlen(opts),
+			 " with (fillfactor=%d)", fillfactor);
+}
+
+/*
+ * Create "pgbench_accounts" partitions if needed.
+ *
+ * This is the larger table of pgbench default tpc-b like schema
+ * with a known size, so that it can be partitioned by range.
+ */
+static void
+createPartitions(PGconn *con)
+{
+	char		ff[64];
+
+	ff[0] = '\0';
+
+	/*
+	 * Per ddlinfo in initCreateTables below, fillfactor is needed on
+	 * table pgbench_accounts.
+	 */
+	append_fillfactor(ff, sizeof(ff));
+
+	Assert(partitions > 0);
+
+	fprintf(stderr, "creating %d partitions...\n",