Re: [proposal] recovery_target "latest"

2019-11-12 Thread Kyotaro Horiguchi
At Fri, 8 Nov 2019 16:08:47 +0300, Grigory Smolkin  
wrote in 
> 
> On 11/8/19 7:00 AM, Grigory Smolkin wrote:
> >
> > On 11/7/19 4:36 PM, Grigory Smolkin wrote:
> >> I gave it some thought and now think that prohibiting recovery_target
> >> 'latest' and standby was a bad idea.
> >> All recovery_targets follow the same pattern of usage, so
> >> recovery_target "latest" also must be capable of working in standby
> >> mode.
> >> All recovery_targets have a clear deterministic 'target' where
> >> recovery should end.
> >> In case of recovery_target "latest" this target is the end of
> >> available WAL, therefore the end of available WAL must be more clearly
> >> defined.
> >> I will work on it.
> >>
> >> Thank you for the feedback.
> >
> >
> > Attached new patch revision; now the end of available WAL is defined
> > as the fact that required WAL is missing.
> > In the case of standby, the end of WAL is defined as 2 consecutive
> > switches of WAL source that didn't provide the requested record.
> > In the case of streaming standby, each switch of WAL source is forced
> > after 3 failed attempts to get new data from the walreceiver.
> >
> > All constants are taken off the top of my head and serve as a proof
> > of concept.
> > TAP test is updated.
> >
> Attached new revision, it contains some minor refactoring.

Thanks for the new patch. I found that it needs more than I thought,
but it seems a bit too complicated and less stable.

As the patch does, WaitForWALToBecomeAvailable needs to exit when the
available sources are exhausted. However, we don't need to count
failures to do that. It is enough for the function to have two more
exit points: when the streaming timeout fires, and when we find that
streaming is not set up after archive/wal have failed.

In my opinion, it is better to have less dependency on global
variables at such deep levels of the call hierarchy. Such information
can be stored in XLogPageReadPrivate.

I think the doc needs to explain the difference between default
and latest.

Please find the attached, which illustrates the first two points
above.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3b766e66b9..8c8a1c6bf0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -808,6 +808,7 @@ typedef struct XLogPageReadPrivate
 	int			emode;
 	bool		fetching_ckpt;	/* are we fetching a checkpoint record? */
 	bool		randAccess;
+	bool		return_on_eow;  /* returns when reaching End of WAL */
 } XLogPageReadPrivate;
 
 /*
@@ -887,7 +888,9 @@ static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
 static int	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
 		 int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
-		bool fetching_ckpt, XLogRecPtr tliRecPtr);
+		bool fetching_ckpt,
+		XLogRecPtr tliRecPtr,
+		bool return_on_eow);
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 static void XLogFileClose(void);
 static void PreallocXlogFiles(XLogRecPtr endptr);
@@ -4253,6 +4256,7 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 	private->fetching_ckpt = fetching_ckpt;
 	private->emode = emode;
 	private->randAccess = (RecPtr != InvalidXLogRecPtr);
+	private->return_on_eow = (recoveryTarget == RECOVERY_TARGET_LATEST);
 
 	/* This is the first attempt to read this page. */
 	lastSourceFailed = false;
@@ -4371,8 +4375,12 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 continue;
 			}
 
-			/* In standby mode, loop back to retry. Otherwise, give up. */
-			if (StandbyMode && !CheckForStandbyTrigger())
+			/*
+			 * In standby mode, loop back to retry. Otherwise, give up.
+			 * If we are told to return on end of WAL, also give up.
+			 */
+			if (StandbyMode && !CheckForStandbyTrigger() &&
+!private->return_on_eow)
 continue;
 			else
 return NULL;
@@ -7271,7 +7279,7 @@ StartupXLOG(void)
 			 * end of main redo apply loop
 			 */
 
-			if (reachedStopPoint)
+			if (reachedStopPoint || recoveryTarget == RECOVERY_TARGET_LATEST)
 			{
 if (!reachedConsistency)
 	ereport(FATAL,
@@ -7473,6 +7481,8 @@ StartupXLOG(void)
 	 recoveryStopName);
 		else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
 			snprintf(reason, sizeof(reason), "reached consistency");
+		else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+			snprintf(reason, sizeof(reason), "end of WAL");
 		else
 			snprintf(reason, sizeof(reason), "no recovery target specified");
 
@@ -11605,7 +11615,8 @@ retry:
 		if (!WaitForWALToBecomeAvailable(targetPagePtr + reqLen,
 		 private->randAccess,
 		 private->fetching_ckpt,
-		 targetRecPtr))
+		 targetRecPtr,
+		 private->return_on_eow))
 		{
 			if (readFile >= 0)
 close(readFile);
@@ -11745,7 +11756,9 @@ next_record

Re: FETCH FIRST clause WITH TIES option

2019-11-12 Thread Surafel Temesgen
On Mon, Nov 11, 2019 at 5:56 PM Alvaro Herrera 
wrote:

> First, I noticed that there's a significant unanswered issue from Andrew
> Gierth about this using a completely different mechanism, namely an
> implicit window function.  I see that Robert was concerned about the
> performance of Andrew's proposed approach, but I don't see any reply
> from Surafel on the whole issue.
>
>
I don't know much about the window function implementation, but I am
sure that teaching window functions about LIMIT and implementing this
LIMIT variant on top of them will not be much simpler or more
performant than the current implementation. I also think the more
appropriate place to include the LIMIT variant is limitNode, not some
other place, which could cause redundancy and maintainability issues.

regards

Surafel


Re: Coding in WalSndWaitForWal

2019-11-12 Thread Michael Paquier
On Tue, Nov 12, 2019 at 11:27:16AM -0800, Andres Freund wrote:
> It seems to me it'd be better to just remove the "get a more recent
> flush pointer" block - it doesn't seem to currently surve a meaningful
> purpose.

+1.  That was actually my suggestion upthread :)
--
Michael




Re: [PATCH] use separate PartitionedRelOptions structure to store partitioned table options

2019-11-12 Thread Michael Paquier
On Tue, Nov 12, 2019 at 01:50:03PM +0900, Michael Paquier wrote:
> We have been through great length to have build_reloptions, so
> wouldn't it be better to also have this code path do so?  Sure you
> need to pass NULL for the parsing table..  But there is a point to
> reduce the code paths using directly parseRelOptions() and the
> follow-up, expected calls to allocate and to fill in the set of
> reloptions.

So I have been looking at this one, and finished with the attached.
It looks much better to use build_reloptions() IMO, taking advantage
of the same sanity checks present for the other relkinds.
--
Michael
diff --git a/src/include/access/reloptions.h b/src/include/access/reloptions.h
index db42aa35e0..ee7248ad58 100644
--- a/src/include/access/reloptions.h
+++ b/src/include/access/reloptions.h
@@ -306,6 +306,7 @@ extern bytea *default_reloptions(Datum reloptions, bool validate,
  relopt_kind kind);
 extern bytea *heap_reloptions(char relkind, Datum reloptions, bool validate);
 extern bytea *view_reloptions(Datum reloptions, bool validate);
+extern bytea *partitioned_table_reloptions(Datum reloptions, bool validate);
 extern bytea *index_reloptions(amoptions_function amoptions, Datum reloptions,
 			   bool validate);
 extern bytea *attribute_reloptions(Datum reloptions, bool validate);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 8b8b237f0d..1eb323a5d4 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -277,6 +277,16 @@ typedef struct StdRdOptions
 #define HEAP_MIN_FILLFACTOR			10
 #define HEAP_DEFAULT_FILLFACTOR		100
 
+/*
+ * PartitionedRelOptions
+ * Binary representation of relation options for partitioned tables.
+ */
+typedef struct PartitionedRelOptions
+{
+   int32   vl_len_;/* varlena header (do not touch directly!) */
+   /* No options defined yet */
+} PartitionedRelOptions;
+
 /*
  * RelationGetToastTupleTarget
  *		Returns the relation's toast_tuple_target.  Note multiple eval of argument!
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index d8790ad7a3..6600a154ab 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -1099,9 +1099,11 @@ extractRelOptions(HeapTuple tuple, TupleDesc tupdesc,
 		case RELKIND_RELATION:
 		case RELKIND_TOASTVALUE:
 		case RELKIND_MATVIEW:
-		case RELKIND_PARTITIONED_TABLE:
 			options = heap_reloptions(classForm->relkind, datum, false);
 			break;
+		case RELKIND_PARTITIONED_TABLE:
+			options = partitioned_table_reloptions(datum, false);
+			break;
 		case RELKIND_VIEW:
 			options = view_reloptions(datum, false);
 			break;
@@ -1571,6 +1573,21 @@ build_reloptions(Datum reloptions, bool validate,
 	return rdopts;
 }
 
+/*
+ * Option parser for partitioned tables
+ */
+bytea *
+partitioned_table_reloptions(Datum reloptions, bool validate)
+{
+	/*
+	 * There are no options for partitioned tables yet, but this is
+	 * able to do some validation.
+	 */
+	return (bytea *) build_reloptions(reloptions, validate,
+	  RELOPT_KIND_PARTITIONED,
+	  0, NULL, 0);
+}
+
 /*
  * Option parser for views
  */
@@ -1614,9 +1631,6 @@ heap_reloptions(char relkind, Datum reloptions, bool validate)
 		case RELKIND_RELATION:
 		case RELKIND_MATVIEW:
 			return default_reloptions(reloptions, validate, RELOPT_KIND_HEAP);
-		case RELKIND_PARTITIONED_TABLE:
-			return default_reloptions(reloptions, validate,
-	  RELOPT_KIND_PARTITIONED);
 		default:
 			/* other relkinds are not supported */
 			return NULL;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 496c206332..45aae5955d 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -719,10 +719,17 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 	reloptions = transformRelOptions((Datum) 0, stmt->options, NULL, validnsps,
 	 true, false);
 
-	if (relkind == RELKIND_VIEW)
-		(void) view_reloptions(reloptions, true);
-	else
-		(void) heap_reloptions(relkind, reloptions, true);
+	switch (relkind)
+	{
+		case RELKIND_VIEW:
+			(void) view_reloptions(reloptions, true);
+			break;
+		case RELKIND_PARTITIONED_TABLE:
+			(void) partitioned_table_reloptions(reloptions, true);
+			break;
+		default:
+			(void) heap_reloptions(relkind, reloptions, true);
+	}
 
 	if (stmt->ofTypename)
 	{
@@ -12187,9 +12194,11 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 		case RELKIND_RELATION:
 		case RELKIND_TOASTVALUE:
 		case RELKIND_MATVIEW:
-		case RELKIND_PARTITIONED_TABLE:
 			(void) heap_reloptions(rel->rd_rel->relkind, newOptions, true);
 			break;
+		case RELKIND_PARTITIONED_TABLE:
+			(void) partitioned_table_reloptions(newOptions, true);
+			break;
 		case RELKIND_VIEW:
 			(void) view_reloptions(newOptions, true);
 			break;




Re: [PATCH] Do not use StdRdOptions in Access Methods

2019-11-12 Thread Michael Paquier
On Wed, Nov 13, 2019 at 02:29:49PM +0900, Amit Langote wrote:
> On Wed, Nov 13, 2019 at 2:18 PM Michael Paquier  wrote:
>> There could be an argument for keeping
>> GET_STRING_RELOPTION actually which is still useful to get a string
>> value in an option set using the stored offset, and we have
>> the recently-added dummy_index_am in this category.  Any thoughts?
> 
> Not sure, maybe +0.5 on keeping GET_STRING_RELOPTION.

Thinking more about it, I would tend to keep this one around.  I'll
wait a couple of days before coming back to it.
--
Michael




Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

2019-11-12 Thread Michael Paquier
On Tue, Nov 12, 2019 at 10:33:16AM -0800, Andres Freund wrote:
> On 2019-11-12 16:25:06 +0100, Daniel Gustafsson wrote:
>> On 11 Nov 2019, at 09:32, Michael Paquier  wrote:
>> 
>>> This part has resulted in 75c1921, and we could just change
>>> DecodeMultiInsert() so as if there is no tuple data then we'd just
>>> leave.  However, I don't feel completely comfortable with that either
>>> as it would be nice to still check that for normal relations we
>>> *always* have a FPW available.
>> 
>> XLH_INSERT_CONTAINS_NEW_TUPLE will only be set in case of catalog relations
>> IIUC (that is, non logically decoded relations), so it seems to me that we 
>> can
>> have a fastpath out of DecodeMultiInsert() which inspects that flag without
>> problems. Is this proposal along the lines of what you were thinking?
> 
> Maybe I'm missing something, but it's the opposite, no?

>   boolneed_tuple_data = RelationIsLogicallyLogged(relation);

Yep.  This checks after IsCatalogRelation().

> ...
>   if (need_tuple_data)
>   xlrec->flags |= XLH_INSERT_CONTAINS_NEW_TUPLE;
> 
> or am I misunderstanding what you mean?

Not sure that I can think about a good way to properly track if the
new tuple data is associated to a catalog or not, aka if the data is
expected all the time or not to get a good sanity check when doing the
multi-insert decoding.  We could extend xl_multi_insert_tuple with a
flag to track that, but that seems like an overkill for a simple
sanity check..
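(For clarity, the early return being discussed would be roughly the
following at the top of DecodeMultiInsert(), where r is the record's
XLogReaderState; this is only a sketch, not a tested change.)

	xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) XLogRecGetData(r);

	/*
	 * Sketch of the proposed fast path: the flag is only set when tuple
	 * data was logged (i.e. for logically-logged relations), so without
	 * it there is nothing to decode here.
	 */
	if ((xlrec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE) == 0)
		return;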
--
Michael




RE: Recovery performance of DROP DATABASE with many tablespaces

2019-11-12 Thread k.jami...@fujitsu.com
On Wed, Oct. 2, 2019 5:40 PM, Fujii Masao wrote:
> On Tue, Jul 10, 2018 at 3:04 PM Michael Paquier  wrote:
> >
> > On Thu, Jul 05, 2018 at 01:42:20AM +0900, Fujii Masao wrote:
> > > TBH, I have no numbers measured by the test.
> > > One question about your test is; how did you measure the *recovery
> > > time* of DROP DATABASE? Since it's *recovery* performance, basically
> > > it's not easy to measure that.
> >
> > It would be simple to measure the time it takes to replay this single
> > DROP DATABASE record by putting two gettimeofday() calls or such
> > things and then take the time difference.  There are many methods that
> > you could use here, and I suppose that with a shared buffer setting of
> > a couple of GBs of shared buffers you would see a measurable
> > difference with a dozen of tablespaces or so.  You could also take a
> > base backup after creating all the tablespaces, connect the standby
> > and then drop the database on the primary to see the actual time it
> > takes.  Your patch looks logically correct to me because
> > DropDatabaseBuffers is a
> > *bottleneck* with large shared_buffers, and it would be nice to see
> > numbers.
> 
> Thanks for the comment!
> 
> I measured how long it takes to replay DROP DATABASE with 1000 tablespaces,
> in master and patched version. shared_buffers was set to 16GB.
> 
> [master]
> It took 8 seconds to replay DROP DATABASE with 1000 tablespaces, as follows.
> In this case, 16GB shared_buffers was fully scanned 1000 times.
> 
> 2019-10-02 16:50:14 JST LOG:  redo starts at 0/228
> 2019-10-02 16:50:22 JST LOG:  redo done at 0/300A298
> 
> [patched]
> It took less than 1 second to replay DROP DATABASE with 1000 tablespaces,
> as follows. In this case, 16GB shared_buffers was scanned only one time.
> 
> 2019-10-02 16:47:03 JST LOG:  redo starts at 0/228
> 2019-10-02 16:47:03 JST LOG:  redo done at 0/3001588
> 

Hi Fujii-san,

It's been a while, so I checked the patch once again.
It's fairly straightforward and I saw no problems or bugs in the code.

> [patched]
> It took less than 1 second to replay DROP DATABASE with 1000 tablespaces,
The results are good.
I would like to reproduce the measurement to confirm the results as well.
Could you share how you measured the recovery replay?
Did you actually execute a failover?
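(Just so I understand the setup: I imagine the simplest way, other than
using the log timestamps shown above, would be to wrap the redo work
with the instr_time macros, something like the sketch below. This is
only an assumption on my side.)

	/* hypothetical timing around the XLOG_DBASE_DROP branch of dbase_redo() */
	instr_time	start_time,
				elapsed;

	INSTR_TIME_SET_CURRENT(start_time);
	/* ... existing replay work, e.g. DropDatabaseBuffers() ... */
	INSTR_TIME_SET_CURRENT(elapsed);
	INSTR_TIME_SUBTRACT(elapsed, start_time);
	elog(LOG, "replayed DROP DATABASE in %.3f ms",
		 INSTR_TIME_GET_MILLISEC(elapsed));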

Regards,
Kirk Jamison


Re: pg_waldump and PREPARE

2019-11-12 Thread Andrey Lepikhov




12.11.2019 12:41, Fujii Masao wrote:

Ok, I changed the patch that way.
Attached is the latest version of the patch.

Regards,


I did not see any problems in this version of the patch. The information 
displayed by pg_waldump for the PREPARE record is sufficient for use.


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company




Re: Remove HeapTuple and Buffer dependency for predicate locking functions

2019-11-12 Thread Ashwin Agrawal
On Sun, Nov 10, 2019 at 8:21 PM Thomas Munro  wrote:

> I pushed the first two,


Thank You!

but on another read-through of the main patch
> I didn't like the comments for CheckForSerializableConflictOut() or
> the fact that it checks SerializationNeededForRead() again, after I
> thought a bit about what the contract for this API is now.  Here's a
> version with small fixup that I'd like to squash into the patch.
> Please let me know what you think,


The thought or reasoning behind having SerializationNeededForRead()
inside CheckForSerializableConflictOut() is to keep that API clean and
complete by itself. Only an AM like heap that needs to perform some
AM-specific checking for the serialization-needed case has to do extra
work; it is not forced on anyone. So, if an AM such as Zedstore doesn't
need to do any specific checking, it can directly call
CheckForSerializableConflictOut(). With the modified fixup patch, the
responsibility is unnecessarily forced onto the caller even though
CheckForSerializableConflictOut() can do it. I understand the intent
is to avoid the duplicate check for heap.

>
> or if you see how to improve it
> further.
>

I had proposed an alternative way in the initial email and also later,
but didn't receive comments on it, so I am re-posting it.

The alternative way to refactor: CheckForSerializableConflictOut() can
take a callback function and a (void *) callback argument for the
AM-specific check. So, the flow would be the AM calling
CheckForSerializableConflictOut() as today, and only if
SerializationNeededForRead() passes would it invoke the callback to
check with the AM whether more work should be performed or not.
Essentially, HeapCheckForSerializableConflictOut() would become the
callback function instead. So, roughly, it would look like:

typedef bool (*AMCheckForSerializableConflictOutCallback) (void *arg);

void
CheckForSerializableConflictOut(Relation relation, TransactionId xid,
                                Snapshot snapshot,
                                AMCheckForSerializableConflictOutCallback callback,
                                void *callback_arg)
{
    if (!SerializationNeededForRead(relation, snapshot))
        return;
    if (callback != NULL && !callback(callback_arg))
        return;

    /* ... the rest of the existing conflict-out logic ... */
}

With this, AMs which don't have any specific checks to perform can
pass the callback as NULL. So, a function call is involved only if
SerializationNeededForRead() passes, and only for AMs which need it.

Just because of the void * callback argument aspect, I didn't prefer
that solution and felt that the AM performing its checks and then
calling CheckForSerializableConflictOut() seems better.  Please let me
know how you feel about this.
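
Just to make the call-site shape concrete, the heap side could look
roughly like the following; the struct and function names here are
purely illustrative, not from any patch.

typedef struct HeapConflictOutState
{
	HeapTuple	tuple;
	Buffer		buffer;
} HeapConflictOutState;

static bool
HeapCheckForSerializableConflictOutCallback(void *arg)
{
	HeapConflictOutState *state = (HeapConflictOutState *) arg;

	/* ... heap-specific tuple visibility checks go here ... */
	return true;				/* true = continue with the generic check */
}

/* call site, as today, but passing the callback: */
HeapConflictOutState cbstate = {tuple, buffer};

CheckForSerializableConflictOut(relation, xid, snapshot,
								HeapCheckForSerializableConflictOutCallback,
								&cbstate);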


Re: dropdb --force

2019-11-12 Thread Pavel Stehule
st 13. 11. 2019 v 7:12 odesílatel Amit Kapila 
napsal:

> On Tue, Nov 12, 2019 at 11:17 AM Amit Kapila 
> wrote:
> >
> > I am planning to commit this patch tomorrow unless I see more comments
> > or interest from someone else to review this.
> >
>
> Pushed.  Pavel, feel free to submit dropdb utility-related patch if you
> want.
>

I hope to send this patch today. It's a simple job.

Pavel


> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: dropdb --force

2019-11-12 Thread Amit Kapila
On Tue, Nov 12, 2019 at 11:17 AM Amit Kapila  wrote:
>
> I am planning to commit this patch tomorrow unless I see more comments
> or interest from someone else to review this.
>

Pushed.  Pavel, feel free to submit dropdb utility-related patch if you want.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Block level parallel vacuum

2019-11-12 Thread Amit Kapila
On Tue, Nov 12, 2019 at 3:14 PM Mahendra Singh  wrote:
>
> On Mon, 11 Nov 2019 at 16:36, Amit Kapila  wrote:
> >
> > On Mon, Nov 11, 2019 at 2:53 PM Mahendra Singh  wrote:
> > >
> > >
> > > For small indexes also, we gained some performance by parallel vacuum.
> > >
> >
> > Thanks for doing all these tests.  It is clear with this and previous
> > tests that this patch has benefit in wide variety of cases.  However,
> > we should try to see some worst cases as well.  For example, if there
> > are multiple indexes on a table and only one of them is large whereas
> > all others are very small say having a few 100 or 1000 rows.
> >
>
> Thanks Amit for your comments.
>
> I did some testing on the above suggested lines. Below is the summary:
> Test case:(I created 16 indexes but only 1 index is large, other are very 
> small)
> create table test(a int, b int, c int, d int, e int, f int, g int, h int);
> create index i3 on test (a) where a > 2000 and a < 3000;
> create index i4 on test (a) where a > 3000 and a < 4000;
> create index i5 on test (a) where a > 4000 and a < 5000;
> create index i6 on test (a) where a > 5000 and a < 6000;
> create index i7 on test (b) where a < 1000;
> create index i8 on test (c) where a < 1000;
> create index i9 on test (d) where a < 1000;
> create index i10 on test (d) where a < 1000;
> create index i11 on test (d) where a < 1000;
> create index i12 on test (d) where a < 1000;
> create index i13 on test (d) where a < 1000;
> create index i14 on test (d) where a < 1000;
> create index i15 on test (d) where a < 1000;
> create index i16 on test (d) where a < 1000;
> insert into test select i,i,i,i,i,i,i,i from generate_series(1,100) as i;
> delete from test where a %2=0;
>
> case 1: vacuum without using parallel workers.
> vacuum test;
> 228.259 ms
>
> case 2: vacuum with 1 parallel worker.
> vacuum (parallel 1) test;
> 251.725 ms
>
> case 3: vacuum with 3 parallel workers.
> vacuum (parallel 3) test;
> 259.986
>
> From above results, it seems that if indexes are small, then parallel vacuum 
> is not beneficial as compared to normal vacuum.
>

Right, and that is what is expected as well.  However, I think it will
be better if we somehow disallow very small indexes from using a
parallel worker.  Can we use min_parallel_index_scan_size to decide
whether a particular index can participate in a parallel vacuum?
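
Something along these lines inside the patch's decision about which
indexes go to workers, perhaps (an untested sketch of the idea):

	/*
	 * Sketch: let only indexes of at least min_parallel_index_scan_size
	 * (which is measured in blocks) participate in the parallel vacuum;
	 * smaller ones are handled by the leader itself.
	 */
	if (RelationGetNumberOfBlocks(Irel[idx]) < min_parallel_index_scan_size)
		continue;			/* leave this small index to the leader */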


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Block level parallel vacuum

2019-11-12 Thread Masahiko Sawada
On Wed, 13 Nov 2019 at 12:43, Dilip Kumar  wrote:
>
> On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada
>  wrote:
> >
> > On Tue, 12 Nov 2019 at 20:29, Dilip Kumar  wrote:
> > >
> > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
> > >  wrote:
> > > >
> > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar  wrote:
> > > > >
> > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada 
> > > > >  wrote:
> > > > > > I realized that v31-0006 patch doesn't work fine so I've attached 
> > > > > > the
> > > > > > updated version patch that also incorporated some comments I got so
> > > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and 
> > > > > > also
> > > > > > test the total delay time.
> > > > > >
> > > > > While reviewing the 0002, I got one doubt related to how we are
> > > > > dividing the maintainance_work_mem
> > > > >
> > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int 
> > > > > nindexes)
> > > > > +{
> > > > > + /* Compute the new maitenance_work_mem value for index vacuuming */
> > > > > + lvshared->maintenance_work_mem_worker =
> > > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> > > > > maintenance_work_mem;
> > > > > +}
> > > > > Is it fair to just consider the number of indexes which use
> > > > > maintenance_work_mem?  Or we need to consider the number of worker as
> > > > > well.  My point is suppose there are 10 indexes which will use the
> > > > > maintenance_work_mem but we are launching just 2 workers then what is
> > > > > the point in dividing the maintenance_work_mem by 10.
> > > > >
> > > > > IMHO the calculation should be like this
> > > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> > > > > maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> > > > > maintenance_work_mem;
> > > > >
> > > > > Am I missing something?
> > > >
> > > > No, I think you're right. On the other hand I think that dividing it
> > > > by the number of indexes that will use the mantenance_work_mem makes
> > > > sense when parallel degree > the number of such indexes. Suppose the
> > > > table has 2 indexes and there are 10 workers then we should divide the
> > > > maintenance_work_mem by 2 rather than 10 because it's possible that at
> > > > most 2 indexes that uses the maintenance_work_mem are processed in
> > > > parallel at a time.
> > > >
> > > Right, thats the reason I suggested divide with Min(nindexes_mwm, 
> > > nworkers).
> >
> > Thanks! I'll fix it in the next version patch.
> >
> One more comment.
>
> +lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel,
> + int nindexes, IndexBulkDeleteResult **stats,
> + LVParallelState *lps)
> +{
> + 
>
> + if (ParallelVacuumIsActive(lps))
> + {
>
> +
> + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> + stats, lps);
> +
> + }
> +
> + for (idx = 0; idx < nindexes; idx++)
> + {
> + /*
> + * Skip indexes that we have already vacuumed during parallel index
> + * vacuuming.
> + */
> + if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx))
> + continue;
> +
> + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> +   vacrelstats->old_live_tuples);
> + }
> +}
>
> In this function, if ParallelVacuumIsActive, we perform the parallel
> vacuum for all the index for which parallel vacuum is supported and
> once that is over we finish vacuuming remaining indexes for which
> parallel vacuum is not supported.  But, my question is that inside
> lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers
> to finish their job then only we start with the sequential vacuuming
> shouldn't we start that immediately as soon as the leader
> participation is over in the parallel vacuum?

If we do that, then while the leader process is sequentially vacuuming
the indexes that don't support parallel vacuum, some workers might
still be vacuuming other indexes. Isn't that a problem? If it's not a
problem, I think we can tie the indexes that don't support parallel
vacuum to the leader and do the parallel index vacuum.

Regards,

-- 
Masahiko Sawada    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Invisible PROMPT2

2019-11-12 Thread Pavel Stehule
st 13. 11. 2019 v 4:15 odesílatel Thomas Munro 
napsal:

> Hello hackers,
>
> From the advanced bikeshedding department: I'd like my psql
> transcripts to have the usual alignment, but be easier to copy and
> paste later without having weird prompt stuff in the middle.  How
> about a prompt format directive %w that means "whitespace of the same
> width as %/"?  Then you can make set your PROMPT2 to '%w   ' and it
> becomes invisible:
>
> pgdu=# create table foo (
>  i int,
>  j int
>);
> CREATE TABLE
> pgdu=#
>

+1

Pavel


Re: pg_waldump and PREPARE

2019-11-12 Thread Michael Paquier
On Tue, Nov 12, 2019 at 06:41:12PM +0900, Fujii Masao wrote:
> Ok, I changed the patch that way.
> Attached is the latest version of the patch.

Thanks for the new patch.  Looks fine to me.
--
Michael




Re: [HACKERS] Block level parallel vacuum

2019-11-12 Thread Amit Kapila
On Wed, Nov 13, 2019 at 9:48 AM Amit Kapila  wrote:
>
> Yeah, 0,2,3 and 4 sounds reasonable to me.  Earlier, Dilip also got
> confused with option 1.
>

Let me try to summarize the discussion on this point and see if others
have any opinion on this matter.

We need a way to allow IndexAm to specify whether it can participate
in a parallel vacuum.  As we know there are two phases of
index-vacuum, bulkdelete and vacuumcleanup and in many cases, the
bulkdelete performs the main deletion work and then vacuumcleanup just
returns index statistics. So, for such cases, we don't want the second
phase to be performed by a parallel vacuum worker.  Now, if the
bulkdelete phase is not performed, then vacuumcleanup can process the
entire index in which case it is better to do that phase via parallel
worker.

OTOH, in some cases vacuumcleanup takes another pass over the index to
reclaim empty pages and record the same in the FSM, even if bulkdelete
is performed.  This happens in gin and bloom indexes.  Then, we have
indexes where all the work is done in the cleanup phase, as in the case
of brin indexes.  Now, for this category of indexes, we want the
vacuumcleanup phase to also be performed by a parallel worker.

In short, different indexes have different requirements for which
phase of index vacuum can be performed in parallel.  Just to be clear,
we can't perform both phases (bulkdelete and cleanup) in one go, as
bulkdelete can happen multiple times on a large index whereas
vacuumcleanup is done once at the end.

Based on these needs, we came up with a way to allow users to specify
this information for index AMs. Basically, the index AM will expose a
variable amparallelvacuumoptions which can have the options below:

VACUUM_OPTION_NO_PARALLEL           1 << 0  # neither bulkdelete nor
vacuumcleanup can be performed in parallel

VACUUM_OPTION_PARALLEL_BULKDEL      1 << 1  # bulkdelete can be done in
parallel (nbtree, hash, gin, gist, spgist, bloom will set this flag)

VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 2  # vacuumcleanup can be done
in parallel if bulkdelete is not performed (nbtree, brin, gin, gist,
spgist, bloom will set this flag)

VACUUM_OPTION_PARALLEL_CLEANUP      1 << 3  # vacuumcleanup can be done
in parallel even if bulkdelete has already been performed (gin, brin,
and bloom will set this flag)

We have discussed exposing this information via two variables, but the
above seems like a better idea to all the people involved.
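
To make that concrete, here is a rough sketch of what I am imagining,
with the names and values taken from the list above; treat it as
purely illustrative, not as a patch:

/* sketch: possible flag definitions */
#define VACUUM_OPTION_NO_PARALLEL           (1 << 0)
#define VACUUM_OPTION_PARALLEL_BULKDEL      (1 << 1)
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP (1 << 2)
#define VACUUM_OPTION_PARALLEL_CLEANUP      (1 << 3)

/* sketch: e.g. in nbtree's handler, bthandler() */
amroutine->amparallelvacuumoptions =
	VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;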

Any suggestions?  Does anyone think this is not the right way to expose
this information, or that there is no need to expose it, or does anyone
have a better idea for this?

Sawada-San, Dilip, feel free to correct me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] Do not use StdRdOptions in Access Methods

2019-11-12 Thread Amit Langote
On Wed, Nov 13, 2019 at 2:18 PM Michael Paquier  wrote:
> On Wed, Nov 13, 2019 at 10:52:52AM +0900, Amit Langote wrote:
> > Thanks for chiming in about that.  I guess that means that we don't
> > need those macros, except GET_STRING_RELOPTION_LEN() that's used in
> > allocateReloptStruct(), which can be moved to reloptions.c.  Is that
> > correct?
>
> I have been looking on the net to see if there are any traces of code
> using those macros, but could not find any.  The last trace of a macro
> use is in 8ebe1e3, which just relies on GET_STRING_RELOPTION_LEN.  So
> it looks rather convincing now to just remove this code.  Attached is
> a patch for that.

Thank you.

> There could be an argument for keeping
> GET_STRING_RELOPTION actually which is still useful to get a string
> value in an option set using the stored offset, and we have
> the recently-added dummy_index_am in this category.  Any thoughts?

Not sure, maybe +0.5 on keeping GET_STRING_RELOPTION.

Thanks,
Amit




Re: SPI error with non-volatile functions

2019-11-12 Thread Tom Mercha
On 13/11/2019 06:13, Andres Freund wrote:
> Hi,
> 
> On 2019-11-13 05:09:31 +, Tom Mercha wrote:
>> I've been using SPI to execute some queries and this time I've tried to
>> issue CREATE TABLE commands through SPI. I've been getting the message
>> "ERROR: CREATE TABLE AS is not allowed in a non-volatile function".
> 
> Any chance you're specifying read_only = true to
> SPI_execute()/execute_plan()/...?
> 
> Greetings,
> 
> Andres Freund
> 

Dear Andres

That's exactly what's up! Everything is working as intended now. So 
sorry this was a bit silly of me, I didn't understand the message as a 
reference to that configuration.
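
For the archives, the working call now looks roughly like this (the
table name is just an example):

	/* read_only must be false for utility commands such as CREATE TABLE */
	int		ret = SPI_execute("CREATE TABLE my_table (a int)",
							  false,	/* read_only */
							  0);

	if (ret != SPI_OK_UTILITY)
		elog(ERROR, "SPI_execute failed: error code %d", ret);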

Thanks so much.

Best regards
Tom


Re: [PATCH] Do not use StdRdOptions in Access Methods

2019-11-12 Thread Michael Paquier
On Wed, Nov 13, 2019 at 10:52:52AM +0900, Amit Langote wrote:
> Thanks for chiming in about that.  I guess that means that we don't
> need those macros, except GET_STRING_RELOPTION_LEN() that's used in
> allocateReloptStruct(), which can be moved to reloptions.c.  Is that
> correct?

I have been looking on the net to see if there are any traces of code
using those macros, but could not find any.  The last trace of a macro
use is in 8ebe1e3, which just relies on GET_STRING_RELOPTION_LEN.  So
it looks rather convincing now to just remove this code.  Attached is
a patch for that.  There could be an argument for keeping
GET_STRING_RELOPTION actually which is still useful to get a string
value in an option set using the stored offset, and we have
the recently-added dummy_index_am in this category.  Any thoughts?
--
Michael
diff --git a/src/include/access/reloptions.h b/src/include/access/reloptions.h
index db42aa35e0..e24fdd1975 100644
--- a/src/include/access/reloptions.h
+++ b/src/include/access/reloptions.h
@@ -148,124 +148,6 @@ typedef struct
 	int			offset;			/* offset of field in result struct */
 } relopt_parse_elt;
 
-
-/*
- * These macros exist for the convenience of amoptions writers (but consider
- * using fillRelOptions, which is a lot simpler).  Beware of multiple
- * evaluation of arguments!
- *
- * The last argument in the HANDLE_*_RELOPTION macros allows the caller to
- * determine whether the option was set (true), or its value acquired from
- * defaults (false); it can be passed as (char *) NULL if the caller does not
- * need this information.
- *
- * optname is the option name (a string), var is the variable
- * on which the value should be stored (e.g. StdRdOptions->fillfactor), and
- * option is a relopt_value pointer.
- *
- * The normal way to use this is to loop on the relopt_value array returned by
- * parseRelOptions:
- * for (i = 0; options[i].gen->name; i++)
- * {
- *		if (HAVE_RELOPTION("fillfactor", options[i])
- *		{
- *			HANDLE_INT_RELOPTION("fillfactor", rdopts->fillfactor, options[i], &isset);
- *			continue;
- *		}
- *		if (HAVE_RELOPTION("default_row_acl", options[i])
- *		{
- *			...
- *		}
- *		...
- *		if (validate)
- *			ereport(ERROR,
- *	(errmsg("unknown option")));
- *	}
- *
- *	Note that this is more or less the same that fillRelOptions does, so only
- *	use this if you need to do something non-standard within some option's
- *	code block.
- */
-#define HAVE_RELOPTION(optname, option) \
-	(strncmp(option.gen->name, optname, option.gen->namelen + 1) == 0)
-
-#define HANDLE_INT_RELOPTION(optname, var, option, wasset)		\
-	do {		\
-		if (option.isset)		\
-			var = option.values.int_val;		\
-		else	\
-			var = ((relopt_int *) option.gen)->default_val;		\
-		(wasset) != NULL ? *(wasset) = option.isset : (dummyret)NULL; \
-	} while (0)
-
-#define HANDLE_BOOL_RELOPTION(optname, var, option, wasset)			\
-	do {			\
-		if (option.isset)		\
-			var = option.values.bool_val;		\
-		else	\
-			var = ((relopt_bool *) option.gen)->default_val;	\
-		(wasset) != NULL ? *(wasset) = option.isset : (dummyret) NULL; \
-	} while (0)
-
-#define HANDLE_REAL_RELOPTION(optname, var, option, wasset)		\
-	do {		\
-		if (option.isset)		\
-			var = option.values.real_val;		\
-		else	\
-			var = ((relopt_real *) option.gen)->default_val;	\
-		(wasset) != NULL ? *(wasset) = option.isset : (dummyret) NULL; \
-	} while (0)
-
-/*
- * Note that this assumes that the variable is already allocated at the tail of
- * reloptions structure (StdRdOptions or equivalent).
- *
- * "base" is a pointer to the reloptions structure, and "offset" is an integer
- * variable that must be initialized to sizeof(reloptions structure).  This
- * struct must have been allocated with enough space to hold any string option
- * present, including terminating \0 for every option.  SET_VARSIZE() must be
- * called on the struct with this offset as the second argument, after all the
- * string options have been processed.
- */
-#define HANDLE_STRING_RELOPTION(optname, var, option, base, offset, wasset) \
-	do {		\
-		relopt_string *optstring = (relopt_string *) option.gen;\
-		char *string_val;		\
-		if (option.isset)		\
-			string_val = option.values.string_val;\
-		else if (!optstring->default_isnull)	\
-			string_val = optstring->default_val;\
-		else	\
-			string_val = NULL;	\
-		(wasset) != NULL ? *(wasset) = option.isset : (dummyret) NULL; \
-		if (string_val == NULL)	\
-			var = 0;			\
-		else	\
-		{		\
-			strcpy(((char *)(base)) + (offset), string_val);	\
-			var = (offset);		\
-			(offset) += strlen(string_val) + 1;	\
-		}		\
-	} while (0)
-
-/*
- * For use during amoptions: get the strlen of a string option
- * (either default or the user defined value)
- */
-#define GET_STRING_RELOPTION

Re: SPI error with non-volatile functions

2019-11-12 Thread Andres Freund
Hi,

On 2019-11-13 05:09:31 +, Tom Mercha wrote:
> I've been using SPI to execute some queries and this time I've tried to 
> issue CREATE TABLE commands through SPI. I've been getting the message 
> "ERROR: CREATE TABLE AS is not allowed in a non-volatile function".

Any chance you're specifying read_only = true to
SPI_execute()/execute_plan()/...?

Greetings,

Andres Freund




SPI error with non-volatile functions

2019-11-12 Thread Tom Mercha
Dear Hackers

I've been using SPI to execute some queries and this time I've tried to 
issue CREATE TABLE commands through SPI. I've been getting the message 
"ERROR: CREATE TABLE AS is not allowed in a non-volatile function".

I'm a bit confused because my functions are set as volatile when I got 
that result. I was sure I'd be able to issue CREATE TABLE through SPI 
because a possible return value of SPI_execute is SPI_OK_UTILITY if a 
utility command such as CREATE TABLE was executed.

Maybe the caveat is the following. I am actually invoking my function 
through CREATE LANGUAGE's inline handler using an anonymous do block. So 
maybe I need to take some additional considerations for this reason?

The following is what my functions look like:

CREATE FUNCTION myLanguage.myLanguage_function_call_handler()
RETURNS language_handler
AS 'MODULE_PATHNAME','myLanguage_function_call_handler'
LANGUAGE C VOLATILE;

CREATE FUNCTION myLanguage.myLanguage_inline_function_handler(internal)
RETURNS void
AS 'MODULE_PATHNAME','myLanguage_inline_function_handler'
LANGUAGE C VOLATILE;

CREATE LANGUAGE myLanguage
HANDLER myLanguage.myLanguage_function_call_handler
INLINE myLanguage.myLanguage_inline_function_handler;
COMMENT ON LANGUAGE myLanguage IS 'My Language';


Have I correctly approached the issue? Maybe there is a workaround?

Best regards
Tom


Re: [PATCH][DOC] Fix for PREPARE TRANSACTION doc and postgres_fdw message.

2019-11-12 Thread Michael Paquier
On Tue, Nov 12, 2019 at 09:35:03AM +0900, Michael Paquier wrote:
> Indeed, thanks for looking.  I thought that the callback was called
> after checking for max_prepared_transaction, but that's not the case.
> So let's add at least a test case.  Any objections?

Okay, done.
--
Michael




Re: cost based vacuum (parallel)

2019-11-12 Thread Masahiko Sawada
On Tue, 12 Nov 2019 at 20:22, Masahiko Sawada
 wrote:
>
> On Tue, 12 Nov 2019 at 19:08, Amit Kapila  wrote:
> >
> > On Tue, Nov 12, 2019 at 3:03 PM Dilip Kumar  wrote:
> > >
> > > On Tue, Nov 12, 2019 at 10:47 AM Dilip Kumar  
> > > wrote:
> > > >
> > > > On Mon, Nov 11, 2019 at 4:23 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > On Mon, Nov 11, 2019 at 12:59 PM Dilip Kumar  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, Nov 11, 2019 at 9:43 AM Dilip Kumar  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Fri, Nov 8, 2019 at 11:49 AM Amit Kapila 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > Yeah, I think it is difficult to get the exact balance, but we 
> > > > > > > > can try
> > > > > > > > to be as close as possible.  We can try to play with the 
> > > > > > > > threshold and
> > > > > > > > another possibility is to try to sleep in proportion to the 
> > > > > > > > amount of
> > > > > > > > I/O done by the worker.
> > > > > > > I have done another experiment where I have done another 2 
> > > > > > > changes on
> > > > > > > top op patch3
> > > > > > > a) Only reduce the local balance from the total shared balance
> > > > > > > whenever it's applying delay
> > > > > > > b) Compute the delay based on the local balance.
> > > > > > >
> > > > > > > patch4:
> > > > > > > worker 0 delay=84.13 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > > > worker 1 delay=89.23 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > > > worker 2 delay=88.68 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > > > worker 3 delay=80.79 total I/O=16378 hit=4318 miss=0 dirty=603
> > > > > > >
> > > > > > > I think with this approach the delay is divided among the worker 
> > > > > > > quite
> > > > > > > well compared to other approaches
> > > > > > >
> > > > > > > >
> > > > > ..
> > > > > > I have tested the same with some other workload(test file attached).
> > > > > > I can see the same behaviour with this workload as well that with 
> > > > > > the
> > > > > > patch 4 the distribution of the delay is better compared to other
> > > > > > patches i.e. worker with more I/O have more delay and with equal IO
> > > > > > have alsomost equal delay.  Only thing is that the total delay with
> > > > > > the patch 4 is slightly less compared to other pacthes.
> > > > > >
> > > > >
> > > > > I see one problem with the formula you have used in the patch, maybe
> > > > > that is causing the value of total delay to go down.
> > > > >
> > > > > - if (new_balance >= VacuumCostLimit)
> > > > > + VacuumCostBalanceLocal += VacuumCostBalance;
> > > > > + if ((new_balance >= VacuumCostLimit) &&
> > > > > + (VacuumCostBalanceLocal > VacuumCostLimit/(0.5 * nworker)))
> > > > >
> > > > > As per discussion, the second part of the condition should be
> > > > > "VacuumCostBalanceLocal > (0.5) * VacuumCostLimit/nworker".  I think
> > > > > you can once change this and try again.  Also, please try with the
> > > > > different values of threshold (0.3, 0.5, 0.7, etc.).
> > > > >
> > > > I have modified the patch4 and ran with different values.  But, I
> > > > don't see much difference in the values with the patch4.  Infact I
> > > > removed the condition for the local balancing check completely still
> > > > the delays are the same,  I think this is because with patch4 worker
> > > > are only reducing their own balance and also delaying as much as their
> > > > local balance.  So maybe the second condition will not have much
> > > > impact.
> > > >
> >
> > Yeah, but I suspect the condition (when the local balance exceeds a
> > certain threshold, then only try to perform delay) you mentioned can
> > have an impact in some other scenarios.  So, it is better to retain
> > the same.  I feel the overall results look sane and the approach seems
> > reasonable to me.
> >
> > > >
> > > I have revised the patch4 so that it doesn't depent upon the fix
> > > number of workers, instead I have dynamically updated the worker
> > > count.
> > >
> >
> > Thanks.  Sawada-San, by any chance, can you try some of the tests done
> > by Dilip or some similar tests just to rule out any sort of
> > machine-specific dependency?
>
> Sure. I'll try it tomorrow.

I've done some tests while changing the shared buffer size, delays and
number of workers. The overall results have a similar tendency to the
results shared by Dilip and look reasonable to me.

* test.sh

shared_buffers = '4GB';
max_parallel_maintenance_workers = 6;
vacuum_cost_delay = 1;
worker 0 delay=89.315000 total io=17931 hit=17891 miss=0 dirty=2
worker 1 delay=88.86 total io=17931 hit=17891 miss=0 dirty=2
worker 2 delay=89.29 total io=17931 hit=17891 miss=0 dirty=2
worker 3 delay=81.805000 total io=16378 hit=4318 miss=0 dirty=603

shared_buffers = '1GB';
max_parallel_maintenance_workers = 6;
vacuum_cost_delay = 1;
worker 0 delay=89.21 total io=17931 hit=17891 miss=0 dirty=2
worker 1 delay=89.325000 total io=17931 hit=17891 miss=0 dirty=2
worker 2 delay=88.870

Re: [HACKERS] Block level parallel vacuum

2019-11-12 Thread Dilip Kumar
On Wed, Nov 13, 2019 at 9:12 AM Dilip Kumar  wrote:
>
> On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada
>  wrote:
> >
> > On Tue, 12 Nov 2019 at 20:29, Dilip Kumar  wrote:
> > >
> > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
> > >  wrote:
> > > >
> > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar  wrote:
> > > > >
> > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada 
> > > > >  wrote:
> > > > > > I realized that v31-0006 patch doesn't work fine so I've attached 
> > > > > > the
> > > > > > updated version patch that also incorporated some comments I got so
> > > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and 
> > > > > > also
> > > > > > test the total delay time.
> > > > > >
> > > > > While reviewing the 0002, I got one doubt related to how we are
> > > > > dividing the maintainance_work_mem
> > > > >
> > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int 
> > > > > nindexes)
> > > > > +{
> > > > > + /* Compute the new maitenance_work_mem value for index vacuuming */
> > > > > + lvshared->maintenance_work_mem_worker =
> > > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> > > > > maintenance_work_mem;
> > > > > +}
> > > > > Is it fair to just consider the number of indexes which use
> > > > > maintenance_work_mem?  Or we need to consider the number of worker as
> > > > > well.  My point is suppose there are 10 indexes which will use the
> > > > > maintenance_work_mem but we are launching just 2 workers then what is
> > > > > the point in dividing the maintenance_work_mem by 10.
> > > > >
> > > > > IMHO the calculation should be like this
> > > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> > > > > maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> > > > > maintenance_work_mem;
> > > > >
> > > > > Am I missing something?
> > > >
> > > > No, I think you're right. On the other hand I think that dividing it
> > > > by the number of indexes that will use the mantenance_work_mem makes
> > > > sense when parallel degree > the number of such indexes. Suppose the
> > > > table has 2 indexes and there are 10 workers then we should divide the
> > > > maintenance_work_mem by 2 rather than 10 because it's possible that at
> > > > most 2 indexes that uses the maintenance_work_mem are processed in
> > > > parallel at a time.
> > > >
> > > Right, thats the reason I suggested divide with Min(nindexes_mwm, 
> > > nworkers).
> >
> > Thanks! I'll fix it in the next version patch.
> >
> One more comment.
>
> +lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel,
> + int nindexes, IndexBulkDeleteResult **stats,
> + LVParallelState *lps)
> +{
> + 
>
> + if (ParallelVacuumIsActive(lps))
> + {
>
> +
> + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> + stats, lps);
> +
> + }
> +
> + for (idx = 0; idx < nindexes; idx++)
> + {
> + /*
> + * Skip indexes that we have already vacuumed during parallel index
> + * vacuuming.
> + */
> + if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx))
> + continue;
> +
> + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> +   vacrelstats->old_live_tuples);
> + }
> +}
>
> In this function, if ParallelVacuumIsActive, we perform the parallel
> vacuum for all the index for which parallel vacuum is supported and
> once that is over we finish vacuuming remaining indexes for which
> parallel vacuum is not supported.  But, my question is that inside
> lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers
> to finish their job then only we start with the sequential vacuuming
> shouldn't we start that immediately as soon as the leader
> participation is over in the parallel vacuum?
>

+ /*
+ * Since parallel workers cannot access data in temporary tables, parallel
+ * vacuum is not allowed for temporary relation.
+ */
+ if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
+ {
+ ereport(WARNING,
+ (errmsg("skipping vacuum on \"%s\" --- cannot vacuum temporary
tables in parallel",
+ RelationGetRelationName(onerel))));
+ relation_close(onerel, lmode);
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+ /* It's OK to proceed with ANALYZE on this table */
+ return true;
+ }
+

If we cannot support parallel vacuum for temporary tables, then
shouldn't we fall back to a normal vacuum instead of skipping the
table?  I think it's not fair that if the user has requested a
system-wide parallel vacuum, then all the temporary tables are skipped
and not vacuumed at all, and the user needs to perform a normal vacuum
on those tables again.
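
Something along these lines is what I have in mind (an untested sketch
against the quoted hunk):

if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
{
	ereport(WARNING,
			(errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
					RelationGetRelationName(onerel))));
	params->nworkers = -1;		/* fall back to a plain, non-parallel vacuum */
}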

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




RE: libpq debug log

2019-11-12 Thread iwata....@fujitsu.com
Hello,

Thank you for your review.
I update patch. Please find attached my patch.


> > 2019-04-04 02:39:51.488 UTC  > Query 59 "SELECT
> pg_catalog.set_config('search_path', '', false)"
> 
> The "59" there seems quite odd, though.
Could you explain in more detail what seems odd about it?

"59" is the length of the protocol message contents. (It does not
include the first byte, the message type.)
This ordering is based on the message format:
https://www.postgresql.org/docs/current/protocol-message-formats.html
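
To illustrate where 59 comes from in that example:

/*
 * Query ('Q') message layout:
 *   Byte1('Q')   message type          -- not counted in the length
 *   Int32(59)    length, counting itself and the body
 *   String       "SELECT pg_catalog.set_config('search_path', '', false)\0"
 *
 * strlen(query) = 54, plus 1 for the terminating NUL, plus 4 for the
 * length field itself = 59.
 */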


> * What is this in pg_conn?  I don't understand why this array is of size
>   MAXPGPATH.
> + PGFrontendLogMsgEntry frontend_entry[MAXPGPATH];
>   This seems to make pg_conn much larger than we'd like.
Sure. I removed this and changed the code to use a list.

> Minor review points:
I accept all these review points.

Regards,
Aya Iwata


v7-libpq-PQtrace-output-one-line.patch
Description: v7-libpq-PQtrace-output-one-line.patch


Re: [HACKERS] Block level parallel vacuum

2019-11-12 Thread Amit Kapila
On Wed, Nov 13, 2019 at 8:34 AM Masahiko Sawada
 wrote:
>
> On Wed, 13 Nov 2019 at 11:38, Amit Kapila  wrote:
> >
>
> > It might be that we need to do it the way
> > I originally proposed the different values of amparallelvacuumoptions
> > or maybe some variant of it where the default value can clearly say
> > that IndexAm doesn't support a parallel vacuum.
>
> Okay. After more thoughts on your original proposal, what I get
> confused on your proposal is that there are two types of flags that
> enable and disable options. Looking at 2, 3 and 4, it looks like all
> options are disabled by default and setting these flags means to
> enable them. On the other hand looking at 1, it looks like these
> options are enabled by default and setting the flag means to disable
> it. 0 makes sense to me. So how about having 0, 2, 3 and 4?
>

Yeah, 0,2,3 and 4 sounds reasonable to me.  Earlier, Dilip also got
confused with option 1.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




RE: New SQL counter statistics view (pg_stat_sql)

2019-11-12 Thread Smith, Peter
From: Thomas Munro  Sent: Monday, 4 November 2019 1:43 
PM

> No comment on the patch but I noticed that the documentation changes don't 
> build.  Please make sure you can "make docs" successfully, having installed 
> the documentation tools[1].

Thanks for the feedback. An updated patch which fixes the docs issue is 
attached.

Kind Regards.
Peter Smith
---
Fujitsu Australia



0001-pg_stat_sql.patch
Description: 0001-pg_stat_sql.patch


Re: [HACKERS] Block level parallel vacuum

2019-11-12 Thread Dilip Kumar
On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada
 wrote:
>
> On Tue, 12 Nov 2019 at 20:29, Dilip Kumar  wrote:
> >
> > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
> >  wrote:
> > >
> > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar  wrote:
> > > >
> > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada 
> > > >  wrote:
> > > > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > > > updated version patch that also incorporated some comments I got so
> > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > > > test the total delay time.
> > > > >
> > > > While reviewing the 0002, I got one doubt related to how we are
> > > > dividing the maintainance_work_mem
> > > >
> > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int 
> > > > nindexes)
> > > > +{
> > > > + /* Compute the new maitenance_work_mem value for index vacuuming */
> > > > + lvshared->maintenance_work_mem_worker =
> > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> > > > maintenance_work_mem;
> > > > +}
> > > > Is it fair to just consider the number of indexes which use
> > > > maintenance_work_mem?  Or we need to consider the number of worker as
> > > > well.  My point is suppose there are 10 indexes which will use the
> > > > maintenance_work_mem but we are launching just 2 workers then what is
> > > > the point in dividing the maintenance_work_mem by 10.
> > > >
> > > > IMHO the calculation should be like this
> > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> > > > maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> > > > maintenance_work_mem;
> > > >
> > > > Am I missing something?
> > >
> > > No, I think you're right. On the other hand I think that dividing it
> > > by the number of indexes that will use the mantenance_work_mem makes
> > > sense when parallel degree > the number of such indexes. Suppose the
> > > table has 2 indexes and there are 10 workers then we should divide the
> > > maintenance_work_mem by 2 rather than 10 because it's possible that at
> > > most 2 indexes that uses the maintenance_work_mem are processed in
> > > parallel at a time.
> > >
> > Right, thats the reason I suggested divide with Min(nindexes_mwm, nworkers).
>
> Thanks! I'll fix it in the next version patch.
>
One more comment.

+lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel,
+ int nindexes, IndexBulkDeleteResult **stats,
+ LVParallelState *lps)
+{
+ 

+ if (ParallelVacuumIsActive(lps))
+ {

+
+ lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
+ stats, lps);
+
+ }
+
+ for (idx = 0; idx < nindexes; idx++)
+ {
+ /*
+ * Skip indexes that we have already vacuumed during parallel index
+ * vacuuming.
+ */
+ if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx))
+ continue;
+
+ lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
+   vacrelstats->old_live_tuples);
+ }
+}

In this function, if ParallelVacuumIsActive, we perform the parallel
vacuum for all the indexes for which parallel vacuum is supported, and
once that is over we finish vacuuming the remaining indexes for which
parallel vacuum is not supported.  But my question is that inside
lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers
to finish their job and only then start with the sequential vacuuming;
shouldn't we start that immediately, as soon as the leader's
participation in the parallel vacuum is over?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




RE: Proposal: Add more compile-time asserts to expose inconsistencies.

2019-11-12 Thread Smith, Peter
From: Andres Freund  Sent: Wednesday, 13 November 2019 6:01 
AM

>Peter Smith:
>
> Is there a reason to not just make StaticAssertExpr and StaticAssertStmt be 
> the same? I don't want to proliferate variants that users have to understand 
> if there's no compelling 
> need.  Nor do I think do we really need two different fallback implementation 
> for static asserts.

>
> As far as I can tell we should be able to use the prototype based approach in 
> all the cases where we currently use the "negative bit-field width" approach?

I also thought that the new "prototype negative array-dimension" based approach 
(i.e. StaticAssertDecl) looked like an improvement over the existing "negative 
bit-field width" approach (i.e. StaticAssertStmt), because it seems to work for 
more scenarios (e.g. file scope). 
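
For reference, the two tricks in simplified form look something like
this (these are not the actual c.h definitions, just an illustration):

/* statement-level fallback: negative bit-field width */
#define MY_STATIC_ASSERT_STMT(condition, errmessage) \
	((void) sizeof(struct { int static_assert_failure : (condition) ? 1 : -1; }))

/* declaration-level fallback: negative array dimension in a prototype,
 * which also works at file scope */
#define MY_STATIC_ASSERT_DECL(condition, errmessage) \
	extern void static_assert_func(int static_assert_failure[(condition) ? 1 : -1])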

But I did not refactor the existing code to use the new way because I was 
fearful that there might be some subtle reason why StaticAssertStmt was 
deliberately made that way (e.g. as do/while), and the last thing I wanted 
to do was break working code.

> Should then also update
> * Otherwise we fall back on a kluge that assumes the compiler will complain
> * about a negative width for a struct bit-field.  This will not include a
> * helpful error message, but it beats not getting an error at all.

Kind Regards.
Peter Smith
---
Fujitsu Australia





Invisible PROMPT2

2019-11-12 Thread Thomas Munro
Hello hackers,

From the advanced bikeshedding department: I'd like my psql
transcripts to have the usual alignment, but be easier to copy and
paste later without having weird prompt stuff in the middle.  How
about a prompt format directive %w that means "whitespace of the same
width as %/"?  Then you can make set your PROMPT2 to '%w   ' and it
becomes invisible:

pgdu=# create table foo (
 i int,
 j int
   );
CREATE TABLE
pgdu=#


invisible-database.patch
Description: Binary data


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-11-12 Thread Bruce Momjian
On Mon, Nov 11, 2019 at 05:37:30PM +0900, Michael Paquier wrote:
> On Wed, Sep 11, 2019 at 06:30:22PM +0200, Julien Rouhaud wrote:
> > The thing is that pg_stat_statements assigns a 0 queryid in the
> > post_parse_analyze_hook to recognize utility statements and avoid
> > tracking instrumentation twice in case of utility statements, and then
> > compute a queryid based on a hash of the query text.  Maybe we could
> > instead fully reserve queryid "2" for utility statements (so forcing
> > queryid "1" for standard queries if jumbling returns 0 *or* 2 instead
> > of only 0), and use "2" as the identifier for utility statement
> > instead of "0"?
> 
> Hmm.  Not sure.  At this stage it would be nice to gather more input
> on the matter, and FWIW, I don't like much the assumption that a query
> ID of 0 is perhaps a utility statement, or perhaps nothing depending
> on the state of a backend entry, or even perhaps something else
> depending how on how modules make use and define such query IDs.

I thought each extension would export a function to compute the query
id, and you would call that function with the pg_stat_activity.query
string.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: libpq sslpassword parameter and callback function

2019-11-12 Thread Bruce Momjian
On Sun, Nov 10, 2019 at 05:47:24PM +0800, Craig Ringer wrote:
> Yep, that was a trivial change I rolled into it.
> 
> FWIW, this is related to two other patches: the patch to allow passwordless 
> fdw
> connections with explicit superuser approval, and the patch to allow sslkey/
> sslpassword to be set as user mapping options in postgres_fdw . Together all
> three patches make it possible to use SSL client certificates to manage
> authentication in postgres_fdw user mappings.

Oh, nice, greatly needed!

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Does 'instead of delete' trigger support modification of OLD

2019-11-12 Thread Bruce Momjian
On Mon, Nov 11, 2019 at 07:00:22PM +0300, Liudmila Mantrova wrote:
> 
> > On 8 Nov 2019, at 0:26, Bruce Momjian wrote:
> > 
> > First, notice "only", which was missing from the later sentence:
> > 
> >For INSERT and UPDATE
> >operations [only], the trigger may modify the
> >NEW row before returning it.
> > 
> > which I have now added with my applied patch to all supported releases. 
> > 
> 
> Hi Bruce, 
> 
> I happened to browse recent documentation-related commits and I didn’t see 
> this patch in REL_12_STABLE. Judging by the commit message, it should be 
> applied there too.

Wow, not sure how that happened, fixed.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Block level parallel vacuum

2019-11-12 Thread Masahiko Sawada
On Wed, 13 Nov 2019 at 11:38, Amit Kapila  wrote:
>
> On Wed, Nov 13, 2019 at 6:53 AM Masahiko Sawada
>  wrote:
> >
> > On Tue, 12 Nov 2019 at 22:33, Amit Kapila  wrote:
> > >
> > >
> > > Hmm, I think we should define these flags in the most simple way.
> > > Your previous proposal sounds okay to me.
> >
> > Okay. As you mentioned before, my previous proposal won't work for
> > existing index AMs that don't set amparallelvacuumoptions.
> >
>
> You mean to say it won't work because it has to set multiple flags
> which means that if IndexAm user doesn't set the value of
> amparallelvacuumoptions then it won't work?

Yes. In my previous proposal every index AM needs to set two flags.

>
> > But since we
> > have amcanparallelvacuum which is false by default I think we don't
> > need to worry about backward compatibility problem. The existing index
> > AM will use neither parallel bulk-deletion nor parallel cleanup by
> > default. When it wants to support parallel vacuum they will set
> > amparallelvacuumoptions as well as amcanparallelvacuum.
> >
>
> Hmm, I was not thinking of multiple variables rather only one
> variable. The default value should indicate that IndexAm doesn't
> support a parallel vacuum.

Yes.

> It might be that we need to do it the way
> I originally proposed the different values of amparallelvacuumoptions
> or maybe some variant of it where the default value can clearly say
> that IndexAm doesn't support a parallel vacuum.

Okay. After more thought on your original proposal, what confused me
is that there are two types of flags, ones that enable options and
ones that disable them. Looking at 2, 3 and 4, it looks like all
options are disabled by default and setting these flags enables
them. On the other hand, looking at 1, it looks like that option is
enabled by default and setting the flag disables it. 0 makes sense to
me. So how about having 0, 2, 3 and 4?
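
To make that concrete, here is one way the surviving options could be spelled
as bits; a sketch based on the proposals upthread, not a final definition:

/* sketch only; names and semantics follow the proposals in this thread */
#define VACUUM_OPTION_NO_PARALLEL			0			/* default: no parallel vacuum support */
#define VACUUM_OPTION_PARALLEL_BULKDEL		(1 << 0)	/* bulkdelete can run in parallel */
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP	(1 << 1)	/* parallel cleanup only if no bulkdelete was done */
#define VACUUM_OPTION_PARALLEL_CLEANUP		(1 << 2)	/* parallel cleanup even after a bulkdelete */

/*
 * e.g. an nbtree-like AM might then set:
 *     amroutine->amparallelvacuumoptions =
 *         VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
 */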

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Block level parallel vacuum

2019-11-12 Thread Dilip Kumar
On Tue, Nov 12, 2019 at 7:03 PM Amit Kapila  wrote:
>
> On Tue, Nov 12, 2019 at 5:30 PM Masahiko Sawada
>  wrote:
> >
> > On Tue, 12 Nov 2019 at 20:11, Amit Kapila  wrote:
> > >
> > > On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada
> > >  wrote:
> > > >
> > > > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar  wrote:
> > > > >
> > > > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila  
> > > > > wrote:
> > > > > >
> > > > > > Yeah, maybe something like amparallelvacuumoptions.  The options 
> > > > > > can be:
> > > > > >
> > > > > > VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> > > > > > vacuumcleanup) can't be performed in parallel
> > > > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> > > > > > performed in parallel (hash index will set this flag)
> > > > >
> > > > > Maybe we don't want this option?  because if 3 or 4 is not set then we
> > > > > will not do the cleanup in parallel right?
> > > > >
> > >
> > > Yeah, but it is better to be explicit about this.
> >
> > VACUUM_OPTION_NO_PARALLEL_BULKDEL is missing?
> >
>
> I am not sure if that is required.
>
> > I think brin indexes
> > will use this flag.
> >
>
> Brin index can set VACUUM_OPTION_PARALLEL_CLEANUP in my proposal and
> it should work.

IIUC, VACUUM_OPTION_PARALLEL_CLEANUP means no parallel bulk delete and
always parallel cleanup?  I am not sure whether this is the best way,
because for the cleanup option we are being explicit about each case,
i.e. PARALLEL_CLEANUP, NO_PARALLEL_CLEANUP, etc., so why not do the
same for the bulk delete?  I mean, why don't we keep both
PARALLEL_BULKDEL and NO_PARALLEL_BULKDEL?

>
> > It will end up with
> > (VACUUM_OPTION_NO_PARALLEL_CLEANUP |
> > VACUUM_OPTION_NO_PARALLEL_BULKDEL) is equivalent to
> > VACUUM_OPTION_NO_PARALLEL, though.
> >
> > >
> > > > > > VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set 
> > > > > > this
> > > > > > flag)
> > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done 
> > > > > > in
> > > > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> > > > > > gin, gist, spgist, bloom will set this flag)
> > > > > > VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > > > and bloom will set this flag)
> > > > > >
> > > > > > Does something like this make sense?
> > > >
> > > > 3 and 4 confused me because 4 also looks conditional. How about having
> > > > two flags instead: one for doing parallel cleanup when not performed
> > > > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing
> > > > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)?
> > > >
> > >
> > > Hmm, this is exactly what I intend to say with 3 and 4.  I am not sure
> > > what makes you think 4 is conditional.
> >
> > Hmm so why gin and bloom will set 3 and 4 flags? I thought if it sets
> > 4 it doesn't need to set 3 because 4 means always doing cleanup in
> > parallel.
> >
>
> Yeah, that makes sense.  They can just set 4.
>
> > >
> > > > That way, we
> > > > can have flags as follows and index AM chooses two flags, one from the
> > > > first two flags for bulk deletion and another from next three flags
> > > > for cleanup.
> > > >
> > > > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0
> > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1
> > > > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2
> > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3
> > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4
> > > >
> > >
> > > This also looks reasonable, but if there is an index that doesn't want
> > > to support a parallel vacuum, it needs to set multiple flags.
> >
> > Right. It would be better to use uint16 as two uint8. I mean that if
> > first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL and if
> > next 8 bits are 0 means VACUUM_OPTION_PARALLEL_NO_CLEANUP. Other flags
> > could be followings:
> >
> > VACUUM_OPTION_PARALLEL_BULKDEL 0x0001
> > VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100
> > VACUUM_OPTION_PARALLEL_CLEANUP 0x0200
> >
>
> Hmm, I think we should define these flags in the most simple way.
> Your previous proposal sounds okay to me.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [proposal] recovery_target "latest"

2019-11-12 Thread Kyotaro Horiguchi
At Fri, 8 Nov 2019 16:08:47 +0300, Grigory Smolkin  
wrote in 
> While working on it, I stumbled upon something strange:
> 
> why DisownLatch(&XLogCtl->recoveryWakeupLatch) is called before
> ReadRecord(xlogreader, LastRec, PANIC, false) ?
> Couldn't this latch be accessed in WaitForWALToBecomeAvailable() if
> the streaming standby gets promoted?

The DisownLatch is just for the sake of tidiness and can be placed
anywhere after the ShutdownWalRcv() call, but the current place (just
before "StandbyMode = false") seems natural. The ReadRecord call
doesn't launch another wal receiver because we cleared StandbyMode just
before.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: ssl passphrase callback

2019-11-12 Thread Bruce Momjian
On Sun, Nov 10, 2019 at 01:01:17PM -0600, Magnus Hagander wrote:
> On Wed, Nov 6, 2019 at 7:24 PM Bruce Momjian  wrote:
>   We had this
> discussion in relation to archive_command years ago, and decided on a
> shell command as the best API.
>
> I don't recall that from back then, but that was a long time ago.
> 
> But it's interesting that you mention it, given the number of people I have
> been discussing that with recently, under the topic of changing it from a 
> shell
> command into a shared library API (with there being a shell command as one
> possible implementation of course). 
> 
> One of the main reasons there being to be easily able to transfer more state
> and give results other than just an exit code, no need to deal with parameter
> escaping etc. Which probably wouldn't matter as much to an SSL passphrase
> command, but still.

I get the callback-is-easier issue with shared objects, but are we
expecting to pass in more information here than we do for
archive_command?  I would think not.  What I am saying is that if we
don't think passing things in works, we should fix all these external
commands, or something.   I don't see why ssl_passphrase_command is
different, except that it is new.  Or is it related to _securely_
passing something?

Also, why was this patch posted without any discussion of these issues?
Shouldn't we ideally discuss the API first?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Block level parallel vacuum

2019-11-12 Thread Amit Kapila
On Wed, Nov 13, 2019 at 6:53 AM Masahiko Sawada
 wrote:
>
> On Tue, 12 Nov 2019 at 22:33, Amit Kapila  wrote:
> >
> >
> > Hmm, I think we should define these flags in the most simple way.
> > Your previous proposal sounds okay to me.
>
> Okay. As you mentioned before, my previous proposal won't work for
> existing index AMs that don't set amparallelvacuumoptions.
>

You mean to say it won't work because it has to set multiple flags,
which means that if the index AM author doesn't set the value of
amparallelvacuumoptions then it won't work?

> But since we
> have amcanparallelvacuum which is false by default I think we don't
> need to worry about backward compatibility problem. The existing index
> AM will use neither parallel bulk-deletion nor parallel cleanup by
> default. When it wants to support parallel vacuum they will set
> amparallelvacuumoptions as well as amcanparallelvacuum.
>

Hmm, I was not thinking of multiple variables, rather only one
variable. The default value should indicate that the index AM doesn't
support a parallel vacuum.  It might be that we need to do it the way
I originally proposed, with the different values of
amparallelvacuumoptions, or maybe some variant of it where the default
value can clearly say that the index AM doesn't support a parallel vacuum.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: 'Invalid lp' during heap_xlog_delete

2019-11-12 Thread Daniel Wood
It's been tedious to get it exactly right but I think I got it.  FYI, I was 
delayed because today we had yet another customer hit this 'redo max offset' 
error.  The system crashed as a number of autovacuums and a checkpoint happened, 
and then the REDO failure occurred.

Two tiny code changes:

bufmgr.c:BufferSync():  at the beginning of the function, add
	pg_usleep(1000);

smgr.c:smgrtruncate():  add the following just after CacheInvalidateSmgr()
	if (forknum == MAIN_FORKNUM && nblocks == 0)
	{
		pg_usleep(4000);
		{ char *cp = NULL; *cp = 13; }
	}

Now for the heavily commented SQL repro.  It will require that you execute a 
checkpoint in another session when instructed by the repro.sql script.  You 
have 4 seconds to do that. The repro script explains exactly what must happen.

---
create table t (c char());
alter table t alter column c set storage plain;
-- Make sure there actually is an allocated page 0 and it is empty.
-- REDO Delete would ignore a non-existent page:
-- XLogReadBufferForRedoExtended: return BLK_NOTFOUND;
-- Hopefully two row deletes don't trigger autovacuum and truncate the empty
-- page.
insert into t values ('1'), ('2');
delete from t;
checkpoint; -- Checkpoint the empty page to disk.
-- This insert should be before the next checkpoint 'start'. I don't want to
-- replay it.
-- And, yes, there needs to be another checkpoint completed to skip its replay
-- and start with the replay of the delete below.
insert into t values ('1'), ('2');
-- Checkpoint needs to start in another session. However, I need to stall the
-- checkpoint to prevent it from writing the dirty page to disk until I get to
-- the vacuum below.
select 'Please start checkpoint in another session';
select pg_sleep(4);
-- Below is the problematic delete.
-- It succeeds now (online) because the dirty page has two rows on it.
-- However, with respect to crash recovery there are 3 possible scenarios
-- depending on timing.
-- 1) The ongoing checkpoint might write the page with the two rows on it before
-- the deletes. This leads to BLK_NEEDS_REDO for the deletes. It works
-- because the page read from disk has the rows on it.
-- 2) The ongoing checkpoint might write the page just after the deletes.
-- In that case BLK_DONE will happen and there'll be no problem. LSN check.
-- 3) The checkpoint can fail to write the dirty page because a vacuum can call
-- smgrtruncate->DropRelFileNodeBuffers() which invalidates the dirty page.
-- If smgrtruncate safely completes the physical truncation then BLK_NOTFOUND
-- happens and we skip the redo of the delete. So the skipped dirty write is OK.
-- The problem happens if we crash after the 2nd checkpoint completes
-- but before the physical truncate 'mdtruncate()'.
delete from t;
-- The vacuum must complete DropRelFileNodeBuffers.
-- The vacuum must sleep for a few seconds to allow the checkpoint to complete
-- such that recovery starts with the Delete REDO.
-- We must crash before mdtruncate() does the physical truncate. If the physical
-- truncate happens the BLK_NOTFOUND will be returned and the Delete REDO
-- skipped.
vacuum t;



> On November 10, 2019 at 11:51 PM Michael Paquier wrote:
> 
> 
> On Fri, Nov 08, 2019 at 06:44:08PM -0800, Daniel Wood wrote:
> 
> > > I repro'ed on PG11 and PG10 STABLE but several months old.
> > I looked at 6d05086 but it doesn't address the core issue.
> > 
> > DropRelFileNodeBuffers prevents the checkpoint from writing all
> > needed dirty pages for any REDO's that exist BEFORE the truncate.
> > If we crash after a checkpoint but before the physical truncate then
> > the REDO will need to replay the operation against the dirty page
> > that the Drop invalidated.
> > 
> I am beginning to look at this thread more seriously, and I'd like to
> first try to reproduce that by myself. Could you share the steps you
> used to do that? This includes any manual sleep calls you may have
> added, the timing of the crash, manual checkpoints, debugger
> breakpoints, etc. It may be possible to extract some more generic
> test from that.
> --
> Michael
> 


Re: Add SQL function to show total block numbers in the relation

2019-11-12 Thread btkimurayuzk

A size in blocks is useless for those who don't understand the
notion of a block, or of block size. Those who understand the notion
can come up with the simple formula (except for the annoying
casts). Anyone can find the clue to the base values by searching the
documentation on the Web with the keywords "block size" and "relation size",
or even with "table size". (FWIW, I would even do the same for the new
function if any...) If they need it so frequently, a user-defined
function is easily made up.

regards.



I didn't know a user-defined function could be used for this.
Fully understood, thanks.

Regards,

Yu Kimura





Re: [PATCH] Do not use StdRdOptions in Access Methods

2019-11-12 Thread Amit Langote
Hi Alvaro,

On Wed, Nov 13, 2019 at 6:55 AM Alvaro Herrera  wrote:
> On 2019-Nov-07, Amit Langote wrote:
> > On Tue, Nov 5, 2019 at 9:22 AM Michael Paquier  wrote:
> > >  Please note that I have not switched the old interface
> > > to be static to reloptions.c as if you look closely at reloptions.h we
> > > allow much more flexibility around HANDLE_INT_RELOPTION to fill and
> > > parse the reloptions in custom AM.  AM maintainers had better use the
> > > new interface, but there could be a point for more customized error
> > > messages.
> >
> > I looked around but don't understand why these macros need to be
> > exposed.  I read this comment:
> >
> >  *  Note that this is more or less the same that fillRelOptions does, so 
> > only
> >  *  use this if you need to do something non-standard within some option's
> >  *  code block.
> >
> > but don't see how an AM author might be able to do something
> > non-standard with this interface.
> >
> > Maybe Alvaro knows this better.
>
> I think the idea was that you could have external code doing things in a
> different way for some reason, ie. not use a bytea varlena struct that
> could be filled by fillRelOptions but instead ... do something else.
> That's why those macros are exposed.  Now, this idea doesn't seem to be
> aged very well.  Maybe exposing that stuff is unnecessary.

Thanks for chiming in about that.  I guess that means that we don't
need those macros, except GET_STRING_RELOPTION_LEN() that's used in
allocateReloptStruct(), which can be moved to reloptions.c.  Is that
correct?

Thanks,
Amit




Re: [PATCH][BUG_FIX] Potential null pointer dereferencing.

2019-11-12 Thread Kyotaro Horiguchi
At Tue, 12 Nov 2019 14:03:53 +, Ranier Vilela  
wrote in 
> Hi,
> The condition is :
> 74.  if (TupIsNull(slot))  is true
> 85.  if (TupIsNull(resultTupleSlot)) is true too.

See the definition of TupIsNull. It also checks whether the tuple slot
behind a valid pointer is EMPTY. And node->ps.ps_ResultTupleSlot cannot be
NULL there since ExecInitUnique initializes it. The sequence is so
obvious that even an Assert is not needed there, I think.
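
For reference, TupIsNull in src/include/executor/tuptable.h boils down to
roughly this:

#define TupIsNull(slot) \
	((slot) == NULL || TTS_EMPTY(slot))

so the NULL-pointer case and the valid-but-empty-slot case are covered by the
same test.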

> If resultTupleSlot is not can be NULL, why test if 
> (TupIsNull(resultTupleSlot))?

It checks if there is no stored "previous" tuple, which is used to
detect the next value. If it is EMPTY (not NULL), it is the first
tuple from the outerPlan as described in the comment just above.

> When both conditions occur, ExecClearTuple is called, but it doesn't
> check for a NULL argument.
> 
> There are at least 2 more possible cases involving ExecClearTuple:
> nodeFunctionscan.c and nodeWindowAgg.c

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Why overhead of SPI is so large?

2019-11-12 Thread Kyotaro Horiguchi
At Tue, 12 Nov 2019 11:27:24 +0300, Konstantin Knizhnik 
 wrote in 
> 
> 
> On 11.11.2019 20:22, Tom Lane wrote:
> >
> > None of those statements are true, in my experience.
> >
> > In general, this patch seems like it's learned nothing from our
> > experiences with the late and unlamented exec_simple_check_node()
> > (cf commit 00418c612).  Having a piece of plpgsql that has to know
> > about all possible "simple expression" node types is just a major
> > maintenance headache; but there is no short-cut solution that is
> > really going to be acceptable.  Either you're unsafe, or you fail
> > to optimize cases that users will complain about, or you write
> > and maintain a lot of code.
> >
> > I'm also pretty displeased with the is-it-in-the-pg_catalog-schema
> > tests.  Those are expensive, requiring additional catalog lookups,
> > and they prove just about nothing.  There are extensions that shove
> > stuff into pg_catalog (look no further than contrib/adminpack), or
> > users could do it, so you really still are relying on the whole world
> > to get immutability right.  If we're going to not trust immutability
> > markings on user-defined objects, I'd be inclined to do it by
> > checking that the object's OID is less than FirstGenbkiObjectId.
> >
> > But maybe we should just forget that bit of paranoia and rely solely
> > on contain_mutable_functions().  That gets rid of the new maintenance
> > requirement, and I think arguably it's OK.  An "immutable" function
> > whose result depends on changes that were just made by the calling
> > function is pretty obviously mislabeled, so people won't have much of
> > a leg to stand on if they complain about that.  Pavel's argument
> > upthread that people sometimes intentionally mislabel functions as
> > immutable isn't really relevant: in his example, at least, the user
> > is intentionally trying to get the function to be evaluated early.
> > So whether it sees the effects of changes just made by the calling
> > function doesn't seem like it would matter to such a user.
> >
> > regards, tom lane
> 
> In my opinion contain_mutable_functions() is the best solution.
> But if it is not acceptable, I will rewrite the patch in white-list
> fashion.

I agree with just relying on contain_mutable_functions, for the same
reasons as Tom.
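
For concreteness, the check being discussed boils down to something like this
at the call site (a sketch only; where exactly plpgsql would do it is up to
the patch):

	/* allow the fast path only if the expression is free of mutable functions */
	if (!contain_mutable_functions((Node *) expr))
	{
		/* ... treat it as a simple expression ... */
	}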

> I do not understand the argument about the expensive
> is-it-in-the-pg_catalog-schema test.
> IsCatalogNamespace does just a simple comparison without any
> catalog lookups:
> 
> bool
> IsCatalogNamespace(Oid namespaceId)
> {
>     return namespaceId == PG_CATALOG_NAMESPACE;
> }
> 
> I may agree that it "proves nothing" if extension can put stuff in
> pg_catalog.
> I can replace it with comparison with FirstGenbkiObjectId.
> But all this makes sense only if using contain_mutable_function() is
> not acceptable.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Block level parallel vacuum

2019-11-12 Thread Masahiko Sawada
On Tue, 12 Nov 2019 at 22:33, Amit Kapila  wrote:
>
> On Tue, Nov 12, 2019 at 5:30 PM Masahiko Sawada
>  wrote:
> >
> > On Tue, 12 Nov 2019 at 20:11, Amit Kapila  wrote:
> > >
> > > On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada
> > >  wrote:
> > > >
> > > > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar  wrote:
> > > > >
> > > > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila  
> > > > > wrote:
> > > > > >
> > > > > > Yeah, maybe something like amparallelvacuumoptions.  The options 
> > > > > > can be:
> > > > > >
> > > > > > VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> > > > > > vacuumcleanup) can't be performed in parallel
> > > > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> > > > > > performed in parallel (hash index will set this flag)
> > > > >
> > > > > Maybe we don't want this option?  because if 3 or 4 is not set then we
> > > > > will not do the cleanup in parallel right?
> > > > >
> > >
> > > Yeah, but it is better to be explicit about this.
> >
> > VACUUM_OPTION_NO_PARALLEL_BULKDEL is missing?
> >
>
> I am not sure if that is required.
>
> > I think brin indexes
> > will use this flag.
> >
>
> Brin index can set VACUUM_OPTION_PARALLEL_CLEANUP in my proposal and
> it should work.
>
> > It will end up with
> > (VACUUM_OPTION_NO_PARALLEL_CLEANUP |
> > VACUUM_OPTION_NO_PARALLEL_BULKDEL) is equivalent to
> > VACUUM_OPTION_NO_PARALLEL, though.
> >
> > >
> > > > > > VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set 
> > > > > > this
> > > > > > flag)
> > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done 
> > > > > > in
> > > > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> > > > > > gin, gist, spgist, bloom will set this flag)
> > > > > > VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > > > and bloom will set this flag)
> > > > > >
> > > > > > Does something like this make sense?
> > > >
> > > > 3 and 4 confused me because 4 also looks conditional. How about having
> > > > two flags instead: one for doing parallel cleanup when not performed
> > > > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing
> > > > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)?
> > > >
> > >
> > > Hmm, this is exactly what I intend to say with 3 and 4.  I am not sure
> > > what makes you think 4 is conditional.
> >
> > Hmm so why gin and bloom will set 3 and 4 flags? I thought if it sets
> > 4 it doesn't need to set 3 because 4 means always doing cleanup in
> > parallel.
> >
>
> Yeah, that makes sense.  They can just set 4.

Okay,

>
> > >
> > > > That way, we
> > > > can have flags as follows and index AM chooses two flags, one from the
> > > > first two flags for bulk deletion and another from next three flags
> > > > for cleanup.
> > > >
> > > > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0
> > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1
> > > > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2
> > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3
> > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4
> > > >
> > >
> > > This also looks reasonable, but if there is an index that doesn't want
> > > to support a parallel vacuum, it needs to set multiple flags.
> >
> > Right. It would be better to use uint16 as two uint8. I mean that if
> > first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL and if
> > next 8 bits are 0 means VACUUM_OPTION_PARALLEL_NO_CLEANUP. Other flags
> > could be followings:
> >
> > VACUUM_OPTION_PARALLEL_BULKDEL 0x0001
> > VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100
> > VACUUM_OPTION_PARALLEL_CLEANUP 0x0200
> >
>
> Hmm, I think we should define these flags in the most simple way.
> Your previous proposal sounds okay to me.

Okay. As you mentioned before, my previous proposal won't work for
existing index AMs that don't set amparallelvacuumoptions. But since we
have amcanparallelvacuum, which is false by default, I think we don't
need to worry about a backward compatibility problem. Existing index
AMs will use neither parallel bulk-deletion nor parallel cleanup by
default. When they want to support parallel vacuum they will set
amparallelvacuumoptions as well as amcanparallelvacuum.

I'll try to use my previous proposal and check it. If something goes
wrong we can go back to your proposal or another one.


--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PHJ file leak.

2019-11-12 Thread Kyotaro Horiguchi
At Wed, 13 Nov 2019 09:48:19 +1300, Thomas Munro  wrote 
in 
> On Tue, Nov 12, 2019 at 5:03 PM Thomas Munro  wrote:
> > On Tue, Nov 12, 2019 at 4:23 PM Thomas Munro  wrote:
> > > On Tue, Nov 12, 2019 at 4:20 PM Kyotaro Horiguchi
> > >  wrote:
> > > > The previous patch would be wrong. The root cause is an open batch, so
> > > > the right thing to be done at scan end is
> > > > ExecHashTableDetachBatch. And the real issue here seems to be in
> > > > ExecutePlan, not in PHJ.
> > >
> > > You are right.  Here is the email I just wrote that says the same
> > > thing, but with less efficiency:
> >
> > And yeah, your Make_parallel_shutdown_on_broken_channel.patch seems
> > like the real fix here.  It's not optional to run that at
> > end-of-query, though you might get that impression from various
> > comments, and it's also not OK to call it before the end of the query,
> > though you might get that impression from what the code actually does.
> 
> Here's the version I'd like to commit in a day or two, once the dust
> has settled on the minor release.  Instead of adding yet another copy
> of that code, I just moved it out of the loop; this way there is no
> way to miss it.  I think the comment could also be better, but I'll
> wait for the concurrent discussions about the meaning of
> ExecShutdownNode() to fix that in master.

The patch's shape looks better. Thanks.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: checking my understanding of TupleDesc

2019-11-12 Thread Andres Freund
Hi,

On 2019-11-12 18:20:56 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2019-11-12 17:39:20 -0500, Tom Lane wrote:
> >> There's a semi-exception, which is that the planner might decide that we
> >> can skip a projection step for the output of a table scan node, in which
> >> case dropped columns would be included in its output.  But that would only
> >> be true if there are upper plan nodes that are doing some projections of
> >> their own.  The final query output will definitely not have them.
> 
> > I *think* we don't even do that, because build_physical_tlist() bails
> > out if there's a dropped (or missing) column.
> 
> Ah, right.  Probably because we need to insist on every column of an
> execution-time tupdesc having a valid atttypid ... although I wonder,
> is that really necessary?

Yea, the stated reasoning is ExecTypeFromTL():
 *
 * Exception: if there are any dropped or missing columns, we punt and return
 * NIL.  Ideally we would like to handle these cases too.  However this
 * creates problems for ExecTypeFromTL, which may be asked to build a tupdesc
 * for a tlist that includes vars of no-longer-existent types.  In theory we
 * could dig out the required info from the pg_attribute entries of the
 * relation, but that data is not readily available to ExecTypeFromTL.
 * For now, we don't apply the physical-tlist optimization when there are
 * dropped cols.

I think the main problem is that we don't even have a convenient way to
identify that a targetlist expression is actually a dropped column, and
treat that differently. If we were to expand physical tlists to cover
dropped and missing columns, we'd need to be able to add error checks to
at least ExecInitExprRec, and to printtup_prepare_info().

I wonder if we could get away with making build_physical_tlist()
return a TargetEntry for a Const instead of a Var for the dropped
columns? That'd contain enough information for tuple deforming to work
on higher query levels?  Or perhaps we ought to invent a DroppedVar
node, that includes the type information? That'd make it trivial to
error out when such an expression is actually evaluated, and allow to
detect such columns.  We already put Const nodes in some places like
that IIRC...
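
To make the second idea concrete, here is a purely hypothetical node shape;
nothing like this exists in the tree today:

	typedef struct DroppedVar
	{
		Expr		xpr;
		AttrNumber	varattno;		/* position of the dropped column */
		Oid			vartype;		/* its original atttypid */
		int32		vartypmod;
	} DroppedVar;

Deforming code could recognize it by node tag, and actually evaluating one
would just elog(ERROR).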

Greetings,

Andres Freund




Re: make pg_attribute_noreturn() work for msvc?

2019-11-12 Thread Andres Freund
Hi,

On 2019-11-12 18:15:28 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > It's worthwhile to note - I forgot this - that noreturn actually has
> > been standardized in C11 and C++11. For C11 the keyword is _Noreturn,
> > with a convenience macro 'noreturn' defined in stdnoreturn.h.
> 
> > For C++11, the syntax is (please don't get an aneurysm...):
> > [[ noreturn ]] void funcname(params)...
> > (yes, the [[]] are actually part of the syntax, not some BNF like thing)
> 
> Egad.  I'd *want* to hide that under a macro :-(

Yea, it's quite ugly.

I think the only saving grace is that C++ made that the generic syntax
for various annotations / attributes. Everywhere, not just for function
properties. So there's [[noreturn]], [[fallthrough]], [[nodiscard]],
[[maybe_unused]] etc, and that there is explicit namespacing for vendor
extensions by using [[vendorname::attname]], e.g. the actually existing
[[gnu::always_inline]].

There's probably not that many other forms of syntax one can add to all
the various places, without running into syntax limitations, or various
vendor extensions...

But still.


> > While it looks tempting to just use 'noreturn', and backfill it if the
> > current environment doesn't support it, I think that's a bit too
> > dangerous, because it will tend to break other code like
> > __attribute__((noreturn)) and _declspec(noreturn). As there's plenty
> > other software using either or both of these, I don't think it's worth
> > going there.
> 
> Agreed, defining noreturn is too dangerous, it'll have to be
> pg_noreturn.  Or maybe use _Noreturn?  But that feels ugly too.

Yea, I don't like that all that much. We'd have to define it in C++
mode, and it's in the explicit standard reserved namespace...


> Anyway, I still like the idea of merging the void keyword in with
> that.

Hm. Any other opinions?


Greetings,

Andres Freund




Re: checking my understanding of TupleDesc

2019-11-12 Thread Tom Lane
Andres Freund  writes:
> On 2019-11-12 17:39:20 -0500, Tom Lane wrote:
>> There's a semi-exception, which is that the planner might decide that we
>> can skip a projection step for the output of a table scan node, in which
>> case dropped columns would be included in its output.  But that would only
>> be true if there are upper plan nodes that are doing some projections of
>> their own.  The final query output will definitely not have them.

> I *think* we don't even do that, because build_physical_tlist() bails
> out if there's a dropped (or missing) column.

Ah, right.  Probably because we need to insist on every column of an
execution-time tupdesc having a valid atttypid ... although I wonder,
is that really necessary?

regards, tom lane




Re: make pg_attribute_noreturn() work for msvc?

2019-11-12 Thread Tom Lane
Andres Freund  writes:
> It's worthwhile to note - I forgot this - that noreturn actually has
> been standardized in C11 and C++11. For C11 the keyword is _Noreturn,
> with a convenience macro 'noreturn' defined in stdnoreturn.h.

> For C++11, the syntax is (please don't get an aneurysm...):
> [[ noreturn ]] void funcname(params)...
> (yes, the [[]] are actually part of the syntax, not some BNF like thing)

Egad.  I'd *want* to hide that under a macro :-(

> While it looks tempting to just use 'noreturn', and backfill it if the
> current environment doesn't support it, I think that's a bit too
> dangerous, because it will tend to break other code like
> __attribute__((noreturn)) and _declspec(noreturn). As there's plenty
> other software using either or both of these, I don't think it's worth
> going there.

Agreed, defining noreturn is too dangerous, it'll have to be
pg_noreturn.  Or maybe use _Noreturn?  But that feels ugly too.

Anyway, I still like the idea of merging the void keyword in with
that.

regards, tom lane




Re: checking my understanding of TupleDesc

2019-11-12 Thread Andres Freund
Hi,

On 2019-11-12 17:39:20 -0500, Tom Lane wrote:
> >  and under what other
> >  circumstances one would only encounter 'cleaned up' TupleDescs with
> >  no dropped attributes, and contiguous numbers for the real ones?
> 
> I don't believe we ever include dropped columns in a projection result,
> so generally speaking, the output of a query plan node wouldn't have them.
> 
> There's a semi-exception, which is that the planner might decide that we
> can skip a projection step for the output of a table scan node, in which
> case dropped columns would be included in its output.  But that would only
> be true if there are upper plan nodes that are doing some projections of
> their own.  The final query output will definitely not have them.

I *think* we don't even do that, because build_physical_tlist() bails
out if there's a dropped (or missing) column. Or are you thinking of
something else?

Greetings,

Andres Freund




Re: make pg_attribute_noreturn() work for msvc?

2019-11-12 Thread Andres Freund
Hi,

On 2019-11-12 17:22:05 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2019-11-12 15:58:07 -0500, Tom Lane wrote:
> > I'm a bit confused as to why pg_upgrade.h doesn't use 'extern' for
> > function declarations? Not that it's really related, except for the
> > 'extern' otherwise hiding the effect of pgindent not liking 'void
> > pg_noreturn'…
> 
> There are various headers where people have tended to not use "extern".
> I always disliked that, thinking it was not per project style, but
> never bothered to force the issue.  If we went around and inserted
> extern in these places, it wouldn't bother me any.



> > I don't see a reason not to go for 'pg_noreturn void'?
> 
> That seems kind of ugly from here.  Not sure why, but at least to
> my mind that's a surprising ordering.

Oh, to me it seemed a quite reasonable order. I think it feels that way
to me because we put properties like 'static', 'extern', 'inline' etc
also before the return type (and it's similar for variable declarations
too).

It's maybe also worthwhile to note that emacs parses 'pg_noreturn void'
correctly, but gets confused by 'void pg_noreturn'. It's just syntax
highlighting though, so whatever.


> >> One idea is to merge it with the "void" result type that such
> >> a function would presumably have, along the lines of
> >> #define pg_noreturn    void __declspec(noreturn)
> 
> > Yea, that'd be an alternative. But since not necessary, I'd not go
> > there?
> 
> I kind of liked that idea, too bad you don't.

I don't actively dislike it. It just seemed a bit more magic than
necessary. One need not understand what pg_noreturn does - not that it's
hard to infer from the name - to know the return type of the function.


> One argument for it is that then there'd be exactly one right way to
> do it, not two.  Also, if we find out that there's some compiler
> that's pickier about where to place the annotation, we'd have a
> central place to handle it.

The former seems like a good argument to me. I'm not quite sure I think
the second is likely.


It's worthwhile to note - I forgot this - that noreturn actually has
been standardized in C11 and C++11. For C11 the keyword is _Noreturn,
with a convenience macro 'noreturn' defined in stdnoreturn.h.

For C++11, the syntax is (please don't get an aneurysm...):
[[ noreturn ]] void funcname(params)...
(yes, the [[]] are actually part of the syntax, not some BNF like thing)

I *think* the standard prescribes _Noreturn to be before the return type
(it's defined in the same rule as inline), but I have some difficulty
parsing the standard language. Gcc at least accepts inline only before
the return type, but _Noreturn in both places.

Certainly all the standard examples place it before the type.
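
A minimal standalone illustration of the C11 spellings (not PostgreSQL code):

#include <stdlib.h>
#include <stdnoreturn.h>

_Noreturn void fatal_exit(int code);	/* the keyword, before the return type */
noreturn void fatal_exit2(int code);	/* convenience macro from <stdnoreturn.h> */

_Noreturn void
fatal_exit(int code)
{
	exit(code);
}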


While it looks tempting to just use 'noreturn', and backfill it if the
current environment doesn't support it, I think that's a bit too
dangerous, because it will tend to break other code like
__attribute__((noreturn)) and _declspec(noreturn). As there's plenty
other software using either or both of these, I don't think it's worth
going there.


Greetings,

Andres Freund




Re: checking my understanding of TupleDesc

2019-11-12 Thread Tom Lane
Chapman Flack  writes:
> On 09/29/19 20:13, Chapman Flack wrote:
>> From looking around the code, I've made these tentative observations
>> about TupleDescs:
>> 
>> 1. If the TupleDesc was obtained straight from the relcache for some
>> relation, then all of its attributes should have nonzero attrelid
>> identifying that relation, but in (every? nearly every?) other case,
>> the attributes found in a TupleDesc will have a dummy attrelid of zero.

I'm not sure about every vs. nearly every, but otherwise this seems
accurate.  Generally attrelid is meaningful in a pg_attribute catalog
entry, but not in TupleDescs in memory.  It appears valid in relcache
entry tupdescs only because they are built straight from pg_attribute.

>> 2. The attributes in a TupleDesc will (always?) have consecutive attnum
>> corresponding to their positions in the TupleDesc (and therefore
>> redundant).

Correct.

> And one more:

>   3. One could encounter a TupleDesc with one or more 'attisdropped'
>  attributes, which do have their original attnums (corresponding
>  to their positions in the TupleDesc and therefore redundant),
>  so the attnums of nondropped attributes may be discontiguous.

Right.

>  Is it simple to say under what circumstances a TupleDesc possibly
>  with dropped members could be encountered,

Any tupdesc that's describing the rowtype of a table with dropped columns
would look like that.

>  and under what other
>  circumstances one would only encounter 'cleaned up' TupleDescs with
>  no dropped attributes, and contiguous numbers for the real ones?

I don't believe we ever include dropped columns in a projection result,
so generally speaking, the output of a query plan node wouldn't have them.

There's a semi-exception, which is that the planner might decide that we
can skip a projection step for the output of a table scan node, in which
case dropped columns would be included in its output.  But that would only
be true if there are upper plan nodes that are doing some projections of
their own.  The final query output will definitely not have them.
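
A minimal sketch of the defensive pattern this implies when walking such a
tupdesc (TupleDescAttr and attisdropped are the standard accessors):

	for (int i = 0; i < tupdesc->natts; i++)
	{
		Form_pg_attribute att = TupleDescAttr(tupdesc, i);

		if (att->attisdropped)
			continue;			/* attnum i + 1 is a dropped column */

		/* ... use att->attname, att->atttypid, etc. ... */
	}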

regards, tom lane




Re: make pg_attribute_noreturn() work for msvc?

2019-11-12 Thread Tom Lane
Andres Freund  writes:
> On 2019-11-12 15:58:07 -0500, Tom Lane wrote:
>> I guess my big question about that is whether pgindent will make a
>> hash of it.

> If one writes 'pg_noreturn void', rather than 'void pg_noreturn', then
> there's only one place where pgindent changes something in a somewhat
> weird way. For tablesync.c, it indents the pg_noreturn for
> finish_sync_worker(). But only due to being on a separate newline, which
> seems unnecessary…

I think that it might be like that because some previous version of
pgindent changed it to that.  That's probably why we never adopted
this style generally in the first place.

> With 'void pg_noreturn', a few prototypes in headers get indented more
> than pretty, e.g. in pg_upgrade.h it turns

> void pg_noreturn pg_fatal(const char *fmt,...) pg_attribute_printf(1, 2);
> into
> void  pg_noreturn pg_fatal(const char *fmt,...) 
> pg_attribute_printf(1, 2);

> I'm a bit confused as to why pg_upgrade.h doesn't use 'extern' for
> function declarations? Not that it's really related, except for the
> 'extern' otherwise hiding the effect of pgindent not liking 'void
> pg_noreturn'…

There are various headers where people have tended to not use "extern".
I always disliked that, thinking it was not per project style, but
never bothered to force the issue.  If we went around and inserted
extern in these places, it wouldn't bother me any.

> I don't see a reason not to go for 'pg_noreturn void'?

That seems kind of ugly from here.  Not sure why, but at least to
my mind that's a surprising ordering.

>> One idea is to merge it with the "void" result type that such
>> a function would presumably have, along the lines of
>> #define pg_noreturn  void __declspec(noreturn)

> Yea, that'd be an alternative. But since not necessary, I'd not go
> there?

I kind of liked that idea, too bad you don't.  One argument for it
is that then there'd be exactly one right way to do it, not two.
Also, if we find out that there's some compiler that's pickier
about where to place the annotation, we'd have a central place
to handle it.
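
As a sketch of what that central place might look like (hypothetical; the
compiler conditionals here are assumptions, not an actual c.h change):

#if defined(_MSC_VER)
#define pg_noreturn void __declspec(noreturn)
#elif defined(__GNUC__) || defined(__clang__)
#define pg_noreturn void __attribute__((noreturn))
#else
#define pg_noreturn void
#endif

extern pg_noreturn proc_exit(int code);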

regards, tom lane




Re: JIT performance bug/regression & JIT EXPLAIN

2019-11-12 Thread Andres Freund
Hi,

On 2019-11-12 13:42:10 -0800, Maciek Sakrejda wrote:
> On Mon, Oct 28, 2019 at 5:02 PM Andres Freund  wrote:
> > What I dislike about that is that it basically again is introducing
> 
> "again"? Am I missing some history here? I'd love to read up on this
> if there are mistakes to learn from.

I think I was mostly referring to mistakes we've made for the json etc
key names. By e.g. having expressions as "Function Call", "Table
Function Call", "Filter", "TID Cond", ... a tool that wants to interpret
the output needs awareness of all of these different names, rather than
knowing that everything with a sub-group "Expression" has to be an
expression.

I.e. instead of

"Plan": {
  "Node Type": "Seq Scan",
  "Parallel Aware": false,
  "Relation Name": "pg_class",
  "Schema": "pg_catalog",
  "Alias": "pg_class",
  "Startup Cost": 0.00,
  "Total Cost": 17.82,
  "Plan Rows": 385,
  "Plan Width": 68,
  "Output": ["relname", "tableoid"],
  "Filter": "(pg_class.relname <> 'foo'::name)"
}

we ought to have gone for

"Plan": {
  "Node Type": "Seq Scan",
  "Parallel Aware": false,
  "Relation Name": "pg_class",
  "Schema": "pg_catalog",
  "Alias": "pg_class",
  "Startup Cost": 0.00,
  "Total Cost": 17.82,
  "Plan Rows": 385,
  "Plan Width": 68,
  "Output": ["relname", "tableoid"],
  "Filter": {"Expression" : { "text": (pg_class.relname <> 'foo'::name)"}}
}

or something like that. Which'd then make it obvious how to add
information about JIT to each expression.


Whereas the proposal of the separate key name perpetuates the
messiness...


> > something that requires either pattern matching on key names (i.e. a key
> > of '(.*) JIT' is one that has information about JIT, and the associated
> > expresssion is in key $1), or knowing all the potential keys an
> > expression could be in.
> 
> That still seems less awkward than having to handle a Filter field
> that's either scalar or a group.

Yea, it's a sucky option :(


> Most current EXPLAIN options just add
> additional fields to the structured plan instead of modifying it, no?
> If that output is better enough, though, maybe we should just always
> make Filter a group and go with the breaking change? If tooling
> authors need to treat this case specially anyway, might as well evolve
> the format.

Yea, maybe that's the right thing to do. Would be nice to have some more
input...

Greetings,

Andres Freund




2019-11-14 Press Release Draft

2019-11-12 Thread Jonathan S. Katz
Hi,

Attached is a draft of the press release for the update release going
out on 2019-11-14. Please provide feedback, particularly on the
technical accuracy of the statements.

Thanks!

Jonathan
2019-11-14 Cumulative Update Release


The PostgreSQL Global Development Group has released an update to all supported
versions of our database system, including 12.1, 11.6, 10.11, 9.6.16, 9.5.20,
and 9.4.25. This release fixes over 60 bugs reported over the last three months.

PostgreSQL 9.4 EOL Approaching
--

PostgreSQL 9.4 will stop receiving fixes on February 13, 2020, which is the next
planned cumulative update release. Please see our
[versioning policy](https://www.postgresql.org/support/versioning/) for more
information.

Bug Fixes and Improvements
--

This update also fixes over 50 bugs that were reported in the last several
months. Some of these issues affect only version 12, while others apply to all
supported versions.

Some of these fixes include:

* Fix crash that occurs when `ALTER TABLE` adds a column without a default value
along with other changes that require a table rewrite
* Several fixes for `REINDEX CONCURRENTLY`.
* Fix for `VACUUM` that would cause it to fail under a specific case involving a
still-running transaction.
* Fix for a memory leak that could occur when `VACUUM` runs on a GiST index.
* Fix for an error that occurred when running `CLUSTER` on an expression index.
* Fix failure for `SET CONSTRAINTS ... DEFERRED` on partitioned tables.
* Several fixes for the creation and dropping of indexes on partitioned tables.
* Fix for partition-wise joins that could lead to planner failures.
* Ensure that offset expressions in WINDOW clauses are processed when a query's
expressions are manipulated.
* Fix misbehavior of `bitshiftright()` where it failed to zero out padding space
in the last byte if the bit string length is not a multiple of 8. For how to
correct your data, please see the "Updating" section.
* Ensure that `position()` returns 1 when searching for an empty string, as per
the SQL standard.
* Fix for a parallel query failure when it is unable to request a background
worker.
* Fix crash triggered by a case involving a `BEFORE UPDATE` trigger.
* Display the correct error when a query tries to access a TOAST table.
* Allow encoding conversion to succeed on strings with output up to 1GB.
Previously there was a hard limit of 0.25GB on the input string.
* Ensure that temporary WAL and history files are removed at the end of archive
recovery.
* Avoid failure in archive recovery if `recovery_min_apply_delay` is enabled.
* Ignore `restore_command`, `recovery_end_command`, and
`recovery_min_apply_delay` settings during crash recovery.
* Several fixes for logical replication, including a failure when the publisher
& subscriber had different REPLICA IDENTITY columns set.
* Correctly timestamp replication messages for logical decoding, which in the
broken case would lead to `pg_stat_subscription.last_msg_send_time` set to
`NULL`.
* Several fixes for libpq, including one that improves PostgreSQL 12
compatibility.
* Several `pg_upgrade` fixes.
* Fix how a parallel restore handles foreign key constraints on partitioned
tables to ensure they are not created too soon.
* `pg_dump` now outputs similarly named triggers and RLS policies in order based
on table name, instead of OID.
* Fix `pg_rewind` to not update the contents of `pg_control` when using the
`--dry-run` option.

This update also contains tzdata release 2019c for DST law changes in Fiji and
Norfolk Island.  Historical corrections for Alberta, Austria, Belgium, British
Columbia, Cambodia, Hong Kong, Indiana (Perry County), Kaliningrad, Kentucky,
Michigan, Norfolk Island, South Korea, and Turkey.

For the full list of changes available, please review the
[release notes](https://www.postgresql.org/docs/current/release.html).

Updating


All PostgreSQL update releases are cumulative. As with other minor releases,
users are not required to dump and reload their database or use `pg_upgrade` in
order to apply this update release; you may simply shutdown PostgreSQL and
update its binaries.

Users who have skipped one or more update releases may need to run additional,
post-update steps; please see the release notes for earlier versions for
details.

If you have inconsistent data as a result of saving the output of
`bitshiftright()` in a table, it's possible to fix it with a query similar to:

UPDATE mytab SET bitcol = ~(~bitcol) WHERE bitcol != ~(~bitcol);

**NOTE**: PostgreSQL 9.4 will stop receiving fixes on February 13, 2020. Please
see our [versioning policy](https://www.postgresql.org/support/versioning/) for
more information.

Links
-
* [Download](https://www.postgresql.org/download/)
* [Release Notes](https://www.postgresql.org/docs/current/release.html)
* [Security Page](https://www.postgresql.org/support/security/)
* [Versioning Policy](https://www.postgresql.org/support/versioning/)

Re: make pg_attribute_noreturn() work for msvc?

2019-11-12 Thread Andres Freund
On 2019-11-12 15:58:07 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > So perhaps we ought to rename pg_attribute_noreturn() to pg_noreturn(),
> > add a __declspec(noreturn) version, and move the existing uses to it.
> 
> > I'm inclined to also drop the parentheses at the same time (i.e
> > pg_noreturn rather than pg_noreturn()) - it seems easier to mentally
> > parse the code that way.
> 
> I guess my big question about that is whether pgindent will make a
> hash of it.

If one writes 'pg_noreturn void', rather than 'void pg_noreturn', then
there's only one place where pgindent changes something in a somewhat
weird way. For tablesync.c, it indents the pg_noreturn for
finish_sync_worker(). But only due to being on a separate newline, which
seems unnecessary…

With 'void pg_noreturn', a few prototypes in headers get indented more
than pretty, e.g. in pg_upgrade.h it turns

void pg_noreturn pg_fatal(const char *fmt,...) pg_attribute_printf(1, 2);
into
void		pg_noreturn pg_fatal(const char *fmt,...) pg_attribute_printf(1, 2);


I'm a bit confused as to why pg_upgrade.h doesn't use 'extern' for
function declarations? Not that it's really related, except for the
'extern' otherwise hiding the effect of pgindent not liking 'void
pg_noreturn'…


I don't see a reason not to go for 'pg_noreturn void'?


> One idea is to merge it with the "void" result type that such
> a function would presumably have, along the lines of
> 
> #define pg_noreturn   void __declspec(noreturn)
> ...
> extern pg_noreturn proc_exit(int);

> and if necessary, we could strongarm pgindent into believing
> that pg_noreturn is a typedef.

Yea, that'd be an alternative. But since not necessary, I'd not go
there?

Greetings,

Andres Freund
>From ed3bea2b054fdd0f74b599b50bc8ea457a42fc63 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 12 Nov 2019 13:55:39 -0800
Subject: [PATCH v1] Change pg_attribute_noreturn() to pg_noreturn, move to
 before return type.

---
 contrib/cube/cubedata.h  |  2 +-
 contrib/dblink/dblink.c  |  6 ++
 contrib/pgcrypto/px.h|  2 +-
 contrib/seg/segdata.h|  2 +-
 src/backend/postmaster/autovacuum.c  |  4 ++--
 src/backend/postmaster/pgarch.c  |  2 +-
 src/backend/postmaster/pgstat.c  |  2 +-
 src/backend/postmaster/postmaster.c  |  4 ++--
 src/backend/postmaster/syslogger.c   |  2 +-
 src/backend/replication/logical/tablesync.c  |  3 +--
 src/backend/replication/walsender.c  |  2 +-
 src/backend/utils/adt/json.c |  4 ++--
 src/backend/utils/adt/ri_triggers.c  |  8 
 src/backend/utils/fmgr/dfmgr.c   |  4 ++--
 src/backend/utils/misc/guc.c |  3 +--
 src/bin/pg_dump/pg_backup_utils.h|  2 +-
 src/bin/pg_upgrade/pg_upgrade.h  |  2 +-
 src/bin/pgbench/pgbench.h| 12 ++--
 src/include/bootstrap/bootstrap.h|  4 ++--
 src/include/c.h  | 10 +-
 src/include/mb/pg_wchar.h|  6 +++---
 src/include/parser/parse_relation.h  |  6 +++---
 src/include/parser/scanner.h |  2 +-
 src/include/pgstat.h |  2 +-
 src/include/postmaster/autovacuum.h  |  4 ++--
 src/include/postmaster/bgworker_internals.h  |  2 +-
 src/include/postmaster/bgwriter.h|  4 ++--
 src/include/postmaster/pgarch.h  |  2 +-
 src/include/postmaster/postmaster.h  |  4 ++--
 src/include/postmaster/startup.h |  2 +-
 src/include/postmaster/syslogger.h   |  2 +-
 src/include/postmaster/walwriter.h   |  2 +-
 src/include/replication/walreceiver.h|  2 +-
 src/include/replication/walsender_private.h  |  2 +-
 src/include/storage/ipc.h|  2 +-
 src/include/storage/lock.h   |  2 +-
 src/include/tcop/tcopprot.h  | 10 +-
 src/include/utils/datetime.h |  4 ++--
 src/include/utils/elog.h |  8 
 src/include/utils/help_config.h  |  2 +-
 src/interfaces/ecpg/preproc/preproc_extern.h |  2 +-
 src/pl/plpgsql/src/plpgsql.h |  2 +-
 src/test/modules/test_shm_mq/test_shm_mq.h   |  2 +-
 src/test/modules/worker_spi/worker_spi.c |  2 +-
 src/timezone/zic.c   |  4 ++--
 45 files changed, 79 insertions(+), 83 deletions(-)

diff --git a/contrib/cube/cubedata.h b/contrib/cube/cubedata.h
index dbe7d4f7429..178fb46f217 100644
--- a/contrib/cube/cubedata.h
+++ b/contrib/cube/cubedata.h
@@ -61,7 +61,7 @@ typedef struct NDBOX
 
 /* in cubescan.l */
 extern int	cube_yylex(void);
-extern void cube_yyerror(NDBOX **result, const char *message) pg_attribute_noreturn();
+extern pg_noreturn void cube_yyerror(NDBOX **result, const char *message);
 extern void cube_scanner_init(const char *str);
 

Re: [PATCH] Do not use StdRdOptions in Access Methods

2019-11-12 Thread Alvaro Herrera
On 2019-Nov-07, Amit Langote wrote:

> On Tue, Nov 5, 2019 at 9:22 AM Michael Paquier  wrote:

> >  Please note that I have not switched the old interface
> > to be static to reloptions.c as if you look closely at reloptions.h we
> > allow much more flexibility around HANDLE_INT_RELOPTION to fill and
> > parse the reloptions in custom AM.  AM maintainers had better use the
> > new interface, but there could be a point for more customized error
> > messages.
> 
> I looked around but don't understand why these macros need to be
> exposed.  I read this comment:
> 
>  *  Note that this is more or less the same that fillRelOptions does, so only
>  *  use this if you need to do something non-standard within some option's
>  *  code block.
> 
> but don't see how an AM author might be able to do something
> non-standard with this interface.
> 
> Maybe Alvaro knows this better.

I think the idea was that you could have external code doing things in a
different way for some reason, ie. not use a bytea varlena struct that
could be filled by fillRelOptions but instead ... do something else.
That's why those macros are exposed.  Now, this idea doesn't seem to have
aged very well.  Maybe exposing that stuff is unnecessary.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: auxiliary processes in pg_stat_ssl

2019-11-12 Thread Alvaro Herrera
On 2019-Nov-04, Stephen Frost wrote:

> Based on what we claim in our docs, it does look like 'client_port IS
> NOT NULL' should work.  I do think we might want to update the docs to
> make it a bit more explicit, what we say now is:
> 
> TCP port number that the client is using for communication with this
> backend, or -1 if a Unix socket is used
> 
> We don't explain there that NULL means the backend doesn't have an
> external connection even though plenty of those entries show up in every
> instance of PG.  Perhaps we should add this:
> 
> If this field is null, it indicates that this is an internal process
> such as autovacuum.
> 
> Which is what we say for 'client_addr'.

Seems sensible.  Done.  Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: auxiliary processes in pg_stat_ssl

2019-11-12 Thread Alvaro Herrera
On 2019-Nov-04, Euler Taveira wrote:

> Yep, it is pointless. The backend types that open connections to the server
> are: autovacuum worker, client backend, background worker, and wal sender. I
> also noticed that pg_stat_gssapi is in the same boat as pg_stat_ssl and
> we should constrain the rows to backend types that open connections.
> I'm attaching a patch to list only connections in those system views.

Thanks!  I pushed this.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: JIT performance bug/regression & JIT EXPLAIN

2019-11-12 Thread Maciek Sakrejda
On Mon, Oct 28, 2019 at 5:02 PM Andres Freund  wrote:
> What I dislike about that is that it basically again is introducing

"again"? Am I missing some history here? I'd love to read up on this
if there are mistakes to learn from.

> something that requires either pattern matching on key names (i.e. a key
> of '(.*) JIT' is one that has information about JIT, and the associated
> expresssion is in key $1), or knowing all the potential keys an
> expression could be in.

That still seems less awkward than having to handle a Filter field
that's either scalar or a group. Most current EXPLAIN options just add
additional fields to the structured plan instead of modifying it, no?
If that output is enough of an improvement, though, maybe we should just always
make Filter a group and go with the breaking change? If tooling
authors need to treat this case specially anyway, might as well evolve
the format.

> Another alternative would be to just remove the 'Output' line when a
> node doesn't project - it can't really carry meaning in those cases
> anyway?

¯\_(ツ)_/¯

For what it's worth, I certainly wouldn't miss it.




Re: Missing dependency tracking for TableFunc nodes

2019-11-12 Thread Andres Freund
On 2019-11-12 15:32:14 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2019-11-12 10:19:30 -0500, Tom Lane wrote:
> >> I could imagine annotations that say "this field contains a function OID"
> >> or "this list contains collation OIDs" and then the find_expr_references
> >> logic could be derived from that.
> 
> > I want to attach some annotations to types, rather than fields. I
> > e.g. introduced a Location typedef, annotated as being ignored for
> > equality purposes, instead of annotating each 'int location'. Wonder if
> > we should also do something like that for your hypothetical "function
> > OID" etc. above - seems like it also might give the human reader of code
> > a hint.
> 
> Hm.  We could certainly do "typedef Oid FunctionOid;",
> "typedef List CollationOidList;" etc, but I think it'd get pretty
> tedious pretty quickly --- just for this one purpose, you'd need
> a couple of typedefs for every system catalog that contains
> query-referenceable OIDs.  Maybe that's better than comment-style
> annotations, but I'm not convinced.

I'm not saying that we should exclusively do so, just that it's
worthwhile for a few frequent cases.


> One issue there that I'm not sure how to resolve with autogenerated
> code, much less automated checking, is that quite a few cases in
> find_expr_references know that we don't need to record a dependency on
> an OID stored in the node because there's an indirect dependency on
> something else.  For example, in FuncExpr we needn't log
> funcresulttype because the funcid is enough dependency, and we needn't
> log either funccollid or inputcollid because those are derived from
> the input expressions or the function result type.  (And giving up
> those optimizations would be pretty costly, 4x more dependency checks
> in this example alone.)

Yea, that one is hard. I suspect the best way to address that is to have
explicit code for a few cases that are worth optimizing (like
e.g. FuncExpr), and for the rest use the generic logic.  So far I'd just
written the specialized cases into the code using the "generated
metadata" - but we could also have an annotation that instructs it to
instead call some function, though I doubt that's worthwhile.


> For sure I don't want both "CollationOid" and "RedundantCollationOid"
> typedefs

Indeed.


> so it seems like annotation is the solution for this

I'm not even sure annotations are going to be the easiest way to
implement some of the more complicated edge cases. Might be easier to
just open-code those, and fall back to generic logic for the rest. We'll
have to see, I think.


Greetings,

Andres Freund




Re: Collation versioning

2019-11-12 Thread Thomas Munro
On Wed, Nov 13, 2019 at 3:27 AM Julien Rouhaud  wrote:
> On Sun, Nov 10, 2019 at 10:08 AM Thomas Munro  wrote:
> > That's because the 0003 patch only calls recordDependencyOnVersion()
> > for simple attribute references.  When
> > recordDependencyOnSingleRelExpr() is called by index_create() to
> > analyse ii_Expressions and ii_Predicate, it's going to have to be
> > smart enough to detect collation references and record the versions.
> > There is also some more code that ignores pinned collations hiding in
> > there.
> >
> > This leads to the difficult question of how you recognise a real
> > dependency on a collation's version in an expression.  I have some
> > vague ideas but haven't seriously looked into it yet.  (The same
> > question comes up for check constraint -> collation dependencies.)
>
> Oh good point.  A simple and exhaustive way to deal with that would be
> to teach recordMultipleDependencies() to override isObjectPinned() and
> retrieve the collation version if the referenced object is a collation
> and it's neither the C nor the POSIX collation.  It means that we could also
> remove the extra "version" argument and get rid of
> recordDependencyOnVersion to simply call recordMultipleDependencies in
> create_index for direct column references having a collation.
>
> Would it be ok to add this kind of knowledge in
> recordMultipleDependencies() or is it too hacky?

That doesn't seem like the right place; that's a raw data insertion
function, though... I guess it does already have enough brains to skip
pinned objects.  Hmm.

> > I think create_index() will need to perform recursive analysis on
> > composite types to look for text attributes, when they appear as
> > simple attributes, and then add direct dependencies index -> collation
> > to capture the versions.  Then we might need to do the same for
> > composite types hiding inside ii_Expressions and ii_Predicate (once we
> > figure out what that really means).
>
> Isn't that actually a bug?  For instance such an index will have a 0
> indcollation in pg_index, and according to pg_index documentation:
>
> " this contains the OID of the collation to use for the index, or zero
> if the column is not of a collatable data type."
>
> You can't use a COLLATE expression on such a data type, but it still has
> a collation in use.

I don't think it's a bug.  The information is available, but you have
to follow the graph to get it.  Considering that the composite type
could be something like CREATE TYPE my_type AS (fr_name text COLLATE
"fr_CA", en_name text COLLATE "en_CA"), there is no single collation
you could put into pg_index.indcollation anyway.  As for pg_depend,
it's currently enough for the index to depend on the type, and the
type to depend on the collation, because the purpose of dependencies
is to control dropping and dumping order, but for our new purpose we
also need to create direct dependencies index -> "fr_CA", index ->
"en_CA" so we can record the current versions.

> > 3.  Test t/002_pg_dump.pl in src/bin/pg_upgrade fails.
>
> Apparently neither "make check" nor "make world" run this test :(
> This was broken due to collversion support in pg_dump; I have fixed it
> locally.

make check-world




Re: make pg_attribute_noreturn() work for msvc?

2019-11-12 Thread Tom Lane
Andres Freund  writes:
> So perhaps we ought to rename pg_attribute_noreturn() to pg_noreturn(),
> add a __declspec(noreturn) version, and move the existing uses to it.

> I'm inclined to also drop the parentheses at the same time (i.e
> pg_noreturn rather than pg_noreturn()) - it seems easier to mentally
> parse the code that way.

I guess my big question about that is whether pgindent will make a
hash of it.

One idea is to merge it with the "void" result type that such
a function would presumably have, along the lines of

#define pg_noreturn void __declspec(noreturn)
...
extern pg_noreturn proc_exit(int);

and if necessary, we could strongarm pgindent into believing
that pg_noreturn is a typedef.

regards, tom lane




Re: PHJ file leak.

2019-11-12 Thread Thomas Munro
On Tue, Nov 12, 2019 at 5:03 PM Thomas Munro  wrote:
> On Tue, Nov 12, 2019 at 4:23 PM Thomas Munro  wrote:
> > On Tue, Nov 12, 2019 at 4:20 PM Kyotaro Horiguchi
> >  wrote:
> > > The previous patch would be wrong. The root cause is an open batch, so
> > > the right thing to do at scan end is
> > > ExecHashTableDetachBatch. And the real issue here seems to be in
> > > ExecutePlan, not in PHJ.
> >
> > You are right.  Here is the email I just wrote that says the same
> > thing, but with less efficiency:
>
> And yeah, your Make_parallel_shutdown_on_broken_channel.patch seems
> like the real fix here.  It's not optional to run that at
> end-of-query, though you might get that impression from various
> comments, and it's also not OK to call it before the end of the query,
> though you might get that impression from what the code actually does.

Here's the version I'd like to commit in a day or two, once the dust
has settled on the minor release.  Instead of adding yet another copy
of that code, I just moved it out of the loop; this way there is no
way to miss it.  I think the comment could also be better, but I'll
wait for the concurrent discussions about the meaning of
ExecShutdownNode() to fix that in master.


0001-Make-sure-we-call-ExecShutdownNode-if-appropriate.patch
Description: Binary data


Re: Missing dependency tracking for TableFunc nodes

2019-11-12 Thread Tom Lane
Andres Freund  writes:
> On 2019-11-12 10:19:30 -0500, Tom Lane wrote:
>> I could imagine annotations that say "this field contains a function OID"
>> or "this list contains collation OIDs" and then the find_expr_references
>> logic could be derived from that.

> I want to attach some annotations to types, rather than fields. I
> e.g. introduced a Location typedef, annotated as being ignored for
> equality purposes, instead of annotating each 'int location'. Wonder if
> we should also do something like that for your hypothetical "function
> OID" etc. above - seems like it also might give the human reader of code
> a hint.

Hm.  We could certainly do "typedef Oid FunctionOid;",
"typedef List CollationOidList;" etc, but I think it'd get pretty
tedious pretty quickly --- just for this one purpose, you'd need
a couple of typedefs for every system catalog that contains
query-referenceable OIDs.  Maybe that's better than comment-style
annotations, but I'm not convinced.

> Wonder if there's any way to write an assertion check that verifies we
> have the necessary dependencies. But the only idea I have - basically
> record all the syscache lookups while parse analysing an expression, and
> then check that all the necessary dependencies exist - seems too
> complicated to be worthwhile.

Yeah, it's problematic.  One issue there that I'm not sure how to
resolve with autogenerated code, much less automated checking, is that
quite a few cases in find_expr_references know that we don't need to
record a dependency on an OID stored in the node because there's an
indirect dependency on something else.  For example, in FuncExpr we
needn't log funcresulttype because the funcid is enough dependency,
and we needn't log either funccollid or inputcollid because those are
derived from the input expressions or the function result type.
(And giving up those optimizations would be pretty costly, 4x more
dependency checks in this example alone.)
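
For concreteness, the FuncExpr case in find_expr_references_walker currently
looks roughly like this (paraphrased from dependency.c); the single
add_object_address() call for funcid is exactly the optimization at stake:

if (IsA(node, FuncExpr))
{
	FuncExpr   *funcexpr = (FuncExpr *) node;

	add_object_address(OCLASS_PROC, funcexpr->funcid, 0,
					   context->addrs);
	/* funcresulttype and the collation fields are intentionally not
	 * recorded here; they are covered indirectly via funcid and the
	 * input expressions */
	/* fall through to examine arguments */
}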

For sure I don't want both "CollationOid" and "RedundantCollationOid"
typedefs, so it seems like annotation is the solution for this, but
I see no reasonable way to automatically verify such annotations.
Still, just writing down the annotations would be a way to expose
such assumptions for manual checking, which we don't really have now.

regards, tom lane




make pg_attribute_noreturn() work for msvc?

2019-11-12 Thread Andres Freund
Hi,

At the bottom of
https://www.postgresql.org/message-id/20191112192716.emrqs2afuefunw6v%40alap3.anarazel.de

I mused about the somewhat odd coding pattern at the end of WalSndShutdown():

/*
 * Handle a client's connection abort in an orderly manner.
 */
static void
WalSndShutdown(void)
{
	/*
	 * Reset whereToSendOutput to prevent ereport from attempting to send any
	 * more messages to the standby.
	 */
	if (whereToSendOutput == DestRemote)
		whereToSendOutput = DestNone;

	proc_exit(0);
	abort();					/* keep the compiler quiet */
}

namely that we proc_exit() and then abort(). This seems to be
purely historical baggage (from when this code was inside other functions,
before being centralized), so we can likely just remove the abort().

But even back then, one would have hoped that proc_exit() being
annotated with pg_attribute_noreturn() should have told the compiler
enough.

But it turns out, we don't actually implement that for MSVC. Which does
explain at least some cases where we had to add "keep compiler quiet"
type code specifically for MSVC.

As it turns out, MSVC has its own annotation for functions that don't
return, __declspec(noreturn). But unfortunately it cannot be placed where
we have, so far, placed pg_attribute_noreturn(), namely after the function
name / parameters; it needs to go before the function name instead.

But as it turns out GCC et al's __attribute__((noreturn)) actually can
also be placed there, and seemingly for a long time:
https://godbolt.org/z/8v5aFS

So perhaps we ought to rename pg_attribute_noreturn() to pg_noreturn(),
add a __declspec(noreturn) version, and move the existing uses to it.

I'm inclined to also drop the parentheses at the same time (i.e
pg_noreturn rather than pg_noreturn()) - it seems easier to mentally
parse the code that way.

I actually find the placement closer to the return type easier to
understand, so I'd find this mildly beneficial even without the msvc
angle.
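
To make that concrete, here is a minimal sketch of what the renamed macro
could look like; the compiler conditions below are illustrative only, the
real detection would go through configure / compiler-specific tests:

/* sketch only; exact conditions are an assumption, not the final form */
#if defined(_MSC_VER)
#define pg_noreturn __declspec(noreturn)
#elif defined(__GNUC__) || defined(__clang__)
#define pg_noreturn __attribute__((noreturn))
#else
#define pg_noreturn
#endif

/* placed before the return type, which both compiler families accept */
extern pg_noreturn void proc_exit(int code);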

Greetings,

Andres Freund




Re: Extension development

2019-11-12 Thread Ahsan Hadi
Hi Yonatan,

You can follow this blog for creating your own extension in PostgreSQL:

https://www.highgo.ca/2019/10/01/a-guide-to-create-user-defined-extension-modules-to-postgres/
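
For reference, the C side of a minimal extension looks roughly like the
sketch below (the function and symbol names are made up for illustration).
The SQL script shipped with the extension maps a SQL-level function to this
C symbol, and the control file only tells CREATE EXTENSION where to find
that script; the backend loads the compiled shared library and calls the
function through its fmgr interface:

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(hello_add);

/* called by the executor through the fmgr interface */
Datum
hello_add(PG_FUNCTION_ARGS)
{
	int32		a = PG_GETARG_INT32(0);
	int32		b = PG_GETARG_INT32(1);

	PG_RETURN_INT32(a + b);
}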

-- Ahsan

On Tue, Nov 12, 2019 at 11:54 AM Yonatan Misgan  wrote:

> I have developed my own PostgreSQL extension for learning purposes and it is
> working correctly, but I want to know which components of the database my
> extension components communicate with. For example, I have C code, a makefile,
> an SQL script, and a control file; after compiling with the makefile, which
> components of the database does each of my extension components communicate
> with? Thanks for your response.
>
>
>
> Regards,
>
>
> Yonathan Misgan
> Assistant Lecturer, @ Debre Tabor University
> Faculty of Technology
> Department of Computer Science
> Studying MSc in Computer Science (in Data and Web Engineering)
> @ Addis Ababa University
> E-mail: yona...@dtu.edu.et
> yonathanmisga...@gmail.com
> Tel: (+251)-911180185 (mob)
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.h...@highgo.ca


Re: Missing dependency tracking for TableFunc nodes

2019-11-12 Thread Andres Freund
Hi,

On 2019-11-12 10:19:30 -0500, Tom Lane wrote:
> I think that the long-term answer, if Andres gets somewhere with his
> project to autogenerate code like this, is that we'd rely on annotating
> the struct declarations to tell us what to do.  In the case at hand,
> I could imagine annotations that say "this field contains a function OID"
> or "this list contains collation OIDs" and then the find_expr_references
> logic could be derived from that.  Now, that's not perfect either, because
> it's always possible to forget to annotate something.  But it'd be a lot
> easier, because there'd be tons of nearby examples of doing it right.

Yea, I think that'd be going in the right direction.

I've a few annotations for other purposes in my local version of the
patch (e.g. to ignore fields for comparison), and adding further ones
for purposes like this ought to be easy.

I want to attach some annotations to types, rather than fields. I
e.g. introduced a Location typedef, annotated as being ignored for
equality purposes, instead of annotating each 'int location'. Wonder if
we should also do something like that for your hypothetical "function
OID" etc. above - seems like it also might give the human reader of code
a hint.
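
To illustrate, type-level annotations might look something like the sketch
below, assuming a hypothetical pg_node_attr() marker that expands to nothing
for the compiler and is only read by the metadata generator (both the marker
and the attribute names are assumptions, not existing code):

/* hypothetical marker: invisible to the compiler, parsed by the generator */
#define pg_node_attr(...)

/* every 'Location' field is ignored when comparing nodes for equality */
typedef int Location pg_node_attr(equal_ignore);

/* every 'FunctionOid' field is known to reference pg_proc */
typedef Oid FunctionOid pg_node_attr(references_proc);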



On 2019-11-11 16:41:41 -0500, Tom Lane wrote:
> I happened to notice that find_expr_references_walker has not
> been taught anything about TableFunc nodes, which means it will
> miss the type and collation OIDs embedded in such a node.

> Would it be a good idea to move find_expr_references_walker to
> nodeFuncs.c, in hopes of making it more visible to people adding
> new node types?

Can't hurt, at least. Reducing the number of files that need to be
touched fairly mechanically when adding a node type / node type
field strikes me as a good idea.

Wonder if there's any way to write an assertion check that verifies we
have the necessary dependencies. But the only idea I have - basically
record all the syscache lookups while parse analysing an expression, and
then check that all the necessary dependencies exist - seems too
complicated to be worthwhile.


> We could decouple it from the specific use-case
> of recordDependencyOnExpr by having it call a callback function
> for each identified OID; although maybe there's no point in that,
> since I'm not sure there are any other use-cases.

I could see it being useful for a few other purposes, e.g. it seems
*marginally* possible we could share *some* code with
extract_query_dependencies(). But I think I'd only go there if we
actually convert something else to it.

Greetings,

Andres Freund




Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-11-12 Thread Andres Freund
Hi,

On 2019-11-12 14:17:42 +0900, Michael Paquier wrote:
> On Tue, Oct 22, 2019 at 04:13:03PM +0530, Amit Kapila wrote:
> > Hmm, but then what is your suggestion for existing code that uses {0}.
> > If we reject this patch and leave the current code as it is, there is
> > always a risk of some people using {0} and others using memset which
> > will lead to further deviation in the code.  Now, maybe if we change
> > the existing code to always use memset where we use {0}, then we can
> > kind of enforce such a rule for future patch authors.
> 
> Well, we could have a shot at reducing the footprint of {0} then where
> we can.  I am seeing less than a dozen in contrib/, and a bit more
> than thirty in src/backend/.

-many. I think this serves zero positive purpose, except to make it
harder to analyze code-flow.

I think it's not worth going around to convert code to use {0} style
initializers in most cases, but when authors write it, we shouldn't
remove it either.
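
For readers following along, the two styles being compared are simply these,
as they might appear in catalog code (equivalent ways of zeroing the arrays):

	Datum		values[Natts_pg_attribute] = {0};	/* C99 {0} initializer */

	Datum		values2[Natts_pg_attribute];

	memset(values2, 0, sizeof(values2));			/* explicit memset */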

Greetings,

Andres Freund




Re: Coding in WalSndWaitForWal

2019-11-12 Thread Andres Freund
Hi,

On 2019-11-11 13:53:40 -0300, Alvaro Herrera wrote:
> On 2019-Nov-11, Amit Kapila wrote:
> 
> > On Mon, Nov 11, 2019 at 7:53 AM Michael Paquier  wrote:
> 
> > > So your suggestion would be to call GetFlushRecPtr() before the first
> > > check on RecentFlushPtr and before entering the loop?
> > 
> > No.  What I meant was to keep the current code as-is and have an
> > additional check on RecentFlushPtr before entering the loop.
> 
> I noticed that the "return" at the bottom of the function does a
> SetLatch(), but the other returns do not.  Isn't that a bug?

I don't think it is - We never reset the latch in that case. I don't see
what we'd gain from setting it explicitly, other than unnecessarily
performing more work?


>   /*
>* Fast path to avoid acquiring the spinlock in case we already know we
>* have enough WAL available. This is particularly interesting if we're
>* far behind.
>*/
>   if (RecentFlushPtr != InvalidXLogRecPtr &&
>   loc <= RecentFlushPtr)
> + {
> + SetLatch(MyLatch);
>   return RecentFlushPtr;
> + }

I.e. let's not do this.


>   /* Get a more recent flush pointer. */
>   if (!RecoveryInProgress())
>   RecentFlushPtr = GetFlushRecPtr();
>   else
>   RecentFlushPtr = GetXLogReplayRecPtr(NULL);
>  
> + if (loc <= RecentFlushPtr)
> + {
> + SetLatch(MyLatch);
> + return RecentFlushPtr;
> + }

Hm. I'm doubtful this is a good idea - it essentially means we'd not
check for interrupts, protocol replies, etc. for an unbounded amount of
time. Whereas the existing fast-path does so for a bounded - although
not necessarily short - amount of time.

It seems to me it'd be better to just remove the "get a more recent
flush pointer" block - it doesn't seem to currently serve a meaningful
purpose.


>   for (;;)
>   {
>   longsleeptime;
>  
>   /* Clear any already-pending wakeups */
>   ResetLatch(MyLatch);
>  
> @@ -2267,15 +2276,14 @@ WalSndLoop(WalSndSendDataCallback send_data)
>  
>   /* Sleep until something happens or we time out */
>   (void) WaitLatchOrSocket(MyLatch, wakeEvents,
>
> MyProcPort->sock, sleeptime,
>
> WAIT_EVENT_WAL_SENDER_MAIN);
>   }
>   }
> - return;
>  }

Having dug into history, the reason this exists is that there used to be
the following below the return:

-
-send_failure:
-
-/*
- * Get here on send failure.  Clean up and exit.
- *
- * Reset whereToSendOutput to prevent ereport from attempting to send any
- * more messages to the standby.
- */
-if (whereToSendOutput == DestRemote)
-whereToSendOutput = DestNone;
-
-proc_exit(0);
-    abort();    /* keep the compiler quiet */

but when 5a991ef8692ed (Allow logical decoding via the walsender
interface) moved the shutdown code into its own function,
WalSndShutdown(), we left the returns in place.


We still have the curious

	proc_exit(0);
	abort();					/* keep the compiler quiet */

pattern in WalSndShutdown() - wouldn't the right approach instead be to
rely on proc_exit() being annotated with pg_attribute_noreturn()?


Greetings,

Andres Freund




Re: [Patch] Optimize dropping of relation buffers using dlist

2019-11-12 Thread Tomas Vondra

On Tue, Nov 12, 2019 at 10:49:49AM +, k.jami...@fujitsu.com wrote:

On Thurs, November 7, 2019 1:27 AM (GMT+9), Robert Haas wrote:

On Tue, Nov 5, 2019 at 10:34 AM Tomas Vondra 
wrote:
> 2) This adds another hashtable maintenance to BufferAlloc etc. but
> you've only done tests / benchmark for the case this optimizes. I
> think we need to see a benchmark for workload that allocates and
> invalidates lot of buffers. A pgbench with a workload that fits into
> RAM but not into shared buffers would be interesting.

Yeah, it seems pretty hard to believe that this won't be bad for some workloads.
Not only do you have the overhead of the hash table operations, but you also
have locking overhead around that. A whole new set of LWLocks where you have
to take and release one of them every time you allocate or invalidate a buffer
seems likely to cause a pretty substantial contention problem.


I'm sorry for the late reply. Thank you Tomas and Robert for checking this 
patch.
Attached is the v3 of the patch.
- I moved the unnecessary items from buf_internals.h to cached_buf.c since most
  of those items are only used in that file.
- Fixed the bug of v2. Seems to pass both RT and TAP tests now.

Thanks for the advice on benchmark test. Please refer below for test and 
results.

[Machine spec]
CPU: 16, Number of cores per socket: 8
RHEL6.5, Memory: 240GB

scale: 3125 (about 46GB DB size)
shared_buffers = 8GB

[workload that fits into RAM but not into shared buffers]
pgbench -i -s 3125 cachetest
pgbench -c 16 -j 8 -T 600 cachetest

[Patched]
scaling factor: 3125
query mode: simple
number of clients: 16
number of threads: 8
duration: 600 s
number of transactions actually processed: 8815123
latency average = 1.089 ms
tps = 14691.436343 (including connections establishing)
tps = 14691.482714 (excluding connections establishing)

[Master/Unpatched]
...
number of transactions actually processed: 8852327
latency average = 1.084 ms
tps = 14753.814648 (including connections establishing)
tps = 14753.861589 (excluding connections establishing)


My patch caused a little overhead of about 0.42-0.46%, which I think is small.
Kindly let me know your opinions/comments about the patch or tests, etc.



Now try measuring that with a read-only workload, with prepared
statements. I've tried that on a machine with 16 cores, doing

  # 16 clients
  pgbench -n -S -j 16 -c 16 -M prepared -T 60 test

  # 1 client
  pgbench -n -S -c 1 -M prepared -T 60 test

and average from 30 runs of each looks like this:

    # clients     master    patched        %
   ------------------------------------------
            1      29690      27833    93.7%
           16     300935     283383    94.1%

That's quite a significant regression, considering it's optimizing an
operation that is expected to be pretty rare (people are generally not
dropping objects as often as they query them).

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Proposal: Add more compile-time asserts to expose inconsistencies.

2019-11-12 Thread Andres Freund
Hi Peter, Peter, :)


On 2019-10-28 00:30:11 +, Smith, Peter wrote:
> From: Andres Freund  Sent: Sunday, 27 October 2019 12:03 
> PM
> > My proposal for this really was just to use this as a fallback for when 
> > static assert isn't available. Which in turn I was just suggesting because 
> > Tom wanted a fallback.
>
> The patch is updated to use "extern" technique only when  "_Static_assert" is 
> unavailable.

Cool.


>  /*
>   * forkname_to_number - look up fork number by name
> diff --git a/src/include/c.h b/src/include/c.h
> index d752cc0..3e24ff4 100644
> --- a/src/include/c.h
> +++ b/src/include/c.h
> @@ -838,11 +838,16 @@ extern void ExceptionalCondition(const char 
> *conditionName,
>   do { _Static_assert(condition, errmessage); } while(0)
>  #define StaticAssertExpr(condition, errmessage) \
>   ((void) ({ StaticAssertStmt(condition, errmessage); true; }))
> +/* StaticAssertDecl is suitable for use at file scope. */
> +#define StaticAssertDecl(condition, errmessage) \
> + _Static_assert(condition, errmessage)
>  #else/* 
> !HAVE__STATIC_ASSERT */
>  #define StaticAssertStmt(condition, errmessage) \
>   ((void) sizeof(struct { int static_assert_failure : (condition) ? 1 : 
> -1; }))
>  #define StaticAssertExpr(condition, errmessage) \
>   StaticAssertStmt(condition, errmessage)
> +#define StaticAssertDecl(condition, errmessage) \
> + extern void static_assert_func(int static_assert_failure[(condition) ? 
> 1 : -1])
>  #endif   /* 
> HAVE__STATIC_ASSERT */
>  #else/* C++ */
>  #if defined(__cpp_static_assert) && __cpp_static_assert >= 200410
> @@ -858,7 +863,6 @@ extern void ExceptionalCondition(const char 
> *conditionName,
>  #endif
>  #endif   /* C++
>   */

Peter Smith:

Is there a reason to not just make StaticAssertExpr and StaticAssertStmt
be the same? I don't want to proliferate variants that users have to
understand if there's no compelling need.  Nor do I think we really
need two different fallback implementations for static asserts.

As far as I can tell we should be able to use the prototype based
approach in all the cases where we currently use the "negative bit-field
width" approach?

Should then also update
 * Otherwise we fall back on a kluge that assumes the compiler will complain
 * about a negative width for a struct bit-field.  This will not include a
 * helpful error message, but it beats not getting an error at all.
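
For reference, the practical difference between the two fallbacks is only
where they are legal; a minimal sketch using the macros quoted above (the
example assertions are arbitrary):

/* "negative bit-field width" fallback: an expression, so function scope only */
#define StaticAssertStmt(condition, errmessage) \
	((void) sizeof(struct { int static_assert_failure : (condition) ? 1 : -1; }))

/* prototype-based fallback: a declaration, so it also works at file scope */
#define StaticAssertDecl(condition, errmessage) \
	extern void static_assert_func(int static_assert_failure[(condition) ? 1 : -1])

StaticAssertDecl(sizeof(void *) >= 4, "unsupported pointer size");	/* file scope */

void
example(void)
{
	StaticAssertStmt(sizeof(long long) >= 8, "long long too small");	/* statement */
}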


Peter Eisentraut:

Looking at the cplusplus variant, I'm somewhat surprised to see that you
made both fallback and plain version unconditionally use GCC style
compound expressions:

commit a2c8e5cfdb9d82ae6d4bb8f37a4dc7cbeca63ec1
Author: Peter Eisentraut 
Date:   2016-08-30 12:00:00 -0400

Add support for static assertions in C++

...

+#if defined(__cpp_static_assert) && __cpp_static_assert >= 200410
+#define StaticAssertStmt(condition, errmessage) \
+static_assert(condition, errmessage)
+#define StaticAssertExpr(condition, errmessage) \
+StaticAssertStmt(condition, errmessage)
+#else
+#define StaticAssertStmt(condition, errmessage) \
+do { struct static_assert_struct { int static_assert_failure : (condition) 
? 1 : -1; }; } while(0)
+#define StaticAssertExpr(condition, errmessage) \
+({ StaticAssertStmt(condition, errmessage); })
+#endif

Was that intentional? The C version intentionally uses compound
expressions only for the _Static_assert case, where configure tests for
the compound expression support?  As far as I can tell this'll not allow
using our headers e.g. with msvc in C++ mode if somebody introduces a
static assertion in a header - which seems like a likely and good
outcome with the changes proposed here?


Btw, it looks to me like msvc supports using the C++ static_assert()
even in C mode:
https://godbolt.org/z/b_dxDW

Greetings,

Andres Freund




Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

2019-11-12 Thread Andres Freund
On 2019-11-12 16:25:06 +0100, Daniel Gustafsson wrote:
> > On 11 Nov 2019, at 09:32, Michael Paquier  wrote:
> 
> > This part has resulted in 75c1921, and we could just change
> > DecodeMultiInsert() so as if there is no tuple data then we'd just
> > leave.  However, I don't feel completely comfortable with that either
> > as it would be nice to still check that for normal relations we
> > *always* have a FPW available.
> 
> XLH_INSERT_CONTAINS_NEW_TUPLE will only be set in case of catalog relations
> IIUC (that is, non logically decoded relations), so it seems to me that we can
> have a fastpath out of DecodeMultiInsert() which inspects that flag without
> problems. Is this proposal along the lines of what you were thinking?

Maybe I'm missing something, but it's the opposite, no?

boolneed_tuple_data = RelationIsLogicallyLogged(relation);

...
if (need_tuple_data)
xlrec->flags |= XLH_INSERT_CONTAINS_NEW_TUPLE;

or am I misunderstanding what you mean?


> @@ -1600,10 +1598,16 @@ recordDependencyOnExpr(const ObjectAddress *depender,
>   /* Remove any duplicates */
>   eliminate_duplicate_dependencies(context.addrs);
>  
> - /* And record 'em */
> - recordMultipleDependencies(depender,
> -context.addrs->refs, 
> context.addrs->numrefs,
> -behavior);
> + /*
> +  * And record 'em. If we know we only have a single dependency then
> +  * avoid the extra cost of setting up a multi insert.
> +  */
> + if (context.addrs->numrefs == 1)
> + recordDependencyOn(depender, &context.addrs->refs[0], behavior);
> + else
> + recordMultipleDependencies(depender,
> +
> context.addrs->refs, context.addrs->numrefs,
> +behavior);

I'm not sure this is actually a worthwhile complexity to keep. Hard to
believe that setting up a multi-insert is going to have a significant
enough overhead to matter here?

And if it does, is there a chance we can hide this repeated block
somewhere within recordMultipleDependencies() or such? I don't like the
repetitiveness here. Seems recordMultipleDependencies() could easily
optimize the case of there being exactly one dependency to insert?


> +/*
> + * InsertPgAttributeTuples
> + *   Construct and insert multiple tuples in pg_attribute.
> + *
> + * This is a variant of InsertPgAttributeTuple() which dynamically allocates
> + * space for multiple tuples. Having two so similar functions is a kludge, 
> but
> + * for now it's a TODO to make it less terrible.
> + */
> +void
> +InsertPgAttributeTuples(Relation pg_attribute_rel,
> + FormData_pg_attribute 
> *new_attributes,
> + int natts,
> + CatalogIndexState indstate)
> +{
> + Datum   values[Natts_pg_attribute];
> + boolnulls[Natts_pg_attribute];
> + HeapTuple   tup;
> + int i;
> + TupleTableSlot **slot;
> +
> + /*
> +  * The slots are dropped in CatalogMultiInsertWithInfo(). TODO: natts
> +  * number of slots is not a reasonable assumption as a wide relation
> +  * would be detrimental, figuring a good number is left as a TODO.
> +  */
> + slot = palloc(sizeof(TupleTableSlot *) * natts);

Hm. Looking at

SELECT avg(pg_column_size(pa)) FROM pg_attribute pa;

yielding ~144 bytes, we can probably cap this at 128 or such, without
losing efficiency. Or just use
#define MAX_BUFFERED_BYTES  65535
from copy.c or such (MAX_BUFFERED_BYTES / sizeof(FormData_pg_attribute)).
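
Concretely, the allocation could be capped along these lines (a sketch only,
borrowing the copy.c constant you mention; nothing here is final code):

#define MAX_BUFFERED_BYTES		65535

	int			nslots = Min(natts,
							 MAX_BUFFERED_BYTES / (int) sizeof(FormData_pg_attribute));
	TupleTableSlot **slot = palloc(sizeof(TupleTableSlot *) * nslots);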


> + /* This is a tad tedious, but way cleaner than what we used to do... */

I don't like comments that refer to "what we used to" because there's no
way for anybody to make sense of that, because it's basically a dangling
reference :)


> + memset(values, 0, sizeof(values));
> + memset(nulls, false, sizeof(nulls));

> + /* start out with empty permissions and empty options */
> + nulls[Anum_pg_attribute_attacl - 1] = true;
> + nulls[Anum_pg_attribute_attoptions - 1] = true;
> + nulls[Anum_pg_attribute_attfdwoptions - 1] = true;
> + nulls[Anum_pg_attribute_attmissingval - 1] = true;
> +
> + /* attcacheoff is always -1 in storage */
> + values[Anum_pg_attribute_attcacheoff - 1] = Int32GetDatum(-1);
> +
> + for (i = 0; i < natts; i++)
> + {
> + values[Anum_pg_attribute_attrelid - 1] = 
> ObjectIdGetDatum(new_attributes[i].attrelid);
> + values[Anum_pg_attribute_attname - 1] = 
> NameGetDatum(&new_attributes[i].attname);
> + values[Anum_pg_attribute_atttypid - 1] = 
> ObjectIdGetDatum(new_attributes[i]

Re: SQL/JSON: JSON_TABLE

2019-11-12 Thread Pavel Stehule
Hi

please, can you rebase 0001-SQL-JSON-functions-v40.patch. I have a problem
with patching

Pavel


Re: segmentation fault when cassert enabled

2019-11-12 Thread Jehan-Guillaume de Rorthais
On Mon, 28 Oct 2019 16:47:02 +0900 (JST)
Kyotaro Horiguchi  wrote:

> At Fri, 25 Oct 2019 12:28:38 -0400, Tom Lane  wrote in 
> > Jehan-Guillaume de Rorthais  writes:  
> > > When investigating for the bug reported in thread "logical replication -
> > > negative bitmapset member not allowed", I found a way to seg fault
> > > postgresql only when cassert is enabled.
> > > ...
> > > I hadn't time to digg further yet. However, I don't understand why this
> > > crash is triggered when cassert is enabled.  
> > 
> > Most likely, it's not so much assertions that provoke the crash as
> > CLOBBER_FREED_MEMORY, ie the actual problem here is use of already-freed
> > memory.  
> 
> Agreed.
> 
> By the way I didn't get a crash by Jehan's script with the
> --enable-cassert build of the master HEAD of a few days ago.

I am now working with HEAD and I can confirm I am able to make it crash 99% of
the time using my script.
It feels like a race condition between cache invalidation and record
processing from worker.c. Make sure you have enough write activity during the
test.

> FWIW I sometimes got SEGVish crashes or mysterious misbehavor when
> some structs were changed and I didn't do "make clean". Rarely I
> needed "make distclean". (Yeah, I didn't ususally turn on
> --enable-depend..)

I'm paranoid, I always do:

* make distclean
* git reset; git clean -df
* ./configure && make install

Regards,




Re: Built-in connection pooler

2019-11-12 Thread Konstantin Knizhnik

Hi

On 12.11.2019 10:50, ideriha.take...@fujitsu.com wrote:

Hi.


From: Konstantin Knizhnik [mailto:k.knizh...@postgrespro.ru]

New version of builtin connection pooler fixing handling messages of extended
protocol.


Here are things I've noticed.

1. Is adding guc to postgresql.conf.sample useful for users?

Good catch: I will add it.



2. When proxy_port is a bit large (perhaps more than 2^15), the connection failed,
though the regular "port" is fine with a number greater than 2^15.

$ bin/psql -p  32768
2019-11-12 16:11:25.460 JST [5617] LOG:  Message size 84
2019-11-12 16:11:25.461 JST [5617] WARNING:  could not setup local connect to 
server
2019-11-12 16:11:25.461 JST [5617] DETAIL:  invalid port number: "-22768"
2019-11-12 16:11:25.461 JST [5617] LOG:  Handshake response will be sent to the 
client later when backed is assigned
psql: error: could not connect to server: invalid port number: "-22768"

Hmmm, ProxyPortNumber is used exactly in the same way as PostPortNumber.
I was able to connect to the specified port:


knizhnik@knizhnik:~/dtm-data$ psql postgres -p 42768
psql (13devel)
Type "help" for help.

postgres=# \q
knizhnik@knizhnik:~/dtm-data$ psql postgres -h 127.0.0.1 -p 42768
psql (13devel)
Type "help" for help.

postgres=# \q



3. When proxy_port is 6543 and connection_proxies is 2, running "make
installcheck" twice without restarting the server failed.
This is because of remaining backend.

== dropping database "regression" ==
ERROR:  database "regression" is being accessed by other users
DETAIL:  There is 1 other session using the database.
command failed: "/usr/local/pgsql-connection-proxy-performance/bin/psql" -X -c "DROP DATABASE IF EXISTS 
\"regression\"" "postgres"

Yes, this is a known limitation.
Frankly speaking, I do not consider it a problem: it is not possible
to drop a database while there are active sessions accessing it,
and the proxy definitely has such sessions. You can specify
idle_pool_worker_timeout to shut down pooler workers after some idle time.
In that case, if you leave a large enough pause between test iterations,
the workers will be terminated and it will be possible to drop the database.



4. When running "make installcheck-world" with various connection-proxies, it 
results in a different number of errors.
With connection_proxies = 2, the test never ends. With connection_proxies = 20, 
23 tests failed.
With more connection_proxies, the number of failed tests decreased.

Please note that each proxy maintains its own connection pool.
The default number of pooled backends is 10 (session_pool_size).
If you specify too large a number of proxies, then the number of spawned
backends (session_pool_size * connection_proxies) can be too large for the
specified max_connections.


Please note the difference between the number of proxies and the number of
pooler backends.
Usually one proxy process is enough to serve all workers. Only on MPP
systems with a large number of cores, and especially with SSL connections,
can the proxy become a bottleneck. In that case you can configure several
proxies.

But having more than 1-4 proxies seems to be a bad idea.

In the case of check-world, however, the problem is not related to the
number of proxies.

It takes place even with connection_proxies = 1.
There was one bug in handling clients terminated inside a transaction.
It is fixed in the attached patch.
But there is still a problem with passing the isolation tests under the
connection proxy: they use the pg_isolation_test_session_is_blocked
function, which checks whether backends with the specified PIDs are blocked.
Since a session is no longer bound to a particular backend when the
connection proxy is used, this check may not work as expected and the test
blocks. I do not know how it can be fixed, and I am not sure it has to be
fixed at all.




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/spi/refint.c b/contrib/spi/refint.c
index adf0490..5c2095f 100644
--- a/contrib/spi/refint.c
+++ b/contrib/spi/refint.c
@@ -11,6 +11,7 @@
 
 #include "commands/trigger.h"
 #include "executor/spi.h"
+#include "storage/proc.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 
@@ -93,6 +94,8 @@ check_primary_key(PG_FUNCTION_ARGS)
 	else
 		tuple = trigdata->tg_newtuple;
 
+	MyProc->is_tainted = true;
+
 	trigger = trigdata->tg_trigger;
 	nargs = trigger->tgnargs;
 	args = trigger->tgargs;
@@ -284,6 +287,8 @@ check_foreign_key(PG_FUNCTION_ARGS)
 		/* internal error */
 		elog(ERROR, "check_foreign_key: cannot process INSERT events");
 
+	MyProc->is_tainted = true;
+
 	/* Have to check tg_trigtuple - tuple being deleted */
 	trigtuple = trigdata->tg_trigtuple;
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c91e3e1..df0bcaf 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -719,6 +719,169 @@ include_dir 'conf.d'
   
  
 
+ 
+  max_sessions (integer)
+  
+   ma

[PATCH][BUG FIX] Potential uninitialized vars used.

2019-11-12 Thread Ranier Vilela
Hi,
The variable "TargetEntry *tle" has several paths where it can be left
uninitialized before use.

Can anyone check this bug fix?

--- \dll\postgresql-12.0\a\backend\parser\parse_expr.c  Mon Sep 30 17:06:55 2019
+++ parse_expr.cTue Nov 12 12:43:07 2019
@@ -349,6 +349,7 @@
 errmsg("DEFAULT is not allowed in this 
context"),
 parser_errposition(pstate,

((SetToDefault *) expr)->location)));
+   result = NULL;  /* keep compiler quiet */
break;
 
/*
@@ -1637,11 +1638,13 @@
pstate->p_multiassign_exprs = 
lappend(pstate->p_multiassign_exprs,

  tle);
}
-   else
+   else {
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("source for a multiple-column 
UPDATE item must be a sub-SELECT or ROW() expression"),
 parser_errposition(pstate, 
exprLocation(maref->source;
+return NULL;
+}
}
else
{
@@ -1653,6 +1656,10 @@
Assert(pstate->p_multiassign_exprs != NIL);
tle = (TargetEntry *) llast(pstate->p_multiassign_exprs);
}
+if (tle == NULL) {
+   elog(ERROR, "unexpected expr type in multiassign list");
+   return NULL;/* keep compiler quiet 
*/
+}
 
/*
 * Emit the appropriate output expression for the current column


parse_expr.c.patch
Description: parse_expr.c.patch


Re: Option to dump foreign data in pg_dump

2019-11-12 Thread Daniel Gustafsson
> On 12 Nov 2019, at 15:21, Luis Carril  wrote:
> 
>> The nitpicks have been addressed.  However, it seems that the new file
>> containing the test FDW seems missing from the new version of the patch.  Did
>> you forget to git add the file?
> Yes, I forgot, thanks for noticing. New patch attached again.

The patch applies, compiles and tests clean.  The debate whether we want to
allow dumping of foreign data at all will continue but I am marking the patch
as ready for committer as I believe it is ready for input on that level.

cheers ./daniel



Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

2019-11-12 Thread Daniel Gustafsson
> On 11 Nov 2019, at 09:32, Michael Paquier  wrote:

> This part has resulted in 75c1921, and we could just change
> DecodeMultiInsert() so as if there is no tuple data then we'd just
> leave.  However, I don't feel completely comfortable with that either
> as it would be nice to still check that for normal relations we
> *always* have a FPW available.

XLH_INSERT_CONTAINS_NEW_TUPLE will only be set in case of catalog relations
IIUC (that is, non logically decoded relations), so it seems to me that we can
have a fastpath out of DecodeMultiInsert() which inspects that flag without
problems. Is this proposal along the lines of what you were thinking?

The patch is now generating an error in the regression tests as well, on your
recently added CREATE INDEX test from 68ac9cf2499236996f3d4bf31f7f16d5bd3c77af.
By using the ObjectAddresses API the dependencies are deduped before recorded,
removing the duplicate entry from that test output.  AFAICT there is nothing
benefiting us from having duplicate dependencies, so this seems an improvement
(albeit tiny and not very important), or am I missing something?  Is there a
use for duplicate dependencies?

Attached version contains the above two fixes, as well as a multi_insert for
dependencies in CREATE EXTENSION which I had missed to git add in previous
versions.

cheers ./daniel



catalog_multi_insert-v5.patch
Description: Binary data


Re: documentation inconsistent re: alignment

2019-11-12 Thread Tom Lane
Chapman Flack  writes:
> On 10/20/19 14:47, Tom Lane wrote:
>> Probably the statement in CREATE TYPE is too strong.  There are, I
>> believe, still machines in the buildfarm where maxalign is just 4.

> So just closing the circle on this, the low-down seems to be that
> the alignments called s, i, and d (by pg_type), and int2, int4, and
> double (by CREATE TYPE) refer to the machine values configure picks
> for ALIGNOF_SHORT, ALIGNOF_INT, and ALIGNOF_DOUBLE, respectively.

Right.

> And while configure also defines an ALIGNOF_LONG, and there are
> LONGALIGN macros in c.h that use it, that one isn't a choice when
> creating a type, presumably because it's never been usefully different
> on any interesting platform?

The problem with "long int" is that it's 32 bits on some platforms
and 64 bits on others, so it's not terribly useful as a basis for
a user-visible SQL type.  That's why it's not accounted for in the
typalign options.  I think ALIGNOF_LONG is just there for completeness'
sake --- it doesn't look to me like we actually use that, or LONGALIGN,
anyplace.

regards, tom lane




Re: Missing dependency tracking for TableFunc nodes

2019-11-12 Thread Tom Lane
Mark Dilger  writes:
> I played with this a bit, making the change I proposed, and got lots of 
> warnings from the compiler.  I don't know how many of these would need 
> to be suppressed by adding a no-op for them at the end of the switch vs. 
> how many need to be handled, but the attached patch implements the idea. 
>   I admit adding all these extra cases to the end is verbose

Yeah, that's why it's not done that way ...

> The change as written is much too verbose to be acceptable, but given 
> how many places in the code could really use this sort of treatment, I 
> wonder if there is a way to reorganize the NodeTag enum into multiple 
> enums, one for each logical subtype (such as executor nodes, plan nodes, 
> etc) and then have switches over enums of the given subtype, with the 
> compiler helping detect tags of same subtype that are unhandled in the 
> switch.

The problem here is that the set of nodes of interest can vary depending
on what you're doing.  As a case in point, find_expr_references has to
cover both expression nodes and some things that aren't expression nodes
but can represent dependencies of a plan tree.

I think that the long-term answer, if Andres gets somewhere with his
project to autogenerate code like this, is that we'd rely on annotating
the struct declarations to tell us what to do.  In the case at hand,
I could imagine annotations that say "this field contains a function OID"
or "this list contains collation OIDs" and then the find_expr_references
logic could be derived from that.  Now, that's not perfect either, because
it's always possible to forget to annotate something.  But it'd be a lot
easier, because there'd be tons of nearby examples of doing it right.

regards, tom lane




Re: Option to dump foreign data in pg_dump

2019-11-12 Thread Alvaro Herrera
On 2019-Nov-12, Luis Carril wrote:

> But, not all foreign tables are necessarily in a remote server like
> the ones referenced by the postgres_fdw.
> In FDWs like swarm64da, cstore, citus or timescaledb, the foreign
> tables are part of your database, and one could expect that a dump of
> the database includes data from these FDWs.

BTW these are not FDWs in the "foreign" sense at all; they're just
abusing the FDW system in order to be able to store data in some
different way.  The right thing to do IMO is to port these systems to be
users of the new storage abstraction (table AM).  If we do that, what
value is there to the feature being proposed here?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Collation versioning

2019-11-12 Thread Julien Rouhaud
On Sun, Nov 10, 2019 at 10:08 AM Thomas Munro  wrote:
>
> Some more thoughts:
>
> 1.  If you create an index on an expression that includes a COLLATE or
> a partial index that has one in the WHERE clause, you get bogus
> warnings:
>
> postgres=# create table t (v text);
> CREATE TABLE
> postgres=# create index on t(v) where v > 'hello' collate "en_NZ";
> WARNING:  index "t_v_idx3" depends on collation "en_NZ" version "",
> but the current version is "2.28"
> DETAIL:  The index may be corrupted due to changes in sort order.
> HINT:  REINDEX to avoid the risk of corruption.
> CREATE INDEX
>
> postgres=# create index on t((case when v < 'x' collate "en_NZ" then
> 'foo' else 'bar' end));
> WARNING:  index "t_case_idx" depends on collation "en_NZ" version "",
> but the current version is "2.28"
> DETAIL:  The index may be corrupted due to changes in sort order.
> HINT:  REINDEX to avoid the risk of corruption.
> CREATE INDEX
>
> That's because the 0003 patch only calls recordDependencyOnVersion()
> for simple attribute references.  When
> recordDependencyOnSingleRelExpr() is called by index_create() to
> analyse ii_Expressions and ii_Predicate, it's going to have to be
> smart enough to detect collation references and record the versions.
> There is also some more code that ignores pinned collations hiding in
> there.
>
> This leads to the difficult question of how you recognise a real
> dependency on a collation's version in an expression.  I have some
> vague ideas but haven't seriously looked into it yet.  (The same
> question comes up for check constraint -> collation dependencies.)

Oh good point.  A simple and exhaustive way to deal with that would be
to teach recordMultipleDependencies() to override isObjectPinned() and
retrieve the collation version if the referenced object is a collation
and it's neither the C nor the POSIX collation.  It means that we could also
remove the extra "version" argument and get rid of
recordDependencyOnVersion to simply call recordMultipleDependencies in
create_index for direct column references having a collation.

Would it be ok to add this kind of knowledge in
recordMultipleDependencies() or is it too hacky?
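
For illustration, the check itself could be quite small; a sketch, where
get_collation_version_for_oid() stands in for whatever helper ends up
wrapping the provider-specific version lookup (the helper name is
hypothetical, the rest uses existing catalog constants):

/*
 * Sketch: return the version string to record for a referenced object,
 * or NULL when no version should be stored.  C and POSIX are skipped
 * because their behavior does not depend on an external library.
 */
static char *
referenced_collation_version(const ObjectAddress *referenced)
{
	if (referenced->classId != CollationRelationId)
		return NULL;
	if (referenced->objectId == C_COLLATION_OID ||
		referenced->objectId == POSIX_COLLATION_OID)
		return NULL;

	/* hypothetical helper wrapping the provider-specific lookup */
	return get_collation_version_for_oid(referenced->objectId);
}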

> 2.  If you create a composite type with a text attribute (with or
> without an explicit collation), and then create an index on a column
> of that type, we don't record the dependency.
>
> postgres=# create type my_type as (x text collate "en_NZ");
> CREATE TYPE
> postgres=# create table t (v my_type);
> CREATE TABLE
> postgres=# create index on t(v);
> CREATE INDEX
> postgres=# select * from pg_depend where refobjversion != '';
>  classid | objid | objsubid | refclassid | refobjid | refobjsubid |
> refobjversion | deptype
> -+---+--++--+-+---+-
> (0 rows)
>
> I think create_index() will need to perform recursive analysis on
> composite types to look for text attributes, when they appear as
> simple attributes, and then add direct dependencies index -> collation
> to capture the versions.  Then we might need to do the same for
> composite types hiding inside ii_Expressions and ii_Predicate (once we
> figure out what that really means).

Isn't that actually a bug?  For instance such an index will have a 0
indcollation in pg_index, and according to pg_index documentation:

" this contains the OID of the collation to use for the index, or zero
if the column is not of a collatable data type."

You can't use a COLLATE expression on such a data type, but it still has
a collation in use.

> 3.  Test t/002_pg_dump.pl in src/bin/pg_upgrade fails.

Apparently neither "make check" nor "make world" run this test :(
This was broken due to collversion support in pg_dump; I have fixed it
locally.

> 4.  In the warning message we should show get_collation_name() instead
> of the OID.

+1, I also fixed it locally.




Re: Option to dump foreign data in pg_dump

2019-11-12 Thread Luis Carril
The nitpicks have been addressed.  However, it seems that the new file
containing the test FDW seems missing from the new version of the patch.  Did
you forget to git add the file?

Yes, I forgot, thanks for noticing. New patch attached again.


Cheers


Luis M Carril

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 4bcd4bdaef..319851b936 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -767,6 +767,34 @@ PostgreSQL documentation
   
  
 
+ 
+  --include-foreign-data=foreignserver
+  
+   
+Dump the data for any foreign table with a foreign server
+matching foreignserver
+pattern. Multiple foreign servers can be selected by writing multiple --include-foreign-data.
+Also, the foreignserver parameter is
+interpreted as a pattern according to the same rules used by
+psql's \d commands (see ),
+so multiple foreign servers can also be selected by writing wildcard characters
+in the pattern.  When using wildcards, be careful to quote the pattern
+if needed to prevent the shell from expanding the wildcards;  see
+.
+The only exception is that an empty pattern is disallowed.
+   
+
+   
+
+ When --include-foreign-data is specified, pg_dump
+ does not check if the foreign table is also writeable. Therefore, there is no guarantee that 
+ the results of a foreign table dump can be successfully restored by themselves into a clean database.
+
+   
+  
+ 
+
  
   --inserts
   
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index bf69adc2f4..31465dec79 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -120,6 +120,8 @@ static SimpleStringList table_exclude_patterns = {NULL, NULL};
 static SimpleOidList table_exclude_oids = {NULL, NULL};
 static SimpleStringList tabledata_exclude_patterns = {NULL, NULL};
 static SimpleOidList tabledata_exclude_oids = {NULL, NULL};
+static SimpleStringList foreign_servers_include_patterns = {NULL, NULL};
+static SimpleOidList foreign_servers_include_oids = {NULL, NULL};
 
 
 char		g_opaque_type[10];	/* name for the opaque type */
@@ -156,6 +158,9 @@ static void expand_schema_name_patterns(Archive *fout,
 		SimpleStringList *patterns,
 		SimpleOidList *oids,
 		bool strict_names);
+static void expand_foreign_server_name_patterns(Archive *fout,
+		SimpleStringList *patterns,
+		SimpleOidList *oids);
 static void expand_table_name_patterns(Archive *fout,
 	   SimpleStringList *patterns,
 	   SimpleOidList *oids,
@@ -388,6 +393,7 @@ main(int argc, char **argv)
 		{"no-sync", no_argument, NULL, 7},
 		{"on-conflict-do-nothing", no_argument, &dopt.do_nothing, 1},
 		{"rows-per-insert", required_argument, NULL, 10},
+		{"include-foreign-data", required_argument, NULL, 11},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -604,6 +610,15 @@ main(int argc, char **argv)
 dopt.dump_inserts = (int) rowsPerInsert;
 break;
 
+			case 11:/* include foreign data */
+if (strcmp(optarg, "") == 0)
+{
+	pg_log_error("empty string is not a valid pattern in --include-foreign-data");
+	exit_nicely(1);
+}
+simple_string_list_append(&foreign_servers_include_patterns, optarg);
+break;
+
 			default:
 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 exit_nicely(1);
@@ -645,6 +660,12 @@ main(int argc, char **argv)
 		exit_nicely(1);
 	}
 
+	if (dopt.schemaOnly && foreign_servers_include_patterns.head != NULL)
+	{
+		pg_log_error("options -s/--schema-only and --include-foreign-data cannot be used together");
+		exit_nicely(1);
+	}
+
 	if (dopt.dataOnly && dopt.outputClean)
 	{
 		pg_log_error("options -c/--clean and -a/--data-only cannot be used together");
@@ -812,6 +833,9 @@ main(int argc, char **argv)
 			   &tabledata_exclude_oids,
 			   false);
 
+	expand_foreign_server_name_patterns(fout, &foreign_servers_include_patterns,
+		&foreign_servers_include_oids);
+
 	/* non-matching exclusion patterns aren't an error */
 
 	/*
@@ -1035,6 +1059,9 @@ help(const char *progname)
 	printf(_("  --use-set-session-authorization\n"
 			 "   use SET SESSION AUTHORIZATION commands instead of\n"
 			 "   ALTER OWNER commands to set ownership\n"));
+	printf(_("  --include-foreign-data=PATTERN\n"
+			 "   include data of foreign tables with the named\n"
+			 "   foreign servers in dump\n"));
 
 	printf(_("\nConnection options:\n"));
 	printf(_("  -d, --dbname=DBNAME  database to dump\n"));
@@ -1333,6 +1360,53 @@ expand_schema_name_patterns(Archive *fout,
 	destroyPQExpBuffer(query);
 }
 
+/*
+ * Find the OIDs of all foreign servers matching the given list of patterns,
+ * and append them to the given OID

RE: [PATCH][BUG_FIX] Potential null pointer dereferencing.

2019-11-12 Thread Ranier Vilela
Hi,
The conditions are:
74.  if (TupIsNull(slot)) is true
85.  if (TupIsNull(resultTupleSlot)) is true too.

If resultTupleSlot can never be NULL, why test if (TupIsNull(resultTupleSlot))?
When both conditions occur, ExecClearTuple is called, but it does not check
for a NULL arg.
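
For context, TupIsNull() treats a NULL slot pointer and an empty slot the
same way; roughly (paraphrasing the macro in src/include/executor/tuptable.h,
not quoting it exactly):

#define TupIsNull(slot) \
	((slot) == NULL || TTS_EMPTY(slot))

So TupIsNull(resultTupleSlot) being true does not by itself mean the pointer
is NULL; usually it only means the slot is empty.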

There are at least 2 more possible cases involving ExecClearTuple:
nodeFunctionscan.c and nodeWindowAgg.c

Best regards,
Ranier Vilela


From: Daniel Gustafsson 
Sent: Tuesday, November 12, 2019 13:43
To: Ranier Vilela
Cc: PostgreSQL Hackers
Subject: Re: [PATCH][BUG_FIX] Potential null pointer dereferencing.

> On 12 Nov 2019, at 14:07, Ranier Vilela  wrote:

> ExecClearTuple doesn't check for a NULL pointer arg, and according to
> TupIsNull, slot can be NULL.

I assume you are referring to the TupIsNull(resultTupleSlot) check a few lines
down in the same loop?  If resultTupleSlot was indeed NULL and not empty, the
subsequent call to ExecCopySlot would be a NULL pointer dereference too.  I
might be missing something obvious, but in which case can resultTupleSlot be
NULL when calling ExecUnique?

cheers ./daniel




Re: [PATCH][BUG_FIX] Potential null pointer dereferencing.

2019-11-12 Thread Daniel Gustafsson
> On 12 Nov 2019, at 14:07, Ranier Vilela  wrote:

> ExecClearTuple doesn't check for a NULL pointer arg, and according to
> TupIsNull, slot can be NULL.

I assume you are referring to the TupIsNull(resultTupleSlot) check a few lines
down in the same loop?  If resultTupleSlot was indeed NULL and not empty, the
subsequent call to ExecCopySlot would be a NULL pointer dereference too.  I
might be missing something obvious, but in which case can resultTupleSlot be
NULL when calling ExecUnique?

cheers ./daniel



Re: [HACKERS] Block level parallel vacuum

2019-11-12 Thread Amit Kapila
On Tue, Nov 12, 2019 at 5:30 PM Masahiko Sawada
 wrote:
>
> On Tue, 12 Nov 2019 at 20:11, Amit Kapila  wrote:
> >
> > On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada
> >  wrote:
> > >
> > > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar  wrote:
> > > >
> > > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > Yeah, maybe something like amparallelvacuumoptions.  The options can 
> > > > > be:
> > > > >
> > > > > VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> > > > > vacuumcleanup) can't be performed in parallel
> > > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> > > > > performed in parallel (hash index will set this flag)
> > > >
> > > > Maybe we don't want this option?  because if 3 or 4 is not set then we
> > > > will not do the cleanup in parallel right?
> > > >
> >
> > Yeah, but it is better to be explicit about this.
>
> VACUUM_OPTION_NO_PARALLEL_BULKDEL is missing?
>

I am not sure if that is required.

> I think brin indexes
> will use this flag.
>

Brin index can set VACUUM_OPTION_PARALLEL_CLEANUP in my proposal and
it should work.

> It will end up with
> (VACUUM_OPTION_NO_PARALLEL_CLEANUP |
> VACUUM_OPTION_NO_PARALLEL_BULKDEL) is equivalent to
> VACUUM_OPTION_NO_PARALLEL, though.
>
> >
> > > > > VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > > flag)
> > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
> > > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> > > > > gin, gist, spgist, bloom will set this flag)
> > > > > VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> > > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > > and bloom will set this flag)
> > > > >
> > > > > Does something like this make sense?
> > >
> > > 3 and 4 confused me because 4 also looks conditional. How about having
> > > two flags instead: one for doing parallel cleanup when not performed
> > > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing
> > > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)?
> > >
> >
> > Hmm, this is exactly what I intend to say with 3 and 4.  I am not sure
> > what makes you think 4 is conditional.
>
> Hmm so why gin and bloom will set 3 and 4 flags? I thought if it sets
> 4 it doesn't need to set 3 because 4 means always doing cleanup in
> parallel.
>

Yeah, that makes sense.  They can just set 4.

> >
> > > That way, we
> > > can have flags as follows and index AM chooses two flags, one from the
> > > first two flags for bulk deletion and another from next three flags
> > > for cleanup.
> > >
> > > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0
> > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1
> > > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2
> > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3
> > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4
> > >
> >
> > This also looks reasonable, but if there is an index that doesn't want
> > to support a parallel vacuum, it needs to set multiple flags.
>
> Right. It would be better to use uint16 as two uint8. I mean that if
> first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL and if
> next 8 bits are 0 means VACUUM_OPTION_PARALLEL_NO_CLEANUP. Other flags
> could be followings:
>
> VACUUM_OPTION_PARALLEL_BULKDEL 0x0001
> VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100
> VACUUM_OPTION_PARALLEL_CLEANUP 0x0200
>

Hmm, I think we should define these flags in the most simple way.
Your previous proposal sounds okay to me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Option to dump foreign data in pg_dump

2019-11-12 Thread Daniel Gustafsson
> On 12 Nov 2019, at 12:12, Luis Carril  wrote:

>a new version of the patch with the tests from Daniel (thanks!) and the 
> nitpicks.

The nitpicks have been addressed.  However, it seems that the new file
containing the test FDW is missing from the new version of the patch.  Did
you forget to git add the file?

>> I don't feel good about this feature.
>> pg_dump should not dump any data that are not part of the database
>> being dumped.
>> 
>> If you restore such a dump, the data will be inserted into the foreign table,
>> right?  Unless someone emptied the remote table first, this will add
>> duplicated data to that table.
>> I think that is an unpleasant surprise.  I'd expect that if I drop a database
>> and restore it from a dump, it should be as it was before.  This change would
>> break that assumption.
>> 
>> What are the use cases of a dump with foreign table data?
>> 
>> Unless I misunderstood something there, -1.
> 
> This feature is opt-in so if the user makes dumps of a remote server 
> explicitly by other means, then the user would not need to use this option.
> But, not all foreign tables are necessarily in a remote server like the ones 
> referenced by the postgres_fdw.
> In FDWs like swarm64da, cstore, citus or timescaledb, the foreign tables are 
> part of your database, and one could expect that a dump of the database 
> includes data from these FDWs.

Right, given the deliberate opt-in which is required I don't see much risk of
unpleasant user surprises.  There are no doubt foot-guns available with this
feature, as has been discussed upthread, but the current proposal is IMHO
minimizing them.

cheers ./daniel



Re: MarkBufferDirtyHint() and LSN update

2019-11-12 Thread Antonin Houska
Kyotaro Horiguchi  wrote:

> At Mon, 11 Nov 2019 10:03:14 +0100, Antonin Houska  wrote 
> in 
> > Michael Paquier  wrote:
> > > Does something like the attached patch make sense?  Reviews are
> > > welcome.
> > 
> > This looks good to me.
> 
> I have a question.
> 
> The current code assumes that !BM_DIRTY means that the function is
> dirtying the page.  But if !BM_JUST_DIRTIED, the function actually is
> going to re-dirty the page even if BM_DIRTY.

It makes sense to me. I can imagine the following:

1. FlushBuffer() cleared BM_JUST_DIRTIED, wrote the page to disk but hasn't
yet cleared BM_DIRTY.

2. Another backend changed a hint bit in shared memory and called
MarkBufferDirtyHint().

Thus FlushBuffer() missed the current hint bit change, so we need to dirty the
page again.

> If this is correct, the trigger for stats update is not !BM_DIRTY but
> !BM_JUST_DIRTIED, or the fact that we passed the line of
> XLogSaveBufferForHint() could be the trigger, regardless of whether the
> LSN is valid or not.

I agree.
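
For illustration, here is a self-contained sketch of the two candidate
triggers (BM_DIRTY and BM_JUST_DIRTIED are the real flag names, but the bit
positions and the helper functions below are placeholders, not the bufmgr.c
code):

#include <stdbool.h>
#include <stdint.h>

/* placeholder bit positions; the real definitions live in buf_internals.h */
#define BM_DIRTY		(1u << 0)
#define BM_JUST_DIRTIED	(1u << 1)

/* the condition the current code uses to decide "this call dirties the page" */
static bool
counts_as_dirtying_current(uint32_t buf_state)
{
	return (buf_state & BM_DIRTY) == 0;
}

/* the condition suggested in this thread */
static bool
counts_as_dirtying_suggested(uint32_t buf_state)
{
	return (buf_state & BM_JUST_DIRTIED) == 0;
}

/*
 * The race described above is the window where FlushBuffer() has cleared
 * BM_JUST_DIRTIED but not yet BM_DIRTY: only the suggested condition treats
 * the hint-bit change as a (re-)dirtying of the page.
 */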

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




[PATCH][BUG_FIX] Potential null pointer dereferencing.

2019-11-12 Thread Ranier Vilela
Hi,
ExecClearTuple doesn't check for a NULL pointer arg, and according to
TupIsNull, slot can be NULL.

Can anyone check this bug fix?

--- \dll\postgresql-12.0\a\backend\executor\nodeUnique.c	Mon Sep 30 17:06:55 2019
+++ nodeUnique.c	Tue Nov 12 09:54:34 2019
@@ -74,7 +74,8 @@
if (TupIsNull(slot))
{
/* end of subplan, so we're done */
-   ExecClearTuple(resultTupleSlot);
+   if (!TupIsNull(resultTupleSlot))
+   ExecClearTuple(resultTupleSlot);
return NULL;
}
 


nodeUnique.c.patch
Description: nodeUnique.c.patch


Re: MarkBufferDirtyHint() and LSN update

2019-11-12 Thread Kyotaro Horiguchi
At Mon, 11 Nov 2019 10:03:14 +0100, Antonin Houska  wrote in 
> Michael Paquier  wrote:
> > Does something like the attached patch make sense?  Reviews are
> > welcome.
> 
> This looks good to me.

I have a question.

The current code assumes that !BM_DIRTY means that the function is
dirtying the page.  But if !BM_JUST_DIRTIED, the function actually is
going to re-dirty the page even if BM_DIRTY.

If this is correct, the trigger for stats update is not !BM_DIRTY but
!BM_JUST_DIRTIED, or the fact that we passed the line of
XLogSaveBufferForHint() could be the trigger, regardless of whether the
LSN is valid or not.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Block level parallel vacuum

2019-11-12 Thread Masahiko Sawada
On Tue, 12 Nov 2019 at 20:29, Dilip Kumar  wrote:
>
> On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
>  wrote:
> >
> > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar  wrote:
> > >
> > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada  
> > > wrote:
> > > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > > updated version patch that also incorporated some comments I got so
> > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > > test the total delay time.
> > > >
> > > While reviewing the 0002, I got one doubt related to how we are
> > > dividing the maintenance_work_mem
> > >
> > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int 
> > > nindexes)
> > > +{
> > > + /* Compute the new maitenance_work_mem value for index vacuuming */
> > > + lvshared->maintenance_work_mem_worker =
> > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> > > maintenance_work_mem;
> > > +}
> > > Is it fair to just consider the number of indexes which use
> > > maintenance_work_mem?  Or we need to consider the number of worker as
> > > well.  My point is suppose there are 10 indexes which will use the
> > > maintenance_work_mem but we are launching just 2 workers then what is
> > > the point in dividing the maintenance_work_mem by 10.
> > >
> > > IMHO the calculation should be like this
> > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> > > maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> > > maintenance_work_mem;
> > >
> > > Am I missing something?
> >
> > No, I think you're right. On the other hand I think that dividing it
> > by the number of indexes that will use the mantenance_work_mem makes
> > sense when parallel degree > the number of such indexes. Suppose the
> > table has 2 indexes and there are 10 workers then we should divide the
> > maintenance_work_mem by 2 rather than 10 because it's possible that at
> > most 2 indexes that uses the maintenance_work_mem are processed in
> > parallel at a time.
> >
> Right, that's the reason I suggested dividing by Min(nindexes_mwm, nworkers).

Thanks! I'll fix it in the next version patch.

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Block level parallel vacuum

2019-11-12 Thread Masahiko Sawada
On Tue, 12 Nov 2019 at 20:11, Amit Kapila  wrote:
>
> On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada
>  wrote:
> >
> > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar  wrote:
> > >
> > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila  
> > > wrote:
> > > >
> > > > Yeah, maybe something like amparallelvacuumoptions.  The options can be:
> > > >
> > > > VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> > > > vacuumcleanup) can't be performed in parallel
> > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> > > > performed in parallel (hash index will set this flag)
> > >
> > > Maybe we don't want this option?  because if 3 or 4 is not set then we
> > > will not do the cleanup in parallel right?
> > >
>
> Yeah, but it is better to be explicit about this.

VACUUM_OPTION_NO_PARALLEL_BULKDEL is missing? I think brin indexes
will use this flag. It will end up with
(VACUUM_OPTION_NO_PARALLEL_CLEANUP | VACUUM_OPTION_NO_PARALLEL_BULKDEL)
being equivalent to VACUUM_OPTION_NO_PARALLEL, though.

>
> > > > VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > flag)
> > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
> > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> > > > gin, gist, spgist, bloom will set this flag)
> > > > VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > and bloom will set this flag)
> > > >
> > > > Does something like this make sense?
> >
> > 3 and 4 confused me because 4 also looks conditional. How about having
> > two flags instead: one for doing parallel cleanup when not performed
> > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing
> > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)?
> >
>
> Hmm, this is exactly what I intend to say with 3 and 4.  I am not sure
> what makes you think 4 is conditional.

Hmm, so why would gin and bloom set both the 3 and 4 flags? I thought
that if an AM sets 4 it doesn't need to set 3, because 4 means always
doing cleanup in parallel.

>
> > That way, we
> > can have flags as follows and index AM chooses two flags, one from the
> > first two flags for bulk deletion and another from next three flags
> > for cleanup.
> >
> > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0
> > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1
> > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2
> > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3
> > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4
> >
>
> This also looks reasonable, but if there is an index that doesn't want
> to support a parallel vacuum, it needs to set multiple flags.

Right. It would be better to use a uint16 as two uint8s. I mean that if
the first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL, and if
the next 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_CLEANUP. The
other flags could be the following:

VACUUM_OPTION_PARALLEL_BULKDEL 0x0001
VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100
VACUUM_OPTION_PARALLEL_CLEANUP 0x0200
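
To make that encoding concrete, here is a minimal, self-contained sketch (the
flag names and values are the ones listed above; the helper functions and the
example AM setting are illustrative assumptions, not from any posted patch):

#include <stdbool.h>
#include <stdint.h>

#define VACUUM_OPTION_PARALLEL_BULKDEL      0x0001	/* low byte: bulk delete */
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100	/* high byte: cleanup */
#define VACUUM_OPTION_PARALLEL_CLEANUP      0x0200

/*
 * A low byte of zero means "no parallel bulk delete" and a high byte of zero
 * means "no parallel cleanup", so a value of 0 means "no parallel vacuum".
 */
static inline bool
supports_parallel_bulkdel(uint16_t options)
{
	return (options & 0x00FF) != 0;
}

static inline bool
supports_parallel_cleanup(uint16_t options)
{
	return (options & 0xFF00) != 0;
}

/* e.g. an AM such as nbtree might advertise bulk delete plus conditional cleanup */
static const uint16_t example_amparallelvacuumoptions =
	VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;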

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




RE: [BUG FIX] Uninitialized var fargtypes used.

2019-11-12 Thread Ranier Vilela
Hi,
Sorry for the error in the patch.

--- \dll\postgresql-12.0\a\backend\commands\event_trigger.c	Mon Sep 30 17:06:55 2019
+++ event_trigger.c Tue Nov 12 08:34:30 2019
@@ -171,7 +171,7 @@
  HeapTuple tuple;
  Oid funcoid;
  Oid funcrettype;
- Oid fargtypes[1]; /* dummy */
+ Oid fargtypes[1] = {InvalidOid}; /* dummy */
  Oid evtowner = GetUserId();
  ListCell   *lc;
  List   *tags = NULL;



From: Michael Paquier 
Sent: Tuesday, November 12, 2019 03:31
To: Ranier Vilela 
Cc: pgsql-hackers@lists.postgresql.org 
Subject: Re: [BUG FIX] Uninitialized var fargtypes used.

On Mon, Nov 11, 2019 at 06:28:47PM +, Ranier Vilela wrote:
> Can anyone check this bug fix?
>
> +++ event_trigger.c Mon Nov 11 13:52:35 2019
> @@ -171,7 +171,7 @@
>  HeapTuple   tuple;
>  Oid funcoid;
>  Oid funcrettype;
> -   Oid fargtypes[1];   /* dummy */
> +   Oid fargtypes[1] = {InvalidOid, InvalidOid};  
>   /* dummy */
>  Oid evtowner = GetUserId();

Yeah, it would be better to fix this initialization.
--
Michael


event_trigger.c.patch
Description: event_trigger.c.patch


Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-11-12 Thread Kuntal Ghosh
On Tue, Nov 12, 2019 at 4:12 PM Alexey Kondratov
 wrote:
>
> On 04.11.2019 13:05, Kuntal Ghosh wrote:
> > On Mon, Nov 4, 2019 at 3:32 PM Dilip Kumar  wrote:
> >> So your result shows that with "streaming on", performance is
> >> degrading?  By any chance did you try to see where is the bottleneck?
> >>
> > Right. But, as we increase the logical_decoding_work_mem, the
> > performance improves. I've not analyzed the bottleneck yet. I'm
> > looking into the same.
>
> My guess is that 64 kB is just too small value. In the table schema used
> for tests every row takes at least 24 bytes for storing column values.
> Thus, with this logical_decoding_work_mem value the limit should be hit
> after about 2500+ rows, or about 400 times during transaction of 100
> rows size.
>
> It is just too frequent, while ReorderBufferStreamTXN includes a whole
> bunch of logic, e.g. it always starts internal transaction:
>
> /*
>   * Decoding needs access to syscaches et al., which in turn use
>   * heavyweight locks and such. Thus we need to have enough state around to
>   * keep track of those.  The easiest way is to simply use a transaction
>   * internally.  That also allows us to easily enforce that nothing writes
>   * to the database by checking for xid assignments. ...
>   */
>
> Also it issues separated stream_start/stop messages around each streamed
> transaction chunk. So if streaming starts and stops too frequently it
> adds additional overhead and may even interfere with current in-progress
> transaction.
>
Yeah, I've also found the same. With the stream_start/stop messages, it
writes 1 byte of checksum and 4 bytes for the number of sub-transactions,
which increases the write amplification significantly.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Block level parallel vacuum

2019-11-12 Thread Dilip Kumar
On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
 wrote:
>
> On Mon, 11 Nov 2019 at 17:57, Dilip Kumar  wrote:
> >
> > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada  
> > wrote:
> > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > updated version patch that also incorporated some comments I got so
> > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > test the total delay time.
> > >
> > While reviewing the 0002, I got one doubt related to how we are
> > dividing the maintenance_work_mem
> >
> > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> > +{
> > + /* Compute the new maitenance_work_mem value for index vacuuming */
> > + lvshared->maintenance_work_mem_worker =
> > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> > maintenance_work_mem;
> > +}
> > Is it fair to just consider the number of indexes which use
> > maintenance_work_mem?  Or we need to consider the number of worker as
> > well.  My point is suppose there are 10 indexes which will use the
> > maintenance_work_mem but we are launching just 2 workers then what is
> > the point in dividing the maintenance_work_mem by 10.
> >
> > IMHO the calculation should be like this
> > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> > maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> > maintenance_work_mem;
> >
> > Am I missing something?
>
> No, I think you're right. On the other hand I think that dividing it
> by the number of indexes that will use the mantenance_work_mem makes
> sense when parallel degree > the number of such indexes. Suppose the
> table has 2 indexes and there are 10 workers then we should divide the
> maintenance_work_mem by 2 rather than 10 because it's possible that at
> most 2 indexes that uses the maintenance_work_mem are processed in
> parallel at a time.
>
Right, that's the reason I suggested dividing by Min(nindexes_mwm, nworkers).
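
For clarity, a minimal standalone version of that computation (the Min()
macro is written out here; the wrapper function itself is just illustrative):

#define Min(x, y)	((x) < (y) ? (x) : (y))

/* per-worker budget, in the same units as maintenance_work_mem */
static int
worker_maintenance_work_mem(int maintenance_work_mem,
							int nindexes_mwm,	/* indexes that use m_w_m */
							int nworkers)		/* workers actually launched */
{
	/*
	 * e.g. with 10 such indexes but only 2 workers, divide by 2 rather than
	 * 10, since at most 2 of those indexes are vacuumed at the same time.
	 */
	if (nindexes_mwm > 0)
		return maintenance_work_mem / Min(nindexes_mwm, nworkers);

	return maintenance_work_mem;
}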


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: cost based vacuum (parallel)

2019-11-12 Thread Masahiko Sawada
On Tue, 12 Nov 2019 at 19:08, Amit Kapila  wrote:
>
> On Tue, Nov 12, 2019 at 3:03 PM Dilip Kumar  wrote:
> >
> > On Tue, Nov 12, 2019 at 10:47 AM Dilip Kumar  wrote:
> > >
> > > On Mon, Nov 11, 2019 at 4:23 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Mon, Nov 11, 2019 at 12:59 PM Dilip Kumar  
> > > > wrote:
> > > > >
> > > > > On Mon, Nov 11, 2019 at 9:43 AM Dilip Kumar  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, Nov 8, 2019 at 11:49 AM Amit Kapila 
> > > > > >  wrote:
> > > > > > >
> > > > > > >
> > > > > > > Yeah, I think it is difficult to get the exact balance, but we 
> > > > > > > can try
> > > > > > > to be as close as possible.  We can try to play with the 
> > > > > > > threshold and
> > > > > > > another possibility is to try to sleep in proportion to the 
> > > > > > > amount of
> > > > > > > I/O done by the worker.
> > > > > > I have done another experiment where I have done another 2 changes 
> > > > > > on
> > > > > > top of patch3
> > > > > > a) Only reduce the local balance from the total shared balance
> > > > > > whenever it's applying delay
> > > > > > b) Compute the delay based on the local balance.
> > > > > >
> > > > > > patch4:
> > > > > > worker 0 delay=84.13 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > > worker 1 delay=89.23 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > > worker 2 delay=88.68 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > > worker 3 delay=80.79 total I/O=16378 hit=4318 miss=0 dirty=603
> > > > > >
> > > > > > I think with this approach the delay is divided among the worker 
> > > > > > quite
> > > > > > well compared to other approaches
> > > > > >
> > > > > > >
> > > > ..
> > > > > I have tested the same with some other workload(test file attached).
> > > > > I can see the same behaviour with this workload as well that with the
> > > > > patch 4 the distribution of the delay is better compared to other
> > > > > patches i.e. worker with more I/O have more delay and with equal IO
> > > > > have almost equal delay.  Only thing is that the total delay with
> > > > > the patch 4 is slightly less compared to other patches.
> > > > >
> > > >
> > > > I see one problem with the formula you have used in the patch, maybe
> > > > that is causing the value of total delay to go down.
> > > >
> > > > - if (new_balance >= VacuumCostLimit)
> > > > + VacuumCostBalanceLocal += VacuumCostBalance;
> > > > + if ((new_balance >= VacuumCostLimit) &&
> > > > + (VacuumCostBalanceLocal > VacuumCostLimit/(0.5 * nworker)))
> > > >
> > > > As per discussion, the second part of the condition should be
> > > > "VacuumCostBalanceLocal > (0.5) * VacuumCostLimit/nworker".  I think
> > > > you can once change this and try again.  Also, please try with the
> > > > different values of threshold (0.3, 0.5, 0.7, etc.).
> > > >
> > > I have modified the patch4 and ran with different values.  But, I
> > > don't see much difference in the values with the patch4.  Infact I
> > > removed the condition for the local balancing check completely still
> > > the delays are the same,  I think this is because with patch4 worker
> > > are only reducing their own balance and also delaying as much as their
> > > local balance.  So maybe the second condition will not have much
> > > impact.
> > >
>
> Yeah, but I suspect the condition (when the local balance exceeds a
> certain threshold, then only try to perform delay) you mentioned can
> have an impact in some other scenarios.  So, it is better to retain
> the same.  I feel the overall results look sane and the approach seems
> reasonable to me.
>
> > >
> > I have revised the patch4 so that it doesn't depend upon a fixed
> > number of workers, instead I have dynamically updated the worker
> > count.
> >
>
> Thanks.  Sawada-San, by any chance, can you try some of the tests done
> by Dilip or some similar tests just to rule out any sort of
> machine-specific dependency?

Sure. I'll try it tomorrow.
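
For readers following the numbers above, here is a condensed, self-contained
sketch of the "patch4" rule being tested (the 0.5 threshold and the
proportional sleep come from this thread; all names below are placeholders,
not the actual patch variables):

#define COST_LIMIT	200		/* stands in for VacuumCostLimit */
#define COST_DELAY	2.0		/* milliseconds, stands in for VacuumCostDelay */

/*
 * Returns how long (in ms) this worker should sleep.  shared_balance is the
 * sum of all workers' accumulated costs; local_balance is only this worker's
 * share and is reset when the worker actually sleeps.
 */
static double
vacuum_delay_ms(int shared_balance, int *local_balance, int nworkers)
{
	double		msec = 0.0;

	if (shared_balance >= COST_LIMIT &&
		*local_balance > 0.5 * COST_LIMIT / nworkers)
	{
		/* sleep in proportion to this worker's own I/O ... */
		msec = COST_DELAY * *local_balance / COST_LIMIT;

		/*
		 * ... and (in the patch) only local_balance is subtracted from the
		 * shared balance before the reset, so other workers are not charged
		 * for it.
		 */
		*local_balance = 0;
	}

	return msec;
}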

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Option to dump foreign data in pg_dump

2019-11-12 Thread Luis Carril
Hello

   here is a new version of the patch with the tests from Daniel (thanks!) and
the nitpicks.


> I don't feel good about this feature.
> pg_dump should not dump any data that are not part of the database
> being dumped.
>
> If you restore such a dump, the data will be inserted into the foreign table,
> right?  Unless someone emptied the remote table first, this will add
> duplicated data to that table.
> I think that is an unpleasant surprise.  I'd expect that if I drop a database
> and restore it from a dump, it should be as it was before.  This change would
> break that assumption.
>
> What are the use cases of a dump with foreign table data?
>
> Unless I misunderstood something there, -1.

This feature is opt-in, so if the user makes dumps of a remote server explicitly
by other means, then the user would not need to use this option.

But, not all foreign tables are necessarily in a remote server like the ones 
referenced by the postgres_fdw.
In FDWs like swarm64da, cstore, citus or timescaledb, the foreign tables are 
part of your database, and one could expect that a dump of the database 
includes data from these FDWs.


Cheers


Luis M Carril


diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 4bcd4bdaef..319851b936 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -767,6 +767,34 @@ PostgreSQL documentation
   
  
 
+ 
+  --include-foreign-data=foreignserver
+  
+   
+Dump the data for any foreign table with a foreign server
+matching foreignserver
+pattern. Multiple foreign servers can be selected by writing multiple --include-foreign-data.
+Also, the foreignserver parameter is
+interpreted as a pattern according to the same rules used by
+psql's \d commands (see ),
+so multiple foreign servers can also be selected by writing wildcard characters
+in the pattern.  When using wildcards, be careful to quote the pattern
+if needed to prevent the shell from expanding the wildcards;  see
+.
+The only exception is that an empty pattern is disallowed.
+   
+
+   
+
+ When --include-foreign-data is specified, pg_dump
+ does not check if the foreign table is also writeable. Therefore, there is no guarantee that 
+ the results of a foreign table dump can be successfully restored by themselves into a clean database.
+
+   
+  
+ 
+
  
   --inserts
   
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index bf69adc2f4..31465dec79 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -120,6 +120,8 @@ static SimpleStringList table_exclude_patterns = {NULL, NULL};
 static SimpleOidList table_exclude_oids = {NULL, NULL};
 static SimpleStringList tabledata_exclude_patterns = {NULL, NULL};
 static SimpleOidList tabledata_exclude_oids = {NULL, NULL};
+static SimpleStringList foreign_servers_include_patterns = {NULL, NULL};
+static SimpleOidList foreign_servers_include_oids = {NULL, NULL};
 
 
 char		g_opaque_type[10];	/* name for the opaque type */
@@ -156,6 +158,9 @@ static void expand_schema_name_patterns(Archive *fout,
 		SimpleStringList *patterns,
 		SimpleOidList *oids,
 		bool strict_names);
+static void expand_foreign_server_name_patterns(Archive *fout,
+		SimpleStringList *patterns,
+		SimpleOidList *oids);
 static void expand_table_name_patterns(Archive *fout,
 	   SimpleStringList *patterns,
 	   SimpleOidList *oids,
@@ -388,6 +393,7 @@ main(int argc, char **argv)
 		{"no-sync", no_argument, NULL, 7},
 		{"on-conflict-do-nothing", no_argument, &dopt.do_nothing, 1},
 		{"rows-per-insert", required_argument, NULL, 10},
+		{"include-foreign-data", required_argument, NULL, 11},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -604,6 +610,15 @@ main(int argc, char **argv)
 dopt.dump_inserts = (int) rowsPerInsert;
 break;
 
+			case 11:/* include foreign data */
+if (strcmp(optarg, "") == 0)
+{
+	pg_log_error("empty string is not a valid pattern in --include-foreign-data");
+	exit_nicely(1);
+}
+simple_string_list_append(&foreign_servers_include_patterns, optarg);
+break;
+
 			default:
 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 exit_nicely(1);
@@ -645,6 +660,12 @@ main(int argc, char **argv)
 		exit_nicely(1);
 	}
 
+	if (dopt.schemaOnly && foreign_servers_include_patterns.head != NULL)
+	{
+		pg_log_error("options -s/--schema-only and --include-foreign-data cannot be used together");
+		exit_nicely(1);
+	}
+
 	if (dopt.dataOnly && dopt.outputClean)
 	{
 		pg_log_error("options -c/--clean and -a/--data-only cannot be used together");
@@ -812,6 +833,9 @@ main(int argc, char **argv)
 			   &tabledata_exclude_oids,
 			   false);
 
+	expand_foreign_server_name_patterns(fout, &foreign_servers_include_patterns,
+		&foreign_server

Re: [HACKERS] Block level parallel vacuum

2019-11-12 Thread Amit Kapila
On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada
 wrote:
>
> On Tue, 12 Nov 2019 at 18:26, Dilip Kumar  wrote:
> >
> > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila  wrote:
> > >
> > > Yeah, maybe something like amparallelvacuumoptions.  The options can be:
> > >
> > > VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> > > vacuumcleanup) can't be performed in parallel
> > > VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> > > performed in parallel (hash index will set this flag)
> >
> > Maybe we don't want this option?  because if 3 or 4 is not set then we
> > will not do the cleanup in parallel right?
> >

Yeah, but it is better to be explicit about this.

> > > VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > flag)
> > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
> > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> > > gin, gist, spgist, bloom will set this flag)
> > > VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > and bloom will set this flag)
> > >
> > > Does something like this make sense?
>
> 3 and 4 confused me because 4 also looks conditional. How about having
> two flags instead: one for doing parallel cleanup when not performed
> yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing
> always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)?
>

Hmm, this is exactly what I intend to say with 3 and 4.  I am not sure
what makes you think 4 is conditional.

> That way, we
> can have flags as follows and index AM chooses two flags, one from the
> first two flags for bulk deletion and another from next three flags
> for cleanup.
>
> VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0
> VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1
> VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2
> VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3
> VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4
>

This also looks reasonable, but if there is an index that doesn't want
to support a parallel vacuum, it needs to set multiple flags.

> > Yeah, something like that seems better to me.
> >
> > > If we all agree on this, then I
> > > think we can summarize the part of the discussion related to this API
> > > and get feedback from a broader audience.
> >
> > Make sense.
>
> +1
>

Okay, then I will write a separate email for this topic.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Attempt to consolidate reading of XLOG page

2019-11-12 Thread Antonin Houska
Michael Paquier  wrote:

> On Mon, Nov 11, 2019 at 04:25:56PM +0100, Antonin Houska wrote:
> >> On Fri, Oct 04, 2019 at 12:11:11PM +0200, Antonin Houska wrote:
> >> Your patch removes all the three optional lseek() calls which can
> >> happen in a segment.  Am I missing something but isn't that plain
> >> wrong?  You could reuse the error context for that as well if an error
> >> happens as what's needed is basically the segment name and the LSN
> >> offset.
> > 
> > Explicit call of lseek() is not used because XLogRead() uses pg_pread()
> > now. Nevertheless I found out that in the the last version of the patch I 
> > set
> > ws_off to 0 for a newly opened segment. This was wrong, fixed now.
> 
> Missed that part, thanks.  This was actually not obvious after an
> initial lookup of the patch.  Wouldn't it make sense to split that
> part in a separate patch that we could review and get committed first
> then?  It would have the advantage to make the rest easier to review
> and follow.  And using pread is actually better for performance
> compared to read+lseek.  Now there is also the argument that we don't
> always seek into an opened WAL segment, and that a plain read() is
> actually better than pread() in some cases.

ok, the next version uses explicit lseek(). Maybe the fact that XLOG is mostly
read sequentially (i.e. without frequent seeks) is the reason pread() hasn't
been adopted so far.
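
As a side note, the trade-off being weighed looks roughly like this in
isolation (a generic sketch, not the patch itself):

#include <sys/types.h>
#include <unistd.h>

/*
 * pread() reads at an absolute offset in a single call and never moves the
 * file position; read() uses (and advances) the current position, so an
 * explicit lseek() is only needed when the caller is not reading
 * sequentially.
 */
static ssize_t
read_at(int fd, void *buf, size_t len, off_t off)
{
	return pread(fd, buf, len, off);
}

static ssize_t
read_next(int fd, void *buf, size_t len)
{
	/* no lseek(): relies on the position left by the previous read() */
	return read(fd, buf, len);
}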

The new version reflects your other suggestions too, except the one about not
renaming "XLOG" -> "WAL" (actually you mentioned that earlier in the
thread). I recall that when working on the preliminary patch (709d003fbd),
Alvaro suggested "WAL" for some structures because these are new. The rule
seemed to be that "XLOG..." should be left for the existing symbols, while the
new ones should be "WAL...":

https://www.postgresql.org/message-id/20190917221521.GA15733%40alvherre.pgsql

So I decided to rename the new symbols and to remove the related comment.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

>From 34c0ae891de7dae3b84de33795c5d0521ccc0a88 Mon Sep 17 00:00:00 2001
From: Antonin Houska 
Date: Tue, 12 Nov 2019 11:51:14 +0100
Subject: [PATCH 1/2] Use only xlogreader.c:XLogRead()

The implementations in xlogutils.c and walsender.c are just renamed now, to be
removed by the following diff.
---
 src/backend/access/transam/xlogreader.c | 121 
 src/backend/access/transam/xlogutils.c  |  94 ++--
 src/backend/replication/walsender.c | 143 +++-
 src/bin/pg_waldump/pg_waldump.c |  63 ++-
 src/include/access/xlogreader.h |  37 ++
 src/include/access/xlogutils.h  |   1 +
 6 files changed, 441 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 7f24f0cb95..006c6298c9 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -17,6 +17,8 @@
  */
 #include "postgres.h"
 
+#include <unistd.h>
+
 #include "access/transam.h"
 #include "access/xlog_internal.h"
 #include "access/xlogreader.h"
@@ -27,6 +29,7 @@
 
 #ifndef FRONTEND
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #endif
 
@@ -1015,6 +1018,124 @@ out:
 
 #endif			/* FRONTEND */
 
+/*
+ * Read 'count' bytes from WAL fetched from timeline 'tli' into 'buf',
+ * starting at location 'startptr'. 'seg' is the last segment used,
+ * 'openSegment' is a callback to open the next segment and 'segcxt' is
+ * additional segment info that does not fit into 'seg'.
+ *
+ * 'errinfo' should point to XLogReadError structure which will receive error
+ * details in case the read fails.
+ *
+ * Returns true if succeeded, false if failed.
+ *
+ * XXX probably this should be improved to suck data directly from the
+ * WAL buffers when possible.
+ */
+bool
+XLogRead(char *buf, XLogRecPtr startptr, Size count, TimeLineID tli,
+		 WALOpenSegment *seg, WALSegmentContext *segcxt,
+		 WALSegmentOpen openSegment, WALReadError *errinfo)
+{
+	char	   *p;
+	XLogRecPtr	recptr;
+	Size		nbytes;
+
+	p = buf;
+	recptr = startptr;
+	nbytes = count;
+
+	while (nbytes > 0)
+	{
+		uint32		startoff;
+		int			segbytes;
+		int			readbytes;
+
+		startoff = XLogSegmentOffset(recptr, segcxt->ws_segsize);
+
+		if (seg->ws_file < 0 ||
+			!XLByteInSeg(recptr, seg->ws_segno, segcxt->ws_segsize) ||
+			tli != seg->ws_tli)
+		{
+			XLogSegNo	nextSegNo;
+
+			/* Switch to another logfile segment */
+			if (seg->ws_file >= 0)
+close(seg->ws_file);
+
+			XLByteToSeg(recptr, nextSegNo, segcxt->ws_segsize);
+
+			/* Open the next segment in the caller's way. */
+			openSegment(nextSegNo, segcxt, &tli, &seg->ws_file);
+
+			/* Update the current segment info. */
+			seg->ws_tli = tli;
+			seg->ws_segno = nextSegNo;
+			seg->ws_off = 0;
+		}
+
+		/* Need to seek in the file? */
+		if (seg->ws_off != startoff)
+		{
+			/*
+			 * Update ws_off unconditionally, it will be useful fo
