Re: to be a multirange or not be, that's the question

2021-11-06 Thread David G. Johnston
On Saturday, November 6, 2021, Jaime Casanova wrote:

> Ok, subject was a bit philosophical but this message I just found is
> quite confusing.
>
> """
> regression=# select cast(null as anyrange) &> cast(null as anymultirange);
> ERROR:  argument declared anymultirange is not a multirange type but type
> anymultirange
> """
>
>
I get that it is hard to parse but it is unambiguously correct.  It does
seem to presume that one understands how polymorphic pseudo-types work (the
supplied query was written without that knowledge applied).  I can envision
some improvements here though it would depend a lot on the actual
implementation.  I’m also curious as to why we get as far as the operator
invocation when casting literals to polymorphic pseudo-types is supposedly
an invalid thing to do.

David J.


to be a multirange or not be, that's the question

2021-11-06 Thread Jaime Casanova
Ok, subject was a bit philosophical but this message I just found is
quite confusing.

"""
regression=# select cast(null as anyrange) &> cast(null as anymultirange);
ERROR:  argument declared anymultirange is not a multirange type but type 
anymultirange
"""

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: archive modules

2021-11-06 Thread Fujii Masao




On 2021/11/03 0:03, Bossart, Nathan wrote:

On 11/1/21, 9:44 PM, "Fujii Masao"  wrote:

What is the main motivation of this patch? I was thinking that
it's for parallelizing WAL archiving. But as far as I've read
the patch (very briefly), the WAL file name is still passed to
the archive callback function one by one.


The main motivation is to provide a way to archive without shelling out.
This reduces the amount of overhead, which can improve archival rate
significantly.


It would be helpful if you could share how much this approach reduces
the amount of overhead.



It should also make it easier to archive more safely.
For example, many of the common shell commands used for archiving
won't fsync the data, but it isn't too hard to do so via C.


But probably we can do the same thing even by using the existing
shell interface? For example, we could implement and provide
a C program for the archive command that fsyncs the file.
Users could just use it in archive_command.
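The fsync-ing archive command suggested here can be sketched as a small standalone C fragment. This is an illustration only: no PostgreSQL APIs are involved, error reporting is pared down, and a production version would also fsync the containing directory before claiming success.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Copy src to dst and fsync dst before reporting success, so the file is
 * durable before the server considers it archived.  Returns 0 on success,
 * -1 on error.  (A real archive command would also fsync the directory.) */
static int
copy_file_fsync(const char *src, const char *dst)
{
	char		buf[8192];
	ssize_t		nread;
	int			srcfd = open(src, O_RDONLY);
	int			dstfd;

	if (srcfd < 0)
		return -1;
	dstfd = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (dstfd < 0)
	{
		close(srcfd);
		return -1;
	}
	while ((nread = read(srcfd, buf, sizeof(buf))) > 0)
	{
		if (write(dstfd, buf, nread) != nread)
		{
			close(srcfd);
			close(dstfd);
			return -1;
		}
	}
	close(srcfd);
	/* The whole point: don't report success until the copy is durable. */
	if (nread < 0 || fsync(dstfd) != 0)
	{
		close(dstfd);
		return -1;
	}
	return close(dstfd);
}
```

Such a program, invoked as `archive_command = 'archive_fsync %p %f-path'`, would give the durability guarantee that a bare `cp` does not.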



The
current proposal doesn't introduce any extra infrastructure for
batching or parallelism, but it is probably still possible.  I would
like to eventually add batching, but for now I'm only focused on
introducing basic archive module support.


Understood. I agree that it's reasonable to implement them gradually.



Are you planning to extend this mechanism to other WAL
archiving-related commands like restore_command? I can imagine
that those who use archive library (rather than shell) would
like to use the same mechanism for WAL restore.


I would like to do this eventually, but my current proposal is limited
to archive_command.


Understood.

 

I think that it's worth adding this module into core
rather than handling it as a test module. It provides a very basic
WAL archiving feature, but (I guess) it's enough for some users.


Do you think it should go into contrib?


Yes, at least for me..

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Allow escape in application_name

2021-11-06 Thread Fujii Masao




On 2021/11/05 12:17, Kyotaro Horiguchi wrote:

I'm not sure that that distinction is so clear for users. So I feel we
want a notice something like this. But it doesn't seem correct as it
is.  When the user name of the session consists of non-ASCII
characters, %u is finally seen as a sequence of '?'.  It is not an
embedded string in pgfdw_application_name.  I didn't notice this
behavior.

"pgfdw_application_name is set to application_name of a pgfdw
connection after placeholder conversion, thus the resulting string is
subject to the same restrictions as application_name.". Maybe
followed by "If the session user name or database name contains non-ASCII
characters, they are replaced by question marks "?"".


If we add something to the docs about this, we should clearly
describe what postgres_fdw.application_name does and
what postgres_fdw in a foreign server does. BTW, this kind of
information was already commented in option.c.
Probably it's better to mention the limitations on not only
non-ASCII characters but also the length of the application name.

 * Unlike application_name GUC, don't set GUC_IS_NAME flag nor check_hook
 * to allow postgres_fdw.application_name to be any string more than
 * NAMEDATALEN characters and to include non-ASCII characters. Instead,
 * remote server truncates application_name of remote connection to less
 * than NAMEDATALEN and replaces any non-ASCII characters in it with a '?'
 * character.

If possible, I'd like to see this change as a separate patch
and commit it first because this is the description for
the existing parameter postgres_fdw.application_name.



I'd like to hear more opinions about this from others.
But *if* there is actually no use case, I'd like not to add
the feature, to make the code simpler.


I think padding is useful because it aligns the significant content
of log lines by equating the length of the leading fixed
components. application_name doesn't make any other visible components
unaligned even when shown in the pg_stat_activity view.  It is not
useful in that sense.

Padding makes the string itself look maybe nicer, but gives no
other advantage.

I doubt people would complain about the lack of the padding feature.  In
addition to that, since pgfdw_application_name and log_line_prefix work
in different ways in regard to multibyte characters, we don't need to
struggle so hard to consolidate their behavior.

# I noticed that the padding feature doesn't consider multibyte
# characters. (I'm not suggesting they ought to be handled
# "properly".)

In short, I'm for removing it, by +0.7.


So our current consensus is to remove the padding part
from postgres_fdw.application_name.
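The multibyte caveat noted above is easy to see with plain printf-style padding, which counts bytes rather than display columns. A minimal sketch, independent of the actual log_line_prefix code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* "%-10s" pads to 10 *bytes*.  A 3-character UTF-8 string that occupies
 * 9 bytes gets only one pad byte, so columns that look aligned in byte
 * terms are misaligned visually.  This is the "padding doesn't consider
 * multibyte characters" behavior mentioned above. */
static size_t
padded_len(const char *s)
{
	char		buf[64];

	snprintf(buf, sizeof(buf), "%-10s|", s);
	return strlen(buf);
}
```

Both `padded_len("abc")` and `padded_len("\xe3\x81\x82\xe3\x81\x84\xe3\x81\x86")` (the UTF-8 bytes of three Japanese characters) return 11 bytes, yet the two lines occupy different numbers of terminal columns, so the trailing `|` no longer lines up.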



+   case 'u':
+   Assert(MyProcPort != NULL);

Isn't it enough to perform this assertion check only once
at the top of parse_pgfdw_appname()?


Yeah, in either way, we should treat them in the same way.


We can force parse_pgfdw_appname() not to be called if MyProcPort does
not exist,
but I don't think it is needed now.


Yes.


(I assume you said "it is needed now".)  I'm not sure how to force
that, but if it means a NULL MyProcPort causes a crash, I think
crashing the server is needlessly aggressive as the penalty.


I said "Yes" to Kuroda-san's comment "I don't think it is
needed now". So I meant that "it is NOT needed now".
Sorry for the unclear comment.

His idea was to skip calling parse_pgfdw_appname() if
MyProcPort is NULL, so as to prevent parse_pgfdw_appname()
from seeing NULL pointer of MyProcPort. But he thought
it's not necessary now, and I agree with him because
the idea would cause more confusing behavior.



It seems to me that postgres_fdw assumes a valid user id, but makes
no use of database, server port, and process id.  What I thought
here is that making it an assertion is too much. So just ignoring the
replacement is also fine with me.

That being said, if we are eager not to have unused code paths, it is
reasonable enough.  I don't object strongly to replacing it with an
assertion.


So no one strongly objects to the addition of assertion?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
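The assertion-placement question in this thread can be illustrated with a toy escape expander: check the required state once at the top rather than inside every case label. The function and names here are hypothetical, not the actual parse_pgfdw_appname() code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy escape expander: replaces every %u in fmt with the user name.
 * The non-NULL user name stands in for MyProcPort; one assertion at the
 * top covers every escape, instead of one Assert per case. */
static void
expand_appname(char *dst, size_t dstlen, const char *fmt, const char *user)
{
	size_t		used = 0;

	assert(user != NULL);		/* single up-front check covers every %u */

	for (const char *p = fmt; *p != '\0' && used + 1 < dstlen; p++)
	{
		if (p[0] == '%' && p[1] == 'u')
		{
			int			n = snprintf(dst + used, dstlen - used, "%s", user);

			if (n < 0 || (size_t) n >= dstlen - used)
				break;			/* output truncated; stop expanding */
			used += (size_t) n;
			p++;				/* skip the 'u' */
		}
		else
			dst[used++] = *p;
	}
	dst[used] = '\0';
}
```

For example, `expand_appname(buf, sizeof(buf), "fdw_%u_x", "alice")` yields `fdw_alice_x`; with a NULL user the assertion fires immediately, once, regardless of how many escapes appear.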




Re: Parallel Full Hash Join

2021-11-06 Thread Justin Pryzby
> Rebased patches attached. I will change status back to "Ready for Committer"

The CI showed a crash on freebsd, which I reproduced.
https://cirrus-ci.com/task/5203060415791104

The crash is evidenced in 0001 - but only ~15% of the time.

I think it's the same thing which was committed and then reverted here, so
maybe I'm not saying anything new.

https://commitfest.postgresql.org/33/3031/
https://www.postgresql.org/message-id/flat/20200929061142.ga29...@paquier.xyz

(gdb) p pstate->build_barrier->phase 
Cannot access memory at address 0x7f82e0fa42f4

#1  0x7f13de34f801 in __GI_abort () at abort.c:79
#2  0x5638e6a16d28 in ExceptionalCondition 
(conditionName=conditionName@entry=0x5638e6b62850 "!pstate || 
BarrierPhase(&pstate->build_barrier) >= PHJ_BUILD_RUN",
errorType=errorType@entry=0x5638e6a6f00b "FailedAssertion", 
fileName=fileName@entry=0x5638e6b625be "nodeHash.c", 
lineNumber=lineNumber@entry=3305) at assert.c:69
#3  0x5638e678085b in ExecHashTableDetach (hashtable=0x5638e8e6ca88) at 
nodeHash.c:3305
#4  0x5638e6784656 in ExecShutdownHashJoin (node=node@entry=0x5638e8e57cb8) 
at nodeHashjoin.c:1400
#5  0x5638e67666d8 in ExecShutdownNode (node=0x5638e8e57cb8) at 
execProcnode.c:812
#6  ExecShutdownNode (node=0x5638e8e57cb8) at execProcnode.c:772
#7  0x5638e67cd5b1 in planstate_tree_walker 
(planstate=planstate@entry=0x5638e8e58580, walker=walker@entry=0x5638e6766680 
, context=context@entry=0x0) at nodeFuncs.c:4009
#8  0x5638e67666b2 in ExecShutdownNode (node=0x5638e8e58580) at 
execProcnode.c:792
#9  ExecShutdownNode (node=0x5638e8e58580) at execProcnode.c:772
#10 0x5638e67cd5b1 in planstate_tree_walker 
(planstate=planstate@entry=0x5638e8e58418, walker=walker@entry=0x5638e6766680 
, context=context@entry=0x0) at nodeFuncs.c:4009
#11 0x5638e67666b2 in ExecShutdownNode (node=0x5638e8e58418) at 
execProcnode.c:792
#12 ExecShutdownNode (node=node@entry=0x5638e8e58418) at execProcnode.c:772
#13 0x5638e675f518 in ExecutePlan (execute_once=, 
dest=0x5638e8df0058, direction=, numberTuples=0, 
sendTuples=, operation=CMD_SELECT,
use_parallel_mode=, planstate=0x5638e8e58418, 
estate=0x5638e8e57a10) at execMain.c:1658
#14 standard_ExecutorRun () at execMain.c:410
#15 0x5638e6763e0a in ParallelQueryMain (seg=0x5638e8d823d8, 
toc=0x7f13df4e9000) at execParallel.c:1493
#16 0x5638e663f6c7 in ParallelWorkerMain () at parallel.c:1495
#17 0x5638e68542e4 in StartBackgroundWorker () at bgworker.c:858
#18 0x5638e6860f53 in do_start_bgworker (rw=) at 
postmaster.c:5883
#19 maybe_start_bgworkers () at postmaster.c:6108
#20 0x5638e68619e5 in sigusr1_handler (postgres_signal_arg=) 
at postmaster.c:5272
#21 
#22 0x7f13de425ff7 in __GI___select (nfds=nfds@entry=7, 
readfds=readfds@entry=0x7ffef03b8400, writefds=writefds@entry=0x0, 
exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffef03b8360)
at ../sysdeps/unix/sysv/linux/select.c:41
#23 0x5638e68620ce in ServerLoop () at postmaster.c:1765
#24 0x5638e6863bcc in PostmasterMain () at postmaster.c:1473
#25 0x5638e658fd00 in main (argc=8, argv=0x5638e8d54730) at main.c:198




XLogReadRecord() error in XlogReadTwoPhaseData()

2021-11-06 Thread Noah Misch
On Sun, Oct 24, 2021 at 04:35:02PM -0700, Noah Misch wrote:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-10-24%2012%3A01%3A10
> got an interesting v9.6 failure [...]:
> 
> 2021-10-24 14:25:29.263 CEST [34569:175] pgbench ERROR:  could not read 
> two-phase state from xlog at 0/158F4E0
> 2021-10-24 14:25:29.263 CEST [34569:176] pgbench STATEMENT:  COMMIT PREPARED 
> 'c1';

As a first step, let's report the actual XLogReadRecord() error message.
Attached.  All the other sites that expect no error already do this.
Author: Noah Misch 
Commit: Noah Misch 

Report any XLogReadRecord() error in XlogReadTwoPhaseData().

Buildfarm member kittiwake has witnessed errors at this site.  The site
discarded key facts.  Back-patch to 9.6 (all supported versions).

Reviewed by FIXME.

Discussion: https://postgr.es/m/FIXME

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index ef4b5f6..28b153a 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1397,10 +1397,18 @@ XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len)
 	record = XLogReadRecord(xlogreader, &errormsg);
 
 	if (record == NULL)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not read two-phase state from WAL at %X/%X",
-						LSN_FORMAT_ARGS(lsn))));
+	{
+		if (errormsg)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read two-phase state from WAL at %X/%X: %s",
+							LSN_FORMAT_ARGS(lsn), errormsg)));
+		else
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read two-phase state from WAL at %X/%X",
+							LSN_FORMAT_ARGS(lsn))));
+	}
 
 	if (XLogRecGetRmid(xlogreader) != RM_XACT_ID ||
 		(XLogRecGetInfo(xlogreader) & XLOG_XACT_OPMASK) != XLOG_XACT_PREPARE)
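The pattern the patch applies, surfacing the lower-level error text when the reader supplies one and falling back to the generic message otherwise, can be sketched generically. The helper below is illustrative only, not proposed code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a read-failure report.  If the lower layer supplied a detail
 * string, append it instead of discarding it; otherwise emit only the
 * generic message.  This mirrors the errormsg handling in the patch. */
static void
format_read_error(char *dst, size_t dstlen, const char *lsn, const char *detail)
{
	if (detail)
		snprintf(dst, dstlen,
				 "could not read two-phase state from WAL at %s: %s",
				 lsn, detail);
	else
		snprintf(dst, dstlen,
				 "could not read two-phase state from WAL at %s", lsn);
}
```

With this shape, the kittiwake-style failure would have reported the underlying XLogReadRecord() complaint (e.g. an invalid record length) instead of only the LSN.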


Re: inefficient loop in StandbyReleaseLockList()

2021-11-06 Thread Tom Lane
Andres Freund  writes:
> On 2021-11-06 18:32:54 -0400, Tom Lane wrote:
>> Good point.  The note at list_delete_last that it's O(1) isn't really
>> on point --- instead, the text for list_delete_first should be like
>> 
>> + * Note that this takes time proportional to the length of the list,
>> + * since the remaining entries must be moved.  Consider reversing the
>> + * list order so that you can use list_delete_last() instead.  However,
>> + * if that causes you to replace lappend() with lcons(), you haven't
>> + * improved matters.

> LGTM

Done that way, then.

regards, tom lane




Re: inefficient loop in StandbyReleaseLockList()

2021-11-06 Thread Andres Freund
On 2021-11-06 18:32:54 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2021-11-06 14:06:12 -0400, Tom Lane wrote:
> >> + * Note that this takes time proportional to the length of the list,
> >> + * since the remaining entries must be moved.
> >> */
> >> List *
> >> list_delete_first(List *list)
> 
> > Perhaps we could point to list_delete_last()? But it's an improvement 
> > without
> > that too.
> 
> Good point.  The note at list_delete_last that it's O(1) isn't really
> on point --- instead, the text for list_delete_first should be like
> 
> + * Note that this takes time proportional to the length of the list,
> + * since the remaining entries must be moved.  Consider reversing the
> + * list order so that you can use list_delete_last() instead.  However,
> + * if that causes you to replace lappend() with lcons(), you haven't
> + * improved matters.

LGTM




Re: inefficient loop in StandbyReleaseLockList()

2021-11-06 Thread Tom Lane
Andres Freund  writes:
> On 2021-11-06 14:06:12 -0400, Tom Lane wrote:
>> + * Note that this takes time proportional to the length of the list,
>> + * since the remaining entries must be moved.
>> */
>> List *
>> list_delete_first(List *list)

> Perhaps we could point to list_delete_last()? But it's an improvement without
> that too.

Good point.  The note at list_delete_last that it's O(1) isn't really
on point --- instead, the text for list_delete_first should be like

+ * Note that this takes time proportional to the length of the list,
+ * since the remaining entries must be moved.  Consider reversing the
+ * list order so that you can use list_delete_last() instead.  However,
+ * if that causes you to replace lappend() with lcons(), you haven't
+ * improved matters.

regards, tom lane




Re: inefficient loop in StandbyReleaseLockList()

2021-11-06 Thread Andres Freund
Hi,

On 2021-11-06 14:06:12 -0400, Tom Lane wrote:
> I wrote:
> > Hm.  I think it's not the only list function with O(N) behavior;
> > in fact there used to be more such functions than there are now.
> > But I could get behind a patch that annotates all of them.

Personally I think delete-first is particularly easy to run into, due to
implementing FIFO-like behaviour. But I'm i


> Here's a quick hack at that.  Having done it, I'm not sure if it's
> really worth the trouble or not ... thoughts?

In favor of adding them.


> @@ -870,6 +890,9 @@ list_delete_oid(List *list, Oid datum)
>   * where the intent is to alter the list rather than just traverse it.
>   * Beware that the list is modified, whereas the Lisp-y coding leaves
>   * the original list head intact in case there's another pointer to it.
> + *
> + * Note that this takes time proportional to the length of the list,
> + * since the remaining entries must be moved.
>   */
>  List *
>  list_delete_first(List *list)

Perhaps we could point to list_delete_last()? But it's an improvement without
that too.

This reminds me that I wanted a list splicing operation for ilist.h...

Greetings,

Andres Freund
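The cost difference discussed in this thread can be shown with a toy array-backed list (not pg_list, whose cells are laid out similarly but with more machinery): deleting from the front must move every remaining element, while deleting from the back is O(1).

```c
#include <assert.h>
#include <string.h>

/* Toy contiguous list.  Illustrative only; pg_list's List is the real
 * analogue of this layout since the v13 rewrite. */
typedef struct
{
	int		   *elts;
	int			n;
} ToyList;

static int
toy_delete_first(ToyList *l)
{
	int			head = l->elts[0];

	/* O(n): every remaining element shifts down one slot */
	memmove(&l->elts[0], &l->elts[1], (size_t) (l->n - 1) * sizeof(int));
	l->n--;
	return head;
}

static int
toy_delete_last(ToyList *l)
{
	return l->elts[--l->n];		/* O(1): no data moved */
}
```

Draining an N-element list front-first therefore costs O(N^2) total moves, while building the list in reverse so the consumer can use delete-last brings the drain down to O(N), which is exactly the advice in the proposed comment.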




Re: amcheck's verify_heapam(), and HOT chain verification

2021-11-06 Thread Peter Geoghegan
On Fri, Nov 5, 2021 at 7:51 PM Peter Geoghegan  wrote:
> Here are some specific checks I have in mind:

One more for the list:

* Validate PageIsAllVisible() for each page.

In other words, pg_visibility should be merged with verify_heapam.c
(or at least pg_visibility's pg_check_frozen() and pg_check_visible()
functions should be moved, merged, or whatever). This would mean that
verify_heapam() would directly check if the page-level PD_ALL_VISIBLE
flag contradicts either the tuple headers of tuples with storage on
the page, the presence (or absence) of LP_DEAD stub line pointers on
the page, or the corresponding visibility map bit (e.g.,
VISIBILITYMAP_ALL_VISIBLE) for the page.
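A minimal sketch of the kind of contradiction check described above, using toy structures rather than PostgreSQL's real page format (PD_ALL_VISIBLE, per-tuple visibility, and LP_DEAD are modeled as plain booleans):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page model: if the page-level all-visible flag is set, no tuple on
 * the page may be invisible and no dead stub line pointer may remain.
 * Illustrative only; verify_heapam would derive tuple_visible[] from
 * tuple headers and the lp_dead[] entries from the line pointer array. */
typedef struct
{
	bool		all_visible;		/* stands in for PD_ALL_VISIBLE */
	bool		tuple_visible[4];	/* per-tuple visibility verdicts */
	bool		lp_dead[4];			/* LP_DEAD stub line pointers */
	int			ntuples;
} ToyPage;

static bool
page_flag_consistent(const ToyPage *p)
{
	if (!p->all_visible)
		return true;			/* an unset flag can't contradict anything */
	for (int i = 0; i < p->ntuples; i++)
		if (!p->tuple_visible[i] || p->lp_dead[i])
			return false;		/* flag contradicts page contents */
	return true;
}
```

The real check would compare the same flag against the visibility map bit as well, since the map and the page header can disagree independently.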

There is value in teaching verify_heapam() about any possible problem,
including with the visibility map, but it's certainly less valuable
than the HOT chain verification stuff -- and probably trickier to get
right. I'm mentioning it now to be exhaustive, but it's less of a
priority for me personally.

I am quite willing to help out with all this, if you're interested.

One more thing about HOT chain validation:

I can give you another example bug of the kind I'd expect
verify_heapam() to catch only with full HOT chain validation. This one
is a vintage MultiXact bug that has the same basic HOT chain
corruption, looks-like-index-corruption-but-isn't quality as the more
memorable freeze-the-dead bug (this one was fixed by commit 6bfa88ac):

https://www.postgresql.org/message-id/CAM3SWZTMQiCi5PV5OWHb%2BbYkUcnCk%3DO67w0cSswPvV7XfUcU5g%40mail.gmail.com

In general I think that reviewing historic examples of pernicious
corruption bugs is a valuable exercise when designing tools like
amcheck. Maybe even revert the fix during testing, to be sure it would
have been caught had the final tool been available. History doesn't
repeat itself, but it does rhyme.

-- 
Peter Geoghegan




Re: amcheck's verify_heapam(), and HOT chain verification

2021-11-06 Thread Peter Geoghegan
On Fri, Nov 5, 2021 at 8:18 PM Mark Dilger  wrote:
> Thanks, Peter, for this analysis.

It's strategically important work. What you've done so far has every
chance of catching corruption involving storage issues, which is
great. But we ought to do more to catch certain known general patterns
of corruption. The so-called "freeze the dead" bug (see commit
9c2f0a6c for the final fix) is the single weirdest and scariest bug
that I can recall (there were a couple of incorrect bug fixes that had
to be reverted), and it is very much the kind of thing that I'd like
to see verify_heapam.c handle exhaustively. That would de-risk a lot
of things. Note that there were a variety of HOT chain corruption bugs
besides that one, so it's really a general class of corruption, not
one specific bug.

The heapallindexed verification option actually detected the "freeze
the dead" bug [1], although that was a total accident -- that checking
option happens to use CREATE INDEX infrastructure, which happens to
sanitize HOT chains by calling heap_get_root_tuples() and detecting
contradictions -- heap_get_root_tuples() is code that is right next to
the pruning code I mentioned. That level of detection is certainly
better than nothing, but it's still not good enough -- we need this in
verify_heapam, too.

Here's why I think that we need very thorough HOT chain verification
that lives directly in verify_heapam:

First of all, the heapallindexed option is rather expensive, whereas
the additional overhead for verify_heapam would be minimal -- you
probably wouldn't even need to make it optional. Second of all, why
should you need to use bt_index_check() to detect what is after all
heap corruption? And third of all, the approach of detecting
corruption by outsourcing detection to heapam_index_build_range_scan()
(which calls heap_get_root_tuples() itself) probably risks missing
corrupt HOT chains where the corruption doesn't look a certain way --
it was not designed with amcheck in mind.

heapam_index_build_range_scan() + heap_get_root_tuples() will notice a
heap-only tuple without a parent, which is a good start. But what
about an LP_REDIRECT item whose lp_off (i.e. page offset number
redirect) somehow points to any item on the page other than a valid
heap-only tuple? It also doesn't catch contradictory commit states for
heap-only tuples that form a HOT chain through their t_ctid links. I'm
sure that there are other things that I haven't thought of, too --
there is a lot going on here.

This HOT chain verification business is a little like the checking
that we do in verify_nbtree.c, actually. The individual tuples (in
this case heap tuples) may well not be corrupt (or demonstrably
corrupt) in isolation. But taken together, in a particular page offset
number sequence order, they can nevertheless *imply* corruption --
it's implied if the tuples tell a contradictory story about what's
going on at the level of the whole page. A HOT chain LP_REDIRECT is
after all a little like a "mini index" stored on a heap page.
Sequential scans don't care about LP_REDIRECTs at all. But index scans
expect a great deal from LP_REDIRECTs.

[1] 
https://www.postgresql.org/message-id/cah2-wznm4rcrhfaiwkpwtpew2bxdtgrozk7jwwgucxeh3d1...@mail.gmail.com
--
Peter Geoghegan
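The LP_REDIRECT expectation described in this thread can be modeled with a small self-contained check: every redirect must land on a valid, heap-only NORMAL item on the same page. Toy types only, not PostgreSQL's ItemId representation:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy line pointer model.  In the real page format, a redirect's lp_off
 * field holds the offset number of the redirect target. */
typedef enum { LP_UNUSED_T, LP_NORMAL_T, LP_REDIRECT_T, LP_DEAD_T } ToyLpFlag;

typedef struct
{
	ToyLpFlag	flag;
	int			target;			/* for redirects: 1-based offset pointed to */
	bool		heap_only;		/* HEAP_ONLY_TUPLE set on the tuple? */
} ToyItem;

/* Check the invariant index scans rely on: every LP_REDIRECT points at a
 * NORMAL, heap-only item on the same page. */
static bool
redirects_valid(const ToyItem *items, int nitems)
{
	for (int i = 0; i < nitems; i++)
	{
		if (items[i].flag != LP_REDIRECT_T)
			continue;

		int			t = items[i].target;

		if (t < 1 || t > nitems)
			return false;		/* points off the line pointer array */
		if (items[t - 1].flag != LP_NORMAL_T || !items[t - 1].heap_only)
			return false;		/* must land on a heap-only tuple */
	}
	return true;
}
```

This is exactly the sense in which a redirect is a "mini index": sequential scans never follow it, so only a check like this (or an index scan) ever notices that it dangles.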




Re: Draft release notes for next week's releases

2021-11-06 Thread Tom Lane
Peter Geoghegan  writes:
> You probably won't miss this, but just in case: Alexander's very
> recent "Reset lastOverflowedXid on standby when needed" commit needs
> to be listed now.

Yup, I saw it.

regards, tom lane




Re: Draft release notes for next week's releases

2021-11-06 Thread Peter Geoghegan
On Fri, Nov 5, 2021 at 5:27 PM Tom Lane  wrote:
> As usual, please send any comments/corrections by Sunday.

You probably won't miss this, but just in case: Alexander's very
recent "Reset lastOverflowedXid on standby when needed" commit needs
to be listed now.

-- 
Peter Geoghegan




Re: inefficient loop in StandbyReleaseLockList()

2021-11-06 Thread Tom Lane
I wrote:
> Hm.  I think it's not the only list function with O(N) behavior;
> in fact there used to be more such functions than there are now.
> But I could get behind a patch that annotates all of them.

Here's a quick hack at that.  Having done it, I'm not sure if it's
really worth the trouble or not ... thoughts?

regards, tom lane

diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index 12847e35ea..247032124d 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -410,6 +410,9 @@ insert_new_cell(List *list, int pos)
 /*
  * Insert the given datum at position 'pos' (measured from 0) in the list.
  * 'pos' must be valid, ie, 0 <= pos <= list's length.
+ *
+ * Note that this takes time proportional to the distance to the end of the
+ * list, since the following entries must be moved.
  */
 List *
 list_insert_nth(List *list, int pos, void *datum)
@@ -460,6 +463,9 @@ list_insert_nth_oid(List *list, int pos, Oid datum)
  * value, rather than continuing to use the pointer passed as the
  * second argument.
  *
+ * Note that this takes time proportional to the length of the list,
+ * since the existing entries must be moved.
+ *
  * Caution: before Postgres 8.0, the original List was unmodified and
  * could be considered to retain its separate identity.  This is no longer
  * the case.
@@ -525,6 +531,10 @@ lcons_oid(Oid datum, List *list)
  * Callers should be sure to use the return value as the new pointer to the
  * concatenated list: the 'list1' input pointer may or may not be the same
  * as the returned pointer.
+ *
+ * Note that this takes at least time proportional to the length of list2.
+ * It'd typically be the case that we have to enlarge list1's storage,
+ * probably adding time proportional to the length of list1.
  */
 List *
 list_concat(List *list1, const List *list2)
@@ -623,6 +633,8 @@ list_truncate(List *list, int new_size)
  * Return true iff 'datum' is a member of the list. Equality is
  * determined via equal(), so callers should ensure that they pass a
  * Node as 'datum'.
+ *
+ * This does a simple linear search --- avoid using it on long lists.
  */
 bool
 list_member(const List *list, const void *datum)
@@ -706,6 +718,9 @@ list_member_oid(const List *list, Oid datum)
  * Delete the n'th cell (counting from 0) in list.
  *
  * The List is pfree'd if this was the last member.
+ *
+ * Note that this takes time proportional to the distance to the end of the
+ * list, since the following entries must be moved.
  */
 List *
 list_delete_nth_cell(List *list, int n)
@@ -777,6 +792,9 @@ list_delete_nth_cell(List *list, int n)
  *
  * The List is pfree'd if this was the last member.  However, we do not
  * touch any data the cell might've been pointing to.
+ *
+ * Note that this takes time proportional to the distance to the end of the
+ * list, since the following entries must be moved.
  */
 List *
 list_delete_cell(List *list, ListCell *cell)
@@ -787,6 +805,8 @@ list_delete_cell(List *list, ListCell *cell)
 /*
  * Delete the first cell in list that matches datum, if any.
  * Equality is determined via equal().
+ *
+ * This does a simple linear search --- avoid using it on long lists.
  */
 List *
 list_delete(List *list, void *datum)
@@ -870,6 +890,9 @@ list_delete_oid(List *list, Oid datum)
  * where the intent is to alter the list rather than just traverse it.
  * Beware that the list is modified, whereas the Lisp-y coding leaves
  * the original list head intact in case there's another pointer to it.
+ *
+ * Note that this takes time proportional to the length of the list,
+ * since the remaining entries must be moved.
  */
 List *
 list_delete_first(List *list)
@@ -885,8 +908,8 @@ list_delete_first(List *list)
 /*
  * Delete the last element of the list.
  *
- * This is the opposite of list_delete_first(), but is noticeably cheaper
- * with a long list, since no data need be moved.
+ * This is the opposite of list_delete_first(), but is O(1),
+ * since no data need be moved.
  */
 List *
 list_delete_last(List *list)
@@ -910,6 +933,9 @@ list_delete_last(List *list)
  * Delete the first N cells of the list.
  *
  * The List is pfree'd if the request causes all cells to be deleted.
+ *
+ * Note that this takes time proportional to the distance to the end of the
+ * list, since the following entries must be moved.
  */
 List *
 list_delete_first_n(List *list, int n)
@@ -989,8 +1015,10 @@ list_delete_first_n(List *list, int n)
  * you probably want to use list_concat_unique() instead to avoid wasting
  * the storage of the old x list.
  *
- * This function could probably be implemented a lot faster if it is a
- * performance bottleneck.
+ * Note that this takes time proportional to the product of the list
+ * lengths, so beware of using it on long lists.  (We could probably
+ * improve that, but really you should be using some other data structure
+ * if this'd be a performance bottleneck.)
  */
 List *
 list_union(const List *list1, 

Re: Commitfest 2021-11 Patch Triage - Part 1

2021-11-06 Thread Tomas Vondra

On 11/5/21 22:15, Daniel Gustafsson wrote:

...
1651: GROUP BY optimization
===
This is IMO a desired optimization, and after all the heavy lifting by Tomas
the patch seems to be in pretty good shape.



I agree it's desirable. To move the patch forward, I need some feedback 
on the costing questions, mentioned in my last review, and the overall 
approach to planning (considering multiple group by variants).


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: lastOverflowedXid does not handle transaction ID wraparound

2021-11-06 Thread Alexander Korotkov
Hi!

On Fri, Nov 5, 2021 at 10:31 AM Kyotaro Horiguchi
 wrote:
> At Thu, 4 Nov 2021 01:07:05 +0300, Alexander Korotkov  
> wrote in
> > On Wed, Nov 3, 2021 at 8:51 PM Simon Riggs  
> > wrote:
> > > It is however, an undocumented modularity violation. I think that is
> > > acceptable because of the ProcArrayLock traffic, but needs to have a
> > > comment to explain this at the call to
> > > ExpireOldKnownAssignedTransactionIds() i.e. " and potentially reset
> > > lastOverflowedXid", as well as a comment on the
> > > ExpireOldKnownAssignedTransactionIds() function.
> >
> > Thank you for your feedback.  Please find the revised patch attached.
> > It incorporates this function comment changes altogether with minor
> > editings and commit message. Let me know if you have further
> > suggestions.
> >
> > I'm going to push this if no objections.
>
> Thanks for taking a look on and refining this, Simon and Alex!  (while
> I was sick in bed X:)
>
> It looks good to me except the commit Author doesn't contain the name
> of Alexander Korotkov?

Thank you for the suggestion.  And thanks to everybody for the feedback.
Pushed!

--
Regards,
Alexander Korotkov




Re: GiST operator class for bool

2021-11-06 Thread Tomas Vondra

On 11/3/21 16:18, Tomas Vondra wrote:

Hi,

I looked at this patch today - it's pretty simple and in pretty good
shape; I can't think of anything that'd need fixing. Perhaps the test
might also do EXPLAIN, as for other types, to verify the new index is
actually used. But that's minor enough to handle during commit.


I've marked this as RFC and will get it committed in a day or two.



Pushed, after adding some simple EXPLAIN to the regression test.

Thanks for the patch!


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Schema variables - new implementation for Postgres 15

2021-11-06 Thread Pavel Stehule
so 6. 11. 2021 v 15:57 odesílatel Justin Pryzby 
napsal:

> On Sat, Nov 06, 2021 at 04:45:19AM +0100, Pavel Stehule wrote:
> > st 3. 11. 2021 v 14:05 odesílatel Tomas Vondra <
> tomas.von...@enterprisedb.com> napsal:
> > > 1) Not sure why we need to call this "schema variables". Most objects
> > > are placed in a schema, and we don't say "schema tables" for example.
> > > And it's CREATE VARIABLE and not CREATE SCHEMA VARIABLE, so it's a bit
> > > inconsistent.
>
> +1
>
> At least the error messages need to be consistent.
> It doesn't make sense to have both of these:
>
> +   elog(ERROR, "cache lookup failed for schema variable %u",
> varid);
> +   elog(ERROR, "cache lookup failed for variable %u", varid);
>
> > Yes, there is inconsistency, but I think it is necessary. The name
> > "variable" is too generic. Theoretically we can use other adjectives like
> > session variables or global variables and the name will be valid. But it
> > doesn't describe the fundamentals of design. This is similar to the
> > package's variables from PL/SQL. These variables are global, session
> > variables too. But the usual name is "package variables". So schema
> > variables are assigned to schemas, and I think a good name can be "schema
> > variables". But it is not necessary to repeat the keyword schema in the
> > CREATE COMMAND.
> >
> > My opinion is not too strong in this case, and I can accept just
> > "variables" or "session's variables" or "global variables", but I am not
> > sure if these names describe this feature well, because still they are
> too
> > generic. There are too many different implementations of session global
> > variables (see PL/SQL or T-SQL or DB2).
>
> I would prefer "session variable".
>
> To me, this feature seems similar to a CTE (which exists for a single
> statement), or a temporary table (which exists for a single transaction).
> So
> "session" conveys a lot more of its meaning than "schema".
>

It depends on where you are looking. There are two perspectives - data and
metadata. From the data perspective it is session related; from the
metadata perspective it can be persistent or temporary, like tables. I see
a strong similarity with Global Temporary Tables - but I think the names
"local temporary tables" and "global temporary tables" can be unintuitive
or confusing for a lot of people too. Anyway, if people try to find this
feature on Google, they will probably use the keywords "session variables",
so maybe my preference for more technical terminology is obscure and
impractical, and the name "session variables" may be more practical for
other people. Following the naming scheme used for GTT, the exact name
would be "Global Session Variables". Can we use this name? Or simply
"Session Variables", since we don't support local session variables now.

What do you think about it?

Regards

Pavel



> But don't rename everything just for me...
>
> --
> Justin
>


Re: Schema variables - new implementation for Postgres 15

2021-11-06 Thread Justin Pryzby
On Sat, Nov 06, 2021 at 04:45:19AM +0100, Pavel Stehule wrote:
> On Wed, Nov 3, 2021 at 14:05, Tomas Vondra wrote:
> > 1) Not sure why we need to call this "schema variables". Most objects
> > are placed in a schema, and we don't say "schema tables" for example.
> > And it's CREATE VARIABLE and not CREATE SCHEMA VARIABLE, so it's a bit
> > inconsistent.

+1

At least the error messages need to be consistent.
It doesn't make sense to have both of these:

+   elog(ERROR, "cache lookup failed for schema variable %u", 
varid);
+   elog(ERROR, "cache lookup failed for variable %u", varid);

> Yes, there is inconsistency, but I think it is necessary. The name
> "variable" is too generic. Theoretically we could use other adjectives like
> session variables or global variables and the name would be valid. But it
> doesn't describe the fundamentals of the design. This is similar to the
> package variables from PL/SQL. Those variables are global, session
> variables too, but the usual name is "package variables". So schema
> variables are assigned to schemas, and I think a good name can be "schema
> variables". But it is not necessary to repeat the keyword schema in the
> CREATE COMMAND.
> 
> My opinion is not too strong in this case, and I can accept just
> "variables" or "session variables" or "global variables", but I am not
> sure if these names describe this feature well, because they are still too
> generic. There are too many different implementations of session global
> variables (see PL/SQL or T-SQL or DB2).

I would prefer "session variable".

To me, this feature seems similar to a CTE (which exists for a single
statement), or a temporary table (which exists for a single transaction).  So
"session" conveys a lot more of its meaning than "schema".

But don't rename everything just for me...

-- 
Justin




Re: Draft release notes for next week's releases

2021-11-06 Thread Tom Lane
Justin Pryzby  writes:
> On Fri, Nov 05, 2021 at 08:27:49PM -0400, Tom Lane wrote:
>> +  one more row than requested, so tha it can determine whether the

> A little typo:  tha it
> [ etc ]

Ugh.  These notes were prepared in a bit more haste than usual,
and I guess it shows :-(.  Will fix, thanks for proofreading!

regards, tom lane




Re: using extended statistics to improve join estimates

2021-11-06 Thread Andy Fan
Hi Tomas:

This is the exact patch I want, thanks for the patch!

On Thu, Oct 7, 2021 at 3:33 AM Tomas Vondra
 wrote:


> 3) estimation by join pairs
>
> At the moment, the estimates are calculated for pairs of relations, so
> for example given a query
>
>   explain analyze
>   select * from t1 join t2 on (t1.a = t2.a and t1.b = t2.b)
>join t3 on (t1.b = t3.b and t2.c = t3.c);
>
> we'll estimate the first join (t1,t2) just fine, but then the second
> join actually combines (t1,t2,t3). What the patch currently does is it
> splits it into (t1,t2) and (t2,t3) and estimates those.

Actually I can't understand how this works even for a simpler example.
Let's say we query like this (using ONLY t2's columns to join t3):

select * from t1 join t2 on (t1.a = t2.a and t1.b = t2.b)
 join t3 on (t2.c = t3.c and t2.d = t3.d);

Then it works well for JoinRel(t1, t2) and JoinRel(t2, t3). But when it
comes to JoinRel(t1, t2, t3), we don't maintain an MCV on the join rel, so
it is hard to make this work. I see your solution splits it into (t1, t2)
and (t2, t3) and estimates those. But how does this help to estimate the
size of JoinRel(t1, t2, t3)?

> I wonder if this
> should actually combine all three MCVs at once - we're pretty much
> combining the MCVs into one large MCV representing the join result.
>

I guess we can keep the MCVs on the joinrel for these matches. Take the
query I provided above as an example, and suppose the MCV data is as below:

t1(a, b)
(1, 2) -> 0.1
(1, 3) -> 0.2
(2, 3) -> 0.5
(2, 8) -> 0.1

t2(a, b)
(1, 2) -> 0.2
(1, 3) -> 0.1
(2, 4) -> 0.2
(2, 10) -> 0.1

After applying t1.a = t2.a AND t1.b = t2.b, we can build the joined MCV as below:

(1, 2, 1, 2)  -> 0.1 * 0.2
(1, 3, 1, 3)  -> 0.2 * 0.1

And record the total MCV frequency as (0.1 + 0.2 + 0.5 + 0.1) *
(0.2 + 0.1 + 0.2 + 0.1).

With this design, the number of MCV items on the joinrel would be smaller
than that of either base rel.

And since we handle the eqjoin clauses as well, we can even record the items as

(1, 2) -> 0.1 * 0.2
(1, 3) -> 0.2 * 0.1;
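
To make the arithmetic above concrete, here is a toy sketch in Python. It
is only an illustration of what I mean, not code from the patch, and
`join_mcvs`, `mcv_t1`, and `mcv_t2` are made-up names. It assumes the join
clauses equate every MCV column and that the two relations' values are
independent:

```python
# Toy sketch: combine two per-relation MCV lists into a joined MCV,
# assuming the join clauses equate all MCV columns
# (here: t1.a = t2.a AND t1.b = t2.b).

def join_mcvs(mcv1, mcv2):
    """mcv1 and mcv2 map tuples of column values to frequencies."""
    joined = {}
    for vals, freq1 in mcv1.items():
        freq2 = mcv2.get(vals)
        if freq2 is not None:
            # Matching items survive the join; frequencies multiply
            # under the independence assumption.
            joined[vals] = freq1 * freq2
    # Total frequency mass covered by each input MCV; the product is the
    # fraction of the cross product that the per-relation MCVs describe.
    covered = sum(mcv1.values()) * sum(mcv2.values())
    return joined, covered

mcv_t1 = {(1, 2): 0.1, (1, 3): 0.2, (2, 3): 0.5, (2, 8): 0.1}
mcv_t2 = {(1, 2): 0.2, (1, 3): 0.1, (2, 4): 0.2, (2, 10): 0.1}

joined, covered = join_mcvs(mcv_t1, mcv_t2)
# joined keeps only (1, 2) and (1, 3), with frequencies 0.1*0.2 and
# 0.2*0.1; covered is (0.1+0.2+0.5+0.1) * (0.2+0.1+0.2+0.1) = 0.54
# (up to floating-point rounding).
```

The part of the join not covered by the combined MCV would presumably still
need some fallback estimate, as eqjoinsel() does today for non-MCV values.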

As for when we should maintain the JoinRel's MCV data: rather than
maintaining it right after the JoinRel size is estimated, we could compute
it only when it is needed. For example:

select * from t1 join t2 on (t1.a = t2.a and t1.b = t2.b)
 join t3 on (t2.c = t3.c and t2.d = t3.d);

Here we don't need to maintain the MCV on (t1, t2, t3), since nothing else
needs it. However, I haven't checked the code closely enough to see whether
lazily computing the MCV on a joinrel is convenient to do.


> But I haven't done that yet, as it requires the MCVs to be combined
> using the join clauses (overlap in a way), but I'm not sure how likely
> that is in practice. In the example it could help, but that's a bit
> artificial example.
>
>
> 4) still just inner equi-joins
>
> I haven't done any work on extending this to outer joins etc. Adding
> outer and semi joins should not be complicated, mostly copying and
> tweaking what eqjoinsel() does.
>

Overall, thanks for the feature; I expect more cases to handle will come up
during the discussion.  To make the review process more efficient, I
suggest that we split the patch into smaller ones and review/commit them
separately once we have roughly finalized the design.  For example:

Patch 1 -- require both sides to have extended statistics.
Patch 2 -- require one side to have extended statistics and the other side
a per-column MCV.
Patch 3 -- handle cases like WHERE t1.a = t2.a AND t1.b = Const.
Patch 4 -- handle 3+ table joins.
Patch 5 -- support outer joins.

I think we can do this if we are sure that each individual patch works in
some cases and does not make any other case worse.  If you agree with this,
I can do that splitting during my review process.


-- 
Best Regards
Andy Fan (https://www.aliyun.com/)




AW: VS2022: Support Visual Studio 2022 on Windows

2021-11-06 Thread Hans Buschmann
It is now three days before the release of VS2022.

I updated to the latest Preview 7 (= RC3) and recompiled PG14 64-bit Release 
without issues.
There seem to be few internal differences from previous versions in the 
tools used for building Postgres.

My intention with early support is to catch the momentum and signal to 
users: "We support current tools".
The risks seem non-existent.

I updated the patch to reflect the VisualStudioVersion for Preview 7, which is 
the version number compiled into the main devenv.exe image.
This version number seems to be of no interest elsewhere in the postgres source 
tree.

I will reflect any updates after the official release on Monday, November 8.

Hans Buschmann



0001_support_vs2022_v2.patch


Re: Extending amcheck to check toast size and compression

2021-11-06 Thread Mark Dilger



> On Nov 5, 2021, at 8:56 PM, Tom Lane  wrote:
> 
> Some of the buildfarm is unimpressed with this --- looks like the test
> output is less stable than you thought.

Yes, it does.  I had to play with it a bit to be sure the test itself is 
faulty, and I believe that it is.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company