Re: Allow file inclusion in pg_hba and pg_ident files

2022-11-12 Thread Julien Rouhaud
On Thu, Nov 10, 2022 at 10:29:40AM +0900, Michael Paquier wrote:
>
> FWIW, I have been playing with the addition of a ErrorContextCallback
> in tokenize_auth_file(), and this addition leads to a really nice
> result.  With this method, it is possible to know the full chain of
> events leading to a failure when tokenizing included files, which is
> not available now in the logs when reloading the server.
>
> We could extend it to have more verbose information by passing more
> arguments to tokenize_auth_file(), still I'd like to think that just
> knowing the line number and the full path to the file is more than
> enough once you know the full chain of events.  0001 and 0002 ought to
> be merged together, but I am keeping these separate to show how simple
> the addition of the ErrorContextCallback is.

It looks good to me.  I agree that the file name and line number should be enough
to diagnose any unexpected error.
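
For anyone reading the archives, the shape of the callback being discussed is
roughly the following (struct and field names here are illustrative only, not
the actual patch):

    #include "postgres.h"

    /* per-file state reported by the error context callback */
    typedef struct tokenize_error_callback_arg
    {
        const char *filename;       /* auth file currently being tokenized */
        int         linenum;        /* line being processed */
    } tokenize_error_callback_arg;

    static void
    tokenize_error_callback(void *arg)
    {
        tokenize_error_callback_arg *cbarg = (tokenize_error_callback_arg *) arg;

        errcontext("line %d of configuration file \"%s\"",
                   cbarg->linenum, cbarg->filename);
    }

    /* Sketch of how tokenize_auth_file() could install it around its loop */
    void
    tokenize_with_context(const char *filename)
    {
        ErrorContextCallback errcallback;
        tokenize_error_callback_arg cbarg;

        cbarg.filename = filename;
        cbarg.linenum = 0;
        errcallback.callback = tokenize_error_callback;
        errcallback.arg = &cbarg;
        errcallback.previous = error_context_stack;
        error_context_stack = &errcallback;

        /* ... read each line, bump cbarg.linenum, recurse for inclusions ... */

        error_context_stack = errcallback.previous;
    }

Since the callback stack nests, each level of file inclusion adds its own
"line N of configuration file ..." context line, which is what gives the full
chain of events in the logs.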




Re: Perform streaming logical transactions by background workers and parallel apply

2022-11-12 Thread Amit Kapila
On Fri, Nov 11, 2022 at 2:12 PM houzj.f...@fujitsu.com wrote:
>

Few comments on v46-0001:
==
1.
+static void
+apply_handle_stream_abort(StringInfo s)
{
...
+ /* Send STREAM ABORT message to the parallel apply worker. */
+ parallel_apply_send_data(winfo, s->len, s->data);
+
+ if (abort_toplevel_transaction)
+ {
+ parallel_apply_unlock_stream(xid, AccessExclusiveLock);

Shouldn't we release this lock before sending the message, as we do for
stream_prepare and stream_commit? If there is a reason for doing it
differently here, then let's add a comment explaining it.

2. It seems that once the patch marks the file state as busy
(LEADER_FILESET_BUSY), it will only be accessible after the leader
apply worker receives a transaction end message like stream_commit. Is
my understanding correct? If yes, then why can't we make it accessible
after the stream_stop message? Are you worried about the concurrency
handling for reading and writing the file? If so, we could probably deal
with it via some lock for reading from and writing to the file for each
change. I think after that we may not need the additional stream-level
lock/unlock in parallel_apply_spooled_messages. I understand that you
probably want to keep the code simple, so I am not suggesting changing it
immediately, but I just wanted to know whether you have considered
alternatives here.

3. Don't we need to release the transaction lock at stream_abort in the
parallel apply worker? I understand that we are not waiting for it in
the leader worker, but the parallel apply worker should still release it
if it acquired the lock at stream_start.

4. A minor comment change as below:
diff --git a/src/backend/replication/logical/worker.c
b/src/backend/replication/logical/worker.c
index 43f09b7e9a..c771851d1f 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1851,6 +1851,9 @@ apply_handle_stream_abort(StringInfo s)
parallel_apply_stream_abort(&abort_data);

 /*
+* We need to wait after processing rollback to savepoint for the next set
+* of changes.
+*
 * By the time parallel apply worker is processing the changes in
 * the current streaming block, the leader apply worker may have
 * sent multiple streaming blocks. So, try to lock only if there


-- 
With Regards,
Amit Kapila.




Remove unused param rte in set_plain_rel_pathlist

2022-11-12 Thread Zhang Mingli
Hi, hackers


The parameter RangeTblEntry *rte in function set_plain_rel_pathlist is not used at all.

Looking at commit e2fa76d80b (10 years ago), it has been unused since then.

Attached is a patch to remove it.

Regards,
Zhang Mingli


v1-0001-Remove-unused-param-rte-in-set_plain_rel_pathlist.patch
Description: Binary data


Re: logical replication restrictions

2022-11-12 Thread vignesh C
On Thu, 11 Aug 2022 at 02:03, Euler Taveira  wrote:
>
> On Wed, Aug 10, 2022, at 9:39 AM, osumi.takami...@fujitsu.com wrote:
>
> Minor review comments for v6.
>
> Thanks for your review. I'm attaching v7.
>
> "If the subscriber sets min_apply_delay parameter, ..."
>
> I suggest we use subscription rather than subscriber, because
> this parameter refers to and is used for one subscription.
> My suggestion is
> "If one subscription sets min_apply_delay parameter, ..."
> If you agree, there are other places to apply this change.
>
> I changed the terminology to subscription. I also checked the other "subscriber"
> occurrences but I don't think they should be changed. Some of them are used as
> part of a publisher/subscriber pair. If you think there is another sentence to
> consider, point it out.
>
> It might be better to write a note for committer
> like "Bump catalog version" at the bottom of the commit message.
>
> It is a committer task to bump the catalog version. IMO it is easy to notice
> (using a git hook?) that it must be bumped when we are modifying the catalog.
> AFAICS there is no recommendation to add such a warning.
>
> The former interprets the input number as milliseconds if no unit is given,
> while the latter takes it as seconds without units.
> I feel it would be better to make them aligned.
>
> In a previous version I decided not to add code to attach a unit when there
> isn't one. Instead, I changed the documentation to reflect what interval_in
> uses (seconds as the unit). On reflection, let's use ms as the default unit
> if the user doesn't specify one.
>
> I fixed all the other suggestions too.

Few comments:
1) I feel that if the user has specified a long delay, there is a chance
that replication may not continue if the replication slot falls behind
the current LSN by more than max_slot_wal_keep_size. I feel we should
mention this in the documentation of min_apply_delay, since replication
will not continue in this case.

2) I also noticed that if we have to shut down the publisher server
with a long min_apply_delay configured, the publisher server cannot be
stopped because the walsender waits for the data to be replicated. Is it
ok for the server to wait in this case? If this behavior is ok, we could
add a log message, as it is not evident from the log files why the
server could not be shut down.

Regards,
Vignesh




Re: Avoid overhead open-close indexes (catalog updates)

2022-11-12 Thread Ranier Vilela
On Fri, Nov 11, 2022 at 01:54, Michael Paquier wrote:

> On Thu, Nov 10, 2022 at 08:56:25AM -0300, Ranier Vilela wrote:
> > For CopyStatistics() I have performance checks.
>
> You are not giving all the details of your tests, though,

Windows 10 64 bits
SSD 256 GB

pgbench -i
pgbench_accounts;
pgbench_tellers;

Simple test, based on tables created by pgbench.


> so I had a
> look with some of my stuff using the attached set of SQL functions
> (create_function.sql) to create a bunch of indexes with a maximum
> number of expressions, as of:
> select create_table_cols('tab', 32);
> select create_index_multi_exprs('ind', 400, 'tab', 32);
> insert into tab values (1);
> analyze tab; -- 12.8k~ pg_statistic records
>
> On HEAD, a REINDEX CONCURRENTLY for the table 'tab' takes 1550ms on my
> laptop, averaged over 10 runs.  The patch improves the runtime with
> a single session, bringing the execution down to 1480ms, an effect
> of the maximum number of attributes on an index being 32.  There may
> be some noise, but there is a trend, and some perf profiles confirm
> the same with CopyStatistics().  My case is a bit extreme, of course,
> but still, that's something.
>
> Anyway, while reviewing this code, it occurred to me that we could do
> even better than this proposal once we switch to
> CatalogTuplesMultiInsertWithInfo() for the data insertion.
>

> This would reduce more the operation overhead by switching to multi
> INSERTs rather than 1 INSERT for each index attribute with tuples
> stored in a set of TupleTableSlots, meaning 1 WAL record rather than N
> records.  The approach would be similar to what you do for
> dependencies, see for example recordMultipleDependencies() when it
> comes to the number of slots used etc.
>

I think the added complexity doesn't pay off.
For example, CopyStatistics doesn't know in advance how many tuples will be processed.
IMHO, this step is the right one for now:
CatalogTupleInsertWithInfo offers a considerable improvement without
introducing bugs or maintenance issues.
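
For reference, the multi-insert pattern Michael describes would look roughly
like the sketch below, modeled on recordMultipleDependencies(). The helper,
the batching constant, and the idea of pre-collecting the tuples are all
hypothetical, and that extra bookkeeping is exactly what I would rather avoid
in CopyStatistics:

    #include "postgres.h"

    #include "access/htup.h"
    #include "catalog/indexing.h"
    #include "executor/tuptable.h"
    #include "utils/rel.h"

    /* assumed batching constant, similar in spirit to recordMultipleDependencies() */
    #define STAT_COPY_MAX_SLOTS 32

    /*
     * Hypothetical helper: insert an array of pg_statistic tuples using one
     * multi-insert per batch instead of one CatalogTupleInsertWithInfo() call
     * per tuple, so each batch needs far fewer WAL records.
     */
    static void
    copy_statistics_batched(Relation statrel, CatalogIndexState indstate,
                            HeapTuple *tuples, int ntuples)
    {
        TupleTableSlot *slots[STAT_COPY_MAX_SLOTS];
        int         nslots = 0;

        for (int i = 0; i < ntuples; i++)
        {
            slots[nslots] = MakeSingleTupleTableSlot(RelationGetDescr(statrel),
                                                     &TTSOpsHeapTuple);
            ExecStoreHeapTuple(tuples[i], slots[nslots], false);
            nslots++;

            if (nslots == STAT_COPY_MAX_SLOTS || i == ntuples - 1)
            {
                /* flush the whole batch in one catalog multi-insert */
                CatalogTuplesMultiInsertWithInfo(statrel, slots, nslots, indstate);
                for (int j = 0; j < nslots; j++)
                    ExecDropSingleTupleTableSlot(slots[j]);
                nslots = 0;
            }
        }
    }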

regards,
Ranier Vilela


Re: Remove unused param rte in set_plain_rel_pathlist

2022-11-12 Thread Tom Lane
Zhang Mingli  writes:
> Param RangeTblEntry *rte in function set_plain_rel_pathlist is not used at 
> all.
> Add a patch to remove it.

I'm disinclined to change that, as it'd make set_plain_rel_pathlist
different from its sibling functions, which do need the RTE.

In practice this has cost zero anyway, since set_plain_rel_pathlist
will surely get inlined into its sole caller.

regards, tom lane




Re: Lack of PageSetLSN in heap_xlog_visible

2022-11-12 Thread Jeff Davis
On Fri, 2022-11-11 at 12:43 +0200, Konstantin Knizhnik wrote:
> Yes, you are right: my original concerns that it may cause problems
> with recovery at the replica were not correct.

Great, thank you for following up.

> I am also not sure that it can cause problems with checksums - the page is
> marked as dirty in any case.
> Yes, the content and checksum of the page will be different on the master and
> the replica. It may be a problem for recovering pages from the replica.

It could cause a theoretical problem: during recovery, the page could
be flushed out before minRecoveryPoint is updated, and while that's
happening, a crash could tear it. Then, a subsequent partial recovery
(e.g. PITR) could recover without fixing the torn page.

That would not be a practical problem, because it would require a tear
between two fields of the page header, which I don't think is possible.

> Where it really becomes critical is incremental backup (like pg_probackup),
> which looks at the page LSN to determine the moment of page modification.
>
> And it is definitely critical for Neon, because the LSN of a page
> reconstructed by the pageserver is different from the one expected by the
> compute node.
>
> Maybe I am missing something, but I do not see any penalty from setting the
> page LSN here.
> So even if there is no obvious scenario of failure, I still think it should
> be fixed. Breaking invariants is always a bad thing, and there should be
> very strong arguments for doing it. And I do not see them here.

Agreed, thank you for the report!

Committed d6a3dbe14f and backpatched through version 11.

Also committed an unrelated cleanup patch in the same area (3eb8eeccbe)
and a README patch (97c61f70d1) to master. The README patch doesn't
guarantee that things won't change in the future, but the behavior it
describes has been true for quite a while now.
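
For the archives, the invariant in question is the usual redo-side pattern,
roughly the following sketch (not the exact code committed in d6a3dbe14f):

    #include "postgres.h"

    #include "storage/bufmgr.h"
    #include "storage/bufpage.h"

    /*
     * Sketch of the general invariant being discussed: whenever redo modifies
     * a page, stamp it with the WAL record's LSN before marking the buffer
     * dirty, so the page LSN never lags the change applied to it.
     */
    static void
    finish_redo_page_change(Buffer buffer, XLogRecPtr lsn)
    {
        Page        page = BufferGetPage(buffer);

        PageSetLSN(page, lsn);      /* lsn is the record's end LSN */
        MarkBufferDirty(buffer);
    }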


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Re: generate_series for timestamptz and time zone problem

2022-11-12 Thread Tom Lane
Przemysław Sztoch writes:
> Gurjeet Singh wrote on 01.07.2022 06:35:
>> Can you please explain why you chose to remove the provolatile
>> attribute from the existing entry of date_trunc in pg_proc.dat.

> I believe it was a mistake in the PG code.
> All timestamptz functions must be STABLE as they depend on the current
> SHOW timezone setting.
> If new functions are created that pass the zone as a parameter, they
> become IMMUTABLE.
> The first date_trunc implementation was without a time zone parameter,
> and whoever added the second variant (with timezone as a parameter) copied
> the definition without removing the STABLE flag.

Yeah, I think you are right, and the someone was me :-( (see 600b04d6b).

I think what I was thinking is that timezone definitions do change
fairly often and maybe we shouldn't risk treating them as immutable.
However, we've not taken that into account in other volatility
markings; for example, the timezone() functions that underlie AT TIME
ZONE are marked immutable, which is surely wrong if you are worried
about zone definitions changing.  Given how long that's stood without
complaint, I think marking timestamptz_trunc_zone as immutable
should be fine.

However, what it shouldn't be is part of this patch.  It's worth
pushing it separately to have a record of that decision.  I've
now done that, so you'll need to rebase to remove that delta.

I looked over the v5 patch very briefly, and have two main
complaints:

* There's no documentation additions.  You can't add a user-visible
function without adding an appropriate entry to func.sgml.

* I'm pretty unimpressed with the whole truncate-to-interval thing
and would recommend you drop it.  I don't think it's adding much
useful functionality beyond what we can already do with the existing
date_trunc variants; and the definition seems excessively messy
(too many restrictions and special cases).

regards, tom lane




Re: Fix order of checking ICU options in initdb and create database

2022-11-12 Thread Jose Arthur Benetasso Villanova



Hello Marina.

I just reviewed your patch.

It applied cleanly on my current master (commit
d6a3dbe14f98d867b2fc3faeb99d2d3c2a48ca67).

Also, it worked as described in the email. Since it's a clarification in an
error message, I think the documentation is fine.

I played a bit with "make check", creating a database in my native
language (pt_BR) and testing with some data; everything worked as
expected.


--
Jose Arthur Benetasso Villanova




Re: proposal: possibility to read dumped table's name from file

2022-11-12 Thread Pavel Stehule
On Fri, Nov 11, 2022 at 9:11, Julien Rouhaud wrote:

> Hi,
>
> On Sat, Nov 05, 2022 at 08:54:57PM +0100, Pavel Stehule wrote:
> >
> > On Fri, Nov 4, 2022 at 14:28, Julien Rouhaud wrote:
> >
> > > > But one thing I noticed is that "optarg" looks wrong here:
> > > >
> > > > simple_string_list_append(&opts->triggerNames, optarg);
> > >
> > > Ah indeed, good catch!  Maybe there should be an explicit test for every
> > > (include|exclude) / objtype combination?  It would be a bit verbose (and
> > > possibly hard to maintain).
> > >
> >
> > yes - pg_restore is not well covered by tests, fixed
> >
> > I found another issue. pg_restore requires the full signature of the
> > function and it is pretty sensitive to white space (in pg_restore).
>
> Argh, indeed.  It's a good thing to have expanded the regression tests :)
>
> > I made a
> > mistake when I partially parsed patterns like SQL identifiers. It can work
> > for simple cases, but when I parse the function's signature it stops
> > working. So I rewrote the pattern-parsing part. Now I just read an input
> > string and try to reduce spaces. Multiline identifiers are still
> > supported. Compared with the previous method of pattern parsing, I needed
> > to change just one regression test - now I am not able to detect garbage
> > after a pattern :-/.
>
> I'm not sure it's really problematic.  It looks POLA-violation compatible with
> regular pg_dump options, for instance:
>
> $ echo "include table t1()" | pg_dump --filter - | ag CREATE
> CREATE TABLE public.t1 (
>
> $ pg_dump -t "t1()" | ag CREATE
> CREATE TABLE public.t1 (
>
> $ echo "include table t1()blabla" | pg_dump --filter - | ag CREATE
> pg_dump: error: no matching tables were found
>
> $ pg_dump -t "t1()blabla" | ag CREATE
> pg_dump: error: no matching tables were found
>
> I don't think the file parsing code should try to be smart about checking the
> found patterns.
>
> > * function filtering doesn't support schemas - when the name of a function
> > is specified with a schema, the function is not found
>
> Ah, I didn't know that.  Indeed it only expects a non-qualified identifier, and
> would restore any function that matches the name (and arguments), possibly
> multiple ones if there are variants in different schemas.  That's unrelated to
> this patch though.
>
> > * the function has to be specified with an argument type list - the
> > separator has to be exactly the ", " string. Without the space, or with one
> > extra space, the filtering doesn't work (the new implementation of pattern
> > parsing reduces white space sensitivity). This is not a bug, but it is not
> > well documented.
>
> Agreed.
>
> > attached updated patch
>
> It looks overall good to me!  I just have a few minor nitpicking
> complaints:
>
> - you removed the pg_strip_crlf() calls and declared everything as "const
>   char *", so there's no need to explicitly cast the filter_get_keyword()
>   arguments anymore
>

removed


>
> Note also that the code now relies on the fact that there are some non-zero
> bytes after a pattern to know that no errors happened.  It's not a problem, as
> you should find an EOF marker anyway if CRLF were stripped.
>

I am not sure if I understand this note well?


>
> + * Following routines are called from pg_dump, pg_dumpall and pg_restore.
> + * Unfortunately, implementation of exit_nicely in pg_dump and pg_restore
> + * is different from implementation of this routine in pg_dumpall. So
> instead
> + * of directly calling exit_nicely we have to return some error flag (in
> this
> + * case NULL), and exit_nicelly will be executed from caller's routine.
>
> Slight improvement:
> [...]
> Unfortunately, the implementation of exit_nicely in pg_dump and pg_restore
> is
> different from the one in pg_dumpall, so instead of...
>
> + * read_pattern - reads an pattern from input. The pattern can be mix of
> + * single line or multi line subpatterns. Single line subpattern starts
> first
> + * non white space char, and ending last non space char on line or by char
> + * '#'. The white spaces inside are removed (around char ".()"), or
> reformated
> + * around char ',' or reduced (the multiple spaces are replaced by one).
> + * Multiline subpattern starts by double quote and ending by this char
> too.
> + * The escape rules are same like for SQL quoted literal.
> + *
> + * Routine signalizes error by returning NULL. Otherwise returns pointer
> + * to next char after last processed char in input string.
>
>
> typo: reads "a" pattern from input...
>

fixed


>
> I don't fully understand the part about subpatterns, but is it necessary to
> describe it?  Simply saying that any valid and possibly-quoted identifier can
> be parsed should make it clear that identifiers containing \n characters should
> work too.  Maybe also just mention that whitespaces are removed and special
> care is taken to output routines in exactly the same way calling code will
> expect it (that is comma-and-single-space type

Re: proposal: possibility to read dumped table's name from file

2022-11-12 Thread Pavel Stehule
Hi

and updated patch

Regards

Pavel
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 8b9d9f4cad..d5a6e2c7ee 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -779,6 +779,80 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data for table data.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { table | schema | foreign_data | table_data } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   table_data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1119,6 +1193,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) qualifier
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table qualifiers find
 matches, pg_dump will generate an error
 even without --strict-names.
@@ -1528,6 +1603,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 
 
 $ pg_dump -t "\"MixedCaseName\"" mydb > mytab.sql
+
+
+  
+   To dump all tables with names starting with mytable, except for table
+   mytable2, specify a filter file
+   filter.txt like:
+
+include table mytable*
+exclude table mytable2
+
+
+
+$ pg_dump --filter=filter.txt mydb > db.sql
 
 
  
diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml
index e62d05e5ab..9cad26bbe6 100644
--- a/doc/src/sgml/ref/pg_dumpall.sgml
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -122,6 +122,28 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for databases excluded
+from dump. The patterns are interpretted according to the same rules
+as --exclude-database.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for excluding databases,
+and can also be specified more than once for multiple filter files.
+   
+
+   
+The file lists one database pattern per row, with the following format:
+
+exclude database  PATTERN
+
+   
+  
+ 
+
  
   -g
   --globals-only
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 47bd7dbda0..ffeb564c52 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -188,6 +188,31 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects excluded
+or included from restore. The patterns are interpretted according to the
+same rules as --schema, --exclude-schema,
+--function, --index, --table
+or --trigger.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple

Re: libpq compression (part 2)

2022-11-12 Thread Andrey Borodin
On Tue, Aug 2, 2022 at 1:53 PM Jacob Champion  wrote:
>
> and changing the status to "Needs Review"

I've tried the patch; it works as advertised. The code is generally
OK; maybe some functions need comments (because at least their
neighbours have some).
Some linkers complain about zs_is_valid_impl_id() being inline while
it is used in other modules.

Some architectural notes:
1. Currently, when the user requests compression from libpq but the
server does not support any of the codecs the client has, the connection
is rejected by the client. I think this should be configurable akin to
SSL: on, try, off (see the sketch after this list).
2. At the zpq_stream level of abstraction we parse a stream of bytes
to look for individual message headers. I think this is OK, just a
little awkward.
3. The CompressionAck message can be completely replaced by a ParameterStatus message.
4. Instead of sending a separate SetCompressionMethod, CompressedData
could carry in its header the index of the method used and whether the
compression context needs to be restarted.
5. Portions of pg_stat_network_traffic can be extracted into a separate
patch step to ease the review. And, actually, the scope of this view
is slightly beyond compression anyway.
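
To illustrate note 1, the client side could look something like this; the
"compression" option name and the on/try/off values are assumptions for
discussion, not the patch's current interface:

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /* "try" would mean: use compression if the server agrees, else fall back */
        PGconn     *conn = PQconnectdb("host=localhost dbname=postgres "
                                       "compression=try");

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* ... run queries; traffic is compressed only if both sides agreed ... */
        PQfinish(conn);
        return 0;
    }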

What do you think?

Also, compression is a very cool and long-awaited feature; I hope to see
it committed one day. Thank you for working on this!


Best regards, Andrey Borodin.




Re: Use fadvise in wal replay

2022-11-12 Thread Andrey Borodin
On Sun, Aug 7, 2022 at 9:41 AM Andrey Borodin  wrote:
>

Hi everyone. The patch is 16 lines, looks harmless, and has proven
benefits. I'm moving this to RfC.
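
For anyone skimming the thread: as I understand it, the patch boils down to a
readahead hint while reading WAL during replay, roughly like the sketch below
(an illustration under that assumption, not the actual 16-line patch):

    #include <sys/types.h>
    #include <fcntl.h>

    /* Hint the kernel to prefetch the upcoming portion of the WAL segment
     * that replay is about to read sequentially. */
    static void
    wal_readahead_hint(int fd, off_t offset, off_t length)
    {
    #ifdef POSIX_FADV_WILLNEED
        (void) posix_fadvise(fd, offset, length, POSIX_FADV_WILLNEED);
    #else
        (void) fd;
        (void) offset;
        (void) length;
    #endif
    }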

Thanks!

Best regards, Andrey Borodin.




Re: A new strategy for pull-up correlated ANY_SUBLINK

2022-11-12 Thread Tom Lane
Andy Fan  writes:
> In the past we pulled up the ANY-sublink in 2 steps: the first step is to
> pull up the sublink as a subquery, and the next step is to pull up the
> subquery if that is allowed.  The benefits of this method are obvious:
> pulling up the subquery has more requirements, so even if we can only finish
> the first step, we still get huge benefits. However, the bad stuff happens
> when varlevelsup = 1 is involved; things fail at step 1.

> convert_ANY_sublink_to_join ...

> if (contain_vars_of_level((Node *) subselect, 1))
> return NULL;

> In this patch we distinguish the above case and try to pull it up in
> one step if that is helpful. It looks to me that what we need to do is just
> transform it to an EXISTS sublink.

This patch seems awfully messy to me.  The fact that you're having to
duplicate stuff done elsewhere suggests at the least that you've not
plugged the code into the best place.

Looking again at that contain_vars_of_level restriction, I think the
reason for it was just to avoid making a FROM subquery that has outer
references, and the reason we needed to avoid that was merely that we
didn't have LATERAL at the time.  So I experimented with the attached.
It seems to work, in that we don't get wrong answers from any of the
small number of places that are affected.  (I wonder though whether
those test cases still test what they were intended to, particularly
the postgres_fdw one.  We might have to try to hack them some more
to not get affected by this optimization.)  Could do with more test
cases, no doubt.

One thing I'm not at all clear about is whether we need to restrict
the optimization so that it doesn't occur if the subquery contains
outer references falling outside available_rels.  I think that that
case is covered by is_simple_subquery() deciding later to not pull up
the subquery based on LATERAL restrictions, but maybe that misses
something.

I'm also wondering whether the similar restriction in
convert_EXISTS_sublink_to_join could be removed similarly.
In this light it was a mistake for convert_EXISTS_sublink_to_join
to manage the pullup itself rather than doing it in the two-step
fashion that convert_ANY_sublink_to_join does it.

regards, tom lane

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 558e94b845..c07280d836 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11377,19 +11377,19 @@ CREATE FOREIGN TABLE foreign_tbl2 () INHERITS (foreign_tbl)
   SERVER loopback OPTIONS (table_name 'base_tbl');
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
- QUERY PLAN  
--
- Seq Scan on public.base_tbl
+QUERY PLAN 
+---
+ Nested Loop Semi Join
Output: base_tbl.a
-   Filter: (SubPlan 1)
-   SubPlan 1
- ->  Result
-   Output: base_tbl.a
-   ->  Append
- ->  Async Foreign Scan on public.foreign_tbl foreign_tbl_1
-   Remote SQL: SELECT NULL FROM public.base_tbl
- ->  Async Foreign Scan on public.foreign_tbl2 foreign_tbl_2
-   Remote SQL: SELECT NULL FROM public.base_tbl
+   ->  Seq Scan on public.base_tbl
+ Output: base_tbl.a, base_tbl.b
+ Filter: (base_tbl.a IS NOT NULL)
+   ->  Materialize
+ ->  Append
+   ->  Async Foreign Scan on public.foreign_tbl foreign_tbl_1
+ Remote SQL: SELECT NULL FROM public.base_tbl
+   ->  Async Foreign Scan on public.foreign_tbl2 foreign_tbl_2
+ Remote SQL: SELECT NULL FROM public.base_tbl
 (11 rows)
 
 SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 92e3338584..3d4645a154 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1271,6 +1271,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	JoinExpr   *result;
 	Query	   *parse = root->parse;
 	Query	   *subselect = (Query *) sublink->subselect;
+	bool		has_level_1_vars;
 	Relids		upper_varnos;
 	int			rtindex;
 	ParseNamespaceItem *nsitem;
@@ -1283,11 +1284,10 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	Assert(sublink->subLinkType == ANY_SUBLINK);
 
 	/*
-	 * The sub-select must not refer to any Vars of the parent query. (Vars of
-	 * higher levels should be okay, though.)
+	 * If the sub-select refers to any Vars of the parent query, we have to
+	 * treat it as LATERAL.  (Vars of higher levels don't matter here.)
 	 */
-	if (contain_vars_of_lev

Re: PGDOCS - Logical replication GUCs - added some xrefs

2022-11-12 Thread vignesh C
On Mon, 24 Oct 2022 at 13:15, Peter Smith  wrote:
>
> Hi hackers.
>
> There is a docs Logical Replication section "31.10 Configuration
> Settings" [1] which describes some logical replication GUCs, and
> details on how they interact with each other and how to take that into
> account when setting their values.
>
> There is another docs Server Configuration section "20.6 Replication"
> [2] which lists the replication-related GUC parameters, and what they
> are for.
>
> Currently AFAIK those two pages are unconnected, but I felt it might
> be helpful if some of the parameters in the list [2] had xref links to
> the additional logical replication configuration information [1]. PSA
> a patch to do that.
>
> ~~
>
> Meanwhile, I also suspect that the main blurb at the top of [1] is not
> entirely correct... it says "These settings control the behaviour of
> the built-in streaming replication feature", although some of the GUCs
> mentioned later in this section are clearly for "logical replication".

The introduction mainly talks about streaming replication, and the
"Subscribers" subsection of page [1] clearly mentions that those
configurations are for logical replication. As we already have a
separate page [2] detailing the logical replication configuration,
it might be better to move the "Subscribers" section from [1] to [2].

[1] 20.6 Replication -
https://www.postgresql.org/docs/current/runtime-config-replication.html
[2] 31.10 Configuration Settings -
https://www.postgresql.org/docs/current/logical-replication-config.html

Regards,
Vignesh




Re: libpq compression (part 2)

2022-11-12 Thread Andrey Borodin
On Sat, Nov 12, 2022 at 1:47 PM Andrey Borodin  wrote:
>
> I've tried the patch, it works as advertised.

While testing the patch some more, I observed unpleasant segfaults:

#26 0x7fecafa1e058 in __memcpy_ssse3_back () from target:/lib64/libc.so.6
#27 0x0b08fda2 in lz4_decompress (d_stream=0x18cf82a0,
src=0x7feae4fa505d, src_size=92,
src_processed=0x79f4fdf8, dst=0x18b01f80, dst_size=8192,
dst_processed=0x79f4fe60)
#28 0x0b090624 in zs_read (zs=0x18cdfbf0, src=0x7feae4fa505d,
src_size=92, src_processed=0x79f4fdf8,
dst=0x18b01f80, dst_size=8192, dst_processed=0x79f4fe60)
#29 0x0b08eb8f in zpq_read_compressed_message
(zpq=0x7feae4fa5010, dst=0x18b01f80 "Q", dst_len=8192,
dst_processed=0x79f4fe60)
#30 0x0b08f1a9 in zpq_read (zpq=0x7feae4fa5010,
dst=0x18b01f80, dst_size=8192, noblock=false)

(gdb) select-frame 27
(gdb) info locals
ds = 0x18cf82a0
decPtr = 0x18cf8aec ""
decBytes = -87

This is a buffer overrun during decompression. I think the receive
buffer must be twice as big as the send buffer to accommodate such
messages.
Also, this portion of lz4_decompress():
Assert(decBytes > 0);
must actually be a real check with elog(ERROR), because clients can
intentionally compose CompressedData to blow up the server.
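
Something along these lines on the backend side; the function shape, error
code, and wording are just an example, not the patch's actual code:

    #include "postgres.h"

    /* Treat a bad decoder result as a protocol error instead of asserting,
     * since a client can intentionally send malformed CompressedData. */
    static void
    check_decompress_result(int decBytes)
    {
        if (decBytes <= 0)
            ereport(ERROR,
                    (errcode(ERRCODE_PROTOCOL_VIOLATION),
                     errmsg("invalid compressed data in protocol message")));
    }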

Best regards, Andrey Borodin.