Re: More efficient RI checks - take 2

2020-09-29 Thread Michael Paquier
On Sat, Sep 26, 2020 at 09:59:17PM -0500, Justin Pryzby wrote:
> I'm interested in testing this patch, however there's a lot of internals to
> digest.

Even with that, the thread has been waiting on author for a couple of
weeks now, so I have marked the entry as RwF.
--
Michael


signature.asc
Description: PGP signature


Re: WIP: expression evaluation improvements

2020-09-29 Thread Michael Paquier
On Wed, Jul 01, 2020 at 02:50:14PM +0200, Daniel Gustafsson wrote:
> Since the CFBot patch tester isn't able to apply and test a patchset divided
> across multiple emails, can you please submit the full patchset for
> consideration such that we can get it to run in the CI?

This thread seems to have died a couple of weeks ago, so I have marked
it as RwF.
--
Michael


signature.asc
Description: PGP signature


Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables

2020-09-29 Thread Dilip Kumar
On Mon, Sep 28, 2020 at 2:22 PM Amit Kapila  wrote:
>
> On Mon, Sep 28, 2020 at 10:01 AM Keisuke Kuroda
>  wrote:
> >
> > > Okay. Feel free to clarify your questions if you have any? Are you
> > > interested in writing a patch for the same?
> >
> > Thank you! Yes, I would like to write a patch.
> > If you already have a discussed thread or patch for this idea,
> > please let me know.
> >
>
> I don't have a patch for this idea but you can refer [1]
> (v25-0002-Issue-individual-invalidations-with-wal_level-lo) to see
> what I was trying to say about registering the invalidations in the
> form of ReorderBufferChange. We have something along the lines of what
> I described above in that patch but we have removed it because we need
> all the invalidations to be accumulated to handle aborts and we don't
> want two different mechanisms to store invalidations.

In some of the intermediate versions of the logical-decoding work, we
had collected the invalidations in the form of changes, and later we
changed it so that we can execute all the invalidations on abort.  I
just browsed the old patch and fetched the changes that collect the
invalidations in the form of changes.  I have rebased that on the
current head so that we collect them both in the form of changes as
well as in the accumulated collection of invalidations.  So if we get
an individual invalidation we execute it, and on abort we execute all
the invalidations together.
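
Roughly, the idea is to queue each invalidation as its own
ReorderBufferChange, along the lines of the sketch below.  This is
illustrative only: the REORDER_BUFFER_CHANGE_INVALIDATION action, the
data.inval payload layout, and the simplified ReorderBufferQueueChange()
signature are assumptions taken from the old patch versions, not
committed code.

    #include "postgres.h"
    #include "replication/reorderbuffer.h"
    #include "storage/sinval.h"

    /*
     * Sketch: wrap a batch of catalog invalidation messages in a
     * ReorderBufferChange so that it is replayed in command order and
     * the same messages can also be executed together on abort.
     */
    static void
    queue_invalidations_sketch(ReorderBuffer *rb, TransactionId xid,
                               XLogRecPtr lsn, int nmsgs,
                               SharedInvalidationMessage *msgs)
    {
        ReorderBufferChange *change = ReorderBufferGetChange(rb);

        change->action = REORDER_BUFFER_CHANGE_INVALIDATION;
        change->data.inval.ninvalidations = nmsgs;
        change->data.inval.invalidations = (SharedInvalidationMessage *)
            palloc(nmsgs * sizeof(SharedInvalidationMessage));
        memcpy(change->data.inval.invalidations, msgs,
               nmsgs * sizeof(SharedInvalidationMessage));

        ReorderBufferQueueChange(rb, xid, lsn, change);
    }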

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


v1-0001-Collect-command-invalidation-in-form-of-changes.patch
Description: Binary data


Re: [PATCH] Add section headings to index types doc

2020-09-29 Thread Michael Paquier
On Mon, Aug 10, 2020 at 12:52:17PM +, Jürgen Purtz wrote:
> The new status of this patch is: Waiting on Author

This has not been answered yet, so I have marked the patch as returned
with feedback.
--
Michael


signature.asc
Description: PGP signature


Re: psql FETCH_COUNT feature does not work with combined queries

2020-09-29 Thread Michael Paquier
On Thu, Jul 16, 2020 at 04:44:47PM -0700, David G. Johnston wrote:
> When FETCH_COUNT is non-zero, and you send a multi-statement command
> to the server (for example via -c or the \; meta-command), the results
> are undefined. This combination is presently allowed for backward
> compatibility.

This has been waiting on author for two months now, so I have marked
the patch as RwF in the CF app.
--
Michael


signature.asc
Description: PGP signature


Re: psql - add SHOW_ALL_RESULTS option

2020-09-29 Thread Michael Paquier
On Mon, Jul 20, 2020 at 07:48:42AM +0200, Fabien COELHO wrote:
> Yes, indeed. I'm planning to investigate, hopefully this week.

This reply was two months ago, and nothing has happened, so I have
marked the patch as RwF.
--
Michael


signature.asc
Description: PGP signature


Re: list of extended statistics on psql

2020-09-29 Thread Michael Paquier
On Thu, Sep 17, 2020 at 02:55:31PM +0900, Michael Paquier wrote:
> Could you provide at least a rebased version of the patch?  The CF bot
> is complaining here.

Not seeing this answered after two weeks, I have marked the patch as
RwF for now.
--
Michael


signature.asc
Description: PGP signature


Re: pg_dump --where option

2020-09-29 Thread Michael Paquier
On Mon, Sep 14, 2020 at 05:00:19PM +0200, Daniel Gustafsson wrote:
> I'm not sure I follow. Surely tests can be added for this functionality?

We should have tests for that.  I can see that this has not been
answered in two weeks, so this has been marked as returned with
feedback in the CF app.
--
Michael


signature.asc
Description: PGP signature


Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

2020-09-29 Thread Michael Paquier
On Sun, Sep 06, 2020 at 08:14:21PM -0700, Noah Misch wrote:
> http://cfbot.cputube.org/patch_29_2026.log applies the two patches in the
> wrong order.  For this CF entry, ignore it.

OK, thanks.  This is a bug fix, so I have moved that to the next CF
for now.  Noah, would you prefer more reviews or are you confident
enough to move on with this issue?
--
Michael


signature.asc
Description: PGP signature


Re: Parallel INSERT (INTO ... SELECT ...)

2020-09-29 Thread Dilip Kumar
On Wed, Sep 30, 2020 at 7:38 AM Greg Nancarrow  wrote:
>
> > >
> > > I think you still need to work on the costing part; basically, if we
> > > are parallelizing the whole insert then the plan is like below
> > >
> > > -> Gather
> > >   -> Parallel Insert
> > >   -> Parallel Seq Scan
> > >
> > > That means the tuples we are selecting via the scan are not sent back to
> > > the gather node, so in cost_gather we need to see if it is for an
> > > INSERT; in that case there is no row transferred through the parallel
> > > queue, which means we need not pay any parallel tuple cost.
> >
> > I just looked into the parallel CTAS[1] patch for the same thing, and
> > I can see in that patch it is being handled.
> >
> > [1] 
> > https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com
> >
>
> Hi Dilip,
>
> You're right, the costing for Parallel Insert is not finished; I'm
> still working on it and haven't posted an updated patch yet.

Okay.

> As far as the cost_gather() method is concerned, for Parallel INSERT, it
> can probably use the same costing approach as the CTAS patch, except in
> the case of a specified RETURNING clause.

Yeah, right.  I did not think about the RETURNING part.
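
To illustrate the costing rule being discussed, here is a rough sketch
(this is not the actual cost_gather() code; the "is_parallel_insert"
flag and the helper name are assumptions made purely for illustration):

    #include "postgres.h"
    #include "optimizer/cost.h"     /* parallel_setup_cost, parallel_tuple_cost */

    /*
     * Sketch: charge the Gather for launching workers, but skip the
     * per-tuple queue transfer cost when the underlying parallel INSERT
     * sends no rows back through the tuple queue.
     */
    static void
    gather_cost_sketch(Cost *startup_cost, Cost *run_cost,
                       double rows, bool is_parallel_insert)
    {
        *startup_cost += parallel_setup_cost;

        if (!is_parallel_insert)
            *run_cost += parallel_tuple_cost * rows;
    }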

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: __pg_log_level in anonymous enum should be initialized? (Was: pgsql: Change SHA2 implementation based on OpenSSL to use EVP digest ro)

2020-09-29 Thread Michael Paquier
On Tue, Sep 29, 2020 at 09:29:18AM -0400, Tom Lane wrote:
> Hm, it doesn't seem like "OBJS_PGCOMMON" conveys much.  I think
> OBJS_FRONTEND ought to be the set of files built for frontend
> application use (i.e., libpgcommon.a), so we need a new name
> for what will go into libpgcommon_shlib.a.  Maybe OBJS_SHLIB,
> or to be even more explicit, OBJS_FRONTEND_SHLIB.

OBJS_SHLIB is already used to list the _shlib.o objects compiled for
libpgcommon_shlib.a, which is why I switched OBJS_FRONTEND to
OBJS_PGCOMMON, and a variable cannot reference itself, so we either need
two variables for the shlib list, or we could just do something like
this:
OBJS_FRONTEND = \
$(OBJS_COMMON) \
fe_memutils.o \
restricted_token.o \
sprompt.o
OBJS_SHLIB = $(OBJS_FRONTEND:%.o=%_shlib.o)
OBJS_FRONTEND += logging.o

However I would prefer the style of the attached.

> As for the comment, maybe "logging.c is excluded from OBJS_SHLIB
> as a matter of policy, because it is not appropriate for general
> purpose libraries such as libpq to report errors directly."

WFM.  Thanks!
--
Michael
diff --git a/src/common/Makefile b/src/common/Makefile
index f281762885..25c55bd642 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -88,16 +88,21 @@ OBJS_COMMON += sha2.o
 endif
 
 # A few files are currently only built for frontend, not server
-# (Mkvcbuild.pm has a copy of this list, too)
-OBJS_FRONTEND = \
+# (Mkvcbuild.pm has a copy of this list, too).  logging.c is excluded
+# from OBJS_FRONTEND_SHLIB (shared library) as a matter of policy,
+# because it is not appropriate for general purpose libraries such
+# as libpq to report errors directly.
+OBJS_FRONTEND_SHLIB = \
 	$(OBJS_COMMON) \
 	fe_memutils.o \
-	logging.o \
 	restricted_token.o \
 	sprompt.o
+OBJS_FRONTEND = \
+	$(OBJS_FRONTEND_SHLIB) \
+	logging.o
 
 # foo.o, foo_shlib.o, and foo_srv.o are all built from foo.c
-OBJS_SHLIB = $(OBJS_FRONTEND:%.o=%_shlib.o)
+OBJS_SHLIB = $(OBJS_FRONTEND_SHLIB:%.o=%_shlib.o)
 OBJS_SRV = $(OBJS_COMMON:%.o=%_srv.o)
 
 # where to find gen_keywordlist.pl and subsidiary files


signature.asc
Description: PGP signature


Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables

2020-09-29 Thread Keisuke Kuroda
Hi Amit,

> I don't have a patch for this idea but you can refer [1]
> (v25-0002-Issue-individual-invalidations-with-wal_level-lo) to see
> what I was trying to say about registering the invalidations in the
> form of ReorderBufferChange. We have something along the lines of what
> I described above in that patch but we have removed it because we need
> all the invalidations to be accumulated to handle aborts and we don't
> want two different mechanisms to store invalidations.

Thanks, I read the patch
(v25-0002-Issue-individual-invalidations-with-wal_level-lo)
and the review.

I understand the following.
* In v25-0002, there were two different mechanisms,
  XLOG_XACT_INVALIDATIONS (ReorderBufferAddInvalidation) and
  XLOG_INVALIDATIONS (ReorderBufferAddInvalidations).
* After a review, XLOG_XACT_INVALIDATIONS was implemented to
  generate all invalidation messages.

I'm going to write a patch that looks like the following.
* Add processing that collects the invalidations from
  XLOG_XACT_INVALIDATIONS in the form of ReorderBufferChange.
* Invalidations are executed in the
  REORDER_BUFFER_CHANGE_INVALIDATION case.

Best Regards,
-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com




Some comment problem in nodeAgg.c

2020-09-29 Thread Hou, Zhijie
Hi

When looking into the code for hash aggregation spilling to disk,
I found some comments in nodeAgg.c that may not have been updated.

1. The function lookup_hash_entry() has been deleted, but
there are still some comments that talk about lookup_hash_entry().

 * and is packed/unpacked in lookup_hash_entry() / agg_retrieve_hash_table()
 ...
 * GROUP BY columns.  The per-group data is allocated in lookup_hash_entry(),
 ...
 * Be aware that lookup_hash_entry can reset the tmpcontext.


2. Now that we can use hash_mem_multiplier to set hashagg's memory limit,
   the comment in hash_agg_set_limits() still uses "work mem", like the following.

/*
 * Don't set the limit below 3/4 of hash_mem. In that case, we are at the
 * minimum number of partitions, so we aren't going to dramatically exceed
 * ## work mem ## anyway.

Does it mean hash_mem here?

Best regards,
houzj








Re: DROP relation IF EXISTS Docs and Tests - Bug Fix

2020-09-29 Thread Pavel Stehule
On Wed, 30 Sep 2020 at 4:01, David G. Johnston <david.g.johns...@gmail.com> wrote:

> On Tue, Sep 15, 2020 at 3:48 PM Alexander Korotkov 
> wrote:
>
>> Hi!
>>
>> I've skimmed through the thread and checked the patchset.  Everything
>> looks good, except one paragraph, which doesn't look completely clear.
>>
>> +  
>> +   This emulates the functionality provided by
>> +but sets the created object's
>> +   type
>> definition
>> +   to domain.
>> +  
>>
>> As I get it states that CREATE DOMAIN somehow "emulates" CREATE TYPE.
>> Could you please, rephrase it?  It looks confusing to me yet.
>>
>
> v5 attached, looking at this fresh and with some comments to consider.
>
> I ended up just combining both patches into one.
>
> I did away with the glossary changes altogether, and the invention of the
> new term.  I ended up limiting "type's type" to just domain usage but did a
> couple of additional tweaks that tried to treat domains as not being
> actual types even though, at least in PostgreSQL, they are (at least as far
> as DROP TYPE is concerned - and since I don't have any understanding of the
> SQL Standard's decision to separate out create domain and create type I'll
> just stick to the implementation in front of me.)
>

+1

Pavel


> David J.
>


Re: Optimize memory allocation code

2020-09-29 Thread Li Japin


On Sep 29, 2020, at 9:30 PM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:

> On 2020-Sep-26, Li Japin wrote:
> 
>> Thanks! How big is this overhead? Is there any way I can test it?
> 
> You could also have a look at the assembly code that your compiler
> generates -- particularly examine how it changes.

Thanks for your advice!
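
For context, judging from the disassembly below, the patched palloc0()
simply delegates to palloc() and then zeroes the result.  In C that is
roughly the following (inferred from the assembly, not copied from the
patch):

    /* Rough C equivalent of the patched palloc0 shown below. */
    void *
    palloc0(Size size)
    {
        void   *ret = palloc(size);

        MemSetAligned(ret, 0, size);
        return ret;
    }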

The original assembly code for palloc0 is:

00517690 :
  517690: 55   push   %rbp
  517691: 53   push   %rbx
  517692: 48 89 fb mov%rdi,%rbx
  517695: 48 83 ec 08   sub$0x8,%rsp
  517699: 48 81 ff ff ff ff 3f cmp$0x3fff,%rdi
  5176a0: 48 8b 2d d9 0c 48 00 mov0x480cd9(%rip),%rbp# 998380 

  5176a7: 0f 87 d5 00 00 00 ja 517782 
  5176ad: 48 8b 45 10   mov0x10(%rbp),%rax
  5176b1: 48 89 fe mov%rdi,%rsi
  5176b4: c6 45 04 00   movb   $0x0,0x4(%rbp)
  5176b8: 48 89 ef mov%rbp,%rdi
  5176bb: ff 10 callq  *(%rax)
  5176bd: 48 85 c0 test   %rax,%rax
  5176c0: 48 89 c1 mov%rax,%rcx
  5176c3: 74 5b je 517720 
  5176c5: f6 c3 07 test   $0x7,%bl
  5176c8: 75 36 jne517700 
  5176ca: 48 81 fb 00 04 00 00 cmp$0x400,%rbx
  5176d1: 77 2d ja 517700 
  5176d3: 48 01 c3 add%rax,%rbx
  5176d6: 48 39 d8 cmp%rbx,%rax
  5176d9: 73 35 jae517710 
  5176db: 0f 1f 44 00 00   nopl   0x0(%rax,%rax,1)
  5176e0: 48 83 c0 08   add$0x8,%rax
  5176e4: 48 c7 40 f8 00 00 00 movq   $0x0,-0x8(%rax)
  5176eb: 00
  5176ec: 48 39 c3 cmp%rax,%rbx
  5176ef: 77 ef ja 5176e0 
  5176f1: 48 83 c4 08   add$0x8,%rsp
  5176f5: 48 89 c8 mov%rcx,%rax
  5176f8: 5b   pop%rbx
  5176f9: 5d   pop%rbp
  5176fa: c3   retq
  5176fb: 0f 1f 44 00 00   nopl   0x0(%rax,%rax,1)
  517700: 48 89 cf mov%rcx,%rdi
  517703: 48 89 da mov%rbx,%rdx
  517706: 31 f6 xor%esi,%esi
  517708: e8 e3 0e ba ff   callq  b85f0 
  51770d: 48 89 c1 mov%rax,%rcx
  517710: 48 83 c4 08   add$0x8,%rsp
  517714: 48 89 c8 mov%rcx,%rax
  517717: 5b   pop%rbx
  517718: 5d   pop%rbp
  517719: c3   retq
  51771a: 66 0f 1f 44 00 00 nopw   0x0(%rax,%rax,1)
  517720: 48 8b 3d 51 0c 48 00 mov0x480c51(%rip),%rdi# 998378 

  517727: be 64 00 00 00   mov$0x64,%esi
  51772c: e8 1f f9 ff ff   callq  517050 
  517731: 31 f6 xor%esi,%esi
  517733: bf 14 00 00 00   mov$0x14,%edi
  517738: e8 53 6d fd ff   callq  4ee490 
  51773d: bf c5 20 00 00   mov$0x20c5,%edi
  517742: e8 99 9b fd ff   callq  4f12e0 
  517747: 48 8d 3d 07 54 03 00 lea0x35407(%rip),%rdi# 54cb55 
<__func__.7554+0x45>
  51774e: 31 c0 xor%eax,%eax
  517750: e8 ab 9d fd ff   callq  4f1500 
  517755: 48 8b 55 38   mov0x38(%rbp),%rdx
  517759: 48 8d 3d 80 11 16 00 lea0x161180(%rip),%rdi# 6788e0 
<__func__.6248+0x150>
  517760: 48 89 de mov%rbx,%rsi
  517763: 31 c0 xor%eax,%eax
  517765: e8 56 a2 fd ff   callq  4f19c0 
  51776a: 48 8d 15 ff 11 16 00 lea0x1611ff(%rip),%rdx# 678970 
<__func__.7326>
  517771: 48 8d 3d 20 11 16 00 lea0x161120(%rip),%rdi# 678898 
<__func__.6248+0x108>
  517778: be eb 03 00 00   mov$0x3eb,%esi
  51777d: e8 0e 95 fd ff   callq  4f0c90 
  517782: 31 f6 xor%esi,%esi
  517784: bf 14 00 00 00   mov$0x14,%edi
  517789: e8 02 6d fd ff   callq  4ee490 
  51778e: 48 8d 3d db 10 16 00 lea0x1610db(%rip),%rdi# 678870 
<__func__.6248+0xe0>
  517795: 48 89 de mov%rbx,%rsi
  517798: 31 c0 xor%eax,%eax
  51779a: e8 91 98 fd ff   callq  4f1030 
  51779f: 48 8d 15 ca 11 16 00 lea0x1611ca(%rip),%rdx# 678970 
<__func__.7326>
  5177a6: 48 8d 3d eb 10 16 00 lea0x1610eb(%rip),%rdi# 678898 
<__func__.6248+0x108>
  5177ad: be df 03 00 00   mov$0x3df,%esi
  5177b2: e8 d9 94 fd ff   callq  4f0c90 
  5177b7: 66 0f 1f 84 00 00 00 nopw   0x0(%rax,%rax,1)
  5177be: 00 00

After the modification, the palloc0 assembly code is:

00517690 :
  517690: 53   push   %rbx
  517691: 48 89 fb mov%rdi,%rbx
  517694: e8 17 ff ff ff   callq  5175b0 
  517699: f6 c3 07 test   $0x7,%bl
  51769c: 48 89 c1 mov%rax,%rcx
  51769f: 75 2f jne5176d0 
  5176a1: 48 81 fb 00 04 00 00 cmp$0x400,%rbx
  5176a8: 77 26 ja 5176d0 
  5176aa: 48 01 c3 add%rax,%rbx
  5176ad: 48 39 d8 cmp%rbx,%rax
  517

Re: NOTIFY docs fixup - emit and deliver consistency

2020-09-29 Thread David G. Johnston
On Tue, Sep 29, 2020 at 7:58 PM Tom Lane  wrote:

> Maybe we could use terminology along the lines of "added to the
> queue" and "removed from the queue"?
>

Quickly looking it over with this in mind there are a few spots that can be
cleaned up and linked together by explicitly talking about a FIFO queue as
the mechanism instead of the less precise send/process/deliver.  A bit more
invasive but I think it will be done more clearly with this approach.

David J.


Re: problem with RETURNING and update row movement

2020-09-29 Thread Amit Langote
On Thu, Sep 24, 2020 at 2:26 PM Amit Langote  wrote:
> On Thu, Sep 24, 2020 at 1:54 PM Michael Paquier  wrote:
> > On Thu, Sep 24, 2020 at 04:25:24AM +0900, Etsuro Fujita wrote:
> > > Sorry, my thought was vague.  To store xmin/xmax/cmin/cmax into a
> > > given slot, we need to extend the TupleTableSlot struct to contain
> > > these attributes as well?  Or we need to introduce a new callback
> > > routine for that (say, setsysattr)?  These would not be
> > > back-patchable, though.
> >
> > Please note that the latest patch fails to apply, so this needs a
> > rebase.
>
> I saw the CF-bot failure too yesterday, although it seems that it's
> because the bot is trying to apply the patch version meant for v11
> branch onto HEAD branch.  I just checked that the patches apply
> cleanly to their respective branches.

I checked that the last statement is still true, so I've switched the
status back to Needs Review.


--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




RE: Disable WAL logging to speed up data loading

2020-09-29 Thread osumi.takami...@fujitsu.com
Hello, Ashutosh

> Can they use a database with all unlogged tables?
Unfortunately, no. They want to switch the cluster to "without WAL logging"
only while they execute nightly bulk loading for their data warehouse.
In other words, they would like to keep WAL for their other usual operations.
In addition, making all tables unlogged brings the risk of losing the data
warehouse's data in an unexpected server crash or power outage.

Regards,
Takamichi Osumi


Re: Use PG_FINALLY to simplify code

2020-09-29 Thread Tom Lane
Kyotaro Horiguchi  writes:
> At Tue, 29 Sep 2020 01:03:13 +, "Hou, Zhijie"  
> wrote in 
>> Since PG_FINALLY can be used now, I think we can use PG_FINALLY to simplify 
>> code here.

> The patch removes PG_RE_THROW(), which is crucial in the code
> path.

No, that's not a problem, because PG_FINALLY incorporates logic
to reproduce the PG_RE_THROW action if we get to the code block
due to an error being thrown.
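
For reference, the equivalence being described looks like this in
outline; do_something() and cleanup() are placeholder names, not code
taken from the patch:

    /* Old style: duplicate the cleanup and re-throw explicitly. */
    PG_TRY();
    {
        do_something();
    }
    PG_CATCH();
    {
        cleanup();
        PG_RE_THROW();
    }
    PG_END_TRY();
    cleanup();

    /*
     * PG_FINALLY style: cleanup() runs in both the error and non-error
     * paths, and if an error was being thrown it is re-thrown after the
     * block finishes.
     */
    PG_TRY();
    {
        do_something();
    }
    PG_FINALLY();
    {
        cleanup();
    }
    PG_END_TRY();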

The patch is nonetheless moot, because after a6b1f5365 those
two code paths are no longer identical.

regards, tom lane




Re: NOTIFY docs fixup - emit and deliver consistency

2020-09-29 Thread Tom Lane
"David G. Johnston"  writes:
> Over in [1] Greg got confused by some wording in our NOTIFY documentation.
> The attached patch uses "emits" and "delivered" more consistently (in
> lieu of "processed" in the complained of location).

Meh --- I do not think "emitted" is much of an improvement over "sent".
(I agree it's not great that these two places don't use matching
terminology, though.)  Neither is clear as to where the message is
sent or emitted.

As for the other end of it, I don't like "delivered" because it presumes
that the processing action necessarily is to send the message to the
connected client.  When a backend takes a message off the queue, it may
just drop it on the floor because its client is not listening to that
channel.  Nonetheless, until it's done so that message must consume
queue space.
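
For the client-side end of this, "delivery" just means the message shows
up via PQnotifies().  A minimal libpq sketch, assuming conn is an
established connection on which LISTEN has already been issued:

    #include <stdio.h>
    #include "libpq-fe.h"

    /* Drain any notifications the server has delivered on this connection. */
    static void
    drain_notifications(PGconn *conn)
    {
        PGnotify   *note;

        if (!PQconsumeInput(conn))      /* read whatever data is pending */
            return;
        while ((note = PQnotifies(conn)) != NULL)
        {
            printf("NOTIFY \"%s\" with payload \"%s\" from backend PID %d\n",
                   note->relname, note->extra, note->be_pid);
            PQfreemem(note);
        }
    }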

Maybe we could use terminology along the lines of "added to the
queue" and "removed from the queue"?

regards, tom lane




pg_proc.dat "proargmodes is not a 1-D char array"

2020-09-29 Thread Craig Ringer
Hi all

Random tip for future searchers. If you've modified pg_proc.dat and initdb
fails with "proargmodes is not a 1-D char array", it could well actually
be that the length of proargmodes does not match the length of
proallargtypes given the test

ARR_DIMS(arr)[0] != numargs ||

in funcapi.c.
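
For reference, the surrounding check looks roughly like this (paraphrased
from memory, not a verbatim quote of funcapi.c), which is why a length
mismatch produces that particular message:

    /* Paraphrased shape of the sanity check that raises the error above. */
    if (ARR_NDIM(arr) != 1 ||
        ARR_DIMS(arr)[0] != numargs ||
        ARR_HASNULL(arr) ||
        ARR_ELEMTYPE(arr) != CHAROID)
        elog(ERROR, "proargmodes is not a 1-D char array");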

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Document JSON primitive quoting and letter-case rules

2020-09-29 Thread David G. Johnston
In Bug# 11636 [1] ChandraKumar complained that we documented fairly
explicitly how to write json boolean true and false but not json null.
While repeating the entire RFC in our docs isn't desirable, it seems worth
considering making the comments on the existing table consistently cover
quoting and letter-case rules for the four primitive types shown.  Thus the
attached.

David J.

[1]
https://www.postgresql.org/message-id/CAHyU1EYRYCV8gELrBYm6V8E%2BToqf%3DKQe370bPt4yrE_Y1yDZ9w%40mail.gmail.com
commit e009aab5ad4cb7852fec91d759aef3f356542a5e
Author: David G. Johnston 
Date:   Wed Sep 30 02:46:26 2020 +

Don't make reader view RFC for scalar json value syntax

diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index c0a6554d4d..d727ffc669 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -147,22 +147,22 @@
 string
 text
 \u is disallowed, as are Unicode escapes
- representing characters not available in the database encoding
+ representing characters not available in the database encoding.  JSON values are double-quoted.


 number
 numeric
-NaN and infinity values are disallowed
+NaN and infinity values are disallowed.  JSON values are unquoted.


 boolean
 boolean
-Only lowercase true and false spellings are accepted
+Only unquoted lowercase true and false spellings are accepted


 null
 (none)
-SQL NULL is a different concept
+SQL NULL is a different concept.  Must be written lowercase and unquoted.

   
  


Re: Use PG_FINALLY to simplify code

2020-09-29 Thread Kyotaro Horiguchi
At Tue, 29 Sep 2020 01:03:13 +, "Hou, Zhijie"  
wrote in 
> In (/src/pl/plpgsql/src/pl_exec.c), I found some code like the following:
> 
>   PG_CATCH();
>   {
>   if (expr->plan && !expr->plan->saved)
>   expr->plan = NULL;
>   PG_RE_THROW();
>   }
>   PG_END_TRY();
> 
>   if (expr->plan && !expr->plan->saved)
>   expr->plan = NULL;
> 
> Since PG_FINALLY can be used now, I think we can use PG_FINALLY to simplify 
> code here.

The patch removes PG_RE_THROW(), which is crucial in the code
path. There's at least one other instance of that coding in pquery.c
but PG_FINALLY() is not applicable there for the same reason, too.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: Disable WAL logging to speed up data loading

2020-09-29 Thread osumi.takami...@fujitsu.com
Hi, Amul

> > We'd like to propose a feature to disable WAL to speed up data loading.
> > This was inspired by a feature added in the latest MySQL.  I wish you
> > won't fear this feature...
> >
> 
> TWIMW, pg_bulkload contrib module[1], also does the same for the faster data
> loading.

Both features help make data loading faster,
but they are different.

There are at least two major merits of using wal_level='none' compared
with pg_bulkload.
The first one comes up when a user upgrades to a new major version via pg_dumpall.
Imagine a case where a user takes a logical backup of the whole cluster with
pg_dumpall.
The output file contains many COPY commands to recreate and upgrade the
cluster.

Setting wal_level='none' can easily be used to boost the speed of rebuilding
the cluster on the newer version.
OTOH, pg_bulkload can't handle this upgrade flow in an easy way.
It requires planning and writing detailed control files or commands for each
table manually.
This wouldn't be easy for users.

Secondly,
using pg_bulkload requires a wrapper command around pg_ctl,
like "pg_bulkload -r", which is provided only for that feature.
This command is used when pg_bulkload has crashed.
The documentation recommends not using pg_ctl directly in such a case.
Paying attention to such usages or minor differences in usage is troublesome
while running a long-lived service, without the support of the OSS community.

What do you think?

Regards,
Takamichi Osumi


RE: BLOB / CLOB support in PostgreSQL

2020-09-29 Thread tsunakawa.ta...@fujitsu.com
From: Vladimir Sitnikov 
> Just in case, I'm PgJDBC committer.

Thank you very much for your great efforts on the wonderful PgJDBC.  I have
seen that you are active.

# I'd be happy if you send emails in text format so that the reply looks nice.  
Your email seems to be in HTML.

> and apparently, Andrew is surprised that the database lacks BLOB/CLOB support.

I was a bit surprised too when I first saw that Postgres supports bytea rather
than blob/clob, because I had the impression that Postgres is highly SQL
standard compliant.  I'm for adding blob/clob data types in the server.

At the same time, I wonder why Postgres had to add bytea instead of blob.  It 
may be that there are or were some technical issues.  They may stand in the way 
even now.

One thing I can think of is the parameter format (text/binary).  libpq's 
PQexecParams() can specify input format for each parameter, but it can only 
specify the output format for all returned columns, not for each column.  As a 
consequence, the bytea host variable support added in PG 12 can INSERT 1 GB of 
binary data, but retrieval of the value fails with an error message like 
"invalid alloc request."  That's because the server allocates twice the size of 
stored data to convert it into text format, whose size becomes about 2 GB.  
That exceeds the limit palloc() can allocate.
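
For illustration, the asymmetry looks like this in the libpq call; the
query, table and column names below are made up for the example:

    #include "libpq-fe.h"

    /*
     * PQexecParams lets the caller choose text/binary per input parameter
     * (paramFormats[]), but the result format is a single flag that
     * covers every returned column.
     */
    static PGresult *
    fetch_blob_sketch(PGconn *conn, const char *id)
    {
        const char *paramValues[1] = {id};
        const int   paramLengths[1] = {0};  /* ignored for text-format params */
        const int   paramFormats[1] = {0};  /* 0 = text, chosen per parameter */

        return PQexecParams(conn,
                            "SELECT payload FROM blobs WHERE id = $1",
                            1,              /* nParams */
                            NULL,           /* let the server infer param types */
                            paramValues,
                            paramLengths,
                            paramFormats,
                            1);             /* resultFormat: one flag for ALL columns */
    }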

33.3. Command Execution Functions
https://www.postgresql.org/docs/devel/libpq-exec.html


> The concerns to avoid "Clob maps to text" could be:
> a) Once the behavior is implemented, it is hard to change. That is 
> applications would rely on it (and it becomes a defacto standard), and it 
> would be hard to move to the proper "text with streaming API" datatype.
> b) If we make «clob is text», then people might start using update/substring 
> APIs (which is the primary motivation for Clob) without realizing there’s 
> full value update behind the scenes. Currently, they can use 
> setString/getString for text, and it is crystal clear that the text is 
> updated fully on every update.

And if we treat clob as a synonym for text (just like the relationship between 
char and nchar), even when the user writes clob in DDL, pg_dump will output it 
as text.  That makes it a bit harder to use the output for other DBMSs.


Regards
Takayuki Tsunakawa



NOTIFY docs fixup - emit and deliver consistency

2020-09-29 Thread David G. Johnston
Hackers,

Over in [1] Greg got confused by some wording in our NOTIFY documentation.
The attached patch uses "emits" and "delivered" more consistently (in
lieu of "processed" in the complained of location).

[1]
https://www.postgresql.org/message-id/6EDB6A52-17F1-4DA9-B5B8-3BFFD5A576C8%40loblaw.ca

David J.


v1-notify-doc-fixup.patch
Description: Binary data


Re: [PATCH] We install pg_regress and isolationtester but not pg_isolation_regress

2020-09-29 Thread Craig Ringer
On Tue, 29 Sep 2020 at 22:09, Alvaro Herrera 
wrote:

> On 2018-Jun-01, Craig Ringer wrote:
>
> > On 28 May 2018 at 15:06, Craig Ringer  wrote:
> >
> > > Per topic, the Pg makefiles install pg_regress (for use by extensions) and
> > > they install the isolationtester, but they don't install
> > > pg_isolation_regress.
> > >
> > > We should install it too.
>
> I happened to come across this thread by accident, and I tend to agree
> that we need to install both isolationtester and pg_isolation_regress to
> the pgxs dirs, just like we do pg_regress.  I can't find that
> isolationtester is installed anywhere though; maybe that changed after
> you posted this.  Anyway, this version of the patch installs both.
>

Thanks.

I think rules were added to allow the isolation tester to be installed with
an explicit make -C src/test/isolation install, but not added to the
default "install" target.

I agree it should just be installed.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Parallel INSERT (INTO ... SELECT ...)

2020-09-29 Thread Greg Nancarrow
> >
> > I think you still need to work on the costing part; basically, if we
> > are parallelizing the whole insert then the plan is like below
> >
> > -> Gather
> >   -> Parallel Insert
> >   -> Parallel Seq Scan
> >
> > That means the tuples we are selecting via the scan are not sent back to
> > the gather node, so in cost_gather we need to see if it is for an
> > INSERT; in that case there is no row transferred through the parallel
> > queue, which means we need not pay any parallel tuple cost.
>
> I just looked into the parallel CTAS[1] patch for the same thing, and
> I can see in that patch it is being handled.
>
> [1] 
> https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com
>

Hi Dilip,

You're right, the costing for Parallel Insert is not finished; I'm still
working on it and haven't posted an updated patch yet.
As far as the cost_gather() method is concerned, for Parallel INSERT, it
can probably use the same costing approach as the CTAS patch, except in
the case of a specified RETURNING clause.

Regards,
Greg Nancarrow
Fujitsu Australia




Re: DROP relation IF EXISTS Docs and Tests - Bug Fix

2020-09-29 Thread David G. Johnston
On Tue, Sep 15, 2020 at 3:48 PM Alexander Korotkov 
wrote:

> Hi!
>
> I've skimmed through the thread and checked the patchset.  Everything
> looks good, except one paragraph, which doesn't look completely clear.
>
> +  
> +   This emulates the functionality provided by
> +but sets the created object's
> +   type
> definition
> +   to domain.
> +  
>
> As I get it states that CREATE DOMAIN somehow "emulates" CREATE TYPE.
> Could you please, rephrase it?  It looks confusing to me yet.
>

v5 attached, looking at this fresh and with some comments to consider.

I ended up just combining both patches into one.

I did away with the glossary changes altogether, and the invention of the
new term.  I ended up limiting "type's type" to just domain usage but did a
couple of additional tweaks that tried to treat domains as not being
actual types even though, at least in PostgreSQL, they are (at least as far
as DROP TYPE is concerned - and since I don't have any understanding of the
SQL Standard's decision to separate out create domain and create type I'll
just stick to the implementation in front of me.)

David J.


001-v5-drop-if-exists-docs-and-tests.patch
Description: Binary data


Re: Parallel Full Hash Join

2020-09-29 Thread Melanie Plageman
On Mon, Sep 21, 2020 at 8:34 PM Thomas Munro  wrote:

> On Tue, Sep 22, 2020 at 8:49 AM Melanie Plageman
>  wrote:
> > On Wed, Sep 11, 2019 at 11:23 PM Thomas Munro 
> wrote:
>
> I took it for a very quick spin and saw simple cases working nicely,
> but TPC-DS queries 51 and 97 (which contain full joins) couldn't be
> convinced to use it.  Hmm.
>

Thanks for taking a look, Thomas!

Both query 51 and query 97 have full outer joins of two CTEs, each of
which is an aggregate query.

During planning when constructing the joinrel and choosing paths, in
hash_inner_and_outer(), we don't consider parallel hash join paths
because the outerrel and innerrel do not have
partial_pathlists.

This code

  if (joinrel->consider_parallel &&
save_jointype != JOIN_UNIQUE_OUTER &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))

gates the code to generate partial paths for hash join.

My understanding of this is that if the inner and outerrel don't have
partial paths, then they can't be executed in parallel, so the join
could not be executed in parallel.

For the two TPC-DS queries, even if they use parallel aggs, the finalize
agg will have to be done by a single worker, so I don't think they could
be joined with a parallel hash join.

I added some logging inside the "if" statement and ran join_hash.sql in
regress to see what nodes were typically in the pathlist and partial
pathlist. All of them had basically just sequential scans as the outer
and inner rel paths. regress examples are definitely meant to be
minimal, so this probably wasn't the best place to look for examples of
more complex rels that can be joined with a parallel hash join.


>
> >> Some other notes on the patch:
>
> From a quick peek:
>
> +/*
> + * Upon arriving at the barrier, if this worker is not the last
> worker attached,
> + * detach from the barrier and return false. If this worker is the last
> worker,
> + * remain attached and advance the phase of the barrier, return true
> to indicate
> + * you are the last or "elected" worker who is still attached to the
> barrier.
> + * Another name I considered was BarrierUniqueify or BarrierSoloAssign
> + */
> +bool
> +BarrierDetachOrElect(Barrier *barrier)
>
> I tried to find some existing naming in writing about
> barriers/phasers, but nothing is jumping out at me.  I think a lot of
> this stuff comes from super computing where I guess "make all of the
> threads give up except one" isn't a primitive they'd be too excited
> about :-)
>
> BarrierArriveAndElectOrDetach()... gah, no.
>

You're right that Arrive should be in there.
So, I went with BarrierArriveAndDetachExceptLast()
It's specific, if not clever.
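
For reference, the semantics are roughly the following sketch, written
against the Barrier fields as I understand them from barrier.h; it is
illustrative, not necessarily the exact code in the patch:

    #include "postgres.h"
    #include "storage/barrier.h"

    /*
     * If other workers are still attached, detach and return false.
     * If we are the last worker attached, advance the phase, stay
     * attached, and return true.
     */
    bool
    BarrierArriveAndDetachExceptLast(Barrier *barrier)
    {
        SpinLockAcquire(&barrier->mutex);
        if (barrier->participants > 1)
        {
            --barrier->participants;
            SpinLockRelease(&barrier->mutex);
            return false;
        }
        Assert(barrier->participants == 1);
        ++barrier->phase;
        SpinLockRelease(&barrier->mutex);
        return true;
    }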


>
> +last = BarrierDetachOrElect(&batch->batch_barrier);
>
> It'd be nice to add some assertions after that, in the 'last' path,
> that there's only one participant and that the phase is as expected,
> just to make it even clearer to the reader, and a comment in the other
> path that we are no longer attached.
>

Assert and comment added to the single worker path.
The other path is just back to HJ_NEED_NEW_BATCH and workers will detach
there as before, so I'm not sure where we could add the comment about
the other workers detaching.


>
> +hjstate->hj_AllocatedBucketRange = 0;
> ...
> +pg_atomic_uint32 bucket;/* bucket allocator for unmatched inner
> scan */
> ...
> +//volatile int mybp = 0; while (mybp == 0)
>
> Some leftover fragments of the bucket-scan version and debugging stuff.
>

cleaned up (and rebased).

I also changed ExecScanHashTableForUnmatched() to scan HashMemoryChunks
in the hashtable instead of using the buckets to align parallel and
serial hash join code.

Originally, I had that code freeing the chunks of the hashtable after
finishing scanning them, however, I noticed this query from regress
failing:

select * from
(values (1, array[10,20]), (2, array[20,30])) as v1(v1x,v1ys)
left join (values (1, 10), (2, 20)) as v2(v2x,v2y) on v2x = v1x
left join unnest(v1ys) as u1(u1y) on u1y = v2y;

It is because the hash join gets rescanned and because there is only one
batch, ExecReScanHashJoin reuses the same hashtable.

 QUERY PLAN
-
 Nested Loop Left Join
   ->  Values Scan on "*VALUES*"
   ->  Hash Right Join
 Hash Cond: (u1.u1y = "*VALUES*_1".column2)
 Filter: ("*VALUES*_1".column1 = "*VALUES*".column1)
 ->  Function Scan on unnest u1
 ->  Hash
   ->  Values Scan on "*VALUES*_1"

I was freeing the hashtable as I scanned each chunk, which clearly
doesn't work for a single batch hash join which gets rescanned.

I don't see anything specific to parallel hash join in ExecReScanHashJoin(),
so, it seems like the same rules apply to parallel hash join. So, I will
have to remove the logic that frees the hash table after scanning each
chunk from the parallel function as well.

In addition, I s

Re: Why does PostgresNode.pm set such a low value of max_wal_senders?

2020-09-29 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Sep-29, Tom Lane wrote:
>> So I wonder why PostgresNode.pm is doing
>> print $conf "max_wal_senders = 5\n";
>> Considering that our default these days is 10 senders, and that a
>> walsender slot doesn't really cost much, this seems unduly cheapskate.
>> I propose raising this to 10.

> I suggest to remove that line.  max_wal_senders used to default to 0
> when PostgresNode was touched to have this line in commit 89ac7004dad;
> the global default was raised in f6d6d2920d2c.

Hm.  We could do so back to v10 where that came in, and there are no
src/test/subscription tests before v10, so that should be sufficient.
Sold.

regards, tom lane




Re: Why does PostgresNode.pm set such a low value of max_wal_senders?

2020-09-29 Thread Alvaro Herrera
On 2020-Sep-29, Tom Lane wrote:

> So I wonder why PostgresNode.pm is doing
> 
>   print $conf "max_wal_senders = 5\n";
> 
> Considering that our default these days is 10 senders, and that a
> walsender slot doesn't really cost much, this seems unduly cheapskate.
> I propose raising this to 10.

I suggest to remove that line.  max_wal_senders used to default to 0
when PostgresNode was touched to have this line in commit 89ac7004dad;
the global default was raised in f6d6d2920d2c.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Why does PostgresNode.pm set such a low value of max_wal_senders?

2020-09-29 Thread Tom Lane
I noticed this recent buildfarm failure:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2020-09-29%2018%3A45%3A17

which boils down to

error running SQL: 'psql::1: ERROR:  could not connect to the publisher: 
FATAL:  number of requested standby connections exceeds max_wal_senders 
(currently 5)'
while running 'psql -XAtq -d port=62411 host=/tmp/cmXKiWUDs9 dbname='postgres' 
-f - -v ON_ERROR_STOP=1' with sql 'ALTER SUBSCRIPTION sub2 REFRESH PUBLICATION' 
at 
/home/pgbf/buildroot/HEAD/pgsql.build/src/test/subscription/../../../src/test/perl/PostgresNode.pm
 line 1546.

Digging in the postmaster log shows that indeed we were at the limit
of 5 wal senders.  One was about to exit (else this test could never
succeed at all), but it had not done so fast enough to avoid this
failure.

Further digging in the buildfarm archives shows that "number of requested
standby connections exceeds max_wal_senders" seems rather common on our
slower buildfarm members, eg there are two such complaints in prairiedog's
latest successful HEAD build.  Apparently, most of the time this gets
masked by automatic restart of logrep workers; but when a test script
involves explicit execution of a replication command, it's going to notice
if that try fails to connect.

So I wonder why PostgresNode.pm is doing

print $conf "max_wal_senders = 5\n";

Considering that our default these days is 10 senders, and that a
walsender slot doesn't really cost much, this seems unduly cheapskate.
I propose raising this to 10.

There might be some value in the fact that this situation is exercising
the automatic-reconnection behavior, but if so I'd like to find a more
consistent way of testing that.

regards, tom lane




Re: BLOB / CLOB support in PostgreSQL

2020-09-29 Thread Vladimir Sitnikov
Andrew>It needs to contain a substantial implementation plan

Here's an implementation plan, again, quoted from the very same mail:

Vladimir>Of course both variations above fail to support streaming
Vladimir> (as in "need to process all the contents in order to get the last
character"), so it might be better to use
Vladimir>"prefix that specifies encoding + 'index block' (that specifies
offsets for each 1M characters) + encoded string".

It does describe the data structure.

Andrew>what APIs

I believe it does not matter much.
However, it might be similar to the existing LO APIs, except the indices
are in characters rather than bytes.

Andrew>protocol changes

None.

Andrew>and so on

Well, it looks like I had everything you mentioned in the very first email.

Vladimir


Re: Planner making bad choice in alternative subplan decision

2020-09-29 Thread David Rowley
On Wed, 30 Sep 2020 at 04:42, Tom Lane  wrote:
>
> David Rowley  writes:
> > On Tue, 29 Sep 2020 at 12:08, Tom Lane  wrote:
> >> The idea I'd had was to adjust make_subplan and cost_subplan to estimate
> >> EXIST cases on the basis of either 50% retrieval (same as ANY/ALL) or
> >> maybe full retrieval if you want to be pessimistic.  I've not had time
> >> to try it out though.
>
> > Here's a patch which does that.  There are no regression test failures.
>
> (1) IMO, make_subplan should be adjusted in the same way; or at least,
> if it isn't, the comments therein need to be changed to explain why
> it's okay for it to be out of sync with cost_subplan.

All the code seems to be doing is deciding on the tuple_fraction to
pass to the planner. Perhaps it's worth putting a note there to
mention why cost_subplan() does it differently, but that might be
better explained in cost_subplan().

> (2) Does this fix the plan choice problem you started with?

Yes

> > I'm trying to anticipate areas that this could cause a regression.  I
> > think generally a hashed subplan should win in about all cases where
> > we lookup all of the items in the table. The places where it could
> > cause regression are, where we have to load way more into the hash
> > table than we're going to lookup. Assuming roughly the same costs for
> > the subplan for hashing and the non-hashed subplan, then the crossover
> > point will be about 2 lookups (2 x 50%).  I do wonder if 50% is a bit
> > too big a number. We did ask the planner for a fast startup plan, so
> > there is perhaps some reasoning to make the cost multiplier somewhere
> > between 50% and 1 / nrows. I'm just not sure what that should be.
> > There's very little to go on cost-wise, or even heuristically. So
> > consider the patch still a discussion level patch.
>
> One way to get some data is to see what the threshold multiplier is
> to change the plan choice in an EXISTS that is currently planned
> wrongly.

It'll choose the hashed plan if we divide by just half the rows.

I've input the costs into the attached spreadsheet. Changing cell D2
is what I'm proposing to change. The patch effectively changes this to
=F2*0.5 which makes the cost of the non-hashed plan 850,000 (vs 195
for the hashed plan). Changing it to =F2/C2*2 (i.e divide by half the
rows) would cause the planner to choose the hashed plan.

David


subplan_costing.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Re: Yet another fast GiST build

2020-09-29 Thread Heikki Linnakangas

On 29/09/2020 21:04, Andrey M. Borodin wrote:
>> On 28 Sept 2020, at 13:12, Heikki Linnakangas wrote:
>>
>> I did some testing with your patch. It seems that the rightlinks
>> are still not always set. I didn't try to debug why.
> 
> Yes, there is a bug. Now it seems to me so obvious, yet it took some 
> time to understand that links were shifted by one extra jump. PFA 
> fixed rightlinks installation.

Ah, that was simple. I propose adding a comment on it:

--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -540,6 +540,19 @@ gist_indexsortbuild_pagestate_flush(GISTBuildState 
*state,

/* Re-initialize the page buffer for next page on this level. */
pagestate->page = palloc(BLCKSZ);
gistinitpage(pagestate->page, isleaf ? F_LEAF : 0);
+
+   /*
+* Set the right link to point to the previous page. This is just for
+* debugging purposes: GiST only follows the right link if a page is 
split
+* concurrently to a scan, and that cannot happen during index build.
+*
+* It's a bit counterintuitive that we set the right link on the new 
page
+* to point to the previous page, and not the other way round. But GiST
+* pages are not ordered like B-tree pages are, so as long as the
+* right-links form a chain through all the pages in the same level, the
+* order doesn't matter.
+*/
+   GistPageGetOpaque(pagestate->page)->rightlink = blkno;
 }


> BTW, one more small thing: we initialise page buffers with
> palloc() and palloc0(), while the first one is sufficient for
> gistinitpage().

Hmm. Only the first one, in gist_indexsortbuild(), but that needs to be 
palloc0, because it's used to write an all-zeros placeholder for the 
root page.



> Also I'm working on btree_gist opclasses and found out that the new
> sortsupport function is not mentioned in gistadjustmembers(). I think
> this can be fixed with the gist_btree patch.

Thanks!

> Your pageinspect patch seems very useful. What do you think, should we
> provide a way to find invalid tuples in GiST within
> gist_page_items()? At some point we will have to ask users to reindex
> GiSTs with invalid tuples.

You mean invalid tuples created by a crash on PostgreSQL version 9.0 or 
below, and pg_upgraded? I doubt there are any of those still around in 
the wild. We have to keep the code to detect them, though.


It would be nice to improve gist_page_items() to display more 
information about the items, although I wouldn't worry much about 
invalid tuples. The 'gevel' extension that Oleg mentioned upthread does 
more; it would be nice to incorporate that into pageinspect somehow.


- Heikki




Re: Dumping/restoring fails on inherited generated column

2020-09-29 Thread Tom Lane
Daniel Gustafsson  writes:
> On 29 Sep 2020, at 18:37, Tom Lane  wrote:
>> This does not cause child2.f1's attislocal property to become
>> true.  Maybe it should have, but it's probably too late for
>> that; at least, pg_dump couldn't assume it's true in older
>> servers.  

> Do you recall the rationale for it not being set to true?  I didn't spot
> anything in the commit history. Intuitively it seems a tad odd.

I'd bet the explanation is mostly that it didn't occur to anyone
that SET DEFAULT should do that.  I'm not really proposing that it
should either.  If we were to make any catalog changes in response
to this, what I'd vote for is to add an "inherited" flag to
pg_attrdef.  (I'm not quite sure if a bool would be sufficient,
or if we'd need to go to the same extent as pg_attribute does,
and have a bool plus an inheritance count.)

Of course, that line of thought does not lead to anything
back-patchable.  But pg_dump would have to be prepared to cope
with the situation in older servers in any case.

regards, tom lane




Re: BLOB / CLOB support in PostgreSQL

2020-09-29 Thread Andrew Dunstan


On 9/29/20 3:48 PM, Vladimir Sitnikov wrote:
> Andrew>You and I clearly have a different idea from what constitutes a
> concrete
> Andrew>proposal. This is hardly the ghost of a proposal.
>
> Can you please clarify what is a proposal from your point of view?
> Is it documented?
>
> I think I have read the relevant TODO items:
> https://wiki.postgresql.org/wiki/Developer_FAQ#What_do_I_do_after_choosing_an_item_to_work_on.3F
>
> Wiki clearly suggests posting a mail to pgsql-hackers before starting
> work.
>
>

A concrete proposal needs to be more than "a feature that does X". It
needs to contain a substantial implementation plan. What structures will
be affected, what APIs, what protocol changes and so on. You don't need
to have the code for these things but you do need a description of
what's intended by way of implementation that is detailed enough for the
community to critique. If you don't, the danger is that you will spend a
lot of time coding and then present it to the community and they will
say "Oh, the design is all wrong." That's happened to a number of people
in the past, including some quite high profile people, and it's very sad
and frustrating for everyone when it happens.

A feature for streaming large data types could well be very valuable,
but right at the moment I at least don't have any idea what such a thing
could look like (and as you might imagine I'm quite interested).


cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Dumping/restoring fails on inherited generated column

2020-09-29 Thread Daniel Gustafsson
> On 29 Sep 2020, at 18:37, Tom Lane  wrote:
> 
> [ Pulling Daniel into this older thread seems like the cleanest way to
>  unify the two threads ]
> 
> Masahiko Sawada  writes:
>> If we have ALTER TABLE ONLY / DROP EXPRESSION update the attlocal
>> column of children to true to fix the issue you raised, my proposed
>> patch is not necessary. OTOH if we fix it by prohibiting the command,
>> I guess we need both patches to fix the issues.
> 
> Right, Peter already mentioned that we need a further pg_dump fix on
> top of prohibiting the ONLY ... DROP EXPRESSION case.
> 
> Bug #16622 [1] is another report of this same issue, and in that thread,
> Daniel argues that the right fix is just
> 
> + /*
> +  * Skip if the column isn't local to the table's definition as the 
> attrdef
> +  * will be printed with the inheritance parent table definition
> +  */
> + if (!tbinfo->attislocal[adnum - 1])
> + return;
> 
> without the attgenerated clause that Masahiko-san proposes.
> I think however that that's wrong.  It is possible to have
> a non-attislocal column that has its own default, because
> you can do

Ah, that's the sequence I didn't think of.  I agree with your assessment of
this being wrong. Thanks!

> This does not cause child2.f1's attislocal property to become
> true.  Maybe it should have, but it's probably too late for
> that; at least, pg_dump couldn't assume it's true in older
> servers.  

Do you recall the rationale for it not being set to true?  I didn't spot
anything in the commit history. Intuitively it seems a tad odd.

> Therefore, since we can't tell whether the default
> is inherited or not, we'd better dump it.

Agreed.

> (I recall that pg_dump has a hack somewhere that checks for
> textual equality of CHECK conditions to avoid dumping redundant
> child copies.  Maybe we could do something similar here.)

Unless someone beats me to it I will take a stab at this to see what it would
look like.

cheers ./daniel



Re: BLOB / CLOB support in PostgreSQL

2020-09-29 Thread Vladimir Sitnikov
Andrew>You and I clearly have a different idea from what constitutes a
concrete
Andrew>proposal. This is hardly the ghost of a proposal.

Can you please clarify what is a proposal from your point of view?
Is it documented?

I think I have read the relevant TODO items:
https://wiki.postgresql.org/wiki/Developer_FAQ#What_do_I_do_after_choosing_an_item_to_work_on.3F

Wiki clearly suggests posting a mail to pgsql-hackers before starting work.

Vladimir


Re: BLOB / CLOB support in PostgreSQL

2020-09-29 Thread Andrew Dunstan


On 9/29/20 2:39 PM, Dave Cramer wrote:
>
>
> On Tue, 29 Sep 2020 at 14:33, Andrew Dunstan wrote:
>
>
> On 9/29/20 10:26 AM, Peter Eisentraut wrote:
> > On 2020-09-28 15:46, Vladimir Sitnikov wrote:
> >> The concerns to avoid "Clob maps to text" could be:
> >> a) Once the behavior is implemented, it is hard to change. That is
> >> applications would rely on it (and it becomes a defacto standard),
> >> and it would be hard to move to the proper "text with streaming API"
> >> datatype.
> >> b) If we make «clob is text», then people might start using
> >> update/substring APIs (which is the primary motivation for Clob)
> >> without realizing there’s full value update behind the scenes.
> >> Currently, they can use setString/getString for text, and it is
> >> crystal clear that the text is updated fully on every update.
> >
> > When we added TOAST, we made the explicit decision to not add a "LONG"
> > type but instead have the toasting mechanism transparent in all
> > variable-length types.  And that turned out to be a very successful
> > decision, because it allows this system to be used by all data types,
> > not only one or two hardcoded ones.  Therefore, I'm very strongly of
> > the opinion that if a streaming system of the sort you allude to were
> > added, it would also be added transparently into the TOAST system.
> >
> > The JDBC spec says
> >
> > """
> > An implementation of a Blob, Clob or NClob object may either be
> > locator based or result in the object being fully materialized on the
> > client.
> >
> > By default, a JDBC driver should implement the Blob, Clob and NClob
> > interfaces using the appropriate locator type. An application does not
> > deal directly with the locator types that are defined in SQL.
> > """
> >
> > (A "locator" in SQL is basically what you might call a streaming
> handle.)
> >
> > So yes, this encourages the implementation of locators.  But it also
> > specifies that if you don't have locators, you can implement this
> > using non-large-object types.
> >
> >
>
> So if I read this correctly what I have proposed is completely kosher
> according to the spec - it's the "fully materialized on the client"
> variant, just like the MySQL and MSSQL drivers.
>
>
> I haven't really looked at MySQL or MSSQL but do they implement the
> full CLOB API ?
> We would need to implement the full API.
>
> BTW, just because it adheres to the spec doesn't seem to hold water in
> the PostgreSQL project. Just sayin'
>
>

I take your point, but my remark was more in response to the apparent
suggestion that what I submitted was not according to spec.


There are two Clob methods I didn't implement, and one Blob method - the
set*Stream methods. I think they should be implementable, but they will
make the implementation somewhat more complex.


Anyway, at this stage let's take the discussion back to the github forums.


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: BLOB / CLOB support in PostgreSQL

2020-09-29 Thread Dave Cramer
On Tue, 29 Sep 2020 at 14:33, Andrew Dunstan 
wrote:

>
> On 9/29/20 10:26 AM, Peter Eisentraut wrote:
> > On 2020-09-28 15:46, Vladimir Sitnikov wrote:
> >> The concerns to avoid "Clob maps to text" could be:
> >> a) Once the behavior is implemented, it is hard to change. That is
> >> applications would rely on it (and it becomes a defacto standard),
> >> and it would be hard to move to the proper "text with streaming API"
> >> datatype.
> >> b) If we make «clob is text», then people might start using
> >> update/substring APIs (which is the primary motivation for Clob)
> >> without realizing there’s full value update behind the scenes.
> >> Currently, they can use setString/getString for text, and it is
> >> crystal clear that the text is updated fully on every update.
> >
> > When we added TOAST, we made the explicit decision to not add a "LONG"
> > type but instead have the toasting mechanism transparent in all
> > variable-length types.  And that turned out to be a very successful
> > decision, because it allows this system to be used by all data types,
> > not only one or two hardcoded ones.  Therefore, I'm very strongly of
> > the opinion that if a streaming system of the sort you allude to were
> > added, it would also be added transparently into the TOAST system.
> >
> > The JDBC spec says
> >
> > """
> > An implementation of a Blob, Clob or NClob object may either be
> > locator based or result in the object being fully materialized on the
> > client.
> >
> > By default, a JDBC driver should implement the Blob, Clob and NClob
> > interfaces using the appropriate locator type. An application does not
> > deal directly with the locator types that are defined in SQL.
> > """
> >
> > (A "locator" in SQL is basically what you might call a streaming handle.)
> >
> > So yes, this encourages the implementation of locators.  But it also
> > specifies that if you don't have locators, you can implement this
> > using non-large-object types.
> >
> >
>
> So if I read this correctly what I have proposed is completely kosher
> according to the spec - it's the "fully materialized on the client"
> variant, just like the MySQL and MSSQL drivers.
>
>
I haven't really looked at MySQL or MSSQL but do they implement the full
CLOB API ?
We would need to implement the full API.

BTW, just because it adheres to the spec doesn't seem to hold water in the
PostgreSQL project. Just sayin'

Dave


Re: BLOB / CLOB support in PostgreSQL

2020-09-29 Thread Andrew Dunstan


On 9/29/20 10:26 AM, Peter Eisentraut wrote:
> On 2020-09-28 15:46, Vladimir Sitnikov wrote:
>> The concerns to avoid "Clob maps to text" could be:
>> a) Once the behavior is implemented, it is hard to change. That is
>> applications would rely on it (and it becomes a defacto standard),
>> and it would be hard to move to the proper "text with streaming API"
>> datatype.
>> b) If we make «clob is text», then people might start using
>> update/substring APIs (which is the primary motivation for Clob)
>> without realizing there’s full value update behind the scenes.
>> Currently, they can use setString/getString for text, and it is
>> crystal clear that the text is updated fully on every update.
>
> When we added TOAST, we made the explicit decision to not add a "LONG"
> type but instead have the toasting mechanism transparent in all
> variable-length types.  And that turned out to be a very successful
> decision, because it allows this system to be used by all data types,
> not only one or two hardcoded ones.  Therefore, I'm very strongly of
> the opinion that if a streaming system of the sort you allude to were
> added, it would also be added transparently into the TOAST system.
>
> The JDBC spec says
>
> """
> An implementation of a Blob, Clob or NClob object may either be
> locator based or result in the object being fully materialized on the
> client.
>
> By default, a JDBC driver should implement the Blob, Clob and NClob
> interfaces using the appropriate locator type. An application does not
> deal directly with the locator types that are defined in SQL.
> """
>
> (A "locator" in SQL is basically what you might call a streaming handle.)
>
> So yes, this encourages the implementation of locators.  But it also
> specifies that if you don't have locators, you can implement this
> using non-large-object types.
>
>

So if I read this correctly what I have proposed is completely kosher
according to the spec - it's the "fully materialized on the client"
variant, just like the MySQL and MSSQL drivers.


cheers


andrew

-- 
Andrew Dunstan                 https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: BUG #16419: wrong parsing BC year in to_date() function

2020-09-29 Thread Tom Lane
I wrote:
> I think this is nuts.  The current behavior is obviously broken;
> we should just treat it as a bug and fix it, including back-patching.
> I do not think there is a compatibility problem of any significance.
> Who out there is going to have an application that is relying on the
> ability to insert BC dates in this way?

Concretely, I propose the attached.  This adjusts Dar Alathar-Yemen's
patch (it didn't do the right thing IMO for the combination of bc
and year < 0) and adds test cases and docs.

Oracle would have us throw an error for year zero, but our historical
behavior has been to read it as 1 BC.  That's not so obviously wrong
that I'd want to change it in the back branches.  Maybe it could be
done as a follow-up change in HEAD.

regards, tom lane

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 62dd738230..ec8451d1b9 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -7678,6 +7678,15 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
   
  
 
+ 
+  
+   In to_timestamp and to_date,
+   negative years are treated as signifying BC.  If you write both a
+   negative year and an explicit BC field, you get AD
+   again.  An input of year zero is treated as 1 BC.
+  
+ 
+
  
   
In to_timestamp and to_date,
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index b91ff7bb80..3bb01cdb65 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -4569,8 +4569,11 @@ do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std,
 		{
 			/* If a 4-digit year is provided, we use that and ignore CC. */
 			tm->tm_year = tmfc.year;
-			if (tmfc.bc && tm->tm_year > 0)
-tm->tm_year = -(tm->tm_year - 1);
+			if (tmfc.bc)
+tm->tm_year = -tm->tm_year;
+			/* correct for our representation of BC years */
+			if (tm->tm_year < 0)
+tm->tm_year++;
 		}
 		fmask |= DTK_M(YEAR);
 	}
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index c8c33a0fc0..7f82dcfbfe 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2916,6 +2916,45 @@ SELECT to_date('2458872', 'J');
  01-23-2020
 (1 row)
 
+--
+-- Check handling of BC dates
+--
+SELECT to_date('44-02-01 BC','-MM-DD BC');
+to_date
+---
+ 02-01-0044 BC
+(1 row)
+
+SELECT to_date('-44-02-01','-MM-DD');
+to_date
+---
+ 02-01-0044 BC
+(1 row)
+
+SELECT to_date('-44-02-01 BC','-MM-DD BC');
+  to_date   
+
+ 02-01-0044
+(1 row)
+
+SELECT to_timestamp('44-02-01 11:12:13 BC','-MM-DD HH24:MI:SS BC');
+  to_timestamp   
+-
+ Fri Feb 01 11:12:13 0044 PST BC
+(1 row)
+
+SELECT to_timestamp('-44-02-01 11:12:13','-MM-DD HH24:MI:SS');
+  to_timestamp   
+-
+ Fri Feb 01 11:12:13 0044 PST BC
+(1 row)
+
+SELECT to_timestamp('-44-02-01 11:12:13 BC','-MM-DD HH24:MI:SS BC');
+ to_timestamp 
+--
+ Mon Feb 01 11:12:13 0044 PST
+(1 row)
+
 --
 -- Check handling of multiple spaces in format and/or input
 --
@@ -3183,6 +3222,12 @@ SELECT to_date('2016 366', ' DDD');  -- ok
 
 SELECT to_date('2016 367', ' DDD');
 ERROR:  date/time field value out of range: "2016 367"
+SELECT to_date('-02-01','-MM-DD');  -- allowed, though it shouldn't be
+to_date
+---
+ 02-01-0001 BC
+(1 row)
+
 --
 -- Check behavior with SQL-style fixed-GMT-offset time zone (cf bug #8572)
 --
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index c464e6766c..fed21a53c8 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -426,6 +426,17 @@ SELECT to_date('1 4 1902', 'Q MM ');  -- Q is ignored
 SELECT to_date('3 4 21 01', 'W MM CC YY');
 SELECT to_date('2458872', 'J');
 
+--
+-- Check handling of BC dates
+--
+
+SELECT to_date('44-02-01 BC','-MM-DD BC');
+SELECT to_date('-44-02-01','-MM-DD');
+SELECT to_date('-44-02-01 BC','-MM-DD BC');
+SELECT to_timestamp('44-02-01 11:12:13 BC','-MM-DD HH24:MI:SS BC');
+SELECT to_timestamp('-44-02-01 11:12:13','-MM-DD HH24:MI:SS');
+SELECT to_timestamp('-44-02-01 11:12:13 BC','-MM-DD HH24:MI:SS BC');
+
 --
 -- Check handling of multiple spaces in format and/or input
 --
@@ -511,6 +522,7 @@ SELECT to_date('2015 366', ' DDD');
 SELECT to_date('2016 365', ' DDD');  -- ok
 SELECT to_date('2016 366', ' DDD');  -- ok
 SELECT to_date('2016 367', ' DDD');
+SELECT to_date('-02-01','-MM-DD');  -- allowed, though it shouldn't be
 
 --
 -- Check behavior with SQL-style fixed-GMT-offset time zone (cf bug #8572)


Re: Yet another fast GiST build

2020-09-29 Thread Andrey M. Borodin


> 28 Sep 2020, at 13:12, Heikki Linnakangas  wrote:
> 
> On 21/09/2020 17:19, Andrey M. Borodin wrote:
>>> 21 Sep 2020, at 18:29, Andrey M. Borodin  wrote:
>>> 
>>> It was a conscious decision with incorrect motivation. I was thinking that 
>>> it will help to reduce number of "false positive" inspecting right pages. 
>>> But now I see that:
>>> 1. There should be no such "false positives" that we can avoid
>>> 2. Valid rightlinks could help to do amcheck verification in future
>> Well, point number 2 here is invalid. There exist one leaf page p, so that 
>> if we start traversing rightlink from p we will reach all leaf pages. But we 
>> practically have no means to find this page. This makes rightlinks not very 
>> helpful in amcheck for GiST.
> 
> Well, if you store all the right links in a hash table or something, you can 
> "connect the dots" after you have scanned all the pages to see that the chain 
> is unbroken. Probably would not be worth the trouble, since the rightlinks 
> are not actually needed after concurrent scans have completed.
> 
>> But for consistency I think it worth to install them.
> 
> I agree. I did some testing with your patch. It seems that the rightlinks are 
> still not always set. I didn't try to debug why.
> 
> I wrote a couple of 'pageinspect' function to inspect GiST pages for this. 
> See attached. I then created a test table and index like this:
> 
> create table points (p point);
> insert into points select point(x,y) from generate_series(-2000, 2000) x, 
> generate_series(-2000, 2000) y;
> create index points_idx on points using gist (p);
> 
> And this is what the root page looks like:
> 
> postgres=# select * from gist_page_items(get_raw_page('points_idx', 0));
> itemoffset |     ctid      | itemlen
> ------------+---------------+---------
>           1 | (27891,65535) |      40
>           2 | (55614,65535) |      40
>           3 | (83337,65535) |      40
>           4 | (97019,65535) |      40
> (4 rows)
> 
> And the right links on the next level:
> 
> postgres=# select * from (VALUES (27891), (55614), (83337), (97019)) b 
> (blkno), lateral gist_page_opaque_info(get_raw_page('points_idx', blkno));
> blkno | lsn | nsn | rightlink  | flags
> -------+-----+-----+------------+-------
> 27891 | 0/1 | 0/0 | 4294967295 | {}
> 55614 | 0/1 | 0/0 | 4294967295 | {}
> 83337 | 0/1 | 0/0 |      27891 | {}
> 97019 | 0/1 | 0/0 |      55614 | {}
> (4 rows)
> 
> I expected there to be only one page with invalid right link, but there are 
> two.

Yes, there is a bug. It seems so obvious to me now, yet it took some time to
understand that the links were shifted by one extra jump. PFA a fixed
rightlinks installation.

BTW, one more small thing: we initialise page buffers with palloc() and
palloc0(), while the former is sufficient for gistinitpage().
Also, I'm working on btree_gist opclasses and found out that the new
sortsupport function is not mentioned in gistadjustmembers(). I think this
can be fixed with the gist_btree patch.

Your pageinspect patch seems very useful. What do you think - should we provide
a way to find invalid tuples in GiST within gist_page_items()? At some point we
will have to ask users to reindex GiSTs with invalid tuples.


> 28 Sep 2020, at 11:53, Heikki Linnakangas  wrote:
> 
> On 21/09/2020 16:29, Andrey M. Borodin wrote:
>> But thing that bothers me now: when we vacuum leaf page, we bump it's
>> NSN. But we do not bump internal page LSN. Does this means we will
>> follow rightlinks after vacuum? It seems superflous.
> 
> Sorry, I did not understand what you said above. Vacuum doesn't update any 
> NSNs, only LSNs. Can you elaborate?
I had misunderstood the difference between NSN and LSN. Seems like everything
is fine.

> 
>> And btw we did not adjust internal page tuples after vacuum...
> What do you mean by that?

When we delete rows from the table, the internal tuples in GiST stay wide.


Thanks!

Best regards, Andrey Borodin.



install_rightlinks_v2.diff
Description: Binary data


Re: BUG #16419: wrong parsing BC year in to_date() function

2020-09-29 Thread Tom Lane
Peter Eisentraut  writes:
> Adding support for negative years in make_timestamp seems pretty 
> straightforward; see attached patch.

In hopes of moving things along, I pushed that, along with documentation
additions.  I couldn't quite convince myself that it was a bug fix
though, so no back-patch.  (I don't think we really need any doc
changes about it in the back branches, either.)
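
For anyone wanting to sanity-check the committed behavior, something along
these lines should now accept a negative year as meaning BC (output not
reproduced here):

SELECT make_timestamp(-44, 3, 15, 12, 0, 0);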

regards, tom lane




Re: BUG #16419: wrong parsing BC year in to_date() function

2020-09-29 Thread Tom Lane
Bruce Momjian  writes:
> On Fri, Sep  4, 2020 at 12:45:36PM -0700, David G. Johnston wrote:
>>> Because we already have the to_date/make_date inconsistency, and the -1
>>> to -2 BC mapping is confusing, and doesn't match Oracle, I think the
>>> clean solution is to change PG 14 to treat -1 as 1 BC, and document the
>>> incompatibility in the release notes.

>> I agree that someone else should write another patch to fix the behavior for
>> v14.  Still suggest committing the proposed patch to master and all supported
>> versions to document the existing behavior correctly.  The fix patch can work
>> from that.

> I think we need to apply the patches to all branches at the same time. 
> I am not sure we want to document a behavior we know will change in PG
> 14.

I think this is nuts.  The current behavior is obviously broken;
we should just treat it as a bug and fix it, including back-patching.
I do not think there is a compatibility problem of any significance.
Who out there is going to have an application that is relying on the
ability to insert BC dates in this way?

regards, tom lane




Re: Dumping/restoring fails on inherited generated column

2020-09-29 Thread Tom Lane
I wrote:
> The situation is different for GENERATED columns, since we disallow
> a child having a different GENERATED property than the parent.

BTW, that alleged prohibition is pretty damn leaky:

d3=# create table pp1 (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
CREATE TABLE
d3=# create table cc1 (a int, b int GENERATED ALWAYS AS (a * 3) STORED);
CREATE TABLE
d3=# alter table cc1 inherit pp1;
ALTER TABLE

Maybe the *real* fix here is to give up on this idea that they
can't be different?

regards, tom lane




some pointless HeapTupleHeaderIndicatesMovedPartitions calls

2020-09-29 Thread Alvaro Herrera
Hello

Pavan Deolasee recently noted that a few of the
HeapTupleHeaderIndicatesMovedPartitions calls added by commit
5db6df0c0117 are useless, since they are done after comparing t_self
with t_ctid.  That's because t_self can never be set to the magical
values that indicate that the tuple moved partition.  If the first
test fails (so we know t_self equals t_ctid), necessarily the second
test will also fail.

So these checks can be removed and no harm is done.

-- 
Álvaro Herrera                39°49'30"S 73°17'W
>From 953eceb701ca35c5a3c731c4318ed1465f205811 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 29 Sep 2020 10:47:45 -0300
Subject: [PATCH] Remove pointless HeapTupleHeaderIndicatesMovedPartitions
 calls

---
 src/backend/access/heap/heapam.c| 12 
 src/backend/access/heap/heapam_visibility.c |  9 +++--
 2 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1585861a02..868ff13453 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2618,8 +2618,7 @@ l1:
 			HEAP_XMAX_IS_LOCKED_ONLY(tp.t_data->t_infomask) ||
 			HeapTupleHeaderIsOnlyLocked(tp.t_data))
 			result = TM_Ok;
-		else if (!ItemPointerEquals(&tp.t_self, &tp.t_data->t_ctid) ||
- HeapTupleHeaderIndicatesMovedPartitions(tp.t_data))
+		else if (!ItemPointerEquals(&tp.t_self, &tp.t_data->t_ctid))
 			result = TM_Updated;
 		else
 			result = TM_Deleted;
@@ -3248,8 +3247,7 @@ l2:
 
 		if (can_continue)
 			result = TM_Ok;
-		else if (!ItemPointerEquals(&oldtup.t_self, &oldtup.t_data->t_ctid) ||
- HeapTupleHeaderIndicatesMovedPartitions(oldtup.t_data))
+		else if (!ItemPointerEquals(&oldtup.t_self, &oldtup.t_data->t_ctid))
 			result = TM_Updated;
 		else
 			result = TM_Deleted;
@@ -4485,8 +4483,7 @@ l3:
 			HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_data->t_infomask) ||
 			HeapTupleHeaderIsOnlyLocked(tuple->t_data))
 			result = TM_Ok;
-		else if (!ItemPointerEquals(&tuple->t_self, &tuple->t_data->t_ctid) ||
- HeapTupleHeaderIndicatesMovedPartitions(tuple->t_data))
+		else if (!ItemPointerEquals(&tuple->t_self, &tuple->t_data->t_ctid))
 			result = TM_Updated;
 		else
 			result = TM_Deleted;
@@ -5059,8 +5056,7 @@ test_lockmode_for_conflict(MultiXactStatus status, TransactionId xid,
 LOCKMODE_from_mxstatus(wantedstatus)))
 		{
 			/* bummer */
-			if (!ItemPointerEquals(&tup->t_self, &tup->t_data->t_ctid) ||
-HeapTupleHeaderIndicatesMovedPartitions(tup->t_data))
+			if (!ItemPointerEquals(&tup->t_self, &tup->t_data->t_ctid))
 return TM_Updated;
 			else
 return TM_Deleted;
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 80bd494076..cab6a48a5d 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -607,8 +607,7 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 	{
 		if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
 			return TM_Ok;
-		if (!ItemPointerEquals(&htup->t_self, &tuple->t_ctid) ||
-			HeapTupleHeaderIndicatesMovedPartitions(tuple))
+		if (!ItemPointerEquals(&htup->t_self, &tuple->t_ctid))
 			return TM_Updated;	/* updated by other */
 		else
 			return TM_Deleted;	/* deleted by other */
@@ -653,8 +652,7 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 
 		if (TransactionIdDidCommit(xmax))
 		{
-			if (!ItemPointerEquals(&htup->t_self, &tuple->t_ctid) ||
-HeapTupleHeaderIndicatesMovedPartitions(tuple))
+			if (!ItemPointerEquals(&htup->t_self, &tuple->t_ctid))
 return TM_Updated;
 			else
 return TM_Deleted;
@@ -714,8 +712,7 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 
 	SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
 HeapTupleHeaderGetRawXmax(tuple));
-	if (!ItemPointerEquals(&htup->t_self, &tuple->t_ctid) ||
-		HeapTupleHeaderIndicatesMovedPartitions(tuple))
+	if (!ItemPointerEquals(&htup->t_self, &tuple->t_ctid))
 		return TM_Updated;		/* updated by other */
 	else
 		return TM_Deleted;		/* deleted by other */
-- 
2.20.1



Re: calling procedures is slow and consumes extra much memory against calling function

2020-09-29 Thread Pavel Stehule
út 29. 9. 2020 v 17:20 odesílatel Tom Lane  napsal:

> Pavel Stehule  writes:
> > I agree with these conclusions.  I'll try to look if I can do #2 patch
> > better for pg14. Probably it can fix more issues related to CALL
> statement,
> > and I agree so this should not be backapatched.
>
> I've pushed this and marked the CF entry committed.  Please start a
> new thread and new CF entry whenever you have a more ambitious patch.
>

Thank you

Pavel


> regards, tom lane
>


Re: Dumping/restoring fails on inherited generated column

2020-09-29 Thread Tom Lane
[ Pulling Daniel into this older thread seems like the cleanest way to
  unify the two threads ]

Masahiko Sawada  writes:
> If we have ALTER TABLE ONLY / DROP EXPRESSION update the attlocal
> column of children to true to fix the issue you raised, my proposed
> patch is not necessary. OTOH if we fix it by prohibiting the command,
> I guess we need both patches to fix the issues.

Right, Peter already mentioned that we need a further pg_dump fix on
top of prohibiting the ONLY ... DROP EXPRESSION case.

Bug #16622 [1] is another report of this same issue, and in that thread,
Daniel argues that the right fix is just

+   /*
+* Skip if the column isn't local to the table's definition as the 
attrdef
+* will be printed with the inheritance parent table definition
+*/
+   if (!tbinfo->attislocal[adnum - 1])
+   return;

without the attgenerated clause that Masahiko-san proposes.
I think however that that's wrong.  It is possible to have
a non-attislocal column that has its own default, because
you can do

d3=# create table parent (f1 int default 2);
CREATE TABLE
d3=# create table child (f1 int default 3) inherits(parent);
NOTICE:  merging column "f1" with inherited definition
CREATE TABLE
d3=# create table child2() inherits(parent);
CREATE TABLE
d3=# alter table child2 alter column f1 set default 42;
ALTER TABLE

This does not cause child2.f1's attislocal property to become
true.  Maybe it should have, but it's probably too late for
that; at least, pg_dump couldn't assume it's true in older
servers.  Therefore, since we can't tell whether the default
is inherited or not, we'd better dump it.

(I recall that pg_dump has a hack somewhere that checks for
textual equality of CHECK conditions to avoid dumping redundant
child copies.  Maybe we could do something similar here.)

The situation is different for GENERATED columns, since we disallow
a child having a different GENERATED property than the parent.
However, I think Masahiko-san's patch is not quite right either,
because a column can be both inherited and local.  An example is

d3=# create table pgen (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
CREATE TABLE
d3=# create table cgen1 (b int) inherits (pgen);
NOTICE:  moving and merging column "b" with inherited definition
DETAIL:  User-specified column moved to the position of the inherited column.
CREATE TABLE
d3=# select attname, attislocal, attinhcount from pg_attribute where attrelid = 
'cgen1'::regclass and attnum>0;
 attname | attislocal | attinhcount 
---------+------------+-------------
 a       | f          |           1
 b       | t          |           1
(2 rows)

So it appears to me that a more correct coding is

 /*
  * Do not print a GENERATED default for an inherited column; it will
  * be inherited from the parent, and the backend won't accept a
  * command to set it separately.
  */
 if (tbinfo->attinhcount[adnum - 1] > 0 && tbinfo->attgenerated[adnum - 1])
 return;

Unfortunately this has still got a problem: it will mishandle the case of
a child column that is GENERATED while its parent is not.  Peter opined
way upthread that we should not allow that, but according to my testing
we do.

This'd all be a lot cleaner if defaults were marked as to whether they
were inherited or locally generated.  Maybe we ought to work on that.

In the meantime, maybe the best bet is for pg_dump to try to detect
whether a default is identical to a parent default, and not dump it
if so.  That would fix both the GENERATED case where we must not
dump it, and the non-GENERATED case where it's merely inefficient
to do so.
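
Just to illustrate the shape of such a check at the SQL level (a sketch of
the idea only, not what pg_dump would actually issue; it reuses the cgen1
example from above):

-- compare a child column's default expression with the corresponding
-- parent column's default, to spot defaults that are merely inherited
SELECT ca.attname,
       pg_get_expr(cd.adbin, cd.adrelid) AS child_default,
       pg_get_expr(pd.adbin, pd.adrelid) AS parent_default
FROM pg_attrdef cd
JOIN pg_attribute ca ON ca.attrelid = cd.adrelid AND ca.attnum = cd.adnum
JOIN pg_inherits i ON i.inhrelid = cd.adrelid
JOIN pg_attribute pa ON pa.attrelid = i.inhparent AND pa.attname = ca.attname
LEFT JOIN pg_attrdef pd ON pd.adrelid = pa.attrelid AND pd.adnum = pa.attnum
WHERE cd.adrelid = 'cgen1'::regclass;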

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/16622-779a79851b4e2491%40postgresql.org




Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away

2020-09-29 Thread Fujii Masao




On 2020/09/30 0:50, Bharath Rupireddy wrote:

Thanks for the comments.

On Tue, Sep 29, 2020 at 7:30 PM Fujii Masao  wrote:


+1 to add debug3 message there. But this message doesn't seem to
match with what the error actually happened. What about something like
"could not start remote transaction on connection %p", instead?



Looks better. Changed.



Also maybe it's better to append PQerrorMessage(entry->conn)?



Added. Now the log looks like [1].



+-- Generate a connection to remote. Local backend will cache it.
+SELECT * FROM ft1 LIMIT 1;

The result of this query would not be stable. Probably you need to,
for example, add ORDER BY or replace * with 1, etc.



Changed to SELECT 1 FROM ft1 LIMIT 1;



+-- Retrieve pid of remote backend with application name fdw_retry_check
+-- and kill it intentionally here. Note that, local backend still has
+-- the remote connection/backend info in it's cache.
+SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE
+backend_type = 'client backend' AND application_name = 'fdw_retry_check';

Isn't this test fragile because there is no guarantee that the target backend
has exited just after calling pg_terminate_backend()?



I think this is okay, because pg_terminate_backend() sends SIGTERM to
the backend, and upon receiving SIGTERM the signal handler die() will
be called. Since no query is being executed on the backend by the time
SIGTERM is received, it will exit immediately. Thoughts?


Yeah, basically you're right. But that backend *can* still be running
when the subsequent test query starts. I'm wondering if wait_pid()
(please see regress.c and sql/dblink.sql) should be used to ensure that
the target backend has disappeared.
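
If reusing wait_pid() turns out to be awkward for a contrib test, a plain
polling loop would be another option. Just a sketch, not part of the posted
patch:

-- wait until the terminated remote backend is really gone, so that the
-- next query deterministically sees the broken cached connection
DO $$
BEGIN
  WHILE EXISTS (SELECT 1 FROM pg_stat_activity
                WHERE backend_type = 'client backend'
                  AND application_name = 'fdw_retry_check') LOOP
    PERFORM pg_sleep(0.1);
  END LOOP;
END $$;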

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: New statistics for tuning WAL buffer size

2020-09-29 Thread Fujii Masao




On 2020/09/29 11:51, Masahiro Ikeda wrote:

On 2020-09-29 11:43, Amit Kapila wrote:

On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda  wrote:


On 2020-09-28 12:43, Amit Kapila wrote:
> On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi
>  wrote:
>>
>> At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila
>>  wrote in
>> > One other thing that occurred to me today is can't we keep this as
>> > part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
>> > reset it. It seems to me this is a cluster-wide stats and somewhat
>> > similar to some of the other stats we maintain there.
>>
>> I like that direction, but PgStat_GlobalStats is actually
>> PgStat_BgWriterStats and cleard by a RESET_BGWRITER message.
>>
>
> Yeah, I think if we want to pursue this direction then we probably
> need to have a separate message to set/reset WAL-related stuff. I
> guess we probably need to have a separate reset timestamp for WAL. I
> think the difference would be that we can have one structure to refer
> to global_stats instead of referring to multiple structures and we
> don't need to issue separate read/write calls but OTOH I don't see
> many disadvantages of the current approach as well.

IIUC, if we keep wal stats as part of PgStat_GlobalStats,
don't we need to add PgStat_ArchiverStats and PgStat_SLRUStats
to PgStat_GlobalStats too?



I have given the idea for wal_stats because there is just one counter
in that. I think you can just try to evaluate the merits of each
approach and choose whichever you feel is good. This is just a
suggestion, if you don't like it feel free to proceed with the current
approach.


Thanks for your suggestion.
I understood that the point is that WAL-related stats have just one counter now.

Since we may add some WAL-related stats like pgWalUsage.(bytes, records, fpi),
I think that the current approach is good.


+1

I marked this patch as ready for committer.
Barring any objection, I will commit the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away

2020-09-29 Thread Bharath Rupireddy
Thanks for the comments.

On Tue, Sep 29, 2020 at 7:30 PM Fujii Masao  wrote:
>
> +1 to add debug3 message there. But this message doesn't seem to
> match with what the error actually happened. What about something like
> "could not start remote transaction on connection %p", instead?
>

Looks better. Changed.

>
> Also maybe it's better to append PQerrorMessage(entry->conn)?
>

Added. Now the log looks like [1].

>
> +-- Generate a connection to remote. Local backend will cache it.
> +SELECT * FROM ft1 LIMIT 1;
>
> The result of this query would not be stable. Probably you need to,
> for example, add ORDER BY or replace * with 1, etc.
>

Changed to SELECT 1 FROM ft1 LIMIT 1;

>
> +-- Retrieve pid of remote backend with application name fdw_retry_check
> +-- and kill it intentionally here. Note that, local backend still has
> +-- the remote connection/backend info in it's cache.
> +SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE
> +backend_type = 'client backend' AND application_name = 'fdw_retry_check';
>
> Isn't this test fragile because there is no guarantee that the target backend
> has exited just after calling pg_terminate_backend()?
>

I think this is okay, because pg_terminate_backend() sends SIGTERM to
the backend, and upon receiving SIGTERM the signal handler die() will
be called. Since no query is being executed on the backend by the time
SIGTERM is received, it will exit immediately. Thoughts?

pqsignal(SIGTERM, die); /* cancel current query and exit */

And also, pg_terminate_backend() returns true in case the backend is
killed successfully, otherwise it returns false. PG_RETURN_BOOL(r
== SIGNAL_BACKEND_SUCCESS);

Attaching v6 patch, please review it further.

[1]
DEBUG:  starting remote transaction on connection 0x55cd393a66e0
DEBUG:  could not start remote transaction on connection 0x55cd393a66e0
DETAIL:  server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
DEBUG:  closing connection 0x55cd393a66e0 to reestablish a new one
DEBUG:  new postgres_fdw connection 0x55cd393a66e0 for server
"foreign_server" (user mapping oid 16398, userid 10)
DEBUG:  starting remote transaction on connection 0x55cd393a66e0
DEBUG:  closing remote transaction on connection 0x55cd393a66e0


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v6-Retry-Cached-Remote-Connections-For-postgres_fdw.patch
Description: Binary data


Re: Planner making bad choice in alternative subplan decision

2020-09-29 Thread Tom Lane
David Rowley  writes:
> On Tue, 29 Sep 2020 at 12:08, Tom Lane  wrote:
>> The idea I'd had was to adjust make_subplan and cost_subplan to estimate
>> EXIST cases on the basis of either 50% retrieval (same as ANY/ALL) or
>> maybe full retrieval if you want to be pessimistic.  I've not had time
>> to try it out though.

> Here's a patch which does that.  There are no regression test failures.

(1) IMO, make_subplan should be adjusted in the same way; or at least,
if it isn't, the comments therein need to be changed to explain why
it's okay for it to be out of sync with cost_subplan.

(2) Does this fix the plan choice problem you started with?

> I'm trying to anticipate areas that this could cause a regression.  I
> think generally a hashed subplan should win in about all cases where
> we lookup all of the items in the table. The places where it could
> cause regression are, where we have to load way more into the hash
> table than we're going to lookup. Assuming roughly the same costs for
> the subplan for hashing and the non-hashed subplan, then the crossover
> point will be about 2 lookups (2 x 50%).  I do wonder if 50% is a bit
> too big a number. We did ask the planner for a fast startup plan, so
> there is perhaps some reasoning to make the cost multiplier somewhere
> between 50% and 1 / nrows. I'm just not sure what that should be.
> There's very little to go on cost-wise, or even heuristically. So
> consider the patch still a discussion level patch.

One way to get some data is to see what the threshold multiplier is
to change the plan choice in an EXISTS that is currently planned
wrongly.

regards, tom lane




Re: calling procedures is slow and consumes extra much memory against calling function

2020-09-29 Thread Tom Lane
Pavel Stehule  writes:
> I agree with these conclusions.  I'll try to look if I can do #2 patch
> better for pg14. Probably it can fix more issues related to CALL statement,
> and I agree so this should not be backapatched.

I've pushed this and marked the CF entry committed.  Please start a
new thread and new CF entry whenever you have a more ambitious patch.

regards, tom lane




Re: Parallel INSERT (INTO ... SELECT ...)

2020-09-29 Thread Dilip Kumar
On Tue, Sep 29, 2020 at 8:27 PM Dilip Kumar  wrote:
>
> On Mon, Sep 28, 2020 at 8:45 AM Greg Nancarrow  wrote:
> >
> > On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy
> >  wrote:
> >
> > > I further checked on full txn id and command id. Yes, these are
> > > getting passed to workers  via InitializeParallelDSM() ->
> > > SerializeTransactionState(). I tried to summarize what we need to do
> > > in case of parallel inserts in general i.e. parallel COPY, parallel
> > > inserts in INSERT INTO and parallel inserts in CTAS.
> > >
> > > In the leader:
> > > GetCurrentFullTransactionId()
> > > GetCurrentCommandId(true)
> > > EnterParallelMode();
> > > InitializeParallelDSM() --> calls SerializeTransactionState()
> > > (both full txn id and command id are serialized into parallel DSM)
> > >
> > > In the workers:
> > > ParallelWorkerMain() -->  calls StartParallelWorkerTransaction() (both
> > > full txn id and command id are restored into workers'
> > > CurrentTransactionState->fullTransactionId and currentCommandId)
> > > If the parallel workers are meant for insertions, then we need to set
> > > currentCommandIdUsed = true; Maybe we can lift the assert in
> > > GetCurrentCommandId(), if we don't want to touch that function, then
> > > we can have a new function GetCurrentCommandidInWorker() whose
> > > functionality will be same as GetCurrentCommandId() without the
> > > Assert(!IsParallelWorker());.
> > >
> > > Am I missing something?
> > >
> > > If the above points are true, we might have to update the parallel
> > > copy patch set, test the use cases and post separately in the parallel
> > > copy thread in coming days.
> > >
> >
> > Hi Bharath,
> >
> > I pretty much agree with your above points.
> >
> > I've attached an updated Parallel INSERT...SELECT patch, that:
> > - Only uses existing transaction state serialization support for
> > transfer of xid and cid.
> > - Adds a "SetCurrentCommandIdUsedForWorker" function, for setting
> > currentCommandIdUsed=true at the start of a parallel operation (used
> > for Parallel INSERT case, where we know the currentCommandId has been
> > assigned to the worker at the start of the parallel operation).
> > - Tweaks the Assert condition within "used=true" parameter case in
> > GetCurrentCommandId(), so that it only fires if in a parallel worker
> > and currentCommandId is false - refer to the updated comment in that
> > function.
> > - Does not modify any existing GetCurrentCommandId() calls.
> > - Does not remove any existing parallel-related asserts/checks, except
> > for the "cannot insert tuples in a parallel worker" error in
> > heap_prepare_insert(). I am still considering what to do with the
> > original error-check here.
> > [- Does not yet cater for other exclusion cases that you and Vignesh
> > have pointed out]
> >
> > This patch is mostly a lot cleaner, but does contain a possible ugly
> > hack, in that where it needs to call GetCurrentFullTransactionId(), it
> > must temporarily escape parallel-mode (recalling that parallel-mode is
> > set true right at the top of ExectePlan() in the cases of Parallel
> > INSERT/SELECT).
>
> I think you still need to work on the costing part. Basically, if we
> are parallelizing the whole insert, then the plan is like below
>
> -> Gather
>   -> Parallel Insert
>   -> Parallel Seq Scan
>
> That means the tuples we are selecting via the scan are not sent back to
> the gather node, so in cost_gather we need to check whether it is for an
> INSERT; in that case no rows are transferred through the parallel queue,
> which means we need not pay any parallel tuple cost.

I just looked into the parallel CTAS[1] patch for the same thing, and
I can see that it is being handled there.

[1] 
https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel INSERT (INTO ... SELECT ...)

2020-09-29 Thread Dilip Kumar
On Mon, Sep 28, 2020 at 8:45 AM Greg Nancarrow  wrote:
>
> On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy
>  wrote:
>
> > I further checked on full txn id and command id. Yes, these are
> > getting passed to workers  via InitializeParallelDSM() ->
> > SerializeTransactionState(). I tried to summarize what we need to do
> > in case of parallel inserts in general i.e. parallel COPY, parallel
> > inserts in INSERT INTO and parallel inserts in CTAS.
> >
> > In the leader:
> > GetCurrentFullTransactionId()
> > GetCurrentCommandId(true)
> > EnterParallelMode();
> > InitializeParallelDSM() --> calls SerializeTransactionState()
> > (both full txn id and command id are serialized into parallel DSM)
> >
> > In the workers:
> > ParallelWorkerMain() -->  calls StartParallelWorkerTransaction() (both
> > full txn id and command id are restored into workers'
> > CurrentTransactionState->fullTransactionId and currentCommandId)
> > If the parallel workers are meant for insertions, then we need to set
> > currentCommandIdUsed = true; Maybe we can lift the assert in
> > GetCurrentCommandId(), if we don't want to touch that function, then
> > we can have a new function GetCurrentCommandidInWorker() whose
> > functionality will be same as GetCurrentCommandId() without the
> > Assert(!IsParallelWorker());.
> >
> > Am I missing something?
> >
> > If the above points are true, we might have to update the parallel
> > copy patch set, test the use cases and post separately in the parallel
> > copy thread in coming days.
> >
>
> Hi Bharath,
>
> I pretty much agree with your above points.
>
> I've attached an updated Parallel INSERT...SELECT patch, that:
> - Only uses existing transaction state serialization support for
> transfer of xid and cid.
> - Adds a "SetCurrentCommandIdUsedForWorker" function, for setting
> currentCommandIdUsed=true at the start of a parallel operation (used
> for Parallel INSERT case, where we know the currentCommandId has been
> assigned to the worker at the start of the parallel operation).
> - Tweaks the Assert condition within "used=true" parameter case in
> GetCurrentCommandId(), so that it only fires if in a parallel worker
> and currentCommandId is false - refer to the updated comment in that
> function.
> - Does not modify any existing GetCurrentCommandId() calls.
> - Does not remove any existing parallel-related asserts/checks, except
> for the "cannot insert tuples in a parallel worker" error in
> heap_prepare_insert(). I am still considering what to do with the
> original error-check here.
> [- Does not yet cater for other exclusion cases that you and Vignesh
> have pointed out]
>
> This patch is mostly a lot cleaner, but does contain a possible ugly
> hack, in that where it needs to call GetCurrentFullTransactionId(), it
> must temporarily escape parallel-mode (recalling that parallel-mode is
> set true right at the top of ExectePlan() in the cases of Parallel
> INSERT/SELECT).

I think you still need to work on the costing part. Basically, if we
are parallelizing the whole insert, then the plan is like below

-> Gather
  -> Parallel Insert
  -> Parallel Seq Scan

That means the tuples we are selecting via the scan are not sent back to
the gather node, so in cost_gather we need to check whether it is for an
INSERT; in that case no rows are transferred through the parallel queue,
which means we need not pay any parallel tuple cost.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel copy

2020-09-29 Thread vignesh C
On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila  wrote:
>
> On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila  wrote:
> >
> > Few additional comments:
> > ==
>
> Some more comments:
>

Thanks Amit for the comments. I will work on them and provide
a patch in the next few days.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] logical decoding of two-phase transactions

2020-09-29 Thread Dilip Kumar
On Mon, Sep 28, 2020 at 1:13 PM Ajin Cherian  wrote:
>
> On Wed, Sep 23, 2020 at 2:39 PM Amit Kapila  wrote:
>
> > No problem. I think you can handle the other comments and then we can
> > come back to this and you might want to share the exact details of the
> > test (may be a narrow down version of the original test) and I or
> > someone else might be able to help you with that.
> >
> > --
> > With Regards,
> > Amit Kapila.
>
> I have added a new patch for supporting 2 phase commit semantics in
> the streaming APIs for the logical decoding plugins. I have added 3
> APIs
> 1. stream_prepare
> 2. stream_commit_prepared
> 3. stream_abort_prepared
>
> I have also added the support for the new APIs in test_decoding
> plugin. I have not yet added it to pgoutpout.
>
> I have also added a fix for the error I saw while calling
> ReorderBufferCleanupTXN as part of FinishPrepared handling. As a
> result I have removed the function I added earlier,
> ReorderBufferCleanupPreparedTXN.
> Please have a look at the new changes and let me know what you think.
>
> I will continue to look at:
>
> 1. Remove snapshots on prepare truncate.
> 2. Bug seen while abort of prepared transaction, the prepared flag is
> lost, and not able to make out that it was a previously prepared
> transaction.

I have started looking into your latest patches; as of now I have a
few comments.

v6-0001

@@ -1987,7 +2072,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
  prev_lsn = change->lsn;

  /* Set the current xid to detect concurrent aborts. */
- if (streaming)
+ if (streaming || rbtxn_prepared(change->txn))
  {
  curtxn = change->txn;
  SetupCheckXidLive(curtxn->xid);
@@ -2249,7 +2334,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
  break;
  }
  }
-

For a streaming transaction we need to check the xid every time because
there could be a concurrent subtransaction abort, but
for two-phase we don't need to call SetupCheckXidLive every time,
because we are sure that the transaction is going to be
the same throughout the processing.



Apart from this, I have also noticed a couple of cosmetic issues

+ {
+ xl_xact_parsed_prepare parsed;
+ xl_xact_prepare *xlrec;
+ /* check that output plugin is capable of twophase decoding */
+ if (!ctx->enable_twophase)
+ {
+ ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
+ break;
+ }

One blank line after variable declarations

- /* remove potential on-disk data, and deallocate */
+/*
+ * remove potential on-disk data, and deallocate.
+ *
+ * We remove it even for prepared transactions (GID is enough to
+ * commit/abort those later).
+ */
+
  ReorderBufferCleanupTXN(rb, txn);

Comment not aligned properly


v6-0003

+LookupGXact(const char *gid)
+{
+ int i;
+
+ LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
+
+ for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+ {
+ GlobalTransaction gxact = TwoPhaseState->prepXacts[i];

I think we should take an LW_SHARED lock here, no?


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: BLOB / CLOB support in PostgreSQL

2020-09-29 Thread Peter Eisentraut

On 2020-09-28 15:46, Vladimir Sitnikov wrote:

The concerns to avoid "Clob maps to text" could be:
a) Once the behavior is implemented, it is hard to change. That is 
applications would rely on it (and it becomes a defacto standard), and 
it would be hard to move to the proper "text with streaming API" datatype.
b) If we make «clob is text», then people might start using 
update/substring APIs (which is the primary motivation for Clob) without 
realizing there’s full value update behind the scenes. Currently, they 
can use setString/getString for text, and it is crystal clear that the 
text is updated fully on every update.


When we added TOAST, we made the explicit decision to not add a "LONG" 
type but instead have the toasting mechanism transparent in all 
variable-length types.  And that turned out to be a very successful 
decision, because it allows this system to be used by all data types, 
not only one or two hardcoded ones.  Therefore, I'm very strongly of the 
opinion that if a streaming system of the sort you allude to were added, 
it would also be added transparently into the TOAST system.


The JDBC spec says

"""
An implementation of a Blob, Clob or NClob object may either be locator 
based or result in the object being fully materialized on the client.


By default, a JDBC driver should implement the Blob, Clob and NClob 
interfaces using the appropriate locator type. An application does not 
deal directly with the locator types that are defined in SQL.

"""

(A "locator" in SQL is basically what you might call a streaming handle.)

So yes, this encourages the implementation of locators.  But it also 
specifies that if you don't have locators, you can implement this using 
non-large-object types.



--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] We install pg_regress and isolationtester but not pg_isolation_regress

2020-09-29 Thread Alvaro Herrera
On 2018-Jun-01, Craig Ringer wrote:

> On 28 May 2018 at 15:06, Craig Ringer  wrote:
> 
> > Per topic, the Pg makefiles install pg_regress (for use by extensions) and
> > htey install the isolationtester, but they don't install
> > pg_isolation_regress.
> >
> > We should install it too.

I happened to come across this thread by accident, and I tend to agree
that we need to install both isolationtester and pg_isolation_regress to
the pgxs dirs, just like we do pg_regress.  I can't find that
isolationtester is installed anywhere though; maybe that changed after
you posted this.  Anyway, this version of the patch installs both.

I did search for evidence in the Makefile's git log that would remove a
recipe for installing isolationtester; couldn't find anything.

> I'm wondering if I should add ISOLATION support to PGXS too, like we have
> REGRESS .

This was covered by Michael's d3c09b9b1307 a few months after you
posted.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 8c81d47ee10bee52b53c8092d8724ce39fed3dfa Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Fri, 1 Jun 2018 11:26:09 +0800
Subject: [PATCH v3] Install pg_isolation_regress and isolationtester

---
 src/test/isolation/Makefile | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/src/test/isolation/Makefile b/src/test/isolation/Makefile
index da5e088bdd..d23e2cec64 100644
--- a/src/test/isolation/Makefile
+++ b/src/test/isolation/Makefile
@@ -18,12 +18,16 @@ OBJS = \
 
 all: isolationtester$(X) pg_isolation_regress$(X)
 
-# Though we don't install these binaries, build them during installation
-# (including temp-install).  Otherwise, "make -j check-world" and "make -j
-# installcheck-world" would spawn multiple, concurrent builds in this
-# directory.  Later builds would overwrite files while earlier builds are
-# reading them, causing occasional failures.
-install: | all
+install: all installdirs
+	$(INSTALL_PROGRAM) pg_isolation_regress$(X) '$(DESTDIR)$(pgxsdir)/$(subdir)/pg_isolation_regress$(X)'
+	$(INSTALL_PROGRAM) isolationtester$(X) '$(DESTDIR)$(pgxsdir)/$(subdir)/isolationtester$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(pgxsdir)/$(subdir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/pg_isolation_regress$(X)'
+	rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/isolationtester$(X)'
 
 submake-regress:
 	$(MAKE) -C $(top_builddir)/src/test/regress pg_regress.o
-- 
2.20.1



Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away

2020-09-29 Thread Fujii Masao




On 2020/09/25 21:19, Bharath Rupireddy wrote:

On Fri, Sep 25, 2020 at 3:21 PM Fujii Masao  wrote:
 >
 > I think that we can simplify the code by merging the connection-retry
 > code into them, like the attached very WIP patch (based on yours) does.
 >

+1.

 >
 > +                       else
 > +                               ereport(ERROR,
 > +                                       
(errcode(ERRCODE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION),
 > +                                        errmsg("could not connect to server 
\"%s\"",
 > +                                                       server->servername),
 > +                                        errdetail_internal("%s", 
pchomp(PQerrorMessage(entry->conn);
 >
 > The above is not necessary? If this error occurs, connect_pg_server()
 > reports an error, before the above code is reached. Right?
 >

Removed.

Thanks for the comments.

I think we need to have a volatile qualifier for need_reset_conn, because of 
the sigsetjmp.


Yes.


Instead of "need_reset_conn", "retry_conn" would be more meaningful and also instead of goto label 
name "reset;", "retry:".


Sounds good.


I changed "closing connection %p to reestablish connection" to  "closing connection 
%p to reestablish a new one"


OK.


I also adjusted the comments to be under the 80char limit.
I feel, when we are about to close an existing connection and reestablish a new 
connection, it will be good to have a debug3 message saying that we "could not 
connect to postgres_fdw connection %p"[1].


+1 to add debug3 message there. But this message doesn't seem to
match with what the error actually happened. What about something like
"could not start remote transaction on connection %p", instead?
Also maybe it's better to append PQerrorMessage(entry->conn)?



Attaching the v5 patch that has the above changes. Both make check and make
check-world regression tests pass. Please review.


Thanks for updating the patch!

+-- Generate a connection to remote. Local backend will cache it.
+SELECT * FROM ft1 LIMIT 1;

The result of this query would not be stable. Probably you need to,
for example, add ORDER BY or replace * with 1, etc.


+-- Retrieve pid of remote backend with application name fdw_retry_check
+-- and kill it intentionally here. Note that, local backend still has
+-- the remote connection/backend info in it's cache.
+SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE
+backend_type = 'client backend' AND application_name = 'fdw_retry_check';

Isn't this test fragile because there is no guarantee that the target backend
has exited just after calling pg_terminate_backend()?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Optimize memory allocation code

2020-09-29 Thread Alvaro Herrera
On 2020-Sep-26, Li Japin wrote:

> Thanks! How big is this overhead? Is there any way I can test it?

You could also have a look at the assembly code that your compiler
generates -- particularly examine how it changes.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: __pg_log_level in anonynous enum should be initialized? (Was: pgsql: Change SHA2 implementation based on OpenSSL to use EVP digest ro)

2020-09-29 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Sep 28, 2020 at 12:49:29PM -0400, Tom Lane wrote:
>> Since prairiedog is not likely to be around forever, I propose that
>> we ought to enforce this going forward by arranging for common/logging.c
>> to not get built into the libpgcommon_shlib library variant.

> Agreed.  This makes sense in the long term, so attached is a proposal
> of patch.  I did not really see the point in complicating the MSVC
> scripts for that though.  The new variable names look natural to me
> like that, the new comment may need some tweaks.

Hm, it doesn't seem like "OBJS_PGCOMMON" conveys much.  I think
OBJS_FRONTEND ought to be the set of files built for frontend
application use (i.e., libpgcommon.a), so we need a new name
for what will go into libpgcommon_shlib.a.  Maybe OBJS_SHLIB,
or to be even more explicit, OBJS_FRONTEND_SHLIB.

As for the comment, maybe "logging.c is excluded from OBJS_SHLIB
as a matter of policy, because it is not appropriate for general
purpose libraries such as libpq to report errors directly."

regards, tom lane




Re: [HACKERS] logical decoding of two-phase transactions

2020-09-29 Thread Amit Kapila
On Tue, Sep 29, 2020 at 5:08 PM Ajin Cherian  wrote:
>
> On Wed, Sep 23, 2020 at 2:39 PM Amit Kapila  wrote:
>
> > If the transaction is prepared which you can ensure via
> > ReorderBufferTxnIsPrepared() (considering you have a proper check in
> > that function), it should not require skipping the transaction in
> > Abort. One way it could happen is if you clean up the ReorderBufferTxn
> > in Prepare which you were doing in earlier version of patch which I
> > pointed out was wrong, if you have changed that then I don't know why
> > it could fail, may be someplace else during prepare the patch is
> > freeing it. Just check that.
>
> I had a look at this problem. The problem happens when decoding is
> done after a prepare but before the corresponding rollback
> prepared/commit prepared.
> For eg:
>
> Begin;
> 
> 
> PREPARE TRANSACTION '';
> SELECT data FROM pg_logical_slot_get_changes(...);
> :
> :
> ROLLBACK PREPARED '';
> SELECT data FROM pg_logical_slot_get_changes(...);
>
> Since the prepare is consumed in the first call to
> pg_logical_slot_get_changes, subsequently when it is encountered in
> the second call, it is skipped (as already decoded) in DecodePrepare
> and the txn->flags are not set to
> reflect the fact that it was prepared. The same behaviour is seen when
> it is commit prepared after the original prepare was consumed.
> Initially I was thinking about the following approach to fix it in 
> DecodePrepare
> Approach 1:
>1. Break the big Skip check in DecodePrepare into 2 parts.
>   Return if the following conditions are true:
>  If (parsed->dbId != InvalidOid && parsed->dbId !=
> ctx->slot->data.database) ||
>ctx->fast_forward || FilterByOrigin(ctx, origin_id))
>
>   2. Check If this condition is true:
>   SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr)
>
>   Then this means we are skipping because this has already
> been decoded, then instead of returning, call a new function
> ReorderBufferMarkPrepare() which will only update the flags in the txn
> to indicate that the transaction is prepared
>   Then later in DecodeAbort or DecodeCommit, we can confirm
> that the transaction has been Prepared by checking if the flag is set
> and call ReorderBufferFinishPrepared appropriately.
>
> But then, thinking about this some more, I thought of a second approach.
> Approach 2:
> If the only purpose of all this was to differentiate between
> Abort vs Rollback Prepared and Commit vs Commit Prepared, then we dont
> need this. We already know the exact operation
> in DecodeXactOp and can differentiate there. We only
> overloaded DecodeAbort and DecodeCommit for convenience, we can always
> call these functions with an extra flag to denote that we are either
> commit or aborting a
> previously prepared transaction and call
> ReorderBufferFinishPrepared accordingly.
>

The second approach sounds better, but if you find there is not much
you want to reuse from DecodeCommit/DecodeAbort, then you can even
write new functions DecodeCommitPrepared/DecodeAbortPrepared. OTOH, if
there is common code among them, then passing the flag would be a
better way.

-- 
With Regards,
Amit Kapila.




Re: Parallel copy

2020-09-29 Thread Amit Kapila
On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila  wrote:
>
> Few additional comments:
> ==

Some more comments:

v5-0002-Framework-for-leader-worker-in-parallel-copy
===
1.
These values
+ * help in handover of multiple records with significant size of data to be
+ * processed by each of the workers to make sure there is no context
switch & the
+ * work is fairly distributed among the workers.

How about writing it as: "These values help in the handover of
multiple records with the significant size of data to be processed by
each of the workers. This also ensures there is no context switch and
the work is fairly distributed among the workers."

2. Can we keep WORKER_CHUNK_COUNT, MAX_BLOCKS_COUNT, and RINGSIZE as
power-of-two? Say WORKER_CHUNK_COUNT as 64, MAX_BLOCK_COUNT as 1024,
and accordingly choose RINGSIZE. At many places, we do that way. I
think it can sometimes help in faster processing due to cache size
requirements and in this case, I don't see a reason why we can't
choose these values to be power-of-two. If you agree with this change
then also do some performance testing after this change?

3.
+ bool   curr_blk_completed;
+ char   data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8  skip_bytes;
+} ParallelCopyDataBlock;

Is there a reason to keep skip_bytes after data? Normally the variable
size data is at the end of the structure. Also, there is no comment
explaining the purpose of skip_bytes.

4.
+ * Copy data block information.
+ * ParallelCopyDataBlock's will be created in DSM. Data read from file will be
+ * copied in these DSM data blocks. The leader process identifies the records
+ * and the record information will be shared to the workers. The workers will
+ * insert the records into the table. There can be one or more number
of records
+ * in each of the data block based on the record size.
+ */
+typedef struct ParallelCopyDataBlock

Keep one empty line after the description line like below. I also
suggested to do a minor tweak in the above sentence which is as
follows:

* Copy data block information.
*
* These data blocks are created in DSM. Data read ...

Try to follow a similar format in other comments as well.

5. I think it is better to move parallelism related code to a new file
(we can name it as copyParallel.c or something like that).

6. copy.c(1648,25): warning C4133: 'function': incompatible types -
from 'ParallelCopyLineState *' to 'uint32 *'
Getting above compilation warning on Windows.

v5-0003-Allow-copy-from-command-to-process-data-from-file
==
1.
@@ -4294,7 +5047,7 @@ BeginCopyFrom(ParseState *pstate,
  * only in text mode.
  */
  initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *)
palloc(RAW_BUF_SIZE + 1);

Is there any way IsParallelCopy can be true by this time? AFAICS, we only
do anything about parallelism after this point. If you want to save this
allocation, then we need to move it to after we determine whether
parallelism can be used or not, and accordingly the below code in the
patch needs to be changed.

 * ParallelCopyFrom - parallel copy leader's functionality.
  *
  * Leader executes the before statement for before statement trigger, if before
@@ -1110,8 +1547,302 @@ ParallelCopyFrom(CopyState cstate)
  ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
  ereport(DEBUG1, (errmsg("Running parallel copy leader")));

+ /* raw_buf is not used in parallel copy, instead data blocks are used.*/
+ pfree(cstate->raw_buf);
+ cstate->raw_buf = NULL;

Is there anything else whose allocation depends on parallelism?

2.
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table or temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there is after statement or instead of trigger or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions are parallel safe, if present any. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}

In the comments, we should write why parallelism is not allowed for each
particular case.

Re: Disable WAL logging to speed up data loading

2020-09-29 Thread Ashutosh Bapat
Can they use a database with all unlogged tables?

On Tue, Sep 29, 2020 at 1:58 PM tsunakawa.ta...@fujitsu.com
 wrote:
>
> Hello,
>
>
> We'd like to propose a feature to disable WAL to speed up data loading.  This 
> was inspired by a feature added in the latest MySQL.  I wish you won't fear 
> this feature...
>
>
> BACKGROUND
> 
>
> This branches off from [1] as mentioned therein.  Briefly speaking, a 
> customer wants to shorten the time for nightly loading of data into their 
> data warehouse as much as possible to be prepared for using the data 
> warehouse for new things.
>
> Currently, they are using Oracle's SQL*Loader with its no-logging feature.  
> They want a similar feature to migrate to Postgres.  Other than the data 
> loading performance, they don't want to be concerned about the storage for 
> large volumes of WAL.
>
> In [1], we thought about something like Oracle's per-table no-logging 
> feature, but it seems difficult (or at least not easy.)  Meanwhile, I found 
> another feature added in the latest MySQL 8.0.21 [2].  This proposal follows 
> it almost directly.  That satisfies the customer request.
>
> As an aside, it's also conceivable that in the near future, users could see 
> the WAL bottleneck (WAL buffer or disk) when they utilize the parallel COPY 
> that is being developed in the community.
>
>
> FUNCTIONAL SPECIFICATION
> 
>
> Add a new value 'none' to the server configuration parameter wal_level.  With 
> this setting:
>
> * No WAL is emitted.
>
> * The server refuses to start (pg_ctl start fails) after an abnormal shutdown 
> due to power outage, pg_ctl's immediate shutdown, etc, showing a 
> straightforward message like MySQL.
>
> * Features like continuous archiving, pg_basebackup, and streaming/logical 
> replication that requires wal_level >= replica are not available.
>
> * The user can use all features again if you shut down the server 
> successfully after data loading and reset wal_level to a value other than 
> none.  He needs to take a base backup or rebuild the replication standby 
> after restarting the server.
>
>
> In addition to the cosmetic modifications to the manual articles that refer 
> to wal_level, add a clause or paragraphs to the following sections to let 
> users know the availability of this feature.
>
> 14.4. Populating a Database
> 18.6.1. Upgrading Data via pg_dumpall
>
>
> PROGRAM DESIGN (main point only)
> 
>
> As in the bootstrap mode (during initdb), when wal_level = none, XLogInsert() 
> does nothing and just returns a fixed value, which is the tail of the last 
> shutdown checkpoint WAL record.  As a result, the value is set to the 
> relation page header's LSN field.
>
> In addition, it might be worth having XLogBeginInsert() and XLogRec...() to 
> check wal_level and just return.  I don't expect much from this, but it may 
> be interesting to give it a try to see the squeezed performance.
>
> StartupXLOG() checks the wal_level setting in pg_control and quits the 
> startup with ereport(FATAL) accordingly.
>
>
> [1]
> Implement UNLOGGED clause for COPY FROM
> https://www.postgresql.org/message-id/osbpr01mb47c0bdc5129c65dfc5e5ed...@osbpr01mb4888.jpnprd01.prod.outlook.com
>
> [2]
> Disabling Redo Logging
> https://dev.mysql.com/doc/refman/8.0/en/innodb-redo-log.html#innodb-disable-redo-logging
>
>
> Regards
> Takayuki Tsunakawa
>
>
>


-- 
Best Wishes,
Ashutosh Bapat




Re: Corner-case bug in pg_rewind

2020-09-29 Thread Heikki Linnakangas

On 11/09/2020 09:42, Ian Barwick wrote:

Take the following cluster with:
   - node1 (initial primary)
   - node2 (standby)
   - node3 (standby)

Following activity takes place (greatly simplified from a real-world situation):

1. node1 is shut down.
2. node3 is promoted
3. node2 is attached to node3.
4. node1 is attached to node3
5. node1 is then promoted (creating a split brain situation with
node1 and node3 as primaries)
6. node2 and node3 are shut down (in that order).
7. pg_rewind is executed to reset node2 so it can reattach
to node1 as a standby. pg_rewind claims:

 pg_rewind: servers diverged at WAL location X/XXX on timeline 2
 pg_rewind: no rewind required

8. based off that assurance, node2 is restarted with replication configuration
pointing to node1 - but it is unable to attach, with node2's log reporting
something like:

   new timeline 3 forked off current database system timeline 2
before current recovery point X/XXX

The cause is that pg_rewind is assuming that if the node's last
checkpoint matches the
divergence point, no rewind is needed:

 if (chkptendrec == divergerec)
 rewind_needed = false;

but in this case there *are* records beyond the last checkpoint, which can be
inferred from "minRecoveryPoint" - but this is not checked.


Yep, I think you're right.


Attached patch addresses this. It includes a test, which doesn't make use of
the RewindTest module, as that hard-codes a primary and a standby, while here
three nodes are needed (I can't come up with a situation where this can be
reproduced with only two nodes). The test sets "wal_keep_size" so would need
modification for Pg12 and earlier.


I think we also need to change the extractPageMap() call:


/*
 * Read the target WAL from last checkpoint before the point of fork, to
 * extract all the pages that were modified on the target cluster after
 * the fork. We can stop reading after reaching the final shutdown 
record.
 * XXX: If we supported rewinding a server that was not shut down 
cleanly,
 * we would need to replay until the end of WAL here.
 */
if (showprogress)
pg_log_info("reading WAL in target");
extractPageMap(datadir_target, chkptrec, lastcommontliIndex,
   ControlFile_target.checkPoint, 
restore_command);
filemap_finalize();


so that it scans all the way up to minRecoveryPoint, instead of stopping 
at ControlFile_target.checkPoint. Otherwise, we will fail to rewind 
changes that happened after the last checkpoint. And we need to do that 
regardless of the "no rewind required" bug, any time we rewind a server 
that's in DB_SHUTDOWNED_IN_RECOVERY state. I'm surprised none of the 
existing tests have caught that. Am I missing something?
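
Something like this, I think (just a sketch):

    XLogRecPtr  endrec = ControlFile_target.checkPoint;

    /*
     * If the target was shut down while in recovery, there can be WAL
     * beyond the last checkpoint; scan up to minRecoveryPoint in that case.
     */
    if (ControlFile_target.state == DB_SHUTDOWNED_IN_RECOVERY &&
        ControlFile_target.minRecoveryPoint > endrec)
        endrec = ControlFile_target.minRecoveryPoint;

    extractPageMap(datadir_target, chkptrec, lastcommontliIndex,
                   endrec, restore_command);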


- Heikki




Re: Skip ExecCheckRTPerms in CTAS with no data

2020-09-29 Thread Bharath Rupireddy
On Mon, Sep 28, 2020 at 7:48 PM Tom Lane  wrote:
>
> Bharath Rupireddy  writes:
> > In case of CTAS with no data, we actually do not insert the tuples
> > into the created table, so we can skip checking for the insert
> > permissions. Anyways, the insert permissions will be checked when the
> > tuples are inserted into the table.
>
> I'd argue this is wrong.  You don't get to skip permissions checks
> in ordinary queries just because, say, there's a LIMIT 0 on the
> query.
>

Right, when there's a select with a LIMIT 0 clause, we do check for the
select permissions. But my point is, we don't check insert
permissions (or select or update etc.) when we create a plain table
using CREATE TABLE test_tbl(a1 INT). Of course, we do check create
permissions. By the same reasoning, shouldn't we also check only the
create permission (which is already being done as part of
DefineRelation) and skip the insert permission (the change this patch
does) for the new table being created as part of CREATE TABLE test_tbl
AS SELECT * FROM test_tbl2? The select permission will still be checked
for test_tbl2, and the insert permissions will be checked anyway before
rows are inserted into the table created by CTAS.
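
To make the case concrete, the statement under discussion is something like:

    CREATE TABLE test_tbl AS SELECT * FROM test_tbl2 WITH NO DATA;

where no rows are written to test_tbl at creation time.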

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] logical decoding of two-phase transactions

2020-09-29 Thread Ajin Cherian
On Wed, Sep 23, 2020 at 2:39 PM Amit Kapila  wrote:

> If the transaction is prepared which you can ensure via
> ReorderBufferTxnIsPrepared() (considering you have a proper check in
> that function), it should not require skipping the transaction in
> Abort. One way it could happen is if you clean up the ReorderBufferTxn
> in Prepare which you were doing in earlier version of patch which I
> pointed out was wrong, if you have changed that then I don't know why
> it could fail, may be someplace else during prepare the patch is
> freeing it. Just check that.

I had a look at this problem. The problem happens when decoding is
done after a prepare but before the corresponding rollback
prepared/commit prepared.
For eg:

Begin;


PREPARE TRANSACTION '';
SELECT data FROM pg_logical_slot_get_changes(...);
:
:
ROLLBACK PREPARED '';
SELECT data FROM pg_logical_slot_get_changes(...);

Since the prepare is consumed in the first call to
pg_logical_slot_get_changes, subsequently when it is encountered in
the second call, it is skipped (as already decoded) in DecodePrepare
and the txn->flags are not set to
reflect the fact that it was prepared. The same behaviour is seen when
it is commit prepared after the original prepare was consumed.
Initially I was thinking about the following approach to fix it in DecodePrepare
Approach 1:
   1. Break the big Skip check in DecodePrepare into 2 parts.
  Return if the following conditions are true:
 If (parsed->dbId != InvalidOid && parsed->dbId !=
ctx->slot->data.database) ||
   ctx->fast_forward || FilterByOrigin(ctx, origin_id))

  2. Check If this condition is true:
  SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr)

  Then this means we are skipping because this has already
been decoded, then instead of returning, call a new function
ReorderBufferMarkPrepare() which will only update the flags in the txn
to indicate that the transaction is prepared
  Then later in DecodeAbort or DecodeCommit, we can confirm
that the transaction has been Prepared by checking if the flag is set
and call ReorderBufferFinishPrepared appropriately.

But then, thinking about this some more, I thought of a second approach.
Approach 2:
If the only purpose of all this was to differentiate between
Abort vs Rollback Prepared and Commit vs Commit Prepared, then we don't
need this. We already know the exact operation
in DecodeXactOp and can differentiate there. We only
overloaded DecodeAbort and DecodeCommit for convenience; we can always
call these functions with an extra flag to denote that we are either
committing or aborting a previously prepared transaction, and call
ReorderBufferFinishPrepared accordingly.
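
As a rough sketch of what Approach 2 could look like in DecodeXactOp() (names
and argument lists here are only illustrative, not final signatures):

    case XLOG_XACT_COMMIT_PREPARED:
        DecodeCommit(ctx, buf, &parsed, xid, /* two_phase */ true);
        break;
    case XLOG_XACT_ABORT_PREPARED:
        DecodeAbort(ctx, buf, &parsed, xid, /* two_phase */ true);
        break;

and then, inside DecodeCommit()/DecodeAbort(), call
ReorderBufferFinishPrepared() instead of the plain commit/abort path whenever
the two_phase flag is true.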

Let me know your thoughts on these two approaches or any other
suggestions on this.

regards,
Ajin Cherian
Fujitsu Australia




Re: BLOB / CLOB support in PostgreSQL

2020-09-29 Thread Dave Cramer
On Mon, 28 Sep 2020 at 20:08, Andy Fan  wrote:

>
>
> On Tue, Sep 29, 2020 at 5:22 AM Vladimir Sitnikov <
> sitnikov.vladi...@gmail.com> wrote:
>
>> >100% compatible with the MySQL
>>
>> It is hardly a justification for a feature or for a change request.
>>
>> Vladimir
>>
>
I would have to agree. This argument alone has not swayed discussions in
the past.

>
>>
> Glad to see this topic.
>
> The obviously different opinion for this feature is based on if we need a
> "perfect" solution
> or  a  "OK-to-most-user cases" solution.
>
> As for PG core developers,  I'm +1 with "pg have their own serious
> problems" , and
> we are lacking the resources to handle everything well. However, "serious
> problems"
> to different people may be different.
>
> As a rare experienced Java developer,  looks raise "NotImplemented" error
> for some
> unimplemented APIs will not make the maintenance work hard, that probably
> not common
> used APIs.  Not fully supported API should be better than fully not
> supported APIs at all.
>
I don't agree with this statement. Currently the largest source of
problems with CLOB not being implemented comes from hibernate, which chose
not to follow the spec and for some reason fails if we throw a not
implemented exception (which is allowed by the spec). If we now implement
something which only partially implements the API, I can imagine others
failing in other ways. Honestly I have no idea, but what is the point of
the spec if we don't adhere to it?


> As an Oracle DBA before, I do see users need CLOB/BLOB some time but for
> most of them,
> they just want to save/get big stuff.  This case in Oracle may be more
> outstanding because of
> the max length of varchar2 is too low.
>
It is rare to see people writing JDBC now. It is much more likely they are
using JPA or hibernate. Getting around this is rather trivial simply by
using @Type(org.hibernate.type.TextType)


> When come to the JDBC standard
>
> JDBC Spec> * All methods on the Clob interface must be fully
> implemented if the
> JDBC Spec> * JDBC driver supports the data type.
>
> What would be the sense behind this?  This is not reasonable based on
> limited experience.
>


The sense behind this is that others writing code on top of it expect
everything to be implemented. If it isn't, then things fail.

As mentioned above hibernate fails to check for not implemented and we see
the number of issues that resulted in that. Imagine what happens when we
partially implement the interface. It will be all but useless to them.

I'm not currently convinced the risk/reward ratio is in our favour.

Dave

>


Re: OpenSSL 3.0.0 compatibility

2020-09-29 Thread Daniel Gustafsson
> On 22 Sep 2020, at 14:01, Daniel Gustafsson  wrote:
> 
>> On 22 Sep 2020, at 11:37, Peter Eisentraut 
>>  wrote:
>> 
>> On 2020-09-18 16:11, Daniel Gustafsson wrote:
>>> Since we support ciphers that are now deprecated, we have no other choice 
>>> than
>>> to load the legacy provider.
>> 
>> Well, we could just have deprecated ciphers fail, unless the user loads the 
>> legacy provider in the OS configuration.  There might be an argument that 
>> that is more proper.
> 
> Thats a fair point.  The issue I have with that is that we'd impose a system
> wide loading of the legacy provider, impacting other consumers of libssl as
> well.

After another round of thinking on this, somewhat spurred on by findings in the
the nearby FIPS thread, I'm coming around to this idea.

Looking at SCRAM+FIPS made me realize that we can't enforce FIPS mode in
pgcrypto via the OpenSSL configuration file, since pgcrypto doesn't load the
config.  The automatic initialization in 1.1.0+ will however load the config
file, so not doing it for older versions makes pgcrypto inconsistent.  Thus, I
think we should make sure that pgcrypto loads the configuration for all OpenSSL
versions, and defer the provider decision in 3.0.0 to the users.  This makes
the patch minimally intrusive while making pgcrypto behave consistently (and
giving 3.0.0 users the option to not run legacy).

The attached adds config loading to pgcrypto for < 1.1.0 and a doc notice for
enabling the legacy provider in 3.0.0.  This will require an alternative output
file for non-legacy configs, but that should wait until 3.0.0 is GA since the
returned error messages have changed over course of development and may not be
set in stone just yet.
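
For reference, the < 1.1.0 part boils down to roughly this (a sketch only, the
attached patch is the authoritative version):

    #if OPENSSL_VERSION_NUMBER < 0x10100000L
        /* Explicitly load openssl.cnf; 1.1.0+ does this in its automatic init */
        OPENSSL_config(NULL);
    #endif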

cheers ./daniel



0001-Fix-OpenSSL-3-support-in-pgcrypto-v3.patch
Description: Binary data


Re: Improved Cost Calculation for IndexOnlyScan

2020-09-29 Thread Heikki Linnakangas

On 29/09/2020 11:49, Hamid Akhtar wrote:
So, not actually random replacement here, rather a change with 
baserel->allvisfrac taken into consideration (as given below):


index_random_page_cost = Min(spc_seq_page_cost + spc_random_page_cost * 
(1.0 - baserel->allvisfrac), spc_random_page_cost);



Does this make sense?


No. genericcostestimate() is only concerned with accesses to the index, 
not the heap accesses that are needed with Index Scans. 'allvisfrac' 
should not affect the number of *index* pages fetched in any way.


- Heikki




Re: Parallel copy

2020-09-29 Thread Greg Nancarrow
Hi Vignesh and Bharath,

Seems like the Parallel Copy patch is regarding RI_TRIGGER_PK as
parallel-unsafe.
Can you explain why this is?

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Add session statistics to pg_stat_database

2020-09-29 Thread Laurenz Albe
On Thu, 2020-09-24 at 14:38 -0700, Soumyadeep Chakraborty wrote:
> Thanks for submitting this! Please find my feedback below.

Thanks for the thorough review.

Before I update the patch, I have a few comments and questions.

> * Are we trying to capture ONLY client initiated disconnects in
> m_aborted (we are not handling other disconnects by not accounting for
> EOF..like if psql was killed)? If yes, why?

I thought it was interesting to know how many database sessions are
ended regularly as opposed to ones that get killed or end by unexpected
client death.

> * pgstat_send_connstats(): How about renaming the "force" argument to
> "disconnected"?

Yes, that might be better.  I'll do that.

> *
> > static TimestampTz pgStatActiveStart = DT_NOBEGIN;
> > static PgStat_Counter pgStatActiveTime = 0;
> > static TimestampTz pgStatTransactionIdleStart = DT_NOBEGIN;
> > static PgStat_Counter pgStatTransactionIdleTime = 0;
> > static bool pgStatSessionReported = false;
> > bool pgStatSessionDisconnected = false;
> 
> I think we can house all of these globals inside PgBackendStatus and can
> follow the protocol for reading/writing fields in PgBackendStatus.
> Refer: PGSTAT_{BEGIN|END}_WRITE_ACTIVITY

Are you sure that is the right way to go?

Correct me if I am wrong, but isn't PgBackendStatus for relevant status
information that other processes can access?
I'd assume that it is not the correct place to store backend-private data
that are not relevant to others.
Besides, if data is written to this structure more often, readers would
have to deal with more contention, which could affect performance.

But I agree with the following:

> Also, some of these fields are not required:
> 
> I don't think we need pgStatActiveStart and pgStatTransactionIdleStart -
> instead of these two we could use
> PgBackendStatus.st_state_start_timestamp which marks the beginning TS of
> the backend's current state (st_state). We can look at that field along
> with the current and to-be-transitioned-to states inside
> pgstat_report_activity() when there is a transition away from
> STATE_RUNNING, STATE_IDLEINTRANSACTION or
> STATE_IDLEINTRANSACTION_ABORTED, in order to update pgStatActiveTime and
> pgStatTransactionIdleTime. We would also need to update those counters
> on disconnect/PGSTAT_STAT_INTERVAL timeout if the backend's current
> state was STATE_RUNNING, STATE_IDLEINTRANSACTION or
> STATE_IDLEINTRANSACTION_ABORTED (in pgstat_send_connstats())

Yes, that would be better.

> pgStatSessionDisconnected is not required as it can be determined if a
> session has been disconnected by looking at the force argument to
> pgstat_report_stat() [unless we would want to distinguish between
> client-initiated disconnects, which I am not sure why, as I have
> brought up above].

But wouldn't that mean that we count *every* end of a session as regular
disconnection, even if the backend was killed?

I personally would want all my database connections to be closed by
the client, unless something unexpected happens.

> pgStatSessionReported is not required. We can glean this information by
> checking if the function local static last_report in
> pgstat_report_stat() is 0 and passing this on as another param
> "first_report" to pgstat_send_connstats().

Yes, that is better.

> * PGSTAT_FILE_FORMAT_ID needs to be updated when a stats collector data
> structure changes and we had a change in PgStat_StatDBEntry.

I think that should be left to the committer.

> * We can directly use PgBackendStatus.st_proc_start_timestamp for
> calculating m_session_time. We can also choose to report session uptime
> even when the report is for the not-disconnect case
> (PGSTAT_STAT_INTERVAL elapsed). No reason why not. Then we would need to
> pass in the value of last_report to pgstat_send_connstats() -> calculate
> m_session_time to be number of time units from
> PgBackendStatus.st_proc_start_timestamp for the first report and then
> number of time units from the last_report for all subsequent reports.

Yes, that would make for better statistics, since client connections
can last quite long.

> * We would need to bump the catalog version since we have made
> changes to system views. Refer: #define CATALOG_VERSION_NO

Again, I think that's up to the committer.

Thanks again!

Yours,
Laurenz Albe





Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

2020-09-29 Thread Peter Eisentraut

On 2020-09-26 07:32, Amit Kapila wrote:

This is exactly my feeling too. But how about changing documentation a
bit as proposed above [1] to make it precise.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1LQWXS_4RwLo%2BWT7jusGnBkUvXO73xQOCsydWLYBpLBEg%40mail.gmail.com


Yes, making the documentation more precise would be good.  Right now, 
it's a bit confusing and unclear (using phrases like "based on"). 
Someone who wants to use the VACUUM PARALLEL option presumably wants 
precise control, so specifying the exact rules would be desirable.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Disable WAL logging to speed up data loading

2020-09-29 Thread Amul Sul
On Tue, Sep 29, 2020 at 1:58 PM tsunakawa.ta...@fujitsu.com
 wrote:
>
> Hello,
>
>
> We'd like to propose a feature to disable WAL to speed up data loading.  This 
> was inspired by a feature added in the latest MySQL.  I wish you won't fear 
> this feature...
>

FWIW, the pg_bulkload contrib module [1] also does the same for
faster data loading.

Regards,
Amul

1] http://ossc-db.github.io/pg_bulkload/pg_bulkload.html




Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-09-29 Thread David Rowley
On Wed, 23 Sep 2020 at 08:42, David Rowley  wrote:
>
> On Tue, 22 Sep 2020 at 19:08, David Rowley  wrote:
> > I ran another scale=5 TPCH benchmark on v4 against f859c2ffa using gcc
> > 9.3. I'm unable to see any gains with this, however, the results were
> > pretty noisy. I only ran pgbench for 60 seconds per query. I'll likely
> > need to run that a bit longer. I'll do that tonight.
>
> I've attached the results of a TPCH scale=5 run master (f859c2ffa) vs
> master + elog_ereport_attribute_cold_v4.patch
>
> It does not look great. The patched version seems to have done about
> 1.17% less work than master did.

I've marked this patch back as waiting for review. It would be good if
someone could run some tests on some intel hardware and see if they
can see any speedup.

David




Re: Fix inconsistency in jsonpath .datetime()

2020-09-29 Thread Alexander Korotkov
On Fri, Sep 25, 2020 at 2:02 AM Alexander Korotkov  wrote:
> Other opinions?

Given no other opinions yet, I've pushed the both patches.

--
Regards,
Alexander Korotkov




Re: Improved Cost Calculation for IndexOnlyScan

2020-09-29 Thread Hamid Akhtar
On Tue, Sep 29, 2020 at 1:08 PM Heikki Linnakangas  wrote:

> On 29/09/2020 10:06, Hamid Akhtar wrote:
> > In one of my earlier emails [1], I mentioned that there seems to be a
> > problem with how the cost for index only scans is being calculated.
> > [1]
> >
> https://www.postgresql.org/message-id/CANugjhsnh0OBMOYc7qKcC%2BZsVvAXCeF7QiidLuFvg6zmHy1C7A%40mail.gmail.com
> >
> > My concern is that there seems to be a bigger disconnect between the
> > cost of index only scan and the execution time. Having tested this on 3
> > different systems, docker, laptop and a server with RAID 5 SSD
> > configured, at the threshold where index only scan cost exceeds that of
> > sequential scan, index only is still around 30% faster than the
> > sequential scan.
>
> A 30% discrepancy doesn't sound too bad, to be honest. The exact
> threshold depends on so many factors.
>
> > My initial hunch was that perhaps we need to consider a different
> > approach when considering cost for index only scan. However, the
> > solution seems somewhat simple.
> >
> > cost_index function in costsize.c, in case of indexonlyscan, multiplies
> > the number of pages fetched by a factor of (1.0 - baserel->allvisfrac)
> > which is then used to calculate the max_IO_cost and min_IO_cost.
> >
> > This is very similar to the cost estimate methods for indexes, which internally
> > call the genericcostestimate function. This function primarily gets the number
> > of pages for the indexes and multiplies that with the random page cost
> > (spc_random_page_cost) to get the total disk access cost.
> >
> > I believe that in case of index only scan, we should adjust the
> > spc_random_page_cost in context of baserel->allvisfrac so that it
> > accounts for random pages for only the fraction that needs to be read
> > for the relation and excludes the index page fetches.
>
> That doesn't sound right to me. The genericcostestimate() function
> calculates the number of *index* pages fetched. It makes no difference
> if it's an Index Scan or an Index Only Scan.
>
> genericcostestimate() could surely be made smarter. Currently, it
> multiplies the number of index pages fetched with random_page_cost, even
> though a freshly created index is mostly physically ordered by the keys.
> seq_page_cost with some fudge factor might be more appropriate, whether
> or not it's an Index Only Scan. Not sure what the exact formula should
> be, just replacing random_page_cost with seq_page_cost is surely not
> right either.
>
> - Heikki
>

So, not actually random replacement here, rather a change with
baserel->allvisfrac taken into consideration (as given below):

index_random_page_cost = Min(spc_seq_page_cost + spc_random_page_cost *
(1.0 - baserel->allvisfrac), spc_random_page_cost);
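
For example, with the default seq_page_cost = 1.0 and random_page_cost = 4.0,
a table that is 90% all-visible gets Min(1.0 + 4.0 * 0.1, 4.0) = 1.4 as the
effective random page cost for the index only scan, while a table with no
visibility map coverage stays at Min(1.0 + 4.0, 4.0) = 4.0.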


Does this make sense?

-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Disable WAL logging to speed up data loading

2020-09-29 Thread tsunakawa.ta...@fujitsu.com
Hello,


We'd like to propose a feature to disable WAL to speed up data loading.  This 
was inspired by a feature added in the latest MySQL.  I wish you won't fear 
this feature...


BACKGROUND


This branches off from [1] as mentioned therein.  Briefly speaking, a customer 
wants to shorten the time for nightly loading of data into their data warehouse 
as much as possible to be prepared for using the data warehouse for new things.

Currently, they are using Oracle's SQL*Loader with its no-logging feature.  
They want a similar feature to migrate to Postgres.  Other than the data 
loading performance, they don't want to be concerned about the storage for 
large volumes of WAL.

In [1], we thought about something like Oracle's per-table no-logging feature, 
but it seems difficult (or at least not easy.)  Meanwhile, I found another 
feature added in the latest MySQL 8.0.21 [2].  This proposal follows it almost 
directly.  That satisfies the customer request.

As an aside, it's also conceivable that in the near future, users could see the 
WAL bottleneck (WAL buffer or disk) when they utilize the parallel COPY that is 
being developed in the community.


FUNCTIONAL SPECIFICATION


Add a new value 'none' to the server configuration parameter wal_level.  With 
this setting:

* No WAL is emitted.

* The server refuses to start (pg_ctl start fails) after an abnormal shutdown 
due to power outage, pg_ctl's immediate shutdown, etc, showing a 
straightforward message like MySQL.

* Features like continuous archiving, pg_basebackup, and streaming/logical 
replication that requires wal_level >= replica are not available.

* The user can use all features again if you shut down the server successfully 
after data loading and reset wal_level to a value other than none.  He needs to 
take a base backup or rebuild the replication standby after restarting the 
server.


In addition to the cosmetic modifications to the manual articles that refer to 
wal_level, add a clause or paragraphs to the following sections to let users 
know the availability of this feature.

14.4. Populating a Database
18.6.1. Upgrading Data via pg_dumpall


PROGRAM DESIGN (main point only)


As in the bootstrap mode (during initdb), when wal_level = none, XLogInsert() 
does nothing and just returns a fixed value, which is the tail of the last 
shutdown checkpoint WAL record.  As a result, the value is set to the relation 
page header's LSN field.
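
For illustration, a minimal sketch of the intended early exit, modeled on the
existing bootstrap short-circuit in xloginsert.c (WAL_LEVEL_NONE and the
variable holding the last shutdown checkpoint's tail position are assumptions
of this proposal, not existing code):

    XLogRecPtr
    XLogInsert(RmgrId rmid, uint8 info)
    {
        if (wal_level == WAL_LEVEL_NONE)
        {
            XLogResetInsertion();          /* discard any registered data */
            return lastShutdownCkptEndPtr; /* tail of last shutdown checkpoint */
        }

        /* ... existing XLogInsert() logic ... */
    }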

In addition, it might be worth having XLogBeginInsert() and XLogRec...() to 
check wal_level and just return.  I don't expect much from this, but it may be 
interesting to give it a try to see the squeezed performance.

StartupXLOG() checks the wal_level setting in pg_control and quits the startup 
with ereport(FATAL) accordingly.


[1]
Implement UNLOGGED clause for COPY FROM
https://www.postgresql.org/message-id/osbpr01mb47c0bdc5129c65dfc5e5ed...@osbpr01mb4888.jpnprd01.prod.outlook.com

[2]
Disabling Redo Logging
https://dev.mysql.com/doc/refman/8.0/en/innodb-redo-log.html#innodb-disable-redo-logging


Regards
Takayuki Tsunakawa





Re: Assertion failure with barriers in parallel hash join

2020-09-29 Thread Thomas Munro
On Tue, Sep 29, 2020 at 7:11 PM Michael Paquier  wrote:
> #2  0x009027d2 in ExceptionalCondition
> (conditionName=conditionName@entry=0xa80846 "!barrier->static_party",

> #4  0x00682ebf in ExecParallelHashJoinNewBatch

Thanks.  Ohhh.  I think I see how that condition was reached and what
to do about it, but I'll need to look more closely.  I'm away on
vacation right now, and will update in a couple of days when I'm back
at a real computer.




Re: Improved Cost Calculation for IndexOnlyScan

2020-09-29 Thread Heikki Linnakangas

On 29/09/2020 10:06, Hamid Akhtar wrote:
In one of my earlier emails [1], I mentioned that there seems to be a 
problem with how the cost for index only scans is being calculated.
[1] 
https://www.postgresql.org/message-id/CANugjhsnh0OBMOYc7qKcC%2BZsVvAXCeF7QiidLuFvg6zmHy1C7A%40mail.gmail.com


My concern is that there seems to be a bigger disconnect between the 
cost of index only scan and the execution time. Having tested this on 3 
different systems, docker, laptop and a server with RAID 5 SSD 
configured, at the threshold where index only scan cost exceeds that of 
sequential scan, index only is still around 30% faster than the 
sequential scan.


A 30% discrepancy doesn't sound too bad, to be honest. The exact 
threshold depends on so many factors.


My initial hunch was that perhaps we need to consider a different 
approach when considering cost for index only scan. However, the 
solution seems somewhat simple.


cost_index function in costsize.c, in case of indexonlyscan, multiplies 
the number of pages fetched by a factor of (1.0 - baserel->allvisfrac) 
which is then used to calculate the max_IO_cost and min_IO_cost.


This is very similar to the cost estimate methods for indexes, which internally 
call the genericcostestimate function. This function primarily gets the number 
of pages for the indexes and multiplies that with the random page cost 
(spc_random_page_cost) to get the total disk access cost.


I believe that in case of index only scan, we should adjust the 
spc_random_page_cost in context of baserel->allvisfrac so that it 
accounts for random pages for only the fraction that needs to be read 
for the relation and excludes the index page fetches.


That doesn't sound right to me. The genericcostestimate() function 
calculates the number of *index* pages fetched. It makes no difference 
if it's an Index Scan or an Index Only Scan.


genericcostestimate() could surely be made smarter. Currently, it 
multiplies the number of index pages fetched with random_page_cost, even 
though a freshly created index is mostly physically ordered by the keys. 
seq_page_cost with some fudge factor might be more appropriate, whether 
or not it's an Index Only Scan. Not sure what the exact formula should 
be, just replacing random_page_cost with seq_page_cost is surely not 
right either.


- Heikki




Re: Support for NSS as a libpq TLS backend

2020-09-29 Thread Daniel Gustafsson
> On 29 Sep 2020, at 07:59, Michael Paquier  wrote:
> 
> On Thu, Sep 17, 2020 at 11:41:28AM +0200, Daniel Gustafsson wrote:
>> Attached is a v10 rebased to apply on top of HEAD.
> 
> I am afraid that this needs a new rebase.  The patch is failing to
> apply, per the CF bot. :/

It's failing on binary diffs due to the NSS certificate databases being
included to make hacking on the patch easier:

  File src/test/ssl/ssl/nss/server.crl: git binary diffs are not supported.

This is a limitation of the CFBot patch tester; the text portions of the patch
still apply with a tiny bit of fuzz.

cheers ./daniel



Re: Buggy handling of redundant options in COPY

2020-09-29 Thread Pavel Stehule
út 29. 9. 2020 v 9:24 odesílatel Michael Paquier 
napsal:

> Hi all,
>
> While diving into the CF, I have noticed the following message from
> Remy (in CC):
>
> https://www.postgresql.org/message-id/0b55bd07-83e4-439f-aacc-fa2d7cf50...@lenstra.fr
>
> The following two cases should fail the same way, but the second does
> not because we check directly the flag value extracted from the
> DefElem to see if the option is repeated or not:
> =# copy (select 1) to '/tmp/data.txt' (header on, header off);
> ERROR:  42601: conflicting or redundant options
> =# copy (select 1) to '/tmp/data.txt' (header off, header on);
> ERROR:  0A000: COPY HEADER available only in CSV mode
>
> Looking quickly at the usages of defGetBoolean() across the code, it
> seems that we are rather consistent on a command-basis to handle such
> cases (EXPLAIN does not care, subscriptions do, etc.), while COPY is
> a mixed bag that clearly aims at checking for redundant options
> correctly.  So, attached is a patch to do that, with tests for the
> various options while on it.  This is not something worth a
> back-patch in my opinion.
>
> Any thoughts?
>

+1

Pavel

--
> Michael
>


Re: history file on replica and double switchover

2020-09-29 Thread Fujii Masao




On 2020/09/26 5:34, David Zhang wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

"make installcheck-world" test was performed on branch "REL_13_STABLE" with tag "REL_13_0". The 
regression failure was not caused by the "history file" patch, since it has the same error on my environment even 
without any patch. So the failure is not related; in other words, the patch "history_replica_v4.patch" is good for me.


Thanks for the check! I pushed the patch. Thanks!



Below is the failure message when tested with and without 
"history_replica_v4.patch".


BTW, I could not reproduce this issue in my env.
I'm not sure why this failure happened in your env.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Buggy handling of redundant options in COPY

2020-09-29 Thread Michael Paquier
Hi all,

While diving into the CF, I have noticed the following message from
Remy (in CC):
https://www.postgresql.org/message-id/0b55bd07-83e4-439f-aacc-fa2d7cf50...@lenstra.fr

The following two cases should fail the same way, but the second does
not because we check directly the flag value extracted from the
DefElem to see if the option is repeated or not:
=# copy (select 1) to '/tmp/data.txt' (header on, header off);
ERROR:  42601: conflicting or redundant options
=# copy (select 1) to '/tmp/data.txt' (header off, header on);
ERROR:  0A000: COPY HEADER available only in CSV mode

Looking quickly at the usages of defGetBoolean() across the code, it
seems that we are rather consistent on a command-basis to handle such
cases (EXPLAIN does not care, subscriptions do, etc.), while COPY is 
a mixed bag that clearly aims at checking for redundant options
correctly.  So, attached is a patch to do that, with tests for the
various options while on it.  This is not something worth a
back-patch in my opinion.

Any thoughts?
--
Michael
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2047557e52..3c7dbad27a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1159,6 +1159,8 @@ ProcessCopyOptions(ParseState *pstate,
    List *options)
 {
 	bool		format_specified = false;
+	bool		freeze_specified = false;
+	bool		header_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -1198,11 +1200,12 @@ ProcessCopyOptions(ParseState *pstate,
 		}
 		else if (strcmp(defel->defname, "freeze") == 0)
 		{
-			if (cstate->freeze)
+			if (freeze_specified)
 ereport(ERROR,
 		(errcode(ERRCODE_SYNTAX_ERROR),
 		 errmsg("conflicting or redundant options"),
 		 parser_errposition(pstate, defel->location)));
+			freeze_specified = true;
 			cstate->freeze = defGetBoolean(defel);
 		}
 		else if (strcmp(defel->defname, "delimiter") == 0)
@@ -1225,11 +1228,12 @@ ProcessCopyOptions(ParseState *pstate,
 		}
 		else if (strcmp(defel->defname, "header") == 0)
 		{
-			if (cstate->header_line)
+			if (header_specified)
 ereport(ERROR,
 		(errcode(ERRCODE_SYNTAX_ERROR),
 		 errmsg("conflicting or redundant options"),
 		 parser_errposition(pstate, defel->location)));
+			header_specified = true;
 			cstate->header_line = defGetBoolean(defel);
 		}
 		else if (strcmp(defel->defname, "quote") == 0)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d25a..c64f0719e7 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -28,6 +28,53 @@ COPY x (a, b, c, d, e) from stdin;
 -- non-existent column in column list: should fail
 COPY x (xyz) from stdin;
 ERROR:  column "xyz" of relation "x" does not exist
+-- redundant options
+COPY x from stdin (format CSV, FORMAT CSV);
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (format CSV, FORMAT CSV);
+   ^
+COPY x from stdin (freeze off, freeze on);
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (freeze off, freeze on);
+   ^
+COPY x from stdin (delimiter ',', delimiter ',');
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (delimiter ',', delimiter ',');
+  ^
+COPY x from stdin (null ' ', null ' ');
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (null ' ', null ' ');
+ ^
+COPY x from stdin (header off, header on);
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (header off, header on);
+   ^
+COPY x from stdin (quote ':', quote ':');
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (quote ':', quote ':');
+  ^
+COPY x from stdin (escape ':', escape ':');
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (escape ':', escape ':');
+   ^
+COPY x from stdin (force_quote (a), force_quote *);
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (force_quote (a), force_quote *);
+^
+COPY x from stdin (force_not_null (a), force_not_null (b));
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (force_not_null (a), force_not_null (b));
+   ^
+COPY x from stdin (force_null (a), force_null (b));
+ERROR:  conflicting or redundant options
+COPY x from stdin (convert_selectively (a), convert_selectively (b));
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (convert_selectively (a), convert_selectiv...
+^
+COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
+ERROR:  conflicting or redundant options
+LINE 1: COPY

Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

2020-09-29 Thread Heikki Linnakangas

On 10/09/2020 09:37, Zidenberg, Tsahi wrote:

On 08/09/2020, 1:01, "Tom Lane"  wrote:

 > I wonder what version of gcc you intend this for.  AFAICS, older
 > gcc versions lack this flag at all, while newer ones have it on
 > by default.


(previously sent private reply, sorry)

The moutline-atomics flag showed substantial enough improvements
that it has been backported to GCC 9, 8 and there is a gcc-7 branch in
the works.
Ubuntu has integrated this in 20.04, Amazon Linux 2 supports it,
with other distributions including Ubuntu 18.04 and Debian on the way.
All distributions, including the upcoming Ubuntu with GCC-10, have
moutline-atomics turned off by default.


If it's a good idea to use -moutline-atomics, I would expect the 
compiler or distribution to enable it by default. And as you pointed 
out, many have. For the others, there are probably reasons they haven't, 
like being conservative in general. Whatever the reasons, IMHO we should 
not second-guess them.


I'm marking this as Rejected in the commitfest. But thanks for the 
benchmarking, that is valuable information nevertheless.


- Heikki




Improved Cost Calculation for IndexOnlyScan

2020-09-29 Thread Hamid Akhtar
In one of my earlier emails [1], I mentioned that there seems to be a
problem with how the cost for index only scans is being calculated.
[1]
https://www.postgresql.org/message-id/CANugjhsnh0OBMOYc7qKcC%2BZsVvAXCeF7QiidLuFvg6zmHy1C7A%40mail.gmail.com

My concern is that there seems to be a bigger disconnect between the cost
of index only scan and the execution time. Having tested this on 3
different systems, docker, laptop and a server with RAID 5 SSD configured,
at the threshold where index only scan cost exceeds that of sequential
scan, index only is still around 30% faster than the sequential scan.

My initial hunch was that perhaps we need to consider a different approach
when considering cost for index only scan. However, the solution seems
somewhat simple.

cost_index function in costsize.c, in case of indexonlyscan, multiplies the
number of pages fetched by a factor of (1.0 - baserel->allvisfrac) which is
then used to calculate the max_IO_cost and min_IO_cost.

This is very similar to the cost estimate methods for indexes, which internally
call the genericcostestimate function. This function primarily gets the number of
pages for the indexes and multiplies that with the random page cost
(spc_random_page_cost) to get the total disk access cost.

I believe that in the case of an index only scan, we should adjust
spc_random_page_cost in the context of baserel->allvisfrac so that it accounts
for random page reads only for the fraction of the relation that actually needs
to be read, and excludes the index page fetches.
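
Roughly along these lines inside cost_index(), for the index only case (a
sketch of the idea only; the attached POC contains the actual change):

    if (indexonly)
        spc_random_page_cost = Min(spc_seq_page_cost +
                                   spc_random_page_cost *
                                   (1.0 - baserel->allvisfrac),
                                   spc_random_page_cost);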

With the attached POC for this approach (based on the current master
branch), the index only scan, which previously stopped being chosen when
approximately 50% of the index/table was being scanned, is now chosen up to
about 70%. The execution time for the index only scan still remains well
below that of the sequential scan at that point; nevertheless, this is a
significant improvement IMHO.

Following is the simple SQL that I was using to see how this patch impacts
the index only scan costing and execution time. You may tweak
the table size or the number of rows being scanned for your environment.

SQL:
-
CREATE TABLE index_test AS (SELECT GENERATE_SERIES(1, 100)::int id,
'hello world' AS name);
CREATE INDEX on index_test(id);
VACUUM ANALYZE index_test;
EXPLAIN ANALYZE SELECT COUNT(*) FROM index_test WHERE id < 700;

-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus


poc_indexonly_costing.patch
Description: Binary data