date:20190402

Re: [PATCH v20] GSSAPI encryption support

2019-04-02 Thread Magnus Hagander

On Wed, Apr 3, 2019 at 12:22 AM Joe Conway  wrote:

> On 4/2/19 6:18 PM, Stephen Frost wrote:
> > Greetings,
> >
> > On Tue, Apr 2, 2019 at 18:10 Peter Eisentraut
> >  > > wrote:
> >
> > On 2019-02-23 17:27, Stephen Frost wrote:
> > >> About pg_hba.conf: The "hostgss" keyword seems a bit confusing.
> > It only
> > >> applies to encrypted gss-using connections, not all of them.
> Maybe
> > >> "hostgssenc" or "hostgsswrap"?
> > > Not quite sure what you mean here, but 'hostgss' seems to be quite
> > well
> > > in-line with what we do for SSL...  as in, we have 'hostssl', we
> don't
> > > say 'hostsslenc'.  I feel like I'm just not understanding what you
> > mean
> > > by "not all of them".
> >
> > Reading the latest patch, I think this is still a bit confusing.
> > Consider an entry like
> >
> > hostgss all all 0.0.0.0/0
> >    gss
> >
> > The "hostgss" part means, the connection is GSS-*encrypted*.  The
> "gss"
> > entry in the last column means use gss for *authentication*.  But
> didn't
> > "hostgss" already imply that?  No.  I understand what's going on,
> but it
> > seems quite confusing.  They both just say "gss"; you have to know a
> lot
> > about the nuances of pg_hba.conf processing to get that.
> >
> > If you have line like
> >
> > hostgss all all 0.0.0.0/0
> >    md5
> >
> > it is not obvious that this means, if GSS-encrypted, use md5.  It
> could
> > just as well mean, if GSS-authenticated, use md5.
> >
> > The analogy with SSL is such that we use "hostssl" for connections
> using
> > SSL encryption and "cert" for the authentication method.  So there we
> > use two different words for two different aspects of SSL.
> >
> >
> > I don’t view it as confusing, but I’ll change it to hostgssenc as was
> > suggested earlier to address that concern.  It’s a bit wordy but if it
> > helps reduce confusion then that’s a good thing.
>
> Personally I don't find it as confusing as is either, and I find hostgss
> to be a good analog of hostssl. On the other hand hostgssenc is long and
> unintuitive. So +1 for leaving as is and -1 one for changing it IMHO.
>

I think for those who are well versed in pg_hba (and maybe gss as well),
it's not confusing. That includes me.

However, for a new user, I can definitely see how it can be considered
confusing. And confusion in *security configuration* is always a bad idea,
even if it's just potential.

Thus +1 on changing it.

If it was on the table it might have been better to keep hostgss and change
the authentication method to gssauth or something, but that ship sailed
*years* ago.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/

Re: [HACKERS] generated columns

2019-04-02 Thread Amit Langote

On 2019/04/03 3:54, Erik Rijkers wrote:
> create schema if not exists "tmp";
> CREATE SCHEMA
> create server if not exists "tmpserver" foreign data wrapper file_fdw;
> CREATE SERVER
> drop   foreign table if exists tmp.pg_head cascade;
> DROP FOREIGN TABLE
> create foreign table   tmp.pg_head (
>     "Gene"  text,
>     "Ratio H/L normalized Exp1" numeric
> )
> server tmpserver
> options (
>     delimiter E'\t'
>   , format 'csv'
>   , header 'TRUE'
>   , filename  '/tmp/pg_head.txt'
> );
> CREATE FOREIGN TABLE
> alter foreign table tmp.pg_head
>    add column "Ratio H/L normalized Exp1 Log2 (Generated column)" numeric
> generated always as (case when "Ratio H/L normalized Exp1" > 0 then log(2,
> "Ratio H/L normalized Exp1") else  null end) stored
> ;
> ALTER FOREIGN TABLE
> -- this is OK (although the generated-column values are all empty/null)
> select
>  "Gene"
>    , "Ratio H/L normalized Exp1"
>    , "Ratio H/L normalized Exp1 Log2 (Generated column)"
> from tmp.pg_head
> limit 3 ;
>   Gene  | Ratio H/L normalized Exp1 | Ratio H/L normalized Exp1 Log2
> (Generated column)
> +---+---
> 
>  Dhx9   |   NaN |
>  Gapdh  |   0.42288 |
>  Gm8797 |   0.81352 |
> (3 rows)
> 
> -- but this fails
> select
>     "Gene"
>    , "Ratio H/L normalized Exp1 Log2 (Generated column)"
> from tmp.pg_head
> limit 3 ;
> ERROR:  column "Ratio H/L normalized Exp1 Log2 (Generated column)" is a
> generated column
> DETAIL:  Generated columns cannot be used in COPY.

Fwiw, here's a scenario that fails similarly when using copy, which is
what file_fdw uses internally.

$ cat foo.csv
1

create table foo (a int, b int generated always as (a+1) stored);
copy foo (a, b) from '/tmp/foo.csv';
ERROR:  column "b" is a generated column
DETAIL:  Generated columns cannot be used in COPY.

whereas:

copy foo from '/tmp/foo.csv';
COPY 1

select * from foo;
 a │ b
───┼───
 1 │ 2
(1 row)

Erik said upthread that generated columns should really be disallowed for
(at least) file_fdw foreign tables [1], because (?) they don't support
inserts.  Changing copy.c to "fix the above seems out of the question.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/e61c597ac4541b77750594eea73a774c%40xs4all.nl

Caveats from reloption toast_tuple_target

2019-04-02 Thread Michael Paquier

Hi all,
(Adding Simon as the author of toast_tuple_target, as well Andrew and 
Pavan in CC.)

toast_tuple_target has been introduced in 2017 by c251336 as of v11.
And while reviewing Pavan's patch to have more complex control over
the compression threshold of a tuple, I have bumped into some
surprising code:
https://www.postgresql.org/message-id/20190403044916.gd3...@paquier.xyz

As far as I understand it, even with this option we don't try to toast
tuples in heap_prepare_insert() and heap_update() where
TOAST_TUPLE_THRESHOLD gets used to define if a tuple can be toasted or
not.  The same applies to raw_heap_insert() in rewriteheap.c, and
needs_toast_table() in toasting.c.

Shouldn't we use the reloption instead of the compiled threshold to
determine if a tuple should be toasted or not?  Perhaps I am missing
something?  It seems to me that this is a bug that should be
back-patched, but it could also be qualified as a behavior change for
existing relations.

Thoughts?
--
Michael


signature.asc
Description: PGP signature

Re: Re: A separate table level option to control compression

2019-04-02 Thread Michael Paquier

On Wed, Apr 03, 2019 at 11:40:57AM +0530, Pavan Deolasee wrote:
> On Wed, Apr 3, 2019 at 10:19 AM Michael Paquier  wrote:
>> - The comments in tuptoaster.h need to be updated to outline the
>> difference between the compression invocation and the toast invocation
>> thresholds.  The wording could be better though.
> 
> I see that you've done this already. But please let me if more is needed.

If you have a better idea of wording for that part...  Please feel
free.

> Yeah, this is an issue with the existing code. Even though we allow setting
> toast_tuple_target to a value less than compile-time TOAST_TUPLE_THRESHOLD,
> the code doesn't really honour a value less than TOAST_TUPLE_THRESHOLD. In
> other words, setting toast_tuple_target lesser than TOAST_TUPLE_THRESHOLD
> doesn't have any effect. We don't even create a toast table if the
> estimated length of tuple is not greater than TOAST_TUPLE_THRESHOLD.
>
> The change introduced by this patch will now trigger the tuptoaster code
> when the compression or toast threshold is set to a value lower than
> TOAST_TUPLE_THRESHOLD. But as far as I can see, that doesn't have any bad
> effect on the toasting since toast_insert_or_update() is capable of
> handling the case when the toast table is missing. There will be a
> behavioural change though. e.g.

It seems to me that c251336 should have done all those things from the 
start...  In other terms, isn't that a bug and something that we
should fix and back-patch?  I'll begin a new thread about that to
catch more attention, with Simon and Andrew in CC.
--
Michael


signature.asc
Description: PGP signature

Re: Re: A separate table level option to control compression

2019-04-02 Thread Pavan Deolasee

On Wed, Apr 3, 2019 at 10:19 AM Michael Paquier  wrote:

>
>
> I have been looking at this patch more, and here are some notes:
> - The tests can be really simplified using directly reltoastrelid, so
> I changed the queries this way.  I am aware that the surroundings
> hardcode directly the relation name, but that's not really elegant in
> my opinion.  And I am really tempted to adjust these as well to
> directly use reltoastrelid.
>

I agree using reltoastrelid is a better way. I just followed the
surrounding tests, but +1 to change all of these use reltoastrelid.

> - The docs had a weird indentation.
>

Sorry about that. I was under the impression that it doesn't matter since
the doc builder ultimately chooses the correct indentation. I'd a patch
ready after Sawada's review, but since you already fixed that, I am not
sending it.

> - I think that we should be careful with the portability of
> pg_column_size(), so I have added comparisons instead of the direct
> values in a way which does not change the meaning of the tests nor
> their coverage.
>

Ok, understood.

> - Having RelationGetCompressTupleTarget use directly
> toast_tuple_threshold as default argument is I think kind of
> confusing, so let's use a different static value, named
> COMPRESS_TUPLE_TARGET in the attached.  This is similar to
> TOAST_TUPLE_TARGET for the toast tuple threshold.
>

Sounds good. This also takes care of the other issue you brought about
"hoff" getting subtracted twice.

> - The comments in tuptoaster.h need to be updated to outline the
> difference between the compression invocation and the toast invocation
> thresholds.  The wording could be better though.
>

I see that you've done this already. But please let me if more is needed.

> - Better to avoid comments in the middle of the else/if blocks in my
> opinion.
>

This is probably a personal preference and I've seen many other places
where we do that (see e.g. lazy_scan_heap()). But I don't have any strong
views on this. I see that you've updated comments in tuptoaster.h, so no
problem if you want to remove the comments from the code block.

>
> Also, the previous versions of the patch do that when doing a heap
> insertion (heapam.c and rewriteheap.c):
> +toast_tuple_threshold = RelationGetToastTupleTarget(relation,
> +TOAST_TUPLE_THRESHOLD);
> +compress_tuple_threshold =
> RelationGetCompressTupleTarget(relation,
> +toast_tuple_threshold);
> +compress_tuple_threshold = Min(compress_tuple_threshold,
> +   toast_tuple_threshold);
> [...]
> need_toast = (HeapTupleHasExternal(&oldtup) ||
>   HeapTupleHasExternal(newtup) ||
> -newtup->t_len > TOAST_TUPLE_THRESHOLD);
> +newtup->t_len > compress_tuple_threshold);
>
> This means that the original code always uses the compilation-time,
> default value of toast_tuple_target for all relations.  But this gets
> changed so as we would use the value set at relation level for
> toast_tuple_target if the reloption is changed, without touching
> compress_tuple_threshold.  This is out of the scope of this patch, but
> shouldn't we always use the relation-level value instead of the
> compiled one?  Perhaps there is something I am missing?
>

Yeah, this is an issue with the existing code. Even though we allow setting
toast_tuple_target to a value less than compile-time TOAST_TUPLE_THRESHOLD,
the code doesn't really honour a value less than TOAST_TUPLE_THRESHOLD. In
other words, setting toast_tuple_target lesser than TOAST_TUPLE_THRESHOLD
doesn't have any effect. We don't even create a toast table if the
estimated length of tuple is not greater than TOAST_TUPLE_THRESHOLD.

The change introduced by this patch will now trigger the tuptoaster code
when the compression or toast threshold is set to a value lower than
TOAST_TUPLE_THRESHOLD. But as far as I can see, that doesn't have any bad
effect on the toasting since toast_insert_or_update() is capable of
handling the case when the toast table is missing. There will be a
behavioural change though. e.g.

CREATE TABLE testtab (a text) WITH (toast_tuple_target=256);
INSERT INTO testtab VALUES ;

Earlier this tuple would not have been toasted, even though
toast_tuple_target is set to 256. But with the new code, the tuple will get
toasted. So that's a change, but in the right direction as far as I can
see. Also, this is a bit unrelated to what this patch is trying to achieve.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: COPY FROM WHEN condition

2019-04-02 Thread Andres Freund

On 2019-04-03 18:44:32 +1300, David Rowley wrote:
> (Fixed all of what you mentioned)
> 
> On Wed, 3 Apr 2019 at 06:57, Andres Freund  wrote:
> > > +/*
> > > + * CopyMultiInsertInfo_Flush
> > > + *   Write out all stored tuples in all buffers out to the 
> > > tables.
> > > + *
> > > + * To save us from ending up with buffers for 1000s of partitions we 
> > > remove
> > > + * buffers belonging to partitions that we've seen no tuples for in this 
> > > batch
> >
> > That seems a little naive (imagine you have like 5 partitions, and we
> > constantly cycle through 2-3 of them per batch).  It's probably OK for
> > this version.   I guess I'd only do that cleanup if we're actually
> > flushing due to the number of partitions.
> 
> hmm good point.  It seems like being smarter there would be a good thing.
> 
> I've ended up getting rid of the hash table in favour of the List that
> you mentioned and storing the buffer in ResultRelInfo.


Cool.

> I also changed
> the logic that removes buffers once we reach the limit.  Instead of
> getting rid of buffers that were not used on this run, I've changed it
> so it just gets rid of the buffers starting with the oldest one first,
> but stops once the number of buffers is at the maximum again.  This
> can mean that we end up with MAX_BUFFERED_TUPLES buffers instead of
> MAX_PARTITION_BUFFERS if there is only 1 tuple per buffer.  My current
> thinking is that this does not matter since only 1 slot will be
> allocated per buffer.  We'll remove all of the excess buffers during
> the flush and keep just MAX_PARTITION_BUFFERS of the newest buffers.

Yea, that makes sense to me.


> Also, after changing CopyMultiInsertBuffer to use fixed sized arrays
> instead of allocating them with another palloc the performance has
> improved a bit more.
> 
> Using the perl files mentioned in [1]
> 
> Master + Patched:
> # copy listp from program $$perl ~/bench_same.pl$$ delimiter '|';
> COPY 35651564
> Time: 9106.776 ms (00:09.107)
> # truncate table listp;
> TRUNCATE TABLE
> # copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> COPY 35651564
> Time: 10154.196 ms (00:10.154)
> 
> 
> Master only:
> # copy listp from program $$perl ~/bench_same.pl$$ delimiter '|';
> COPY 35651564
> Time: 22200.535 ms (00:22.201)
> # truncate table listp;
> TRUNCATE TABLE
> # copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> COPY 35651564
> Time: 18592.107 ms (00:18.592)

Awesome. I'm probably too tired to give you meaningful feedback
tonight. But I'll do a quick readthrough.



> +/* Class the buffer full if there are >= this many bytes of tuples stored */
> +#define MAX_BUFFERED_BYTES   65535

Random aside: This seems pretty small (but should be changed separately.


> +/* Trim the list of buffers back down to this number after flushing */
> +#define MAX_PARTITION_BUFFERS32
> +
> +/* Stores multi-insert data related to a single relation in CopyFrom. */
> +typedef struct CopyMultiInsertBuffer
> +{
> + TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
> + ResultRelInfo *resultRelInfo;   /* ResultRelInfo for 'relid' */
> + BulkInsertState bistate;/* BulkInsertState for this rel */
> + int nused;  /* number of 'slots' 
> containing tuples */
> + uint64  linenos[MAX_BUFFERED_TUPLES];   /* Line # of tuple in 
> copy
> + 
>  * stream */
> +} CopyMultiInsertBuffer;

I don't think it's needed now, but you'd probably achieve a bit better
locality by storing slots + linenos in a single array of (slot,lineno).

> +/*
> + * CopyMultiInsertBuffer_Init
> + *   Allocate memory and initialize a new CopyMultiInsertBuffer for 
> this
> + *   ResultRelInfo.
> + */
> +static CopyMultiInsertBuffer *
> +CopyMultiInsertBuffer_Init(ResultRelInfo *rri)
> +{
> + CopyMultiInsertBuffer *buffer;
> +
> + buffer = (CopyMultiInsertBuffer *) 
> palloc(sizeof(CopyMultiInsertBuffer));
> + memset(buffer->slots, 0, sizeof(TupleTableSlot *) * 
> MAX_BUFFERED_TUPLES);

Is there a reason to not just directly palloc0?


> +/*
> + * CopyMultiInsertBuffer_Cleanup
> + *   Drop used slots and free member for this buffer.  The buffer
> + *   must be flushed before cleanup.
> + */
> +static inline void
> +CopyMultiInsertBuffer_Cleanup(CopyMultiInsertBuffer *buffer)
> +{
> + int i;
> +
> + /* Ensure buffer was flushed */
> + Assert(buffer->nused == 0);
> +
> + /* Remove back-link to ourself */
> + buffer->resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
> +
> + ReleaseBulkInsertStatePin(buffer->bistate);

Hm, afaict this still leaks the bistate itself?


Greetings,

Andres Freund

Re: Ordered Partitioned Table Scans

2019-04-02 Thread David Rowley

On Wed, 3 Apr 2019 at 14:01, Amit Langote  wrote:
> I don't have any more comments.

Great. Many thanks for having a look at this.  Going by [1], Julien
also seems pretty happy, so I'm going to change this over to ready for
committer.

[1] 
https://www.postgresql.org/message-id/caobau_yptqbfqcp5jyjzetpl6mgyutwvtvvbzkzkc6xdbtd...@mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: COPY FROM WHEN condition

2019-04-02 Thread David Rowley

(Fixed all of what you mentioned)

On Wed, 3 Apr 2019 at 06:57, Andres Freund  wrote:
> > +/*
> > + * CopyMultiInsertInfo_Flush
> > + *   Write out all stored tuples in all buffers out to the tables.
> > + *
> > + * To save us from ending up with buffers for 1000s of partitions we remove
> > + * buffers belonging to partitions that we've seen no tuples for in this 
> > batch
>
> That seems a little naive (imagine you have like 5 partitions, and we
> constantly cycle through 2-3 of them per batch).  It's probably OK for
> this version.   I guess I'd only do that cleanup if we're actually
> flushing due to the number of partitions.

hmm good point.  It seems like being smarter there would be a good thing.

I've ended up getting rid of the hash table in favour of the List that
you mentioned and storing the buffer in ResultRelInfo. I also changed
the logic that removes buffers once we reach the limit.  Instead of
getting rid of buffers that were not used on this run, I've changed it
so it just gets rid of the buffers starting with the oldest one first,
but stops once the number of buffers is at the maximum again.  This
can mean that we end up with MAX_BUFFERED_TUPLES buffers instead of
MAX_PARTITION_BUFFERS if there is only 1 tuple per buffer.  My current
thinking is that this does not matter since only 1 slot will be
allocated per buffer.  We'll remove all of the excess buffers during
the flush and keep just MAX_PARTITION_BUFFERS of the newest buffers.

Also, after changing CopyMultiInsertBuffer to use fixed sized arrays
instead of allocating them with another palloc the performance has
improved a bit more.

Using the perl files mentioned in [1]

Master + Patched:
# copy listp from program $$perl ~/bench_same.pl$$ delimiter '|';
COPY 35651564
Time: 9106.776 ms (00:09.107)
# truncate table listp;
TRUNCATE TABLE
# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
COPY 35651564
Time: 10154.196 ms (00:10.154)

Master only:
# copy listp from program $$perl ~/bench_same.pl$$ delimiter '|';
COPY 35651564
Time: 22200.535 ms (00:22.201)
# truncate table listp;
TRUNCATE TABLE
# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
COPY 35651564
Time: 18592.107 ms (00:18.592)

> >   if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> >   {
> >   ereport(ERROR,
> > - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > - errmsg("cannot perform FREEZE on a 
> > partitioned table")));
> > + 
> > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > +  errmsg("cannot perform FREEZE on a 
> > partitioned table")));
> >   }
>
> accidental hunk?

It was left over from a pgindent run. Now removed.

[1] 
https://www.postgresql.org/message-id/CAKJS1f98Fa+QRTGKwqbtz0M=cy1ehyr8q-w08cpa78toy4e...@mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

david_tableam_copy_v3.patch
Description: Binary data

Re: pg_basebackup ignores the existing data directory permissions

2019-04-02 Thread Haribabu Kommi

On Fri, Mar 29, 2019 at 9:05 PM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2019-03-26 03:26, Michael Paquier wrote:
> > Do we really want to extend the replication protocol to control that?
>
> Perhaps we are losing sight of the original problem, which is that if
> you create the target directory with the wrong permissions then ... it
> has the wrong permissions.  And you are free to change the permissions
> at any time.  Many of the proposed solutions sound excessively
> complicated relative to that.
>

Yes, I agree that the proposed solution for fixing the original problem of
existing
data directory permissions with new group options to pg_basebackup is a
big.

why can't we just fix the permissions of the directory by default as per the
source instance? I feel with this change, it may be useful to many people
than problem.

>From understanding of the thread discussion,

+1 by:

Michael Paquier
Robert Haas
Haribabu Kommi

-1 by:

Magnus Hagander
Peter Eisentraut

Does any one want to weigh their opinion on this patch to consider best
option for controlling the existing standby data directory permissions.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: New vacuum option to do only freezing

2019-04-02 Thread Masahiko Sawada

On Wed, Apr 3, 2019 at 1:17 PM Kyotaro HORIGUCHI
 wrote:
>
> At Wed, 3 Apr 2019 12:10:03 +0900, Masahiko Sawada  
> wrote in 
> > > And in the following part:
> > >
> > > +   /* Set index cleanup option based on reloptions */
> > > +   if (params->index_cleanup == VACUUM_OPTION_DEFAULT)
> > > +   {
> > > +   if (onerel->rd_options == NULL ||
> > > +   ((StdRdOptions *) 
> > > onerel->rd_options)->vacuum_index_cleanup)
> > > +   params->index_cleanup = VACUUM_OPTION_ENABLED;
> > > +   else
> > > +   params->index_cleanup = VACUUM_OPTION_DISABLED;
> > > +   }
> > > +
> > >
> > > The option should not be false while VACUUM FULL,
> >
> > I think that we need to complain only when INDEX_CLEANUP option is
> > disabled by an explicit option on the VACUUM command and FULL option
> > is specified. It's no problem when vacuum_index_cleanup is false and
> > FULL option is true. Since internally we don't use index cleanup when
> > vacuum full I guess that we don't need to require index_cleanup being
> > always true even when full option is specified.
>
> I know it's safe. It's just about integrity of option values. So
> I don't insist on that.
>
> > > and maybe we
> > > should complain in WARNING or NOTICE that the relopt is ignored.
> >
> > I think when users want to control index cleanup behavior manually
> > they specify INDEX_CLEANUP option on the VACUUM command. So it seems
> > to me that overwriting a reloption by an explicit option would be a
> > natural behavior. I'm concerned that these message would rather
> > confuse users.
>
> If it "cannot be specified with FULL", it seems strange that it's
> safe being specified by reloptions.
>
> I'm rather thinking that INDEX_CLEANUP = false is ignorable even
> being specified with FULL option, aand DISABLE_PAGE_SKIPPING for
> VACUUM FULL shuld be ignored since VACUUM FULL doesn't skip pages
> in the first place.
>
> Couldn't we silence the DISABLE_PAGE_SKIPPING & FULL case instead
> of complaining about INDEX_CLEANUP & FULL? If so, I feel just
> ignoring the relopt cases is reasonable.

Agreed with being silent even when INDEX_CLEANUP/vacuum_index_cleanup
= false and FULL = true. For  DISABLE_PAGE_SKIPPING, it should be a
separate patch and changes the existing behavior. Maybe need other
discussion.

For VacOptTernaryValue part, I've incorporated the review comments but
left the new enum type since it seems to be more straightforward for
now. I might change that if there are other way.

Attached the updated version patches including the
DISABLE_PAGE_SKIPPING part (0003).

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
From 48ad0ab24084833b888b9a0f5a7ce1a55fb24775 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada 
Date: Thu, 7 Mar 2019 09:45:11 +0900
Subject: [PATCH v13 1/3] Add INDEX_CLEANUP option to VACUUM command

If this option is false, VACUUM does HOT-pruning for live tuples but
doesn't remove dead tuples completely and disables index vacuum.

vacrelstats->dead_tuples could have tuples that became dead after
checked at a HOT-pruning time, which are not marked as dead. Per
discussion on pgsql-hackers We normally records and remove them but
with this option we don't process and leave for the next vacuum for
simplifing the code. That's okay because it's very rare condition and
those tuples will be processed by the next vacuum.
---
 doc/src/sgml/ref/create_table.sgml | 15 +++
 doc/src/sgml/ref/vacuum.sgml   | 23 ++
 src/backend/access/common/reloptions.c | 13 +-
 src/backend/access/heap/vacuumlazy.c   | 80 ++
 src/backend/commands/vacuum.c  | 28 
 src/backend/postmaster/autovacuum.c|  1 +
 src/bin/psql/tab-complete.c|  6 ++-
 src/include/commands/vacuum.h  | 15 +++
 src/include/utils/rel.h|  1 +
 src/test/regress/expected/vacuum.out   |  9 
 src/test/regress/sql/vacuum.sql| 10 +
 11 files changed, 181 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 0fcbc66..910db52 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1378,6 +1378,21 @@ WITH ( MODULUS numeric_literal, REM

 

+vacuum_index_cleanup (boolean)
+
+ 
+  Enables or disables index cleanup when VACUUM is
+  run on this table.  The default value is true.
+  Disabling index cleanup can speed of VACUUM very
+  significantly, but may also lead to severely bloated indexes if table
+  modifications are frequent.  The INDEX_CLEANUP
+  parameter to , if specified, overrides
+  the value of this option.
+ 
+
+   
+
+   
 autovacuum_vacuum_threshold, toast.autovacuum_vacuum_threshold (integer)
 
  
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/

Re: Transaction commits VS Transaction commits (with parallel) VS query mean time

2019-04-02 Thread Haribabu Kommi

On Wed, Apr 3, 2019 at 1:59 PM Amit Kapila  wrote:

> On Thu, Mar 28, 2019 at 11:43 AM Haribabu Kommi
>  wrote:
> >
> > On Wed, Mar 27, 2019 at 11:27 PM Amit Kapila 
> wrote:
> >>
> >> On Wed, Mar 27, 2019 at 6:53 AM Haribabu Kommi <
> kommi.harib...@gmail.com> wrote:
> >> > On Tue, Mar 26, 2019 at 9:08 PM Amit Kapila 
> wrote:
> >> >>
> >> >>   As part of this thread, maybe we can
> >> >> just fix the case of the parallel cooperating transaction.
> >> >
> >> >
> >> > With the current patch, all the parallel implementation transaction
> are getting
> >> > skipped, in my tests parallel workers are the major factor in the
> transaction
> >> > stats counter. Even before parallelism, the stats of the autovacuum
> and etc
> >> > are still present but their contribution is not greatly influencing
> the stats.
> >> >
> >> > I agree with you in fixing the stats with parallel workers and
> improve stats.
> >> >
> >>
> >> I was proposing to fix only the transaction started with
> >> StartParallelWorkerTransaction by using is_parallel_worker flag as
> >> discussed above.  I understand that it won't completely fix the
> >> problem reported by you, but it will be a step in that direction.  My
> >> main worry is that if we fix it the way you are proposing and we also
> >> invent a new way to deal with all other internal transactions, then
> >> the fix done by us now needs to be changed/reverted.  Note, that this
> >> fix needs to be backpatched as well, so we should avoid doing any fix
> >> which needs to be changed or reverted.
> >
> >
> > I tried the approach as your suggested as by not counting the actual
> parallel work
> > transactions by just releasing the resources without touching the
> counters,
> > the counts are not reduced much.
> >
> > HEAD - With 4 parallel workers running query generates 13 stats ( 4 * 3
> + 1)
> > Patch -  With 4 parallel workers running query generates 9 stats ( 4 * 2
> + 1)
> > Old approach patch - With 4 parallel workers running query generates 1
> stat (1)
> >
> > Currently the parallel worker start transaction 3 times in the following
> places.
> > 1. InitPostgres
> > 2. ParallelWorkerMain (2)
> >
> > with the attached patch, we reduce one count from ParallelWorkerMain.
> >
>
> Right, that is expected from this fix.  BTW, why you have changed the
> approach in this patch to not count anything by the parallel worker as
> compared to the earlier version where you were just ignoring the stats
> for transactions.  As of now, either way is fine, but in future (after
> parallel inserts/updates), we want other stats to be updated.  I think
> your previous idea was better, just you need to start using
> is_parallel_worker flag.
>

Thanks for the review.

While changing the approach to use the is_parallel_worker_flag, I thought
that the rest of the stats are also not required to be updated and also
those
are any way write operations and those values are zero anyway for parallel
workers.

Instead of expanding the patch scope, I just changed to avoid the commit
or rollback stats as discussed, and later we can target the handling of all
the
internal transactions and their corresponding stats.

Regards,
Haribabu Kommi
Fujitsu Australia


0001-Avoid-counting-parallel-worker-start-transactions_v4.patch
Description: Binary data

Re: Re: A separate table level option to control compression

2019-04-02 Thread Michael Paquier

On Tue, Apr 02, 2019 at 02:35:19PM +0900, Michael Paquier wrote:
> +   compress_tuple_threshold = RelationGetCompressTupleTarget(relation,
> +   toast_tuple_threshold);
> +   compress_tuple_threshold = Min(compress_tuple_threshold,
> +  toast_tuple_threshold);
> All the callers of RelationGetCompressTupleTarget directly compile the
> minimum between compress_tuple_threshold and toast_tuple_threshold,
> and also call RelationGetToastTupleTarget() beforehand.  Wouldn't it
> be better to merge all that in a common routine?  The same calculation
> method is duplicated 5 times.

I have been looking at this patch more, and here are some notes:
- The tests can be really simplified using directly reltoastrelid, so
I changed the queries this way.  I am aware that the surroundings
hardcode directly the relation name, but that's not really elegant in
my opinion.  And I am really tempted to adjust these as well to
directly use reltoastrelid.
- The docs had a weird indentation.
- I think that we should be careful with the portability of
pg_column_size(), so I have added comparisons instead of the direct
values in a way which does not change the meaning of the tests nor
their coverage.
- Having RelationGetCompressTupleTarget use directly
toast_tuple_threshold as default argument is I think kind of
confusing, so let's use a different static value, named
COMPRESS_TUPLE_TARGET in the attached.  This is similar to
TOAST_TUPLE_TARGET for the toast tuple threshold.
- The comments in tuptoaster.h need to be updated to outline the
difference between the compression invocation and the toast invocation
thresholds.  The wording could be better though.
- Better to avoid comments in the middle of the else/if blocks in my
opinion.

Also, the previous versions of the patch do that when doing a heap
insertion (heapam.c and rewriteheap.c):
+toast_tuple_threshold = RelationGetToastTupleTarget(relation,
+TOAST_TUPLE_THRESHOLD);
+compress_tuple_threshold = RelationGetCompressTupleTarget(relation,
+toast_tuple_threshold);
+compress_tuple_threshold = Min(compress_tuple_threshold,
+   toast_tuple_threshold);
[...]
need_toast = (HeapTupleHasExternal(&oldtup) ||
  HeapTupleHasExternal(newtup) ||
-newtup->t_len > TOAST_TUPLE_THRESHOLD);
+newtup->t_len > compress_tuple_threshold);

This means that the original code always uses the compilation-time,
default value of toast_tuple_target for all relations.  But this gets
changed so as we would use the value set at relation level for
toast_tuple_target if the reloption is changed, without touching
compress_tuple_threshold.  This is out of the scope of this patch, but
shouldn't we always use the relation-level value instead of the
compiled one?  Perhaps there is something I am missing?
--
Michael
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 0fcbc660b3..acb858fba1 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1343,6 +1343,32 @@ WITH ( MODULUS numeric_literal, REM
 

 
+   
+compress_tuple_target (integer)
+
+ 
+  The compress_tuple_target parameter specifies the minimum tuple length
+  required before we try to compress columns marked as Extended or Main
+  and applies only to new tuples - there is no effect on existing rows.
+  By default this parameter is set to allow at least 4 tuples per block,
+  which with the default blocksize will be 2040 bytes. Valid values are
+  between 128 bytes and (blocksize - header), by default 8160 bytes.
+  If the specified value is greater than
+  toast_tuple_target, then we will use the current
+  setting of toast_tuple_target for
+  compress_tuple_target. Note that the default setting
+  is often close to optimal. If the value is set too low then the
+  TOAST may get invoked too often. If the
+  compressibility of the field values is not good, then compression and
+  decompression can add significant computation overhead without
+  corresponding savings in storage consumption.
+ 
+ 
+  This parameter cannot be set for TOAST tables.
+ 
+
+   
+

 parallel_workers (integer)
 
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 5df987f9c9..cfa0af571d 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -426,14 +426,17 @@ bytes regardless of the actual size of the represented value.
 
 The TOAST management code is triggered only
 when a row value to be stored in a table is wider than
-TOAST_TUPLE_THRESHOLD bytes (normally 2 kB).
-The TOAST code will compress and/or move
-field values out-of-line until the row value is shorter than
-TOAST_TUPLE_TARGET bytes (also normally 2 kB, adjustable)
-or

Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-04-02 Thread Pavan Deolasee

On Wed, Mar 27, 2019 at 9:47 AM Masahiko Sawada 
wrote:

>
> The patch looks good to me. There is no comment from me.
>
>
Thanks for your review! Updated patch attached since patch failed to apply
after recent changes in the master.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


copy_freeze_v5.patch
Description: Binary data

Re: Strange coding in _mdfd_openseg()

2019-04-02 Thread Kyotaro HORIGUCHI

At Wed, 3 Apr 2019 17:14:36 +1300, Thomas Munro  wrote 
in 
> Hello,
> 
> I think the following conditional code is misleading, and I wonder if
> it would be better like so:
> 
> --- a/src/backend/storage/smgr/md.c
> +++ b/src/backend/storage/smgr/md.c
> @@ -1787,8 +1787,13 @@ _mdfd_openseg(SMgrRelation reln, ForkNumber
> forknum, BlockNumber segno,
> if (fd < 0)
> return NULL;
> 
> -   if (segno <= reln->md_num_open_segs[forknum])
> -   _fdvec_resize(reln, forknum, segno + 1);
> +   /*
> +* Segments are always opened in order from lowest to highest,
> so we must
> +* be adding a new one at the end.
> +*/
> +   Assert(segno == reln->md_num_open_segs[forknum]);
> +
> +   _fdvec_resize(reln, forknum, segno + 1);
> 
> /* fill the entry */
> v = &reln->md_seg_fds[forknum][segno];
> 
> I think the condition is always true, and with == it would also always
> be true.  If that weren't the case, the call to _fdvec_resize() code
> would effectively leak vfds.

I may be missing something, but it seems possible that
_mdfd_getseg calls it with segno > opensegs.

| for (nextsegno = reln->md_num_open_segs[forknum];
|  nextsegno <= targetseg; nextsegno++)
| ...
| v = _mdfd_openseg(reln, forknum, nextsegno, flags);


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: ToDo: show size of partitioned table

2019-04-02 Thread Kyotaro HORIGUCHI

Hello.

At Wed, 3 Apr 2019 12:55:06 +0900, Amit Langote  
wrote in 
> On 2019/04/03 12:02, Kyotaro HORIGUCHI wrote:
> > \dPn doesn't show children because the are of 'r' relkind. And
> > \dPn * doesn't show type for children.
...
> I think it's intentional that leaf partitions are not shown, because the
> patch is meant to allow showing only the "partitioned" members of
> partition trees.  Regular \d[t|i][+] shows normal tables ('r', 'i', etc.)
> including their size, but it's useless for partitioned tables ('P', 'I',
> etc.) as far as showing the size is concerned, so the patch.  Even \dPn is
> meant to only show partitions that are themselves partitioned; note the
> "P" in the command.

+If the modifier n (nested) is used,
+then non-root partitioned tables are included, and a column is shown
+displaying the parent of each partitioned relation.

Ah. I see. "non-root *partitined* tables". I misread the
phrase. Sorry for the noise.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: New vacuum option to do only freezing

2019-04-02 Thread Kyotaro HORIGUCHI

At Wed, 3 Apr 2019 12:10:03 +0900, Masahiko Sawada  
wrote in 
> > And in the following part:
> >
> > +   /* Set index cleanup option based on reloptions */
> > +   if (params->index_cleanup == VACUUM_OPTION_DEFAULT)
> > +   {
> > +   if (onerel->rd_options == NULL ||
> > +   ((StdRdOptions *) 
> > onerel->rd_options)->vacuum_index_cleanup)
> > +   params->index_cleanup = VACUUM_OPTION_ENABLED;
> > +   else
> > +   params->index_cleanup = VACUUM_OPTION_DISABLED;
> > +   }
> > +
> >
> > The option should not be false while VACUUM FULL,
> 
> I think that we need to complain only when INDEX_CLEANUP option is
> disabled by an explicit option on the VACUUM command and FULL option
> is specified. It's no problem when vacuum_index_cleanup is false and
> FULL option is true. Since internally we don't use index cleanup when
> vacuum full I guess that we don't need to require index_cleanup being
> always true even when full option is specified.

I know it's safe. It's just about integrity of option values. So
I don't insist on that.

> > and maybe we
> > should complain in WARNING or NOTICE that the relopt is ignored.
> 
> I think when users want to control index cleanup behavior manually
> they specify INDEX_CLEANUP option on the VACUUM command. So it seems
> to me that overwriting a reloption by an explicit option would be a
> natural behavior. I'm concerned that these message would rather
> confuse users.

If it "cannot be specified with FULL", it seems strange that it's
safe being specified by reloptions.

I'm rather thinking that INDEX_CLEANUP = false is ignorable even
being specified with FULL option, aand DISABLE_PAGE_SKIPPING for
VACUUM FULL shuld be ignored since VACUUM FULL doesn't skip pages
in the first place.

Couldn't we silence the DISABLE_PAGE_SKIPPING & FULL case instead
of complaining about INDEX_CLEANUP & FULL? If so, I feel just
ignoring the relopt cases is reasonable.

Yeah, perhaps I'm warrying too much.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Strange coding in _mdfd_openseg()

2019-04-02 Thread Thomas Munro

Hello,

I think the following conditional code is misleading, and I wonder if
it would be better like so:

--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -1787,8 +1787,13 @@ _mdfd_openseg(SMgrRelation reln, ForkNumber
forknum, BlockNumber segno,
if (fd < 0)
return NULL;

-   if (segno <= reln->md_num_open_segs[forknum])
-   _fdvec_resize(reln, forknum, segno + 1);
+   /*
+* Segments are always opened in order from lowest to highest,
so we must
+* be adding a new one at the end.
+*/
+   Assert(segno == reln->md_num_open_segs[forknum]);
+
+   _fdvec_resize(reln, forknum, segno + 1);

/* fill the entry */
v = &reln->md_seg_fds[forknum][segno];

I think the condition is always true, and with == it would also always
be true.  If that weren't the case, the call to _fdvec_resize() code
would effectively leak vfds.

-- 
Thomas Munro
https://enterprisedb.com

Re: ToDo: show size of partitioned table

2019-04-02 Thread Amit Langote

Horiguchi-san,

Thanks for taking a look.

On 2019/04/03 12:02, Kyotaro HORIGUCHI wrote:
> \dPn doesn't show children because the are of 'r' relkind. And
> \dPn * doesn't show type for children.
> 
> create table pa (a int, b int) partition by range (a);
> create table c1 partition of pa for values from (0) to (10);
> create table c2 partition of pa for values from (10) to (20);
> create index on pa(a);
> 
> postgres=# \dPn *
>   List of partitioned tables and indexes
>  Schema |   Name   |  Owner   |   Type| Parent name | On table 
> +--+--+---+-+--
>  public | pa   | horiguti | partitioned table | | 
>  public | pa_a_idx | horiguti | partitioned index | | pa
> (2 rows)
> 
> 
> With the attached on top of the latest patch, we get the following result.
> 
> postgres=# \dPn *
>   List of partitioned tables and indexes
>  Schema |   Name   |  Owner   |   Type| Parent name | On table 
> +--+--+---+-+--
>  public | c1   | horiguti | partition table   | pa  | 
>  public | c1_a_idx | horiguti | partition index   | pa_a_idx| c1
>  public | c2   | horiguti | partition table   | pa  | 
>  public | c2_a_idx | horiguti | partition index   | pa_a_idx| c2
>  public | pa   | horiguti | partitioned table | | 
>  public | pa_a_idx | horiguti | partitioned index | | pa
> (6 rows)

I think it's intentional that leaf partitions are not shown, because the
patch is meant to allow showing only the "partitioned" members of
partition trees.  Regular \d[t|i][+] shows normal tables ('r', 'i', etc.)
including their size, but it's useless for partitioned tables ('P', 'I',
etc.) as far as showing the size is concerned, so the patch.  Even \dPn is
meant to only show partitions that are themselves partitioned; note the
"P" in the command.

Thanks,
Amit

Re: New vacuum option to do only freezing

2019-04-02 Thread Masahiko Sawada

On Wed, Apr 3, 2019 at 10:56 AM Kyotaro HORIGUCHI
 wrote:
>
> Hello.
>
> At Mon, 1 Apr 2019 14:26:15 +0900, Masahiko Sawada  
> wrote in 
> > On Sat, Mar 30, 2019 at 5:04 AM Robert Haas  wrote:
> > >
> > > On Fri, Mar 29, 2019 at 12:27 PM Masahiko Sawada  
> > > wrote:
> > > > Yeah, but since multiple relations might be specified in VACUUM
> > > > command we need to process index_cleanup option after opened each
> > > > relations. Maybe we need to process all options except for
> > > > INDEX_CLEANUP in ExecVacuum() and pass VacuumStmt down to vacuum_rel()
> > > > and process it  in manner of you suggested after opened the relation.
> > > > Is that right?
> > >
> > > Blech, no, let's not do that.  We'd better use some method that can
> > > indicate yes/no/default.  Something like psql's trivalue thingy, but
> > > probably not exactly that.  We can define an enum for this purpose,
> > > for example - VACUUM_INDEX_CLEANUP_{ENABLED,DISABLED,DEFAULT}.  Or
> > > maybe there's some other way.  But let's not pass bits of the parse
> > > tree around any more than really required.
> >
> > I've defined an enum VacOptTernaryValue representing
> > enabled/disabled/default and added index_cleanup variable as the new
>

Thank you for reviewing the patch!

> It is defined as ENABLED=0, DISABLED=1, DEFAULT=2. At leat
> ENABLED=0 and DISABLED=1 are misleading.

Indeed, will fix.

>
> > enum type to VacuumParams. The vacuum options that uses the reloption
> > value as the default value such as index cleanup and new truncation
> > option can use this enum and set either enabled or disabled after
> > opened the relation when it’s set to default. Please find the attached
> > patches.
>
> +static VacOptTernaryValue get_vacopt_ternary_value(DefElem *def);
>
> This is actually a type converter of boolean. It is used to read
> VACUUM option but not used to read reloption. It seems useless.
>
>
> Finally the ternary value is determined to true or false before
> use. It is simple that index_cleanup finaly be read as bool. We
> could add another boolean to indicate that the value is set or
> not, but I think it would be better that the ternary type is a
> straightfoward expansion of bool.{DEFAULT = -1, DISABLED = 0,
> ENABLED = 1} and make sure that index_cleanup is not DEFAULT at a
> certain point.
>
> So, how about this?
>
> #define VACOPT_TERNARY_DEFAULT -1
> typedef int VacOptTernaryValue;  /* -1 is undecided, 0 is false, 1 is true */

Hmm, if we do that we set either VAOPT_TERNARY_DEFAULT, true or false
to index_cleanup, but I'm not sure this is a good approach. I think we
would want VACOPT_TERNARY_TRUE and VACOPT_TERNARY_FALSE as we defined
new type as a ternary value and already have VACOPT_TERNARY_DEFAULT.

>
> /* No longer the value mustn't be left DEFAULT */
> Assert (params->index_cleanup != VACOPT_TERNARY_DEFAULT);

Agreed, will add it.

>
>
> > > > > I think it would be better to just ignore the INDEX_CLEANUP option
> > > > > when FULL is specified.
> > > >
> > > > Okay, but why do we ignore that in this case while we complain in the
> > > > case of FULL and DISABLE_PAGE_SKIPPING?
> > >
> > > Well, that's a fair point, I guess.  If we go that that route, we'll
> > > need to make sure that setting the reloption doesn't prevent VACUUM
> > > FULL from working -- the complaint must only affect an explicit option
> > > on the VACUUM command line.  I think we should have a regression test
> > > for that.
> >
> > I've added regression tests. Since we check it before setting
> > index_cleanup based on reloptions we doesn't prevent VAUCUM FULL from
> > working even when the vacuum_index_cleanup is false.
>
> +  errmsg("VACUUM option INDEX_CLEANUP cannot be set to false with FULL")));
>
> I'm not one to talk on this, but this seems somewhat confused.
>
> "VACUUM option INDEX_CLEANUP cannot be set to false with FULL being specified"
>
> or
>
> "INDEX_CLEANUP cannot be disabled for VACUUM FULL"?

I prefer the former, will fix.

>
>
> And in the following part:
>
> +   /* Set index cleanup option based on reloptions */
> +   if (params->index_cleanup == VACUUM_OPTION_DEFAULT)
> +   {
> +   if (onerel->rd_options == NULL ||
> +   ((StdRdOptions *) 
> onerel->rd_options)->vacuum_index_cleanup)
> +   params->index_cleanup = VACUUM_OPTION_ENABLED;
> +   else
> +   params->index_cleanup = VACUUM_OPTION_DISABLED;
> +   }
> +
>
> The option should not be false while VACUUM FULL,

I think that we need to complain only when INDEX_CLEANUP option is
disabled by an explicit option on the VACUUM command and FULL option
is specified. It's no problem when vacuum_index_cleanup is false and
FULL option is true. Since internally we don't use index cleanup when
vacuum full I guess that we don't need to require index_cleanup being
always true even when full option is specified.

> and maybe we
> should complain in WARNING or NOTICE that the relopt is ignor

Re: ToDo: show size of partitioned table

2019-04-02 Thread Kyotaro HORIGUCHI

At Tue, 2 Apr 2019 22:49:25 -0300, Alvaro Herrera  
wrote in <20190403014925.GA9976@alvherre.pgsql>
> Hi Amit
> 
> On 2019-Apr-03, Amit Langote wrote:
> 
> > Thanks.  I'm getting an error while applying v12:
> > 
> > patching file src/bin/psql/po/es.po
> > Hunk #1 FAILED at 12.
> > 1 out of 2 hunks FAILED -- saving rejects to file src/bin/psql/po/es.po.rej
> 
> Uh, I certainly did not intend to send the es.po part of the patch, I
> was just testing that the translation worked as intended.  That's what I
> get for doing stuff till the last minute ...

\dPn doesn't show children because the are of 'r' relkind. And
\dPn * doesn't show type for children.

create table pa (a int, b int) partition by range (a);
create table c1 partition of pa for values from (0) to (10);
create table c2 partition of pa for values from (10) to (20);
create index on pa(a);

postgres=# \dPn *
  List of partitioned tables and indexes
 Schema |   Name   |  Owner   |   Type| Parent name | On table 
+--+--+---+-+--
 public | pa   | horiguti | partitioned table | | 
 public | pa_a_idx | horiguti | partitioned index | | pa
(2 rows)


With the attached on top of the latest patch, we get the following result.

postgres=# \dPn *
  List of partitioned tables and indexes
 Schema |   Name   |  Owner   |   Type| Parent name | On table 
+--+--+---+-+--
 public | c1   | horiguti | partition table   | pa  | 
 public | c1_a_idx | horiguti | partition index   | pa_a_idx| c1
 public | c2   | horiguti | partition table   | pa  | 
 public | c2_a_idx | horiguti | partition index   | pa_a_idx| c2
 public | pa   | horiguti | partitioned table | | 
 public | pa_a_idx | horiguti | partitioned index | | pa
(6 rows)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 736cca33d6..5f97bfc4b2 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -3954,7 +3954,11 @@ listPartitions(const char *pattern, bool verbose, bool show_indexes,
 		}
 	}
 
-	appendPQExpBuffer(&buf, "\nWHERE c.relkind IN (%s)", relkind_str);
+	if (show_nested)
+		appendPQExpBuffer(&buf, "\nWHERE (c.relkind IN (%s) OR c3.relkind IN (%s))", relkind_str, relkind_str);
+	else
+		appendPQExpBuffer(&buf, "\nWHERE c.relkind IN (%s)", relkind_str);
+		
 	appendPQExpBufferStr(&buf, !show_nested ? " AND NOT c.relispartition\n" : "\n");
 
 	if (!pattern)
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 5f97bfc4b2..41075f9a67 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -3864,9 +3864,13 @@ listPartitions(const char *pattern, bool verbose, bool show_indexes,
 	  ",\n  CASE c.relkind"
 	  " WHEN " CppAsString2(RELKIND_PARTITIONED_TABLE) " THEN '%s'"
 	  " WHEN " CppAsString2(RELKIND_PARTITIONED_INDEX) " THEN '%s'"
+	  " WHEN " CppAsString2(RELKIND_RELATION) " THEN '%s'"
+	  " WHEN " CppAsString2(RELKIND_INDEX) " THEN '%s'"
 	  " END as \"%s\"",
 	  gettext_noop("partitioned table"),
 	  gettext_noop("partitioned index"),
+	  gettext_noop("partition table"),
+	  gettext_noop("partition index"),
 	  gettext_noop("Type"));
 
 		translate_columns[3] = true;

Re: Transaction commits VS Transaction commits (with parallel) VS query mean time

2019-04-02 Thread Amit Kapila

On Thu, Mar 28, 2019 at 11:43 AM Haribabu Kommi
 wrote:
>
> On Wed, Mar 27, 2019 at 11:27 PM Amit Kapila  wrote:
>>
>> On Wed, Mar 27, 2019 at 6:53 AM Haribabu Kommi  
>> wrote:
>> > On Tue, Mar 26, 2019 at 9:08 PM Amit Kapila  
>> > wrote:
>> >>
>> >>   As part of this thread, maybe we can
>> >> just fix the case of the parallel cooperating transaction.
>> >
>> >
>> > With the current patch, all the parallel implementation transaction are 
>> > getting
>> > skipped, in my tests parallel workers are the major factor in the 
>> > transaction
>> > stats counter. Even before parallelism, the stats of the autovacuum and etc
>> > are still present but their contribution is not greatly influencing the 
>> > stats.
>> >
>> > I agree with you in fixing the stats with parallel workers and improve 
>> > stats.
>> >
>>
>> I was proposing to fix only the transaction started with
>> StartParallelWorkerTransaction by using is_parallel_worker flag as
>> discussed above.  I understand that it won't completely fix the
>> problem reported by you, but it will be a step in that direction.  My
>> main worry is that if we fix it the way you are proposing and we also
>> invent a new way to deal with all other internal transactions, then
>> the fix done by us now needs to be changed/reverted.  Note, that this
>> fix needs to be backpatched as well, so we should avoid doing any fix
>> which needs to be changed or reverted.
>
>
> I tried the approach as your suggested as by not counting the actual parallel 
> work
> transactions by just releasing the resources without touching the counters,
> the counts are not reduced much.
>
> HEAD - With 4 parallel workers running query generates 13 stats ( 4 * 3  + 1)
> Patch -  With 4 parallel workers running query generates 9 stats ( 4 * 2 + 1)
> Old approach patch - With 4 parallel workers running query generates 1 stat 
> (1)
>
> Currently the parallel worker start transaction 3 times in the following 
> places.
> 1. InitPostgres
> 2. ParallelWorkerMain (2)
>
> with the attached patch, we reduce one count from ParallelWorkerMain.
>

Right, that is expected from this fix.  BTW, why you have changed the
approach in this patch to not count anything by the parallel worker as
compared to the earlier version where you were just ignoring the stats
for transactions.  As of now, either way is fine, but in future (after
parallel inserts/updates), we want other stats to be updated.  I think
your previous idea was better, just you need to start using
is_parallel_worker flag.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: New vacuum option to do only freezing

2019-04-02 Thread Kyotaro HORIGUCHI

Hello.

At Mon, 1 Apr 2019 14:26:15 +0900, Masahiko Sawada  
wrote in 
> On Sat, Mar 30, 2019 at 5:04 AM Robert Haas  wrote:
> >
> > On Fri, Mar 29, 2019 at 12:27 PM Masahiko Sawada  
> > wrote:
> > > Yeah, but since multiple relations might be specified in VACUUM
> > > command we need to process index_cleanup option after opened each
> > > relations. Maybe we need to process all options except for
> > > INDEX_CLEANUP in ExecVacuum() and pass VacuumStmt down to vacuum_rel()
> > > and process it  in manner of you suggested after opened the relation.
> > > Is that right?
> >
> > Blech, no, let's not do that.  We'd better use some method that can
> > indicate yes/no/default.  Something like psql's trivalue thingy, but
> > probably not exactly that.  We can define an enum for this purpose,
> > for example - VACUUM_INDEX_CLEANUP_{ENABLED,DISABLED,DEFAULT}.  Or
> > maybe there's some other way.  But let's not pass bits of the parse
> > tree around any more than really required.
> 
> I've defined an enum VacOptTernaryValue representing
> enabled/disabled/default and added index_cleanup variable as the new

It is defined as ENABLED=0, DISABLED=1, DEFAULT=2. At leat
ENABLED=0 and DISABLED=1 are misleading. 

> enum type to VacuumParams. The vacuum options that uses the reloption
> value as the default value such as index cleanup and new truncation
> option can use this enum and set either enabled or disabled after
> opened the relation when it’s set to default. Please find the attached
> patches.

+static VacOptTernaryValue get_vacopt_ternary_value(DefElem *def);

This is actually a type converter of boolean. It is used to read
VACUUM option but not used to read reloption. It seems useless.


Finally the ternary value is determined to true or false before
use. It is simple that index_cleanup finaly be read as bool. We
could add another boolean to indicate that the value is set or
not, but I think it would be better that the ternary type is a
straightfoward expansion of bool.{DEFAULT = -1, DISABLED = 0,
ENABLED = 1} and make sure that index_cleanup is not DEFAULT at a
certain point.

So, how about this?

#define VACOPT_TERNARY_DEFAULT -1
typedef int VacOptTernaryValue;  /* -1 is undecided, 0 is false, 1 is true */

/* No longer the value mustn't be left DEFAULT */
Assert (params->index_cleanup != VACOPT_TERNARY_DEFAULT);


> > > > I think it would be better to just ignore the INDEX_CLEANUP option
> > > > when FULL is specified.
> > >
> > > Okay, but why do we ignore that in this case while we complain in the
> > > case of FULL and DISABLE_PAGE_SKIPPING?
> >
> > Well, that's a fair point, I guess.  If we go that that route, we'll
> > need to make sure that setting the reloption doesn't prevent VACUUM
> > FULL from working -- the complaint must only affect an explicit option
> > on the VACUUM command line.  I think we should have a regression test
> > for that.
> 
> I've added regression tests. Since we check it before setting
> index_cleanup based on reloptions we doesn't prevent VAUCUM FULL from
> working even when the vacuum_index_cleanup is false.

+  errmsg("VACUUM option INDEX_CLEANUP cannot be set to false with FULL")));

I'm not one to talk on this, but this seems somewhat confused.

"VACUUM option INDEX_CLEANUP cannot be set to false with FULL being specified"

or

"INDEX_CLEANUP cannot be disabled for VACUUM FULL"?


And in the following part:

+   /* Set index cleanup option based on reloptions */
+   if (params->index_cleanup == VACUUM_OPTION_DEFAULT)
+   {
+   if (onerel->rd_options == NULL ||
+   ((StdRdOptions *) 
onerel->rd_options)->vacuum_index_cleanup)
+   params->index_cleanup = VACUUM_OPTION_ENABLED;
+   else
+   params->index_cleanup = VACUUM_OPTION_DISABLED;
+   }
+

The option should not be false while VACUUM FULL, and maybe we
should complain in WARNING or NOTICE that the relopt is ignored.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Problem with default partition pruning

2019-04-02 Thread Amit Langote

Hosoya-san,

On 2019/04/02 14:02, Yuzuko Hosoya wrote:
> Hi,
> 
>> Maybe we should have two patches as we seem to be improving two things:
>>
>> 1. Patch to fix problems with default partition pruning originally reported 
>> by Hosoya-san
>>
>> 2. Patch to determine if a given clause contradicts a sub-partitioned 
>> table's partition constraint,
>> fixing problems unearthed by Thibaut's tests
> 
> I attached the latest patches according to Amit comment.
> v3_default_partition_pruning.patch fixes default partition pruning problems
> and ignore_contradictory_where_clauses_at_partprune_step.patch fixes
> sub-partition problems Thibaut tested.

Thanks for dividing patches that way.

Would it be a good idea to add some new test cases to these patches, just
so it's easily apparent what we're changing?

So, we could add the test case presented by Thibaut at the following link
to the default_partition_pruning.patch:

https://www.postgresql.org/message-id/a4968068-6401-7a9c-8bd4-6a3bc9164a86%40dalibo.com

And, another reported at the following link to
ignore_contradictory_where_clauses_at_partprune_step.patch:

https://www.postgresql.org/message-id/bd03f475-30d4-c4d0-3d7f-d2fbde755971%40dalibo.com

Actually, it might be possible/better to construct the test queries in
partition_prune.sql using the existing tables in that script, that is,
without defining new tables just for adding the new test cases.  If not,
maybe it's OK to create the new tables too.

Thanks,
Amit

Re: ToDo: show size of partitioned table

2019-04-02 Thread Alvaro Herrera

Hi Amit

On 2019-Apr-03, Amit Langote wrote:

> Thanks.  I'm getting an error while applying v12:
> 
> patching file src/bin/psql/po/es.po
> Hunk #1 FAILED at 12.
> 1 out of 2 hunks FAILED -- saving rejects to file src/bin/psql/po/es.po.rej

Uh, I certainly did not intend to send the es.po part of the patch, I
was just testing that the translation worked as intended.  That's what I
get for doing stuff till the last minute ...

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Progress reporting for pg_verify_checksums

2019-04-02 Thread Michael Paquier

On Tue, Apr 02, 2019 at 10:10:34AM +0200, Fabien COELHO wrote:
> For online, we should want throttling so that the check can have a reduced
> performance impact when scrubbing.

Yes.  Throttling is a necessary property in my opinion, perhaps in
combination with some autovacuum-like options to only trigger the
checks for a relation after a certain amout of activity has happened
on it.
--
Michael


signature.asc
Description: PGP signature

Re: Progress reporting for pg_verify_checksums

2019-04-02 Thread Michael Paquier

On Tue, Apr 02, 2019 at 04:02:59PM +0200, Michael Banck wrote:
> Can you explain in more detail how this would work? I thought we came to
> the conclusion (and the documentation seems to indicate so), that you
> should stop all participating instances of a cluster and then enable
> checksums on all of them, which would impose a downtime.

That's what I was pointing out here:
https://www.postgresql.org/message-id/20190320225924.gc20...@paquier.xyz

I still agree about keeping in the docs safer recommendations, in the
shape of the ones currently present, than what I mentioned on the
other thread.
--
Michael


signature.asc
Description: PGP signature

Re: Checksum errors in pg_stat_database

2019-04-02 Thread Michael Paquier

On Tue, Apr 02, 2019 at 07:06:35PM +0200, Magnus Hagander wrote:
> I think having the count and hte last time make sense, but I'm very
> sceptical about the rest.

There may be other things which we are not considering on this
thread.  I don't know.

> I can somewhat agree that splitting it  on a per database level might even
> at that be overdoing it. What might actually be more interesting from a
> failure-location perspective would be tablespace, rather than any of the
> others. Or we could reduce it down to just putting it in pg_stat_bgwriter
> and only count global values perhaps, if in the end we don't think the
> split-per-database is reasonable?

A split per database or per tablespace is I think a very good thing.
This helps in tracking down which partitions have gone crazy, and a
single global counter does not allow that.
--
Michael


signature.asc
Description: PGP signature

Re: ToDo: show size of partitioned table

2019-04-02 Thread Amit Langote

Hi Alvaro,

On 2019/04/03 4:42, Alvaro Herrera wrote:
> On 2019-Apr-02, Alvaro Herrera wrote:
> 
>> Something is not right:
> 
> Another thing that was not right is that translated_columns was being
> marked static, and modified in the function; so if you called \dP twice
> where one called for translation and the second not, the second time
> we'd also translating that column.
> 
> Attached contains fixes for those three issues.

Thanks.  I'm getting an error while applying v12:

patching file src/bin/psql/po/es.po
Hunk #1 FAILED at 12.
1 out of 2 hunks FAILED -- saving rejects to file src/bin/psql/po/es.po.rej

Did you accidentally add it to the patch?

Thanks,
Amit

Re: Ordered Partitioned Table Scans

2019-04-02 Thread Amit Langote

On 2019/04/03 3:10, David Rowley wrote:
> On Wed, 3 Apr 2019 at 01:26, Amit Langote  wrote:
>> +#include "nodes/pathnodes.h"
>> ...
>> +partitions_are_ordered(struct RelOptInfo *partrel)
>>
>> Maybe, "struct" is unnecessary?
> 
> I just left it there so that the signature matched the header file.
> Looking around for examples I see make_partition_pruneinfo() has the
> structs only in the header file, so I guess that is how we do things,
> so changed to that in the attached.

Ah, I see.  Thanks for updating the patch.

I don't have any more comments.

Thanks,
Amit

Re: [PATCH v20] GSSAPI encryption support

2019-04-02 Thread Joe Conway

On 4/2/19 6:18 PM, Stephen Frost wrote:
> Greetings,
> 
> On Tue, Apr 2, 2019 at 18:10 Peter Eisentraut
>  > wrote:
> 
> On 2019-02-23 17:27, Stephen Frost wrote:
> >> About pg_hba.conf: The "hostgss" keyword seems a bit confusing. 
> It only
> >> applies to encrypted gss-using connections, not all of them.  Maybe
> >> "hostgssenc" or "hostgsswrap"?
> > Not quite sure what you mean here, but 'hostgss' seems to be quite
> well
> > in-line with what we do for SSL...  as in, we have 'hostssl', we don't
> > say 'hostsslenc'.  I feel like I'm just not understanding what you
> mean
> > by "not all of them".
> 
> Reading the latest patch, I think this is still a bit confusing.
> Consider an entry like
> 
>     hostgss all             all             0.0.0.0/0
>                gss
> 
> The "hostgss" part means, the connection is GSS-*encrypted*.  The "gss"
> entry in the last column means use gss for *authentication*.  But didn't
> "hostgss" already imply that?  No.  I understand what's going on, but it
> seems quite confusing.  They both just say "gss"; you have to know a lot
> about the nuances of pg_hba.conf processing to get that.
> 
> If you have line like
> 
>     hostgss all             all             0.0.0.0/0
>                md5
> 
> it is not obvious that this means, if GSS-encrypted, use md5.  It could
> just as well mean, if GSS-authenticated, use md5.
> 
> The analogy with SSL is such that we use "hostssl" for connections using
> SSL encryption and "cert" for the authentication method.  So there we
> use two different words for two different aspects of SSL.
> 
> 
> I don’t view it as confusing, but I’ll change it to hostgssenc as was
> suggested earlier to address that concern.  It’s a bit wordy but if it
> helps reduce confusion then that’s a good thing.

Personally I don't find it as confusing as is either, and I find hostgss
to be a good analog of hostssl. On the other hand hostgssenc is long and
unintuitive. So +1 for leaving as is and -1 one for changing it IMHO.

Joe
-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Re: [PATCH v20] GSSAPI encryption support

2019-04-02 Thread Stephen Frost

Greetings,

On Tue, Apr 2, 2019 at 18:10 Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2019-02-23 17:27, Stephen Frost wrote:
> >> About pg_hba.conf: The "hostgss" keyword seems a bit confusing.  It only
> >> applies to encrypted gss-using connections, not all of them.  Maybe
> >> "hostgssenc" or "hostgsswrap"?
> > Not quite sure what you mean here, but 'hostgss' seems to be quite well
> > in-line with what we do for SSL...  as in, we have 'hostssl', we don't
> > say 'hostsslenc'.  I feel like I'm just not understanding what you mean
> > by "not all of them".
>
> Reading the latest patch, I think this is still a bit confusing.
> Consider an entry like
>
> hostgss all all 0.0.0.0/0   gss
>
> The "hostgss" part means, the connection is GSS-*encrypted*.  The "gss"
> entry in the last column means use gss for *authentication*.  But didn't
> "hostgss" already imply that?  No.  I understand what's going on, but it
> seems quite confusing.  They both just say "gss"; you have to know a lot
> about the nuances of pg_hba.conf processing to get that.
>
> If you have line like
>
> hostgss all all 0.0.0.0/0   md5
>
> it is not obvious that this means, if GSS-encrypted, use md5.  It could
> just as well mean, if GSS-authenticated, use md5.
>
> The analogy with SSL is such that we use "hostssl" for connections using
> SSL encryption and "cert" for the authentication method.  So there we
> use two different words for two different aspects of SSL.


I don’t view it as confusing, but I’ll change it to hostgssenc as was
suggested earlier to address that concern.  It’s a bit wordy but if it
helps reduce confusion then that’s a good thing.

Thanks,

Stephen

>

Re: [PATCH v20] GSSAPI encryption support

2019-04-02 Thread Peter Eisentraut

On 2019-02-23 17:27, Stephen Frost wrote:
>> About pg_hba.conf: The "hostgss" keyword seems a bit confusing.  It only
>> applies to encrypted gss-using connections, not all of them.  Maybe
>> "hostgssenc" or "hostgsswrap"?
> Not quite sure what you mean here, but 'hostgss' seems to be quite well
> in-line with what we do for SSL...  as in, we have 'hostssl', we don't
> say 'hostsslenc'.  I feel like I'm just not understanding what you mean
> by "not all of them".

Reading the latest patch, I think this is still a bit confusing.
Consider an entry like

hostgss all all 0.0.0.0/0   gss

The "hostgss" part means, the connection is GSS-*encrypted*.  The "gss"
entry in the last column means use gss for *authentication*.  But didn't
"hostgss" already imply that?  No.  I understand what's going on, but it
seems quite confusing.  They both just say "gss"; you have to know a lot
about the nuances of pg_hba.conf processing to get that.

If you have line like

hostgss all all 0.0.0.0/0   md5

it is not obvious that this means, if GSS-encrypted, use md5.  It could
just as well mean, if GSS-authenticated, use md5.

The analogy with SSL is such that we use "hostssl" for connections using
SSL encryption and "cert" for the authentication method.  So there we
use two different words for two different aspects of SSL.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Pluggable Storage - Andres's take

2019-04-02 Thread Andres Freund

Hi,

On 2019-04-02 17:11:07 +1100, Haribabu Kommi wrote:
> From a72cfcd523887f1220473231d7982928acc23684 Mon Sep 17 00:00:00 2001
> From: Hari Babu 
> Date: Tue, 2 Apr 2019 15:41:17 +1100
> Subject: [PATCH 1/2] tableam : doc update of table access methods
> 
> Providing basic explanation of table access methods
> including their structure details and reference heap
> implementation files.
> ---
>  doc/src/sgml/catalogs.sgml |  5 ++--
>  doc/src/sgml/filelist.sgml |  1 +
>  doc/src/sgml/postgres.sgml |  1 +
>  doc/src/sgml/tableam.sgml  | 56 ++
>  4 files changed, 61 insertions(+), 2 deletions(-)
>  create mode 100644 doc/src/sgml/tableam.sgml
> 
> diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
> index f4aabf5dc7..200708e121 100644
> --- a/doc/src/sgml/catalogs.sgml
> +++ b/doc/src/sgml/catalogs.sgml
> @@ -587,8 +587,9 @@
> The catalog pg_am stores information about
> relation access methods.  There is one row for each access method 
> supported
> by the system.
> -   Currently, only indexes have access methods.  The requirements for index
> -   access methods are discussed in detail in .
> +   Currently, only table and indexes have access methods. The requirements 
> for table
> +   access methods are discussed in detail in  and 
> the
> +   requirements for index access methods are discussed in detail in  linkend="indexam"/>.
>

I also adapted pg_am.amtype.


> diff --git a/doc/src/sgml/tableam.sgml b/doc/src/sgml/tableam.sgml
> new file mode 100644
> index 00..9eca52ee70
> --- /dev/null
> +++ b/doc/src/sgml/tableam.sgml
> @@ -0,0 +1,56 @@
> +
> +
> +
> + Table Access Method Interface Definition
> + 
> +  
> +   This chapter defines the interface between the core 
> PostgreSQL
> +   system and access methods, which manage 
> TABLE
> +   types. The core system knows nothing about these access methods beyond
> +   what is specified here, so it is possible to develop entirely new access
> +   method types by writing add-on code.
> +  
> +  
> +  
> +   All Tables in PostgreSQL system are the primary
> +   data store. Each table is stored as its own physical 
> relation
> +   and so is described by an entry in the pg_class
> +   catalog. A table's content is entirely controlled by its access method.
> +   (All the access methods furthermore use the standard page layout described
> +   in .)
> +  

I don't think there's actually any sort of dependency on the page
layout. It's entirely conceivable to write an AM that doesn't use
postgres' shared buffers.

> +  
> +   A table access method handler function must be declared to accept a single
> +   argument of type internal and to return the pseudo-type
> +   table_am_handler.  The argument is a dummy value that simply
> +   serves to prevent handler functions from being called directly from SQL 
> commands.

> +   The result of the function must be a palloc'd struct of type 
> TableAmRoutine,
> +   which contains everything that the core code needs to know to make use of
> +   the table access method.

That's not been correct for a while...


> The TableAmRoutine struct,
> +   also called the access method's API struct, 
> includes
> +   fields specifying assorted fixed properties of the access method, such as
> +   whether it can support bitmap scans.  More importantly, it contains 
> pointers
> +   to support functions for the access method, which do all of the real work 
> to
> +   access tables. These support functions are plain C functions and are not
> +   visible or callable at the SQL level.  The support functions are described
> +   in TableAmRoutine structure. For more details, 
> please
> +   refer the file src/include/access/tableam.h.
> +  

This seems to not have been adapted after copying it from indexam?


I'm still working on this (in particular I think storage.sgml and
probably some other places needs updates to make clear they apply to
heap not generally; I think there needs to be some references to generic
WAL records in tableam.sgml, ...), but I got to run a few errands.

One thing I want to call out is that I made the reference to
src/include/access/tableam.h a link to gitweb. I think that makes it
much more useful to the casual reader. But it also means that, baring
some infrastructure / procedure we don't have, the link will just
continue to point to master. I'm not particularly concerned about that,
but it seems worth pointing out, given that we've only a single link to
gitweb in the sgml docs so far.

Greetings,

Andres Freund
>From 33826b211e3f2d5fe76024e2938ff2eb6aeb00da Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 2 Apr 2019 14:55:30 -0700
Subject: [PATCH] tableam: docs

---
 doc/src/sgml/catalogs.sgml|  9 +-
 doc/src/sgml/config.sgml  | 17 
 doc/src/sgml/filelist.sgml|  1 +
 doc/src/sgml/postgres.sgml|  1 +
 doc/src/sgml/ref/create_access_method.sgml| 14

Re: amcheck verification for GiST

2019-04-02 Thread Peter Geoghegan

On Thu, Mar 28, 2019 at 11:30 PM Andrey Borodin  wrote:
> Here's updated patch with AccessShareLock.
> I've tried to stress this with combination of random inserts, vaccuums and 
> checks. This process neither failed, nor deadlocked.
> The patch intentionally contains one superflous line to make gist logically 
> broken. This triggers regression test of amcheck.

Any thoughts on this, Heikki?

It would be nice to be able to squeeze this into Postgres 12,
especially given that GiST has been enhanced for 12 already.

-- 
Peter Geoghegan

Re: PostgreSQL pollutes the file system

2019-04-02 Thread Fred .Flintstone

It looks like this thread is featured on LWN under the article:
Program names and "pollution".
https://lwn.net/
https://lwn.net/Articles/784508/ (Subscription required)

On Sat, Mar 30, 2019 at 12:27 PM Peter Eisentraut
 wrote:
>
> On 2019-03-29 20:32, Joe Conway wrote:
> >   pg_util  
>
> How is that better than just renaming to pg_$oldname?
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Unified logging system for command-line programs

2019-04-02 Thread Alvaro Herrera

On 2019-Apr-02, Tom Lane wrote:

> Alvaro Herrera  writes:
> > I don't much like the code that does
> >pg_log_error("%s", something);
> 
> > because then the string "%s" is marked for translation.
> 
> Uh, surely we've got hundreds of instances of that in the system already?

Actually, we don't have that many, but there are more than I remembered
there being -- my memory was telling me that I had erradicated them all
in commit 55a70a023c3d but that's sadly misinformed.  Seeing this (and
also because the API would become nastier than I thought it would), I'll
leave this stuff be for now.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Unified logging system for command-line programs

2019-04-02 Thread Tom Lane

Alvaro Herrera  writes:
> I don't much like the code that does
>pg_log_error("%s", something);

> because then the string "%s" is marked for translation.

Uh, surely we've got hundreds of instances of that in the system already?

> Maybe we should
> consider a variant that takes a straight string literal instead of a
> sprintf-style fmt to avoid this problem.  We'd do something like
>pg_log_error_v(something);
> which does not call _() within.

What it looks like that's doing is something similar to appendPQExpBuffer
versus appendPQExpBufferStr, ie, just skipping the overhead of sprintf
format processing when you don't need it.  The implications for
translatability or not are unobvious, so I'm afraid this would result
in confusion and missed translations.

I'm not necessarily against some idea like this, but how do we
separate "translatability" from "sprintf formatting"?

regards, tom lane

Re: Unified logging system for command-line programs

2019-04-02 Thread Alvaro Herrera

I don't much like the code that does

   pg_log_error("%s", something);

because then the string "%s" is marked for translation.  Maybe we should
consider a variant that takes a straight string literal instead of a
sprintf-style fmt to avoid this problem.  We'd do something like

   pg_log_error_v(something);

which does not call _() within.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ToDo: show size of partitioned table

2019-04-02 Thread Alvaro Herrera

On 2019-Apr-02, Alvaro Herrera wrote:

> Something is not right:

Another thing that was not right is that translated_columns was being
marked static, and modified in the function; so if you called \dP twice
where one called for translation and the second not, the second time
we'd also translating that column.

Attached contains fixes for those three issues.

I'm gonna go over the docs now.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index b29e7547c6a..a789042ce5f 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1659,6 +1659,72 @@ testdb=>
 
   
 
+
+  
+\dP[n+] [ pattern ]
+
+
+Lists partitioned relations.  If pattern is specified, only entries
+whose name matches the pattern are listed.  By default, only
+partitioned tables are listed; supply a pattern to also include
+partitioned indexes.  If the form \dP+
+is used, the sum of sizes of table's partitions (including their
+indexes) is also displayed, along with the associated description.
+
+
+
+If the modifier n (nested) is used,
+then non-root partitioned tables are included, and a column is shown
+displaying the parent of each partitioned relation.
+
+If n is combined with +, two
+sizes are shown: one including the total size of directly-attached
+leaf partitions, and another showing the total size of all partitions,
+including indirectly attached sub-partitions.
+
+
+  
+
+
+  
+\dPi[n+] [ pattern ]
+
+
+Lists partitioned indexes. If pattern is specified, only entries
+whose name matches the pattern are listed. If the form
+\dPi+ is used, the sum of sizes of index's
+partitions is also displayed, along with the associated description.
+
+
+
+ If the n modifier is used, non-root partitioned
+ indexes are displayed too.
+
+
+  
+
+
+  
+\dPt[n+] [ pattern ]
+
+
+Lists partitioned tables. If pattern is specified, only entries
+whose name matches the pattern are listed. If the form
+\dPt+ is used, the sum of sizes of table's
+partitions is also displayed, along with the associated description.
+
+
+
+ If the n modifier is used, non-root partitioned
+ tables are displayed too.
+
+
+  
+
+
   
 \drds [ role-pattern [ database-pattern ] ]
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 72188b7f3ef..ce8bee99087 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -782,6 +782,42 @@ exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd)
 			case 'p':
 success = permissionsList(pattern);
 break;
+			case 'P':
+{
+	bool show_nested = strchr(cmd, 'n') ? true : false;
+
+	switch (cmd[2])
+	{
+		case 'i':
+			/* show indexes only */
+			success = listPartitions(pattern, show_verbose,
+	 true, false,
+	 show_nested);
+			break;
+		case 't':
+			/* show tables only */
+			success = listPartitions(pattern, show_verbose,
+	 false, true,
+	 show_nested);
+			break;
+		case '+':
+		case '\0':
+		case 'n':
+			/*
+			 * Show only tables if there is no pattern.  Also
+			 * show indexes if pattern is specified.  Show
+			 * total size if verbose output is specified.
+			 */
+			success = listPartitions(pattern, show_verbose,
+	 false, false,
+	 show_nested);
+			break;
+		default:
+			status = PSQL_CMD_UNKNOWN;
+			break;
+	}
+}
+break;
 			case 'T':
 success = describeTypes(pattern, show_verbose, show_system);
 break;
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 036810303a6..736cca33d6b 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -3774,6 +3774,247 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys
 	return true;
 }
 
+/*
+ * listPartitions()
+ *
+ * handler for \dP[t|i][n]
+ */
+bool
+listPartitions(const char *pattern, bool verbose, bool show_indexes,
+			   bool show_tables, bool show_nested)
+{
+	PQExpBufferData buf;
+	PGresult   *res;
+	printQueryOpt myopt = pset.popt;
+	bool translate_columns[] = {false, false, false, false, false, false, false, false, false};
+	const char *size_function;
+	const char *relkind_str;
+	const char *errmsg_pattern;
+	const char *errmsg_no_pattern;
+	const char *tabletitle;
+	bool		mixed_output = false;
+
+	/*
+	 * Note: Declarative table partitions are only supported as of Pg 10.0.
+	 */

Re: Pluggable Storage - Andres's take

2019-04-02 Thread Andres Freund

Hi,

On 2019-04-02 17:11:07 +1100, Haribabu Kommi wrote:

> +  xreflabel="default_table_access_method">
> +  default_table_access_method 
> (string)
> +  
> +   default_table_access_method configuration 
> parameter
> +  
> +  
> +  
> +   
> +The value is either the name of a table access method, or an empty 
> string
> +to specify using the default table access method of the current 
> database.
> +If the value does not match the name of any existing table access 
> method,
> +PostgreSQL will automatically use the 
> default
> +table access method of the current database.
> +   

Hm, this doesn't strike me as right (there's no such thing as "default
table access method of the current database"). You just get an error in
that case. I think we should simply not allow setting to "" - what's the
point in that?

Greetings,

Andres Freund

Re: ToDo: show size of partitioned table

2019-04-02 Thread Alvaro Herrera

Something is not right:

alvherre=# \dP t
 List of partitioned tables or indexes
 Esquema | Nombre |  Dueño   |   Tipo|   On 
table   
-++--+---+--
 public  | t  | alvherre | partitioned table | Project-Id-Version: psql 
(PostgreSQL) 10+
 ||  |   | Report-Msgid-Bugs-To: 
pgsql-b...@postgresql.org +
 ||  |   | PO-Revision-Date: 2017-07-10 
12:14-0400 +
 ||  |   | Last-Translator: Álvaro 
Herrera+
 ||  |   | Language-Team: PgSQL Español 
+
 ||  |   | Language: es 
   +
 ||  |   | MIME-Version: 1.0
   +
 ||  |   | Content-Type: text/plain; 
charset=UTF-8 +
 ||  |   | Content-Transfer-Encoding: 
8bit +
 ||  |   | Plural-Forms: nplurals=2; 
plural=n != 1;+
 ||  |   | 
(1 fila)


-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ToDo: show size of partitioned table

2019-04-02 Thread Alvaro Herrera

On 2019-Mar-13, Amit Langote wrote:

> + if (show_indexes)
> + {
> + objects_name = gettext_noop("indexes");
...
> + if (pattern)
> + /* translator: objects_name is "index", "table" */
> + psql_error("Did not find any partitioned %s named 
> \"%s\".\n",
> +object_name,
> +pattern);
...
> + /* translator: objects_name is "indexes", "tables" */
> + appendPQExpBuffer(&title, _("List of partitioned %s"), 
> objects_name);
> +

This stuff doesn't work from a translation standpoint; sentence building
is not allowed.  You have to do stuff like:

if (tables)
{
errmsg = gettext_noop("Could not find any partitioned table");
tabletitle = gettext_noop("List of partitioned tables");
}
 ...
if (pattern)
/* translator: objects_name is "index", "table" */
psql_error(gettext(errmsg), object_name, pattern);
  ...
appendPQExpBuffer(&title, gettext(tabletitle));

and so on.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] generated columns

2019-04-02 Thread Erik Rijkers


On 2019-04-02 15:36, Erik Rijkers wrote:

On 2019-04-02 14:43, Peter Eisentraut wrote:

On 2019-04-01 10:52, Peter Eisentraut wrote:

On 2019-03-31 05:49, Erik Rijkers wrote:

STORED in a
file_fdw foreign table still silently creates the column which then
turns out to be useless on SELECT, with an error like:

"ERROR:  column some_column_name is a generated column
DETAIL:  Generated columns cannot be used in COPY."

Maybe it'd be possible to get an error earlier, i.e., while trying 
to

create such a useless column?


I'll look into it.


I've been trying to create a test case for file_fdw for this, but I'm
not getting your result.  Can you send a complete test case?


attached is run_ft.sh  which creates a text file:  /tmp/pg_head.txt
then sets it up as a foreign table, and adds a generated column.

Then selects a succesful select, followed by a error-producing select.

Some selects are succesful but some fail.  I'm not sure why it sometimes 
fails  (it's not just the explicitness of the generated-column-name like 
I suggested earlier).




My output of run_ft.sh is below.


$ ./run_ft.sh
create schema if not exists "tmp";
CREATE SCHEMA
create server if not exists "tmpserver" foreign data wrapper file_fdw;
CREATE SERVER
drop   foreign table if exists tmp.pg_head cascade;
DROP FOREIGN TABLE
create foreign table   tmp.pg_head (
"Gene"  text,
"Ratio H/L normalized Exp1" numeric
)
server tmpserver
options (
delimiter E'\t'
  , format 'csv'
  , header 'TRUE'
  , filename  '/tmp/pg_head.txt'
);
CREATE FOREIGN TABLE
alter foreign table tmp.pg_head
   add column "Ratio H/L normalized Exp1 Log2 (Generated column)" 
numeric generated always as (case when "Ratio H/L normalized Exp1" > 0 
then log(2, "Ratio H/L normalized Exp1") else  null end) stored

;
ALTER FOREIGN TABLE
-- this is OK (although the generated-column values are all empty/null)
select
 "Gene"
   , "Ratio H/L normalized Exp1"
   , "Ratio H/L normalized Exp1 Log2 (Generated column)"
from tmp.pg_head
limit 3 ;
  Gene  | Ratio H/L normalized Exp1 | Ratio H/L normalized Exp1 Log2 
(Generated column)

+---+---
 Dhx9   |   NaN |
 Gapdh  |   0.42288 |
 Gm8797 |   0.81352 |
(3 rows)

-- but this fails
select
"Gene"
   , "Ratio H/L normalized Exp1 Log2 (Generated column)"
from tmp.pg_head
limit 3 ;
ERROR:  column "Ratio H/L normalized Exp1 Log2 (Generated column)" is a 
generated column

DETAIL:  Generated columns cannot be used in COPY.


#!/bin/bash

echo "Gene	Ratio H/L normalized Exp1
Dhx9	NaN
Gapdh	0.42288
Gm8797	0.81352
Aldh2	0.89913
Ccdc12	NaN
Hip1	NaN
Hist1h2aa	0.66911
Tpm2	0.57535
Fasn	NaN
Aldoa	0.61898
Unc13b	NaN
Wrn	0.0050816
Psma1	NaN
Ldha	0.90211
Numa1	NaN" > /tmp/pg_head.txt

psql -Xa << FT_SETUP_TXT

create schema if not exists "tmp";
create server if not exists "tmpserver" foreign data wrapper file_fdw;
drop   foreign table if exists tmp.pg_head cascade;
create foreign table   tmp.pg_head (
"Gene"  text,
"Ratio H/L normalized Exp1" numeric
)
server tmpserver
options (
delimiter E'\t'
  , format 'csv'
  , header 'TRUE'
  , filename  '/tmp/pg_head.txt'
);

alter foreign table tmp.pg_head
   add column "Ratio H/L normalized Exp1 Log2 (Generated column)" numeric generated always as (case when "Ratio H/L normalized Exp1" > 0 then log(2, "Ratio H/L normalized Exp1") else  null end) stored
;

FT_SETUP_TXT

psql -qXa << SQL_TXT

-- this is OK (although the generated-column values are all empty/null)
select 
 "Gene" 
   , "Ratio H/L normalized Exp1"
   , "Ratio H/L normalized Exp1 Log2 (Generated column)"
from tmp.pg_head
limit 3 ;

-- but this fails
select 
"Gene"
   , "Ratio H/L normalized Exp1 Log2 (Generated column)"
from tmp.pg_head
limit 3 ;

SQL_TXT

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-04-02 Thread Alvaro Herrera

I noticed a couple of inconsistent uses of "heap" which I changed to
"table", and also changed one update_multi_params() call that had only
one param to update_param().  Very minor other tweaks here and there.

And pushed.

Thanks for the reviews!

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Ordered Partitioned Table Scans

2019-04-02 Thread David Rowley

On Wed, 3 Apr 2019 at 01:26, Amit Langote  wrote:
> +#include "nodes/pathnodes.h"
> ...
> +partitions_are_ordered(struct RelOptInfo *partrel)
>
> Maybe, "struct" is unnecessary?

I just left it there so that the signature matched the header file.
Looking around for examples I see make_partition_pruneinfo() has the
structs only in the header file, so I guess that is how we do things,
so changed to that in the attached.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

mergeappend_to_append_conversion_v19.patch
Description: Binary data

Re: COPY FROM WHEN condition

2019-04-02 Thread Andres Freund

Gah, ignore this mail - I've somehow accidentally sent this early. I'm
not quite sure, but I miskeyed, and emacs sent it early, without going
through mutt...

Re: [HACKERS] proposal: schema variables

2019-04-02 Thread Pavel Stehule

út 26. 3. 2019 v 6:40 odesílatel Pavel Stehule 
napsal:

> Hi
>
> ne 24. 3. 2019 v 6:57 odesílatel Pavel Stehule 
> napsal:
>
>> Hi
>>
>> rebase against current master
>>
>
>
> fixed issue IF NOT EXISTS & related regress tests
>

another rebase

Regards

Pavel


> Regards
>
> Pavel
>
>
>> Regards
>>
>> Pavel
>>
>


schema-variables-20190402.patch.gz
Description: application/gzip

Re: COPY FROM WHEN condition

2019-04-02 Thread Andres Freund

On 2019-04-03 06:41:49 +1300, David Rowley wrote:
> However, I've ended up not doing it that way as the patch requires
> more than just an array of TupleTableSlots to be stored in the
> ResultRelInfo, it'll pretty much need all of what I have in
> CopyMultiInsertBuffer, which includes line numbers for error
> reporting, a BulkInsertState and a counter to track how many of the
> slots are used.  I had thoughts that we could just tag the whole
> CopyMultiInsertBuffer in ResultRelInfo, but that requires moving it
> somewhere a bit more generic. Another thought would be that we have
> something like "void *extra;" in ResultRelInfo that we can just borrow
> for copy.c.  It may be interesting to try this out to see if it saves
> much in the way of performance.

Hm, we could just forwad declare struct CopyMultiInsertBuffer in a
header, and only define it in copy.c. That doesn't sound insane to me.


> I've got the patch into a working state now and wrote a bunch of
> comments. I've not really done a thorough read back of the code again.
> I normally wait until the light is coming from a different angle
> before doing that, but there does not seem to be much time to wait for
> that in this case, so here's v2.  Performance seems to be about the
> same as what I reported yesterday.

Cool.


> +/*
> + * Multi-Insert handling code for COPY FROM.  Inserts are performed in
> + * batches of up to MAX_BUFFERED_TUPLES.

This should probably be moved to the top of the file.


> + * When COPY FROM is used on a partitioned table, we attempt to maintain
> + * only MAX_PARTITION_BUFFERS buffers at once.  Buffers that are completely
> + * unused in each batch are removed so that we end up not keeping buffers for
> + * partitions that we might not insert into again.
> + */
> +#define MAX_BUFFERED_TUPLES  1000
> +#define MAX_BUFFERED_BYTES   65535
> +#define MAX_PARTITION_BUFFERS15

> +/*
> + * CopyMultiInsertBuffer
> + *   Stores multi-insert data related to a single relation in 
> CopyFrom.
> + */
> +typedef struct

Please don't create anonymous structs - they can't be forward declared,
and some debugging tools handle them worse than named structs. And it
makes naming CopyMultiInsertBuffer in the comment less necessary.


> +{
> + Oid relid;  /* Relation ID this 
> insert data is for */
> + TupleTableSlot **slots; /* Array of MAX_BUFFERED_TUPLES to store
> +  * tuples */
> + uint64 *linenos;/* Line # of tuple in copy stream */

It could make sense to allocate those two together, or even as part of
the CopyMultiInsertBuffer itself, to reduce allocator overhead.


> + else
> + {
> + CopyMultiInsertBuffer *buffer = (CopyMultiInsertBuffer *) 
> palloc(sizeof(CopyMultiInsertBuffer));
> +
> + /* Non-partitioned table.  Just setup the buffer for the table. 
> */
> + buffer->relid = RelationGetRelid(rri->ri_RelationDesc);
> + buffer->slots = palloc0(MAX_BUFFERED_TUPLES * 
> sizeof(TupleTableSlot *));
> + buffer->linenos = palloc(MAX_BUFFERED_TUPLES * sizeof(uint64));
> + buffer->resultRelInfo = rri;
> + buffer->bistate = GetBulkInsertState();
> + buffer->nused = 0;
> + miinfo->multiInsertBufferTab = NULL;
> + miinfo->buffer = buffer;
> + miinfo->nbuffers = 1;
> + }

Can this be moved into a helper function?


> + /*
> +  * heap_multi_insert leaks memory, so switch to short-lived memory 
> context
> +  * before calling it.
> +  */

s/heap_multi_insert/table_multi_insert/



> + /*
> +  * If there are any indexes, update them for all the inserted tuples, 
> and
> +  * run AFTER ROW INSERT triggers.
> +  */
> + if (resultRelInfo->ri_NumIndices > 0)
> + {
> + for (i = 0; i < nBufferedTuples; i++)
> + {
> + List   *recheckIndexes;
> +
> + cstate->cur_lineno = buffer->linenos[i];
> + recheckIndexes =
> + ExecInsertIndexTuples(buffer->slots[i], estate, 
> false, NULL,
> +   NIL);
> + ExecARInsertTriggers(estate, resultRelInfo,
> +  
> buffer->slots[i],
> +  
> recheckIndexes, cstate->transition_capture);
> + list_free(recheckIndexes);
> + }
> + }
> +
> + /*
> +  * There's no indexes, but see if we need to run AFTER ROW INSERT 
> triggers
> +  * anyway.
> +  */
> + else if (resultRelInfo->ri_TrigDesc != NULL &&
> +  (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
> +   resultRelInfo->r

Re: COPY FROM WHEN condition

2019-04-02 Thread Andres Freund

On 2019-04-03 06:41:49 +1300, David Rowley wrote:
> However, I've ended up not doing it that way as the patch requires
> more than just an array of TupleTableSlots to be stored in the
> ResultRelInfo, it'll pretty much need all of what I have in
> CopyMultiInsertBuffer, which includes line numbers for error
> reporting, a BulkInsertState and a counter to track how many of the
> slots are used.  I had thoughts that we could just tag the whole
> CopyMultiInsertBuffer in ResultRelInfo, but that requires moving it
> somewhere a bit more generic. Another thought would be that we have
> something like "void *extra;" in ResultRelInfo that we can just borrow
> for copy.c.  It may be interesting to try this out to see if it saves
> much in the way of performance.

Hm, we could just forwad declare struct CopyMultiInsertBuffer in a
header, and only define it in copy.c. That doesn't sound insane to me.


> I've got the patch into a working state now and wrote a bunch of
> comments. I've not really done a thorough read back of the code again.
> I normally wait until the light is coming from a different angle
> before doing that, but there does not seem to be much time to wait for
> that in this case, so here's v2.  Performance seems to be about the
> same as what I reported yesterday.

Cool.


> +/*
> + * Multi-Insert handling code for COPY FROM.  Inserts are performed in
> + * batches of up to MAX_BUFFERED_TUPLES.

This should probably be moved to the top of the file.


> + * When COPY FROM is used on a partitioned table, we attempt to maintain
> + * only MAX_PARTITION_BUFFERS buffers at once.  Buffers that are completely
> + * unused in each batch are removed so that we end up not keeping buffers for
> + * partitions that we might not insert into again.
> + */
> +#define MAX_BUFFERED_TUPLES  1000
> +#define MAX_BUFFERED_BYTES   65535
> +#define MAX_PARTITION_BUFFERS15

> +/*
> + * CopyMultiInsertBuffer
> + *   Stores multi-insert data related to a single relation in 
> CopyFrom.
> + */
> +typedef struct

Please don't create anonymous structs - they can't be forward declared,
and some debugging tools handle them worse than named structs. And it
makes naming CopyMultiInsertBuffer in the comment less necessary.


> +{
> + Oid relid;  /* Relation ID this 
> insert data is for */
> + TupleTableSlot **slots; /* Array of MAX_BUFFERED_TUPLES to store
> +  * tuples */
> + uint64 *linenos;/* Line # of tuple in copy stream */

It could make sense to allocate those two together, or even as part of
the CopyMultiInsertBuffer itself, to reduce allocator overhead.


> + else
> + {
> + CopyMultiInsertBuffer *buffer = (CopyMultiInsertBuffer *) 
> palloc(sizeof(CopyMultiInsertBuffer));
> +
> + /* Non-partitioned table.  Just setup the buffer for the table. 
> */
> + buffer->relid = RelationGetRelid(rri->ri_RelationDesc);
> + buffer->slots = palloc0(MAX_BUFFERED_TUPLES * 
> sizeof(TupleTableSlot *));
> + buffer->linenos = palloc(MAX_BUFFERED_TUPLES * sizeof(uint64));
> + buffer->resultRelInfo = rri;
> + buffer->bistate = GetBulkInsertState();
> + buffer->nused = 0;
> + miinfo->multiInsertBufferTab = NULL;
> + miinfo->buffer = buffer;
> + miinfo->nbuffers = 1;
> + }

Can this be moved into a helper function?


> + /*
> +  * heap_multi_insert leaks memory, so switch to short-lived memory 
> context
> +  * before calling it.
> +  */

s/heap_multi_insert/table_multi_insert/



> + /*
> +  * If there are any indexes, update them for all the inserted tuples, 
> and
> +  * run AFTER ROW INSERT triggers.
> +  */
> + if (resultRelInfo->ri_NumIndices > 0)
> + {
> + for (i = 0; i < nBufferedTuples; i++)
> + {
> + List   *recheckIndexes;
> +
> + cstate->cur_lineno = buffer->linenos[i];
> + recheckIndexes =
> + ExecInsertIndexTuples(buffer->slots[i], estate, 
> false, NULL,
> +   NIL);
> + ExecARInsertTriggers(estate, resultRelInfo,
> +  
> buffer->slots[i],
> +  
> recheckIndexes, cstate->transition_capture);
> + list_free(recheckIndexes);
> + }
> + }
> +
> + /*
> +  * There's no indexes, but see if we need to run AFTER ROW INSERT 
> triggers
> +  * anyway.
> +  */
> + else if (resultRelInfo->ri_TrigDesc != NULL &&
> +  (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
> +   resultRelInfo->r

Re: COPY FROM WHEN condition

2019-04-02 Thread David Rowley

On Tue, 2 Apr 2019 at 14:11, Andres Freund  wrote:
> Why do we need that list membership check? If you append the
> ResultRelInfo to the list when creating the ResultRelInfo's slots array,
> you don't need to touch the list after a partition lookup - you know
> it's a member if the ResultRelInfo has a slot array.  Then you only need
> to iterate the list when you want to drop slots to avoid using too much
> memory - and that's also a sequential scan of the hash table in your
> approach, right?

Okay, you're right, we could just determine if we've got a new
ResultRelInfo by the lack of any bulk insert slots being set in it.
However, I've ended up not doing it that way as the patch requires
more than just an array of TupleTableSlots to be stored in the
ResultRelInfo, it'll pretty much need all of what I have in
CopyMultiInsertBuffer, which includes line numbers for error
reporting, a BulkInsertState and a counter to track how many of the
slots are used.  I had thoughts that we could just tag the whole
CopyMultiInsertBuffer in ResultRelInfo, but that requires moving it
somewhere a bit more generic. Another thought would be that we have
something like "void *extra;" in ResultRelInfo that we can just borrow
for copy.c.  It may be interesting to try this out to see if it saves
much in the way of performance.

I've got the patch into a working state now and wrote a bunch of
comments. I've not really done a thorough read back of the code again.
I normally wait until the light is coming from a different angle
before doing that, but there does not seem to be much time to wait for
that in this case, so here's v2.  Performance seems to be about the
same as what I reported yesterday.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

david_tableam_copy_v2.patch
Description: Binary data

Re: Checksum errors in pg_stat_database

2019-04-02 Thread Magnus Hagander

On Tue, Apr 2, 2019 at 8:47 AM Michael Paquier  wrote:

> On Tue, Apr 02, 2019 at 07:43:12AM +0200, Julien Rouhaud wrote:
> > On Tue, Apr 2, 2019 at 6:56 AM Michael Paquier 
> wrote:
> >>  One thing which is not
> >> proposed on this patch, and I am fine with it as a first draft, is
> >> that we don't have any information about the broken block number and
> >> the file involved.  My gut tells me that we'd want a separate view,
> >> like pg_stat_checksums_details with one tuple per (dboid, rel, fork,
> >> blck) to be complete.  But that's just for future work.
> >
> > That could indeed be nice.
>
> Actually, backpedaling on this one...  pg_stat_checksums_details may
> be a bad idea as we could finish with one row per broken block.  If
> a corruption is spreading quickly, pgstat would not be able to sustain
> that amount of objects.  Having pg_stat_checksums would allow us to
> plugin more data easily based on the last failure state:
> - last relid of failure
> - last fork type of failure
> - last block number of failure.
> Not saying to do that now, but having that in pg_stat_database does
> not seem very natural to me.  And on top of that we would have an
> extra row full of NULLs for shared objects in pg_stat_database if we
> adopt the unique view approach...  I find that rather ugly.
>

I think that tracking each and every block is of course a non-starter, as
you've noticed.

I'm really not sure how much those three extra fields help, TBH. As I see
it the real usecase for this is automated monitoring and quick-checks of
the kind of "is my db currently broken somewhere", in combination with "did
this occur recently" (for people who have never looked at their stats).

This gives people enough information to know where to go look in the logs.

I mean, what's the actual usecase for tracking relid/fork/block of the
*last* failure only? To monitor and see if it changes? What do I do when I
have 10 failures, and I only know about the last one? (I have to go to the
logs anyway)

I think having the count and hte last time make sense, but I'm very
sceptical about the rest.

I can somewhat agree that splitting it  on a per database level might even
at that be overdoing it. What might actually be more interesting from a
failure-location perspective would be tablespace, rather than any of the
others. Or we could reduce it down to just putting it in pg_stat_bgwriter
and only count global values perhaps, if in the end we don't think the
split-per-database is reasonable?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/

Re: Inadequate executor locking of indexes

2019-04-02 Thread Tom Lane

David Rowley  writes:
> On Thu, 14 Mar 2019 at 21:52, Amit Langote
>  wrote:
>> If the correct lock is taken in both cases by the current code, then maybe
>> there's no need to change anything?

> I'm aiming to fix the problem described by Tom when he started this
> thread. It's pretty easy to get an assert failure in master.

Yeah, removing the Assert failure is a minimum requirement.  Whether
we need to do more than that can be debated.

> I think this borderline impossible to fix completely.

After further review I concur with that position.  The cases where
we have lock upgrade hazards are where some subquery uses a table
in a way that requires a stronger lock than is needed for some other
reference that was processed earlier.  It's kind of pointless to
guarantee that we avoid that in the planner or executor if the parser
already hit the problem; and it seems darn near impossible, and
certainly impractical, to avoid the problem while parsing.

(Actually, even if we fixed the parser, I'm not sure we'd be out of
the woods.  Consider a case where a subquery requires AccessShareLock
on some table, but then when we come to expand inheritance in the
planner, we discover that that same table is a child or partition of
some target table, so now we need a stronger lock on it.)

I think therefore that we should forget about the idea of avoiding
lock upgrade hazards, at least for situations where the hazard is
caused by conflicting table references that are all written by the
user.  That's not common, and since none of these lock types are
exclusive, the odds of an actual deadlock are low anyway.

> Maybe you're right about being able to use rellockmode for indexes,
> but I assume that we lowered the lock level for indexes for some
> reason, and this would reverse that.

I kind of think that we *should* use rellockmode for indexes too.
To the extent that we're doing differently from that now, I think
it's accidental not intentional.  It would perhaps have been difficult
to clean that up completely before we added rellockmode, but now that
we've got that we should use it.  I feel that adding a second field
for index lock mode would just be perpetuating some accidental
inconsistencies.

In short, I think we should take the parts of this patch that modify
the index_open calls, but make them use rte->rellockmode; and forget
all the other parts.

BTW, I'd also suggest expanding the header comment for 
ExecRelationIsTargetRelation to explain that it's no longer
used in the core code, but we keep it around because FDWs
might want it.

regards, tom lane

Re: Compressed TOAST Slicing

2019-04-02 Thread Stephen Frost

Greetings,

* Darafei "Komяpa" Praliaskouski (m...@komzpa.net) wrote:
> > I'll plan to push this tomorrow with the above change (and a few
> > additional comments to explain what all is going on..).
> 
> Is everything ok? Can it be pushed?

This has been pushed now.

Thanks!

Stephen


signature.asc
Description: PGP signature

Re: [PATCH v22] GSSAPI encryption support

2019-04-02 Thread Stephen Frost

Greetings,

* Robbie Harwood (rharw...@redhat.com) wrote:
> Stephen Frost  writes:
> > * Robbie Harwood (rharw...@redhat.com) wrote:
> >> Stephen Frost  writes:
> >> 
> >> I wanted to note a couple things about this approach.  It now uses
> >> one more buffer than before (in contrast to the previous approach,
> >> which reused a buffer for received data that was encrypted and
> >> decrypted).
> >
> > Yeah, I don't see that as too much of an issue and it certainly seems
> > cleaner and simpler to reason about to me, which seems worth the
> > modest additional buffer cost.  That's certainly something we could
> > change if others feel differently.
> >
> >> Since these are static fixed size buffers, this increases the total
> >> steady-state memory usage by 16k as opposed to re-using the buffer.
> >> This may be fine; I don't know how tight RAM is here.
> >
> > It seems unlikely to be an issue to me- and I would contend that the
> > prior implementation didn't actually take any steps to prevent the
> > other side from sending packets of nearly arbitrary size (up to 1G),
> > so while the steady-state memory usage of the prior implementation was
> > less when everyone was playing nicely, it could have certainly been
> > abused.  I'm a lot happier having an explicit cap on how much memory
> > will be used, even if that cap is a bit higher.
> 
> Yeah, that's definitely a fair point.  We could combine the two
> approaches, but I don't really see a reason to if it's unlikely to be an
> issue - as you say, this is more readable.  It can always be a follow-on
> if needed.

Agreed, we could do that later if it seems like it would be helpful.

> > I did add in a simple pg_stat_gssapi view, modeled on pg_stat_ssl, so
> > that you can check server-side if GSSAPI was used for authentication
> > and/or encryption, and what principal was used if GSSAPI was used for
> > authentication.
> 
> Good idea.

Thanks. :)  I definitely like being able to see from the backend side of
things when a connection is encrypted or not, and being able to see the
principal used for authentication is really nice too.

> >> Again, these are nits, and I think I'm okay with the changes.
> >
> > Great, thanks again for reviewing!
> >
> > Updated patch attached, if you could take another look through it,
> > that'd be great.
> 
> I'm happy with this!  Appreciate you exploring my concerns.

Fantastic, thanks so much for working on this over the years!  Unless
there's any further comments, I'm going to push this tomorrow.

Thanks again!

Stephen


signature.asc
Description: PGP signature

Re: Minimal logical decoding on standbys

2019-04-02 Thread Andres Freund

Hi,

On 2019-04-02 15:26:52 +0530, Amit Khandekar wrote:
> On Thu, 14 Mar 2019 at 15:00, Amit Khandekar  wrote:
> > I managed to get a recovery conflict by :
> > 1. Setting hot_standby_feedback to off
> > 2. Creating a logical replication slot on standby
> > 3. Creating a table on master, and insert some data.
> > 2. Running : VACUUM FULL;
> >
> > This gives WARNING messages in the standby log file.
> > 2019-03-14 14:57:56.833 IST [40076] WARNING:  slot decoding_standby w/
> > catalog xmin 474 conflicts with removed xid 477
> > 2019-03-14 14:57:56.833 IST [40076] CONTEXT:  WAL redo at 0/3069E98
> > for Heap2/CLEAN: remxid 477
> >
> > But I did not add such a testcase into the test file, because with the
> > current patch, it does not do anything with the slot; it just keeps on
> > emitting WARNING in the log file; so we can't test this scenario as of
> > now using the tap test.
> 
> I am going ahead with drop-the-slot way of handling the recovery
> conflict. I am trying out using ReplicationSlotDropPtr() to drop the
> slot. It seems the required locks are already in place inside the for
> loop of ResolveRecoveryConflictWithSlots(), so we can directly call
> ReplicationSlotDropPtr() when the slot xmin conflict is found.

Cool.


> As explained above, the only way I could reproduce the conflict is by
> turning hot_standby_feedback off on slave, creating and inserting into
> a table on master and then running VACUUM FULL. But after doing this,
> I am not able to verify whether the slot is dropped, because on slave,
> any simple psql command thereon, waits on a lock acquired on sys
> catache, e.g. pg_authid. Working on it.

I think that indicates a bug somewhere. If replay progressed, it should
have killed the slot, and continued replaying past the VACUUM
FULL. Those symptoms suggest replay is stuck somewhere. I suggest a)
compiling with WAL_DEBUG enabled, and turning on wal_debug=1, b) looking
at a backtrace of the startup process.

Greetings,

Andres Freund

Re: Column lookup in a row performance

2019-04-02 Thread Tom Lane

=?UTF-8?B?0J/QsNCy0LvRg9GF0LjQvSDQmNCy0LDQvQ==?=  writes:
>> (1) Backwards compatibility, and (2) it's not clear that a different
>> layout would be a win for all cases.

> I am curious regarding (2), for my understanding it is good to find
> out at least one case when layout with lengths/offsets in a header
> will be crucially worse. I will be happy if someone can elaborate.

It seems like you think the only figure of merit here is how fast
deform_heap_tuple runs.  That's not the case.  There are at least
two issues:

1.  You're not going to be able to do this without making tuples
larger overall in many cases; but more data means more I/O which
means less performance.  I base this objection on the observation
that our existing design allows single-byte length "words" in many
common cases, but it's really hard to see how you could avoid
storing a full-size offset for each column if you want to be able
to access each column in O(1) time without any examination of other
columns.

2.  Our existing system design has an across-the-board assumption
that each variable-length datum has its length embedded in it,
so that a single pointer carries enough information for any called
function to work with the value.  If you remove the length word
and expect the length to be computed by subtracting two offsets that
are not even physically adjacent to the datum, that stops working.
There is no fix for that that doesn't add performance costs and
complexity.

Practically speaking, even if we were willing to lose on-disk database
compatibility, point 2 breaks so many internal and extension APIs that
there's no chance whatever that we could remove the length-word datum
headers.  That means that the added fields in tuple headers would be
pure added space with no offsetting savings in the data size, making
point 1 quite a lot worse.

regards, tom lane

Re: patch to allow disable of WAL recycling

2019-04-02 Thread Jerry Jelinek

On Mon, Apr 1, 2019 at 7:48 PM Thomas Munro  wrote:

> On Sat, Mar 30, 2019 at 11:18 AM Jerry Jelinek 
> wrote:
> > I went through your new version of the patch and it all looks great to
> me.
>
> I moved the error handling logic around a bit so we'd capture errno
> immediately after the syscalls.  I also made a couple of further
> tweaks to comments and removed some unnecessary casts.
>
> I suspect we took so long on this because of lack of ZFS knowledge and
> uncertainty about the precise reason for the current coding in terms
> of crash safety in general.  After learning more, I now suspect the
> claim about fsyncdata(2) and indirect blocks in the comments may be
> incorrect (it may stem from buggy behaviour on older Linux kernels),
> but I'm not sure and it's not this patch's job to change that.
>
> Pushed.  Thanks for the patch!


 Thanks a lot for getting this integrated, and thanks to everyone else who
gave me so much feedback and assistance during this process!
Jerry

Re: partitioned tables referenced by FKs

2019-04-02 Thread Tom Lane

Alvaro Herrera  writes:
> On 2019-Mar-29, Robert Haas wrote:
>> I don't know why dependency.c doesn't handle this internally.  If I
>> say that I want to delete a list of objects, some of which can't be
>> dropped without dropping certain other things, dependency.c ought to
>> be able to suspend judgement on what the problems are until it's
>> looked over that whole list.  It seems to me that we've had to fight
>> with this problem before, and I don't know why every individual bit of
>> code that calls into that file has to be responsible for working
>> around it individually.

> Hmm, if you think this kind of situation is more widespread than this
> particular case, then maybe we can add the hooks I was talking about,
> and simplify those other cases while fixing this problem.  Do you have
> it in your mind what cases are those?  Nothing comes to mind, but I'll
> search around and see if I can find anything.

FWIW, I think that the dependency mechanism is designed around the idea
that whether it's okay to drop a *database object* depends only on what
other *database objects* rely on it, and that you can always make a DROP
valid by dropping all the dependent objects.  That's not an unreasonable
design assumption, considering that the SQL standard embeds the same
assumption in its RESTRICT/CASCADE syntax.

The problem this patch is running into is that we'd like to make the
validity of dropping a table partition depend on whether there are
individual rows in some other table that FK-reference rows in the target
partition.  That's just completely outside the object-dependency model,
and I'm not sure why Robert finds that surprising.

I think the stopgap solution of requiring a separate DETACH step first
is probably fine, although I don't see any documentation explaining that
in the v10 patch?  Also the test cases in the v10 patch don't seem to
square with that reality, or at least their comments need work.

In the long run I wouldn't necessarily object to having some sort of
hooks in dependency.c to allow this type of use-case to be addressed.
But it'd likely be a good idea to have at least one other concrete
example in mind before trying to design a mechanism for that.

regards, tom lane

Re: "could not reattach to shared memory" on buildfarm member dory

2019-04-02 Thread Tom Lane

Noah Misch  writes:
> I can reproduce the 4 MiB allocations described
> in https://postgr.es/m/29823.1525132...@sss.pgh.pa.us; a few times per
> "vcregress check", they emerge in the middle of PGSharedMemoryReAttach().
> The 4 MiB allocations are stacks for new threads of the default thread
> pool[1].

Hah!  It is great to finally have an understanding of what is happening
here.

I worry that your proposed fix is unstable, in particular this assumption
seems shaky:

> + * ... The idea is that, if the allocator handed out
> + * REGION1 pages before REGION2 pages at one occasion, it will do so whenever
> + * both regions are free.

I wonder whether it's possible to address this by configuring the "default
thread pool" to have only one thread?  It seems like the extra threads are
just adding backend startup overhead to no benefit, since we won't use 'em.

regards, tom lane

Re: Progress reporting for pg_verify_checksums

2019-04-02 Thread Michael Banck

Hi,

Am Dienstag, den 02.04.2019, 15:56 +0900 schrieb Michael Paquier:
> Regarding all this tooling around checksums.  With v12, enabling
> checksums with no actual downtime is doable with a primary-standby
> deployment using physical replication and one planned failover

Can you explain in more detail how this would work? I thought we came to
the conclusion (and the documentation seems to indicate so), that you
should stop all participating instances of a cluster and then enable
checksums on all of them, which would impose a downtime.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Re: "could not reattach to shared memory" on buildfarm member dory

2019-04-02 Thread Noah Misch

On Sun, Dec 02, 2018 at 09:35:06PM -0800, Noah Misch wrote:
> On Tue, Sep 25, 2018 at 08:05:12AM -0700, Noah Misch wrote:
> > On Mon, Sep 24, 2018 at 01:53:05PM -0400, Tom Lane wrote:
> > > Overall, I agree that neither of these approaches are exactly attractive.
> > > We're paying a heck of a lot of performance or complexity to solve a
> > > problem that shouldn't even be there, and that we don't understand well.
> > > In particular, the theory that some privileged code is injecting a thread
> > > into every new process doesn't square with my results at
> > > https://www.postgresql.org/message-id/15345.1525145612%40sss.pgh.pa.us
> > > 
> > > I think our best course of action at this point is to do nothing until
> > > we have a clearer understanding of what's actually happening on dory.
> > > Perhaps such understanding will yield an idea for a less painful fix.
> 
> Could one of you having a dory login use
> https://live.sysinternals.com/Procmon.exe to capture process events during
> backend startup?  The ideal would be one capture where startup failed reattach
> and another where it succeeded, but having the successful run alone would be a
> good start.

Joseph Ayers provided, off-list, the capture from a successful startup.  It
wasn't materially different from the one my system generates, so I abandoned
that line of inquiry.  Having explored other aspects of the problem, I expect
the attached fix will work.  I can reproduce the 4 MiB allocations described
in https://postgr.es/m/29823.1525132...@sss.pgh.pa.us; a few times per
"vcregress check", they emerge in the middle of PGSharedMemoryReAttach().  On
my system, there's 5.7 MiB of free address space just before UsedShmemSegAddr,
so the 4 MiB allocation fits in there, and PGSharedMemoryReAttach() does not
fail.  Still, it's easy to imagine that boring variations between environments
could surface dory's problem by reducing that free 5.7 MiB to, say, 3.9 MiB.

The 4 MiB allocations are stacks for new threads of the default thread
pool[1].  (I confirmed that by observing their size change when I changed
StackReserveSize in MSBuildProject.pm and by checking all stack pointers with
"thread apply all info frame" in gdb.)  The API calls in
PGSharedMemoryReAttach() don't cause the thread creation; it's a timing
coincidence.  Commit 2307868 would have worked around the problem, but
pg_usleep() is essentially a no-op on Windows before
pgwin32_signal_initialize() runs.  (I'll push Assert(pgwin32_signal_event) to
some functions.)  While one fix is to block until all expected threads have
started, that could be notably slow, and I don't know how to implement it
cleanly.  I think a better fix is to arrange for the system to prefer a
different address space region for these thread stacks; for details, see the
first comment the patch adds to win32_shmem.c.  This works here.

> backend startup sees six thread creations:
> 
> 1. main thread
> 2. thread created before postgres.exe has control
> 3. thread created before postgres.exe has control
> 4. thread created before postgres.exe has control
> 5. in pgwin32_signal_initialize()
> 6. in src\backend\port\win32\timer.c:setitimer()
> 
> Threads 2-4 exit exactly 30s after creation.  If we fail to reattach to shared
> memory, we'll exit before reaching code to start 5 or 6.

Threads 2-4 proved to be worker threads of the default thread pool.

[1] https://docs.microsoft.com/en-us/windows/desktop/ProcThread/thread-pools
diff --git a/src/backend/port/win32/signal.c b/src/backend/port/win32/signal.c
index dd91fe5..7d8961f 100644
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index bc1e946..eed17c5 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -17,6 +17,28 @@
 #include "storage/ipc.h"
 #include "storage/pg_shmem.h"

+/*
+ * Early in a process's life, Windows asynchronously creates threads for the
+ * process's "default thread pool"
+ * (https://docs.microsoft.com/en-us/windows/desktop/ProcThread/thread-pools).
+ * Occasionally, thread creation allocates a stack after
+ * PGSharedMemoryReAttach() has released UsedShmemSegAddr and before it has
+ * mapped shared memory at UsedShmemSegAddr.  This would cause mapping to fail
+ * if the allocator preferred the just-released region for allocating the new
+ * thread stack.  We observed such failures in some Windows Server 2016
+ * configurations.  To give the system another region to prefer, reserve and
+ * release an additional, protective region immediately before reserving or
+ * releasing shared memory.  The idea is that, if the allocator handed out
+ * REGION1 pages before REGION2 pages at one occasion, it will do so whenever
+ * both regions are free.  Windows Server 2016 exhibits that behavior, and a
+ * system behaving differently would have less need to protect
+ * UsedShmemSegAddr.  The protective region must be at least large enough for
+ * one thread stack.  However, ten times as much is less than 2% of the 3

Re: partitioned tables referenced by FKs

2019-04-02 Thread Jesper Pedersen


Hi,

On 4/1/19 4:03 PM, Alvaro Herrera wrote:

I'm satisfied with this patch now, so I intend to push early tomorrow.



Passes check-world, and my tests.

Best regards,
 Jesper

Re: [HACKERS] generated columns

2019-04-02 Thread Erik Rijkers


On 2019-04-02 14:43, Peter Eisentraut wrote:

On 2019-04-01 10:52, Peter Eisentraut wrote:

On 2019-03-31 05:49, Erik Rijkers wrote:

STORED in a
file_fdw foreign table still silently creates the column which then
turns out to be useless on SELECT, with an error like:

"ERROR:  column some_column_name is a generated column
DETAIL:  Generated columns cannot be used in COPY."

Maybe it'd be possible to get an error earlier, i.e., while trying to
create such a useless column?


I'll look into it.


I've been trying to create a test case for file_fdw for this, but I'm
not getting your result.  Can you send a complete test case?



Ah, I had not noticed before: with an asterisk ('select * from table' ) 
one gets no error, just empty values.


An actual error seems to occur when one mentions the 
generated-column-name explicitly in the select-list.


select "id", "Ratio Log2 GEN" from ;
"
ERROR:  column "Ratio Log2 GEN" is a generated column
DETAIL:  Generated columns cannot be used in COPY.
"

That's from a quick test here at work; maybe that gives you enough info.

If that doesn't make it repeatable (for you) I'll make a more complete 
example this evening (from home).

Re: [PROPOSAL] Temporal query processing with range types

2019-04-02 Thread Ibrar Ahmed

I start looking at the patch, there is a couple of problems with the patch. The 
first one is the OID conflict, which I fixed on my machine. The second problem 
is assertion failure. I think you have not compiled the PostgreSQL code with 
the assertion. 

...
postgres=# SELECT *
FROM (projects p1 NORMALIZE projects p2 USING() WITH(t,t)) p_adjusted;
TRAP: FailedAssertion("!(ptr == ((void *)0) || (((const Node*)(ptr))->type) == 
type)", File: "../../../src/include/nodes/nodes.h", Line: 588)
psql: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: 2019-04-02 
12:50:09.654 UTC [27550] LOG:  server process (PID 27559) was terminated by 
signal 6: Aborted
...

Although this patch is WIP, but please avoid mix declaration to avoid the 
compiler warning message.

...
joinpath.c:993:3: warning: ISO C90 forbids mixed declarations and code 
[-Wdeclaration-after-statement]
   PathTarget *target_split = makeNode(PathTarget);
...

I am still looking at the patch.

Re: [HACKERS] generated columns

2019-04-02 Thread Peter Eisentraut

On 2019-04-01 10:52, Peter Eisentraut wrote:
> On 2019-03-31 05:49, Erik Rijkers wrote:
>> STORED in a 
>> file_fdw foreign table still silently creates the column which then 
>> turns out to be useless on SELECT, with an error like:
>>
>> "ERROR:  column some_column_name is a generated column
>> DETAIL:  Generated columns cannot be used in COPY."
>>
>> Maybe it'd be possible to get an error earlier, i.e., while trying to 
>> create such a useless column?
> 
> I'll look into it.

I've been trying to create a test case for file_fdw for this, but I'm
not getting your result.  Can you send a complete test case?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Ordered Partitioned Table Scans

2019-04-02 Thread Amit Langote

Hi David,

On Tue, Apr 2, 2019 at 8:49 PM David Rowley
 wrote:
> I ended up rewording the entire thing and working on the header
> comment for the function too. I think previously it wasn't that well
> defined what "ordered" meant. I added a mention that we expect that
> NULLs, if possible must come in the last partition.

Thanks for the updated patch.

New descriptions look good, although was amused by this:

diff --git a/src/backend/partitioning/partbounds.c
b/src/backend/partitioning/partbounds.c
index bdd0d23854..9dd378d7a0 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -25,6 +25,7 @@
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/pathnodes.h"
...
+partitions_are_ordered(struct RelOptInfo *partrel)

Maybe, "struct" is unnecessary?

Thanks,
Amit

Re: Unified logging system for command-line programs

2019-04-02 Thread Peter Eisentraut

On 2019-04-02 05:05, Michael Paquier wrote:
> I am a bit disappointed that this stuff is not reducing
> the amount of diff code with ifdef FRONTEND in src/common/.

That wasn't the purpose of the patch.  If you have a concrete proposal
for how to do what you describe, it would surely be welcome.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Unified logging system for command-line programs

2019-04-02 Thread Peter Eisentraut

On 2019-04-01 20:55, Andres Freund wrote:
> A written upthread, I think this should have had a uniform interface
> with elog.h, and probably even share some code between the two.

The purpose of this patch was to consolidate the existing zoo of logging
routines in the frontend programs.  That has meaningful benefits.  There
is hardly any new code; most of the code was just consolidated from
existing scattered code.

If someone wants to take it further and consolidate that with the
backend logging infrastructure, they are free to propose a patch for
consideration.  Surely the present patch can only help, since it already
makes the call sites uniform, which would presumably have to be done
anyway.  However, there is no prototype or even detailed design sketch
let alone someone committing to implement such a project.  So it seems
unreasonable to block other meaningful improvements in adjacent areas in
the meantime.

> This is
> going in the wrong direction, making it harder, not easier, to share
> code between frontend and backend.

I don't think anything has changed in that respect.  If there is reason
to believe that code that uses fprintf() is easier to share with the
backend than alternatives, then nothing is standing in the way of
continuing to use that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Problems with plan estimates in postgres_fdw

2019-04-02 Thread Etsuro Fujita


(2019/03/29 22:18), Etsuro Fujita wrote:

Attached is a
new version. If there are no objections, I'll commit this version.


Pushed after polishing the patchset a bit further: add an assertion, 
tweak comments, and do cleanups.  Thanks for reviewing, Antonin and Jeff!


Best regards,
Etsuro Fujita

Re: Ordered Partitioned Table Scans

2019-04-02 Thread David Rowley

On Tue, 2 Apr 2019 at 14:26, Amit Langote  wrote:
> How about extending the sentence about when the optimization is disabled
> as follows:
>
> "If we find a Datum in a partition that's greater than one previously
> already seen, then values could become out of order and we'd have to
> disable the optimization.  We'd also need to disable the optimization if
> NULL values are interleaved with other Datum values, because the calling
> code expect them to be present in the last partition."
>
> Further, extend the "For now..." sentence as:
>
> "For now, let's just keep it simple and just accept LIST partitioned table
> without a DEFAULT partition where each partition only accepts a single
> Datum or NULL.  It's OK to always accept NULL partition in that case,
> because PartitionBoundInfo lists it as the last partition."

I ended up rewording the entire thing and working on the header
comment for the function too. I think previously it wasn't that well
defined what "ordered" meant. I added a mention that we expect that
NULLs, if possible must come in the last partition.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


mergeappend_to_append_conversion_v18.patch
Description: Binary data

Re: GCoS2019--pgBackRest port to Windows (2019)

2019-04-02 Thread David Steele


Hello Youssef,

On 4/1/19 5:16 PM, Youssef Khedher wrote:

Hello,

I’m Youssef Khedher, 19 years old student from Tunisia.

I’m a computer science student in Tunisia, and also an online student 
for Harvard CS50 Program.


I’m interested in the ” pgBackRest port to Windows (2019)”.


Excellent!

> To be frank with you, I’m not an expert in C

You'll need to be able to understand and modify many areas of the C code 
in order to be successful in this project.  I would encourage you to 
review the code and make sure you are able to follow what it's doing in 
general before submitting an application:


https://github.com/pgbackrest/pgbackrest/tree/master/src

I don't think you'll be writing a bunch of new code, but anything is 
possible when porting software.


In order to achieve the project goal, all I need from you MR. Stephen 
Frost or MR. David Steele is some help to understand perfectly the task 
and to help me enter to world of software development.


For this project it's important to have knowledge about how Windows 
differs from Unix, e.g. the way child processes are spawned.  In the 
core code we only use fork() with an immediate exec(), so this should be 
straightforward enough to port:


https://github.com/pgbackrest/pgbackrest/blob/master/src/command/archive/get/get.c#L235

We also use fork() quite a bit in our testing and I was thinking the 
HARNESS_FORK*() macros could be enhanced to use threads instead on Windows:


https://github.com/pgbackrest/pgbackrest/blob/master/test/src/common/harnessFork.h

If not, then the tests will need to be adjusted to accommodate whatever 
testing method is developed.


pgBackRest uses SSH to communicate with remote processes:

https://github.com/pgbackrest/pgbackrest/blob/master/src/protocol/helper.c#L288

Eventually we would like to move away from requiring SSH, but for this 
port I think the best idea would be to get pgBackRest working with some 
open source SSH solution such as OpenSSH (which is easily installed on 
recent versions of Windows, but not sure about older versions).  If 
there is time at the end we might look at alternate solutions.


There may be other minor areas in the code that need be adjusted or 
#ifdef'd to work with Windows.  We've tried to keep this to a minimum by 
enforcing C99 and Posix standards, but there will be some differences. 
The config code that enforces Unix path structure is an obvious area 
that will need to be updated:


https://github.com/pgbackrest/pgbackrest/blob/master/src/config/parse.c#L1034

Note that we want to port to native Windows without the presence of 
Cygwin (or similar) in production.  My preference would be to use 
something like Strawberry Perl for testing, and then as few dependencies 
as possible for the production distribution.


A CI testing platform for Windows will need to be selected -- mostly 
likely AppVeyor.


The documentation will also need to be updated for Windows.

You should delve into the areas mentioned above and propose possible 
solutions when writing your proposal.  Feel free to ask questions.


Porting code from one platform to another can be quite complicated, but 
we believe this project can be accomplished over the summer by a skilled 
and motivated student.


If you are interested in proceeding, you should create an issue here:

https://github.com/pgbackrest/pgbackrest/issues

We do our development on Github and issues are the way we discuss 
projects, enhancements, and bugs.


Good luck!
--
-David
da...@pgmasters.net

Re: [HACKERS] WAL logging problem in 9.4.3?

2019-04-02 Thread Kyotaro HORIGUCHI

Thank you for reviewing.

At Sun, 31 Mar 2019 15:31:58 -0700, Noah Misch  wrote in 
<20190331223158.gb891...@rfd.leadboat.com>
> On Sun, Mar 10, 2019 at 07:27:08PM -0700, Noah Misch wrote:
> > On Mon, Mar 04, 2019 at 12:24:48PM +0900, Kyotaro HORIGUCHI wrote:
> > > +/*
> > > + * RelationRemovePendingSync() -- remove pendingSync entry for a relation
> > > + */
> > > +void
> > > +RelationRemovePendingSync(Relation rel)
> > 
> > What is the coding rule for deciding when to call this?  Currently, only
> > ATExecSetTableSpace() calls this.  CLUSTER doesn't call it, despite behaving
> > much like ALTER TABLE SET TABLESPACE behaves.
> 
> This question still applies.  (The function name did change from
> RelationRemovePendingSync() to RelationInvalidateWALRequirements().)

It is called for heap_register_sync()'ed relations to avoid
syncing useless or trying to sync nonexistent files. I modifed
all CLUSTER, COPY FROM, CREATE AS, REFRESH MATVIEW and SET
TABLESPACE uses the function. (The function is renamed to
table_relation_invalidate_walskip()).

I noticed that heap_register_sync and friends are now a kind of
Table-AM function. So I added .relation_register_walskip and
.relation_invalidate_walskip in TableAMRoutine and moved the
heap_register_sync stuff as heapam_relation_register_walskip and
friends. .finish_bulk_insert() is modified to be used only
WAL-skip is active on the relation. (0004, 0005) But I'm not sure
that is the right direction.

(RelWALRequirements is renamed to RelWALSkip)

The change made smgrFinishBulkInsert (known as smgrDoPendingSync)
need to call a tableam interface. Relation is required to call it
in the designed way but relcache cannot live until there. In the
attached patch 0005, a new member TableAmRoutine *tableam is
added to RelWalSkip and calls finish_bulk_insert() via the
tableAm. But I'm quite uneasy with that...

> On Mon, Mar 25, 2019 at 09:32:04PM +0900, Kyotaro HORIGUCHI wrote:
> > At Wed, 20 Mar 2019 22:48:35 -0700, Noah Misch  wrote in 
> > <20190321054835.gb3842...@rfd.leadboat.com>
> Again, I do want them in the code.  Please restore them, but use a mechanism
> like CACHE_elog() so they're built only if one defines a preprocessor symbol.

Ah, sorry. I restored the messages using STORAGE_elog(). I also
needed this. (SMGR_ might be better but I'm not sure.)

> On Tue, Mar 26, 2019 at 04:35:07PM +0900, Kyotaro HORIGUCHI wrote:
> > +   smgrProcessWALRequirementInval(s->subTransactionId, false);
> 
> The smgrProcessWALRequirementInval() calls almost certainly belong in
> CommitSubTransaction() and AbortSubTransaction(), not in these functions.  By
> doing it here, you'd get the wrong behavior in a subtransaction created via a
> plpgsql "BEGIN ... EXCEPTION WHEN OTHERS THEN" block.

Thanks. Moved it to AtSubAbort_smgr() and AtSubCommit_smgr(). (0005)

> > +/*
> > + * Process pending invalidation of WAL requirements happened in the
> > + * subtransaction
> > + */
> > +void
> > +smgrProcessWALRequirementInval(SubTransactionId sxid, bool isCommit)
> > +{
> > +   HASH_SEQ_STATUS status;
> > +   RelWalRequirement *walreq;
> > +
> > +   if (!walRequirements)
> > +   return;
> > +
> > +   /* We expect that we don't have walRequirements in almost all cases */
> > +   hash_seq_init(&status, walRequirements);
> > +
> > +   while ((walreq = hash_seq_search(&status)) != NULL)
> > +   {
> > +   /* remove useless entry */
> > +   if (isCommit ?
> > +   walreq->invalidate_sxid == sxid :
> > +   walreq->create_sxid == sxid)
> > +   hash_search(walRequirements, &walreq->relnode, 
> > HASH_REMOVE, NULL);
> 
> Do not remove entries during subtransaction commit, because a parent
> subtransaction might still abort.  See other CommitSubTransaction() callees
> for examples of correct subtransaction handling.  AtEOSubXact_Files() is one
> simple example.

Thanks. smgrProcessWALSkipInval() (0005) is changed so that:

 - If a RelWalSkip entry is created in aborted subtransaction,
   remove it.

 - If a RelWalSkip entry is created then invalidated in committed
   subtransaction, remove it.

 - If a RelWalSkip entry is created and committed, change the
   creator subtransaction to the parent subtransaction.

 - If a RelWalSkip entry is create elsewhere and invalidated in
   committed subtransaction, move the invalidation to the parent
   subtransaction.

 - If a RelWalSkip entry is created elsewhere and invalidated in
   aborted subtransaction, cancel the invalidation.

Test is added as test3a2 and test3a3. (0001)

> > @@ -3567,15 +3602,26 @@ heap_update
> >  */
> > if (RelationIsAccessibleInLogicalDecoding(relation))
> > {
> > -   log_heap_new_cid(relation, &oldtup);
> > -   log_heap_new_cid(relation, heaptup);
> > +   if (oldbuf_needs_wal)
> > +   log_heap_new_cid(relation, &oldtup);
> > +   if (newbuf_needs_wal)
>

Re: Refactoring the checkpointer's fsync request queue

2019-04-02 Thread Thomas Munro

On Wed, Mar 27, 2019 at 5:48 AM Shawn Debnath  wrote:
> On Tue, Mar 26, 2019 at 09:22:56AM -0400, Robert Haas wrote:
> > On Tue, Mar 26, 2019 at 12:20 AM Thomas Munro  
> > wrote:
> > > Here's a fixup patch.  0001 is the same as Shawn's v12 patch, and 0002
> > > has my proposed changes to switch to callbacks that actually perform
> > > the sync and unlink operations given a file tag, and do so via the
> > > SMGR fd cache, rather than exposing the path to sync.c.  This moves us
> > > back towards the way I had it in an earlier version of the patch, but
> > > instead of using smgrsyncimmed() as I had it, it goes via Shawn's new
> > > sync handler function lookup table, allowing for non-smgr components
> > > to use this machinery in future (as requested by Andres).
> >
> > Strong +1.  Not only might closing and reopening the files have
> > performance and reliability implications, but a future smgr might talk
> > to the network, having no local file to sync.
>
> Makes sense for now. When we re-visit the fd-passing or sync-on-close
> implementations, we can adapt the changes relatively easily given the
> rest of the framework is staying intact. I am hoping these patches do
> not delay the last fsync-gate issue discussion further.

I found a few more things that I thought needed adjustment:

* Packing handler and request type into a uint8 is cute but a waste of
time if we're just going to put it in a struct next to a member that
requires word-alignment.  So I changed it to a pair of plain old int16
members.  The ftag member starts at offset 4 either way, on my system.

* I didn't really like the use of the word HIERARCHY in the name of
the request type, and changed it to SYNC_FILTER_REQUEST.  That word
came up because we were implementing a kind of hierarchy, where if you
drop a database you want to forget things for all segments inside all
relations inside that database, but the whole point of this new API is
that it doesn't understand that, it calls a filter function to decide
which requests to keep.  So I preferred "filter" as a name for the
type of request.

* I simplified the "matches" callback interface.

* Various typos and comment clean-up.

I'm going to do some more testing and tidying tomorrow (for example I
think the segment.h header is silly and I'd like to remove that), and
commit this.

-- 
Thomas Munro
https://enterprisedb.com

0001-Refactor-the-fsync-queue-for-future-SMGR-impleme-v14.patch
Description: Binary data

Re: Minimal logical decoding on standbys

2019-04-02 Thread Amit Khandekar

On Thu, 14 Mar 2019 at 15:00, Amit Khandekar  wrote:
> I managed to get a recovery conflict by :
> 1. Setting hot_standby_feedback to off
> 2. Creating a logical replication slot on standby
> 3. Creating a table on master, and insert some data.
> 2. Running : VACUUM FULL;
>
> This gives WARNING messages in the standby log file.
> 2019-03-14 14:57:56.833 IST [40076] WARNING:  slot decoding_standby w/
> catalog xmin 474 conflicts with removed xid 477
> 2019-03-14 14:57:56.833 IST [40076] CONTEXT:  WAL redo at 0/3069E98
> for Heap2/CLEAN: remxid 477
>
> But I did not add such a testcase into the test file, because with the
> current patch, it does not do anything with the slot; it just keeps on
> emitting WARNING in the log file; so we can't test this scenario as of
> now using the tap test.

I am going ahead with drop-the-slot way of handling the recovery
conflict. I am trying out using ReplicationSlotDropPtr() to drop the
slot. It seems the required locks are already in place inside the for
loop of ResolveRecoveryConflictWithSlots(), so we can directly call
ReplicationSlotDropPtr() when the slot xmin conflict is found.

As explained above, the only way I could reproduce the conflict is by
turning hot_standby_feedback off on slave, creating and inserting into
a table on master and then running VACUUM FULL. But after doing this,
I am not able to verify whether the slot is dropped, because on slave,
any simple psql command thereon, waits on a lock acquired on sys
catache, e.g. pg_authid. Working on it.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Re: minimizing pg_stat_statements performance overhead

2019-04-02 Thread Christoph Berg

Re: Raymond Martin 2019-04-01 

> Thanks again Fabien. I am attaching the patch to this email in the hope of 
> getting it approved during the next commit fest. 

Raymond,

you sent the patch as UTF-16, could you re-send it as plain ascii?

Christoph

Re: [Patch] Mingw: Fix import library extension, build actual static libraries

2019-04-02 Thread Sandro Mani

Any chance these patches could be considered?

Thanks
Sandro

On Thu, Mar 7, 2019 at 3:23 PM Sandro Mani  wrote:

> Currently building postgresql for Win32 with a mingw toolchain produces
> import libraries with *.a extension, whereas the extension should be
> *.dll.a. There are various downstream workarounds for this, see i.e. [1]
> and [2]. The attached patch 0001-Fix-import-library-extension.patch
> addresses this.
>
> Related, no actual static libraries are produced alongside the respective
> dlls. The attached patch 0002-Build-static-libraries.patch addresses this,
> in a similar fashion as is already done for the AIX case in Makefile.shlib.
>
> Thanks Sandro
>
> [1]
> https://src.fedoraproject.org/rpms/mingw-postgresql/blob/master/f/mingw-postgresql.spec#_144
> [2]
> https://aur.archlinux.org/cgit/aur.git/tree/0001-Use-.dll.a-as-extension-for-import-libraries.patch?h=mingw-w64-postgresql
>

Re: partitioned tables referenced by FKs

2019-04-02 Thread Amit Langote

Hi Alvaro,

On 2019/04/02 5:03, Alvaro Herrera wrote:
> On 2019-Mar-29, Jesper Pedersen wrote:
> 
>> I ran my test cases for this feature, and havn't seen any issues.
>>
>> Therefore I'm marking 1877 as Ready for Committer. If others have additional
>> feedback feel free to switch it back.
> 
> Thanks!
> 
> I found two issues today.  One, server side, is that during cloning for
> partition attach we were not checking for concurrent deletion of
> referenced tuples in partitions.  I added an isolation spec test for
> this.  To fix the bug, added a find_all_inheritors() to lock all
> partitions with ShareRowExclusiveLock.
> 
> Another is that psql's \d failed for versions < 12, because we were
> inconditionally adding an "AND conparentid = 0" clause.
> 
> I also reworked CloneForeignKeyConstraints.  The previous style was
> being forced by the old recursing method; now we can make it a lot
> simpler -- it's now just two subroutine calls.
> 
> I'm satisfied with this patch now, so I intend to push early tomorrow.

I tried the latest patch and found no problems as far as I tried.

I guess it's OK for the time being that users will not be able to drop a
partition while a foreign key is pointing to it, given the troubles with
that pre-DROP check.  (Of course, someone may want to improve dependency.c
in the future so that we can allow the DROP where possible.)

Thanks for working on this.

Off-topic:

I did some performance testing, which has no relation to the code being
changed by this patch, but thought the numbers hint at a need for
improving other areas that affect the scalability (in terms of number of
partitions) of the foreign key checks.

* Table setup:

Create a hash partitioned table with N partitions, followed by a regular
table referencing the aforementioned partitioned table.  Insert keys
1-1000 into ht, the primary key table.

N: 2..8192

drop table ht cascade;
create table ht (a int primary key, b int, c int) partition by hash (a);
select 'create table ht' || x::text || ' partition of ht for values with
(MODULUS N, REMAINDER ' || (x)::text || ');' from generate_series(0, N-1) x;
\gexec
drop table foo cascade;
create table foo (a int references ht);
truncate ht cascade;
insert into ht select generate_series(1, 1000);

* pgbench script (insert-foo.sql)

\set a random(1, 1000)
insert into foo values (:a);

* Run pgbench

pgbench -n -T 120 -f insert-foo.sql

* TPS with plan_cache_mode = auto (default) or force_custom_plan

2   2382.779742 <=
8   2392.514952
32  2392.682178
128 2387.828361
512 2358.325865
10242234.699889
40962030.610062
81921740.633117

* TPS with plan_cache_mode = force_generic_plan

2   2924.030727 <=
8   2854.460937
32  2378.630989
128 1550.235166
512 642.980148
1024330.142430
409671.739272
819232.620403

It'd be good to work on improving the scalability of generic plan
execution in the future, because not having to re-plan the query used by
RI_FKey_check would always be better, as seen by comparing execution times
for 2 partitions (2382 tps with custom plan vs. 2924 tps with generic
plan.).  Having run-time pruning already helps quite a lot, but some other
ideas are needed, such as one proposed by David recently, but withdrawn
for PG 12 [1].

Anyway, nothing here that prevents this patch from being committed. :)

Thanks for reading.

Regards,
Amit

[1] https://commitfest.postgresql.org/22/1897/

Re: Progress reporting for pg_verify_checksums

2019-04-02 Thread Fabien COELHO





For pg_checksums, probably some improvement patch will be submitted later,
if someone feels like it.


Let's see.  I think that what we have now in v12 is good enough for
checksum operations on an offline cluster.  And my take is that we
should focus more on online checksum verification for v13 and future
versions.


I agree.

For online, we should want throttling so that the check can have a reduced 
performance impact when scrubbing.


--
Fabien.

RE: Transaction commits VS Transaction commits (with parallel) VS query mean time

2019-04-02 Thread Jamison, Kirk

On Thursday, March 28, 2019 3:13 PM (GMT+9), Haribabu Kommi wrote:
> I tried the approach as your suggested as by not counting the actual parallel 
> work
> transactions by just releasing the resources without touching the counters,
> the counts are not reduced much.
>
> HEAD - With 4 parallel workers running query generates 13 stats ( 4 * 3  + 1)
> Patch -  With 4 parallel workers running query generates 9 stats ( 4 * 2 + 1)
> Old approach patch - With 4 parallel workers running query generates 1 stat 
> (1)
>
> Currently the parallel worker start transaction 3 times in the following 
> places.
> 1. InitPostgres
> 2. ParallelWorkerMain (2)
>
> with the attached patch, we reduce one count from ParallelWorkerMain.

I'm sorry for the late review of the patch.
Patch applies, compiles cleanly, make check passes too.
I tested the recent patch again using the same method above
and confirmed the increase of generated stats by 9 w/ 4 parallel workers.

postgres=# BEGIN;
postgres=# select xact_commit from pg_stat_database where datname = 'postgres';
xact_commit
-
  60
(1 row)
postgres=# explain analyze select * from tab where b = 6;
[snipped]
postgres=# COMMIT;
postgres=# select xact_commit from pg_stat_database where datname = 'postgres';
xact_commit
-
  69

The intention of the latest patch is to fix the stat of
(IOW, do not count) parallel cooperating transactions,
or the transactions started by StartParallelWorkerTransaction,
based from the advice of Amit.
After testing, the current patch works as intended.

However, also currently being discussed in the latter mails
is the behavior of how parallel xact stats should be shown.
So the goal eventually is to improve how we present the
stats for parallel workers by differentiating between the
internal-initiated (bgworker xact, autovacuum, etc) and
user/application-initiated transactions, apart from keeping
the overall xact stats.
..So fixing the latter one will change this current thread's fix?

I personally have no problems with committing this fix first
before fixing the latter problem discussed above.
But I think there should be a consensus regarding that one.

Regards,
Kirk Jamison

GCoS2019--pgBackRest port to Windows (2019)

2019-04-02 Thread Youssef Khedher

Hello,
I’m Youssef Khedher, 19 years old student from Tunisia.
I’m a computer science student in Tunisia, and also an online student for 
Harvard CS50 Program.
I’m interested in the ” pgBackRest port to Windows (2019)”.
I have a good knowledge in C and in IT in general, I believe that nothing is 
impossible and that I can handle and understand this project 
even if I have to stay 24h/24h. I love challenges and I’m someone who never 
stops not matter what is the obstacle is.
Working in an open source project with a well-known organization like 
PostgreSQL is a pleasure and a dream for every young 
student. That’s why I’m fully motivated to the challenge with you. 
To be frank with you, I’m not an expert in C,  but I’m kind of people who never 
give up until the problem is solved on my own.
In order to achieve the project goal, all I need from you MR. Stephen Frost or 
MR. David Steele is some help to understand perfectly the task and to help me 
enter to world of software development. 
I’m feeling comfortable to express what’s in my head honestly because 
PostgreSQL said that their mentors are wonderfully kind and friendly people who 
want you to learn and succeed!
Thanks for your time. 
Looking forward to your response!

Sincerely,
 " YOUSSEF  KHEDHER "

Computer-Science Student. 
Phone:(+216) 25 460 276 
Address : 28 Rue Amir Abedlkader, Jemmal 5020, Monastir, Tunisia.

RE: Planning counters in pg_stat_statements (using pgss_store)

2019-04-02 Thread legrand legrand

Hi,

>>
>> case  avg_tps   pct_diff
>> 089 278   --
>> 188 745   0,6%
>> 288 282   1,1%
>> 386 660   2,9%
>>
>> This means that even in this extrem test case, the worst degradation is less
>> than 3%
>> (this overhead can be removed using pg_stat_statements.track_planning guc)

> Is the difference between 2 and 3 the extraneous pgss_store call to
> always store the query text if planner hook doesn't have access to the
> query text?

Yes it is,
but I agree it seems a big gap (1,8%) compared to the difference between 1 and 
2 (0,5%).
Maybe this is just mesure "noise" ...

Regards
PAscal

81 matches

Mail list logo