Tab completion for CREATE TYPE

2019-05-13 Thread Thomas Munro
Hi,

Since I keep forgetting the syntax and options, here is $SUBJECT.

-- 
Thomas Munro
https://enterprisedb.com


0001-Tab-completion-for-CREATE-TYPE.patch
Description: Binary data


Re: [HACKERS] Unlogged tables cleanup

2019-05-13 Thread Michael Paquier
On Mon, May 13, 2019 at 09:33:52PM -0700, Andres Freund wrote:
> On 2019-05-14 13:23:28 +0900, Michael Paquier wrote:
>> What's actually the reason preventing us from delaying the
>> checkpointer like the index AMs for the logging of heap init fork?
> 
> I'm not following. What do you mean by "delaying the checkpointer"?

I mean what Robert has mentioned here:
https://www.postgresql.org/message-id/ca+tgmoz4twapckhf-szv-npxdxl40zcwm9pnfjzurvrgm2o...@mail.gmail.com

And my gut tells me that he got that right, because we are discussing
race conditions with crashes and checkpoints in-between calls to
smgrimmedsync() and log_newpage().  That could be invasive for
back-branches, but for HEAD this would make the whole init fork
handling saner.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] WAL logging problem in 9.4.3?

2019-05-13 Thread Kyotaro HORIGUCHI
Hello.

At Sun, 12 May 2019 17:37:05 -0700, Noah Misch  wrote in 
<20190513003705.ga1202...@rfd.leadboat.com>
> On Sun, Mar 31, 2019 at 03:31:58PM -0700, Noah Misch wrote:
> > On Sun, Mar 10, 2019 at 07:27:08PM -0700, Noah Misch wrote:
> > > I also liked the design in the https://postgr.es/m/559fa0ba.3080...@iki.fi
> > > last paragraph, and I suspect it would have been no harder to back-patch.
> > > I wonder if it would have been simpler and better, but I'm not asking
> > > anyone to investigate that.
> > 
> > Now I am asking for that.  Would anyone like to try implementing that other
> > design, to see how much simpler it would be?

Yeah, I think it is a bit too complex for the value. But I think
it is the best way as long as we keep reusing a file on
truncation of the whole relation.

> Anyone?  I've been deferring review of v10 and v11 in hopes of seeing the
> above-described patch first.

The significant portion of the complexity in this patch comes
from the need to behave differently per block according to the
remembered logged and truncated block numbers.

0005:
+ * NB: after WAL-logging has been skipped for a block, we must not WAL-log
+ * any subsequent actions on the same block either. Replaying the WAL record
+ * of the subsequent action might fail otherwise, as the "before" state of
+ * the block might not match, as the earlier actions were not WAL-logged.
+ * Likewise, after we have WAL-logged an operation for a block, we must
+ * WAL-log any subsequent operations on the same page as well. Replaying
+ * a possible full-page-image from the earlier WAL record would otherwise
+ * revert the page to the old state, even if we sync the relation at end
+ * of transaction.
+ *
+ * If a relation is truncated (without creating a new relfilenode), and we
+ * emit a WAL record of the truncation, we can't skip WAL-logging for any
+ * of the truncated blocks anymore, as replaying the truncation record will
+ * destroy all the data inserted after that. But if we have already decided
+ * to skip WAL-logging changes to a relation, and the relation is truncated,
+ * we don't need to WAL-log the truncation either.

If this consideration holds, and given the optimizations on
WAL-skip and truncation, there's no way to avoid the per-block
behavior as long as we allow a mixture of logged modifications
and WAL-skipped COPY on the same relation within a transaction.

We could avoid the per-block behavior change by making the
WAL inhibition per-relation. That would reduce the patch size
by the amount of the BufferNeedsWALs and log_heap_update
changes, but not by much:

 inhibit wal-skipping after any wal-logged modifications in the relation.
 inhibit wal-logging after any wal-skipped modifications in the relation.
 wal-skipped relations are synced at commit-time.
 truncation of wal-skipped relation creates a new relfilenode.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Unlogged tables cleanup

2019-05-13 Thread Andres Freund
Hi,

On 2019-05-14 13:23:28 +0900, Michael Paquier wrote:
> On Mon, May 13, 2019 at 10:37:35AM -0700, Andres Freund wrote:
> > Ugh, this is all such a mess. But, isn't this broken independently of
> > the smgrimmedsync() issue? In a basebackup case, the basebackup could
> > have included the main fork, but not the init fork, and the reverse. WAL
> > replay *solely* needs to be able to recover from that.  At the very
> > least we'd have to do the cleanup step after becoming consistent, not
> > just before recovery even started.
> 
> Yes, the logic using smgrimmedsync() is race-prone and weaker than the
> index AMs in my opinion, even if the failure window is limited (I
> think that this is mentioned upthread a bit).

How's it limited? On a large database a base backup easily can take
*days*. And e.g. the VM and FSM can easily have inodes that are much
newer than the main/init forks, so typical base backups (via OS/glibc
readdir) will sort them at a later point (or it'll be hashed, in which
case it's entirely random), so the window between when the different
forks are copied is large.


> What's actually the reason preventing us from delaying the
> checkpointer like the index AMs for the logging of heap init fork?

I'm not following. What do you mean by "delaying the checkpointer"?

Greetings,

Andres Freund




Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-05-13 Thread Amit Langote
On 2019/05/14 13:23, Amit Langote wrote:
> Tom
> strongly objected to that idea saying that such join paths are kind of
> silly [1], even outside the context of partitionwise join.  He suggested
> that we abandon partitionwise join in such cases, because having to build
> a dummy base relation for pruned partitions only to generate silly-looking
> paths would be an ugly kludge.

I forgot to mention that he even committed a patch to disable
partitionwise joins in such cases, which was also applied to v11 branch.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d70c147fa217c4bae32ac1afb86ab42d98b36fdf

Note that there were also other reasons for committing, beside what I
described in my previous email.

Thanks,
Amit





Re: [HACKERS] Unlogged tables cleanup

2019-05-13 Thread Andres Freund
Hi,

On 2019-05-14 12:50:09 +0900, Michael Paquier wrote:
> On Mon, May 13, 2019 at 10:52:21AM -0700, Andres Freund wrote:
> > I was wrong here - I thought we WAL logged the main fork creation even
> > for unlogged tables. I think it's foolish that we don't, but we don't.
>
> Why?  The main fork is not actually necessary, and the beginning of
> recovery would make sure that any existing main fork gets removed and
> that the main fork gets recreated by copying it from the init fork,
> which is WAL-logged.

Several things:

1) A main fork that's created while a base-backup is in progress might
   only partially exist on a HS node. We won't necessarily remove it,
   because the init fork might *not* have been copied. The init fork
   will be re-created by WAL replay, but that'll be too late for the
   ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP) call.

   Which then will

   a) cause ResetUnloggedRelations(UNLOGGED_RELATION_INIT) to fail,
      because the target file already exists, as mentioned elsewhere in
      this thread

   b) cause trouble when accessing the unlogged table - unfortunately
      there are several ways to access unlogged tables on a HS node - so
      the main fork better either not exist, or be in a valid empty
      state. Case in point:
      COPY blarg TO STDOUT;
      does not throw an error due to unlogged-ness.

   If we instead properly WAL-logged the MAIN fork creation, b) wouldn't
   be a problem.

   And we could have the UNLOGGED_RELATION_INIT case re-use the
   hashtable built at UNLOGGED_RELATION_CLEANUP time, and avoid a second
   scan of the filesystem. As all relations created after the LSN at
   which crash recovery started would be valid, we would not need to do
   a full second scan: all unlogged relations created after that LSN are
   guaranteed to have a consistent main fork.

2) With a small bit of additional work, the sketch to fix 1) would allow
   us to allow read access to unlogged relations during HS. We'd just
   have do the UNLOGGED_RELATION_INIT work at the start of recovery.

3) I'm somewhat certain that there are relfilenode reuse issues with the
   current approach, again during base backups (as there the checkpoint
   logic to prevent relfilenode reuse doesn't work). I think you can
   easily end up with main / vm / fsm / init forks that actually don't
   belong to the same relation for a base backup. If you don't force
   delete all other forks at the time the init fork record is replayed,
   there's no defense against that.

There's probably more, but I gotta cook.


> So we definitely have to log the init fork, but logging the main fork
> is just useless data in the WAL record.  Or you have something else in
> mind?

Well, as I argued somewhere else in this thread, I think this all should
just be one WAL record.

Something very roughly like:

struct xl_smgr_create
{
    RelFileNode rnode;
    ForkNumber  forkNum;
    bool        remove_all_other_forks;
    bool        copy_init_to_main_fork;
    size_t      data_length;
    char        fork_contents[];
};

Presumably with a flags argument instead of separate booleans.


Greetings,

Andres Freund




Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-05-13 Thread Amit Langote
Hi Amul, Ashutosh,

On 2019/04/24 20:26, amul sul wrote:
> Attached version is rebase atop of the latest master head(fdc7efcc30), also
> incorporates the Ashutosh's suggestion, thanks.

Reading the commit message of 0002 and after testing to confirm, I
understand that the patch doesn't handle OUTER joins where the nullable
side is missing some partitions.  The reason given is that join planning
would have to add base relations corresponding to missing partitions on
the nullable side, which we can't do.  While working on partition pruning
refactoring recently (committed as 428b260f87e), we faced a similar
situation in that pruned partitions are like missing partitions, because
they're not added to the PlannerInfo anymore, whereas before that commit,
they'd be added and marked dummy afterwards.  Earlier versions of my patch
had code to add dummy base relations for such pruned partitions, because
partitionwise join expected pairs of matched partitions to be valid base
relations, as that's how things were when the partitionwise join feature
was committed.  The join path generated in this case would have a
constant-false Result path (an empty relation) for the nullable side.  Tom
strongly objected to that idea saying that such join paths are kind of
silly [1], even outside the context of partitionwise join.  He suggested
that we abandon partitionwise join in such cases, because having to build
a dummy base relation for pruned partitions only to generate silly-looking
paths would be an ugly kludge.  I guess the same argument applies to the
case where the nullable side is missing some partitions, so the right way
to support partitionwise join case in that case wouldn't be to figure out
how joinrels.c could add dummy base relations.

He did mention that cases where the nullable side is provably empty can be
handled by simply returning the path of the non-nullable side, with a
suitable projection path added on top to emit NULLs for the columns of the
nullable side.  If we teach populate_joinrel_with_paths() and its
underlings about that, then we can allow partitionwise join even in the
case where the nullable side has some partitions missing.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/25035.1553905052%40sss.pgh.pa.us





Re: [HACKERS] Unlogged tables cleanup

2019-05-13 Thread Michael Paquier
On Mon, May 13, 2019 at 10:37:35AM -0700, Andres Freund wrote:
> Ugh, this is all such a mess. But, isn't this broken independently of
> the smgrimmedsync() issue? In a basebackup case, the basebackup could
> have included the main fork, but not the init fork, and the reverse. WAL
> replay *solely* needs to be able to recover from that.  At the very
> least we'd have to do the cleanup step after becoming consistent, not
> just before recovery even started.

Yes, the logic using smgrimmedsync() is race-prone and weaker than the
index AMs in my opinion, even if the failure window is limited (I
think that this is mentioned upthread a bit).  What's actually the
reason preventing us from delaying the checkpointer like the index AMs
for the logging of heap init fork?  The fact that the init heap fork
is an empty page?
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] Unlogged tables cleanup

2019-05-13 Thread Michael Paquier
On Mon, May 13, 2019 at 11:28:32AM -0400, Robert Haas wrote:
> 1. The comment added in that commit says "Recovery may as well remove
> it while replaying..." but what is really meant is "Recovery may well
> remove it while replaying..."  The phrase "may as well" means that
> there isn't really any reason not to do it even if it's not strictly
> necessary.  The phrase "may well" means that it's entirely possible.
> The latter meaning is the one we want here.

Perhaps I wrote this comment.  Per what you say, indeed we should use
"may well" here, to outline the fact that it is a possibility.  I have
to admit that I don't completely get the difference with "may as
well", which also sounds like a possibility by reading this comment
today, but I am no native speaker.

> 2. The comment as adjusted in that commit refers to needing an
> immediate sync "even if the page has been logged, because the write
> did not go through shared_buffers," but there is no page and no write,
> because an empty heap is just an empty file.   That logic makes sense
> for index AMs that bypass shared buffers to write a metapage, e.g.
> nbtree, as opposed to others which go through shared_buffers and don't
> have the immediate sync, e.g. brin.  However, the heap writes no pages
> either through shared buffers or bypassing shared buffers, so either
> I'm confused or the comment makes no sense.

We need to ensure that the init fork, even if empty, still exists so
that recovery can do its job of detecting that this is an unlogged table
and then recreate the empty main fork.  Perhaps the comment could
insist on the on-disk existence of the relfilenode instead, making
necessary the logging of this page, however empty it is.

> 3. Before that commit, the comment said that "the immediate sync is
> required, because otherwise there's no guarantee that this will hit
> the disk before the next checkpoint moves the redo pointer."  However,
> that justification would seem to apply to *anything* that does
> smgrcreate + log_smgrcreate, which would then also need to do
> smgrimmedsync, yet RelationCreateStorage doesn't.  Unless, for some
> reason that I'm not thinking of right now, the init fork has stronger
> durability requirements than the main fork, it's hard to understand why
> heapam_relation_set_new_filenode can call RelationCreateStorage to do
> smgrcreate + log_smgrcreate for the main fork and that's OK, but when
> it does the same thing itself for the init fork, it now needs
> smgrimmedsync as well.
> 
> My guess, just shooting from the hip, is that the smgrimmedsync call
> can be removed here.  If that's wrong, then we need a better
> explanation for why it's needed, and we possibly need to add it to
> every single place that does smgrcreate that doesn't have it already.

That's mentioned upthread, and my sloppy memory of the matter is that
we need the fsync call to ensure that the init fork is present at the
start of recovery, and that the logging makes sure it does not get
wiped out when replaying a database or tablespace record.
--
Michael


signature.asc
Description: PGP signature


Re: VACUUM can finish an interrupted nbtree page split -- is that okay?

2019-05-13 Thread Peter Geoghegan
On Mon, May 13, 2019 at 8:30 PM Tom Lane  wrote:
> Peter Geoghegan  writes:
> > To be fair, I suppose that the code made more sense when it first went
> > in, because at the time there was a chance that there could be
> > leftover half-dead internal pages. But, that was a long time ago now.
>
> Is there a good reason to assume there are none left anywhere?

That is not an assumption that the proposed patch rests upon, though
it is true that there are probably going to be virtually no half-dead
internal pages that make their way onto a Postgres 12 installation.
You'd have to do a CREATE INDEX on Postgres 9.3, and then not VACUUM
or REINDEX the index once it was on a 9.4+ installation. I suppose
that a 9.3 -> 12 upgrade is the most plausible scenario in which you
could actually get a half-dead internal page on a Postgres 12
installation.

Even when that happens, the index is already considered corrupt by
VACUUM, so the same VACUUM process that could in theory be adversely
affected by removing the half-dead internal page check will complain
about the page when it gets to it later on -- the user will be told to
REINDEX. And even then, we will never actually get to apply the check
that I propose to remove, since we're already checking the leaf page
sibling of the leaf-level target -- the leaf-level test that was added
by efada2b8e92 was clearly necessary. But it was also sufficient (no
equivalent internal half-dead right sibling test is needed): a 9.3-era
half-dead internal page cannot have more than one child, which must be
undergoing deletion as well.

If somebody doubted my rationale for why we don't need to do anything
more on internal page levels in installations where the user didn't
pg_upgrade from a version that's < 9.4, then they'd still have to
explain why we haven't heard of any problems in 5 years, and probably
offer some alternative fix that considers "logically half-dead
internal pages" (i.e. pages that are or will be the top parent in a
deletion chain). Because the code that I propose to remove obviously
cannot be doing much of anything for indexes built on 9.4+.

-- 
Peter Geoghegan




Re: PANIC :Call AbortTransaction when transaction id is no normal

2019-05-13 Thread Andres Freund
Hi,

On 2019-05-14 12:37:39 +0900, Michael Paquier wrote:
> Still, I would like to understand why the bootstrap process has been
> signaled to begin with, particularly for an initdb, which is not
> really something that should happen on a server where an instance
> runs.  If you have a too aggressive monitoring job, you may want to
> revisit that as well, because it is able to complain just with an
> initdb.

Shutdown, timeout, resource exhaustion all seem like possible
causes. Don't think any of them warrant a core file - as the OP
explains, that'll often trigger pages etc.

Greetings,

Andres Freund




Re: [HACKERS] Unlogged tables cleanup

2019-05-13 Thread Michael Paquier
On Mon, May 13, 2019 at 10:52:21AM -0700, Andres Freund wrote:
> I was wrong here - I thought we WAL logged the main fork creation even
> for unlogged tables. I think it's foolish that we don't, but we don't.

Why?  The main fork is not actually necessary, and the beginning of
recovery would make sure that any existing main fork gets removed and
that the main fork gets recreated by copying it from the init fork,
which is WAL-logged.  So we definitely have to log the init fork, but
logging the main fork is just useless data in the WAL record.  Or you
have something else in mind?
--
Michael


signature.asc
Description: PGP signature


Re: PANIC :Call AbortTransaction when transaction id is no normal

2019-05-13 Thread Michael Paquier
On Mon, May 13, 2019 at 11:28:51PM -0400, Tom Lane wrote:
> OK, that's fair.  The SIG_DFL change I suggested will fix that problem
> for SIGINT etc (except SIGQUIT, for which you should be *expecting*
> a core file).  I agree with Michael that we do not wish to change what
> happens for an internal error; but external signals do not represent
> a bug in PG, so forcing a PANIC for those seems unwarranted.

No objections from here to change the signal handlers.  Still, I would
like to understand why the bootstrap process has been signaled to
begin with, particularly for an initdb, which is not really something
that should happen on a server where an instance runs.  If you have a
too aggressive monitoring job, you may want to revisit that as well,
because it is able to complain just with an initdb.
--
Michael


signature.asc
Description: PGP signature


Re: VACUUM can finish an interrupted nbtree page split -- is that okay?

2019-05-13 Thread Tom Lane
Peter Geoghegan  writes:
> To be fair, I suppose that the code made more sense when it first went
> in, because at the time there was a chance that there could be
> leftover half-dead internal pages. But, that was a long time ago now.

Is there a good reason to assume there are none left anywhere?

regards, tom lane




Re: PANIC :Call AbortTransaction when transaction id is no normal

2019-05-13 Thread Tom Lane
[ please don't top-post on the PG lists ]

Thunder   writes:
> At 2019-05-14 07:53:36, "Michael Paquier"  wrote:
>> On Mon, May 13, 2019 at 09:37:32AM -0400, Tom Lane wrote:
>>> But ... that code's been like that for decades and nobody's complained
>>> before.  Why are we worried about bootstrap's response to signals at all?

> On our server, when a process crashes and a core dump file is generated,
> we receive a complaining phone call.
> That's why I try to fix it.

OK, that's fair.  The SIG_DFL change I suggested will fix that problem
for SIGINT etc (except SIGQUIT, for which you should be *expecting*
a core file).  I agree with Michael that we do not wish to change what
happens for an internal error; but external signals do not represent
a bug in PG, so forcing a PANIC for those seems unwarranted.

regards, tom lane




Re: VACUUM can finish an interrupted nbtree page split -- is that okay?

2019-05-13 Thread Peter Geoghegan
On Tue, May 7, 2019 at 9:59 AM Peter Geoghegan  wrote:
> On Tue, May 7, 2019 at 12:27 AM Heikki Linnakangas  wrote:
> > I don't understand that reasoning. Yes, _bt_pagedel() will complain if
> > it finds a half-dead internal page. But how does that mean that
> > _bt_lock_branch_parent() can't encounter one?
>
> I suppose that in theory it could, but only if you allow that any
> possible state could be found -- it doesn't seem any more likely than
> any other random illegal state.

To be fair, I suppose that the code made more sense when it first went
in, because at the time there was a chance that there could be
leftover half-dead internal pages. But, that was a long time ago now.

I wonder why the code wasn't complaining about corruption loudly, like
the top level code, instead of treating half-dead pages as a
legitimate reason to not proceed with multi-level page deletion. That
would have been overkill, but it would have made much more sense IMV.

I would like to proceed with pushing this patch to HEAD in the next
few days, since it's clearly removing code that can't be useful. Are
there any objections?

-- 
Peter Geoghegan




Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2019-05-13 Thread Kyotaro HORIGUCHI
Hello.

At Mon, 13 May 2019 17:37:50 +0800, Paul Guo  wrote in 

> Thanks for the reply.
> 
> On Tue, May 7, 2019 at 2:47 PM Kyotaro HORIGUCHI <
> horiguchi.kyot...@lab.ntt.co.jp> wrote:
> 
> >
> > +  if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
> > +  {
> >
> > This patch is allowing missing source and destination directory
> > even in consistent state. I don't think it is safe.
> >
> 
> I do not understand this. Can you elaborate?

Suppose we were recovering from a backup taken at LSN1, targeting
LSN3, and the recovery crashed at LSN2, where LSN1 < LSN2 <= LSN3.
LSN2 is called the "consistency point"; before it, the database is
not consistent. That is because, in the second trial, we are applying
WAL records older than those that were already applied. The same can
be said for crash recovery, where LSN1 is the latest checkpoint's
redo LSN and LSN2 = LSN3 is the crash LSN.

Creation of an already-existing directory, or dropping of a
non-existent one, is apparently inconsistent or "broken", so we should
stop recovery when seeing such WAL records while the database is in a
consistent state.

> > +ereport(WARNING,
> > +(errmsg("directory \"%s\" for copydir() does not exists."
> > +"It is possibly expected. Skip copydir().",
> > +parent_path)));
> >
> > This message seems unfriendly to users, or it seems like an elog
> > message. How about something like this. The same can be said for
> > the source directory.
> >
> > | WARNING:  skipped creating database directory: "%s"
> > | DETAIL:  The tablespace %u may have been removed just before crash.
> >
> 
> Yeah. Looks better.
> 
> 
> >
> > # I'm not confident in this at all:(
> >
> > > 2) Fixed dbase_desc(). Now the xlog output looks correct.
> > >
> > > rmgr: Databaselen (rec/tot): 42/42, tx:486, lsn:
> > > 0/016386A8, prev 0/01638630, desc: CREATE copy dir base/1 to
> > > pg_tblspc/16384/PG_12_201904281/16386
> > >
> > > rmgr: Databaselen (rec/tot): 34/34, tx:487, lsn:
> > > 0/01638EB8, prev 0/01638E40, desc: DROP dir
> > > pg_tblspc/16384/PG_12_201904281/16386
> >
> > WAL records don't convey such information. The previous
> > description seems right to me.
> >
> 
> 2019-04-17 14:52:14.951 CST [23030] CONTEXT:  WAL redo at 0/3011650 for
> Database/CREATE: copy dir 1663/1 to 65546/65547
> The directories are definitely wrong and misleading.

The original description is right in light of how the server
recognizes it. The record exactly says "copy dir 1663/1 to
65546/65547", and the latter path is resolved in the filesystem layer
via a symlink.


> > > > Also I'd suggest we should use pg_mkdir_p() in
> > TablespaceCreateDbspace()
> > > > to replace
> > > > the code block includes a lot of
> > > > get_parent_directory(), MakePGDirectory(), etc even it
> > > > is not fixing a bug since pg_mkdir_p() code change seems to be more
> > > > graceful and simpler.
> >
> > But I don't agree to this. pg_mkdir_p goes above two-parents up,
> > which would be unwanted here.
> >
> I do not understand this also. pg_mkdir_p() is similar to 'mkdir -p'.
> This change just makes the code concise. Though in theory the change is not
> needed.

We don't want to create the tablespace directory after a concurrent
DROP, as the comment just above says:

|  * Acquire TablespaceCreateLock to ensure that no DROP TABLESPACE
|  * or TablespaceCreateDbspace is running concurrently.

If the concurrent DROP TABLESPACE destroyed the grandparent
directory, we mustn't create it again.

> > > > Whatever ignore mkdir failure or mkdir_p, I found that these steps
> > > > seem to be error-prone along with postgres evolving, since they are
> > > > hard to test and it is also not easy to think out various potential
> > > > bad cases. Is it possible that we should do a real checkpoint (flush
> > > > & update redo lsn) when seeing checkpoint xlogs for these operations?
> > > > This will slow down the standby, but the master also does this, and
> > > > these operations are not usual; especially, it seems that it does not
> > > > slow down WAL receiving usually?
> >
> > That dramatically slows recovery (not replication) if databases
> > are created and deleted frequently. That wouldn't be acceptable.
> >
> 
> This behavior is rare and seems to have the same impact on master & standby
> from checkpoint/restartpoint.
> We do not worry about master so we should not worry about standby also.

I didn't mention replication. I said that that slows recovery,
which is not governed by master's speed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: remove doc/bug.template?

2019-05-13 Thread Bruce Momjian
On Mon, May 13, 2019 at 04:34:34PM -0400, Tom Lane wrote:
> Magnus Hagander  writes:
> > On Mon, May 13, 2019 at 10:00 PM Robert Haas  wrote:
> >> On Mon, May 13, 2019 at 3:54 PM Peter Eisentraut
> >>  wrote:
> >>> I'm not sure doc/bug.template still serves a purpose.  There is bug
> >>> reporting advice in the documentation, and there is a bug reporting
> >>> form.  This file just seems outdated.  Should we remove it?
> 
> >> In my opinion, yes.
> 
> > +1.
> 
> No objection, but make sure you fix src/tools/version_stamp.pl.
> (Looks like there's a reference in .gitattributes, too)

Yes, please remove.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re:Re: PANIC :Call AbortTransaction when transaction id is no normal

2019-05-13 Thread Thunder
On our server, when a process crashes and a core dump file is generated,
we receive a complaining phone call.
That's why I try to fix it.








At 2019-05-14 07:53:36, "Michael Paquier"  wrote:
>On Mon, May 13, 2019 at 09:37:32AM -0400, Tom Lane wrote:
>> But ... that code's been like that for decades and nobody's complained
>> before.  Why are we worried about bootstrap's response to signals at all?
>
>Yeah, I don't think that it is something worth bothering either.  As
>you mentioned the data folder would be removed by default.  Or perhaps
>the reporter has another case in mind which could justify a change in
>the signal handlers?  I am ready to hear that case, but there is
>nothing about the reason why it could be a benefit.
>
>The patch proposed upthread is not something I find correct anyway,
>I'd rather have the abort path complain loudly about a bootstrap
>transaction that fails instead of just ignoring it, because it is the
>kind of transaction which must never fail.  And it seems to me that it
>can be handy for development purposes.
>--
>Michael


Re: pg12 release notes

2019-05-13 Thread Bruce Momjian
On Mon, May 13, 2019 at 12:48:00PM -0500, Justin Pryzby wrote:
> On Fri, May 10, 2019 at 05:10:06PM +1200, David Rowley wrote:
> > On Fri, 10 May 2019 at 16:52, Justin Pryzby  wrote:
> > > I'm rechecking my list from last month.  What about these ?
> > >
> > > > c076f3d Remove pg_constraint.conincluding
> > > > bd09503 Increase the default vacuum_cost_limit from 200 to 2000
> > 
> > bd09503 was reverted in 52985e4fea7 and replaced by cbccac371, which
> > is documented already by:
> > 
> >  Reduce the default value of 
> > autovacuum_vacuum_cost_delay to 2ms (Tom Lane) 
> 
> Right, thanks.
> 
> I suspect c076f3d should be included, though.

This commit was part of a set of patches that forced a catalog version
changes in PG 11 in early September, 2018.

> |The many system tables with such columns will now display those columns with 
> SELECT * by default. 
> I think could be better:
> |SELECT * will now output those columns for the many system tables which have 
> them.
> |(previously, the columns weren't shown unless explicitly selected).

Good idea.  The new text is:

SELECT * will now output those columns for the many
system tables which have them.  Previously, the columns had to be
selected explicitly.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Passing CopyMultiInsertInfo structure to CopyMultiInsertInfoNextFreeSlot()

2019-05-13 Thread Michael Paquier
On Tue, May 14, 2019 at 01:19:30PM +1200, David Rowley wrote:
> When I wrote the code I admit that I was probably wearing my
> object-orientated programming hat. I had in mind that the whole
> function series would have made a good class.  Passing the
> CopyMultiInsertInfo was sort of the non-OOP equivalent to having
> this/Me/self available, as it would be for any instance method of a
> class.  Back to reality, this isn't OOP, so I was wearing the wrong
> hat.  I think the unused parameter should likely be removed.  It's
> probably not doing a great deal of harm since the function is static
> inline and the compiler shouldn't be producing any code for the unused
> param, but for the sake of preventing confusion, it should be removed.
> Ashutosh had to ask about it, so it wasn't immediately clear what the
> purpose of it was. Since there's none, be gone with it, I say.

Sounds fair to me.  This has been introduced by 86b8504, so let's see
what Andres's take is.
--
Michael


signature.asc
Description: PGP signature


Re: ldapbindpasswdfile

2019-05-13 Thread Thomas Munro
On Tue, May 14, 2019 at 1:49 PM Thomas Munro  wrote:
> ... or a named pipe that performs arbitrary magic.

(Erm, that bit might not make much sense...)

-- 
Thomas Munro
https://enterprisedb.com




Re: PG 12 draft release notes

2019-05-13 Thread Bruce Momjian
On Mon, May 13, 2019 at 10:08:57AM +0300, Oleg Bartunov wrote:
> On Mon, May 13, 2019 at 6:52 AM Bruce Momjian  wrote:
> >
> > On Sun, May 12, 2019 at 09:49:40AM -0400, Bruce Momjian wrote:
> > > On Sun, May 12, 2019 at 10:00:38AM +0300, Oleg Bartunov wrote:
> > > > Bruce,
> > > >
> > > > I noticed that jsonpath in your version is mentioned only in functions
> > > > chapter, but  commit
> > > > 72b6460336e86ad5cafd3426af6013c7d8457367 is about implementation of
> > > > SQL-2016 standard. We implemented JSON Path language as a jsonpath
> > > > datatype with a bunch of support functions, our implementation
> > > > supports 14 out of 15 features and it's the most complete
> > > > implementation (we compared oracle, mysql and ms sql).
> > >
> > > Glad you asked.  I was very confused about why a data type was added for
> > > a new path syntax.  Is it a new storage format for JSON, or something
> > > else?  I need help on this.
> >
> > I talked to Alexander Korotkov on chat about this.  The data types are
> > used as arguments to the functions, similar to how tsquery and tsvector
> > are used for full text search.
> >
> > Therefore, the data types are not really useful on their own, but as
> > support for path functions.  However, path functions are more like JSON
> > queries, rather than traditional functions, so it odd to list them under
> > functions, but there isn't a more reasonable place to put them.
> >
> > Alexander researched how we listed full text search in the release notes
> > that added the feature, but we had a "General" category at that time that
> > we don't have now.
> 
> I attached a slide about our jsonpath implementation in Postgres; it
> summarizes the reasons to have a jsonpath data type. But my point was:
> JSON Path is part of the SQL-2016 standard and I think it's worth
> mentioning as such, not just as a set of jsonb functions.

So, are you suggesting we mention the jsonpath data type in the Data
Type section, even though it is useless without the jsonpath functions,
which are in another section, or are you suggesting to move the jsonpath
item to the Data Type section?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: PG 12 draft release notes

2019-05-13 Thread Bruce Momjian
On Mon, May 13, 2019 at 08:41:25AM +0200, Fabien COELHO wrote:
> 
> Hello Bruce,
> 
> > I have posted a draft copy of the PG 12 release notes here:
> > 
> > http://momjian.us/pgsql_docs/release-12.html
> > 
> > They are committed to git.  It includes links to the main docs, where
> > appropriate.  Our official developer docs will rebuild in a few hours.
> 
> Pgbench entry "Improve pgbench error reporting with clearer messages and
> return codes" is by Peter Eisentraut, not me. I just reviewed it.

Thanks, fixed.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




ldapbindpasswdfile

2019-05-13 Thread Thomas Munro
Hello hackers,

Some users don't like the fact that ldapbindpasswd can leak into logs
(and now system views?).  Also, some users don't like the fact that it
is in cleartext rather than some encryption scheme (though I don't
know what, since we'd presumably also need the key).  I propose a new
option $SUBJECT so that users can at least add a level of indirection
and put the password in a file.  A motivated user could point it at an
encrypted loopback device so that they need a passphrase at mount
time, or a named pipe that performs arbitrary magic.  Some of these
topics were discussed last time someone had this idea[1].

Using a separate file for the bind password is fairly common in other
software: see ldapsearch's -y switch, and I think it probably
makes sense at the very least as a convenience, without getting into
hand-wringing discussions about whether any security is truly added.

Draft patch attached.
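For illustration, here is a hypothetical pg_hba.conf line using the
proposed option; the option spelling, bind DN, and file path below are
assumptions based on this draft, not committed syntax:

```
# Hypothetical search+bind entry: the bind password is read from a
# root-owned file instead of appearing inline in pg_hba.conf.
host all all 10.0.0.0/8 ldap ldapserver=ldap.example.net ldapbasedn="dc=example,dc=net" ldapsearchattribute=uid ldapbinddn="cn=pgbind,dc=example,dc=net" ldapbindpasswdfile=/etc/postgresql/ldapbindpasswd
```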

Hi Stephen!

I also know that a motivated user could also use GSSAPI instead of
LDAP.  Do you think we should update the manual to say so, perhaps in
a "tip" box on the LDAP auth page?

[1] 
https://www.postgresql.org/message-id/flat/20140617175511.2589.45249%40wrigleys.postgresql.org

-- 
Thomas Munro
https://enterprisedb.com


0001-Add-ldapbindpasswdfile-option-for-pg_hba.conf.patch
Description: Binary data


Re: Passing CopyMultiInsertInfo structure to CopyMultiInsertInfoNextFreeSlot()

2019-05-13 Thread David Rowley
On Tue, 14 May 2019 at 13:00, Michael Paquier  wrote:
>
> On Mon, May 13, 2019 at 08:17:49PM +0530, Ashutosh Sharma wrote:
> > Thanks for the confirmation David. The patch looks good to me.
>
> It looks to me that it can be a matter of consistency with the other
> APIs dealing with multi-inserts in COPY.  For now I have added an open
> item on that.

When I wrote the code I admit that I was probably wearing my
object-orientated programming hat. I had in mind that the whole
function series would have made a good class.  Passing the
CopyMultiInsertInfo was sort of the non-OOP equivalent to having
this/Me/self available, as it would be for any instance method of a
class.  Back to reality, this isn't OOP, so I was wearing the wrong
hat.  I think the unused parameter should likely be removed.  It's
probably not doing a great deal of harm since the function is static
inline and the compiler shouldn't be producing any code for the unused
param, but for the sake of preventing confusion, it should be removed.
Ashutosh had to ask about it, so it wasn't immediately clear what the
purpose of it was. Since there's none, be gone with it, I say.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: pg12 release notes

2019-05-13 Thread Michael Paquier
On Mon, May 13, 2019 at 12:48:00PM -0500, Justin Pryzby wrote:
> I suspect c076f3d should be included, though.

bd47c4a9 has removed pg_constraint.conincluding from REL_11_STABLE as
well.
--
Michael


signature.asc
Description: PGP signature


Re: vacuumdb and new VACUUM options

2019-05-13 Thread Michael Paquier
On Mon, May 13, 2019 at 07:28:25PM +0900, Masahiko Sawada wrote:
> Thank you! I've incorporated your comment in my branch. I'll post the
> updated version patch after the above discussion got a consensus.

Fujii-san, any input about the way to move forward here?  Beta1 is
planned for next week, hence it would be nice to progress on this
front this week.
--
Michael


signature.asc
Description: PGP signature


Re: Passing CopyMultiInsertInfo structure to CopyMultiInsertInfoNextFreeSlot()

2019-05-13 Thread Michael Paquier
On Mon, May 13, 2019 at 08:17:49PM +0530, Ashutosh Sharma wrote:
> Thanks for the confirmation David. The patch looks good to me.

It looks to me that it can be a matter of consistency with the other
APIs dealing with multi-inserts in COPY.  For now I have added an open
item on that.
--
Michael


signature.asc
Description: PGP signature


Re: Re: SQL/JSON: functions

2019-05-13 Thread Andrew Alsup

On 3/5/19 5:35 PM, Nikita Glukhov wrote:

Attached 36th version of the patches rebased onto jsonpath v36.

While testing this patch I found a few issues:

[1] I was not able to apply the patch to the current HEAD. However, it 
applies cleanly to commit: e988878f85 (NOTE: I did not investigate which 
commit between e988878f85 and HEAD caused problems).


[2] JsonPath array slicing does not work. I'm not aware of a 
comprehensive list of JsonPath features/syntax that is targeted for 
support; however, I did try various forms of array slicing, which don't 
currently work.


Here are a few examples:

The input document is the same in each example.

{
  "a1": 123,
  "b1": "xxx",
  "c1": {
    "a2": 456,
    "b2": "yyy",
    "c2": [
      {"a3": 777, "b3": "7z"},
      {"a3": 888, "b3": "8z"}
    ]
  }
}

array wildcard selector [*] works: $.c1.c2[*].b3

# select json_path_query('{"a1": 123, "b1": "xxx", "c1": {"a2": 456, 
"b2": "yyy", "c2": [{"a3": 777, "b3": "7z"},{"a3": 888, "b3": 
"8z"}]}}'::json, '$.c1.c2[*].b3'::jsonpath);

 json_path_query
-----------------
 "7z"
 "8z"
(2 rows)

array index selector [0] works: $.c1.c2[0].b3

jsonpatch=# select json_path_query('{"a1": 123, "b1": "xxx", "c1": 
{"a2": 456, "b2": "yyy", "c2": [{"a3": 777, "b3": "7z"},{"a3": 888, 
"b3": "8z"}]}}'::json, '$.c1.c2[0].b3'::jsonpath);

 json_path_query
-----------------
 "7z"
(1 row)

array slicing [0:], [:1], and [0:1] do not work: $.c1.c2[0:].b3

# select json_path_query('{"a1": 123, "b1": "xxx", "c1": {"a2": 456, 
"b2": "yyy", "c2": [{"a3": 777, "b3": "7z"},{"a3": 888, "b3": 
"8z"}]}}'::json, '$.c1.c2[0:].b3'::jsonpath);
2019-05-13 20:47:48.740 EDT [21856] ERROR:  bad jsonpath representation 
at character 147
2019-05-13 20:47:48.740 EDT [21856] DETAIL:  syntax error, unexpected 
':', expecting ',' or ']' at or near ":"
2019-05-13 20:47:48.740 EDT [21856] STATEMENT:  select 
json_path_query('{"a1": 123, "b1": "xxx", "c1": {"a2": 456, "b2": "yyy", 
"c2": [{"a3": 777, "b3": "7z"},{"a3": 888, "b3": "8z"}]}}'::json, 
'$.c1.c2[0:].b3'::jsonpath);

ERROR:  bad jsonpath representation
LINE 1: ...7, "b3": "7z"},{"a3": 888, "b3": "8z"}]}}'::json, '$.c1.c2[0...
 ^
DETAIL:  syntax error, unexpected ':', expecting ',' or ']' at or near ":"
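For what it's worth, the jsonpath grammar already committed to core
spells array ranges with "to" and a "last" keyword rather than colons;
a sketch of a slice-like query over the same document, using the
in-core jsonb_path_query() (not this patch's json_path_query()):

```sql
-- Range subscripts use "n to m"; "last" is the final array index.
SELECT jsonb_path_query(
    '{"c1": {"c2": [{"a3": 777, "b3": "7z"},
                    {"a3": 888, "b3": "8z"}]}}'::jsonb,
    '$.c1.c2[0 to last].b3');
-- expected: "7z" and "8z", the same rows as the [*] wildcard above
```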




Re: Quitting the thes

2019-05-13 Thread Michael Paquier
On Mon, May 13, 2019 at 04:52:15PM -0300, Stephen Amell wrote:
> Here is my first patch for Postgres. Starting with an easy thing, I corrected
> the duplicated "the the" strings in comments in some files.

Welcome!

> - src/backend/executor/execExpr.c
> - src/include/c.h
> - src/include/jit/llvmjit_emit.h
> - src/include/nodes/execnodes.h
> - src/include/replication/logical.h
> 
> Any feedback is welcome!

Thanks, committed.  I have noticed an extra one in reorderbuffer.c.

If you get interested in more areas of the code, there are plenty of
small work items which can be reviewed or worked on.

We have on the wiki a manual about how to submit a patch:
https://wiki.postgresql.org/wiki/Submitting_a_Patch

Patches go through a dedicated commit fest app, and it has many small
fish which can be worked on:
https://commitfest.postgresql.org/23/
You could get more familiar with some areas of the code this way.

Thanks,
--
Michael


signature.asc
Description: PGP signature


Re: How to install login_hook in Postgres 10.5

2019-05-13 Thread Michael Paquier
On Mon, May 13, 2019 at 01:06:10PM -0700, legrand legrand wrote:
> that ended up committed
> "pgsql: Add hooks for session start and session end"
> https://www.postgresql.org/message-id/flat/575d6fa2-78d0-4456-8600-302fc35b2591%40dunslane.net#0819e315c6e44c49a36c69080cab644d
> 
> but was finally rollbacked because it didn't pass installcheck test
> ...
> 
> Maybe you are able to patch your pg installation;
> it would be a good solution (there is a nice example
> of an extension included)

You will need to patch Postgres to add this hook, and you could
basically reuse the patch which has been committed once.  I don't
think that it would be that much amount of work to get it working
correctly on the test side to be honest, so we may be able to get
something into v13 at this stage.  This is mainly a matter of
resources though, and of folks willing to actually push for it.
--
Michael


signature.asc
Description: PGP signature


Re: PANIC :Call AbortTransaction when transaction id is no normal

2019-05-13 Thread Michael Paquier
On Mon, May 13, 2019 at 09:37:32AM -0400, Tom Lane wrote:
> But ... that code's been like that for decades and nobody's complained
> before.  Why are we worried about bootstrap's response to signals at all?

Yeah, I don't think that it is something worth bothering about either.
As you mentioned, the data folder would be removed by default.  Or
perhaps the reporter has another case in mind which could justify a
change in the signal handlers?  I am ready to hear that case, but
nothing has been said about why it would be a benefit.

The patch proposed upthread is not something I find correct anyway;
I'd rather have the abort path complain loudly about a bootstrap
transaction that fails instead of just ignoring it, because it is the
kind of transaction which must never fail.  And it seems to me that it
can be handy for development purposes.
--
Michael


signature.asc
Description: PGP signature


Re: PG12, PGXS and linking pgfeutils

2019-05-13 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-May-13, Tom Lane wrote:
>> I started working on a patch to do that, and soon noticed that there
>> are pre-existing files logging.[hc] in src/bin/pg_rewind/.  This seems
>> like a Bad Thing, in fact the #includes in pg_rewind/ are already a
>> little confused due to this.  I think we should either rename those
>> two pg_rewind files to something else, or rename the generic ones,
>> perhaps to "fe_logging.[hc]".  The latter could be done nearly
>> trivially as part of the movement patch, but on cosmetic grounds
>> I'd be more inclined to do the former instead.  Thoughts?

> I'd rename both :-)

On closer inspection, there's so little left in pg_rewind's logging.h/.c
(one function and a couple of global variables) that the better answer
is probably just to move those objects somewhere else and nuke the
separate files altogether.  As attached.

regards, tom lane

diff --git a/src/bin/pg_rewind/Makefile b/src/bin/pg_rewind/Makefile
index 019e199..5455fd4 100644
--- a/src/bin/pg_rewind/Makefile
+++ b/src/bin/pg_rewind/Makefile
@@ -19,7 +19,7 @@ override CPPFLAGS := -I$(libpq_srcdir) -DFRONTEND $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
 
 OBJS	= pg_rewind.o parsexlog.o xlogreader.o datapagemap.o timeline.o \
-	fetch.o file_ops.o copy_fetch.o libpq_fetch.o filemap.o logging.o \
+	fetch.o file_ops.o copy_fetch.o libpq_fetch.o filemap.o \
 	$(WIN32RES)
 
 EXTRA_CLEAN = xlogreader.c
diff --git a/src/bin/pg_rewind/copy_fetch.c b/src/bin/pg_rewind/copy_fetch.c
index 3fd0404..2ada861 100644
--- a/src/bin/pg_rewind/copy_fetch.c
+++ b/src/bin/pg_rewind/copy_fetch.c
@@ -18,7 +18,6 @@
 #include "fetch.h"
 #include "file_ops.h"
 #include "filemap.h"
-#include "logging.h"
 #include "pg_rewind.h"
 
 static void recurse_dir(const char *datadir, const char *path,
diff --git a/src/bin/pg_rewind/file_ops.c b/src/bin/pg_rewind/file_ops.c
index e442f93..f9e41b1 100644
--- a/src/bin/pg_rewind/file_ops.c
+++ b/src/bin/pg_rewind/file_ops.c
@@ -21,7 +21,6 @@
 #include "common/file_perm.h"
 #include "file_ops.h"
 #include "filemap.h"
-#include "logging.h"
 #include "pg_rewind.h"
 
 /*
diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c
index 63d0bae..813eadc 100644
--- a/src/bin/pg_rewind/filemap.c
+++ b/src/bin/pg_rewind/filemap.c
@@ -15,12 +15,10 @@
 
 #include "datapagemap.h"
 #include "filemap.h"
-#include "logging.h"
 #include "pg_rewind.h"
 
 #include "common/string.h"
 #include "catalog/pg_tablespace_d.h"
-#include "fe_utils/logging.h"
 #include "storage/fd.h"
 
 filemap_t  *filemap = NULL;
diff --git a/src/bin/pg_rewind/libpq_fetch.c b/src/bin/pg_rewind/libpq_fetch.c
index 11ec045..b6fa7e5 100644
--- a/src/bin/pg_rewind/libpq_fetch.c
+++ b/src/bin/pg_rewind/libpq_fetch.c
@@ -19,12 +19,10 @@
 #include "fetch.h"
 #include "file_ops.h"
 #include "filemap.h"
-#include "logging.h"
 
 #include "libpq-fe.h"
 #include "catalog/pg_type_d.h"
 #include "fe_utils/connect.h"
-#include "fe_utils/logging.h"
 #include "port/pg_bswap.h"
 
 static PGconn *conn = NULL;
diff --git a/src/bin/pg_rewind/logging.c b/src/bin/pg_rewind/logging.c
deleted file mode 100644
index 8169f73..000
--- a/src/bin/pg_rewind/logging.c
+++ /dev/null
@@ -1,79 +0,0 @@
-/*-
- *
- * logging.c
- *	 logging functions
- *
- *	Copyright (c) 2010-2019, PostgreSQL Global Development Group
- *
- *-
- */
-#include "postgres_fe.h"
-
-#include 
-#include 
-
-#include "pg_rewind.h"
-#include "logging.h"
-
-#include "pgtime.h"
-
-/* Progress counters */
-uint64		fetch_size;
-uint64		fetch_done;
-
-static pg_time_t last_progress_report = 0;
-
-
-/*
- * Print a progress report based on the global variables.
- *
- * Progress report is written at maximum once per second, unless the
- * force parameter is set to true.
- */
-void
-progress_report(bool force)
-{
-	int			percent;
-	char		fetch_done_str[32];
-	char		fetch_size_str[32];
-	pg_time_t	now;
-
-	if (!showprogress)
-		return;
-
-	now = time(NULL);
-	if (now == last_progress_report && !force)
-		return;	/* Max once per second */
-
-	last_progress_report = now;
-	percent = fetch_size ? (int) ((fetch_done) * 100 / fetch_size) : 0;
-
-	/*
-	 * Avoid overflowing past 100% or the full size. This may make the total
-	 * size number change as we approach the end of the backup (the estimate
-	 * will always be wrong if WAL is included), but that's better than having
-	 * the done column be bigger than the total.
-	 */
-	if (percent > 100)
-		percent = 100;
-	if (fetch_done > fetch_size)
-		fetch_size = fetch_done;
-
-	/*
-	 * Separate step to keep platform-dependent format code out of
-	 * translatable strings.  And we only test for INT64_FORMAT availability
-	 * in snprintf, not fprintf.
-	 */
-	snprintf(fetch_done_str, sizeof(fetch_done_str), INT64_FORMAT,
-			 

Re: Multivariate MCV stats can leak data to unprivileged users

2019-05-13 Thread Tomas Vondra

On Fri, May 10, 2019 at 10:19:44AM +0100, Dean Rasheed wrote:

While working on 1aebfbea83c, I noticed that the new multivariate MCV
stats feature suffers from the same problem, and also the original
problems that were fixed in e2d4ef8de8 and earlier --- namely that a
user can see values in the MCV lists that they shouldn't see (values
from tables that they don't have privileges on).

I think there are 2 separate issues here:

1). The table pg_statistic_ext is accessible to anyone, so any user
can see the MCV lists of any table. I think we should give this the
same treatment as pg_statistic, and hide it behind a security barrier
view, revoking public access from the table.

2). The multivariate MCV stats planner code can be made to invoke
user-defined operators, so a user can create a leaky operator and use
it to reveal data values from the MCV lists even if they have no
permissions on the table.

Attached is a draft patch to fix (2), which hooks into
statext_is_compatible_clause().



I think that patch is good.


I haven't thought much about (1). There are some questions about what
exactly the view should look like. Probably it should translate table
oids to names, like pg_stats does, but should it also translate column
indexes to names? That could get fiddly. Should it unpack MCV items?



Yeah. I suggest we add a simple pg_stats_ext view, similar to pg_stats.
It would:

(1) translate the schema / relation / attribute names

 I don't see why translating column indexes to names would be fiddly.
 It's a matter of a simple unnest + join, no? Or what issues do you see?

(2) include values for ndistinct / dependencies, if built

 Those don't include any actual values, so this should be OK. You might
 make the argument that even this does leak a bit of information (e.g.
 when you can see values in one column, and you know there's a strong
 functional dependence, it tells you something about the other column
 which you may not see). But I think we kinda already leak information
 about that through estimates, so maybe that's not an issue.

(3) include MCV list only when user has access to all columns

 Essentially, if the user is missing 'select' privilege on at least one
 of the columns, there'll be NULL. Otherwise the MCV value.

The attached patch adds pg_stats_ext doing this. I don't claim it's the
best possible query backing the view, so any improvements are welcome.


I've been thinking we might somehow filter the statistics values, e.g. by
not showing values for attributes the user has no 'select' privilege on,
but that seems like a can of worms. It'd lead to MCV items that can't be
distinguished because the only difference was the removed attribute, and
so on. So I propose we simply show/hide the whole MCV list.

Likewise, I don't think it makes sense to expand the MCV list in this
view - that works for the single-dimensional case, because then the
list is expanded into two arrays (values + frequencies), which are easy
to process further. But for multivariate MCV lists that's much more
complex - we don't know how many attributes are there, for example.

So I suggest we just show the pg_mcv_list value as is, and leave it up
to the user to call the pg_mcv_list_items() function if needed.

This will also work for histograms, where expanding the value in the
pg_stats_ext would be even trickier.
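For example, a sketch of that user-side call against the draft view
(assuming the stxmcv column name from the patch; "mytab" is a
placeholder table name):

```sql
-- Expand the raw pg_mcv_list value with the pg_mcv_list_items()
-- set-returning function, one row per MCV item.
SELECT m.index, m.values, m.frequency
FROM pg_stats_ext s,
     pg_mcv_list_items(s.stxmcv) AS m
WHERE s.tablename = 'mytab'
  AND s.stxmcv IS NOT NULL;
```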


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 566100d6df..b1302cfa5d 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -253,6 +253,26 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
 
 REVOKE ALL on pg_statistic FROM public;
 
+CREATE VIEW pg_stats_ext WITH (security_barrier) AS
+SELECT
+nspname AS schemaname,
+relname AS tablename,
+(SELECT array_agg(attname) FROM unnest(stxkeys) k
+   JOIN pg_attribute a ON (a.attnum = k)
+  WHERE attrelid = stxrelid) AS attnames,
+stxkind,
+stxndistinct,
+stxdependencies,
+(CASE WHEN EXISTS (SELECT 1 FROM unnest(stxkeys) k
+   JOIN pg_attribute a ON (a.attnum = k)
+  WHERE attrelid = stxrelid AND NOT has_column_privilege(c.oid, a.attnum, 'select')) THEN NULL
+ ELSE stxmcv END) stxmcv
+FROM pg_statistic_ext s JOIN pg_class c ON (c.oid = s.stxrelid)
+ LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
+WHERE (c.relrowsecurity = false OR NOT row_security_active(c.oid));
+
+REVOKE ALL on pg_statistic_ext FROM public;
+
 CREATE VIEW pg_publication_tables AS
 SELECT
 P.pubname AS pubname,


PostgreSQL 12 Beta 1 Release: 2019-05-23

2019-05-13 Thread Andres Freund
Hi,

The Release Management Team is pleased to announce that
the release date for PostgreSQL 12 Beta 1 is set to be 2019-05-23
(wrapping [1] the release 2019-05-20).

We’re excited to make the first beta for this latest major
release of PostgreSQL available for testing, and we welcome
all feedback.

Please let us know if you have any questions.

Regards,

Andres, on behalf of the PG 12 RMT

[1] https://wiki.postgresql.org/wiki/Release_process




Re: remove doc/bug.template?

2019-05-13 Thread Tom Lane
Magnus Hagander  writes:
> On Mon, May 13, 2019 at 10:00 PM Robert Haas  wrote:
>> On Mon, May 13, 2019 at 3:54 PM Peter Eisentraut
>>  wrote:
>>> I'm not sure doc/bug.template still serves a purpose.  There is bug
>>> reporting advice in the documentation, and there is a bug reporting
>>> form.  This file just seems outdated.  Should we remove it?

>> In my opinion, yes.

> +1.

No objection, but make sure you fix src/tools/version_stamp.pl.
(Looks like there's a reference in .gitattributes, too)

regards, tom lane




Re: remove doc/bug.template?

2019-05-13 Thread Magnus Hagander
On Mon, May 13, 2019 at 10:00 PM Robert Haas  wrote:

> On Mon, May 13, 2019 at 3:54 PM Peter Eisentraut
>  wrote:
> > I'm not sure doc/bug.template still serves a purpose.  There is bug
> > reporting advice in the documentation, and there is a bug reporting
> > form.  This file just seems outdated.  Should we remove it?
>
> In my opinion, yes.
>

+1.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: How to install login_hook in Postgres 10.5

2019-05-13 Thread legrand legrand
Hello,

This extension https://github.com/splendiddata/login_hook
seems very interesting !
But I didn't test it myself, and maybe the best place to ask
for support is there:
https://github.com/splendiddata/login_hook/issues

For information there is something equivalent in core
"[PATCH] A hook for session start"
https://www.postgresql.org/message-id/flat/20171103164305.1f952c0f%40asp437-24-g082ur#d897fd6ec551d242493815312de5f5d5

that ended up committed
"pgsql: Add hooks for session start and session end"
https://www.postgresql.org/message-id/flat/575d6fa2-78d0-4456-8600-302fc35b2591%40dunslane.net#0819e315c6e44c49a36c69080cab644d

but was finally rollbacked because it didn't pass installcheck test
...

Maybe you are able to patch your pg installation;
it would be a good solution (there is a nice example
of an extension included)

Showing interest in this may also help get this feature back ;o)

Regards
PAscal




--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: PG12, PGXS and linking pgfeutils

2019-05-13 Thread Alvaro Herrera
On 2019-May-13, Tom Lane wrote:

> I started working on a patch to do that, and soon noticed that there
> are pre-existing files logging.[hc] in src/bin/pg_rewind/.  This seems
> like a Bad Thing, in fact the #includes in pg_rewind/ are already a
> little confused due to this.  I think we should either rename those
> two pg_rewind files to something else, or rename the generic ones,
> perhaps to "fe_logging.[hc]".  The latter could be done nearly
> trivially as part of the movement patch, but on cosmetic grounds
> I'd be more inclined to do the former instead.  Thoughts?

I'd rename both :-)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: remove doc/bug.template?

2019-05-13 Thread Robert Haas
On Mon, May 13, 2019 at 3:54 PM Peter Eisentraut
 wrote:
> I'm not sure doc/bug.template still serves a purpose.  There is bug
> reporting advice in the documentation, and there is a bug reporting
> form.  This file just seems outdated.  Should we remove it?

In my opinion, yes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




remove doc/bug.template?

2019-05-13 Thread Peter Eisentraut
I'm not sure doc/bug.template still serves a purpose.  There is bug
reporting advice in the documentation, and there is a bug reporting
form.  This file just seems outdated.  Should we remove it?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: What is an item pointer, anyway?

2019-05-13 Thread Peter Geoghegan
On Mon, May 13, 2019 at 12:38 PM Tom Lane  wrote:
>  /*
> - * Prune specified item pointer or a HOT chain originating at that item.
> + * Prune specified line pointer or a HOT chain originating at that item.
>   *
>   * If the item is an index-referenced tuple (i.e. not a heap-only tuple),
>
> Should "that item" also be re-worded, for consistency?

Yes, it should be -- I'll fix it.

I'm going to backpatch the storage.sgml change on its own, while
pushing everything else in a separate HEAD-only commit.

Thanks
-- 
Peter Geoghegan




Quitting the thes

2019-05-13 Thread Stephen Amell

Hi all,

Here is my first patch for Postgres. Starting with an easy thing, I corrected
the duplicated "the the" strings in comments in some files.


- src/backend/executor/execExpr.c
- src/include/c.h
- src/include/jit/llvmjit_emit.h
- src/include/nodes/execnodes.h
- src/include/replication/logical.h

Any feedback is welcome!

Thanks a lot,


diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 0fb31f5..0a7b2b8 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -2361,7 +2361,7 @@ get_last_attnums_walker(Node *node, LastAttnumInfo *info)
  * Compute additional information for EEOP_*_FETCHSOME ops.
  *
  * The goal is to determine whether a slot is 'fixed', that is, every
- * evaluation of the the expression will have the same type of slot, with an
+ * evaluation of the expression will have the same type of slot, with an
  * equivalent descriptor.
  */
 static void
diff --git a/src/include/c.h b/src/include/c.h
index 33c9518..0f89c02 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -1129,7 +1129,7 @@ typedef union PGAlignedXLogBlock
  * Please note IT IS NOT SAFE to cast constness away if the result will ever
  * be modified (it would be undefined behaviour). Doing so anyway can cause
  * compiler misoptimizations or runtime crashes (modifying readonly memory).
- * It is only safe to use when the the result will not be modified, but API
+ * It is only safe to use when the result will not be modified, but API
  * design or language restrictions prevent you from declaring that
  * (e.g. because a function returns both const and non-const variables).
  *
diff --git a/src/include/jit/llvmjit_emit.h b/src/include/jit/llvmjit_emit.h
index 9569da6..71a8625 100644
--- a/src/include/jit/llvmjit_emit.h
+++ b/src/include/jit/llvmjit_emit.h
@@ -218,7 +218,7 @@ l_mcxt_switch(LLVMModuleRef mod, LLVMBuilderRef b, LLVMValueRef nc)
 }
 
 /*
- * Return pointer to the the argno'th argument nullness.
+ * Return pointer to the argno'th argument nullness.
  */
 static inline LLVMValueRef
 l_funcnullp(LLVMBuilderRef b, LLVMValueRef v_fcinfo, size_t argno)
@@ -236,7 +236,7 @@ l_funcnullp(LLVMBuilderRef b, LLVMValueRef v_fcinfo, size_t argno)
 }
 
 /*
- * Return pointer to the the argno'th argument datum.
+ * Return pointer to the argno'th argument datum.
  */
 static inline LLVMValueRef
 l_funcvaluep(LLVMBuilderRef b, LLVMValueRef v_fcinfo, size_t argno)
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index ff33287..4b1635e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -593,7 +593,7 @@ typedef struct EState
 * and with which options.  es_jit is created on-demand when JITing is
 * performed.
 *
-* es_jit_combined_instr is the the combined, on demand allocated,
+* es_jit_combined_instr is the combined, on demand allocated,
	* instrumentation from all workers. The leader's instrumentation is kept
 * separate, and is combined on demand by ExplainPrintJITSummary().
 */
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 0a2a63a..0daa38c 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -47,7 +47,7 @@ typedef struct LogicalDecodingContext
 
/*
	 * Marks the logical decoding context as fast forward decoding one. Such a
-* context does not have plugin loaded so most of the the following
+* context does not have plugin loaded so most of the following
 * properties are unused.
 */
	bool		fast_forward;


Re: Inconsistency between table am callback and table function names

2019-05-13 Thread Robert Haas
On Fri, May 10, 2019 at 3:43 PM Ashwin Agrawal  wrote:
> Meant to stick the question mark in that email, somehow missed. Yes
> not planning to spend any time on it if objections. Here is the list
> of renames I wish to perform.
>
> Lets start with low hanging ones.
>
> table_rescan -> table_scan_rescan
> table_insert -> table_tuple_insert
> table_insert_speculative -> table_tuple_insert_speculative
> table_delete -> table_tuple_delete
> table_update -> table_tuple_update
> table_lock_tuple -> table_tuple_lock
>
> Below two you already mentioned no objections to rename
> table_fetch_row_version -> table_tuple_fetch_row_version
> table_get_latest_tid -> table_tuple_get_latest_tid
>
> Now, table_beginscan and table_endscan are the ones which are
> wide-spread.

I vote to rename all the ones where the new name would contain "tuple"
and to leave the others alone.  i.e. leave table_beginscan,
table_endscan, and table_rescan as they are.  I think that there's
little benefit in standardizing table_rescan but not the other two,
and we seem to agree that tinkering with the other two gets into a
painful amount of churn.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: What is an item pointer, anyway?

2019-05-13 Thread Tom Lane
Peter Geoghegan  writes:
> On Sun, May 5, 2019 at 1:14 PM Peter Geoghegan  wrote:
>> Attached draft patch adjusts code comments and error messages where a
>> line pointer is referred to as an item pointer. It turns out that this
>> practice isn't all that prevalent. Here are some specific concerns
>> that I had to think about when writing the patch, though:

> Ping? Any objections to pushing ahead with this?

Patch looks fine to me.  One minor quibble: in pruneheap.c you have

 /*
- * Prune specified item pointer or a HOT chain originating at that item.
+ * Prune specified line pointer or a HOT chain originating at that item.
  *
  * If the item is an index-referenced tuple (i.e. not a heap-only tuple),

Should "that item" also be re-worded, for consistency?

regards, tom lane




Re: PG12, PGXS and linking pgfeutils

2019-05-13 Thread Tom Lane
I wrote:
> Alvaro Herrera  writes:
>> I wonder if a better solution isn't to move the file_utils stuff to
>> fe_utils.  Half of it is frontend-specific.  The only one that should be
>> shared to backend seems to be fsync_fname ... but instead of sharing it,
>> we have a second copy in fd.c.

> Hm, if file_utils is the only thing in common/ that uses this, and we
> expect that to remain true, that would fix the issue.  But ...

Thumbing through commit cc8d41511, I see that it already touched
five common/ modules

diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
diff --git a/src/common/file_utils.c b/src/common/file_utils.c
diff --git a/src/common/pgfnames.c b/src/common/pgfnames.c
diff --git a/src/common/restricted_token.c b/src/common/restricted_token.c
diff --git a/src/common/rmtree.c b/src/common/rmtree.c

Several of those have substantial backend components, so moving them
to fe_utils is a nonstarter.  I think moving fe_utils/logging.[hc] to
common/ is definitely the way to get out of this problem.


I started working on a patch to do that, and soon noticed that there
are pre-existing files logging.[hc] in src/bin/pg_rewind/.  This seems
like a Bad Thing, in fact the #includes in pg_rewind/ are already a
little confused due to this.  I think we should either rename those
two pg_rewind files to something else, or rename the generic ones,
perhaps to "fe_logging.[hc]".  The latter could be done nearly
trivially as part of the movement patch, but on cosmetic grounds
I'd be more inclined to do the former instead.  Thoughts?

regards, tom lane




Re: What is an item pointer, anyway?

2019-05-13 Thread Peter Geoghegan
On Sun, May 5, 2019 at 1:14 PM Peter Geoghegan  wrote:
> Attached draft patch adjusts code comments and error messages where a
> line pointer is referred to as an item pointer. It turns out that this
> practice isn't all that prevalent. Here are some specific concerns
> that I had to think about when writing the patch, though:

Ping? Any objections to pushing ahead with this?

-- 
Peter Geoghegan




Re: SQL-spec incompatibilities in similar_escape() and related stuff

2019-05-13 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Hmm.  Oddly, you can't fix it by adding parens:

 Tom> regression=# select 'foo' similar to ('f' || escape) escape escape from 
(values ('oo')) v(escape);
 Tom> psql: ERROR:  syntax error at or near "escape"
 Tom> LINE 1: select 'foo' similar to ('f' || escape) escape escape from (...
 Tom> ^

 Tom> Since "escape" is an unreserved word, I'd have expected that to
 Tom> work. Odd.

Simpler cases fail too:

select 'f' || escape from (values ('o')) v(escape);
psql: ERROR:  syntax error at or near "escape"

select 1 + escape from (values (1)) v(escape);  -- works
select 1 & escape from (values (1)) v(escape);  -- fails

in short ESCAPE can't follow any generic operator, because its lower
precedence forces the operator to be reduced as a postfix op instead.

 Tom> The big picture here is that fixing grammar ambiguities by adding
 Tom> precedence is a dangerous business :-(

Yeah. But the alternative is usually reserving words more strictly,
which has its own issues :-(

Or we could kill off postfix operators...

-- 
Andrew (irc:RhodiumToad)




Re: PANIC :Call AbortTransaction when transaction id is no normal

2019-05-13 Thread Tom Lane
I wrote:
> If we do anything at all about this, my thought would just be to change
> bootstrap_signals() so that it points all the signal handlers at
> quickdie(), or maybe something equivalent to quickdie() but printing
> a more apropos message, or even just set them all to SIGDFL since that
> means process termination for all of these.  die() isn't really the right
> thing, precisely because it thinks it can trigger transaction abort,
> which makes no sense in bootstrap mode.

After further thought I like the SIG_DFL answer, as per attached proposed
patch.  With this, you get SIGINT behavior like this if you manage to
catch it in bootstrap mode (which is not that easy these days):

selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... America/New_York
creating configuration files ... ok
running bootstrap script ... ^Cchild process was terminated by signal 2: 
Interrupt
initdb: removing data directory "/home/postgres/testversion/data"

That seems perfectly fine from here.

> But ... that code's been like that for decades and nobody's complained
> before.  Why are we worried about bootstrap's response to signals at all?

I'm still wondering why the OP cares.  Still, the PANIC message that you
get right now is potentially confusing.

regards, tom lane

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index d8776e1..825433d 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -558,11 +558,16 @@ bootstrap_signals(void)
 {
 	Assert(!IsUnderPostmaster);
 
-	/* Set up appropriately for interactive use */
-	pqsignal(SIGHUP, die);
-	pqsignal(SIGINT, die);
-	pqsignal(SIGTERM, die);
-	pqsignal(SIGQUIT, die);
+	/*
+	 * We don't actually need any non-default signal handling in bootstrap
+	 * mode; "curl up and die" is a sufficient response for all these cases.
+	 * But let's just make sure the signals are set that way, since our parent
+	 * process initdb has them set differently.
+	 */
+	pqsignal(SIGHUP, SIG_DFL);
+	pqsignal(SIGINT, SIG_DFL);
+	pqsignal(SIGTERM, SIG_DFL);
+	pqsignal(SIGQUIT, SIG_DFL);
 }
 
 /*


Re: POC: Cleaning up orphaned files using undo logs

2019-05-13 Thread Robert Haas
On Sun, May 12, 2019 at 2:15 AM Dilip Kumar  wrote:
> I have removed some of the globals and also improved some comments.

I don't like the discard_lock very much.  Perhaps it's OK, but I hope
that there are better alternatives.  One problem with Thomas Munro
pointed out to me in off-list discussion is that the discard_lock has
to be held by anyone reading undo even if the undo they are reading
and the undo that the discard worker wants to discard are in
completely different parts of the undo log.  Somebody could be trying
to read an undo page written 1 second ago while the discard worker is
trying to discard an undo page written to the same undo log 1 hour
ago.  Those things need not block each other, but with this design
they will.  Another problem is that we end up holding it across an
I/O; there's precedent for that, but it's not particularly good
precedent.  Let's see if we can do better.

My first idea was that we should just make this the caller's problem
instead of handling it in this layer.  Undo is retained for committed
transactions until they are all-visible, and the reason for that is
that we presume that nobody can be interested in the data for MVCC
purposes unless there's a snapshot that can't see the results of the
transaction in question.  Once the committed transaction is
all-visible, that's nobody, so it should be fine to just discard the
undo any time we like.  That won't work with the existing zheap code,
which currently sometimes follows undo chains for transactions that
are all-visible, but I think that's a problem we should fix rather
than something we should force the undo layer to support.  We'd still
need something kinda like the discard_lock for aborted transactions,
though, because as soon as you release the buffer lock on a table
page, the undo workers could apply all the undo to that page and then
discard it, and then you could afterwards try to look up the undo
pointer which you had retrieved from that page and stored in
backend-local memory.  One thing we could probably do is make that a
heavyweight lock on the XID itself, so if you observe that an XID is
aborted, you have to go get this lock in ShareLock mode, then recheck
the page, and only then consult the undo; discarding the undo for an
aborted transaction would require AccessExclusiveLock on the XID.
This solution gets rid of the LWLock for committed undo; for aborted
undo, it avoids the false sharing and non-interruptibility that an
LWLock imposes.

But then I had what I think may be a better idea.  Let's add a new
ReadBufferMode that suppresses the actual I/O; if the buffer is not
already present in shared_buffers, it allocates a buffer but returns
it without doing any I/O, so the caller must be prepared for BM_VALID
to be unset.  I don't know what to call this, so I'll call it
RBM_ALLOCATE (leaving room for possible future variants like
RBM_ALLOCATE_AND_LOCK).  Then, the protocol for reading an undo buffer
would go like this:

1. Read the buffer with RBM_ALLOCATE, thus acquiring a pin on the
relevant buffer.
2. Check whether the buffer precedes the discard horizon for that undo
log stored in shared memory.
3. If so, use the ForgetBuffer() code we have in the zheap branch to
deallocate the buffer and stop here.  The undo is not available to be
read, whether it's still physically present or not.
4. Otherwise, if the buffer is not valid, call ReadBufferExtended
again, or some new function, to make it so.  Remember to release all
of our pins.

The protocol for discarding an undo buffer would go like this:

1. Advance the discard horizon in shared memory.
2. Take a cleanup lock on each buffer that ought to be discarded.
Remember the dirty ones and forget the others.
3. WAL-log the discard operation.
4. Revisit the dirty buffers we remembered in step 2 and forget them.

The idea is that, once we've advanced the discard horizon in shared
memory, any readers that come along later are responsible for making
sure that they never do I/O on any older undo.  They may create some
invalid buffers in shared memory, but they'll hopefully also get rid
of them if they do, and if they error out for some reason before doing
so, that buffer should age out naturally.  So, the discard worker just
needs to worry about buffers that already exist.  Once it's taken a
cleanup lock on each buffer, it knows that there are no I/O operations
and in fact no buffer usage of any kind still in progress from before
it moved the in-memory discard horizon.  Anyone new that comes along
will clean up after themselves.  We postpone forgetting dirty buffers
until after we've successfully WAL-logged the discard, in case we fail
to do so.

With this design, we don't add any new cases where a lock of any kind
must be held across an I/O, and there's also no false sharing.
Furthermore, unlike the previous proposal, this will work nicely with
something like old_snapshot_threshold.  The previous design relies on
undo not getting discarded while anyone still cares 

Re: SQL-spec incompatibilities in similar_escape() and related stuff

2019-05-13 Thread Tom Lane
Andrew Gierth  writes:
> "Andrew" == Andrew Gierth  writes:
>  Andrew> The ESCAPE part could in theory be ambiguous if the SIMILAR
>  Andrew> expression ends in a ... SIMILAR TO xxx operator, since then we
>  Andrew> wouldn't know whether to attach the ESCAPE to that or keep it
>  Andrew> as part of the function syntax. But I think this is probably a
>  Andrew> non-issue. More significant is that ... COLNAME ! ESCAPE ...
>  Andrew> again has postfix- vs. infix-operator ambiguities.

> And this ambiguity shows up already in other contexts:

> select 'foo' similar to 'f' || escape escape escape from (values ('oo')) 
> v(escape);
> psql: ERROR:  syntax error at or near "escape"
> LINE 1: select 'foo' similar to 'f' || escape escape escape from (va...


Hmm.  Oddly, you can't fix it by adding parens:

regression=# select 'foo' similar to ('f' || escape) escape escape from (values 
('oo')) v(escape);
psql: ERROR:  syntax error at or near "escape"
LINE 1: select 'foo' similar to ('f' || escape) escape escape from (...
^

Since "escape" is an unreserved word, I'd have expected that to work.
Odd.

The big picture here is that fixing grammar ambiguities by adding
precedence is a dangerous business :-(

regards, tom lane




Re: [HACKERS] Unlogged tables cleanup

2019-05-13 Thread Andres Freund
Hi,

On 2019-05-13 13:33:00 -0400, Alvaro Herrera wrote:
> On 2019-May-13, Andres Freund wrote:
> 
> > On 2019-05-13 13:07:30 -0400, Alvaro Herrera wrote:
> > > On 2019-May-13, Andres Freund wrote:
> 
> > > The first ResetUnloggedRelations call occurs before any WAL is replayed,
> > > so the data dir certainly still in inconsistent state.  At that point,
> > > we need the init fork files to be present, because the init files are the
> > > indicators of what relations we need to delete the other forks for.
> > 
> > Hm. I think this might be a self-made problem. For the main fork, we
> > don't need this - if the init fork was created before the last
> > checkpoint/restartpoint, it'll be on-disk. If it was created afterwards,
> > WAL replay will recreate both main and init forks. So the problem is just
> > that the VM fork might survive, because it'll not get nuked given the
> > current arrangement. Is that what you're thinking about?

I was wrong here - I thought we WAL logged the main fork creation even
for unlogged tables. I think it's foolish that we don't, but we don't.

Greetings,

Andres Freund




Re: pg12 release notes

2019-05-13 Thread Justin Pryzby
On Fri, May 10, 2019 at 05:10:06PM +1200, David Rowley wrote:
> On Fri, 10 May 2019 at 16:52, Justin Pryzby  wrote:
> > I'm rechecking my list from last month.  What about these ?
> >
> > > c076f3d Remove pg_constraint.conincluding
> > > bd09503 Increase the default vacuum_cost_limit from 200 to 2000
> 
> bd09503 was reverted in 52985e4fea7 and replaced by cbccac371, which
> is documented already by:
> 
>  Reduce the default value of 
> autovacuum_vacuum_cost_delay to 2ms (Tom Lane) 

Right, thanks.

I suspect c076f3d should be included, though.

Also,

|The many system tables with such columns will now display those columns with 
SELECT * by default. 
I think could be better:
|SELECT * will now output those columns for the many system tables which have 
them.
|(previously, the columns weren't shown unless explicitly selected).

Justin




Re: SQL-spec incompatibilities in similar_escape() and related stuff

2019-05-13 Thread Andrew Gierth
> "Andrew" == Andrew Gierth  writes:

 Andrew> The ESCAPE part could in theory be ambiguous if the SIMILAR
 Andrew> expression ends in a ... SIMILAR TO xxx operator, since then we
 Andrew> wouldn't know whether to attach the ESCAPE to that or keep it
 Andrew> as part of the function syntax. But I think this is probably a
 Andrew> non-issue. More significant is that ... COLNAME ! ESCAPE ...
 Andrew> again has postfix- vs. infix-operator ambiguities.

And this ambiguity shows up already in other contexts:

select 'foo' similar to 'f' || escape escape escape from (values ('oo')) 
v(escape);
psql: ERROR:  syntax error at or near "escape"
LINE 1: select 'foo' similar to 'f' || escape escape escape from (va...

select 'foo' similar to 'f' || escape escape from (values ('oo')) v(escape);
psql: ERROR:  operator does not exist: unknown ||
LINE 1: select 'foo' similar to 'f' || escape escape from (values ('...

I guess this happens because ESCAPE has precedence below POSTFIXOP, so
the ('f' ||) gets reduced in preference to shifting in the first ESCAPE
token.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Unlogged tables cleanup

2019-05-13 Thread Andres Freund
Hi,

On 2019-05-13 13:21:33 -0400, Alvaro Herrera wrote:
> On 2019-May-13, Robert Haas wrote:
> > If that's the scenario, I'm not sure the smgrimmedsync() call is
> > sufficient.  Suppose we log_smgrcreate() but then crash before
> > smgrimmedsync()... seems like we'd need to do them in the other order,
> > or else maybe just pass a flag to copy_file() telling it not to be so
> > picky.
> 
> Well, if we do modify copy_file and thus we don't think the deletion
> step serves any purpose, why not just get rid of the deletion step
> entirely?

Well, it does serve a purpose:
/*
 * We're in recovery, so unlogged relations may be trashed and 
must be
 * reset.  This should be done BEFORE allowing Hot Standby
 * connections, so that read-only backends don't try to read 
whatever
 * garbage is left over from before.
 */

but I'm somewhat doubtful that this logic is correct.

Greetings,

Andres Freund




Re: [HACKERS] Unlogged tables cleanup

2019-05-13 Thread Andres Freund
On 2019-05-13 13:09:17 -0400, Robert Haas wrote:
> On Mon, May 13, 2019 at 12:50 PM Andres Freund  wrote:
> > > AFAICS ResetUnloggedRelations copies the init fork after replaying WAL,
> > > so it would be sufficient to have the init fork be recovered from WAL
> > > for that to work.  However, we also do ResetUnloggedRelations *before*
> > > replaying WAL in order to remove leftover not-init-fork files, and that
> > > process requires that the init fork is present at that time.
> >
> > What scenario are you precisely wondering about? That
> > ResetUnloggedRelations() could overwrite the main fork, while not yet
> > having valid contents (due to the lack of smgrimmedsync())? Shouldn't
> > that only be possible while still in an inconsistent state? A checkpoint
> > would have serialized the correct contents, and we'd not reach HS
> > consistency before having replayed that WAL records resetting the table
> > and the init fork consistency?
> 
> I think I see what Alvaro is talking about, or at least I think I see
> *a* possible problem based on his remarks.
> 
> Suppose we create an unlogged table and then crash. The main fork
> makes it to disk, and the init fork does not.  Before WAL replay, we
> remove any main forks that have init forks, but because the init fork
> was lost, that does not happen.  Recovery recreates the init fork.
> After WAL replay, we try to copy_file() each _init fork to the
> corresponding main fork. That fails, because copy_file() expects to be
> able to create the target file, and here it can't do that because it
> already exists.

Ugh, this is all such a mess. But, isn't this broken independently of
the smgrimmedsync() issue? In a basebackup case, the basebackup could
have included the main fork, but not the init fork, and the reverse. WAL
replay *solely* needs to be able to recover from that.  At the very
least we'd have to do the cleanup step after becoming consistent, not
just before recovery even started.


> If that's the scenario, I'm not sure the smgrimmedsync() call is
> sufficient.  Suppose we log_smgrcreate() but then crash before
> smgrimmedsync()... seems like we'd need to do them in the other order,
> or else maybe just pass a flag to copy_file() telling it not to be so
> picky.

I think I mentioned this elsewhere recently, I think our smgr WAL
logging makes this unnecessarily hard. We should just have the initial
smgrcreate record (be it for the main fork, or just the init fork) carry
a flag that tells WAL replay to clean out everything pre-existing


Hm, I'm also a bit confused as to how ResetUnloggedRelations() got to
take its argument as a bitmask - given that we only pass in individual
flags at the moment.  Seems pretty clear to me that the call at the end
of recovery would need to be UNLOGGED_RELATION_CLEANUP |
UNLOGGED_RELATION_INIT, unless we make some bigger changes?

Greetings,

Andres Freund




Re: [HACKERS] Unlogged tables cleanup

2019-05-13 Thread Alvaro Herrera
On 2019-May-13, Andres Freund wrote:

> On 2019-05-13 13:07:30 -0400, Alvaro Herrera wrote:
> > On 2019-May-13, Andres Freund wrote:

> > The first ResetUnloggedRelations call occurs before any WAL is replayed,
> > so the data dir is certainly still in an inconsistent state.  At that point,
> > we need the init fork files to be present, because the init files are the
> > indicators of what relations we need to delete the other forks for.
> 
> Hm. I think this might be a self-made problem. For the main fork, we
> don't need this - if the init fork was created before the last
> checkpoint/restartpoint, it'll be on-disk. If it was created afterwards,
> WAL replay will recreate both main and init forks. So the problem is just
> that the VM fork might survive, because it'll not get nuked given the
> current arrangement. Is that what you're thinking about?

No, this wasn't was I was trying to explain.  Robert described it
better.

> > Maybe we can do something lighter than a full immedsync of all the data
> > for the init file -- it would be sufficient to have the file *exist* --
> > but I'm not sure this optimization is worth anything.
> 
> I don't think just that is sufficient in isolation for types of
> relations with metapages (e.g. btree) - the init fork contains data
> there.

No, I meant that when doing the initial cleanup (before WAL replay) we
only delete files; and for that we only need to know whether the table
is unlogged, and we know that by testing presence of the init file.  We
do need the contents *after* WAL replay, and for indexes we of course
need the actual contents of the init fork.

> > > Well, otherwise the relation won't exist on a standby? And if replay
> > > starts from before a database/tablespace creation we'd remove the init
> > fork. So if it's not in the WAL, we'd lose it.
> > 
> > Ah, of course.  Well, that needs to be in the comments then.
> 
> I think it is?
> 
>  * ... Recovery may as well remove it
>* while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
>* record. Therefore, logging is necessary even if wal_level=minimal.

I meant the standby part.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Unlogged tables cleanup

2019-05-13 Thread Andres Freund
Hi,

On 2019-05-13 13:07:30 -0400, Alvaro Herrera wrote:
> On 2019-May-13, Andres Freund wrote:
> 
> > On 2019-05-13 12:24:05 -0400, Alvaro Herrera wrote:
> 
> > > AFAICS ResetUnloggedRelations copies the init fork after replaying WAL,
> > > so it would be sufficient to have the init fork be recovered from WAL
> > > for that to work.  However, we also do ResetUnloggedRelations *before*
> > > replaying WAL in order to remove leftover not-init-fork files, and that
> > > process requires that the init fork is present at that time.
> > 
> > What scenario are you precisely wondering about? That
> > ResetUnloggedRelations() could overwrite the main fork, while not yet
> > having valid contents (due to the lack of smgrimmedsync())? Shouldn't
> > that only be possible while still in an inconsistent state? A checkpoint
> > would have serialized the correct contents, and we'd not reach HS
> > consistency before having replayed that WAL records resetting the table
> > and the init fork consistency?
> 
> The first ResetUnloggedRelations call occurs before any WAL is replayed,
> so the data dir is certainly still in an inconsistent state.  At that point,
> we need the init fork files to be present, because the init files are the
> indicators of what relations we need to delete the other forks for.

Hm. I think this might be a self-made problem. For the main fork, we
don't need this - if the init fork was created before the last
checkpoint/restartpoint, it'll be on-disk. If it was created afterwards,
WAL replay will recreate both main and init forks. So the problem is just
that the VM fork might survive, because it'll not get nuked given the
current arrangement. Is that what you're thinking about?

I'm doubtful that that is a sane arrangement - there very well could be
tables created and dropped, and then recreated with a recycled oid,
between start and end of recovery. I'm not sure this is actively a
problem for the VM, but I think it's pretty broken for the FSM.

Why isn't the correct answer to nuke all forks during the WAL replay of
the main relation's creation?


> Maybe we can do something lighter than a full immedsync of all the data
> for the init file -- it would be sufficient to have the file *exist* --
> but I'm not sure this optimization is worth anything.

I don't think just that is sufficient in isolation for types of
relations with metapages (e.g. btree) - the init fork contains data
there.


> > > So I think the immedsync call is necessary (otherwise the cleanup
> > > may fail).  I don't quite understand why the log_smgrcreate is
> > > necessary, but I think it is for reasons that are not adequately
> > > explained by the existing comments.
> > 
> > Well, otherwise the relation won't exist on a standby? And if replay
> > starts from before a database/tablespace creation we'd remove the init
> fork. So if it's not in the WAL, we'd lose it.
> 
> Ah, of course.  Well, that needs to be in the comments then.

I think it is?

 * ... Recovery may as well remove it
 * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
 * record. Therefore, logging is necessary even if wal_level=minimal.

Greetings,

Andres Freund




Re: [HACKERS] Unlogged tables cleanup

2019-05-13 Thread Alvaro Herrera
On 2019-May-13, Robert Haas wrote:

> I think I see what Alvaro is talking about, or at least I think I see
> *a* possible problem based on his remarks.
> 
> Suppose we create an unlogged table and then crash. The main fork
> makes it to disk, and the init fork does not.  Before WAL replay, we
> remove any main forks that have init forks, but because the init fork
> was lost, that does not happen.  Recovery recreates the init fork.
> After WAL replay, we try to copy_file() each _init fork to the
> corresponding main fork. That fails, because copy_file() expects to be
> able to create the target file, and here it can't do that because it
> already exists.

... right, that seems to be it.

> If that's the scenario, I'm not sure the smgrimmedsync() call is
> sufficient.  Suppose we log_smgrcreate() but then crash before
> smgrimmedsync()... seems like we'd need to do them in the other order,
> or else maybe just pass a flag to copy_file() telling it not to be so
> picky.

Well, if we do modify copy_file and thus we don't think the deletion
step serves any purpose, why not just get rid of the deletion step
entirely?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: att_isnull

2019-05-13 Thread Robert Haas
On Fri, May 10, 2019 at 10:46 AM Tom Lane  wrote:
> Alvaro Herrera  writes:
> > Yeah, I've noticed this inconsistency too.  I doubt we want to change
> > the macro definition or its name, but +1 for expanding the comment.
> > Your proposed wording seems sufficient.
>
> +1

OK, committed.  I assume nobody is going to complain that such changes
are off-limits during feature freeze, but maybe I'll be unpleasantly
surprised.

> > I remember being bit by this inconsistency while fixing data corruption
> > problems, but I'm not sure what, if anything, should we do about it.
> > Maybe there's a perfect spot where to add some further documentation
> > about it (a code comment somewhere?) but I don't know where would that
> > be.
>
> It is documented in the "Database Physical Storage" part of the docs,
> but no particular emphasis is laid on the 1-vs-0 convention.  Maybe
> a few more words there are worthwhile?

To me it seems like we more need to emphasize it in the code comments,
but I have no concrete proposal.  I don't think this is an urgent
problem that needs to consume a lot of cycles right now, but I thought
it was worth mentioning for the archives and just to get the idea out
there that maybe we could do better someday.

(My first idea was to deadpan a proposal that we reverse the
convention, but then I realized that trolling the list might not be my
best strategy.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Unlogged tables cleanup

2019-05-13 Thread Robert Haas
On Mon, May 13, 2019 at 12:50 PM Andres Freund  wrote:
> > AFAICS ResetUnloggedRelations copies the init fork after replaying WAL,
> > so it would be sufficient to have the init fork be recovered from WAL
> > for that to work.  However, we also do ResetUnloggedRelations *before*
> > replaying WAL in order to remove leftover not-init-fork files, and that
> > process requires that the init fork is present at that time.
>
> What scenario are you precisely wondering about? That
> ResetUnloggedRelations() could overwrite the main fork, while not yet
> having valid contents (due to the lack of smgrimmedsync())? Shouldn't
> that only be possible while still in an inconsistent state? A checkpoint
> would have serialized the correct contents, and we'd not reach HS
> consistency before having replayed that WAL records resetting the table
> and the init fork consistency?

I think I see what Alvaro is talking about, or at least I think I see
*a* possible problem based on his remarks.

Suppose we create an unlogged table and then crash. The main fork
makes it to disk, and the init fork does not.  Before WAL replay, we
remove any main forks that have init forks, but because the init fork
was lost, that does not happen.  Recovery recreates the init fork.
After WAL replay, we try to copy_file() each _init fork to the
corresponding main fork. That fails, because copy_file() expects to be
able to create the target file, and here it can't do that because it
already exists.

If that's the scenario, I'm not sure the smgrimmedsync() call is
sufficient.  Suppose we log_smgrcreate() but then crash before
smgrimmedsync()... seems like we'd need to do them in the other order,
or else maybe just pass a flag to copy_file() telling it not to be so
picky.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Unlogged tables cleanup

2019-05-13 Thread Alvaro Herrera
On 2019-May-13, Andres Freund wrote:

> On 2019-05-13 12:24:05 -0400, Alvaro Herrera wrote:

> > AFAICS ResetUnloggedRelations copies the init fork after replaying WAL,
> > so it would be sufficient to have the init fork be recovered from WAL
> > for that to work.  However, we also do ResetUnloggedRelations *before*
> > replaying WAL in order to remove leftover not-init-fork files, and that
> > process requires that the init fork is present at that time.
> 
> What scenario are you precisely wondering about? That
> ResetUnloggedRelations() could overwrite the main fork, while not yet
> having valid contents (due to the lack of smgrimmedsync())? Shouldn't
> that only be possible while still in an inconsistent state? A checkpoint
> would have serialized the correct contents, and we'd not reach HS
> consistency before having replayed the WAL records resetting the table
> and the init fork?

The first ResetUnloggedRelations call occurs before any WAL is replayed,
so the data dir is certainly still in an inconsistent state.  At that point,
we need the init fork files to be present, because the init files are the
indicators of what relations we need to delete the other forks for.

Maybe we can do something lighter than a full immedsync of all the data
for the init file -- it would be sufficient to have the file *exist* --
but I'm not sure this optimization is worth anything.

> > So I think the immedsync call is necessary (otherwise the cleanup
> > may fail).  I don't quite understand why the log_smgrcreate is
> > necessary, but I think it is for reasons that are not adequately
> > explained by the existing comments.
> 
> Well, otherwise the relation won't exist on a standby? And if replay
> starts from before a database/tablespace creation we'd remove the init
> fork. So if it's not in the WAL, we'd lose it.

Ah, of course.  Well, that needs to be in the comments then.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Unlogged tables cleanup

2019-05-13 Thread Andres Freund
Hi,

On 2019-05-13 12:24:05 -0400, Alvaro Herrera wrote:
> > My guess, just shooting from the hip, is that the smgrimmedsync call
> > can be removed here.  If that's wrong, then we need a better
> > explanation for why it's needed, and we possibly need to add it to
> > every single place that does smgrcreate that doesn't have it already.
> 
> AFAICS ResetUnloggedRelations copies the init fork after replaying WAL,
> so it would be sufficient to have the init fork be recovered from WAL
> for that to work.  However, we also do ResetUnloggedRelations *before*
> replaying WAL in order to remove leftover not-init-fork files, and that
> process requires that the init fork is present at that time.

What scenario are you precisely wondering about? That
ResetUnloggedRelations() could overwrite the main fork, while not yet
having valid contents (due to the lack of smgrimmedsync())? Shouldn't
that only be possible while still in an inconsistent state? A checkpoint
would have serialized the correct contents, and we'd not reach HS
consistency before having replayed the WAL records resetting the table
and the init fork?


> So I think
> the immedsync call is necessary (otherwise the cleanup may fail).  I
> don't quite understand why the log_smgrcreate is necessary, but I think
> it is for reasons that are not adequately explained by the existing
> comments.

Well, otherwise the relation won't exist on a standby? And if replay
starts from before a database/tablespace creation we'd remove the init
fork. So if it's not in the WAL, we'd lose it.


> IMO if you can remove either the immedsync or the log_smgrcreate call
> and no test fails, then we're either missing test cases, or (one of) the
> calls is unnecessary.

We definitely don't have a high coverage of more complicated recovery
scenarios.

Greetings,

Andres Freund




Re: [HACKERS] Unlogged tables cleanup

2019-05-13 Thread Alvaro Herrera
I agree that the wording "recovery may as well" is incorrect and that
"may well" makes it correct.

On 2019-May-13, Robert Haas wrote:

> My guess, just shooting from the hip, is that the smgrimmedsync call
> can be removed here.  If that's wrong, then we need a better
> explanation for why it's needed, and we possibly need to add it to
> every single place that does smgrcreate that doesn't have it already.

AFAICS ResetUnloggedRelations copies the init fork after replaying WAL,
so it would be sufficient to have the init fork be recovered from WAL
for that to work.  However, we also do ResetUnloggedRelations *before*
replaying WAL in order to remove leftover not-init-fork files, and that
process requires that the init fork is present at that time.  So I think
the immedsync call is necessary (otherwise the cleanup may fail).  I
don't quite understand why the log_smgrcreate is necessary, but I think
it is for reasons that are not adequately explained by the existing
comments.

IMO if you can remove either the immedsync or the log_smgrcreate call
and no test fails, then we're either missing test cases, or (one of) the
calls is unnecessary.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Removing unneeded self joins

2019-05-13 Thread Alexander Kuzmenkov

On 3/25/19 18:13, Alexander Kuzmenkov wrote:


Please see the attached v15.



I won't be able to continue working on this because I'm changing jobs. 
My colleague Arseny Sher is probably going to take over.


Here is a v16 that is a rebased v12, plus renames from v15, plus a 
couple of bug fixes (use bms_is_empty instead of a NULL check, and 
properly create a NullTest clause when replacing "X = X").


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From 7ef160823a4051cad54852783c566c51366c2a8a Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov 
Date: Mon, 13 May 2019 19:11:10 +0300
Subject: [PATCH] Remove self joins v16

---
 src/backend/nodes/outfuncs.c  |  19 +-
 src/backend/optimizer/path/indxpath.c |  28 +-
 src/backend/optimizer/path/joinpath.c |   6 +-
 src/backend/optimizer/plan/analyzejoins.c | 981 +++---
 src/backend/optimizer/plan/planmain.c |   7 +-
 src/backend/optimizer/util/pathnode.c |   3 +-
 src/backend/optimizer/util/relnode.c  |  26 +-
 src/include/nodes/nodes.h |   1 +
 src/include/nodes/pathnodes.h |  22 +-
 src/include/optimizer/pathnode.h  |   5 +
 src/include/optimizer/paths.h |   4 +-
 src/include/optimizer/planmain.h  |   9 +-
 src/test/regress/expected/equivclass.out  |  32 +
 src/test/regress/expected/join.out| 248 +++-
 src/test/regress/sql/equivclass.sql   |  17 +
 src/test/regress/sql/join.sql | 127 
 16 files changed, 1436 insertions(+), 99 deletions(-)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 387e4b9..8c6d755 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2271,7 +2271,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
 	WRITE_OID_FIELD(userid);
 	WRITE_BOOL_FIELD(useridiscurrent);
 	/* we don't try to print fdwroutine or fdw_private */
-	/* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+	WRITE_NODE_FIELD(unique_for_rels);
+	/* can't print non_unique_for_rels; BMSes aren't Nodes */
 	WRITE_NODE_FIELD(baserestrictinfo);
 	WRITE_UINT_FIELD(baserestrict_min_security);
 	WRITE_NODE_FIELD(joininfo);
@@ -2344,6 +2345,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
 }
 
 static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+	WRITE_NODE_TYPE("UNIQUERELINFO");
+
+	WRITE_BITMAPSET_FIELD(outerrelids);
+	if (node->index)
+	{
+		WRITE_OID_FIELD(index->indexoid);
+		WRITE_NODE_FIELD(column_values);
+	}
+}
+
+static void
 _outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
 {
 	/*
@@ -4109,6 +4123,9 @@ outNode(StringInfo str, const void *obj)
 			case T_StatisticExtInfo:
 _outStatisticExtInfo(str, obj);
 break;
+			case T_UniqueRelInfo:
+_outUniqueRelInfo(str, obj);
+break;
 			case T_ExtensibleNode:
 _outExtensibleNode(str, obj);
 break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 3434219..88e61d4 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3583,7 +3583,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
  * relation_has_unique_index_for
  *	  Determine whether the relation provably has at most one row satisfying
  *	  a set of equality conditions, because the conditions constrain all
- *	  columns of some unique index.
+ *	  columns of some unique index. If unique_info is not null, it is set to
+ *	  point to a new UniqueRelInfo containing the index and conditions.
  *
  * The conditions can be represented in either or both of two ways:
  * 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3604,12 +3605,16 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 			  List *restrictlist,
-			  List *exprlist, List *oprlist)
+			  List *exprlist, List *oprlist,
+			  UniqueRelInfo **unique_info)
 {
 	ListCell   *ic;
 
 	Assert(list_length(exprlist) == list_length(oprlist));
 
+	if (unique_info)
+		*unique_info = NULL;
+
 	/* Short-circuit if no indexes... */
 	if (rel->indexlist == NIL)
 		return false;
@@ -3660,6 +3665,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 	{
 		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
 		int			c;
+		List *column_values = NIL;
 
 		/*
 		 * If the index is not unique, or not immediately enforced, or if it's
@@ -3708,6 +3714,9 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 if (match_index_to_operand(rexpr, c, ind))
 {
 	matched = true; /* column is unique */
+	column_values = lappend(column_values, rinfo->outer_is_left
+			? get_leftop(rinfo->clause)
+			: get_rightop(rinfo->clause));
 	break;
 }
 			}
@@ -3750,7 +3759,22 @@ 

Re: Fuzzy thinking in is_publishable_class

2019-05-13 Thread Robert Haas
On Thu, May 9, 2019 at 9:41 AM Tom Lane  wrote:
> > I think we can get rid of the ability to reload the information_schema
> > after initdb.  That was interesting in the early phase of its
> > development, but now it just creates complications.
>
> We've relied on that more than once to allow minor-release updates of
> information_schema views, so I think losing the ability to do it is
> a bad idea.

+1

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Unlogged tables cleanup

2019-05-13 Thread Robert Haas
On Thu, Dec 8, 2016 at 7:49 PM Michael Paquier
 wrote:
> On Fri, Dec 9, 2016 at 4:54 AM, Robert Haas  wrote:
> > On Wed, Dec 7, 2016 at 11:20 PM, Michael Paquier
> >  wrote:
> >> OK, I rewrote a bit the patch as attached. What do you think?
> >
> > Committed and back-patched all the way back to 9.2.
>
> Thanks!

This thread resulted in commit
fa0f466d5329e10b16f3b38c8eaf5306f7e234e8, and today I had cause to
look at that commit again.  The code is now in
heapam_relation_set_new_filenode. I noticed a few things that seem
like they are not quite right.

1. The comment added in that commit says "Recover may as well remove
it while replaying..." but what is really meant is "Recovery may well
remove it while replaying..."  The phrase "may as well" means that
there isn't really any reason not to do it even if it's not strictly
necessary.  The phrase "may well" means that it's entirely possible.
The latter meaning is the one we want here.

2. The comment as adjusted in that commit refers to needing an
immediate sync "even if the page has been logged, because the write
did not go through shared_buffers," but there is no page and no write,
because an empty heap is just an empty file.   That logic makes sense
for index AMs that bypass shared buffers to write a metapage, e.g.
nbtree, as opposed to others which go through shared_buffers and don't
have the immediate sync, e.g. brin.  However, the heap writes no pages
either through shared buffers or bypassing shared buffers, so either
I'm confused or the comment makes no sense.

3. Before that commit, the comment said that "the immediate sync is
required, because otherwise there's no guarantee that this will hit
the disk before the next checkpoint moves the redo pointer."  However,
that justification seems to apply to *anything* that does smgrcreate +
log_smgrcreate would also need to do smgrimmedsync, and
RelationCreateStorage doesn't.  Unless, for some reason that I'm not
thinking of right now, the init fork has stronger durability
requirements than the main fork, it's hard to understand why
heapam_relation_set_new_filenode can call RelationCreateStorage to do
smgrcreate + log_smgrcreate for the main fork and that's OK, but when
it does the same thing itself for the init fork, it now needs
smgrimmedsync as well.

My guess, just shooting from the hip, is that the smgrimmedsync call
can be removed here.  If that's wrong, then we need a better
explanation for why it's needed, and we possibly need to add it to
every single place that does smgrcreate that doesn't have it already.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Passing CopyMultiInsertInfo structure to CopyMultiInsertInfoNextFreeSlot()

2019-05-13 Thread Ashutosh Sharma
On Mon, May 13, 2019 at 7:16 PM David Rowley 
wrote:

> On Mon, 13 May 2019 at 23:20, Ashutosh Sharma 
> wrote:
> > In the latest PostgreSQL code, I could see that we are passing
> CopyMultiInsertInfo structure to CopyMultiInsertInfoNextFreeSlot() although
> it is not being used anywhere in that function. Could you please let me
> know if it has been done intentionally or it is just an overlook that needs
> to be corrected. AFAIU, CopyMultiInsertInfoNextFreeSlot() is just intended
> to return the next free slot available in the multi insert buffer and we
> already have that buffer stored in ResultRelInfo structure which is also
> being passed to that function so not sure what could be the purpose of
> passing CopyMultiInsertInfo structure as well. Please let me know if i am
> missing something here. Thank you.
>
> There's likely no good reason for that. The attached removes it.
>

Thanks for the confirmation David. The patch looks good to me.


>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: PANIC :Call AbortTransaction when transaction id is no normal

2019-05-13 Thread Kuntal Ghosh
On Mon, May 13, 2019 at 7:07 PM Tom Lane  wrote:

> Michael Paquier  writes:
> > On Mon, May 13, 2019 at 01:25:19PM +0530, Kuntal Ghosh wrote:
> >> If we fix the issue in this way, we're certainly not going to do all
> >> those portal,locks,memory,resource owner cleanups that are done
> >> inside AbortTransaction() for a normal transaction ID. But, I'm not
> >> sure how relevant those steps are since the database is anyway
> >> shutting down.
>
> > And it is happening in bootstrap, meaning that the data folder is most
> > likely toast, and needs to be reinitialized.
>
> Indeed, initdb is going to remove the data directory if the bootstrap run
> crashes.
>
> But ... that code's been like that for decades and nobody's complained
> before.  Why are we worried about bootstrap's response to signals at all?

+1

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


Re: Passing CopyMultiInsertInfo structure to CopyMultiInsertInfoNextFreeSlot()

2019-05-13 Thread David Rowley
On Mon, 13 May 2019 at 23:20, Ashutosh Sharma  wrote:
> In the latest PostgreSQL code, I could see that we are passing 
> CopyMultiInsertInfo structure to CopyMultiInsertInfoNextFreeSlot() although 
> it is not being used anywhere in that function. Could you please let me know 
> if it has been done intentionally or it is just an oversight that needs to be 
> corrected. AFAIU, CopyMultiInsertInfoNextFreeSlot() is just intended to 
> return the next free slot available in the multi insert buffer and we already 
> have that buffer stored in ResultRelInfo structure which is also being passed 
> to that function so not sure what could be the purpose of passing 
> CopyMultiInsertInfo structure as well. Please let me know if i am missing 
> something here. Thank you.

There's likely no good reason for that. The attached removes it.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


remove_unused_parameter_from_CopyMultiInsertInfoNextFreeSlot.patch
Description: Binary data


Re: PANIC :Call AbortTransaction when transaction id is no normal

2019-05-13 Thread Tom Lane
Michael Paquier  writes:
> On Mon, May 13, 2019 at 01:25:19PM +0530, Kuntal Ghosh wrote:
>> If we fix the issue in this way, we're certainly not going to do all
>> those portal,locks,memory,resource owner cleanups that are done
>> inside AbortTransaction() for a normal transaction ID. But, I'm not
>> sure how relevant those steps are since the database is anyway
>> shutting down.

> And it is happening in bootstrap, meaning that the data folder is most
> likely toast, and needs to be reinitialized.

Indeed, initdb is going to remove the data directory if the bootstrap run
crashes.

If we do anything at all about this, my thought would just be to change
bootstrap_signals() so that it points all the signal handlers at
quickdie(), or maybe something equivalent to quickdie() but printing
a more apropos message, or even just set them all to SIG_DFL since that
means process termination for all of these.  die() isn't really the right
thing, precisely because it thinks it can trigger transaction abort,
which makes no sense in bootstrap mode.

But ... that code's been like that for decades and nobody's complained
before.  Why are we worried about bootstrap's response to signals at all?

regards, tom lane




pg_stat_database update stats_reset only by pg_stat_reset

2019-05-13 Thread 张连壮
pg_stat_reset_single_table_counters/pg_stat_reset_single_function_counters
only update the pg_stat_database column stats_reset.
stats_reset should be updated only when all the columns are reset.

sample:
drop database if exists lzzhang_db;
create database lzzhang_db;
\c lzzhang_db

create table lzzhang_tab(id int);
insert into lzzhang_tab values(1);
insert into lzzhang_tab values(1);

select tup_fetched, stats_reset from pg_stat_database where
datname='lzzhang_db';
select pg_sleep(1);

select pg_stat_reset_single_table_counters('lzzhang_tab'::regclass::oid);
select tup_fetched, stats_reset from pg_stat_database where
datname='lzzhang_db';

result:
 tup_fetched |  stats_reset
-+---
 514 | 2019-05-12 03:22:55.702753+08
(1 row)
 tup_fetched |  stats_reset
-+---
 710 | 2019-05-12 03:22:56.729336+08
(1 row)
tup_fetched is not reset but stats_reset is reset.
From 4f78735ceab9a410dbd015d1c0b1bf977eddf5b7 Mon Sep 17 00:00:00 2001
From: zhang lian zhuang 
Date: Sun, 12 May 2019 18:38:55 +0800
Subject: [PATCH] pg_stat_database update stats_reset only by pg_stat_reset

pg_stat_reset_single_table_counters and pg_stat_reset_single_function_counters
only update pg_stat_database column stats_reset.
stats_reset should be updated only when all the columns are reset.
---
 src/backend/postmaster/pgstat.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 15852fe24f..cba4450c01 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -5984,9 +5984,6 @@ pgstat_recv_resetsinglecounter(PgStat_MsgResetsinglecounter *msg, int len)
 	if (!dbentry)
 		return;
 
-	/* Set the reset timestamp for the whole database */
-	dbentry->stat_reset_timestamp = GetCurrentTimestamp();
-
 	/* Remove object if it exists, ignore it if not */
 	if (msg->m_resettype == RESET_TABLE)
 		(void) hash_search(dbentry->tables, (void *) &(msg->m_objectid),
-- 
2.21.0



Passing CopyMultiInsertInfo structure to CopyMultiInsertInfoNextFreeSlot()

2019-05-13 Thread Ashutosh Sharma
Hi Andres, Hari, David,

In the latest PostgreSQL code, I could see that we are passing
CopyMultiInsertInfo structure to CopyMultiInsertInfoNextFreeSlot() although
it is not being used anywhere in that function. Could you please let me
know if it has been done intentionally or it is just an overlook that needs
to be corrected. AFAIU, CopyMultiInsertInfoNextFreeSlot() is just intended
to return the next free slot available in the multi insert buffer and we
already have that buffer stored in ResultRelInfo structure which is also
being passed to that function so not sure what could be the purpose of
passing the CopyMultiInsertInfo structure as well. Please let me know if I am
missing something here. Thank you.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com


Re: How to install login_hook in Postgres 10.5

2019-05-13 Thread pavan95
Hello Community,

While I was searching for logon trigger in postgres similar to that of
Oracle, I came across "login_hook", which can be installed as a  Postgres
database extension to mimic a logon trigger.

But I tried to install it and failed. The error is that it could not find its .so
file.  Could you please help me in installing this login_hook?

Looking forward to hearing from you. Any suggestions would be of great help!

Thanks & Regards,
Pavan



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: vacuumdb and new VACUUM options

2019-05-13 Thread Masahiko Sawada
On Fri, May 10, 2019 at 9:03 PM Julien Rouhaud  wrote:
>
> On Thu, May 9, 2019 at 1:39 PM Kyotaro HORIGUCHI
>  wrote:
> >
> > At Thu, 9 May 2019 20:14:51 +0900, Masahiko Sawada  
> > wrote in 
> > 
> > > On Thu, May 9, 2019 at 10:01 AM Michael Paquier  
> > > wrote:
> > > >
> > > > On Wed, May 08, 2019 at 06:21:09PM -0300, Euler Taveira wrote:
> > > > > Em qua, 8 de mai de 2019 às 14:19, Fujii Masao 
> > > > >  escreveu:
> > > > >> The question is; we should support vacuumdb option for (1), i.e.,,
> > > > >> something like --index-cleanup option is added?
> > > > >> Or for (2), i.e., something like --disable-index-cleanup option is 
> > > > >> added
> > > > >> as your patch does? Or for both?
> > > > >
> > > > > --index-cleanup=BOOL
> > > >
> > > > I agree with Euler's suggestion to have a 1-1 mapping between the
> > > > option of vacuumdb and the VACUUM parameter
> > >
> > > +1. Attached the draft version patches for both options.
> >
> > +   printf(_("  --index-cleanup=BOOLEAN do or do not index 
> > vacuuming and index cleanup\n"));
> > +   printf(_("  --truncate=BOOLEAN  do or do not truncate 
> > off empty pages at the end of the table\n"));
> >
> > I *feel* that force/inhibit is more suitable than true/false for the
> > options.
>
> Indeed.

The new VACUUM command options take true and false with the same
meaning. What is the motivation to change the 1-1 mapping of names?

>
> + If not specify this option
> + the behavior depends on vacuum_index_cleanup 
> option
> + for the table to be vacuumed.
>
> + If not specify this option
> + the behavior depends on vacuum_truncate option
> + for the table to be vacuumed.
>
> Those sentences should be rephrased to something like "If this option
> is not specified, the behavior...".

Thank you! I've incorporated your comment in my branch. I'll post the
updated version of the patch once the above discussion reaches a consensus.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2019-05-13 Thread Paul Guo
Thanks for the reply.

On Tue, May 7, 2019 at 2:47 PM Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

>
> +  if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
> +  {
>
> This patch is allowing missing source and destination directory
> even in consistent state. I don't think it is safe.
>

I do not understand this. Can you elaborate?


>
>
>
> +ereport(WARNING,
> +(errmsg("directory \"%s\" for copydir() does not exists."
> +"It is possibly expected. Skip copydir().",
> +parent_path)));
>
> This message seems unfriendly to users, or it seems like an elog
> message. How about something like this. The same can be said for
> the source directory.
>
> | WARNING:  skipped creating database directory: "%s"
> | DETAIL:  The tabelspace %u may have been removed just before crash.
>

Yeah. Looks better.


>
> # I'm not confident in this at all:(
>
> > 2) Fixed dbase_desc(). Now the xlog output looks correct.
> >
> > rmgr: Databaselen (rec/tot): 42/42, tx:486, lsn:
> > 0/016386A8, prev 0/01638630, desc: CREATE copy dir base/1 to
> > pg_tblspc/16384/PG_12_201904281/16386
> >
> > rmgr: Databaselen (rec/tot): 34/34, tx:487, lsn:
> > 0/01638EB8, prev 0/01638E40, desc: DROP dir
> > pg_tblspc/16384/PG_12_201904281/16386
>
> WAL records don't convey such information. The previous
> description seems right to me.
>

2019-04-17 14:52:14.951 CST [23030] CONTEXT:  WAL redo at 0/3011650 for
Database/CREATE: copy dir 1663/1 to 65546/65547
The directories are definitely wrong and misleading.


> > > Also I'd suggest we should use pg_mkdir_p() in
> TablespaceCreateDbspace()
> > > to replace
> > > the code block includes a lot of
> > > get_parent_directory(), MakePGDirectory(), etc even it
> > > is not fixing a bug since pg_mkdir_p() code change seems to be more
> > > graceful and simpler.
>
> But I don't agree to this. pg_mkdir_p goes above two-parents up,
> which would be unwanted here.
>
I do not understand this either. pg_mkdir_p() is similar to 'mkdir -p'.
This change just makes the code concise, though in theory the change is not
needed.


> > > Whatever ignore mkdir failure or mkdir_p, I found that these steps
> seem to
> > > be error-prone
> > > along with postgre evolving since they are hard to test and also we are
> > > not easy to think out
> > > various potential bad cases. Is it possible that we should do real
> > > checkpoint (flush & update
> > > redo lsn) when seeing checkpoint xlogs for these operations? This will
> > > slow down standby
> > > but master also does this and also these operations are not usual,
> > > espeically it seems that it
> > > does not slow down wal receiving usually?
>
> That dramatically slows recovery (not replication) if databases
> are created and deleted frequently. That wouldn't be acceptable.
>

This behavior is rare and seems to have the same impact on master and standby
via checkpoint/restartpoint.
We do not worry about the master, so we should not worry about the standby either.


Re: PANIC :Call AbortTransaction when transaction id is no normal

2019-05-13 Thread Michael Paquier
On Mon, May 13, 2019 at 01:25:19PM +0530, Kuntal Ghosh wrote:
> If we fix the issue in this way, we're certainly not going to do all
> those portal,locks,memory,resource owner cleanups that are done
> inside AbortTransaction() for a normal transaction ID. But, I'm not
> sure how relevant those steps are since the database is anyway
> shutting down.

And it is happening in bootstrap, meaning that the data folder is most
likely toast, and needs to be reinitialized.  TransactionLogFetch()
treats bootstrap and frozen XIDs as always committed, so from this
perspective it is not wrong either to complain that this transaction
has already been committed when attempting to abort it.  Not sure
what's a more user-friendly behavior in this case though.
--
Michael


signature.asc
Description: PGP signature


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-05-13 Thread Masahiko Sawada
On Mon, May 13, 2019 at 2:09 PM Smith, Peter  wrote:
>
> Hi Masahiko,
>
> > Let me briefly explain the current design I'm thinking. The design 
> > employs a 2-tier key architecture. That is, a database cluster has one
> > master key and per-tablespace keys which are encrypted with the master key 
> > before storing to disk. Each tablespace keys are generated
> > randomly inside database when CREATE TABLESPACE. The all encrypted 
> > tablespace keys are stored together with the master key ID to the
> > file (say, $PGDATA/base/pg_tblsp_keys). That way, the startup process can 
> > easily get all tablespace keys and the master key ID before
> > starting recovery, and therefore can get all the decrypted tablespace keys.
>
> Your design idea sounds very similar to the current Fujitsu Enterprise 
> Postgres (FEP) implementation of TDE.
>

Yeah, I studied the design of TDE from FEP as well as other database
supporting TDE.

> FEP uses a master encryption key (MEK) for the database cluster. This MEK is 
> stored in a file at some GUC variable location. This file is encrypted using 
> a “passphrase” known only to the administrator.
>
> There are also per-tablespace keys, which are randomly generated at the time 
> of CREATE TABLESPACE and stored in files. There is one tablespace key file 
> per tablespace. These tablespace key files are encrypted by the MEK and 
> stored at the location specified by CREATE TABLESPACE.
>
> Not all tablespaces use TDE. An FEP extension of the CREATE TABLESPACE 
> syntax, creates the tablespace key file only when encryption was requested.
> e.g. CREATE TABLESPACE my_secure_tablespace LOCATION 
> '/home/postgre/FEP/TESTING/tablespacedir' WITH 
> (tablespace_encryption_algorithm = 'AES256');
>
> The MEK is not currently got from a third party. It is randomly generated 
> when the master key file is first created by another added function.
> e.g. select pgx_set_master_key('passphrase');

Thank you for explaining!

I think that the main difference between FEP and our proposal is the
master key management. In our proposal postgres can get the master key
from the external key management server or services such as AWS KMS,
Gemalto KeySecure and an encrypted file by using the corresponding
plugin. We believe that this extensible architecture would be useful
for applying postgres to various systems.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: PANIC :Call AbortTransaction when transaction id is no normal

2019-05-13 Thread Kuntal Ghosh
Hello,

On Mon, May 13, 2019 at 10:15 AM Thunder  wrote:

> I try to fix this issue and check whether it's normal transaction id
> before we do abort.
>
> diff --git a/src/backend/access/transam/xact.c
> b/src/backend/access/transam/xact.c
> index 20feeec327..dbf2bf567a 100644
> --- a/src/backend/access/transam/xact.c
> +++ b/src/backend/access/transam/xact.c
> @@ -4504,8 +4504,13 @@ RollbackAndReleaseCurrentSubTransaction(void)
>  void
>  AbortOutOfAnyTransaction(void)
>  {
> +   TransactionId xid = GetCurrentTransactionIdIfAny();
> TransactionState s = CurrentTransactionState;
>
> +   /* Check to see if the transaction ID is a permanent one because
> we cannot abort it */
> +   if (!TransactionIdIsNormal(xid))
> +   return;
> +
> /* Ensure we're not running in a doomed memory context */
> AtAbort_Memory();
>
> Can we fix in this way?
>
If we fix the issue in this way, we're certainly not going to do all those
portal, locks, memory, and resource owner cleanups that are done
inside AbortTransaction() for a normal transaction ID. But, I'm not sure
how relevant those steps are since the database is anyway shutting down.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


Re: SQL-spec incompatibilities in similar_escape() and related stuff

2019-05-13 Thread Andrew Gierth
> "Andrew" == Andrew Gierth  writes:

 Tom> I am, frankly, inclined to ignore this as a bad idea. We do have
 Tom> SIMILAR and ESCAPE as keywords already, but they're
 Tom> type_func_name_keyword and unreserved_keyword respectively. To
 Tom> support this syntax, I'm pretty sure we'd have to make them both
 Tom> fully reserved.

 Andrew> I only did a quick trial but it doesn't seem to require
 Andrew> reserving them more strictly - just adding the obvious
 Andrew> productions to the grammar doesn't introduce any conflicts.

Digging deeper, that's because both SIMILAR and ESCAPE have been
assigned precedence. Ambiguities that exist include:

   ... COLNAME ! SIMILAR ( ...

which could be COLNAME postfix-op SIMILAR a_expr, or COLNAME infix-op
function-call. Postfix operators strike again... we really should kill
those off.

The ESCAPE part could in theory be ambiguous if the SIMILAR expression
ends in a ... SIMILAR TO xxx operator, since then we wouldn't know
whether to attach the ESCAPE to that or keep it as part of the function
syntax. But I think this is probably a non-issue. More significant is
that ... COLNAME ! ESCAPE ... again has postfix- vs. infix-operator
ambiguities.

-- 
Andrew (irc:RhodiumToad)




Re: pglz performance

2019-05-13 Thread Michael Paquier
On Mon, May 13, 2019 at 07:45:59AM +0500, Andrey Borodin wrote:
> I was reviewing Paul Ramsey's TOAST patch[0] and noticed that there
> is a big room for improvement in performance of pglz compression and
> decompression.

Yes, I believe so too.  pglz is a huge CPU consumer when it comes to
compression compared to more modern algorithms like lz4.

> With Vladimir we started to investigate ways to boost byte copying
> and eventually created a test suite[1] to investigate performance of
> compression and decompression.  This is an extension with a single
> function test_pglz() which performs tests for different:
> 1. Data payloads
> 2. Compression implementations
> 3. Decompression implementations

Cool.  I got something rather similar in my wallet of plugins:
https://github.com/michaelpq/pg_plugins/tree/master/compress_test
This is something I worked on mainly for FPW compression in WAL.

> Currently we test mostly decompression improvements against two WALs
> and one data file taken from a pgbench-generated database. Any
> suggestions on more relevant data payloads are very welcome.

Text strings made of random data and of variable length?  For any test
of this kind I think that it is good to focus on the performance of the
low-level calls, even going as far as a simple C wrapper on top of the
pglz APIs to test only the performance and not have extra PG-related
overhead like palloc() which can be a barrier.  Focusing on string
lengths from 1kB up to 16kB may be a reasonable range, and it is
important to keep the same uncompressed strings for performance
comparison.
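As a rough illustration of why the byte-copy loop is the place to optimize, a
pglz-style back-reference copy must handle overlapping source and destination
ranges. A toy Python sketch (not the actual pglz code) of the hot loop:

```python
def lz_copy(buf, off, length):
    # pglz-style back-reference copy: the source range may overlap the
    # bytes being produced, so the copy must proceed byte by byte -- a
    # single bulk memcpy() would read bytes that do not exist yet.
    # This loop is the kind of hot spot the decompression speedups target.
    start = len(buf) - off
    for i in range(length):
        buf.append(buf[start + i])
    return buf

# "ab" followed by a back-reference (offset 2, length 6) expands to
# "abababab" -- the overlap is what lets short history encode long runs.
out = bytearray(b"ab")
lz_copy(out, 2, 6)
assert bytes(out) == b"abababab"
```

In C, the usual speedup is to special-case non-overlapping copies (offset >=
length) with a bulk move and fall back to the byte loop only when the ranges
truly overlap.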

> My laptop tests show that our decompression implementation [2] can
> be from 15% to 50% faster.  Also I've noted that compression is
> extremely slow, ~30 times slower than decompression. I believe we
> can do something about it.

That's nice.

> We focus only on boosting the existing codec, without any consideration
> of other compression algorithms.

There is this as well.  A couple of algorithms have a license
compatible with Postgres, but it may be simpler to just improve
pglz.  A 10%~20% improvement is something worth doing.

> Most important questions are:
> 1. What are relevant data sets?
> 2. What are relevant CPUs? I have only Xeon-based servers and a few
> laptops/desktops with Intel CPUs
> 3. If compression is 30 times slower, should we better focus on
> compression instead of decompression?

Decompression can matter a lot for mostly-read workloads and
compression can become a bottleneck for heavy-insert loads, so
improving compression and improving decompression should be treated
as two separate problems, not linked together.  Any improvement in
one or the other, or even both, is nice to have.
--
Michael




Re: PG 12 draft release notes

2019-05-13 Thread Oleg Bartunov
On Mon, May 13, 2019 at 6:52 AM Bruce Momjian  wrote:
>
> On Sun, May 12, 2019 at 09:49:40AM -0400, Bruce Momjian wrote:
> > On Sun, May 12, 2019 at 10:00:38AM +0300, Oleg Bartunov wrote:
> > > Bruce,
> > >
> > > I noticed that jsonpath in your version is mentioned only in functions
> > > chapter, but  commit
> > > 72b6460336e86ad5cafd3426af6013c7d8457367 is about implementation of
> > > SQL-2016 standard. We implemented JSON Path language as a jsonpath
> > > datatype with a bunch of support functions, our implementation
> > > supports 14 out of 15 features and it's the most complete
> > > implementation (we compared oracle, mysql and ms sql).
> >
> > Glad you asked.  I was very confused about why a data type was added for
> > a new path syntax.  Is it a new storage format for JSON, or something
> > else?  I need help on this.
>
> I talked to Alexander Korotkov on chat about this.  The data types are
> used as arguments to the functions, similar to how tsquery and tsvector
> are used for full text search.
>
> Therefore, the data types are not really useful on their own, but as
> support for path functions.  However, path functions are more like JSON
> queries rather than traditional functions, so it is odd to list them under
> functions, but there isn't a more reasonable place to put them.
>
> Alexander researched how we listed full text search in the release notes
> that added the feature, but we had a "General" category at that time that
> we don't have now.

I attached a slide about our jsonpath implementation in Postgres; it
summarizes the reasons to have a jsonpath data type. But my point was:
JSON Path is part of the SQL:2016 standard, and I think it's worth
mentioning it as such, not just as a set of jsonb functions.

>
> --
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
> + As you are, so once was I.  As I am, so you will be. +
> +  Ancient Roman grave inscription +



-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: PG 12 draft release notes

2019-05-13 Thread Fabien COELHO



Hello Bruce,


> I have posted a draft copy of the PG 12 release notes here:
>
> http://momjian.us/pgsql_docs/release-12.html
>
> They are committed to git.  It includes links to the main docs, where
> appropriate.  Our official developer docs will rebuild in a few hours.


The pgbench entry "Improve pgbench error reporting with clearer messages
and return codes" is by Peter Eisentraut, not me. I just reviewed it.


--
Fabien.




Re: SQL-spec incompatibilities in similar_escape() and related stuff

2019-05-13 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> but in recent versions it's

 Tom> <character substring function> ::=
 Tom>   SUBSTRING <left paren> <character value expression>
 Tom>   SIMILAR <character value expression>
 Tom>   ESCAPE <escape character> <right paren>

 Tom> I am, frankly, inclined to ignore this as a bad idea. We do have
 Tom> SIMILAR and ESCAPE as keywords already, but they're
 Tom> type_func_name_keyword and unreserved_keyword respectively. To
 Tom> support this syntax, I'm pretty sure we'd have to make them both
 Tom> fully reserved.

I only did a quick trial but it doesn't seem to require reserving them
more strictly - just adding the obvious productions to the grammar
doesn't introduce any conflicts.

 Tom> * Our function similar_escape() is not documented, but it
 Tom> underlies three things in the grammar:

 Tom>   a SIMILAR TO b
 Tom>   Translated as "a ~ similar_escape(b, null)"

 Tom>   a SIMILAR TO b ESCAPE e
 Tom>   Translated as "a ~ similar_escape(b, e)"

 Tom>   substring(a, b, e)
 Tom>   This is a SQL function expanding to
 Tom>   select pg_catalog.substring($1, pg_catalog.similar_escape($2, $3))

 Tom> To support the first usage, similar_escape is non-strict, and it
 Tom> takes a NULL second argument to mean '\'. This is already a SQL
 Tom> spec violation, because as far as I can tell from the spec, if you
 Tom> don't write an ESCAPE clause then there is *no* escape character;
 Tom> there certainly is not a default of '\'. However, we document this
 Tom> behavior, so I don't know if we want to change it.

This is the same spec violation that we also have for LIKE, which also
is supposed to have no escape character in the absence of an explicit
ESCAPE clause.
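To make the translation being discussed concrete, here is a toy Python sketch
of the kind of rewriting similar_escape() performs. This is a simplification
under assumed semantics: the real C function handles more of the SIMILAR TO
metacharacter set, and the default backslash escape shown here is exactly the
spec deviation described above:

```python
import re

def similar_escape(pattern, esc="\\"):
    # Toy sketch: rewrite a SQL SIMILAR TO pattern into an anchored
    # POSIX-style regex.  PostgreSQL defaults the escape character to
    # backslash even without an ESCAPE clause (the spec deviation).
    out, i = ["^(?:"], 0
    while i < len(pattern):
        c = pattern[i]
        if c == esc and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if c == "%":
            out.append(".*")   # SQL % -> regex .*
        elif c == "_":
            out.append(".")    # SQL _ -> regex .
        else:
            out.append(c)
        i += 1
    out.append(")$")
    return "".join(out)

assert re.match(similar_escape("ab%"), "abcdef")
assert not re.match(similar_escape("a_c"), "abbc")
assert re.match(similar_escape("50\\%"), "50%")  # escaped % matches literally
```

The `^(?:...)$` anchoring reflects that SIMILAR TO, unlike POSIX `~`, must
match the whole string.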

-- 
Andrew (irc:RhodiumToad)