Re: lazy detoasting

2018-04-07 Thread Andrew Gierth
> "Chapman" == Chapman Flack  writes:

 Chapman> Please bear with me as I check my understanding of snapshot
 Chapman> management by looking at PersistHoldablePortal(). There's a
 Chapman> PushActiveSnapshot(queryDesc->snapshot) in there. Is that
 Chapman> because:

 Chapman> (d) some other reason I haven't thought of ?

It has to be pushed as the active snapshot so that it's the snapshot
that the executor uses to run the query to populate the tuplestore which
becomes the "held" portal content.

I think you're confusing the stack of active snapshots with the registry
of snapshots. The snapshot of a portal that's not currently running in
the executor won't be on the stack, but it will be registered (which is
enough to maintain the session's reported xmin, which is what prevents
visible data from being vacuumed away).

-- 
Andrew (irc:RhodiumToad)



Re: [HACKERS] pgbench - allow to store select results into variables

2018-04-07 Thread Fabien COELHO


Hello Stephen,


Might be interesting to support multi-row (or no row?) returns in the
future, but not something this patch needs to do, so this looks fine to
me.


It could match PL/pgSQL's INTO, that is first row or NULL if none.


Yeah, but that's not really something that needs to go into this patch.


Sure, I did not mean it should. I checked psql's \gset behavior:

  psql> SELECT 1 AS stuff WHERE false \gset
no rows returned for \gset
  psql> \echo :stuff
:stuff -- "stuff" var was not set

  psql> SELECT i AS stuff FROM generate_series(1,5) AS i \gset
more than one row returned for \gset
  psql> \echo :stuff
:stuff -- "stuff" var was not set either

If the semantics are changed in any way, ISTM that psql & pgbench should be
kept consistent.



If this patch gets through, it might be handy for simplifying tests in
current pgbench submissions, especially the "error handling" one.


Glad to hear that.  Unfortunately, I didn't end up having time to wrap
this up to a satisfactory level for myself to get it into PG11.


No problem with waiting for PG N, whatever N. :-)

I know it's been a long time coming, and thank you for continuing to 
push on it;


Yeah, I'm kind of stubborn. Sometimes a quality, often a flaw.

I'll try to make sure that I take some time in the first CF 
for PG12 to wrap this up and get it in.  I'm all for these improvements 
in pgbench in general, it's a really useful tool and it's great to see 
work going into it.


Thanks for scheduling a try! :-)

When it gets in, I'll eventually submit a "tpcb-strict" builtin 
benchmarking script for illustration, which would implement the benchmark 
requirement that clients query their own branch more often than others. This 
would take advantage of the recently added (PG11) \if and logical expressions 
(for correlating clients to their branch) and of \gset (the benchmark states 
that the client must retrieve the value, whereas it is currently discarded).
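Such a script might look roughly like the following sketch (the \gset placement and the client/branch correlation expression are illustrative only, not the eventual builtin; \gset syntax is as proposed in this patch):

```
\set aid random(1, 100000 * :scale)
\set bid 1 + :client_id % :scale
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
-- the benchmark requires the client to actually retrieve the balance
SELECT abalance FROM pgbench_accounts WHERE aid = :aid \gset
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
END;
```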


--
Fabien.



Fix slot's xmin advancement and subxact's lost snapshots in decoding.

2018-04-07 Thread Arseny Sher
Hello,

I've discovered a couple of bugs in the logical decoding code, both leading
to incorrect decoding results in somewhat rare cases. First, the xmin of
slots is advanced too early. This affects the results only when
interlocking allows DDL to be performed concurrently with someone looking at
the schema. In fact, I was not aware of any such DDL until I raised this
question at
https://www.postgresql.org/message-id/flat/87tvu0p0jm.fsf%40ars-thinkpad#87tvu0p0jm.fsf@ars-thinkpad
and Andres pointed out ALTER of composite types. There are probably others;
I am not sure, and it would be interesting to know of them.

Another problem is that new snapshots are never queued to known
subxacts. This means decoding results can be wrong if the toplevel doesn't
write anything while a subxact does.

Please see a detailed description of the issues, tests that reproduce
them, and fixes in the attached patch.

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From 0d03ef172a29ce64a6c5c26f484f0f3186895125 Mon Sep 17 00:00:00 2001
From: Arseny Sher 
Date: Sat, 7 Apr 2018 08:22:30 +0300
Subject: [PATCH] Fix slot's xmin advancement and subxact's lost snapshots in
 decoding.

There are two somewhat related bugs here. First, the xmin of logical slots was
advanced too early. During xl_running_xacts processing, the slot's xmin was set
to the oldest running xid in the record, which is wrong: snapshots that will be
used for xacts not yet replayed at this moment might consider older xacts as
running too. The problem wasn't noticed earlier because DDL that allows
deleting a tuple (setting xmax) while some other not-yet-committed xact is
looking at it is pretty rare, if not unique: e.g. all forms of ALTER TABLE
which change the schema acquire an ACCESS EXCLUSIVE lock, conflicting with any
inserts. The included test case uses ALTER of a composite type, which has no
such interlocking.

To deal with this, we must be able to quickly retrieve the oldest xmin (the
oldest running xid among all assigned snapshots) from the ReorderBuffer. The
proposed fix adds yet another list of ReorderBufferTXNs to it, in which xacts
are sorted by the LSN of their base snapshot, because
 * Plain use of the current ReorderBufferGetOldestTXN (to get the base
   snapshot's xmin) is wrong, because ordering by the LSN of the first record
   is *not* the same as ordering by the LSN of the record at which the base
   snapshot was assigned: many record types don't need a snapshot, e.g.
   xl_xact_assignment. We could remember the snapbuilder's xmin while
   processing the first record, but that is too conservative and would keep
   more tuples from being vacuumed than needed.
 * On the other hand, we can't simply replace the existing ordering by LSN of
   the first change with ordering by LSN of the base snapshot, because if we
   did, WAL records which didn't force acquisition of a base snapshot might be
   recycled before we replay the xact, as ReorderBufferGetOldestTXN determines
   which LSNs must still be kept.

The second issue concerns subxacts. Currently,
SnapBuildDistributeNewCatalogSnapshot never spares a snapshot for a known
subxact. This means that if the toplevel has no base snapshot (it never did
any writes), there was an assignment record (so the subxact is known), and the
subxact is writing, then no new snapshots are queued at all. Probably the
simplest fix would be to employ the freshly baked by_base_snapshot_lsn list
and just distribute snapshots to all xacts with a base snapshot, whether they
are subxacts or not. However, in that case we would queue unnecessary
snapshots to all already-known subxacts. To avoid this, in this patch the base
snapshot of a known subxact is immediately passed up to the toplevel as soon
as we learn of the kinship or the base snap is assigned (we have to do that
anyway in ReorderBufferCommitChild); new snaps are distributed only to
top-level xacts (or unknown subxacts), as before.

Also, a minor memory leak is fixed: the reference count of the toplevel's old
base snapshot was not decremented when a snap was passed up from a child.
---
 contrib/test_decoding/Makefile |   3 +-
 contrib/test_decoding/expected/oldest_xmin.out |  27 +++
 .../test_decoding/expected/subxact_snap_pass.out   |  49 ++
 contrib/test_decoding/specs/oldest_xmin.spec   |  37 +
 contrib/test_decoding/specs/subxact_snap_pass.spec |  42 +
 src/backend/replication/logical/reorderbuffer.c| 183 ++---
 src/backend/replication/logical/snapbuild.c|  15 +-
 src/include/replication/reorderbuffer.h|  15 +-
 8 files changed, 304 insertions(+), 67 deletions(-)
 create mode 100644 contrib/test_decoding/expected/oldest_xmin.out
 create mode 100644 contrib/test_decoding/expected/subxact_snap_pass.out
 create mode 100644 contrib/test_decoding/specs/oldest_xmin.spec
 create mode 100644 contrib/test_decoding/specs/subxact_snap_pass.spec

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 6c18189d9d..c696288d67 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -50,7 +50,8 @@ regresscheck-inst

Re: Checkpoint not retrying failed fsync?

2018-04-07 Thread Amit Kapila
On Fri, Apr 6, 2018 at 6:26 AM, Thomas Munro
 wrote:
> On Fri, Apr 6, 2018 at 11:36 AM, Thomas Munro
>  wrote:
>> On Fri, Apr 6, 2018 at 11:34 AM, Andrew Gierth
>>  wrote:
>>> Right.
>>>
>>> But I don't think just copying the value is sufficient; if a new bit was
>>> set while we were processing the old ones, how would we know which to
>>> clear? We couldn't just clear all the bits afterwards because then we
>>> might lose a request.
>>
>> Agreed.  The attached draft patch handles that correctly, I think.
>
> After some testing, here is a better one for review.  Changes:
>
> 1.  The copy wasn't really needed.
> 2.  errno needed to be restored (in case bms_union stomped on it),
> thanks to Andrew for an off-list complaint about that.
> 3.  Memory context was wrong.
> 4.  bms_first_member() eats its input.  Need to use bms_next_member() instead.
>

In the success case, won't you need to delete each member (with
something like bms_del_member) rather than just using bms_free?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: lazy detoasting

2018-04-07 Thread Amit Kapila
On Thu, Apr 5, 2018 at 8:04 AM, Chapman Flack  wrote:
> On 04/01/18 01:19, Tom Lane wrote:
>> Chapman Flack  writes:
>>> If I copy an out-of-line, on-disk TOAST pointer into a memory context
>>> with transaction lifetime, with an eye to detoasting it later in the
>>> same transaction, are there circumstances where it wouldn't work?
>>
>> Should be safe *as long as you hold onto a snapshot that can see the
>> toast value's parent row*.  Otherwise, it's not.
>>
>> For a counterexample, see the changes we had to make to avoid depending
>> on out-of-line toast values in the catcaches, commit 08e261cbc.
>
> Thanks. I think I see two approaches happening in that commit: retaining
> a snapshot, if the tuples will be used within the transaction, or eagerly
> detoasting, when persisting a holdable portal so tuples can be used after
> the transaction.
>
> Please bear with me as I check my understanding of snapshot management
> by looking at PersistHoldablePortal(). There's a
> PushActiveSnapshot(queryDesc->snapshot) in there. Is that because:
>
> (a) that snapshot might not be on the active stack at this point, and it
> needs to be be there to keep it alive during this executor re-run, or
>
> (b) it's known to be somewhere on the active stack already, but not
> necessarily on top, and needs to be pushed so it is topmost while
> re-running this executor (does the top snapshot on the active stack
> affect which tuples the executor sees? or is this stack's only purpose
> to keep snapshots alive?), or
>
> (c) it's known to be somewhere on the stack already, but needs to be
> be pushed to make sure some stack-depth invariant is preserved
> (perhaps because of an implied pop in case of an error), or
>
> (d) some other reason I haven't thought of ?
>
> I *think* I'm smart enough to rule out choice (a), because it wouldn't
> make sense to have queryDesc->snapshot refer to a snapshot that isn't
> already on the stack (unless it's also been registered, in which case,
> why would it need to be pushed on the stack too?), as then I would be
> reckless to assume it's still alive to *be* pushed. Am I close?
>

I think it is closest to what you have mentioned in (b).
Basically, it lets the executor use that snapshot to determine tuple
visibility.

> I haven't yet gone to track down the code that assigns a snapshot to
> queryDesc->snapshot.
>

See CreateQueryDesc().


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-07 Thread Christophe Pettus

> On Apr 7, 2018, at 20:27, Craig Ringer  wrote:
> 
> Right now I think we're at option (4): If you see anything that smells like a 
> write error in your kernel logs, hard-kill postgres with -m immediate (do NOT 
> let it do a shutdown checkpoint). If it did a checkpoint since the errors 
> appeared in the logs, fake up a backup label to force redo to start from the 
> last checkpoint before the error. Otherwise, it's safe to just let it start up 
> again and do redo again.

Before we spiral down into despair and excessive alcohol consumption, this is 
basically the same situation as a checksum failure or some other kind of 
uncorrected media-level error.  The bad part is that we have to find out from 
the kernel logs rather than from PostgreSQL directly.  But this does not strike 
me as otherwise significantly different from, say, an infrequently-accessed 
disk block reporting an uncorrectable error when we finally get around to 
reading it.

--
-- Christophe Pettus
   x...@thebuild.com




Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-07 Thread Peter Geoghegan
On Sat, Apr 7, 2018 at 8:27 PM, Craig Ringer  wrote:
> More below, but here's an idea #5: decide InnoDB has the right idea, and go
> to using a single massive blob file, or a few giant blobs.
>
> We have a storage abstraction that makes this way, way less painful than it
> should be.
>
> We can virtualize relfilenodes into storage extents in relatively few big
> files. We could use sparse regions to make the addressing more convenient,
> but that makes copying and backup painful, so I'd rather not.
>
> Even one file per tablespace for persistent relation heaps, another for
> indexes, another for each fork type.

I'm not sure that we can do that now, since it would break the new
"Optimize btree insertions for common case of increasing values"
optimization. (I did mention this before it went in.)

I've asked Pavan to at least add a note to the nbtree README that
explains the high level theory behind the optimization, as part of
post-commit clean-up. I'll ask him to say something about how it might
affect extent-based storage, too.

-- 
Peter Geoghegan



Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-07 Thread Craig Ringer
On 8 April 2018 at 10:16, Thomas Munro 
wrote:

> So, what can we actually do about this new Linux behaviour?
>

Yeah, I've been cooking over that myself.

More below, but here's an idea #5: decide InnoDB has the right idea, and go
to using a single massive blob file, or a few giant blobs.

We have a storage abstraction that makes this way, way less painful than it
should be.

We can virtualize relfilenodes into storage extents in relatively few big
files. We could use sparse regions to make the addressing more convenient,
but that makes copying and backup painful, so I'd rather not.

Even one file per tablespace for persistent relation heaps, another for
indexes, another for each fork type.

That way we can use something like your #1 (which is what I was also
thinking about then rejecting previously), but reduce the pain by reducing
the FD count drastically so exhausting FDs stops being a problem.


Previously I was leaning toward what you've described here:


> * whenever you open a file, either tell the checkpointer so it can
> open it too (and wait for it to tell you that it has done so, because
> it's not safe to write() until then), or send it a copy of the file
> descriptor via IPC (since duplicated file descriptors share the same
> f_wb_err)
>
> * if the checkpointer can't take any more file descriptors (how would
> that limit even work in the IPC case?), then it somehow needs to tell
> you that so that you know that you're responsible for fsyncing that
> file yourself, both on close (due to fd cache recycling) and also when
> the checkpointer tells you to
>
> Maybe it could be made to work, but sheesh, that seems horrible.  Is
> there some simpler idea along these lines that could make sure that
> fsync() is only ever called on file descriptors that were opened
> before all unflushed writes, or file descriptors cloned from such file
> descriptors?
>

... and got stuck on "yuck, that's awful".

I was assuming we'd force early checkpoints if the checkpointer hit its fd
limit, but that's even worse.

We'd need to urgently do away with segmented relations, and partitions
would start to become a hindrance.

Even then it's going to be an unworkable nightmare with heavily partitioned
systems, systems that use schema-sharding, etc. And it'll mean we need to
play with process limits and, often, system wide limits on FDs. I imagine
the performance implications won't be pretty.


Idea 2:
>
> Give up, complain that this implementation is defective and
> unworkable, both on POSIX-compliance grounds and on POLA grounds, and
> campaign to get it fixed more fundamentally (actual details left to
> the experts, no point in speculating here, but we've seen a few
> approaches that work on other operating systems including keeping
> buffers dirty and marking the whole filesystem broken/read-only).
>

This appears to be what SQLite does AFAICS.

https://www.sqlite.org/atomiccommit.html

though it has the huge luxury of a single writer, so it's probably only
subject to the original issue not the multiprocess / checkpointer issues we
face.



> Idea 3:
>
> Give up on buffered IO and develop an O_SYNC | O_DIRECT based system ASAP.
>

That seems to be what the kernel folks will expect. But that's going to
KILL performance. We'll need writer threads to have any hope of it not
*totally* sucking, because otherwise simple things like updating a heap
tuple and two related indexes will incur enormous disk latencies.

But I suspect it's the path forward.

Goody.




> Any other ideas?
>
> For a while I considered suggesting an idea which I now think doesn't
> work.  I thought we could try asking for a new fcntl interface that
> spits out wb_err counter.  Call it an opaque error token or something.
> Then we could store it in our fsync queue and safely close the file.
> Check again before fsync()ing, and if we ever see a different value,
> PANIC because it means a writeback error happened while we weren't
> looking.  Sadly I think it doesn't work because AIUI inodes are not
> pinned in kernel memory when no one has the file open and there are no
> dirty buffers, so I think the counters could go away and be reset.
> Perhaps you could keep inodes pinned by keeping the associated buffers
> dirty after an error (like FreeBSD), but if you did that you'd have
> solved the problem already and wouldn't really need the wb_err system
> at all.  Is there some other idea along these lines that could work?


I think our underlying data syncing concept is fundamentally broken, and
it's not really the kernel's fault.

We assume that we can safely:

procA: open()
procA: write()
procA: close()

... some long time later, unbounded as far as the kernel is concerned ...

procB: open()
procB: fsync()
procB: close()


If the kernel does writeback in the middle, how on earth is it supposed to
know we expect to reopen the file and check back later?

Should it just remember "this file had an error" forever, and tell every
caller? In that case how could we

Re: pgsql: Support partition pruning at execution time

2018-04-07 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Support partition pruning at execution time
> 
> Buildfarm member lapwing doesn't like this.  I can reproduce the
> failures here by setting force_parallel_mode = regress.  Kind
> of looks like instrumentation counts aren't getting propagated
> from workers back to the leader?

This theory seems correct; the counters are getting incremented, yet
explain later prints them as zero.  What is the code that is supposed to
propagate the instrumentation counts?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-07 Thread Christophe Pettus

> On Apr 7, 2018, at 19:33, Bruce Momjian  wrote:
> Idea 4 would be for people to assume their database is corrupt if their
> server logs report any I/O error on the file systems Postgres uses.

Pragmatically, that's where we are right now.  The best answer in this bad 
situation is (a) fix the error, then (b) replay from a checkpoint before the 
error occurred, but it appears we can't even guarantee that a PostgreSQL 
process will be the one to see the error.

--
-- Christophe Pettus
   x...@thebuild.com




Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-07 Thread Bruce Momjian
On Sun, Apr  8, 2018 at 02:16:07PM +1200, Thomas Munro wrote:
> So, what can we actually do about this new Linux behaviour?
> 
> Idea 1:
> 
> * whenever you open a file, either tell the checkpointer so it can
> open it too (and wait for it to tell you that it has done so, because
> it's not safe to write() until then), or send it a copy of the file
> descriptor via IPC (since duplicated file descriptors share the same
> f_wb_err)
> 
> * if the checkpointer can't take any more file descriptors (how would
> that limit even work in the IPC case?), then it somehow needs to tell
> you that so that you know that you're responsible for fsyncing that
> file yourself, both on close (due to fd cache recycling) and also when
> the checkpointer tells you to
> 
> Maybe it could be made to work, but sheesh, that seems horrible.  Is
> there some simpler idea along these lines that could make sure that
> fsync() is only ever called on file descriptors that were opened
> before all unflushed writes, or file descriptors cloned from such file
> descriptors?
> 
> Idea 2:
> 
> Give up, complain that this implementation is defective and
> unworkable, both on POSIX-compliance grounds and on POLA grounds, and
> campaign to get it fixed more fundamentally (actual details left to
> the experts, no point in speculating here, but we've seen a few
> approaches that work on other operating systems including keeping
> buffers dirty and marking the whole filesystem broken/read-only).
> 
> Idea 3:
> 
> Give up on buffered IO and develop an O_SYNC | O_DIRECT based system ASAP.

Idea 4 would be for people to assume their database is corrupt if their
server logs report any I/O error on the file systems Postgres uses.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-07 Thread Thomas Munro
So, what can we actually do about this new Linux behaviour?

Idea 1:

* whenever you open a file, either tell the checkpointer so it can
open it too (and wait for it to tell you that it has done so, because
it's not safe to write() until then), or send it a copy of the file
descriptor via IPC (since duplicated file descriptors share the same
f_wb_err)

* if the checkpointer can't take any more file descriptors (how would
that limit even work in the IPC case?), then it somehow needs to tell
you that so that you know that you're responsible for fsyncing that
file yourself, both on close (due to fd cache recycling) and also when
the checkpointer tells you to

Maybe it could be made to work, but sheesh, that seems horrible.  Is
there some simpler idea along these lines that could make sure that
fsync() is only ever called on file descriptors that were opened
before all unflushed writes, or file descriptors cloned from such file
descriptors?

Idea 2:

Give up, complain that this implementation is defective and
unworkable, both on POSIX-compliance grounds and on POLA grounds, and
campaign to get it fixed more fundamentally (actual details left to
the experts, no point in speculating here, but we've seen a few
approaches that work on other operating systems including keeping
buffers dirty and marking the whole filesystem broken/read-only).

Idea 3:

Give up on buffered IO and develop an O_SYNC | O_DIRECT based system ASAP.

Any other ideas?

For a while I considered suggesting an idea which I now think doesn't
work.  I thought we could try asking for a new fcntl interface that
spits out wb_err counter.  Call it an opaque error token or something.
Then we could store it in our fsync queue and safely close the file.
Check again before fsync()ing, and if we ever see a different value,
PANIC because it means a writeback error happened while we weren't
looking.  Sadly I think it doesn't work because AIUI inodes are not
pinned in kernel memory when no one has the file open and there are no
dirty buffers, so I think the counters could go away and be reset.
Perhaps you could keep inodes pinned by keeping the associated buffers
dirty after an error (like FreeBSD), but if you did that you'd have
solved the problem already and wouldn't really need the wb_err system
at all.  Is there some other idea along these lines that could work?

-- 
Thomas Munro
http://www.enterprisedb.com



Re: PATCH: psql tab completion for SELECT

2018-04-07 Thread Edmund Horner
On 6 April 2018 at 13:29, Peter Eisentraut
 wrote:
> The selection of which functions to show can be further refined.
>
> I don't think the contents of pg_amproc and pg_cast should be excluded.
> Some of those functions are useful.  Similarly for pg_operator.oprcode.
>
> Things like oprrest and oprjoin will already be excluded because they
> have "internal" arguments.  Similarly for some columns in pg_aggregate.

You're right.  There were lots of useful functions being excluded by
the pg_amproc, pg_cast, and oprcode checks.  And all the oprrest and
oprjoin functions are already excluded, so I can remove that check.

Perhaps we should remove the "appears in this catalog table" exclusion
checks, and just use argument and return type?

> There are also additional pseudo-types that should be excluded.  See
> pg_type.typtype = 'p', except some of those should not be excluded.
> Needs more thought.

I don't know much about some of the pseudotypes but this is what I propose:

any*, record, _record, cstring should NOT be excluded
void should NOT be excluded for return type (and perhaps in general;
void_out and void_send are callable, if not hugely useful in psql)

trigger, event_trigger should be excluded
internal, opaque, unknown, pg_ddl_command should be excluded
language_handler, tsm_handler, index_am_handler, fdw_handler should be excluded

For modern servers, our query can be simplified to something like:

SELECT distinct pg_catalog.quote_ident(p.proname)
FROM pg_catalog.pg_proc p
WHERE NOT arrayoverlap(ARRAY['trigger', 'event_trigger', 'internal',
'opaque', 'unknown', 'pg_ddl_command', 'language_handler',
'tsm_handler', 'index_am_handler', 'fdw_handler']::regtype[]::oid[],
ARRAY(SELECT p.prorettype UNION SELECT unnest(proargtypes)));

What do you think?



Re: Bring atomic flag fallback up to snuff

2018-04-07 Thread Andres Freund
Hi,

On 2018-04-07 14:23:38 -0400, Tom Lane wrote:
> I think I'd just drop those asserts altogether.  The hardware is in charge
> of complaining about misaligned pointers.

I came around to that view. The problem with "natural" and atomic
alignment differing is only relevant for 64bit integers (4 byte natural,
8 byte atomic).  Pushing that once the tests finish.

Greetings,

Andres Freund



Re: [HACKERS] pgbench - allow to store select results into variables

2018-04-07 Thread Stephen Frost
Fabien,

* Fabien COELHO (coe...@cri.ensmp.fr) wrote:
> >>   
> >>+   
> >>+
> >>+ \cset [prefix] or
> >>+ \gset [prefix]
> >>+
> >
> >Seems like this should really be moved down below the section for
> >'\set'.
> 
> Dunno.
> 
> I put them there because it is in alphabetical order (for cset at least) and
> because "set" documentation is heavy about expressions (operators,
> functions, constants, ...) which have nothing to do with cset/gset, so I
> felt that having them clearly separated and in abc order was best.

Ah, hmmm, yes, alphabetical order is sensible, certainly.

> >>-   char   *line;   /* text of command line */
> >>+   char   *line;   /* first line for short display */
> >>+   char   *lines;  /* full multi-line text of command */
> >
> >Not really a fan of such closely-named variables...  Maybe first_line
> >instead?
> 
> Indeed, looks better.

Great, thanks.

> >>+   case PGRES_TUPLES_OK:
> >>+   if (gset[compound] != NULL)
> >
> >Probably be good to add some comments here, eh:
> 
> >/*
> >* The results are intentionally thrown away if we aren't under a gset
> >* call.
> >*/
> 
> I added a (shorter) comment.

Ok.

> >>+   if (ntuples != 1)
> >>+   {
> >>+   fprintf(stderr,
> >>+   "client %d file %d command %d compound %d: "
> >>+   "expecting one row, got %d\n",
> >>+   st->id, st->use_file, st->command, compound, ntuples);
> >>+   st->ecnt++;
> >>+   PQclear(res);
> >>+   discard_response(st);
> >>+   return false;
> >>+   }
> >
> >Might be interesting to support multi-row (or no row?) returns in the
> >future, but not something this patch needs to do, so this looks fine to
> >me.
> 
> It could match PL/pgSQL's INTO, that is first row or NULL if none.

Yeah, but that's not really something that needs to go into this patch.

> >>+
> >>+   /* store result as a string */
> >>+   if (!putVariable(st, "gset", varname,
> >>+    PQgetvalue(res, 0, f)))
> >>+   {
> >>+   /* internal error, should it rather abort? */
> >
> >I'm a bit on the fence about if we should abort in this case or not.  A
> >failure here seems likely to happen consistently (whereas the ntuples
> >case might be a fluke of some kind), which tends to make me think we
> >should abort, but still thinking about it.
> 
I'm also still thinking about it:-) For me it would be an abort, but there
is this whole "return false" internal error management in pgbench whose
purpose escapes me a little, so I stick to that anyway.

Yeah.

> >>+   if (*gset[compound] != '\0')
> >>+   free(varname);
> >
> >A comment here, and above where we're assigning the result of the
> >psprintf(), to varname probably wouldn't hurt, explaining that the
> >variable is sometimes pointing into the query result structure and
> >sometimes not...
> 
I added two comments. The code avoids a malloc/free when there are no
prefixes; ISTM that although it is a borderline over-optimization, it is a
short one (the free is a few lines away), so it might be ok.

Ok, having the comments is definitely good as it was a bit confusing as
to what was going on. :)

> >>+   /* read and discard the query results */
> >
> >That comment doesn't feel quite right now. ;)
> 
> Indeed. Changed with "store or discard".

Ok.

> >>
> >>-   /* We don't want to scribble on cmd->argv[0] until done */
> >>-   sql = pg_strdup(cmd->argv[0]);
> >>+   sql = pg_strdup(cmd->lines);
> >
> >The function-header comment for parseQuery() could really stand to be
> >improved.
> 
> Indeed.
> 
> >>+   /* merge gset variants into preceding SQL command */
> >>+   if (pg_strcasecmp(bs_cmd, "gset") == 0 ||
> >>+   pg_strcasecmp(bs_cmd, "cset") == 0)
> >>+   {
> >>+   "\\gset cannot start a script",
> >>+   "\\gset must follow a SQL command",
> >>+

Re: [PATCH] session_replication_role = replica with TRUNCATE

2018-04-07 Thread Peter Eisentraut
This part of the patch didn't end up being necessary, since the updated
implementation of logical decoding of TRUNCATE could do without it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Logical decoding of TRUNCATE

2018-04-07 Thread Peter Eisentraut
On 4/5/18 13:07, Alvaro Herrera wrote:
> Note that you start the loop having the Relation; yet you go extra
> length to grab the relnamespace and relname from syscache instead of
> just relations[i]->rd_rel->relname etc.

fixed

> Maybe not a big deal, but for future pg_waldump users I'm sure it'll be
> nice to have the OIDs here.

done

>> +void
>> +ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
>> +DropBehavior behavior, 
>> bool restart_seqs)
>> +{
> 
> Please add a comment atop this function.

done

>> +/*
>> + * Write a WAL record to allow this set of actions to be logically 
>> decoded.
>> + *
>> + * Assemble an array of relids so we can write a single WAL record for 
>> the
>> + * whole action.
>> + */
>> +if (list_length(relids_logged) > 0)
>> +{
>> +xl_heap_truncate xlrec;
>> +int i = 0;
> 
> I wonder if this should happen only if logical decoding?  (Maybe it
> already occurs because relids_logged would be empty?  Worth a comment in
> that case)

Added an assertion and a comment.

Committed with those changes.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Configurable file mode mask

2018-04-07 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> (If not, this bodes ill for the Windows results.)

Ah, there go the Windows builds...  Missing a #else clause where we
should have one for those systems, will fix.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: PATCH: Configurable file mode mask

2018-04-07 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2018-04-07%2021%3A50%3A02
> 
> > Yes, I'm taking a look at it.
> 
> Since other animals are coming back successfully, my first suspicion
> lights on culicidae's use of EXEC_BACKEND.  Did you test that case?
> (If not, this bodes ill for the Windows results.)

The vast majority of this patch isn't something that's relevant to
Windows anyway, so I'm hopeful that those animals will stay green (if
not, it's probably that we need to mark some other test as
not-on-Windows).

We have EXEC_BACKEND on the Unix-y systems too though, so that needs to
be fixed.  I'm pretty sure the issue here is that SubPostmasterMain()
needs to also be checking/updating the umask() based on the data
directory, as is done in PostmasterMain.

Testing that out now and if that looks good then I'll push that and
hopefully it'll make culicidae green.

Thanks!

Stephen




Searching for: Fast windows buildfarm animal

2018-04-07 Thread Andres Freund
Hi,

It's common that half the buildfarm has reported back before a single
windows buildfarm animal reports. And if they report a failure one often
has to wait for hours for the next run.

It'd be awesome if somebody could set up a windows animal that runs
frequently (i.e. checks for build needed every minute or five) and is
fast enough to not take ages to finish a build.

Perhaps some recipient has that, or knows somebody with the necessary
hardware / resources available?

Greetings,

Andres Freund



Re: PATCH: Configurable file mode mask

2018-04-07 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2018-04-07%2021%3A50%3A02

> Yes, I'm taking a look at it.

Since other animals are coming back successfully, my first suspicion
lights on culicidae's use of EXEC_BACKEND.  Did you test that case?
(If not, this bodes ill for the Windows results.)

regards, tom lane



Re: PATCH: Configurable file mode mask

2018-04-07 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Time to watch the buildfarm,
> 
> That didn't take long.
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2018-04-07%2021%3A50%3A02

Yes, I'm taking a look at it.

Thanks!

Stephen




Re: PATCH: Configurable file mode mask

2018-04-07 Thread Tom Lane
Stephen Frost  writes:
> Time to watch the buildfarm,

That didn't take long.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2018-04-07%2021%3A50%3A02

(I'm really starting to get a bit upset at the amount of stuff that's
being pushed in on the very last day.)

regards, tom lane



Re: PATCH: Configurable file mode mask

2018-04-07 Thread David Steele
On 4/7/18 5:49 PM, Stephen Frost wrote:
> * David Steele (da...@pgmasters.net) wrote:
>>
> >> OK, one last review.  I didn't make any code changes, but I improved some
>> comments, added documentation and fixed a test.
> 
> Thanks!  I took those and then added a bit more commentary around the
> umask() calls in the utilities and fixed a typo or two and then pushed
> it.

Excellent, thank you!

> Time to watch the buildfarm, particularly for Windows hosts just in case
> there's something in the regression tests which aren't working correctly
> on that platform.  I was able to run the full regression suite locally
> before committing, though given the recent activity, we may see failures
> attributed to this patch which are due to unrelated instability.

I'll have dinner then come back and take a look.

-- 
-David
da...@pgmasters.net





Re: PATCH: Configurable file mode mask

2018-04-07 Thread Stephen Frost
David,

* David Steele (da...@pgmasters.net) wrote:
> On 4/6/18 10:22 PM, Stephen Frost wrote:
> > * David Steele (da...@pgmasters.net) wrote:
> >> On 4/6/18 6:04 PM, David Steele wrote:
> >>> On 4/6/18 3:02 PM, Stephen Frost wrote:
> 
>  - Further discussion in the commit messages
> >>>
> >>> Agreed, these need some more work.  I'm happy to do that but I'll need a
> >>> bit more time.  Have a look at the new patches and I'll work on some
> >>> better messages.
> >>
> >> I'm sure you'll want to reword some things, but I think these commit
> >> messages capture the essential changes for each patch.
> > 
> > Thanks much.  I've taken (most) of these, adjusting a few bits here and
> > there.
> > 
> > I've been back over the patch again, mostly improving the commit
> > messages, comments, and docs.  I also looked over the code and tests
> > again and they're looking pretty good to me, so I'll be looking to
> > commit this tomorrow afternoon or so US/Eastern.
> 
> OK, one last review.  I didn't make any code changes, but I improved some
> comments, added documentation and fixed a test.

Thanks!  I took those and then added a bit more commentary around the
umask() calls in the utilities and fixed a typo or two and then pushed
it.

Time to watch the buildfarm, particularly for Windows hosts just in case
there's something in the regression tests which aren't working correctly
on that platform.  I was able to run the full regression suite locally
before committing, though given the recent activity, we may see failures
attributed to this patch which are due to unrelated instability.

Thanks again!

Stephen




Re: [HACKERS] Runtime Partition Pruning

2018-04-07 Thread David Rowley
On 8 April 2018 at 09:13, Alvaro Herrera  wrote:
> I pushed this patch -- 0001, 0002 and 0003 only.  I did not include
> anything from 0004 and 0005; I didn't even get to the point of reading
> them, so that I could focus on the first part.

Oh great! Thank you for working on this and pushing it, especially so
during your weekend.

> While we didn't get fast pruning support for MergeAppend or the
> DELETE/UPDATE parts, I think those are valuable and recommend
> resubmitting them for PG12.

Thanks. I'll certainly be doing that.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Removing useless DISTINCT clauses

2018-04-07 Thread David Rowley
On 8 April 2018 at 08:55, Tom Lane  wrote:
> regression=# create table t1 (a int,b int, c int, d int, primary key(a,b));
> CREATE TABLE
> regression=# explain verbose select distinct * from t1 order by a,c,b;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.

Ouch!

> The reason is we're hitting this assert, a bit further down in
> create_distinct_paths:
>
> /* Assert checks that parser didn't mess up... */
> Assert(pathkeys_contained_in(root->distinct_pathkeys,
>  needed_pathkeys));
>
>
> I don't think that that makes this patch unsalvageable.  The way forward
> probably involves remembering that we removed some distinctClause items
> (so that we know the correspondence to the sortClause no longer holds)
> and then working harder in create_distinct_paths when that's the case.
>
> However, that's not something that's going to happen on the last day
> of the commitfest.  So I'm going to mark this Returned With Feedback
> and encourage you to return to the matter in v12.
>
> In the meantime, attached is the version of the patch that I was about to
> commit before getting cold feet.  It has some cosmetic improvements
> over yours, notably comment additions in allpaths.c.

Thanks a lot for spending time on this.

I'll no doubt have some time over this coming winter to see if it can
be fixed and re-submitted in the PG12 cycle.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-07 Thread Alvaro Herrera
I pushed this patch -- 0001, 0002 and 0003 only.  I did not include
anything from 0004 and 0005; I didn't even get to the point of reading
them, so that I could focus on the first part.

I did not find anything to complain about.  I made a few adjustments and
asked David to supply a paragraph for perform.sgml (the "Using EXPLAIN"
section) which is included here.  I also adopted Jesper's (actually
David's) suggestion of changing "Partitions Pruned" to "Partitions
Removed" in the EXPLAIN output.

I had reservations about a relation_open() in the new executor code.  It
seemed a bit odd; we don't have any other relation_open in the executor
anywhere.  However, setting up the pruneinfo needs some stuff from
relcache that I don't see a reasonable mechanism to pass through
planner.  I asked Andres about it on IM and while he didn't endorse the
patch in any way, his quick opinion was that "it wasn't entirely
insane".  I verified that we already hold lock on the relation.


While we didn't get fast pruning support for MergeAppend or the
DELETE/UPDATE parts, I think those are valuable and recommend
resubmitting them for PG12.

Thank you!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Covering + unique indexes.

2018-04-07 Thread Andrew Gierth
> "Teodor" == Teodor Sigaev  writes:

 >> Support for testing amcaninclude via
 >> pg_indexam_has_property(oid,'can_include') seems to be missing?
 >> 
 >> Also the return values of pg_index_column_has_property for included
 >> columns seem a bit dubious... should probably be returning NULL for most
 >> properties except 'returnable'.
 
 Teodor> Damn, you're right, it's missed.

 >> I can look at fixing these for you if you like?

 Teodor> If you will do that I will be very grateful

OK, I will deal with it.

-- 
Andrew (irc:RhodiumToad)



Re: WIP: Covering + unique indexes.

2018-04-07 Thread Teodor Sigaev

> Support for testing amcaninclude via
> pg_indexam_has_property(oid,'can_include') seems to be missing?
>
> Also the return values of pg_index_column_has_property for included
> columns seem a bit dubious... should probably be returning NULL for most
> properties except 'returnable'.

Damn, you're right, it's missed.

> I can look at fixing these for you if you like?

If you do that I will be very grateful.

--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/



Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Tomas Vondra
On 04/07/2018 06:23 PM, Tom Lane wrote:
> Teodor Sigaev  writes:
>>> I dunno, how would you estimate whether this is actually a win or not?
>>> I don't think our model of sort costs is anywhere near refined enough
>>> or accurate enough to reliably predict whether this is better than
>>> just doing it in one step.  Even if the cost model is good, it's not
>>> going to be better than our statistics about the number/size of the
>>> groups in the first column(s), and that's a notoriously unreliable stat.
> 
>> I think that improvement in cost calculation of sort should be a 
>> separate patch, not directly connected to this one. Postpone patches 
>> till other part will be ready to get max improvement for postponed ones 
>> doesn't seem to me very good, especially if it suggests some improvement 
>> right now.
> 
> No, you misunderstand the point of my argument.  Without a reasonably
> reliable cost model, this patch could easily make performance *worse*
> not better for many people, due to choosing incremental-sort plans
> where they were really a loss.
> 

Yeah. Essentially the patch could push the planner to pick a path that
has low startup cost (and very high total cost), assuming it'll only
need to read small part of the input. But if the number of groups in the
input is low (perhaps just one huge group), that would be a regression.

> If we were at the start of a development cycle and work were being
> promised to be done later in the cycle to improve the planning aspect,
> I'd be more charitable about it.  But this isn't merely the end of a
> cycle, it's the *last day*.  Now is not the time to commit stuff that
> needs, or even just might need, follow-on work.
> 

+1 to that

FWIW I'm willing to spend some time on the patch for PG12, particularly
on the planner / costing part. The potential gains are too interesting.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Covering + unique indexes.

2018-04-07 Thread Peter Geoghegan
On Sat, Apr 7, 2018 at 1:52 PM, Andrew Gierth
 wrote:
> Support for testing amcaninclude via
> pg_indexam_has_property(oid,'can_include') seems to be missing?
>
> Also the return values of pg_index_column_has_property for included
> columns seem a bit dubious... should probably be returning NULL for most
> properties except 'returnable'.
>
> I can look at fixing these for you if you like?

I'm happy to accept your help with it, for one.

-- 
Peter Geoghegan



Re: Removing useless DISTINCT clauses

2018-04-07 Thread Tom Lane
I wrote:
> My gripe about
> this as it stands is mainly that it seems like it's going to do
> a lot of catalog-lookup work twice over, at least in cases that
> have both DISTINCT and GROUP BY --- or is that too small a set to
> worry about?

I convinced myself that that was a silly objection, and started looking
through the patch with intent to commit.  However, re-reading the thread
re-awakened my concern about whether deleting items from the
distinctClause is actually safe.

You're right that remove_unused_subquery_outputs isn't really in trouble
with this; it might fail to remove some things it could remove, but that
seems OK, and at best deserving of a comment.

However, that doesn't mean we're out of the woods.  See also
preprocess_groupclause, particularly this comment:
 * Note: we need no comparable processing of the distinctClause because
 * the parser already enforced that that matches ORDER BY.

as well as the interesting assumptions in create_distinct_paths about
what it means to compare the lengths of the pathkey lists for these
clauses.  Once I noticed that, I became convinced that this patch actually
breaks planning of sort/unique-based DISTINCT: if we have a pkey a,b,
but the ORDER BY list is a,c,b, then we will sort by a,c,b but the
Unique node will be told it only needs to unique-ify on the first two
sort columns.

That led me to try this test case, and it blew up even better than
I expected:

regression=# create table t1 (a int,b int, c int, d int, primary key(a,b));
CREATE TABLE
regression=# explain verbose select distinct * from t1 order by a,c,b;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

The reason is we're hitting this assert, a bit further down in
create_distinct_paths:

/* Assert checks that parser didn't mess up... */
Assert(pathkeys_contained_in(root->distinct_pathkeys,
 needed_pathkeys));


I don't think that that makes this patch unsalvageable.  The way forward
probably involves remembering that we removed some distinctClause items
(so that we know the correspondence to the sortClause no longer holds)
and then working harder in create_distinct_paths when that's the case.

However, that's not something that's going to happen on the last day
of the commitfest.  So I'm going to mark this Returned With Feedback
and encourage you to return to the matter in v12.

In the meantime, attached is the version of the patch that I was about to
commit before getting cold feet.  It has some cosmetic improvements
over yours, notably comment additions in allpaths.c.

regards, tom lane

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 65a34a2..345ed17 100644
*** a/src/backend/optimizer/path/allpaths.c
--- b/src/backend/optimizer/path/allpaths.c
*** remove_unused_subquery_outputs(Query *su
*** 3306,3311 
--- 3306,3315 
  	/*
  	 * If subquery has regular DISTINCT (not DISTINCT ON), we're wasting our
  	 * time: all its output columns must be used in the distinctClause.
+ 	 * (Note: the latter is not necessarily true anymore, because planner.c
+ 	 * might have found some of the DISTINCT columns to be redundant and
+ 	 * dropped them.  But they'd still have sortgroupref markings, so unless
+ 	 * we improve the heuristic below, we would not recognize them as unused.)
  	 */
  	if (subquery->distinctClause && !subquery->hasDistinctOn)
  		return;
*** remove_unused_subquery_outputs(Query *su
*** 3348,3358 
  
  		/*
  		 * If it has a sortgroupref number, it's used in some sort/group
! 		 * clause so we'd better not remove it.  Also, don't remove any
! 		 * resjunk columns, since their reason for being has nothing to do
! 		 * with anybody reading the subquery's output.  (It's likely that
! 		 * resjunk columns in a sub-SELECT would always have ressortgroupref
! 		 * set, but even if they don't, it seems imprudent to remove them.)
  		 */
  		if (tle->ressortgroupref || tle->resjunk)
  			continue;
--- 3352,3365 
  
  		/*
  		 * If it has a sortgroupref number, it's used in some sort/group
! 		 * clause so we'd better not remove it.  (This is a conservative
! 		 * heuristic, since it might not actually be used by any surviving
! 		 * sort/group clause; but we don't bother to expend the cycles needed
! 		 * for a more accurate test.)  Also, don't remove any resjunk columns,
! 		 * since their reason for being has nothing to do with anybody reading
! 		 * the subquery's output.  (It's likely that resjunk columns in a
! 		 * sub-SELECT would always have ressortgroupref set, but even if they
! 		 * don't, it seems imprudent to remove them.)
  		 */
  		if (tle->ressortgroupref || tle->resjunk)
  			continue;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 008492b..a9ccfed 100644

Re: WIP: Covering + unique indexes.

2018-04-07 Thread Andrew Gierth
> "Teodor" == Teodor Sigaev  writes:

 >> I'll keep an eye on the buildfarm, since it's late in Russia.
 
 Teodor> Thank you very much! Now 23:10 MSK and I'll be able to follow
 Teodor> for approximately an hour.

Support for testing amcaninclude via
pg_indexam_has_property(oid,'can_include') seems to be missing?

Also the return values of pg_index_column_has_property for included
columns seem a bit dubious... should probably be returning NULL for most
properties except 'returnable'.

I can look at fixing these for you if you like?

-- 
Andrew (irc:RhodiumToad)



Re: WIP: Covering + unique indexes.

2018-04-07 Thread Teodor Sigaev

Thank you, I looked at the buildfarm and completely forgot about the commitfest site

Andres Freund wrote:

> On 2018-04-07 23:02:08 +0300, Teodor Sigaev wrote:
>> Thanks to everyone, pushed.
>
> Marked CF entry as committed.
>
> Greetings,
>
> Andres Freund



--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-04-07 Thread Tomas Vondra

On 04/07/2018 07:52 PM, Dean Rasheed wrote:
> On 7 April 2018 at 15:12, Bruce Momjian  wrote:
>> Uh, where are we on this patch?  It isn't going to make it into PG 11?
>> Feature development for this has been going on for years.
> 
> Unfortunately, I think there's no way that this will be ready for
> PG11. So far, I have only read parts of the patch, focusing mainly on
> the planner changes, and how it will make use of the new statistics. I
> think there are still issues to work out in that area, although I
> haven't read the latest version yet, I have some doubts about the new
> approach described.
> 
> Looking back at the history of this patch, it appears to have moved
> through all 4 PG11 commitfests with fairly minimal review, never
> becoming Ready for Committer. That's a real shame because I agree that
> it's an important feature, but IMO it's not yet in a committable
> state. I feel bad saying that, because the patch hasn't really had a
> fair chance to-date, despite Tomas' efforts and quick responses to all
> review comments.
> 

Well, yeah. The free fall through all PG11 commitfests is somewhat
demotivating :-/

I certainly agree the patch is not committable in the current state. I
don't think it's in terrible shape, but I'm sure there are still some
dubious parts. I certainly know about a few.

> At this stage though, all I can say is that I'll make every effort to
> keep reviewing it, and I hope Tomas will persist, so that it has a
> better chance in PG12.
> 

Thank you for the effort and for the reviews, of course.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Add support for tuple routing to foreign partitions

2018-04-07 Thread Amit Langote
On Sun, Apr 8, 2018 at 5:37 AM, Andres Freund  wrote:
> On 2018-04-06 19:25:20 -0400, Robert Haas wrote:
>> On Thu, Apr 5, 2018 at 6:21 AM, Etsuro Fujita
>>  wrote:
>> > Attached is an updated version of the patch set plus the patch in [1]. 
>> > Patch
>> > 0003_foreign-routing-fdwapi-6.patch can be applied on top of patch
>> > 0001_postgres-fdw-refactoring-6.patch and
>> > 0002_copy-from-check-constraint-fix.patch.
>>
>> Committed.  Let me say that you do nice work.
>
> The CF entry for this is still marked as 'ready for committer'. Does anything 
> remain?
>
> https://commitfest.postgresql.org/17/1184/

Nothing remains, so marked as committed.

Thanks,
Amit



Re: [HACKERS] Add support for tuple routing to foreign partitions

2018-04-07 Thread Andres Freund
On 2018-04-06 19:25:20 -0400, Robert Haas wrote:
> On Thu, Apr 5, 2018 at 6:21 AM, Etsuro Fujita
>  wrote:
> > Attached is an updated version of the patch set plus the patch in [1]. Patch
> > 0003_foreign-routing-fdwapi-6.patch can be applied on top of patch
> > 0001_postgres-fdw-refactoring-6.patch and
> > 0002_copy-from-check-constraint-fix.patch.
> 
> Committed.  Let me say that you do nice work.

The CF entry for this is still marked as 'ready for committer'. Does anything 
remain?

https://commitfest.postgresql.org/17/1184/

Greetings,

Andres Freund



Re: WIP: Covering + unique indexes.

2018-04-07 Thread Andres Freund
On 2018-04-07 23:02:08 +0300, Teodor Sigaev wrote:
> Thanks to everyone, pushed.

Marked CF entry as committed.

Greetings,

Andres Freund



Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

2018-04-07 Thread Andres Freund
On 2018-04-07 20:13:36 +0530, amul sul wrote:
> Attached is the patch does the renaming of this tests -- need to apply
> to the top of  v10 patch[1].

These indeed are a bit too long, so I went with the numbers.  I've
pushed the patch now. Two changes:
- I've added one more error path to EvalPlanQualFetch, as suggested by
  Amit. Added tests for that too.
- renamed '*_rang_*' table names in tests to '*_range_*'.

Thanks!

- Andres



Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

2018-04-07 Thread Andres Freund
On 2018-04-06 09:41:07 +0530, Amit Kapila wrote:
> >> Won't the same question applies to the similar usage in
> >> EvalPlanQualFetch and heap_lock_updated_tuple_rec.
> >
> > I don't think so?
> >
> >
> >> In EvalPlanQualFetch, we consider such a tuple to be deleted and will
> >> silently miss/skip it which seems contradictory to the places where we
> >> have detected such a situation and raised an error.
> >
> > if (ItemPointerIndicatesMovedPartitions(&hufd.ctid))
> > ereport(ERROR,
> > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >  errmsg("tuple to be locked was already moved to 
> > another partition due to concurrent update")));
> >
> >
> 
> I was talking about the case when the tuple version is not visible aka
> the below code:

> I think if we return an error in EvalPlanQualFetch at the place
> mentioned above, the behavior will be sane.

I think you're right. I've adapted the code, added a bunch of tests.

Greetings,

Andres Freund



Re: WIP: Covering + unique indexes.

2018-04-07 Thread Teodor Sigaev

> I'll keep an eye on the buildfarm, since it's late in Russia.

Thank you very much! Now 23:10 MSK and I'll be able to follow for
approximately an hour.


--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/



Re: WIP: Covering + unique indexes.

2018-04-07 Thread Peter Geoghegan
On Sat, Apr 7, 2018 at 1:02 PM, Teodor Sigaev  wrote:
> Thanks to everyone, pushed.

I'll keep an eye on the buildfarm, since it's late in Russia.

-- 
Peter Geoghegan



Sv: Re: WIP: Covering + unique indexes.

2018-04-07 Thread Andreas Joseph Krogh
On Saturday, 7 April 2018 at 22:02:08, Teodor Sigaev <teo...@sigaev.ru> wrote:
> Thanks to everyone, pushed.
 
Rock!
 
--
Andreas Joseph Krogh

 


Re: WIP: Covering + unique indexes.

2018-04-07 Thread Teodor Sigaev

Thanks to everyone, pushed.




Peter Geoghegan wrote:

> On Sat, Apr 7, 2018 at 5:48 AM, Teodor Sigaev  wrote:
>> On close look, bts_btentry.ip_posid is not used anymore, I changed
>> bts_btentry type to BlockNumber. As a result, BTEntrySame() is removed.
>
> That seems like a good idea.
>
>> I'm not very happy with massive usage of
>> ItemPointerGetBlockNumberNoCheck(&(itup->t_tid)), suggest to wrap it in a
>> macro, something like this:
>> #define BTreeInnerTupleGetDownLink(itup) \
>>  ItemPointerGetBlockNumberNoCheck(&(itup->t_tid))
>
> Agreed. We do that with GIN.



--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/



Re: [HACKERS] path toward faster partition pruning

2018-04-07 Thread Alvaro Herrera
Andres Freund wrote:
> On 2018-04-07 08:13:23 -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > I've also attempted to fix rhinoceros's failure I remarked upon a couple
> > > hours ago in
> > > https://postgr.es/m/20180406210330.wmqw42wqgiick...@alap3.anarazel.de
> > 
> > And this, too.  I was unsure if this was because we were missing calling
> > some object access hook from the new code, or the additional pruning.
> 
> That's possible.  I did attempt to skim the code, that's where my
> complain about the docs originated. There certainly isn't an
> InvokeFunctionExecuteHook() present.  It's not clear to me whether
> that's an issue - we don't invoke the hooks in a significant number of
> places either, and as far as I can discern there's not much rule or
> reason about where we invoke it.

I managed to convince myself that it's not higher-level code's
responsibility to invoke the execute hooks; the likelihood of bugs of
omission seems just too large.  But maybe I'm wrong.

There's a small number of InvokeFunctionExecuteHook calls in the
executor, but I really doubt that it exhaustively covers everyplace
where catalogued functions are called in the executor.

CC'ing KaiGai and Stephen Frost; they may want to chip in here.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [sqlsmith] Segfault in expand_tuple

2018-04-07 Thread Andres Freund
Hi,

On 2018-04-07 21:28:39 +0200, Andreas Seltenreich wrote:
> the following query triggers a segfault for me when run against the
> regression database.  Testing was done with master at 039eb6e92f.
> Backtrace below.

Andrew, that looks like it's in your area?

> Core was generated by `postgres: smith regression [local] SELECT  '.
> Program terminated with signal SIGSEGV, Segmentation fault.
> (gdb) bt
> #0  0x556c14759cb8 in expand_tuple 
> (targetHeapTuple=targetHeapTuple@entry=0x0, 
> targetMinimalTuple=targetMinimalTuple@entry=0x7ffe8088a118, 
> sourceTuple=<optimized out>, tupleDesc=<optimized out>)
> at heaptuple.c:984
> #1  0x556c1475bb46 in minimal_expand_tuple (sourceTuple=<optimized out>, 
> tupleDesc=<optimized out>) at heaptuple.c:1015
> #2  0x556c14917177 in ExecCopySlotMinimalTuple (slot=<optimized out>) at 
> execTuples.c:631
> #3  0x556c14ba8ada in copytup_heap (state=0x556c16c4f5e8, 
> stup=0x7ffe8088a180, tup=<optimized out>) at tuplesort.c:3585
> #4  0x556c14baf8e6 in tuplesort_puttupleslot 
> (state=state@entry=0x556c16c4f5e8, slot=<optimized out>) at tuplesort.c:1444
> #5  0x556c14937791 in ExecSort (pstate=0x556c16c3ac50) at nodeSort.c:112
> #6  0x556c1493c6f4 in ExecProcNode (node=0x556c16c3ac50) at 
> ../../../src/include/executor/executor.h:239
> #7  begin_partition (winstate=winstate@entry=0x556c16c3a6b8) at 
> nodeWindowAgg.c:1110
> #8  0x556c149403aa in ExecWindowAgg (pstate=0x556c16c3a6b8) at 
> nodeWindowAgg.c:2094
> #9  0x556c1490c0ca in ExecProcNode (node=0x556c16c3a6b8) at 
> ../../../src/include/executor/executor.h:239


Greetings,

Andres Freund



[sqlsmith] Segfault in expand_tuple

2018-04-07 Thread Andreas Seltenreich
Hi,

the following query triggers a segfault for me when run against the
regression database.  Testing was done with master at 039eb6e92f.
Backtrace below.

regards,
Andreas

select
  case when pg_catalog.lastval() < 
pg_catalog.pg_stat_get_bgwriter_maxwritten_clean() then case when 
pg_catalog.circle_sub_pt(
  cast(cast(null as circle) as circle),
  cast((select location from public.emp limit 1 offset 13)
 as point)) ~ cast(nullif(case when cast(null as box) &> (select 
boxcol from public.brintest limit 1 offset 2)
 then (select f1 from public.circle_tbl limit 1 offset 4)
   else (select f1 from public.circle_tbl limit 1 offset 4)
   end,
  case when (select pg_catalog.max(class) from public.f_star)
 ~~ ref_0.c then cast(null as circle) else cast(null as circle) 
end
) as circle) then ref_0.a else ref_0.a end
   else case when pg_catalog.circle_sub_pt(
  cast(cast(null as circle) as circle),
  cast((select location from public.emp limit 1 offset 13)
 as point)) ~ cast(nullif(case when cast(null as box) &> (select 
boxcol from public.brintest limit 1 offset 2)
 then (select f1 from public.circle_tbl limit 1 offset 4)
   else (select f1 from public.circle_tbl limit 1 offset 4)
   end,
  case when (select pg_catalog.max(class) from public.f_star)
 ~~ ref_0.c then cast(null as circle) else cast(null as circle) 
end
) as circle) then ref_0.a else ref_0.a end
   end as c0,
  case when (select intervalcol from public.brintest limit 1 offset 1)
 >= cast(null as "interval") then case when ((select 
pg_catalog.max(roomno) from public.room)
 !~~ ref_0.c)
and (cast(null as xid) <> 100) then ref_0.b else ref_0.b end
   else case when ((select pg_catalog.max(roomno) from public.room)
 !~~ ref_0.c)
and (cast(null as xid) <> 100) then ref_0.b else ref_0.b end
   end as c1,
  ref_0.a as c2,
  (select a from public.idxpart1 limit 1 offset 5) as c3,
  ref_0.b as c4,
pg_catalog.stddev(
  cast((select pg_catalog.sum(float4col) from public.brintest)
 as float4)) over (partition by ref_0.a,ref_0.b,ref_0.c order by 
ref_0.b) as c5,
  cast(nullif(ref_0.b, ref_0.a) as int4) as c6, ref_0.b as c7, ref_0.c as c8
from
  public.mlparted3 as ref_0
where true;

Core was generated by `postgres: smith regression [local] SELECT  '.
Program terminated with signal SIGSEGV, Segmentation fault.
(gdb) bt
#0  0x556c14759cb8 in expand_tuple 
(targetHeapTuple=targetHeapTuple@entry=0x0, 
targetMinimalTuple=targetMinimalTuple@entry=0x7ffe8088a118, 
sourceTuple=, tupleDesc=)
at heaptuple.c:984
#1  0x556c1475bb46 in minimal_expand_tuple (sourceTuple=, 
tupleDesc=) at heaptuple.c:1015
#2  0x556c14917177 in ExecCopySlotMinimalTuple (slot=) at 
execTuples.c:631
#3  0x556c14ba8ada in copytup_heap (state=0x556c16c4f5e8, 
stup=0x7ffe8088a180, tup=) at tuplesort.c:3585
#4  0x556c14baf8e6 in tuplesort_puttupleslot 
(state=state@entry=0x556c16c4f5e8, slot=) at tuplesort.c:1444
#5  0x556c14937791 in ExecSort (pstate=0x556c16c3ac50) at nodeSort.c:112
#6  0x556c1493c6f4 in ExecProcNode (node=0x556c16c3ac50) at 
../../../src/include/executor/executor.h:239
#7  begin_partition (winstate=winstate@entry=0x556c16c3a6b8) at 
nodeWindowAgg.c:1110
#8  0x556c149403aa in ExecWindowAgg (pstate=0x556c16c3a6b8) at 
nodeWindowAgg.c:2094
#9  0x556c1490c0ca in ExecProcNode (node=0x556c16c3a6b8) at 
../../../src/include/executor/executor.h:239
#10 ExecutePlan (execute_once=, dest=0x7f25481b5e88, 
direction=, numberTuples=0, 
sendTuples=, operation=CMD_SELECT, 
use_parallel_mode=, planstate=0x556c16c3a6b8, 
estate=0x556c16c1bbf8) at execMain.c:1729
#11 standard_ExecutorRun (queryDesc=0x556c16c250c8, direction=, 
count=0, execute_once=)
at execMain.c:364
#12 0x556c14a6b40c in PortalRunSelect (portal=portal@entry=0x556c16b96468, 
forward=forward@entry=true, count=0, 
count@entry=9223372036854775807, dest=dest@entry=0x7f25481b5e88) at 
pquery.c:937
#13 0x556c14a6ca90 in PortalRun (portal=portal@entry=0x556c16b96468, 
count=count@entry=9223372036854775807, 
isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, 
dest=dest@entry=0x7f25481b5e88, 
altdest=altdest@entry=0x7f25481b5e88, completionTag=0x7ffe8088a500 "") at 
pquery.c:778
#14 0x556c14a6859b in exec_simple_query (
query_string=0x556c16b2b438 "select\n  case when pg_catalog.lastval() < 
pg_catalog.pg_stat_get_bgwriter_maxwritten_clean() then case when 
pg_catalog.circle_sub_pt(\n\t  cast(cast(null as circle) as circle),\n\t  
cast((select location "...) at postgres.c:1121
#15 0x556c14a6a341 in PostgresMain (argc=, 
argv=argv@entry=0x556c16b56ad8, dbname=, 
username=) at postgres.c:4149
#16 0x556c1474eac4 in BackendRun (port=0x556c16b4c030) at 

Re: [HACKERS] MERGE SQL Statement for PG11

2018-04-07 Thread Andreas Seltenreich
Hi,

sqlsmith triggered an assertion with the following MERGE statement
against the regression database.  Testing was done with master at
039eb6e92f.  Backtrace below.

regards,
Andreas

MERGE INTO public.pagg_tab_ml_p3 as target_0
USING public.hash_i4_heap as ref_0
ON target_0.b = ref_0.seqno
WHEN MATCHED AND ((select bitcol from public.brintest limit 1 offset 92)
 > cast(null as "bit"))
and (false)
   THEN UPDATE  set
b = target_0.b,
a = target_0.b
WHEN NOT MATCHED
  AND cast(null as text) ~ cast(nullif(case when cast(null as float8) <= 
cast(null as float8) then cast(null as text) else cast(null as text) end
,
  cast(null as text)) as text)
THEN DO NOTHING;

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7f25474cf42a in __GI_abort () at abort.c:89
#2  0x556c14b75bb3 in ExceptionalCondition (
conditionName=conditionName@entry=0x556c14d09bf8 "!(list != ((List *) 
((void *)0)))", 
errorType=errorType@entry=0x556c14bc4dbd "FailedAssertion", 
fileName=fileName@entry=0x556c14d3022c "list.c", 
lineNumber=lineNumber@entry=390) at assert.c:54
#3  0x556c1495d580 in list_nth_cell (list=, n=) at list.c:390
#4  0x556c1495d5d6 in list_nth (list=list@entry=0x0, n=) at 
list.c:413
#5  0x556c14911fa5 in adjust_partition_tlist (tlist=0x0, map=, map=) at execPartition.c:1266
#6  0x556c14913049 in ExecInitPartitionInfo 
(mtstate=mtstate@entry=0x556c16c163e8, resultRelInfo=, 
proute=proute@entry=0x556c16c29988, estate=estate@entry=0x556c16c15bf8, 
partidx=0) at execPartition.c:683
#7  0x556c1490ff80 in ExecMergeMatched (junkfilter=0x556c16c15bf8, 
tupleid=0x7ffe8088a10a, slot=0x556c16c22e20, 
estate=0x556c16c15bf8, mtstate=0x556c16c163e8) at execMerge.c:205
#8  ExecMerge (mtstate=mtstate@entry=0x556c16c163e8, 
estate=estate@entry=0x556c16c15bf8, slot=slot@entry=0x556c16c22e20, 
junkfilter=junkfilter@entry=0x556c16c2b730, 
resultRelInfo=resultRelInfo@entry=0x556c16c15e48) at execMerge.c:127
#9  0x556c14933614 in ExecModifyTable (pstate=0x556c16c163e8) at 
nodeModifyTable.c:2179
#10 0x556c1490c0ca in ExecProcNode (node=0x556c16c163e8) at 
../../../src/include/executor/executor.h:239
#11 ExecutePlan (execute_once=, dest=0x556c16c111b8, 
direction=, numberTuples=0, 
sendTuples=, operation=CMD_MERGE, 
use_parallel_mode=, planstate=0x556c16c163e8, 
estate=0x556c16c15bf8) at execMain.c:1729
#12 standard_ExecutorRun (queryDesc=0x556c16c1bce8, direction=, 
count=0, execute_once=)
at execMain.c:364
#13 0x556c14a6ba52 in ProcessQuery (plan=, 
sourceText=0x556c16b2ac08 "...", params=0x0, 
queryEnv=0x0, dest=0x556c16c111b8, completionTag=0x7ffe8088a500 "") at 
pquery.c:161
#14 0x556c14a6bceb in PortalRunMulti (portal=portal@entry=0x556c16b96468, 
isTopLevel=isTopLevel@entry=true, 
setHoldSnapshot=setHoldSnapshot@entry=false, 
dest=dest@entry=0x556c16c111b8, altdest=altdest@entry=0x556c16c111b8, 
completionTag=completionTag@entry=0x7ffe8088a500 "") at pquery.c:1291
#15 0x556c14a6c979 in PortalRun (portal=portal@entry=0x556c16b96468, 
count=count@entry=9223372036854775807, 
isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, 
dest=dest@entry=0x556c16c111b8, 
altdest=altdest@entry=0x556c16c111b8, completionTag=0x7ffe8088a500 "") at 
pquery.c:804
#16 0x556c14a6859b in exec_simple_query (
query_string=0x556c16b2ac08 "MERGE INTO public.pagg_tab_ml_p3 as 
target_0\nUSING public.hash_i4_heap as ref_0\nON target_0.b = ref_0.seqno\nWHEN 
MATCHED AND ((select bitcol from public.brintest limit 1 offset 92)\n > 
cast(null as \"bi"...) at postgres.c:1121
#17 0x556c14a6a341 in PostgresMain (argc=, 
argv=argv@entry=0x556c16b56ad8, dbname=, 
username=) at postgres.c:4149
#18 0x556c1474eac4 in BackendRun (port=0x556c16b4c030) at postmaster.c:4409
#19 BackendStartup (port=0x556c16b4c030) at postmaster.c:4081
#20 ServerLoop () at postmaster.c:1754
#21 0x556c149ec017 in PostmasterMain (argc=3, argv=0x556c16b257d0) at 
postmaster.c:1362
#22 0x556c1475006d in main (argc=3, argv=0x556c16b257d0) at main.c:228



Re: [sqlsmith] Failed assertion in create_gather_path

2018-04-07 Thread Andreas Seltenreich
Tom Lane writes:
> Alvaro Herrera  writes:
>> Andreas Seltenreich wrote:
>>> as of 039eb6e92f:
>>> TRAP: FailedAssertion("!(subpath->parallel_safe)", File: "pathnode.c", 
>>> Line: 1813)
>
>> Uh, that's pretty strange -- that patch did not touch the planner at
>> all, and the relcache.c changes are pretty trivial.
>
> I don't think Andreas meant that the bug is necessarily new in 039eb6e92f,
> only that that's the version he happened to be testing.

Right.  I'm not testing often enough yet to be able to report on commit
granularity :-).  I'll try for a less ambiguous wording in future
reports.



Re: [sqlsmith] Failed assertion in create_gather_path

2018-04-07 Thread Tom Lane
Alvaro Herrera  writes:
> Andreas Seltenreich wrote:
>> sqlsmith found a query that triggers the following assertion in master
>> as of 039eb6e92f:
>> TRAP: FailedAssertion("!(subpath->parallel_safe)", File: "pathnode.c", Line: 
>> 1813)

> Uh, that's pretty strange -- that patch did not touch the planner at
> all, and the relcache.c changes are pretty trivial.

I don't think Andreas meant that the bug is necessarily new in 039eb6e92f,
only that that's the version he happened to be testing.

regards, tom lane



Re: [sqlsmith] Failed assertion in create_gather_path

2018-04-07 Thread Alvaro Herrera
Andreas Seltenreich wrote:
> Hi,
> 
> sqlsmith found a query that triggers the following assertion in master
> as of 039eb6e92f:
> 
> TRAP: FailedAssertion("!(subpath->parallel_safe)", File: "pathnode.c", Line: 
> 1813)
> 
> Backtrace and recipe against the regression database below.

Uh, that's pretty strange -- that patch did not touch the planner at
all, and the relcache.c changes are pretty trivial.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-04-07 Thread Bruce Momjian
On Sat, Apr  7, 2018 at 06:52:42PM +0100, Dean Rasheed wrote:
> On 7 April 2018 at 15:12, Bruce Momjian  wrote:
> > Uh, where are we on this patch?  It isn't going to make it into PG 11?
> > Feature development for this has been going on for years.
> 
> Unfortunately, I think there's no way that this will be ready for
> PG11. So far, I have only read parts of the patch, focusing mainly on
> the planner changes, and how it will make use of the new statistics. I
> think there are still issues to work out in that area, although I
> haven't read the latest version yet, I have some doubts about the new
> approach described.
> 
> Looking back at the history of this patch, it appears to have moved
> through all 4 PG11 commitfests with fairly minimal review, never
> becoming Ready for Committer. That's a real shame because I agree that
> it's an important feature, but IMO it's not yet in a committable
> state. I feel bad saying that, because the patch hasn't really had a
> fair chance to-date, despite Tomas' efforts and quick responses to all
> review comments.
> 
> At this stage though, all I can say is that I'll make every effort to
> keep reviewing it, and I hope Tomas will persist, so that it has a
> better chance in PG12.

Yes, let's please keep going on this patch.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



[sqlsmith] Failed assertion on pfree() via perform_pruning_combine_step

2018-04-07 Thread Andreas Seltenreich
Hi,

testing with master at 039eb6e92f yielded another query triggering an
assertion.  Backtrace and query against the regression database below.

regards,
Andreas

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7f25474cf42a in __GI_abort () at abort.c:89
#2  0x556c14b75bb3 in ExceptionalCondition (
conditionName=conditionName@entry=0x556c14d11510 "!(((context) != ((void 
*)0) && (((((const Node*)((context)))->type) == T_AllocSetContext) || ((((const 
Node*)((context)))->type) == T_SlabContext) || ((((const 
Node*)((context)))->type) == T_Generatio"..., 
errorType=errorType@entry=0x556c14bcac95 "BadArgument", 
fileName=fileName@entry=0x556c14d11480 
"../../../../src/include/utils/memutils.h", lineNumber=lineNumber@entry=129)
at assert.c:54
#3  0x556c14ba28cb in GetMemoryChunkContext (pointer=) at 
../../../../src/include/utils/memutils.h:129
#4  pfree (pointer=) at mcxt.c:1033
#5  0x556c1495fc01 in bms_int_members (a=, b=) at bitmapset.c:917
#6  0x556c149d910a in perform_pruning_combine_step (context=0x7ffe80889b20, 
step_results=0x7f253e3efcc0, cstep=0x7f253e3f13a0)
at partprune.c:2697
#7  get_matching_partitions (context=0x7ffe80889b20, pruning_steps=) at partprune.c:317
#8  0x556c149d9596 in prune_append_rel_partitions 
(rel=rel@entry=0x7f253e3eb3e8) at partprune.c:262
#9  0x556c14989e21 in set_append_rel_size (rte=0x7f254819d810, rti=2, 
rel=0x7f253e3eb3e8, root=0x7f254819d3c8) at allpaths.c:907
#10 set_rel_size (root=root@entry=0x7f254819d3c8, rel=rel@entry=0x7f253e3eb3e8, 
rti=rti@entry=2, rte=0x7f254819d810)
at allpaths.c:341
#11 0x556c1498afad in set_base_rel_sizes (root=) at 
allpaths.c:281
#12 make_one_rel (root=root@entry=0x7f254819d3c8, 
joinlist=joinlist@entry=0x7f253e3f0100) at allpaths.c:179
#13 0x556c149abfac in query_planner (root=root@entry=0x7f254819d3c8, 
tlist=tlist@entry=0x7f25481afdc0, 
qp_callback=qp_callback@entry=0x556c149acb90 , 
qp_extra=qp_extra@entry=0x7ffe80889dc0) at planmain.c:259
#14 0x556c149b0be5 in grouping_planner (root=root@entry=0x7f254819d3c8, 
inheritance_update=inheritance_update@entry=false, 
tuple_fraction=, tuple_fraction@entry=0) at planner.c:1914
#15 0x556c149b31a1 in subquery_planner (glob=, 
parse=parse@entry=0x556c16bf4c88, 
parent_root=parent_root@entry=0x556c16bf6360, 
hasRecursion=hasRecursion@entry=false, tuple_fraction=)
at planner.c:984
#16 0x556c149899d3 in set_subquery_pathlist (rte=, 
rti=, rel=0x556c16c0f6a0, root=0x556c16bf6360)
at allpaths.c:2196
#17 set_rel_size (root=root@entry=0x556c16bf6360, rel=rel@entry=0x556c16c0eaf0, 
rti=rti@entry=2, rte=)
at allpaths.c:379
#18 0x556c1498afad in set_base_rel_sizes (root=) at 
allpaths.c:281
#19 make_one_rel (root=root@entry=0x556c16bf6360, 
joinlist=joinlist@entry=0x556c16c0eda8) at allpaths.c:179
#20 0x556c149abfac in query_planner (root=root@entry=0x556c16bf6360, 
tlist=tlist@entry=0x556c16c0d928, 
qp_callback=qp_callback@entry=0x556c149acb90 , 
qp_extra=qp_extra@entry=0x7ffe8088a200) at planmain.c:259
#21 0x556c149b0be5 in grouping_planner (root=root@entry=0x556c16bf6360, 
inheritance_update=inheritance_update@entry=false, 
tuple_fraction=, tuple_fraction@entry=0) at planner.c:1914
#22 0x556c149b31a1 in subquery_planner (glob=glob@entry=0x556c16bf5bf0, 
parse=parse@entry=0x556c16bf4da0, 
parent_root=parent_root@entry=0x0, hasRecursion=hasRecursion@entry=false, 
tuple_fraction=tuple_fraction@entry=0)
at planner.c:984
#23 0x556c149b4356 in standard_planner (parse=0x556c16bf4da0, 
cursorOptions=256, boundParams=) at planner.c:405
#24 0x556c14a680dd in pg_plan_query (querytree=0x556c16bf4da0, 
cursorOptions=256, boundParams=0x0) at postgres.c:808
#25 0x556c14a681be in pg_plan_queries (querytrees=, 
cursorOptions=cursorOptions@entry=256, 
boundParams=boundParams@entry=0x0) at postgres.c:874
#26 0x556c14a686a9 in exec_simple_query ( query_string=0x556c16b2b438 
"...") at postgres.c:1049
#27 0x556c14a6a341 in PostgresMain (argc=, 
argv=argv@entry=0x556c16b56ad8, dbname=, 
username=) at postgres.c:4149
#28 0x556c1474eac4 in BackendRun (port=0x556c16b4c030) at postmaster.c:4409
#29 BackendStartup (port=0x556c16b4c030) at postmaster.c:4081
#30 ServerLoop () at postmaster.c:1754
#31 0x556c149ec017 in PostmasterMain (argc=3, argv=0x556c16b257d0) at 
postmaster.c:1362
#32 0x556c1475006d in main (argc=3, argv=0x556c16b257d0) at main.c:228

select
  sample_0.dd as c0,
  subq_1.c3 as c1,
  subq_1.c0 as c2,
  subq_1.c2 as c3,
  subq_1.c3 as c4,
  sample_0.bb as c5,
  subq_1.c0 as c6,
  pg_catalog.pg_current_wal_flush_lsn() as c7,
  public.func_with_bad_set() as c8,
  sample_0.bb as c9,
  sample_0.aa as c10,
  sample_0.dd as c11
from
  public.d as sample_0 tablesample bernoulli (2.8) ,
  lateral (select
subq_0.c1 as c0,
sample_0.aa as c1,
subq_0.c0 as c2,
sample_0.cc as c3,
subq_0

Re: Bring atomic flag fallback up to snuff

2018-04-07 Thread Andres Freund
Hi,

On 2018-04-07 14:23:38 -0400, Tom Lane wrote:
> I think I'd just drop those asserts altogether.  The hardware is in charge
> of complaining about misaligned pointers.

Well, the problem is that some atomics operations on some platforms do
not fail for unaligned pointers, they just lose their atomic
property. Fun times.


> If you do insist on asserting something, it needs to be about ptr->sema;
> the bool value field isn't going to have any interesting alignment
> requirement, but the sema might.

The alignment requirements of sema is going to be taken care of by
normal alignment rules, otherwise we'd be in trouble all over the tree
already.

Think I'll just drop it from the general branch, and add it to the
generic gcc implementation that uses a four byte value (because of arm
and the like).
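
The failure mode described here, an atomic operation on a misaligned pointer that silently stops being atomic instead of faulting, is exactly why an assert-style check is worth having. A minimal standalone sketch of such a check follows; the type and function names are illustrative assumptions, not the actual PostgreSQL macros:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for a fallback atomic flag whose payload is a
 * four-byte value, as on ARM-like platforms mentioned above.  Names here
 * are assumptions for the sketch, not PostgreSQL's real types. */
typedef struct
{
    int value;              /* four-byte payload */
} my_atomic_flag;

/* Report whether ptr is aligned to the power-of-two boundary "align".
 * This is the kind of condition an AssertPointerAlignment-style macro
 * would test against sizeof(ptr->value) rather than sizeof(*ptr). */
int
ptr_is_aligned(const void *ptr, size_t align)
{
    return ((uintptr_t) ptr & ((uintptr_t) align - 1)) == 0;
}
```

Checking against the size of the stored value, not the whole struct, matches the fix sketched above for the generic four-byte implementation.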

Greetings,

Andres Freund



Re: Bring atomic flag fallback up to snuff

2018-04-07 Thread Tom Lane
Andres Freund  writes:
> On 2018-04-07 14:07:05 -0400, Tom Lane wrote:
>> TRAP: UnalignedPointer("(((uintptr_t) ((uintptr_t)(ptr)) + ((sizeof(*ptr)) - 
>> 1)) & ~((uintptr_t) ((sizeof(*ptr)) - 1))) != (uintptr_t)(ptr)", File: 
>> "../../../src/include/port/atomics.h", Line: 177)

> Yea, I just saw that.

> Afaict it's "just" an over-eager / wrong assert. I can't for the heck of
> it think why I wrote (9.5 timeframe)
>   AssertPointerAlignment(ptr, sizeof(*ptr));
> where the bigger ones all have asserts alignment to their own size.  I
> assume I did because some platforms want to do atomics bigger than a
> single int - but then I should've used sizeof(ptr->value).  So far
> pademelon is the only animal affected afaict - let me think about it for
> a bit and come up with a patch, ok?

I think I'd just drop those asserts altogether.  The hardware is in charge
of complaining about misaligned pointers.

If you do insist on asserting something, it needs to be about ptr->sema;
the bool value field isn't going to have any interesting alignment
requirement, but the sema might.

regards, tom lane



Re: Bring atomic flag fallback up to snuff

2018-04-07 Thread Andres Freund
On 2018-04-07 14:07:05 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > As Daniel pointed out in:
> > https://postgr.es/m/fb948276-7b32-4b77-83e6-d00167f8e...@yesql.se the
> > pg_atomic_flag fallback implementation is broken.  That has gone
> > unnoticed because the fallback implementation wasn't testable until now:
> > ...
> > The attached fixes the bug and removes the edge-cases by storing a value
> > separate from the semaphore. I should have done that from the start.
> > This is an ABI break, but given the fallback didn't work at all, I don't
> > think that's a problem for backporting.
> 
> > Fix attached. Comments?
> 
> pademelon says it's wrong.
> 
> 2018-04-07 13:39:34.982 EDT [1197:89] pg_regress/lock LOG:  statement: SELECT 
> test_atomic_ops();
> TRAP: UnalignedPointer("(((uintptr_t) ((uintptr_t)(ptr)) + ((sizeof(*ptr)) - 
> 1)) & ~((uintptr_t) ((sizeof(*ptr)) - 1))) != (uintptr_t)(ptr)", File: 
> "../../../src/include/port/atomics.h", Line: 177)

Yea, I just saw that.

Afaict it's "just" an over-eager / wrong assert. I can't for the heck of
it think why I wrote (9.5 timeframe)
AssertPointerAlignment(ptr, sizeof(*ptr));
where the bigger ones all have asserts alignment to their own size.  I
assume I did because some platforms want to do atomics bigger than a
single int - but then I should've used sizeof(ptr->value).  So far
pademelon is the only animal affected afaict - let me think about it for
a bit and come up with a patch, ok?

Greetings,

Andres Freund



[sqlsmith] Failed assertion in create_gather_path

2018-04-07 Thread Andreas Seltenreich
Hi,

sqlsmith found a query that triggers the following assertion in master
as of 039eb6e92f:

TRAP: FailedAssertion("!(subpath->parallel_safe)", File: "pathnode.c", Line: 
1813)

Backtrace and recipe against the regression database below.

regards,
Andreas

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7f25474cf42a in __GI_abort () at abort.c:89
#2  0x556c14b75bb3 in ExceptionalCondition 
(conditionName=conditionName@entry=0x556c14d16d68 "!(subpath->parallel_safe)",
errorType=errorType@entry=0x556c14bc4dbd "FailedAssertion", 
fileName=fileName@entry=0x556c14d16d43 "pathnode.c",
lineNumber=lineNumber@entry=1813) at assert.c:54
#3  0x556c149ca01d in create_gather_path (root=root@entry=0x556c16bfb7a0, 
rel=rel@entry=0x7f253e36f418, subpath=0x7f253e37d9d8,
target=0x7f253e36f650, required_outer=required_outer@entry=0x0, 
rows=rows@entry=0x0) at pathnode.c:1813
#4  0x556c1498a3d7 in generate_gather_paths 
(root=root@entry=0x556c16bfb7a0, rel=rel@entry=0x7f253e36f418,
override_rows=override_rows@entry=false) at allpaths.c:2564
#5  0x556c1498a7b0 in set_rel_pathlist (root=root@entry=0x556c16bfb7a0, 
rel=0x7f253e36f418, rti=rti@entry=2, rte=0x556c16bfb420)
at allpaths.c:497
#6  0x556c1498b09d in set_base_rel_pathlists (root=) at 
allpaths.c:310
#7  make_one_rel (root=root@entry=0x556c16bfb7a0, 
joinlist=joinlist@entry=0x7f253e374450) at allpaths.c:180
#8  0x556c149abfac in query_planner (root=root@entry=0x556c16bfb7a0, 
tlist=tlist@entry=0x7f253e3eb4a0,
qp_callback=qp_callback@entry=0x556c149acb90 , 
qp_extra=qp_extra@entry=0x7ffe8088a200) at planmain.c:259
#9  0x556c149b0be5 in grouping_planner (root=root@entry=0x556c16bfb7a0, 
inheritance_update=inheritance_update@entry=false,
tuple_fraction=, tuple_fraction@entry=0) at planner.c:1914
#10 0x556c149b31a1 in subquery_planner (glob=glob@entry=0x556c16c234d0, 
parse=parse@entry=0x556c16bfaeb8,
parent_root=parent_root@entry=0x0, hasRecursion=hasRecursion@entry=false, 
tuple_fraction=tuple_fraction@entry=0)
at planner.c:984
#11 0x556c149b4356 in standard_planner (parse=0x556c16bfaeb8, 
cursorOptions=256, boundParams=) at planner.c:405
#12 0x556c14a680dd in pg_plan_query (querytree=0x556c16bfaeb8, 
cursorOptions=256, boundParams=0x0) at postgres.c:808
#13 0x556c14a681be in pg_plan_queries (querytrees=, 
cursorOptions=cursorOptions@entry=256,
boundParams=boundParams@entry=0x0) at postgres.c:874
#14 0x556c14a686a9 in exec_simple_query (
query_string=0x556c16b2b438 "...") at postgres.c:1049
#15 0x556c14a6a341 in PostgresMain (argc=, 
argv=argv@entry=0x556c16b56ad8, dbname=,
username=) at postgres.c:4149
#16 0x556c1474eac4 in BackendRun (port=0x556c16b4c030) at postmaster.c:4409
#17 BackendStartup (port=0x556c16b4c030) at postmaster.c:4081
#18 ServerLoop () at postmaster.c:1754
#19 0x556c149ec017 in PostmasterMain (argc=3, argv=0x556c16b257d0) at 
postmaster.c:1362
#20 0x556c1475006d in main (argc=3, argv=0x556c16b257d0) at main.c:228

set min_parallel_table_scan_size to 0;
select
  66 as c0,
  ref_1.cid as c1,
pg_catalog.min(
  cast((select timetzcol from public.brintest limit 1 offset 3)
 as timetz)) over (partition by ref_1.name order by ref_1.name) as c2,
  ref_0.c as c3
from
  public.prt1_l as ref_0
right join public.my_property_normal as ref_1
on (ref_0.a <= ref_0.a)
where EXISTS (
  select
  ref_2.y as c0,
  ref_2.y as c1,
  sample_0.random as c2,
  ref_1.tel as c3,
  ref_0.a as c4,
  sample_0.random as c5,
  ref_2.y as c6,
  ref_2.x as c7,
  case when (true <> (select pg_catalog.bool_and(n) from 
testxmlschema.test2)
  )
  and (sample_0.seqno = (select int_four from public.test_type_diff2_c3 
limit 1 offset 1)
  ) then ref_2.y else ref_2.y end
 as c8,
  sample_0.seqno as c9,
  ref_1.name as c10,
  ref_0.a as c11,
  (select nslots from public.hub limit 1 offset 2)
 as c12,
  ref_1.name as c13
from
  public.hash_name_heap as sample_0 tablesample system (8.2)
left join public.tt6 as ref_2
on cast(null as tinterval) <= (select f1 from public.tinterval_tbl 
limit 1 offset 79)
)
and (ref_2.y is not NULL))
  or (((false)
  and ((cast(null as tsquery) > (select keyword from 
public.test_tsquery limit 1 offset 34)
)
or select pg_catalog.jsonb_agg(sl_name) from 
public.shoelace_obsolete)
 <@ cast(null as jsonb))
or (EXISTS (
  select
  100 as c0,
  ref_0.a as c1,
  sample_0.seqno as c2,
  ref_0.a as c3,
  sample_0.seqno as c4,
  

Re: Bring atomic flag fallback up to snuff

2018-04-07 Thread Tom Lane
Andres Freund  writes:
> As Daniel pointed out in:
> https://postgr.es/m/fb948276-7b32-4b77-83e6-d00167f8e...@yesql.se the
> pg_atomic_flag fallback implementation is broken.  That has gone
> unnoticed because the fallback implementation wasn't testable until now:
> ...
> The attached fixes the bug and removes the edge-cases by storing a value
> separate from the semaphore. I should have done that from the start.
> This is an ABI break, but given the fallback didn't work at all, I don't
> think that's a problem for backporting.

> Fix attached. Comments?

pademelon says it's wrong.

2018-04-07 13:39:34.982 EDT [1197:89] pg_regress/lock LOG:  statement: SELECT 
test_atomic_ops();
TRAP: UnalignedPointer("(((uintptr_t) ((uintptr_t)(ptr)) + ((sizeof(*ptr)) - 
1)) & ~((uintptr_t) ((sizeof(*ptr)) - 1))) != (uintptr_t)(ptr)", File: 
"../../../src/include/port/atomics.h", Line: 177)


regards, tom lane



Re: json(b)_to_tsvector with numeric values

2018-04-07 Thread Teodor Sigaev

Thank you, pushed with some editorization

Dmitry Dolgov wrote:

On 7 April 2018 at 17:09, Teodor Sigaev wrote:
>>> See workable sketch for parsing jsonb flags and new worker variant.
>>
>> Yep, thanks for the sketch. Here is the new version of patch, does it look
>> close to what you have in mind?
>
> Patch looks good except error messaging, you took it directly from sketch
> where I didn't spend time for it. Please, improve. elog() should be used
> only for impossible errors, whereas user input could contain mistakes.

I assume what you mean is that for user input errors we need to use ereport.
Indeed, thanks for noticing. I've replaced all elog except the last one, since
it actually describes an impossible situation, when we started to read an
array but ended up having something else instead of WJB_END_ARRAY.
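
The elog()-versus-ereport() rule discussed above can be sketched standalone. The stub macros and the flag names below are assumptions made only so the fragment compiles outside the server; the real elog() and ereport() take an error level plus an auxiliary list and do far more:

```c
#include <stdio.h>
#include <string.h>

/* Stand-ins for the server's error macros, heavily simplified. */
#define elog(fmt, ...)    fprintf(stderr, "internal error: " fmt "\n", __VA_ARGS__)
#define ereport(fmt, ...) fprintf(stderr, "error: " fmt "\n", __VA_ARGS__)

/* Hypothetical flag validator (the flag names are assumptions): bad user
 * input is reported with ereport(), while elog() stays reserved for states
 * that should be impossible, like an array iterator not ending in
 * WJB_END_ARRAY.  Returns 1 if the flag is recognized, else 0. */
int
check_flag(const char *name)
{
    if (strcmp(name, "string") != 0 && strcmp(name, "numeric") != 0)
    {
        ereport("unrecognized flag: \"%s\"", name);     /* user mistake */
        return 0;
    }
    return 1;
}
```

The dividing line is simply who can trigger the message: anything reachable from user input deserves the user-facing path, everything else is an internal invariant.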



--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/



Re: pgsql: New files for MERGE

2018-04-07 Thread Andres Freund
Hi,

On 2018-04-07 13:33:53 -0400, Bruce Momjian wrote:
> To summarize how I see this patch, we have this workflow at the top of
> the TODO list (which I think Simon helped with or suggested):
> 
>   Desirability -> Design -> Implement -> Test -> Review -> Commit
> 
> I think the MERGE patch spent a long time getting through the first and
> second stages

The current implementation effort publicly started 2017-10-27. For a
major feature that's really not that long ago. There also were a few
gaps in which no public development/discussion happened.


>, and now the objections appear to be related to implementation.  I
>think the implementation issues only appeared during the final
>commitfest, which made it feel like a new patch.  Yes, it had been
>through the first two stages before the final commitfest.

I'm not sure there was agreement on the design even. A lot of that has
been discussed until very recently.


> I think one of the missing rules we have is that when we say no new
> large patches in the final commitfest, do we mean that all _three_
> stages should be solidified before the final commitfest?  I have never
> been clear on that point.

I think the implementation at the least should be roughly ready, and
implement a roughly agreed upon design. It's fine to change things
around, but major re-engineering surely is an alarm sign.

Greetings,

Andres Freund



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-04-07 Thread Dean Rasheed
On 7 April 2018 at 15:12, Bruce Momjian  wrote:
> Uh, where are we on this patch?  It isn't going to make it into PG 11?
> Feature development for this has been going on for years.

Unfortunately, I think there's no way that this will be ready for
PG11. So far, I have only read parts of the patch, focusing mainly on
the planner changes, and how it will make use of the new statistics. I
think there are still issues to work out in that area, although I
haven't read the latest version yet, I have some doubts about the new
approach described.

Looking back at the history of this patch, it appears to have moved
through all 4 PG11 commitfests with fairly minimal review, never
becoming Ready for Committer. That's a real shame because I agree that
it's an important feature, but IMO it's not yet in a committable
state. I feel bad saying that, because the patch hasn't really had a
fair chance to-date, despite Tomas' efforts and quick responses to all
review comments.

At this stage though, all I can say is that I'll make every effort to
keep reviewing it, and I hope Tomas will persist, so that it has a
better chance in PG12.

Regards,
Dean



Re: pgsql: New files for MERGE

2018-04-07 Thread Andres Freund
Hi,

On 2018-04-07 13:45:21 -0400, Tom Lane wrote:
> Simon Riggs  writes:
> > On 6 April 2018 at 17:22, Bruce Momjian  wrote:
> >> My point was that people didn't ask you to work harder on fixing the
> >> patch, but in reverting it.  You can work harder on fixing things in the
> >> hope they change their minds, but again, that isn't addressing their
> >> request.
> 
> > If Tom or Andres still feel that their concerns have not been
> > addressed over the last few days, I am happy to revert the patch with
> > no further discussion from me in this cycle.
> 
> FWIW, I still vote to revert.  Even if the patch were now perfect,
> there is not time for people to satisfy themselves of that, and
> we've got lots of other things on our plates.

Precisely.

And no, I don't think the concerns have been addressed fully.


> I'd be glad to participate in a proper review of this when v12
> opens.  But right now it just seems too rushed, and I have little
> confidence in it being right.

Yea, I'd really hope this gets submitted *early* in the v12 cycle rather
than leaving it to the end.  I'd even be willing to produce a prototype
of what I think should be changed in the parse-analysis & executor
stages, if necessary.

Greetings,

Andres Freund



Re: pgsql: New files for MERGE

2018-04-07 Thread Tom Lane
Simon Riggs  writes:
> On 6 April 2018 at 17:22, Bruce Momjian  wrote:
>> My point was that people didn't ask you to work harder on fixing the
>> patch, but in reverting it.  You can work harder on fixing things in the
>> hope they change their minds, but again, that isn't addressing their
>> request.

> If Tom or Andres still feel that their concerns have not been
> addressed over the last few days, I am happy to revert the patch with
> no further discussion from me in this cycle.

FWIW, I still vote to revert.  Even if the patch were now perfect,
there is not time for people to satisfy themselves of that, and
we've got lots of other things on our plates.

I'd be glad to participate in a proper review of this when v12
opens.  But right now it just seems too rushed, and I have little
confidence in it being right.

regards, tom lane

PS: If you do revert, please wrap it up as a single revert commit,
not a series of half a dozen.  You've already put several
non-buildable states into the commit history as a result of this
patch, each one of which is a land mine for git bisect testing.
We don't need more of those.  Also, looking at the reverse of the
reversion commit will provide a handy way of seeing the starting
point for future discussion of this patch.



Re: WIP: Covering + unique indexes.

2018-04-07 Thread Peter Geoghegan
On Sat, Apr 7, 2018 at 5:48 AM, Teodor Sigaev  wrote:
> On close look, bts_btentry.ip_posid is not used anymore, I change
> bts_btentry type to BlockNumber. As result, BTEntrySame() is removed.

That seems like a good idea.

> I'm not very happy with massive usage of
> ItemPointerGetBlockNumberNoCheck(&(itup->t_tid)), suggest to  wrap it to
> macro something like this:
> #define BTreeInnerTupleGetDownLink(itup) \
> ItemPointerGetBlockNumberNoCheck(&(itup->t_tid))

Agreed. We do that with GIN.
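
The wrapper-macro idea above can be shown in a standalone sketch; the stand-in types below are deliberately simplified assumptions (the real ItemPointerData and IndexTupleData carry more fields), and the point is purely readability, naming the intent once instead of repeating the raw accessor at every call site:

```c
#include <stdint.h>

/* Simplified stand-in for an item pointer: a block number split across
 * two 16-bit halves, loosely modeled on the discussion above. */
typedef struct
{
    uint16_t bi_hi;         /* high 16 bits of a block number */
    uint16_t bi_lo;         /* low 16 bits of a block number */
} ItemPointerStub;

typedef struct
{
    ItemPointerStub t_tid;  /* where an inner tuple stores its downlink */
} IndexTupleStub;

/* The raw accessor... */
#define StubGetBlockNumber(pointer) \
    (((uint32_t) (pointer)->bi_hi << 16) | (pointer)->bi_lo)

/* ...and the intention-revealing wrapper suggested in the thread. */
#define BTreeInnerTupleGetDownLink(itup) \
    StubGetBlockNumber(&((itup)->t_tid))
```

A reader of `BTreeInnerTupleGetDownLink(itup)` immediately knows a downlink is being followed, which the bare accessor expression does not convey.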

-- 
Peter Geoghegan



Re: pgsql: New files for MERGE

2018-04-07 Thread Bruce Momjian
On Sat, Apr  7, 2018 at 06:19:19PM +0100, Simon Riggs wrote:
> Now I can see some people are annoyed, so I'm happy to apologize if
> I've done things that weren't understood or caused annoyance. Time is
> short at end of CF and tempers fray for us all.
> 
> If Tom or Andres still feel that their concerns have not been
> addressed over the last few days, I am happy to revert the patch with
> no further discussion from me in this cycle.

Yep, I think the right thing, as you have just done, is to ask for clear
confirmation on revert, and accept whatever people say.  I would revert
if anyone requests it.

To summarize how I see this patch, we have this workflow at the top of
the TODO list (which I think Simon helped with or suggested):

Desirability -> Design -> Implement -> Test -> Review -> Commit

I think the MERGE patch spent a long time getting through the first and
second stages, and now the objections appear to be related to
implementation.  I think the implementation issues only appeared during the
final commitfest, which made it feel like a new patch.  Yes, it had been
through the first two stages before the final commitfest.

I think one of the missing rules we have is that when we say no new
large patches in the final commitfest, do we mean that all _three_
stages should be solidified before the final commitfest?  I have never
been clear on that point.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: pgsql: New files for MERGE

2018-04-07 Thread Simon Riggs
On 6 April 2018 at 17:22, Bruce Momjian  wrote:
> On Fri, Apr  6, 2018 at 09:21:54AM +0100, Simon Riggs wrote:
>> On 5 April 2018 at 21:02, Bruce Momjian  wrote:
>> > Simon, you have three committers in this thread suggesting this patch be
>> > reverted.  Are you just going to barrel ahead with the fixes without
>> > addressing their emails?
>>
>> PeterG confirms that the patch works and has the agreed concurrency
>> semantics. Separating out the code allows us to see clearly that we
>> have almost complete test coverage of the code and its features.
>
> My point was that people didn't ask you to work harder on fixing the
> patch, but in reverting it.  You can work harder on fixing things in the
> hope they change their minds, but again, that isn't addressing their
> request.

I had understood Tom's post to be raising a discussion about whether to revert.

In my understanding, Tom's complaints were addressed quickly, contrary to
what he expected - I was surprised myself at how quickly Pavan was
able to address them.

That left Andres' complaints, which as I explained hadn't been given
in any detail. Once given, Pavan investigated Andres' complaints and
explained in detail the results of his work and that he doesn't see
how the proposed changes would improve anything.

If there was anything unresolvable before commit I wouldn't have
committed it, or unresolvable after commit I would already have
reverted it.

And as I explained, this commit was not rushed and various other
negative points made are unfortunately not correct.

Now I can see some people are annoyed, so I'm happy to apologize if
I've done things that weren't understood or caused annoyance. Time is
short at end of CF and tempers fray for us all.

If Tom or Andres still feel that their concerns have not been
addressed over the last few days, I am happy to revert the patch with
no further discussion from me in this cycle.

If request to revert occurs, I propose to do this on Wed 11 pm, to
avoid getting in the way of other commits and because it will take
some time to prepare to revert.

Thanks to everyone for review comments and well done to Pavan,
whatever we decide.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Configurable file mode mask

2018-04-07 Thread David Steele
Hi Stephen,

On 4/6/18 10:22 PM, Stephen Frost wrote:
> 
> * David Steele (da...@pgmasters.net) wrote:
>> On 4/6/18 6:04 PM, David Steele wrote:
>>> On 4/6/18 3:02 PM, Stephen Frost wrote:

 - Further discussion in the commit messages
>>>
>>> Agreed, these need some more work.  I'm happy to do that but I'll need a
>>> bit more time.  Have a look at the new patches and I'll work on some
>>> better messages.
>>
>> I'm sure you'll want to reword some things, but I think these commit
>> messages capture the essential changes for each patch.
> 
> Thanks much.  I've taken (most) of these, adjusting a few bits here and
> there.
> 
> I've been back over the patch again, mostly improving the commit
> messages, comments, and docs.  I also looked over the code and tests
> again and they're looking pretty good to me, so I'll be looking to
> commit this tomorrow afternoon or so US/Eastern.

OK, one last review.  I didn't make any code changes, but I improved some
comments, added documentation and fixed a test.

01) Refactor dir/file permissions

* Small comment update, replace "such cases-" with "such cases --".

02) Allow group access on PGDATA

* Add data_directory_mode guc to "Preset Options" documentation.
* Add a section to the documentation that discusses changing the
permissions of an existing cluster.
* Improved comment on checkControlFile().
* Comment wordsmithing for SetDataDirectoryCreatePerm() and
RetrieveDataDirCreatePerm().
* Fixed ordering of -g option in initdb documentation.
* Fixed a new test that was "broken" by 032429701e477.  It was broken
before but the rmtrees added in 032429701e477 exposed the problem.
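
As a rough model of what the group-access change enables (a Python sketch with
illustrative names, not the C constants from common/file_perm.c; POSIX
permission semantics are assumed):

```python
import os
import stat
import tempfile

# Default PGDATA creation modes vs. the group-access variant enabled by
# initdb -g.  (Illustrative names; the real constants live in file_perm.)
DIR_MODE_DEFAULT, FILE_MODE_DEFAULT = 0o700, 0o600
DIR_MODE_GROUP, FILE_MODE_GROUP = 0o750, 0o640

def make_pg_directory(path, group_access=False):
    """Rough model of MakePGDirectory(): create a directory using the
    cluster-wide directory creation mode."""
    mode = DIR_MODE_GROUP if group_access else DIR_MODE_DEFAULT
    os.mkdir(path)
    os.chmod(path, mode)  # chmod so the result is independent of umask
    return stat.S_IMODE(os.stat(path).st_mode)

base = tempfile.mkdtemp()
assert make_pg_directory(os.path.join(base, "default")) == 0o700
assert make_pg_directory(os.path.join(base, "grouped"), True) == 0o750
```

The point of data_directory_mode as a preset GUC is that tools (and tests like
the ones fixed above) can discover which of the two mode sets a cluster uses.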

Attached are the v19 patches.

Thanks!
-- 
-David
da...@pgmasters.net
From d97495e81bb59beb03941398d7746a22b55d48ec Mon Sep 17 00:00:00 2001
From: David Steele 
Date: Sat, 7 Apr 2018 09:49:15 -0400
Subject: [PATCH 1/2] Refactor dir/file permissions

Consolidate directory and file create permissions for tools which work
with the PG data directory by adding a new module (common/file_perm.c)
that contains variables (pg_file_create_mode, pg_dir_create_mode) and
constants to initialize them (0600 for files and 0700 for directories).

Convert mkdir() calls in the backend to MakePGDirectory() if the
original call used default permissions (always the case for regular PG
directories).

Add tests to make sure permissions in PGDATA are set correctly by the
tools which modify the PG data directory.

Authors: David Steele ,
 Adam Brightwell 
Reviewed-By: Michael Paquier, with discussion amongst many others.
Discussion: 
https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net
---
 src/backend/access/transam/xlog.c|  2 +-
 src/backend/commands/tablespace.c| 18 ---
 src/backend/postmaster/postmaster.c  |  7 +--
 src/backend/postmaster/syslogger.c   |  5 +-
 src/backend/replication/basebackup.c |  5 +-
 src/backend/replication/slot.c   |  5 +-
 src/backend/storage/file/copydir.c   |  2 +-
 src/backend/storage/file/fd.c| 51 +--
 src/backend/storage/ipc/dsm_impl.c   |  3 +-
 src/backend/storage/ipc/ipc.c|  4 ++
 src/backend/utils/init/miscinit.c|  5 +-
 src/bin/initdb/initdb.c  | 24 -
 src/bin/initdb/t/001_initdb.pl   | 11 -
 src/bin/pg_basebackup/pg_basebackup.c|  9 ++--
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 14 +-
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 14 +-
 src/bin/pg_basebackup/walmethods.c   |  6 ++-
 src/bin/pg_ctl/pg_ctl.c  |  4 +-
 src/bin/pg_ctl/t/001_start_stop.pl   | 18 ++-
 src/bin/pg_resetwal/pg_resetwal.c|  5 +-
 src/bin/pg_resetwal/t/001_basic.pl   | 12 -
 src/bin/pg_rewind/RewindTest.pm  |  4 ++
 src/bin/pg_rewind/file_ops.c |  7 +--
 src/bin/pg_rewind/t/001_basic.pl | 11 -
 src/bin/pg_upgrade/file.c|  5 +-
 src/bin/pg_upgrade/pg_upgrade.c  |  3 +-
 src/bin/pg_upgrade/test.sh   | 11 +
 src/common/Makefile  |  4 +-
 src/common/file_perm.c   | 19 
 src/include/common/file_perm.h   | 34 +
 src/include/storage/fd.h |  3 ++
 src/test/perl/PostgresNode.pm|  3 ++
 src/test/perl/TestLib.pm | 73 
 src/tools/msvc/Mkvcbuild.pm  |  4 +-
 34 files changed, 330 insertions(+), 75 deletions(-)
 create mode 100644 src/common/file_perm.c
 create mode 100644 src/include/common/file_perm.h

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 813b2afaac..4a47395174 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4107,7 +4107,7 @@ ValidateXLOGDirectoryStructure(void

Re: Removing useless DISTINCT clauses

2018-04-07 Thread Tom Lane
David Rowley  writes:
> It's certainly possible to do more here.  I'm uncertain what needs to
> be done in regards to cached plan invalidation, but we'd certainly
> need to ensure cached plans are invalidated whenever the attnotnull is
> changed.

They already are; any change at all in a pg_class or pg_attribute
row results in a relcache inval for the associated table, cf
CacheInvalidateHeapTuple.  As far as the invalidation machinery is
concerned, changing attnotnull is indistinguishable from changing
atttypid, which I'm sure you'll agree must force plan regeneration.

So this line of thought leads to the conclusion that we were too
conservative in not allowing unique constraints to be used along
with actual pkeys in remove_useless_groupby_columns.  We do have
the additional requirement of checking attnotnull, but I think
we can assume that a plan inval will occur if that changes.
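
The reduction rule under discussion can be sketched as follows (a toy
single-relation model for illustration only, not the actual
remove_useless_groupby_columns() code):

```python
def reduce_groupby(groupby_cols, unique_keys, notnull_cols):
    """If the GROUP BY list contains every column of some unique key whose
    columns are all NOT NULL, that key functionally determines the other
    columns of the relation, so they can be dropped from the grouping."""
    cols = set(groupby_cols)
    nn = set(notnull_cols)
    for key in unique_keys:
        k = set(key)
        if k <= cols and k <= nn:
            # keep only the key columns, preserving original order
            return [c for c in groupby_cols if c in k]
    return list(groupby_cols)

# UNIQUE(id) with id NOT NULL: GROUP BY id, name, email -> GROUP BY id
assert reduce_groupby(["id", "name", "email"], [("id",)], ["id"]) == ["id"]
# A nullable unique column gives no such guarantee (NULLs compare unequal)
assert reduce_groupby(["id", "name"], [("id",)], []) == ["id", "name"]
```

The NOT NULL check is exactly why attnotnull changes must invalidate cached
plans, which the relcache-inval machinery already guarantees.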

In fact, now that I think about it, I wonder why we're insisting
on tracking the constraint OID either.  Don't index drops force
relcache invals regardless?  If not, how do we ensure that an
indexscan plan relying on that index gets redone?

Part of this probably comes from looking at the parser's
check_ungrouped_columns(), but that is operating under different
constraints, since it is generating what might be a long-lived
parse tree, eg something that gets stored as a view.  We do need
to be able to register an actual dependency on the uniqueness
constraint in that situation.  But plans get tossed on the basis
of relcache invals, so I think they don't need it.

Anyway, that's clearly a future improvement rather than something
I'd insist on this patch tackling immediately.  My gripe about
this as it stands is mainly that it seems like it's going to do
a lot of catalog-lookup work twice over, at least in cases that
have both DISTINCT and GROUP BY --- or is that too small a set to
worry about?

regards, tom lane



Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Tom Lane
Teodor Sigaev  writes:
>> I dunno, how would you estimate whether this is actually a win or not?
>> I don't think our model of sort costs is anywhere near refined enough
>> or accurate enough to reliably predict whether this is better than
>> just doing it in one step.  Even if the cost model is good, it's not
>> going to be better than our statistics about the number/size of the
>> groups in the first column(s), and that's a notoriously unreliable stat.

> I think that improvements in the cost calculation of sort should be a
> separate patch, not directly connected to this one. Postponing patches
> until other parts are ready, to get the maximum improvement for the
> postponed ones, doesn't seem very good to me, especially if this one
> offers some improvement right now.

No, you misunderstand the point of my argument.  Without a reasonably
reliable cost model, this patch could easily make performance *worse*
not better for many people, due to choosing incremental-sort plans
where they were really a loss.

If we were at the start of a development cycle and work were being
promised to be done later in the cycle to improve the planning aspect,
I'd be more charitable about it.  But this isn't merely the end of a
cycle, it's the *last day*.  Now is not the time to commit stuff that
needs, or even just might need, follow-on work.

regards, tom lane



Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Alexander Korotkov
On Sat, Apr 7, 2018 at 5:37 PM, Teodor Sigaev  wrote:

>> by this patch.  Revised version is attached.
>
> Fine, the patch got several rounds of review in all its parts. Are there
> any places which should be improved before commit?


Also I found that after the planner changes by Alexander Kuzmenkov,
incremental sort was used in cheapest_input_path() only if its child is the
cheapest total path.  That means incremental sort almost never gets used.
I've changed that to consider an incremental sort path whenever we have some
presorted columns.  I also had to put the changes in the postgres_fdw
regression tests back, because incremental sort was used right there.

This revision of the patch also includes commit message.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


incremental-sort-26.patch
Description: Binary data


Re: Online enabling of checksums

2018-04-07 Thread Andres Freund
Hi,

On 2018-04-07 08:57:03 +0200, Magnus Hagander wrote:
> Note however that I'm sans-laptop until Sunday, so I will revert it then or
> possibly Monday.

I'll deactive the isolationtester tests until then. They've been
intermittently broken for days now and prevent other tests from being
exercised.

Greetings,

Andres Freund



Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Teodor Sigaev

> I dunno, how would you estimate whether this is actually a win or not?
> I don't think our model of sort costs is anywhere near refined enough
> or accurate enough to reliably predict whether this is better than
> just doing it in one step.  Even if the cost model is good, it's not
> going to be better than our statistics about the number/size of the
> groups in the first column(s), and that's a notoriously unreliable stat.

I think that improvements in the cost calculation of sort should be a
separate patch, not directly connected to this one. Postponing patches
until other parts are ready, to get the maximum improvement for the
postponed ones, doesn't seem very good to me, especially if this one
offers some improvement right now.



--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/



Re: Bring atomic flag fallback up to snuff

2018-04-07 Thread Andres Freund
Hi,

On 2018-04-07 12:57:53 +, Magnus Hagander wrote:
> This is the same one you already committed right? Not a further one on top
> of that?

Yes, I pushed a few hours later.


> That said,+1 for not bothering to  back patch it.

I did ;). I was more wondering about whether to backpatch the ABI break
or not, and decided it doesn't matter because all the broken cases were
broken already. And nobody uses the fallback "naturally" anyway.



- Andres



Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Andres Freund
On 2018-04-07 12:06:52 -0400, Tom Lane wrote:
> Alexander Korotkov  writes:
> > On Wed, Mar 28, 2018 at 6:38 PM, Alvaro Herrera 
> > wrote:
> >> Can we have a recap on what the patch *does*?
> 
> > General idea hasn't changed much since the first email.
> > Incremental sort gives a benefit when you need to sort your dataset
> > by some list of columns while you already have input presorted
> > by some prefix of that list of columns.  Then you don't do a full sort
> > of the dataset, but rather sort groups where the values of the prefix
> > columns are equal (see the header comment in nodeIncrementalSort.c).
> 
> I dunno, how would you estimate whether this is actually a win or not?
> I don't think our model of sort costs is anywhere near refined enough
> or accurate enough to reliably predict whether this is better than
> just doing it in one step.  Even if the cost model is good, it's not
> going to be better than our statistics about the number/size of the
> groups in the first column(s), and that's a notoriously unreliable stat.
> 
> Given that we already have more than enough dubious patches that have
> been shoved in in the last few days, I'd rather not pile on stuff that
> there's any question about.

I don't disagree with any of that. Just wanted to pipe up to say that
there's a fair argument to be made that this patch, which has lingered
for years, "deserves" more to mature in tree than some of the rest.

Greetings,

Andres Freund



Re: json(b)_to_tsvector with numeric values

2018-04-07 Thread Dmitry Dolgov
> On 7 April 2018 at 17:09, Teodor Sigaev  wrote:
>>> See workable sketch for parsing jsonb flags and new worker variant.
>>
>>
>> Yep, thanks for the sketch. Here is the new version of patch, does it look
>> close to what you have in mind?
>
>
> Patch looks good except the error messaging; you took it directly from the
> sketch where I didn't spend time on it. Please improve it. elog() should be
> used only for impossible errors, whereas user input could contain mistakes.

I assume what you mean is that for user input errors we need to use ereport().
Indeed, thanks for noticing. I've replaced all elog() calls except the last
one, since it actually describes an impossible situation: we started to read
an array, but ended up with something other than WJB_END_ARRAY.
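
The elog-vs-ereport split can be sketched like this (a Python model of the
flag parsing under discussion, with ValueError standing in for ereport();
flag names follow the thread, not the committed C code):

```python
import json

VALID_FLAGS = {"string", "numeric", "boolean", "key", "all"}

def parse_tsvector_flags(flags_json):
    """Parse the jsonb flags argument: a JSON array of strings selecting
    which jsonb element types get indexed.  Bad user input raises
    ValueError (the ereport() case); only truly impossible internal
    states would warrant the elog() treatment."""
    flags = json.loads(flags_json)
    if not isinstance(flags, list):
        raise ValueError("flags must be an array of strings")
    selected = set()
    for f in flags:
        if f not in VALID_FLAGS:
            raise ValueError('unrecognized flag: "%s"' % f)
        selected |= (VALID_FLAGS - {"all"}) if f == "all" else {f}
    return selected

assert parse_tsvector_flags('["numeric", "boolean"]') == {"numeric", "boolean"}
assert parse_tsvector_flags('["all"]') == {"string", "numeric", "boolean", "key"}
```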


jsonb_to_tsvector_numeric_v5.patch
Description: Binary data


Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Tom Lane
Alexander Korotkov  writes:
> On Wed, Mar 28, 2018 at 6:38 PM, Alvaro Herrera 
> wrote:
>> Can we have a recap on what the patch *does*?

> General idea hasn't changed much since the first email.
> Incremental sort gives a benefit when you need to sort your dataset
> by some list of columns while you already have input presorted
> by some prefix of that list of columns.  Then you don't do a full sort
> of the dataset, but rather sort groups where the values of the prefix
> columns are equal (see the header comment in nodeIncrementalSort.c).

I dunno, how would you estimate whether this is actually a win or not?
I don't think our model of sort costs is anywhere near refined enough
or accurate enough to reliably predict whether this is better than
just doing it in one step.  Even if the cost model is good, it's not
going to be better than our statistics about the number/size of the
groups in the first column(s), and that's a notoriously unreliable stat.
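
The dependence on the group-count estimate can be shown with a toy
comparison-count model (purely illustrative, not the planner's actual cost
functions):

```python
from math import log2

def full_sort_cost(n):
    # comparison-count model of one big sort
    return n * log2(n)

def incremental_sort_cost(n, est_groups):
    # n rows split into est_groups equal groups, each sorted separately
    g = est_groups
    return g * (n / g) * log2(max(n / g, 2))

n = 1_000_000
# With an (estimated) 10,000 groups, incremental sort looks far cheaper...
assert incremental_sort_cost(n, 10_000) < full_sort_cost(n) / 2
# ...but if the estimate is wrong and there is really one group, the win
# evaporates: the plan choice hinges entirely on the group-count statistic.
assert incremental_sort_cost(n, 1) >= full_sort_cost(n) * 0.99
```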

Given that we already have more than enough dubious patches that have
been shoved in in the last few days, I'd rather not pile on stuff that
there's any question about.

regards, tom lane



Re: [HACKERS] pgbench - allow to store select results into variables

2018-04-07 Thread Fabien COELHO


Hello Stephen,


Here's that review.


Thanks for the review.


   
+   
+
+ \cset [prefix] or
+ \gset [prefix]
+


Seems like this should really be moved down below the section for
'\set'.


Dunno.

I put them there because it is in alphabetical order (for cset at least) 
and because "set" documentation is heavy about expressions (operators, 
functions, constants, ...) which have nothing to do with cset/gset, so I 
felt that having them clearly separated and in abc order was best.



-   char   *line;   /* text of command line */
+   char   *line;   /* first line for short display 
*/
+   char   *lines;  /* full multi-line text of 
command */


Not really a fan of such closely-named variables...  Maybe first_line
instead?


Indeed, looks better.


+   case PGRES_TUPLES_OK:
+   if (gset[compound] != NULL)


Probably be good to add some comments here, eh:



/*
* The results are intentionally thrown away if we aren't under a gset
* call.
*/


I added a (shorter) comment.


+   if (ntuples != 1)
+   {
+   fprintf(stderr,
+   "client %d file %d 
command %d compound %d: "
+   "expecting one row, 
got %d\n",
+   st->id, 
st->use_file, st->command, compound, ntuples);
+   st->ecnt++;
+   PQclear(res);
+   discard_response(st);
+   return false;
+   }


Might be interesting to support mutli-row (or no row?) returns in the
future, but not something this patch needs to do, so this looks fine to
me.


It could match PL/pgSQL's INTO, that is first row or NULL if none.
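
For reference, the store-one-row semantics discussed here can be modeled in
Python (names and prefix handling are illustrative, not the patch's C code):

```python
def gset_store(variables, rows, column_names, prefix=""):
    """Model of the \\gset semantics: store each column of a single
    result row into a client variable named prefix + column name.
    Raises on anything other than exactly one row, mirroring the
    "expecting one row" check in the patch."""
    if len(rows) != 1:
        raise ValueError("expecting one row, got %d" % len(rows))
    for name, value in zip(column_names, rows[0]):
        variables[prefix + name] = str(value)  # pgbench variables are strings

state = {}
gset_store(state, [(5432, "on")], ["port", "ssl"], prefix="p_")
assert state == {"p_port": "5432", "p_ssl": "on"}
```

Matching PL/pgSQL's INTO would amount to relaxing the length check to accept
zero rows and skip the assignments.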


+
+   /* store result as a string */
+   if (!putVariable(st, "gset", 
varname,
+   
 PQgetvalue(res, 0, f)))
+   {
+   /* internal error, 
should it rather abort? */


I'm a bit on the fence about if we should abort in this case or not.  A
failure here seems likely to happen consistently (whereas the ntuples
case might be a fluke of some kind), which tends to make me think we
should abort, but still thinking about it.


I'm also still thinking about it:-) For me it is an abort, but there is 
this whole "return false" internal error management in pgbench the purpose 
of which escapes me a little, so I stick to that anyway.



+   if (*gset[compound] != '\0')
+   free(varname);


A comment here, and above where we're assigning the result of the
psprintf(), to varname probably wouldn't hurt, explaining that the
variable is sometimes pointing into the query result structure and
sometimes not...


I added two comments to avoid a malloc/free when there are no prefixes. 
ISTM that although it might be a borderline over-optimization, it is a 
short one, the free is a few lines away, so it might be ok.



+   /* read and discard the query results */


That comment doesn't feel quite right now. ;)


Indeed. Changed with "store or discard".



-   /* We don't want to scribble on cmd->argv[0] until done */
-   sql = pg_strdup(cmd->argv[0]);
+   sql = pg_strdup(cmd->lines);


The function-header comment for parseQuery() could really stand to be
improved.


Indeed.


+   /* merge gset variants into preceding SQL 
command */
+   if (pg_strcasecmp(bs_cmd, "gset") == 0 ||
+   pg_strcasecmp(bs_cmd, "cset") == 0)
+   {
+"\\gset 
cannot start a script",
+"\\gset 
must follow a SQL command",
+"\\gset 
cannot follow one another",


These errors should probably be '\\gset and \\cset' or similar, no?
Since we fall into this for both..


Indeed.

Attached an improved version, mostly comments and error message fixes.

I have not changed the 1-row constraint for now. Could be a later 
extension.


If this patch gets through, it might be handy for simplifying tests in
current pgbench submissions, especially the "error handling" one.

Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Alexander Korotkov
On Wed, Mar 28, 2018 at 6:38 PM, Alvaro Herrera 
wrote:

> Teodor Sigaev wrote:
> > > BTW, patch had conflicts with master.  Please, find rebased version
> attached.
> >
> > Despite by patch conflist patch looks commitable, has anybody objections
> to
> > commit it?
> >
> > Patch recieved several rounds of review during 2 years, and seems to me,
> > keeping it out from sources may cause a lost it. Although it suggests
> > performance improvement in rather wide usecases.
>
> Can we have a recap on what the patch *does*?  I see there's a
> description in Alexander's first email
> https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=
> V7wgDaTXdDd9=gon-...@mail.gmail.com
> but that was a long time ago, and the patch has likely changed in the
> meantime ...
>

General idea hasn't changed much since the first email.
Incremental sort gives a benefit when you need to sort your dataset
by some list of columns while you already have input presorted
by some prefix of that list of columns.  Then you don't do a full sort
of the dataset, but rather sort groups where the values of the prefix
columns are equal (see the header comment in nodeIncrementalSort.c).

The same example as in the first letter works, but the plan displays
differently.

create table test as (select id, (random()*1)::int as v1, random() as
v2 from generate_series(1,100) id);
create index test_v1_idx on test (v1);

# explain select * from test order by v1, v2 limit 10;
  QUERY PLAN
---------------------------------------------------------------------------------
 Limit  (cost=1.26..1.26 rows=10 width=16)
   ->  Incremental Sort  (cost=1.26..1.42 rows=100 width=16)
 Sort Key: v1, v2
 Presorted Key: v1
 ->  Index Scan using test_v1_idx on test  (cost=0.42..47602.50
rows=100 width=16)
(5 rows)

# select * from test order by v1, v2 limit 10;
   id   | v1 |         v2
--------+----+--------------------
 216426 |  0 | 0.0697950166650116
  96649 |  0 |  0.230586454737931
 892243 |  0 |  0.677791305817664
 323001 |  0 |  0.708638620562851
  87458 |  0 |  0.923310813494027
 224291 |  0 |0.9349579163827
 446366 |  0 |  0.984529701061547
 376781 |  0 |  0.997424073051661
 768246 |  1 |  0.127851997036487
 666102 |  1 |   0.27093240711838
(10 rows)
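The per-group behavior described above can be sketched in Python (a
simplified model of the executor logic, not the actual C code):

```python
import itertools

def incremental_sort(rows, prefix_len, key_funcs):
    """Sort rows that are already ordered by the first prefix_len keys:
    sort each run of rows with equal prefix-key values by the full key,
    instead of sorting the whole input at once."""
    prefix = lambda row: tuple(k(row) for k in key_funcs[:prefix_len])
    full = lambda row: tuple(k(row) for k in key_funcs)
    out = []
    for _, group in itertools.groupby(rows, key=prefix):
        out.extend(sorted(group, key=full))  # small per-group sorts
    return out

# Input presorted by v1, as produced by the index scan on test_v1_idx
rows = [(0, 0.7), (0, 0.2), (1, 0.9), (1, 0.1), (2, 0.5)]
keys = [lambda r: r[0], lambda r: r[1]]          # ORDER BY v1, v2
result = incremental_sort(rows, 1, keys)
assert result == sorted(rows)                    # matches a full sort
```

With a LIMIT on top, the win is that only the first few groups ever need to
be sorted.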

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] path toward faster partition pruning

2018-04-07 Thread Andres Freund
On 2018-04-07 08:13:23 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > I've also attempted to fix rhinoceros's failure I remarked upon a couple
> > hours ago in
> > https://postgr.es/m/20180406210330.wmqw42wqgiick...@alap3.anarazel.de
> 
> And this, too.  I was unsure if this was because we were missing calling
> some object access hook from the new code, or the additional pruning.

That's possible.  I did attempt to skim the code, that's where my
complain about the docs originated. There certainly isn't an
InvokeFunctionExecuteHook() present.  It's not clear to me whether
that's an issue - we don't invoke the hooks in a significant number of
places either, and as far as I can discern there's not much rule or
reason about where we invoke it.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-04-07 Thread Andres Freund
Hi,

On 2018-04-07 10:14:49 +0200, Michael Banck wrote:
> Can the pg_verify_checksums command be kept at least, please? 
> 
> AFAICT this one is not contentious, the code is isolated, it's really
> useful, orthogonal to online checksum activation and argueably could've
> been committed as a separate patch anyway.

I've not looked at it in any meaningful amount of detail, but it does
seem a lot lower risk from here.

Greetings,

Andres Freund



Re: Boolean partitions syntax

2018-04-07 Thread Jonathan S. Katz

> On Mar 21, 2018, at 10:59 PM, Amit Langote  
> wrote:
> 
> Hi David.
> 
> On 2018/03/21 23:31, David Steele wrote:
>> Hi Amit,
>> 
>> On 3/6/18 9:44 AM, David Steele wrote:
>>> On 3/2/18 2:27 AM, Amit Langote wrote:
 On 2018/03/02 15:58, Andres Freund wrote:
> On 2018-02-02 17:00:24 -0500, Tom Lane wrote:
>> Peter Eisentraut  writes:
>>> There might be other options, but one way to solve this would be to
>>> treat partition bounds as a general expression in the grammar and then
>>> check in post-parse analysis that it's a constant.
>> 
>> That's pretty much what I said upthread.  What I basically don't like
>> about the current setup is that it's assuming that the bound item is
>> a bare literal.  Even disregarding future-extension issues, that's bad
>> because it can't result in an error message smarter than "syntax error"
>> when someone tries the rather natural thing of writing a more complicated
>> expression.
> 
> Given the current state of this patch, with a number of senior
> developers disagreeing with the design, and the last CF being in
> progress, I think we should mark this as returned with feedback.
 
 I see no problem with pursuing this in the next CF if the consensus is
 that we should fix how partition bounds are parsed, instead of adopting
 one of the patches to allow the Boolean literals to be accepted as
 partition bounds.
>>> 
>>> I'm inclined to mark this patch Returned with Feedback unless I hear
>>> opinions to the contrary.
>> 
>> Hearing no opinions to the contrary I have marked this entry Returned
>> with Feedback.  Please resubmit when you have an updated patch.
> 
> OK.
> 
> Btw, there is an 11dev open item recently added to the wiki that's related
> to this, but I think we might be able to deal with it independently of
> this proposal.
> 
> * Partitions with bool partition keys *
> https://wiki.postgresql.org/wiki/PostgreSQL_11_Open_Items#Open_Issues 
> 

While testing the new partitioning features yesterday I got bit by this issue
when creating a common use-case: a table split up by active/archived records:

CREATE TABLE records (
id int GENERATED BY DEFAULT AS IDENTITY NOT NULL,
record_date date NOT NULL,
record_text text,
archived bool NOT NULL DEFAULT FALSE
) PARTITION BY LIST(archived);

CREATE TABLE records_archive
PARTITION OF records
FOR VALUES IN (TRUE);

The last line yielding:

ERROR:  syntax error at or near "TRUE"
LINE 3: FOR VALUES IN (TRUE);

[Omitted from example: the “records_active” partition]

I’m glad to see this was added to the open items. I would strongly suggest 
fixing
this prior to the 11 release as it is unintuitive from a user standpoint to use 
‘TRUE’

Thanks,

Jonathan



Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

2018-04-07 Thread Alvaro Herrera
amul sul wrote:
> On Sat, Apr 7, 2018 at 9:08 AM, Andres Freund  wrote:

> >> +test: partition-key-update-1
> >> +test: partition-key-update-2
> >> +test: partition-key-update-3
> >
> > Can you give these more descriptive names please (or further combine them)?
> 
> As I explained above, further combining might not be a good option. As for
> the descriptive names, I have the following suggestions, but I am afraid of
> the length of the test case names:
> 
> +test: concurrent-partition-key-update.out
> 
> This test does the serialization failure check.
> 
> +test: concurrent-partition-key-update-and-insert-conflict-do-nothing-1
> +test: concurrent-partition-key-update-and-insert-conflict-do-nothing-2

Yikes.  I'd rather have the original name, and each test's purpose
stated in a comment in the spec file itself.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: json(b)_to_tsvector with numeric values

2018-04-07 Thread Teodor Sigaev

>> See workable sketch for parsing jsonb flags and new worker variant.

> Yep, thanks for the sketch. Here is the new version of patch, does it look
> close to what you have in mind?

Patch looks good except the error messaging; you took it directly from the 
sketch where I didn't spend time on it. Please improve it. elog() should 
be used only for impossible errors, whereas user input could contain 
mistakes.



--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/



Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Tomas Vondra
On 04/07/2018 04:37 PM, Teodor Sigaev wrote:
>> by this patch.  Revised version is attached.
> 
> Fine, the patch got several rounds of review in all its parts. Are there
> any places which should be improved before commit?
> 

I personally feel rather uneasy about committing it, TBH.

While I don't see any obvious issues in the patch at the moment, the
recent changes were rather significant so I might easily miss some
unexpected consequences. (OTOH it's true it was mostly about reduction
of scope, to limit the risks.)

I don't have time to do more review and testing on the latest patch
version, unfortunately, certainly not before the CF end.


So I guess the ultimate review / decision is up to you ...

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: json(b)_to_tsvector with numeric values

2018-04-07 Thread Dmitry Dolgov
> On 6 April 2018 at 18:55, Teodor Sigaev  wrote:
>
>
> Dmitry Dolgov wrote:
>>>
>>> On 6 April 2018 at 16:25, Teodor Sigaev  wrote:
>>> 1) I don't like jsonb_all_to_tsvector too.. What if we will accept new
>>> variant to index? Let me suggest:
>>>
>>> tsvector jsonb_to_tsvector([regclass,] jsonb, text[])
>>>
>>> where the text[] arg is actually a set of flags: the array contains any
>>> combination of the literals 'numeric', 'string', 'boolean' (and even 'key'
>>> to index keys) to indicate which types should be indexed. More than that,
>>> maybe it should be a jsonb type for possible improvements in the future.
>>> For now, it should be a jsonb array type with string elements as described
>>> above, for example:
>>>
>>> select jsonb_to_tsvector('{"a": "aaa in bbb ddd ccc", "b":123}',
>>>  '["numeric", "boolean"]');
>>>
>>>
>>> The form jsonb_to_tsvector('...', '["string"]') is effectively the same as
>>> the current to_tsvector(jsonb)
>>
>>
>> Thank you for the suggestion, this sounds appealing. But I have two
>> questions:
>>
>> * why it should be a jsonb array, not a regular array?
>
> To simplify extending this array in the future, for example to add a limit
> on the recursion level in the converted jsonb, etc.
>
>
>>
>> * it would introduce the idea of jsonb element type expressed in text
>> format,
>>so "string", "numeric", "boolean" etc - are there any consequences of
>> that?
>>As far as I understand so far there was only jsonb_typeof.
>
> Good catch, jsonb_typeof strings are okay: "string", "number", "boolean"
> also  "all", "key", "value"
>
> See workable sketch for parsing jsonb flags and new worker variant.

Yep, thanks for the sketch. Here is the new version of patch, does it look
close to what you have in mind?


jsonb_to_tsvector_numeric_v4.patch
Description: Binary data


Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

2018-04-07 Thread amul sul
On Sat, Apr 7, 2018 at 9:08 AM, Andres Freund  wrote:
> Hi Tom, All,
>
> On 2018-04-06 14:19:02 +0530, amul sul wrote:
>> Thanks for the reminder -- fixed in the attached version.
>
> Tom, this seems to be the best approach for fixing the visibility issues
> around this. I've spent a good chunk of time looking at corruption
> issues like the ones you feared (see [1]) and I'm not particularly
> concerned.  I'm currently planning to go ahead with this, do you want to
> "veto" that (informally, not formally)?
>
> I'll go through this again tomorrow morning.
>
> [1] https://postgr.es/m/20180405014439.fbezvbjrmcw64...@alap3.anarazel.de
>
>
>> v9:
>>  Its the rebase version of Andres Freund patch v8[1] with the
>>  following additional changes:
>>  3. Argument changing_part of heap_delete renamed to ChangingPart to be
>> consistent with ExecDelete
>
> FWIW, I'd left it as it was before because the two functions have a bit
> different coding style, and the capitalization seemed more fitting in
> the surrounding context.
>
>> +test: partition-key-update-1
>> +test: partition-key-update-2
>> +test: partition-key-update-3
>
> Can you give these more descriptive names please (or further combine them)?
>

As I explained above, further combining might not be a good option. For
descriptive names I have the following suggestions, though I am afraid of
the length of the test case names:

+test: concurrent-partition-key-update.out

This test does the serialization failure check.

+test: concurrent-partition-key-update-and-insert-conflict-do-nothing-1
+test: concurrent-partition-key-update-and-insert-conflict-do-nothing-2

Both test partition-key-update behaviour with the INSERT ... ON CONFLICT
DO NOTHING case.

Attached is a patch that does the renaming of these tests -- it needs to be
applied on top of the v10 patch[1].

Regards,
Amul

[1]
https://postgr.es/m/CAAJ_b94X5Y_zdTN=BGdZie+hM4p6qW70-XCJhFYaCUO0OfF=a...@mail.gmail.com


v10-0002-Rename-isolation-test-name.patch
Description: Binary data


Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Teodor Sigaev

by this patch.  Revised version is attached.


Fine -- the patch has had several rounds of review in all its parts. Are
there any places which should be improved before commit?


--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/



Re: [HACKERS] Parallel Append implementation

2018-04-07 Thread Adrien Nayrat
Hello,

I notice that Parallel Append is not listed in the Parallel Plans documentation:
https://www.postgresql.org/docs/devel/static/parallel-plans.html

If you agree I can add it to Open Items.

Thanks,

-- 
Adrien NAYRAT






Re: WIP: a way forward on bootstrap data

2018-04-07 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > John Naylor wrote:
> >> Commit 9fdb675fc added a symbol to pg_opfamily.h
> >> where there were none before, so I went ahead and wrapped it with an
> >> EXPOSE_TO_CLIENT_CODE macro.
> 
> > Actually, after pushing that, I was thinking maybe it's better to remove
> > that #define from there and put it in each of the two .c files that need
> > it.  I don't think it makes sense to expose this macro any further, and
> > before that commit it was localized to a single file.
> 
> We're speaking of IsBooleanOpfamily, right?

Yeah, that's the one.

> Think I'd leave it where it is.  As soon as you have more than one
> place using a macro like that, there's room for maintenance mistakes.

Yeah, that's why I thought it'd be better to have it somewhere central
(the originally submitted patch just added it to partprune.c).

> Anyway, now that John and I have each (separately) rebased the bootstrap
> patch over that, I'd appreciate it if you hold off cosmetic refactoring
> till said patch goes in, which I expect to do in ~ 24 hours.

Understood.  I'm going over David Rowley's runtime pruning patch now
(doesn't touch any catalog files), which I intend to be my last
functional commit this cycle, and won't be doing any other commits till
after bootstrap rework has landed.  (As I mentioned elsewhere, I intend
to propose some restructuring of partitioning code, without any
functional changes, during next week.)

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index

2018-04-07 Thread Teodor Sigaev

Thanks to everyone, pushed

Alexander Korotkov wrote:
On Fri, Mar 2, 2018 at 6:57 AM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:


My thought experiments about pseudo-pages and avoiding the split stuff
were not intended to get the patch kicked out.  I thought for a while
that hash indexes were a special case and could benefit from
dispensing with those trickier problems.  Upon further reflection, for
interesting size hash indexes pure hash value predicate tags wouldn't
be much better.  Furthermore, if we do decide we want to use using x %
max_predicate_locks_per_relation to avoid having to escalate to
relation predicate locks at the cost of slightly higher collision rate
then we should consider that for the whole system (including heap page
predicate locking), not just hash indexes.  Please consider those
ideas parked for now.


OK.  While our potential pseudo-pages are identified as
"hash_value % some_constant_modulus", real bucket pages are very roughly
identified as "hash_value % number_of_index_pages".  So, the page number
adapts to the index size, although that costs us the handling of page
splits.  In the same way, locking in the other index access methods adapts
to the index size, so that should be considered a useful feature which
should be present in hash indexes as well.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-04-07 Thread Bruce Momjian
On Sun, Apr  1, 2018 at 06:07:55PM +0200, Tomas Vondra wrote:
> Hi,
> 
> The attached patch version modifies how the non-MCV selectivity is
> computed, along the lines explained in the previous message.
> 
> The comments in statext_clauselist_selectivity() explain it in far more
> detail, but briefly, we do this:

Uh, where are we on this patch?  It isn't going to make it into PG 11? 
Feature development for this has been going on for years.  I thought
when Dean Rasheed got involved that it would be applied for this
release.

I realize this is a complex feature, but I think it will solve 80-90% of
optimizer complaints, and I don't see any other way to fix them than
this.  This seems like the right way to fix optimizer problems, instead
of optimizer hints.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Alexander Korotkov
On Sat, Apr 7, 2018 at 4:56 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Fri, Apr 6, 2018 at 11:40 PM, Alexander Kuzmenkov <
> a.kuzmen...@postgrespro.ru> wrote:
>
>> On 06.04.2018 20:26, Tomas Vondra wrote:
>>
>>> I personally am OK with reducing the scope of the patch like this. It's
>>> still beneficial for the common ORDER BY + LIMIT case, which is good. I
>>> don't think it may negatively affect other cases (at least I can't think
>>> of any).
>>>
>>
>> I think we can reduce it even further. Just try incremental sort along
>> with full sort over the cheapest path in create_ordered_paths, and don't
>> touch anything else. This is a very minimal and a probably safe start, and
>> then we can continue working on other, more complex cases. In the attached
>> patch I tried to do this. We probably should also remove changes in
>> make_sort() and create a separate function make_incremental_sort() for it,
>> but I'm done for today.
>
>
> I've done further unwedding of sort and incremental sort, providing them
> with separate functions for plan creation.
>
> 2) Likewise, I've suggested that the claim about abbreviated keys in
>>>
>> nodeIncrementalsort.c is dubious. No response, and the XXX comment was
>>> instead merged into the patch:
>>>
>>>   * XXX The claim about abbreviated keys seems rather dubious, IMHO.
>>>
>>
>> Not sure about that, maybe just use abbreviated keys for the first
>> version? Later we can research this more closely and maybe start deciding
>> whether to use abbrev on planning stage.
>
>
> That comes from the time when we were trying to make incremental sort
> always no worse than full sort.  Now, we have separate paths for full and
> incremental sorts, and some costing penalty for incremental sort.  So,
> incremental sort should be selected only when it's expected to give a big
> win.  Thus, we can give up on this optimization, at least in the initial
> version.
>
> So, removed.
>
> 4) It's not clear to me why INITIAL_MEMTUPSIZE is defined the way it is.
>>> There needs to be a comment - the intent seems to be making it large
>>> enough to exceed ALLOCSET_SEPARATE_THRESHOLD, but it's not quite clear
>>> why that's a good idea.
>>>
>>
>> Not sure myself, let's ask the other Alexander.
>
>
> I've added a comment to INITIAL_MEMTUPSIZE.  However, to be fair, it's not
> an invention of this patch: the initial size of the memtuples array was the
> same previously; this patch just moved it into the macro.
>
> Also, this note hadn't been addressed yet.
>
> On Sat, Mar 31, 2018 at 11:43 PM, Tomas Vondra wrote:
>
>> I'm wondering if a static MIN_GROUP_SIZE is good idea. For example, what
>> if the subplan is expected to return only very few tuples (say, 33), but
>> the query includes LIMIT 1. Now, let's assume the startup/total cost of
>> the subplan is 1 and 100. With MIN_GROUP_SIZE 32 we're bound to
>> execute it pretty much till the end, while we could terminate after the
>> first tuple (if the prefix changes).
>>
>> So I think we should use a Min(limit,MIN_GROUP_SIZE) here, and perhaps
>> this should depend on average group size too.
>>
>
> I agree with that.  For bounded sort, the attached patch now selects the
> minimal group size as Min(DEFAULT_MIN_GROUP_SIZE, bound).  That should
> improve the "LIMIT small_number" case.
>

I've just noticed that incremental sort is now not used in
contrib/postgres_fdw.  That's even better, assuming that we're going to
limit the use-cases of incremental sort.  I've rolled back all the changes
made in the tests of contrib/postgres_fdw by this patch.  A revised version
is attached.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


incremental-sort-25.patch
Description: Binary data


Re: [HACKERS] [PATCH] Incremental sort

2018-04-07 Thread Alexander Korotkov
On Fri, Apr 6, 2018 at 11:40 PM, Alexander Kuzmenkov <
a.kuzmen...@postgrespro.ru> wrote:

> On 06.04.2018 20:26, Tomas Vondra wrote:
>
>> I personally am OK with reducing the scope of the patch like this. It's
>> still beneficial for the common ORDER BY + LIMIT case, which is good. I
>> don't think it may negatively affect other cases (at least I can't think
>> of any).
>>
>
> I think we can reduce it even further. Just try incremental sort along
> with full sort over the cheapest path in create_ordered_paths, and don't
> touch anything else. This is a very minimal and a probably safe start, and
> then we can continue working on other, more complex cases. In the attached
> patch I tried to do this. We probably should also remove changes in
> make_sort() and create a separate function make_incremental_sort() for it,
> but I'm done for today.


I've done further unwedding of sort and incremental sort, providing them
with separate functions for plan creation.

2) Likewise, I've suggested that the claim about abbreviated keys in
>>
> nodeIncrementalsort.c is dubious. No response, and the XXX comment was
>> instead merged into the patch:
>>
>>   * XXX The claim about abbreviated keys seems rather dubious, IMHO.
>>
>
> Not sure about that, maybe just use abbreviated keys for the first
> version? Later we can research this more closely and maybe start deciding
> whether to use abbrev on planning stage.


That comes from the time when we were trying to make incremental sort
always no worse than full sort.  Now, we have separate paths for full and
incremental sorts, and some costing penalty for incremental sort.  So,
incremental sort should be selected only when it's expected to give a big
win.  Thus, we can give up on this optimization, at least in the initial
version.

So, removed.

4) It's not clear to me why INITIAL_MEMTUPSIZE is defined the way it is.
>> There needs to be a comment - the intent seems to be making it large
>> enough to exceed ALLOCSET_SEPARATE_THRESHOLD, but it's not quite clear
>> why that's a good idea.
>>
>
> Not sure myself, let's ask the other Alexander.


I've added a comment to INITIAL_MEMTUPSIZE.  However, to be fair, it's not
an invention of this patch: the initial size of the memtuples array was the
same previously; this patch just moved it into the macro.

Also, this note hadn't been addressed yet.

On Sat, Mar 31, 2018 at 11:43 PM, Tomas Vondra  wrote:

> I'm wondering if a static MIN_GROUP_SIZE is good idea. For example, what
> if the subplan is expected to return only very few tuples (say, 33), but
> the query includes LIMIT 1. Now, let's assume the startup/total cost of
> the subplan is 1 and 100. With MIN_GROUP_SIZE 32 we're bound to
> execute it pretty much till the end, while we could terminate after the
> first tuple (if the prefix changes).
>
> So I think we should use a Min(limit,MIN_GROUP_SIZE) here, and perhaps
> this should depend on average group size too.
>

I agree with that.  For bounded sort, the attached patch now selects the
minimal group size as Min(DEFAULT_MIN_GROUP_SIZE, bound).  That should
improve the "LIMIT small_number" case.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


incremental-sort-24.patch
Description: Binary data


Re: WIP: a way forward on bootstrap data

2018-04-07 Thread Tom Lane
Alvaro Herrera  writes:
> John Naylor wrote:
>> Commit 9fdb675fc added a symbol to pg_opfamily.h
>> where there were none before, so I went ahead and wrapped it with an
>> EXPOSE_TO_CLIENT_CODE macro.

> Actually, after pushing that, I was thinking maybe it's better to remove
> that #define from there and put it in each of the two .c files that need
> it.  I don't think it makes sense to expose this macro any further, and
> before that commit it was localized to a single file.

We're speaking of IsBooleanOpfamily, right?  Think I'd leave it where it
is.  As soon as you have more than one place using a macro like that,
there's room for maintenance mistakes.

Now it could also be argued that indxpath.c and partprune.c don't
necessarily have the same idea of "boolean opfamily" anyway, in which case
giving them separate copies might be better.  Not sure about that.

Anyway, now that John and I have each (separately) rebased the bootstrap
patch over that, I'd appreciate it if you hold off cosmetic refactoring
till said patch goes in, which I expect to do in ~ 24 hours.

regards, tom lane



Re: [HACKERS] Runtime Partition Pruning

2018-04-07 Thread Jesper Pedersen

Hi,

On 04/07/2018 04:45 AM, David Rowley wrote:

Ok, so I've gone and done this.

PartitionPruning has become PartitionPruneState
PartitionRelPruning has become PartitionPruningData

I've changed pointers to PartitionPruneStates to be named prunestate,
sometimes having the node prefix; as_, ma_, in these cases prune and
state are separated with a _ which seems to be the general rule for
executor state struct members.

Generally, pointers to PartitionPruningData are now named pprune.
Hopefully, that's ok, as this was the name previously used for
PartitionPruning pointers.

I applied the patch to get rid of as_noop_scan in favour of using a
special as_whichplan value. There was already one special value
(INVALID_SUBPLAN_INDEX), so seemed better to build on that rather than
inventing something new. This also means we don't have to make the
AppendState struct and wider too, which seems like a good thing to try
to do.

I made all the fixups which I mentioned in my review earlier and also
re-removed the resultRelation parameter from make_partition_pruneinfo.
It sneaked back into v22.

v23 is attached.



Passes check-world.

Changing explain.c to "Subplans Removed" as suggested by you in [1] is a 
good idea.


[1] 
https://www.postgresql.org/message-id/CAKJS1f99JnkbOshdV_4zoJZ96DPtKeHMHv43JRL_ZdHRkkVKCA%40mail.gmail.com


Best regards,
 Jesper



Re: WIP: Covering + unique indexes.

2018-04-07 Thread Teodor Sigaev

I didn't like rel.h being included in itup.h.  Do you really need a
Relation as argument to index_truncate_tuple?  It looks to me like you
could pass the tupledesc instead; indnatts could be passed as a separate
argument instead of IndexRelationGetNumberOfAttributes.



Hm, okay, I understand why; will fix per your suggestion.

--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/



Re: csv format for psql

2018-04-07 Thread Daniel Verite
Daniel Verite wrote:

> The output might still differ compared to COPY in that line endings
> depend on the client-side OS. There's also the minor issue
> of a single \ by itself on a line, which gets quoted by COPY
> and not by psql.

I meant \. or backslash followed by period.
This sequence means nothing in particular in csv outside
of COPY, but it means end-of-data for COPY.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: [HACKERS] Runtime Partition Pruning

2018-04-07 Thread Alvaro Herrera
David Rowley wrote:
> On 8 April 2018 at 01:18, Alvaro Herrera  wrote:
> > Amit had it as "indexes" also in his original.  I wanted to avoid using
> > the "indexes" word alone, whose meaning is so overloaded.
> 
> hmm, good point.
> 
> > How about this?
> > "... which are then executed to produce a set of partitions (as indexes
> > of resultRelInfo->part_rels array) that satisfy the constraints in the
> > step".
> 
> Works for me, but with RelOptInfo rather than resultRelInfo.

Oops, sorry about that.  Pushed now, adding one line to
get_matching_partitions also.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Covering + unique indexes.

2018-04-07 Thread Alvaro Herrera
I didn't like rel.h being included in itup.h.  Do you really need a
Relation as argument to index_truncate_tuple?  It looks to me like you
could pass the tupledesc instead; indnatts could be passed as a separate
argument instead of IndexRelationGetNumberOfAttributes.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


