Re: [proposal] recovery_target "latest"

2019-11-04 Thread Kyotaro Horiguchi
Hello.

At Mon, 4 Nov 2019 16:03:38 +0300, Grigory Smolkin  
wrote in 
> Hello, hackers!
> 
> I`d like to propose a new argument for recovery_target parameter,
> which will stand to recovering until all available WAL segments are
> applied.
> 
> The current PostgreSQL recovery default behavior (when no recovery target
> is provided) does exactly that, but there are several shortcomings:
>   - without an explicit recovery target standing for the default behavior,
> recovery_target_action does not come into effect at the end of recovery
>   - with the PG12 changes, life became much harder for all backup tools,
> because recovery parameters can now be set outside of a single config
> file (recovery.conf), so it is impossible to ensure that the default
> recovery behavior, desired in some cases, will not be silently
> overridden by some recovery parameter forgotten by the user.
> 
> The proposed patch is very simple and solves the aforementioned problems by
> introducing a new argument, "latest", for the recovery_target parameter.

Does the tool remove or rename recovery.conf to cancel the settings?
And do you intend the new option to be used to override settings by
appending it at the end of postgresql.conf? If so, commit
f2cbffc7a6 seems to break that assumption: PG12 refuses to start if it
finds two different kinds of recovery target settings.

> Old recovery behavior is still available if no recovery target is
> provided. I'm not sure whether it should be left as it is now or
> not.
> 
> Another open question is what to do with recovery_target_inclusive if
> recovery_target = "latest" is used.

Anyway, inclusiveness doesn't affect "immediate". If we had the
"latest" option, it would behave the same way.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Resume vacuum and autovacuum from interruption and cancellation

2019-11-04 Thread Masahiko Sawada
On Sat, 2 Nov 2019 at 02:10, Robert Haas  wrote:
>
> On Thu, Aug 8, 2019 at 9:42 AM Rafia Sabih  wrote:
> > Sounds like an interesting idea, but does it really help? Because if
> > vacuum was interrupted previously, wouldn't it already know the dead
> > tuples, etc. in the next run quite quickly, since the VM and FSM were
> > already updated for those pages in the previous run?
>
> +1. I don't deny that a patch like this could sometimes save
> something, but it doesn't seem like it would save all that much all
> that often. If your autovacuum runs are being frequently cancelled,
> that's going to be a big problem, I think.

I've observed cases where a user wants to cancel a very long-running
autovacuum (sometimes an anti-wraparound one) in order to do DDL or other
maintenance work. If the table is very large, autovacuum can take a
long time and might not reclaim enough garbage.

> And as Rafia says, even
> though you might do a little extra work reclaiming garbage from
> subsequently-modified pages toward the beginning of the table, it
> would be unusual if they'd *all* been modified. Plus, if they've
> recently been modified, they're more likely to be in cache.
>
> I think this patch really needs a test scenario or demonstration of
> some kind to prove that it produces a measurable benefit.

Okay. A simple test could be to cancel a long-running vacuum on a
large table that is being updated, rerun the vacuum, and then see how much
garbage remains on that table. I'll test it.

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: cost based vacuum (parallel)

2019-11-04 Thread Amit Kapila
On Mon, Nov 4, 2019 at 11:42 PM Andres Freund  wrote:
>
>
> > The two approaches to solve this problem being discussed in that
> > thread [1] are as follows:
> > (a) Allow the parallel workers and master backend to have a shared
> > view of vacuum cost related parameters (mainly VacuumCostBalance) and
> > allow each worker to update it and then based on that decide whether
> > it needs to sleep.  Sawada-San has done the POC for this approach.
> > See v32-0004-PoC-shared-vacuum-cost-balance in email [2].  One
> > drawback of this approach could be that we allow the worker to sleep
> > even though the I/O has been performed by some other worker.
>
> I don't understand this drawback.
>

I think the problem could be that the system is not properly throttled
when it is supposed to be.  Let me try a simple example: say we
have two workers, w-1 and w-2.  w-2 is primarily doing the I/O and
w-1 is doing very little I/O, but unfortunately whenever w-1 checks it
finds that the cost limit has been exceeded and it goes to sleep, while
w-2 still continues.  In such a situation, even though we have made one
of the workers sleep for the required time, ideally the worker which
was doing the I/O should have slept.  The aim is to make the system stop
doing I/O whenever the limit has been exceeded, and that might not happen
in the above situation.
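
To make that concrete, here is a minimal sketch (not the actual PoC patch) of
how a shared cost balance along the lines of approach (a) could behave; the
structure and function names below are hypothetical:

#include "postgres.h"
#include "miscadmin.h"      /* VacuumCostDelay */
#include "storage/spin.h"

/* Hypothetical shared state; not the names used in the PoC patch. */
typedef struct SharedVacuumCostState
{
    slock_t     mutex;
    double      cost_balance;   /* shared equivalent of VacuumCostBalance */
    double      cost_limit;     /* shared equivalent of VacuumCostLimit */
} SharedVacuumCostState;

/*
 * Each worker adds the cost of the I/O it just did to the shared balance.
 * Whichever worker happens to push the balance over the limit sleeps, even
 * if most of the accumulated cost was charged by another worker, which is
 * exactly the drawback described above.
 */
static void
shared_vacuum_delay_point(SharedVacuumCostState *shared, double my_cost)
{
    bool        need_sleep = false;

    SpinLockAcquire(&shared->mutex);
    shared->cost_balance += my_cost;
    if (shared->cost_balance >= shared->cost_limit)
    {
        shared->cost_balance = 0;
        need_sleep = true;
    }
    SpinLockRelease(&shared->mutex);

    if (need_sleep)
        pg_usleep((long) (VacuumCostDelay * 1000));
}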

>
> > (b) The other idea could be that we split the I/O among workers,
> > something similar to what we do for autovacuum workers (see
> > autovac_balance_cost).  The basic idea would be that before launching
> > workers, we compute the remaining I/O budget (the heap operation would
> > have used some of it) after which we need to sleep, and split it equally
> > across workers.  Here, we are primarily thinking of dividing the
> > VacuumCostBalance and VacuumCostLimit parameters.  Once the workers
> > are finished, they need to let the master backend know how much I/O they
> > have consumed, and then the master backend can add it to its current I/O
> > consumed.  I think we also need to rebalance the cost of the remaining
> > workers once some of the workers exit.  Dilip has prepared a POC
> > patch for this, see 0002-POC-divide-vacuum-cost-limit in email [3].
>
> (b) doesn't strike me as advantageous. It seems quite possible that you
> end up with one worker that has a lot more IO than others, leading to
> unnecessary sleeps, even though the actually available IO budget has not
> been used up.
>

Yeah, this is possible, but to an extent, this is possible in the
current design as well where we balance the cost among autovacuum
workers.  Now, it is quite possible that the current design itself is
not good and we don't want to do the same thing at another place, but
at least we will be consistent and can explain the overall behavior.
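
For reference, a hedged sketch of the kind of even split that (b) describes,
in the spirit of autovac_balance_cost(); the helper name and signature are
illustrative and not taken from Dilip's POC patch:

#include "postgres.h"

/*
 * Illustrative only: give each of nworkers an equal share of the cost limit
 * and of the balance accumulated so far.  A real implementation also has to
 * rebalance the remaining workers when one of them exits, and fold each
 * worker's consumption back into the master backend's counters at the end.
 */
static void
divide_vacuum_cost(int cost_limit, int cost_balance, int nworkers,
                   int *worker_cost_limit, int *worker_cost_balance)
{
    *worker_cost_limit = Max(cost_limit / nworkers, 1);
    *worker_cost_balance = cost_balance / nworkers;
}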

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: The command tag of "ALTER MATERIALIZED VIEW RENAME COLUMN"

2019-11-04 Thread Fujii Masao
On Sat, Nov 2, 2019 at 4:40 PM Michael Paquier  wrote:
>
> On Fri, Nov 01, 2019 at 02:17:03PM +0500, Ibrar Ahmed wrote:
> > Do we really need regression test cases for such small oversights?
>
> It is possible to get the command tags using an event trigger...  But
> that sounds hack-ish.

Yes, so if a simple test mechanism to check command tags existed,
it would be helpful.

I'm thinking of committing the patch. But I have one question: is it OK to
back-patch? The patch changes the command tags for some commands,
which might, for example, break existing event trigger functions
using TG_TAG if we back-patch it. Or should we guarantee the compatibility of
command tags within the same major version?

Regards,

-- 
Fujii Masao




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-11-04 Thread Dilip Kumar
On Mon, Nov 4, 2019 at 5:22 PM Amit Kapila  wrote:
>
> On Wed, Oct 30, 2019 at 9:38 AM vignesh C  wrote:
> >
> > On Tue, Oct 22, 2019 at 10:52 PM Tomas Vondra
> >  wrote:
> > >
> > > I think the patch should do the simplest thing possible, i.e. what it
> > > does today. Otherwise we'll never get it committed.
> > >
> > I found a couple of crashes while reviewing and testing flushing of
> > open transaction data:
> >
>
> Thanks for doing these tests.  However, I don't think these issues are
> in any way related to this patch.  They seem to be base-code issues
> manifested by this patch.  See my analysis below.
>
> > Issue 1:
> > #0  0x7f22c5722337 in raise () from /lib64/libc.so.6
> > #1  0x7f22c5723a28 in abort () from /lib64/libc.so.6
> > #2  0x00ec5390 in ExceptionalCondition
> > (conditionName=0x10ea814 "!dlist_is_empty(head)", errorType=0x10ea804
> > "FailedAssertion",
> > fileName=0x10ea7e0 "../../../../src/include/lib/ilist.h",
> > lineNumber=458) at assert.c:54
> > #3  0x00b4fb91 in dlist_tail_element_off (head=0x19e4db8,
> > off=64) at ../../../../src/include/lib/ilist.h:458
> > #4  0x00b546d0 in ReorderBufferAbortOld (rb=0x191b6b0,
> > oldestRunningXid=3834) at reorderbuffer.c:1966
> > #5  0x00b3ca03 in DecodeStandbyOp (ctx=0x19af990,
> > buf=0x7ffcbc26dc50) at decode.c:332
> >
>
> This seems to be a problem in the base code where we abort immediately
> after serializing the changes, because in that case the changes list
> will be empty.  I think you can try to reproduce it via the debugger,
> or by hacking the code so that it serializes after every change; then
> if you abort after one change, it should hit this problem.
>
I think you might need to kill the server after all changes are
serialized; otherwise a normal abort will hit ReorderBufferAbort, which
will remove your ReorderBufferTXN entry, and you will never hit
this case.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: v12.0: ERROR: could not find pathkey item to sort

2019-11-04 Thread Amit Langote
On Sun, Nov 3, 2019 at 4:43 AM Tom Lane  wrote:
> Amit Langote  writes:
> >> This would
> >> probably require refactoring things so that there are separate
> >> entry points to add child equivalences for base rels and join rels.
> >> But that seems cleaner anyway than what you've got here.
>
> > Separate entry points sounds better, but only in HEAD?
>
> I had actually had in mind that we might have two wrappers around a
> common search-and-replace routine, but after studying the code I see that
> there's just enough differences to make it probably not worth the trouble
> to do it like that.  I did spend a bit of time removing cosmetic
> differences between the two versions, though, mostly by creating
> common local variables.

Agree that having two totally separate routines is better.

> I think the way you did the matching_ecs computation is actually wrong:
> we need to find ECs that reference any rel of the join, not only those
> that reference both inputs.  If nothing else, the way you have it here
> makes the results dependent on which pair of input rels gets considered
> first, and that's certainly bad for multiway joins.

I'm not sure I fully understand the problems, but maybe you're right.

> Also, I thought we should try to put more conditions on whether we
> invoke add_child_join_rel_equivalences at all.  In the attached I did
>
> if ((enable_partitionwise_join || enable_partitionwise_aggregate) &&
> (joinrel->has_eclass_joins ||
>  has_useful_pathkeys(root, parent_joinrel)))
>
> but I wonder if we could do more, like checking to see if the parent
> joinrel is partitioned.  Alternatively, maybe it's unnecessary because
> we won't try to build child joinrels unless these conditions are true?

Actually, I think we can assert in add_child_rel_equivalences() that
enable_partitionwise_join is true.  Then checking
enable_partitionwise_aggregate is unnecessary.

> I did not like the test case either.  Creating a whole new (and rather
> large) test table just for this one case is unreasonably expensive ---
> it about doubles the runtime of the equivclass test for me.  There's
> already a perfectly good test table in partition_join.sql, which seems
> like a more natural home for this anyhow.  After a bit of finagling
> I was able to adapt the test query to fail on that table.

That's great.  I tried but I can only finagle so much when it comes to
twisting around plan shapes to what I need. :)

> Patch v4 attached.  I've not looked at what we need to do to make this
> work in v12.

Thanks a lot for the revised patch.

Maybe the only difference between HEAD and v12 is that the former has
eclass_indexes infrastructure, whereas the latter doesn't?  I have
attached a version of your patch adapted for v12.

Also, looking at this in the patched code:

+ /*
+ * We may ignore expressions that reference a single baserel,
+ * because add_child_rel_equivalences should have handled them.
+ */
+ if (bms_membership(cur_em->em_relids) != BMS_MULTIPLE)
+ continue;

I have been thinking that maybe add_child_rel_equivalences() doesn't need
to translate EC members that reference multiple appendrels, because
there top_parent_relids is always a singleton set, whereas the em_relids
of such expressions is not.  Those half-translated expressions are
useless, only adding to the overhead of scanning ec_members.  I'm
thinking that we should apply this diff:

diff --git a/src/backend/optimizer/path/equivclass.c
b/src/backend/optimizer/path/equivclass.c
index e8e9e9a314..d4d80c8101 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2169,7 +2169,7 @@ add_child_rel_equivalences(PlannerInfo *root,
 continue;   /* ignore children here */

 /* Does this member reference child's topmost parent rel? */
-if (bms_overlap(cur_em->em_relids, top_parent_relids))
+if (bms_is_subset(cur_em->em_relids, top_parent_relids))
 {
 /* Yes, generate transformed child version */
 Expr   *child_expr;

Thoughts?

Thanks,
Amit


d25ea01275-fixup-PG12-v4.patch
Description: Binary data


Re: auxiliary processes in pg_stat_ssl

2019-11-04 Thread Kuntal Ghosh
On Mon, Nov 4, 2019 at 9:09 PM Euler Taveira  wrote:
> >
> > But this seems pointless.  Should we not hide those?  Seems this only
> > happened as an unintended side-effect of fc70a4b0df38.  It appears to me
> > that we should redefine that view to restrict backend_type that's
> > 'client backend' (maybe include 'wal receiver'/'wal sender' also, not
> > sure.)
> >
> Yep, it is pointless. The backend types that open connections to the server
> are: autovacuum worker, client backend, background worker, wal sender. I
> also notice that pg_stat_gssapi is in the same boat as pg_stat_ssl, and
> we should constrain the rows to backend types that open connections.
> I'm attaching a patch to list only connections in those system views.
>
Yeah, we should hide those. As Robert mentioned, I think checking
whether 'client_port IS NOT NULL' is a better approach than checking
the backend_type. The patch looks good to me.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: Keep compiler silence (clang 10, implicit conversion from 'long' to 'double' )

2019-11-04 Thread Kyotaro Horiguchi
At Mon, 04 Nov 2019 12:53:48 -0500, Tom Lane  wrote in 
> Yuya Watari  writes:
> > I attached the modified patch. In the patch, I placed the macro in
> > "src/include/c.h", but this may not be a good choice because c.h is
> > widely included from a lot of files. Do you have any good ideas about
> > its placement?
> 
> I agree that there's an actual bug here; it can be demonstrated with
> 
> # select extract(epoch from '256 microseconds'::interval * (2^55)::float8);
>  date_part  
> --------------------
>  -9223372036854.775
> (1 row)
> 
> which clearly is a wrong answer.
> 
> I do not however like any of the proposed patches.  We already have one
> place that deals with this problem correctly, in int8.c's dtoi8():
> 
> /*
>  * Range check.  We must be careful here that the boundary values are
>  * expressed exactly in the float domain.  We expect PG_INT64_MIN to be an
>  * exact power of 2, so it will be represented exactly; but PG_INT64_MAX
>  * isn't, and might get rounded off, so avoid using it.
>  */
> if (unlikely(num < (float8) PG_INT64_MIN ||
>  num >= -((float8) PG_INT64_MIN) ||
>  isnan(num)))
> ereport(ERROR,
> (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
>  errmsg("bigint out of range")));
> 
> We should adopt that coding technique not invent new ones.
> 
> I do concur with creating a macro that encapsulates a correct version
> of this test, maybe like
> 
> #define DOUBLE_FITS_IN_INT64(num) \
>   ((num) >= (double) PG_INT64_MIN && \
>(num) < -((double) PG_INT64_MIN))

# I didn't notice the existing bit above.

Agreed. It is equivalent to that trick AFAICS, so there is no need to add
another one to worry about.

> (or s/double/float8/ ?)

Maybe.

> c.h is probably a reasonable place, seeing that we define the constants
> there.
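
For readers following along, a hedged sketch of how such a macro might be
used at a float8-to-int64 conversion site; only the macro body comes from the
proposal quoted above, the helper function itself is hypothetical:

#include "postgres.h"

#include <math.h>

/* proposed macro, as quoted above */
#define DOUBLE_FITS_IN_INT64(num) \
    ((num) >= (double) PG_INT64_MIN && \
     (num) < -((double) PG_INT64_MIN))

/* hypothetical conversion helper, mirroring the dtoi8() range check */
static int64
float8_to_int64_checked(double num)
{
    if (unlikely(isnan(num) || !DOUBLE_FITS_IN_INT64(num)))
        ereport(ERROR,
                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                 errmsg("bigint out of range")));
    return (int64) num;
}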

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Logical replication wal sender timestamp bug

2019-11-04 Thread Michael Paquier
On Sat, Nov 02, 2019 at 09:54:54PM -0400, Jeff Janes wrote:
> While monitoring pg_stat_subscription, I noticed that last_msg_send_time
> was usually NULL, which doesn't make sense.  Why would we have a message,
> but not know when it was sent?

So...  The timestamp is received and stored by LogicalRepApplyLoop()
as send_time when a 'w' message is received in the subscription
cluster.  And on the sending side it gets computed with a two-phase process:
- WalSndPrepareWrite() reserves the space in the message for the
timestamp.
- WalSndWriteData() computes the timestamp in the reserved space once
the write message is computed and ready to go.

> Filling out the timestamp after the message has already been sent is taking
> "as late as possible" a little too far.  It results in every message having
> a zero timestamp, other than keep-alives which go through a different path.

It seems to me that you are right: the timestamp is computed too
late.
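
For context, a hedged sketch of the reordering being described: fill the
reserved sendTime field before the message is handed to the output machinery,
not afterwards.  The offsets assume the 'w' message layout of 'w' + dataStart
+ walEnd + sendTime; treat this as illustrative rather than the literal patch:

#include "postgres.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "replication/logical.h"
#include "utils/timestamp.h"

/*
 * Illustrative helper: stamp the reserved timestamp bytes in ctx->out, then
 * emit the CopyData message.  The real code lives in WalSndWriteData(); the
 * bug was doing these two steps in the opposite order.
 */
static void
send_w_message_with_timestamp(LogicalDecodingContext *ctx, StringInfo tmpbuf)
{
    resetStringInfo(tmpbuf);
    pq_sendint64(tmpbuf, GetCurrentTimestamp());
    memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
           tmpbuf->data, sizeof(int64));

    /* only now hand the whole message to the protocol layer */
    pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
}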

> Re-ordering the code blocks as in the attached seems to fix it.  But I have
> to wonder, if this has been broken from the start and no one ever noticed,
> is this even valuable information to relay in the first place?  We could
> just take the column out of the view, and not bother calling
> GetCurrentTimestamp() for each message.

I think that there are use cases for such monitoring capabilities,
see for example 7fee252.  So I would rather keep it.
--
Michael


signature.asc
Description: PGP signature


Re: TAP tests aren't using the magic words for Windows file access

2019-11-04 Thread Michael Paquier
On Sun, Nov 03, 2019 at 10:53:00PM -0500, Tom Lane wrote:
> That is, TestLib::slurp_file is failing to read a file.  Almost
> certainly, "permission denied" doesn't really mean a permissions
> problem, but failure to specify the file-opening flags needed to
> allow concurrent access on Windows.  We fixed this in pg_ctl
> itself in commit 0ba06e0bf ... but we didn't fix the TAP
> infrastructure.  Is there an easy way to get Perl on board
> with that?

If we were to use Win32API::File so that the file is opened in shared
mode, we would be doing the same as what our frontend/backend code does (see
$uShare):
https://metacpan.org/pod/Win32API::File
--
Michael


signature.asc
Description: PGP signature


Re: Run-time pruning for ModifyTable

2019-11-04 Thread Thomas Munro
On Thu, Sep 12, 2019 at 10:10 AM Alvaro Herrera
 wrote:
> Here's a rebased version of this patch (it had a trivial conflict).

Hi, FYI partition_prune.sql currently fails (maybe something to do
with commit d52eaa09?):

 where s.a = $1 and s.b = $2 and s.c = (select 1);
 explain (costs off) execute q (1, 1);
   QUERY PLAN

+
  Append
InitPlan 1 (returns $0)
  ->  Result
-   Subplans Removed: 1
->  Seq Scan on p1
- Filter: ((a = $1) AND (b = $2) AND (c = $0))
+ Filter: ((a = 1) AND (b = 1) AND (c = $0))
->  Seq Scan on q111
- Filter: ((a = $1) AND (b = $2) AND (c = $0))
+ Filter: ((a = 1) AND (b = 1) AND (c = $0))
->  Result
- One-Time Filter: ((1 = $1) AND (1 = $2) AND (1 = $0))
-(10 rows)
+ One-Time Filter: (1 = $0)
+(9 rows)

 execute q (1, 1);
  a | b | c




Re: pause recovery if pitr target not reached

2019-11-04 Thread Fujii Masao
On Fri, Nov 1, 2019 at 9:41 PM Peter Eisentraut
 wrote:
>
> On 2019-10-21 08:44, Fujii Masao wrote:
> > Probably we can use standby mode + a recovery target setting for
> > almost the same purpose. In this configuration, if end-of-WAL is reached
> > before the recovery target, the startup process keeps waiting for new WAL
> > to become available. Then, if the recovery target is reached, the startup
> > process acts as recovery_target_action indicates.
>
> So basically get rid of recovery.signal mode and honor recovery target
> parameters in standby mode?

Yes, currently not only archive recovery mode but also standby mode honors
the recovery target settings.

> That has some appeal because it would simplify
> this whole space significantly, but perhaps it would be too confusing
> for end users?

This looks less confusing than extending archive recovery. But I'd like to
hear more opinions about that.

Regards,

-- 
Fujii Masao




Re: Include RELKIND_TOASTVALUE in get_relkind_objtype

2019-11-04 Thread Michael Paquier
On Mon, Nov 04, 2019 at 03:31:27PM -0500, Tom Lane wrote:
> I'd rather do something like the attached, which makes it more of an
> explicit goal that we won't fail on bad input.  (As written, we'd only
> fail on bad classId, which is a case that really shouldn't happen.)

Okay, that part looks fine.

> Tests are the same as yours, but I revised the commentary and got
> rid of the elog-for-bad-relkind.

No objections on that part either.

> I also made some cosmetic changes in commands/alter.c, so as to (1)
> make it clear by inspection that those calls are only used to feed
> aclcheck_error, and (2) avoid uselessly computing a value that we
> won't need in normal non-error cases.

That also makes sense.  Thanks for looking at it!
--
Michael


signature.asc
Description: PGP signature


Re: Do we have a CF manager for November?

2019-11-04 Thread Michael Paquier
On Mon, Nov 04, 2019 at 10:54:52AM -0500, Tom Lane wrote:
> It's time to start the next commitfest.  I seem to recall somebody
> saying back in September that they'd run the next one, but I forget
> who.  Anyway, we need a volunteer to be chief nagger.

That may have been me.  I can take this one if there is nobody else
around.

Note: I switched the app to in-progress a couple of days ago,
once it was the 1st of November AoE, of course.
--
Michael


signature.asc
Description: PGP signature


Standard-conforming datetime years parsing

2019-11-04 Thread Alexander Korotkov
Hi!

Thread [1] about support for the .datetime() jsonpath method raises a
question about standard-conforming parsing of the Y, YY, YYY and RR
datetime template patterns.

According to the standard, YYY, YY and Y should take the higher digits from
the current year. Our current implementation picks the higher digits so that
the result is closest to 2020.

We currently don't support RR.  According to the standard, RR behavior is
implementation-defined and should select the matching 4-digit year in the
interval [CY - 100; CY + 100], where CY is the current year.  So, our
current implementation of YY is more like RR according to the standard.

The open questions are:
1) Do we want to make our datetime parsing depend on the current
timestamp?  I guess not.  But how do we parse a one-digit year?  If we
hardcode a constant, it will become outdated within a decade.  Thankfully,
no one in their right mind would use the Y pattern, but still.
2) How do we want to parse RR?  The standard leaves us a lot of freedom
here.  Do we want to parse it the way we parse YY now?  It looks
reasonable to select the closest matching year.  Since PG 13 is going to
be released in 2020, our algorithm would be a perfect fit at release
time.
3) Do we want to change the behavior of to_date()/to_timestamp()?  Or just
jsonpath .datetime() and the future CAST(... AS ... FORMAT ...) defined in
SQL:2016?

The attached patch resolves the questions above as follows.  The YYY, YY and Y
patterns take the higher digits from 2020, so results for Y will become
inconsistent from 2030 onward.  RR selects the matching year closest to 2020,
as YY does now.  The patch changes the behavior of both
to_date()/to_timestamp() and jsonpath .datetime().
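
For illustration, a hedged sketch (not taken from the attached patch) of the
"closest to 2020" expansion described above; the anchor constant and the
function name are assumptions:

/*
 * Illustrative only: expand a two-digit year to the four-digit year nearest
 * to a fixed anchor year.  Ties go to the later century here; the attached
 * patch may implement the rule differently.
 */
static int
expand_two_digit_year(int yy)
{
    const int   anchor = 2020;
    int         candidate = (anchor / 100) * 100 + yy;  /* 20xx */

    if (candidate - anchor > 50)
        candidate -= 100;       /* e.g. 95 -> 1995 */
    else if (anchor - candidate > 50)
        candidate += 100;

    return candidate;
}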

Any thoughts?

Links
1. 
https://www.postgresql.org/message-id/CAPpHfdsZgYEra_PeCLGNoXOWYx6iU-S3wF8aX0ObQUcZU%2B4XTw%40mail.gmail.com

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-datetime-years-parsing.patch
Description: Binary data


Re: update ALTER TABLE with ATTACH PARTITION lock mode (docs)

2019-11-04 Thread Michael Paquier
On Sat, Nov 02, 2019 at 05:19:11PM +0900, Michael Paquier wrote:
> Sounds fine.  So gathering everything I get the attached.  Any
> thoughts from others?

Committed after splitting the changes in two as originally proposed.
--
Michael


signature.asc
Description: PGP signature


Re: Zedstore - compressed in-core columnar storage

2019-11-04 Thread Taylor Vesely
> When a zedstore table is queried using an *invalid* ctid, the server
> crashes due to an assertion failure. See below:
>
> postgres=# select * from t1 where ctid = '(0, 0)';
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> I believe the above should have either returned 0 rows or failed with some
> user-friendly error.

We pushed a fix for this today. It now returns zero rows, like the
equivalent query with heap. Thanks for reporting!


Re: Restore replication settings when modifying a field type

2019-11-04 Thread Quan Zongliang

On 2019/10/28 12:39, Kyotaro Horiguchi wrote:

Hello.

# The patch no longer applies on the current master. Needs a rebasing.


New patch, rebased on master branch.


At Sat, 26 Oct 2019 16:50:48 +0800, Quan Zongliang  
wrote in

In fact, the replication property of the table has not been modified,
and it is still 'i'(REPLICA_IDENTITY_INDEX). But the previously
specified index property 'indisreplident' is set to false because of
the rebuild.


I suppose that the behavior is intended. Changing column types on the
publisher side can break the agreement on replica identity with
subscribers. Thus the replica identity setting cannot be restored
unconditionally. For a (somewhat artificial :p) example:

P=# create table t (c1 integer, c2 text unique not null);
P=# alter table t replica identity using index t_c2_key;
P=# create publication p1 for table t;
P=# insert into t values (0, '00'), (1, '01');
S=# create table t (c1 integer, c2 text unique not null);
S=# alter table t replica identity using index t_c2_key;
S=# create subscription s1 connection '...' publication p1;

Your patch allows change of the type of c2 into integer.

P=# alter table t alter column c2 type integer using c2::integer;
P=# update t set c1 = c1 + 1 where c2 = '01';

This change doesn't take effect on the subscriber, perhaps contrary to expectations.

S=# select * from t;
  c1 | c2
----+----
   0 | 00
   1 | 01
(2 rows)



So I developed a patch: if the user modifies the column type and the
associated index is the REPLICA IDENTITY index, rebuild it and restore the
replication settings.


Explicitly setting the replica identity presumes that the user is sure that
the setting works correctly. Implicitly rebuilding it after a type change
can silently break it.

At least we need to guarantee that the restored replica identity
setting is truly compatible with all existing subscribers. I'm not
sure about potential subscribers...

Anyway, I think it is a problem that the replica identity setting is
dropped silently. Perhaps a message something like "REPLICA IDENTITY
setting is lost; please redefine it after confirming compatibility
with subscribers." is needed.

regards.



diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 05593f3316..02f8dbeabf 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -174,6 +174,8 @@ typedef struct AlteredTableInfo
 	List	   *changedConstraintDefs;	/* string definitions of same */
 	List	   *changedIndexOids;	/* OIDs of indexes to rebuild */
 	List	   *changedIndexDefs;	/* string definitions of same */
+	Oid			changedReplIdentOid;	/* OID of index to reset REPLICA IDENTITY */
+	char	   *changedReplIdentDef;	/* string definitions of same */
 } AlteredTableInfo;
 
 /* Struct describing one new constraint to check in Phase 3 scan */
@@ -459,6 +461,7 @@ static ObjectAddress ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
 										   AlterTableCmd *cmd, LOCKMODE lockmode);
 static void RememberConstraintForRebuilding(Oid conoid, AlteredTableInfo *tab);
 static void RememberIndexForRebuilding(Oid indoid, AlteredTableInfo *tab);
+static void RememberReplicaIdentForRebuilding(Oid indoid, AlteredTableInfo *tab);
 static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab,
 								   LOCKMODE lockmode);
 static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
@@ -10715,6 +10718,9 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
 				{
 					Assert(foundObject.objectSubId == 0);
 					RememberIndexForRebuilding(foundObject.objectId, tab);
+
+					if (RelationGetForm(rel)->relreplident == REPLICA_IDENTITY_INDEX)
+						RememberReplicaIdentForRebuilding(foundObject.objectId, tab);
 				}
 				else if (relKind == RELKIND_SEQUENCE)
 				{
@@ -10749,9 +10755,18 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
 				}
 
 			case OCLASS_CONSTRAINT:
-				Assert(foundObject.objectSubId == 0);
-				RememberConstraintForRebuilding(foundObject.objectId, tab);
-				break;
+				{
+					Oid			indId;
+
+					Assert(foundObject.objectSubId == 0);
+					RememberConstraintForRebuilding(foundObject.objectId, tab);
+
+					indId = get_constraint_index(foundObject.objectId);
+

Re: [PATCH] Do not use StdRdOptions in Access Methods

2019-11-04 Thread Michael Paquier
On Thu, Oct 31, 2019 at 05:55:55PM +0900, Michael Paquier wrote:
> Thanks.  I exactly did the same thing on my local branch.

Hearing nothing more, I have done some adjustments to the patch and
committed it.  Please note that I have not made the old interface
static to reloptions.c: if you look closely at reloptions.h, we
allow much more flexibility around HANDLE_INT_RELOPTION to fill and
parse the reloptions in a custom AM.  AM maintainers had better use the
new interface, but there could be a point to keeping the old one for more
customized error messages.
--
Michael


signature.asc
Description: PGP signature


Re: Backport "WITH ... AS MATERIALIZED" syntax to <12?

2019-11-04 Thread Thomas Munro
On Sat, Oct 19, 2019 at 11:52 PM Tomas Vondra
 wrote:
> I wonder if an extension could do something like that, though. It can
> install a hook after parse analysis, so I guess it could walk the CTEs
> and mark them as materialized.

I wonder if the existing pg_hint_plan extension could be extended to
do that using something like /*+ MATERIALIZE */.  That'd presumably be
ignored when not installed/not understood etc.  I've never used
pg_hint_plan myself and don't know how or how well it works, but it
look like it supports Oracle-style hints hidden in comments like /*+
HashJoin(a b) */ rather than SQL Server-style hints that are in the
SQL grammar itself like SELECT ... FROM a HASH JOIN b.




Re: SimpleLruTruncate() mutual exclusion

2019-11-04 Thread Noah Misch
On Mon, Nov 04, 2019 at 03:26:35PM +1300, Thomas Munro wrote:
> On Thu, Aug 1, 2019 at 6:51 PM Noah Misch  wrote:
> > vac_truncate_clog() instance 1 starts, considers segment ABCD eligible to 
> > unlink
> > vac_truncate_clog() instance 2 starts, considers segment ABCD eligible to 
> > unlink
> > vac_truncate_clog() instance 1 unlinks segment ABCD
> > vac_truncate_clog() instance 1 calls SetTransactionIdLimit()
> > vac_truncate_clog() instance 1 finishes
> > some backend calls SimpleLruZeroPage(), creating segment ABCD
> > vac_truncate_clog() instance 2 unlinks segment ABCD
> >
> > Serializing vac_truncate_clog() fixes that.
> 
> I've wondered before (in a -bugs thread[1] about unexplained pg_serial
> wraparound warnings) if we could map 64 bit xids to wide SLRU file
> names that never wrap around and make this class of problem go away.
> Unfortunately multixacts would need 64 bit support too...
> 
> [1] 
> https://www.postgresql.org/message-id/flat/CAEBTBzuS-01t12GGVD6qCezce8EFD8aZ1V%2Bo_3BZ%3DbuVLQBtRg%40mail.gmail.com

That change sounds good to me.
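
For readers skimming the digest, a hedged sketch of the kind of serialization
referred to in the quoted scenario: hold a lock across the whole "decide what
is safe to unlink, unlink it, advance the limits" sequence so that two
vac_truncate_clog() instances cannot interleave.  The lock tag used here is a
placeholder; the actual patch may well use a different mechanism.

#include "postgres.h"
#include "catalog/pg_database.h"
#include "storage/lmgr.h"

static void
vac_truncate_clog_serialized(void)
{
    LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);

    /*
     * ... the existing vac_truncate_clog() body would run here: compute the
     * oldest datfrozenxid/datminmxid, TruncateCLOG(), SetTransactionIdLimit()
     * and so on ...
     */

    UnlockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);
}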




Re: Backport "WITH ... AS MATERIALIZED" syntax to <12?

2019-11-04 Thread Bruce Momjian
On Sat, Oct 19, 2019 at 02:35:42PM -0400, Isaac Morland wrote:
> On Sat, 19 Oct 2019 at 13:36, Stephen Frost  wrote:
> 
> Greetings,
> 
> * Isaac Morland (isaac.morl...@gmail.com) wrote:
> > That embeds a temporary hack in the application code indefinitely.
> 
> ... one could argue the same about having to say AS MATERIALIZED.
> 
>  
> I think OFFSET 0 is a hack - the fact that it forces an optimization fence
> feels like an oddity. By contrast, saying AS MATERIALIZED means materialize 
> the
> CTE. I suppose you could argue that the need to be able to request that is a
> temporary hack until query optimization improves further, but I don't think
> that's realistic. For the foreseeable future we will need to be able to tell
> the query planner that it is wrong. I mean, in principle the DB should figure
> out for itself which (non-constraint) indexes are needed. But I don't see any
> proposals to attempt to implement that.
> 
> Side note: I am frequently disappointed by the query planner. I have had many
> situations in which a nice simple strategy of looking up some tiny number of
> records in an index and then following more indices to get joined records 
> would
> have worked, but instead it did a linear scan through the wrong starting 
> table.
> So I'm very glad the AS MATERIALIZED now exists for when it's needed. On the
> other hand, I recognize that the reason I'm disappointed is because my
> expectations are so high: often I've written a query that joins several views
> together, meaning that under the covers it's really joining maybe 20 tables,
> and it comes back with the answer instantly. So in effect the query planner is
> just good enough to make me expect it to be even better than it is.

Well, since geqo_threshold = 12 is the default, for a 20-table join, you
are using genetic query optimization (GEQO) in PG 12 without
MATERIALIZED:

https://www.postgresql.org/docs/12/geqo.html

and GEQO assumes it would take too long to fully test all optimization
possibilities, so it randomly checks just some of them.  Therefore, it
is no surprise you are disappointed in its output.

In a way, when you are using materialized CTEs, you are doing the
optimization yourself, in your SQL code, and then the table join count
drops low enough that GEQO is not used and Postgres fully tests all
optimization possibilities.  This is behavior I had never considered ---
the idea that the user is partly replacing the optimizer, and that using
materialized CTEs prevents the problems that require the use of GEQO.

I guess my big take-away is that not only can MATERIALIZE change the
quality of query plans but it can also improve the quality of query
plans if it prevents GEQO from being used.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Collation versioning

2019-11-04 Thread Thomas Munro
On Mon, Nov 4, 2019 at 11:13 PM Julien Rouhaud  wrote:
> When working on the REINDEX FILTER, I totally missed this thread and

I created a new wiki page to try to track the various moving pieces
here.  Julien, Peter, Christoph, anyone interested, please feel free
to update it or add more information.

https://wiki.postgresql.org/wiki/Collations




Re: [Patch] Add a reset_computed_values function in pg_stat_statements

2019-11-04 Thread Thomas Munro
On Thu, Sep 26, 2019 at 8:05 AM Alvaro Herrera  wrote:
> This patch seems to be failing the contrib build.  Please fix.

Hi Pierre,

In contrib/pg_stat_statements/pg_stat_statements.c, you need to
declare or define entry_reset_computed() before you use it.  I suppose
your compiler must be warning about that.  I personally put
"COPT=-Wall -Werror" into src/Makefile.custom to make sure that I
don't miss any warnings.  That's also what http://cfbot.cputube.org/
does (a thing that does a full check-world on all registered patches
to look for such problems).




Re: patch: psql - enforce constant width of last column

2019-11-04 Thread Pavel Stehule
Hi

po 4. 11. 2019 v 21:55 odesílatel Tom Lane  napsal:

> Pavel Stehule  writes:
> > st 18. 9. 2019 v 12:52 odesílatel Ahsan Hadi 
> napsal:
> >> does this patch have any value for psql without pspg?
>
> > The benefit of this patch is just for pspg users today.
>
> TBH, I think we should just reject this patch.  It makes psql's
> table-printing behavior even more complicated than it was before.
> And I don't see how pspg gets any benefit --- you'll still have
> to deal with the old code, for an indefinite time into the future.
>

I don't think it increases the complexity of the printing rules too much. The
default value "auto" doesn't change anything from the current state; "always"
ensures the same line width for every row.

The problem that this patch tries to solve is the differing width of rows,
even though the output is supposed to be aligned.

Personally, I think the current behavior is not correct. The correct solution
would be to set "finalspaces" to true every time for aligned output. But I
don't know the motivation of the original authors, so as a solution with
minimal impact I added the possibility (it's not the default) of setting
finalspaces to "always" as a fix for a possible visual artefact (although
these artefacts are invisible most of the time).

The patch maybe looks non-trivial (although it is trivial), but that is
because I tried to reduce the possible impact on any other application to zero.


>
> Moreover, *other* programs that pay close attention to the output
> format will be forced to deal with the possibility that this flag
> has been turned on, which typically they wouldn't even have a way
> to find out.  So I think you're basically trying to export your
> problems onto everyone else.
>

I am trying to fix this issue where it originates. For this patch it is
important to get agreement (or not) on whether this problem is an issue that
should be fixed.

I think that in aligned mode all rows should have the same width.

On the other hand, I really don't know why the last space is not printed, or
whether some applications would have a problem with it; I have no idea. The
current code, where trailing spaces are not printed, is a little more complex
than it would be if the alignment were really complete.

Sure, "the issue of last invisible space" is not big issue (it's
triviality) - I really think so it should be fixed on psql side, but if
there will not be a agreement, I can fix it on pspg side (although it will
not be elegant - because I have to print chars that doesn't exists).

Is there anyone who remembers this implementation and can say why the code
is implemented the way it is?

Regards

Pavel



> regards, tom lane
>


Re: ssl passphrase callback

2019-11-04 Thread Thomas Munro
On Sat, Nov 2, 2019 at 6:57 AM Andrew Dunstan
 wrote:
> On 11/1/19 11:01 AM, Robert Haas wrote:
> > On Thu, Oct 31, 2019 at 11:37 AM Andrew Dunstan
> >  wrote:
> >> This patch provides a hook for a function that can supply an SSL
> >> passphrase. The hook can be filled in by a shared preloadable module. In
> >> order for that to be effective, the startup order is modified slightly.
> >> There is a test attached that builds and uses one trivial
> >> implementation, which just takes a configuration setting and rot13's it
> >> before supplying the result as the passphrase.
> > It seems to me that it would be a lot better to have an example in
> > contrib that does something which might be of actual use to users,
> > such as running a shell command and reading the passphrase from
> > stdout.
> >
> > Features that are only accessible by writing C code are, in general,
> > not as desirable as features which can be accessed via SQL or
> > configuration.
>
> Well, I tried to provide the most trivial and simple test I could come
> up with. Running a shell command can already be accomplished via the
> ssl_passphrase_command setting.

It looks like the new declarations in libpq-be.h are ifdef'd out in a
non-USE_SSL build, but then we still try to build the new test module
and it fails:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.64071
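
For illustration, a hedged sketch of one way to keep such a hook visible to a
test module regardless of build options: declare the hook type and variable
unconditionally and only consult it from SSL code.  The names below are
illustrative, not the patch's actual identifiers.

/* in a header that is safe to include in any build */
typedef int (*ssl_passphrase_hook_type) (char *buf, int size, int rwflag,
                                         void *userdata);

/*
 * Hook slot, filled in by a shared_preload_libraries module; only the
 * OpenSSL callback that consults it needs to live under #ifdef USE_SSL.
 */
extern PGDLLIMPORT ssl_passphrase_hook_type ssl_passphrase_hook;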




Re: [PATCH] Sort policies and triggers by table name in pg_dump.

2019-11-04 Thread Tom Lane
Benjie Gillam  writes:
>> Your patch has two warnings because you are trying to map a policy
>> info pointer to a trigger info pointer:

> Ah, thank you for the pointer (aha); I've attached an updated patch
> that addresses this copy/paste issue.

LGTM, pushed (with a bit of extra polishing of comments).

regards, tom lane




Re: Collation versioning

2019-11-04 Thread Thomas Munro
On Mon, Nov 4, 2019 at 11:13 PM Julien Rouhaud  wrote:
> On Mon, Nov 4, 2019 at 4:58 AM Thomas Munro  wrote:
> > * Some have expressed doubt that pg_depend is the right place for
> > this; let's see if any counter-proposals appear.
>
> When working on the REINDEX FILTER, I totally missed this thread and
> wrote a POC saving the version in pg_index.  That's not ideal though,
> as you need to record multiple version strings.  In my version I used
> a json type, using the collprovider  as the key, but that's not enough
> for ICU as each collation can have a different version string.  I'm
> not a huge fan of using pg_depend to record the version, but storing a
> collprovider/collname -> version per row in pg_index is definitely a
> no go, so I don't have any better counter-proposal.

Yeah, I also thought about using pg_index directly, and was annoyed by
the denormalisation you mention (an array of {collation, version}!?)
and so I realised I wanted another table like they teach you at
database school, but I also realised that there are other kinds of
database objects that depend on collations and that can become
corrupted if the collation definition changes.  It was thinking about
that that led me to the idea of using something that can record
version dependencies on *any* database object, which brought me to the
existing pg_depend table.

Concretely, eventually we might want to support checks etc, as
mentioned by Doug Doole and as I showed in an earlier version of this
POC patch, though I removed it from the more recent patch set so we
can focus on the more pressing problems.  The check constraint idea
leads to more questions like: "does this constraint *really* use any
operators that truly depend on the collation definition?" (so CHECK
(name > 'xxx') depends on name's collation, but CHECK (LENGTH(name) <
32) doesn't really), and I didn't want to be distracted by that rabbit
hole.  Here's the example message that came out of the earlier patch
for posterity:

WARNING: constraint "t_i_check" depends on collation 12018 version
"30.0.1", but the current version is "30.0.2"
DETAIL: The constraint may be corrupted due to changes in sort order.
HINT: Drop and recreate the constraint to avoid the risk of corruption.




Re: Collation versioning

2019-11-04 Thread Thomas Munro
On Tue, Nov 5, 2019 at 4:18 AM Julien Rouhaud  wrote:
> On Mon, Nov 4, 2019 at 11:13 AM Julien Rouhaud  wrote:
> > On Mon, Nov 4, 2019 at 4:58 AM Thomas Munro  wrote:
> > > Here are some problems to think about:
> > >
> > > * We'd need to track dependencies on the default collation once we
> > > have versioning for that [...]
>
> Another problem I just thought about is how to avoid discrepancy of
> collation version for indexes on shared objects, such as
> pg_database_datname_index.

I didn't look closely at the code, but I think when "name" recently
became collation-aware (commit 586b98fd), it switched to using
C_COLLATION_OID as its typcollation, and "C" doesn't need versioning,
so I think it would only be a problem if there are shared catalogs
that have "name" columns that have a non-type-default collation.
There are none of those, and you can't create them, right?  If there
were, if we take this patch set to its logical conclusion, we'd also
need pg_shdepend.refobjversion, but we don't need it AFAICS.




Re: patch: psql - enforce constant width of last column

2019-11-04 Thread Tom Lane
Pavel Stehule  writes:
> st 18. 9. 2019 v 12:52 odesílatel Ahsan Hadi  napsal:
>> does this patch have any value for psql without pspg?

> The benefit of this patch is just for pspg users today.

TBH, I think we should just reject this patch.  It makes psql's
table-printing behavior even more complicated than it was before.
And I don't see how pspg gets any benefit --- you'll still have
to deal with the old code, for an indefinite time into the future.

Moreover, *other* programs that pay close attention to the output
format will be forced to deal with the possibility that this flag
has been turned on, which typically they wouldn't even have a way
to find out.  So I think you're basically trying to export your
problems onto everyone else.

regards, tom lane




Re: Missed check for too-many-children in bgworker spawning

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 14:58:20 -0500, Robert Haas wrote:
> On Mon, Nov 4, 2019 at 2:04 PM Andres Freund  wrote:
> > Is that really true? In the case where it started and failed we expect
> > the error queue to have been attached to, and there to be either an
> > error 'E' or an 'X' response (cf HandleParallelMessage()).  It doesn't
> > strike me as very complicated to keep track of whether any worker has
> > sent an 'E' or not, no?  I don't think we really need the
> 
> One of us is confused here, because I don't think that helps. Consider
> three background workers Alice, Bob, and Charlie. Alice fails to
> launch because fork() fails. Bob launches but then exits unexpectedly.
> Charlie has no difficulties and carries out his assigned duties.
> 
> Now, the system you are proposing will say that Charlie is OK but
> Alice and Bob are a problem. However, that's the way it already works.
> What Tom wants is to distinguish Alice from Bob, and your proposal is
> of no help at all with that problem, so far as I can see.

I don't see how what I'm saying treats Alice and Bob the same. What I'm
saying is that if a worker has been started and shut down, without
signalling an error via the error queue, and without an exit code that
causes postmaster to worry, then we can just ignore the worker for the
purpose of determining whether the query succeeded, without a meaningful
loss in reliability. And we can detect such cases easily; we already do,
we just have to remove an ereport() and document things.


> > > We certainly can't ignore a worker that managed to start and
> > > then bombed out, because it might've already, for example, claimed a
> > > block from a Parallel Seq Scan and not yet sent back the corresponding
> > > tuples. We could ignore a worker that never started at all, due to
> > > EAGAIN or whatever else, but the original process that registered the
> > > worker has no way of finding this out.
> >
> > Sure, but in that case we'd have gotten either an error back from the
> > worker, or postmaster wouldhave PANIC restarted everyone due to an
> > unhandled error in the worker, no?
> 
> An unhandled ERROR in the worker is not a PANIC. I think it's just an
> ERROR that ends up being fatal in effect, but even if it's actually
> promoted to FATAL, it's not a PANIC.

If it's an _exit without going through the PG machinery, it'll
eventually be PANIC, albeit with a slight delay. And once we're actually
executing the parallel query, we better have error reporting set up for
parallel queries.


> It is *generally* true that if a worker hits an ERROR, the error will
> be propagated back to the leader, but it is not an invariable rule.
> One pretty common way that it fails to happen - common in the sense
> that it comes up during development, not common on production systems
> I hope - is if a worker dies before reaching the call to
> pq_redirect_to_shm_mq(). Before that, there's no possibility of
> communicating anything. Granted, at that point we shouldn't yet have
> done any work that might mess up the query results.

Right.


> Similarly, once we reach that point, we are dependent on a certain amount of 
> good
> behavior for things to work as expected; yeah, any code that calls
> proc_exit() is suppose to signal an ERROR or FATAL first, but what if
> it doesn't? Granted, in that case we'd probably fail to send an 'X'
> message, too, so the leader would still have a chance to realize
> something is wrong.

I mean, in that case so many more things are screwed up that I don't buy
that it's worth pessimizing ENOMEM handling for this. And if you're
really concerned, we could add a before_shmem_exit hook or some such that
makes extra double sure that we've signalled something.


> I guess I agree with you to this extent: I made a policy decision that
> if a worker is successfully registered but fails to show up, that's an ERROR. It
> would be possible to adopt the opposite policy, namely that if a
> worker doesn't show up, that's an "oh well." You'd have to be very
> certain that the worker wasn't going to show up later, though. For
> instance, suppose you check all of the shared memory queues used for
> returning tuples and find that every queue is either in a state where
> (1) nobody's ever attached to it or (2) somebody attached and then
> detached. This is not good enough, because it's possible that after
> you checked queue #1, and found it in the former state, someone
> attached and read a block, which caused queue #2 to enter the latter
> state before you got around to checking it. If you decide that it's OK
> to decide that we're done at this point, you'll never return the
> tuples that are pushed through queue #1.

That's why the code *already* waits for workers to attach, or for the
slot to be marked unused/invalid/reused. I don't see how that applies to
not explicitly erroring out when we know that the worker *failed* to
start:

void
WaitForParallelWorkersToFinish(ParallelContext *pcxt)
...


/*
 

Re: 64 bit transaction id

2019-11-04 Thread Thomas Munro
On Tue, Nov 5, 2019 at 8:45 AM Tomas Vondra
 wrote:
> On Mon, Nov 04, 2019 at 10:44:53AM -0800, Andres Freund wrote:
> >On 2019-11-04 19:39:18 +0100, Tomas Vondra wrote:
> >> On Mon, Nov 04, 2019 at 10:04:09AM -0800, Andres Freund wrote:
> >> > And "without causing significant issues elsewhere" unfortunately
> >> > includes continuing to allow pg_upgrade to work.
> >
> >> Yeah. I suppose we could have a different AM implementing this, but
> >> maybe that's not possible ...
> >
> >Entirely possible. But the amount of code duplication / unnecessary
> >branching and the user confusion from two very similar AMs, would have
> >to be weighed against the benefits.
> >
>
> Agreed. I think code complexity is part of the trade-off. IMO it's fine
> to hack existing heap AM initially, and only explore the separate AM if
> that turns out to be promising.

I thought a bit about how to make a minimally-different-from-heap
non-freezing table AM using 64 bit xids, as a thought experiment when
trying to understand or explain to others what zheap is about.
Committed transactions are easy (you don't have to freeze fxid
references from the ancient past because they don't wrap around so
they always look old), but how do you deal with *aborted* transactions
when truncating the CLOG (given that our current rule is "if it's
before the CLOG begins, it must be committed")?  I see three
possibilities: (1) don't truncate the CLOG anymore (use 64 bit
addressing and let it leak disk forever, like we did before commit
2589735d and later work), (2) freeze aborted transactions only, using
a wraparound vacuum (and now you have failed, if the goal was to avoid
having to scan all tuples periodically to freeze stuff, though
admittedly it will require less IO to freeze only the aborted
transactions), (3) go and remove aborted fxid references eagerly, when
you roll back (this could be done using the undo technology that we
have been developing to support zheap).  Another way to explain (3) is
that this hypothetical table AM, let's call it "yheap", takes the
minimum parts of the zheap technology stack required to get rid of
vacuum-for-wraparound, without doing in-place updates or any of that
hard stuff.  To make this really work you'd also have to deal with
multixacts, which also require freezing.  If that all sounds too
complicated, you're back to (2) which seems a bit weak to me.  Or
perhaps I'm missing something?




Re: Include RELKIND_TOASTVALUE in get_relkind_objtype

2019-11-04 Thread Tom Lane
Michael Paquier  writes:
> Okay.  Attached is what I was thinking about, with extra regression
> tests to cover the ground for toast tables and indexes that are able
> to reproduce the original failure, and more comments for the routines
> as they should be used only for ACL error messages.

I'd rather do something like the attached, which makes it more of an
explicit goal that we won't fail on bad input.  (As written, we'd only
fail on bad classId, which is a case that really shouldn't happen.)

Tests are the same as yours, but I revised the commentary and got
rid of the elog-for-bad-relkind.  I also made some cosmetic changes
in commands/alter.c, so as to (1) make it clear by inspection that
those calls are only used to feed aclcheck_error, and (2) avoid
uselessly computing a value that we won't need in normal non-error
cases.

regards, tom lane

diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index ce8a4e9..b8cbe6a 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -2612,6 +2612,13 @@ get_object_attnum_acl(Oid class_id)
 	return prop->attnum_acl;
 }
 
+/*
+ * get_object_type
+ *
+ * Return the object type associated with a given object.  This routine
+ * is primarily used to determine the object type to mention in ACL check
+ * error messages, so it's desirable for it to avoid failing.
+ */
 ObjectType
 get_object_type(Oid class_id, Oid object_id)
 {
@@ -5333,6 +5340,16 @@ strlist_to_textarray(List *list)
 	return arr;
 }
 
+/*
+ * get_relkind_objtype
+ *
+ * Return the object type for the relkind given by the caller.
+ *
+ * If an unexpected relkind is passed, we say OBJECT_TABLE rather than
+ * failing.  That's because this is mostly used for generating error messages
+ * for failed ACL checks on relations, and we'd rather produce a generic
+ * message saying "table" than fail entirely.
+ */
 ObjectType
 get_relkind_objtype(char relkind)
 {
@@ -5352,13 +5369,10 @@ get_relkind_objtype(char relkind)
 			return OBJECT_MATVIEW;
 		case RELKIND_FOREIGN_TABLE:
 			return OBJECT_FOREIGN_TABLE;
-
-			/*
-			 * other relkinds are not supported here because they don't map to
-			 * OBJECT_* values
-			 */
+		case RELKIND_TOASTVALUE:
+			return OBJECT_TABLE;
 		default:
-			elog(ERROR, "unexpected relkind: %d", relkind);
-			return 0;
+			/* Per above, don't raise an error */
+			return OBJECT_TABLE;
 	}
 }
diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index 70dbcb0..562e3d3 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -172,7 +172,6 @@ AlterObjectRename_internal(Relation rel, Oid objectId, const char *new_name)
 	AttrNumber	Anum_name = get_object_attnum_name(classId);
 	AttrNumber	Anum_namespace = get_object_attnum_namespace(classId);
 	AttrNumber	Anum_owner = get_object_attnum_owner(classId);
-	ObjectType	objtype = get_object_type(classId, objectId);
 	HeapTuple	oldtup;
 	HeapTuple	newtup;
 	Datum		datum;
@@ -224,7 +223,8 @@ AlterObjectRename_internal(Relation rel, Oid objectId, const char *new_name)
 		ownerId = DatumGetObjectId(datum);
 
 		if (!has_privs_of_role(GetUserId(), DatumGetObjectId(ownerId)))
-			aclcheck_error(ACLCHECK_NOT_OWNER, objtype, old_name);
+			aclcheck_error(ACLCHECK_NOT_OWNER, get_object_type(classId, objectId),
+		   old_name);
 
 		/* User must have CREATE privilege on the namespace */
 		if (OidIsValid(namespaceId))
@@ -670,7 +670,6 @@ AlterObjectNamespace_internal(Relation rel, Oid objid, Oid nspOid)
 	AttrNumber	Anum_name = get_object_attnum_name(classId);
 	AttrNumber	Anum_namespace = get_object_attnum_namespace(classId);
 	AttrNumber	Anum_owner = get_object_attnum_owner(classId);
-	ObjectType	objtype = get_object_type(classId, objid);
 	Oid			oldNspOid;
 	Datum		name,
 namespace;
@@ -726,7 +725,7 @@ AlterObjectNamespace_internal(Relation rel, Oid objid, Oid nspOid)
 		ownerId = DatumGetObjectId(owner);
 
 		if (!has_privs_of_role(GetUserId(), ownerId))
-			aclcheck_error(ACLCHECK_NOT_OWNER, objtype,
+			aclcheck_error(ACLCHECK_NOT_OWNER, get_object_type(classId, objid),
 		   NameStr(*(DatumGetName(name;
 
 		/* User must have CREATE privilege on new namespace */
@@ -950,8 +949,6 @@ AlterObjectOwner_internal(Relation rel, Oid objectId, Oid new_ownerId)
 		/* Superusers can bypass permission checks */
 		if (!superuser())
 		{
-			ObjectType	objtype = get_object_type(classId, objectId);
-
 			/* must be owner */
 			if (!has_privs_of_role(GetUserId(), old_ownerId))
 			{
@@ -970,7 +967,8 @@ AlterObjectOwner_internal(Relation rel, Oid objectId, Oid new_ownerId)
 	snprintf(namebuf, sizeof(namebuf), "%u", objectId);
 	objname = namebuf;
 }
-aclcheck_error(ACLCHECK_NOT_OWNER, objtype, objname);
+aclcheck_error(ACLCHECK_NOT_OWNER, get_object_type(classId, objectId),
+			   objname);
 			}
 			/* Must be able to become new owner */
			check_is_member_of_role(GetUserId(), new_ownerId);

Re: cost based vacuum (parallel)

2019-11-04 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-11-04 14:33:41 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > On 2019-11-04 14:06:19 -0500, Stephen Frost wrote:
> > > > With parallelization across indexes, you could have a situation where
> > > > the individual indexes are on different tablespaces with independent
> > > > i/o, therefore the parallelization ends up giving you an increase in i/o
> > > > throughput, not just additional CPU time.
> > > 
> > > How's that related to IO throttling being active or not?
> > 
> > You might find that you have to throttle the IO down when operating
> > exclusively against one IO channel, but if you have multiple IO channels
> > then the acceptable IO utilization could be higher as it would be 
> > spread across the different IO channels.
> > 
> > In other words, the overall i/o allowance for a given operation might be
> > able to be higher if it's spread across multiple i/o channels, as it
> > wouldn't completely consume the i/o resources of any of them, whereas
> > with a higher allowance and a single i/o channel, there would likely be
> > an impact to other operations.
> > 
> > As for if this is really relevant only when it comes to parallel
> > operations is a bit of an interesting question- these considerations
> > might not require actual parallel operations as a single process might
> > be able to go through multiple indexes concurrently and still hit the
> > i/o limit that was set for it overall across the tablespaces.  I don't
> > know that it would actually be interesting or useful to spend the effort
> > to make that work though, so, from a practical perspective, it's
> > probably only interesting to think about this when talking about
> > parallel vacuum.
> 
> But you could just apply different budgets for different tablespaces?

Yes, that would be one approach to addressing this, though it would
change the existing meaning of those cost parameters.  I'm not sure if
we think that's an issue or not- if we only have this in the case of a
parallel vacuum then it's probably fine, I'm less sure if it'd be
alright to change that on an upgrade.

> That's quite doable independent of parallelism, as we don't have tables
> or indexes spanning more than one tablespace.  True, you could then make
> the processing of an individual vacuum faster by allowing to utilize
> multiple tablespace budgets at the same time.

Yes, it's possible to do independent of parallelism, but what I was
trying to get at above is that it might not be worth the effort.  When
it comes to parallel vacuum though, I'm not sure that you can just punt
on this question since you'll naturally end up spanning multiple
tablespaces concurrently, at least if the heap+indexes are spread across
multiple tablespaces and you're operating against more than one of those
relations at a time (which, I admit, I'm not 100% sure is actually
happening with this proposed patch set- if it isn't, then this isn't
really an issue, though that would be pretty unfortunate as then you
can't leverage multiple i/o channels concurrently and therefore Jeff's
question about why you'd be doing parallel vacuum with IO throttling is
a pretty good one).

Thanks,

Stephen




Re: Missed check for too-many-children in bgworker spawning

2019-11-04 Thread Robert Haas
On Mon, Nov 4, 2019 at 2:04 PM Andres Freund  wrote:
> Is that really true? In the case where it started and failed we expect
> the error queue to have been attached to, and there to be either an
> error 'E' or a 'X' response (cf HandleParallelMessage()).  It doesn't
> strike me as very complicated to keep track of whether any worker has
> sent an 'E' or not, no?  I don't think we really need the

One of us is confused here, because I don't think that helps. Consider
three background workers Alice, Bob, and Charlie. Alice fails to
launch because fork() fails. Bob launches but then exits unexpectedly.
Charlie has no difficulties and carries out his assigned duties.

Now, the system you are proposing will say that Charlie is OK but
Alice and Bob are a problem. However, that's the way it already works.
What Tom wants is to distinguish Alice from Bob, and your proposal is
of no help at all with that problem, so far as I can see.

> > We certainly can't ignore a worker that managed to start and
> > then bombed out, because it might've already, for example, claimed a
> > block from a Parallel Seq Scan and not yet sent back the corresponding
> > tuples. We could ignore a worker that never started at all, due to
> > EAGAIN or whatever else, but the original process that registered the
> > worker has no way of finding this out.
>
> Sure, but in that case we'd have gotten either an error back from the
> worker, or postmaster would have PANIC restarted everyone due to an
> unhandled error in the worker, no?

An unhandled ERROR in the worker is not a PANIC. I think it's just an
ERROR that ends up being fatal in effect, but even if it's actually
promoted to FATAL, it's not a PANIC.

It is *generally* true that if a worker hits an ERROR, the error will
be propagated back to the leader, but it is not an invariable rule.
One pretty common way that it fails to happen - common in the sense
that it comes up during development, not common on production systems
I hope - is if a worker dies before reaching the call to
pq_redirect_to_shm_mq(). Before that, there's no possibility of
communicating anything. Granted, at that point we shouldn't yet have
done any work that might mess up the query results.  Similarly, once
we reach that point, we are dependent on a certain amount of good
behavior for things to work as expected; yeah, any code that calls
proc_exit() is supposed to signal an ERROR or FATAL first, but what if
it doesn't? Granted, in that case we'd probably fail to send an 'X'
message, too, so the leader would still have a chance to realize
something is wrong.

I guess I agree with you to this extent: I made a policy decision that
if a successfully registered worker fails to show up, that's an ERROR. It
would be possible to adopt the opposite policy, namely that if a
worker doesn't show up, that's an "oh well." You'd have to be very
certain that the worker wasn't going to show up later, though. For
instance, suppose you check all of the shared memory queues used for
returning tuples and find that every queue is either in a state where
(1) nobody's ever attached to it or (2) somebody attached and then
detached. This is not good enough, because it's possible that after
you checked queue #1, and found it in the former state, someone
attached and read a block, which caused queue #2 to enter the latter
state before you got around to checking it. If you decide that it's OK
to decide that we're done at this point, you'll never return the
tuples that are pushed through queue #1.

But, assuming you nailed the door shut so that such problems could not
occur, I think we could make a decision to ignore workers that failed
before doing anything interesting. Whether that would be a good policy
decision is pretty questionable in my mind. In addition to what I
mentioned before, I think there's a serious danger that errors that
users would have really wanted to know about - or developers would
really want to have known about - would get ignored. You could have
some horrible problem that's making your workers fail to launch, and
the system would just carry on as if everything were fine, except with
bad query plans. I realize that you and others might say "oh, well,
monitor your logs, then," but I think there is certainly some value in
an ordinary user being able to know that things didn't go well without
having to look into the PostgreSQL log for errors. Now, maybe you
think that's not enough value to justify having it work the way it
does today, and I certainly respect that, but I don't view it that way
myself.

What I mostly want to emphasize here is that, while parallel query has
had a number of bugs in this area that were the result of shoddy
design or inadequate testing - principally by me - this isn't one of
them. This decision was made consciously by me because I thought it
gave us the best chance of having a system that would be reliable and
have satisfying behavior for users. Sounds like not everybody agrees,
and that's fine, but I just want 

Re: 64 bit transaction id

2019-11-04 Thread Tomas Vondra

On Mon, Nov 04, 2019 at 10:44:53AM -0800, Andres Freund wrote:

Hi,

On 2019-11-04 19:39:18 +0100, Tomas Vondra wrote:

On Mon, Nov 04, 2019 at 10:04:09AM -0800, Andres Freund wrote:
> And "without causing significant issues elsewhere" unfortunately
> includes continuing to allow pg_upgrade to work.



Yeah. I suppose we could have a different AM implementing this, but
maybe that's not possible ...


Entirely possible. But the amount of code duplication / unnecessary
branching and the user confusion from two very similar AMs, would have
to be weighed against the benefits.



Agreed. I think code complexity is part of the trade-off. IMO it's fine
to hack existing heap AM initially, and only explore the separate AM if
that turns out to be promising.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: v12 and pg_restore -f-

2019-11-04 Thread Alvaro Herrera
On 2019-Nov-04, Euler Taveira wrote:

> Em seg., 4 de nov. de 2019 às 16:12, Alvaro Herrera
>  escreveu:
> > I'm not sure if we need to call out the incompatibility in the minors'
> > release notes (namely, that people using "-f-" to dump to ./- will need
> > to choose a different file name).
> >
> Should we break translations? I'm -0.5 on changing usage(). If you are
> using 9.5, you know that it does not work. If you try it by accident
> (because it works in v12), it will work but it is not that important
> to inform it in --help (if you are in doubt, checking the docs will
> answer your question).

I would rather break the translations, and make all users aware if they
look at --help.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: cost based vacuum (parallel)

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 14:33:41 -0500, Stephen Frost wrote:
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2019-11-04 14:06:19 -0500, Stephen Frost wrote:
> > > With parallelization across indexes, you could have a situation where
> > > the individual indexes are on different tablespaces with independent
> > > i/o, therefore the parallelization ends up giving you an increase in i/o
> > > throughput, not just additional CPU time.
> > 
> > How's that related to IO throttling being active or not?
> 
> You might find that you have to throttle the IO down when operating
> exclusively against one IO channel, but if you have multiple IO channels
> then the acceptable IO utilization could be higher as it would be 
> spread across the different IO channels.
> 
> In other words, the overall i/o allowance for a given operation might be
> able to be higher if it's spread across multiple i/o channels, as it
> wouldn't completely consume the i/o resources of any of them, whereas
> with a higher allowance and a single i/o channel, there would likely be
> an impact to other operations.
> 
> As for if this is really relevant only when it comes to parallel
> operations is a bit of an interesting question- these considerations
> might not require actual parallel operations as a single process might
> be able to go through multiple indexes concurrently and still hit the
> i/o limit that was set for it overall across the tablespaces.  I don't
> know that it would actually be interesting or useful to spend the effort
> to make that work though, so, from a practical perspective, it's
> probably only interesting to think about this when talking about
> parallel vacuum.

But you could just apply different budgets for different tablespaces?
That's quite doable independent of parallelism, as we don't have tables
or indexes spanning more than one tablespace.  True, you could then make
the processing of an individual vacuum faster by allowing to utilize
multiple tablespace budgets at the same time.


> I've been wondering if the accounting system should consider the cost
> per tablespace when there's multiple tablespaces involved, instead of
> throttling the overall process without consideration for the
> per-tablespace utilization.

This all seems like a feature proposal, or two, independent of the
patch/question at hand. I think there's a good argument to be had that
we should severely overhaul the current vacuum cost limiting - it's way
way too hard to understand the bandwidth that it's allowed to
consume. But unless one of the proposals makes that measurably harder or
easier, I think we don't gain anything by entangling an already complex
patchset with something new.


Greetings,

Andres Freund




Re: v12 and pg_restore -f-

2019-11-04 Thread Alvaro Herrera
On 2019-Nov-04, Tom Lane wrote:

> Alvaro Herrera  writes:
> > Turns out that this is a simple partial cherry-pick of the original
> > commit.
> 
> In the back branches, you should keep the statement that stdout
> is the default output file.  Looks sane otherwise (I didn't test it).

I propose this:

   
Specify output file for generated script, or for the listing
when used with -l. Use -
for the standard output, which is also the default.
   

Less invasive formulations sound repetitive (such as "Use - for stdout.
The default is stdout").  I'm open to suggestions.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Excessive disk usage in WindowAgg

2019-11-04 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> On 2019-11-04 19:04:52 +, Andrew Gierth wrote:
 >>> Uh, it seems obvious to me that it should be backpatched?

 >> Fine with me. But I don't think it's just plainly obvious, it's
 >> essentially a change in query plans etc, and we've been getting more
 >> hesitant with those over time.

 Tom> Since this is happening during create_plan(), it affects no
 Tom> planner decisions; it's just a pointless inefficiency AFAICS.
 Tom> Back-patching seems fine.

I will deal with it then. (probably tomorrow or so)

-- 
Andrew (irc:RhodiumToad)




Re: v12 and pg_restore -f-

2019-11-04 Thread Euler Taveira
Em seg., 4 de nov. de 2019 às 16:12, Alvaro Herrera
 escreveu:
> I'm not sure if we need to call out the incompatibility in the minors'
> release notes (namely, that people using "-f-" to dump to ./- will need
> to choose a different file name).
>
Should we break translations? I'm -0.5 on changing usage(). If you are
using 9.5, you know that it does not work. If you try it by accident
(because it works in v12), it will work but it is not that important
to inform it in --help (if you are in doubt, checking the docs will
answer your question).


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: cost based vacuum (parallel)

2019-11-04 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-11-04 14:06:19 -0500, Stephen Frost wrote:
> > * Jeff Janes (jeff.ja...@gmail.com) wrote:
> > > On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila  
> > > wrote:
> > > > For parallel vacuum [1], we were discussing what is the best way to
> > > > divide the cost among parallel workers but we didn't get many inputs
> > > > apart from people who are very actively involved in patch development.
> > > > I feel that we need some more inputs before we finalize anything, so
> > > > starting a new thread.
> > > 
> > > Maybe I just don't have experience in the type of system that parallel
> > > vacuum is needed for, but if there is any meaningful IO throttling which 
> > > is
> > > active, then what is the point of doing the vacuum in parallel in the 
> > > first
> > > place?
> > 
> > With parallelization across indexes, you could have a situation where
> > the individual indexes are on different tablespaces with independent
> > i/o, therefore the parallelization ends up giving you an increase in i/o
> > throughput, not just additional CPU time.
> 
> How's that related to IO throttling being active or not?

You might find that you have to throttle the IO down when operating
exclusively against one IO channel, but if you have multiple IO channels
then the acceptable IO utilization could be higher as it would be 
spread across the different IO channels.

In other words, the overall i/o allowance for a given operation might be
able to be higher if it's spread across multiple i/o channels, as it
wouldn't completely consume the i/o resources of any of them, whereas
with a higher allowance and a single i/o channel, there would likely be
an impact to other operations.

As for if this is really relevant only when it comes to parallel
operations is a bit of an interesting question- these considerations
might not require actual parallel operations as a single process might
be able to go through multiple indexes concurrently and still hit the
i/o limit that was set for it overall across the tablespaces.  I don't
know that it would actually be interesting or useful to spend the effort
to make that work though, so, from a practical perspective, it's
probably only interesting to think about this when talking about
parallel vacuum.

I've been wondering if the accounting system should consider the cost
per tablespace when there's multiple tablespaces involved, instead of
throttling the overall process without consideration for the
per-tablespace utilization.

Thanks,

Stephen




Re: v12 and pg_restore -f-

2019-11-04 Thread Tom Lane
Alvaro Herrera  writes:
> Turns out that this is a simple partial cherry-pick of the original
> commit.

In the back branches, you should keep the statement that stdout
is the default output file.  Looks sane otherwise (I didn't test it).

> I'm not sure if we need to call out the incompatibility in the minors'
> release notes (namely, that people using "-f-" to dump to ./- will need
> to choose a different file name).

Well, we'll have to document the addition of the feature.  I think it
can be phrased positively though.

regards, tom lane




Re: Excessive disk usage in WindowAgg

2019-11-04 Thread Tom Lane
Andres Freund  writes:
> On 2019-11-04 19:04:52 +, Andrew Gierth wrote:
>> Uh, it seems obvious to me that it should be backpatched?

> Fine with me. But I don't think it's just plainly obvious, it's
> essentially a change in query plans etc, and we've been getting more
> hesitant with those over time.

Since this is happening during create_plan(), it affects no planner
decisions; it's just a pointless inefficiency AFAICS.  Back-patching
seems fine.

regards, tom lane




Re: v12 and pg_restore -f-

2019-11-04 Thread Alvaro Herrera
On 2019-Nov-04, Stephen Frost wrote:

> > Alvaro Herrera  writes:

> > > +1 for this, FWIW.  Let's get it done before next week minors.  Is
> > > anybody writing a patch?  If not, I can do it.

Turns out that this is a simple partial cherry-pick of the original
commit.

I'm not sure if we need to call out the incompatibility in the minors'
release notes (namely, that people using "-f-" to dump to ./- will need
to choose a different file name).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 2469191dc4b6ce434cc73f9a2fc57116035a6443 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 4 Nov 2019 15:50:57 -0300
Subject: [PATCH] Change pg_restore -f- to dump to stdout instead of to ./-

Starting with PostgreSQL 12, pg_restore refuses to run when neither -d
nor -f are specified (c.f. commit 413ccaa74d9a), and it also makes "-f -"
mean the old implicit behavior of dumping to stdout.  However, older
branches write to a file called ./- when invoked like that, making it
impossible to write pg_restore scripts that work across versions.  This
is a partial backpatch of the aforementioned commit to all older
supported branches, providing an upgrade path.

Discussion: https://postgr.es/m/20191006190839.ge18...@telsasoft.com
---
 doc/src/sgml/ref/pg_restore.sgml | 4 ++--
 src/bin/pg_dump/pg_backup_archiver.c | 7 ++-
 src/bin/pg_dump/pg_restore.c | 2 +-
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 725acb192c..18c9257ae9 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -176,8 +176,8 @@
   

 Specify output file for generated script, or for the listing
-when used with -l. Default is the standard
-output.
+when used with -l. Use -
+for stdout.

   
  
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 01b4af64f6..1ebbe852bc 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -1511,7 +1511,12 @@ SetOutput(ArchiveHandle *AH, const char *filename, int compression)
 	int			fn;
 
 	if (filename)
-		fn = -1;
+	{
+		if (strcmp(filename, "-") == 0)
+			fn = fileno(stdout);
+		else
+			fn = -1;
+	}
 	else if (AH->FH)
 		fn = fileno(AH->FH);
 	else if (AH->fSpec)
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index 34d93ab472..f5df4e63d7 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -454,7 +454,7 @@ usage(const char *progname)
 
 	printf(_("\nGeneral options:\n"));
 	printf(_("  -d, --dbname=NAMEconnect to database name\n"));
-	printf(_("  -f, --file=FILENAME  output file name\n"));
+	printf(_("  -f, --file=FILENAME  output file name (- for stdout)\n"));
 	printf(_("  -F, --format=c|d|t   backup file format (should be automatic)\n"));
 	printf(_("  -l, --list   print summarized TOC of the archive\n"));
 	printf(_("  -v, --verboseverbose mode\n"));
-- 
2.20.1



Re: Excessive disk usage in WindowAgg

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 19:04:52 +, Andrew Gierth wrote:
> > "Andres" == Andres Freund  writes:
> 
>  >>> Obviously we _do_ need to be more picky about this; it seems clear
>  >>> that using CP_SMALL_TLIST | CP_LABEL_TLIST would be a win in many
>  >>> cases. Opinions?
> 
>  >> Seems reasonable to me, do you want to do the honors?
> 
>  Andres> I was briefly wondering if this ought to be backpatched. -0
>  Andres> here, but...
> 
> Uh, it seems obvious to me that it should be backpatched?

Fine with me. But I don't think it's just plainly obvious, it's
essentially a change in query plans etc, and we've been getting more
hesitant with those over time.

Greetings,

Andres Freund




Re: Missed check for too-many-children in bgworker spawning

2019-11-04 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-10-09 12:29:18 -0400, Robert Haas wrote:
> > I would say rather that if fork() is failing on your system, you have
> > a not very stable system.
> 
> I don't think that's really true, fwiw. It's often a good idea to turn
> on strict memory overcommit accounting, and with that set, it's actually
> fairly common to see fork() fail with ENOMEM, even if there's
> practically a reasonable amount of resources. Especially with larger
> shared buffers and without huge pages, the amount of memory needed for a
> postmaster child in the worst case is not insubstantial.

I've not followed this thread very closely, but I agree with Andres here
wrt fork() failing with ENOMEM in the field and not because the system
isn't stable.

Thanks,

Stephen




Re: cost based vacuum (parallel)

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 14:06:19 -0500, Stephen Frost wrote:
> * Jeff Janes (jeff.ja...@gmail.com) wrote:
> > On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila  wrote:
> > > For parallel vacuum [1], we were discussing what is the best way to
> > > divide the cost among parallel workers but we didn't get many inputs
> > > apart from people who are very actively involved in patch development.
> > > I feel that we need some more inputs before we finalize anything, so
> > > starting a new thread.
> > 
> > Maybe I just don't have experience in the type of system that parallel
> > vacuum is needed for, but if there is any meaningful IO throttling which is
> > active, then what is the point of doing the vacuum in parallel in the first
> > place?
> 
> With parallelization across indexes, you could have a situation where
> the individual indexes are on different tablespaces with independent
> i/o, therefore the parallelization ends up giving you an increase in i/o
> throughput, not just additional CPU time.

How's that related to IO throttling being active or not?

Greetings,

Andres Freund




Re: cost based vacuum (parallel)

2019-11-04 Thread Stephen Frost
Greetings,

* Jeff Janes (jeff.ja...@gmail.com) wrote:
> On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila  wrote:
> > For parallel vacuum [1], we were discussing what is the best way to
> > divide the cost among parallel workers but we didn't get many inputs
> > apart from people who are very actively involved in patch development.
> > I feel that we need some more inputs before we finalize anything, so
> > starting a new thread.
> 
> Maybe I just don't have experience in the type of system that parallel
> vacuum is needed for, but if there is any meaningful IO throttling which is
> active, then what is the point of doing the vacuum in parallel in the first
> place?

With parallelization across indexes, you could have a situation where
the individual indexes are on different tablespaces with independent
i/o, therefore the parallelization ends up giving you an increase in i/o
throughput, not just additional CPU time.

Thanks,

Stephen




Re: Missed check for too-many-children in bgworker spawning

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 12:14:53 -0500, Robert Haas wrote:
> If a process trying to register workers finds out that no worker slots
> are available, it discovers this at the time it tries to perform the
> registration. But fork() failure happens later and in a different
> process. The original process just finds out that the worker is
> "stopped," not whether or not it ever got started in the first
> place.

Is that really true? In the case where it started and failed we expect
the error queue to have been attached to, and there to be either an
error 'E' or a 'X' response (cf HandleParallelMessage()).  It doesn't
strike me as very complicated to keep track of whether any worker has
sent an 'E' or not, no?  I don't think we really need the

Funny (?) anecdote: I learned about this part of the system recently,
after I had installed some crash handler inside postgres. Turns out that
that diverted, as a side-effect, SIGUSR1 to its own signal handler. All
tests in the main regression tests passed, except for ones getting stuck
waiting for WaitForParallelWorkersToFinish(), which could be fixed by
disabling parallelism aggressively. Took me like two hours to
debug... Also, a bit sad that parallel query is the only visible
failure (in the main tests) of breaking the sigusr1 infrastructure...


> We certainly can't ignore a worker that managed to start and
> then bombed out, because it might've already, for example, claimed a
> block from a Parallel Seq Scan and not yet sent back the corresponding
> tuples. We could ignore a worker that never started at all, due to
> EAGAIN or whatever else, but the original process that registered the
> worker has no way of finding this out.

Sure, but in that case we'd have gotten either an error back from the
worker, or postmaster would have PANIC restarted everyone due to an
unhandled error in the worker, no?


> And even if you solved for all of that, I think you might still find
> that it breaks some parallel query (or parallel create index) code
> that expects the number of workers to change at registration time, but
> not afterwards. So, that code would all need to be adjusted.

Fair enough. Although I think practically nearly everything has to be
ready to handle workers just being slow to start up anyway, no? There are
plenty of cases where we just finish before all workers get around
to doing work.

Greetings,

Andres Freund




Re: Excessive disk usage in WindowAgg

2019-11-04 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 >>> Obviously we _do_ need to be more picky about this; it seems clear
 >>> that using CP_SMALL_TLIST | CP_LABEL_TLIST would be a win in many
 >>> cases. Opinions?

 >> Seems reasonable to me, do you want to do the honors?

 Andres> I was briefly wondering if this ought to be backpatched. -0
 Andres> here, but...

Uh, it seems obvious to me that it should be backpatched?

-- 
Andrew (irc:RhodiumToad)




Re: Wrong value in metapage of GIN INDEX.

2019-11-04 Thread Tom Lane
"imai.yoshik...@fujitsu.com"  writes:
> Moon-san, kuroda.keisuke-san
> On Thu, Aug 29, 2019 at 8:20 AM, Moon, Insung wrote:
>> The patch is very simple.
>> Fix to increase the value of nEntries only when a non-duplicate GIN index 
>> leaf added.

> Does nentries show the number of entries in the leaf pages?
> If so, the fix seems to be correct.

I looked at this issue.  The code in ginEntryInsert is not obviously wrong
by itself; it depends on what you think nEntries is supposed to count.
However, ginvacuum.c updates nEntries to the sum of PageGetMaxOffsetNumber()
across all the index's leaf pages, ie the number of surviving leaf items.

It's hard to see how ginvacuum could reverse-engineer a value that would
match what ginEntryInsert is doing, so probably we ought to define
nEntries as the number of leaf items, which seems to make the proposed
patch correct.  (It could use a bit more commentary though.)

I'm inclined to apply this to HEAD only; it doesn't seem significant
enough to justify back-patching.

regards, tom lane




Re: Missed check for too-many-children in bgworker spawning

2019-11-04 Thread Andres Freund
Hi,

On 2019-10-09 12:29:18 -0400, Robert Haas wrote:
> I would say rather that if fork() is failing on your system, you have
> a not very stable system.

I don't think that's really true, fwiw. It's often a good idea to turn
on strict memory overcommit accounting, and with that set, it's actually
fairly common to see fork() fail with ENOMEM, even if there's
practically a reasonable amount of resources. Especially with larger
shared buffers and without huge pages, the amount of memory needed for a
postmaster child in the worst case is not insubstantial.


> The fact that parallel query is going to fail is sad, but not as sad
> as the fact that connecting to the database is also going to fail, and
> that logging into the system to try to fix the problem may well fail
> as well.

Well, but parallel query also has the potential to much more quickly
lead to a lot of new backends being started than you'd get new
connections on an analytics DB.


> Code that tries to make parallel query cope with this situation
> without an error wouldn't often be tested, so it might be buggy, and
> it wouldn't necessarily be a benefit if it did work. I expect many
> people would rather have the query fail and free up slots in the
> system process table than consume precisely all of them and then try
> to execute the query at a slower-than-expected rate.

I concede that you have a point here.

Greetings,

Andres Freund




Re: 64 bit transaction id

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 19:39:18 +0100, Tomas Vondra wrote:
> On Mon, Nov 04, 2019 at 10:04:09AM -0800, Andres Freund wrote:
> > And "without causing significant issues elsewhere" unfortunately
> > includes continuing to allow pg_upgrade to work.

> Yeah. I suppose we could have a different AM implementing this, but
> maybe that's not possible ...

Entirely possible. But the amount of code duplication / unnecessary
branching and the user confusion from two very similar AMs, would have
to be weighed against the benefits.

Greetings,

Andres Freund




Re: 64 bit transaction id

2019-11-04 Thread Tomas Vondra

On Mon, Nov 04, 2019 at 10:04:09AM -0800, Andres Freund wrote:

Hi,

(I've not read the rest of this thread yet)

On 2019-11-04 16:07:23 +0100, Tomas Vondra wrote:

On Mon, Nov 04, 2019 at 04:39:44PM +0300, Павел Ерёмин wrote:
>   And yet, if I try to implement a similar mechanism, if successful, will my
>   revision be considered?
>    

Why wouldn't it be considered? If you submit a patch that demonstrably
improves the behavior (in this case reduces per-tuple overhead without
causing significant issues elsewhere), we'd be crazy not to consider it.


And "without causing significant issues elsewhere" unfortunately
includes continuing to allow pg_upgrade to work.



Yeah. I suppose we could have a different AM implementing this, but
maybe that's not possible ...


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: cost based vacuum (parallel)

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 12:59:02 -0500, Jeff Janes wrote:
> On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila  wrote:
>
> > For parallel vacuum [1], we were discussing what is the best way to
> > divide the cost among parallel workers but we didn't get many inputs
> > apart from people who are very actively involved in patch development.
> > I feel that we need some more inputs before we finalize anything, so
> > starting a new thread.
> >
>
> Maybe I just don't have experience in the type of system that parallel
> vacuum is needed for, but if there is any meaningful IO throttling which is
> active, then what is the point of doing the vacuum in parallel in the first
> place?

I am wondering the same - but to be fair, it's pretty easy to run into
cases where VACUUM is CPU bound. E.g. because most pages are in
shared_buffers, and compared to the size of the indexes number of tids
that need to be pruned is fairly small (also [1]). That means a lot of
pages need to be scanned, without a whole lot of IO going on. The
problem with that is just that the defaults for vacuum throttling will
also apply here, I've never seen anybody tune vacuum_cost_page_hit = 0,
vacuum_cost_page_dirty=0 or such (in contrast, the latter is the highest
cost currently).  Nor do we reduce the cost of vacuum_cost_page_dirty
for unlogged tables.

So while it doesn't seem unreasonable to want to use cost limiting to
protect against vacuum unexpectedly causing too much, especially read,
IO, I'm doubtful it has current practical relevance.

I'm wondering how much of the benefit of parallel vacuum really is just
to work around vacuum ringbuffers often massively hurting performance
(see e.g. [2]). Surely not all, but I'd be very unsurprised if it were a
large fraction.

Greetings,

Andres Freund

[1] I don't think the patch addresses this, IIUC it's only running index
vacuums in parallel, but it's very easy to run into being CPU
bottlenecked when vacuuming a busily updated table. heap_hot_prune
can be really expensive, especially with longer update chains (I
think it may have an O(n^2) worst case even).
[2] 
https://www.postgresql.org/message-id/20160406105716.fhk2eparljthpzp6%40alap3.anarazel.de




Re: cost based vacuum (parallel)

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 12:24:35 +0530, Amit Kapila wrote:
> For parallel vacuum [1], we were discussing what is the best way to
> divide the cost among parallel workers but we didn't get many inputs
> apart from people who are very actively involved in patch development.
> I feel that we need some more inputs before we finalize anything, so
> starting a new thread.
> 
> The initial version of the patch has a very rudimentary way of doing
> it which means each parallel vacuum worker operates independently
> w.r.t vacuum delay and cost.

Yea, that seems not ok for cases where vacuum delay is active.

There's also the question of when/why it is beneficial to use
parallelism when you're going to encounter IO limits in all likelihood.


> This will lead to more I/O in the system
> than the user has intended to do.  Assume that the overall I/O allowed
> for vacuum operation is X after which it will sleep for some time,
> reset the balance and continue.  In the patch, each worker will be
> allowed to perform X before which it can sleep and also there is no
> coordination for the same with master backend which would have done
> some I/O for the heap.  So, in the worst-case scenario, there can be n
> times more I/O where n is the number of workers doing the parallel
> operation.  This is somewhat similar to a memory usage problem with a
> parallel query where each worker is allowed to use up to work_mem of
> memory.  We can say that the users using parallel operation can expect
> more system resources to be used as they want to get the operation
> done faster, so we are fine with this.  However, I am not sure if that
> is the right thing, so we should try to come up with some solution for
> it and if the solution is too complex, then probably we can think of
> documenting such behavior.

I mean for parallel query the problem wasn't really introduced in
parallel query, it existed before - and does still - for non-parallel
queries. And there's a complex underlying planning issue. I don't think
this is a good excuse for VACUUM, where none of the complex "number of
paths considered" issues etc apply.


> The two approaches to solve this problem being discussed in that
> thread [1] are as follows:
> (a) Allow the parallel workers and master backend to have a shared
> view of vacuum cost related parameters (mainly VacuumCostBalance) and
> allow each worker to update it and then based on that decide whether
> it needs to sleep.  Sawada-San has done the POC for this approach.
> See v32-0004-PoC-shared-vacuum-cost-balance in email [2].  One
> drawback of this approach could be that we allow the worker to sleep
> even though the I/O has been performed by some other worker.

I don't understand this drawback.


> (b) The other idea could be that we split the I/O among workers
> something similar to what we do for auto vacuum workers (see
> autovac_balance_cost).  The basic idea would be that before launching
> workers, we need to compute the remaining I/O (heap operation would
> have used something) after which we need to sleep and split it equally
> across workers.  Here, we are primarily thinking of dividing
> VacuumCostBalance and VacuumCostLimit parameters.  Once the workers
> are finished, they need to let master backend know how much I/O they
> have consumed and then master backend can add it to it's current I/O
> consumed.  I think we also need to rebalance the cost of remaining
> workers once some of the worker's exit.  Dilip has prepared a POC
> patch for this, see 0002-POC-divide-vacuum-cost-limit in email [3].

(b) doesn't strike me as advantageous. It seems quite possible that you
end up with one worker that has a lot more IO than others, leading to
unnecessary sleeps, even though the actually available IO budget has not
been used up. Quite easy to see how that'd lead to parallel VACUUM
having a lower throughput than a single threaded one.
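
For what it's worth, a minimal sketch of the shared-balance idea in (a) -
the names and placement here are made up for illustration, not taken from
Sawada-san's POC - could look like:

#include "postgres.h"
#include "miscadmin.h"			/* VacuumCostLimit, VacuumCostDelay */
#include "port/atomics.h"

/* Hypothetical shared state, assumed to live in the parallel-vacuum DSM. */
typedef struct VacuumSharedCost
{
	pg_atomic_uint32 balance;	/* cost accumulated by leader + workers */
} VacuumSharedCost;

/*
 * Every participant adds its locally accumulated cost to the shared
 * balance; whoever pushes it over the limit sleeps and resets it.  A real
 * implementation would have to deal with races around the reset and with
 * the "sleeping for someone else's I/O" question mentioned above.
 */
static void
shared_vacuum_delay_point(VacuumSharedCost *shared, int local_cost)
{
	uint32		total;

	total = pg_atomic_add_fetch_u32(&shared->balance, local_cost);
	if (total >= (uint32) VacuumCostLimit)
	{
		pg_usleep((long) (VacuumCostDelay * 1000));
		pg_atomic_write_u32(&shared->balance, 0);
	}
}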

Greetings,

Andres Freund




Re: Missed check for too-many-children in bgworker spawning

2019-11-04 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Nov-04, Robert Haas wrote:
>> On Mon, Nov 4, 2019 at 10:42 AM Alvaro Herrera  
>> wrote:
>>> I agree with this point in principle.  Everything else (queries,
>>> checkpointing) can fail, but it's critical that postmaster continues to
>>> run [...]

>> Sure, I'm not arguing that the postmaster should blow up and die.

> I must have misinterpreted you, then.  But then I also misinterpreted
> Tom, because I thought it was this stability problem that was "utter
> bunkum".

I fixed the postmaster crash problem in commit 3887e9455.  The residual
issue that I think is entirely bogus is that the parallel query start
code will silently continue without workers if it hits our internal
resource limit of how many bgworker ProcArray slots there are, but
not do the same when it hits the external resource limit of the
kernel refusing to fork().  I grant that there might be implementation
reasons for that being difficult, but I reject Robert's apparent
opinion that it's somehow desirable to behave that way.  As things
stand, we have all of the disadvantages that you can't predict how
many workers you'll get, and none of the advantages of robustness
in the face of system resource exhaustion.

regards, tom lane




Re: Excessive disk usage in WindowAgg

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 12:18:48 -0500, Tom Lane wrote:
> Andrew Gierth  writes:
> > Using 92MB of disk for one integer seems excessive; the reason is clear
> > from the explain:
> > ...
> > so the whole width of the table is being stored in the tuplestore used
> > by the windowagg.
> 
> > In create_windowagg_plan, we have:
> 
> > /*
> >  * WindowAgg can project, so no need to be terribly picky about child
> >  * tlist, but we do need grouping columns to be available
> >  */
> > subplan = create_plan_recurse(root, best_path->subpath, CP_LABEL_TLIST);
> 
> > Obviously we _do_ need to be more picky about this; it seems clear that
> > using CP_SMALL_TLIST | CP_LABEL_TLIST would be a win in many cases.
> > Opinions?
> 
> Seems reasonable to me, do you want to do the honors?

I was briefly wondering if this ought to be backpatched. -0 here, but...

Greetings,

Andres Freund




Re: 64 bit transaction id

2019-11-04 Thread Andres Freund
Hi,

(I've not read the rest of this thread yet)

On 2019-11-04 16:07:23 +0100, Tomas Vondra wrote:
> On Mon, Nov 04, 2019 at 04:39:44PM +0300, Павел Ерёмин wrote:
> >   And yet, if I try to implement a similar mechanism, if successful, will my
> >   revision be considered?
> >    
> 
> Why wouldn't it be considered? If you submit a patch that demonstrably
> improves the behavior (in this case reduces per-tuple overhead without
> causing significant issues elsewhere), we'd be crazy not to consider it.

And "without causing significant issues elsewhere" unfortunately
includes continuing to allow pg_upgrade to work.

Greetings,

Andres Freund




Re: [PATCH] contrib/seg: Fix PG_GETARG_SEG_P definition

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 11:30:23 +, Dagfinn Ilmari Mannsåker wrote:
> Tom Lane  writes:
> 
> > ilm...@ilmari.org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=) writes:
> >> I just noticed that when contrib/seg was converted to V1 calling
> >> convention (commit 389bb2818f4), the PG_GETARG_SEG_P() macro got defined
> >> in terms of PG_GETARG_POINTER().  But it itself calls DatumGetPointer(),
> >> so shouldn't it be using PG_GETARG_DATUM()?
> >
> > Yup, I agree.  Pushed.
> 
> Thanks!

Thanks both of you.

- Andres




Re: [PATCH] Include triggers in EXPLAIN

2019-11-04 Thread Andres Freund
Hi,

(minor note - on PG lists the style is to quote in-line and trim)

On 2019-11-04 10:35:25 +0100, Josef Šimánek wrote:
> Thanks for quick response. As I was testing this feature it shows all
> "possible" triggers to be executed running given query. The benefit of
> having this information in EXPLAIN as well is you do not need to execute
> the query (as EXPLAIN ANALYZE does). My usecase is to take a look at query
> before it is executed to get some idea about the plan with EXPLAIN.

I can actually see some value in additional information here, but I'd
probably want to change the format a bit. When explicitly desired (or
perhaps just in verbose mode?), I see value in counting the number of
triggers we know about that need to be checked, how many were excluded
on the basis of the trigger's WHEN clause etc.


> Do you have idea about some case where actual trigger will be missing in
> EXPLAIN with current implementation, but will be present in EXPLAIN
> ANALYZE? I can take a look if there's any way how to handle those cases as
> well.

Any triggers that are fired because of other, listed, triggers causing
other changes. E.g. a logging trigger that inserts into a log table -
EXPLAIN, without ANALYZE, doesn't have a way of knowing about that.

And before you say that sounds like a niche issue - it's not in my
experience. Forgetting the necessary indexes two or three foreign keys
down a CASCADE chain seems to be more common than doing so for tables
directly "linked" with busy ones.

Greetings,

Andres Freund




Re: cost based vacuum (parallel)

2019-11-04 Thread Jeff Janes
On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila  wrote:

> For parallel vacuum [1], we were discussing what is the best way to
> divide the cost among parallel workers but we didn't get many inputs
> apart from people who are very actively involved in patch development.
> I feel that we need some more inputs before we finalize anything, so
> starting a new thread.
>

Maybe I just don't have experience in the type of system that parallel
vacuum is needed for, but if there is any meaningful IO throttling which is
active, then what is the point of doing the vacuum in parallel in the first
place?

Cheers,

Jeff


Re: Keep compiler silence (clang 10, implicit conversion from 'long' to 'double' )

2019-11-04 Thread Tom Lane
Yuya Watari  writes:
> I attached the modified patch. In the patch, I placed the macro in
> "src/include/c.h", but this may not be a good choice because c.h is
> widely included from a lot of files. Do you have any good ideas about
> its placement?

I agree that there's an actual bug here; it can be demonstrated with

# select extract(epoch from '256 microseconds'::interval * (2^55)::float8);
 date_part  
--------------------
 -9223372036854.775
(1 row)

which clearly is a wrong answer.

I do not however like any of the proposed patches.  We already have one
place that deals with this problem correctly, in int8.c's dtoi8():

/*
 * Range check.  We must be careful here that the boundary values are
 * expressed exactly in the float domain.  We expect PG_INT64_MIN to be an
 * exact power of 2, so it will be represented exactly; but PG_INT64_MAX
 * isn't, and might get rounded off, so avoid using it.
 */
if (unlikely(num < (float8) PG_INT64_MIN ||
 num >= -((float8) PG_INT64_MIN) ||
 isnan(num)))
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 errmsg("bigint out of range")));

We should adopt that coding technique not invent new ones.

I do concur with creating a macro that encapsulates a correct version
of this test, maybe like

#define DOUBLE_FITS_IN_INT64(num) \
((num) >= (double) PG_INT64_MIN && \
 (num) < -((double) PG_INT64_MIN))

(or s/double/float8/ ?)

c.h is probably a reasonable place, seeing that we define the constants
there.
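
For illustration, a call site would then follow the same pattern as
dtoi8() above; the helper below is a hypothetical sketch, not code from
any posted patch:

/* Assumes the usual backend environment (postgres.h, utils/builtins.h). */
static int64
float8_to_int64_checked(float8 num)
{
	if (unlikely(isnan(num) || !DOUBLE_FITS_IN_INT64(num)))
		ereport(ERROR,
				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
				 errmsg("bigint out of range")));
	return (int64) num;
}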

regards, tom lane




Re: updating unaccent.rules for Arabic letters

2019-11-04 Thread Daniel Verite
kerbrose khaled wrote:

> I would like to update unaccent.rules file to support Arabic letters. so
> could someone help me or tell me how could I add such contribution. I
> attached the file including the modifications, only the last 4 lines.

The Arabic letters are found in the Unicode block U+0600 to U+06FF 
(https://www.fileformat.info/info/unicode/block/arabic/list.htm)
There has been no coverage of this block until now by the unaccent
module. Since Arabic uses several diacritics [1], it would be best to
figure out all the transliterations that should go in and add them in
one go (plus coding that in the Python script).

The canonical way to unaccent is normally to apply a Unicode
transformation: NFC -> NFD and remove the non-spacing marks.

I've tentatively done that with each codepoint in the 0600-06FF block
in SQL with icu_transform in icu_ext [2], and it produces the
attached result, with 60 (!) entries, along with Unicode names for
readability.

Does that make sense to people who know Arabic?

For the record, here's the query:

WITH block(cp) AS (
    select * FROM generate_series(x'600'::int, x'6ff'::int) AS cp),
  dest AS (
    select cp,
           icu_transform(chr(cp),
                         'any-NFD;[:nonspacing mark:] any-remove; any-NFC') AS unaccented
    FROM block)
SELECT
  chr(cp) as "src",
  icu_transform(chr(cp), 'Name') as "srcName",
  dest.unaccented as "dest",
  icu_transform(dest.unaccented, 'Name') as "destName"
FROM dest
WHERE chr(cp) <> dest.unaccented;


[1] https://en.wikipedia.org/wiki/Arabic_diacritics
[2] https://github.com/dverite/icu_ext#icu_transform


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


unaccent-arabic-block.utf8.output
Description: Binary data


Re: Missed check for too-many-children in bgworker spawning

2019-11-04 Thread Alvaro Herrera
On 2019-Nov-04, Robert Haas wrote:

> On Mon, Nov 4, 2019 at 10:42 AM Alvaro Herrera  
> wrote:
> > > True, it's not a situation you especially want to be in.  However,
> > > I've lost count of the number of times that I've heard someone talk
> > > about how their system was overstressed to the point that everything
> > > else was failing, but Postgres kept chugging along.  That's a good
> > > reputation to have and we shouldn't just walk away from it.
> >
> > I agree with this point in principle.  Everything else (queries,
> > checkpointing) can fail, but it's critical that postmaster continues to
> > run [...]
> 
> Sure, I'm not arguing that the postmaster should blow up and die.

I must have misinterpreted you, then.  But then I also misinterpreted
Tom, because I thought it was this stability problem that was "utter
bunkum".

> I was, however, arguing that if the postmaster fails to launch workers
> for a parallel query due to process table exhaustion, it's OK for
> *that query* to error out.

That position makes sense to me.  It would be nice [..ponies..] for the
query to run regardless, but if it doesn't, it's not such a big deal;
the query could have equally failed to run in a single process anyway.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Excessive disk usage in WindowAgg

2019-11-04 Thread Tom Lane
Andrew Gierth  writes:
> Using 92MB of disk for one integer seems excessive; the reason is clear
> from the explain:
> ...
> so the whole width of the table is being stored in the tuplestore used
> by the windowagg.

> In create_windowagg_plan, we have:

> /*
>  * WindowAgg can project, so no need to be terribly picky about child
>  * tlist, but we do need grouping columns to be available
>  */
> subplan = create_plan_recurse(root, best_path->subpath, CP_LABEL_TLIST);

> Obviously we _do_ need to be more picky about this; it seems clear that
> using CP_SMALL_TLIST | CP_LABEL_TLIST would be a win in many cases.
> Opinions?

Seems reasonable to me, do you want to do the honors?
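
(Presumably the change amounts to this one-liner in create_windowagg_plan();
shown only as a sketch of the flag combination, not a committed patch:)

	subplan = create_plan_recurse(root, best_path->subpath,
								  CP_SMALL_TLIST | CP_LABEL_TLIST);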

regards, tom lane




Re: Missed check for too-many-children in bgworker spawning

2019-11-04 Thread Robert Haas
On Mon, Nov 4, 2019 at 10:42 AM Alvaro Herrera  wrote:
> > True, it's not a situation you especially want to be in.  However,
> > I've lost count of the number of times that I've heard someone talk
> > about how their system was overstressed to the point that everything
> > else was failing, but Postgres kept chugging along.  That's a good
> > reputation to have and we shouldn't just walk away from it.
>
> I agree with this point in principle.  Everything else (queries,
> checkpointing) can fail, but it's critical that postmaster continues to
> run -- that way, once the high load episode is over, connections can be
> re-established as needed, auxiliary processes can be re-launched, and
> the system can be again working normally.  If postmaster dies, all bets
> are off.  Also: an idle postmaster is not using any resources; on its
> own, killing it or it dying would not free any useful resources for the
> system load to be back to low again.

Sure, I'm not arguing that the postmaster should blow up and die.

I was, however, arguing that if the postmaster fails to launch workers
for a parallel query due to process table exhaustion, it's OK for
*that query* to error out.

Tom finds that argument to be "utter bunkum," but I don't agree. I
think there might also be some implementation complexity there that is
more than meets the eye. If a process trying to register workers finds
out that no worker slots are available, it discovers this at the time
it tries to perform the registration. But fork() failure happens later
and in a different process. The original process just finds out that
the worker is "stopped," not whether or not it ever got started in the
first place. We certainly can't ignore a worker that managed to start
and then bombed out, because it might've already, for example, claimed
a block from a Parallel Seq Scan and not yet sent back the
corresponding tuples. We could ignore a worker that never started at
all, due to EAGAIN or whatever else, but the original process that
registered the worker has no way of finding this out.
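
(To make that concrete: as far as I know, all the registering backend can do
is poll the handle roughly as below, using the bgworker.h API - a fragment,
not actual parallel.c code:)

	BgwHandleStatus status;
	pid_t		pid;

	/* 'handle' is the BackgroundWorkerHandle obtained at registration time */
	status = GetBackgroundWorkerPid(handle, &pid);

	/*
	 * BGWH_STOPPED lumps together "ran and then exited" and "was never
	 * forked at all", which is exactly the distinction we'd need here.
	 */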

Now you might think we could just fix that by having the postmaster
record something in the slot, but that doesn't work either, because
the slot could get reused before the original process checks the
status information. The fact that the slot has been reused is
sufficient evidence that the worker was unregistered, which means it
either stopped or we gave up on starting it, but it doesn't tell us
which one. To be able to tell that, we'd have to have a mechanism to
prevent slots from getting reused until any necessary exit status
information had bene read, sort of like the OS-level zombie process
mechanism (which we all love, I guess, and therefore definitely want
to reinvent...?). The postmaster logic would need to be made more
complicated, so that zombies couldn't accumulate: if a process asked
for status notifications, but then died, any zombies waiting for it
would need to be cleared. And you'd also have to make sure that a
process which didn't die was guaranteed to read the status from the
zombie to clear it, and that it did so in a reasonably timely fashion,
which is currently in no way guaranteed and does not appear at all
straightforward to guarantee.

And even if you solved for all of that, I think you might still find
that it breaks some parallel query (or parallel create index) code
that expects the number of workers to change at registration time, but
not afterwards. So, that code would all need to be adjusted.

In short, I think Tom wants a pony. But that does not mean we should
not fix this bug.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Issues with PAM : log that it failed, whether it actually failed or not

2019-11-04 Thread Tom Lane
[ redirecting to pgsql-hackers ]

I wrote:
> La Cancellera Yoann  writes:
>> I am having issues with PAM auth :
>> it works, password are correctly checked, unknown users cannot access,
>> known user can, everything looks good
>> But, it always log an error by default even if auth is succesful:
>> And if auth is unsuccessful, it will log that very same message twice

> Those aren't errors, they're just log events.

> If you're using psql to connect, the extra messages aren't surprising,
> because psql will first try to connect without a password, and only
> if it gets a failure that indicates that a password is needed will
> it prompt the user for a password (so two connection attempts occur,
> even if the second one is successful).  You can override that default
> behavior with the -W switch, and I bet that will make the extra
> log messages go away.

> Having said that, using LOG level for unsurprising auth failures
> seems excessively chatty.  More-commonly-used auth methods aren't
> that noisy.

I took a closer look at this and realized that the problem is that
the PAM code doesn't support our existing convention of not logging
anything about connections wherein the client side disconnects when
challenged for a password.  0001 attached fixes that, not in a
terribly nice way perhaps, but the PAM code is already relying on
static variables for communication :-(.

Also, 0002 adjusts some messages in the same file to match project
capitalization conventions.

Barring objections, I propose to back-patch 0001 but apply 0002
to HEAD only.

regards, tom lane

diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index d28271c..909d736 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -110,6 +110,7 @@ static const char *pam_passwd = NULL;	/* Workaround for Solaris 2.6
 		 * brokenness */
 static Port *pam_port_cludge;	/* Workaround for passing "Port *port" into
  * pam_passwd_conv_proc */
+static bool pam_no_password;	/* For detecting no-password-given */
 #endif			/* USE_PAM */
 
 
@@ -2099,8 +2099,10 @@ pam_passwd_conv_proc(int num_msg, const struct pam_message **msg,
 	{
 		/*
 		 * Client didn't want to send password.  We
-		 * intentionally do not log anything about this.
+		 * intentionally do not log anything about this,
+		 * either here or at higher levels.
 		 */
+		pam_no_password = true;
 		goto fail;
 	}
 }
@@ -2159,6 +2161,7 @@ CheckPAMAuth(Port *port, const char *user, const char *password)
 	 */
 	pam_passwd = password;
 	pam_port_cludge = port;
+	pam_no_password = false;
 
 	/*
 	 * Set the application data portion of the conversation struct.  This is
@@ -2244,22 +2247,26 @@ CheckPAMAuth(Port *port, const char *user, const char *password)
 
 	if (retval != PAM_SUCCESS)
 	{
-		ereport(LOG,
-(errmsg("pam_authenticate failed: %s",
-		pam_strerror(pamh, retval))));
+		/* If pam_passwd_conv_proc saw EOF, don't log anything */
+		if (!pam_no_password)
+			ereport(LOG,
+	(errmsg("pam_authenticate failed: %s",
+			pam_strerror(pamh, retval))));
 		pam_passwd = NULL;		/* Unset pam_passwd */
-		return STATUS_ERROR;
+		return pam_no_password ? STATUS_EOF : STATUS_ERROR;
 	}
 
 	retval = pam_acct_mgmt(pamh, 0);
 
 	if (retval != PAM_SUCCESS)
 	{
-		ereport(LOG,
-(errmsg("pam_acct_mgmt failed: %s",
-		pam_strerror(pamh, retval))));
+		/* If pam_passwd_conv_proc saw EOF, don't log anything */
+		if (!pam_no_password)
+			ereport(LOG,
+	(errmsg("pam_acct_mgmt failed: %s",
+			pam_strerror(pamh, retval))));
 		pam_passwd = NULL;		/* Unset pam_passwd */
-		return STATUS_ERROR;
+		return pam_no_password ? STATUS_EOF : STATUS_ERROR;
 	}
 
 	retval = pam_end(pamh, retval);
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index d28271c..909d736 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -959,7 +960,7 @@ CheckSCRAMAuth(Port *port, char *shadow_pass, char **logdetail)
 			return STATUS_ERROR;
 		}
 
-		elog(DEBUG4, "Processing received SASL response of length %d", buf.len);
+		elog(DEBUG4, "processing received SASL response of length %d", buf.len);
 
 		/*
 		 * The first SASLInitialResponse message is different from the others.
@@ -1150,7 +1151,7 @@ pg_GSS_recvauth(Port *port)
 		gbuf.length = buf.len;
 		gbuf.value = buf.data;
 
-		elog(DEBUG4, "Processing received GSS token of length %u",
+		elog(DEBUG4, "processing received GSS token of length %u",
 			 (unsigned int) gbuf.length);
 
 		maj_stat = gss_accept_sec_context(
@@ -1427,8 +1428,7 @@ pg_SSPI_recvauth(Port *port)
 		outbuf.pBuffers = OutBuffers;
 		outbuf.ulVersion = SECBUFFER_VERSION;
 
-
-		elog(DEBUG4, "Processing received SSPI token of length %u",
+		elog(DEBUG4, "processing received SSPI token of length %u",
 			 (unsigned int) buf.len);
 
 		r = AcceptSecurityContext(&sspicred,
@@ -2949,7 +2956,7 @@ radius_add_attribute(radius_packet *packet, uint8 type, 

Re: Obsolete comment in partbounds.c

2019-11-04 Thread Alvaro Herrera
On 2019-Oct-19, Etsuro Fujita wrote:

> On Fri, Oct 18, 2019 at 6:56 PM Alvaro Herrera  
> wrote:

> > Yeah, agreed.  Instead of "the null comes from" I would use "the
> > partition that stores nulls".
> 
> I think your wording is better than mine.  Thank you for reviewing!

Thanks for getting this done.

> > While reviewing your patch I noticed a few places where we use an odd
> > pattern in switches, which can be simplified as shown here.
> 
> case PARTITION_STRATEGY_LIST:
> -   num_indexes = bound->ndatums;
> +   return bound->ndatums;
> break;
> 
> Why not remove the break statement?

You're right, I should have done that.  However, I backed out of doing
this change after all; it seems a pretty minor stylistic adjustment of
little value.

Thanks for reviewing all the same,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Log statement sample - take two

2019-11-04 Thread Adrien Nayrat
On 11/4/19 2:08 AM, Tomas Vondra wrote:
> 
> Seems fine to me, mostly. I think the docs should explain how
> log_min_duration_statement interacts with log_min_duration_sample.
> Attached is a patch doing that, by adding one para to each GUC, along
> with some minor rewordings. I think the docs are mixing "sampling"
> vs. "logging" and "durations" vs. "statements" not sure.

Thanks for the rewording, it's clearer now.

> 
> I also think the two new sampling GUCs (log_min_duration_sample and
> log_statement_sample_rate) should be next to each other. We're not
> ordering the GUCs alphabetically anyway.

+1

> 
> I plan to make those changes and push in a couple days.
> 

Thanks!




Re: v12 and pg_restore -f-

2019-11-04 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Alvaro Herrera  writes:
> > On 2019-Oct-17, Tom Lane wrote:
> >> Stephen Frost  writes:
> >>> Tom, I take it your suggestion is to have '-f -' be accepted to mean
> >>> 'goes to stdout' in all branches?
> 
> >> Yes.
> 
> > +1 for this, FWIW.  Let's get it done before next week minors.  Is
> > anybody writing a patch?  If not, I can do it.
> 
> Please do.

+1

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: auxiliary processes in pg_stat_ssl

2019-11-04 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Nov 4, 2019 at 8:26 AM Alvaro Herrera  
> wrote:
> > On 2019-Sep-04, Alvaro Herrera wrote:
> > > I just noticed that we list auxiliary processes in pg_stat_ssl:
> > [...]
> > > But this seems pointless.  Should we not hide those?  Seems this only
> > > happened as an unintended side-effect of fc70a4b0df38.  It appears to me
> > > that we should redefine that view to restrict backend_type that's
> > > 'client backend' (maybe include 'wal receiver'/'wal sender' also, not
> > > sure.)
> >
> > [crickets]
> >
> > Robert, Kuntal, any opinion on this?
> 
> I think if I were doing something about it, I'd probably try to filter
> on a field that directly represents whether there is a connection,
> rather than checking the backend type. That way, if the list of
> backend types that have client connections changes later, there's
> nothing to update. Like "WHERE client_port IS NOT NULL," or something
> of that sort.

Yeah, using a "this has a connection" would be better and, as also noted
on this thread, pg_stat_gssapi should get similar treatment.

Based on what we claim in our docs, it does look like 'client_port IS
NOT NULL' should work.  I do think we might want to update the docs to
make it a bit more explicit, what we say now is:

TCP port number that the client is using for communication with this
backend, or -1 if a Unix socket is used

We don't explain there that NULL means the backend doesn't have an
external connection even though plenty of those entries show up in every
instance of PG.  Perhaps we should add this:

If this field is null, it indicates that this is an internal process
such as autovacuum.

Which is what we say for 'client_addr'.

I have to admit that while it's handy that we just shove '-1' into
client_port when it's a unix socket, it's kind of ugly from a data
perspective; in a green field it'd probably be better to have a
"connection type" field that then indicates which other fields are valid
instead of having a special constant, but that ship sailed long ago and
it's not like we have a lot of people complaining about it, so I suppose
just using it here as suggested is fine.
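
For the record, expressed against today's views the proposed filter amounts
to something like this (a sketch only; the patch posted later in the thread
bakes the condition into the view definitions instead):

    -- show only backends that actually have an external connection
    SELECT s.pid, s.ssl, s.version, s.cipher, a.backend_type
      FROM pg_stat_ssl s
      JOIN pg_stat_activity a USING (pid)
     WHERE a.client_port IS NOT NULL;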

Thanks,

Stephen


signature.asc
Description: PGP signature


Do we have a CF manager for November?

2019-11-04 Thread Tom Lane
It's time to start the next commitfest.  I seem to recall somebody
saying back in September that they'd run the next one, but I forget
who.  Anyway, we need a volunteer to be chief nagger.

regards, tom lane




Re: Missed check for too-many-children in bgworker spawning

2019-11-04 Thread Alvaro Herrera
On 2019-Oct-09, Tom Lane wrote:

> Robert Haas  writes:
> > On Wed, Oct 9, 2019 at 10:21 AM Tom Lane  wrote:
> >> We could improve on matters so far as the postmaster's child-process
> >> arrays are concerned, by defining separate slot "pools" for the different
> >> types of child processes.  But I don't see much point if the code is
> >> not prepared to recover from a fork() failure --- and if it is, that
> >> would a fortiori deal with out-of-child-slots as well.
> 
> > I would say rather that if fork() is failing on your system, you have
> > a not very stable system. The fact that parallel query is going to
> > fail is sad, but not as sad as the fact that connecting to the
> > database is also going to fail, and that logging into the system to
> > try to fix the problem may well fail as well.
> 
> True, it's not a situation you especially want to be in.  However,
> I've lost count of the number of times that I've heard someone talk
> about how their system was overstressed to the point that everything
> else was failing, but Postgres kept chugging along.  That's a good
> reputation to have and we shouldn't just walk away from it.

I agree with this point in principle.  Everything else (queries,
checkpointing) can fail, but it's critical that postmaster continues to
run -- that way, once the high load episode is over, connections can be
re-established as needed, auxiliary processes can be re-launched, and
the system can be again working normally.  If postmaster dies, all bets
are off.  Also: an idle postmaster is not using any resources; on its
own, killing it or it dying would not free any useful resources for the
system load to be back to low again.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: adding partitioned tables to publications

2019-11-04 Thread Rafia Sabih
Hi Amit,

On Fri, 11 Oct 2019 at 08:06, Amit Langote  wrote:

>
> Thanks for sharing this case.  I hadn't considered it, but you're
> right that it should be handled sensibly.  I have fixed table sync
> code to handle this case properly.  Could you please check your case
> with the attached updated patch?
>
I was checking this today and found that the behavior doesn't change much
with the updated patch. The tables are still replicated, just that a select
count from the parent table shows 0; the rest of the partitions, including
the default one, have the data from the publisher. I was expecting more like
an error at the subscriber saying the table type is not the same.

Please find the attached file for the test case, in case something is
unclear.

-- 
Regards,
Rafia Sabih
-- publisher
create table t (i int, j int, k text) partition by range(i);
create table child1 partition of t for values from ( 1) to (100);
create table child2 partition of t for values from (100) to (200);
create table child3 partition of t for values from (200) to (300);
create table child4 partition of t for values from (300) to (400);
create table child5 partition of t for values from (400) to (500);
create table def partition of t DEFAULT;

create publication mypub;
alter publication mypub add table t;

insert into t values (generate_series(1,500), generate_series(1,500), 
repeat('jqbsuyt7832edjw', 20));

--subscriber
create table t (i int, j int, k text);
create table child1 (i int, j int, k text);
create table child5 (i int, j int, k text);
create table child4 (i int, j int, k text);
create table child3 (i int, j int, k text);
create table child2 (i int, j int, k text);
create table def (i int, j int, k text);

create subscription mysub connection 'host=localhost port=5432 dbname=postgres' 
publication mypub;



Re: auxiliary processes in pg_stat_ssl

2019-11-04 Thread Euler Taveira
On Wed, Sep 4, 2019 at 12:15, Alvaro Herrera wrote:
>
> I just noticed that we list auxiliary processes in pg_stat_ssl:
>
> 55432 13devel 28627=# select * from pg_stat_ssl ;
>   pid  │ ssl │ version │         cipher         │ bits │ compression │ client_dn │ client_serial │ issuer_dn
> ───────┼─────┼─────────┼────────────────────────┼──────┼─────────────┼───────────┼───────────────┼───────────
>  28618 │ f   │         │                        │      │             │           │               │
>  28620 │ f   │         │                        │      │             │           │               │
>  28627 │ t   │ TLSv1.3 │ TLS_AES_256_GCM_SHA384 │  256 │ f           │           │               │
>  28616 │ f   │         │                        │      │             │           │               │
>  28615 │ f   │         │                        │      │             │           │               │
>  28617 │ f   │         │                        │      │             │           │               │
> (6 filas)
>
> 55432 13devel 28627=# select pid, backend_type from pg_stat_activity ;
>   pid  │ backend_type
> ───┼──
>  28618 │ autovacuum launcher
>  28620 │ logical replication launcher
>  28627 │ client backend
>  28616 │ background writer
>  28615 │ checkpointer
>  28617 │ walwriter
> (6 filas)
>
> But this seems pointless.  Should we not hide those?  Seems this only
> happened as an unintended side-effect of fc70a4b0df38.  It appears to me
> that we should redefine that view to restrict backend_type that's
> 'client backend' (maybe include 'wal receiver'/'wal sender' also, not
> sure.)
>
Yep, it is pointless. The backend types that open connections to the
server are: autovacuum worker, client backend, background worker, and
wal sender. I also notice that pg_stat_gssapi is in the same boat as
pg_stat_ssl, and we should constrain the rows to backend types that open
connections. I'm attaching a patch to list only connections in those
system views.



--
   Euler Taveira                  Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
From aca42a2a3dc95fa2b4a1e6a1960d5cc834850034 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 4 Nov 2019 14:45:38 +
Subject: [PATCH] Show only rows that open connections in some system views

It is pointless to show auxiliary processes that do not open connections
to Postgres in pg_stat_ssl and pg_stat_gssapi. This change affects
compatibility with previous versions.
---
 src/backend/catalog/system_views.sql | 6 --
 src/test/regress/expected/rules.out  | 6 --
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 9fe4a47..ce39808 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -826,7 +826,8 @@ CREATE VIEW pg_stat_ssl AS
 S.ssl_client_dn AS client_dn,
 S.ssl_client_serial AS client_serial,
 S.ssl_issuer_dn AS issuer_dn
-FROM pg_stat_get_activity(NULL) AS S;
+FROM pg_stat_get_activity(NULL) AS S
+	WHERE S.client_port IS NOT NULL;
 
 CREATE VIEW pg_stat_gssapi AS
 SELECT
@@ -834,7 +835,8 @@ CREATE VIEW pg_stat_gssapi AS
 S.gss_auth AS gss_authenticated,
 S.gss_princ AS principal,
 S.gss_enc AS encrypted
-FROM pg_stat_get_activity(NULL) AS S;
+FROM pg_stat_get_activity(NULL) AS S
+	WHERE S.client_port IS NOT NULL;
 
 CREATE VIEW pg_replication_slots AS
 SELECT
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd..14e7214 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1845,7 +1845,8 @@ pg_stat_gssapi| SELECT s.pid,
 s.gss_auth AS gss_authenticated,
 s.gss_princ AS principal,
 s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc);
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc)
+  WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_cluster| SELECT s.pid,
 s.datid,
 d.datname,
@@ -1964,7 +1965,8 @@ pg_stat_ssl| SELECT s.pid,
 s.ssl_client_dn AS client_dn,
 s.ssl_client_serial AS cl

Re: Collation versioning

2019-11-04 Thread Julien Rouhaud
On Mon, Nov 4, 2019 at 11:13 AM Julien Rouhaud  wrote:
>
> On Mon, Nov 4, 2019 at 4:58 AM Thomas Munro  wrote:
> >
> > Here are some problems to think about:
> >
> > * We'd need to track dependencies on the default collation once we
> > have versioning for that [...]

Another problem I just thought about is how to avoid discrepancy of
collation version for indexes on shared objects, such as
pg_database_datname_index.




Re: v12 and pg_restore -f-

2019-11-04 Thread Euler Taveira
On Mon, Nov 4, 2019 at 11:53, Alvaro Herrera wrote:
>
> On 2019-Oct-17, Tom Lane wrote:
>
> > Stephen Frost  writes:
> > > First, I'd like to clarify what I believe Tom's suggestion is, and then
> > > talk through that, as his vote sways this topic pretty heavily.
> >
> > > Tom, I take it your suggestion is to have '-f -' be accepted to mean
> > > 'goes to stdout' in all branches?
> >
> > Yes.
>
> +1 for this, FWIW.  Let's get it done before next week minors.  Is
> anybody writing a patch?  If not, I can do it.
>
I'm not.

> > > If you meant for all branches to accept '-f -' and have it go to a './-'
> > > file then that's just a revert of this entire change, which I can't
> > > agree with either
> >
> > No, I'm not proposing a full revert.  But there's certainly room to
> > consider reverting the part that says you *must* write "-f -" to get
> > output to stdout.
>
> I don't think this will buy us anything, if we get past branches updated
> promptly.
>
+1.


-- 
   Euler Taveira                  Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: 64 bit transaction id

2019-11-04 Thread Tomas Vondra

On Mon, Nov 04, 2019 at 04:39:44PM +0300, Павел Ерёмин wrote:

  And yet, if I try to implement a similar mechanism, if successful, will my
  revision be considered?
   


Why wouldn't it be considered? If you submit a patch that demonstrably
improves the behavior (in this case reduces per-tuple overhead without
causing significant issues elsewhere), we'd be crazy not to consider it.

The bar is pretty high, though, because this touches one of the core
pieces of the database.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: v12 and pg_restore -f-

2019-11-04 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Oct-17, Tom Lane wrote:
>> Stephen Frost  writes:
>>> Tom, I take it your suggestion is to have '-f -' be accepted to mean
>>> 'goes to stdout' in all branches?

>> Yes.

> +1 for this, FWIW.  Let's get it done before next week minors.  Is
> anybody writing a patch?  If not, I can do it.

Please do.

>> No, I'm not proposing a full revert.  But there's certainly room to
>> consider reverting the part that says you *must* write "-f -" to get
>> output to stdout.

> I don't think this will buy us anything, if we get past branches updated
> promptly.

I'm okay with that approach.

regards, tom lane




Re: alternative to PG_CATCH

2019-11-04 Thread Tom Lane
Peter Eisentraut  writes:
> On 2019-11-02 15:36, Tom Lane wrote:
>> I hadn't actually tested this patch before commit, but now that
>> it's in, I'm seeing assorted compiler warnings:

> I've fixed the ones that I could reproduce on CentOS 6.  I haven't seen 
> any on a variety of newer systems.

I'd hoped for a way to fix PG_FINALLY rather than having to band-aid
the individual callers :-(.  But maybe there isn't one.

Now that I've actually looked at the patched code, there's a far
more severe problem with it.  Namely, that use of PG_FINALLY
means that the "finally" segment is run without having popped
the error context stack, which means that any error thrown
within that segment will sigsetjmp right back to the top,
resulting in an infinite loop.  (Well, not infinite, because
it'll crash out once the error nesting depth limit is hit.)
We *must* pop the stack before running recovery code.

Possibly this could be fixed like so:

#define PG_FINALLY() \
} \
else \
{ \
PG_exception_stack = _save_exception_stack; \
error_context_stack = _save_context_stack; \
_do_rethrow = true

#define PG_END_TRY()  \
} \
if (_do_rethrow) \
PG_RE_THROW(); \
PG_exception_stack = _save_exception_stack; \
error_context_stack = _save_context_stack; \
} while (0)

But I haven't tested that.
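
To illustrate the intended behavior (a sketch only, with made-up helper
functions, not code from the patch): with the definitions above, an error
thrown inside the finally block would propagate normally instead of landing
back in the same sigsetjmp:

	PG_TRY();
	{
		do_risky_work();            /* hypothetical; may ereport(ERROR) */
	}
	PG_FINALLY();
	{
		/* runs on success and on error, after the error stack has been popped */
		release_temp_resources();   /* hypothetical; may itself ereport(ERROR) */
	}
	PG_END_TRY();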

regards, tom lane




Re: cost based vacuum (parallel)

2019-11-04 Thread Masahiko Sawada
On Mon, 4 Nov 2019 at 19:26, Amit Kapila  wrote:
>
> On Mon, Nov 4, 2019 at 1:51 PM Masahiko Sawada  wrote:
> >
> > On Mon, Nov 4, 2019 at 3:54 PM Amit Kapila  wrote:
> > >
> > > I think approach-2 is better in throttling the system as it doesn't
> > > have the drawback of the first approach, but it might be a bit tricky
> > > to implement.
> >
> > I might be missing something but I think that there could be the
> > drawback of the approach-1 even on approach-2 depending on index pages
> > loaded on the shared buffer and the vacuum delay setting.
> >
>
> Can you be a bit more specific about this?

Suppose there are two indexes: one is fully loaded in shared buffers
while the other isn't loaded at all. The vacuum worker that processes
the former index hits all pages in shared buffers, but the worker that
processes the latter index reads all pages from either the OS page cache
or disk. Even if both the cost limit and the cost balance are split
evenly among workers, because the costs of page hits and page misses are
different, it's possible that one vacuum worker sleeps while other
workers are doing I/O.
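
To put rough numbers on it (an illustration only, assuming the default
cost parameters vacuum_cost_page_hit = 1 and vacuum_cost_page_miss = 10,
and a limit of 200 split evenly so each of two workers gets 100): the
worker on the cached index accrues cost 1 per page and naps only after
every 100 pages, while the worker on the uncached index accrues cost 10
per page and naps after every 10 pages, even though it is the one doing
the actual I/O.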

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: v12 and pg_restore -f-

2019-11-04 Thread Alvaro Herrera
On 2019-Oct-17, Tom Lane wrote:

> Stephen Frost  writes:
> > First, I'd like to clarify what I believe Tom's suggestion is, and then
> > talk through that, as his vote sways this topic pretty heavily.
> 
> > Tom, I take it your suggestion is to have '-f -' be accepted to mean
> > 'goes to stdout' in all branches?
> 
> Yes.

+1 for this, FWIW.  Let's get it done before next week minors.  Is
anybody writing a patch?  If not, I can do it.

> > If you meant for all branches to accept '-f -' and have it go to a './-'
> > file then that's just a revert of this entire change, which I can't
> > agree with either
> 
> No, I'm not proposing a full revert.  But there's certainly room to
> consider reverting the part that says you *must* write "-f -" to get
> output to stdout.

I don't think this will buy us anything, if we get past branches updated
promptly.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: auxiliary processes in pg_stat_ssl

2019-11-04 Thread Robert Haas
On Mon, Nov 4, 2019 at 8:26 AM Alvaro Herrera  wrote:
> On 2019-Sep-04, Alvaro Herrera wrote:
> > I just noticed that we list auxiliary processes in pg_stat_ssl:
> [...]
> > But this seems pointless.  Should we not hide those?  Seems this only
> > happened as an unintended side-effect of fc70a4b0df38.  It appears to me
> > that we should redefine that view to restrict backend_type that's
> > 'client backend' (maybe include 'wal receiver'/'wal sender' also, not
> > sure.)
>
> [crickets]
>
> Robert, Kuntal, any opinion on this?

I think if I were doing something about it, I'd probably try to filter
on a field that directly represents whether there is a connection,
rather than checking the backend type. That way, if the list of
backend types that have client connections changes later, there's
nothing to update. Like "WHERE client_port IS NOT NULL," or something
of that sort.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: 64 bit transaction id

2019-11-04 Thread Павел Ерёмин
And yet, if I try to implement a similar mechanism, if successful, will my
revision be considered?

regards

03.11.2019, 22:15, "Tomas Vondra" :

> On Sun, Nov 03, 2019 at 02:17:15PM +0300, Павел Ерёмин wrote:
> >   I completely agree with all of the above. Therefore, the proposed
> >   mechanism may entail larger improvements (and not only VACUUM).
>
> I think the best thing you can do is try implementing this ...
>
> I'm afraid the "improvements" essentially mean making various important
> parts of the system much more complicated and expensive. There's a
> trade-off between saving 8B per row and additional overhead (during
> vacuum etc.), and it does not seem like a winning strategy. What started
> as "we can simply look at the next row version" is clearly way more
> complicated and expensive.
>
> The trouble here is that it adds a dependency between pages in the data
> file. That for example means that during cleanup of a page it may be
> necessary to modify the other page, when originally that would be
> read-only in that checkpoint interval. That's essentially write
> amplification, and may significantly increase the amount of WAL due to
> generating FPW for the other page.
>
> >   I can offer the following solution.
> >   For VACUUM, create a hash table.
> >   VACUUM scanning the table sees that the version (tuple1) has t_ctid filled
> >   and refers to the address tuple2, it creates a structure into which it
> >   writes the address tuple1, tuple1.xid, length tuple1 (well, and other
> >   information that is needed), puts this structure in the hash table by key
> >   tuple2 addresses.
> >   VACUUM reaches tuple2, checks the address of tuple2 in the hash table - if
> >   it finds it, it evaluates the connection between them and makes a decision
> >   on cleaning.
>
> We know VACUUM is already pretty expensive, so making it even more
> expensive seems pretty awful. And the proposed solution seems damn
> expensive. We already do something similar for indexes - we track
> pointers for removed rows, so that we can remove them from indexes. And
> it's damn expensive because we don't know where in the index the tuples
> are - so we have to scan the whole indexes.
>
> This would mean we have to do the same thing for the table, because we
> don't know where in the table the older versions of those rows are,
> because we don't know where the other rows are. That seems mighty
> expensive.
>
> Not to mention that this does nothing for page-level vacuum, which we
> do when trying to fit another row on a page (e.g. for HOT). This has to
> be absolutely cheap, we certainly are not going to do lookups of other
> pages or looking for older versions of the row, and so on.
>
> Being able to do visibility decisions based on the tuple alone (or
> possibly page-level + tuple information) has a lot of value, and I don't
> think we want to make this more complicated.
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: auxiliary processes in pg_stat_ssl

2019-11-04 Thread Alvaro Herrera
On 2019-Sep-04, Alvaro Herrera wrote:

> I just noticed that we list auxiliary processes in pg_stat_ssl:
[...]
> But this seems pointless.  Should we not hide those?  Seems this only
> happened as an unintended side-effect of fc70a4b0df38.  It appears to me
> that we should redefine that view to restrict backend_type that's
> 'client backend' (maybe include 'wal receiver'/'wal sender' also, not
> sure.)

[crickets]

Robert, Kuntal, any opinion on this?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: WIP/PoC for parallel backup

2019-11-04 Thread Asif Rehman
On Fri, Nov 1, 2019 at 8:53 PM Robert Haas  wrote:

> On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman 
> wrote:
> > 'startptr' is used by sendFile() during checksum verification. Since
> > SendBackupFiles() is using sendFIle we have to set a valid WAL location.
>
> Ugh, global variables.
>
> Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
> STOP_BACKUP all using the same base_backup_opt_list production as
> BASE_BACKUP? Presumably most of those options are not applicable to
> most of those commands, and the productions should therefore be
> separated.
>

Are you expecting something like the attached patch? Basically I have
reorganised the grammar
rules so each command can have the options required by it.

I was feeling a bit reluctant about this change because it may add some
unwanted grammar rules to the replication grammar. Since these commands
use the same options as base backup, maybe we could instead throw an
error inside the relevant functions on unwanted options?



> You should add docs, too.  I wouldn't have to guess what some of this
> stuff was for if you wrote documentation explaining what this stuff
> was for. :-)
>

Yes I will add it in the next patch.


>
> >> The tablespace_path option appears entirely unused, and I don't know
> >> why that should be necessary here, either.
> >
> > This is to calculate the basepathlen. We need to exclude the tablespace
> location (or
> > base path) from the filename before it is sent to the client with
> sendFile call. I added
> > this option primarily to avoid performing string manipulation on
> filename to extract the
> > tablespace location and then calculate the basepathlen.
> >
> > Alternatively we can do it by extracting the base path from the received
> filename. What
> > do you suggest?
>
> I don't think the server needs any information from the client in
> order to be able to exclude the tablespace location from the pathname.
> Whatever it needs to know, it should be able to figure out, just as it
> would in a non-parallel backup.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca


repl_grammar.patch
Description: Binary data


[proposal] recovery_target "latest"

2019-11-04 Thread Grigory Smolkin

Hello, hackers!

I`d like to propose a new argument for recovery_target parameter, which 
will stand to recovering until all available WAL segments are applied.


Current PostgreSQL recovery default behavior(when no recovery target is 
provided) does exactly that, but there are several shortcomings:
  - without explicit recovery target standing for default behavior, 
recovery_target_action is not coming to action at the end of recovery
  - with PG12 changes, the life of all backup tools became very hard, 
because now recovery parameters can be set outside of single config 
file(recovery.conf), so it is impossible to ensure, that default 
recovery behavior, desired in some cases, will not be silently 
overwritten by some recovery parameter forgotten by user.


Proposed patch is very simple and solves the aforementioned problems by 
introducing new argument "latest" for recovery_target parameter.


Old recovery behavior is still available if no recovery target is 
provided. I`m not sure whether it should be left as it is now, or not.


Another open question is what to do with recovery_target_inclusive if 
recovery_target = "latest" is used.
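
For illustration, with the proposed patch applied the setup in
postgresql.conf could look something like this (a sketch only; the archive
path is made up, and recovery_target_action keeps its existing values):

restore_command = 'cp /mnt/wal_archive/%f "%p"'
recovery_target = 'latest'              # replay all available WAL, then...
recovery_target_action = 'promote'      # ...act, instead of silently ending recovery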


--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0191ec84b1..49675c38da 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3366,21 +3366,19 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
 
  
  
-  recovery_target = 'immediate'
+  recovery_target (string)
   
 recovery_target configuration parameter
   
   
   

-This parameter specifies that recovery should end as soon as a
-consistent state is reached, i.e. as early as possible. When restoring
-from an online backup, this means the point where taking the backup
-ended.
-   
-   
-Technically, this is a string parameter, but 'immediate'
-is currently the only allowed value.
+This parameter determines how far recovery should proceed. The value
+immediate means that recovery should end as soon as a consistent
+state is reached, i.e. as early as possible. When restoring from an online
+backup, this means the point where taking the backup ended.
+The second possible value latest means that recovery
+should proceed to the end of the available WAL log.

   
  
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2e3cc51006..faab0b2b4c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6345,6 +6345,9 @@ StartupXLOG(void)
 		else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
 			ereport(LOG,
 	(errmsg("starting point-in-time recovery to earliest consistent point")));
+		else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+			ereport(LOG,
+	(errmsg("starting point-in-time recovery until all available WAL is applied")));
 		else
 			ereport(LOG,
 	(errmsg("starting archive recovery")));
@@ -7257,6 +7260,13 @@ StartupXLOG(void)
 			 * end of main redo apply loop
 			 */
 
+			/*
+			 * If all available WAL is replayed,
+			 * then RECOVERY_TARGET_LATEST is satisfied
+			 */
+			if (recoveryTarget == RECOVERY_TARGET_LATEST)
+reachedStopPoint = true;
+
 			if (reachedStopPoint)
 			{
 if (!reachedConsistency)
@@ -7459,6 +7469,8 @@ StartupXLOG(void)
 	 recoveryStopName);
 		else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
 			snprintf(reason, sizeof(reason), "reached consistency");
+		else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+			snprintf(reason, sizeof(reason), "applied all available WAL");
 		else
 			snprintf(reason, sizeof(reason), "no recovery target specified");
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 31a5ef0474..b652304683 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3503,7 +3503,10 @@ static struct config_string ConfigureNamesString[] =
 
 	{
 		{"recovery_target", PGC_POSTMASTER, WAL_RECOVERY_TARGET,
-			gettext_noop("Set to \"immediate\" to end recovery as soon as a consistent state is reached."),
+			gettext_noop("Set to \"immediate\" to end recovery as "
+		 "soon as a consistent state is reached or "
+		 " to \"latest\" to end recovery after "
+		 "replaying all available WAL in the archive."),
 			NULL
 		},
 		&recovery_target_string,
@@ -11527,9 +11530,10 @@ error_multiple_recovery_targets(void)
 static bool
 check_recovery_target(char **newval, void **extra, GucSource source)
 {
-	if (strcmp(*newval, "immediate") != 0 && strcmp(*newval, "") != 0)
+	if (strcmp(*newval, "immediate") != 0 && strcmp(*newval, "latest") != 0 &&
+		strcmp(*newval, "") != 0)
 	{
-		GUC_check_errdetail("The only allowed value is \"immediate\".");
+		GUC_check_errdetail("The only allowed values are \"immedi

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-11-04 Thread Amit Kapila
On Wed, Oct 30, 2019 at 9:38 AM vignesh C  wrote:
>
> On Tue, Oct 22, 2019 at 10:52 PM Tomas Vondra
>  wrote:
> >
> > I think the patch should do the simplest thing possible, i.e. what it
> > does today. Otherwise we'll never get it committed.
> >
> I found a couple of crashes while reviewing and testing flushing of
> open transaction data:
>

Thanks for doing these tests.  However, I don't think these issues are
anyway related to this patch.  It seems to be base code issues
manifested by this patch.  See my analysis below.

> Issue 1:
> #0  0x7f22c5722337 in raise () from /lib64/libc.so.6
> #1  0x7f22c5723a28 in abort () from /lib64/libc.so.6
> #2  0x00ec5390 in ExceptionalCondition
> (conditionName=0x10ea814 "!dlist_is_empty(head)", errorType=0x10ea804
> "FailedAssertion",
> fileName=0x10ea7e0 "../../../../src/include/lib/ilist.h",
> lineNumber=458) at assert.c:54
> #3  0x00b4fb91 in dlist_tail_element_off (head=0x19e4db8,
> off=64) at ../../../../src/include/lib/ilist.h:458
> #4  0x00b546d0 in ReorderBufferAbortOld (rb=0x191b6b0,
> oldestRunningXid=3834) at reorderbuffer.c:1966
> #5  0x00b3ca03 in DecodeStandbyOp (ctx=0x19af990,
> buf=0x7ffcbc26dc50) at decode.c:332
>

This seems to be the problem of base code where we abort immediately
after serializing the changes because in that case, the changes list
will be empty.  I think you can try to reproduce it via the debugger
or by hacking the code such that it serializes after every change and
then if you abort after one change, it should hit this problem.

>
> Issue 2:
> #0  0x7f1d7ddc4337 in raise () from /lib64/libc.so.6
> #1  0x7f1d7ddc5a28 in abort () from /lib64/libc.so.6
> #2  0x00ec4e1d in ExceptionalCondition
> (conditionName=0x10ead30 "txn->final_lsn != InvalidXLogRecPtr",
> errorType=0x10ea284 "FailedAssertion",
> fileName=0x10ea2d0 "reorderbuffer.c", lineNumber=3052) at assert.c:54
> #3  0x00b577e0 in ReorderBufferRestoreCleanup (rb=0x2ae36b0,
> txn=0x2bafb08) at reorderbuffer.c:3052
> #4  0x00b52b1c in ReorderBufferCleanupTXN (rb=0y x2ae36b0,
> txn=0x2bafb08) at reorderbuffer.c:1318
> #5  0x00b5279d in ReorderBufferCleanupTXN (rb=0x2ae36b0,
> txn=0x2b9d778) at reorderbuffer.c:1257
> #6  0x00b5475c in ReorderBufferAbortOld (rb=0x2ae36b0,
> oldestRunningXid=3835) at reorderbuffer.c:1973
>

This seems to be again the problem with base code as we don't update
the final_lsn for subtransactions during ReorderBufferAbortOld.  This
can also be reproduced with some hacking in code or via debugger in a
similar way as explained for the previous problem but with a
difference that there must be subtransaction involved in this case.

> #7  0x00b3ca03 in DecodeStandbyOp (ctx=0x2b676d0,
> buf=0x7ffcbc74cc00) at decode.c:332
> #8  0x00b3c208 in LogicalDecodingProcessRecord (ctx=0x2b676d0,
> record=0x2b67990) at decode.c:121
> #9  0x00b70b2b in XLogSendLogical () at walsender.c:2845
>
> These failures come randomly.
> I'm not able to reproduce this issue with simple test case.

Yeah, it appears to be difficult to reproduce unless you hack the code
to serialize every change or use debugger to forcefully flush the
changes every time.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: pg_waldump erroneously outputs newline for FPWs, and another minor bug

2019-11-04 Thread Jehan-Guillaume de Rorthais
On Wed, 30 Oct 2019 09:26:21 +0900
Michael Paquier  wrote:

> On Tue, Oct 29, 2019 at 04:42:07PM -0700, Peter Geoghegan wrote:
> > The same commit from Heikki omitted one field from that record, for no
> > good reason. I backpatched a bugfix to the output format for nbtree
> > page splits a few weeks ago, fixing that problem. I agree that we
> > should also backpatch this bugfix.  
> 
> The output format of pg_waldump may matter for some tools, like
> Jehan-Guillaume's PAF [1], but I am ready to bet that any tools like
> that just skip any noise newlines, so +1 for a backpatch.
> 
> I am adding Jehan-Guillaume in CC just in case.

Thank you Michael!




Re: [PATCH] contrib/seg: Fix PG_GETARG_SEG_P definition

2019-11-04 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> ilm...@ilmari.org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=) writes:
>> I just noticed that when contrib/seg was converted to V1 calling
>> convention (commit 389bb2818f4), the PG_GETARG_SEG_P() macro got defined
>> in terms of PG_GETARG_POINTER().  But it itself calls DatumGetPointer(),
>> so shouldn't it be using PG_GETARG_DATUM()?
>
> Yup, I agree.  Pushed.

Thanks!

>   regards, tom lane

- ilmari
-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
 the consequences of."  -- Skud's Meta-Law




Re: Minimal logical decoding on standbys

2019-11-04 Thread Amit Khandekar
On Thu, 10 Oct 2019 at 05:49, Craig Ringer  wrote:
>
> On Tue, 1 Oct 2019 at 02:08, Robert Haas  wrote:
>>
>>
>> Why does create_logical_slot_on_standby include sleep(1)?
>
>
> Yeah, we really need to avoid sleeps in regression tests.

Yeah, have already got rid of the sleeps from the patch-series version
4 onwards.

By the way, the couple of patches out of the patch series had
bitrotten. Attached is the rebased version.

Thanks
-Amit Khandekar


logicaldecodng_standby_v4_rebased.tar.gz
Description: application/gzip


Re: adding partitioned tables to publications

2019-11-04 Thread Peter Eisentraut
This patch seems excessively complicated to me.  Why don't you just add 
the actual partitioned table to pg_publication_rel and then expand the 
partition hierarchy in pgoutput (get_rel_sync_entry() or 
GetRelationPublications() or somewhere around there).  Then you don't 
need to do any work in table DDL to keep the list of published tables up 
to date.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: cost based vacuum (parallel)

2019-11-04 Thread Amit Kapila
On Mon, Nov 4, 2019 at 1:03 PM Darafei "Komяpa" Praliaskouski
 wrote:
>>
>>
>> This is somewhat similar to a memory usage problem with a
>> parallel query where each worker is allowed to use up to work_mem of
>> memory.  We can say that the users using parallel operation can expect
>> more system resources to be used as they want to get the operation
>> done faster, so we are fine with this.  However, I am not sure if that
>> is the right thing, so we should try to come up with some solution for
>> it and if the solution is too complex, then probably we can think of
>> documenting such behavior.
>
>
> In cloud environments (Amazon + gp2) there's a budget on input/output 
> operations. If you cross it for long time, everything starts looking like you 
> work with a floppy disk.
>
> For the ease of configuration, I would need a "max_vacuum_disk_iops" that 
> would limit number of input-output operations by all of the vacuums in the 
> system. If I set it to less than value of budget refill, I can be sure than 
> that no vacuum runs too fast to impact any sibling query.
>
> There's also value in non-throttled VACUUM for smaller tables. On gp2 such 
> things will be consumed out of surge budget, and its size is known to 
> sysadmin. Let's call it "max_vacuum_disk_surge_iops" - if a relation has less 
> blocks than this value and it's a blocking in any way situation 
> (antiwraparound, interactive console, ...) - go on and run without throttling.
>

I think the need for these things can be addressed by current
cost-based-vacuum parameters. See docs [1].  For example, if you set
vacuum_cost_delay as zero, it will allow the operation to be performed
without throttling.
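
For example (a sketch of the existing knobs only, with made-up table names):

SET vacuum_cost_delay = 0;                 -- unthrottled manual VACUUM in this session
VACUUM (VERBOSE) small_hot_table;

ALTER TABLE big_table
  SET (autovacuum_vacuum_cost_delay = 0);  -- unthrottled autovacuum for this table only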

> For how to balance the cost: if we know a number of vacuum processes that 
> were running in the previous second, we can just divide a slot for this 
> iteration by that previous number.
>
> To correct for overshots, we can subtract the previous second's overshot from 
> next one's. That would also allow to account for surge budget usage and let 
> it refill, pausing all autovacuum after a manual one for some time.
>
> Precision of accounting limiting count of operations more than once a second 
> isn't beneficial for this use case.
>

I think it is better if we find a way to rebalance the cost on some
worker exit rather than every second as anyway it won't change unless
any worker exits.

[1] - 
https://www.postgresql.org/docs/devel/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-VACUUM-COST

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: cost based vacuum (parallel)

2019-11-04 Thread Amit Kapila
On Mon, Nov 4, 2019 at 1:51 PM Masahiko Sawada  wrote:
>
> On Mon, Nov 4, 2019 at 3:54 PM Amit Kapila  wrote:
> >
> > I think approach-2 is better in throttling the system as it doesn't
> > have the drawback of the first approach, but it might be a bit tricky
> > to implement.
>
> I might be missing something but I think that there could be the
> drawback of the approach-1 even on approach-2 depending on index pages
> loaded on the shared buffer and the vacuum delay setting.
>

Can you be a bit more specific about this?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: alternative to PG_CATCH

2019-11-04 Thread Peter Eisentraut

On 2019-11-02 15:36, Tom Lane wrote:

I hadn't actually tested this patch before commit, but now that
it's in, I'm seeing assorted compiler warnings:


I've fixed the ones that I could reproduce on CentOS 6.  I haven't seen 
any on a variety of newer systems.


It's not clear why only a handful of cases cause warnings, but my guess 
is that the functions are above some size/complexity threshold beyond 
which those older compilers give up doing a full analysis.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-11-04 Thread vignesh C
On Thu, Oct 24, 2019 at 7:07 PM Amit Kapila  wrote:
>
> On Tue, Oct 22, 2019 at 10:30 AM Dilip Kumar  wrote:
> >
> > I have merged bugs_and_review_comments_fix.patch changes to 0001 and 0002.
> >
>
> I was wondering whether we have checked the code coverage after this
> patch?  Previously, the existing tests seem to be covering most parts
> of the function ReorderBufferSerializeTXN [1].  After this patch, the
> timing to call ReorderBufferSerializeTXN will change, so that might
> impact the testing of the same.  If it is already covered, then I
> would like to either add a new test or extend existing test with the
> help of new spill counters.  If it is not getting covered, then we
> need to think of extending the existing test or write a new test to
> cover the function ReorderBufferSerializeTXN.
>
I have run the tests with coverage and found that
ReorderBufferSerializeTXN is not being hit.
The reason it is not being hit is because of the following check in
ReorderBufferCheckMemoryLimit:
/* bail out if we haven't exceeded the memory limit */
if (rb->size < logical_decoding_work_mem * 1024L)
return;
Previously the tests from contrib/test_decoding could hit
ReorderBufferSerializeTXN function.
I'm checking if we can modify the test or add new test to hit
ReorderBufferSerializeTXN function.
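
Something along these lines should push rb->size over the limit and force
serialization (a rough sketch, assuming the logical_decoding_work_mem GUC
from this patch set, wal_level = logical, and a made-up table):

SET logical_decoding_work_mem = '64kB';   -- deliberately small, to force spilling
SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
CREATE TABLE spill_test (id int, payload text);
BEGIN;
INSERT INTO spill_test SELECT g, repeat('x', 1000) FROM generate_series(1, 5000) g;
COMMIT;
SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
SELECT pg_drop_replication_slot('regression_slot');
DROP TABLE spill_test;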

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Collation versioning

2019-11-04 Thread Julien Rouhaud
Hello Thomas,

On Mon, Nov 4, 2019 at 4:58 AM Thomas Munro  wrote:
>
> On Fri, Nov 1, 2019 at 2:21 AM Julien Rouhaud  wrote:
> > Are you planning to continue working on it?  For the record, that's
> > something needed to be able to implement a filter in REINDEX command
> > [1].
>
> Bonjour Julien,
>
> Unfortunately I haven't had time to work on it seriously, but here's a
> quick rebase to get the proof-of-concept back into working shape.
> It's nice to see progress in other bits of the problem-space.  I hope
> to have time to look at this patch set again soon, but if you or
> someone else would like hack on or think about it too, please feel
> free!

Thanks!  I already did some hack on it when looking at the code so I
can try to make some progress.

> Yes indeed this is exactly the same problem that you're trying to
> solve, approached from a different starting point.
>
> Here are some problems to think about:
>
> * We'd need to track dependencies on the default collation once we
> have versioning for that (see
> https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c161a%402ndquadrant.com).
> That is how most people actually consume collations out there in real
> life, and yet we don't normally track dependencies on the default
> collation and I don't know if that's simply a matter of ripping out
> all the code that looks like "xxx != DEFAULT_COLLATION_ID" in the
> dependency analysis code or if there's more to it.

This isn't enough.  What would remain is:

- teach get_collation_version_for_oid() to return the default
collation name, which is simple
- have recordDependencyOnVersion() actually record the dependency,
which wouldn't happen as the default collation is pinned.

An easy fix would be to teach isObjectPinned() to ignore
CollationRelationId / DEFAULT_COLLATION_OID, but that's ugly and would
allow too many dependencies to be stored.  Not pinning the default
collation during initdb doesn't sound a good alternative either.
Maybe adding a force flag or a new DependencyType that'd mean "normal
but forced" would be ok?

> * Andres mentioned off-list that pg_depend rows might get blown away
> and recreated in some DDL circumstances.  We need to look into that.
> * Another is that pg_upgrade won't preserve pg_depend rows, so you'd
> need some catalog manipulation (direct or via new DDL) to fix that.
> * Some have expressed doubt that pg_depend is the right place for
> this; let's see if any counter-proposals appear.

When working on the REINDEX FILTER, I totally missed this thread and
wrote a POC saving the version in pg_index.  That's not ideal though,
as you need to record multiple version strings.  In my version I used
a json type, using the collprovider  as the key, but that's not enough
for ICU as each collation can have a different version string.  I'm
not a huge fan of using pg_depend to record the version, but storing a
collprovider/collname -> version per row in pg_index is definitely a
no go, so I don't have any better counter-proposal.
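
For illustration only (made-up version strings, not the POC's exact
format), a provider-keyed value such as

    {"c": "2.28", "i": "153.97"}

breaks down for ICU, where each referenced collation can report its own
version, e.g. "fr-FR-x-icu" at "153.97.35.8" and "de-DE-x-icu" at
"153.80.32", so one entry per provider can't describe an index that uses
several ICU collations.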

>
> > # reindex table t1;
> > WARNING:  01000: index "t1_val_idx" depends on collation 13330 version
> > "a153.97.35.8", but the current version is "153.97.35.8"
> > DETAIL:  The index may be corrupted due to changes in sort order.
> > HINT:  REINDEX to avoid the risk of corruption.
> > LOCATION:  index_check_collation_version, index.c:1263
>
> Duh.  Yeah, that's stupid and needs to be fixed somehow.

I don't have a clever solution for that either.




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-11-04 Thread Kuntal Ghosh
On Mon, Nov 4, 2019 at 3:32 PM Dilip Kumar  wrote:
>
> So your result shows that with "streaming on", performance is
> degrading?  By any chance did you try to see where is the bottleneck?
>
Right. But, as we increase the logical_decoding_work_mem, the
performance improves. I've not analyzed the bottleneck yet. I'm
looking into the same.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



