Re: [proposal] recovery_target "latest"

2019-11-04 Thread Kyotaro Horiguchi
Hello.

At Mon, 4 Nov 2019 16:03:38 +0300, Grigory Smolkin  
wrote in 
> Hello, hackers!
> 
> I`d like to propose a new argument for recovery_target parameter,
> which will stand to recovering until all available WAL segments are
> applied.
> 
> The current PostgreSQL recovery default behavior (when no recovery target
> is provided) does exactly that, but there are several shortcomings:
>   - without an explicit recovery target standing for the default behavior,
> recovery_target_action does not come into effect at the end of recovery
>   - with the PG12 changes, life became much harder for all backup tools,
> because recovery parameters can now be set outside of a single config
> file (recovery.conf), so it is impossible to ensure that the default
> recovery behavior, desired in some cases, will not be silently
> overridden by some recovery parameter forgotten by the user.
> 
> The proposed patch is very simple and solves the aforementioned problems by
> introducing a new argument, "latest", for the recovery_target parameter.

Does the tool remove or rename recovery.conf to cancel the settings?
And do you intend the new option to be used to override settings by
appending it at the end of postgresql.conf? If so, commit
f2cbffc7a6 seems to break that assumption: PG12 refuses to start if it
finds two different kinds of recovery target settings.

> Old recovery behavior is still available if no recovery target is
> provided. I'm not sure whether it should be left as it is now or
> not.
> 
> Another open question is what to do with recovery_target_inclusive if
> recovery_target = "latest" is used.

Anyway, inclusiveness doesn't affect "immediate". If we had the
"latest" option, it would behave the same way.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Resume vacuum and autovacuum from interruption and cancellation

2019-11-04 Thread Masahiko Sawada
On Sat, 2 Nov 2019 at 02:10, Robert Haas  wrote:
>
> On Thu, Aug 8, 2019 at 9:42 AM Rafia Sabih  wrote:
> > Sounds like an interesting idea, but does it really help? Because if
> > vacuum was interrupted previously, wouldn't it already know the dead
> > tuples, etc. in the next run quite quickly, since the VM and FSM were
> > already updated for those pages in the previous run?
>
> +1. I don't deny that a patch like this could sometimes save
> something, but it doesn't seem like it would save all that much all
> that often. If your autovacuum runs are being frequently cancelled,
> that's going to be a big problem, I think.

I've observed cases where a user wants to cancel a very long-running
autovacuum (sometimes an anti-wraparound one) in order to do DDL or other
maintenance work. If the table is very large, autovacuum can take a
long time and might not reclaim enough garbage.

> And as Rafia says, even
> though you might do a little extra work reclaiming garbage from
> subsequently-modified pages toward the beginning of the table, it
> would be unusual if they'd *all* been modified. Plus, if they've
> recently been modified, they're more likely to be in cache.
>
> I think this patch really needs a test scenario or demonstration of
> some kind to prove that it produces a measurable benefit.

Okay. A simple test could be to cancel a long-running vacuum on a
large table that is being updated, rerun the vacuum, and then see how much
garbage remains on that table. I'll test it.

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: cost based vacuum (parallel)

2019-11-04 Thread Amit Kapila
On Mon, Nov 4, 2019 at 11:42 PM Andres Freund  wrote:
>
>
> > The two approaches to solve this problem being discussed in that
> > thread [1] are as follows:
> > (a) Allow the parallel workers and master backend to have a shared
> > view of vacuum cost related parameters (mainly VacuumCostBalance) and
> > allow each worker to update it and then based on that decide whether
> > it needs to sleep.  Sawada-San has done the POC for this approach.
> > See v32-0004-PoC-shared-vacuum-cost-balance in email [2].  One
> > drawback of this approach could be that we allow the worker to sleep
> > even though the I/O has been performed by some other worker.
>
> I don't understand this drawback.
>

I think the problem could be that the system is not properly throttled
when it is supposed to be.  Let me try a simple example: say we
have two workers, w-1 and w-2.  w-2 is primarily doing the I/O and
w-1 is doing very little I/O, but unfortunately whenever w-1 checks it
finds that the cost limit has been exceeded and it goes to sleep, while
w-2 still continues.  In such a situation, even though we have made one
of the workers sleep for the required time, ideally the worker which
was doing the I/O should have slept.  The aim is to make the system stop
doing I/O whenever the limit has been exceeded, and that might not happen
in the above situation.
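
To make that concrete, here is a minimal sketch (not the actual PoC patch) of
how a shared cost balance along the lines of approach (a) could behave; the
structure and function names below are hypothetical:

#include "postgres.h"
#include "miscadmin.h"      /* VacuumCostDelay */
#include "storage/spin.h"

/* Hypothetical shared state; not the names used in the PoC patch. */
typedef struct SharedVacuumCostState
{
    slock_t     mutex;
    double      cost_balance;   /* shared equivalent of VacuumCostBalance */
    double      cost_limit;     /* shared equivalent of VacuumCostLimit */
} SharedVacuumCostState;

/*
 * Each worker adds the cost of the I/O it just did to the shared balance.
 * Whichever worker happens to push the balance over the limit sleeps, even
 * if most of the accumulated cost was charged by another worker, which is
 * exactly the drawback described above.
 */
static void
shared_vacuum_delay_point(SharedVacuumCostState *shared, double my_cost)
{
    bool        need_sleep = false;

    SpinLockAcquire(&shared->mutex);
    shared->cost_balance += my_cost;
    if (shared->cost_balance >= shared->cost_limit)
    {
        shared->cost_balance = 0;
        need_sleep = true;
    }
    SpinLockRelease(&shared->mutex);

    if (need_sleep)
        pg_usleep((long) (VacuumCostDelay * 1000));
}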

>
> > (b) The other idea could be that we split the I/O among workers,
> > something similar to what we do for autovacuum workers (see
> > autovac_balance_cost).  The basic idea would be that before launching
> > workers, we compute the remaining I/O budget (the heap operation would
> > have used some of it) after which we need to sleep, and split it equally
> > across workers.  Here, we are primarily thinking of dividing the
> > VacuumCostBalance and VacuumCostLimit parameters.  Once the workers
> > are finished, they need to let the master backend know how much I/O they
> > have consumed, and then the master backend can add it to its current I/O
> > consumed.  I think we also need to rebalance the cost of the remaining
> > workers once some of the workers exit.  Dilip has prepared a POC
> > patch for this, see 0002-POC-divide-vacuum-cost-limit in email [3].
>
> (b) doesn't strike me as advantageous. It seems quite possible that you
> end up with one worker that has a lot more IO than others, leading to
> unnecessary sleeps, even though the actually available IO budget has not
> been used up.
>

Yeah, this is possible, but to an extent, this is possible in the
current design as well where we balance the cost among autovacuum
workers.  Now, it is quite possible that the current design itself is
not good and we don't want to do the same thing at another place, but
at least we will be consistent and can explain the overall behavior.
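
For reference, a hedged sketch of the kind of even split that (b) describes,
in the spirit of autovac_balance_cost(); the helper name and signature are
illustrative and not taken from Dilip's POC patch:

#include "postgres.h"

/*
 * Illustrative only: give each of nworkers an equal share of the cost limit
 * and of the balance accumulated so far.  A real implementation also has to
 * rebalance the remaining workers when one of them exits, and fold each
 * worker's consumption back into the master backend's counters at the end.
 */
static void
divide_vacuum_cost(int cost_limit, int cost_balance, int nworkers,
                   int *worker_cost_limit, int *worker_cost_balance)
{
    *worker_cost_limit = Max(cost_limit / nworkers, 1);
    *worker_cost_balance = cost_balance / nworkers;
}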

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: The command tag of "ALTER MATERIALIZED VIEW RENAME COLUMN"

2019-11-04 Thread Fujii Masao
On Sat, Nov 2, 2019 at 4:40 PM Michael Paquier  wrote:
>
> On Fri, Nov 01, 2019 at 02:17:03PM +0500, Ibrar Ahmed wrote:
> > Do we really need regression test cases for such small oversights?
>
> It is possible to get the command tags using an event trigger...  But
> that sounds hack-ish.

Yes, so if a simple test mechanism to check command tags existed,
it would be helpful.

I'm thinking of committing the patch. But I have one question: is it OK to
back-patch? The patch changes the command tags for some commands,
which might, for example, break existing event trigger functions
using TG_TAG if we back-patch it. Or should we guarantee the compatibility of
command tags within the same major version?

Regards,

-- 
Fujii Masao




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-11-04 Thread Dilip Kumar
On Mon, Nov 4, 2019 at 5:22 PM Amit Kapila  wrote:
>
> On Wed, Oct 30, 2019 at 9:38 AM vignesh C  wrote:
> >
> > On Tue, Oct 22, 2019 at 10:52 PM Tomas Vondra
> >  wrote:
> > >
> > > I think the patch should do the simplest thing possible, i.e. what it
> > > does today. Otherwise we'll never get it committed.
> > >
> > I found a couple of crashes while reviewing and testing flushing of
> > open transaction data:
> >
>
> Thanks for doing these tests.  However, I don't think these issues are
> in any way related to this patch.  They seem to be base-code issues
> manifested by this patch.  See my analysis below.
>
> > Issue 1:
> > #0  0x7f22c5722337 in raise () from /lib64/libc.so.6
> > #1  0x7f22c5723a28 in abort () from /lib64/libc.so.6
> > #2  0x00ec5390 in ExceptionalCondition
> > (conditionName=0x10ea814 "!dlist_is_empty(head)", errorType=0x10ea804
> > "FailedAssertion",
> > fileName=0x10ea7e0 "../../../../src/include/lib/ilist.h",
> > lineNumber=458) at assert.c:54
> > #3  0x00b4fb91 in dlist_tail_element_off (head=0x19e4db8,
> > off=64) at ../../../../src/include/lib/ilist.h:458
> > #4  0x00b546d0 in ReorderBufferAbortOld (rb=0x191b6b0,
> > oldestRunningXid=3834) at reorderbuffer.c:1966
> > #5  0x00b3ca03 in DecodeStandbyOp (ctx=0x19af990,
> > buf=0x7ffcbc26dc50) at decode.c:332
> >
>
> This seems to be a problem in the base code where we abort immediately
> after serializing the changes, because in that case the changes list
> will be empty.  I think you can try to reproduce it via the debugger,
> or by hacking the code so that it serializes after every change; then
> if you abort after one change, it should hit this problem.
>
I think you might need to kill the server after all changes are
serialized; otherwise a normal abort will hit ReorderBufferAbort, which
will remove your ReorderBufferTXN entry, and you will never hit
this case.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: v12.0: ERROR: could not find pathkey item to sort

2019-11-04 Thread Amit Langote
On Sun, Nov 3, 2019 at 4:43 AM Tom Lane  wrote:
> Amit Langote  writes:
> >> This would
> >> probably require refactoring things so that there are separate
> >> entry points to add child equivalences for base rels and join rels.
> >> But that seems cleaner anyway than what you've got here.
>
> > Separate entry points sounds better, but only in HEAD?
>
> I had actually had in mind that we might have two wrappers around a
> common search-and-replace routine, but after studying the code I see that
> there's just enough differences to make it probably not worth the trouble
> to do it like that.  I did spend a bit of time removing cosmetic
> differences between the two versions, though, mostly by creating
> common local variables.

Agree that having two totally separate routines is better.

> I think the way you did the matching_ecs computation is actually wrong:
> we need to find ECs that reference any rel of the join, not only those
> that reference both inputs.  If nothing else, the way you have it here
> makes the results dependent on which pair of input rels gets considered
> first, and that's certainly bad for multiway joins.

I'm not sure I fully understand the problems, but maybe you're right.

> Also, I thought we should try to put more conditions on whether we
> invoke add_child_join_rel_equivalences at all.  In the attached I did
>
> if ((enable_partitionwise_join || enable_partitionwise_aggregate) &&
> (joinrel->has_eclass_joins ||
>  has_useful_pathkeys(root, parent_joinrel)))
>
> but I wonder if we could do more, like checking to see if the parent
> joinrel is partitioned.  Alternatively, maybe it's unnecessary because
> we won't try to build child joinrels unless these conditions are true?

Actually, I think we can assert in add_child_rel_equivalences() that
enable_partitionwise_join is true.  Then checking
enable_partitionwise_aggregate is unnecessary.

> I did not like the test case either.  Creating a whole new (and rather
> large) test table just for this one case is unreasonably expensive ---
> it about doubles the runtime of the equivclass test for me.  There's
> already a perfectly good test table in partition_join.sql, which seems
> like a more natural home for this anyhow.  After a bit of finagling
> I was able to adapt the test query to fail on that table.

That's great.  I tried but I can only finagle so much when it comes to
twisting around plan shapes to what I need. :)

> Patch v4 attached.  I've not looked at what we need to do to make this
> work in v12.

Thanks a lot for the revised patch.

Maybe the only difference between HEAD and v12 is that the former has
eclass_indexes infrastructure, whereas the latter doesn't?  I have
attached a version of your patch adapted for v12.

Also, looking at this in the patched code:

+ /*
+ * We may ignore expressions that reference a single baserel,
+ * because add_child_rel_equivalences should have handled them.
+ */
+ if (bms_membership(cur_em->em_relids) != BMS_MULTIPLE)
+ continue;

I have been thinking that maybe add_child_rel_equivalences() doesn't need
to translate EC members that reference multiple appendrels, because
there top_parent_relids is always a singleton set, whereas the em_relids
of such expressions is not.  Those half-translated expressions are
useless, only adding to the overhead of scanning ec_members.  I'm
thinking that we should apply this diff:

diff --git a/src/backend/optimizer/path/equivclass.c
b/src/backend/optimizer/path/equivclass.c
index e8e9e9a314..d4d80c8101 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2169,7 +2169,7 @@ add_child_rel_equivalences(PlannerInfo *root,
 continue;   /* ignore children here */

 /* Does this member reference child's topmost parent rel? */
-if (bms_overlap(cur_em->em_relids, top_parent_relids))
+if (bms_is_subset(cur_em->em_relids, top_parent_relids))
 {
 /* Yes, generate transformed child version */
 Expr   *child_expr;

Thoughts?

Thanks,
Amit


d25ea01275-fixup-PG12-v4.patch
Description: Binary data


Re: auxiliary processes in pg_stat_ssl

2019-11-04 Thread Kuntal Ghosh
On Mon, Nov 4, 2019 at 9:09 PM Euler Taveira  wrote:
> >
> > But this seems pointless.  Should we not hide those?  Seems this only
> > happened as an unintended side-effect of fc70a4b0df38.  It appears to me
> > that we should redefine that view to restrict backend_type that's
> > 'client backend' (maybe include 'wal receiver'/'wal sender' also, not
> > sure.)
> >
> Yep, it is pointless. The backend types that open connections to the server
> are: autovacuum worker, client backend, background worker, wal sender. I
> also notice that pg_stat_gssapi is in the same boat as pg_stat_ssl, and
> we should constrain the rows to backend types that open connections.
> I'm attaching a patch to list only connections in those system views.
>
Yeah, we should hide those. As Robert mentioned, I think checking
whether 'client_port IS NOT NULL' is a better approach than checking
the backend_type. The patch looks good to me.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: Keep compiler silence (clang 10, implicit conversion from 'long' to 'double' )

2019-11-04 Thread Kyotaro Horiguchi
At Mon, 04 Nov 2019 12:53:48 -0500, Tom Lane  wrote in 
> Yuya Watari  writes:
> > I attached the modified patch. In the patch, I placed the macro in
> > "src/include/c.h", but this may not be a good choice because c.h is
> > widely included from a lot of files. Do you have any good ideas about
> > its placement?
> 
> I agree that there's an actual bug here; it can be demonstrated with
> 
> # select extract(epoch from '256 microseconds'::interval * (2^55)::float8);
>  date_part  
> --------------------
>  -9223372036854.775
> (1 row)
> 
> which clearly is a wrong answer.
> 
> I do not however like any of the proposed patches.  We already have one
> place that deals with this problem correctly, in int8.c's dtoi8():
> 
> /*
>  * Range check.  We must be careful here that the boundary values are
>  * expressed exactly in the float domain.  We expect PG_INT64_MIN to be an
>  * exact power of 2, so it will be represented exactly; but PG_INT64_MAX
>  * isn't, and might get rounded off, so avoid using it.
>  */
> if (unlikely(num < (float8) PG_INT64_MIN ||
>  num >= -((float8) PG_INT64_MIN) ||
>  isnan(num)))
> ereport(ERROR,
> (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
>  errmsg("bigint out of range")));
> 
> We should adopt that coding technique not invent new ones.
> 
> I do concur with creating a macro that encapsulates a correct version
> of this test, maybe like
> 
> #define DOUBLE_FITS_IN_INT64(num) \
>   ((num) >= (double) PG_INT64_MIN && \
>(num) < -((double) PG_INT64_MIN))

# I didn't notice the existing bit above.

Agreed. It is equivalent to that trick AFAICS, so there is no need to add
another one to worry about.

> (or s/double/float8/ ?)

Maybe.

> c.h is probably a reasonable place, seeing that we define the constants
> there.
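
For readers following along, a hedged sketch of how such a macro might be
used at a float8-to-int64 conversion site; only the macro body comes from the
proposal quoted above, the helper function itself is hypothetical:

#include "postgres.h"

#include <math.h>

/* proposed macro, as quoted above */
#define DOUBLE_FITS_IN_INT64(num) \
    ((num) >= (double) PG_INT64_MIN && \
     (num) < -((double) PG_INT64_MIN))

/* hypothetical conversion helper, mirroring the dtoi8() range check */
static int64
float8_to_int64_checked(double num)
{
    if (unlikely(isnan(num) || !DOUBLE_FITS_IN_INT64(num)))
        ereport(ERROR,
                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                 errmsg("bigint out of range")));
    return (int64) num;
}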

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Logical replication wal sender timestamp bug

2019-11-04 Thread Michael Paquier
On Sat, Nov 02, 2019 at 09:54:54PM -0400, Jeff Janes wrote:
> While monitoring pg_stat_subscription, I noticed that last_msg_send_time
> was usually NULL, which doesn't make sense.  Why would we have a message,
> but not know when it was sent?

So...  The timestamp is received and stored by LogicalRepApplyLoop()
as send_time when a 'w' message is received in the subscription
cluster.  And on the sending side it gets computed with a two-phase process:
- WalSndPrepareWrite() reserves the space in the message for the
timestamp.
- WalSndWriteData() computes the timestamp in the reserved space once
the write message is computed and ready to go.

> Filling out the timestamp after the message has already been sent is taking
> "as late as possible" a little too far.  It results in every message having
> a zero timestamp, other than keep-alives which go through a different path.

It seems to me that you are right: the timestamp is computed too
late.
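
For context, a hedged sketch of the reordering being described: fill the
reserved sendTime field before the message is handed to the output machinery,
not afterwards.  The offsets assume the 'w' message layout of 'w' + dataStart
+ walEnd + sendTime; treat this as illustrative rather than the literal patch:

#include "postgres.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "replication/logical.h"
#include "utils/timestamp.h"

/*
 * Illustrative helper: stamp the reserved timestamp bytes in ctx->out, then
 * emit the CopyData message.  The real code lives in WalSndWriteData(); the
 * bug was doing these two steps in the opposite order.
 */
static void
send_w_message_with_timestamp(LogicalDecodingContext *ctx, StringInfo tmpbuf)
{
    resetStringInfo(tmpbuf);
    pq_sendint64(tmpbuf, GetCurrentTimestamp());
    memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
           tmpbuf->data, sizeof(int64));

    /* only now hand the whole message to the protocol layer */
    pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
}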

> Re-ordering the code blocks as in the attached seems to fix it.  But I have
> to wonder, if this has been broken from the start and no one ever noticed,
> is this even valuable information to relay in the first place?  We could
> just take the column out of the view, and not bother calling
> GetCurrentTimestamp() for each message.

I think that there are use cases for such monitoring capabilities,
see for example 7fee252.  So I would rather keep it.
--
Michael


signature.asc
Description: PGP signature


Re: TAP tests aren't using the magic words for Windows file access

2019-11-04 Thread Michael Paquier
On Sun, Nov 03, 2019 at 10:53:00PM -0500, Tom Lane wrote:
> That is, TestLib::slurp_file is failing to read a file.  Almost
> certainly, "permission denied" doesn't really mean a permissions
> problem, but failure to specify the file-opening flags needed to
> allow concurrent access on Windows.  We fixed this in pg_ctl
> itself in commit 0ba06e0bf ... but we didn't fix the TAP
> infrastructure.  Is there an easy way to get Perl on board
> with that?

If we were to use Win32API::File so that the file is opened in shared
mode, we would be doing the same as what our frontend/backend code does (see
$uShare):
https://metacpan.org/pod/Win32API::File
--
Michael


signature.asc
Description: PGP signature


Re: Run-time pruning for ModifyTable

2019-11-04 Thread Thomas Munro
On Thu, Sep 12, 2019 at 10:10 AM Alvaro Herrera
 wrote:
> Here's a rebased version of this patch (it had a trivial conflict).

Hi, FYI partition_prune.sql currently fails (maybe something to do
with commit d52eaa09?):

 where s.a = $1 and s.b = $2 and s.c = (select 1);
 explain (costs off) execute q (1, 1);
   QUERY PLAN

+
  Append
InitPlan 1 (returns $0)
  ->  Result
-   Subplans Removed: 1
->  Seq Scan on p1
- Filter: ((a = $1) AND (b = $2) AND (c = $0))
+ Filter: ((a = 1) AND (b = 1) AND (c = $0))
->  Seq Scan on q111
- Filter: ((a = $1) AND (b = $2) AND (c = $0))
+ Filter: ((a = 1) AND (b = 1) AND (c = $0))
->  Result
- One-Time Filter: ((1 = $1) AND (1 = $2) AND (1 = $0))
-(10 rows)
+ One-Time Filter: (1 = $0)
+(9 rows)

 execute q (1, 1);
  a | b | c




Re: pause recovery if pitr target not reached

2019-11-04 Thread Fujii Masao
On Fri, Nov 1, 2019 at 9:41 PM Peter Eisentraut
 wrote:
>
> On 2019-10-21 08:44, Fujii Masao wrote:
> > Probably we can use standby mode + a recovery target setting for
> > almost the same purpose. In this configuration, if end-of-WAL is reached
> > before the recovery target, the startup process keeps waiting for new WAL
> > to become available. Then, if the recovery target is reached, the startup
> > process acts as recovery_target_action indicates.
>
> So basically get rid of recovery.signal mode and honor recovery target
> parameters in standby mode?

Yes, currently not only archive recovery mode but also standby mode honors
the recovery target settings.

> That has some appeal because it would simplify
> this whole space significantly, but perhaps it would be too confusing
> for end users?

This looks less confusing than extending archive recovery. But I'd like to
hear more opinions about that.

Regards,

-- 
Fujii Masao




Re: Include RELKIND_TOASTVALUE in get_relkind_objtype

2019-11-04 Thread Michael Paquier
On Mon, Nov 04, 2019 at 03:31:27PM -0500, Tom Lane wrote:
> I'd rather do something like the attached, which makes it more of an
> explicit goal that we won't fail on bad input.  (As written, we'd only
> fail on bad classId, which is a case that really shouldn't happen.)

Okay, that part looks fine.

> Tests are the same as yours, but I revised the commentary and got
> rid of the elog-for-bad-relkind.

No objections on that part either.

> I also made some cosmetic changes in commands/alter.c, so as to (1)
> make it clear by inspection that those calls are only used to feed
> aclcheck_error, and (2) avoid uselessly computing a value that we
> won't need in normal non-error cases.

That also makes sense.  Thanks for looking at it!
--
Michael


signature.asc
Description: PGP signature


Re: Do we have a CF manager for November?

2019-11-04 Thread Michael Paquier
On Mon, Nov 04, 2019 at 10:54:52AM -0500, Tom Lane wrote:
> It's time to start the next commitfest.  I seem to recall somebody
> saying back in September that they'd run the next one, but I forget
> who.  Anyway, we need a volunteer to be chief nagger.

That may have been me.  I can take this one if there is nobody else
around.

Note: I switched the app to in-progress a couple of days ago,
once it was the 1st of November AoE, of course.
--
Michael


signature.asc
Description: PGP signature


Standard-conforming datetime years parsing

2019-11-04 Thread Alexander Korotkov
Hi!

Thread [1] about support for the .datetime() jsonpath method raises a
question about standard-conforming parsing of the Y, YY, YYY and RR
datetime template patterns.

According to the standard, YYY, YY and Y should take the higher digits from
the current year. Our current implementation picks the higher digits so that
the result is closest to 2020.

We currently don't support RR.  According to the standard, RR behavior is
implementation-defined and should select the matching 4-digit year in the
interval [CY - 100; CY + 100], where CY is the current year.  So, our
current implementation of YY is more like RR according to the standard.

The open questions are:
1) Do we want to make our datetime parsing depend on the current
timestamp?  I guess not.  But how do we parse a one-digit year?  If we
hardcode a constant, it will become outdated within a decade.  Thankfully,
no one in their right mind would use the Y pattern, but still.
2) How do we want to parse RR?  The standard leaves us a lot of freedom
here.  Do we want to parse it the way we parse YY now?  It looks
reasonable to select the closest matching year.  Since PG 13 is going to
be released in 2020, our algorithm would be a perfect fit at release
time.
3) Do we want to change the behavior of to_date()/to_timestamp()?  Or just
jsonpath .datetime() and the future CAST(... AS ... FORMAT ...) defined in
SQL:2016?

The attached patch resolves the questions above as follows.  The YYY, YY and Y
patterns take the higher digits from 2020, so results for Y will become
inconsistent from 2030 onward.  RR selects the matching year closest to 2020,
as YY does now.  The patch changes the behavior of both
to_date()/to_timestamp() and jsonpath .datetime().
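
For illustration, a hedged sketch (not taken from the attached patch) of the
"closest to 2020" expansion described above; the anchor constant and the
function name are assumptions:

/*
 * Illustrative only: expand a two-digit year to the four-digit year nearest
 * to a fixed anchor year.  Ties go to the later century here; the attached
 * patch may implement the rule differently.
 */
static int
expand_two_digit_year(int yy)
{
    const int   anchor = 2020;
    int         candidate = (anchor / 100) * 100 + yy;  /* 20xx */

    if (candidate - anchor > 50)
        candidate -= 100;       /* e.g. 95 -> 1995 */
    else if (anchor - candidate > 50)
        candidate += 100;

    return candidate;
}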

Any thoughts?

Links
1. 
https://www.postgresql.org/message-id/CAPpHfdsZgYEra_PeCLGNoXOWYx6iU-S3wF8aX0ObQUcZU%2B4XTw%40mail.gmail.com

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-datetime-years-parsing.patch
Description: Binary data


Re: update ALTER TABLE with ATTACH PARTITION lock mode (docs)

2019-11-04 Thread Michael Paquier
On Sat, Nov 02, 2019 at 05:19:11PM +0900, Michael Paquier wrote:
> Sounds fine.  So gathering everything I get the attached.  Any
> thoughts from others?

Committed after splitting the changes in two as originally proposed.
--
Michael


signature.asc
Description: PGP signature


Re: Zedstore - compressed in-core columnar storage

2019-11-04 Thread Taylor Vesely
> When a zedstore table is queried using an *invalid* ctid, the server
> crashes due to an assertion failure. See below:
>
> postgres=# select * from t1 where ctid = '(0, 0)';
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> I believe the above should have either returned 0 rows or failed with some
> user-friendly error.

We pushed a fix for this today. It now returns zero rows, like the
equivalent query with heap. Thanks for reporting!


Re: Restore replication settings when modifying a field type

2019-11-04 Thread Quan Zongliang

On 2019/10/28 12:39, Kyotaro Horiguchi wrote:

Hello.

# The patch no longer applies on the current master. Needs a rebasing.


New patch, rebased on master branch.


At Sat, 26 Oct 2019 16:50:48 +0800, Quan Zongliang  
wrote in

In fact, the replication property of the table has not been modified,
and it is still 'i'(REPLICA_IDENTITY_INDEX). But the previously
specified index property 'indisreplident' is set to false because of
the rebuild.


I suppose that the behavior is intended. Changing column types on the
publisher side can break the agreement on replica identity with
subscribers. Thus the replica identity setting cannot be restored
unconditionally. For a (somewhat artificial :p) example:

P=# create table t (c1 integer, c2 text unique not null);
P=# alter table t replica identity using index t_c2_key;
P=# create publication p1 for table t;
P=# insert into t values (0, '00'), (1, '01');
S=# create table t (c1 integer, c2 text unique not null);
S=# alter table t replica identity using index t_c2_key;
S=# create subscription s1 connection '...' publication p1;

Your patch allows change of the type of c2 into integer.

P=# alter table t alter column c2 type integer using c2::integer;
P=# update t set c1 = c1 + 1 where c2 = '01';

This change doesn't take effect on the subscriber, perhaps contrary to expectations.

S=# select * from t;
  c1 | c2
----+----
   0 | 00
   1 | 01
(2 rows)



So I developed a patch: if the user modifies the column type and the
associated index is the REPLICA IDENTITY index, rebuild it and restore the
replication settings.


Explicitly setting the replica identity presumes that the user is sure that
the setting works correctly. Implicitly rebuilding it after a type change
can silently break it.

At least we need to guarantee that the restored replica identity
setting is truly compatible with all existing subscribers. I'm not
sure about potential subscribers...

Anyway, I think it is a problem that the replica identity setting is
dropped silently. Perhaps a message something like "REPLICA IDENTITY
setting is lost; please redefine it after confirming compatibility
with subscribers." is needed.

regards.



diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 05593f3316..02f8dbeabf 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -174,6 +174,8 @@ typedef struct AlteredTableInfo
 	List	   *changedConstraintDefs;	/* string definitions of same */
 	List	   *changedIndexOids;	/* OIDs of indexes to rebuild */
 	List	   *changedIndexDefs;	/* string definitions of same */
+	Oid			changedReplIdentOid;	/* OID of index to reset REPLICA IDENTITY */
+	char	   *changedReplIdentDef;	/* string definitions of same */
 } AlteredTableInfo;
 
 /* Struct describing one new constraint to check in Phase 3 scan */
@@ -459,6 +461,7 @@ static ObjectAddress ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
 										   AlterTableCmd *cmd, LOCKMODE lockmode);
 static void RememberConstraintForRebuilding(Oid conoid, AlteredTableInfo *tab);
 static void RememberIndexForRebuilding(Oid indoid, AlteredTableInfo *tab);
+static void RememberReplicaIdentForRebuilding(Oid indoid, AlteredTableInfo *tab);
 static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab,
 								   LOCKMODE lockmode);
 static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
@@ -10715,6 +10718,9 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
 				{
 					Assert(foundObject.objectSubId == 0);
 					RememberIndexForRebuilding(foundObject.objectId, tab);
+
+					if (RelationGetForm(rel)->relreplident == REPLICA_IDENTITY_INDEX)
+						RememberReplicaIdentForRebuilding(foundObject.objectId, tab);
 				}
 				else if (relKind == RELKIND_SEQUENCE)
 				{
@@ -10749,9 +10755,18 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
 				}
 
 			case OCLASS_CONSTRAINT:
-				Assert(foundObject.objectSubId == 0);
-				RememberConstraintForRebuilding(foundObject.objectId, tab);
-				break;
+				{
+					Oid			indId;
+
+					Assert(foundObject.objectSubId == 0);
+					RememberConstraintForRebuilding(foundObject.objectId, tab);
+
+					indId = get_constraint_index(foundObject.objectId);
+

Re: [PATCH] Do not use StdRdOptions in Access Methods

2019-11-04 Thread Michael Paquier
On Thu, Oct 31, 2019 at 05:55:55PM +0900, Michael Paquier wrote:
> Thanks.  I exactly did the same thing on my local branch.

Hearing nothing more, I have done some adjustments to the patch and
committed it.  Please note that I have not made the old interface
static to reloptions.c: if you look closely at reloptions.h, we
allow much more flexibility around HANDLE_INT_RELOPTION to fill and
parse the reloptions in a custom AM.  AM maintainers had better use the
new interface, but there could be a point to keeping the old one for more
customized error messages.
--
Michael


signature.asc
Description: PGP signature


Re: Backport "WITH ... AS MATERIALIZED" syntax to <12?

2019-11-04 Thread Thomas Munro
On Sat, Oct 19, 2019 at 11:52 PM Tomas Vondra
 wrote:
> I wonder if an extension could do something like that, though. It can
> install a hook after parse analysis, so I guess it could walk the CTEs
> and mark them as materialized.

I wonder if the existing pg_hint_plan extension could be extended to
do that using something like /*+ MATERIALIZE */.  That'd presumably be
ignored when not installed/not understood etc.  I've never used
pg_hint_plan myself and don't know how or how well it works, but it
look like it supports Oracle-style hints hidden in comments like /*+
HashJoin(a b) */ rather than SQL Server-style hints that are in the
SQL grammar itself like SELECT ... FROM a HASH JOIN b.




Re: SimpleLruTruncate() mutual exclusion

2019-11-04 Thread Noah Misch
On Mon, Nov 04, 2019 at 03:26:35PM +1300, Thomas Munro wrote:
> On Thu, Aug 1, 2019 at 6:51 PM Noah Misch  wrote:
> > vac_truncate_clog() instance 1 starts, considers segment ABCD eligible to 
> > unlink
> > vac_truncate_clog() instance 2 starts, considers segment ABCD eligible to 
> > unlink
> > vac_truncate_clog() instance 1 unlinks segment ABCD
> > vac_truncate_clog() instance 1 calls SetTransactionIdLimit()
> > vac_truncate_clog() instance 1 finishes
> > some backend calls SimpleLruZeroPage(), creating segment ABCD
> > vac_truncate_clog() instance 2 unlinks segment ABCD
> >
> > Serializing vac_truncate_clog() fixes that.
> 
> I've wondered before (in a -bugs thread[1] about unexplained pg_serial
> wraparound warnings) if we could map 64 bit xids to wide SLRU file
> names that never wrap around and make this class of problem go away.
> Unfortunately multixacts would need 64 bit support too...
> 
> [1] 
> https://www.postgresql.org/message-id/flat/CAEBTBzuS-01t12GGVD6qCezce8EFD8aZ1V%2Bo_3BZ%3DbuVLQBtRg%40mail.gmail.com

That change sounds good to me.
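
For readers skimming the digest, a hedged sketch of the kind of serialization
referred to in the quoted scenario: hold a lock across the whole "decide what
is safe to unlink, unlink it, advance the limits" sequence so that two
vac_truncate_clog() instances cannot interleave.  The lock tag used here is a
placeholder; the actual patch may well use a different mechanism.

#include "postgres.h"
#include "catalog/pg_database.h"
#include "storage/lmgr.h"

static void
vac_truncate_clog_serialized(void)
{
    LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);

    /*
     * ... the existing vac_truncate_clog() body would run here: compute the
     * oldest datfrozenxid/datminmxid, TruncateCLOG(), SetTransactionIdLimit()
     * and so on ...
     */

    UnlockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);
}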




Re: Backport "WITH ... AS MATERIALIZED" syntax to <12?

2019-11-04 Thread Bruce Momjian
On Sat, Oct 19, 2019 at 02:35:42PM -0400, Isaac Morland wrote:
> On Sat, 19 Oct 2019 at 13:36, Stephen Frost  wrote:
> 
> Greetings,
> 
> * Isaac Morland (isaac.morl...@gmail.com) wrote:
> > That embeds a temporary hack in the application code indefinitely.
> 
> ... one could argue the same about having to say AS MATERIALIZED.
> 
>  
> I think OFFSET 0 is a hack - the fact that it forces an optimization fence
> feels like an oddity. By contrast, saying AS MATERIALIZED means materialize 
> the
> CTE. I suppose you could argue that the need to be able to request that is a
> temporary hack until query optimization improves further, but I don't think
> that's realistic. For the foreseeable future we will need to be able to tell
> the query planner that it is wrong. I mean, in principle the DB should figure
> out for itself which (non-constraint) indexes are needed. But I don't see any
> proposals to attempt to implement that.
> 
> Side note: I am frequently disappointed by the query planner. I have had many
> situations in which a nice simple strategy of looking up some tiny number of
> records in an index and then following more indices to get joined records 
> would
> have worked, but instead it did a linear scan through the wrong starting 
> table.
> So I'm very glad the AS MATERIALIZED now exists for when it's needed. On the
> other hand, I recognize that the reason I'm disappointed is because my
> expectations are so high: often I've written a query that joins several views
> together, meaning that under the covers it's really joining maybe 20 tables,
> and it comes back with the answer instantly. So in effect the query planner is
> just good enough to make me expect it to be even better than it is.

Well, since geqo_threshold = 12 is the default, for a 20-table join, you
are using genetic query optimization (GEQO) in PG 12 without
MATERIALIZED:

https://www.postgresql.org/docs/12/geqo.html

and GEQO assumes it would take too long to fully test all optimization
possibilities, so it randomly checks just some of them.  Therefore, it
is no surprise you are disappointed in its output.

In a way, when you are using materialized CTEs, you are doing the
optimization yourself, in your SQL code, and then the table join count
drops low enough that GEQO is not used and Postgres fully tests all
optimization possibilities.  This is behavior I had never considered ---
the idea that the user is partly replacing the optimizer, and that using
materialized CTEs prevents the problems that require the use of GEQO.

I guess my big take-away is that not only can MATERIALIZE change the
quality of query plans but it can also improve the quality of query
plans if it prevents GEQO from being used.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Collation versioning

2019-11-04 Thread Thomas Munro
On Mon, Nov 4, 2019 at 11:13 PM Julien Rouhaud  wrote:
> When working on the REINDEX FILTER, I totally missed this thread and

I created a new wiki page to try to track the various moving pieces
here.  Julien, Peter, Christoph, anyone interested, please feel free
to update it or add more information.

https://wiki.postgresql.org/wiki/Collations




Re: [Patch] Add a reset_computed_values function in pg_stat_statements

2019-11-04 Thread Thomas Munro
On Thu, Sep 26, 2019 at 8:05 AM Alvaro Herrera  wrote:
> This patch seems to be failing the contrib build.  Please fix.

Hi Pierre,

In contrib/pg_stat_statements/pg_stat_statements.c, you need to
declare or define entry_reset_computed() before you use it.  I suppose
your compiler must be warning about that.  I personally put
"COPT=-Wall -Werror" into src/Makefile.custom to make sure that I
don't miss any warnings.  That's also what http://cfbot.cputube.org/
does (a thing that does a full check-world on all registered patches
to look for such problems).




Re: patch: psql - enforce constant width of last column

2019-11-04 Thread Pavel Stehule
Hi

po 4. 11. 2019 v 21:55 odesílatel Tom Lane  napsal:

> Pavel Stehule  writes:
> > st 18. 9. 2019 v 12:52 odesílatel Ahsan Hadi 
> napsal:
> >> does this patch have any value for psql without pspg?
>
> > The benefit of this patch is just for pspg users today.
>
> TBH, I think we should just reject this patch.  It makes psql's
> table-printing behavior even more complicated than it was before.
> And I don't see how pspg gets any benefit --- you'll still have
> to deal with the old code, for an indefinite time into the future.
>

I don't think it increases the complexity of the printing rules too much. The
default value "auto" doesn't change anything from the current state; "always"
ensures the same line width for every row.

The problem that this patch tries to solve is the differing width of rows,
even though the output is supposed to be aligned.

Personally, I think the current behavior is not correct. The correct solution
would be to set "finalspaces" to true every time for aligned output. But I
don't know the motivation of the original authors, so as a solution with
minimal impact I added the possibility (it's not the default) of setting
finalspaces to "always" as a fix for a possible visual artefact (although
these artefacts are invisible most of the time).

The patch maybe looks non-trivial (although it is trivial), but that is
because I tried to reduce the possible impact on any other application to zero.


>
> Moreover, *other* programs that pay close attention to the output
> format will be forced to deal with the possibility that this flag
> has been turned on, which typically they wouldn't even have a way
> to find out.  So I think you're basically trying to export your
> problems onto everyone else.
>

I am trying to fix this issue where it originates. For this patch it is
important to get agreement (or not) on whether this problem is an issue that
should be fixed.

I think that in aligned mode all rows should have the same width.

On the other hand, I really don't know why the last space is not printed, or
whether some applications would have a problem with it; I have no idea. The
current code, where trailing spaces are not printed, is a little more complex
than it would be if the alignment were really complete.

Sure, "the issue of last invisible space" is not big issue (it's
triviality) - I really think so it should be fixed on psql side, but if
there will not be a agreement, I can fix it on pspg side (although it will
not be elegant - because I have to print chars that doesn't exists).

Is there anyone who remembers this implementation and can say why the code
is implemented the way it is?

Regards

Pavel



> regards, tom lane
>


Re: ssl passphrase callback

2019-11-04 Thread Thomas Munro
On Sat, Nov 2, 2019 at 6:57 AM Andrew Dunstan
 wrote:
> On 11/1/19 11:01 AM, Robert Haas wrote:
> > On Thu, Oct 31, 2019 at 11:37 AM Andrew Dunstan
> >  wrote:
> >> This patch provides a hook for a function that can supply an SSL
> >> passphrase. The hook can be filled in by a shared preloadable module. In
> >> order for that to be effective, the startup order is modified slightly.
> >> There is a test attached that builds and uses one trivial
> >> implementation, which just takes a configuration setting and rot13's it
> >> before supplying the result as the passphrase.
> > It seems to me that it would be a lot better to have an example in
> > contrib that does something which might be of actual use to users,
> > such as running a shell command and reading the passphrase from
> > stdout.
> >
> > Features that are only accessible by writing C code are, in general,
> > not as desirable as features which can be accessed via SQL or
> > configuration.
>
> Well, I tried to provide the most trivial and simple test I could come
> up with. Running a shell command can already be accomplished via the
> ssl_passphrase_command setting.

It looks like the new declarations in libpq-be.h are ifdef'd out in a
non-USE_SSL build, but then we still try to build the new test module
and it fails:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.64071
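
For illustration, a hedged sketch of one way to keep such a hook visible to a
test module regardless of build options: declare the hook type and variable
unconditionally and only consult it from SSL code.  The names below are
illustrative, not the patch's actual identifiers.

/* in a header that is safe to include in any build */
typedef int (*ssl_passphrase_hook_type) (char *buf, int size, int rwflag,
                                         void *userdata);

/*
 * Hook slot, filled in by a shared_preload_libraries module; only the
 * OpenSSL callback that consults it needs to live under #ifdef USE_SSL.
 */
extern PGDLLIMPORT ssl_passphrase_hook_type ssl_passphrase_hook;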




Re: [PATCH] Sort policies and triggers by table name in pg_dump.

2019-11-04 Thread Tom Lane
Benjie Gillam  writes:
>> Your patch has two warnings because you are trying to map a policy
>> info pointer to a trigger info pointer:

> Ah, thank you for the pointer (aha); I've attached an updated patch
> that addresses this copy/paste issue.

LGTM, pushed (with a bit of extra polishing of comments).

regards, tom lane




Re: Collation versioning

2019-11-04 Thread Thomas Munro
On Mon, Nov 4, 2019 at 11:13 PM Julien Rouhaud  wrote:
> On Mon, Nov 4, 2019 at 4:58 AM Thomas Munro  wrote:
> > * Some have expressed doubt that pg_depend is the right place for
> > this; let's see if any counter-proposals appear.
>
> When working on the REINDEX FILTER, I totally missed this thread and
> wrote a POC saving the version in pg_index.  That's not ideal though,
> as you need to record multiple version strings.  In my version I used
> a json type, using the collprovider  as the key, but that's not enough
> for ICU as each collation can have a different version string.  I'm
> not a huge fan of using pg_depend to record the version, but storing a
> collprovider/collname -> version per row in pg_index is definitely a
> no go, so I don't have any better counter-proposal.

Yeah, I also thought about using pg_index directly, and was annoyed by
the denormalisation you mention (an array of {collation, version}!?)
and so I realised I wanted another table like they teach you at
database school, but I also realised that there are other kinds of
database objects that depend on collations and that can become
corrupted if the collation definition changes.  It was thinking about
that that led me to the idea of using something that can record
version dependencies on *any* database object, which brought me to the
existing pg_depend table.

Concretely, eventually we might want to support checks etc, as
mentioned by Doug Doole and as I showed in an earlier version of this
POC patch, though I removed it from the more recent patch set so we
can focus on the more pressing problems.  The check constraint idea
leads to more questions like: "does this constraint *really* use any
operators that truly depend on the collation definition?" (so CHECK
(name > 'xxx') depends on name's collation, but CHECK (LENGTH(name) <
32) doesn't really), and I didn't want to be distracted by that rabbit
hole.  Here's the example message that came out of the earlier patch
for posterity:

WARNING: constraint "t_i_check" depends on collation 12018 version
"30.0.1", but the current version is "30.0.2"
DETAIL: The constraint may be corrupted due to changes in sort order.
HINT: Drop and recreate the constraint to avoid the risk of corruption.




Re: Collation versioning

2019-11-04 Thread Thomas Munro
On Tue, Nov 5, 2019 at 4:18 AM Julien Rouhaud  wrote:
> On Mon, Nov 4, 2019 at 11:13 AM Julien Rouhaud  wrote:
> > On Mon, Nov 4, 2019 at 4:58 AM Thomas Munro  wrote:
> > > Here are some problems to think about:
> > >
> > > * We'd need to track dependencies on the default collation once we
> > > have versioning for that [...]
>
> Another problem I just thought about is how to avoid discrepancy of
> collation version for indexes on shared objects, such as
> pg_database_datname_index.

I didn't look closely at the code, but I think when "name" recently
became collation-aware (commit 586b98fd), it switched to using
C_COLLATION_OID as its typcollation, and "C" doesn't need versioning,
so I think it would only be a problem if there are shared catalogs
that have "name" columns that have a non-type-default collation.
There are none of those, and you can't create them, right?  If there
were, if we take this patch set to its logical conclusion, we'd also
need pg_shdepend.refobjversion, but we don't need it AFAICS.




Re: patch: psql - enforce constant width of last column

2019-11-04 Thread Tom Lane
Pavel Stehule  writes:
> st 18. 9. 2019 v 12:52 odesílatel Ahsan Hadi  napsal:
>> does this patch have any value for psql without pspg?

> The benefit of this patch is just for pspg users today.

TBH, I think we should just reject this patch.  It makes psql's
table-printing behavior even more complicated than it was before.
And I don't see how pspg gets any benefit --- you'll still have
to deal with the old code, for an indefinite time into the future.

Moreover, *other* programs that pay close attention to the output
format will be forced to deal with the possibility that this flag
has been turned on, which typically they wouldn't even have a way
to find out.  So I think you're basically trying to export your
problems onto everyone else.

regards, tom lane




Re: Missed check for too-many-children in bgworker spawning

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 14:58:20 -0500, Robert Haas wrote:
> On Mon, Nov 4, 2019 at 2:04 PM Andres Freund  wrote:
> > Is that really true? In the case where it started and failed we expect
> > the error queue to have been attached to, and there to be either an
> > error 'E' or an 'X' response (cf HandleParallelMessage()).  It doesn't
> > strike me as very complicated to keep track of whether any worker has
> > sent an 'E' or not, no?  I don't think we really need the
> 
> One of us is confused here, because I don't think that helps. Consider
> three background workers Alice, Bob, and Charlie. Alice fails to
> launch because fork() fails. Bob launches but then exits unexpectedly.
> Charlie has no difficulties and carries out his assigned duties.
> 
> Now, the system you are proposing will say that Charlie is OK but
> Alice and Bob are a problem. However, that's the way it already works.
> What Tom wants is to distinguish Alice from Bob, and your proposal is
> of no help at all with that problem, so far as I can see.

I don't see how what I'm saying treats Alice and Bob the same. What I'm
saying is that if a worker has been started and shut down, without
signalling an error via the error queue, and without an exit code that
causes postmaster to worry, then we can just ignore the worker for the
purpose of determining whether the query succeeded, without a meaningful
loss in reliability. And we can detect such cases easily; we already do,
we just have to remove an ereport() and document things.


> > > We certainly can't ignore a worker that managed to start and
> > > then bombed out, because it might've already, for example, claimed a
> > > block from a Parallel Seq Scan and not yet sent back the corresponding
> > > tuples. We could ignore a worker that never started at all, due to
> > > EAGAIN or whatever else, but the original process that registered the
> > > worker has no way of finding this out.
> >
> > Sure, but in that case we'd have gotten either an error back from the
> > worker, or postmaster wouldhave PANIC restarted everyone due to an
> > unhandled error in the worker, no?
> 
> An unhandled ERROR in the worker is not a PANIC. I think it's just an
> ERROR that ends up being fatal in effect, but even if it's actually
> promoted to FATAL, it's not a PANIC.

If it's an _exit without going through the PG machinery, it'll
eventually be PANIC, albeit with a slight delay. And once we're actually
executing the parallel query, we better have error reporting set up for
parallel queries.


> It is *generally* true that if a worker hits an ERROR, the error will
> be propagated back to the leader, but it is not an invariable rule.
> One pretty common way that it fails to happen - common in the sense
> that it comes up during development, not common on production systems
> I hope - is if a worker dies before reaching the call to
> pq_redirect_to_shm_mq(). Before that, there's no possibility of
> communicating anything. Granted, at that point we shouldn't yet have
> done any work that might mess up the query results.

Right.


> Similarly, once we reach that point, we are dependent on a certain amount of 
> good
> behavior for things to work as expected; yeah, any code that calls
> proc_exit() is suppose to signal an ERROR or FATAL first, but what if
> it doesn't? Granted, in that case we'd probably fail to send an 'X'
> message, too, so the leader would still have a chance to realize
> something is wrong.

I mean, in that case so many more things are screwed up that I don't buy
that it's worth pessimizing ENOMEM handling for this. And if you're
really concerned, we could add a before_shmem_exit hook or some such that
makes extra double sure that we've signalled something.


> I guess I agree with you to this extent: I made a policy decision that
> if a worker is successfully registered but fails to show up, that's an ERROR. It
> would be possible to adopt the opposite policy, namely that if a
> worker doesn't show up, that's an "oh well." You'd have to be very
> certain that the worker wasn't going to show up later, though. For
> instance, suppose you check all of the shared memory queues used for
> returning tuples and find that every queue is either in a state where
> (1) nobody's ever attached to it or (2) somebody attached and then
> detached. This is not good enough, because it's possible that after
> you checked queue #1, and found it in the former state, someone
> attached and read a block, which caused queue #2 to enter the latter
> state before you got around to checking it. If you decide that it's OK
> to decide that we're done at this point, you'll never return the
> tuples that are pushed through queue #1.

That's why the code *already* waits for workers to attach, or for the
slot to be marked unused/invalid/reused. I don't see how that applies to
not explicitly erroring out when we know that the worker *failed* to
start:

void
WaitForParallelWorkersToFinish(ParallelContext *pcxt)
...


/*
 

Re: 64 bit transaction id

2019-11-04 Thread Thomas Munro
On Tue, Nov 5, 2019 at 8:45 AM Tomas Vondra
 wrote:
> On Mon, Nov 04, 2019 at 10:44:53AM -0800, Andres Freund wrote:
> >On 2019-11-04 19:39:18 +0100, Tomas Vondra wrote:
> >> On Mon, Nov 04, 2019 at 10:04:09AM -0800, Andres Freund wrote:
> >> > And "without causing significant issues elsewhere" unfortunately
> >> > includes continuing to allow pg_upgrade to work.
> >
> >> Yeah. I suppose we could have a different AM implementing this, but
> >> maybe that's not possible ...
> >
> >Entirely possible. But the amount of code duplication / unnecessary
> >branching and the user confusion from two very similar AMs, would have
> >to be weighed against the benefits.
> >
>
> Agreed. I think code complexity is part of the trade-off. IMO it's fine
> to hack existing heap AM initially, and only explore the separate AM if
> that turns out to be promising.

I thought a bit about how to make a minimally-different-from-heap
non-freezing table AM using 64 bit xids, as a thought experiment when
trying to understand or explain to others what zheap is about.
Committed transactions are easy (you don't have to freeze fxid
references from the ancient past because they don't wrap around so
they always look old), but how do you deal with *aborted* transactions
when truncating the CLOG (given that our current rule is "if it's
before the CLOG begins, it must be committed")?  I see three
possibilities: (1) don't truncate the CLOG anymore (use 64 bit
addressing and let it leak disk forever, like we did before commit
2589735d and later work), (2) freeze aborted transactions only, using
a wraparound vacuum (and now you have failed, if the goal was to avoid
having to scan all tuples periodically to freeze stuff, though
admittedly it will require less IO to freeze only the aborted
transactions), (3) go and remove aborted fxid references eagerly, when
you roll back (this could be done using the undo technology that we
have been developing to support zheap).  Another way to explain (3) is
that this hypothetical table AM, let's call it "yheap", takes the
minimum parts of the zheap technology stack required to get rid of
vacuum-for-wraparound, without doing in-place updates or any of that
hard stuff.  To make this really work you'd also have to deal with
multixacts, which also require freezing.  If that all sounds too
complicated, you're back to (2) which seems a bit weak to me.  Or
perhaps I'm missing something?




Re: Include RELKIND_TOASTVALUE in get_relkind_objtype

2019-11-04 Thread Tom Lane
Michael Paquier  writes:
> Okay.  Attached is what I was thinking about, with extra regression
> tests to cover the ground for toast tables and indexes that are able
> to reproduce the original failure, and more comments for the routines
> as they should be used only for ACL error messages.

I'd rather do something like the attached, which makes it more of an
explicit goal that we won't fail on bad input.  (As written, we'd only
fail on bad classId, which is a case that really shouldn't happen.)

Tests are the same as yours, but I revised the commentary and got
rid of the elog-for-bad-relkind.  I also made some cosmetic changes
in commands/alter.c, so as to (1) make it clear by inspection that
those calls are only used to feed aclcheck_error, and (2) avoid
uselessly computing a value that we won't need in normal non-error
cases.

regards, tom lane

diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index ce8a4e9..b8cbe6a 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -2612,6 +2612,13 @@ get_object_attnum_acl(Oid class_id)
 	return prop->attnum_acl;
 }
 
+/*
+ * get_object_type
+ *
+ * Return the object type associated with a given object.  This routine
+ * is primarily used to determine the object type to mention in ACL check
+ * error messages, so it's desirable for it to avoid failing.
+ */
 ObjectType
 get_object_type(Oid class_id, Oid object_id)
 {
@@ -5333,6 +5340,16 @@ strlist_to_textarray(List *list)
 	return arr;
 }
 
+/*
+ * get_relkind_objtype
+ *
+ * Return the object type for the relkind given by the caller.
+ *
+ * If an unexpected relkind is passed, we say OBJECT_TABLE rather than
+ * failing.  That's because this is mostly used for generating error messages
+ * for failed ACL checks on relations, and we'd rather produce a generic
+ * message saying "table" than fail entirely.
+ */
 ObjectType
 get_relkind_objtype(char relkind)
 {
@@ -5352,13 +5369,10 @@ get_relkind_objtype(char relkind)
 			return OBJECT_MATVIEW;
 		case RELKIND_FOREIGN_TABLE:
 			return OBJECT_FOREIGN_TABLE;
-
-			/*
-			 * other relkinds are not supported here because they don't map to
-			 * OBJECT_* values
-			 */
+		case RELKIND_TOASTVALUE:
+			return OBJECT_TABLE;
 		default:
-			elog(ERROR, "unexpected relkind: %d", relkind);
-			return 0;
+			/* Per above, don't raise an error */
+			return OBJECT_TABLE;
 	}
 }
diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index 70dbcb0..562e3d3 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -172,7 +172,6 @@ AlterObjectRename_internal(Relation rel, Oid objectId, const char *new_name)
 	AttrNumber	Anum_name = get_object_attnum_name(classId);
 	AttrNumber	Anum_namespace = get_object_attnum_namespace(classId);
 	AttrNumber	Anum_owner = get_object_attnum_owner(classId);
-	ObjectType	objtype = get_object_type(classId, objectId);
 	HeapTuple	oldtup;
 	HeapTuple	newtup;
 	Datum		datum;
@@ -224,7 +223,8 @@ AlterObjectRename_internal(Relation rel, Oid objectId, const char *new_name)
 		ownerId = DatumGetObjectId(datum);
 
 		if (!has_privs_of_role(GetUserId(), DatumGetObjectId(ownerId)))
-			aclcheck_error(ACLCHECK_NOT_OWNER, objtype, old_name);
+			aclcheck_error(ACLCHECK_NOT_OWNER, get_object_type(classId, objectId),
+		   old_name);
 
 		/* User must have CREATE privilege on the namespace */
 		if (OidIsValid(namespaceId))
@@ -670,7 +670,6 @@ AlterObjectNamespace_internal(Relation rel, Oid objid, Oid nspOid)
 	AttrNumber	Anum_name = get_object_attnum_name(classId);
 	AttrNumber	Anum_namespace = get_object_attnum_namespace(classId);
 	AttrNumber	Anum_owner = get_object_attnum_owner(classId);
-	ObjectType	objtype = get_object_type(classId, objid);
 	Oid			oldNspOid;
 	Datum		name,
 namespace;
@@ -726,7 +725,7 @@ AlterObjectNamespace_internal(Relation rel, Oid objid, Oid nspOid)
 		ownerId = DatumGetObjectId(owner);
 
 		if (!has_privs_of_role(GetUserId(), ownerId))
-			aclcheck_error(ACLCHECK_NOT_OWNER, objtype,
+			aclcheck_error(ACLCHECK_NOT_OWNER, get_object_type(classId, objid),
 		   NameStr(*(DatumGetName(name;
 
 		/* User must have CREATE privilege on new namespace */
@@ -950,8 +949,6 @@ AlterObjectOwner_internal(Relation rel, Oid objectId, Oid new_ownerId)
 		/* Superusers can bypass permission checks */
 		if (!superuser())
 		{
-			ObjectType	objtype = get_object_type(classId, objectId);
-
 			/* must be owner */
 			if (!has_privs_of_role(GetUserId(), old_ownerId))
 			{
@@ -970,7 +967,8 @@ AlterObjectOwner_internal(Relation rel, Oid objectId, Oid new_ownerId)
 	snprintf(namebuf, sizeof(namebuf), "%u", objectId);
 	objname = namebuf;
 }
-aclcheck_error(ACLCHECK_NOT_OWNER, objtype, objname);
+aclcheck_error(ACLCHECK_NOT_OWNER, get_object_type(classId, objectId),
+			   objname);
 			}
 			/* Must be able to become new owner */
			check_is_member_of_role(GetUserId(), new_ownerId);

Re: cost based vacuum (parallel)

2019-11-04 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-11-04 14:33:41 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > On 2019-11-04 14:06:19 -0500, Stephen Frost wrote:
> > > > With parallelization across indexes, you could have a situation where
> > > > the individual indexes are on different tablespaces with independent
> > > > i/o, therefore the parallelization ends up giving you an increase in i/o
> > > > throughput, not just additional CPU time.
> > > 
> > > How's that related to IO throttling being active or not?
> > 
> > You might find that you have to throttle the IO down when operating
> > exclusively against one IO channel, but if you have multiple IO channels
> > then the acceptable IO utilization could be higher as it would be 
> > spread across the different IO channels.
> > 
> > In other words, the overall i/o allowance for a given operation might be
> > able to be higher if it's spread across multiple i/o channels, as it
> > wouldn't completely consume the i/o resources of any of them, whereas
> > with a higher allowance and a single i/o channel, there would likely be
> > an impact to other operations.
> > 
> > As for if this is really relevant only when it comes to parallel
> > operations is a bit of an interesting question- these considerations
> > might not require actual parallel operations as a single process might
> > be able to go through multiple indexes concurrently and still hit the
> > i/o limit that was set for it overall across the tablespaces.  I don't
> > know that it would actually be interesting or useful to spend the effort
> > to make that work though, so, from a practical perspective, it's
> > probably only interesting to think about this when talking about
> > parallel vacuum.
> 
> But you could just apply different budgets for different tablespaces?

Yes, that would be one approach to addressing this, though it would
change the existing meaning of those cost parameters.  I'm not sure if
we think that's an issue or not- if we only have this in the case of a
parallel vacuum then it's probably fine, I'm less sure if it'd be
alright to change that on an upgrade.

> That's quite doable independent of parallelism, as we don't have tables
> or indexes spanning more than one tablespace.  True, you could then make
> the processing of an individual vacuum faster by allowing to utilize
> multiple tablespace budgets at the same time.

Yes, it's possible to do independent of parallelism, but what I was
trying to get at above is that it might not be worth the effort.  When
it comes to parallel vacuum though, I'm not sure that you can just punt
on this question since you'll naturally end up spanning multiple
tablespaces concurrently, at least if the heap+indexes are spread across
multiple tablespaces and you're operating against more than one of those
relations at a time (which, I admit, I'm not 100% sure is actually
happening with this proposed patch set- if it isn't, then this isn't
really an issue, though that would be pretty unfortunate as then you
can't leverage multiple i/o channels concurrently and therefore Jeff's
question about why you'd be doing parallel vacuum with IO throttling is
a pretty good one).

Thanks,

Stephen




Re: Missed check for too-many-children in bgworker spawning

2019-11-04 Thread Robert Haas
On Mon, Nov 4, 2019 at 2:04 PM Andres Freund  wrote:
> Is that really true? In the case where it started and failed we expect
> the error queue to have been attached to, and there to be either an
> error 'E' or a 'X' response (cf HandleParallelMessage()).  It doesn't
> strike me as very complicated to keep track of whether any worker has
> sent an 'E' or not, no?  I don't think we really need the

One of us is confused here, because I don't think that helps. Consider
three background workers Alice, Bob, and Charlie. Alice fails to
launch because fork() fails. Bob launches but then exits unexpectedly.
Charlie has no difficulties and carries out his assigned duties.

Now, the system you are proposing will say that Charlie is OK but
Alice and Bob are a problem. However, that's the way it already works.
What Tom wants is to distinguish Alice from Bob, and your proposal is
of no help at all with that problem, so far as I can see.

> > We certainly can't ignore a worker that managed to start and
> > then bombed out, because it might've already, for example, claimed a
> > block from a Parallel Seq Scan and not yet sent back the corresponding
> > tuples. We could ignore a worker that never started at all, due to
> > EAGAIN or whatever else, but the original process that registered the
> > worker has no way of finding this out.
>
> Sure, but in that case we'd have gotten either an error back from the
> worker, or postmaster would have PANIC restarted everyone due to an
> unhandled error in the worker, no?

An unhandled ERROR in the worker is not a PANIC. I think it's just an
ERROR that ends up being fatal in effect, but even if it's actually
promoted to FATAL, it's not a PANIC.

It is *generally* true that if a worker hits an ERROR, the error will
be propagated back to the leader, but it is not an invariable rule.
One pretty common way that it fails to happen - common in the sense
that it comes up during development, not common on production systems
I hope - is if a worker dies before reaching the call to
pq_redirect_to_shm_mq(). Before that, there's no possibility of
communicating anything. Granted, at that point we shouldn't yet have
done any work that might mess up the query results.  Similarly, once
we reach that point, we are dependent on a certain amount of good
behavior for things to work as expected; yeah, any code that calls
proc_exit() is supposed to signal an ERROR or FATAL first, but what if
it doesn't? Granted, in that case we'd probably fail to send an 'X'
message, too, so the leader would still have a chance to realize
something is wrong.

I guess I agree with you to this extent: I made a policy decision that
if a successfully registered worker fails to show up, that's an ERROR. It
would be possible to adopt the opposite policy, namely that if a
worker doesn't show up, that's an "oh well." You'd have to be very
certain that the worker wasn't going to show up later, though. For
instance, suppose you check all of the shared memory queues used for
returning tuples and find that every queue is either in a state where
(1) nobody's ever attached to it or (2) somebody attached and then
detached. This is not good enough, because it's possible that after
you checked queue #1, and found it in the former state, someone
attached and read a block, which caused queue #2 to enter the latter
state before you got around to checking it. If you decide that it's OK
to decide that we're done at this point, you'll never return the
tuples that are pushed through queue #1.

But, assuming you nailed the door shut so that such problems could not
occur, I think we could make a decision to ignore workers that failed
before doing anything interesting. Whether that would be a good policy
decision is pretty questionable in my mind. In addition to what I
mentioned before, I think there's a serious danger that errors that
users would have really wanted to know about - or developers would
really want to have known about - would get ignored. You could have
some horrible problem that's making your workers fail to launch, and
the system would just carry on as if everything were fine, except with
bad query plans. I realize that you and others might say "oh, well,
monitor your logs, then," but I think there is certainly some value in
an ordinary user being able to know that things didn't go well without
having to look into the PostgreSQL log for errors. Now, maybe you
think that's not enough value to justify having it work the way it
does today, and I certainly respect that, but I don't view it that way
myself.

What I mostly want to emphasize here is that, while parallel query has
had a number of bugs in this area that were the result of shoddy
design or inadequate testing - principally by me - this isn't one of
them. This decision was made consciously by me because I thought it
gave us the best chance of having a system that would be reliable and
have satisfying behavior for users. Sounds like not everybody agrees,
and that's fine, but I just want 

Re: 64 bit transaction id

2019-11-04 Thread Tomas Vondra

On Mon, Nov 04, 2019 at 10:44:53AM -0800, Andres Freund wrote:

Hi,

On 2019-11-04 19:39:18 +0100, Tomas Vondra wrote:

On Mon, Nov 04, 2019 at 10:04:09AM -0800, Andres Freund wrote:
> And "without causing significant issues elsewhere" unfortunately
> includes continuing to allow pg_upgrade to work.



Yeah. I suppose we could have a different AM implementing this, but
maybe that's not possible ...


Entirely possible. But the amount of code duplication / unnecessary
branching and the user confusion from two very similar AMs, would have
to be weighed against the benefits.



Agreed. I think code complexity is part of the trade-off. IMO it's fine
to hack existing heap AM initially, and only explore the separate AM if
that turns out to be promising.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: v12 and pg_restore -f-

2019-11-04 Thread Alvaro Herrera
On 2019-Nov-04, Euler Taveira wrote:

> Em seg., 4 de nov. de 2019 às 16:12, Alvaro Herrera
>  escreveu:
> > I'm not sure if we need to call out the incompatibility in the minors'
> > release notes (namely, that people using "-f-" to dump to ./- will need
> > to choose a different file name).
> >
> Should we break translations? I'm -0.5 on changing usage(). If you are
> using 9.5, you know that it does not work. If you try it by accident
> (because it works in v12), it will work but it is not that important
> to inform it in --help (if you are in doubt, checking the docs will
> answer your question).

I would rather break the translations, and make all users aware if they
look at --help.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: cost based vacuum (parallel)

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 14:33:41 -0500, Stephen Frost wrote:
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2019-11-04 14:06:19 -0500, Stephen Frost wrote:
> > > With parallelization across indexes, you could have a situation where
> > > the individual indexes are on different tablespaces with independent
> > > i/o, therefore the parallelization ends up giving you an increase in i/o
> > > throughput, not just additional CPU time.
> > 
> > How's that related to IO throttling being active or not?
> 
> You might find that you have to throttle the IO down when operating
> exclusively against one IO channel, but if you have multiple IO channels
> then the acceptable IO utilization could be higher as it would be 
> spread across the different IO channels.
> 
> In other words, the overall i/o allowance for a given operation might be
> able to be higher if it's spread across multiple i/o channels, as it
> wouldn't completely consume the i/o resources of any of them, whereas
> with a higher allowance and a single i/o channel, there would likely be
> an impact to other operations.
> 
> As for if this is really relevant only when it comes to parallel
> operations is a bit of an interesting question- these considerations
> might not require actual parallel operations as a single process might
> be able to go through multiple indexes concurrently and still hit the
> i/o limit that was set for it overall across the tablespaces.  I don't
> know that it would actually be interesting or useful to spend the effort
> to make that work though, so, from a practical perspective, it's
> probably only interesting to think about this when talking about
> parallel vacuum.

But you could just apply different budgets for different tablespaces?
That's quite doable independent of parallelism, as we don't have tables
or indexes spanning more than one tablespace.  True, you could then make
the processing of an individual vacuum faster by allowing to utilize
multiple tablespace budgets at the same time.


> I've been wondering if the accounting system should consider the cost
> per tablespace when there's multiple tablespaces involved, instead of
> throttling the overall process without consideration for the
> per-tablespace utilization.

This all seems like a feature proposal, or two, independent of the
patch/question at hand. I think there's a good argument to be had that
we should severely overhaul the current vacuum cost limiting - it's way
way too hard to understand the bandwidth that it's allowed to
consume. But unless one of the proposals makes that measurably harder or
easier, I think we don't gain anything by entangling an already complex
patchset with something new.


Greetings,

Andres Freund




Re: v12 and pg_restore -f-

2019-11-04 Thread Alvaro Herrera
On 2019-Nov-04, Tom Lane wrote:

> Alvaro Herrera  writes:
> > Turns out that this is a simple partial cherry-pick of the original
> > commit.
> 
> In the back branches, you should keep the statement that stdout
> is the default output file.  Looks sane otherwise (I didn't test it).

I propose this:

   
Specify output file for generated script, or for the listing
when used with -l. Use -
for the standard output, which is also the default.
   

Less invasive formulations sound repetitive (such as "Use - for stdout.
The default is stdout").  I'm open to suggestions.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Excessive disk usage in WindowAgg

2019-11-04 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> On 2019-11-04 19:04:52 +, Andrew Gierth wrote:
 >>> Uh, it seems obvious to me that it should be backpatched?

 >> Fine with me. But I don't think it's just plainly obvious, it's
 >> essentially a change in query plans etc, and we've been getting more
 >> hesitant with those over time.

 Tom> Since this is happening during create_plan(), it affects no
 Tom> planner decisions; it's just a pointless inefficiency AFAICS.
 Tom> Back-patching seems fine.

I will deal with it then. (probably tomorrow or so)

-- 
Andrew (irc:RhodiumToad)




Re: v12 and pg_restore -f-

2019-11-04 Thread Euler Taveira
Em seg., 4 de nov. de 2019 às 16:12, Alvaro Herrera
 escreveu:
> I'm not sure if we need to call out the incompatibility in the minors'
> release notes (namely, that people using "-f-" to dump to ./- will need
> to choose a different file name).
>
Should we break translations? I'm -0.5 on changing usage(). If you are
using 9.5, you know that it does not work. If you try it by accident
(because it works in v12), it will work but it is not that important
to inform it in --help (if you are in doubt, checking the docs will
answer your question).


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: cost based vacuum (parallel)

2019-11-04 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-11-04 14:06:19 -0500, Stephen Frost wrote:
> > * Jeff Janes (jeff.ja...@gmail.com) wrote:
> > > On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila  
> > > wrote:
> > > > For parallel vacuum [1], we were discussing what is the best way to
> > > > divide the cost among parallel workers but we didn't get many inputs
> > > > apart from people who are very actively involved in patch development.
> > > > I feel that we need some more inputs before we finalize anything, so
> > > > starting a new thread.
> > > 
> > > Maybe I just don't have experience in the type of system that parallel
> > > vacuum is needed for, but if there is any meaningful IO throttling which 
> > > is
> > > active, then what is the point of doing the vacuum in parallel in the 
> > > first
> > > place?
> > 
> > With parallelization across indexes, you could have a situation where
> > the individual indexes are on different tablespaces with independent
> > i/o, therefore the parallelization ends up giving you an increase in i/o
> > throughput, not just additional CPU time.
> 
> How's that related to IO throttling being active or not?

You might find that you have to throttle the IO down when operating
exclusively against one IO channel, but if you have multiple IO channels
then the acceptable IO utilization could be higher as it would be 
spread across the different IO channels.

In other words, the overall i/o allowance for a given operation might be
able to be higher if it's spread across multiple i/o channels, as it
wouldn't completely consume the i/o resources of any of them, whereas
with a higher allowance and a single i/o channel, there would likely be
an impact to other operations.

As for if this is really relevant only when it comes to parallel
operations is a bit of an interesting question- these considerations
might not require actual parallel operations as a single process might
be able to go through multiple indexes concurrently and still hit the
i/o limit that was set for it overall across the tablespaces.  I don't
know that it would actually be interesting or useful to spend the effort
to make that work though, so, from a practical perspective, it's
probably only interesting to think about this when talking about
parallel vacuum.

I've been wondering if the accounting system should consider the cost
per tablespace when there's multiple tablespaces involved, instead of
throttling the overall process without consideration for the
per-tablespace utilization.

Thanks,

Stephen




Re: v12 and pg_restore -f-

2019-11-04 Thread Tom Lane
Alvaro Herrera  writes:
> Turns out that this is a simple partial cherry-pick of the original
> commit.

In the back branches, you should keep the statement that stdout
is the default output file.  Looks sane otherwise (I didn't test it).

> I'm not sure if we need to call out the incompatibility in the minors'
> release notes (namely, that people using "-f-" to dump to ./- will need
> to choose a different file name).

Well, we'll have to document the addition of the feature.  I think it
can be phrased positively though.

regards, tom lane




Re: Excessive disk usage in WindowAgg

2019-11-04 Thread Tom Lane
Andres Freund  writes:
> On 2019-11-04 19:04:52 +, Andrew Gierth wrote:
>> Uh, it seems obvious to me that it should be backpatched?

> Fine with me. But I don't think it's just plainly obvious, it's
> essentially a change in query plans etc, and we've been getting more
> hesitant with those over time.

Since this is happening during create_plan(), it affects no planner
decisions; it's just a pointless inefficiency AFAICS.  Back-patching
seems fine.

regards, tom lane




Re: v12 and pg_restore -f-

2019-11-04 Thread Alvaro Herrera
On 2019-Nov-04, Stephen Frost wrote:

> > Alvaro Herrera  writes:

> > > +1 for this, FWIW.  Let's get it done before next week minors.  Is
> > > anybody writing a patch?  If not, I can do it.

Turns out that this is a simple partial cherry-pick of the original
commit.

I'm not sure if we need to call out the incompatibility in the minors'
release notes (namely, that people using "-f-" to dump to ./- will need
to choose a different file name).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 2469191dc4b6ce434cc73f9a2fc57116035a6443 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 4 Nov 2019 15:50:57 -0300
Subject: [PATCH] Change pg_restore -f- to dump to stdout instead of to ./-

Starting with PostgreSQL 12, pg_restore refuses to run when neither -d
nor -f are specified (c.f. commit 413ccaa74d9a), and it also makes "-f -"
mean the old implicit behavior of dumping to stdout.  However, older
branches write to a file called ./- when invoked like that, making it
impossible to write pg_restore scripts that work across versions.  This
is a partial backpatch of the aforementioned commit to all older
supported branches, providing an upgrade path.

Discussion: https://postgr.es/m/20191006190839.ge18...@telsasoft.com
---
 doc/src/sgml/ref/pg_restore.sgml | 4 ++--
 src/bin/pg_dump/pg_backup_archiver.c | 7 ++-
 src/bin/pg_dump/pg_restore.c | 2 +-
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 725acb192c..18c9257ae9 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -176,8 +176,8 @@
   

 Specify output file for generated script, or for the listing
-when used with -l. Default is the standard
-output.
+when used with -l. Use -
+for stdout.

   
  
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 01b4af64f6..1ebbe852bc 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -1511,7 +1511,12 @@ SetOutput(ArchiveHandle *AH, const char *filename, int compression)
 	int			fn;
 
 	if (filename)
-		fn = -1;
+	{
+		if (strcmp(filename, "-") == 0)
+			fn = fileno(stdout);
+		else
+			fn = -1;
+	}
 	else if (AH->FH)
 		fn = fileno(AH->FH);
 	else if (AH->fSpec)
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index 34d93ab472..f5df4e63d7 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -454,7 +454,7 @@ usage(const char *progname)
 
 	printf(_("\nGeneral options:\n"));
 	printf(_("  -d, --dbname=NAMEconnect to database name\n"));
-	printf(_("  -f, --file=FILENAME  output file name\n"));
+	printf(_("  -f, --file=FILENAME  output file name (- for stdout)\n"));
 	printf(_("  -F, --format=c|d|t   backup file format (should be automatic)\n"));
 	printf(_("  -l, --list   print summarized TOC of the archive\n"));
 	printf(_("  -v, --verboseverbose mode\n"));
-- 
2.20.1



Re: Excessive disk usage in WindowAgg

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 19:04:52 +, Andrew Gierth wrote:
> > "Andres" == Andres Freund  writes:
> 
>  >>> Obviously we _do_ need to be more picky about this; it seems clear
>  >>> that using CP_SMALL_TLIST | CP_LABEL_TLIST would be a win in many
>  >>> cases. Opinions?
> 
>  >> Seems reasonable to me, do you want to do the honors?
> 
>  Andres> I was briefly wondering if this ought to be backpatched. -0
>  Andres> here, but...
> 
> Uh, it seems obvious to me that it should be backpatched?

Fine with me. But I don't think it's just plainly obvious, it's
essentially a change in query plans etc, and we've been getting more
hesitant with those over time.

Greetings,

Andres Freund




Re: Missed check for too-many-children in bgworker spawning

2019-11-04 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-10-09 12:29:18 -0400, Robert Haas wrote:
> > I would say rather that if fork() is failing on your system, you have
> > a not very stable system.
> 
> I don't think that's really true, fwiw. It's often a good idea to turn
> on strict memory overcommit accounting, and with that set, it's actually
> fairly common to see fork() fail with ENOMEM, even if there's
> practically a reasonable amount of resources. Especially with larger
> shared buffers and without huge pages, the amount of memory needed for a
> postmaster child in the worst case is not insubstantial.

I've not followed this thread very closely, but I agree with Andres here
wrt fork() failing with ENOMEM in the field and not because the system
isn't stable.

Thanks,

Stephen




Re: cost based vacuum (parallel)

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 14:06:19 -0500, Stephen Frost wrote:
> * Jeff Janes (jeff.ja...@gmail.com) wrote:
> > On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila  wrote:
> > > For parallel vacuum [1], we were discussing what is the best way to
> > > divide the cost among parallel workers but we didn't get many inputs
> > > apart from people who are very actively involved in patch development.
> > > I feel that we need some more inputs before we finalize anything, so
> > > starting a new thread.
> > 
> > Maybe I just don't have experience in the type of system that parallel
> > vacuum is needed for, but if there is any meaningful IO throttling which is
> > active, then what is the point of doing the vacuum in parallel in the first
> > place?
> 
> With parallelization across indexes, you could have a situation where
> the individual indexes are on different tablespaces with independent
> i/o, therefore the parallelization ends up giving you an increase in i/o
> throughput, not just additional CPU time.

How's that related to IO throttling being active or not?

Greetings,

Andres Freund




Re: cost based vacuum (parallel)

2019-11-04 Thread Stephen Frost
Greetings,

* Jeff Janes (jeff.ja...@gmail.com) wrote:
> On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila  wrote:
> > For parallel vacuum [1], we were discussing what is the best way to
> > divide the cost among parallel workers but we didn't get many inputs
> > apart from people who are very actively involved in patch development.
> > I feel that we need some more inputs before we finalize anything, so
> > starting a new thread.
> 
> Maybe I just don't have experience in the type of system that parallel
> vacuum is needed for, but if there is any meaningful IO throttling which is
> active, then what is the point of doing the vacuum in parallel in the first
> place?

With parallelization across indexes, you could have a situation where
the individual indexes are on different tablespaces with independent
i/o, therefore the parallelization ends up giving you an increase in i/o
throughput, not just additional CPU time.

Thanks,

Stephen




Re: Missed check for too-many-children in bgworker spawning

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 12:14:53 -0500, Robert Haas wrote:
> If a process trying to register workers finds out that no worker slots
> are available, it discovers this at the time it tries to perform the
> registration. But fork() failure happens later and in a different
> process. The original process just finds out that the worker is
> "stopped," not whether or not it ever got started in the first
> place.

Is that really true? In the case where it started and failed we expect
the error queue to have been attached to, and there to be either an
error 'E' or a 'X' response (cf HandleParallelMessage()).  It doesn't
strike me as very complicated to keep track of whether any worker has
sent an 'E' or not, no?  I don't think we really need the

Funny (?) anecdote: I learned about this part of the system recently,
after I had installed some crash handler inside postgres. Turns out that
that diverted, as a side-effect, SIGUSR1 to its own signal handler. All
tests in the main regression tests passed, except for ones getting stuck
waiting for WaitForParallelWorkersToFinish(), which could be fixed by
disabling parallelism aggressively. Took me like two hours to
debug... Also, a bit sad that parallel query is the only visible
failure (in the main tests) of breaking the sigusr1 infrastructure...


> We certainly can't ignore a worker that managed to start and
> then bombed out, because it might've already, for example, claimed a
> block from a Parallel Seq Scan and not yet sent back the corresponding
> tuples. We could ignore a worker that never started at all, due to
> EAGAIN or whatever else, but the original process that registered the
> worker has no way of finding this out.

Sure, but in that case we'd have gotten either an error back from the
worker, or postmaster would have PANIC restarted everyone due to an
unhandled error in the worker, no?


> And even if you solved for all of that, I think you might still find
> that it breaks some parallel query (or parallel create index) code
> that expects the number of workers to change at registration time, but
> not afterwards. So, that code would all need to be adjusted.

Fair enough. Although I think practically nearly everything has to be
ready to handle workers just being slow to start up anyway, no? There are
plenty of cases where we just finish before all workers get around
to doing work.

Greetings,

Andres Freund




Re: Excessive disk usage in WindowAgg

2019-11-04 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 >>> Obviously we _do_ need to be more picky about this; it seems clear
 >>> that using CP_SMALL_TLIST | CP_LABEL_TLIST would be a win in many
 >>> cases. Opinions?

 >> Seems reasonable to me, do you want to do the honors?

 Andres> I was briefly wondering if this ought to be backpatched. -0
 Andres> here, but...

Uh, it seems obvious to me that it should be backpatched?

-- 
Andrew (irc:RhodiumToad)




Re: Wrong value in metapage of GIN INDEX.

2019-11-04 Thread Tom Lane
"imai.yoshik...@fujitsu.com"  writes:
> Moon-san, kuroda.keisuke-san
> On Thu, Aug 29, 2019 at 8:20 AM, Moon, Insung wrote:
>> The patch is very simple.
>> Fix to increase the value of nEntries only when a non-duplicate GIN index 
>> leaf added.

> Does nentries show the number of entries in the leaf pages?
> If so, the fix seems to be correct.

I looked at this issue.  The code in ginEntryInsert is not obviously wrong
by itself; it depends on what you think nEntries is supposed to count.
However, ginvacuum.c updates nEntries to the sum of PageGetMaxOffsetNumber()
across all the index's leaf pages, ie the number of surviving leaf items.

It's hard to see how ginvacuum could reverse-engineer a value that would
match what ginEntryInsert is doing, so probably we ought to define
nEntries as the number of leaf items, which seems to make the proposed
patch correct.  (It could use a bit more commentary though.)

I'm inclined to apply this to HEAD only; it doesn't seem significant
enough to justify back-patching.

regards, tom lane




Re: Missed check for too-many-children in bgworker spawning

2019-11-04 Thread Andres Freund
Hi,

On 2019-10-09 12:29:18 -0400, Robert Haas wrote:
> I would say rather that if fork() is failing on your system, you have
> a not very stable system.

I don't think that's really true, fwiw. It's often a good idea to turn
on strict memory overcommit accounting, and with that set, it's actually
fairly common to see fork() fail with ENOMEM, even if there's
practically a reasonable amount of resources. Especially with larger
shared buffers and without huge pages, the amount of memory needed for a
postmaster child in the worst case is not insubstantial.


> The fact that parallel query is going to fail is sad, but not as sad
> as the fact that connecting to the database is also going to fail, and
> that logging into the system to try to fix the problem may well fail
> as well.

Well, but parallel query also has the potential to much more quickly
lead to a lot of new backends being started than you'd get new
connections on an analytics DB.


> Code that tries to make parallel query cope with this situation
> without an error wouldn't often be tested, so it might be buggy, and
> it wouldn't necessarily be a benefit if it did work. I expect many
> people would rather have the query fail and free up slots in the
> system process table than consume precisely all of them and then try
> to execute the query at a slower-than-expected rate.

I concede that you have a point here.

Greetings,

Andres Freund




Re: 64 bit transaction id

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 19:39:18 +0100, Tomas Vondra wrote:
> On Mon, Nov 04, 2019 at 10:04:09AM -0800, Andres Freund wrote:
> > And "without causing significant issues elsewhere" unfortunately
> > includes continuing to allow pg_upgrade to work.

> Yeah. I suppose we could have a different AM implementing this, but
> maybe that's not possible ...

Entirely possible. But the amount of code duplication / unnecessary
branching and the user confusion from two very similar AMs, would have
to be weighed against the benefits.

Greetings,

Andres Freund




Re: 64 bit transaction id

2019-11-04 Thread Tomas Vondra

On Mon, Nov 04, 2019 at 10:04:09AM -0800, Andres Freund wrote:

Hi,

(I've not read the rest of this thread yet)

On 2019-11-04 16:07:23 +0100, Tomas Vondra wrote:

On Mon, Nov 04, 2019 at 04:39:44PM +0300, Павел Ерёмин wrote:
>   And yet, if I try to implement a similar mechanism, if successful, will my
>   revision be considered?
>    

Why wouldn't it be considered? If you submit a patch that demonstrably
improves the behavior (in this case reduces per-tuple overhead without
causing significant issues elsewhere), we'd be crazy not to consider it.


And "without causing significant issues elsewhere" unfortunately
includes continuing to allow pg_upgrade to work.



Yeah. I suppose we could have a different AM implementing this, but
maybe that's not possible ...


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: cost based vacuum (parallel)

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 12:59:02 -0500, Jeff Janes wrote:
> On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila  wrote:
>
> > For parallel vacuum [1], we were discussing what is the best way to
> > divide the cost among parallel workers but we didn't get many inputs
> > apart from people who are very actively involved in patch development.
> > I feel that we need some more inputs before we finalize anything, so
> > starting a new thread.
> >
>
> Maybe I just don't have experience in the type of system that parallel
> vacuum is needed for, but if there is any meaningful IO throttling which is
> active, then what is the point of doing the vacuum in parallel in the first
> place?

I am wondering the same - but to be fair, it's pretty easy to run into
cases where VACUUM is CPU bound. E.g. because most pages are in
shared_buffers, and compared to the size of the indexes number of tids
that need to be pruned is fairly small (also [1]). That means a lot of
pages need to be scanned, without a whole lot of IO going on. The
problem with that is just that the defaults for vacuum throttling will
also apply here, I've never seen anybody tune vacuum_cost_page_hit = 0,
vacuum_cost_page_dirty=0 or such (in contrast, the latter is the highest
cost currently).  Nor do we reduce the cost of vacuum_cost_page_dirty
for unlogged tables.

So while it doesn't seem unreasonable to want to use cost limiting to
protect against vacuum unexpectedly causing too much, especially read,
IO, I'm doubtful it has current practical relevance.

I'm wondering how much of the benefit of parallel vacuum really is just
to work around vacuum ringbuffers often massively hurting performance
(see e.g. [2]). Surely not all, but I'd be very unsurprised if it were a
large fraction.

Greetings,

Andres Freund

[1] I don't think the patch addresses this, IIUC it's only running index
vacuums in parallel, but it's very easy to run into being CPU
bottlenecked when vacuuming a busily updated table. heap_hot_prune
can be really expensive, especially with longer update chains (I
think it may have an O(n^2) worst case even).
[2] 
https://www.postgresql.org/message-id/20160406105716.fhk2eparljthpzp6%40alap3.anarazel.de




Re: cost based vacuum (parallel)

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 12:24:35 +0530, Amit Kapila wrote:
> For parallel vacuum [1], we were discussing what is the best way to
> divide the cost among parallel workers but we didn't get many inputs
> apart from people who are very actively involved in patch development.
> I feel that we need some more inputs before we finalize anything, so
> starting a new thread.
> 
> The initial version of the patch has a very rudimentary way of doing
> it which means each parallel vacuum worker operates independently
> w.r.t vacuum delay and cost.

Yea, that seems not ok for cases where vacuum delay is active.

There's also the question of when/why it is beneficial to use
parallelism when you're going to encounter IO limits in all likelihood.


> This will lead to more I/O in the system
> than the user has intended to do.  Assume that the overall I/O allowed
> for vacuum operation is X after which it will sleep for some time,
> reset the balance and continue.  In the patch, each worker will be
> allowed to perform X before which it can sleep and also there is no
> coordination for the same with master backend which would have done
> some I/O for the heap.  So, in the worst-case scenario, there can be n
> times more I/O where n is the number of workers doing the parallel
> operation.  This is somewhat similar to a memory usage problem with a
> parallel query where each worker is allowed to use up to work_mem of
> memory.  We can say that the users using parallel operation can expect
> more system resources to be used as they want to get the operation
> done faster, so we are fine with this.  However, I am not sure if that
> is the right thing, so we should try to come up with some solution for
> it and if the solution is too complex, then probably we can think of
> documenting such behavior.

I mean for parallel query the problem wasn't really introduced in
parallel query, it existed before - and does still - for non-parallel
queries. And there's a complex underlying planning issue. I don't think
this is a good excuse for VACUUM, where none of the complex "number of
paths considered" issues etc apply.


> The two approaches to solve this problem being discussed in that
> thread [1] are as follows:
> (a) Allow the parallel workers and master backend to have a shared
> view of vacuum cost related parameters (mainly VacuumCostBalance) and
> allow each worker to update it and then based on that decide whether
> it needs to sleep.  Sawada-San has done the POC for this approach.
> See v32-0004-PoC-shared-vacuum-cost-balance in email [2].  One
> drawback of this approach could be that we allow the worker to sleep
> even though the I/O has been performed by some other worker.

I don't understand this drawback.


> (b) The other idea could be that we split the I/O among workers
> something similar to what we do for auto vacuum workers (see
> autovac_balance_cost).  The basic idea would be that before launching
> workers, we need to compute the remaining I/O (heap operation would
> have used something) after which we need to sleep and split it equally
> across workers.  Here, we are primarily thinking of dividing
> VacuumCostBalance and VacuumCostLimit parameters.  Once the workers
> are finished, they need to let master backend know how much I/O they
> have consumed and then master backend can add it to it's current I/O
> consumed.  I think we also need to rebalance the cost of remaining
> workers once some of the worker's exit.  Dilip has prepared a POC
> patch for this, see 0002-POC-divide-vacuum-cost-limit in email [3].

(b) doesn't strike me as advantageous. It seems quite possible that you
end up with one worker that has a lot more IO than others, leading to
unnecessary sleeps, even though the actually available IO budget has not
been used up. Quite easy to see how that'd lead to parallel VACUUM
having a lower throughput than a single threaded one.
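
For what it's worth, a minimal sketch of the shared-balance idea in (a) -
the names and placement here are made up for illustration, not taken from
Sawada-san's POC - could look like:

#include "postgres.h"
#include "miscadmin.h"			/* VacuumCostLimit, VacuumCostDelay */
#include "port/atomics.h"

/* Hypothetical shared state, assumed to live in the parallel-vacuum DSM. */
typedef struct VacuumSharedCost
{
	pg_atomic_uint32 balance;	/* cost accumulated by leader + workers */
} VacuumSharedCost;

/*
 * Every participant adds its locally accumulated cost to the shared
 * balance; whoever pushes it over the limit sleeps and resets it.  A real
 * implementation would have to deal with races around the reset and with
 * the "sleeping for someone else's I/O" question mentioned above.
 */
static void
shared_vacuum_delay_point(VacuumSharedCost *shared, int local_cost)
{
	uint32		total;

	total = pg_atomic_add_fetch_u32(&shared->balance, local_cost);
	if (total >= (uint32) VacuumCostLimit)
	{
		pg_usleep((long) (VacuumCostDelay * 1000));
		pg_atomic_write_u32(&shared->balance, 0);
	}
}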

Greetings,

Andres Freund




Re: Missed check for too-many-children in bgworker spawning

2019-11-04 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Nov-04, Robert Haas wrote:
>> On Mon, Nov 4, 2019 at 10:42 AM Alvaro Herrera  
>> wrote:
>>> I agree with this point in principle.  Everything else (queries,
>>> checkpointing) can fail, but it's critical that postmaster continues to
>>> run [...]

>> Sure, I'm not arguing that the postmaster should blow up and die.

> I must have misinterpreted you, then.  But then I also misinterpreted
> Tom, because I thought it was this stability problem that was "utter
> bunkum".

I fixed the postmaster crash problem in commit 3887e9455.  The residual
issue that I think is entirely bogus is that the parallel query start
code will silently continue without workers if it hits our internal
resource limit of how many bgworker ProcArray slots there are, but
not do the same when it hits the external resource limit of the
kernel refusing to fork().  I grant that there might be implementation
reasons for that being difficult, but I reject Robert's apparent
opinion that it's somehow desirable to behave that way.  As things
stand, we have all of the disadvantages that you can't predict how
many workers you'll get, and none of the advantages of robustness
in the face of system resource exhaustion.

regards, tom lane




Re: Excessive disk usage in WindowAgg

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 12:18:48 -0500, Tom Lane wrote:
> Andrew Gierth  writes:
> > Using 92MB of disk for one integer seems excessive; the reason is clear
> > from the explain:
> > ...
> > so the whole width of the table is being stored in the tuplestore used
> > by the windowagg.
> 
> > In create_windowagg_plan, we have:
> 
> > /*
> >  * WindowAgg can project, so no need to be terribly picky about child
> >  * tlist, but we do need grouping columns to be available
> >  */
> > subplan = create_plan_recurse(root, best_path->subpath, CP_LABEL_TLIST);
> 
> > Obviously we _do_ need to be more picky about this; it seems clear that
> > using CP_SMALL_TLIST | CP_LABEL_TLIST would be a win in many cases.
> > Opinions?
> 
> Seems reasonable to me, do you want to do the honors?

I was briefly wondering if this ought to be backpatched. -0 here, but...

Greetings,

Andres Freund




Re: 64 bit transaction id

2019-11-04 Thread Andres Freund
Hi,

(I've not read the rest of this thread yet)

On 2019-11-04 16:07:23 +0100, Tomas Vondra wrote:
> On Mon, Nov 04, 2019 at 04:39:44PM +0300, Павел Ерёмин wrote:
> >   And yet, if I try to implement a similar mechanism, if successful, will my
> >   revision be considered?
> >    
> 
> Why wouldn't it be considered? If you submit a patch that demonstrably
> improves the behavior (in this case reduces per-tuple overhead without
> causing significant issues elsewhere), we'd be crazy not to consider it.

And "without causing significant issues elsewhere" unfortunately
includes continuing to allow pg_upgrade to work.

Greetings,

Andres Freund




Re: [PATCH] contrib/seg: Fix PG_GETARG_SEG_P definition

2019-11-04 Thread Andres Freund
Hi,

On 2019-11-04 11:30:23 +, Dagfinn Ilmari Mannsåker wrote:
> Tom Lane  writes:
> 
> > ilm...@ilmari.org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=) writes:
> >> I just noticed that when contrib/seg was converted to V1 calling
> >> convention (commit 389bb2818f4), the PG_GETARG_SEG_P() macro got defined
> >> in terms of PG_GETARG_POINTER().  But it itself calls DatumGetPointer(),
> >> so shouldn't it be using PG_GETARG_DATUM()?
> >
> > Yup, I agree.  Pushed.
> 
> Thanks!

Thanks both of you.

- Andres




Re: [PATCH] Include triggers in EXPLAIN

2019-11-04 Thread Andres Freund
Hi,

(minor note - on PG lists the style is to quote in-line and trim)

On 2019-11-04 10:35:25 +0100, Josef Šimánek wrote:
> Thanks for quick response. As I was testing this feature it shows all
> "possible" triggers to be executed running given query. The benefit of
> having this information in EXPLAIN as well is you do not need to execute
> the query (as EXPLAIN ANALYZE does). My usecase is to take a look at query
> before it is executed to get some idea about the plan with EXPLAIN.

I can actually see some value in additional information here, but I'd
probably want to change the format a bit. When explicitly desired (or
perhaps just in verbose mode?), I see value in counting the number of
triggers we know about that need to be checked, how many were excluded
on the basis of the trigger's WHEN clause etc.


> Do you have idea about some case where actual trigger will be missing in
> EXPLAIN with current implementation, but will be present in EXPLAIN
> ANALYZE? I can take a look if there's any way how to handle those cases as
> well.

Any triggers that are fired because of other, listed, triggers causing
other changes. E.g. a logging trigger that inserts into a log table -
EXPLAIN, without ANALYZE, doesn't have a way of knowing about that.

And before you say that sounds like a niche issue - it's not in my
experience. Forgetting the necessary indexes two or three foreign keys
down a CASCADE chain seems to be more common than doing so for tables
directly "linked" with busy ones.

Greetings,

Andres Freund




Re: cost based vacuum (parallel)

2019-11-04 Thread Jeff Janes
On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila  wrote:

> For parallel vacuum [1], we were discussing what is the best way to
> divide the cost among parallel workers but we didn't get many inputs
> apart from people who are very actively involved in patch development.
> I feel that we need some more inputs before we finalize anything, so
> starting a new thread.
>

Maybe I just don't have experience in the type of system that parallel
vacuum is needed for, but if there is any meaningful IO throttling which is
active, then what is the point of doing the vacuum in parallel in the first
place?

Cheers,

Jeff


Re: Keep compiler silence (clang 10, implicit conversion from 'long' to 'double' )

2019-11-04 Thread Tom Lane
Yuya Watari  writes:
> I attached the modified patch. In the patch, I placed the macro in
> "src/include/c.h", but this may not be a good choice because c.h is
> widely included from a lot of files. Do you have any good ideas about
> its placement?

I agree that there's an actual bug here; it can be demonstrated with

# select extract(epoch from '256 microseconds'::interval * (2^55)::float8);
 date_part  
--------------------
 -9223372036854.775
(1 row)

which clearly is a wrong answer.

I do not however like any of the proposed patches.  We already have one
place that deals with this problem correctly, in int8.c's dtoi8():

/*
 * Range check.  We must be careful here that the boundary values are
 * expressed exactly in the float domain.  We expect PG_INT64_MIN to be an
 * exact power of 2, so it will be represented exactly; but PG_INT64_MAX
 * isn't, and might get rounded off, so avoid using it.
 */
if (unlikely(num < (float8) PG_INT64_MIN ||
 num >= -((float8) PG_INT64_MIN) ||
 isnan(num)))
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 errmsg("bigint out of range")));

We should adopt that coding technique not invent new ones.

I do concur with creating a macro that encapsulates a correct version
of this test, maybe like

#define DOUBLE_FITS_IN_INT64(num) \
((num) >= (double) PG_INT64_MIN && \
 (num) < -((double) PG_INT64_MIN))

(or s/double/float8/ ?)

c.h is probably a reasonable place, seeing that we define the constants
there.
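
For illustration, a call site would then follow the same pattern as
dtoi8() above; the helper below is a hypothetical sketch, not code from
any posted patch:

/* Assumes the usual backend environment (postgres.h, utils/builtins.h). */
static int64
float8_to_int64_checked(float8 num)
{
	if (unlikely(isnan(num) || !DOUBLE_FITS_IN_INT64(num)))
		ereport(ERROR,
				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
				 errmsg("bigint out of range")));
	return (int64) num;
}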

regards, tom lane




Re: updating unaccent.rules for Arabic letters

2019-11-04 Thread Daniel Verite
kerbrose khaled wrote:

> I would like to update unaccent.rules file to support Arabic letters. so
> could someone help me or tell me how could I add such contribution. I
> attached the file including the modifications, only the last 4 lines.

The Arabic letters are found in the Unicode block U+0600 to U+06FF 
(https://www.fileformat.info/info/unicode/block/arabic/list.htm)
There has been no coverage of this block until now by the unaccent
module. Since Arabic uses several diacritics [1], it would be best to
figure out all the transliterations that should go in and add them in
one go (plus coding that in the Python script).

The canonical way to unaccent is normally to apply a Unicode
transformation: NFC -> NFD and remove the non-spacing marks.

I've tentatively done that with each codepoint in the 0600-06FF block
in SQL with icu_transform in icu_ext [2], and it produces the
attached result, with 60 (!) entries, along with Unicode names for
readability.

Does that make sense to people who know Arabic?

For the record, here's the query:

WITH block(cp) AS (
    select * FROM generate_series(x'600'::int, x'6ff'::int) AS cp),
  dest AS (
    select cp,
           icu_transform(chr(cp),
                         'any-NFD;[:nonspacing mark:] any-remove; any-NFC') AS unaccented
    FROM block)
SELECT
  chr(cp) as "src",
  icu_transform(chr(cp), 'Name') as "srcName",
  dest.unaccented as "dest",
  icu_transform(dest.unaccented, 'Name') as "destName"
FROM dest
WHERE chr(cp) <> dest.unaccented;


[1] https://en.wikipedia.org/wiki/Arabic_diacritics
[2] https://github.com/dverite/icu_ext#icu_transform


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


unaccent-arabic-block.utf8.output
Description: Binary data


Re: Missed check for too-many-children in bgworker spawning

2019-11-04 Thread Alvaro Herrera
On 2019-Nov-04, Robert Haas wrote:

> On Mon, Nov 4, 2019 at 10:42 AM Alvaro Herrera  
> wrote:
> > > True, it's not a situation you especially want to be in.  However,
> > > I've lost count of the number of times that I've heard someone talk
> > > about how their system was overstressed to the point that everything
> > > else was failing, but Postgres kept chugging along.  That's a good
> > > reputation to have and we shouldn't just walk away from it.
> >
> > I agree with this point in principle.  Everything else (queries,
> > checkpointing) can fail, but it's critical that postmaster continues to
> > run [...]
> 
> Sure, I'm not arguing that the postmaster should blow up and die.

I must have misinterpreted you, then.  But then I also misinterpreted
Tom, because I thought it was this stability problem that was "utter
bunkum".

> I was, however, arguing that if the postmaster fails to launch workers
> for a parallel query due to process table exhaustion, it's OK for
> *that query* to error out.

That position makes sense to me.  It would be nice [..ponies..] for the
query to run regardless, but if it doesn't, it's not such a big deal;
the query could have equally failed to run in a single process anyway.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Excessive disk usage in WindowAgg

2019-11-04 Thread Tom Lane
Andrew Gierth  writes:
> Using 92MB of disk for one integer seems excessive; the reason is clear
> from the explain:
> ...
> so the whole width of the table is being stored in the tuplestore used
> by the windowagg.

> In create_windowagg_plan, we have:

> /*
>  * WindowAgg can project, so no need to be terribly picky about child
>  * tlist, but we do need grouping columns to be available
>  */
> subplan = create_plan_recurse(root, best_path->subpath, CP_LABEL_TLIST);

> Obviously we _do_ need to be more picky about this; it seems clear that
> using CP_SMALL_TLIST | CP_LABEL_TLIST would be a win in many cases.
> Opinions?

Seems reasonable to me, do you want to do the honors?
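
(Presumably the change amounts to this one-liner in create_windowagg_plan();
shown only as a sketch of the flag combination, not a committed patch:)

	subplan = create_plan_recurse(root, best_path->subpath,
								  CP_SMALL_TLIST | CP_LABEL_TLIST);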

regards, tom lane




Re: Missed check for too-many-children in bgworker spawning

2019-11-04 Thread Robert Haas
On Mon, Nov 4, 2019 at 10:42 AM Alvaro Herrera  wrote:
> > True, it's not a situation you especially want to be in.  However,
> > I've lost count of the number of times that I've heard someone talk
> > about how their system was overstressed to the point that everything
> > else was failing, but Postgres kept chugging along.  That's a good
> > reputation to have and we shouldn't just walk away from it.
>
> I agree with this point in principle.  Everything else (queries,
> checkpointing) can fail, but it's critical that postmaster continues to
> run -- that way, once the high load episode is over, connections can be
> re-established as needed, auxiliary processes can be re-launched, and
> the system can be again working normally.  If postmaster dies, all bets
> are off.  Also: an idle postmaster is not using any resources; on its
> own, killing it or it dying would not free any useful resources for the
> system load to be back to low again.

Sure, I'm not arguing that the postmaster should blow up and die.

I was, however, arguing that if the postmaster fails to launch workers
for a parallel query due to process table exhaustion, it's OK for
*that query* to error out.

Tom finds that argument to be "utter bunkum," but I don't agree. I
think there might also be some implementation complexity there that is
more than meets the eye. If a process trying to register workers finds
out that no worker slots are available, it discovers this at the time
it tries to perform the registration. But fork() failure happens later
and in a different process. The original process just finds out that
the worker is "stopped," not whether or not it ever got started in the
first place. We certainly can't ignore a worker that managed to start
and then bombed out, because it might've already, for example, claimed
a block from a Parallel Seq Scan and not yet sent back the
corresponding tuples. We could ignore a worker that never started at
all, due to EAGAIN or whatever else, but the original process that
registered the worker has no way of finding this out.
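
(To make that concrete: as far as I know, all the registering backend can do
is poll the handle roughly as below, using the bgworker.h API - a fragment,
not actual parallel.c code:)

	BgwHandleStatus status;
	pid_t		pid;

	/* 'handle' is the BackgroundWorkerHandle obtained at registration time */
	status = GetBackgroundWorkerPid(handle, &pid);

	/*
	 * BGWH_STOPPED lumps together "ran and then exited" and "was never
	 * forked at all", which is exactly the distinction we'd need here.
	 */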

Now you might think we could just fix that by having the postmaster
record something in the slot, but that doesn't work either, because
the slot could get reused before the original process checks the
status information. The fact that the slot has been reused is
sufficient evidence that the worker was unregistered, which means it
either stopped or we gave up on starting it, but it doesn't tell us
which one. To be able to tell that, we'd have to have a mechanism to
prevent slots from getting reused until any necessary exit status
information had bene read, sort of like the OS-level zombie process
mechanism (which we all love, I guess, and therefore definitely want
to reinvent...?). The postmaster logic would need to be made more
complicated, so that zombies couldn't accumulate: if a process asked
for status notifications, but then died, any zombies waiting for it
would need to be cleared. And you'd also have to make sure that a
process which didn't die was guaranteed to read the status from the
zombie to clear it, and that it did so in a reasonably timely fashion,
which is currently in no way guaranteed and does not appear at all
straightforward to guarantee.

And even if you solved for all of that, I think you might still find
that it breaks some parallel query (or parallel create index) code
that expects the number of workers to change at registration time, but
not afterwards. So, that code would all need to be adjusted.

In short, I think Tom wants a pony. But that does not mean we should
not fix this bug.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Issues with PAM : log that it failed, whether it actually failed or not

2019-11-04 Thread Tom Lane
[ redirecting to pgsql-hackers ]

I wrote:
> La Cancellera Yoann  writes:
>> I am having issues with PAM auth :
>> it works, password are correctly checked, unknown users cannot access,
>> known user can, everything looks good
>> But, it always log an error by default even if auth is succesful:
>> And if auth is unsuccessful, it will log that very same message twice

> Those aren't errors, they're just log events.

> If you're using psql to connect, the extra messages aren't surprising,
> because psql will first try to connect without a password, and only
> if it gets a failure that indicates that a password is needed will
> it prompt the user for a password (so two connection attempts occur,
> even if the second one is successful).  You can override that default
> behavior with the -W switch, and I bet that will make the extra
> log messages go away.

> Having said that, using LOG level for unsurprising auth failures
> seems excessively chatty.  More-commonly-used auth methods aren't
> that noisy.

I took a closer look at this and realized that the problem is that
the PAM code doesn't support our existing convention of not logging
anything about connections wherein the client side disconnects when
challenged for a password.  0001 attached fixes that, not in a
terribly nice way perhaps, but the PAM code is already relying on
static variables for communication :-(.

Also, 0002 adjusts some messages in the same file to match project
capitalization conventions.

Barring objections, I propose to back-patch 0001 but apply 0002
to HEAD only.

regards, tom lane

diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index d28271c..909d736 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -110,6 +110,7 @@ static const char *pam_passwd = NULL;	/* Workaround for Solaris 2.6
 		 * brokenness */
 static Port *pam_port_cludge;	/* Workaround for passing "Port *port" into
  * pam_passwd_conv_proc */
+static bool pam_no_password;	/* For detecting no-password-given */
 #endif			/* USE_PAM */
 
 
@@ -2099,8 +2099,10 @@ pam_passwd_conv_proc(int num_msg, const struct pam_message **msg,
 	{
 		/*
 		 * Client didn't want to send password.  We
-		 * intentionally do not log anything about this.
+		 * intentionally do not log anything about this,
+		 * either here or at higher levels.
 		 */
+		pam_no_password = true;
 		goto fail;
 	}
 }
@@ -2159,6 +2161,7 @@ CheckPAMAuth(Port *port, const char *user, const char *password)
 	 */
 	pam_passwd = password;
 	pam_port_cludge = port;
+	pam_no_password = false;
 
 	/*
 	 * Set the application data portion of the conversation struct.  This is
@@ -2244,22 +2247,26 @@ CheckPAMAuth(Port *port, const char *user, const char *password)
 
 	if (retval != PAM_SUCCESS)
 	{
-		ereport(LOG,
-(errmsg("pam_authenticate failed: %s",
-		pam_strerror(pamh, retval))));
+		/* If pam_passwd_conv_proc saw EOF, don't log anything */
+		if (!pam_no_password)
+			ereport(LOG,
+	(errmsg("pam_authenticate failed: %s",
+			pam_strerror(pamh, retval))));
 		pam_passwd = NULL;		/* Unset pam_passwd */
-		return STATUS_ERROR;
+		return pam_no_password ? STATUS_EOF : STATUS_ERROR;
 	}
 
 	retval = pam_acct_mgmt(pamh, 0);
 
 	if (retval != PAM_SUCCESS)
 	{
-		ereport(LOG,
-(errmsg("pam_acct_mgmt failed: %s",
-		pam_strerror(pamh, retval))));
+		/* If pam_passwd_conv_proc saw EOF, don't log anything */
+		if (!pam_no_password)
+			ereport(LOG,
+	(errmsg("pam_acct_mgmt failed: %s",
+			pam_strerror(pamh, retval))));
 		pam_passwd = NULL;		/* Unset pam_passwd */
-		return STATUS_ERROR;
+		return pam_no_password ? STATUS_EOF : STATUS_ERROR;
 	}
 
 	retval = pam_end(pamh, retval);
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index d28271c..909d736 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -959,7 +960,7 @@ CheckSCRAMAuth(Port *port, char *shadow_pass, char **logdetail)
 			return STATUS_ERROR;
 		}
 
-		elog(DEBUG4, "Processing received SASL response of length %d", buf.len);
+		elog(DEBUG4, "processing received SASL response of length %d", buf.len);
 
 		/*
 		 * The first SASLInitialResponse message is different from the others.
@@ -1150,7 +1151,7 @@ pg_GSS_recvauth(Port *port)
 		gbuf.length = buf.len;
 		gbuf.value = buf.data;
 
-		elog(DEBUG4, "Processing received GSS token of length %u",
+		elog(DEBUG4, "processing received GSS token of length %u",
 			 (unsigned int) gbuf.length);
 
 		maj_stat = gss_accept_sec_context(
@@ -1427,8 +1428,7 @@ pg_SSPI_recvauth(Port *port)
 		outbuf.pBuffers = OutBuffers;
 		outbuf.ulVersion = SECBUFFER_VERSION;
 
-
-		elog(DEBUG4, "Processing received SSPI token of length %u",
+		elog(DEBUG4, "processing received SSPI token of length %u",
 			 (unsigned int) buf.len);
 
 		r = AcceptSecurityContext(&sspicred,
@@ -2949,7 +2956,7 @@ radius_add_attribute(radius_packet *packet, uint8 type, 

Re: Obsolete comment in partbounds.c

2019-11-04 Thread Alvaro Herrera
On 2019-Oct-19, Etsuro Fujita wrote:

> On Fri, Oct 18, 2019 at 6:56 PM Alvaro Herrera  
> wrote:

> > Yeah, agreed.  Instead of "the null comes from" I would use "the
> > partition that stores nulls".
> 
> I think your wording is better than mine.  Thank you for reviewing!

Thanks for getting this done.

> > While reviewing your patch I noticed a few places where we use an odd
> > pattern in switches, which can be simplified as shown here.
> 
> case PARTITION_STRATEGY_LIST:
> -   num_indexes = bound->ndatums;
> +   return bound->ndatums;
> break;
> 
> Why not remove the break statement?

You're right, I should have done that.  However, I backed out of doing
this change after all; it seems a pretty minor stylistic adjustment of
little value.

Thanks for reviewing all the same,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Log statement sample - take two

2019-11-04 Thread Adrien Nayrat
On 11/4/19 2:08 AM, Tomas Vondra wrote:
> 
> Seems fine to me, mostly. I think the docs should explain how
> log_min_duration_statement interacts with log_min_duration_sample.
> Attached is a patch doing that, by adding one para to each GUC, along
> with some minor rewordings. I think the docs are mixing "sampling"
> vs. "logging" and "durations" vs. "statements" not sure.

Thanks for the rewording, it's clearer now.

> 
> I also think the two new sampling GUCs (log_min_duration_sample and
> log_statement_sample_rate) should be next to each other. We're not
> ordering the GUCs alphabetically anyway.

+1

> 
> I plan to make those changes and push in a couple days.
> 

Thanks!




Re: v12 and pg_restore -f-

2019-11-04 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Alvaro Herrera  writes:
> > On 2019-Oct-17, Tom Lane wrote:
> >> Stephen Frost  writes:
> >>> Tom, I take it your suggestion is to have '-f -' be accepted to mean
> >>> 'goes to stdout' in all branches?
> 
> >> Yes.
> 
> > +1 for this, FWIW.  Let's get it done before next week minors.  Is
> > anybody writing a patch?  If not, I can do it.
> 
> Please do.

+1

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: auxiliary processes in pg_stat_ssl

2019-11-04 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Nov 4, 2019 at 8:26 AM Alvaro Herrera  
> wrote:
> > On 2019-Sep-04, Alvaro Herrera wrote:
> > > I just noticed that we list auxiliary processes in pg_stat_ssl:
> > [...]
> > > But this seems pointless.  Should we not hide those?  Seems this only
> > > happened as an unintended side-effect of fc70a4b0df38.  It appears to me
> > > that we should redefine that view to restrict backend_type that's
> > > 'client backend' (maybe include 'wal receiver'/'wal sender' also, not
> > > sure.)
> >
> > [crickets]
> >
> > Robert, Kuntal, any opinion on this?
> 
> I think if I were doing something about it, I'd probably try to filter
> on a field that directly represents whether there is a connection,
> rather than checking the backend type. That way, if the list of
> backend types that have client connections changes later, there's
> nothing to update. Like "WHERE client_port IS NOT NULL," or something
> of that sort.

Yeah, using a "this has a connection" would be better and, as also noted
on this thread, pg_stat_gssapi should get similar treatment.

Based on what we claim in our docs, it does look like 'client_port IS
NOT NULL' should work.  I do think we might want to update the docs to
make it a bit more explicit, what we say now is:

TCP port number that the client is using for communication with this
backend, or -1 if a Unix socket is used

We don't explain there that NULL means the backend doesn't have an
external connection even though plenty of those entries show up in every
instance of PG.  Perhaps we should add this:

If this field is null, it indicates that this is an internal process
such as autovacuum.

Which is what we say for 'client_addr'.

I have to admit that while it's handy that we just shove '-1' into
client_port when it's a unix socket, it's kind of ugly from a data
perspective; in a green field it'd probably be better to have a
"connection type" field that then indicates which other fields are valid
instead of having a special constant, but that ship sailed long ago and
it's not like we have a lot of people complaining about it, so I suppose
just using it here as suggested is fine.
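
For the record, expressed against today's views the proposed filter amounts
to something like this (a sketch only; the patch posted later in the thread
bakes the condition into the view definitions instead):

    -- show only backends that actually have an external connection
    SELECT s.pid, s.ssl, s.version, s.cipher, a.backend_type
      FROM pg_stat_ssl s
      JOIN pg_stat_activity a USING (pid)
     WHERE a.client_port IS NOT NULL;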

Thanks,

Stephen


signature.asc
Description: PGP signature


Do we have a CF manager for November?

2019-11-04 Thread Tom Lane
It's time to start the next commitfest.  I seem to recall somebody
saying back in September that they'd run the next one, but I forget
who.  Anyway, we need a volunteer to be chief nagger.

regards, tom lane




Re: Missed check for too-many-children in bgworker spawning

2019-11-04 Thread Alvaro Herrera
On 2019-Oct-09, Tom Lane wrote:

> Robert Haas  writes:
> > On Wed, Oct 9, 2019 at 10:21 AM Tom Lane  wrote:
> >> We could improve on matters so far as the postmaster's child-process
> >> arrays are concerned, by defining separate slot "pools" for the different
> >> types of child processes.  But I don't see much point if the code is
> >> not prepared to recover from a fork() failure --- and if it is, that
> >> would a fortiori deal with out-of-child-slots as well.
> 
> > I would say rather that if fork() is failing on your system, you have
> > a not very stable system. The fact that parallel query is going to
> > fail is sad, but not as sad as the fact that connecting to the
> > database is also going to fail, and that logging into the system to
> > try to fix the problem may well fail as well.
> 
> True, it's not a situation you especially want to be in.  However,
> I've lost count of the number of times that I've heard someone talk
> about how their system was overstressed to the point that everything
> else was failing, but Postgres kept chugging along.  That's a good
> reputation to have and we shouldn't just walk away from it.

I agree with this point in principle.  Everything else (queries,
checkpointing) can fail, but it's critical that postmaster continues to
run -- that way, once the high load episode is over, connections can be
re-established as needed, auxiliary processes can be re-launched, and
the system can be again working normally.  If postmaster dies, all bets
are off.  Also: an idle postmaster is not using any resources; on its
own, killing it or it dying would not free any useful resources for the
system load to be back to low again.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: adding partitioned tables to publications

2019-11-04 Thread Rafia Sabih
Hi Amit,

On Fri, 11 Oct 2019 at 08:06, Amit Langote  wrote:

>
> Thanks for sharing this case.  I hadn't considered it, but you're
> right that it should be handled sensibly.  I have fixed table sync
> code to handle this case properly.  Could you please check your case
> with the attached updated patch?
>
I was checking this today and found that the behavior doesn't change much
with the updated patch. The tables are still replicated, just that a select
count from the parent table shows 0; the rest of the partitions, including
the default one, have the data from the publisher. I was expecting more like
an error at the subscriber saying the table type is not the same.

Please find the attached file for the test case, in case something is
unclear.

-- 
Regards,
Rafia Sabih
-- publisher
create table t (i int, j int, k text) partition by range(i);
create table child1 partition of t for values from ( 1) to (100);
create table child2 partition of t for values from (100) to (200);
create table child3 partition of t for values from (200) to (300);
create table child4 partition of t for values from (300) to (400);
create table child5 partition of t for values from (400) to (500);
create table def partition of t DEFAULT;

create publication mypub;
alter publication mypub add table t;

insert into t values (generate_series(1,500), generate_series(1,500), 
repeat('jqbsuyt7832edjw', 20));

--subscriber
create table t (i int, j int, k text);
create table child1 (i int, j int, k text);
create table child5 (i int, j int, k text);
create table child4 (i int, j int, k text);
create table child3 (i int, j int, k text);
create table child2 (i int, j int, k text);
create table def (i int, j int, k text);

create subscription mysub connection 'host=localhost port=5432 dbname=postgres' 
publication mypub;



Re: auxiliary processes in pg_stat_ssl

2019-11-04 Thread Euler Taveira
On Wed, Sep 4, 2019 at 12:15, Alvaro Herrera wrote:
>
> I just noticed that we list auxiliary processes in pg_stat_ssl:
>
> 55432 13devel 28627=# select * from pg_stat_ssl ;
>   pid  │ ssl │ version │         cipher         │ bits │ compression │ client_dn │ client_serial │ issuer_dn
> ───────┼─────┼─────────┼────────────────────────┼──────┼─────────────┼───────────┼───────────────┼───────────
>  28618 │ f   │         │                        │      │             │           │               │
>  28620 │ f   │         │                        │      │             │           │               │
>  28627 │ t   │ TLSv1.3 │ TLS_AES_256_GCM_SHA384 │  256 │ f           │           │               │
>  28616 │ f   │         │                        │      │             │           │               │
>  28615 │ f   │         │                        │      │             │           │               │
>  28617 │ f   │         │                        │      │             │           │               │
> (6 filas)
>
> 55432 13devel 28627=# select pid, backend_type from pg_stat_activity ;
>   pid  │ backend_type
> ───┼──
>  28618 │ autovacuum launcher
>  28620 │ logical replication launcher
>  28627 │ client backend
>  28616 │ background writer
>  28615 │ checkpointer
>  28617 │ walwriter
> (6 filas)
>
> But this seems pointless.  Should we not hide those?  Seems this only
> happened as an unintended side-effect of fc70a4b0df38.  It appears to me
> that we should redefine that view to restrict backend_type that's
> 'client backend' (maybe include 'wal receiver'/'wal sender' also, not
> sure.)
>
Yep, it is pointless. The backend types that open connections to the
server are: autovacuum worker, client backend, background worker, and
wal sender. I also notice that pg_stat_gssapi is in the same boat as
pg_stat_ssl, and we should constrain the rows to backend types that open
connections. I'm attaching a patch to list only connections in those
system views.



--
   Euler Taveira                  Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
From aca42a2a3dc95fa2b4a1e6a1960d5cc834850034 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 4 Nov 2019 14:45:38 +
Subject: [PATCH] Show only rows that open connections in some system views

It is pointless to show auxiliary processes that do not open connections
to Postgres in pg_stat_ssl and pg_stat_gssapi. This change affects
compatibility with previous versions.
---
 src/backend/catalog/system_views.sql | 6 --
 src/test/regress/expected/rules.out  | 6 --
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 9fe4a47..ce39808 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -826,7 +826,8 @@ CREATE VIEW pg_stat_ssl AS
 S.ssl_client_dn AS client_dn,
 S.ssl_client_serial AS client_serial,
 S.ssl_issuer_dn AS issuer_dn
-FROM pg_stat_get_activity(NULL) AS S;
+FROM pg_stat_get_activity(NULL) AS S
+	WHERE S.client_port IS NOT NULL;
 
 CREATE VIEW pg_stat_gssapi AS
 SELECT
@@ -834,7 +835,8 @@ CREATE VIEW pg_stat_gssapi AS
 S.gss_auth AS gss_authenticated,
 S.gss_princ AS principal,
 S.gss_enc AS encrypted
-FROM pg_stat_get_activity(NULL) AS S;
+FROM pg_stat_get_activity(NULL) AS S
+	WHERE S.client_port IS NOT NULL;
 
 CREATE VIEW pg_replication_slots AS
 SELECT
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd..14e7214 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1845,7 +1845,8 @@ pg_stat_gssapi| SELECT s.pid,
 s.gss_auth AS gss_authenticated,
 s.gss_princ AS principal,
 s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc);
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc)
+  WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_cluster| SELECT s.pid,
 s.datid,
 d.datname,
@@ -1964,7 +1965,8 @@ pg_stat_ssl| SELECT s.pid,
 s.ssl_client_dn AS client_dn,
 s.ssl_client_serial AS cl

Re: Collation versioning

2019-11-04 Thread Julien Rouhaud
On Mon, Nov 4, 2019 at 11:13 AM Julien Rouhaud  wrote:
>
> On Mon, Nov 4, 2019 at 4:58 AM Thomas Munro  wrote:
> >
> > Here are some problems to think about:
> >
> > * We'd need to track dependencies on the default collation once we
> > have versioning for that [...]

Another problem I just thought about is how to avoid discrepancy of
collation version for indexes on shared objects, such as
pg_database_datname_index.




Re: v12 and pg_restore -f-

2019-11-04 Thread Euler Taveira
On Mon, Nov 4, 2019 at 11:53, Alvaro Herrera wrote:
>
> On 2019-Oct-17, Tom Lane wrote:
>
> > Stephen Frost  writes:
> > > First, I'd like to clarify what I believe Tom's suggestion is, and then
> > > talk through that, as his vote sways this topic pretty heavily.
> >
> > > Tom, I take it your suggestion is to have '-f -' be accepted to mean
> > > 'goes to stdout' in all branches?
> >
> > Yes.
>
> +1 for this, FWIW.  Let's get it done before next week minors.  Is
> anybody writing a patch?  If not, I can do it.
>
I'm not.

> > > If you meant for all branches to accept '-f -' and have it go to a './-'
> > > file then that's just a revert of this entire change, which I can't
> > > agree with either
> >
> > No, I'm not proposing a full revert.  But there's certainly room to
> > consider reverting the part that says you *must* write "-f -" to get
> > output to stdout.
>
> I don't think this will buy us anything, if we get past branches updated
> promptly.
>
+1.


-- 
   Euler Taveira                  Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: 64 bit transaction id

2019-11-04 Thread Tomas Vondra

On Mon, Nov 04, 2019 at 04:39:44PM +0300, Павел Ерёмин wrote:

  And yet, if I try to implement a similar mechanism, if successful, will my
  revision be considered?
   


Why wouldn't it be considered? If you submit a patch that demonstrably
improves the behavior (in this case reduces per-tuple overhead without
causing significant issues elsewhere), we'd be crazy not to consider it.

The bar is pretty high, though, because this touches one of the core
pieces of the database.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: v12 and pg_restore -f-

2019-11-04 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Oct-17, Tom Lane wrote:
>> Stephen Frost  writes:
>>> Tom, I take it your suggestion is to have '-f -' be accepted to mean
>>> 'goes to stdout' in all branches?

>> Yes.

> +1 for this, FWIW.  Let's get it done before next week minors.  Is
> anybody writing a patch?  If not, I can do it.

Please do.

>> No, I'm not proposing a full revert.  But there's certainly room to
>> consider reverting the part that says you *must* write "-f -" to get
>> output to stdout.

> I don't think this will buy us anything, if we get past branches updated
> promptly.

I'm okay with that approach.

regards, tom lane




Re: alternative to PG_CATCH

2019-11-04 Thread Tom Lane
Peter Eisentraut  writes:
> On 2019-11-02 15:36, Tom Lane wrote:
>> I hadn't actually tested this patch before commit, but now that
>> it's in, I'm seeing assorted compiler warnings:

> I've fixed the ones that I could reproduce on CentOS 6.  I haven't seen 
> any on a variety of newer systems.

I'd hoped for a way to fix PG_FINALLY rather than having to band-aid
the individual callers :-(.  But maybe there isn't one.

Now that I've actually looked at the patched code, there's a far
more severe problem with it.  Namely, that use of PG_FINALLY
means that the "finally" segment is run without having popped
the error context stack, which means that any error thrown
within that segment will sigsetjmp right back to the top,
resulting in an infinite loop.  (Well, not infinite, because
it'll crash out once the error nesting depth limit is hit.)
We *must* pop the stack before running recovery code.

Possibly this could be fixed like so:

#define PG_FINALLY() \
} \
else \
{ \
PG_exception_stack = _save_exception_stack; \
error_context_stack = _save_context_stack; \
_do_rethrow = true

#define PG_END_TRY()  \
} \
if (_do_rethrow) \
PG_RE_THROW(); \
PG_exception_stack = _save_exception_stack; \
error_context_stack = _save_context_stack; \
} while (0)

But I haven't tested that.
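
To illustrate the intended behavior (a sketch only, with made-up helper
functions, not code from the patch): with the definitions above, an error
thrown inside the finally block would propagate normally instead of landing
back in the same sigsetjmp:

	PG_TRY();
	{
		do_risky_work();            /* hypothetical; may ereport(ERROR) */
	}
	PG_FINALLY();
	{
		/* runs on success and on error, after the error stack has been popped */
		release_temp_resources();   /* hypothetical; may itself ereport(ERROR) */
	}
	PG_END_TRY();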

regards, tom lane




Re: cost based vacuum (parallel)

2019-11-04 Thread Masahiko Sawada
On Mon, 4 Nov 2019 at 19:26, Amit Kapila  wrote:
>
> On Mon, Nov 4, 2019 at 1:51 PM Masahiko Sawada  wrote:
> >
> > On Mon, Nov 4, 2019 at 3:54 PM Amit Kapila  wrote:
> > >
> > > I think approach-2 is better in throttling the system as it doesn't
> > > have the drawback of the first approach, but it might be a bit tricky
> > > to implement.
> >
> > I might be missing something but I think that there could be the
> > drawback of the approach-1 even on approach-2 depending on index pages
> > loaded on the shared buffer and the vacuum delay setting.
> >
>
> Can you be a bit more specific about this?

Suppose there are two indexes: one is fully loaded in shared buffers
while the other isn't loaded at all. The vacuum worker that processes
the former index hits all pages in shared buffers, but the worker that
processes the latter index reads all pages from either the OS page cache
or disk. Even if both the cost limit and the cost balance are split
evenly among workers, because the costs of page hits and page misses are
different, it's possible that one vacuum worker sleeps while other
workers are doing I/O.
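
To put rough numbers on it (an illustration only, assuming the default
cost parameters vacuum_cost_page_hit = 1 and vacuum_cost_page_miss = 10,
and a limit of 200 split evenly so each of two workers gets 100): the
worker on the cached index accrues cost 1 per page and naps only after
every 100 pages, while the worker on the uncached index accrues cost 10
per page and naps after every 10 pages, even though it is the one doing
the actual I/O.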

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: v12 and pg_restore -f-

2019-11-04 Thread Alvaro Herrera
On 2019-Oct-17, Tom Lane wrote:

> Stephen Frost  writes:
> > First, I'd like to clarify what I believe Tom's suggestion is, and then
> > talk through that, as his vote sways this topic pretty heavily.
> 
> > Tom, I take it your suggestion is to have '-f -' be accepted to mean
> > 'goes to stdout' in all branches?
> 
> Yes.

+1 for this, FWIW.  Let's get it done before next week minors.  Is
anybody writing a patch?  If not, I can do it.

> > If you meant for all branches to accept '-f -' and have it go to a './-'
> > file then that's just a revert of this entire change, which I can't
> > agree with either
> 
> No, I'm not proposing a full revert.  But there's certainly room to
> consider reverting the part that says you *must* write "-f -" to get
> output to stdout.

I don't think this will buy us anything, if we get past branches updated
promptly.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: auxiliary processes in pg_stat_ssl

2019-11-04 Thread Robert Haas
On Mon, Nov 4, 2019 at 8:26 AM Alvaro Herrera  wrote:
> On 2019-Sep-04, Alvaro Herrera wrote:
> > I just noticed that we list auxiliary processes in pg_stat_ssl:
> [...]
> > But this seems pointless.  Should we not hide those?  Seems this only
> > happened as an unintended side-effect of fc70a4b0df38.  It appears to me
> > that we should redefine that view to restrict backend_type that's
> > 'client backend' (maybe include 'wal receiver'/'wal sender' also, not
> > sure.)
>
> [crickets]
>
> Robert, Kuntal, any opinion on this?

I think if I were doing something about it, I'd probably try to filter
on a field that directly represents whether there is a connection,
rather than checking the backend type. That way, if the list of
backend types that have client connections changes later, there's
nothing to update. Like "WHERE client_port IS NOT NULL," or something
of that sort.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: 64 bit transaction id

2019-11-04 Thread Павел Ерёмин
And yet, if I try to implement a similar mechanism, if successful, will my
revision be considered?

regards

03.11.2019, 22:15, "Tomas Vondra" :

> On Sun, Nov 03, 2019 at 02:17:15PM +0300, Павел Ерёмин wrote:
> >   I completely agree with all of the above. Therefore, the proposed
> >   mechanism may entail larger improvements (and not only VACUUM).
>
> I think the best thing you can do is try implementing this ...
>
> I'm afraid the "improvements" essentially mean making various important
> parts of the system much more complicated and expensive. There's a
> trade-off between saving 8B per row and additional overhead (during
> vacuum etc.), and it does not seem like a winning strategy. What started
> as "we can simply look at the next row version" is clearly way more
> complicated and expensive.
>
> The trouble here is that it adds a dependency between pages in the data
> file. That for example means that during cleanup of a page it may be
> necessary to modify the other page, when originally that would be
> read-only in that checkpoint interval. That's essentially write
> amplification, and may significantly increase the amount of WAL due to
> generating FPW for the other page.
>
> >   I can offer the following solution.
> >   For VACUUM, create a hash table.
> >   VACUUM scanning the table sees that the version (tuple1) has t_ctid filled
> >   and refers to the address tuple2, it creates a structure into which it
> >   writes the address tuple1, tuple1.xid, length tuple1 (well, and other
> >   information that is needed), puts this structure in the hash table by key
> >   tuple2 addresses.
> >   VACUUM reaches tuple2, checks the address of tuple2 in the hash table - if
> >   it finds it, it evaluates the connection between them and makes a decision
> >   on cleaning.
>
> We know VACUUM is already pretty expensive, so making it even more
> expensive seems pretty awful. And the proposed solution seems damn
> expensive. We already do something similar for indexes - we track
> pointers for removed rows, so that we can remove them from indexes. And
> it's damn expensive because we don't know where in the index the tuples
> are - so we have to scan the whole indexes.
>
> This would mean we have to do the same thing for the table, because we
> don't know where in the table the older versions of those rows are,
> because we don't know where the other rows are. That seems mighty
> expensive.
>
> Not to mention that this does nothing for page-level vacuum, which we
> do when trying to fit another row on a page (e.g. for HOT). This has to
> be absolutely cheap, we certainly are not going to do lookups of other
> pages or looking for older versions of the row, and so on.
>
> Being able to do visibility decisions based on the tuple alone (or
> possibly page-level + tuple information) has a lot of value, and I don't
> think we want to make this more complicated.
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: auxiliary processes in pg_stat_ssl

2019-11-04 Thread Alvaro Herrera
On 2019-Sep-04, Alvaro Herrera wrote:

> I just noticed that we list auxiliary processes in pg_stat_ssl:
[...]
> But this seems pointless.  Should we not hide those?  Seems this only
> happened as an unintended side-effect of fc70a4b0df38.  It appears to me
> that we should redefine that view to restrict backend_type that's
> 'client backend' (maybe include 'wal receiver'/'wal sender' also, not
> sure.)

[crickets]

Robert, Kuntal, any opinion on this?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: WIP/PoC for parallel backup

2019-11-04 Thread Asif Rehman
On Fri, Nov 1, 2019 at 8:53 PM Robert Haas  wrote:

> On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman 
> wrote:
> > 'startptr' is used by sendFile() during checksum verification. Since
> > SendBackupFiles() is using sendFIle we have to set a valid WAL location.
>
> Ugh, global variables.
>
> Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
> STOP_BACKUP all using the same base_backup_opt_list production as
> BASE_BACKUP? Presumably most of those options are not applicable to
> most of those commands, and the productions should therefore be
> separated.
>

Are you expecting something like the attached patch? Basically I have
reorganised the grammar
rules so each command can have the options required by it.

I was feeling a bit reluctant about this change because it may add some
unwanted grammar rules to the replication grammar. Since these commands
use the same options as base backup, maybe we could instead throw an
error inside the relevant functions on unwanted options?



> You should add docs, too.  I wouldn't have to guess what some of this
> stuff was for if you wrote documentation explaining what this stuff
> was for. :-)
>

Yes I will add it in the next patch.


>
> >> The tablespace_path option appears entirely unused, and I don't know
> >> why that should be necessary here, either.
> >
> > This is to calculate the basepathlen. We need to exclude the tablespace
> location (or
> > base path) from the filename before it is sent to the client with
> sendFile call. I added
> > this option primarily to avoid performing string manipulation on
> filename to extract the
> > tablespace location and then calculate the basepathlen.
> >
> > Alternatively we can do it by extracting the base path from the received
> filename. What
> > do you suggest?
>
> I don't think the server needs any information from the client in
> order to be able to exclude the tablespace location from the pathname.
> Whatever it needs to know, it should be able to figure out, just as it
> would in a non-parallel backup.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca


repl_grammar.patch
Description: Binary data


[proposal] recovery_target "latest"

2019-11-04 Thread Grigory Smolkin

Hello, hackers!

I`d like to propose a new argument for recovery_target parameter, which 
will stand to recovering until all available WAL segments are applied.


Current PostgreSQL recovery default behavior(when no recovery target is 
provided) does exactly that, but there are several shortcomings:
  - without explicit recovery target standing for default behavior, 
recovery_target_action is not coming to action at the end of recovery
  - with PG12 changes, the life of all backup tools became very hard, 
because now recovery parameters can be set outside of single config 
file(recovery.conf), so it is impossible to ensure, that default 
recovery behavior, desired in some cases, will not be silently 
overwritten by some recovery parameter forgotten by user.


Proposed patch is very simple and solves the aforementioned problems by 
introducing new argument "latest" for recovery_target parameter.


Old recovery behavior is still available if no recovery target is 
provided. I`m not sure whether it should be left as it is now, or not.


Another open question is what to do with recovery_target_inclusive if 
recovery_target = "latest" is used.
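
For illustration, with the proposed patch applied the setup in
postgresql.conf could look something like this (a sketch only; the archive
path is made up, and recovery_target_action keeps its existing values):

restore_command = 'cp /mnt/wal_archive/%f "%p"'
recovery_target = 'latest'              # replay all available WAL, then...
recovery_target_action = 'promote'      # ...act, instead of silently ending recovery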


--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0191ec84b1..49675c38da 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3366,21 +3366,19 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
 
  
  
-  recovery_target = 'immediate'
+  recovery_target (string)
   
 recovery_target configuration parameter
   
   
   

-This parameter specifies that recovery should end as soon as a
-consistent state is reached, i.e. as early as possible. When restoring
-from an online backup, this means the point where taking the backup
-ended.
-   
-   
-Technically, this is a string parameter, but 'immediate'
-is currently the only allowed value.
+This parameter determines how far recovery should proceed. The value
+immediate means that recovery should end as soon as a consistent
+state is reached, i.e. as early as possible. When restoring from an online
+backup, this means the point where taking the backup ended.
+The second possible value latest means that recovery
+should proceed to the end of the available WAL log.

   
  
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2e3cc51006..faab0b2b4c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6345,6 +6345,9 @@ StartupXLOG(void)
 		else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
 			ereport(LOG,
 	(errmsg("starting point-in-time recovery to earliest consistent point")));
+		else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+			ereport(LOG,
+	(errmsg("starting point-in-time recovery until all available WAL is applied")));
 		else
 			ereport(LOG,
 	(errmsg("starting archive recovery")));
@@ -7257,6 +7260,13 @@ StartupXLOG(void)
 			 * end of main redo apply loop
 			 */
 
+			/*
+			 * If all available WAL is replayed,
+			 * then RECOVERY_TARGET_LATEST is satisfied
+			 */
+			if (recoveryTarget == RECOVERY_TARGET_LATEST)
+reachedStopPoint = true;
+
 			if (reachedStopPoint)
 			{
 if (!reachedConsistency)
@@ -7459,6 +7469,8 @@ StartupXLOG(void)
 	 recoveryStopName);
 		else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
 			snprintf(reason, sizeof(reason), "reached consistency");
+		else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+			snprintf(reason, sizeof(reason), "applied all available WAL");
 		else
 			snprintf(reason, sizeof(reason), "no recovery target specified");
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 31a5ef0474..b652304683 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3503,7 +3503,10 @@ static struct config_string ConfigureNamesString[] =
 
 	{
 		{"recovery_target", PGC_POSTMASTER, WAL_RECOVERY_TARGET,
-			gettext_noop("Set to \"immediate\" to end recovery as soon as a consistent state is reached."),
+			gettext_noop("Set to \"immediate\" to end recovery as "
+		 "soon as a consistent state is reached or "
+		 " to \"latest\" to end recovery after "
+		 "replaying all available WAL in the archive."),
 			NULL
 		},
 		&recovery_target_string,
@@ -11527,9 +11530,10 @@ error_multiple_recovery_targets(void)
 static bool
 check_recovery_target(char **newval, void **extra, GucSource source)
 {
-	if (strcmp(*newval, "immediate") != 0 && strcmp(*newval, "") != 0)
+	if (strcmp(*newval, "immediate") != 0 && strcmp(*newval, "latest") != 0 &&
+		strcmp(*newval, "") != 0)
 	{
-		GUC_check_errdetail("The only allowed value is \"immediate\".");
+		GUC_check_errdetail("The only allowed values are \"immedi

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-11-04 Thread Amit Kapila
On Wed, Oct 30, 2019 at 9:38 AM vignesh C  wrote:
>
> On Tue, Oct 22, 2019 at 10:52 PM Tomas Vondra
>  wrote:
> >
> > I think the patch should do the simplest thing possible, i.e. what it
> > does today. Otherwise we'll never get it committed.
> >
> I found a couple of crashes while reviewing and testing flushing of
> open transaction data:
>

Thanks for doing these tests.  However, I don't think these issues are
anyway related to this patch.  It seems to be base code issues
manifested by this patch.  See my analysis below.

> Issue 1:
> #0  0x7f22c5722337 in raise () from /lib64/libc.so.6
> #1  0x7f22c5723a28 in abort () from /lib64/libc.so.6
> #2  0x00ec5390 in ExceptionalCondition
> (conditionName=0x10ea814 "!dlist_is_empty(head)", errorType=0x10ea804
> "FailedAssertion",
> fileName=0x10ea7e0 "../../../../src/include/lib/ilist.h",
> lineNumber=458) at assert.c:54
> #3  0x00b4fb91 in dlist_tail_element_off (head=0x19e4db8,
> off=64) at ../../../../src/include/lib/ilist.h:458
> #4  0x00b546d0 in ReorderBufferAbortOld (rb=0x191b6b0,
> oldestRunningXid=3834) at reorderbuffer.c:1966
> #5  0x00b3ca03 in DecodeStandbyOp (ctx=0x19af990,
> buf=0x7ffcbc26dc50) at decode.c:332
>

This seems to be the problem of base code where we abort immediately
after serializing the changes because in that case, the changes list
will be empty.  I think you can try to reproduce it via the debugger
or by hacking the code such that it serializes after every change and
then if you abort after one change, it should hit this problem.

>
> Issue 2:
> #0  0x7f1d7ddc4337 in raise () from /lib64/libc.so.6
> #1  0x7f1d7ddc5a28 in abort () from /lib64/libc.so.6
> #2  0x00ec4e1d in ExceptionalCondition
> (conditionName=0x10ead30 "txn->final_lsn != InvalidXLogRecPtr",
> errorType=0x10ea284 "FailedAssertion",
> fileName=0x10ea2d0 "reorderbuffer.c", lineNumber=3052) at assert.c:54
> #3  0x00b577e0 in ReorderBufferRestoreCleanup (rb=0x2ae36b0,
> txn=0x2bafb08) at reorderbuffer.c:3052
> #4  0x00b52b1c in ReorderBufferCleanupTXN (rb=0y x2ae36b0,
> txn=0x2bafb08) at reorderbuffer.c:1318
> #5  0x00b5279d in ReorderBufferCleanupTXN (rb=0x2ae36b0,
> txn=0x2b9d778) at reorderbuffer.c:1257
> #6  0x00b5475c in ReorderBufferAbortOld (rb=0x2ae36b0,
> oldestRunningXid=3835) at reorderbuffer.c:1973
>

This seems to be again the problem with base code as we don't update
the final_lsn for subtransactions during ReorderBufferAbortOld.  This
can also be reproduced with some hacking in code or via debugger in a
similar way as explained for the previous problem but with a
difference that there must be subtransaction involved in this case.

> #7  0x00b3ca03 in DecodeStandbyOp (ctx=0x2b676d0,
> buf=0x7ffcbc74cc00) at decode.c:332
> #8  0x00b3c208 in LogicalDecodingProcessRecord (ctx=0x2b676d0,
> record=0x2b67990) at decode.c:121
> #9  0x00b70b2b in XLogSendLogical () at walsender.c:2845
>
> These failures come randomly.
> I'm not able to reproduce this issue with simple test case.

Yeah, it appears to be difficult to reproduce unless you hack the code
to serialize every change or use debugger to forcefully flush the
changes every time.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: pg_waldump erroneously outputs newline for FPWs, and another minor bug

2019-11-04 Thread Jehan-Guillaume de Rorthais
On Wed, 30 Oct 2019 09:26:21 +0900
Michael Paquier  wrote:

> On Tue, Oct 29, 2019 at 04:42:07PM -0700, Peter Geoghegan wrote:
> > The same commit from Heikki omitted one field from that record, for no
> > good reason. I backpatched a bugfix to the output format for nbtree
> > page splits a few weeks ago, fixing that problem. I agree that we
> > should also backpatch this bugfix.  
> 
> The output format of pg_waldump may matter for some tools, like
> Jehan-Guillaume's PAF [1], but I am ready to bet that any tools like
> that just skip any noise newlines, so +1 for a backpatch.
> 
> I am adding Jehan-Guillaume in CC just in case.

Thank you Michael!




Re: [PATCH] contrib/seg: Fix PG_GETARG_SEG_P definition

2019-11-04 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> ilm...@ilmari.org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=) writes:
>> I just noticed that when contrib/seg was converted to V1 calling
>> convention (commit 389bb2818f4), the PG_GETARG_SEG_P() macro got defined
>> in terms of PG_GETARG_POINTER().  But it itself calls DatumGetPointer(),
>> so shouldn't it be using PG_GETARG_DATUM()?
>
> Yup, I agree.  Pushed.

Thanks!

>   regards, tom lane

- ilmari
-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
 the consequences of."  -- Skud's Meta-Law




Re: Minimal logical decoding on standbys

2019-11-04 Thread Amit Khandekar
On Thu, 10 Oct 2019 at 05:49, Craig Ringer  wrote:
>
> On Tue, 1 Oct 2019 at 02:08, Robert Haas  wrote:
>>
>>
>> Why does create_logical_slot_on_standby include sleep(1)?
>
>
> Yeah, we really need to avoid sleeps in regression tests.

Yeah, have already got rid of the sleeps from the patch-series version
4 onwards.

By the way, the couple of patches out of the patch series had
bitrotten. Attached is the rebased version.

Thanks
-Amit Khandekar


logicaldecodng_standby_v4_rebased.tar.gz
Description: application/gzip


Re: adding partitioned tables to publications

2019-11-04 Thread Peter Eisentraut
This patch seems excessively complicated to me.  Why don't you just add 
the actual partitioned table to pg_publication_rel and then expand the 
partition hierarchy in pgoutput (get_rel_sync_entry() or 
GetRelationPublications() or somewhere around there).  Then you don't 
need to do any work in table DDL to keep the list of published tables up 
to date.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: cost based vacuum (parallel)

2019-11-04 Thread Amit Kapila
On Mon, Nov 4, 2019 at 1:03 PM Darafei "Komяpa" Praliaskouski
 wrote:
>>
>>
>> This is somewhat similar to a memory usage problem with a
>> parallel query where each worker is allowed to use up to work_mem of
>> memory.  We can say that the users using parallel operation can expect
>> more system resources to be used as they want to get the operation
>> done faster, so we are fine with this.  However, I am not sure if that
>> is the right thing, so we should try to come up with some solution for
>> it and if the solution is too complex, then probably we can think of
>> documenting such behavior.
>
>
> In cloud environments (Amazon + gp2) there's a budget on input/output 
> operations. If you cross it for long time, everything starts looking like you 
> work with a floppy disk.
>
> For the ease of configuration, I would need a "max_vacuum_disk_iops" that 
> would limit number of input-output operations by all of the vacuums in the 
> system. If I set it to less than value of budget refill, I can be sure than 
> that no vacuum runs too fast to impact any sibling query.
>
> There's also value in non-throttled VACUUM for smaller tables. On gp2 such 
> things will be consumed out of surge budget, and its size is known to 
> sysadmin. Let's call it "max_vacuum_disk_surge_iops" - if a relation has less 
> blocks than this value and it's a blocking in any way situation 
> (antiwraparound, interactive console, ...) - go on and run without throttling.
>

I think the need for these things can be addressed by current
cost-based-vacuum parameters. See docs [1].  For example, if you set
vacuum_cost_delay as zero, it will allow the operation to be performed
without throttling.
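
For example (a sketch of the existing knobs only, with made-up table names):

SET vacuum_cost_delay = 0;                 -- unthrottled manual VACUUM in this session
VACUUM (VERBOSE) small_hot_table;

ALTER TABLE big_table
  SET (autovacuum_vacuum_cost_delay = 0);  -- unthrottled autovacuum for this table only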

> For how to balance the cost: if we know a number of vacuum processes that 
> were running in the previous second, we can just divide a slot for this 
> iteration by that previous number.
>
> To correct for overshots, we can subtract the previous second's overshot from 
> next one's. That would also allow to account for surge budget usage and let 
> it refill, pausing all autovacuum after a manual one for some time.
>
> Precision of accounting limiting count of operations more than once a second 
> isn't beneficial for this use case.
>

I think it is better if we find a way to rebalance the cost on some
worker exit rather than every second as anyway it won't change unless
any worker exits.

[1] - 
https://www.postgresql.org/docs/devel/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-VACUUM-COST

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: cost based vacuum (parallel)

2019-11-04 Thread Amit Kapila
On Mon, Nov 4, 2019 at 1:51 PM Masahiko Sawada  wrote:
>
> On Mon, Nov 4, 2019 at 3:54 PM Amit Kapila  wrote:
> >
> > I think approach-2 is better in throttling the system as it doesn't
> > have the drawback of the first approach, but it might be a bit tricky
> > to implement.
>
> I might be missing something but I think that there could be the
> drawback of the approach-1 even on approach-2 depending on index pages
> loaded on the shared buffer and the vacuum delay setting.
>

Can you be a bit more specific about this?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: alternative to PG_CATCH

2019-11-04 Thread Peter Eisentraut

On 2019-11-02 15:36, Tom Lane wrote:

I hadn't actually tested this patch before commit, but now that
it's in, I'm seeing assorted compiler warnings:


I've fixed the ones that I could reproduce on CentOS 6.  I haven't seen 
any on a variety of newer systems.


It's not clear why only a handful of cases cause warnings, but my guess 
is that the functions are above some size/complexity threshold beyond 
which those older compilers give up doing a full analysis.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-11-04 Thread vignesh C
On Thu, Oct 24, 2019 at 7:07 PM Amit Kapila  wrote:
>
> On Tue, Oct 22, 2019 at 10:30 AM Dilip Kumar  wrote:
> >
> > I have merged bugs_and_review_comments_fix.patch changes to 0001 and 0002.
> >
>
> I was wondering whether we have checked the code coverage after this
> patch?  Previously, the existing tests seem to be covering most parts
> of the function ReorderBufferSerializeTXN [1].  After this patch, the
> timing to call ReorderBufferSerializeTXN will change, so that might
> impact the testing of the same.  If it is already covered, then I
> would like to either add a new test or extend existing test with the
> help of new spill counters.  If it is not getting covered, then we
> need to think of extending the existing test or write a new test to
> cover the function ReorderBufferSerializeTXN.
>
I have run the tests with coverage and found that
ReorderBufferSerializeTXN is not being hit.
The reason it is not being hit is because of the following check in
ReorderBufferCheckMemoryLimit:
/* bail out if we haven't exceeded the memory limit */
if (rb->size < logical_decoding_work_mem * 1024L)
return;
Previously the tests from contrib/test_decoding could hit
ReorderBufferSerializeTXN function.
I'm checking if we can modify the test or add new test to hit
ReorderBufferSerializeTXN function.
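
Something along these lines should push rb->size over the limit and force
serialization (a rough sketch, assuming the logical_decoding_work_mem GUC
from this patch set, wal_level = logical, and a made-up table):

SET logical_decoding_work_mem = '64kB';   -- deliberately small, to force spilling
SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
CREATE TABLE spill_test (id int, payload text);
BEGIN;
INSERT INTO spill_test SELECT g, repeat('x', 1000) FROM generate_series(1, 5000) g;
COMMIT;
SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
SELECT pg_drop_replication_slot('regression_slot');
DROP TABLE spill_test;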

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Collation versioning

2019-11-04 Thread Julien Rouhaud
Hello Thomas,

On Mon, Nov 4, 2019 at 4:58 AM Thomas Munro  wrote:
>
> On Fri, Nov 1, 2019 at 2:21 AM Julien Rouhaud  wrote:
> > Are you planning to continue working on it?  For the record, that's
> > something needed to be able to implement a filter in REINDEX command
> > [1].
>
> Bonjour Julien,
>
> Unfortunately I haven't had time to work on it seriously, but here's a
> quick rebase to get the proof-of-concept back into working shape.
> It's nice to see progress in other bits of the problem-space.  I hope
> to have time to look at this patch set again soon, but if you or
> someone else would like hack on or think about it too, please feel
> free!

Thanks!  I already did some hack on it when looking at the code so I
can try to make some progress.

> Yes indeed this is exactly the same problem that you're trying to
> solve, approached from a different starting point.
>
> Here are some problems to think about:
>
> * We'd need to track dependencies on the default collation once we
> have versioning for that (see
> https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c161a%402ndquadrant.com).
> That is how most people actually consume collations out there in real
> life, and yet we don't normally track dependencies on the default
> collation and I don't know if that's simply a matter of ripping out
> all the code that looks like "xxx != DEFAULT_COLLATION_ID" in the
> dependency analysis code or if there's more to it.

This isn't enough.  What would remain is:

- teach get_collation_version_for_oid() to return the default
collation name, which is simple
- have recordDependencyOnVersion() actually record the dependency,
which wouldn't happen as the default collation is pinned.

An easy fix would be to teach isObjectPinned() to ignore
CollationRelationId / DEFAULT_COLLATION_OID, but that's ugly and would
allow too many dependencies to be stored.  Not pinning the default
collation during initdb doesn't sound a good alternative either.
Maybe adding a force flag or a new DependencyType that'd mean "normal
but forced" would be ok?

> * Andres mentioned off-list that pg_depend rows might get blown away
> and recreated in some DDL circumstances.  We need to look into that.
> * Another is that pg_upgrade won't preserve pg_depend rows, so you'd
> need some catalog manipulation (direct or via new DDL) to fix that.
> * Some have expressed doubt that pg_depend is the right place for
> this; let's see if any counter-proposals appear.

When working on the REINDEX FILTER, I totally missed this thread and
wrote a POC saving the version in pg_index.  That's not ideal though,
as you need to record multiple version strings.  In my version I used
a json type, using the collprovider  as the key, but that's not enough
for ICU as each collation can have a different version string.  I'm
not a huge fan of using pg_depend to record the version, but storing a
collprovider/collname -> version per row in pg_index is definitely a
no go, so I don't have any better counter-proposal.
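
For illustration only (made-up version strings, not the POC's exact
format), a provider-keyed value such as

    {"c": "2.28", "i": "153.97"}

breaks down for ICU, where each referenced collation can report its own
version, e.g. "fr-FR-x-icu" at "153.97.35.8" and "de-DE-x-icu" at
"153.80.32", so one entry per provider can't describe an index that uses
several ICU collations.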

>
> > # reindex table t1;
> > WARNING:  01000: index "t1_val_idx" depends on collation 13330 version
> > "a153.97.35.8", but the current version is "153.97.35.8"
> > DETAIL:  The index may be corrupted due to changes in sort order.
> > HINT:  REINDEX to avoid the risk of corruption.
> > LOCATION:  index_check_collation_version, index.c:1263
>
> Duh.  Yeah, that's stupid and needs to be fixed somehow.

I don't have a clever solution for that either.




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-11-04 Thread Kuntal Ghosh
On Mon, Nov 4, 2019 at 3:32 PM Dilip Kumar  wrote:
>
> So your result shows that with "streaming on", performance is
> degrading?  By any chance did you try to see where is the bottleneck?
>
Right. But, as we increase the logical_decoding_work_mem, the
performance improves. I've not analyzed the bottleneck yet. I'm
looking into the same.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



