Re: Transaction commits VS Transaction commits (with parallel) VS query mean time

2019-03-27 Thread Haribabu Kommi
On Wed, Mar 27, 2019 at 11:27 PM Amit Kapila 
wrote:

> On Wed, Mar 27, 2019 at 6:53 AM Haribabu Kommi 
> wrote:
> > On Tue, Mar 26, 2019 at 9:08 PM Amit Kapila 
> wrote:
> >>
> >>   As part of this thread, maybe we can
> >> just fix the case of the parallel cooperating transaction.
> >
> >
> > With the current patch, all the parallel-implementation transactions are
> > getting skipped; in my tests, parallel workers are the major factor in the
> > transaction stats counters. Even before parallelism, the stats of
> > autovacuum and so on are still present, but their contribution does not
> > greatly influence the stats.
> >
> > I agree with you on fixing the stats for parallel workers and improving
> > the stats.
> >
>
> I was proposing to fix only the transaction started with
> StartParallelWorkerTransaction by using is_parallel_worker flag as
> discussed above.  I understand that it won't completely fix the
> problem reported by you, but it will be a step in that direction.  My
> main worry is that if we fix it the way you are proposing and we also
> invent a new way to deal with all other internal transactions, then
> the fix done by us now needs to be changed/reverted.  Note, that this
> fix needs to be backpatched as well, so we should avoid doing any fix
> which needs to be changed or reverted.
>

I tried the approach you suggested, i.e. not counting the actual parallel
worker transactions by just releasing the resources without touching the
counters, but the counts are not reduced by much.

HEAD - a query running with 4 parallel workers generates 13 stats (4 * 3 + 1)
Patch - a query running with 4 parallel workers generates 9 stats (4 * 2 + 1)
Old approach patch - a query running with 4 parallel workers generates 1 stat (1)

Currently a parallel worker starts a transaction 3 times, in the following
places:
1. InitPostgres
2. ParallelWorkerMain (twice)

With the attached patch, we reduce the count from ParallelWorkerMain by one.
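
For reference, the direction Amit suggests (skip the counter bump when the
committing transaction belongs to a parallel worker) would look roughly like
the sketch below. The function and variable names are only illustrative and
are not the attached patch; IsParallelWorker() and PgStat_Counter are the
existing definitions from access/parallel.h and pgstat.h.

#include "postgres.h"
#include "access/parallel.h"	/* IsParallelWorker() */
#include "pgstat.h"			/* PgStat_Counter */

/*
 * Illustrative sketch only: bump the transaction counters at end of
 * transaction, except for a parallel worker, whose work is already
 * accounted for by the leader's transaction.
 */
static PgStat_Counter xactCommitCount = 0;
static PgStat_Counter xactRollbackCount = 0;

static void
count_transaction_end(bool isCommit)
{
	if (IsParallelWorker())
		return;				/* the leader already counts this transaction */

	if (isCommit)
		xactCommitCount++;
	else
		xactRollbackCount++;
}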

Regards,
Haribabu Kommi
Fujitsu Australia


0001-Avoid-counting-parallel-worker-transactions-stats_v3.patch
Description: Binary data


RE: minimizing pg_stat_statements performance overhead

2019-03-27 Thread Fabien COELHO



Hello Raymond,

Note that this does not mean that the patch should not be applied; it looks 
like an oversight, but I really do not see the performance degradation you 
are suggesting.


I appreciate your input and I want to come up with a canonical test that 
makes this contention more obvious. Unfortunately, it is difficult 
because the criteria that causes this slow down (large query sizes and 
distinct non-repeated queries) are difficult to reproduce with pgbench. 
I would be open to any suggestions here.


So even though the performance gains in this specific scenario are not 
as great, do you still think it would make sense to submit a patch like 
this?


Sure, it definitely makes sense to reduce the overhead when the extension 
is disabled. I wanted to understand the source of the performance issue, and 
your explanations were not enough for me to reproduce it.


--
Fabien.




RE: Timeout parameters

2019-03-27 Thread Tsunakawa, Takayuki
From: Tsunakawa, Takayuki [mailto:tsunakawa.ta...@jp.fujitsu.com]
> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> > +if (setsockopt(conn->sock, IPPROTO_TCP, TCP_USER_TIMEOUT,
> > + (char *) &timeout, sizeof(timeout)) < 0 && errno != ENOPROTOOPT)
> > +{
> > +char   sebuf[256];
> > +
> > +appendPQExpBuffer(&conn->errorMessage,
> > +libpq_gettext("setsockopt(TCP_USER_TIMEOUT) failed: %s\n"),
> >
> > I suppose that the reason ENOPROTOOPT is excluded from error
> > condition is that the system call may fail with that errno on
> > older kernels, but I don't think that that justifies ignoring the
> > failure.
> 
> I think that's for the case where the module is built on an OS that supports
> TCP_USER_TIMEOUT (#ifdef TCP_USER_TIMEOUT is true), and the module is used
> on an older OS that doesn't support TCP_USER_TIMEOUT.  I remember I was
> sometimes able to do such a thing on Linux and Solaris.  If we don't have
> to handle such usage, I agree about removing the special handling of
> ENOPROTOOPT.

Oops, I agree that we should return an error even in the ENOPROTOOPT case, because 
setsockopt() is called only when the user specifies tcp_user_timeout.
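
In other words, something along these lines (just a sketch of the agreed
behaviour, not the patch itself): report the failure whatever errno is,
since this code is only reached when the user asked for tcp_user_timeout.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Sketch: set TCP_USER_TIMEOUT (in milliseconds) on a connected TCP socket
 * and treat *any* setsockopt() failure as an error, including ENOPROTOOPT
 * from a kernel that does not know the option.
 */
static int
set_tcp_user_timeout(int sock, int timeout_ms, char *errbuf, size_t errbuflen)
{
#ifdef TCP_USER_TIMEOUT
	if (setsockopt(sock, IPPROTO_TCP, TCP_USER_TIMEOUT,
				   (char *) &timeout_ms, sizeof(timeout_ms)) < 0)
	{
		snprintf(errbuf, errbuflen,
				 "setsockopt(TCP_USER_TIMEOUT) failed: %s\n",
				 strerror(errno));
		return -1;
	}
#endif
	return 0;
}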


Regards
Takayuki Tsunakawa






Re: New vacuum option to do only freezing

2019-03-27 Thread Masahiko Sawada
On Wed, Mar 27, 2019 at 12:12 AM Masahiko Sawada  wrote:
>
> On Sat, Mar 23, 2019 at 3:25 AM Robert Haas  wrote:
> >
> > On Fri, Mar 8, 2019 at 12:14 AM Masahiko Sawada  
> > wrote:
> > > IIUC we've discussed the field-and-value style vacuum option. I
> > > suggested that since we have already the disable_page_skipping option
> > > the disable_page_skipping option would be more natural style and
> > > consistent. I think "VACUUM (INDEX_CLEANUP false)" seems consistent
> > > with its reloption but not with other vacuum options. So why does only
> > > this option (and probably up-coming new options) need to support new
> > > style? Do we need the same change to the existing options?
> >
> > Well, it's too late to change DISABLE_PAGE_SKIPPING to work
> > some other way; it's been released, and we're stuck with it at this
> > point.
>
> Agreed.
>
> > However, I generally believe that it is preferable to phrase
> > options positively rather than negatively, so that for example one writes
> > EXPLAIN (ANALYZE, TIMING OFF) not EXPLAIN (ANALYZE, NO_TIMING).  So
> > I'd like to do it that way for the new options that we're proposing to
> > add.
>
> Agreed with phrasing options positively rather than negatively. Since
> DISABLE_PAGE_SKIPPING is an option for emergencies, we might be able to
> rename it for consistency in a future release.
>
> Attached updated version patches.

The patch adds the basic functionality to disable index cleanup, but one
possible argument is whether we should always disable it during
anti-wraparound vacuum. As discussed on another thread[1], anti-wraparound
vacuum can still lead to the I/O burst problem and take a long time,
especially for large append-only tables. The original purpose of this
feature is to resolve the problem that vacuum takes a long time even if the
table has just a few dead tuples, which is a quite common situation for
anti-wraparound vacuum. It might be too late to discuss, but if we always
disable it during anti-wraparound vacuum then users won't need to run
"VACUUM (INDEX_CLEANUP false)" manually on PostgreSQL 12. Does anyone have
opinions?

[1] 
https://www.postgresql.org/message-id/CAC8Q8t%2Bj36G_bLF%3D%2B0iMo6jGNWnLnWb1tujXuJr-%2Bx8ZCCTqoQ%40mail.gmail.com



Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




RE: Timeout parameters

2019-03-27 Thread Tsunakawa, Takayuki
From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> +if (setsockopt(conn->sock, IPPROTO_TCP, TCP_USER_TIMEOUT,
> + (char *) &timeout, sizeof(timeout)) < 0 && errno != ENOPROTOOPT)
> +{
> +char   sebuf[256];
> +
> +appendPQExpBuffer(&conn->errorMessage,
> +libpq_gettext("setsockopt(TCP_USER_TIMEOUT) failed: %s\n"),
> 
> I suppose that the reason ENOPROTOOPT is excluded from error
> condition is that the system call may fail with that errno on
> older kernels, but I don't think that that justifies ignoring the
> failure.

I think that's for the case where the module is built on an OS that supports 
TCP_USER_TIMEOUT (#ifdef TCP_USER_TIMEOUT is true), and the module is used on 
an older OS that doesn't support TCP_USER_TIMEOUT.  I remember I was sometimes 
able to do such a thing on Linux and Solaris.  If we don't have to handle such 
usage, I agree about removing the special handling of ENOPROTOOPT.


Regards
Takayuki Tsunakawa








Re: Berserk Autovacuum (let's save next Mandrill)

2019-03-27 Thread Komяpa
чт, 28 мар. 2019 г. в 01:01, Alvaro Herrera :

> On 2019-Mar-28, Darafei "Komяpa" Praliaskouski wrote:
>
>
> > чт, 28 мар. 2019 г. в 00:32, Alvaro Herrera :
> >
> > > On 2019-Mar-27, Darafei "Komяpa" Praliaskouski wrote:
>
> > > * certain tables would have some sort of partial scan that sets the
> > >   visibility map.  There's no reason to invoke the whole vacuuming
> > >   machinery.  I don't think this is limited to append-only tables, but
> > >   rather those are just the ones that are affected the most.
> >
> > What other machinery runs on VACUUM invocation that is not wanted there?
> > Since Postgres 11 index cleanup is already skipped on append-only tables.
>
> Well, I think it would be useful to set all-visible earlier than waiting
> for a vacuum to be necessary, even for tables that are not append-only.
> So if you think about this just for the append-only table, you leave
> money on the table.
>

The thing is, the problem does not exist for non-append-only tables; they're
going to be vacuumed automatically after 50 rows get updated.
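
For reference, that is just the standard autovacuum trigger; roughly
paraphrased from autovacuum.c (not a literal copy), it is the computation
below, so a table taking UPDATEs/DELETEs crosses the threshold on its own,
while an append-only table never does until the wraparound limit forces it:

#include <stdbool.h>

/*
 * Paraphrase of the regular autovacuum trigger: a table is picked once its
 * dead-tuple count exceeds autovacuum_vacuum_threshold (default 50) plus
 * autovacuum_vacuum_scale_factor (default 0.2) times reltuples.
 */
static bool
needs_regular_autovacuum(double reltuples, double dead_tuples,
						 int vac_base_thresh, double vac_scale_factor)
{
	double	vacthresh = vac_base_thresh + vac_scale_factor * reltuples;

	return dead_tuples > vacthresh;
}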


>
> > > * tables nearing wraparound danger should use the (yet to be committed)
> > >   option to skip index cleaning, which makes the cleanup action faster.
> > >   Again, no need for complete vacuuming.
> >
> > "Nearing wraparound" is too late already. In Amazon, reading table from
> gp2
> > after you exhausted your IOPS burst budget is like reading a floppy
> drive,
> > you have to freeze a lot earlier than you hit several terabytes of
> unfrozen
> > data, or you're dead like Mandrill's Search and Url tables from the link
> I
> > shared.
>
> OK, then start freezing tuples in the cheap mode (skip index updates)
> earlier than that.  I suppose a good question is when to start.
>

The attached code (autovacuum_berserk_v1.patch) achieves that. For
append-only tables, since https://commitfest.postgresql.org/16/952/, vacuum
skips index cleanup if no updates happened. You just need to trigger it, and
it will already be "cheap".

-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa


Re: Usage of epoch in txid_current

2019-03-27 Thread Thomas Munro
On Thu, Mar 28, 2019 at 1:48 AM Heikki Linnakangas  wrote:
> Once we have the FullTransactionId type and basic macros in place, I'm
> sure we could tidy up a bunch of code by using them. For example,
> TransactionIdInRecentPast() in walsender.c would be simpler, if the
> caller dealt with FullTransactionIds rather than xid+epoch. But it makes
> sense to do that separately.

+1
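
To illustrate the kind of tidy-up possible, a check like that could reduce
to roughly the following once the caller holds FullTransactionIds (a sketch
only, using the comparison macro from the committed patch, not actual
walsender code):

#include "postgres.h"
#include "access/transam.h"

/*
 * Sketch: with 64-bit FullTransactionId values, "is this xid in the recent
 * past?" becomes a plain comparison against the next full xid, with no
 * separate epoch bookkeeping.
 */
static bool
xid_in_recent_past(FullTransactionId xid, FullTransactionId next_fxid)
{
	return FullTransactionIdPrecedes(xid, next_fxid);
}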

> > + /*
> > +  * It is safe to read nextFullXid without a lock, because this is only
> > +  * called from the startup process, meaning that no other process can
> > +  * modify it.
> > +  */
> > + Assert(AmStartupProcess());
> > +
>
> This assertion fails on WAL replay in single-user mode:

Fixed.  (Embarrassingly I had that working in v7 but broke it in v8).

I decided to do some testing on a 32-bit system, and ran into a weird new
problem in heap_compute_xid_horizon_for_tuples() which I assumed to be
somehow my fault due to the mention of xid horizons, but I eventually
realised that master was broken on that machine and followed that up
elsewhere.  Phew.

Thanks for the reviews!  Pushed.

-- 
Thomas Munro
https://enterprisedb.com




Re: speeding up planning with partitions

2019-03-27 Thread Tom Lane
Amit Langote  writes:
> On 2019/03/27 23:57, Tom Lane wrote:
>> Yeah, there's something to be said for having plancat.c open each table
>> *and store the Relation pointer in the RelOptInfo*, and then close that
>> again at the end of planning rather than immediately.  If we can't avoid
>> these retail table_opens without a great deal of pain, that's the
>> direction I'd tend to go in.  However it would add some overhead, in
>> the form of a need to traverse the RelOptInfo array an additional time.

> Just to be sure, do you mean we should do that now or later (David said
> "in the long term")?

It's probably not high priority, though I wonder how much time is being
eaten by the repeated table_opens.
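
For what it's worth, the mechanics would be roughly as below (hypothetical
helper names; the cleaner form of the idea is to store the pointer in the
RelOptInfo itself, the separate array here is only to keep the sketch
self-contained):

#include "postgres.h"

#include "access/table.h"
#include "nodes/pathnodes.h"
#include "utils/rel.h"

/* hypothetical: Relation pointers kept open for the whole planning cycle */
static Relation *planner_open_rels = NULL;

static void
remember_planner_rel(PlannerInfo *root, Index rti, Oid relid)
{
	if (planner_open_rels == NULL)
		planner_open_rels = (Relation *)
			palloc0(root->simple_rel_array_size * sizeof(Relation));

	/* the lock was already taken earlier, so NoLock is fine here */
	planner_open_rels[rti] = table_open(relid, NoLock);
}

static void
close_planner_rels(PlannerInfo *root)
{
	int			rti;

	if (planner_open_rels == NULL)
		return;

	for (rti = 1; rti < root->simple_rel_array_size; rti++)
	{
		if (planner_open_rels[rti] != NULL)
			table_close(planner_open_rels[rti], NoLock);
	}
	pfree(planner_open_rels);
	planner_open_rels = NULL;
}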

regards, tom lane




Re: jsonpath

2019-03-27 Thread Tom Lane
Andrew Dunstan  writes:
> I was able to get this stack trace.
> 
> (gdb) bt
> #0  0x7ffb9ce6a458 in ntdll!RtlRaiseStatus ()
>from C:\WINDOWS\SYSTEM32\ntdll.dll
> #1  0x7ffb9ce7760e in ntdll!memset () from C:\WINDOWS\SYSTEM32\ntdll.dll
> #2  0x7ffb9ac52e1a in msvcrt!_setjmpex ()
>from C:\WINDOWS\System32\msvcrt.dll
> #3  0x0087431a in pg_re_throw ()
> at 
> c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/error/elog.c:1720
> #4  0x00874106 in errfinish (dummy=)
> at 
> c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/error/elog.c:464
> #5  0x007cc938 in jsonpath_yyerror (result=result@entry=0x0, 
> message=message@entry=0xab0868 <__func__.110231+1592> "unrecognized flag 
> of LIKE_REGEX predicate")
> at 
> /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/jsonpath_scan.l:305
> #6  0x007cec9d in makeItemLikeRegex (pattern=, 
> pattern=, flags=, expr=0x73c7a80)
> at 
> /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/jsonpath_gram.y:512

Hmm.  Reaching the yyerror call is expected given this input, but
seemingly the siglongjmp stack has been trashed?  The best I can
think of is a wild store that either occurs only on this platform
or happens to be harmless elsewhere ... but neither idea is terribly
convincing.

BTW, the expected behavior according to the regression test is

regression=# select '$ ? (@ like_regex "pattern" flag "a"'::jsonpath;
ERROR:  bad jsonpath representation
LINE 1: select '$ ? (@ like_regex "pattern" flag "a"'::jsonpath;
   ^
DETAIL:  unrecognized flag of LIKE_REGEX predicate at or near """

which leaves quite a lot to be desired already.  The "bad whatever"
error wording is a flat out violation of our message style guide,
while the "at or near """" bit is pretty darn unhelpful.

The latter problem occurs because the last flex production was

\"  {
yylval->str = scanstring;
BEGIN INITIAL;
return STRING_P;
}

so that flex thinks the last token was just the quote mark ending the
string.  This could be improved on by adopting something similar to the
SET_YYLLOC() convention used in scan.l to remember the start of what the
user would think the token is.  Probably it's not worth the work right
now, but details like this are important from a fit-and-finish
perspective, so I'd like to see it improved sometime.
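
(For reference, the convention in question is essentially this, paraphrased
from src/backend/parser/scan.l; the jsonpath scanner would need its own
equivalent of the scanbuf bookkeeping:)

/*
 * Remember where the token of interest starts, so that a later yyerror()
 * can point at the whole token the user typed rather than at whatever
 * fragment flex happened to match last (e.g. just a closing quote).
 */
#define SET_YYLLOC()	(*(yylloc) = yytext - yyextra->scanbuf)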

regards, tom lane




Re: speeding up planning with partitions

2019-03-27 Thread Amit Langote
On 2019/03/27 23:57, Tom Lane wrote:
> David Rowley  writes:
>> On Wed, 27 Mar 2019 at 18:39, Amit Langote
>>  wrote:
>>> On 2019/03/27 14:26, David Rowley wrote:
 Perhaps the way to make this work, at least in the long term is to do
 in the planner what we did in the executor in d73f4c74dd34.
> 
>>> Maybe you meant 9ddef36278a9?
> 
>> Probably.
> 
> Yeah, there's something to be said for having plancat.c open each table
> *and store the Relation pointer in the RelOptInfo*, and then close that
> again at the end of planning rather than immediately.  If we can't avoid
> these retail table_opens without a great deal of pain, that's the
> direction I'd tend to go in.  However it would add some overhead, in
> the form of a need to traverse the RelOptInfo array an additional time.

Just to be sure, do you mean we should do that now or later (David said
"in the long term")?

Thanks,
Amit





Re: Ordered Partitioned Table Scans

2019-03-27 Thread David Rowley
On Thu, 28 Mar 2019 at 15:40, David Rowley  wrote:
> Thanks for the review.  I've attached a patch that mostly just moved
> the code around.

Here's one that fixes up the compiler warning from the last one.
Thanks CF bot...

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


mergeappend_to_append_conversion_v17.patch
Description: Binary data


Re: Timeout parameters

2019-03-27 Thread Kyotaro HORIGUCHI
Hello.

At Thu, 28 Mar 2019 01:11:23 +, "Nagaura, Ryohei" 
 wrote in 

> Hello, Fabien-san.
> 
> > From: Fabien COELHO 
> > About the backend v11 patch.
> > No space or newline before ";". Same comment about the libpq_ timeout.
> Fixed.
> 
> > There is an error in the code, I think it should be < 0 to detect errors.
> Yes, thanks!
> 
> > If the parameter has no effect on Windows, I do not see why its value 
> > should be
> > constrained to zero, it should just have no effect. Idem libpq_ timeout.
> I had a misunderstanding.
> Indeed, it doesn't need to be zero. Removed.
> 
> > Documentation:
> > ISTM this is not about libpq connections but about TCP connections. There 
> > can be
> > non libpq implementations client side.
> Then, where do you think the correct place?
> I thought that this parameter should be explained here because the 
> communication
> will be made through the library libpq same as keepalive features.

In TCP_backend patch:

+   
+Define a wrapper for TCP_USER_TIMEOUT
+socket option of libpq connection.
+   
+   
+Specifies the number of milliseconds after which a TCP connection can be
+aborted by the operation system due to network problems when sending or
+receiving the data through this connection. A value of zero uses the
+system default. In sessions connected via a Unix-domain socket, this
+parameter is ignored and always reads as zero. This parameter is
+supported only on systems that support
+TCP_USER_TIMEOUT; on other systems, it has no effect.
+   


I think this is not describing the backend parameter. Why don't you
copy'n-paste then modify the description of tcp_keepalives_idle? Perhaps
it needs a similar caveat related to Windows.


+static const char*
+show_tcp_user_timeout(void)
+{
+   /* See comments in assign_tcp_keepalives_idle */
+   static char nbuf[16];
+
+   snprintf(nbuf, sizeof(nbuf), "%d", tcp_user_timeout);
+   return nbuf;
+}

The reason that, say, tcp_keepalives_idle uses the hook is that it
doesn't show the GUC variable, but the actual value read by
getsockopt(). This is just showing the GUC value. I think this
should behave the same way as the tcp_keepalives_* options. If not,
we should have an explanation of the reason there.
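
Something like the following is what I mean (a sketch only; a
pq_gettcpusertimeout() helper analogous to pq_getkeepalivesidle() is
assumed here and does not exist yet):

#include "postgres.h"
#include "libpq/libpq-be.h"
#include "miscadmin.h"

/*
 * Sketch: show the value actually in effect on the socket, the way
 * show_tcp_keepalives_idle() does, rather than echoing the GUC variable.
 * pq_gettcpusertimeout() is an assumed helper that would call getsockopt()
 * on MyProcPort's socket.
 */
static const char *
show_tcp_user_timeout(void)
{
	/* See comments in assign_tcp_keepalives_idle */
	static char nbuf[16];

	snprintf(nbuf, sizeof(nbuf), "%d", pq_gettcpusertimeout(MyProcPort));
	return nbuf;
}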


In TCP_interface patch:

+   
+Define a wrapper for TCP_USER_TIMEOUT
+socket option of libpq connection.

What does the "wrapper" mean?

+   
+   
+Specifies the number of milliseconds after which a TCP connection can be
+aborted by the operation system due to network problems when sending or
+receiving the data through this connection. A value of zero uses the
+system default. In sessions connected via a Unix-domain socket, this
+parameter is ignored and always reads as zero. This parameter is
+supported only on systems that support
+TCP_USER_TIMEOUT; on other systems, it has no effect.
+   

I would suggest the same thing as the backend
part. Copy'n-paste-modify keepalives_idle would be better.


+if (setsockopt(conn->sock, IPPROTO_TCP, TCP_USER_TIMEOUT,
+ (char *) &timeout, sizeof(timeout)) < 0 && errno != ENOPROTOOPT)
+{
+char   sebuf[256];
+
+appendPQExpBuffer(&conn->errorMessage,
+libpq_gettext("setsockopt(TCP_USER_TIMEOUT) failed: %s\n"),

I suppose that the reason ENOPROTOOPT is excluded from error
condition is that the system call may fail with that errno on
older kernels, but I don't think that that justifies ignoring the
failure.


+   if (!IS_AF_UNIX(addr_cur->ai_family))
+   {
+   if (!setTCPUserTimeout(conn))
+   {
+   closesocket(conn->sock);
+   conn->sock = -1;
+   conn->addr_cur = 
addr_cur->ai_next;
+   goto keep_going;
+   }
+   }

I don't see a point in the added part having its own "if (!IS_AF_UNIX)"
block separate from the tcp_keepalives options.


+   /* TCP USER TIMEOUT */
+   {"tcp_user_timeout", NULL, NULL, NULL,
+   "TCP_user_timeout", "", 10, /* strlen(INT32_MAX) == 10 */
+   offsetof(struct pg_conn, pgtcp_user_timeout)},
+

Similarly, why isn't this placed just after the tcp_keepalives options?

+char   *pgtcp_user_timeout;/* tcp user timeout (numeric string) */

+char   *tcp_user_timeout;/* tcp user timeout */

The latter doesn't seem to be used.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: partitioned tables referenced by FKs

2019-03-27 Thread Amit Langote
Hi,

On 2019/03/27 8:27, Alvaro Herrera wrote:
> On 2019-Mar-26, Alvaro Herrera wrote:
> 
>> Thanks for the thorough testing and bug analysis!  It was spot-on.  I've
>> applied your two proposed fixes, as well as added a new test setup that
>> covers both these bugs.  The attached set is rebased on 7c366ac969ce.
> 
> Attached is rebased on 126d63122232.  No further changes.

Noticed something -- I was expecting that your earlier commit 1af25ca0c2d9
would've taken care of this, but apparently it didn't.

-- pk
create table p (a int primary key) partition by list (a);
create table p1 partition of p for values in (1) partition by list (a);
create table p11 partition of p1 for values in (1);

-- fk (partitioned)
create table q (a int references p) partition by list (a);
create table q1 partition of q for values in (1) partition by list (a);
create table q11 partition of q1 for values in (1);

-- prints only the top-level constraint as expected
\d q
          Partitioned table "public.q"
 Column │  Type   │ Collation │ Nullable │ Default 
────────┼─────────┼───────────┼──────────┼─────────
 a      │ integer │           │          │ 
Partition key: LIST (a)
Foreign-key constraints:
    "q_a_fkey" FOREIGN KEY (a) REFERENCES p(a)
Number of partitions: 1 (Use \d+ to list them.)

-- fk (regular table)
create table r (a int references p);

-- this prints all, which seems unintentional
\d r
 Table "public.r"
 Column │  Type   │ Collation │ Nullable │ Default
┼─┼───┼──┼─
 a  │ integer │   │  │
Foreign-key constraints:
"r_a_fkey" FOREIGN KEY (a) REFERENCES p(a)
"r_a_fkey1" FOREIGN KEY (a) REFERENCES p1(a)
"r_a_fkey2" FOREIGN KEY (a) REFERENCES p11(a)

So, 0002 needs to include some psql tweaks.

Thanks,
Amit





RE: Timeout parameters

2019-03-27 Thread Nagaura, Ryohei
Hello, Fabien-san.

> From: Fabien COELHO [mailto:coe...@cri.ensmp.fr]
> The default postgres configuration file should be updated to reflect the
> parameter and its default value.
I'm very sorry for not addressing your review in the last patch.
I modified the TCP_backend patch (adding postgresql.conf.sample) and attached 
it to this mail.

Best regards,
-
Ryohei Nagaura



socket_timeout_v9.patch
Description: socket_timeout_v9.patch


TCP_interface_v12.patch
Description: TCP_interface_v12.patch


TCP_backend_v13.patch
Description: TCP_backend_v13.patch


Re: jsonpath

2019-03-27 Thread Andrew Dunstan

On 3/27/19 9:48 AM, Tom Lane wrote:
> Alexander Korotkov  writes:
>> Still no reproduction.
> Annoying, but it's probably not worth expending more effort on
> right now.  I wonder whether that buildfarm animal can be upgraded
> to capture core dump stack traces --- if so, then if it happens
> again we'd have more info.
>
>   



I was able to get this stack trace.


cheers


andrew



-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

[Switching to Thread 3960.0x34b4]
0x7ffb9ce6a458 in ntdll!RtlRaiseStatus ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
(gdb) bt
#0  0x7ffb9ce6a458 in ntdll!RtlRaiseStatus ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#1  0x7ffb9ce7760e in ntdll!memset () from C:\WINDOWS\SYSTEM32\ntdll.dll
#2  0x7ffb9ac52e1a in msvcrt!_setjmpex ()
   from C:\WINDOWS\System32\msvcrt.dll
#3  0x0087431a in pg_re_throw ()
at 
c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/error/elog.c:1720
#4  0x00874106 in errfinish (dummy=)
at 
c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/error/elog.c:464
#5  0x007cc938 in jsonpath_yyerror (result=result@entry=0x0, 
message=message@entry=0xab0868 <__func__.110231+1592> "unrecognized flag of 
LIKE_REGEX predicate")
at 
/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/jsonpath_scan.l:305
#6  0x007cec9d in makeItemLikeRegex (pattern=, 
pattern=, flags=, expr=0x73c7a80)
at 
/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/jsonpath_gram.y:512
#7  jsonpath_yyparse (result=, result@entry=0x4b0e818)
at 
/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/jsonpath_gram.y:183
#8  0x007ced61 in parsejsonpath (
str=str@entry=0x73c71e0 "$ ? (@ like_regex \"pattern\" flag \"a\")", 
len=len@entry=37)
at 
/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/jsonpath_scan.l:481
#9  0x007cf347 in jsonPathFromCstring (
in=0x73c71e0 "$ ? (@ like_regex \"pattern\" flag \"a\")", len=37)
at 
c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/jsonpath.c:170
#10 0x008795ea in InputFunctionCall (flinfo=flinfo@entry=0x4b0e970, 
str=str@entry=0x73c71e0 "$ ? (@ like_regex \"pattern\" flag \"a\")", 
typioparam=typioparam@entry=4072, typmod=typmod@entry=-1)
at 
c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/fmgr/fmgr.c:1548
#11 0x008797fc in OidInputFunctionCall (functionId=, 
str=0x73c71e0 "$ ? (@ like_regex \"pattern\" flag \"a\")", 
typioparam=4072, typmod=typmod@entry=-1)
at 
c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/fmgr/fmgr.c:1651
#12 0x0053e395 in stringTypeDatum (tp=tp@entry=0x7475b50, 
string=, atttypmod=atttypmod@entry=-1)
at 
c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/parser/parse_type.c:641
#13 0x0052729d in coerce_type (pstate=0x73c7618, node=0x73c7848, 
inputTypeId=, targetTypeId=4072, targetTypeMod=-1, 
ccontext=COERCION_EXPLICIT, cformat=COERCE_EXPLICIT_CAST, location=46)
at 
c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/parser/parse_coerce.c:304
#14 0x0052772b in coerce_to_target_type (
pstate=pstate@entry=0x73c7618, expr=, 
expr@entry=0x73c7848, exprtype=, exprtype@entry=705, 
targettype=, targettypmod=-1, 
ccontext=ccontext@entry=COERCION_EXPLICIT, 
cformat=cformat@entry=COERCE_EXPLICIT_CAST, location=location@entry=46)
at 
c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/parser/parse_coerce.c:103
#15 0x0052c489 in transformTypeCast (tc=0x73c7378, pstate=0x73c7618)
at 
c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/parser/parse_expr.c:2811
#16 transformExprRecurse (pstate=pstate@entry=0x73c7618, 
expr=expr@entry=0x73c7378)
at 
c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/parser/parse_expr.c:202
#17 0x0052f8aa in transformExprRecurse (expr=0x73c7378, 
pstate=0x73c7618)
at 
c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/parser/parse_expr.c:163
#18 transformExpr (pstate=pstate@entry=0x73c7618, expr=expr@entry=0x73c7378, 
exprKind=exprKind@entry=EXPR_KIND_SELECT_TARGET)
at 
c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/parser/parse_expr.c:155
#19 0x0053be7f in transformTargetEntry (
pstate=pstate@entry=0x73c7618, node=0x73c7378, expr=expr@entry=0x0, 
exprKind=exprKind@entry=EXPR_KIND_SELECT_TARGET, colname=0x0, 
resjunk=resjunk@entry=false)
at 
c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/bac

RE: Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

2019-03-27 Thread Tsunakawa, Takayuki
From: Robert Haas [mailto:robertmh...@gmail.com]
> You're both right and I'm wrong.
> 
> However, I think it would be better to stick with the term 'truncate'
> which is widely-used already, rather than introducing a new term.

Yeah, I have the same feeling.  OTOH, as I mentioned earlier in this thread,
shrink is used instead of truncate in the PostgreSQL documentation.  So, I
chose shrink.  To repeat myself, I'm comfortable with either word.  I'd like
the committer to choose whichever he thinks is better.


Regards
Takayuki Tsunakawa






Re: Ordered Partitioned Table Scans

2019-03-27 Thread David Rowley
On Thu, 28 Mar 2019 at 14:34, Amit Langote
 wrote:
>
> On 2019/03/28 7:29, David Rowley wrote:
> > On Wed, 27 Mar 2019 at 19:48, Amit Langote
> > It does need to know how many partitions the partitioned table has,
> > which it gets from partrel->nparts, so yeah, RelOptInfo is probably
> > needed. I don't think passing in int nparts is a good solution to
> > that.  The problem with moving it to partbounds.c is that nothing
> > there knows about RelOptInfo currently.
>
> Maybe, this could be a start.  Also, there is a patch in nearby thread
> which adds additional functionality to partbounds.c to be used by
> partitionwise join code in the optimizer [1].

Thanks for the review.  I've attached a patch that mostly just moved
the code around.

I also changed the comment in build_partition_pathkeys() to explain
the nulls_first argument and why I just pass
ScanDirectionIsBackward(scandir).

Also, I added another comment in struct PlannerInfo mentioning the
guarantee that partitions defined earlier in PartitionDesc appear
earlier in append_rel_list.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


mergeappend_to_append_conversion_v16.patch
Description: Binary data


Re: Log a sample of transactions

2019-03-27 Thread Masahiko Sawada
Thank you for updating the patch!

On Wed, Mar 27, 2019 at 10:28 PM Adrien NAYRAT
 wrote:
>
> On 3/27/19 10:22 AM, Adrien NAYRAT wrote:
> > Hello,
> >
> > On 3/27/19 7:05 AM, Masahiko Sawada wrote:
> > Why are both of these checked? ISTM once xact_is_sampled is set, we
> > ought not to respect a different value of log_xact_sample_rate for the
> > rest of that transaction.
>  I added theses checks to allow to disable logging during a sampled
>  transaction, per suggestion of Masahiko Sawada:
>  https://www.postgresql.org/message-id/CAD21AoD4t%2BhTV6XfK5Yz%3DEocB8oMyJSYFfjAryCDYtqfib2GrA%40mail.gmail.com
> 
> >>> I added a comment to explain why transaction logging is rechecked.
> >> Umm, I'm inclined to think that we should not disable logging within
> >> the transaction. If we allow that, since users can disable the logging
> >> within the transaction by setting it to 0 they may think that we can
> >> change the rate during the transaction, which is wrong. If we want
> >> this behavior we should document it but we can make user not being
> >> able to change the rate during the transaction to avoid confusion. And
> >> the latter looks more understandable and straightforward. This is
> >> different comment what I did before, I'm sorry for confusing you.
> >
> > Don't worry, I will change that.
> >
> >
> >>
> > As far as I can tell xact_is_sampled is not properly reset in case of
> > errors?
> >
> >>> I am not sure if I should disable logging in case of errors. Actually we
> >>> have:
> >>>
> >>> postgres=# set log_transaction_sample_rate to 1;
> >>> SET
> >>> postgres=# set client_min_messages to 'LOG';
> >>> LOG:  duration: 0.392 ms  statement: set client_min_messages to 'LOG';
> >>> SET
> >>> postgres=# begin;
> >>> LOG:  duration: 0.345 ms  statement: begin;
> >>> BEGIN
> >>> postgres=# select 1;
> >>> LOG:  duration: 0.479 ms  statement: select 1;
> >>>?column?
> >>> --
> >>>   1
> >>> (1 row)
> >>>
> >>> postgres=# select * from missingtable;
> >>> ERROR:  relation "missingtable" does not exist
> >>> LINE 1: select * from missingtable;
> >>> ^
> >>> postgres=# select 1;
> >>> ERROR:  current transaction is aborted, commands ignored until end of
> >>> transaction block
> >>> postgres=# rollback;
> >>> LOG:  duration: 11390.295 ms  statement: rollback;
> >>> ROLLBACK
> >>>
> >>> If I reset xact_is_sampled (after the first error inside
> >>> AbortTransaction if I am right), "rollback" statement will not be
> >>> logged. I wonder if this "statement" should be logged?
> >>>
> >>> If the answer is no, I will reset xact_is_sampled in AbortTransaction.
> >>>
> >> FWIW, I'd prefer to log "rollback" as well so as user can recognize
> >> the end of transaction.
> >
> > Ok.
> >
> >>
> >> +   /* Determine if statements are logged in this transaction */
> >> +   xact_is_sampled = random() <= log_xact_sample_rate *
> >> MAX_RANDOM_VALUE;
> >>
> >> If log_xact_sample_rate is 1 we always log all statement in the
> >> transaction regardless of value of random(). Similarly if it's 0, we
> >> never log. I think we can avoid unnecessary random() call in both case
> >> like log_statement_sample_rate does.
> >
> > I agree, I will change that.
> >
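
Something along these lines, mirroring the way log_statement_sample_rate is
handled in postgres.c, should work (a sketch; xact_is_sampled and
log_xact_sample_rate are the patch's variables):

	/*
	 * Determine if statements are logged in this transaction, skipping the
	 * random() call entirely for the common settings 0.0 and 1.0.
	 */
	xact_is_sampled = log_xact_sample_rate != 0 &&
		(log_xact_sample_rate == 1 ||
		 random() <= log_xact_sample_rate * MAX_RANDOM_VALUE);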
> >>
> >>  {"log_statement_sample_rate", PGC_SUSET, LOGGING_WHEN,
> >>  gettext_noop("Fraction of statements over
> >> log_min_duration_statement to log."),
> >>  gettext_noop("If you only want a sample, use a
> >> value between 0 (never "
> >> -"log) and 1.0 (always
> >> log).")
> >> +"log) and 1 (always
> >> log).")
> >>  },
> >>  &log_statement_sample_rate,
> >>  1.0, 0.0, 1.0,
> >>  NULL, NULL, NULL
> >>  },
> >>
> >> This change is not relevant with log_transaction_sample_rate feature
> >> but why do we need this change? In postgresql.conf.sample the
> >> description of both log_statement_sample_rate and
> >> log_transaction_sample_rate use 1.0 instead of 1, like "1.0 logs all
> >> statements from all transactions, 0 never logs". If we need this
> >> change I think it should be a separate patch.
> >
> >
> > Sorry, I changed that, someone suggest using either "0" and "1", or
> > "0.0" and "1.0" but not mixing both.

Agreed with using "0.0" and "1.0".

> > I will remove this change.
> >

--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
+#log_transaction_sample_rate = 0   # Fraction of transactions whose statements
+                                   # are logged regardless of their duration.
+                                   # 1.0 logs all statements from all
+                                   # transactions, 0 never logs.

--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
+   {"log_transaction_sampl

Re: pg_upgrade: Pass -j down to vacuumdb

2019-03-27 Thread Jeff Janes
On Tue, Mar 26, 2019 at 7:28 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2019-03-25 22:57, Tom Lane wrote:
> > + fprintf(script, "echo %sYou may wish to add --jobs=N for parallel
> analyzing.%s\n",
> > + ECHO_QUOTE, ECHO_QUOTE);
>
> But then you get that information after you have already started the
> script.
>

True, but the same goes for all the other information there, and it sleeps
to let you break out of it.  And I make it a habit to glance through any
scripts someone suggests that I run, so would notice the embedded advice
without running it at all.

I don't find any information about this analyze business on the
> pg_upgrade reference page.  Maybe a discussion there could explain the
> different paths better than making the output script extra complicated.
>
> Essentially: If you want a slow and gentle analyze, use the supplied
> script.  If you want a fast analyze, use vacuumdb, perhaps with an
> appropriate --jobs option.  Note that pg_upgrade --jobs and vacuumdb
> --jobs are resource-bound in different ways, so the same value might not
> be appropriate for both.
>
>
To me, analyze-in-stages is not about gentleness at all.  For example, it
does nothing to move vacuum_cost_delay away from its default of 0.  Rather,
it is about slamming the bare minimum statistics in there as fast as
possible, so your database doesn't keel over from horrible query plans on
even simple queries as soon as you reopen.  I want the database to survive
long enough for the more complete statistics to be gathered.  If you have
quickly accumulated max_connection processes all running horrible query
plans that never finish, your database might as well still be closed for
all the good it does the users. And all the load generated by those is
going to make the critical ANALYZE all that much slower.

At first blush I thought it was obvious that you would not want to run
analyze-in-stages in parallel.  But after thinking about it some more and
reflecting on experience doing some troublesome upgrades, I would reverse
that and say it is now obvious you do want at least the first stage of
analyze-in-stages, and probably the first two, to run in parallel.  That is
not currently an option it supports, so we can't really recommend it in the
script or the docs.

But we could at least adopt the more straightforward patch and suggest that
if they don't want analyze-in-stages, they should consider doing the
big-bang analyze in parallel.

Cheers,

Jeff


Re: Ordered Partitioned Table Scans

2019-03-27 Thread Amit Langote
Hi,

On 2019/03/28 7:29, David Rowley wrote:
> On Wed, 27 Mar 2019 at 19:48, Amit Langote
>  wrote:
>> Sorry if this was discussed before, but why does this patch add any new
>> code to partprune.c?  AFAICT, there's no functionality changes to the
>> pruning code.
> 
> You're right. It probably shouldn't be there.  There's a bit of a lack
> of a good home for partition code relating to the planner it seems.

partprune.c is outside the optimizer sub-directory, but exports
planning-related functions like prune_append_rel_partitions(),
make_partition_pruneinfo(), etc.

Similarly, partbounds.c can grow a bit of planning-related functionality.

>> Regarding
>>
>> +bool
>> +partitions_are_ordered(PlannerInfo *root, RelOptInfo *partrel)
>>
>> I think this could simply be:
>>
>> bool
>> partitions_are_ordered(PartitionBoundInfo *boundinfo)
>>
>> and be defined in partitioning/partbounds.c.  If you think any future
>> modifications to this will require access to the partition key info in
>> PartitionScheme, maybe the following is fine:
>>
>> bool
>> partitions_are_ordered(RelOptInfo *partrel)
> 
> It does need to know how many partitions the partitioned table has,
> which it gets from partrel->nparts, so yeah, RelOptInfo is probably
> needed. I don't think passing in int nparts is a good solution to
> that.  The problem with moving it to partbounds.c is that nothing
> there knows about RelOptInfo currently.

Maybe, this could be a start.  Also, there is a patch in nearby thread
which adds additional functionality to partbounds.c to be used by
partitionwise join code in the optimizer [1].

Thanks,
Amit

[1] https://commitfest.postgresql.org/22/1553/





RE: Timeout parameters

2019-03-27 Thread Nagaura, Ryohei
Hello, Fabien-san.

> From: Fabien COELHO 
> About the backend v11 patch.
> No space or newline before ";". Same comment about the libpq_ timeout.
Fixed.

> There is an error in the code, I think it should be < 0 to detect errors.
Yes, thanks!

> If the parameter has no effect on Windows, I do not see why its value should 
> be
> constrained to zero, it should just have no effect. Idem libpq_ timeout.
I had a misunderstanding.
Indeed, it doesn't need to be zero. Removed.

> Documentation:
> ISTM this is not about libpq connections but about TCP connections. There can 
> be
> non libpq implementations client side.
Then, where do you think is the correct place?
I thought that this parameter should be explained here because the communication
is made through the libpq library, the same as the keepalive features.


Best regards,
-
Ryohei Nagaura




socket_timeout_v9.patch
Description: socket_timeout_v9.patch


TCP_backend_v12.patch
Description: TCP_backend_v12.patch


TCP_interface_v12.patch
Description: TCP_interface_v12.patch


Re: amcheck verification for GiST

2019-03-27 Thread Peter Geoghegan
On Wed, Mar 27, 2019 at 10:29 AM Heikki Linnakangas  wrote:
> Thanks! I had a look, and refactored it quite a bit.

I'm really happy that other people seem to be driving amcheck in a new
direction, without any prompting from me. It's too important to remain
something that only I have contributed to.

> I found the way the recursion worked confusing. On each internal page,
> it looped through all the child nodes, checking the consistency of the
> downlinks. And then it looped through the children again, to recurse.
> This isn't performance-critical, but visiting every page twice still
> seems strange.

To be fair, that's actually what bt_index_parent_check() does. OTOH,
it has to work in a way that makes it an extension of
bt_index_check(), which is not a problem for
gist_index_parent_check().

> In gist_check_page_keys(), if we get into the code to deal with a
> concurrent update, we set 'parent' to point to a tuple on a parent page,
> then unlock it, and continue to look at remaining tuples, using the
> pointer that points to an unlocked buffer.

Why not just copy the page into a local buffer? See my remarks on this below.

> I came up with the attached, which fixes the above-mentioned things. I
> also replaced the check that each node has only internal or leaf
> children, with a different check that the tree has the same height in
> all branches. That catches more potential problems, and was easier to
> implement after the refactoring. This still needs at least a round of
> fixing typos and tidying up comments, but it's more straightforward now,
> IMHO.

You probably didn't mean to leave this "boom" error behind:

> +   if (GistPageIsDeleted(page))
> +   {
> +   elog(ERROR,"boom");

I see that you have this check for deleted pages:

> +   if (PageGetMaxOffsetNumber(page) > InvalidOffsetNumber)
> +   ereport(ERROR,
> +   (errcode(ERRCODE_INDEX_CORRUPTED),
> +   errmsg("index \"%s\" has deleted page with tuples",
> +   RelationGetRelationName(rel))));
> +   }

Why not have a similar check for non-deleted pages, whose maxoffset
must be <= MaxIndexTuplesPerPage?

I see various errors like this one:

> +   if (MAXALIGN(ItemIdGetLength(iid)) != MAXALIGN(IndexTupleSize(idxtuple)))
> +   ereport(ERROR,
> +   (errcode(ERRCODE_INDEX_CORRUPTED),
> +errmsg("index \"%s\" has inconsistent tuple sizes",
> +   RelationGetRelationName(rel))));

Can't we say which TID is involved here, so we can find the offending
corrupt tuple afterwards? Or at least the block number? And maybe even
the LSN of the page? I think that that kind of stuff could be added in
a number of places.
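
Roughly the kind of thing I mean (a sketch only; blkno, offset and page are
assumed to be available from the surrounding verification loop):

	if (MAXALIGN(ItemIdGetLength(iid)) != MAXALIGN(IndexTupleSize(idxtuple)))
		ereport(ERROR,
				(errcode(ERRCODE_INDEX_CORRUPTED),
				 errmsg("index \"%s\" has inconsistent tuple sizes",
						RelationGetRelationName(rel)),
				 errdetail("Block %u, offset %u, page LSN %X/%X.",
						   blkno, offset,
						   (uint32) (PageGetLSN(page) >> 32),
						   (uint32) PageGetLSN(page))));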

I see this stuff that's related to concurrent processes:

> +   /*
> +* It's possible that the page was split since we looked at the 
> parent,
> +* so that we didn't missed the downlink of the right sibling when we
> +* scanned the parent. If so, add the right sibling to the stack now.
> +*/

> +   /*
> +* There was a  discrepancy between parent and child tuples.
> +* We need to verify it is not a result of concurrent call
> +* of gistplacetopage(). So, lock parent and try to find 
> downlink
> +* for current page. It may be missing due to concurrent page
> +* split, this is OK.
> +*/

Is this really needed? Isn't the ShareLock on the index sufficient? If so, why?

> +   stack->parenttup = gist_refind_parent(rel, stack->parentblk, 
> stack->blkno, strategy);

If the gistplacetopage() stuff is truly necessary, then is it okay to
call gist_refind_parent() with the original buffer lock still held
like this?

I still suspect that we should have something like palloc_btree_page()
for GiST, so that we're always operating on a copy of the page in
palloc()'d memory.  Maybe it's worthwhile to do something clever with
concurrently holding buffer locks, but if that's what we're doing here
then I would also expect us to have something weaker than ShareLock as
our relation-level heavyweight lock. And, there should be a prominent
explanation of the theory behind it somewhere.
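
Concretely, I'm thinking of something like this (a sketch of a
palloc_btree_page()-style helper for GiST; the function name is made up):

#include "postgres.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

/*
 * Sketch: copy the target page into backend-local memory while briefly
 * holding a shared buffer lock, so every subsequent check runs on a stable
 * snapshot of the page and no buffer lock is held while checking.
 */
static Page
palloc_gist_page(Relation rel, BlockNumber blkno, BufferAccessStrategy strategy)
{
	Page		page = palloc(BLCKSZ);
	Buffer		buffer;

	buffer = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, strategy);
	LockBuffer(buffer, BUFFER_LOCK_SHARE);
	memcpy(page, BufferGetPage(buffer), BLCKSZ);
	UnlockReleaseBuffer(buffer);

	return page;
}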

> What have you been using to test this?

pg_hexedit has full support for GiST. ;-)

--
Peter Geoghegan




Re: Should the docs have a warning about pg_stat_reset()?

2019-03-27 Thread David Rowley
On Thu, 28 Mar 2019 at 10:33, Alvaro Herrera  wrote:
>
> On 2019-Mar-27, Peter Eisentraut wrote:
>
> > On 2019-03-26 16:28, Euler Taveira wrote:
> > > I don't remember why we didn't consider table without stats to be
> > > ANALYZEd. Isn't it the case to fix autovacuum? Analyze
> > > autovacuum_count + vacuum_count = 0?
> >
> > When the autovacuum system was introduced, we didn't have those columns.
> >  But now it seems to make sense that a table with autoanalyze_count +
> > analyze_count = 0 should be a candidate for autovacuum even if the write
> > statistics are zero.  Obviously, this would have the effect that a
> > pg_stat_reset() causes an immediate autovacuum for all tables, so maybe
> > it's not quite that simple.
>
> I'd say it would make them a candidate for auto-analyze; upon completion
> of that, there's sufficient data to determine whether auto-vacuum is
> needed or not.  This sounds like a sensible idea to me.

Yeah, analyze, not vacuum.  It is a bit scary to add new ways for
auto-vacuum to suddenly have a lot of work to do.  When all workers
are busy it can lead to neglect of other duties.  It's true that there
won't be much in the way of routine vacuuming work for the database
the stats were just reset on, as of course, all the n_dead_tup
counters were just reset. However, it could starve other databases of
vacuum attention.  Anti-wraparound vacuums on the current database may
get neglected too.

I'm not saying let's not do it, I'm just saying we need to think of
what bad things could happen as a result of such a change.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Berserk Autovacuum (let's save next Mandrill)

2019-03-27 Thread David Rowley
On Thu, 28 Mar 2019 at 11:01, Alvaro Herrera  wrote:
>
> On 2019-Mar-28, Darafei "Komяpa" Praliaskouski wrote:
> > "Nearing wraparound" is too late already. In Amazon, reading table from gp2
> > after you exhausted your IOPS burst budget is like reading a floppy drive,
> > you have to freeze a lot earlier than you hit several terabytes of unfrozen
> > data, or you're dead like Mandrill's Search and Url tables from the link I
> > shared.
>
> OK, then start freezing tuples in the cheap mode (skip index updates)
> earlier than that.  I suppose a good question is when to start.

I thought recently that it would be good to have some sort of
pro-active auto-vacuum mode that made use of idle workers.  Probably
there would need to be some mode flag that mentioned which workers
were in proactive mode so that these could be cancelled when more
pressing work came in.  I don't have an exact idea of what
"pro-active" would actually be defined as, but I know that when
consuming a single transaction ID causes terabytes of tables to
suddenly need an anti-wraparound vacuum, that's not a good
situation to be in. Perhaps getting to some percentage of
autovacuum_freeze_max_age could be classed as pro-active.
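
For the sake of discussion, "pro-active" could be defined along these lines
(a made-up sketch, including the 50% cutoff; not something I'm proposing as
is):

#include "postgres.h"
#include "access/transam.h"
#include "postmaster/autovacuum.h"

/*
 * Hypothetical: a table becomes a candidate for cheap, pro-active freezing
 * once relfrozenxid has aged past half of autovacuum_freeze_max_age,
 * i.e. well before the forced anti-wraparound point.
 */
static bool
table_wants_proactive_freeze(TransactionId relfrozenxid, TransactionId nextXid)
{
	int32		xid_age = (int32) (nextXid - relfrozenxid);

	return TransactionIdIsNormal(relfrozenxid) &&
		xid_age > autovacuum_freeze_max_age / 2;
}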

> I wonder if Mandrill's problem is related to Mailchimp raising the
> freeze_max_age to a point where autovac did not have enough time to
> react with an emergency vacuum.  If you keep raising that value because
> the vacuums cause problems for you (they block DDL), there's something
> wrong.

I have seen some very high autovacuum_freeze_max_age settings
recently. It would be interesting to know what they had theirs set to.
I see they mentioned "Search and Url tables". I can imagine "search"
never needs any UPDATEs, so quite possibly those were append-only, in
which case the anti-wraparound vacuum would have had quite a lot of
work on its hands, since possibly every page needed freezing. A table
receiving regular auto-vacuums from dead tuples would likely get some
pages frozen during those.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Fix XML handling with DOCTYPE

2019-03-27 Thread Chapman Flack
On 03/27/19 19:07, Chapman Flack wrote:
> xml-functions-type-docfix-5.patch attached, with node-sets instead of
> nodesets, libxml2 instead of libxml, and parenthesized full sentences
> now au naturel.
> 
> I ended up turning the formerly-parenthesized note about libxml2's
> node-set ordering into a DocBook : there is really something
> parenthetical about it, with the official statement of node-set
> element ordering being that there is none, and the description of
> what the library happens to do being of possible interest, but set
> apart, with the necessary caveats about relying on it.

I have just suffered a giant sinking feeling upon re-reading this
sentence in our XMLTABLE doc:

  A column marked FOR ORDINALITY will be populated with row numbers
  matching the order in which the output rows appeared in the original
  input XML document.

I've been skimming right over it all this time, and that right there is
a glaring built-in reliance on the observable-but-disclaimed iteration
order of a libxml2 node-set.

I'm a bit unsure what any clarifying language should even say.

Regards,
-Chap




Re: Fix XML handling with DOCTYPE

2019-03-27 Thread Chapman Flack
Hi,

xml-functions-type-docfix-5.patch attached, with node-sets instead of
nodesets, libxml2 instead of libxml, and parenthesized full sentences
now au naturel.

I ended up turning the formerly-parenthesized note about libxml2's
node-set ordering into a DocBook : there is really something
parenthetical about it, with the official statement of node-set
element ordering being that there is none, and the description of
what the library happens to do being of possible interest, but set
apart, with the necessary caveats about relying on it.

Spotted and fixed a couple more typos in the process.

Regards,
-Chap
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 52c28e7..0aed14c 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4219,6 +4219,12 @@ a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
 value is a full document or only a content fragment.

 
+   
+Limits and compatibility notes for the xml data type
+in PostgreSQL can be found in
+.
+   
+

 Creating XML Values

diff --git a/doc/src/sgml/features.sgml b/doc/src/sgml/features.sgml
index 6c22d69..095fcb9 100644
--- a/doc/src/sgml/features.sgml
+++ b/doc/src/sgml/features.sgml
@@ -16,7 +16,8 @@
   Language SQL.  A revised version of the standard is released
   from time to time; the most recent update appearing in 2011.
   The 2011 version is referred to as ISO/IEC 9075:2011, or simply as SQL:2011.
-  The versions prior to that were SQL:2008, SQL:2003, SQL:1999, and SQL-92.  Each version
+  The versions prior to that were SQL:2008, SQL:2006, SQL:2003, SQL:1999,
+  and SQL-92.  Each version
   replaces the previous one, so claims of conformance to earlier
   versions have no official merit.
   PostgreSQL development aims for
@@ -155,4 +156,329 @@

   
 
+  
+   XML Limits and Conformance to SQL/XML
+
+   
+SQL/XML
+limits and conformance
+   
+
+   
+Significant revisions to the ISO/IEC 9075-14 XML-related specifications
+(SQL/XML) were introduced with SQL:2006. The
+PostgreSQL implementation of the XML data type
+and related functions largely follows the earlier, 2003 edition, with some
+borrowing from the later editions. In particular:
+
+ 
+  
+   Where the current standard provides a family of XML data types
+   to hold document or content in
+   untyped or XML Schema-typed variants, and a type
+   XML(SEQUENCE) to hold arbitrary pieces of XML content,
+   PostgreSQL provides the single
+   xml type, which can hold document or
+   content, and no equivalent of the sequence
+   type.
+  
+ 
+
+ 
+  
+   PostgreSQL provides two functions introduced
+   in SQL:2006, but in variants that use the language XPath 1.0, rather than
+   XML Query as specified for them in the standard.
+  
+ 
+
+   
+
+   
+This section presents some of the resulting differences you may encounter.
+   
+
+   
+Queries restricted to XPath 1.0
+
+
+ The PostgreSQL-specific functions
+ xpath and xpath_exists query
+ XML documents using the XPath language, and
+ PostgreSQL also provides XPath-only variants of
+ the standard functions XMLEXISTS and
+ XMLTABLE, which officially use
+ the XQuery language. For all of these functions,
+ PostgreSQL relies on the
+ libxml2 library, which provides only XPath 1.0.
+
+
+
+ There is a strong connection between the XQuery language and XPath versions
+ 2.0 and later: any expression that is syntactically valid and executes
+ successfully in both produces the same result (with a minor exception for
+ expressions containing numeric character references or predefined entity
+ references, which XQuery replaces with the corresponding character while
+ XPath leaves them alone). But there is no such connection between XPath 1.0
+ and XQuery or the later XPath versions; it was an earlier language and
+ differs in many respects.
+
+
+
+ There are two categories of limitation to keep in mind: the restriction
+ from XQuery to XPath for the functions specified in the SQL standard, and
+ the restriction of XPath to version 1.0 for both the standard and the
+ PostgreSQL-specific functions.
+
+
+
+ Restriction of XQuery to XPath
+
+ 
+  Features of XQuery beyond those of XPath include:
+
+  
+   
+
+ XQuery expressions can construct and return new XML nodes, in addition
+ to all possible XPath values. XPath can introduce and return values of
+ the atomic types (numbers, strings, and so on) but can only return XML
+ nodes already present in documents supplied as input to the expression.
+
+   
+
+   
+
+ XQuery has control constructs for iteration, sorting, and grouping.
+
+   
+
+   
+
+ XQuery allows the declaration and use of lo

Re: Ordered Partitioned Table Scans

2019-03-27 Thread David Rowley
On Wed, 27 Mar 2019 at 21:24, Amit Langote
 wrote:
> Noticed a typo.
>
> + * multiple subpaths then we can't make guarantees about the
> + * order tuples in those subpaths, so we must leave the
>
> order of tuples?

I'll fix that. Thanks.

> Again, sorry if this was discussed, but I got curious about why
> partitions_are_ordered() thinks it can say true even for an otherwise
> ordered list-partitioned table, but containing a null-only partition,
> which is *always* scanned last.  If a query says ORDER BY partkey NULLS
> FIRST, then it's not alright to proceed with assuming partitions are
> ordered even if partitions_are_ordered() said so.

If it's *always* scanned last then it's fine for ORDER BY partkey
NULLS LAST.  If they have ORDER BY partkey NULLS FIRST then we won't
match on the pathkeys.

Or if they do ORDER BY partkey DESC NULLS FIRST, then we're also fine,
since we reverse the order of the subpaths list in that case.  ORDER
BY partkey DESC NULLS LAST is not okay, and we don't optimise that
since it won't match the pathkeys we generate in
build_partition_pathkeys().

> Related, why does build_partition_pathkeys() pass what it does for
> nulls_first parameter of make_pathkey_from_sortinfo()?
>
>  cpathkey = make_pathkey_from_sortinfo(root,
>keyCol,
>NULL,
>partscheme->partopfamily[i],
>partscheme->partopcintype[i],
>partscheme->partcollation[i],
>ScanDirectionIsBackward(scandir),
>  ==>   ScanDirectionIsBackward(scandir),
>0,
>partrel->relids,
>false);
>
> I think null values are almost always placed in the last partition, unless
> the null-accepting list partition also accepts some other non-null value.
> I'm not sure exactly how we can determine the correct value to pass here,
> but what's there in the patch now doesn't seem to be it.

The code looks okay to me. It'll generate pathkeys like ORDER BY
partkey NULLS LAST for forward scans and ORDER BY partkey DESC NULLS
FIRST for backwards scans.

Can you explain what cases you think the code gets wrong?

Here's a preview of the actual and expected behaviour:

# explain (costs off) select * from listp order by a asc nulls last;
 QUERY PLAN
------------------------------------------------------------
 Append
   ->  Index Only Scan using listp1_a_idx on listp1
   ->  Index Only Scan using listp2_a_idx on listp2
   ->  Index Only Scan using listp_null_a_idx on listp_null
(4 rows)


# explain (costs off) select * from listp order by a asc nulls first;
-- not optimised
 QUERY PLAN
-----------------------------------
 Sort
   Sort Key: listp1.a NULLS FIRST
   ->  Append
 ->  Seq Scan on listp1
 ->  Seq Scan on listp2
 ->  Seq Scan on listp_null
(6 rows)

# explain (costs off) select * from listp order by a desc nulls first;
-- subpath list is simply reversed in this case.
 QUERY PLAN
----------------------------------------------------------------------
 Append
   ->  Index Only Scan Backward using listp_null_a_idx on listp_null
   ->  Index Only Scan Backward using listp2_a_idx on listp2
   ->  Index Only Scan Backward using listp1_a_idx on listp1
(4 rows)


# explain (costs off) select * from listp order by a desc nulls last;
-- not optimised
  QUERY PLAN
--------------------------------------
 Sort
   Sort Key: listp1.a DESC NULLS LAST
   ->  Append
 ->  Seq Scan on listp1
 ->  Seq Scan on listp2
 ->  Seq Scan on listp_null
(6 rows)

We could likely improve the two cases that are not optimized by
putting the NULL partition in the correct place in the append
subpaths, but for now, we don't really have an efficient means to
identify which subpath that is.  I've not looked at your partition
planning improvements patch for a while to see if you're storing a
Bitmapset of the non-pruned partitions in RelOptInfo.  Something like
that would allow us to make this better.  Julien and I have talked
about other possible cases to optimise if we have that, e.g. if the
default partition is pruned then we can optimise a RANGE partitioned
table with a default. So there's definitely more to be done on this. I
think there's a general consensus that what we're doing in the patch
already is enough to be useful.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Planning counters in pg_stat_statements (using pgss_store)

2019-03-27 Thread legrand legrand
Julien Rouhaud wrote
> On Wed, Mar 27, 2019 at 9:36 PM legrand legrand
> <legrand_legrand@> wrote:
>>
>> >> - trailing whitespaces and comments wider than 80 characters
>> >>  not fixed
>>
>> > why?  In case it's not clear, I'm talking about the .c file, not the
>> > regression tests.
>>
>> I work on a poor msys install on windows 7, where perl is broken ;o(
>> So no pgindent available.
>> Will fix that later, or as soon as I get a pgindent diff.
>>
>> >> - "Assert(planning_time > 0 && total_time > 0);"
>> >>  moved at the beginning of pgss_store
>>
>> > Have you tried to actually compile postgres and pg_stat_statements
>> > with --enable-cassert?  This test can *never* be true, since you
>> > either provide the planning time or the execution time or neither.  As
>> > I said in my previous mail, adding a parameter to say which counter
>> > you're updating, instead of adding another counter that's mutually
>> > exclusive with the other would make everything clearer.
>>
>> Yes this "assert" is useless as is ... I'll remove it.
>> I understand your proposal of pgss_store refactoring, but I don't have
>> much time available now ... and I would like to check that performance
>> is not broken before any other modification ...
> 
> Ok, but keep in mind that this is the last commitfest for pg12, and
> there are only 4 days left.  Will you have time to take care of it, or
> do you need help on it?

Oops, sorry, I won't have the time nor the knowledge to finish in time ;o(
Any help is welcome!

Regards
PAscal









Re: Ordered Partitioned Table Scans

2019-03-27 Thread David Rowley
On Wed, 27 Mar 2019 at 19:48, Amit Langote
 wrote:
> Sorry if this was discussed before, but why does this patch add any new
> code to partprune.c?  AFAICT, there's no functionality changes to the
> pruning code.

You're right. It probably shouldn't be there.  There's a bit of a lack
of a good home for planner-related partition code, it seems.

> seem like their logic is specialized enough to be confined to pathkeys.c,
> only because it's needed there.

Yeah maybe.

> Regarding
>
> +bool
> +partitions_are_ordered(PlannerInfo *root, RelOptInfo *partrel)
>
> I think this could simply be:
>
> bool
> partitions_are_ordered(PartitionBoundInfo *boundinfo)
>
> and be defined in partitioning/partbounds.c.  If you think any future
> modifications to this will require access to the partition key info in
> PartitionScheme, maybe the following is fine:
>
> bool
> partitions_are_ordered(RelOptInfo *partrel)

It does need to know how many partitions the partitioned table has,
which it gets from partrel->nparts, so yeah, RelOptInfo is probably
needed. I don't think passing in int nparts is a good solution to
that.  The problem with moving it to partbounds.c is that nothing
there knows about RelOptInfo currently.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




RE: minimizing pg_stat_statements performance overhead

2019-03-27 Thread legrand legrand
my test case:

drop table a;
create table a ();

DO
$$
DECLARE
i int;
BEGIN
for i in 1..20
loop
execute 'alter table a add column a'||i::text||' int';
end loop;
END
$$;

select  pg_stat_statements_reset();
set pg_stat_statements.track='none';

DO
$$
DECLARE
i int;
j int;
BEGIN
for j in 1..20
loop
for i in 1..20
loop
execute 'select a'||i::text||',a'||j::text||' from a where 1=2';
end loop;
end loop;
END
$$;








RE: minimizing pg_stat_statements performance overhead

2019-03-27 Thread Raymond Martin
Hi Fabien, 
Thank you for your time. Apologies for not being more specific about my testing 
methodology.

> > PGSS not loaded: 0.18ms
>
> This means 0.0018 ms latency per transaction, which seems rather fast, on my 
> laptop I have typically 0.0XX ms...

This actually means 0.18 milliseconds. I agree that this is a bit high, so I 
instead created an Ubuntu VM to get results that would align with yours. 

> I could not reproduce these results on my ubuntu laptop. Could you be more 
> precise about the test? Did you use pgbench? Did it run in parallel? What 
> options were used? What is the test script?

I did not use pgbench. It is important to call pg_stat_statements_reset before 
every query. This simulates a user that is performing distinct and non-repeated 
queries on their database. If you use prepared statements or the same set of 
queries each time, you would remove the contention on the pgss query text file. 
I re-tested this on an Ubuntu machine with 4 cores and 14 GB RAM. I did not run 
it in parallel. I used a Python script that implements the following logic 
(a rough C/libpq equivalent is sketched below): 
- select pg_stat_statements_reset() -- this is important because we are 
making pgss treat the 'select 1' like a new query which it has not cached into 
pgss_hash. 
- time 'select 1'
Repeat 100 times for each configuration. 
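
For reference, here is a rough C/libpq equivalent of that loop. This is a
hypothetical sketch only; the actual test was a Python script, and the
connection string, probe query and iteration count here are assumptions:

#include <stdio.h>
#include <sys/time.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");
	double		total_ms = 0.0;
	int			iterations = 100;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	for (int i = 0; i < iterations; i++)
	{
		struct timeval start, end;

		/* Reset so that 'SELECT 1' is treated as a brand-new query. */
		PQclear(PQexec(conn, "SELECT pg_stat_statements_reset()"));

		gettimeofday(&start, NULL);
		PQclear(PQexec(conn, "SELECT 1"));
		gettimeofday(&end, NULL);

		total_ms += (end.tv_sec - start.tv_sec) * 1000.0 +
			(end.tv_usec - start.tv_usec) / 1000.0;
	}

	printf("mean latency: %.3f ms\n", total_ms / iterations);
	PQfinish(conn);
	return 0;
}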

Here are my Ubuntu results:
  pgss unloaded
  Mean: 0.076
  Standard Deviation: 0.038

  pgss.track=none
  Mean: 0.099
  Standard Deviation: 0.040
 
  pgss.track=top 
  Mean: 0.098
  Standard Deviation: 0.107

  pgss.track=none + patch
  Mean: 0.078
  Standard Deviation: 0.042

The results are less noticeable, but I still see about a 20% performance 
improvement here.

> There I have an impact of 10% in these ideal testing conditions wrt latency 
> where the DB does basically nothing, thus which would not warrant to disable 
> pg_stat_statements given the great service this extension brings to 
> performance analysis.

I agree that pg_stat_statements should not be disabled based on these 
performance results. 

> Note that this does not mean that the patch should not be applied, it looks 
> like an oversight, but really I do not have the performance degradation you 
> are suggesting.

I appreciate your input and I want to come up with a canonical test that makes 
this contention more obvious. 
Unfortunately, it is difficult because the criteria that cause this slowdown 
(large query sizes and distinct, non-repeated queries) are difficult to 
reproduce with pgbench. I would be open to any suggestions here. 

So even though the performance gains in this specific scenario are not as 
great, do you still think it would make sense to submit a patch like this? 

--
Raymond Martin
rama...@microsoft.com
Azure Database for PostgreSQL




Re: Berserk Autovacuum (let's save next Mandrill)

2019-03-27 Thread Alvaro Herrera
On 2019-Mar-28, Darafei "Komяpa" Praliaskouski wrote:


> Thu, 28 Mar 2019 at 00:32, Alvaro Herrera :
> 
> > On 2019-Mar-27, Darafei "Komяpa" Praliaskouski wrote:

> > * certain tables would have some sort of partial scan that sets the
> >   visibility map.  There's no reason to invoke the whole vacuuming
> >   machinery.  I don't think this is limited to append-only tables, but
> >   rather those are just the ones that are affected the most.
> 
> What other machinery runs on VACUUM invocation that is not wanted there?
> Since Postgres 11 index cleanup is already skipped on append-only tables.

Well, I think it would be useful to set all-visible earlier than waiting
for a vacuum to be necessary, even for tables that are not append-only.
So if you think about this just for the append-only table, you leave
money on the table.

> > * tables nearing wraparound danger should use the (yet to be committed)
> >   option to skip index cleaning, which makes the cleanup action faster.
> >   Again, no need for complete vacuuming.
> 
> "Nearing wraparound" is too late already. In Amazon, reading table from gp2
> after you exhausted your IOPS burst budget is like reading a floppy drive,
> you have to freeze a lot earlier than you hit several terabytes of unfrozen
> data, or you're dead like Mandrill's Search and Url tables from the link I
> shared.

OK, then start freezing tuples in the cheap mode (skip index updates)
earlier than that.  I suppose a good question is when to start.


I wonder if Mandrill's problem is related to Mailchimp raising the
freeze_max_age to a point where autovac did not have enough time to
react with an emergency vacuum.  If you keep raising that value because
the vacuums cause problems for you (they block DDL), there's something
wrong.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Berserk Autovacuum (let's save next Mandrill)

2019-03-27 Thread Komяpa
Hi,

Thu, 28 Mar 2019 at 00:32, Alvaro Herrera :

> On 2019-Mar-27, Darafei "Komяpa" Praliaskouski wrote:
>
> > Attached is a sketch of a small patch that fixes several edge cases with
> > autovacuum. Long story short, autovacuum never comes to append-only tables,
> > killing large production systems.
>
> Yeah, autovac is not coping with these scenarios (and probably others).
> However, rather than taking your patch's idea verbatim, I think we
> should have autovacuum use separate actions for those two (wildly
> different) scenarios.  For example:
>
> * certain tables would have some sort of partial scan that sets the
>   visibility map.  There's no reason to invoke the whole vacuuming
>   machinery.  I don't think this is limited to append-only tables, but
>   rather those are just the ones that are affected the most.
>

What other machinery runs on VACUUM invocation that is not wanted there?
Since Postgres 11 index cleanup is already skipped on append-only tables.


> * tables nearing wraparound danger should use the (yet to be committed)
>   option to skip index cleaning, which makes the cleanup action faster.
>   Again, no need for complete vacuuming.
>

"Nearing wraparound" is too late already. In Amazon, reading table from gp2
after you exhausted your IOPS burst budget is like reading a floppy drive,
you have to freeze a lot earlier than you hit several terabytes of unfrozen
data, or you're dead like Mandrill's Search and Url tables from the link I
shared.


>
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa


Re: Should the docs have a warning about pg_stat_reset()?

2019-03-27 Thread Alvaro Herrera
On 2019-Mar-27, Peter Eisentraut wrote:

> On 2019-03-26 16:28, Euler Taveira wrote:
> > I don't remember why we didn't consider table without stats to be
> > ANALYZEd. Isn't it the case to fix autovacuum? Analyze
> > autovacuum_count + vacuum_count = 0?
> 
> When the autovacuum system was introduced, we didn't have those columns.
>  But now it seems to make sense that a table with autoanalyze_count +
> analyze_count = 0 should be a candidate for autovacuum even if the write
> statistics are zero.  Obviously, this would have the effect that a
> pg_stat_reset() causes an immediate autovacuum for all tables, so maybe
> it's not quite that simple.

I'd say it would make them a candidate for auto-analyze; upon completion
of that, there's sufficient data to determine whether auto-vacuum is
needed or not.  This sounds like a sensible idea to me.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Berserk Autovacuum (let's save next Mandrill)

2019-03-27 Thread Alvaro Herrera
On 2019-Mar-27, Darafei "Komяpa" Praliaskouski wrote:

> Attached is a sketch of a small patch that fixes several edge cases with
> autovacuum. Long story short, autovacuum never comes to append-only tables,
> killing large production systems.

Yeah, autovac is not coping with these scenarios (and probably others).
However, rather than taking your patch's idea verbatim, I think we
should have autovacuum use separate actions for those two (wildly
different) scenarios.  For example:

* certain tables would have some sort of partial scan that sets the
  visibility map.  There's no reason to invoke the whole vacuuming
  machinery.  I don't think this is limited to append-only tables, but
  rather those are just the ones that are affected the most.

* tables nearing wraparound danger should use the (yet to be committed)
  option to skip index cleaning, which makes the cleanup action faster.
  Again, no need for complete vacuuming.


-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Should the docs have a warning about pg_stat_reset()?

2019-03-27 Thread Peter Eisentraut
On 2019-03-26 16:28, Euler Taveira wrote:
> I don't remember why we didn't consider table without stats to be
> ANALYZEd. Isn't it the case to fix autovacuum? Analyze
> autovacuum_count + vacuum_count = 0?

When the autovacuum system was introduced, we didn't have those columns.
 But now it seems to make sense that a table with autoanalyze_count +
analyze_count = 0 should be a candidate for autovacuum even if the write
statistics are zero.  Obviously, this would have the effect that a
pg_stat_reset() causes an immediate autovacuum for all tables, so maybe
it's not quite that simple.
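
For concreteness, a hypothetical and untested sketch of where such a check
could go in autovacuum.c's relation_needs_vacanalyze(); the field names
follow PgStat_StatTabEntry, and the surrounding variables are the existing
ones in that function:

	/*
	 * Sketch only: also treat a table that has never been analyzed,
	 * either manually or by autovacuum, as an analyze candidate,
	 * regardless of the write counters.
	 */
	bool		never_analyzed = (tabentry->analyze_count == 0 &&
								  tabentry->autovac_analyze_count == 0);

	*doanalyze = (anltuples > anlthresh) || never_analyzed;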

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

2019-03-27 Thread Robert Haas
On Tue, Mar 26, 2019 at 10:35 PM Tsunakawa, Takayuki
 wrote:
> I almost have the same view as Sawada-san.  The reloption 
> vacuum_shrink_enabled is a positive name and follows the naming style of 
> other reloptions.  I hope this matches the style you have in mind.

You're both right and I'm wrong.

However, I think it would be better to stick with the term 'truncate'
which is widely-used already, rather than introducing a new term.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Planning counters in pg_stat_statements (using pgss_store)

2019-03-27 Thread Julien Rouhaud
On Wed, Mar 27, 2019 at 9:36 PM legrand legrand
 wrote:
>
> >> - trailing whitespaces and comments wider than 80 characters
> >>  not fixed
>
> > why?  In case it's not clear, I'm talking about the .c file, not the
> > regression tests.
>
> I work on a poor msys install on windows 7, where perl is broken ;o(
> So no pgindent available.
> Will fix that later, or as soon as I get a pgindent diff.
>
> >> - "Assert(planning_time > 0 && total_time > 0);"
> >>  moved at the beginning of pgss_store
>
> > Have you tried to actually compile postgres and pg_stat_statements
> > with --enable-cassert?  This test can *never* be true, since you
> > either provide the planning time or the execution time or neither.  As
> > I said in my previous mail, adding a parameter to say which counter
> > you're updating, instead of adding another counter that's mutually
> > exclusive with the other would make everything clearer.
>
> Yes this "assert" is useless as is ... I'll remove it.
> I understand your proposal of pgss_store refactoring, but I don't have
> much time available now ... and I would like to check that performance
> is not broken before any other modification ...

Ok, but keep in mind that this is the last commitfest for pg12, and
there are only 4 days left.  Will you have time to take care of it, or
do you need help on it?




Re: PostgreSQL pollutes the file system

2019-03-27 Thread Peter Eisentraut
On 2019-03-27 18:09, Tom Lane wrote:
> My recollection of the discussion is that people argued that "postmaster"
> might be taken to have something to do with an e-mail server, and
> therefore we needed to stop using that name.  The lack of either follow-on
> complaints or follow-on action doesn't make me too well disposed to
> what is essentially that same argument over again.

The reason there was that the distinction was mostly useless and the
different command-line option parsing was confusing.  The name itself
was confusing but not in conflict with anything.

However, we do know that we are very bad at actually getting rid of
deprecated things.

How about we compromise in this thread and remove postmaster and leave
everything else as is. ;-)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PostgreSQL pollutes the file system

2019-03-27 Thread Gavin Flower

On 28/03/2019 03:41, Tom Lane wrote:
> Andreas Karlsson  writes:
>> On 3/27/19 3:26 PM, Tomas Vondra wrote:
>>> That is true, of course. But are there actual examples of such conflicts
>>> in practice? I mean, are there tools/packages that provide commands with
>>> a conflicting name? I'm not aware of any, and as was pointed before, we'd
>>> have ~20 years of history on any new ones.
>
>> That is a fair argument. Since we squatted those names back in the
>> mid-90s I think the risk of collision is low.
>
> Right.  I think there is a fair argument to be made for user confusion
> (not actual conflict) with respect to createuser and dropuser.  The
> argument for renaming any of the other tools is much weaker, IMO.
>
> 			regards, tom lane


I think the consistency of having all PostgreSQL commands start with 
'pg_' would make them both easier to find and to learn.


Although I think we should keep the psql command name, in addition to 
the pg_sql variant - the latter needed for consistency.



Cheers,
Gavin





Re: PostgreSQL pollutes the file system

2019-03-27 Thread Gavin Flower

On 28/03/2019 03:07, Andreas Karlsson wrote:
> On 3/27/19 2:51 PM, Tomas Vondra wrote:
>> I think the consensus in this thread (and the previous ancient ones) is
>> that it's not worth it. It's one thing to introduce new commands with the
>> pg_ prefix, and it's a completely different thing to rename existing ones.
>> That has inherent costs, and as Tom pointed out the burden would fall on
>> people using PostgreSQL (and that's rather undesirable).
>>
>> I personally don't see why having commands without pg_ prefix would be
>> an issue. Especially when placed in a separate directory, which eliminates
>> the possibility of conflict with other commands.
>
> I buy that it may not be worth breaking tens of thousands of scripts
> to fix this, but I disagree about it not being an issue. Most Linux
> distributions add PostgreSQL's executables in to a directory which is
> in the default $PATH (/usr/bin in the case of Debian). And even if it
> would be installed into a separate directory there would still be a
> conflict as soon as that directory is added to $PATH.
>
> And I think that it is also relatively easy to confuse adduser and
> createuser when reading a script. Nothing about the name createuser
> indicates that it will create a role in an SQL database.
>
> Andreas


Existing users would feel some pain, but continued use of commands like 
'createuser' rather than pg_createuser (better still pg_createrole, as 
suggested elsewhere) creates confusion and displays unintended arrogance.


There is a suggestion to use aliases, and I think that is a good interim 
step to introduce the 'pg_' variants. Possibly with an option at 
install time to force only the 'pg_' variants (with the possible exception 
of psql).


The only command that I think warrants a permanent alias is psql, which 
I think is not ambiguous, but having a pg_sql for consistency would be good.



Cheers,
Gavin





Berserk Autovacuum (let's save next Mandrill)

2019-03-27 Thread Komяpa
Hi hackers,

Attached is a sketch of a small patch that fixes several edge cases with
autovacuum. Long story short, autovacuum never comes to append-only tables,
killing large production systems.

First case, mine.

https://www.postgresql.org/message-id/CAC8Q8tLBeAxR%2BBXWuKK%2BHP5m8tEVYn270CVrDvKXt%3D0PkJTY9g%40mail.gmail.com

We had a table we were appending and wanted Index Only Scan to work. For it
to work, you need to call VACUUM manually, since VACUUM is the only way to
mark pages all visible, and autovacuum never comes to append only tables.
We were clever to invent a workflow without dead tuples and it painfully
bit us.

Second case, just read in the news.
https://mailchimp.com/what-we-learned-from-the-recent-mandrill-outage/

Mandrill has a 6TB append-only table that autovacuum probably never vacuumed.
Then the anti-wraparound vacuum came and production went down. If autovacuum had
done its job before that last moment, it would probably have been okay.

Idea: look not at dead tuples, but at changes, just like ANALYZE does.
It's my first patch for Postgres; it's probably all wrong, but I hope it
helps you get the idea.
-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index fa875db816..e297fc8c4b 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -3055,7 +3055,7 @@ relation_needs_vacanalyze(Oid relid,
 	if (PointerIsValid(tabentry) && AutoVacuumingActive())
 	{
 		reltuples = classForm->reltuples;
-		vactuples = tabentry->n_dead_tuples;
+		vactuples = tabentry->changes_since_vacuum;
 		anltuples = tabentry->changes_since_analyze;
 
 		vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 2a8472b91a..cd4611d64f 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4687,6 +4687,7 @@ pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create)
 		result->n_live_tuples = 0;
 		result->n_dead_tuples = 0;
 		result->changes_since_analyze = 0;
+		result->changes_since_vacuum = 0;
 		result->blocks_fetched = 0;
 		result->blocks_hit = 0;
 		result->vacuum_timestamp = 0;
@@ -5817,6 +5818,7 @@ pgstat_recv_tabstat(PgStat_MsgTabstat *msg, int len)
 			tabentry->n_live_tuples = tabmsg->t_counts.t_delta_live_tuples;
 			tabentry->n_dead_tuples = tabmsg->t_counts.t_delta_dead_tuples;
 			tabentry->changes_since_analyze = tabmsg->t_counts.t_changed_tuples;
+			tabentry->changes_since_vacuum = tabmsg->t_counts.t_changed_tuples;
 			tabentry->blocks_fetched = tabmsg->t_counts.t_blocks_fetched;
 			tabentry->blocks_hit = tabmsg->t_counts.t_blocks_hit;
 
@@ -5850,6 +5852,7 @@ pgstat_recv_tabstat(PgStat_MsgTabstat *msg, int len)
 			tabentry->n_live_tuples += tabmsg->t_counts.t_delta_live_tuples;
 			tabentry->n_dead_tuples += tabmsg->t_counts.t_delta_dead_tuples;
 			tabentry->changes_since_analyze += tabmsg->t_counts.t_changed_tuples;
+			tabentry->changes_since_vacuum += tabmsg->t_counts.t_changed_tuples;
 			tabentry->blocks_fetched += tabmsg->t_counts.t_blocks_fetched;
 			tabentry->blocks_hit += tabmsg->t_counts.t_blocks_hit;
 		}
@@ -6083,6 +6086,7 @@ pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len)
 
 	tabentry->n_live_tuples = msg->m_live_tuples;
 	tabentry->n_dead_tuples = msg->m_dead_tuples;
+	tabentry->changes_since_vacuum = 0;
 
 	if (msg->m_autovacuum)
 	{
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c080fa6388..65e38edd8d 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -643,6 +643,7 @@ typedef struct PgStat_StatTabEntry
 	PgStat_Counter n_live_tuples;
 	PgStat_Counter n_dead_tuples;
 	PgStat_Counter changes_since_analyze;
+	PgStat_Counter changes_since_vacuum;
 
 	PgStat_Counter blocks_fetched;
 	PgStat_Counter blocks_hit;


Re: Enable data checksums by default

2019-03-27 Thread Ants Aasma
On Wed, Mar 27, 2019, 15:57 Christoph Berg  wrote:

> Re: To Tom Lane 2019-03-26 <20190326151446.gg3...@msg.df7cb.de>
> > I run a benchmark with checksums disabled/enabled. shared_buffers is
> > 512kB to make sure almost any read will fetch the page from the OS
> > cache; scale factor is 50 (~750MB) to make sure the whole cluster fits
> > into RAM.
> [...]
> > So the cost is 5% in this very contrived case. In almost any other
> > setting, the cost would be lower, I'd think.
>
> (That was on 12devel, btw.)
>
> That was about the most extreme OLTP read-only workload. After
> thinking about it some more, I realized that exercising large seqscans
> might be an even better way to test it because of less per-query
> overhead.
>
> Same setup again, shared_buffers = 16 (128kB), jit = off,
> max_parallel_workers_per_gather = 0:
>
> select count(bid) from pgbench_accounts;
>
> no checksums: ~456ms
> with checksums: ~489ms
>
> 456.0/489 = 0.9325
>
> The cost of checksums is about 6.75% here.
>

Can you try with postgres compiled with CFLAGS="-O2 -march=native"? There's
a bit of low hanging fruit there to use a runtime CPU check to pick a
better optimized checksum function.

Regards,
Ants Aasma

>


Re: Planning counters in pg_stat_statements (using pgss_store)

2019-03-27 Thread legrand legrand
>> - trailing whitespaces and comments wider than 80 characters
>>  not fixed

> why?  In case it's not clear, I'm talking about the .c file, not the
> regression tests.

I work on a poor msys install on windows 7, where perl is broken ;o(
So no pgindent available. 
Will fix that later, or as soon as I get a pgindent diff.

>> - "Assert(planning_time > 0 && total_time > 0);"
>>  moved at the beginning of pgss_store

> Have you tried to actually compile postgres and pg_stat_statements
> with --enable-cassert?  This test can *never* be true, since you
> either provide the planning time or the execution time or neither.  As
> I said in my previous mail, adding a parameter to say which counter
> you're updating, instead of adding another counter that's mutually
> exclusive with the other would make everything clearer.

Yes this "assert" is useless as is ... I'll remove it.
I understand your proposal of pgss_store refactoring, but I don't have 
much time available now ... and I would like to check that performance 
is not broken before any other modification ...

Regards
PAscal












Re: Progress reporting for pg_verify_checksums

2019-03-27 Thread Fabien COELHO





>> I still think that the speed should compute a double to avoid integer
>> rounding errors within the computation. ISTM that rounding should only be
>> done for display in the end.

> Ok, also done this way. I decided to print only full MB and not any
> further digits as those don't seem very relevant.


For the computation to be in double, you must convert to double somewhere 
in the formula, otherwise it is computed as ints and only cast as a double 
afterwards. Maybe:


  current_speed = (1000.0 / MEGABYTES) * current_size / elapsed;

Or anything which converts to double early.

Instead of casting percent to int, maybe use "%.0f" as well, just like 
current_speed?


 +  if ((blockno % 100 == 0) && show_progress)

I'd invert the condition to avoid a modulo when not needed.

I'm not very fond of global variables, but it does not matter here, and 
doing it differently would involve more code. It would matter, though, if 
the progress report were shared between commands.



>> I'm okay with calling the report on each file even if this means every few
>> GB...

> For my I/O throttling branch I've backed off to only call the report
> every 100 blocks (and at the end of the file). I've added that logic to
> this patch as well.


The 100 blocks = 800 KiB looks arbitrary. What is the rationale? ISTM that 
postgres will tend to store a power-of-two number of pages in full 
segments, so maybe there could be a rationale for 1024. Not sure.



>> Someone suggested ETA, and it seems rather simple to do. What about
>> adding it?

> I don't know, just computing the ETA is easy of course. But you'd have
> to fiddle with seconds/minutes/hours conversions because the runtime could
> be hours. Then, you don't want to have it bumping around weirdly when
> your I/O rate changes, cf. https://xkcd.com/612/ That doesn't sound
> simpler than the signal trigger we might get rid of.


Ok, it is more complicated than it looks for large sizes if seconds are not 
the right display unit.


--
Fabien.




RE: minimizing pg_stat_statements performance overhead

2019-03-27 Thread legrand legrand
Your fix is probably the best one.
Maybe this could be considered a bug and backported to previous versions
...

Regards
PAscal









Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-27 Thread Petr Jelinek
On 27/03/2019 20:55, Tomas Vondra wrote:
> Hi,
> 
> I've now committed the MCV part, after addressing the last two issues
> raised by Dean:
> 

Congrats!

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-27 Thread Tomas Vondra

Hi,

I've now committed the MCV part, after addressing the last two issues
raised by Dean:

* The MCV build now always uses the mincount to decide which of the
items to keep in the list.

* Both the MCV build and deserialization now use the same maximum number
of list items (10k).

Unfortunately, I forgot to merge these two fixes before pushing, so I had
to commit them separately. Sorry about that :/

Attached are the remaining parts of this patch series - the multivariate
histograms, and also a new patch tweaking regression tests for the old
statistic types (ndistinct, dependencies) to adopt the function-based
approach instead of the regular EXPLAIN.

But those are clearly a matter for the future (well, maybe it'd make sense
to commit the regression test change now).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



0001-adopt-regression-tests.patch.gz
Description: application/gzip


0002-multivariate-histograms.patch.gz
Description: application/gzip


Re: Re: A separate table level option to control compression

2019-03-27 Thread Shaun Thomas
> Setting compress_tuple_target to a higher value won't be negative because the
> toast_tuple_target is used for compression anyways when compress_tuple_target
> is higher than toast_tuple_target.

Ack, you're right. Got my wires crossed. I must have gotten confused by my later
tests that enforced main storage and pushed out the toast tuple target
to prevent
out-of-line storage.

> I added a small discussion about negative effects of setting  
> compress_tuple_target
> lower though, per your suggestion.

Fair enough. I like the addition since it provides a very concrete
reason not to abuse
the parameter. :shipit:

Vaya con Queso

-- 
Shaun M Thomas - 2ndQuadrant
PostgreSQL Training, Services and Support
shaun.tho...@2ndquadrant.com | www.2ndQuadrant.com




RE: minimizing pg_stat_statements performance overhead

2019-03-27 Thread Raymond Martin
Hi Pascal,
Thanks for your feedback! I like your ideas. 

>the part that hurts in terms of performance is:
>
>   if (jstate.clocations_count > 0)
>   pgss_store(pstate->p_sourcetext,

I agree that this is typically the most expensive part, but query 
normalization and hashing can also start becoming expensive with larger 
queries.  

>that writes the query text to disk, when it has at less one parameter ...
>Computing the QueryId should stay very small and can be very usefull when used 
>in conjonction with
>https://www.postgresql-archive.org/Feature-improvement-can-we-add-queryId-for-pg-catalog-pg-stat-activity-view-td6077275.html#a6077602
>for wait events sampling.

I also agree that the query id can be very useful! In cases where query id is 
required, pg_stat_statements can be enabled. 
My intent of turning tracking off is to minimize the performance impact of pgss 
as much as possible and the thread above states: "PGSS jumble query logic is 
not bullet proof and may take time then impact the perf".

I believe your fix is a great step forward, but checking enabled at the very 
beginning would lead to better performance. This also follows the paradigm set 
by the rest of the pg_stat_statements codebase.
In the future, if we want only the query ID to be calculated maybe we can add 
another option for that?
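
To illustrate what I mean, here is a minimal, hypothetical sketch of such an
early bail-out in pgss_post_parse_analyze(). It assumes the existing
pgss_enabled() convention used elsewhere in the module and is not the
submitted patch:

static void
pgss_post_parse_analyze(ParseState *pstate, Query *query)
{
	if (prev_post_parse_analyze_hook)
		prev_post_parse_analyze_hook(pstate, query);

	/* Bail out before any jumbling or normalization work. */
	if (!pgss_enabled())
	{
		query->queryId = UINT64CONST(0);
		return;
	}

	/* ... existing jumbling / normalization logic follows ... */
}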

--
Raymond Martin
rama...@microsoft.com
Azure Database for PostgreSQL






Re: PostgreSQL pollutes the file system

2019-03-27 Thread Alvaro Herrera
On 2019-Mar-27, Tom Lane wrote:

> Alvaro Herrera  writes:
> > I suppose that if you're a Postgres developer, you naturally expect that
> > "createdb" creates a Postgres DB.  What if you use multiple database
> > systems, and then only occasionally have to do DBA tasks?  I find this
> > POV that createdb doesn't need renaming a bit self-centered.
> 
> Nobody is defending the existing names as being something we'd pick
> if we were picking them today.  The question is whether changing them
> is worth the pain.  (And, one more time, may I point out that most
> of the pain will be borne by people not on this mailing list, hence
> unable to vote here.)  I don't think there is any reasonable argument
> that said pain will be justified for any of them except maybe createuser
> and dropuser.

The implicit argument here is that existing users are a larger
population than future users.  I, for one, don't believe that.  I think
taking no action is a disservice to future users.  Also, that modifying
the code will be utterly painful and that less administrative code will be
written in the future than has already been written.

We *could* run a poll on twitter/slack/website to get a feeling on a
wider population.  That would still reach mostly existing Postgres
users, but at least it would be much more diverse than this group.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2019-03-27 Thread Alexey Kondratov

On 26.03.2019 11:19, Michael Paquier wrote:

> + * This is a simplified and adapted to frontend version of
> + * RestoreArchivedFile function from transam/xlogarchive.c
> + */
> +static int
> +RestoreArchivedWAL(const char *path, const char *xlogfname,
>
> I don't think that we should have duplicates for that, so I would
> recommend refactoring the code so as a unique code path is taken by
> both, especially since the user can fetch the command from
> postgresql.conf.


This comment has been here since the beginning of my work on this patch, 
and by now it is rather misleading.


Even if we do not take into account obvious differences like error 
reporting, different log levels based on many conditions, cleanup 
options, and the check for standby mode, restore_command execution during 
backend recovery and during pg_rewind has a very important difference. If it 
fails in the backend, then, as stated in the comment 'Remember, we 
rollforward UNTIL the restore fails so failure here is just part of the 
process', it is OK. By contrast, if pg_rewind fails to recover 
some required WAL segment, that definitely means the end of the 
entire process, since we will then fail to find the last common checkpoint or 
to extract the page map.


The only part we can share is constructing the restore_command with the 
alias replacement. However, even in this place the logic is slightly 
different, since we do not need the %r alias for pg_rewind. The only use 
case of %r in restore_command that I know of is pg_standby, which does not 
seem to be a use case for pg_rewind. I have tried to move this part into 
common code, but it becomes full of conditions and less concise.


Please correct me if I am wrong, but it seems that there are enough 
differences to keep this function separate, doesn't it?



> Why two options?  Wouldn't actually be enough use-postgresql-conf to
> do the job?  Note that "postgres" should always be installed if
> pg_rewind is present because it is a backend-side utility, so while I
> don't like adding a dependency to other binaries in one binary, having
> an option to pass out a command directly via the command line of
> pg_rewind stresses me more.


I am not familiar enough with the DBA scenarios where the -R option may be 
useful, but I have been asked for it a few times. I can only speculate that, 
for example, someone may want to run the freshly rewound cluster as a master, 
not a replica, so its config may differ from the replica's, which is where 
restore_command is normally intended to live. Thus, it is easier to leave the 
master's config in place and just specify restore_command as a command-line 
argument.



> Don't we need to worry about signals interrupting the restore command?
> It seems to me that some refactoring from the stuff in xlogarchive.c
> would be in order.


Thank you for pointing me to this place again. Previously, I thought 
that we should not care about it, since if restore_command was 
unsuccessful for any reason, then the rewind failed, so we will stop and 
exit at the upper levels. However, if it failed because of a signal, some of 
the subsequent messages may be misleading, e.g. if the user manually 
interrupted it for some reason. So I have added a similar check here as well.
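
For reference, the kind of check I mean looks roughly like this. It is a
simplified, hypothetical sketch using the existing wait_error helpers from
src/common and pg_rewind's logging; the attached patch may differ in detail:

	int			rc;

	rc = system(restore_command);

	if (rc != 0)
	{
		/* Treat termination by a signal as a hard error. */
		if (wait_result_is_any_signal(rc, true))
			pg_fatal("restore_command was terminated by a signal\n");

		pg_log(PG_DEBUG, "could not restore file \"%s\": %s\n",
			   xlogfname, wait_result_to_str(rc));
		return -1;
	}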


An updated version of the patch is attached.


--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From 9e00f7a7696a88f350e1e328a9758ab85631c813 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Tue, 19 Feb 2019 19:14:53 +0300
Subject: [PATCH v6] pg_rewind: options to use restore_command from command
 line or cluster config

Previously, when pg_rewind could not find required WAL files in the
target data directory, the rewind process would fail. One had to
figure out manually which of the required WAL files had already been moved
to archival storage and copy them back.

This patch adds the possibility to specify restore_command via a command
line option, or to use the one specified inside postgresql.conf. The
specified restore_command will be used for automatic retrieval of missing
WAL files from archival storage.
---
 doc/src/sgml/ref/pg_rewind.sgml   |  30 -
 src/bin/pg_rewind/parsexlog.c | 167 +-
 src/bin/pg_rewind/pg_rewind.c |  96 ++-
 src/bin/pg_rewind/pg_rewind.h |   7 +-
 src/bin/pg_rewind/t/001_basic.pl  |   4 +-
 src/bin/pg_rewind/t/002_databases.pl  |   4 +-
 src/bin/pg_rewind/t/003_extrafiles.pl |   4 +-
 src/bin/pg_rewind/t/RewindTest.pm |  84 -
 8 files changed, 376 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 53a64ee29e..90e3f22f97 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -67,8 +67,10 @@ PostgreSQL documentation
ancestor. In the typical failover scenario where the target cluster was
shut down soon after the divergence, this is not a problem, but if the
target cluster ran for a long time after the divergence, the old WAL
-   files might no longer be present. In that case, they can be m

Re: PostgreSQL pollutes the file system

2019-03-27 Thread Fred .Flintstone
Symlinks would be great, because then the symlinks could be packaged
as an optional package, such as:
- postgresql-11
- postgresql-client-11
- postgresql-client-symlinks-11
- postgresql-client-common
- postgresql-common

Then one might chose to not install the symlinks package or uninstall it.

And it would ease discoverability, predictability, intuitiveness, and
ease-of-use so much by just being able to type pg_ to discover
all the PostgreSQL-related commands.

On Wed, Mar 27, 2019 at 6:26 PM Petr Jelinek
 wrote:
>
> On 27/03/2019 15:26, Tomas Vondra wrote:
> > On Wed, Mar 27, 2019 at 03:07:24PM +0100, Andreas Karlsson wrote:
> >> On 3/27/19 2:51 PM, Tomas Vondra wrote:
> >>> I think the consensus in this thread (and the previous ancient ones) is
> >>> that it's not worth it. It's one thing to introduce new commands with
> >>> the
> >>> pg_ prefix, and it's a completely different thing to rename existing
> >>> ones.
> >>> That has inherent costs, and as Tom pointed out the burden would fall on
> >>> people using PostgreSQL (and that's rather undesirable).
> >>>
> >>> I personally don't see why having commands without pg_ prefix would be
> >>> an issue. Especially when placed in a separate directory, which
> >>> eliminates
> >>> the possibility of conflict with other commands.
> >>
> >> I buy that it may not be worth breaking tens of thousands of scripts
> >> to fix this, but I disagree about it not being an issue. Most Linux
> >> distributions add PostgreSQL's executables in to a directory which is
> >> in the default $PATH (/usr/bin in the case of Debian). And even if it
> >> would be installed into a separate directory there would still be a
> >> conflict as soon as that directory is added to $PATH.
> >>
> >
> > That is true, of course.
>
> It's only partially true, for example on my systems:
>
> Debian/Ubuntu:
> $ readlink -f /usr/bin/createuser
> /usr/share/postgresql-common/pg_wrapper
>
> Centos (PGDG package):
> readlink -f /usr/bin/createdb
> /usr/pgsql-11/bin/createdb
>
> This also means that the idea about symlinks is something packages
> already do.
>
> --
>   Petr Jelinek  http://www.2ndQuadrant.com/
>   PostgreSQL Development, 24x7 Support, Training & Services




Re: amcheck verification for GiST

2019-03-27 Thread Heikki Linnakangas

On 27/03/2019 11:51, Andrey Borodin wrote:
> Hi!
>
> Here's new version of GiST amcheck, which takes into account recently committed
> GiST VACUUM.
> It tests that deleted pages do not contain any data.


Thanks! I had a look, and refactored it quite a bit.

I found the way the recursion worked confusing. On each internal page, 
it looped through all the child nodes, checking the consistency of the 
downlinks. And then it looped through the children again, to recurse. 
This isn't performance-critical, but visiting every page twice still 
seems strange.


In gist_check_page_keys(), if we get into the code to deal with a 
concurrent update, we set 'parent' to point to a tuple on a parent page, 
then unlock it, and continue to look at remaining tuples, using the 
pointer that points to an unlocked buffer.


I came up with the attached, which fixes the above-mentioned things. I 
also replaced the check that each node has only internal or leaf 
children, with a different check that the tree has the same height in 
all branches. That catches more potential problems, and was easier to 
implement after the refactoring. This still needs at least a round of 
fixing typos and tidying up comments, but it's more straightforward now, 
IMHO.


What have you been using to test this?

- Heikki
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index dcec3b85203..dd9b5ecf926 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -1,13 +1,13 @@
 # contrib/amcheck/Makefile
 
 MODULE_big	= amcheck
-OBJS		= verify_nbtree.o $(WIN32RES)
+OBJS		= verify_nbtree.o verify_gist.o $(WIN32RES)
 
 EXTENSION = amcheck
 DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
 PGFILEDESC = "amcheck - function for verifying relation integrity"
 
-REGRESS = check check_btree
+REGRESS = check check_btree check_gist
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.1--1.2.sql b/contrib/amcheck/amcheck--1.1--1.2.sql
index 883530dec74..1d461fff5b9 100644
--- a/contrib/amcheck/amcheck--1.1--1.2.sql
+++ b/contrib/amcheck/amcheck--1.1--1.2.sql
@@ -17,3 +17,13 @@ LANGUAGE C STRICT PARALLEL RESTRICTED;
 
 -- Don't want this to be available to public
 REVOKE ALL ON FUNCTION bt_index_parent_check(regclass, boolean, boolean) FROM PUBLIC;
+
+--
+-- gist_index_parent_check()
+--
+CREATE FUNCTION gist_index_parent_check(index regclass)
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'gist_index_parent_check'
+LANGUAGE C STRICT;
+
+REVOKE ALL ON FUNCTION gist_index_parent_check(regclass) FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.h b/contrib/amcheck/amcheck.h
new file mode 100644
index 000..84da7d7ec0f
--- /dev/null
+++ b/contrib/amcheck/amcheck.h
@@ -0,0 +1,31 @@
+/*-
+ *
+ * amcheck.h
+ *		Shared routines for amcheck verifications.
+ *
+ * Copyright (c) 2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  contrib/amcheck/amcheck.h
+ *
+ *-
+ */
+
+#include "postgres.h"
+
+#include "access/htup_details.h"
+#include "access/transam.h"
+#include "catalog/index.h"
+#include "catalog/pg_am.h"
+#include "commands/tablecmds.h"
+#include "miscadmin.h"
+#include "storage/lmgr.h"
+#include "utils/memutils.h"
+#include "utils/snapmgr.h"
+
+extern void
+amcheck_lock_relation(Oid indrelid, bool parentcheck, Relation *indrel,
+		Relation *heaprel, LOCKMODE	*lockmode);
+
+extern void
+amcheck_unlock_relation(Oid indrelid, Relation indrel, Relation heaprel, LOCKMODE	lockmode);
diff --git a/contrib/amcheck/verify_gist.c b/contrib/amcheck/verify_gist.c
new file mode 100644
index 000..8b37b20bc9c
--- /dev/null
+++ b/contrib/amcheck/verify_gist.c
@@ -0,0 +1,348 @@
+/*-
+ *
+ * verify_gist.c
+ *		Verifies the integrity of GiST indexes based on invariants.
+ *
+ * Verification checks that all paths in GiST graph contain
+ * consistent keys: tuples on parent pages consistently include tuples
+ * from children pages. Also, verification checks graph invariants:
+ * an internal page must have at least one downlink, and an internal page can
+ * reference either only leaf pages or only internal pages.
+ *
+ *
+ * Copyright (c) 2017-2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  contrib/amcheck/verify_gist.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include "access/gist_private.h"
+#include "amcheck.h"
+
+
+typedef struct GistScanItem
+{
+	int			depth;
+	IndexTuple	parenttup;
+	BlockNumber parentblk;
+	XLogRecPtr	parentlsn;
+	BlockNumber blkno;
+	struct GistScanItem *next;
+} GistScanItem;
+
+static void check_index_page(Relation rel, Buffer buffer);
+
+static IndexTuple gist_refind_parent(Relation rel, BlockNumber parentblkno, BlockNumber childblkno, BufferAccessStrategy strategy);
+
+sta

Re: PostgreSQL pollutes the file system

2019-03-27 Thread Petr Jelinek
On 27/03/2019 15:26, Tomas Vondra wrote:
> On Wed, Mar 27, 2019 at 03:07:24PM +0100, Andreas Karlsson wrote:
>> On 3/27/19 2:51 PM, Tomas Vondra wrote:
>>> I think the consensus in this thread (and the previous ancient ones) is
>>> that it's not worth it. It's one thing to introduce new commands with
>>> the
>>> pg_ prefix, and it's a completely different thing to rename existing
>>> ones.
>>> That has inherent costs, and as Tom pointed out the burden would fall on
>>> people using PostgreSQL (and that's rather undesirable).
>>>
>>> I personally don't see why having commands without pg_ prefix would be
>>> an issue. Especially when placed in a separate directory, which
>>> eliminates
>>> the possibility of conflict with other commands.
>>
>> I buy that it may not be worth breaking tens of thousands of scripts
>> to fix this, but I disagree about it not being an issue. Most Linux
>> distributions add PostgreSQL's executables in to a directory which is
>> in the default $PATH (/usr/bin in the case of Debian). And even if it
>> would be installed into a separate directory there would still be a
>> conflict as soon as that directory is added to $PATH.
>>
> 
> That is true, of course.

It's only partially true, for example on my systems:

Debian/Ubuntu:
$ readlink -f /usr/bin/createuser
/usr/share/postgresql-common/pg_wrapper

Centos (PGDG package):
readlink -f /usr/bin/createdb
/usr/pgsql-11/bin/createdb

This also means that the idea about symlinks is something packages
already do.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: PostgreSQL pollutes the file system

2019-03-27 Thread Tom Lane
Alvaro Herrera  writes:
> I suppose that if you're a Postgres developer, you naturally expect that
> "createdb" creates a Postgres DB.  What if you use multiple database
> systems, and then only occasionally have to do DBA tasks?  I find this
> POV that createdb doesn't need renaming a bit self-centered.

Nobody is defending the existing names as being something we'd pick
if we were picking them today.  The question is whether changing them
is worth the pain.  (And, one more time, may I point out that most
of the pain will be borne by people not on this mailing list, hence
unable to vote here.)  I don't think there is any reasonable argument
that said pain will be justified for any of them except maybe createuser
and dropuser.

>> "postmaster" symlink, though it's been deprecated for at least a
>> dozen years.)

> I don't think that change was because of executable namespace pollution
> or user confusion.  (Commit 5266f221a2e1, can't find the discussion
> though.)

My recollection of the discussion is that people argued that "postmaster"
might be taken to have something to do with an e-mail server, and
therefore we needed to stop using that name.  The lack of either follow-on
complaints or follow-on action doesn't make me too well disposed to
what is essentially that same argument over again.

regards, tom lane




Re: Fix XML handling with DOCTYPE

2019-03-27 Thread Chapman Flack
On 3/27/19 9:31 AM, Alvaro Herrera wrote:
> Everyone calls it "libxml2" nowadays.  Let's just use that and avoid any
> possible confusion.  If some libxml3 emerges one day, it's quite likely
> we'll need to revise much more than our docs in order to use it.

That's persuasive to me. I'll change the references to say libxml2
and let a committer serve as tiebreaker.

>> [1] http://xmlsoft.org/html/libxml-xpath.html#xmlNodeSet : "array of nodes
>> in no particular order"
> 
> What this means is "we don't guarantee any specific order".  It's like a
> query without ORDER BY: you may currently always get document order, but
> if you upgrade the library one day, it's quite possible to get the nodes
> in another order and you'll not get a refund.  So you (the user) should
> not rely on the order, or at least be mindful that it may change in the
> future.

Exactly. I called the behavior "counter-documented" to distinguish this
from the usual "undocumented" case, where you notice that a library is
behaving in a way you like, but its docs are utterly silent on the
matter, so you know you're going out on a limb to count on what you've
noticed.

In this case, you can notice the handy behavior but the doc *comes
right out and disclaims it* so if you count on it, you're going out
on a limb that has no bark left and looks punky.

And yet it seems worthwhile to mention how the library does in fact
seem to behave, because you might well be in the situation of porting
code over from SQL/XML:2006+ or XQuery or XPath 2+, or those are the
languages you've learned, so you may have order assumptions you've made,
and be surprised that XPath 1 doesn't let you make them, and at least
we can say "in a pinch, if you don't mind standing on this punky limb
here, you may be able to use the code you've got without having to
refactor every XMLTABLE() or xpath() into something wrapped in an
outer SQL query with ORDER BY. You just don't get your money back if
a later library upgrade changes the order."

The wiki page remembers[1] that I had tried some pretty gnarly XPath 1
queries to see if I could make libxml2 return things in a different
order, but no, got document order every time.

Regards,
-Chap

[1]
https://www.postgresql.org/message-id/5C465A65.4030305%40anastigmatix.net




Re: SET LOCAL ROLE NO RESET -- sandbox transactions

2019-03-27 Thread Chapman Flack
On 3/27/19 2:40 AM, Eric Hanson wrote:

> What would be the implications of adding a NO RESET clause to SET LOCAL
> ROLE?

There's a part of this that seems to be a special case of the
GUC-protected-by-cookie idea discussed a bit in [1] and [2]
(which is still an idea that I like).

Regards,
-Chap

[1]
https://www.postgresql.org/message-id/59127E4E.8090705%40anastigmatix.net

[2]
https://www.postgresql.org/message-id/CA%2BTgmoYOz%2BZmOteahrduJCc8RT8GEgC6PNXLwRzJPObmHGaurg%40mail.gmail.com




Re: Progress reporting for pg_verify_checksums

2019-03-27 Thread Michael Banck
Hi,

thanks again for your review!

Am Mittwoch, den 27.03.2019, 15:34 +0100 schrieb Fabien COELHO:
> Hallo Michael,
> 
> About patch v12:
> 
> Patch applies cleanly, compiles. make check ok, but feature is not tested. 
> Doc build ok.
> 
> Although I'm in favor of it, I'm not sure that the signal thing will make 
> it for 12. Maybe it is worth compromising, doing a simple version for now, 
> and resubmitting the signal feature later?

Let's wait one more round. I think that feature is quite useful and
isolated enough that it could stay, but I'll remove it for the next
patch version if needed.

> ISTM that someone suggested 4 Hz was the good eye speed, but you wait for 
> 400 ms, which is 2.5 Hz. What about it?

Uh right, I messed that up, fixed.

> I seems that current_size and total_size are not in the same unit:
> 
>   +   if (current_size > total_size)
>   +  total_size = current_size / MEGABYTES;
> 
> But they should both really be bytes, always.

Yeah, they are both bytes and the "/ MEGABYTES" is wrong, removed.

> I think that the computations should be inverted:
>   - first adjust total_size to current_size if needed
>   - then compute percent
>   - remove the percent adjustement as it is <= 100
> since current_size <= total_size

Ok, done so.

> I still think that the speed should compute a double to avoid integer 
> rounding errors within the computation. ISTM that rounding should only be 
> done for display in the end.

Ok, also done this way. I decided to print only full MB and not any
further digits as those don't seem very relevant.

> I'm okay with calling the report on each file even if this means every few 
> GB...

For my I/O throttling branch I've backed off to only call the report
every 100 blocks (and at the end of the file). I've added that logic to
this patch as well.

> Someone suggested ETA, and it seems rather simple to do. What about 
> adding it?

I don't know, just computing the ETA is easy of course. But you'd have
to fiddle with seconds/minutes/hours conversions because the runtime could
be hours. Then, you don't want to have it bumping around weirdly when
your I/O rate changes, cf. https://xkcd.com/612/ That doesn't sound
simpler than the signal trigger we might get rid of.

I have attached a POC which just prints the remaining seconds. If
somebody wants to contribute a full implementation as a co-author, I'm
happy to include it. Otherwise, I might take a stab at it over the next
days, but it is not on the top of my TODO list.
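
For the record, the hours/minutes/seconds part need not be much more than
something like this (an untested sketch only, assuming current_size,
total_size and the elapsed seconds are already at hand; the smoothing
problem is not addressed here):

static void
report_eta(int64 current_size, int64 total_size, double elapsed_sec)
{
	double		remaining_sec;
	int			hours,
				minutes,
				seconds;

	if (current_size <= 0 || elapsed_sec <= 0)
		return;					/* not enough data for an estimate yet */

	remaining_sec = elapsed_sec *
		(double) (total_size - current_size) / (double) current_size;

	hours = (int) remaining_sec / 3600;
	minutes = ((int) remaining_sec % 3600) / 60;
	seconds = (int) remaining_sec % 60;

	fprintf(stderr, "ETA: %02d:%02d:%02d", hours, minutes, seconds);
}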

New patch attached. I've also changed the reporting line so that it
prints a couple of blanks at the end, because I noticed that if the
previously reported line was longer than the current line, then the end
of it is still visible.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutzdiff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 1f4d4ab8b4..d18072a067 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -136,6 +136,19 @@ PostgreSQL documentation
  
 
  
+  -P
+  --progress
+  
+   
+Enable progress reporting. Turning this on will deliver an approximate
+progress report during the checksum verification. Sending the
+SIGUSR1 signal will toggle progress reporting
+on or off during the verification run (not available on Windows).
+   
+  
+ 
+
+ 
-V
--version

diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 5c185ebed8..e9a47e96f6 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -15,6 +15,8 @@
 #include "postgres_fe.h"
 
 #include 
+#include 
+#include 
 #include 
 #include 
 
@@ -24,6 +26,7 @@
 #include "common/file_utils.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
+#include "portability/instr_time.h"
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/checksum_impl.h"
@@ -37,6 +40,7 @@ static ControlFileData *ControlFile;
 static char *only_relfilenode = NULL;
 static bool do_sync = true;
 static bool verbose = false;
+static bool show_progress = false;
 
 typedef enum
 {
@@ -59,6 +63,14 @@ static PgChecksumMode mode = PG_MODE_CHECK;
 
 static const char *progname;
 
+/*
+ * Progress status information.
+ */
+int64		total_size = 0;
+int64		current_size = 0;
+instr_time	last_progress_update;
+instr_time	scan_started;
+
 static void
 usage(void)
 {
@@ -73,6 +85,7 @@ usage(void)
 	printf(_("  -N, --no-sync  do not wait for changes to be written safely to disk\n"));
 	printf(_(" 

Re: pgbench - doCustom cleanup

2019-03-27 Thread Alvaro Herrera
Pushed, thanks.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PostgreSQL pollutes the file system

2019-03-27 Thread Reid Thompson
On Wed, 2019-03-27 at 15:07 +0100, Andreas Karlsson wrote:
> On 3/27/19 2:51 PM, Tomas Vondra wrote:
> > I think the consensus in this thread (and the previous ancient ones) is
> > that it's not worth it. It's one thing to introduce new commands with the
> > pg_ prefix, and it's a completely different thing to rename existing ones.
> > That has inherent costs, and as Tom pointed out the burden would fall on
> > people using PostgreSQL (and that's rather undesirable).
> > 
> > I personally don't see why having commands without pg_ prefix would be
> > an issue. Especially when placed in a separate directory, which eliminates
> > the possibility of conflict with other commands.
> 
> I buy that it may not be worth breaking tens of thousands of scripts to
> fix this, but I disagree about it not being an issue. Most Linux
> distributions add PostgreSQL's executables in to a directory which is in
> the default $PATH (/usr/bin in the case of Debian). And even if it would
> be installed into a separate directory there would still be a conflict
> as soon as that directory is added to $PATH.
> 
> And I think that it is also relatively easy to confuse adduser and
> createuser when reading a script. Nothing about the name createuser
> indicates that it will create a role in an SQL database.
> 
> Andreas
> 

There's nothing about createuser or adduser (useradd on my system;
adduser doesn't exist on mine) that indicates that either would or should
create a user in the system either.  That's what man and -h/--help are
for.  If you don't know what an executable does, don't invoke it until
you do.  That's a basic premise for any executable.

reid



Re: speeding up planning with partitions

2019-03-27 Thread Tom Lane
David Rowley  writes:
> On Wed, 27 Mar 2019 at 18:39, Amit Langote
>  wrote:
>> On 2019/03/27 14:26, David Rowley wrote:
>>> Perhaps the way to make this work, at least in the long term is to do
>>> in the planner what we did in the executor in d73f4c74dd34.

>> Maybe you meant 9ddef36278a9?

> Probably.

Yeah, there's something to be said for having plancat.c open each table
*and store the Relation pointer in the RelOptInfo*, and then close that
again at the end of planning rather than immediately.  If we can't avoid
these retail table_opens without a great deal of pain, that's the
direction I'd tend to go in.  However it would add some overhead, in
the form of a need to traverse the RelOptInfo array an additional time.

regards, tom lane




Re: PostgreSQL pollutes the file system

2019-03-27 Thread Alvaro Herrera
On 2019-Mar-27, Tom Lane wrote:

> Alvaro Herrera  writes:
> > On 2019-Mar-27, Tomas Vondra wrote:
> >> I think the consensus in this thread (and the previous ancient ones) is
> >> that it's not worth it. It's one thing to introduce new commands with the
> >> pg_ prefix, and it's a completely different thing to rename existing ones.
> >> That has inherent costs, and as Tom pointed out the burden would fall on
> >> people using PostgreSQL (and that's rather undesirable).
> 
> > I thought the consensus was to rename them, and install symlinks to the
> > old names.
> 
> The question is what's the endgame.  We haven't actually fixed the
> complained-of confusion problem unless we eventually remove createuser
> and dropuser under those names.

Well, partly we have, because the mere act of having a symlink
documents the command via the symlink target.

Somebody proposed to rename createuser not to pg_createuser, though, but
rather to pg_createrole; ditto dropuser.  That seems to make sense.

I additionally proposed (nobody replied to this part) that we could have
the command print a WARNING if the argv[0] is shown to be the old name.
Not necessarily in pg12; maybe we can have them print such a warning in
pg13, and then remove the old names three years from now, or something
like that.

I suppose that if you're a Postgres developer, you naturally expect that
"createdb" creates a Postgres DB.  What if you use multiple database
systems, and then only occasionally have to do DBA tasks?  I find this
POV that createdb doesn't need renaming a bit self-centered.

> Are we prepared to force script breakage of that sort, even over a
> multi-year deprecation cycle?

Why not?

> (As a comparison point, I note that we still haven't removed the
> "postmaster" symlink, though it's been deprecated for at least a
> dozen years.)

I don't think that change was because of executable namespace pollution
or user confusion.  (Commit 5266f221a2e1, can't find the discussion
though.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: jsonpath

2019-03-27 Thread Tomas Vondra

On Wed, Mar 27, 2019 at 05:37:40PM +0300, Alexander Korotkov wrote:

On Wed, Mar 27, 2019 at 4:49 PM Tom Lane  wrote:

Alexander Korotkov  writes:
> Still no reproduction.

Annoying, but it's probably not worth expending more effort on
right now.  I wonder whether that buildfarm animal can be upgraded
to capture core dump stack traces --- if so, then if it happens
again we'd have more info.


Hopefully, Andrew will manage to get a backtrace [1].

BTW, while searching for this bug, I've collected this backtrace using valgrind.

==00:00:00:14.596 10866== Conditional jump or move depends on
uninitialised value(s)
==00:00:00:14.596 10866==at 0x579F8C4: strtod_l_internal (in
/usr/lib64/libc-2.17.so)
==00:00:00:14.596 10866==by 0x771561: float8in_internal_opt_error
(float.c:394)
==00:00:00:14.596 10866==by 0x7718B9: float8in_internal (float.c:515)
==00:00:00:14.596 10866==by 0x7718B9: float8in (float.c:336)
==00:00:00:14.596 10866==by 0x842D43: DirectFunctionCall1Coll (fmgr.c:803)
==00:00:00:14.596 10866==by 0x7C9649: numeric_float8 (numeric.c:3417)
==00:00:00:14.596 10866==by 0x842D43: DirectFunctionCall1Coll (fmgr.c:803)
==00:00:00:14.596 10866==by 0x7A1D8D: jsonb_float8 (jsonb.c:2058)
==00:00:00:14.596 10866==by 0x5F8F54: ExecInterpExpr (execExprInterp.c:649)
==00:00:00:14.596 10866==by 0x6A2E19: ExecEvalExprSwitchContext
(executor.h:307)
==00:00:00:14.596 10866==by 0x6A2E19: evaluate_expr (clauses.c:4827)
==00:00:00:14.596 10866==by 0x6A45FF: evaluate_function (clauses.c:4369)
==00:00:00:14.596 10866==by 0x6A45FF: simplify_function (clauses.c:3999)
==00:00:00:14.596 10866==by 0x6A31C1:
eval_const_expressions_mutator (clauses.c:2474)
==00:00:00:14.596 10866==by 0x644466: expression_tree_mutator
(nodeFuncs.c:3072)
==00:00:00:14.596 10866==
{
  
  Memcheck:Cond
  fun:strtod_l_internal
  fun:float8in_internal_opt_error
  fun:float8in_internal
  fun:float8in
  fun:DirectFunctionCall1Coll
  fun:numeric_float8
  fun:DirectFunctionCall1Coll
  fun:jsonb_float8
  fun:ExecInterpExpr
  fun:ExecEvalExprSwitchContext
  fun:evaluate_expr
  fun:evaluate_function
  fun:simplify_function
  fun:eval_const_expressions_mutator
  fun:expression_tree_mutator
}

Not sure whether it's related to 16d489b0fe.  Will investigate more.



This might be another case of false positive due to SSE (which I think is
used by strtod in some cases). But I'd expect strncasecmp in the stack in
that case, so maybe that's not it.


cheers

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: PostgreSQL pollutes the file system

2019-03-27 Thread Tom Lane
Andreas Karlsson  writes:
> On 3/27/19 3:26 PM, Tomas Vondra wrote:
>> That is true, of course. But are there actual examples of such conflicts
>> in practice? I mean, are there tools/packages that provide commands with
>> a conflicting name? I'm not aware of any, and as was pointed before, we'd
>> have ~20 years of history on any new ones.

> That is a fair argument. Since we squatted those names back in the 
> mid-90s I think the risk of collision is low.

Right.  I think there is a fair argument to be made for user confusion
(not actual conflict) with respect to createuser and dropuser.  The
argument for renaming any of the other tools is much weaker, IMO.

regards, tom lane




Re: jsonpath

2019-03-27 Thread Alexander Korotkov
On Wed, Mar 27, 2019 at 4:49 PM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > Still no reproduction.
>
> Annoying, but it's probably not worth expending more effort on
> right now.  I wonder whether that buildfarm animal can be upgraded
> to capture core dump stack traces --- if so, then if it happens
> again we'd have more info.

Hopefully, Andrew will manage to get a backtrace [1].

BTW, while searching for this bug, I've collected this backtrace using valgrind.

==00:00:00:14.596 10866== Conditional jump or move depends on
uninitialised value(s)
==00:00:00:14.596 10866==at 0x579F8C4: strtod_l_internal (in
/usr/lib64/libc-2.17.so)
==00:00:00:14.596 10866==by 0x771561: float8in_internal_opt_error
(float.c:394)
==00:00:00:14.596 10866==by 0x7718B9: float8in_internal (float.c:515)
==00:00:00:14.596 10866==by 0x7718B9: float8in (float.c:336)
==00:00:00:14.596 10866==by 0x842D43: DirectFunctionCall1Coll (fmgr.c:803)
==00:00:00:14.596 10866==by 0x7C9649: numeric_float8 (numeric.c:3417)
==00:00:00:14.596 10866==by 0x842D43: DirectFunctionCall1Coll (fmgr.c:803)
==00:00:00:14.596 10866==by 0x7A1D8D: jsonb_float8 (jsonb.c:2058)
==00:00:00:14.596 10866==by 0x5F8F54: ExecInterpExpr (execExprInterp.c:649)
==00:00:00:14.596 10866==by 0x6A2E19: ExecEvalExprSwitchContext
(executor.h:307)
==00:00:00:14.596 10866==by 0x6A2E19: evaluate_expr (clauses.c:4827)
==00:00:00:14.596 10866==by 0x6A45FF: evaluate_function (clauses.c:4369)
==00:00:00:14.596 10866==by 0x6A45FF: simplify_function (clauses.c:3999)
==00:00:00:14.596 10866==by 0x6A31C1:
eval_const_expressions_mutator (clauses.c:2474)
==00:00:00:14.596 10866==by 0x644466: expression_tree_mutator
(nodeFuncs.c:3072)
==00:00:00:14.596 10866==
{
   
   Memcheck:Cond
   fun:strtod_l_internal
   fun:float8in_internal_opt_error
   fun:float8in_internal
   fun:float8in
   fun:DirectFunctionCall1Coll
   fun:numeric_float8
   fun:DirectFunctionCall1Coll
   fun:jsonb_float8
   fun:ExecInterpExpr
   fun:ExecEvalExprSwitchContext
   fun:evaluate_expr
   fun:evaluate_function
   fun:simplify_function
   fun:eval_const_expressions_mutator
   fun:expression_tree_mutator
}

Not sure whether it's related to 16d489b0fe.  Will investigate more.

1. 
https://www.postgresql.org/message-id/91ff584f-75d2-471f-4d9e-9ee8f09cdc1d%402ndQuadrant.com

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: PostgreSQL pollutes the file system

2019-03-27 Thread Andreas Karlsson

On 3/27/19 3:26 PM, Tomas Vondra wrote:

That is true, of course. But are there actual examples of such conflicts
in practice? I mean, are there tools/packages that provide commands with
a conflicting name? I'm not aware of any, and as was pointed before, we'd
have ~20 years of history on any new ones.


That is a fair argument. Since we squatted those names back in the 
mid-90s I think the risk of collision is low.


Andreas




Re: Progress reporting for pg_verify_checksums

2019-03-27 Thread Fabien COELHO



Hallo Michael,

About patch v12:

Patch applies cleanly, compiles. make check ok, but feature is not tested. 
Doc build ok.


Although I'm in favor of it, I'm not sure that the signal thing will make 
it for 12. Maybe it is worth compromising, doing a simple version for now, 
and resubmitting the signal feature later?


ISTM that someone suggested 4 Hz was a good refresh rate for the eye, but you 
wait for 400 ms, which is 2.5 Hz. What about it?


It seems that current_size and total_size are not in the same unit:

 +   if (current_size > total_size)
 +  total_size = current_size / MEGABYTES;

But they should both really be bytes, always.

I think that the computations should be inverted:
 - first adjust total_size to current_size if needed
 - then compute percent
 - remove the percent adjustment as it is <= 100
   since current_size <= total_size

I still think that the speed should be computed as a double to avoid integer 
rounding errors within the computation. ISTM that rounding should only be 
done for display at the end.


I'm okay with calling the report on each file even if this means every few 
GB...


Someone suggested ETA, and it seems rather simple to do. What about 
adding it?


--
Fabien.




Re: PostgreSQL pollutes the file system

2019-03-27 Thread Tomas Vondra

On Wed, Mar 27, 2019 at 03:07:24PM +0100, Andreas Karlsson wrote:

On 3/27/19 2:51 PM, Tomas Vondra wrote:

I think the consensus in this thread (and the previous ancient ones) is
that it's not worth it. It's one thing to introduce new commands with the
pg_ prefix, and it's a completely different thing to rename existing ones.
That has inherent costs, and as Tom pointed out the burden would fall on
people using PostgreSQL (and that's rather undesirable).

I personally don't see why having commands without pg_ prefix would be
an issue. Especially when placed in a separate directory, which eliminates
the possibility of conflict with other commands.


I buy that it may not be worth breaking tens of thousands of scripts 
to fix this, but I disagree about it not being an issue. Most Linux 
distributions add PostgreSQL's executables into a directory which is 
in the default $PATH (/usr/bin in the case of Debian). And even if it 
would be installed into a separate directory there would still be a 
conflict as soon as that directory is added to $PATH.




That is true, of course. But are there actual examples of such conflicts
in practice? I mean, are there tools/packages that provide commands with
a conflicting name? I'm not aware of any, and as was pointed before, we'd
have ~20 years of history on any new ones.

And I think that it is also relatively easy to confuse adduser and 
createuser when reading a script. Nothing about the name createuser 
indicates that it will create a role in an SQL database.




Sure, and I've confused those tools too in the past. But that's not
something you'll hit in a script, at least not if you test it before
running it on a production system. And if you're running untested scripts,
this is likely the least of your problems ...

cheers

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: PostgreSQL pollutes the file system

2019-03-27 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Mar-27, Tomas Vondra wrote:
>> I think the consensus in this thread (and the previous ancient ones) is
>> that it's not worth it. It's one thing to introduce new commands with the
>> pg_ prefix, and it's a completely different thing to rename existing ones.
>> That has inherent costs, and as Tom pointed out the burden would fall on
>> people using PostgreSQL (and that's rather undesirable).

> I thought the consensus was to rename them, and install symlinks to the
> old names.

The question is what's the endgame.  We haven't actually fixed the
complained-of confusion problem unless we eventually remove createuser
and dropuser under those names.  Are we prepared to force script
breakage of that sort, even over a multi-year deprecation cycle?

(As a comparison point, I note that we still haven't removed the
"postmaster" symlink, though it's been deprecated for at least a
dozen years.)

regards, tom lane




Re: partitioned tables referenced by FKs

2019-03-27 Thread Jesper Pedersen

Hi,

On 3/26/19 5:39 PM, Alvaro Herrera wrote:

As I said before, I'm thinking of getting rid of the whole business of
checking partitions on the referenced side of an FK at DROP time, and
instead jut forbid the DROP completely if any FKs reference an ancestor
of that partition.


Will that allow `DROP TABLE parted_pk CASCADE` to succeed even if
partitions still contain referenced data?  I suppose that's the example
you cited upthread as a bug that remains to be solved.


That's the idea, yes, it should do that: only allow a DROP of a
partition referenced by an FK if the topmost constraint is also being
dropped.  Maybe this means I need to get rid of 0002 completely.  But I
haven't got to doing that yet.



I think that is ok, if doc/src/sgml/ref/drop_table.sgml is updated with 
a description of how CASCADE works in this scenario.


Best regards,
 Jesper




Re: PostgreSQL pollutes the file system

2019-03-27 Thread Tomas Vondra

On Wed, Mar 27, 2019 at 11:00:18AM -0300, Alvaro Herrera wrote:

On 2019-Mar-27, Tomas Vondra wrote:


I think the consensus in this thread (and the previous ancient ones) is
that it's not worth it. It's one thing to introduce new commands with the
pg_ prefix, and it's a completely different thing to rename existing ones.
That has inherent costs, and as Tom pointed out the burden would fall on
people using PostgreSQL (and that's rather undesirable).


I thought the consensus was to rename them, and install symlinks to the
old names.



I know symlinks were mentioned/proposed, but I don't think there's a clear
consensus to do that. I might have missed that part of the discussion.

That being said, I'm not strongly opposed to doing that, although I still
don't see the need to do that ...

regards


--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: PostgreSQL pollutes the file system

2019-03-27 Thread Andreas Karlsson

On 3/27/19 2:51 PM, Tomas Vondra wrote:

I think the consensus in this thread (and the previous ancient ones) is
that it's not worth it. It's one thing to introduce new commands with the
pg_ prefix, and it's a completely different thing to rename existing ones.
That has inherent costs, and as Tom pointed out the burden would fall on
people using PostgreSQL (and that's rather undesirable).

I personally don't see why having commands without pg_ prefix would be
an issue. Especially when placed in a separate directory, which eliminates
the possibility of conflict with other commands.


I buy that it may not be worth breaking tens of thousands of scripts to 
fix this, but I disagree about it not being an issue. Most Linux 
distributions add PostgreSQL's executables into a directory which is in 
the default $PATH (/usr/bin in the case of Debian). And even if it would 
be installed into a separate directory there would still be a conflict 
as soon as that directory is added to $PATH.


And I think that it is also relatively easy to confuse adduser and 
createuser when reading a script. Nothing about the name createuser 
indicates that it will create a role in an SQL database.


Andreas




Re: PostgreSQL pollutes the file system

2019-03-27 Thread Komяpa
Hello,

At the very least, my Ubuntu Cosmic has createdb, createuser and createlang
on the user's PATH, and I have had at least two cases where people were
trying to use createuser to create a new OS user.

I would prefer them to have a pg_ prefix, to reduce confusion.

On Wed, Mar 27, 2019 at 4:51 PM Tomas Vondra 
wrote:

> On Wed, Mar 27, 2019 at 02:31:14PM +0100, Fred .Flintstone wrote:
> >Many of these are gone in the modern PostgreSQL, a few remain.
> >https://packages.ubuntu.com/disco/amd64/postgresql-client-11/filelist
> >
> >/usr/lib/postgresql/11/bin/clusterdb
> >/usr/lib/postgresql/11/bin/createdb
> >/usr/lib/postgresql/11/bin/createuser
> >/usr/lib/postgresql/11/bin/dropdb
> >/usr/lib/postgresql/11/bin/dropuser
> >/usr/lib/postgresql/11/bin/pg_basebackup
> >/usr/lib/postgresql/11/bin/pg_dump
> >/usr/lib/postgresql/11/bin/pg_dumpall
> >/usr/lib/postgresql/11/bin/pg_isready
> >/usr/lib/postgresql/11/bin/pg_receivewal
> >/usr/lib/postgresql/11/bin/pg_recvlogical
> >/usr/lib/postgresql/11/bin/pg_restore
> >/usr/lib/postgresql/11/bin/psql
> >/usr/lib/postgresql/11/bin/reindexdb
> >/usr/lib/postgresql/11/bin/vacuumdb
> >
> >Can we rename clusterdb, reindexdb and vacuumdb to carry the pg_ prefix?
> >
>
> I think the consensus in this thread (and the previous ancient ones) is
> that it's not worth it. It's one thing to introduce new commands with the
> pg_ prefix, and it's a completely different thing to rename existing ones.
> That has inherent costs, and as Tom pointed out the burden would fall on
> people using PostgreSQL (and that's rather undesirable).
>
> I personally don't see why having commands without pg_ prefix would be
> an issue. Especially when placed in a separate directory, which eliminates
> the possibility of conflict with other commands.
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>
>

-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa


Re: PostgreSQL pollutes the file system

2019-03-27 Thread Fred .Flintstone
It does not matter if they are in a different directory, because when
I use tab-completion in the shell, all commands show up.
I type "create" and both "createdb" and "createuser" show up. This
is very confusing, and I don't know whether they create a Linux system
user account or a PostgreSQL account. Without knowing better, I would
be inclined to believe such a command would create a system account.

It gets even more confusing when a user has multiple database servers
installed, such as MySQL and PostgreSQL or MongoDB and PostgreSQL. Then
it is not at all clear what "createdb" does.


On Wed, Mar 27, 2019 at 2:51 PM Tomas Vondra
 wrote:
>
> On Wed, Mar 27, 2019 at 02:31:14PM +0100, Fred .Flintstone wrote:
> >Many of these are gone in the modern PostgreSQL, a few remain.
> >https://packages.ubuntu.com/disco/amd64/postgresql-client-11/filelist
> >
> >/usr/lib/postgresql/11/bin/clusterdb
> >/usr/lib/postgresql/11/bin/createdb
> >/usr/lib/postgresql/11/bin/createuser
> >/usr/lib/postgresql/11/bin/dropdb
> >/usr/lib/postgresql/11/bin/dropuser
> >/usr/lib/postgresql/11/bin/pg_basebackup
> >/usr/lib/postgresql/11/bin/pg_dump
> >/usr/lib/postgresql/11/bin/pg_dumpall
> >/usr/lib/postgresql/11/bin/pg_isready
> >/usr/lib/postgresql/11/bin/pg_receivewal
> >/usr/lib/postgresql/11/bin/pg_recvlogical
> >/usr/lib/postgresql/11/bin/pg_restore
> >/usr/lib/postgresql/11/bin/psql
> >/usr/lib/postgresql/11/bin/reindexdb
> >/usr/lib/postgresql/11/bin/vacuumdb
> >
> >Can we rename clusterdb, reindexdb and vacuumdb to carry the pg_ prefix?
> >
>
> I think the consensus in this thread (and the previous ancient ones) is
> that it's not worth it. It's one thing to introduce new commands with the
> pg_ prefix, and it's a completely different thing to rename existing ones.
> That has inherent costs, and as Tom pointed out the burden would fall on
> people using PostgreSQL (and that's rather undesirable).
>
> I personally don't see why having commands without pg_ prefix would be
> an issue. Especially when placed in a separate directory, which eliminates
> the possibility of conflict with other commands.
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>




Re: PostgreSQL pollutes the file system

2019-03-27 Thread Alvaro Herrera
On 2019-Mar-27, Tomas Vondra wrote:

> I think the consensus in this thread (and the previous ancient ones) is
> that it's not worth it. It's one thing to introduce new commands with the
> pg_ prefix, and it's a completely different thing to rename existing ones.
> That has inherent costs, and as Tom pointed out the burden would fall on
> people using PostgreSQL (and that's rather undesirable).

I thought the consensus was to rename them, and install symlinks to the
old names.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Enable data checksums by default

2019-03-27 Thread Christoph Berg
Re: To Tom Lane 2019-03-26 <20190326151446.gg3...@msg.df7cb.de>
> I ran a benchmark with checksums disabled/enabled. shared_buffers is
> 512kB to make sure almost any read will fetch the page from the OS
> cache; scale factor is 50 (~750MB) to make sure the whole cluster fits
> into RAM.
[...]
> So the cost is 5% in this very contrived case. In almost any other
> setting, the cost would be lower, I'd think.

(That was on 12devel, btw.)

That was about the most extreme OLTP read-only workload. After
thinking about it some more, I realized that exercising large seqscans
might be an even better way to test it because of less per-query
overhead.

Same setup again, shared_buffers = 16 (128kB), jit = off,
max_parallel_workers_per_gather = 0:

select count(bid) from pgbench_accounts;

no checksums: ~456ms
with checksums: ~489ms

456.0/489 = 0.9325

The cost of checksums is about 6.75% here.

Christoph




Re: Progress reporting for pg_verify_checksums

2019-03-27 Thread Michael Banck
Hi,

Am Samstag, den 23.03.2019, 10:49 +0900 schrieb Michael Paquier:
> On Fri, Mar 22, 2019 at 02:23:17PM +0100, Michael Banck wrote:
> > The current version prints a newline when progress reporting is
> > toggled off. Do you mean there is a hazard that this happens right when
> > we are printing the progress, so we end up with a partly garbage line? I
> > don't think that'd be so bad as to warrant further code complexity; after
> > all, the user explicitly wanted the progress to stop.
> > 
> > But maybe I am misunderstanding?
> 
> The latest patch does not apply because of mainly ed308d7.  Could you
> send a rebase?

I've rebased it now, see attached.

> FWIW, I would remove the signal toggling stuff from the patch and keep
> the logic simple at this stage.  Not having a solution which works on
> Windows is perhaps not a strong reason to not include it, but it's a
> sign that we could perhaps design something better, and that's
> annoying.  Personally I think that I would just use --progress all the
> time and not use the signaling part at all, my 2c.

We had this discussion already, and back then I was against dropping it
as well.

Do you see --progress being the default? I don't think this is in line
with other project binaries, so I guess not?

If not, I guess your main point is that we might come up with a better
implementation/UI further down the road.  Which is fair enough, but I
don't think this is such an in-your-face feature that changing the way
it works in future releases would be a terrible breakage of backwards
compatibility. Even more so as it is solely an interactive feature and
cannot be used in scripts etc.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutzdiff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 1f4d4ab8b4..d18072a067 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -136,6 +136,19 @@ PostgreSQL documentation
  
 
  
+  -P
+  --progress
+  
+   
+Enable progress reporting. Turning this on will deliver an approximate
+progress report during the checksum verification. Sending the
+SIGUSR1 signal will toggle progress reporting
+on or off during the verification run (not available on Windows).
+   
+  
+ 
+
+ 
-V
--version

diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 5c185ebed8..4bcafb02dd 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -15,6 +15,8 @@
 #include "postgres_fe.h"
 
 #include 
+#include 
+#include 
 #include 
 #include 
 
@@ -24,6 +26,7 @@
 #include "common/file_utils.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
+#include "portability/instr_time.h"
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/checksum_impl.h"
@@ -37,6 +40,7 @@ static ControlFileData *ControlFile;
 static char *only_relfilenode = NULL;
 static bool do_sync = true;
 static bool verbose = false;
+static bool show_progress = false;
 
 typedef enum
 {
@@ -59,6 +63,14 @@ static PgChecksumMode mode = PG_MODE_CHECK;
 
 static const char *progname;
 
+/*
+ * Progress status information.
+ */
+int64		total_size = 0;
+int64		current_size = 0;
+instr_time	last_progress_update;
+instr_time	scan_started;
+
 static void
 usage(void)
 {
@@ -73,6 +85,7 @@ usage(void)
 	printf(_("  -N, --no-sync  do not wait for changes to be written safely to disk\n"));
 	printf(_("  -v, --verbose  output verbose messages\n"));
 	printf(_("  -r RELFILENODE check only relation with specified relfilenode\n"));
+	printf(_("  -P, --progress show progress information\n"));
 	printf(_("  -V, --version  output version information, then exit\n"));
 	printf(_("  -?, --help show this help, then exit\n"));
 	printf(_("\nIf no data directory (DATADIR) is specified, "
@@ -97,6 +110,83 @@ static const char *const skip[] = {
 	NULL,
 };
 
+static void
+toggle_progress_report(int signum)
+{
+
+	/* we handle SIGUSR1 only, and toggle the value of show_progress */
+	if (signum == SIGUSR1)
+	{
+		/*
+		 * Skip to the next line in order to avoid overwriting half of
+		 * the progress report line with the final output.
+		 */
+		if (show_progress && isatty(fileno(stderr)))
+			fprintf(stderr, "\n");
+		show_progress = !show_progress;
+	}
+}
+
+/*
+ * Report current progress status. Parts borrowed from
+ * PostgreSQLs' src/bin/pg_basebackup.c
+ */
+static void
+report_progress(bool force)
+{
+	instr_time	now;
+	double		elapsed;
+

Re: PostgreSQL pollutes the file system

2019-03-27 Thread Tomas Vondra

On Wed, Mar 27, 2019 at 02:31:14PM +0100, Fred .Flintstone wrote:

Many of these are gone in the modern PostgreSQL, a few remain.
https://packages.ubuntu.com/disco/amd64/postgresql-client-11/filelist

/usr/lib/postgresql/11/bin/clusterdb
/usr/lib/postgresql/11/bin/createdb
/usr/lib/postgresql/11/bin/createuser
/usr/lib/postgresql/11/bin/dropdb
/usr/lib/postgresql/11/bin/dropuser
/usr/lib/postgresql/11/bin/pg_basebackup
/usr/lib/postgresql/11/bin/pg_dump
/usr/lib/postgresql/11/bin/pg_dumpall
/usr/lib/postgresql/11/bin/pg_isready
/usr/lib/postgresql/11/bin/pg_receivewal
/usr/lib/postgresql/11/bin/pg_recvlogical
/usr/lib/postgresql/11/bin/pg_restore
/usr/lib/postgresql/11/bin/psql
/usr/lib/postgresql/11/bin/reindexdb
/usr/lib/postgresql/11/bin/vacuumdb

Can we rename clusterdb, reindexdb and vacuumdb to carry the pg_ prefix?



I think the consensus in this thread (and the previous ancient ones) is
that it's not worth it. It's one thing to introduce new commands with the
pg_ prefix, and it's a completely different thing to rename existing ones.
That has inherent costs, and as Tom pointed out the burden would fall on
people using PostgreSQL (and that's rather undesirable).

I personally don't see why having commands without pg_ prefix would be
an issue. Especially when placed in a separate directory, which eliminates
the possibility of conflict with other commands.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: jsonpath

2019-03-27 Thread Tom Lane
Alexander Korotkov  writes:
> Still no reproduction.

Annoying, but it's probably not worth expending more effort on
right now.  I wonder whether that buildfarm animal can be upgraded
to capture core dump stack traces --- if so, then if it happens
again we'd have more info.

regards, tom lane




Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove pgbench "progress" test pending solution of its timing is (fwd)

2019-03-27 Thread Fabien COELHO


Hello Heikki,

Indeed, yet again, I forgot the attachment :-(

I stared at the new test case for a while, and I must say it looks very 
cryptic. It's not exactly this patch's fault - all the pgbench tests are 
cryptic -


Perl is cryptic. Regexprs are cryptic.

but I think we need to do something about that before adding any more 
tests. I'm not sure what exactly, but I'd like them to be more like 
pg_regress tests, where you have an expected output and you compare it with 
the actual output. I realize that's not easy, because there are a lot of 
varying numbers in the output, but we've got to do something.


As a good first step, I wish the pgbench() function used named arguments. 
[...]


You would have something like this:

my $elapsed = pgbench(
 test_name => 'pgbench progress',
 opts => '-T 2 -P 1 -l --aggregate-interval=1'


I do not like them much in perl because it changes the code significantly, 
but why not. That would be another patch anyway.


A lighter but efficient option would be to add a few comments on the larger 
calls, see attached.


Please really find the attachment, and do not hesitate to spare 
a few grey cells so that I will not forget about them in the future :-)


--
Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index cdd21d7469..7d853a478f 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -6050,6 +6050,7 @@ threadRun(void *arg)
 	int			nstate = thread->nstate;
 	int			remains = nstate;	/* number of remaining clients */
 	socket_set *sockets = alloc_socket_set(nstate);
+	int			aborted = 0;
 	int			i;
 
 	/* for reporting progress: */
@@ -6279,6 +6280,10 @@ threadRun(void *arg)
 			 */
 			if (st->state == CSTATE_FINISHED || st->state == CSTATE_ABORTED)
 remains--;
+
+			/* count aborted clients */
+			if (st->state == CSTATE_ABORTED)
+aborted++;
 		}
 
 		/* progress report is made by thread 0 for all threads */
@@ -6295,7 +6300,10 @@ threadRun(void *arg)
 
 /*
  * Ensure that the next report is in the future, in case
- * pgbench/postgres got stuck somewhere.
+ * pgbench/postgres got stuck somewhere. This may skip
+ * some progress reports if the thread does not get enough
+ * cpu time. In such case, probably the whole bench should
+ * be ignored.
  */
 do
 {
@@ -6306,6 +6314,52 @@ threadRun(void *arg)
 	}
 
 done:
+	/*
+	 * Under -R, comply with -T and -P even if there is nothing to do,
+	 * (unless all clients aborted) and ensure that one report is printed.
+	 * This special behavior allows tap tests to check that a run lasts
+	 * as expected and that some progress is shown, even on very slow hosts.
+	 */
+	if (duration && throttle_delay && aborted < nstate && thread->tid == 0)
+	{
+		int64		thread_end = thread_start + (int64) duration * 100;
+		instr_time	now_time;
+		int64		now;
+
+		INSTR_TIME_SET_CURRENT(now_time);
+		now = INSTR_TIME_GET_MICROSEC(now_time);
+
+		while (now < thread_end)
+		{
+			if (progress && next_report <= thread_end)
+			{
+pg_usleep(next_report - now);
+INSTR_TIME_SET_CURRENT(now_time);
+now = INSTR_TIME_GET_MICROSEC(now_time);
+printProgressReport(thread, thread_start, now, &last, &last_report);
+
+/* schedule next report */
+do
+{
+	next_report += (int64) progress * 100;
+} while (now >= next_report);
+			}
+			else
+			{
+pg_usleep(thread_end - now);
+INSTR_TIME_SET_CURRENT(now_time);
+now = INSTR_TIME_GET_MICROSEC(now_time);
+			}
+		}
+
+		/*
+		 * Print a closing progress report if none were printed
+		 * and at least one was expected.
+		 */
+		if (progress && progress <= duration && last_report == thread_start)
+			printProgressReport(thread, thread_start, now, &last, &last_report);
+	}
+
 	INSTR_TIME_SET_CURRENT(start);
 	disconnect_all(state, nstate);
 	INSTR_TIME_SET_CURRENT(end);
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index ad3a7a79f8..424fe074d6 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -4,13 +4,14 @@ use warnings;
 use PostgresNode;
 use TestLib;
 use Test::More;
+use Time::HiRes qw{time};
 
 # start a pgbench specific server
 my $node = get_new_node('main');
 $node->init;
 $node->start;
 
-# invoke pgbench, with parameters:
+# invoke pgbench, return elapsed time, with parameters:
 #   $opts: options as a string to be split on spaces
 #   $stat: expected exit status
 #   $out: reference to a regexp list that must match stdout
@@ -49,13 +50,14 @@ sub pgbench
 	}
 
 	push @cmd, @args;
+	my $t0 = time();
 
 	$node->command_checks_all(\@cmd, $stat, $out, $err, $name);
 
 	# cleanup?
 	#unlink @filenames or die "cannot unlink files (@filenames): $!";
 
-	return;
+	return time() - $t0;
 }
 
 # Test concurrent insertion into table with serial column.  This
@@ -894,16 +896,68 @@ sub check_pgbench_logs
 
 my $bdir = $node->basedir;
 
-# with sampling rat

Re: Fix XML handling with DOCTYPE

2019-03-27 Thread Alvaro Herrera
On 2019-Mar-27, Chapman Flack wrote:

> On 03/26/19 21:39, Ryan Lambert wrote:

> > Should this be clarified as the libxml2 library?  That's what I installed
> > to build postgres from source (Ubuntu 16/18).  If it is the libxml library
> > and the "2" is irrelevant
> 
> That's a good catch. I'm not actually sure whether there is any "libxml"
> library that isn't libxml2. Maybe there was once and nobody admits to
> hanging out with it. Most Google hits on "libxml" seem to be modules
> that have libxml in their names and libxml2 as their actual dependency.
> 
>   Perl XML:LibXML: "This module is an interface to libxml2"
>   Haskell libxml: "Binding to libxml2"
>   libxml-ruby: "The Libxml-Ruby project provides Ruby language bindings
> for the GNOME Libxml2 ..."
> 
>   --with-libxml is the PostgreSQL configure option to make it use libxml2.
> 
>   The very web page http://xmlsoft.org/index.html says "The XML C parser
>   and toolkit of Gnome: libxml" and is all about libxml2.
> 
> So I think I was unsure what convention to follow, and threw up my hands
> and went with libxml. I could just as easily throw them up and go with
> libxml2. Which do you think would be better?

Daniel Veillard actually had libxml version 1 in that repository (mostly
of GNOME provenance, it seems, put together during some W3C meeting in
1998).  The version number changed to 2 sometime during the year 2000.
Version 1 was mostly abandoned at that point, and for some reason
everyone keeps using "libxml2" as the name as though it was a different
thing from "libxml".  I suppose the latter name is just too generic, or
because they wanted to differentiate from the old (probably
incompatible API) code.
https://gitlab.gnome.org/GNOME/libxml2/tree/LIB_XML_1_BRANCH

Everyone calls it "libxml2" nowadays.  Let's just use that and avoid any
possible confusion.  If some libxml3 emerges one day, it's quite likely
we'll need to revise much more than our docs in order to use it.

> On 03/26/19 23:52, Tom Lane wrote:
> > Do we need to mention that at all?  If you're not building from source,
> > it doesn't seem very interesting ... but maybe I'm missing some reason
> > why end users would care.
> 
> The three places I've mentioned it were the ones where I thought users
> might care:

These seem relevant details.

> If you agree, I should go through and fix my nodesets to be node-sets.

+1

> [1] http://xmlsoft.org/html/libxml-xpath.html#xmlNodeSet : "array of nodes
> in no particular order"

What this means is "we don't guarantee any specific order".  It's like a
query without ORDER BY: you may currently always get document order, but
if you upgrade the library one day, it's quite possible to get the nodes
in another order and you'll not get a refund.  So you (the user) should
not rely on the order, or at least be mindful that it may change in the
future.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PostgreSQL pollutes the file system

2019-03-27 Thread Fred .Flintstone
Many of these are gone in the modern PostgreSQL, a few remain.
https://packages.ubuntu.com/disco/amd64/postgresql-client-11/filelist

/usr/lib/postgresql/11/bin/clusterdb
/usr/lib/postgresql/11/bin/createdb
/usr/lib/postgresql/11/bin/createuser
/usr/lib/postgresql/11/bin/dropdb
/usr/lib/postgresql/11/bin/dropuser
/usr/lib/postgresql/11/bin/pg_basebackup
/usr/lib/postgresql/11/bin/pg_dump
/usr/lib/postgresql/11/bin/pg_dumpall
/usr/lib/postgresql/11/bin/pg_isready
/usr/lib/postgresql/11/bin/pg_receivewal
/usr/lib/postgresql/11/bin/pg_recvlogical
/usr/lib/postgresql/11/bin/pg_restore
/usr/lib/postgresql/11/bin/psql
/usr/lib/postgresql/11/bin/reindexdb
/usr/lib/postgresql/11/bin/vacuumdb

Can we rename clusterdb, reindexdb and vacuumdb to carry the pg_ prefix?

On Fri, Mar 22, 2019 at 3:13 AM Mark Kirkwood
 wrote:
>
> On 22/03/19 3:05 PM, Tom Lane wrote:
> > Michael Paquier  writes:
> >> I would be curious to hear the reason why such tool names have been
> >> chosen from the start.  The tools have been switched to C in 9e0ab71
> >> from 2003, have been introduced by Peter Eisentraut as of 240e4c9 from
> >> 1999, and I cannot spot the thread from the time where this was
> >> discussed.
> > createuser, at least, dates back to Berkeley days: my copy of the
> > PG v4r2 tarball contains a "src/bin/createuser/createuser.sh" file
> > dated 1994-03-19.  (The 1999 commit you mention just moved the
> > functionality around; it was there before.)  So I imagine the answer
> > is that nobody at the time thought of fitting these scripts into a
> > larger ecosystem.
>
>
> FWIW the whole set is there in version 6.4.2:
>
> markir@vedavec:/download/postgres/src/postgresql-6.4.2/src/bin$ ls -l
> total 72
> drwxr-sr-x 3 markir adm 4096 Dec 31  1998 cleardbdir
> drwxr-sr-x 3 markir adm 4096 Dec 31  1998 createdb
> drwxr-sr-x 3 markir adm 4096 Dec 31  1998 createuser
> drwxr-sr-x 2 markir adm 4096 Dec 31  1998 CVS
> drwxr-sr-x 3 markir adm 4096 Dec 31  1998 destroydb
> drwxr-sr-x 3 markir adm 4096 Dec 31  1998 destroyuser
> drwxr-sr-x 3 markir adm 4096 Dec 31  1998 initdb
> drwxr-sr-x 3 markir adm 4096 Dec 31  1998 initlocation
> drwxr-sr-x 3 markir adm 4096 Dec 31  1998 ipcclean
> -rw-r--r-- 1 markir adm  795 Dec 19  1998 Makefile
> drwxr-sr-x 3 markir adm 4096 Dec 31  1998 pgaccess
> drwxr-sr-x 3 markir adm 4096 Dec 31  1998 pg_dump
> drwxr-sr-x 3 markir adm 4096 Dec 31  1998 pg_encoding
> drwxr-sr-x 3 markir adm 4096 Dec 31  1998 pg_id
> drwxr-sr-x 3 markir adm 4096 Dec 31  1998 pg_passwd
> drwxr-sr-x 3 markir adm 4096 Dec 31  1998 pgtclsh
> drwxr-sr-x 3 markir adm 4096 Dec 31  1998 pg_version
> drwxr-sr-x 3 markir adm 4096 Dec 31  1998 psql
>
> --
>
> Mark
>
>




Re: Log a sample of transactions

2019-03-27 Thread Adrien NAYRAT

On 3/27/19 10:22 AM, Adrien NAYRAT wrote:

Hello,

On 3/27/19 7:05 AM, Masahiko Sawada wrote:

Why are both of these checked? ISTM once xact_is_sampled is set, we
ought not to respect a different value of log_xact_sample_rate for the
rest of that transaction.

I added these checks to allow disabling logging during a sampled
transaction, per suggestion of Masahiko Sawada:
https://www.postgresql.org/message-id/CAD21AoD4t%2BhTV6XfK5Yz%3DEocB8oMyJSYFfjAryCDYtqfib2GrA%40mail.gmail.com 


I added a comment to explain why transaction logging is rechecked.

Umm, I'm inclined to think that we should not disable logging within
the transaction. If we allow that, since users can disable the logging
within the transaction by setting it to 0, they may think that they can
change the rate during the transaction, which is wrong. If we want
this behavior we should document it, but we could instead prevent the user
from changing the rate during the transaction to avoid confusion. And
the latter looks more understandable and straightforward. This is a
different comment from what I made before; I'm sorry for confusing you.


Don't worry, I will change that.





As far as I can tell xact_is_sampled is not properly reset in case of
errors?


I am not sure if I should disable logging in case of errors. Actually we
have:

postgres=# set log_transaction_sample_rate to 1;
SET
postgres=# set client_min_messages to 'LOG';
LOG:  duration: 0.392 ms  statement: set client_min_messages to 'LOG';
SET
postgres=# begin;
LOG:  duration: 0.345 ms  statement: begin;
BEGIN
postgres=# select 1;
LOG:  duration: 0.479 ms  statement: select 1;
   ?column?
--
  1
(1 row)

postgres=# select * from missingtable;
ERROR:  relation "missingtable" does not exist
LINE 1: select * from missingtable;
    ^
postgres=# select 1;
ERROR:  current transaction is aborted, commands ignored until end of
transaction block
postgres=# rollback;
LOG:  duration: 11390.295 ms  statement: rollback;
ROLLBACK

If I reset xact_is_sampled (after the first error, inside
AbortTransaction if I am right), the "rollback" statement will not be
logged. I wonder if this "statement" should be logged?

If the answer is no, I will reset xact_is_sampled in AbortTransaction.


FWIW, I'd prefer to log "rollback" as well so that the user can recognize
the end of the transaction.


Ok.



+   /* Determine if statements are logged in this transaction */
+   xact_is_sampled = random() <= log_xact_sample_rate * 
MAX_RANDOM_VALUE;


If log_xact_sample_rate is 1 we always log all statements in the
transaction regardless of the value of random(). Similarly if it's 0, we
never log. I think we can avoid the unnecessary random() call in both cases,
like log_statement_sample_rate does.


I agree, I will change that.
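
For reference, a sketch of that short-circuit could look like the following;
the names follow the patch hunk quoted above (xact_is_sampled,
log_xact_sample_rate, MAX_RANDOM_VALUE), but this is only an illustration of
the idea, not the final code:

/* In StartTransaction(), avoid calling random() when the rate is 0 or 1. */
xact_is_sampled = log_xact_sample_rate != 0 &&
	(log_xact_sample_rate == 1.0 ||
	 random() <= log_xact_sample_rate * MAX_RANDOM_VALUE);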



 {"log_statement_sample_rate", PGC_SUSET, LOGGING_WHEN,
 gettext_noop("Fraction of statements over
log_min_duration_statement to log."),
 gettext_noop("If you only want a sample, use a
value between 0 (never "
-    "log) and 1.0 (always 
log).")
+    "log) and 1 (always 
log).")

 },
 &log_statement_sample_rate,
 1.0, 0.0, 1.0,
 NULL, NULL, NULL
 },

This change is not relevant to the log_transaction_sample_rate feature,
so why do we need it? In postgresql.conf.sample the
description of both log_statement_sample_rate and
log_transaction_sample_rate uses 1.0 instead of 1, like "1.0 logs all
statements from all transactions, 0 never logs". If we need this
change I think it should be a separate patch.



Sorry, I changed that because someone suggested using either "0" and "1", or 
"0.0" and "1.0", but not mixing both. I will remove this change.


Thanks for your review.





Patch attached with all changes requested.
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d383de2512..99e38b32d1 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5836,6 +5836,31 @@ local0.*/var/log/postgresql

   
 
+ 
+  log_transaction_sample_rate (real)
+  
+   log_transaction_sample_rate configuration parameter
+  
+  
+   
+
+ Set the fraction of transactions whose statements are logged. It applies
+ to each new transaction regardless of their duration. The default is
+ 0, meaning do not log statements from this transaction.
+ Setting this to 1 logs all statements for all transactions.
+ log_transaction_sample_rate is helpful to track a
+ sample of transactions.
+
+   
+
+ This option can add overhead when transactions are large (many short queries).
+ Or, when the workload is high and log_transaction_sample_rate
+ is close to 1.
+
+   
+   
+ 
+
  
 
 
diff --git a/src/backend/ac

Re: Usage of epoch in txid_current

2019-03-27 Thread Heikki Linnakangas

On 27/03/2019 13:44, Thomas Munro wrote:

* I tidied up the code that serialises transaction state.  It was
already hammering round pegs into square holes, and the previous patch
made that even worse, so I added a new struct
SerializedTransactionState to do this properly.


Ah, good, it used to be confusing.


I still need to look into Andres's suggestion about getting rid of
epoch from various user interfaces and showing 64 bit numbers.  I
should probably also find a place in the relevant README to explain
this new scheme.  I will post follow-up patches for those.


Once we have the FullTransactionId type and basic macros in place, I'm 
sure we could tidy up a bunch of code by using them. For example, 
TransactionIdInRecentPast() in walsender.c would be simpler, if the 
caller dealt with FullTransactionIds rather than xid+epoch. But it makes 
sense to do that separately.



+/*
+ * Advance nextFullXid to the value after a given xid.  The epoch is inferred.
+ * This must only be called during recovery or from two-phase start-up code.
+ */
+void
+AdvanceNextFullTransactionIdPastXid(TransactionId xid)
+{
+   FullTransactionId newNextFullXid;
+   TransactionId next_xid;
+   uint32  epoch;
+
+   /*
+* It is safe to read nextFullXid without a lock, because this is only
+* called from the startup process, meaning that no other process can
+* modify it.
+*/
+   Assert(AmStartupProcess());
+


This assertion fails on WAL replay in single-user mode:

$ bin/postgres --single -D data postgres
2019-03-27 14:32:35.058 EET [32359] LOG:  database system was 
interrupted; last known up at 2019-03-27 14:32:18 EET
2019-03-27 14:32:35.144 EET [32359] LOG:  database system was not 
properly shut down; automatic recovery in progress

2019-03-27 14:32:35.148 EET [32359] LOG:  redo starts at 0/15BB7B0
TRAP: FailedAssertion("!((MyAuxProcType == StartupProcess))", File: 
"varsup.c", Line: 269)

Aborted

- Heikki




Re: BUG #15708: RLS 'using' running as wrong user when called from a view

2019-03-27 Thread Dean Rasheed
On Mon, 25 Mar 2019 at 20:27, Stephen Frost  wrote:
>
> * Dean Rasheed (dean.a.rash...@gmail.com) wrote:
>
> > It looks like the best place to fix it is in
> > get_policies_for_relation(), since that's where all the policies to be
> > applied for a given RTE are pulled together. Patch attached.
>
> Yes, on a quick review, that looks like a good solution to me as well.
>

On second thoughts, it actually needs to be in
get_row_security_policies(), after making copies of the quals from the
policies, otherwise it would be scribbling on the copies from the
relcache. Actually that makes the code change a bit simpler too.

Regards,
Dean
diff --git a/src/backend/rewrite/rowsecurity.c b/src/backend/rewrite/rowsecurity.c
new file mode 100644
index 835540c..e9f532c
--- a/src/backend/rewrite/rowsecurity.c
+++ b/src/backend/rewrite/rowsecurity.c
@@ -47,6 +47,7 @@
 #include "nodes/pg_list.h"
 #include "nodes/plannodes.h"
 #include "parser/parsetree.h"
+#include "rewrite/rewriteDefine.h"
 #include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
 #include "rewrite/rowsecurity.h"
@@ -382,6 +383,13 @@ get_row_security_policies(Query *root, R
 	table_close(rel, NoLock);
 
 	/*
+	 * Copy checkAsUser to the row security quals and WithCheckOption checks,
+	 * in case they contain any subqueries referring to other relations.
+	 */
+	setRuleCheckAsUser((Node *) *securityQuals, rte->checkAsUser);
+	setRuleCheckAsUser((Node *) *withCheckOptions, rte->checkAsUser);
+
+	/*
 	 * Mark this query as having row security, so plancache can invalidate it
 	 * when necessary (eg: role changes)
 	 */
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
new file mode 100644
index 4b53401..d764991
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -3910,6 +3910,33 @@ DROP OWNED BY regress_rls_dob_role1;
 DROP POLICY p1 ON dob_t2; -- should succeed
 DROP USER regress_rls_dob_role1;
 DROP USER regress_rls_dob_role2;
+-- Bug #15708: view + table with RLS should check policies as view owner
+CREATE TABLE ref_tbl (a int);
+INSERT INTO ref_tbl VALUES (1);
+CREATE TABLE rls_tbl (a int);
+INSERT INTO rls_tbl VALUES (10);
+ALTER TABLE rls_tbl ENABLE ROW LEVEL SECURITY;
+CREATE POLICY p1 ON rls_tbl USING (EXISTS (SELECT 1 FROM ref_tbl));
+GRANT SELECT ON ref_tbl TO regress_rls_bob;
+GRANT SELECT ON rls_tbl TO regress_rls_bob;
+CREATE VIEW rls_view AS SELECT * FROM rls_tbl;
+ALTER VIEW rls_view OWNER TO regress_rls_bob;
+GRANT SELECT ON rls_view TO regress_rls_alice;
+SET SESSION AUTHORIZATION regress_rls_alice;
+SELECT * FROM ref_tbl; -- Permission denied
+ERROR:  permission denied for table ref_tbl
+SELECT * FROM rls_tbl; -- Permission denied
+ERROR:  permission denied for table rls_tbl
+SELECT * FROM rls_view; -- OK
+ a  
+
+ 10
+(1 row)
+
+RESET SESSION AUTHORIZATION;
+DROP VIEW rls_view;
+DROP TABLE rls_tbl;
+DROP TABLE ref_tbl;
 --
 -- Clean up objects
 --
diff --git a/src/test/regress/sql/rowsecurity.sql b/src/test/regress/sql/rowsecurity.sql
new file mode 100644
index ea83153..df8fe11
--- a/src/test/regress/sql/rowsecurity.sql
+++ b/src/test/regress/sql/rowsecurity.sql
@@ -1764,6 +1764,32 @@ DROP POLICY p1 ON dob_t2; -- should succ
 DROP USER regress_rls_dob_role1;
 DROP USER regress_rls_dob_role2;
 
+-- Bug #15708: view + table with RLS should check policies as view owner
+CREATE TABLE ref_tbl (a int);
+INSERT INTO ref_tbl VALUES (1);
+
+CREATE TABLE rls_tbl (a int);
+INSERT INTO rls_tbl VALUES (10);
+ALTER TABLE rls_tbl ENABLE ROW LEVEL SECURITY;
+CREATE POLICY p1 ON rls_tbl USING (EXISTS (SELECT 1 FROM ref_tbl));
+
+GRANT SELECT ON ref_tbl TO regress_rls_bob;
+GRANT SELECT ON rls_tbl TO regress_rls_bob;
+
+CREATE VIEW rls_view AS SELECT * FROM rls_tbl;
+ALTER VIEW rls_view OWNER TO regress_rls_bob;
+GRANT SELECT ON rls_view TO regress_rls_alice;
+
+SET SESSION AUTHORIZATION regress_rls_alice;
+SELECT * FROM ref_tbl; -- Permission denied
+SELECT * FROM rls_tbl; -- Permission denied
+SELECT * FROM rls_view; -- OK
+RESET SESSION AUTHORIZATION;
+
+DROP VIEW rls_view;
+DROP TABLE rls_tbl;
+DROP TABLE ref_tbl;
+
 --
 -- Clean up objects
 --


Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-27 Thread Rahila Syed
On Mon, 25 Mar 2019 at 22:23, Alvaro Herrera 
wrote:

> Here's v6 of this patch.  I have rebased on top of today's CLUSTER
> monitoring, as well as on table AM commits.  The latter caused a bit of
> trouble, as now the number of blocks processed by a scan is not as easy
> to get as before; I added a new entry point heapscan_get_blocks_done on
> heapam.c to help with that.  (I suppose this will need some fixups later
> on.)
>
> I removed the "M of N" phase labels that Robert didn't like; those were
> suggested by Rahila and upvoted by Amit L.  I'm of two minds about
> those.  If you care about those and want them back, please speak up.
>
> I see value in reporting those numbers because it gives user insight into
where we are in the whole process, without having to refer to the
documentation or the code. Besides, here we are also reporting facts,
as we do for the other metrics.

I agree that it will be most effective if the phases are carried out in
succession, which is not the case every time, and that its relevance varies
for each command, as mentioned upthread by Alvaro and Robert. But I feel
that as long as the documentation states that some phases overlap, some are
mutually exclusive and hence may be skipped, etc., reporting `phase number
versus total phases` does provide valuable information.
We are able to give the user a whole picture in addition to reporting
progress within phases.

Thank you,
-- 
Rahila Syed
Performance Engineer
2ndQuadrant
http://www.2ndQuadrant.com 
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_upgrade version checking questions

2019-03-27 Thread Christoph Berg
Re: Daniel Gustafsson 2019-03-26 

> 0003 - Make -B default to CWD and remove the exec_path check
> 
> Defaulting to CWD for the new bindir has the side effect that the default
> sockdir is in the bin/ directory which may be less optimal.

Hmm, I would have thought that the default for the new bindir is the
directory where pg_upgrade is located, not the CWD, which is likely to
be ~postgres or the like?

On Debian, the incantation is

/usr/lib/postgresql/12/bin/pg_upgrade \
  -b /usr/lib/postgresql/11/bin \
  -B /usr/lib/postgresql/12/bin<-- should be redundant

Christoph




Re: Transaction commits VS Transaction commits (with parallel) VS query mean time

2019-03-27 Thread Amit Kapila
On Wed, Mar 27, 2019 at 6:53 AM Haribabu Kommi  wrote:
> On Tue, Mar 26, 2019 at 9:08 PM Amit Kapila  wrote:
>>
>>   As part of this thread, maybe we can
>> just fix the case of the parallel cooperating transaction.
>
>
> With the current patch, all the parallel implementation transaction are 
> getting
> skipped, in my tests parallel workers are the major factor in the transaction
> stats counter. Even before parallelism, the stats of the autovacuum and etc
> are still present but their contribution is not greatly influencing the stats.
>
> I agree with you in fixing the stats with parallel workers and improve stats.
>

I was proposing to fix only the transaction started with
StartParallelWorkerTransaction by using is_parallel_worker flag as
discussed above.  I understand that it won't completely fix the
problem reported by you, but it will be a step in that direction.  My
main worry is that if we fix it the way you are proposing and we also
invent a new way to deal with all other internal transactions, then
the fix done by us now needs to be changed/reverted.  Note, that this
fix needs to be backpatched as well, so we should avoid doing any fix
which needs to be changed or reverted.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Fix XML handling with DOCTYPE

2019-03-27 Thread Ryan Lambert
Thanks for putting up with a new reviewer!

  --with-libxml is the PostgreSQL configure option to make it use libxml2.
>


>   The very web page http://xmlsoft.org/index.html says "The XML C parser
>   and toolkit of Gnome: libxml" and is all about libxml2.
>


> So I think I was unsure what convention to follow, and threw up my hands
> and went with libxml. I could just as easily throw them up and go with
> libxml2. Which do you think would be better?


I think leaving it as libxml makes sense with all that.  Good point that
--with-libxml is used to build so I think staying with that works and is
consistent.  I agree that having this point included does clarify the how
and why of the limitations of this implementation.

I also over-parenthesize so I'm used to looking for that in my own
writing.  The full sentences were the ones that seemed excessive to me, I
think the others are ok and I won't nit-pick either way on those (unless
you want me to!).

If you agree, I should go through and fix my nodesets to be node-sets.


Yes, I like node-sets better, especially knowing it conforms to the spec's
language.

Thanks,

*Ryan Lambert*


On Tue, Mar 26, 2019 at 11:05 PM Chapman Flack 
wrote:

> On 03/26/19 21:39, Ryan Lambert wrote:
> > The following review has been posted through the commitfest application:
> > make installcheck-world:  tested, passed
> > Implements feature:   tested, passed
> > Spec compliant:   not tested
> > Documentation:tested, passed
>
> Thanks for the review!
>
> > I have two recommendations for features.sgml.  You state:
> >
> >>  relies on the libxml library
> >
> > Should this be clarified as the libxml2 library?  That's what I installed
> > to build postgres from source (Ubuntu 16/18).  If it is the libxml
> library
> > and the "2" is irrelevant
>
> That's a good catch. I'm not actually sure whether there is any "libxml"
> library that isn't libxml2. Maybe there was once and nobody admits to
> hanging out with it. Most Google hits on "libxml" seem to be modules
> that have libxml in their names and libxml2 as their actual dependency.
>
>   Perl XML:LibXML: "This module is an interface to libxml2"
>   Haskell libxml: "Binding to libxml2"
>   libxml-ruby: "The Libxml-Ruby project provides Ruby language bindings
> for the GNOME Libxml2 ..."
>
>   --with-libxml is the PostgreSQL configure option to make it use libxml2.
>
>   The very web page http://xmlsoft.org/index.html says "The XML C parser
>   and toolkit of Gnome: libxml" and is all about libxml2.
>
> So I think I was unsure what convention to follow, and threw up my hands
> and went with libxml. I could just as easily throw them up and go with
> libxml2. Which do you think would be better?
>
> On 03/26/19 23:52, Tom Lane wrote:
> > Do we need to mention that at all?  If you're not building from source,
> > it doesn't seem very interesting ... but maybe I'm missing some reason
> > why end users would care.
>
> The three places I've mentioned it were the ones where I thought users
> might care:
>
>  - why are we stuck at XPath 1.0? It's what we get from the library we use.
>
>  - in what order do we get things out from a (hmm) node-set? Per XPath 1.0,
>it's indeterminate (it's a set!), unlike XPath 2.0/XQuery where there's
>a sequence type and you have order control. Observable behavior from
>libxml2 (and you could certainly want to know this) is you get things
> out
>in document order, whether that's what you wanted or not, even though
>this is undocumented, and even counter-documented[1], libxml2 behavior.
>So it's an example of something you would fundamentally like to know,
>where the only available answer depends precariously on the library
>we happen to be using.
>
>  - which limits in our implementation are inherent to the library, and
>which are just current limits in our embedding of it? (Maybe this is
>right at the border of what a user would care to know, but I know it's
>a question that crosses my mind when I bonk into a limit I wasn't
>expecting.)
>
> > There are a few places where the parenthesis around a block of text
> > seem unnecessary.
>
> )blush( that's a long-standing wart in my writing ... seems I often think
> in parentheses, then look and say "those aren't needed" and take them out,
> only sometimes I don't.
>
> I skimmed just now and found a few instances of parenthesized whole
> sentence: the one you quoted, and some (if argument is null, the result
> is null), and (No rows will be produced if ). Shall I deparenthesize
> them all? Did you have other instances in mind?
>
> > It seems you are standardizing from "node set" to "nodeset", is that
> > the preferred nomenclature or preference?
>
> Another good catch. I remember consciously making a last pass to get them
> all consistent, and I wanted them consistent with the spec, and I see now
> I messed up.
>
> XPath 1.0 [2] has zero instances of "nodeset", two of "node set" and about

Re: Usage of epoch in txid_current

2019-03-27 Thread Thomas Munro
On Tue, Mar 26, 2019 at 12:58 PM Thomas Munro  wrote:
> On Tue, Mar 26, 2019 at 3:23 AM Heikki Linnakangas  wrote:
> > Looks good.

I did some testing and proof-reading and made a few minor changes:

* I tidied up the code that serialises transaction state.  It was
already hammering round pegs into square holes, and the previous patch
made that even worse, so I added a new struct
SerializedTransactionState to do this properly.

* I open-coded Get{Current,Top}TransactionId[IfAny](), rather than
having them call the "Full" variants, so that nobody could accuse me
of adding an extra function call that might not be inlined.  It's just
a couple of lines anyway.

* I kept the name GetNewTransactionId(), since it's referred to in
many places in comments etc.  Previously I had called it
GetNewFullTransactionId() and had GetNewTransactionId() just call that
and truncate to 32 bits, but there wasn't much point without an
in-tree caller for the narrow version.  If there is any out-of-tree
code calling this, it will now fail to compile thanks to our
non-convertible return type.
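
(The non-convertible return type comes from wrapping the 64 bit value in a
single-member struct rather than using a plain integer typedef.  As a
simplified sketch of the idea, not the exact patch contents:

typedef struct FullTransactionId
{
	uint64		value;		/* epoch in high 32 bits, xid in low 32 bits */
} FullTransactionId;

A struct cannot be implicitly converted to a plain TransactionId, so such
callers get a compile error rather than a silently truncated xid.)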

These are the patches I'm planning to push tomorrow.

I still need to look into Andres's suggestion about getting rid of
epoch from various user interfaces and showing 64 bit numbers.  I
should probably also find a place in the relevant README to explain
this new scheme.  I will post follow-up patches for those.

-- 
Thomas Munro
https://enterprisedb.com


0001-Add-basic-infrastructure-for-64-bit-transaction-I-v8.patch
Description: Binary data


0002-Use-FullTransactionId-for-the-transaction-stack-v8.patch
Description: Binary data


Re: jsonpath

2019-03-27 Thread Alexander Korotkov
On Tue, Mar 26, 2019 at 6:06 PM Alexander Korotkov
 wrote:
> On Tue, Mar 26, 2019 at 5:32 PM Tom Lane  wrote:
> > Alexander Korotkov  writes:
> > > Got access to that buildfarm animal thanks to Tom Turelinckx.  Now
> > > running check-world in a loop on the same commit hash with same build
> > > options.  Error wasn't triggered yet.
> >
> > I notice that snapper is using force_parallel_mode = regress ...
> > have you got that enabled in your manual test?
>
> Nope.  Thank you for pointing!  I've rerun my test loop with this...

Still no reproduction.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [Patch] Base backups and random or zero pageheaders

2019-03-27 Thread Michael Banck
Hi,

Am Dienstag, den 26.03.2019, 19:23 +0100 schrieb Michael Banck:
> Am Dienstag, den 26.03.2019, 10:30 -0700 schrieb Andres Freund:
> > On 2019-03-26 18:22:55 +0100, Michael Banck wrote:
> > >   /*
> > > -  * Only check pages which have not been 
> > > modified since the
> > > -  * start of the base backup. Otherwise, they 
> > > might have been
> > > -  * written only halfway and the checksum would 
> > > not be valid.
> > > -  * However, replaying WAL would reinstate the 
> > > correct page in
> > > -  * this case. We also skip completely new 
> > > pages, since they
> > > -  * don't have a checksum yet.
> > > +  * We skip completely new pages after checking 
> > > they are
> > > +  * all-zero, since they don't have a checksum 
> > > yet.
> > >*/
> > > - if (!PageIsNew(page) && PageGetLSN(page) < 
> > > startptr)
> > > + if (PageIsNew(page))
> > >   {
> > > - checksum = pg_checksum_page((char *) 
> > > page, blkno + segmentno * RELSEG_SIZE);
> > > - phdr = (PageHeader) page;
> > > - if (phdr->pd_checksum != checksum)
> > > + all_zeroes = true;
> > > + pagebytes = (size_t *) page;
> > > + for (int i = 0; i < (BLCKSZ / 
> > > sizeof(size_t)); i++)
> > 
> > Can we please abstract the zeroeness check into a separate function to
> > be used both by PageIsVerified() and this?
> 
> Ok, done so as PageIsZero further down in bufpage.c.

It turns out that pg_checksums (current master and back branches, not
just the online version) needs this treatment as well, as it won't catch
zeroed-out page header corruption; see the attached patch to its TAP
tests, which triggers it (I also added a random-data check similar to
pg_basebackup's, which is not a problem for the current codebase).

Any suggestion on how to handle this? Should I duplicate the
PageIsZero() code in pg_checksums? Should I move PageIsZero into
something like bufpage_impl.h for use by external programs, similar to
pg_checksum_page()?

I've done the latter as a POC in the second attached patch.
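
The shared helper would essentially be the same word-by-word loop as in the
base backup patch, made available as a static inline to both backend and
frontend code; roughly like this (sketch only, exact header and naming
still open):

static inline bool
PageIsZero(const char *page)
{
	const size_t *pagebytes = (const size_t *) page;

	/* Check the page word by word; any non-zero word means not empty. */
	for (int i = 0; i < (BLCKSZ / sizeof(size_t)); i++)
	{
		if (pagebytes[i] != 0)
			return false;
	}

	return true;
}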


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to the following provisions:
https://www.credativ.de/datenschutz

diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 41575c5245..680c26f2d6 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 62;
+use Test::More tests => 78;
 
 
 # Utility routine to create and check a table with corrupted checksums
@@ -17,6 +17,9 @@ sub check_relation_corruption
 	my $node = shift;
 	my $table = shift;
 	my $tablespace = shift;
+	my $offset = shift;
+	my $corrupted_data = shift;
+	my $description = shift;
 	my $pgdata = $node->data_dir;
 
 	$node->safe_psql('postgres',
@@ -44,8 +47,8 @@ sub check_relation_corruption
 
 	# Time to create some corruption
 	open my $file, '+<', "$pgdata/$file_corrupted";
-	seek($file, $pageheader_size, 0);
-	syswrite($file, "\0\0\0\0\0\0\0\0\0");
+	seek($file, $offset, 0);
+	syswrite($file, $corrupted_data);
 	close $file;
 
 	# Checksum checks on single relfilenode fail
@@ -54,14 +57,14 @@ sub check_relation_corruption
 			  1,
 			  [qr/Bad checksums:.*1/],
 			  [qr/checksum verification failed/],
-			  "fails with corrupted data for single relfilenode on tablespace $tablespace");
+			  "fails with corrupted data for single relfilenode $description");
 
 	# Global checksum checks fail as well
 	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata],
 			  1,
 			  [qr/Bad checksums:.*1/],
 			  [qr/checksum verification failed/],
-			  "fails with corrupted data on tablespace $tablespace");
+			  "fails with corrupted data on $description");
 
 	# Drop corrupted table again and make sure there is no more corruption.
 	$node->start;
@@ -155,8 +158,12 @@ $node->start;
 command_fails(['pg_checksums', '--check', '-D', $pgdata],
 			  "fails with online cluster");
 
+# Set page header and block size
+my $pageheader_size = 24;
+my $block_size = $node->safe_psql('postgres', 'SHOW block_size;');
+
 # Check corruption of table on default tablespace.
-check_relation

Re: Planning counters in pg_stat_statements (using pgss_store)

2019-03-27 Thread Julien Rouhaud
On Wed, Mar 27, 2019 at 12:21 AM legrand legrand
 wrote:
>
> here is a new version:
>
> - "track_planning" GUC added
> to permit to keep previous behavior unchanged

good

> - trailing whitespaces and comments wider than 80 characters
>  not fixed

why?  In case it's not clear, I'm talking about the .c file, not the
regression tests.

> - "Assert(planning_time > 0 && total_time > 0);"
>  moved at the beginning of pgss_store

Have you tried to actually compile postgres and pg_stat_statements
with --enable-cassert?  This test can *never* be true, since you
either provide the planning time or the execution time or neither.  As
I said in my previous mail, adding a parameter to say which counter
you're updating, instead of adding another counter that's mutually
exclusive with the other would make everything clearer.
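
For instance something like this (only an illustration of the idea, the
names are up to you):

typedef enum pgssStoreKind
{
	PGSS_PLAN,		/* pgss_store() called with planning counters */
	PGSS_EXEC		/* pgss_store() called with execution counters */
} pgssStoreKind;

pgss_store() would then take a pgssStoreKind plus a single time argument,
instead of two mutually exclusive time arguments.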




Re: amcheck verification for GiST

2019-03-27 Thread Andrey Borodin
Hi!

Here's a new version of GiST amcheck, which takes into account the recently
committed GiST VACUUM.
It tests that deleted pages do not contain any data.

Also, Heikki's fix applied (wrong OffsetNumberNext(i) replaced by 
OffsetNumberNext(o)).

Thanks!

Best regards, Andrey Borodin.


0001-GiST-verification-function-for-amcheck-v6.patch
Description: Binary data


Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove pgbench "progress" test pending solution of its timing is (fwd)

2019-03-27 Thread Fabien COELHO



Hello Heikki,

I stared at the new test case for a while, and I must say it looks very 
cryptic. It's not exactly this patch's fault - all the pgbench tests are 
cryptic -


Perl is cryptic. Regexprs are cryptic.

but I think we need to do something about that before adding any more 
tests. I'm not sure what exactly, but I'd like them to be more like 
pg_regress tests, where you have an expected output and you compare it 
with the actual output. I realize that's not easy, because there are a 
lot of varying numbers in the output, but we've got to do something.


As a good first step, I wish the pgbench() function used named 
arguments. [...]


You would have something like this:

my $elapsed = pgbench(
 test_name => 'pgbench progress',
 opts => '-T 2 -P 1 -l --aggregate-interval=1'


I do not like them much in perl because it changes the code significantly, 
but why not. That would be another patch anyway.


A lighter but efficient option would be to add a few comments on the 
larger calls, see attached.


My other complaint about the new test is that it does nothing to check 
if the output looks sensible. That's even harder to test, so it's 
probably not worth the trouble to try. But as it is, how much value does 
the test really have?


I do not understand. The list of expected regexprs on stdout and stderr 
*are* the checks on the outputs, and there are quite a few of them?


It would fail, if --progress caused pgbench to crash, or if no progress 
reports were printed at all, but I can't get very excited about that.


I cannot help you on this one: tests are never exciting:-)

The point is to improve code coverage, so that if a change breaks 
something the tests should help notice it before field reports.


There is nothing "exciting" about that, indeed.

The current test status is that one could write a dozen abort() calls in 
the code and it would pass the tests.


Current test coverage is much too low all over the project, see 
https://coverage.postgresql.org/ which looks abysmal to me. I would like to 
put pgbench in the green. I do think that the whole project should be in the 
green when all tests are run. Although coverage is not everything there is 
to testing, it is a good start.


--
Fabien.




Re: Log a sample of transactions

2019-03-27 Thread Adrien NAYRAT

Hello,

On 3/27/19 7:05 AM, Masahiko Sawada wrote:

Why are both of these checked? ISTM once xact_is_sampled is set, we
ought not to respect a different value of log_xact_sample_rate for the
rest of that transaction.

I added these checks to allow disabling logging during a sampled
transaction, per a suggestion of Masahiko Sawada:
https://www.postgresql.org/message-id/CAD21AoD4t%2BhTV6XfK5Yz%3DEocB8oMyJSYFfjAryCDYtqfib2GrA%40mail.gmail.com

I added a comment to explain why transaction logging is rechecked.

Umm, I'm inclined to think that we should not disable logging within
the transaction. If we allow that, since users can disable the logging
within the transaction by setting the rate to 0, they may think that
the rate can be changed during the transaction, which is wrong. If we
want this behavior we should document it, but we could also make the
user unable to change the rate during the transaction to avoid
confusion, and the latter looks more understandable and
straightforward. This is a different comment from what I made before,
I'm sorry for confusing you.


Don't worry, I will change that.





As far as I can tell xact_is_sampled is not properly reset in case of
errors?


I am not sure if I should disable logging in case of errors. Actually we
have:

postgres=# set log_transaction_sample_rate to 1;
SET
postgres=# set client_min_messages to 'LOG';
LOG:  duration: 0.392 ms  statement: set client_min_messages to 'LOG';
SET
postgres=# begin;
LOG:  duration: 0.345 ms  statement: begin;
BEGIN
postgres=# select 1;
LOG:  duration: 0.479 ms  statement: select 1;
 ?column?
----------
        1
(1 row)

postgres=# select * from missingtable;
ERROR:  relation "missingtable" does not exist
LINE 1: select * from missingtable;
^
postgres=# select 1;
ERROR:  current transaction is aborted, commands ignored until end of
transaction block
postgres=# rollback;
LOG:  duration: 11390.295 ms  statement: rollback;
ROLLBACK

If I reset xact_is_sampled (after the first error inside
AbortTransaction if I am right), "rollback" statement will not be
logged. I wonder if this "statement" should be logged?

If the answer is no, I will reset xact_is_sampled in AbortTransaction.


FWIW, I'd prefer to log "rollback" as well so that the user can recognize
the end of the transaction.


Ok.



+   /* Determine if statements are logged in this transaction */
+   xact_is_sampled = random() <= log_xact_sample_rate * MAX_RANDOM_VALUE;

If log_xact_sample_rate is 1 we always log all statements in the
transaction regardless of the value of random(). Similarly if it's 0, we
never log. I think we can avoid the unnecessary random() call in both
cases, like log_statement_sample_rate does.


I agree, I will change that.
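
Probably something along these lines (untested sketch), so that random()
is only called when the rate is strictly between 0 and 1:

	xact_is_sampled = log_xact_sample_rate != 0 &&
		(log_xact_sample_rate == 1 ||
		 random() <= log_xact_sample_rate * MAX_RANDOM_VALUE);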



 {"log_statement_sample_rate", PGC_SUSET, LOGGING_WHEN,
 gettext_noop("Fraction of statements over
log_min_duration_statement to log."),
 gettext_noop("If you only want a sample, use a
value between 0 (never "
-"log) and 1.0 (always log).")
+"log) and 1 (always log).")
 },
 &log_statement_sample_rate,
 1.0, 0.0, 1.0,
 NULL, NULL, NULL
 },

This change is not relevant to the log_transaction_sample_rate feature,
so why do we need it? In postgresql.conf.sample the descriptions of both
log_statement_sample_rate and
log_transaction_sample_rate use 1.0 instead of 1, like "1.0 logs all
statements from all transactions, 0 never logs". If we do need this
change I think it should be a separate patch.



Sorry, I changed that because someone suggested using either "0" and "1", 
or "0.0" and "1.0", but not mixing both. I will remove this change.


Thanks for your review.





Re: txid_status() off-by-one error

2019-03-27 Thread Thomas Munro
On Wed, Mar 27, 2019 at 7:55 PM Thomas Munro  wrote:
> If you keep asking for txid_status(txid_current() + 1) in new
> transactions, you eventually hit:
>
> ERROR:  could not access status of transaction 32768
> DETAIL:  Could not read from file "pg_xact/" at offset 8192: No error: 0.

> -   || (xid_epoch == now_epoch && xid > now_epoch_last_xid))
> +   || (xid_epoch == now_epoch && xid >= now_epoch_last_xid))

Pushed and back-patched, along with renaming of that variable, s/last/next/.

-- 
Thomas Munro
https://enterprisedb.com




Re: Protect syscache from bloating with negative cache entries

2019-03-27 Thread Kyotaro HORIGUCHI
At Mon, 25 Mar 2019 09:28:57 -0400, Robert Haas  wrote 
in 
> On Thu, Mar 7, 2019 at 11:40 PM Ideriha, Takeshi
>  wrote:
> > Just to be sure, we introduced the LRU list in this thread to find the 
> > entries less than threshold time
> > without scanning the whole hash table. If hash table becomes large without 
> > LRU list, scanning time becomes slow.
> 
> Hmm.  So, it's a trade-off, right?  One option is to have an LRU list,
> which imposes a small overhead on every syscache or catcache operation
> to maintain the LRU ordering.  The other option is to have no LRU
> list, which imposes a larger overhead every time we clean up the
> syscaches.  My bias is toward thinking that the latter is better,
> because:
> 
> 1. Not everybody is going to use this feature, and
> 
> 2. Syscache cleanup should be something that only happens every so
> many minutes, and probably while the backend is otherwise idle,
> whereas lookups can happen many times per millisecond.
> 
> However, perhaps someone will provide some evidence that casts a
> different light on the situation.

It's closer to my feeling. When the cache is enlarged, all entries
are copied into a new hash twice the size. If some entries have been
removed, we don't need to duplicate the whole hash; otherwise it means
that we do an extra scan. We don't run the pruning scan more frequently
than the interval, so it is not a bad trade.

> I don't see much point in continuing to review this patch at this
> point.  There's been no new version of the patch in 3 weeks, and there
> is -- in my view at least -- a rather frustrating lack of evidence
> that the complexity this patch introduces is actually beneficial.  No
> matter how many people +1 the idea of making this more complicated, it
> can't be justified unless you can provide a test result showing that
> the additional complexity solves a problem that does not get solved
> without that complexity.  And even then, who is going to commit a
> patch that uses a design which Tom Lane says was tried before and
> stunk?

Hmm. Anyway, it was hit by a recent commit. I'll post a rebased
version and a version reverted to do a whole-hash scan. Then I'll take
numbers as far as I can and show the results tomorrow.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: Ordered Partitioned Table Scans

2019-03-27 Thread Amit Langote
On 2019/03/27 15:48, Amit Langote wrote:
> Hi David,
> 
> Sorry if this was discussed before, but why does this patch add any new
> code to partprune.c?  AFAICT, there's no functionality changes to the
> pruning code.
> 
> Both
> 
> +bool
> +partkey_is_bool_constant_for_query(RelOptInfo *partrel, int partkeycol)
> 
> and
> 
> +static bool
> +matches_boolean_partition_clause(RestrictInfo *rinfo, int partkeycol,
> + RelOptInfo *partrel)
> 
> seem like their logic is specialized enough to be confined to pathkeys.c,
> only because it's needed there.
> 
> 
> Regarding
> 
> +bool
> +partitions_are_ordered(PlannerInfo *root, RelOptInfo *partrel)
> 
> I think this could simply be:
> 
> bool
> partitions_are_ordered(PartitionBoundInfo *boundinfo)
> 
> and be defined in partitioning/partbounds.c.  If you think any future
> modifications to this will require access to the partition key info in
> PartitionScheme, maybe the following is fine:
> 
> bool
> partitions_are_ordered(RelOptInfo *partrel)

Noticed a typo.

+ * multiple subpaths then we can't make guarantees about the
+ * order tuples in those subpaths, so we must leave the

order of tuples?

Again, sorry if this was discussed, but I got curious about why
partitions_are_ordered() thinks it can say true even for an otherwise
ordered list-partitioned table, but containing a null-only partition,
which is *always* scanned last.  If a query says ORDER BY partkey NULLS
FIRST, then it's not alright to proceed with assuming partitions are
ordered even if partitions_are_ordered() said so.

Related, why does build_partition_pathkeys() pass what it does for
nulls_first parameter of make_pathkey_from_sortinfo()?

 cpathkey = make_pathkey_from_sortinfo(root,
   keyCol,
   NULL,
   partscheme->partopfamily[i],
   partscheme->partopcintype[i],
   partscheme->partcollation[i],
   ScanDirectionIsBackward(scandir),
 ==>   ScanDirectionIsBackward(scandir),
   0,
   partrel->relids,
   false);

I think null values are almost always placed in the last partition, unless
the null-accepting list partition also accepts some other non-null value.
I'm not sure exactly how we can determine the correct value to pass here,
but what's there in the patch now doesn't seem to be it.

Thanks,
Amit




