Re: Berserk Autovacuum (let's save next Mandrill)

2019-09-12 Thread Masahiko Sawada
On Tue, Sep 10, 2019 at 8:19 PM Amit Kapila  wrote:
>
> On Tue, Jul 23, 2019 at 1:53 PM Masahiko Sawada  wrote:
> >
> >
> > To invoke autovacuum even on insert-only tables we would need to check
> > the number of inserted tuples since last vacuum. I think we can keep
> > track of the number of inserted tuples since last vacuum to the stats
> > collector and add the threshold to invoke vacuum with INDEX_CLEANUP =
> > false. If an autovacuum worker confirms that the number of inserted
> > tuples exceeds the threshold it invokes vacuum with INDEX_CLEANUP =
> > false. However if the number of dead tuples also exceeds the
> > autovacuum thresholds (autovacuum_vacuum_threshold and
> > autovacuum_vacuum_scale_factor) it should invoke vacuum with
> > INDEX_CLEANUP = true. Therefore new threshold makes sense only when
> > it's lower than the autovacuum thresholds.
> >
> > I guess we can have one new GUC parameter to control scale factor.
> > Since only relatively large tables will require this feature we might
> > not need the threshold based on the number of tuples.
> >
>
> Generally speaking, having more guc's for autovacuum and that too
> > which are in some way dependent on existing guc's sounds a bit scary,
> but OTOH whatever you wrote makes sense and can help the scenarios
> which this thread is trying to deal with.   Have you given any thought
> to what Alvaro mentioned up-thread "certain tables would have some
> sort of partial scan that sets the visibility map.  There's no reason
> to invoke the whole vacuuming machinery.  I don't think this is
> limited to append-only tables, but
> rather those are just the ones that are affected the most."?
>

Speaking of partial scans, I've considered before that we could use WAL
to find which pages have a lot of garbage, instead of just looking for
pages that are not all-visible. We could then vacuum only the part of the
table where garbage collection is most effective, instead of the whole
table. I've shared some results of that at PGCon and it's still in PoC
state.

Also, to address the issue of updating the VM of mostly-append-only
tables I considered some possible solutions:

1. With INDEX_CLEANUP = false and TRUNCATE = false, vacuum does HOT
pruning, vacuums the table and updates the VM. In addition to updating
the VM we need to do the other two operations, but since mostly-insert-only
tables would have little garbage, the HOT pruning and table vacuuming
should be a light workload. This is what I proposed up-thread.
2. This may have already been discussed before, but we could update the
VM when HOT pruning during a SELECT operation. Since this affects SELECT
performance it should be enabled only on particular tables by user
request.
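
In pseudo-C, the interplay of the two thresholds would look roughly like
this (the new GUC, the new stats counter and the helper below are made-up
names, not an actual patch):

#include "postgres.h"
#include "postmaster/autovacuum.h"
#include "utils/rel.h"

/* hypothetical new GUC and helper, for illustration only */
static double autovacuum_insert_scale = 0.2;
extern void vacuum_rel_with_index_cleanup(Relation rel, bool index_cleanup);

static void
decide_vacuum_for_inserts(Relation rel, float4 reltuples,
                          int64 n_dead_tuples,
                          int64 n_ins_since_vacuum)    /* new stats counter */
{
    double      dead_threshold;
    double      insert_threshold;

    dead_threshold = autovacuum_vac_thresh +
                     autovacuum_vac_scale * reltuples;
    insert_threshold = autovacuum_insert_scale * reltuples;

    if (n_dead_tuples > dead_threshold)
        vacuum_rel_with_index_cleanup(rel, true);      /* normal vacuum */
    else if (n_ins_since_vacuum > insert_threshold)
        vacuum_rel_with_index_cleanup(rel, false);     /* INDEX_CLEANUP = false */
}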

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-12 Thread Michael Paquier
On Fri, Sep 13, 2019 at 09:59:40AM +0530, Amit Kapila wrote:
> I think that is what we have not done in one of the cases pointed by me.

Thinking more about it, I see your point now.  HEAP_LOCKED_UPGRADED is
not a direct combination of the other flags and depends on other
conditions, so we cannot make a combination of it with other things.
The three others don't have that problem.
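
For reference, the definition in htup_details.h is (roughly) the following,
which is why it cannot be treated as a plain combination of bits:

#define HEAP_LOCKED_UPGRADED(infomask) \
( \
     ((infomask) & HEAP_XMAX_IS_MULTI) != 0 && \
     ((infomask) & HEAP_XMAX_LOCK_ONLY) != 0 && \
     (((infomask) & (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)) == 0) \
)

It is only true for lock-only multixacts carrying none of the lock-strength
bits, i.e. tuples locked on 9.2 or earlier and then pg_upgrade'd.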

Attached is a patch addressing your suggestions.  This also removes the
use of HEAP_XMAX_IS_LOCKED_ONLY, which did not make complete sense
either as a "raw" flag.  While on it, the order of the flags can be
improved to better match the order in htup_details.h.

Does this patch address your concerns?
--
Michael
diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 6a09d46a57..76f02dbea2 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -91,7 +91,7 @@ FROM heap_page_items(get_raw_page('test1', 0)),
  LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2) m(flags);
  t_infomask | t_infomask2 |   flags   
 +-+---
-   2816 |   2 | {HEAP_XMAX_INVALID,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID}
+   2816 |   2 | {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID}
 (1 row)
 
 -- output the decoded flag HEAP_XMIN_FROZEN instead
@@ -100,7 +100,7 @@ FROM heap_page_items(get_raw_page('test1', 0)),
  LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2, true) m(flags);
  t_infomask | t_infomask2 |flags 
 +-+--
-   2816 |   2 | {HEAP_XMAX_INVALID,HEAP_XMIN_FROZEN}
+   2816 |   2 | {HEAP_XMIN_FROZEN,HEAP_XMAX_INVALID}
 (1 row)
 
 -- tests for decoding of combined flags
@@ -140,20 +140,7 @@ SELECT heap_tuple_infomask_flags(x'C000'::int, 0, true);
 SELECT heap_tuple_infomask_flags(x'C000'::int, 0, false);
heap_tuple_infomask_flags
 
- {HEAP_MOVED_IN,HEAP_MOVED_OFF}
-(1 row)
-
--- HEAP_LOCKED_UPGRADED = (HEAP_XMAX_IS_MULTI | HEAP_XMAX_LOCK_ONLY)
-SELECT heap_tuple_infomask_flags(x'1080'::int, 0, true);
- heap_tuple_infomask_flags 

- {HEAP_LOCKED_UPGRADED}
-(1 row)
-
-SELECT heap_tuple_infomask_flags(x'1080'::int, 0, false);
-heap_tuple_infomask_flags 
---
- {HEAP_XMAX_LOCK_ONLY,HEAP_XMAX_IS_MULTI}
+ {HEAP_MOVED_OFF,HEAP_MOVED_IN}
 (1 row)
 
 -- test all flags of t_infomask and t_infomask2
diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index 68f16cd400..c696d7d6d1 100644
--- a/contrib/pageinspect/heapfuncs.c
+++ b/contrib/pageinspect/heapfuncs.c
@@ -540,12 +540,8 @@ heap_tuple_infomask_flags(PG_FUNCTION_ARGS)
 		d[cnt++] = CStringGetTextDatum("HEAP_HASOID_OLD");
 	if ((t_infomask & HEAP_COMBOCID) != 0)
 		d[cnt++] = CStringGetTextDatum("HEAP_COMBOCID");
-	if ((t_infomask & HEAP_XMAX_COMMITTED) != 0)
-		d[cnt++] = CStringGetTextDatum("HEAP_XMAX_COMMITTED");
-	if ((t_infomask & HEAP_XMAX_INVALID) != 0)
-		d[cnt++] = CStringGetTextDatum("HEAP_XMAX_INVALID");
-	if ((t_infomask & HEAP_UPDATED) != 0)
-		d[cnt++] = CStringGetTextDatum("HEAP_UPDATED");
+	if ((t_infomask & HEAP_XMAX_LOCK_ONLY) != 0)
+		d[cnt++] = CStringGetTextDatum("HEAP_XMAX_LOCK_ONLY");
 
 	/* decode combined masks of t_infomaks */
 	if (decode_combined && (t_infomask & HEAP_XMAX_SHR_LOCK) != 0)
@@ -568,24 +564,23 @@ heap_tuple_infomask_flags(PG_FUNCTION_ARGS)
 			d[cnt++] = CStringGetTextDatum("HEAP_XMIN_INVALID");
 	}
 
+	if ((t_infomask & HEAP_XMAX_COMMITTED) != 0)
+		d[cnt++] = CStringGetTextDatum("HEAP_XMAX_COMMITTED");
+	if ((t_infomask & HEAP_XMAX_INVALID) != 0)
+		d[cnt++] = CStringGetTextDatum("HEAP_XMAX_INVALID");
+	if ((t_infomask & HEAP_XMAX_IS_MULTI) != 0)
+		d[cnt++] = CStringGetTextDatum("HEAP_XMAX_IS_MULTI");
+	if ((t_infomask & HEAP_UPDATED) != 0)
+		d[cnt++] = CStringGetTextDatum("HEAP_UPDATED");
+
 	if (decode_combined && (t_infomask & HEAP_MOVED) != 0)
 		d[cnt++] = CStringGetTextDatum("HEAP_MOVED");
 	else
 	{
-		if ((t_infomask & HEAP_MOVED_IN) != 0)
-			d[cnt++] = CStringGetTextDatum("HEAP_MOVED_IN");
 		if ((t_infomask & HEAP_MOVED_OFF) != 0)
 			d[cnt++] = CStringGetTextDatum("HEAP_MOVED_OFF");
-	}
-
-	if (decode_combined && HEAP_LOCKED_UPGRADED(t_infomask))
-		d[cnt++] = CStringGetTextDatum("HEAP_LOCKED_UPGRADED");
-	else
-	{
-		if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
-			d[cnt++] = CStringGetTextDatum("HEAP_XMAX_LOCK_ONLY");
-		if ((t_infomask & HEAP_XMAX_IS_MULTI) != 0)
-			d[cnt++] = CStringGetTextDatum("HEAP_XMAX_IS_MULTI");
+		if ((t_infomask & HEAP_MOVED_IN) != 0)
+			d[cnt++] = CStringGetTextDatum("HEAP_MOVED_IN");
 	}
 
 	/* decode t_infomask2 */
diff --git a/contrib/pageinspect/sql/page.sql b/contrib/pageinspect/sql/page.sql
index 

Re: Create collation reporting the ICU locale display name

2019-09-12 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Sep 12, 2019 at 03:03:43PM -0400, Tom Lane wrote:
>> The idea of having CREATE COLLATION automatically create a comment
>> is sort of interesting, although it seems pretty orthogonal to
>> normal command behavior.  I wonder whether the seeming need for
>> this indicates that we should add a descriptive field to pg_collation
>> proper, and not usurp the user-oriented comment feature for that.
>> 
>> The difficulty with localization is that whatever we put into
>> template1 has got to be ASCII-only, so that the template DB
>> can be copied to other encodings.  I suppose we could consider
>> having CREATE COLLATION act differently during initdb than
>> later, but that seems ugly too.

> Or could it make sense to provide a system function which returns a
> collation description for at least an ICU-provided one?  We could make
> use of that in psql for example.

Oh, that seems like a good way to tackle it.

regards, tom lane




Re: refactoring - share str2*int64 functions

2019-09-12 Thread Michael Paquier
On Tue, Sep 10, 2019 at 12:05:25PM +0900, Michael Paquier wrote:
> Attached is an updated patch.  How does it look?  I have left the
> parts of readfuncs.c alone for now, as there are more issues behind that
> than a single switch: short reads are one, long reads a second.  And
> the patch already does a lot.  There could be also an argument for
> having extra _check wrappers for the unsigned portions but these would
> be mostly unused in the backend code, so I have left that out on
> purpose.

I have looked at this patch again today after setting it aside for a
couple of days, and I quite like the resulting shape of the routines.
Does anybody else have any comments?  Would it make sense to further
extend the string-to-int conversion routines with a set of control flags
to bypass the removal of leading and trailing whitespace?
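
For illustration, a flag-controlled variant could look like the following
(a standalone sketch with made-up names and plain strtoll, not the actual
backend routines):

#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define STR2INT_SKIP_LEADING_WS    0x01
#define STR2INT_SKIP_TRAILING_WS   0x02

static bool
str2int64_flags(const char *str, int64_t *result, int flags)
{
    char       *end;

    if (flags & STR2INT_SKIP_LEADING_WS)
    {
        while (isspace((unsigned char) *str))
            str++;
    }
    else if (isspace((unsigned char) *str))
        return false;           /* be strict when the flag is not given */

    errno = 0;
    *result = strtoll(str, &end, 10);
    if (errno != 0 || end == str)
        return false;

    if (flags & STR2INT_SKIP_TRAILING_WS)
    {
        while (isspace((unsigned char) *end))
            end++;
    }

    return *end == '\0';
}
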
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-12 Thread Amit Kapila
On Fri, Sep 13, 2019 at 9:00 AM Michael Paquier  wrote:
>
> On Thu, Sep 12, 2019 at 05:24:17PM +0530, Amit Kapila wrote:
> > On Thu, Sep 12, 2019 at 4:48 PM Michael Paquier  wrote:
> > Hmm, I thought that when the decode_combined flag is set to false we
> > would display the raw flags set on the tuple without any further
> > interpretation.  I think that is what most people in the thread
> > advocated.
>
> Sorry if I created any confusion.  When set to false then the raw list
> of flags is returned, and that's the default.
>

I think that is what we have not done in one of the cases pointed by me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Berserk Autovacuum (let's save next Mandrill)

2019-09-12 Thread Amit Kapila
On Fri, Sep 13, 2019 at 8:52 AM Masahiko Sawada  wrote:
> On Tue, Sep 10, 2019 at 8:19 PM Amit Kapila  wrote:
> >
> >
> > Generally speaking, having more guc's for autovacuum and that too
> > which are in some way dependent on existing guc's sounds a bit scary,
> > but OTOH whatever you wrote makes sense and can help the scenarios
> > which this thread is trying to deal with.   Have you given any thought
> > to what Alvaro mentioned up-thread "certain tables would have some
> > sort of partial scan that sets the visibility map.  There's no reason
> > to invoke the whole vacuuming machinery.  I don't think this is
> > limited to append-only tables, but
> > rather those are just the ones that are affected the most."?
> >
>
> Speaking of partial scans, I've considered before that we could use WAL
> to find which pages have a lot of garbage, instead of just looking for
> pages that are not all-visible. We could then vacuum only the part of the
> table where garbage collection is most effective, instead of the whole
> table. I've shared some results of that at PGCon and it's still in PoC
> state.
>
> Also, to address the issue of updating the VM of mostly-append-only
> tables I considered some possible solutions:
>
> 1. With INDEX_CLEANUP = false and TRUNCATE = false, vacuum does HOT
> pruning, vacuums the table and updates the VM. In addition to updating
> the VM we need to do the other two operations, but since mostly-insert-only
> tables would have little garbage, the HOT pruning and table vacuuming
> should be a light workload. This is what I proposed up-thread.
>

Yes, this is an option, but it might be better if we can somehow avoid
triggering the vacuum machinery.

> 2. This may have already been discussed before, but we could update the
> VM when HOT pruning during a SELECT operation. Since this affects SELECT
> performance it should be enabled only on particular tables by user
> request.
>

Yeah, doing anything additional in SELECTs can be tricky.  Think of a
case where there is actually nothing to prune on the page; if we still
run the visibility checks and then mark the visibility map, that can be
a noticeable overhead.  OTOH, I think this will be a one-time overhead,
because after the first scan the visibility map will be updated and
future scans don't need to update it unless someone has updated that
page.  I was wondering why not do this during write workloads.  For
example, when an Insert operation finds that there is no space on the
current page and it has to move to the next page, it can check whether
the page (that doesn't have space to accommodate the current tuple) can
be marked all-visible.  In this case, we would have already done the
costly part of the operation, which is to read/lock the buffer.
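
As a rough sketch of that idea (the two helpers below are hypothetical,
this is not a patch):

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"
#include "utils/rel.h"

/* hypothetical helpers -- these do not exist today */
extern bool page_can_be_marked_all_visible(Relation rel, Buffer buf);
extern void mark_page_all_visible(Relation rel, Buffer buf);

/*
 * Called when an INSERT finds its current target page too full for the new
 * tuple; the buffer is already read and locked, so check opportunistically
 * whether the page can be marked all-visible before moving on.
 */
static void
maybe_set_all_visible_on_full_page(Relation rel, Buffer buf, Size needed)
{
    Page        page = BufferGetPage(buf);

    if (PageGetHeapFreeSpace(page) < needed &&
        page_can_be_marked_all_visible(rel, buf))
        mark_page_all_visible(rel, buf);
}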

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: pgbench - allow to create partitioned tables

2019-09-12 Thread Dilip Kumar
On Wed, Sep 11, 2019 at 6:08 PM Fabien COELHO  wrote:
> Attached an updated version.
I have reviewed the patch and done some basic testing.  It works as
expected.

I have a few cosmetic comments

1.
+ if (partitions >= 1)
+ {
+ char ff[64];
+ ff[0] = '\0';
+ append_fillfactor(ff, sizeof(ff));

Generally, we give one blank line between the variable declaration and
the first statement of the block.

2.
+ if (p == 1)
+ sprintf(minvalue, "minvalue");
+ else
+ sprintf(minvalue, INT64_FORMAT, (p-1) * part_size + 1);

(p-1) -> (p - 1)
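
For clarity, the same hunks with those two adjustments applied:

+ if (partitions >= 1)
+ {
+ char ff[64];
+
+ ff[0] = '\0';
+ append_fillfactor(ff, sizeof(ff));

+ if (p == 1)
+ sprintf(minvalue, "minvalue");
+ else
+ sprintf(minvalue, INT64_FORMAT, (p - 1) * part_size + 1);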

I am just wondering whether it would be a good idea to expand it to
support multi-level partitioning.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-12 Thread Michael Paquier
On Thu, Sep 12, 2019 at 05:24:17PM +0530, Amit Kapila wrote:
> On Thu, Sep 12, 2019 at 4:48 PM Michael Paquier  wrote:
> Hmm, I thought that when the decode_combined flag is set to false we
> would display the raw flags set on the tuple without any further
> interpretation.  I think that is what most people in the thread
> advocated.

Sorry if I created any confusion.  When set to false then the raw list
of flags is returned, and that's the default.  The example provided in
the docs is careful about that, as well as the description done for
the option (at least I guess so!).

> Yes, I think we could have more discussion on this point.  It is not
> 100% clear how we should interpret this flag and or where to draw a
> line.  It might be that whatever we have done is alright, but still,
> it is worth more discussion and opinion from a few more people.

Of course.

>> decode_combined sounds like a good compromise to me.  If there is a
>> better consensus, well, let's use it, but I don't find those
>> suggestions to be improvements.
> 
> I think it depends on the meaning of that flag.

Perhaps using "decode" is the confusing part here?  It is more like a
"merge" of the flags, or just a combination of them.  An idea that
just popped up here would be to name the switch "combine_flags" instead.
--
Michael


signature.asc
Description: PGP signature


Re: Create collation reporting the ICU locale display name

2019-09-12 Thread Michael Paquier
On Thu, Sep 12, 2019 at 03:03:43PM -0400, Tom Lane wrote:
> The idea of having CREATE COLLATION automatically create a comment
> is sort of interesting, although it seems pretty orthogonal to
> normal command behavior.  I wonder whether the seeming need for
> this indicates that we should add a descriptive field to pg_collation
> proper, and not usurp the user-oriented comment feature for that.
> 
> The difficulty with localization is that whatever we put into
> template1 has got to be ASCII-only, so that the template DB
> can be copied to other encodings.  I suppose we could consider
> having CREATE COLLATION act differently during initdb than
> later, but that seems ugly too.

Or could it make sense to provide a system function which returns a
collation description for at least an ICU-provided one?  We could make
use of that in psql for example.
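
As a rough sketch of what such a function could look like (the function
name is a placeholder, this assumes an ICU-enabled build, and it reuses
the icu_from_uchar() helper from pg_locale.c):

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"
#include "utils/pg_locale.h"
#include <unicode/uloc.h>

PG_FUNCTION_INFO_V1(pg_icu_collation_display_name);

Datum
pg_icu_collation_display_name(PG_FUNCTION_ARGS)
{
    const char *locale = text_to_cstring(PG_GETARG_TEXT_PP(0));
    UChar       displayname[128];
    char       *result;
    int32_t     len;
    UErrorCode  status = U_ZERO_ERROR;

    len = uloc_getDisplayName(locale, NULL, displayname,
                              lengthof(displayname), &status);
    if (U_FAILURE(status))
        PG_RETURN_NULL();

    /* convert the UTF-16 display name to the database encoding */
    icu_from_uchar(&result, displayname, len);
    PG_RETURN_TEXT_P(cstring_to_text(result));
}
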
--
Michael


signature.asc
Description: PGP signature


Re: psql - improve test coverage from 41% to 88%

2019-09-12 Thread Michael Paquier
On Thu, Sep 12, 2019 at 12:14:16PM -0300, Alvaro Herrera wrote:
> Mostly, because I think they're going to cause trouble.  Adding a
> parameter in the middle of the list may cause trouble for third-party
> users of TestLib.  I propose that we make the routines a bit smarter to
> cope with the API change: use named parameters instead.  And in order to
> do that without having to change existing users of command_check, make
> it so that the routine checks whether the parameter is a hashref, and
> behave differently.  So when called as in the existing callsites (five
> scalar parameters) it behaves as currently.

+1.
--
Michael


signature.asc
Description: PGP signature


[bug fix??] Fishy code in tts_virtual_copyslot()

2019-09-12 Thread Tsunakawa, Takayuki
Hello,

In the following code in execTuples.c, shouldn't srcdesc point to the source
slot's tuple descriptor?  The attached fix passes make check.  What kind of 
failure could this cause?

BTW, I thought that under the PostgreSQL coding convention, local variables
should be defined at the top of blocks, but this function writes "for (int
natt ...".  I didn't modify it because many other source files are written
that way.


--
static void
tts_virtual_copyslot(TupleTableSlot *dstslot, TupleTableSlot *srcslot)
{
TupleDesc   srcdesc = dstslot->tts_tupleDescriptor;

Assert(srcdesc->natts <= dstslot->tts_tupleDescriptor->natts);

tts_virtual_clear(dstslot);

slot_getallattrs(srcslot);

for (int natt = 0; natt < srcdesc->natts; natt++)
{
dstslot->tts_values[natt] = srcslot->tts_values[natt];
dstslot->tts_isnull[natt] = srcslot->tts_isnull[natt];
}
--
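
For reference, the proposed change presumably boils down to taking the
descriptor from the source slot (a guess at the attachment, which is not
inlined here):

    TupleDesc   srcdesc = srcslot->tts_tupleDescriptor;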


Regards
Takayuki Tsunakawa



virtslot_tiny_fix.patch
Description: virtslot_tiny_fix.patch


Re: Leakproofness of texteq()/textne()

2019-09-12 Thread Tom Lane
Robert Haas  writes:
> I wouldn't feel comfortable with ignoring information leaks that can
> happen with some valid strings but not others. That sounds like
> exactly the sort of information leak that we must prevent. The user
> can write arbitrary stuff in their query, potentially transforming
> strings so that the result hits the ERROR iff the original string had
> some arbitrary property P for which they wish to test.

Agreed.

> However, I wonder if there's any realistic case outside of an encoding
> conversion where such failures can occur. I would expect, perhaps
> naively, that the set of characters that can be represented by UTF-16
> is the same set as can be represented by UTF-8.

Unless Microsoft did something weird, that doesn't seem very likely.
I'm more worried about the possibility that ICU contains weird exception
cases that will make it fail to compare specific strings.  Noting
that ucnv_toUChars has an error indicator but ucol_strcoll does not,
it seems like again any such cases are going to boil down to
encoding conversion problems.

However, if there is some character C that makes ICU misbehave like
that, we are going to have problems with indexing strings containing C,
whether we think varstr_cmp is leaky or not.  So I'm not sure that
focusing our attention on leakiness is a helpful way to proceed.

regards, tom lane




Re: Leakproofness of texteq()/textne()

2019-09-12 Thread Robert Haas
On Thu, Sep 12, 2019 at 1:38 PM Tom Lane  wrote:
> In any case, from a purely theoretical viewpoint, such an error message
> *does* constitute a leak of information about the input strings.  Whether
> it's a usable leak is very debatable, but that's basically what we've
> got to decide.

I'm pretty content to ignore information leaks that can only happen if
the database is corrupt anyway.  If that's moving the goalposts at
all, it's about a quarter-inch.  I mean, a slightly differently
corrupted varlena could crash the database entirely.

I wouldn't feel comfortable with ignoring information leaks that can
happen with some valid strings but not others. That sounds like
exactly the sort of information leak that we must prevent. The user
can write arbitrary stuff in their query, potentially transforming
strings so that the result hits the ERROR iff the original string had
some arbitrary property P for which they wish to test. Allowing that
sounds no different than deciding that int4div is leakproof, which it
sure isn't.
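
To spell out why int4div is the usual counterexample: its error path
(paraphrased from src/backend/utils/adt/int.c) fires exactly when the
divisor is zero, so feeding a hidden value into the divisor position
reveals whether that value is zero:

    if (arg2 == 0)
        ereport(ERROR,
                (errcode(ERRCODE_DIVISION_BY_ZERO),
                 errmsg("division by zero")));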

However, I wonder if there's any realistic case outside of an encoding
conversion where such failures can occur. I would expect, perhaps
naively, that the set of characters that can be represented by UTF-16
is the same set as can be represented by UTF-8.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




abort-time portal cleanup

2019-09-12 Thread Robert Haas
I've been studying At{Sub,}{Abort,Cleanup}_Portals() for the last few
days and have come to the conclusion that the code is not entirely up
to our usual standards. I believe that a good deal of the reason for
this is attributable to the poor quality of the code comments in this
area, although there are perhaps some other contributing factors as
well, such as bullheadedness on my part and perhaps others.

The trouble starts with the header comment for AtAbort_Portals, which
states that "At this point we run the cleanup hook if present, but we
can't release the portal's memory until the cleanup call." At the time
this logic was introduced in commit
de28dc9a04c4df5d711815b7a518501b43535a26 (2003-05-02),
AtAbort_Portals() affected all non-held portals without caring whether
they were active or not, and UserAbortTransactionBlock() called
AbortTransaction() directly, so typing "ROLLBACK;" would cause
AtAbort_Portals() to be reached from within PortalRun().  Even if
PortalRun() managed to return without crashing, the caller would next
try to call PortalDrop() on what was now an invalid pointer. However,
commit 8f9f1986034a2273e09ad10671e10d1adda21d1f (2004-09-16) changed
things so that UserAbortTransactionBlock() just sets things up so that
the subsequent call to CommitTransactionCommand() would call
AbortTransaction() instead of trying to do it right away, and that
doesn't happen until after we're done with the portal.  As far as I
can see, that change made this comment mostly false, but the comment
has nevertheless managed to survive for another ~15 years. I think we
can, and in fact should, just drop the portal right here.

As far as actually making that work, there are a few wrinkles. The
first is that we might be in the middle of FATAL. In that case, unlike
the ROLLBACK case, a call to PortalRun() is still on the stack, but
we'll exit the process rather than returning, so the fact that we've
created a dangling pointer for the caller won't matter. However, as
shown by commit ad9a274778d2d88c46b90309212b92ee7fdf9afe (2018-02-01)
and the report that led up to it at
https://www.postgresql.org/message-id/20180128034531.h6o4w3727ifof3jy%40alap3.anarazel.de
it's not a good idea to try to clean up the portal in that case,
because we might've already started shutting down critical systems.
It seems not only risky but also unnecessary: our process-local state
is about to go away, and the executor shouldn't need to clean up any
shared memory state that won't also get cleaned up by some other
mechanism. So, it seems to me that if we reach AtAbort_Portals()
during FATAL processing, we should either (1) do nothing at all and
just return or (2) forget about all the existing portals without
cleaning them up and then return.  The second option seems a little
safer to me, because it guarantees that if we somehow reach code that
might try to look up a portal later, it won't find anything. But I
think it's arguable.

The second wrinkle is that there might be an active portal.  Apart
from the FATAL case already mentioned, I think the only way this can
happen is some C code that purposefully calls AbortTransaction()
in the middle of executing a command.  It can't be an ERROR, because
then the portal would be marked as failed before we get here, and it
can't be an explicit ROLLBACK, because as noted above, that case was
changed 15 years ago.  It's got to be some other case where C code
calls AbortTransaction() voluntarily in the middle of a statement. For
over a decade, there were no cases at all of this type, but the code
in this function catered to hypothetical cases by marking the portal
failed. By 2016, Noah had figured out that this was bogus, and that
any future cases would likely require different handling, but Tom and
I shouted him down:

http://postgr.es/m/67674.1454259...@sss.pgh.pa.us

The end result of that discussion was commit
41baee7a9312eefb315b6b2973ac058c9efaa9cf (2016-02-05) which left the
code as it was but added comments noting that it was wrong. It
actually wasn't entirely wrong, because it handled the FATAL case
mentioned above by the byzantine mechanism of invoking the portal's
cleanup callback after first setting the status to PORTAL_FAILED.
Since the only existing cleanup callback arranges to do nothing if the
status is PORTAL_FAILED, this worked out to a complicated way of
(correctly) skipping the callback in the FATAL case.

But, probably because that wasn't documented in any understandable
way, possibly because nobody really understood it, when commit
8561e4840c81f7e345be2df170839846814fa004 (2018-01-22) added support
for transaction control in procedures, it just removed the code
marking active portals as failed, just as Noah had predicted would be
necessary ~2 years earlier. Andres noticed that this broke the FATAL
case and tracked it back to the removal of this code, resulting in it
getting put back, but just for the FATAL case, in commit
ad9a274778d2d88c46b90309212b92ee7fdf9afe (2018-02-01). See 

Re: Fix XML handling with DOCTYPE

2019-09-12 Thread Alvaro Herrera
Hi Chapman,

On 2019-Sep-05, Chapman Flack wrote:

> Are these on your radar to maybe backpatch in this round of activity?
> 
> The latest patches I did for 11 and 10 are in
> https://www.postgresql.org/message-id/5D45A44F.8010803%40anastigmatix.net

Thanks!  I just pushed these to those branches.

I think we're finally done with these.  Many thanks for your
persistence.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: psql - add SHOW_ALL_RESULTS option

2019-09-12 Thread Alvaro Herrera
This v6 is just Fabien's v5, rebased over a very minor conflict, and
pgindented.  No further changes.  I've marked this Ready for Committer.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 6787ec1efd..de59a5cf65 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -49,17 +49,42 @@ BEGIN \;
 SELECT 2.0 AS "float" \;
 SELECT 'world' AS "text" \;
 COMMIT;
+ float 
+---
+   2.0
+(1 row)
+
+ text  
+---
+ world
+(1 row)
+
 -- compound with empty statements and spurious leading spacing
-\;\;   SELECT 3 + 3 \;\;\;   SELECT ' ' || ' !' \;\;   SELECT 1 + 4 \;;
- ?column? 
---
-5
+\;\;   SELECT 3 + 3 AS "+" \;\;\;   SELECT ' ' || ' !' AS "||" \;\;   SELECT 1 + 4 AS "+" \;;
+ + 
+---
+ 6
+(1 row)
+
+ ||  
+-
+   !
+(1 row)
+
+ + 
+---
+ 5
 (1 row)
 
 -- non ;-terminated statements
 SELECT 1 + 1 + 1 AS "add" \gset
 SELECT :add + 1 + 1 AS "add" \;
 SELECT :add + 1 + 1 AS "add" \gset
+ add 
+-
+   5
+(1 row)
+
 -- set operator
 SELECT 1 AS i UNION SELECT 2 ORDER BY i;
  i 
@@ -102,12 +127,12 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
  SELECT $1+| 4 |4
   +|   | 
AS "text"   |   | 
- SELECT $1 + $2| 2 |2
  SELECT $1 + $2 + $3 AS "add"  | 3 |3
+ SELECT $1 + $2 AS "+" | 2 |2
  SELECT $1 AS "float"  | 1 |1
  SELECT $1 AS "int"| 2 |2
  SELECT $1 AS i UNION SELECT $2 ORDER BY i | 1 |2
- SELECT $1 || $2   | 1 |1
+ SELECT $1 || $2 AS "||"   | 1 |1
  SELECT pg_stat_statements_reset() | 1 |1
  WITH t(f) AS (   +| 1 |2
VALUES ($1), ($2)  +|   | 
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 8b527070d4..ea3a31176e 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -27,7 +27,7 @@ SELECT 'world' AS "text" \;
 COMMIT;
 
 -- compound with empty statements and spurious leading spacing
-\;\;   SELECT 3 + 3 \;\;\;   SELECT ' ' || ' !' \;\;   SELECT 1 + 4 \;;
+\;\;   SELECT 3 + 3 AS "+" \;\;\;   SELECT ' ' || ' !' AS "||" \;\;   SELECT 1 + 4 AS "+" \;;
 
 -- non ;-terminated statements
 SELECT 1 + 1 + 1 AS "add" \gset
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 7789fc6177..4e6ab5a0a5 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -127,18 +127,11 @@ echo '\x \\ SELECT * FROM foo;' | psql
commands included in the string to divide it into multiple
transactions.  (See 
for more details about how the server handles multi-query strings.)
-   Also, psql only prints the
-   result of the last SQL command in the string.
-   This is different from the behavior when the same string is read from
-   a file or fed to psql's standard input,
-   because then psql sends
-   each SQL command separately.
   
   
-   Because of this behavior, putting more than one SQL command in a
-   single -c string often has unexpected results.
-   It's better to use repeated -c commands or feed
-   multiple commands to psql's standard input,
+   If having several commands executed in one transaction is not desired, 
+   use repeated -c commands or feed multiple commands to
+   psql's standard input,
either using echo as illustrated above, or
via a shell here-document, for example:
 
@@ -3355,10 +3348,8 @@ select 1\; select 2\; select 3;
 commands included in the string to divide it into multiple
 transactions.  (See 
 for more details about how the server handles multi-query strings.)
-psql prints only the last query result
-it receives for each request; in this example, although all
-three SELECTs are indeed executed, psql
-only prints the 3.
+psql prints all results it receives, one
+after the other.
 
 
   
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 4b2679360f..fa358c8c58 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -486,6 +486,16 @@ ResetCancelConn(void)
 #endif
 }
 
+static void
+ShowErrorMessage(const PGresult *result)

Re: Create collation reporting the ICU locale display name

2019-09-12 Thread Tom Lane
Alvaro Herrera  writes:
> I wonder if INFO is better than NOTICE (I think it is).

You're just waving a red flag in front of a bull, you know.

I don't especially like the idea of having this emit a NOTICE;
it's ugly and in-your-face.  INFO is right out.

The idea of having CREATE COLLATION automatically create a comment
is sort of interesting, although it seems pretty orthogonal to
normal command behavior.  I wonder whether the seeming need for
this indicates that we should add a descriptive field to pg_collation
proper, and not usurp the user-oriented comment feature for that.

The difficulty with localization is that whatever we put into
template1 has got to be ASCII-only, so that the template DB
can be copied to other encodings.  I suppose we could consider
having CREATE COLLATION act differently during initdb than
later, but that seems ugly too.

regards, tom lane




Re: another look at macOS SIP

2019-09-12 Thread Peter Eisentraut
On 2019-09-10 19:26, Tom Lane wrote:
>> I think the way forward here is to get rid of all uses of system() for
>> calling between PostgreSQL programs.
> 
> We could do that perhaps, but how are you going to get make to not use
> /bin/sh while spawning subprocesses?  I don't think we want to also
> reimplement make ...

make is not a problem if the DYLD_* assignments are in a makefile rule
(as currently), because then make just calls a shell with a string
"DYLD_*=foo some command", which is not affected by any filtering.  It
would be a problem if you do the variable assignments in a makefile
outside a rule or outside a makefile.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pglz performance

2019-09-12 Thread Alvaro Herrera
I just noticed we had two CF items pointing to this thread,

https://commitfest.postgresql.org/24/2119/
https://commitfest.postgresql.org/24/2180/

so I marked the newer one as withdrawn.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Create collation reporting the ICU locale display name

2019-09-12 Thread Alvaro Herrera
On 2019-Sep-12, Daniel Verite wrote:

>   Michael Paquier wrote:
> 
> > On Wed, Sep 11, 2019 at 04:53:16PM +0200, Daniel Verite wrote:
> > > I think it would be nice to have CREATE COLLATION report this
> > > information as feedback in the form of a NOTICE message.
> > > PFA a simple patch implementing that.
> > 
> > Why is that better than the descriptions provided with \dO[S]+ in
> > psql?
> 
> There is no description for collations created outside of
> pg_import_system_collations().

Hmm, sounds like the collation should automatically acquire the display
name as a comment even when created via CREATE COLLATION.

I wonder if INFO is better than NOTICE (I think it is).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Create collation reporting the ICU locale display name

2019-09-12 Thread Peter Geoghegan
On Thu, Sep 12, 2019 at 11:30 AM Peter Geoghegan  wrote:
> I wonder if it's possible to display a localized version of the
> display string in the NOTICE message? Does that work, or could it? For
> example, do you see the message in French?

BTW, I already know for sure that ICU supports localized display
names. The question is whether or not this patch can take advantage of
that.

The way that we use display name in pg_import_system_collations() is
an ugly hack. It insists on only storing ASCII-safe strings in
pg_collation.

-- 
Peter Geoghegan




Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2019-09-12 Thread Andres Freund
Hi,

On 2019-09-12 09:57:55 -0300, Alvaro Herrera wrote:
> On 2019-Sep-12, Tomas Vondra wrote:
> 
> > On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant 
> > wrote:
> > > On 2019-Sep-11, Amit Khandekar wrote:
> 
> > > I think doing this all the time would make restore very slow -- there's a
> > > reason we keep the files open, after all.
> > 
> > How much slower? It certainly will have a hit, but maybe it's negligible
> > compared to all the other stuff happening in this code?

I'd expect it to be significant.


> > As a sidenote - in the other thread about streaming, one of the patches
> > does change how we log subxact assignments. In the end, this allows using
> > just a single file for the top-level transaction, instead of having one
> > file per subxact. That would also solve this.

Uhm, how is rollback to savepoint going to be handled in that case? I
don't think it's great to just retain space for all rolled back
subtransactions.

Greetings,

Andres Freund




Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2019-09-12 Thread Andres Freund
Hi,

On 2019-09-12 09:41:02 -0400, Robert Haas wrote:
> On Thu, Sep 12, 2019 at 5:31 AM Tomas Vondra
>  wrote:
> > I don't see how the current API could do that transparently - it does
> > track the files, but the user only gets a file descriptor. With just a
> > file descriptor, how could the code know to do reopen/seek when it's going
> > just through the regular fopen/fclose?
> >
> > Anyway, I agree we need to do something, to fix this corner case (many
> > serialized in-progress transactions). ISTM we have two options - either do
> > something in the context of reorderbuffer.c, or extend the transient file
> > API somehow. I'd say the second option is the right thing going forward,
> > because it does allow doing it transparently and without leaking details
> > about maxAllocatedDescs etc. There are two issues, though - it does
> > require changes / extensions to the API, and it's not backpatchable.
> >
> > So maybe we should start with the localized fix in reorderbuffer, and I
> > agree tracking offset seems reasonable.
> 
> We've already got code that knows how to track this sort of thing.
> You just need to go through the File abstraction (PathNameOpenFile or
> PathNameOpenFilePerm or OpenTemporaryFile) rather than using the
> functions that deal directly with fds (OpenTransientFile,
> BasicOpenFile, etc.).  It seems like it would be better to reuse the
> existing VFD layer than to invent a whole new one specific to logical
> replication.

Yea, I agree that that is the right fix.
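
Roughly, that would mean reorderbuffer.c opening its spill files through
the fd.c "File" (VFD) layer, which manages the pool of real descriptors
behind the scenes, instead of keeping raw fds from OpenTransientFile.  A
sketch only, not a patch:

#include "postgres.h"
#include <fcntl.h>
#include "storage/fd.h"

static void
read_spill_file(const char *path)
{
    File        spill_file;

    spill_file = PathNameOpenFile(path, O_RDONLY | PG_BINARY);
    if (spill_file < 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not open file \"%s\": %m", path)));

    /*
     * ... read the spilled changes via FileRead(); fd.c closes and reopens
     * the underlying descriptor as needed when too many files are open ...
     */

    FileClose(spill_file);
}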

Greetings,

Andres Freund




Re: Create collation reporting the ICU locale display name

2019-09-12 Thread Peter Geoghegan
On Wed, Sep 11, 2019 at 7:53 AM Daniel Verite  wrote:
> The 'locale' or 'lc_collate/lc_ctype' argument of an ICU collation may
> have a complicated syntax, especially with non-deterministic
> collations, and input mistakes in these names will not necessarily be
> detected as such by ICU.

That's a real problem.

> The "display name" of a locale is a simple way to get human-readable
> feedback about the characteristics of that locale.
> pg_import_system_collations() already push these as comments when
> creating locales en masse.
>
> I think it would be nice to have CREATE COLLATION report this
> information as feedback in the form of a NOTICE message.
> PFA a simple patch implementing that.

I like this idea.

I wonder if it's possible to display a localized version of the
display string in the NOTICE message? Does that work, or could it? For
example, do you see the message in French?

-- 
Peter Geoghegan




Re: Leakproofness of texteq()/textne()

2019-09-12 Thread Tom Lane
I wrote:
> I agree with your point that this is a shouldn't-happen corner case.
> The question boils down to, if it *does* happen, does that constitute
> a meaningful information leak?  Up to now we've taken quite a hard
> line about what leakproofness means, so deciding that varstr_cmp
> is leakproof would constitute moving the goalposts a bit.  They'd
> still be in the same stadium, though, IMO.

For most of us it might be more meaningful to look at the non-Windows
code paths, for which the question reduces to what we think of this:

    UErrorCode  status;

    status = U_ZERO_ERROR;
    result = ucol_strcollUTF8(mylocale->info.icu.ucol,
                              arg1, len1,
                              arg2, len2,
                              &status);
    if (U_FAILURE(status))
        ereport(ERROR,
                (errmsg("collation failed: %s", u_errorName(status))));

which, as it happens, is also a UTF8-encoding-only code path.
Can this throw an error in practice, and if so does that
constitute a meaningful information leak?  (For bonus points:
is this error report up to project standards?)

Thumbing through the list of UErrorCode values, it seems like the only
ones that are applicable here and aren't internal-error cases are
U_INVALID_CHAR_FOUND and the like, so that this boils down to "one of
the strings contains a character that ICU can't cope with".  Maybe that's
impossible except with a pre-existing encoding violation, or maybe not.

In any case, from a purely theoretical viewpoint, such an error message
*does* constitute a leak of information about the input strings.  Whether
it's a usable leak is very debatable, but that's basically what we've
got to decide.

regards, tom lane




Re: Leakproofness of texteq()/textne()

2019-09-12 Thread Tom Lane
Robert Haas  writes:
> On Thu, Sep 12, 2019 at 12:19 PM Tom Lane  wrote:
>> After burrowing down further, it's visibly the case that
>> text_cmp and varstr_cmp don't leak in the sense of actually
>> reporting any part of their input strings.  What they do do,
>> in some code paths, is things like
>> ereport(ERROR,
>> (errmsg("could not convert string to UTF-16: error code %lu",
>> GetLastError())));

> Is this possible? I mean, I'm sure it could happen if the data's
> corrupted, but we ought to have validated it on the way into the
> database. But maybe this code path also gets used for non-Unicode
> encodings?

Nope, the above is inside 

#ifdef WIN32
/* Win32 does not have UTF-8, so we need to map to UTF-16 */
if (GetDatabaseEncoding() == PG_UTF8
&& (!mylocale || mylocale->provider == COLLPROVIDER_LIBC))

I agree with your point that this is a shouldn't-happen corner case.
The question boils down to, if it *does* happen, does that constitute
a meaningful information leak?  Up to now we've taken quite a hard
line about what leakproofness means, so deciding that varstr_cmp
is leakproof would constitute moving the goalposts a bit.  They'd
still be in the same stadium, though, IMO.

Another approach would be to try to remove these failure cases,
but I don't really see how we'd do that.

regards, tom lane




Re: Leakproofness of texteq()/textne()

2019-09-12 Thread Robert Haas
On Thu, Sep 12, 2019 at 12:19 PM Tom Lane  wrote:
> After burrowing down further, it's visibly the case that
> text_cmp and varstr_cmp don't leak in the sense of actually
> reporting any part of their input strings.  What they do do,
> in some code paths, is things like
>
> ereport(ERROR,
> (errmsg("could not convert string to UTF-16: error code %lu",
> GetLastError())));

Is this possible? I mean, I'm sure it could happen if the data's
corrupted, but we ought to have validated it on the way into the
database. But maybe this code path also gets used for non-Unicode
encodings?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Leakproofness of texteq()/textne()

2019-09-12 Thread Tom Lane
I wrote:
> Seems like we have two choices:
> 1. Remove the leakproof marking on texteq()/textne().  This
> would, unfortunately, be catastrophic for performance in
> a lot of scenarios where leakproofness matters.
> 2. Fix text_cmp to actually be leakproof, whereupon we might
> as well mark all the remaining btree comparison operators
> leakproof.
> I think #2 is pretty obviously the superior answer, if we
> can do it.

After burrowing down further, it's visibly the case that
text_cmp and varstr_cmp don't leak in the sense of actually
reporting any part of their input strings.  What they do do,
in some code paths, is things like

    ereport(ERROR,
            (errmsg("could not convert string to UTF-16: error code %lu",
                    GetLastError())));

So this seems to mostly be a question of interpretation:
does an error like this represent enough of an information
leak to violate the promises of leakproofness?  I'm not sure
that this error is really capable of disclosing any information
about the input strings.  (Note that this error occurs only if
we failed to convert UTF8 to UTF16, which probably could only
happen if the input isn't valid UTF8, which would mean a failure
of encoding checking somewhere up the line.)

There are also various pallocs and such that could fail, which
perhaps could be argued to disclose the lengths of the input
strings.  But that hazard exists already inside PG_GETARG_TEXT_PP;
if you want to complain about that, then functions like byteaeq()
aren't legitimately leakproof either.

On the whole I'm prepared to say that these aren't leakproofness
violations, but it would depend a lot on how strict you want to be.

regards, tom lane




Re: Pulling up direct-correlated ANY_SUBLINK

2019-09-12 Thread Antonin Houska
Richard Guo  wrote:

> On Wed, Sep 11, 2019 at 3:25 PM Antonin Houska 
> wrote:
> 
>
> Nevertheless, I don't know how to overcome the problems that I
> mentioned
> upthread.
> 
> 
> Do you mean the problem "the WHERE clause of the subquery didn't
> participate in the SEMI JOIN evaluation"? Good news is it has been
> fixed
> by commit 043f6ff0 as I mentioned upthread.

Are you saying that my old patch (rebased) no longer breaks the regression tests?

(I noticed your other email in the thread which seems to indicate that you're
no longer interested in working on the feature, but I'm asking out of curiosity.)

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-09-12 Thread James Coleman
> OK, so we have that now.  I suppose this spreadsheet now tells us which
> places are useful and which aren't, at least for the queries that you've
> tested.  Does that mean that we want to get the patch to consider adding
> paths only in the places that your spreadsheet says are useful?  I'm not
> sure what the next steps are for this patch.

I wanted to note here that I haven't abandoned this patch, but ended
up needing to use my extra time for working on a conference talk. That
talk is today, so I'm hoping to be able to catch up on this again
soon.

James Coleman




JSON parser discards value of string token

2019-09-12 Thread Antonin Houska
Although the following problem does not seem to be exposed in the core, I
think it's still a problem to fix. (I've hit it when implementing a custom
parser for an extension configuration file.)

If makeJsonLexContextCstringLen() is passed need_escapes=false,
JsonLexContext.strval is not initialized, and in turn, functions of
JsonSemAction which should receive the string token value
(e.g. object_field_start) receive NULL.
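
A minimal way to reproduce it (written against the v12-era json API, so
treat the exact signatures as approximate): register only an
object_field_start callback and parse with need_escapes = false; the
callback then receives fname == NULL.

#include "postgres.h"
#include "utils/jsonapi.h"

static void
demo_object_field_start(void *state, char *fname, bool isnull)
{
    /* with need_escapes = false, fname unexpectedly arrives as NULL */
    elog(NOTICE, "field: %s", fname ? fname : "(null)");
}

static void
parse_demo(char *json)
{
    JsonLexContext *lex;
    JsonSemAction   sem;

    memset(&sem, 0, sizeof(sem));
    sem.object_field_start = demo_object_field_start;

    lex = makeJsonLexContextCstringLen(json, strlen(json), false);
    pg_parse_json(lex, &sem);
}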

Attached is a patch that fixes the problem. If this approach is acceptable,
then it'd probably be worth also renaming the JsonLexContext.strval field to
something that recalls the "de-escaping", e.g. "noesc"?

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 26d293709a..2ef16fb089 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -142,17 +142,26 @@ lex_accept(JsonLexContext *lex, JsonTokenType token, char **lexeme)
 	{
 		if (lexeme != NULL)
 		{
-			if (lex->token_type == JSON_TOKEN_STRING)
-			{
-if (lex->strval != NULL)
-	*lexeme = pstrdup(lex->strval->data);
-			}
+			if (lex->token_type == JSON_TOKEN_STRING && lex->strval != NULL)
+*lexeme = pstrdup(lex->strval->data);
 			else
 			{
 int			len = (lex->token_terminator - lex->token_start);
-char	   *tokstr = palloc(len + 1);
+char	*src = lex->token_start;
+char	   *tokstr;
+
+/* String token should be quoted. */
+if (lex->token_type == JSON_TOKEN_STRING)
+{
+	Assert(len >= 2);
+	Assert(src[0] == '"' && src[len - 1] == '"');
+
+	src++;
+	len -= 2;
+}
 
-memcpy(tokstr, lex->token_start, len);
+tokstr = palloc(len + 1);
+memcpy(tokstr, src, len);
 tokstr[len] = '\0';
 *lexeme = tokstr;
 			}


Leakproofness of texteq()/textne()

2019-09-12 Thread Tom Lane
Currently, texteq() and textne() are marked leakproof, while
sibling operations such as textlt() are not.  The argument
for that was that those two functions depend only on memcmp()
so they can be seen to be safe, whereas it's a whole lot less
clear that text_cmp() should be considered leakproof.

Of course, the addition of nondeterministic collations has
utterly obliterated that argument: it's now possible for
texteq() to call text_cmp(), so if the latter is leaky then
the former certainly must be considered so as well.

Seems like we have two choices:

1. Remove the leakproof marking on texteq()/textne().  This
would, unfortunately, be catastrophic for performance in
a lot of scenarios where leakproofness matters.

2. Fix text_cmp to actually be leakproof, whereupon we might
as well mark all the remaining btree comparison operators
leakproof.

I think #2 is pretty obviously the superior answer, if we
can do it.

ISTM we can't ship v12 without dealing with this one way
or the other, so I'll go add an open item.

Comments?

regards, tom lane




Re: Implementing Incremental View Maintenance

2019-09-12 Thread Alvaro Herrera
On 2019-Aug-06, Tatsuo Ishii wrote:

> It's not mentioned below, but some bugs, including a seg fault when
> --enable-cassert is enabled, were also fixed in this patch.
> 
> BTW, I found a bug with min/max support in this patch and I believe
> Yugo is working on it. Details:
> https://github.com/sraoss/pgsql-ivm/issues/20

So is he posting an updated patch soon?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: psql - improve test coverage from 41% to 88%

2019-09-12 Thread Alvaro Herrera
I think the TestLib.pm changes should be done separately, not together
with the rest of the hacking in this patch.

Mostly, because I think they're going to cause trouble.  Adding a
parameter in the middle of the list may cause trouble for third-party
users of TestLib.  I propose that we make the routines a bit smarter to
cope with the API change: use named parameters instead.  And in order to
do that without having to change existing users of command_check, make
it so that the routine checks whether the parameter is a hashref, and
behave differently.  So when called as in the existing callsites (five
scalar parameters) it behaves as currently.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: (Re)building index using itself or another index of the same table

2019-09-12 Thread Tom Lane
Arseny Sher  writes:
> A problem of similar nature can be reproduced with the following
> stripped-down scenario:

> CREATE TABLE pears(f1 int primary key, f2 int);
> INSERT INTO pears SELECT i, i+1 FROM generate_series(1, 100) i;
> CREATE OR REPLACE FUNCTION pears_f(i int) RETURNS int LANGUAGE SQL IMMUTABLE 
> AS $$
>   SELECT f1 FROM pears WHERE pears.f2 = 42
> $$;
> CREATE index ON pears ((pears_f(f1)));

We've seen complaints about this sort of thing before, and rejected
them because, as you say, that function is NOT immutable.  When you
lie to the system like that, you should not be surprised if things
break.

> There is already a mechanism which prevents usage of indexes during
> reindex -- ReindexIsProcessingIndex et al. However, contrary to what the
> comment at index.c:3664 says, these protect only indexes on system
> catalogs, not user tables: the only real caller is genam.c.
> The attached patch extends it: the same check is added to
> get_relation_info. Also, SetReindexProcessing is now set in index_create
> to defend against an index being used for its own construction, as in the
> stripped-down example above. There are some other still-unprotected
> callers of index_build; concurrent index creation doesn't need it because
> the index is not yet marked indisvalid during the build, and in
> RelationTruncateIndexes the table is empty, so it looks like it can be
> omitted.

I have exactly no faith that this fixes things enough to make such
cases supportable.  And I have no interest in opening that can of
worms anyway.  I'd rather put in some code to reject database
accesses in immutable functions.

> One might argue that a function selecting from a table can hardly be
> called immutable, and immutability is required for index expressions.
> However, if the user is sure the table contents don't change, why not?

If the table contents never change, why are you doing VACUUM FULL on it?

regards, tom lane




(Re)building index using itself or another index of the same table

2019-09-12 Thread Arseny Sher
Hi,

Our customer encountered a curious scenario. They have a table with a GIN
index on an expression that performs multiple joins with the table itself.
These joins employ another btree index for efficiency. VACUUM FULL on this
table fails with an error like

ERROR:  could not read block 3534 in file "base/41366676/56697497": read only 0 
of 8192 bytes

It happens because the order in which indexes are rebuilt is not specified.
The GIN index gets rebuilt while the btree index has not been reconstructed
yet; an attempt to use the old index against the rewritten heap crashes.

A problem of similar nature can be reproduced with the following
stripped-down scenario:

CREATE TABLE pears(f1 int primary key, f2 int);
INSERT INTO pears SELECT i, i+1 FROM generate_series(1, 100) i;
CREATE OR REPLACE FUNCTION pears_f(i int) RETURNS int LANGUAGE SQL IMMUTABLE AS 
$$
  SELECT f1 FROM pears WHERE pears.f2 = 42
$$;
CREATE index ON pears ((pears_f(f1)));

Here, using the not-yet-created index on pears_f(f1) for its own
construction is pointless; however, the planner in principle considers it
in get_relation_info, tries to get the btree height (_bt_getrootheight),
and fails.


There is already a mechanism which prevents usage of indexes during
reindex -- ReindexIsProcessingIndex et al. However, contrary to what the
comment at index.c:3664 says, these protect only indexes on system
catalogs, not user tables: the only real caller is genam.c.

The attached patch extends it: the same check is added to
get_relation_info. Also, SetReindexProcessing is now set in index_create
to defend against an index being used for its own construction, as in the
stripped-down example above. There are some other still-unprotected
callers of index_build; concurrent index creation doesn't need it because
the index is not yet marked indisvalid during the build, and in
RelationTruncateIndexes the table is empty, so it looks like it can be
omitted.


One might argue that a function selecting from a table can hardly be
called immutable, and immutability is required for index expressions.
However, if the user is sure the table contents don't change, why not?
Also, the possibility of triggering a "could not read block" error with
plain SQL is definitely not nice.

From 5942a3a5b2c90056119b9873c81f30dfa9e003af Mon Sep 17 00:00:00 2001
From: Arseny Sher 
Date: Thu, 12 Sep 2019 17:35:16 +0300
Subject: [PATCH] Avoid touching user indexes while they are being (re)built.

Existing ReindexIsProcessingIndex check is consulted only in genam.c and thus
enforced only for system catalogs. Check it also in the planner, so that indexes
which are currently being rebuilt are never used. Also cock SetReindexProcessing
in index_create to defend from index self usage during its creation.

Without this, VACUUM FULL or just CREATE INDEX might fail with something like

ERROR:  could not read block 3534 in file "base/41366676/56697497": read only 0 of 8192 bytes

if there are indexes which usage can be considered during these very
indexes (re)building, i.e. index expression scans indexed table.
---
 src/backend/catalog/index.c| 22 --
 src/backend/optimizer/util/plancat.c   |  5 +
 src/test/regress/expected/create_index.out | 12 
 src/test/regress/sql/create_index.sql  | 13 +
 4 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 3e1d40662d..5bc764ce46 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1174,7 +1174,22 @@ index_create(Relation heapRelation,
 	}
 	else
 	{
-		index_build(heapRelation, indexRelation, indexInfo, false, true);
+		/* ensure SetReindexProcessing state isn't leaked */
+		PG_TRY();
+		{
+			/* Suppress use of the target index while building it */
+			SetReindexProcessing(heapRelationId, indexRelationId);
+
+			index_build(heapRelation, indexRelation, indexInfo, false, true);
+		}
+		PG_CATCH();
+		{
+			/* Make sure flag gets cleared on error exit */
+			ResetReindexProcessing();
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+		ResetReindexProcessing();
 	}
 
 	/*
@@ -1379,7 +1394,10 @@ index_concurrently_build(Oid heapRelationId,
 	indexInfo->ii_Concurrent = true;
 	indexInfo->ii_BrokenHotChain = false;
 
-	/* Now build the index */
+	/*
+	 * Now build the index
+	 * SetReindexProcessing is not required since indisvalid is false anyway
+	 */
 	index_build(heapRel, indexRelation, indexInfo, false, true);
 
 	/* Close both the relations, but keep the locks */
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index cf1761401d..9d58cd2574 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -27,6 +27,7 @@
 #include "access/xlog.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
+#include "catalog/index.h"
 #include "catalog/heap.h"
 #include "catalog/pg_am.h"
 #include "catalog/pg_proc.h"
@@ -193,6 +194,10 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 		

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2019-09-12 Thread Alvaro Herrera
On 2019-Sep-12, Tomas Vondra wrote:

> On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant 
> wrote:
> > On 2019-Sep-11, Amit Khandekar wrote:

> > I think doing this all the time would make restore very slow -- there's a
> > reason we keep the files open, after all.
> 
> How much slower? It certainly will have a hit, but maybe it's negligible
> compared to all the other stuff happening in this code?

I dunno -- that's a half-assed guess based on there being many more
syscalls, and on the fact that the API is how it is in the first place.
(Andres did a lot of perf benchmarking and tweaking back when he was
writing this ... I just point out that I have a colleague that had to
invent *a lot* of new MemCxt infrastructure in order to make some of
Andres' perf optimizations cleaner, just as a semi-related data point.
Anyway, I digress.)

Anyway, such a fix would pessimize all cases, including every single
case that works today (which evidently is almost every single usage of
this feature, since we hadn't heard of this problem until yesterday), in
order to solve a problem that you can only find in very rare ones.

Another point of view is that we should make it work first, then make it
fast.  But the point remains that it works fine and fast for 99.99% of
cases.

> > It would be better if we can keep the descriptors open as much as
> > possible, and only close them if there's trouble.  I was under the
> > impression that using OpenTransientFile was already taking care of that,
> > but that's evidently not the case.
> 
> I don't see how the current API could do that transparently - it does
> track the files, but the user only gets a file descriptor. With just a
> file descriptor, how could the code know to do reopen/seek when it's going
> just through the regular fopen/fclose?

Yeah, I don't know what was in Amit's mind, but it seemed obvious to me
that such a fix required changing that API so that a seekpos is kept
together with the fd.  ReorderBufferRestoreChange is static in
reorderbuffer.c so it's not a problem to change its ABI.

I agree with trying to get a reorderbuffer-localized back-patchable fix
first, then see how we can improve from that.
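
To be concrete, the shape I have in mind is something like this (a minimal
sketch with hypothetical names, not actual reorderbuffer.c code): keep the
path and the read position next to the descriptor, so the file can be closed
when descriptors run short and reopened transparently on the next read.

typedef struct TXNSpillFile
{
	char		path[MAXPGPATH];	/* on-disk spill file */
	int			fd;					/* -1 while (temporarily) closed */
	off_t		curOffset;			/* next read position */
} TXNSpillFile;

static int
spill_file_read(TXNSpillFile *sf, char *buf, Size len)
{
	int			nread;

	if (sf->fd < 0)
	{
		/* reopen and seek back to where we stopped reading */
		sf->fd = OpenTransientFile(sf->path, O_RDONLY | PG_BINARY);
		if (sf->fd < 0 || lseek(sf->fd, sf->curOffset, SEEK_SET) < 0)
			return -1;
	}
	nread = read(sf->fd, buf, len);
	if (nread > 0)
		sf->curOffset += nread;
	return nread;
}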

> As a sidenote - in the other thread about streaming, one of the patches
> does change how we log subxact assignments. In the end, this allows using
> just a single file for the top-level transaction, instead of having one
> file per subxact. That would also solve this.

:-(  While I would love to get that patch done and get rid of the issue,
it's not a backpatchable fix either.  ... however, it does mean we maybe
can get away with the reorderbuffer.c-local fix, and then just use your
streaming stuff to get rid of the perf problem forever?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Runtime pruning problem

2019-09-12 Thread Tom Lane
Alvaro Herrera  writes:
> So is anyone working on a patch to use this approach?

It's on my to-do list, but I'm not sure how soon I'll get to it.

regards, tom lane




Re: Runtime pruning problem

2019-09-12 Thread Alvaro Herrera
On 2019-Jul-30, Tom Lane wrote:

> I wrote:
> > This may be arguing for a change in ruleutils' existing behavior,
> > not sure.  But when dealing with traditional-style inheritance,
> > I've always thought that Vars above the Append were referring to
> > the parent rel in its capacity as the parent, not in its capacity
> > as the first child.  With new-style partitioning drawing a clear
> > distinction between the parent and all its children, it's easier
> > to understand the difference.
> 
> OK, so experimenting, I see that it is a change: [...]

> The portion of this below the Append is fine, but I argue that
> the Vars above the Append should say "part", not "part_p1".
> In that way they'd look the same regardless of which partitions
> have been pruned or not.

So is anyone working on a patch to use this approach?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Misleading comment in tuplesort_set_bound

2019-09-12 Thread Tom Lane
Alvaro Herrera  writes:
> [ shrug ]  It seemed to require no further work, so I just pushed Tom's
> proposed change.
> I added an empty line after the new combined assertion, which makes
> clearer (to me anyway) that the other assertions are unrelated.

Actually, the thing I wanted to add was some actual comments for
those other assertions.  But that requires a bit of research that
I hadn't made time for...

regards, tom lane




Re: [PATCH] ltree, lquery, and ltxtquery binary protocol support

2019-09-12 Thread Alexander Korotkov
On Wed, Sep 11, 2019 at 9:45 PM Nino Floris  wrote:
>  > * We currently don't add new extension SQL-script for new extension
> > version (i.e. ltree--1.2.sql).  Instead, we provide just a migration
> > script (i.e. ltree--1.1--1.2.sql).  This simplifies testing of
> > extension migration – plain extension creation implies migration.
>
> I wasn't sure how to add new methods to the type without doing a full
> CREATE TYPE, as ALTER TYPE doesn't allow updates to functions. At that
> point wouldn't it be better as a new version?
> I checked some other extensions like hstore to find reference material
> how to do a new CREATE TYPE, all did a full version bump.
> Should I just do a DROP and CREATE instead in a migration?

Oh, it appears we didn't add send/recv to any pluggable datatype once we
got the extension system.

Ideally we need to adjust ALTER TYPE command so that it could add
send/recv functions to an existing type.

I see couple other options:
1) Write migration script, which directly updates pg_type.
2) Add ltree--1.2.sql and advise users to recreate extension to get
binary protocol support.
But both these options are cumbersome.

What do you think about writing patch for ALTER TYPE?

> >  * What is motivation for binary format for lquery and ltxtquery?  One
> could transfer large datasets of ltree's.  But is it so for lquery and
> ltxtquery, which are used just for querying.
>
> Completeness, Npgsql (and other drivers like Ecto from Elixir and
> probably many others as well) can't do any text fallback in the binary
> protocol without manual configuration.
> This is because these drivers don't know much (or anything) about the
> SQL they send so it wouldn't know to apply it for which columns.
> I believe there has been a proposal at some point to enhance the
> binary protocol to additionally allow clients to specify text/binary
> per data type instead of per column.
> That would allow all these drivers to automate this, but I think it
> never went anywhere.
> As it stands it's not ergonomic to ask developers to setup this
> metadata per query they write.
>
>  * Just send binary representation of datatype is not OK.  PostgreSQL
> supports a lot of platforms with different byte ordering, alignment
> and so on.  You basically need to send each particular field using
> pq_send*() function.
>
> Oh my, I don't think I did? I copied what jsonb is doing,
> specifically, sending the textual representation of the data type with
> a version field prefixed.
> (It is why I introduced deparse_ltree/lquery, to take the respective
> structure and create a null terminated string of its textual form)
> So there are no other fields besides version and the string
> representation of the structure.
> ltree, lquery, and ltxtquery all seem to have a lossless and compact
> textual interpretation.
> My motivation here has been "if it's good enough for jsonb it should
> be good enough for a niche extension like ltree" especially as having
> any binary support is better than not having it at all.
> I can change it to anything you'd like to see instead but I would need
> some pointers as to what that would be.

Oh, sorry.  I didn't notice you send the textual representation for ltree,
lquery, and ltxtquery.  And the point for these datatypes is not performance,
but completeness.  Now that I get it, it looks OK.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Write visibility map during CLUSTER/VACUUM FULL

2019-09-12 Thread Alexander Korotkov
On Wed, Sep 11, 2019 at 3:30 PM Amit Kapila  wrote:
> On Sun, Sep 1, 2019 at 1:37 PM Alexander Korotkov
>  wrote:
> > I found it weird that CLUSTER/VACUUM FULL don't write visibility map.
> > Attached patch implements writing visibility map in
> > heapam_relation_copy_for_cluster().
> >
> > I've studied previous attempt to implement this [1].  The main problem
> > of that attempt was usage of existing heap_page_is_all_visible() and
> > visibilitymap_set() functions.  These functions works through buffer
> > manager, while heap rewriting is made bypass buffer manager.
> >
> > In my patch visibility map pages are handled in the same way as heap
> > pages are.
> >
>
> I haven't studied this patch in detail, but while glancing I observed
> that this doesn't try to sync the vm pages as we do for heap pages in
> the end (during end_heap_rewrite).  Am I missing something?

You haven't missed anything.  Yes, the VM needs a sync.  Will fix this.  And I
just noticed I need a closer look at what is going on with TOAST.
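
Presumably something along these lines at the end of the rewrite (a
hypothetical one-liner following rewriteheap.c naming, untested, and
assuming the new relation's smgr is already open):

/*
 * The rewrite writes VM pages through smgr, bypassing shared buffers, so
 * the VM fork needs the same end-of-rewrite sync as the main fork.
 */
smgrimmedsync(state->rs_new_rel->rd_smgr, VISIBILITYMAP_FORKNUM);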

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2019-09-12 Thread Robert Haas
On Thu, Sep 12, 2019 at 5:31 AM Tomas Vondra
 wrote:
> I don't see how the current API could do that transparently - it does
> track the files, but the user only gets a file descriptor. With just a
> file descriptor, how could the code know to do reopen/seek when it's going
> just through the regular fopen/fclose?
>
> Anyway, I agree we need to do something, to fix this corner case (many
> serialized in-progress transactions). ISTM we have two options - either do
> something in the context of reorderbuffer.c, or extend the transient file
> API somehow. I'd say the second option is the right thing going forward,
> because it does allow doing it transparently and without leaking details
> about maxAllocatedDescs etc. There are two issues, though - it does
> require changes / extensions to the API, and it's not backpatchable.
>
> So maybe we should start with the localized fix in reorderbuffer, and I
> agree tracking offset seems reasonable.

We've already got code that knows how to track this sort of thing.
You just need to go through the File abstraction (PathNameOpenFile or
PathNameOpenFilePerm or OpenTemporaryFile) rather than using the
functions that deal directly with fds (OpenTransientFile,
BasicOpenFile, etc.).  It seems like it would be better to reuse the
existing VFD layer than to invent a whole new one specific to logical
replication.
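
Roughly like this (a sketch only; note the exact FileRead signature differs
between branches, this follows current master):

File		vfd;
int			nread;

vfd = PathNameOpenFile(path, O_RDONLY | PG_BINARY);
if (vfd < 0)
	ereport(ERROR,
			(errcode_for_file_access(),
			 errmsg("could not open file \"%s\": %m", path)));

/*
 * fd.c remembers the pathname and the position, and silently closes and
 * reopens the kernel fd behind this handle, so the caller never runs into
 * maxAllocatedDescs no matter how many spill files exist.
 */
nread = FileRead(vfd, buffer, sizeof(buffer), offset,
				 WAIT_EVENT_REORDER_BUFFER_READ);
FileClose(vfd);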

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Bug in GiST paring heap comparator

2019-09-12 Thread Alexander Korotkov
On Wed, Sep 11, 2019 at 3:34 AM Nikita Glukhov  wrote:
> On 09.09.2019 22:47, Alexander Korotkov wrote:
>
> On Mon, Sep 9, 2019 at 8:32 PM Nikita Glukhov  wrote:
>
> On 08.09.2019 22:32, Alexander Korotkov wrote:
>
> On Fri, Sep 6, 2019 at 5:44 PM Alexander Korotkov
>  wrote:
>
> I'm going to push both if no objections.
>
> So, pushed!
>
> Two years ago there was a similar patch for this issue:
> https://www.postgresql.org/message-id/1499c9d0-075a-3014-d2aa-ba59121b3728%40postgrespro.ru
>
> Sorry that I looked at this thread too late.
>
> I see, thanks.
>
> Your patch seems a bit less cumbersome thanks to using the GISTDistance
> struct.  You don't have to introduce a separate array with null flags.
> But it's harder to keep index_store_float8_orderby_distances() used
> by both GiST and SP-GiST.
>
> What do you think?  You can rebase your patch and propose that as refactoring.
>
> Rebased patch with refactoring is attached.
>
> During this rebase I found that SP-GiST code has similar problems with NULLs.
> All SP-GiST functions do not check SK_ISNULL flag of ordering ScanKeys, and
> that leads to segfaults.  In the following test, index is searched with
> non-NULL key first and then with NULL key, that leads to a crash:
>
> CREATE TABLE t AS SELECT point(x, y) p
> FROM generate_series(0, 100) x, generate_series(0, 100) y;
>
> CREATE INDEX ON t USING spgist (p);
>
> -- crash
> SELECT (SELECT p FROM t ORDER BY p <-> pt LIMIT 1)
> FROM (VALUES (point '1,2'), (NULL)) pts(pt);
>
>
> After adding SK_ISNULL checks and starting to produce NULL distances, we need
> to store NULL flags for distances somewhere.  The existing singleton flag for
> the whole SPGistSearchItem is not sufficient anymore.
>
>
> So, I introduced structure IndexOrderByDistance and used it everywhere in the
> GiST and SP-GiST code instead of raw double distances.
>
>
> SP-GiST order-by-distance code can be further refactored so that user
> functions do not have to worry about SK_ISNULL checks.

It doesn't seem that the SP-GiST inner_consistent and leaf_consistent
functions can handle NULL scan keys for now.  So should we let them handle
NULL orderby keys?  If we assume that NULL arguments produce NULL
distances, we can avoid changing the code of opclasses.
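
In other words, something like this in the common scan code, before the
opclass functions are ever reached (just a sketch of the idea, with the
distance array name assumed):

int			i;

for (i = 0; i < scan->numberOfOrderBys; i++)
{
	ScanKey		key = &scan->orderByData[i];

	if (key->sk_flags & SK_ISNULL)
	{
		/* NULL ordering argument => NULL distance */
		distances[i].value = 0.0;
		distances[i].isnull = true;
	}
}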

Also, I've noticed IndexOrderByDistance is missing from typedefs.list.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Misleading comment in tuplesort_set_bound

2019-09-12 Thread Alvaro Herrera
On 2019-Sep-05, James Coleman wrote:

> Yes, planning on it, just a bit behind right now so will likely be a
> few more days at least.

[ shrug ]  It seemed to require no further work, so I just pushed Tom's
proposed change.

I added an empty line after the new combined assertion, which makes
clearer (to me anyway) that the other assertions are unrelated.

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: block-level incremental backup

2019-09-12 Thread Jeevan Chalke
Hi,

One of my colleagues at EDB, Rajkumar Raghuwanshi, while testing this
feature, reported an issue. He reported that if a full base backup is
taken, then a database is created, and then an incremental backup is taken,
combining the full backup with the incremental backup fails.

I had a look at this issue and observed that when the new database is
created, the catalog files are copied as-is into the new directory
corresponding to the newly created database. And as they are just copied,
the LSNs on those pages are not changed. Because of this, the incremental
backup thinks they are existing files and thus does not copy the blocks from
these new files, leading to the failure.
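
Roughly, the per-block decision under discussion is (hypothetical helper
names, not the actual patch code):

/*
 * Send a block only if its page LSN advanced past the LSN of the previous
 * (full) backup.  Pages copied byte-for-byte by CREATE DATABASE keep their
 * old LSNs, so they fail this test even though the file did not exist at
 * the time of the full backup.
 */
if (PageGetLSN(page) > prev_backup_start_lsn)
	send_block(blkno, page);			/* hypothetical */
else
	mark_block_unchanged(blkno);		/* hypothetical */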

I was surprised to learn that even though we are creating new files from
old files, we keep the LSN unmodified. I didn't see any other parameter
in basebackup which indicates that this is a new file since the last LSN
or anything similar.

I tried looking for any other DDL doing similar stuff like creating a new
page with existing LSN. But I could not find any other commands than
CREATE DATABASE and ALTER DATABASE .. SET TABLESPACE.

Suggestions/thoughts?

-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: SegFault on 9.6.14

2019-09-12 Thread Robert Haas
On Thu, Sep 12, 2019 at 8:55 AM Amit Kapila  wrote:
> Robert, Thomas, do you have any more suggestions related to this?  I
> am planning to commit the above-discussed patch (Forbid Limit node to
> shutdown resources.) this coming Monday, so that at least the reported
> problem gets fixed.

I think that your commit message isn't very clear about what the
actual issue is.  And the patch itself doesn't add any comments or
anything to try to clear it up. So I wouldn't favor committing it in
this form.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Close stdout and stderr in syslogger

2019-09-12 Thread Tom Lane
Святослав Ермилин  writes:
> Hi! A few months ago we encountered a situation where some quite big log
> files were still held open by Postgres despite being deleted. This affects
> free space calculation in our managed PostgreSQL instances. Currently I'm
> investigating this issue. We traced some of the roots to unclosed
> descriptors in the Perl code of postgresql-common [0], but we still see
> some unclosed descriptors pointing to the log file. The Debian tools from
> postgresql-common start pg_ctl piped to a logfile. The descriptor is piped
> to the logfile and blocks it from deletion. That is why we can't delete
> logfile.1 after logrotate, and after a second logrotate logfile.1 is in
> "deleted" status but can't actually be deleted. Here I attach a patch with
> a possible solution: in this patch stdout and stderr are simply closed in
> the syslogger. --Sviatoslav Ermilin, Yandex
> [0]
> https://salsa.debian.org/postgresql/postgresql-common/commit/580aa0677edc222ebaf6e1031cf3929f847f27fb

I'm quite certain that the current behavior is intentional, if only
because closing the syslogger's stderr would make it impossible to
debug problems inside the syslogger.  Why is it a problem that we
leave it open?  I don't believe either that the file will grow much
(in normal cases anyway), or that it's impossible to unlink it
(except on Windows, but you didn't say anything about Windows).

In any case, the proposed patch almost certainly introduces new
problems, in that you dropped the fclose calls into code that
executes repeatedly.

regards, tom lane




Re: doc: pg_trgm missing description for GUC "pg_trgm.strict_word_similarity_threshold"

2019-09-12 Thread Alexander Korotkov
On Thu, Sep 12, 2019 at 3:39 PM Euler Taveira  wrote:
> Em seg, 10 de jun de 2019 às 14:34, Alexander Korotkov
>  escreveu:
> >
> > Pushed!
> >
> Alexander, this commit is ok for 11 and up. However, the GUC
> strict_word_similarity_threshold does not exist in 9.6 and 10. The
> attached patch reverts this part. It should apply cleanly in 9.6 and
> 10.

Thank you for pointing this out.
Pushed.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Amcheck: do rightlink verification with lock coupling

2019-09-12 Thread Alvaro Herrera
On 2019-Sep-12, Andrey Borodin wrote:

> This patch violates one of amcheck's design principles: the current code
> never takes more than one page lock. I do not know: should we hold to this
> rule or should we use the deeper check?

The check does seem worthwhile to me.

As far as I know, in btree you can lock the right sibling of a page
while keeping lock on the page itself, without risk of deadlock.  So I'm
not sure what's the issue with that.  This is in the README:

  In most cases we release our lock and pin on a page before attempting
  to acquire pin and lock on the page we are moving to.  In a few places
  it is necessary to lock the next page before releasing the current one.
  This is safe when moving right or up, but not when moving left or down
  (else we'd create the possibility of deadlocks).

I suppose Peter was concerned about being able to run amcheck without
causing any trouble at all for concurrent operation; maybe we can retain
that property by making this check optional.
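
For reference, the coupling rule the README describes amounts to something
like this (a sketch using nbtree primitives and assumed variable names,
not the patch itself):

Buffer		lbuf;
Buffer		rbuf;
BTPageOpaque lopaque;
BTPageOpaque ropaque;

lbuf = _bt_getbuf(rel, blkno, BT_READ);
lopaque = (BTPageOpaque) PageGetSpecialPointer(BufferGetPage(lbuf));

/* lock the right sibling before letting go of the current page */
rbuf = _bt_getbuf(rel, lopaque->btpo_next, BT_READ);
ropaque = (BTPageOpaque) PageGetSpecialPointer(BufferGetPage(rbuf));

if (ropaque->btpo_prev != BufferGetBlockNumber(lbuf))
	ereport(ERROR,
			(errcode(ERRCODE_INDEX_CORRUPTED),
			 errmsg("right sibling does not point back to checked page")));

_bt_relbuf(rel, rbuf);
_bt_relbuf(rel, lbuf);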

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Amcheck: do rightlink verification with lock coupling

2019-09-12 Thread Andrey Borodin
Hi!

This is a thread to discuss amcheck feature started in other thread[0].

Currently amcheck scans every B-tree level. If verification is done with a
ShareLock, amcheck will test that each page's leftlink points to a page whose
rightlink points back to it.
This is an important invariant; in our experience it has proved good at
detecting various corruptions.
But this invariant can be verified only if neither page is modified (e.g.
split) concurrently.

PFA a patch that, in case of suspicion, locks the two pages and retests the
invariant. This allows detection of several corruptions on a standby.

This patch violates one of amcheck's design principles: the current code never
takes more than one page lock. I do not know: should we hold to this rule or
should we use the deeper check?

Thanks!

Best regards, Andrey Borodin.

[0] 
https://www.postgresql.org/message-id/flat/da9b33ac-53cb-4643-96d4-7a0bbc037...@yandex-team.ru


v2-0001-In-amcheck-nbtree-do-rightlink-verification-with-loc.patch
Description: Binary data


Re: tableam vs. TOAST

2019-09-12 Thread Robert Haas
On Fri, Aug 2, 2019 at 6:42 PM Andres Freund  wrote:
> The obvious problem with this is the slice fetching logic. For slices
> with an offset of 0, it's obviously trivial to implement. For the higher
> slice logic, I'd assume we'd have to fetch the first slice by estimating
> where the start chunk is based on the currently suggested chunk size, and
> restarting the scan earlier/later if not accurate.  In most cases it'll
> be accurate, so we'd not lose efficiency.

Dilip and I were discussing this proposal this morning, and after
talking to him, I don't quite see how to make this work without
significant loss of efficiency.  Suppose that, based on the estimated
chunk size, you decide that there are probably 10 chunks and that the
value that you need is probably located in chunk 6. So you fetch chunk
6.  Happily, chunk 6 is the size that you expect, so you extract the
bytes that you need and go on your way.

But ... what if there are actually 6 chunks, not 10, and the first
five are bigger than you expected, and the reason the size of
chunk 6 matched your expectation is that it was the last chunk and
thus smaller than the rest? Now you've just returned the wrong answer.

AFAICS, there's no way to detect that except to always read at least
two chunks, which seems like a pretty heavy price to pay. It doesn't
cost if you were going to need to read at least 2 chunks anyway, but
if you were only going to need to read 1, having to fetch 2 stinks.

Actually, when I initially read your proposal, I thought you were
proposing to relax things such that the chunks didn't even have to all
be the same size. That seems like it would be something cool that
could potentially be leveraged not only by AMs but perhaps also by
data types that want to break up their data strategically to cater to
future access patterns. But then there's really no way to make slicing
work without always reading from the beginning.

There would be no big problem here - in either scenario - if each
chunk contained the byte-offset of that chunk relative to the start of
the datum. You could guess wildly and always know whether or not you
had got it right without reading any extra data.  But that's not the
case.
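
Just to illustrate what that would buy us (this is a hypothetical chunk
header, not the existing TOAST chunk layout): with a per-chunk starting
offset, a single fetch is enough to know whether the guess was right.

typedef struct ToastChunkHeader
{
	int32		chunk_start;	/* byte offset of this chunk in the datum */
	int32		chunk_len;		/* number of payload bytes in this chunk */
} ToastChunkHeader;

/* Does the chunk we guessed actually contain the byte we wanted? */
static bool
chunk_covers_offset(const ToastChunkHeader *hdr, int32 wanted)
{
	return wanted >= hdr->chunk_start &&
		wanted < hdr->chunk_start + hdr->chunk_len;
}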

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: SegFault on 9.6.14

2019-09-12 Thread Amit Kapila
On Thu, Sep 5, 2019 at 7:53 PM Amit Kapila  wrote:
>
> On Mon, Sep 2, 2019 at 4:51 PM Amit Kapila  wrote:
> >
> > On Fri, Aug 9, 2019 at 6:29 PM Robert Haas  wrote:
> > >
> > >
> > > But beyond that, the issue here is that the Limit node is shutting
> > > down the Gather node too early, and the right fix must be to stop
> > > doing that, not to change the definition of what it means to shut down
> > > a node, as this patch does.  So maybe a possible approach here - which
> > > I think is more or less what Tom is proposing - is:
> > >
> > > 1. Remove the code from ExecLimit() that calls ExecShutdownNode().
> > >
> >
> > Attached patch does that.  I have also added one test as a separate
> > patch so that later if we introduce shutting down resources in Limit
> > node, we don't break anything.  As of now, I have kept it separate for
> > easy verification, but if we decide to go with this approach and test
> > appears fine, we can merge it along with the fix.
> >
>
> I have merged the code change and test case patch as I felt that it is
> good to cover this case.  I have slightly changed the test case to
> make its output predictable (made the inner scan ordered so that the
> query always produces the same result).  One more thing: I am not able
> to come up with a predictable test case for the 9.6 branches, as they
> don't support Gather Merge, which is required for this particular
> test to always produce predictable output.  There could be some better
> way to write this test, so any input in that regard or otherwise is
> welcome.  So, if we commit this patch, the containing test case will be
> for branches HEAD~10, but the code will be for HEAD~9.6.
>

Robert, Thomas, do you have any more suggestions related to this?  I
am planning to commit the above-discussed patch (Forbid Limit node to
shutdown resources.) this coming Monday, so that at least the reported
problem gets fixed.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: doc: pg_trgm missing description for GUC "pg_trgm.strict_word_similarity_threshold"

2019-09-12 Thread Euler Taveira
Em seg, 10 de jun de 2019 às 14:34, Alexander Korotkov
 escreveu:
>
> Pushed!
>
Alexander, this commit is ok for 11 and up. However, the GUC
strict_word_similarity_threshold does not exist in 9.6 and 10. The
attached patch reverts this part. It should apply cleanly in 9.6 and
10.


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
From 26f77930fa668c329694da43697361877dfb5ac0 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Thu, 12 Sep 2019 09:12:59 -0300
Subject: [PATCH] Fix pg_trgm documentation

Revert part of the commit 9ee98cc3f because it introduced a GUC
description that does not exist on this version.
---
 doc/src/sgml/pgtrgm.sgml | 17 -
 1 file changed, 17 deletions(-)

diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml
index 9f9482ef27..b8e0df6c03 100644
--- a/doc/src/sgml/pgtrgm.sgml
+++ b/doc/src/sgml/pgtrgm.sgml
@@ -263,23 +263,6 @@
  
 

-   
-
- pg_trgm.strict_word_similarity_threshold (real)
- 
-  
-   pg_trgm.strict_word_similarity_threshold configuration parameter
-  
- 
-
-
- 
-  Sets the current strict word similarity threshold that is used by the
-  % and % operators.  The threshold
-  must be between 0 and 1 (default is 0.5).
- 
-
-   
   
  
 
-- 
2.11.0



Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-12 Thread Amit Kapila
On Thu, Sep 12, 2019 at 4:48 PM Michael Paquier  wrote:
>
> On Thu, Sep 12, 2019 at 05:34:08PM +0800, Masahiko Sawada wrote:
> > On Thu, Sep 12, 2019 at 1:56 PM Amit Kapila  wrote:
> >> I think it is better to use a message like "must be superuser to use
> >> pageinspect functions" as this function doesn't take raw page as
> >> input.  If you see other functions like bt_page_items which doesn't
> >> take raw page as input has the message which I am suggesting.  I can
> >> see that tuple_data_split also has a similar message as you are
> >> proposing, but I think that is also not very appropriate.
> >
> > Agreed. Will fix.
>
> Well, those functions are all able to work only from data of a raw
> page, so the existing message was actually fine by me.  If you want to
> change it this way, I don't see either any arguments against.
>
> > Hmm my understanding of 'decode_combined' is to decode the flags that
> > we represent by using multiple flags. HEAP_XMAX_IS_LOCKED_ONLY is true
> > either if HEAP_XMAX_LOCK_ONLY is set or not a multi and the EXCL_LOCK
> > bit is set. That is it requires only one flag. So I thought that it's
> > not a combined flag.
>
> Same interpretation here.
>

Hmm, I thought that when the decode_combined flag is set to false we would
display the raw flags set on the tuple without any further
interpretation.  I think that is what most people in the thread
advocated.

> >> I am not completely sure whether we want to cover HEAP_LOCKED_UPGRADED
> >> case even when decode_combined flag is set.  It seems this is a bit
> >> more interpretation of flags than we are doing in other cases.  For
> >> example, other cases like HEAP_XMAX_SHR_LOCK or HEAP_XMIN_FROZEN are
> >> the flags that are explicitly set on the tuple so displaying them
> >> makes sense, but the same is not true for HEAP_LOCKED_UPGRADED.
> >
> > I thought it would be better to interpret it as much as possible,
> > especially for diagnostic use cases. I'm concerned that user might not
> > be able to get enough information for investigation if we
> > intentionally filtered particular flags.
>
> For HEAP_LOCKED_UPGRADED, my interpretation was that the current code
> is correct to understand it as a decomposition of HEAP_XMAX_IS_MULTI
> and HEAP_XMAX_LOCK_ONLY, still...
>
> It seems to me that the way we define combined flags is subject to
> a lot of interpretation.
>

Right.

>  Honestly, if we cannot come up with a clear
> definition of what should be combined or not, I would be of the
> opinion to just wipe out the option, and just return in the text array
> the bits which are set.  It has been discussed on the thread that it
> would be confusing to not show combined flags to some users as some
> bits set have rather contrary meaning when set with others.
>

Yes, I think we could have more discussion on this point.  It is not
100% clear how we should interpret this flag and/or where to draw the
line.  It might be that whatever we have done is alright, but still,
it is worth more discussion and opinions from a few more people.

> We tell
> the user that all the flag details are defined in htup_details.h in
> the code and the documentation so the information is not in the
> returned data, but in the code.  And I would like to think that users
> of pageinspect are knowledgeable enough about Postgres that they would
> likely never use decode_combined = true.  Likely I am outnumbered
> regarding this point, so I won't push hard on it, still I get that the
> confusion does not come from this module, but in the way the code
> combines and names all the bits for the infomasks :)
>
> And there would be the argument to not use HEAP_XMAX_IS_LOCKED_ONLY()
> in the code.
>
> >> I am not very happy with the parameter name 'decode_combined'.  It is
> >> not clear from the name what it means and I think it doesn't even
> >> match with what we are actually doing here.  How about raw_flags,
> >> raw_tuple_flags or something like that?
> >>
> >
> > raw_flags might be more straightforward. Or perhaps the third idea
> > could be show_raw_flags? If other hackers agree to change the flag
> > name I'll fix it.
> >
> > I'll submit the patch to fix the commit after we got a consensus on
> > the above changes.
>
> decode_combined sounds like a good compromise to me.  If there is a
> better consensus, well, let's use it, but I don't find those
> suggestions to be improvements.
>

I think it depends on the meaning of that flag.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2019-09-12 Thread Asim R P
On Thu, Sep 12, 2019 at 2:05 PM Kyotaro Horiguchi 
wrote:
>
> Hello.
>
> At Wed, 11 Sep 2019 17:26:44 +0300, Anastasia Lubennikova <
a.lubennik...@postgrespro.ru> wrote in <
a82a896b-93f0-c26c-b941-f56651313...@postgrespro.ru>
> > 10.09.2019 14:42, Asim R P wrote:
> > > Hi Anastasia
> > >
> > > On Thu, Aug 22, 2019 at 9:43 PM Anastasia Lubennikova
> > > mailto:a.lubennik...@postgrespro.ru>>
> > > wrote:
> > > >
> > > > But during the review, I found a bug in the current implementation.
> > > > New behavior must apply to crash-recovery only, now it applies to
> > > > archiveRecovery too.
> > > > That can cause a silent loss of a tablespace during regular standby
> > > > operation
> > > > since it never calls CheckRecoveryConsistency().
>
> Yeah. We should take the same steps for redo operations on
> missing pages. Just record the failure while in an inconsistent state,
> then forget it if the underlying tablespace is gone. If we still have a
> record when we reach consistency, we're in a serious situation and
> should stop recovery.  log_invalid_page, forget_invalid_pages and
> CheckRecoveryConsistency are the entry points for understanding this
> feature.
>

Yes, I get it now.  I will adjust the patch written by Paul accordingly.

>
> # I remember that the start point of this patch is a crash after
> # table space drop subsequent to several operations within the
> # table space. Then, crash recovery fails at an operation in the
> # finally-removed tablespace. Is it right?
>

That's correct.  Once the directories are removed from the filesystem, any
attempt to replay WAL records that depend on their existence fails.


> > > But before that I'm revisiting another solution upthread, that of
> > > creating restart points when replaying create/drop database commands
> > > before making filesystem changes such as removing a directory.  The
> > > restart points should align with checkpoints on master.  The concern
> > > against this solution was creation of restart points will slow down
> > > recovery.  I don't think crash recovery is affected by this solution
> > > because of the already existing enforcement of checkpoints.  WAL
> > > records prior to a create/drop database will not be seen by crash
> > > recovery due to the checkpoint enforced during the command's normal
> > > execution.
> > >
> >
> I haven't measured the impact of generating extra restart points in the
> previous solution, so I cannot tell whether the concerns upthread are
> justified.  Still, I like the latest design more, since it is clear and
> similar to the code checking for unexpected uninitialized pages. In
> principle it works. And the issue I described in the previous review can
> be easily fixed by several additional checks of the InHotStandby macro.
>
> Generally we shouldn't trigger useless restart point for
> uncertain reasons. If standby crashes, it starts the next
> recovery from the latest *restart point*.  Even in that case what
> we should do is the same.
>

The reason is quite clear to me - removing directories from the filesystem
breaks the ability to replay WAL records a second time.  And we already
create checkpoints during normal operation in such a case, so crash recovery
on a master node does not suffer from this bug.  I've attached a patch that
performs restart points during drop-database replay, just for reference.
It passes both the TAP tests written by Kyotaro and Paul.  I had to modify
the drop database WAL record a bit.

Asim


v1-0001-Create-restartpoint-when-replaying-drop-database.patch
Description: Binary data


v1-0002-Test-to-validate-replay-of-WAL-records-involving-.patch
Description: Binary data


Re: Create collation reporting the ICU locale display name

2019-09-12 Thread Daniel Verite
Michael Paquier wrote:

> On Wed, Sep 11, 2019 at 04:53:16PM +0200, Daniel Verite wrote:
> > I think it would be nice to have CREATE COLLATION report this
> > information as feedback in the form of a NOTICE message.
> > PFA a simple patch implementing that.
> 
> Why is that better than the descriptions provided with \dO[S]+ in
> psql?

There is no description for collations created outside of
pg_import_system_collations().

Example:

db=# create collation mycoll(provider=icu, locale='fr-FR-u-ks-level1');
NOTICE:  ICU locale: "French (France, colstrength=primary)"

db=# \x auto

db=# \dO+
List of collations
-[ RECORD 1 ]--+--
Schema | public
Name   | mycoll
Collate| fr-FR-u-ks-level1
Ctype  | fr-FR-u-ks-level1
Provider   | icu
Deterministic? | yes
Description| 

The NOTICE above is with the patch. Otherwise, the "display name"
is never shown nor stored anywhere AFAICS.
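
For the record, ICU exposes that string through uloc_getDisplayName();
fetching it as UTF-8 looks roughly like this (an illustration of the ICU
API, not necessarily the exact code path the patch takes):

#include <unicode/uloc.h>
#include <unicode/ustring.h>

/* Render the display name of "locale", in English, into "buf" as UTF-8. */
static void
icu_display_name(const char *locale, char *buf, int32_t buflen)
{
	UChar		uname[256];
	int32_t		ulen;
	UErrorCode	status = U_ZERO_ERROR;

	ulen = uloc_getDisplayName(locale, "en", uname, 256, &status);
	if (U_FAILURE(status))
	{
		buf[0] = '\0';
		return;
	}

	status = U_ZERO_ERROR;
	u_strToUTF8(buf, buflen, NULL, uname, ulen, &status);
}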


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: psql - improve test coverage from 41% to 88%

2019-09-12 Thread Fabien COELHO



>> Here is a v5.

> Few more in icommand_checks subroutine:
> Few unwanted code can be removed.

Indeed, more debug and test code.

Attached v6 fixes these, and I checked for remaining leftovers without
finding any.


--
Fabien.diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index b7d36b65dd..aabc27ab6f 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -506,6 +506,7 @@ system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 $node->command_checks_all(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_corrupt" ],
 	1,
+	'',
 	[qr{^$}],
 	[qr/^WARNING.*checksum verification failed/s],
 	'pg_basebackup reports checksum mismatch');
@@ -526,6 +527,7 @@ system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 $node->command_checks_all(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2" ],
 	1,
+	'',
 	[qr{^$}],
 	[qr/^WARNING.*further.*failures.*will.not.be.reported/s],
 	'pg_basebackup does not report more than 5 checksum mismatches');
@@ -542,6 +544,7 @@ system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 $node->command_checks_all(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_corrupt3" ],
 	1,
+	'',
 	[qr{^$}],
 	[qr/^WARNING.*7 total checksum verification failures/s],
 	'pg_basebackup correctly report the total number of checksum mismatches');
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 59228b916c..cf4811d382 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -62,6 +62,7 @@ sub check_relation_corruption
 			'--filenode',   $relfilenode_corrupted
 		],
 		1,
+		'',
 		[qr/Bad checksums:.*1/],
 		[qr/checksum verification failed/],
 		"fails with corrupted data for single relfilenode on tablespace $tablespace"
@@ -71,6 +72,7 @@ sub check_relation_corruption
 	$node->command_checks_all(
 		[ 'pg_checksums', '--check', '-D', $pgdata ],
 		1,
+		'',
 		[qr/Bad checksums:.*1/],
 		[qr/checksum verification failed/],
 		"fails with corrupted data on tablespace $tablespace");
@@ -203,6 +205,7 @@ sub fail_corrupt
 	$node->command_checks_all(
 		[ 'pg_checksums', '--check', '-D', $pgdata ],
 		1,
+		'',
 		[qr/^$/],
 		[qr/could not read block 0 in file.*$file\":/],
 		"fails for corrupted data in $file");
diff --git a/src/bin/pg_controldata/t/001_pg_controldata.pl b/src/bin/pg_controldata/t/001_pg_controldata.pl
index 3b63ad230f..ebe9b80a52 100644
--- a/src/bin/pg_controldata/t/001_pg_controldata.pl
+++ b/src/bin/pg_controldata/t/001_pg_controldata.pl
@@ -33,6 +33,7 @@ close $fh;
 command_checks_all(
 	[ 'pg_controldata', $node->data_dir ],
 	0,
+	'',
 	[
 		qr/WARNING: Calculated CRC checksum does not match value stored in file/,
 		qr/WARNING: invalid WAL segment size/
diff --git a/src/bin/pg_resetwal/t/002_corrupted.pl b/src/bin/pg_resetwal/t/002_corrupted.pl
index f9940d7fc5..1990669d26 100644
--- a/src/bin/pg_resetwal/t/002_corrupted.pl
+++ b/src/bin/pg_resetwal/t/002_corrupted.pl
@@ -30,6 +30,7 @@ close $fh;
 command_checks_all(
 	[ 'pg_resetwal', '-n', $node->data_dir ],
 	0,
+	'',
 	[qr/pg_control version number/],
 	[
 		qr/pg_resetwal: warning: pg_control exists but is broken or wrong version; ignoring it/
@@ -46,6 +47,7 @@ close $fh;
 command_checks_all(
 	[ 'pg_resetwal', '-n', $node->data_dir ],
 	0,
+	'',
 	[qr/pg_control version number/],
 	[
 		qr/\Qpg_resetwal: warning: pg_control specifies invalid WAL segment size (0 bytes); proceed with caution\E/
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index b82d3f65c4..01010914fe 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -50,7 +50,7 @@ sub pgbench
 
 	push @cmd, @args;
 
-	$node->command_checks_all(\@cmd, $stat, $out, $err, $name);
+	$node->command_checks_all(\@cmd, $stat, '', $out, $err, $name);
 
 	# cleanup?
 	#unlink @filenames or die "cannot unlink files (@filenames): $!";
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index f7fa18418b..b58f3548c3 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -25,7 +25,7 @@ sub pgbench
 	my ($opts, $stat, $out, $err, $name) = @_;
 	print STDERR "opts=$opts, stat=$stat, out=$out, err=$err, name=$name";
 	command_checks_all([ 'pgbench', split(/\s+/, $opts) ],
-		$stat, $out, $err, $name);
+		$stat, '', $out, $err, $name);
 	return;
 }
 
@@ -52,7 +52,7 @@ sub pgbench_scripts
 			push @cmd, '-f', $filename;
 		}
 	}
-	command_checks_all(\@cmd, $stat, $out, $err, $name);
+	command_checks_all(\@cmd, $stat, '', $out, $err, $name);
 	return;
 }
 
diff --git a/src/bin/psql/.gitignore b/src/bin/psql/.gitignore
index c2862b12d6..d324c1c1fa 100644
--- a/src/bin/psql/.gitignore
+++ b/src/bin/psql/.gitignore
@@ -3,3 +3,4 @@
 /sql_help.c
 
 /psql
+/tmp_check
diff --git a/src/bin/psql/Makefile 

Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-12 Thread Michael Paquier
On Thu, Sep 12, 2019 at 05:34:08PM +0800, Masahiko Sawada wrote:
> On Thu, Sep 12, 2019 at 1:56 PM Amit Kapila  wrote:
>> I think it is better to use a message like "must be superuser to use
>> pageinspect functions" as this function doesn't take raw page as
>> input.  If you see other functions like bt_page_items which doesn't
>> take raw page as input has the message which I am suggesting.  I can
>> see that tuple_data_split also has a similar message as you are
>> proposing, but I think that is also not very appropriate.
> 
> Agreed. Will fix.

Well, those functions are all able to work only from data of a raw
page, so the existing message was actually fine by me.  If you want to
change it this way, I don't see either any arguments against.

> Hmm my understanding of 'decode_combined' is to decode the flags that
> we represent by using multiple flags. HEAP_XMAX_IS_LOCKED_ONLY is true
> either if HEAP_XMAX_LOCK_ONLY is set or not a multi and the EXCL_LOCK
> bit is set. That is it requires only one flag. So I thought that it's
> not a combined flag.

Same interpretation here.

>> I am not completely sure whether we want to cover HEAP_LOCKED_UPGRADED
>> case even when decode_combined flag is set.  It seems this is a bit
>> more interpretation of flags than we are doing in other cases.  For
>> example, other cases like HEAP_XMAX_SHR_LOCK or HEAP_XMIN_FROZEN are
>> the flags that are explicitly set on the tuple so displaying them
>> makes sense, but the same is not true for HEAP_LOCKED_UPGRADED.
> 
> I thought it would be better to interpret it as much as possible,
> especially for diagnostic use cases. I'm concerned that user might not
> be able to get enough information for investigation if we
> intentionally filtered particular flags.

For HEAP_LOCKED_UPGRADED, my interpretation was that the current code
is correct to understand it as a decomposition of HEAP_XMAX_IS_MULTI
and HEAP_XMAX_LOCK_ONLY, still...

It seems to me that the way we define combined flags is subject to
a lot of interpretation.  Honestly, if we cannot come up with a clear
definition of what should be combined or not, I would be of the
opinion to just wipe out the option, and just return in the text array
the bits which are set.  It has been discussed on the thread that it
would be confusing to not show combined flags to some users as some
bits set have rather contrary meaning when set with others.  We tell
the user that all the flag details are defined in htup_details.h in
the code and the documentation so the information is not in the
returned data, but in the code.  And I would like to think that users
of pageinspect are knowledgeable enough about Postgres that they would
likely never use decode_combined = true.  Likely I am outnumbered
regarding this point, so I won't push hard on it, still I get that the
confusion does not come from this module, but in the way the code
combines and names all the bits for the infomasks :)

And there would be the argument to not use HEAP_XMAX_IS_LOCKED_ONLY()
in the code.

>> I am not very happy with the parameter name 'decode_combined'.  It is
>> not clear from the name what it means and I think it doesn't even
>> match with what we are actually doing here.  How about raw_flags,
>> raw_tuple_flags or something like that?
>>
> 
> raw_flags might be more straightforward. Or perhaps the third idea
> could be show_raw_flags? If other hackers agree to change the flag
> name I'll fix it.
> 
> I'll submit the patch to fix the commit after we got a consensus on
> the above changes.

decode_combined sounds like a good compromise to me.  If there is a
better consensus, well, let's use it, but I don't find those
suggestions to be improvements.
--
Michael


signature.asc
Description: PGP signature


Close stdout and stderr in syslogger

2019-09-12 Thread Святослав Ермилин
Hi!

A few months ago we encountered a situation where some quite big log files
were still held open by Postgres despite being deleted. This affects free
space calculation in our managed PostgreSQL instances. Currently I'm
investigating this issue. We traced some of the roots to unclosed
descriptors in the Perl code of postgresql-common [0], but we still see
some unclosed descriptors pointing to the log file.

The Debian tools from postgresql-common start pg_ctl piped to a logfile.
The descriptor is piped to the logfile and blocks it from deletion. That is
why we can't delete logfile.1 after logrotate, and after a second logrotate
logfile.1 is in "deleted" status but can't actually be deleted.

Here I attach a patch with a possible solution. In this patch stdout and
stderr are simply closed in the syslogger.

--
Sviatoslav Ermilin
Yandex

[0] https://salsa.debian.org/postgresql/postgresql-common/commit/580aa0677edc222ebaf6e1031cf3929f847f27fb

From ef44d999ce1d7dcfa515713e52d8b318d5f655af Mon Sep 17 00:00:00 2001
From: Ermilin Sviatoslav 
Date: Thu, 12 Sep 2019 11:58:05 +0500
Subject: [PATCH] Close stderr and stdout in syslogger. Debian tools from
 postgresql-common starts pg_ctl piped to logfile. Descriptor is piped to
 logfile and block it for delete. That is why we can't delete logfile.1 after
 logrotate. And after second logrotate logfile.1 is in "deleted" status, but
 can't be deleted in fact.

---
 src/backend/postmaster/syslogger.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 14d72d38d7..e23fa96c71 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -1365,6 +1365,9 @@ logfile_rotate(bool time_based_rotation, int size_rotation_for)
 	if (csvfilename)
 		pfree(csvfilename);
 
+	fclose(stdout);
+	fclose(stderr);
+
 	update_metainfo_datafile();
 
 	set_next_rotation_time();
-- 
2.20.1 (Apple Git-117)



Re: psql - improve test coverage from 41% to 88%

2019-09-12 Thread vignesh C
On Thu, Sep 12, 2019 at 2:15 PM Fabien COELHO  wrote:
>
>
> >> Ok. Rebased version added, with some minor changes to improve readability
> >> (comments, variables).
> >
> > Few comments: [...]
> >
> > Commented line can be removed
> > Commented lines can be removed
> > ??? can be changed to some suitable heading
> > tab-complation to be changed to tab-completion
> > Commented lines can be removed, some more are present below these lines 
> > also.
>
> Thanks for this review.
>
> The lines were really tests I did that had some issues because of the way
> the Expect module works, and are not useful for inclusion in the code
> base.
>
> Here is a v5.
Few more in icommand_checks subroutine:

+
+ #$ps->slave->stty(qw(raw -echo));
+ $ps->slave->stty(qw(raw));
+ my $n = 0;
+ for my $test (@$inout)
+ {
+ #warn "test: @$test";
+ my ($in, @out) = @$test;
+ $n++;
+ #warn "in: $in";
+ #warn "out: @out";
+ $ps->send($in);

Few unwanted code can be removed.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: let's kill AtSubStart_Notify

2019-09-12 Thread Kyotaro Horiguchi
At Thu, 12 Sep 2019 09:44:49 +0530, Dilip Kumar  wrote 
in 
> On Wed, Sep 11, 2019 at 6:22 PM Robert Haas  wrote:
> > trivial subtransactions.  I used this test case:
> >
> > \timing
> > do $$begin for i in 1 .. 1000 loop begin null; exception when
> > others then null; end; end loop; end;$$;
> >
> > I ran the test four times with and without the patch and took the
> > median of the last three.  This was an attempt to exclude effects due
> > to starting up the database cluster. With the patch, the result was
> > 3127.377 ms; without the patch, it was 3527.285 ms. That's a big
> > enough difference that I'm wondering whether I did something wrong
> > while testing this, so feel free to check my work and tell me whether
> > I'm all wet. Still, I don't find it wholly unbelievable, because I've
> > observed in the past that these code paths are lean enough that a few
> > palloc() calls can make a noticeable difference, and the effect of
> > this patch is to remove a few palloc() calls.
> 
> I did not read the patch but ran the same case you have given, and
> I can see a similar improvement with the patch.
> With the patch 8832.988 ms, without the patch 10252.701 ms (median of three
> readings)

I see a similar result. The patch makes it run faster by about
25%. The gain is reduced to 3-6% by a crude check: adding { (in
TopTransactionContext) lcons(0, p1); lcons(0, p2); } at the place where
AtSubStart_Notify was called, and the respective list_delete_first's
just after the call to AtSubCommit_Notify. So at least around 20% of
the gain seems to come from removing the palloc/pfree's.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Specifying attribute slot for storing/reading statistics

2019-09-12 Thread Esteban Zimanyi
>
> So these are 4 different data types (or classes of data types) that you
> introduce in your extension? Or is that just a conceptual view and it's
> stored in some other way (e.g. normalized in some way)?
>

At the SQL level these 4 durations are not distinguishable. For example for
a tfloat (temporal float) we can have

select tfloat '1@2000-01-01' -- Instant duration
select tfloat '{1@2000-01-01, 2@2000-01-02, 1@2000-01-03}' -- Instant set duration
select tfloat '[1@2000-01-01, 2@2000-01-02, 1@2000-01-03)' -- Sequence duration,
left-inclusive and right-exclusive bounds
select tfloat '{[1@2000-01-01, 2@2000-01-02, 1@2000-01-03], [1@2000-01-04,
1@2000-01-05]}' -- Sequence set duration

Nevertheless it is possible to restrict a column to a specific duration
with a typmod specifier as in

create table test ( ..., measure tfloat(Instant) -- only Instant durations
accepted, ...)

At the C level these 4 durations are distinguished and implemented in
something equivalent to a template abstract class Temporal with four
subclasses TemporalInst, TemporalI, TemporalSeq, and TemporalS. Indeed the
algorithms for manipulating these 4 durations are completely different.
They are called template classes since they keep the Oid of the base type
(float for tfloat or geometry for tgeompoint) in the same way arrays or
ranges do.

For more information please refer to the manual at github
https://github.com/ULB-CoDE-WIT/MobilityDB/


> I don't think we're strongly against changing the code to allow this, as
> long as it does not break existing extensions/code (unnecessarily).
>
> >If you want I can prepare a PR in order to understand the implications of
> >these changes. Please let me know.
> >
>
> I think having an actual patch to look at would be helpful.
>

I am preparing a first patch for the files selfuncs.h and selfuncs.c, and
thus for instant duration selectivity. It basically
1) Moves some prototypes of the static functions from the .c to the .h file
so that the functions are exported.
2) Passes the operator from the top-level functions to the inner functions
such as mcv_selectivity or ineq_histogram_selectivity.

This allows me to call the functions twice, once for the value component
and another for the time component, e.g. as follows.

else if (cachedOp == CONTAINED_OP || cachedOp == OVERLAPS_OP)
{
    /*
     * Compute the selectivity of the value and the time dimensions;
     * either may be missing, so start from 1.0 (these must be floating
     * point, not int, or the fractional selectivities are lost).
     */
    double      selec_value = 1.0,
                selec_time = 1.0;

    /* Selectivity for the value dimension */
    if (MOBDB_FLAGS_GET_X(box->flags))
    {
        operator = oper_oid(LT_OP, valuetypid, valuetypid);
        selec_value = scalarineqsel(root, operator, false, false, vardata,
                                    Float8GetDatum(box->xmin), valuetypid);
        operator = oper_oid(GT_OP, valuetypid, valuetypid);
        selec_value += scalarineqsel(root, operator, true, false, vardata,
                                     Float8GetDatum(box->xmax), valuetypid);
        selec_value = 1 - selec_value;
    }
    /* Selectivity for the time dimension */
    if (MOBDB_FLAGS_GET_T(box->flags))
    {
        operator = oper_oid(LT_OP, T_TIMESTAMPTZ, T_TIMESTAMPTZ);
        selec_time = scalarineqsel(root, operator, false, false, vardata,
                                   TimestampTzGetDatum(box->tmin), TIMESTAMPTZOID);
        operator = oper_oid(GT_OP, T_TIMESTAMPTZ, T_TIMESTAMPTZ);
        selec_time += scalarineqsel(root, operator, true, false, vardata,
                                    TimestampTzGetDatum(box->tmax), TIMESTAMPTZOID);
        selec_time = 1 - selec_time;
    }
    selec = selec_value * selec_time;
}

Regards

Esteban


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-12 Thread Masahiko Sawada
On Thu, Sep 12, 2019 at 1:56 PM Amit Kapila  wrote:
>
> On Wed, Sep 11, 2019 at 8:53 PM Masahiko Sawada  wrote:
> >
> > I've attached the updated patch that incorporated all comments. I kept
> > the function as superuser-restricted.
> >
>
> Thanks for the updated patch.
>
> Few more comments:

Thank you for your comments.

>
> *
> + if (!superuser())
> + ereport(ERROR,
> + (errcode
> (ERRCODE_INSUFFICIENT_PRIVILEGE),
> + errmsg("must be superuser to use raw
> page functions")));
>
> I think it is better to use a message like "must be superuser to use
> pageinspect functions" as this function doesn't take raw page as
> input.  If you see other functions like bt_page_items which doesn't
> take raw page as input has the message which I am suggesting.  I can
> see that tuple_data_split also has a similar message as you are
> proposing, but I think that is also not very appropriate.

Agreed. Will fix.

>
> *
> else
> + {
> + if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
> + d[cnt++] = CStringGetTextDatum
> ("HEAP_XMAX_LOCK_ONLY");
>
> If the idea is that whenever decode_combined flag is false, we will
> display the raw flags set on the tuple, then why to try to interpret
> flags on a tuple in the above case.

Hmm, my understanding of 'decode_combined' is that it decodes the flags that
we represent by using multiple flags. HEAP_XMAX_IS_LOCKED_ONLY is true
either if HEAP_XMAX_LOCK_ONLY is set, or if it is not a multi and the
EXCL_LOCK bit is set. That is, it requires only one flag. So I thought that
it's not a combined flag.
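
For reference, the relevant definitions (quoted from memory of
htup_details.h; the header is authoritative):

#define HEAP_XMAX_IS_LOCKED_ONLY(infomask) \
	(((infomask) & HEAP_XMAX_LOCK_ONLY) || \
	 (((infomask) & (HEAP_XMAX_IS_MULTI | HEAP_LOCK_MASK)) == HEAP_XMAX_EXCL_LOCK))

#define HEAP_LOCKED_UPGRADED(infomask) \
	(((infomask) & HEAP_XMAX_IS_MULTI) != 0 && \
	 ((infomask) & HEAP_XMAX_LOCK_ONLY) != 0 && \
	 (((infomask) & (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)) == 0))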

>
> *
> + if (decode_combined &&
> + HEAP_LOCKED_UPGRADED(t_infomask))
> + d[cnt++] = CStringGetTextDatum("HEAP_LOCKED_UPGRADED");
> + else
> + {
> + if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
> + d[cnt++] = CStringGetTextDatum
> ("HEAP_XMAX_LOCK_ONLY");
> + if ((t_infomask & HEAP_XMAX_IS_MULTI) != 0)
> + d[cnt++] = CStringGetTextDatum
> ("HEAP_XMAX_IS_MULTI");
> + }
>
> I am not completely sure whether we want to cover HEAP_LOCKED_UPGRADED
> case even when decode_combined flag is set.  It seems this is a bit
> more interpretation of flags than we are doing in other cases.  For
> example, other cases like HEAP_XMAX_SHR_LOCK or HEAP_XMIN_FROZEN are
> the flags that are explicitly set on the tuple so displaying them
> makes sense, but the same is not true for HEAP_LOCKED_UPGRADED.
>

I thought it would be better to interpret it as much as possible,
especially for diagnostic use cases. I'm concerned that users might not
be able to get enough information for an investigation if we
intentionally filter out particular flags.

> *
> +CREATE FUNCTION heap_tuple_infomask_flags(
> +   t_infomask integer,
> +   t_infomask2 integer,
> +   decode_combined boolean DEFAULT false)
>
> I am not very happy with the parameter name 'decode_combined'.  It is
> not clear from the name what it means and I think it doesn't even
> match with what we are actually doing here.  How about raw_flags,
> raw_tuple_flags or something like that?
>

raw_flags might be more straightforward. Or perhaps the third idea
could be show_raw_flags? If other hackers agree to change the flag
name I'll fix it.

I'll submit the patch to fix the commit after we got a consensus on
the above changes.



Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2019-09-12 Thread Kyotaro Horiguchi
Hello.

At Wed, 11 Sep 2019 17:26:44 +0300, Anastasia Lubennikova 
 wrote in 

> 10.09.2019 14:42, Asim R P wrote:
> > Hi Anastasia
> >
> > On Thu, Aug 22, 2019 at 9:43 PM Anastasia Lubennikova
> > mailto:a.lubennik...@postgrespro.ru>>
> > wrote:
> > >
> > > But during the review, I found a bug in the current implementation.
> > > New behavior must apply to crash-recovery only, now it applies to
> > > archiveRecovery too.
> > > That can cause a silent loss of a tablespace during regular standby
> > > operation
> > > since it never calls CheckRecoveryConsistency().

Yeah. We should take the same approach as redo operations on
missing pages: just record the failure while still in an inconsistent
state, then forget it if the underlying tablespace turns out to be
gone. If any such record remains when we reach consistency, we're in a
serious situation and should stop recovery.  log_invalid_page,
forget_invalid_pages and CheckRecoveryConsistency are the entry points
to look at to understand that machinery.
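
To be a bit more concrete, the shape of that machinery is roughly the
following skeleton (illustrative only; the real code lives in
src/backend/access/transam/xlogutils.c and uses a dynahash keyed by
relfilenode, fork and block number):

    #include "postgres.h"
    #include "common/relpath.h"
    #include "storage/block.h"
    #include "storage/relfilenode.h"
    #include "utils/hsearch.h"

    static HTAB *invalid_page_tab = NULL;    /* exists only during recovery */

    /* Redo hit a missing page while still inconsistent: remember it. */
    static void
    remember_invalid_page(RelFileNode node, ForkNumber forkno, BlockNumber blkno)
    {
        /* insert (node, forkno, blkno) into invalid_page_tab */
    }

    /* A later WAL record drops the containing object: forgive those entries. */
    static void
    forget_invalid_pages_for(Oid dboid, Oid spcoid)
    {
        /* delete every entry whose relfilenode falls under the dropped object */
    }

    /* Called once CheckRecoveryConsistency() sees we reached consistency. */
    static void
    check_invalid_pages(void)
    {
        if (invalid_page_tab != NULL &&
            hash_get_num_entries(invalid_page_tab) > 0)
            elog(PANIC, "WAL contains references to invalid pages");
    }

The missing-tablespace case should be able to hook into the same
record/forget/check cycle.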

> > > Steps to reproduce:
> > > 1) run master and replica
> > > 2) create dir for tablespace:
> > > mkdir  /tmp/tblspc1
> > >
> > > 3) create tablespace and database on the master:
> > > create tablespace tblspc1 location '/tmp/tblspc1';
> > > create database db1 tablespace tblspc1 ;
> > >
> > > 4) wait for replica to receive this changes and pause replication:
> > > select pg_wal_replay_pause();
> > >
> > > 5) move replica's tablespace symlink to some empty directory,
> > > i.e. /tmp/tblspc2
> > > mkdir  /tmp/tblspc2
> > > ln -sfn /tmp/tblspc2 postgresql_data_replica/pg_tblspc/16384
> > >
> >
> > By changing the tablespace symlink target, we are silently nullifying
> > effects of a committed transaction from the standby data directory -
> > the directory structure created by the standby for create tablespace
> > transaction.  This step, therefore, does not look like a valid test
> > case to me.  Can you share a sequence of steps that does not involve
> > changing data directory manually?

I see it as the same thing. With those steps, the WAL is inconsistent
with what happened on storage; the database is simply broken.

> Hi, the whole idea of the test is to reproduce a data loss. For
> example, if the disk containing this tablespace failed.

So in that case we apparently must start recovery from a backup taken
before that failure happened, and that should end in success.

# I remember that the starting point of this patch was a crash after
# a tablespace drop that followed several operations within that
# tablespace. Crash recovery then fails at an operation in the
# finally-removed tablespace. Is that right?

> Probably, simply deleting the directory
> 'postgresql_data_replica/pg_tblspc/16384'
> would work as well, though I was afraid that it can be caught by some
> earlier checks and my example won't be so illustrative.
> >
> > Thank you for the review feedback.  I agree with all the points.  Let
> > me incorporate them (I plan to pick this work up and drive it to
> > completion as Paul got busy with other things).
> >
> > But before that I'm revisiting another solution upthread, that of
> > creating restart points when replaying create/drop database commands
> > before making filesystem changes such as removing a directory.  The
> > restart points should align with checkpoints on master.  The concern
> > against this solution was creation of restart points will slow down
> > recovery.  I don't think crash recovery is affected by this solution
> > because of the already existing enforcement of checkpoints.  WAL
> > records prior to a create/drop database will not be seen by crash
> > recovery due to the checkpoint enforced during the command's normal
> > execution.
> >
> 
> I haven't measured the impact of generating extra restart points in
> previous solution, so I cannot tell whether concerns upthread are
> justified.  Still, I enjoy latest design more, since it is clear and
> similar with the code of checking unexpected uninitialized pages. In
> principle it works. And the issue, I described in previous review can
> be easily fixed by several additional checks of InHotStandby macro.

Generally we shouldn't trigger useless restart points for
uncertain reasons. If the standby crashes, it starts the next
recovery from the latest *restart point*, but even in that case what
we should do is the same.

Of course, for testing, we *should* create a restartpoint
manually in order to set up the prerequisite state.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2019-09-12 Thread Tomas Vondra

On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant wrote:

On 2019-Sep-11, Amit Khandekar wrote:


I reproduced the error "exceeded maxAllocatedDescs (492) while trying
to open file ...", which was also discussed about in the thread [1].
This issue is similar but not exactly the same as [1]. In [1], the
file for which this error used to show up was
"pg_logical/mappings/map" , while here it's the .spill file. And
here the issue , in short, seems to be : The .spill file does not get
closed there and then, unlike in [1] where there was a file descriptor
leak.


Uh-oh :-(  Thanks for the reproducer -- I confirm it breaks things.


Looking at the code, what might be happening is,
ReorderBufferIterTXNInit()=>ReorderBufferRestoreChanges() opens the
files, but leaves them open if end of file is not reached. Eventually
if end of file is reached, it gets closed. The function returns back
without closing the file descriptor if  reorder buffer changes being
restored are more than max_changes_in_memory. Probably later on, the
rest of the changes get restored in another
ReorderBufferRestoreChanges() call. But meanwhile, if there are a lot
of such files opened, we can run out of the max files that PG decides
to keep open (it has some logic that takes into account system files
allowed to be open, and already opened).


Makes sense.


Offhand, what I am thinking is, we need to close the file descriptor
before returning from ReorderBufferRestoreChanges(), and keep track of
the file offset and file path, so that next time we can resume reading
from there.


I think doing this all the time would make restore very slow -- there's a
reason we keep the files open, after all.


How much slower? It will certainly take a hit, but maybe it's negligible
compared to all the other stuff happening in this code?


It would be better if we can keep the descriptors open as much as
possible, and only close them if there's trouble.  I was under the
impression that using OpenTransientFile was already taking care of that,
but that's evidently not the case.



I don't see how the current API could do that transparently - it does
track the files, but the caller only gets a file descriptor. With just a
file descriptor, how could the code know when to reopen and seek, if the
caller just goes through the regular open/close calls?

Anyway, I agree we need to do something, to fix this corner case (many
serialized in-progress transactions). ISTM we have two options - either do
something in the context of reorderbuffer.c, or extend the transient file
API somehow. I'd say the second option is the right thing going forward,
because it does allow doing it transparently and without leaking details
about maxAllocatedDescs etc. There are two issues, though - it does
require changes / extensions to the API, and it's not backpatchable.

So maybe we should start with the localized fix in reorderbuffer, and I
agree tracking offset seems reasonable.
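
To illustrate the localized fix (names below are made up, they are not
the actual reorderbuffer.c identifiers): remember the spill file path and
the offset we stopped at, and reopen/seek on the next call instead of
keeping one descriptor open per serialized transaction:

    #include "postgres.h"

    #include <fcntl.h>
    #include <unistd.h>

    #include "storage/fd.h"

    typedef struct SpillReadState
    {
        char    path[MAXPGPATH];    /* spill file currently being read */
        off_t   read_offset;        /* how far the previous call got */
    } SpillReadState;

    /* Read the next chunk of a spill file without keeping any fd open. */
    static ssize_t
    read_next_chunk(SpillReadState *state, char *buf, size_t len)
    {
        int         fd;
        ssize_t     nread;

        fd = OpenTransientFile(state->path, O_RDONLY | PG_BINARY);
        if (fd < 0)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not open file \"%s\": %m", state->path)));

        /* resume exactly where the previous call stopped */
        if (lseek(fd, state->read_offset, SEEK_SET) < 0)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not seek in file \"%s\": %m", state->path)));

        nread = read(fd, buf, len);
        if (nread < 0)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not read file \"%s\": %m", state->path)));

        state->read_offset += nread;

        /* never return with the descriptor still open */
        CloseTransientFile(fd);

        return nread;
    }

The price is the extra open/seek per call, which is the slowdown concern
mentioned above.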

As a sidenote - in the other thread about streaming, one of the patches
does change how we log subxact assignments. In the end, this allows using
just a single file for the top-level transaction, instead of having one
file per subxact. That would also solve this.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-12 Thread Amit Kapila
On Thu, Sep 12, 2019 at 11:43 AM Michael Paquier  wrote:
>
> On Wed, Sep 11, 2019 at 11:22:45PM +0800, Masahiko Sawada wrote:
> > Hmm it will be more consistent with other functions but I think we
> > would need to increase the pageinspect version to 1.8 and need the new
> > sql file to rename the function name. And it will be for PG12, not
> > PG13. If we have to do it someday I think it's better to do it in PG12
> > that the table AM has been introduced to. Anyway I've attached
> > separate patch for it.
>
> Like Alvaro, I would discard this one for now.
>
> > I've attached the updated patch that incorporated all comments. I kept
> > the function as superuser-restricted.
>
> But not this one.  So committed.
>

I had a few comments, as posted in the previous email, which I think we
can address incrementally as patches for them are produced.  However,
one point I am slightly worried about is the last one in my email: are
we happy with the name of the new API parameter, decode_combined?  If
we decide to change it, we need to change the exposed API, and ideally
we would bump the extension version as well; but I might be wrong, and
maybe the parameter name as committed is good enough, in which case we
should be fine.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: psql - improve test coverage from 41% to 88%

2019-09-12 Thread Fabien COELHO



Ok. Rebased version added, with some minor changes to improve readability
(comments, variables).


Few comments: [...]

Commented line can be removed
Commented lines can be removed
??? can be changed to some suitable heading
tab-complation to be changed to tab-completion
Commented lines can be removed, some more are present below these lines also.


Thanks for this review.

The commented-out lines were really tests I tried that had issues
because of the way the Expect module works; they are not useful for
inclusion in the code base.


Here is a v5.

--
Fabien Coelho - CRI, MINES ParisTech

diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index b7d36b65dd..aabc27ab6f 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -506,6 +506,7 @@ system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 $node->command_checks_all(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_corrupt" ],
 	1,
+	'',
 	[qr{^$}],
 	[qr/^WARNING.*checksum verification failed/s],
 	'pg_basebackup reports checksum mismatch');
@@ -526,6 +527,7 @@ system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 $node->command_checks_all(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2" ],
 	1,
+	'',
 	[qr{^$}],
 	[qr/^WARNING.*further.*failures.*will.not.be.reported/s],
 	'pg_basebackup does not report more than 5 checksum mismatches');
@@ -542,6 +544,7 @@ system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 $node->command_checks_all(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_corrupt3" ],
 	1,
+	'',
 	[qr{^$}],
 	[qr/^WARNING.*7 total checksum verification failures/s],
 	'pg_basebackup correctly report the total number of checksum mismatches');
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 59228b916c..cf4811d382 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -62,6 +62,7 @@ sub check_relation_corruption
 			'--filenode',   $relfilenode_corrupted
 		],
 		1,
+		'',
 		[qr/Bad checksums:.*1/],
 		[qr/checksum verification failed/],
 		"fails with corrupted data for single relfilenode on tablespace $tablespace"
@@ -71,6 +72,7 @@ sub check_relation_corruption
 	$node->command_checks_all(
 		[ 'pg_checksums', '--check', '-D', $pgdata ],
 		1,
+		'',
 		[qr/Bad checksums:.*1/],
 		[qr/checksum verification failed/],
 		"fails with corrupted data on tablespace $tablespace");
@@ -203,6 +205,7 @@ sub fail_corrupt
 	$node->command_checks_all(
 		[ 'pg_checksums', '--check', '-D', $pgdata ],
 		1,
+		'',
 		[qr/^$/],
 		[qr/could not read block 0 in file.*$file\":/],
 		"fails for corrupted data in $file");
diff --git a/src/bin/pg_controldata/t/001_pg_controldata.pl b/src/bin/pg_controldata/t/001_pg_controldata.pl
index 3b63ad230f..ebe9b80a52 100644
--- a/src/bin/pg_controldata/t/001_pg_controldata.pl
+++ b/src/bin/pg_controldata/t/001_pg_controldata.pl
@@ -33,6 +33,7 @@ close $fh;
 command_checks_all(
 	[ 'pg_controldata', $node->data_dir ],
 	0,
+	'',
 	[
 		qr/WARNING: Calculated CRC checksum does not match value stored in file/,
 		qr/WARNING: invalid WAL segment size/
diff --git a/src/bin/pg_resetwal/t/002_corrupted.pl b/src/bin/pg_resetwal/t/002_corrupted.pl
index f9940d7fc5..1990669d26 100644
--- a/src/bin/pg_resetwal/t/002_corrupted.pl
+++ b/src/bin/pg_resetwal/t/002_corrupted.pl
@@ -30,6 +30,7 @@ close $fh;
 command_checks_all(
 	[ 'pg_resetwal', '-n', $node->data_dir ],
 	0,
+	'',
 	[qr/pg_control version number/],
 	[
 		qr/pg_resetwal: warning: pg_control exists but is broken or wrong version; ignoring it/
@@ -46,6 +47,7 @@ close $fh;
 command_checks_all(
 	[ 'pg_resetwal', '-n', $node->data_dir ],
 	0,
+	'',
 	[qr/pg_control version number/],
 	[
 		qr/\Qpg_resetwal: warning: pg_control specifies invalid WAL segment size (0 bytes); proceed with caution\E/
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index b82d3f65c4..01010914fe 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -50,7 +50,7 @@ sub pgbench
 
 	push @cmd, @args;
 
-	$node->command_checks_all(\@cmd, $stat, $out, $err, $name);
+	$node->command_checks_all(\@cmd, $stat, '', $out, $err, $name);
 
 	# cleanup?
 	#unlink @filenames or die "cannot unlink files (@filenames): $!";
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index f7fa18418b..b58f3548c3 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -25,7 +25,7 @@ sub pgbench
 	my ($opts, $stat, $out, $err, $name) = @_;
 	print STDERR "opts=$opts, stat=$stat, out=$out, err=$err, name=$name";
 	command_checks_all([ 'pgbench', split(/\s+/, $opts) ],
-		$stat, $out, $err, $name);
+		$stat, '', $out, $err, $name);
 	return;
 }
 
@@ -52,7 +52,7 @@ sub pgbench_scripts
 			push @cmd, '-f', $filename;
 		}
 	}
-	

Re: psql - improve test coverage from 41% to 88%

2019-09-12 Thread vignesh C
On Thu, Sep 12, 2019 at 11:56 AM Fabien COELHO  wrote:
>
> > On Wed, Sep 11, 2019 at 10:52:01PM +0200, Fabien COELHO wrote:
> >> AFAICR this is because the coverage was not the same:-) Some backslash
> >> commands just skip silently to the end of the line, so that intermediate
> >> \commands on the same line are not recognized/processed the same, so I 
> >> moved
> >> everything on one line to avoid this.
> >
> > I see.  So basically this tests for more code paths to ignore
> > backslash commands and it improves the coverage of \elif.  Applied
> > after fixing two nits:
> > - Indentation was incorrect.
> > - Moved the \elif test closer to the existing one for the false
> > branch (you can grep #2 to find it).
>
> Ok. Rebased version added, with some minor changes to improve readability
> (comments, variables).
>
>
Few comments:
+sub create_test_file
+{
+ my ($fname, $contents) = @_;
+ my $fn = $node->basedir . '/' . $fname;
+ #ok(not -e $fn, "$fn must not already exists");
+ append_to_file($fn, $contents);
+ return $fn;
+}

Commented line can be removed

+# nope, interacts on tty
+#psql('-W', 0, "foo\n", [ qr{^$} ], [ qr{^$} ], 'psql -W');
+psql('-x', 0, "SELECT 1 AS one, 2 AS two;\n", [ qr{one \| 1.*two \|
2}s ], $EMPTY, 'psql -x');
+# some issue, \0 is not welcome somewhere
+#psql('-A -z', "SELECT 1 AS one, 2 AS two;\n", [ qr{one.two}s,
qr{1.2}s ], $EMPTY, 'psql -z');
+#psql('-A -0', "SELECT 1 AS one, 2 AS two;\n", [ qr{two.1}s ],
$EMPTY, 'psql -0');
+psql('-1', 0, "SELECT 54;\nSELECT 32;\n", [ qr{54}, qr{32} ], $EMPTY,
'psql -1');

Commented lines can be removed

+ [ "\\lo_list\n", [ qr{Large objects} ] ],
+ [ "\\if true\\q\\endif\n", $EMPTY ],
+ # ???
+ #[ "SELECT md5('hello world');\n\\s\n", [ qr{5eb63bbbe0}, qr{SELECT md5} ] ],
+ [ "\\set\n", [ qr{ENCODING = }, qr{VERSION_NUM = } ] ],
+ [ "\\set COMP_KEYWORD_CASE preserve-lower\n\\set COMP_KEYWORD_CASE lower\n" .

#[ "Select"] commented line can be removed
??? can be changed to some suitable heading

+psql('', 0, "\\s /dev/null\n", $EMPTY, $EMPTY, 'psql \s null');
+
+# tab-complation
+ipsql('-P pager', 0, 5,
+  [ # commands

tab-complation to be changed to tab-completion

+ # but the coverage works as expected.
+ #[ "CREATE \t", qr/i(MATERIALIZED VIEW.*postgres=\# )?/s ],
+ #[ "\\r\n", qr/Query buffer reset.*postgres=\# /s ],
+ [ "CREATE \t\\r\n", qr/Query buffer reset.*postgres=\# /s ],
+ #[ "DROP \t", qr/(UNLOGGED.*postgres=\# )?/s ],
+ #[ "\\r\n", qr/Query buffer reset.*postgres=\# /s ],
+ [ "DROP \t\\r\n", qr/Query buffer reset.*postgres=\# /s ],
+ #[ "ALTER \t", qr/(TABLESPACE.*postgres=\# )/s ],
+ #[ "\\r\n", qr/Query buffer reset.*postgres=\# /s ],

Commented lines can be removed, some more are present below these lines also.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: psql - improve test coverage from 41% to 88%

2019-09-12 Thread Fabien COELHO

On Wed, Sep 11, 2019 at 10:52:01PM +0200, Fabien COELHO wrote:

AFAICR this is because the coverage was not the same:-) Some backslash
commands just skip silently to the end of the line, so that intermediate
\commands on the same line are not recognized/processed the same, so I moved
everything on one line to avoid this.


I see.  So basically this tests for more code paths to ignore
backslash commands and it improves the coverage of \elif.  Applied
after fixing two nits:
- Indentation was incorrect.
- Moved the \elif test closer to the existing one for the false
branch (you can grep #2 to find it).


Ok. Rebased version added, with some minor changes to improve readability 
(comments, variables).


--
Fabien.

diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index b7d36b65dd..aabc27ab6f 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -506,6 +506,7 @@ system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 $node->command_checks_all(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_corrupt" ],
 	1,
+	'',
 	[qr{^$}],
 	[qr/^WARNING.*checksum verification failed/s],
 	'pg_basebackup reports checksum mismatch');
@@ -526,6 +527,7 @@ system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 $node->command_checks_all(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2" ],
 	1,
+	'',
 	[qr{^$}],
 	[qr/^WARNING.*further.*failures.*will.not.be.reported/s],
 	'pg_basebackup does not report more than 5 checksum mismatches');
@@ -542,6 +544,7 @@ system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 $node->command_checks_all(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_corrupt3" ],
 	1,
+	'',
 	[qr{^$}],
 	[qr/^WARNING.*7 total checksum verification failures/s],
 	'pg_basebackup correctly report the total number of checksum mismatches');
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 59228b916c..cf4811d382 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -62,6 +62,7 @@ sub check_relation_corruption
 			'--filenode',   $relfilenode_corrupted
 		],
 		1,
+		'',
 		[qr/Bad checksums:.*1/],
 		[qr/checksum verification failed/],
 		"fails with corrupted data for single relfilenode on tablespace $tablespace"
@@ -71,6 +72,7 @@ sub check_relation_corruption
 	$node->command_checks_all(
 		[ 'pg_checksums', '--check', '-D', $pgdata ],
 		1,
+		'',
 		[qr/Bad checksums:.*1/],
 		[qr/checksum verification failed/],
 		"fails with corrupted data on tablespace $tablespace");
@@ -203,6 +205,7 @@ sub fail_corrupt
 	$node->command_checks_all(
 		[ 'pg_checksums', '--check', '-D', $pgdata ],
 		1,
+		'',
 		[qr/^$/],
 		[qr/could not read block 0 in file.*$file\":/],
 		"fails for corrupted data in $file");
diff --git a/src/bin/pg_controldata/t/001_pg_controldata.pl b/src/bin/pg_controldata/t/001_pg_controldata.pl
index 3b63ad230f..ebe9b80a52 100644
--- a/src/bin/pg_controldata/t/001_pg_controldata.pl
+++ b/src/bin/pg_controldata/t/001_pg_controldata.pl
@@ -33,6 +33,7 @@ close $fh;
 command_checks_all(
 	[ 'pg_controldata', $node->data_dir ],
 	0,
+	'',
 	[
 		qr/WARNING: Calculated CRC checksum does not match value stored in file/,
 		qr/WARNING: invalid WAL segment size/
diff --git a/src/bin/pg_resetwal/t/002_corrupted.pl b/src/bin/pg_resetwal/t/002_corrupted.pl
index f9940d7fc5..1990669d26 100644
--- a/src/bin/pg_resetwal/t/002_corrupted.pl
+++ b/src/bin/pg_resetwal/t/002_corrupted.pl
@@ -30,6 +30,7 @@ close $fh;
 command_checks_all(
 	[ 'pg_resetwal', '-n', $node->data_dir ],
 	0,
+	'',
 	[qr/pg_control version number/],
 	[
 		qr/pg_resetwal: warning: pg_control exists but is broken or wrong version; ignoring it/
@@ -46,6 +47,7 @@ close $fh;
 command_checks_all(
 	[ 'pg_resetwal', '-n', $node->data_dir ],
 	0,
+	'',
 	[qr/pg_control version number/],
 	[
 		qr/\Qpg_resetwal: warning: pg_control specifies invalid WAL segment size (0 bytes); proceed with caution\E/
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index b82d3f65c4..01010914fe 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -50,7 +50,7 @@ sub pgbench
 
 	push @cmd, @args;
 
-	$node->command_checks_all(\@cmd, $stat, $out, $err, $name);
+	$node->command_checks_all(\@cmd, $stat, '', $out, $err, $name);
 
 	# cleanup?
 	#unlink @filenames or die "cannot unlink files (@filenames): $!";
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index f7fa18418b..b58f3548c3 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -25,7 +25,7 @@ sub pgbench
 	my ($opts, $stat, $out, $err, $name) = @_;
 	print STDERR "opts=$opts, stat=$stat, out=$out, err=$err, name=$name";
 	command_checks_all([ 'pgbench', split(/\s+/, $opts) ],
-		$stat, $out, $err, $name);
+		$stat, '', 

Parallel Full Hash Join

2019-09-12 Thread Thomas Munro
Hello,

While thinking about looping hash joins (an alternative strategy for
limiting hash join memory usage currently being investigated by
Melanie Plageman in a nearby thread[1]), the topic of parallel query
deadlock hazards came back to haunt me.  I wanted to illustrate the
problems I'm aware of with the concrete code where I ran into this
stuff, so here is a new-but-still-broken implementation of $SUBJECT.
This was removed from the original PHJ submission when I got stuck and
ran out of time in the release cycle for 11.  Since the original
discussion is buried in long threads and some of it was also a bit
confused, here's a fresh description of the problems as I see them.
Hopefully these thoughts might help Melanie's project move forward,
because it's closely related, but I didn't want to dump another patch
into that other thread.  Hence this new thread.

I haven't succeeded in actually observing a deadlock with the attached
patch (though I did last year, very rarely), but I also haven't tried
very hard.  The patch seems to produce the right answers and is pretty
scalable, so it's really frustrating not to be able to get it over the
line.

Tuple queue deadlock hazard:

If the leader process is executing the subplan itself and waiting for
all processes to arrive in ExecParallelHashEndProbe() (in this patch)
while another process has filled up its tuple queue and is waiting for
the leader to read some tuples and unblock it, they will deadlock
forever.  That can't happen in the committed version of PHJ, because it
never waits for barriers after it has begun emitting tuples.

Some possible ways to fix this:

1.  You could probably make it so that the PHJ_BATCH_SCAN_INNER phase
in this patch (the scan for unmatched tuples) is executed by only one
process, using the "detach-and-see-if-you-were-last" trick.  Melanie
proposed that for an equivalent problem in the looping hash join.  I
think it probably works, but it gives up a lot of parallelism and thus
won't scale as nicely as the attached patch.  (A rough sketch of that
trick appears after this list.)

2.  You could probably make it so that only the leader process drops
out of executing the inner unmatched scan; then I think you wouldn't
have this very specific problem, at the cost of losing some (but not
all) parallelism (ie the leader), but there might be other variants of
the problem.  For example, a Gather Merge leader process might be
blocked waiting for the next tuple from P1, while P2 is trying to write
to a full queue, and P1 waits for P2.

3.  You could introduce some kind of overflow for tuple queues, so
that tuple queues can never block because they're full (until you run
out of extra memory buffers or disk and error out).  I haven't
seriously looked into this but I'm starting to suspect it's the
industrial strength general solution to the problem and variants of it
that show up in other parallelism projects (Parallel Repartition).  As
Robert mentioned last time I talked about this[2], you'd probably only
want to allow spooling (rather than waiting) when the leader is
actually waiting for other processes; I'm not sure how exactly to
control that.

4.  Goetz Graefe's writing about parallel sorting
comes close to this topic, which he calls flow control deadlocks.  He
mentions the possibility of infinite spooling like (3) as a solution.
He's describing a world where producers and consumers are running
concurrently, and the consumer doesn't just decide to start running
the subplan (what we call "leader participation"), so he doesn't
actually have a problem like Gather deadlock.  He describes
planner-enforced rules that allow deadlock free execution even with
fixed-size tuple queue flow control by careful controlling where
order-forcing operators are allowed to appear, so he doesn't have a
problem like Gather Merge deadlock.  I'm not proposing we should
create a whole bunch of producer and consumer processes to run
different plan fragments, but I think you can virtualise the general
idea in an async executor with "streams", and that also solves other
problems when you start working with partitions in a world where it's
not even sure how many workers will show up.  I see this as a long
term architectural goal requiring vast amounts of energy to achieve,
hence my new interest in (3) for now.
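
To make (1) a little more concrete, the sketch I have in mind looks
something like the following (illustrative only, not taken from the
attached patch; scan_unmatched_inner_tuples is a made-up helper, and I'm
relying on BarrierArriveAndDetach() returning true only for the last
participant to detach):

    #include "postgres.h"
    #include "executor/hashjoin.h"

    static void scan_unmatched_inner_tuples(HashJoinTable hashtable,
                                            ParallelHashJoinBatch *batch);

    static void
    finish_probing(HashJoinTable hashtable, ParallelHashJoinBatch *batch)
    {
        /*
         * Every participant detaches once it has finished probing.  Only the
         * last one to detach sees true here, so exactly one backend performs
         * the serial scan for unmatched inner tuples and nobody ever waits
         * for anybody else -- hence no tuple queue deadlock.
         */
        if (BarrierArriveAndDetach(&batch->batch_barrier))
            scan_unmatched_inner_tuples(hashtable, batch);    /* hypothetical */
    }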

Hypothetical inter-node deadlock hazard:

Right now I think it is the case that whenever any node begins pulling
tuples from a subplan, it continues to do so until either the query
ends early or the subplan runs out of tuples.  For example, Append
processes its subplans one at a time until they're done -- it doesn't
jump back and forth.  Parallel Append doesn't necessarily run them in
the order that they appear in the plan, but it still runs each one to
completion before picking another one.  If we ever had a node that
didn't adhere to that rule, then two Parallel Full Hash Join nodes
could deadlock, if some of the workers were stuck waiting in one while
some were stuck waiting in the other.

If we were 

Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-12 Thread Michael Paquier
On Wed, Sep 11, 2019 at 11:22:45PM +0800, Masahiko Sawada wrote:
> Hmm it will be more consistent with other functions but I think we
> would need to increase the pageinspect version to 1.8 and need the new
> sql file to rename the function name. And it will be for PG12, not
> PG13. If we have to do it someday I think it's better to do it in PG12
> that the table AM has been introduced to. Anyway I've attached
> separate patch for it.

Like Alvaro, I would discard this one for now.

> I've attached the updated patch that incorporated all comments. I kept
> the function as superuser-restricted.

But not this one.  So committed.  I have gone through the patch and
adjusted a couple of things: the tests, some weird formulations in the
docs, and an example that led mainly to NULLs being returned when
scanning the first page of pg_class.  The tests needed some
improvements to gain in clarity (no need for unnest with 2 elements,
added tests for all the combined flags, etc.).  The patch was not
indented either, but that is no big deal.

I hope I have not forgotten to credit anybody in the commit message.
If I have, you are the winner of a drink of your choice the next time
we meet.
--
Michael




Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-12 Thread Amit Kapila
On Wed, Sep 11, 2019 at 8:53 PM Masahiko Sawada  wrote:
>
> I've attached the updated patch that incorporated all comments. I kept
> the function as superuser-restricted.
>

Thanks for the updated patch.

Few more comments:

*
+ if (!superuser())
+ ereport(ERROR,
+ (errcode
(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("must be superuser to use raw
page functions")));

I think it is better to use a message like "must be superuser to use
pageinspect functions", as this function doesn't take a raw page as
input.  Other functions that don't take a raw page as input, like
bt_page_items, use the message I am suggesting.  I can see that
tuple_data_split also has a message similar to the one you are
proposing, but I think that is also not very appropriate.

*
else
+ {
+ if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
+ d[cnt++] = CStringGetTextDatum
("HEAP_XMAX_LOCK_ONLY");

If the idea is that whenever the decode_combined flag is false we
display the raw flags set on the tuple, then why try to interpret the
flags on the tuple in the above case?

*
+ if (decode_combined &&
+ HEAP_LOCKED_UPGRADED(t_infomask))
+ d[cnt++] = CStringGetTextDatum("HEAP_LOCKED_UPGRADED");
+ else
+ {
+ if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
+ d[cnt++] = CStringGetTextDatum
("HEAP_XMAX_LOCK_ONLY");
+ if ((t_infomask & HEAP_XMAX_IS_MULTI) != 0)
+ d[cnt++] = CStringGetTextDatum
("HEAP_XMAX_IS_MULTI");
+ }

I am not completely sure whether we want to cover the
HEAP_LOCKED_UPGRADED case even when the decode_combined flag is set.
It seems this is a bit more interpretation of the flags than we are
doing in other cases.  For example, HEAP_XMAX_SHR_LOCK or
HEAP_XMIN_FROZEN are flags that are explicitly set on the tuple, so
displaying them makes sense, but the same is not true for
HEAP_LOCKED_UPGRADED.

*
+CREATE FUNCTION heap_tuple_infomask_flags(
+   t_infomask integer,
+   t_infomask2 integer,
+   decode_combined boolean DEFAULT false)

I am not very happy with the parameter name 'decode_combined'.  It is
not clear from the name what it means, and I think it doesn't even
match what we are actually doing here.  How about raw_flags,
raw_tuple_flags or something like that?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com