Re: Asymmetric partition-wise JOIN

2019-08-24 Thread Kohei KaiGai
On Sat, Aug 24, 2019 at 7:02, Thomas Munro wrote:
>
> On Fri, Aug 23, 2019 at 4:05 AM Kohei KaiGai  wrote:
> > We can consider the table join ptable X t1 above is equivalent to:
> >   (ptable_p0 + ptable_p1 + ptable_p2) X t1
> > = (ptable_p0 X t1) + (ptable_p1 X t1) + (ptable_p2 X t1)
> > It returns an equivalent result; however, rows are already reduced by
> > the HashJoin in the individual leaves of the Append, so the CPU cycles
> > consumed by the Append node can be cheaper.
> >
> > On the other hand, it has a downside: t1 must be read three times and
> > the hash table must also be built three times. This increases the
> > expected cost, so the planner may not choose the asymmetric
> > partition-wise join plan.
>
> What if you include the partition constraint as a filter on t1?  So you get:
>
> ptable X t1 =
>   (ptable_p0 X (σ hash(dist)%4=0 (t1))) +
>   (ptable_p1 X (σ hash(dist)%4=1 (t1))) +
>   (ptable_p2 X (σ hash(dist)%4=2 (t1))) +
>   (ptable_p3 X (σ hash(dist)%4=3 (t1)))
>
> Pros:
> 1.  The hash tables will not contain unnecessary junk.
> 2.  You'll get the right answer if t1 is on the outer side of an outer join.
> 3.  If this runs underneath a Parallel Append and t1 is big enough
> then workers will hopefully cooperate and do a synchronised scan of
> t1.
> 4.  The filter might enable a selective and efficient plan like an index scan.
>
> Cons:
> 1.  The filter might not enable a selective and efficient plan, and
> therefore cause extra work.
>
> (It's a little weird in this example because you don't usually see hash
> functions in WHERE clauses, but that could just as easily be dist
> BETWEEN 1 AND 99 or any other partition constraint.)
>
It requires that the join key include the partition key and that the join
be an equality join, doesn't it?
If ptable and t1 are joined using ptable.dist = t1.foo, we can indeed
distribute t1 to each leaf table with "WHERE hash(foo)%4 = xxx" according
to the partition bounds.

In cases where some of the partition leaves are pruned, it is clearly
beneficial, because rows that would only be referenced by the pruned
child relations are a waste of memory.

On the other hand, if no leaves are pruned, and if we could extend the
Hash node to share the hash table with sibling join nodes, it would
eventually consume almost the same amount of memory to load the inner
relations.

> > One idea I have is, sibling HashJoin shares a hash table that was built once
> > by any of the sibling Hash plan. Right now, it is not implemented yet.
>
> Yeah, I've thought a little bit about that in the context of Parallel
> Repartition.  I'm interested in combining intra-node partitioning
> (where a single plan node repartitions data among workers on the fly)
> with inter-node partitioning (like PWJ, where partitions are handled
> by different parts of the plan, considered at planning time); you
> finish up needing to have nodes in the plan that 'receive' tuples for
> each partition, to match up with the PWJ plan structure.  That's not
> entirely unlike CTE references, and not entirely unlike your idea of
> somehow sharing the same hash table.  I ran into a number of problems
> while thinking about that, which I should write about in another
> thread.
>
Hmm. Do you mean the inner path may behave differently according to the
partition-bounds definition of the outer path to be joined?
Let me investigate its pros & cons.

The reasons why I think the idea of sharing the same hash table is
reasonable in this scenario are:
1. We can easily extend the idea for parallel optimization. A hash table
on a DSM segment, once built, can be shared by all the siblings in all
the parallel workers.
2. We can save memory regardless of the join keys and partition keys,
even if these are not involved in the query.

On the other hand, below are the downsides. Potentially, combined use of
your idea may help in these cases:
3. The distributed inner relation cannot be the outer side of an OUTER JOIN.
4. The hash table contains rows referenced only by pruned partition leaves.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: using explicit_bzero

2019-08-24 Thread Peter Eisentraut
On 2019-08-14 05:00, Michael Paquier wrote:
> On Tue, Aug 13, 2019 at 10:30:39AM +0200, Peter Eisentraut wrote:
>> Another patch, to attempt to fix the Windows build.
> 
> I have not been able to test the compilation, but the changes look
> good on this side.

Rebased patch, no functionality changes.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 36c8521f8c75517670aa01d064e8b4c117765f87 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 13 Aug 2019 10:25:17 +0200
Subject: [PATCH v6] Use explicit_bzero

Use the explicit_bzero() function in places where it is important that
security information such as passwords is cleared from memory.  There
might be other places where it could be useful; this is just an
initial collection.

For platforms that don't have explicit_bzero(), provide various
fallback implementations.  (explicit_bzero() itself isn't standard,
but as Linux/glibc and OpenBSD have it, it's the most common spelling,
so it makes sense to make that the invocation point.)
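
As a sketch of the general idea (assuming the well-known
volatile-function-pointer technique; the actual src/port/explicit_bzero.c
added by this patch may well differ), such a fallback can look like:

#include <string.h>

/*
 * Illustrative fallback sketch only, not necessarily the patch's code:
 * calling memset through a volatile function pointer keeps the compiler
 * from proving the store dead and optimizing it away.
 */
static void *(*volatile memset_ptr) (void *, int, size_t) = memset;

void
explicit_bzero(void *buf, size_t len)
{
	(*memset_ptr) (buf, 0, len);
}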

Discussion: 
https://www.postgresql.org/message-id/flat/42d26bde-5d5b-c90d-87ae-6cab875f73be%402ndquadrant.com
---
 configure| 15 +++-
 configure.in |  2 +
 src/backend/libpq/be-secure-common.c |  3 ++
 src/include/pg_config.h.in   |  6 +++
 src/include/pg_config.h.win32|  6 +++
 src/include/port.h   |  4 ++
 src/interfaces/libpq/fe-connect.c|  8 
 src/port/explicit_bzero.c| 55 
 src/tools/msvc/Mkvcbuild.pm  |  2 +-
 9 files changed, 99 insertions(+), 2 deletions(-)
 create mode 100644 src/port/explicit_bzero.c

diff --git a/configure b/configure
index f14709ed1e..b3c92764be 100755
--- a/configure
+++ b/configure
@@ -15087,7 +15087,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in cbrt clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink sync_file_range uselocale utime utimes wcstombs_l
+for ac_func in cbrt clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memset_s memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink sync_file_range uselocale utime utimes wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
@@ -15739,6 +15739,19 @@ esac
 
 fi
 
+ac_fn_c_check_func "$LINENO" "explicit_bzero" "ac_cv_func_explicit_bzero"
+if test "x$ac_cv_func_explicit_bzero" = xyes; then :
+  $as_echo "#define HAVE_EXPLICIT_BZERO 1" >>confdefs.h
+
+else
+  case " $LIBOBJS " in
+  *" explicit_bzero.$ac_objext "* ) ;;
+  *) LIBOBJS="$LIBOBJS explicit_bzero.$ac_objext"
+ ;;
+esac
+
+fi
+
 ac_fn_c_check_func "$LINENO" "fls" "ac_cv_func_fls"
 if test "x$ac_cv_func_fls" = xyes; then :
   $as_echo "#define HAVE_FLS 1" >>confdefs.h
diff --git a/configure.in b/configure.in
index 805cf8617b..0d16c1a971 100644
--- a/configure.in
+++ b/configure.in
@@ -1594,6 +1594,7 @@ AC_CHECK_FUNCS(m4_normalize([
getpeerucred
getrlimit
mbstowcs_l
+   memset_s
memmove
poll
posix_fallocate
@@ -1691,6 +1692,7 @@ fi
 
 AC_REPLACE_FUNCS(m4_normalize([
dlopen
+   explicit_bzero
fls
getopt
getrusage
diff --git a/src/backend/libpq/be-secure-common.c b/src/backend/libpq/be-secure-common.c
index e8f27bc782..d801929ea2 100644
--- a/src/backend/libpq/be-secure-common.c
+++ b/src/backend/libpq/be-secure-common.c
@@ -87,6 +87,7 @@ run_ssl_passphrase_command(const char *prompt, bool is_server_start, char *buf,
{
if (ferror(fh))
{
+   explicit_bzero(buf, size);
ereport(loglevel,
(errcode_for_file_access(),
 errmsg("could not read from command 
\"%s\": %m",
@@ -98,6 +99,7 @@ run_ssl_passphrase_command(const char *prompt, bool is_server_start, char *buf,
pclose_rc = ClosePipeStream(fh);
if (pclose_rc == -1)
{
+   explicit_bzero(buf, size);
ereport(loglevel,
(errcode_for_file_access(),
 errmsg("could not close pipe to external 
command: %m")));
@@ -105,6 +107,7 @@ run_ssl_passphrase_command(const char *prompt, bool is_server_start, char *buf,
}
else if (pclose_rc != 0)
{
+   explicit_bzero(buf, size);
ereport(loglevel,
(errcode_for_file_access(),
   

Re: POC: Cleaning up orphaned files using undo logs

2019-08-24 Thread Robert Haas
On Fri, Aug 23, 2019 at 2:04 AM Thomas Munro  wrote:
> 2.  Strict self-update-only:  We could update it as part of
> transaction cleanup.  That is, when you commit or abort, probably at
> some time when your transaction is still advertised as running, you go
> and update your own transaction header with the size.  If you
> never reach that stage, I think you can fix it during crash recovery,
> during the startup scan that feeds the rollback request queues.  That
> is, if you encounter a transaction header with length == 0, it must be
> the final one and its length is therefore known and can be updated,
> before you allow new transactions to begin.  There are some
> complications like backends that exit without crashing, which I
> haven't thought about.  As Amit just pointed out to me, that means
> that the update is not folded into the same buffer access as the next
> transaction, but perhaps you can mitigate that by not updating your
> header if the next header will be on the same page -- the next
> transaction can do it safely then (this page with the insert pointer
> on it can't be discarded).  As Dilip just pointed out to me, it means
> that you always do an update that you might never need to do if
> the transaction is discarded, to which I have no answer.  Bleugh.

Andres and I have spent a lot of time on the phone over the last
couple of days and I think we both kind of like this option.  I don't
think that the costs are likely to be very significant: you're talking
about pinning, locking, dirtying, unlocking, and unpinning one buffer
at commit time, or maybe two if your transaction touched both logged
and unlogged tables.  If the transaction is short enough for that
overhead to matter, that buffer is probably already in shared_buffers,
is probably already dirty, and is probably already in your CPU's
cache. So I think the overhead will turn out to be low.

Moreover, I doubt that we want to separately discard every transaction
anyway.  If you have very light-weight transactions, you don't want to
add an extra WAL record per transaction.  Increasing the number
of separate WAL records per transaction from say 5 to 6 would be a
significant additional cost.  You probably want to perform a discard,
say, every 5 seconds or sooner if you can discard at least 64kB of
undo, or something of that sort.  So we're not going to save the
overhead of updating the previous transaction header often enough to
make much difference unless we're discarding so aggressively that we
incur a much larger overhead elsewhere.  I think.

I am a little concerned about the backends that exit without crashing.
Andres seems to want to treat that case as a bug to be fixed, but I
doubt whether that's going to be practical.   We're really only
talking about extreme corner cases here, because
before_shmem_exit(ShutdownPostgres, 0) means we'll
AbortOutOfAnyTransaction() which should RecordTransactionAbort(). Only
if we fail in the AbortTransaction() prior to reaching
RecordTransactionAbort() will we manage to reach the later cleanup
stages without having written an abort record.  I haven't scrutinized
that code lately to see exactly how things can go wrong there, but
there shouldn't be a whole lot.  However, there are probably a few
things, like maybe a really poorly-timed malloc() failure.

A zero-order solution would be to install a deadman switch. At
on_shmem_exit time, you must detach from any undo log to which you are
connected, so that somebody else can attach to it later.  We can stick
in a cross-check there that you haven't written any undo bytes to that
log and PANIC if you have.  Then the system must be water-tight.
Perhaps it's possible to do better: if we could identify the cases in
which such logic gets reached, we could try to guarantee that WAL is
written and the undo log safely detached before we get there.  But at
the very least we can promote ERROR/FATAL to PANIC in the relevant
case.
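
To sketch the idea in code (every name here is hypothetical, invented
for illustration; the real patch set will differ):

/*
 * Hypothetical deadman switch, registered with
 * on_shmem_exit(undo_deadman_switch, (Datum) 0) in backend code.
 */
static void
undo_deadman_switch(int code, Datum arg)
{
	if (MyUndoLog != NULL)		/* still attached at exit? */
	{
		/* Cross-check: PANIC if we wrote undo but never detached. */
		if (MyUndoLog->bytes_written != 0)
			elog(PANIC, "exiting while attached to an undo log with undo written");
		DetachUndoLog(MyUndoLog);
	}
}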

A better solution would be to detect the problem and make sure we
recover from it before reusing the undo log.  Suppose each undo log
has three states: (1) nobody's attached, (2) somebody's attached, and
(3) nobody's attached but the last record might need a fixup.  When we
start up, all undo logs are in state 3, and the discard worker runs
around and puts them into state 1.  Subsequently, they alternate
between states 1 and 2 for as long as the system remains up.  But if
as an exceptional case we reach on_shmem_exit without having detached
the undo log, because of cascading failures, then we put the undo log
in state 3.  The discard worker already knows how to move undo logs
from state 3 to state 1, and it can do the same thing here.  Until it
does nobody else can reuse that undo log.
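
In rough code form (a sketch with made-up names, not taken from any
actual patch), the lifecycle would be something like this:

/* Hypothetical states for one undo log. */
typedef enum UndoLogState
{
	UNDO_LOG_FREE,			/* (1) nobody's attached; safe to reuse */
	UNDO_LOG_ATTACHED,		/* (2) somebody's attached */
	UNDO_LOG_NEEDS_FIXUP	/* (3) nobody's attached, but the last
							 * record might need a fixup */
} UndoLogState;

/*
 * At startup every log begins in UNDO_LOG_NEEDS_FIXUP and the discard
 * worker moves it to UNDO_LOG_FREE.  Backends then alternate the log
 * between FREE and ATTACHED; an on_shmem_exit without a clean detach
 * puts it back into UNDO_LOG_NEEDS_FIXUP until the discard worker
 * repairs it, and nobody else may reuse it before then.
 */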

I might be missing something, but I think that would nail this down
pretty tightly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: mingw32 floating point diff

2019-08-24 Thread Emre Hasegeli
> I poked at this a bit more.  I can reproduce the problem by using
> -mfpmath=387 on dromedary's host (fairly old 32-bit macOS); although
> I also get half a dozen *other* failures in the core regression tests,
> mostly around detection of float overflow.  So I'm not quite sure that
> this is comparable.  But at any rate, I tracked the core of the problem
> to pg_hypot:

I couldn't test if it helps, but another solution may be to rip out
pg_hypot() in favour of the libc implementation.  This was discussed
in detail as part of "Improve geometric types" thread.

Last comment about it:

https://www.postgresql.org/message-id/9223.1507039405%40sss.pgh.pa.us




Re: POC: Cleaning up orphaned files using undo logs

2019-08-24 Thread Robert Haas
On Wed, Aug 21, 2019 at 4:34 PM Robert Haas  wrote:
> ReleaseResourcesAndProcessUndo() is only supposed to be called after
> AbortTransaction(), and the additional steps it performs --
> AtCleanup_Portals() and AtEOXact_Snapshot() or alternatively
> AtSubCleanup_Portals -- are taken from Cleanup(Sub)Transaction.
> That's not crazy; the other steps in Cleanup(Sub)Transaction() look
> like stuff that's intended to be performed when we're totally done
> with this TransactionState stack entry, whereas these things are
> slightly higher-level cleanups that might even block undo (e.g.
> undropped portal prevents orphaned file cleanup).  Granted, there are
> no comments explaining why those particular cleanup steps are
> performed here, and it's possible some other approach is better, but I
> think perhaps it's not quite as flagrantly broken as you think.

Andres smacked me with the clue-bat off-list and now I understand why
this is broken: there's no guarantee that running the various
AtEOXact/AtCleanup functions actually puts the transaction back into a
good state.  They *might* return the system to the state that it was
in immediately following StartTransaction(), but they also might not.
Moreover, ProcessUndoRequestForEachLogCat uses PG_TRY/PG_CATCH and
then discards the error without performing *any cleanup at all*, and
then goes on and tries to do undo for other undo log categories
anyway.  That is totally unsafe.

I think that there should only be one chance to perform undo actions,
and as I said or at least alluded to before, if that throws an error,
it shouldn't be caught by a TRY/CATCH block but should be handled by
the state machine in xact.c.  If we're not going to make the state
machine handle these conditions, the addition of
TRANS_UNDO/TBLOCK_UNDO/TBLOCK_SUBUNDO is really missing the boat.  I'm
still not quite sure of the exact sequence of steps: we clearly need
AtCleanup_Portals() and a bunch of the other stuff that happens during
CleanupTransaction(), ideally including the freeing of memory, to
happen before we try undo. But I don't have a great feeling for how to
make that happen, and it seems more desirable for undo to begin as
soon as the transaction fails rather than waiting until
Cleanup(Sub)Transaction() time. I think some more research is needed
here.

> I am also not convinced that semi-critical sections are a bad idea,

Regarding this, after further thought and discussion with Andres,
there are two cases here that call for somewhat different handling:
temporary undo, and subtransaction abort.

In the case of a subtransaction abort, we can't proceed with the
toplevel transaction unless we succeed in applying the
subtransaction's undo, but that does not require killing off the
backend.  It might be a better idea to just fail the containing
subtransaction with the error that occurred during undo apply; if
there are multiple levels of subtransactions present then we might
fail in the same way several times, but eventually we'll fail at the
top level, forcibly kick the undo into the background, and the session
can continue.  The background workers will, hopefully, eventually
recover the situation.  Even if they can't, because, say, the failure
is due to a bug or whatever, killing off the session doesn't really
help.

In the case of temporary undo, killing the session is a much more
appealing approach. If we don't, how will that undo ever get
processed?  We could retry at some point (like every time we return to
the toplevel command prompt?) or just ignore the fact that we didn't
manage to perform those undo actions and leave that undo there like an
albatross, accumulating more and more undo behind it until the session
exits or the disk fills up.  The latter strikes me as a terrible user
experience, especially because for wire protocol reasons we'd have to
swallow the errors or at best convert them to warnings, but YMMV.

Anyway, probably these cases should not be handled exactly the same
way, but exactly what to do probably depends on the previous question:
how exactly does the integration into xact.c's state machine work,
anyway?

Meanwhile, I've been working up a prototype of how the undorequest.c
stuff I sent previously could be integrated with xact.c.  In working
on that, I've realized that there seem to be two different tasks.  One
is tracking the information that we'll need to have available to
perform undo actions.  The other is the actual transaction state
manipulation: when and how do we abort transactions, cleanup
transactions, start new transactions specifically for undo?  How are
transactions performing undo specially marked, if at all?  The
attached patch includes a new module, undostate.c/h, which tries to
handle the first of those things; this is just a prototype, and is
missing some pieces marked with XXX, but I think it's probably the
right general direction.  It will still need to be plugged into a
framework for launching undo apply background workers (which might
require som

Re: "ago" times on buildfarm status page

2019-08-24 Thread Andrew Dunstan


On 8/21/19 10:52 AM, Andrew Dunstan wrote:
>>
>> The Javascript could also be made to update the "ago" part every minute,
>> and show the absolute time as a tooltip, which is what pretty much every
>> other website does.
>>
>
> The code for the page is here:
> 
>
>
> Patches welcome.
>
>


I have done both of these things.


cheers


andrew



-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: mingw32 floating point diff

2019-08-24 Thread Tom Lane
Emre Hasegeli  writes:
> I couldn't test if it helps, but another solution may be is to rip out
> pg_hypot() in favour of the libc implementation.  This was discussed
> in detail as part of "Improve geometric types" thread.

Hm ... the problem we're trying to fix here is platform-varying results.
Surely switching to the libc implementation would make that worse not
better?

regards, tom lane




Re: Re: Email to hackers for test coverage

2019-08-24 Thread movead...@highgo.ca
>Hi Movead,
>Please add that to commitfest.

Thanks, and I have just done it; it is
https://commitfest.postgresql.org/24/2258/

--
Movead Li


Re: Why overhead of SPI is so large?

2019-08-24 Thread David Fetter
On Fri, Aug 23, 2019 at 11:10:28AM +0200, Pavel Stehule wrote:
> On Fri, Aug 23, 2019 at 11:05, Konstantin Knizhnik
> <k.knizh...@postgrespro.ru> wrote:
> 
> >
> >
> > On 22.08.2019 18:56, Pavel Stehule wrote:
> >
> >
> >
> > On Thu, Aug 22, 2019 at 17:51, Konstantin Knizhnik
> > <k.knizh...@postgrespro.ru> wrote:
> >
> >> Some more information...
> >> First of all, I found out that marking a PL/pgSQL function as immutable
> >> significantly increases the speed of its execution:
> >> 19808 ms vs. 27594 ms. This happens because exec_eval_simple_expr takes a
> >> snapshot if the function is volatile (the default).
> >> I wonder if the PL/pgSQL compiler can detect that the evaluated expression
> >> itself is actually immutable and that there is no need to take a snapshot
> >> for each invocation of this function. Also, I have tried yet another PL
> >> language - JavaScript - which is now the new outsider, despite the fact
> >> that the v8 JIT compiler is very good.
> >>
> >
> > I have a plan to do some work in this direction. A snapshot is not
> > necessary for almost all built-in functions. If an expression calls only
> > built-in functions, then it can probably be evaluated without a snapshot
> > and without any work with the plan cache.
> >
> >
> > I wonder if the following simple patch is correct?
> >
> 
> You cannot trust user-defined functions, so the immutable flag may not be
> correct. Only built-in functions are 100% correct.
> 
> CREATE OR REPLACE FUNCTION foo()
> RETURNS int AS $$
> SELECT count(*) FROM pg_class;
> $$ LANGUAGE sql IMMUTABLE;
> 
> is working.

No, it's lying to the RDBMS, so it's pilot error. The problem of
determining from the function itself whether it is in fact immutable
is, in general, equivalent to the Halting Problem, so no, we can't
figure it out. We do need to trust our users not to lie to us, and we
do not need to protect them from the consequences when they do.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-08-24 Thread Bruce Momjian
On Fri, Aug 23, 2019 at 10:04:13PM -0400, Stephen Frost wrote:
> > Well, I think they might do that to reduce encryption overhead.  I think
> > tests have shown that is not an issue, but we will need to test further.
> 
> I seriously doubt that's why and I don't think there's actually much
> value in trying to figure out the "why" here- the question is, do those
> systems answer the check-box requirement that was brought up on the call
> as the justification for this feature?  If so, then clearly not
> everything is required to be encrypted and we shouldn't be stressing
> over trying to do that.

We will stress over trying _not_ to encrypt everything.

> > I am not sure of the downside of encrypting everything, since it leaks
> > the least information and has a minimal user API and code impact.  What
> > is the value of encrypting only the user rows?  Better key control?
> 
> Yes, better key control, and better user API, and avoiding having an

Uh, there is no user API for all-cluster encryption except for the
administrator.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Why overhead of SPI is so large?

2019-08-24 Thread Pavel Stehule
On Sat, Aug 24, 2019 at 18:01, David Fetter wrote:

> On Fri, Aug 23, 2019 at 11:10:28AM +0200, Pavel Stehule wrote:
> > On Fri, Aug 23, 2019 at 11:05, Konstantin Knizhnik
> > <k.knizh...@postgrespro.ru> wrote:
> >
> > >
> > >
> > > On 22.08.2019 18:56, Pavel Stehule wrote:
> > >
> > >
> > >
> > > On Thu, Aug 22, 2019 at 17:51, Konstantin Knizhnik
> > > <k.knizh...@postgrespro.ru> wrote:
> > >
> > >> Some more information...
> > >> First of all, I found out that marking a PL/pgSQL function as immutable
> > >> significantly increases the speed of its execution:
> > >> 19808 ms vs. 27594 ms. This happens because exec_eval_simple_expr takes a
> > >> snapshot if the function is volatile (the default).
> > >> I wonder if the PL/pgSQL compiler can detect that the evaluated
> > >> expression itself is actually immutable and that there is no need to
> > >> take a snapshot for each invocation of this function. Also, I have
> > >> tried yet another PL language - JavaScript - which is now the new
> > >> outsider, despite the fact that the v8 JIT compiler is very good.
> > >>
> > >
> > > I have a plan to do some work in this direction. A snapshot is not
> > > necessary for almost all built-in functions. If an expression calls
> > > only built-in functions, then it can probably be evaluated without a
> > > snapshot and without any work with the plan cache.
> > >
> > >
> > > I wonder if the following simple patch is correct?
> > >
> >
> > You cannot trust user-defined functions, so the immutable flag may not be
> > correct. Only built-in functions are 100% correct.
> >
> > CREATE OR REPLACE FUNCTION foo()
> > RETURNS int AS $$
> > SELECT count(*) FROM pg_class;
> > $$ LANGUAGE sql IMMUTABLE;
> >
> > is working.
>
> No, it's lying to the RDBMS, so it's pilot error. The problem of
> determining from the function itself whether it is in fact immutable
> is, in general, equivalent to the Halting Problem, so no, we can't
> figure it out. We do need to trust our users not to lie to us, and we
> do not need to protect them from the consequences when they do.
>

I don't have any problem with fixing this behavior when there is an
alternative.

I can imagine a new special flag for STABLE functions that enforces
one-shot plans and can be optimized similarly to how IMMUTABLE functions
are now - using the result at planning time.

Users lie because they must - there is no alternative. There is no other
solution, and estimation errors related to joins are a fundamental issue.

Regards

Pavel





> Best,
> David.
> --
> David Fetter  http://fetter.org/
> Phone: +1 415 235 3778
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>


Re: assertion at postmaster start

2019-08-24 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> On 2019-06-15 12:09:50 -0400, Alvaro Herrera wrote:
>>> Once in a blue moon I get this assertion failure on server start:
>>> TRAP: FailedAssertion("!(AbortStartTime == 0)", Archivo: 
>>> "/pgsql/source/master/src/backend/postmaster/postmaster.c", Linea: 2957)

>> And "server process" is afaict only used for actual backends, not other
>> types of processes. But we've not yet seen "database system is ready to
>> accept accept connections", so IIRC it could only be a "dead_end" type
>> backend? But we didn't yet see an error from that...
>> Seems to indicate a logic error in postmaster's state machine. Perhaps
>> something related to dead_end processes?

> So if Andres is guessing right, this must be from something trying to
> connect before the postmaster is ready?  Seems like that could be
> tested for ...

I got around to trying to test for this, and I find that I can reproduce
the symptom exactly by applying the attached hack and then trying to
connect a couple of seconds after starting the postmaster.

Basically what seems to be happening in Alvaro's report is that

(1) before the startup process is done, something tries to connect,
causing a dead_end child to be launched;

(2) for reasons mysterious, that child exits with exit code 15 rather
than the expected 0 or 1;

(3) the postmaster therefore initiates a system-wide restart cycle;

(4) the startup process completes normally anyway, indicating that the
SIGQUIT arrived too late to affect it;

(5) then we hit the Assert, since we reach the transition-to-normal-run
code even though HandleChildCrash set AbortStartTime in step (3).

The timing window for (4) to happen is extremely tight normally.  The
attached patch makes it wider by the expedient of just not sending the
SIGQUIT to the startup process ;-).  Then you just need enough of a delay
in startup to perform a manual connection, plus some hack to make the
dead_end child exit with an unexpected exit code.

I think what this demonstrates is that that Assert is just wrong:
we *can* reach PM_RUN with the flag still set, so we should do

StartupStatus = STARTUP_NOT_RUNNING;
FatalError = false;
-   Assert(AbortStartTime == 0);
+   AbortStartTime = 0;
ReachedNormalRunning = true;
pmState = PM_RUN;

Probably likewise for the similar Assert in sigusr1_handler.

A larger question is whether we should modify the postmaster logic
so that crashes of dead_end children aren't treated as reasons to
perform a system restart.  I'm dubious about this, because frankly,
such crashes shouldn't be happening.  There is very little code
that a dead_end child will traverse before exiting ... so how the
devil did it reach an exit(15)?  Alvaro, are you running any
nonstandard code in the postmaster (shared_preload_libraries, maybe)?

regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 3339804..ef53522 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2260,6 +2260,7 @@ retry1:
 	switch (port->canAcceptConnections)
 	{
 		case CAC_STARTUP:
+			exit(15);
 			ereport(FATAL,
 	(errcode(ERRCODE_CANNOT_CONNECT_NOW),
 	 errmsg("the database system is starting up")));
@@ -3512,7 +3513,7 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 (errmsg_internal("sending %s to process %d",
  (SendStop ? "SIGSTOP" : "SIGQUIT"),
  (int) StartupPID)));
-		signal_child(StartupPID, (SendStop ? SIGSTOP : SIGQUIT));
+//		signal_child(StartupPID, (SendStop ? SIGSTOP : SIGQUIT));
 		StartupStatus = STARTUP_SIGNALED;
 	}
 
@@ -5383,6 +5384,9 @@ StartChildProcess(AuxProcType type)
 		MemoryContextDelete(PostmasterContext);
 		PostmasterContext = NULL;
 
+		if (type == StartupProcess)
+			pg_usleep(500);
+
 		AuxiliaryProcessMain(ac, av);
 		ExitPostmaster(0);
 	}


Re: [PATCH] Make configuration file "include" directive handling more robust

2019-08-24 Thread Tom Lane
Ian Barwick  writes:
On 7/17/19 5:34 PM, Kyotaro Horiguchi wrote:
> Hello.
>>> I don't think this is new to 12.

> No, though I'm not sure how much this would be seen as a bugfix
> and how far back it would be sensible to patch.

I think this is worth considering as a bugfix; although I'm afraid
we can't change the signature of ParseConfigFile/ParseConfigFp in
released branches, since extensions could possibly be using those.
That limits what we can do --- but it's still possible to detect
direct recursion, which seems like enough to produce a nice error
message in typical cases.

I concur with Kyotaro-san that disallow-empty-include-directives.v1.patch
seems a bit brute-force, but where I would put the checks is in
ParseConfigFile and ParseConfigDirectory.

Also, I don't agree with the goals of prevent-disallowed-includes.patch.
I'm utterly not on board with breaking use of "include" in extension
files, for instance; while that may not be documented, it works fine,
and maybe somebody out there is relying on it.  Likewise, while "include"
in pg.auto.conf is not really considered supported, I don't see the
point of going out of our way to break the historical behavior.

That leads me to propose the attached simplified patch.  While I haven't
actually tried, I'm pretty sure this should back-patch without trouble.

regards, tom lane

diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index 1c8b5f7..125b898 100644
--- a/src/backend/utils/misc/guc-file.l
+++ b/src/backend/utils/misc/guc-file.l
@@ -568,6 +568,22 @@ ParseConfigFile(const char *config_file, bool strict,
 	FILE	   *fp;
 
 	/*
+	 * Reject file name that is all-blank (including empty), as that leads to
+	 * confusion, or even recursive inclusion in some cases.
+	 */
+	if (strspn(config_file, " \t\r\n") == strlen(config_file))
+	{
+		ereport(elevel,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("empty configuration file name: \"%s\"",
+		config_file)));
+		record_config_file_error("empty configuration file name",
+ calling_file, calling_lineno,
+ head_p, tail_p);
+		return false;
+	}
+
+	/*
 	 * Reject too-deep include nesting depth.  This is just a safety check to
 	 * avoid dumping core due to stack overflow if an include file loops back
 	 * to itself.  The maximum nesting depth is pretty arbitrary.
@@ -585,6 +601,26 @@ ParseConfigFile(const char *config_file, bool strict,
 	}
 
 	abs_path = AbsoluteConfigLocation(config_file, calling_file);
+
+	/*
+	 * Reject direct recursion.  Indirect recursion is also possible, but it's
+	 * harder to detect and so doesn't seem worth the trouble.  (We test at
+	 * this step because the canonicalization done by AbsoluteConfigLocation
+	 * makes it more likely that a simple strcmp comparison will match.)
+	 */
+	if (calling_file && strcmp(abs_path, calling_file) == 0)
+	{
+		ereport(elevel,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("configuration file recursion in \"%s\"",
+		calling_file)));
+		record_config_file_error("configuration file recursion",
+ calling_file, calling_lineno,
+ head_p, tail_p);
+		pfree(abs_path);
+		return false;
+	}
+
 	fp = AllocateFile(abs_path, "r");
 	if (!fp)
 	{
@@ -933,6 +969,27 @@ ParseConfigDirectory(const char *includedir,
 	int			size_filenames;
 	bool		status;
 
+	/*
+	 * Reject directory name that is all-blank (including empty), as that
+	 * leads to confusion, or even recursive inclusion in some cases.
+	 */
+	if (strspn(includedir, " \t\r\n") == strlen(includedir))
+	{
+		ereport(elevel,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("empty configuration directory name: \"%s\"",
+		includedir)));
+		record_config_file_error("empty configuration directory name",
+ calling_file, calling_lineno,
+ head_p, tail_p);
+		return false;
+	}
+
+	/*
+	 * We don't check for recursion or too-deep nesting depth here; the
+	 * subsequent calls to ParseConfigFile will take care of that.
+	 */
+
 	directory = AbsoluteConfigLocation(includedir, calling_file);
 	d = AllocateDir(directory);
 	if (d == NULL)


LLVM breakage on seawasp

2019-08-24 Thread Thomas Munro
Hi,

llvmjit_inline.cpp:177:55: error: ‘make_unique’ is not a member of ‘llvm’
  std::unique_ptr globalsToInline =
llvm::make_unique();

That's because they just moved to C++14 and replaced their own
llvm::make_unique<> with std::make_unique<>:

https://github.com/llvm-mirror/llvm/commit/114087caa6f95b526861c3af94b3093d9444c57b

Perhaps we'll need some macrology to select between llvm and std
versions?  I am guessing we can't decree that PostgreSQL's minimum C++
level is C++14 and simply change it to std::make_unique.

-- 
Thomas Munro
https://enterprisedb.com




Re: LLVM breakage on seawasp

2019-08-24 Thread Andres Freund
Hi,

On August 24, 2019 1:08:11 PM PDT, Thomas Munro  wrote:
>Hi,
>
>llvmjit_inline.cpp:177:55: error: ‘make_unique’ is not a member of
>‘llvm’
>  std::unique_ptr globalsToInline =
>llvm::make_unique();
>
>That's because they just moved to C++14 and replaced their own
>llvm::make_unique<> with std::make_unique<>:
>
>https://github.com/llvm-mirror/llvm/commit/114087caa6f95b526861c3af94b3093d9444c57b
>
>Perhaps we'll need some macrology to select between llvm and std
>versions?  I am guessing we can't decree that PostgreSQL's minimum C++
>level is C++14 and simply change it to std::make_unique.

Perhaps just a

#if new_enough
using std::make_unique;
#else
using llvm::make_unique;
#endif

at the start of the file, and then use it unqualified?

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: assertion at postmaster start

2019-08-24 Thread Tom Lane
I wrote:
> I think what this demonstrates is that that Assert is just wrong:
> we *can* reach PM_RUN with the flag still set, so we should do
>   StartupStatus = STARTUP_NOT_RUNNING;
>   FatalError = false;
> - Assert(AbortStartTime == 0);
> + AbortStartTime = 0;
>   ReachedNormalRunning = true;
>   pmState = PM_RUN;
> Probably likewise for the similar Assert in sigusr1_handler.

Poking further at this, I noticed that the code just above here completely
fails to do what the comments say it intends to do, which is restart the
startup process after we've SIGQUIT'd it.  That's because the careful
manipulation of StartupStatus in reaper (lines 2943ff in HEAD) is stomped
on by HandleChildCrash, which will just unconditionally reset it to
STARTUP_CRASHED (line 3507).  So we end up shutting down the database
after all, which is not what the intention seems to be.  Hence,
commit 45811be94 was still a few bricks shy of a load :-(.

I propose the attached.  I'm inclined to think that the risk/benefit
of back-patching this is not very good, so I just want to stick it in
HEAD, unless somebody can explain why dead_end children are likely to
crash in the field.

regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 3339804..ecb7892 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2920,7 +2920,9 @@ reaper(SIGNAL_ARGS)
 			 * during PM_STARTUP is treated as catastrophic. There are no
 			 * other processes running yet, so we can just exit.
 			 */
-			if (pmState == PM_STARTUP && !EXIT_STATUS_0(exitstatus))
+			if (pmState == PM_STARTUP &&
+StartupStatus != STARTUP_SIGNALED &&
+!EXIT_STATUS_0(exitstatus))
 			{
 LogChildExit(LOG, _("startup process"),
 			 pid, exitstatus);
@@ -2937,11 +2939,24 @@ reaper(SIGNAL_ARGS)
 			 * then we previously sent the startup process a SIGQUIT; so
 			 * that's probably the reason it died, and we do want to try to
 			 * restart in that case.
+			 *
+			 * This stanza also handles the case where we sent a SIGQUIT
+			 * during PM_STARTUP due to some dead_end child crashing: in that
+			 * situation, if the startup process dies on the SIGQUIT, we need
+			 * to transition to PM_WAIT_BACKENDS state which will allow
+			 * PostmasterStateMachine to restart the startup process.  (On the
+			 * other hand, the startup process might complete normally, if we
+			 * were too late with the SIGQUIT.  In that case we'll fall
+			 * through and commence normal operations.)
 			 */
 			if (!EXIT_STATUS_0(exitstatus))
 			{
 if (StartupStatus == STARTUP_SIGNALED)
+{
 	StartupStatus = STARTUP_NOT_RUNNING;
+	if (pmState == PM_STARTUP && Shutdown == NoShutdown)
+		pmState = PM_WAIT_BACKENDS;
+}
 else
 	StartupStatus = STARTUP_CRASHED;
 HandleChildCrash(pid, exitstatus,
@@ -2954,7 +2969,7 @@ reaper(SIGNAL_ARGS)
 			 */
 			StartupStatus = STARTUP_NOT_RUNNING;
 			FatalError = false;
-			Assert(AbortStartTime == 0);
+			AbortStartTime = 0;
 			ReachedNormalRunning = true;
 			pmState = PM_RUN;
 
@@ -3504,7 +3519,7 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 	if (pid == StartupPID)
 	{
 		StartupPID = 0;
-		StartupStatus = STARTUP_CRASHED;
+		/* Caller adjusts StartupStatus, so don't touch it here */
 	}
 	else if (StartupPID != 0 && take_action)
 	{
@@ -5100,7 +5115,7 @@ sigusr1_handler(SIGNAL_ARGS)
 	{
 		/* WAL redo has started. We're out of reinitialization. */
 		FatalError = false;
-		Assert(AbortStartTime == 0);
+		AbortStartTime = 0;
 
 		/*
 		 * Crank up the background tasks.  It doesn't matter if this fails,


Re: LLVM breakage on seawasp

2019-08-24 Thread Tom Lane
Andres Freund  writes:
> On August 24, 2019 1:08:11 PM PDT, Thomas Munro  
> wrote:
>> That's because they just moved to C++14 and replaced their own
>> llvm::make_unique<> with std::make_unique<>:
>> https://github.com/llvm-mirror/llvm/commit/114087caa6f95b526861c3af94b3093d9444c57b
>> Perhaps we'll need some macrology to select between llvm and std
>> versions?  I am guessing we can't decree that PostgreSQL's minimum C++
>> level is C++14 and simply change it to std::make_unique.

So we're depending on APIs that upstream doesn't think are stable?

regards, tom lane




Re: LLVM breakage on seawasp

2019-08-24 Thread Andres Freund
Hi,

On August 24, 2019 1:57:56 PM PDT, Tom Lane  wrote:
>Andres Freund  writes:
>> On August 24, 2019 1:08:11 PM PDT, Thomas Munro
> wrote:
>>> That's because they just moved to C++14 and replaced their own
>>> llvm::make_unique<> with std::make_unique<>:
>>>
>https://github.com/llvm-mirror/llvm/commit/114087caa6f95b526861c3af94b3093d9444c57b
>>> Perhaps we'll need some macrology to select between llvm and std
>>> versions?  I am guessing we can't decree that PostgreSQL's minimum
>C++
>>> level is C++14 and simply change it to std::make_unique.
>
>So we're depending on APIs that upstream doesn't think are stable?

Seawasp IIRC builds against the development branch of LLVM, which explains why
we see failures there. Does that address what you are concerned about? If not, 
could you expand?

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: LLVM breakage on seawasp

2019-08-24 Thread Thomas Munro
On Sun, Aug 25, 2019 at 8:24 AM Andres Freund  wrote:
> On August 24, 2019 1:08:11 PM PDT, Thomas Munro  
> wrote:
> >Perhaps we'll need some macrology to select between llvm and std
> >versions?  I am guessing we can't decree that PostgreSQL's minimum C++
> >level is C++14 and simply change it to std::make_unique.
>
> Perhaps just a
>
> #if new_enough
> using std::make_unique;
> #else
> using llvm::make_unique;
> #endif
>
> at the start of the file, and then use it unqualified?

Yeah, it's a pain though, you'd have to say:

#if llvm >= 9
#  if cpp >= 14
     using std::make_unique;
#  else
#    error "postgres needs at least c++ 14 to use llvm 9"
#  endif
#else
   using llvm::make_unique;
#endif

Maybe we should just use std::unique_ptr's constructor, i.e. give it new
ImportMapTy() instead of using make_unique(), even though that's not
cool C++ these days?

-- 
Thomas Munro
https://enterprisedb.com




Re: LLVM breakage on seawasp

2019-08-24 Thread Tom Lane
Andres Freund  writes:
> On August 24, 2019 1:57:56 PM PDT, Tom Lane  wrote:
>> So we're depending on APIs that upstream doesn't think are stable?

>  Seawasp iirc builds against the development branch of llvm, which explains 
> why we see failures there. Does that address what you are concerned about? If 
> not, could you expand?

I know it's the development branch.  The question is whether this
breakage is something *they* ought to be fixing.  If not, I'm
worried that we're too much in bed with implementation details
of LLVM that we shouldn't be depending on.

regards, tom lane




Re: LLVM breakage on seawasp

2019-08-24 Thread Andres Freund
Hi,

On August 24, 2019 2:37:55 PM PDT, Tom Lane  wrote:
>Andres Freund  writes:
>> On August 24, 2019 1:57:56 PM PDT, Tom Lane 
>wrote:
>>> So we're depending on APIs that upstream doesn't think are stable?
>
>>  Seawasp iirc builds against the development branch of llvm, which
>explains why we see failures there. Does that address what you are
>concerned about? If not, could you expand?
>
>I know it's the development branch.  The question is whether this
>breakage is something *they* ought to be fixing.  If not, I'm
>worried that we're too much in bed with implementation details
>of LLVM that we shouldn't be depending on.

Don't think so - it's a C++ standard feature in the version of the standard
LLVM is based on. So it's pretty reasonable for them to drop their older
backwards-compatible function.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: LLVM breakage on seawasp

2019-08-24 Thread Andres Freund
Hi,

On 2019-08-25 09:15:35 +1200, Thomas Munro wrote:
> On Sun, Aug 25, 2019 at 8:24 AM Andres Freund  wrote:
> > On August 24, 2019 1:08:11 PM PDT, Thomas Munro  
> > wrote:
> > >Perhaps we'll need some macrology to select between llvm and std
> > >versions?  I am guessing we can't decree that PostgreSQL's minimum C++
> > >level is C++14 and simply change it to std::make_unique.
> >
> > Perhaps just a
> > #if new_enough
> > using std::make_unique;
> > #else
> > using llvm::make_unique;
> > #endif
> >
> > at the start of the file, and then use it unqualified?
> 
> Yeah, it's a pain though, you'd have to say:
> 
> #if llvm >= 9
> #  if cpp >= 14
>      using std::make_unique;
> #  else
> #    error "postgres needs at least c++ 14 to use llvm 9"
> #  endif
> #else
>    using llvm::make_unique;
> #endif

I don't think we'd really need the inner part, because you can't use
llvm 9 without a new enough compiler. There's plenty make_unique use in
inline functions etc.


> Maybe we should just use std::unique_ptr's constructor, i.e. give it new
> ImportMapTy() instead of using make_unique(), even though that's not
> cool C++ these days?

Yea, wfm. do you want to make it so?

Greetings,

Andres Freund




Re: LLVM breakage on seawasp

2019-08-24 Thread Tom Lane
Andres Freund  writes:
> On August 24, 2019 2:37:55 PM PDT, Tom Lane  wrote:
>> I know it's the development branch.  The question is whether this
>> breakage is something *they* ought to be fixing.  If not, I'm
>> worried that we're too much in bed with implementation details
>> of LLVM that we shouldn't be depending on.

> Don't think so - it's a C++ standard feature in the version of the standard 
> LLVM is based on. So it's pretty reasonable for them to drop their older 
> backwards compatible function.

Whether it's reasonable or not doesn't really matter to my point.
We shouldn't be in the business of tracking multitudes of small
changes in LLVM, no matter whether they're individually "reasonable".
The more often this happens, the more concerned I am that we chose
the wrong semantic level to interface at.

regards, tom lane




Re: Optimize single tuple fetch from nbtree index

2019-08-24 Thread Floris Van Nee

> Hello,
> welcome to hackers with your first patch)

Thank you.

> Though, I got interested in the comment inconsistency you have found.
> I added a debug message into this code branch of the patch and was able
> to see it in regression.diffs after 'make check'.
> Speaking of your patch, it seems that the buffer was unpinned and pinned
> again between the two reads,
> and the condition of holding it continuously has not been met.

May I ask what makes you conclude that the condition of holding the pin 
continuously has not been met?
Your reply encouraged me to dig a little bit more into this today. First, I 
wanted to check if indeed the pin was continuously held by the backend or not. 
I added some debug info to ReleaseBuffer for this: it turned out that the pin 
on the buffer was definitely never released by the backend between the calls to 
_bt_first and _bt_next. So the buffer got compacted while the backend held a 
pin on it.
After some more searching I found the following code: _bt_vacuum_one_page in 
nbtinsert.c
This function compacts a single page without taking a super-exclusive lock.
It is used during inserts to make room on a page. I verified that if I comment 
out the calls to this function, the compacting never happens while I have a pin 
on the buffer.
So I guess that answers my own question: cleaning up garbage during inserts is
one of the cases where compacting may happen even while other backends hold a
pin on the buffer. Perhaps this should also be phrased more clearly in the
comments in e.g. _bt_killitems, because currently those comments make it look
like this case never occurs.

> While reading the code to answer your question, I noticed that
> BTScanPosIsPinned macro name is misleading.
> It calls BufferIsValid(), not BufferIsPinned() as one could expect.
> And BufferIsValid in bufmgr.h comment explicitly states that it
> shouldn't be confused with BufferIsPinned.
> The same goes for BTScanPosUnpinIfPinned().

I agree the name is misleading. It clearly does something other than what its
name suggests. However, I don't believe this introduces problems in these
particular pieces of code, as long as the macros are always used.
BTScanPosIsPinned actually checks whether the buffer is valid and not
necessarily whether it's pinned, as you mentioned. However, any time the
buffer gets unpinned using the macro BTScanPosUnpin, the buffer gets set to
InvalidBuffer by the macro as well. Therefore, any subsequent call to
BTScanPosIsPinned should indeed return false. It'd definitely be nice if this
were clarified in the comments, though.
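
To illustrate the relationship (a simplified sketch from memory, not the
verbatim nbtree.h definitions):

/*
 * Simplified sketch: "is pinned" is implemented as "buffer is valid",
 * which stays accurate only because unpinning always resets the buffer
 * to InvalidBuffer in the same macro.
 */
#define BTScanPosIsPinned(scanpos) \
	BufferIsValid((scanpos).buf)

#define BTScanPosUnpin(scanpos) \
	do { \
		ReleaseBuffer((scanpos).buf); \
		(scanpos).buf = InvalidBuffer; \
	} while (0)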

-Floris




pg11.5: ExecHashJoinNewBatch: glibc detected...double free or corruption (!prev)

2019-08-24 Thread Justin Pryzby
Core was generated by `postgres: telsasoft ts 10.100.2.162(33500) SELECT '.
Program terminated with signal 6, Aborted.
#0  0x0039ff6325e5 in raise (sig=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:64
64return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
Missing separate debuginfos, use: debuginfo-install 
audit-libs-2.4.5-3.el6.x86_64 cyrus-sasl-lib-2.1.23-15.el6_6.2.x86_64 
geos36-3.6.3-1.rhel6.1.x86_64 json-c-0.11-12.el6.x86_64 
keyutils-libs-1.4-5.el6.x86_64 libcom_err-1.41.12-22.el6.x86_64 
libgcc-4.4.7-17.el6.x86_64 libicu-4.2.1-14.el6.x86_64 
libselinux-2.0.94-7.el6.x86_64 libstdc++-4.4.7-17.el6.x86_64 
nspr-4.11.0-1.el6.x86_64 nss-3.21.0-8.el6.x86_64 
nss-softokn-freebl-3.14.3-23.3.el6_8.x86_64 nss-util-3.21.0-2.el6.x86_64 
postgis24_11-2.4.5-1.rhel6.1.x86_64 proj49-4.9.3-3.rhel6.1.x86_64 
zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x0039ff6325e5 in raise (sig=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0039ff633dc5 in abort () at abort.c:92
#2  0x0039ff6704f7 in __libc_message (do_abort=2, fmt=0x39ff758a60 "*** 
glibc detected *** %s: %s: 0x%s ***\n") at 
../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x0039ff675f3e in malloc_printerr (action=3, str=0x39ff758df0 "double 
free or corruption (!prev)", ptr=, ar_ptr=) at malloc.c:6360
#4  0x0039ff678dd0 in _int_free (av=0x39ff98e120, p=0x1d40b090, 
have_lock=0) at malloc.c:4846
#5  0x006269e5 in ExecHashJoinNewBatch (pstate=0x2771218) at 
nodeHashjoin.c:1058
#6  ExecHashJoinImpl (pstate=0x2771218) at nodeHashjoin.c:539
#7  ExecHashJoin (pstate=0x2771218) at nodeHashjoin.c:565
#8  0x006131d0 in ExecProcNodeInstr (node=0x2771218) at 
execProcnode.c:461
#9  0x00633286 in ExecProcNode (pstate=0x2771108) at 
../../../src/include/executor/executor.h:247
#10 ExecSort (pstate=0x2771108) at nodeSort.c:107
#11 0x006131d0 in ExecProcNodeInstr (node=0x2771108) at 
execProcnode.c:461
#12 0x0061d51e in ExecProcNode (aggstate=0x2770d30) at 
../../../src/include/executor/executor.h:247
#13 fetch_input_tuple (aggstate=0x2770d30) at nodeAgg.c:406
#14 0x0061ee70 in agg_retrieve_direct (pstate=0x2770d30) at 
nodeAgg.c:1755
#15 ExecAgg (pstate=0x2770d30) at nodeAgg.c:1570
#16 0x006131d0 in ExecProcNodeInstr (node=0x2770d30) at 
execProcnode.c:461
#17 0x0060f4b7 in ExecProcNode (queryDesc=0x2940950, direction=, count=0, execute_once=48) at 
../../../src/include/executor/executor.h:247
#18 ExecutePlan (queryDesc=0x2940950, direction=, count=0, 
execute_once=48) at execMain.c:1723
#19 standard_ExecutorRun (queryDesc=0x2940950, direction=, 
count=0, execute_once=48) at execMain.c:364
#20 0x7ff08f74f618 in pgss_ExecutorRun (queryDesc=0x2940950, 
direction=ForwardScanDirection, count=0, execute_once=true) at 
pg_stat_statements.c:892
#21 0x7ff08f2cc81d in explain_ExecutorRun (queryDesc=0x2940950, 
direction=ForwardScanDirection, count=0, execute_once=true) at 
auto_explain.c:281
#22 0x0075444b in PortalRunSelect (portal=0x2571f00, forward=, count=0, dest=) at pquery.c:932
#23 0x00755631 in PortalRun (portal=0x2571f00, 
count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x928e7a8, 
altdest=0x928e7a8, completionTag=0x7ffcaa940670 "") at pquery.c:773
#24 0x00752267 in exec_simple_query (
query_string=0x2753e08 "--BEGIN SQL\nSELECT * FROM\n\n(SELECT site_office 
AS site_gran,\n \tsite_location AS cell,\n\tdisplay_site_name (sect_name, 
site_name, sect_mscid) AS sitename,\n\tperiod, data_cell.bh_day,\n\n", '-' 
, "Ac"...) at postgres.c:1145
#25 0x00753211 in PostgresMain (argc=, argv=, dbname=0x252b1f8 "ts", username=) at 
postgres.c:4182
#26 0x006e2587 in BackendRun (argc=, argv=) at postmaster.c:4358
#27 BackendStartup (argc=, argv=) at 
postmaster.c:4030
#28 ServerLoop (argc=, argv=) at 
postmaster.c:1707
#29 PostmasterMain (argc=, argv=) at 
postmaster.c:1380
#30 0x00656a80 in main (argc=3, argv=0x24f7570) at main.c:228

I found this report from 9.6 last month; copying those parties.
https://www.postgresql.org/message-id/CAHyXU0xzwzgnUTyK62ZkG0_1CQsBHwnTVT2TSX7iwTEv1ve9ag%40mail.gmail.com

Merlin: what OS are you running on and what postgres package, or was it
compiled locally ?

We're running:
postgresql11-server-11.5-1PGDG.rhel6.x86_64
CentOS release 6.8
glibc-2.12-1.192.el6.x86_64
linux 2.6.32-754.3.5.el6.x86_64
DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, 
BIOS 6.00 09/21/2015
Model name:Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz

I see this is available but not installed: 2.12-1.212.el6_10.3

The VM has no known issues.

ts=# SELECT * FROM pg_config();
..
| CONFIGURE | '--enable-rpath' '--prefix=/usr/pgsql-11' 
'--includedir=/usr/pgsql-11/include' '--mandir=/usr/pgsql-11/share/man' 
'--datadir=/usr/pgsql-11/share' '--libdir=/usr/pgsql-11/lib' '--with-icu' 
'--with-perl' '--with-python' '--with-tcl' '--with-tclconfig=/usr/lib64' 
'--with-openssl

Re: LLVM breakage on seawasp

2019-08-24 Thread Thomas Munro
On Sun, Aug 25, 2019 at 9:46 AM Andres Freund  wrote:
> > Maybe we should just use std::unique_ptr's constructor, ie give it new
> > ImportMayTy() instead of using make_unique(), even though that's not
> > cool C++ these days?
>
> Yea, wfm. do you want to make it so?

Done.

-- 
Thomas Munro
https://enterprisedb.com




Re: pg11.5: ExecHashJoinNewBatch: glibc detected...double free or corruption (!prev)

2019-08-24 Thread Peter Geoghegan
On Sat, Aug 24, 2019 at 7:25 PM Justin Pryzby  wrote:
> Merlin: what OS are you running on and what postgres package, or was it
> compiled locally ?

I was reminded of this issue from last year, which also appeared to
involve BufFileClose() and a double-free:

https://postgr.es/m/87y3hmee19@news-spur.riddles.org.uk

That was a BufFile that was under the control of a tuplestore, so it
was similar to but different from your case. I suspect it's related.

-- 
Peter Geoghegan




Re: using explicit_bzero

2019-08-24 Thread Peter Eisentraut
On 2019-08-14 05:00, Michael Paquier wrote:
> On Tue, Aug 13, 2019 at 10:30:39AM +0200, Peter Eisentraut wrote:
>> Another patch, to attempt to fix the Windows build.
> 
> I have not been able to test the compilation, but the changes look
> good on this side.

Rebased patch, no functionality changes.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 36c8521f8c75517670aa01d064e8b4c117765f87 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 13 Aug 2019 10:25:17 +0200
Subject: [PATCH v6] Use explicit_bzero

Use the explicit_bzero() function in places where it is important that
security information such as passwords is cleared from memory.  There
might be other places where it could be useful; this is just an
initial collection.

For platforms that don't have explicit_bzero(), provide various
fallback implementations.  (explicit_bzero() itself isn't standard,
but as Linux/glibc and OpenBSD have it, it's the most common spelling,
so it makes sense to make that the invocation point.)

Discussion: 
https://www.postgresql.org/message-id/flat/42d26bde-5d5b-c90d-87ae-6cab875f73be%402ndquadrant.com
---
 configure                            | 15 ++++++++++++++-
 configure.in                         |  2 +
 src/backend/libpq/be-secure-common.c |  3 +++
 src/include/pg_config.h.in           |  6 ++++++
 src/include/pg_config.h.win32        |  6 ++++++
 src/include/port.h                   |  4 ++++
 src/interfaces/libpq/fe-connect.c    |  8 ++++++++
 src/port/explicit_bzero.c            | 55 ++++++++++++++++++++++++++++++++
 src/tools/msvc/Mkvcbuild.pm          |  2 +-
 9 files changed, 99 insertions(+), 2 deletions(-)
 create mode 100644 src/port/explicit_bzero.c

diff --git a/configure b/configure
index f14709ed1e..b3c92764be 100755
--- a/configure
+++ b/configure
@@ -15087,7 +15087,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in cbrt clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink sync_file_range uselocale utime utimes wcstombs_l
+for ac_func in cbrt clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memset_s memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink sync_file_range uselocale utime utimes wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
@@ -15739,6 +15739,19 @@ esac
 
 fi
 
+ac_fn_c_check_func "$LINENO" "explicit_bzero" "ac_cv_func_explicit_bzero"
+if test "x$ac_cv_func_explicit_bzero" = xyes; then :
+  $as_echo "#define HAVE_EXPLICIT_BZERO 1" >>confdefs.h
+
+else
+  case " $LIBOBJS " in
+  *" explicit_bzero.$ac_objext "* ) ;;
+  *) LIBOBJS="$LIBOBJS explicit_bzero.$ac_objext"
+ ;;
+esac
+
+fi
+
 ac_fn_c_check_func "$LINENO" "fls" "ac_cv_func_fls"
 if test "x$ac_cv_func_fls" = xyes; then :
   $as_echo "#define HAVE_FLS 1" >>confdefs.h
diff --git a/configure.in b/configure.in
index 805cf8617b..0d16c1a971 100644
--- a/configure.in
+++ b/configure.in
@@ -1594,6 +1594,7 @@ AC_CHECK_FUNCS(m4_normalize([
getpeerucred
getrlimit
mbstowcs_l
+   memset_s
memmove
poll
posix_fallocate
@@ -1691,6 +1692,7 @@ fi
 
 AC_REPLACE_FUNCS(m4_normalize([
dlopen
+   explicit_bzero
fls
getopt
getrusage
diff --git a/src/backend/libpq/be-secure-common.c b/src/backend/libpq/be-secure-common.c
index e8f27bc782..d801929ea2 100644
--- a/src/backend/libpq/be-secure-common.c
+++ b/src/backend/libpq/be-secure-common.c
@@ -87,6 +87,7 @@ run_ssl_passphrase_command(const char *prompt, bool is_server_start, char *buf,
{
if (ferror(fh))
{
+   explicit_bzero(buf, size);
ereport(loglevel,
(errcode_for_file_access(),
 errmsg("could not read from command 
\"%s\": %m",
@@ -98,6 +99,7 @@ run_ssl_passphrase_command(const char *prompt, bool is_server_start, char *buf,
pclose_rc = ClosePipeStream(fh);
if (pclose_rc == -1)
{
+   explicit_bzero(buf, size);
ereport(loglevel,
(errcode_for_file_access(),
 errmsg("could not close pipe to external 
command: %m")));
@@ -105,6 +107,7 @@ run_ssl_passphrase_command(const char *prompt, bool is_server_start, char *buf,
}
else if (pclose_rc != 0)
{
+   explicit_bzero(buf, size);
ereport(loglevel,
(errcode_for_file_access(),
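
The patch is cut off just above, but the fallback implementations the commit
message mentions boil down to defeating dead-store elimination.  A minimal
sketch in the spirit of src/port/explicit_bzero.c, assuming a platform with
neither a native explicit_bzero() nor memset_s() (the committed file also
selects those variants when available):

#include <string.h>

static void
bzero2(void *buf, size_t len)
{
	memset(buf, 0, len);
}

/*
 * Call through a volatile function pointer so the compiler cannot prove
 * the memset() is a dead store and optimize it away.
 */
static void (*volatile bzero_p) (void *, size_t) = bzero2;

void
explicit_bzero(void *buf, size_t len)
{
	bzero_p(buf, len);
}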

Re: POC: Cleaning up orphaned files using undo logs

2019-08-24 Thread Robert Haas
On Fri, Aug 23, 2019 at 2:04 AM Thomas Munro  wrote:
> 2.  Strict self-update-only:  We could update it as part of
> transaction cleanup.  That is, when you commit or abort, probably at
> some time when your transaction is still advertised as running, you go
> and update your own transaction header with the size.  If you
> never reach that stage, I think you can fix it during crash recovery,
> during the startup scan that feeds the rollback request queues.  That
> is, if you encounter a transaction header with length == 0, it must be
> the final one and its length is therefore known and can be updated,
> before you allow new transactions to begin.  There are some
> complications like backends that exit without crashing, which I
> haven't thought about.  As Amit just pointed out to me, that means
> that the update is not folded into the same buffer access as the next
> transaction, but perhaps you can mitigate that by not updating your
> header if the next header will be on the same page -- the next
> transaction can do it safely then (this page with the insert pointer
> on it can't be discarded).  As Dilip just pointed out to me, it means
> that you always do an update that you might never need to do if
> the transaction is discarded, to which I have no answer.  Bleugh.

Andres and I have spent a lot of time on the phone over the last
couple of days and I think we both kind of like this option.  I don't
think that the costs are likely to be very significant: you're talking
about pinning, locking, dirtying, unlocking, and unpinning one buffer
at commit time, or maybe two if your transaction touched both logged
and unlogged tables.  If the transaction is short enough for that
overhead to matter, that buffer is probably already in shared_buffers,
is probably already dirty, and is probably already in your CPU's
cache. So I think the overhead will turn out to be low.

Moreover, I doubt that we want to separately discard every transaction
anyway.  If you have very light-weight transactions, you don't want to
add an extra WAL record per transaction anyway.  Increasing the number
of separate WAL records per transaction from say 5 to 6 would be a
significant additional cost.  You probably want to perform a discard,
say, every 5 seconds or sooner if you can discard at least 64kB of
undo, or something of that sort.  So we're not going to save the
overhead of updating the previous transaction header often enough to
make much difference unless we're discarding so aggressively that we
incur a much larger overhead elsewhere.  I think.
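
A hypothetical sketch of such a policy, with the helper name and both
thresholds invented here purely for illustration:

#include "postgres.h"

#include "utils/timestamp.h"

/*
 * Decide whether it's time to issue a discard: either at least 64kB of
 * undo has become reclaimable, or five seconds have passed since the
 * last discard.
 */
static bool
undo_discard_due(TimestampTz last_discard, Size discardable_bytes)
{
	return discardable_bytes >= 64 * 1024 ||
		TimestampDifferenceExceeds(last_discard,
								   GetCurrentTimestamp(), 5000);
}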

I am a little concerned about the backends that exit without crashing.
Andres seems to want to treat that case as a bug to be fixed, but I
doubt whether that's going to be practical.   We're really only
talking about extreme corner cases here, because
before_shmem_exit(ShutdownPostgres, 0) means we'll
AbortOutOfAnyTransaction() which should RecordTransactionAbort(). Only
if we fail in the AbortTransaction() prior to reaching
RecordTransactionAbort() will we manage to reach the later cleanup
stages without having written an abort record.  I haven't scrutinized
that code lately to see exactly how things can go wrong there, but
there shouldn't be a whole lot. However, there's probably a few
things, like maybe a really poorly-timed malloc() failure.

A zero-order solution would be to install a deadman switch. At
on_shmem_exit time, you must detach from any undo log to which you are
connected, so that somebody else can attach to it later.  We can stick
in a cross-check there that you haven't written any undo bytes to that
log and PANIC if you have.  Then the system must be water-tight.
Perhaps it's possible to do better: if we could identify the cases in
which such logic gets reached, we could try to guarantee that WAL is
written and the undo log safely detached before we get there.  But at
the very least we can promote ERROR/FATAL to PANIC in the relevant
case.

A better solution would be to detect the problem and make sure we
recover from it before reusing the undo log.  Suppose each undo log
has three states: (1) nobody's attached, (2) somebody's attached, and
(3) nobody's attached but the last record might need a fixup.  When we
start up, all undo logs are in state 3, and the discard worker runs
around and puts them into state 1.  Subsequently, they alternate
between states 1 and 2 for as long as the system remains up.  But if
as an exceptional case we reach on_shmem_exit without having detached
the undo log, because of cascading failures, then we put the undo log
in state 3.  The discard worker already knows how to move undo logs
from state 3 to state 1, and it can do the same thing here.  Until it
does nobody else can reuse that undo log.
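
In pseudo-C, that life cycle might look like the following (the enum and
its names are invented here for illustration, not taken from the patch
set):

typedef enum UndoLogState
{
	UNDO_LOG_FREE,			/* (1) nobody's attached */
	UNDO_LOG_ATTACHED,		/* (2) somebody's attached */
	UNDO_LOG_NEEDS_FIXUP	/* (3) nobody's attached, but the last
							 * record might need a fixup */
} UndoLogState;

/*
 * At startup every undo log begins in UNDO_LOG_NEEDS_FIXUP; the discard
 * worker repairs the final record if necessary and moves the log to
 * UNDO_LOG_FREE.  Normal operation alternates FREE <-> ATTACHED.  If
 * on_shmem_exit finds a backend still attached, the log goes back to
 * UNDO_LOG_NEEDS_FIXUP instead of UNDO_LOG_FREE, so it cannot be reused
 * until the discard worker has processed it.
 */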

I might be missing something, but I think that would nail this down
pretty tightly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: mingw32 floating point diff

2019-08-24 Thread Emre Hasegeli
> I poked at this a bit more.  I can reproduce the problem by using
> -mfpmath=387 on dromedary's host (fairly old 32-bit macOS); although
> I also get half a dozen *other* failures in the core regression tests,
> mostly around detection of float overflow.  So I'm not quite sure that
> this is comparable.  But at any rate, I tracked the core of the problem
> to pg_hypot:

I couldn't test if it helps, but another solution may be to rip out
pg_hypot() in favour of the libc implementation.  This was discussed
in detail as part of the "Improve geometric types" thread.

Last comment about it:

https://www.postgresql.org/message-id/9223.1507039405%40sss.pgh.pa.us
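
For reference, the pg_hypot() under discussion is essentially the
following (adapted from src/backend/utils/adt/float.c, with the
infinity/NaN helpers replaced by their <math.h> equivalents); it avoids
intermediate overflow and underflow by factoring out the larger
magnitude, which is also why its rounding can differ from a platform's
libc hypot():

#include <math.h>

double
pg_hypot(double x, double y)
{
	double		yx;

	/* Handle INF and NaN properly */
	if (isinf(x) || isinf(y))
		return (double) INFINITY;
	if (isnan(x) || isnan(y))
		return (double) NAN;

	/* Else, drop any minus signs */
	x = fabs(x);
	y = fabs(y);

	/* Swap x and y if needed to make x the larger one */
	if (x < y)
	{
		double		temp = x;

		x = y;
		y = temp;
	}

	/* If y is zero, the hypotenuse is x; this also avoids dividing by zero */
	if (y == 0.0)
		return x;

	/* Determine the hypotenuse */
	yx = y / x;
	return x * sqrt(1.0 + (yx * yx));
}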




Re: POC: Cleaning up orphaned files using undo logs

2019-08-24 Thread Robert Haas
On Wed, Aug 21, 2019 at 4:34 PM Robert Haas  wrote:
> ReleaseResourcesAndProcessUndo() is only supposed to be called after
> AbortTransaction(), and the additional steps it performs --
> AtCleanup_Portals() and AtEOXact_Snapshot() or alternatively
> AtSubCleanup_Portals -- are taken from Cleanup(Sub)Transaction.
> That's not crazy; the other steps in Cleanup(Sub)Transaction() look
> like stuff that's intended to be performed when we're totally done
> with this TransactionState stack entry, whereas these things are
> slightly higher-level cleanups that might even block undo (e.g.
> undropped portal prevents orphaned file cleanup).  Granted, there are
> no comments explaining why those particular cleanup steps are
> performed here, and it's possible some other approach is better, but I
> think perhaps it's not quite as flagrantly broken as you think.

Andres smacked me with the clue-bat off-list and now I understand why
this is broken: there's no guarantee that running the various
AtEOXact/AtCleanup functions actually puts the transaction back into a
good state.  They *might* return the system to the state that it was
in immediately following StartTransaction(), but they also might not.
Moreover, ProcessUndoRequestForEachLogCat uses PG_TRY/PG_CATCH and
then discards the error without performing *any cleanup at all* but
then goes on and tries to do undo for other undo log categories
anyway.  That is totally unsafe.

I think that there should only be one chance to perform undo actions,
and as I said or at least alluded to before, if that throws an error,
it shouldn't be caught by a TRY/CATCH block but should be handled by
the state machine in xact.c.  If we're not going to make the state
machine handle these conditions, the addition of
TRANS_UNDO/TBLOCK_UNDO/TBLOCK_SUBUNDO is really missing the boat.  I'm
still not quite sure of the exact sequence of steps: we clearly need
AtCleanup_Portals() and a bunch of the other stuff that happens during
CleanupTransaction(), ideally including the freeing of memory, to
happen before we try undo. But I don't have a great feeling for how to
make that happen, and it seems more desirable for undo to begin as
soon as the transaction fails rather than waiting until
Cleanup(Sub)Transaction() time. I think some more research is needed
here.

> I am also not convinced that semi-critical sections are a bad idea,

Regarding this, after further thought and discussion with Andres,
there are two cases here that call for somewhat different handling:
temporary undo, and subtransaction abort.

In the case of a subtransaction abort, we can't proceed with the
toplevel transaction unless we succeed in applying the
subtransaction's undo, but that does not require killing off the
backend.  It might be a better idea to just fail the containing
subtransaction with the error that occurred during undo apply; if
there are multiple levels of subtransactions present then we might
fail in the same way several times, but eventually we'll fail at the
top level, forcibly kick the undo into the background, and the session
can continue.  The background workers will, hopefully, eventually
recover the situation.  Even if they can't, because, say, the failure
is due to a bug or whatever, killing off the session doesn't really
help.

In the case of temporary undo, killing the session is a much more
appealing approach. If we don't, how will that undo ever get
processed?  We could retry at some point (like every time we return to
the toplevel command prompt?) or just ignore the fact that we didn't
manage to perform those undo actions and leave that undo there like an
albatross, accumulating more and more undo behind it until the session
exits or the disk fills up.  The latter strikes me as a terrible user
experience, especially because for wire protocol reasons we'd have to
swallow the errors or at best convert them to warnings, but YMMV.

Anyway, probably these cases should not be handled exactly the same
way, but exactly what to do probably depends on the previous question:
how exactly does the integration into xact.c's state machine work,
anyway?

Meanwhile, I've been working up a prototype of how the undorequest.c
stuff I sent previously could be integrated with xact.c.  In working
on that, I've realized that there seem to be two different tasks.  One
is tracking the information that we'll need to have available to
perform undo actions.  The other is the actual transaction state
manipulation: when and how do we abort transactions, cleanup
transactions, start new transactions specifically for undo?  How are
transactions performing undo specially marked, if at all?  The
attached patch includes a new module, undostate.c/h, which tries to
handle the first of those things; this is just a prototype, and is
missing some pieces marked with XXX, but I think it's probably the
right general direction.  It will still need to be plugged into a
framework for launching undo apply background workers (which might
require som

Re: "ago" times on buildfarm status page

2019-08-24 Thread Andrew Dunstan


On 8/21/19 10:52 AM, Andrew Dunstan wrote:
>>
>> The JavaScript could also be made to update the "ago" part every minute,
>> and show the absolute time as a tooltip, which is what pretty much every
>> other website does.
>>
>
> The code for the page is here:
> 
>
>
> Patches welcome.
>
>


I have done both of these things.


cheers


andrew



-- 
Andrew Dunstan            https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: mingw32 floating point diff

2019-08-24 Thread Tom Lane
Emre Hasegeli  writes:
> I couldn't test if it helps, but another solution may be is to rip out
> pg_hypot() in favour of the libc implementation.  This was discussed
> in detail as part of "Improve geometric types" thread.

Hm ... the problem we're trying to fix here is platform-varying results.
Surely switching to the libc implementation would make that worse not
better?

regards, tom lane




Re: Re: Email to hackers for test coverage

2019-08-24 Thread movead...@highgo.ca
>Hi Movead,
>Please add that to commitfest.

Thanks, and I have just done so; it is at
https://commitfest.postgresql.org/24/2258/

--
Movead Li


Re: Why overhead of SPI is so large?

2019-08-24 Thread David Fetter
On Fri, Aug 23, 2019 at 11:10:28AM +0200, Pavel Stehule wrote:
> On Fri, Aug 23, 2019 at 11:05 AM Konstantin Knizhnik <
> k.knizh...@postgrespro.ru> wrote:
> 
> >
> >
> > On 22.08.2019 18:56, Pavel Stehule wrote:
> >
> >
> >
> > On Thu, Aug 22, 2019 at 5:51 PM Konstantin Knizhnik <
> > k.knizh...@postgrespro.ru> wrote:
> >
> >> Some more information...
> >> First of all, I found out that marking a PL/pgSQL function as immutable
> >> significantly increases the speed of its execution:
> >> 19808 ms vs. 27594. It happens because exec_eval_simple_expr takes a
> >> snapshot if the function is volatile (the default).
> >> I wonder if the PL/pgSQL compiler can detect that the evaluated expression
> >> itself is actually immutable and there is no need to take a snapshot
> >> for each invocation of this function. Also, I have tried yet another PL
> >> language - JavaScript, which is now the new outsider, despite the fact
> >> that the v8 JIT compiler is very good.
> >>
> >
> > I have a plan to do some work in this direction. A snapshot is not
> > necessary for almost all built-in functions. If an expr calls only
> > built-in functions, then it can probably be called without a snapshot
> > and without any work with the plan cache.
> >
> >
> > I wonder if the following simple patch is correct?
> >
> 
> You cannot trust that the immutable flag on a user-defined function is
> correct. Only built-in functions are 100% correct.
> 
> CREATE OR REPLACE FUNCTION foo()
> RETURNS int AS $$
> SELECT count(*) FROM pg_class;
> $$ LANGUAGE sql IMMUTABLE;
> 
> is working.

No, it's lying to the RDBMS, so it's pilot error. The problem of
determining from the function itself whether it is in fact immutable
is, in general, equivalent to the Halting Problem, so no, we can't
figure it out. We do need to trust our users not to lie to us, and we
do not need to protect them from the consequences when they do.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-08-24 Thread Bruce Momjian
On Fri, Aug 23, 2019 at 10:04:13PM -0400, Stephen Frost wrote:
> > Well, I think they might do that to reduce encryption overhead.  I think
> > tests have shown that is not an issue, but we will need to test further.
> 
> I seriously doubt that's why and I don't think there's actually much
> value in trying to figure out the "why" here- the question is, do those
> systems answer the check-box requirement that was brought up on the call
> as the justification for this feature?  If so, then clearly not
> everything is required to be encrypted and we shouldn't be stressing
> over trying to do that.

We will stress over trying _not_ to encrypt everything.

> > I am not sure of the downside of encrypting everything, since it leaks
> > the least information and has a minimal user API and code impact.  What
> > is the value of encrypting only the user rows?  Better key control?
> 
> Yes, better key control, and better user API, and avoiding having an

Uh, there is no user API for all-cluster encryption except for the
administrator.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Why overhead of SPI is so large?

2019-08-24 Thread Pavel Stehule
On Sat, Aug 24, 2019 at 6:01 PM David Fetter  wrote:

> On Fri, Aug 23, 2019 at 11:10:28AM +0200, Pavel Stehule wrote:
> > On Fri, Aug 23, 2019 at 11:05 AM Konstantin Knizhnik <
> > k.knizh...@postgrespro.ru> wrote:
> >
> > >
> > >
> > > On 22.08.2019 18:56, Pavel Stehule wrote:
> > >
> > >
> > >
> > > On Thu, Aug 22, 2019 at 5:51 PM Konstantin Knizhnik <
> > > k.knizh...@postgrespro.ru> wrote:
> > >
> > >> Some more information...
> > >> First of all, I found out that marking a PL/pgSQL function as immutable
> > >> significantly increases the speed of its execution:
> > >> 19808 ms vs. 27594. It happens because exec_eval_simple_expr takes a
> > >> snapshot if the function is volatile (the default).
> > >> I wonder if the PL/pgSQL compiler can detect that the evaluated
> > >> expression itself is actually immutable and there is no need to take a
> > >> snapshot for each invocation of this function. Also, I have tried yet
> > >> another PL language - JavaScript, which is now the new outsider,
> > >> despite the fact that the v8 JIT compiler is very good.
> > >>
> > >
> > > I have a plan to do some work in this direction. A snapshot is not
> > > necessary for almost all built-in functions. If an expr calls only
> > > built-in functions, then it can probably be called without a snapshot
> > > and without any work with the plan cache.
> > >
> > >
> > > I wonder if the following simple patch is correct?
> > >
> >
> > You cannot trust that the immutable flag on a user-defined function is
> > correct. Only built-in functions are 100% correct.
> >
> > CREATE OR REPLACE FUNCTION foo()
> > RETURNS int AS $$
> > SELECT count(*) FROM pg_class;
> > $$ LANGUAGE sql IMMUTABLE;
> >
> > is working.
>
> No, it's lying to the RDBMS, so it's pilot error. The problem of
> determining from the function itself whether it is in fact immutable
> is, in general, equivalent to the Halting Problem, so no, we can't
> figure it out. We do need to trust our users not to lie to us, and we
> do not need to protect them from the consequences when they do.
>

I don't have any problem with fixing this behavior once there is an
alternative.

I can imagine a new special flag for STABLE functions that enforces
one-shot plans and allows them to be optimized the way IMMUTABLE functions
are now - using the result at planning time.

Users lie because they must - there is no alternative. There is no other
solution, and estimation errors related to joins are a fundamental issue.

Regards

Pavel





> Best,
> David.
> --
> David Fetter  http://fetter.org/
> Phone: +1 415 235 3778
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>


Re: assertion at postmaster start

2019-08-24 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> On 2019-06-15 12:09:50 -0400, Alvaro Herrera wrote:
>>> Once in a blue moon I get this assertion failure on server start:
>>> TRAP: FailedAssertion("!(AbortStartTime == 0)", Archivo: 
>>> "/pgsql/source/master/src/backend/postmaster/postmaster.c", Linea: 2957)

>> And "server process" is afaict only used for actual backends, not other
>> types of processes. But we've not yet seen "database system is ready to
>> accept accept connections", so IIRC it could only be a "dead_end" type
>> backend? But we didn't yet see an error from that...
>> Seems to indicate a logic error in postmaster's state machine. Perhaps
>> something related to dead_end processes?

> So if Andres is guessing right, this must be from something trying to
> connect before the postmaster is ready?  Seems like that could be
> tested for ...

I got around to trying to test for this, and I find that I can reproduce
the symptom exactly by applying the attached hack and then trying to
connect a couple of seconds after starting the postmaster.

Basically what seems to be happening in Alvaro's report is that

(1) before the startup process is done, something tries to connect,
causing a dead_end child to be launched;

(2) for reasons mysterious, that child exits with exit code 15 rather
than the expected 0 or 1;

(3) the postmaster therefore initiates a system-wide restart cycle;

(4) the startup process completes normally anyway, indicating that the
SIGQUIT arrived too late to affect it;

(5) then we hit the Assert, since we reach the transition-to-normal-run
code even though HandleChildCrash set AbortStartTime in step (3).

The timing window for (4) to happen is extremely tight normally.  The
attached patch makes it wider by the expedient of just not sending the
SIGQUIT to the startup process ;-).  Then you just need enough of a delay
in startup to perform a manual connection, plus some hack to make the
dead_end child exit with an unexpected exit code.

I think what this demonstrates is that that Assert is just wrong:
we *can* reach PM_RUN with the flag still set, so we should do

StartupStatus = STARTUP_NOT_RUNNING;
FatalError = false;
-   Assert(AbortStartTime == 0);
+   AbortStartTime = 0;
ReachedNormalRunning = true;
pmState = PM_RUN;

Probably likewise for the similar Assert in sigusr1_handler.

A larger question is whether we should modify the postmaster logic
so that crashes of dead_end children aren't treated as reasons to
perform a system restart.  I'm dubious about this, because frankly,
such crashes shouldn't be happening.  There is very little code
that a dead_end child will traverse before exiting ... so how the
devil did it reach an exit(15)?  Alvaro, are you running any
nonstandard code in the postmaster (shared_preload_libraries, maybe)?

regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 3339804..ef53522 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2260,6 +2260,7 @@ retry1:
 	switch (port->canAcceptConnections)
 	{
 		case CAC_STARTUP:
+			exit(15);
 			ereport(FATAL,
 	(errcode(ERRCODE_CANNOT_CONNECT_NOW),
 	 errmsg("the database system is starting up")));
@@ -3512,7 +3513,7 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 (errmsg_internal("sending %s to process %d",
  (SendStop ? "SIGSTOP" : "SIGQUIT"),
  (int) StartupPID)));
-		signal_child(StartupPID, (SendStop ? SIGSTOP : SIGQUIT));
+//		signal_child(StartupPID, (SendStop ? SIGSTOP : SIGQUIT));
 		StartupStatus = STARTUP_SIGNALED;
 	}
 
@@ -5383,6 +5384,9 @@ StartChildProcess(AuxProcType type)
 		MemoryContextDelete(PostmasterContext);
 		PostmasterContext = NULL;
 
+		if (type == StartupProcess)
+			pg_usleep(500);
+
 		AuxiliaryProcessMain(ac, av);
 		ExitPostmaster(0);
 	}


Re: [PATCH] Make configuration file "include" directive handling more robust

2019-08-24 Thread Tom Lane
Ian Barwick  writes:
> On 7/17/19 5:34 PM, Kyotaro Horiguchi wrote:
>> Hello.
>>> I don't think this is new to 12.

> No, though I'm not sure how much this would be seen as a bugfix
> and how far back it would be sensible to patch.

I think this is worth considering as a bugfix; although I'm afraid
we can't change the signature of ParseConfigFile/ParseConfigFp in
released branches, since extensions could possibly be using those.
That limits what we can do --- but it's still possible to detect
direct recursion, which seems like enough to produce a nice error
message in typical cases.

I concur with Kyotaro-san that disallow-empty-include-directives.v1.patch
seems a bit brute-force, but where I would put the checks is in
ParseConfigFile and ParseConfigDirectory.

Also, I don't agree with the goals of prevent-disallowed-includes.patch.
I'm utterly not on board with breaking use of "include" in extension
files, for instance; while that may not be documented, it works fine,
and maybe somebody out there is relying on it.  Likewise, while "include"
in pg.auto.conf is not really considered supported, I don't see the
point of going out of our way to break the historical behavior.

That leads me to propose the attached simplified patch.  While I haven't
actually tried, I'm pretty sure this should back-patch without trouble.

regards, tom lane

diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index 1c8b5f7..125b898 100644
--- a/src/backend/utils/misc/guc-file.l
+++ b/src/backend/utils/misc/guc-file.l
@@ -568,6 +568,22 @@ ParseConfigFile(const char *config_file, bool strict,
 	FILE	   *fp;
 
 	/*
+	 * Reject file name that is all-blank (including empty), as that leads to
+	 * confusion, or even recursive inclusion in some cases.
+	 */
+	if (strspn(config_file, " \t\r\n") == strlen(config_file))
+	{
+		ereport(elevel,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("empty configuration file name: \"%s\"",
+		config_file)));
+		record_config_file_error("empty configuration file name",
+ calling_file, calling_lineno,
+ head_p, tail_p);
+		return false;
+	}
+
+	/*
 	 * Reject too-deep include nesting depth.  This is just a safety check to
 	 * avoid dumping core due to stack overflow if an include file loops back
 	 * to itself.  The maximum nesting depth is pretty arbitrary.
@@ -585,6 +601,26 @@ ParseConfigFile(const char *config_file, bool strict,
 	}
 
 	abs_path = AbsoluteConfigLocation(config_file, calling_file);
+
+	/*
+	 * Reject direct recursion.  Indirect recursion is also possible, but it's
+	 * harder to detect and so doesn't seem worth the trouble.  (We test at
+	 * this step because the canonicalization done by AbsoluteConfigLocation
+	 * makes it more likely that a simple strcmp comparison will match.)
+	 */
+	if (calling_file && strcmp(abs_path, calling_file) == 0)
+	{
+		ereport(elevel,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("configuration file recursion in \"%s\"",
+		calling_file)));
+		record_config_file_error("configuration file recursion",
+ calling_file, calling_lineno,
+ head_p, tail_p);
+		pfree(abs_path);
+		return false;
+	}
+
 	fp = AllocateFile(abs_path, "r");
 	if (!fp)
 	{
@@ -933,6 +969,27 @@ ParseConfigDirectory(const char *includedir,
 	int			size_filenames;
 	bool		status;
 
+	/*
+	 * Reject directory name that is all-blank (including empty), as that
+	 * leads to confusion, or even recursive inclusion in some cases.
+	 */
+	if (strspn(includedir, " \t\r\n") == strlen(includedir))
+	{
+		ereport(elevel,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("empty configuration directory name: \"%s\"",
+		includedir)));
+		record_config_file_error("empty configuration directory name",
+ calling_file, calling_lineno,
+ head_p, tail_p);
+		return false;
+	}
+
+	/*
+	 * We don't check for recursion or too-deep nesting depth here; the
+	 * subsequent calls to ParseConfigFile will take care of that.
+	 */
+
 	directory = AbsoluteConfigLocation(includedir, calling_file);
 	d = AllocateDir(directory);
 	if (d == NULL)


LLVM breakage on seawasp

2019-08-24 Thread Thomas Munro
Hi,

llvmjit_inline.cpp:177:55: error: ‘make_unique’ is not a member of ‘llvm’
  std::unique_ptr<ImportMapTy> globalsToInline =
llvm::make_unique<ImportMapTy>();

That's because they just moved to C++14 and replaced their own
llvm::make_unique<> with std::make_unique<>:

https://github.com/llvm-mirror/llvm/commit/114087caa6f95b526861c3af94b3093d9444c57b

Perhaps we'll need some macrology to select between llvm and std
versions?  I am guessing we can't decree that PostgreSQL's minimum C++
level is C++14 and simply change it to std::make_unique.

-- 
Thomas Munro
https://enterprisedb.com




Re: LLVM breakage on seawasp

2019-08-24 Thread Andres Freund
Hi,

On August 24, 2019 1:08:11 PM PDT, Thomas Munro  wrote:
>Hi,
>
>llvmjit_inline.cpp:177:55: error: ‘make_unique’ is not a member of
>‘llvm’
>  std::unique_ptr<ImportMapTy> globalsToInline =
>llvm::make_unique<ImportMapTy>();
>
>That's because they just moved to C++14 and replaced their own
>llvm::make_unique<> with std::make_unique<>:
>
>https://github.com/llvm-mirror/llvm/commit/114087caa6f95b526861c3af94b3093d9444c57b
>
>Perhaps we'll need some macrology to select between llvm and std
>versions?  I am guessing we can't decree that PostgreSQL's minimum C++
>level is C++14 and simply change it to std::make_unique.

Perhaps just a

#if new_enough
using std::make_unique;
#else
using llvm::make_unique;
#endif

at the start of the file, and then use it unqualified?

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: assertion at postmaster start

2019-08-24 Thread Tom Lane
I wrote:
> I think what this demonstrates is that that Assert is just wrong:
> we *can* reach PM_RUN with the flag still set, so we should do
>   StartupStatus = STARTUP_NOT_RUNNING;
>   FatalError = false;
> - Assert(AbortStartTime == 0);
> + AbortStartTime = 0;
>   ReachedNormalRunning = true;
>   pmState = PM_RUN;
> Probably likewise for the similar Assert in sigusr1_handler.

Poking further at this, I noticed that the code just above here completely
fails to do what the comments say it intends to do, which is restart the
startup process after we've SIGQUIT'd it.  That's because the careful
manipulation of StartupStatus in reaper (lines 2943ff in HEAD) is stomped
on by HandleChildCrash, which will just unconditionally reset it to
STARTUP_CRASHED (line 3507).  So we end up shutting down the database
after all, which is not what the intention seems to be.  Hence,
commit 45811be94 was still a few bricks shy of a load :-(.

I propose the attached.  I'm inclined to think that the risk/benefit
of back-patching this is not very good, so I just want to stick it in
HEAD, unless somebody can explain why dead_end children are likely to
crash in the field.

regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 3339804..ecb7892 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2920,7 +2920,9 @@ reaper(SIGNAL_ARGS)
 			 * during PM_STARTUP is treated as catastrophic. There are no
 			 * other processes running yet, so we can just exit.
 			 */
-			if (pmState == PM_STARTUP && !EXIT_STATUS_0(exitstatus))
+			if (pmState == PM_STARTUP &&
+StartupStatus != STARTUP_SIGNALED &&
+!EXIT_STATUS_0(exitstatus))
 			{
 LogChildExit(LOG, _("startup process"),
 			 pid, exitstatus);
@@ -2937,11 +2939,24 @@ reaper(SIGNAL_ARGS)
 			 * then we previously sent the startup process a SIGQUIT; so
 			 * that's probably the reason it died, and we do want to try to
 			 * restart in that case.
+			 *
+			 * This stanza also handles the case where we sent a SIGQUIT
+			 * during PM_STARTUP due to some dead_end child crashing: in that
+			 * situation, if the startup process dies on the SIGQUIT, we need
+			 * to transition to PM_WAIT_BACKENDS state which will allow
+			 * PostmasterStateMachine to restart the startup process.  (On the
+			 * other hand, the startup process might complete normally, if we
+			 * were too late with the SIGQUIT.  In that case we'll fall
+			 * through and commence normal operations.)
 			 */
 			if (!EXIT_STATUS_0(exitstatus))
 			{
 if (StartupStatus == STARTUP_SIGNALED)
+{
 	StartupStatus = STARTUP_NOT_RUNNING;
+	if (pmState == PM_STARTUP && Shutdown == NoShutdown)
+		pmState = PM_WAIT_BACKENDS;
+}
 else
 	StartupStatus = STARTUP_CRASHED;
 HandleChildCrash(pid, exitstatus,
@@ -2954,7 +2969,7 @@ reaper(SIGNAL_ARGS)
 			 */
 			StartupStatus = STARTUP_NOT_RUNNING;
 			FatalError = false;
-			Assert(AbortStartTime == 0);
+			AbortStartTime = 0;
 			ReachedNormalRunning = true;
 			pmState = PM_RUN;
 
@@ -3504,7 +3519,7 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 	if (pid == StartupPID)
 	{
 		StartupPID = 0;
-		StartupStatus = STARTUP_CRASHED;
+		/* Caller adjusts StartupStatus, so don't touch it here */
 	}
 	else if (StartupPID != 0 && take_action)
 	{
@@ -5100,7 +5115,7 @@ sigusr1_handler(SIGNAL_ARGS)
 	{
 		/* WAL redo has started. We're out of reinitialization. */
 		FatalError = false;
-		Assert(AbortStartTime == 0);
+		AbortStartTime = 0;
 
 		/*
 		 * Crank up the background tasks.  It doesn't matter if this fails,


Re: LLVM breakage on seawasp

2019-08-24 Thread Tom Lane
Andres Freund  writes:
> On August 24, 2019 1:08:11 PM PDT, Thomas Munro  
> wrote:
>> That's because they just moved to C++14 and replaced their own
>> llvm::make_unique<> with std::make_unique<>:
>> https://github.com/llvm-mirror/llvm/commit/114087caa6f95b526861c3af94b3093d9444c57b
>> Perhaps we'll need some macrology to select between llvm and std
>> versions?  I am guessing we can't decree that PostgreSQL's minimum C++
>> level is C++14 and simply change it to std::make_unique.

So we're depending on APIs that upstream doesn't think are stable?

regards, tom lane




Re: LLVM breakage on seawasp

2019-08-24 Thread Andres Freund
Hi,

On August 24, 2019 1:57:56 PM PDT, Tom Lane  wrote:
>Andres Freund  writes:
>> On August 24, 2019 1:08:11 PM PDT, Thomas Munro
> wrote:
>>> That's because they just moved to C++14 and replaced their own
>>> llvm::make_unique<> with std::make_unique<>:
>>>
>https://github.com/llvm-mirror/llvm/commit/114087caa6f95b526861c3af94b3093d9444c57b
>>> Perhaps we'll need some macrology to select between llvm and std
>>> versions?  I am guessing we can't decree that PostgreSQL's minimum
>C++
>>> level is C++14 and simply change it to std::make_unique.
>
>So we're depending on APIs that upstream doesn't think are stable?

 Seawasp iirc builds against the development branch of llvm, which explains why 
we see failures there. Does that address what you are concerned about? If not, 
could you expand?

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: LLVM breakage on seawasp

2019-08-24 Thread Thomas Munro
On Sun, Aug 25, 2019 at 8:24 AM Andres Freund  wrote:
> On August 24, 2019 1:08:11 PM PDT, Thomas Munro  
> wrote:
> >Perhaps we'll need some macrology to select between llvm and std
> >versions?  I am guessing we can't decree that PostgreSQL's minimum C++
> >level is C++14 and simply change it to std::make_unique.
>
> Perhaps just a
> #if new_enough
> using std::make_unique
> #else
> using  llvm::mak_eunique
>
> At the start of the file, and then use it unqualified?

Yeah, it's a pain though, you'd have to say:

#if llvm >= 9
#  if cpp >= 14
using std::make_unique;
#  else
#error "postgres needs at least c++ 14 to use llvm 9"
#  endif
#else
using llvm::make_unique;
#endif

Maybe we should just use std::unique_ptr's constructor, ie give it new
ImportMapTy() instead of using make_unique(), even though that's not
cool C++ these days?

-- 
Thomas Munro
https://enterprisedb.com




Re: LLVM breakage on seawasp

2019-08-24 Thread Tom Lane
Andres Freund  writes:
> On August 24, 2019 1:57:56 PM PDT, Tom Lane  wrote:
>> So we're depending on APIs that upstream doesn't think are stable?

>  Seawasp iirc builds against the development branch of llvm, which explains 
> why we see failures there. Does that address what you are concerned about? If 
> not, could you expand?

I know it's the development branch.  The question is whether this
breakage is something *they* ought to be fixing.  If not, I'm
worried that we're too much in bed with implementation details
of LLVM that we shouldn't be depending on.

regards, tom lane




Re: LLVM breakage on seawasp

2019-08-24 Thread Andres Freund
Hi,

On August 24, 2019 2:37:55 PM PDT, Tom Lane  wrote:
>Andres Freund  writes:
>> On August 24, 2019 1:57:56 PM PDT, Tom Lane 
>wrote:
>>> So we're depending on APIs that upstream doesn't think are stable?
>
>>  Seawasp iirc builds against the development branch of llvm, which
>explains why we see failures there. Does that address what you are
>concerned about? If not, could you expand?
>
>I know it's the development branch.  The question is whether this
>breakage is something *they* ought to be fixing.  If not, I'm
>worried that we're too much in bed with implementation details
>of LLVM that we shouldn't be depending on.

Don't think so - it's a C++ standard feature in the version of the standard 
LLVM is based on. So it's pretty reasonable for them to drop their older 
backwards-compatible function.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: LLVM breakage on seawasp

2019-08-24 Thread Andres Freund
Hi,

On 2019-08-25 09:15:35 +1200, Thomas Munro wrote:
> On Sun, Aug 25, 2019 at 8:24 AM Andres Freund  wrote:
> > On August 24, 2019 1:08:11 PM PDT, Thomas Munro  
> > wrote:
> > >Perhaps we'll need some macrology to select between llvm and std
> > >versions?  I am guessing we can't decree that PostgreSQL's minimum C++
> > >level is C++14 and simply change it to std::make_unique.
> >
> > Perhaps just a
> >
> > #if new_enough
> > using std::make_unique;
> > #else
> > using llvm::make_unique;
> > #endif
> >
> > at the start of the file, and then use it unqualified?
> 
> Yeah, it's a pain though, you'd have to say:
> 
> #if llvm >= 9
> #  if cpp >= 14
> using std::make_unique;
> #  else
> #error "postgres needs at least c++ 14 to use llvm 9"
> #  endif
> #else
> using llvm::make_unique;
> #endif

I don't think we'd really need the inner part, because you can't use
llvm 9 without a new enough compiler. There's plenty of make_unique use in
inline functions etc.


> Maybe we should just use std::unique_ptr's constructor, ie give it new
> ImportMayTy() instead of using make_unique(), even though that's not
> cool C++ these days?

Yea, wfm. do you want to make it so?

Greetings,

Andres Freund




Re: LLVM breakage on seawasp

2019-08-24 Thread Tom Lane
Andres Freund  writes:
> On August 24, 2019 2:37:55 PM PDT, Tom Lane  wrote:
>> I know it's the development branch.  The question is whether this
>> breakage is something *they* ought to be fixing.  If not, I'm
>> worried that we're too much in bed with implementation details
>> of LLVM that we shouldn't be depending on.

> Don't think so - it's a C++ standard feature in the version of the standard 
> LLVM is based on. So it's pretty reasonable for them to drop their older 
> backwards compatible function.

Whether it's reasonable or not doesn't really matter to my point.
We shouldn't be in the business of tracking multitudes of small
changes in LLVM, no matter whether they're individually "reasonable".
The more often this happens, the more concerned I am that we chose
the wrong semantic level to interface at.

regards, tom lane




Re: Optimize single tuple fetch from nbtree index

2019-08-24 Thread Floris Van Nee

> Hello,
> welcome to hackers with your first patch)

Thank you.

> Though, I got interested in the comment inconsistency you have found.
> I added a debug message into this code branch of the patch and was able to
> see it in regression.diffs after 'make check'.
> Speaking of your patch, it seems that the buffer was unpinned and pinned
> again between two reads, and the condition of holding it continuously has
> not been met.

May I ask what makes you conclude that the condition of holding the pin 
continuously has not been met?
Your reply encouraged me to dig a little bit more into this today. First, I 
wanted to check if indeed the pin was continuously held by the backend or not. 
I added some debug info to ReleaseBuffer for this: it turned out that the pin 
on the buffer was definitely never released by the backend between the calls to 
_bt_first and _bt_next. So the buffer got compacted while the backend held a 
pin on it.
After some more searching I found the following code: _bt_vacuum_one_page in 
nbtinsert.c
This function compacts one single page without taking a super-exclusive lock. 
It is used during inserts to make room on a page. I verified that if I comment 
out the calls to this function, the compacting never happens while I have a pin 
on the buffer.
So I guess that answers my own question: cleaning up garbage during inserts is 
one of the cases where compacting may happen even while other backends hold a 
pin on the buffer. Perhaps this should also be more clearly phrased in the 
comments in e.g. _bt_killitems, because currently those comments make it look 
like this case never occurs.

> While reading the code to answer your question, I noticed that
> BTScanPosIsPinned macro name is misleading.
> It calls BufferIsValid(), not BufferIsPinned() as one could expect.
> And BufferIsValid in bufmgr.h comment explicitly states that it
> shouldn't be confused with BufferIsPinned.
> The same goes for BTScanPosUnpinIfPinned().

I agree the name is misleading. It clearly does something other than what its 
name suggests. However, I don't believe this introduces problems in these 
particular pieces of code, as long as the macros are always used. 
BTScanPosIsPinned actually checks whether the buffer is valid, not necessarily 
whether it's pinned, as you mentioned. However, any time the buffer gets 
unpinned using the macro BTScanPosUnpin, it is also set to InvalidBuffer by 
the macro. Therefore, any subsequent call to BTScanPosIsPinned should indeed 
return false. It'd definitely be nice if this got clarified in the comments, 
though.
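
For reference, a condensed sketch of the macros in question (paraphrased
from src/include/access/nbtree.h, assuming the usual buffer-manager
declarations; not a verbatim copy):

/*
 * "IsPinned" really tests buffer validity; that stays equivalent to
 * "pinned" only because BTScanPosUnpin also resets buf to InvalidBuffer.
 */
#define BTScanPosIsPinned(scanpos) \
	BufferIsValid((scanpos).buf)

#define BTScanPosUnpin(scanpos) \
	do { \
		ReleaseBuffer((scanpos).buf); \
		(scanpos).buf = InvalidBuffer; \
	} while (0)

#define BTScanPosUnpinIfPinned(scanpos) \
	do { \
		if (BTScanPosIsPinned(scanpos)) \
			BTScanPosUnpin(scanpos); \
	} while (0)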

-Floris




pg11.5: ExecHashJoinNewBatch: glibc detected...double free or corruption (!prev)

2019-08-24 Thread Justin Pryzby
Core was generated by `postgres: telsasoft ts 10.100.2.162(33500) SELECT '.
Program terminated with signal 6, Aborted.
#0  0x0039ff6325e5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
Missing separate debuginfos, use: debuginfo-install 
audit-libs-2.4.5-3.el6.x86_64 cyrus-sasl-lib-2.1.23-15.el6_6.2.x86_64 
geos36-3.6.3-1.rhel6.1.x86_64 json-c-0.11-12.el6.x86_64 
keyutils-libs-1.4-5.el6.x86_64 libcom_err-1.41.12-22.el6.x86_64 
libgcc-4.4.7-17.el6.x86_64 libicu-4.2.1-14.el6.x86_64 
libselinux-2.0.94-7.el6.x86_64 libstdc++-4.4.7-17.el6.x86_64 
nspr-4.11.0-1.el6.x86_64 nss-3.21.0-8.el6.x86_64 
nss-softokn-freebl-3.14.3-23.3.el6_8.x86_64 nss-util-3.21.0-2.el6.x86_64 
postgis24_11-2.4.5-1.rhel6.1.x86_64 proj49-4.9.3-3.rhel6.1.x86_64 
zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x0039ff6325e5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0039ff633dc5 in abort () at abort.c:92
#2  0x0039ff6704f7 in __libc_message (do_abort=2, fmt=0x39ff758a60 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x0039ff675f3e in malloc_printerr (action=3, str=0x39ff758df0 "double free or corruption (!prev)", ptr=<optimized out>, ar_ptr=<optimized out>) at malloc.c:6360
#4  0x0039ff678dd0 in _int_free (av=0x39ff98e120, p=0x1d40b090, have_lock=0) at malloc.c:4846
#5  0x006269e5 in ExecHashJoinNewBatch (pstate=0x2771218) at nodeHashjoin.c:1058
#6  ExecHashJoinImpl (pstate=0x2771218) at nodeHashjoin.c:539
#7  ExecHashJoin (pstate=0x2771218) at nodeHashjoin.c:565
#8  0x006131d0 in ExecProcNodeInstr (node=0x2771218) at execProcnode.c:461
#9  0x00633286 in ExecProcNode (pstate=0x2771108) at ../../../src/include/executor/executor.h:247
#10 ExecSort (pstate=0x2771108) at nodeSort.c:107
#11 0x006131d0 in ExecProcNodeInstr (node=0x2771108) at execProcnode.c:461
#12 0x0061d51e in ExecProcNode (aggstate=0x2770d30) at ../../../src/include/executor/executor.h:247
#13 fetch_input_tuple (aggstate=0x2770d30) at nodeAgg.c:406
#14 0x0061ee70 in agg_retrieve_direct (pstate=0x2770d30) at nodeAgg.c:1755
#15 ExecAgg (pstate=0x2770d30) at nodeAgg.c:1570
#16 0x006131d0 in ExecProcNodeInstr (node=0x2770d30) at execProcnode.c:461
#17 0x0060f4b7 in ExecProcNode (queryDesc=0x2940950, direction=<optimized out>, count=0, execute_once=48) at ../../../src/include/executor/executor.h:247
#18 ExecutePlan (queryDesc=0x2940950, direction=<optimized out>, count=0, execute_once=48) at execMain.c:1723
#19 standard_ExecutorRun (queryDesc=0x2940950, direction=<optimized out>, count=0, execute_once=48) at execMain.c:364
#20 0x7ff08f74f618 in pgss_ExecutorRun (queryDesc=0x2940950, direction=ForwardScanDirection, count=0, execute_once=true) at pg_stat_statements.c:892
#21 0x7ff08f2cc81d in explain_ExecutorRun (queryDesc=0x2940950, direction=ForwardScanDirection, count=0, execute_once=true) at auto_explain.c:281
#22 0x0075444b in PortalRunSelect (portal=0x2571f00, forward=<optimized out>, count=0, dest=<optimized out>) at pquery.c:932
#23 0x00755631 in PortalRun (portal=0x2571f00, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x928e7a8, altdest=0x928e7a8, completionTag=0x7ffcaa940670 "") at pquery.c:773
#24 0x00752267 in exec_simple_query (query_string=0x2753e08 "--BEGIN SQL\nSELECT * FROM\n\n(SELECT site_office AS site_gran,\n \tsite_location AS cell,\n\tdisplay_site_name (sect_name, site_name, sect_mscid) AS sitename,\n\tperiod, data_cell.bh_day,\n\n", '-' , "Ac"...) at postgres.c:1145
#25 0x00753211 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=0x252b1f8 "ts", username=<optimized out>) at postgres.c:4182
#26 0x006e2587 in BackendRun (argc=<optimized out>, argv=<optimized out>) at postmaster.c:4358
#27 BackendStartup (argc=<optimized out>, argv=<optimized out>) at postmaster.c:4030
#28 ServerLoop (argc=<optimized out>, argv=<optimized out>) at postmaster.c:1707
#29 PostmasterMain (argc=<optimized out>, argv=<optimized out>) at postmaster.c:1380
#30 0x00656a80 in main (argc=3, argv=0x24f7570) at main.c:228

I found this report from 9.6 last month; copying those parties.
https://www.postgresql.org/message-id/CAHyXU0xzwzgnUTyK62ZkG0_1CQsBHwnTVT2TSX7iwTEv1ve9ag%40mail.gmail.com

Merlin: what OS are you running on and what postgres package, or was it
compiled locally ?

We're running:
postgresql11-server-11.5-1PGDG.rhel6.x86_64
CentOS release 6.8
glibc-2.12-1.192.el6.x86_64
linux 2.6.32-754.3.5.el6.x86_64
DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, 
BIOS 6.00 09/21/2015
Model name:Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz

I see this is available but not installed: 2.12-1.212.el6_10.3

The VM has no known issues.

ts=# SELECT * FROM pg_config();
..
| CONFIGURE | '--enable-rpath' '--prefix=/usr/pgsql-11' 
'--includedir=/usr/pgsql-11/include' '--mandir=/usr/pgsql-11/share/man' 
'--datadir=/usr/pgsql-11/share' '--libdir=/usr/pgsql-11/lib' '--with-icu' 
'--with-perl' '--with-python' '--with-tcl' '--with-tclconfig=/usr/lib64' 
'--with-openssl

Re: LLVM breakage on seawasp

2019-08-24 Thread Thomas Munro
On Sun, Aug 25, 2019 at 9:46 AM Andres Freund  wrote:
> > Maybe we should just use std::unique_ptr's constructor, ie give it new
> > ImportMapTy() instead of using make_unique(), even though that's not
> > cool C++ these days?
>
> Yea, wfm. do you want to make it so?

Done.

-- 
Thomas Munro
https://enterprisedb.com




Re: pg11.5: ExecHashJoinNewBatch: glibc detected...double free or corruption (!prev)

2019-08-24 Thread Peter Geoghegan
On Sat, Aug 24, 2019 at 7:25 PM Justin Pryzby  wrote:
> Merlin: what OS are you running on and what postgres package, or was it
> compiled locally ?

I was reminded of this issue from last year, which also appeared to
involve BufFileClose() and a double-free:

https://postgr.es/m/87y3hmee19@news-spur.riddles.org.uk

That was a BufFile that was under the control of a tuplestore, so it
was similar to but different from your case. I suspect it's related.

-- 
Peter Geoghegan