Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

2020-07-26 Thread Bharath Rupireddy
On Fri, Jul 24, 2020 at 8:07 PM Robert Haas  wrote:
>
> On Fri, Jul 24, 2020 at 7:10 AM Bharath Rupireddy
>  wrote:
> > I looked at what llvm_shutdown() actually does: it frees up the JIT stacks,
> > and also any perf-related resources if they exist, using LLVMOrcDisposeInstance()
> > and LLVMOrcUnregisterPerf(); these were dynamically allocated in
> > llvm_session_initialize() through the JIT library function
> > LLVMOrcCreateInstance() [1].
> >
> > It looks like there is no problem in moving llvm_shutdown() to either
> > on_shmem_exit() or on_proc_exit().
>
> If it doesn't involve shared memory, I guess it can be on_proc_exit()
> rather than on_shmem_exit().
>
> I guess the other question is why we're doing it at all. What
> resources are being allocated that wouldn't be freed up by process
> exit anyway?
>

LLVMOrcCreateInstance() and LLVMOrcDisposeInstance() do new and delete
respectively; I found these functions at the link [1]. But I don't know for
sure whether any other resources are being allocated that wouldn't be freed
up by proc_exit(). Tagging Andres Freund for input on whether there is any
problem with making llvm_shutdown() an on_proc_exit() callback instead of a
before_shmem_exit() callback.
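
To make that concrete, here is a minimal sketch of the change (untested, just
to illustrate what is being proposed); llvm_shutdown() already has the
pg_on_exit_callback signature, so only the registration call in
llvm_session_initialize() would change:

/*
 * Rough sketch (untested): register the JIT cleanup with the proc-exit
 * hook instead of the shmem-exit hook.  llvm_shutdown(int code, Datum arg)
 * matches the pg_on_exit_callback signature expected by both hooks, so no
 * other change should be needed.
 */
on_proc_exit(llvm_shutdown, 0);    /* was: before_shmem_exit(llvm_shutdown, 0); */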

As suggested in the previous mails, we wanted to make it an on_proc_exit()
callback to avoid the abort issue reported in this thread. However, if we
take the abort-issue fix commit 303640199d0436c5e7acdf50b837a027b5726594, as
mentioned in the previous response [2], the change may not be strictly
necessary right now. Still, to be safer and to avoid similar issues in the
future, we can consider this change as well.

[1] - https://llvm.org/doxygen/OrcCBindings_8cpp_source.html

LLVMOrcJITStackRef LLVMOrcCreateInstance(LLVMTargetMachineRef TM) {
  TargetMachine *TM2(unwrap(TM));
  Triple T(TM2->getTargetTriple());
  auto IndirectStubsMgrBuilder =
      orc::createLocalIndirectStubsManagerBuilder(T);
  OrcCBindingsStack *JITStack =
      new OrcCBindingsStack(*TM2, std::move(IndirectStubsMgrBuilder));
  return wrap(JITStack);
}

LLVMErrorRef LLVMOrcDisposeInstance(LLVMOrcJITStackRef JITStack) {
  auto *J = unwrap(JITStack);
  auto Err = J->shutdown();
  delete J;
  return wrap(std::move(Err));
}

[2] -
https://www.postgresql.org/message-id/CALj2ACVwOKZ8qYUsZrU2y2efnYZOLRxPC6k52FQcB3oriH9Kcg%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Re: autovac issue with large number of tables

2020-07-26 Thread Masahiko Sawada
On Mon, 27 Jul 2020 at 06:43, Nasby, Jim  wrote:
>
> A database with a very large number of  tables eligible for autovacuum can 
> result in autovacuum workers “stuck” in a tight loop of 
> table_recheck_autovac() constantly reporting nothing to do on the table. This 
> is because a database with a very large number of tables means it takes a 
> while to search the statistics hash to verify that the table still needs to 
> be processed[1]. If a worker spends some time processing a table, when it’s 
> done it can spend a significant amount of time rechecking each table that it 
> identified at launch (I’ve seen a worker in this state for over an hour). A 
> simple work-around in this scenario is to kill the worker; the launcher will 
> quickly fire up a new worker on the same database, and that worker will build 
> a new list of tables.
>
>
>
> That’s not a complete solution though… if the database contains a large 
> number of very small tables you can end up in a state where 1 or 2 workers are 
> busy chugging through those small tables so quickly that any additional 
> workers spend all their time in table_recheck_autovac(), because that takes 
> long enough that the additional workers are never able to “leapfrog” the 
> workers that are doing useful work.
>

As another solution, I've been considering adding a queue of table
OIDs that need to be vacuumed/analyzed to shared memory (i.e. on
DSA). Since all autovacuum workers running on the same database can
see a consistent queue, the issue explained above won't happen, and it
would probably also make it easier to implement prioritization of the
tables being vacuumed, which is sometimes discussed on pgsql-hackers.
I guess it might be worth discussing this idea.
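
To sketch the idea (purely illustrative; the names below are made up and
nothing here exists in the current autovacuum code), the shared structure
could be as simple as an OID queue protected by an LWLock:

/*
 * Hypothetical sketch only: a shared queue of relations pending
 * vacuum/analyze, kept in dynamic shared memory so that every autovacuum
 * worker on the database pops from the same list instead of re-checking
 * its own private list built at launch time.
 */
typedef struct AutoVacuumSharedQueue
{
    LWLock      lock;           /* protects next_item and nitems */
    uint32      next_item;      /* index of the next entry to hand out */
    uint32      nitems;         /* number of valid entries in tables[] */
    Oid         tables[FLEXIBLE_ARRAY_MEMBER];  /* relation OIDs to process */
} AutoVacuumSharedQueue;

A worker would take the lock, pop tables[next_item++], release the lock, and
then call table_recheck_autovac() only for that single entry, so the repeated
rechecking of every table described above goes away.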

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Reigning in ExecParallelHashRepartitionFirst

2020-07-26 Thread Thomas Munro
On Thu, Jul 9, 2020 at 8:17 AM Melanie Plageman
 wrote:
> Last week as I was working on adaptive hash join [1] and trying to get
> parallel adaptive hash join batch 0 to spill correctly, I noticed what
> seemed like a problem with the code to repartition batch 0.
>
> If we run out of space while inserting tuples into the hashtable during
> the build phase of parallel hash join and proceed to increase the number
> of batches, we need to repartition all of the tuples from the old
> generation (when nbatch was x) and move them to their new homes in the
> new generation (when nbatch is 2x). Before we do this repartitioning we
> disable growth in the number of batches.
>
> Then we repartition the tuples from the hashtable, inserting them either
> back into the hashtable or into a batch file. While inserting them into
> the hashtable, we call ExecParallelHashTupleAlloc(), and, if there is no
> space for the current tuple in the current chunk and growth in the
> number of batches is disabled, we go ahead and allocate a new chunk of
> memory -- regardless of whether or not we will exceed the space limit.

Hmm.  It shouldn't really be possible for
ExecParallelHashRepartitionFirst() to run out of memory anyway,
considering that the input of that operation previously fit (just... I
mean we started repartitioning because one more chunk would have
pushed us over the edge, but the tuples so far fit, and we'll insert
them in the same order for each input chunk, possibly filtering some
out).  Perhaps you reached this condition because
batches[0].shared->size finishes up accounting for the memory used by
the bucket array in PHJ_GROW_BUCKETS_ELECTING, but didn't originally
account for it in generation 0, so what previously appeared to fit no
longer does :-(.  I'll look into that.




Re: Global snapshots

2020-07-26 Thread Andrey V. Lepikhov

On 7/27/20 11:22 AM, tsunakawa.ta...@fujitsu.com wrote:

> Hi Andrey san, Movead san,
> 
> 
> From: tsunakawa.ta...@fujitsu.com
> > While Clock-SI seems to be considered the most promising approach to
> > global serializability here,
> > 
> > * Why does Clock-SI get so much attention?  How did Clock-SI become the
> > only choice?
> > 
> > * Clock-SI was devised at Microsoft Research.  Does Microsoft or some other
> > organization use Clock-SI?
> 
> Could you take a look at this patent?  I'm afraid this is the Clock-SI for
> MVCC.  Microsoft holds this until 2031.  I couldn't find this with the
> keyword "Clock-SI".
> 
> 
> US8356007B2 - Distributed transaction management for database systems with
> multiversioning - Google Patents
> https://patents.google.com/patent/US8356007
> 
> 
> If it is, can we circumvent this patent?
> 
> 
> Regards
> Takayuki Tsunakawa




Thank you for the research (and previous links too).
I haven't seen this patent before. This should be carefully studied.

--
regards,
Andrey Lepikhov
Postgres Professional




RE: Global snapshots

2020-07-26 Thread tsunakawa.ta...@fujitsu.com
Hi Andrey san, Movead san,


From: tsunakawa.ta...@fujitsu.com 
> While Clock-SI seems to be considered the most promising approach to global
> serializability here,
> 
> * Why does Clock-SI get so much attention?  How did Clock-SI become the
> only choice?
> 
> * Clock-SI was devised at Microsoft Research.  Does Microsoft or some other
> organization use Clock-SI?

Could you take a look at this patent?  I'm afraid this is the Clock-SI for 
MVCC.  Microsoft holds this until 2031.  I couldn't find this with the keyword 
"Clock-SI".


US8356007B2 - Distributed transaction management for database systems with 
multiversioning - Google Patents
https://patents.google.com/patent/US8356007


If it is, can we circumvent this patent?


Regards
Takayuki Tsunakawa




Re: proposal: possibility to read dumped table's name from file

2020-07-26 Thread Pavel Stehule
On Sun, Jul 26, 2020 at 9:10 PM Justin Pryzby wrote:

> On Sat, Jul 25, 2020 at 06:56:31PM +0530, vignesh C wrote:
> > On Tue, Jul 14, 2020 at 12:03 PM Pavel Stehule 
> wrote:
> > >> I meant can this:
> > >> printf(_("  --filter=FILENAME        read object name filter
> > >> expressions from file\n"));
> > >> be changed to:
> > >> printf(_("  --filter=FILENAME        dump objects and data based
> > >> on the filter expressions from the filter file\n"));
> > >
> > > done in today patch
> >
> > Thanks for fixing the  comments.
> > Few comments:
> > + /* use "-" as symbol for stdin */
> > + if (strcmp(filename, "-") != 0)
> > + {
> > + fp = fopen(filename, "r");
> > + if (!fp)
> > + fatal("could not open the input file \"%s\": %m",
> > +   filename);
> > + }
> > + else
> > + fp = stdin;
> >
> > We could use STDIN itself instead of -, it will be a more easier
> > option to understand.
>
> I think "-" is used widely for commandline tools, and STDIN is not (even
> though
> it's commonly used by programmers).  For example, since last year,
> pg_restore
> -f - means stdout.
>

Yes, STDIN is used in programming languages, but it is not usual in
command-line tools. And because "-" is already used by pg_restore, we should
not introduce a new inconsistency.

Regards

Pavel


> --
> Justin
>


Re: proposal: possibility to read dumped table's name from file

2020-07-26 Thread Pavel Stehule
On Sat, Jul 25, 2020 at 3:26 PM vignesh C wrote:

> On Tue, Jul 14, 2020 at 12:03 PM Pavel Stehule 
> wrote:
> >> I meant can this:
> >> printf(_("  --filter=FILENAME        read object name filter
> >> expressions from file\n"));
> >> be changed to:
> >> printf(_("  --filter=FILENAME        dump objects and data based
> >> on the filter expressions from the filter file\n"));
> >
> > done in today patch
> >
>
> Thanks for fixing the  comments.
> Few comments:
> + /* use "-" as symbol for stdin */
> + if (strcmp(filename, "-") != 0)
> + {
> + fp = fopen(filename, "r");
> + if (!fp)
> + fatal("could not open the input file \"%s\": %m",
> +   filename);
> + }
> + else
> + fp = stdin;
>
> We could use STDIN itself instead of -, it will be a more easier
> option to understand.
>
> + /* when first char is hash, ignore whole line */
> + if (*line == '#')
> + continue;
>
> If line starts with # we ignore that line, I feel this should be
> included in the documentation.
>


Good note - I added a sentence to the docs:

+   
+Lines starting with the # symbol are ignored.
+Leading white space (spaces, tabs) is not allowed. These
+lines can be used for comments and notes.
+   
+


> Regards,
> Vignesh
> EnterpriseDB: http://www.enterprisedb.com
>
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 7a37fd8045..0e0fbc90db 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -755,6 +755,56 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Read object filters from the specified file.
+If you use "-" as a filename, the filters are read from stdin.
+The lines of this file must have the following format:
+
+(+|-)[tnfd] objectname
+
+   
+
+   
+The first character specifies whether the object is to be included
+(+) or excluded (-), and the
+second character specifies the type of object to be filtered:
+t (table),
+n (schema),
+f (foreign table),
+d (table data).
+   
+
+   
+With the following filter file, the dump would include table
+mytable1 and data from foreign table
+some_foreign_table, but exclude data
+from table mytable2.
+
++t mytable1
++f some_foreign_table
+-d mytable2
+
+   
+
+   
+Lines starting with the # symbol are ignored.
+Leading white space (spaces, tabs) is not allowed. These
+lines can be used for comments and notes.
+   
+
+   
+The --filter option works just like the other
+options to include or exclude tables, schemas, table data, or foreign
+tables, and both forms may be combined.  Note that there are no options
+to exclude a specific foreign table or to include a specific table's
+data.
+   
+  
+ 
+
  
   --if-exists
   
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 94459b3539..8df1f91cb6 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -290,6 +290,7 @@ static void appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 static char *get_synchronized_snapshot(Archive *fout);
 static void setupDumpWorker(Archive *AHX);
 static TableInfo *getRootTableInfo(TableInfo *tbinfo);
+static void read_patterns_from_file(char *filename, DumpOptions *dopt);
 
 
 int
@@ -364,6 +365,7 @@ main(int argc, char **argv)
 		{"enable-row-security", no_argument, &dopt.enable_row_security, 1},
 		{"exclude-table-data", required_argument, NULL, 4},
 		{"extra-float-digits", required_argument, NULL, 8},
+		{"filter", required_argument, NULL, 12},
 		{"if-exists", no_argument, &dopt.if_exists, 1},
 		{"inserts", no_argument, NULL, 9},
 		{"lock-wait-timeout", required_argument, NULL, 2},
@@ -603,6 +605,10 @@ main(int argc, char **argv)
 		  optarg);
 break;
 
+			case 12:			/* filter implementation */
+read_patterns_from_file(optarg, &dopt);
+break;
+
 			default:
 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 exit_nicely(1);
@@ -1022,6 +1028,8 @@ help(const char *progname)
 			 "   access to)\n"));
 	printf(_("  --exclude-table-data=PATTERN do NOT dump data for the specified table(s)\n"));
 	printf(_("  --extra-float-digits=NUM override default setting for extra_float_digits\n"));
+	printf(_("  --filter=FILENAMEdump objects and data based on the filter expressions\n"
+			 "   from the filter file\n"));
 	printf(_("  --if-exists  use IF EXISTS when dropping objects\n"));
 	printf(_("  --include-foreign-data=PATTERN\n"
 			 "   include data of foreign tables on foreign\n"
@@ -18434,3 +18442,211 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse reloptions array")

Re: Parallel worker hangs while handling errors.

2020-07-26 Thread Bharath Rupireddy
On Sat, Jul 25, 2020 at 7:02 AM vignesh C  wrote:
>
> I have made slight changes on top of the patch to remove duplicate
> code, attached v3 patch for the same.
> The parallel worker hang issue gets resolved, make check & make
> check-world passes.
>

Having a function to unblock selective signals for a bg worker looks good to me.

Few comments:
1. Do we need "worker" as a function argument in
update_parallel_worker_sigmask(BackgroundWorker *worker, ...)? Since
MyBgworkerEntry is a global variable, can't we have a local variable
instead?
2. Instead of update_parallel_worker_sigmask() serving only parallel
workers, can we make it generic, so that any bgworker can block or
unblock a given signal? There's no current use case for a bgworker
unblocking a single signal other than a parallel worker doing it for
SIGUSR1 for this hang issue, but note that we already have
BackgroundWorkerBlockSignals() and BackgroundWorkerUnblockSignals().
I slightly modified your function, something like below:

void
BackgroundWorkerUpdateSignalMask(int signum, bool toblock)
{
    if (toblock)
        sigaddset(&BlockSig, signum);
    else
        sigdelset(&BlockSig, signum);

    PG_SETMASK(&BlockSig);
}

/* to unblock SIGUSR1 */
if ((worker->bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
    BackgroundWorkerUpdateSignalMask(SIGUSR1, false);

/* to block SIGUSR1 */
if ((worker->bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
    BackgroundWorkerUpdateSignalMask(SIGUSR1, true);

If that looks okay, please note that we might also have to document the
BackgroundWorkerUpdateSignalMask() function in bgworker.sgml.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-26 Thread Ashutosh Sharma
Hi,

Thanks for sharing your thoughts. Please find my comments inline below:


>
> I think here we should report that we haven't done what was asked.
> +   /* Nothing to do if the itemid is unused or
> already dead. */
> +   if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
> +   continue;
>
>
Okay. Will add a log message saying "skipping tid ... because ..."


> Also, should we try to fix VM along the way?
>

I think we should let VACUUM do that.


> Are there any caveats with concurrent VACUUM? (I do not see any, just
> asking)
>

As of now, I don't see any.


> It would be good to have some checks for interrupts in safe places.
>

I think I have already added those wherever I felt they were required. If you
feel one is missing somewhere, it would be good if you could point it out.


> I think we should not trust the user entirely here. I'd prefer validation and
> a graceful exit, not a core dump.
> +   Assert(noffs <= PageGetMaxOffsetNumber(page));
>
>
Yeah, sounds reasonable. Will do that.


> For some reason we had unlogged versions of these functions, but I do not
> recall the exact rationale.
> Also, I'd be happy if we had something like "restore this tuple iff this
> does not break a unique constraint". To do so we need to sort the tids by
> xmin/xmax, to revive the most recent data first.
>

I didn't get this point. Could you please elaborate?

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


Re: new heapcheck contrib module

2020-07-26 Thread Amul Sul
On Tue, Jul 21, 2020 at 2:32 AM Mark Dilger
 wrote:
> []
> >
> > +   StaticAssertStmt(InvalidOffsetNumber + 1 == 
> > FirstOffsetNumber,
> > +"InvalidOffsetNumber
> > increments to FirstOffsetNumber");
> >
> > If you are going to rely on this property, I agree that it is good to
> > check it. But it would be better to NOT rely on this property, and I
> > suspect the code can be written quite cleanly without relying on it.
> > And actually, that's what you did, because you first set ctx.offnum =
> > InvalidOffsetNumber but then just after that you set ctx.offnum = 0 in
> > the loop initializer. So AFAICS the first initializer, and the static
> > assert, are pointless.
>
> Ah, right you are.  Removed.
>

I can see the same assert and the unnecessary assignment in v12-0002; is that
the same thing that was supposed to be removed, or am I missing something?

> []
> > +confess(HeapCheckContext * ctx, char *msg)
> > +TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx)
> > +check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx)
> >
> > This is what happens when you pgindent without adding all the right
> > things to typedefs.list first ... or when you don't pgindent and have
> > odd ideas about how to indent things.
>
> Hmm.  I don't see the three lines of code you are quoting.  Which patch is 
> that from?
>

I think it was the same thing, related to my previous suggestion to list the
new structures in typedefs.list.  V12 has listed the new structures, but I
think there are still some more adjustments needed in the code, e.g. the space
between HeapCheckContext and the * (asterisk) needs to be fixed. I am not sure
whether pgindent will do that or not.

Here are a few more minor comments for the v12-0002 patch & some of them
apply to other patches as well:

 #include "utils/snapmgr.h"
-
+#include "amcheck.h"

This doesn't seem to be in the correct place -- it needs to be in sorted order.


+ if (!PG_ARGISNULL(3))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("starting block " INT64_FORMAT
+ " is out of bounds for relation with no blocks",
+ PG_GETARG_INT64(3))));
+ if (!PG_ARGISNULL(4))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("ending block " INT64_FORMAT
+ " is out of bounds for relation with no blocks",
+ PG_GETARG_INT64(4))));

I think these errmsg() strings should also be on a single line.


+ if (fatal)
+ {
+ if (ctx.toast_indexes)
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ if (ctx.toastrel)
+ table_close(ctx.toastrel, ShareUpdateExclusiveLock);

The block that closes the toast indexes and rel here is not in the same style
as the one at the end of verify_heapam().


+ /* If we get this far, we know the relation has at least one block */
+ startblock = PG_ARGISNULL(3) ? 0 : PG_GETARG_INT64(3);
+ endblock = PG_ARGISNULL(4) ? ((int64) ctx.nblocks) - 1 : PG_GETARG_INT64(4);
+ if (startblock < 0 || endblock >= ctx.nblocks || startblock > endblock)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("block range " INT64_FORMAT " .. " INT64_FORMAT
+ " is out of bounds for relation with block count %u",
+ startblock, endblock, ctx.nblocks)));
+
...
...
+ if (startblock < 0)
+ startblock = 0;
+ if (endblock < 0 || endblock > ctx.nblocks)
+ endblock = ctx.nblocks;

Other than the endblock < 0 case, do we really need that?  I think, due to the
above error check, the rest of the cases will never reach this place.


+ confess(ctx, psprintf(
+   "tuple xmax %u follows last assigned xid %u",
+   xmax, ctx->nextKnownValidXid));
+ fatal = true;
+ }
+ }
+
+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+ confess(ctx,
+ psprintf("tuple's header size is %u bytes which is less than the %u
byte minimum valid header size",
+ ctx->tuphdr->t_hoff,
+ (unsigned) SizeofHeapTupleHeader));

The confess() calls use two different code styles: in the first, psprintf()'s
only argument gets its own line, and in the second, psprintf() is on its own
line together with the argument. I think the second style is the one we follow
and is correct, not the former.


+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a heap",
+ RelationGetRelationName(rel))));

Like elsewhere, can we have the errmsg be "only heap AM is supported" and the
error code be ERRCODE_FEATURE_NOT_SUPPORTED?


That's all for now; apologies for the multiple review emails.

Regards,
Amul




Re: Loaded footgun open_datasync on Windows

2020-07-26 Thread Michael Paquier
On Thu, Jul 23, 2020 at 01:05:04PM -0400, Jeff Janes wrote:
> I have noticed this before, but since it wasn't a production machine I just
> shrugged it off as being a hazard of using consumer-grade stuff; it didn't
> seem to be worth investigating further.

The most direct and non-invasive way to address this problem in
back-branches would be to copy the non-concurrent logic in
src/port/open.c directly into pg_test_fsync.c.  We have done such things in
the past, for example with restricted tokens on Windows for pg_upgrade (see
fa1e5afa), but that was much more important.  I am not sure that it is worth
the time spent here though, as it took roughly 10 years to notice the problem
(and I don't really count the time when the tool was under src/tools/, since
that did not build the tool on Windows).
--
Michael




Re: Allows Extend Protocol support CURSOR_OPT_HOLD with prepared stmt.

2020-07-26 Thread Andy Fan
>
>
> 2. Alternatively, I want to add a new GUC parameter: if it is set to true,
> the server will create a holdable portal, otherwise nothing changes.  Then
> let the user set it to true in the above case and reset it to false
> afterward.  Is there any issue with this method?
>
>
I forgot to say that in this case the user has to drop the holdable
portal explicitly.


-- 
Best Regards
Andy Fan


Allows Extend Protocol support CURSOR_OPT_HOLD with prepared stmt.

2020-07-26 Thread Andy Fan
I have a use case like this:

rs = prepared_stmt.execute(1);
while(rs.next())
{
// do something with the result and commit the transaction.
conn.commit();
}

The driver uses the extended protocol in this case. It works like this:
1). Parse -> PreparedStmt.
2). Bind -> bind the prepared stmt to a Portal; there is no chance to set
the CURSOR_OPT_HOLD option.
3). Execute.
4). Commit -- the portal is dropped at this stage.
5). When fetching the next batch of results, we get the error
"Portal doesn't exist".

There are several methods to work around this, but none of them is perfect.
1. Run the prepared stmt in a dedicated connection.  (The number of
connections will be doubled.)
2. Use a WITH HOLD cursor.  It doesn't support bind parameters, so we have
to create a cursor for each distinct id.
3. Don't commit the transaction.  -- a long transaction with many rows locked.

I have several questions about this case:
1. How about adding cursorOptions information to the Bind protocol message,
so that we can set portal->cursorOptions accordingly?  If so, how do we stay
compatible with existing drivers?
2. Alternatively, I want to add a new GUC parameter: if it is set to true,
the server will create a holdable portal, otherwise nothing changes.  Then
let the user set it to true in the above case and reset it to false
afterward.  Is there any issue with this method?
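
To illustrate question 2 (everything here is hypothetical and untested,
including the GUC name), the server-side change could be as small as marking
the portal holdable when it is created in exec_bind_message(), so that the
commit in step 4 persists the portal instead of dropping it:

/* hypothetical GUC: keep extended-protocol portals across commit */
bool        hold_bound_portals = false;

/*
 * Sketch for exec_bind_message(): after the portal has been created, mark
 * it holdable; PreCommit_Portals() would then persist it at commit rather
 * than dropping it.  (Untested; the compatibility and explicit-drop
 * questions above still apply.)
 */
if (hold_bound_portals)
    portal->cursorOptions |= CURSOR_OPT_HOLD;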

-- 
Best Regards
Andy Fan


Re: INSERT INTO SELECT, Why Parallelism is not selected?

2020-07-26 Thread Amit Kapila
On Sun, Jul 26, 2020 at 4:54 PM Amit Kapila  wrote:
>
> On Sat, Jul 25, 2020 at 8:42 PM Tom Lane  wrote:
> >
>
> No, "git diff --check" doesn't help.  I have tried pgindent but that
> also doesn't help neither was I expecting it to help.  I am still not
> able to figure out how I goofed up this but will spend some more time
> on this.
>

I think I have figured out how the patch ended up having a non-breaking
space.  Some editors add one when we use two spaces after a period (.), and I
could see that my Gmail client does that for such a pattern.  Normally, I use
one of MSVC, vi, or the NetBeans IDE to develop code/patches, but sometimes I
check comments by pasting them into the Gmail client to find typos and such
and then fix them manually.  I guess in this case I wrote this comment in the
Gmail client and then copy/pasted it into the patch.  I will be careful not
to do this in the future.

-- 
With Regards,
Amit Kapila.




Re: HashAgg's batching counter starts at 0, but Hash's starts at 1.

2020-07-26 Thread Justin Pryzby
On Mon, Jul 27, 2020 at 10:48:45AM +1200, David Rowley wrote:
> On Wed, 1 Jul 2020 at 18:46, Jeff Davis  wrote:
> >
> > On Tue, Jun 30, 2020, 7:04 PM David Rowley  wrote:
> >>
> >> Does anyone have any objections to that being changed?
> >
> > That's OK with me. By the way, I'm on vacation and will catch up on these 
> > HashAgg threads next week.
> 
> (Adding Justin as I know he's expressed interest in the EXPLAIN output
> of HashAgg before)

Thanks.

It's unrelated to hashAgg vs hashJoin, but I also noticed that this is shown
only conditionally:

if (es->format != EXPLAIN_FORMAT_TEXT)
{
    if (es->costs && aggstate->hash_planned_partitions > 0)
    {
        ExplainPropertyInteger("Planned Partitions", NULL,
                               aggstate->hash_planned_partitions, es);

That was conditional since it was introduced at 1f39bce02:

if (es->costs && aggstate->hash_planned_partitions > 0)
{
    ExplainPropertyInteger("Planned Partitions", NULL,
                           aggstate->hash_planned_partitions, es);
}

I think 40efbf870 should've handled this, too.
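
That is, I'd expect the non-text formats to show the property
unconditionally, roughly like this (just a sketch, not a tested patch):

if (es->format != EXPLAIN_FORMAT_TEXT)
{
    /* always emit the field in machine-readable formats */
    if (es->costs)
        ExplainPropertyInteger("Planned Partitions", NULL,
                               aggstate->hash_planned_partitions, es);
}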

-- 
Justin




Re: Fast DSM segments

2020-07-26 Thread Thomas Munro
On Sat, Jun 20, 2020 at 7:17 AM Andres Freund  wrote:
> On 2020-06-19 17:42:41 +1200, Thomas Munro wrote:
> > On Thu, Jun 18, 2020 at 6:05 PM Thomas Munro  wrote:
> > > Here's a version that adds some documentation.
> >
> > I jumped on a dual socket machine with 36 cores/72 threads and 144GB
> > of RAM (Azure F72s_v2) running Linux, configured with 50GB of huge
> > pages available, and I ran a very simple test: select count(*) from t
> > t1 join t t2 using (i), where the table was created with create table
> > t as select generate_series(1, 4)::int i, and then prewarmed
> > into 20GB of shared_buffers.
>
> I assume all the data fits into 20GB?

Yep.

> Which kernel version is this?

Tested on 4.19 (Debian stable/10).

> How much of the benefit comes from huge pages being used, how much from
> avoiding the dsm overhead, and how much from the page table being shared
> for that mapping? Do you have a rough idea?

Without huge pages, the 36 process version of the test mentioned above
shows around a 1.1x speedup, which is in line with the numbers from my
first message (which was from a much smaller computer).  The rest of
the speedup (2x) is due to huge pages.

Further speedups are available by increasing the hash chunk size, and
probably doing NUMA-aware allocation, in later work.

Here's a new version, using the name min_dynamic_shared_memory, which
sounds better to me.  Any objections?  I also fixed the GUC's maximum
setting so that it's sure to fit in size_t.
From 3a77e8ab67e5492340425f1ce4a36a45a8c80d05 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 27 Jul 2020 12:07:05 +1200
Subject: [PATCH v3] Preallocate some DSM space at startup.

Create an optional region in the main shared memory segment that can be
used to acquire and release "fast" DSM segments, and can benefit from
huge pages allocated at cluster startup time, if configured.  Fall back
to the existing mechanisms when that space is full.  The size is
controlled by a new GUC min_dynamic_shared_memory, defaulting to 0.

Main region DSM segments initially contain whatever garbage the memory
held last time they were used, rather than zeroes.  That change revealed
that DSA areas failed to initialize themselves correctly in memory that
wasn't zeroed first, so fix that problem.

Discussion: https://postgr.es/m/CA%2BhUKGLAE2QBv-WgGp%2BD9P_J-%3Dyne3zof9nfMaqq1h3EGHFXYQ%40mail.gmail.com
---
 doc/src/sgml/config.sgml  |  26 +++
 src/backend/storage/ipc/dsm.c | 184 --
 src/backend/storage/ipc/dsm_impl.c|   3 +
 src/backend/storage/ipc/ipci.c|   3 +
 src/backend/utils/misc/guc.c  |  11 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/mmgr/dsa.c  |   5 +-
 src/include/storage/dsm.h |   3 +
 src/include/storage/dsm_impl.h|   1 +
 9 files changed, 212 insertions(+), 25 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6ce5907896..48fef11041 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1864,6 +1864,32 @@ include_dir 'conf.d'
   
  
 
+ 
+  min_dynamic_shared_memory (integer)
+  
+   min_dynamic_shared_memory configuration parameter
+  
+  
+  
+   
+Specifies the amount of memory that should be allocated at server
+startup time for use by parallel queries.  When this memory region is
+insufficient or exhausted by concurrent parallel queries, new
+parallel queries try to allocate extra shared memory temporarily from
+the operating system using the method configured with
+dynamic_shared_memory_type, which may be slower
+due to memory management overheads.
+Memory that is allocated at startup time with
+min_dynamic_shared_memory is affected by the
+huge_pages setting on operating systems where that
+is supported, and may be more likely to benefit from larger pages on
+operating systems where page size is managed automatically.  Larger
+memory pages can improve the performance of parallel hash joins.
+The default value is 0 (none).
+   
+  
+ 
+
  
  
 
diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index ef64d08357..b9941496a3 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -35,10 +35,12 @@
 
 #include "lib/ilist.h"
 #include "miscadmin.h"
+#include "port/pg_bitutils.h"
 #include "storage/dsm.h"
 #include "storage/ipc.h"
 #include "storage/lwlock.h"
 #include "storage/pg_shmem.h"
+#include "utils/freepage.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/resowner_private.h"
@@ -76,6 +78,8 @@ typedef struct dsm_control_item
 {
 	dsm_handle	handle;
 	uint32		refcnt;			/* 2+ = active, 1 = moribund, 0 = gone */
+	size_t		first_page;
+	size_t		npages;
 	void	   *impl_pr

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-26 Thread Amit Langote
On Sun, Jul 26, 2020 at 6:06 AM Tom Lane  wrote:
> Amit Langote  writes:
> > [ v7-0001-Improve-performance-of-binary-COPY-FROM-with-buff.patch ]
>
> Pushed with cosmetic changes.

Thanks for that.

> I'd always supposed that stdio does enough internal buffering that short
> fread()s shouldn't be much worse than memcpy().  But I reproduced your
> result of ~30% speedup for data with a lot of narrow text columns, using
> RHEL 8.2.  Thinking this an indictment of glibc, I also tried it on
> current macOS, and saw an even bigger speedup, approaching 50%.  So
> there's definitely something to this.  I wonder if we ought to check
> other I/O-constrained users of fread and fwrite, like pg_dump/pg_restore.

Ah, maybe a good idea to check that.

> A point that I did not see addressed in the thread is whether this
> has any negative impact on the copy-from-frontend code path, where
> there's no fread() to avoid; short reads from CopyGetData() are
> already going to be satisfied by memcpy'ing from the fe_msgbuf.
> However, a quick check suggests that this patch is still a small
> win for that case too --- apparently the control overhead in
> CopyGetData() is not negligible.

Indeed.

> So the patch seems fine functionally, but there were some cosmetic
> things I didn't like:
>
> * Removing CopyGetInt32 and CopyGetInt16 seemed like a pretty bad
> idea, because it made the callers much uglier and more error-prone.
> This is a particularly bad example:
>
> /* Header extension length */
> -   if (!CopyGetInt32(cstate, &tmp) ||
> -   tmp < 0)
> +   if (CopyReadBinaryData(cstate, (char *) &tmp, sizeof(tmp)) !=
> +   sizeof(tmp) || (tmp = (int32) pg_ntoh32(tmp)) < 0)
>
> Putting side-effects into late stages of an if-condition is just
> awful coding practice.  They're easy for a reader to miss and they
> are magnets for bugs, because of the possibility that control doesn't
> reach that part of the condition.
>
> You can get the exact same speedup without any of those disadvantages
> by marking these two functions "inline", so that's what I did.
>
> * I dropped the DRAIN_COPY_RAW_BUF macro too, as in my estimation it was
> a net negative for readability.  With only two use-cases, having it made
> the code longer not shorter; I was also pretty unconvinced about the
> wisdom of having some of the loop's control logic inside the macro and
> some outside.
>
> * BTW, the macro definitions weren't particularly per project style
> anyway.  We generally put at least one space before line-ending
> backslashes.  I don't think pgindent will fix this for you; IME
> it doesn't touch macro definitions at all.
>
> * Did some more work on the comments.

Thanks for these changes.
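
For anyone following along, the inline CopyGetInt32() presumably ends up
looking something like this (a sketch reconstructed from the description
above, not a copy of the committed code):

static inline bool
CopyGetInt32(CopyState cstate, int32 *val)
{
    uint32      buf;

    if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
    {
        *val = 0;               /* keep compilers quiet */
        return false;
    }
    *val = (int32) pg_ntoh32(buf);
    return true;
}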

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: handle a ECPG_bytea typo

2020-07-26 Thread Michael Paquier
On Sat, Jul 25, 2020 at 06:17:42PM +0900, Michael Paquier wrote:
> ECPGset_noind_null() and ECPGis_noind_null() in misc.c show that
> ECPGgeneric_bytea is attached to ECPGt_bytea.  The two structures may
> be the same now, but if a bug fix or a code change involves a change
> in the structure definition we could run into problems.  So let's fix
> and back-patch this change.  I am not spotting other areas impacted,
> and I'll try to take care at the beginning of next week.

Okay, fixed as e971357.  The issue came from 050710b, so this fix was
only needed in 12~.
--
Michael




Re: Parallel bitmap index scan

2020-07-26 Thread Dilip Kumar
On Mon, 27 Jul 2020 at 3:48 AM, Thomas Munro  wrote:

> On Mon, Jul 27, 2020 at 1:58 AM Dilip Kumar  wrote:
> > On Sun, Jul 26, 2020 at 6:42 PM Dilip Kumar 
> wrote:
> > >
> > > I would like to propose a patch for enabling the parallelism for the
> > > bitmap index scan path.
>
> Workers Planned: 4
> ->  Parallel Bitmap Heap Scan on tenk1
>   Recheck Cond: (hundred > 1)
> - ->  Bitmap Index Scan on tenk1_hundred
> + ->  Parallel Bitmap Index Scan on tenk1_hundred
> Index Cond: (hundred > 1)
>
> +1, this is a very good feature to have.
>
> +   /* Merge bitmap to a common shared bitmap */
> +   SpinLockAcquire(&pstate->mutex);
> +   tbm_merge(tbm, &pstate->tbm_shared, &pstate->pt_shared);
> +   SpinLockRelease(&pstate->mutex);
>
> This doesn't look like a job for a spinlock.


Yes I agree with that.

> You have a barrier so that you can wait until all workers have
> finished merging their partial bitmap into the complete bitmap, which
> makes perfect sense.  You also use that spinlock (probably should be
> LWLock) to serialise the bitmap merging work...  Hmm, I suppose there
> would be an alternative design which also uses the barrier to
> serialise the merge, and has the merging done entirely by one process,
> like this:
>
>   bool chosen_to_merge = false;
>
>   /* Attach to the barrier, and see what phase we're up to. */
>   switch (BarrierAttach())
>   {
>   case PBH_BUILDING:
> ... build my partial bitmap in shmem ...
> chosen_to_merge = BarrierArriveAndWait();
> /* Fall through */
>
>   case PBH_MERGING:
> if (chosen_to_merge)
>   ... perform merge of all partial results into final shmem bitmap ...
> BarrierArriveAndWait();
> /* Fall through */
>
>   case PBH_SCANNING:
> /* We attached too late to help build the bitmap.  */
> BarrierDetach();
> break;
>   }
>
> Just an idea, not sure if it's a good one.  I find it a little easier
> to reason about the behaviour of late-attaching workers when the
> phases are explicitly named and handled with code like that (it's not
> immediately clear to me whether your coding handles late attachers
> correctly, which seems to be one of the main problems to think about
> with "dynamic party" parallelism...).


Yeah, this seems like a better idea.  I am handling the late-attachers case,
but the idea of using the barrier phases looks quite clean.  I will change it
this way.

> --
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: hashagg slowdown due to spill changes

2020-07-26 Thread Jeff Davis
On Sat, 2020-07-25 at 15:08 -0700, Peter Geoghegan wrote:
> I find that Andres original "SELECT cat, count(*) FROM
> fewgroups_many_rows GROUP BY 1;" test case is noticeably improved by
> the patch. Without the patch, v13 takes ~11.46 seconds. With the
> patch, it takes only ~10.64 seconds.

I saw less of an improvement than you or Andres (perhaps just more
noise). But given that both you and Andres are reporting a measurable
improvement, then I went ahead and committed it. Thank you.

Regards,
Jeff Davis






Re: HashAgg's batching counter starts at 0, but Hash's starts at 1.

2020-07-26 Thread David Rowley
On Wed, 1 Jul 2020 at 18:46, Jeff Davis  wrote:
>
> On Tue, Jun 30, 2020, 7:04 PM David Rowley  wrote:
>>
>> Does anyone have any objections to that being changed?
>
> That's OK with me. By the way, I'm on vacation and will catch up on these 
> HashAgg threads next week.

(Adding Justin as I know he's expressed interest in the EXPLAIN output
of HashAgg before)

I've written a patch to bring the HashAgg EXPLAIN ANALYZE output to be
more aligned to the Hash Join output.

Couple of things I observed about Hash Join EXPLAIN ANALYZE:
1. The number of batches starts at 1.
2. We always display the number of batches.
3. We write "Batches" for text format and "Hash Batches" for non-text formats.
4. We write "Memory Usage" for text format and "Peak Memory Usage" for
non-text formats.
5. "Batches" comes before memory usage.

Before this patch, HashAgg EXPLAIN ANALYZE output would:
1. Start the number of batches at 0.
2. Only display "Hash Batches" when batches > 0.
3. Used the words "HashAgg Batches" for text and non-text formats.
4. Used the words "Peak Memory Usage" for text and non-text formats.
5. "Hash Batches" was written after memory usage.

In the attached patch I've changed HashAgg to be aligned to Hash Join
on each of the points above.

e.g.

Before:

postgres=# explain analyze select c.relname,count(*) from pg_class c
inner join pg_Attribute a on c.oid = a.attrelid group by c.relname;
 QUERY PLAN

 HashAggregate  (cost=138.37..142.23 rows=386 width=72) (actual
time=3.121..3.201 rows=427 loops=1)
   Group Key: c.relname
   Peak Memory Usage: 109kB
   ->  Hash Join  (cost=21.68..124.10 rows=2855 width=64) (actual
time=0.298..1.768 rows=3153 loops=1)
 Hash Cond: (a.attrelid = c.oid)
 ->  Seq Scan on pg_attribute a  (cost=0.00..93.95 rows=3195
width=4) (actual time=0.011..0.353 rows=3153 loops=1)
 ->  Hash  (cost=16.86..16.86 rows=386 width=68) (actual
time=0.279..0.279 rows=427 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 50kB
   ->  Seq Scan on pg_class c  (cost=0.00..16.86 rows=386
width=68) (actual time=0.007..0.112 rows=427 loops=1)
 Planning Time: 0.421 ms
 Execution Time: 3.294 ms
(11 rows)

After:

postgres=# explain analyze select c.relname,count(*) from pg_class c
inner join pg_Attribute a on c.oid = a.attrelid group by c.relname;
  QUERY PLAN
--
 HashAggregate  (cost=566.03..580.00 rows=1397 width=72) (actual
time=13.097..13.430 rows=1397 loops=1)
   Group Key: c.relname
   Batches: 1  Memory Usage: 321kB
   ->  Hash Join  (cost=64.43..496.10 rows=13985 width=64) (actual
time=0.838..7.546 rows=13985 loops=1)
 Hash Cond: (a.attrelid = c.oid)
 ->  Seq Scan on pg_attribute a  (cost=0.00..394.85 rows=13985
width=4) (actual time=0.010..1.462 rows=13985 loops=1)
 ->  Hash  (cost=46.97..46.97 rows=1397 width=68) (actual
time=0.820..0.821 rows=1397 loops=1)
   Buckets: 2048  Batches: 1  Memory Usage: 153kB
   ->  Seq Scan on pg_class c  (cost=0.00..46.97 rows=1397
width=68) (actual time=0.009..0.362 rows=1397 loops=1)
 Planning Time: 0.440 ms
 Execution Time: 13.634 ms
(11 rows)

(ignore the change in memory consumption. That was due to adding
records for testing)

Any objections to this change?

David


yet_more_hashagg_explain_fixes.patch
Description: Binary data


Re: Parallel bitmap index scan

2020-07-26 Thread Thomas Munro
On Mon, Jul 27, 2020 at 1:58 AM Dilip Kumar  wrote:
> On Sun, Jul 26, 2020 at 6:42 PM Dilip Kumar  wrote:
> >
> > I would like to propose a patch for enabling the parallelism for the
> > bitmap index scan path.

Workers Planned: 4
->  Parallel Bitmap Heap Scan on tenk1
  Recheck Cond: (hundred > 1)
- ->  Bitmap Index Scan on tenk1_hundred
+ ->  Parallel Bitmap Index Scan on tenk1_hundred
Index Cond: (hundred > 1)

+1, this is a very good feature to have.

+   /* Merge bitmap to a common shared bitmap */
+   SpinLockAcquire(&pstate->mutex);
+   tbm_merge(tbm, &pstate->tbm_shared, &pstate->pt_shared);
+   SpinLockRelease(&pstate->mutex);

This doesn't look like a job for a spinlock.

You have a barrier so that you can wait until all workers have
finished merging their partial bitmap into the complete bitmap, which
makes perfect sense.  You also use that spinlock (probably should be
LWLock) to serialise the bitmap merging work...  Hmm, I suppose there
would be an alternative design which also uses the barrier to
serialise the merge, and has the merging done entirely by one process,
like this:

  bool chosen_to_merge = false;

  /* Attach to the barrier, and see what phase we're up to. */
  switch (BarrierAttach())
  {
  case PBH_BUILDING:
... build my partial bitmap in shmem ...
chosen_to_merge = BarrierArriveAndWait();
/* Fall through */

  case PBH_MERGING:
if (chosen_to_merge)
  ... perform merge of all partial results into final shmem bitmap ...
BarrierArriveAndWait();
/* Fall through */

  case PBH_SCANNING:
/* We attached too late to help build the bitmap.  */
BarrierDetach();
break;
  }

Just an idea, not sure if it's a good one.  I find it a little easier
to reason about the behaviour of late-attaching workers when the
phases are explicitly named and handled with code like that (it's not
immediately clear to me whether your coding handles late attachers
correctly, which seems to be one of the main problems to think about
with "dynamic party" parallelism...).




Re: proposal: possibility to read dumped table's name from file

2020-07-26 Thread Justin Pryzby
On Sat, Jul 25, 2020 at 06:56:31PM +0530, vignesh C wrote:
> On Tue, Jul 14, 2020 at 12:03 PM Pavel Stehule  
> wrote:
> >> I meant can this:
> >> printf(_("  --filter=FILENAME        read object name filter
> >> expressions from file\n"));
> >> be changed to:
> >> printf(_("  --filter=FILENAME        dump objects and data based
> >> on the filter expressions from the filter file\n"));
> >
> > done in today patch
> 
> Thanks for fixing the  comments.
> Few comments:
> + /* use "-" as symbol for stdin */
> + if (strcmp(filename, "-") != 0)
> + {
> + fp = fopen(filename, "r");
> + if (!fp)
> + fatal("could not open the input file \"%s\": %m",
> +   filename);
> + }
> + else
> + fp = stdin;
> 
> We could use STDIN itself instead of -, it will be a more easier
> option to understand.

I think "-" is used widely for commandline tools, and STDIN is not (even though
it's commonly used by programmers).  For example, since last year, pg_restore
-f - means stdout.

-- 
Justin




Re: expose parallel leader in CSV and log_line_prefix

2020-07-26 Thread Justin Pryzby
On Sun, Jul 26, 2020 at 04:42:06PM +0900, Michael Paquier wrote:
> On Thu, Jul 23, 2020 at 09:52:14AM +0900, Michael Paquier wrote:
> > Sounds fine to me.  Thanks.
> > 
> > Do others have any objections with this wording?
> 
> I have used the wording suggested by Alvaro, and applied the patch
> down to 13.  Now let's see about the original item of this thread..

Updated with updated wording to avoid "null", per Tom.

-- 
Justin
>From 665c9ea8827d4ad94ea0100c5ba93dbf05db5943 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 13 Mar 2020 22:03:06 -0500
Subject: [PATCH v3] Include the leader PID in logfile

See also: b025f32e0b, which adds the leader PID to pg_stat_activity
---
 doc/src/sgml/config.sgml  | 10 ++-
 src/backend/utils/error/elog.c| 30 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e806b13754..c525a0fd15 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6687,6 +6687,13 @@ local0.*/var/log/postgresql
  Process ID
  no
 
+
+ %P
+ For a parallel worker, this is the Process ID of its leader
+ process.
+ 
+ no
+
 
  %t
  Time stamp without milliseconds
@@ -7019,7 +7026,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 character count of the error position therein,
 location of the error in the PostgreSQL source code
 (if log_error_verbosity is set to verbose),
-application name, and backend type.
+application name, backend type, and leader PID.
 Here is a sample table definition for storing CSV-format log output:
 
 
@@ -7049,6 +7056,7 @@ CREATE TABLE postgres_log
   location text,
   application_name text,
   backend_type text,
+  leader_pid integer,
   PRIMARY KEY (session_id, session_line_num)
 );
 
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index e4b717c79a..3055b72c01 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -77,6 +77,7 @@
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
 #include "storage/proc.h"
+#include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2448,6 +2449,25 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 else
 	appendStringInfo(buf, "%d", MyProcPid);
 break;
+
+			case 'P':
+if (MyProc)
+{
+	PGPROC *leader = MyProc->lockGroupLeader;
+	if (leader == NULL || leader->pid == MyProcPid)
+		/* padding only */
+		appendStringInfoSpaces(buf,
+padding > 0 ? padding : -padding);
+	else if (padding != 0)
+		appendStringInfo(buf, "%*d", padding, leader->pid);
+	else
+		appendStringInfo(buf, "%d", leader->pid);
+}
+else if (padding != 0)
+	appendStringInfoSpaces(buf,
+		   padding > 0 ? padding : -padding);
+break;
+
 			case 'l':
 if (padding != 0)
 	appendStringInfo(buf, "%*ld", padding, log_line_number);
@@ -2836,6 +2856,16 @@ write_csvlog(ErrorData *edata)
 	else
 		appendCSVLiteral(&buf, GetBackendTypeDesc(MyBackendType));
 
+	appendStringInfoChar(&buf, ',');
+
+	/* leader PID */
+	if (MyProc)
+	{
+		PGPROC *leader = MyProc->lockGroupLeader;
+		if (leader && leader->pid != MyProcPid)
+			appendStringInfo(&buf, "%d", leader->pid);
+	}
+
 	appendStringInfoChar(&buf, '\n');
 
 	/* If in the syslogger process, try to write messages direct to file */
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index aa30291ea3..55c2e23c47 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -536,6 +536,7 @@
 	#   %h = remote host
 	#   %b = backend type
 	#   %p = process ID
+	#   %P = leader PID
 	#   %t = timestamp without milliseconds
 	#   %m = timestamp with milliseconds
 	#   %n = timestamp with milliseconds (as a Unix epoch)
-- 
2.17.0



Re: Default setting for enable_hashagg_disk

2020-07-26 Thread Tomas Vondra

On Sat, Jul 25, 2020 at 05:13:00PM -0700, Peter Geoghegan wrote:

>On Sat, Jul 25, 2020 at 5:05 PM Tomas Vondra
> wrote:
>>I'm not sure what you mean by "reported memory usage doesn't reflect the
>>space used for transition state"? Surely it does include that, we've
>>built the memory accounting stuff pretty much exactly to do that.
>>
>>I think it's pretty clear what's happening - in the sorted case there's
>>only a single group getting new values at any moment, so when we decide
>>to spill we'll only add rows to that group and everything else will be
>>spilled to disk.
>
>Right.
>
>>In the unsorted case however we manage to initialize all groups in the
>>hash table, but at that point the groups are tiny and fit into work_mem.
>>As we process more and more data the groups grow, but we can't evict
>>them - at the moment we don't have that capability. So we end up
>>processing everything in memory, but significantly exceeding work_mem.
>
>work_mem was set to 200MB, which is more than the reported "Peak
>Memory Usage: 1605334kB". So either the random case significantly

That's 1.6GB, if I read it right. Which is more than 200MB ;-)

>exceeds work_mem and the "Peak Memory Usage" accounting is wrong
>(because it doesn't report this excess), or the random case really
>doesn't exceed work_mem but has a surprising advantage over the sorted
>case.
>
>--
>Peter Geoghegan


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-26 Thread Andrey M. Borodin



> On 24 July 2020, at 14:05, Ashutosh Sharma wrote:
> 
> Attached is the patch that adds heap_force_kill(regclass, tid[]) and 
> heap_force_freeze(regclass, tid[]) functions which Robert mentioned in the 
> first email in this thread. The patch basically adds an extension named 
> pg_surgery that contains these functions.  Please have a look and let me know 
> your feedback. Thank you.

Thanks for the patch!
I have just few random thoughts.

I think here we should report that we haven't done what was asked.
+   /* Nothing to do if the itemid is unused or already 
dead. */
+   if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
+   continue;

Also, should we try to fix VM along the way?
Are there any caveats with concurrent VACUUM? (I do not see any, just asking)
It would be good to have some checks for interrupts in safe places.

I think we should not trust the user entirely here. I'd prefer validation and 
a graceful exit, not a core dump.
+   Assert(noffs <= PageGetMaxOffsetNumber(page));

For some reason we had unlogged versions of these functions, but I do not 
recall the exact rationale.
Also, I'd be happy if we had something like "restore this tuple iff this does 
not break a unique constraint". To do so we need to sort the tids by 
xmin/xmax, to revive the most recent data first.

Thanks!

Best regards, Andrey Borodin.



Re: Making CASE error handling less surprising

2020-07-26 Thread Chris Travers
On Fri, Jul 24, 2020 at 7:18 PM Tom Lane  wrote:

> Robert Haas  writes:
> > Like Pavel, and I think implicitly Dagfinn and Andres, I'm not sure I
> > believe this. Pavel's example is a good one. The leakproof exception
> > helps, but it doesn't cover everything. Users I've encountered throw
> > things like date_trunc() and lpad() into SQL code and expect them to
> > behave (from a performance point of view) like constants, but they
> > also expect 1/0 not to get evaluated too early when e.g. CASE is used.
> > It's difficult to meet both sets of expectations at the same time and
> > we're probably never going to have a perfect solution, but I think
> > you're minimizing the concern too much here.
>
> I've quoted this point before, but ... we can make queries arbitrarily
> fast, if we don't have to give the right answer.  I think we've seen
> enough complaints on this topic now to make it clear that what we're
> doing today with CASE is the wrong answer.
>

So here's my concern in a little more detail.

For small databases, these performance concerns are not a big deal. But for
large, heavily loaded databases one tends to run into all of the
pathological cases more frequently.  In other words, the overhead for the
largest users will likely not be proportional to the gains of the newer
users who are surprised by the current behavior.  The more exceptions we make
to how the planner works, the more complex the knowledge required to work on
the high end of the database becomes.  So I just don't think the complexity
here is worth it.


> The performance argument can be made to cut both ways, too.  If somebody's
> got a very expensive function in a CASE arm that they don't expect to
> reach, having it be evaluated anyway because it's got constant inputs
> isn't going to make them happy.
>

However in this case we would be evaluating the expensive case arm every
time it is invoked (i.e. for every row matched), right?  It is hard to see
this as even being close to a performance gain or even approximately
neutral because the cases where you have a significant gain are likely to
be extremely rare, and the penalties for when the cost applies will be many
multiples of the maximum gain.

>
> The real bottom line is: if you don't want to do this, how else do
> you want to fix the problem?  I'm no longer willing to deny that
> there is a problem.
>

I see three ways forward.

The first (probably the best) would be a solution along the lines of yours,
along with a session-level GUC variable that determines whether CASE
branches can fold constants.  This has several important benefits:

1.  It gets a fix in shortly for those who want it.
2.  It ensures this is optional behavior for the more experienced users
(where one can better decide which direction to go), and
3.  It makes the behavior explicit, documented, and thus more easily
understood.

A third approach would be to allow some sort of "constant evaluation
mechanism", maybe with its own memory context, where constants could be
cached on first evaluation under the statement memory context.  That would
solve the problem more generally.
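
To sketch that third idea a bit (names are hypothetical and this is not a
worked-out design), the executor could simply remember the Datum the first
time a constant-foldable expression is evaluated within a statement:

/*
 * Hypothetical sketch: cache a stable expression's result on first
 * evaluation and reuse it for the rest of the statement, instead of
 * folding it at plan time.
 */
typedef struct CachedConstState
{
    bool        evaluated;      /* has the value been computed yet? */
    bool        isnull;
    Datum       value;
} CachedConstState;

static Datum
eval_cached_const(CachedConstState *cache, ExprState *expr,
                  ExprContext *econtext, bool *isnull)
{
    if (!cache->evaluated)
    {
        cache->value = ExecEvalExprSwitchContext(expr, econtext,
                                                 &cache->isnull);
        cache->evaluated = true;
    }
    *isnull = cache->isnull;
    return cache->value;
}

That keeps 1/0 from being evaluated unless its branch is actually reached,
while still paying the cost of an expensive stable function only once per
statement.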


>
> > I don't think I believe this either. I don't think an average user is
> > going to expect  to behave differently from (SELECT
> > ).
>
> Agreed, that's poorly (or not at all?) documented.  But it's been
> true all along, and this patch isn't changing that behavior at all.
> I'm not sure if we should do anything more than improve the docs,
> but in any case it seems independent of the CASE issue.
>
> > The current behavior isn't great, but at least it handles these
> > cases consistently.
>
> Really?
>
> regards, tom lane
>
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Parallel bitmap index scan

2020-07-26 Thread Dilip Kumar

Parallel bitmap index scan

2020-07-26 Thread Dilip Kumar
I would like to propose a patch that enables parallelism for the bitmap
index scan path.

Background:
Currently, we support only a parallel bitmap heap scan path.  Therein,
the underlying bitmap index scan is done by a single process (the leader).
The leader creates a bitmap in shared memory, and once the bitmap is ready
it creates a shared iterator; after that, all the workers process the
shared iterator and scan the heap in parallel.  While analyzing the TPCH
plans we observed that some queries spend significant time preparing the
bitmap.  So the idea of this patch is to use a parallel index scan to
prepare the underlying bitmap in parallel.

Design:
If the underlying index AM supports a parallel path (currently only
btree supports it), then we will create a parallel bitmap heap scan
path on top of a parallel bitmap index scan path.  The idea is that each
worker does a parallel index scan and generates its part of the bitmap,
and a barrier ensures that we do not start preparing the shared iterator
until all the workers are ready with their bitmaps.  The first worker
that is ready with its bitmap keeps a copy of its TBM and its page table
in shared memory, and each subsequent worker merges its TBM into the
shared TBM.  Once all the TBMs are merged we have one common shared TBM,
and after that stage the workers can continue.  The remaining part is
the same as today: one worker scans the shared TBM and prepares the
shared iterator, and once it is ready all the workers jointly scan the
heap in parallel using the shared iterator.

BitmapHeapNext
{
    ...
    BarrierAttach();
    tbm = MultiExecProcNode();
    tbm_merge(tbm);               /* merge with the common TBM using tbm_union */
    BarrierArriveAndWait();

    if (BitmapShouldInitializeSharedState(pstate))   /* only one worker comes out of this */
    {
        tbm_prepare_shared_iterate();
        BitmapDoneInitializingSharedState();         /* wake up the others */
    }
    tbm_attach_shared_iterate();  /* all workers attach to the shared iterator */
    ...
}

Performance:  With scale factor 10, I could see that Q6 spends
significant time in the bitmap index scan, so I measured performance with
that query.  The bitmap index scan node is about 3x faster when using 3
workers, and the overall plan got ~40% faster.
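
For reference, the query being exercised is along the lines of TPCH Q6; the
shape below is reconstructed from the Recheck/Index Cond predicates visible
in the plans that follow (the revenue aggregate is the standard Q6
expression), so treat it as an approximation of the exact statement used:

SELECT sum(l_extendedprice * l_discount) AS revenue
FROM   lineitem
WHERE  l_shipdate >= date '1997-01-01'
  AND  l_shipdate <  date '1998-01-01'
  AND  l_discount BETWEEN 0.02 AND 0.04
  AND  l_quantity < 24;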

TPCH: S.F. 10, work_mem=512MB  shared_buffers: 1GB

HEAD:
  Limit  (cost=1559777.02..1559777.03 rows=1 width=32) (actual time=5260.121..5260.122 rows=1 loops=1)
    ->  Finalize Aggregate  (cost=1559777.02..1559777.03 rows=1 width=32) (actual time=5260.119..5260.119 rows=1 loops=1)
          ->  Gather  (cost=1559776.69..1559777.00 rows=3 width=32) (actual time=5257.251..5289.595 rows=4 loops=1)
                Workers Planned: 3
                Workers Launched: 3
                ->  Partial Aggregate  (cost=1558776.69..1558776.70 rows=1 width=32) (actual time=5247.714..5247.714 rows=1 loops=4)
                      ->  Parallel Bitmap Heap Scan on lineitem  (cost=300603.01..1556898.89 rows=375560 width=12) (actual time=3475.944..5037.484 rows=285808 loops=4)
                            Recheck Cond: ((l_shipdate >= '1997-01-01'::date) AND (l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.02) AND (l_discount <= 0.04) AND (l_quantity < '24'::numeric))
                            Heap Blocks: exact=205250
                            ->  Bitmap Index Scan on idx_lineitem_shipdate  (cost=0.00..300311.95 rows=1164235 width=0) (actual time=3169.855..3169.855 rows=1143234 loops=1)
                                  Index Cond: ((l_shipdate >= '1997-01-01'::date) AND (l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.02) AND (l_discount <= 0.04) AND (l_quantity < '24'::numeric))
 Planning Time: 0.659 ms
 Execution Time: 5289.787 ms
(13 rows)


PATCH:

  Limit  (cost=1559579.85..1559579.86 rows=1 width=32) (actual time=.572...572 rows=1 loops=1)
    ->  Finalize Aggregate  (cost=1559579.85..1559579.86 rows=1 width=32) (actual time=.569...569 rows=1 loops=1)
          ->  Gather  (cost=1559579.52..1559579.83 rows=3 width=32) (actual time=3328.619..3365.227 rows=4 loops=1)
                Workers Planned: 3
                Workers Launched: 3
                ->  Partial Aggregate  (cost=1558579.52..1558579.53 rows=1 width=32) (actual time=3307.805..3307.805 rows=1 loops=4)
                      ->  Parallel Bitmap Heap Scan on lineitem  (cost=300405.84..1556701.72 rows=375560 width=12) (actual time=1585.726..3097.628 rows=285808 loops=4)
                            Recheck Cond: ((l_shipdate >= '1997-01-01'::date) AND (l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.02) AND (l_discount <= 0.04) AND (l_quantity < '24'::numeric))
                            Heap Blocks: exact=184293
                            ->  Parallel Bitmap Index Scan on idx_lineitem_shipdate  (cost=0.00..300311.95 rows=1164235 width=0) (actual tim

Re: INSERT INTO SELECT, Why Parallelism is not selected?

2020-07-26 Thread Amit Kapila
On Sat, Jul 25, 2020 at 8:42 PM Tom Lane  wrote:
>
> Amit Kapila  writes:
> > On Fri, Jul 24, 2020 at 7:36 PM Tom Lane  wrote:
> >> Yeah, the proposed comment changes don't actually add much.  Also
> >> please try to avoid inserting non-ASCII   into the source code;
> >> at least in my mail reader, that attachment has some.
>
> > I don't see any non-ASCII characters in the patch.  I have applied and
> > checked (via vi editor) the patch as well but don't see any non-ASCII
> > characters.  How can I verify that?
>
> They're definitely there:
>
> $ od -c 0001-Fix-comments-in-heapam.c.patch
> ...
> 0002740   h   e  \n   +  \t   *   l   e   a   d   e   r   c
> 0002760   a   n   p   e   r   f   o   r   m   t   h   e   i
> 0003000   n   s   e   r   t   . 302 240   T   h   i   s   r   e
> 0003020   s   t   r   i   c   t   i   o   n   c   a   n   b   e
> 0003040   u   p   l   i   f   t   e   d   o   n   c   e   w
> 0003060   e  \n   +  \t   *   a   l   l   o   w   t   h   e
> 0003100 302 240   p   l   a   n   n   e   r   t   o   g   e   n
> 0003120   e   r   a   t   e   p   a   r   a   l   l   e   l   p
> 0003140   l   a   n   s   f   o   r   i   n   s   e   r   t   s
> 0003160   .  \n  \t   *   /  \n  \t   i   f   (   I   s
> ...
>

Thanks, I could see that.

> I'm not sure if "git diff --check" would whine about this.
>

No, "git diff --check" doesn't help.  I have tried pgindent, but that
doesn't help either, nor was I expecting it to.  I am still not able to
figure out how I goofed this up, but I will spend some more time on it.
In the meantime, I have updated the patch to improve the comments as
suggested by Robert.  Do let me know if you want to edit/add anything more.

-- 
With Regards,
Amit Kapila.


v2-0001-Fix-comments-in-heapam.c.patch
Description: Binary data


Re: Difference for Binary format vs Text format for client-server communication

2020-07-26 Thread Andy Fan
On Sun, Jul 26, 2020 at 1:49 AM Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:

> On 2020-07-16 18:52, Andy Fan wrote:
> > The reason I ask this is because I have a task to make numeric output
> > similar to oracle.
> >
> > Oracle:
> >
> > SQL> select 2 / 1.0 from dual;
> >
> >   2/1.0
> > --
> >   2
> >
> > PG:
> >
> > postgres=# select  2 / 1.0;
> >?column?
> > 
> >   2.
> > (1 row)
> >
> > If the user uses text format, I can just hack some numeric_out function,
> > but if they
> > use binary format,  looks I have to change the driver they used for it.
> > Am I
> > understand it correctly?
>
> I think what you should be looking at is why the numeric division
> function produces that scale and possibly make changes there.


Thanks, I think you are talking about the select_div_scale function, which
is called before the real division work in div_var, so it will be hard to
hack at that point.  Besides that, Oracle returns the zero-trimmed version
whether or not division is involved (I forgot to mention that at first).

In the end, I just hacked the numeric_out function, and it now works like
Oracle.  However, that only helps for the text format.  I tried JDBC, and
it uses the text format by default.  The solution is not good enough, but
it is OK for my purpose currently.
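
As an aside, if the goal is only to drop trailing zeroes, a query-level
alternative to hacking numeric_out might be the trim_scale() function added
in PostgreSQL 13, assuming that release is an option:

SELECT 2 / 1.0;              -- 2.0000... (scale chosen by the division code)
SELECT trim_scale(2 / 1.0);  -- 2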

IIUC, if a driver uses the text protocol for a data type, it works like
this: 1) the server has the value in its internal (binary) form; 2) the
server converts it to a string and sends that over the network; 3) the
client receives the string; 4) the client converts the string back to the
given data type.  That looks much more complex than the binary protocol,
so why is the text protocol chosen by default?


> By the
> time the type's output or send function is invoked, that's already
> decided.  The output/send functions are not the place to make scale or
> other semantic adjustments.
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

-- 
Best Regards
Andy Fan


Re: Parallel Seq Scan vs kernel read ahead

2020-07-26 Thread David Rowley
On Wed, 15 Jul 2020 at 12:24, David Rowley  wrote:
> If we've not seen any performance regressions within 1 week, then I
> propose that we (pending final review) push this to allow wider
> testing. It seems we're early enough in the PG14 cycle that there's a
> large window of time for us to do something about any reported
> performance regressions that come in.

I did that final review, which ended up making quite a few cosmetic changes.

Functionality-wise, it's basically that of the v2 patch with the
PARALLEL_SEQSCAN_MAX_CHUNK_SIZE set to 8192.

I mentioned that we might want to revisit giving users some influence
on the chunk size, but we'll only do so once we see some conclusive
evidence that it's worthwhile.

David




Re: expose parallel leader in CSV and log_line_prefix

2020-07-26 Thread Michael Paquier
On Thu, Jul 23, 2020 at 09:52:14AM +0900, Michael Paquier wrote:
> Sounds fine to me.  Thanks.
> 
> Do others have any objections with this wording?

I have used the wording suggested by Alvaro and applied the patch down
to 13.  Now let's see about the original item of this thread.
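
For context, the original item here is about letting log lines identify a
parallel worker's leader process.  As a purely illustrative sketch of what
that could enable, assuming the escape ends up being %P (not committed at
the time of this message):

-- Assumes a future %P escape printing the parallel group leader's PID;
-- %p is the backend's own PID.  Illustrative only.
ALTER SYSTEM SET log_line_prefix = '%m [%p] leader=%P ';
SELECT pg_reload_conf();
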
--
Michael


signature.asc
Description: PGP signature


Re: OpenSSL randomness seeding

2020-07-26 Thread Michael Paquier
On Wed, Jul 22, 2020 at 11:31:38PM +0200, Daniel Gustafsson wrote:
> Thanks for picking it up!

For the archives, the patch set has been applied as ce4939f and
15e4419 on HEAD.  Thanks, Noah.

> That's a good question.  I believe that if one actually do use RAND_cleanup as
> a re-seeding mechanism then that can break FIPS enabled OpenSSL installations
> as RAND_cleanup resets the RNG method from the FIPS RNG to the built-in one.  
> I
> would be inclined to follow the upstream recommendations of using RAND_poll
> exclusively, but I'm far from an expert here.

RAND_cleanup() can cause a failure complaining that the RNG state is not
initialized when attempting to use FIPS in 1.0.2.  This is not officially
supported by upstream AFAIK, and those APIs were dropped in 1.1.0.  And
FWIW, VMware's Photon actually applies some custom patches in this area:
https://github.com/vmware/photon/tree/master/SPECS/openssl

For example, openssl-drbg-default-read-system-fips.patch is used to enforce
the FIPS initialization state.  Anyway, I would just stick with the wiki
recommendation.

>> Do you happen to know how OpenSSL 1.1.1 changed its reaction to forks in 
>> order
>> to make the RAND_poll() superfluous?  (No need to research it if you don't.)
> 
> I'm not entirely sure, but I do believe that 1.1.1 ported over the RNG from 
> the
> FIPS module which re-seeds itself with fork() protection.  There was however a
> bug, fixed in 1.1.1d or thereabouts as CVE-2019-1549, where the fork 
> protection
> wasn't activated by default..  so there is that.  Since that bug was found,
> there has been tests introduced to catch any regression on that which is
> comforting.

No idea about this one actually.
--
Michael


signature.asc
Description: PGP signature