[PATCH] Performance Improvement For Copy From Binary Files

2020-06-28 Thread Bharath Rupireddy
Hi Hackers,

For COPY FROM with binary files, the following information exists for each
tuple/row:
1. field count (number of columns)
2. for every field, the field size (column data length)
3. the field data of that size (actual column data)

Currently, each of the above pieces of data is read directly from the file
using fread(), and this happens for every tuple/row.

One observation is that, out of the total execution time of a COPY FROM a
binary file, fread() takes up to 20% of the time, and the fread() call count
is also very high.

For instance, with a 5.3GB dataset of 10 million tuples with 10 columns:

 total exec time (sec)   total fread() time (sec)   fread() call count
 101.193                 21.33                      21005
 101.345                 21.436                     21005

The total time taken by fread() and the corresponding call count may increase
further with more columns, for instance 1000.

One solution to this problem is to read data from the binary file in
RAW_BUF_SIZE (64KB) chunks, to avoid calling fread() repeatedly (and thus
possibly avoid a few disk I/Os). This is similar to the approach already
followed for csv/text files.
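
To illustrate the idea, below is a rough sketch (the helper name and the
raw_buf fields mirror what is already used for csv/text mode; the attached
patch is the authoritative implementation):

/*
 * Sketch only -- see the attached patch for the real implementation.
 * Serve the per-field reads of binary COPY FROM out of the existing
 * RAW_BUF_SIZE buffer (cstate->raw_buf), refilling it with one large
 * CopyGetData() call instead of issuing an fread() per field.
 */
static void
CopyReadBinaryData(CopyState cstate, char *dest, int nbytes)
{
    while (nbytes > 0)
    {
        int         avail = cstate->raw_buf_len - cstate->raw_buf_index;

        if (avail == 0)
        {
            /* Buffer exhausted: refill it with up to RAW_BUF_SIZE bytes. */
            cstate->raw_buf_len = CopyGetData(cstate, cstate->raw_buf,
                                              1, RAW_BUF_SIZE);
            cstate->raw_buf_index = 0;
            if (cstate->raw_buf_len == 0)
                ereport(ERROR,
                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                         errmsg("unexpected EOF in COPY data")));
            continue;
        }

        /* Copy as much as possible from the buffer to the caller's area. */
        avail = Min(avail, nbytes);
        memcpy(dest, cstate->raw_buf + cstate->raw_buf_index, avail);
        cstate->raw_buf_index += avail;
        dest += avail;
        nbytes -= avail;
    }
}

The places that currently fread() the field count, field size and field data
would call such a helper instead, so most reads are satisfied from memory.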

Attaching a patch, implementing the above solution for binary format files.

Below is the improvement gained:

 total exec time (sec)   total fread() time (sec)   fread() call count
 75.757                  2.73                       160884
 75.351                  2.742                      160884

Execution is about 1.36x faster, fread() time is reduced by 87%, and the
fread() call count is reduced by 99%.

I request the community to take this patch up for review, if the approach and
the improvement seem beneficial.

Any suggestions to improve further are most welcome.

Attached also is the config file used for testing the above use case.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Postgres configuration used for above testing:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off

System Configuration:
RAM: 503GB
Disk Type:   SSD
Disk Size:   1.6TB
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             8
NUMA node(s):          8
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 47
Model name:            Intel(R) Xeon(R) CPU E7- 8830  @ 2.13GHz
Stepping:              2
CPU MHz:               1064.000
CPU max MHz:           2129.
CPU min MHz:           1064.
BogoMIPS:              4266.62
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              24576K


v1-0001-Performance-Improvement-For-Copy-From-Binary-File.patch
Description: Binary data


Re: POC: postgres_fdw insert batching

2020-06-28 Thread Amit Langote
Hi Tomas,

On Mon, Jun 29, 2020 at 12:10 AM Tomas Vondra
 wrote:
>
> Hi,
>
> One of the issues I'm fairly regularly reminded by users/customers is
> that inserting into tables sharded using FDWs is rather slow. We do
> even get it reported on pgsql-bugs from time to time [1].
>
> Some of the slowness / overhead is expected, due to the latency between
> machines in the sharded setup. Even just 1ms latency will make it way
> more expensive than a single instance.
>
> But let's do a simple experiment, comparing a hash-partitioned table with
> regular partitions and one with FDW partitions in the same instance. Scripts to
> run this are attached. The duration of inserting 1M rows to this table
> (average of 10 runs on my laptop) looks like this:
>
>regular: 2872 ms
>FDW: 64454 ms
>
> Yep, it's ~20x slower, on a setup with ping latency well below 0.05ms.
> Imagine how it would look on sharded setups with 0.1ms or 1ms latency,
> which is probably where most single-DC clusters are :-(
>
> Now, the primary reason why the performance degrades like this is that
> while FDW has batching for SELECT queries (i.e. we read larger chunks of
> data from the cursors), we don't have that for INSERTs (or other DML).
> Every time you insert a row, it has to go all the way down into the
> partition synchronously.
>
> For some use cases this may be reduced by having many independent
> connections from different users, so the per-user latency is higher but
> acceptable. But if you need to import larger amounts of data (say, a CSV
> file for analytics, ...) this may not work.
>
> Some time ago I wrote an ugly PoC adding batching, just to see how far
> would it get us, and it seems quite promising - results for the same
> INSERT benchmarks look like this:
>
> FDW batching: 4584 ms
>
> So, rather nice improvement, I'd say ...

Very nice indeed.

> Before I spend more time hacking on this, I have a couple open questions
> about the design, restrictions etc.

I think you may want to take a look at this recent proposal by Andrey Lepikhov:

* [POC] Fast COPY FROM command for the table with foreign partitions *
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: Resetting spilled txn statistics in pg_stat_replication

2020-06-28 Thread Masahiko Sawada
On Fri, 26 Jun 2020 at 17:53, Amit Kapila  wrote:
>
> On Fri, Jun 26, 2020 at 11:31 AM Masahiko Sawada
>  wrote:
> >
> > On Thu, 25 Jun 2020 at 19:35, Amit Kapila  wrote:
> > >
> > > On Tue, Jun 23, 2020 at 6:39 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Tue, Jun 23, 2020 at 3:48 PM Tomas Vondra
> > > >  wrote:
> > > > >
> > > > > On Tue, Jun 23, 2020 at 10:58:18AM +0530, Amit Kapila wrote:
> > > > > >On Tue, Jun 23, 2020 at 9:32 AM Masahiko Sawada
> > > > > > wrote:
> > > > > >>
> > > > > >> On Sun, 21 Jun 2020 at 06:57, Tomas Vondra 
> > > > > >>  wrote:
> > > > > >> >
> > > > > >> > >
> > > > > >> > >What if the decoding has been performed by multiple backends 
> > > > > >> > >using the
> > > > > >> > >same slot?  In that case, it will be difficult to make the 
> > > > > >> > >judgment
> > > > > >> > >for the value of logical_decoding_work_mem based on stats.  It 
> > > > > >> > >would
> > > > > >> > >make sense if we provide a way to set logical_decoding_work_mem 
> > > > > >> > >for a
> > > > > >> > >slot but not sure if that is better than what we have now.
> > > > > >> > >
> > > > > >>
> > > > > >> I thought that the stats are relevant to what
> > > > > >> logical_decoding_work_mem value was but not with who performed 
> > > > > >> logical
> > > > > >> decoding. So even if multiple backends perform logical decoding 
> > > > > >> using
> > > > > >> the same slot, the user can directly use stats as long as
> > > > > >> logical_decoding_work_mem value doesn’t change.
> > > > > >>
> > >
> > > Today, I thought about it again, and if we consider the point that
> > > logical_decoding_work_mem value doesn’t change much then having the
> > > stats at slot-level would also allow computing
> > > logical_decoding_work_mem based on stats.  Do you think it is a
> > > reasonable assumption that users won't change
> > > logical_decoding_work_mem for different processes (WALSender, etc.)?
> >
> > FWIW, if we use logical_decoding_work_mem as a threshold of starting
> > of sending changes to a subscriber, I think there might be use cases
> > where the user wants to set different logical_decoding_work_mem values
> > to different wal senders. For example, setting a lower value to
> > minimize the latency of synchronous logical replication to a near-site
> > whereas setting a large value to minimize the amount of data sent to a
> > far site.
> >
>
> How does setting a large value minimize the amount of data sent?
> One possibility is if there are a lot of transaction aborts and
> transactions are not large enough that they cross
> logical_decoding_work_mem threshold but such cases shouldn't be many.

Yeah, this is what I meant.

I agree that it would not be a common case that the user sets
different values for different processes. Based on that assumption, I
also think having the stats at slot-level is a good idea. But I might
want to have the reset function.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[Bug fix]There is the case archive_timeout parameter is ignored after recovery works.

2020-06-28 Thread higuchi.dais...@fujitsu.com
Hi,

I found a bug related to the archive_timeout parameter:
there is a case where archive_timeout is ignored after recovery has run.

[Problem]
When the value of archive_timeout is smaller than that of checkpoint_timeout
and recovery has run, archive_timeout is ignored for the first WAL archiving.
Once a WAL segment has been archived, archive_timeout is honored from then on.

I attached a simple script for reproducing this problem on version 12.
I confirmed it on PostgreSQL 10, 11 and 12, and I think the other supported
versions have this problem as well.

[Investigation]
In CheckpointerMain(), the time (cur_timeout) to wait on WaitLatch is
calculated as follows.

-
now = (pg_time_t) time(NULL);
elapsed_secs = now - last_checkpoint_time;
if (elapsed_secs >= CheckPointTimeout)
    continue;           /* no sleep for us ... */
cur_timeout = CheckPointTimeout - elapsed_secs;
if (XLogArchiveTimeout > 0 && !RecoveryInProgress())
{
    elapsed_secs = now - last_xlog_switch_time;
    if (elapsed_secs >= XLogArchiveTimeout)
        continue;       /* no sleep for us ... */
    cur_timeout = Min(cur_timeout, XLogArchiveTimeout - elapsed_secs);
}

(void) WaitLatch(MyLatch,
                 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                 cur_timeout * 1000L /* convert to ms */ ,
                 WAIT_EVENT_CHECKPOINTER_MAIN);
-

Currently, cur_timeout is set according to checkpoint_timeout only while
recovery is in progress. I think that even during recovery, cur_timeout should
be calculated from archive_timeout as well as checkpoint_timeout.
I attached the patch to solve this problem.
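
To make the idea concrete, the calculation would look roughly as below. This
is only a sketch of the proposal as described above; the attached
archive_timeout.patch is the authoritative change, and the forced segment
switch itself is of course still performed only when not in recovery.

/* Sketch of the proposed cur_timeout calculation, not the patch itself. */
cur_timeout = CheckPointTimeout - elapsed_secs;
if (XLogArchiveTimeout > 0)
{
    elapsed_secs = now - last_xlog_switch_time;
    if (elapsed_secs >= XLogArchiveTimeout)
    {
        if (!RecoveryInProgress())
            continue;   /* no sleep for us ... */
        /* in recovery: make sure we re-check soon after recovery ends */
        cur_timeout = Min(cur_timeout, XLogArchiveTimeout);
    }
    else
        cur_timeout = Min(cur_timeout, XLogArchiveTimeout - elapsed_secs);
}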

Regards,
Daisuke, Higuchi


archive_timeout_test.sh
Description: archive_timeout_test.sh


archive_timeout.patch
Description: archive_timeout.patch


Re: Fwd: PostgreSQL: WolfSSL support

2020-06-28 Thread Felix Lechner
Hi Jonah,

On Sat, Jun 27, 2020 at 12:35 PM Jonah H. Harris  wrote:
>
> Somewhere, I recall seeing an open-source OpenSSL compatibility wrapper for 
> WolfSSL. Assuming that still exists, this patch seems entirely unnecessary.

The patch uses the OpenSSL compatibility layer.

Kind regards
Felix Lechner




Re: Fwd: PostgreSQL: WolfSSL support

2020-06-28 Thread Felix Lechner
Hi Tom,

On Sat, Jun 27, 2020 at 11:52 AM Tom Lane  wrote:
>
> The configure
> script could add -I/usr/include/wolfssl (or wherever those files
> are) to CPPFLAGS instead of touching all those #includes.

That does not work well when OpenSSL's development files are
installed. I did not think a segmentation fault was a good way to make
friends.

> However, as long as
> it's available on GPL terms, I don't see a problem with it
> [wolfSSL] being one alternative.

A minimal patch against -13 is on its way.

Kind regards
Felix Lechner




Re: Fwd: PostgreSQL: WolfSSL support

2020-06-28 Thread Felix Lechner
Hi Tom,

On Sat, Jun 27, 2020 at 7:56 AM Tom Lane  wrote:
>
> However, judging from the caveats mentioned in the initial message,
> my inclination would be to wait awhile for wolfSSL to mature.

Please have a closer look. The library has been around since 2004 and
is popular in embedded systems. (It was previously known as cyaSSL.)
If you bought a car or an appliance in the past ten years, you may be
using it already.

wolfSSL's original claim to fame was that MySQL relied on it (I think
Oracle changed that). MariaDB still bundles an older, captive version.
The software is mature, and widely deployed.

Kind regards
Felix Lechner




Re: PostgreSQL: WolfSSL support

2020-06-28 Thread Felix Lechner
Hi Greg,

On Sun, Jun 28, 2020 at 6:40 AM Greg Stark  wrote:
>
> I'm more interested in a library like Amazon's

Does S2N support TLS 1.3?

https://github.com/awslabs/s2n/issues/388

Kind regards
Felix Lechner




Re: pgsql: Enable Unix-domain sockets support on Windows

2020-06-28 Thread Amit Kapila
On Sun, Jun 28, 2020 at 2:03 PM Peter Eisentraut
 wrote:
>
> On 2020-06-27 13:57, Amit Kapila wrote:
> > BTW, in which
> > format the path needs to be specified for unix_socket_directories?  I
> > tried with '/c/tmp', 'c:/tmp', 'tmp' but nothing seems to be working,
> > it gives me errors like: "could not create lock file
> > "/c/tmp/.s.PGSQL.5432.lock": No such file or directory" on server
> > start.  I am trying this on Win7 just to check what is the behavior of
> > this feature on it.
>
> Hmm, the only thing I remember about this now is that you need to use
> native Windows paths, meaning you can't just use /tmp under MSYS, but it
> needs to be something like C:\something.
>

I have tried giving something like that.
After giving path as unix_socket_directories = 'C:\\akapila', I get
below errors on server start:
2020-06-29 08:19:13.174 IST [4460] LOG:  could not create Unix socket
for address "C:/akapila/.s.PGSQL.5432": An address incompatible with
the requested protocol was used.
2020-06-29 08:19:13.205 IST [4460] WARNING:  could not create
Unix-domain socket in directory "C:/akapila"
2020-06-29 08:19:13.205 IST [4460] FATAL:  could not create any
Unix-domain sockets
2020-06-29 08:19:13.221 IST [4460] LOG:  database system is shut down

After giving path as unix_socket_directories = 'C:\akapila', I get
below errors on server start:
2020-06-29 08:24:11.861 IST [4808] FATAL:  could not create lock file
"C:akapila/.s.PGSQL.5432.lock": No such file or directory
2020-06-29 08:24:11.877 IST [4808] LOG:  database system is shut down

>  But the error you have there
> is not even about the socket file but about the lock file, which is a
> normal file, so if that goes wrong, it might be an unrelated problem.
>

Yeah, but as I am trying this on a Win7 machine, I was expecting an
error similar to what you mentioned: "unsupported address family
...". It seems this error occurred after passing that phase.  I am not
sure what is going on here, and maybe it is not important because we
don't support this feature on Win7, but an appropriate error message
would have been good.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




estimation problems for DISTINCT ON with FDW

2020-06-28 Thread Jeff Janes
If I use the attached sql file to set up the database with loop-back
postgres_fdw, and then turn on use_remote_estimate for this query:

select distinct on (id) id, z from fgn.priority order by id, priority desc, z

It issues two queries for the foreign estimate, one with a sort and one
without:

EXPLAIN SELECT id, priority, z FROM public.priority

EXPLAIN SELECT id, priority, z FROM public.priority ORDER BY id ASC NULLS
LAST, priority DESC NULLS FIRST, z ASC NULLS LAST

It doesn't cost out the plan of pushing the DISTINCT ON down to the foreign
side, which is probably the best way to run the query.  I guess it makes
sense that FDW machinery in general doesn't want to try to push a
PostgreSQL specific construct.

But much worse than that, it horribly misestimates the number of unique rows
it will get back, having never asked the remote side for an estimate of
that.

 Result  (cost=100.51..88635.90 rows=1 width=16)
   ->  Unique  (cost=100.51..88635.90 rows=1 width=16)
         ->  Foreign Scan on priority  (cost=100.51..86135.90 rows=100 width=16)

Where does it come up with the idea that these 1,000,000 rows will
DISTINCT/Unique down to just 1 row?   I can't find the place in the code
where that happens.  I suspect it is happening somewhere in the core code
based on data fed into it by postgres_fdw, not in postgres_fdw itself.

This leads to horrible plans if the DISTINCT ON is actually in a subquery
which is joined to other tables, for example.

If you don't use the remote estimates, it at least comes up with a roughly
sane estimate of 200 distinct rows, which is enough to inhibit selection of
the worst plans. Why does an uninformative remote estimate do so much worse
than no remote estimate at all?

Of course I could just disable remote estimates for this table, but then
other queries that use the table without DISTINCT ON suffer.  Another
solution is to ANALYZE the foreign table, but that opens up a can of worms
of its own.

I see this behavior in all supported or in-development versions.

Cheers,

Jeff
create table priority as select floor(random()*100)::int as id, 
floor(random()*10)::int as priority, random() as z from 
generate_series(1,100)f(id) order by random();
vacuum ANALYZE priority;
create index on priority (id, priority,z);
create extension postgres_fdw ;
create server fdw foreign data wrapper postgres_fdw;
create user MAPPING FOR CURRENT_USER SERVER fdw ;
create schema fgn;
import foreign schema public from server fdw into fgn;
explain select distinct on (id) id, z from fgn.priority order by id, priority 
desc,z;
alter foreign table fgn.priority options (use_remote_estimate 'true');
explain select distinct on (id) id, z from fgn.priority order by id, priority 
desc,z;


Re: Creating a function for exposing memory usage of backend process

2020-06-28 Thread torikoshia

On 2020-06-20 03:11, Robert Haas wrote:

On Wed, Jun 17, 2020 at 11:56 PM Fujii Masao
 wrote:

> As a first step, to deal with (3) I'd like to add
> pg_stat_get_backend_memory_context() which target is limited to the
> local backend process.

+1


+1 from me, too.


Attached is a patch that adds a function exposing the memory usage of the
local backend.

It's almost the same as pg_cheat_funcs's pg_stat_get_memory_context().
I've also added the MemoryContext identifier, because it seems useful for
distinguishing memory contexts of the same kind.

For example, when there are many prepared statements, we can
distinguish them using the identifier, which shows the cached query.

  =# SELECT name, ident FROM pg_stat_get_memory_contexts()
     WHERE name = 'CachedPlanSource';

          name       |             ident
   ------------------+---------------------------------
    CachedPlanSource | PREPARE q1(text) AS SELECT ..;
    CachedPlanSource | PREPARE q2(text) AS SELECT ..;
   (2 rows)



Something that exposed this via shared memory would
be quite useful, and there are other things we'd like to expose
similarly, such as the plan(s) from the running queries, or even just
the untruncated query string(s). I'd like to have a good
infrastructure for that sort of thing, but I think it's quite tricky
to do properly. You need a variable-size chunk of shared memory, so
probably it has to use DSM somehow, and you need to update it as
things change, but if you update it too frequently performance will
stink. If you ping processes to do the updates, how do you know when
they've completed them, and what happens if they don't respond in a
timely fashion? These are probably all solvable problems, but I don't
think they are very easy ones.


Thanks for your comments!

It seems hard as you pointed out.
I'm going to consider it along with the advice of Fujii and Kasahara.

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From 055af903a3dbf146d97dd3fb01a6a7d3d3bd2ae0 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 29 Jun 2020 07:48:49 +0900
Subject: [PATCH] Add a function exposing memory usage of local backend.

Backend processes sometimes use a lot of memory because of various
reasons like caches, prepared statements and cursors.

Previously, the only way to examine the memory usage of a backend
process was to attach a debugger and call MemoryContextStats(), which
is not so convenient in some cases.

This patch implements a new SQL-callable function
pg_stat_get_memory_contexts() which exposes memory usage of the
local backend.
---
 doc/src/sgml/monitoring.sgml| 14 +
 src/backend/postmaster/pgstat.c | 80 +
 src/backend/utils/adt/pgstatfuncs.c | 45 
 src/include/catalog/pg_proc.dat |  9 
 src/include/pgstat.h|  6 ++-
 5 files changed, 153 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dfa9d0d641..74c8663981 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4617,6 +4617,20 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i

   
 
+  
+   
+
+ pg_stat_get_memory_contexts
+
+pg_stat_get_memory_contexts ()
+setof record
+   
+   
+Returns records of information about all memory contexts of the
+local backend.
+   
+  
+
   

 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index c022597bc0..140f111654 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -62,6 +62,7 @@
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "utils/ascii.h"
+#include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -167,6 +168,13 @@ static const char *const slru_names[] = {
  */
 static PgStat_MsgSLRU SLRUStats[SLRU_NUM_ELEMENTS];
 
+/* --
+ * The max bytes for showing identifiers of MemoryContext.
+ * --
+ */
+#define MEMORY_CONTEXT_IDENT_SIZE	1024
+
+
 /* --
  * Local data
  * --
@@ -2691,6 +2699,78 @@ pgstat_fetch_slru(void)
 	return slruStats;
 }
 
+void
+pgstat_put_memory_contexts_tuplestore(Tuplestorestate *tupstore,
+  TupleDesc tupdesc, MemoryContext context,
+  MemoryContext parent, int level)
+{
+#define PG_STAT_GET_MEMORY_CONTEXT_COLS	9
+	Datum		values[PG_STAT_GET_MEMORY_CONTEXT_COLS];
+	bool		nulls[PG_STAT_GET_MEMORY_CONTEXT_COLS];
+	MemoryContextCounters stat;
+	MemoryContext child;
+	const char *name = context->name;
+	const char *ident = context->ident;
+
+	if (context == NULL)
+		return;
+
+	/*
+	 * It seems preferable to label dynahash contexts with just the hash table
+	 * name.  Those are already unique enough, so the "dynahash" part isn't
+	 * very helpful, and this way is more consistent with pre-v11 practice.
+	 */
+	if (ident && strcmp(name, "dynahash") 

Re: Default setting for enable_hashagg_disk

2020-06-28 Thread Peter Geoghegan
On Sat, Jun 27, 2020 at 3:41 AM Tomas Vondra
 wrote:
> Well, there are multiple ideas discussed in this (sub)thread, one of
> them being a per-query memory limit. That requires decisions how much
> memory should different nodes get, which I think would need to be
> cost-based.

A design like that probably makes sense. But it's way out of scope for
Postgres 13, and not something that should be discussed further on
this thread IMV.

> Of course, a simpler scheme like this would not require that. And maybe
> introducing hash_mem is a good idea - I'm not particularly opposed to
> that, actually. But I think we should not introduce separate memory
> limits for each node type, which was also mentioned earlier.

I had imagined that hash_mem would apply to hash join and hash
aggregate only. A GUC that either represents a multiple of work_mem,
or an absolute work_mem-style KB value.
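
To make that concrete, the kind of thing I have in mind is roughly the
following. This is only a sketch; the GUC and helper names are illustrative,
nothing has been written yet.

#include "postgres.h"
#include "miscadmin.h"          /* work_mem */
#include "utils/guc.h"          /* MAX_KILOBYTES */

/* Illustrative GUC: hash-based nodes get work_mem times this multiplier. */
double      hash_mem_multiplier = 2.0;

/*
 * Hash join and hash aggregate would call this where they currently read
 * work_mem directly; every other node type keeps using work_mem as before.
 */
int
get_hash_mem(void)
{
    double      hash_mem = (double) work_mem * hash_mem_multiplier;

    /* Clamp to the same upper bound that work_mem itself allows. */
    return (int) Min(hash_mem, (double) MAX_KILOBYTES);
}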

> The problem of course is that hash_mem does not really solve the issue
> discussed at the beginning of this thread, i.e. regressions due to
> underestimates and unexpected spilling at execution time.

Like Andres, I suspect that that's a smaller problem in practice. A
hash aggregate that spills often has performance characteristics
somewhat like a group aggregate + sort, anyway. I'm worried about
cases where an *in-memory* hash aggregate is naturally far far faster
than other strategies, and yet we can't use it -- despite the fact
that Postgres 12 could "safely" do so. (It probably doesn't matter
whether the slow plan that you get in Postgres 13 is a hash aggregate
that spills, or something else -- this is not really a costing
problem.)

Besides, hash_mem *can* solve that problem to some extent. Other cases
(cases where the estimates are so bad that hash_mem won't help) seem
like less of a concern to me. To some extent, that's the price you pay
to avoid the risk of an OOM.

> The thread is getting a rather confusing mix of proposals how to fix
> that for v13 and proposals how to improve our configuration of memory
> limits :-(

As I said to Amit in my last message, I think that all of the ideas
that are worth pursuing involve giving hash aggregate nodes license to
use more memory than other nodes. One variant involves doing so only
at execution time, while the hash_mem idea involves formalizing and
documenting that hash-based nodes are special -- and taking that into
account during both planning and execution.

> Interesting. What is not entirely clear to me how do these databases
> decide how much should each node get during planning. With the separate
> work_mem-like settings it's fairly obvious, but how do they do that with
> the global limit (either per-instance or per-query)?

I don't know, but that seems like a much more advanced way of
approaching the problem. It isn't in scope here.

Perhaps I'm not considering some unintended consequence of the planner
giving hash-based nodes extra memory "for free" in the common case
where hash_mem exceeds work_mem (by 2x, say). But my guess is that
that won't be a significant problem that biases the planner in some
obviously undesirable way.

-- 
Peter Geoghegan




Re: ModifyTable overheads in generic plans

2020-06-28 Thread David Rowley
On Sat, 27 Jun 2020 at 00:36, Amit Langote  wrote:
> 2. ExecCheckRTPerms(): checks permissions of *all* partitions before
> executing the plan tree, but maybe it's okay to check only the ones
> that will be accessed

I don't think it needs to be quite as complex as that.
expand_single_inheritance_child will set the
RangeTblEntry.requiredPerms to 0, so we never need to check
permissions on a partition.  The overhead of permission checking when
there are many partitions is just down to the fact that
ExecCheckRTPerms() loops over the entire rangetable and calls
ExecCheckRTEPerms for each one.  ExecCheckRTEPerms() does have very
little work to do when requiredPerms is 0, but the loop itself and the
function call overhead show up when you remove the other bottlenecks.

I have a patch somewhere that just had the planner add the RT indexes
with a non-zero requiredPerms and set that in the plan, so that
ExecCheckRTPerms could just look at the ones that actually needed
something checked.  There's a slight disadvantage there in that, for
queries against non-partitioned tables, we need to build a Bitmapset
that has all items from the rangetable.  That's likely a small
overhead, but not free, so perhaps there is a better way.
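
The shape of it was roughly as follows (sketching from memory, not the actual
patch; permRelids is a made-up name for the Bitmapset the planner would set in
the plan):

/*
 * Sketch only.  The planner adds the rangetable index of each RTE whose
 * requiredPerms is non-zero to stmt->permRelids (hypothetical field), so
 * the executor no longer has to loop over the whole rangetable.
 */
static bool
ExecCheckPlanPerms(PlannedStmt *stmt, List *rangeTable)
{
    int         rti = -1;

    while ((rti = bms_next_member(stmt->permRelids, rti)) >= 0)
    {
        RangeTblEntry *rte = rt_fetch(rti, rangeTable);

        if (!ExecCheckRTEPerms(rte))
            return false;       /* caller raises the ACL error as today */
    }

    return true;
}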

David




Re: Fwd: PostgreSQL: WolfSSL support

2020-06-28 Thread Michael Paquier
On Sun, Jun 28, 2020 at 10:18:12AM +0200, Peter Eisentraut wrote:
> We have added support for allegedly-OpenSSL compatible libraries such as
> LibreSSL before, so some tweaks for wolfSSL would seem acceptable. However,
> I doubt we are going to backpatch them, so unless you want to take
> responsibility for that as a packager, it's not really going to help anyone
> soon.

That's a new feature to me.

> And OpenSSL 3.0.0 will have a new license, so for the next PostgreSQL
> release, this problem might be gone.

And there is this part too to consider, but I am no lawyer.

@@ -131,11 +131,11 @@ typedef union {
 #ifdef WOLFSSL_SHA3
  wc_Sha3 sha3;
 #endif
-} Hash;
+} WolfSSLHash;
[...]
 #endif
 #if !defined(XVALIDATE_DATE) && !defined(HAVE_VALIDATE_DATE)
 #define USE_WOLF_VALIDDATE
-#define XVALIDATE_DATE(d, f, t) ValidateDate((d), (f), (t))
+#define XVALIDATE_DATE(d, f, t) WolfSSLValidateDate((d), (f), (t))
 #endif
Looking at the patches, it seems to me that the part applying only to
WolfSSL should be done anyway, at least for the Hash part, which is a
rather generic name.  It may also be better to do something on the
Postgres side for the plan node of the same name to avoid conflicts,
but that's something old enough that it could vote (1054097).
ValidateTime() is present in the Postgres tree since f901bb5, but it
is always annoying to break stuff that could be used by external
plugins...

Regarding the Postgres part of the WIP, the hard part is that we need
more thinking about the refactoring bits, so that people compiling
Postgres can choose between OpenSSL and something else.  And as Tom
mentioned upthread, there is no need for that:
-#include 
-#include 
-#include 
+#include 
+#include 
+#include 
+#include 

./configure should just append the correct path with -I.

-   my_bio_methods->bread = my_sock_read;
-   my_bio_methods->bwrite = my_sock_write;
+   my_bio_methods->readCb = my_sock_read;
+   my_bio_methods->writeCb = my_sock_write;
These parts could also be consolidated between OpenSSL and WolfSSL?

-   dh = PEM_read_DHparams(fp, NULL, NULL, NULL);
FreeFile(fp);
+   return NULL;
This part is not acceptable as-is.  As a proof of concept, that's
fine of course.
--
Michael


signature.asc
Description: PGP signature


Re: ModifyTable overheads in generic plans

2020-06-28 Thread Amit Langote
On Fri, Jun 26, 2020 at 9:36 PM Amit Langote  wrote:
> I would like to discuss a refactoring patch that builds on top of the
> patches at [1] to address $subject.

I've added this to the next CF: https://commitfest.postgresql.org/28/2621/

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: Default setting for enable_hashagg_disk

2020-06-28 Thread Peter Geoghegan
On Sat, Jun 27, 2020 at 3:00 AM Amit Kapila  wrote:
> I think the advantage of delaying it is that we
> might see some real problems (like where hash aggregate is not a good
> choice) which can be fixed via the costing model.

I think any problem that might come up with the costing is best
thought of as a distinct problem. This thread is mostly about the
problem of users getting fewer in-memory hash aggregates compared to a
previous release running the same application (though there has been
some discussion of the other problem, too [1], but it's thought to be
less serious).

The problem is that affected users were theoretically never entitled
to the performance they came to rely on, and yet there is good reason
to think that hash aggregate really should be entitled to more memory.
They won't care that they were theoretically never entitled to that
performance, though -- they *liked* the fact that hash agg could
cheat. And they'll dislike the fact that this cannot be corrected by
tuning work_mem, since that affects all node types that consume
work_mem, not just hash aggregate -- that could cause OOMs for them.

There are two or three similar ideas under discussion that might fix
the problem. They all seem to involve admitting that hash aggregate's
"cheating" might actually have been a good thing all along (even
though giving hash aggregate much much more memory than other nodes is
terrible), and giving hash aggregate license to "cheat openly". Note
that the problem isn't exactly a problem with the hash aggregate
spilling patch. You could think of the problem as a pre-existing issue
-- a failure to give more memory to hash aggregate, which really
should be entitled to more memory. Jeff's patch just made the issue
more obvious.

[1] https://postgr.es/m/20200624191433.5gnqgrxfmucex...@alap3.anarazel.de
-- 
Peter Geoghegan




Re: Commitfest 2020-07

2020-06-28 Thread Michael Paquier
On Mon, Jun 29, 2020 at 08:04:35AM +0800, Andy Fan wrote:
> Thanks for volunteering!

+1.
--
Michael


signature.asc
Description: PGP signature


Re: Commitfest 2020-07

2020-06-28 Thread Andy Fan
On Mon, Jun 29, 2020 at 2:50 AM Magnus Hagander  wrote:

> On Sun, Jun 28, 2020 at 4:49 PM Tom Lane  wrote:
>
>> Daniel Gustafsson  writes:
>> > In a few days, the first commitfest of the 14 cycle - 2020-07 - will
>> start.
>> > Unless anyone has already spoken up that I've missed, I'm happy to
>> volunteer to
>> > run CFM for this one.
>>
>> No one has volunteered that I recall, so the baton is yours.
>>
>>
> Enjoy!
>
> //Magnus
>
>

Thanks for volunteering!

-- 
Best Regards
Andy Fan


Re: pg_read_file() with virtual files returns empty string

2020-06-28 Thread Tom Lane
Joe Conway  writes:
> All good stuff -- I believe the attached checks all the boxes.

Looks okay to me, except I think you want

!   if (bytes_to_read > 0)

to be

!   if (bytes_to_read >= 0)

As it stands, a zero request will be treated like -1 (read all the
rest of the file) while ISTM it ought to be an expensive way to
read zero bytes --- perhaps useful to check the filename and seek
offset validity?

> The intention here seems to be that if you pass bytes_to_read = -1 with a
> negative offset, it will give you the last offset bytes of the file.

I think it's just trying to convert bytes_to_read = -1 into an explicit
positive length-to-read in all cases.  We don't need that anymore with
this code, so dropping it is fine.

regards, tom lane




Revert workarounds for unportability of printf %.*s format?

2020-06-28 Thread Tom Lane
In connection with the discussion at [1], I realized that we could unwind
the hacks we've introduced --- mostly in commit 54cd4f045 --- to avoid
depending on the behavior of %.*s format in printf.  Now that we always
use our own snprintf.c code, we know that it measures field widths in
bytes not characters, and we also know that use of this format won't
cause random encoding-related failures.

Some of the changes are not worth undoing; for example using strlcpy
instead of snprintf to truncate a string is a net win by any measure.
But places where we introduced a temporary buffer, such as the
change in truncate_identifier() in 54cd4f045, would be better off
with the old coding.  In any case we could remove all the comments
warning against using this feature.
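
For instance, coding like the following is now perfectly safe, since our
snprintf counts the %.*s precision in bytes ("cp" here just stands for a
pointer into the input string):

/* Print exactly one (possibly multibyte) character from the input. */
ereport(ERROR,
        (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
         errmsg("invalid character \"%.*s\" in input string",
                pg_mblen(cp), cp)));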

Thoughts?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/a120087c-4c88-d9d4-1ec5-808d7a7f133d%40gmail.com




Re: Commitfest 2020-07

2020-06-28 Thread Magnus Hagander
On Sun, Jun 28, 2020 at 4:49 PM Tom Lane  wrote:

> Daniel Gustafsson  writes:
> > In a few days, the first commitfest of the 14 cycle - 2020-07 - will
> start.
> > Unless anyone has already spoken up that I've missed, I'm happy to
> volunteer to
> > run CFM for this one.
>
> No one has volunteered that I recall, so the baton is yours.
>
>
Enjoy!

//Magnus


Re: new heapcheck contrib module

2020-06-28 Thread Mark Dilger



> On Jun 28, 2020, at 9:05 AM, Dilip Kumar  wrote:
> 
> Some more comments on v9_0001.
> 1.
> + LWLockAcquire(XidGenLock, LW_SHARED);
> + nextFullXid = ShmemVariableCache->nextFullXid;
> + ctx.oldestValidXid = ShmemVariableCache->oldestXid;
> + LWLockRelease(XidGenLock);
> + ctx.nextKnownValidXid = XidFromFullTransactionId(nextFullXid);
> ...
> ...
> +
> + for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)
> + {
> + int32 mapbits;
> + OffsetNumber maxoff;
> + PageHeader ph;
> +
> + /* Optionally skip over all-frozen or all-visible blocks */
> + if (skip_all_frozen || skip_all_visible)
> + {
> + mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
> +);
> + if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
> + continue;
> + if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
> + continue;
> + }
> +
> + /* Read and lock the next page. */
> + ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
> + RBM_NORMAL, ctx.bstrategy);
> + LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
> 
> I might be missing something, but it appears that first we are getting
> the nextFullXid and after that, we are scanning block by block.
> So while we are scanning the blocks, if the nextXid is advanced and it
> has updated some tuple in the heap pages, then it seems the current
> logic will complain about an out-of-range xid.  I did not test this
> behavior, so please point me to the logic which is protecting this.

We know the oldest valid Xid cannot advance, because we hold a lock that would 
prevent it from doing so.  We cannot know that the newest Xid will not advance, 
but when we see an Xid beyond the end of the known valid range, we check its 
validity, and either report it as a corruption or advance our idea of the 
newest valid Xid, depending on that check.  That logic is in 
TransactionIdValidInRel.
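
In rough pseudo-C, the shape of that logic is something like the following (a
sketch, not the exact code; the context struct and field names are as I recall
them from the patch):

static bool
xid_valid_in_rel(HeapCheckContext *ctx, TransactionId xid)
{
    /* The oldest bound cannot move while we hold our lock on the relation. */
    if (TransactionIdPrecedes(xid, ctx->oldestValidXid))
        return false;               /* corrupt: precedes oldest valid xid */

    if (TransactionIdFollowsOrEquals(xid, ctx->nextKnownValidXid))
    {
        FullTransactionId nextFullXid;

        /* The newest bound can advance, so re-read it before complaining. */
        LWLockAcquire(XidGenLock, LW_SHARED);
        nextFullXid = ShmemVariableCache->nextFullXid;
        LWLockRelease(XidGenLock);
        ctx->nextKnownValidXid = XidFromFullTransactionId(nextFullXid);

        if (TransactionIdFollowsOrEquals(xid, ctx->nextKnownValidXid))
            return false;           /* corrupt: still in the future */
    }

    return true;
}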

> 2.
> /*
> * Helper function to construct the TupleDesc needed by verify_heapam.
> */
> static TupleDesc
> verify_heapam_tupdesc(void)
> 
> From function name, it appeared that it is verifying tuple descriptor
> but this is just creating the tuple descriptor.

In amcheck--1.2--1.3.sql we define a function named verify_heapam which returns 
a set of records.  This is the tuple descriptor for that function.  I 
understand that the name can be parsed as verify_(heapam_tupdesc), but it is 
meant as (verify_heapam)_tupdesc.  Do you have a name you would prefer?

> 3.
> + /* Optionally skip over all-frozen or all-visible blocks */
> + if (skip_all_frozen || skip_all_visible)
> + {
> + mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
> +);
> + if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
> + continue;
> + if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
> + continue;
> + }
> 
> Here, do we want to test that in VM the all visible bit is set whereas
> on the page it is not set?  That can lead to a wrong result in an
> index-only scan.

If the caller has specified that the corruption check should skip over 
all-frozen or all-visible data, then we cannot load the page that the VM claims 
is all-frozen or all-visible without defeating the purpose of the caller having 
specified these options.  Without loading the page, we cannot check the page's 
header bits.

When not skipping all-visible or all-frozen blocks, we might like to pin both 
the heap page and the visibility map page in order to compare the two, being 
careful not to hold a pin on the one while performing I/O on the other.  See 
for example the logic in heap_delete().  But I'm not sure what guarantees the 
system makes about agreement between these two bits.  Certainly, the VM should 
not claim a page is all visible when it isn't, but are we guaranteed that a 
page that is all-visible will always have its all-visible bit set?  I don't 
know if (possibly transient) disagreement between these two bits constitutes 
corruption.  Perhaps others following this thread can advise?

> 4. One cosmetic comment
> 
> + /* Skip non-varlena values, but update offset first */
> ..
> +
> + /* Ok, we're looking at a varlena attribute. */
> 
> Throughout the patch, I have noticed that some of your single-line
> comments have "full stop" whereas other don't.  Can we keep them
> consistent?

I try to use a "full stop" at the end of sentences, but not at the end of 
sentence fragments.  To me, a "full stop" means that a sentence has reached its 
conclusion.  I don't intentionally use one at the end of a fragment, unless the 
fragment precedes a full sentence, in which case the "full stop" is needed to 
separate the two.  Of course, I may have violated my own rule in a few places, 
but before I submit a v10 patch with comment punctuation changes, perhaps we 
can agree on what the rule is?  (This has probably been discussed before and 
agreed before.  A link to the appropriate email thread would be sufficient.)

For example:

/* red, green, or blue */
/* 

Re: Fwd: PostgreSQL: WolfSSL support

2020-06-28 Thread Tom Lane
Felix Lechner  writes:
> On Sat, Jun 27, 2020 at 7:56 AM Tom Lane  wrote:
>> However, judging from the caveats mentioned in the initial message,
>> my inclination would be to wait awhile for wolfSSL to mature.

> Please have a closer look. The library has been around since 2004 and
> is popular in embedded systems. (It was previously known as cyaSSL.)

I don't really care where else it's used.  If we have to hack the source
code before we can use it, it's not mature for our purposes.  Even when
(if) that requirement reduces to "you have to use the latest bleeding
edge release", it'll be problematic for people whose platforms supply
a less bleeding edge version.  I was signifying a desire to wait until
compatible versions are reasonably common in the wild before we spend
much time on this.

regards, tom lane




Re: bugfix: invalid bit/varbit input causes the log file to be unreadable

2020-06-28 Thread Tom Lane
I wrote:
> Even granting the premise, the proposed patch seems like a significant
> decrease in user-friendliness for typical cases.  I'd rather see us
> make an effort to print one valid-per-the-DB-encoding character.
> Now that we can rely on snprintf to count %s restrictions in bytes,
> I think something like this should work:
>  errmsg("\"%.*s\" is not a valid binary digit",
> pg_mblen(sp), sp)));
> But the real problem is that this is only the tip of the iceberg.
> You didn't even hit all the %c usages in varbit.c.

I went through all the %c format sequences in the backend to see which
ones could use this type of fix.  There were not as many as I'd expected,
but still a fair number.  (I skipped cases where the input was coming from
the catalogs, as well as some non-user-facing debug printouts.)  That
leads to the attached patch, which seems to do the job without breaking
anything that works today.

regards, tom lane

PS: I failed to resist the temptation to improve some shoddy error
messages nearby in pageinspect/heapfuncs.c.

diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index 60bdbea46b..b3304ff844 100644
--- a/contrib/hstore/hstore_io.c
+++ b/contrib/hstore/hstore_io.c
@@ -80,7 +80,9 @@ get_val(HSParser *state, bool ignoreeq, bool *escaped)
 			}
 			else if (*(state->ptr) == '=' && !ignoreeq)
 			{
-elog(ERROR, "Syntax error near '%c' at position %d", *(state->ptr), (int32) (state->ptr - state->begin));
+elog(ERROR, "Syntax error near \"%.*s\" at position %d",
+	 pg_mblen(state->ptr), state->ptr,
+	 (int32) (state->ptr - state->begin));
 			}
 			else if (*(state->ptr) == '\\')
 			{
@@ -219,7 +221,9 @@ parse_hstore(HSParser *state)
 			}
 			else if (!isspace((unsigned char) *(state->ptr)))
 			{
-elog(ERROR, "Syntax error near '%c' at position %d", *(state->ptr), (int32) (state->ptr - state->begin));
+elog(ERROR, "Syntax error near \"%.*s\" at position %d",
+	 pg_mblen(state->ptr), state->ptr,
+	 (int32) (state->ptr - state->begin));
 			}
 		}
 		else if (st == WGT)
@@ -234,7 +238,9 @@ parse_hstore(HSParser *state)
 			}
 			else
 			{
-elog(ERROR, "Syntax error near '%c' at position %d", *(state->ptr), (int32) (state->ptr - state->begin));
+elog(ERROR, "Syntax error near \"%.*s\" at position %d",
+	 pg_mblen(state->ptr), state->ptr,
+	 (int32) (state->ptr - state->begin));
 			}
 		}
 		else if (st == WVAL)
@@ -267,7 +273,9 @@ parse_hstore(HSParser *state)
 			}
 			else if (!isspace((unsigned char) *(state->ptr)))
 			{
-elog(ERROR, "Syntax error near '%c' at position %d", *(state->ptr), (int32) (state->ptr - state->begin));
+elog(ERROR, "Syntax error near \"%.*s\" at position %d",
+	 pg_mblen(state->ptr), state->ptr,
+	 (int32) (state->ptr - state->begin));
 			}
 		}
 		else
diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index 11a910184b..f04455da12 100644
--- a/contrib/pageinspect/heapfuncs.c
+++ b/contrib/pageinspect/heapfuncs.c
@@ -30,6 +30,7 @@
 #include "catalog/pg_am_d.h"
 #include "catalog/pg_type.h"
 #include "funcapi.h"
+#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pageinspect.h"
 #include "port/pg_bitutils.h"
@@ -99,7 +100,8 @@ text_to_bits(char *str, int len)
 		else
 			ereport(ERROR,
 	(errcode(ERRCODE_DATA_CORRUPTED),
-	 errmsg("illegal character '%c' in t_bits string", str[off])));
+	 errmsg("invalid character \"%.*s\" in t_bits string",
+			pg_mblen(str + off), str + off)));
 
 		if (off % 8 == 7)
 			bits[off / 8] = byte;
@@ -460,14 +462,13 @@ tuple_data_split(PG_FUNCTION_ARGS)
 		if (!t_bits_str)
 			ereport(ERROR,
 	(errcode(ERRCODE_DATA_CORRUPTED),
-	 errmsg("argument of t_bits is null, but it is expected to be null and %d character long",
-			bits_len)));
+	 errmsg("t_bits string must not be NULL")));
 
 		bits_str_len = strlen(t_bits_str);
 		if (bits_len != bits_str_len)
 			ereport(ERROR,
 	(errcode(ERRCODE_DATA_CORRUPTED),
-	 errmsg("unexpected length of t_bits %u, expected %d",
+	 errmsg("unexpected length of t_bits string: %u, expected %u",
 			bits_str_len, bits_len)));
 
 		/* do the conversion */
@@ -478,7 +479,7 @@ tuple_data_split(PG_FUNCTION_ARGS)
 		if (t_bits_str)
 			ereport(ERROR,
 	(errcode(ERRCODE_DATA_CORRUPTED),
-	 errmsg("t_bits string is expected to be NULL, but instead it is %zu bytes length",
+	 errmsg("t_bits string is expected to be NULL, but instead it is %zu bytes long",
 			strlen(t_bits_str))));
 	}
 
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 61d318d93c..a609d49c12 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -15,6 +15,7 @@
 
 #include 
 
+#include "mb/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
 
@@ -171,17 +172,19 @@ hex_encode(const char *src, size_t len, char 

Re: pg_read_file() with virtual files returns empty string

2020-06-28 Thread Joe Conway
On 6/27/20 3:43 PM, Tom Lane wrote:
> Joe Conway  writes:
>> The attached patch fixes this for me. I think it ought to be backpatched 
>> through
>> pg11.
> 
>> Comments?
> 
> 1. Doesn't seem to be accounting for the possibility of an error in fread().
> 
> 2. Don't we want to remove the stat() call altogether, if we're not
> going to believe its length?
> 
> 3. This bit might need to cast the RHS to int64:
>   if (bytes_to_read > (MaxAllocSize - VARHDRSZ))
> otherwise it might be treated as an unsigned comparison.
> Or you could check for bytes_to_read < 0 separately.
> 
> 4. appendStringInfoString seems like quite the wrong thing to use
> when the input is binary data.
> 
> 5. Don't like the comment.  Whether the file is virtual or not isn't
> very relevant here.
> 
> 6. If the file size exceeds 1GB, I fear we'll get some rather opaque
> failure from the stringinfo infrastructure.  It'd be better to
> check for that here and give a file-too-large error.


All good stuff -- I believe the attached checks all the boxes.

I noted while at this, that the current code can never hit this case:

!   if (bytes_to_read < 0)
!   {
!   if (seek_offset < 0)
!   bytes_to_read = -seek_offset;

The intention here seems to be that if you pass bytes_to_read = -1 with a
negative offset, it will give you the last offset bytes of the file.

But all of the SQL-exposed paths disallow an explicit negative value for
bytes_to_read. This was also not documented as far as I can tell, so I
eliminated that case in the attached. Is that actually a case I should
fix/support instead?

Separately, it seems to me that a two-argument version of pg_read_file() would
be useful:

  pg_read_file('filename', offset)

In that case bytes_to_read = -1 could be passed down in order to read the entire
file after the offset. In fact I think that would nicely handle the negative
offset case as well.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/contrib/adminpack/expected/adminpack.out b/contrib/adminpack/expected/adminpack.out
index 5738b0f..edf3ebf 100644
*** a/contrib/adminpack/expected/adminpack.out
--- b/contrib/adminpack/expected/adminpack.out
*** SELECT pg_file_rename('test_file1', 'tes
*** 79,85 
  (1 row)
  
  SELECT pg_read_file('test_file1');  -- not there
! ERROR:  could not stat file "test_file1": No such file or directory
  SELECT pg_read_file('test_file2');
   pg_read_file 
  --
--- 79,85 
  (1 row)
  
  SELECT pg_read_file('test_file1');  -- not there
! ERROR:  could not open file "test_file1" for reading: No such file or directory
  SELECT pg_read_file('test_file2');
   pg_read_file 
  --
*** SELECT pg_file_rename('test_file2', 'tes
*** 108,114 
  (1 row)
  
  SELECT pg_read_file('test_file2');  -- not there
! ERROR:  could not stat file "test_file2": No such file or directory
  SELECT pg_read_file('test_file3');
   pg_read_file 
  --
--- 108,114 
  (1 row)
  
  SELECT pg_read_file('test_file2');  -- not there
! ERROR:  could not open file "test_file2" for reading: No such file or directory
  SELECT pg_read_file('test_file3');
   pg_read_file 
  --
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index ceaa618..6363dbe 100644
*** a/src/backend/utils/adt/genfile.c
--- b/src/backend/utils/adt/genfile.c
***
*** 36,41 
--- 36,42 
  #include "utils/syscache.h"
  #include "utils/timestamp.h"
  
+ #define READBUF_SIZE	4096
  
  /*
   * Convert a "text" filename argument to C string, and check it's allowable.
*** read_binary_file(const char *filename, i
*** 106,138 
   bool missing_ok)
  {
  	bytea	   *buf;
! 	size_t		nbytes;
  	FILE	   *file;
  
! 	if (bytes_to_read < 0)
! 	{
! 		if (seek_offset < 0)
! 			bytes_to_read = -seek_offset;
! 		else
! 		{
! 			struct stat fst;
! 
! 			if (stat(filename, &fst) < 0)
! 			{
! if (missing_ok && errno == ENOENT)
! 	return NULL;
! else
! 	ereport(ERROR,
! 			(errcode_for_file_access(),
! 			 errmsg("could not stat file \"%s\": %m", filename)));
! 			}
! 
! 			bytes_to_read = fst.st_size - seek_offset;
! 		}
! 	}
! 
! 	/* not sure why anyone thought that int64 length was a good idea */
! 	if (bytes_to_read > (MaxAllocSize - VARHDRSZ))
  		ereport(ERROR,
  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
   errmsg("requested length too large")));
--- 107,117 
   bool missing_ok)
  {
  	bytea	   *buf;
! 	size_t		nbytes = 0;
  	FILE	   *file;
  
! 	/* clamp request size to what we can actually deliver */
! 	if (bytes_to_read > (int64) (MaxAllocSize - VARHDRSZ))
  		ereport(ERROR,
  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
   errmsg("requested length too large")));
*** read_binary_file(const char *filename, i
*** 154,162 
  

Re: new heapcheck contrib module

2020-06-28 Thread Dilip Kumar
On Sun, Jun 28, 2020 at 8:59 PM Dilip Kumar  wrote:
>
> On Mon, Jun 22, 2020 at 5:44 AM Mark Dilger
>  wrote:
> >
> >
> >
> > > On Jun 21, 2020, at 2:54 AM, Dilip Kumar  wrote:
> > >
> > > I have looked into 0001 patch and I have a few comments.
> > >
> > > 1.
> > > +
> > > + /* Skip over unused/dead/redirected line pointers */
> > > + if (!ItemIdIsUsed(ctx.itemid) ||
> > > + ItemIdIsDead(ctx.itemid) ||
> > > + ItemIdIsRedirected(ctx.itemid))
> > > + continue;
> > >
> > > Isn't it a good idea to verify the Redirected Itemtid?  Because we
> > > will still access the redirected item id to find the
> > > actual tuple from the index scan.  Maybe not exactly at this level,
> > > but we can verify that the link itemid store in that
> > > is within the itemid range of the page or not.
> >
> > Good idea.  I've added checks that the redirection is valid, both in terms 
> > of being within bounds and in terms of alignment.
> >
> > > 2.
> > >
> > > + /* Check for tuple header corruption */
> > > + if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
> > > + {
> > > + confess(ctx,
> > > + psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
> > > + ctx->tuphdr->t_hoff,
> > > + (unsigned) SizeofHeapTupleHeader));
> > > + fatal = true;
> > > + }
> > >
> > > I think we can also check that if there is no NULL attributes (if
> > > (!(t_infomask & HEAP_HASNULL)) then
> > > ctx->tuphdr->t_hoff should be equal to SizeofHeapTupleHeader.
> >
> > You have to take alignment padding into account, but otherwise yes, and 
> > I've added a check for that.
> >
> > > 3.
> > > + ctx->offset = 0;
> > > + for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
> > > + {
> > > + if (!check_tuple_attribute(ctx))
> > > + break;
> > > + }
> > > + ctx->offset = -1;
> > > + ctx->attnum = -1;
> > >
> > > So we are first setting ctx->offset to 0, then inside
> > > check_tuple_attribute, we will keep updating the offset as we process
> > > the attributes and after the loop is over we set ctx->offset to -1,  I
> > > did not understand that why we need to reset it to -1, do we ever
> > > check for that.  We don't even initialize the ctx->offset to -1 while
> > > initializing the context for the tuple so I do not understand what is
> > > the meaning of the random value -1.
> >
> > Ahh, right, those are left over from a previous design of the code.  Thanks 
> > for pointing them out.  They are now removed.
> >
> > > 4.
> > > + if (!VARATT_IS_EXTENDED(chunk))
> > > + {
> > > + chunksize = VARSIZE(chunk) - VARHDRSZ;
> > > + chunkdata = VARDATA(chunk);
> > > + }
> > > + else if (VARATT_IS_SHORT(chunk))
> > > + {
> > > + /*
> > > + * could happen due to heap_form_tuple doing its thing
> > > + */
> > > + chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
> > > + chunkdata = VARDATA_SHORT(chunk);
> > > + }
> > > + else
> > > + {
> > > + /* should never happen */
> > > + confess(ctx,
> > > + pstrdup("toast chunk is neither short nor extended"));
> > > + return;
> > > + }
> > >
> > > I think the error message "toast chunk is neither short nor extended".
> > > Because ideally, the toast chunk should not be further toasted.
> > > So I think the check is correct, but the error message is not correct.
> >
> > I agree the error message was wrongly stated, and I've changed it, but you 
> > might suggest a better wording than what I came up with, "corrupt toast 
> > chunk va_header".
> >
> > > 5.
> > >
> > > + ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
> > > + check_relation_relkind_and_relam(ctx.rel);
> > > +
> > > + /*
> > > + * Open the toast relation, if any, also protected from concurrent
> > > + * vacuums.
> > > + */
> > > + if (ctx.rel->rd_rel->reltoastrelid)
> > > + {
> > > + int offset;
> > > +
> > > + /* Main relation has associated toast relation */
> > > + ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
> > > +   ShareUpdateExclusiveLock);
> > > + offset = toast_open_indexes(ctx.toastrel,
> > > 
> > > + if (TransactionIdIsNormal(ctx.relfrozenxid) &&
> > > + TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
> > > + {
> > > + confess(&ctx, psprintf("relfrozenxid %u precedes global "
> > > +"oldest valid xid %u ",
> > > +ctx.relfrozenxid, ctx.oldestValidXid));
> > > + PG_RETURN_NULL();
> > > + }
> > >
> > > Don't we need to close the relation/toastrel/toastindexrel in such
> > > return which is without an abort? IIRC, we
> > > will get relcache leak WARNING on commit if we left them open in commit 
> > > path.
> >
> > Ok, I've added logic to close them.
> >
> > All changes inspired by your review are included in the v9-0001 patch.  The 
> > differences since v8 are pulled out into v9_diffs for easier review.
>
> I have reviewed the changes in v9_diffs and looks fine to me.

Some more comments on v9_0001.
1.
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ nextFullXid = ShmemVariableCache->nextFullXid;
+ ctx.oldestValidXid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+ ctx.nextKnownValidXid = 

Re: new heapcheck contrib module

2020-06-28 Thread Dilip Kumar
On Mon, Jun 22, 2020 at 5:44 AM Mark Dilger
 wrote:
>
>
>
> > On Jun 21, 2020, at 2:54 AM, Dilip Kumar  wrote:
> >
> > I have looked into 0001 patch and I have a few comments.
> >
> > 1.
> > +
> > + /* Skip over unused/dead/redirected line pointers */
> > + if (!ItemIdIsUsed(ctx.itemid) ||
> > + ItemIdIsDead(ctx.itemid) ||
> > + ItemIdIsRedirected(ctx.itemid))
> > + continue;
> >
> > Isn't it a good idea to verify the Redirected Itemtid?  Because we
> > will still access the redirected item id to find the
> > actual tuple from the index scan.  Maybe not exactly at this level,
> > but we can verify that the link itemid store in that
> > is within the itemid range of the page or not.
>
> Good idea.  I've added checks that the redirection is valid, both in terms of 
> being within bounds and in terms of alignment.
>
> > 2.
> >
> > + /* Check for tuple header corruption */
> > + if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
> > + {
> > + confess(ctx,
> > + psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
> > + ctx->tuphdr->t_hoff,
> > + (unsigned) SizeofHeapTupleHeader));
> > + fatal = true;
> > + }
> >
> > I think we can also check that if there is no NULL attributes (if
> > (!(t_infomask & HEAP_HASNULL)) then
> > ctx->tuphdr->t_hoff should be equal to SizeofHeapTupleHeader.
>
> You have to take alignment padding into account, but otherwise yes, and I've 
> added a check for that.
>
> > 3.
> > + ctx->offset = 0;
> > + for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
> > + {
> > + if (!check_tuple_attribute(ctx))
> > + break;
> > + }
> > + ctx->offset = -1;
> > + ctx->attnum = -1;
> >
> > So we first set ctx->offset to 0, then inside check_tuple_attribute we
> > keep updating the offset as we process the attributes, and after the
> > loop is over we set ctx->offset to -1.  I did not understand why we
> > need to reset it to -1; do we ever check for that?  We don't even
> > initialize ctx->offset to -1 while initializing the context for the
> > tuple, so I do not understand the meaning of the magic value -1.
>
> Ahh, right, those are left over from a previous design of the code.  Thanks 
> for pointing them out.  They are now removed.
>
> > 4.
> > + if (!VARATT_IS_EXTENDED(chunk))
> > + {
> > + chunksize = VARSIZE(chunk) - VARHDRSZ;
> > + chunkdata = VARDATA(chunk);
> > + }
> > + else if (VARATT_IS_SHORT(chunk))
> > + {
> > + /*
> > + * could happen due to heap_form_tuple doing its thing
> > + */
> > + chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
> > + chunkdata = VARDATA_SHORT(chunk);
> > + }
> > + else
> > + {
> > + /* should never happen */
> > + confess(ctx,
> > + pstrdup("toast chunk is neither short nor extended"));
> > + return;
> > + }
> >
> > I think the error message "toast chunk is neither short nor extended"
> > is not quite right, because ideally the toast chunk should not be
> > further toasted.  So I think the check is correct, but the error
> > message is not.
>
> I agree the error message was wrongly stated, and I've changed it, but you 
> might suggest a better wording than what I came up with, "corrupt toast chunk 
> va_header".
>
> > 5.
> >
> > + ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
> > + check_relation_relkind_and_relam(ctx.rel);
> > +
> > + /*
> > + * Open the toast relation, if any, also protected from concurrent
> > + * vacuums.
> > + */
> > + if (ctx.rel->rd_rel->reltoastrelid)
> > + {
> > + int offset;
> > +
> > + /* Main relation has associated toast relation */
> > + ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
> > +   ShareUpdateExclusiveLock);
> > + offset = toast_open_indexes(ctx.toastrel,
> > 
> > + if (TransactionIdIsNormal(ctx.relfrozenxid) &&
> > + TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
> > + {
> > + confess(, psprintf("relfrozenxid %u precedes global "
> > +"oldest valid xid %u ",
> > +ctx.relfrozenxid, ctx.oldestValidXid));
> > + PG_RETURN_NULL();
> > + }
> >
> > Don't we need to close the relation/toastrel/toastindexrel in such a
> > return path, where there is no abort?  IIRC, we will get a relcache
> > leak WARNING on commit if we leave them open in the commit path.
>
> Ok, I've added logic to close them.
>
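For reference, the early-return cleanup presumably looks something like the
sketch below (illustrative only; the toast index field names are my guesses,
not the v9 names):

    /* Sketch only: release everything opened earlier before bailing out. */
    if (ctx.toastrel)
    {
        toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
                            ShareUpdateExclusiveLock);
        table_close(ctx.toastrel, ShareUpdateExclusiveLock);
    }
    relation_close(ctx.rel, ShareUpdateExclusiveLock);
    PG_RETURN_NULL();
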
> All changes inspired by your review are included in the v9-0001 patch.  The 
> differences since v8 are pulled out into v9_diffs for easier review.

I have reviewed the changes in v9_diffs and they look fine to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-06-28 Thread Dilip Kumar
On Fri, Jun 26, 2020 at 11:47 AM Amit Kapila  wrote:
>
> On Thu, Jun 25, 2020 at 7:11 PM Dilip Kumar  wrote:
> >
> > On Wed, Jun 24, 2020 at 4:04 PM Amit Kapila  wrote:
> > >
> > >
> > > Review comments on various patches.
> > >
> > > poc_shared_fileset_cleanup_on_procexit
> > > =
> > > 1.
> > > - ent->subxact_fileset =
> > > - MemoryContextAlloc(ApplyContext, sizeof(SharedFileSet));
> > > + MemoryContext oldctx;
> > >
> > > + /* Shared fileset handle must be allocated in the persistent context */
> > > + oldctx = MemoryContextSwitchTo(ApplyContext);
> > > + ent->subxact_fileset = palloc(sizeof(SharedFileSet));
> > >   SharedFileSetInit(ent->subxact_fileset, NULL);
> > > + MemoryContextSwitchTo(oldctx);
> > >   fd = BufFileCreateShared(ent->subxact_fileset, path);
> > >
> > > Why is this change required for this patch, and why do we only cover
> > > SharedFileSetInit in the Apply context and not BufFileCreateShared?
> > > The comment is also not very clear on this point.
> >
> > Added the comments for the same.
> >
>
> 1.
> + /*
> + * Shared fileset handle must be allocated in the persistent context.
> + * Also, SharedFileSetInit allocate the memory for sharefileset list
> + * so we need to allocate that in the long term meemory context.
> + */
>
> How about "We need to maintain shared fileset across multiple stream
> open/close calls.  So, we allocate it in a persistent context."

Done

> 2.
> + /*
> + * If the caller is following the dsm based cleanup then we don't
> + * maintain the filesetlist so return.
> + */
> + if (filesetlist == NULL)
> + return;
>
> The check here should use 'NIL' instead of 'NULL'

Done

> Other than that, the changes in this particular patch look good to me.
Added as the last patch in the series; in the next version I will merge
this into 0012 and 0013.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: tar-related code in PostgreSQL

2020-06-28 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

The patch works perfectly.

The new status of this patch is: Ready for Committer


POC: postgres_fdw insert batching

2020-06-28 Thread Tomas Vondra

Hi,

One of the issues I'm fairly regularly reminded by users/customers is
that inserting into tables sharded using FDWs is rather slow. We do
even get it reported on pgsql-bugs from time to time [1].

Some of the slowness / overhead is expected, due to the latency between
machines in the sharded setup. Even just 1ms latency will make it way
more expensive than a single instance.

But let's do a simple experiment, comparing a hash-partitioned regular
table with one using FDW partitions in the same instance. Scripts to
run this are attached. The duration of inserting 1M rows to this table
(average of 10 runs on my laptop) looks like this:

  regular: 2872 ms
  FDW: 64454 ms

Yep, it's ~20x slower, on a setup with ping latency well below 0.05ms.
Imagine how it would look on sharded setups with 0.1ms or 1ms latency,
which is probably where most single-DC clusters are :-(

Now, the primary reason why the performance degrades like this is that
while FDW has batching for SELECT queries (i.e. we read larger chunks of
data from the cursors), we don't have that for INSERTs (or other DML).
Every time you insert a row, it has to go all the way down into the
partition synchronously.

For some use cases this may be reduced by having many independent
connections from different users, so the per-user latency is higher but
acceptable. But if you need to import larger amounts of data (say, a CSV
file for analytics, ...) this may not work.

Some time ago I wrote an ugly PoC adding batching, just to see how far
it would get us, and it seems quite promising - results for the same
INSERT benchmark look like this:

   FDW batching: 4584 ms

So, rather nice improvement, I'd say ...

Before I spend more time hacking on this, I have a couple open questions
about the design, restrictions etc.


1) Extend the FDW API?

In the patch, the batching is simply "injected" into the existing insert
API method, i.e. ExecForeignInsert et al. I wonder if it'd be better to
extend the API with a "batched" version of the method, so that we can
easily determine whether the FDW supports batching or not - it would
require changes in the callers, though. OTOH it might be useful for
COPY, where we could do something similar to multi_insert (COPY already
benefits from this patch, but it does not use the batching built into
COPY).
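
To make that concrete, a batched callback could simply mirror ExecForeignInsert
but work on an array of slots, something like this purely hypothetical addition
to FdwRoutine (names made up, not part of the PoC):

/* Hypothetical, for illustration only: a batched counterpart to
 * ExecForeignInsert_function in fdwapi.h, taking an array of slots. */
typedef TupleTableSlot **(*ExecForeignBatchInsert_function) (EState *estate,
                                                             ResultRelInfo *rinfo,
                                                             TupleTableSlot **slots,
                                                             TupleTableSlot **planSlots,
                                                             int numSlots);

The core code could then fall back to per-row ExecForeignInsert whenever an FDW
leaves such a pointer NULL, which would also answer the "does this FDW support
batching" question.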


2) What about the insert results?

I'm not sure what to do about "result" status for the inserted rows. We
only really "stash" the rows into a buffer, so we don't know if it will
succeed or not. The patch simply assumes it will succeed, but that's
clearly wrong, and it may result in reporting a wrong number of rows.

The patch also disables the batching when the insert has a RETURNING
clause, because there's just a single slot (for the currently inserted
row). I suppose a "batching" method would take an array of slots.


3) What about the other DML operations (DELETE/UPDATE)?

The other DML operations could probably benefit from the batching too.
INSERT was good enough for a PoC, but having batching only for INSERT
seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
of quals, but likely doable.


4) Should we do batching for COPY instead?

While looking at multi_insert, I've realized it's mostly exactly what
the new "batching insert" API function would need to be. But it's only
really used in COPY, so I wonder if we should just abandon the idea of
batching INSERTs and do batching COPY for FDW tables.

For cases that can replace INSERT with COPY this would be enough, but
unfortunately it does nothing for DELETE/UPDATE so I'm hesitant to do
this :-(


5) Expected consistency?

I'm not entirely sure what the consistency expectations for FDWs are.
Currently the FDW nodes pointing to the same server share a connection,
so the inserted rows might be visible to other nodes. But if we only
stash the rows in a local buffer for a while, that's no longer true. So
maybe this breaks the consistency expectations?

But maybe that's OK - I'm not sure how the prepared statements/cursors
affect this. I can imagine restricting the batching only to plans where
this is not an issue (single FDW node or something), but it seems rather
fragile and undesirable.

I was thinking about adding a GUC to enable/disable the batching at some
level (global, server, table, ...) but it seems like a bad match because
the consistency expectations likely depend on the query. There should be a
GUC to set the batch size, though (it's hardcoded to 100 for now).
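
For illustration, such a batch-size GUC could be defined roughly like this
(the name, default and bounds below are made up, and the PoC hardcodes 100
instead of reading any setting):

#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

/* Illustrative sketch only, assuming it lives in postgres_fdw.c
 * (which already has PG_MODULE_MAGIC). */
static int  fdw_batch_size = 100;

void        _PG_init(void);

void
_PG_init(void)
{
    DefineCustomIntVariable("postgres_fdw.batch_size",
                            "Number of rows buffered per remote INSERT batch.",
                            NULL,
                            &fdw_batch_size,
                            100,    /* boot value, matching the PoC */
                            1,      /* min */
                            10000,  /* max */
                            PGC_USERSET,
                            0,
                            NULL, NULL, NULL);
}

A session could then tweak it with something like SET postgres_fdw.batch_size
= 500, though a per-server or per-table option might fit better in the end.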


regards



[1] 
https://www.postgresql.org/message-id/CACnz%2BQ1q0%2B2KoJam9LyNMk8JmdC6qYHXWB895Wu2xcpoip18xQ%40mail.gmail.com

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


fdw.sql
Description: application/sql


local.sql
Description: application/sql
From df2cf502909886fbfc86f93f36b2daba03f785e4 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sun, 28 Jun 2020 14:31:18 +0200
Subject: [PATCH] 

Re: More tzdb fun: POSIXRULES is being deprecated upstream

2020-06-28 Thread Tom Lane
Seems like I'm not getting any traction in convincing people that
back-patching this change is wise.  To get this closed out before
the CF starts, I'm just going to put it into HEAD/v13 and call it
a day.

I remain of the opinion that we'll probably regret not doing
anything in the back branches, sometime in the next 4+ years.

regards, tom lane




Re: Commitfest 2020-07

2020-06-28 Thread Tom Lane
Daniel Gustafsson  writes:
> In a few days, the first commitfest of the 14 cycle - 2020-07 - will start.
> Unless anyone has already spoken up that I've missed, I'm happy to volunteer to
> run CFM for this one.

No one has volunteered that I recall, so the baton is yours.

regards, tom lane




Re: Fwd: PostgreSQL: WolfSSL support

2020-06-28 Thread Bruce Momjian
On Sun, Jun 28, 2020 at 10:18:12AM +0200, Peter Eisentraut wrote:
> On 2020-06-27 14:50, Christoph Berg wrote:
> > Re: Peter Eisentraut
> > > What would be the advantage of using wolfSSL over OpenSSL?
> > 
> > Avoiding the OpenSSL-vs-GPL linkage problem with readline.
> 
> We have added support for allegedly-OpenSSL compatible libraries such as
> LibreSSL before, so some tweaks for wolfSSL would seem acceptable. However,
> I doubt we are going to backpatch them, so unless you want to take
> responsibility for that as a packager, it's not really going to help anyone
> soon.  And OpenSSL 3.0.0 will have a new license, so for the next PostgreSQL
> release, this problem might be gone.

Oh, that is a long time coming --- it would be nice.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: PostgreSQL: WolfSSL support

2020-06-28 Thread Greg Stark
Personally, I'm more interested in a library like Amazon's, which is
trying to do less rather than more. I would rather have a simpler,
better-tested, easier-to-audit code base than one with more features
and more complications.

https://github.com/awslabs/s2n




Re: [HACKERS] Custom compression methods

2020-06-28 Thread Dilip Kumar
On Wed, Jun 24, 2020 at 5:30 PM Robert Haas  wrote:
>
> On Tue, Jun 23, 2020 at 4:00 PM Andres Freund  wrote:
> > https://postgr.es/m/20130621000900.GA12425%40alap2.anarazel.de is a
> > thread with more information / patches further along.
> >
> > I confused this patch with the approach in
> > https://www.postgresql.org/message-id/d8576096-76ba-487d-515b-44fdedba8bb5%402ndquadrant.com
> > sorry for that.  It obviously still differs by not having lower space
> > overhead (by virtue of not having a 4 byte 'va_cmid', but no additional
> > space for two methods, and then 1 byte overhead for 256 more), but
> > that's not that fundamental a difference.
>
> Wait a minute. Are we saying there are three (3) dueling patches for
> adding an alternate TOAST algorithm? It seems like there is:
>
> This "custom compression methods" thread - vintage 2017 - Original
> code by Nikita Glukhov, later work by Ildus Kurbangaliev
> The "pluggable compression support" thread - vintage 2013 - Andres Freund
> The "plgz performance" thread - vintage 2019 - Petr Jelinek
>
> Anyone want to point to a FOURTH implementation of this feature?
>
> I guess the next thing to do is figure out which one is the best basis
> for further work.

I have gone through these 3 threads and here is a summary of what I
understand from them.  Feel free to correct me if I have missed
something.

#1. Custom compression methods:  Provides a mechanism to create/drop
compression methods using external libraries, and also provides a way
to set the compression method for columns/types.  There are a few
complexities with this approach, listed below:

a. We need to maintain the dependencies between the column and the
compression method.  And the bigger issue is that even if the
compression method is changed, we need to keep the dependencies on the
older compression methods, as we might still have older tuples that
were compressed with those methods.
b. Inside the compressed attribute, we need to maintain the
compression method so that we know how to decompress it.  For this, we
use 2 bits from the raw_size of the compressed varlena header.

#2. pglz performance:  Along with pglz, this patch provides an
additional compression method using lz4.  The new compression method
can be enabled/disabled at configure time or using SIGHUP.  We use
1 bit from the raw_size of the compressed varlena header to identify
the compression method (pglz or lz4).

#3. pluggable compression:  This proposal is to replace the existing
pglz algorithm with snappy or lz4, whichever is better.  As per the
performance data [1], it appears that lz4 is the winner in most of the
cases.
- This also provides an additional patch to plug in any compression method.
- This will also use 2 bits from the raw_size of the compressed
attribute for identifying the compression method.
- Provides an option to select the compression method using a GUC, but
the comments in the patch suggest removing the GUC.  So it seems the
GUC was used only for the POC.
- Honestly, I did not clearly understand from this patch set whether it
proposes to replace the existing compression method with the best one
(with the plugin provided just for performance testing) or whether it
actually proposes an option to have pluggable compression methods.

IMHO, we can provide a solution based on #1 and #2, i.e. we can provide
a few of the best compression methods in core, and on top of that we
can also provide a mechanism to create/drop external compression
methods.
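
To make the "steal bits from the stored raw size" idea shared by #1, #2 and
#3 concrete, here is a purely illustrative sketch (the macro names are mine,
not from any of the three patches):

/*
 * Toasted values are < 1GB, so the stored 32-bit raw size only needs
 * 30 bits; the top bits are then free to identify the compression method.
 */
#define VARRAWSIZE_BITS     30
#define VARRAWSIZE_MASK     ((((uint32) 1) << VARRAWSIZE_BITS) - 1)

#define VARRAWSIZE_GET(info)        ((info) & VARRAWSIZE_MASK)
#define VARCOMPRESS_GET(info)       ((info) >> VARRAWSIZE_BITS)
#define VARCOMPRESS_SET(size, id)   ((((uint32) (id)) << VARRAWSIZE_BITS) | (size))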

[1] 
https://www.postgresql.org/message-id/20130621000900.GA12425%40alap2.anarazel.de

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




TLS checking in pgstat

2020-06-28 Thread Daniel Gustafsson
As I mentioned in [1], checking (struct Port)->ssl for NULL to determine
whether TLS is used for a connection is a bit of a leaky abstraction, as
that's an OpenSSL-specific struct member.  It sets the requirement that all
TLS implementations use a pointer named ssl, and that the pointer is set to
NULL in case of a failed connection, which may or may not fit.

Is there a reason not to use the (struct Port)->ssl_in_use flag, which tracks
just what we're looking for here?  This also matches other parts of the
abstraction in be-secure.c, which do just that.  The attached patch implements this.
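
For reference, the change is essentially of this shape (illustrative only;
the exact hunk is in the attached ssl_reporting.patch, and the surrounding
pgstat code may differ slightly):

/* before: assumes an OpenSSL-style "ssl" pointer on Port */
beentry->st_ssl = (MyProcPort != NULL && MyProcPort->ssl != NULL);

/* after: use the implementation-agnostic flag maintained by be-secure.c */
beentry->st_ssl = (MyProcPort != NULL && MyProcPort->ssl_in_use);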

cheers ./daniel

[1] fab21fc8-0f62-434f-aa78-6bd9336d6...@yesql.se



ssl_reporting.patch
Description: Binary data


Commitfest 2020-07

2020-06-28 Thread Daniel Gustafsson
In a few days, the first commitfest of the 14 cycle - 2020-07 - will start.
Unless anyone has already spoken up that I've missed, I'm happy to volunteer to
run CFM for this one.

cheers ./daniel



Re: PostgreSQL: WolfSSL support

2020-06-28 Thread Daniel Gustafsson
> On 27 Jun 2020, at 21:40, Bruce Momjian  wrote:
> On Sat, Jun 27, 2020 at 04:22:51PM -0300, Ranier Vilela wrote:

>> WolfSSL, will provide a commercial license for PostgreSQL?
>> Isn't LIbreSSL a better alternative?
> 
> Seems it might be.

That's not really an apples/apples comparison as the projects have vastly
different goals (wolfSSL offers FIPS certification for example).

cheers ./daniel




Re: pgsql: Enable Unix-domain sockets support on Windows

2020-06-28 Thread Peter Eisentraut

On 2020-06-27 13:57, Amit Kapila wrote:

> Fair enough, but what should be the behavior in the Windows versions
> (<10) where Unix-domain sockets are not supported?

You get an error about an unsupported address family, similar to trying
to use IPv6 on a system that doesn't support it.

> BTW, in which format does the path need to be specified for
> unix_socket_directories?  I tried with '/c/tmp', 'c:/tmp', 'tmp' but
> nothing seems to be working, it gives me errors like: "could not create
> lock file "/c/tmp/.s.PGSQL.5432.lock": No such file or directory" on
> server start.  I am trying this on Win7 just to check what is the
> behavior of this feature on it.


Hmm, the only thing I remember about this now is that you need to use 
native Windows paths, meaning you can't just use /tmp under MSYS, but it 
needs to be something like C:\something.  But the error you have there 
is not even about the socket file but about the lock file, which is a 
normal file, so if that goes wrong, it might be an unrelated problem.
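
For example, assuming C:\tmp exists, a native-path setting along the lines of
unix_socket_directories = 'C:\tmp' would be the expected form, rather than an
MSYS-style /tmp or /c/tmp.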


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Fwd: PostgreSQL: WolfSSL support

2020-06-28 Thread Peter Eisentraut

On 2020-06-27 14:50, Christoph Berg wrote:

> Re: Peter Eisentraut
> > What would be the advantage of using wolfSSL over OpenSSL?
>
> Avoiding the OpenSSL-vs-GPL linkage problem with readline.


We have added support for allegedly-OpenSSL compatible libraries such as 
LibreSSL before, so some tweaks for wolfSSL would seem acceptable. 
However, I doubt we are going to backpatch them, so unless you want to 
take responsibility for that as a packager, it's not really going to 
help anyone soon.  And OpenSSL 3.0.0 will have a new license, so for the 
next PostgreSQL release, this problem might be gone.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Raising stop and warn limits

2020-06-28 Thread Noah Misch
On Sun, Jun 21, 2020 at 01:35:13AM -0700, Noah Misch wrote:
> In brief, I'm proposing to raise xidWrapLimit-xidStopLimit to 3M and
> xidWrapLimit-xidWarnLimit to 40M.  Likewise for mxact counterparts.

Here's the patch for it.

> 1. VACUUM, to advance a limit, may assign IDs subject to one of the limits.
>VACUUM formerly consumed XIDs, not mxacts.  It now consumes mxacts, not
>XIDs.

Correction: a lazy_truncate_heap() at wal_level!=minimal does assign an XID,
so XID consumption is impossible with "VACUUM (TRUNCATE false)" but possible
otherwise.  "VACUUM (ANALYZE)", which a DBA might do by reflex, also assigns
XIDs.  (These corrections do not affect $SUBJECT.)
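
(For scale: the usable XID span is about 2^31, roughly 2.1 billion, so a 40M
warn limit still leaves about 98% of the span consumable, consistent with the
"98%" figure in the commit message below.)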
Author: Noah Misch 
Commit: Noah Misch 

Change XID and mxact limits to warn at 40M and stop at 3M.

We have edge-case bugs when assigning values in the last few dozen pages
before the wrap limit.  We may introduce similar bugs in the future.  At
default BLCKSZ, this makes such bugs unreachable outside of single-user
mode.  Also, when VACUUM began to consume mxacts, multiStopLimit did not
change to compensate.

pg_upgrade may fail on a cluster that was already printing "must be
vacuumed" warnings.  Follow the warning's instructions to clear the
warning, then run pg_upgrade again.  One can still, peacefully consume
98% of XIDs or mxacts, so DBAs need not change routine VACUUM settings.

Reviewed by FIXME.

Discussion: https://postgr.es/m/20200621083513.ga3074...@rfd.leadboat.com

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 612e4cb..a28ea56 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -608,10 +608,10 @@ SELECT datname, age(datfrozenxid) FROM pg_database;

 If for some reason autovacuum fails to clear old XIDs from a table, the
 system will begin to emit warning messages like this when the database's
-oldest XIDs reach eleven million transactions from the wraparound point:
+oldest XIDs reach forty million transactions from the wraparound point:
 
 
-WARNING:  database "mydb" must be vacuumed within 10985967 transactions
+WARNING:  database "mydb" must be vacuumed within 39985967 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
 
 
@@ -621,7 +621,7 @@ HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that data
 be able to advance the database's datfrozenxid.)
 If these warnings are
 ignored, the system will shut down and refuse to start any new
-transactions once there are fewer than 1 million transactions left
+transactions once there are fewer than three million transactions left
 until wraparound:
 
 
@@ -629,7 +629,7 @@ ERROR:  database is not accepting commands to avoid wraparound data loss in data
 HINT:  Stop the postmaster and vacuum that database in single-user mode.
 
 
-The 1-million-transaction safety margin exists to let the
+The three-million-transaction safety margin exists to let the
 administrator recover without data loss, by manually executing the
 required VACUUM commands.  However, since the system 
will not
 execute commands once it has gone into the safety shutdown mode,
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index ce84dac..475f5ed 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2217,28 +2217,24 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
multiWrapLimit += FirstMultiXactId;
 
/*
-* We'll refuse to continue assigning MultiXactIds once we get within 100
-* multi of data loss.
-*
-* Note: This differs from the magic number used in
-* SetTransactionIdLimit() since vacuum itself will never generate new
-* multis.  XXX actually it does, if it needs to freeze old multis.
+* We'll refuse to continue assigning MultiXactIds once we get within 3M
+* multi of data loss.  See SetTransactionIdLimit.
 */
-   multiStopLimit = multiWrapLimit - 100;
+   multiStopLimit = multiWrapLimit - 3000000;
if (multiStopLimit < FirstMultiXactId)
multiStopLimit -= FirstMultiXactId;
 
/*
-* We'll start complaining loudly when we get within 10M multis of the
-* stop point.   This is kind of arbitrary, but if you let your gas gauge
-* get down to 1% of full, would you be looking for the next gas station?
-* We need to be fairly liberal about this number because there are lots
-* of scenarios where most transactions are done by automatic clients that
-* won't pay attention to warnings. (No, we're not gonna make this
+* We'll start complaining loudly when we get within 40M multis of data
+* loss.  This is kind of arbitrary, but if you let your 

Re: update substring pattern matching syntax

2020-06-28 Thread Fabien COELHO



Hallo Peter,

v2 patches apply cleanly, compile, global check ok, citext check ok, doc 
gen ok. No further comments.


As I did not find an entry in the CF, I did nothing about tagging it 
"ready".


--
Fabien.