Re: [POC] Fast COPY FROM command for the table with foreign partitions

2020-07-02 Thread Andrey V. Lepikhov

On 6/22/20 5:11 PM, Ashutosh Bapat wrote:


It looks like we call BeginForeignInsert and EndForeignInsert even 
though actual copy is performed using BeginForeignCopy, ExecForeignCopy 
and EndForeignCopy. BeginForeignInsert constructs the INSERT query which 
looks unnecessary. Also some of the other PgFdwModifyState members are 
initialized unnecessarily. It also gives an impression that we are using 
INSERT underneath the copy. Instead a better way would be to 
call BeginForeignCopy instead of BeginForeignInsert and EndForeignCopy 
instead of EndForeignInsert, if we are going to use COPY protocol to 
copy data to the foreign server. Corresponding postgres_fdw 
implementations need to change in order to do that.


I did not answer for a long time because I was waiting for the results of
the discussion on Tomas' approach to bulk INSERT/UPDATE/DELETE, which seems
more general.
I can move the query construction into the first execution of the INSERT or
COPY operation. But the other changes seem more invasive, because
BeginForeignInsert/EndForeignInsert are used in the execPartition.c
module. We would need to pass the copy/insert state of the operation into
ExecFindPartition() and ExecCleanupTupleRouting().
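
For context, the case this patch targets is a bulk COPY into a partitioned
table where some partitions are postgres_fdw foreign tables. A minimal
sketch, with all object names hypothetical:

CREATE TABLE measurements (id int, created date) PARTITION BY RANGE (created);
CREATE TABLE measurements_2020_06 PARTITION OF measurements
    FOR VALUES FROM ('2020-06-01') TO ('2020-07-01');
CREATE FOREIGN TABLE measurements_2020_07 PARTITION OF measurements
    FOR VALUES FROM ('2020-07-01') TO ('2020-08-01')
    SERVER shard1 OPTIONS (table_name 'measurements_2020_07');

-- today, rows routed to the foreign partition are sent to the remote server
-- one INSERT at a time; the proposed BeginForeignCopy/ExecForeignCopy/
-- EndForeignCopy callbacks would let postgres_fdw stream them with COPY
COPY measurements FROM '/tmp/measurements.csv' WITH (FORMAT csv);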


--
regards,
Andrey Lepikhov
Postgres Professional




Re: A patch for get origin from commit_ts.

2020-07-02 Thread Michael Paquier
On Thu, Jul 02, 2020 at 10:12:02AM +0200, Petr Jelinek wrote:
> On 02/07/2020 03:58, mich...@paquier.xyz wrote:
>> Adding a new function able to return both fields at the same time does
>> not imply that we'd remove the original one, it just implies that we
>> would be able to retrieve both fields with a single call of
>> TransactionIdGetCommitTsData(), saving from an extra CommitTsSLRULock
>> taken, etc.  That's actually what pglogical does with
>> its pglogical_xact_commit_timestamp_origin() in
>> pglogical_functions.c.  So adding one function able to return one
>> tuple with the two fields, without removing the existing
>> pg_xact_commit_timestamp() makes the most sense, no?
> 
> Agreed, sounds reasonable.
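
For reference, a hedged sketch of how such a combined function might be
used; the function name and output columns are assumptions modelled on the
pglogical analogue mentioned above, and only pg_xact_commit_timestamp()
exists today:

-- existing: commit timestamp only (requires track_commit_timestamp = on)
SELECT pg_xact_commit_timestamp('567'::xid);

-- proposed: both fields from a single TransactionIdGetCommitTsData() call
SELECT * FROM pg_xact_commit_timestamp_origin('567'::xid);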

Thanks.  Movead, please note that the patch is waiting on author.
Could you send an update if you think that those changes make sense?
--
Michael




Re: Read access for pg_monitor to pg_replication_origin_status view

2020-07-02 Thread Michael Paquier
On Thu, Jul 02, 2020 at 03:03:22PM +0200, Daniel Gustafsson wrote:
> AFAICT from the thread there is nothing left of this changeset to consider, so
> I've marked the entry as committed in the CF app.  Feel free to update in case
> I've missed something.

A second item discussed in this thread was whether we should try to grant
more privileges to pg_read_all_stats when it comes to replication
origins; that's why I left the entry in the CF app.  Now the thread
has stalled, and what has been committed in cc07264 allows this change to
be made later anyway, so marking the entry as committed sounds fine to me.
Thanks!
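
For the record, the stalled follow-up would amount to something roughly like
the following. This is only a hedged sketch; the exact objects, privileges,
and role would be up to a later patch:

GRANT EXECUTE ON FUNCTION pg_show_replication_origin_status()
  TO pg_read_all_stats;
GRANT SELECT ON pg_replication_origin_status TO pg_read_all_stats;
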
--
Michael




Re: track_planning causing performance regression

2020-07-02 Thread Pavel Stehule
Hi

On Fri, Jul 3, 2020 at 4:39 AM Fujii Masao
wrote:

>
>
> On 2020/07/01 7:37, Peter Geoghegan wrote:
> > On Tue, Jun 30, 2020 at 6:40 AM Fujii Masao 
> wrote:
> >> Ants and Andres suggested to replace the spinlock used in pgss_store()
> with
> >> LWLock. I agreed with them and posted the POC patch doing that. But I
> think
> >> the patch is an item for v14. The patch may address the reported
> performance
> >> issue, but may cause other performance issues in other workloads. We
> would
> >> need to measure how the patch affects the performance in various
> workloads.
> >> It seems too late to do that at this stage of v13. Thought?
> >
> > I agree that it's too late for v13.
>
> Thanks for the comment!
>
> So I pushed the patch and changed default of track_planning to off.
>

Maybe this can be documented, so that users know enabling this option can
have a negative impact on performance.
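
A minimal sketch of what such a note would apply to; the GUC in question is
pg_stat_statements.track_planning, now off by default:

-- enabling plan-time tracking; per this thread it can cause contention
-- when many sessions execute the same query concurrently
ALTER SYSTEM SET pg_stat_statements.track_planning = on;
SELECT pg_reload_conf();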

Regards

Pavel

>
> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
>
>
>


Re: Default setting for enable_hashagg_disk (hash_mem)

2020-07-02 Thread Peter Geoghegan
On Thu, Jul 2, 2020 at 7:46 PM Justin Pryzby  wrote:
> Thanks for putting it together, I agree that hash_mem seems to be an obvious
> "escape hatch" that generalizes existing GUCs and is independently useful.

It is independently useful. It's a natural consequence of "being
honest" about work_mem and hashing.

> I feel it should be the same as work_mem, as it's written, and not a multiplier.
>
> And actually I don't think a lower value should be ignored: "mechanism not
> policy".  Do we refuse atypical values of maintenance_work_mem < work_mem ?

I see your point, but AFAIK maintenance_work_mem was not retrofit like
this. It seems different. (Unless we add the -1 behavior, perhaps)

> I assumed that hash_mem would default to -1, which would mean "fall back to
> work_mem".  We'd then advise users to increase it if they have an issue in v13
> with performance of hashes spilled to disk.  (And maybe in other cases, too.)

Yeah, this kind of -1 behavior could make sense.

> I read the argument that hash tables are a better use of RAM than sorts.
> However, it seems like setting the default to greater than work_mem is a
> separate change from providing the mechanism that allows users to do so.  I guess
> the change in default is intended to mitigate the worst possible behavior
> change someone might experience in v13 hashing, and might be expected to
> improve "out of the box" performance.  I'm not opposed to it, but it's not an
> essential part of the patch.

That's true.

> In nodeHash.c, you missed an underscore:
> +* Target in-memory hashtable size is hashmem kilobytes.

Got it; thanks.

-- 
Peter Geoghegan




Re: Persist MVCC forever - retain history

2020-07-02 Thread Mitar
Hi!

On Thu, Jul 2, 2020 at 7:51 PM Mark Dilger  wrote:
> I expect these issues to be less than half what you would need to resolve, 
> though much of the rest of it is less clear to me.

Thank you for this insightful input. I will think it over.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Default setting for enable_hashagg_disk (hash_mem)

2020-07-02 Thread Peter Geoghegan
On Thu, Jul 2, 2020 at 8:00 PM Bruce Momjian  wrote:
> Also, I feel this is all out of scope for PG 13, frankly.  I think our
> only option is to revert the hash spill entirely, and return to the
> pre-PG 13 behavior, if people are too worried about hash performance regression.

But the problem isn't really the hashaggs-that-spill patch itself.
Rather, the problem is the way that work_mem is supposed to behave in
general, and the impact that that has on hash aggregate now that it
has finally been brought into line with every other kind of executor
node. There just isn't much reason to think that we should give the
same amount of memory to a groupagg + sort as a hash aggregate. The
patch more or less broke an existing behavior that is itself
officially broken. That is, the problem that we're trying to fix here
is only a problem to the extent that the previous scheme isn't really
operating as intended (because grouping estimates are inherently very
hard). A revert doesn't seem like it helps anyone.

I accept that the idea of inventing hash_mem to fix this problem now
is unorthodox. In a certain sense it solves problems beyond the
problems that we're theoretically obligated to solve now. But any
"more conservative" approach that I can think of seems like a big
mess.

-- 
Peter Geoghegan




Re: Persist MVCC forever - retain history

2020-07-02 Thread Mitar
Hi!

On Thu, Jul 2, 2020 at 12:16 PM Thomas Munro  wrote:
> This was a research topic in ancient times (somewhere I read that in
> some ancient version, VACUUM didn't originally remove tuples, it moved
> them to permanent write-only storage).  Even after the open source
> project began, there was a "time travel" feature, but it was removed
> in 6.2:

Very interesting. Thanks for sharing.

> There aren't indexes on those things.

Oh. My information is based on what I read in [1]. This is where I
realized that if PostgreSQL maintains those extra columns and indices,
then there is no point in replicating that by copying all that to
another table. So this is not true? Or not true anymore?

> If you want to keep track of all changes in a way that lets you query
> things as of historical times, including joins, and possibly including
> multiple time dimensions ("on the 2nd of Feb, what address did we
> think Fred lived at on the 1st of Jan?") you might want to read
> "Developing Time-Oriented Database Applications in SQL" about this,

Interesting. I checked it out a bit. I think this is not exactly what
I am searching for. My main motivation is reactive web applications,
where I can push changes of the (sub)state of the database to the web app
when that (sub)state changes. And if the web app is offline for some
time, it can come back and resync all the missed changes. Moreover, the
changes themselves are important (not just the last state), because they
allow merging with a potentially changed local state in the web app
while it was offline. So in a way it is logical replication and
replay, but at the database-client level.

[1] https://eng.uber.com/postgres-to-mysql-migration/


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: POC and rebased patch for CSN based snapshots

2020-07-02 Thread Andrey V. Lepikhov

On 7/2/20 7:31 PM, Movead Li wrote:

Thanks for the remarks,

 >Some remarks on your patch:
 >1. The variable last_max_csn can be an atomic variable.
Yes, I will consider that.

 >2. GenerateCSN() routine: in the case than csn < csnState->last_max_csn
 >This is the case when someone changed the value of the system clock. I
 >think it is needed to write a WARNING to the log file. (May be we can do
 >synchronization with a time server.
Yes, good point. I will work out a way to report the warning; it should
report at intervals rather than every time a CSN is generated.
Do we really need a correct time? What is the downside if one node
generates CSNs just by monotonically increasing a counter?

Changes in time values can lead to bad effects, such as stale snapshots.
Adjusting the time can be a kind of defense.


 >3. What about global snapshot xmin? In the pgpro version of the patch we
 >had GlobalSnapshotMapXmin() routine to maintain circular buffer of
 >oldestXmins for several seconds in past. This buffer allows to shift
 >oldestXmin in the past when backend is importing global transaction.
 >Otherwise old versions of tuples that were needed for this transaction
 >can be recycled by other processes (vacuum, HOT, etc).
 >How do you implement protection from local pruning? I saw
 >SNAP_DESYNC_COMPLAIN, but it is not used anywhere.
I have researched your patch, which is great. In that patch, only data
older than 'global_snapshot_defer_time' can be vacuumed, and dead tuples
are kept even if no snapshot is imported at all, right?

I am thinking about a way to start retaining dead tuples only just before
we import a CSN snapshot.

Based on the Clock-SI paper, we should get the local CSN and then send it
to the shard nodes. Because we do not know whether the shard nodes' CSNs
are bigger or smaller than the master node's, we have to keep some dead
tuples all the time to support snapshot import at any time.

But if we make a small change to the Clock-SI model, we do not use the
local CSN when the transaction starts; instead we contact every shard node
to request its CSN, the shard nodes start keeping dead tuples, and the
master node chooses the biggest CSN to send to the shard nodes.

With the new approach, we do not need to keep dead tuples all the time and
do not need to manage a ring buffer; we can hand the problem over to the
'snapshot too old' feature. The trade-off is that almost all shard nodes
need to wait.
I will send a more detailed explanation in a few days.

I think that in the case of a distributed system with many servers this can
become a bottleneck.
The main idea of the "deferred time" is to reduce interference between DML
queries in the case of an intensive OLTP workload. This time can be reduced
if the bloating of the database prevails over the frequency of
transaction aborts.



 >4. The current version of the patch is not applied clearly with current
 >master.
Maybe it's because the release of PG13 caused some conflicts; I will
rebase it.

Ok


---
Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca





--
regards,
Andrey Lepikhov
Postgres Professional




Re: Default setting for enable_hashagg_disk (hash_mem)

2020-07-02 Thread Bruce Momjian
On Thu, Jul  2, 2020 at 10:58:34PM -0400, Bruce Momjian wrote:
> On Thu, Jul  2, 2020 at 09:46:49PM -0500, Justin Pryzby wrote:
> > On Thu, Jul 02, 2020 at 07:05:48PM -0700, Peter Geoghegan wrote:
> > > anything else). I still think that the new GUC should work as a
> > > multiplier of work_mem, or something else along those lines, though
> > > for now it's just an independent work_mem used for hashing. I bring it
> > > up again because I'm concerned about users that upgrade to Postgres 13
> > > incautiously, and find that hashing uses *less* memory than before.
> > > Many users probably get away with setting work_mem quite high across
> > > the board. At the very least, hash_mem should be ignored when it's set
> > > to below work_mem (which isn't what the patch does).
> > 
> > I feel it should be the same as work_mem, as it's written, and not a multiplier.
> > 
> > And actually I don't think a lower value should be ignored: "mechanism not
> > policy".  Do we refuse atypical values of maintenance_work_mem < work_mem ?
> > 
> > I assumed that hash_mem would default to -1, which would mean "fall back to
> > work_mem".  We'd then advise users to increase it if they have an issue in v13
> > with performance of hashes spilled to disk.  (And maybe in other cases, too.)
> 
> Uh, with this patch, don't we really have sort_mem and hash_mem, with
> hash_mem defaulting to sort_mem, or something like that?  If hash_mem is a
> multiplier, it would make more sense to keep the work_mem name.

Also, I feel this is all out of scope for PG 13, frankly.  I think our
only option is to revert the hash spill entirely, and return to the
pre-PG 13 behavior, if people are too worried about hash performance regression.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Default setting for enable_hashagg_disk (hash_mem)

2020-07-02 Thread Bruce Momjian
On Thu, Jul  2, 2020 at 09:46:49PM -0500, Justin Pryzby wrote:
> On Thu, Jul 02, 2020 at 07:05:48PM -0700, Peter Geoghegan wrote:
> > anything else). I still think that the new GUC should work as a
> > multiplier of work_mem, or something else along those lines, though
> > for now it's just an independent work_mem used for hashing. I bring it
> > up again because I'm concerned about users that upgrade to Postgres 13
> > incautiously, and find that hashing uses *less* memory than before.
> > Many users probably get away with setting work_mem quite high across
> > the board. At the very least, hash_mem should be ignored when it's set
> > to below work_mem (which isn't what the patch does).
> 
> I feel it should be the same as work_mem, as it's written, and not a multiplier.
> 
> And actually I don't think a lower value should be ignored: "mechanism not
> policy".  Do we refuse atypical values of maintenance_work_mem < work_mem ?
> 
> I assumed that hash_mem would default to -1, which would mean "fall back to
> work_mem".  We'd then advise users to increase it if they have an issue in v13
> with performance of hashes spilled to disk.  (And maybe in other cases, too.)

Uh, with this patch, don't we really have sort_mem and hash_mem, with
hash_mem defaulting to sort_mem, or something like that?  If hash_mem is a
multiplier, it would make more sense to keep the work_mem name.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: estimation problems for DISTINCT ON with FDW

2020-07-02 Thread Etsuro Fujita
On Thu, Jul 2, 2020 at 11:46 PM Tom Lane  wrote:
> Etsuro Fujita  writes:
> > On Wed, Jul 1, 2020 at 11:40 PM Tom Lane  wrote:
> >> Short of sending a whole second query to the remote server, it's
> >> not clear to me how we could get the full table size (or equivalently
> >> the target query's selectivity for that table).  The best we realistically
> >> can do is to adopt pg_class.reltuples if there's been an ANALYZE of
> >> the foreign table.  That case already works (and this proposal doesn't
> >> break it).  The problem is what to do when pg_class.reltuples is zero
> >> or otherwise badly out-of-date.
>
> > In estimate_path_cost_size(), if use_remote_estimate is true, we
> > adjust the rows estimate returned from the remote server, by factoring
> > in the selectivity of the locally-checked quals.  I thought what I
> > proposed above would be more consistent with that.
>
> No, I don't think that would be very helpful.  There are really three
> different numbers of interest here:
>
> 1. The actual total rowcount of the remote table.
>
> 2. The number of rows returned by the remote query (which is #1 times
> the selectivity of the shippable quals).
>
> 3. The number of rows returned by the foreign scan (which is #2 times
> the selectivity of the non-shippable quals).
>
> Clearly, rel->rows should be set to #3.  However, what we really want
> for rel->tuples is #1.  That's because, to the extent that the planner
> inspects rel->tuples at all, it's to adjust whole-table stats such as
> we might have from ANALYZE.  What you're suggesting is that we use #2,
> but I doubt that that's a big improvement.  In a decently tuned query
> it's going to be a lot closer to #3 than to #1.
>
> We could perhaps try to make our own estimate of the selectivity of the
> shippable quals and then back into #1 from the value we got for #2 from
> the remote server.

Actually, that is what I suggested:

+ /*
+ * plancat.c copied baserel->pages and baserel->tuples from pg_class.
+ * If the foreign table has never been ANALYZEd, or if its stats are
+ * out of date, baserel->tuples might now be less than baserel->rows,
+ * which will confuse assorted logic.  Hack it to appear minimally
+ * sensible.  (Do we need to hack baserel->pages too?)
+ */
+ baserel->tuples = Max(baserel->tuples, baserel->rows);

for consistency, this should be

  baserel->tuples = clamp_row_est(baserel->rows / sel);

where sel is the selectivity of the baserestrictinfo clauses?

By "the baserestrictinfo clauses", I mean the shippable clauses as
well as the non-shippable clauses.  Since baserel->rows stores the
rows estimate returned by estimate_path_cost_size(), which is #3, this
estimates #1.

> But that sounds mighty error-prone, so I doubt it'd
> make for much of an improvement.

I have to admit the error-proneness.

> In any case, the proposal I'm making is just to add a sanity-check
> clamp to prevent the worst effects of not setting rel->tuples sanely.
> It doesn't foreclose future improvements inside the FDW.

Agreed.

Best regards,
Etsuro Fujita




Re: Persist MVCC forever - retain history

2020-07-02 Thread Mark Dilger



> On Jul 2, 2020, at 5:58 PM, Mitar  wrote:
> 
>> Plus, wrap-around and freezing aren’t just nice-to-have features.
> 
> Oh, I forgot about that. ctid is still just 32 bits? So then for such
> table with permanent MVCC this would have to be increased, to like 64
> bits or something. Then one would not have to do wrap-around
> protection, no?

I think what you propose is a huge undertaking, and would likely result in a 
fork of postgres not compatible with the public sources.  I do not recommend 
the project.  But in answer to your question

Yes, the values stored in the tuple header are 32 bits.  Take a look in 
access/htup_details.h.  You'll notice that HeapTupleHeaderData has a union:

union
{
    HeapTupleFields t_heap;
    DatumTupleFields t_datum;
}           t_choice;

If you check, HeapTupleFields and DatumTupleFields are the same size, each 
having three 32 bit values, though they mean different things.  You may need to 
expand types TransactionId, CommandId, and Oid to 64 bits, expand varlena 
headers to 64 bits, and typemods to 64 bits.  You may find that it is harder to 
just expand a subset of those, given the way these fields overlay in these 
unions.  There will be a lot of busy work going through the code to adjust 
everything else to match.  Just updating printf style formatting in error 
messages may take a long time.

If you do choose to expand only some of the types, say just TransactionId and 
CommandId, you'll have to deal with the size mismatch between HeapTupleFields 
and DatumTupleFields.

Aborted transactions leave dead rows in your tables, and you may want to deal 
with that for performance reasons.  Even if you don't intend to remove deleted 
rows, because you are just going to keep them around for time travel purposes, 
you might still want to use vacuum to remove dead rows, those that never 
committed.

You'll need to think about how to manage the growing clog if you don't intend 
to truncate it periodically.  Or if you do intend to truncate clog 
periodically, you'll need to think about the fact that you have TransactionIds 
in your tables older than what clog knows about.

You may want to think about how keeping dead rows around affects index 
performance.

I expect these issues to be less than half what you would need to resolve, 
though much of the rest of it is less clear to me.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: track_planning causing performance regression

2020-07-02 Thread Fujii Masao




On 2020/07/03 11:43, Peter Geoghegan wrote:

On Thu, Jul 2, 2020 at 7:39 PM Fujii Masao  wrote:

So I pushed the patch and changed default of track_planning to off.


I have closed out the open item I created for this.


Thanks!!

I added the patch that replaces the spinlock with an lwlock in pgss to CF 2020-09.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Default setting for enable_hashagg_disk (hash_mem)

2020-07-02 Thread Justin Pryzby
On Thu, Jul 02, 2020 at 07:05:48PM -0700, Peter Geoghegan wrote:
> On Mon, Jun 29, 2020 at 2:46 PM Peter Geoghegan  wrote:
> > On Mon, Jun 29, 2020 at 2:22 PM Tomas Vondra  
> > wrote:
> > > I agree with this, and I'm mostly OK with having hash_mem. In fact, from
> > > the proposals in this thread I like it the most - as long as it's used
> > > both during planning and execution. It's a pretty clear solution.
> >
> > Great.
> >
> > It's not trivial to write the patch, since there are a few tricky
> > cases. For example, maybe there is some subtlety in nodeAgg.c with
> > AGG_MIXED cases.
> 
> Attached is an attempt at this. I have not been particularly thorough,

Thanks for putting it together, I agree that hash_mem seems to be an obvious
"escape hatch" that generalizes existing GUCs and is independently useful.

> anything else). I still think that the new GUC should work as a
> multiplier of work_mem, or something else along those lines, though
> for now it's just an independent work_mem used for hashing. I bring it
> up again because I'm concerned about users that upgrade to Postgres 13
> incautiously, and find that hashing uses *less* memory than before.
> Many users probably get away with setting work_mem quite high across
> the board. At the very least, hash_mem should be ignored when it's set
> to below work_mem (which isn't what the patch does).

I feel it should be the same as work_mem, as it's written, and not a multiplier.

And actually I don't think a lower value should be ignored: "mechanism not
policy".  Do we refuse atypical values of maintenance_work_mem < work_mem ?

I assumed that hash_mem would default to -1, which would mean "fall back to
work_mem".  We'd then advise users to increase it if they have an issue in v13
with performance of hashes spilled to disk.  (And maybe in other cases, too.)
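
A hedged sketch of the semantics proposed here; hash_mem is only a proposed
GUC in this thread, and the -1 default is a suggestion rather than anything
committed:

SET work_mem = '64MB';
SET hash_mem = -1;        -- proposed default: hash nodes fall back to work_mem
SET hash_mem = '256MB';   -- explicit: give hash tables more memory than sorts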

I read the argument that hash tables are a better use of RAM than sorts.
However, it seems like setting the default to greater than work_mem is a
separate change from providing the mechanism that allows users to do so.  I guess
the change in default is intended to mitigate the worst possible behavior
change someone might experience in v13 hashing, and might be expected to
improve "out of the box" performance.  I'm not opposed to it, but it's not an
essential part of the patch.

In nodeHash.c, you missed an underscore:
+* Target in-memory hashtable size is hashmem kilobytes.

-- 
Justin




Re: Creating a function for exposing memory usage of backend process

2020-07-02 Thread torikoshia
On Wed, Jul 1, 2020 at 10:15 PM torikoshia  
wrote:

I'm going to do some renaming and relocation:

- view name: pg_memory_contexts
- function name: pg_get_memory_contexts()
- source file: mainly src/backend/utils/mmgr/mcxt.c


Attached an updated patch.
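
With the patch applied, the proposed view can be queried like this; a usage
sketch only, assuming the names above:

-- largest memory contexts of the current backend
SELECT name, parent, level, total_bytes, used_bytes
  FROM pg_memory_contexts
 ORDER BY total_bytes DESC
 LIMIT 5;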

On Wed, Jul 1, 2020 at 10:58 PM Fujii Masao 
 wrote:

Ok, understood! I agree that it's strange to display different names
for the same memory context between this view and logging.

It's helpful if the comment there refers to MemoryContextStatsPrint()
and mentions the reason that you told.


Agreed. I changed the comments.


> It also derived from MemoryContextStatsPrint().
> I suppose it is for keeping readability of the log..

Understood. I may want to change the upper limit of the query size to
display.

But as a first step, I'm fine with limiting it to 1024 bytes.


Thanks, I've left it as it was.


> I'm going to follow the discussion on the mailing list and find why
> these codes were introduced.

https://www.postgresql.org/message-id/12319.1521999065%40sss.pgh.pa.us


Thanks for sharing!


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
From 055af903a3dbf146d97dd3fb01a6a7d3d3bd2ae0 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Wed, 3 Jul 2020 11:00:42 +0900
Subject: [PATCH] Add a function exposing memory usage of local backend.

Previously, the only way to examine the memory usage of backend processes
was to attach a debugger and call MemoryContextStats(), and that was
not so convenient in some cases.

This patch implements a new SQL-callable function
pg_get_memory_contexts which exposes memory usage of the local
backend.
It also adds a new view pg_memory_contexts for exposing local
backend memory contexts.
---
 doc/src/sgml/catalogs.sgml   | 126 ++
 src/backend/catalog/system_views.sql |   3 +
 src/backend/utils/mmgr/mcxt.c| 130 ++-
 src/include/catalog/pg_proc.dat  |   9 ++
 src/test/regress/expected/rules.out  |  10 +++
 5 files changed, 277 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 49a881b262..e7ec822139 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9248,6 +9248,11 @@ SCRAM-SHA-256$iteration count:
   materialized views
  
 
+ 
+  pg_memory_contexts
+  backend memory contexts
+ 
+
  
   pg_policies
   policies
@@ -10526,6 +10531,127 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  
 
+ 
+  pg_memory_contexts
+
+  
+   pg_memory_contexts
+  
+
+  
+   backend memory contexts
+  
+
+  
+   The view pg_memory_contexts displays all
+   the local backend memory contexts.
+  
+  
+   pg_memory_contexts contains one row
+   for each memory context.
+  
+
+  
+   pg_memory_contexts Columns
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   name text
+  
+  
+   Name of the memory context
+  
+ 
+
+ 
+  
+   ident text
+  
+  
+   Identification information of the memory context. This field is truncated if the identification field is longer than 1024 characters
+  
+ 
+
+ 
+  
+   parent text
+  
+  
+   Name of the parent of this memory context
+  
+ 
+
+ 
+  
+   level integer
+  
+  
+   Distance from TopMemoryContext in context tree
+  
+ 
+
+ 
+  
+   total_bytes bigint
+  
+  
+   Total bytes requested from malloc
+  
+ 
+
+ 
+  
+   total_nblocks bigint
+  
+  
+   Total number of malloc blocks
+  
+ 
+
+ 
+  
+   free_bytes bigint
+  
+  
+   Free space in bytes
+  
+ 
+
+ 
+  
+   free_chunks bigint
+  
+  
+   Total number of free chunks
+  
+ 
+
+ 
+  
+   used_bytes bigint
+  
+  
+   Used space in bytes
+  
+ 
+
+   
+  
+
+ 
+
  
   pg_matviews
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5314e9348f..79d44842c6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -554,6 +554,9 @@ CREATE VIEW pg_shmem_allocations AS
 REVOKE ALL ON pg_shmem_allocations FROM PUBLIC;
 REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC;
 
+CREATE VIEW pg_memory_contexts AS
+SELECT * FROM pg_get_memory_contexts();
+
 -- Statistics views
 
 CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index abda22fa57..76fd6f9a6a 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -21,12 +21,13 @@
 
 #include "postgres.h"
 
+#include "funcapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "utils/builtins.h"
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
 
-
 

Re: Persist MVCC forever - retain history

2020-07-02 Thread Jonah H. Harris
On Thu, Jul 2, 2020 at 2:56 PM Mitar  wrote:

> Hi!
>
> (Sorry if this was already discussed, it looks pretty obvious, but I
> could not find anything.)


There have been a couple of timetravel extensions done, each with their own
limitations. I don’t believe a perfect implementation could be done without
re-adding the functionality to core (which would be new functionality given
all the changes). I’d say start with the extensions and go from there.

-- 
Jonah H. Harris


Re: track_planning causing performance regression

2020-07-02 Thread Peter Geoghegan
On Thu, Jul 2, 2020 at 7:39 PM Fujii Masao  wrote:
> So I pushed the patch and changed default of track_planning to off.

I have closed out the open item I created for this.

Thanks!
-- 
Peter Geoghegan




Re: track_planning causing performance regression

2020-07-02 Thread Fujii Masao




On 2020/07/01 7:37, Peter Geoghegan wrote:

On Tue, Jun 30, 2020 at 6:40 AM Fujii Masao  wrote:

Ants and Andres suggested to replace the spinlock used in pgss_store() with
LWLock. I agreed with them and posted the POC patch doing that. But I think
the patch is an item for v14. The patch may address the reported performance
issue, but may cause other performance issues in other workloads. We would
need to measure how the patch affects the performance in various workloads.
It seems too late to do that at this stage of v13. Thought?


I agree that it's too late for v13.


Thanks for the comment!

So I pushed the patch and changed default of track_planning to off.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Add Information during standby recovery conflicts

2020-07-02 Thread Masahiko Sawada
On Wed, 1 Jul 2020 at 21:52, Drouvot, Bertrand  wrote:
>
> Hey,
>
> On 6/29/20 10:55 AM, Masahiko Sawada wrote:
> >
> >
> >
> > On Thu, 18 Jun 2020 at 16:28, Drouvot, Bertrand  wrote:
> >> Hi hackers,
> >>
> >> I've attached a patch to add information during standby recovery conflicts.
> >> The motive behind is to be able to get some information:
> >>
> >> On the apply side
> >> On the blocker(s) side
> >>
> >> Motivation:
> >>
> >> When a standby recovery conflict occurs it could be useful to get more 
> >> information to be able to dive deep on the root cause and find a way to 
> >> avoid/mitigate new occurrences.
> > I think this is a good feature. Like log_lock_waits, it will help the
> > users to investigate recovery conflict issues.
> exactly, thanks for looking at it.
> >
> >> Adding this information would make the investigations easier, it could 
> >> help answering questions like:
> >>
> >> On which LSN was the WAL apply blocked?
> >> What was the purpose of the bocked WAL record?
> >> On which relation (if any) was the blocked WAL record related to?
> >> What was the blocker(s) doing?
> >> When did the blocker(s) started their queries (if any)?
> >> What was the blocker(s) waiting for? on which wait event?
> >>
> >> Technical context and proposal:
> >>
> >> There is 2 points in this patch:
> >>
> >> Add the information about the blocked WAL record. This is done in 
> >> standby.c (ResolveRecoveryConflictWithVirtualXIDs, 
> >> ResolveRecoveryConflictWithDatabase, StandbyTimeoutHandler)
> > I think we already have the information about the WAL record being
> > applied in errcontext.
> right, but it won’t be displayed in case log_error_verbosity is set to
> terse.

Yes. I think errcontext or errdetail is more appropriate for this
information in this case. Otherwise, we will end up emitting the
same WAL record information twice with log_error_verbosity = verbose.

For instance, here are the server logs from when I tested a recovery
conflict on a buffer pin; they look redundant:

2020-07-03 11:01:15.339 JST [60585] LOG:  wal record apply is blocked
by 1 connection(s), reason: User query might have needed to see row
versions that must be removed.
2020-07-03 11:01:15.339 JST [60585] CONTEXT:  WAL redo at 0/30025E0
for Heap2/CLEAN: remxid 505
2020-07-03 11:01:15.339 JST [60585] LOG:  blocked wal record rmgr:
Heap2, lsn: 0/030025E0, received at: 2020-07-03 11:01:15.338997+09,
desc: CLEAN, relation: rel 1663/12930/16384 fork main blk 0
2020-07-03 11:01:15.339 JST [60585] CONTEXT:  WAL redo at 0/30025E0
for Heap2/CLEAN: remxid 505
2020-07-03 11:01:15.347 JST [60604] LOG:  about to be interrupted:
backend_type: client backend, state: active, wait_event: PgSleep,
query_start: 2020-07-03 11:01:14.337708+09
2020-07-03 11:01:15.347 JST [60604] STATEMENT:  select pg_sleep(30);
2020-07-03 11:01:15.347 JST [60604] ERROR:  canceling statement due to
conflict with recovery
2020-07-03 11:01:15.347 JST [60604] DETAIL:  User query might have
needed to see row versions that must be removed.
2020-07-03 11:01:15.347 JST [60604] STATEMENT:  select pg_sleep(30);

The WAL record information appears three times and the reason for the
interruption twice.

> Or did you have in mind to try to avoid using the new
> “current_replayed_xlog” in xlog.c?

Regarding LogBlockedWalRecordInfo(), it seems to me that the main
message of this logging is to let users know both that the process got
stuck due to a recovery conflict and the reason for it (lock, database,
buffer pin, etc.). Other information, such as the details of the blocked
WAL record and how many processes are blocking, seems like additional
information. So I think this information would be better shown in
errcontext or errdetail, and we don't need to create a similar feature,
since we already show the WAL record being replayed in errcontext.

Also, this function emits two LOG messages related to each other but
I'm concerned that it can be hard for users to associate these
separate log lines especially when server logs are written at a high
rate. And I think these two log lines don't follow our error message
style[1].

>
> >
> > I wonder if we can show the recovery conflict information in the main
> > LOG message, the blocker information in errdetail, and use errcontext
> > with regard to WAL record information. For example:
> >
> > LOG:  process 500 waiting for recovery conflict on snapshot
> > DETAIL:  conflicting transaction ids: 100, 200, 300
> > CONTEXT:  WAL redo at 0/3001970 for Heap2/CLEAN: remxid 506
> Not sure at all whether it would be possible to put out all the information
> at the same time.
> For example, in the case of a shared buffer pin conflict, the blocker is
> currently known “only” during the RecoveryConflictInterrupt call (so it is
> still not known yet when we can “already” report the blocked LSN
> information).
> It might also 

Re: Default setting for enable_hashagg_disk

2020-07-02 Thread Peter Geoghegan
On Mon, Jun 29, 2020 at 2:46 PM Peter Geoghegan  wrote:
> On Mon, Jun 29, 2020 at 2:22 PM Tomas Vondra
>  wrote:
> > I agree with this, and I'm mostly OK with having hash_mem. In fact, from
> > the proposals in this thread I like it the most - as long as it's used
> > both during planning and execution. It's a pretty clear solution.
>
> Great.
>
> It's not trivial to write the patch, since there are a few tricky
> cases. For example, maybe there is some subtlety in nodeAgg.c with
> AGG_MIXED cases.

Attached is an attempt at this. I have not been particularly thorough,
since it is still not completely clear that the hash_mem proposal has
a serious chance of resolving the "many users rely on hashagg
exceeding work_mem, regardless of whether or not that is the intended
behavior in Postgres 12" problem. But at least we have a patch now,
and so have some idea of how invasive this will have to be. We also
have something to test.

Note that I created a new open item for this "maybe we need something
like a hash_mem GUC now" problem today. To recap, this thread started
out being a discussion about the enable_hashagg_disk GUC, which seems
like a distinct problem to me. It won't make much sense to return to
discussing the original problem before we have a solution to this
other problem (the problem that I propose to address by inventing
hash_mem).

About the patch:

The patch adds hash_mem, which is just another work_mem-like GUC that
the patch has us use in certain cases -- cases where the work area is
a hash table (but not when it's a sort, or some kind of bitset, or
anything else). I still think that the new GUC should work as a
multiplier of work_mem, or something else along those lines, though
for now it's just an independent work_mem used for hashing. I bring it
up again because I'm concerned about users that upgrade to Postgres 13
incautiously, and find that hashing uses *less* memory than before.
Many users probably get away with setting work_mem quite high across
the board. At the very least, hash_mem should be ignored when it's set
to below work_mem (which isn't what the patch does).

It might have made more sense to call the new GUC hash_work_mem
instead of hash_mem. I don't feel strongly about the name. Again, this
is just a starting point for further discussion.

-- 
Peter Geoghegan


0001-Add-a-GUC-that-limits-memory-use-for-hash-tables.patch
Description: Binary data


Re: track_planning causing performance regression

2020-07-02 Thread Fujii Masao




On 2020/07/02 1:54, Andres Freund wrote:

Hi,

On 2020-07-01 22:20:50 +0900, Fujii Masao wrote:

On 2020/07/01 4:03, Andres Freund wrote:

Why did you add the hashing here? It seems a lot better to just add an
lwlock in-place instead of the spinlock? The added size is neglegible
compared to the size of pgssEntry.


Because pgssEntry is not an array entry but a hashtable entry. First I was
thinking to assign a per-process lwlock to each entry in the array at
startup. But each entry is created every time a new entry is required,
so an lwlock needs to be assigned to each entry at creation time.
We cannot easily assign lwlocks to all the entries at startup.


But why not just do it exactly at the place the SpinLockInit() is done
currently?


Sorry, I failed to understand your point... You mean that the new lwlock
should be initialized at the place where SpinLockInit() is done currently,
instead of requesting the postmaster to initialize all the lwlocks required
for pgss at _PG_init()?

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Additional improvements to extended statistics

2020-07-02 Thread Tomas Vondra

On Wed, Jul 01, 2020 at 01:19:40PM +0200, Daniel Gustafsson wrote:

On 24 Mar 2020, at 15:33, Tomas Vondra  wrote:

On Tue, Mar 24, 2020 at 01:20:07PM +, Dean Rasheed wrote:



Sounds like a reasonable approach, but I think it would be better to
preserve the current public API by having clauselist_selectivity()
become a thin wrapper around  a new function that optionally applies
extended stats.



OK, makes sense. I'll take a stab at it.


Have you had time to hack on this?  The proposed patch no longer applies, so
I've marked the entry Waiting on Author.


Yep, here's a rebased patch. This does not include the changes discussed
with Dean in March, but I plan to address that soon.
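
The limitation described in the commit message below can be reproduced with
something like the following; table, column, and statistics names are
hypothetical:

CREATE TABLE t (a int, b int, c int, d int);
CREATE STATISTICS s (mcv) ON a, b, c FROM t;
ANALYZE t;

-- the OR list references "d", which is not covered by s, so the OR clause
-- as a whole is still treated as incompatible and the AND arguments are
-- estimated separately
EXPLAIN SELECT * FROM t WHERE a = 1 AND (b = 1 OR c = 1 OR d = 1);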


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
From 3120d4b483c203bb83e114a1099f0391fe5b Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Fri, 3 Jul 2020 03:06:33 +0200
Subject: [PATCH] Improve estimation of OR clauses using extended statistics

Until now, OR clauses were estimated using extended statistics only when
the whole clause (all the arguments) are compatible. If even just one
argument was found to be incompatible, the whole clause was estimated
ignoring extended statistics. Estimation errors for OR clauses tend to
be fairly mild, so this was considered acceptable, but it may become an
issue for OR clauses with more complex arguments, etc.

This commit relaxes the restriction, using mostly the same logic as AND
clauses. We first apply extended statistics to as many arguments as
possible, and then use the (s1 + s2 - s1 * s2) formula to factor in the
remaining clauses.

The OR clause is still considered incompatible, though. If any argument
is unsupported or references variable not covered by the statistics, the
whole OR clause is incompatible. The consequence is that e.g. clauses

(a = 1) AND (b = 1 OR c = 1 OR d = 1)

can't be estimated by statistics on (a,b,c) because the OR clause also
references "d". So we'll estimate each of the AND arguments separately,
and the extended statistics will be used only to estimate the OR clause.
This may be solved by creating statistics including the "d" column, but
the issue applies to cases where the clause type is unsupported, e.g.

(a = 1) AND (b = 1 OR c = 1 OR mod(d,10) = 0)

which can't be solved by adding "d" to the statistics, at least for now.

Author: Tomas Vondra
Reviewed-by: Dean Rasheed, Thomas Munro
Discussion: https://www.postgresql.org/message-id/flat/20200113230008.g67iyk4cs3xbnjju@development
---
 src/backend/optimizer/path/clausesel.c| 141 +--
 src/backend/statistics/extended_stats.c   |  36 ++--
 src/backend/statistics/mcv.c  |   5 +-
 src/include/optimizer/optimizer.h |   7 +
 .../statistics/extended_stats_internal.h  |   3 +-
 src/include/statistics/statistics.h   |   3 +-
 src/test/regress/expected/stats_ext.out   | 165 +-
 src/test/regress/sql/stats_ext.sql|  77 +++-
 8 files changed, 395 insertions(+), 42 deletions(-)

diff --git a/src/backend/optimizer/path/clausesel.c 
b/src/backend/optimizer/path/clausesel.c
index a3ebe10592..ce14d47409 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -92,7 +92,7 @@ clauselist_selectivity(PlannerInfo *root,
 */
 		s1 *= statext_clauselist_selectivity(root, clauses, varRelid,
 											 jointype, sjinfo, rel,
-											 &estimatedclauses);
+											 &estimatedclauses, false);
}
 
/*
@@ -104,6 +104,62 @@ clauselist_selectivity(PlannerInfo *root,

  estimatedclauses);
 }
 
+/*
+ * clauselist_selectivity_or -
+ *   Compute the selectivity of an implicitly-ORed list of boolean
+ *   expression clauses.  The list can be empty, in which case 0.0
+ *   must be returned.  List elements may be either RestrictInfos
+ *   or bare expression clauses --- the former is preferred since
+ *   it allows caching of results.
+ *
+ * See clause_selectivity() for the meaning of the additional parameters.
+ *
+ * The basic approach is to apply extended statistics first, on as many
+ * clauses as possible, in order to capture cross-column dependencies etc.
+ * The remaining clauses are then estimated using regular statistics tracked
+ * for individual columns.  This is done by simply passing the clauses to
+ * clauselist_selectivity and then combining the selectivities using the
+ * regular formula (s1+s2 - s1*s2).
+ */
+static Selectivity
+clauselist_selectivity_or(PlannerInfo *root,
+ 

Re: Persist MVCC forever - retain history

2020-07-02 Thread Mitar
Hi!

On Thu, Jul 2, 2020 at 12:12 PM David G. Johnston
 wrote:
> Even for a single table how would you go about specifying this in a 
> user-friendly way?  Then consider joins.

One general answer: you use query rewriting. But what is user-friendly
depends on the use case. For me, the main motivation for this is that
I would like to sync database and client state, including all
revisions of data. So it is pretty easy to then query based on this
row revision for which rows are newer and sync them over. And then I
can show diffs of changes through time for that particular row.

I agree that reconstructing joins at one particular moment in time in
the past requires more information. But other solutions (like copying
all changes to a separate table in triggers) also require that
information: adding a timestamp column and so on. So I can just have a
timestamp column in my original (and only) table and have a BEFORE
trigger which populates it with a timestamp. Then at a later time,
when I have all revisions of a row in one table, I can also query
based on the timestamp, and the PostgreSQL revision column helps me
address the issue of two changes happening at the same timestamp.
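
A minimal sketch of the trigger approach described above; the table and
function names are hypothetical:

ALTER TABLE doc ADD COLUMN changed_at timestamptz;

CREATE FUNCTION doc_stamp() RETURNS trigger AS $$
BEGIN
    NEW.changed_at := now();   -- record when this row version was written
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER doc_stamp BEFORE INSERT OR UPDATE ON doc
    FOR EACH ROW EXECUTE FUNCTION doc_stamp();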

I still gain that a) I do not have to copy rows to another table b) I
do not have to vacuum. The only downside is that I have to rewrite
queries for the latest state to operate only on the latest state (or
maybe PostgreSQL could continue to do this for me like now, just allow
me to also access old versions).

>  If by “this” you mean leveraging MVCC you don’t; it isn’t suitable for 
> persistent temporal data.

Why not?

> The fundamental missing piece is that there is no concept of timestamp in 
> MVCC.

That can be added using BEFORE trigger.

> Plus, wrap-around and freezing aren’t just nice-to-have features.

Oh, I forgot about that. ctid is still just 32 bits? So then for such
table with permanent MVCC this would have to be increased, to like 64
bits or something. Then one would not have to do wrap-around
protection, no?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: why do we allow people to create a publication before setting wal_level

2020-07-02 Thread Tomas Vondra

On Thu, Jul 02, 2020 at 12:37:29PM -0400, Dave Cramer wrote:

This seems pretty strange:

create publication pub1 for all tables;

WARNING:  wal_level is insufficient to publish logical changes
HINT:  Set wal_level to logical before creating subscriptions.
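
The hint refers to the usual remedy, something like:

ALTER SYSTEM SET wal_level = logical;
-- wal_level only takes effect after a server restart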



pg_dump restoring a database with publications would fail unnecessarily.

There's a more detailed explanation in the thread that ultimately added
the warning in 2019:

https://www.postgresql.org/message-id/flat/CAPjy-57rn5Y9g4e5u--eSOP-7P4QrE9uOZmT2ZcUebF8qxsYhg%40mail.gmail.com


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Enabling B-Tree deduplication by default

2020-07-02 Thread Bruce Momjian
On Thu, Jul  2, 2020 at 02:59:47PM -0700, Peter Geoghegan wrote:
> On Thu, Jun 25, 2020 at 4:28 PM Peter Geoghegan  wrote:
> > It's now time to make a final decision on this. Does anyone have any
> > reason to believe that leaving deduplication enabled by default is the
> > wrong way to go?
> 
> I marked the open item resolved just now -- B-Tree deduplication will
> remain enabled by default in Postgres 13.

+1

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: pg_read_file() with virtual files returns empty string

2020-07-02 Thread Tom Lane
Joe Conway  writes:
> On 7/2/20 5:37 PM, Tom Lane wrote:
>> I still can't get excited about contorting the code to remove that
>> issue.

> It doesn't seem much worse than the oom test that was there before -- see 
> attached.

Personally I would not bother, but it's your patch.

> Are we in agreement that whatever gets pushed should be backpatched through
> pg11 (see start of thread)?

OK by me.

regards, tom lane




Re: pg_read_file() with virtual files returns empty string

2020-07-02 Thread Joe Conway
On 7/2/20 5:37 PM, Tom Lane wrote:
> Joe Conway  writes:
> In fact, in principle there is no reason we can't get to max - 4 with this
> code except that when the filesize is exactly 1073741819, we need to try to
> read one more byte to find the EOF the way I did in my patch. I.e.:
> 
> Ah, right, *that* is where the extra byte is lost: we need a buffer
> workspace one byte more than the file size, or we won't ever actually
> see the EOF indication.
> 
> I still can't get excited about contorting the code to remove that
> issue.

It doesn't seem much worse than the oom test that was there before -- see 
attached.

In any case I will give you the last word and then quit bugging you about it ;-)

Are we in agreement that whatever gets pushed should be backpatched through pg11
(see start of thread)?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/contrib/adminpack/expected/adminpack.out b/contrib/adminpack/expected/adminpack.out
index 5738b0f..edf3ebf 100644
*** a/contrib/adminpack/expected/adminpack.out
--- b/contrib/adminpack/expected/adminpack.out
*** SELECT pg_file_rename('test_file1', 'tes
*** 79,85 
  (1 row)
  
  SELECT pg_read_file('test_file1');  -- not there
! ERROR:  could not stat file "test_file1": No such file or directory
  SELECT pg_read_file('test_file2');
   pg_read_file 
  --
--- 79,85 
  (1 row)
  
  SELECT pg_read_file('test_file1');  -- not there
! ERROR:  could not open file "test_file1" for reading: No such file or directory
  SELECT pg_read_file('test_file2');
   pg_read_file 
  --
*** SELECT pg_file_rename('test_file2', 'tes
*** 108,114 
  (1 row)
  
  SELECT pg_read_file('test_file2');  -- not there
! ERROR:  could not stat file "test_file2": No such file or directory
  SELECT pg_read_file('test_file3');
   pg_read_file 
  --
--- 108,114 
  (1 row)
  
  SELECT pg_read_file('test_file2');  -- not there
! ERROR:  could not open file "test_file2" for reading: No such file or directory
  SELECT pg_read_file('test_file3');
   pg_read_file 
  --
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index ceaa618..32a5eab 100644
*** a/src/backend/utils/adt/genfile.c
--- b/src/backend/utils/adt/genfile.c
*** read_binary_file(const char *filename, i
*** 106,138 
   bool missing_ok)
  {
  	bytea	   *buf;
! 	size_t		nbytes;
  	FILE	   *file;
  
! 	if (bytes_to_read < 0)
! 	{
! 		if (seek_offset < 0)
! 			bytes_to_read = -seek_offset;
! 		else
! 		{
! 			struct stat fst;
! 
! 			if (stat(filename, &fst) < 0)
! 			{
! if (missing_ok && errno == ENOENT)
! 	return NULL;
! else
! 	ereport(ERROR,
! 			(errcode_for_file_access(),
! 			 errmsg("could not stat file \"%s\": %m", filename)));
! 			}
! 
! 			bytes_to_read = fst.st_size - seek_offset;
! 		}
! 	}
! 
! 	/* not sure why anyone thought that int64 length was a good idea */
! 	if (bytes_to_read > (MaxAllocSize - VARHDRSZ))
  		ereport(ERROR,
  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
   errmsg("requested length too large")));
--- 106,116 
   bool missing_ok)
  {
  	bytea	   *buf;
! 	size_t		nbytes = 0;
  	FILE	   *file;
  
! 	/* clamp request size to what we can actually deliver */
! 	if (bytes_to_read > (int64) (MaxAllocSize - VARHDRSZ))
  		ereport(ERROR,
  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
   errmsg("requested length too large")));
*** read_binary_file(const char *filename, i
*** 154,162 
  (errcode_for_file_access(),
   errmsg("could not seek in file \"%s\": %m", filename)));
  
! 	buf = (bytea *) palloc((Size) bytes_to_read + VARHDRSZ);
  
! 	nbytes = fread(VARDATA(buf), 1, (size_t) bytes_to_read, file);
  
  	if (ferror(file))
  		ereport(ERROR,
--- 132,199 
  (errcode_for_file_access(),
   errmsg("could not seek in file \"%s\": %m", filename)));
  
! 	if (bytes_to_read >= 0)
! 	{
! 		/* If passed explicit read size just do it */
! 		buf = (bytea *) palloc((Size) bytes_to_read + VARHDRSZ);
  
! 		nbytes = fread(VARDATA(buf), 1, (size_t) bytes_to_read, file);
! 	}
! 	else
! 	{
! 		/* Negative read size, read rest of file */
! 		StringInfoData sbuf;
! 
! 		initStringInfo(&sbuf);
! 		/* Leave room in the buffer for the varlena length word */
! 		sbuf.len += VARHDRSZ;
! 		Assert(sbuf.len < sbuf.maxlen);
! 
! 		while (!(feof(file) || ferror(file)))
! 		{
! 			size_t		rbytes;
! 
! 			/* Minimum amount to read at a time */
! #define MIN_READ_SIZE 4096
! 
! 			/*
! 			 * If not at end of file, and sbuf.len is equal to
! 			 * MaxAllocSize - 1, then either the file is too large, or
! 			 * there is nothing left to read. Attempt to read one more
! 			 * byte to see if the end of file has been reached. If not,
! 			 * the file is too large; we'd rather give the error message
! 			 * for that ourselves.
! 			 */
! 			if 

Re: [patch] demote

2020-07-02 Thread Jehan-Guillaume de Rorthais
Hi,

Here is a small activity summary since last report.

On Thu, 25 Jun 2020 19:27:54 +0200
Jehan-Guillaume de Rorthais  wrote:
[...]
> I hadn't time to investigate Robert's concern about shared memory for snapshot
> during recovery.

I haven't had time to dig very far, but I suppose this might be related to the
comment in ProcArrayShmemSize(). If I'm right, then it seems the space is
already allocated as long as hot_standby is enabled. I realize that doesn't mean
we are on the safe side of the fence, though. I still need to get a better
understanding of this.

> The patch doesn't deal with prepared xact yet. Testing
> "start->demote->promote" raise an assert if some prepared xact exist. I
> suppose I will rollback them during demote in next patch version.

Rolling back all prepared transactions on demote seems easy. However, I realized
there's no point in cancelling them: after the demote action, they might still be
committed later on a promoted instance.

I am currently trying to clean the shared memory for existing prepared transactions
so they are handled by the startup process during recovery.
I've been able to clean TwoPhaseState and the ProcArray. I'm now in the
process of cleaning the remaining prepared xact locks.

Regards,




Re: Enabling B-Tree deduplication by default

2020-07-02 Thread Peter Geoghegan
On Thu, Jun 25, 2020 at 4:28 PM Peter Geoghegan  wrote:
> It's now time to make a final decision on this. Does anyone have any
> reason to believe that leaving deduplication enabled by default is the
> wrong way to go?

I marked the open item resolved just now -- B-Tree deduplication will
remain enabled by default in Postgres 13.

-- 
Peter Geoghegan




Re: pg_read_file() with virtual files returns empty string

2020-07-02 Thread Tom Lane
Joe Conway  writes:
> On 7/2/20 4:27 PM, Tom Lane wrote:
>> Huh, I wonder why it's not max - 5.  Probably not worth worrying about,
>> though.

> Well this part:

> + rbytes = fread(sbuf.data + sbuf.len, 1,
> +(size_t) (sbuf.maxlen - sbuf.len - 1), file);
> could actually be:
> + rbytes = fread(sbuf.data + sbuf.len, 1,
> +(size_t) (sbuf.maxlen - sbuf.len), file);
> because there is no actual need to reserve a byte for the trailing null, since
> we are not using appendBinaryStringInfo() anymore, and that is where the
> trailing NULL gets written.

No, I'd put a big -1 on that, because so far as stringinfo.c is concerned
you're violating the invariant that len must be less than maxlen.  The fact
that you happen to not hit any assertions right at the moment does not
make this code okay.

> In fact, in principle there is no reason we can't get to max - 4 with this 
> code
> except that when the filesize is exactly 1073741819, we need to try to read 
> one
> more byte to find the EOF that way I did in my patch. I.e.:

Ah, right, *that* is where the extra byte is lost: we need a buffer
workspace one byte more than the file size, or we won't ever actually
see the EOF indication.

I still can't get excited about contorting the code to remove that
issue.

regards, tom lane




Re: pg_read_file() with virtual files returns empty string

2020-07-02 Thread Joe Conway
On 7/2/20 4:27 PM, Tom Lane wrote:
> Joe Conway  writes:
>> When I saw originally MaxAllocSize - 5 fail I skipped to something smaller by
>> 4096 and it worked. But here I see that the actual max size is MaxAllocSize 
>> - 6.
> 
> Huh, I wonder why it's not max - 5.  Probably not worth worrying about,
> though.

Well this part:

+   rbytes = fread(sbuf.data + sbuf.len, 1,
+  (size_t) (sbuf.maxlen - sbuf.len - 1), file);

could actually be:

+   rbytes = fread(sbuf.data + sbuf.len, 1,
+  (size_t) (sbuf.maxlen - sbuf.len), file);

because there is no actual need to reserve a byte for the trailing null, since
we are not using appendBinaryStringInfo() anymore, and that is where the
trailing NULL gets written.

With that change (and some elog(NOTICE,...) calls) we have:

select length(pg_read_binary_file('/tmp/rbftest2.bin'));
NOTICE:  loop start - buf max len: 1024; buf len 4
NOTICE:  loop end - buf max len: 8192; buf len 8192
NOTICE:  loop start - buf max len: 8192; buf len 8192
NOTICE:  loop end - buf max len: 16384; buf len 16384
NOTICE:  loop start - buf max len: 16384; buf len 16384
[...]
NOTICE:  loop end - buf max len: 536870912; buf len 536870912
NOTICE:  loop start - buf max len: 536870912; buf len 536870912
NOTICE:  loop end - buf max len: 1073741823; buf len 1073741822
   length
------------
 1073741818
(1 row)

Or max - 5, so we got our byte back :-)

In fact, in principle there is no reason we can't get to max - 4 with this code
except that when the filesize is exactly 1073741819, we need to try to read one
more byte to find the EOF that way I did in my patch. I.e.:

-- use 1073741819 byte file
select length(pg_read_binary_file('/tmp/rbftest1.bin'));
NOTICE:  loop start - buf max len: 1024; buf len 4
NOTICE:  loop end - buf max len: 8192; buf len 8192
NOTICE:  loop start - buf max len: 8192; buf len 8192
NOTICE:  loop end - buf max len: 16384; buf len 16384
NOTICE:  loop start - buf max len: 16384; buf len 16384
[...]
NOTICE:  loop end - buf max len: 536870912; buf len 536870912
NOTICE:  loop start - buf max len: 536870912; buf len 536870912
NOTICE:  loop end - buf max len: 1073741823; buf len 1073741823
NOTICE:  loop start - buf max len: 1073741823; buf len 1073741823
ERROR:  requested length too large

Because we read the last byte, but not beyond, EOF is not reached, so on the
next loop iteration we continue and fail on max size rather than exit the loop.

But I am guessing that test in particular was what you thought too complicated
for what it accomplishes?
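
For illustration, here is a minimal standalone sketch of that idiom in plain C
(not the genfile.c code; MAX_BYTES and CHUNK_SIZE are made-up stand-ins for
MaxAllocSize - VARHDRSZ and the real chunk size): read in chunks up to the cap,
then probe one extra byte so feof() can tell "exactly at the cap" apart from
"too large":

#include <stdio.h>
#include <stdlib.h>

/* Illustrative limits only. */
#define MAX_BYTES  (16 * 1024 * 1024)
#define CHUNK_SIZE 4096

/* Read all of "fp" into a malloc'd buffer, failing if it exceeds MAX_BYTES. */
static char *
read_whole_stream(FILE *fp, size_t *len_out)
{
    char   *buf = malloc(MAX_BYTES);
    size_t  len = 0;

    if (buf == NULL)
        return NULL;

    for (;;)
    {
        size_t  want = MAX_BYTES - len;

        if (want > CHUNK_SIZE)
            want = CHUNK_SIZE;
        len += fread(buf + len, 1, want, fp);

        if (ferror(fp))
        {
            free(buf);
            return NULL;
        }
        if (feof(fp))
            break;              /* whole file read, possibly exactly MAX_BYTES */
        if (len == MAX_BYTES)
        {
            char    probe;

            /* Probe one more byte: hitting EOF here means the file fit exactly. */
            if (fread(&probe, 1, 1, fp) == 0 && feof(fp))
                break;
            free(buf);
            fprintf(stderr, "file too large\n");
            return NULL;
        }
    }
    *len_out = len;
    return buf;
}

int
main(void)
{
    size_t  len;
    char   *data = read_whole_stream(stdin, &len);

    if (data == NULL)
        return 1;
    printf("read %zu bytes\n", len);
    free(data);
    return 0;
}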

Joe
-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: Warn when parallel restoring a custom dump without data offsets

2020-07-02 Thread Tom Lane
David Gilman  writes:
> I've attached the latest patches after further review from Justin Pryzby.

I guess I'm completely confused about the purpose of these patches.
Far from coping with the situation of an unseekable file, they appear
to change pg_restore so that it fails altogether if it can't seek
its input file.  Why would we want to do this?

regards, tom lane




Re: pg_read_file() with virtual files returns empty string

2020-07-02 Thread Tom Lane
Joe Conway  writes:
> When I saw originally MaxAllocSize - 5 fail I skipped to something smaller by
> 4096 and it worked. But here I see that the actual max size is MaxAllocSize - 
> 6.

Huh, I wonder why it's not max - 5.  Probably not worth worrying about,
though.

regards, tom lane




Re: estimation problems for DISTINCT ON with FDW

2020-07-02 Thread Tom Lane
Concretely, I now propose the attached, which seems entirely
safe to back-patch.

regards, tom lane

diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 6587678af2..1e997c218b 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -123,6 +123,14 @@ GetForeignRelSize(PlannerInfo *root,
  should be replaced if at all possible.  The function may also choose to
  update baserel-width if it can compute a better estimate
  of the average result row width.
+ (The initial value is based on column data types and on column
+ average-width values measured by the last ANALYZE.)
+ Also, this function may update baserel-tuples if
+ it can compute a better estimate of the foreign table's total row count.
+ (The initial value is
+ from pg_class.reltuples
+ which represents the total row count seen by the
+ last ANALYZE.)
 
 
 
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d984da25d7..63761d5593 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -911,6 +911,9 @@ set_foreign_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* ... but do not let it set the rows estimate to zero */
 	rel->rows = clamp_row_est(rel->rows);
+
+	/* also, make sure rel->tuples is not insane relative to rel->rows */
+	rel->tuples = Max(rel->tuples, rel->rows);
 }
 
 /*


Re: pg_read_file() with virtual files returns empty string

2020-07-02 Thread Joe Conway
On 7/2/20 3:36 PM, Tom Lane wrote:
> Joe Conway  writes:
>> On 7/1/20 6:22 PM, Tom Lane wrote:
>>> Hm, I was expecting that the last successful iteration of
>>> enlargeStringInfo would increase the buffer size to MaxAllocSize,
>>> so that we'd really only be losing one byte (which we can't avoid
>>> if we use stringinfo).  But you're right that it's most likely moot
>>> since later manipulations of such a result would risk hitting overflows.
> 
>> Sorry to open this can of worms again, but I couldn't get my head past the 
>> fact
>> that reading the entire file would have a different size limit than reading 
>> the
>> exact number of bytes in the file.
> 
> Are you sure there actually is any such limit in the other code,
> after accounting for the way that stringinfo.c will enlarge its
> buffer?  That is, I believe that the limit is MaxAllocSize minus
> five bytes, not something less.
> 
>> So, inspired by what you did (and StringInfo itself) I came up with the
>> attached. This version performs equivalently to your patch (and HEAD), and
>> allows files up to and including (MaxAllocSize - VARHDRSZ) -- i.e. exactly 
>> the
>> same as the specified-length case and legacy behavior for the full file read.
> 
> I find this way overcomplicated for what it accomplishes.  In the
> real world there's not much difference between MaxAllocSize minus
> five and MaxAllocSize minus four.

Ok, so your version was not as bad as I thought:

ll /tmp/rbftest*.bin
-rw-r--r-- 1 postgres postgres 1073741819 Jul  2 15:48 /tmp/rbftest1.bin
-rw-r--r-- 1 postgres postgres 1073741818 Jul  2 15:47 /tmp/rbftest2.bin
-rw-r--r-- 1 postgres postgres 1073741817 Jul  2 15:53 /tmp/rbftest3.bin

rbftest1.bin == MaxAllocSize - 4
rbftest2.bin == MaxAllocSize - 5
rbftest3.bin == MaxAllocSize - 6

postgres=# select length(pg_read_binary_file('/tmp/rbftest1.bin'));
ERROR:  requested length too large
postgres=# select length(pg_read_binary_file('/tmp/rbftest2.bin'));
ERROR:  requested length too large
postgres=# select length(pg_read_binary_file('/tmp/rbftest3.bin'));
   length
------------
 1073741817

When I saw originally MaxAllocSize - 5 fail I skipped to something smaller by
4096 and it worked. But here I see that the actual max size is MaxAllocSize - 6.
I guess I can live with that.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: Sync vs Flush

2020-07-02 Thread Jaka Jančar
Makes sense, thanks!

On Thu, Jul 2, 2020 at 15:29 Tom Lane  wrote:

> =?UTF-8?B?SmFrYSBKYW7EjWFy?=  writes:
> > What is a common situation for using Flush instead of Sync?
> > When would you need and wait for the output, get an error, yet still
> > proceed to send further messages that you would want the server to
> ignore?
>
> The only case I can think of offhand is bursting some time-consuming
> queries to the server, that is sending this all at once:
>
>Execute, Flush, Execute, Flush, Execute, Flush, Execute, Sync
>
> This presumes that, if an earlier query fails, you want the rest
> to be abandoned; else you'd use Syncs instead.  But if you leave
> out the Flushes then you won't see the tail end of (or indeed
> maybe none of) the output of an earlier query until a later query
> fills the server's output buffer.  So if you're hoping to overlap
> the client's processing with the server's you want the extra flushes.
>
> regards, tom lane
>


Re: Use of "long" in incremental sort code

2020-07-02 Thread Peter Geoghegan
On Thu, Jul 2, 2020 at 12:47 PM James Coleman  wrote:
> But wouldn't that mean we'd get int on 32-bit systems, and since we're
> accumulating data we could go over that value in both memory and disk?
>
> My assumption is that it's preferable to have the "this run value" and
> the "total used across multiple runs" and both of those for disk and
> memory to be the same. In that case it seems we want to guarantee
> 64-bits.

I agree. There seems to be little reason to accommodate platform level
conventions, beyond making sure that everything works on less popular
or obsolete platforms.

I suppose that it's a little idiosyncratic to use int64 like this. But
it makes sense, and isn't nearly as ugly as the long thing, so I don't
think that it should really matter.

-- 
Peter Geoghegan




Re: Use of "long" in incremental sort code

2020-07-02 Thread Tom Lane
James Coleman  writes:
> On Thu, Jul 2, 2020 at 3:39 PM Tom Lane  wrote:
>> mumble ssize_t mumble

> But wouldn't that mean we'd get int on 32-bit systems, and since we're
> accumulating data we could go over that value in both memory and disk?

Certainly, a number that's meant to represent the amount of data *on disk*
shouldn't use ssize_t.  But I think it's appropriate if you want to
represent in-memory quantities while also allowing negative values.

I guess if you're expecting in-memory sizes exceeding 2GB, you might worry
that ssize_t could overflow.  I'm dubious that a 32-bit machine could get
to that, though, seeing that it's going to have other demands on its
address space.

> My assumption is that it's preferable to have the "this run value" and
> the "total used across multiple runs" and both of those for disk and
> memory to be the same. In that case it seems we want to guarantee
> 64-bits.

If you're not going to distinguish in-memory from not-in-memory, agreed.

regards, tom lane




Re: Use of "long" in incremental sort code

2020-07-02 Thread Peter Geoghegan
On Thu, Jul 2, 2020 at 12:44 PM Tom Lane  wrote:
> > That's from POSIX, though. I imagine MSVC won't be happy (surprise!).
>
> We've got quite a few uses of it already, so apparently it's fine.

Oh, looks like we have a compatibility hack for MSVC within
win32_port.h, where ssize_t is typedef'd to __int64. I didn't realize
that it was okay to use ssize_t.

-- 
Peter Geoghegan




Re: Use of "long" in incremental sort code

2020-07-02 Thread James Coleman
On Thu, Jul 2, 2020 at 3:39 PM Tom Lane  wrote:
>
> Peter Geoghegan  writes:
> > On Thu, Jul 2, 2020 at 10:53 AM James Coleman  wrote:
> >> Do you think it's reasonable to use int64 across the board for memory
> >> and disk space numbers then? If so, I can update the patch.
>
> > Using int64 as a replacement for long is the safest general strategy,
>
> mumble ssize_t mumble

But wouldn't that mean we'd get int on 32-bit systems, and since we're
accumulating data we could go over that value in both memory and disk?

My assumption is that it's preferable to have the "this run value" and
the "total used across multiple runs" and both of those for disk and
memory to be the same. In that case it seems we want to guarantee
64-bits.

Patch using int64 attached.

James


v2-0001-Use-int64-instead-of-long-for-space-used-variable.patch
Description: Binary data


Re: Use of "long" in incremental sort code

2020-07-02 Thread Tom Lane
Peter Geoghegan  writes:
> On Thu, Jul 2, 2020 at 12:39 PM Tom Lane  wrote:
>> mumble ssize_t mumble

> That's from POSIX, though. I imagine MSVC won't be happy (surprise!).

We've got quite a few uses of it already, so apparently it's fine.

regards, tom lane




Re: Use of "long" in incremental sort code

2020-07-02 Thread Peter Geoghegan
On Thu, Jul 2, 2020 at 12:39 PM Tom Lane  wrote:
> mumble ssize_t mumble

That's from POSIX, though. I imagine MSVC won't be happy (surprise!).

-- 
Peter Geoghegan




Re: Use of "long" in incremental sort code

2020-07-02 Thread Tom Lane
Peter Geoghegan  writes:
> On Thu, Jul 2, 2020 at 10:53 AM James Coleman  wrote:
>> Do you think it's reasonable to use int64 across the board for memory
>> and disk space numbers then? If so, I can update the patch.

> Using int64 as a replacement for long is the safest general strategy,

mumble ssize_t mumble

regards, tom lane




Re: pg_read_file() with virtual files returns empty string

2020-07-02 Thread Tom Lane
Joe Conway  writes:
> On 7/1/20 6:22 PM, Tom Lane wrote:
>> Hm, I was expecting that the last successful iteration of
>> enlargeStringInfo would increase the buffer size to MaxAllocSize,
>> so that we'd really only be losing one byte (which we can't avoid
>> if we use stringinfo).  But you're right that it's most likely moot
>> since later manipulations of such a result would risk hitting overflows.

> Sorry to open this can of worms again, but I couldn't get my head past the 
> fact
> that reading the entire file would have a different size limit than reading 
> the
> exact number of bytes in the file.

Are you sure there actually is any such limit in the other code,
after accounting for the way that stringinfo.c will enlarge its
buffer?  That is, I believe that the limit is MaxAllocSize minus
five bytes, not something less.

> So, inspired by what you did (and StringInfo itself) I came up with the
> attached. This version performs equivalently to your patch (and HEAD), and
> allows files up to and including (MaxAllocSize - VARHDRSZ) -- i.e. exactly the
> same as the specified-length case and legacy behavior for the full file read.

I find this way overcomplicated for what it accomplishes.  In the
real world there's not much difference between MaxAllocSize minus
five and MaxAllocSize minus four.

regards, tom lane




Re: Sync vs Flush

2020-07-02 Thread Tom Lane
=?UTF-8?B?SmFrYSBKYW7EjWFy?=  writes:
> What is a common situation for using Flush instead of Sync?
> When would you need and wait for the output, get an error, yet still
> proceed to send further messages that you would want the server to ignore?

The only case I can think of offhand is bursting some time-consuming
queries to the server, that is sending this all at once:

   Execute, Flush, Execute, Flush, Execute, Flush, Execute, Sync

This presumes that, if an earlier query fails, you want the rest
to be abandoned; else you'd use Syncs instead.  But if you leave
out the Flushes then you won't see the tail end of (or indeed
maybe none of) the output of an earlier query until a later query
fills the server's output buffer.  So if you're hoping to overlap
the client's processing with the server's you want the extra flushes.
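
For anyone following along at the protocol level, Flush and Sync differ only in
their type byte ('H' vs 'S'); both carry an empty body. A rough standalone
sketch (not libpq code) of writing them to an already-established frontend
socket:

#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>

/*
 * Flush ('H') and Sync ('S') are bodyless frontend messages: one type byte
 * plus a 4-byte big-endian length that counts only itself, i.e. always 4.
 * Error handling omitted for brevity.
 */
static int
send_empty_message(int sock, char type)
{
    char        msg[5];
    uint32_t    len = htonl(4);

    msg[0] = type;
    memcpy(msg + 1, &len, 4);
    return write(sock, msg, sizeof(msg)) == (ssize_t) sizeof(msg) ? 0 : -1;
}

/* e.g. after each burst item:  send_empty_message(sock, 'H');   Flush */
/* and once at the end:         send_empty_message(sock, 'S');   Sync  */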

regards, tom lane




Re: Persist MVCC forever - retain history

2020-07-02 Thread Thomas Munro
On Fri, Jul 3, 2020 at 6:56 AM Mitar  wrote:
> I was thinking and reading about how to design the schema to keep
> records of all changes which happen to the table, at row granularity,
> when I realized that all this is already done for me by PostgreSQL
> MVCC. All rows (tuples) are already stored, with an internal version
> field as well.

This was a research topic in ancient times (somewhere I read that in
some ancient version, VACUUM didn't originally remove tuples, it moved
them to permanent write-only storage).  Even after the open source
project began, there was a "time travel" feature, but it was removed
in 6.2:

https://www.postgresql.org/docs/6.3/c0503.htm

> So I wonder, how could I hack PostgreSQL to disable vacuuming a table,
> so that all tuples persist forever, and how could I make those
> internal columns visible so that I could make queries asking for
> results at the particular historical version of table state? My
> understanding is that indices are already indexing over those internal
> columns as well, so those queries over historical versions would be
> efficient as well. Am I missing something which would make this not
> possible?

There aren't indexes on those things.

If you want to keep track of all changes in a way that lets you query
things as of historical times, including joins, and possibly including
multiple time dimensions ("on the 2nd of Feb, what address did we
think Fred lived at on the 1st of Jan?") you might want to read
"Developing Time-Oriented Database Applications in SQL" about this,
freely available as a PDF[1].  There's also a bunch of temporal
support in more recent SQL standards, not supported by PostgreSQL, and
it was designed by the author of that book.  There are people working
on trying to implement parts of the standard support for PostgreSQL.

> Is this something I would have to run a custom version of PostgreSQL
> or is this possible through an extension of sort?

There are some extensions that offer some temporal support inspired by
the standard (I haven't used any of them so I can't comment on them).

[1] http://www2.cs.arizona.edu/~rts/publications.html




Re: Persist MVCC forever - retain history

2020-07-02 Thread David G. Johnston
On Thursday, July 2, 2020, Mitar  wrote:


> make queries asking for
> results at the particular historical version of table state?


Even for a single table how would you go about specifying this in a
user-friendly way?  Then consider joins.


> Is this something I would have to run a custom version of PostgreSQL
> or is this possible through an extension of sort?
>

 If by “this” you mean leveraging MVCC you don’t; it isn’t suitable for
persistent temporal data.

The fundamental missing piece is that there is no concept of timestamp in
MVCC. Plus, wrap-around and freezing aren’t just nice-to-have features.

David J.


Persist MVCC forever - retain history

2020-07-02 Thread Mitar
Hi!

(Sorry if this was already discussed, it looks pretty obvious, but I
could not find anything.)

I was thinking and reading about how to design the schema to keep
records of all changes which happen to the table, at row granularity,
when I realized that all this is already done for me by PostgreSQL
MVCC. All rows (tuples) are already stored, with an internal version
field as well.

So I wonder, how could I hack PostgreSQL to disable vacuuming a table,
so that all tuples persist forever, and how could I make those
internal columns visible so that I could make queries asking for
results at the particular historical version of table state? My
understanding is that indices are already indexing over those internal
columns as well, so those queries over historical versions would be
efficient as well. Am I missing something which would make this not
possible?

Is this something I would have to run a custom version of PostgreSQL
or is this possible through an extension of sort?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Use of "long" in incremental sort code

2020-07-02 Thread Peter Geoghegan
On Thu, Jul 2, 2020 at 10:53 AM James Coleman  wrote:
> Do you think it's reasonable to use int64 across the board for memory
> and disk space numbers then? If so, I can update the patch.

Using int64 as a replacement for long is the safest general strategy,
and so ISTM that it might be worth doing that even in cases where it
isn't clearly necessary. After all, any code that uses long must have
been written with the assumption that that was the same thing as
int64, at least on most platforms.

There is nothing wrong with using Size/size_t, and doing so is often
slightly clearer. But it's no drop-in replacement for long.
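
As a purely illustrative aside, a trivial standalone program makes the hazard
concrete: on LLP64 Win64 long is 4 bytes, while int64_t is 8 bytes everywhere:

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
    /* On Win64 (LLP64) the first value is 4; elsewhere on 64-bit it is 8. */
    printf("sizeof(long)    = %zu\n", sizeof(long));
    printf("sizeof(size_t)  = %zu\n", sizeof(size_t));
    printf("sizeof(int64_t) = %zu\n", sizeof(int64_t));
    return 0;
}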

-- 
Peter Geoghegan




Re: pg_read_file() with virtual files returns empty string

2020-07-02 Thread Joe Conway
On 7/1/20 6:22 PM, Tom Lane wrote:
> Joe Conway  writes:
>> The only downside is that the max filesize is reduced to (MaxAllocSize -
>> MIN_READ_SIZE - 1) compared to MaxAllocSize with the old method.
> 
> Hm, I was expecting that the last successful iteration of
> enlargeStringInfo would increase the buffer size to MaxAllocSize,
> so that we'd really only be losing one byte (which we can't avoid
> if we use stringinfo).  But you're right that it's most likely moot
> since later manipulations of such a result would risk hitting overflows.
> 
> I marked the CF entry as RFC.

Sorry to open this can of worms again, but I couldn't get my head past the fact
that reading the entire file would have a different size limit than reading the
exact number of bytes in the file.

So, inspired by what you did (and StringInfo itself) I came up with the
attached. This version performs equivalently to your patch (and HEAD), and
allows files up to and including (MaxAllocSize - VARHDRSZ) -- i.e. exactly the
same as the specified-length case and legacy behavior for the full file read.

But if you object I will just go with your version barring any other opinions.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/contrib/adminpack/expected/adminpack.out b/contrib/adminpack/expected/adminpack.out
index 5738b0f..edf3ebf 100644
*** a/contrib/adminpack/expected/adminpack.out
--- b/contrib/adminpack/expected/adminpack.out
*** SELECT pg_file_rename('test_file1', 'tes
*** 79,85 
  (1 row)
  
  SELECT pg_read_file('test_file1');  -- not there
! ERROR:  could not stat file "test_file1": No such file or directory
  SELECT pg_read_file('test_file2');
   pg_read_file 
  --
--- 79,85 
  (1 row)
  
  SELECT pg_read_file('test_file1');  -- not there
! ERROR:  could not open file "test_file1" for reading: No such file or directory
  SELECT pg_read_file('test_file2');
   pg_read_file 
  --
*** SELECT pg_file_rename('test_file2', 'tes
*** 108,114 
  (1 row)
  
  SELECT pg_read_file('test_file2');  -- not there
! ERROR:  could not stat file "test_file2": No such file or directory
  SELECT pg_read_file('test_file3');
   pg_read_file 
  --
--- 108,114 
  (1 row)
  
  SELECT pg_read_file('test_file2');  -- not there
! ERROR:  could not open file "test_file2" for reading: No such file or directory
  SELECT pg_read_file('test_file3');
   pg_read_file 
  --
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index ceaa618..1515032 100644
*** a/src/backend/utils/adt/genfile.c
--- b/src/backend/utils/adt/genfile.c
*** read_binary_file(const char *filename, i
*** 106,138 
   bool missing_ok)
  {
  	bytea	   *buf;
! 	size_t		nbytes;
  	FILE	   *file;
  
! 	if (bytes_to_read < 0)
! 	{
! 		if (seek_offset < 0)
! 			bytes_to_read = -seek_offset;
! 		else
! 		{
! 			struct stat fst;
! 
! 			if (stat(filename, &fst) < 0)
! 			{
! if (missing_ok && errno == ENOENT)
! 	return NULL;
! else
! 	ereport(ERROR,
! 			(errcode_for_file_access(),
! 			 errmsg("could not stat file \"%s\": %m", filename)));
! 			}
! 
! 			bytes_to_read = fst.st_size - seek_offset;
! 		}
! 	}
! 
! 	/* not sure why anyone thought that int64 length was a good idea */
! 	if (bytes_to_read > (MaxAllocSize - VARHDRSZ))
  		ereport(ERROR,
  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
   errmsg("requested length too large")));
--- 106,116 
   bool missing_ok)
  {
  	bytea	   *buf;
! 	size_t		nbytes = 0;
  	FILE	   *file;
  
! 	/* clamp request size to what we can actually deliver */
! 	if (bytes_to_read > (int64) (MaxAllocSize - VARHDRSZ))
  		ereport(ERROR,
  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
   errmsg("requested length too large")));
*** read_binary_file(const char *filename, i
*** 154,162 
  (errcode_for_file_access(),
   errmsg("could not seek in file \"%s\": %m", filename)));
  
! 	buf = (bytea *) palloc((Size) bytes_to_read + VARHDRSZ);
  
! 	nbytes = fread(VARDATA(buf), 1, (size_t) bytes_to_read, file);
  
  	if (ferror(file))
  		ereport(ERROR,
--- 132,194 
  (errcode_for_file_access(),
   errmsg("could not seek in file \"%s\": %m", filename)));
  
! 	if (bytes_to_read >= 0)
! 	{
! 		/* If passed explicit read size just do it */
! 		buf = (bytea *) palloc((Size) bytes_to_read + VARHDRSZ);
  
! 		nbytes = fread(VARDATA(buf), 1, (size_t) bytes_to_read, file);
! 	}
! 	else
! 	{
! 		/* Negative read size, read rest of file */
! 
! 		/* Chunk size for reads and allocations */
! #define READ_BLOCK_SIZE 4096
! 		/* VARDATA maximum allowed size */
! 		int64	alloc_max = MaxAllocSize - VARHDRSZ;
! 		/* VARDATA allocated size */
! 		Size	maxdatlen = READ_BLOCK_SIZE - VARHDRSZ;
! 
! 		/* initialize the buffer */
! 		buf = (bytea *) palloc(0);
! 
! 		while (!(feof(file) || 

Re: Use of "long" in incremental sort code

2020-07-02 Thread James Coleman
On Thu, Jul 2, 2020 at 1:36 PM Peter Geoghegan  wrote:
>
> On Mon, Jun 29, 2020 at 9:13 PM David Rowley  wrote:
> > I noticed the incremental sort code makes use of the long datatype a
> > few times, e.g in TuplesortInstrumentation and
> > IncrementalSortGroupInfo.
>
> I agree that long is terrible, and should generally be avoided.
>
> > Maybe Size would be better for the in-memory fields and uint64 for the
> > on-disk fields?
>
> FWIW we have to use int64 for the in-memory tuplesort.c fields. This
> is because it must be possible for the fields to have negative values
> in the context of tuplesort. If there is going to be a general rule
> for in-memory fields, then ISTM that it'll have to be "use int64".
>
> logtape.c uses long for on-disk fields. It also relies on negative
> values, albeit to a fairly limited degree (it uses -1 as a magic
> value).

Do you think it's reasonable to use int64 across the board for memory
and disk space numbers then? If so, I can update the patch.

James




Re: Use of "long" in incremental sort code

2020-07-02 Thread Peter Geoghegan
On Mon, Jun 29, 2020 at 9:13 PM David Rowley  wrote:
> I noticed the incremental sort code makes use of the long datatype a
> few times, e.g in TuplesortInstrumentation and
> IncrementalSortGroupInfo.

I agree that long is terrible, and should generally be avoided.

> Maybe Size would be better for the in-memory fields and uint64 for the
> on-disk fields?

FWIW we have to use int64 for the in-memory tuplesort.c fields. This
is because it must be possible for the fields to have negative values
in the context of tuplesort. If there is going to be a general rule
for in-memory fields, then ISTM that it'll have to be "use int64".

logtape.c uses long for on-disk fields. It also relies on negative
values, albeit to a fairly limited degree (it uses -1 as a magic
value).

-- 
Peter Geoghegan




Re: Sync vs Flush

2020-07-02 Thread Jaka Jančar
Hehe, that's exactly what I am doing, which is why I thought of just
sending two Syncs. Good to hear it's OK.

From reading the Extended query protocol docs, I somehow got the impression
that you need to do everything within one cycle, and send Sync only at the
end of the cycle:

 - "The extended query protocol breaks down the above-described simple
query protocol into multiple steps."
 - "[Only] At completion of each series of extended-query messages, the
frontend should issue a Sync message."
 - "A Flush [and not Sync] must be sent [...] if the frontend wishes to
examine the results of that command before issuing more commands."
 - "The simple Query message is approximately equivalent to the series
Parse, Bind, portal Describe, Execute, Close, Sync."

What is a common situation for using Flush instead of Sync?
When would you need and wait for the output, get an error, yet still
proceed to send further messages that you would want the server to ignore?

Jaka

On Thu, Jul 2, 2020 at 12:41 PM Tom Lane  wrote:

> =?UTF-8?B?SmFrYSBKYW7EjWFy?=  writes:
> > For an extended query that needs to get parameter types before sending
> > them, is there a difference in doing:
>
> > Parse, Describe statement, Flush, Bind, Execute, Sync
> > vs
> > Parse, Describe statement, Sync, Bind, Execute, Sync
>
> Sync is a resync point after an error, so the real question is what
> you want to have happen if you get some kind of error during the Parse.
> If you expect that the app wouldn't proceed with issuing Bind/Execute
> then you want to do it the second way.
>
> I suppose you could do
>
> Send Parse/Describe/Flush
> Read results
> If OK:
>Send Bind/Execute/Sync
> else:
>Send Sync# needed to get back to normal state
>
> but that doesn't sound all that convenient.
>
> regards, tom lane
>


Re: Using Valgrind to detect faulty buffer accesses (no pin or buffer content lock held)

2020-07-02 Thread Peter Geoghegan
On Thu, Jul 2, 2020 at 7:48 AM Daniel Gustafsson  wrote:
> This patch fails to apply to HEAD due to conflicts in nbtpage.c, can you 
> please
> submit a rebased version?

I attach the rebased patch series.

Thanks
-- 
Peter Geoghegan


v3-0002-Add-nbtree-Valgrind-buffer-lock-checks.patch
Description: Binary data


v3-0001-Add-Valgrind-buffer-access-instrumentation.patch
Description: Binary data


Re: proposal: unescape_text function

2020-07-02 Thread Pavel Stehule
čt 2. 7. 2020 v 17:27 odesílatel Daniel Gustafsson  napsal:

> > On 23 Jun 2020, at 11:51, Pavel Stehule  wrote:
>
> > I changed the name to more accurately "unicode_unescape". Patch is
> assigned
>
> You've made this function return Oid, where it used to be void.  Was that a
> copy-paste mistake? Else the code needs fixing as it doesn't return an Oid.
>
> +Oid
> +check_unicode_value(pg_wchar c)
> +{
> +   if (!is_valid_unicode_codepoint(c))
> +   ereport(ERROR,
> +   (errcode(ERRCODE_SYNTAX_ERROR),
> +errmsg("invalid Unicode escape value")));
> +}
>
>
yes, it is my error

I am sending fixed patch

Thank you for check

Pavel

cheers ./daniel
>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f065856535..6aecdf1641 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3539,6 +3539,24 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ unicode_unescape
+
+unicode_unescape ( string text
+, escape_char text  )
+text
+   
+   
+Evaluate escaped unicode chars (4 or 6 digits) to chars.
+   
+   
+unicode_unescape('\0441\043B\043E\043D')
+слон
+   
+  
+
  
 

diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
index be86eb37fe..c7f94298c1 100644
--- a/src/backend/parser/parser.c
+++ b/src/backend/parser/parser.c
@@ -26,7 +26,6 @@
 #include "parser/parser.h"
 #include "parser/scansup.h"
 
-static bool check_uescapechar(unsigned char escape);
 static char *str_udeescape(const char *str, char escape,
 		   int position, core_yyscan_t yyscanner);
 
@@ -278,44 +277,6 @@ base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, core_yyscan_t yyscanner)
 	return cur_token;
 }
 
-/* convert hex digit (caller should have verified that) to value */
-static unsigned int
-hexval(unsigned char c)
-{
-	if (c >= '0' && c <= '9')
-		return c - '0';
-	if (c >= 'a' && c <= 'f')
-		return c - 'a' + 0xA;
-	if (c >= 'A' && c <= 'F')
-		return c - 'A' + 0xA;
-	elog(ERROR, "invalid hexadecimal digit");
-	return 0;	/* not reached */
-}
-
-/* is Unicode code point acceptable? */
-static void
-check_unicode_value(pg_wchar c)
-{
-	if (!is_valid_unicode_codepoint(c))
-		ereport(ERROR,
-(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("invalid Unicode escape value")));
-}
-
-/* is 'escape' acceptable as Unicode escape character (UESCAPE syntax) ? */
-static bool
-check_uescapechar(unsigned char escape)
-{
-	if (isxdigit(escape)
-		|| escape == '+'
-		|| escape == '\''
-		|| escape == '"'
-		|| scanner_isspace(escape))
-		return false;
-	else
-		return true;
-}
-
 /*
  * Process Unicode escapes in "str", producing a palloc'd plain string
  *
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index cac70d5df7..9d3173bc6d 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -218,3 +218,41 @@ scanner_isspace(char ch)
 		return true;
 	return false;
 }
+
+/* convert hex digit (caller should have verified that) to value */
+unsigned int
+hexval(unsigned char c)
+{
+	if (c >= '0' && c <= '9')
+		return c - '0';
+	if (c >= 'a' && c <= 'f')
+		return c - 'a' + 0xA;
+	if (c >= 'A' && c <= 'F')
+		return c - 'A' + 0xA;
+	elog(ERROR, "invalid hexadecimal digit");
+	return 0;	/* not reached */
+}
+
+/* is Unicode code point acceptable? */
+void
+check_unicode_value(pg_wchar c)
+{
+	if (!is_valid_unicode_codepoint(c))
+		ereport(ERROR,
+(errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid Unicode escape value")));
+}
+
+/* is 'escape' acceptable as Unicode escape character (UESCAPE syntax) ? */
+bool
+check_uescapechar(unsigned char escape)
+{
+	if (isxdigit(escape)
+		|| escape == '+'
+		|| escape == '\''
+		|| escape == '"'
+		|| scanner_isspace(escape))
+		return false;
+	else
+		return true;
+}
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index df10bfb906..ce8373c417 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -6139,3 +6139,202 @@ unicode_is_normalized(PG_FUNCTION_ARGS)
 
 	PG_RETURN_BOOL(result);
 }
+
+/*
+ * Process Unicode escapes in "str"
+ *
+ * escape: the escape character to use
+ */
+static void
+udeescape(StringInfo str, const char *instr, size_t len, char escape)
+{
+	pg_wchar	pair_first = 0;
+	char		cbuf[MAX_UNICODE_EQUIVALENT_STRING + 1];
+
+	while (len > 0)
+	{
+		if (instr[0] == escape)
+		{
+			if (len >= 2 &&
+instr[1] == escape)
+			{
+if (pair_first)
+	goto invalid_pair;
+appendStringInfoChar(str, escape);
+instr += 2;
+len -= 2;
+			}
+			else if (len >= 5 &&
+	 isxdigit((unsigned char) instr[1]) &&
+	 isxdigit((unsigned char) instr[2]) &&
+	 isxdigit((unsigned char) instr[3]) &&
+	 isxdigit((unsigned char) instr[4]))
+			{
+pg_wchar	unicode;
+
+unicode = (hexval(instr[1]) << 12) +
+	(hexval(instr[2]) << 8) +
+	(hexval(instr[3]) << 4) +
+	

Re: cannot restore schema with is not distinct from on hstore since PG 9.6.8

2020-07-02 Thread Tom Lane
[ just to tie back to this old thread ]

I wrote:
> I poked into this area for awhile, and it turns out to be even a
> worse can of worms than I thought.  I looked through gram.y and
> parse_expr.c, and identified several distinct classes of issue.
> (I'm not promising that I found everything.)

In fact, the thread at [1] identifies an essentially similar class
of issue that I missed: JOIN ... USING (x) also implicitly looks up
an equality operator by seeing what "tab1.x = tab2.x" resolves as.
This is much like the CASE situation, in that there's no
SQL-standard-compliant way to reverse-list a view containing
such a construct while showing how it was resolved.  We'd need
to invent some new syntax if we want to make this safer.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/CAC35HNnNGavaZ%3DP%3DrUcwTwYEhfoyXDg32REXCRDgxBmC3No3nA%40mail.gmail.com




Re: Sync vs Flush

2020-07-02 Thread Tom Lane
=?UTF-8?B?SmFrYSBKYW7EjWFy?=  writes:
> For an extended query that needs to get parameter types before sending
> them, is there a difference in doing:

> Parse, Describe statement, Flush, Bind, Execute, Sync
> vs
> Parse, Describe statement, Sync, Bind, Execute, Sync

Sync is a resync point after an error, so the real question is what
you want to have happen if you get some kind of error during the Parse.
If you expect that the app wouldn't proceed with issuing Bind/Execute
then you want to do it the second way.

I suppose you could do

Send Parse/Describe/Flush
Read results
If OK:
   Send Bind/Execute/Sync
else:
   Send Sync# needed to get back to normal state

but that doesn't sound all that convenient.

regards, tom lane




Re: Binary support for pgoutput plugin

2020-07-02 Thread Dave Cramer
rebased

Thanks,

Dave Cramer


On Wed, 1 Jul 2020 at 06:43, Dave Cramer  wrote:

> Honestly I'm getting a little weary of fixing this up only to have the
> patch not get reviewed.
>
> Apparently it has no value so unless someone is willing to review it and
> get it committed I'm just going to let it go.
>
> Thanks,
>
> Dave Cramer
>
>
> On Wed, 1 Jul 2020 at 04:53, Daniel Gustafsson  wrote:
>
>> > On 7 Apr 2020, at 21:45, Dave Cramer  wrote:
>>
>> > New patch that fixes a number of errors in the check for validity as
>> well as reduces the memory usage by
>> > dynamically allocating the data changed as well as collapsing the
>> changed and binary arrays into a format array.
>>
>> The 0001 patch fails to apply, and possibly other in the series.  Please
>> submit
>> a rebased version. Marking the CF entry as Waiting for Author in the
>> meantime.
>>
>> cheers ./daniel
>
>


0001-First-pass-at-working-code-without-subscription-opti.patch
Description: Binary data


0005-Remove-C99-declaration-and-code.patch
Description: Binary data


0002-add-binary-column-to-pg_subscription.patch
Description: Binary data


0004-get-relid-inside-of-logical_read_insert.patch
Description: Binary data


0003-document-new-binary-option-for-CREATE-SUBSCRIPTION.patch
Description: Binary data


0006-make-sure-the-subcription-is-marked-as-binary.patch
Description: Binary data


0007-check-that-the-subscriber-is-compatible-with-the-pub.patch
Description: Binary data


0008-Changed-binary-and-changed-to-format-and-use-existin.patch
Description: Binary data


why do we allow people to create a publication before setting wal_level

2020-07-02 Thread Dave Cramer
This seems pretty strange:

create publication pub1 for all tables;

 WARNING:  wal_level is insufficient to publish logical changes
HINT:  Set wal_level to logical before creating subscriptions.

Dave Cramer


Re: [PATCH] fix GIN index search sometimes losing results

2020-07-02 Thread Tom Lane
Pavel Borisov  writes:
> чт, 2 июл. 2020 г. в 19:38, Artur Zakirov :
>> So it seems we are losing some results with RUM as well.

> For me it is 100% predictable that unmodified RUM is still losing results
> as it is still using binary callback.

Right, that's in line with what I expected as well.

> The main my goal of saving binary legacy callback is that side callers like
> RUM will not break immediately but remain in
> existing state (i.e. losing results in some queries).

I don't really see why that should be a goal here.  I think a forced
compile error, calling attention to the fact that there's something
to fix, is a good thing as long as we do it in a major release.

regards, tom lane




Sync vs Flush

2020-07-02 Thread Jaka Jančar
Hi,

For an extended query that needs to get parameter types before sending
them, is there a difference in doing:

Parse, Describe statement, Flush, Bind, Execute, Sync
vs
Parse, Describe statement, Sync, Bind, Execute, Sync

Of course, there will be an additional ReadyForQuery in the latter case,
but other than that.

Thanks!

Jaka


Re: [PATCH] fix GIN index search sometimes losing results

2020-07-02 Thread Pavel Borisov
чт, 2 июл. 2020 г. в 19:38, Artur Zakirov :

> Hello,
>
> On Thu, Jul 2, 2020 at 8:23 PM Pavel Borisov 
> wrote:
> >
> > ср, 1 июл. 2020 г. в 23:16, Tom Lane :
> >>
> >> Pavel Borisov  writes:
> >> > Below is my variant how to patch Gin-Gist weights issue:
> >>
> >> I looked at this patch, but I'm unimpressed, because it's buggy.
> >
> >
> > Thank you, i'd noticed and made minor corrections in the patch. Now it
> should work
> > correctly,
> >
> > As for preserving the option to use legacy bool-style calls, personally
> I see much
> > value of not changing API ad hoc to fix something. This may not harm
> vanilla reseases
> > but can break many possible side things like RUM index etc which I think
> are abundant
> > around there. Furthermore if we leave legacy bool callback along with
> newly proposed and
> > recommended for further use it will cost nothing.
> >
> > So I've attached a corrected patch. Also I wrote some comments to the
> code and added
> > your test as a part of apatch. Again thank you for sharing your thoughts
> and advice.
> >
> > As always I'd appreciate everyone's opinion on the bugfix.
>
> I haven't looked at any of the patches carefully yet. But I tried both of
> them.
>
> I tried Tom's patch. To compile the RUM extension I've made few
> changes to use new
> TS_execute(). Speaking about backward compatibility. I also think that
> it is not so important
> here. And RUM alreadyhas a number of "#if PG_VERSION_NUM" directives. API
> breaks
> from time to time and it seems inevitable.
>
> I also tried "gin-gist-weight-patch-v4.diff". And it didn't require
> changes into RUM. But as
> Tom said above TS_execute() is broken already. Here is the example with
> "gin-gist-weight-patch-v4.diff" and RUM:
>
> =# create extension rum;
> =# create table test (a tsvector);
> =# insert into test values ('wd:1A wr:2D'), ('wd:1A wr:2D');
> =# create index on test using rum (a);
> =# select a from test where a @@ '!wd:D';
>a
----------------
>  'wd':1A 'wr':2
>  'wd':1A 'wr':2
> (2 rows)
> =# set enable_seqscan to off;
> =# select a from test where a @@ '!wd:D';
>  a
> ---
> (0 rows)
>
> So it seems we are losing some results with RUM as well.
>
> --
> Artur
>
For me it is 100% expected that unmodified RUM still loses results,
since it is still using the binary callback.
My main goal in keeping the legacy binary callback is that outside callers like
RUM will not break immediately but remain in
their existing state (i.e. losing results in some queries). To fix the issue
completely, ternary logic needs to be implemented in the
Postgres tsearch machinery AND then used in RUM and other external
modules.
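
To illustrate the distinction with a standalone sketch (made-up type and
function names, not the actual TS_execute() API):

#include <stdio.h>

/* Made-up three-valued result type; names are illustrative only. */
typedef enum { T_NO, T_YES, T_MAYBE } Ternary;

/* NOT over three-valued logic: MAYBE stays MAYBE, so a recheck still happens. */
static Ternary
ternary_not(Ternary v)
{
    if (v == T_MAYBE)
        return T_MAYBE;
    return (v == T_YES) ? T_NO : T_YES;
}

int
main(void)
{
    /* The index cannot check a weight restriction, so the honest answer is MAYBE. */
    Ternary from_index = T_MAYBE;

    /* A boolean callback must collapse MAYBE into true or false ... */
    int     collapsed = 1;      /* say it collapses to "true" */

    /* ... and negation then gives a confident, possibly wrong, answer. */
    printf("boolean: NOT maybe-as-true = %d (row silently dropped)\n", !collapsed);
    printf("ternary: NOT maybe         = %d (2 = MAYBE, heap recheck decides)\n",
           ternary_not(from_index));
    return 0;
}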

Thank you for your consideration!

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: Use of "long" in incremental sort code

2020-07-02 Thread James Coleman
On Tue, Jun 30, 2020 at 7:21 AM Peter Eisentraut
 wrote:
>
> On 2020-06-30 06:24, David Rowley wrote:
> > On Tue, 30 Jun 2020 at 16:20, Tom Lane  wrote:
> >> There is a fairly widespread issue that memory-size-related GUCs and
> >> suchlike variables are limited to represent sizes that fit in a "long".
> >> Although Win64 is the *only* platform where that's an issue, maybe
> >> it's worth doing something about.  But we shouldn't just fix the sort
> >> code, if we do do something.
> >>
> >> (IOW, I don't agree with doing a fix that doesn't also fix work_mem.)
> >
> > I raised it mostly because this new-to-PG13-code is making the problem 
> > worse.
>
> Yeah, we recently got rid of a bunch of inappropriate use of long, so it
> seems reasonable to make this new code follow that.

I've attached a patch to make this change but with one tweak: I
decided to use uint64 for both memory and disk (rather than Size in
some cases) since we aggregate across multiple runs and have shared
code that deals with both values.

James


v1-0001-Use-unint64-instead-of-long-for-space-used-variab.patch
Description: Binary data


Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-07-02 Thread James Coleman
It seems like the consensus over at another discussion on this topic
[1] is that we ought to go ahead and print the zeros [for machine
readable output formats], even though that creates some interesting
scenarios like the fact that disk sorts will print 0 for memory even
though that's not true.

The change has already been made and pushed for hash disk spilling, so
I think we ought to use Justin's patch here.

James

[1] https://www.postgresql.org/message-id/2276865.1593102811%40sss.pgh.pa.us




Re: [PATCH] fix GIN index search sometimes losing results

2020-07-02 Thread Artur Zakirov
Hello,

On Thu, Jul 2, 2020 at 8:23 PM Pavel Borisov  wrote:
>
> ср, 1 июл. 2020 г. в 23:16, Tom Lane :
>>
>> Pavel Borisov  writes:
>> > Below is my variant how to patch Gin-Gist weights issue:
>>
>> I looked at this patch, but I'm unimpressed, because it's buggy.
>
>
> Thank you, i'd noticed and made minor corrections in the patch. Now it should 
> work
> correctly,
>
> As for preserving the option to use legacy bool-style calls, personally I see 
> much
> value of not changing API ad hoc to fix something. This may not harm vanilla 
> reseases
> but can break many possible side things like RUM index etc which I think are 
> abundant
> around there. Furthermore if we leave legacy bool callback along with newly 
> proposed and
> recommended for further use it will cost nothing.
>
> So I've attached a corrected patch. Also I wrote some comments to the code 
> and added
> your test as a part of apatch. Again thank you for sharing your thoughts and 
> advice.
>
> As always I'd appreciate everyone's opinion on the bugfix.

I haven't looked at any of the patches carefully yet. But I tried both of them.

I tried Tom's patch. To compile the RUM extension I've made a few
changes to use the new
TS_execute(). Speaking about backward compatibility, I also think that
it is not so important
here. And RUM already has a number of "#if PG_VERSION_NUM" directives. The API breaks
from time to time and it seems inevitable.

I also tried "gin-gist-weight-patch-v4.diff". And it didn't require
changes into RUM. But as
Tom said above TS_execute() is broken already. Here is the example with
"gin-gist-weight-patch-v4.diff" and RUM:

=# create extension rum;
=# create table test (a tsvector);
=# insert into test values ('wd:1A wr:2D'), ('wd:1A wr:2D');
=# create index on test using rum (a);
=# select a from test where a @@ '!wd:D';
   a
----------------
 'wd':1A 'wr':2
 'wd':1A 'wr':2
(2 rows)
=# set enable_seqscan to off;
=# select a from test where a @@ '!wd:D';
 a
---
(0 rows)

So it seems we are losing some results with RUM as well.

-- 
Artur




Re: Built-in connection pooler

2020-07-02 Thread Konstantin Knizhnik



On 02.07.2020 17:44, Daniel Gustafsson wrote:

On 2 Jul 2020, at 13:33, Konstantin Knizhnik  wrote:
On 01.07.2020 12:30, Daniel Gustafsson wrote:

On 24 Mar 2020, at 17:24, Konstantin Knizhnik  wrote:
Rebased version of the patch is attached.

And this patch also fails to apply now, can you please submit a new version?
Marking the entry as Waiting on Author in the meantime.

cheers ./daniel

Rebased version of the patch is attached.

Both Travis and Appveyor fails to compile this version:

proxy.c: In function ‘client_connect’:
proxy.c:302:6: error: too few arguments to function ‘ParseStartupPacket’
   if (ParseStartupPacket(chan->client_port, chan->proxy->parse_ctx, 
startup_packet+4, startup_packet_size-4, false) != STATUS_OK) /* skip packet size */
   ^
In file included from proxy.c:8:0:
../../../src/include/postmaster/postmaster.h:71:12: note: declared here
  extern int  ParseStartupPacket(struct Port* port, MemoryContext memctx, void* 
pkg_body, int pkg_size, bool ssl_done, bool gss_done);
 ^
: recipe for target 'proxy.o' failed
make[3]: *** [proxy.o] Error 1

cheers ./daniel



Sorry, correct patch is attached.

diff --git a/contrib/spi/refint.c b/contrib/spi/refint.c
index 6fbfef2b12..27aa6cba8e 100644
--- a/contrib/spi/refint.c
+++ b/contrib/spi/refint.c
@@ -11,6 +11,7 @@
 
 #include "commands/trigger.h"
 #include "executor/spi.h"
+#include "storage/proc.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -94,6 +95,8 @@ check_primary_key(PG_FUNCTION_ARGS)
 	else
 		tuple = trigdata->tg_newtuple;
 
+	MyProc->is_tainted = true;
+
 	trigger = trigdata->tg_trigger;
 	nargs = trigger->tgnargs;
 	args = trigger->tgargs;
@@ -286,6 +289,8 @@ check_foreign_key(PG_FUNCTION_ARGS)
 		/* internal error */
 		elog(ERROR, "check_foreign_key: cannot process INSERT events");
 
+	MyProc->is_tainted = true;
+
 	/* Have to check tg_trigtuple - tuple being deleted */
 	trigtuple = trigdata->tg_trigtuple;
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b81aab239f..aa435d4066 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -732,6 +732,169 @@ include_dir 'conf.d'
   
  
 
+ 
+  max_sessions (integer)
+  
+   max_sessions configuration parameter
+  
+  
+  
+
+  The maximum number of client sessions that can be handled by
+  one connection proxy when session pooling is enabled.
+  This parameter does not add any memory or CPU overhead, so
+  specifying a large max_sessions value
+  does not affect performance.
+  If the max_sessions limit is reached new connections are not accepted.
+
+
+  The default value is 1000. This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  session_pool_size (integer)
+  
+   session_pool_size configuration parameter
+  
+  
+  
+
+  Enables session pooling and defines the maximum number of
+  backends that can be used by client sessions for each database/user combination.
+  Launched non-tainted backends are never terminated even if there are no active sessions.
+  Backend is considered as tainted if client updates GUCs, creates temporary table or prepared statements.
+  Tainted backend can serve only one client.
+
+
+  The default value is 10, so up to 10 backends will serve each database,
+
+  
+ 
+
+ 
+  proxy_port (integer)
+  
+   proxy_port configuration parameter
+  
+  
+  
+
+  Sets the TCP port for the connection pooler.
+  Clients connected to main "port" will be assigned dedicated backends,
+  while client connected to proxy port will be connected to backends through proxy which
+  performs transaction level scheduling. 
+   
+
+  The default value is 6543.
+
+  
+ 
+
+ 
+  connection_proxies (integer)
+  
+   connection_proxies configuration parameter
+  
+  
+  
+
+  Sets number of connection proxies.
+  Postmaster spawns separate worker process for each proxy. Postmaster scatters connections between proxies using one of scheduling policies (round-robin, random, load-balancing).
+  Each proxy launches its own subset of backends.
+  So maximal number of non-tainted backends is  session_pool_size*connection_proxies*databases*roles.
+   
+
+  The default value is 0, so session pooling is disabled.
+
+  
+ 
+
+ 
+  session_schedule (enum)
+  
+   session_schedule configuration parameter
+  
+  
+  
+
+  Specifies scheduling policy for assigning session to proxies in case of
+  connection pooling. Default policy is round-robin.
+
+
+  With 

Re: proposal: unescape_text function

2020-07-02 Thread Daniel Gustafsson
> On 23 Jun 2020, at 11:51, Pavel Stehule  wrote:

> I changed the name to more accurately "unicode_unescape". Patch is assigned

You've made this function return Oid, where it used to be void.  Was that a
copy-paste mistake? Else the code needs fixing as it doesn't return an Oid.

+Oid
+check_unicode_value(pg_wchar c)
+{
+   if (!is_valid_unicode_codepoint(c))
+   ereport(ERROR,
+   (errcode(ERRCODE_SYNTAX_ERROR),
+errmsg("invalid Unicode escape value")));
+}

cheers ./daniel




Re: suggest to rename enable_incrementalsort

2020-07-02 Thread James Coleman
I think the change makes a lot of sense. The only reason I had it as
enable_incrementalsort in the first place was that I was trying to broadly
follow the existing GUC names, but as has already been pointed out,
there's a lot of variation there, and my version of the patch already
changed it to be more readable (at one point it was
enable_incsort...which is short...but does not have an obvious
meaning).

I've attached a patch to make the change, though if people are
interested in Tom's suggestion of enable_sort_incremental I could
switch to that.

James


v1-0001-Rename-enable_incrementalsort-for-clarity.patch
Description: Binary data


Re: Auxiliary Processes and MyAuxProc

2020-07-02 Thread Alvaro Herrera
On 2020-Mar-26, Mike Palmiotto wrote:

Regarding 0001:

> diff --git a/src/backend/postmaster/subprocess.c 
> b/src/backend/postmaster/subprocess.c
> new file mode 100644
> index 00..3e7a45bf10
> --- /dev/null
> +++ b/src/backend/postmaster/subprocess.c
> @@ -0,0 +1,62 @@
> +/*-
> + *
> + * subprocess.c
> + *
> + * Copyright (c) 2004-2020, PostgreSQL Global Development Group
> + *
> + *
> + * IDENTIFICATION
> + * src/backend/postmaster/syslogger.c

Wrong file identification.

> +static PgSubprocess process_types[] = {
> + {
> + .desc = "checker",
> + .entrypoint = CheckerModeMain
> + },
> + {
...

This array has to match the order in subprocess.h:

> +typedef enum
> +{
> + NoProcessType = -1,
> + CheckerType = 0,
> + BootstrapType,
> + StartupType,
> + BgWriterType,
> + CheckpointerType,
> + WalWriterType,
> + WalReceiverType,/* end of Auxiliary Process Forks */
> +
> + NUMSUBPROCESSTYPES  /* Must be last! */
> +} SubprocessType;

This sort of thing is messy and unfriendly to maintain.  I suggest we
use the same trick as in cmdtaglist.h and rmgrlist.h; see commits
2f9661311b83 and 5a1cd89f8f4a for examples.
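
For what it's worth, the general shape of that trick is a single list macro
expanded into both the enum and the array, so the two cannot drift apart. A
rough self-contained sketch (all names below are placeholders, not the actual
cmdtaglist.h contents):

#include <stdio.h>

/* Single source of truth: one X(...) line per subprocess. */
#define SUBPROCESS_LIST(X)                          \
    X(CheckerType,   "checker",   CheckerModeMain)  \
    X(BootstrapType, "bootstrap", BootstrapModeMain)

typedef enum
{
    NoProcessType = -1,
#define X_AS_ENUM(sym, desc, entry) sym,
    SUBPROCESS_LIST(X_AS_ENUM)
#undef X_AS_ENUM
    NUMSUBPROCESSTYPES          /* Must be last! */
} SubprocessType;

typedef struct
{
    const char *desc;
    void      (*entrypoint)(void);
} PgSubprocess;

/* Stub entry points, just so the sketch compiles. */
static void CheckerModeMain(void)   { puts("checker"); }
static void BootstrapModeMain(void) { puts("bootstrap"); }

/* The array expands from the same list, so its order always matches the enum. */
static const PgSubprocess process_types[] = {
#define X_AS_ENTRY(sym, desc, entry) [sym] = { desc, entry },
    SUBPROCESS_LIST(X_AS_ENTRY)
#undef X_AS_ENTRY
};

int
main(void)
{
    for (int i = 0; i < NUMSUBPROCESSTYPES; i++)
        printf("%d: %s\n", i, process_types[i].desc);
    return 0;
}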

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Auxiliary Processes and MyAuxProc

2020-07-02 Thread Daniel Gustafsson
> On 2 Jul 2020, at 17:03, Alvaro Herrera  wrote:
> 
> On 2020-Jul-02, Daniel Gustafsson wrote:
> 
>>> On 27 Mar 2020, at 00:30, Mike Palmiotto  
>>> wrote:
>> 
>>> Are these pieces required to make this patchset committable? Is there
>>> anything else needed at this point to make it committable?
>> 
>> The submitted version no longer applies, the 0009 patch has conflicts in
>> postmaster.c.  Can you please submit a new version of the patch?
> 
> If the first 8 patches still apply, please do keep it as needs-review
> though, since we can still give advice on those first parts.

Done.

cheers ./daniel



Re: Auxiliary Processes and MyAuxProc

2020-07-02 Thread Alvaro Herrera
On 2020-Jul-02, Daniel Gustafsson wrote:

> > On 27 Mar 2020, at 00:30, Mike Palmiotto  
> > wrote:
> 
> > Are these pieces required to make this patchset committable? Is there
> > anything else needed at this point to make it committable?
> 
> The submitted version no longer applies, the 0009 patch has conflicts in
> postmaster.c.  Can you please submit a new version of the patch?

If the first 8 patches still apply, please do keep it as needs-review
though, since we can still give advice on those first parts.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Using Valgrind to detect faulty buffer accesses (no pin or buffer content lock held)

2020-07-02 Thread Daniel Gustafsson
> On 26 Apr 2020, at 02:17, Peter Geoghegan  wrote:

> Attached is v2 of the patch series.

This patch fails to apply to HEAD due to conflicts in nbtpage.c, can you please
submit a rebased version?

cheers ./daniel



Re: estimation problems for DISTINCT ON with FDW

2020-07-02 Thread Tom Lane
Etsuro Fujita  writes:
> On Wed, Jul 1, 2020 at 11:40 PM Tom Lane  wrote:
>> Short of sending a whole second query to the remote server, it's
>> not clear to me how we could get the full table size (or equivalently
>> the target query's selectivity for that table).  The best we realistically
>> can do is to adopt pg_class.reltuples if there's been an ANALYZE of
>> the foreign table.  That case already works (and this proposal doesn't
>> break it).  The problem is what to do when pg_class.reltuples is zero
>> or otherwise badly out-of-date.

> In estimate_path_cost_size(), if use_remote_estimate is true, we
> adjust the rows estimate returned from the remote server, by factoring
> in the selectivity of the locally-checked quals.  I thought what I
> proposed above would be more consistent with that.

No, I don't think that would be very helpful.  There are really three
different numbers of interest here:

1. The actual total rowcount of the remote table.

2. The number of rows returned by the remote query (which is #1 times
the selectivity of the shippable quals).

3. The number of rows returned by the foreign scan (which is #2 times
the selectivity of the non-shippable quals).

Clearly, rel->rows should be set to #3.  However, what we really want
for rel->tuples is #1.  That's because, to the extent that the planner
inspects rel->tuples at all, it's to adjust whole-table stats such as
we might have from ANALYZE.  What you're suggesting is that we use #2,
but I doubt that that's a big improvement.  In a decently tuned query
it's going to be a lot closer to #3 than to #1.
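
To put rough, purely illustrative numbers on that: if the remote table holds
1,000,000 rows (#1), the shippable quals pass 10,000 of them (#2), and the
non-shippable quals keep 500 of those (#3), then rel->rows should be 500 and
the value we would really like in rel->tuples is 1,000,000; substituting
10,000 leaves us much closer to 500 than to the 1,000,000 we actually want.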

We could perhaps try to make our own estimate of the selectivity of the
shippable quals and then back into #1 from the value we got for #2 from
the remote server.  But that sounds mighty error-prone, so I doubt it'd
make for much of an improvement.  It also doesn't sound like something
I'd want to back-patch.

Another point here is that, to the extent we are relying on whole-table
stats from the last ANALYZE, pg_class.reltuples is actually the right
value to go along with that.  We could spend a lot of cycles doing
what I just suggested and end up with net-worse estimates.

In any case, the proposal I'm making is just to add a sanity-check
clamp to prevent the worst effects of not setting rel->tuples sanely.
It doesn't foreclose future improvements inside the FDW.

regards, tom lane




Re: Built-in connection pooler

2020-07-02 Thread Daniel Gustafsson
> On 2 Jul 2020, at 13:33, Konstantin Knizhnik  
> wrote:
> On 01.07.2020 12:30, Daniel Gustafsson wrote:
>>> On 24 Mar 2020, at 17:24, Konstantin Knizhnik  
>>> wrote:
>>> Rebased version of the patch is attached.
>> And this patch also fails to apply now, can you please submit a new version?
>> Marking the entry as Waiting on Author in the meantime.
>> 
>> cheers ./daniel
> Rebased version of the patch is attached.

Both Travis and Appveyor fails to compile this version:

proxy.c: In function ‘client_connect’:
proxy.c:302:6: error: too few arguments to function ‘ParseStartupPacket’
  if (ParseStartupPacket(chan->client_port, chan->proxy->parse_ctx, 
startup_packet+4, startup_packet_size-4, false) != STATUS_OK) /* skip packet 
size */
  ^
In file included from proxy.c:8:0:
../../../src/include/postmaster/postmaster.h:71:12: note: declared here
 extern int  ParseStartupPacket(struct Port* port, MemoryContext memctx, void* 
pkg_body, int pkg_size, bool ssl_done, bool gss_done);
^
: recipe for target 'proxy.o' failed
make[3]: *** [proxy.o] Error 1
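
For what it's worth, the error looks like the proxy's call site simply hasn't
been updated for the gss_done parameter that ParseStartupPacket now takes; a
possible fix, assuming false is the right value for a proxied connection,
would be:

	if (ParseStartupPacket(chan->client_port, chan->proxy->parse_ctx,
						   startup_packet + 4, startup_packet_size - 4,
						   false /* ssl_done */,
						   false /* gss_done */) != STATUS_OK)	/* skip packet size */
	{
		/* existing error handling unchanged */
	}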

cheers ./daniel



Re: TRUNCATE on foreign tables

2020-07-02 Thread Daniel Gustafsson
> On 1 Mar 2020, at 03:24, Kohei KaiGai  wrote:

> The attached is revised version.

This version fails to apply to HEAD due to conflicts in postgres_fdw expected
test output.  Can you please submit a rebased version?  Marking the entry
Waiting on Author in the meantime.

cheers ./daniel




Re: [Patch] Use internal pthreads reimplementation only when building with MSVC

2020-07-02 Thread Daniel Gustafsson
> On 9 Apr 2020, at 23:57, Alvaro Herrera  wrote:

> Please do submit patches as separate attachments rather than in the
> email body.

Since the CF app is unable to see that there is a patch at all, I took the
liberty to resubmit the posted patch rebased on top of HEAD and with the C++
replaced with a C /* */ comment.

Marking this entry Waiting on Author based on Alvaro's questions.

cheers ./daniel



msvc_pthreads.diff
Description: Binary data


Re: POC and rebased patch for CSN based snapshots

2020-07-02 Thread Movead Li
Thanks for the remarks,

>Some remarks on your patch: 
>1. The variable last_max_csn can be an atomic variable. 

Yes, I will consider that.

>2. GenerateCSN() routine: in the case that csn < csnState->last_max_csn 
>This is the case when someone changed the value of the system clock. I 
>think it is needed to write a WARNING to the log file. (Maybe we can do 
>synchronization with a time server.) 

Yes, good point. I will work out a way to report the warning; there should be
a reporting interval rather than a report every time a CSN is generated.

Do we really need the correct wall-clock time? What is the downside if a node
simply generates CSNs that increase monotonically?

>3. What about global snapshot xmin? In the pgpro version of the patch we 
>had GlobalSnapshotMapXmin() routine to maintain circular buffer of 
>oldestXmins for several seconds in past. This buffer allows to shift 
>oldestXmin in the past when backend is importing global transaction. 
>Otherwise old versions of tuples that were needed for this transaction 
>can be recycled by other processes (vacuum, HOT, etc). 
>How do you implement protection from local pruning? I saw 
>SNAP_DESYNC_COMPLAIN, but it is not used anywhere.

I have studied your patch, which is great. In that patch only data older than
'global_snapshot_defer_time' can be vacuumed, and dead tuples are kept even if
no snapshot is imported at all, right?

I am thinking about a way to start retaining dead tuples only just before we
import a CSN snapshot.

Based on the Clock-SI paper, we should take the local CSN and send it to the
shard nodes. Because we do not know whether a shard node's CSN is bigger or
smaller than the master node's, we have to keep some dead tuples all the time
to support snapshot import at any moment.

But if we make a small change to the Clock-SI model, we do not use the local
CSN when the transaction starts; instead we contact every shard node to
request its CSN, the shard nodes start keeping dead tuples at that point, and
the master node chooses the biggest CSN and sends it to the shard nodes.

With this approach we do not need to keep dead tuples all the time and do not
need to manage a ring buffer; we can hand the problem over to the 'snapshot
too old' feature. The trade-off is that almost all shard nodes have to wait.
I will send a more detailed explanation in a few days.
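
To illustrate the idea, here is a minimal sketch of the modified flow; the
function names GetShardCSN() and SetGlobalSnapshotCSN() are placeholders for
this discussion, not existing APIs:

#include "postgres.h"

typedef uint64 CSN;

static CSN
ChooseGlobalSnapshotCSN(int nshards)
{
	CSN		max_csn = 0;

	/*
	 * Ask every shard for its current CSN; from this point each shard
	 * starts retaining dead tuples for the upcoming snapshot import.
	 */
	for (int i = 0; i < nshards; i++)
	{
		CSN		shard_csn = GetShardCSN(i);

		if (shard_csn > max_csn)
			max_csn = shard_csn;
	}

	/*
	 * Send the biggest CSN back; a shard may have to wait until its
	 * local clock reaches this value before it can proceed.
	 */
	for (int i = 0; i < nshards; i++)
		SetGlobalSnapshotCSN(i, max_csn);

	return max_csn;
}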





>4. The current version of the patch does not apply cleanly to current 
>master. 

Maybe it's because the PG13 release caused some conflicts; I will rebase it.



---
Regards,

Highgo Software (Canada/China/Pakistan) 

URL : http://www.highgo.ca/ 

EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Re: Auxiliary Processes and MyAuxProc

2020-07-02 Thread Daniel Gustafsson
> On 27 Mar 2020, at 00:30, Mike Palmiotto  
> wrote:

> Are these pieces required to make this patchset committable? Is there
> anything else needed at this point to make it committable?

The submitted version no longer applies, the 0009 patch has conflicts in
postmaster.c.  Can you please submit a new version of the patch?

cheers ./daniel



Re: POC: rational number type (fractions)

2020-07-02 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Andrew Dunstan  writes:
> > On 7/1/20 7:00 PM, Stephen Frost wrote:
> >> ... extensions outside of PG simply don't thrive as independent projects.
> >> 
> >> There's various potential reasons for that, from being hard to find, to
> >> being hard to install and work with, to the fact that we don't have a
> >> centralized extension system (PGXN isn't really endorsed at all by
> >> core... and I don't really think it should be), and our general
> >> extension management system isn't particularly great anyway.
> 
> > Then these are things we should fix. But the right fix isn't including
> > every extension in the core code.
> 
> Yeah.  We *must not* simply give up on extensibility and decide that
> every interesting feature has to be in core.  I don't have any great
> ideas about how we grow the wider Postgres development community and
> infrastructure, but that certainly isn't the path to doing so.

I don't see where I was either proposing that we give up extensibility,
or that we have to include every extension in the core code.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: [PATCH] Finally split StdRdOptions into HeapOptions and ToastOptions

2020-07-02 Thread Daniel Gustafsson
> On 28 Mar 2020, at 19:57, Nikolay Shaplov  wrote:
> 
> A new version of the patch.
> Autovacuum options were extended in b07642db
> 
> So I added that options to the current patch.

The heapam.c hunk in this version fails to apply to HEAD, can you please submit
a rebased version?  Marking the CF entry as Waiting on Author in the meantime.

cheers ./daniel



Re: Let people set host(no)ssl settings from initdb

2020-07-02 Thread Daniel Gustafsson
The CF Patch Tester consider this patch to be malformed and is unable to apply
and test it.  Can you please submit a rebased version?

cheers ./daniel



Re: POC: rational number type (fractions)

2020-07-02 Thread Tom Lane
Andrew Dunstan  writes:
> On 7/1/20 7:00 PM, Stephen Frost wrote:
>> ... extensions outside of PG simply don't thrive as independent projects.
>> 
>> There's various potential reasons for that, from being hard to find, to
>> being hard to install and work with, to the fact that we don't have a
>> centralized extension system (PGXN isn't really endorsed at all by
>> core... and I don't really think it should be), and our general
>> extension management system isn't particularly great anyway.

> Then these are things we should fix. But the right fix isn't including
> every extension in the core code.

Yeah.  We *must not* simply give up on extensibility and decide that
every interesting feature has to be in core.  I don't have any great
ideas about how we grow the wider Postgres development community and
infrastructure, but that certainly isn't the path to doing so.

regards, tom lane




Re: Remove Deprecated Exclusive Backup Mode

2020-07-02 Thread Stephen Frost
Greetings,

* Magnus Hagander (mag...@hagander.net) wrote:
> On Thu, Jul 2, 2020 at 1:06 AM Jehan-Guillaume de Rorthais 
> wrote:
> > function to list in progress non-exclusive backup and related backend pid
> > might
> > be a good start?
> 
> I think it would make perfect sense to show manual (exclusive or
> non-exclusive) base backups in pg_stat_progress_basebackup. There's no
> fundamental need that one should only be for base backups taken with
> pg_basebackup. In fact, I'd argue that's a mistake in the view in v13 that
> it does not include this information.

I agree entirely that it was a mistake in v13 to not include this- and
to not include a way for other backup tools to report their progress.
This is a good example of why we really need an in-core non-streamed
backup capability, imv, because if we don't we end up with things like
this that are just thinking about streaming basebackups.  We also have
no in-core code that is user-facing that exercises the low-level backup
API.

> It could have "phase" set to "manual non-exclusive" for example, and leave
> the other fields at NULL.

Yeah.

> I guess in theory even for exclusive ones, with the pid column set to NULL.
> (As Stephen mentioned at some point in the future we might also want to
> extend it to allow these tools to report their progress as well into it,
> probably by just calling an admin function on the connection that they
> already have).

Right, that wouldn't have been hard to include and would have been quite
nice.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Remove Deprecated Exclusive Backup Mode

2020-07-02 Thread Stephen Frost
Greetings,

* Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> On Wed, 2020-07-01 at 16:46 -0400, Stephen Frost wrote:
> > If someone could explain what is so special about *this* part of the
> > system that we absolutely can't possibly accept any change that might
> > break user's scripts, and why it's worth all of the angst, maintenance,
> > ridiculously difficult documentation to understand and hacks (the
> > interface to pg_start_backup is ridiculously warty because of this), I'd
> > greatly appreciate it.
> 
> Easy.  Lots of people write backup scripts, and fewer people write
> complicated pg_catalog queries.  We would make way more users unhappy.

And no one writes scripts to do restores?  No, I don't believe that
(also don't believe that no one writes scripts or queries that use
various catalog tables).  Also, we certainly do get complaints from time
to time from people who got hit by a catalog change, but it's usually
"this stopped working" and you tell them "do this instead" and they're
happy to update their code.  That's why we have major versions, and why
we have back-branches and why we support a given major version for 5
years.

> You have a point about reworking the way recovery works: after teaching
> several classes and running into oddities with recovery parameters in
> postgresql.conf, I am not sure if that was a good move.

If it was a good idea or bad idea is really pretty independent of the
point I was trying to make: we *did* completely change the restore API
without any deprecation period for the old one or period of time where
we supported both or any of the foolishness that we've been doing for
the backup API for the past however many years.  Perhaps the mob of
users who are going to complain about that change hasn't decided to
upgrade to v12 quite yet, but I doubt it.

I'll say it again- I really think we need to stop this special case
where we've decided that we can't possibly make a breaking change to the
backup API even across major versions.  We don't do that, evidently, in
hardly any other cases and it's an altogether good thing- we don't have
ridiculously complicated restore documentation that has to explain two
completely different ways of doing a restore, nor is the code
complicated by having different ways, etc..

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Report error position in partition bound check

2020-07-02 Thread Daniel Gustafsson
> On 10 Apr 2020, at 23:50, Alexandra Wang  wrote:

> On Fri, Apr 10, 2020 at 8:37 AM Ashutosh Bapat 
> mailto:ashutosh.ba...@2ndquadrant.com>> 
> wrote:
> > for a multi-key value the ^
> > points to the first column and the reader may think that that's the
> problematic column. Should it instead point to ( ?
> 
> I attached a v2 of Amit's 0002 patch to also report the exact column
> for the partition overlap errors.

This patch fails to apply to HEAD due to conflicts in the create_table expected
output.  Can you please submit a rebased version?  I'm marking the CF entry
Waiting on Author in the meantime.

cheers ./daniel



Re: Reducing WaitEventSet syscall churn

2020-07-02 Thread Daniel Gustafsson
> On 13 Mar 2020, at 08:21, Kyotaro Horiguchi  wrote:

> The attached are:
> 0001-0004 Not changed
> 0005 Fix interface of PQregisterEventProc
> 0006 Add new libpq event for this use.
> 0007 Another version of "0006 Reuse a WaitEventSet in
> libpqwalreceiver.c" based on libpq event.
> 0008-0011 Not changed (old 0007-0010, blindly appended)

Since 0001 has already been applied in 9b8aa0929390a, the patchtester is unable
to make heads or tails of this patchset.  Can you please submit a
rebased version without the already applied changes?

cheers ./daniel



Re: Recognizing superuser in pg_hba.conf

2020-07-02 Thread Vik Fearing
On 7/2/20 3:14 PM, Daniel Gustafsson wrote:
>> On 30 Mar 2020, at 20:28, Tom Lane  wrote:
>>
>> Tomas Vondra  writes:
>>> I see this patch is marked as RFC since 12/30, but there seems to be
>>> quite a lot of discussion about the syntax, keywords and how exactly to
>>> identify the superuser. So I'll switch it back to needs review, which I
>>> think is a better representation of the current state.
>>
>> Somebody switched it to RFC again, despite the facts that
>>
>> (a) there is absolutely no consensus about what syntax to use
>> (and some of the proposals imply very different patches),
>>
>> (b) there's been no discussion at all since the last CF, and
>>
>> (c) the patch is still failing in the cfbot (src/test/ssl fails).
>>
>> While resolving (c) would seem to be the author's problem, I don't
>> think it's worth putting effort into that detail until we have
>> some meeting of the minds about (a).  So I'll put this back to
>> "needs review".
> 
> Since there hasn't been any more progress on this since the last CF, and the
> fact that the outcome may result in a completely different patch, I'm inclined
> to mark this returned with feedback rather than have it linger.  The 
> discussion
> can continue and the entry be re-opened.
> 
> Thoughts?


No objection.
-- 
Vik Fearing




Re: Recognizing superuser in pg_hba.conf

2020-07-02 Thread Daniel Gustafsson
> On 30 Mar 2020, at 20:28, Tom Lane  wrote:
> 
> Tomas Vondra  writes:
>> I see this patch is marked as RFC since 12/30, but there seems to be
>> quite a lot of discussion about the syntax, keywords and how exactly to
>> identify the superuser. So I'll switch it back to needs review, which I
>> think is a better representation of the current state.
> 
> Somebody switched it to RFC again, despite the facts that
> 
> (a) there is absolutely no consensus about what syntax to use
> (and some of the proposals imply very different patches),
> 
> (b) there's been no discussion at all since the last CF, and
> 
> (c) the patch is still failing in the cfbot (src/test/ssl fails).
> 
> While resolving (c) would seem to be the author's problem, I don't
> think it's worth putting effort into that detail until we have
> some meeting of the minds about (a).  So I'll put this back to
> "needs review".

Since there hasn't been any more progress on this since the last CF, and the
fact that the outcome may result in a completely different patch, I'm inclined
to mark this returned with feedback rather than have it linger.  The discussion
can continue and the entry be re-opened.

Thoughts?

cheers ./daniel



Re: Read access for pg_monitor to pg_replication_origin_status view

2020-07-02 Thread Daniel Gustafsson



> On 14 Jun 2020, at 05:46, Michael Paquier  wrote:
> 
> On Wed, Jun 10, 2020 at 12:35:49PM +0900, Michael Paquier wrote:
>> On Tue, Jun 09, 2020 at 05:07:39PM +0900, Masahiko Sawada wrote:
>>> I agreed with the change you proposed.
>> 
>> OK, thanks.  Then let's wait a couple of days to see if anybody has
>> any objections with the removal of the hardcoded superuser check
>> for those functions.
> 
> Committed the part removing the superuser checks as of cc07264.

AFAICT from the thread there is nothing left of this changeset to consider, so
I've marked the entry as committed in the CF app.  Feel free to update in case
I've missed something.

cheers ./daniel



Re: POC: rational number type (fractions)

2020-07-02 Thread Andrew Dunstan


On 7/1/20 7:00 PM, Stephen Frost wrote:
> Greetings,
>
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Stephen Frost  writes:
>>> I disagree with this and instead lean more towards the side that Robert
>>> and Jeff were taking in that this would be a useful extension and
>>> something we should consider including in core.  I disagree with Tom and
>>> Noah, specifically because, if we add this capability then I see our
>>> potential use-cases as increasing and therefore getting more individuals
>>> interested in working with us- to potentially include new contributors
>>> and possibly committers.
>> FWIW, I'm entirely in favor of having this available as an extension.
>> But I'm not in favor of it being in core.  I'm afraid it will end up
>> like the geometric types, i.e. a backwater of not-very-good code that
>> gets little love because it's not in line with the core competencies
>> of a bunch of database geeks.  If it's a separate project, then we
>> could hope to attract interest from people who know the subject matter
>> better but would never dare touch the PG backend in general.  There's
>> also the whole project-management issue that we have finite resources
>> and so we can *not* afford to put every arguably-useful feature in core.
> The issue that you highlight regarding geometric types is really that we
> simply refuse to punt things from core, ever, and that's not a
> reasonable position to take for long-term sanity.  On the flip side,
> it's ridiculously rare for an extension to have any kind of real
> life as an independent project- yes, there's one big exception (PostGIS)
> because it's simply ridiculously useful, and a few other cases
> where one company/individual or another funds the work of a particular
> extension because they need it for whatever, but by and large,
> extensions outside of PG simply don't thrive as independent projects.
>
> There's various potential reasons for that, from being hard to find, to
> being hard to install and work with, to the fact that we don't have a
> centralized extension system (PGXN isn't really endorsed at all by
> core... and I don't really think it should be), and our general
> extension management system isn't particularly great anyway.
>


Then these are things we should fix. But the right fix isn't including
every extension in the core code.


cheers


andrew


-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [PATCH] Initial progress reporting for COPY command

2020-07-02 Thread Daniel Gustafsson
> On 2 Jul 2020, at 14:42, Josef Šimánek  wrote:
> On Thu, 2 Jul 2020 at 14:25, Daniel Gustafsson wrote:

> The automated patchtester for the Commitfest gets confused when there are two
> versions of the same changeset attached to the email, as it tries to apply 
> them
> both which obviously results in an application failure.  I've attached just 
> the
> previously submitted patch version to this email to see if we can get a test
> run of it.
> 
> Thanks, I'm new to commitfest and I was confused as well.

No worries, we're here to help.

> I tried to reattach the thread there. I'll prepare a new patch soon, what 
> should I do? Just attach it again?

Correct, just reply to the thread with a new version of the patch attached, and
it'll get picked up automatically. No need to do anything more.

cheers ./daniel



Re: [PATCH] Initial progress reporting for COPY command

2020-07-02 Thread Josef Šimánek
On Thu, 2 Jul 2020 at 14:25, Daniel Gustafsson wrote:

> The automated patchtester for the Commitfest gets confused when there are
> two
> versions of the same changeset attached to the email, as it tries to apply
> them
> both which obviously results in an application failure.  I've attached
> just the
> previously submitted patch version to this email to see if we can get a
> test
> run of it.
>

Thanks, I'm new to commitfest and I was confused as well. I tried to
reattach the thread there. I'll prepare a new patch soon, what should I do?
Just attach it again?


> cheers ./daniel
>
>


Re: WIP: Aggregation push-down

2020-07-02 Thread Daniel Gustafsson
> On 19 May 2020, at 10:17, Antonin Houska  wrote:

> The next version is attached.

This version now fails to apply to HEAD, with what looks like a trivial
error in the expected test output.  Can you please submit a rebased version so
we can see it run in the patch tester CI?  I'm marking the entry as Waiting on
Author in the meantime.

cheers ./daniel



Re: [PATCH] Initial progress reporting for COPY command

2020-07-02 Thread Daniel Gustafsson
The automated patchtester for the Commitfest gets confused when there are two
versions of the same changeset attached to the email, as it tries to apply them
both which obviously results in an application failure.  I've attached just the
previously submitted patch version to this email to see if we can get a test
run of it.

cheers ./daniel



copy-progress-v2.patch
Description: Binary data


Re: Libpq support to connect to standby server as priority

2020-07-02 Thread Daniel Gustafsson
> On 18 May 2020, at 09:33, Greg Nancarrow  wrote:

> I'd like to submit a new version of a patch that I'd previously
> submitted but was eventually Returned with Feedback (closed in
> commitfest 2020-03).

This patch no longer applies, can you please submit a rebased version?  I've
marked the entry as Waiting on Author in the meantime.

cheers ./daniel




Re: Remove Deprecated Exclusive Backup Mode

2020-07-02 Thread Jehan-Guillaume de Rorthais
On Thu, 2 Jul 2020 12:32:14 +0200
Magnus Hagander  wrote:
[...]
> > non-exclusive backup...this is not that easy anymore. And
> > pg_is_in_backup() is quite misleading if the admin found it without reading
> > doc. Maybe an admin 
> 
> Yeah, as it is now it should really be called pg_is_in_exclusive_backup().

+1
We came to the same conclusion while discussing this with Gilles Darold this
morning. But considering the discussion below, this might not be needed anyway.

> > function to list in progress non-exclusive backup and related backend pid
> > might
> > be a good start?
> >  
> 
> I think it would make perfect sense to show manual (exclusive or
> non-exclusive) base backups in pg_stat_progress_basebackup. There's no
> fundamental need that one should only be for base backups taken with
> pg_basebackup. In fact, I'd argue that's a mistake in the view in v13 that
> it does not include this information.
> 
> It could have "phase" set to "manual non-exclusive" for example, and leave
> the other fields at NULL.

Even for a manual non-exclusive backup we could define some distinct phases:

 * waiting for checkpoint to finish
 * external backup
 * waiting for wal archiving to finish

So maybe exposing the backup label and/or the method might be useful as well?

> I guess in theory even for exclusive ones, with the pid column set to NULL.

We are discussing removing the mandatory connection during a non-exclusive
backup. If it becomes optional at some point, this PID might be NULL as well...

> (As Stephen mentioned at some point in the future we might also want to
> extend it to allow these tools to report their progress as well into it,
> probably by just calling an admin function on the connection that they
> already have).
> 
> 
> > Tools like pg_basebackup (and probably pgbackrest/barman to some extends)
> > still need the backup to abort on disconnection. Maybe it could flag its
> > session using the replication protocol or call a new admin function or load
> > a hook on session-shutdown to keep previous API behavior?
> 
> Suddenly requiring a replication protocol connection for one thing, when
> all their other things are done over a normal one, seems really terrible.

Sorry, I explained that poorly. Adding such a flag to the replication protocol was only for
pg_basebackup. Other tools would use an admin function to achieve the same goal.

> But having an admin function would work.
> 
> But anything requiring loading of a hook is going to raise the bar
> massively as suddenly somebody needs to install a compiled binary on the
> server in order to use it.

Contrib already provides many useful tools; how would it be different from e.g.
pg_archivecleanup? IIRC, an end-of-session hook was discussed in the past, but I
did not follow the thread at the time. Anyway, an admin function would
probably do the job just as well.

> But calling a separate function is pretty much what I suggested upthread
> (except I suggested a new version of pg_stop_backup, but implementation wise)
> 
> But I'm not sure what previous API behavior you're looking to preserve here?

Currently, non-exclusive backups are automatically aborted when the
pg_start_backup() session disappears without pg_stop_backup(). I suppose this is
the API behavior that Robert is concerned about losing when he writes about
people forgetting to end backups.

Now that you mention Stephen's idea of reporting the progress of a backup,
this kind of API could be used as a watchdog as well: if the frontend does not
report before some optional (user-defined) timeout, abort the backup.
We would have behavior "similar" to the current one.

To be clear, I suppose these admin functions could be called from any backend
session, taking some kind of backup ID (e.g. the token you were talking about
upthread).
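
Just to illustrate the shape of such a call, and to be explicit that this is
purely hypothetical: pg_backup_keepalive(), BackupState and
LookupNonExclusiveBackup() below do not exist, they are placeholders for
whatever the real admin function would look like:

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"
#include "utils/timestamp.h"

PG_FUNCTION_INFO_V1(pg_backup_keepalive);

Datum
pg_backup_keepalive(PG_FUNCTION_ARGS)
{
	text	   *token = PG_GETARG_TEXT_PP(0);
	BackupState *backup = LookupNonExclusiveBackup(text_to_cstring(token));

	if (backup == NULL)
		ereport(ERROR,
				(errmsg("no in-progress backup matches the given token")));

	/* Reset the (hypothetical) watchdog timer for this backup. */
	backup->last_report = GetCurrentTimestamp();

	PG_RETURN_VOID();
}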

Regards,




Re: MultiXact\SLRU buffers configuration

2020-07-02 Thread Daniel Gustafsson
> On 15 May 2020, at 02:03, Kyotaro Horiguchi  wrote:

> Generally in such cases, condition variables would work.  In the
> attached PoC, the reader side gets no penalty in the "likely" code
> path.  The writer side always calls ConditionVariableBroadcast but the
> waiter list is empty in almost all cases.  But I couldn't cause the
> situation where the sleep 1000u is reached.
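
For reference, the pattern being described maps onto the existing
ConditionVariable API roughly as follows; slru_cv, slot_ready, and the use of
PG_WAIT_EXTENSION as the wait event are placeholders, and this is only a
sketch of the shape, not the patch itself:

#include "postgres.h"
#include "storage/condition_variable.h"
#include "pgstat.h"

/* Placeholders: in the real patch these would live in the SLRU's shared state,
 * and slru_cv would be set up with ConditionVariableInit(). */
static ConditionVariable slru_cv;
static volatile bool slot_ready = false;

static void
reader_wait(void)
{
	/* Likely path: the condition already holds, so we never actually sleep. */
	ConditionVariablePrepareToSleep(&slru_cv);
	while (!slot_ready)
		ConditionVariableSleep(&slru_cv, PG_WAIT_EXTENSION);
	ConditionVariableCancelSleep();
}

static void
writer_done(void)
{
	slot_ready = true;
	/* Cheap in the common case, since the waiter list is almost always empty. */
	ConditionVariableBroadcast(&slru_cv);
}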

The submitted patch no longer applies, can you please submit an updated
version?  I'm marking the patch Waiting on Author in the meantime.

cheers ./daniel



Re: Built-in connection pooler

2020-07-02 Thread Konstantin Knizhnik



On 01.07.2020 12:30, Daniel Gustafsson wrote:

On 24 Mar 2020, at 17:24, Konstantin Knizhnik  wrote:
Rebased version of the patch is attached.

And this patch also fails to apply now, can you please submit a new version?
Marking the entry as Waiting on Author in the meantime.

cheers ./daniel

Rebased version of the patch is attached.

diff --git a/contrib/spi/refint.c b/contrib/spi/refint.c
index 6fbfef2b12..27aa6cba8e 100644
--- a/contrib/spi/refint.c
+++ b/contrib/spi/refint.c
@@ -11,6 +11,7 @@
 
 #include "commands/trigger.h"
 #include "executor/spi.h"
+#include "storage/proc.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -94,6 +95,8 @@ check_primary_key(PG_FUNCTION_ARGS)
 	else
 		tuple = trigdata->tg_newtuple;
 
+	MyProc->is_tainted = true;
+
 	trigger = trigdata->tg_trigger;
 	nargs = trigger->tgnargs;
 	args = trigger->tgargs;
@@ -286,6 +289,8 @@ check_foreign_key(PG_FUNCTION_ARGS)
 		/* internal error */
 		elog(ERROR, "check_foreign_key: cannot process INSERT events");
 
+	MyProc->is_tainted = true;
+
 	/* Have to check tg_trigtuple - tuple being deleted */
 	trigtuple = trigdata->tg_trigtuple;
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b81aab239f..aa435d4066 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -732,6 +732,169 @@ include_dir 'conf.d'
   
  
 
+ 
+  max_sessions (integer)
+  
+   max_sessions configuration parameter
+  
+  
+  
+
+  The maximum number of client sessions that can be handled by
+  one connection proxy when session pooling is enabled.
+  This parameter does not add any memory or CPU overhead, so
+  specifying a large max_sessions value
+  does not affect performance.
+  If the max_sessions limit is reached new connections are not accepted.
+
+
+  The default value is 1000. This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  session_pool_size (integer)
+  
+   session_pool_size configuration parameter
+  
+  
+  
+
+  Enables session pooling and defines the maximum number of
+  backends that can be used by client sessions for each database/user combination.
+  Launched non-tainted backends are never terminated even if there are no active sessions.
+  A backend is considered tainted if the client updates GUCs, or creates temporary tables or prepared statements.
+  A tainted backend can serve only one client.
+
+
+  The default value is 10, so up to 10 backends will serve each database/user combination.
+
+  
+ 
+
+ 
+  proxy_port (integer)
+  
+   proxy_port configuration parameter
+  
+  
+  
+
+  Sets the TCP port for the connection pooler.
+  Clients connected to the main "port" will be assigned dedicated backends,
+  while clients connected to the proxy port will be connected to backends through a proxy which
+  performs transaction-level scheduling. 
+   
+
+  The default value is 6543.
+
+  
+ 
+
+ 
+  connection_proxies (integer)
+  
+   connection_proxies configuration parameter
+  
+  
+  
+
+  Sets the number of connection proxies.
+  The postmaster spawns a separate worker process for each proxy and scatters connections between proxies using one of the scheduling policies (round-robin, random, load-balancing).
+  Each proxy launches its own subset of backends.
+  So the maximal number of non-tainted backends is  session_pool_size*connection_proxies*databases*roles.
+   
+
+  The default value is 0, so session pooling is disabled.
+
+  
+ 
+
+ 
+  session_schedule (enum)
+  
+   session_schedule configuration parameter
+  
+  
+  
+
+  Specifies the scheduling policy for assigning sessions to proxies when
+  connection pooling is enabled. The default policy is round-robin.
+
+
+  With the round-robin policy the postmaster cyclically scatters sessions between proxies.
+
+
+  With the random policy the postmaster randomly chooses a proxy for each new session.
+
+
+  With the load-balancing policy the postmaster chooses the proxy with the lowest load average.
+  The load average of a proxy is estimated by the number of client connections assigned to it, with extra weight for SSL connections.
+   
+  
+ 
+
+ 
+  idle_pool_worker_timeout (integer)
+  
+   idle_pool_worker_timeout configuration parameter
+  
+  
+  
+
+ Terminate an idle connection pool worker after the specified number of milliseconds.
+ The default value is 0, so pool workers are never terminated.
+   
+  
+ 
+
+ 
+  

Re: [PATCH] fix GIN index search sometimes losing results

2020-07-02 Thread Pavel Borisov
On Wed, 1 Jul 2020 at 23:16, Tom Lane wrote:

> Pavel Borisov  writes:
> > Below is my variant how to patch Gin-Gist weights issue:
>
> I looked at this patch, but I'm unimpressed, because it's buggy.
>

Thank you, I'd noticed that and made minor corrections to the patch. Now it
should work correctly.

As for preserving the option to use legacy bool-style calls, personally I
see much value in not changing the API ad hoc just to fix something. That may
not harm vanilla releases, but it can break many side projects such as the
RUM index, which I think are quite common out there. Furthermore, if we keep
the legacy bool callback alongside the newly proposed one, recommended for
further use, it will cost nothing.

So I've attached a corrected patch. I also wrote some comments in the code
and added your test as part of the patch. Again, thank you for sharing your
thoughts and advice.

As always I'd appreciate everyone's opinion on the bugfix.

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


gin-gist-weight-patch-v4.diff
Description: Binary data

