Re: xl_heap_header alignment?

2020-08-21 Thread David G. Johnston
On Fri, Aug 21, 2020 at 5:41 PM Bruce Momjian  wrote:

> On Wed, Jul 22, 2020 at 06:58:33AM +0200, Antonin Houska wrote:
> > Tom Lane  wrote:
> >
> > > I don't particularly want to remove the field, but we ought to
> > > change or remove the comment.
> >
> > I'm not concerned about the existence of the field either. The comment
> > just made me worried that I might be missing some fundamental concept.
> > Thanks for your opinion.
>
> I have developed the attached patch to address this.
>

I would suggest either dropping the word "potentially" or removing the
sentence.  I'm not a fan of this in-between position on principle even if I
don't understand the full reality of the implementation.

If leaving the word "potentially" is necessary it would be good to point
out where the complexity is documented as a part of that - this header file
probably not the best place to go into detail.

David J.


Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-08-21 Thread Amit Kapila
On Fri, Aug 21, 2020 at 5:10 PM Dilip Kumar  wrote:
>
> I have reviewed and tested the patch and the changes look fine to me.
>

Thanks, I will push the next patch early next week (by Tuesday) unless
you or someone else has any more comments on it. The summary of the
patch (v52-0001-Extend-the-BufFile-interface, attached with my
previous email) I am planning to push is: "It extends the BufFile
interface to support temporary files that can be used by a single
backend when the corresponding files need to survive across the
transaction and need to be opened and closed multiple times. Such
files need to be created as a member of a SharedFileSet. We have
implemented the interface for BufFileTruncate to allow files to be
truncated up to a particular offset and extended the BufFileSeek API
to support SEEK_END case. We have also added an option to provide a
mode while opening the shared BufFiles instead of always opening in
read-only mode. These enhancements in BufFile interface are required
for the upcoming patch to allow the replication apply worker, to
properly handle streamed in-progress transactions."

-- 
With Regards,
Amit Kapila.




Re: Improve Managing Databases Overview Doc Page

2020-08-21 Thread Bruce Momjian
On Thu, Jul 16, 2020 at 02:55:54PM -0700, David G. Johnston wrote:
> Hackers,
> 
> As a reaction to this documentation comment [1] I went through the main
> paragraph of the Database Management Overview and came up with the reworded 
> and
> expanded page.  Proposed for HEAD only.  Added to the commitfest 2020-09.
> 
> 1. https://www.postgresql.org/message-id/flat/
> 57083a441ddd2f3b9cdc0967c6689384cddeeedb.camel%40cybertec.at#
> f7198de1af14f7c5d84e7095b6b52bff

FYI, patch applied.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: xl_heap_header alignment?

2020-08-21 Thread Bruce Momjian
On Wed, Jul 22, 2020 at 06:58:33AM +0200, Antonin Houska wrote:
> Tom Lane  wrote:
> 
> > I don't particularly want to remove the field, but we ought to
> > change or remove the comment.
> 
> I'm not concerned about the existence of the field either. The comment just
> made me worried that I might be missing some fundamental concept. Thanks for
> your opinion.

I have developed the attached patch to address this.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index aa17f7df84..38683893c0 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -137,8 +137,7 @@ typedef struct xl_heap_truncate
  * or updated tuple in WAL; we can save a few bytes by reconstructing the
  * fields that are available elsewhere in the WAL record, or perhaps just
  * plain needn't be reconstructed.  These are the fields we must store.
- * NOTE: t_hoff could be recomputed, but we may as well store it because
- * it will come for free due to alignment considerations.
+ * FYI, t_hoff could potentially be recomputed.
  */
 typedef struct xl_heap_header
 {


Re: Documentation patch for backup manifests in protocol.sgml

2020-08-21 Thread Bruce Momjian
On Tue, Aug 18, 2020 at 02:41:09PM +0200, Bernd Helmle wrote:
> Hi,
> 
> protocol.sgml describes the protocol messages received by a BASE_BACKUP
> streaming command, but doesn't tell anything about the additional
> CopyResponse data message containing the contents of the backup
> manifest (if requested) after having received the tar files. So i
> propose the attached to give a little more detail in this paragraph.
> 
>   Thanks, Bernd
> 

> diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
> index 8b00235a516..31918144b37 100644
> --- a/doc/src/sgml/protocol.sgml
> +++ b/doc/src/sgml/protocol.sgml
> @@ -2665,8 +2665,10 @@ The commands accepted in replication mode are:
>ustar interchange format specified in the POSIX 
> 1003.1-2008
>standard) dump of the tablespace contents, except that the two trailing
>blocks of zeroes specified in the standard are omitted.
> -  After the tar data is complete, a final ordinary result set will be 
> sent,
> -  containing the WAL end position of the backup, in the same format as
> +  After the tar data is complete, and if a backup manifest was requested,
> +  another CopyResponse result is sent, containing the manifest data for 
> the
> +  current base backup. In any case, a final ordinary result set will be
> +  sent, containing the WAL end position of the backup, in the same 
> format as
>the start position.
>   

If someone can confirm this, I will apply it?  Magnus?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Problems with the FSM, heap fillfactor, and temporal locality

2020-08-21 Thread Peter Geoghegan
Hi John,

On Fri, Aug 21, 2020 at 3:10 AM John Naylor  wrote:
> Interesting stuff. Is lower-than-default fillfactor important for the
> behavior you see?

It's hard to say. It's definitely not important as far as the initial
bulk loading behavior is concerned (the behavior where related tuples
get inserted onto multiple disparate blocks all too often). That will
happen with any fill factor, including the default/100.

I'm concerned about heap fill factor in particular because I suspect
that that doesn't really work sensibly.

To give you some concrete idea of the benefits, I present a
pg_stat_database from the master branch after 6 hours of BenchmarkSQL
with a rate limit:

-[ RECORD 1 ]-+--
datid | 13,619
datname   | postgres
numbackends   | 3
xact_commit   | 45,902,031
xact_rollback | 200,131
blks_read | 662,677,079
blks_hit  | 24,790,989,538
tup_returned  | 30,054,930,608
tup_fetched   | 13,722,542,025
tup_inserted  | 859,165,629
tup_updated   | 520,611,959
tup_deleted   | 20,074,897
conflicts | 0
temp_files| 88
temp_bytes| 18,849,890,304
deadlocks | 0
checksum_failures |
checksum_last_failure |
blk_read_time | 124,233,831.492
blk_write_time| 8,588,876.871
stats_reset   | 2020-08-20 13:51:08.351036-07

Here is equivalent output for my simple patch that naively disables the FSM:

-[ RECORD 1 ]-+--
datid | 13,619
datname   | postgres
numbackends   | 3
xact_commit   | 46,369,235
xact_rollback | 201,919
blks_read | 658,267,665
blks_hit  | 19,980,524,578
tup_returned  | 30,804,856,896
tup_fetched   | 11,839,865,936
tup_inserted  | 861,798,911
tup_updated   | 525,895,435
tup_deleted   | 20,277,618
conflicts | 0
temp_files| 88
temp_bytes| 18,849,439,744
deadlocks | 0
checksum_failures |
checksum_last_failure |
blk_read_time | 117,167,612.616
blk_write_time| 7,873,922.175
stats_reset   | 2020-08-20 13:50:51.72056-07

Note that there is a ~20% reduction in blks_hit here, even though the
patch does ~1% more transactions (the rate limiting doesn't work
perfectly). There is also a ~5.5% reduction in aggregate
blk_read_time, and a ~9% reduction in blk_write_time. I'd say that
that's pretty significant.

I also see small but significant improvements in transaction latency,
particularly with 90th percentile latency.

> Hmm. You focus on FSM, but I'm thinking also multiple VM bits
> potentially got cleared (maybe not this benchmark but other
> scenarios).

I'm focussed on how heapam interacts with the FSM, and its effects on
locality. It's definitely true that this could go on to affect how the
visibility map gets set -- we could set fewer bits unnecessarily. And
it probably has a number of other undesirable consequences that are
hard to quantify. Clearly there are many reasons why making the
physical database layout closer to that of the logical database is a
good thing. I probably have missed a few.

> I'm not sure I follow the "close-by" criterion. It doesn't seem
> immediately relevant to what I understand as the main problem you
> found, of free space. In other words, if we currently write to 5
> blocks, but with smarter FSM logic we can find a single block, it
> seems that should be preferred over close-by blocks each with less
> space? Or are you envisioning scoring by both free space and distance
> from the original block?

I am thinking of doing both at the same time.

Think of a table with highly skewed access. There is a relatively
small number of hot spots -- parts of the primary key's key space that
are consistently affected by many skewed updates (with strong heap
correlation to primary key order). We simply cannot ever hope to avoid
migrating heavily updated rows to new pages -- too much contention in
the hot spots for that. But, by 1) Not considering pages
free-space-reclaimable until the free space reaches ~50%, and 2)
preferring close-by free pages, we avoid (or at least delay)
destroying locality of access. There is a much better chance that rows
from logically related hot spots will all migrate to the same few
blocks again and again, with whole blocks becoming free in a
relatively predictable, phased fashion. (I'm speculating here, but my
guess is that this combination will help real world workloads by quite
a bit.)

Preferring close-by blocks in the FSM is something that there was
discussion of when the current FSM implementation went in back in 8.4.
I am almost certain that just doing that will not help. If it helps at
all then it will help by preserving locality as tuple turnover takes
place over time, and I think that you need to be clever about 

Re: deb repo doesn't have latest. or possible to update web page?

2020-08-21 Thread Magnus Hagander
On Thu, Aug 20, 2020 at 10:38 PM Roger Pack  wrote:

> On Wed, Aug 19, 2020 at 11:23 AM Magnus Hagander 
> wrote:
> >
> >
> >
> > On Wed, Aug 19, 2020 at 7:04 PM Roger Pack 
> wrote:
> >>
> >> As a note I tried to use the deb repo today:
> >>
> >> https://www.postgresql.org/download/linux/debian/
> >>
> >> with an old box on Wheezy.
> >> It only seems to have binaries up to postgres 10.
> >>
> >> Might be nice to make a note on the web page so people realize some
> >> distro's aren't supported fully instead of (if they're like me)
> >> wondering "why don't these instructions work? It says to run  apt-get
> >> install postgresql-12" ...
> >
> >
> > The page lists which distros *are* supported. You can assume that
> anything *not* listed is unsupported.
> >
> > In the case of wheezy, whatever was the latest when it stopped being
> supported, is still there. Which I guess can cause some confusion if you
> just run the script without reading the note that's there. I'm unsure how
> to fix that though.
>
> The confusion in my case is I wasn't sure why my distro was named,
> tried the instructions and it...half worked.
>
> Maybe something like this?
>
> The PostgreSQL apt repository supports the currently supported stable
> versions of Debian with the latest versions of Postgres:
>
> xxx
> xxx
>
> Older versions of Debian may also be supported with older versions of
> Postgres.
>

Well, they are not supported. The packages may be there, but they are not
supported. I think that's an important distinction. Maybe add something
like "some packages may be available for older versions of Debian, but are
not supported" or such?



> Or get rid of the wheezy side altogether?

Or move it to the archive. I'm not entirely sure why it's still there.
Christoph?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [PG13] Planning (time + buffers) data structure in explain plan (format text)

2020-08-21 Thread Fujii Masao




On 2020/08/22 1:04, Fujii Masao wrote:



On 2020/08/21 23:53, Tom Lane wrote:

Fujii Masao  writes:

Pushed. Thanks!


Buildfarm shows this patch has got problems under
-DRELCACHE_FORCE_RELEASE and/or -DCATCACHE_FORCE_RELEASE:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-08-21%2011%3A54%3A12


Thanks for the report!

This happens because, in text format, whether "Planning:" line is output
varies depending on the system state. So explain_filter() should ignore
"Planning:" line. Patch attached. I'm now checking whether the patched
version works fine.


I pushed the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: run pgindent on a regular basis / scripted manner

2020-08-21 Thread Magnus Hagander
On Fri, Aug 14, 2020 at 10:26 PM Alvaro Herrera 
wrote:

> On 2020-Aug-13, Magnus Hagander wrote:
>
> > That is:
> > 1. Whenever a patch is pushed on master on the main repo a process kicked
> > off (or maybe wait 5 minutes to coalesce multiple patches if there are)
> > 2. This process checks out master, and runs pgindent on it
> > 3. When done, this gets committed to a new branch (or just overwrites an
> > existing branch of course, we don't need to maintain history here) like
> > "master-indented". This branch can be in a different repo, but one that
> > starts out as a clone of the main one
> > 4. A committer (any committer) can then on regular basis examine the
> > differences by fetch + diff. If they're happy with it, cherry pick it in.
> > If not, figure out what needs to be done to adjust it.
>
> Sounds good -- for branch master.
>

So mostly for testing, I've set up a job that does this.

Basically it runs every 15 minutes and if there is a new commit on master
it will rebase onto the latest master and run pgindent on it. This then
gets pushed up to a separate repo (postgresql-pgindent.git on
git.postgresql.org), and can be viewed there.

To see the current state of pgindent, view:
https://git.postgresql.org/gitweb/?p=postgresql-pgindent.git;a=commitdiff;h=master-pgindent

(or decorate as wanted to see for example a raw patch format)

If a committer wants to use it directly, just "git remote add" the
postgresql-pgindent.git and then cherry-pick the branch tip into your own
repository, and push. Well, actually that will right now fail because the
commit is made by "Automatic pgindent" which is not an approved committer,
but if we want to do this as a more regular thing, we can certainly fix
that.

Note that the only thing the job does is run pgindent. It does not attempt
to do anything with the typedef list at this point.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2020-08-21 Thread Ibrar Ahmed
On Wed, Aug 19, 2020 at 6:15 PM Anastasia Lubennikova <
a.lubennik...@postgrespro.ru> wrote:

> On 18.08.2020 02:54, Alvaro Herrera wrote:
> > On 2020-Aug-14, Ibrar Ahmed wrote:
> >
> >> The table used for the test contains three columns (integer, text,
> >> varchar).
> >> The total number of rows is 1000 in total.
> >>
> >> Unpatched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
> >> COPY: 9069.432 ms vacuum; 2567.961ms
> >> COPY: 9004.533 ms vacuum: 2553.075ms
> >> COPY: 8832.422 ms vacuum: 2540.742ms
> >>
> >> Patched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
> >> COPY: 10031.723 ms vacuum: 127.524 ms
> >> COPY: 9985.109  ms vacuum: 39.953 ms
> >> COPY: 9283.373  ms vacuum: 37.137 ms
> >>
> >> Time to take the copy slightly increased but the vacuum time
> significantly
> >> decrease.
> > "Slightly"?  It seems quite a large performance drop to me -- more than
> > 10%.  Where is that time being spent?  Andres said in [1] that he
> > thought the performance shouldn't be affected noticeably, but this
> > doesn't seem to hold true.  As I understand, the idea was that there
> > would be little or no additional WAL records .. only flags in the
> > existing record.  So what is happening?
> >
> > [1]
> https://postgr.es/m/20190408010427.4l63qr7h2fjcy...@alap3.anarazel.de
>
> I agree that 10% performance drop is not what we expect with this patch.
> Ibrar, can you share more info about your tests? I'd like to reproduce
> this slowdown and fix it, if necessary.
>
> I've run some tests on my laptop and COPY FREEZE shows the same time for
> both versions, while VACUUM is much faster on the patched version. I've
> also checked WAL generation and it shows that the patch works correctly
> as it doesn't add any records for COPY.
>
> Not patched:
>
> Time: 54883,356 ms (00:54,883)
> Time: 65373,333 ms (01:05,373)
> Time: 64684,592 ms (01:04,685)
> VACUUM Time: 60861,670 ms (01:00,862)
>
> COPY  wal_bytes 3765 MB
> VACUUM wal_bytes 6015 MB
> table size 5971 MB
>
> Patched:
>
> Time: 53142,947 ms (00:53,143)
> Time: 65420,812 ms (01:05,421)
> Time: 66600,114 ms (01:06,600)
> VACUUM Time: 63,401 ms
>
> COPY  wal_bytes 3765 MB
> VACUUM wal_bytes 30 kB
> table size 5971 MB
>
> The test script is attached.
>
> > Also, when Andres posted this patch first, he said this was only for
> > heap_multi_insert because it was a prototype.  But I think we expect
> > that the table_insert path (CIM_SINGLE mode in copy) should also receive
> > that treatment.
>
> I am afraid that extra checks for COPY FREEZE  in heap_insert() will
> slow down normal insertions.
>
> --
> Anastasia Lubennikova
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
Here is my test:

postgres=# BEGIN;
BEGIN

postgres=*# TRUNCATE foo;
TRUNCATE TABLE

postgres=*# COPY foo(id, name, address) FROM '/home/ibrar/bar.csv'
DELIMITER ',' FREEZE;
COPY 1000

postgres=*# COMMIT;
COMMIT

postgres=# VACUUM;
VACUUM

postgres=# SELECT count(*) FROM foo;
 count
-------
  1000
(1 row)



-- 
Ibrar Ahmed


Re: [PG13] Planning (time + buffers) data structure in explain plan (format text)

2020-08-21 Thread Fujii Masao



On 2020/08/21 23:53, Tom Lane wrote:

Fujii Masao  writes:

Pushed. Thanks!


Buildfarm shows this patch has got problems under
-DRELCACHE_FORCE_RELEASE and/or -DCATCACHE_FORCE_RELEASE:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-08-21%2011%3A54%3A12


Thanks for the report!

This happens because, in text format, whether "Planning:" line is output
varies depending on the system state. So explain_filter() should ignore
"Planning:" line. Patch attached. I'm now checking whether the patched
version works fine.
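
To illustrate (a hypothetical example, not the regression test query):
with BUFFERS now accepted without ANALYZE, the text output may or may not
contain a "Planning:" group, depending on whether planning itself touched
any buffers:

EXPLAIN (BUFFERS, COSTS OFF) SELECT * FROM pg_class;
-- depending on the system state, the output may additionally show
--   Planning:
--     Buffers: shared hit=... read=...
-- which is why explain_filter() has to skip the "Planning:" line.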

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/test/regress/expected/explain.out 
b/src/test/regress/expected/explain.out
index a1ee6c6792..dc7ab2ce8b 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -23,6 +23,9 @@ begin
 -- Ignore text-mode buffers output because it varies depending
 -- on the system state
 CONTINUE WHEN (ln ~ ' +Buffers: .*');
+-- Ignore text-mode "Planning:" line because whether it's output
+-- varies depending on the system state
+CONTINUE WHEN (ln = 'Planning:');
 return next ln;
 end loop;
 end;
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
index 01783c607a..c79116c927 100644
--- a/src/test/regress/sql/explain.sql
+++ b/src/test/regress/sql/explain.sql
@@ -25,6 +25,9 @@ begin
 -- Ignore text-mode buffers output because it varies depending
 -- on the system state
 CONTINUE WHEN (ln ~ ' +Buffers: .*');
+-- Ignore text-mode "Planning:" line because whether it's output
+-- varies depending on the system state
+CONTINUE WHEN (ln = 'Planning:');
 return next ln;
 end loop;
 end;


Re: [PG13] Planning (time + buffers) data structure in explain plan (format text)

2020-08-21 Thread Tom Lane
Fujii Masao  writes:
> Pushed. Thanks!

Buildfarm shows this patch has got problems under
-DRELCACHE_FORCE_RELEASE and/or -DCATCACHE_FORCE_RELEASE:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-08-21%2011%3A54%3A12

regards, tom lane




Re: Creating a function for exposing memory usage of backend process

2020-08-21 Thread torikoshia

Thanks for all your comments!

Thankfully, it seems that this feature is not regarded as meaningless,
so I'm going to work on some improvements.


On Wed, Aug 19, 2020 at 10:56 PM Michael Paquier  
wrote:

On Wed, Aug 19, 2020 at 06:12:02PM +0900, Fujii Masao wrote:

On 2020/08/19 17:40, torikoshia wrote:
Yes, I didn't add regression tests because of the instability of the
output. I thought it would be OK since other views like pg_stat_slru and
pg_shmem_allocations didn't have tests for their outputs.


You're right.


If you can make a test with something minimal and with a stable
output, adding a test is helpful IMO, or how can you make easily sure
that this does not get broken, particularly in the event of future
refactorings, or even with platform-dependent behaviors?


OK. Added a regression test on sysviews.sql.
(0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)

Fujii-san gave us an example, but I added a simpler one considering
the simplicity of the other tests there.


On Thu, Aug 20, 2020 at 12:02 AM Tom Lane  wrote:

Michael Paquier  writes:
> By the way, I was looking at the code that has been committed, and I
> think that it is awkward to have a SQL function in mcxt.c, which is a
> rather low-level interface.  I think that this new code should be
> moved to its own file, one suggestion for a location I have being
> src/backend/utils/adt/mcxtfuncs.c.

I agree with that,


Thanks for pointing out.
Added a patch for relocating the codes to mcxtfuncs.c.
(patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)


On Thu, Aug 20, 2020 at 11:09 AM Fujii Masao 
 wrote:

On 2020/08/20 0:01, Tom Lane wrote:

Given the lack of clear use-case, and the possibility (admittedly
not strong) that this is still somehow a security hazard, I think
we should revert it.  If it stays, I'd like to see restrictions
on who can read the view.



For example, allowing only the role with pg_monitor to see this view?


Attached a patch adding that restriction.
(0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch)

Of course, this restriction makes pg_backend_memory_contexts hard to use
when the user of the target session is not granted pg_monitor because the
scope of this view is session local.

In this case, I imagine additional operations would be needed, something
like temporarily granting pg_monitor to that user.

Thoughts?
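
For example, something like this (just a sketch; "app_user" is a
hypothetical role name, and the view is queried from the session whose
memory we want to inspect):

GRANT pg_monitor TO app_user;   -- temporarily, if not already granted

-- then, in app_user's session:
SELECT name, parent, total_bytes, used_bytes
  FROM pg_backend_memory_contexts
 ORDER BY total_bytes DESC
 LIMIT 5;

REVOKE pg_monitor FROM app_user;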


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From 23fe541d5bd3cead787bb7c638f0086b9c2e13eb Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 21 Aug 2020 21:22:10 +0900
Subject: [PATCH] Added a regression test for pg_backend_memory_contexts.

---
 src/test/regress/expected/sysviews.out | 7 +++
 src/test/regress/sql/sysviews.sql  | 3 +++
 2 files changed, 10 insertions(+)

diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 06c4c3e476..06e09fd10b 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -19,6 +19,13 @@ select count(*) >= 0 as ok from pg_available_extensions;
  t
 (1 row)
 
+-- There will surely be at least one context.
+select count(*) > 0 as ok from pg_backend_memory_contexts;
+ ok 
+
+ t
+(1 row)
+
 -- At introduction, pg_config had 23 entries; it may grow
 select count(*) > 20 as ok from pg_config;
  ok 
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 28e412b735..2c3b88c855 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -12,6 +12,9 @@ select count(*) >= 0 as ok from pg_available_extension_versions;
 
 select count(*) >= 0 as ok from pg_available_extensions;
 
+-- There will surely be at least one context.
+select count(*) > 0 as ok from pg_backend_memory_contexts;
+
 -- At introduction, pg_config had 23 entries; it may grow
 select count(*) > 20 as ok from pg_config;
 
-- 
2.18.1

From 4eee73933874fbab91643e7461717ba9038d8d76 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 21 Aug 2020 19:01:38 +0900
Subject: [PATCH] Rellocated the codes for pg_backend_memory_contexts in mcxt.c
 to src/backend/utils/adt/mcxtfuncs.c as they are low low-level interface.

---
 src/backend/utils/adt/Makefile|   1 +
 src/backend/utils/adt/mcxtfuncs.c | 157 ++
 src/backend/utils/mmgr/mcxt.c | 137 --
 3 files changed, 158 insertions(+), 137 deletions(-)
 create mode 100644 src/backend/utils/adt/mcxtfuncs.c

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 5d2aca8cfe..54d5c37947 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	lockfuncs.o \
 	mac.o \
 	mac8.o \
+	mcxtfuncs.o \
 	misc.o \
 	name.o \
 	network.o \
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
new file mode 100644
index 00..50e1b07ff0
--- /dev/null
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -0,0 +1,157 @@

Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-08-21 Thread Ashutosh Sharma
Hi Masahiko-san,

Please find the updated patch with the following new changes:

1) It adds the code changes in heap_force_kill function to clear an
all-visible bit on the visibility map corresponding to the page that
is marked all-visible. Along the way it also clears PD_ALL_VISIBLE
flag on the page header.

2) It adds the code changes in heap_force_freeze function to reset the
ctid value in a tuple header if it is corrupted.

3) It adds several notes and examples in the documentation stating
when and how we need to use the functions provided by this module.

Please have a look and let me know if you have any other concerns.
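
For reference, the functions are invoked like this (table name and TIDs
here are hypothetical; the signature matches the example quoted below in
this thread):

-- forcibly kill the listed tuples
SELECT heap_force_kill('corrupted_tbl'::regclass, ARRAY['(0,1)'::tid]);

-- forcibly freeze the listed tuples (e.g. ones whose xmin is reported
-- as being from before relfrozenxid)
SELECT heap_force_freeze('corrupted_tbl'::regclass, ARRAY['(0,2)'::tid]);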

Thanks,

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Thu, Aug 20, 2020 at 11:43 AM Ashutosh Sharma  wrote:
>
> On Thu, Aug 20, 2020 at 11:04 AM Masahiko Sawada
>  wrote:
> >
> > On Wed, 19 Aug 2020 at 20:45, Ashutosh Sharma  wrote:
> > >
> > > On Wed, Aug 19, 2020 at 3:55 PM Masahiko Sawada
> > >  wrote:
> > > >
> > > > On Wed, 19 Aug 2020 at 15:09, Ashutosh Sharma  
> > > > wrote:
> > > > >
> > > > > On Wed, Aug 19, 2020 at 9:27 AM Masahiko Sawada
> > > > >  wrote:
> > > > > >
> > > > > > On Mon, 17 Aug 2020 at 15:05, Ashutosh Sharma 
> > > > > >  wrote:
> > > > > > >
> > > > > > > > pg_force_freeze() can revival a tuple that is already deleted 
> > > > > > > > but not
> > > > > > > > vacuumed yet. Therefore, the user might need to reindex indexes 
> > > > > > > > after
> > > > > > > > using that function. For instance, with the following script, 
> > > > > > > > the last
> > > > > > > > two queries: index scan and seq scan, will return different 
> > > > > > > > results.
> > > > > > > >
> > > > > > > > set enable_seqscan to off;
> > > > > > > > set enable_bitmapscan to off;
> > > > > > > > set enable_indexonlyscan to off;
> > > > > > > > create table tbl (a int primary key);
> > > > > > > > insert into tbl values (1);
> > > > > > > >
> > > > > > > > update tbl set a = a + 100 where a = 1;
> > > > > > > >
> > > > > > > > explain analyze select * from tbl where a < 200;
> > > > > > > >
> > > > > > > > -- revive deleted tuple on heap
> > > > > > > > select heap_force_freeze('tbl', array['(0,1)'::tid]);
> > > > > > > >
> > > > > > > > -- index scan returns 2 tuples
> > > > > > > > explain analyze select * from tbl where a < 200;
> > > > > > > >
> > > > > > > > -- seq scan returns 1 tuple
> > > > > > > > set enable_seqscan to on;
> > > > > > > > explain analyze select * from tbl;
> > > > > > > >
> > > > > > >
> > > > > > > I am not sure if this is the right use-case of pg_force_freeze
> > > > > > > function. I think we should only be running pg_force_freeze 
> > > > > > > function
> > > > > > > on a tuple for which VACUUM reports "found xmin ABC from before
> > > > > > > relfrozenxid PQR" sort of error otherwise it might worsen the 
> > > > > > > things
> > > > > > > instead of making it better.
> > > > > >
> > > > > > Should this also be documented? I think that it's hard to force the
> > > > > > user to always use this module in the right situation but we need to
> > > > > > show at least when to use.
> > > > > >
> > > > >
> > > > > I've already added some examples in the documentation explaining the
> > > > > use-case of force_freeze function. If required, I will also add a note
> > > > > about it.
> > > > >
> > > > > > > > Also, if a tuple updated and moved to another partition is 
> > > > > > > > revived by
> > > > > > > > heap_force_freeze(), its ctid still has special values:
> > > > > > > > MovedPartitionsOffsetNumber and MovedPartitionsBlockNumber. I 
> > > > > > > > don't
> > > > > > > > see a problem yet caused by a visible tuple having the special 
> > > > > > > > ctid
> > > > > > > > value, but it might be worth considering either to reset ctid 
> > > > > > > > value as
> > > > > > > > well or to not freezing already-deleted tuple.
> > > > > > > >
> > > > > > >
> > > > > > > For this as well, the answer remains the same as above.
> > > > > >
> > > > > > Perhaps the same is true when a tuple header is corrupted including
> > > > > > xmin and ctid for some reason and the user wants to fix it? I'm
> > > > > > concerned that a live tuple having the wrong ctid will cause SEGV or
> > > > > > PANIC error in the future.
> > > > > >
> > > > >
> > > > > If a tuple header itself is corrupted, then I think we must kill that
> > > > > tuple. If only xmin and t_ctid fields are corrupted, then probably we
> > > > > can think of resetting the ctid value of that tuple. However, it won't
> > > > > be always possible to detect the corrupted ctid value. It's quite
> > > > > possible that the corrupted ctid field has valid values for block
> > > > > number and offset number in it, but it's actually corrupted and it
> > > > > would be difficult to consider such ctid as corrupted. Hence, we can't
> > > > > do anything about such types of corruption. Probably in such cases we
> > > > > need to run VACUUM FULL on such tables so that new ctid gets generated
> > > > > for 

Re: Implementing Incremental View Maintenance

2020-08-21 Thread Tatsuo Ishii
From: Yugo NAGATA 
Subject: Re: Implementing Incremental View Maintenance
Date: Fri, 21 Aug 2020 17:23:20 +0900
Message-ID: <20200821172320.a2506577d5244b6066f69...@sraoss.co.jp>

> On Wed, 19 Aug 2020 10:02:42 +0900 (JST)
> Tatsuo Ishii  wrote:
> 
>> I have looked into this.
> 
> Thank you for your reviewing!
>  
>> - 0004-Allow-to-prolong-life-span-of-transition-tables-unti.patch:
>>   This one needs a comment to describe what the function does etc.
>> 
>>   +void
>>   +SetTransitionTablePreserved(Oid relid, CmdType cmdType)
>>   +{
> 
> I added a comment for this function and related places.
> 
> +/*
> + * SetTransitionTablePreserved
> + *
> + * Prolong lifespan of transition tables corresponding specified relid and
> + * command type to the end of the outmost query instead of each nested query.
> + * This enables to use nested AFTER trigger's transition tables from outer
> + * query's triggers.  Currently, only immediate incremental view maintenance
> + * uses this.
> + */
> +void
> +SetTransitionTablePreserved(Oid relid, CmdType cmdType)
> 
> Also, I removed related unnecessary code which was left accidentally.
> 
>  
>> - 0007-Add-aggregates-support-in-IVM.patch
>>   "Check if the given aggregate function is supporting" shouldn't be
>>   "Check if the given aggregate function is supporting IVM"?
> 
> Yes, you are right. I fixed this, too.
> 
>> 
>> + * check_aggregate_supports_ivm
>> + *
>> + * Check if the given aggregate function is supporting

Thanks for the fixes. I have changed the commit fest status to "Ready
for Committer".

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [POC] Fast COPY FROM command for the table with foreign partitions

2020-08-21 Thread Andrey Lepikhov



On 8/7/20 2:14 PM, Amit Langote wrote:

I was playing around with v5 and I noticed an assertion failure which
I concluded is due to improper setting of ri_usesBulkModify.  You can
reproduce it with these steps.

create extension postgres_fdw;
create server lb foreign data wrapper postgres_fdw ;
create user mapping for current_user server lb;
create table foo (a int, b int) partition by list (a);
create table foo1 (like foo);
create foreign table ffoo1 partition of foo for values in (1) server
lb options (table_name 'foo1');
create table foo2 (like foo);
create foreign table ffoo2 partition of foo for values in (2) server
lb options (table_name 'foo2');
create function print_new_row() returns trigger language plpgsql as $$
begin raise notice '%', new; return new; end; $$;
create trigger ffoo1_br_trig before insert on ffoo1 for each row
execute function print_new_row();
copy foo from stdin csv;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

1,2
2,3
\.

NOTICE:  (1,2)
server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.

#0  0x7f2d5e266337 in raise () from /lib64/libc.so.6
#1  0x7f2d5e267a28 in abort () from /lib64/libc.so.6
#2  0x00aafd5d in ExceptionalCondition
(conditionName=0x7f2d37b468d0 "!resultRelInfo->ri_usesBulkModify ||
resultRelInfo->ri_FdwRoutine->BeginForeignCopyIn == NULL",
 errorType=0x7f2d37b46680 "FailedAssertion",
fileName=0x7f2d37b4677f "postgres_fdw.c", lineNumber=1863) at
assert.c:67
#3  0x7f2d37b3b0fe in postgresExecForeignInsert (estate=0x2456320,
resultRelInfo=0x23a8f58, slot=0x23a9480, planSlot=0x0) at
postgres_fdw.c:1862
#4  0x0066362a in CopyFrom (cstate=0x23a8d40) at copy.c:3331

The problem is that partition ffoo1's BR trigger prevents it from
using multi-insert, but its ResultRelInfo.ri_usesBulkModify is true,
which is copied from its parent.  We should really check the same
things for a partition that CopyFrom() checks for the main target
relation (root parent) when deciding whether to use multi-insert.
Thnx, I added TAP-test on this problem.

However instead of duplicating the same logic to do so in two places
(CopyFrom and ExecInitPartitionInfo), I think it might be a good idea
to refactor the code to decide if multi-insert mode can be used for a
given relation by checking its properties and put it in some place
that both the main target relation and partitions need to invoke.
InitResultRelInfo() seems to be one such place.

+1


Also, it might be a good idea to use ri_usesBulkModify more generally
than only for foreign relations as the patch currently does, because I
can see that it can replace the variable insertMethod in CopyFrom().
Having both insertMethod and ri_usesBulkModify in each ResultRelInfo
seems confusing and bug-prone.

Finally, I suggest renaming ri_usesBulkModify to ri_usesMultiInsert to
reflect its scope.

Please check the attached delta patch that applies on top of v5 to see
what that would look like.

I merged your delta patch (see v6 in the attachment) into the main patch.
Currently it seems more committable than before.

--
regards,
Andrey Lepikhov
Postgres Professional
From da6c4fe8262df58164e2c4ab80e085a019c9c6c1 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Thu, 9 Jul 2020 11:16:56 +0500
Subject: [PATCH] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into foreign table in the case of
multi inserts is possible and foreign table has non-zero number of columns.

FDWAPI was extended by next routines:
* BeginForeignCopyIn
* EndForeignCopyIn
* ExecForeignCopyIn

BeginForeignCopyIn and EndForeignCopyIn initialize and free
the CopyState of bulk COPY. The ExecForeignCopyIn routine send
'COPY ... FROM STDIN' command to the foreign server, in iterative
manner send tuples by CopyTo() machinery, send EOF to this connection.

Code that constructed list of columns for a given foreign relation
in the deparseAnalyzeSql() routine is separated to the deparseRelColumnList().
It is reused in the deparseCopyFromSql().

Added TAP-tests on the specific corner cases of COPY FROM STDIN operation.

By the analogy of CopyFrom() the CopyState structure was extended
with data_dest_cb callback. It is used for send text representation
of a tuple to a custom destination.
The PgFdwModifyState structure is extended with the cstate field.
It is needed for avoid repeated initialization of CopyState. ALso for this
reason CopyTo() routine was split into the set of routines CopyToStart()/
CopyTo()/CopyToFinish().

Enum CopyInsertMethod removed. This logic implements by ri_usesMultiInsert
field of the ResultRelInfo sructure.

Discussion: https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote
---
 contrib/postgres_fdw/deparse.c|  60 ++-
 

Re: [PG13] Planning (time + buffers) data structure in explain plan (format text)

2020-08-21 Thread Fujii Masao




On 2020/08/21 14:54, Pierre Giraud wrote:

Le 20/08/2020 à 17:41, Fujii Masao a écrit :



On 2020/08/20 22:34, Julien Rouhaud wrote:

On Thu, Aug 20, 2020 at 12:29 PM David Rowley 
wrote:


On Thu, 20 Aug 2020 at 19:58, Fujii Masao
 wrote:


On 2020/08/20 13:00, David Rowley wrote:

I forgot to mention earlier, you'll also need to remove the part in
the docs that mentions BUFFERS requires ANALYZE.


Thanks for the review! I removed that.
Attached is the updated version of the patch.
I also added the regression test performing "explain (buffers on)"
without "analyze" option.


Looks good to me.


Looks good to me too.


David and Julien, thanks for the review! I'd like to wait for
Pierre's opinion about this change before committing the patch.

Pierre,
could you share your opinion about this change?


It looks good to me too. Thanks a lot!


Pushed. Thanks!


Let's not forget to notify Hubert (depesz) once integrated.


You're going to do that?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [POC]Enable tuple change partition caused by BEFORE TRIGGER

2020-08-21 Thread Ashutosh Bapat
On Fri, Aug 21, 2020 at 1:28 PM movead...@highgo.ca  wrote:
>
> Hello hackers,
>
> Currently, if a BEFORE TRIGGER causes a partition change, it reports an
> error 'moving row to another partition during a BEFORE FOR EACH ROW
> trigger is not supported' and fails to execute. I want to try to address
> this limitation and have made an initial patch to get feedback from other
> hackers.

I am not opposed to removing that limitation, it would be good to know
the usecase we will solve. Trying to change a partition key in a
before trigger on a partition looks dubious to me. If at all it should
be done by a partitioned table level trigger and not a partition level
trigger.

>
> The implemented approach is similar to a partition change caused by an
> UPDATE statement. If it's a BEFORE INSERT TRIGGER then we just need to
> insert the row produced by the trigger into the new partition, and if
> it's a BEFORE UPDATE TRIGGER we need to delete the old tuple and insert
> the row produced by the trigger into the new partition.

If the triggers are not written carefully, this could have ping-pong
effect, where the row keeps on bouncing from one partition to the
other. Obviously it will be user who must be blamed for this but with
thousands of partitions it's not exactly easy to keep track of the
trigger's effects. If we prohibited the row movement because of before
trigger, users don't need to worry about it at all.

>
> In the current BEFORE TRIGGER implementation, it reports an error as soon
> as a trigger result falls outside the current partition, but I think it
> should check this only after all triggers have been called; see the
> discussion in [1][2]. In the attached patch I have changed this rule: I
> check the partition constraint only once, after all BEFORE TRIGGERS have
> been called. If you do not agree with this way, I can change the
> implementation.

I think this change may be good irrespective of the row movement change.

>
> And another point is that when inserting into the new partition caused by
> a BEFORE TRIGGER, it will not fire the BEFORE TRIGGER on the new
> partition. I think that's the right way; what's more, I think an UPDATE
> that causes a partition change should not fire the BEFORE TRIGGERS
> either, as discussed in [1].
>

That looks dubious to me. Before row triggers may be used in several
different ways, for auditing, for verification of inserted data or to
change some other data based on this change and so on. If we don't
execute before row trigger on the partition where the row gets moved,
all this expected work won't happen. This also needs some background
about the usecase which requires this change.
--
Best Wishes,
Ashutosh Bapat




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-08-21 Thread Dilip Kumar
On Fri, Aug 21, 2020 at 3:13 PM Amit Kapila  wrote:
>
> On Fri, Aug 21, 2020 at 10:33 AM Dilip Kumar  wrote:
> >
> > On Fri, Aug 21, 2020 at 9:14 AM Amit Kapila  wrote:
> > >
> > > 2.
> > > + /*
> > > + * If the new location is smaller then the current location in file then
> > > + * we need to set the curFile and the curOffset to the new values and 
> > > also
> > > + * reset the pos and nbytes.  Otherwise nothing to do.
> > > + */
> > > + else if ((newFile < file->curFile) ||
> > > + newOffset < file->curOffset + file->pos)
> > > + {
> > > + file->curFile = newFile;
> > > + file->curOffset = newOffset;
> > > + file->pos = 0;
> > > + file->nbytes = 0;
> > > + }
> > >
> > > Shouldn't there be && instead of || because if newFile is greater than
> > > curFile then there is no meaning to update it?
> >
> > I think this condition is wrong it should be,
> >
> > else if ((newFile < file->curFile) || ((newFile == file->curFile) &&
> > (newOffset < file->curOffset + file->pos)
> >
> > Basically, either new file is smaller otherwise if it is the same
> > then-new offset should be smaller.
> >
>
> I think we don't need to use file->pos for that as that is required
> only for the current buffer, otherwise, such a condition should
> suffice the need. However, I was not happy with the way code and
> conditions were arranged in BufFileTruncateShared, so I have
> re-arranged them and change quite a few comments in that API. Apart
> from that I have updated the docs and ran pgindent for the first
> patch. Do let me know if you have any more comments on the first
> patch?

I have reviewed and tested the patch and the changes look fine to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Bogus documentation for bogus geometric operators

2020-08-21 Thread Emre Hasegeli
> While revising the docs for the geometric operators, I came across
> these entries:
>
> <^  Is below (allows touching)? circle '((0,0),1)' <^ circle 
> '((0,5),1)'
> >^  Is above (allows touching)? circle '((0,5),1)' >^ circle 
> >'((0,0),1)'
>
> These have got more than a few problems:
>
> 1. There are no such operators for circles, so the examples are pure
> fantasy.
>
> 2. What these operators do exist for is points (point_below, point_above
> respectively) and boxes (box_below_eq, box_above_eq).  However, the
> stated behavior is not what the point functions actually do:
>
> point_below(PG_FUNCTION_ARGS)
> ...
> PG_RETURN_BOOL(FPlt(pt1->y, pt2->y));
>
> point_above(PG_FUNCTION_ARGS)
> ...
> PG_RETURN_BOOL(FPgt(pt1->y, pt2->y));
>
> So point_below would be more accurately described as "is strictly below",
> so its operator name really ought to be <<|.  And point_above is "is
> strictly above", so its operator name ought to be |>>.

I prepared a patch to add <<| and |>> operators for points to
deprecate the previous ones.
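
For illustration (not part of the patch itself), the current strict
behaviour Tom describes can be checked directly on an unpatched server:

SELECT point '(0,0)' <^ point '(0,1)';  -- true: strictly below
SELECT point '(0,0)' <^ point '(0,0)';  -- false: equal y ("touching") is
                                        -- not accepted, despite the docs
-- with the patch, the same comparisons would be spelled <<| and |>>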

> 3. The box functions do seem to be correctly documented:
>
> box_below_eq(PG_FUNCTION_ARGS)
> ...
> PG_RETURN_BOOL(FPle(box1->high.y, box2->low.y));
>
> box_above_eq(PG_FUNCTION_ARGS)
> ...
> PG_RETURN_BOOL(FPge(box1->low.y, box2->high.y));
>
> But there are comments in the source code to the effect of
>
>  * box_below_eq and box_above_eq are obsolete versions that (probably
>  * erroneously) accept the equal-boundaries case.  Since these are not
>  * in sync with the box_left and box_right code, they are deprecated and
>  * not supported in the PG 8.1 rtree operator class extension.
>
> I'm not sure how seriously to take this deprecation comment, but it
> is true that box_below (<<|) and box_above (|>>) have analogs for
> other data types while these functions don't.

I think we should take this comment seriously and deprecate those
operators, so the patch removes them from the documentation.

> 4. Just for extra fun, these point operators are listed in some
> GIST and SP-GIST opclasses; though the box ones are not, as per
> that code comment.

It also updates the operator classes to support the new operators
instead of former ones.  I don't think there are many users of them to
notice the change.

I am adding this to the next commitfest.


0001-Add-and-operators-for-points-v01.patch
Description: Binary data


Re: Problems with the FSM, heap fillfactor, and temporal locality

2020-08-21 Thread John Naylor
On Fri, Aug 21, 2020 at 2:48 AM Peter Geoghegan  wrote:
>
> I'm concerned about how the FSM gives out pages to heapam. Disabling
> the FSM entirely helps TPC-C/BenchmarkSQL, which uses non-default heap
> fillfactors for most tables [1].

Hi Peter,

Interesting stuff. Is lower-than-default fillfactor important for the
behavior you see?

> located on the same blocks (or at least on neighboring blocks). It's
> possible that one orderline will span two neighboring blocks here, but
> it will never span three or more blocks. Each order has 5 - 15 order
> lines, and so I was surprised to see that a small minority of order
> line tuples end up occupying as many as 5 or 7 heap pages on the
> master branch (i.e. with the current FSM intact during bulk loading).

Hmm. You focus on FSM, but I'm thinking also multiple VM bits
potentially got cleared (maybe not this benchmark but other
scenarios).

> If the FSM tried to get free space from a close-by block, then we
> might at least see related updates that cannot fit a successor tuple
> on the same block behave in a coordinated fashion. We might at least
> have both updates relocate the successor tuple to the same
> mostly-empty block -- they both notice the same nearby free block, so
> both sets of successor tuples end up going on the same most-empty
> block. The two updating backends don't actually coordinate, of course
> -- this just happens as a consequence of looking for nearby free
> space.

I'm not sure I follow the "close-by" criterion. It doesn't seem
immediately relevant to what I understand as the main problem you
found, of free space. In other words, if we currently write to 5
blocks, but with smarter FSM logic we can find a single block, it
seems that should be preferred over close-by blocks each with less
space? Or are you envisioning scoring by both free space and distance
from the original block?

> supposed to be reserved. It should not be enough for the space used on
> the page to shrink by just 1% (to 89%). Maybe it should have to reach
> as low as 50% or less before we flip it back to "free to take space
> from for new unrelated tuples". The idea is that fill factor space is
> truly reserved for updates -- that should be "sticky" for all of this
> to work well.

Makes sense. If we go this route, I wonder if we should keep the
current behavior and use any free space if the fillfactor is 100%,
since that's in line with the intention. Also, the 50% (or whatever)
figure could be scaled depending on fillfactor. As in, check if
freespace > (100% - fillfactor * 0.6), or something.

I'm not sure how to distinguish blocks that have never reached
fillfactor vs. ones that did but haven't gained back enough free space
to be considered again. Naively, we could start by assuming the last
block can always be filled up to fillfactor, but earlier blocks must
use the stricter rule. That's easy since we already try the last block
anyway before extending the relation.

The flip side of this is: Why should vacuum be in a hurry to dirty a
page, emit WAL, and update the FSM if it only removes one dead tuple?
This presentation [1] (pages 35-43) from Masahiko Sawada had the idea
of a "garbage map", which keeps track of which parts of the heap have
the most dead space, and focus I/O efforts on those blocks first. It
may or may not be worth the extra complexity by itself, but it seems
it would work synergistically with your proposal: Updating related
tuples would concentrate dead tuples on fewer pages, and vacuums would
more quickly free up space that can actually be used to store multiple
new tuples.

[1] 
https://www.slideshare.net/masahikosawada98/vacuum-more-efficient-than-ever-99916671

-- 
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: proposal - function string_to_table

2020-08-21 Thread Pavel Stehule
On Fri, Aug 21, 2020 at 11:08 AM Pavel Stehule wrote:

>
>
> On Fri, Aug 21, 2020 at 9:44 AM Peter Smith wrote:
>
>> On Fri, Aug 21, 2020 at 5:21 AM Pavel Stehule 
>> wrote:
>>
>> > new patch attached
>>
>> Thanks for taking some of my previous review comments.
>>
>> I have re-checked the string_to_table_20200820.patch.
>>
>> Below are some remaining questions/comments:
>>
>> 
>>
>> COMMENT (help text)
>>
>> +Splits the string at occurrences
>> +of delimiter and forms the remaining data
>> +into a text tavke.
>>
>> What did you mean by "remaining" in that description?
>> It gets a bit strange thinking about remaining NULLs, or remaining
>> empty strings.
>>
>> Why not just say "... and forms the data into a text table."
>>
>> ---
>>
>> +Splits the string at occurrences
>> +of delimiter and forms the remaining data
>> +into a text tavke.
>>
>> Typo: "tavke." -> "table."
>>
>
> This text is taken from doc for string_to_array
>

I fixed the typo. I hope and expect that the doc will be finalized by
native speakers.


>
>
>> 
>>
>> COMMENT (help text reference to regexp_split_to_table)
>>
>> +input string can be done by function
>> +regexp_split_to_table (see > linkend="functions-posix-regexp"/>).
>> +   
>>
>> In the previous review I suggested adding a reference to the
>> regexp_split_to_table function.
>> A hyperlink would be a bonus, but maybe it is not possible.
>>
>> The hyperlink added in the latest patch is to page for POSIX Regular
>> Expressions, which doesn't seem appropriate.
>>
>
> ok I remove it
>
>>
>> 
>>
>> QUESTION (test cases)
>>
>> Thanks for merging lots of my additional test cases!
>>
>> Actually, the previous PDF I sent was 2 pages long but you only merged
>> the tests of page 1.
>> I wondered was it accidental to omit all those 2nd page tests?
>>
>
> I'll check it
>

I forgot it - now it is merged. Maybe it is over-dimensioned for one
function, but it is (in the end) a test of the string_to_array function too.


>
>> 
>>
>> QUESTION (function name?)
>>
>> I noticed that ALL current string functions that use delimiters have
>> the word "split" in their name.
>>
>> e.g.
>> * regexp_split_to_array
>> * regexp_split_to_table
>> * split_part
>>
>> But "string_to_table" is not following this pattern.
>>
>> Maybe a different choice of function name would be more consistent
>> with what is already there?
>> e.g.  split_to_table, string_split_to_table, etc.
>>
>
> I don't agree. This function is a twin (with almost identical behaviour)
> of the "string_to_array" function, so I think the name is correct.
>

Unfortunately, there is no consistency in the existing naming, but I think
string_to_table is the better name, because this function is almost
identical to string_to_array.
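
For example (the first statement already works today; the second assumes
the proposed function):

SELECT string_to_array('xx~^~yy~^~zz', '~^~');  -- one row: {xx,yy,zz}
SELECT string_to_table('xx~^~yy~^~zz', '~^~');  -- three rows: xx, yy, zz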

Regards

Pavel


>
>> 
>>
>> Kind Regards,
>> Peter Smith.
>> Fujitsu Australia
>>
>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9a4ac5a1ea..56cecff6ca 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3458,6 +3458,36 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ string_to_table
+
+string_to_table ( string text, delimiter text  , nullstr text   )
+set of text
+   
+   
+Splits the string at occurrences
+of delimiter and forms the remaining data
+into a table with one text type column.
+If delimiter is NULL,
+each character in the string will become a
+separate element in the array.
+If delimiter is an empty string, then
+the string is treated as a single field.
+If null_string is supplied and is
+not NULL, fields matching that string are converted
+to NULL entries.
+   
+   
+string_to_table('xx~^~yy~^~zz', '~^~', 'yy')
+
+xx
+NULL
+zz
+   
+  
+
   

 
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index df10bfb906..14213c3318 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -26,6 +26,7 @@
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
+#include "nodes/execnodes.h"
 #include "parser/scansup.h"
 #include "port/pg_bswap.h"
 #include "regex/regex.h"
@@ -35,6 +36,7 @@
 #include "utils/memutils.h"
 #include "utils/pg_locale.h"
 #include "utils/sortsupport.h"
+#include "utils/tuplestore.h"
 #include "utils/varlena.h"
 
 
@@ -92,6 +94,16 @@ typedef struct
 	pg_locale_t locale;
 } VarStringSortSupport;
 
+/*
+ * Holds target metadata used for split string to array or to table.
+ */
+typedef struct
+{
+	ArrayBuildState	*astate;
+	Tuplestorestate *tupstore;
+	TupleDesc	tupdesc;
+} SplitStringTargetData;
+
 /*
  * This should be large enough that most strings will fit, but small enough
  * that we feel comfortable putting it on the stack
@@ -139,7 +151,7 @@ static bytea *bytea_substring(Datum str,
 		

Re: proposal - function string_to_table

2020-08-21 Thread Pavel Stehule
On Fri, Aug 21, 2020 at 9:44 AM Peter Smith wrote:

> On Fri, Aug 21, 2020 at 5:21 AM Pavel Stehule 
> wrote:
>
> > new patch attached
>
> Thanks for taking some of my previous review comments.
>
> I have re-checked the string_to_table_20200820.patch.
>
> Below are some remaining questions/comments:
>
> 
>
> COMMENT (help text)
>
> +Splits the string at occurrences
> +of delimiter and forms the remaining data
> +into a text tavke.
>
> What did you mean by "remaining" in that description?
> It gets a bit strange thinking about remaining NULLs, or remaining
> empty strings.
>
> Why not just say "... and forms the data into a text table."
>
> ---
>
> +Splits the string at occurrences
> +of delimiter and forms the remaining data
> +into a text tavke.
>
> Typo: "tavke." -> "table."
>

This text is taken from doc for string_to_array


> 
>
> COMMENT (help text reference to regexp_split_to_table)
>
> +input string can be done by function
> +regexp_split_to_table (see  linkend="functions-posix-regexp"/>).
> +   
>
> In the previous review I suggested adding a reference to the
> regexp_split_to_table function.
> A hyperlink would be a bonus, but maybe it is not possible.
>
> The hyperlink added in the latest patch is to page for POSIX Regular
> Expressions, which doesn't seem appropriate.
>

ok I remove it

>
> 
>
> QUESTION (test cases)
>
> Thanks for merging lots of my additional test cases!
>
> Actually, the previous PDF I sent was 2 pages long but you only merged
> the tests of page 1.
> I wondered was it accidental to omit all those 2nd page tests?
>

I'll check it

>
> 
>
> QUESTION (function name?)
>
> I noticed that ALL current string functions that use delimiters have
> the word "split" in their name.
>
> e.g.
> * regexp_split_to_array
> * regexp_split_to_table
> * split_part
>
> But "string_to_table" is not following this pattern.
>
> Maybe a different choice of function name would be more consistent
> with what is already there?
> e.g.  split_to_table, string_split_to_table, etc.
>

I don't agree. This function is a twin (with almost identical behaviour)
of the "string_to_array" function, so I think the name is correct.


> 
>
> Kind Regards,
> Peter Smith.
> Fujitsu Australia
>


[POC]Enable tuple change partition caused by BEFORE TRIGGER

2020-08-21 Thread movead...@highgo.ca
Hello hackers,

Currently, if a BEFORE TRIGGER causes a partition change, it reports the error
'moving row to another partition during a BEFORE FOR EACH ROW trigger is not
supported' and the statement fails. I want to try to address this limitation
and have made an initial patch to get feedback from other hackers.

The implemented approach is similar to the handling of a partition change
caused by an UPDATE statement. For a BEFORE INSERT TRIGGER we just need to
insert the row produced by the trigger into the new partition, and for a
BEFORE UPDATE TRIGGER we need to delete the old tuple and insert the row
produced by the trigger into the new partition.
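
To make the scenario concrete, here is a minimal sketch (all object names are
invented for illustration; the post-patch behaviour noted in the comment is my
reading of the approach described above, not output from the patch's tests):

CREATE TABLE t (id int, val int) PARTITION BY RANGE (id);
CREATE TABLE t_p1 PARTITION OF t FOR VALUES FROM (1) TO (100);
CREATE TABLE t_p2 PARTITION OF t FOR VALUES FROM (100) TO (200);

CREATE FUNCTION move_row() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    NEW.id := NEW.id + 100;   -- pushes the row into t_p2's range
    RETURN NEW;
END;
$$;

CREATE TRIGGER t_p1_before BEFORE INSERT OR UPDATE ON t_p1
    FOR EACH ROW EXECUTE FUNCTION move_row();

-- Today this fails with "moving row to another partition during a BEFORE
-- FOR EACH ROW trigger is not supported"; with the POC patch the trigger
-- result would instead be inserted into t_p2.
INSERT INTO t VALUES (1, 0);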

The current BEFORE TRIGGER implementation reports an error as soon as a
trigger result falls outside the current partition, but I think the check
should happen only after all triggers have been called; you can see the
discussion in [1][2]. In the attached patch I have changed this rule: the
partition constraint is checked only once, after all BEFORE TRIGGERS have
been called. If you do not agree with this approach, I can change the
implementation.

Another point is that when a BEFORE TRIGGER causes a row to be inserted into a
new partition, the BEFORE TRIGGER on that new partition is not fired. I think
that is the right behaviour; moreover, I think a partition change caused by an
UPDATE should not fire the BEFORE TRIGGERS on the new partition either, as
discussed in [1].

[1]https://www.postgresql.org/message-id/2020082017164661079648%40highgo.ca
[2]https://www.postgresql.org/message-id/20200318210213.GA9781@alvherre.pgsql



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


enable_partition_change_by_BEFORE_TRIGGER.patch
Description: Binary data


Re: Implementing Incremental View Maintenance

2020-08-21 Thread Yugo NAGATA
On Wed, 19 Aug 2020 10:02:42 +0900 (JST)
Tatsuo Ishii  wrote:

> I have looked into this.

Thank you for your reviewing!
 
> - 0004-Allow-to-prolong-life-span-of-transition-tables-unti.patch:
>   This one needs a comment to describe what the function does etc.
> 
>   +void
>   +SetTransitionTablePreserved(Oid relid, CmdType cmdType)
>   +{

I added a comment for this function and related places.

+/*
+ * SetTransitionTablePreserved
+ *
+ * Prolong the lifespan of the transition tables corresponding to the
+ * specified relid and command type to the end of the outermost query,
+ * instead of each nested query.  This makes it possible to use a nested
+ * AFTER trigger's transition tables from an outer query's triggers.
+ * Currently, only immediate incremental view maintenance uses this.
+ */
+void
+SetTransitionTablePreserved(Oid relid, CmdType cmdType)

Also, I removed related unnecessary code which had been left in accidentally.

 
> - 0007-Add-aggregates-support-in-IVM.patch
>   "Check if the given aggregate function is supporting" shouldn't be
>   "Check if the given aggregate function is supporting IVM"?

Yes, you are right. I fixed this, too.

> 
> + * check_aggregate_supports_ivm
> + *
> + * Check if the given aggregate function is supporting


Regards,
Yugo Nagata


-- 
Yugo NAGATA 


IVM_patches_v17.tar.gz
Description: application/gzip


Re: Libpq support to connect to standby server as priority

2020-08-21 Thread Peter Smith
Hi Greg,

> Thanks for the further review, an updated patch is attached. Please
> see my responses to your comments below:
>

Thanks for addressing all of my previous review comments in your new v19 patch.

Everything looks good to me now, so I am marking this as "ready for committer".

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: proposal - function string_to_table

2020-08-21 Thread Peter Smith
On Fri, Aug 21, 2020 at 5:21 AM Pavel Stehule  wrote:

> new patch attached

Thanks for taking some of my previous review comments.

I have re-checked the string_to_table_20200820.patch.

Below are some remaining questions/comments:



COMMENT (help text)

+Splits the string at occurrences
+of delimiter and forms the remaining data
+into a text tavke.

What did you mean by "remaining" in that description?
It gets a bit strange thinking about remaining NULLs, or remaining
empty strings.

Why not just say "... and forms the data into a text table."

---

+Splits the string at occurrences
+of delimiter and forms the remaining data
+into a text tavke.

Typo: "tavke." -> "table."



COMMENT (help text reference to regexp_split_to_table)

+input string can be done by function
+regexp_split_to_table (see ).
+   

In the previous review I suggested adding a reference to the
regexp_split_to_table function.
A hyperlink would be a bonus, but maybe it is not possible.

The hyperlink added in the latest patch is to the page for POSIX Regular
Expressions, which doesn't seem appropriate.



QUESTION (test cases)

Thanks for merging lots of my additional test cases!

Actually, the previous PDF I sent was 2 pages long but you only merged
the tests of page 1.
I wondered whether it was accidental to omit all those 2nd page tests.



QUESTION (function name?)

I noticed that ALL current string functions that use delimiters have
the word "split" in their name.

e.g.
* regexp_split_to_array
* regexp_split_to_table
* split_part

But "string_to_table" is not following this pattern.

Maybe a different choice of function name would be more consistent
with what is already there?
e.g.  split_to_table, string_split_to_table, etc.



Kind Regards,
Peter Smith.
Fujitsu Australia




Re: display offset along with block number in vacuum errors

2020-08-21 Thread Masahiko Sawada
On Thu, 20 Aug 2020 at 21:12, Amit Kapila  wrote:
>
> On Thu, Aug 20, 2020 at 12:32 PM Amit Kapila  wrote:
> >
> > On Thu, Aug 20, 2020 at 12:18 PM Masahiko Sawada
> >  wrote:
> > >
> > > On Thu, 20 Aug 2020 at 14:01, Amit Kapila  wrote:
> > > >
> > > > On Wed, Aug 19, 2020 at 12:54 PM Masahiko Sawada
> > > >  wrote:
> > > >
> > > > Here, we can notice that for the index, we are getting context
> > > > information but not for the heap. The reason is that in
> > > > vacuum_error_callback, we are not printing additional information for
> > > > phases VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP
> > > > when block number is invalid. If we want to cover the 'info' messages
> > > > then won't it be better if we print a message in those phases even
> > > > block number is invalid (something like 'while scanning relation
> > > > \"%s.%s\"")
> > >
> > > Yeah, there is an inconsistency. I agree to print the message even
> > > when the block number is invalid.
> > >
> >
> > Okay, I will update this and send this patch and rebased patch to
> > display offsets later today or tomorrow.
> >
>
> Attached are both the patches. The first one is to improve existing
> error context information, so I think we should back-patch to 13. The
> second one is to add additional vacuum error context information, so
> that is for only HEAD. Does that make sense? Also, let me know if you
> have any more comments.

Yes, makes sense to me.

I don't have any comments on either patch. They both look good to me.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Fix typo in procarrary.c

2020-08-21 Thread Masahiko Sawada
On Fri, 21 Aug 2020 at 12:39, Fujii Masao  wrote:
>
>
>
> On 2020/08/21 12:29, Masahiko Sawada wrote:
> > On Fri, 21 Aug 2020 at 11:18, Fujii Masao  
> > wrote:
> >>
> >>
> >>
> >> On 2020/08/21 10:58, Masahiko Sawada wrote:
> >>> Hi,
> >>>
> >>> I've attached the patch for $subject.
> >>>
> >>> s/replications lots/replication slots/
> >>
> >> Thanks for the patch!
> >>
> >> Also it's better to s/replications slots/replication slots/ ?
> >>
> >> --- a/src/backend/storage/ipc/procarray.c
> >> +++ b/src/backend/storage/ipc/procarray.c
> >> @@ -198,7 +198,7 @@ typedef struct ComputeXidHorizonsResult
> >>* be removed.
> >>*
> >>* This likely should only be needed to determine whether 
> >> pg_subtrans can
> >> -* be truncated. It currently includes the effects of replications 
> >> slots,
> >> +* be truncated. It currently includes the effects of replication 
> >> slots,
> >>* for historical reasons. But that could likely be changed.
> >>*/
> >>   TransactionId oldest_considered_running;
> >>
> >
> > Indeed. I agree with you.
>
> Thanks! So I pushed both fixes.
>

Thanks!

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Asymmetric partition-wise JOIN

2020-08-21 Thread Andrey V. Lepikhov

On 7/1/20 2:10 PM, Daniel Gustafsson wrote:

On 27 Dec 2019, at 08:34, Kohei KaiGai  wrote:



The attached v2 fixed the problem, and regression test finished correctly.


This patch no longer applies to HEAD, please submit a rebased version.
Marking the entry Waiting on Author in the meantime.

Rebased version of the patch on current master (d259afa736).

I rebased it because it is the base of my experimental feature, in which we
don't break a partitionwise join between a relation with a foreign partition
and a local relation if we have information that the remote server has a
foreign table linked to the local relation (by analogy with shippable
extensions).


Maybe mark as 'Needs review'?
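
For readers who want a concrete picture of the strategy before opening the
patch, here is a minimal sketch of the kind of query it targets (table names
are invented, and the plan shape in the comment paraphrases the commit
message below rather than real EXPLAIN output):

CREATE TABLE prt (a int, b int) PARTITION BY RANGE (a);
CREATE TABLE prt_p1 PARTITION OF prt FOR VALUES FROM (0) TO (1000);
CREATE TABLE prt_p2 PARTITION OF prt FOR VALUES FROM (1000) TO (2000);
CREATE TABLE dim (a int, payload text);   -- plain, unpartitioned table

-- With the patch the planner may additionally consider joining dim to each
-- partition of prt and appending the per-partition results, conceptually
--   Append( prt_p1 JOIN dim, prt_p2 JOIN dim )
-- instead of only joining dim to the Append over all partitions.
EXPLAIN (COSTS OFF)
SELECT * FROM prt JOIN dim USING (a);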

--
regards,
Andrey Lepikhov
Postgres Professional
>From 8dda8c4ba29ed4b2a54f66746ebedd9ab0bfded9 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Fri, 21 Aug 2020 10:38:59 +0500
Subject: [PATCH] Add one more planner strategy to JOIN with a partitioned
 relation.

Try to join the inner relation with each partition of the outer relation
and append the results.
This strategy has potential benefits because it allows a partitionwise
join with an unpartitioned relation or with a relation that is
partitioned using a different scheme.
---
 src/backend/optimizer/path/allpaths.c|   9 +-
 src/backend/optimizer/path/joinpath.c|   9 ++
 src/backend/optimizer/path/joinrels.c| 132 +
 src/backend/optimizer/plan/planner.c |   6 +-
 src/backend/optimizer/util/appendinfo.c  |  18 ++-
 src/backend/optimizer/util/relnode.c |  14 +-
 src/include/optimizer/paths.h|  10 +-
 src/test/regress/expected/partition_join.out | 145 +++
 src/test/regress/sql/partition_join.sql  |  63 
 9 files changed, 385 insertions(+), 21 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6da0dcd61c..4f110c5a2f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1278,7 +1278,7 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	}
 
 	/* Add paths to the append relation. */
-	add_paths_to_append_rel(root, rel, live_childrels);
+	add_paths_to_append_rel(root, rel, live_childrels, NIL);
 }
 
 
@@ -1295,7 +1295,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
  */
 void
 add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
-		List *live_childrels)
+		List *live_childrels,
+		List *original_partitioned_rels)
 {
 	List	   *subpaths = NIL;
 	bool		subpaths_valid = true;
@@ -1307,7 +1308,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 	List	   *all_child_pathkeys = NIL;
 	List	   *all_child_outers = NIL;
 	ListCell   *l;
-	List	   *partitioned_rels = NIL;
+	List	   *partitioned_rels = original_partitioned_rels;
 	double		partial_rows = -1;
 
 	/* If appropriate, consider parallel append */
@@ -3950,7 +3951,7 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
 	}
 
 	/* Build additional paths for this rel from child-join paths. */
-	add_paths_to_append_rel(root, rel, live_children);
+	add_paths_to_append_rel(root, rel, live_children, NIL);
 	list_free(live_children);
 }
 
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index db54a6ba2e..36464e31aa 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -324,6 +324,15 @@ add_paths_to_joinrel(PlannerInfo *root,
 	if (set_join_pathlist_hook)
 		set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
 			   jointype, &extra);
+
+	/*
+	 * 7. If the outer relation comes from partitioned tables, consider
+	 * distributing the inner relation to every partition leaf before
+	 * appending those leaves.
+	 */
+	try_asymmetric_partitionwise_join(root, joinrel,
+	  outerrel, innerrel,
+	  jointype, &extra);
 }
 
 /*
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 2d343cd293..4a7d0d0604 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
 
 #include "miscadmin.h"
 #include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
 #include "optimizer/joininfo.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
@@ -1551,6 +1552,137 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 	}
 }
 
+static List *
+extract_asymmetric_partitionwise_subjoin(PlannerInfo *root,
+		 RelOptInfo *joinrel,
+		 AppendPath *append_path,
+		 RelOptInfo *inner_rel,
+		 JoinType jointype,
+		 JoinPathExtraData *extra)
+{
+	List		*result = NIL;
+	ListCell	*lc;
+
+	foreach (lc, append_path->subpaths)
+	{
+		Path			*child_path = lfirst(lc);
+		RelOptInfo		*child_rel = child_path->parent;
+		Relids			child_join_relids;
+		RelOptInfo		*child_join_rel;
+		SpecialJoinInfo	*child_sjinfo;
+		List			*child_restrictlist;
+		AppendRelInfo