Re: Date-time extraneous fields with reserved keywords

2023-03-07 Thread Keisuke Kuroda
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:            tested, passed

Thank you for the response and new patch.

The scope of impact is limited to 'epoch' and 'infinity'.
It is also unlikely that these reserved words would be used in
combination with a time/date field, so this patch is appropriate.
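
To make the scope concrete, here is a sketch of the expected behavior
with the patch applied (the error text is illustrative; the actual
wording is whatever the patch emits):

postgres=# select timestamp 'epoch 01:01:01';
ERROR:  invalid input syntax for type timestamp: "epoch 01:01:01"
postgres=# select timestamp 'infinity 01:01:01';
ERROR:  invalid input syntax for type timestamp: "infinity 01:01:01"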

The new status of this patch is: Ready for Committer


Re: Date-time extraneous fields with reserved keywords

2023-03-02 Thread Keisuke Kuroda
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:            tested, passed

Hi Joseph,

Good catch.
For the reserved words that are special Date/Time input values,
namely 'now', 'today', 'tomorrow', 'yesterday', and 'allballs',
I get an error even before applying the patch.

One thing I noticed is that the following SQL still returns a normal
result even after applying the patch.

postgres=# select timestamp 'epoch 01:01:01';
      timestamp
---------------------
 1970-01-01 00:00:00
(1 row)

When 'epoch', 'infinity', or '-infinity' is specified together with a
time, the time given in the SQL is not reflected in the result.
I think it might be better to treat this pattern as an error as well.
What do you think?

As a side note,
reserved words such as 'today', 'tomorrow', and 'yesterday'
can be used to specify a time.

postgres=# select timestamp 'today 01:01:01';
      timestamp
---------------------
 2023-03-03 01:01:01
(1 row)

Best Regards,
Keisuke Kuroda
NTT Comware

The new status of this patch is: Waiting on Author


Re: Huge memory consumption on partitioned table with FKs

2021-03-10 Thread Keisuke Kuroda
Hi hackers,

> >
> > I did some cosmetic fooling with this (mostly, rewriting the comments
> > YA time) and pushed it.
>
> Perfect.   Thanks for your time on this.
>

Thank you for your help! I'm glad it's been resolved.

-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com




Re: simplifying foreign key/RI checks

2021-01-26 Thread Keisuke Kuroda
Hi Amit-san,

Thanks for the answer!

> If you only tested insert/update on the referencing table, I would've
> expected to see nothing in the result of that query, because the patch
> eliminates all use of SPI in that case.  I suspect the CachedPlan*
> memory contexts you are seeing belong to some early activity in the
> session.  So if you try the insert/update in a freshly started
> session, you would see 0 rows in the result of that query.

That's right.
The partitioned-table creation in the test script (rep.sql) was using SPI.
In a fresh session, I confirmed that no CachedPlan is generated when
only INSERT is executed.

# only execute INSERT

postgres=# INSERT INTO ps SELECT generate_series(1,4999);
INSERT 0 4999
postgres=#
postgres=# INSERT INTO pr SELECT i, i from generate_series(1,4999)i;
INSERT 0 4999

postgres=# SELECT name, sum(used_bytes) as bytes,
pg_size_pretty(sum(used_bytes)) FROM pg_backend_memory_contexts
WHERE name LIKE 'Cached%' GROUP BY name;

 name | bytes | pg_size_pretty
------+-------+----------------
(0 rows) ★ No CachedPlan

> Hmm, the patch tries to solve a general problem that SPI plans are not
> being shared among partitions whereas they should be.   So I don't
> think that it's necessarily specific to DELETE.  Until we have a
> solution like the patch on this thread for DELETE, it seems fine to
> consider the other patch as a stopgap solution.

I see.
So this is a solution to the general problem that SPI plans are not
shared among partitions, not something specific to DELETE.
I agree with you; I think this solves the problem at hand.

Best Regards,



-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com




Re: simplifying foreign key/RI checks

2021-01-25 Thread Keisuke Kuroda
Hi, Amit-san,

Nice patch. I have confirmed that this solves the problem in [1] with
INSERT/UPDATE.

HEAD + patch
       name       | bytes | pg_size_pretty
------------------+-------+----------------
 CachedPlanQuery  | 10280 | 10 kB
 CachedPlanSource | 14616 | 14 kB
 CachedPlan       | 13168 | 13 kB ★ 710MB -> 13kB
(3 rows)

> > This patch completely sidesteps the DELETE case, which has more insidious 
> > performance implications, but is also far less common, and whose solution 
> > will likely be very different.
>
> Yeah, we should continue looking into the ways to make referenced-side
> RI checks be less bloated.

However, as already mentioned, the memory bloat on DELETE remains.
It can be solved by the patch in [1], but applying that patch only for
DELETE seems like too much. What do you think?

[1] 
https://www.postgresql.org/message-id/flat/cab4b85d-9292-967d-adf2-be0d803c3e23%40nttcom.co.jp_1
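
As a hedged way to observe the remaining bloat (assuming the same
ps/pr tables as in the test above, with ps(c1) referenced by pr;
numbers will differ by environment):

postgres=# INSERT INTO ps VALUES (5000);   -- a key no row of pr references
postgres=# DELETE FROM ps WHERE c1 = 5000; -- fires referenced-side RI checks
postgres=# SELECT name, sum(used_bytes) as bytes,
pg_size_pretty(sum(used_bytes)) FROM pg_backend_memory_contexts
WHERE name LIKE 'Cached%' GROUP BY name;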

-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com




Re: Huge memory consumption on partitioned table with FKs

2021-01-11 Thread Keisuke Kuroda
Hi Amit-san,

> Thanks for checking.  Indeed, it should have been added to the January
> commit-fest.  I've added it to the March one:
>
> https://commitfest.postgresql.org/32/2930/

Thank you for your quick response!

-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com




Re: Huge memory consumption on partitioned table with FKs

2021-01-07 Thread Keisuke Kuroda
Hi Amit-san,

I noticed that this patch
(v3-0001-ri_triggers.c-Use-root-constraint-OID-as-key-to-r.patch)
is not registered in the commitfest. I think it needs to be registered
before it can be committed. Is that correct?

I have confirmed that this patch applies to HEAD (master)
and that check-world passes.

Best Regards,

-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com




Re: Huge memory consumption on partitioned table with FKs

2020-12-03 Thread Keisuke Kuroda
Hi Amit,

> I have attached a patch in which I've tried to merge the ideas from
> both my patch and Kuroda-san's.  I liked that his patch added
> conparentid to RI_ConstraintInfo because that saves a needless
> syscache lookup for constraints that don't have a parent.  I've kept
> my idea to compute the root constraint id only once in
> ri_LoadConstraint(), not on every invocation of ri_BuildQueryKey().
> Kuroda-san, anything you'd like to add to that?

Thank you for the merge! It looks good to me.
The fix for InvalidateConstraintCacheCallBack() also looks good.

I also confirmed that the patch passes make check-world.

Best Regards,
-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com




Re: Huge memory consumption on partitioned table with FKs

2020-12-02 Thread Keisuke Kuroda
Hi Hackers,

>> I would embed all this knowledge in ri_BuildQueryKey though, without
>> adding the new function ri_GetParentConstOid.  I don't think that
>> function adds meaningful abstraction value, and instead it would make
>> what I suggest more difficult.

> It seems to me reasonable.

Indeed. I tried getting the conparentid within ri_BuildQueryKey
(v2_reduce_ri_SPI-plan-hash.patch).

> Somewhat of a detour, but in reviewing the patch for
> Statement-Level RI checks, Andres and I observed that SPI
> made for a large portion of the RI overhead.

If this can be resolved by reworking the statement-level RI checks,
then I think that would be better.

> BTW, one problem with Kuroda-san's patch is that it's using
> conparentid as the shared key, which can still lead to multiple
> queries being generated when using multi-level partitioning, because
> there would be many (intermediate) parent constraints in that case.

As Horiguchi-san also mentioned, my patch iterates in
ri_GetParentConstOid, fetching the constraint OID until
conparentid == InvalidOid.
This should allow it to get the "root constraint OID".

However, the names "conparentid" and "ri_GetParentConstOid"
are confusing. We should use names like "constraint_root_id",
as in Amit-san's patch.
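
To illustrate the lookup the patch does in C, here is a hedged SQL
equivalent of walking conparentid up to the root (the constraint name
'pr_c1_fkey' is only an assumed example):

WITH RECURSIVE chain AS (
    SELECT oid, conparentid
    FROM pg_constraint
    WHERE conname = 'pr_c1_fkey'   -- assumed partition-level FK clone
  UNION ALL
    SELECT c.oid, c.conparentid
    FROM pg_constraint c
    JOIN chain ON c.oid = chain.conparentid
)
SELECT oid AS constraint_root_id
FROM chain
WHERE conparentid = 0;             -- conparentid = InvalidOid marks the root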

Best Regards,
-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com


v2_reduce_ri_SPI-plan-hash.patch
Description: Binary data


Re: Huge memory consumption on partitioned table with FKs

2020-11-25 Thread Keisuke Kuroda
Hi Hackers,

I analyzed the problem and created a patch to resolve it.

# Problem 1

When you create a foreign key on a partitioned table, a referential
integrity check query is created for each partition. Internally,
SPI_prepare() creates a plan and SPI_keepplan() caches it.

The more partitions the referencing table has, the more plans are
cached.

# Problem 2

When the referenced table is partitioned, the larger the number of
partitions, the larger the plan that gets cached.

The plan actually executed is simple and small when pruning is
enabled. However, the cached plan includes all partition
information.

The more partitions the referenced table has, the larger the cached
plan.
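
A hedged way to observe Problem 2 directly, assuming the referenced
table is a partitioned table ps with key column c1 as in the results
below (plan_cache_mode requires PG12 or later):

SET plan_cache_mode = force_generic_plan;
PREPARE ri_check (int) AS
    SELECT 1 FROM ps x WHERE c1 = $1 FOR KEY SHARE OF x;
EXPLAIN EXECUTE ri_check (1);
-- The generic plan is built over every partition of ps; run-time
-- pruning trims them at execution ("Subplans Removed"), but the
-- cached plan itself keeps all of them.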

# Idea for solution

Constraints that share the same root constraint (reachable via
pg_constraint.conparentid) can be combined into one plan, if we can
guarantee that all their contents are the same.

This solves Problem 1 and avoids significant memory bloat:
CachedPlan: 710MB -> 1466kB

Solving Problem 2 could reduce memory further,
but I don't have a good idea for it yet.

Currently, DISCARD ALL does not discard CachedPlans created through
SPI, as in this case. It might be possible to modify DISCARD ALL to
discard them and to run it periodically, but we think that would put
too much burden on the user.
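
A hedged illustration of that limitation, run in the same bloated
session as the results below:

postgres=# DISCARD ALL;
postgres=# SELECT name, pg_size_pretty(sum(used_bytes))
FROM pg_backend_memory_contexts
WHERE name = 'CachedPlan' GROUP BY name;

The RI plans kept by SPI_keepplan() still show up here after
DISCARD ALL.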

# result with patch (PG14 HEAD (e522024b) + patch)

       name       |  bytes  | pg_size_pretty
------------------+---------+----------------
 CachedPlanQuery  |   12912 | 13 kB
 CachedPlanSource |   17448 | 17 kB
 CachedPlan       | 1501192 | 1466 kB

CachedPlan * 1 Record

postgres=# SELECT count(*) FROM pg_backend_memory_contexts WHERE name
= 'CachedPlan' AND ident LIKE 'SELECT 1 FROM%';
 count
-------
     1

postgres=# SELECT * FROM pg_backend_memory_contexts WHERE name =
'CachedPlan' AND ident LIKE 'SELECT 1 FROM%';
-[ RECORD 1 ]-+-----------------------------------------------
name          | CachedPlan
ident         | SELECT 1 FROM "public"."ps" x WHERE "c1" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
parent        | CacheMemoryContext
level         | 2
total_bytes   | 2101248
total_nblocks | 12
free_bytes    | 613256
free_chunks   | 1
used_bytes    | 1487992
(1 record)

# result without patch (PG14 HEAD (e522024b))

       name       |   bytes   | pg_size_pretty
------------------+-----------+----------------
 CachedPlanQuery  |   1326280 | 1295 kB
 CachedPlanSource |   1474528 | 1440 kB
 CachedPlan       | 744009200 | 710 MB

CachedPlan * 500 Records

postgres=# SELECT count(*) FROM pg_backend_memory_contexts WHERE name
= 'CachedPlan' AND ident LIKE 'SELECT 1 FROM%';
 count
-------
   500

SELECT * FROM pg_backend_memory_contexts WHERE name = 'CachedPlan' AND
ident LIKE 'SELECT 1 FROM%';
-[ RECORD 1 ]-+-----------------------------------------------
name          | CachedPlan
ident         | SELECT 1 FROM "public"."ps" x WHERE "c1" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
parent        | CacheMemoryContext
level         | 2
total_bytes   | 2101248
total_nblocks | 12
free_bytes    | 613256
free_chunks   | 1
used_bytes    | 1487992
...(500 same records)

Best Regards,
-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com


v1_reduce_ri_SPI-plan-hash.patch
Description: Binary data


Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables

2020-10-15 Thread Keisuke Kuroda
Hi Amit, Dilip,

I am glad this problem has been solved. Thanks!

Best Regards,
-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com




Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables

2020-10-08 Thread Keisuke Kuroda
Hi Dilip,

> I could not see this issue even without the patch, it is taking less
> than 1s even without the patch.  See below results
>
> 2020-10-08 13:00:49 BEGIN 509
> 2020-10-08 13:00:49 table nsp_001.part_0001: INSERT:...
> 2020-10-08 13:00:49 COMMIT 509 (at 2020-10-08 13:00:48.741986+05:30)
>
> Am I missing something?

Thanks for running the tests.
It is the TRUNCATE decoding that takes time.
INSERT decoding is fast, even before the patch is applied.

2020-10-02 14:55:48 BEGIN 508
2020-10-02 14:55:48 table nsp_001.part_0001: INSERT ...
2020-10-02 14:55:49 COMMIT 508 (at 2020-10-02 14:55:48.744019+09)

However, TRUNCATE decoding is slow
and takes 222s in my environment.

2020-10-02 14:55:50 BEGIN 509
2020-10-02 14:59:42 table nsp_001.tbl_001 ... nsp_001.part_1000:
TRUNCATE: (no-flags)
2020-10-02 14:59:42 COMMIT 509 (at 2020-10-02 14:55:50.349219+09)

The script waits 10 seconds after the INSERT finishes before
executing the TRUNCATE, so please let it run to completion.

When the TRUNCATE completes, the walsender process
should be at 100% CPU.

Best Regards,
-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com




Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables

2020-10-02 Thread Keisuke Kuroda
Hi Dilip, Amit,

> > 5. Can you please once repeat the performance test done by Keisuke-San
> > to see if you have similar observations? Additionally, see if you are
> > also seeing the inconsistency related to the Truncate message reported
> > by him and if so why?
> >
>
> Okay, I will set up and do this test early next week.  Keisuke-San,
> can you send me your complete test script?

Yes, I've attached the test script (test_pg_recvlogical.sh).

Sorry, the issue with TRUNCATE not being output was due to a build
mistake on my part. Even before the patch, TRUNCATE is decoded and
output correctly, so please check only the performance.

I have tested it again and will share the results with you.

Also, a palloc call still had MemoryContextAlloc-style arguments,
which prevented the patch from applying, so I've fixed only that part.

# test script

Please set PGHOME and CLUSTER_PUB before running.

sh test_pg_recvlogical.sh

# perf command

perf record --call-graph dwarf -p [walsender pid]
perf report -i perf.data --no-children

# before patch

decode + invalidation = 222s

2020-10-02 14:55:50 BEGIN 509
2020-10-02 14:59:42 table nsp_001.tbl_001, nsp_001.part_0001 ...
nsp_001.part_0999, nsp_001.part_1000: TRUNCATE: (no-flags)
2020-10-02 14:59:42 COMMIT 509 (at 2020-10-02 14:55:50.349219+09)

Samples: 229K of event 'cpu-clock:uhH', Event count (approx.): 5734725
  Overhead  Command   Shared Object  Symbol
-   96.07%  postgres  postgres   [.] hash_seq_search
 hash_seq_search
   - RelfilenodeMapInvalidateCallback
  - LocalExecuteInvalidationMessage
ReorderBufferExecuteInvalidations
ReorderBufferProcessTXN
ReorderBufferCommit
DecodeCommit
DecodeXactOp
LogicalDecodingProcessRecord
XLogSendLogical
WalSndLoop
StartLogicalReplication
exec_replication_command
PostgresMain
BackendRun
BackendStartup
ServerLoop
PostmasterMain
main
__libc_start_main
_start
 0.74%  postgres  postgres   [.] LocalExecuteInvalidationMessage
 0.47%  postgres  postgres   [.] hash_bytes_uint32
 0.46%  postgres  postgres   [.] hash_search_with_hash_value
 0.37%  postgres  postgres   [.] RelfilenodeMapInvalidateCallback
 0.31%  postgres  postgres   [.] CatCacheInvalidate
 0.22%  postgres  postgres   [.] uint32_hash
 0.21%  postgres  postgres   [.] PlanCacheRelCallback
 0.20%  postgres  postgres   [.] hash_seq_init
 0.17%  postgres  postgres   [.] ReorderBufferExecuteInvalidations

# after patch

decode + invalidation = 1s or less

2020-10-02 15:20:08 BEGIN 509
2020-10-02 15:20:08 table nsp_001.tbl_001, nsp_001.part_0001 ...
nsp_001.part_0999, nsp_001.part_1000: TRUNCATE: (no-flags)
2020-10-02 15:20:08 COMMIT 509 (at 2020-10-02 15:20:08.704503+09)

Samples: 310  of event 'cpu-clock:uhH', Event count (approx.): 7750
  Overhead  Command   Shared Object   Symbol
-   49.35%  postgres  postgres[.] hash_seq_search
 hash_seq_search
 RelfilenodeMapInvalidateCallback
 LocalExecuteInvalidationMessage
 ReorderBufferExecuteInvalidations
 ReorderBufferProcessTXN
 ReorderBufferCommit
 DecodeCommit
 DecodeXactOp
 LogicalDecodingProcessRecord
 XLogSendLogical
 WalSndLoop
 StartLogicalReplication
 exec_replication_command
 PostgresMain
 BackendRun
 BackendStartup
 ServerLoop
 PostmasterMain
 main
 __libc_start_main
 _start
2.58%  postgres  libc-2.17.so[.] __memset_sse2
2.58%  postgres  libpthread-2.17.so  [.] __libc_recv
1.61%  postgres  libc-2.17.so[.] __memcpy_ssse3_back
1.61%  postgres  libpthread-2.17.so  [.] __errno_location
1.61%  postgres  postgres[.] AllocSetAlloc
1.61%  postgres  postgres[.] DecodeXLogRecord
1.29%  postgres  postgres[.] AllocSetFree
1.29%  postgres  postgres[.] hash_bytes_uint32
0.97%  postgres  [vdso]  [.] __vdso_gettimeofday
0.97%  postgres  postgres[.] LocalExecuteInvalidationMessage

Best Regards,
-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com
#!/bin/bash

PGHOME="/home/postgres/pgsql/master_20201002_1"
CLUSTER_PUB="/home/postgres/pgsql/master_20201002_1/data_pub"

${PGHOME}/bin/pg_ctl -D  ${CLUSTER_PUB} stop
rm -rf ${CLUSTER_PUB}
${PGHOME}/bin/initdb -D ${CLUSTER_PUB} --no-locale -E UTF8 -k

## Configuration
echo "shared_buffers = 1GB"  >> ${CLUSTER_PUB}/postgresql.conf
echo "max_locks_per_transaction = 1" >> ${CLUSTER_PUB}/postgresql.conf
echo "synchronous_commit = off"  >> ${CLUSTER_PUB}/postgresql.conf
echo "archive_mode = off"  >> ${CLUSTER_PUB}/postgresql.conf
echo "fsync = off" >> ${CLUSTER_PUB}/post

Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables

2020-10-01 Thread Keisuke Kuroda
> > I test the patch on the master HEAD(9796f455) and it works fine.
> > * make installcheck-world: passed
> > * walsender process continues to use 100% of the CPU 1core when
> > TRUNCATE 1000 partition: 1s or less
> > ** before patch : 230s
>
> Does this result indicate that it is still CPU bound but it does the
> actual decoding and completes in 1s instead of spending 230s mainly to
> execute unnecessary invalidations?

Okay, to check the decoding and invalidation times, I measured the
time taken to return results to pg_recvlogical and collected perf
profiles.

Before the patch, most of the processing was done
by hash_seq_search in RelfilenodeMapInvalidateCallback.
After the patch, this problem is resolved.

# test

(1) start pg_recvlogical
(2) INSERT into 2000 partitions to populate RelfilenodeMapHash
(3) TRUNCATE 1000 partitions

(1)
pg_recvlogical --create-slot --start -f - --if-not-exists \
--plugin=test_decoding --slot=test1 --dbname=postgres --username=postgres \
--option=include-timestamp | gawk '{ print strftime("%Y-%m-%d
%H:%M:%S"), $0; fflush(); }'

(2)
DO $$
DECLARE
i INT;
j INT;
schema TEXT;
tablename TEXT;
BEGIN
FOR i IN 1 .. 2 LOOP
schema := 'nsp_' || to_char(i, 'FM000');
tablename := 'tbl_' || to_char(i, 'FM000');
EXECUTE 'INSERT INTO ' || schema || '.' || tablename || '
 SELECT i,i,i,0,0,0,0,0,0,0,0,0,''a''::bytea,''a''::bytea FROM
generate_series(0,,1) AS i';
END LOOP;
END;
$$ LANGUAGE plpgsql;

(3)
TRUNCATE TABLE nsp_001.tbl_001;

# before

(3) TRUNCATE
decode + invalidation = 222s

2020-10-01 16:58:29 BEGIN 529
2020-10-01 17:02:11 COMMIT 529 (at 2020-10-01 16:58:29.281747+09)

(Before the patch was applied, I was concerned that the
"table nsp_001.tbl_001 ... TRUNCATE" line was not being output.)

Samples: 228K of event 'cpu-clock:uhH', Event count (approx.): 5708075
  Overhead  Command   Shared Object   Symbol
-   96.00%  postgres  postgres[.] hash_seq_search
 hash_seq_search
 - RelfilenodeMapInvalidateCallback
  - LocalExecuteInvalidationMessage
ReorderBufferExecuteInvalidations
ReorderBufferProcessTXN
ReorderBufferCommit
DecodeCommit
DecodeXactOp
LogicalDecodingProcessRecord
XLogSendLogical
WalSndLoop
StartLogicalReplication
exec_replication_command
PostgresMain
BackendRun
BackendStartup
ServerLoop
PostmasterMain
main
__libc_start_main
_start
+0.77%  postgres  postgres[.] LocalExecuteInvalidationMessage
 0.47%  postgres  postgres[.] hash_bytes_uint32
 0.42%  postgres  postgres[.] hash_search_with_hash_value
 0.37%  postgres  postgres[.] RelfilenodeMapInvalidateCallback
 0.32%  postgres  postgres[.] CatCacheInvalidate
 0.23%  postgres  postgres[.] PlanCacheRelCallback
 0.23%  postgres  postgres[.] ReorderBufferExecuteInvalidations

# after

(3) TRUNCATE
decode + invalidation = 1s or less

2020-10-01 17:09:43 BEGIN 537
2020-10-01 17:09:43 table nsp_001.tbl_001, nsp_001.part_0001,
nsp_001.part_0002 ... nsp_001.part_0999, nsp_001.part_1000: TRUNCATE:
(no-flags)
2020-10-01 17:09:43 COMMIT 537 (at 2020-10-01 17:09:43.192989+09)

Samples: 337  of event 'cpu-clock:uhH', Event count (approx.): 8425
  Overhead  Command   Shared Object   Symbol
-   47.48%  postgres  postgres[.] hash_seq_search
 hash_seq_search
 RelfilenodeMapInvalidateCallback
 LocalExecuteInvalidationMessage
 ReorderBufferExecuteInvalidations
 ReorderBufferProcessTXN
 ReorderBufferCommit
 DecodeCommit
 DecodeXactOp
 LogicalDecodingProcessRecord
 XLogSendLogical
 WalSndLoop
 StartLogicalReplication
 exec_replication_command
 PostgresMain
 BackendRun
 BackendStartup
 ServerLoop
 PostmasterMain
 main
 __libc_start_main
 _start
+3.26%  postgres  postgres[.] heap_hot_search_buffer
+2.67%  postgres  postgres[.] pg_comp_crc32c_sse42
+2.08%  postgres  libc-2.17.so[.] __memcpy_ssse3_back
+2.08%  postgres  postgres[.] DecodeXLogRecord
+2.08%  postgres  postgres[.] hash_search_with_hash_value
+1.48%  postgres  libpthread-2.17.so  [.] __libc_recv
+1.19%  postgres  libc-2.17.so[.] __memset_sse2
+1.19%  postgres  libpthread-2.17.so  [.] __errno_location
+1.19%  postgres  postgres[.] LocalExecuteInvalidationMessage
+1.19%  postgres  postgres[.] ReadPageInternal
+1.19%  postgres  postgres[.] XLogReadRecord
+1.19%  postgres  postgres[.] socket_is_send_pending
 hash_seq_search

Best Regards,
-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com




Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables

2020-09-30 Thread Keisuke Kuroda
Hi Dilip, Amit,

Thank you for the patch!

I tested the patch on master HEAD (9796f455) and it works fine.
* make installcheck-world: passed
* walsender process stays at 100% of one CPU core when truncating
1000 partitions: for 1s or less
** before the patch: 230s

There is "ReorderBufferAddInvalidation" in reorderbuffer.h, but I
don't think it's needed.

src/include/replication/reorderbuffer.h
+void ReorderBufferAddInvalidation(ReorderBuffer *, TransactionId,
XLogRecPtr lsn,
+ int nmsgs, SharedInvalidationMessage *msgs);

If possible, I'd like to fix this in PG13 and earlier as well,
but I recognize that it's difficult because the fix relies on a
mechanism that exists only in PG14 or later...

Best Regards,
-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com




Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables

2020-09-29 Thread Keisuke Kuroda
Hi Amit,

> I don't have a patch for this idea but you can refer [1]
> (v25-0002-Issue-individual-invalidations-with-wal_level-lo) to see
> what I was trying to say about registering the invalidations in the
> form of ReorderBufferChange. We have something along the lines of what
> I described above in that patch but we have removed it because we need
> all the invalidations to be accumulated to handle aborts and we don't
> want two different mechanisms to store invalidations.

Thanks, I read the patch
(v25-0002-Issue-individual-invalidations-with-wal_level-lo)
and the review.

I understand the following.
* In v25-0002, there were two different mechanisms,
  XLOG_XACT_INVALIDATIONS (ReorderBufferAddInvalidation) and
  XLOG_INVALIDATIONS (ReorderBufferAddInvalidations).
* After a review, XLOG_XACT_INVALIDATIONS was implemented to
  generate all invalidation messages.

I'm going to write a patch along the following lines:
* Collect the invalidations from XLOG_XACT_INVALIDATIONS in the
  form of a ReorderBufferChange.
* Execute the invalidations in the REORDER_BUFFER_CHANGE_INVALIDATION
  case.

Best Regards,
-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com




Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables

2020-09-27 Thread Keisuke Kuroda
Hi Amit,

> You need to refer to XLOG_XACT_INVALIDATIONS, not XLOG_INVALIDATIONS.

Thank you, that's right. XLOG_INVALIDATIONS is added at COMMIT,
so I need to refer to XLOG_XACT_INVALIDATIONS.

> Okay. Feel free to clarify your questions if you have any? Are you
> interested in writing a patch for the same?

Thank you! Yes, I would like to write a patch.
If there is already a thread or patch discussing this idea,
please let me know.

Best Regards,
-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com




Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables

2020-09-27 Thread Keisuke Kuroda
Hi Amit,

Thank you for the reply!

> However, after commit c55040ccd0 (When wal_level=logical,
> write invalidations at command end into WAL so that decoding can use
> this information.) we actually know exactly when we need to execute
> each invalidation.

I see, thank you for your explanation.
I'll try to think about the solution by using XLOG_INVALIDATIONS
and referring to the thread
"PATCH: logical_work_mem and logical streaming of large in-progress
transactions".

Best Regards,
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com




Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables

2020-09-23 Thread Keisuke Kuroda
(command_id < change->data.command_id)
{
    command_id = change->data.command_id;

    if (!snapshot_now->copied)
    {
        /* we don't use the global one anymore */
        snapshot_now = ReorderBufferCopySnap(rb, snapshot_now,
                                             txn, command_id);
    }

    snapshot_now->curcid = command_id;

    TeardownHistoricSnapshot(false);
    SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);

    /*
     * Every time the CommandId is incremented, we could
     * see new catalog contents, so execute all
     * invalidations.
     */
    ReorderBufferExecuteInvalidations(rb, txn);
}

break;

Do you have any solutions?

-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com
#!/bin/bash

PGHOME="/usr/pgsql-12"
CLUSTER_PUB="/home/postgres/relfilenode/data_pub"
CLUSTER_SUB="/home/postgres/relfilenode/data_sub"

${PGHOME}/bin/pg_ctl -D  ${CLUSTER_PUB} stop
${PGHOME}/bin/pg_ctl -D  ${CLUSTER_SUB} stop
rm -rf ${CLUSTER_PUB}
rm -rf ${CLUSTER_SUB}
${PGHOME}/bin/initdb -D ${CLUSTER_PUB} --no-locale -E UTF8 -k

## Configuration
echo "shared_buffers = 1GB"  >> ${CLUSTER_PUB}/postgresql.conf
echo "max_locks_per_transaction = 1" >> ${CLUSTER_PUB}/postgresql.conf
echo "synchronous_commit = off"  >> ${CLUSTER_PUB}/postgresql.conf
echo "archive_mode = off"  >> ${CLUSTER_PUB}/postgresql.conf
echo "fsync = off" >> ${CLUSTER_PUB}/postgresql.conf
echo "full_page_writes = off"  >> ${CLUSTER_PUB}/postgresql.conf
echo "autovacuum = off" >> ${CLUSTER_PUB}/postgresql.conf
echo "checkpoint_timeout = 120min" >> ${CLUSTER_PUB}/postgresql.conf
echo "max_wal_size = 20GB" >> ${CLUSTER_PUB}/postgresql.conf

echo "wal_level = logical" >> ${CLUSTER_PUB}/postgresql.conf
echo "max_wal_senders = 10" >> ${CLUSTER_PUB}/postgresql.conf
echo "wal_sender_timeout = 0" >> ${CLUSTER_PUB}/postgresql.conf
echo "wal_receiver_timeout = 0" >> ${CLUSTER_PUB}/postgresql.conf

echo "port = 5432" >> ${CLUSTER_PUB}/postgresql.conf

cp -p -r ${CLUSTER_PUB} ${CLUSTER_SUB}
echo "port = 6432" >> ${CLUSTER_SUB}/postgresql.conf

${PGHOME}/bin/pg_ctl -D ${CLUSTER_PUB} start

## create partition
${PGHOME}/bin/psql -p 5432 << "EOF"
DO $$
DECLARE
i INT;
j INT;
schema TEXT;
tablename TEXT;
partname TEXT;
BEGIN
FOR i IN 1 .. 2 LOOP
schema := 'nsp_' || to_char(i, 'FM000');
tablename := 'tbl_' || to_char(i, 'FM000');
EXECUTE 'CREATE SCHEMA ' || schema;
EXECUTE 'CREATE TABLE ' || schema || '.' || tablename || ' (
big01 BIGINT,
big02 BIGINT,
big03 BIGINT,
big04 BIGINT,
big05 BIGINT,
big06 BIGINT,
big07 BIGINT,
big08 BIGINT,
big09 BIGINT,
big10 BIGINT,
big11 BIGINT,
big12 BIGINT,
byt01 BYTEA,
byt02 BYTEA
) PARTITION BY RANGE (big02)';
FOR j IN 1 .. 1000 LOOP
partname := 'part_' || to_char(j, 'FM');
EXECUTE 'CREATE TABLE ' || schema || '.' || partname ||
' PARTITION OF ' || schema || '.' || tablename ||
' FOR VALUES FROM (' || (j-1)*10 || ') TO (' || (j)*10 || ')';
EXECUTE 'ALTER TABLE ' || schema || '.' || partname || ' ADD PRIMARY KEY (big01)';
END LOOP;
EXECUTE 'CREATE INDEX ON ' || schema || '.' || tablename || ' (big03)';
END LOOP;
END;
$$ LANGUAGE plpgsql;
EOF

${PGHOME}/bin/psql -p 5432 << "EOF"
CREATE TABLE test(id int primary key);
CREATE PUBLICATION pubdb_test FOR TABLE public.test;
EOF

${PGHOME}/bin/pg_ctl -D ${CLUSTER_SUB} start

${PGHOME}/bin/psql -p 6432 << "EOF"
CREATE TABLE test(id int primary key);
CREATE SUBSCRIPTION subdb_test CONNECTION 'host=localhost dbname=postgres' PUBLICATION pubdb_test WITH (copy_data = false);
EOF

# insert data(create relfilenodemaphash)
${PGHOME}/bin/psql -p 5432 << "EOF"
DO $$
DECLARE
i INT;
j INT;
schema TEXT;
tablename TEXT;
BEGIN
FOR i IN 1 .. 2 LOOP
schema := 'nsp_' || to_char(i, 'FM000');
tablename := 'tbl_' || to_char(i, 'FM000');
EXECUTE 'INSERT INTO ' || schema || '.' || tablename || '
 SELECT i,i,i,0,0,0,0,0,0,0,0,0,''a''::bytea,''a''::bytea FROM generate_series(0,,1) AS i';
END LOOP;
END;
$$ LANGUAGE plpgsql;
EOF

${PGHOME}/bin/psql -p 5432 << "EOF"
TRUNCATE TABLE nsp_001.tbl_001;
EOF

echo "---walsender process info---"

ps -aux | grep walsender | grep -v grep


Re: control max length of parameter values logged

2020-03-17 Thread keisuke kuroda
Hi,

BIND parameter truncation looks good to me.
Logs can become very hard to read when
very long parameters are recorded.

Currently, the parameter is truncated keeping the left end,
e.g. "----" to "--...".

Wouldn't keeping the right end also be useful?
e.g. "----" to "...--"

It is sometimes nice to be able to check
the end of the strings, for example when many
parameters differ only in their last characters.
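
For illustration, a hedged sketch using the GUC name as it was later
committed (log_parameter_max_length; the patch under review may use a
different name), with abbreviated, illustrative log lines:

SET log_parameter_max_length = 8;
-- current behavior keeps the head of the value, roughly:
--   DETAIL:  parameters: $1 = 'abcdefgh...'
-- the suggestion above would keep the tail instead, hypothetically:
--   DETAIL:  parameters: $1 = '...stuvwxyz'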

Best Regards,
Keisuke Kuroda




Re: Exposure related to GUC value of ssl_passphrase_command

2020-03-05 Thread keisuke kuroda
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:            not tested

I tested the patch on the master branch (addd034) and it works fine.

I think the test case verifying that a non-superuser can't see this
parameter is unnecessary, since there is already a similar test for
the pg_read_all_settings role.
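
For context, a hedged sketch of the behavior under test (the role
name is made up; the exact error text depends on the server version):

postgres=# CREATE ROLE ordinary_user LOGIN;
postgres=# SET ROLE ordinary_user;
postgres=> SHOW ssl_passphrase_command;
ERROR:  must be superuser or a member of pg_read_all_settings to
examine "ssl_passphrase_command"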

The new status of this patch is: Ready for Committer


Re: In PG12, query with float calculations is slower than PG11

2020-02-13 Thread keisuke kuroda
Thank you very much everyone.

The improvement was confirmed even when PG12_STABLE was built with gcc 4.8.5.

* PG_12_STABLE
* gcc 4.8.5

postgres=# EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
 select (2 * a) , (2 * b) , (2 * c), (2 * d),  (2 * e)
 from realtest;

                                 QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on public.realtest  (cost=0.00..288692.14 rows=873 width=40)
(actual time=0.012..4118.432 rows=1001 loops=1)
   Output: ('2'::double precision * a), ('2'::double precision * b),
('2'::double precision * c), ('2'::double precision * d), ('2'::double
precision * e)
   Buffers: shared hit=63695
 Planning Time: 0.034 ms
 Execution Time: 4811.957 ms
(5 rows)

  32.03%  postgres  postgres   [.] ExecInterpExpr
  12.28%  postgres  postgres   [.] float84mul
   9.62%  postgres  [vdso] [.] __vdso_clock_gettime
   6.45%  postgres  libc-2.17.so   [.] __isinf
   5.15%  postgres  postgres   [.] tts_buffer_heap_getsomeattrs
   3.83%  postgres  postgres   [.] ExecScan

Best Regards,
Keisuke Kuroda

On Fri, Feb 14, 2020 at 13:29, Amit Langote wrote:

> On Fri, Feb 14, 2020 at 3:47 AM Andres Freund  wrote:
> > On 2020-02-13 13:40:43 -0500, Tom Lane wrote:
> > > ... and pushed.  One other change I made beyond those suggested
> > > was to push the zero-divide ereport's out-of-line as well.
> >
> > Thanks!
>
> Thank you all.
>
> I repeated some of the tests I did earlier and things look good.
>
> gcc-8
> =
>
> HEAD
>
> latency average = 296.842 ms
>
> 42.05%  postgres  postgres   [.] ExecInterpExpr
> 15.14%  postgres  postgres   [.] float8_accum
>  9.32%  postgres  libc-2.17.so   [.] __isinf
>  7.32%  postgres  postgres   [.] dsqrt
>  5.67%  postgres  postgres   [.] float8mul
>  4.20%  postgres  postgres   [.] ftod
>
> 11.7
>
> latency average = 289.439 ms
>
> 41.52%  postgres  postgres   [.] ExecInterpExpr
> 13.59%  postgres  libc-2.17.so   [.] __isinf
> 10.98%  postgres  postgres   [.] float8_accum
>  8.26%  postgres  postgres   [.] dsqrt
>  6.17%  postgres  postgres   [.] float8mul
>  3.65%  postgres  postgres   [.] ftod
>
> clang-7
> ===
>
> HEAD
>
> latency average = 233.735 ms
>
> 43.84%  postgres  postgres   [.] ExecInterpExpr
> 15.17%  postgres  postgres   [.] float8_accum
>  8.25%  postgres  postgres   [.] dsqrt
>  7.35%  postgres  postgres   [.] float8mul
>  5.84%  postgres  postgres   [.] ftod
>  3.78%  postgres  postgres   [.] tts_buffer_heap_getsomeattrs
>
> 11.7
>
> latency average = 221.009 ms
>
> 49.55%  postgres  postgres   [.] ExecInterpExpr
> 12.05%  postgres  postgres   [.] float8_accum
>  8.97%  postgres  postgres   [.] dsqrt
>  6.72%  postgres  postgres   [.] float8mul
>  5.62%  postgres  postgres   [.] ftod
>  2.18%  postgres  postgres   [.] slot_deform_tuple
>
> HEAD and PG 11 are now comparable even when built with gcc.
>
> Regards,
> Amit
>


Re: In PG12, query with float calculations is slower than PG11

2020-02-06 Thread keisuke kuroda
Hi,

I have been testing with a newer compiler (clang-7),
and the result is a bit different, at least with clang-7.
Compiling PG 12.1 (even without the patch) with clang-7
results in __isinf() no longer being a bottleneck;
it does not show up in the profiler at all.

So there is no issue for people who use a modern clang toolchain,
but that may not be everyone.
So there would still be some interest in doing something about this.

* clang

bash-4.2$ which clang
/opt/rh/llvm-toolset-7.0/root/usr/bin/clang

bash-4.2$ clang -v
clang version 7.0.1 (tags/RELEASE_701/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rh/llvm-toolset-7.0/root/usr/bin
Found candidate GCC installation:
/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7
Found candidate GCC installation:
/opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.2
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.5
Selected GCC installation:
/opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64

** pg_config

---
CONFIGURE = '--prefix=/var/lib/pgsql/pgsql/12.1'
'CC=/opt/rh/llvm-toolset-7.0/root/usr/bin/clang'
'PKG_CONFIG_PATH=/opt/rh/llvm-toolset-7.0/root/usr/lib64/pkgconfig'
CC = /opt/rh/llvm-toolset-7.0/root/usr/bin/clang
---

* result (PostgreSQL 12.1, even without the patch)

postgres=# EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
 select (2 * a) , (2 * b) , (2 * c), (2 * d),  (2 * e)
 from realtest;

QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on public.realtest  (cost=0.00..288697.59 rows=1115 width=40)
(actual time=0.012..3878.284 rows=1001 loops=1)
   Output: ('2'::double precision * a), ('2'::double precision * b),
('2'::double precision * c), ('2'::double precision * d), ('2'::double
precision * e)
   Buffers: shared hit=63695
 Planning Time: 0.038 ms
 Execution Time: 4533.767 ms
(5 rows)

Samples: 5K of event 'cpu-clock', Event count (approx.): 127500
Overhead  Command   Shared Object  Symbol
  33.92%  postgres  postgres   [.] ExecInterpExpr
  13.27%  postgres  postgres   [.] float84mul
  10.86%  postgres  [vdso] [.] __vdso_clock_gettime
   5.49%  postgres  postgres   [.] tts_buffer_heap_getsomeattrs
   3.96%  postgres  postgres   [.] ExecScan
   3.25%  postgres  libc-2.17.so   [.] __clock_gettime
   3.16%  postgres  postgres   [.] heap_getnextslot
   2.41%  postgres  postgres   [.] tts_virtual_clear
   2.39%  postgres  postgres   [.] SeqNext
   2.22%  postgres  postgres   [.] InstrStopNode

Best Regards,
Keisuke Kuroda

On Fri, Feb 7, 2020 at 3:48, Andres Freund wrote:

> Hi,
>
> On 2020-02-06 11:03:51 -0500, Tom Lane wrote:
> > Andres seems to be of the opinion that the compiler should be willing
> > to ignore the semantic requirements of the C standard in order
> > to rearrange the code back into the cheaper order.  That sounds like
> > wishful thinking to me ... even if it actually works on his compiler,
> > it certainly isn't going to work for everyone.
>
> Sorry, but, uh, what are you talking about?  Please tell me which single
> standards violation I'm advocating for?
>
> I was asking about the inlining bit because the first email of the topic
> explained that as the problem, which I don't believe can be the full
> explanation - and it turns out it isn't. As Amit Langote's followup
> email explained, there's the whole issue of the order of checks being
> inverted - which is clearly bad. And wholly unrelated to inlining.
>
> And I asked about __isinf() being used because there are issues with
> accidentally ending up with the non-intrinsic version of isinf() when
> not using gcc, due to badly written standard library headers.
>
>
> > The patch looks unduly invasive to me, but I think that it might be
> > right that we should go back to a macro-based implementation, because
> > otherwise we don't have a good way to be certain that the function
> > parameter won't get evaluated first.
>
> I'd first like to see some actual evidence of this being a problem,
> rather than just the order of the checks.
>
>
> > (Another reason to do so is so that the file/line numbers generated
> > for the error reports go back to being at least a little bit useful.)
> > We could use local variables within the macro to avoid double evals,
> > if anyone thinks that's actually important --- I don't.
>
> I don't think that's necessarily a good idea. In fact, I think we should
> probably do the exact opposite, and move the error messages further out
> of line. All these otherwise very small functions having their own
> 

In PG12, query with float calculations is slower than PG11

2020-02-05 Thread keisuke kuroda
------------------------------------------------------------------------------
 Seq Scan on public.realtest  (cost=0.00..307328.38 rows=10828150 width=40)
(actual time=0.012..4009.012 rows=1001 loops=1)
   Output: ('2'::double precision * a), ('2'::double precision * b),
('2'::double precision * c), ('2'::double precision * d), ('2'::double
precision * e)
   Buffers: shared hit=63695
 Planning Time: 0.038 ms
 Execution Time: 4679.162 ms
(5 rows)

Samples: 5K of event 'cpu-clock', Event count (approx.): 137675
Overhead  Command   Shared Object  Symbol
  31.43%  postgres  postgres   [.] ExecInterpExpr
  14.24%  postgres  postgres   [.] float84mul
  10.40%  postgres  [vdso] [.] __vdso_clock_gettime
★ 5.41%  postgres  libc-2.17.so   [.] __isinf
   4.63%  postgres  postgres   [.] tts_buffer_heap_getsomeattrs
   4.03%  postgres  postgres   [.] ExecScan
   3.54%  postgres  libc-2.17.so   [.] __clock_gettime
   3.12%  postgres  postgres   [.] HeapTupleSatisfiesVisibility
   2.36%  postgres  postgres   [.] heap_getnextslot
   2.16%  postgres  postgres   [.] heapgettup_pagemode
   2.09%  postgres  postgres   [.] standard_ExecutorRun
   2.07%  postgres  postgres   [.] SeqNext
   2.03%  postgres  postgres   [.] ExecProcNodeInstr
   2.03%  postgres  postgres   [.] tts_virtual_clear

PG 12 is still slower compared to PG 11, but the __isinf() situation is
better with the patch.

Best Regards,
Keisuke Kuroda


check-float-val.patch
Description: Binary data


Re: Wrong value in metapage of GIN INDEX.

2019-08-29 Thread keisuke kuroda
Hi Moon-san.

Thank you for posting.

We have been testing the GIN index on the JSONB type.
The default maintenance_work_mem (64MB) was fine in usual cases.
However, this problem occurs when indexing very large JSONB data.
best regards,
Keisuke Kuroda

On Thu, Aug 29, 2019 at 17:20, Moon, Insung wrote:

> Dear Hackers.
>
> Kuroda-san and I are interested in the GIN index and have been testing
> various things.
> While testing, we found a little bug.
> In some cases, the value of nEntries in the metapage was set to the
> wrong value.
>
> Here is how to reproduce the bug:
> =# SET maintenance_work_mem TO '1MB';
> =# CREATE TABLE foo(i jsonb);
> =# INSERT INTO foo(i) select jsonb_build_object('foobar001', i) FROM
> generate_series(1, 1) AS i;
>
> # Input the same value again.
> =# INSERT INTO foo(i) select jsonb_build_object('foobar001', i) FROM
> generate_series(1, 1) AS i;
> # Creates GIN index.
> =# CREATE INDEX foo_idx ON foo USING gin (i jsonb_ops) WITH
> (fastupdate=off);
>
>
> =# SELECT * FROM gin_metapage_info(get_raw_page('foo_idx', 0));
> -[ RECORD 1 ]+---
> pending_head | 4294967295
> pending_tail | 4294967295
> tail_free_size   | 0
> n_pending_pages  | 0
> n_pending_tuples | 0
> n_total_pages| 74
> n_entry_pages| 69
> n_data_pages | 4
> n_entries| 20004 <--★
> version  | 2
>
> In this example, the nentries value should be 10001, because the gin
> index stores duplicate values under a single entry (posting tree or
> posting list).
> But if we look at the nentries value of the metapage using
> pageinspect, it is stored as 20004.
> So, let's run a vacuum.
>
>
> =# VACUUM foo;
> =# SELECT * FROM gin_metapage_info(get_raw_page('foo_idx', 0));
> -[ RECORD 1 ]+---
> pending_head | 4294967295
> pending_tail | 4294967295
> tail_free_size   | 0
> n_pending_pages  | 0
> n_pending_tuples | 0
> n_total_pages| 74
> n_entry_pages| 69
> n_data_pages | 4
> n_entries| 10001 <--★
> version  | 2
>
> Ah. After running the vacuum, nEntries changes to the correct value.
>
> There is a problem with the ginEntryInsert function. During the table
> scan that builds the gin index, the ginBuildCallback function stores
> the new heap values inside the buildstate struct.
> In the next step, if the memory used by the GinBuildState struct is
> equal to or larger than the maintenance_work_mem value, the
> accumulated values are flushed into the GIN index.
> This flushing is done by the ginEntryInsert function.
> The ginEntryInsert function called at this time decides that a new
> entry is being added and increases the value of nEntries.
> However, currently ginEntryInsert first increases the value of
> nEntries and only then checks whether the same entry already exists
> in the current GIN index.
> That causes the bug.
>
> The patch is very simple.
> Fix it so that the value of nEntries is increased only when a
> non-duplicate GIN entry is added.
>
> This bug detection and code fix worked with Kuroda-san.
>
> Best Regards.
> Moon.
>