Re: Creating a function for exposing memory usage of backend process

2020-08-25 Thread Fujii Masao




On 2020/08/25 11:39, Fujii Masao wrote:



On 2020/08/24 21:56, torikoshia wrote:

On 2020-08-24 13:13, Fujii Masao wrote:

On 2020/08/24 13:01, torikoshia wrote:

On 2020-08-22 21:18, Michael Paquier wrote:

Thanks for reviewing!


On Fri, Aug 21, 2020 at 11:27:06PM +0900, torikoshia wrote:

OK. Added a regression test on sysviews.sql.
(0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)

Fujii-san gave us an example, but I added more simple one considering
the simplicity of other tests on that.


What you have sent in 0001 looks fine to me.  A small test is much
better than nothing.


+1

But as I proposed upthread, what about a bit complicated test as follows,
e.g., to confirm that the internal logic for level works expectedly?

 SELECT name, ident, parent, level, total_bytes >= free_bytes FROM
pg_backend_memory_contexts WHERE level = 0;


OK!
Attached an updated patch.


Thanks for updating the patch! Looks good to me.
Barring any objection, I will commit this patch.









Added a patch for relocating the codes to mcxtfuncs.c.
(patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)


Thanks for the patch! Looks good to me.
Barring any objection, I will commit this patch at first.




The same code is moved around line-by-line.


Of course, this restriction makes pg_backend_memory_contexts hard to use
when the user of the target session is not granted pg_monitor because the
scope of this view is session local.

In this case, I imagine additional operations something like temporarily
granting pg_monitor to that user.


Hmm.  I am not completely sure either that pg_monitor is the best fit
here, because this view provides information about a bunch of internal
structures.  Something that could easily be done though is to revoke
the access from public, and then users could just set up GRANT
permissions post-initdb, with pg_monitor as one possible choice. This
is the safest path by default, and this stuff is of a caliber similar
to pg_shmem_allocations in terms of internal contents.


I think this is a better way than what I did in
0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch.


You mean 0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch?


Oops, I meant 0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch.


This patch also looks good to me. Thanks!


I pushed the proposed three patches. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-08-24 Thread Fujii Masao




On 2020/08/24 21:56, torikoshia wrote:

On 2020-08-24 13:13, Fujii Masao wrote:

On 2020/08/24 13:01, torikoshia wrote:

On 2020-08-22 21:18, Michael Paquier wrote:

Thanks for reviewing!


On Fri, Aug 21, 2020 at 11:27:06PM +0900, torikoshia wrote:

OK. Added a regression test on sysviews.sql.
(0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)

Fujii-san gave us an example, but I added more simple one considering
the simplicity of other tests on that.


What you have sent in 0001 looks fine to me.  A small test is much
better than nothing.


+1

But as I proposed upthread, what about a bit complicated test as follows,
e.g., to confirm that the internal logic for level works expectedly?

 SELECT name, ident, parent, level, total_bytes >= free_bytes FROM
pg_backend_memory_contexts WHERE level = 0;


OK!
Attached an updated patch.


Thanks for updating the patch! Looks good to me.
Barring any objection, I will commit this patch.









Added a patch for relocating the codes to mcxtfuncs.c.
(patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)


Thanks for the patch! Looks good to me.
Barring any objection, I will commit this patch at first.




The same code is moved around line-by-line.


Of course, this restriction makes pg_backend_memory_contexts hard to use
when the user of the target session is not granted pg_monitor because the
scope of this view is session local.

In this case, I imagine additional operations something like temporarily
granting pg_monitor to that user.


Hmm.  I am not completely sure either that pg_monitor is the best fit
here, because this view provides information about a bunch of internal
structures.  Something that could easily be done though is to revoke
the access from public, and then users could just set up GRANT
permissions post-initdb, with pg_monitor as one possible choice. This
is the safest path by default, and this stuff is of a caliber similar
to pg_shmem_allocations in terms of internal contents.


I think this is a better way than what I did in
0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch.


You mean 0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch?


Oops, I meant 0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch.


This patch also looks good to me. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-08-24 Thread Fujii Masao




On 2020/08/24 17:09, Michael Paquier wrote:

On Mon, Aug 24, 2020 at 02:48:50PM +0900, Fujii Masao wrote:

As far as I know, utils/adt is the directory to basically include the files
for a particular type or operator. So ISTM that mcxtfuncs.c doesn't
fit to this directory. Isn't it better to put that in utils/mmgr ?


We have also stuff like ruleutils.c, dbsize.c, genfile.c there which
is rather generic, so I would rather leave utils/mmgr/ out of the
business of this thread, and just keep inside all the lower-level APIs
for memory context handling.


Understood. So I will commit the latest patch 
0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-08-24 Thread torikoshia

On 2020-08-24 13:13, Fujii Masao wrote:

On 2020/08/24 13:01, torikoshia wrote:

On 2020-08-22 21:18, Michael Paquier wrote:

Thanks for reviewing!


On Fri, Aug 21, 2020 at 11:27:06PM +0900, torikoshia wrote:

OK. Added a regression test on sysviews.sql.
(0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)

Fujii-san gave us an example, but I added more simple one 
considering

the simplicity of other tests on that.


What you have sent in 0001 looks fine to me.  A small test is much
better than nothing.


+1

But as I proposed upthread, what about a bit complicated test as 
follows,

e.g., to confirm that the internal logic for level works expectedly?

 SELECT name, ident, parent, level, total_bytes >= free_bytes FROM
pg_backend_memory_contexts WHERE level = 0;


OK!
Attached an updated patch.







Added a patch for relocating the codes to mcxtfuncs.c.
(patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)


Thanks for the patch! Looks good to me.
Barring any objection, I will commit this patch at first.




The same code is moved around line-by-line.

Of course, this restriction makes pg_backend_memory_contexts hard to 
use
when the user of the target session is not granted pg_monitor 
because the

scope of this view is session local.

In this case, I imagine additional operations something like 
temporarily

granting pg_monitor to that user.


Hmm.  I am not completely sure either that pg_monitor is the best fit
here, because this view provides information about a bunch of 
internal

structures.  Something that could easily be done though is to revoke
the access from public, and then users could just set up GRANT
permissions post-initdb, with pg_monitor as one possible choice.  
This

is the safest path by default, and this stuff is of a caliber similar
to pg_shmem_allocations in terms of internal contents.


I think this is a better way than what I did in
0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch.


You mean 
0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch?


Oops, I meant 
0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch.






Attached a patch.


Thanks for updating the patch! This also looks good to me.



It seems to me that you are missing one "REVOKE ALL on
pg_backend_memory_contexts FROM PUBLIC" in patch 0003.

By the way, if that was just for me, I would remove used_bytes, which
is just a computation from the total and free numbers.  I'll defer
that point to Fujii-san.


Yeah, I was just thinking that displaying also used_bytes was useful,
but this might be inconsistent with the other views' ways.

Regards,


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From 335b9eb0c60a7f12debd4c45d435888109b2bfcf Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 24 Aug 2020 21:28:20 +0900
Subject: [PATCH] Added a regression test for pg_backend_memory_contexts.

---
 src/test/regress/expected/sysviews.out | 9 +
 src/test/regress/sql/sysviews.sql  | 5 +
 2 files changed, 14 insertions(+)

diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 06c4c3e476..1cffc3349d 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -19,6 +19,15 @@ select count(*) >= 0 as ok from pg_available_extensions;
  t
 (1 row)
 
+-- The entire output of pg_backend_memory_contexts is not stable,
+-- we test only the existance and basic condition of TopMemoryContext.
+select name, ident, parent, level, total_bytes >= free_bytes
+  from pg_backend_memory_contexts where level = 0;
+   name   | ident | parent | level | ?column? 
+--+---++---+--
+ TopMemoryContext |   || 0 | t
+(1 row)
+
 -- At introduction, pg_config had 23 entries; it may grow
 select count(*) > 20 as ok from pg_config;
  ok 
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 28e412b735..ac4a0e1cbb 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -12,6 +12,11 @@ select count(*) >= 0 as ok from pg_available_extension_versions;
 
 select count(*) >= 0 as ok from pg_available_extensions;
 
+-- The entire output of pg_backend_memory_contexts is not stable,
+-- we test only the existance and basic condition of TopMemoryContext.
+select name, ident, parent, level, total_bytes >= free_bytes
+  from pg_backend_memory_contexts where level = 0;
+
 -- At introduction, pg_config had 23 entries; it may grow
 select count(*) > 20 as ok from pg_config;
 
-- 
2.18.1



Re: Creating a function for exposing memory usage of backend process

2020-08-24 Thread Michael Paquier
On Mon, Aug 24, 2020 at 02:48:50PM +0900, Fujii Masao wrote:
> As far as I know, utils/adt is the directory to basically include the files
> for a particular type or operator. So ISTM that mcxtfuncs.c doesn't
> fit to this directory. Isn't it better to put that in utils/mmgr ?

We have also stuff like ruleutils.c, dbsize.c, genfile.c there which
is rather generic, so I would rather leave utils/mmgr/ out of the
business of this thread, and just keep inside all the lower-level APIs
for memory context handling.  I don't have a strong feeling for one
being better than the other, so if you prefer more one way than the
other, that's fine by me as long as the split is done as the new
functions depend on nothing static in mcxt.c.  And you are the
committer of this feature.

> The copyright line in new file mcxtfuncs.c should be changed as follows
> because it contains new code?

> - * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
> - * Portions Copyright (c) 1994, Regents of the University of California
> + * Portions Copyright (c) 2020, PostgreSQL Global Development Group

FWIW, I usually choose what's proposed in the patch as a matter of
consistency, because it is a no-brainer and because you don't have to
think about past references when it comes to structures or such.
--
Michael


signature.asc
Description: PGP signature


Re: Creating a function for exposing memory usage of backend process

2020-08-23 Thread Fujii Masao




On 2020/08/24 13:13, Fujii Masao wrote:



On 2020/08/24 13:01, torikoshia wrote:

On 2020-08-22 21:18, Michael Paquier wrote:

Thanks for reviewing!


On Fri, Aug 21, 2020 at 11:27:06PM +0900, torikoshia wrote:

OK. Added a regression test on sysviews.sql.
(0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)

Fujii-san gave us an example, but I added more simple one considering
the simplicity of other tests on that.


What you have sent in 0001 looks fine to me.  A small test is much
better than nothing.


+1

But as I proposed upthread, what about a bit complicated test as follows,
e.g., to confirm that the internal logic for level works expectedly?

  SELECT name, ident, parent, level, total_bytes >= free_bytes FROM 
pg_backend_memory_contexts WHERE level = 0;





Added a patch for relocating the codes to mcxtfuncs.c.
(patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)


Thanks for the patch! Looks good to me.
Barring any objection, I will commit this patch at first.


As far as I know, utils/adt is the directory to basically include the files
for a particular type or operator. So ISTM that mcxtfuncs.c doesn't
fit to this directory. Isn't it better to put that in utils/mmgr ?


The copyright line in new file mcxtfuncs.c should be changed as follows
because it contains new code?

- * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
- * Portions Copyright (c) 1994, Regents of the University of California
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-08-23 Thread Fujii Masao




On 2020/08/24 13:01, torikoshia wrote:

On 2020-08-22 21:18, Michael Paquier wrote:

Thanks for reviewing!


On Fri, Aug 21, 2020 at 11:27:06PM +0900, torikoshia wrote:

OK. Added a regression test on sysviews.sql.
(0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)

Fujii-san gave us an example, but I added more simple one considering
the simplicity of other tests on that.


What you have sent in 0001 looks fine to me.  A small test is much
better than nothing.


+1

But as I proposed upthread, what about a bit complicated test as follows,
e.g., to confirm that the internal logic for level works expectedly?

 SELECT name, ident, parent, level, total_bytes >= free_bytes FROM 
pg_backend_memory_contexts WHERE level = 0;





Added a patch for relocating the codes to mcxtfuncs.c.
(patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)


Thanks for the patch! Looks good to me.
Barring any objection, I will commit this patch at first.




The same code is moved around line-by-line.


Of course, this restriction makes pg_backend_memory_contexts hard to use
when the user of the target session is not granted pg_monitor because the
scope of this view is session local.

In this case, I imagine additional operations something like temporarily
granting pg_monitor to that user.


Hmm.  I am not completely sure either that pg_monitor is the best fit
here, because this view provides information about a bunch of internal
structures.  Something that could easily be done though is to revoke
the access from public, and then users could just set up GRANT
permissions post-initdb, with pg_monitor as one possible choice.  This
is the safest path by default, and this stuff is of a caliber similar
to pg_shmem_allocations in terms of internal contents.


I think this is a better way than what I did in
0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch.


You mean 0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch?



Attached a patch.


Thanks for updating the patch! This also looks good to me.



It seems to me that you are missing one "REVOKE ALL on
pg_backend_memory_contexts FROM PUBLIC" in patch 0003.

By the way, if that was just for me, I would remove used_bytes, which
is just a computation from the total and free numbers.  I'll defer
that point to Fujii-san.


Yeah, I was just thinking that displaying also used_bytes was useful,
but this might be inconsistent with the other views' ways.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-08-23 Thread torikoshia

On 2020-08-22 21:18, Michael Paquier wrote:

Thanks for reviewing!


On Fri, Aug 21, 2020 at 11:27:06PM +0900, torikoshia wrote:

OK. Added a regression test on sysviews.sql.
(0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)

Fujii-san gave us an example, but I added more simple one considering
the simplicity of other tests on that.


What you have sent in 0001 looks fine to me.  A small test is much
better than nothing.


Added a patch for relocating the codes to mcxtfuncs.c.
(patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)


The same code is moved around line-by-line.

Of course, this restriction makes pg_backend_memory_contexts hard to 
use
when the user of the target session is not granted pg_monitor because 
the

scope of this view is session local.

In this case, I imagine additional operations something like 
temporarily

granting pg_monitor to that user.


Hmm.  I am not completely sure either that pg_monitor is the best fit
here, because this view provides information about a bunch of internal
structures.  Something that could easily be done though is to revoke
the access from public, and then users could just set up GRANT
permissions post-initdb, with pg_monitor as one possible choice.  This
is the safest path by default, and this stuff is of a caliber similar
to pg_shmem_allocations in terms of internal contents.


I think this is a better way than what I did in
0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch.

Attached a patch.



It seems to me that you are missing one "REVOKE ALL on
pg_backend_memory_contexts FROM PUBLIC" in patch 0003.

By the way, if that was just for me, I would remove used_bytes, which
is just a computation from the total and free numbers.  I'll defer
that point to Fujii-san.
--
Michael



On 2020/08/20 2:59, Kasahara Tatsuhito wrote:
I totally agree that it's not *enough*, but in contrast to you I 
think
it's a good step. Subsequently we should add a way to get any 
backends

memory usage.
It's not too hard to imagine how to serialize it in a way that can be
easily deserialized by another backend. I am imagining something like
sending a procsignal that triggers (probably at CFR() time) a backend 
to
write its own memory usage into pg_memusage/ or something 
roughly

like that.


Sounds good. Maybe we can also provide the SQL-callable function
or view to read pg_memusage/, to make the analysis easier.

+1



I'm thinking about starting a new thread to discuss exposing other
backend's memory context.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From dc4fade9111dc3f91e992c4d5af393dd5ed03270 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 24 Jul 2020 11:14:32 +0900
Subject: [PATCH] Previously pg_backend_memory_contexts doesn't have any
 restriction and anyone could access it. However, this view contains some
 internal information of the memory context. This policy could cause security
 issues. This patch revokes all on pg_shmem_allocations from public and only
 the superusers can access it.

---
 doc/src/sgml/catalogs.sgml   | 4 
 src/backend/catalog/system_views.sql | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 1232b24e74..9fe260ecff 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9697,6 +9697,10 @@ SCRAM-SHA-256$iteration count:

   
 
+  
+   By default, the pg_backend_memory_contexts view can be
+   read only by superusers.
+  
  
 
  
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ba5a23ac25..a2d61302f9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -557,6 +557,9 @@ REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC;
 CREATE VIEW pg_backend_memory_contexts AS
 SELECT * FROM pg_get_backend_memory_contexts();
 
+REVOKE ALL ON pg_backend_memory_contexts FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
+
 -- Statistics views
 
 CREATE VIEW pg_stat_all_tables AS
-- 
2.18.1



Re: Creating a function for exposing memory usage of backend process

2020-08-22 Thread Michael Paquier
On Fri, Aug 21, 2020 at 11:27:06PM +0900, torikoshia wrote:
> OK. Added a regression test on sysviews.sql.
> (0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)
> 
> Fujii-san gave us an example, but I added more simple one considering
> the simplicity of other tests on that.

What you have sent in 0001 looks fine to me.  A small test is much
better than nothing.

> Added a patch for relocating the codes to mcxtfuncs.c.
> (patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)

The same code is moved around line-by-line.

> Of course, this restriction makes pg_backend_memory_contexts hard to use
> when the user of the target session is not granted pg_monitor because the
> scope of this view is session local.
> 
> In this case, I imagine additional operations something like temporarily
> granting pg_monitor to that user.

Hmm.  I am not completely sure either that pg_monitor is the best fit
here, because this view provides information about a bunch of internal
structures.  Something that could easily be done though is to revoke
the access from public, and then users could just set up GRANT
permissions post-initdb, with pg_monitor as one possible choice.  This
is the safest path by default, and this stuff is of a caliber similar
to pg_shmem_allocations in terms of internal contents.

It seems to me that you are missing one "REVOKE ALL on
pg_backend_memory_contexts FROM PUBLIC" in patch 0003.

By the way, if that was just for me, I would remove used_bytes, which
is just a computation from the total and free numbers.  I'll defer
that point to Fujii-san.
--
Michael


signature.asc
Description: PGP signature


Re: Creating a function for exposing memory usage of backend process

2020-08-21 Thread torikoshia

Thanks for all your comments!

Thankfully it seems that this feature is regarded as not
meaningless one, I'm going to do some improvements.


On Wed, Aug 19, 2020 at 10:56 PM Michael Paquier  
wrote:

On Wed, Aug 19, 2020 at 06:12:02PM +0900, Fujii Masao wrote:

On 2020/08/19 17:40, torikoshia wrote:
Yes, I didn't add regression tests because of the unstability of the 
output.
I thought it would be OK since other views like pg_stat_slru and 
pg_shmem_allocations

didn't have tests for their outputs.


You're right.


If you can make a test with something minimal and with a stable
output, adding a test is helpful IMO, or how can you make easily sure
that this does not get broken, particularly in the event of future
refactorings, or even with platform-dependent behaviors?


OK. Added a regression test on sysviews.sql.
(0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)

Fujii-san gave us an example, but I added more simple one considering
the simplicity of other tests on that.


On Thu, Aug 20, 2020 at 12:02 AM Tom Lane  wrote:

Michael Paquier  writes:
> By the way, I was looking at the code that has been committed, and I
> think that it is awkward to have a SQL function in mcxt.c, which is a
> rather low-level interface.  I think that this new code should be
> moved to its own file, one suggestion for a location I have being
> src/backend/utils/adt/mcxtfuncs.c.

I agree with that,


Thanks for pointing out.
Added a patch for relocating the codes to mcxtfuncs.c.
(patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)


On Thu, Aug 20, 2020 at 11:09 AM Fujii Masao 
 wrote:

On 2020/08/20 0:01, Tom Lane wrote:

Given the lack of clear use-case, and the possibility (admittedly
not strong) that this is still somehow a security hazard, I think
we should revert it.  If it stays, I'd like to see restrictions
on who can read the view.



For example, allowing only the role with pg_monitor to see this view?


Attached a patch adding that restriction.
(0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch)

Of course, this restriction makes pg_backend_memory_contexts hard to use
when the user of the target session is not granted pg_monitor because 
the

scope of this view is session local.

In this case, I imagine additional operations something like temporarily
granting pg_monitor to that user.

Thoughts?


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom 23fe541d5bd3cead787bb7c638f0086b9c2e13eb Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 21 Aug 2020 21:22:10 +0900
Subject: [PATCH] Added a regression test for pg_backend_memory_contexts.

---
 src/test/regress/expected/sysviews.out | 7 +++
 src/test/regress/sql/sysviews.sql  | 3 +++
 2 files changed, 10 insertions(+)

diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 06c4c3e476..06e09fd10b 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -19,6 +19,13 @@ select count(*) >= 0 as ok from pg_available_extensions;
  t
 (1 row)
 
+-- There will surely be at least one context.
+select count(*) > 0 as ok from pg_backend_memory_contexts;
+ ok 
+
+ t
+(1 row)
+
 -- At introduction, pg_config had 23 entries; it may grow
 select count(*) > 20 as ok from pg_config;
  ok 
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 28e412b735..2c3b88c855 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -12,6 +12,9 @@ select count(*) >= 0 as ok from pg_available_extension_versions;
 
 select count(*) >= 0 as ok from pg_available_extensions;
 
+-- There will surely be at least one context.
+select count(*) > 0 as ok from pg_backend_memory_contexts;
+
 -- At introduction, pg_config had 23 entries; it may grow
 select count(*) > 20 as ok from pg_config;
 
-- 
2.18.1

From 4eee73933874fbab91643e7461717ba9038d8d76 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 21 Aug 2020 19:01:38 +0900
Subject: [PATCH] Rellocated the codes for pg_backend_memory_contexts in mcxt.c
 to src/backend/utils/adt/mcxtfuncs.c as they are low low-level interface.

---
 src/backend/utils/adt/Makefile|   1 +
 src/backend/utils/adt/mcxtfuncs.c | 157 ++
 src/backend/utils/mmgr/mcxt.c | 137 --
 3 files changed, 158 insertions(+), 137 deletions(-)
 create mode 100644 src/backend/utils/adt/mcxtfuncs.c

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 5d2aca8cfe..54d5c37947 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	lockfuncs.o \
 	mac.o \
 	mac8.o \
+	mcxtfuncs.o \
 	misc.o \
 	name.o \
 	network.o \
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
new file mode 100644
index 00..50e1b07ff0
--- /dev/null
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -0,0 +1,157 @@

Re: Creating a function for exposing memory usage of backend process

2020-08-19 Thread Kasahara Tatsuhito
On 2020/08/20 0:01, Tom Lane wrote:
> The only situation I could imagine where this would have any use is
> where there is long-term (cross-query) bloat in, say, CacheMemoryContext
Yeah, in cases where a very large number of sessions are connected to
the DB for very long periods of time, the memory consumption of the
back-end processes may increase slowly, and such a feature is useful
for analysis.

And ,as Fujii said, this feature very useful to see which contexts are
consuming a lot of memory and to narrow down the causes.

On Thu, Aug 20, 2020 at 11:18 AM Fujii Masao
 wrote:
> On 2020/08/20 10:43, Andres Freund wrote:
> > Hi,
> > Even just being able to see the memory usage in a queryable way is a
> > huge benefit.
>
> +1
+1

I think this feature is very useful in environments where gdb is not
available or access to server logs is limited.
And it is cumbersome to extract and analyze the memory information
from the very large server logs.


> > I totally agree that it's not *enough*, but in contrast to you I think
> > it's a good step. Subsequently we should add a way to get any backends
> > memory usage.
> > It's not too hard to imagine how to serialize it in a way that can be
> > easily deserialized by another backend. I am imagining something like
> > sending a procsignal that triggers (probably at CFR() time) a backend to
> > write its own memory usage into pg_memusage/ or something roughly
> > like that.
>
> Sounds good. Maybe we can also provide the SQL-callable function
> or view to read pg_memusage/, to make the analysis easier.
+1

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Creating a function for exposing memory usage of backend process

2020-08-19 Thread Fujii Masao




On 2020/08/20 10:43, Andres Freund wrote:

Hi,

On 2020-08-19 21:29:06 -0400, Tom Lane wrote:

Andres Freund  writes:

On 2020-08-19 11:01:37 -0400, Tom Lane wrote:

I agree with that, but I think this patch has a bigger problem:
why bother at all?  It seems like a waste of code space and future
maintenance effort, because there is no use-case.



I don't agree with this at all. I think there's plenty use cases. It's
e.g. very common to try to figure out why the memory usage of a process
is high. Is it memory not returned to the OS? Is it caches that have
grown too much etc.


Oh, I agree completely that there are lots of use-cases for finding
out what a process' memory map looks like.  But this patch fails to
address any of them in a usable way.


Even just being able to see the memory usage in a queryable way is a
huge benefit.


+1


The difference over having to parse the log, then parse
the memory usage dump, and then aggregate the data in there in a
meaningful way is *huge*.  We've been slacking around lowering our
memory usage, and I think the fact that it's annoying to analyze is a
partial reason for that.


Agreed.

 

I totally agree that it's not *enough*, but in contrast to you I think
it's a good step. Subsequently we should add a way to get any backends
memory usage.
It's not too hard to imagine how to serialize it in a way that can be
easily deserialized by another backend. I am imagining something like
sending a procsignal that triggers (probably at CFR() time) a backend to
write its own memory usage into pg_memusage/ or something roughly
like that.


Sounds good. Maybe we can also provide the SQL-callable function
or view to read pg_memusage/, to make the analysis easier.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-08-19 Thread Fujii Masao




On 2020/08/20 0:01, Tom Lane wrote:

Hadn't been paying attention to this thread up till now, but ...

Michael Paquier  writes:

By the way, I was looking at the code that has been committed, and I
think that it is awkward to have a SQL function in mcxt.c, which is a
rather low-level interface.  I think that this new code should be
moved to its own file, one suggestion for a location I have being
src/backend/utils/adt/mcxtfuncs.c.


I agree with that, but I think this patch has a bigger problem:
why bother at all?  It seems like a waste of code space and future
maintenance effort, because there is no use-case.  In the situations
where you need to know where the memory went, you are almost never
in a position to leisurely execute a query and send the results over
to your client.  This certainly would be useless to figure out why
an already-running query is eating space, for instance.

The only situation I could imagine where this would have any use is
where there is long-term (cross-query) bloat in, say, CacheMemoryContext


Yes, this feature is useful to check a cross-query memory bloat like
the bloats of relcache, prepared statements, PL/pgSQL cache,
SMgrRelationHash, etc. For example, several years before, my colleague
investigated the cause of the memory bloat by using the almost same
feature that pg_cheat_funcs extension provides, and then found that
the cause was that the application forgot to release lots of SAVEPONT.



--- but it's not even very helpful for that, since you can't examine
anything finer-grain than a memory context.


Yes, but even that information can be good hint when investigating
the memory bloat.



Plus you need to be
running an interactive session, or else be willing to hack up your
application to try to get it to inspect the view (and log the
results somewhere) at useful times.

On top of all that, the functionality has Heisenberg problems,
because simply using it changes what you are trying to observe,
in complex and undocumented ways (not that the documentation
would be of any use to non-experts anyway).

My own thoughts about improving the debugging situation would've been
to create a way to send a signal to a session to make it dump its
current memory map to the postmaster log (not the client, since the
client is unlikely to be prepared to receive anything extraneous).
But this is nothing like that.


I agree that we need something like this, i.e., the way to monitor the memory
usage of any process even during query running. OTOH, I think that the added
feature has a use case and is good as the first step.



Given the lack of clear use-case, and the possibility (admittedly
not strong) that this is still somehow a security hazard, I think
we should revert it.  If it stays, I'd like to see restrictions
on who can read the view.


For example, allowing only the role with pg_monitor to see this view?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-08-19 Thread Andres Freund
Hi,

On 2020-08-19 21:29:06 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2020-08-19 11:01:37 -0400, Tom Lane wrote:
> >> I agree with that, but I think this patch has a bigger problem:
> >> why bother at all?  It seems like a waste of code space and future
> >> maintenance effort, because there is no use-case.
> 
> > I don't agree with this at all. I think there's plenty use cases. It's
> > e.g. very common to try to figure out why the memory usage of a process
> > is high. Is it memory not returned to the OS? Is it caches that have
> > grown too much etc.
> 
> Oh, I agree completely that there are lots of use-cases for finding
> out what a process' memory map looks like.  But this patch fails to
> address any of them in a usable way.

Even just being able to see the memory usage in a queryable way is a
huge benefit. The difference over having to parse the log, then parse
the memory usage dump, and then aggregate the data in there in a
meaningful way is *huge*.  We've been slacking around lowering our
memory usage, and I think the fact that it's annoying to analyze is a
partial reason for that.

I totally agree that it's not *enough*, but in contrast to you I think
it's a good step. Subsequently we should add a way to get any backends
memory usage.
It's not too hard to imagine how to serialize it in a way that can be
easily deserialized by another backend. I am imagining something like
sending a procsignal that triggers (probably at CFR() time) a backend to
write its own memory usage into pg_memusage/ or something roughly
like that.

Greetings,

Andres Freund




Re: Creating a function for exposing memory usage of backend process

2020-08-19 Thread Tom Lane
Andres Freund  writes:
> On 2020-08-19 11:01:37 -0400, Tom Lane wrote:
>> I agree with that, but I think this patch has a bigger problem:
>> why bother at all?  It seems like a waste of code space and future
>> maintenance effort, because there is no use-case.

> I don't agree with this at all. I think there's plenty use cases. It's
> e.g. very common to try to figure out why the memory usage of a process
> is high. Is it memory not returned to the OS? Is it caches that have
> grown too much etc.

Oh, I agree completely that there are lots of use-cases for finding
out what a process' memory map looks like.  But this patch fails to
address any of them in a usable way.

>> My own thoughts about improving the debugging situation would've been
>> to create a way to send a signal to a session to make it dump its
>> current memory map to the postmaster log (not the client, since the
>> client is unlikely to be prepared to receive anything extraneous).

> That doesn't really work in a large number of environments, I'm
> afraid. Many many users don't have access to the server log.

My rationale for that was (a) it can be implemented without a lot
of impact on the memory map, and (b) requiring access to the server
log eliminates questions about whether you have enough privilege to
examine the map.  I'm prepared to compromise about (b), but less so
about (a).  Having to run a SQL query to find this out is a mess.

regards, tom lane




Re: Creating a function for exposing memory usage of backend process

2020-08-19 Thread Andres Freund
Hi,

On 2020-08-19 11:01:37 -0400, Tom Lane wrote:
> Hadn't been paying attention to this thread up till now, but ...
> 
> Michael Paquier  writes:
> > By the way, I was looking at the code that has been committed, and I
> > think that it is awkward to have a SQL function in mcxt.c, which is a
> > rather low-level interface.  I think that this new code should be
> > moved to its own file, one suggestion for a location I have being
> > src/backend/utils/adt/mcxtfuncs.c.
> 
> I agree with that, but I think this patch has a bigger problem:
> why bother at all?  It seems like a waste of code space and future
> maintenance effort, because there is no use-case.  In the situations
> where you need to know where the memory went, you are almost never
> in a position to leisurely execute a query and send the results over
> to your client.  This certainly would be useless to figure out why
> an already-running query is eating space, for instance.

I don't agree with this at all. I think there's plenty use cases. It's
e.g. very common to try to figure out why the memory usage of a process
is high. Is it memory not returned to the OS? Is it caches that have
grown too much etc.

I agree it's not perfect:

> Plus you need to be running an interactive session, or else be willing
> to hack up your application to try to get it to inspect the view (and
> log the results somewhere) at useful times.

and that we likely should address that by *also* allowing to view the
memory usage of another process. Which obviously isn't entirely
trivial. But some infrastructure likely could be reused.


> My own thoughts about improving the debugging situation would've been
> to create a way to send a signal to a session to make it dump its
> current memory map to the postmaster log (not the client, since the
> client is unlikely to be prepared to receive anything extraneous).
> But this is nothing like that.

That doesn't really work in a large number of environments, I'm
afraid. Many many users don't have access to the server log.


> If it stays, I'd like to see restrictions on who can read the view.

As long as access is grantable rather than needing a security definer
wrapper I'd be fine with that.

Greetings,

Andres Freund




Re: Creating a function for exposing memory usage of backend process

2020-08-19 Thread Tom Lane
Hadn't been paying attention to this thread up till now, but ...

Michael Paquier  writes:
> By the way, I was looking at the code that has been committed, and I
> think that it is awkward to have a SQL function in mcxt.c, which is a
> rather low-level interface.  I think that this new code should be
> moved to its own file, one suggestion for a location I have being
> src/backend/utils/adt/mcxtfuncs.c.

I agree with that, but I think this patch has a bigger problem:
why bother at all?  It seems like a waste of code space and future
maintenance effort, because there is no use-case.  In the situations
where you need to know where the memory went, you are almost never
in a position to leisurely execute a query and send the results over
to your client.  This certainly would be useless to figure out why
an already-running query is eating space, for instance.

The only situation I could imagine where this would have any use is
where there is long-term (cross-query) bloat in, say, CacheMemoryContext
--- but it's not even very helpful for that, since you can't examine
anything finer-grain than a memory context.  Plus you need to be
running an interactive session, or else be willing to hack up your
application to try to get it to inspect the view (and log the
results somewhere) at useful times.

On top of all that, the functionality has Heisenberg problems,
because simply using it changes what you are trying to observe,
in complex and undocumented ways (not that the documentation
would be of any use to non-experts anyway).

My own thoughts about improving the debugging situation would've been
to create a way to send a signal to a session to make it dump its
current memory map to the postmaster log (not the client, since the
client is unlikely to be prepared to receive anything extraneous).
But this is nothing like that.

Given the lack of clear use-case, and the possibility (admittedly
not strong) that this is still somehow a security hazard, I think
we should revert it.  If it stays, I'd like to see restrictions
on who can read the view.

regards, tom lane




Re: Creating a function for exposing memory usage of backend process

2020-08-19 Thread Michael Paquier
On Wed, Aug 19, 2020 at 06:12:02PM +0900, Fujii Masao wrote:
> On 2020/08/19 17:40, torikoshia wrote:
>> Yes, I didn't add regression tests because of the unstability of the output.
>> I thought it would be OK since other views like pg_stat_slru and 
>> pg_shmem_allocations
>> didn't have tests for their outputs.
> 
> You're right.

If you can make a test with something minimal and with a stable
output, adding a test is helpful IMO, or how can you make easily sure
that this does not get broken, particularly in the event of future
refactorings, or even with platform-dependent behaviors?

By the way, I was looking at the code that has been committed, and I
think that it is awkward to have a SQL function in mcxt.c, which is a
rather low-level interface.  I think that this new code should be
moved to its own file, one suggestion for a location I have being
src/backend/utils/adt/mcxtfuncs.c.
--
Michael


signature.asc
Description: PGP signature


Re: Creating a function for exposing memory usage of backend process

2020-08-19 Thread Fujii Masao




On 2020/08/19 17:40, torikoshia wrote:

On 2020-08-19 15:48, Fujii Masao wrote:

On 2020/08/19 9:43, torikoshia wrote:

On 2020-08-18 22:54, Fujii Masao wrote:

On 2020/08/18 18:41, torikoshia wrote:

On 2020-08-17 21:19, Fujii Masao wrote:

On 2020/08/17 21:14, Fujii Masao wrote:

On 2020-08-07 16:38, Kasahara Tatsuhito wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:    tested, passed

I tested the latest
patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
with the latest PG-version (199cec9779504c08aaa8159c6308283156547409)
and test was passed.
It looks good to me.

The new status of this patch is: Ready for Committer


Thanks for your testing!


Thanks for updating the patch! Here are the review comments.


Thanks for reviewing!



+ 
+  pg_backend_memory_contexts
+  backend memory contexts
+ 

The above is located just after pg_matviews entry. But it should be located
just after pg_available_extension_versions entry. Because the rows in the table
"System Views" should be located in alphabetical order.


+ 
+ pg_backend_memory_contexts

Same as above.


Modified both.




+   The view pg_backend_memory_contexts displays all
+   the local backend memory contexts.

This description seems a bit confusing because maybe we can interpret this
as "... displays the memory contexts of all the local backends" wrongly. 
Thought?
What about the following description, instead?



 The view pg_backend_memory_contexts displays all
 the memory contexts of the server process attached to the current session.


Thanks! it seems better.


+    const char *name = context->name;
+    const char *ident = context->ident;
+
+    if (context == NULL)
+    return;

The above check "context == NULL" is useless? If "context" is actually NULL,
"context->name" would cause segmentation fault, so ISTM that the check
will never be performed.

If "context" can be NULL, the check should be performed before accessing
to "contect". OTOH, if "context" must not be NULL per the specification of
PutMemoryContextStatsTupleStore(), assertion test checking
"context != NULL" should be used here, instead?


Yeah, "context" cannot be NULL because "context" must be TopMemoryContext
or it is already checked as not NULL as follows(child != NULL).

I added the assertion check.


Isn't it better to add AssertArg(MemoryContextIsValid(context)), instead?


Thanks, that's better.



| for (child = context->firstchild; child != NULL; child = child->nextchild)
| {
|  ...
| PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
|   child, parentname, 
level + 1);
| }


Here is another comment.

+ if (parent == NULL)
+ nulls[2] = true;
+ else
+ /*
+  * We labeled dynahash contexts with just the hash table name.
+  * To make it possible to identify its parent, we also display
+  * parent's ident here.
+  */
+ if (parent->ident && strcmp(parent->name, "dynahash") == 0)
+ values[2] = CStringGetTextDatum(parent->ident);
+ else
+ values[2] = CStringGetTextDatum(parent->name);

PutMemoryContextsStatsTupleStore() doesn't need "parent" memory context,
but uses only the name of "parent" memory context. So isn't it better to use
"const char *parent" instead of "MemoryContext parent", as the argument of
the function? If we do that, we can simplify the above code.


Thanks, the attached patch adopted the advice.

However, since PutMemoryContextsStatsTupleStore() used not only the name
but also the ident of the "parent", I could not help but adding similar
codes before calling the function.
The total amount of codes and complexity seem not to change so much.

Any thoughts? Am I misunderstanding something?


I was thinking that we can simplify the code as follows.
That is, we can just pass "name" as the argument of
PutMemoryContextsStatsTupleStore()
since "name" indicates context->name or ident (if name is "dynahash").

 for (child = context->firstchild; child != NULL; child = child->nextchild)
 {
-    const char *parentname;
-
-    /*
- * We labeled dynahash contexts with just the hash table name.
- * To make it possible to identify its parent, we also use
- * the hash table as its context name.
- */
-    if (context->ident && strcmp(context->name, "dynahash") == 0)
-    parentname = context->ident;
-    else
-    parentname = context->name;
-
 PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
-  child, parentname, level + 1);
+  child, name, level + 1);


I got it, thanks for the clarification!


Re: Creating a function for exposing memory usage of backend process

2020-08-19 Thread torikoshia

On 2020-08-19 15:48, Fujii Masao wrote:

On 2020/08/19 9:43, torikoshia wrote:

On 2020-08-18 22:54, Fujii Masao wrote:

On 2020/08/18 18:41, torikoshia wrote:

On 2020-08-17 21:19, Fujii Masao wrote:

On 2020/08/17 21:14, Fujii Masao wrote:

On 2020-08-07 16:38, Kasahara Tatsuhito wrote:
The following review has been posted through the commitfest 
application:

make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:    tested, passed

I tested the latest
patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
with the latest PG-version 
(199cec9779504c08aaa8159c6308283156547409)

and test was passed.
It looks good to me.

The new status of this patch is: Ready for Committer


Thanks for your testing!


Thanks for updating the patch! Here are the review comments.


Thanks for reviewing!



+ 
+  linkend="view-pg-backend-memory-contexts">pg_backend_memory_contexts

+  backend memory contexts
+ 

The above is located just after pg_matviews entry. But it should 
be located
just after pg_available_extension_versions entry. Because the rows 
in the table

"System Views" should be located in alphabetical order.


+ 
+ 
pg_backend_memory_contexts


Same as above.


Modified both.




+   The view pg_backend_memory_contexts 
displays all

+   the local backend memory contexts.

This description seems a bit confusing because maybe we can 
interpret this
as "... displays the memory contexts of all the local backends" 
wrongly. Thought?

What about the following description, instead?


 The view pg_backend_memory_contexts 
displays all
 the memory contexts of the server process attached to the 
current session.


Thanks! it seems better.


+    const char *name = context->name;
+    const char *ident = context->ident;
+
+    if (context == NULL)
+    return;

The above check "context == NULL" is useless? If "context" is 
actually NULL,
"context->name" would cause segmentation fault, so ISTM that the 
check

will never be performed.

If "context" can be NULL, the check should be performed before 
accessing
to "contect". OTOH, if "context" must not be NULL per the 
specification of

PutMemoryContextStatsTupleStore(), assertion test checking
"context != NULL" should be used here, instead?


Yeah, "context" cannot be NULL because "context" must be 
TopMemoryContext

or it is already checked as not NULL as follows(child != NULL).

I added the assertion check.


Isn't it better to add AssertArg(MemoryContextIsValid(context)), 
instead?


Thanks, that's better.



| for (child = context->firstchild; child != NULL; child = 
child->nextchild)

| {
|  ...
| PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
|   child, 
parentname, level + 1);

| }


Here is another comment.

+ if (parent == NULL)
+ nulls[2] = true;
+ else
+ /*
+  * We labeled dynahash contexts with just the hash 
table name.
+  * To make it possible to identify its parent, we 
also display

+  * parent's ident here.
+  */
+ if (parent->ident && strcmp(parent->name, "dynahash") 
== 0)
+ values[2] = 
CStringGetTextDatum(parent->ident);

+ else
+ values[2] = 
CStringGetTextDatum(parent->name);


PutMemoryContextsStatsTupleStore() doesn't need "parent" memory 
context,
but uses only the name of "parent" memory context. So isn't it 
better to use
"const char *parent" instead of "MemoryContext parent", as the 
argument of

the function? If we do that, we can simplify the above code.


Thanks, the attached patch adopted the advice.

However, since PutMemoryContextsStatsTupleStore() used not only the 
name
but also the ident of the "parent", I could not help but adding 
similar

codes before calling the function.
The total amount of codes and complexity seem not to change so much.

Any thoughts? Am I misunderstanding something?


I was thinking that we can simplify the code as follows.
That is, we can just pass "name" as the argument of
PutMemoryContextsStatsTupleStore()
since "name" indicates context->name or ident (if name is 
"dynahash").


 for (child = context->firstchild; child != NULL; child = 
child->nextchild)

 {
-    const char *parentname;
-
-    /*
- * We labeled dynahash contexts with just the hash table 
name.

- * To make it possible to identify its parent, we also use
- * the hash table as its context name.
- */
-    if (context->ident && strcmp(context->name, "dynahash") == 
0)

-    parentname = context->ident;
-    else
-    parentname = context->name;
-
 PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
-  child, parentname, level + 1);
+  child, name, level + 1);



Re: Creating a function for exposing memory usage of backend process

2020-08-19 Thread Fujii Masao




On 2020/08/19 9:43, torikoshia wrote:

On 2020-08-18 22:54, Fujii Masao wrote:

On 2020/08/18 18:41, torikoshia wrote:

On 2020-08-17 21:19, Fujii Masao wrote:

On 2020/08/17 21:14, Fujii Masao wrote:

On 2020-08-07 16:38, Kasahara Tatsuhito wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:    tested, passed

I tested the latest
patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
with the latest PG-version (199cec9779504c08aaa8159c6308283156547409)
and test was passed.
It looks good to me.

The new status of this patch is: Ready for Committer


Thanks for your testing!


Thanks for updating the patch! Here are the review comments.


Thanks for reviewing!



+ 
+  pg_backend_memory_contexts
+  backend memory contexts
+ 

The above is located just after pg_matviews entry. But it should be located
just after pg_available_extension_versions entry. Because the rows in the table
"System Views" should be located in alphabetical order.


+ 
+ pg_backend_memory_contexts

Same as above.


Modified both.




+   The view pg_backend_memory_contexts displays all
+   the local backend memory contexts.

This description seems a bit confusing because maybe we can interpret this
as "... displays the memory contexts of all the local backends" wrongly. 
Thought?
What about the following description, instead?



 The view pg_backend_memory_contexts displays all
 the memory contexts of the server process attached to the current session.


Thanks! it seems better.


+    const char *name = context->name;
+    const char *ident = context->ident;
+
+    if (context == NULL)
+    return;

The above check "context == NULL" is useless? If "context" is actually NULL,
"context->name" would cause segmentation fault, so ISTM that the check
will never be performed.

If "context" can be NULL, the check should be performed before accessing
to "contect". OTOH, if "context" must not be NULL per the specification of
PutMemoryContextStatsTupleStore(), assertion test checking
"context != NULL" should be used here, instead?


Yeah, "context" cannot be NULL because "context" must be TopMemoryContext
or it is already checked as not NULL as follows(child != NULL).

I added the assertion check.


Isn't it better to add AssertArg(MemoryContextIsValid(context)), instead?


Thanks, that's better.



| for (child = context->firstchild; child != NULL; child = child->nextchild)
| {
|  ...
| PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
|   child, parentname, 
level + 1);
| }


Here is another comment.

+ if (parent == NULL)
+ nulls[2] = true;
+ else
+ /*
+  * We labeled dynahash contexts with just the hash table name.
+  * To make it possible to identify its parent, we also display
+  * parent's ident here.
+  */
+ if (parent->ident && strcmp(parent->name, "dynahash") == 0)
+ values[2] = CStringGetTextDatum(parent->ident);
+ else
+ values[2] = CStringGetTextDatum(parent->name);

PutMemoryContextsStatsTupleStore() doesn't need "parent" memory context,
but uses only the name of "parent" memory context. So isn't it better to use
"const char *parent" instead of "MemoryContext parent", as the argument of
the function? If we do that, we can simplify the above code.


Thanks, the attached patch adopted the advice.

However, since PutMemoryContextsStatsTupleStore() used not only the name
but also the ident of the "parent", I could not help but adding similar
codes before calling the function.
The total amount of codes and complexity seem not to change so much.

Any thoughts? Am I misunderstanding something?


I was thinking that we can simplify the code as follows.
That is, we can just pass "name" as the argument of
PutMemoryContextsStatsTupleStore()
since "name" indicates context->name or ident (if name is "dynahash").

 for (child = context->firstchild; child != NULL; child = child->nextchild)
 {
-    const char *parentname;
-
-    /*
- * We labeled dynahash contexts with just the hash table name.
- * To make it possible to identify its parent, we also use
- * the hash table as its context name.
- */
-    if (context->ident && strcmp(context->name, "dynahash") == 0)
-    parentname = context->ident;
-    else
-    parentname = context->name;
-
 PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
-  child, parentname, level + 1);
+  child, name, level + 1);


I got it, thanks for the clarification!

Attached a revised patch.


Thanks for updating the patch! I pushed it.

BTW, I guess 

Re: Creating a function for exposing memory usage of backend process

2020-08-18 Thread torikoshia

On 2020-08-18 22:54, Fujii Masao wrote:

On 2020/08/18 18:41, torikoshia wrote:

On 2020-08-17 21:19, Fujii Masao wrote:

On 2020/08/17 21:14, Fujii Masao wrote:

On 2020-08-07 16:38, Kasahara Tatsuhito wrote:
The following review has been posted through the commitfest 
application:

make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:    tested, passed

I tested the latest
patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
with the latest PG-version 
(199cec9779504c08aaa8159c6308283156547409)

and test was passed.
It looks good to me.

The new status of this patch is: Ready for Committer


Thanks for your testing!


Thanks for updating the patch! Here are the review comments.


Thanks for reviewing!



+ 
+  linkend="view-pg-backend-memory-contexts">pg_backend_memory_contexts

+  backend memory contexts
+ 

The above is located just after pg_matviews entry. But it should be 
located
just after pg_available_extension_versions entry. Because the rows 
in the table

"System Views" should be located in alphabetical order.


+ 
+  
pg_backend_memory_contexts


Same as above.


Modified both.




+   The view pg_backend_memory_contexts 
displays all

+   the local backend memory contexts.

This description seems a bit confusing because maybe we can 
interpret this
as "... displays the memory contexts of all the local backends" 
wrongly. Thought?

What about the following description, instead?


 The view pg_backend_memory_contexts 
displays all
 the memory contexts of the server process attached to the 
current session.


Thanks! it seems better.


+    const char *name = context->name;
+    const char *ident = context->ident;
+
+    if (context == NULL)
+    return;

The above check "context == NULL" is useless? If "context" is 
actually NULL,
"context->name" would cause segmentation fault, so ISTM that the 
check

will never be performed.

If "context" can be NULL, the check should be performed before 
accessing
to "contect". OTOH, if "context" must not be NULL per the 
specification of

PutMemoryContextStatsTupleStore(), assertion test checking
"context != NULL" should be used here, instead?


Yeah, "context" cannot be NULL because "context" must be 
TopMemoryContext

or it is already checked as not NULL as follows(child != NULL).

I added the assertion check.


Isn't it better to add AssertArg(MemoryContextIsValid(context)), 
instead?


Thanks, that's better.



| for (child = context->firstchild; child != NULL; child = 
child->nextchild)

| {
|  ...
| PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
|   child, 
parentname, level + 1);

| }


Here is another comment.

+ if (parent == NULL)
+ nulls[2] = true;
+ else
+ /*
+  * We labeled dynahash contexts with just the hash 
table name.
+  * To make it possible to identify its parent, we also 
display

+  * parent's ident here.
+  */
+ if (parent->ident && strcmp(parent->name, "dynahash") 
== 0)

+ values[2] = CStringGetTextDatum(parent->ident);
+ else
+ values[2] = CStringGetTextDatum(parent->name);

PutMemoryContextsStatsTupleStore() doesn't need "parent" memory 
context,
but uses only the name of "parent" memory context. So isn't it better 
to use
"const char *parent" instead of "MemoryContext parent", as the 
argument of

the function? If we do that, we can simplify the above code.


Thanks, the attached patch adopted the advice.

However, since PutMemoryContextsStatsTupleStore() used not only the 
name
but also the ident of the "parent", I could not help but adding 
similar

codes before calling the function.
The total amount of codes and complexity seem not to change so much.

Any thoughts? Am I misunderstanding something?


I was thinking that we can simplify the code as follows.
That is, we can just pass "name" as the argument of
PutMemoryContextsStatsTupleStore()
since "name" indicates context->name or ident (if name is "dynahash").

 	for (child = context->firstchild; child != NULL; child = 
child->nextchild)

{
-   const char *parentname;
-
-   /*
-* We labeled dynahash contexts with just the hash table name.
-* To make it possible to identify its parent, we also use
-* the hash table as its context name.
-*/
-   if (context->ident && strcmp(context->name, "dynahash") == 0)
-   parentname = context->ident;
-   else
-   parentname = context->name;
-
PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
- child, 
parentname, level + 1);
+ 

Re: Creating a function for exposing memory usage of backend process

2020-08-18 Thread Fujii Masao




On 2020/08/18 18:41, torikoshia wrote:

On 2020-08-17 21:19, Fujii Masao wrote:

On 2020/08/17 21:14, Fujii Masao wrote:

On 2020-08-07 16:38, Kasahara Tatsuhito wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:    tested, passed

I tested the latest
patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
with the latest PG-version (199cec9779504c08aaa8159c6308283156547409)
and test was passed.
It looks good to me.

The new status of this patch is: Ready for Committer


Thanks for your testing!


Thanks for updating the patch! Here are the review comments.


Thanks for reviewing!



+ 
+  pg_backend_memory_contexts
+  backend memory contexts
+ 

The above is located just after pg_matviews entry. But it should be located
just after pg_available_extension_versions entry. Because the rows in the table
"System Views" should be located in alphabetical order.


+ 
+  pg_backend_memory_contexts

Same as above.


Modified both.




+   The view pg_backend_memory_contexts displays all
+   the local backend memory contexts.

This description seems a bit confusing because maybe we can interpret this
as "... displays the memory contexts of all the local backends" wrongly. 
Thought?
What about the following description, instead?



 The view pg_backend_memory_contexts displays all
 the memory contexts of the server process attached to the current session.


Thanks! it seems better.


+    const char *name = context->name;
+    const char *ident = context->ident;
+
+    if (context == NULL)
+    return;

The above check "context == NULL" is useless? If "context" is actually NULL,
"context->name" would cause segmentation fault, so ISTM that the check
will never be performed.

If "context" can be NULL, the check should be performed before accessing
to "contect". OTOH, if "context" must not be NULL per the specification of
PutMemoryContextStatsTupleStore(), assertion test checking
"context != NULL" should be used here, instead?


Yeah, "context" cannot be NULL because "context" must be TopMemoryContext
or it is already checked as not NULL as follows(child != NULL).

I added the assertion check.


Isn't it better to add AssertArg(MemoryContextIsValid(context)), instead?




| for (child = context->firstchild; child != NULL; child = child->nextchild)
| {
|  ...
| PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
|   child, parentname, 
level + 1);
| }


Here is another comment.

+ if (parent == NULL)
+ nulls[2] = true;
+ else
+ /*
+  * We labeled dynahash contexts with just the hash table name.
+  * To make it possible to identify its parent, we also display
+  * parent's ident here.
+  */
+ if (parent->ident && strcmp(parent->name, "dynahash") == 0)
+ values[2] = CStringGetTextDatum(parent->ident);
+ else
+ values[2] = CStringGetTextDatum(parent->name);

PutMemoryContextsStatsTupleStore() doesn't need "parent" memory context,
but uses only the name of "parent" memory context. So isn't it better to use
"const char *parent" instead of "MemoryContext parent", as the argument of
the function? If we do that, we can simplify the above code.


Thanks, the attached patch adopted the advice.

However, since PutMemoryContextsStatsTupleStore() used not only the name
but also the ident of the "parent", I could not help but adding similar
codes before calling the function.
The total amount of codes and complexity seem not to change so much.

Any thoughts? Am I misunderstanding something?


I was thinking that we can simplify the code as follows.
That is, we can just pass "name" as the argument of 
PutMemoryContextsStatsTupleStore()
since "name" indicates context->name or ident (if name is "dynahash").

for (child = context->firstchild; child != NULL; child = 
child->nextchild)
{
-   const char *parentname;
-
-   /*
-* We labeled dynahash contexts with just the hash table name.
-* To make it possible to identify its parent, we also use
-* the hash table as its context name.
-*/
-   if (context->ident && strcmp(context->name, "dynahash") == 0)
-   parentname = context->ident;
-   else
-   parentname = context->name;
-
PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
- child, 
parentname, level + 1);
+ child, name, 
level + 1);

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and 

Re: Creating a function for exposing memory usage of backend process

2020-08-18 Thread torikoshia

On 2020-08-17 21:19, Fujii Masao wrote:

On 2020/08/17 21:14, Fujii Masao wrote:

On 2020-08-07 16:38, Kasahara Tatsuhito wrote:
The following review has been posted through the commitfest 
application:

make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:    tested, passed

I tested the latest
patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
with the latest PG-version 
(199cec9779504c08aaa8159c6308283156547409)

and test was passed.
It looks good to me.

The new status of this patch is: Ready for Committer


Thanks for your testing!


Thanks for updating the patch! Here are the review comments.


Thanks for reviewing!



+ 
+  linkend="view-pg-backend-memory-contexts">pg_backend_memory_contexts

+  backend memory contexts
+ 

The above is located just after pg_matviews entry. But it should be 
located
just after pg_available_extension_versions entry. Because the rows in 
the table

"System Views" should be located in alphabetical order.


+ 
+  pg_backend_memory_contexts

Same as above.


Modified both.




+   The view pg_backend_memory_contexts 
displays all

+   the local backend memory contexts.

This description seems a bit confusing because maybe we can interpret 
this
as "... displays the memory contexts of all the local backends" 
wrongly. Thought?

What about the following description, instead?


     The view pg_backend_memory_contexts 
displays all
     the memory contexts of the server process attached to the current 
session.


Thanks! it seems better.


+    const char *name = context->name;
+    const char *ident = context->ident;
+
+    if (context == NULL)
+    return;

The above check "context == NULL" is useless? If "context" is actually 
NULL,

"context->name" would cause segmentation fault, so ISTM that the check
will never be performed.

If "context" can be NULL, the check should be performed before 
accessing
to "contect". OTOH, if "context" must not be NULL per the 
specification of

PutMemoryContextStatsTupleStore(), assertion test checking
"context != NULL" should be used here, instead?


Yeah, "context" cannot be NULL because "context" must be 
TopMemoryContext

or it is already checked as not NULL as follows(child != NULL).

I added the assertion check.

| for (child = context->firstchild; child != NULL; child = 
child->nextchild)

| {
|  ...
| PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
|   child, 
parentname, level + 1);

| }


Here is another comment.

+ if (parent == NULL)
+ nulls[2] = true;
+ else
+ /*
+  * We labeled dynahash contexts with just the hash table 
name.
+  * To make it possible to identify its parent, we also 
display

+  * parent's ident here.
+  */
+ if (parent->ident && strcmp(parent->name, "dynahash") == 
0)

+ values[2] = CStringGetTextDatum(parent->ident);
+ else
+ values[2] = CStringGetTextDatum(parent->name);

PutMemoryContextsStatsTupleStore() doesn't need "parent" memory 
context,
but uses only the name of "parent" memory context. So isn't it better 
to use
"const char *parent" instead of "MemoryContext parent", as the argument 
of

the function? If we do that, we can simplify the above code.


Thanks, the attached patch adopted the advice.

However, since PutMemoryContextsStatsTupleStore() used not only the name
but also the ident of the "parent", I could not help but adding similar
codes before calling the function.
The total amount of codes and complexity seem not to change so much.

Any thoughts? Am I misunderstanding something?


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom 055af903a3dbf146d97dd3fb01a6a7d3d3bd2ae0 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Tue, 18 Aug 2020 18:17:42 +0900
Subject: [PATCH] Add a function exposing memory usage of local backend.
 pg_get_backend_memory_contexts which exposes memory usage of the local
 backend. It also adds a new view pg_backend_memory_contexts for exposing
 local backend memory contexts.

---
 doc/src/sgml/catalogs.sgml   | 122 ++
 src/backend/catalog/system_views.sql |   3 +
 src/backend/utils/mmgr/mcxt.c| 147 +++
 src/include/catalog/pg_proc.dat  |   9 ++
 src/test/regress/expected/rules.out  |  10 ++
 5 files changed, 291 insertions(+)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index fc329c5cff..1232b24e74 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9226,6 +9226,11 @@ SCRAM-SHA-256$iteration count:
   available versions of extensions
  
 
+ 
+  pg_backend_memory_contexts
+  backend memory contexts
+ 
+
  
   pg_config
   compile-time configuration parameters
@@ -9577,6 

Re: Creating a function for exposing memory usage of backend process

2020-08-17 Thread Fujii Masao




On 2020/08/17 21:14, Fujii Masao wrote:



On 2020/08/11 15:24, torikoshia wrote:

On 2020-08-08 10:44, Michael Paquier wrote:

On Fri, Jul 31, 2020 at 03:23:52PM -0400, Robert Haas wrote:

On Fri, Jul 31, 2020 at 4:25 AM torikoshia  wrote:

And as Fujii-san told me in person, exposing memory address seems
not preferable considering there are security techniques like
address space layout randomization.


Yeah, exactly. ASLR wouldn't do anything to improve security if there
were no other security bugs, but there are, and some of those bugs are
harder to exploit if you don't know the precise memory addresses of
certain data structures. Similarly, exposing the addresses of our
internal data structures is harmless if we have no other security
bugs, but if we do, it might make those bugs easier to exploit. I
don't think this information is useful enough to justify taking that
risk.


FWIW, this is the class of issues where it is possible to print some
areas of memory, or even manipulate the stack so as it was possible to
pass down a custom pointer, so exposing the pointer locations is a
real risk, and this has happened in the past.  Anyway, it seems to me
that if this part is done, we could just make it superuser-only with
restrictive REVOKE privileges, but I am not sure that we have enough
user cases to justify this addition.



Thanks for your comments!

I convinced that exposing pointer locations introduce security risks
and it seems better not to do so.

And I now feel identifying exact memory context by exposing memory
address or other means seems overkill.
Showing just the context name of the parent would be sufficient and
0007 pattch takes this way.


Agreed.





On 2020-08-07 16:38, Kasahara Tatsuhito wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:    tested, passed

I tested the latest
patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
with the latest PG-version (199cec9779504c08aaa8159c6308283156547409)
and test was passed.
It looks good to me.

The new status of this patch is: Ready for Committer


Thanks for your testing!


Thanks for updating the patch! Here are the review comments.

+ 
+  pg_backend_memory_contexts
+  backend memory contexts
+ 

The above is located just after pg_matviews entry. But it should be located
just after pg_available_extension_versions entry. Because the rows in the table
"System Views" should be located in alphabetical order.


+ 
+  pg_backend_memory_contexts

Same as above.


+   The view pg_backend_memory_contexts displays all
+   the local backend memory contexts.

This description seems a bit confusing because maybe we can interpret this
as "... displays the memory contexts of all the local backends" wrongly. 
Thought?
What about the following description, instead?

     The view pg_backend_memory_contexts displays all
     the memory contexts of the server process attached to the current session.


+    const char *name = context->name;
+    const char *ident = context->ident;
+
+    if (context == NULL)
+    return;

The above check "context == NULL" is useless? If "context" is actually NULL,
"context->name" would cause segmentation fault, so ISTM that the check
will never be performed.

If "context" can be NULL, the check should be performed before accessing
to "contect". OTOH, if "context" must not be NULL per the specification of
PutMemoryContextStatsTupleStore(), assertion test checking
"context != NULL" should be used here, instead?


Here is another comment.

+   if (parent == NULL)
+   nulls[2] = true;
+   else
+   /*
+* We labeled dynahash contexts with just the hash table name.
+* To make it possible to identify its parent, we also display
+* parent's ident here.
+*/
+   if (parent->ident && strcmp(parent->name, "dynahash") == 0)
+   values[2] = CStringGetTextDatum(parent->ident);
+   else
+   values[2] = CStringGetTextDatum(parent->name);

PutMemoryContextsStatsTupleStore() doesn't need "parent" memory context,
but uses only the name of "parent" memory context. So isn't it better to use
"const char *parent" instead of "MemoryContext parent", as the argument of
the function? If we do that, we can simplify the above code.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-08-17 Thread Fujii Masao




On 2020/08/11 15:24, torikoshia wrote:

On 2020-08-08 10:44, Michael Paquier wrote:

On Fri, Jul 31, 2020 at 03:23:52PM -0400, Robert Haas wrote:

On Fri, Jul 31, 2020 at 4:25 AM torikoshia  wrote:

And as Fujii-san told me in person, exposing memory address seems
not preferable considering there are security techniques like
address space layout randomization.


Yeah, exactly. ASLR wouldn't do anything to improve security if there
were no other security bugs, but there are, and some of those bugs are
harder to exploit if you don't know the precise memory addresses of
certain data structures. Similarly, exposing the addresses of our
internal data structures is harmless if we have no other security
bugs, but if we do, it might make those bugs easier to exploit. I
don't think this information is useful enough to justify taking that
risk.


FWIW, this is the class of issues where it is possible to print some
areas of memory, or even manipulate the stack so as it was possible to
pass down a custom pointer, so exposing the pointer locations is a
real risk, and this has happened in the past.  Anyway, it seems to me
that if this part is done, we could just make it superuser-only with
restrictive REVOKE privileges, but I am not sure that we have enough
user cases to justify this addition.



Thanks for your comments!

I convinced that exposing pointer locations introduce security risks
and it seems better not to do so.

And I now feel identifying exact memory context by exposing memory
address or other means seems overkill.
Showing just the context name of the parent would be sufficient and
0007 pattch takes this way.


Agreed.





On 2020-08-07 16:38, Kasahara Tatsuhito wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:    tested, passed

I tested the latest
patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
with the latest PG-version (199cec9779504c08aaa8159c6308283156547409)
and test was passed.
It looks good to me.

The new status of this patch is: Ready for Committer


Thanks for your testing!


Thanks for updating the patch! Here are the review comments.

+ 
+  pg_backend_memory_contexts
+  backend memory contexts
+ 

The above is located just after pg_matviews entry. But it should be located
just after pg_available_extension_versions entry. Because the rows in the table
"System Views" should be located in alphabetical order.


+ 
+  pg_backend_memory_contexts

Same as above.


+   The view pg_backend_memory_contexts displays all
+   the local backend memory contexts.

This description seems a bit confusing because maybe we can interpret this
as "... displays the memory contexts of all the local backends" wrongly. 
Thought?
What about the following description, instead?

The view pg_backend_memory_contexts displays all
the memory contexts of the server process attached to the current session.


+   const char *name = context->name;
+   const char *ident = context->ident;
+
+   if (context == NULL)
+   return;

The above check "context == NULL" is useless? If "context" is actually NULL,
"context->name" would cause segmentation fault, so ISTM that the check
will never be performed.

If "context" can be NULL, the check should be performed before accessing
to "contect". OTOH, if "context" must not be NULL per the specification of
PutMemoryContextStatsTupleStore(), assertion test checking
"context != NULL" should be used here, instead?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-08-11 Thread torikoshia

On 2020-08-08 10:44, Michael Paquier wrote:

On Fri, Jul 31, 2020 at 03:23:52PM -0400, Robert Haas wrote:
On Fri, Jul 31, 2020 at 4:25 AM torikoshia 
 wrote:

And as Fujii-san told me in person, exposing memory address seems
not preferable considering there are security techniques like
address space layout randomization.


Yeah, exactly. ASLR wouldn't do anything to improve security if there
were no other security bugs, but there are, and some of those bugs are
harder to exploit if you don't know the precise memory addresses of
certain data structures. Similarly, exposing the addresses of our
internal data structures is harmless if we have no other security
bugs, but if we do, it might make those bugs easier to exploit. I
don't think this information is useful enough to justify taking that
risk.


FWIW, this is the class of issues where it is possible to print some
areas of memory, or even manipulate the stack so as it was possible to
pass down a custom pointer, so exposing the pointer locations is a
real risk, and this has happened in the past.  Anyway, it seems to me
that if this part is done, we could just make it superuser-only with
restrictive REVOKE privileges, but I am not sure that we have enough
user cases to justify this addition.



Thanks for your comments!

I convinced that exposing pointer locations introduce security risks
and it seems better not to do so.

And I now feel identifying exact memory context by exposing memory
address or other means seems overkill.
Showing just the context name of the parent would be sufficient and
0007 pattch takes this way.


On 2020-08-07 16:38, Kasahara Tatsuhito wrote:
The following review has been posted through the commitfest 
application:

make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

I tested the latest
patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
with the latest PG-version (199cec9779504c08aaa8159c6308283156547409)
and test was passed.
It looks good to me.

The new status of this patch is: Ready for Committer


Thanks for your testing!


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-08-07 Thread Michael Paquier
On Fri, Jul 31, 2020 at 03:23:52PM -0400, Robert Haas wrote:
> On Fri, Jul 31, 2020 at 4:25 AM torikoshia  wrote:
>> And as Fujii-san told me in person, exposing memory address seems
>> not preferable considering there are security techniques like
>> address space layout randomization.
> 
> Yeah, exactly. ASLR wouldn't do anything to improve security if there
> were no other security bugs, but there are, and some of those bugs are
> harder to exploit if you don't know the precise memory addresses of
> certain data structures. Similarly, exposing the addresses of our
> internal data structures is harmless if we have no other security
> bugs, but if we do, it might make those bugs easier to exploit. I
> don't think this information is useful enough to justify taking that
> risk.

FWIW, this is the class of issues where it is possible to print some
areas of memory, or even manipulate the stack so as it was possible to
pass down a custom pointer, so exposing the pointer locations is a
real risk, and this has happened in the past.  Anyway, it seems to me
that if this part is done, we could just make it superuser-only with
restrictive REVOKE privileges, but I am not sure that we have enough
user cases to justify this addition.
--
Michael


signature.asc
Description: PGP signature


Re: Creating a function for exposing memory usage of backend process

2020-08-07 Thread Kasahara Tatsuhito
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

I tested the latest 
patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch) 
with the latest PG-version (199cec9779504c08aaa8159c6308283156547409) and test 
was passed.
It looks good to me.

The new status of this patch is: Ready for Committer


Re: Creating a function for exposing memory usage of backend process

2020-07-31 Thread Robert Haas
On Fri, Jul 31, 2020 at 4:25 AM torikoshia  wrote:
> And as Fujii-san told me in person, exposing memory address seems
> not preferable considering there are security techniques like
> address space layout randomization.

Yeah, exactly. ASLR wouldn't do anything to improve security if there
were no other security bugs, but there are, and some of those bugs are
harder to exploit if you don't know the precise memory addresses of
certain data structures. Similarly, exposing the addresses of our
internal data structures is harmless if we have no other security
bugs, but if we do, it might make those bugs easier to exploit. I
don't think this information is useful enough to justify taking that
risk.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Creating a function for exposing memory usage of backend process

2020-07-31 Thread torikoshia

On 2020-07-30 15:13, Kasahara Tatsuhito wrote:

Hi,

On Fri, Jul 10, 2020 at 5:32 PM torikoshia  
wrote:
- whether information for identifying parent-child relation is 
necessary

or it's an overkill

I think it's important to understand the parent-child relationship of
the context.
Personally, I often want to know the following two things ..

- In which life cycle is the target context? (Remaining as long as the
process is living? per query?)
- Does the target context belong to the correct (parent) context?

- if this information is necessary, memory address is suitable or 
other

means like assigning unique numbers are required

IMO, If each context can be uniquely identified (or easily guessed) by
"name" and "ident",
then I don't think the address information is necessary.
Instead, I like the way that directly shows the context name of the
parent, as in the 0005 patch.


Thanks for your opinion!

I also feel it'll be sufficient to know not the exact memory context
of the parent but the name of the parent context.

And as Fujii-san told me in person, exposing memory address seems
not preferable considering there are security techniques like
address space layout randomization.




On 2020-07-10 08:30:22 +0900, torikoshia wrote:

On 2020-07-08 22:12, Fujii Masao wrote:



Another comment about parent column is: dynahash can be parent?
If yes, its indent instead of name should be displayed in parent
column?



I'm not sure yet, but considering the changes in the future, it seems
better to do so.


Attached a patch which displays ident as parent when dynahash is a
parent.

I could not find the case when dynahash can be a parent so I tested it
using attached test purposed patch.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom 055af903a3dbf146d97dd3fb01a6a7d3d3bd2ae0 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 31 Jul 2020 16:20:29 +0900
Subject: [PATCH] Add a function exposing memory usage of local backend.

This patch implements a new SQL-callable function
pg_get_backend_memory_contexts which exposes memory usage of the
local backend.
It also adds a new view pg_backend_memory_contexts for exposing
local backend memory contexts.

---
 doc/src/sgml/catalogs.sgml   | 122 +++
 src/backend/catalog/system_views.sql |   3 +
 src/backend/utils/mmgr/mcxt.c| 140 +++
 src/include/catalog/pg_proc.dat  |   9 ++
 src/test/regress/expected/rules.out  |  10 ++
 5 files changed, 284 insertions(+)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 26fda20d19..5bfc983a90 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9266,6 +9266,11 @@ SCRAM-SHA-256$iteration count:
   materialized views
  
 
+ 
+  pg_backend_memory_contexts
+  backend memory contexts
+ 
+
  
   pg_policies
   policies
@@ -10544,6 +10549,123 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  
 
+ 
+  pg_backend_memory_contexts
+
+  
+   pg_backend_memory_contexts
+  
+
+  
+   The view pg_backend_memory_contexts displays all
+   the local backend memory contexts.
+  
+  
+   pg_backend_memory_contexts contains one row
+   for each memory context.
+  
+
+  
+   pg_backend_memory_contexts Columns
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   name text
+  
+  
+   Name of the memory context
+  
+ 
+
+ 
+  
+   ident text
+  
+  
+   Identification information of the memory context. This field is truncated at 1024 bytes
+  
+ 
+
+ 
+  
+   parent text
+  
+  
+   Name of the parent of this memory context
+  
+ 
+
+ 
+  
+   level int4
+  
+  
+   Distance from TopMemoryContext in context tree
+  
+ 
+
+ 
+  
+   total_bytes int8
+  
+  
+   Total bytes allocated for this memory context
+  
+ 
+
+ 
+  
+   total_nblocks int8
+  
+  
+   Total number of blocks allocated for this memory context
+  
+ 
+
+ 
+  
+   free_bytes int8
+  
+  
+   Free space in bytes
+  
+ 
+
+ 
+  
+   free_chunks int8
+  
+  
+   Total number of free chunks
+  
+ 
+
+ 
+  
+   used_bytes int8
+  
+  
+   Used space in bytes
+  
+ 
+
+   
+  
+
+ 
+
  
   pg_matviews
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 8625cbeab6..ba5a23ac25 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -554,6 +554,9 @@ CREATE VIEW pg_shmem_allocations AS
 REVOKE ALL ON pg_shmem_allocations FROM PUBLIC;
 REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC;
 
+CREATE VIEW pg_backend_memory_contexts AS
+SELECT * FROM 

Re: Creating a function for exposing memory usage of backend process

2020-07-30 Thread Kasahara Tatsuhito
Hi,

On Fri, Jul 10, 2020 at 5:32 PM torikoshia  wrote:
> - whether information for identifying parent-child relation is necessary
> or it's an overkill
I think it's important to understand the parent-child relationship of
the context.
Personally, I often want to know the following two things ..

- In which life cycle is the target context? (Remaining as long as the
process is living? per query?)
- Does the target context belong to the correct (parent) context?

> - if this information is necessary, memory address is suitable or other
> means like assigning unique numbers are required
IMO, If each context can be uniquely identified (or easily guessed) by
"name" and "ident",
then I don't think the address information is necessary.
Instead, I like the way that directly shows the context name of the
parent, as in the 0005 patch.

Best regards

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Creating a function for exposing memory usage of backend process

2020-07-12 Thread Fujii Masao




On 2020/07/10 17:32, torikoshia wrote:

On 2020-07-09 02:03, Andres Freund wrote:

Hi,

I think this is an incredibly useful feature.


Thanks for your kind comments and suggestion!



On 2020-07-07 22:02:10 +0900, torikoshia wrote:

> There can be multiple memory contexts with the same name. So I'm afraid
> that it's difficult to identify the actual parent memory context from
> this
> "parent" column. This is ok when logging memory contexts by calling
> MemoryContextStats() via gdb. Because child memory contexts are printed
> just under their parent, with indents. But this doesn't work in the
> view.
> To identify the actual parent memory or calculate the memory contexts
> tree
> from the view, we might need to assign unique ID to each memory context
> and display it. But IMO this is overkill. So I'm fine with current
> "parent"
> column. Thought? Do you have any better idea?

Indeed.
I also feel it's not usual to assign a unique ID, which
can vary every time the view displayed.


Hm. I wonder if we just could include the address of the context
itself. There might be reasons not to do so (e.g. security concerns
about leaked pointers making attacks easier), but I think it's worth
considering.



I tried exposing addresses of each context and their parent.
Attached a poc patch.

   =# SELECT name, address, parent_address, total_bytes FROM 
pg_backend_memory_contexts ;

    name   |  address  | parent_address | total_bytes
   --+---++-
    TopMemoryContext | 0x1280da0 |    |   80800
    TopTransactionContext    | 0x1309040 | 0x1280da0  |    8192
    Prepared Queries | 0x138a480 | 0x1280da0  |   16384
    Type information cache   | 0x134b8c0 | 0x1280da0  |   24624
    ...
    CacheMemoryContext   | 0x12cb390 | 0x1280da0  | 1048576
    CachedPlanSource | 0x13c47f0 | 0x12cb390  |    4096
    CachedPlanQuery  | 0x13c9ae0 | 0x13c47f0  |    4096
    CachedPlanSource | 0x13c7310 | 0x12cb390  |    4096
    CachedPlanQuery  | 0x13c1230 | 0x13c7310  |    4096
    ...


Now it's possible to identify the actual parent memory context even when
there are multiple memory contexts with the same name.

I'm not sure, but I'm also worrying about this might incur some security
related problems..

I'd like to hear more opinions about:

- whether information for identifying parent-child relation is necessary or 
it's an overkill
- if this information is necessary, memory address is suitable or other means 
like assigning unique numbers are required


To consider this, I'd like to know what security issue can actually
happen when memory addresses are exposed. I have no idea about this..

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-07-10 Thread torikoshia

On 2020-07-09 02:03, Andres Freund wrote:

Hi,

I think this is an incredibly useful feature.


Thanks for your kind comments and suggestion!



On 2020-07-07 22:02:10 +0900, torikoshia wrote:

> There can be multiple memory contexts with the same name. So I'm afraid
> that it's difficult to identify the actual parent memory context from
> this
> "parent" column. This is ok when logging memory contexts by calling
> MemoryContextStats() via gdb. Because child memory contexts are printed
> just under their parent, with indents. But this doesn't work in the
> view.
> To identify the actual parent memory or calculate the memory contexts
> tree
> from the view, we might need to assign unique ID to each memory context
> and display it. But IMO this is overkill. So I'm fine with current
> "parent"
> column. Thought? Do you have any better idea?

Indeed.
I also feel it's not usual to assign a unique ID, which
can vary every time the view displayed.


Hm. I wonder if we just could include the address of the context
itself. There might be reasons not to do so (e.g. security concerns
about leaked pointers making attacks easier), but I think it's worth
considering.



I tried exposing addresses of each context and their parent.
Attached a poc patch.

  =# SELECT name, address, parent_address, total_bytes FROM 
pg_backend_memory_contexts ;


   name   |  address  | parent_address | total_bytes
  --+---++-
   TopMemoryContext | 0x1280da0 ||   80800
   TopTransactionContext| 0x1309040 | 0x1280da0  |8192
   Prepared Queries | 0x138a480 | 0x1280da0  |   16384
   Type information cache   | 0x134b8c0 | 0x1280da0  |   24624
   ...
   CacheMemoryContext   | 0x12cb390 | 0x1280da0  | 1048576
   CachedPlanSource | 0x13c47f0 | 0x12cb390  |4096
   CachedPlanQuery  | 0x13c9ae0 | 0x13c47f0  |4096
   CachedPlanSource | 0x13c7310 | 0x12cb390  |4096
   CachedPlanQuery  | 0x13c1230 | 0x13c7310  |4096
   ...


Now it's possible to identify the actual parent memory context even when
there are multiple memory contexts with the same name.

I'm not sure, but I'm also worrying about this might incur some security
related problems..

I'd like to hear more opinions about:

- whether information for identifying parent-child relation is necessary 
or it's an overkill
- if this information is necessary, memory address is suitable or other 
means like assigning unique numbers are required




+/*
+ * PutMemoryContextsStatsTupleStore
+ * One recursion level for pg_get_backend_memory_contexts.
+ */
+static void
+PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
+   TupleDesc 
tupdesc, MemoryContext context,
+   MemoryContext 
parent, int level)
+{
+#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS9
+   Datum   values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+   boolnulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+   MemoryContextCounters stat;
+   MemoryContext child;
+   const char *name = context->name;
+   const char *ident = context->ident;
+
+   if (context == NULL)
+   return;
+
+   /*
+* To be consistent with logging output, we label dynahash contexts
+* with just the hash table name as with MemoryContextStatsPrint().
+*/
+   if (ident && strcmp(name, "dynahash") == 0)
+   {
+   name = ident;
+   ident = NULL;
+   }
+
+   /* Examine the context itself */
+   memset(, 0, sizeof(stat));
+   (*context->methods->stats) (context, NULL, (void *) , );
+
+   memset(values, 0, sizeof(values));
+   memset(nulls, 0, sizeof(nulls));
+
+   values[0] = CStringGetTextDatum(name);
+
+   if (ident)
+   {
+   int idlen = strlen(ident);
+   char
clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+   /*
+* Some identifiers such as SQL query string can be very long,
+* truncate oversize identifiers.
+*/
+   if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+			idlen = pg_mbcliplen(ident, idlen, 
MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);

+


Why?


As described below[1], too long messages caused problems in the past and 
now
MemoryContextStatsPrint() truncates ident, so I decided to truncate it 
also

here.

Do you think it's not necessary here?

[1] https://www.postgresql.org/message-id/12319.1521999...@sss.pgh.pa.us


Regards,


--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom 055af903a3dbf146d97dd3fb01a6a7d3d3bd2ae0 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 10 Jul 2020 17:01:01 +0900
Subject: [PATCH] Add a function 

Re: Creating a function for exposing memory usage of backend process

2020-07-10 Thread torikoshia

On 2020-07-08 22:12, Fujii Masao wrote:

Thanks for updating the patch! It basically looks good to me.

+  
+   backend memory contexts
+  

Do we need this indexterm?


Thanks! it's not necessary. I remove this indexterm.




+{ oid => '2282', descr => 'statistics: information about all memory
contexts of local backend',

Isn't it better to remove "statistics: " from the above description?


Yeah, it's my oversight.



+ 
+  role="column_definition">

+   parent text

There can be multiple memory contexts with the same name. So I'm 
afraid
that it's difficult to identify the actual parent memory context from 
this

"parent" column. This is ok when logging memory contexts by calling
MemoryContextStats() via gdb. Because child memory contexts are 
printed
just under their parent, with indents. But this doesn't work in the 
view.
To identify the actual parent memory or calculate the memory contexts 
tree
from the view, we might need to assign unique ID to each memory 
context
and display it. But IMO this is overkill. So I'm fine with current 
"parent"

column. Thought? Do you have any better idea?


Indeed.
I also feel it's not usual to assign a unique ID, which
can vary every time the view displayed.


Agreed. Displaying such ID would be more confusing to users.
Ok, let's leave the code as it is.

Another comment about parent column is: dynahash can be parent?
If yes, its indent instead of name should be displayed in parent 
column?


I'm not sure yet, but considering the changes in the future, it seems
better to do so.

But if we add information for identifying parent-child relation like the
memory address suggested from Andres, it seems not necessary.

So I'd like to go back to this point.



Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION





Re: Creating a function for exposing memory usage of backend process

2020-07-08 Thread Andres Freund
Hi,

I think this is an incredibly useful feature.


On 2020-07-07 22:02:10 +0900, torikoshia wrote:
> > There can be multiple memory contexts with the same name. So I'm afraid
> > that it's difficult to identify the actual parent memory context from
> > this
> > "parent" column. This is ok when logging memory contexts by calling
> > MemoryContextStats() via gdb. Because child memory contexts are printed
> > just under their parent, with indents. But this doesn't work in the
> > view.
> > To identify the actual parent memory or calculate the memory contexts
> > tree
> > from the view, we might need to assign unique ID to each memory context
> > and display it. But IMO this is overkill. So I'm fine with current
> > "parent"
> > column. Thought? Do you have any better idea?
> 
> Indeed.
> I also feel it's not usual to assign a unique ID, which
> can vary every time the view displayed.

Hm. I wonder if we just could include the address of the context
itself. There might be reasons not to do so (e.g. security concerns
about leaked pointers making attacks easier), but I think it's worth
considering.


> We show each context using a recursive function and this is
> a kind of depth-first search. So, as far as I understand,
> we can identify its parent using both the "parent" column
> and the order of the rows.
> 
> Documenting these things may worth for who want to identify
> the relation between parents and children.
> 
> Of course, in the relational model, the order of relation
> does not have meaning so it's also unusual in this sense..

It makes it pretty hard to write summarizing queries, so I am not a huge
fan of just relying on the order.


> +/*
> + * PutMemoryContextsStatsTupleStore
> + *   One recursion level for pg_get_backend_memory_contexts.
> + */
> +static void
> +PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
> + TupleDesc 
> tupdesc, MemoryContext context,
> + MemoryContext 
> parent, int level)
> +{
> +#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS  9
> + Datum   values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
> + boolnulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
> + MemoryContextCounters stat;
> + MemoryContext child;
> + const char *name = context->name;
> + const char *ident = context->ident;
> +
> + if (context == NULL)
> + return;
> +
> + /*
> +  * To be consistent with logging output, we label dynahash contexts
> +  * with just the hash table name as with MemoryContextStatsPrint().
> +  */
> + if (ident && strcmp(name, "dynahash") == 0)
> + {
> + name = ident;
> + ident = NULL;
> + }
> +
> + /* Examine the context itself */
> + memset(, 0, sizeof(stat));
> + (*context->methods->stats) (context, NULL, (void *) , );
> +
> + memset(values, 0, sizeof(values));
> + memset(nulls, 0, sizeof(nulls));
> +
> + values[0] = CStringGetTextDatum(name);
> +
> + if (ident)
> + {
> + int idlen = strlen(ident);
> + char
> clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
> +
> + /*
> +  * Some identifiers such as SQL query string can be very long,
> +  * truncate oversize identifiers.
> +  */
> + if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
> + idlen = pg_mbcliplen(ident, idlen, 
> MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
> +

Why?

Greetings,

Andres Freund




Re: Creating a function for exposing memory usage of backend process

2020-07-08 Thread Fujii Masao




On 2020/07/07 22:02, torikoshia wrote:

On 2020-07-06 15:16, Fujii Masao wrote:

On 2020/07/06 12:12, torikoshia wrote:

On Fri, Jul 3, 2020 at 7:33 PM Fujii Masao  wrote:

Thanks for your review!


I like more specific name like pg_backend_memory_contexts.


Agreed.

When I was trying to add this function as statistics function,
I thought that naming pg_stat_getbackend_memory_context()
might make people regarded it as a "per-backend statistics
function", whose parameter is backend ID number.
So I removed "backend", but now there is no necessity to do
so.


But I'd like to hear more opinions about the name from others.


I changed the name to pg_backend_memory_contexts for the time
being.


+1



- function name: pg_get_memory_contexts()
- source file: mainly src/backend/utils/mmgr/mcxt.c




+       Identification information of the memory context. This field is 
truncated if the identification field is longer than 1024 characters


"characters" should be "bytes"?


Fixed, but I used "characters" while referring to the
descriptions on the manual of pg_stat_activity.query
below.

| By default the query text is truncated at 1024 characters;

It has nothing to do with this thread, but considering
multibyte characters, it also may be better to change it
to "bytes".


Yeah, I agree we should write the separate patch fixing that. You will?
If not, I will do that later.


Thanks, I will try it!


Thanks!





Regarding the other comments, I revised the patch as you pointed.


Thanks for updating the patch! The patch basically looks good to me/
Here are some minor comments:

+#define MEMORY_CONTEXT_IDENT_SIZE    1024

This macro varible name sounds like the maximum allowed length of ident that
each menory context has. But actually this limits the maximum bytes of ident
to display. So I think that it's better to rename this macro to something like
MEMORY_CONTEXT_IDENT_DISPLAY_SIZE. Thought?


Agreed.
MEMORY_CONTEXT_IDENT_DISPLAY_SIZE seems more accurate.


+#define PG_GET_MEMORY_CONTEXTS_COLS    9
+    Datum    values[PG_GET_MEMORY_CONTEXTS_COLS];
+    bool    nulls[PG_GET_MEMORY_CONTEXTS_COLS];

This macro variable name should be PG_GET_BACKEND_MEMORY_CONTEXTS_COLS
for the consistency with the function name?


Thanks! Fixed it.


Thanks for updating the patch! It basically looks good to me.

+  
+   backend memory contexts
+  

Do we need this indexterm?






+{ oid => '2282', descr => 'statistics: information about all memory
contexts of local backend',

Isn't it better to remove "statistics: " from the above description?


Yeah, it's my oversight.



+ 
+  
+   parent text

There can be multiple memory contexts with the same name. So I'm afraid
that it's difficult to identify the actual parent memory context from this
"parent" column. This is ok when logging memory contexts by calling
MemoryContextStats() via gdb. Because child memory contexts are printed
just under their parent, with indents. But this doesn't work in the view.
To identify the actual parent memory or calculate the memory contexts tree
from the view, we might need to assign unique ID to each memory context
and display it. But IMO this is overkill. So I'm fine with current "parent"
column. Thought? Do you have any better idea?


Indeed.
I also feel it's not usual to assign a unique ID, which
can vary every time the view displayed.


Agreed. Displaying such ID would be more confusing to users.
Ok, let's leave the code as it is.

Another comment about parent column is: dynahash can be parent?
If yes, its indent instead of name should be displayed in parent column?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-07-07 Thread torikoshia

On 2020-07-06 15:16, Fujii Masao wrote:

On 2020/07/06 12:12, torikoshia wrote:
On Fri, Jul 3, 2020 at 7:33 PM Fujii Masao 
 wrote:


Thanks for your review!


I like more specific name like pg_backend_memory_contexts.


Agreed.

When I was trying to add this function as statistics function,
I thought that naming pg_stat_getbackend_memory_context()
might make people regarded it as a "per-backend statistics
function", whose parameter is backend ID number.
So I removed "backend", but now there is no necessity to do
so.


But I'd like to hear more opinions about the name from others.


I changed the name to pg_backend_memory_contexts for the time
being.


+1



- function name: pg_get_memory_contexts()
- source file: mainly src/backend/utils/mmgr/mcxt.c



+       Identification information of the memory context. This field 
is truncated if the identification field is longer than 1024 
characters


"characters" should be "bytes"?


Fixed, but I used "characters" while referring to the
descriptions on the manual of pg_stat_activity.query
below.

| By default the query text is truncated at 1024 characters;

It has nothing to do with this thread, but considering
multibyte characters, it also may be better to change it
to "bytes".


Yeah, I agree we should write the separate patch fixing that. You will?
If not, I will do that later.


Thanks, I will try it!


Regarding the other comments, I revised the patch as you pointed.


Thanks for updating the patch! The patch basically looks good to me/
Here are some minor comments:

+#define MEMORY_CONTEXT_IDENT_SIZE  1024

This macro varible name sounds like the maximum allowed length of ident 
that
each menory context has. But actually this limits the maximum bytes of 
ident
to display. So I think that it's better to rename this macro to 
something like

MEMORY_CONTEXT_IDENT_DISPLAY_SIZE. Thought?


Agreed.
MEMORY_CONTEXT_IDENT_DISPLAY_SIZE seems more accurate.


+#define PG_GET_MEMORY_CONTEXTS_COLS9
+   Datum   values[PG_GET_MEMORY_CONTEXTS_COLS];
+   boolnulls[PG_GET_MEMORY_CONTEXTS_COLS];

This macro variable name should be PG_GET_BACKEND_MEMORY_CONTEXTS_COLS
for the consistency with the function name?


Thanks! Fixed it.



+{ oid => '2282', descr => 'statistics: information about all memory
contexts of local backend',

Isn't it better to remove "statistics: " from the above description?


Yeah, it's my oversight.



+ 
+  role="column_definition">

+   parent text

There can be multiple memory contexts with the same name. So I'm afraid
that it's difficult to identify the actual parent memory context from 
this

"parent" column. This is ok when logging memory contexts by calling
MemoryContextStats() via gdb. Because child memory contexts are printed
just under their parent, with indents. But this doesn't work in the 
view.
To identify the actual parent memory or calculate the memory contexts 
tree

from the view, we might need to assign unique ID to each memory context
and display it. But IMO this is overkill. So I'm fine with current 
"parent"

column. Thought? Do you have any better idea?


Indeed.
I also feel it's not usual to assign a unique ID, which
can vary every time the view displayed.

We show each context using a recursive function and this is
a kind of depth-first search. So, as far as I understand,
we can identify its parent using both the "parent" column
and the order of the rows.

Documenting these things may worth for who want to identify
the relation between parents and children.

Of course, in the relational model, the order of relation
does not have meaning so it's also unusual in this sense..


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom 055af903a3dbf146d97dd3fb01a6a7d3d3bd2ae0 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 7 Jul 2020 21:20:29 +0900
Subject: [PATCH] Add a function exposing memory usage of local backend.

This patch implements a new SQL-callable function
pg_get_backend_memory_contexts which exposes memory usage of the
local backend.
It also adds a new view pg_backend_memory_contexts for exposing
local backend memory contexts.
---
 doc/src/sgml/catalogs.sgml   | 126 +
 src/backend/catalog/system_views.sql |   3 +
 src/backend/utils/mmgr/mcxt.c| 132 ++-
 src/include/catalog/pg_proc.dat  |   9 ++
 src/test/regress/expected/rules.out  |  10 ++
 5 files changed, 279 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 003d278370..174717616b 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9248,6 +9248,11 @@ SCRAM-SHA-256$iteration count:
   materialized views
  
 
+ 
+  pg_backend_memory_contexts
+  backend memory contexts
+ 
+
  
   pg_policies
   policies
@@ -10526,6 +10531,127 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  
 
+ 
+  pg_backend_memory_contexts
+

Re: Creating a function for exposing memory usage of backend process

2020-07-06 Thread Fujii Masao




On 2020/07/06 12:12, torikoshia wrote:

On Fri, Jul 3, 2020 at 7:33 PM Fujii Masao  wrote:

Thanks for your review!


I like more specific name like pg_backend_memory_contexts.


Agreed.

When I was trying to add this function as statistics function,
I thought that naming pg_stat_getbackend_memory_context()
might make people regarded it as a "per-backend statistics
function", whose parameter is backend ID number.
So I removed "backend", but now there is no necessity to do
so.


But I'd like to hear more opinions about the name from others.


I changed the name to pg_backend_memory_contexts for the time
being.


+1



- function name: pg_get_memory_contexts()
- source file: mainly src/backend/utils/mmgr/mcxt.c




+       Identification information of the memory context. This field is 
truncated if the identification field is longer than 1024 characters


"characters" should be "bytes"?


Fixed, but I used "characters" while referring to the
descriptions on the manual of pg_stat_activity.query
below.

| By default the query text is truncated at 1024 characters;

It has nothing to do with this thread, but considering
multibyte characters, it also may be better to change it
to "bytes".


Yeah, I agree we should write the separate patch fixing that. You will?
If not, I will do that later.



Regarding the other comments, I revised the patch as you pointed.


Thanks for updating the patch! The patch basically looks good to me/
Here are some minor comments:

+#define MEMORY_CONTEXT_IDENT_SIZE  1024

This macro varible name sounds like the maximum allowed length of ident that
each menory context has. But actually this limits the maximum bytes of ident
to display. So I think that it's better to rename this macro to something like
MEMORY_CONTEXT_IDENT_DISPLAY_SIZE. Thought?

+#define PG_GET_MEMORY_CONTEXTS_COLS9
+   Datum   values[PG_GET_MEMORY_CONTEXTS_COLS];
+   boolnulls[PG_GET_MEMORY_CONTEXTS_COLS];

This macro variable name should be PG_GET_BACKEND_MEMORY_CONTEXTS_COLS
for the consistency with the function name?

+{ oid => '2282', descr => 'statistics: information about all memory contexts 
of local backend',

Isn't it better to remove "statistics: " from the above description?

+ 
+  
+   parent text

There can be multiple memory contexts with the same name. So I'm afraid
that it's difficult to identify the actual parent memory context from this
"parent" column. This is ok when logging memory contexts by calling
MemoryContextStats() via gdb. Because child memory contexts are printed
just under their parent, with indents. But this doesn't work in the view.
To identify the actual parent memory or calculate the memory contexts tree
from the view, we might need to assign unique ID to each memory context
and display it. But IMO this is overkill. So I'm fine with current "parent"
column. Thought? Do you have any better idea?


Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-07-05 Thread torikoshia
On Fri, Jul 3, 2020 at 7:33 PM Fujii Masao  
wrote:


Thanks for your review!


I like more specific name like pg_backend_memory_contexts.


Agreed.

When I was trying to add this function as statistics function,
I thought that naming pg_stat_getbackend_memory_context()
might make people regarded it as a "per-backend statistics
function", whose parameter is backend ID number.
So I removed "backend", but now there is no necessity to do
so.


But I'd like to hear more opinions about the name from others.


I changed the name to pg_backend_memory_contexts for the time
being.



- function name: pg_get_memory_contexts()
- source file: mainly src/backend/utils/mmgr/mcxt.c



+       Identification information of the memory context. This field 
is truncated if the identification field is longer than 1024 
characters


"characters" should be "bytes"?


Fixed, but I used "characters" while referring to the
descriptions on the manual of pg_stat_activity.query
below.

| By default the query text is truncated at 1024 characters;

It has nothing to do with this thread, but considering
multibyte characters, it also may be better to change it
to "bytes".


Regarding the other comments, I revised the patch as you pointed.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom 055af903a3dbf146d97dd3fb01a6a7d3d3bd2ae0 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 6 Jul 2020 10:50:49 +0900
Subject: [PATCH] Add a function exposing memory usage of local backend.

Previously, the only way to examine the usage of backend processes
was attaching a debugger and call MemoryContextStats() and it was
not so convenient in some cases.

This patch implements a new SQL-callable function
pg_get_backend_memory_contexts which exposes memory usage of the
local backend.
It also adds a new view pg_backend_memory_contexts for exposing
local backend memory contexts.
---
 doc/src/sgml/catalogs.sgml   | 126 +
 src/backend/catalog/system_views.sql |   3 +
 src/backend/utils/mmgr/mcxt.c| 132 ++-
 src/include/catalog/pg_proc.dat  |   9 ++
 src/test/regress/expected/rules.out  |  10 ++
 5 files changed, 279 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 49a881b262..abc097179f 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9248,6 +9248,11 @@ SCRAM-SHA-256$iteration count:
   materialized views
  
 
+ 
+  pg_backend_memory_contexts
+  backend memory contexts
+ 
+
  
   pg_policies
   policies
@@ -10526,6 +10531,127 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  
 
+ 
+  pg_backend_memory_contexts
+
+  
+   pg_backend_memory_contexts
+  
+
+  
+   backend memory contexts
+  
+
+  
+   The view pg_backend_memory_contexts displays all
+   the local backend memory contexts.
+  
+  
+   pg_backend_memory_contexts contains one row
+   for each memory context.
+  
+
+  
+   pg_backend_memory_contexts Columns
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   name text
+  
+  
+   Name of the memory context
+  
+ 
+
+ 
+  
+   ident text
+  
+  
+   Identification information of the memory context. This field is truncated at 1024 bytes
+  
+ 
+
+ 
+  
+   parent text
+  
+  
+   Name of the parent of this memory context
+  
+ 
+
+ 
+  
+   level int4
+  
+  
+   Distance from TopMemoryContext in context tree
+  
+ 
+
+ 
+  
+   total_bytes int8
+  
+  
+   Total bytes allocated for this memory context
+  
+ 
+
+ 
+  
+   total_nblocks int8
+  
+  
+   Total number of blocks allocated for this memory context
+  
+ 
+
+ 
+  
+   free_bytes int8
+  
+  
+   Free space in bytes
+  
+ 
+
+ 
+  
+   free_chunks int8
+  
+  
+   Total number of free chunks
+  
+ 
+
+ 
+  
+   used_bytes int8
+  
+  
+   Used space in bytes
+  
+ 
+
+   
+  
+
+ 
+
  
   pg_matviews
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5314e9348f..403954039b 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -554,6 +554,9 @@ CREATE VIEW pg_shmem_allocations AS
 REVOKE ALL ON pg_shmem_allocations FROM PUBLIC;
 REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC;
 
+CREATE VIEW pg_backend_memory_contexts AS
+SELECT * FROM pg_get_backend_memory_contexts();
+
 -- Statistics views
 
 CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index abda22fa57..b0732ec560 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ 

Re: Creating a function for exposing memory usage of backend process

2020-07-03 Thread Fujii Masao




On 2020/07/03 11:45, torikoshia wrote:

On Wed, Jul 1, 2020 at 10:15 PM torikoshia  wrote:

I'm going to do some renaming and transportations.

- view name: pg_memory_contexts


I like more specific name like pg_backend_memory_contexts.
But I'd like to hear more opinions about the name from others.



- function name: pg_get_memory_contexts()
- source file: mainly src/backend/utils/mmgr/mcxt.c


Attached an updated patch.


Thanks for updating the patch!

+   level integer

In catalog.sgml, "int4" and "int8" are used in other catalogs tables.
So "integer" in the above should be "int4"?

+   total_bytes bigint

"bigint" should be "int8"?

+   Identification information of the memory context. This field is 
truncated if the identification field is longer than 1024 characters

"characters" should be "bytes"?

It's a bit confusing to have both "This field" and "the identification field"
in one description. What about just "This field is truncated at 1024 bytes"?

+  
+   Total bytes requested from malloc

Isn't it better not to use "malloc" in the description? For example,
what about something like "Total bytes allocated for this memory context"?

+#define PG_STAT_GET_MEMORY_CONTEXT_COLS9

Isn't it better to rename this to PG_GET_MEMORY_CONTEXTS_COLS
for the consistency with the function name?

+   memset(nulls, 0, sizeof(nulls));

"values[]" also should be initialized with zero?

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-07-02 Thread torikoshia
On Wed, Jul 1, 2020 at 10:15 PM torikoshia  
wrote:

I'm going to do some renaming and transportations.

- view name: pg_memory_contexts
- function name: pg_get_memory_contexts()
- source file: mainly src/backend/utils/mmgr/mcxt.c


Attached an updated patch.

On Wed, Jul 1, 2020 at 10:58 PM Fujii Masao 
 wrote:

Ok, understood! I agree that it's strange to display different names
for the same memory context between this view and logging.

It's helpful if the comment there refers to MemoryContextStatsPrint()
and mentions the reason that you told.


Agreed. I changed the comments.


> It also derived from MemoryContextStatsPrint().
> I suppose it is for keeping readability of the log..

Understood. I may want to change the upper limit of query size to 
display.

But at the first step, I'm fine with limitting 1024 bytes.


Thanks, I've left it as it was.


> I'm going to follow the discussion on the mailing list and find why
> these codes were introduced.

https://www.postgresql.org/message-id/12319.1521999065%40sss.pgh.pa.us


Thanks for sharing!


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom 055af903a3dbf146d97dd3fb01a6a7d3d3bd2ae0 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Wed, 3 Jul 2020 11:00:42 +0900
Subject: [PATCH] Add a function exposing memory usage of local backend.

Previously, the only way to examine the usage of backend processes
was attaching a debugger and call MemoryContextStats() and it was
not so convenient in some cases.

This patch implements a new SQL-callable function
pg_get_memory_contexts which exposes memory usage of the local
backend.
It also adds a new view pg_memory_contexts for exposing local
backend memory contexts.
---
 doc/src/sgml/catalogs.sgml   | 126 ++
 src/backend/catalog/system_views.sql |   3 +
 src/backend/utils/mmgr/mcxt.c| 130 ++-
 src/include/catalog/pg_proc.dat  |   9 ++
 src/test/regress/expected/rules.out  |  10 +++
 5 files changed, 277 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 49a881b262..e7ec822139 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9248,6 +9248,11 @@ SCRAM-SHA-256$iteration count:
   materialized views
  
 
+ 
+  pg_memory_contexts
+  backend memory contexts
+ 
+
  
   pg_policies
   policies
@@ -10526,6 +10531,127 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  
 
+ 
+  pg_memory_contexts
+
+  
+   pg_memory_contexts
+  
+
+  
+   backend memory contexts
+  
+
+  
+   The view pg_memory_contexts displays all
+   the local backend memory contexts.
+  
+  
+   pg_memory_contexts contains one row
+   for each memory context.
+  
+
+  
+   pg_memory_contexts Columns
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   name text
+  
+  
+   Name of the memory context
+  
+ 
+
+ 
+  
+   ident text
+  
+  
+   Identification information of the memory context. This field is truncated if the identification field is longer than 1024 characters
+  
+ 
+
+ 
+  
+   parent text
+  
+  
+   Name of the parent of this memory context
+  
+ 
+
+ 
+  
+   level integer
+  
+  
+   Distance from TopMemoryContext in context tree
+  
+ 
+
+ 
+  
+   total_bytes bigint
+  
+  
+   Total bytes requested from malloc
+  
+ 
+
+ 
+  
+   total_nblocks bigint
+  
+  
+   Total number of malloc blocks
+  
+ 
+
+ 
+  
+   free_bytes bigint
+  
+  
+   Free space in bytes
+  
+ 
+
+ 
+  
+   free_chunks bigint
+  
+  
+   Total number of free chunks
+  
+ 
+
+ 
+  
+   used_bytes bigint
+  
+  
+   Used space in bytes
+  
+ 
+
+   
+  
+
+ 
+
  
   pg_matviews
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5314e9348f..79d44842c6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -554,6 +554,9 @@ CREATE VIEW pg_shmem_allocations AS
 REVOKE ALL ON pg_shmem_allocations FROM PUBLIC;
 REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC;
 
+CREATE VIEW pg_memory_contexts AS
+SELECT * FROM pg_get_memory_contexts();
+
 -- Statistics views
 
 CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index abda22fa57..76fd6f9a6a 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -21,12 +21,13 @@
 
 #include "postgres.h"
 
+#include "funcapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "utils/builtins.h"
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
 
-
 

Re: Creating a function for exposing memory usage of backend process

2020-07-01 Thread Fujii Masao




On 2020/07/01 22:15, torikoshia wrote:

On Wed, Jul 1, 2020 at 4:43 PM Fujii Masao  wrote:

Thanks for reviewing!


You treat pg_stat_local_memory_contexts view as a dynamic statistics view.
But isn't it better to treat it as just system view like pg_shmem_allocations
or pg_prepared_statements  because it's not statistics information? If yes,
we would need to rename the view, move the documentation from
monitoring.sgml to catalogs.sgml, and move the code from pgstat.c to
the more appropriate source file.


Agreed.
At first, I thought not only statistical but dynamic information about exactly
what is going on was OK when reading the sentence on the manual below.


PostgreSQL also supports reporting dynamic information about exactly what is 
going on in the system right now, such as the exact command currently being 
executed by other server processes, and which other connections exist in the 
system. This facility is independent of the collector process.


However, now I feel something strange because existing pg_stats_* views seem
to be per cluster information but the view I'm adding is about per backend
information.

I'm going to do some renaming and transportations.

- view name: pg_memory_contexts
- function name: pg_get_memory_contexts()
- source file: mainly src/backend/utils/mmgr/mcxt.c



+       tupdesc = rsinfo->setDesc;
+       tupstore = rsinfo->setResult;

These seem not to be necessary.


Thanks!


+       /*
+        * It seems preferable to label dynahash contexts with just the hash 
table
+        * name.  Those are already unique enough, so the "dynahash" part isn't
+        * very helpful, and this way is more consistent with pre-v11 practice.
+        */
+       if (ident && strcmp(name, "dynahash") == 0)
+       {
+               name = ident;
+               ident = NULL;
+       }

IMO it seems better to report both name and ident even in the case of
dynahash than report only ident (as name). We can easily understand
the context is used for dynahash when it's reported. But you think it's
better to report NULL rather than "dynahash"?


These codes come from MemoryContextStatsPrint() and my intension is to
keep consistent with it.


Ok, understood! I agree that it's strange to display different names
for the same memory context between this view and logging.

It's helpful if the comment there refers to MemoryContextStatsPrint()
and mentions the reason that you told.





+/* --
+ * The max bytes for showing identifiers of MemoryContext.
+ * --
+ */
+#define MEMORY_CONTEXT_IDENT_SIZE      1024

Do we really need this upper size limit? Could you tell me why
this is necessary?


It also derived from MemoryContextStatsPrint().
I suppose it is for keeping readability of the log..


Understood. I may want to change the upper limit of query size to display.
But at the first step, I'm fine with limitting 1024 bytes.




I'm going to follow the discussion on the mailing list and find why
these codes were introduced.


https://www.postgresql.org/message-id/12319.1521999065%40sss.pgh.pa.us

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-07-01 Thread torikoshia

On 2020-07-01 20:47, Daniel Gustafsson wrote:

For the next version (if there will be one), please remove the 
catversion bump
from the patch as it will otherwise just break patch application 
without
constant rebasing (as it's done now).  The committer will handle the 
catversion

change if/when it gets committed.

cheers ./daniel


Thanks for telling me it!
I'll do that way next time.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-07-01 Thread torikoshia
On Wed, Jul 1, 2020 at 4:43 PM Fujii Masao  
wrote:


Thanks for reviewing!

You treat pg_stat_local_memory_contexts view as a dynamic statistics 
view.
But isn't it better to treat it as just system view like 
pg_shmem_allocations
or pg_prepared_statements  because it's not statistics information? If 
yes,

we would need to rename the view, move the documentation from
monitoring.sgml to catalogs.sgml, and move the code from pgstat.c to
the more appropriate source file.


Agreed.
At first, I thought not only statistical but dynamic information about 
exactly

what is going on was OK when reading the sentence on the manual below.

PostgreSQL also supports reporting dynamic information about exactly 
what is going on in the system right now, such as the exact command 
currently being executed by other server processes, and which other 
connections exist in the system. This facility is independent of the 
collector process.


However, now I feel something strange because existing pg_stats_* views 
seem
to be per cluster information but the view I'm adding is about per 
backend

information.

I'm going to do some renaming and transportations.

- view name: pg_memory_contexts
- function name: pg_get_memory_contexts()
- source file: mainly src/backend/utils/mmgr/mcxt.c



+       tupdesc = rsinfo->setDesc;
+       tupstore = rsinfo->setResult;

These seem not to be necessary.


Thanks!


+       /*
+        * It seems preferable to label dynahash contexts with just the 
hash table
+        * name.  Those are already unique enough, so the "dynahash" 
part isn't
+        * very helpful, and this way is more consistent with pre-v11 
practice.

+        */
+       if (ident && strcmp(name, "dynahash") == 0)
+       {
+               name = ident;
+               ident = NULL;
+       }

IMO it seems better to report both name and ident even in the case of
dynahash than report only ident (as name). We can easily understand
the context is used for dynahash when it's reported. But you think it's
better to report NULL rather than "dynahash"?


These codes come from MemoryContextStatsPrint() and my intension is to
keep consistent with it.


+/* --
+ * The max bytes for showing identifiers of MemoryContext.
+ * --
+ */
+#define MEMORY_CONTEXT_IDENT_SIZE      1024

Do we really need this upper size limit? Could you tell me why
this is necessary?


It also derived from MemoryContextStatsPrint().
I suppose it is for keeping readability of the log..

I'm going to follow the discussion on the mailing list and find why
these codes were introduced.
If there's no important reason to do the same in our context, I'll
change them.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-07-01 Thread Daniel Gustafsson
> On 1 Jul 2020, at 07:48, torikoshia  wrote:

> Attached a patch for creating a view for local memory context
> and its explanation on the document.

For the next version (if there will be one), please remove the catversion bump
from the patch as it will otherwise just break patch application without
constant rebasing (as it's done now).  The committer will handle the catversion
change if/when it gets committed.

cheers ./daniel



Re: Creating a function for exposing memory usage of backend process

2020-07-01 Thread Fujii Masao




On 2020/07/01 14:48, torikoshia wrote:

On Mon, Jun 29, 2020 at 3:13 PM Fujii Masao  wrote:


Could you add this patch to Commitfest 2020-07?


Thanks for notifying, I've added it.
BTW, I registered you as an author because this patch used
lots of pg_cheat_funcs' codes.

   https://commitfest.postgresql.org/28/2622/


Thanks!




This patch provides only the function, but isn't it convenient to
provide the view like pg_shmem_allocations?



Sounds good. But isn't it better to document each column?
Otherwise, users cannot undersntad what "ident" column indicates.


Agreed.
Attached a patch for creating a view for local memory context
and its explanation on the document.


Thanks for updating the patch!

You treat pg_stat_local_memory_contexts view as a dynamic statistics view.
But isn't it better to treat it as just system view like pg_shmem_allocations
or pg_prepared_statements  because it's not statistics information? If yes,
we would need to rename the view, move the documentation from
monitoring.sgml to catalogs.sgml, and move the code from pgstat.c to
the more appropriate source file.

+   tupdesc = rsinfo->setDesc;
+   tupstore = rsinfo->setResult;

These seem not to be necessary.

+   /*
+* It seems preferable to label dynahash contexts with just the hash 
table
+* name.  Those are already unique enough, so the "dynahash" part isn't
+* very helpful, and this way is more consistent with pre-v11 practice.
+*/
+   if (ident && strcmp(name, "dynahash") == 0)
+   {
+   name = ident;
+   ident = NULL;
+   }

IMO it seems better to report both name and ident even in the case of
dynahash than report only ident (as name). We can easily understand
the context is used for dynahash when it's reported. But you think it's
better to report NULL rather than "dynahash"?

+/* --
+ * The max bytes for showing identifiers of MemoryContext.
+ * --
+ */
+#define MEMORY_CONTEXT_IDENT_SIZE  1024

Do we really need this upper size limit? Could you tell me why
this is necessary?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-06-30 Thread torikoshia
On Mon, Jun 29, 2020 at 3:13 PM Fujii Masao 
 wrote:



Could you add this patch to Commitfest 2020-07?


Thanks for notifying, I've added it.
BTW, I registered you as an author because this patch used
lots of pg_cheat_funcs' codes.

  https://commitfest.postgresql.org/28/2622/


This patch provides only the function, but isn't it convenient to
provide the view like pg_shmem_allocations?



Sounds good. But isn't it better to document each column?
Otherwise, users cannot undersntad what "ident" column indicates.


Agreed.
Attached a patch for creating a view for local memory context
and its explanation on the document.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom 055af903a3dbf146d97dd3fb01a6a7d3d3bd2ae0 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Wed, 29 Jul 2020 13:02:29 +0900
Subject: [PATCH] Add a function exposing memory usage of local backend.

Backend processes sometimes use a lot of memory because of various
reasons like caches, prepared statements and cursors.

Previously, the only way to examine the usage of backend processes
was attaching a debugger and call MemoryContextStats() and it was
not so convenient in some cases.

This patch implements a new SQL-callable function
pg_stat_get_memory_contexts which exposes memory usage of the
local backend.
It also adds a new view pg_stat_local_memory_contexts for exposing
local backend memory contexts.
---
 doc/src/sgml/monitoring.sgml | 137 +++
 src/backend/catalog/system_views.sql |  13 +++
 src/backend/postmaster/pgstat.c  |  80 
 src/backend/utils/adt/pgstatfuncs.c  |  45 +
 src/include/catalog/catversion.h |   2 +-
 src/include/catalog/pg_proc.dat  |   9 ++
 src/include/pgstat.h |   6 +-
 src/test/regress/expected/rules.out  |  10 ++
 8 files changed, 300 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dfa9d0d641..9b82af54d0 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -350,6 +350,15 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   
  
 
+ 
+  pg_stat_local_memory_contextspg_stat_local_memory_contexts
+  One row per memory context, showing information about
+   the current memory context on the local backend.
+   See 
+   pg_stat_local_memory_contexts for details.
+  
+ 
+
  
   pg_stat_progress_analyzepg_stat_progress_analyze
   One row for each backend (including autovacuum worker processes) running
@@ -3053,6 +3062,120 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  
 
+ 
+  pg_stat_local_memory_contexts
+
+  
+   pg_stat_local_memory_contexts
+  
+
+  
+   The pg_stat_local_memory_contexts view will
+   contain one row per memory context, showing information about the
+   current memory context on the local backend.
+  
+
+  
+   pg_stat_local_memory_contexts View
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   name text
+  
+  
+   Name of the context
+  
+ 
+
+ 
+  
+   ident text
+  
+  
+   Identification information of the context. This field is truncated if the identification field is longer than MEMORY_CONTEXT_IDENT_SIZE (1024 characters in a standard build)
+  
+ 
+
+ 
+  
+   parent text
+  
+  
+   Name of the parent of this context
+  
+ 
+
+ 
+  
+   level integer
+  
+  
+   Distance from TopMemoryContext in context tree
+  
+ 
+
+ 
+  
+   total_bytes bigint
+  
+  
+   Total bytes requested from malloc
+  
+ 
+
+ 
+  
+   total_nblocks bigint
+  
+  
+   Total number of malloc blocks
+  
+ 
+
+ 
+  
+   free_bytes bigint
+  
+  
+   Free space in bytes
+  
+ 
+
+ 
+  
+   free_chunks bigint
+  
+  
+   Total number of free chunks
+  
+ 
+
+ 
+  
+   used_bytes bigint
+  
+  
+   Used space in bytes
+  
+ 
+
+   
+  
+
+ 
+
  
   pg_stat_archiver
 
@@ -4617,6 +4740,20 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i

   
 
+  
+   
+
+ pg_stat_get_memory_contexts
+
+pg_stat_get_memory_contexts ()
+setof record
+   
+   
+Returns records of information about all memory contexts of the
+local backend.
+   
+  
+
   

 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5314e9348f..134b20f13f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -793,6 +793,19 @@ CREATE VIEW pg_stat_replication AS
 JOIN 

Re: Creating a function for exposing memory usage of backend process

2020-06-29 Thread Bharath Rupireddy
> > If there is no such way to know the cache sizes and other info such as
> > statistics, number of entries, cache misses, hits etc.  can the
> > approach discussed here be applied?
> I think it's partially yes.
>

> > If the user knows the cache statistics and other information, may be
> > we can allow user to take appropriate actions such as allowing him to
> > delete few entries through a command or some other way.
> Yeah, one of the purposes of the features we are discussing here is to
> use them for such situation.
>

Thanks Kasahara for the response. I will try to understand more about
getting the cache statistics and
also will study the possibility of applying this approach being
discussed here in this thread.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Creating a function for exposing memory usage of backend process

2020-06-29 Thread Fujii Masao




On 2020/06/29 12:01, torikoshia wrote:

On 2020-06-20 03:11, Robert Haas wrote:

On Wed, Jun 17, 2020 at 11:56 PM Fujii Masao
 wrote:

> As a first step, to deal with (3) I'd like to add
> pg_stat_get_backend_memory_context() which target is limited to the
> local backend process.

+1


+1 from me, too.


Attached a patch that adds a function exposing memory usage of local
backend.


Thanks for the patch!
Could you add this patch to Commitfest 2020-07?



It's almost same as pg_cheat_funcs's pg_stat_get_memory_context().


This patch provides only the function, but isn't it convenient to
provide the view like pg_shmem_allocations?



I've also added MemoryContexts identifier because it seems useful to
distinguish the same kind of memory contexts.


Sounds good. But isn't it better to document each column?
Otherwise, users cannot undersntad what "ident" column indicates.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-06-28 Thread torikoshia

On 2020-06-20 03:11, Robert Haas wrote:

On Wed, Jun 17, 2020 at 11:56 PM Fujii Masao
 wrote:

> As a first step, to deal with (3) I'd like to add
> pg_stat_get_backend_memory_context() which target is limited to the
> local backend process.

+1


+1 from me, too.


Attached a patch that adds a function exposing memory usage of local
backend.

It's almost same as pg_cheat_funcs's pg_stat_get_memory_context().
I've also added MemoryContexts identifier because it seems useful to
distinguish the same kind of memory contexts.

For example, when there are many prepared statements we can
distinguish them using identifiers, which shows the cached query.

  =# SELECT name, ident FROM pg_stat_get_memory_contexts() WHERE name = 
'CachedPlanSource';

 name   |ident
  --+
   CachedPlanSource | PREPARE q1(text) AS SELECT ..;
   CachedPlanSource | PREPARE q2(text) AS SELECT ..;
  (2 rows)



Something that exposed this via shared memory would
be quite useful, and there are other things we'd like to expose
similarly, such as the plan(s) from the running queries, or even just
the untruncated query string(s). I'd like to have a good
infrastructure for that sort of thing, but I think it's quite tricky
to do properly. You need a variable-size chunk of shared memory, so
probably it has to use DSM somehow, and you need to update it as
things change, but if you update it too frequently performance will
stink. If you ping processes to do the updates, how do you know when
they've completed them, and what happens if they don't respond in a
timely fashion? These are probably all solvable problems, but I don't
think they are very easy ones.


Thanks for your comments!

It seems hard as you pointed out.
I'm going to consider it along with the advice of Fujii and Kasahara.

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom 055af903a3dbf146d97dd3fb01a6a7d3d3bd2ae0 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 29 Jun 2020 07:48:49 +0900
Subject: [PATCH] Add a function exposing memory usage of local backend.

Backend processes sometimes use a lot of memory because of various
reasons like caches, prepared statements and cursors.

Previously, the only way to examine the usage of backend processes
was attaching a debugger and call MemoryContextStats() and it was
not so convenient in some cases.

This patch implements a new SQL-callable function
pg_stat_get_memory_contexts() which exposes memory usage of the
local backend.
---
 doc/src/sgml/monitoring.sgml| 14 +
 src/backend/postmaster/pgstat.c | 80 +
 src/backend/utils/adt/pgstatfuncs.c | 45 
 src/include/catalog/pg_proc.dat |  9 
 src/include/pgstat.h|  6 ++-
 5 files changed, 153 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dfa9d0d641..74c8663981 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4617,6 +4617,20 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i

   
 
+  
+   
+
+ pg_stat_get_memory_contexts
+
+pg_stat_get_memory_contexts ()
+setof record
+   
+   
+Returns records of information about all memory contexts of the
+local backend.
+   
+  
+
   

 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index c022597bc0..140f111654 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -62,6 +62,7 @@
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "utils/ascii.h"
+#include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -167,6 +168,13 @@ static const char *const slru_names[] = {
  */
 static PgStat_MsgSLRU SLRUStats[SLRU_NUM_ELEMENTS];
 
+/* --
+ * The max bytes for showing identifiers of MemoryContext.
+ * --
+ */
+#define MEMORY_CONTEXT_IDENT_SIZE	1024
+
+
 /* --
  * Local data
  * --
@@ -2691,6 +2699,78 @@ pgstat_fetch_slru(void)
 	return slruStats;
 }
 
+void
+pgstat_put_memory_contexts_tuplestore(Tuplestorestate *tupstore,
+  TupleDesc tupdesc, MemoryContext context,
+  MemoryContext parent, int level)
+{
+#define PG_STAT_GET_MEMORY_CONTEXT_COLS	9
+	Datum		values[PG_STAT_GET_MEMORY_CONTEXT_COLS];
+	bool		nulls[PG_STAT_GET_MEMORY_CONTEXT_COLS];
+	MemoryContextCounters stat;
+	MemoryContext child;
+	const char *name = context->name;
+	const char *ident = context->ident;
+
+	if (context == NULL)
+		return;
+
+	/*
+	 * It seems preferable to label dynahash contexts with just the hash table
+	 * name.  Those are already unique enough, so the "dynahash" part isn't
+	 * very helpful, and this way is more consistent with pre-v11 practice.
+	 */
+	if (ident && strcmp(name, "dynahash") 

Re: Creating a function for exposing memory usage of backend process

2020-06-26 Thread Kasahara Tatsuhito
Hi,

On Fri, Jun 26, 2020 at 3:42 PM Bharath Rupireddy
 wrote:
> While going through the mail chain on relation, plan and catalogue
> caching [1], I'm thinking on the lines that is there a way to know the
> current relation, plan and catalogue cache sizes? If there is a way
> already,  please ignore this and it would be grateful if someone point
> me to that.
AFAIK the only way to get statistics on PostgreSQL's backend  internal
local memory usage is to use MemoryContextStats() via gdb to output
the information to the log, so far.

> If there is no such way to know the cache sizes and other info such as
> statistics, number of entries, cache misses, hits etc.  can the
> approach discussed here be applied?
I think it's partially yes.

> If the user knows the cache statistics and other information, may be
> we can allow user to take appropriate actions such as allowing him to
> delete few entries through a command or some other way.
Yeah, one of the purposes of the features we are discussing here is to
use them for such situation.

Regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Creating a function for exposing memory usage of backend process

2020-06-26 Thread Bharath Rupireddy
Hi,

While going through the mail chain on relation, plan and catalogue
caching [1], I'm thinking on the lines that is there a way to know the
current relation, plan and catalogue cache sizes? If there is a way
already,  please ignore this and it would be grateful if someone point
me to that.

Posting this here as I felt it's relevant.

If there is no such way to know the cache sizes and other info such as
statistics, number of entries, cache misses, hits etc.  can the
approach discussed here be applied?

If the user knows the cache statistics and other information, may be
we can allow user to take appropriate actions such as allowing him to
delete few entries through a command or some other way.

I'm sorry, If I'm diverting the topic being discussed in this mail
thread, please ignore if it is irrelevant.

[1] - 
https://www.postgresql.org/message-id/flat/20161219.201505.11562604.horiguchi.kyotaro%40lab.ntt.co.jp

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Creating a function for exposing memory usage of backend process

2020-06-25 Thread Kasahara Tatsuhito
Hi !

On Thu, Jun 18, 2020 at 12:56 PM Fujii Masao
 wrote:
> Agreed. The feature to view how local memory contexts are used in
> each process is very useful!
+1

> >=# SELECT * FROM pg_stat_get_backend_memory_context(PID);
>
> I'm afraid that this interface is not convenient when we want to monitor
> the usages of local memory contexts for all the processes. For example,
> I'd like to monitor how much memory is totally used to store prepared
> statements information. For that purpose, I wonder if it's more convenient
> to provide the view displaying the memory context usages for
> all the processes.
How about separating a function that examines memory consumption
trends for all processes and a function that examines memory
consumption for a particular phase of a particular process?

For the former, as Fujii said, the function shows the output limited
information for each context type. All processes calculation and
output the information at idle status.

I think the latter is useful for debugging and other purposes.
For example, imagine preparing a function for registration like the following.
=# SELECT pg_stat_get_backend_memory_context_regist (pid, context,
max_children, calc_point)

pid: A target process
context: The top level of the context of interest
max_children: Maximum number of output for the target context's children
 (similar to MemoryContextStatsInternal()'s max_children)
calc_point: Single or multiple position(s) to calculate and output
context information
 (Existing hooks such as planner_hook, executor_start, etc.. could be used. )

This function informs the target PID to output the information of the
specified context at the specified calc_point.
When the target PID process reaches the calc_point, it calculates and
output the context information one time to a file or DSM.

(Currently PostgreSQL has no formal ways of externally modifying the
parameters of a particular process, so it may need to be
implemented...)

Sometimes I want to know the memory usage in the planning phase or
others with a query_string and/or plan_tree that before target process
move to the idle status.
So it would be nice to retrieve memory usage at some arbitrary point in time !

Regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Creating a function for exposing memory usage of backend process

2020-06-19 Thread Robert Haas
On Wed, Jun 17, 2020 at 11:56 PM Fujii Masao
 wrote:
> > As a first step, to deal with (3) I'd like to add
> > pg_stat_get_backend_memory_context() which target is limited to the
> > local backend process.
>
> +1

+1 from me, too. Something that exposed this via shared memory would
be quite useful, and there are other things we'd like to expose
similarly, such as the plan(s) from the running queries, or even just
the untruncated query string(s). I'd like to have a good
infrastructure for that sort of thing, but I think it's quite tricky
to do properly. You need a variable-size chunk of shared memory, so
probably it has to use DSM somehow, and you need to update it as
things change, but if you update it too frequently performance will
stink. If you ping processes to do the updates, how do you know when
they've completed them, and what happens if they don't respond in a
timely fashion? These are probably all solvable problems, but I don't
think they are very easy ones.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Creating a function for exposing memory usage of backend process

2020-06-17 Thread Fujii Masao




On 2020/06/17 22:00, torikoshia wrote:

Hi,

As you may know better than I do, backend processes sometimes use a lot
of memory because of the various reasons like caches, prepared
statements and cursors.
When supporting PostgreSQL, I face situations for investigating the
reason of memory bloat.

AFAIK, the way to examine it is attaching a debugger and call
MemoryContextStats(TopMemoryContext), however, I feel some difficulties
doing it:

   - some production environments don't allow us to run a debugger easily
   - many lines about memory contexts are hard to analyze


Agreed. The feature to view how local memory contexts are used in
each process is very useful!



Using an extension(pg_stat_get_memory_context() in pg_cheat_funcs[1]),
we can get the view of the memory contexts, but it's the memory contexts
of the backend which executed the pg_stat_get_memory_context().


[user interface]
If we have a function exposing memory contexts for specified PID,
we can easily examine them.
I imagine a user interface something like this:

   =# SELECT * FROM pg_stat_get_backend_memory_context(PID);


I'm afraid that this interface is not convenient when we want to monitor
the usages of local memory contexts for all the processes. For example,
I'd like to monitor how much memory is totally used to store prepared
statements information. For that purpose, I wonder if it's more convenient
to provide the view displaying the memory context usages for
all the processes.

To provide that view, all the processes need to save their local memory
context usages into the shared memory or the special files in their
convenient timing. For example, backends do that every end of query
execution (during waiting for new request from clients). OTOH,
the query on the view scans and displays all those information.

Of course there would be several issues in this idea. One issue is
the performance overhead caused when each process stores
its own memory context usage to somewhere. Even if backends do that
during waiting for new request from clients, non-negligible overhead
might happen. Performance test is necessary. Also this means that
we cannot see the memory context usage of the process in the middle of
query execution since it's saved at the end of query. If local memory bloat
occurs only during query execution and we want to investigate it, we still
need to use gdb to output the memory context information.

Another issue is that the large amount of shared memory might be
necessary to save the memory context usages for all the proceses. We can
save the usage information into the file instead, but which would cause
more overhead. If we use shared memory, the similar parameter like
track_activity_query_size might be necessary. That is, backends save
only the specified number of memory context information. If it's zero,
the feature is disabled.

Also we should reduce the same of information to save. For example,
instead of saving all memory context information like MemoryContextStats()
prints, it might be better to save the summary stats (per memory context
type) from them.




    name   |   parent   | level | total_bytes | 
total_nblocks | free_bytes | free_chunks | used_bytes | some other attibutes..
--++---+-+---++-+
  TopMemoryContext |    | 0 |   68720 | 
  5 |   9936 |  16 |  58784
  TopTransactionContext    | TopMemoryContext   | 1 |    8192 | 
  1 |   7720 |   0 |    472
  PL/pgSQL function    | TopMemoryContext   | 1 |   16384 | 
  2 |   5912 |   1 |  10472
  PL/pgSQL function    | TopMemoryContext   | 1 |   32768 | 
  3 |  15824 |   3 |  16944
  dynahash | TopMemoryContext   | 1 |    8192 | 
  1 |    512 |   0 |   7680
...


[rough implementation ideas and challenges]
I suppose communication between a process which runs
pg_stat_get_backend_memory_context()(referred to as A) and
target backend(referred to as B) is like:

   1. A sends a message to B and order to dump the memory contexts
   2. B dumps its memory contexts to some shared area
   3. A reads the shared area and returns it to the function invoker

To do so, there seem some challenges.

(1) how to share memory contexts information between backend processes
The amount of memory contexts greatly varies depending on the
situation, so it's not appropriate to share the memory contexts using
fixed shared memory.
Also using the file on 'stats_temp_directory' seems difficult thinking
about the background of the shared-memory based stats collector
proposal[2].
Instead, I'm thinking about using dsm_mq, which allows messages of
arbitrary length to be sent and receive.

(2) how to send messages wanting memory contexts
Communicating via signal 

Creating a function for exposing memory usage of backend process

2020-06-17 Thread torikoshia

Hi,

As you may know better than I do, backend processes sometimes use a lot
of memory because of the various reasons like caches, prepared
statements and cursors.
When supporting PostgreSQL, I face situations for investigating the
reason of memory bloat.

AFAIK, the way to examine it is attaching a debugger and call
MemoryContextStats(TopMemoryContext), however, I feel some difficulties
doing it:

  - some production environments don't allow us to run a debugger easily
  - many lines about memory contexts are hard to analyze

Using an extension(pg_stat_get_memory_context() in pg_cheat_funcs[1]),
we can get the view of the memory contexts, but it's the memory contexts
of the backend which executed the pg_stat_get_memory_context().


[user interface]
If we have a function exposing memory contexts for specified PID,
we can easily examine them.
I imagine a user interface something like this:

  =# SELECT * FROM pg_stat_get_backend_memory_context(PID);

   name   |   parent   | level | total_bytes | 
total_nblocks | free_bytes | free_chunks | used_bytes | some other 
attibutes..

--++---+-+---++-+
 TopMemoryContext || 0 |   68720 |   
  5 |   9936 |  16 |  58784
 TopTransactionContext| TopMemoryContext   | 1 |8192 |   
  1 |   7720 |   0 |472
 PL/pgSQL function| TopMemoryContext   | 1 |   16384 |   
  2 |   5912 |   1 |  10472
 PL/pgSQL function| TopMemoryContext   | 1 |   32768 |   
  3 |  15824 |   3 |  16944
 dynahash | TopMemoryContext   | 1 |8192 |   
  1 |512 |   0 |   7680

...


[rough implementation ideas and challenges]
I suppose communication between a process which runs
pg_stat_get_backend_memory_context()(referred to as A) and
target backend(referred to as B) is like:

  1. A sends a message to B and order to dump the memory contexts
  2. B dumps its memory contexts to some shared area
  3. A reads the shared area and returns it to the function invoker

To do so, there seem some challenges.

(1) how to share memory contexts information between backend processes
The amount of memory contexts greatly varies depending on the
situation, so it's not appropriate to share the memory contexts using
fixed shared memory.
Also using the file on 'stats_temp_directory' seems difficult thinking
about the background of the shared-memory based stats collector
proposal[2].
Instead, I'm thinking about using dsm_mq, which allows messages of
arbitrary length to be sent and receive.

(2) how to send messages wanting memory contexts
Communicating via signal seems simple but assigning a specific number
of signal for this purpose seems wasting.
I'm thinking about using fixed shared memory to put dsm_mq handle.
To send a message, A creates a dsm_mq and put the handle on the shared
memory area. When B founds a handle, B dumps the memory contexts to the
corresponding dsm_mq.

However, enabling B to find the handle needs to check the shared memory
periodically. I'm not sure the suitable location and timing for this
checking yet, and doubt this way of communication is acceptable because
it gives certain additional loads to all the backends.

(3) clarifying the necessary attributes
As far as reading the past disucussion[3], it's not so clear what kind
of information should be exposed regarding memory contexts.


As a first step, to deal with (3) I'd like to add
pg_stat_get_backend_memory_context() which target is limited to the
local backend process.


Thanks for reading and how do you think?


[1] 
https://github.com/MasaoFujii/pg_cheat_funcs#setof-record-pg_stat_get_memory_context
[2] 
https://www.postgresql.org/message-id/flat/20180629.173418.190173462.horiguchi.kyot...@lab.ntt.co.jp
[3] 
https://www.postgresql.org/message-id/20190805171608.g22gxwmfr2r7uf6t%40alap3.anarazel.de



Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION