Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row

2022-12-07 Thread Hamid Akhtar
On Wed, 7 Dec 2022 at 13:01, Alvaro Herrera  wrote:

> On 2022-Dec-06, Andres Freund wrote:
>
> > Hi,
> >
> > On 2022-11-25 02:45:01 +0500, Hamid Akhtar wrote:
> > > Rebasing the patch to the tip of the master branch.
> >
> > This doesn't pass tests on cfbot. Looks like possibly some files are
> missing?
>
> The .sql file is there all right, but meson.build is not altered to be
> aware of it.


I wasn't aware of the meson.build file. Attached is the latest version of
the patch that contains the updated meson.build.

-- 
Hamid Akhtar,
Percona LLC, www.percona.com


pageinspect_btree_multipagestats-v7.patch
Description: Binary data


Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row

2022-11-24 Thread Hamid Akhtar
On Mon, 21 Nov 2022 at 17:34, Naeem Akhter  wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   not tested
> Documentation:not tested
>
> I've tested and verified the documentation.


Rebasing the patch to the tip of the master branch.


pageinspect_btree_multipagestats-v6.patch
Description: Binary data


Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row

2022-11-10 Thread Hamid Akhtar
On Wed, 12 Oct 2022 at 10:51, Michael Paquier  wrote:

> On Mon, Sep 12, 2022 at 03:18:59PM -0400, Tom Lane wrote:
> > I spun up a 32-bit VM, since that had been on my to-do list anyway,
> > and it looks like I was right:
>
> This feedback has not been addressed and the thread is idle four
> weeks, so I have marked this CF entry as RwF.  Please feel free to
> resubmit once a new version of the patch is available.
>

Attaching version 5 of the patch, which addresses all 3 points raised by
Tom Lane earlier in the thread.
(1) Documentation is added.
(2) Passing "-1" for the number of blocks required now returns all the
remaining index pages after the starting block.
(3) The newly added test cases work for both 32-bit and 64-bit systems.
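
As a quick, hedged illustration of point (2) (the table here is made up for
illustration; the column names follow the existing bt_page_stats output):

CREATE TABLE test2 AS (SELECT generate_series(1, 5000) AS col1);
CREATE INDEX test2_col1_idx ON test2(col1);

-- stats for two index pages starting at block 1
SELECT blkno, type, live_items
FROM bt_multi_page_stats('test2_col1_idx', 1, 2);

-- with -1 as the block count, stats for all the remaining index pages
-- after the starting block are returned
SELECT blkno, type, live_items
FROM bt_multi_page_stats('test2_col1_idx', 1, -1);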



> --
> Michael
>


pageinspect_btree_multipagestats-v5.patch
Description: Binary data


Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2022-09-27 Thread Hamid Akhtar
On Tue, 26 Jul 2022 at 21:35, Simon Riggs 
wrote:

> On Fri, 15 Jul 2022 at 12:29, Aleksander Alekseev
>  wrote:
> > Personally I didn't expect that
> > merging patches in this thread would take that long. They are in
> > "Ready for Committer" state for a long time now and there are no known
> > issues with them other than unit tests for SLRU [1] should be merged
> > first.
>
> These patches look ready to me, including the SLRU tests.
>
> Even though they do very little, these patches touch many aspects of
> the code, so it would make sense to apply these as the last step in
> the CF.
>
> To encourage committers to take that next step, let's have a
> democratic vote on moving this forwards:
> +1 from me.
>

This set of patches no longer applies cleanly to the master branch. There
are lots of offset hunks as well as outright failures. Please rebase the
patches.

There are failures for multiple files including the one given below:

patching file src/backend/replication/logical/worker.c
Hunk #1 succeeded at 1089 (offset 1 line).
Hunk #2 succeeded at 1481 (offset 1 line).
Hunk #3 succeeded at 3322 (offset 2 lines).
Hunk #4 succeeded at 3493 (offset 2 lines).
Hunk #5 FAILED at 4009.
1 out of 5 hunks FAILED -- saving rejects to file
src/backend/replication/logical/worker.c.rej


Re: add checkpoint stats of snapshot and mapping files of pg_logical dir

2022-09-25 Thread Hamid Akhtar
On Sun, 28 Aug 2022 at 17:30, Nathan Bossart 
wrote:

> On Thu, Aug 25, 2022 at 10:12:00AM +0530, Bharath Rupireddy wrote:
> > Please review the v12 patch attached.
>
> LGTM.  I've marked this as ready-for-committer.
>

This patch needs an update. It is failing on Windows for the test case
"33/238 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade" with an
"exit status 2" error.

The details are available at cfbot.cputube.org:
https://cirrus-ci.com/task/5709014662119424?logs=check_world#L267

--
Hamid Akhtar,
Percona LLC, www.percona.com


Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row

2022-09-12 Thread Hamid Akhtar
On Tue, 6 Sept 2022 at 11:25, Ibrar Ahmed  wrote:

>
>
> On Mon, Aug 1, 2022 at 11:29 PM Naeem Akhter 
> wrote:
>
>> The following review has been posted through the commitfest application:
>> make installcheck-world:  tested, passed
>> Implements feature:   tested, passed
>> Spec compliant:   not tested
>> Documentation:not tested
>>
>> Looks good to me.
>>
>> The new status of this patch is: Ready for Committer
>>
>
> The patch has a compilation error on the latest code base, please rebase
> your patch.
>
> [03:08:46.087] /tmp/cirrus-ci-build/contrib/cube/cubeparse.h:77:27: error:
> ‘result’ was not declared in this scope
> [03:08:46.087] 77 | int cube_yyparse (NDBOX **result, Size scanbuflen);
> [03:08:46.087] | ^~
> [03:08:46.087] /tmp/cirrus-ci-build/contrib/cube/cubeparse.h:77:40: error:
> expected primary-expression before ‘scanbuflen’
> [03:08:46.087] 77 | int cube_yyparse (NDBOX **result, Size scanbuflen);
> ...
>
> --
> Ibrar Ahmed
>

Compilation and regression tests are working fine. I have verified this
against the tip of the master branch [commit: 57796a0f].


Re: CFM Manager

2022-08-22 Thread Hamid Akhtar
On Tue, 23 Aug 2022 at 1:26 AM, Ibrar Ahmed  wrote:

>
>
> On Mon, Aug 22, 2022 at 9:47 PM Jacob Champion 
> wrote:
>
>> On Mon, Aug 22, 2022 at 9:40 AM Tom Lane  wrote:
>> > You attribute more organization to this than actually exists ;-)
>>
>> Ha, fair enough!
>>
>> > If Ibrar wants the job I think it's his.
>>
>> Excellent. Ibrar, I'll be updating the CFM checklist [1] over the next
>> couple of weeks. I'll try to have sections of it touched up by the
>> time you're due to use them. Let me know if there's anything in
>> particular that is confusing or needs more explanation.
>>
>> Thanks,
>> --Jacob
>>
>> [1] https://wiki.postgresql.org/wiki/CommitFest_Checklist
>>
>>
>> Thanks, I will start working.
>

I’d like to assist.


>
> --
>
> Ibrar Ahmed.
> Senior Software Engineer, PostgreSQL Consultant.
>


Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row

2022-07-31 Thread Hamid Akhtar
On Sun, 31 Jul 2022 at 18:00, Hamid Akhtar  wrote:

>
>
> On Thu, 28 Jul 2022 at 00:36, Tom Lane  wrote:
>
>> Hamid Akhtar  writes:
>> > The attached patch is based on the master branch. It makes the
>> > following changes to the pageinspect contrib module:
>> > - Updates bt_page_stats_internal function to accept 3 arguments
>> > instead of 2.
>> > - The function now uses SRF macros to return a set rather than a
>> > single row. The function call now requires specifying column names.
>>
>> FWIW, I think you'd be way better off changing the function name, say
>> to bt_multi_page_stats().  Overloading the name this way is going to
>> lead to great confusion, e.g. somebody who fat-fingers the number of
>> output arguments in a JDBC call could see confusing results due to
>> invoking the wrong one of the two functions.  Also, I'm not quite sure
>> what you mean by "The function call now requires specifying column
>> names", but it doesn't sound like an acceptable restriction from a
>> compatibility standpoint.  If a different name dodges that issue then
>> it's clearly a better way.
>>
>> regards, tom lane
>>
>
> Attached please find the latest version of the patch;
> pageinspect_btree_multipagestats_02.patch.
>
> It no longer modifies the existing bt_page_stats function. Instead, it
> introduces a new function bt_multi_page_stats as you had suggested. The
> function expects three arguments: the index name, the starting block
> number, and the number of blocks for which stats are to be returned.
>
> Please ignore this statement. It was a typo.
> "The function call now requires specifying column names"
>

Attached is the rebased version of the patch
(pageinspect_btree_multipagestats_03.patch) for the master branch.


>
>
diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index 5c07365..31d7b1c 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -13,12 +13,12 @@ OBJS = \
 	rawpage.o
 
 EXTENSION = pageinspect
-DATA =  pageinspect--1.9--1.10.sql pageinspect--1.8--1.9.sql \
-	pageinspect--1.7--1.8.sql pageinspect--1.6--1.7.sql \
-	pageinspect--1.5.sql pageinspect--1.5--1.6.sql \
-	pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \
-	pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \
-	pageinspect--1.0--1.1.sql
+DATA =  pageinspect--1.10--1.11.sql pageinspect--1.9--1.10.sql \
+	pageinspect--1.8--1.9.sql pageinspect--1.7--1.8.sql \
+	pageinspect--1.6--1.7.sql pageinspect--1.5.sql \
+	pageinspect--1.5--1.6.sql pageinspect--1.4--1.5.sql \
+	pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \
+	pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql
 PGFILEDESC = "pageinspect - functions to inspect contents of database pages"
 
 REGRESS = page btree brin gin gist hash checksum oldextversions
diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 9375d55..f7ec95c 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -46,6 +46,7 @@ PG_FUNCTION_INFO_V1(bt_page_items);
 PG_FUNCTION_INFO_V1(bt_page_items_bytea);
 PG_FUNCTION_INFO_V1(bt_page_stats_1_9);
 PG_FUNCTION_INFO_V1(bt_page_stats);
+PG_FUNCTION_INFO_V1(bt_multi_page_stats);
 
 #define IS_INDEX(r) ((r)->rd_rel->relkind == RELKIND_INDEX)
 #define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
@@ -80,6 +81,26 @@ typedef struct BTPageStat
 	BTCycleId	btpo_cycleid;
 } BTPageStat;
 
+/*
+ * cross-call data structure for SRF for page stats
+ */
+typedef struct ua_page_stats
+{
+	uint64		blkno;
+	uint64		blk_count;
+}			ua_page_stats;
+
+/*
+ * cross-call data structure for SRF for page items
+ */
+typedef struct ua_page_items
+{
+	Page		page;
+	OffsetNumber offset;
+	bool		leafpage;
+	bool		rightmost;
+	TupleDesc	tupd;
+}			ua_page_items;
 
 /* -
  * GetBTPageStatistics()
@@ -177,34 +198,15 @@ GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat *stat)
 }
 
 /* ---
- * bt_page_stats()
+ * bt_index_block_validate()
  *
- * Usage: SELECT * FROM bt_page_stats('t1_pkey', 1);
+ * Validate index type is btree and block number
+ * is within a valid block number range.
  * ---
  */
-static Datum
-bt_page_stats_internal(PG_FUNCTION_ARGS, enum pageinspect_version ext_version)
+static void
+bt_index_block_validate(Relation rel, int64 blkno)
 {
-	text	   *relname = PG_GETARG_TEXT_PP(0);
-	int64		blkno = (ext_version == PAGEINSPECT_V1_8 ? PG_GETARG_UINT32(1) : PG_GETARG_INT64(1));
-	Buffer		buffer;
-	Relation	rel;
-	RangeVar   *relrv;
-	Datum		result;
-	HeapTuple	tuple;
-	TupleDesc	tupleDesc;
-	int			j;
-	char	   *values[11];
-	BTPageStat	stat;
-

Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row

2022-07-31 Thread Hamid Akhtar
On Thu, 28 Jul 2022 at 00:36, Tom Lane  wrote:

> Hamid Akhtar  writes:
> > The attached patch is based on the master branch. It makes the
> > following changes to the pageinspect contrib module:
> > - Updates bt_page_stats_internal function to accept 3 arguments
> > instead of 2.
> > - The function now uses SRF macros to return a set rather than a
> > single row. The function call now requires specifying column names.
>
> FWIW, I think you'd be way better off changing the function name, say
> to bt_multi_page_stats().  Overloading the name this way is going to
> lead to great confusion, e.g. somebody who fat-fingers the number of
> output arguments in a JDBC call could see confusing results due to
> invoking the wrong one of the two functions.  Also, I'm not quite sure
> what you mean by "The function call now requires specifying column
> names", but it doesn't sound like an acceptable restriction from a
> compatibility standpoint.  If a different name dodges that issue then
> it's clearly a better way.
>
> regards, tom lane
>

Attached please find the latest version of the patch;
pageinspect_btree_multipagestats_02.patch.

It no longer modifies the existing bt_page_stats function. Instead, it
introduces a new function bt_multi_page_stats as you had suggested. The
function expects three arguments: the index name, the starting block
number, and the number of blocks for which stats are to be returned.

Please ignore this statement. It was a typo.
"The function call now requires specifying column names"
diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index 5c07365..31d7b1c 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -13,12 +13,12 @@ OBJS = \
 	rawpage.o
 
 EXTENSION = pageinspect
-DATA =  pageinspect--1.9--1.10.sql pageinspect--1.8--1.9.sql \
-	pageinspect--1.7--1.8.sql pageinspect--1.6--1.7.sql \
-	pageinspect--1.5.sql pageinspect--1.5--1.6.sql \
-	pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \
-	pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \
-	pageinspect--1.0--1.1.sql
+DATA =  pageinspect--1.10--1.11.sql pageinspect--1.9--1.10.sql \
+	pageinspect--1.8--1.9.sql pageinspect--1.7--1.8.sql \
+	pageinspect--1.6--1.7.sql pageinspect--1.5.sql \
+	pageinspect--1.5--1.6.sql pageinspect--1.4--1.5.sql \
+	pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \
+	pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql
 PGFILEDESC = "pageinspect - functions to inspect contents of database pages"
 
 REGRESS = page btree brin gin gist hash checksum oldextversions
diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 62f2c1b..5dd216e 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -46,6 +46,7 @@ PG_FUNCTION_INFO_V1(bt_page_items);
 PG_FUNCTION_INFO_V1(bt_page_items_bytea);
 PG_FUNCTION_INFO_V1(bt_page_stats_1_9);
 PG_FUNCTION_INFO_V1(bt_page_stats);
+PG_FUNCTION_INFO_V1(bt_multi_page_stats);
 
 #define IS_INDEX(r) ((r)->rd_rel->relkind == RELKIND_INDEX)
 #define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
@@ -80,6 +81,26 @@ typedef struct BTPageStat
 	BTCycleId	btpo_cycleid;
 } BTPageStat;
 
+/*
+ * cross-call data structure for SRF for page stats
+ */
+typedef struct ua_page_stats
+{
+	uint64		blkno;
+	uint64		blk_count;
+}			ua_page_stats;
+
+/*
+ * cross-call data structure for SRF for page items
+ */
+typedef struct ua_page_items
+{
+	Page		page;
+	OffsetNumber offset;
+	bool		leafpage;
+	bool		rightmost;
+	TupleDesc	tupd;
+}			ua_page_items;
 
 /* -
  * GetBTPageStatistics()
@@ -177,34 +198,15 @@ GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat *stat)
 }
 
 /* ---
- * bt_page_stats()
+ * bt_index_block_validate()
  *
- * Usage: SELECT * FROM bt_page_stats('t1_pkey', 1);
+ * Validate index type is btree and block number
+ * is within a valid block number range.
  * ---
  */
-static Datum
-bt_page_stats_internal(PG_FUNCTION_ARGS, enum pageinspect_version ext_version)
+static void
+bt_index_block_validate(Relation rel, int64 blkno)
 {
-	text	   *relname = PG_GETARG_TEXT_PP(0);
-	int64		blkno = (ext_version == PAGEINSPECT_V1_8 ? PG_GETARG_UINT32(1) : PG_GETARG_INT64(1));
-	Buffer		buffer;
-	Relation	rel;
-	RangeVar   *relrv;
-	Datum		result;
-	HeapTuple	tuple;
-	TupleDesc	tupleDesc;
-	int			j;
-	char	   *values[11];
-	BTPageStat	stat;
-
-	if (!superuser())
-		ereport(ERROR,
-(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- errmsg("must be superuser to use pageinspect functions")));
-
-	relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
-	rel = relation_openrv(relrv, AccessShareLock);
-
 	if (!IS_INDEX(rel) || !IS_BTREE(rel))
 		ereport(ER

Re: explain analyze rows=%.0f

2022-07-26 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

The previous review was incorrectly posted. Updating the patch review.

Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row

2022-07-25 Thread Hamid Akhtar
On Fri, 1 Jul 2022 at 13:01, Drouvot, Bertrand  wrote:

> Hi,
> On 6/30/22 10:24 AM, Hamid Akhtar wrote:
>
> On Mon, 27 Jun 2022 at 15:52, Bharath Rupireddy <
> bharath.rupireddyforpostg...@gmail.com> wrote:
>
>> On Mon, Jun 27, 2022 at 1:40 PM Drouvot, Bertrand 
>> wrote:
>> >
>> > Hi,
>> >
>> > On 6/27/22 9:31 AM, Hamid Akhtar wrote:
>> >
>> >
>> > Hello Hackers,
>> >
>> > While working on one of my blogs on the B-Tree indexes, I needed to
>> look at a range of B-Tree page statistics. So the go-to solution was to use
>> pageinspect. However, reviewing stats for multiple pages meant issuing
>> multiple queries.
>>
>> +1 to improve the API.
>>
> I think it makes sense too.
>
> But what about the other pageinspect's functions that also use a single
> blkno as parameter? Should not the patch also takes care of them?
>
> I've started working on that. But it's going to be a much bigger change
with a lot of code refactoring.

So, taking this one step at a time, IMHO this patch is ready to be reviewed
now.


> Regards,
>
> Bertrand
>


Re: explain analyze rows=%.0f

2022-07-25 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   not tested
Documentation:not tested

LGTM

Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row

2022-06-30 Thread Hamid Akhtar
On Thu, 30 Jun 2022 at 14:27, Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Thu, Jun 30, 2022 at 1:54 PM Hamid Akhtar 
> wrote:
> >
> >> Do we have any
> >> difference in the execution times for the above query vs the new
> >> function introduced in the v1 patch? If there's not much difference, I
> >> would suggest adding an SQL function around the generate_series
> >> approach in the pageinspect extension for better and easier usability.
> >
> >
> > Based on some basic SQL execution time comparison of the two approaches,
> I see that the API change, on average, is around 40% faster than the SQL.
> >
> > CREATE TABLE test2 AS (SELECT generate_series(1, 500) AS col1);
> > CREATE INDEX test2_col1_idx ON test2(col1);
> >
> > EXPLAIN ANALYZE
> > SELECT * FROM bt_page_stats('test2_col1_idx', 1, 5000);
> >
> > EXPLAIN ANALYZE
> > SELECT * FROM GENERATE_SERIES(1, 5000) blkno,
> bt_page_stats('test2_col1_idx',blkno::int);
> >
> > For me, the API change returns the data in around 74ms whereas the SQL
> > returns it in 102ms. So considering this, and as you mentioned that the
> > alternative may not be that obvious to everyone, it is a fair improvement.
>
> I'm wondering what happens with a bit of huge data and different test
> cases each test case executed, say, 2 or 3 times.
>
> If the difference in execution times is always present, then the API
> approach or changing the core function would make more sense.
>

Technically, AFAIK, the performance difference will always be there.
Firstly, the API change avoids the additional overhead of the
generate_series function. Additionally, with the API change, looping over
the pages carries a smaller overhead than the SQL approach does.


>
> Regards,
> Bharath Rupireddy.
>


Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row

2022-06-30 Thread Hamid Akhtar
On Mon, 27 Jun 2022 at 15:52, Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Mon, Jun 27, 2022 at 1:40 PM Drouvot, Bertrand 
> wrote:
> >
> > Hi,
> >
> > On 6/27/22 9:31 AM, Hamid Akhtar wrote:
> >
> >
> > Hello Hackers,
> >
> > While working on one of my blogs on the B-Tree indexes, I needed to look
> at a range of B-Tree page statistics. So the go-to solution was to use
> pageinspect. However, reviewing stats for multiple pages meant issuing
> multiple queries.
>
> +1 to improve the API.
>
> > I felt that there's an opportunity for improvement in the extension by
> extending the API to output the statistics for multiple pages with a single
> query.
> >
> > The attached patch is based on the master branch. It makes the
> following changes to the pageinspect contrib module:
> > - Updates bt_page_stats_internal function to accept 3 arguments instead
> of 2.
> > - The function now uses SRF macros to return a set rather than a single
> row. The function call now requires specifying column names.
> >
> > The extension version is bumped to 1.11 (PAGEINSPECT_V1_11).
> > To maintain backward compatibility, for versions below 1.11, the
> > multi-call mechanism is ended after the first call to keep the old
> > single-row behavior consistent.
> >
> > Regression test cases for the module are updated as well as part of this
> change. Here is a subset of queries that are added to the btree.sql test
> case file for pageinspect.
> >
> > 
> > CREATE TABLE test2 AS (SELECT generate_series(1, 5000) AS col1);
> > CREATE INDEX test2_col1_idx ON test2(col1);
> > SELECT * FROM bt_page_stats('test2_col1_idx', 1, 2);
> >
> > For example, this could be written as:
> >
> > select * from
> > generate_series(1, 2) blkno ,
> > bt_page_stats('test2_col1_idx',blkno::int);
> >
> > Or, if one wants to inspect to whole relation, something like:
> >
> > select * from
> > generate_series(1, pg_relation_size('test2_col1_idx'::regclass::text) /
> 8192 - 1) blkno ,
> > bt_page_stats('test2_col1_idx',blkno::int);
>
> Good one. But not all may know the alternatives.


+1


> Do we have any
> difference in the execution times for the above query vs the new
> function introduced in the v1 patch? If there's not much difference, I
> would suggest adding an SQL function around the generate_series
> approach in the pageinspect extension for better and easier usability.
>

Based on some basic SQL execution time comparison of the two approaches, I
see that the API change, on average, is around 40% faster than the SQL.

CREATE TABLE test2 AS (SELECT generate_series(1, 500) AS col1);
CREATE INDEX test2_col1_idx ON test2(col1);

EXPLAIN ANALYZE
SELECT * FROM bt_page_stats('test2_col1_idx', 1, 5000);

EXPLAIN ANALYZE
SELECT * FROM GENERATE_SERIES(1, 5000) blkno,
bt_page_stats('test2_col1_idx',blkno::int);

For me, the API change returns back the data in around 74ms whereas the SQL
returns it in 102ms. So considering this and as you mentioned, the
alternative may not be that obvious to everyone, it is a fair improvement.


>
> Regards,
> Bharath Rupireddy.
>


Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row

2022-06-27 Thread Hamid Akhtar
Hello Hackers,

While working on one of my blogs on the B-Tree indexes
<https://www.percona.com/blog/postgresql-14-b-tree-index-reduced-bloat-with-bottom-up-deletion/>,
I needed to look at a range of B-Tree page statistics. So the go-to solution
was to use pageinspect. However, reviewing stats for multiple pages meant
issuing multiple queries. I felt that there's an opportunity for
improvement in the extension by extending the API to output the statistics
for multiple pages with a single query.

The attached patch is based on the master branch. It makes the following
changes to the pageinspect contrib module:
- Updates bt_page_stats_internal function to accept 3 arguments instead of
2.
- The function now uses SRF macros to return a set rather than a single
row. The function call now requires specifying column names.

The extension version is bumped to 1.11 (PAGEINSPECT_V1_11).
To maintain backward compatibility, for versions below 1.11, the multi-call
mechanism is ended after the first call to keep the old single-row behavior
consistent.

Regression test cases for the module are updated as well as part of this
change. Here is a subset of queries that are added to the btree.sql test
case file for pageinspect.


CREATE TABLE test2 AS (SELECT generate_series(1, 5000) AS col1);
CREATE INDEX test2_col1_idx ON test2(col1);
SELECT * FROM bt_page_stats('test2_col1_idx', 1, 2);
SELECT * FROM bt_page_stats('test2_col1_idx', 1, 0);
SELECT * FROM bt_page_stats('test2_col1_idx', 0, 1);
SELECT * FROM bt_page_stats('test2_col1_idx', 1, -1);
DROP TABLE test2;


Regards.

--
Hamid Akhtar,
Percona LLC,
URL : www.percona.com
CELL:+923335449950
SKYPE: engineeredvirus
diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index 5c0736564a..31d7b1c581 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -13,12 +13,12 @@ OBJS = \
 	rawpage.o
 
 EXTENSION = pageinspect
-DATA =  pageinspect--1.9--1.10.sql pageinspect--1.8--1.9.sql \
-	pageinspect--1.7--1.8.sql pageinspect--1.6--1.7.sql \
-	pageinspect--1.5.sql pageinspect--1.5--1.6.sql \
-	pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \
-	pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \
-	pageinspect--1.0--1.1.sql
+DATA =  pageinspect--1.10--1.11.sql pageinspect--1.9--1.10.sql \
+	pageinspect--1.8--1.9.sql pageinspect--1.7--1.8.sql \
+	pageinspect--1.6--1.7.sql pageinspect--1.5.sql \
+	pageinspect--1.5--1.6.sql pageinspect--1.4--1.5.sql \
+	pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \
+	pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql
 PGFILEDESC = "pageinspect - functions to inspect contents of database pages"
 
 REGRESS = page btree brin gin gist hash checksum oldextversions
diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 62f2c1b315..85127fcfdb 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -45,6 +45,7 @@ PG_FUNCTION_INFO_V1(bt_page_items_1_9);
 PG_FUNCTION_INFO_V1(bt_page_items);
 PG_FUNCTION_INFO_V1(bt_page_items_bytea);
 PG_FUNCTION_INFO_V1(bt_page_stats_1_9);
+PG_FUNCTION_INFO_V1(bt_page_stats_1_11);
 PG_FUNCTION_INFO_V1(bt_page_stats);
 
 #define IS_INDEX(r) ((r)->rd_rel->relkind == RELKIND_INDEX)
@@ -80,6 +81,26 @@ typedef struct BTPageStat
 	BTCycleId	btpo_cycleid;
 } BTPageStat;
 
+/*
+ * cross-call data structure for SRF for page stats
+ */
+typedef struct ua_page_stats
+{
+	uint64		blkno;
+	uint64		blk_count;
+}			ua_page_stats;
+
+/*
+ * cross-call data structure for SRF for page items
+ */
+typedef struct ua_page_items
+{
+	Page		page;
+	OffsetNumber offset;
+	bool		leafpage;
+	bool		rightmost;
+	TupleDesc	tupd;
+}			ua_page_items;
 
 /* -
  * GetBTPageStatistics()
@@ -176,35 +197,17 @@ GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat *stat)
 		stat->avg_item_size = 0;
 }
 
+
 /* ---
- * bt_page_stats()
+ * bt_index_block_validate()
  *
- * Usage: SELECT * FROM bt_page_stats('t1_pkey', 1);
+ * Validate index type is btree and block number
+ * is within a valid block number range.
  * ---
  */
-static Datum
-bt_page_stats_internal(PG_FUNCTION_ARGS, enum pageinspect_version ext_version)
+static void
+bt_index_block_validate(Relation rel, int64 blkno)
 {
-	text	   *relname = PG_GETARG_TEXT_PP(0);
-	int64		blkno = (ext_version == PAGEINSPECT_V1_8 ? PG_GETARG_UINT32(1) : PG_GETARG_INT64(1));
-	Buffer		buffer;
-	Relation	rel;
-	RangeVar   *relrv;
-	Datum		result;
-	HeapTuple	tuple;
-	TupleDesc	tupleDesc;
-	int			j;
-	char	   *values[11];
-	BTPageStat	stat;
-
-	if (!superuser())
-		ereport(ERROR,
-(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- errmsg("must be superuser to use pageinspect functions")));
-
-	relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
-	rel = relation_openrv(relrv, AccessShareLock);
-
 	if (!IS_INDEX(rel) || !IS_BTREE(re

Re: Parameter for planner estimate of recursive queries

2022-01-28 Thread Hamid Akhtar
On Tue, 25 Jan 2022 at 14:44, Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 31.12.21 15:10, Simon Riggs wrote:
> >> The factor 10 is a reasonably safe assumption and helps avoid worst
> >> case behavior in bigger graph queries. However, the factor 10 is way
> >> too large for many types of graph query, such as where the path
> >> through the data is tight, and/or the query is written to prune bushy
> >> graphs, e.g. shortest path queries. The factor 10 should not be
> >> hardcoded in the planner, but should be settable, just as
> >> cursor_tuple_fraction is.
> > If you think this should be derived without parameters, then we would
> > want a function that starts at 1 for 1 input row and gets much larger
> > for larger input. The thinking here is that Graph OLTP is often a
> > shortest path between two nodes, whereas Graph Analytics and so the
> > worktable will get much bigger.
>
> On the one hand, this smells like a planner hint.  But on the other
> hand, it doesn't look like we will come up with proper graph-aware
> selectivity estimation system any time soon, so just having all graph
> OLTP queries suck until then because the planner hint is hardcoded
> doesn't seem like a better solution.  So I think this setting can be ok.
>   I think the way you have characterized it makes sense, too: for graph
> OLAP, you want a larger value, for graph OLTP, you want a smaller value.
>

Do you think there is a case to replace the 10x multiplier with
"recursive_worktable_estimate" for total_rows calculation in the
cost_recursive_union function too?
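
For context, a sketch of the kind of query this affects; the GUC name below
follows this thread's proposal, and the edges table is hypothetical:

-- The planner currently assumes the recursive worktable is 10x the size
-- of the non-recursive term; a tight shortest-path style traversal may
-- want a much smaller multiplier:
SET recursive_worktable_estimate = 1.0;

WITH RECURSIVE path(node, depth) AS (
    SELECT 1, 0
    UNION ALL
    SELECT e.dst, p.depth + 1
    FROM edges e JOIN path p ON e.src = p.node
    WHERE p.depth < 10
)
SELECT * FROM path;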


Re: psql crash while executing core regression tests

2021-03-02 Thread Hamid Akhtar
I use CentOS 7 with flex 2.5.37 quite extensively and have never come
across a psql crash. This seems more like an environment-related issue on
your system.



On Tue, Mar 2, 2021 at 1:53 PM walker  wrote:

> hi
>
> During installation from source code, there are many crashes for psql
> while executing core regression tests, all the crashes are similar, the
> backtrace info as follows:
>
> Core was generated by 'psql'.
> Program terminated with signal 11, Segmentation fault.
> # 0 0x0043f140 in slash_yylex()
> (gdb) bt
> #0 0x0043f140 in slash_yylex()
> #1 0x004430fc in psql_scan_slash_command()
> #2 0x0043f140 in HandleSlashCmds()
> #3 0x0043f140 in MainLoop()
> #4 0x0043f140 in main()
>
> I did more comparative testing of this scenario, as follows:
> 1. install from local source code (xxx.tar.gz)
> 1) switch to source tree directory, and build there: no crash generated
> 2) create a build directory, and build there: no crash generated
>
> 2. install from git source code
> 1) switch to source tree directory, and build there: no crash generated
> 2) create a build directory, and build there: many crashes generated,
> but if a newer version of flex, e.g. 2.6.4, is installed, the problem
> doesn't exist. Any suggestions about this behavior?
>
> NOTES:
> test commands are same, as follows:
> configure --enable-coverage --enable-tap-tests
> make
> make check
>
> testing environment:
> PostgreSQL: 13.2
> redhat 7.4, 3.10.0-693.e17.x86_64
> flex: 2.5.37
>
> thanks
> walker
>


Re: Bug in error reporting for multi-line JSON

2021-02-27 Thread Hamid Akhtar
Updated the patch based on feedback.

The new status of this patch is: Needs review


Re: Use boolean array for nulls parameters

2021-01-19 Thread Hamid Akhtar
I personally don't see any benefit in this change. The focus shouldn't be
on fixing things that aren't broken. Perhaps there is more value in using a
bitmap data type to keep track of NULL values, which is the typical storage
vs performance debate, and IMHO, it's better to err on the side of using
slightly more storage for much better performance. IIRC, the bitmap idea
has previously been discussed and rejected too.

On Tue, Jan 19, 2021 at 7:07 PM japin  wrote:

>
> Hi,
>
> When I reviewed [1], I found that the tuple's nulls array uses the char
> type. However, there are many places that use a boolean array to represent
> the nulls array, so I think we can replace the char type nulls array with a
> boolean type.  This change will break the SPI_xxx API; I'm not sure whether
> this change causes other problems or not.  Any thoughts?
>
> [1] -
> https://www.postgresql.org/message-id/flat/ca+hiwqgkfjfydeq5vhph6eqpkjsbfpddy+j-kxyfepqedts...@mail.gmail.com
>
> --
> Regrads,
> Japin Li.
> ChengDu WenWu Information Technology Co.,Ltd.
>
>

-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2020-12-08 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

The patch looks good to me. With regards to the code comments, I had pretty 
similar concerns as already raised by Dmitry; and those have already been 
answered by you. So this patch is good to go from my side once you have updated 
the comments per your last email.

Re: Remove unnecessary else branch

2020-10-13 Thread Hamid Akhtar
On Tue, Oct 13, 2020 at 6:37 PM Heikki Linnakangas  wrote:

> On 13/10/2020 16:30, Li Japin wrote:
> > Hi,
> >
> > I found in guc-file.l we can omit the else branch in
> AbsoluteConfigLocation().
>
> It will compile the same, so it's just a matter of code readability or
> taste which style is better here. I think we should leave it alone, it's
> fine as it is.
>
> - Heikki
>
>
>
I agree with Heikki from the code execution point of view.

"canonicalize_path(abs_path);" statement is also condition independent and
can be pulled out of both if and else blocks. Removing
unnecessary statements makes the code more readable, but it is a matter of
choice/style.


Re: Improved Cost Calculation for IndexOnlyScan

2020-10-13 Thread Hamid Akhtar
On Mon, Oct 12, 2020 at 3:46 PM Hamid Akhtar  wrote:

>
>
> On Tue, Sep 29, 2020 at 2:57 PM Heikki Linnakangas 
> wrote:
>
>> On 29/09/2020 11:49, Hamid Akhtar wrote:
>> > So, not actually random replacement here, rather a change with
>> > baserel->allvisfrac taken into consideration (as given below):
>> > 
>> > index_random_page_cost = Min(spc_seq_page_cost + spc_random_page_cost *
>> > (1.0 - baserel->allvisfrac), spc_random_page_cost);
>> > 
>> >
>> > Does this make sense?
>>
>> No. genericcostestimate() is only concerned with accesses to the index,
>> not the heap accesses that are needed with Index Scans. 'allvisfrac'
>> should not affect the number of *index* pages fetched in any way.
>>
>> - Heikki
>>
>
> Currently, the costing for indexonlyscan only differs based on
> 'allvisfrac'. IIUC, the current implementation changes the number of pages
> being fetched based on 'allvisfrac'.
>
> This patch actually makes indexonlyscan-specific changes to the
> genericcostestimate function. Currently, regardless of the value of
> 'allvisfrac', it is assumed that the cost of fetching index pages is the
> random page cost. That is not aligned with the current cost calculation for
> indexonlyscan. Therefore, I'm suggesting reducing the random page cost in a
> similar fashion in the case of indexonlyscan.
>
> I'm adding this to the commitfest.
>

Retrospectively looking at the patch, I see your point. Your criticism is
valid. I'll revalidate this issue and rework the patch if necessary.


>
> --
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
> ADDR: 10318 WHALLEY BLVD, Surrey, BC
> CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
> SKYPE: engineeredvirus
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: Improved Cost Calculation for IndexOnlyScan

2020-10-12 Thread Hamid Akhtar
On Tue, Sep 29, 2020 at 2:57 PM Heikki Linnakangas  wrote:

> On 29/09/2020 11:49, Hamid Akhtar wrote:
> > So, not actually random replacement here, rather a change with
> > baserel->allvisfrac taken into consideration (as given below):
> > 
> > index_random_page_cost = Min(spc_seq_page_cost + spc_random_page_cost *
> > (1.0 - baserel->allvisfrac), spc_random_page_cost);
> > 
> >
> > Does this make sense?
>
> No. genericcostestimate() is only concerned with accesses to the index,
> not the heap accesses that are needed with Index Scans. 'allvisfrac'
> should not affect the number of *index* pages fetched in any way.
>
> - Heikki
>

Currently, the costing for indexonlyscan only differs based on
'allvisfrac'. IIUC, the current implementation changes the number of pages
being fetched based on 'allvisfrac'.

This patch actually makes indexonlyscan-specific changes to the
genericcostestimate function. Currently, regardless of the value of
'allvisfrac', it is assumed that the cost of fetching index pages is the
random page cost. That is not aligned with the current cost calculation for
indexonlyscan. Therefore, I'm suggesting reducing the random page cost in a
similar fashion in the case of indexonlyscan.

I'm adding this to the commitfest.

-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: Improved Cost Calculation for IndexOnlyScan

2020-09-29 Thread Hamid Akhtar
On Tue, Sep 29, 2020 at 1:08 PM Heikki Linnakangas  wrote:

> On 29/09/2020 10:06, Hamid Akhtar wrote:
> > In one of my earlier emails [1], I mentioned that there seems to be a
> > problem with how the cost for index only scans is being calculated.
> > [1]
> >
> https://www.postgresql.org/message-id/CANugjhsnh0OBMOYc7qKcC%2BZsVvAXCeF7QiidLuFvg6zmHy1C7A%40mail.gmail.com
> >
> > My concern is that there seems to be a bigger disconnect between the
> > cost of index only scan and the execution time. Having tested this on 3
> > different systems, docker, laptop and a server with RAID 5 SSD
> > configured, at the threshold where index only scan cost exceeds that of
> > sequential scan, index only is still around 30% faster than the
> > sequential scan.
>
> A 30% discrepancy doesn't sound too bad, to be honest. The exact
> threshold depends on so many factors.
>
> > My initial hunch was that perhaps we need to consider a different
> > approach when considering cost for index only scan. However, the
> > solution seems somewhat simple.
> >
> > cost_index function in costsize.c, in case of indexonlyscan, multiplies
> > the number of pages fetched by a factor of (1.0 - baserel->allvisfrac)
> > which is then used to calculate the max_IO_cost and min_IO_cost.
> >
> > This is very similar to the cost estimate methods for indexes, which
> > internally call the genericcostestimate function. This function primarily
> > gets the number of pages for the indexes and multiplies that with the
> > random page cost (spc_random_page_cost) to get the total disk access cost.
> >
> > I believe that in case of index only scan, we should adjust the
> > spc_random_page_cost in the context of baserel->allvisfrac so that it
> > accounts for random pages for only the fraction of the relation that
> > needs to be read and excludes the index page fetches.
>
> That doesn't sound right to me. The genericcostestimate() function
> calculates the number of *index* pages fetched. It makes no difference
> if it's an Index Scan or an Index Only Scan.
>
> genericcostestimate() could surely be made smarter. Currently, it
> multiplies the number of index pages fetched with random_page_cost, even
> though a freshly created index is mostly physically ordered by the keys.
> seq_page_cost with some fudge factor might be more appropriate, whether
> or not it's an Index Only Scan. Not sure what the exact formula should
> be, just replacing random_page_cost with seq_page_cost is surely not
> right either.
>
> - Heikki
>

So, not actually random replacement here, rather a change with
baserel->allvisfrac taken into consideration (as given below):

index_random_page_cost = Min(spc_seq_page_cost + spc_random_page_cost *
(1.0 - baserel->allvisfrac), spc_random_page_cost);


Does this make sense?
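
For illustration, with the default cost settings (seq_page_cost = 1.0,
random_page_cost = 4.0) and a table that is 90% all-visible (example numbers
only, not measurements from this thread), the formula gives:

index_random_page_cost = Min(1.0 + 4.0 * (1 - 0.9), 4.0)
                       = Min(1.4, 4.0)
                       = 1.4

i.e. the per-page cost charged for index pages approaches seq_page_cost as
more of the relation becomes all-visible, instead of staying pinned at
random_page_cost.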

-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Improved Cost Calculation for IndexOnlyScan

2020-09-29 Thread Hamid Akhtar
In one of my earlier emails [1], I mentioned that there seems to be a
problem with how the cost for index only scans is being calculated.
[1]
https://www.postgresql.org/message-id/CANugjhsnh0OBMOYc7qKcC%2BZsVvAXCeF7QiidLuFvg6zmHy1C7A%40mail.gmail.com

My concern is that there seems to be a bigger disconnect between the cost
of index only scan and the execution time. Having tested this on 3
different systems, docker, laptop and a server with RAID 5 SSD configured,
at the threshold where index only scan cost exceeds that of sequential
scan, index only is still around 30% faster than the sequential scan.

My initial hunch was that perhaps we need to consider a different approach
when considering cost for index only scan. However, the solution seems
somewhat simple.

cost_index function in costsize.c, in case of indexonlyscan, multiplies the
number of pages fetched by a factor of (1.0 - baserel->allvisfrac) which is
then used to calculate the max_IO_cost and min_IO_cost.

This is very similar to the cost estimate methods for indexes, which
internally call the genericcostestimate function. This function primarily
gets the number of pages for the indexes and multiplies that with the
random page cost (spc_random_page_cost) to get the total disk access cost.

I believe that in case of index only scan, we should adjust the
spc_random_page_cost in the context of baserel->allvisfrac so that it
accounts for random pages for only the fraction of the relation that needs
to be read and excludes the index page fetches.

With the attached POC for this approach (based on the current master
branch), index only scan was previously bailing out much earlier,
approximately when 50% of the index/table was being scanned. With the
attached patch, this percentage goes up to 70%. The execution time for
index only scan still remains well below that of the sequential scan, so
this is a significant improvement IMHO.

Following is the simple SQL that I was using to see how this patch impacts
the indexonlyscan costing and execution time for the scan. You may tweak
the table size or the number of rows being scanned for your environment.

SQL:
-
CREATE TABLE index_test AS (SELECT GENERATE_SERIES(1, 10000000)::int id,
'hello world' AS name);
CREATE INDEX ON index_test(id);
VACUUM ANALYZE index_test;
EXPLAIN ANALYZE SELECT COUNT(*) FROM index_test WHERE id < 7000000;

-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus


poc_indexonly_costing.patch
Description: Binary data


Re: fixing old_snapshot_threshold's time->xid mapping

2020-09-24 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

Patch looks good to me.

Case for Improved Costing for Index Only Scans?

2020-09-08 Thread Hamid Akhtar
While running a simple SELECT statement on a table where I expected
indexonly scan to be preferred over sequential scan, the planner kept on
selecting the sequential scan. Looking at the EXPLAIN output, it seemed
obvious that the cost of indexonly was exceeding that of sequential scan.
So the planner was right, however, my expectation was that indexonly should
have been selected instead.

Following is an example table that I’ll be referring to.

-- Creating a very basic table, index and running vacuum analyze
CREATE TABLE index_test AS (SELECT GENERATE_SERIES(1, 10000000)::int id,
'hello world' AS name);
CREATE INDEX ON index_test(id);
VACUUM ANALYZE index_test;

-- SELECT statement
SELECT id FROM index_test WHERE id < [SOME VALUE];

So as a starting point, I wanted to identify when the cost for indexonly
exceeds that of a sequential scan. In my case, the number turned out to be
6,243,354 with the table containing 10,000,000 tuples.

When the cost of one plan exceeds the other's, the planner should select
the more optimal path. However, my concern was why the indexonly scan cost
is greater than that of the sequential one. The expectation was that, at
the tipping point, the indexonly execution time should be greater than that
of the sequential one. However, I see that the indexonly scan's execution
was much faster than the sequential scan, which, in terms of timing, is
what I expected. So I turned on timing in psql and ran EXPLAIN ANALYZE.
Following are the outputs.

-- SEQUENTIAL SCAN
EXPLAIN ANALYZE SELECT id FROM index_test WHERE id < 6243354;

  QUERY PLAN

---
 Seq Scan on index_test  (cost=0.00..179053.25 rows=6227030 width=4)
(actual time=0.175..993.291 rows=6243353 loops=1)
   Filter: (id < 6243354)
   Rows Removed by Filter: 3756647
 Planning Time: 0.235 ms
 Execution Time: 1149.590 ms
(5 rows)

Time: 1150.798 ms (00:01.151)


-- INDEXONLY SCAN
EXPLAIN ANALYZE SELECT id FROM index_test WHERE id < 6243353;

  QUERY
PLAN
--
 Index Only Scan using index_test_id_idx on index_test
 (cost=0.43..177277.44 rows=6227029 width=4) (actual time=0.063..718.390
rows=6243352 loops=1)
   Index Cond: (id < 6243353)
   Heap Fetches: 0
 Planning Time: 0.174 ms
 Execution Time: 873.070 ms
(5 rows)

Time: 874.079 ms

Given that this is a very well-packed table, you can still see that the
execution time increases from 718ms for the indexonly scan to 993ms with an
increase of just a single tuple, only because the cost of the indexonly
scan goes beyond that of the sequential scan.

I've tried this in Docker, on my laptop, and on a server without
virtualization. All have shown similar gains.

Reviewing the cost estimate code in costsize.c, I see that the cost of
indexonly differs for min_IO_cost and max_IO_cost. The indexTotalCost value
remains the same whether it's an indexonly scan or not. I believe that the
total index cost in case of an indexonly scan should also be updated.

I don't think there is a need to modify all amcostestimate functions for
indexes; however, perhaps we can pull in another factor that helps make the
cost for indexonly more realistic.

ISTM, it should be reduced by a factor somewhere around (index
size)/(relation size), even perhaps putting together expected index size,
actual index size and relation size in some equation.
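
If it helps to experiment, the suggested factor can be eyeballed with a
query along these lines (a hypothetical helper using the example table
above; index_test_id_idx is the default name for the index created earlier):

SELECT pg_relation_size('index_test_id_idx')::numeric
       / pg_relation_size('index_test') AS index_to_heap_ratio;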

-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: track_planning causing performance regression

2020-08-19 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

Overall, the patch works fine. However, I have a few observations:

(1) Code Comments:
- Code comments should be added for the 2 new macros, in particular for
PGSS_NUM_LOCK_PARTITIONS. As you explained in your email, this may be used to
limit the number of locks if a very large value for pgss_max is specified.
- From the code I inferred that the number of locks can in future be less than
pgss_max (per your email where in future this macro could be used to limit the
number of locks). I suggest perhaps adding some notes to help future changes in
this code area.

(2) It seems that the "pgss->lock = &(pgss->base + pgss_max)->lock;" statement
should not use pgss_max directly and instead use the PGSS_NUM_LOCK_PARTITIONS
macro, as when a limit is imposed on the number of locks, this statement will
cause an overrun.


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus

The new status of this patch is: Waiting on Author


Re: track_planning causing performance regression

2020-08-18 Thread Hamid Akhtar
On Tue, Aug 18, 2020 at 8:43 PM Fujii Masao 
wrote:

>
> > Yes, I pushed the document_overhead_by_track_planning.patch, but this
> > CF entry is for pgss_lwlock_v1.patch which replaces spinlocks with
> lwlocks
> > in pg_stat_statements. The latter patch has not been committed yet.
> > Probably attachding the different patches in the same thread would cause
> > this confusing thing... Anyway, thanks for your comment!
>
> To avoid further confusion, I attached the rebased version of
> the patch that was registered at CF. I'd appreciate it if
> you review this version.
>

Thank you. Reviewing it now.


>
> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: track_planning causing performance regression

2020-08-17 Thread Hamid Akhtar
On Mon, Aug 17, 2020 at 2:21 PM Fujii Masao 
wrote:

>
>
> On 2020/07/31 21:40, Hamid Akhtar wrote:
> > <https://commitfest.postgresql.org/29/2634/>
> >
> > On Mon, Jul 6, 2020 at 10:29 AM Fujii Masao
> > <masao.fu...@oss.nttdata.com> wrote:
> >
> >     On 2020/07/04 12:22, Pavel Stehule wrote:
> >     >
> >     > On Fri, 3 Jul 2020 at 13:02, Fujii Masao
> >     > <masao.fu...@oss.nttdata.com> wrote:
> >     >
> >     >     On 2020/07/03 16:02, Pavel Stehule wrote:
> >     >     >
> >     >     > On Fri, 3 Jul 2020 at 8:57, Fujii Masao
> >     >     > <masao.fu...@oss.nttdata.com> wrote:
> >     >     >
> >     >     >     On 2020/07/03 13:05, Pavel Stehule wrote:
> >     >     >     > Hi
> >     >     >     >
> >     >     >     > On Fri, 3 Jul 2020 at 4:39, Fujii Masao
> >     >     >     > <masao.fu...@oss.nttdata.com> wrote:
> >     >     >     >
> >     >     >     >     On 2020/07/01 7:37, Peter Geoghegan wrote:
> >     >     >     >     > On Tue, Jun 30, 2020 at 6:40 AM Fujii Masao
> >     >     >     >     > <masao.fu...@oss.nttdata.com> wrote:
> >     >     >     >     >> Ants and Andres suggested to replace the
> >     >     >     >     >> spinlock used in pgss_store() with LWLock.
> >     >     >     >     >> I agreed with them and posted the POC patch
> >     >     >     >     >> doing that. But I think the patch is an item
> >     >     >     >     >> for v14. The patch may address the reported
> >     >     >     >     >> performance issue, but may cause other
> >     >     >     >     >> performance issues in other workloads. We
> >     >     >     >     >> would need to measure how the patch affects
> >     >     >     >     >> the performance in various workloads. It
> >     >     >     >     >> seems too late to do that at this stage of
> >     >     >     >     >> v13. Thought?
> >     >     >     >     >
> >     >     >     >     > I agree that it's too late for v13.
> >     >     >     >
> >     >     >     >     Thanks for the comment!
> >     >     >     >
> >     >     >     >     So I pushed the patch and changed default of
> >     >     >     >     track_planning to off.
> >     >     >     >
> >     >     >     > Maybe there can be documented so enabling this
> >     >     >     > option can have a negative impact on performance.
> >     >     >
> >     >     >     Yes. What about adding either of the followings into
> >     >     >     the doc?
> >     >     >
> >     >     >         Enabling this parameter may incur a noticeable
> >     >     >         performance penalty.
> >     >     >

Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2020-08-17 Thread Hamid Akhtar
Unfortunately, the latest patch doesn't apply cleanly on the master branch.
Can you please share an updated one?

Re: track_planning causing performance regression

2020-07-31 Thread Hamid Akhtar

On Mon, Jul 6, 2020 at 10:29 AM Fujii Masao 
wrote:

>
>
> On 2020/07/04 12:22, Pavel Stehule wrote:
> >
> > On Fri, 3 Jul 2020 at 13:02, Fujii Masao
> > <masao.fu...@oss.nttdata.com> wrote:
> >
> >     On 2020/07/03 16:02, Pavel Stehule wrote:
> >     >
> >     > On Fri, 3 Jul 2020 at 8:57, Fujii Masao
> >     > <masao.fu...@oss.nttdata.com> wrote:
> >     >
> >     >     On 2020/07/03 13:05, Pavel Stehule wrote:
> >     >     > Hi
> >     >     >
> >     >     > On Fri, 3 Jul 2020 at 4:39, Fujii Masao
> >     >     > <masao.fu...@oss.nttdata.com> wrote:
> >     >     >
> >     >     >     On 2020/07/01 7:37, Peter Geoghegan wrote:
> >     >     >     > On Tue, Jun 30, 2020 at 6:40 AM Fujii Masao
> >     >     >     > <masao.fu...@oss.nttdata.com> wrote:
> >     >     >     >> Ants and Andres suggested to replace the spinlock
> >     >     >     >> used in pgss_store() with LWLock. I agreed with
> >     >     >     >> them and posted the POC patch doing that. But I
> >     >     >     >> think the patch is an item for v14. The patch may
> >     >     >     >> address the reported performance issue, but may
> >     >     >     >> cause other performance issues in other workloads.
> >     >     >     >> We would need to measure how the patch affects the
> >     >     >     >> performance in various workloads. It seems too
> >     >     >     >> late to do that at this stage of v13. Thought?
> >     >     >     >
> >     >     >     > I agree that it's too late for v13.
> >     >     >
> >     >     >     Thanks for the comment!
> >     >     >
> >     >     >     So I pushed the patch and changed default of
> >     >     >     track_planning to off.
> >     >     >
> >     >     > Maybe there can be documented so enabling this option can
> >     >     > have a negative impact on performance.
> >     >
> >     >     Yes. What about adding either of the followings into the
> >     >     doc?
> >     >
> >     >         Enabling this parameter may incur a noticeable
> >     >         performance penalty.
> >     >
> >     >     or
> >     >
> >     >         Enabling this parameter may incur a noticeable
> >     >         performance penalty, especially when a fewer kinds of
> >     >         queries are executed on many concurrent connections.
> >     >
> >     > This second variant looks perfect for this case.
> >
> >     Ok, so patch attached.
> >
> > +1
>
> Thanks for the review! Pushed.
>
> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
>
>
>
You might also want to update this patch's status in the commitfest:
https://commitfest.postgresql.org/29/2634/
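
For anyone following along, a hedged example of the setting being discussed
(the view columns are per the pg_stat_statements documentation for v13):

-- track_planning now defaults to off; it has to be enabled explicitly
SET pg_stat_statements.track_planning = on;

SELECT query, calls, plans, total_plan_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_plan_time DESC
LIMIT 5;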

-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: Rethinking opclass member checks and dependency strength

2020-07-29 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

Looks good to me. 

CORRECTION:
In my previous review I had mistakenly mentioned that it was causing a server 
crash. Tests run perfectly fine without any errors.

The new status of this patch is: Ready for Committer


Re: Rethinking opclass member checks and dependency strength

2020-07-28 Thread Hamid Akhtar
On Tue, Jul 28, 2020 at 8:43 PM Tom Lane  wrote:

> Hamid Akhtar  writes:
> > I've gone through the patch and applied it on the master branch. Other
> > than a few hunks, and the question of whether, as suggested upthread, the
> > default case for "switch (op->number)" should throw an error or not, I
> > found that the bloom regression test is crashing.
> > -
> > test bloom... FAILED (test process exited with
> exit code 2)   20 ms
>
> Hmm ... I think you must have done something wrong.  For me,
> am-check-members-callback-5.patch still applies cleanly (just a few
> small offsets), and it passes that test as well as the rest of
> check-world.  The cfbot agrees [1].
>
> Maybe you didn't "make clean" before rebuilding?
>
> regards, tom lane
>
> [1]
> https://travis-ci.org/github/postgresql-cfbot/postgresql/builds/71250
>

I was pretty sure I had run make clean before testing the patch, but perhaps I
hadn't, as re-running it now makes all tests pass.

Sorry for the false alarm. All good with the patch.

-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: Rethinking opclass member checks and dependency strength

2020-07-28 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   not tested
Documentation:not tested

I've gone through the patch and applied it on the master branch. Setting aside
a few hunks, and the question of whether, as suggested upthread, the default
case for "switch (op->number)" should throw an error or not, I found that the
bloom regression test is crashing.
-
test bloom... FAILED (test process exited with exit 
code 2)   20 ms

+server closed the connection unexpectedly
+   This probably means the server terminated abnormally
+   before or while processing the request.
+connection to server was lost
-

The new status of this patch is: Waiting on Author


Re: Should we remove a fallback promotion? take 2

2020-07-28 Thread Hamid Akhtar
There have been no real objections to this patch and it seems to work. So,
IMHO, the only thing that needs to be done is perhaps to update the patch so
that it applies cleanly on the master branch.

On Mon, Jul 27, 2020 at 9:31 PM Fujii Masao 
wrote:

>
>
> On 2020/07/27 17:53, Hamid Akhtar wrote:
> > Applying the patch to the current master branch throws 9 hunks. AFAICT,
> the patch is good otherwise.
>
> So you think that the patch can be marked as Ready for Committer?
>
> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: Should we remove a fallback promotion? take 2

2020-07-27 Thread Hamid Akhtar
Applying the patch to the current master branch throws 9 hunks. AFAICT, the
patch is good otherwise.

On Wed, Jun 3, 2020 at 3:20 PM Fujii Masao 
wrote:

>
>
> On 2020/06/03 12:06, Kyotaro Horiguchi wrote:
> > At Wed, 3 Jun 2020 09:43:17 +0900, Fujii Masao <
> masao.fu...@oss.nttdata.com> wrote in
> >> I will change the status back to Needs Review.
>
> Thanks for the review!
>
> >   record = ReadCheckpointRecord(xlogreader, checkPointLoc, 1,
> false);
> >   if (record != NULL)
> >   {
> > -  fast_promoted = true;
> > +  promoted = true;
> >
> > Even if we missed the last checkpoint record, we don't give up
> > promotion and continue fall-back promotion but the variable "promoted"
> > stays false. That is confusiong.
> >
> > How about changing it to fallback_promotion, or some names with more
> > behavior-specific name like immediate_checkpoint_needed?
>
>
> I like doEndOfRecoveryCkpt or something, but I have no strong opinion
> about that flag naming. So I'm ok with immediate_checkpoint_needed
> if others also like that, too.
>
> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: Improving psql slash usage help message

2020-07-21 Thread Hamid Akhtar
So are you suggesting not fixing this, or doing a more detailed review to
assess which other psql messages can be grouped together?

On Sun, Jul 12, 2020 at 8:15 PM Tom Lane  wrote:

> Hamid Akhtar  writes:
> > On Sun, Apr 26, 2020 at 1:03 AM David G. Johnston <
> > david.g.johns...@gmail.com> wrote:
> >> On Sat, Apr 25, 2020 at 12:29 PM Hamid Akhtar 
> >> wrote:
> >>> "\dE" displays the list with a "List of relations" heading whereas
> "\det"
> >>> displays "List of foreign tables". So, to differentiate the two, I
> suggest
> >>> to change the help message for "\dE" to:
> >>> \dE[S+] [PATTERN]  list foreign relations
>
> >> help.c and the documentation need to be synchronized a bit more than
> this
> >> single issue.
> >> Calling it "foreign relation" for \dE and "foreign table" for \det does
> >> convey that there is a difference - not sure it a huge improvement
> though.
> >> The "\d[Eimstv]" family of meta-commands should, in the help, probably
> be
> >> moved together to show the fact that they are basically "list relation
> >> names [of this type only]" while "\det" is "list foreign table info".
>
> FWIW, I agree with David on this point.  I find it bizarre and unhelpful
> that slashUsage shows \dt, \di, etc as separate commands and fails to
> indicate that they can be combined.  We could merge these entries into
>
> fprintf(output, _("  \\d[tivmsE][S+] [PATRN] list relations of specified type(s)\n"));
>
> which would both remind people that they can give more than one type,
> and shorten the already-overly-long list.
>
> > I think from a user perspective, grouping these wouldn't be helpful. For
> > example, it may cause FDW related commands to be spread through out the
> > help.
>
> That seems quite irrelevant to this proposal.
>
> regards, tom lane
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: WIP/PoC for parallel backup

2020-07-06 Thread Hamid Akhtar
On Mon, Jul 6, 2020 at 5:24 PM Daniel Gustafsson  wrote:

> > On 12 Jun 2020, at 19:28, Robert Haas  wrote:
>
> > I am sure that nobody is arguing that the patch has to be beneficial
> > in all cases in order to justify applying it. However, there are
> > several good arguments against proceding with this patch:
>
> This thread has stalled with no resolution to the raised issues, and the
> latest
> version of the patch (v15) posted no longer applies (I only tried 0001
> which
> failed, the green tick in the CFBot is due it mistakenlt thinking an
> attached
> report is a patch).  I'm marking this patch Returned with Feedback.  Please
> open a new CF entry when there is a new version of the patch.
>
> cheers ./daniel


I think this is fair. There are quite a few valid points raised by Robert.


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: tar-related code in PostgreSQL

2020-06-28 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

The patch works perfectly.

The new status of this patch is: Ready for Committer


Re: WIP/PoC for parallel backup

2020-06-11 Thread Hamid Akhtar
As far as I understand, parallel backup is not a mandatory performance
feature; rather, it is one used at the user's discretion. This IMHO indicates
that it will benefit some users and may not benefit others.

Taking a backup is an I/O intensive workload, and parallelizing it
across multiple worker threads/processes creates an overhead of its own.
So what precisely are we optimizing here? Looking at a running database
system in any environment, I see the following potential scenarios playing
out. These are probably clear to everyone here, but I'm listing them for
completeness and clarity.

Locally Running Backup:
(1) Server has no clients connected other than base backup.
(2) Server has other clients connected which are actively performing
operations causing disk I/O.

Remotely Running Backup:
(3) Server has no clients connected other than remote base backup.
(4) Server has other clients connected which are actively performing
operations causing disk I/O.

Others:
(5) Server or the system running base backup has other processes competing
for disk or network bandwidth.

Generally speaking, I see that parallelization could potentially benefit
scenarios (2), (4) and (5), the reason being that having more than one
thread increases the likelihood that the backup will get a bigger time
slice for disk I/O and network bandwidth. With (1) and (3), since there are
no competing processes, adding multiple threads or processes will only
increase CPU overhead whilst still getting the same network and disk time
slice. In this particular case, the performance will degrade.

IMHO, that's why adding other load on the server, perhaps by running
pgbench simultaneously, may show improved performance for parallel backup.
Also, running a parallel backup on a local laptop more often than not
yields improved performance.

There are obviously other factors that may impact the performance, like the
type of I/O scheduler being used, whether CFQ or some other.

IMHO, parallel backup has obvious performance benefits, but we need to
ensure that users understand that there is potential for slower backup if
there is no competition for resources.



On Fri, May 22, 2020 at 11:03 AM Suraj Kharage <
suraj.khar...@enterprisedb.com> wrote:

>
> On Thu, May 21, 2020 at 7:12 PM Robert Haas  wrote:
>
>>
>> So, basically, when we go from 1 process to 4, the additional
>> processes spend all of their time waiting rather than doing any useful
>> work, and that's why there is no performance benefit. Presumably, the
>> reason they spend all their time waiting for ClientRead/ClientWrite is
>> because the network between the two machines is saturated, so adding
>> more processes that are trying to use it at maximum speed just leads
>> to spending more time waiting for it to be available.
>>
>> Do we have the same results for the local backup case, where the patch
>> helped?
>>
>
> Here is the result for local backup case (100GB data). Attaching the
> captured logs.
>
> The total number of events (pg_stat_activity) captured during local runs:
> - 82 events for normal backups
> - 31 events for parallel backups (-j 4)
>
> BaseBackupRead wait event numbers: (newly added)
> 24 - in normal backups
> 14 - in parallel backup (-j 4)
>
> ClientWrite wait event numbers:
> 8 - in normal backup
> 43 - in parallel backups
>
> ClientRead wait event numbers:
> 0 - ClientRead in normal backup
> 32 - ClientRead in parallel backups for diff processes.
>
>
> --
> --
>
> Thanks & Regards,
> Suraj kharage,
> EnterpriseDB Corporation,
> The Postgres Database Company.
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: Should we remove a fallback promotion? take 2

2020-06-02 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

I've applied the v2 patch on the master branch. There were some hunks, but the patch
got applied. So, I ran make installcheck-world and everything looks fine to me 
with this patch. Though, I do have a few suggestions in general:

(1) I see two functions being used in the code: (a) CheckPromoteSignal and (b)
IsPromoteSignaled. Should these be combined into a single function that
perhaps checks for both "promote_signaled" and the "PROMOTE_SIGNAL_FILE"? Not
sure if doing this will break "sigusr1_handler" in postmaster.c though.

(2) CheckPromoteSignal checks for the "PROMOTE_SIGNAL_FILE" file. So, rather
than calling stat on "PROMOTE_SIGNAL_FILE" in if statements, I would
suggest using the CheckPromoteSignal function instead, as it does nothing but
stat "PROMOTE_SIGNAL_FILE" (after applying your patch).

The new status of this patch is: Waiting on Author


Re: improving basebackup.c's file-reading code

2020-04-29 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

The idea and the patch looks good to me. 

It makes sense to change fread to basebackup_read_file, which internally calls
pg_pread, a portable function as opposed to read. As you've mentioned,
this is much better for error handling.

I guess it is more of a personal choice, but I would suggest making the while
conditions consistent as well. The while loop in the patch @ line 121
tests the return value of "basebackup_read_file", whereas @ line 177 it
has the condition "(len < statbuf->st_size)".
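For readers without the patch at hand, here is a minimal sketch of the kind
of wrapper being discussed, assuming the error-reporting behaviour described
above (the actual patch's signature and short-read handling may differ):

static ssize_t
basebackup_read_file(int fd, char *buf, size_t nbytes, off_t offset,
                     const char *filename)
{
    ssize_t     rc = pg_pread(fd, buf, nbytes, offset);

    /* pg_pread is portable and, unlike fread, lets us report errno */
    if (rc < 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not read file \"%s\": %m", filename)));
    return rc;
}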

The new status of this patch is: Ready for Committer


Re: Improving psql slash usage help message

2020-04-27 Thread Hamid Akhtar
On Sun, Apr 26, 2020 at 1:03 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Sat, Apr 25, 2020 at 12:29 PM Hamid Akhtar 
> wrote:
>
>>
>> "\dE" displays the list with a "List of relations" heading whereas "\det"
>> displays "List of foreign tables". So, to differentiate the two, I suggest
>> to change the help message for "\dE" to:
>>
>>   \dE[S+] [PATTERN]  list foreign relations
>>
>> One could argue that both essentially mean the same thing, however,
>> considering "\dE+" also outputs size, it makes sense IMHO to make this
>> change (as PG documentation: relation is essentially a mathematical term
>> for table). Attached is the patch that makes this change.
>>
>
> help.c and the documentation need to be synchronized a bit more than this
> single issue.
>
> Calling it "foreign relation" for \dE and "foreign table" for \det does
> convey that there is a difference - not sure it's a huge improvement though.
> The "\d[Eimstv]" family of meta-commands should, in the help, probably be
> moved together to show the fact that they are basically "list relation
> names [of this type only]" while "\det" is "list foreign table info".
>

I think from a user perspective, grouping these wouldn't be helpful. For
example, it may cause FDW-related commands to be spread throughout the
help. Currently, those are nicely grouped together, which makes life
relatively easy IMO.


>
> David J.
>
>

-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Improving psql slash usage help message

2020-04-25 Thread Hamid Akhtar
psql slash usage shows two options for listing foreign tables.

  \dE[S+] [PATTERN]  list foreign tables
  \det[+] [PATTERN]  list foreign tables

This seems a little odd, especially since the output of the two
commands is different.

postgres=# \dE+
   List of relations
 Schema | Name | Type  | Owner
+--+---+
 public | foo  | foreign table | highgo
(1 row)

postgres=# \det
  List of foreign tables
 Schema | Table | Server
+---+-
 public | foo   | orc_srv
(1 row)


"\dE" displays the list with a "List of relations" heading whereas "\det"
displays "List of foreign tables". So, to differentiate the two, I suggest
changing the help message for "\dE" to:

  \dE[S+] [PATTERN]  list foreign relations

One could argue that both essentially mean the same thing; however,
considering that "\dE+" also outputs size, it makes sense IMHO to make this
change (as per the PG documentation, relation is essentially a mathematical
term for table). Attached is the patch that makes this change.
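For clarity, the change presumably amounts to this one line of wording in
src/bin/psql/help.c (a sketch; the actual patch may differ slightly):

    fprintf(output, _("  \\dE[S+] [PATTERN]  list foreign relations\n"));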

Regards.

-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus


psql_help_dE.patch
Description: Binary data


Re: Do we need to handle orphaned prepared transactions in the server?

2020-04-16 Thread Hamid Akhtar
Thank you everyone for the detailed feedback.

On Fri, Apr 17, 2020 at 5:40 AM Michael Paquier  wrote:

> On Thu, Apr 16, 2020 at 03:11:51PM -0400, Tom Lane wrote:
> > Even if I liked the core idea, loading the functionality onto VACUUM
> seems
> > like a fairly horrid design choice.  It's quite unrelated to what that
> > command does.  In the autovac code path, it's going to lead to multiple
> > autovac workers all complaining simultaneously about the same problem.
> > But having manual vacuums complain about issues unrelated to the task at
> > hand is also a seriously poor bit of UX design.  Moreover, that won't do
> > all that much to surface problems, since most(?) installations never run
> > manual vacuums; or if they do, the "manual" runs are really done by a
> cron
> > job or the like, which is not going to notice the warnings.  So you still
> > need a log-scraping tool.
>
> +1.
>
> > If we were going to go down the path of periodically logging warnings
> > about old prepared transactions, some single-instance background task
> > like the checkpointer would be a better place to do the work in.  But
> > I'm not really recommending that, because I agree with Robert that
> > we just plain don't want this functionality.
>
> I am not sure that the checkpointer is a good place to do that either,
> joining back with your argument in the first paragraph of this email
> related to vacuum.  One potential approach would be a contrib module
> that works as a background worker?  However, I would think that
> finding a minimum set of requirements that we think are generic enough
> for most users would be something hard to draft a list of.  If we had
> a small, minimal contrib/ module in core that people could easily
> extend for their own needs and that we would intentionally keep as
> minimal, in the same spirit as say passwordcheck, perhaps..
> --
> Michael
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: Do we need to handle orphaned prepared transactions in the server?

2020-04-16 Thread Hamid Akhtar
On Thu, Apr 16, 2020 at 5:20 PM Robert Haas  wrote:

> On Thu, Apr 16, 2020 at 4:43 AM Hamid Akhtar 
> wrote:
> > My real question is whether vacuum should be preemptively complaining
> about prepared transactions or stale replication slots rather than waiting
> for transaction id to exceed the safe limit. I presume by the time safe
> limit is exceeded, vacuum's work would already have been significantly
> impacted.
>
> Yeah, for my part, I agree that letting things go until the point
> where VACUUM starts to complain is usually bad. Generally, you want to
> know a lot sooner. That being said, I think the solution to that is to
> run a monitoring tool, not to overload the autovacuum worker with
> additional duties.
>

So is the concern performance overhead rather than the need for such a
feature?

Any server running with prepared transactions enabled, more likely than
not, requires a monitoring tool for tracking orphaned prepared
transactions. For such environments, surely such a feature implemented in
the server will incur lower overhead than an external monitoring tool.


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: Do we need to handle orphaned prepared transactions in the server?

2020-04-16 Thread Hamid Akhtar
This patch actually does not discard any prepared transactions and only
throws a warning for each orphaned one. So, there is no behaviour change
except for getting some warnings in the log or emitting some warning to a
client executing a vacuum command.

I hear all the criticism which I don't disagree with. Obviously, scripts
and other solutions could provide a lot more flexibility.

Also, I believe most of us agree that vacuum needs to be smarter.

src/backend/commands/vacuum.c does throw warnings for upcoming wraparound
issues with one warning in particular mentioning prepared transactions and
stale replication slots. So, throwing warnings is not unprecedented. There
are 3 warnings in this file which I believe can also be handled by external
tools. I'm not debating the merit of these warnings, nor am I trying to
justify the addition of new warnings based on these.

My real question is whether vacuum should be preemptively complaining about
prepared transactions or stale replication slots rather than waiting for
the transaction ID to exceed the safe limit. I presume that by the time the
safe limit is exceeded, vacuum's work would already have been significantly
impacted.

AFAICT, my patch actually doesn't break anything and doesn't add any
significant overhead to the vacuum process. It does supplement the current
warnings, though, which might be useful.

On Thu, Apr 16, 2020 at 10:32 AM Craig Ringer  wrote:

> On Thu, 16 Apr 2020 at 13:23, Craig Ringer  wrote:
>
>>
>> Just discarding the prepared xacts is not the answer though.
>>
>
> ... however, I have wondered a few times about making vacuum smarter about
> cases where the xmin is held down by prepared xacts or by replication
> slots. If we could record the oldest *and newest* xid needed by such
> resource retention markers we could potentially teach vacuum to remove
> intermediate dead rows. For high-churn workloads like like workqueue
> applications that could be a really big win.
>
> We wouldn't need to track a fine-grained snapshot with an in-progress list
> (or inverted in-progress list like historic snapshots) for these. We'd just
> remember the needed xid range in [xmin,xmax] form. And we could even do the
> same for live backends' PGXACT - it might not be worth the price there, but
> if you have workloads that have batch xacts + high churn rate xacts it'd be
> pretty appealing.
>
> It wouldn't help with xid wraparound concerns, but it could help a lot
> with bloat caused by old snapshots for some very common workloads.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  2ndQuadrant - PostgreSQL Solutions for the Enterprise
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: Do we need to handle orphaned prepared transactions in the server?

2020-04-14 Thread Hamid Akhtar
Thank you so much Bruce and Sawada.

Bruce:
I'll update the documentation with more details for max_age_prepared_xacts.

Sawada:
You have raised 4 very valid points. Here are my thoughts.

(1)
I think your concern is whether we can reduce the need for 2 GUCs to one.

The purpose of having 2 different GUCs was serving two different purposes.
- max_age_prepared_xacts; this defines the maximum age of a prepared
transaction after which it may be considered an orphan.
- prepared_xacts_vacuum_warn_timeout; since we are throwing warnings in the
log, we need a way of controlling the behaviour to prevent flooding
the log file with our messages. This timeout defines that. Maybe there is
a better way of handling this, but maybe not.

(2)
Your point is that when there is more than one prepared transaction (and
even if the first one is removed), the timeout
prepared_xacts_vacuum_warn_timeout isn't always accurate.

Yes, I agree. However, for us to hit the exact timeout for each prepared
transaction, we would need to set up timers and callback functions. That's too
much of an overhead IMHO. The alternative design that I took (the current
design) is based on the assumption that we don't need a precise timeout for
these transactions or for vacuum to report these issues to the log. So, a
decent enough way of setting a timeout should be good enough, considering
that it doesn't add any real overhead to the vacuum process. This does mean
that an orphaned prepared transaction may only be reported after up to
prepared_xacts_vacuum_warn_timeout * 2. This, IMHO, should be acceptable
behavior.

(3)
Message is too detailed.

Yes, I agree. I'll review this an update the patch.

(4)
GUCs should be renamed.

Yes, I agree. The names you have suggested make more sense. I'll send an
updated version of the patch with the new names and other suggested changes.

On Wed, Mar 11, 2020 at 10:43 AM Masahiko Sawada <
masahiko.saw...@2ndquadrant.com> wrote:

> On Mon, 2 Mar 2020 at 21:42, Hamid Akhtar  wrote:
> >
> > Here is the v2 of the same patch after rebasing it and running it
> through pgindent. There are no other code changes.
> >
>
> Thank you for working on this. I think what this patch tries to
> achieve would be helpful to inform orphaned prepared transactions that
> can be cause of inefficient vacuum to users.
>
> As far as I read the patch, the setting this feature using newly added
> parameters seems to be complicated to me. IIUC, even if a prepared
> transactions is enough older than max_age_prepared_xacts, we don't
> warn if it doesn't elapsed prepared_xacts_vacuum_warn_timeout since
> when the "first" prepared transaction is created. And the first
> prepared transaction means that the first entry for
> TwoPhaseStateData->prepXacts. Therefore, if there is always more than
> one prepared transaction, we don't warn for at least
> prepared_xacts_vacuum_warn_timeout seconds even if the first added
> prepared transaction is already removed. So I'm not sure how we can
> think the setting value of prepared_xacts_vacuum_warn_timeout.
>
> Regarding the warning message, I wonder if the current message is too
> detailed. If we want to inform that there is orphaned prepared
> transactions to users, it seems to me that it's enough to report the
> existence (and possibly the number of orphaned prepared transactions),
> rather than individual details.
>
> Given that the above things, we can simply think this feature; we can
> have only max_age_prepared_xacts, and autovacuum checks the minimum of
> prepared_at of prepared transactions, and compare it to
> max_age_prepared_xacts. We can warn if (CurrentTimestamp -
> min(prepared_at)) > max_age_prepared_xacts. In addition, if we also
> want to control this behavior by the age of xid, we can have another
> GUC parameter for comparing the age(min(xid of prepared transactions))
> to that value.
>
> Finally, regarding the name of parameters, when we mention the age of
> transaction it means the age of xid of the transaction, not the time.
> Please refer to other GUC parameter having "age" in its name such as
> autovacuum_freeze_max_age, vacuum_freeze_min_age. The patch adds
> max_age_prepared_xacts but I think it should be renamed. For example,
> prepared_xact_warn_min_duration is for time and
> prepared_xact_warn_max_age for age.
>
> Regards,
>
> --
> Masahiko Sawadahttp://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: BUG #16346: pg_upgrade fails on a trigger with a comment

2020-04-08 Thread Hamid Akhtar
I have tested the patch in a little more detail.
(1) Verified that it fixes the bug
(2) Ran regression tests; all are passing.

To recap, the attached patch moves restoration of comments to the
RESTORE_PASS_POST_ACL pass. This ensures that comments are
restored in a pass where essentially all required objects have already
been created, including event triggers and materialized views (and any
other db objects).
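
As an illustration of the approach (a sketch, not the patch verbatim), the
change routes COMMENT TOC entries into the final restore pass inside
pg_backup_archiver.c's _tocEntryRestorePass():

    /* Comments must wait until the commented-on objects exist. */
    if (strcmp(te->desc, "COMMENT") == 0)
        return RESTORE_PASS_POST_ACL;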

This patch is good from my side.

On Wed, Apr 8, 2020 at 1:10 AM Hamid Akhtar  wrote:

> As you have mentioned, I have verified that indeed commit 4c40b27b broke
> this.
>
> In this particular commit moves restoration of materialized views and
> event triggers to the last phase. Perhaps, comments should also be moved to
> this phase as there may comments on either of these types of objects.
>
> Attached is a patch that resolves this issue. I've verified that it
> resolve the upgrade (and restore issue) introduced by 4c40b27b. I'll test
> this patch in a little more detail tomorrow.
>
> On Mon, Apr 6, 2020 at 8:26 PM PG Bug reporting form <
> nore...@postgresql.org> wrote:
>
>> The following bug has been logged on the website:
>>
>> Bug reference:  16346
>> Logged by:  Alexander Lakhin
>> Email address:  exclus...@gmail.com
>> PostgreSQL version: 12.2
>> Operating system:   Ubuntu 18.04
>> Description:
>>
>> When using pg_upgrade on a database with the following contents:
>> CREATE FUNCTION public.test_event_trigger() RETURNS event_trigger
>> LANGUAGE plpgsql
>> AS $$
>> BEGIN
>> RAISE NOTICE 'test_event_trigger: % %', tg_event, tg_tag;
>> END
>> $$;
>>
>> CREATE EVENT TRIGGER regress_event_trigger3 ON ddl_command_start
>>EXECUTE PROCEDURE public.test_event_trigger();
>>
>> COMMENT ON EVENT TRIGGER regress_event_trigger3 IS 'test comment';
>>
>> I get:
>> Restoring global objects in the new cluster ok
>> Restoring database schemas in the new cluster
>>   postgres
>> *failure*
>>
>> Consult the last few lines of "pg_upgrade_dump_14174.log" for
>> the probable cause of the failure.
>> Failure, exiting
>>
>> pg_upgrade_dump_14174.log contains:
>> command: "/src/postgres/tmp_install/usr/local/pgsql/bin/pg_restore" --host
>> /src/postgres --port 50432 --username postgres --clean --create
>> --exit-on-error --verbose --dbname template1
>> "pg_upgrade_dump_14174.custom"
>> >> "pg_upgrade_dump_14174.log" 2>&1
>> pg_restore: connecting to database for restore
>> pg_restore: dropping DATABASE PROPERTIES postgres
>> pg_restore: dropping DATABASE postgres
>> pg_restore: creating DATABASE "postgres"
>> pg_restore: connecting to new database "postgres"
>> pg_restore: connecting to database "postgres" as user "postgres"
>> pg_restore: creating COMMENT "DATABASE "postgres""
>> pg_restore: creating DATABASE PROPERTIES "postgres"
>> pg_restore: connecting to new database "postgres"
>> pg_restore: connecting to database "postgres" as user "postgres"
>> pg_restore: creating pg_largeobject "pg_largeobject"
>> pg_restore: creating FUNCTION "public.test_event_trigger()"
>> pg_restore: creating COMMENT "EVENT TRIGGER "regress_event_trigger3""
>> pg_restore: while PROCESSING TOC:
>> pg_restore: from TOC entry 3705; 0 0 COMMENT EVENT TRIGGER
>> "regress_event_trigger3" postgres
>> pg_restore: error: could not execute query: ERROR:  event trigger
>> "regress_event_trigger3" does not exist
>> Command was: COMMENT ON EVENT TRIGGER "regress_event_trigger3" IS 'test
>> comment';
>>
>> It looks like the commit 4c40b27b broke this.
>>
>>
>
> --
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
> ADDR: 10318 WHALLEY BLVD, Surrey, BC
> CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
> SKYPE: engineeredvirus
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: BUG #16346: pg_upgrade fails on a trigger with a comment

2020-04-07 Thread Hamid Akhtar
As you have mentioned, I have verified that indeed commit 4c40b27b broke
this.

This particular commit moves restoration of materialized views and event
triggers to the last phase. Perhaps comments should also be moved to this
phase, as there may be comments on either of these types of objects.

Attached is a patch that resolves this issue. I've verified that it resolves
the upgrade (and restore) issue introduced by 4c40b27b. I'll test this
patch in a little more detail tomorrow.

On Mon, Apr 6, 2020 at 8:26 PM PG Bug reporting form 
wrote:

> The following bug has been logged on the website:
>
> Bug reference:  16346
> Logged by:  Alexander Lakhin
> Email address:  exclus...@gmail.com
> PostgreSQL version: 12.2
> Operating system:   Ubuntu 18.04
> Description:
>
> When using pg_upgrade on a database with the following contents:
> CREATE FUNCTION public.test_event_trigger() RETURNS event_trigger
> LANGUAGE plpgsql
> AS $$
> BEGIN
> RAISE NOTICE 'test_event_trigger: % %', tg_event, tg_tag;
> END
> $$;
>
> CREATE EVENT TRIGGER regress_event_trigger3 ON ddl_command_start
>EXECUTE PROCEDURE public.test_event_trigger();
>
> COMMENT ON EVENT TRIGGER regress_event_trigger3 IS 'test comment';
>
> I get:
> Restoring global objects in the new cluster ok
> Restoring database schemas in the new cluster
>   postgres
> *failure*
>
> Consult the last few lines of "pg_upgrade_dump_14174.log" for
> the probable cause of the failure.
> Failure, exiting
>
> pg_upgrade_dump_14174.log contains:
> command: "/src/postgres/tmp_install/usr/local/pgsql/bin/pg_restore" --host
> /src/postgres --port 50432 --username postgres --clean --create
> --exit-on-error --verbose --dbname template1 "pg_upgrade_dump_14174.custom"
> >> "pg_upgrade_dump_14174.log" 2>&1
> pg_restore: connecting to database for restore
> pg_restore: dropping DATABASE PROPERTIES postgres
> pg_restore: dropping DATABASE postgres
> pg_restore: creating DATABASE "postgres"
> pg_restore: connecting to new database "postgres"
> pg_restore: connecting to database "postgres" as user "postgres"
> pg_restore: creating COMMENT "DATABASE "postgres""
> pg_restore: creating DATABASE PROPERTIES "postgres"
> pg_restore: connecting to new database "postgres"
> pg_restore: connecting to database "postgres" as user "postgres"
> pg_restore: creating pg_largeobject "pg_largeobject"
> pg_restore: creating FUNCTION "public.test_event_trigger()"
> pg_restore: creating COMMENT "EVENT TRIGGER "regress_event_trigger3""
> pg_restore: while PROCESSING TOC:
> pg_restore: from TOC entry 3705; 0 0 COMMENT EVENT TRIGGER
> "regress_event_trigger3" postgres
> pg_restore: error: could not execute query: ERROR:  event trigger
> "regress_event_trigger3" does not exist
> Command was: COMMENT ON EVENT TRIGGER "regress_event_trigger3" IS 'test
> comment';
>
> It looks like the commit 4c40b27b broke this.
>
>

-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus


upgrade_comment_fix_bug16346.patch
Description: Binary data


Re: BUG #16040: PL/PGSQL RETURN QUERY statement never uses a parallel plan

2020-03-27 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

All good with this patch. 

-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus

The new status of this patch is: Ready for Committer


Re: Minor issues in .pgpass

2020-03-04 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Tested and looks fine to me.

The new status of this patch is: Ready for Committer


Re: Minor issues in .pgpass

2020-03-04 Thread Hamid Akhtar
On Wed, Mar 4, 2020 at 4:54 PM Fujii Masao 
wrote:

>
>
> On 2020/03/04 20:39, Hamid Akhtar wrote:
> >
> >
> > On Tue, Mar 3, 2020 at 8:57 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
> >
> >
> >
> > On 2020/03/03 21:38, Hamid Akhtar wrote:
> >  >
> >  >
> >  > On Mon, Mar 2, 2020 at 6:07 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
> >  >
> >  >
> >  >
> >  > On 2020/02/29 0:46, Hamid Akhtar wrote:
> >  >  > The following review has been posted through the
> commitfest application:
> >  >  > make installcheck-world:  not tested
> >  >  > Implements feature:   not tested
> >  >  > Spec compliant:   not tested
> >  >  > Documentation:not tested
> >  >  >
> >  >  > First of all, this seems like fixing a valid issue,
> albeit, the probability of somebody messing is low, but it is still better
> to fix this problem.
> >  >  >
> >  >  > I've not tested the patch in any detail, however, there
> are a couple of comments I have before I proceed on with detailed testing.
> >  >
> >  > Thanks for the review and comments!
> >  >
> >  >  > 1. pgindent is showing a few issues with formatting.
> Please have a look and resolve those.
> >  >
> >  > Yes.
> >  >
> >  >  > 2. I think you can potentially use "len" variable instead
> of introducing "buflen" and "tmplen" variables.
> >  >
> >  > Basically I don't want to use the same variable for several
> purposes
> >  > because which would decrease the code readability.
> >  >
> >  >  > Also, I would choose a more appropriate name for "tmp"
> variable.
> >  >
> >  > Yeah, so what about "rest" as the variable name?
> >  >
> >  >  > I believe if you move the following lines before the
> conditional statement and simply and change the if statement to "if (len >=
> sizeof(buf) - 1)", it will serve the purpose.
> >  >
> >  > ISTM that this doesn't work correctly when the "buf" contains
> >  > trailing carriage returns but not newlines (i.e., this line
> is too long
> >  > so the "buf" doesn't include newline). In this case,
> pg_strip_crlf()
> >  > shorten the "buf" and then its return value "len" should
> become
> >  > less than sizeof(buf). So the following condition always
> becomes
> >  > false unexpectedly in that case even though there is still
> rest of
> >  > the line to eat.
> >  >
> >  >
> >  > Per code comments for pg_strip_crlf:
> >  > "pg_strip_crlf -- Remove any trailing newline and carriage return"
> >  > If the buf read contains a newline or a carriage return at the
> end, then clearly the line
> >  > is not exceeding the sizeof(buf).
> >
> > No if the length of the setting line exceeds sizeof(buf) and
> > the buf contains only a carriage return at the end and not newline.
> > This case can happen because fgets() stops reading when a newline
> > (not a carriage return) is found. Normal users are very unlikely to
> > add a carriage return into the middle of the pgpass setting line
> > in practice, though. But IMO the code should handle even this
> > case because it *can* happen, if the code is not so complicated.
> >
> >
> > I'm not sure if I understand your comment here. From the code of
> pg_strip_crlf
> > I see that it is handling both carriage return and/or new line at the
> end of a
> > string:
>
> So if "buf" contains a carriage return at the end, it's removed and
> the "len" that pg_strip_crlf() returns obviously should be smaller
> than sizeof(buf). This causes the following condition that you
> proposed as follows to always be false (i.e., len < sizeof(buf) - 1)
> even when there are still rest of line. So we cannot eat rest of
> the line even though it exists. I'm missing something?
>

No, you are perfectly fine. I now understand where you are coming from. So,
al

Re: Minor issues in .pgpass

2020-03-04 Thread Hamid Akhtar
On Tue, Mar 3, 2020 at 8:57 PM Fujii Masao 
wrote:

>
>
> On 2020/03/03 21:38, Hamid Akhtar wrote:
> >
> >
> > On Mon, Mar 2, 2020 at 6:07 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
> >
> >
> >
> > On 2020/02/29 0:46, Hamid Akhtar wrote:
> >  > The following review has been posted through the commitfest
> application:
> >  > make installcheck-world:  not tested
> >  > Implements feature:   not tested
> >  > Spec compliant:   not tested
> >  > Documentation:not tested
> >  >
> >  > First of all, this seems like fixing a valid issue, albeit, the
> probability of somebody messing is low, but it is still better to fix this
> problem.
> >  >
> >  > I've not tested the patch in any detail, however, there are a
> couple of comments I have before I proceed on with detailed testing.
> >
> > Thanks for the review and comments!
> >
> >  > 1. pgindent is showing a few issues with formatting. Please have
> a look and resolve those.
> >
> > Yes.
> >
> >  > 2. I think you can potentially use "len" variable instead of
> introducing "buflen" and "tmplen" variables.
> >
> > Basically I don't want to use the same variable for several purposes
> > because which would decrease the code readability.
> >
> >  > Also, I would choose a more appropriate name for "tmp" variable.
> >
> > Yeah, so what about "rest" as the variable name?
> >
> >  > I believe if you move the following lines before the conditional
> statement and simply and change the if statement to "if (len >= sizeof(buf)
> - 1)", it will serve the purpose.
> >
> > ISTM that this doesn't work correctly when the "buf" contains
> > trailing carriage returns but not newlines (i.e., this line is too
> long
> > so the "buf" doesn't include newline). In this case, pg_strip_crlf()
> > shorten the "buf" and then its return value "len" should become
> > less than sizeof(buf). So the following condition always becomes
> > false unexpectedly in that case even though there is still rest of
> > the line to eat.
> >
> >
> > Per code comments for pg_strip_crlf:
> > "pg_strip_crlf -- Remove any trailing newline and carriage return"
> > If the buf read contains a newline or a carriage return at the end, then
> clearly the line
> > is not exceeding the sizeof(buf).
>
> No if the length of the setting line exceeds sizeof(buf) and
> the buf contains only a carriage return at the end and not newline.
> This case can happen because fgets() stops reading when a newline
> (not a carriage return) is found. Normal users are very unlikely to
> add a carriage return into the middle of the pgpass setting line
> in practice, though. But IMO the code should handle even this
> case because it *can* happen, if the code is not so complicated.
>

I'm not sure if I understand your comment here. From the code of
pg_strip_crlf, I see that it handles both a carriage return and/or a
newline at the end of a string:
=
src/common/string.c
=
while (len > 0 && (str[len - 1] == '\n' || str[len - 1] == '\r'))
	str[--len] = '\0';
=


> Regards,
>
>
> --
> Fujii Masao
> NTT DATA CORPORATION
> Advanced Platform Technology Group
> Research and Development Headquarters
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: Minor issues in .pgpass

2020-03-03 Thread Hamid Akhtar
On Tue, Mar 3, 2020 at 5:38 PM Hamid Akhtar  wrote:

>
>
> On Mon, Mar 2, 2020 at 6:07 PM Fujii Masao 
> wrote:
>
>>
>>
>> On 2020/02/29 0:46, Hamid Akhtar wrote:
>> > The following review has been posted through the commitfest application:
>> > make installcheck-world:  not tested
>> > Implements feature:   not tested
>> > Spec compliant:   not tested
>> > Documentation:not tested
>> >
>> > First of all, this seems like fixing a valid issue, albeit, the
>> probability of somebody messing is low, but it is still better to fix this
>> problem.
>> >
>> > I've not tested the patch in any detail, however, there are a couple of
>> comments I have before I proceed on with detailed testing.
>>
>> Thanks for the review and comments!
>>
>> > 1. pgindent is showing a few issues with formatting. Please have a look
>> and resolve those.
>>
>> Yes.
>>
>> > 2. I think you can potentially use "len" variable instead of
>> introducing "buflen" and "tmplen" variables.
>>
>> Basically I don't want to use the same variable for several purposes
>> because which would decrease the code readability.
>>
>
That is fine.


>
>> > Also, I would choose a more appropriate name for "tmp" variable.
>>
>> Yeah, so what about "rest" as the variable name?
>>
>
Maybe something like "excess_buf" or any other name that indicates that
these bytes are to be discarded.


>
>> > I believe if you move the following lines before the conditional
>> statement and simply and change the if statement to "if (len >= sizeof(buf)
>> - 1)", it will serve the purpose.
>>
>> ISTM that this doesn't work correctly when the "buf" contains
>> trailing carriage returns but not newlines (i.e., this line is too long
>> so the "buf" doesn't include newline). In this case, pg_strip_crlf()
>> shorten the "buf" and then its return value "len" should become
>> less than sizeof(buf). So the following condition always becomes
>> false unexpectedly in that case even though there is still rest of
>> the line to eat.
>>
>
> Per code comments for pg_strip_crlf:
> "pg_strip_crlf -- Remove any trailing newline and carriage return"
>
> If the buf read contains a newline or a carriage return at the end, then
> clearly the line
> is not exceeding the sizeof(buf). If alternatively, it doesn't, then
> pg_strip_crlf will have
> no effect on string length and for any lines exceeding sizeof(buf), the
> following conditional
> statement becomes true.
>
>
>> > +   if (len >= sizeof(buf) - 1)
>> > +   {
>> > +   chartmp[LINELEN];
>>
>> Regards,
>>
>> --
>> Fujii Masao
>> NTT DATA CORPORATION
>> Advanced Platform Technology Group
>> Research and Development Headquarters
>>
>
>
> --
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
> ADDR: 10318 WHALLEY BLVD, Surrey, BC
> CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
> SKYPE: engineeredvirus
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: Minor issues in .pgpass

2020-03-03 Thread Hamid Akhtar
On Mon, Mar 2, 2020 at 6:07 PM Fujii Masao 
wrote:

>
>
> On 2020/02/29 0:46, Hamid Akhtar wrote:
> > The following review has been posted through the commitfest application:
> > make installcheck-world:  not tested
> > Implements feature:   not tested
> > Spec compliant:   not tested
> > Documentation:not tested
> >
> > First of all, this seems like fixing a valid issue, albeit, the
> probability of somebody messing is low, but it is still better to fix this
> problem.
> >
> > I've not tested the patch in any detail, however, there are a couple of
> comments I have before I proceed on with detailed testing.
>
> Thanks for the review and comments!
>
> > 1. pgindent is showing a few issues with formatting. Please have a look
> and resolve those.
>
> Yes.
>
> > 2. I think you can potentially use "len" variable instead of introducing
> "buflen" and "tmplen" variables.
>
> Basically I don't want to use the same variable for several purposes
> because which would decrease the code readability.
>
> > Also, I would choose a more appropriate name for "tmp" variable.
>
> Yeah, so what about "rest" as the variable name?
>
> > I believe if you move the following lines before the conditional
> statement and simply and change the if statement to "if (len >= sizeof(buf)
> - 1)", it will serve the purpose.
>
> ISTM that this doesn't work correctly when the "buf" contains
> trailing carriage returns but not newlines (i.e., this line is too long
> so the "buf" doesn't include newline). In this case, pg_strip_crlf()
> shorten the "buf" and then its return value "len" should become
> less than sizeof(buf). So the following condition always becomes
> false unexpectedly in that case even though there is still rest of
> the line to eat.
>

Per code comments for pg_strip_crlf:
"pg_strip_crlf -- Remove any trailing newline and carriage return"

If the buf read contains a newline or a carriage return at the end, then
clearly the line does not exceed sizeof(buf). If, alternatively, it doesn't,
then pg_strip_crlf will have no effect on the string length, and for any
line exceeding sizeof(buf) the following conditional statement becomes true.


> > +   if (len >= sizeof(buf) - 1)
> > +   {
> > +   chartmp[LINELEN];
>
> Regards,
>
> --
> Fujii Masao
> NTT DATA CORPORATION
> Advanced Platform Technology Group
> Research and Development Headquarters
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: Do we need to handle orphaned prepared transactions in the server?

2020-03-02 Thread Hamid Akhtar
Here is the v2 of the same patch after rebasing it and running it through
pgindent. There are no other code changes.


On Wed, Feb 19, 2020 at 8:04 PM Hamid Akhtar  wrote:

> All,
>
> Attached is version 1 of POC patch for notifying of orphaned
> prepared transactions via warnings emitted to a client
> application and/or log file. It applies to PostgreSQL branch
> "master" on top of "e2e02191" commit.
>
> I've tried to keep the patch as less invasive as I could with
> minimal impact on vacuum processes, so the performance impact
> and the changes are minimal in that area of PostgreSQL core.
>
>
> - What's in this Patch:
>
> This patch throws warnings when an autovacuum worker encounters
> an orphaned prepared transaction. It also throws warnings to a
> client when a vacuum command is issued. This patch also
> introduces two new GUCs:
>
> (1) max_age_prepared_xacts
> - The age after creation of a prepared transaction after which it
> will be considered an orphan.
>
> (2) prepared_xacts_vacuum_warn_timeout
> - The timeout period for an autovacuum (essentially any of its
> worker) to check for orphaned prepared transactions and throw
> warnings if any are found.
>
>
> - What This Patch Does:
>
> If the GUCs are enabled (set to a value higher than -1), an
> autovacuum worker running in the background checks if the
> timeout has expired. If so, it checks if there are any orphaned
> prepared transactions (i.e. their age has exceeded
> max_age_prepared_xacts). If it finds any, it throws a warning for
> every such transaction. It also emits the total number of orphaned
> prepared transactions if one or more are found.
>
> When a vacuum command is issued from within a client, say psql,
> in that case, we skip the vacuum timeout check and simply scan
> for any orphaned prepared transactions. Warnings are emitted to
> the client and log file if any are found.
>
>
> - About the New GUCs:
>
> = max_age_prepared_xacts:
> Sets maximum age after which a prepared transaction is considered an
> orphan. It applies when "prepared transactions" are enabled. The
> age for a transaction is calculated from the time it was created to
> the current time. If this value is specified without units, it is taken
> as milliseconds. The default value is -1 which allows prepared
> transactions to live forever.
>
> = prepared_xacts_vacuum_warn_timeout:
> Sets timeout after which vacuum starts throwing warnings for every
> prepared transactions that has exceeded maximum age defined by
> "max_age_prepared_xacts". If this value is specified without units,
> it is taken as milliseconds. The default value of -1 will disable
> this warning mechanism. Setting a too value could potentially fill
> up log with orphaned prepared transaction warnings, so this
> parameter must be set to a value that is reasonably large to not
> fill up log file, but small enough to notify of long running and
> potential orphaned prepared transactions. There is no additional
> timer or worker introduced with this change. Whenever a vacuum
> worker runs, it first checks for any orphaned prepared transactions.
> So at best, this GUC serves as a guideline for a vacuum worker
> if a warning should be thrown to log file or a client issuing
> vacuum command.
>
>
> - What this Patch Does Not Cover:
>
> The warning is not thrown when user either runs vacuumdb or passes
> individual relations to be vacuum. Specifically in case of vacuumdb,
> it breaks down a vacuum command to an attribute-wise vacuum command.
> So the vacuum command is indirectly run many times. Considering that
> we want to emit warnings for every manual vacuum command, this simply
> floods the terminal and log with orphaned prepared transactions
> warnings. We could potentially handle that, but the overhead of
> that seemed too much to me (and I've not invested any time looking
> to fix that either). Hence, warnings are not thrown when user runs
> vacuumdb and relation specific vacuum.
>
>
>
> On Fri, Jan 31, 2020 at 7:02 PM Hamid Akhtar 
> wrote:
>
>>
>>
>> On Thu, Jan 30, 2020 at 8:28 AM Craig Ringer 
>> wrote:
>>
>>> On Thu, 30 Jan 2020 at 02:04, Hamid Akhtar 
>>> wrote:
>>> >
>>> > So having seen the feedback on this thread, and I tend to agree with
>>> most of what has been said here, I also agree that the server core isn't
>>> really the ideal place to handle the orphan prepared transactions.
>>> >
>>> > Ideally, these must be handled by a transaction manager, however, I do
>>> believe that we cannot let database suffer for failing of an external
>>> software, and we di

Re: Minor issues in .pgpass

2020-02-28 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

First of all, this seems like fixing a valid issue; albeit the probability of
somebody messing this up is low, it is still better to fix this problem.

I've not tested the patch in any detail, however, there are a couple of 
comments I have before I proceed on with detailed testing.

1. pgindent is showing a few issues with formatting. Please have a look and 
resolve those.
2. I think you can potentially use "len" variable instead of introducing 
"buflen" and "tmplen" variables. Also, I would choose a more appropriate name 
for "tmp" variable.

I believe if you move the following lines before the conditional statement and 
simply change the if statement to "if (len >= sizeof(buf) - 1)", it will
serve the purpose.

/* strip trailing newline and carriage return */
len = pg_strip_crlf(buf);

if (len == 0)
continue;


So, the patch should look like this in my opinion (ignore the formatting issues 
as this is just to give you an idea of what I mean):

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 408000a..6ca262f 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -6949,6 +6949,7 @@ passwordFromFile(const char *hostname, const char *port, const char *dbname,
 {
FILE   *fp;
struct stat stat_buf;
+   int line_number = 0;
 
 #define LINELEN NAMEDATALEN*5
charbuf[LINELEN];
@@ -7018,10 +7019,40 @@ passwordFromFile(const char *hostname, const char *port, const char *dbname,
if (fgets(buf, sizeof(buf), fp) == NULL)
break;
 
-   /* strip trailing newline and carriage return */
-   len = pg_strip_crlf(buf);
+   line_number++;
 
-   if (len == 0)
+/* strip trailing newline and carriage return */
+len = pg_strip_crlf(buf);
+
+if (len == 0)
+continue;
+
+   if (len >= sizeof(buf) - 1)
+   {
+   chartmp[LINELEN];
+
+   /*
+* Warn if this password setting line is too long,
+* because it's unexpectedly truncated.
+*/
+   if (buf[0] != '#')
+   fprintf(stderr,
+   libpq_gettext("WARNING: line %d 
too long in password file \"%s\"\n"),
+   line_number, pgpassfile);
+
+   /* eat rest of the line */
+   while (!feof(fp) && !ferror(fp))
+   {
+   if (fgets(tmp, sizeof(tmp), fp) == NULL)
+   break;
+   len = strlen(tmp);
+   if (len < sizeof(tmp) - 1 || tmp[len - 1] == '\n')
+   break;
+   }
+   }
+
+   /* ignore comments */
+   if (buf[0] == '#')

---
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus

The new status of this patch is: Waiting on Author


Re: Do we need to handle orphaned prepared transactions in the server?

2020-02-19 Thread Hamid Akhtar
All,

Attached is version 1 of POC patch for notifying of orphaned
prepared transactions via warnings emitted to a client
application and/or log file. It applies to PostgreSQL branch
"master" on top of "e2e02191" commit.

I've tried to keep the patch as unintrusive as I could, with
minimal impact on vacuum processes, so the performance impact
and the changes are minimal in that area of PostgreSQL core.


- What's in this Patch:

This patch throws warnings when an autovacuum worker encounters
an orphaned prepared transaction. It also throws warnings to a
client when a vacuum command is issued. This patch also
introduces two new GUCs:

(1) max_age_prepared_xacts
- The age after creation of a prepared transaction after which it
will be considered an orphan.

(2) prepared_xacts_vacuum_warn_timeout
- The timeout period after which an autovacuum (essentially any of its
workers) checks for orphaned prepared transactions and throws
warnings if any are found.


- What This Patch Does:

If the GUCs are enabled (set to a value higher than -1), an
autovacuum worker running in the background checks whether the
timeout has expired. If so, it checks whether there are any orphaned
prepared transactions (i.e., transactions whose age has exceeded
max_age_prepared_xacts). If it finds any, it throws a warning for
every such transaction. It also emits the total number of orphaned
prepared transactions if one or more are found.

When a vacuum command is issued from a client, say psql, we skip
the vacuum timeout check and simply scan for any orphaned prepared
transactions. Warnings are emitted to the client and the log file
if any are found.


- About the New GUCs:

= max_age_prepared_xacts:
Sets maximum age after which a prepared transaction is considered an
orphan. It applies when "prepared transactions" are enabled. The
age for a transaction is calculated from the time it was created to
the current time. If this value is specified without units, it is taken
as milliseconds. The default value is -1 which allows prepared
transactions to live forever.

= prepared_xacts_vacuum_warn_timeout:
Sets timeout after which vacuum starts throwing warnings for every
prepared transactions that has exceeded maximum age defined by
"max_age_prepared_xacts". If this value is specified without units,
it is taken as milliseconds. The default value of -1 will disable
this warning mechanism. Setting a too value could potentially fill
up log with orphaned prepared transaction warnings, so this
parameter must be set to a value that is reasonably large to not
fill up log file, but small enough to notify of long running and
potential orphaned prepared transactions. There is no additional
timer or worker introduced with this change. Whenever a vacuum
worker runs, it first checks for any orphaned prepared transactions.
So at best, this GUC serves as a guideline for a vacuum worker
if a warning should be thrown to log file or a client issuing
vacuum command.
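
To illustrate the intended usage, here is a rough sketch (it assumes this
patch is applied and that both GUCs are reloadable; the two GUCs do not
exist in stock PostgreSQL, and the values below are arbitrary):

alter system set max_age_prepared_xacts = '1h';
alter system set prepared_xacts_vacuum_warn_timeout = '5min';
select pg_reload_conf();

-- a plain vacuum command should now also warn about any prepared
-- transaction older than max_age_prepared_xacts
vacuum;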


- What this Patch Does Not Cover:

The warning is not thrown when the user either runs vacuumdb or passes
individual relations to be vacuumed. Specifically, in the case of vacuumdb,
it breaks a vacuum command down into attribute-wise vacuum commands,
so the vacuum command is indirectly run many times. Considering that
we want to emit warnings for every manual vacuum command, this simply
floods the terminal and log with orphaned prepared transaction
warnings. We could potentially handle that, but the overhead
seemed too much to me (and I've not invested any time looking
into fixing it either). Hence, warnings are not thrown when the user
runs vacuumdb or a relation-specific vacuum.



On Fri, Jan 31, 2020 at 7:02 PM Hamid Akhtar  wrote:

>
>
> On Thu, Jan 30, 2020 at 8:28 AM Craig Ringer 
> wrote:
>
>> On Thu, 30 Jan 2020 at 02:04, Hamid Akhtar 
>> wrote:
>> >
>> > So having seen the feedback on this thread, and I tend to agree with
>> most of what has been said here, I also agree that the server core isn't
>> really the ideal place to handle the orphan prepared transactions.
>> >
>> > Ideally, these must be handled by a transaction manager, however, I do
>> believe that we cannot let database suffer for failing of an external
>> software, and we did a similar change through introduction of idle in
>> transaction timeout behavior.
>>
>> The difference, IMO, is that idle-in-transaction aborts don't affect
>> anything we've promised to be durable.
>>
>> Once you PREPARE TRANSACTION the DB has made a promise that that txn
>> is durable. We don't have any consistent feedback channel to back to
>> applications and say "Hey, if you're not going to finish this up we
>> need to get rid of it soon, ok?". If a distributed transaction manager
>> gets consensus for commit and goes to COMMIT PREPARED a previously
>> prepared txn on

Re: Do we need to handle orphaned prepared transactions in the server?

2020-01-31 Thread Hamid Akhtar
On Thu, Jan 30, 2020 at 8:28 AM Craig Ringer  wrote:

> On Thu, 30 Jan 2020 at 02:04, Hamid Akhtar  wrote:
> >
> > So having seen the feedback on this thread, and I tend to agree with
> most of what has been said here, I also agree that the server core isn't
> really the ideal place to handle the orphan prepared transactions.
> >
> > Ideally, these must be handled by a transaction manager, however, I do
> believe that we cannot let database suffer for failing of an external
> software, and we did a similar change through introduction of idle in
> transaction timeout behavior.
>
> The difference, IMO, is that idle-in-transaction aborts don't affect
> anything we've promised to be durable.
>
> Once you PREPARE TRANSACTION the DB has made a promise that that txn
> is durable. We don't have any consistent feedback channel to back to
> applications and say "Hey, if you're not going to finish this up we
> need to get rid of it soon, ok?". If a distributed transaction manager
> gets consensus for commit and goes to COMMIT PREPARED a previously
> prepared txn only to find that it has vanished, that's a major
> problem, and one that may bring the entire DTM to a halt until the
> admin can intervene.
>
> This isn't like idle-in-transaction aborts. It's closer to something
> like uncommitting a previously committed transaction.
>
> I do think it'd make sense to ensure that the documentation clearly
> highlights the impact of abandoned prepared xacts on server resource
> retention and performance, preferably with pointers to appropriate
> views. I haven't reviewed the docs to see how clear that is already.
>

Having gone through the documentation, IMHO it does contain enough
information for users to understand what issues can be caused by these
orphaned prepared transactions.


>
> I can also see an argument for a periodic log message (maybe from
> vacuum?) warning when old prepared xacts hold xmin down. Including one
> sent to the client application when an explicit VACUUM is executed.
> (In fact, it'd make sense to generalise that for all xmin-retention).
>

I think that opens up the debate on what we really mean by "old" and
whether that requires a syntax change when creating a prepared
transaction, as Thomas Kellerer suggested earlier?

I agree that vacuum should periodically throw warnings for any prepared
xacts that are holding xmin down.

Generalising it for all xmin-retention is a fair idea IMHO, though that
does increase the overall scope here. A vacuum process should (ideally)
periodically throw out warnings for anything that is preventing it
from doing its routine work (including orphaned prepared transactions)
so that somebody can take the necessary actions.
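
As a point of reference, an administrator can already eyeball the main
xmin holders today with a query along these lines (a rough sketch only;
it combines prepared transactions with backend snapshots):

select 'prepared xact ' || gid as holder, age(transaction) as xid_age
from pg_prepared_xacts
union all
select 'backend pid ' || pid, age(backend_xmin)
from pg_stat_activity
where backend_xmin is not null
order by xid_age desc;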


> But I'm really not a fan of aborting such txns. If you operate with
> some kind of broken global transaction manager that can forget or
> abandon prepared xacts, then fix it, or adopt site-local periodic
> cleanup tasks that understand your site's needs.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  2ndQuadrant - PostgreSQL Solutions for the Enterprise
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: Do we need to handle orphaned prepared transactions in the server?

2020-01-29 Thread Hamid Akhtar
Having seen the feedback on this thread, most of which I tend to agree
with, I also agree that the server core isn't really the ideal place to
handle orphaned prepared transactions.

Ideally, these should be handled by a transaction manager; however, I do
believe that we cannot let the database suffer for the failure of an
external piece of software, and we made a similar change through the
introduction of the idle-in-transaction timeout behavior. That said,
implementing something similar for this feature is too much of an
overhead, both in terms of code complexity and resource utilisation.

I'm currently working on other options to tackle this problem.


On Tue, 28 Jan 2020 at 9:04 AM, Craig Ringer  wrote:

> On Thu, 23 Jan 2020 at 15:04, Michael Paquier  wrote:
>
> > It seems to me that what you are describing here is a set of
> > properties good for a monitoring tool that we don't necessarily need
> > to maintain in core.  There are already tools able to do that in ways
> > I think are better than what we could ever design, like
> > check_pgactivity and such.
>
> I really have to disagree here.
>
> Relying on external tools gives users who already have to piece
> together a lot of fragments even more moving parts to keep track of.
> It introduces more places where new server releases may not be
> supported in a timely manner by various tools users rely on. More
> places where users may get wrong or incomplete information from
> outdated or incorrect tools. I cite the monstrosity that
> "check_postgres.pl" has become as a specific example of why pushing
> our complexity onto external tools is not always the right answer.
>
> We already have a number of views that prettify information to help
> administrators operate the server. You could argue that
> pg_stat_activity and pg_stat_replication are unnecessary for example;
> users should use external tools to query pg_stat_get_activity(),
> pg_stat_get_wal_senders(), pg_authid and pg_database directly to get
> the information they need. Similarly, we could do away with
> pg_stat_user_indexes and the like, as they're just convenience views
> over lower level information exposed by the server.
>
> But can you really imagine using postgres day to day without
> pg_stat_activity?
>
> It is my firm opinion that visibility into locking behaviour and lock
> waits is of a similar level of importance. So is giving users some way
> to get insight into table and index bloat on our MVCC database. With
> the enormous uptake of various forms of replication and HA it's also
> important that users also be able to see what's affecting resource
> retention - holding down vacuum, retaining WAL, etc.
>
> The server knows more than any tools. Views in the server can also be
> maintained along with the server to address changes in how it manages
> things like resource retention, so external tools get a more
> consistent insight into server behaviour.
>
> > I'd rather just focus in the core code on the basics with views
> > that map directly to what we have in memory and/or disk.
>
> Per above, I just can't agree with this. PostgreSQL is a system with
> end users who need to interact with it, most of whom will not know how
> its innards work. If we're going to position it even more as a
> component in some larger stack such that it's not expected to really
> be used standalone, then we should make some effort to guide users
> toward the other components they will need *in our own documentation*
> and ensure they're tested and maintained.
>
> Proposals to do that with HA and failover tooling, backup tooling etc
> have never got off the ground. I think we do users a great disservice
> there personally. I don't expect any proposal to bless specific
> monitoring tools to be any more successful.
>
> More importantly, I fail to see why every monitoring tool should
> reinvent the same information collection queries and views, each with
> their own unique bugs and quirks, when we can provide information
> users need directly from the server.
>
> In any case I guess it's all hot air unless I pony up a patch to show
> how I think it should work.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  2ndQuadrant - PostgreSQL Solutions for the Enterprise
>
>
> --
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: BUG #16171: Potential malformed JSON in explain output

2020-01-24 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

I've reviewed and verified this patch and, IMHO, it is ready to be committed.


So I have verified this patch against the tip of the REL_12_STABLE branch
(commit c4c76d19).
- git apply works without issues.

Although the patch description mentions that it fixes malformed JSON in the
explain output by creating a "Plan" group, it also fixes the same
malformation issue for the XML and YAML formats. It does not impact the text
output though.

I'm sharing the problematic part of the output from an unpatched version for
the JSON, XML, and YAML formats:
-- JSON
   "Plans": [
 "Subplans Removed": 1,
 {
   "Node Type": "Seq Scan",

-- XML
   <Plans>
     <Subplans-Removed>1</Subplans-Removed>
     <Plan>

-- YAML
 Plans:
   Subplans Removed: 1
   - Node Type: "Seq Scan"

The patched version gives the following and correct output:
-- JSON
   "Plans": [
 {
   "Subplans Removed": 1
 }, 
 {
   "Node Type": "Seq Scan",

-- XML
   <Plans>
     <Plan>
       <Subplans-Removed>1</Subplans-Removed>
     </Plan>
     <Plan>

-- YAML
 Plans:
   - Subplans Removed: 1
   - Node Type: "Seq Scan"

Following is the query that I used for validating the output. I picked it up 
(and simplified) from "src/test/regress/sql/partition_prune.sql". You can 
probably come up with a simpler query, but this does the job. The query below 
gives the output in JSON format:

create table ab (a int not null, b int not null) partition by list (a);
create table ab_a2 partition of ab for values in(2) partition by list (b);
create table ab_a2_b1 partition of ab_a2 for values in (1);
create table ab_a1 partition of ab for values in(1) partition by list (b);
create table ab_a1_b1 partition of ab_a1 for values in (2);

-- Disallow index only scans as concurrent transactions may stop visibility
-- bits being set causing "Heap Fetches" to be unstable in the EXPLAIN ANALYZE
-- output.
set enable_indexonlyscan = off;

prepare ab_q1 (int, int, int) as
select * from ab where a between $1 and $2 and b <= $3;

-- Execute query 5 times to allow choose_custom_plan
-- to start considering a generic plan.
execute ab_q1 (1, 8, 3);
execute ab_q1 (1, 8, 3);
execute ab_q1 (1, 8, 3);
execute ab_q1 (1, 8, 3);
execute ab_q1 (1, 8, 3);

explain (format json, analyze, costs off, summary off, timing off) execute 
ab_q1 (2, 2, 3);

deallocate ab_q1;
drop table ab;
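
As an extra sanity check, the server can be made to parse the EXPLAIN
output itself. The PL/pgSQL block below is only a sketch of that idea:
capturing EXPLAIN (FORMAT JSON) output with EXECUTE ... INTO a json
variable forces a text-to-json conversion, which errors out if the
output is malformed.

do $$
declare
  plan json;
begin
  execute 'explain (format json) select 1' into plan;
  raise notice 'explain produced well-formed JSON';
end
$$;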


The new status of this patch is: Ready for Committer


Do we need to handle orphaned prepared transactions in the server?

2020-01-21 Thread Hamid Akhtar
Hello Everyone,

I have been thinking about the orphaned prepared transaction problem in
PostgreSQL and pondering ways of handling it.

A prepared transaction can be left unfinished (neither committed nor
rolled back) if the client has disappeared. This can happen for various
reasons, including a client crash, or a server crash leading to the
client's connection getting terminated and the client never returning.
Another way a prepared transaction can be left unfinished is if a backup
is restored that carried the preparation steps, but not the steps closing
the transaction.

Needless to say, this hampers maintenance work, including the vacuuming
of dead tuples.

First and foremost is to define what an orphaned transaction is. At this
stage, I believe any prepared transaction that has been around for more
than some time X may be considered an orphan. X may be defined as an
integer in seconds (a GUC perhaps). Maybe there are better ways to define
this. Please feel free to chime in.

This leads to the question of whether, at the server level, we need to be
better at managing these orphaned prepared transactions. There are
obviously other ways of identifying such transactions, for example by
simply querying pg_prepared_xacts and checking the transaction start
date, as sketched below. This begs the question of whether there is a
real need to change the server to either terminate these transactions
(perhaps something similar to idle_in_transaction_session_timeout) or
notify an administrator (less preferred, as I believe notifications
should be handled by external tools, not by the server).
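
For example, a query along these lines lists prepared transactions older
than some threshold (a sketch only; the one-day cutoff is arbitrary):

select gid, prepared, owner, database
from pg_prepared_xacts
where prepared < current_timestamp - interval '1 day'
order by prepared;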

I see 3 potential solutions for solving this:
(1) Only check for prepared transactions when the server is starting or
restarting (perhaps after a crash)
(2) Have a background process that checks prepared transactions against
an idle timeout
(3) Do not make any change in the server and let the administrator handle
this via a client or an external tool

Option (1) IMHO seems to be the least suitable one, as I'd expect that
when a server is being started (or restarted), perhaps after a crash, it
is done manually and the user can see the server startup logs. So it is
very likely that the user will notice any prepared transactions that were
created when the server was previously running and take any necessary
actions.

Option (3) is to let users manage it on their own, however they wish.
This is the simplest and easiest way, as we don't need to do anything here.

Option (2) is probably the best solution IMHO, though it does require
changes in the server, which might be an undertaking we do not wish to
pursue for this problem.

So, in case we wish to move forward with Option (2), this will require a
change in the server. One potential place is autovacuum, adding a change
similar to what was done for idle_in_transaction_session_timeout, but
rather than terminating the connection, we would simply abort/roll back
the transaction. We could have a similar GUC for a prepared transaction
timeout. Though to be able to do that, we obviously need a backend
process that can monitor the timer, which will add overhead to an
existing background process like autovacuum, or require the creation of
a new background process (which is not such a good idea IMHO) that will
add even more overhead.

At this stage, I'm not sure of the scale of the changes this will
require; however, I wanted to get an understanding and consensus on
(a) whether this is something we should work on, and (b) whether an
approach based on implementing a timeout makes sense.

Please feel free to share your thoughts here.

Regards.

-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus