Re: [HACKERS] why not parallel seq scan for slow functions

2018-03-12 Thread Ashutosh Bapat
On Sun, Mar 11, 2018 at 5:49 PM, Amit Kapila  wrote:

>
>> +/* Add projection step if needed */
>> +if (target && simple_gather_path->pathtarget != target)
>>
>> If the target was copied someplace, this test will fail. Probably we want
>> to check the contents of the PathTarget structure? Right now
>> copy_pathtarget() is not called from many places and all those places
>> modify the copied target. So this isn't a problem. But we can't guarantee
>> that in future. Similar comment for gather_merge path creation.
>>
>
> I am not sure if there is any use of copying the path target unless
> you want to modify it, but anyway we use a check similar to what is
> used in the patch in multiple places in the code.  See
> create_ordered_paths.  So, we would need to change all those places
> first if we sense any such danger.

Even if the test fails and we add a projection path here, while
creating the plan we avoid adding a Result node when the projection
target and the underlying plan's target look the same
(create_projection_plan()), so this works. An advantage of this
simple check (although it miscosts the projection) is that we don't do
an expensive target equality check for every path. The expensive check
happens only on the chosen path.


>
>>
>>  deallocate tenk1_count;
>> +explain (costs off) select ten, costly_func(ten) from tenk1;
>>
>> verbose output will show that the parallel seq scan's targetlist has
>> costly_func() in it. Isn't that what we want to test here?
>>
>
> Not exactly, we want to just test whether the parallel plan is
> selected when the costly function is used in the target list.

A parallel plan may be selected whether or not a costly function exists
in the targetlist, if the underlying scan is optimal as a parallel scan.
AFAIU, this patch is about pushing down the costly functions into the
parallel scan's targetlist. I think that can be verified only by
looking at the targetlist of the parallel seq scan plan.
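
FWIW, the check I have in mind would look roughly like this (a sketch;
costly_func and tenk1 are the names from the regression test quoted
above):

explain (verbose, costs off) select ten, costly_func(ten) from tenk1;
-- costly_func(ten) should appear in the Output: list of the Parallel
-- Seq Scan node itself, not only in the Gather above it, e.g.:
--  Gather
--    Output: ten, costly_func(ten)
--    ->  Parallel Seq Scan on public.tenk1
--          Output: ten, costly_func(ten)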

The solution here addresses only the parallel scan requirement. In
future, if we implement a solution which also addresses the
requirements of FDWs and custom plans (i.e. the ability of an FDW or
custom plan to handle targetlists), the changes made here will need to
be reverted. That would be a painful exercise.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: parallel append vs. simple UNION ALL

2018-03-12 Thread Ashutosh Bapat
On Fri, Mar 9, 2018 at 1:28 AM, Robert Haas  wrote:
>
>> 0003
>> Probably we want to rename generate_union_path() as generate_union_rel() or
>> generate_union_paths() since the function doesn't return a path anymore.
>> Similarly for generate_nonunion_path().
>
> Good point.  Changed.

It looks like it was not changed in all the places; make failed. I
have fixed all the instances of these two functions in the attached
patchset (only 0003 changes). Please check.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
From 7e654c0ae30d867edea5a1a2ca8f7a17b05ed7c5 Mon Sep 17 00:00:00 2001
From: Robert Haas 
Date: Fri, 23 Feb 2018 11:53:07 -0500
Subject: [PATCH 1/4] Let Parallel Append over simple UNION ALL have partial
 subpaths.

A simple UNION ALL gets flattened into an appendrel of subquery
RTEs, but up until now it's been impossible for the appendrel to use
the partial paths for the subqueries, so while we can implement the
appendrel as a Parallel Append, it can only be one with non-partial
paths as children.
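
For illustration, the plan shape this enables might look roughly like
this (a sketch; table names are hypothetical):

    EXPLAIN (COSTS OFF)
    SELECT a FROM big1 UNION ALL SELECT a FROM big2;
    --  Gather
    --    Workers Planned: 2
    --    ->  Parallel Append
    --          ->  Parallel Seq Scan on big1
    --          ->  Parallel Seq Scan on big2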

There were three separate obstacles to removing that limitation,
addressed here as follows.  First, when planning a subquery, propagate
any partial paths to the final_rel so that they are potentially visible
to outer query levels (but not if they have initPlans attached, because
that wouldn't be safe).  Second, after planning a subquery, propagate
any partial paths for the final_rel to the subquery RTE in the outer
query level in the same way we do for non-partial paths.  Third, teach
finalize_plan() to account for the possibility that the fake parameter
we use for rescan signalling when the plan contains a Gather (Merge)
node may be propagated from an outer query level.

Patch by me, reviewed and tested by Amit Khandekar and Rajkumar
Raghuwanshi.  Test cases based on examples by Rajkumar Raghuwanshi.

Discussion: http://postgr.es/m/ca+tgmoa6l9a1nnck3atdvzlz4kkhdn1+tm7mfyfvp+uqps7...@mail.gmail.com
---
 src/backend/optimizer/path/allpaths.c |   22 +
 src/backend/optimizer/plan/planner.c  |   16 ++
 src/backend/optimizer/plan/subselect.c|   17 ++-
 src/test/regress/expected/select_parallel.out |   65 +
 src/test/regress/sql/select_parallel.sql  |   25 ++
 5 files changed, 143 insertions(+), 2 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 1c792a0..ea4e683 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2179,6 +2179,28 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
  create_subqueryscan_path(root, rel, subpath,
 		  pathkeys, required_outer));
 	}
+
+	/* If consider_parallel is false, there should be no partial paths. */
+	Assert(sub_final_rel->consider_parallel ||
+		   sub_final_rel->partial_pathlist == NIL);
+
+	/* Same for partial paths. */
+	foreach(lc, sub_final_rel->partial_pathlist)
+	{
+		Path	   *subpath = (Path *) lfirst(lc);
+		List	   *pathkeys;
+
+		/* Convert subpath's pathkeys to outer representation */
+		pathkeys = convert_subquery_pathkeys(root,
+			 rel,
+			 subpath->pathkeys,
+			 make_tlist_from_pathtarget(subpath->pathtarget));
+
+		/* Generate outer path using this subpath */
+		add_partial_path(rel, (Path *)
+		 create_subqueryscan_path(root, rel, subpath,
+  pathkeys, required_outer));
+	}
 }
 
 /*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 24e6c46..66e7e7b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2195,6 +2195,22 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 	}
 
 	/*
+	 * Generate partial paths for final_rel, too, if outer query levels might
+	 * be able to make use of them.
+	 */
+	if (final_rel->consider_parallel && root->query_level > 1 &&
+		!limit_needed(parse))
+	{
+		Assert(!parse->rowMarks && parse->commandType == CMD_SELECT);
+		foreach(lc, current_rel->partial_pathlist)
+		{
+			Path	   *partial_path = (Path *) lfirst(lc);
+
+			add_partial_path(final_rel, partial_path);
+		}
+	}
+
+	/*
 	 * If there is an FDW that's responsible for all baserels of the query,
 	 * let it consider adding ForeignPaths.
 	 */
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index dc86dd5..83008d7 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2202,6 +2202,13 @@ SS_charge_for_initplans(PlannerInfo *root, RelOptInfo *final_rel)
 		path->parallel_safe = false;
 	}
 
+	/*
+	 * Forget about any partial paths and clear consider_parallel, too;
+	 * they're not usable if we attached an initPlan.
+	 */
+	final_rel->partial_pathlist = NIL;
+	final_rel->consider_parallel = false;
+
 	/* We needn't do set_cheapest() here, caller will do it */
 }
 
@@ -2407,10 +2414,16 @@ 

Re: [HACKERS] Another oddity in handling of WCO constraints in postgres_fdw

2018-03-12 Thread Ashutosh Bapat
On Mon, Mar 12, 2018 at 1:25 PM, Etsuro Fujita
 wrote:
> (2018/03/09 20:55), Etsuro Fujita wrote:
>>
>> (2018/03/08 14:24), Ashutosh Bapat wrote:
>>>
>>> For local constraints to be enforced, we use remote
>>> constraints. For local WCO we need to use remote WCO. That means we
>>> would create many foreign tables pointing to the same local table on
>>> the foreign server through many views, but it's not impossible.
>>
>>
>> Maybe I don't understand this correctly, but I guess that it would be
>> the user's responsibility to not create foreign tables in such a way.
>
>
> I think I misunderstood your words.  Sorry for that.  I think what you
> proposed would be a solution for this issue, but I'm not sure it's a good
> one, because it wouldn't work for data sources that don't support views
> with WCO options.

Our solution for the constraints doesn't work with data sources
(like flat files) which don't support constraints either. So that
argument doesn't help.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: ALTER TABLE ADD COLUMN fast default

2018-03-12 Thread Andrew Dunstan
On Mon, Mar 12, 2018 at 1:29 AM, David Rowley
 wrote:
> On 9 March 2018 at 02:11, David Rowley  wrote:
>> On 8 March 2018 at 18:40, Andrew Dunstan  
>> wrote:
>>>  select * from t;
>>>  fastdef tps = 107.145811
>>>  master  tps = 150.207957
>>>
>>> "select * from t" used to be about a wash, but with this patch it's
>>> got worse. The last two queries were worse and are now better, so
>>> that's a win.
>>
>> How does it compare to master if you drop a column out the table?
>> Physical tlists will be disabled in that case too. I imagine the
>> performance of master will drop much lower than the all columns
>> missing case.
>
> I decided to test this for myself, and the missing version is still
> slightly slower than the dropped column version, but not by much. I'm
> not personally concerned about this.
>
> The following results are with 1000 column tables with 64 rows each.


I've done some more extensive benchmarking now. Here are some fairly
typical results from pgbench runs done on standard scale 100 pgbench
data:

[andrew@foo tests]$ PATH=$HOME/pg_fast_def/root/HEAD/inst/bin:$PATH
pgbench -S -c 10 -j 5 -T 60 test
starting vacuum...end.
transaction type: 
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 5
duration: 60 s
number of transactions actually processed: 2235601
latency average = 0.268 ms
tps = 37256.886332 (including connections establishing)
tps = 37258.562925 (excluding connections establishing)
[andrew@foo tests]$ PATH=$HOME/pg_head/root/HEAD/inst/bin:$PATH
pgbench -S -c 10 -j 5 -T 60 test
starting vacuum...end.
transaction type: 
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 5
duration: 60 s
number of transactions actually processed: 2230085
latency average = 0.269 ms
tps = 37164.696271 (including connections establishing)
tps = 37166.647971 (excluding connections establishing)


So generally the patched code and master are pretty much on a par.

I have also done some testing on cases meant to stress-test the
feature a bit - two 1000-column tables, all columns having defaults,
one with a dropped column. For the fast_default case I then also copied
the tables (and again dropped a column) so that the data files and table
definitions would match fairly closely what was being tested in the
master branch. The scripts are in the attached tests.tgz. The test
platform is an Amazon r4.2xlarge instance running RHEL7.
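
Roughly, the setup looks like this (a sketch; names are made up, the
real scripts are in tests.tgz):

    CREATE TABLE t (c1 int DEFAULT 1);
    INSERT INTO t SELECT g FROM generate_series(1, 50000) g;
    -- with the patch, each of these stores a "missing" value
    -- instead of rewriting the table
    ALTER TABLE t ADD COLUMN c2 int DEFAULT 2;
    -- ... repeated up to c1000 ...
    ALTER TABLE t DROP COLUMN c1000;  -- dropped-column variant
    SELECT * FROM t;                  -- the query being timed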

There are two sets of results attached, one for 64 row tables and one
for 50k row tables.

The 50k row results are fairly unambiguous: the patched code performs
as well as or better than master (in some cases spectacularly better).
In a few instances the patched code performs slightly worse than master,
in the last (fdnmiss) case with the copied tables.



>
> Going by the commitfest app, the patch still does appear to be waiting
> on Author. Never-the-less, I've made another pass over it and found a
> few mistakes and a couple of ways to improve things:
>

Working on these. I should have a new patch tomorrow.

> Thanks again for working on this feature. I hope we can get this into PG11.
>

Thanks for your help. I hope so too.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


results.t100r50k
Description: Binary data


results.t100r64
Description: Binary data


tests.tgz
Description: application/compressed


Re: WARNING in parallel index creation.

2018-03-12 Thread Peter Geoghegan
On Sun, Mar 11, 2018 at 10:15 PM, Jeff Janes  wrote:
> Then when I create an index, I get a warning:
>
> jjanes=# create index on pgbench_accounts (foobar(filler));
> WARNING:  cannot set parameters during a parallel operation
> WARNING:  cannot set parameters during a parallel operation
>
> If I create the index again within the same session, there is no WARNING.
>
> This only occurs if plperl.on_init is set in the postgresql.conf file.  It
> doesn't seem to matter what it is set to,
> even the empty string triggers the warning.

I wonder why DefineCustomStringVariable() does not set var->reset_val.
We see that within DefineCustomEnumVariable(),
DefineCustomRealVariable(), DefineCustomIntVariable(), etc.

-- 
Peter Geoghegan



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-12 Thread Ashutosh Bapat
On Mon, Mar 12, 2018 at 7:49 PM, Jeevan Chalke
 wrote:
>
>
> On Mon, Mar 12, 2018 at 6:07 PM, Ashutosh Bapat
>  wrote:
>>
>> On Fri, Mar 9, 2018 at 4:21 PM, Ashutosh Bapat
>>
>> GroupPathExtraData now completely absorbs members from and replaces
>> OtherUpperPathExtraData. This means that we have to consider a way to
>> pass GroupPathExtraData to FDWs through GetForeignUpperPaths(). I
>> haven't tried it in this patch.
>
>
> Initially, I was passing OtherUpperPathExtraData to the FDW. But now we
> need to pass GroupPathExtraData.
>
> However, since GetForeignUpperPaths() is a generic function for all upper
> relations, we might think of renaming this struct to UpperPathExtraData
> and adding an UpperRelationKind member to it, which would be used to
> distinguish the passed-in extra data. But since right now we have extra
> data for grouping only, I chose not to do that here. But someone, when
> needed, may choose this approach.

We don't need an UpperRelationKind member in that structure. That will
be provided by the RelOptInfo passed in.

The problem here is that the extra information required for grouping is
not going to be the same as that needed for window aggregation, and
certainly not for ordering. If we try to jam everything into the same
structure, it will become large, with many members useless for a given
operation. A reader will not have an idea which of them are useful and
which of them are not. So instead we should try some polymorphism. I
think we can pass a void * to GetForeignUpperPaths(), and the
corresponding FDW hook knows what to cast it to based on the
UpperRelationKind passed.

BTW, the patch has added an argument to GetForeignUpperPaths() but has
not documented the change in API. If we go the route of polymorphism,
we will need to document the mapping between UpperRelationKind and the
type of structure passed in.

>
>>
>> With this patch there's a failure in partition_aggregation, where the
>> patch is creating paths with MergeAppend with GatherMerge underneath.
>> I think this is related to the call to
>> add_paths_to_partial_grouping_rel() when try_parallel_aggregation is
>> true. But I didn't investigate it further.
>
>
> I fixed it. We need to pass is_partial_agg instead of
> extra->partial_partitionwise_grouping while calling
> add_paths_to_partial_grouping_rel() in the case of parallelism.

Thanks for investigation and the fix.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Google Summer of Code: Potential Applicant

2018-03-12 Thread Craig Ringer
On 13 March 2018 at 05:34, Christos Maris 
wrote:

> Hey Aleksander,
>
> I am mostly interested in anything that requires C/C++ implementation and
> AlgoDS.
>
> For that reason I would love to work in any of the following (in that
> order of preference):
>
>1. Sorting algorithms benchmark and implementation
>2. Enhancing amcheck for all AMs
>3. TOAST'ing in slices
>4. Thrift datatype support
>
> I can work on any of the previous stated ones, so I am waiting on your
> feedback/insights on which one to choose.
>
>
I don't suppose I can interest you in wire-protocol compression support
instead? Probably not if you're more interested in an algorithms and data
science angle. But I can hope ;)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Google Summer of Code: Potential Applicant

2018-03-12 Thread Craig Ringer
On 13 March 2018 at 05:34, Christos Maris 
wrote:

> Hey Aleksander,
>
> I am mostly interested in anything that requires C/C++ implementation and
> AlgoDS.
>
> For that reason I would love to work in any of the following (in that
> order of preference):
>
>1. Sorting algorithms benchmark and implementation
>2. Enhancing amcheck for all AMs
>3. TOAST'ing in slices
>4. Thrift datatype support
>
>
Having recently worked with Thrift, I recommend ... don't use Thrift. The
library is awkward to work with, and it isn't very source-compatible
across versions.

Consider protobuf instead.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


RE: unique indexes on partitioned tables

2018-03-12 Thread Shinoda, Noriyoshi
Hi Álvaro,

Thank you for developing the new patch.
I will continue testing.

-Original Message-
From: Alvaro Herrera [mailto:alvhe...@2ndquadrant.com] 
Sent: Tuesday, March 13, 2018 7:51 AM
To: Shinoda, Noriyoshi 
Cc: Amit Langote ; Peter Eisentraut 
; Jaime Casanova 
; Jesper Pedersen ; 
Pg Hackers 
Subject: Re: unique indexes on partitioned tables

Shinoda, Noriyoshi wrote:

Hi,

> I tried this feature with the latest snapshot. When I executed the 
> following SQL statement, multiple primary keys were created on the 
> partition.
> Is this the intended behavior?

It turns out that the error check for duplicate PKs is only invoked if
you tell this code that it's being invoked by ALTER TABLE, and my
original patch wasn't.  I changed it and now everything seems to behave
as expected.

I added a test case pretty much like yours, which now works correctly.
I also added another one where the bogus PK is two levels down rather
than one.  This is because I had originally developed a different fix --
which fixed the problem for your test case, until I realized that since
this code is recursive, we could cause trouble at a distance.

Thanks for reporting the problem.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: [HACKERS] Commitfest 2018-9 duplicate patch deletion request.

2018-03-12 Thread Shinoda, Noriyoshi
Thank you so much.

-Original Message-
From: David Steele [mailto:da...@pgmasters.net] 
Sent: Monday, March 12, 2018 10:24 PM
To: Shinoda, Noriyoshi ; pgsql-hackers 

Subject: Re: [HACKERS] Commitfest 2018-9 duplicate patch deletion request.

On 3/12/18 6:12 AM, Shinoda, Noriyoshi wrote:
> 
> Last week, I posted a patch to Commitfest 2018-9 whose title is "[WIP]
> Document update for Logical Replication security".
>
> I do not know the reason, but the same content was duplicated. Since I
> cannot delete posts, would you please delete either one?

Done.

--
-David
da...@pgmasters.net



Re: unique indexes on partitioned tables

2018-03-12 Thread Alvaro Herrera
Shinoda, Noriyoshi wrote:

Hi,

> I tried this feature with the latest snapshot. When I executed the
> following SQL statement, multiple primary keys were created on the
> partition. 
> Is this the intended behavior?

It turns out that the error check for duplicate PKs is only invoked if
you tell this code that it's being invoked by ALTER TABLE, and my
original patch wasn't.  I changed it and now everything seems to behave
as expected.

I added a test case pretty much like yours, which now works correctly.
I also added another one where the bogus PK is two levels down rather
than one.  This is because I had originally developed a different fix --
which fixed the problem for your test case, until I realized that since
this code is recursive, we could cause trouble at a distance.
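
For reference, the reported problem looks roughly like this (a sketch
with made-up names): before the fix, the child quietly ended up with
two primary keys; now it raises "multiple primary keys ... are not
allowed" instead.

    CREATE TABLE p (a int PRIMARY KEY) PARTITION BY RANGE (a);
    CREATE TABLE p1 PARTITION OF p
        (CONSTRAINT p1_pkey PRIMARY KEY (a))
        FOR VALUES FROM (0) TO (1000);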

Thanks for reporting the problem.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Additional Statistics Hooks

2018-03-12 Thread Tom Lane
Mat Arye  writes:
> So the use-case is an analytical query like

> SELECT date_trunc('hour', time) AS MetricMinuteTs, AVG(value) as avg
> FROM hyper
> WHERE time >= '2001-01-04T00:00:00' AND time <= '2001-01-05T01:00:00'
> GROUP BY MetricMinuteTs
> ORDER BY MetricMinuteTs DESC;

> Right now this query will choose a much-less-efficient GroupAggregate
> plan instead of a HashAggregate. It will choose this because it thinks
> the number of groups produced here is 9,000,000, since that's the
> number of distinct time values there are. But because date_trunc
> "buckets" the values, there will be about 24 groups (1 for each hour).

While it would certainly be nice to have better behavior for that,
"add a hook so users who can write C can fix it by hand" doesn't seem
like a great solution.  On top of the sheer difficulty of writing a
hook function, you'd have the problem that no pre-written hook could
know about all available functions.  I think somehow we'd need a way
to add per-function knowledge, perhaps roughly like the protransform
feature.

regards, tom lane



Re: explain with costs in subselect.sql

2018-03-12 Thread Andres Freund
On 2018-03-12 18:25:42 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > When forcing JITing to be enabled for all queries, obviously only useful
> > for testing, I noticed that two explain outputs changed after I added
> > explain support.
> 
> > The only differences come from:
> 
> > -- Unspecified-type literals in output columns should resolve as text
> 
> > SELECT *, pg_typeof(f1) FROM
> >   (SELECT 'foo' AS f1 FROM generate_series(1,3)) ss ORDER BY 1;
> 
> > -- ... unless there's context to suggest differently
> 
> > explain verbose select '42' union all select '43';
> > explain verbose select '42' union all select 43;
> 
> > which don't use costs=off.  Is there a reason for that? I assume it was
> > just a harmless oversight?
> 
> Duh, yeah, explain (verbose, costs off) would do fine there.
> Mea culpa.  Do you want to fix it?

Yup, will do.

Random aside: noticed that we have *no* proper coverage of the AND / OR
behaviour of returning NULL if no element returns true. Will push
something to add coverage; I've now manually retested that twice for
JITing, which seems stupid.

Greetings,

Andres Freund



explain with costs in subselect.sql

2018-03-12 Thread Andres Freund
Hi Tom, all,

When forcing JITing to be enabled for all queries, obviously only useful
for testing, I noticed that two explain outputs changed after I added
explain support.

The only differences come from:

-- Unspecified-type literals in output columns should resolve as text

SELECT *, pg_typeof(f1) FROM
  (SELECT 'foo' AS f1 FROM generate_series(1,3)) ss ORDER BY 1;

-- ... unless there's context to suggest differently

explain verbose select '42' union all select '43';
explain verbose select '42' union all select 43;

which don't use costs=off.  Is there a reason for that? I assume it was
just a harmless oversight?

Has been added in
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1e7c4bb0049732ece651d993d03bb6772e5d281a

Greetings,

Andres Freund



Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-12 Thread Tom Lane
I wrote:
> Re-reading that thread, it seems like we should have applied Jeff's
> initial trivial patch[1] (to not hold AutovacuumScheduleLock across
> table_recheck_autovac) rather than waiting around for a super duper
> improvement to get agreed on.  I'm a bit tempted to go do that;
> if nothing else, it seems simple enough to back-patch, unlike most
> of the rest of what was discussed.

Jeff mentioned that that patch wasn't entirely trivial to rebase over
15739393e, and I now see why: in the code structure as it stands,
we don't know soon enough whether the table is shared.  In the
attached rebase, I solved that with the brute-force method of just
reading the pg_class tuple an extra time.  I think this is probably
good enough, really.  I thought about having do_autovacuum pass down
the tuple to table_recheck_autovac so as to not increase the net
number of syscache fetches, but I'm slightly worried about whether
we could be passing a stale pg_class tuple to table_recheck_autovac
if we do it like that.  OTOH, that just raises the question of why
we are doing any of this while holding no lock whatever on the target
table :-(.  I'm content to leave that question to the major redesign
that was speculated about in the bug #13750 thread.

This also corrects the inconsistency that at the bottom, do_autovacuum
clears wi_tableoid while taking AutovacuumLock, not AutovacuumScheduleLock
as is the documented lock for that field.  There's no actual bug there,
but cleaning this up might provide a slight improvement in concurrency
for operations that need AutovacuumLock but aren't looking at other
processes' wi_tableoid.  (Alternatively, we could decide that there's
no real need anymore for the separate AutovacuumScheduleLock, but that's
more churn than I wanted to deal with here.)

I think this is OK to commit/backpatch --- any objections?

regards, tom lane

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 21f5e2c..639bd71 100644
*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
*** typedef struct autovac_table
*** 212,220 
   * wi_launchtime Time at which this worker was launched
   * wi_cost_*	Vacuum cost-based delay parameters current in this worker
   *
!  * All fields are protected by AutovacuumLock, except for wi_tableoid which is
!  * protected by AutovacuumScheduleLock (which is read-only for everyone except
!  * that worker itself).
   *-
   */
  typedef struct WorkerInfoData
--- 212,220 
   * wi_launchtime Time at which this worker was launched
   * wi_cost_*	Vacuum cost-based delay parameters current in this worker
   *
!  * All fields are protected by AutovacuumLock, except for wi_tableoid and
!  * wi_sharedrel which are protected by AutovacuumScheduleLock (note these
!  * two fields are read-only for everyone except that worker itself).
   *-
   */
  typedef struct WorkerInfoData
*** do_autovacuum(void)
*** 2317,2323 
--- 2317,2325 
  	foreach(cell, table_oids)
  	{
  		Oid			relid = lfirst_oid(cell);
+ 		HeapTuple	classTup;
  		autovac_table *tab;
+ 		bool		isshared;
  		bool		skipit;
  		int			stdVacuumCostDelay;
  		int			stdVacuumCostLimit;
*** do_autovacuum(void)
*** 2342,2350 
  		}
  
  		/*
! 		 * hold schedule lock from here until we're sure that this table still
! 		 * needs vacuuming.  We also need the AutovacuumLock to walk the
! 		 * worker array, but we'll let go of that one quickly.
  		 */
  		LWLockAcquire(AutovacuumScheduleLock, LW_EXCLUSIVE);
  		LWLockAcquire(AutovacuumLock, LW_SHARED);
--- 2344,2366 
  		}
  
  		/*
! 		 * Find out whether the table is shared or not.  (It's slightly
! 		 * annoying to fetch the syscache entry just for this, but in typical
! 		 * cases it adds little cost because table_recheck_autovac would
! 		 * refetch the entry anyway.  We could buy that back by copying the
! 		 * tuple here and passing it to table_recheck_autovac, but that
! 		 * increases the odds of that function working with stale data.)
! 		 */
! 		classTup = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
! 		if (!HeapTupleIsValid(classTup))
! 			continue;			/* somebody deleted the rel, forget it */
! 		isshared = ((Form_pg_class) GETSTRUCT(classTup))->relisshared;
! 		ReleaseSysCache(classTup);
! 
! 		/*
! 		 * Hold schedule lock from here until we've claimed the table.  We
! 		 * also need the AutovacuumLock to walk the worker array, but that one
! 		 * can just be a shared lock.
  		 */
  		LWLockAcquire(AutovacuumScheduleLock, LW_EXCLUSIVE);
  		LWLockAcquire(AutovacuumLock, LW_SHARED);
*** do_autovacuum(void)
*** 2381,2386 
--- 2397,2412 
  		}
  
  		/*
+ 		 * Store the table's OID in shared memory before releasing the
+ 		 * schedule lock, so that other workers don't try to vacuum it
+ 		 * concurrently.  (We claim it here so as not to hold
+ 		 * AutovacuumScheduleLock while 

Re: JIT compiling with LLVM v11

2018-03-12 Thread Andres Freund
Hi,

On 2018-03-12 17:14:00 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Currently a handful of explain outputs in the regression tests change
> > output when compiled with JITing. Therefore I'm thinking of adding
> > JITINFO or such option, which can be set to false for those tests?
> > Maintaining duplicate output for them seems painful. Better ideas?
> 
> Why not just suppress that info when COSTS OFF is specified?

I wondered about that too.  But that'd mean it'd be harder to write a
test that tests the planning bits of JITing (i.e. the decision whether
to use optimization & inlining or not).  Not sure if it's worth adding
complexity to be able to do so.

Greetings,

Andres Freund



Re: JIT compiling with LLVM v11

2018-03-12 Thread Tom Lane
Andres Freund  writes:
> Currently a handful of explain outputs in the regression tests change
> output when compiled with JITing. Therefore I'm thinking of adding
> JITINFO or such option, which can be set to false for those tests?
> Maintaining duplicate output for them seems painful. Better ideas?

Why not just suppress that info when COSTS OFF is specified?

regards, tom lane



Re: JIT compiling with LLVM v11

2018-03-12 Thread Andres Freund
On 2018-03-09 13:08:36 -0800, Andres Freund wrote:
> On 2018-03-09 15:42:24 -0500, Peter Eisentraut wrote:
> > What I'd quite like is if EXPLAIN or EXPLAIN ANALYZE showed something
> > about what kind of JIT processing was done, if any, to help with this
> > kind of testing.
> 
> Yea, I like that. I think we can only show that when timing is on,
> because otherwise the tests will not be stable depending on --with-jit
> being specified or not.
> 
> So I'm thinking of displaying it similar to the "Planning time" piece,
> i.e. depending on es->summary being enabled. It'd be good to display the
> inline/optimize/emit times too. I think we can just store it in the
> JitContext. But the inline/optimize/emission times will only be
> meaningful when the query is actually executed, I don't see a way around
> that...

Not yet really happy with how it exactly looks, but here's my current
state:

tpch_10[20923][1]=# explain (format text, analyze, timing off)
    SELECT relkind, relname FROM pg_class pgc WHERE relkind = 'r';
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Seq Scan on pg_class pgc  (cost=0.00..15.70 rows=77 width=65) (actual rows=77 loops=1)
   Filter: (relkind = 'r'::"char")
   Rows Removed by Filter: 299
 Planning time: 0.187 ms
 JIT:
   Functions: 4
   Inlining: false
   Optimization: false
 Execution time: 72.229 ms
(9 rows)

tpch_10[20923][1]=# explain (format text, analyze, timing on)
    SELECT relkind, relname FROM pg_class pgc WHERE relkind = 'r';
                                             QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on pg_class pgc  (cost=0.00..15.70 rows=77 width=65) (actual time=40.570..40.651 rows=77 loops=1)
   Filter: (relkind = 'r'::"char")
   Rows Removed by Filter: 299
 Planning time: 0.138 ms
 JIT:
   Functions: 4
   Inlining: false
   Inlining Time: 0.000
   Optimization: false
   Optimization Time: 5.023
   Emission Time: 34.987
 Execution time: 46.277 ms
(12 rows)

json (excerpt):

    "Triggers": [
    ],
    "JIT": {
      "Functions": 4,
      "Inlining": false,
      "Inlining Time": 0.000,
      "Optimization": false,
      "Optimization Time": 9.701,
      "Emission Time": 52.951
    },
    "Execution Time": 70.292


I'm not at all wedded to the current format, but I feel like that's the
basic functionality needed?

Right now the JIT bit will only be displayed if at least one JITed
function has been emitted. Otherwise we'll just create noise for
everyone.

Currently a handful of explain outputs 

Re: [WIP PATCH] Index scan offset optimisation using visibility map

2018-03-12 Thread Tom Lane
Michail Nikolaev  writes:
> [ offset_index_only_v4.patch ]

I started to go through this patch with the intention of committing it,
but the more I looked at it the less happy I got.  What you've done to
IndexNext() is a complete disaster from a modularity standpoint: it now
knows all about the interactions between index_getnext, index_getnext_tid,
and index_fetch_heap.  Or that is, it knows too much and still not enough,
because it's flat out wrong for the case that xs_continue_hot is set.
You can't call index_getnext_tid when that's still true from last time.

I'm not sure about a nicer way to refactor that, but I'd suggest that
maybe you need an additional function in indexam.c that hides all this
knowledge about the internal behavior of an IndexScanDesc.

The PredicateLockPage call also troubles me quite a bit, not only from
a modularity standpoint but because that implies a somewhat-user-visible
behavioral change when this optimization activates.  People who are using
serializable mode are not going to find it to be an improvement if their
queries fail and need retries a lot more often than they did before.
I don't know if that problem is bad enough that we should disable skipping
when serializable mode is active, but it's something to think about.

You haven't documented the behavior required for tuple-skipping in any
meaningful fashion, particularly not the expectation that the child plan
node will still return tuples that just need not contain any valid
content.  Also, this particular implementation of that:

+   ExecClearTuple(slot);
+   slot->tts_isempty = false;
+   slot->tts_nvalid = 0;

is a gross hack and probably wrong.  You could use ExecStoreAllNullTuple,
perhaps.

I'm disturbed by the fact that you've not taught the planner about the
potential cost saving from this, so that it won't have any particular
reason to pick a regular indexscan over some other plan type when this
optimization applies.  Maybe there's no practical way to do that, or maybe
it wouldn't really matter in practice; I've not looked into it.  But not
doing anything feels like a hack.

Setting this back to Waiting on Author.

regards, tom lane



Re: JIT compiling with LLVM v11

2018-03-12 Thread Peter Eisentraut
On 3/12/18 13:05, Andres Freund wrote:
>> will *not* use JIT, becaue jit_expressions applies at execution time.
> Right.  It'd be easy to change that so jit_expressions=off wouldn't have
> an effect there anymore.  But I'm not sure we want that? I don't have a
> strong feeling about this, except that I think jit_above_cost etc should
> apply at plan, not execution time.

I lean toward making everything apply at plan time.  Not only is that
easier in the current code structure, but over time we'll probably want
to add more detailed planner knobs, e.g., perhaps an alternative
cpu_tuple_cost, and all of that would be a planner setting.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Configurable file mode mask

2018-03-12 Thread Stephen Frost
Michael, David,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Fri, Mar 09, 2018 at 01:51:14PM -0500, David Steele wrote:
> > How about a GUC that enforces one mode or the other on startup?  Default
> > would be 700.  The GUC can be set automatically by initdb based on the
> > -g option.  We had this GUC originally, but since the front-end tools
> > can't read it we abandoned it.  Seems like it would be good as an
> > enforcing mechanism, though.
> 
> Hm.  OK.  I can see the whole set of points about that.  Please let me
> think a bit more about that bit.  Do you think that there could be a
> pool of users willing to switch from one mode to another?  Compared to
> your v1, we could indeed have a GUC which enforces a restriction to not
> allow group access, and enabled by default.  As the commit fest is
> running and we don't have a clear picture yet, I am afraid that it may
> be better to move that to v12, and focus on getting patches 1 and 2
> committed. This will provide a good base for the next move.

We already had a discussion about having a GUC for this and concluded,
rightly in my view, that it's not sensible to have since we don't want
all of the various tools having to read and parse out postgresql.conf.

I don't see anything in the discussion which has changed that, and I
don't agree that there's an issue with using the privileges on the data
directory for this -- it's a simple solution which all of the tools can
use and work with easily.  I certainly don't agree that it's a serious
issue to relax the explicit check -- it's just a check, which a user
could implement themselves if they wished to and had a concern for.  On
the other hand, with the explicit check, we are actively preventing an
entirely reasonable goal of wanting to use a read-only role to perform a
backup of the system.

Thanks!

Stephen




Re: Re: [GSOC 18] Performance Farm Project——Initialization Project

2018-03-12 Thread Dave Page
Hi

On Mon, Mar 12, 2018 at 9:57 AM, Hongyuan Ma  wrote:

> Hi Dave,
> Thank you for your reminder. This is indeed my negligence.
> In fact, I have browsed the code in the pgperffarm.git repository,
> including the client folder and the web folder. However, I found that the
> web folder has some unfinished html files and does not contain model class
> files. And the project used Django 1.8 without importing the Django REST
> Framework (When I talked to Mark about the PerfFarm project, he told me he
> insisted on using Django 1.11 and considered using the REST framework). So
> I mistakenly thought that the code in the web folder had been shelved.
>

Nope, not at all. It just wasn't updated to meet the latest PG
infrastructure requirements yet (basically just the update to Django
1.11). The rest should be fine as-is.


>
> In the newly initialized Django application, I upgraded its version and
> arranged the directories to make the project structure clearer. I want
> to use a separate front-end development approach (the front-end
> application will use vue.js). I hope you can allow me to use it instead
> of the old one. I am willing to integrate the auth module into this new
> application as soon as possible.
>

I would much prefer to see jQuery + React, purely because there are likely
more PostgreSQL developers (particularly from the pgAdmin team) that know
those technologies. It is important to consider long term maintenance as
well as ease of initial development with any project.

Thanks.


>
> Regards,
> Hongyuan Ma (cs_maleica...@163.com)
>
On 2018-03-12 02:26:43, "Dave Page"  wrote:
>
> Hi
>
> Maybe I’m missing something (I’ve been offline a lot recently for
> unavoidable reasons), but the perf farm project already has a Django
> backend initialised and configured to work with community auth, on
> community infrastructure.
>
> https://git.postgresql.org/gitweb/?p=pgperffarm.git;a=summary
>
> On Sunday, March 11, 2018, Hongyuan Ma  wrote:
>
>> Hello, mark
>> I initialized a Django project and imported the Django REST Framework.
>> Its github address is: https://github.com/PGPerfFarm/server-code
>> I created some model classes and also wrote scripts in the dbtools folder
>> to import simulation data for development. I am hesitant to use admin or
>> xadmin as the administrator's backend for the site (I prefer to use xadmin).
>>
>> I also initialized the website's front-end application. Here is its
>> address: https://github.com/PGPerfFarm/front-end-code.git
>> I wrote the header component as shown:
>>
>> I hope this effect can enhance the future user experience:).
>> This application uses vue.js and element-ui, but if you insist on using
>> other js libraries or not using js, please let me know. I will empty this
>> project and then rewrite it as you wish.
>>
>> My next step is to determine the necessary api interface and the
>> corresponding components of the front-end application. Then implement them
>> one by one.
>> Finally, as you can see, I created an organization named PGPerfFarm on
>> github without permission. I sent you an invitation letter, and after you
>> join, I am willing to hand over the administrator of the organization to
>> you.
>>
>>
>> Regards,
>> Hongyuan Ma (cs_maleica...@163.com)
>>
>>
>>
>>
>
>
> --
> Dave Page
> Blog: http://pgsnake.blogspot.com
> Twitter: @pgsnake
>
> EnterpriseDB UK: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>
>
>



-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Additional Statistics Hooks

2018-03-12 Thread Euler Taveira
2018-03-12 14:03 GMT-03:00 Mat Arye :
> I have a question about statistics hooks. I am trying to teach the
> planner that grouping by something like date_trunc('1 day', time) will
> produce far fewer rows than the number of distinct time values. I want
> to do that in an extension. The problem is that I don't see a way to
> make the get_relation_stats_hook work well for that, since by the time
> it's called you only see the `time` var and not the full expression.
> None of the other hooks seem appropriate either. So 2 questions:
>
Isn't this a case for extending the available hook?

> 1) Would people be opposed to adding a code hook somewhere at the start of
> `examine_variable` (selfuncs.c) to allow creating statistics on complete
> expressions? I can submit a patch if this seems reasonable.
>
If you explain the use case maybe it could be considered.

> 2) Do patches that add code hooks (and are probably under 10 lines) need to
> go through the entire commitfest process. I guess what I am really asking is
> if PG12 would be the first version such a patch could appear in or is PG11
> still a possibility? Just wondering what the policy on such stuff is.
>
If it is a new feature and is not in the last CF, it won't be
considered for v11 (even small patches).


-- 
   Euler Taveira                   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training



Re: Function to track shmem reinit time

2018-03-12 Thread Grigory Smolkin


On 03/03/2018 09:00 PM, Peter Eisentraut wrote:

I find this premise a bit dubious.  Why have a log file if it's too big
to find anything in it?  Server crashes aren't the only thing people are
interested in.  So we'll need a function for "last $anything".


Thank you for your interest in this topic.
Well, on a heavily loaded machine the log file will be big, because you
usually want to log every query for later analysis, and because postgres
dumps everything into a single file, searching for a specific error will
be slow. The log file is certainly needed for digging into the details
of a crash, but pg_shmem_init_time is more about online monitoring.


On 03/03/2018 09:43 PM, Justin Pryzby wrote:


I think one can tell if it's crashed recently by comparing start time of parent
postmaster and its main children (I'm going to go put this in place for myself
now).

ts=# SELECT backend_type, backend_start, pg_postmaster_start_time(),
         backend_start - pg_postmaster_start_time()
     FROM pg_stat_activity ORDER BY backend_start LIMIT 1;
     backend_type     |         backend_start         |   pg_postmaster_start_time    |    ?column?
----------------------+-------------------------------+-------------------------------+-----------------
 autovacuum launcher  | 2018-03-02 00:21:11.604468-03 | 2018-03-02 00:12:46.757642-03 | 00:08:24.846826


Thank you, though we must take into account that autovacuum may not
be enabled, so it's better to use the checkpointer or bgwriter. It's a
working solution, yes, but it looks more like a workaround, and it
requires from the user an understanding of postgres internals.



On 03/04/2018 06:56 PM, Tomas Vondra wrote:


Can you please explain why pg_postmaster_start_time can't be used for
this purpose? It seems like a pretty good match, considering it's meant
to show server start time.


Because a backend crash do not reset pg_postmaster_start_time.

On 03/04/2018 07:02 PM, Tomas Vondra wrote:

On 02/28/2018 01:11 PM, Anastasia Lubennikova wrote:

Attached patch introduces a new function pg_shmem_init_time(),
which returns the time shared memory was last (re)initialized.
It is created for use by monitoring tools to track backend crashes.

Currently, if the 'restart_after_crash' option is on, postgres will
just restart. And the only way to know that it happened is to
regularly parse the logfile or monitor it, catching restart messages.
This approach is really inconvenient for users who have gigabytes of
logs.

This new function can be periodically called by a monitoring agent,
and, if shmem_init_time doesn't match pg_postmaster_start_time,
we know that the server crashed and restarted, and we also know
exactly when.


I don't think it really solves the problem, though. For example if the
whole VM reboots (which can be a matter of seconds), this check will say
"shmem_init_time == pg_postmaster_start_time" and you've not detected
anything.

IMHO pg_postmaster_start_time is the right way to monitor uptime, and
the right way to detect spurious restarts is to remember the last value
you've seen and compare it to the current one.


Yes, for the described case with a VM restart pg_postmaster_start_time
works fine.
pg_postmaster_start_time is essential for almost every case, and
pg_shmem_init_time only expands its usefulness, it doesn't diminish it.


On 03/04/2018 07:09 PM, Tom Lane wrote:

It evidently depends on how you want to define "server uptime".  If you
get backend crashes often enough, you might feel a need to define it
as "time since last crash".  Although I would think that if that's
happening regularly in a production environment, you have a problem
you need to fix, not just measure.


Absolutely. And for that you need to know ASAP that a backend crash
happened in the first place.


On 03/04/2018 07:09 PM, Tom Lane wrote:

My own thought about that is that if you are trying to measure
backend crashes, just knowing the time of the latest one is little help.
You want to know how often they're happening.  So this gets back to the
question of why the postmaster log isn't a useful source of that
information.


For that purpose pg_shmem_init_time can also be used.
For example, we can build a chart based on values from "select (now() -
pg_shmem_init_time());" taken, say, every 10 seconds.
Values in the range 0 <= x <= 10 will signal a memory reinitialization,
which is usually a byproduct of a backend crash.
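
In SQL terms, the agent's check could be as simple as (a sketch using
the proposed function):

select now() - pg_shmem_init_time() as since_reinit,
       pg_shmem_init_time() <> pg_postmaster_start_time() as crash_restarted;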


On 03/04/2018 07:09 PM, Tom Lane wrote:

  I think that if we're to do anything in this area,
improving the usefulness of the log would be more important than
providing the proposed function.


The separation of maintenance messages and query messages into different
log files is sorely needed.
This way server errors can be identified quickly and in a convenient way.
But as I mentioned earlier, pg_shmem_init_time() is about online
monitoring.


On 03/04/2018 07:02 PM, Tomas Vondra wrote:

Actually, after looking at the code a bit, I think that test would not
really work anyway, because those two 

Re: Ambigous Plan - Larger Table on Hash Side

2018-03-12 Thread Tom Lane
Andres Freund  writes:
> Not sure I follow. Unless the values are equivalent (i.e. duplicate key
> values), why should non-uniformity in key space translate to hash space?

Duplicates are exactly the problem.  See estimate_hash_bucket_stats.

> And if there's duplicates it shouldn't hurt much either, unless doing
> a semi/anti-join? All rows are going to be returned and IIRC we quite
> cheaply continue a bucket scan?

If the bucket containing the MCV is bigger than work_mem, you gotta
problem --- one not necessarily shared by the other relation.

regards, tom lane



Re: CURRENT OF causes an error when IndexOnlyScan is used

2018-03-12 Thread Tom Lane
Anastasia Lubennikova  writes:
> [ return_heaptuple_in_btree_indexonlyscan_v2.patch ]

I took a quick look at this, but I'm concerned about a couple of points:

1. What's the performance penalty of forming (and then deforming) the
added heap tuple?  We'd be paying it for every index-only scan, whether
or not any CURRENT OF fetch ever happened.

2. The constructed tuple provides tableoid and ctid all right, but it'd
still have garbage values for other system columns.  As the code stands,
we will properly error out if some attempt is made to fetch any of those
other columns, but with this change we'd just return the garbage.  This
is a minor point, but it doesn't seem negligible to me; it might've been
hard to identify the bug at hand if we'd not had the cross-check of not
building a heap tuple.

At this point Yugo-san's original patch is starting to look more
attractive.  It's still ugly, but at least it's not imposing a performance
cost on unrelated queries.
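
For context, the user-visible failure being fixed looks roughly like
this sketch (table t and column a are hypothetical):

    BEGIN;
    DECLARE c CURSOR FOR SELECT a FROM t WHERE a > 0;  -- may use an Index Only Scan
    FETCH FROM c;
    UPDATE t SET a = a + 1 WHERE CURRENT OF c;  -- errors without a fix
    COMMIT;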

regards, tom lane



Re: Ambigous Plan - Larger Table on Hash Side

2018-03-12 Thread Andres Freund
On 2018-03-12 12:52:00 -0400, Tom Lane wrote:
> Narendra Pradeep U U  writes:
> >   Recently I came across a case where the planner chose the larger
> > table on the hash side. I am not sure whether it is an intended
> > behavior or we are missing something.
> 
> Probably the reason is that the smaller table has a less uniform
> distribution of the hash key.  You don't want to hash with a nonuniform
> distribution of the hashtable key; if many keys go into the same bucket
> then performance degrades drastically.

Not sure I follow. Unless the values are equivalent (i.e. duplicate key
values), why should non-uniformity in key space translate to hash space?
And if there's duplicates it shouldn't hurt much either, unless doing
a semi/anti-join? All rows are going to be returned and IIRC we quite
cheaply continue a bucket scan?

Greetings,

Andres Freund



Re: All Taxi Services need Index Clustered Heap Append

2018-03-12 Thread stalkthetiger
Have you looked at the TimescaleDB extension for PostgreSQL?






Re: JIT compiling with LLVM v11

2018-03-12 Thread Andres Freund
On 2018-03-12 11:21:36 -0400, Peter Eisentraut wrote:
> On 3/11/18 14:25, Andres Freund wrote:
> >> It's perhaps a bit confusing that some of the jit_* settings take effect
> >> at plan time and some at execution time.  At the moment, this mainly
> >> affects me reading the code ;-), but it would also have some effect on
> >> prepared statements and such.
> > Not quite sure what you mean?
> 
> I haven't tested this, but what appears to be the case is that
> 
> SET jit_above_cost = 0;
> PREPARE foo AS SELECT ;
> SET jit_above_cost = infinity;
> EXECUTE foo;
> 
> will use JIT, because jit_above_cost applies at plan time, whereas
> 
> SET jit_expressions = on;
> PREPARE foo AS SELECT ;
> SET jit_expressions = off;
> EXECUTE foo;
> 
> will *not* use JIT, becaue jit_expressions applies at execution time.

Right.  It'd be easy to change that so jit_expressions=off wouldn't have
an effect there anymore.  But I'm not sure we want that? I don't have a
strong feeling about this, except that I think jit_above_cost etc should
apply at plan, not execution time.

Greetings,

Andres Freund



Additional Statistics Hooks

2018-03-12 Thread Mat Arye
Hi All,

I have a question about statistics hooks. I am trying to teach the
planner that grouping by something like date_trunc('1 day', time) will
produce far fewer rows than the number of distinct time values (see the
sketch after the two questions below). I want to do that in an
extension. The problem is that I don't see a way to make the
get_relation_stats_hook work well for that, since by the time it's
called you only see the `time` var and not the full expression. None of
the other hooks seem appropriate either. So 2 questions:

1) Would people be opposed to adding a code hook somewhere at the start of
`examine_variable` (selfuncs.c) to allow creating statistics on complete
expressions? I can submit a patch if this seems reasonable.

2) Do patches that add code hooks (and are probably under 10 lines) need to
go through the entire commitfest process. I guess what I am really asking
is if PG12 would be the first version such a patch could appear in or is
PG11 still a possibility? Just wondering what the policy on such stuff is.
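
To make the misestimate concrete, a sketch (assuming a "hyper" table
whose "time" column holds millions of distinct values):

explain select date_trunc('hour', time) as bucket, avg(value)
from hyper
group by bucket;
-- The planner estimates roughly one group per distinct "time" value,
-- though hourly bucketing yields only a handful of groups, so it may
-- wrongly prefer a sort-based GroupAggregate over HashAggregate.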

Thanks,
Mat
TimescaleDB


Re: Ambigous Plan - Larger Table on Hash Side

2018-03-12 Thread Tom Lane
Narendra Pradeep U U  writes:
>   Recently I came across a case where the planner chose the larger
> table on the hash side. I am not sure whether it is an intended
> behavior or we are missing something.

Probably the reason is that the smaller table has a less uniform
distribution of the hash key.  You don't want to hash with a nonuniform
distribution of the hashtable key; if many keys go into the same bucket
then performance degrades drastically.
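
One quick way to check for that kind of skew (a sketch):

select tablename, attname, n_distinct, most_common_freqs[1] as top_freq
from pg_stats
where tablename in ('a', 'b');
-- A very high top_freq on b's join key means one hash bucket would
-- receive a large share of the rows, so the planner may prefer to
-- hash the larger but more uniform side instead.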

regards, tom lane



Re: Cast jsonb to numeric, int, float, bool

2018-03-12 Thread Tom Lane
Darafei "Komяpa" Praliaskouski  writes:
> But what would be the scenario of failure if we have an implicit cast from
> jsonb datatype (that likely already parsed the number internally, or knows
> it holds non-numeric value) to numeric, that returns an error if it's
> unable to perform the conversion?

One fundamental problem is this: what's special about a cast to numeric?
There's no reason to assume that a jsonb field is more likely to be
numeric than anything else.  But basically you only get to have one of
these.  If we put this in, and then somebody else comes along and proposes
an implicit cast to text, or an implicit cast to timestamp, then
everything is broken, because the parser will no longer be able to
resolve max(jsonb) --- it won't know which implicit cast to apply.

Another fundamental problem is that implicit casts mask mistakes.
If there's an implicit cast to numeric, that applies everywhere not
only where it was what you meant.  For some context on this you might
go back to the archives around the time of 8.3, where we actually
removed a bunch of implicit casts because they led to too many
surprising behaviors.  Restricting implicit casts to the same type
category is a rule-of-thumb for reducing the potential surprise factor.

The cast notation is important here because it lets/makes the user
specify which semantics she wants for the converted value.  I think
it's about equally likely for people to be converting JSON fields to text,
or (some form of) numeric, or datetime, so I don't think it's appropriate
to privilege one of those over all others.

> What would be other options, if not implicit cast?

Explicit casts, ie (jsonvar->'fieldname')::numeric.  Yeah, you have
to write a bit more, but you don't get surprised by the way the
parser interpreted your query.
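
For instance (a sketch; the direct jsonb-to-numeric cast is the
explicit cast under discussion, while ->> goes through text):

select (j -> 'price')::numeric  as price_direct,
       (j ->> 'price')::numeric as price_via_text
from (values ('{"price": 42.5}'::jsonb)) as t(j);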

The other thing you can potentially do is use a variant operator
name, as we did for text output with ->>.  But that doesn't scale
to very many result types, because it's impossible to choose
readily mnemonic operator names.  So I'd stick with the casts.

regards, tom lane



Re: pgsql: Allow UNIQUE indexes on partitioned tables

2018-03-12 Thread Alvaro Herrera
David G. Johnston wrote:

> Something like:
> 
> When establishing a unique constraint for a multi-level partition hierarchy
> all the "partition by" columns of the target partitioned table, as well as
> those of all its descendant partitioned tables, must be included in the
> constraint definition.

Yeah, that seems better to me.  Pushed.

> If I understand the above then the following failing test would be a worthy
> addition to memorialize the behavior of ALTER TABLE ATTACH under this
> constraint.
> 
> create table idxpart (a int primary key, b int) partition by range (a);
> create table idxpart1 (a int not null, b int, primary key (a, b)) partition
> by range (a, b);
> alter table idxpart attach partition idxpart1 for values from (1) to (1000);

Included this one too.


Thanks for reading!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Ambigous Plan - Larger Table on Hash Side

2018-03-12 Thread Narendra Pradeep U U
Hi,

  Recently I came across a case where the planner chose the larger table
on the hash side. I am not sure whether it is an intended behavior or we
are missing something.

 I have two tables (a and b), each with a single column in it. One table
'a' is large, with around 30 million distinct rows, and the other table
'b' has merely 70,000 rows with one-seventh as many (10,000) distinct
values. I have analyzed both tables. But while joining the two tables I
get the larger table on the hash side.

tpch=# explain select b from b left join a on a = b;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Hash Left Join  (cost=824863.75..950104.42 rows=78264 width=4)
   Hash Cond: (b.b = a.a)
   ->  Foreign Scan on b  (cost=0.00..821.64 rows=78264 width=4)
         CStore File: /home/likewise-open/pg96/data/cstore_fdw/1818708/1849879
         CStore File Size: 314587
   ->  Hash  (cost=321721.22..321721.22 rows=30667722 width=4)
         ->  Foreign Scan on a  (cost=0.00..321721.22 rows=30667722 width=4)
               CStore File: /home/likewise-open/pg96/data/cstore_fdw/1818708/1849876
               CStore File Size: 123236206
(9 rows)






I would like to know the reason for choosing this plan, and is there an easy
fix to prevent such plans (especially ones like this, where the larger table
ends up on the hash side)?



Thanks,

Pradeep




Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-12 Thread Tom Lane
I wrote:
> Maybe this type of situation is an argument for trusting an ANALYZE-based
> estimate more than the VACUUM-based estimate.  I remain uncomfortable with
> that in cases where VACUUM looked at much more of the table than ANALYZE
> did, though.  Maybe we need some heuristic based on the number of pages
> actually visited by each pass?

I looked into doing something like that.  It's possible, but it's fairly
invasive; there's no clean way to compare those page counts without
altering the API of acquire_sample_rows() to pass back the number of pages
it visited.  That would mean a change in FDW-visible APIs.  We could do
that, but I don't want to insist on it when there's nothing backing it up
except a fear that *maybe* ANALYZE's estimate will be wrong often enough
to worry about.

So at this point I'm prepared to go forward with your patch, though not
to risk back-patching it.  Field experience will tell us if we need to
do more.  I propose the attached cosmetic refactoring, though.

regards, tom lane

diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index 3cfbc08..474c3bd 100644
*** a/contrib/pgstattuple/pgstatapprox.c
--- b/contrib/pgstattuple/pgstatapprox.c
*** statapprox_heap(Relation rel, output_typ
*** 184,190 
  
  	stat->table_len = (uint64) nblocks * BLCKSZ;
  
! 	stat->tuple_count = vac_estimate_reltuples(rel, false, nblocks, scanned,
  			   stat->tuple_count + misc_count);
  
  	/*
--- 184,190 
  
  	stat->table_len = (uint64) nblocks * BLCKSZ;
  
! 	stat->tuple_count = vac_estimate_reltuples(rel, nblocks, scanned,
  			   stat->tuple_count + misc_count);
  
  	/*
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 5f21fcb..ef93fb4 100644
*** a/src/backend/commands/analyze.c
--- b/src/backend/commands/analyze.c
*** acquire_sample_rows(Relation onerel, int
*** 1249,1267 
  		qsort((void *) rows, numrows, sizeof(HeapTuple), compare_rows);
  
  	/*
! 	 * Estimate total numbers of rows in relation.  For live rows, use
! 	 * vac_estimate_reltuples; for dead rows, we have no source of old
! 	 * information, so we have to assume the density is the same in unseen
! 	 * pages as in the pages we scanned.
  	 */
- 	*totalrows = vac_estimate_reltuples(onerel, true,
- 		totalblocks,
- 		bs.m,
- 		liverows);
  	if (bs.m > 0)
  		*totaldeadrows = floor((deadrows / bs.m) * totalblocks + 0.5);
  	else
  		*totaldeadrows = 0.0;
  
  	/*
  	 * Emit some interesting relation info
--- 1249,1270 
  		qsort((void *) rows, numrows, sizeof(HeapTuple), compare_rows);
  
  	/*
! 	 * Estimate total numbers of live and dead rows in relation, extrapolating
! 	 * on the assumption that the average tuple density in pages we didn't
! 	 * scan is the same as in the pages we did scan.  Since what we scanned is
! 	 * a random sample of the pages in the relation, this should be a good
! 	 * assumption.
  	 */
  	if (bs.m > 0)
+ 	{
+ 		*totalrows = floor((liverows / bs.m) * totalblocks + 0.5);
  		*totaldeadrows = floor((deadrows / bs.m) * totalblocks + 0.5);
+ 	}
  	else
+ 	{
+ 		*totalrows = 0.0;
  		*totaldeadrows = 0.0;
+ 	}
  
  	/*
  	 * Emit some interesting relation info
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 7aca69a..b50c554 100644
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
*** vacuum_set_xid_limits(Relation rel,
*** 766,781 
   * vac_estimate_reltuples() -- estimate the new value for pg_class.reltuples
   *
   *		If we scanned the whole relation then we should just use the count of
!  *		live tuples seen; but if we did not, we should not trust the count
!  *		unreservedly, especially not in VACUUM, which may have scanned a quite
!  *		nonrandom subset of the table.  When we have only partial information,
!  *		we take the old value of pg_class.reltuples as a measurement of the
   *		tuple density in the unscanned pages.
-  *
-  *		This routine is shared by VACUUM and ANALYZE.
   */
  double
! vac_estimate_reltuples(Relation relation, bool is_analyze,
  	   BlockNumber total_pages,
  	   BlockNumber scanned_pages,
  	   double scanned_tuples)
--- 766,779 
   * vac_estimate_reltuples() -- estimate the new value for pg_class.reltuples
   *
   *		If we scanned the whole relation then we should just use the count of
!  *		live tuples seen; but if we did not, we should not blindly extrapolate
!  *		from that number, since VACUUM may have scanned a quite nonrandom
!  *		subset of the table.  When we have only partial information, we take
!  *		the old value of pg_class.reltuples as a measurement of the
   *		tuple density in the unscanned pages.
   */
  double
! vac_estimate_reltuples(Relation relation,
  	   BlockNumber total_pages,
  	   BlockNumber scanned_pages,
  	   double scanned_tuples)
*** vac_estimate_reltuples(Relation 

Re: [HACKERS] proposal: schema variables

2018-03-12 Thread Pavel Stehule
2018-03-12 16:38 GMT+01:00 Pavel Luzanov :

>
> On 12.03.2018 09:54, Pavel Stehule wrote:
>
>
> 2018-03-12 7:49 GMT+01:00 Pavel Luzanov :
>
>>
>> Is there any chance that it will work on replicas?
>>
> ...
>
> Sure, it should work. Now I am trying to solve issues at the concept level
> - the LET code is based on the DML code base, so probably there is a check
> for read-write transactions. But it is useless for the LET command.
>
>
> Very, very good!
>
> As I understand, work on this patch is now in progress and it is not in
> the commitfest.
> Please explain what features of schema variables I can review now.
>
> From first post of this thread the syntax of the CREATE VARIABLE command:
> CREATE [TEMP] VARIABLE [IF NOT EXISTS] name AS type
>   [ DEFAULT expression ] [[NOT] NULL]
>   [ ON TRANSACTION END { RESET | DROP } ]
>   [ { VOLATILE | STABLE } ];
>

It is too early for review now - the patch is in development. Some features
are not implemented yet (DEFAULTs, ON TRANSACTION END ...), and others do not
make sense (as far as I know now, VOLATILE and STABLE). Schema variables are
passed as parameters to the query, so they behave like any other parameters -
STABLE only.


>
> But in psql I see only:
> \h create variable
> Command: CREATE VARIABLE
> Description: define a new permissioned typed schema variable
> Syntax:
> CREATE VARIABLE [ IF NOT EXISTS ] name [ AS ] data_type ]
>
> I can include a DEFAULT clause in the CREATE VARIABLE command, but the
> value is not used:
> postgres=# create variable i int default 0;
> CREATE VARIABLE
> postgres=# select i;
>  i
> ---
>
> (1 row)
>
> postgres=# \d+ i
>  schema variable "public.i"
>  Column |  Type   | Storage
> +-+-
>  i  | integer | plain
>
>
Defaults are not implemented yet.


>
> BTW, I found an error in handling of table aliases:
>
> postgres=# create variable x text;
> CREATE VARIABLE
> postgres=# select * from pg_class AS x where x.relname = 'x';
> ERROR:  type text is not composite
>
> It thinks that x.relname is an attribute of the variable x instead of a
> column of the table alias x for pg_class.
>
>
This is a collision that is not handled well. It should be detected and
prohibited. In this case, because x is scalar, x.relname makes no sense and
should not be treated as a variable reference. So the current design is not
very practical - it generates more collisions than necessary and still has
some errors.

Now, there is one important question - storage. Postgres stores all objects
in files; memory-only storage is not designed yet. This is the part where I
need help.

Regards

Pavel


>
> -
> Pavel Luzanov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


Re: pgsql: Allow UNIQUE indexes on partitioned tables

2018-03-12 Thread Alvaro Herrera
David Rowley wrote:
> On 20 February 2018 at 09:40, Alvaro Herrera  wrote:
> > Modified Files
> > --
> > doc/src/sgml/ddl.sgml |   9 +-
> 
> Attached is a very small fix to a small error this patch created in the docs.

Pushed, thanks.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Unlogged tables re-initialization tests

2018-03-12 Thread Dagfinn Ilmari Mannsåker
David Steele  writes:

> On 3/12/18 11:27 AM, Peter Eisentraut wrote:
>> On 3/11/18 05:11, Michael Paquier wrote:
>>> On Fri, Mar 09, 2018 at 05:23:48PM -0500, Peter Eisentraut wrote:
 This seems like a useful test.

 On 3/5/18 12:35, David Steele wrote:
> +mkdir($tablespaceDir)
> + or die "unable to mkdir \"$tablespaceDir\"";

 Use BAIL_OUT instead of die in tests.
>>>
>>> Would it be better to make this practice more uniform?  From the code of
>>> the tests:
>>> $ git grep die -- */t/*.pl | wc -l
>>> 50
>> 
>> Yes, or maybe there is a way to "catch" die and turn it into BAIL_OUT?
>
> something like this should work:
>
> # Convert die to BAIL_OUT
> $SIG{__DIE__} = sub {BAIL_OUT(@_)};

$SIG{__DIE__} gets called even for exceptions that would be caught by a
surrounding eval block, so this should at the very least be:

$SIG{__DIE__} = sub { BAIL_OUT(@_) unless $^S };

However, that is still wrong, because die() and BAIL_OUT() mean
different things: die() aborts the current test script, while BAIL_OUT()
aborts the entire 'prove' run, i.e. all subsequent scripts in the same
test directory.
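
A minimal sketch of the distinction (an editor's illustration, assuming
Test::More is in use, as in the snippets above):

    use strict;
    use warnings;
    use Test::More;

    # $^S is true while an exception is propagating inside an eval, so
    # this handler bails out only on genuinely uncaught deaths.
    $SIG{__DIE__} = sub { BAIL_OUT(@_) unless $^S };

    eval { die "trapped - must not abort the whole prove run\n" };
    ok($@, 'die inside eval was caught normally');

    done_testing();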

- ilmari
-- 
- Twitter seems more influential [than blogs] in the 'gets reported in
  the mainstream press' sense at least.   - Matt McLeod
- That'd be because the content of a tweet is easier to condense down
  to a mainstream media article.  - Calle Dybedahl



Re: pgsql: Local partitioned indexes

2018-03-12 Thread Alvaro Herrera
Amit Langote wrote:

> Sorry, I'd missed reporting one more sentence that doesn't apply anymore.
> Attached gets rid of that.

Thanks, applied.


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] proposal: schema variables

2018-03-12 Thread Pavel Luzanov


On 12.03.2018 09:54, Pavel Stehule wrote:


2018-03-12 7:49 GMT+01:00 Pavel Luzanov:



Is there any chance that it will work on replicas?

...

Sure, it should work. Now I am trying to solve issues at the concept
level - the LET code is based on the DML code base, so probably there is
a check for read-write transactions. But it is useless for the LET command.


Very, very good!

As I understand, work on this patch is now in progress and it is not in
the commitfest.

Please explain what features of schema variables I can review now.

From first post of this thread the syntax of the CREATE VARIABLE command:
CREATE [TEMP] VARIABLE [IF NOT EXISTS] name AS type
  [ DEFAULT expression ] [[NOT] NULL]
  [ ON TRANSACTION END { RESET | DROP } ]
  [ { VOLATILE | STABLE } ];

But in psql I see only:
\h create variable
Command: CREATE VARIABLE
Description: define a new permissioned typed schema variable
Syntax:
CREATE VARIABLE [ IF NOT EXISTS ] name [ AS ] data_type ]

I can include a DEFAULT clause in the CREATE VARIABLE command, but the value
is not used:

postgres=# create variable i int default 0;
CREATE VARIABLE
postgres=# select i;
 i
---

(1 row)

postgres=# \d+ i
 schema variable "public.i"
 Column |  Type   | Storage
+-+-
 i  | integer | plain


BTW, I found an error in handling of table aliases:

postgres=# create variable x text;
CREATE VARIABLE
postgres=# select * from pg_class AS x where x.relname = 'x';
ERROR:  type text is not composite

It thinks that x.relname is an attribute of the variable x instead of a
column of the table alias x for pg_class.



-
Pavel Luzanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: PATCH: Unlogged tables re-initialization tests

2018-03-12 Thread David Steele
On 3/12/18 11:27 AM, Peter Eisentraut wrote:
> On 3/11/18 05:11, Michael Paquier wrote:
>> On Fri, Mar 09, 2018 at 05:23:48PM -0500, Peter Eisentraut wrote:
>>> This seems like a useful test.
>>>
>>> On 3/5/18 12:35, David Steele wrote:
 +mkdir($tablespaceDir)
 +  or die "unable to mkdir \"$tablespaceDir\"";
>>>
>>> Use BAIL_OUT instead of die in tests.
>>
>> Would it be better to make this practice more uniform?  From the code of
>> the tests:
>> $ git grep die -- */t/*.pl | wc -l
>> 50
> 
> Yes, or maybe there is a way to "catch" die and turn it into BAIL_OUT?

something like this should work:

# Convert die to BAIL_OUT
$SIG{__DIE__} = sub {BAIL_OUT(@_)};

-- 
-David
da...@pgmasters.net



Re: PATCH: Unlogged tables re-initialization tests

2018-03-12 Thread Peter Eisentraut
On 3/11/18 05:11, Michael Paquier wrote:
> On Fri, Mar 09, 2018 at 05:23:48PM -0500, Peter Eisentraut wrote:
>> This seems like a useful test.
>>
>> On 3/5/18 12:35, David Steele wrote:
>>> +mkdir($tablespaceDir)
>>> +   or die "unable to mkdir \"$tablespaceDir\"";
>>
>> Use BAIL_OUT instead of die in tests.
> 
> Would it be better to make this practice more uniform?  From the code of
> the tests:
> $ git grep die -- */t/*.pl | wc -l
> 50

Yes, or maybe there is a way to "catch" die and turn it into BAIL_OUT?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Jsonb transform for pl/python

2018-03-12 Thread Nikita Glukhov


On 01.02.2018 19:06, Peter Eisentraut wrote:

On 1/12/18 10:43, Aleksander Alekseev wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

LGTM.

The new status of this patch is: Ready for Committer

I've been working on polishing this a bit.  I'll keep working on it.  It
should be ready to commit soon.


Hi.

I have reviewed this patch.  Attached is a new 6th version of the patch with
a v5-v6 delta-patch.

 * Added out of memory checks after the following function calls:
   - PyList_New()
   - PyDict_New()
   - PyString_FromStringAndSize()  (added PyString_FromJsonbValue())

 * Added Py_XDECREF() for key-value pairs and list elements after appending
   them, because PyDict_SetItem() and PyList_Append() do not steal references
   (see also the hstore_plpython code).

 * Removed unnecessary JsonbValue heap-allocations in PyObject_ToJsonbValue().

 * Added iteration to the end of the iterator in PyObject_FromJsonb() for
   correct freeing of JsonbIterators.

 * Passed JsonbParseState ** to PyXxx_ToJsonbValue() functions.

 * Added transformation of Python tuples into JSON arrays because the
   standard Python JSONEncoder does the same.
   (See https://docs.python.org/2/library/json.html#py-to-json-table)


--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



0001-jsonb_plpython-extension-v6.patch.gz
Description: application/gzip
diff --git a/contrib/jsonb_plpython/expected/jsonb_plpython2u.out b/contrib/jsonb_plpython/expected/jsonb_plpython2u.out
index 7073d52..7d29589 100644
--- a/contrib/jsonb_plpython/expected/jsonb_plpython2u.out
+++ b/contrib/jsonb_plpython/expected/jsonb_plpython2u.out
@@ -341,8 +341,22 @@ SELECT testDecimal();
  255
 (1 row)
 
+-- tuple -> jsonb
+CREATE FUNCTION testTuple() RETURNS jsonb
+LANGUAGE plpython2u
+TRANSFORM FOR TYPE jsonb
+AS $$
+x = (1, 'String', None)
+return x
+$$;
+SELECT testTuple();
+  testtuple  
+-
+ [1, "String", null]
+(1 row)
+
 DROP EXTENSION plpython2u CASCADE;
-NOTICE:  drop cascades to 17 other objects
+NOTICE:  drop cascades to 18 other objects
 DETAIL:  drop cascades to extension jsonb_plpython2u
 drop cascades to function test1(jsonb)
 drop cascades to function test1complex(jsonb)
@@ -360,6 +374,7 @@ drop cascades to function test1set()
 drop cascades to function testcomplexnumbers()
 drop cascades to function testrange()
 drop cascades to function testdecimal()
+drop cascades to function testtuple()
 -- test jsonb int -> python int
 CREATE EXTENSION jsonb_plpythonu CASCADE;
 NOTICE:  installing required extension "plpythonu"
diff --git a/contrib/jsonb_plpython/expected/jsonb_plpython3u.out b/contrib/jsonb_plpython/expected/jsonb_plpython3u.out
index 5acd45c..2492858 100644
--- a/contrib/jsonb_plpython/expected/jsonb_plpython3u.out
+++ b/contrib/jsonb_plpython/expected/jsonb_plpython3u.out
@@ -340,8 +340,22 @@ SELECT testDecimal();
  255
 (1 row)
 
+-- tuple -> jsonb
+CREATE FUNCTION testTuple() RETURNS jsonb
+LANGUAGE plpython3u
+TRANSFORM FOR TYPE jsonb
+AS $$
+x = (1, 'String', None)
+return x
+$$;
+SELECT testTuple();
+  testtuple  
+-
+ [1, "String", null]
+(1 row)
+
 DROP EXTENSION plpython3u CASCADE;
-NOTICE:  drop cascades to 17 other objects
+NOTICE:  drop cascades to 18 other objects
 DETAIL:  drop cascades to extension jsonb_plpython3u
 drop cascades to function test1(jsonb)
 drop cascades to function test1complex(jsonb)
@@ -359,3 +373,4 @@ drop cascades to function test1set()
 drop cascades to function testcomplexnumbers()
 drop cascades to function testrange()
 drop cascades to function testdecimal()
+drop cascades to function testtuple()
diff --git a/contrib/jsonb_plpython/jsonb_plpython.c b/contrib/jsonb_plpython/jsonb_plpython.c
index 28f7a3f..705e82f 100644
--- a/contrib/jsonb_plpython/jsonb_plpython.c
+++ b/contrib/jsonb_plpython/jsonb_plpython.c
@@ -21,7 +21,7 @@ static PyObject *decimal_constructor;
 
 static PyObject *PyObject_FromJsonb(JsonbContainer *jsonb);
 static JsonbValue *PyObject_ToJsonbValue(PyObject *obj,
-	  JsonbParseState *jsonb_state);
+	  JsonbParseState **jsonb_state, bool is_elem);
 
 #if PY_MAJOR_VERSION >= 3
 typedef PyObject *(*PLyUnicode_FromStringAndSize_t)
@@ -53,6 +53,43 @@ _PG_init(void)
 #define PLyUnicode_FromStringAndSize (PLyUnicode_FromStringAndSize_p)
 
 /*
+ * PyString_FromJsonbValue
+ *
+ * Transform string JsonbValue into python string
+ */
+static PyObject *
+PyString_FromJsonbValue(JsonbValue *jbv)
+{
+	PyObject   *str;
+
+	Assert(jbv->type == jbvString);
+
+	str = PyString_FromStringAndSize(jbv->val.string.val, jbv->val.string.len);
+
+	if (!str)
+		ereport(ERROR,
+(errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory")));
+
+	return str;
+}
+
+/*
+ * PyString_ToJsonbValue
+ *
+ * Transform 

Re: Cast jsonb to numeric, int, float, bool

2018-03-12 Thread Komяpa
Hi Tom,


> I hadn't been following this thread particularly, but I happened to notice
> this bit, and I thought I'd better pop up to say No Way.  There will be
> *no* implicit casts from json to any numeric type.  We have learned the
> hard way that implicit cross-category casts are dangerous.
>

I can see how a cast from text can be problematic, especially in the
presence of the 'unknown' type.

But what would be the failure scenario if we had an implicit cast from the
jsonb datatype (which has likely already parsed the number internally, or
knows it holds a non-numeric value) to numeric, one that returns an error if
it's unable to perform the conversion?

The issue, I think, is that jsonb is special because it is in fact a
container. We've got dot notation to unpack things from a composite type, and
we've got arrow notation to unpack things from jsonb, but the latter
unpacking cannot take part in further computations, even though it looks
compatible.

What would be other options, if not implicit cast?

Darafei Praliaskouski,
GIS Engineer / Juno Minsk


Re: JIT compiling with LLVM v11

2018-03-12 Thread Peter Eisentraut
On 3/11/18 14:25, Andres Freund wrote:
>> It's perhaps a bit confusing that some of the jit_* settings take effect
>> at plan time and some at execution time.  At the moment, this mainly
>> affects me reading the code ;-), but it would also have some effect on
>> prepared statements and such.
> Not quite sure what you mean?

I haven't tested this, but what appears to be the case is that

SET jit_above_cost = 0;
PREPARE foo AS SELECT ...;
SET jit_above_cost = infinity;
EXECUTE foo;

will use JIT, because jit_above_cost applies at plan time, whereas

SET jit_expressions = on;
PREPARE foo AS SELECT ...;
SET jit_expressions = off;
EXECUTE foo;

will *not* use JIT, because jit_expressions applies at execution time.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Cast jsonb to numeric, int, float, bool

2018-03-12 Thread Tom Lane
Nikita Glukhov  writes:
> On 01.03.2018 11:19, Darafei "Komяpa" Praliaskouski wrote:
>> I would expect some casts to be implicit, so that chaining with other 
>> functions is possible:

> I think that only cast to a numeric type can be made implicit, because 
> it does not lose precision.

I hadn't been following this thread particularly, but I happened to notice
this bit, and I thought I'd better pop up to say No Way.  There will be
*no* implicit casts from json to any numeric type.  We have learned the
hard way that implicit cross-category casts are dangerous.  See the
advice in the CREATE CAST man page:

It is wise to be conservative about marking casts as implicit. An
overabundance of implicit casting paths can cause PostgreSQL to choose
surprising interpretations of commands, or to be unable to resolve
commands at all because there are multiple possible interpretations. A
good rule of thumb is to make a cast implicitly invokable only for
information-preserving transformations between types in the same
general type category. For example, the cast from int2 to int4 can
reasonably be implicit, but the cast from float8 to int4 should
probably be assignment-only. Cross-type-category casts, such as text
to int4, are best made explicit-only.
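
As a small editor's sketch of the quoted rule of thumb (not from the
original message): float8 -> int4 is an assignment cast, so it applies when
storing into a column but is not applied implicitly inside expressions:

    CREATE TABLE t (n int4);
    INSERT INTO t VALUES (3.7::float8);   -- accepted; value stored as 4
    SELECT round(3.7::float8)::int4;      -- in an expression, write the cast out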


regards, tom lane



Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-12 Thread Tom Lane
David Gould  writes:
> On Wed, 7 Mar 2018 21:39:08 -0800
> Jeff Janes  wrote:
>> As for preventing it in the first place, based on your description of your
>> hardware and operations, I was going to say you need to increase the max
>> number of autovac workers, but then I remembered you from "Autovacuum slows
>> down with large numbers of tables. More workers makes it slower" (
>> https://www.postgresql.org/message-id/20151030133252.3033.4249%40wrigleys.postgresql.org).
>> So you are probably still suffering from that?  Your patch from then seemed
>> to be pretty invasive and so controversial.

> We have been building from source using that patch for the worker contention
> since then. It's very effective, there is no way we could have continued to
> rely on autovacuum without it. It's sort of a nuisance to keep updating it
> for each point release that touches autovacuum, but here we are.

Re-reading that thread, it seems like we should have applied Jeff's
initial trivial patch[1] (to not hold AutovacuumScheduleLock across
table_recheck_autovac) rather than waiting around for a super duper
improvement to get agreed on.  I'm a bit tempted to go do that;
if nothing else, it seems simple enough to back-patch, unlike most
of the rest of what was discussed.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/CAMkU%3D1zQUAV6Zv3O7R5BO8AfJO%2BLAw7satHYfd%2BV2t5MO3Bp4w%40mail.gmail.com



Re: FOR EACH ROW triggers on partitioned tables

2018-03-12 Thread Peter Eisentraut
On 3/9/18 15:41, Alvaro Herrera wrote:
>> One thing I'd like to add before claiming this committable (backend-
>> side) is enabling constraint triggers.  AFAICT that requires a bit of
>> additional logic, but it shouldn't be too terrible.  This would allow
>> for deferrable unique constraints, for example.
> 
> v7 supports constraint triggers.  I added an example using a UNIQUE
> DEFERRABLE constraint, and another one using plain CREATE CONSTRAINT TRIGGER.
> It's neat to see that the WHEN clause is executed at the time of the
> operation, and the trigger action is deferred (or not) till COMMIT time.

I'm not sure why you have the CommandCounterIncrement() changes in
separate patches.

It looks like there are some test cases that are essentially duplicates,
e.g.,

+create trigger failed before insert or update or delete on parted_trig
+  for each row execute procedure trigger_nothing();

+create trigger trig_ins_before_1 before insert on parted_stmt_trig
+  for each row execute procedure trigger_notice();

Perhaps the latter is supposed to be testing statement triggers instead?

Some documentation updates are needed, at least in catalogs.sgml and
CREATE TRIGGER reference page.

The argument names of CreateTrigger() are slightly different in the .h
and .c files.

I'm wondering about deferrable unique constraint triggers.  In index.c,
the CreateTrigger() call doesn't pass any parent trigger OID.  How is
this meant to work?  I mean, it does work, it seems.  Some comments maybe.

In CloneRowTriggersToPartition(), for this piece

+       /*
+        * We only clone a) FOR EACH ROW triggers b) timed AFTER events, c)
+        * that are not constraint triggers.
+        */
+       if (!TRIGGER_FOR_ROW(trigForm->tgtype) ||
+           !TRIGGER_FOR_AFTER(trigForm->tgtype) ||
+           OidIsValid(trigForm->tgconstraint))
+           continue;

I would rather have some elog(ERROR)'s if it finds triggers it can't
support instead of silently skipping them.
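
A minimal sketch of that suggestion, using the names from the hunk quoted
above (an editor's illustration of the review comment, not code from the
patch):

    /* Statement-level triggers legitimately exist on the parent and are
     * simply not cloned. */
    if (!TRIGGER_FOR_ROW(trigForm->tgtype))
        continue;

    /* Complain about row triggers the cloning code cannot handle,
     * instead of skipping them silently. */
    if (!TRIGGER_FOR_AFTER(trigForm->tgtype) ||
        OidIsValid(trigForm->tgconstraint))
        elog(ERROR, "unexpected trigger \"%s\" found on partitioned table",
             NameStr(trigForm->tgname));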

What is the story with transition tables?  Why are they not supported?
I don't understand this comment in CreateTrigger():

+       /*
+        * Disallow use of transition tables.  If this partitioned table
+        * has any partitions, the error would occur below; but if it
+        * doesn't then we would only hit that code when the first CREATE
+        * TABLE ... PARTITION OF is executed, which is too late.  Check
+        * early to avoid the problem.
+        */

Earlier in the thread, others have indicated that transition tables
should work.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-12 Thread Jeevan Chalke
On Mon, Mar 12, 2018 at 6:07 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Fri, Mar 9, 2018 at 4:21 PM, Ashutosh Bapat
>
> GroupPathExtraData now completely absorbs members from and replaces
> OtherUpperPathExtraData. This means that we have to consider a way to
> pass GroupPathExtraData to FDWs through GetForeignUpperPaths(). I
> haven't tried it in this patch.
>

Initially, I was passing OtherUpperPathExtraData to the FDW. But now we need
to pass GroupPathExtraData.

However, since GetForeignUpperPaths() is a generic function for all upper
relations, we might think of renaming this struct to UpperPathExtraData and
adding an UpperRelationKind member to it, which would be used to distinguish
the passed-in extra data. But since we currently have extra data only for
grouping, I chose not to do that here. Someone may choose this approach
later, when needed.


> With this patch there's a failure in partition_aggregation where the
> patch is creating paths with MergeAppend with GatherMerge underneath.
> I think this is related to the call
> add_paths_to_partial_grouping_rel() when try_parallel_aggregation is
> true. But I didn't investigate it further.
>

I fixed it. We need to pass is_partial_agg instead of
extra->partial_partitionwise_grouping while calling
add_paths_to_partial_grouping_rel() in case of parallelism.


> With those two things remaining I am posting this patch, so that
> Jeevan Chalke can merge this patch into his patches and also merge
> some of his changes related to mine and Robert's changes. Let me know
> if this refactoring looks good.
>

Will rebase my changes tomorrow.


>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 87ea18c..ed16f7d 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -353,7 +353,7 @@ static void postgresGetForeignUpperPaths(PlannerInfo *root,
 			 UpperRelationKind stage,
 			 RelOptInfo *input_rel,
 			 RelOptInfo *output_rel,
-			 OtherUpperPathExtraData *extra);
+			 GroupPathExtraData *group_extra);
 
 /*
  * Helper functions
@@ -429,7 +429,7 @@ static void add_paths_with_pathkeys_for_rel(PlannerInfo *root, RelOptInfo *rel,
 static void add_foreign_grouping_paths(PlannerInfo *root,
 		   RelOptInfo *input_rel,
 		   RelOptInfo *grouped_rel,
-		   OtherUpperPathExtraData *extra);
+		   GroupPathExtraData *group_extra);
 static void apply_server_options(PgFdwRelationInfo *fpinfo);
 static void apply_table_options(PgFdwRelationInfo *fpinfo);
 static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
@@ -5235,7 +5235,7 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel)
 static void
 postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
 			 RelOptInfo *input_rel, RelOptInfo *output_rel,
-			 OtherUpperPathExtraData *extra)
+			 GroupPathExtraData *group_extra)
 {
 	PgFdwRelationInfo *fpinfo;
 
@@ -5255,7 +5255,7 @@ postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
 	fpinfo->pushdown_safe = false;
 	output_rel->fdw_private = fpinfo;
 
-	add_foreign_grouping_paths(root, input_rel, output_rel, extra);
+	add_foreign_grouping_paths(root, input_rel, output_rel, group_extra);
 }
 
 /*
@@ -5268,7 +5268,7 @@ postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
 static void
 add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 		   RelOptInfo *grouped_rel,
-		   OtherUpperPathExtraData *extra)
+		   GroupPathExtraData *group_extra)
 {
 	Query	   *parse = root->parse;
 	PgFdwRelationInfo *ifpinfo = input_rel->fdw_private;
@@ -5288,16 +5288,16 @@ add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	 * Store passed-in target and havingQual in fpinfo. If its a foreign
 	 * partition, then path target and HAVING quals fetched from the root are
 	 * not correct as Vars within it won't match with this child relation.
-	 * However, server passed them through extra and thus fetch from it.
+	 * However, server passed them through group_extra and thus fetch from it.
 	 */
-	if (extra)
+	if (group_extra)
 	{
 		/* Partial aggregates are not supported. */
-		if (extra->isPartialAgg)
+		if (group_extra->partial_partitionwise_grouping)
 			return;
 
-		fpinfo->grouped_target = extra->relTarget;
-		fpinfo->havingQual = extra->havingQual;
+		fpinfo->grouped_target = group_extra->target;
+		fpinfo->havingQual = group_extra->havingQual;
 	}
 	else
 	{
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1611975..9a34bf9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -109,43 +109,6 @@ typedef struct
 	int		   *tleref_to_colnum_map;
 } 

Re: Google Summer of Code: Potential Applicant

2018-03-12 Thread Hongyuan Ma
Hi,


Maybe you can try searching for the name of the mentor of the project you are
interested in on the page below.


https://www.postgresql.org/list/pgsql-hackers/


Then you can find the email address of the mentor you want to contact and
share your thoughts with them.
That is exactly what I did. Hope it helps you.


Regards,
Hongyuan Ma (cs_maleica...@163.com) 





At 2018-03-12 21:48:21, "Christos Maris"  wrote:

Hey there,


My name is Christos (Chris for short) and I would love to work with you via the 
GSoC program this summer.


I am sending this mail because I am in need of some instructions on how to
find a mentor and a project to work on.


Can anyone help me with that? Also is there any potential mentor here?


Any info helps.


Thanks a lot in advance!

Re: Re: [GSOC 18] Performance Farm Project——Initialization Project

2018-03-12 Thread Hongyuan Ma
Hi Dave,
Thank you for the reminder. This was indeed an oversight on my part.
In fact, I have browsed the code in the pgperffarm.git repository, including
the client folder and the web folder. However, I found that the web folder has
some unfinished HTML files and does not contain model class files. And the
project used Django 1.8 without importing the Django REST Framework (when I
talked to Mark about the PerfFarm project, he told me he insisted on using
Django 1.11 and considered using the REST framework). So I mistakenly thought
that the code in the web folder had been shelved.


In the newly initialized Django application, I upgraded the Django version and
reorganized the directories to make the project structure clearer. I want to
use a separate front-end development approach (the front-end application will
be built with vue.js). I hope you can allow me to use it instead of the old
one. I am willing to integrate the auth module into this new application as
soon as possible.




Regards,
Hongyuan Ma (cs_maleica...@163.com) 

On 2018-03-12 02:26:43, "Dave Page" wrote:
Hi


Maybe I’m missing something (I’ve been offline a lot recently for unavoidable 
reasons), but the perf farm project already has a Django backend initialised 
and configured to work with community auth, on community infrastructure.


https://git.postgresql.org/gitweb/?p=pgperffarm.git;a=summary

On Sunday, March 11, 2018, Hongyuan Ma  wrote:

Hello, Mark
I initialized a Django project and imported the Django REST Framework. Its 
github address is: https://github.com/PGPerfFarm/server-code
I created some model classes and also wrote scripts in the dbtools folder to 
import simulation data for development. I am hesitating between admin and
xadmin as the administrator's backend for the site (I prefer xadmin).


I also initialized the website's front-end application. Here is its address: 
https://github.com/PGPerfFarm/front-end-code.git
I wrote the header component as shown in the attached screenshot.

I hope this effect can enhance the future user experience :).
This application uses vue.js and element-ui, but if you insist on using other 
js libraries or not using js, please let me know. I will empty this project and 
then rewrite it as you wish.


My next step is to determine the necessary API interfaces and the
corresponding components of the front-end application, and then implement
them one by one.
Finally, as you can see, I created an organization named PGPerfFarm on GitHub
without permission. I sent you an invitation, and after you join, I am
willing to hand over administration of the organization to you.




Regards,
Hongyuan Ma (cs_maleica...@163.com) 




 



--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Google Summer of Code: Potential Applicant

2018-03-12 Thread Aleksander Alekseev
Hello Chris,

> I am sending this mail because I am in a need for some instructions on how
> to find a mentor and a project to work on.
> 
> Can anyone help me with that? Also is there any potential mentor here?

Take a look at the list of our GSoC projects:

https://wiki.postgresql.org/wiki/GSoC_2018

Which one are you most interested in?

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index

2018-03-12 Thread Amit Kapila
On Mon, Mar 12, 2018 at 12:18 AM, Alexander Korotkov
 wrote:
> On Sat, Mar 3, 2018 at 2:53 PM, Amit Kapila  wrote:
>>
>> On Fri, Mar 2, 2018 at 9:27 AM, Thomas Munro
>> >  If that is indeed a race, could it be fixed by
>> > calling PredicateLockPageSplit() at the start of _hash_splitbucket()
>> > instead?
>> >
>>
>> Yes, but I think it would be better if we call this once we are sure
>> that at least one tuple from the old bucket has been transferred
>> (consider if all tuples in the old bucket are dead).
>
>
> Is it really fair?  For example, a predicate lock can be held by a session
> which queried some key but didn't find any corresponding tuple.
> Imagine this key should fall in the new bucket while all existing
> tuples are left in the old bucket.  As I understand it, in this case no
> locks would be transferred since no tuples were moved to the new bucket.
> So a further insertion into the new bucket wouldn't conflict with the
> session which looked for the non-existing key, while it should.  Does that
> make sense?
>

Valid point. I think on a split we should always transfer locks from the old
bucket to the new bucket.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: pg_serial early wraparound

2018-03-12 Thread Anastasia Lubennikova
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

The patch doesn't break anything in the regression tests and cleans up the
code. As far as I understand, the removed code was dead, since the SLRU size
is large enough that the wraparound described in the message is impossible.
So I mark it as Ready for Committer.

I didn't manage to repeat the attached test, though.
The server doesn't start after the xid reset. It throws an error:

server stopped
== setting next xid to 1073741824 =
Write-ahead log reset
waiting for server to start
2018-03-12 14:18:59.551 MSK [16126] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2018-03-12 14:18:59.625 MSK [16126] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2018-03-12 14:18:59.764 MSK [16127] LOG:  database system was shut down at 2018-03-12 14:18:59 MSK
2018-03-12 14:18:59.802 MSK [16127] FATAL:  could not access status of transaction 10737418
2018-03-12 14:18:59.802 MSK [16127] DETAIL:  Could not open file "pg_xact/000A": No such file or directory.
2018-03-12 14:18:59.803 MSK [16126] LOG:  startup process (PID 16127) exited with exit code 1
2018-03-12 14:18:59.803 MSK [16126] LOG:  aborting startup due to startup process failure
2018-03-12 14:18:59.804 MSK [16126] LOG:  database system is shut down

Google Summer of Code: Potential Applicant

2018-03-12 Thread Christos Maris
Hey there,

My name is Christos (Chris for short) and I would love to work with you via
the GSoC program this summer.

I am sending this mail because I am in need of some instructions on how
to find a mentor and a project to work on.

Can anyone help me with that? Also is there any potential mentor here?

Any info helps.

Thanks a lot in advance!


Re: FOR EACH ROW triggers on partitioned tables

2018-03-12 Thread Peter Eisentraut
On 3/9/18 16:05, Alvaro Herrera wrote:
> ... and this little addendum makes pg_dump work correctly.

The header file says "recursing", but the .c file calls the argument
"in_partition".

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Commitfest 2018-9 duplicate patch deletion request.

2018-03-12 Thread David Steele
On 3/12/18 6:12 AM, Shinoda, Noriyoshi wrote:
> 
> Last week, I posted a patch to Commitfest 2018-9 which title is "[WIP]
> Document update for Logical Replication security".
> 
> I do not know the reason, but the same content duplicated. Since I can
> not delete posts, would you please delete either one?

Done.

-- 
-David
da...@pgmasters.net



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-12 Thread Jeevan Chalke
On Mon, Mar 12, 2018 at 6:07 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Fri, Mar 9, 2018 at 4:21 PM, Ashutosh Bapat
>  wrote:
> > On Thu, Mar 8, 2018 at 7:31 PM, Robert Haas 
> wrote:
> >>
> >> This kind of goes along with the suggestion I made yesterday to
> >> introduce a new function, which at the time I proposed calling
> >> initialize_grouping_rel(), to set up new grouped or partially grouped
> >> relations.  By doing that it would be easier to ensure the
> >> initialization is always done in a consistent way but only for the
> >> relations we actually need.  But maybe we should call it
> >> fetch_grouping_rel() instead.  The idea would be that instead of
> >> calling fetch_upper_rel() we would call fetch_grouping_rel() when it
> >> is a question of the grouped or partially grouped relation.  It would
> >> either return the existing relation or initialize a new one for us.  I
> >> think that would make it fairly easy to initialize only the ones we're
> >> going to need.
> >
> > Hmm. I am working on refactoring the code to do something like this.
> >
>
> Here's patch doing the same. I have split create_grouping_paths() into
> three functions 1. to handle degenerate grouping paths
> (try_degenerate_grouping_paths()) 2. to create the grouping rels,
> partial grouped rel and grouped rel (make_grouping_rels()), which also
> sets some properties in GroupPathExtraData. 3. populate grouping rels
> with paths (populate_grouping_rels_with_paths()). With those changes,
> I have been able to get rid of partially grouped rels when they are
> not necessary. But I haven't tried to get rid of grouped_rels when
> they are not needed.
>
> GroupPathExtraData now completely absorbs members from and replaces
> OtherUpperPathExtraData. This means that we have to consider a way to
> pass GroupPathExtraData to FDWs through GetForeignUpperPaths(). I
> haven't tried it in this patch.
>
> With this patch there's a failure in partition_aggregation where the
> patch is creating paths with MergeAppend with GatherMerge underneath.
> I think this is related to the call
> add_paths_to_partial_grouping_rel() when try_parallel_aggregation is
> true. But I didn't investigate it further.
>
> With those two things remaining I am posting this patch, so that
> Jeevan Chalke can merge this patch into his patches and also merge
> some of his changes related to mine and Robert's changes. Let me know
> if this refactoring looks good.
>

Thanks, Ashutosh, for the refactoring patch.
I will rebase my changes and also resolve the above two issues you have
reported.


>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (v02)

2018-03-12 Thread Julian Markwort
Arthur Zakirov wrote on 2018-03-09:
> I'd like to review the patch and leave feedback. I tested it with
> different indexes on the same table and with the same queries, and it works
> fine.

Thanks for taking some time to review my patch, Arthur!

> First of all, GUC variables and functions. How about uniting
> 'pg_stat_statements.good_plan_enable' and
> 'pg_stat_statements.bad_plan_enable' into
> 'pg_stat_statements.track_plan' with the GUC_LIST_INPUT flag? It could
> accept comma-separated values 'good' and 'bad'. That would let us add
> another tracking type in the future without adding a new variable.

This sounds like a good idea to me; somebody already suggested that tracking
an "average plan" would be interesting as well, but I don't have any clever
ideas on how to identify such a plan.

> In the same manner, the pg_stat_statements_good_plan_reset() and
> pg_stat_statements_bad_plan_reset() functions can be combined into one
> function:

> pg_stat_statements_plan_reset(in queryid bigint, in good boolean, in bad
> boolean)

This is also sensible; however, if more kinds of plans were added in the
future, there would be a lot of flags in this function. I think having
varying numbers of flags (between extension versions) as arguments to the
function also makes upgrade paths difficult. Thinking more about this, we
could also use function overloading to avoid that.
An alternative would be to have the caller pass the type of plan to reset.
Omitting the type would result in the deletion of all plans, maybe?
pg_stat_statements_plan_reset(in queryid bigint, in plan_type text)
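
For illustration, calls against that proposed signature might look like this
(an editor's sketch; the queryid value is made up and neither variant exists
yet):

    SELECT pg_stat_statements_plan_reset(123456789, 'good');  -- one plan type
    SELECT pg_stat_statements_plan_reset(123456789);          -- all plans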

I'm not sure if there are any better ways to do this?

> Further comments on the code.

You're right about all of those, thanks!



Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

2018-03-12 Thread amul sul
On Mon, Mar 12, 2018 at 11:45 AM, amul sul  wrote:
> On Sat, Mar 10, 2018 at 5:25 PM, Amit Kapila  wrote:
>> On Fri, Mar 9, 2018 at 3:18 PM, amul sul  wrote:
>>> On Thu, Mar 8, 2018 at 12:31 PM, Amit Kapila  
>>> wrote:
 On Thu, Mar 8, 2018 at 11:04 AM, Pavan Deolasee

> This is just one example. I am almost certain there are many such cases 
> that
> will require careful attention.
>

 Right, I think we should be able to detect and fix such cases.

>>>
>>> I found a couple of places (in heap_lock_updated_tuple, rewrite_heap_tuple,
>>> EvalPlanQualFetch & heap_lock_updated_tuple_rec) where ItemPointerEquals is
>>> used to check whether a tuple has been updated/deleted.  With the
>>> proposed patch, ItemPointerEquals() will no longer work as before; we
>>> require an additional check for updated/deleted tuples, proposed in the
>>> latest patch[1].  Do let me know your thoughts/suggestions on this,
>>> thanks.
>>>
>>
>> I think you have identified the places correctly.  I have few
>> suggestions though.
>>
>> 1.
>> - if (!ItemPointerEquals(&tuple->t_self, ctid))
>> + if (!(ItemPointerEquals(&tuple->t_self, ctid) ||
>> +   (!ItemPointerValidBlockNumber(ctid) &&
>> +    (ItemPointerGetOffsetNumber(&tuple->t_self) ==   /* TODO: Condn. should be macro? */
>> +     ItemPointerGetOffsetNumber(ctid)
>>
>> Can't we write this and similar tests as:
>> ItemPointerValidBlockNumber(ctid) &&
>> !ItemPointerEquals(&tuple->t_self, ctid)?  It will be a bit simpler to
>> understand and serve the purpose.
>>
>
> Yes, you are correct, we need not worry about offset matching -- the
> invalid block number check and ItemPointerEquals are more than enough to
> conclude whether the tuple has been deleted.  Will change the condition
> accordingly in the next version.
>
>> 2.
>>
>> if (mytup.t_data->t_infomask & HEAP_XMAX_INVALID ||
>>   ItemPointerEquals(&mytup.t_self, &mytup.t_data->t_ctid) ||
>> + !HeapTupleHeaderValidBlockNumber(mytup.t_data) ||
>>   HeapTupleHeaderIsOnlyLocked(mytup.t_data))
>>
>> I think it is better to keep the check for
>> HeapTupleHeaderValidBlockNumber earlier than ItemPointerEquals as it
>> will first validate if block number is valid and then only compare the
>> complete CTID.
>
> Sure, will do that.

I made the aforementioned changes in the attached patch, thanks.

Regards,
Amul


0001-Invalidate-ip_blkid-v6.patch
Description: Binary data


0002-isolation-tests-v6.patch
Description: Binary data


Re: [WIP PATCH] Index scan offset optimisation using visibility map

2018-03-12 Thread Andrey Borodin
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

I've tested this patch with different types of ORDER BY, including ones
needing recheck, and everything seems to work as expected.
There are some missing spaces before some comment lines, and I'm a bit unsure
about some lines of code duplicated between nodeIndexonlyscan.c and
nodeIndexscan.c.
But besides this, I have no more notes on the code.
I think the patch is ready for committer.

Best regards, Andrey Borodin.

The new status of this patch is: Ready for Committer


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-12 Thread Ashutosh Bapat
On Fri, Mar 9, 2018 at 4:21 PM, Ashutosh Bapat
 wrote:
> On Thu, Mar 8, 2018 at 7:31 PM, Robert Haas  wrote:
>>
>> This kind of goes along with the suggestion I made yesterday to
>> introduce a new function, which at the time I proposed calling
>> initialize_grouping_rel(), to set up new grouped or partially grouped
>> relations.  By doing that it would be easier to ensure the
>> initialization is always done in a consistent way but only for the
>> relations we actually need.  But maybe we should call it
>> fetch_grouping_rel() instead.  The idea would be that instead of
>> calling fetch_upper_rel() we would call fetch_grouping_rel() when it
>> is a question of the grouped or partially grouped relation.  It would
>> either return the existing relation or initialize a new one for us.  I
>> think that would make it fairly easy to initialize only the ones we're
>> going to need.
>
> Hmm. I am working on refactoring the code to do something like this.
>

Here's patch doing the same. I have split create_grouping_paths() into
three functions 1. to handle degenerate grouping paths
(try_degenerate_grouping_paths()) 2. to create the grouping rels,
partial grouped rel and grouped rel (make_grouping_rels()), which also
sets some properties in GroupPathExtraData. 3. populate grouping rels
with paths (populate_grouping_rels_with_paths()). With those changes,
I have been able to get rid of partially grouped rels when they are
not necessary. But I haven't tried to get rid of grouped_rels when
they are not needed.

GroupPathExtraData now completely absorbs members from and replaces
OtherUpperPathExtraData. This means that we have to consider a way to
pass GroupPathExtraData to FDWs through GetForeignUpperPaths(). I
haven't tried it in this patch.

With this patch there's a failure in partition_aggregation where the
patch is creating paths with MergeAppend with GatherMerge underneath.
I think this is related to the call
add_paths_to_partial_grouping_rel() when try_parallel_aggregation is
true. But I didn't investigate it further.

With those two things remaining I am posting this patch, so that
Jeevan Chalke can merge this patch into his patches and also merge
some of his changes related to mine and Robert's changes. Let me know
if this refactoring looks good.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
commit c882a54585b93b8976d2d9a25f55d18ed27d69c1
Author: Ashutosh Bapat 
Date:   Thu Mar 8 15:33:51 2018 +0530

Split create_grouping_paths()

Separate code in create_grouping_paths() into two functions: first to
create the grouping relations, partial grouped rel and grouped rel,
second to populate those with paths. These two functions are then
called from create_grouping_paths() and try_partitionwise_grouping().

As part of this separate degenerate grouping case into a function of
its own (try_degenerate_grouping_paths()) to be called only from
create_grouping_paths().

Ashutosh Bapat.

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a5a049f..1611975 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -123,12 +123,27 @@ typedef struct
  */
 typedef struct
 {
+	/* Data which remains constant once set. */
 	bool		can_hash;
 	bool		can_sort;
 	bool		can_partial_agg;
 	bool		partial_costs_set;
 	AggClauseCosts agg_partial_costs;
 	AggClauseCosts agg_final_costs;
+
+	/*
+	 * Data which depends upon the input and grouping relations and hence may
+	 * change across partitioning hierarchy
+	 */
+	bool		consider_parallel;	/* It probably remains same across
+	 * partitioning hierarchy. But double
+	 * check.
+	 */
+	bool		try_parallel_aggregation;
+	bool		partitionwise_grouping;
+	bool		partial_partitionwise_grouping;
+	Node	   *havingQual;
+	PathTarget *target;
 } GroupPathExtraData;
 
 /* Local functions */
@@ -161,9 +176,7 @@ static RelOptInfo *create_grouping_paths(PlannerInfo *root,
 	  RelOptInfo *input_rel,
 	  PathTarget *target,
 	  const AggClauseCosts *agg_costs,
-	  grouping_sets_data *gd,
-	  GroupPathExtraData *extra,
-	  OtherUpperPathExtraData *child_data);
+	  grouping_sets_data *gd);
 static void consider_groupingsets_paths(PlannerInfo *root,
 			RelOptInfo *grouped_rel,
 			Path *path,
@@ -240,14 +253,31 @@ static void try_partitionwise_grouping(PlannerInfo *root,
 		   RelOptInfo *input_rel,
 		   RelOptInfo *grouped_rel,
 		   RelOptInfo *partially_grouped_rel,
-		   PathTarget *target,
 		   const AggClauseCosts *agg_costs,
 		   grouping_sets_data *gd,
-		   GroupPathExtraData *extra,
-		   Node *havingQual,
-		   bool forcePartialAgg);
+		   GroupPathExtraData *extra);
 static bool group_by_has_partkey(RelOptInfo *input_rel, PathTarget *target,

Re: [HACKERS] MERGE SQL Statement for PG11

2018-03-12 Thread Pavan Deolasee
On Sun, Mar 11, 2018 at 11:18 AM, Peter Geoghegan  wrote:

>
> It sounds like we should try to thoroughly understand why these
> duplicates arose. Did you actually call EvalPlanQualSetPlan() for all
> subplans at the time?
>
>
The reason for duplicates or even wrong answers is quite simple. The way
UPDATE/DELETE currently works for partitioned tables is that we expand the
inheritance tree for the parent result relation and then create a subplan
for each partition separately. This works fine, even when there exists a
FROM/USING clause in the UPDATE/DELETE statement because the final result
does not change irrespective of whether you first do a UNION ALL between
all partitions and then find the candidate rows or whether you find
candidate rows from individual partitions separately.

In case of MERGE though, since we are performing a RIGHT OUTER JOIN between
the result relation and the source relation, we may conclude that a
matching target row does not exist for a source row, whereas it actually
exists but in some other partition. For example,

CREATE TABLE target (key int, val text) PARTITION BY LIST ( key);
CREATE TABLE part1 PARTITION OF target FOR VALUES IN (1, 2, 3);
CREATE TABLE part2 PARTITION OF target FOR VALUES IN (4, 5, 6);
CREATE TABLE source (skey integer);
INSERT INTO source VALUES (1), (4), (7);
INSERT INTO part1 VALUES (1, 'v1'), (2, 'v2'), (3, 'v3');
INSERT INTO part2 VALUES (4, 'v4'), (5, 'v5'), (6, 'v6');

postgres=# SELECT * FROM target RIGHT OUTER JOIN source ON key = skey;
 key | val | skey
-----+-----+------
   1 | v1  |    1
   4 | v4  |    4
     |     |    7
(3 rows)

This gives the right answer. But if we join individual partitions and then
do a UNION ALL,

postgres=# SELECT * FROM part1 RIGHT OUTER JOIN source ON key = skey UNION
ALL SELECT * FROM part2 RIGHT OUTER JOIN source ON key = skey;
 key | val | skey
-----+-----+------
   1 | v1  |    1
     |     |    4
     |     |    7
     |     |    1
   4 | v4  |    4
     |     |    7
(6 rows)

This is what nodeModifyTable does, and hence we end up getting duplicates or
rows incorrectly declared NOT MATCHED, whereas they are matched in a
different partition.

I don't think not calling EvalPlanQualSetPlan() on all subplans is a
problem, because we really never execute those subplans. In fact, we should
fix that so that those subplans are not even initialised.


> As you know, there is an ON CONFLICT DO UPDATE + partitioning patch in
> the works from Alvaro. In your explanation about that approach that
> you cited, you wondered what the trouble might have been with ON
> CONFLICT + partitioning, and supposed that the issues were similar
> there. Are they? Has that turned up much?
>
>
Well, I initially thought that ON CONFLICT DO UPDATE on a partitioned table
may have the same challenges, but that's probably not the case. For INSERT ON
CONFLICT it's still just an INSERT path, with some special handling for
UPDATEs. Currently, for partitioned or inherited tables, UPDATEs/DELETEs go
via inheritance_planner(), thus expanding inheritance for the result
relation, whereas INSERTs go via the simple grouping_planner().

For MERGE, we do all three DMLs. That doesn't mean we could not re-implement
MERGE along the lines of INSERT, but that would most likely mean completely
rewriting UPDATE/DELETE for partitioned/inheritance tables. The challenges
would be just the same in both cases.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


RE: Intermittent pg_ctl failures on Windows

2018-03-12 Thread Badrul Chowdhury
Hi Tom,

This is a great catch. I am looking into it: I will start by reproducing the 
error as you suggested.

Thanks,
Badrul

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Saturday, March 10, 2018 2:48 PM
To: pgsql-hackers@lists.postgresql.org
Subject: Intermittent pg_ctl failures on Windows

The buildfarm's Windows members occasionally show weird pg_ctl failures, for 
instance this recent case:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2018-03-10%2020%3A30%3A20

### Restarting node "master"
# Running: pg_ctl -D G:/prog/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/t_006_logical_decoding_master_data/pgdata -l G:/prog/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/log/006_logical_decoding_master.log restart
waiting for server to shut down done
server stopped
waiting for server to start
The process cannot access the file because it is being used by another process.
 stopped waiting
pg_ctl: could not start server
Examine the log output.
Bail out!  system pg_ctl failed

or this one:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2017-12-29%2023%3A30%3A24

### Stopping node "subscriber" using mode fast # Running: pg_ctl -D 
c:/prog/bf/root/HEAD/pgsql.build/src/test/subscription/tmp_check/t_001_rep_changes_subscriber_data/pgdata
 -m fast stop waiting for server to shut downpg_ctl: could not open PID 
file 
"c:/prog/bf/root/HEAD/pgsql.build/src/test/subscription/tmp_check/t_001_rep_changes_subscriber_data/pgdata/postmaster.pid":
 Permission denied Bail out!  system pg_ctl failed

I'd been writing these off as Microsoft randomness and/or antivirus 
interference, but it suddenly occurred to me that there might be a consistent 
explanation: since commit f13ea95f9, when pg_ctl is waiting for server 
start/stop, it is trying to read postmaster.pid more-or-less concurrently with 
the postmaster writing to that file.  On Unix that's not much of a problem, but 
I believe that on Windows you have to specifically open the file with sharing 
enabled, or you get error messages like these.
The postmaster should be enabling sharing, because port.h redirects open/fopen 
to pgwin32_open/pgwin32_fopen which enable the sharing flags.
But it only does that #ifndef FRONTEND.  So pg_ctl is just using naked open(), 
which could explain these failures.

If this theory is accurate, it should be pretty easy to replicate the problem 
if you modify the postmaster to hold postmaster.pid open longer when rewriting 
it, e.g. stick fractional-second sleeps into CreateLockFile and 
AddToDataDirLockFile.

I'm not in a position to investigate this in detail nor test a fix, but I think 
somebody should.

regards, tom lane




Re: Prefix operator for text and spgist support

2018-03-12 Thread Ildus Kurbangaliev
On Tue, 6 Mar 2018 19:27:21 +0300
Arthur Zakirov  wrote:

> On Mon, Feb 19, 2018 at 05:19:15PM +0300, Ildus Kurbangaliev wrote:
> > At a brief look, it seems better to move this block into the
> > pattern_fixed_prefix function. But there is also the `vartype` variable,
> > which is used in prefix construction, and it would require passing this
> > variable too. And since pattern_fixed_prefix is called in 10 other
> > places and vartype is required only for this ptype, it seems better
> > to just keep this block outside of this function.
> 
> Understood.
> 
> > I've added documentation in current version of the patch.  
> 
> Thank you.
> 
> Can you rebase the patch due to changes within pg_proc.h?
> 
> Also here
> 
> +   
> +There is also the prefix operator ^@ and
> corresponding
> +text_startswith function which covers cases
> when only
> +searching by beginning of the string is needed.
> +   
> 
> I think text_startswith should be enclosed within a <function> tag.
> I'm not sure, but I think <literal> is used for operators, keywords,
> etc. I haven't found a manual which describes how to use the tags, but
> after looking at the documentation where <function> is used, I think
> that for functions <function> should be used.
> 

Hi, thanks for the review. I've fixed the documentation as you suggested and
also rebased onto current master.
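
For reference, a quick illustration of what the patch enables, based on the
example in the documentation hunk below (the table t with a text column s is
hypothetical):

SELECT text_startswith('alphabet', 'alph');   -- t
SELECT 'alphabet' ^@ 'alph';                  -- t, the operator form

CREATE INDEX t_s_idx ON t USING spgist (s);
-- A prefix search such as
SELECT * FROM t WHERE s ^@ 'alph';
-- can then be answered from the SP-GiST index via the new
-- RTPrefixStrategyNumber handling added in spgtextproc.c.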

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2f59af25a6..4dc11d8df2 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -2274,6 +2274,21 @@
ph
   
 
+  
+   
+
+ text_startswith
+
+text_startswith(string, prefix)
+   
+   bool
+   
+Returns true if string starts with prefix.
+   
+   text_startswith('alphabet', 'alph')
+   t
+  
+
   

 
@@ -4033,6 +4048,12 @@ cast(-44 as bit(12))   11010100
 ILIKE, respectively.  All of these operators are
 PostgreSQL-specific.

+
+   
+There is also the prefix operator ^@ and the corresponding
+text_startswith function, which cover cases when only
+searching by the beginning of the string is needed.
+   
   
 
 
diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
index e47f70be89..06b7519052 100644
--- a/doc/src/sgml/spgist.sgml
+++ b/doc/src/sgml/spgist.sgml
@@ -161,6 +161,7 @@
~~
~=~
~~
+   ^@
   
  
  
diff --git a/src/backend/access/spgist/spgtextproc.c b/src/backend/access/spgist/spgtextproc.c
index f156b2166e..5beb9e33a1 100644
--- a/src/backend/access/spgist/spgtextproc.c
+++ b/src/backend/access/spgist/spgtextproc.c
@@ -496,7 +496,7 @@ spg_text_inner_consistent(PG_FUNCTION_ARGS)
 			 * well end with a partial multibyte character, so that applying
 			 * any encoding-sensitive test to it would be risky anyhow.)
 			 */
-			if (strategy > 10)
+			if (strategy > 10 && strategy != RTPrefixStrategyNumber)
 			{
 if (collate_is_c)
 	strategy -= 10;
@@ -526,6 +526,10 @@ spg_text_inner_consistent(PG_FUNCTION_ARGS)
 	if (r < 0)
 		res = false;
 	break;
+case RTPrefixStrategyNumber:
+	if (r != 0)
+		res = false;
+	break;
 default:
 	elog(ERROR, "unrecognized strategy number: %d",
 		 in->scankeys[j].sk_strategy);
@@ -601,10 +605,21 @@ spg_text_leaf_consistent(PG_FUNCTION_ARGS)
 	for (j = 0; j < in->nkeys; j++)
 	{
 		StrategyNumber strategy = in->scankeys[j].sk_strategy;
-		text	   *query = DatumGetTextPP(in->scankeys[j].sk_argument);
-		int			queryLen = VARSIZE_ANY_EXHDR(query);
+		text	   *query;
+		int			queryLen;
 		int			r;
 
+		/* for this strategy collation is not important */
+		if (strategy == RTPrefixStrategyNumber)
+		{
+			res = DatumGetBool(DirectFunctionCall2(text_startswith,
+out->leafValue, in->scankeys[j].sk_argument));
+			goto next;
+		}
+
+		query = DatumGetTextPP(in->scankeys[j].sk_argument);
+		queryLen = VARSIZE_ANY_EXHDR(query);
+
 		if (strategy > 10)
 		{
 			/* Collation-aware comparison */
@@ -655,6 +670,7 @@ spg_text_leaf_consistent(PG_FUNCTION_ARGS)
 break;
 		}
 
+next:
 		if (!res)
 			break;/* no need to consider remaining conditions */
 	}
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index bf240aa9c5..50b2583ca0 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1317,8 +1317,18 @@ patternsel(PG_FUNCTION_ARGS, Pattern_Type ptype, bool negate)
 	 * right cache key, so that they can be re-used at runtime.
 	 */
 	patt = (Const *) other;
-	pstatus = pattern_fixed_prefix(patt, ptype, collation,
-   , _selec);
+
+	if (ptype == Pattern_Type_Prefix)
+	{
+		char	*s = TextDatumGetCString(constval);
+		prefix = string_to_const(s, vartype);
+		pstatus = Pattern_Prefix_Partial;
+		rest_selec = 1.0;	/* all */
+		pfree(s);
+	}
+	else
+		pstatus = pattern_fixed_prefix(patt, ptype, 

[HACKERS] Commitfest 2018-9 duplicate patch deletion request.

2018-03-12 Thread Shinoda, Noriyoshi
Dear Hackers, CFM

Last week, I posted a patch to Commitfest 2018-9 whose title is "[WIP] Document
update for Logical Replication security".
I do not know the reason, but the same entry was duplicated. Since I cannot
delete posts, could you please delete either one?

Best regards,

Noriyoshi Shinoda



Re: Failed to request an autovacuum work-item in silence

2018-03-12 Thread Ildar Musin

Hi,

The autovac_get_workitem_name() declaration seems redundant and should be
removed. The same goes for including "utils/lsyscache.h" in brin.c.


I would again rename the 'requested' variable in brininsert() to
something like 'success', because a work item is requested in any case;
what matters is whether the request was satisfied/successful.


Except for those two points everything is fine and works as expected.


--
Ildar Musin
i.mu...@postgrespro.ru



Re: [WIP PATCH] Index scan offset optimisation using visibility map

2018-03-12 Thread Michail Nikolaev
Hello.

> Sorry, seems like I've incorrectly expressed what I wanted to say.
> I mean this Assert() can be checked before loop, not on every loop cycle.

Yes, I understood it. The condition should be checked every cycle - at least it
is done this way for index-only scan:
https://github.com/postgres/postgres/blob/master/src/backend/executor/nodeIndexonlyscan.c#L234

But since the original index scan does not contain such a check (probably due to
https://github.com/postgres/postgres/blob/master/src/backend/executor/nodeIndexscan.c#L552)
- I think it could be removed.

Michail.


Re: All Taxi Services need Index Clustered Heap Append

2018-03-12 Thread Konstantin Knizhnik



On 02.03.2018 19:30, Darafei "Komяpa" Praliaskouski wrote:

Hi,

I work at a ride-sharing company, and we found a simple scenario in which
Postgres has a lot of room to improve.
After my talk at pgconf.ru, Alexander Korotkov
encouraged me to share my story and thoughts in -hackers.


Story setting:

 - Amazon RDS (thus vanilla, unpatchable postgres),
synchronous_commit=off (we're ok to lose some seconds on crash).


 - Amazon gp2 storage. It's almost like SSD for small spikes, but has 
IOPS limit if you're doing a lot of operations.


 - Drivers driving around the city and sending their GPS positions 
each second, for simplicity - 50k of them at a time.


 - An append-only table of shape (id uuid, ts timestamp, geom
geometry, heading/speed/accuracy float, source text).

A btree index on (id, ts); see the DDL sketch after this list.

 - A service that gets the measurements from the network, batches them all
into a buffer of 1 second (~N=50k rows), and inserts them via COPY.
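
A rough DDL sketch of the setup described above (the geometry type comes
from PostGIS; the column types for heading, speed and accuracy are assumed):

CREATE TABLE positions (
    id       uuid,
    ts       timestamp,
    geom     geometry,
    heading  float8,
    speed    float8,
    accuracy float8,
    source   text
);
CREATE INDEX positions_id_ts_idx ON positions (id, ts);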


After someone orders and completes a trip, we have the id of the
driver and the trip's time interval coming from another service, and want to
get the trip's route to calculate the bill. Trip times range from seconds up
to 4 hours (N=4*3600=14400 rows; a page typically has 60 rows).


select * from positions where id = :id and ts between :start_ts and 
:end_ts;


Data from more than 24 hours ago is not needed in this service, thus
storage of 70 GB should be enough. On the safe side we're giving it
500 GB, which on gp2 gives a steady 1500 IOPS.


In development (on synthetic data), plans use the index and look great, so
we proceed with Postgres and not MySQL. :)


When deployed to production, we figured out that a single query for an
interval of more than half an hour (1800+ rows) can exhaust all the IOPS.
Data is appended with an increasing time field, which effectively ensures
no rows from the same driver are ever going to be in the same heap
page. A 4-hour-long request can degrade the system for 10 seconds. gp2
provides at most 10000 IOPS, and we then need to fetch 14400 pages. We need
the biggest available gp2 offer just to read 2 megabytes of data in
1.5 seconds. The best io-optimized io1 volumes provide 32000 IOPS,
which gets us as low as 500 ms.


If the data were perfectly packed, it would be just 240 8kB pages,
translating to 120 input operations with gp2's 16kB blocks.


Our options were:

 - partitioning. Not entirely trivial when your id is a uuid. To get
visible gains, we need to make sure each driver gets their own
partition. That would leave us with 50 000(+) tables, and rumors say
that's what is done in some bigger taxi service, and the relcache
then eats up all the RAM and the system OOMs.


 - CLUSTER table using index. Works perfectly on a test stand, but isn't
available as an online option.


 - Postgres Pro suggested CREATE INDEX .. INCLUDE (on commitfest 
https://commitfest.postgresql.org/15/1350/). We can't use that as it's 
not in upstream/amazon Postgres yet.


 - We decided to live with the overhead of unnecessary sorting by all
fields and of keeping a copy of the heap, and created a btree over all the
fields to utilize Index-Only Scans:


  * Testing went well on dump of production database.

  * After we made indexes on production, we found out that
performance is _worse_ than with the simpler index.


  * EXPLAIN (BUFFERS) revealed that the Visibility Map never gets
set, as autovacuum ignores an append-only, never-updated, never-deleted
table that is only truncated once a day. No way to force autovacuum on
such a table exists.


  * We created a microservice (it's hard to find a spot for crontab in a
distributed system!) that periodically and aggressively runs VACUUM on the
table.
It indeed helped with queries, but.. VACUUM skips all-visible pages in the
heap, yet always walks over all the pages of the btree, which is even
larger than the heap in our case.
There is a patch to repair this behavior on commitfest
https://commitfest.postgresql.org/16/952/ - but it is not yet in
upstream/amazon Postgres.


  * We ended up inventing a partitioning scheme that rotates a table
every gigabyte of data, to keep VACUUM run time low. There are
hundreds of partitions with indexes on all the fields. Finally the system
is stable.


  * EXPLAIN (BUFFERS) later shows that each reading query visits all
the indexes of the partitioned table and fetches a page from each index just
to learn there are 0 rows there. To prune obviously unneeded partitions we
decided to add a constraint on the timestamp after a partition is finalized.
Timestamps are stamped on the client side and sanitized due to mobile
network instability, so we don't know the bounds in advance.
Unfortunately that means we need two seq scans to do it: the first one to
get min(ts), max(ts), and the second one on ALTER TABLE ADD CONSTRAINT,
as sketched below. This operation also eats up IOPS.
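
In SQL terms, finalizing one partition costs roughly this (the partition
name and the bounds are hypothetical):

SELECT min(ts), max(ts) FROM positions_p0042;        -- seq scan #1
ALTER TABLE positions_p0042
  ADD CONSTRAINT positions_p0042_ts_check
  CHECK (ts >= '2018-03-01 00:00' AND ts < '2018-03-01 06:00');  -- seq scan #2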


We are not a very large company, but we already bump into too many scalability
issues on this path. Searching for solutions at every step
shows other people with tables named like "gpsdata" and "traces", so

Re: Concurrency bug in UPDATE of partition-key

2018-03-12 Thread Amit Khandekar
On 8 March 2018 at 16:55, Amit Khandekar  wrote:
> The attached WIP patch is how the fix is turning out to be.
> ExecDelete() passes back the epqslot returned by EvalPlanQual().
> ExecUpdate() uses this re-fetched tuple to re-process the row, just
> like it does when heap_update() returns HeapTupleUpdated.
>
> I need to write an isolation test for this fix.

Attached patch now has the isolation test included. The only
difference with the earlier WIP patch is about some cosmetic changes,
and some more comments.
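
For readers new to the thread, the scenario the isolation test exercises is
roughly the following (table and values are illustrative, not the test's
actual spec):

CREATE TABLE pt (a int, b int) PARTITION BY LIST (a);
CREATE TABLE pt1 PARTITION OF pt FOR VALUES IN (1);
CREATE TABLE pt2 PARTITION OF pt FOR VALUES IN (2);
INSERT INTO pt VALUES (1, 0);

-- Session 1:
BEGIN;
UPDATE pt SET b = b + 1 WHERE a = 1;

-- Session 2 (blocks behind session 1):
UPDATE pt SET a = 2 WHERE a = 1;   -- partition-key update: the row is
                                   -- deleted from pt1 and inserted into pt2

-- Session 1:
COMMIT;

-- Session 2 wakes up; its internal ExecDelete() gets HeapTupleUpdated, and
-- with the fix it passes the EvalPlanQual-refetched tuple back to
-- ExecUpdate() so the moved row carries session 1's change (b = 1).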

>
> [1] : 
> https://www.postgresql.org/message-id/CAJ3gD9d%3DwcbF-G00bYo4v5iWC69zm1UTSNxyRLpOgO9j4AtLWQ%40mail.gmail.com
>
>
> --
> Thanks,
> -Amit Khandekar
> EnterpriseDB Corporation
> The Postgres Database Company



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


fix_concurrency_bug.patch
Description: Binary data


Re: Protect syscache from bloating with negative cache entries

2018-03-12 Thread Kyotaro HORIGUCHI
Oops.

At Mon, 12 Mar 2018 17:34:08 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20180312.173408.162882093.horiguchi.kyot...@lab.ntt.co.jp>
> Something like the attached test script causes relcache

This is that.

#! /usr/bin/perl

# Generate SQL that bloats the relcache: the spurious $i arguments to the
# printf calls without %-conversions have been dropped, and the stray
# pre-loop "create table" (which collided with the loop's first iteration)
# has been removed.

# printf("drop schema if exists test_schema;\n");
printf("create schema test_schema;\n");

for $i (0..10) {
    printf("create table test_schema.t%06d ();\n", $i);
}

printf("set syscache_memory_target = \'1kB\';\n");
printf("set syscache_prune_min_age = \'15s\';\n");

for $i (0..10) {
    printf("select * from test_schema.t%06d;\n", $i);
}




Re: Protect syscache from bloating with negative cache entries

2018-03-12 Thread Kyotaro HORIGUCHI
At Fri, 09 Mar 2018 17:40:01 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20180309.174001.202113825.horiguchi.kyot...@lab.ntt.co.jp>
> > In short, it's not really apparent to me that negative syscache entries
> > are the major problem of this kind.  I'm afraid that you're drawing very
> > large conclusions from a specific workload.  Maybe we could fix that
> > workload some other way.
> 
> The current patch doesn't consider whether an entry is negative
> or positive(?). Just clean up all entries based on time.
> 
> If the relcache has to have the same characteristics as syscaches, it
> might be better to be on the catcache mechanism, instead of adding
> the same pruning mechanism to dynahash..

For the moment, I added such a feature to dynahash and let only the
relcache use it in this patch. A hash element has a different shape in a
"prunable" hash, and pruning is performed in a similar way,
sharing the settings with syscache. This seems to be working fine.

It is a bit uneasy that all syscaches and the relcache share the same
value of syscache_memory_target...

Something like the attached test script causes relcache
"bloat". The server emits the following log entries at DEBUG1 message
level.

DEBUG:  removed 11240/32769 entries from hash "Relcache by OID" at character 15

# The last few words are just garbage I mentioned in another thread.

The last two patches do that (as PoC).

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 705b67a79ef7e27a450083944f8d970b7eb9e619 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 26 Dec 2017 17:43:09 +0900
Subject: [PATCH 1/3] Remove entries that haven't been used for a certain time

Catcache entries can be left alone for several reasons. It is not
desirable that they eat up memory. This patch adds
consideration of removing entries that haven't been used for a
certain time before enlarging the hash array.
---
 doc/src/sgml/config.sgml  |  38 +++
 src/backend/access/transam/xact.c |   3 +
 src/backend/utils/cache/catcache.c| 152 +-
 src/backend/utils/misc/guc.c  |  23 
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/utils/catcache.h  |  19 
 6 files changed, 233 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3a8fc7d803..394e0703f8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1557,6 +1557,44 @@ include_dir 'conf.d'
   
  
 
+ 
+  syscache_memory_target (integer)
+  
+   syscache_memory_target configuration parameter
+  
+  
+  
+   
+Specifies the maximum amount of memory to which syscache is expanded
+without pruning. The value defaults to 0, indicating that pruning is
+always considered. After exceeding this size, syscache pruning is
+considered according to
+. If you need to keep
+a certain number of syscache entries with intermittent usage, try
+increasing this setting.
+   
+  
+ 
+
+ 
+  syscache_prune_min_age (integer)
+  
+   syscache_prune_min_age configuration parameter
+  
+  
+  
+   
+Specifies the minimum amount of unused time in seconds after which a
+syscache entry can be removed. -1 indicates that syscache
+pruning is disabled entirely. The value defaults to 600 seconds
+(10 minutes). Syscache entries that have not been
+used for that duration can be removed to prevent syscache bloat. This
+behavior is suppressed until the size of syscache exceeds
+.
+   
+  
+ 
+
  
   max_stack_depth (integer)
   
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index dbaaf8e005..86d76917bb 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -733,6 +733,9 @@ void
 SetCurrentStatementStartTimestamp(void)
 {
 	stmtStartTimestamp = GetCurrentTimestamp();
+
+	/* Use this timestamp as the approximate current time */
+	SetCatCacheClock(stmtStartTimestamp);
 }
 
 /*
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 5ddbf6eab1..0236a05127 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -71,9 +71,23 @@
 #define CACHE6_elog(a,b,c,d,e,f,g)
 #endif
 
+/*
+ * GUC variable to define the minimum size of hash to consider entry eviction.
+ * Let the name be the same as the GUC variable name, not using 'catcache'.
+ */
+int syscache_memory_target = 0;
+
+/* GUC variable to define the minimum age in seconds of entries that will be
+ * considered for eviction. Ditto for the name.
+ */
+int syscache_prune_min_age = 600;
+
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = 

Re: GSOC 2018 proposal

2018-03-12 Thread Aleksander Alekseev
Hello Charles,

> I am currently preparing a proposal for the pg_thrift project. I noticed
> that there are several protocols supported by thrift; which ones have
> higher priority? I mean, which ones do I need to implement during
> this project?

Binary protocols, i.e. TBinaryProtocol and TCompactProtocol. The first
one is a bit faster but more redundant, the second one is slower but
more compact. It's your choice which one to implement first, but at
least one binary protocol should be fully supported (ideally - both).

As far as I'm aware other protocols are rarely used and are not fully
implemented in most existing libraries.

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: Transform for pl/perl

2018-03-12 Thread Anthony Bykov
On Mon, 5 Mar 2018 14:03:37 +0100
Pavel Stehule  wrote:

> Hi
> 
> I am looking on this patch. I found few issues:
> 
> 1. compile warning
> 
> I../../src/include  -D_GNU_SOURCE -I/usr/include/libxml2
> -I/usr/lib64/perl5/CORE  -c -o jsonb_plperl.o jsonb_plperl.c
> jsonb_plperl.c: In function ‘SV_FromJsonbValue’:
> jsonb_plperl.c:69:9: warning: ‘result’ may be used uninitialized in
> this function [-Wmaybe-uninitialized]
>   return result;
>  ^~
> jsonb_plperl.c: In function ‘SV_FromJsonb’:
> jsonb_plperl.c:142:9: warning: ‘result’ may be used uninitialized in
> this function [-Wmaybe-uninitialized]
>   return result;
>  ^~
> 
> 2. bad comment
> 
> /*
>  * SV_ToJsonbValue
>  *
>  * Transform Jsonb into SV --- propably reverse direction
>  */
> 
> 
> /*
>  * HV_ToJsonbValue
>  *
>  * Transform Jsonb into SV
>  */
> 
> /*
>  * plperl_to_jsonb(SV *in)
>  *
>  * Transform Jsonb into SV
>  */
> 
> 3. Do we need two identical tests fro PLPerl and PLPerlu? Why?
> 
> Regards
> 
> Pavel

Hello, thanks for reviewing my patch! I really appreciate it.

Those warnings are created on purpose - I was trying to prevent the case
where new types are added into pl/perl but new transform logic is not.
Maybe there is a better way to do it, but right now I'll just add the
"default: pg_unreachable" logic.

About point 3 - I thought that plperlu and plperl use different
interpreters. But if they act identically on the same examples, there
is indeed no need for identical tests for them.

Point 2 is fixed in this version of the patch.

Please, find in attachments a new version of the patch.

--
Anthony Bykov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/contrib/Makefile b/contrib/Makefile
index 8046ca4..53d44fe 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -75,9 +75,9 @@ ALWAYS_SUBDIRS += sepgsql
 endif
 
 ifeq ($(with_perl),yes)
-SUBDIRS += hstore_plperl
+SUBDIRS += hstore_plperl jsonb_plperl
 else
-ALWAYS_SUBDIRS += hstore_plperl
+ALWAYS_SUBDIRS += hstore_plperl jsonb_plperl
 endif
 
 ifeq ($(with_python),yes)
diff --git a/contrib/jsonb_plperl/Makefile b/contrib/jsonb_plperl/Makefile
new file mode 100644
index 000..cd86553
--- /dev/null
+++ b/contrib/jsonb_plperl/Makefile
@@ -0,0 +1,40 @@
+# contrib/jsonb_plperl/Makefile
+
+MODULE_big = jsonb_plperl
+OBJS = jsonb_plperl.o $(WIN32RES)
+PGFILEDESC = "jsonb_plperl - jsonb transform for plperl"
+
+PG_CPPFLAGS = -I$(top_srcdir)/src/pl/plperl
+
+EXTENSION = jsonb_plperlu jsonb_plperl
+DATA = jsonb_plperlu--1.0.sql jsonb_plperl--1.0.sql
+
+REGRESS = jsonb_plperl jsonb_plperl_relocatability jsonb_plperlu jsonb_plperlu_relocatability
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/jsonb_plperl
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+# We must link libperl explicitly
+ifeq ($(PORTNAME), win32)
+# these settings are the same as for plperl
+override CPPFLAGS += -DPLPERL_HAVE_UID_GID -Wno-comment
+# ... see silliness in plperl Makefile ...
+SHLIB_LINK += $(sort $(wildcard ../../src/pl/plperl/libperl*.a))
+else
+rpathdir = $(perl_archlibexp)/CORE
+SHLIB_LINK += $(perl_embed_ldflags)
+endif
+
+# As with plperl we need to make sure that the CORE directory is included
+# last, probably because it sometimes contains some header files with names
+# that clash with some of ours, or with some that we include, notably on
+# Windows.
+override CPPFLAGS := $(CPPFLAGS) $(perl_embed_ccflags) -I$(perl_archlibexp)/CORE
diff --git a/contrib/jsonb_plperl/expected/jsonb_plperl.out b/contrib/jsonb_plperl/expected/jsonb_plperl.out
new file mode 100644
index 000..152e62d
--- /dev/null
+++ b/contrib/jsonb_plperl/expected/jsonb_plperl.out
@@ -0,0 +1,243 @@
+CREATE EXTENSION jsonb_plperl CASCADE;
+NOTICE:  installing required extension "plperl"
+-- test hash -> jsonb
+CREATE FUNCTION testHVToJsonb() RETURNS jsonb
+LANGUAGE plperl
+TRANSFORM FOR TYPE jsonb
+AS $$
+$val = {a => 1, b => 'boo', c => undef};
+return $val;
+$$;
+SELECT testHVToJsonb();
+  testhvtojsonb  
+---------------------------------
+ {"a": 1, "b": "boo", "c": null}
+(1 row)
+
+-- test array -> jsonb
+CREATE FUNCTION testAVToJsonb() RETURNS jsonb
+LANGUAGE plperl
+TRANSFORM FOR TYPE jsonb
+AS $$
+$val = [{a => 1, b => 'boo', c => undef}, {d => 2}];
+return $val;
+$$;
+SELECT testAVToJsonb();
+                testavtojsonb
+---------------------------------------------
+ [{"a": 1, "b": "boo", "c": null}, {"d": 2}]
+(1 row)
+
+-- test scalar -> jsonb
+CREATE FUNCTION testSVToJsonb() RETURNS jsonb
+LANGUAGE plperl
+TRANSFORM FOR TYPE jsonb
+AS $$
+$val = 1;
+return $val;
+$$;
+SELECT testSVToJsonb();
+ testsvtojsonb 
+---------------
+ 1
+(1 row)
+
+-- test blessed scalar -> jsonb
+CREATE FUNCTION testBlessedToJsonb() RETURNS jsonb
+LANGUAGE plperl

Fix error in ECPG while connection handling

2018-03-12 Thread Jeevan Ladhe
Hi,

I came across following error while working on ecpg client program.

$ install/bin/ecpg ecpg_connection_ptr.pgc
ecpg_connection_ptr.pgc:26: ERROR: AT option not allowed in WHENEVER
statement

I have attached a simple ecpg program, 'ecpg_connection_ptr_issue.pgc', that
reproduces the above issue.

After doing some investigation, I could see that in preproc.y, in the rule
for 'ClosePortalStmt', the function output_statement() is called, which in
turn frees the connection pointer. Here is the relevant trailing snippet
from the output_statement() function:

	whenever_action(whenever_mode | 2);
	free(stmt);
	if (connection != NULL)
		free(connection);
}

Now, when ecpg parses the following statement using the rule 'ECPGWhenever':

	EXEC SQL WHENEVER SQLERROR CONTINUE;

it checks whether the connection is set; if it is, it throws an error:

	if (connection)
		mmerror(PARSE_ERROR, ET_ERROR, "AT option not allowed in WHENEVER statement");

Ideally, the connection would be NULL here, but as the 'ClosePortalStmt'
rule freed the connection without setting it to NULL, the parser still sees
that there is a connection (which actually contains garbage) and throws the
error.

Similarly, I see other places that free this global connection
without setting it to NULL, and all of those should be fixed. I have
attached a patch, ecpg_connection_ptr_issue_fix.patch, to fix these
places.

Regards,
Jeevan Ladhe


ecpg_connection_ptr_issue_fix.patch
Description: Binary data


ecpg_connection_ptr_issue.pgc
Description: Binary data


Re: [HACKERS] Another oddity in handling of WCO constraints in postgres_fdw

2018-03-12 Thread Etsuro Fujita

(2018/03/09 20:55), Etsuro Fujita wrote:

(2018/03/08 14:24), Ashutosh Bapat wrote:

For local constraints to be enforced, we use remote
constraints. For local WCOs we need to use remote WCOs. That means we
create many foreign tables pointing to the same table on the foreign
server through many views, but it's not impossible.


Maybe I don't understand this correctly, but I guess that it would be
the user's responsibility to not create foreign tables in such a way.


I think I misunderstood your words.  Sorry for that.  I think what you
proposed would be a solution for this issue, but I'm not sure it's a
good one, because it wouldn't work for data sources that don't
support views with WCO options.
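
To make the proposed approach concrete, it would look something like this
(all object names are invented for illustration):

-- On the remote server: an auto-updatable view with a check option
CREATE VIEW rem_v AS SELECT * FROM rem_t WHERE a > 0
  WITH CHECK OPTION;

-- On the local server: point the foreign table at the view, not the table
CREATE FOREIGN TABLE ft (a int, b text)
  SERVER remote_srv OPTIONS (table_name 'rem_v');

This delegates the WCO enforcement to the remote side, which is exactly
what cannot be done when the remote data source has no such view support.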


Best regards,
Etsuro Fujita



Re: PATCH: Configurable file mode mask

2018-03-12 Thread Michael Paquier
On Fri, Mar 09, 2018 at 01:51:14PM -0500, David Steele wrote:
> How about a GUC that enforces one mode or the other on startup?  Default
> would be 700.  The GUC can be set automatically by initdb based on the
> -g option.  We had this GUC originally, but since the front-end tools
> can't read it we abandoned it.  Seems like it would be good as an
> enforcing mechanism, though.

Hm.  OK.  I can see the whole set of points about that.  Please let me
think a bit more about that bit.  Do you think that there could be a
pool of users willing to switch from one mode to another?  Compared to
your v1, we could indeed have a GUC which enforces a restriction to
disallow group access, enabled by default.  As the commit fest is
running and we don't have a clear picture yet, I am afraid that it may
be better to move that to v12 and focus on getting patches 1 and 2
committed.  This will provide a good base for the next move.

There are three places where things are still not correct:

-   if (chmod(location, S_IRWXU) != 0)
+   current_umask = umask(0);
+   umask(current_umask);
+
+   if (chmod(location, PG_DIR_MODE_DEFAULT & ~current_umask) != 0)
This is in tablespace.c.

@@ -185,6 +186,9 @@ main(int argc, char **argv)
exit(1);
}

+   /* Set dir/file mode mask */
+   umask(PG_MODE_MASK_DEFAULT);
+
In pg_rewind and pg_resetwal, isn't that also a portion which is not
necessary without the group access feature?

This is all I have basically for patch 2, which would be good for
shipping.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] proposal: schema variables

2018-03-12 Thread Pavel Stehule
2018-03-12 7:49 GMT+01:00 Pavel Luzanov :

> Hi,
>
> I plan to make usability and feature test review in several days.
>
> Is there any chances that it will work on replicas?
> Such possibility is very helpful in generating reports.
> Now, LET command produces an error:
>
> ERROR:  cannot execute LET in a read-only transaction
>
>

> But if we say that variables are non-transactional ?
>

Sure, it should work. Right now I am trying to solve issues at the concept
level - the LET code is based on the DML code base, so probably there is a
check for read-write transactions there. But it is useless for the LET command.

Regards

Pavel


>
> -
> Pavel Luzanov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
> On 08.03.2018 21:00, Pavel Stehule wrote:
>
> Hi
>
> 2018-02-07 7:34 GMT+01:00 Pavel Stehule :
>
>> Hi
>>
>> updated patch with your changes in documentation and pg_dump (initial)
>> support
>>
>> Main issue of this patch is storage. We can reuse local buffers used for
>> temp tables. But it does allocation by 8KB and it creates temp files for
>> every object. That is too big overhead. Storing just in session memory is
>> too simple - then there should be lot of new code used, when variable will
>> be dropped.
>>
>> I have ideas how to allow work with mix of scalar and composite types -
>> so it will be next step of this prototype.
>>
>> Regards
>>
>> Pavel
>>
>
> new update - rebased, + some initial support for composite values on right
> side and custom types, arrays are supported too.
>
> omega=# CREATE VARIABLE xx AS (a int, b numeric);
> CREATE VARIABLE
> omega=# LET xx = (10, 20)::xx;
> LET
> omega=# SELECT xx;
> +---------+
> |   xx    |
> +---------+
> | (10,20) |
> +---------+
> (1 row)
>
> omega=# SELECT xx.a + xx.b;
> +----------+
> | ?column? |
> +----------+
> |       30 |
> +----------+
> (1 row)
>
> omega=# \d xx
> schema variable "public.xx"
> +--------+---------+
> | Column |  Type   |
> +--------+---------+
> | a      | integer |
> | b      | numeric |
> +--------+---------+
>
>
> Regards
>
> Pavel
>
>
>
>


Re: [HACKERS] proposal: schema variables

2018-03-12 Thread Pavel Luzanov

Hi,

I plan to make usability and feature test review in several days.

Is there any chances that it will work on replicas?
Such possibility is very helpful in generating reports.
Now, LET command produces an error:

ERROR:  cannot execute LET in a read-only transaction

But if we say that variables are non-transactional ?

-
Pavel Luzanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

On 08.03.2018 21:00, Pavel Stehule wrote:

Hi

2018-02-07 7:34 GMT+01:00 Pavel Stehule:


Hi

updated patch with your changes in documentation and pg_dump
(initial) support

Main issue of this patch is storage. We can reuse local buffers
used for temp tables. But it does allocation by 8KB and it creates
temp files for every object. That is too big overhead. Storing
just in session memory is too simple - then there should be lot of
new code used, when variable will be dropped.

I have ideas how to allow work with mix of scalar and composite
types - so it will be next step of this prototype.

Regards

Pavel


new update - rebased, + some initial support for composite values on 
right side and custom types, arrays are supported too.


omega=# CREATE VARIABLE xx AS (a int, b numeric);
CREATE VARIABLE
omega=# LET xx = (10, 20)::xx;
LET
omega=# SELECT xx;
+---------+
|   xx    |
+---------+
| (10,20) |
+---------+
(1 row)

omega=# SELECT xx.a + xx.b;
+----------+
| ?column? |
+----------+
|       30 |
+----------+
(1 row)

omega=# \d xx
schema variable "public.xx"
+--------+---------+
| Column |  Type   |
+--------+---------+
| a      | integer |
| b      | numeric |
+--------+---------+


Regards

Pavel






Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

2018-03-12 Thread Andrey Borodin


> On 12 March 2018, at 1:54, Alexander Korotkov wrote:
> 
> On Wed, Mar 7, 2018 at 8:30 PM, Alvaro Herrera  
> wrote:
> I suggest to create a new function GinPredicateLockPage() that checks
> whether fast update is enabled for the index.  The current arrangement
> looks too repetitive and it seems easy to make a mistake.
> 
> BTW, should we also skip CheckForSerializableConflictIn() when
> fast update is enabled?  AFAICS, now it doesn't cause any errors or
> false positives, but makes useless load.  Is it correct?
> 
BTW to BTW: I think we should check the pending list size with
GinGetPendingListCleanupSize() here:
+
+   /*
+* If fast update is enabled, we acquire a predicate lock on the entire
+* relation as fast update postpones the insertion of tuples into index
+* structure due to which we can't detect rw conflicts.
+*/
+   if (GinGetUseFastUpdate(ginstate->index))
+   PredicateLockRelation(ginstate->index, snapshot);

Because we can ALTER INDEX ... SET (fastupdate = off), but there will still
be a pending list. For example:
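
ALTER INDEX some_gin_idx SET (fastupdate = off);   -- index name assumed
-- This does not flush an already-populated pending list; its entries stay
-- there until VACUUM or gin_clean_pending_list() processes them, so a
-- check of GinGetUseFastUpdate() alone can miss entries that still need
-- the relation-level predicate lock.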

We discussed this with Shubham back in July and chose some approach that
seemed better, but I can't remember what it was...

Best regards, Andrey Borodin.


Re: Inconsistent behavior in serializable snapshot

2018-03-12 Thread Kuntal Ghosh
On Sun, Mar 11, 2018 at 7:52 PM, Kuntal Ghosh
 wrote:
> Hello hackers,
>
> While working on serializable transaction isolation, I've noticed some
> strange behavior in the first permutation mentioned in
> isolation/specs/read-only-anomaly-2.spec file.
>
> setup
> {
> CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
> INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
> }
>
> # without s3, s1 and s2 commit
> permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s2wx" "s2c" "s3c"
>
> Here, we can see a serial order T1 <- T2 without any conflict.
> However, if I perform "VACUUM FREEZE bank_account" after the setup
> step, s2wx throws a conflict error:
> ERROR:  could not serialize access due to read/write dependencies
> among transactions
> DETAIL:  Reason code: Canceled on identification as a pivot, during write.
> HINT:  The transaction might succeed if retried.
>
> Is this an expected behavior?
Got the answer.

In this case, when we perform VACUUM FREEZE on the table, the planner
chooses a plan with a sequential scan (instead of an Index Scan) for the
SELECT and UPDATE statements. Hence, we have to take a relation-level
SIReadLock instead of page- and tuple-level SIReadLocks. This
causes the serialization conflict.
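
A quick way to see this with the spec's setup (the plan flip described in
the comments is the expected behavior, not verified output):

EXPLAIN (COSTS OFF) SELECT balance FROM bank_account WHERE id = 'X';
VACUUM FREEZE bank_account;
EXPLAIN (COSTS OFF) SELECT balance FROM bank_account WHERE id = 'X';
-- Once the plan becomes "Seq Scan on bank_account", the reads take a
-- relation-level SIReadLock, which is what produces the rw-conflict
-- between s1 and s2 in the first permutation.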


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

2018-03-12 Thread amul sul
On Sat, Mar 10, 2018 at 5:25 PM, Amit Kapila  wrote:
> On Fri, Mar 9, 2018 at 3:18 PM, amul sul  wrote:
>> On Thu, Mar 8, 2018 at 12:31 PM, Amit Kapila  wrote:
>>> On Thu, Mar 8, 2018 at 11:04 AM, Pavan Deolasee
>>>
 This is just one example. I am almost certain there are many such cases 
 that
 will require careful attention.

>>>
>>> Right, I think we should be able to detect and fix such cases.
>>>
>>
>> I found a couple of places (in heap_lock_updated_tuple, rewrite_heap_tuple,
>> EvalPlanQualFetch & heap_lock_updated_tuple_rec) where ItemPointerEquals is
>> used to check whether a tuple has been updated/deleted.  With the proposed
>> patch ItemPointerEquals() will no longer work as before; we require an
>> additional check for updated/deleted tuples, proposed in the latest
>> patch[1]. Do let me know your thoughts/suggestions on this, thanks.
>>
>
> I think you have identified the places correctly.  I have few
> suggestions though.
>
> 1.
> - if (!ItemPointerEquals(>t_self, ctid))
> + if (!(ItemPointerEquals(>t_self, ctid) ||
> +   (!ItemPointerValidBlockNumber(ctid) &&
> +(ItemPointerGetOffsetNumber(>t_self) ==   /* TODO: Condn.
> should be macro? */
> + ItemPointerGetOffsetNumber(ctid)
>
> Can't we write this and similar tests as:
> ItemPointerValidBlockNumber(ctid) &&
> !ItemPointerEquals(>t_self, ctid)?  It will be bit simpler to
> understand and serve the purpose.
>

Yes, you are correct; we need not worry about offset matching. The invalid
block number check plus ItemPointerEquals is more than enough to conclude
whether the tuple has been deleted. I will change the condition accordingly
in the next version.

> 2.
>
> if (mytup.t_data->t_infomask & HEAP_XMAX_INVALID ||
>   ItemPointerEquals(_self, _data->t_ctid) ||
> + !HeapTupleHeaderValidBlockNumber(mytup.t_data) ||
>   HeapTupleHeaderIsOnlyLocked(mytup.t_data))
>
> I think it is better to keep the check for
> HeapTupleHeaderValidBlockNumber earlier than ItemPointerEquals as it
> will first validate if block number is valid and then only compare the
> complete CTID.

Sure, will do that.

Thanks for the confirmation and suggestions.

Regards,
Amul