Re: GSOC 2018 Proposal review

2018-03-26 Thread Garima Natani
Hi Mark,

Thanks a lot for your feedback on improving my proposal.
I have updated the timeline as per the changes you suggested. Could you
please review it again?
Below is the link to the shared Google Doc:
https://docs.google.com/document/d/1W2N8zRXvylj5cxW8aKHvPDKFgj0Qk59UIJr9SPwFsa8/edit#


Regards,
Garima Natani
Mtech, First year
Geoinformatics and Natural Resources Engineering
Centre of Studies in Resources Engineering
Indian Institute of Technology, Bombay.
Have a nice day!!





On Tue, Mar 27, 2018 at 7:34 AM, Mark Wong  wrote:

> Hi Garima,
>
> On Tue, Mar 20, 2018 at 10:36:05PM +0530, Garima Natani wrote:
> > Hi All,
> >
> > I am looking forward to working with PostgreSQL organization In GSoC
> 2018.
>
> Glad to see you're interested in this project!
>
> > I have created Proposal for "Develop Performance Farm Database and
> Website"
> > Project. Please,
> >
> > Please, can you review it send comments?
>
> I think the general idea looks ok.  The biggest item I'd adjust is that
> the performance farm plants (to distinguish them from the build farm
> animals) shouldn't connect directly to the database.  Some of the django
> web app work needs to be moved up into phase one, because I'd propose
> the web app should also have an API that the build farm plants should
> use to transfer results.
>
> Good luck!
>
> Regards,
> Mark
>
> --
> Mark Wong    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services
>


Re: PQHost() undefined behavior if connecting string contains both host and hostaddr types

2018-03-26 Thread Haribabu Kommi
On Tue, Mar 27, 2018 at 3:03 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Mon, Mar 26, 2018 at 8:24 PM, Michael Paquier 
> wrote:
>
>> On Tue, Mar 27, 2018 at 11:43:27AM +1100, Haribabu Kommi wrote:
>> > Patch attached with the above behavior along with other comments from
>> > upthread.
>>
>> Thanks for the updated version.
>>
>> The function changes look logically good to me.
>>
>> +  
>> +   The PQhost function returns NULL when the
>> +   input conn parameter is NULL or an empty string if conn cannot be
>> evaluated.
>> +   Applications of this function must carefully evaluate the return
>> value.
>> +  
>>
>> - "Applications of this function must carefully evaluate the return
>> value" is rather vague, so I would append to the end "depending on the
>> state of the connection involved."
>> The same applies to PQport() for consistency.
>>
>> Perhaps the documentation should mention as well that making the
>> difference between those different values is particularly relevant when
>> using multiple-value strings?  I would rather add one paragraph on the
>> matter than nothing.  I really think that we have been lacking clarity
>> in the documentation for those APIs for too long, and people always
>> argue about what they should do.  If we have a base documented, then we
>> can more easily argue for the future as well, and things are clear to
>> the user.
>>
>
> "depending on the state of the connection" doesn't move the goal-posts
> that far though...and "Applications of this function" would be better
> written as "Callers of this function" if left in place.
>
> In any case something like the following framework seems more useful to
> the reader who doesn't want to scan the source code for the PQhost/PQport
> functions.
>
> The PQhost function returns NULL when the input conn parameter is NULL or
> an empty string if conn cannot be evaluated.  Otherwise, the return value
> depends on the state of the conn: specifically (translate code to
> documentation here).  Furthermore, if both host and hostaddr properties
> exist on conn the return value will contain only the host.
>
> I'm undecided on the need for a  element but would lean against it
> in favor of the above, slightly longer, paragraph.
>
> And yes, while I'm not sure right now what the multi-value condition logic
> results in it should be mentioned - at least if the goal of the docs is to
> be a sufficient resource for using these functions.  In particular what
> happens when the connection is inactive (unless that falls under "cannot be
> evaluated"...).
>

Thanks for the review.

Updated patch attached, with additional doc updates as per the suggestions
from upthread.


Regards,
Hari Babu
Fujitsu Australia


PQhost-to-return-active-connected-host-and-hostaddr_v7.patch
Description: Binary data


Re: [HACKERS] why not parallel seq scan for slow functions

2018-03-26 Thread Amit Kapila
On Tue, Mar 27, 2018 at 3:08 AM, Robert Haas  wrote:
> On Sat, Mar 24, 2018 at 9:40 AM, Amit Kapila  wrote:
>> For me, it is equivalent to the master.  The average of ten runs on
>> the master is 20664.3683 and with all the patches applied it is
>> 20590.4734.  I think there is some run-to-run variation, but more or
>> less there is no visible degradation.  I think we have found the root
>> cause and eliminated it.  OTOH, I have found another case where new
>> patch series seems to degrade.
>
> All right, I have scaled my ambitions back further.  Here is a revised
> and slimmed-down version of the patch series.
>

It still didn't help much.  I am seeing a similar regression in one
of the tests [1] posted previously.

>  If we forget about
> "Remove explicit path construction logic in create_ordered_paths" for
> now, then we don't really need a new upperrel.  So this patch just
> modifies the toplevel scan/join rel in place, which should avoid a
> bunch of overhead in add_path() and other places, while hopefully
> still fixing the originally-reported problem.
>

If we don't want to go with the upperrel logic, then maybe we should
consider just merging some of the other changes from my previous patch
into the 0003* patch you have posted and then see if it gets rid of all
the cases where we are seeing a regression with this new approach.  I
think that with this approach you want to target the problem of
partitionwise aggregate, but maybe we can deal with it in a separate
patch.

[1] -
Test case
--
DO $$
DECLARE count integer;
BEGIN
For count In 1..100 Loop
Execute 'explain Select count(ten) from tenk1';
END LOOP;
END;
$$;

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-26 Thread Amit Langote
On 2018/03/26 23:20, Alvaro Herrera wrote:
> Pushed now.

Thank you!

> Amit Langote wrote:
>> On 2018/03/24 9:23, Alvaro Herrera wrote:
> 
>>> To fix this, I had to completely rework the "get partition parent root"
>>> stuff into "get list of ancestors of this partition".
>>
>> I wondered if a is_partition_ancestor(partrelid, ancestorid) isn't enough
>> instead of creating a list of ancestors and then looping over it as you've
>> done, but maybe what you have here is fine.
> 
> Yeah, I wondered about doing it that way too (since you can stop looking
> early), but decided that I didn't like repeatedly opening pg_inherits
> for each level.  Anyway the most common case is a single level, and in
> rare cases two levels ... I don't think we're going to see much more
> than that.  So it doesn't matter too much.  We can refine later anyway,
> if this becomes a hot spot (I doubt it TBH).

Yes, I suppose.

>>> * General code style improvements, comment rewording, etc.
>>
>> There was one comment in Fujita-san's review he posted on Friday [1] that
>> doesn't seem to be addressed in v10, which I think we probably should.  It
>> was this comment:
>>
>> "ExecBuildProjectionInfo is called without setting the tuple descriptor of
>> mtstate->mt_conflproj to tupDesc.  That might work at least for now, but I
>> think it's a good thing to set it appropriately to make that future proof."
>>
>> All of his other comments seem to have been taken care of in v10.  I have
>> fixed the above one in the attached updated version.
> 
> I was of two minds about this item myself; we don't use the tupdesc for
> anything at that point and I expect more things would break if we
> required that.  But I don't think it hurts, so I kept it.
> 
> The one thing I wasn't terribly in love with is the four calls to
> map_partition_varattnos(), creating the attribute map four times ... but
> we already have it in the TupleConversionMap, no?  Looks like we could
> save a bunch of work there.

Hmm, actually we can't use that map, assuming you're talking about the
following map:

  TupleConversionMap *map = proute->parent_child_tupconv_maps[partidx];

We can only use that to tell if we need converting expressions (as we
currently do), but it cannot be used to actually convert the expressions.
The map in question is for use by do_convert_tuple(), not to map varattnos
in Vars using map_variable_attnos().

But it's definitely a bit undesirable to have various
map_partition_varattnos() calls within ExecInitPartitionInfo() to
initialize the same information (the map) multiple times.

> And a final item is: can we have a whole-row expression in the clauses?
> We currently don't handle those either, not even to throw an error.
> [figures a test case] ... and now that I test it, it does crash!
> 
> create table part (a int primary key, b text) partition by range (a);
> create table part1 (b text, a int not null);
> alter table part attach partition part1 for values from (1) to (1000);
> insert into part values (1, 'two') on conflict (a)
>   do update set b = format('%s (was %s)', excluded.b, part.b)
>   where part.* *<> (1, text 'two');
> 
> I think this means we should definitely handle found_whole_row.  (If you
> create part1 in the normal way, it works as you'd expect.)

I agree.  That means we simply remove the Assert after the
map_partition_varattnos call.

> I'm going to close a few other things first, then come back to this.

Attached find a patch to fix the whole-row expression issue.  I added your
test to insert_conflict.sql.

Thanks,
Amit
diff --git a/src/backend/executor/execPartition.c 
b/src/backend/executor/execPartition.c
index 9a13188649..f1a972e235 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -557,7 +557,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 		{
 			List	   *onconflset;
 			TupleDesc	tupDesc;
-			bool		found_whole_row;
 
 			leaf_part_rri->ri_onConflict = makeNode(OnConflictSetState);
 
@@ -571,12 +570,10 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 			onconflset = (List *) copyObject((Node *) node->onConflictSet);
 			onconflset =
 				map_partition_varattnos(onconflset, INNER_VAR, partrel,
-										firstResultRel, &found_whole_row);
-			Assert(!found_whole_row);
+										firstResultRel, NULL);
 			onconflset =
 				map_partition_varattnos(onconflset, firstVarno, partrel,
-										firstResultRel, 

Re: idea - custom menu

2018-03-26 Thread Pavel Stehule
2018-03-27 2:44 GMT+02:00 Nikolay Samokhvalov :

> On Sat, Mar 3, 2018 at 1:08 PM, Pavel Stehule 
> wrote:
>
>> Hi
>>
>
> Hi Pavel, I don't know how I missed this email, but finally I found it :-)
>
>
>> I am looking on project https://github.com/NikolayS/postgres_dba
>>
>> Nice idea, and perfect publicity of typical PostgreSQL maintenance task.
>>
>
> Thank you! It's in very early stage, I'm going to polish existing reports
> and add more soon.
>
>
>>
>> We can allow to call some external applications on some key press (not
>> command) and evaluate their output. mc has F2 for User menu. It can work
>> well with pdmenu or something similar. A simple menu can be done in a few
>> lines with ncurses.
>>
>
>> It can be substitution of some custom commands.
>>
>> Comments, notes?
>>
>
> This is a great idea. Adding more reports means that menu will be very
> long. I considered having two layers of menu, but it will become less
> convenient. First of all because of necessity to press  after typing
> each command. If user could just press a single button combination, it
> would make working with such thing much more convenient. I would also bind
> some key to allow repeating the last report without going to the main menu
> at all. Had a look at pdmenu – well, if it could be possible to use it
> for postgres_dba, it would be awesome. Does it work with MacOS, by the
> way?
>
> Also (different story), all new Postgres/psql users hate its output for
> wide result sets. Making your pspg more "available" for end users would
> help a lot here – is it possible to ship it with psql itself, at least as
> an optional feature? I don't know anyone who likes default output of wide
> tables in psql. The pspg is very underestimated.
>
> And one more thing (also a different story completely): wouldn't it be
> good to have ability to connect psql to remote Postgres over ssh tunnel
> just with some option in psql? Currently, users need to setup an SSH
> tunnel with "ssh" command first, and only then connect psql, but I see
> that for new users, it's difficult to manage such thing, some of them end
> up opening Postgres to the world (it was mentioned recently in some
> article that at least 700k Postgres instances were found with port 5432
> available for the world, so ssh tunneling option would be great in psql).
>
> And finally: several people asked me to use colors in reports, but it's
> not possible currently in psql – I can use color output for "\echo"
> command now, but I cannot do anything like that for the server's response.
> For instance, it would be nice to make some numbers colored/bold if they
> are above/below some threshold value, for specific column in the server's
> output. It could be \filter command specifying the rules of the output's
> post-processing. Can ncurses be useful for such thing?
>

ncurses can work with color well - what is difficult is the parametrization
of the psql report builder, or of some pager that can do this work - the
limits will probably be different, ...

https://github.com/okbob/pspg

Regards

Pavel


>
> Regards,
> Nikolay
>


Re: PQHost() undefined behavior if connecting string contains both host and hostaddr types

2018-03-26 Thread David G. Johnston
On Mon, Mar 26, 2018 at 8:24 PM, Michael Paquier 
wrote:

> On Tue, Mar 27, 2018 at 11:43:27AM +1100, Haribabu Kommi wrote:
> > Patch attached with the above behavior along with other comments from
> > upthread.
>
> Thanks for the updated version.
>
> The function changes look logically good to me.
>
> +  
> +   The PQhost function returns NULL when the
> +   input conn parameter is NULL or an empty string if conn cannot be
> evaluated.
> +   Applications of this function must carefully evaluate the return
> value.
> +  
>
> - "Applications of this function must carefully evaluate the return
> value" is rather vague, so I would append to the end "depending on the
> state of the connection involved."
> The same applies to PQport() for consistency.
>
> Perhaps the documentation should mention as well that making the
> difference between those different values is particularly relevant when
> using multiple-value strings?  I would rather add one paragraph on the
> matter than nothing.  I really think that we have been lacking clarity
> in the documentation for those APIs for too long, and people always
> argue about what they should do.  If we have a base documented, then we
> can more easily argue for the future as well, and things are clear to
> the user.
>

"depending on the state of the connection" doesn't move the goal-posts that
far though...and "Applications of this function" would be better written as
"Callers of this function" if left in place.

In any case something like the following framework seems more useful to the
reader who doesn't want to scan the source code for the PQhost/PQport
functions.

The PQhost function returns NULL when the input conn parameter is NULL or
an empty string if conn cannot be evaluated.  Otherwise, the return value
depends on the state of the conn: specifically (translate code to
documentation here).  Furthermore, if both host and hostaddr properties
exist on conn the return value will contain only the host.

I'm undecided on the need for a  element but would lean against it in
favor of the above, slightly longer, paragraph.

And yes, while I'm not sure right now what the multi-value condition logic
results in it should be mentioned - at least if the goal of the docs is to
be a sufficient resource for using these functions.  In particular what
happens when the connection is inactive (unless that falls under "cannot be
evaluated"...).

David J.


Re: [HACKERS] pg_serial early wraparound

2018-03-26 Thread Thomas Munro
On Tue, Mar 27, 2018 at 5:50 AM, Tom Lane  wrote:
> Thomas Munro  writes:
>> Rebased again, now with a commit message.  That assertion has since
>> been removed (commit ec99dd5a) so the attached test script can once
>> again be used to see the contents of pg_serial as the xid goes all the
>> way around, if you build with TEST_OLDSERXID defined so that
>> predicate.c forces information about xids out to pg_serial.
>
> Couple thoughts here ---

Thanks for looking at this!

> Seems like if the patch is correct as-is, then the OldSerXidPage
> macro could be simplified, as the modulo no longer does anything.

The patch already did that:

-#define OldSerXidPage(xid)	((((uint32) (xid)) / OLDSERXID_ENTRIESPERPAGE) % (OLDSERXID_MAX_PAGE + 1))
+#define OldSerXidPage(xid)	(((uint32) (xid)) / OLDSERXID_ENTRIESPERPAGE)

> Also, OldSerXidSegment doesn't seem to be used.

Right, thanks.  Removed.

> I'm a little worried because Anastasia couldn't repeat the test;
> why is that?

Hmm.  I'm not sure.  It works for me on a couple of machines and what I see is:

== setting next xid to 65536 =
...
Contents of pg_serial:
0002
== setting next xid to 1073741824 =
...
Contents of pg_serial:
8000
== setting next xid to 2147483648 =
...
Contents of pg_serial:
1
== setting next xid to 3221225472 =
...
Contents of pg_serial:
18000
== setting next xid to 65536 =
...
Contents of pg_serial:
0002
== setting next xid to 1073741824 =
...
Contents of pg_serial:
8000

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Remove-obsolete-SLRU-wrapping-and-warnings-from-p-v4.patch
Description: Binary data


Re: PQHost() undefined behavior if connecting string contains both host and hostaddr types

2018-03-26 Thread Michael Paquier
On Tue, Mar 27, 2018 at 11:43:27AM +1100, Haribabu Kommi wrote:
> Patch attached with the above behavior along with other comments from
> upthread.

Thanks for the updated version.

The function changes look logically good to me.

+  
+   The PQhost function returns NULL when the
+   input conn parameter is NULL or an empty string if conn cannot be 
evaluated.
+   Applications of this function must carefully evaluate the return value.
+  
+
+  
+   
+when both host and hostaddr
+parameters are specified in the connection string, the connection
+host parameter is returned.
+   
Some nitpicking about the documentation changes:
- NULL needs to use the markup .
- conn should use the markup .  As the docs describe a value
of the input parameter, we cannot talk about "the connection" either in
those cases.
- "Applications of this function must carefully evaluate the return
value" is rather vague, so I would append to the end "depending on the
state of the connection involved."
The same applies to PQport() for consistency.

Perhaps the documentation should mention as well that making the
difference between those different values is particularly relevant when
using multiple-value strings?  I would rather add one paragraph on the
matter than nothing.  I really think that we have been lacking clarity
in the documentation for those APIs for too long, and people always
argue about what they should do.  If we have a base documented, then we
can more easily argue for the future as well, and things are clear to
the user.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] A design for amcheck heapam verification

2018-03-26 Thread Peter Geoghegan
On Fri, Mar 23, 2018 at 7:13 AM, Andrey Borodin  wrote:
> I've just flipped patch to WoA. But if above issues will be fixed I think 
> that patch is ready for committer.

Attached is v7, which has the small tweaks that you suggested.

Thank you for the review. I hope that this can be committed shortly.

-- 
Peter Geoghegan
From ede1ba731dc818172a94adbb6331323c1f2b1170 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Thu, 24 Aug 2017 20:58:21 -0700
Subject: [PATCH v7 1/2] Add Bloom filter data structure implementation.

A Bloom filter is a space-efficient, probabilistic data structure that
can be used to test set membership.  Callers will sometimes incur false
positives, but never false negatives.  The rate of false positives is a
function of the total number of elements and the amount of memory
available for the Bloom filter.

Two classic applications of Bloom filters are cache filtering, and data
synchronization testing.  Any user of Bloom filters must accept the
possibility of false positives as a cost worth paying for the benefit in
space efficiency.

This commit adds a test harness extension module, test_bloomfilter.  It
can be used to get a sense of how the Bloom filter implementation
performs under varying conditions.
---
 src/backend/lib/Makefile   |   4 +-
 src/backend/lib/README |   2 +
 src/backend/lib/bloomfilter.c  | 304 +
 src/include/lib/bloomfilter.h  |  27 ++
 src/test/modules/Makefile  |   1 +
 src/test/modules/test_bloomfilter/.gitignore   |   4 +
 src/test/modules/test_bloomfilter/Makefile |  21 ++
 src/test/modules/test_bloomfilter/README   |  71 +
 .../test_bloomfilter/expected/test_bloomfilter.out |  25 ++
 .../test_bloomfilter/sql/test_bloomfilter.sql  |  22 ++
 .../test_bloomfilter/test_bloomfilter--1.0.sql |  10 +
 .../modules/test_bloomfilter/test_bloomfilter.c| 138 ++
 .../test_bloomfilter/test_bloomfilter.control  |   4 +
 src/tools/pgindent/typedefs.list   |   1 +
 14 files changed, 632 insertions(+), 2 deletions(-)
 create mode 100644 src/backend/lib/bloomfilter.c
 create mode 100644 src/include/lib/bloomfilter.h
 create mode 100644 src/test/modules/test_bloomfilter/.gitignore
 create mode 100644 src/test/modules/test_bloomfilter/Makefile
 create mode 100644 src/test/modules/test_bloomfilter/README
 create mode 100644 src/test/modules/test_bloomfilter/expected/test_bloomfilter.out
 create mode 100644 src/test/modules/test_bloomfilter/sql/test_bloomfilter.sql
 create mode 100644 src/test/modules/test_bloomfilter/test_bloomfilter--1.0.sql
 create mode 100644 src/test/modules/test_bloomfilter/test_bloomfilter.c
 create mode 100644 src/test/modules/test_bloomfilter/test_bloomfilter.control

diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile
index d1fefe43f2..191ea9bca2 100644
--- a/src/backend/lib/Makefile
+++ b/src/backend/lib/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/lib
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = binaryheap.o bipartite_match.o dshash.o hyperloglog.o ilist.o \
-	   knapsack.o pairingheap.o rbtree.o stringinfo.o
+OBJS = binaryheap.o bipartite_match.o bloomfilter.o dshash.o hyperloglog.o \
+   ilist.o knapsack.o pairingheap.o rbtree.o stringinfo.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/lib/README b/src/backend/lib/README
index 5e5ba5e437..376ae273a9 100644
--- a/src/backend/lib/README
+++ b/src/backend/lib/README
@@ -3,6 +3,8 @@ in the backend:
 
 binaryheap.c - a binary heap
 
+bloomfilter.c - probabilistic, space-efficient set membership testing
+
 hyperloglog.c - a streaming cardinality estimator
 
 pairingheap.c - a pairing heap
diff --git a/src/backend/lib/bloomfilter.c b/src/backend/lib/bloomfilter.c
new file mode 100644
index 00..dcf32c015c
--- /dev/null
+++ b/src/backend/lib/bloomfilter.c
@@ -0,0 +1,304 @@
+/*-
+ *
+ * bloomfilter.c
+ *		Space-efficient set membership testing
+ *
+ * A Bloom filter is a probabilistic data structure that is used to test an
+ * element's membership of a set.  False positives are possible, but false
+ * negatives are not; a test of membership of the set returns either "possibly
+ * in set" or "definitely not in set".  This can be very space efficient when
+ * individual elements are larger than a few bytes, because elements are hashed
+ * in order to set bits in the Bloom filter bitset.
+ *
+ * Elements can be added to the set, but not removed.  The more elements that
+ * are added, the larger the probability of false positives.  Caller must hint
+ * an estimated total size of the set when its Bloom filter is initialized.
+ * This is used to balance the use of memory against the final false positive
+ * rate.
+ *
+ * 

Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2018-03-26 Thread Chapman Flack
On 03/26/18 12:34, Tom Lane wrote:
> If that's the argument, why is the WALInsertLockUpdateInsertingAt(CurrPos)
> call still there?  GetXLogBuffer() would do that too.

"Because I hadn't noticed that," he said, sheepishly.

> In any case, the new comment ... fails to
> explain what's going on, and the reference to a function that is not
> actually called from the vicinity of the comment ...
> suggest something like "GetXLogBuffer() will fetch and initialize the
> next WAL page for us.  ... worth explaining how you know that the new
> page's header is short not long.

Here are patches responding to that (and also fixing the unintended
inclusion of .travis.yml).

What I have not done here is respond to Michael's objection, which
I haven't had time to think more about yet.

-Chap
>From 3606cccfd584654970f56e909798d0d163fa7e96 Mon Sep 17 00:00:00 2001
From: Chapman Flack 
Date: Mon, 26 Mar 2018 23:08:26 -0400
Subject: [PATCH 1/2] Zero headers of unused pages after WAL switch.

When writing zeroed pages to the remainder of a WAL segment
after a WAL switch, ensure that the headers of those pages are
also zeroed, as their initialized values otherwise reduce the
compressibility of the WAL segment file by general tools.
---
 src/backend/access/transam/xlog.c | 45 ---
 1 file changed, 42 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index cb9c2a2..cf4eaa1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1552,11 +1552,50 @@ CopyXLogRecordToWAL(int write_len, bool isLogSwitch, XLogRecData *rdata,
 		/* Use up all the remaining space on the first page */
 		CurrPos += freespace;
 
+		/*
+		 * Cause all remaining pages in the segment to be flushed, leaving the
+		 * XLog position where it should be at the start of the next segment. In
+		 * the process, the pages will be initialized and physically written to
+		 * the file. That costs a bit of I/O (compared to simply leaving the
+		 * rest of the file unwritten, as was once done), but has an advantage
+		 * that the tail of the file will contain predictable (ideally constant)
+		 * data, so that general-purpose compression tools perform well on it.
+		 * (If left unwritten, the tail contains whatever is left over from the
+		 * file's last use as an earlier segment, and may compress poorly.) The
+		 * I/O cost is of little concern because any period when WAL segments
+		 * are being switched out (for, e.g., checkpoint timeout reasons) before
+		 * they are filled is clearly a low-workload period.
+		 */
 		while (CurrPos < EndPos)
 		{
-			/* initialize the next page (if not initialized already) */
-			WALInsertLockUpdateInsertingAt(CurrPos);
-			AdvanceXLInsertBuffer(CurrPos, false);
+			/*
+			 * This loop only touches full pages that follow the last actually-
+			 * used data in the segment. It will never touch the first page of a
+			 * segment; we would not be here to switch out a segment to which
+			 * nothing at all had been written.
+			 *
+			 * The minimal actions to flush the page would be to call
+			 * WALInsertLockUpdateInsertingAt(CurrPos) followed by
+			 * AdvanceXLInsertBuffer(...). The page would be left initialized
+			 * mostly to zeros, except for the page header (always the short
+			 * variant, as this is never a segment's first page).
+			 *
+			 * The large vistas of zeros are good for compressibility, but
+			 * the short headers interrupting them every XLOG_BLCKSZ (and,
+			 * worse, with values that differ from page to page) are not. The
+			 * effect varies with compression tool, but bzip2 (which compressed
+			 * best among several common tools on scantly-filled WAL segments)
+			 * compresses about an order of magnitude worse if those headers
+			 * are left in place.
+			 *
+			 * Rather than complicating AdvanceXLInsertBuffer itself (which is
+			 * called in heavily-loaded circumstances as well as this lightly-
+			 * loaded one) with variant behavior, we just use GetXLogBuffer
+			 * (which itself calls the two methods we need) to get the pointer
+			 * and re-zero the short header.
+			 */
+			currpos = GetXLogBuffer(CurrPos);
+			MemSet(currpos, 0, SizeOfXLogShortPHD);
 			CurrPos += XLOG_BLCKSZ;
 		}
 	}
-- 
2.7.3

>From f9549ec41d84e60964887d4784abe4562b0bda0f Mon Sep 17 00:00:00 2001
From: Chapman Flack 
Date: Mon, 26 Mar 2018 23:09:04 -0400
Subject: [PATCH 2/2] Add test for ensuring WAL segment is zeroed out

Switch to a new WAL file and ensure the last block in the now
switched-away-from file is completely zeroed out.

This adds a mode to check_pg_config() where the match from the
regexp can be returned, not just the literal count of matches.
Only one match is returned (the first), but given the structure
of pg_config.h that's likely to be enough.
---
 src/test/perl/TestLib.pm | 18 --
 

Re: Parallel Aggregates for string_agg and array_agg

2018-03-26 Thread David Rowley
On 27 March 2018 at 13:45, David Rowley  wrote:
> On 27 March 2018 at 12:49, Tom Lane  wrote:
>> Oh, I thought of another thing that would need to get done, if we decide
>> to commit this.  array_agg_serialize/deserialize only work if the array
>> element type has send/receive functions.  The planner's logic to decide
>> whether partial aggregation is possible doesn't look any deeper than
>> whether the aggregate has serialize/deserialize, but I think we have to
>> teach it that if it's these particular serialize/deserialize functions,
>> it needs to check the array element type.  Otherwise we'll get runtime
>> failures if partial aggregation is attempted on array_agg().
>>
>> This would not matter for any core datatypes, since they all have
>> send/receive functions, but I imagine it's still pretty common for
>> extension types not to.
>>
>> For my money it'd be sufficient to hard-code the test like
>>
>> if ((aggserialfn == F_ARRAY_AGG_SERIALIZE ||
>>  aggdeserialfn == F_ARRAY_AGG_DESERIALIZE) &&
>> !array_element_type_has_send_and_recv(exprType(aggregate input)))
>
> I'd failed to consider this.
>
> Would it not be cleaner to just reject trans types without a
> send/receive function? Or do you think that would exclude other valid
> cases?

Seems I didn't mean "trans types". I should have said aggregate
function argument types.

The attached patch adds this check.
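
For reference, whether a given type has binary send/receive functions can be
seen from SQL by looking at pg_type; a "-" in typsend or typreceive means the
function is absent (a pseudo-type is used below purely as an example of a type
without binary I/O, substitute the type of interest):

SELECT typname, typsend, typreceive
FROM pg_type
WHERE typname = 'anyelement';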

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


no_parallel_agg_when_agg_args_dont_have_sendrecv.patch
Description: Binary data


Re: [PATCH] Add missing type conversion functions for PL/Python

2018-03-26 Thread Haozhou Wang
Thanks Nikita!

On Tue, Mar 27, 2018 at 12:07 AM, Nikita Glukhov 
wrote:

> On 26.03.2018 17:19, David Steele wrote:
>
> On 2/20/18 10:14 AM, Haozhou Wang wrote:
>>
>>> Thank you very much for your review!
>>>
>>> I attached a new patch with typo fixed.
>>>
>> I think it's a bit premature to mark this Ready for Committer after a
>> review consisting of a few typos.  Anthony only said that he started
>> looking at it so I've marked it Needs Review.
>>
>
> Hi.
>
> I also have looked at this patch and found some problems.
> Attached fixed 3th version of the patch:
>  * initialization of arg->u.scalar was moved into PLy_output_setup_func()
>  * added range checks for int16 and int32 types
>  * added subroutine PLyInt_AsLong() for correct handling OverflowError
> that can
>be thrown from PyInt_AsLong()
>  * casting from Python float to PostgreSQL numeric using
> PyFloat_AsDouble() was
>removed because it can return incorrect result for Python long and
>float8_numeric() uses float8 and numeric I/O functions
>  * fixed whitespace
>
>
> --
> Nikita Glukhov
>
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


-- 
Regards,
Haozhou


Fix for pg_stat_activity putting client hostaddr into appname field

2018-03-26 Thread Edmund Horner
I noticed when querying pg_stat_activity (in 10.1):

$ SELECT pid, application_name, client_hostname, backend_type FROM pg_stat_activity;

  pid  |     application_name     |    client_hostname    |         backend_type
-------+--------------------------+-----------------------+------------------------------
 10207 |                          |                       | autovacuum launcher
 10209 |                          |                       | logical replication launcher
 10211 | psql                     | localhost             | client backend
 10212 | DBeaver 4.3.4 - Main     | ms006593.met.co.nz    | client backend
 10213 | DBeaver 4.3.4 - Metadata | ms006593.met.co.nz    | client backend
 10219 | psql                     | at-ice-db01.met.co.nz | client backend
 10205 | ms006593.met.co.nz       |                       | background writer
 10204 | ms006593.met.co.nz       |                       | checkpointer
 10206 | at-ice-db01.met.co.nz    |                       | walwriter

I've tracked this down to bootstrap/pgstat.c.

We create shared memory segments BackendAppnameBuffer,
BackendClientHostnameBuffer, and BackendActivityBuffer with enough
room for MaxBackends strings each.  In each PgBackendStatus item, we
point st_appname, st_clienthostname, and st_activity_raw to the
appropriate offsets within these blocks.

But the stats array includes auxiliary processes, which means it has
NumBackendStatSlots items.  The pointers for the aux process strings
are out of range.  In the case of my query, the pointers for
st_appname in the aux processes happen to point into the
BackendClientHostnameBuffer segment.

This patch uses NumBackendStatSlots for the sizes of those segments,
rather than MaxBackends.

I considered whether aux processes really need those strings
(especially st_clienthostname), but decided it was more consistent
just to assume that they might.  (It's an extra 3 kB... if we want to
save that we can put a bunch of if statements in pgstat_bestart and
other places.)
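
For a quick look at which auxiliary processes occupy those extra slots, the
view itself can be grouped by backend type (the exact set of rows will vary
with the server's configuration):

SELECT backend_type, count(*)
FROM pg_stat_activity
GROUP BY backend_type
ORDER BY backend_type;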

The patch also allocates local memory for st_clienthostname in
pgstat_read_current_status.  These strings shouldn't change very
often, but it seems safer and more consistent to treat them as we
treat the other two string fields.  Without the string copy, I think a
client could disconnect and be replaced after the stats have been copied,
resulting in the new hostname appearing alongside the copied stats
fields from the old connection.

Cheers,
Edmund


pgstats_memory_sizes_v1.patch
Description: Binary data


Re: Parallel safety of binary_upgrade_create_empty_extension

2018-03-26 Thread Thomas Munro
On Tue, Mar 27, 2018 at 3:30 PM, Thomas Munro
 wrote:
> I hacked something up in Python

#   otool -tvV | \

In case anyone is interested in trying that, it should be "otool -tvV
[path to postgres executable compiled with -O0]" (meaning disassemble
it).  I was removing my home directory from the path before sending
and apparently got carried away...

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Parallel safety of binary_upgrade_create_empty_extension

2018-03-26 Thread Thomas Munro
On Tue, Mar 27, 2018 at 12:23 PM, Tom Lane  wrote:
> Querying for other functions marked 'r' leaves me with some other related
> doubts:
>
> 1. Why are the various flavors of pg_get_viewdef() marked 'r'?  Surely
> reading the catalogs is a thing parallel children are allowed to do.
> If there is a good reason for pg_get_viewdef() to be 'r', why doesn't
> the same reason apply to all the other ruleutils functions?
>
> 2. Why are the various thingy-to-xml functions marked 'r'?  Again it
> can't be because they read catalogs or data.  I can imagine that the
> reason for restricting cursor_to_xml is that the cursor might execute
> parallel-unsafe operations, but then why isn't it marked 'u'?
>
> 3. Isn't pg_import_system_collations() unsafe, for the same reason
> as binary_upgrade_create_empty_extension()?

Yeah.  I hacked something up in Python to analyse the C call graph and
look for non-PARALLEL SAFE functions written in C that can reach
AssignTransactionId.  Attached, for interest.  Not a great approach,
because current_schema, fetch_search_path, SPI_XXX and a couple of
others all lead there, creating many possible false positives (though
who knows).  If I filter those out I'm left with the ones already
mentioned (pg_import_system_collations,
binary_upgrade_create_empty_extension) plus two others:

1.  unique_key_recheck, not user callable anyway.
2.  brin_summarize_range is marked 's'.  Seems wrong.
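
For anyone wanting to double-check the current markings ('s' = safe,
'r' = restricted, 'u' = unsafe), a query along these lines against the
catalogs shows them directly:

SELECT proname, proparallel
FROM pg_proc
WHERE proname IN ('pg_import_system_collations',
                  'binary_upgrade_create_empty_extension',
                  'brin_summarize_range');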

-- 
Thomas Munro
http://www.enterprisedb.com
#!/usr/bin/env python
#
# Do any functions declared PARALLEL SAFE or PARALLEL RESTRICTED have a call
# graph that could apparently reach an unsafe function?
#
# Obtain a list of pg_proc functions that are declared 'r' or 's':
#
#   psql postgres -t -c "select prosrc from pg_proc where proparallel != 'u' and prolang = 12" > pg_functions.data
#
# Obtain a list of edges in the function call graph (macOS):
#
#   otool -tvV | \
#   awk '/^_[^:]*:/ { caller = substr($1,2); }
#/\tcallq\t/ { callee = substr($3,2);
#  printf("%s%s\n", caller, callee); }' > call_graph.data

from networkx import DiGraph
from networkx.algorithms.shortest_paths.generic import has_path
from networkx.algorithms.shortest_paths.generic import shortest_path
import re

unsafe_functions = ["AssignTransactionId"]

# prune certain subgraphs from the graph because they are false positives
ignore_functions = ["errfinish", "LockAcquire", "ExecEvalNextValueExpr", "ExecInitExpr", "SPI_execute_plan", "_SPI_execute_plan", "SPI_cursor_fetch"]

pg_functions = []
with open("pg_functions.data", "r") as data:
  for line in data:
    pg_functions.append(line.strip())

call_graph = DiGraph()
with open("call_graph.data", "r") as data:
  for line in data:
    caller, callee = line.strip().split(":")
    if callee not in ignore_functions:
      call_graph.add_node(caller)
      call_graph.add_node(callee)
      call_graph.add_edge(caller, callee)

for pg_function in pg_functions:
  if call_graph.has_node(pg_function):
    for unsafe_function in unsafe_functions:
      if has_path(call_graph, pg_function, unsafe_function):
        print "There is a path from %s to %s: %s" % (pg_function, unsafe_function, "->".join(shortest_path(call_graph, pg_function, unsafe_function)))


Re: ppc64le support in 9.3 branch?

2018-03-26 Thread Mark Wong
On Tue, Mar 27, 2018 at 11:09:37AM +1030, Andrew Dunstan wrote:
> On Tue, Mar 27, 2018 at 10:57 AM, Alvaro Herrera
>  wrote:
> > Andrew Dunstan wrote:
> >
> >> I am in discussions with Mark, he's going to disable the animals from
> >> building 9.3. (by setting branches_to_build to 'HEAD_PLUS_LATEST4'
> >> instead of 'ALL').
> >
> > So once we desupport 9.3, they'll stop building 9.4? :-)
> >
> 
> 
> Then we'll set it back to ALL :-)

I'll put it on my calendar. :)

Regards,
Mark

-- 
Mark Wong    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services



Re: Using base backup exclusion filters to reduce data transferred with pg_rewind

2018-03-26 Thread Michael Paquier
On Sat, Mar 24, 2018 at 09:12:09PM -0400, Stephen Frost wrote:
> Yeah, neither 2 or 3 really appeals to me.  Option 1 does touch a number
> of places but in a pretty straight-forward way- and if there's a typo
> there, the compiler is likely to complain, so it seems like the risk is
> relatively low.

One example of a place which can be easily consolidated is pg_wal, whose
definition is in xlog_internal.h.  And there are a couple of other
places which could be consolidated without major refactoring, like what I
proposed initially on this thread.  I would suggest focusing on this
effort in a separate thread later on.
--
Michael


signature.asc
Description: PGP signature


Re: GSOC 2018 Proposal review

2018-03-26 Thread Mark Wong
Hi Garima,

On Tue, Mar 20, 2018 at 10:36:05PM +0530, Garima Natani wrote:
> Hi All,
> 
> I am looking forward to working with PostgreSQL organization In GSoC 2018.

Glad to see you're interested in this project!

> I have created Proposal for "Develop Performance Farm Database and Website"
> Project. Please,
> 
> Please, can you review it send comments?

I think the general idea looks ok.  The biggest item I'd adjust is that
the performance farm plants (to distinguish them from the build farm
animals) shouldn't connect directly to the database.  Some of the django
web app work needs to be moved up into phase one, because I'd propose
the web app should also have an API that the build farm plants should
use to transfer results.

Good luck!

Regards,
Mark

-- 
Mark Wong    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services



Re: Using base backup exclusion filters to reduce data transferred with pg_rewind

2018-03-26 Thread Michael Paquier
On Tue, Mar 27, 2018 at 01:32:41AM +0900, Fujii Masao wrote:
> +1. It's better for us to focus on the code change of the fillter on pg_rewind
> rather than such "refactoring".

(filter takes one 'l', not two)

Okay.  I had my mind mostly focused on how to shape the exclusion list
and get it shared between the base backup and pg_rewind, so let's move
on with the duplicated list for now.  I did not put much effort into
the pg_rewind portion, to be honest.

> As I told upthread, the previous patch has the
> problem where the files which should be skipped are not skipped. ISTM that,
> in pg_rewind, the filter should be triggered in recurse_dir() not
> process_source_file().

If you put that into recurse_dir you completely ignore the case where
changes are fetched by libpq.  Doing the filtering when processing the
file map has the advantage of taking care of both the local and remote
cases, which is why I am doing it there.  So you would just get half of
the cake and not the whole of it.

> BTW what should pg_rewind do when it finds the directory which should be
> skipped, in the source directory? In your patch, pg_rewind just tries to skip
> that directory at all. But isn't this right behavior? If that directory 
> doesn't
> exist in the target directory (though I'm not sure if this situation is 
> really 
> possible), I'm thinking that pg_rewind should create that "empty" directory
> in the target. No?

I am not exactly sure what you are getting at here.  The target
server should have the same basic directory mapping as the source, since
the target has been initialized normally with initdb or a base backup
from another node, so checking for the *contents* of directories is
enough and keeps the code more simple, as the exclude filter entries are
based on elements inherent to PostgreSQL internals.  Please note as well
that if a non-system directory is present on the source but not the
target then it would get created on the target.

At the end I have finished with the attached.  I have decided not to
include xlog.h in pg_rewind either, to avoid having to drag in a lot of
backend-only headers the way pg_resetwal does; the cost of that choice is
only hardcoding values for "backup_label" and "tablespace_map".
This applies filters based on directory contents, so by running the
regression tests you can see entries like the following ones:
entry "postmaster.opts" excluded from source file list
entry "pg_subtrans/" excluded from source file list
entry "pg_notify/" excluded from source file list
entry "base/12360/pg_internal.init" excluded from source file list
entry "backup_label.old" excluded from source file list
entry "global/pg_internal.init" excluded from source file list
entry "postmaster.opts" excluded from target file list
entry "pg_subtrans/" excluded from target file list
entry "pg_notify/" excluded from target file list
entry "base/12360/pg_internal.init" excluded from target file list
entry "global/pg_internal.init" excluded from target file list

Processing the filemap list on the target also matters in my opinion.
At recovery, all the previous files will be wiped out, and we should not
remove things like postmaster.pid either, as those are around to prevent
corruption problems.

Thanks,
--
Michael
From ba4502c9711c7b7933a5ef2186e895c0ae6b4616 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 5 Feb 2018 15:48:35 +0900
Subject: [PATCH] Add exclude list similar to base backups in pg_rewind

After being rewound, a standby to-be-recycled needs to perform recovery
from the last checkpoint where WAL forked after a promotion, which leads
it to automatically remove some files which may have been copied from
the source cluster. This makes use of the same filtering list as base
backups to find out what is this data and then remove it. This reduces
the amount of data transferred during a rewind without changing the
usefulness of the operation.

Documentation is updated to take into account what is filtered out.
---
 doc/src/sgml/ref/pg_rewind.sgml  |  14 +++-
 src/backend/replication/basebackup.c |   3 +
 src/bin/pg_rewind/filemap.c  | 144 ---
 3 files changed, 148 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 8e49249826..520d843f0e 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -231,7 +231,19 @@ PostgreSQL documentation
  
   Copy all other files such as pg_xact and
   configuration files from the source cluster to the target cluster
-  (everything except the relation files).
+  (everything except the relation files). Similarly to base backups,
+  the contents of the directories pg_dynshmem/,
+  pg_notify/, pg_replslot/,
+  pg_serial/, pg_snapshots/,
+  pg_stat_tmp/, and
+  pg_subtrans/ are omitted from the data copied
+  from the source cluster. Any file or 

Re: Parallel Aggregates for string_agg and array_agg

2018-03-26 Thread David Rowley
On 27 March 2018 at 12:49, Tom Lane  wrote:
> I wrote:
>> The main thing that remains undone is to get some test coverage ---
>> AFAICS, none of these new functions get exercised in the standard
>> regression tests.
>
> Oh, I thought of another thing that would need to get done, if we decide
> to commit this.  array_agg_serialize/deserialize only work if the array
> element type has send/receive functions.  The planner's logic to decide
> whether partial aggregation is possible doesn't look any deeper than
> whether the aggregate has serialize/deserialize, but I think we have to
> teach it that if it's these particular serialize/deserialize functions,
> it needs to check the array element type.  Otherwise we'll get runtime
> failures if partial aggregation is attempted on array_agg().
>
> This would not matter for any core datatypes, since they all have
> send/receive functions, but I imagine it's still pretty common for
> extension types not to.
>
> For my money it'd be sufficient to hard-code the test like
>
> if ((aggserialfn == F_ARRAY_AGG_SERIALIZE ||
>  aggdeserialfn == F_ARRAY_AGG_DESERIALIZE) &&
> !array_element_type_has_send_and_recv(exprType(aggregate input)))

I'd failed to consider this.

Would it not be cleaner to just reject trans types without a
send/receive function? Or do you think that would exclude other valid
cases?


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: idea - custom menu

2018-03-26 Thread Nikolay Samokhvalov
On Sat, Mar 3, 2018 at 1:08 PM, Pavel Stehule 
wrote:

> Hi
>

Hi Pavel, I don't know how I missed this email, but finally I found it :-)


> I am looking on project https://github.com/NikolayS/postgres_dba
>
> Nice idea, and perfect publicity of typical PostgreSQL maintenance task.
>

Thank you! It's in very early stage, I'm going to polish existing reports
and add more soon.


>
> We can allow to call some external applications on some key press (not
> command) and evaluate their output. mc has F2 for User menu. It can work
> well with pdmenu or something similar. A simple menu can be done in a few
> lines with ncurses.
>

> It can be substitution of some custom commands.
>
> Comments, notes?
>

This is a great idea. Adding more reports means that menu will be very
long. I considered having two layers of menu, but it will become less
convenient. First of all because of the necessity to press Enter after typing
each command. If user could just press a single button combination, it
would make working with such thing much more convenient. I would also bind
some key to allow repeating the last report without going to the main menu
at all. Had a look at pdmenu – well, if it could be possible to use it for
postgres_dba, it would be awesome. Does it work with MacOS, by the way?

Also (different story), all new Postgres/psql users hate its output for
wide result sets. Making your pspg more "available" for end users would
help a lot here – is it possible to ship it with psql itself, at least as
an optional feature? I don't know anyone who likes default output of wide
tables in psql. The pspg is very underestimated.

And one more thing (also a different story completely): wouldn't it be good
to have ability to connect psql to remote Postgres over ssh tunnel just
with some option in psql? Currently, users need to setup an SSH tunnel with
"ssh" command first, and only then connect psql, but I see that for new
users, it's difficult to manage such thing, some of them end up opening
Postgres to the world (it was mentioned recently in some article that at
least 700k Postgres instances were found with port 5432 available for the
world, so ssh tunneling option would be great in psql).

And finally: several people asked me to use colors in reports, but it's not
possible currently in psql – I can use color output for "\echo" command
now, but I cannot do anything like that for the server's response. For
instance, it would be nice to make some numbers colored/bold if they are
above/below some threshold value, for specific column in the server's
output. It could be \filter command specifying the rules of the output's
post-processing. Can ncurses be useful for such thing?

Regards,
Nikolay


Re: PQHost() undefined behavior if connecting string contains both host and hostaddr types

2018-03-26 Thread Haribabu Kommi
On Tue, Mar 27, 2018 at 12:23 AM, Michael Paquier 
wrote:

> On Mon, Mar 26, 2018 at 11:39:51PM +1100, Haribabu Kommi wrote:
> > And if we are not going to change the above, then PQhost() function
> > returns 3 values,
> > - NULL when the conn==NULL
> > - Actual host or hostaddr of the active connection
> > - Empty string when the conn is not able to evaluate.
> >
> > Is it fine to proceed like above?
>
> Yeah, I would think that this is a sensible approach.  Keeping
> consistent with NULL is good for the other function, and we still need a
> way to identify the dont-know state.  Peter, what's your input on that?
>

Patch attached with the above behavior along with other comments from
upthread.


Regards,
Hari Babu
Fujitsu Australia


PQhost-to-return-active-connected-host-and-hostaddr_v6.patch
Description: Binary data


Re: ppc64le support in 9.3 branch?

2018-03-26 Thread Andrew Dunstan
On Tue, Mar 27, 2018 at 10:57 AM, Alvaro Herrera
 wrote:
> Andrew Dunstan wrote:
>
>> I am in discussions with Mark, he's going to disable the animals from
>> building 9.3. (by setting branches_to_build to 'HEAD_PLUS_LATEST4'
>> instead of 'ALL').
>
> So once we desupport 9.3, they'll stop building 9.4? :-)
>


Then we'll set it back to ALL :-)

cheers

andrew



-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel Aggregates for string_agg and array_agg

2018-03-26 Thread David Rowley
On 27 March 2018 at 13:26, Alvaro Herrera  wrote:
> David Rowley wrote:
>
>> Anyway, the options are not zero for anyone who is strongly affected
>> with no other workaround. They just need to disable parallel query,
>> which to me seems fairly similar to the 8.4 release note's "the
>> previous behavior can be restored by disabling enable_hashagg"
>
> synchronized_seqscans is another piece of precedent in the area, FWIW.

This is true. I guess the order of aggregation could be made more
certain if we removed the cost-based optimiser completely and just
relied on a syntax-based optimiser.  Unfortunately, even with that,
UPDATE will still reuse free space created by DELETE+VACUUM, so we'd
need to get rid of VACUUM too. Of course, I'm not being serious, but I
was just trying to show there's a lot more to be done if we want to
make the aggregate input order deterministic without an ORDER BY in
the aggregate function.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-03-26 Thread Tomas Vondra
Hi Dean,

Here is an updated patch (hopefully) fixing the bugs you've reported so
far. In particular, it fixes the following:

1) mostly harmless memset bug in UpdateStatisticsForTypeChange

2) passing the right list (stat_clauses) to mcv_clauselist_selectivity

3) corrections to a couple of outdated comments

4) handling of NOT clauses in MCV lists (and in histograms)

The query you posted does not fail anymore, but there's room for
improvement. We should be able to handle queries like this:

select * from foo where a=1 and not b=1;

But we don't, because we only recognize F_EQSEL, F_SCALARLTSEL and
F_SCALARGTSEL, but not F_NEQSEL (which is what "not b=1" uses). Should
be simple to fix, I believe.

5) handling of mcv_lowsel in statext_clauselist_selectivity

I do believe the new behavior is correct - as I suspected, I broke this
during the last rebase, where I also moved some stuff from the histogram
part to the MCV part. I've also added the (sum of MCV frequencies), as
you suggested.

I think we could improve the estimate further by computing an ndistinct
estimate, and then using that to compute the average frequency of non-MCV
items. Essentially what var_eq_const does:

    if (otherdistinct > 1)
        selec /= otherdistinct;

Not sure how to do that when there are not just equality clauses.

BTW I think there's a bug in handling the fullmatch flag - it should not
be passed to AND/OR subclauses the way it is, because then

WHERE a=1 OR (a=2 AND b=2)

will probably set it to 'true' because of (a=2 AND b=2). Which will
short-circuit the statext_clauselist_selectivity, forcing it to ignore
the non-MCV part.

But that's something I need to look at more closely tomorrow. Another
thing I probably need to do is to add more regression tests, protecting
against bugs similar to those you found.
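
As a rough sketch of the sort of regression test that could cover the
NOT-clause case (the CREATE STATISTICS syntax with an mcv kind is assumed
here to follow the existing ndistinct/dependencies form, and all object
names are made up):

CREATE TABLE mcv_not (a int, b int);
INSERT INTO mcv_not SELECT i % 5, i % 5 FROM generate_series(1, 5000) i;
CREATE STATISTICS mcv_not_stat (mcv) ON a, b FROM mcv_not;  -- syntax assumed
ANALYZE mcv_not;
-- the estimate here should improve once F_NEQSEL is also recognized
EXPLAIN SELECT * FROM mcv_not WHERE a = 1 AND NOT b = 1;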

Thanks for the feedback so far!


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-multivariate-MCV-lists-20180327.patch.gz
Description: application/gzip


0002-multivariate-histograms-20180327.patch.gz
Description: application/gzip


Re: ppc64le support in 9.3 branch?

2018-03-26 Thread Alvaro Herrera
Andrew Dunstan wrote:

> I am in discussions with Mark, he's going to disable the animals from
> building 9.3. (by setting branches_to_build to 'HEAD_PLUS_LATEST4'
> instead of 'ALL').

So once we desupport 9.3, they'll stop building 9.4? :-)

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel Aggregates for string_agg and array_agg

2018-03-26 Thread Alvaro Herrera
David Rowley wrote:

> Anyway, the options are not zero for anyone who is strongly affected
> with no other workaround. They just need to disable parallel query,
> which to me seems fairly similar to the 8.4 release note's "the
> previous behavior can be restored by disabling enable_hashagg"

synchronized_seqscans is another piece of precedent in the area, FWIW.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ppc64le support in 9.3 branch?

2018-03-26 Thread Andrew Dunstan
On Mon, Mar 26, 2018 at 4:50 PM, Michael Paquier  wrote:
> On Fri, Mar 23, 2018 at 08:01:53PM +0100, Magnus Hagander wrote:
>> On Fri, Mar 23, 2018 at 7:58 PM, Andres Freund  wrote:
>>> I'm inclined to just ask them to stop running the animals on that
>>> branch. There are no pre-existing users on 9.3 ppc64le, and new
>>> customers hopefully won't move to 9.3. ISTM backpatching is riskier than
>>> just changing a bunch of buildfarm configurations.
>>
>> +1 for dropping it.
>
> +1.  If something like that were to happen for 10 or 9.6 knowing that
> they still have four years to go, that could be debated, but just for 6
> months there is little benefit.


I am in discussions with Mark, he's going to disable the animals from
building 9.3. (by setting branches_to_build to 'HEAD_PLUS_LATEST4'
instead of 'ALL').

cheers

andrew


-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel Aggregates for string_agg and array_agg

2018-03-26 Thread David Rowley
On 27 March 2018 at 11:28, Tom Lane  wrote:
> David Rowley  writes:
>> This very much reminds me of something that exists in the 8.4 release notes:
>>> SELECT DISTINCT and UNION/INTERSECT/EXCEPT no longer always produce sorted 
>>> output (Tom)
>
> That's a completely false analogy: we got a significant performance
> benefit for a significant fraction of users by supporting hashed
> aggregation.  My argument here is that only a very tiny fraction of
> string_agg/array_agg users will not care about aggregation order, and thus
> I don't believe that this patch can help very many people.  Against that,
> it's likely to hurt other people, by breaking their queries and forcing
> them to insert expensive explicit sorts to fix it.  Even discounting the
> backwards-compatibility question, we don't normally adopt performance
> features for which it's unclear that the net gain over all users is
> positive.

I don't believe you can go and claim this is a false analogy based on
your estimates on the number of cases in which each applies. The
analogy which I was pointing at was that we've been here before...
we've had versions of the planner which generated plans which would
have a deterministic sort order and we've later discovered that we can
improve performance by allowing the planner to have more flexibility
to do things in different ways which may no longer provide implicitly
sorted results. We've previously recognised that some users may have
become accustomed to the previous behaviour and we've mentioned a
workaround in the release notes so that those users are not out in the
cold. This seems exactly the same to me, and certainly not "false".

I have to say, it really would be a shame to have this concern block
us from future optimisations in aggregation.

I also think that anyone who expects string_agg to always aggregate in
the same order has not done a good job of looking at the docs. I see
you already quoted the "This ordering is unspecified by default, but
can be controlled by writing an ORDER BY clause within the aggregate
call, as shown in Section 4.2.7." part. This text appears in all
versions of PostgreSQL that we currently support.  We do sometimes
change things where the docs say something like "this works, but don't
rely on it, we might change in the future", in this case, we've never
even claimed that it worked in the first place!

Anyway, the options are not zero for anyone who is strongly affected
with no other workaround. They just need to disable parallel query,
which to me seems fairly similar to the 8.4 release note's "the
previous behavior can be restored by disabling enable_hashagg"
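
For completeness, that workaround (and the explicit-order alternative)
looks something like this; table and column names are just placeholders:

    -- restore the previous single-process aggregation behaviour per session
    SET max_parallel_workers_per_gather = 0;

    -- or, better, state the required order explicitly inside the aggregate
    SELECT string_agg(g, ',' ORDER BY x) FROM tbl;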

If we go by your argument then we should perhaps remove parallel
aggregation for all the floating point types too, since the order in
which such values are aggregated also can affect the result. I
mentioned this in [1], but nobody seemed too concerned at the time.

I see some other discussion on this whole topic in [2]. Looks like
parallel array_agg would please the PostGIS people.

[1] 
https://www.postgresql.org/message-id/CAKJS1f8QRDLvewk336SzUzxiXH1wBHG8rdKsqWEHbAraMXA2_Q%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/CAFjFpRe9W5xvYai-QOs6RshrJf7gWFsiZEZtxnu8vD4qLQZ3LQ%40mail.gmail.com#cafjfpre9w5xvyai-qos6rshrjf7gwfsizeztxnu8vd4qlqz...@mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: CALL optional in PL/pgSQL

2018-03-26 Thread Andrew Dunstan
On Fri, Mar 2, 2018 at 2:01 AM, Tom Lane  wrote:
> Pavel Stehule  writes:
>> 2018-03-01 5:51 GMT+01:00 Peter Eisentraut :
>>> This seems to be a popular issue when porting from PL/SQL, so I'll throw
>>> it out here for discussion.  Apparently, in PL/SQL you can call another
>>> procedure without the CALL keyword.  Here is a patch that attempts to
>>> implement that in PL/pgSQL as well.  It's not very pretty.
>
>> The CALL is not optional in PL/SQL - I was surprised - it is required in
>> some environments, and it should not be used in other (like PL/SQL)


It's not required in a PL/SQL block, see


This is hardly surprising given PL/SQL's Ada roots. This is exactly
how Ada allows procedure calls.


>
> I think this is an actively bad idea.  It introduces an inherent ambiguity
> into the grammar; for instance
>
> PERFORM (2);
>
> now has two valid interpretations.  The only way to resolve that is with
> heuristics or treating a bunch more words as reserved keywords, neither of
> which are appetizing.  (I didn't look to see which way Peter did it, but
> his description of his patch as "not very pretty" doesn't fill me with
> happiness.)  And it would likely cause headaches down the road whenever
> we attempt to add new syntax to plpgsql.
>
> I think we should reject the idea.


Well, the upside would be increased Oracle compatibility. I don't
think that's worthless.

I haven't dug deeply into it, but Peter's patch didn't look
desperately ugly to me at first glance.

Even though it's a tiny patch this seems like next release material at best.

>
>>> Also, I think PL/SQL allows you to call a procedure with no arguments
>>> without parentheses.  I have not implemented that.  I think it could be
>>> done, but it's not very appealing.
>
>> I don't like this feature.
>
> This idea is even worse in terms of the amount of syntax space it will
> occupy, again for zero functional benefit.
>

If we were going to do it then we should be consistent about it and
also allow parameter-less function calls to skip the parentheses. But
anyway none of that is currently proposed so let's save the argument
for the time when it is :-)

cheers

andrew

-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proposal: http2 wire format

2018-03-26 Thread Craig Ringer
On 26 March 2018 at 22:56, Tom Lane  wrote:

> Damir Simunic  writes:
> >> On 26 Mar 2018, at 11:06, Vladimir Sitnikov <
> sitnikov.vladi...@gmail.com> wrote:
> >>> If anyone finds the idea of Postgres speaking http2 appealing
>
> TBH, this sounds like a proposal to expend a whole lot of work (much of it
> outside the core server, and thus not under our control) in order to get
> from a state of affairs where there are things we'd like to do but can't
> because of protocol compatibility worries, to a different state of affairs
> where there are things we'd like to do but can't because of protocol
> compatibility worries.  Why would forcing our data into a protocol
> designed for a completely different purpose, and which we have no control
> over, be a step forward?  How would that address the fundamental issue of
> inertia in multiple chunks of software (ie, client libraries and
> applications as well as the server)?
>

I think the idea is that the protocol (supposedly) solves a lot of the
issues we now face, and has sufficient extensibility built in for future
use.

I'm not convinced. The v4 protocol TODO hasn't been addressed, nor has
support for handshake authentication models like SSPI, GSSAPI. There's been
no mention of query cancels, text encodings, or any of the other ongoing
pain points in the v3 protocol.

I completely understand the desire to support a totally new model where Pg
accepts and internally dispatches requests to a separate set of executors,
which may or may not be 1:1 with session state. I think we all do. But
predicating a protocol change on that being possible is wholly impractical.
Yet it looks like the availability of something like that is just being
assumed.

I want to see concrete reasons why this meets our existing and future
needs, and those of client apps.

I want to see _EXAMPLES_ of how protocol exchanges would work. Show:

- connect
- authenticate
- establish session
- begin txn
- query
- result set
- query
- error midway through result set
- sync recovery
- rollback
- utility query
- resultset
- query
- query cancel


There's been no visible consideration of overheads and comparison with
existing v3 protocol. Personally I'm fine with adding some protocol
overhead in bytes terms; low latency links have the bandwidth not to care
much compared to payload sizes etc. On high latency links it's all about
the round trips, not message sizes. But I want to know what those overheads
are, and why they're there.

I'm rapidly losing interest. Unless this goes back toward the concrete and
practical I think it's going nowhere.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Proposal: http2 wire format

2018-03-26 Thread Stephen Frost
Greetings,

* Craig Ringer (cr...@2ndquadrant.com) wrote:
> On 26 March 2018 at 21:05, Damir Simunic 
> wrote:
> > The same goes for the ‘authorization’ header. Postgres does not support
> > Bearer token authorization today. But maybe you’ll be able to define a
> > function that knows how to deal with the token, and somehow signal to
> > Postgres that you want it to call this function when it sees such a header.
> > Or maybe someone wrote a plugin that does that, and you configure your
> > server to use it.
> 
> You've consistently ignored my comments re authentication and authorization.
> 
> How would a multi-step handshake authentication like GSSAPI or SSPI be
> implemented with HTTP2? Efficiently?

I've been trying to avoid this thread, but I'll throw in that the way
GSSAPI is handled on the web today is through SPNEGO:

https://en.wikipedia.org/wiki/SPNEGO

Would be great to get pgAdmin4 to work under a webserver which is
performing SPNEGO and Kerberos delegation, to allow users who are
authenticated to the web server to have the web server proxy those
credentials when connecting to PG, and, independently, for
credentials to be delegated to PG which can then be used for
connections to other services via FDWs.

All of that is largely independent of http2, of course.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Parallel Aggregates for string_agg and array_agg

2018-03-26 Thread Tom Lane
I wrote:
> The main thing that remains undone is to get some test coverage ---
> AFAICS, none of these new functions get exercised in the standard
> regression tests.

Oh, I thought of another thing that would need to get done, if we decide
to commit this.  array_agg_serialize/deserialize only work if the array
element type has send/receive functions.  The planner's logic to decide
whether partial aggregation is possible doesn't look any deeper than
whether the aggregate has serialize/deserialize, but I think we have to
teach it that if it's these particular serialize/deserialize functions,
it needs to check the array element type.  Otherwise we'll get runtime
failures if partial aggregation is attempted on array_agg().

This would not matter for any core datatypes, since they all have
send/receive functions, but I imagine it's still pretty common for
extension types not to.

For my money it'd be sufficient to hard-code the test like

if ((aggserialfn == F_ARRAY_AGG_SERIALIZE ||
 aggdeserialfn == F_ARRAY_AGG_DESERIALIZE) &&
!array_element_type_has_send_and_recv(exprType(aggregate input)))

Someday we might need more flexible testing, but that can be solved
some other time.
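
For context, the problem cases are easy enough to enumerate; something
like this lists base types lacking send/receive functions (extension
types being the usual suspects):

    select typname from pg_type
    where typtype = 'b' and (typsend = 0 or typreceive = 0);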

regards, tom lane



Re: Parallel safety of binary_upgrade_create_empty_extension

2018-03-26 Thread Tom Lane
Thomas Munro  writes:
> On Tue, Mar 27, 2018 at 11:58 AM, Tom Lane  wrote:
>> I wonder whether we shouldn't mark *all* of these parallel-unsafe.
>> I'm not exactly convinced that 'restricted' is sufficient for the
>> others, and even if it is, there's certainly little if any upside
>> for letting them be executed in parallel-enabled mode.

> Yeah, I almost sent a patch to do exactly that, but then I realised
> the others all just set a global variable so they're technically fine
> as 'r'.

Yeah, so I see after looking closer.

> I have no strong preference either way; these functions will
> only actually be run in parallel in the weird situation of
> force_parallel_mode = on.

While the minimal patch would be fine for now, what I'm worried about
is preventing future mistakes.  It seems highly likely that the reason
binary_upgrade_create_empty_extension is marked 'r' is that somebody
copied-and-pasted that from another binary_upgrade_foo function without
thinking very hard.  Marking them all 'u' would help to forestall future
mistakes of the same sort, while leaving them as 'r' doesn't seem to buy
us anything much (beyond a smaller catalog patch today).

Querying for other functions marked 'r' leaves me with some other related
doubts:

1. Why are the various flavors of pg_get_viewdef() marked 'r'?  Surely
reading the catalogs is a thing parallel children are allowed to do.
If there is a good reason for pg_get_viewdef() to be 'r', why doesn't
the same reason apply to all the other ruleutils functions?

2. Why are the various thingy-to-xml functions marked 'r'?  Again it
can't be because they read catalogs or data.  I can imagine that the
reason for restricting cursor_to_xml is that the cursor might execute
parallel-unsafe operations, but then why isn't it marked 'u'?

3. Isn't pg_import_system_collations() unsafe, for the same reason
as binary_upgrade_create_empty_extension()?
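
For reference, the markings in question can be pulled up with something
like

    select proname, proparallel from pg_proc
    where proname in ('pg_get_viewdef', 'cursor_to_xml', 'table_to_xml',
                      'pg_import_system_collations');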

regards, tom lane



Re: Parallel safety of binary_upgrade_create_empty_extension

2018-03-26 Thread Thomas Munro
On Tue, Mar 27, 2018 at 11:58 AM, Tom Lane  wrote:
> # select proname, proparallel from pg_proc where proname like 'binary_upg%';
>                   proname                   | proparallel
> --------------------------------------------+-------------
>  binary_upgrade_create_empty_extension  | r
>  binary_upgrade_set_next_array_pg_type_oid  | r
>  binary_upgrade_set_next_heap_pg_class_oid  | r
>  binary_upgrade_set_next_index_pg_class_oid | r
>  binary_upgrade_set_next_pg_authid_oid  | r
>  binary_upgrade_set_next_pg_enum_oid| r
>  binary_upgrade_set_next_pg_type_oid| r
>  binary_upgrade_set_next_toast_pg_class_oid | r
>  binary_upgrade_set_next_toast_pg_type_oid  | r
>  binary_upgrade_set_record_init_privs   | r
> (10 rows)
>
> I wonder whether we shouldn't mark *all* of these parallel-unsafe.
> I'm not exactly convinced that 'restricted' is sufficient for the
> others, and even if it is, there's certainly little if any upside
> for letting them be executed in parallel-enabled mode.

Yeah, I almost sent a patch to do exactly that, but then I realised
the others all just set a global variable so they're technically fine
as 'r'.  I have no strong preference either way; these functions will
only actually be run in parallel in the weird situation of
force_parallel_mode = on.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Parallel safety of binary_upgrade_create_empty_extension

2018-03-26 Thread Tom Lane
... BTW:

# select proname, proparallel from pg_proc where proname like 'binary_upg%';
                  proname                   | proparallel 
--------------------------------------------+-------------
 binary_upgrade_create_empty_extension  | r
 binary_upgrade_set_next_array_pg_type_oid  | r
 binary_upgrade_set_next_heap_pg_class_oid  | r
 binary_upgrade_set_next_index_pg_class_oid | r
 binary_upgrade_set_next_pg_authid_oid  | r
 binary_upgrade_set_next_pg_enum_oid| r
 binary_upgrade_set_next_pg_type_oid| r
 binary_upgrade_set_next_toast_pg_class_oid | r
 binary_upgrade_set_next_toast_pg_type_oid  | r
 binary_upgrade_set_record_init_privs   | r
(10 rows)

I wonder whether we shouldn't mark *all* of these parallel-unsafe.
I'm not exactly convinced that 'restricted' is sufficient for the
others, and even if it is, there's certainly little if any upside
for letting them be executed in parallel-enabled mode.

regards, tom lane



Re: Parallel safety of binary_upgrade_create_empty_extension

2018-03-26 Thread Tom Lane
Thomas Munro  writes:
> Here's a single character patch to mark
> that function PARALLEL UNSAFE.

Ugh.  Clearly a bug.

> Obviously that'll affect only newly
> initdb'd clusters after this patch, but that's what people have in a
> pg_upgrade scenario.

We're a bit fortunate on that.  I wonder if there's some less
trial-and-error way of discovering that functions are or are not
parallel safe.

> This goes back to d89f06f0482 so I think it should probably be
> back-patched to 9.6 and 10.

Roger, will do.

regards, tom lane



Parallel safety of binary_upgrade_create_empty_extension

2018-03-26 Thread Thomas Munro
Hi,

My colleague Richard Yen came across this situation:

pg_restore: [archiver (db)] could not execute query: ERROR:  cannot
assign XIDs during a parallel operation
Command was: -- For binary upgrade, create an empty extension and
insert objects into it
DROP EXTENSION IF EXISTS "btree_gin";
SELECT pg_catalog.binary_upgrade_create_empty_extension('btree_gin',
'public', true, '1.0', NULL, NULL, ARRAY[]::pg_catalog.text[]);

It turned out that the target cluster was running with
force_parallel_mode = on.  Here's a single character patch to mark
that function PARALLEL UNSAFE.  Obviously that'll affect only newly
initdb'd clusters after this patch, but that's what people have in a
pg_upgrade scenario.

This goes back to d89f06f0482 so I think it should probably be
back-patched to 9.6 and 10.

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Mark-binary_upgrade_create_empty_extension-PARALLEL-.patch
Description: Binary data


Re: Proposal: http2 wire format

2018-03-26 Thread David G. Johnston
On Mon, Mar 26, 2018 at 1:05 PM, Damir Simunic  wrote:

> Would it be the only protocol supported? What if I wanted JSON or CSV
> returned, or just plain old Postgres v3 binary format, since I already have
> the parser written for it? Wouldn’t you need to first solve the problem of
> content negotiation?
>

Is content negotiation - beyond client/server character encoding -
something we want the server to be capable of performing?  My gut reaction
is no.

> Getting rid of having to write a framing parser in every client language?
>


How large a problem/benefit is this overall?  We are choosing between
standard-but-new versus specialized-but-widely-implemented.  While the v3
protocol is a sunk cost there must be considerable value in incrementing it
20% to get to a better place rather than starting over from scratch with a
general-purpose, and I suspect more verbose, protocol.

I admire the vision presented here but I do wonder whether it's asking
PostgreSQL to be more than it is reasonably capable of being?  Presently
the architectures I'm aware of have clients talk to middleware application
servers, running DB drivers, talking to PostgreSQL clusters.  This vision
wants to remove the middleware application server and allow clients to
directly communicate with the server in client-native protocols and formats
(http and json).  That adds a considerable amount of responsibility to
PostgreSQL that it does not presently have and, having observed the
community for a number of years now and seeing the responses on this
thread, is responsibility it probably should not be given.  Let those
concerns reside in the middleware under the control of developers -
potentially through frameworks such as PostGraphile [1] and the like.

Or a fork - one that can choose to operate a different and/or more frequent
release cycle than the annual one that PostgreSQL uses.

David J.

[1] https://github.com/graphile/postgraphile


Re: Parallel Aggregates for string_agg and array_agg

2018-03-26 Thread Tom Lane
David Rowley  writes:
> On 27 March 2018 at 09:27, Tom Lane  wrote:
>> I do not think it is accidental that these aggregates are exactly the ones
>> that do not have parallelism support today.  Rather, that's because you
>> just about always have an interest in the order in which the inputs get
>> aggregated, which is something that parallel aggregation cannot support.

> This very much reminds me of something that exists in the 8.4 release notes:
>> SELECT DISTINCT and UNION/INTERSECT/EXCEPT no longer always produce sorted 
>> output (Tom)

That's a completely false analogy: we got a significant performance
benefit for a significant fraction of users by supporting hashed
aggregation.  My argument here is that only a very tiny fraction of
string_agg/array_agg users will not care about aggregation order, and thus
I don't believe that this patch can help very many people.  Against that,
it's likely to hurt other people, by breaking their queries and forcing
them to insert expensive explicit sorts to fix it.  Even discounting the
backwards-compatibility question, we don't normally adopt performance
features for which it's unclear that the net gain over all users is
positive.

Maybe what that says is that rather than giving up on this altogether,
we should shelve it till we have less stupid planner+executor behavior
for ORDER BY inside aggregates.  That's been on the to-do list for
a long while, but not risen to the top ...

regards, tom lane



Re: Backend memory dump analysis

2018-03-26 Thread Tom Lane
Vladimir Sitnikov  writes:
> It takes just a single pass to compute "total" (and it takes no memory), so
> it would be much better if "TopMemoryContext: ..." was replaced with
> "Total memory used by all contexts is XXX bytes"
> Current TopMemoryContext row is extremely misleading.

This may or may not be a good thing to do, but in any case it's well
outside the scope of this patch, whose ambition is only to get additional
identification info attached to contexts for which that's useful.
Moreover, seeing how late we are in the v11 cycle, it's hard to justify
doing more; that smells like a new feature and the time for that is past
for this year.  The only reason I'm considering this patch now at all
is that it rethinks some API changes we made earlier in v11, and it'd be
nice to avoid an additional round of churn there in v12.

> PS. SQL text might involve sensitive information (e.g. logins, passwords,
> personal IDs), so there might be security issues with printing SQL by
> default.

Indeed, that's something that extensions would need to think about.
I do not believe it's an issue for MemoryContextStats though; the
postmaster log can already contain sensitive data.

regards, tom lane



Re: [HACKERS] MERGE SQL Statement for PG11

2018-03-26 Thread Peter Geoghegan
On Mon, Mar 26, 2018 at 12:17 PM, Simon Riggs  wrote:
>> As far as I
>> know, the proposed MERGE patch has that issue and existing DML commands
>> don't; but someone else may have better information.
>
> I will look deeper and report back.

It's quite clear that the problem exists with the MERGE patch; the
simple fact that RangeVarGetRelidExtended() is called twice with the
same RangeVar argument shows this. However, the Oid cross-check seems
like a sufficient defense against an inconsistency that causes real
trouble, since the cross-check will only error-out when a concurrent
table creation (or maybe ALTER TABLE) makes a second table visible, in
a schema that appears earlier in the user's search_path. It's hard to
imagine any legitimate user truly preferring some alternative behavior
in this particular scenario, which makes it okay.

This cross-check workaround is ugly, but apparently there is a
precedent in copy.c. I didn't know that detail until Robert pointed it
out. That makes me feel a lot better about this general question of
how the target relation is represented, having two RTEs, etc.

-- 
Peter Geoghegan



Re: Parallel Aggregates for string_agg and array_agg

2018-03-26 Thread David Rowley
On 27 March 2018 at 09:27, Tom Lane  wrote:
> I spent a fair amount of time hacking on this with intent to commit,
> but just as I was getting to code that I liked, I started to have second
> thoughts about whether this is a good idea at all.  I quote from the fine
> manual:
>
> The aggregate functions array_agg, json_agg, jsonb_agg,
> json_object_agg, jsonb_object_agg, string_agg, and xmlagg, as well as
> similar user-defined aggregate functions, produce meaningfully
> different result values depending on the order of the input
> values. This ordering is unspecified by default, but can be controlled
> by writing an ORDER BY clause within the aggregate call, as shown in
> Section 4.2.7. Alternatively, supplying the input values from a sorted
> subquery will usually work ...
>
> I do not think it is accidental that these aggregates are exactly the ones
> that do not have parallelism support today.  Rather, that's because you
> just about always have an interest in the order in which the inputs get
> aggregated, which is something that parallel aggregation cannot support.

This was not in my list of reasons for not adding them the first time
around. I mentioned these reasons in a response to Stephen.

> I fear that what will happen, if we commit this, is that something like
> 0.01% of the users of array_agg and string_agg will be pleased, another
> maybe 20% will be unaffected because they wrote ORDER BY which prevents
> parallel aggregation, and the remaining 80% will scream because we broke
> their queries.  Telling them they should've written ORDER BY isn't going
> to cut it, IMO, when the benefit of that breakage will accrue only to some
> very tiny fraction of use-cases.

This very much reminds me of something that exists in the 8.4 release notes:

> SELECT DISTINCT and UNION/INTERSECT/EXCEPT no longer always produce sorted 
> output (Tom)

> Previously, these types of queries always removed duplicate rows by means of 
> Sort/Unique processing (i.e., sort then remove adjacent duplicates). Now they 
> can be implemented by hashing, which will not produce sorted output. If an 
> application relied on the output being in sorted order, the recommended fix 
> is to add an ORDER BY clause. As a short-term workaround, the previous 
> behavior can be restored by disabling enable_hashagg, but that is a very 
> performance-expensive fix. SELECT DISTINCT ON never uses hashing, however, so 
> its behavior is unchanged.

Seems we were happy enough then to tell users to add an ORDER BY.

However, this case is different: back then the results were always
ordered beforehand, whereas here they are only possibly ordered. So we'll
probably surprise fewer people this time around.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-26 Thread Andres Freund
Hi,

On 2018-03-23 17:01:54 +0530, Jeevan Chalke wrote:
> Attached patch which fixes that.

Thanks, will push. For the future, I'd be more likely to notice if you
CC me ;)


> However, I am not sure whether it is expected to have a stable regression run
> with installcheck having local settings.
> For example, if I have enable_hashagg = false locally, I will definitely
> see failures.
> 
> ISTM, that I am missing Andres point here.

I don't think there's a hard and fast rule here. I personally often
during development disable parallelism because it makes some things
harder (you can't easily debug crashes with gdb, benchmarks show larger
variance, ...).  There doesn't seem to be an equivalent benefit to
support running e.g. with enable_hashagg = false.

- Andres



Re: Parallel Aggregates for string_agg and array_agg

2018-03-26 Thread David Rowley
On 27 March 2018 at 10:20, Stephen Frost  wrote:
> In the end, I do tend to agree that we probably should add parallel
> support to these aggregates, but it'd also be nice to hear from those
> who had worked to add parallelism to the various aggregates as to if
> there was some reason these were skipped.

That was me. I originally skipped string_agg and array_agg as I
thought parallelism would not be much use due to the growth of the
size of the state. I thought the combine function could become very
expensive as it has to work on each aggregated value, instead of a
consolidated result, as the other aggregates provide. The reasons for
adding this now and not back then are:

1. I was wrong, parallel aggregation can improve performance for these
aggregates, and;
2. It's annoying that a single non-partial aggregate in the query
disabled parallel aggregation, and;
3. When I originally considered the performance I didn't consider that
some queries may filter values before aggregation via a FILTER or a
WHERE clause.
4. It's not just parallel aggregation that uses partial aggregation,
we can now push aggregation below Append for partitioned tables in
some cases. Hopefully, we'll also have "group before join" one day
too.

While 2 is still true, there should be a reduction in the times this
happens. Just the json and xml aggs are holding us back from the standard
set now.
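
To illustrate point 3, something like the following (hypothetical table
and columns) filters out most values before aggregation, which keeps the
per-worker states small and the combine step cheap:

    SELECT a,
           array_agg(e) FILTER (WHERE e IS NOT NULL),
           string_agg(g, ',') FILTER (WHERE g LIKE 'x%')
    FROM big_table
    GROUP BY a;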

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] why not parallel seq scan for slow functions

2018-03-26 Thread Robert Haas
On Sat, Mar 24, 2018 at 9:40 AM, Amit Kapila  wrote:
> For me, it is equivalent to the master.  The average of ten runs on
> the master is 20664.3683 and with all the patches applied it is
> 20590.4734.  I think there is some run-to-run variation, but more or
> less there is no visible degradation.  I think we have found the root
> cause and eliminated it.  OTOH, I have found another case where new
> patch series seems to degrade.

All right, I have scaled my ambitions back further.  Here is a revised
and slimmed-down version of the patch series.  If we forget about
"Remove explicit path construction logic in create_ordered_paths" for
now, then we don't really need a new upperrel.  So this patch just
modifies the toplevel scan/join rel in place, which should avoid a
bunch of overhead in add_path() and other places, while hopefully
still fixing the originally-reported problem.  I haven't tested this
beyond verifying that it passes the regression test, but I've run out
of time for today.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


0004-Rewrite-the-code-that-applies-scan-join-targets-to-p.patch
Description: Binary data


0003-Postpone-generate_gather_paths-for-topmost-scan-join.patch
Description: Binary data


0002-CP_IGNORE_TLIST.patch
Description: Binary data


0001-Teach-create_projection_plan-to-omit-projection-wher.patch
Description: Binary data


Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic

> Currently it is implemented via different v3 messages (parse, bind, execute, 
> row description, data row, etc etc).
> 
> The claim is *any* implementation "on top of HTTP/2" would basically require 
> to implement those "parse, bind, execute, row data, etc" *messages*.

Why? Wouldn’t you be able to package that into a single request with query in 
the data frame and params as headers?

> Say you pick to use "/parse" url with SQL text in body instead of "parse 
> message". It does not make the whole thing "just HTTP/2". It just means 
> you've created "your own protocol on top of HTTP/2”.

It is new functionality, isn’t it? Of course you have to evolve protocol 
semantics for that. That’s the whole point! HTTP2 is just a nice substrate that 
comes with a way to negotiate capabilities and can separate the metadata from 
the payload. Nothing revolutionary, but it lets you move forward without hurting 
existing applications. Isn’t that an upgrade from v3?

> 
> Clients would have to know the sequence of valid messages,
> clients would have to know if SQL should be present in body or in URL or in 
> form post data, etc, etc.
> 
> I believe Andres means exactly the same thing as he says
> 
> By the way: you cannot just "load balance" "parse/bind/exec" to different 
> backends, so the load balancer should be aware of meaning of those 
> "parse/bind/exec" messages. I believe that is one of the requirements Craig 
> meant by "Is practical to implement in connection pooler proxies”.

Why can’t I package this into a single request? Don’t modern web proxies deal 
with session affinity and stuff like that?

> 
> Andres>You *still* need a full blown protocol ontop of it. So no, this 
> doesn't change that
> 
> 
> Damir> Did you know that Go has HTTP2 support in the standard library? And so 
> does java, too?
> 
> Java has TCP implementation in the standard library.
> Does it help implementing v3 protocol?

It does. If Java only had IP, without TCP, would you be able to implement your 
driver? Yes, but you’d have to suffer longer.

> In the same way HTTP/2 "as a library" helps implementing v4. The problem is 
> it does not. Developer would have to somehow code the coding rules (e.g. 
> header names, body formats).
> HTTP/2 is just too low level.
> 

It’s just framing. But standard framing.

> 
> Damir>Why parse some obscure Postgres-specific binary data types when you can 
> have the database send you the data in the serialization format of your 
> client language?
> 
> From my own experience, automatic use of server-prepared statements (see 
> https://github.com/pgjdbc/pgjdbc/pull/319) did cut end-user response times 
> of our business application in half.
> That is clients would have to know the way to use prepared statements in 
> order to get decent performance.
> If you agree with that, then "v3 parse message", "v3 bind message", "v3 
> execute message" is not that different from "HTTP/2 POST to /parse", "HTTP/2 
> POST to /bind", "HTTP/2 POST to /execute". It is still "obscure 
> PostgreSQL-specific HTTP/2 calls”.

What of having that in one single request?

> 
> Even if you disagree (really?) you would still have to know 
> PostgreSQL-specific way to encode SQL text and "number of rows returned" and 
> "wire formats for the columns" even for a single "HTTP POST 
> /just/execute/sql" kind of API. Even that is "a full blown protocol ontop of 
> HTTP2" (c) Andres.

What does your business app do with the data?



> 
> Vladimir



Re: Parallel Aggregates for string_agg and array_agg

2018-03-26 Thread Tom Lane
Stephen Frost  writes:
> Yeah, there certainly seems like a lot of opportunity for the ordering
> to end up being volatile already and queries depending on it not
> changing really shouldn't be making that assumption.  I do think it was
> probably a bad move on our part to say that ordering a subquery will
> "usually" work in the documentation but having that slip-up in the
> documentation mean that we actually *are* going to guarantee it'll
> always work to use a subquery ordering to feed an aggregate is a pretty
> big stretch (and is it even really true today anyway?  the documentation
> certainly doesn't seem to be clear on that...).

It's true as long as there's nothing in the outer query that would cause
re-ordering of the subquery output.  I'd be the first to agree that the
wording is a bit sloppy, but I think there are lots of people depending
on that behavior, not least because that was the only way to do it before
we supported ORDER BY inside an aggregate call.

Now, that doesn't directly bear on the argument at hand, because if you're
aggregating over the result of an ORDER BY subquery then you aren't doing
parallel aggregation anyway.  (Perhaps there's parallelism inside the
subquery, but that's irrelevant.)  It is related in the sense that there
are lots of people depending on things other than ORDER-BY-in-the-
aggregate-call to get the results they expect.
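
That is, the long-standing idiom (hypothetical names):

    SELECT string_agg(g, ',')
      FROM (SELECT g FROM tbl ORDER BY x) ss;

The subquery's ORDER BY fixes the input order, and it also means the outer
aggregate isn't being parallelized anyway.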

regards, tom lane



Re: Parallel Aggregates for string_agg and array_agg

2018-03-26 Thread Tom Lane
Tomas Vondra  writes:
> On 03/26/2018 10:27 PM, Tom Lane wrote:
>> I fear that what will happen, if we commit this, is that something like
>> 0.01% of the users of array_agg and string_agg will be pleased, another
>> maybe 20% will be unaffected because they wrote ORDER BY which prevents
>> parallel aggregation, and the remaining 80% will scream because we broke
>> their queries.  Telling them they should've written ORDER BY isn't going
>> to cut it, IMO, when the benefit of that breakage will accrue only to some
>> very tiny fraction of use-cases.

> Isn't the ordering unreliable *already*?

Not if the query is such that what gets chosen is, say, an indexscan or
mergejoin.  It might be theoretically unreliable and yet work fine for
a given application.

I might be too pessimistic about the fraction of users who are depending
on ordered input without having written anything that explicitly forces
that ... but I stand by the theory that it substantially exceeds the
fraction of users who could get any benefit.

Your own example of assuming that separate aggregates are computed
in the same order reinforces my point, I think.  In principle, anybody
who's doing that should write

   array_agg(e order by x),
   array_agg(f order by x),
   string_agg(g order by x)

because otherwise they shouldn't assume that; the manual certainly doesn't
promise it.  But nobody does that in production, because if they did
they'd get killed by the fact that the sorts are all done independently.
(We should improve that someday, but it hasn't been done yet.)  So I think
there are an awful lot of people out there who are assuming more than a
lawyerly reading of the manual would allow.  Their reaction to this will
be about like ours every time the GCC guys decide that some longstanding
behavior of C code isn't actually promised by the text of the C standard.

regards, tom lane



Re: Parallel Aggregates for string_agg and array_agg

2018-03-26 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> On 03/26/2018 10:27 PM, Tom Lane wrote:
> > I fear that what will happen, if we commit this, is that something like
> > 0.01% of the users of array_agg and string_agg will be pleased, another
> > maybe 20% will be unaffected because they wrote ORDER BY which prevents
> > parallel aggregation, and the remaining 80% will scream because we broke
> > their queries.  Telling them they should've written ORDER BY isn't going
> > to cut it, IMO, when the benefit of that breakage will accrue only to some
> > very tiny fraction of use-cases.
> 
> Isn't the ordering unreliable *already*? It depends on ordering of
> tuples on the input. So if the table is scanned by index scan or
> sequential scan, that will affect the array_agg/string_agg results. If
> the input is a join, it's even more volatile.

It's not even just that: for a seq scan, it'll also depend on whether anyone
else is also scanning the table, as we might not start at the beginning
of the table thanks to synchronized seq scans.
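
That behavior is controlled by the synchronize_seqscans GUC, which is on
by default:

    SHOW synchronize_seqscans;
    SET synchronize_seqscans = off;  -- scans then always start at the beginning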

> IMHO it's not like we're making the ordering unpredictable - it's been
> like that since forever.

Yeah, there certainly seems like a lot of opportunity for the ordering
to end up being volatile already and queries depending on it not
changing really shouldn't be making that assumption.  I do think it was
probably a bad move on our part to say in the documentation that ordering
a subquery will "usually" work, but taking that slip-up to mean that we
actually *are* going to guarantee it'll always work to use a subquery
ordering to feed an aggregate is a pretty big stretch (and is it even
really true today anyway?  the documentation
certainly doesn't seem to be clear on that...).

> Also, how is this different from ORDER BY clause? If a user does not
> specify an ORDER BY clause, I don't think we'd care very much about
> changes to output ordering due to plan changes, for example.

This seems like it boils down to "well, everyone *knows* that rows can
come back in any order" combined with "our docs claim that an ordered
subquery will *usually* work".

In the end, I do tend to agree that we probably should add parallel
support to these aggregates, but it'd also be nice to hear from those
who had worked to add parallelism to the various aggregates as to if
there was some reason these were skipped.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Proposal: http2 wire format

2018-03-26 Thread Alvaro Hernandez



On 26/03/18 21:57, Damir Simunic wrote:


On 26 Mar 2018, at 15:42, Alvaro Hernandez wrote:




On 26/03/18 13:11, Damir Simunic wrote:
On 26 Mar 2018, at 11:13, Vladimir Sitnikov wrote:


Damir> * What are the criteria for getting this into the core?
Craig>Mine would be:

+1

There's a relevant list as well: 
https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md 





This is a great addition to the list, thanks!

Damir



    Hi Damir.

    I'm interested in the idea. However, way before writing a PoC, 
IMVHO I'd rather write a detailed document including:


- A brief summary of the main features of HTTP2 and why it might be a 
good fit for PG (of course there's a lot of doc in the wild about 
HTTP/2, so just a summary of the main relevant features and an 
analysis of how it may fit Postgres).


- A more or less thorough description of how every feature in current 
PostgreSQL protocol would be implemented on HTTP/2.


- Similar to the above, but applied to the v4 TODO feature list.

- A section for connection poolers, as well as auth, as these are very 
important topics.



    Hope this helps,

    Álvaro



Álvaro, it does help, thanks. This discussion is to inform such a 
document. But the topic is such that having a good PoC will move the 
discussion further much faster.


    I don't particularly agree with this. A PoC for a significant 
undertaking like this is a) time and effort consuming and b) will need 
to take clear directions that may be completely wrong later. A matter as 
important as this requires design, and coding without it seems to be 
the wrong direction in my opinion. I'm a bit skeptical about HTTP/2 being a 
good idea here, but I'm more than in to study it in detail if I'm 
enlightened about it and the proposed advantages. As you may have seen I 
contributed one of the "brain dumps" at 
https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md 
So I'm on the side of looking at new versions/protocols. But a PoC will 
not help here at all, a design document will :)




Can you help with thinking about how HTTP2 would impact connection 
poolers? I don’t know much about those.


    For one, authentication. It needs to be possible to authenticate 
when there is a "man in the middle", which is far from trivial. Also 
extra round trips may not be acceptable. There are challenges in session 
management, user management, whether to add balancing and failover 
here... If a new protocol is to be designed, I'd like to think about 
these problems, even if not solving them all.



    Álvaro

--

Alvaro Hernandez


---
OnGres



Re: Index scan prefetch?

2018-03-26 Thread Claudio Freire
On Mon, Mar 26, 2018 at 4:59 PM, Justin Pryzby  wrote:
>> But now effective_io_concurrency parameter is applicable only for bitmap
> ...
>> Will it be useful to support it also for index scan?
>> Or there are some other ways to address this problem?
>
> Does your case perform well with bitmap heap scan (I mean bitmap scan of the
> single index)?  It seems to me that prefetch wouldn't help, as it would just
> incur the same random cost you're already seeing; the solution may be to 
> choose
> another plan(bitmap) with sequential access to enable read-ahead,
>
> Also: Claudio mentioned here that bitmap prefetch can cause the kernel to 
> avoid
> its own readahead, negatively affecting some queries:
> https://www.postgresql.org/message-id/flat/8fb758a1-d7fa-4dcc-fb5b-07a992ae6a32%40gmail.com#20180207054227.ge17...@telsasoft.com
>
> What's the pg_stats "correlation" for the table column with index being
> scanned?  How many tuples?  Would you send explain(analyze,buffers) for the
> problem query, and with SET enable_bitmapscan=off; ?
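
For reference, gathering those usually looks something like this (table,
column, and the query itself are placeholders):

    SELECT tablename, attname, n_distinct, correlation
      FROM pg_stats
     WHERE tablename = 'your_table' AND attname = 'your_indexed_column';

    SET enable_bitmapscan = off;
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM your_table WHERE your_indexed_column BETWEEN 1 AND 1000;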

Also, check out this thread:

http://www.postgresql-archive.org/Prefetch-index-pages-for-B-Tree-index-scans-td5728926.html



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-03-26 Thread Dean Rasheed
On 26 March 2018 at 20:17, Tomas Vondra  wrote:
> On 03/26/2018 09:01 PM, Dean Rasheed wrote:
>> Also, just above that, in statext_clauselist_selectivity(), it
>> computes the list stat_clauses, then doesn't appear to use it
>> anywhere. I think that would have been the appropriate thing to pass
>> to mcv_clauselist_selectivity(). Otherwise, passing unrelated clauses
>> into mcv_clauselist_selectivity() will cause it to fail to find any
>> matches and then underestimate.
>
> Will check.
>

Here's a test case demonstrating this bug:

drop table if exists foo;
create table foo(a int, b int, c int);
insert into foo select 0,0,0 from generate_series(1,10);
insert into foo select 1,1,1 from generate_series(1,1);
insert into foo select 2,2,2 from generate_series(1,1000);
insert into foo select 3,3,3 from generate_series(1,100);
insert into foo select x,x,x from generate_series(4,1000) g(x);
insert into foo select x,x,x from generate_series(4,1000) g(x);
insert into foo select x,x,x from generate_series(4,1000) g(x);
insert into foo select x,x,x from generate_series(4,1000) g(x);
insert into foo select x,x,x from generate_series(4,1000) g(x);
analyse foo;
explain analyse select * from foo where a=1 and b=1 and c=1;
create statistics foo_mcv_ab (mcv) on a,b from foo;
analyse foo;
explain analyse select * from foo where a=1 and b=1 and c=1;

With the multivariate MCV statistics, the estimate gets worse because
it passes the c=1 clause to mcv_clauselist_selectivity(), and nothing
matches.

There's also another bug, arising from the fact that
statext_is_compatible_clause() says that NOT clauses are supported,
but mcv_clauselist_selectivity() doesn't support them. So with the
above table:

select * from foo where (a=0 or b=0) and not (b in (1,2));
ERROR:  unknown clause type: 111

Regards,
Dean



Re: Parallel Aggregates for string_agg and array_agg

2018-03-26 Thread Tomas Vondra
On 03/26/2018 10:27 PM, Tom Lane wrote:
> David Rowley  writes:
>> [ combinefn_for_string_and_array_aggs_v7.patch ]
> 
> I spent a fair amount of time hacking on this with intent to commit,
> but just as I was getting to code that I liked, I started to have second
> thoughts about whether this is a good idea at all.  I quote from the fine
> manual:
> 
> The aggregate functions array_agg, json_agg, jsonb_agg,
> json_object_agg, jsonb_object_agg, string_agg, and xmlagg, as well as
> similar user-defined aggregate functions, produce meaningfully
> different result values depending on the order of the input
> values. This ordering is unspecified by default, but can be controlled
> by writing an ORDER BY clause within the aggregate call, as shown in
> Section 4.2.7. Alternatively, supplying the input values from a sorted
> subquery will usually work ...
> 
> I do not think it is accidental that these aggregates are exactly the ones
> that do not have parallelism support today.  Rather, that's because you
> just about always have an interest in the order in which the inputs get
> aggregated, which is something that parallel aggregation cannot support.
> 

I don't think that's quite true. I know plenty of people who do things
like this:

SELECT
   a,
   b,
   avg(c),
   sum(d),
   array_agg(e),
   array_agg(f),
   string_agg(g)
FROM hugetable GROUP BY a,b HAVING avg(c) > 100.89;

and then do some additional processing on the result in some way
(subquery, matview, ...). They don't really care about ordering of
values in the arrays, as long as orderings of all the arrays match.

Currently queries like this can't use parallelism at all, and the patch
fixes that, I believe.

> I fear that what will happen, if we commit this, is that something like
> 0.01% of the users of array_agg and string_agg will be pleased, another
> maybe 20% will be unaffected because they wrote ORDER BY which prevents
> parallel aggregation, and the remaining 80% will scream because we broke
> their queries.  Telling them they should've written ORDER BY isn't going
> to cut it, IMO, when the benefit of that breakage will accrue only to some
> very tiny fraction of use-cases.
> 

Isn't the ordering unreliable *already*? It depends on ordering of
tuples on the input. So if the table is scanned by index scan or
sequential scan, that will affect the array_agg/string_agg results. If
the input is a join, it's even more volatile.

IMHO it's not like we're making the ordering unpredictable - it's been
like that since forever.

Also, how is this different from ORDER BY clause? If a user does not
specify an ORDER BY clause, I don't think we'd care very much about
changes to output ordering due to plan changes, for example.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2018-03-26 Thread Mark Rofail
On 3/7/18 5:43 AM, Alvaro Herrera wrote:

> I searched for the new GIN operator so that I
> could brush it up for commit ahead of the rest -- only to find out that
> it was eviscerated from the patch between v5 and v5.1.
>
The latest version of the patch which contained the new GIN operator is
version `Array-ELEMENT-foreign-key-v6.patch`,
which works fine and passed all the regression tests at the time
(2018-01-21). We abandoned the GIN operator since it couldn't follow the
same logic as the rest of GIN operators use since it operates on a Datum
not an array. Not because of any error.

just as Andreas said

On Wed, Jan 31, 2018 at 1:52 AM, Andreas Karlsson wrote:

> The issue I see is that
> ginqueryarrayextract() needs to make a copy of the search key but to do
> so it needs to know the type of anyelement (to know if it needs to
> detoast, etc). But there is as far as I can tell no way to check the
> type of anyelement in this context.
>
>
If there is any way to obtain a copy of the datum I would be more than
happy to integrate the GIN operator into the patch.

Best.
Mark Rofail


Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic
Hi Andres,

> 
> At least I do *NOT* want many protocols in core. We've a hard enough
> time to keep up with integrating patches and maintenance to not just
> willy nilly integrate multiple new features with unclear lifetimes.

Admire your effort in applying all these patches—this commitfest thing looks 
frenetic now that I’m subscribed to the mailing list. Can only guess the effort 
required on the part of a few of you to study and triage everything. Respect.

Actually, I don’t advocate multiple protocols in core. But the exercise of 
considering one will help generalize the architecture enough to make all 
protocols pluggable. 

The most interesting part for me is working out content negotiation—I think 
being able to package data in new ways will be super-interesting.

> 
> *NONE* of the interesting problems are solved by HTTP2. You *still*
> need a full blown protocol ontop of it. So no, this doesn't change that.

If you had to nominate only one of those problems, which one would you consider 
the most interesting?


Thanks for chiming in, really appreciate your time,
Damir





Re: Proposal: http2 wire format

2018-03-26 Thread Vladimir Sitnikov
Damir>Wouldn’t that be protocol semantics? Framing is already taken care of
by the wire protocol.

Apparently I'm using the wrong word. I do mean protocol semantics.

Damir>But can you pull off grpc.. Would it be the only protocol supported?

Of course there will be lots of "older clients"
For instance, pgjdbc has been supporting v2 and v3 for quite a while.
Now it supports just v3.

Damir>Can you imagine the reaction and discussion if I came up with this?

I would +1 for that :)

Damir>how can I do something about the status quo of FEBE protocol that
would be defensible in front of the Postgres community?” What would be your
answer?

I would definitely check if frameworks like GRPC, Apache Thrift, and
similar are suitable.

Damir>What if I wanted JSON or CSV returned

That is not up to the protocol. It is not up to the backend.
It looks like a task for the application since CSV could mean lots of stuff
(e.g. commas vs tabs vs quotes etc). You don't want to integrate all that
stuff in core.

Damir>Wouldn’t you need to first solve the problem of content negotiation?

Currently content negotiation is done at application level (e.g. at bind
message). I don't think it makes sense to negotiate the content for each
and every protocol message. It does sound like an over-engineering.


Damir>Wouldn’t HTTP2 framing still allow prepared statements and cursors?

It will require to implement parse/bind/execute "protocol messages" again
and again.

Damir>I’m just proposing a layer under it that gets rid of a lot of pain.

The thing is it is not clear whose pain are you treating.
Application developers just use drivers, so they don't care if there's
HTTP/2 under the covers.
Driver developers have more-or-less tested v3 implementations, so they
don't have that pain of "implementing v3 parser".
PostgreSQL core developers ..., well, you know.

Vladimir


Re: Parallel Aggregates for string_agg and array_agg

2018-03-26 Thread Tom Lane
David Rowley  writes:
> [ combinefn_for_string_and_array_aggs_v7.patch ]

I spent a fair amount of time hacking on this with intent to commit,
but just as I was getting to code that I liked, I started to have second
thoughts about whether this is a good idea at all.  I quote from the fine
manual:

The aggregate functions array_agg, json_agg, jsonb_agg,
json_object_agg, jsonb_object_agg, string_agg, and xmlagg, as well as
similar user-defined aggregate functions, produce meaningfully
different result values depending on the order of the input
values. This ordering is unspecified by default, but can be controlled
by writing an ORDER BY clause within the aggregate call, as shown in
Section 4.2.7. Alternatively, supplying the input values from a sorted
subquery will usually work ...

I do not think it is accidental that these aggregates are exactly the ones
that do not have parallelism support today.  Rather, that's because you
just about always have an interest in the order in which the inputs get
aggregated, which is something that parallel aggregation cannot support.

I fear that what will happen, if we commit this, is that something like
0.01% of the users of array_agg and string_agg will be pleased, another
maybe 20% will be unaffected because they wrote ORDER BY which prevents
parallel aggregation, and the remaining 80% will scream because we broke
their queries.  Telling them they should've written ORDER BY isn't going
to cut it, IMO, when the benefit of that breakage will accrue only to some
very tiny fraction of use-cases.

In short, I think we ought to reject this.

Just in case I'm outvoted, attached is what I'd gotten done so far.
The main noncosmetic changes I'd made were to improve the caching logic
(it's silly to set up a lookup cache and then not cache the fmgr_info
lookup) and to not cheat quite as much on the StringInfo passed down to
the element typreceive function.  There isn't any other place, I don't
think, where we don't honor the expectation that StringInfos have trailing
null bytes, and some places may depend on it --- array_recv does.

The main thing that remains undone is to get some test coverage ---
AFAICS, none of these new functions get exercised in the standard
regression tests.
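
Something along these lines is the sort of coverage I have in mind (just a
sketch, assuming a hypothetical table t(x int) with enough rows to go
parallel):

    SET parallel_setup_cost = 0;
    SET parallel_tuple_cost = 0;
    SET min_parallel_table_scan_size = 0;

    EXPLAIN (COSTS OFF)
      SELECT array_agg(x), string_agg(x::text, ',') FROM t;

    -- order-independent checks on the results
    SELECT cardinality(array_agg(x)), length(string_agg(x::text, ',')) FROM t;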

I'm also a bit unhappy that the patch introduces code into
array_userfuncs.c that knows everything there is to know about the
contents of ArrayBuildState and ArrayBuildStateArr.  Previously that
knowledge was pretty well localized in arrayfuncs.c.  I wonder if it'd
be better to move the new combinefuncs and serial/deserial funcs into
arrayfuncs.c.  Or perhaps export primitives from there that could be
used to build them.

regards, tom lane

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7b1a85f..faa4939 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** NULL baz(3 rows)
*** 13358,13364 

 array of the argument type

!   No
input values, including nulls, concatenated into an array
   
  
--- 13358,13364 

 array of the argument type

!   Yes
input values, including nulls, concatenated into an array
   
  
*** NULL baz(3 rows)
*** 13372,13378 

 same as argument data type

!   No
input arrays concatenated into array of one higher dimension
 (inputs must all have same dimensionality,
  and cannot be empty or NULL)
--- 13372,13378 

 same as argument data type

!   Yes
input arrays concatenated into array of one higher dimension
 (inputs must all have same dimensionality,
  and cannot be empty or NULL)
*** NULL baz(3 rows)
*** 13633,13639 

 same as argument types

!   No
input values concatenated into a string, separated by delimiter
   
  
--- 13633,13639 

 same as argument types

!   Yes
input values concatenated into a string, separated by delimiter
   
  
diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c
index cb7a6b6..0007a28 100644
*** a/src/backend/utils/adt/array_userfuncs.c
--- b/src/backend/utils/adt/array_userfuncs.c
***
*** 13,24 
--- 13,44 
  #include "postgres.h"
  
  #include "catalog/pg_type.h"
+ #include "libpq/pqformat.h"
  #include "common/int.h"
  #include "utils/array.h"
+ #include "utils/datum.h"
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
  #include "utils/typcache.h"
  
+ /*
+  * SerialIOData
+  *		Used for caching element-type data in array_agg_serialize
+  */
+ typedef struct SerialIOData
+ {
+ 	FmgrInfo	typsend;
+ } SerialIOData;
+ 
+ /*
+  * DeserialIOData
+  *		Used for caching element-type data in array_agg_deserialize
+  */
+ typedef struct 

Re: Why does load_external_function() return PGFunction?

2018-03-26 Thread Andres Freund
On 2018-03-26 11:31:36 -0700, Andres Freund wrote:
> On 2018-03-26 11:14:03 -0400, Robert Haas wrote:
> > On Sat, Mar 24, 2018 at 4:33 PM, Andres Freund  wrote:
> > > I don't think it really changes the need, but it's worthwhile to note
> > > that gcc-8 warns about this now:
> > > /home/andres/src/postgresql/src/backend/postmaster/bgworker.c: In 
> > > function ‘LookupBackgroundWorkerFunction’:
> > > /home/andres/src/postgresql/src/backend/postmaster/bgworker.c:1246:9: 
> > > warning: cast between incompatible function types from ‘PGFunction’ {aka 
> > > ‘long unsigned int (*)(struct FunctionCallInfoData *)’} to ‘void 
> > > (*)(Datum)’ {aka ‘void (*)(long unsigned int)’} [-Wcast-function-type]
> > >   return (bgworker_main_type)
> > 
> > That's probably going to mean we need, or at least want, to do
> > something about this at some point.  Warning-free compiles are
> > desirable.
> 
> Right. We can suppress those if we want, either by adding another cast,
> or by adding -Wno-cast-function-type. The latter wouldn't be entirely a
> bad idea, it seems to be a warning of very limited benefit.

I dug up a thread about the introduction of the warning:
https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00423.html

Sounds like we should add something like
typedef void (*GenericFuncPtr) (void);
or such? Similar to what Tom proposed upthread.
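
To make that concrete, here is a minimal standalone sketch (the names
lookup_function/my_worker_main and the uintptr_t stand-in for Datum are
mine, purely illustrative, not actual backend code): the dynamically
looked-up symbol travels as a GenericFuncPtr and is cast to the concrete
signature at the call site.  gcc treats void (*)(void) as a wildcard for
-Wcast-function-type purposes, so neither cast triggers the warning.

#include <stdint.h>
#include <stdio.h>

typedef void (*GenericFuncPtr) (void);			/* generic carrier type */
typedef void (*bgworker_main_type) (uintptr_t);	/* uintptr_t stands in for Datum here */

static void
my_worker_main(uintptr_t arg)
{
	printf("worker got %lu\n", (unsigned long) arg);
}

/* stands in for load_external_function() / LookupBackgroundWorkerFunction() */
static GenericFuncPtr
lookup_function(void)
{
	return (GenericFuncPtr) my_worker_main;
}

int
main(void)
{
	bgworker_main_type fn = (bgworker_main_type) lookup_function();

	fn(42);
	return 0;
}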

- Andres



Re: Online enabling of checksums

2018-03-26 Thread Tomas Vondra
Hi,

I see enable_data_checksums() does this:

if (cost_limit <= 0)
ereport(ERROR,
(errmsg("cost limit must be a positive value")));

Is there a reason not to allow -1 (no limit), just like for vacuum_cost?
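
Something like the following is what I have in mind -- just a sketch to
illustrate the question, not the patch's actual code, and the error message
wording is made up:

	/* accept -1 as "no limit", keep rejecting 0 and other negative values */
	if (cost_limit != -1 && cost_limit <= 0)
		ereport(ERROR,
				(errmsg("cost limit must be a positive value or -1")));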


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic

> On 26 Mar 2018, at 18:09, Vladimir Sitnikov wrote:
> 
> Damir>Postgres doesn’t know about grpc, s
> 
> I'm afraid you are missing the point.
> I would say PostgreSQL doesn't know about HTTP/2.
> It is the same as "PostgreSQL doesn't know about grpc".
> 
> Here's a quote from your pg_h2 repo:
> >What we need is to really build a request object and correctly extract
> > the full payload and parameters from the request. For example,
> >maybe we want to implement a QUERY method, similar to POST or PUT,
> > and pass the query text as the body of the request, with parameters
> > in the query string or in the headers
> 
> It basically suggests to implement own framing on top of HTTP/2.

Wouldn’t that be protocol semantics? Framing is already taken care of by the 
wire protocol.

> 
> When I say GRPC, I mean "implement PostgreSQL-specific protocol via GRPC 
> messages".
> 
> Let's take current message formats: 
> https://www.postgresql.org/docs/current/static/protocol-message-formats.html
> If one defines those message formats via GRPC, then GRPC would autogenerate 
> parsers and serializers for lots of languages "for free".
> 
> For instance
> Query (F)
>  Byte1('Q') Identifies the message as a simple query.
>  Int32 Length of message contents in bytes, including self.
>  String The query string itself.
> 
> can be defined via GPRC as
> message Query {
>   string queryText = 1;
> }
> 
> This is trivial to read, trivial to write, trivial to maintain, and it 
> automatically generates parsers/generators for lots of languages.
> 

I agree with you 100% here. But can you pull off grpc without HTTP2 framing in 
place? Would it be the only protocol supported? What if I wanted JSON or CSV 
returned, or just plain old Postgres v3 binary format, since I already have the 
parser written for it? Wouldn’t you need to first solve the problem of content 
negotiation?

The HTTP2 proposal is pragmatically a much smaller chunk, and it’s already hard to 
explain. Can you imagine the reaction and discussion if I came up with this?

In fact, if you ask yourself the question “how can I do something about the 
status quo of the FEBE protocol that would be defensible in front of the Postgres 
community?”, what would be your answer? 

> 
> Parsing of the current v3 protocol has to be reimplemented for each and every 
> language, and it would be pain to implement parsing for v4.
> Are you going to create "http/2" clients for Java, C#, Ruby, Swift, Dart, 
> etc, etc?
> 
> I am not saying that a mere redefinition of v3 messages as GRPC would do the 
> trick. I am saying that you'd better consider frameworks that would enable 
> transparent implementation of client libraries.
> 
> Damir>and will talk to any HTTP2 conforming client
> 
> I do not see where are you heading to.

Getting rid of having to write a framing parser in every client language?

> Is "curl as PostgreSQL client" one of the key objectives for you?

No, it’s just something that is available right now. The point is to demonstrate 
an increased ability to get the data out, without having to write access code over 
and over, and then lug that along whenever you install some data processing piece. 
It's the same kind of motivation that makes you think grpc is the answer; I’m just 
proposing a layer under it that gets rid of a lot of pain.

> True clients (the ones that are used by the majority of applications) should 
> support things like "prepared statements", "data types", "cursors" (resultset 
> streaming), etc. I can hardly imagine a case when one would use "curl" and 
> operate with prepared statements.

Wouldn’t HTTP2 framing still allow prepared statements and cursors?

> I think psql is pretty good client, so I see no point in implementing HTTP/2 
> for a mere reason of using curl to fetch data from the DB.

> 
> Vladimir




Re: JIT compiling with LLVM v12.2

2018-03-26 Thread Andres Freund
Hi,

On 2018-03-25 00:07:11 +1300, Thomas Munro wrote:
> I spotted a couple of typos and some very minor coding details -- see
> please see attached.

Thanks, applying 0001 in a bit.


> From 648e303072c77e781eca2bb06f488f6be9ccac84 Mon Sep 17 00:00:00 2001
> From: Thomas Munro 
> Date: Sat, 24 Mar 2018 23:12:40 +1300
> Subject: [PATCH 2/2] Minor code cleanup for llvmjit_wrap.cpp.
> 
> llvm::sys::getHostCPUName()'s result is a llvm::StringRef.  Its data() member
> function doesn't guarantee a null-terminated result, so we'd better jump
> through an extra hoop to get a C string.

Hm, I checked, and it's fine, I'm not enthusiastic about this...


> It seems better to use LLVMCreateMessage() rather than strdup() to allocate
> the copy returned by LLVMGetHostCPUFeatures() and LLVMGetHostCPUName(),
> since the contract is that the caller should free it with
> LLVMDisposeMessage().  While we can see that LLVMCreateMessage() and
> LLVMDisposeMessage() are currently just wrappers for strdup() and free(),
> using them symmetrically seems like a good idea for future Windows support,
> where DLLs can be using different heap allocators (the same reason we provide
> PQfreemem in libpq).

I just kept it similar to nearby functions in the LLVM code.


> Fix brace style.

I tried to keep this as it's submitted to LLVM; I hope we can get rid of
them for newer versions soon...  I think I'll update them to be exactly
the same as soon as the upstream patch is applied.

Greetings,

Andres Freund



Re: Index scan prefetch?

2018-03-26 Thread Justin Pryzby
On Mon, Mar 26, 2018 at 12:43:02PM +0300, Konstantin Knizhnik wrote:
> Hi, hackers.
> 
> I was faced with the following bad performance use case with Postgres: there
> is a huge append-only table with serial key (ID)
> which is permanently appended using multithreaded pgloader.

I think this could be similar to the problem I reported here:
https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com#20160524173914.ga11...@telsasoft.com

The analysis at the time was that, due to "repeated keys", a btree index on the
timestamp column had non-consecutive heap TIDs (btree insertion uses random
early insertion point to avoid superlinear lookup cost during such insertions).

But, our case also involved multiple child processes simultaneously inserting
into same table, and I wonder if "repeated keys" were more or less unrelated to
the problem.  The behavior is maybe caused easily by simultaneous insertions
"clustered" around the same target: for us: now(), for you: nextval().

> But now effective_io_concurrency parameter is applicable only for bitmap
...
> Will it be useful to support it also for index scan?
> Or there are some other ways to address this problem?

Does your case perform well with bitmap heap scan (I mean bitmap scan of the
single index)?  It seems to me that prefetch wouldn't help, as it would just
incur the same random cost you're already seeing; the solution may be to choose
another plan (bitmap) with sequential access to enable read-ahead.

Also: Claudio mentioned here that bitmap prefetch can cause the kernel to avoid
its own readahead, negatively affecting some queries:
https://www.postgresql.org/message-id/flat/8fb758a1-d7fa-4dcc-fb5b-07a992ae6a32%40gmail.com#20180207054227.ge17...@telsasoft.com

What's the pg_stats "correlation" for the table column with index being
scanned?  How many tuples?  Would you send explain(analyze,buffers) for the
problem query, and with SET enable_bitmapscan=off; ?

Justin



Re: WIP: Covering + unique indexes.

2018-03-26 Thread Peter Geoghegan
On Mon, Mar 26, 2018 at 3:10 AM, Alexander Korotkov
 wrote:
> So, as I get you're proposing to introduce INDEX_ALT_TID_MASK flag
> which would indicate that we're storing something special in the t_tid
> offset.  And that should help us not only for covering indexes, but also for
> further btree enhancements including suffix truncation.  What exactly do
> you propose to store into t_tid offset when INDEX_ALT_TID_MASK flag
> is set?  Is it number of attributes in this particular index tuple?

Yes. I think that once INDEX_ALT_TID_MASK is available, we should
store the number of attributes in that particular "separator key"
tuple (which has undergone suffix truncation), and always work off of
that. You could then have status bits in offset as follows:

* 1 bit that represents that this is a "separator key" IndexTuple
(high key or internal IndexTuple). Otherwise, it's a leaf IndexTuple
with an ordinary heap TID. (When INDEX_ALT_TID_MASK isn't set, it's
the same as today.)

* 3 reserved bits. I think that one of these bits can eventually be
used to indicate that the internal IndexTuple actually has a
"normalized key" representation [1], which seems like the best way to
do suffix truncation, long term. I think that we should support simple
suffix truncation, of the kind that this patch implements, alongside
normalized key suffix truncation. We need both for various reasons
[2].

Not sure what the other two flag bits might be used for, but they seem
worth having.

* 12 bits for the number of attributes, which should be more than
enough, even when INDEX_MAX_KEYS is significantly higher than 32. A
static assertion can keep this safe when INDEX_MAX_KEYS is set
ridiculously high.

I think that this scheme is future-proof. Maybe you have additional
ideas on the representation. Please let me know what you think.
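
To make the proposed layout concrete, here is a rough standalone sketch of
the packing (the macro names and exact bit assignments are mine and purely
illustrative; nothing here is settled):

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 16-bit t_tid offset field, valid only when INDEX_ALT_TID_MASK is set */
#define ALT_TID_IS_SEPARATOR	0x8000	/* separator key: high key or internal tuple */
#define ALT_TID_RESERVED_MASK	0x7000	/* 3 reserved status bits */
#define ALT_TID_NATTS_MASK		0x0FFF	/* 12 bits: number of attributes present */

static inline uint16_t
alt_tid_pack(bool is_separator, uint16_t natts)
{
	/* in the backend this would be a static assertion against INDEX_MAX_KEYS */
	assert(natts <= ALT_TID_NATTS_MASK);
	return (uint16_t) ((is_separator ? ALT_TID_IS_SEPARATOR : 0) |
					   (natts & ALT_TID_NATTS_MASK));
}

static inline bool
alt_tid_is_separator(uint16_t offset)
{
	return (offset & ALT_TID_IS_SEPARATOR) != 0;
}

static inline uint16_t
alt_tid_get_natts(uint16_t offset)
{
	return (uint16_t) (offset & ALT_TID_NATTS_MASK);
}

int
main(void)
{
	uint16_t	packed = alt_tid_pack(true, 12);

	assert(alt_tid_is_separator(packed));
	assert(alt_tid_get_natts(packed) == 12);
	return 0;
}

Tuples without INDEX_ALT_TID_MASK set would keep today's interpretation of
the offset, i.e. an ordinary heap TID.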

When we eventually add optimizations that affect IndexTuples on the
leaf level, we can start using the block number (bi_hi + bi_lo)
itself, much like GIN posting lists. No need to further consider that
(the leaf level optimizations) today, because using block number
provides us with many more bits.

In internal page items, the block number is always a block number, so
internal IndexTuples are rather like GIN posting tree pointers in the
main entry tree (its leaf level) -- a conventional item pointer block
number is used, alongside unconventional use of the offset field,
where there are 16 bits available because no real offset is required.

[1] 
https://wiki.postgresql.org/wiki/Key_normalization#Optimizations_enabled_by_key_normalization
[2] 
https://wiki.postgresql.org/wiki/Key_normalization#How_big_can_normalized_keys_get.2C_and_is_it_worth_it.3F
-- 
Peter Geoghegan



Re: Re: csv format for psql

2018-03-26 Thread Pavel Stehule
2018-03-26 20:26 GMT+02:00 Daniel Verite :

> Pavel Stehule wrote:
>
> > b) the list of pset options is bloating - every possible new format can
> > introduce fieldsep_X option
>
> What new format?
> The usefulness of fieldsep does not extend outside of xSV, and it's
> no surprise that there has been no other use for a fieldsep-like
> variable until now, with the other formats supported in psql.
>

I can imagine a format based on \t as the field separator, or some other
invisible characters.


>
> In fact it's even absurd for any format to use a non-fixed separator
> at a place that is key for being parsed unambiguously.
>
> We could even support only the comma and make it non-configurable
> based on the fact it's Comma-Separated-Values, not
> Whatever-Separated-Values, except that won't do much
> to serve the users interests, as the reality is that
> people use various separators.
>

I don't think that allowing only a comma as the separator is the solution.

Regards

Pavel


>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>


Re: committing inside cursor loop

2018-03-26 Thread Peter Eisentraut
On 3/19/18 20:40, Peter Eisentraut wrote:
> On 3/14/18 08:05, Ildus Kurbangaliev wrote:
>>> The ROLLBACK call in the first loop iteration undoes the UPDATE
>>> command that drives the loop.  Is it then sensible to continue the
>>> loop?
>>>
>> I think that in the first place ROLLBACK was prohibited because of cases
>>> like this, but it seems safe to continue the loop when portal
>> strategy is PORTAL_ONE_SELECT.
> 
> Maybe, but even plain SELECT commands can have side effects.

Here is an updated patch that supports the ROLLBACK case as well, and
prevents holding portals with a strategy other than PORTAL_ONE_SELECT.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 1627af796da44d03f2d0d739250621be1c3244a3 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 26 Mar 2018 15:40:49 -0400
Subject: [PATCH v2] PL/pgSQL: Allow committing inside cursor loop

Previously, committing or aborting inside a cursor loop was prohibited
because that would close and remove the cursor.  To allow that,
automatically convert such cursors to holdable cursors so they survive
commits or rollbacks.  Portals now have a new state "auto-held", which
means they have been converted automatically from pinned.  An auto-held
portal is kept on transaction commit or rollback, but is still removed
when returning to the main loop on error.
---
 doc/src/sgml/plpgsql.sgml  |  37 +-
 src/backend/tcop/postgres.c|   2 +
 src/backend/utils/mmgr/portalmem.c | 145 +
 src/include/utils/portal.h |   4 +
 .../plpgsql/src/expected/plpgsql_transaction.out   | 108 ++-
 src/pl/plpgsql/src/pl_exec.c   |  18 +--
 src/pl/plpgsql/src/sql/plpgsql_transaction.sql |  67 ++
 7 files changed, 333 insertions(+), 48 deletions(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index 7ed926fd51..2043e644a9 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -3496,8 +3496,41 @@ Transaction Management

 

-A transaction cannot be ended inside a loop over a query result, nor
-inside a block with exception handlers.
+Special considerations apply to cursor loops.  Consider this example:
+
+CREATE PROCEDURE transaction_test2()
+LANGUAGE plpgsql
+AS $$
+DECLARE
+r RECORD;
+BEGIN
+FOR r IN SELECT * FROM test2 ORDER BY x LOOP
+INSERT INTO test1 (a) VALUES (r.x);
+COMMIT;
+END LOOP;
+END;
+$$;
+
+CALL transaction_test2();
+
+Normally, cursors are automatically closed at transaction commit.
+However, a cursor created as part of a loop like this is automatically
+converted to a holdable cursor by the first COMMIT or
+ROLLBACK.  That means that the cursor is fully
+evaluated at the first COMMIT or
+ROLLBACK rather than row by row.  The cursor is still
+removed automatically after the loop, so this is mostly invisible to the
+user.
+   
+
+   
+Transaction commands are not allowed in cursor loops driven by commands
+that are not read-only (for example UPDATE
+... RETURNING).
+   
+
+   
+A transaction cannot be ended inside a block with exception handlers.

   
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f7aa4a7484..6fc1cc272b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3938,6 +3938,8 @@ PostgresMain(int argc, char *argv[],
if (am_walsender)
WalSndErrorCleanup();
 
+   PortalErrorCleanup();
+
/*
 * We can't release replication slots inside AbortTransaction() 
as we
 * need to be able to start and abort transactions while having 
a slot
diff --git a/src/backend/utils/mmgr/portalmem.c 
b/src/backend/utils/mmgr/portalmem.c
index 75a6dde32b..74a1040d8a 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -620,6 +620,36 @@ PortalHashTableDeleteAll(void)
}
 }
 
+/*
+ * "Hold" a portal.  Prepare it for access by later transactions.
+ */
+static void
+HoldPortal(Portal portal)
+{
+   /*
+* Note that PersistHoldablePortal() must release all resources
+* used by the portal that are local to the creating transaction.
+*/
+   PortalCreateHoldStore(portal);
+   PersistHoldablePortal(portal);
+
+   /* drop cached plan reference, if any */
+   PortalReleaseCachedPlan(portal);
+
+   /*
+* Any resources belonging to the portal will be released in the
+* upcoming transaction-wide cleanup; the portal will no longer
+* have its own resources.
+*/
+   portal->resowner = NULL;
+
+   /*
+* Having successfully exported the holdable cursor, mark it as
+* not belonging to this transaction.

Re: Why does load_external_function() return PGFunction?

2018-03-26 Thread Tom Lane
Andres Freund  writes:
> On 2018-02-06 15:43:29 -0500, Tom Lane wrote:
>> void* isn't necessarily compatible with function pointers --- there are
>> platforms where they're physically different widths, though possibly
>> you'd never get PG to run on such hardware anyway.

> Btw, given that we store function pointers in datums, that ship has
> sailed a long while ago, no?

We do?  Data pointers, sure, but I can't think of any use of the other.

regards, tom lane



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-03-26 Thread Tomas Vondra


On 03/26/2018 09:01 PM, Dean Rasheed wrote:
> On 18 March 2018 at 23:57, Tomas Vondra  wrote:
>> Attached is an updated version of the patch series, addressing issues
>> pointed out by Alvaro.
> 
> I've just been reading the new code in
> statext_clauselist_selectivity() and mcv_clauselist_selectivity(), and
> I'm having a hard time convincing myself that it's correct.
> 
> This code in statext_clauselist_selectivity() looks a bit odd:
> 
> /*
>  * Evaluate the MCV selectivity. See if we got a full match and the
>  * minimal selectivity.
>  */
> if (stat->kind == STATS_EXT_MCV)
> s1 = mcv_clauselist_selectivity(root, stat, clauses, varRelid,
> jointype, sjinfo, rel,
> , _lowsel);
> 
> /*
>  * If we got a full equality match on the MCV list, we're done (and the
>  * estimate is likely pretty good).
>  */
> if (fullmatch && (s1 > 0.0))
> return s1;
> 
> /*
>  * If it's a full match (equalities on all columns) but we haven't found
>  * it in the MCV, then we limit the selectivity by frequency of the last
>  * MCV item. Otherwise reset it to 1.0.
>  */
> if (!fullmatch)
> mcv_lowsel = 1.0;
> 
> return Min(s1, mcv_lowsel);
> 
> So if fullmatch is true and s1 is greater than 0, it will return s1. 
> If fullmatch is true and s1 equals 0, it will return Min(s1, 
> mcv_lowsel) which will also be s1. If fullmatch is false, mcv_lowsel 
> will be set to 1 and it will return Min(s1, mcv_lowsel) which will 
> also be s1. So it always just returns s1, no? Maybe there's no point 
> in computing fullmatch.
> 

Hmmm, I think you're right. It probably got broken in the last rebase,
when I moved a bunch of code from the histogram part to the MCV one.
I'll take a look.

> Also, wouldn't mcv_lowsel potentially be a significant overestimate
> anyway? Perhaps 1 minus the sum of the MCV frequencies might be
> closer, but even that ought to take into account the number of
> distinct values remaining, although that information may not always be
> available.
> 

That is definitely true. 1 minus the sum of the MCV frequencies, and I
suppose we might even improve that if we had some ndistinct estimate on
those columns to compute an average.

> Also, just above that, in statext_clauselist_selectivity(), it
> computes the list stat_clauses, then doesn't appear to use it
> anywhere. I think that would have been the appropriate thing to pass
> to mcv_clauselist_selectivity(). Otherwise, passing unrelated clauses
> into mcv_clauselist_selectivity() will cause it to fail to find any
> matches and then underestimate.
> 

Will check.

> I've also come across a few incorrect/out-of-date comments:
> 
> /*
>  * mcv_clauselist_selectivity
>  *  Return the estimated selectivity of the given clauses using MCV list
>  *  statistics, or 1.0 if no useful MCV list statistic exists.
>  */
> 
> -- I can't see any code path that returns 1.0 if there are no MCV
> stats. The last part of that comment is probably more appropriate to
> statext_clauselist_selectivity()
> 
> 
> /*
>  * mcv_update_match_bitmap
>  * [snip]
>  * The function returns the number of items currently marked as 'match', and
>  * ...
> 
> -- it doesn't seem to return the number of items marked as 'match'.
> 
> Then inside that function, this comment is wrong (copied from the
> preceding comment):
> 
> /* AND clauses assume nothing matches, initially */
> memset(bool_matches, STATS_MATCH_FULL, sizeof(char) *
> mcvlist->nitems);
> 
> Still reading...
> 
> Regards,
> Dean
> 

Yeah, sorry about that - I forgot to fix those comments after removing
the match counting to simplify the patches.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] MERGE SQL Statement for PG11

2018-03-26 Thread Simon Riggs
On 26 March 2018 at 17:52, Robert Haas  wrote:
> On Mon, Mar 26, 2018 at 12:16 PM, Simon Riggs  wrote:
>> On 26 March 2018 at 16:09, Robert Haas  wrote:
>>> On Mon, Mar 26, 2018 at 5:53 AM, Simon Riggs  wrote:
 Since we now have MVCC catalog scans, all the name lookups are
 performed using the same snapshot so in the above scenario the newly
 created object would be invisible to the second name lookup.
>>>
>>> That's not true, because each lookup would be performed using a new
>>> snapshot -- not all under one snapshot.
>>
>> You're saying we take a separate snapshot for each table we lookup?
>> Sounds weird to me.
>
> I'm saying we take a separate snapshot for each and every catalog
> lookup, except when we know that no catalog changes can have occurred.
> See the commit message for 568d4138c646cd7cd8a837ac244ef2caf27c6bb8.
> If you do a lookup in pg_class and 3 lookups in pg_attribute each of
> the 4 can be done under a different snapshot, in the worst case.
> You're not the first person to believe that the MVCC catalog scan
> patch fixes that problem, but as the guy who wrote it, it definitely
> doesn't.  What that patch fixed was, prior to that patch, a catalog
> scan might find the WRONG NUMBER OF ROWS, like you might do a lookup
> against a unique index for an object that existed and, if the row was
> concurrently updated, you might find 0 rows or 2 rows instead of 1
> row.  IOW, it guaranteed that we used a consistent snapshot for each
> individual lookup, not a consistent snapshot for the whole course of a
> command.

That all makes sense, thanks for explaining.

I spent a few more minutes, going "but", "but" though I can now see
good reasons for everything to work this way.

>> So this error could happen in SELECT, UPDATE, DELETE or INSERT as well.
>>
>> Or you see this as something related specifically to MERGE, if so how?
>> Please explain what you see.
>
> As I said before, the problem occurs if the same command looks up the
> same table name in more than one place.  There is absolutely nothing
> to guarantee that we get the same answer every time.

> As far as I
> know, the proposed MERGE patch has that issue and existing DML commands
> don't; but someone else may have better information.

I will look deeper and report back.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-03-26 Thread Tomas Vondra

On 03/26/2018 06:21 PM, Dean Rasheed wrote:
> On 26 March 2018 at 14:08, Tomas Vondra  wrote:
>> On 03/26/2018 12:31 PM, Dean Rasheed wrote:
>>> A wider concern I have is that I think this function is trying to be
>>> too clever by only resetting selected stats. IMO it should just reset
>>> all stats unconditionally when the column type changes, which would
>>> be consistent with what we do for regular stats.
>>>
>> The argument a year ago was that it's more plausible that the semantics
>> remains the same. I think the question is how the type change affects
>> precision - had the type change in the opposite direction (int to real)
>> there would be no problem, because both ndistinct and dependencies would
>> produce the same statistics.
>>
>> In my experience people are far more likely to change data types in a
>> way that preserves precision, so I think the current behavior is OK.
> 
> Hmm, I don't really buy that argument. Altering a column's type
> allows the data in it to be rewritten in arbitrary ways, and I don't
> think we should presume that the statistics will still be valid just
> because the user *probably* won't do something that changes the data
> much.
> 

Maybe, I can only really speak about my experience, and in those cases
it's usually "the column is an INT and I need a FLOAT". But you're right
it's not guaranteed to be like that, perhaps the right thing to do is
resetting the stats.

Another reason to do that might be consistency - resetting just some of
the stats might be surprising for users. And we're already resetting
per-column stats on that column, so users are running ANALYZE anyway.

BTW in my response I claimed this:

>
> The other reason is that when reducing precision, it generally
> enforces the dependency (you can't violate functional dependencies or
> break grouping by merging values). So you will have stale stats with
> weaker dependencies, but it's still better than not having any.

That's actually bogus. For example, for functional dependencies, it matters
on which side of the dependency we reduce precision. With an
(a->b) dependency, reducing the precision of "b" does indeed strengthen it,
but reducing the precision of "a" weakens it. So I take that back.

So, I'm not particularly opposed to just resetting extended stats
referencing the altered column.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proposal: http2 wire format

2018-03-26 Thread Andres Freund
Hi,

On 2018-03-26 20:36:09 +0200, Damir Simunic wrote:
> If so, I’m not suggesting we get rid of FEBE, but leave it as is and 
> complement it with a widely understood and supported protocol, that in fact 
> takes compatibility way more seriously than FEBE. Just leave v3 frozen. Seems 
> like ultimate backward compatibility, no? Or am I missing something?

Maintaining it forever both in postgres, and in various libraries.


> You likely know every possible use case for Postgres, which makes you
> believe that the status quo is the right way. Or maybe I didn’t flesh
> out my proposal enough for you to give it a chance. Either way, I just
> can’t figure out where would HTTP2 be the same as status quo or a step
> backward compared to FEBE. I can see you’re super-busy and dedicated,
> but if you can find the time to enlighten me beyond just waving the
> “compatibility” and “engineering” banners, I’d appreciate you
> endlessly.

Well, besides vague points you've not elaborated what this is actually
gaining us. From my POV, HTTP2 wouldn't solve any of the interesting
protocol issues since those are one layer above what HTTP2 provides.


> (Oh, it’s my data, too; presently held hostage to the v3 protocol).

What on earth. You can freely migrate off.


> You mention twice loss of control--what exactly is the fear? 

If there's issues with what the library does or how it does it - we
can't fix them ourselves.


> You know what? HTTP2 just might fix it. Getting a new protocol into
> the core will force enough adjustments to the code to open the door
> for the next protocol on the horizon: QUIC, which happens to be UDP
> based, and might just be the ticket. At a minimum it will get
> significantly more people thinking about the possibility of
> reattaching sessions and doing all kinds of other things. Allowing
> multiple protocols is not very different from allowing a multitude of
> pl implementations.

> Help me put HTTP2 in place, and I’ll bet you, within a few months someone 
> will come up with a patch for QUIC. And then someone else will remember your 
> paragraph above and say “hmm, let’s see…"

At least I do *NOT* want many protocols in core. We have a hard enough
time keeping up with integrating patches and maintenance as it is, without
willy-nilly integrating multiple new features with unclear lifetimes.

> 
> > I realize that
> > webservers manage to have pretty lightweight sessions, but that's not a
> > property of the protocol they use, it's a property of their internal
> > architectures.  We can't get there without a massive rewrite of the PG
> > server --- one that would be largely independent of any particular way of
> > representing data on the wire, anyway.
> > 
> 
> A smart outsider might come along, look at an ultra-fast web server,
> then look at Postgres and think, “Hmm, both speak HTTP2, but one is
> blazing fast, the other slow.

Err. What does http2 or the v3 protocol have to do with any of this? The
performance issues v3 has are all above where http2 would be, so ...?


> There are three alternatives to the proposal: do nothing, make a few
> anemic changes to v3, or start a multiyear discussion on the design of
> the next protocol. And you’ll still converge to something like HTTP2
> or QUIC.

*NONE* of the interesting problems are solved by HTTP2. You *still*
need a full blown protocol ontop of it. So no, this doesn't change that.


Greetings,

Andres Freund



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-03-26 Thread Dean Rasheed
On 18 March 2018 at 23:57, Tomas Vondra  wrote:
> Attached is an updated version of the patch series, addressing issues
> pointed out by Alvaro.

I've just been reading the new code in
statext_clauselist_selectivity() and mcv_clauselist_selectivity(), and
I'm having a hard time convincing myself that it's correct.

This code in statext_clauselist_selectivity() looks a bit odd:

/*
 * Evaluate the MCV selectivity. See if we got a full match and the
 * minimal selectivity.
 */
if (stat->kind == STATS_EXT_MCV)
s1 = mcv_clauselist_selectivity(root, stat, clauses, varRelid,
jointype, sjinfo, rel,
, _lowsel);

/*
 * If we got a full equality match on the MCV list, we're done (and the
 * estimate is likely pretty good).
 */
if (fullmatch && (s1 > 0.0))
return s1;

/*
 * If it's a full match (equalities on all columns) but we haven't found
 * it in the MCV, then we limit the selectivity by frequency of the last
 * MCV item. Otherwise reset it to 1.0.
 */
if (!fullmatch)
mcv_lowsel = 1.0;

return Min(s1, mcv_lowsel);

So if fullmatch is true and s1 is greater than 0, it will return s1.
If fullmatch is true and s1 equals 0, it will return Min(s1,
mcv_lowsel) which will also be s1. If fullmatch is false, mcv_lowsel
will be set to 1 and it will return Min(s1, mcv_lowsel) which will
also be s1. So it always just returns s1, no? Maybe there's no point
in computing fullmatch.
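
Or, spelling the three cases out as comments on the quoted code (assuming
s1 and mcv_lowsel are both selectivities in the [0, 1] range):

    if (fullmatch && (s1 > 0.0))
        return s1;                  /* case 1: fullmatch, s1 > 0  -> s1 */

    if (!fullmatch)
        mcv_lowsel = 1.0;

    return Min(s1, mcv_lowsel);     /* case 2: fullmatch, s1 == 0
                                     *         -> Min(0.0, mcv_lowsel) == 0.0 == s1
                                     * case 3: !fullmatch
                                     *         -> Min(s1, 1.0) == s1 */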

Also, wouldn't mcv_lowsel potentially be a significant overestimate
anyway? Perhaps 1 minus the sum of the MCV frequencies might be
closer, but even that ought to take into account the number of
distinct values remaining, although that information may not always be
available.

Also, just above that, in statext_clauselist_selectivity(), it
computes the list stat_clauses, then doesn't appear to use it
anywhere. I think that would have been the appropriate thing to pass
to mcv_clauselist_selectivity(). Otherwise, passing unrelated clauses
into mcv_clauselist_selectivity() will cause it to fail to find any
matches and then underestimate.

I've also come across a few incorrect/out-of-date comments:

/*
 * mcv_clauselist_selectivity
 *  Return the estimated selectivity of the given clauses using MCV list
 *  statistics, or 1.0 if no useful MCV list statistic exists.
 */

-- I can't see any code path that returns 1.0 if there are no MCV
stats. The last part of that comment is probably more appropriate to
statext_clauselist_selectivity()


/*
 * mcv_update_match_bitmap
 * [snip]
 * The function returns the number of items currently marked as 'match', and
 * ...

-- it doesn't seem to return the number of items marked as 'match'.

Then inside that function, this comment is wrong (copied from the
preceding comment):

/* AND clauses assume nothing matches, initially */
memset(bool_matches, STATS_MATCH_FULL, sizeof(char) *
mcvlist->nitems);

Still reading...

Regards,
Dean



Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic

> On 26 Mar 2018, at 15:42, Alvaro Hernandez  wrote:
> 
> 
> 
> On 26/03/18 13:11, Damir Simunic wrote:
>>> On 26 Mar 2018, at 11:13, Vladimir Sitnikov wrote:
>>> 
>>> Damir> * What are the criteria for getting this into the core?
>>> Craig>Mine would be: 
>>> 
>>> +1
>>> 
>>> There's a relevant list as well: 
>>> https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md
>>> 
>> 
>> This is a great addition to the list, thanks!
>> 
>> Damir
>> 
> 
> Hi Damir.
> 
> I'm interested in the idea. However, way before writing a PoC, IMVHO I'd 
> rather write a detailed document including:
> 
> - A brief summary of the main features of HTTP2 and why it might be a good 
> fit for PG (of course there's a lot of doc in the wild about HTTP/2, so just 
> a summary of the main relevant features and an analysis of how it may fit 
> Postgres).
> 
> - A more or less thorough description of how every feature in current 
> PostgreSQL protocol would be implemented on HTTP/2.
> 
> - Similar to the above, but applied to the v4 TODO feature list.
> 
> - A section for connection poolers, as well as auth, as these are very important 
> topics.
> 
> 
> Hope this helps,
> 
> Álvaro
> 

Álvaro, it does help, thanks. This discussion is meant to inform such a document. But 
the topic is such that having a good PoC will move the discussion forward much 
faster. 

Can you help me think through how HTTP2 would impact connection poolers? I 
don’t know much about those.
> -- 
> 
> Alvaro Hernandez
> 
> 
> ---
> OnGres



Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic

> On 26 Mar 2018, at 18:19, Vladimir Sitnikov wrote:
> 
> Tom>But starting from the assumption that HTTP2 solves our problems seems to 
> me to be "Here's a hammer.
> 
> Agree.

Funny you agree with that: for someone with the experience of writing a driver 
and a long list of things you find wrong and frustrating, one would 
expect you to look at how other protocols work, or at least consider that maybe 
the right way is to change something server side.

> 
> Just a side note: if v4 is ever invented I wish client language support
> is considered.
> It does take resources to implement message framing, and data parsing (e.g. 
> int, timestamp, struct, array, ...) for each language independently.

This is a strange statement about framing. Did you know that Go has HTTP2 
support in the standard library? And so does Java. 
https://github.com/http2/http2-spec/wiki/Implementations 


The part I hinted at in the example, but apparently failed to get across, is that 
I’m advocating the best possible client language support. The right way is to 
stop writing drivers and format the data server side. Why parse some obscure 
Postgres-specific binary data types when you can have the database send you the 
data in the serialization format of your client language? Or JSON or protobuf 
or whatever you want. What if your application has data patterns that would 
benefit from being sent over the wire in some specific columnar format? 
Wouldn’t it be cool if you could add that to the server and have all clients 
just work, without being locked in into a language because of its driver?

My point is that you go in steps. Put the foot in the door first, enable 
experimentation and then you’ll get to where you want to be.


> 
> Vladimir 



Re: Why does load_external_function() return PGFunction?

2018-03-26 Thread Andres Freund
Hi,

On 2018-02-06 15:43:29 -0500, Tom Lane wrote:
> void* isn't necessarily compatible with function pointers --- there are
> platforms where they're physically different widths, though possibly
> you'd never get PG to run on such hardware anyway.

Btw, given that we store function pointers in datums, that ship has
sailed a long while ago, no?

Greetings,

Andres Freund



Re: Why does load_external_function() return PGFunction?

2018-03-26 Thread Andres Freund
On 2018-03-26 11:14:03 -0400, Robert Haas wrote:
> On Sat, Mar 24, 2018 at 4:33 PM, Andres Freund  wrote:
> > I don't think it really changes the need, but it's worthwhile to note
> > that gcc-8 warns about this now:
> > /home/andres/src/postgresql/src/backend/postmaster/bgworker.c: In function 
> > ‘LookupBackgroundWorkerFunction’:
> > /home/andres/src/postgresql/src/backend/postmaster/bgworker.c:1246:9: 
> > warning: cast between incompatible function types from ‘PGFunction’ {aka 
> > ‘long unsigned int (*)(struct FunctionCallInfoData *)’} to ‘void 
> > (*)(Datum)’ {aka ‘void (*)(long unsigned int)’} [-Wcast-function-type]
> >   return (bgworker_main_type)
> 
> That's probably going to mean we need, or at least want, to do
> something about this at some point.  Warning-free compiles are
> desirable.

Right. We can suppress those if we want, either by adding another cast,
or by adding -Wno-cast-function-type. The latter wouldn't be entirely a
bad idea, it seems to be a warning of very limited benefit.

Greetings,

Andres Freund



Re: Re: csv format for psql

2018-03-26 Thread Daniel Verite
Pavel Stehule wrote:

> b) the list of pset options is bloating - every possible new format can
> introduce fieldsep_X option

What new format?
The usefulness of fieldsep does not extend outside of xSV, and it's
no surprise that there has been no other use for a fieldsep-like
variable until now, with the other formats supported in psql.

In fact it's even absurd for any format to use a non-fixed separator
at a place that is key for being parsed unambiguously.

We could even support only the comma and make it non-configurable
based on the fact it's Comma-Separated-Values, not
Whatever-Separated-Values, except that won't do much
to serve the users interests, as the reality is that
people use various separators.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

2018-03-26 Thread Claudio Freire
On Mon, Mar 26, 2018 at 2:46 PM, Claudio Freire  wrote:
> On Mon, Mar 26, 2018 at 11:26 AM, Claudio Freire  
> wrote:
>> On Mon, Mar 26, 2018 at 11:19 AM, Tom Lane  wrote:
>>> Claudio Freire  writes:
 On Sat, Mar 24, 2018 at 4:17 PM, Tom Lane  wrote:
> I hadn't paid any attention to this patch previously, so maybe I'm
> missing something ... but this sure seems like a very bizarre approach
> to the problem.  If the idea is to fix the FSM's upper levels after
> vacuuming a known sub-range of the table, why would you not proceed
> by teaching FreeSpaceMapVacuum to recurse only within that sub-range
> of page numbers?  This setup with a threshold seems entirely Rube
> Goldbergian.  It's dependent on a magic threshold number that you can
> only select by trial and error, and it's inevitably going to spend time
> updating segments of the FSM tree that have nothing to do with the part
> that's been vacuumed.
>>>
 Well, the point is to not only update the range we know we've
 vacuumed, but also to finish the updates done by a potential
 previously cancelled autovacuum.
>>>
>>> I think that's not an important consideration, or at least would stop
>>> being one after a patch like this.  The reason it's a problem now is
>>> precisely that we don't try to vacuum the FSM till the very end; if
>>> we did FSM cleanup every X pages --- in particular, before not after
>>> the final relation-truncation attempt --- there wouldn't be risk of
>>> skipping so much FSM work that we need to worry about forcing the
>>> issue just in case there'd been a prior cancellation.
>>
>> I'm thinking that in conjunction with the high MWM patch for vacuum,
>> it could still happen that considerable amount of vacuuming is left
>> unexposed upon cancellation.
>>
>> The "bizarre" approach provides some relief.
>>
>> I'll see about implementing the FSM range vacuuming operation for
>> non-initial runs, there seems to be consensus that it's a good idea.
>>
>> But I still think this partial run at the very beginning is useful still.
>
> Attached patches, rebased and modified as discussed:
>
> 1 no longer does tree pruning, it just vacuums a range of the FSM
>
> 2 reintroduces tree pruning for the initial FSM vacuum
>
> 3 and 4 remain as they were, but rebased

Sorry, ignore patch number 3 in the earlier mail, I selected the wrong
patch to attach.

Attached now is the correct number 3
From c069f7590fd0ff24aa77d8025eda7e17527bca71 Mon Sep 17 00:00:00 2001
From: Claudio Freire 
Date: Mon, 26 Feb 2018 13:23:59 -0300
Subject: [PATCH 3/4] FSM: Fix relation extension FSM update

When updating the FSM recursively during relation extension,
the update could step on higher-level entries with an incorrect
value, by always bubbling up the leaf value instead of the root
value that is the result of the bubble up in fsm_set_avail.

If somehow the root contained a different value from the new value
set by fsm_update_recursive, upper levels would be set to the
wrong value instead.

This didn't happen often since the value set during relation
extension is usually pretty high and unlikely to be beaten by
other pre-existing values, but the possibility existed nonetheless,
especially during bulk loads.
---
 src/backend/storage/freespace/freespace.c | 36 ++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 2f48092..80f5e80 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -107,6 +107,8 @@ static Size fsm_space_cat_to_avail(uint8 cat);
 /* workhorse functions for various operations */
 static int fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
    uint8 newValue, uint8 minValue);
+static uint8 fsm_set_and_get_maxval(Relation rel, FSMAddress addr, uint16 slot,
+	uint8 newValue);
 static BlockNumber fsm_search(Relation rel, uint8 min_cat);
 static uint8 fsm_vacuum_page(Relation rel, FSMAddress addr, uint8 threshold, bool *eof,
 			 BlockNumber start, BlockNumber end);
@@ -719,6 +721,33 @@ fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
 }
 
 /*
+ * Set value in given FSM page and slot.
+ *
+ * The new value at the root node is returned.
+ */
+static uint8
+fsm_set_and_get_maxval(Relation rel, FSMAddress addr, uint16 slot, uint8 newValue)
+{
+	Buffer		buf;
+	Page		page;
+	uint8		newmax = 0;
+
+	buf = fsm_readbuf(rel, addr, true);
+	LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+	page = BufferGetPage(buf);
+
+	if (fsm_set_avail(page, slot, newValue))
+		MarkBufferDirtyHint(buf, false);
+
+	newmax = fsm_get_avail(page, 0);
+
+	UnlockReleaseBuffer(buf);
+
+	return newmax;
+}
+
+/*
  * Search the tree for a heap page with at least min_cat of free space
  */
 

Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

2018-03-26 Thread Claudio Freire
On Mon, Mar 26, 2018 at 11:26 AM, Claudio Freire  wrote:
> On Mon, Mar 26, 2018 at 11:19 AM, Tom Lane  wrote:
>> Claudio Freire  writes:
>>> On Sat, Mar 24, 2018 at 4:17 PM, Tom Lane  wrote:
 I hadn't paid any attention to this patch previously, so maybe I'm
 missing something ... but this sure seems like a very bizarre approach
 to the problem.  If the idea is to fix the FSM's upper levels after
 vacuuming a known sub-range of the table, why would you not proceed
 by teaching FreeSpaceMapVacuum to recurse only within that sub-range
 of page numbers?  This setup with a threshold seems entirely Rube
 Goldbergian.  It's dependent on a magic threshold number that you can
 only select by trial and error, and it's inevitably going to spend time
 updating segments of the FSM tree that have nothing to do with the part
 that's been vacuumed.
>>
>>> Well, the point is to not only update the range we know we've
>>> vacuumed, but also to finish the updates done by a potential
>>> previously cancelled autovacuum.
>>
>> I think that's not an important consideration, or at least would stop
>> being one after a patch like this.  The reason it's a problem now is
>> precisely that we don't try to vacuum the FSM till the very end; if
>> we did FSM cleanup every X pages --- in particular, before not after
>> the final relation-truncation attempt --- there wouldn't be risk of
>> skipping so much FSM work that we need to worry about forcing the
>> issue just in case there'd been a prior cancellation.
>
> I'm thinking that in conjunction with the high MWM patch for vacuum,
> it could still happen that considerable amount of vacuuming is left
> unexposed upon cancellation.
>
> The "bizarre" approach provides some relief.
>
> I'll see about implementing the FSM range vacuuming operation for
> non-initial runs, there seems to be consensus that it's a good idea.
>
> But I still think this partial run at the very beginning is useful still.

Attached patches, rebased and modified as discussed:

1 no longer does tree pruning, it just vacuums a range of the FSM

2 reintroduces tree pruning for the initial FSM vacuum

3 and 4 remain as they were, but rebased
From f640e1b75b0644ac41b006ac2c69b81b7ddf Mon Sep 17 00:00:00 2001
From: Claudio Freire 
Date: Fri, 28 Jul 2017 21:31:59 -0300
Subject: [PATCH 1/4] Vacuum: Update FSM more frequently

Make Vacuum update the FSM more frequently, to avoid the case where
autovacuum fails to reach the point where it updates the FSM in
highly contended tables.

Implement heap block range FSM vacuuming, and make vacuumlazy use
it to vacuum the affected FSM at intermediate steps.
Intermediate FSM vacuums are only supposed to make enough
free space visible to avoid extension until the final (non-partial)
FSM vacuum.

Run partial FSM vacuums after each index pass, which is a point
at which whole ranges of the heap have been thoroughly cleaned, and
we can expect no further updates to those ranges of the FSM save
for concurrent activity. When there are no indexes, and thus no
index passes, run partial FSM vacuums every 8GB of dirtied pages
or 1/8th of the relation, whichever is highest. This allows some
partial work to be made visible without incurring quadratic cost.

In any case, FSM are small in relation to the table, so even when
quadratic cost is involved, it should not be problematic. Index
passes already incur quadratic cost, and the addition of the FSM
is unlikely to be measurable.
---
 src/backend/commands/vacuumlazy.c | 57 +-
 src/backend/storage/freespace/freespace.c | 67 ---
 src/include/storage/freespace.h   |  2 +
 3 files changed, 118 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index f9da24c..c44996f 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -85,6 +85,19 @@
 #define VACUUM_TRUNCATE_LOCK_TIMEOUT			5000	/* ms */
 
 /*
+ * When a table has no indexes, vacuum the FSM at most every 1/Nth
+ * of the relation has been vacuumed to prevent bloat during long-running
+ * vacuums. This specifies the N.
+ */
+#define VACUUM_FSM_EVERY_FRACTION 8
+
+/*
+ * When a table has no indexes, vacuum the FSM at most every 8GB
+ * worth of dirty pages.
+ */
+#define VACUUM_FSM_EVERY_PAGES (((Size)8 * 1024 * 1024 * 1024) / BLCKSZ)
+
+/*
  * Guesstimation of number of dead tuples per page.  This is used to
  * provide an upper limit to memory allocated when vacuuming small
  * tables.
@@ -465,7 +478,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
 	TransactionId relminmxid = onerel->rd_rel->relminmxid;
 	BlockNumber empty_pages,
-vacuumed_pages;
+vacuumed_pages,
+freespace_updated_pages,

Re: Proposal: http2 wire format

2018-03-26 Thread Alvaro Hernandez



On 26/03/18 13:11, Damir Simunic wrote:
On 26 Mar 2018, at 11:13, Vladimir Sitnikov wrote:


Damir> * What are the criteria for getting this into the core?
Craig>Mine would be:

+1

There's a relevant list as well: 
https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md


Damir



    Hi Damir.

    I'm interested in the idea. However, way before writing a PoC, 
IMVHO I'd rather write a detailed document including:


- A brief summary of the main features of HTTP2 and why it might be a 
good fit for PG (of course there's a lot of doc in the wild about 
HTTP/2, so just a summary of the main relevant features and an analysis 
of how it may fit Postgres).

- A more or less thorough description of how every feature in the current 
PostgreSQL protocol would be implemented on HTTP/2.

- Similar to the above, but applied to the v4 TODO feature list.

- A section for connection poolers, as well as auth, as these are very 
important topics.



    Hope this helps,

    Álvaro

--

Alvaro Hernandez


---
OnGres



Re: WIP: a way forward on bootstrap data

2018-03-26 Thread John Naylor


> On Mar 26, 2018, at 10:44 PM, Tom Lane  wrote

> Layout of .dat files seems generally reasonable, but I don't understand
> the proposed make rule:
> 
> +reformat-dat-files:
> +$(PERL) -I $(catalogdir) $< catalog/rewrite_dat.pl -o catalog 
> catalog/pg_*.dat
> 
> This rule has no prerequisite, so what's $< supposed to be?  Also, I think
> the rule probably ought to be located in src/include/catalog/Makefile,
> because that's typically where you'd be cd'd to when messing with the
> .dat files, I'd think.  (Hm, I see no such makefile, but maybe it's time
> for one.  A convenience rule located one level up doesn't seem very
> convenient.)
> 

Oops, copy-pasto. And I’ll see about a new Makefile.

>> It seems like with that, it'd be good to split off the data-format
>> section of the README into a new file, maybe README.data, which will
>> contain code snippets and some example scenarios. I'll include the
>> example pg_proc.prokind merger among those.
> 
> It would be more work, but maybe we should move this into the main
> SGML docs.  It seems rather silly to have SGML documentation for the
> .BKI file format, which now will be an internal matter that hardly
> any developers need worry about, but not for the .DAT file format.
> But I understand if that seems a bridge too far for today --- certainly
> a README file is way better than nothing.

Makes sense on all points. I’m not optimistic about creating a new sgml doc on 
time, but I’ll keep it in mind.

-John Naylor 


Re: PATCH: Exclude temp relations from base backup

2018-03-26 Thread David Steele
On 3/26/18 1:06 PM, Stephen Frost wrote:
> 
> * Teodor Sigaev (teo...@sigaev.ru) wrote:
>> Will autovacuum (or something else) complain about absence of relfile during
>> orphan table deleting? I mean, you get a base backup without temp tables,
>> then you try to run postgres on it and will it complain about existing
>> record in pg_class and absence of corresponding relfile?
> 
> I would certainly hope not considering that's what happens during
> regular crash recovery also, so if there's an issue with that, we'd have
> a problem in released versions.

Agreed.  The logic for pg_basebackup was modeled off RemovePgTempFiles()
which is called at postmaster start.  We are just doing the cleanup in
advance (in the backup only, of course).

Thanks,
-- 
-David
da...@pgmasters.net





Re: PATCH: Exclude temp relations from base backup

2018-03-26 Thread Stephen Frost
Greetings,

* Teodor Sigaev (teo...@sigaev.ru) wrote:
> Will autovacuum (or something else) complain about absence of relfile during
> orphan table deleting? I mean, you get a base backup without temp tables,
> then you try to run postgres on it and will it complain about existing
> record in pg_class and absence of corresponding relfile?

I would certainly hope not considering that's what happens during
regular crash recovery also, so if there's an issue with that, we'd have
a problem in released versions.

There's an independent discussion that was being had recently about how
to make sure those records in pg_class get cleaned up in a reasonable
timeframe and don't lead to problems with wrap-arounds, but that's a
different and pre-existing issue.

Thanks!

Stephen




Re: [HACKERS] pg_serial early wraparound

2018-03-26 Thread Tom Lane
Thomas Munro  writes:
> Rebased again, now with a commit message.  That assertion has since
> been removed (commit ec99dd5a) so the attached test script can once
> again be used to see the contents of pg_serial as the xid goes all the
> way around, if you build with TEST_OLDSERXID defined so that
> predicate.c forces information about xids out to pg_serial.

Couple thoughts here ---

Seems like if the patch is correct as-is, then the OldSerXidPage
macro could be simplified, as the modulo no longer does anything.
Also, OldSerXidSegment doesn't seem to be used.

I'm a little worried because Anastasia couldn't repeat the test;
why is that?

regards, tom lane



Re: new function for tsquery creation

2018-03-26 Thread Dmitry Ivanov

Patch 03 (the documentation) needed some proof-reading.  I've attached
a new version of that patch with some small suggested improvements.


Thanks, I'm definitely going to use this.

Is there anything to_tsquery() can do that websearch_to_tsquery() 
can't?


Currently, no.


Would it be OK to use user-supplied websearch strings?
Ie can it produce a syntax error?


I believe that's the most important question. After a private discussion 
with Theodor I came to the conclusion that the most beneficial outcome 
would be to suppress all syntax errors and give the user some result, 
cutting all misused operators along the way. This requires some changes, 
though.


Is there any way to write OR as a term (that's a valuable non-stopword 
in French)?


You could quote it like this: websearch_to_tsquery('"or"');

Moreover, it's still possible to use & and |.


It seems like AROUND(x) should be documented also more generally for
tsquery, but I see there is some discussion about how that should
look.


Personally, I like the <N> operator better. It would instantly deprecate 
AROUND(N), which is why I'm going to drop it.



By the way, not this patch's fault, but I noticed that commit
f5f1355dc4d did this:

-   (errmsg("query contains only
stopword(s) or doesn't contain lexeme(s), ignored")));
+   (errmsg("text-search query contains
only stop words or doesn't contain lexemes, ignored")));

But the old test still appears in an example in 
doc/src/sgml/textsearch.sgml.


Will fix this.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2018-03-26 Thread Tom Lane
Chapman Flack  writes:
> On 03/25/18 23:27, Stephen Frost wrote:
>> AdvanceXLInsertBuffer() does quite a bit, so I'm a bit surprised to see
>> this simply removing that call, you're confident there's nothing done
>> which still needs doing..?

> My belief from looking at the code was that AdvanceXLInsertBuffer() is among
> the things GetXLogBuffer() does, so calling both would result in two calls
> to the former (which I don't believe would hurt, it would only
> do enough work the second time to determine it had already been done).

If that's the argument, why is the WALInsertLockUpdateInsertingAt(CurrPos)
call still there?  GetXLogBuffer() would do that too.

In any case, the new comment seems very poorly written, as it fails to
explain what's going on, and the reference to a function that is not
actually called from the vicinity of the comment doesn't help.  I'd
suggest something like "GetXLogBuffer() will fetch and initialize the
next WAL page for us.  But it initializes the page header to nonzero,
which is undesirable because X, so we then reset the header to zero".
It might also be worth explaining how you know that the new page's
header is short not long.

regards, tom lane



Re: Using base backup exclusion filters to reduce data transferred with pg_rewind

2018-03-26 Thread Fujii Masao
On Sun, Mar 25, 2018 at 5:06 PM, Michael Paquier  wrote:
> On Sat, Mar 24, 2018 at 11:14:34PM -0400, Tom Lane wrote:
>> Stephen Frost  writes:
>>> I don't completely buy off on the argument that having these #define's
>>> would make it easier for forks (we've had quite a few folks fork PG, but
>>> how many of them have actually changed "base"?) and I'm on the fence
>>> about if these will make our lives simpler down the road when it comes
>>> to changing the directory names
>>
>> I am distressed that nobody, apparently, is putting any weight on the
>> back-patching pain that will result from widespread replacement of path
>> names with macros.  I don't buy that either we or anyone else will need
>> to change these names in future, so I see pain and effectively no
>> gain.
>
> That's actually something I worry about as well (as the author!), which
> is why I qualify the changes as intrusive.  At the end, I think that I
> would be tempted to just do #3, aka to keep a copy of the filter list in
> pg_rewind code while hardcoding a minimum of names and mention in both
> basebackup.c and pg_rewind code to not forget to update the filter list
> if necessary.

+1. It's better for us to focus on the filter code change in pg_rewind
rather than such "refactoring". As I told upthread, the previous patch has the
problem that the files which should be skipped are not skipped. ISTM that,
in pg_rewind, the filter should be triggered in recurse_dir(), not
process_source_file().

BTW, what should pg_rewind do when it finds, in the source directory, a
directory which should be skipped? In your patch, pg_rewind simply skips
that directory entirely. But is this the right behavior? If that directory
doesn't exist in the target directory (though I'm not sure whether this
situation is really possible), I'm thinking that pg_rewind should create
that "empty" directory in the target. No?

Regards,

-- 
Fujii Masao



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-03-26 Thread Dean Rasheed
On 26 March 2018 at 14:08, Tomas Vondra  wrote:
> On 03/26/2018 12:31 PM, Dean Rasheed wrote:
>> A wider concern I have is that I think this function is trying to be
>> too clever by only resetting selected stats. IMO it should just reset
>> all stats unconditionally when the column type changes, which would
>> be consistent with what we do for regular stats.
>>
> The argument a year ago was that it's more plausible that the semantics
> remains the same. I think the question is how the type change affects
> precision - had the type change in the opposite direction (int to real)
> there would be no problem, because both ndistinct and dependencies would
> produce the same statistics.
>
> In my experience people are far more likely to change data types in a
> way that preserves precision, so I think the current behavior is OK.

Hmm, I don't really buy that argument. Altering a column's type allows
the data in it to be rewritten in arbitrary ways, and I don't think we
should presume that the statistics will still be valid just because
the user *probably* won't do something that changes the data much.

Regards,
Dean



Re: Proposal: http2 wire format

2018-03-26 Thread Vladimir Sitnikov
Tom>But starting from the assumption that HTTP2 solves our problems seems
to me to be "Here's a hammer.

Agree.

Just a side note: if a v4 is ever invented, I hope client language support
will be considered.
It does take resources to implement message framing and data parsing (e.g.
int, timestamp, struct, array, ...) for each language independently.

Vladimir


Re: [HACKERS] MERGE SQL Statement for PG11

2018-03-26 Thread Simon Riggs
On 26 March 2018 at 16:09, Robert Haas  wrote:
> On Mon, Mar 26, 2018 at 5:53 AM, Simon Riggs  wrote:
>> Since we now have MVCC catalog scans, all the name lookups are
>> performed using the same snapshot so in the above scenario the newly
>> created object would be invisible to the second name lookup.
>
> That's not true, because each lookup would be performed using a new
> snapshot -- not all under one snapshot.

You're saying we take a separate snapshot for each table we look up?
Sounds weird to me.

So this error could happen in SELECT, UPDATE, DELETE or INSERT as well.

Or you see this as something related specifically to MERGE, if so how?
Please explain what you see.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proposal: http2 wire format

2018-03-26 Thread Vladimir Sitnikov
Damir>Postgres doesn’t know about grpc, s

I'm afraid you are missing the point.
I would say PostgreSQL doesn't know about HTTP/2.
It is the same as "PostgreSQL doesn't know about grpc".

Here's a quote from your pg_h2 repo:
>What we need is to really build a request object and correctly extract
> the full payload and parameters from the request. For example,
>maybe we want to implement a QUERY method, similar to POST or PUT,
> and pass the query text as the body of the request, with parameters
> in the query string or in the headers

It basically suggests implementing your own framing on top of HTTP/2.

When I say GRPC, I mean "implement PostgreSQL-specific protocol via GRPC
messages".

Let's take current message formats:
https://www.postgresql.org/docs/current/static/protocol-message-formats.html
If one defines those message formats via GRPC, then GRPC would autogenerate
parsers and serializers for lots of languages "for free".

For instance
Query (F)
 Byte1('Q') Identifies the message as a simple query.
 Int32 Length of message contents in bytes, including self.
 String The query string itself.

can be defined via GRPC as
message Query {
  string queryText = 1;
}

This is trivial to read, trivial to write, trivial to maintain, and it
automatically generates parsers/generators for lots of languages.


Parsing of the current v3 protocol has to be reimplemented for each and
every language, and it would be a pain to implement parsing for v4.
Are you going to create "http/2" clients for Java, C#, Ruby, Swift, Dart,
etc, etc?

I am not saying that a mere redefinition of v3 messages as GRPC would do
the trick. I am saying that you'd better consider frameworks that would
enable transparent implementation of client libraries.

Damir>and will talk to any HTTP2 conforming client

I do not see where you are heading with this.
Is "curl as PostgreSQL client" one of the key objectives for you?
True clients (the ones that are used by the majority of applications)
should support things like "prepared statements", "data types", "cursors"
(resultset streaming), etc. I can hardly imagine a case when one would use
"curl" and operate with prepared statements.
I think psql is a pretty good client, so I see no point in implementing
HTTP/2 for the mere reason of using curl to fetch data from the DB.

Vladimir


Re: [PATCH] Add missing type conversion functions for PL/Python

2018-03-26 Thread Nikita Glukhov

On 26.03.2018 17:19, David Steele wrote:


On 2/20/18 10:14 AM, Haozhou Wang wrote:

Thank you very much for your review!

I attached a new patch with typo fixed.

I think it's a bit premature to mark this Ready for Committer after a
review consisting of a few typos.  Anthony only said that he started
looking at it so I've marked it Needs Review.


Hi.

I have also looked at this patch and found some problems.
Attached is a fixed 3rd version of the patch:
 * initialization of arg->u.scalar was moved into PLy_output_setup_func()
 * added range checks for int16 and int32 types
 * added subroutine PLyInt_AsLong() for correctly handling the OverflowError that can
   be thrown by PyInt_AsLong()
 * casting from Python float to PostgreSQL numeric using PyFloat_AsDouble() was
   removed because it can return an incorrect result for Python long, and
   float8_numeric() uses the float8 and numeric I/O functions
 * fixed whitespace
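
For illustration, functions like the following exercise the new direct
conversion paths (a sketch; it assumes the plpythonu language is installed,
and the function names are made up):

CREATE FUNCTION py_answer() RETURNS smallint AS $$
return 42             # handled by the new PLyObject_ToInt16 path
$$ LANGUAGE plpythonu;

CREATE FUNCTION py_too_big() RETURNS smallint AS $$
return 100000         # outside the int16 range, should hit the new range check
$$ LANGUAGE plpythonu;

SELECT py_answer();    -- 42
SELECT py_too_big();   -- expected to fail with an out-of-range error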


--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/pl/plpython/plpy_typeio.c b/src/pl/plpython/plpy_typeio.c
index d6a6a84..3b874e1 100644
--- a/src/pl/plpython/plpy_typeio.c
+++ b/src/pl/plpython/plpy_typeio.c
@@ -46,6 +46,18 @@ static PyObject *PLyDict_FromTuple(PLyDatumToOb *arg, HeapTuple tuple, TupleDesc
 /* conversion from Python objects to Datums */
 static Datum PLyObject_ToBool(PLyObToDatum *arg, PyObject *plrv,
  bool *isnull, bool inarray);
+static Datum PLyObject_ToInt16(PLyObToDatum *arg, PyObject *plrv,
+  bool *isnull, bool inarray);
+static Datum PLyObject_ToInt32(PLyObToDatum *arg, PyObject *plrv,
+  bool *isnull, bool inarray);
+static Datum PLyObject_ToInt64(PLyObToDatum *arg, PyObject *plrv,
+  bool *isnull, bool inarray);
+static Datum PLyObject_ToFloat4(PLyObToDatum *arg, PyObject *plrv,
+   bool *isnull, bool inarray);
+static Datum PLyObject_ToFloat8(PLyObToDatum *arg, PyObject *plrv,
+   bool *isnull, bool inarray);
+static Datum PLyObject_ToNumeric(PLyObToDatum *arg, PyObject *plrv,
+	bool *isnull, bool inarray);
 static Datum PLyObject_ToBytea(PLyObToDatum *arg, PyObject *plrv,
   bool *isnull, bool inarray);
 static Datum PLyObject_ToComposite(PLyObToDatum *arg, PyObject *plrv,
@@ -397,16 +409,35 @@ PLy_output_setup_func(PLyObToDatum *arg, MemoryContext arg_mcxt,
 		{
 			case BOOLOID:
 arg->func = PLyObject_ToBool;
-break;
+return; /* no need to initialize arg->u.scalar */
 			case BYTEAOID:
 arg->func = PLyObject_ToBytea;
+return;	/* no need to initialize arg->u.scalar */
+			case INT2OID:
+arg->func = PLyObject_ToInt16;
+break;
+			case INT4OID:
+arg->func = PLyObject_ToInt32;
+break;
+			case INT8OID:
+arg->func = PLyObject_ToInt64;
+break;
+			case FLOAT4OID:
+arg->func = PLyObject_ToFloat4;
+break;
+			case FLOAT8OID:
+arg->func = PLyObject_ToFloat8;
+break;
+			case NUMERICOID:
+arg->func = PLyObject_ToNumeric;
 break;
 			default:
 arg->func = PLyObject_ToScalar;
-getTypeInputInfo(typeOid, &typinput, &arg->u.scalar.typioparam);
-fmgr_info_cxt(typinput, &arg->u.scalar.typfunc, arg_mcxt);
 break;
 		}
+
+		getTypeInputInfo(typeOid, &typinput, &arg->u.scalar.typioparam);
+		fmgr_info_cxt(typinput, &arg->u.scalar.typfunc, arg_mcxt);
 	}
 }
 
@@ -884,6 +915,210 @@ PLyObject_ToBool(PLyObToDatum *arg, PyObject *plrv,
 	return BoolGetDatum(PyObject_IsTrue(plrv));
 }
 
+/* Try to convert python Int to C long. */
+static long
+PLyInt_AsLong(PyObject *plrv, bool *err)
+{
+	long		val = PyInt_AsLong(plrv);
+
+	/* If -1 is returned then OverflowError is possible. */
+	*err = val == -1 && PyErr_Occurred();
+
+	if (*err)
+	{
+		/* Catch OverflowError exception. */
+		if (!PyErr_ExceptionMatches(PyExc_OverflowError))
+			PLy_elog(ERROR, NULL);
+
+		PyErr_Clear();
+	}
+
+	return val;
+}
+
+/*
+ * Convert a Python object to a PostgreSQL int16 datum directly.
+ * If can not convert it directly, fallback to PLyObject_ToScalar
+ * to convert it.
+ */
+static Datum
+PLyObject_ToInt16(PLyObToDatum *arg, PyObject *plrv,
+  bool *isnull, bool inarray)
+{
+	if (plrv == Py_None)
+	{
+		*isnull = true;
+		return (Datum) 0;
+	}
+
+	if (PyInt_Check(plrv) || PyLong_Check(plrv))
+	{
+		bool		err;
+	long		val = PLyInt_AsLong(plrv, &err);
+
+		if (err || val < SHRT_MIN || val > SHRT_MAX)
+			ereport(ERROR,
+	(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+	 errmsg("value \"%s\" is out of range for type %s",
+			PLyObject_AsString(plrv), "smallint")));
+
+		*isnull = false;
+		return Int16GetDatum((int16) val);
+	}
+
+	return PLyObject_ToScalar(arg, plrv, isnull, inarray);
+}
+
+/*
+ * Convert a Python object to a PostgreSQL int32 datum directly.
+ * If can not convert it directly, fallback to PLyObject_ToScalar
+ * to convert it.
+ */
+static Datum
+PLyObject_ToInt32(PLyObToDatum *arg, PyObject *plrv,
+  bool *isnull, bool inarray)
+{
+	if (plrv == Py_None)
+	{
+		*isnull = true;
+		return (Datum) 0;
+	}
+
+	if (PyInt_Check(plrv) || 

Re: [HACKERS] MERGE SQL Statement for PG11

2018-03-26 Thread Simon Riggs
On 26 March 2018 at 15:39, Pavan Deolasee  wrote:

> reviewer

1. In ExecMergeMatched() we have a line of code that does this...

if (TransactionIdIsCurrentTransactionId(hufd.xmax))
then errcode(ERRCODE_CARDINALITY_VIOLATION)

I notice this is correct, but too strong. It should be possible to run
a sequence of MERGE commands inside a transaction, repeatedly updating
the same set of rows, as is possible with UPDATE (see the sketch below).

We need to check whether the xid is the current subxid and the cid is
the current commandid, rather than using
TransactionIdIsCurrentTransactionId()

On further analysis, I note that ON CONFLICT suffers from this problem
as well, looks like I just refactored it from there.

2. EXPLAIN ANALYZE looks unchanged from some time back. The math is
only correct as long as there are zero rows that do not cause an
INS/UPD/DEL.
We don't test for that. I think this is a regression from an earlier
report of the same bug, or perhaps I didn't fix it at the time.

3. sp. depedning trigger.sgml

4. trigger.sgml replace "specific actions specified" with "events
specified in actions"
to avoid the double use of "specific"

5. I take it the special code for TransformMerge target relations is
replaced by "right_rte"? Seems fragile to leave it like that. Can we
add an Assert()? Do we care?

6. I didn't understand "Assume that the top-level join RTE is at the
end. The source relation
+ * is just before that."
What if there is no source relation?

7. The formatting of the SQL statement in transformMergeStmt that
begins "Construct a query of the form" is borked, so the SQL layout is
unclear; it just needs a pretty print

8. I didn't document why I thought this was true "XXX if we have a
constant subquery, we can also skip join", but one of the explain
analyze outputs shows this is already true - where we provide a
constant query and it skips the join. So perhaps we can remove the
comment. (Search for "Seq Scan on target t_1")

9. I think we need to mention that MERGE won't work with rules or
inheritance (only partitioning)  in the main doc page. The current
text says that rules are ignored, which would be true if we didn't
specifically throw ERROR feature not supported.

10. Comment needs updating for changes in code below it - "In MERGE
when and condition, no system column is allowed"

11. In comment "Since the plan re-evaluated by EvalPlanQual uses the
second RTE", suggest using "join RTE" to make it more explicit which
RTE we are discussing

12. Missed out merge.sgml from v25 patch.

13. For triggers we say "No separate triggers are defined for MERGE";
we should also state the same caveat for POLICY events

That's all I can see so far.
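
On point 1, a minimal sketch of the kind of sequence that should work
(table and column names are hypothetical):

BEGIN;
MERGE INTO target t USING source s ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET val = s.val
    WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val);
-- running the same MERGE again in the same transaction should behave
-- like a repeated UPDATE, not raise a cardinality violation
MERGE INTO target t USING source s ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET val = s.val
    WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val);
COMMIT;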

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-03-26 Thread Fabien COELHO


Hello Marina,


Many thanks to both of you! I'm working on a patch in this direction..


I think that the best approach for now is simply to reset (command
zero, random generator) and start over the whole script, without
attempting to be more intelligent. The limitations should be clearly
documented (one transaction per script), though. That would be a
significant enhancement already.


I'm not sure that we can always do this, because we can get new errors until 
we finish the failed transaction block, and we need to destroy the conditional 
stack...


Sure. I'm suggesting, to keep things simple, that on failure the retry would 
always restart from the beginning of the script by resetting everything, 
indeed including the conditional stack, the random generator state, the 
variable values, and so on.


This means enforcing, somehow, that one script is one transaction.

If the user does not do that, it would be their decision and the result 
becomes unpredictable on errors (eg some sub-transactions could be 
executed more than once).
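
For instance, a custom script shaped like this (a sketch using the standard
pgbench_accounts table) stays within a single transaction, so a retry can
reset and restart it from command zero:

\set aid random(1, 100000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
END;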


Then if more is needed, that could be for another patch.

--
Fabien.



Re: PATCH: Exclude temp relations from base backup

2018-03-26 Thread Teodor Sigaev

Hi!

Will autovacuum (or something else) complain about the absence of a relfile while 
deleting orphan tables? I mean, you get a base backup without temp tables, then 
you try to run postgres on it and will it complain about an existing record in 
pg_class and the absence of the corresponding relfile?



David Steele wrote:

On 3/13/18 12:34 PM, David Steele wrote:


Updated the patch to change die() to BAIL_OUT() and use append_to_file()
as suggested for another test patch.


Updated patch now that the unlogged table exclusions have been committed
[1].

Thanks,



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: WIP: a way forward on bootstrap data

2018-03-26 Thread Tom Lane
John Naylor  writes:
> With the attachments this time.

Layout of .dat files seems generally reasonable, but I don't understand
the proposed make rule:

+reformat-dat-files:
+   $(PERL) -I $(catalogdir) $< catalog/rewrite_dat.pl -o catalog 
catalog/pg_*.dat

This rule has no prerequisite, so what's $< supposed to be?  Also, I think
the rule probably ought to be located in src/include/catalog/Makefile,
because that's typically where you'd be cd'd to when messing with the
.dat files, I'd think.  (Hm, I see no such makefile, but maybe it's time
for one.  A convenience rule located one level up doesn't seem very
convenient.)

> My clean-up patch tried to make that more consistent. For this sample,
> it would add blank lines before the comments in pg_amop, and remove
> blank lines from the first few entries in pg_type. If you wanted to
> opine on that before I rework that patch, I'd be grateful.

No particular objection to either.

>>> -Maybe document examples of how to do bulk-editing of data files?

>> +1.  In the end, that's the reason we're doing all this work, so showing
>> people how to benefit seems like a good thing.

> It seems like with that, it'd be good to split off the data-format
> section of the README into a new file, maybe README.data, which will
> contain code snippets and some example scenarios. I'll include the
> example pg_proc.prokind merger among those.

It would be more work, but maybe we should move this into the main
SGML docs.  It seems rather silly to have SGML documentation for the
.BKI file format, which now will be an internal matter that hardly
any developers need worry about, but not for the .DAT file format.
But I understand if that seems a bridge too far for today --- certainly
a README file is way better than nothing.

regards, tom lane



Re: [PROPOSAL] Shared Ispell dictionaries

2018-03-26 Thread Tom Lane
Arthur Zakirov  writes:
> On Sun, Mar 25, 2018 at 12:18:10AM -0400, Tom Lane wrote:
>> My thought was (a) the ROLLBACK case is ok, because the next use of
>> the dictionary will reload it, and (b) the reload-concurrently-with-
>> DROP case is annoying, because indeed it leaks, but the window is small
>> and it probably won't be an issue in practice.  We would need to be
>> sure that the DSM segment goes away at postmaster restart, but given
>> that I think it'd be tolerable.  Of course it'd be better not to have
>> the race, but I see no easy way to prevent it -- do you?

> I'm not sure that I understood the second case correctly. Can cache
> invalidation help in this case? I don't have confident knowledge of cache
> invalidation. It seems to me that InvalidateTSCacheCallBack() should
> release segment after commit.

"Release after commit" sounds like a pretty dangerous design to me,
because a release necessarily implies some kernel calls, which could
fail.  We can't afford to inject steps that might fail into post-commit
cleanup (because it's too late to recover by failing the transaction).
It'd be better to do cleanup while searching for a dictionary to use.

I assume the DSM infrastructure already has some solution for getting
rid of DSM segments when the last interested process disconnects,
so maybe you could piggyback on that somehow.

regards, tom lane



Re: Why does load_external_function() return PGFunction?

2018-03-26 Thread Robert Haas
On Sat, Mar 24, 2018 at 4:33 PM, Andres Freund  wrote:
> I don't think it really changes the need, but it's worthwhile to note
> that gcc-8 warns about this now:
> /home/andres/src/postgresql/src/backend/postmaster/bgworker.c: In function 
> ‘LookupBackgroundWorkerFunction’:
> /home/andres/src/postgresql/src/backend/postmaster/bgworker.c:1246:9: 
> warning: cast between incompatible function types from ‘PGFunction’ {aka 
> ‘long unsigned int (*)(struct FunctionCallInfoData *)’} to ‘void (*)(Datum)’ 
> {aka ‘void (*)(long unsigned int)’} [-Wcast-function-type]
>   return (bgworker_main_type)

That's probably going to mean we need, or at least want, to do
something about this at some point.  Warning-free compiles are
desirable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Direct converting numeric types to bool

2018-03-26 Thread Tom Lane
n.zhuch...@postgrespro.ru writes:
> Yes, int::boolean casting already exists in postgres, and in a case like:
> select 10::bigint::int::boolean;
> it will work perfectly. But if you want to cast a larger number:
> select (2^32)::bigint::int::boolean;
> you receive an out-of-range error.

Right.  I don't see a reason why we shouldn't create a bigint-to-bool
cast to fix that, and smallint-to-bool maybe for completeness.  However,
I'm less excited about float or numeric to bool, because I don't think
there is any very principled argument about what such a cast should do
with infinities, NaNs, or denormalized numbers.  Somebody who knows
what should happen for their own application can create their own cast
that handles those cases ... but I'm unsure that there's a one-size-
fits-all answer that we could put into a default behavior.
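
For illustration, such an application-specific cast could look something
like this (the function name and the chosen semantics are only an example):

CREATE FUNCTION bigint_to_bool(bigint) RETURNS boolean
    LANGUAGE sql IMMUTABLE
    AS 'SELECT $1 <> 0';

CREATE CAST (bigint AS boolean) WITH FUNCTION bigint_to_bool(bigint);

SELECT (2^32)::bigint::boolean;   -- true, no longer routed through ::int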

regards, tom lane



Re: bugfifx: a minor mistake in brin_inclusion.c comment

2018-03-26 Thread Alvaro Herrera
Tomas Vondra wrote:
> Hi,
> 
> while looking at brin_inclusion.c, I've noticed that the comment about
> INCLUSION_UNMERGEABLE and INCLUSION_CONTAINS_EMPTY uses incorrect values
> (1 instead of 2). Attached is a simple fix.

Uh, yeah, this is wrong.

> But perhaps it would be better to use the constants in the comment.

I fixed it like this, which seems neater.

Thanks!

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


