Re: [HACKERS] Tracking wait event for latches

2016-06-01 Thread Michael Paquier
On Thu, Jun 2, 2016 at 12:25 PM, Thomas Munro
 wrote:
> This patch allows identifiers to be specified by the WaitLatch and
> WaitLatchOrSocket calls, but not for WaitEventSetWait, which is the
> more general waiting primitive.  I think it should be done by
> WaitEventSetWait, and merely passed down by those wrapper functions.

The advantage of having WaitEventSetWait do the tracking is that we could
also track the events of secure_read and secure_write. One potential
problem with doing so is if those routines are called during
authentication. I don't recall that being the case, but it needs a double-check.
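
As a side note, a minimal sketch of the shape you describe (the extra
identifier argument below is hypothetical illustration only, not what the
current patch does; assume storage/latch.h for the types):

/* The general primitive accepts the wait-point identifier ... */
extern int WaitEventSetWait(WaitEventSet *set, long timeout,
                            WaitEvent *occurred_events, int nevents,
                            uint32 wait_event_info);   /* new argument */

/* ... and the wrappers merely pass it down. */
extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout,
                     uint32 wait_event_info);
extern int WaitLatchOrSocket(volatile Latch *latch, int wakeEvents,
                             pgsocket sock, long timeout,
                             uint32 wait_event_info);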

> These are actually names for *wait points*, not names for latches.

OK.

> Some of the language in this patch makes it sound like they are latch
> names/latch identifiers, which is inaccurate (the latches themselves
> wouldn't be very interesting eg MyLatch).  In some cases the main
> thing of interest is actually a socket or timer anyway, not a latch,
> so maybe it should appear as wait_event_type = WaitEventSet?

Hm. A latch could be waiting on multiple event types at once, so
don't you think we'd need to add more detail to what is reported in
pg_stat_activity? There are two fields now, and in the context of this
patch:
- wait_event_type, which I'd like to think of as associated with a latch,
so I named it that way.
- wait_event, which is the code path we are waiting at, like
SyncRep, the WAL writer main routine, etc.

Now if you would like to get a list of all the things that are being
waited for, shouldn't we add a third column to the set that has text[]
as return type? Let's name it wait_event_details, and for a latch we
have the following:
- WL_LATCH_SET
- WL_POSTMASTER_DEATH
- WL_SOCKET_READABLE
- WL_SOCKET_WRITEABLE
Compared to all the other existing wait types, that's a bit new, and it's
exclusive to latches because they need a higher level of detail.
Don't you think so? But actually I don't think it's necessary to go
into that level of detail. We already know what a latch is waiting
for by looking at the code...
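
To illustrate why a single latch wait point can cover several event types
at once, here is a simplified wait loop in the style of the existing call
sites (hand-written for illustration, not copied from the tree):

#include "postgres.h"
#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/latch.h"

static void
wait_for_input(pgsocket sock)
{
    for (;;)
    {
        int         rc;

        /* one call multiplexes latch, socket and postmaster-death events */
        rc = WaitLatchOrSocket(MyLatch,
                               WL_LATCH_SET | WL_POSTMASTER_DEATH |
                               WL_SOCKET_READABLE,
                               sock,
                               0);  /* timeout unused without WL_TIMEOUT */

        if (rc & WL_POSTMASTER_DEATH)
            proc_exit(1);
        if (rc & WL_LATCH_SET)
            ResetLatch(MyLatch);
        if (rc & WL_SOCKET_READABLE)
            break;      /* data is available, go process it */
    }
}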

> Is there a reason you chose names like 'WALWriterMain'?  That
> particular wait point is in the function WalWriterMain (note different
> case).  It seems odd to use the containing function names to guide
> naming, but not use them exactly.  Since this namespace needs to be
> able to name wait points anywhere in the postgres source tree (and
> maybe outside it too, for extensions), maybe it should be made
> hierarchical, like 'walwriter.WalWriterMain' (or some other
> organisational scheme).

Yeah, possibly this could be improved. I have put some thought into
having clear names for each one of them, so I'd rather keep them
simple.

> I personally think it would be very useful for extensions to be able
> to name their wait points.  For example, I'd rather see
> 'postgres_fdw.pgfdw_get_result' or similar than a vague 'Extension'
> string which appears not only for all wait points in an extension but
> also for all extensions.  I hope we can figure out a good way to do
> that.  Clearly that would involve some shmem registry machinery to
> make the names visible across backends (a similar problem exists with
> lock tranche names).

This patch is shaped this way intentionally based on the feedback I
received at PGCon (Robert and others). We could provide a routine that
extensions call in _PG_init to register a new latch event name in
shared memory, but I didn't see much use in doing so: take for example
the case of a background worker, which can already register a custom
string for pg_stat_activity via pgstat_report_activity(). One could take
advantage of custom latch wait names in shared memory if an
extension has wait points with latches, though... But I am still not
sure that's worth the complexity.
-- 
Michael




Re: [HACKERS] COMMENT ON, psql and access methods

2016-06-01 Thread Michael Paquier
On Wed, Jun 1, 2016 at 8:25 PM, Teodor Sigaev  wrote:
>>>> As far as I can see, COMMENT ON has no support for access methods.
>>>> Wouldn't we want to add it as it is created by a command? On top of
>>>> that, perhaps we could have a backslash command in psql to list the
>>>> supported access methods, like \dam[S]? The system methods would be in
>>>> this case all the in-core ones.
>>>
>>>
>>> +1.
>>
>>
>> Are there other opinions? That's not a 9.6 blocker IMO, so I could get
>> patches out for 10.0 if needed.
>
>
> I missed that during the review process, but I agree it should be implemented.

So, I have taken the time to hack on that, and finished with the attached
patch, which is less intrusive than I first thought:
- COMMENT support is added for access methods
- Added meta-command \dA in psql to list the available access methods:
=# \dA
  List of access methods
  Name  | Type
--------+-------
 bloom  | Index
 brin   | Index
 btree  | Index
 gin    | Index
 gist   | Index
 hash   | Index
 spgist | Index
(7 rows)
=# \dA+
                        List of access methods
  Name  | Type  |   Handler   |               Description
--------+-------+-------------+-----------------------------------------
 bloom  | Index | blhandler   | bloom index access method
 brin   | Index | brinhandler | block range index (BRIN) access method
 btree  | Index | bthandler   | b-tree index access method
 gin    | Index | ginhandler  | GIN index access method
 gist   | Index | gisthandler | GiST index access method
 hash   | Index | hashhandler | hash index access method
 spgist | Index | spghandler  | SP-GiST index access method
(7 rows)
- Fixed a missing tab completion for DROP ACCESS METHOD.

I have added an open item for 9.6 regarding this patch; it would be
good to complete this work in this release for consistency with the
other objects.
Regards,
-- 
Michael
From 12c5d77c7d66cd8b33c697fc389cf9915d32765b Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 2 Jun 2016 12:47:46 +0900
Subject: [PATCH] Add support for COMMENT ON and psql meta-command for access
 methods

473b932 has visibly forgotten that access methods, being database objects,
could have a description associated with them. A psql-level meta-command
that allows listing all the access methods available on the
server was missing as well.

I have also noticed that psql's tab completion could be more complete
regarding access methods; this is fixed at the same time.
---
 contrib/bloom/bloom--1.0.sql   |  1 +
 doc/src/sgml/ref/comment.sgml  |  1 +
 doc/src/sgml/ref/psql-ref.sgml | 13 +
 src/backend/parser/gram.y  |  5 ++--
 src/bin/psql/command.c |  3 +++
 src/bin/psql/describe.c| 61 ++
 src/bin/psql/describe.h|  3 +++
 src/bin/psql/tab-complete.c| 30 -
 8 files changed, 108 insertions(+), 9 deletions(-)

diff --git a/contrib/bloom/bloom--1.0.sql b/contrib/bloom/bloom--1.0.sql
index 7fa7513..87b5442 100644
--- a/contrib/bloom/bloom--1.0.sql
+++ b/contrib/bloom/bloom--1.0.sql
@@ -5,6 +5,7 @@ LANGUAGE C;
 
 -- Access method
 CREATE ACCESS METHOD bloom TYPE INDEX HANDLER blhandler;
+COMMENT ON ACCESS METHOD bloom IS 'bloom index access method';
 
 -- Opclasses
 
diff --git a/doc/src/sgml/ref/comment.sgml b/doc/src/sgml/ref/comment.sgml
index 3321d4b..9582de8 100644
--- a/doc/src/sgml/ref/comment.sgml
+++ b/doc/src/sgml/ref/comment.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 
 COMMENT ON
 {
+  ACCESS METHOD object_name |
   AGGREGATE aggregate_name ( aggregate_signature ) |
   CAST (source_type AS target_type) |
   COLLATION object_name |
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index df79a37..4079ac6 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1130,6 +1130,19 @@ testdb=
 
   
 
+  
+\dA[+] [ pattern ]
+
+
+
+Lists access methods. If pattern is specified, only access
+methods whose names match the pattern are shown. If
++ is appended to the command name, each access
+method is listed with its associated handler function and description.
+
+
+  
 
   
 \db[+] [ pattern ]
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 20384db..5d646e7 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -5693,8 +5693,8 @@ opt_restart_seqs:
  *	The COMMENT ON statement can take different forms based upon the type of
  *	the object associated with the comment. The form of the statement is:
  *
- *	COMMENT ON [ [ CONVERSION | COLLATION | DATABASE | DOMAIN |
- * EXTENSION | EVENT TRIGGER | FOREIGN DATA WRAPPER |
+ *	COMMENT ON [ [ ACCESS METHOD | CONVERSION | COLLATION | DATABASE |
+ * DOMAIN | EXTENSION | EVENT TRIGGER | FOREIGN DATA WRAPPER |
  * FOREIGN 

Re: [HACKERS] Tracking wait event for latches

2016-06-01 Thread Thomas Munro
On Fri, May 20, 2016 at 8:14 AM, Michael Paquier
 wrote:
> Hi all,
>
> As I mentioned $subject a couple of months back after looking at the
> wait event facility here:
> http://www.postgresql.org/message-id/CAB7nPqTJpgAvOK4qSC96Fpm5W+aCtJ9D=3Vn9AfiEYsur=-j...@mail.gmail.com
> I have actually taken some time to implement this idea.
>
> The particular case that I had in mind was to be able to track in
> pg_stat_activity processes that are waiting on a latch for synchronous
> replication, and here is what this patch gives in this case:
> =# alter system set synchronous_standby_names = 'foo';
> ALTER SYSTEM
> =# select pg_reload_conf();
>  pg_reload_conf
> 
>  t
> (1 row)
> =# -- Do something
> [...]
>
> And from another session:
> =# select wait_event_type, wait_event from pg_stat_activity where pid = 83316;
>  wait_event_type | wait_event
> -+
>  Latch   | SyncRep
> (1 row)
>
> This is a boring patch, and it relies on the wait event facility that
> was added recently in 9.6. Note a couple of things though:
> 1) There are something like 30 code paths calling WaitLatch in the
> backend code; all those events are classified and documented similarly
> to LWLock events.
> 2) After discussing this stuff while at PGCon, it does not seem
> worthwhile to have any kind of API for adding custom latch names in
> shared memory that extensions could load through _PG_init(). As a
> replacement, there is a latch type flag called "Extension"
> that can be used for this purpose.
> Comments are welcome; I am adding this to the 9.7 queue.

This is very cool, and I can't wait to have this feature!  It'll be
useful for all kinds of developers and users.  I wanted this today to
help debug something I am working on, and remembered this patch.  I
have signed up as a reviewer for the September commitfest.  But here
is some initial feedback based on a quick glance at it:

This patch allows identifiers to be specified by the WaitLatch and
WaitLatchOrSocket calls, but not for WaitEventSetWait, which is the
more general waiting primitive.  I think it should be done by
WaitEventSetWait, and merely passed down by those wrapper functions.

These are actually names for *wait points*, not names for latches.
Some of the language in this patch makes it sound like they are latch
names/latch identifiers, which is inaccurate (the latches themselves
wouldn't be very interesting eg MyLatch).  In some cases the main
thing of interest is actually a socket or timer anyway, not a latch,
so maybe it should appear as wait_event_type = WaitEventSet?

Is there a reason you chose names like 'WALWriterMain'?  That
particular wait point is in the function WalWriterMain (note different
case).  It seems odd to use the containing function names to guide
naming, but not use them exactly.  Since this namespace needs to be
able to name wait points anywhere in the postgres source tree (and
maybe outside it too, for extensions), maybe it should be made
hierarchical, like 'walwriter.WalWriterMain' (or some other
organisational scheme).

I personally think it would be very useful for extensions to be able
to name their wait points.  For example, I'd rather see
'postgres_fdw.pgfdw_get_result' or similar than a vague 'Extension'
string which appears not only for all wait points in an extension but
also for all extensions.  I hope we can figure out a good way to do
that.  Clearly that would involve some shmem registry machinery to
make the names visible across backends (a similar problem exists with
lock tranche names).

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown

2016-06-01 Thread Noah Misch
On Thu, May 26, 2016 at 03:27:40PM +0530, Amit Kapila wrote:
> I think the workers should stop processing tuples after the tuple queues
> get detached.  This will not only handle the above situation gracefully, but
> will also allow speeding up queries where a Limit clause is present on top of
> a Gather node.  A patch for this is attached (it was part of the
> original parallel seq scan patch, but was not applied; the reason, as far as
> I remember, was that we thought such an optimization might not be required for
> the initial version).
> 
> Another approach to fix this issue could be to reset mqh_partial_bytes and
> mqh_length_word_complete in shm_mq_sendv in the case of SHM_MQ_DETACHED.
> These are currently reset only in case of success.
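
[For readers following the thread, a self-contained sketch of the second
approach quoted above; the struct below is a simplified stand-in for the
real shm_mq_handle, so treat it as illustration only.]

#include <stdbool.h>
#include <stddef.h>

typedef enum { SHM_MQ_SUCCESS, SHM_MQ_WOULD_BLOCK, SHM_MQ_DETACHED } shm_mq_result;

typedef struct
{
    size_t  mqh_partial_bytes;          /* bytes of the message sent so far */
    bool    mqh_length_word_complete;   /* length word fully written? */
} mq_handle_sketch;

static shm_mq_result
sendv_sketch(mq_handle_sketch *mqh, bool receiver_detached)
{
    if (receiver_detached)
    {
        /* reset the partial-send state here too, not only on success, so a
         * later call does not try to resume a half-written message */
        mqh->mqh_partial_bytes = 0;
        mqh->mqh_length_word_complete = false;
        return SHM_MQ_DETACHED;
    }
    /* normal send path omitted */
    return SHM_MQ_SUCCESS;
}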

[Action required within 72 hours.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
efforts toward speedy resolution.  Thanks.

[1] 
http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com




Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

2016-06-01 Thread Noah Misch
On Thu, May 26, 2016 at 05:08:31PM +0530, Amit Kapila wrote:
> Just to summarize, apart from above issue, we have discussed two different
> issues related to parallel query in this thread.
> a. Push down of parallel restricted clauses to nodes below gather.  Patch
> to fix same is posted upthread [1].
> b. Wrong assumption that target list can only contain Vars.  Patch to fix
> same is posted upthread [2].  Test which prove our assumption is wrong is
> also posted upthread [3].
> 
> I will add this issue in list of open issues for 9.6 @
> https://wiki.postgresql.org/wiki/PostgreSQL_9.6_Open_Items
> 
> [1] -
> https://www.postgresql.org/message-id/CAA4eK1Ky2=HsTsT4hmfL=eal5rv0_t59tvwzvk9hqkvn6do...@mail.gmail.com
> [2] -
> https://www.postgresql.org/message-id/CAA4eK1L-Uo=s4=0jvvva51pj06u5wddvsqg673yuxj_ja+x...@mail.gmail.com
> [3] -
> https://www.postgresql.org/message-id/CAFiTN-vzg5BkK6kAh3OMhvgRu-uJvkjz47ybtopMAfGJp=z...@mail.gmail.com

[Action required within 72 hours.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
efforts toward speedy resolution.  Thanks.

[1] 
http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com




[HACKERS] Re: pg9.6 segfault using simple query (related to use fk for join estimates)

2016-06-01 Thread Noah Misch
On Sun, May 29, 2016 at 01:36:01AM -0400, Noah Misch wrote:
> On Fri, May 06, 2016 at 03:06:01PM -0400, Robert Haas wrote:
> > On Thu, May 5, 2016 at 10:48 AM, David Rowley
> >  wrote:
> > > On 5 May 2016 at 16:04, David Rowley  wrote:
> > >> I've started making some improvements to this, but need to talk to
> > >> Tomas. It's currently in the middle of his night, but will try to
> > >> catch him in his morning to discuss this with him.
> > >
> > > Ok, so I spoke to Tomas about this briefly, and he's asked me to send
> > > in this patch. He didn't get time to look over it due to some other
> > > commitments he has today.
> > >
> > > I do personally feel that if the attached is not good enough, or not
> > > very close to good enough then probably the best course of action is
> > > to revert the whole thing.
> > 
> > Tom, what do you think about this patch?  Is it good enough, or should
> > we revert the whole thing?
> 
> [This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 9.6 open item.  Simon,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> efforts toward speedy resolution.  Thanks.
> 
> [1] 
> http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com

This PostgreSQL 9.6 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-01 Thread Andres Freund
On 2016-06-01 15:33:18 -0700, Andres Freund wrote:
> Cpu: i7-6820HQ
> Ram: 24GB of memory
> Storage: Samsung SSD 850 PRO 1TB, encrypted
> postgres -c shared_buffers=6GB -c backend_flush_after=128 -c 
> max_wal_size=100GB -c fsync=on -c synchronous_commit=off
> pgbench -M prepared -c 16 -j 16 -T 520 -P 1 -n -N

Using a scale-5000 database, with WAL compression enabled (otherwise the
whole thing is too slow in both cases), and 64 clients gives:

disabled:
latency average = 11.896 ms
latency stddev = 42.187 ms
tps = 5378.025369 (including connections establishing)
tps = 5378.248569 (excluding connections establishing)

128:
latency average = 11.002 ms
latency stddev = 10.621 ms
tps = 5814.586813 (including connections establishing)
tps = 5814.840249 (excluding connections establishing)


With flushing disabled, roughly every 30s you see:
progress: 150.0 s, 6223.3 tps, lat 10.036 ms stddev 9.521
progress: 151.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 152.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 153.0 s, 4952.9 tps, lat 39.050 ms stddev 249.839

progress: 172.0 s, 4888.0 tps, lat 12.851 ms stddev 11.507
progress: 173.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 174.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 175.0 s, 4636.8 tps, lat 41.421 ms stddev 268.416

progress: 196.0 s, 1119.2 tps, lat 9.618 ms stddev 8.321
progress: 197.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 198.0 s, 1920.9 tps, lat 94.375 ms stddev 429.756
progress: 199.0 s, 5260.8 tps, lat 12.087 ms stddev 11.595


With backend flushing enabled there's not a single such pause.


If you use spinning rust instead of SSDs, the pauses aren't 1-2s
anymore, but easily 10-30s.

Andres




Re: [HACKERS] Does people favor to have matrix data type?

2016-06-01 Thread Kouhei Kaigai
> -Original Message-
> From: Jim Nasby [mailto:jim.na...@bluetreble.com]
> Sent: Wednesday, June 01, 2016 11:32 PM
> To: Kaigai Kouhei(海外 浩平); Gavin Flower; Joe Conway; Ants Aasma; Simon Riggs
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Does people favor to have matrix data type?
> 
> On 5/30/16 9:05 PM, Kouhei Kaigai wrote:
> > Due to performance reason, location of each element must be deterministic
> > without walking on the data structure. This approach guarantees we can
> > reach individual element with 2 steps.
> 
> Agreed.
> 
> On various other points...
> 
> Yes, please keep the discussion here, even when it relates only to PL/R.
> Whatever is being done for R needs to be done for plpython as well. I've
> looked at ways to improve analytics in plpython related to this, and it
> looks like I need to take a look at the fast-path function stuff. One of
> the things I've pondered for storing ndarrays in Postgres is how to
> reduce or eliminate the need to copy data from one memory region to
> another. It would be nice if there was a way to take memory that was
> allocated by one manager (ie: python's) and transfer ownership of that
> memory directly to Postgres without having to copy everything. Obviously
> you'd want to go the other way as well. IIRC cython's memory manager is
> the same as palloc in regard to very large allocations basically being
> ignored completely, so this should be possible in that case.
> 
> One thing I don't understand is why this type needs to be limited to 1
> or 2 dimensions? Isn't the important thing how many individual elements
> you can fit into GPU? So if you can fit a 1024x1024, you could also fit
> a 100x100x100, a 32x32x32x32, etc. At low enough values maybe that stops
> making sense, but I don't see why there needs to be an artificial limit.
>
Simply, I hadn't looked at matrices larger than 2 dimensions.
Is that a valid mathematical concept?
By its nature, a GPU is designed to map threads onto the X-, Y-,
and Z-axes. However, it is not limited to 3 dimensions, because the programmer
can handle, for example, the upper 10 bits of the X-axis as a 4th dimension.

> I think what's important for something like kNN is that the storage is
> optimized for this, which I think means treating the highest dimension
> as if it was a list. I don't know if it then matters whither the lower
> dimensions are C style vs FORTRAN style. Other algorithms might want
> different storage.
>
FORTRAN style is preferable, because it allows use of the BLAS library without
data format transformation.
I'm not sure why you prefer a list structure for the highest dimension.
A simple lookup table is enough and suitable for massively parallel processors.
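
To put the addressing point concretely, a tiny standalone example of flat,
column-major (FORTRAN/BLAS-style) element lookup, which keeps element
location deterministic without walking the data structure:

#include <stdio.h>
#include <stdlib.h>

/* column-major layout: element (i, j) of an nrows x ncols matrix lives at
 * data[j * nrows + i], so its location is pure arithmetic */
static double
matrix_get(const double *data, int nrows, int i, int j)
{
    return data[(size_t) j * nrows + i];
}

int
main(void)
{
    int     nrows = 3;
    int     ncols = 2;
    double *m = calloc((size_t) nrows * ncols, sizeof(double));

    m[1 * nrows + 2] = 42.0;            /* set element (i=2, j=1) */
    printf("m(2,1) = %g\n", matrix_get(m, nrows, 2, 1));
    free(m);
    return 0;
}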

> Something else to consider is the 1G toast limit. I'm pretty sure that's
> why MADlib stores matricies as a table of vectors. I know for certain
> it's a problem they run into, because they've discussed it on their
> mailing list.
> 
> BTW, take a look at MADlib svec[1]... ISTM that's just a special case of
> what you're describing with entire grids being zero (or vice-versa).
> There might be some commonality there.
> 
> [1] https://madlib.incubator.apache.org/docs/v1.8/group__grp__svec.html
>
Once we try to deal with a table as a representation of a matrix, it goes
beyond the scope of a data type in PostgreSQL. Users would likely have to
do special work to handle it, beyond the operator functions that
support matrix data types.

For a matrix larger than the toast limit, my thought is that a large matrix
consisting of multiple grids can have multiple toast pointers, one for each
sub-matrix. Although an individual sub-matrix is limited to 1GB, we can
represent an entire matrix larger than 1GB as a set of grids.

As I wrote in the previous message, the matrix structure head would have an
offset to each sub-matrix of the grid. A sub-matrix would be inline, or an
external toast pointer if VARATT_IS_1B_E(ptr).
We would probably also have to hack the row deletion code not to leak
sub-matrices in the toast table.
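
Roughly, the structure head I have in mind would look like this (entirely
hypothetical field names, just to illustrate the layout):

#include <stdint.h>

/* Hypothetical sketch of a matrix head carrying offsets to its grid of
 * sub-matrices.  Each offset points to a varlena datum that can be stored
 * inline or be an external toast pointer (VARATT_IS_1B_E). */
typedef struct MatrixGridHead
{
    int32_t     vl_len_;        /* varlena length word (placeholder) */
    int32_t     nrows;          /* rows of the whole matrix */
    int32_t     ncols;          /* columns of the whole matrix */
    int32_t     grid_rows;      /* rows per sub-matrix */
    int32_t     grid_cols;      /* columns per sub-matrix */
    int32_t     ngrids;         /* number of sub-matrices */
    uint64_t    grid_offset[];  /* offset of each sub-matrix */
} MatrixGridHead;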

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 

> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: [HACKERS] Parallel safety tagging of extension functions

2016-06-01 Thread Andreas Karlsson

On 06/02/2016 01:41 AM, Michael Paquier wrote:
> On Thu, Jun 2, 2016 at 7:36 AM, Andreas Karlsson  wrote:
>> Looked at this quickly and I do not think adding it would be what I consider
>> a small patch since we would essentially need to copy the validation logic
>> from DefineAggregate and AggregateCreate and modify it to fit the alter
>> case. I am leaning towards either leaving the aggregate functions
>> alone or updating the catalog manually.
>
> As this is proving to be a hassle, what would it cost to leave those
> operator classes as-is for 9.6 and come up with a cleaner solution at
> DDL level with 10.0? Then we could still focus on the other extensions
> that have content that can be easily switched. That's more than 90% of
> the things that need to be marked as parallel-safe.

I think at least three of the four aggregate functions are little used, 
so I do not think many users would be affected. And only min(citext) and 
max(citext) can make use of the parallel aggregation.


The functions are:

min(citext)
max(citext)
int_array_aggregate(int4)
rewrite(tsquery[])

It would be nice if we had support for this in ALTER AGGREGATE in 9.6 
already since I bet that there are external extensions which want to 
take advantage of the parallel aggregation, but at least if I add this 
command I do not feel like it would be a minor patch. If people disagree 
and are fine with copying the validation logic, then I am prepared to do 
the work. I would just rather not rush this if there is no chance anyway 
that it will get into 9.6.


Andreas




Re: [HACKERS] Parallel safety tagging of extension functions

2016-06-01 Thread Michael Paquier
On Thu, Jun 2, 2016 at 7:36 AM, Andreas Karlsson  wrote:
> On 05/25/2016 03:32 AM, Tom Lane wrote:
>>
>> Andreas Karlsson  writes:
>>>
>>> On 05/25/2016 02:41 AM, Robert Haas wrote:

>>>> I'd rather see us extend ALTER AGGREGATE to do this.
>>
>>
>>> Wouldn't that prevent this from going into 9.6? I do not think changing
>>> ALTER AGGREGATE is 9.6 material.
>>
>>
>> Well, it's debatable --- but if the patch to do it is small and the
>> alternatives are really ugly, that would be an acceptable choice IMO.
>> Certainly we'd want to add that capability eventually anyway.
>
>
> Looked at this quickly and I do not think adding it would be what I consider
> a small patch since we would essentially need to copy the validation logic
> from DefineAggregate and AggregateCreate and modify it to fit the alter
>> case. I am leaning towards either leaving the aggregate functions
> alone or updating the catalog manually.

As this is proving to be a hassle, what would it cost to leave those
operator classes as-is for 9.6 and come up with a cleaner solution at
DDL level with 10.0? Then we could still focus on the other extensions
that have content that can be easily switched. That's more than 90% of
the things that need to be marked as parallel-safe.
-- 
Michael




Re: [HACKERS] PostmasterPid not marked with PGDLLIMPORT

2016-06-01 Thread Michael Paquier
On Thu, Jun 2, 2016 at 6:40 AM, Tom Lane  wrote:
> I suggest that there's a more principled reason for refusing a back-patch
> here, which is that we don't back-patch new features, only bug fixes.
> This request is certainly not a bug fix.  It's in support of a new feature
> --- and one that's not even ours, but a third-party extension.

Yes, that's not a bug fix. I agree on that.

> Considering that said extension isn't finished yet, it's hard to guess
> whether this will be the only blocking factor preventing it from working
> with older versions, but it might well be that there are other problems.
> Also, even if it would work, the author would be reduced to saying things
> like "it will work in 9.4, if it's 9.4.9 or later".  Keeping track of that
> level of detail is no fun either for the author or for users.

The extension in this case is for 9.4, for a product yet to be
released that is now on 9.4.8, so I don't care much about the support
grid here to be honest.

> It'd be a lot simpler all around to just say "my spiffy new extension 
> requires 9.6
> or later".

Well, that's the only factor, as far as I saw, that prevented me from using
this extension on Windows. But I won't fight you over a backpatch;
bearing the burden of a custom patch applied to miscadmin.h
for REL9_4_STABLE is not huge: this code never changes.
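
(For reference, that custom patch is essentially a one-line change to the
existing declaration in src/include/miscadmin.h, adding the marking:)

extern PGDLLIMPORT pid_t PostmasterPid;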

> In short, I'd vote for putting this change in HEAD, but I see no need to
> back-patch.

OK, fine for me.
-- 
Michael




Re: [HACKERS] Parallel safety tagging of extension functions

2016-06-01 Thread Andreas Karlsson

On 05/25/2016 03:32 AM, Tom Lane wrote:

Andreas Karlsson  writes:

On 05/25/2016 02:41 AM, Robert Haas wrote:

I'd rather see us extend ALTER AGGREGATE to do this.



Wouldn't that prevent this from going into 9.6? I do not think changing
ALTER AGGREGATE is 9.6 material.


Well, it's debatable --- but if the patch to do it is small and the
alternatives are really ugly, that would be an acceptable choice IMO.
Certainly we'd want to add that capability eventually anyway.


Looked at this quickly and I do not think adding it would be what I 
consider a small patch since we would essentially need to copy the 
validation logic from DefineAggregate and AggregateCreate and modify it 
to fit the alter case. I am leaning towards either leaving the 
aggregate functions alone or updating the catalog manually.


Andreas




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-01 Thread Andres Freund
On 2016-05-31 16:03:46 -0400, Robert Haas wrote:
> On Fri, May 27, 2016 at 12:37 AM, Andres Freund  wrote:
> > I don't think the situation is quite that simple. By *disabling* backend 
> > flushing it's also easy to see massive performance regressions.  In 
> > situations where shared buffers was configured appropriately for the 
> > workload (not the case here IIRC).
> 
> On what kind of workload does setting backend_flush_after=0 represent
> a large regression vs. the default settings?
> 
> I think we have to consider that pgbench and parallel copy are pretty
> common things to want to do, and a non-zero default setting hurts
> those workloads a LOT.

I don't think pgbench's workload has much to do with reality. Even less
so in the setup presented here.

The slowdown comes from the fact that default pgbench randomly, but
uniformly, updates a large table. That is slower with
backend_flush_after if the workload is considerably bigger than
shared_buffers but, and that's a very important restriction, the
workload at the same time largely fits into less than
/proc/sys/vm/dirty_ratio / 20% (probably even 10% /
/proc/sys/vm/dirty_background_ratio) of the free OS memory.  The "trick"
in that case is that very often, before a buffer has been written back
to storage by the OS, it'll be re-dirtied by postgres.  That means
triggering flushing from postgres increases the total amount of writes.
That only matters if the kernel doesn't trigger writeback because of the
above ratios, or because of time limits (30s /
dirty_writeback_centisecs).
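
(If it isn't obvious what triggering flushing from postgres means at the OS
level: on Linux it boils down to sync_file_range() with
SYNC_FILE_RANGE_WRITE, i.e. asking the kernel to start writeback for a file
range without waiting for completion.  A standalone toy example, not the
backend code:)

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    int     fd;

    if (argc < 2)
    {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_WRONLY);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    /* offset 0, nbytes 0 means "from the start to the end of the file" */
    if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE) != 0)
        perror("sync_file_range");
    close(fd);
    return 0;
}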


> I have a really hard time believing that the benefits on other
> workloads are large enough to compensate for the slowdowns we're
> seeing here.

As a random example, without looking for good parameters, on my laptop:
pgbench -i -q -s 1000

Cpu: i7-6820HQ
Ram: 24GB of memory
Storage: Samsung SSD 850 PRO 1TB, encrypted
postgres -c shared_buffers=6GB -c backend_flush_after=128 -c max_wal_size=100GB 
-c fsync=on -c synchronous_commit=off
pgbench -M prepared -c 16 -j 16 -T 520 -P 1 -n -N
(note the -N)
disabled:
latency average = 2.774 ms
latency stddev = 10.388 ms
tps = 5761.883323 (including connections establishing)
tps = 5762.027278 (excluding connections establishing)

128:
latency average = 2.543 ms
latency stddev = 3.554 ms
tps = 6284.069846 (including connections establishing)
tps = 6284.184570 (excluding connections establishing)

Note the latency stddev, which is 3x better. And the improved throughput.

That's for a workload which even fits into OS memory. Without
backend flushing there are several periods looking like:

progress: 249.0 s, 7237.6 tps, lat 1.997 ms stddev 4.365
progress: 250.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 251.0 s, 1880.6 tps, lat 17.761 ms stddev 169.682
progress: 252.0 s, 6904.4 tps, lat 2.328 ms stddev 3.256

i.e. moments in which no transactions are executed. And that's on
storage that can do 500MB/sec and tens of thousands of IOPS.


If you change to a workload that uses synchronous_commit, is
bigger than OS memory, and/or doesn't have very fast storage, the
differences can be a *LOT* bigger.


In general, any workload which doesn't a) fit the above criteria of
likely re-dirtying blocks it already dirtied before kernel-triggered
writeback happens, or b) concurrently COPY into an individual file, is
likely to be faster (or unchanged if within s_b) with backend flushing.


That means transactional workloads that are bigger than OS
memory, or which have a non-uniform distribution leading to some
locality, are likely to be faster. In practice those are *hugely* more
likely than the uniform distribution that pgbench has.

Similarly, this *considerably* reduces the impact a concurrently running
VACUUM or COPY has on concurrent queries, because suddenly VACUUM/COPY
can't create a couple of gigabytes of dirty buffers which will be written
back at some random point in time later, stalling everything.


I think the benefits of more predictable (and often faster!)
performance in a bunch of actual real-world-ish workloads are higher than
optimizing for benchmarks.



> We have nobody writing in to say that
> backend_flush_after>0 is making things way better for them, and
> Ashutosh and I have independently hit massive slowdowns on unrelated
> workloads.

Actually, we do have some evidence of that, just so far not in this
thread, which I don't find particularly surprising.


- Andres




Re: [HACKERS] PostmasterPid not marked with PGDLLIMPORT

2016-06-01 Thread David G. Johnston
On Wed, Jun 1, 2016 at 5:40 PM, Tom Lane  wrote:

> > On Wed, Jun 1, 2016 at 5:04 PM, Robert Haas 
> wrote:
> >> Probably not, but yes, I do want to reduce the commit load. I also
> >> think that we essentially have a contract with our users to limit what
> >> we back-patch to critical bug fixes and security fixes.  When we don't
> >> do that, people start asking to have individual fixes cherry-picked
> >> instead of just upgrading, and that's not good.  We may know that such
> >> changes are low-risk, but that doesn't mean everyone else does.
>
> I suggest that there's a more principled reason for refusing a back-patch
> here, which is that we don't back-patch new features, only bug fixes.
> This request is certainly not a bug fix.  It's in support of a new feature
> --- and one that's not even ours, but a third-party extension.
>

Maybe I don't understand PGDLLEXPORT...

The PostgreSQL function/feature in question is already in place and can be
accessed by someone using Linux or another Unix-like variant.  But it cannot
be accessed by our Windows users because we failed to add a PGDLLEXPORT
somewhere.  If it is our goal to treat Windows and Linux/Unix equally then
that discrepancy is on its face a bug.  The fact that we don't catch these until
some third party points it out doesn't make it any less a bug.


> Considering that said extension isn't finished yet, it's hard to guess
> whether this will be the only blocking factor preventing it from working
> with older versions, but it might well be that there are other problems.
>

From my prior point, the reason someone wants to use the unexposed but
documented public API shouldn't matter.

> Also, even if it would work, the author would be reduced to saying things
> like "it will work in 9.4, if it's 9.4.9 or later".  Keeping track of that
> level of detail is no fun either for the author or for users.


This seems hollow.  "It works on the latest version of all officially
supported PostgreSQL releases" is a reasonable policy for a third-party
application to take.  That it might work on older releases would be a
bonus for some new users.

> It'd be
> a lot simpler all around to just say "my spiffy new extension requires 9.6
> or later".
>

And it makes the available user base considerably smaller.  A small change
to get the other 80% of the users is something to take seriously.


>
> In short, I'd vote for putting this change in HEAD, but I see no need to
> back-patch.
>

I see a need for back-patching and no technical reason why it cannot be
done - easily.

David J.


Re: [HACKERS] PostmasterPid not marked with PGDLLIMPORT

2016-06-01 Thread Tom Lane
> On Wed, Jun 1, 2016 at 5:04 PM, Robert Haas  wrote:
>> Probably not, but yes, I do want to reduce the commit load. I also
>> think that we essentially have a contract with our users to limit what
>> we back-patch to critical bug fixes and security fixes.  When we don't
>> do that, people start asking to have individual fixes cherry-picked
>> instead of just upgrading, and that's not good.  We may know that such
>> changes are low-risk, but that doesn't mean everyone else does.

I suggest that there's a more principled reason for refusing a back-patch
here, which is that we don't back-patch new features, only bug fixes.
This request is certainly not a bug fix.  It's in support of a new feature
--- and one that's not even ours, but a third-party extension.

Considering that said extension isn't finished yet, it's hard to guess
whether this will be the only blocking factor preventing it from working
with older versions, but it might well be that there are other problems.
Also, even if it would work, the author would be reduced to saying things
like "it will work in 9.4, if it's 9.4.9 or later".  Keeping track of that
level of detail is no fun either for the author or for users.  It'd be
a lot simpler all around to just say "my spiffy new extension requires 9.6
or later".

In short, I'd vote for putting this change in HEAD, but I see no need to
back-patch.

regards, tom lane




Re: [HACKERS] Rename max_parallel_degree?

2016-06-01 Thread Josh berkus
On 06/01/2016 02:21 PM, Robert Haas wrote:
> If you lined up ten people in a room all of whom were competent
> database professionals and none of whom knew anything about PostgreSQL
> and asked them to guess what a setting called work_mem does and what a
> setting called max_parallel_degree does, I will wager you $5 that
> they'd do better on the second one.  Likewise, I bet the guesses for
> max_parallel_degree would be closer to the mark than the guesses for
> maintenance_work_mem or replacement_sort_tuples or commit_siblings or
> bgwriter_lru_multiplier.

Incidentally, the reason I didn't jump into this thread until the
patches showed up is that I don't think it actually matters what the
parameters are named.  They're going to require documentation
regardless; parallelism just isn't something people grok instinctively.

I care about how the parameters *work*, and whether that's consistent
across our various resource management settings.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Rename max_parallel_degree?

2016-06-01 Thread Tom Lane
Robert Haas  writes:
> I've largely given up hope of coming up with an alternative that can
> attract more than one vote and that is also at least mildly accurate,
> but one idea is max_parallel_workers_per_gather_node.  That will be
> totally clear.

Given the reference to Gather nodes, couldn't you drop the word
"parallel"?  "node" might not be necessary either.

regards, tom lane




Re: [HACKERS] PostmasterPid not marked with PGDLLIMPORT

2016-06-01 Thread David G. Johnston
On Wed, Jun 1, 2016 at 5:04 PM, Robert Haas  wrote:

> On Wed, Jun 1, 2016 at 12:24 PM, Alvaro Herrera
>  wrote:
> > Robert Haas wrote:
> >> On Wed, Jun 1, 2016 at 12:06 AM, Craig Ringer 
> wrote:
> >> > On 1 June 2016 at 11:48, Michael Paquier 
> wrote:
> >> >> Could it be possible to mark PostmasterPid with PGDLLIMPORT on HEAD
> >> >> and back-branches?
> >> >
> >> > Sounds sensible to me.
> >>
> >> I don't really want to set a precedent that we'll back-patch
> >> PGDLLIMPORT markings every time somebody needs a new symbol for some
> >> extension they are writing, but I don't mind changing this in master.
> >
> > I wonder why is that -- just to reduce the commit load?  I don't think
> > this kind of change is likely to break anything, is it?
>
> Probably not, but yes, I do want to reduce the commit load. I also
> think that we essentially have a contract with our users to limit what
> we back-patch to critical bug fixes and security fixes.  When we don't
> do that, people start asking to have individual fixes cherry-picked
> instead of just upgrading, and that's not good.  We may know that such
> changes are low-risk, but that doesn't mean everyone else does.


Are there two concerns here?  One, that people will think we are
back-patching stuff and destabilizing back-branches, and two, that people
will see increased back-patching and therefore make unreasonable requests of
us to which we dislike saying "no".  The latter doesn't seem likely, and I'd
say you can't stop people from having badly formed opinions and that our
track record on back-patching decisions is excellent.

We want third-party tools to support our prior releases, and if we miss
making one of our features available to Windows because of a missing
PGDLLIMPORT, that's on our heads and should be fixed.  If a user equates
that to "please back-patch jsonb to 9.3 because I don't want to upgrade",
I'm not going to feel much guilt saying "that's different, ain't gonna
happen".  Informed people will understand the purpose of the back-patch,
and until I start hearing a vocal uninformed person griping I'd rather
give the community the benefit of the doubt.

David J.


Re: [HACKERS] Rename max_parallel_degree?

2016-06-01 Thread Robert Haas
On Wed, Jun 1, 2016 at 10:10 AM, Tom Lane  wrote:
> Amit Kapila  writes:
>> Your explanation is clear, however the name max_parallel_workers makes it
>> sound like that parallelising an operation is all about workers.  Yes it
>> depends a lot on the number of workers allocated for parallel operation,
>> but that is not everything.  I think calling it max_parallelism as
>> suggested by Alvaro upthread suits better than max_parallel_workers.
>
> I don't think that's a good direction at all.  This entire discussion is
> caused by the fact that it's not very clear what "max_parallel_degree"
> measures.  Fixing that problem by renaming the variable to something that
> doesn't even pretend to tell you what it's counting is not an improvement.

I agree with that, but I also think you're holding the name of this
GUC to a ridiculously high standard.  It's not like you can look at
"work_mem" and know what it measures without reading the fine manual.
If you lined up ten people in a room all of whom were competent
database professionals and none of whom knew anything about PostgreSQL
and asked them to guess what a setting called work_mem does and what a
setting called max_parallel_degree does, I will wager you $5 that
they'd do better on the second one.  Likewise, I bet the guesses for
max_parallel_degree would be closer to the mark than the guesses for
maintenance_work_mem or replacement_sort_tuples or commit_siblings or
bgwriter_lru_multiplier.

I've largely given up hope of coming up with an alternative that can
attract more than one vote and that is also at least mildly accurate,
but one idea is max_parallel_workers_per_gather_node.  That will be
totally clear.  Three possible problems, none of them fatal, are:

- I have plans to eventually fix it so that the workers are shared
across the whole plan rather than just the plan node.  In most cases
that won't matter, but there are corner cases where it does.  Now when
that happens, we can rename this to max_parallel_workers_per_query,
breaking backward compatibility.
- It will force us to use a different GUC for utility commands.
- It's a bit long-winded.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Typo in comment in nbtree.h

2016-06-01 Thread Thomas Munro
Hi

Following along with a btree bug report, I saw a typo "referencd" in a
comment.  Also "we've" seems a bit odd here, but maybe it's just me.
Maybe it should be like this?

--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -522,7 +522,7 @@ typedef struct BTScanPosData
 	Buffer		buf;			/* if valid, the buffer is pinned */
 
 	XLogRecPtr	lsn;			/* pos in the WAL stream when page was read */
-	BlockNumber currPage;		/* page we've referencd by items array */
+	BlockNumber currPage;		/* page referenced by items array */
 	BlockNumber nextPage;		/* page's right link when we scanned it */
 
 	/*

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] PostmasterPid not marked with PGDLLIMPORT

2016-06-01 Thread Robert Haas
On Wed, Jun 1, 2016 at 12:24 PM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> On Wed, Jun 1, 2016 at 12:06 AM, Craig Ringer  wrote:
>> > On 1 June 2016 at 11:48, Michael Paquier  wrote:
>> >> Could it be possible to mark PostmasterPid with PGDLLIMPORT on HEAD
>> >> and back-branches?
>> >
>> > Sounds sensible to me.
>>
>> I don't really want to set a precedent that we'll back-patch
>> PGDLLIMPORT markings every time somebody needs a new symbol for some
>> extension they are writing, but I don't mind changing this in master.
>
> I wonder why is that -- just to reduce the commit load?  I don't think
> this kind of change is likely to break anything, is it?

Probably not, but yes, I do want to reduce the commit load. I also
think that we essentially have a contract with our users to limit what
we back-patch to critical bug fixes and security fixes.  When we don't
do that, people start asking to have individual fixes cherry-picked
instead of just upgrading, and that's not good.  We may know that such
changes are low-risk, but that doesn't mean everyone else does.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-06-01 Thread Tom Lane
Paul Ramsey  writes:
> One of the things people find annoying about postgis is that
> ST_Intersects(ST_Intersection(a, b), a) can come out as false (a
> derived point at a crossing of lines may not exactly intersect either
> of the input lines), which is a direct result of our use of exact math
> for the boolean intersects test.

That's an interesting comment, because it's more or less exactly the
type of failure that we could expect to get if we remove fuzzy comparisons
from the built-in types.  How much of a problem is it in practice for
PostGIS users?  Do you have any plans to change it?
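
(For anyone who hasn't looked at the geo code: the built-in types' fuzzy
comparisons are epsilon-based, roughly along these lines; a standalone
sketch, not the in-tree macros verbatim.)

#include <math.h>
#include <stdio.h>

#define EPSILON 1.0E-06
#define FPeq(A, B)  (fabs((A) - (B)) <= EPSILON)

int
main(void)
{
    double  a = 1.0;
    double  b = 1.0 + 1e-9;     /* differs only below the tolerance */

    printf("exact: %d\n", a == b);      /* 0: not equal */
    printf("fuzzy: %d\n", FPeq(a, b));  /* 1: treated as equal */
    return 0;
}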

> Anyways, go forth and do whatever makes sense for PgSQL

I think we're trying to figure out what that is ...

regards, tom lane




Re: [HACKERS] JSON[B] arrays are second-class citizens

2016-06-01 Thread Pavel Stehule
2016-06-01 17:55 GMT+02:00 Jim Nasby :

> On 5/31/16 7:04 PM, Peter van Hardenberg wrote:
>
>> The idea of converting a JSONB array to a PG array is appealing and
>> would potentially be more general-purpose than adding a new unnest. I'm
>> not sure how feasible either suggestion is.
>>
>
> The one part I think is missing right now is unnest allows you to 'stitch'
> or 'zip' multiple arrays together into a single recordset via
> unnest(array1, array2, ...). Presumably that could be added to the json
> equivalents.
>
> I will say that I think the current state of affairs is gratuitously
>> verbose and expects users to memorize a substantial number of long
>> function names to perform simple tasks.
>>
>
> +100. It's *much* easier to deal with JSON in other languages because they
> have native support for the concept of a dictionary, so changing an element
> is as simple as json['foo'][3] = 'new'. Trying to do that in Postgres is
> horrible partly because of the need to remember some odd operator, but
> moreso because it's ultimately still an operator. What we need is a form of
> *addressing*. If you could directly access items in a JSON doc with []
> notation then a lot of the current functions could go away, *especially* if
> the [] notation allowed things like a slice and a list of values (ie:
> json['foo', 'bar', 'baz'] = '[42,{"my": "nice object"},"with a random
> string"]'. Or = row(42, ...).
>

I would like to see these features in Postgres too.

Regards

Pavel


> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
>
>


Re: [HACKERS] Misdesigned command/status APIs for parallel dump/restore

2016-06-01 Thread Tom Lane
I wrote:
> I am pretty strongly tempted to get rid of MasterStartParallelItem
> altogether and just hard-code what it does in DispatchJobForTocEntry.
> ...
> A different line of thought would be to fix the worker-command-parsing
> situation by allowing the parsing to happen in format-specific code,
> but avoid duplicative coding by letting archive formats share a common
> implementation function if they had no need for any custom data.

In the attached patch for this, I took a middle ground of separating out
the command and status string building and parsing functions.  There isn't
presently any provision for letting archive format modules override these,
but that could easily be added if we ever need it.  Meanwhile, this saves
about 100 lines of rather messy code.
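
(As a toy illustration of the command-string convention mentioned above,
"DUMP 1234" / "RESTORE 1234", here is roughly what building and parsing such
strings looks like; this is not the pg_dump code itself.)

#include <stdio.h>

static void
buildWorkerCommand(int dumpId, int is_restore, char *buf, int buflen)
{
    snprintf(buf, buflen, "%s %d", is_restore ? "RESTORE" : "DUMP", dumpId);
}

static int
parseWorkerCommand(const char *msg, int *dumpId, int *is_restore)
{
    if (sscanf(msg, "DUMP %d", dumpId) == 1)
        *is_restore = 0;
    else if (sscanf(msg, "RESTORE %d", dumpId) == 1)
        *is_restore = 1;
    else
        return 0;               /* unrecognized command */
    return 1;
}

int
main(void)
{
    char    cmd[32];
    int     id;
    int     restore;

    buildWorkerCommand(1234, 0, cmd, sizeof(cmd));
    if (parseWorkerCommand(cmd, &id, &restore))
        printf("parsed: id=%d restore=%d\n", id, restore);
    return 0;
}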

Like the other couple of patches I've posted recently for parallel dump,
this is against current HEAD.  There will be minor merge conflicts when
these patches are combined; but they should be reviewable independently,
so I'll worry about the merge issues later.

regards, tom lane

diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index e9e8698..509a312 100644
*** a/src/bin/pg_dump/parallel.c
--- b/src/bin/pg_dump/parallel.c
***
*** 20,49 
   * the desired number of worker processes, which each enter WaitForCommands().
   *
   * The master process dispatches an individual work item to one of the worker
!  * processes in DispatchJobForTocEntry().  That calls
!  * AH->MasterStartParallelItemPtr, a routine of the output format.  This
!  * function's arguments are the parents archive handle AH (containing the full
!  * catalog information), the TocEntry that the worker should work on and a
!  * T_Action value indicating whether this is a backup or a restore task.  The
!  * function simply converts the TocEntry assignment into a command string that
!  * is then sent over to the worker process. In the simplest case that would be
!  * something like "DUMP 1234", with 1234 being the TocEntry id.
!  *
   * The worker process receives and decodes the command and passes it to the
   * routine pointed to by AH->WorkerJobDumpPtr or AH->WorkerJobRestorePtr,
   * which are routines of the current archive format.  That routine performs
!  * the required action (dump or restore) and returns a malloc'd status string.
!  * The status string is passed back to the master where it is interpreted by
!  * AH->MasterEndParallelItemPtr, another format-specific routine.  That
!  * function can update state or catalog information on the master's side,
!  * depending on the reply from the worker process.  In the end it returns a
!  * status code, which is 0 for successful execution.
   *
!  * Remember that we have forked off the workers only after we have read in
!  * the catalog.  That's why our worker processes can also access the catalog
!  * information.  (In the Windows case, the workers are threads in the same
!  * process.  To avoid problems, they work with cloned copies of the Archive
!  * data structure; see init_spawned_worker_win32().)
   *
   * In the master process, the workerStatus field for each worker has one of
   * the following values:
--- 20,41 
   * the desired number of worker processes, which each enter WaitForCommands().
   *
   * The master process dispatches an individual work item to one of the worker
!  * processes in DispatchJobForTocEntry().  We send a command string such as
!  * "DUMP 1234" or "RESTORE 1234", where 1234 is the TocEntry ID.
   * The worker process receives and decodes the command and passes it to the
   * routine pointed to by AH->WorkerJobDumpPtr or AH->WorkerJobRestorePtr,
   * which are routines of the current archive format.  That routine performs
!  * the required action (dump or restore) and returns an integer status code.
!  * This is passed back to the master which updates its state accordingly.
   *
!  * In principle additional archive-format-specific information might be needed
!  * in commands or worker status responses, but so far that hasn't proved
!  * necessary, since workers have full copies of the ArchiveHandle/TocEntry
!  * data structures.  Remember that we have forked off the workers only after
!  * we have read in the catalog.  That's why our worker processes can also
!  * access the catalog information.  (In the Windows case, the workers are
!  * threads in the same process.  To avoid problems, they work with cloned
!  * copies of the Archive data structure; see init_spawned_worker_win32().)
   *
   * In the master process, the workerStatus field for each worker has one of
   * the following values:
*** ParallelBackupEnd(ArchiveHandle *AH, Par
*** 703,708 
--- 695,804 
  }
  
  /*
+  * These next four functions handle construction and parsing of the command
+  * strings and response strings for parallel workers.
+  *
+  * Currently, these can be the same regardless of which archive format we are
+  * processing.  In future, 

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-06-01 Thread Paul Ramsey
On Wed, Jun 1, 2016 at 8:59 AM, Jim Nasby  wrote:
> On 6/1/16 10:03 AM, Paul Ramsey wrote:
>>
>> We don't depend on these, we have our own :/
>> The real answer for a GIS system is to have an explicit tolerance
>> parameter for calculations like distance/touching/containment, but
>> unfortunately we didn't do that so now we have our own
>> compatibility/boil the ocean problem if we ever wanted/were funded to
>> add one.
>
>
> Well it sounds like what's currently happening in Postgres is probably going
> to change, so how might we structure that to help PostGIS? Would simply
> lopping off the last few bits of the significand/mantissa work, or is that
> not enough when different GRSes are involved?

PostGIS doesn't look at all at what the PgSQL geotypes do, so go
forward w/o fear. Tolerance in geo world is more than vertex rounding
though, it's things like saying that when distance(pt,line) < epsilon
then distance(pt,line) == 0, or similarly for shape touching, etc. One
of the things people find annoying about postgis is that
ST_Intersects(ST_Intersection(a, b), a) can come out as false (a
derived point at a crossing of lines may not exactly intersect either
of the input lines), which is a direct result of our use of exact math
for the boolean intersects test.
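
A sketch of the check being described, with made-up geometries (whether any
particular pair actually trips the exact-math failure depends on the
coordinates involved; ST_Intersection/ST_Intersects are the PostGIS calls
named above):

    SELECT ST_Intersects(ST_Intersection(a, b), a)
    FROM (VALUES ('LINESTRING(0 0, 10 10)'::geometry,
                  'LINESTRING(0 10, 10 0.00001)'::geometry)) AS g(a, b);
    -- The derived crossing point need not lie exactly on either input line,
    -- so this exact-math test can come back false.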

Anyways, go forth and do whatever makes sense for PgSQL

P

>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] JSON[B] arrays are second-class citizens

2016-06-01 Thread David Fetter
On Tue, May 31, 2016 at 06:15:32PM -0400, David G. Johnston wrote:
> I stand corrected.  I was thinking you could somehow craft unnest('<json value here>') but there is no way to auto-convert to "anyarray"...
> 
> > The json_array_elements family manages to do the right thing.  Why
> > would it be harder to make sure UNNEST and ROWS FROM() do so?
> >
> 
> I have apparently failed to understand your point.  All I saw was that you
> wanted "unnest(jsonb)" to work in an identical fashion to
> "​jsonb_array_elements(jsonb)".  If there is some aspect beyond this being
> an aliasing situation then you have failed to communicate it such that I
> comprehended that fact.
> 

Upon further investigation, I think UNNEST should Just Work™ which is
to say that it should unnest arrays into their top-level constituent
elements if the standard doesn't specify some other behavior.

Separately, I suppose, I think there needs to be an easy way to cast
the output of UNNEST.  Lacking knowledge of the intricacies of
parsing, etc., I'd propose CAST(UNNEST(...) AS ...), or better yet,
UNNEST(...)::<type>, at least in the case without WITH ORDINALITY.

Further out in the future, at least so it seems to me, it would be
nice to have a feature where one could cast a column to an expanded
row type, e.g.:

SELECT my_jsonb::(i INT, t TEXT, p POINT), foo, bar
FROM ...

and get a result set with 5 columns in it.
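
For contrast, roughly what is available today (the table name "tbl" is made
up for illustration; my_jsonb, foo and bar are from the query above):
jsonb_to_record() with a column definition list supplies the typing the
proposed cast syntax would express inline.

    SELECT r.i, r.t, r.p, tbl.foo, tbl.bar
    FROM tbl,
         LATERAL jsonb_to_record(tbl.my_jsonb) AS r(i int, t text, p point);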

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] PostmasterPid not marked with PGDLLIMPORT

2016-06-01 Thread Alvaro Herrera
Robert Haas wrote:
> On Wed, Jun 1, 2016 at 12:06 AM, Craig Ringer  wrote:
> > On 1 June 2016 at 11:48, Michael Paquier  wrote:
> >> Could it be possible to mark PostmasterPid with PGDLLIMPORT on HEAD
> >> and back-branches?
> >
> > Sounds sensible to me.
> 
> I don't really want to set a precedent that we'll back-patch
> PGDLLIMPORT markings every time somebody needs a new symbol for some
> extension they are writing, but I don't mind changing this in master.

I wonder why that is -- just to reduce the commit load?  I don't think
this kind of change is likely to break anything, is it?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] JSON[B] arrays are second-class citizens

2016-06-01 Thread David G. Johnston
On Tue, May 31, 2016 at 6:20 PM, Tom Lane  wrote:

> David Fetter  writes:
> > On Tue, May 31, 2016 at 05:06:00PM -0400, David G. Johnston wrote:
> >> While likely not that common, the introduction of an ambiguity
> >> raises the bar considerably.
>
> > What ambiguity?
>
> You get that as a result of the recent introduction of unnest(tsvector),
> which we debated a few weeks ago and seem to have decided to leave as-is.
> But it failed before 9.6 too, with
>
> ERROR:  could not determine polymorphic type because input has type "unknown"
>
> So at least in this particular case, adding unnest(jsonb) wouldn't be a
> problem from the standpoint of not being able to resolve calls that we
> could resolve before.
>
> Nonetheless, there *is* an ambiguity here, which is specific to json(b):
> what type of array are you expecting to get?  The reason we have both
> json[b]_array_elements() and json[b]_array_elements_text() is that there
> are plausible use-cases for returning either json or plain text.  It's not
> hard to imagine that somebody will want json[b]_array_elements_numeric()
> before long, too.  If you want to have an unnest(jsonb) then you will need
> to make an arbitrary decision about which type it will return, and that
> doesn't seem like an especially great idea to me.
>

I'm on the fence given the presence of the tsvector overload and the lack
of any syntactic concerns.

I would either have it keep the same form as our main unnest function:


and/or have two functions

unnest(json, anyelement) : anyelement
unnest(jsonb, anyelement) : anyelement

The first one cannot fail at runtime (due to type conversion) while the
latter two can.

If you can't tell, I do like our introduction of what are basically Java
generics into our idiomatic json implementation.

I'd call this implementing a better option for polymorphism than creating
a new function every time someone wants typed output.
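
For illustration only, calls against the proposed two-argument form might
look like this (hypothetical, nothing here is implemented; the second
argument would exist purely to supply the result type):

    SELECT unnest('[1, 2, 3]'::jsonb, NULL::int);    -- rows of int
    SELECT unnest('["a", "b"]'::jsonb, NULL::text);  -- rows of text
    SELECT unnest('["a", "b"]'::jsonb, NULL::int);   -- would fail at runtime:
                                                     -- elements aren't numbers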

David J.


Re: [HACKERS] JSON[B] arrays are second-class citizens

2016-06-01 Thread David G. Johnston
On Wed, Jun 1, 2016 at 11:55 AM, Jim Nasby  wrote:

> On 5/31/16 7:04 PM, Peter van Hardenberg wrote:
>
>> The idea of converting a JSONB array to a PG array is appealing and
>> would potentially be more general-purpose than adding a new unnest. I'm
>> not sure how feasible either suggestion is.
>>
>
> The one part I think is missing right now is unnest allows you to 'stitch'
> or 'zip' multiple arrays together into a single recordset via
> unnest(array1, array2, ...). Presumably that could be added to the json
> equivalents.


You can just use "ROWS FROM" to get the same result.

https://www.postgresql.org/docs/9.6/static/queries-table-expressions.html#QUERIES-TABLEFUNCTIONS
"""
The special table function UNNEST may be called with any number of array
parameters, and it returns a corresponding number of columns, as if UNNEST
(Section 9.18) had been called on each parameter separately and combined
using the ROWS FROM construct.
"""

​Yes, the unnest form can be used within the target-list but that argument
is not going to get you very far as that use of SRF is greatly frowned upon
now that we have LATERAL.
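
Concretely, the zipping Jim describes works today with ROWS FROM (literal
arrays used here just for illustration):

    SELECT t.a, t.b, t.ord
    FROM ROWS FROM (
             jsonb_array_elements('[1, 2, 3]'::jsonb),
             jsonb_array_elements('["x", "y"]'::jsonb)
         ) WITH ORDINALITY AS t(a, b, ord);
    -- The shorter array is padded with NULLs, and ord numbers the rows.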

David J.


Re: [HACKERS] Parallel safety tagging of extension functions

2016-06-01 Thread Andreas Karlsson

On 06/01/2016 05:15 PM, Tom Lane wrote:

Andreas Karlsson  writes:

On 06/01/2016 04:44 PM, Tom Lane wrote:

I don't understand why you think you need the CREATE OR REPLACE FUNCTION
commands?  We only need to change proargtypes, and the updates did that.
The catcache can take care of itself.



Maybe I did something wrong (I try to avoid doing manual catalog
updates), but when I tested it, I needed to run the CREATE OR REPLACE
FUNCTION command to later be able to set the parallel safety. See the
example below.


In this particular example, the problem seems to be that you forgot to
update pronargs; it works for me when I add "pronargs = 2" to the UPDATE.
Your "working" example is actually creating a new function, not replacing
the old pg_proc entry.

BTW, it strikes me that the pronamespace tests in these queries are
redundant: the OID is unique, so what matters is the search path
in the regprocedure lookups.


Thanks, I have fixed this.


The reason I use to_regprocedure is so that these scripts work for
people who have installed the extensions in beta1 and therefore only
have the new signatures. If they run these scripts they would get an
error if I used the cast. Is it ok if these scripts break for beta1 users?


Ugh.  This is more of a mess than I thought.  I don't like update scripts
that might silently fail to do what they're supposed to, but leaving
beta1 users in the lurch is not very nice either.

I wonder ... could we get away with using regproc rather than
regprocedure?  These function names are probably unique anyway ...


Yeah, I would have strongly preferred to be able to use the cast. All 
these functions have unique names within the core, but there is the 
small risk of someone adding a user function with the same name. I do 
not like either option.


The attached patch still uses to_regprocedure, but I can change to using 
::regproc if that is what you prefer.


Andreas



gin-gist-signatures-v2.patch.gz
Description: application/gzip



Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-06-01 Thread Jim Nasby

On 6/1/16 10:03 AM, Paul Ramsey wrote:

We don't depend on these, we have our own :/
The real answer for a GIS system is to have an explicit tolerance
parameter for calculations like distance/touching/containment, but
unfortunately we didn't do that so now we have our own
compatibility/boil the ocean problem if we ever wanted/were funded to
add one.


Well it sounds like what's currently happening in Postgres is probably 
going to change, so how might we structure that to help PostGIS? Would 
simply lopping off the last few bits of the significand/mantissa work, 
or is that not enough when different GRSes are involved?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] JSON[B] arrays are second-class citizens

2016-06-01 Thread Jim Nasby

On 5/31/16 7:04 PM, Peter van Hardenberg wrote:

The idea of converting a JSONB array to a PG array is appealing and
would potentially be more general-purpose than adding a new unnest. I'm
not sure how feasible either suggestion is.


The one part I think is missing right now is unnest allows you to 
'stitch' or 'zip' multiple arrays together into a single recordset via 
unnest(array1, array2, ...). Presumably that could be added to the json 
equivalents.



I will say that I think the current state of affairs is gratuitously
verbose and expects users to memorize a substantial number of long
function names to perform simple tasks.


+100. It's *much* easier to deal with JSON in other languages because
they have native support for the concept of a dictionary, so changing an
element is as simple as json['foo'][3] = 'new'. Trying to do that in
Postgres is horrible partly because of the need to remember some odd
operator, but more so because it's ultimately still an operator. What we
need is a form of *addressing*. If you could directly access items in a
JSON doc with [] notation then a lot of the current functions could go
away, *especially* if the [] notation allowed things like a slice and a
list of values (ie: json['foo', 'bar', 'baz'] = '[42, {"my": "nice
object"}, "with a random string"]' or = row(42, ...)).
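
For comparison, the closest thing available today (9.5's jsonb_set(); the
table "t" and column "j" are made up for the example):

    -- Today's spelling of json['foo'][3] = 'new':
    UPDATE t SET j = jsonb_set(j, '{foo,3}', '"new"'::jsonb);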

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] Rename max_parallel_degree?

2016-06-01 Thread David G. Johnston
On Wed, Jun 1, 2016 at 11:45 AM, Petr Jelinek  wrote:

> That GUC also controls worker processes that are started by extensions,
> not just ones that parallel query starts. This is btw one thing I don't
> like at all about how the current limits work, the parallel query will
> fight for workers with extensions because they share the same limit.


Given that this models reality, the GUC is doing its job.  Now, maybe we
need additional knobs to give the end-user the ability to influence how
those fights will turn out.

But as far as a high-level setting goes, max_worker_processes seems to fit
the bill - and apparently fits within our existing cluster options naming
convention.

Parallel query uses workers to assist in query execution.
Background tasks use workers during execution.
Others.

David J.


Re: [HACKERS] Rename max_parallel_degree?

2016-06-01 Thread Petr Jelinek

On 01/06/16 17:27, Jim Nasby wrote:

On 5/31/16 8:48 PM, Robert Haas wrote:

On Tue, May 31, 2016 at 5:58 PM, Tom Lane  wrote:

Alvaro Herrera  writes:

Robert Haas wrote:

I just want to point out that if we change #1, we're breaking
postgresql.conf compatibility for, IMHO, not a whole lot of benefit.
I'd just leave it alone.



We can add the old name as a synonym in guc.c to maintain
compatibility.


I doubt this is much of an issue at this point; max_worker_processes has
only been there a release or so, and surely there are very few people
explicitly setting it, given its limited use-case up to now.  It will be
really hard to change it after 9.6, but I think we could still get away
with that today.


max_worker_processes was added in 9.4, so it's been there for two
releases, but it probably is true that few people have set it.
Nevertheless, I don't think there's much evidence that it is a bad
enough name that we really must change it.


ISTM that all the confusion about parallel query would go away if the
setting was max_parallel_assistants instead of _workers. It's exactly
how parallel query works: there are helpers that *assist* the backend in
executing the query.

The big downside to "assistants" is it breaks all lexical connection to
max_worker_processes. So what if we change that to
max_assistant_processes? I think "assistant" and "worker" are close
enough in meaning for "stand alone" uses of BG workers so as not to be
confusing, and I don't see any options for parallelism that are any
clearer.


That GUC also controls worker processes that are started by extensions,
not just ones that parallel query starts. This is btw one thing I don't
like at all about how the current limits work: the parallel query will
fight for workers with extensions because they share the same limit.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Rename max_parallel_degree?

2016-06-01 Thread Jim Nasby

On 5/31/16 8:48 PM, Robert Haas wrote:

On Tue, May 31, 2016 at 5:58 PM, Tom Lane  wrote:

Alvaro Herrera  writes:

Robert Haas wrote:

I just want to point out that if we change #1, we're breaking
postgresql.conf compatibility for, IMHO, not a whole lot of benefit.
I'd just leave it alone.



We can add the old name as a synonym in guc.c to maintain compatibility.


I doubt this is much of an issue at this point; max_worker_processes has
only been there a release or so, and surely there are very few people
explicitly setting it, given its limited use-case up to now.  It will be
really hard to change it after 9.6, but I think we could still get away
with that today.


max_worker_processes was added in 9.4, so it's been there for two
releases, but it probably is true that few people have set it.
Nevertheless, I don't think there's much evidence that it is a bad
enough name that we really must change it.


ISTM that all the confusion about parallel query would go away if the 
setting was max_parallel_assistants instead of _workers. It's exactly 
how parallel query works: there are helpers that *assist* the backend in 
executing the query.


The big downside to "assistants" is it breaks all lexical connection to 
max_worker_processes. So what if we change that to 
max_assistant_processes? I think "assistant" and "worker" are close 
enough in meaning for "stand alone" uses of BG workers so as not to be 
confusing, and I don't see any options for parallelism that are any clearer.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] Parallel safety tagging of extension functions

2016-06-01 Thread Tom Lane
Andreas Karlsson  writes:
> On 06/01/2016 04:44 PM, Tom Lane wrote:
>> I don't understand why you think you need the CREATE OR REPLACE FUNCTION
>> commands?  We only need to change proargtypes, and the updates did that.
>> The catcache can take care of itself.

> Maybe I did something wrong (I try to avoid doing manual catalog 
> updates), but when I tested it, I needed to run the CREATE OR REPLACE 
> FUNCTION command to later be able to set the parallel safety. See the 
> example below.

In this particular example, the problem seems to be that you forgot to
update pronargs; it works for me when I add "pronargs = 2" to the UPDATE.
Your "working" example is actually creating a new function, not replacing
the old pg_proc entry.
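
For reference, a sketch of the corrected UPDATE from that example, with
pronargs kept in sync (and without the redundant pronamespace test noted
below):

    UPDATE pg_proc
    SET proargtypes = array_to_string('{int,int}'::regtype[]::oid[], ' ')::oidvector,
        pronargs = 2
    WHERE oid = to_regprocedure('f(text)');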

BTW, it strikes me that the pronamespace tests in these queries are
redundant: the OID is unique, so what matters is the search path
in the regprocedure lookups.

> The reason I use to_regprocedure is so that these scripts work for 
> people who have installed the extensions in beta1 and therefore only 
> have the new signatures. If they run these scripts they would get an 
> error if I used the cast. Is it ok if these scripts break for beta1 users?

Ugh.  This is more of a mess than I thought.  I don't like update scripts
that might silently fail to do what they're supposed to, but leaving
beta1 users in the lurch is not very nice either.

I wonder ... could we get away with using regproc rather than
regprocedure?  These function names are probably unique anyway ...

regards, tom lane




Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-06-01 Thread Joe Conway
On 06/01/2016 07:52 AM, Jim Nasby wrote:
> On 6/1/16 9:27 AM, Tom Lane wrote:
>> Kevin Grittner  writes:
>>> > On Wed, Jun 1, 2016 at 8:08 AM, Robert Haas 
>>> wrote:
>> Figuring out what to do about it is harder.  Your proposal seems to be
>> to remove them except where we need the fuzzy behavior, which doesn't
>> sound unreasonable, but I don't personally understand why we need it
>> in some places and not others.
>>> > +1
>>> > My first inclination is to remove those macros in version 10, but
>>> > it would be good to hear from some people using these types on what
>>> > the impact of that would be.
>> As I understand it, the key problem is that tests like "is point on line"
>> would basically never succeed except in the most trivial cases,
>> because of
>> roundoff error.  That's not very nice, and it might cascade to larger
>> problems like object-containment tests failing unexpectedly.  We would
>> need to go through all the geometric operations and figure out where that
>> kind of gotcha is significant and what we can do about it.  Seems like a
>> fair amount of work :-(.  If somebody's willing to do that kind of
>> investigation, then sure, but I don't think just blindly removing these
>> macros is going to lead to anything good.
> 
> I suspect another wrinkle here is that in the GIS world a single point
> can be represented in multiple reference/coordinate systems, and it
> would have different values in each of them. AIUI the transforms between
> those systems can be rather complicated if different projection methods
> are involved. I don't know if PostGIS depends on what these macros are
> doing or not. If it doesn't, perhaps it would be sufficient to lop off
> the last few bits of the significand. ISTM that'd be much better than
> what the macros currently do.

IIRC PostGIS uses a function from libgeos to do things like "point
equals" (ST_Equals). I've never looked at that source, but would be
unsurprised to find that it does something similar to this although
probably also more sophisticated.

(looks) yes -- ST_Equals calls GEOSEquals() after some sanity checking...

> BTW, I suspect the macro values were chosen specifically for dealing
> with LAT/LONG.

I think that is it exactly.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-06-01 Thread Paul Ramsey
On Wed, Jun 1, 2016 at 7:52 AM, Jim Nasby  wrote:

>
> I suspect another wrinkle here is that in the GIS world a single point can
> be represented in multiple reference/coordinate systems, and it would have
> different values in each of them. AIUI the transforms between those systems
> can be rather complicated if different projection methods are involved. I
> don't know if PostGIS depends on what these macros are doing or not. If it
> doesn't, perhaps it would be sufficient to lop off the last few bits of the
> significand. ISTM that'd be much better than what the macros currently do.

We don't depend on these, we have our own :/
The real answer for a GIS system is to have an explicit tolerance
parameter for calculations like distance/touching/containment, but
unfortunately we didn't do that so now we have our own
compatibility/boil the ocean problem if we ever wanted/were funded to
add one.
P.

> BTW, I suspect the macro values were chosen specifically for dealing with
> LAT/LONG.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] JSON[B] arrays are second-class citizens

2016-06-01 Thread David Fetter
On Tue, May 31, 2016 at 06:20:26PM -0400, Tom Lane wrote:
> David Fetter  writes:
> > On Tue, May 31, 2016 at 05:06:00PM -0400, David G. Johnston wrote:
> >> While likely not that common, the introduction of an ambiguity
> >> raises the bar considerably.
> 
> > What ambiguity?
> 
> My first thought about it was that
> 
> select unnest('{1,2,3}');
> 
> would start failing.  But it turns out it already does fail:
> 
> ERROR:  function unnest(unknown) is not unique
> 
> You get that as a result of the recent introduction of unnest(tsvector),
> which we debated a few weeks ago and seem to have decided to leave as-is.
> But it failed before 9.6 too, with
> 
> ERROR:  could not determine polymorphic type because input has type "unknown"
> 
> So at least in this particular case, adding unnest(jsonb) wouldn't be a
> problem from the standpoint of not being able to resolve calls that we
> could resolve before.
> 
> Nonetheless, there *is* an ambiguity here, which is specific to json(b):
> what type of array are you expecting to get?  The reason we have both
> json[b]_array_elements() and json[b]_array_elements_text() is that there
> are plausible use-cases for returning either json or plain text.  It's not
> hard to imagine that somebody will want json[b]_array_elements_numeric()
> before long, too.  If you want to have an unnest(jsonb) then you will need
> to make an arbitrary decision about which type it will return, and that
> doesn't seem like an especially great idea to me.

How about making casts work?  UNNEST(jsonb)::NUMERIC or similar,
whatever won't make the parser barf.

> > UNNEST, and ROWS FROM have more capabilities including WITH ORDINALITY
> > than the json_array_elements-like functions do.
> 
> AFAICT, this is nonsense.  We did not tie WITH ORDINALITY to UNNEST;
> it works for any set-returning function.

Oops.  My mistake.  Sorry about the noise.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Parallel safety tagging of extension functions

2016-06-01 Thread Andreas Karlsson

On 06/01/2016 04:44 PM, Tom Lane wrote:

Andreas Karlsson  writes:

It is the least ugly of all the ugly solutions I could think of. I have
attached a patch which fixes the signatures using this method. I use
CREATE OR REPLACE FUNCTION to update the catcache. What do you think? Is
it too ugly?


I don't understand why you think you need the CREATE OR REPLACE FUNCTION
commands?  We only need to change proargtypes, and the updates did that.
The catcache can take care of itself.


Maybe I did something wrong (I try to avoid doing manual catalog 
updates), but when I tested it, I needed to run the CREATE OR REPLACE 
FUNCTION command to later be able to set the parallel safety. See the 
example below.


CREATE FUNCTION f(text) RETURNS int LANGUAGE SQL AS $$ SELECT 1 $$;

BEGIN;

UPDATE pg_proc SET proargtypes =
    array_to_string('{int,int}'::regtype[]::oid[], ' ')::oidvector
WHERE oid = to_regprocedure('f(text)')
AND pronamespace = current_schema()::regnamespace;

ALTER FUNCTION f(int, int) STRICT;

COMMIT;

vs

CREATE FUNCTION f(text) RETURNS int LANGUAGE SQL AS $$ SELECT 1 $$;

BEGIN;

UPDATE pg_proc SET proargtypes =
    array_to_string('{int,int}'::regtype[]::oid[], ' ')::oidvector
WHERE oid = to_regprocedure('f(text)')
AND pronamespace = current_schema()::regnamespace;

CREATE OR REPLACE FUNCTION f(int, int) RETURNS int LANGUAGE SQL AS $$
SELECT 1 $$;


ALTER FUNCTION f(int, int) STRICT;

COMMIT;


I think it would be good practice to be more careful about
schema-qualifying all the pg_catalog table and type names.


I do not think we generally schema-qualify things in extension scripts
and instead rely on the CREATE/ALTER EXTENSION commands to set the
search path correctly. Am I mistaken?



I also think it's a bad idea to use to_regprocedure() rather than
a cast to regprocedure.  If the name isn't found, we want an error,
not a silent NULL result leading to no update occurring.


The reason I use to_regprocedure is so that these scripts work for 
people who have installed the extensions in beta1 and therefore only 
have the new signatures. If they run these scripts they would get an 
error if I used the cast. Is it ok if these scripts break for beta1 users?


Andreas




[HACKERS] Removing overhead commands in parallel dump/restore

2016-06-01 Thread Tom Lane
While testing parallel dump/restore over the past few days, I noticed that
it seemed to do an awful lot of duplicative SET commands, which I traced
to the fact that restore was doing _doSetFixedOutputState(AH) in the wrong
place, ie, once per TOC entry not once per worker.  Another thing that's
useless overhead is that lockTableForWorker() is doing an actual SQL query
to fetch the name of a table that we already have at hand.  Poking around
in this area also convinced me that it was pretty weird for CloneArchive
to be managing encoding, and only encoding, when cloning a dump
connection; that should be handled by setup_connection.  I also noticed
several unchecked strdup() calls that of course should be pg_strdup().

I put together the attached patch that cleans all this up.  It's hard to
show any noticeable performance difference, but the query log certainly
looks cleaner.  Any objections?

regards, tom lane

diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index e9e8698..6a808dc 100644
*** a/src/bin/pg_dump/parallel.c
--- b/src/bin/pg_dump/parallel.c
*** IsEveryWorkerIdle(ParallelState *pstate)
*** 802,808 
  static void
  lockTableForWorker(ArchiveHandle *AH, TocEntry *te)
  {
- 	Archive*AHX = (Archive *) AH;
  	const char *qualId;
  	PQExpBuffer query;
  	PGresult   *res;
--- 802,807 
*** lockTableForWorker(ArchiveHandle *AH, To
*** 813,845 
  
  	query = createPQExpBuffer();
  
! 	/*
! 	 * XXX this is an unbelievably expensive substitute for knowing how to dig
! 	 * a table name out of a TocEntry.
! 	 */
! 	appendPQExpBuffer(query,
! 	  "SELECT pg_namespace.nspname,"
! 	  "   pg_class.relname "
! 	  "  FROM pg_class "
! 	"  JOIN pg_namespace on pg_namespace.oid = relnamespace "
! 	  " WHERE pg_class.oid = %u", te->catalogId.oid);
! 
! 	res = PQexec(AH->connection, query->data);
! 
! 	if (!res || PQresultStatus(res) != PGRES_TUPLES_OK)
! 		exit_horribly(modulename,
! 	  "could not get relation name for OID %u: %s\n",
! 	  te->catalogId.oid, PQerrorMessage(AH->connection));
! 
! 	resetPQExpBuffer(query);
! 
! 	qualId = fmtQualifiedId(AHX->remoteVersion,
! 			PQgetvalue(res, 0, 0),
! 			PQgetvalue(res, 0, 1));
  
  	appendPQExpBuffer(query, "LOCK TABLE %s IN ACCESS SHARE MODE NOWAIT",
  	  qualId);
- 	PQclear(res);
  
  	res = PQexec(AH->connection, query->data);
  
--- 812,821 
  
  	query = createPQExpBuffer();
  
! 	qualId = fmtQualifiedId(AH->public.remoteVersion, te->namespace, te->tag);
  
  	appendPQExpBuffer(query, "LOCK TABLE %s IN ACCESS SHARE MODE NOWAIT",
  	  qualId);
  
  	res = PQexec(AH->connection, query->data);
  
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index ad8e132..32e1c82 100644
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
*** restore_toc_entries_postfork(ArchiveHand
*** 3893,3898 
--- 3893,3899 
  	ropt->pghost, ropt->pgport, ropt->username,
  	ropt->promptPassword);
  
+ 	/* re-establish fixed state */
  	_doSetFixedOutputState(AH);
  
  	/*
*** parallel_restore(ParallelArgs *args)
*** 4071,4080 
  	TocEntry   *te = args->te;
  	int			status;
  
- 	_doSetFixedOutputState(AH);
- 
  	Assert(AH->connection != NULL);
  
  	AH->public.n_errors = 0;
  
  	/* Restore the TOC item */
--- 4072,4080 
  	TocEntry   *te = args->te;
  	int			status;
  
  	Assert(AH->connection != NULL);
  
+ 	/* Count only errors associated with this TOC entry */
  	AH->public.n_errors = 0;
  
  	/* Restore the TOC item */
*** CloneArchive(ArchiveHandle *AH)
*** 4443,4452 
--- 4443,4456 
  		RestoreOptions *ropt = AH->public.ropt;
  
  		Assert(AH->connection == NULL);
+ 
  		/* this also sets clone->connection */
  		ConnectDatabase((Archive *) clone, ropt->dbname,
  		ropt->pghost, ropt->pgport, ropt->username,
  		ropt->promptPassword);
+ 
+ 		/* re-establish fixed state */
+ 		_doSetFixedOutputState(clone);
  	}
  	else
  	{
*** CloneArchive(ArchiveHandle *AH)
*** 4454,4460 
  		char	   *pghost;
  		char	   *pgport;
  		char	   *username;
- 		const char *encname;
  
  		Assert(AH->connection != NULL);
  
--- 4458,4463 
*** CloneArchive(ArchiveHandle *AH)
*** 4468,4485 
  		pghost = PQhost(AH->connection);
  		pgport = PQport(AH->connection);
  		username = PQuser(AH->connection);
- 		encname = pg_encoding_to_char(AH->public.encoding);
  
  		/* this also sets clone->connection */
  		ConnectDatabase((Archive *) clone, dbname, pghost, pgport, username, TRI_NO);
  
! 		/*
! 		 * Set the same encoding, whatever we set here is what we got from
! 		 * pg_encoding_to_char(), so we really shouldn't run into an error
! 		 * setting that very same value. Also see the comment in
! 		 * SetupConnection().
! 		 */
! 		PQsetClientEncoding(clone->connection, encname);
  	}
  
  	/* Let 

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-06-01 Thread Jim Nasby

On 6/1/16 9:27 AM, Tom Lane wrote:

Kevin Grittner  writes:

> On Wed, Jun 1, 2016 at 8:08 AM, Robert Haas  wrote:

>> Figuring out what to do about it is harder.  Your proposal seems to be
>> to remove them except where we need the fuzzy behavior, which doesn't
>> sound unreasonable, but I don't personally understand why we need it
>> in some places and not others.

> +1
> My first inclination is to remove those macros in version 10, but
> it would be good to hear from some people using these types on what
> the impact of that would be.

As I understand it, the key problem is that tests like "is point on line"
would basically never succeed except in the most trivial cases, because of
roundoff error.  That's not very nice, and it might cascade to larger
problems like object-containment tests failing unexpectedly.  We would
need to go through all the geometric operations and figure out where that
kind of gotcha is significant and what we can do about it.  Seems like a
fair amount of work :-(.  If somebody's willing to do that kind of
investigation, then sure, but I don't think just blindly removing these
macros is going to lead to anything good.


I suspect another wrinkle here is that in the GIS world a single point
can be represented in multiple reference/coordinate systems, and it
would have different values in each of them. AIUI the transforms between
those systems can be rather complicated if different projection methods
are involved. I don't know if PostGIS depends on what these macros are
doing or not. If it doesn't, perhaps it would be sufficient to lop off
the last few bits of the significand. ISTM that'd be much better than
what the macros currently do.


BTW, I suspect the macro values were chosen specifically for dealing 
with LAT/LONG.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] Parallel safety tagging of extension functions

2016-06-01 Thread Tom Lane
Andreas Karlsson  writes:
> It is the least ugly of all the ugly solutions I could think of. I have 
> attached a patch which fixes the signatures using this method. I use 
> CREATE OR REPLACE FUNCTION to update the catcache. What do you think? Is
> it too ugly?

I don't understand why you think you need the CREATE OR REPLACE FUNCTION
commands?  We only need to change proargtypes, and the updates did that.
The catcache can take care of itself.

I think it would be good practice to be more careful about
schema-qualifying all the pg_catalog table and type names.

I also think it's a bad idea to use to_regprocedure() rather than
a cast to regprocedure.  If the name isn't found, we want an error,
not a silent NULL result leading to no update occurring.
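
The behavioural difference at issue, with a deliberately nonexistent
function name for illustration:

    SELECT to_regprocedure('no_such_function(text)');  -- returns NULL, so an
                                                        -- UPDATE keyed on it
                                                        -- silently matches nothing
    SELECT 'no_such_function(text)'::regprocedure;     -- ERROR: function
                                                        -- "no_such_function(text)"
                                                        -- does not exist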

regards, tom lane




Re: [HACKERS] Does people favor to have matrix data type?

2016-06-01 Thread Jim Nasby

On 5/30/16 9:05 PM, Kouhei Kaigai wrote:

Due to performance reason, location of each element must be deterministic
without walking on the data structure. This approach guarantees we can
reach individual element with 2 steps.


Agreed.

On various other points...

Yes, please keep the discussion here, even when it relates only to PL/R. 
Whatever is being done for R needs to be done for plpython as well. I've 
looked at ways to improve analytics in plpython related to this, and it 
looks like I need to take a look at the fast-path function stuff. One of 
the things I've pondered for storing ndarrays in Postgres is how to 
reduce or eliminate the need to copy data from one memory region to 
another. It would be nice if there was a way to take memory that was 
allocated by one manager (ie: python's) and transfer ownership of that 
memory directly to Postgres without having to copy everything. Obviously 
you'd want to go the other way as well. IIRC cython's memory manager is 
the same as palloc in regard to very large allocations basically being 
ignored completely, so this should be possible in that case.


One thing I don't understand is why this type needs to be limited to 1
or 2 dimensions. Isn't the important thing how many individual elements
you can fit into the GPU? So if you can fit a 1024x1024, you could also fit
a 100x100x100, a 32x32x32x32, etc. At low enough values maybe that stops
making sense, but I don't see why there needs to be an artificial limit.
I think what's important for something like kNN is that the storage is
optimized for this, which I think means treating the highest dimension
as if it was a list. I don't know if it then matters whether the lower
dimensions are C style vs FORTRAN style. Other algorithms might want
different storage.


Something else to consider is the 1G toast limit. I'm pretty sure that's
why MADlib stores matrices as a table of vectors. I know for certain
it's a problem they run into, because they've discussed it on their
mailing list.


BTW, take a look at MADlib svec[1]... ISTM that's just a special case of 
what you're describing with entire grids being zero (or vice-versa). 
There might be some commonality there.


[1] https://madlib.incubator.apache.org/docs/v1.8/group__grp__svec.html
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-06-01 Thread Tom Lane
Kevin Grittner  writes:
> On Wed, Jun 1, 2016 at 8:08 AM, Robert Haas  wrote:
>> Figuring out what to do about it is harder.  Your proposal seems to be
>> to remove them except where we need the fuzzy behavior, which doesn't
>> sound unreasonable, but I don't personally understand why we need it
>> in some places and not others.

> +1

> My first inclination is to remove those macros in version 10, but
> it would be good to hear from some people using these types on what
> the impact of that would be.

As I understand it, the key problem is that tests like "is point on line"
would basically never succeed except in the most trivial cases, because of
roundoff error.  That's not very nice, and it might cascade to larger
problems like object-containment tests failing unexpectedly.  We would
need to go through all the geometric operations and figure out where that
kind of gotcha is significant and what we can do about it.  Seems like a
fair amount of work :-(.  If somebody's willing to do that kind of
investigation, then sure, but I don't think just blindly removing these
macros is going to lead to anything good.

Also, I suppose this means that Robert promises not to make any of his
usual complaints about breaking compatibility?  Because we certainly
would be.

regards, tom lane




Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-06-01 Thread Kevin Grittner
On Wed, Jun 1, 2016 at 8:08 AM, Robert Haas  wrote:
> On Fri, May 27, 2016 at 6:43 AM, Emre Hasegeli  wrote:
>> There are those macros defined for the built-in geometric types:
>>
>>> #define EPSILON 1.0E-06
>>
>>> #define FPzero(A)   (fabs(A) <= EPSILON)
>>> #define FPeq(A,B)   (fabs((A) - (B)) <= EPSILON)
>>> #define FPne(A,B)   (fabs((A) - (B)) > EPSILON)
>>> #define FPlt(A,B)   ((B) - (A) > EPSILON)
>>> #define FPle(A,B)   ((A) - (B) <= EPSILON)
>>> #define FPgt(A,B)   ((A) - (B) > EPSILON)
>>> #define FPge(A,B)   ((B) - (A) <= EPSILON)
>>
>> with this warning:
>>
>>>  *XXX These routines were not written by a numerical analyst.
>
> I agree that those macros look like a pile of suck.

+1

> It's unclear to
> me what purpose they're trying to accomplish, but regardless of what
> it is, it's hard for me to believe that they are accomplishing it.
> Whether 1.0E-06 is a correct fuzz factor presumably depends greatly on
> the scale of the number; e.g. if the values are all like "3" it's
> probably fine but if they are like "37142816124856" it's probably not
> enough fuzz and if they are all like ".0004" it's probably way too
> much fuzz.

Also, it's a pretty lame heuristic.  It doesn't really eliminate
*any* problems; it just aims to make them less frequent by shifting
the edges around.  In doing so, it creates whole new classes of
problems:

test=# \set A '''(1.996,1.996),(1,1)''::box'
test=# \set B '''(2,2),(1,1)''::box'
test=# \set C '''(2.004,2.004),(1,1)''::box'
test=# select :A = :B, :B = :C, :A = :C;
 ?column? | ?column? | ?column?
--+--+--
 t| t| f
(1 row)

Is the benefit we get from the macros worth destroying the
transitive property of the comparison operators on these types?

> Figuring out what to do about it is harder.  Your proposal seems to be
> to remove them except where we need the fuzzy behavior, which doesn't
> sound unreasonable, but I don't personally understand why we need it
> in some places and not others.

+1

My first inclination is to remove those macros in version 10, but
it would be good to hear from some people using these types on what
the impact of that would be.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Rename max_parallel_degree?

2016-06-01 Thread Tom Lane
Amit Kapila  writes:
> Your explanation is clear, however the name max_parallel_workers makes it
> sound like that parallelising an operation is all about workers.  Yes it
> depends a lot on the number of workers allocated for parallel operation,
> but that is not everything.  I think calling it max_parallelism as
> suggested by Alvaro upthread suits better than max_parallel_workers.

I don't think that's a good direction at all.  This entire discussion is
caused by the fact that it's not very clear what "max_parallel_degree"
measures.  Fixing that problem by renaming the variable to something that
doesn't even pretend to tell you what it's counting is not an improvement.

regards, tom lane




Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-06-01 Thread Michael Meskes
> However, the problem I pointed out is that when the new library is
> incompatible with the older one, say the major version of libpq.dll
> changes from 5 to 6, the application user and/or developer cannot
> notice the incompatibility immediately and easily.  On Unix/Linux,
> the application fails to start because the older library is not
> found.  On the other hand, the application will start on Windows and
> probably cause difficult trouble due to the incompatibility.

I don't think this is a very good situation, but I have no idea if this
can be solved. However, I'd prefer a technical solution over a
documentation one.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL




Re: [HACKERS] array of domain types

2016-06-01 Thread Thom Brown
On 1 June 2016 at 14:20, Konstantin Knizhnik 
wrote:

> I wonder why domain types can not be used for specification of array
> element:
>
> create domain objref as bigint;
> create table foo(x objref[]);
> ERROR:  type "objref[]" does not exist
> create table foo(x bigint[]);
> CREATE TABLE
>
> Is there some problem in principle here, or is it just not implemented?


It's not implemented, but patches welcome.
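
One workaround that does work today, if the goal is simply a typed array
column, is to define the domain over the array type rather than an array
of the domain:

    CREATE DOMAIN objref_array AS bigint[];
    CREATE TABLE foo (x objref_array);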

Thom


[HACKERS] array of domain types

2016-06-01 Thread Konstantin Knizhnik
I wonder why domain types cannot be used for specifying the element
type of an array:


create domain objref as bigint;
create table foo(x objref[]);
ERROR:  type "objref[]" does not exist
create table foo(x bigint[]);
CREATE TABLE

Is there some problem in principle here, or is it just not implemented?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] PostmasterPid not marked with PGDLLIMPORT

2016-06-01 Thread David G. Johnston
On Wed, Jun 1, 2016 at 9:00 AM, Robert Haas  wrote:

> On Wed, Jun 1, 2016 at 12:06 AM, Craig Ringer 
> wrote:
> > On 1 June 2016 at 11:48, Michael Paquier 
> wrote:
> >> Could it be possible to mark PostmasterPid with PGDLLIMPORT on HEAD
> >> and back-branches?
> >
> > Sounds sensible to me.
>
>

+1

> I don't really want to set a precedent that we'll back-patch
> PGDLLIMPORT markings every time somebody needs a new symbol for some
> extension they are writing
> [...]

-1

David J.


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-06-01 Thread Robert Haas
On Fri, May 27, 2016 at 6:43 AM, Emre Hasegeli  wrote:
> There are those macros defined for the built-in geometric types:
>
>> #define EPSILON 1.0E-06
>
>> #define FPzero(A)   (fabs(A) <= EPSILON)
>> #define FPeq(A,B)   (fabs((A) - (B)) <= EPSILON)
>> #define FPne(A,B)   (fabs((A) - (B)) > EPSILON)
>> #define FPlt(A,B)   ((B) - (A) > EPSILON)
>> #define FPle(A,B)   ((A) - (B) <= EPSILON)
>> #define FPgt(A,B)   ((A) - (B) > EPSILON)
>> #define FPge(A,B)   ((B) - (A) <= EPSILON)
>
> with this warning:
>
>>  *XXX These routines were not written by a numerical analyst.

I agree that those macros look like a pile of suck.  It's unclear to
me what purpose they're trying to accomplish, but regardless of what
it is, it's hard for me to believe that they are accomplishing it.
Whether 1.0E-06 is a correct fuzz factor presumably depends greatly on
the scale of the number; e.g. if the values are all like "3" it's
probably fine but if they are like "37142816124856" it's probably not
enough fuzz and if they are all like ".0004" it's probably way too
much fuzz.
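
A quick illustration of the "way too much fuzz" end of that, assuming the
point "same as" operator goes through the FPeq macro quoted above:

    SELECT point '(0.0000001,0)' ~= point '(0.0000009,0)';
    -- true under the current macros, even though the x coordinates differ
    -- by nearly an order of magnitude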

Figuring out what to do about it is harder.  Your proposal seems to be
to remove them except where we need the fuzzy behavior, which doesn't
sound unreasonable, but I don't personally understand why we need it
in some places and not others.  It would be good if some of the people
who are more numerically inclined than I am (and hate floats less, but
then that's everyone) could jump in here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PostmasterPid not marked with PGDLLIMPORT

2016-06-01 Thread Robert Haas
On Wed, Jun 1, 2016 at 12:06 AM, Craig Ringer  wrote:
> On 1 June 2016 at 11:48, Michael Paquier  wrote:
>> Could it be possible to mark PostmasterPid with PGDLLIMPORT on HEAD
>> and back-branches?
>
> Sounds sensible to me.

I don't really want to set a precedent that we'll back-patch
PGDLLIMPORT markings every time somebody needs a new symbol for some
extension they are writing, but I don't mind changing this in master.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Rename synchronous_standby_names?

2016-06-01 Thread Masahiko Sawada
On Wed, Jun 1, 2016 at 9:49 AM, Michael Paquier
 wrote:
> On Wed, Jun 1, 2016 at 3:56 AM, David G. Johnston
>  wrote:
>> On Tue, May 31, 2016 at 2:19 PM, Peter Eisentraut
>>  wrote:
>>>
>>> On 5/31/16 1:47 PM, Jaime Casanova wrote:

 Are we going to change synchronous_standby_names? Certainly the GUC is
 not *only* a list of names anymore.

 synchronous_standby_config?
 synchronous_standbys (adjust to correct english if necesary)?
>>>
>>>
>>> If the existing values are still going to be accepted, then I would leave
>>> it as is.
>>
>>
>> +1
>
> +1. We've gone to quite a lot of trouble to take an approach for the N-sync
> that is 100% backward-compatible; it would be good not to break that
> effort.

+1

-- 
Regards,

--
Masahiko Sawada




Re: [HACKERS] COMMENT ON, psql and access methods

2016-06-01 Thread Teodor Sigaev

As far as I can see, COMMENT ON has no support for access methods.
Wouldn't we want to add it as it is created by a command? On top of
that, perhaps we could have a backslash command in psql to list the
supported access methods, like \dam[S]? The system methods would be in
this case all the in-core ones.


+1.


Are there other opinions? That's not a 9.6 blocker IMO, so I could get
patches out for 10.0 if needed.


I missed that during the review process, but I agree it should be implemented.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] Rename max_parallel_degree?

2016-06-01 Thread Amit Kapila
On Tue, May 31, 2016 at 11:30 PM, Tom Lane  wrote:
>
> I wrote:
> > I really think that a GUC named "max_parallel_workers", which in fact
> > limits the number of workers and not something else, is the way to go.
>
> To be concrete, I suggest comparing the attached documentation patch
> with Robert's.  Which one is more understandable?
>

Your explanation is clear; however, the name max_parallel_workers makes it
sound like parallelising an operation is all about workers.  Yes, it
depends a lot on the number of workers allocated for the parallel operation,
but that is not everything.  I think calling it max_parallelism, as
suggested by Alvaro upthread, suits it better than max_parallel_workers.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Change in order of criteria - reg

2016-06-01 Thread Amit Langote
On 2016/06/01 13:07, sri harsha wrote:
> Hi,
> 
> In PostgreSQL , does the order in which the criteria is given matter ??
> For example
> 
> Query 1 : Select * from TABLE where a > 5 and b < 10;
> 
> Query 2 : Select * from TABLE where b <10 and a > 5;
> 
> Are query 1 and query 2 the same in PostgreSQL or different ?? If its
> different , WHY ??

tl;dr they are the same.  As in, they obviously produce the same result and
end up invoking the same plan.

Internally, optimizer will order application of those quals in resulting
plan based on per-tuple cost of individual quals.  So a cheaper, more
selective qual might result in short-circuiting of relatively expensive
quals for a large number of rows in the table saving some cost in
run-time.  Also, if index scan is chosen and quals pushed down, the
underlying index method might know to order quals smartly.

However, the cost-markings of operators/functions involved in quals better
match reality.  By default, most operators/functions in a database are
marked with cost of 1 unit.  Stable sorting used in ordering of quals
would mean the order of applying quals in resulting plan matches the
original order (ie, the order in which they appear in the query).  So, if
the first specified qual really happens to be an expensive qual but marked
as having the same cost as other less expensive quals, one would have to
pay the price of evaluating it for all the rows.  Whereas, correctly
marking the costs could have avoided that (as explained above).  Note that
I am not suggesting that ordering quals in query by their perceived cost
is the solution.  Keep optimizer informed by setting costs appropriately
and it will do the right thing more often than not. :)
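
A sketch of the cost-marking this describes (the table and function names
are made up; plpgsql is used so the functions are not inlined away):

    CREATE FUNCTION cheap_check(a int) RETURNS bool
        LANGUAGE plpgsql IMMUTABLE COST 1
        AS $$ BEGIN RETURN a > 5; END $$;

    CREATE FUNCTION expensive_check(b int) RETURNS bool
        LANGUAGE plpgsql IMMUTABLE COST 10000   -- pretend this does real work
        AS $$ BEGIN RETURN b < 10; END $$;

    -- With the costs marked, the planner evaluates cheap_check() first even
    -- though expensive_check() is written first in the WHERE clause.
    EXPLAIN SELECT * FROM some_table WHERE expensive_check(b) AND cheap_check(a);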

Thanks,
Amit






Re: [HACKERS] Reviewing freeze map code

2016-06-01 Thread Masahiko Sawada
On Sat, May 7, 2016 at 5:34 AM, Robert Haas  wrote:
> On Mon, May 2, 2016 at 8:25 PM, Andres Freund  wrote:
>> + * heap_tuple_needs_eventual_freeze
>> + *
>> + * Check to see whether any of the XID fields of a tuple (xmin, xmax, xvac)
>> + * will eventually require freezing.  Similar to heap_tuple_needs_freeze,
>> + * but there's no cutoff, since we're trying to figure out whether freezing
>> + * will ever be needed, not whether it's needed now.
>> + */
>> +bool
>> +heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)
>>
>> Wouldn't redefining this to heap_tuple_is_frozen() and then inverting the
>> checks be easier to understand?
>
> I thought it much safer to keep this as close to a copy of
> heap_tuple_needs_freeze() as possible.  Copying a function and
> inverting all of the return values is much more likely to introduce
> bugs, IME.

I agree.

>> +   /*
>> +* If xmax is a valid xact or multixact, this tuple is also not 
>> frozen.
>> +*/
>> +   if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
>> +   {
>> +   MultiXactId multi;
>> +
>> +   multi = HeapTupleHeaderGetRawXmax(tuple);
>> +   if (MultiXactIdIsValid(multi))
>> +   return true;
>> +   }
>>
>> Hm. What's the test inside the if() for? There shouldn't be any case
>> where xmax is invalid if HEAP_XMAX_IS_MULTI is set.   Now there's a
>> check like that outside of this commit, but it seems strange to me
>> (Alvaro, perhaps you could comment on this?).
>
> Here again I was copying existing code, with appropriate simplifications.
>
>> + *
>> + * Clearing both visibility map bits is not separately WAL-logged.  The 
>> callers
>>   * must make sure that whenever a bit is cleared, the bit is cleared on WAL
>>   * replay of the updating operation as well.
>>
>> I think including "both" here makes things less clear, because it
>> differentiates clearing one bit from clearing both. There's no practical
>> difference atm, but still.
>
> I agree.

Fixed.

>>   *
>>   * VACUUM will normally skip pages for which the visibility map bit is set;
>>   * such pages can't contain any dead tuples and therefore don't need 
>> vacuuming.
>> - * The visibility map is not used for anti-wraparound vacuums, because
>> - * an anti-wraparound vacuum needs to freeze tuples and observe the latest 
>> xid
>> - * present in the table, even on pages that don't have any dead tuples.
>>   *
>>
>> I think the remaining sentence isn't entirely accurate, there's now more
>> than one bit, and they're different with regard to scan_all/!scan_all
>> vacuums (or will be - maybe this updated further in a later commit? But
>> if so, that sentence shouldn't yet be removed...).
>
> We can adjust the language, but I don't really see a big problem here.

This comment change has not been incorporated into the patch so far.

>> -/* Number of heap blocks we can represent in one byte. */
>> -#define HEAPBLOCKS_PER_BYTE 8
>> -
>> Hm, why was this moved to the header? Sounds like something the outside
>> shouldn't care about.
>
> Oh... yeah.  Let's undo that.

Fixed.

>> #define HEAPBLK_TO_MAPBIT(x) (((x) % HEAPBLOCKS_PER_BYTE) * 
>> BITS_PER_HEAPBLOCK)
>>
>> Hm. This isn't really a mapping to an individual bit anymore - but I
>> don't really have a better name in mind. Maybe TO_OFFSET?
>
> Well, it sorta is... but we could change it, I suppose.
>
>> +static const uint8 number_of_ones_for_visible[256] = {
>> ...
>> +};
>> +static const uint8 number_of_ones_for_frozen[256] = {
>> ...
>>  };
>>
>> Did somebody verify the new contents are correct?
>
> I admit that I didn't.  It seemed like an unlikely place for a goof,
> but I guess we should verify.
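For the record, one way to cross-check the committed arrays without
reading them by eye is to recompute the expected counts from the bit
layout.  A minimal sketch in SQL, assuming the new layout keeps the
all-visible bit in the even bit positions of each byte (mask 85 = 0x55)
and the all-frozen bit in the odd positions (mask 170 = 0xAA):

-- For every possible byte value, count the set bits under each mask; the
-- results should match number_of_ones_for_visible[] and
-- number_of_ones_for_frozen[] respectively.
SELECT i AS byte,
       (SELECT count(*) FROM generate_series(0, 7) AS b
         WHERE (((i &  85) >> b) & 1) = 1) AS expected_visible_ones,
       (SELECT count(*) FROM generate_series(0, 7) AS b
         WHERE (((i & 170) >> b) & 1) = 1) AS expected_frozen_ones
FROM generate_series(0, 255) AS i
ORDER BY i;
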
>> /*
>> - * visibilitymap_clear - clear a bit in visibility map
>> + * visibilitymap_clear - clear all bits in visibility map
>>   *
>>
>> This seems rather easy to misunderstand, as this really only clears all
>> the bits for one page, not actually all the bits.
>
> We could change "in" to "for one page in the".

Fixed.

>>   * the bit for heapBlk, or InvalidBuffer. The caller is responsible for
>> - * releasing *buf after it's done testing and setting bits.
>> + * releasing *buf after it's done testing and setting bits, and must pass 
>> flags
>> + * for which it needs to check the value in visibility map.
>>   *
>>   * NOTE: This function is typically called without a lock on the heap page,
>>   * so somebody else could change the bit just after we look at it.  In fact,
>> @@ -327,17 +351,16 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, 
>> Buffer heapBuf,
>>
>> I'm not seing what flags the above comment change is referring to?
>
> Ugh.  I think that's leftover cruft from an earlier patch version that
> should have been excised from what got committed.

Fixed.

>> /*
>> -* A single-bit read is atomic.  There could be memory-ordering 
>> effects
>> +* A single byte read is atomic.  There could be 

Re: [HACKERS] Parallel pg_dump's error reporting doesn't work worth squat

2016-06-01 Thread Kyotaro HORIGUCHI
At Tue, 31 May 2016 12:29:50 -0400, Tom Lane  wrote in 
<7445.1464712...@sss.pgh.pa.us>
> Kyotaro HORIGUCHI  writes:
> > At Fri, 27 May 2016 13:20:20 -0400, Tom Lane  wrote in 
> > <14603.1464369...@sss.pgh.pa.us>
> >> Kyotaro HORIGUCHI  writes:
> >>> By the way, the reason of the "invalid snapshot identifier" is
> >>> that some worker threads try to use it after the connection on
> >>> the first worker closed.
> 
> >> ... BTW, I don't quite see what the issue is there.
> 
> > The master session died from lack of libz and the failure of
> > compressLevel's propagation already fixed. Some of the children
> > that started transactions after the master's death will get the
> > error.
> 
> I don't think I believe that theory, because it would require the master
> to not notice the lack of libz before it launches worker processes, but
> instead while the workers are working.

Before cae2bb1, the master actually *didn't* notice the lack of libz
until it had launched the worker processes.  So the current master
doesn't suffer from the problem, but it is not desirable that the sudden
death of a child, for whatever reason, causes this kind of behavior.

>  But AFAICS, while there are worker
> processes open, the master does nothing except wait for workers and
> dispatch new jobs to them; it does no database work of its own.  So the
> libz-isn't-there error has to have occurred in one of the workers.

Yes, the first-commanded worker dies from that, and then the master
disconnects its connection owning the snapshot before terminating any
other workers.  It occurs with the current master (9ee56df) minus
cae2bb1, configured with --without-zlib.

> > If we want prevent it perfectly, one solution could be that
> > non-master children explicitly wait the master to arrive at the
> > "safe" state before starting their transactions. But I suppose it
> > is not needed here.
> 
> Actually, I believe the problem is in archive_close_connection, around
> line 295 in HEAD: once the master realizes that one child has failed,
> it first closes its own database connection and only second tries to kill
> the remaining children.  So there's a race condition wherein remaining
> children have time to see the missing-snapshot error.

Agreed.

> In the patch I posted yesterday, I reversed the order of those two
> steps, which should fix this problem in most scenarios:
> https://www.postgresql.org/message-id/7005.1464657...@sss.pgh.pa.us

Yeah, just transposing DisconnectDatabase and ShutdownWorkersHard
in archive_close_connection fixed the problem.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] foreign table batch inserts

2016-06-01 Thread Etsuro Fujita

On 2016/05/31 14:53, Amit Langote wrote:
> On 2016/05/30 22:59, Craig Ringer wrote:
>> On 30 May 2016 at 16:17, Etsuro Fujita  wrote:
>>> That's a good point, but the basic idea is to send the local query
>>> almost-as-is to the remote server if possible.  For example, if the local
>>> query is "INSERT INTO foreign_table(a,b,c) VALUES (1, 2, 3), (4, 5, 6)",
>>> send the remote query "INSERT INTO remote_table(a,b,c) VALUES (1, 2, 3),
>>> (4, 5, 6)" to the remote server where remote_table is the table name for
>>> the foreign table on the remote server.  So, wouldn't the query string
>>> length be a problem in many cases?  Maybe I'm missing something, though.
>>
>> FDWs don't operate at that level. They don't see the original query string.
>> They're plan nodes that operate with a row-by-row push/pull model. The
>> foreign table node in question has no idea you're doing a multivalued
>> insert and doesn't care if it's INSERT INTO ... SELECT, INSERT INTO ...
>> VALUES, or COPY.
>
> IIUC, what Fujita-san seems to be referring to here is safe push-down of a
> insert's query or values expression (and hence the whole insert itself)
> considered during the *planning* step.

That's really what I have in mind.  Thanks for the explanation!
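To make the difference concrete, here is a sketch using the table names
from above (the exact statements postgres_fdw prepares may of course
differ):

-- Local statement:
INSERT INTO foreign_table(a,b,c) VALUES (1, 2, 3), (4, 5, 6);

-- What the FDW effectively issues today, one remote execution per row
-- (shown expanded here for illustration):
INSERT INTO remote_table(a,b,c) VALUES (1, 2, 3);
INSERT INTO remote_table(a,b,c) VALUES (4, 5, 6);

-- What the push-down idea would send instead, when it is safe to do so:
INSERT INTO remote_table(a,b,c) VALUES (1, 2, 3), (4, 5, 6);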


> Although that sounds like a
> different optimization from what's being discussed on this thread.  The
> latter certainly seems to have its benefits in case of push-down failure
> and might as well be the majority of cases.

Agreed.

Best regards,
Etsuro Fujita



