Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash

2017-07-19 Thread Craig Ringer
>
> On 19 July 2017 at 20:26, Neha Sharma wrote:
>
>> Hi,
>>
>> I am getting FailedAssertion while executing the attached
>> script. However, I am not able to produce the core dump for the same; the
>> script runs in the background and takes around a day's time to produce the
>> mentioned error.
>>
>> "TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals(oldestXact,
>> ShmemVariableCache->oldestXid))", File: "clog.c", Line: 683)
>> 2017-07-19 01:16:51.973 GMT [27873] LOG:  server process (PID 28084) was
>> terminated by signal 6: Aborted
>> 2017-07-19 01:16:51.973 GMT [27873] DETAIL:  Failed process was running:
>> autovacuum: VACUUM pg_toast.pg_toast_13029 (to prevent wraparound)"
>>
>
>
What are the starting conditions of your postgres instance? Does your
script assume a newly initdb'd instance with no custom configuration? If
not, what setup steps/configuration precede your script run?

> well short of the 2-million mark.
>

Er, billion.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] <> join selectivity estimate question

2017-07-19 Thread Thomas Munro
On Fri, Jun 2, 2017 at 4:16 AM, Tom Lane  wrote:
> I don't think it does really.  The thing about a <> semijoin is that it
> will succeed unless *every* join key value from the inner query is equal
> to the outer key value (or is null).  That's something we should consider
> to be of very low probability typically, so that the <> selectivity should
> be estimated as nearly 1.0.  If the regular equality selectivity
> approaches 1.0, or when there are expected to be very few rows out of the
> inner query, then maybe the <> estimate should start to drop off from 1.0,
> but it surely doesn't move linearly with the equality selectivity.

Ok, here I go like a bull in a china shop: please find attached a
draft patch.  Is this getting warmer?

In the comment for JOIN_SEMI I mentioned a couple of refinements I
thought of, but my intuition was that we don't go for such sensitive
and discontinuous treatment of stats; so I made the simplifying
assumption that the RHS always has more than one distinct value in it.

Anti-join <> returns all the nulls from the LHS, and then it only
returns other LHS rows if there is exactly one distinct non-null value
in RHS and it happens to be that one.  But if we make the same
assumption I described above, namely that there are always at least 2
distinct values on the RHS, then the join selectivity is just
nullfrac.
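
For concreteness, here's a sketch (toy tables, not from the patch or its
regression tests) of the two shapes being discussed:

-- A <> semijoin: once bar has more than one distinct value, almost every
-- foo row finds some bar.y different from its x, so selectivity ~ 1.0.
CREATE TABLE foo (x int);
CREATE TABLE bar (y int);
INSERT INTO foo SELECT generate_series(1, 10000);
INSERT INTO bar SELECT generate_series(1, 100);
ANALYZE foo;
ANALYZE bar;
EXPLAIN SELECT * FROM foo WHERE EXISTS (SELECT 1 FROM bar WHERE bar.y <> foo.x);

-- The NOT EXISTS form should pass essentially only rows with NULL foo.x
-- (none here), i.e. a selectivity of about nullfrac.
EXPLAIN SELECT * FROM foo WHERE NOT EXISTS (SELECT 1 FROM bar WHERE bar.y <> foo.x);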

-- 
Thomas Munro
http://www.enterprisedb.com


neqjoinsel-fix-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash

2017-07-19 Thread Craig Ringer
On 19 July 2017 at 20:26, Neha Sharma  wrote:

> Hi,
>
> I am getting FailedAssertion while executing the attached script. However, I
> am not able to produce the core dump for the same; the script runs in the
> background and takes around a day's time to produce the mentioned error.
>
> "TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals(oldestXact,
> ShmemVariableCache->oldestXid))", File: "clog.c", Line: 683)
> 2017-07-19 01:16:51.973 GMT [27873] LOG:  server process (PID 28084) was
> terminated by signal 6: Aborted
> 2017-07-19 01:16:51.973 GMT [27873] DETAIL:  Failed process was running:
> autovacuum: VACUUM pg_toast.pg_toast_13029 (to prevent wraparound)"
>

It's not much of a stretch to assume this was likely introduced
by ea42cc18c:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ea42cc18c35381f639d45628d792e790ff39e271

so I'll have a look to see if I can spot how it could happen.

Please show your "SELECT version()" output and, if using a build from git, the
"git rev-parse --short HEAD" for the build you're running.

It's interesting and probably significant that your Pg was nearing
wraparound. Just a tip for that - if you want to create a situation where
you near wraparound quickly and easily, for testing, you can, on a test
instance you don't care about:

- select txid_current();
- stop pg
- pg_resetxlog -n $(( $the_xid + (1<<31) - 1001000 ))
- try to start pg, note the missing clog segment it complains about
- dd if=/dev/zero bs=1 count=262144 of=datadir/pg_clog/$MISSINGSEGNAME
- start Pg


That should put you about 1000 txn's from the 1 million xid limit, assuming
I got my maths right (don't assume that), and assuming your starting xid is
well short of the 2-million mark.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Dealing with logical replication

2017-07-19 Thread Craig Ringer
On 19 July 2017 at 16:34, Tatsuo Ishii  wrote:

> Now that we are going to have logical replication in PostgreSQL 10, I
> have started thinking how Pgpool-II can deal with it. For example, the
> logical replication does not replicate DDLs. Isn't it convenient for
> users to do it automatically in Pgpool-II? Or even doing it for
> TRUNCATE?
>
> Or are they against the design philosophy of the logical replication?
>

(Disclaimer - Petr Jelinek and Peter Eisentraut were the main ones working
on the in-core logical rep, and I haven't had time to play with it much).

TL;DR: a pooler can only solve a limited subset of the problem, fairly
unreliably, and this is really something that needs further work in core.

Not replicating TRUNCATE and schema changes is more an implementation
limitation than anything else. It's desirable to get to the point where
logical replication can transparently replicate DDL. There are some hurdles
for doing so, but it'll be possible to get something in place that's good
enough when time/release progress permits.

Similarly, with TRUNCATE, AFAIK support just didn't make the cut for pg10.

A pooler could well help in the meantime, but you have to consider
ordering with care. For example, given "U" upstream, "D" downstream:

U: CREATE TABLE x (a integer, b integer);
D: CREATE TABLE x (a integer, b integer); -- user script syncs
U: INSERT INTO x (a,b) VALUES (1,0);
D: [applies INSERT 1,0]
U: INSERT INTO x (a,b) VALUES (2,0);
U: ALTER TABLE x DROP COLUMN b;
D: ALTER TABLE x DROP COLUMN b; -- user script syncs
U: INSERT INTO x (a) VALUES (3);
D: [ERROR on INSERT of 2,0: no column 'b']

Because the DDL here is transported out of band vs the row data, you can
easily create situations where the schema change is applied before the last
committed-but-not-yet-replicated data from the upstream that was based on
the old schema.

To achieve correct ordering, the simplest approach is to record DDL in a
table when you perform it on the upstream, and replay it when you see rows
in that table appear on the downstream.  You know it's safe to replay it
now. This is the essence of what BDR and pglogical do with their DDL
replication, but they handle DDL in the middle of transactions that also
make row data changes by intercepting writes to the queue table and
performing the DDL at the exact point in the transaction where it happened
on the upstream. I don't think that's possible with the in-core logical
replication yet, and certainly not something a pooler can do.
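
As a rough sketch of the queue-table idea (the table name and wrapper
function here are made up for illustration, not taken from BDR or pglogical):

-- Upstream: run the DDL and queue it in the same transaction, so the queue
-- row replicates at exactly the right point in the change stream.
CREATE TABLE replicated_ddl (id bigserial PRIMARY KEY, ddl text NOT NULL);

CREATE FUNCTION replicate_ddl(cmd text) RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
    EXECUTE cmd;
    INSERT INTO replicated_ddl (ddl) VALUES (cmd);
END;
$$;

-- e.g. SELECT replicate_ddl('ALTER TABLE x DROP COLUMN b');
-- Downstream: when a replicated_ddl row arrives, execute its ddl text
-- (from a trigger on the table, or a polling apply script).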

To do it externally, you have to take note of when a schema change happened
on the upstream and apply it on the downstream at or after the point where
the downstream has replayed and confirmed up to the upstream lsn where the
schema change happened. Then apply the schema change.

A pooler trying to help here must also be very aware of the impact of
multi-statements. If you send a single simple query message with a mixture
of schema change commands and normal DML, you probably don't want to repeat
the DML on downstream nodes or you'll get duplicate rows etc. But unless it
embeds transaction control commands, a simple query message executes in a
single implicit transaction, so if you extract just the DDL you'll again
have ordering issues between upstream and downstream.

There are a variety of difficulties to overcome even when doing it in core:
event triggers don't capture ddl command text and have no facility to turn
the internal command representation back into SQL command text, nor do we
have any way to turn the internal representation back into a parsenode tree
for execution on a downstream's standard_ProcessUtility. However, we can
capture raw command text with ProcessUtility_hook now that we have
byte-ranges for the query-parts of a multi-part query (yay!), and that
works well enough if you also capture the active search_path and apply with
the same search_path. It can match the wrong object if extra objects with
the same name are present earlier on the search_path on the downstream than
on the upstream, so it's not ideal, but that's a weird corner case.

If we had a hook in the logical apply worker's insert or wal-message
routines it'd be possible to write an extension to do this for pg10, but
AFAICS we don't.

So schema changes in logical replication currently require more care than
in physical replication.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] psql's \r broken since e984ef5861d

2017-07-19 Thread Julien Rouhaud
On 20/07/2017 04:24, Tom Lane wrote:
> I wrote:
>> Ah.  I don't feel like trawling the archives for the discussion right now,
>> but I believe this was an intentional change to make the behavior more
>> consistent.
> 
> Oh ... a quick look in the commit log finds the relevant discussion:
> https://www.postgresql.org/message-id/flat/9b4ea968-753f-4b5f-b46c-d7d3bf7c8f90%40manitou-mail.org
> 

Oh I see.  Thanks a lot, sorry for the noise.

-- 
Julien Rouhaud


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-07-19 Thread Peter Geoghegan
On Wed, Jul 19, 2017 at 8:33 PM, Craig Ringer  wrote:
> That's silly, so here's a patch to teach pageinspect how to decode infomasks
> to a human readable array of flag names.
>
> Example:
>
> SELECT t_infomask, t_infomask2, flags
> FROM heap_page_items(get_raw_page('test1', 0)),
>  LATERAL heap_infomask_flags(t_infomask, t_infomask2, true) m(flags);
>  t_infomask | t_infomask2 |                                    flags
> ------------+-------------+-----------------------------------------------------------------------------
>        2816 |           2 | {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_XMIN_FROZEN}
> (1 row)

Seems like a good idea to me.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-07-19 Thread Craig Ringer
On 20 July 2017 at 11:33, Craig Ringer  wrote:

> Hi
>
> Whenever I'm debugging some kind of corruption incident, possible
> visibility bug, etc, I always land up staring at integer infomasks or using
> a SQL helper function to decode them.
>
> That's silly, so here's a patch to teach pageinspect how to decode
> infomasks to a human readable array of flag names.
>
> Example:
>
> SELECT t_infomask, t_infomask2, flags
> FROM heap_page_items(get_raw_page('test1', 0)),
>  LATERAL heap_infomask_flags(t_infomask, t_infomask2, true) m(flags);
>  t_infomask | t_infomask2 |                                    flags
> ------------+-------------+-----------------------------------------------------------------------------
>        2816 |           2 | {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_XMIN_FROZEN}
> (1 row)
>
>
> To decode individual mask integers you can just call it directly. It's
> strict, so pass 0 for the other mask if you don't have both, e.g.
>
> SELECT heap_infomask_flags(2816, 0);
>
> The patch backports easily to older pageinspect versions for when you're
> debugging something old.
>
> BTW, I used text[] not enums. That costs a fair bit of memory, but it
> doesn't seem worth worrying too much about in this context.
>
> For convenience it also tests and reports HEAP_LOCKED_UPGRADED and
> HEAP_XMAX_IS_LOCKED_ONLY as pseudo-flags.
>
> I decided not to filter out 
> HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID
> when HEAP_XMIN_FROZEN is set
>

Er, decided not to filter out  HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID.
Obviously wouldn't filter out HEAP_XMAX_INVALID, that was a copy-paste'o.

I wonder if it's worth dropping the HEAP_ prefix. Meh, anyway, usable as-is.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] [PATCH] pageinspect function to decode infomasks

2017-07-19 Thread Craig Ringer
Hi

Whenever I'm debugging some kind of corruption incident, possible
visibility bug, etc, I always land up staring at integer infomasks or using
a SQL helper function to decode them.

That's silly, so here's a patch to teach pageinspect how to decode
infomasks to a human readable array of flag names.

Example:

SELECT t_infomask, t_infomask2, flags
FROM heap_page_items(get_raw_page('test1', 0)),
 LATERAL heap_infomask_flags(t_infomask, t_infomask2, true) m(flags);
 t_infomask | t_infomask2 |                                    flags
------------+-------------+-----------------------------------------------------------------------------
       2816 |           2 | {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_XMIN_FROZEN}
(1 row)


To decode individual mask integers you can just call it directly. It's
strict, so pass 0 for the other mask if you don't have both, e.g.

SELECT heap_infomask_flags(2816, 0);

The patch backports easily to older pageinspect versions for when you're
debugging something old.

BTW, I used text[] not enums. That costs a fair bit of memory, but it
doesn't seem worth worrying too much about in this context.

For convenience it also tests and reports HEAP_LOCKED_UPGRADED and
HEAP_XMAX_IS_LOCKED_ONLY as pseudo-flags.

I decided not to filter
out HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID
when HEAP_XMIN_FROZEN is set; that doesn't make sense when we
examine HEAP_XMAX_IS_LOCKED_ONLY or HEAP_LOCKED_UPGRADED, and filtering
them out could be just as confusing as leaving them in.

The infomask2 natts mask is ignored. You can bitwise-and it out in SQL
pretty easily if needed. I could output it here as a constructed text
datum, but it seems mostly pointless.
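
For example, assuming the usual HEAP_NATTS_MASK value of 0x07FF, something
like this pulls the attribute count back out:

SELECT t_infomask2 & 2047 AS natts
FROM heap_page_items(get_raw_page('test1', 0));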

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 488a1f69b8082258d508ba681a4f4a5f6fce2267 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Thu, 20 Jul 2017 11:20:21 +0800
Subject: [PATCH v1] Introduce heap_infomask_flags to decode infomask and
 infomask2

---
 contrib/pageinspect/Makefile  |   3 +-
 contrib/pageinspect/expected/page.out |  25 ++
 contrib/pageinspect/heapfuncs.c   | 120 ++
 contrib/pageinspect/pageinspect--1.6--1.7.sql |   9 ++
 contrib/pageinspect/pageinspect.control   |   2 +-
 contrib/pageinspect/sql/page.sql  |  14 +++
 doc/src/sgml/pageinspect.sgml |  32 +++
 7 files changed, 203 insertions(+), 2 deletions(-)
 create mode 100644 contrib/pageinspect/pageinspect--1.6--1.7.sql

diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index 0a3cbee..de114c7 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -5,7 +5,8 @@ OBJS		= rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \
 		  brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES)
 
 EXTENSION = pageinspect
-DATA = pageinspect--1.5.sql pageinspect--1.5--1.6.sql \
+DATA = pageinspect--1.6--1.7.sql \
+	pageinspect--1.5.sql pageinspect--1.5--1.6.sql \
 	pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \
 	pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \
 	pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql
diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 8e15947..054c69d 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -82,6 +82,31 @@ SELECT * FROM fsm_page_contents(get_raw_page('test1', 'fsm', 0));
  
 (1 row)
 
+-- If we freeze the only tuple on test1, the infomask should
+-- always be the same in all test runs.
+VACUUM FREEZE test1;
+SELECT t_infomask, t_infomask2, flags
+FROM heap_page_items(get_raw_page('test1', 0)),
+ LATERAL heap_infomask_flags(t_infomask, t_infomask2, true) m(flags);
+ t_infomask | t_infomask2 |                                    flags
+------------+-------------+-----------------------------------------------------------------------------
+       2816 |           2 | {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_XMIN_FROZEN}
+(1 row)
+
+SELECT t_infomask, t_infomask2, flags
+FROM heap_page_items(get_raw_page('test1', 0)),
+ LATERAL heap_infomask_flags(t_infomask, t_infomask2, false) m(flags);
+ t_infomask | t_infomask2 |                           flags
+------------+-------------+------------------------------------------------------------
+       2816 |           2 | {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID}
+(1 row)
+
+SELECT heap_infomask_flags(2816, 0);
+                    heap_infomask_flags
+------------------------------------------------------------
+ {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID}
+(1 row)
+
 DROP TABLE test1;
 -- check that using any of these functions with a partitioned table would fail
 create table 

Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-19 Thread Thomas Munro
On Thu, Jul 20, 2017 at 2:02 PM, Robert Haas  wrote:
> On Wed, Jul 19, 2017 at 7:45 PM, Thomas Munro
>  wrote:
>> Isn't this the same as the issue reported here?
>>
>> https://www.postgresql.org/message-id/flat/CAEepm%3D270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg%40mail.gmail.com
>
> Hmm, possibly.  But why would that affect the partition-wise join case only?

It doesn't.  From Rafia's part_reg.zip we see a bunch of rows=1 that
turn out to be wrong by several orders of magnitude:

21_nopart_head.out:  Hash Semi Join  (cost=5720107.25..9442574.55
rows=1 width=50)
21_part_head.out:Hash Semi Join  (cost=5423094.06..8847638.36
rows=1 width=38)
21_part_patched.out: Hash Semi Join  (cost=309300.53..491665.60 rows=1 width=12)

My guess is that the consequences of that bad estimate are sensitive
to arbitrary other parameters moving around, as you can see from the
big jump in execution time I showed in that message, measured on
unpatched master of the day:

  4 workers = 9.5s
  3 workers = 39.7s

That's why both parallel hash join and partition-wise join are
showing regressions on Q21: it's just flip-flopping between various
badly costed plans.  Note that even without parallelism, the fix that
Tom Lane suggested gives a much better plan:

https://www.postgresql.org/message-id/CAEepm%3D11BiYUkgXZNzMtYhXh4S3a9DwUP8O%2BF2_ZPeGzzJFPbw%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise

2017-07-19 Thread Peter Geoghegan
On Wed, Jul 19, 2017 at 7:57 PM, Tom Lane  wrote:
> Peter Geoghegan  writes:
>> My argument for the importance of index bloat to the more general
>> bloat problem is simple: any bloat that accumulates, that cannot be
>> cleaned up, will probably accumulate until it impacts performance
>> quite noticeably.
>
> But that just begs the question: *does* it accumulate indefinitely, or
> does it eventually reach a more-or-less steady state?

Yes, I believe it does reach a more-or-less steady state. It saturates
when there is a lot of contention, because then you actually can reuse
the bloat. If it didn't saturate, and instead became arbitrarily bad,
then we'd surely have heard about that before now.

The bloat is not entirely wasted, because it actually prevents you
from getting even more bloat in that part of the keyspace.

> The traditional
> wisdom about btrees, for instance, is that no matter how full you pack
> them to start with, the steady state is going to involve something like
> 1/3rd free space.  You can call that bloat if you want, but it's not
> likely that you'll be able to reduce the number significantly without
> paying exorbitant costs.

For the purposes of this discussion, I'm mostly talking about
duplicates within a page on a unique index. If the keyspace owned by
an int4 unique index page only covers 20 distinct values, it will only
ever cover 20 distinct values, now and forever, despite the fact that
there is room for about 400 (a 90/10 split leaves you with 366 items +
1 high key).

I don't know if I should really even call this bloat, since the term
is so overloaded, although this is what other database systems call
index bloat. I like to think of it as "damage to the keyspace",
although that terminology seems unlikely to catch on.
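
A quick way to see the kind of duplicate accumulation I'm describing, on a
toy table, using pageinspect (the table and index names are mine, just for
illustration):

CREATE EXTENSION IF NOT EXISTS pageinspect;
CREATE TABLE q (id int PRIMARY KEY, v int);
CREATE INDEX q_v_idx ON q (v);
INSERT INTO q SELECT g, 0 FROM generate_series(1, 400) g;
-- Updating an indexed column defeats HOT, so q_pkey accumulates a second
-- entry per id: physical duplicates of keys that are logically unique.
UPDATE q SET v = 1;
SELECT itemoffset, ctid, data FROM bt_page_items('q_pkey', 1) LIMIT 8;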

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise

2017-07-19 Thread Tom Lane
Peter Geoghegan  writes:
> My argument for the importance of index bloat to the more general
> bloat problem is simple: any bloat that accumulates, that cannot be
> cleaned up, will probably accumulate until it impacts performance
> quite noticeably.

But that just begs the question: *does* it accumulate indefinitely, or
does it eventually reach a more-or-less steady state?  The traditional
wisdom about btrees, for instance, is that no matter how full you pack
them to start with, the steady state is going to involve something like
1/3rd free space.  You can call that bloat if you want, but it's not
likely that you'll be able to reduce the number significantly without
paying exorbitant costs.

I'm not claiming that we don't have any problems, but I do think it's
important to draw a distinction between bloat and normal operating
overhead.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise

2017-07-19 Thread Masahiko Sawada
On Thu, Jul 20, 2017 at 8:24 AM, Stephen Frost  wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> "Joshua D. Drake"  writes:
>> > At PGConf US Philly last week I was talking with Jim and Jan about
>> > performance. One of the items that came up is that PostgreSQL can't run
>> > full throttle for long periods of time. The long and short is that no
>> > matter what, autovacuum can't keep up. This is what I have done:
>>
>> Try reducing autovacuum_vacuum_cost_delay more, and/or increasing
>> autovacuum_vacuum_cost_limit.
>
> Or get rid of the cost delay entirely and let autovacuum actually go as
> fast as it can when it's run.  The assertion that it can't keep up is
> still plausible, but configuring autovacuum to sleep regularly and then
> complaining that it's not able to keep up doesn't make sense.
>
> Reducing the nap time might also be helpful if autovacuum is going as
> fast as it can and it's able to clear a table in less than a minute.
>
> There have been discussions on this list about parallel vacuum of a
> particular table as well; to address this issue I'd encourage reviewing
> those discussions and looking at writing a patch to implement that
> feature as that would address the case where the table is large enough
> that autovacuum simply can't get through all of it before the other
> backends have used all space available and then substantially increased
> the size of the relation (leading to vacuum on the table running for
> longer).

Yeah, the parallel vacuum of a particular table might help this issue
unless disk I/O is the bottleneck. I'm planning to work on that.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables

2017-07-19 Thread Etsuro Fujita

On 2017/07/19 23:36, Tom Lane wrote:
> Etsuro Fujita  writes:
>> * Modified rewrite_targetlist(), which is a new function added to
>> preptlist.c, so that we do const-simplification to junk TLEs that
>> AddForeignUpdateTargets() added, as that API allows the FDW to add junk
>> TLEs containing non-Var expressions to the query's targetlist.
>
> This does not seem like a good idea to me.  eval_const_expressions is not
> a cheap thing, and for most use-cases those cycles will be wasted, and it
> has never been the responsibility of preprocess_targetlist to do this sort
> of thing.

Hm, I added that const-simplification to that function so that existing
FDWs that append junk TLEs needing const-simplification (though I don't
know whether any really exist) would work with this fix without any
changes, but I agree on that point.

> Please put the responsibility of doing const-expression simplification
> in these cases somewhere closer to where the problem is being created.

Would it be reasonable to make it the FDW's responsibility to do that
const-simplification where necessary?


Best regards,
Etsuro Fujita



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql's \r broken since e984ef5861d

2017-07-19 Thread Tom Lane
I wrote:
> Ah.  I don't feel like trawling the archives for the discussion right now,
> but I believe this was an intentional change to make the behavior more
> consistent.

Oh ... a quick look in the commit log finds the relevant discussion:
https://www.postgresql.org/message-id/flat/9b4ea968-753f-4b5f-b46c-d7d3bf7c8f90%40manitou-mail.org

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise

2017-07-19 Thread Peter Geoghegan
On Wed, Jul 19, 2017 at 3:54 PM, Alvaro Herrera
 wrote:
>> Index bloat is a general problem that B-Trees have in all other major
>> systems, but I think that PostgreSQL has a tendency to allow indexes
>> to become progressively more bloated over time, in a way that it often
>> can never recover from [1].
>
> Interesting assertion.

I don't pretend to understand the complicated feedback loops that may
exist for workloads that are very reliant on pruning,
kill_prior_tuples cleanup, very aggressive vacuuming, etc. I'll just
say that it seems very reasonable to suppose that they exist.

My argument for the importance of index bloat to the more general
bloat problem is simple: any bloat that accumulates, that cannot be
cleaned up, will probably accumulate until it impacts performance
quite noticeably.

Index bloat may not seem as important as I suggest to some. The ideas
I'm talking about were received somewhat skeptically at pgCon, when I
brought them up informally. Maybe this is a more cogent explanation:
if your indexes were somehow magically never bloated, but the heap
could become just as bloated, then it might not matter that much
simply because the heap pages wouldn't be accessed by index scans.
Heap page accesses may demonstrably be the bottleneck today, without
that magic in place, but perhaps only because indexes point to the
bloat in the first place. It could be a bit like the situation with
bloating UNDO segments in Oracle; those are not the worst thing to
have to bloat. And, the kill_prior_tuples stuff is only as strong as
the weakest link (oldest snapshot).

> Many years ago I set to implement btree page
> merging[1] from ideas in a 1996 paper[2], though that work never saw the
> light of day.  Maybe it can be valuable now.

Lots of other database systems don't implement B-Tree page merging,
because it's hard to make work with techniques like Lehman & Yao
B-Link trees, and the average case space utilization still ends up
being good enough. It may be more worthwhile for us, though.
Apparently Kevin has some ideas here.

If I'm right about this index bloat issue, then I'd sooner tackle the
problem by preventing bloat in unique indexes in the first place, by
using some additional indirection, a topic that I've said plenty about
recently. I think that you can sometimes, though fairly rarely, see
cases that get *really* out of hand, but with things like in-database
queues, that have unusually aggressive update patterns where a great
many duplicates are generated in indexes [1]. Our handling of
duplicates in unique indexes [2] is surely a problem there.

> Another thing to consider is indirect indexes, wherein you get less heap
> bloat because more updates can become HOT.

I think that the stuff I'm talking about (having indirection for the
primary key and making sure unique indexes actually disallow even
physical duplicates) actually enables indirect indexes. Remember how I
pointed out issues with unique indexes and VACUUM when you presented
on it at the pgCon unconference? I think that those problems may be
solvable through centralizing everything by making duplicates within
leaf pages physically impossible for unique indexes (not just
logically impossible). One chokepoint for dealing with bloat cleanup
for every index is the entire point of your indirect index design, but
that requires that you actually have no ambiguity about what every
logical pointer (in every secondary index) points to.

> If we also allow heap to be pruned from line pointers by walking indexes
> to remove specific pointers, instead of requiring indexes to be scanned
> whole for the removal, as proposed by Andres, we could get further
> along.

Yeah. I talked to Robert about doing that just today. That seems very
worthwhile, no matter how it ends up being used (whether it's by
VACUUM, something closer to an synchronous deletion, or whatever).

[1] https://brandur.org/postgres-queues
[2] 
https://wiki.postgresql.org/wiki/Key_normalization#Avoiding_unnecessary_unique_index_enforcement
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql's \r broken since e984ef5861d

2017-07-19 Thread Tom Lane
Julien Rouhaud  writes:
> On 20/07/2017 03:34, Tom Lane wrote:
>> Works for me.  Please describe exactly what misbehavior you're seeing.

> Here's a simple test case, last \p still show the query buffer:

Ah.  I don't feel like trawling the archives for the discussion right now,
but I believe this was an intentional change to make the behavior more
consistent.  Prior versions did things weirdly differently depending on
whether you'd typed anything, eg modifying your example slightly:

regression=# select version();
                                version
------------------------------------------------------------------------
 PostgreSQL 9.6.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 
(Red Hat 4.4.7-18), 64-bit
(1 row)

regression=# \p
select version();
regression=# mistake
regression-# \r
Query buffer reset (cleared).
regression=# \p
select version();
regression=# \g 
                                version
------------------------------------------------------------------------
 PostgreSQL 9.6.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 
(Red Hat 4.4.7-18), 64-bit
(1 row)

I think we felt that throwing away the previous-query buffer
when we didn't have to was generally to be avoided, so we
wanted to standardize on this behavior not the other one.
Do you think differently?

I have some recollection that there were also cases where \p
would print something different than what \g would execute,
which of course is quite nasty.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-19 Thread Robert Haas
On Wed, Jul 19, 2017 at 7:45 PM, Thomas Munro
 wrote:
> On Thu, Jul 20, 2017 at 7:00 AM, Robert Haas  wrote:
>> I think the problem is that the row count estimates for the child
>> joins seem to be totally bogus:
>>
>> ->  Hash Semi Join  (cost=309300.53..491665.60 rows=1 width=12)
>> (actual time=10484.422..15945.851 rows=1523493 loops=3)
>>   Hash Cond: (l1.l_orderkey = l2.l_orderkey)
>>   Join Filter: (l2.l_suppkey <> l1.l_suppkey)
>>   Rows Removed by Join Filter: 395116
>>
>> That's clearly wrong.  In the un-partitioned plan, the join to l2
>> produces about as many rows of output as the number of rows that were
>> input (998433 vs. 962909); but here, a child join with a million rows
>> as input is estimated to produce only 1 row of output.  I bet the
>> problem is that the child-join's row count estimate isn't getting
>> initialized at all, but then something is clamping it to 1 row instead
>> of 0.
>>
>> So this looks like a bug in Ashutosh's patch.
>
> Isn't this the same as the issue reported here?
>
> https://www.postgresql.org/message-id/flat/CAEepm%3D270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg%40mail.gmail.com

Hmm, possibly.  But why would that affect the partition-wise join case only?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql's \r broken since e984ef5861d

2017-07-19 Thread Julien Rouhaud
On 20/07/2017 03:34, Tom Lane wrote:
> Julien Rouhaud  writes:
>> Unless I miss something, \r isn't working anymore,
> 
> Works for me.  Please describe exactly what misbehavior you're seeing.
> What libreadline or libedit version are you using?
> 

I have libreadline 7.0_p3.

Here's a simple test case, last \p still show the query buffer:

psql -X postgres



postgres=# select version();
                                version
------------------------------------------------------------------------
 PostgreSQL 10beta2@decb08ebdf on x86_64-pc-linux-gnu, compiled by gcc
(Gentoo 4.9.3 p1.5, pie-0.6.4) 4.9.3, 64-bit
(1 row)

postgres=# \p
select version();
postgres=# \r
Query buffer reset (cleared).
postgres=# \p
select version();

On a 9.6:

postgres=# select version();
                                version
------------------------------------------------------------------------
 PostgreSQL 9.6.3@3c017a545f on x86_64-pc-linux-gnu, compiled by gcc
(Gentoo 4.9.3 p1.5, pie-0.6.4) 4.9.3, 64-bit
(1 row)

postgres=# \p
select version();
postgres=# \r
Query buffer reset (cleared).
postgres=# \p
Query buffer is empty.


-- 
Julien Rouhaud


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql's \r broken since e984ef5861d

2017-07-19 Thread Tom Lane
Julien Rouhaud  writes:
> Unless I miss something, \r isn't working anymore,

Works for me.  Please describe exactly what misbehavior you're seeing.
What libreadline or libedit version are you using?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] psql's \r broken since e984ef5861d

2017-07-19 Thread Julien Rouhaud
Hello,

Unless I'm missing something, \r isn't working anymore, since
exec_command_print() falls back to displaying previous_buf if query_buf
has been freed.

Trivial patch to fix the issue (free both buffers in exec_command_reset())
attached.

Regards.

-- 
Julien Rouhaud
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 14c64208ca..4087532052 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -109,7 +109,7 @@ static backslashResult exec_command_prompt(PsqlScanState scan_state, bool active
 static backslashResult exec_command_pset(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_quit(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_reset(PsqlScanState scan_state, bool active_branch,
-   PQExpBuffer query_buf);
+   PQExpBuffer query_buf, PQExpBuffer previous_buf);
 static backslashResult exec_command_s(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_set(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_setenv(PsqlScanState scan_state, bool active_branch,
@@ -369,7 +369,7 @@ exec_command(const char *cmd,
 	else if (strcmp(cmd, "q") == 0 || strcmp(cmd, "quit") == 0)
 		status = exec_command_quit(scan_state, active_branch);
 	else if (strcmp(cmd, "r") == 0 || strcmp(cmd, "reset") == 0)
-		status = exec_command_reset(scan_state, active_branch, query_buf);
+		status = exec_command_reset(scan_state, active_branch, query_buf, previous_buf);
 	else if (strcmp(cmd, "s") == 0)
 		status = exec_command_s(scan_state, active_branch);
 	else if (strcmp(cmd, "set") == 0)
@@ -2060,11 +2060,12 @@ exec_command_quit(PsqlScanState scan_state, bool active_branch)
  */
 static backslashResult
 exec_command_reset(PsqlScanState scan_state, bool active_branch,
-   PQExpBuffer query_buf)
+   PQExpBuffer query_buf, PQExpBuffer previous_buf)
 {
 	if (active_branch)
 	{
 		resetPQExpBuffer(query_buf);
+		resetPQExpBuffer(previous_buf);
 		psql_scan_reset(scan_state);
 		if (!pset.quiet)
 			puts(_("Query buffer reset (cleared)."));

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-19 Thread Thomas Munro
On Thu, Jul 20, 2017 at 7:00 AM, Robert Haas  wrote:
> I think the problem is that the row count estimates for the child
> joins seem to be totally bogus:
>
> ->  Hash Semi Join  (cost=309300.53..491665.60 rows=1 width=12)
> (actual time=10484.422..15945.851 rows=1523493 loops=3)
>   Hash Cond: (l1.l_orderkey = l2.l_orderkey)
>   Join Filter: (l2.l_suppkey <> l1.l_suppkey)
>   Rows Removed by Join Filter: 395116
>
> That's clearly wrong.  In the un-partitioned plan, the join to l2
> produces about as many rows of output as the number of rows that were
> input (998433 vs. 962909); but here, a child join with a million rows
> as input is estimated to produce only 1 row of output.  I bet the
> problem is that the child-join's row count estimate isn't getting
> initialized at all, but then something is clamping it to 1 row instead
> of 0.
>
> So this looks like a bug in Ashutosh's patch.

Isn't this the same as the issue reported here?

https://www.postgresql.org/message-id/flat/CAEepm%3D270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise

2017-07-19 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> "Joshua D. Drake"  writes:
> > At PGConf US Philly last week I was talking with Jim and Jan about 
> > performance. One of the items that came up is that PostgreSQL can't run 
> > full throttle for long periods of time. The long and short is that no 
> > matter what, autovacuum can't keep up. This is what I have done:
> 
> Try reducing autovacuum_vacuum_cost_delay more, and/or increasing
> autovacuum_vacuum_cost_limit.

Or get rid of the cost delay entirely and let autovacuum actually go as
fast as it can when it's run.  The assertion that it can't keep up is
still plausible, but configuring autovacuum to sleep regularly and then
complaining that it's not able to keep up doesn't make sense.

Reducing the nap time might also be helpful if autovacuum is going as
fast as it can and it's able to clear a table in less than a minute.

There have been discussions on this list about parallel vacuum of a
particular table as well; to address this issue I'd encourage reviewing
those discussions and looking at writing a patch to implement that
feature as that would address the case where the table is large enough
that autovacuum simply can't get through all of it before the other
backends have used all space available and then substantially increased
the size of the relation (leading to vacuum on the table running for
longer).

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] JSONB - JSONB operator feature request

2017-07-19 Thread Merlin Moncure
On Tue, Jul 18, 2017 at 12:49 PM, David Fetter  wrote:
> On Tue, Jul 18, 2017 at 01:36:32PM +0200, david.tu...@linuxbox.cz wrote:
>> Hi,
>>
>> some users and I have used hstore - hstore, for example for storing only
>> changed rows in a trigger, like:
>>
>> hstore(NEW) - hstore(OLD)
>>
>> There isn't an equivalent operator/function for JSON/JSONB. We can only
>> remove keys from JSONB, not equal key-value pairs. Is there any chance of
>> having the same feature for JSON/JSONB in Postgres core?
>
> Here's one slightly modified from 
> http://coussej.github.io/2016/05/24/A-Minus-Operator-For-PostgreSQLs-JSONB/
>
> CREATE OR REPLACE FUNCTION jsonb_minus ( arg1 jsonb, arg2 jsonb )
> RETURNS jsonb
> LANGUAGE sql
> AS $$
> SELECT
> COALESCE(json_object_agg(
> key,
> CASE
> -- if the value is an object and the value of the second argument is
> -- not null, we do a recursion
> WHEN jsonb_typeof(value) = 'object' AND arg2 -> key IS NOT NULL
> THEN jsonb_minus(value, arg2 -> key)
> -- for all the other types, we just return the value
> ELSE value
> END
> ), '{}')::jsonb
> FROM
> jsonb_each(arg1)
> WHERE
> arg1 -> key IS DISTINCT FROM arg2 -> key
> $$;
>
> CREATE OPERATOR - (
> PROCEDURE = jsonb_minus,
> LEFTARG   = jsonb,
> RIGHTARG  = jsonb
> );
>
> I suspect that there's a faster way to do the jsonb_minus function
> internally.

yes, please!  I also sorely miss the hstore 'slice' function, which is
very similar.  The main remaining disadvantage of jsonb WRT hstore is
that you can't do the simple retransformations that these operations
allow for.  Too often you end up doing multiple '->' operations against
the same object followed by a rebundling, which is a real performance
killer.

I understand that there are more edge cases due to the flexible json
structure, but I'd be quite happy returning NULL or erroring when you
can't arrive at a sensible extraction.
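
For instance, with the jsonb_minus operator David defined above, the old
hstore trigger idiom maps over like this (just an illustration):

-- analogous to hstore(NEW) - hstore(OLD): keep only what changed
SELECT to_jsonb(n) - to_jsonb(o) AS changed
FROM (VALUES (1, 'new name', 10)) AS n(id, name, qty),
     (VALUES (1, 'old name', 10)) AS o(id, name, qty);
-- changed => {"name": "new name"}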

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise

2017-07-19 Thread Joe Conway
On 07/19/2017 03:29 PM, Tom Lane wrote:
> "Joshua D. Drake"  writes:
>> At PGConf US Philly last week I was talking with Jim and Jan about 
>> performance. One of the items that came up is that PostgreSQL can't run 
>> full throttle for long periods of time. The long and short is that no 
>> matter what, autovacuum can't keep up. This is what I have done:
> 
> Try reducing autovacuum_vacuum_cost_delay more, and/or increasing
> autovacuum_vacuum_cost_limit.

I would try
  autovacuum_vacuum_cost_delay = 0
and for any tables > 1 million rows:
  autovacuum_vacuum_scale_factor: 0
  autovacuum_vacuum_threshold: 10 (perhaps even smaller)
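
Concretely, something like this (the table name is just a placeholder for
one of the big, heavily-updated benchmark tables):

ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 0;
SELECT pg_reload_conf();
ALTER TABLE bmsql_order_line
  SET (autovacuum_vacuum_scale_factor = 0, autovacuum_vacuum_threshold = 10);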

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise

2017-07-19 Thread Alvaro Herrera
Peter Geoghegan wrote:

> Index bloat is a general problem that B-Trees have in all other major
> systems, but I think that PostgreSQL has a tendency to allow indexes
> to become progressively more bloated over time, in a way that it often
> can never recover from [1].

Interesting assertion.  Many years ago I set out to implement btree page
merging[1] from ideas in a 1996 paper[2], though that work never saw the
light of day.  Maybe it can be valuable now.

Another thing to consider is indirect indexes, wherein you get less heap
bloat because more updates can become HOT.

If we also allow heap to be pruned from line pointers by walking indexes
to remove specific pointers, instead of requiring indexes to be scanned
whole for the removal, as proposed by Andres, we could get further
along.

[1] 
https://www.postgresql.org/message-id/20020912235429.4714071a.alvhe...@atentus.com
[2] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.9961

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise

2017-07-19 Thread Tom Lane
"Joshua D. Drake"  writes:
> At PGConf US Philly last week I was talking with Jim and Jan about 
> performance. One of the items that came up is that PostgreSQL can't run 
> full throttle for long periods of time. The long and short is that no 
> matter what, autovacuum can't keep up. This is what I have done:

Try reducing autovacuum_vacuum_cost_delay more, and/or increasing
autovacuum_vacuum_cost_limit.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise

2017-07-19 Thread Peter Geoghegan
On Wed, Jul 19, 2017 at 3:11 PM, Joshua D. Drake  wrote:
> The good news is, PostgreSQL is not doing half bad against 128 connections
> with only 16vCPU. The bad news is we more than doubled our disk size without
> getting reuse or bloat under control. The concern here is that under heavy
> write loads that are persistent, we will eventually bloat out and have to
> vacuum full, no matter what. I know that Jan has done some testing and the
> best he could get is something like 8 days before PostgreSQL became unusable
> (but don't quote me on that).
>
> I am open to comments, suggestions, running multiple tests with different
> parameters or just leaving this in the archive for people to reference.

Did you see my blogpost on planet PostgreSQL from last night?

https://pgeoghegan.blogspot.com/2017/07/postgresql-index-bloat-microscope.html

Perhaps you could use my query to instrument an interesting index, to
see what that turns up. I would really like to get a better sense of
how often and to what extent index bloat is a problem that VACUUM is
just inherently incapable of keeping under control over time. The
timeline for performance to degrade with real production workloads is
very interesting to me. It's really hard to simulate certain types of
problems that you will see frequently in production.

Index bloat is a general problem that B-Trees have in all other major
systems, but I think that PostgreSQL has a tendency to allow indexes
to become progressively more bloated over time, in a way that it often
can never recover from [1]. This may be a particular problem with
unique indexes, where many physical duplicates accumulate in pages.
Duplicates that are theoretically reclaimable, but due to how the
keyspace is split up, will never actually be reclaimed [2].

[1] 
postgr.es/m/CAH2-Wzmf6intNY1ggiNzOziiO5Eq=DsXfeptODGxO=2j-i1...@mail.gmail.com
[2] 
https://wiki.postgresql.org/wiki/Key_normalization#VACUUM_and_nbtree_page_deletion
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] autovacuum can't keep up, bloat just continues to rise

2017-07-19 Thread Joshua D. Drake

Hello,

At PGConf US Philly last week I was talking with Jim and Jan about 
performance. One of the items that came up is that PostgreSQL can't run 
full throttle for long periods of time. The long and short is that no 
matter what, autovacuum can't keep up. This is what I have done:


Machine:

16vCPU
59G Memory
10G SSD (/)
500G SSD /srv/main/9.6 (PGDATA) : 240MB Sustained with 15k IOPS
* Yes, we really got 240MB sustained performance

I used benchmarksql which is a tpc-c benchmark similar to pgbench but 
supposedly more thorough.


https://sourceforge.net/projects/benchmarksql/

PostgreSQL 9.6 on Ubuntu 16.04 x64.

postgresql.conf:

max_connections: 1000 (just to keep it out of the way)
shared_buffers: 32G (Awesome work Haas)
work_mem: 32M
maintenance_work_mem: 2G
effective_io_concurrency: 1

	* Before anybody suggests increasing this, on GCE over a dozen tests, 
anything but disabling this appears to be a performance hit of ~ 10% (I 
can reproduce those tests if you like on another thread).


synchronous_commit: off
checkpoint_timeout: 60min
max_wal_size: 5G
random_page_cost: 1
effective_cache_size: 32GB
*this probably should be more like 50 but still
autovacuum_max_workers: 12
* One for each table + a couple for system tables
autovacuum_vacuum_scale_factor: 0.1
autovacuum_cost_delay: 5ms

Here are the benchmarksql settings for all 4 runs:

17:07:54,268 [main] INFO   jTPCC : Term-00, warehouses=500
17:07:54,269 [main] INFO   jTPCC : Term-00, terminals=128
17:07:54,272 [main] INFO   jTPCC : Term-00, runTxnsPerTerminal=10
17:07:54,273 [main] INFO   jTPCC : Term-00, limitTxnsPerMin=30
17:07:54,273 [main] INFO   jTPCC : Term-00, terminalWarehouseFixed=false
17:07:54,274 [main] INFO   jTPCC : Term-00,
17:07:54,274 [main] INFO   jTPCC : Term-00, newOrderWeight=45
17:07:54,274 [main] INFO   jTPCC : Term-00, paymentWeight=43
17:07:54,274 [main] INFO   jTPCC : Term-00, orderStatusWeight=4
17:07:54,275 [main] INFO   jTPCC : Term-00, deliveryWeight=4
17:07:54,275 [main] INFO   jTPCC : Term-00, stockLevelWeight=4

For run 0, I started with:

vacuumdb -U postgres -fz;./runBenchmark.sh my_postgres.properties

And then for each subsequent run, I just ran the benchmark without the 
vacuum full so that PostgreSQL could prove us wrong. It didn't. Here is 
the break down of the results:


RUN   START DISK SIZE   END DISK SIZE   TPS/Terminal
0     54                78              868.6796875
1     78                91              852.4765625
2     91                103             741.4609375
3     103               116             686.125

The good news is, PostgreSQL is not doing half bad against 128 
connections with only 16vCPU. The bad news is we more than doubled our 
disk size without getting reuse or bloat under control. The concern here 
is that under heavy write loads that are persistent, we will eventually 
bloat out and have to vacuum full, no matter what. I know that Jan has 
done some testing and the best he could get is something like 8 days 
before PostgreSQL became unusable (but don't quote me on that).


I am open to comments, suggestions, running multiple tests with 
different parameters or just leaving this in the archive for people to 
reference.


Thanks in advance,

JD


--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own.   *


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Authentication mechanisms categorization

2017-07-19 Thread Álvaro Hernández Tortosa



On 14/07/17 13:29, Michael Paquier wrote:
> On Fri, Jul 14, 2017 at 12:16 PM, Álvaro Hernández Tortosa
>  wrote:
>> If the parameter authmethod would rather be "authmethods", i.e., a list,
>> I think it would be significantly more flexible.
>
> Yes, but the handling of a list becomes messier if there are some
> other connection parameters that are dependent on the authentication
> method. Say if a list made of scram-sha-256 and scram-sha-3 as methods
> is sent, and a parameter named saslchannel lists scram-sha-256-plus is
> used, this becomes unusable with scram-sha-3. Using individual names
> for a parameter makes interactions with other parameters easier to
> handle and less bug-prone. That's also by experience more flexible for
> the application.
>
>> I agree with a list of methods and all the values already existing for
>> sslmode, this might be more than enough, specially if the channel binding
>> SCRAM mechanisms would get a different authmethod than their non-channel
>> binding partners (like scram-sha-256-plus). This makes the list argument for
>> the authmethods, in my opinion, stronger.
>
> For the channel binding patch, I have actually implemented saslname to
> enforce the name of the SASL mechanism name to use (SCRAM-SHA-256 or
> its -PLUS) as well as saslchannelbinding to enforce the channel
> binding type. That's very handy, and at the end I saw that having a
> list does not add much value in a feature that should be as simple as
> possible as the client will use one match at the end for
> authentication, and let the client know if it failed or succeeded (yes
> I hate sslmode=prefer which does up to two attempts at once). But
> that's as far as my opinion stands.
>
> It is not possible to know the future, but we cannot discard as well
> the fact that a future authentication method, say hoge could as well
> support scram-sha-256, in which case cases like that using a list
> "authmethods=hoge,sasl authmechanisms=scram-sha-256" would mean that
> scram-sha-256 needs to be enforced for both things, but the dependency
> handling makes things unnecessary complicated in libpq. My argument
> here is crazy though.


Hi Michael.

I'm mostly convinced by the power of all the parameters that
already exist, given that you added both saslname and saslchannelbinding
to the already existing sslmode. That's great, and allows very
fine-grained selection of the auth method. So it would be great if
(non-libpq) driver implementations exposed the same parameter names to
their users. I will study this for JDBC.


My only fear is that this could become very complicated for the
user, and could end up looking like the authentication algorithm lists
for SSL, which are very hard to digest for the non-expert. Handling a
list of auth methods spread across three parameters seems to me to be
going a bit in this direction. A categorization like the one proposed,
while maybe difficult to do initially, and maybe to maintain too, is
precisely meant to take this burden off the user and expose a simpler
setting for them.


Álvaro


--

Álvaro Hernández Tortosa





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/perl extension fails on Windows

2017-07-19 Thread Tom Lane
Ashutosh Sharma  writes:
> Here are the list of macros and variables from 'intrpvar.h' file that
> are just defined in perl module but not in plperl on Windows,

> #ifdef PERL_USES_PL_PIDSTATUS
> PERLVAR(I, pidstatus,   HV *)   /* pid-to-status mappings for waitpid */
> #endif

> #ifdef PERL_SAWAMPERSAND
> PERLVAR(I, sawampersand, U8)/* must save all match strings */
> #endif

I am really suspicious that this means your libperl was built in an unsafe
fashion, that is, by injecting configuration choices as random -D switches
in the build process rather than making sure the choices were recorded in
perl's config.h.  As an example, looking at the perl 5.24.1 headers on
a Fedora box, it looks to me like PERL_SAWAMPERSAND could only get defined
if PERL_COPY_ON_WRITE were not defined, and the only way that that can
happen is if PERL_NO_COW is defined, and there are no references to the
latter anyplace except in this particular #if defined test in perl.h.

Where did your perl installation come from, anyway?  Are you sure the .h
files match up with the executables?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-19 Thread Mark Rofail
On Wed, Jul 19, 2017 at 10:08 PM, Alvaro Herrera 
wrote:

> So let's step back a bit,
> get a patch that works for the case where the types match on both sides
> of the FK, then we review that patch; if all is well, we can discuss the
> other problem as a stretch goal.


Agreed. This should be a future improvement.

I think the next step should be testing the performance before/after the
modifications.


[HACKERS] pgbench minor doc typo

2017-07-19 Thread Fabien COELHO


Alik Khilazhev is submitting a patch about a zipfian random function
for pgbench, and noticed a typo in the documentation about 
random_exponential.


Attached is a fix extracted from his patch submission, which could be 
applied to head/10/9.6.


--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 64b043b..03e1212 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1045,7 +1045,7 @@ f(x) = exp(-parameter * (x - min) / (max - min + 1)) / (1 - exp(-parameter))
 
   Then value i between min and
   max inclusive is drawn with probability:
-  f(x) - f(x + 1).
+  f(i) - f(i + 1).
  
 
  



Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-19 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Jul 19, 2017 at 8:56 PM, Tom Lane  wrote:
>> I wonder if it'd be worth the trouble to stick something like this into
>> xlog.c:
>> StaticAssertStmt(sizeof(ControlFileData) <= 512,
>> "pg_control is too large for atomic disk writes");

> +1. Even if it just gets triggered in 20 years by some hacker, that's
> a good reminder about assumptions behind the update logic.

Done.  I found the size checks had metastasized into pg_resetwal as well,
and probably should have been in pg_rewind since it also rewrites
pg_control, so it ended up a slightly larger patch than I first thought.
Seems cleaner than before though.

regards, tom lane




Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-19 Thread Alvaro Herrera
Mark Rofail wrote:
> On Tue, Jul 18, 2017 at 11:14 PM, Alvaro Herrera 
> wrote:
> >
> > Why did we add an operator and not a support
> > procedure?
> 
> I thought the support procedures were constant within an opclass.

Uhh ... I apologize but I think I was barking up the wrong tree.  I was
thinking that it mattered that the opclass mechanism was able to
determine whether some array @>> some element, but that's not true: it's
the queries in ri_triggers.c, which have no idea about opclasses.

(I think we would have wanted to use opclasses in order to find out
what operator to use in the first place, if ri_triggers.c was already
using that general idea; but in reality it's already using hardcoded
operator names, so it doesn't matter.)

I'm not entirely sure what's the best way to deal with the polymorphic
problem, but on the other hand as Robert says downthread maybe we
shouldn't be solving it at this stage anyway.  So let's step back a bit,
get a patch that works for the case where the types match on both sides
of the FK, then we review that patch; if all is well, we can discuss the
other problem as a stretch goal.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash

2017-07-19 Thread Alvaro Herrera
Robert Haas wrote:
> On Wed, Jul 19, 2017 at 8:26 AM, Neha Sharma
>  wrote:
> > I am getting FailedAssertion while executing the attached script.However,I
> > am not able to produce the core dump for the same,the script runs in
> > background and takes around a day time to produce the mentioned error.
> 
> Maybe you should run it for another day with core dumps enabled (try
> the "-c" option to pg_ctl) and then pull a backtrace if you get one.
> This is a pretty generic script so it's going to be hard to guess
> where the problem is otherwise.

Also, please state server version used.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] New partitioning - some feedback

2017-07-19 Thread Robert Haas
On Tue, Jul 18, 2017 at 2:26 AM, Vik Fearing
 wrote:
> On 07/07/2017 02:02 AM, Mark Kirkwood wrote:
>> I'd prefer *not* to see a table and its partitions all intermixed in the
>> same display (especially with nothing indicating which are partitions) -
>> as this will make for unwieldy long lists when tables have many
>> partitions. Also it would be good if the 'main' partitioned table and
>> its 'partitions' showed up as a different type in some way.
>
> I've just read through this thread, and I'm wondering why we can't just
> have something like  \set SHOW_PARTITIONS true  or something, and that
> would default to false.

We could, and that would have the advantage of letting people set a
default.  On the other hand, if you want to override the default
behavior just once, adding a modifier character is a lot less typing
than issuing \set, retyping your command, and issuing \set again to
change it back.  So I don't know which is better.

My main point is that it's too late to be making changes upon which we
do not have a clear consensus.  I reject the argument that v11 will be
too late to make this change.  Now that we have partitioning, I
believe there will be zillions of things that need to be done to
improve it further; several of those things already have proposed
patches; this can be another one of those things.  If we rush
something in now and it turns out that it isn't well-liked, we may
well end up with one behavior for v<10, another behavior for v=10, and
a third behavior for v>10.  Better to wait and make the change later
when we have a few more data points.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-19 Thread Robert Haas
On Wed, Jul 19, 2017 at 3:26 PM, Mark Dilger  wrote:
>> Typically, when you try to store an out-of-range value in PostgreSQL,
>> you get an ERROR, and that's one of the selling points of PostgreSQL.
>> PostgreSQL users regularly beat up other projects for, say, allowing
>> 0000-00-00 to be considered a valid date, or any similar perceived
>> laxity in enforcing data consistency.  I don't like the idea that we
>> can just deviate from that principle whenever adhering to it is too
>> much work.
>
> I don't see the relevance of this statement.  I am not planning to allow
> abstime data that is outside the range of the new epoch.  Attempts to
> cast strings like '1962-07-07 01:02:03' to abstime would draw an
> exception with a suitably informative message.
>
> Now, the objection to having on-disk data automagically change meaning
> is concerning, ...

I see those things as very similar issues, but YMMV.

> ... and I was particularly unsatisfied with the idea that
> NOSTART_ABSTIME and NOEND_ABSTIME would suddenly be
> reinterpreted as seconds in the year 2068, which is why I made mention
> of it upthread.  I was less concerned with dates prior to 1970 being
> reinterpreted, though it is hard to justify why that bothers me less.

I think any change in the meaning of bytes already on disk is a
concern.  It wouldn't bother me to say "these are second-class types,
so if you have columns of this type, pg_upgrade will fail and tell you
that you can't upgrade a database with those columns from a catversion
< X to a catversion > X".  But having it silently work and silently
corrupt your data doesn't seem OK to me.

> Those who feel otherwise might like to also argue for dropping
> float4 because float8 does all the same stuff better.

I don't think that's the same thing, but given my notorious hatred of
floating point arithmetic, it would also bother me less than you might
think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-19 Thread Mark Dilger

> On Jul 19, 2017, at 9:53 AM, Robert Haas  wrote:
> 
> On Mon, Jul 17, 2017 at 6:12 PM, Tom Lane  wrote:
>> So, thinking about how that would actually work ... the thing to do in
>> order to preserve existing on-disk values is to alternate between signed
>> and unsigned interpretations of abstimes.  That is, right now, abstime
>> is signed with origin 1970.  The conversion I'm arguing we should make
>> real soon now is to unsigned with origin 1970.  If the project lives
>> so long, in circa 70 years we'd switch it to signed with origin 2106.
>> Yadda yadda.
> 
> Doesn't this plan amount to breaking pg_upgrade compatibility and
> hoping that nobody notice?  Existing values will be silently
> reinterpreted according to the new rules.  If we had two actually
> separate types and let people convert columns from the old type to the
> new type with just a validation scan, that would be engineering at the
> level of quality that I've come to associate with this project.  

This is what I have done in my fork.  I repurposed the type as "secondstamp"
since I think of it as a timestamp down to second precision (truncating
fractional seconds.)  I changed the Oids assigned to the catalog entries
associated with the type, and considered myself somewhat immune to
the project dropping the abstime type, which the documentation warned
would happen eventually.


> If
> this type is so marginal that we don't want to do that kind of work,
> then I think we should just rip it out; that doesn't preclude someone
> maintaining it in their own fork, or even adding it back as a new type
> in a contrib module with a #define for the base year.  Silently
> changing the interpretation of the same data in the core code, though,
> seems both far too clever and not clever enough.

I would be happy to contribute the "secondstamp" type as part of a patch
that removes the abstime type.  I can add a catalog table that holds the
epoch, and add documentation for the type stating that every time the
epoch changes, their data will be automatically reinterpreted, and as such
they should only use the datatype if that is ok. 

Under this plan, users with abstime columns who upgrade to postgres 11
will not have an easy time, because the type will be removed.  But that is
the same and no worse than what they would get if we just remove the
abstime type in postgres 11 without any replacement. 

I'm not implementing any of this yet, as I expect further objections.

Thoughts?

mark




Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-19 Thread Mark Dilger

> On Jul 19, 2017, at 10:23 AM, Robert Haas  wrote:
> 
> On Wed, Jul 19, 2017 at 1:12 PM, Tom Lane  wrote:
>>> Doesn't this plan amount to breaking pg_upgrade compatibility and
>>> hoping that nobody notice?
>> 
>> Well, what we'd need to do is document that the type is only meant to be
>> used to store dates within say +/- 30 years from current time.  As long
>> as people adhere to that use-case, the proposal would work conveniently
>> long into the future ...
> 
> Typically, when you try to store an out-of-range value in PostgreSQL,
> you get an ERROR, and that's one of the selling points of PostgreSQL.
> PostgreSQL users regularly beat up other projects for, say, allowing
> 0000-00-00 to be considered a valid date, or any similar perceived
> laxity in enforcing data consistency.  I don't like the idea that we
> can just deviate from that principle whenever adhering to it is too
> much work.

I don't see the relevance of this statement.  I am not planning to allow
abstime data that is outside the range of the new epoch.  Attempts to
cast strings like '1962-07-07 01:02:03' to abstime would draw an
exception with a suitably informative message.

Now, the objection to having on-disk data automagically change meaning
is concerning, and I was particularly unsatisfied with the idea that
NOSTART_ABSTIME and NOEND_ABSTIME would suddenly be
reinterpreted as seconds in the year 2068, which is why I made mention
of it upthread.  I was less concerned with dates prior to 1970 being
reinterpreted, though it is hard to justify why that bothers me less.

>> I'd definitely be on board with just dropping the type altogether despite
>> Mark's concern.
> 
> Then I vote for that option.

I was somewhat surprised when Tom was onboard with the idea of keeping
abstime around (for my benefit).  My original post was in response to his
statement that, right offhand, he couldn't think of any use for abstime that
wasn't handled better by timestamptz (paraphrase), and so I said that
improving storage efficiency was such a use.  I maintain that position.  The
abstime type is a good and useful type, and we will lose that use when we
discard it.  Those who feel otherwise might like to also argue for dropping
float4 because float8 does all the same stuff better.

mark




Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2017-07-19 Thread Robins Tharakan
On 20 July 2017 at 05:08, Michael Paquier  wrote:

> On Wed, Jul 19, 2017 at 8:59 PM, Fabrízio de Royes Mello wrote:
> > You should add the properly sgml docs for this pg_dumpall change also.
>
> Tests of pg_dump go to src/bin/pg_dump/t/ and tests for objects in
> extensions are in src/test/modules/test_pg_dump, but you just care
> about the former with this patch. And if you implement some new tests,
> look at the other tests and base your work on that.
>

Thanks Michael / Fabrízio.

Updated patch (attached) additionally adds SGML changes for pg_dumpall.
(I'll try to work on the tests, but sending this nonetheless).


-
robins


pgdump_nocomments_v3.patch
Description: Binary data



Re: [HACKERS] More flexible LDAP auth search filters?

2017-07-19 Thread Robert Haas
On Sun, Jul 16, 2017 at 7:23 PM, Stephen Frost  wrote:
>> Refusing to improve LDAP for the users who have no choice seems like a very
>> unfriendly thing to do.
>
> I'm fine with improving LDAP in general, but, as I tried to point out,
> having a way to make it easier to integrate PG into an AD environment
> would be better.  It's not my intent to stop this patch but rather to
> point out the issues with LDAP auth that far too frequently are not
> properly understood.

Then it's off-topic for this thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2017-07-19 Thread Michael Paquier
On Wed, Jul 19, 2017 at 8:59 PM, Fabrízio de Royes Mello
 wrote:
> On Wed, Jul 19, 2017 at 3:54 PM, Robins Tharakan  wrote:
>> You may want to consider this patch (attached) which additionally has the
>> pg_dumpall changes.
>> It would be great if you could help with the tests though, am unsure how
>> to go about them.
>
> You should add the properly sgml docs for this pg_dumpall change also.

Tests of pg_dump go to src/bin/pg_dump/t/ and tests for objects in
extensions are in src/test/modules/test_pg_dump, but you just care
about the former with this patch. And if you implement some new tests,
look at the other tests and base your work on that.
-- 
Michael




Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-19 Thread Michael Paquier
On Wed, Jul 19, 2017 at 8:56 PM, Tom Lane  wrote:
> I wonder if it'd be worth the trouble to stick something like this into
> xlog.c:
>
> /*
>  * For reliability's sake, it's critical that pg_control updates
>  * be atomic writes.  That generally means the active data can't
>  * be more than one disk sector, which is 512 bytes on common
>  * hardware.  Be very careful about raising this limit.
>  */
>  StaticAssertStmt(sizeof(ControlFileData) <= 512,
>   "pg_control is too large for atomic disk writes");

+1. Even if it just gets triggered in 20 years by some hacker, that's
a good reminder about assumptions behind the update logic.
-- 
Michael




Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-19 Thread Robert Haas
On Wed, Jul 19, 2017 at 2:29 PM, Mark Rofail  wrote:
> On Wed, Jul 19, 2017 at 7:28 PM, Robert Haas  wrote:
>>
>> Why do we have to solve that limitation?
>
> Since the regress test labeled element_foreing_key fails now that I made the
> RI queries utilise @>(anyarray, anyelement), that means it's not functioning
> as it is meant to.

Well, if this is a new test introduced by the patch, you could also
just change the test.  Off-hand, I'm not sure that it's very important
to make the case work where the types don't match between the
referenced table and the referencing table, which is what you seem to
be talking about here.  But maybe I'm misunderstanding the situation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-19 Thread Robert Haas
On Wed, Jul 19, 2017 at 12:24 AM, Rafia Sabih
 wrote:
> On testing this patch for TPC-H (for scale factor 20) benchmark I found a
> regression for Q21, on head it was taking some 600 seconds and with this
> patch it is taking 3200 seconds. This comparison is on the same partitioned
> database, one using the partition wise join patch and other is without it.
> The execution time of Q21 on unpartitioned head is some 300 seconds. The
> explain analyse output for each of these cases is attached.

Interesting.

> This suggests that partitioning is not a suitable strategy for this query,
> but then maybe partition-wise join should not be picked for such a case, so
> as not to aggravate the performance issue.

In the unpartitioned case, and in the partitioned case on head, the
join order is l1-(nation-supplier)-l2-orders-l3.  In the patched case,
the join order changes to l1-l2-supplier-orders-nation-l3.  If the
planner used the former join order, it wouldn't be able to do a
partition-wise join at all, so it must think that the l1-l2 join gets
much cheaper when done partitionwise, thus justifying a change in the
overall join order to be able to use partition-wise join.  But it
doesn't work out.

I think the problem is that the row count estimates for the child
joins seem to be totally bogus:

->  Hash Semi Join  (cost=309300.53..491665.60 rows=1 width=12)
(actual time=10484.422..15945.851 rows=1523493 loops=3)
  Hash Cond: (l1.l_orderkey = l2.l_orderkey)
  Join Filter: (l2.l_suppkey <> l1.l_suppkey)
  Rows Removed by Join Filter: 395116

That's clearly wrong.  In the un-partitioned plan, the join to l2
produces about as many rows of output as the number of rows that were
input (998433 vs. 962909); but here, a child join with a million rows
as input is estimated to produce only 1 row of output.  I bet the
problem is that the child-join's row count estimate isn't getting
initialized at all, but then something is clamping it to 1 row instead
of 0.

So this looks like a bug in Ashutosh's patch.
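
For readers unfamiliar with the clamping being referred to: the planner
forces row estimates to be at least one row.  A minimal standalone sketch
of that behavior (modeled loosely on clamp_row_est(); a simplified sketch,
not the actual PostgreSQL source) shows how an uninitialized estimate of 0
would surface as rows=1:

#include <math.h>
#include <stdio.h>

/* simplified sketch: round the estimate and never let it drop below 1 */
static double
clamp_row_est_sketch(double nrows)
{
    if (nrows <= 1.0)
        nrows = 1.0;
    else
        nrows = rint(nrows);
    return nrows;
}

int
main(void)
{
    printf("%.0f\n", clamp_row_est_sketch(0.0));       /* prints 1 */
    printf("%.0f\n", clamp_row_est_sketch(962909.4));  /* prints 962909 */
    return 0;
}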

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2017-07-19 Thread Fabrízio de Royes Mello
On Wed, Jul 19, 2017 at 3:54 PM, Robins Tharakan  wrote:
>
>
> On 18 July 2017 at 23:55, David Fetter  wrote:
>>
>> Excellent point about pg_dumpall.  I'll see what I can draft up in the
>> next day or two and report back.
>
>
>
> Hi David,
>
> You may want to consider this patch (attached) which additionally has the
pg_dumpall changes.
> It would be great if you could help with the tests though, am unsure how
to go about them.
>

You should add the proper SGML docs for this pg_dumpall change also.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-19 Thread Tom Lane
I wrote:
> Hm.  Currently sizeof(struct ControlFileData) = 296, at least on my
> machine.  Letting it grow past 512 would be problematic.  It's hard
> to see getting to that any time soon, though; we don't add fields
> there often.

I wonder if it'd be worth the trouble to stick something like this into
xlog.c:

/*
 * For reliability's sake, it's critical that pg_control updates
 * be atomic writes.  That generally means the active data can't
 * be more than one disk sector, which is 512 bytes on common
 * hardware.  Be very careful about raising this limit.
 */
 StaticAssertStmt(sizeof(ControlFileData) <= 512,
  "pg_control is too large for atomic disk writes");


regards, tom lane




Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2017-07-19 Thread Robins Tharakan
On 18 July 2017 at 23:55, David Fetter  wrote:
>
> Excellent point about pg_dumpall.  I'll see what I can draft up in the
> next day or two and report back.



​Hi David,

You may want to consider this patch (attached) which additionally has the
pg_dumpall changes.
It would be great if you could help with the tests though, am unsure how to
go about them.

-
robins


pgdump_nocomments_v2.patch
Description: Binary data



Re: [HACKERS] Pluggable storage

2017-07-19 Thread Peter Geoghegan
On Wed, Jul 19, 2017 at 10:56 AM, Robert Haas  wrote:
>> I strongly agree. I simply don't understand how you can adopt UNDO for
>> MVCC, and yet expect to get a benefit commensurate with the effort
>> without also implementing "retail index tuple deletion" first.
>
> I agree that we need retail index tuple deletion.  I liked Claudio's
> idea at 
> http://postgr.es/m/cagtbqpz-ktrqiaa13xg1gne461yowra-s-yccqptyfrpkta...@mail.gmail.com
> -- that seems indispensible to making retail index tuple deletion
> reasonably efficient.  Is anybody going to work on getting that
> committed?

I will do review work on it.

IMV the main problems are:

* The way a "header" is added at the PageAddItemExtended() level,
rather than making heap TID something much closer to a conventional
attribute that perhaps only nbtree and indextuple.c have special
knowledge of, strikes me as the wrong way to go.

* It's simply not acceptable to add overhead to *all* internal items.
That kills fan-in. We're going to need suffix truncation for the
common case where the user-visible attributes for a split point/new
high key at the leaf level sufficiently distinguish what belongs on
either side. IOW, you should only see internal items with a heap TID
in the uncommon case where you have so many duplicates at the leaf
level that you have no choice but to use a split point that's right in
the middle of many duplicates.

Fortunately, if we confine ourselves to making heap TID part of the
keyspace, the code can be far simpler than what would be needed to get
my preferred, all-encompassing design for suffix truncation [1] to
work. I think we could just stash the number of attributes
participating in a comparison within internal pages' unused item
pointer offset. I've talked about this before, in the context of
Anastasia's INCLUDED columns patch. If we can have a variable number
of attributes for heap tuples, we can do so for index tuples, too.

* We might also have problems with changing the performance
characteristics for the worse in some cases by some measures. This
will probably technically increase the amount of bloat for some
indexes with sparse deletion patterns. I think that that will be well
worth it, but I don't expect a slam dunk.

A nice benefit of this work is that it lets us kill the hack that adds
randomness to the search for free space among duplicates, and may let
us follow the Lehman & Yao algorithm more closely.

[1] 
https://wiki.postgresql.org/wiki/Key_normalization#Suffix_truncation_of_normalized_keys
-- 
Peter Geoghegan




Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-19 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jul 19, 2017 at 1:35 PM, Tom Lane  wrote:
>> Alternatively, we could turn the origin point for abstime into
>> pg_control field, and regard changing it as a reason for a database
>> not being pg_upgrade'able unless it lacks any abstime columns.

> I would be OK with that, too, but is there any danger that we're going
> to grow pg_control to a size where reads and writes can no longer be
> assumed atomic, if we keep adding things?

Hm.  Currently sizeof(struct ControlFileData) = 296, at least on my
machine.  Letting it grow past 512 would be problematic.  It's hard
to see getting to that any time soon, though; we don't add fields
there often.

Note that I'm not seriously pushing for this solution.  I'm just trying
to make sure that we've considered all the reasonable options.

regards, tom lane




Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-19 Thread Mark Rofail
On Wed, Jul 19, 2017 at 7:28 PM, Robert Haas  wrote:

> Why do we have to solve that limitation?


Since the regress test labeled element_foreing_key fails now that I made the
RI queries utilise @>(anyarray, anyelement), that means it's not functioning
as it is meant to.


Re: [HACKERS] Using non-sequential timelines in order to help with possible collisions

2017-07-19 Thread Michael Paquier
On Wed, Jul 19, 2017 at 8:05 PM, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Wed, Jul 19, 2017 at 7:00 PM, Robert Haas  wrote:
>>> It's interesting that you bring this up.  I've also wondered why we
>>> don't use random TLIs.  I suppose I'm internally assuming that it's
>>> because the people who wrote the code are far more brilliant and
>>> knowledgeable of this area than I could ever be and that doing
>>> anything else would create some kind of awful problem, but maybe
>>> that's not so.
>
>> I am not the only one who worked on that, but the resulting code is a tad
>> simpler, as it is possible to more easily guess some hierarchy for
>> the timelines, of course with the history files at hand.
>
> Yeah, right now you have the ability to guess that, say, timeline 42
> is a descendant of 41, which you couldn't assume with random TLIs.
> Also, the values are only 32 bits, which is not wide enough to allow
> imagining that random() could be relied on to produce non-duplicate
> values.

pg_backend_random() perhaps? If any new code uses random(), those
would be slashed quickly at review.

> If we had separate database identifiers for slave installations, which
> AFAIR we don't, it'd be possible to consider incorporating part of
> the server ID into timeline IDs it creates, which would alleviate
> Brian's issue I think.  That is, instead of 1, 2, 3, ..., a server
> might create 1xyz, 2xyz, 3xyz, ... where "xyz" are random digits
> associated with the particular installation.  This is obviously
> not bulletproof since you could have collisions of the xyz's, but
> it would help.  Also you could imagine allowing DBAs to assign
> distinct xyz codes to every slave in a given community.

I am not much into any concept of complicating the timeline name to be honest :)

Having a unique identifier per node has value for other purposes, like
clustering, and we would have the same information by adding in the
history file the ID of the node that generated the new timeline.
-- 
Michael




Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-19 Thread Robert Haas
On Wed, Jul 19, 2017 at 1:35 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, Jul 19, 2017 at 1:12 PM, Tom Lane  wrote:
>>> I'd definitely be on board with just dropping the type altogether despite
>>> Mark's concern.
>
>> Then I vote for that option.
>
> BTW, another possible compromise is to move abstime into a contrib
> module; we've always accepted that contrib modules can be held to a
> lower standard than core features.  I'm not volunteering to do the
> work for that, but it's worth contemplating.

I would be OK with that, provided the documentation calls out the hazard.

> Alternatively, we could turn the origin point for abstime into
> pg_control field, and regard changing it as a reason for a database
> not being pg_upgrade'able unless it lacks any abstime columns.

I would be OK with that, too, but is there any danger that we're going
to grow pg_control to a size where reads and writes can no longer be
assumed atomic, if we keep adding things?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PG10b2: column-list UPDATE syntax fails with single column

2017-07-19 Thread Tom Lane
Justin Pryzby  writes:
> ts=# begin; UPDATE eric_enodeb_201707 SET 
> (pmhdelayvarbest50pct,pmlicconnecteduserstimecong)=(0,0) ;
> BEGIN
> UPDATE 3
> ts=# begin; UPDATE eric_enodeb_201707 SET (pmhdelayvarbest50pct)=(0) ;
> BEGIN
> ERROR:  source for a multiple-column UPDATE item must be a sub-SELECT or 
> ROW() expression

Hm.  It's kind of unfortunate that this isn't backwards-compatible, but
it was only accidental that that case was accepted before.  It's really
wrong, because the source for a parenthesized UPDATE target list ought
to be a row value even when there's only one item in the list.  That
is, the correct standards-compliant spelling would be

UPDATE eric_enodeb_201707 SET (pmhdelayvarbest50pct) = ROW(0);

Now, it's true that "(1,2)" is fully equivalent to "ROW(1,2)", but
"(0)" is *not* equivalent to "ROW(0)"; it's just a scalar 0.  So your
existing code is non-spec-compliant and was really being accepted in
error.

We could maybe hack up some weird action-at-a-distance kluge that would
make this case work like before, but I'm afraid it would just introduce
other inconsistencies.

> It may be that our use was wrong (?) or unintuitive (I'm in the middle of
> changing it), but wondered if it was intentional or otherwise if the release
> notes should mention that old syntax is no longer accepted.

Not sure if it's worth getting into in the release notes.  Using the
parenthesis notation for single target columns seems rather weird.

regards, tom lane




Re: [HACKERS] pl/perl extension fails on Windows

2017-07-19 Thread Tom Lane
Ashutosh Sharma  writes:
> On Wed, Jul 19, 2017 at 9:42 PM, Tom Lane  wrote:
>> I imagine the route to a solution is to fix things so that the relevant
>> macros are all defined correctly in both cases.  But why they aren't
>> already is certainly an interesting question.  Have you identified just
>> which fields are added or missing relative to what libperl thinks?

> Here are the list of macros and variables from 'intrpvar.h' file that
> are just defined in perl module but not in plperl on Windows,

> #ifdef PERL_USES_PL_PIDSTATUS
> PERLVAR(I, pidstatus,   HV *)   /* pid-to-status mappings for waitpid */
> #endif

> #ifdef PERL_SAWAMPERSAND
> PERLVAR(I, sawampersand, U8)/* must save all match strings */
> #endif

> #ifdef FCRYPT
> PERLVARI(I, cryptseen,  bool,   FALSE)  /* has fast crypt() been initialized? 
> */
> #else
> /* One byte hole in the interpreter structure.  */
> #endif

> #ifdef USE_REENTRANT_API
> PERLVAR(I, reentrant_buffer, REENTR *)  /* here we store the _r buffers */
> #endif

> #ifdef PERL_GLOBAL_STRUCT_PRIVATE
> PERLVARI(I, my_cxt_keys, const char **, NULL) /* per-module array of
> pointers to MY_CXT_KEY constants */
> # endif

> #ifdef DEBUG_LEAKING_SCALARS_FORK_DUMP
> /* File descriptor to talk to the child which dumps scalars.  */
> PERLVARI(I, dumper_fd,  int,-1)
> #endif

> #ifdef DEBUG_LEAKING_SCALARS
> PERLVARI(I, sv_serial,  U32,0)  /* SV serial number, used in sv.c */
> #endif

> #ifdef PERL_TRACE_OPS
> PERLVARA(I, op_exec_cnt, OP_max+2, UV)
> #endif

Huh.  So those seem like symbols that ought to be exposed somewhere in
Perl's headers.  Perhaps we're failing to #include some "perl_config.h" or
equivalent file that records these ABI options?
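
As a rough illustration of why such a mismatch bites (a hypothetical
sketch: fake_interpreter and the key formula are made up, and this is not
the real HS_KEYp()): two translation units that disagree about one of
these #ifdefs disagree about the sizeof() of the interpreter struct, and
any handshake key derived from that size then differs.

#include <stdio.h>

struct fake_interpreter
{
    void          *pidstatus;
#ifdef PERL_SAWAMPERSAND
    unsigned char  sawampersand;
#endif
    int            dumper_fd;
};

int
main(void)
{
    /* stand-in for folding sizeof() into a handshake key */
    unsigned long key = 0x1234UL ^ (unsigned long) sizeof(struct fake_interpreter);

    printf("sizeof = %zu, key = %#lx\n",
           sizeof(struct fake_interpreter), key);
    return 0;
}

Compiling this once with and once without -DPERL_SAWAMPERSAND prints two
different keys, which is essentially the handshake mismatch being debugged
here.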

regards, tom lane




Re: [HACKERS] Using non-sequential timelines in order to help with possible collisions

2017-07-19 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Jul 19, 2017 at 7:00 PM, Robert Haas  wrote:
>> It's interesting that you bring this up.  I've also wondered why we
>> don't use random TLIs.  I suppose I'm internally assuming that it's
>> because the people who wrote the code are far more brilliant and
>> knowledgeable of this area than I could ever be and that doing
>> anything else would create some kind of awful problem, but maybe
>> that's not so.

> I am not the only one who worked on that, but the resulting code is a tad
> simpler, as it is possible to more easily guess some hierarchy for
> the timelines, of course with the history files at hand.

Yeah, right now you have the ability to guess that, say, timeline 42
is a descendant of 41, which you couldn't assume with random TLIs.
Also, the values are only 32 bits, which is not wide enough to allow
imagining that random() could be relied on to produce non-duplicate
values.

If we had separate database identifiers for slave installations, which
AFAIR we don't, it'd be possible to consider incorporating part of
the server ID into timeline IDs it creates, which would alleviate
Brian's issue I think.  That is, instead of 1, 2, 3, ..., a server
might create 1xyz, 2xyz, 3xyz, ... where "xyz" are random digits
associated with the particular installation.  This is obviously
not bulletproof since you could have collisions of the xyz's, but
it would help.  Also you could imagine allowing DBAs to assign
distinct xyz codes to every slave in a given community.
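
A toy sketch of that composition (hypothetical: the function name and the
three-digit suffix width are made up for illustration):

#include <stdint.h>
#include <stdio.h>

typedef uint32_t TimeLineID;

/* sequential ordinal plus a per-installation suffix, e.g. 1xyz, 2xyz, ... */
static TimeLineID
compose_tli(uint32_t ordinal, uint32_t install_suffix)
{
    return ordinal * 1000 + (install_suffix % 1000);
}

int
main(void)
{
    uint32_t    xyz = 427;      /* this installation's random digits */
    uint32_t    n;

    for (n = 1; n <= 3; n++)
        printf("timeline %u\n", compose_tli(n, xyz));   /* 1427, 2427, 3427 */
    return 0;
}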

regards, tom lane




Re: [HACKERS] Pluggable storage

2017-07-19 Thread Robert Haas
On Sat, Jul 15, 2017 at 8:58 PM, Peter Geoghegan  wrote:
> To repeat myself, for emphasis: *Not all bloat is equal*.

+1.

> I strongly agree. I simply don't understand how you can adopt UNDO for
> MVCC, and yet expect to get a benefit commensurate with the effort
> without also implementing "retail index tuple deletion" first.

I agree that we need retail index tuple deletion.  I liked Claudio's
idea at 
http://postgr.es/m/cagtbqpz-ktrqiaa13xg1gne461yowra-s-yccqptyfrpkta...@mail.gmail.com
-- that seems indispensible to making retail index tuple deletion
reasonably efficient.  Is anybody going to work on getting that
committed?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Pluggable storage

2017-07-19 Thread Robert Haas
On Sat, Jul 15, 2017 at 6:36 PM, Alexander Korotkov
 wrote:
> I think in general there are two ways of dealing with our index AM API
> limitation.  One of them is to extend index AM API.

That's pretty much what I have in mind.  I think it's likely that if
we end up with, say, 3 kinds of heap and 12 kinds of index, there will
be some compatibility matrix.  Some index AMs will be compatible with
some heap AMs, and others won't be.  For example, if somebody makes an
IOT-type heap using the API proposed here or some other one, BRIN
probably won't work at all.  btree, on the other hand, could probably
be made to work, perhaps with some greater or lesser degree of
modification.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] PG10b2: column-list UPDATE syntax fails with single column

2017-07-19 Thread Justin Pryzby
In testing our application with PG10beta2, I discovered that our use of UPDATE
broke, apparently by this commit

|commit 906bfcad7ba7cb3863fe0e2a7810be8e3cd84fbd
|Author: Tom Lane 
|Date:   Tue Nov 22 15:19:57 2016 -0500
|
|Improve handling of "UPDATE ... SET (column_list) = row_constructor".

ERROR ON QUERY: UPDATE eric_enodeb_cell_metrics SET (pmActiveDrbDlSum)=(%s) 
WHERE sect_id=%s AND subnetwork_id=%s AND start_time=%s AND site_id=%s AND 
interval_seconds=%s AND nedn=%s AND mecontext=%s AND EUtranCellxDD=%s AND 
EUtranCell=%s AND subnetwork=%s AND device_id=%s --
Query Error: ERROR:  source for a multiple-column UPDATE item must be a 
sub-SELECT or ROW() expression

This still works for multiple columns but in pg10b2 no longer works for single
column update with "column_list" syntax.

ts=# begin; UPDATE eric_enodeb_201707 SET 
(pmhdelayvarbest50pct,pmlicconnecteduserstimecong)=(0,0) ;
BEGIN
UPDATE 3
ts=# begin; UPDATE eric_enodeb_201707 SET (pmhdelayvarbest50pct)=(0) ;
BEGIN
ERROR:  source for a multiple-column UPDATE item must be a sub-SELECT or ROW() 
expression

It may be that our use was wrong (?) or unintuitive (I'm in the middle of
changing it), but wondered if it was intentional or otherwise if the release
notes should mention that old syntax is no longer accepted.

Either way - thanks in advance.

Justin




Re: [HACKERS] pl/perl extension fails on Windows

2017-07-19 Thread Ashutosh Sharma
On Wed, Jul 19, 2017 at 9:42 PM, Tom Lane  wrote:
>
> Ashutosh Sharma  writes:
> > Actually the function used for generation of handshake Key i.e HS_KEYp()
> > considers 'sizeof(PerlInterpreter)' to generate the key and somehow the
> > sizeof PerlInterpreter is not uniform in plperl and perl modules incase of
> > Windows but on Linux it remains same in both the modules.
>
> Yipes.  So actually, this is catching a live ABI problem, which presumably
> we've escaped seeing bad effects from only through sheer good luck.
> I suppose that the discrepancies in the struct contents only occur after
> the last field that plperl happens to touch directly --- but that is
> unlikely to be true forever.
>
> typedef struct interpreter PerlInterpreter;
> struct interpreter {
> #  include "intrpvar.h"
> };
> > where intrpvar.h has different variables defined inside it and most of the
> > variables definition are protected with various macros. And there are some
> > macros that are just defined in perl but not in plperl module which means
> > the sizeof(PerlInterpreter) on the two modules are going to be different
> > and thereby resulting in a  different key.
>
> I imagine the route to a solution is to fix things so that the relevant
> macros are all defined correctly in both cases.  But why they aren't
> already is certainly an interesting question.  Have you identified just
> which fields are added or missing relative to what libperl thinks?

Here are the list of macros and variables from 'intrpvar.h' file that
are just defined in perl module but not in plperl on Windows,

#ifdef PERL_USES_PL_PIDSTATUS
PERLVAR(I, pidstatus,   HV *)   /* pid-to-status mappings for waitpid */
#endif

#ifdef PERL_SAWAMPERSAND
PERLVAR(I, sawampersand, U8)/* must save all match strings */
#endif

#ifdef FCRYPT
PERLVARI(I, cryptseen,  bool,   FALSE)  /* has fast crypt() been initialized? */
#else
/* One byte hole in the interpreter structure.  */
#endif

#ifdef USE_REENTRANT_API
PERLVAR(I, reentrant_buffer, REENTR *)  /* here we store the _r buffers */
#endif

#ifdef PERL_GLOBAL_STRUCT_PRIVATE
PERLVARI(I, my_cxt_keys, const char **, NULL) /* per-module array of
pointers to MY_CXT_KEY constants */
# endif

#ifdef DEBUG_LEAKING_SCALARS_FORK_DUMP
/* File descriptor to talk to the child which dumps scalars.  */
PERLVARI(I, dumper_fd,  int,-1)
#endif

#ifdef DEBUG_LEAKING_SCALARS
PERLVARI(I, sv_serial,  U32,0)  /* SV serial number, used in sv.c */
#endif

#ifdef PERL_TRACE_OPS
PERLVARA(I, op_exec_cnt, OP_max+2, UV)
#endif

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] Using non-sequential timelines in order to help with possible collisions

2017-07-19 Thread Michael Paquier
On Wed, Jul 19, 2017 at 7:00 PM, Robert Haas  wrote:
> On Wed, Jul 19, 2017 at 11:23 AM, Brian Faherty
>  wrote:
>>   I was working with replication and recovery the other day and noticed that
>> there were scenarios where I could cause multiple servers to enter the same
>> timeline while possibly having divergent data. One such scenario is Master A
>> and Replica B are both on timeline 1. There is an event that causes Replica
>> B to become promoted which changes it to timeline 2. Following this, you
>> perform a restore on Master A to a point before the event happened. Once
>> Postgres completes this recovery on Master A, it will switch over to
>> timeline 2. There are now WAL files that have been written to timeline 2
>> from both servers.
>>
>> From this scenario, I would like to suggest considering using non-sequential
>> timelines. From what I have investigated so far, I believe the *.history
>> files in the WAL directory already have all the timelines id's in them and
>> are in order. If we could make those timeline ids to be a bit more
>> unique/random, and still rely on the ordering in the *.history file, I think
>> this would help prevent multiple servers on the same timeline with divergent
>> data.

It seems to me that you are missing one piece here: the history files
generated at the moment of the timeline bump. When recovery finishes,
an instance scans the archives, or the instance it is streaming from, for
history files, and chooses a timeline number that does not match existing
ones. So you are trying to avoid a problem that can easily be solved with
a proper archive, for example.

>> I was hoping to begin a conversation on whether or not non-sequential
>> timelines are a good idea before I looked at the code around timelines.
>
> It's interesting that you bring this up.  I've also wondered why we
> don't use random TLIs.  I suppose I'm internally assuming that it's
> because the people who wrote the code are far more brilliant and
> knowledgeable of this area than I could ever be and that doing
> anything else would create some kind of awful problem, but maybe
> that's not so.

I am not the only one who worked on that, but the resulting code is a tad
simpler, as it is possible to more easily guess some hierarchy for
the timelines, of course with the history files at hand.
-- 
Michael




Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-19 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jul 19, 2017 at 1:12 PM, Tom Lane  wrote:
>> I'd definitely be on board with just dropping the type altogether despite
>> Mark's concern.

> Then I vote for that option.

BTW, another possible compromise is to move abstime into a contrib
module; we've always accepted that contrib modules can be held to a
lower standard than core features.  I'm not volunteering to do the
work for that, but it's worth contemplating.

Alternatively, we could turn the origin point for abstime into a
pg_control field, and regard changing it as a reason for a database
not being pg_upgrade'able unless it lacks any abstime columns.

regards, tom lane




Re: [HACKERS] Cache lookup errors with functions manipulation object addresses

2017-07-19 Thread Robert Haas
On Wed, Jul 19, 2017 at 2:25 AM, Michael Paquier
 wrote:
> Would we want to improve the error handling of such objects?

+1 for such an improvement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-19 Thread Robert Haas
On Wed, Jul 19, 2017 at 8:08 AM, Mark Rofail  wrote:
> To summarise, the options we have to solve the limitation of the @>(anyarray
> , anyelement) where it produces the following error: operator does not
> exist: integer[] @> smallint

Why do we have to solve that limitation?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-19 Thread Robert Haas
On Wed, Jul 19, 2017 at 1:12 PM, Tom Lane  wrote:
>> Doesn't this plan amount to breaking pg_upgrade compatibility and
>> hoping that nobody notice?
>
> Well, what we'd need to do is document that the type is only meant to be
> used to store dates within say +/- 30 years from current time.  As long
> as people adhere to that use-case, the proposal would work conveniently
> long into the future ...

Typically, when you try to store an out-of-range value in PostgreSQL,
you get an ERROR, and that's one of the selling points of PostgreSQL.
PostgreSQL users regularly beat up other projects for, say, allowing
0000-00-00 to be considered a valid date, or any similar perceived
laxity in enforcing data consistency.  I don't like the idea that we
can just deviate from that principle whenever adhering to it is too
much work.

> I'd definitely be on board with just dropping the type altogether despite
> Mark's concern.

Then I vote for that option.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-19 Thread David Fetter
On Wed, Jul 19, 2017 at 01:12:02PM -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Mon, Jul 17, 2017 at 6:12 PM, Tom Lane  wrote:
> >> So, thinking about how that would actually work ... the thing to do in
> >> order to preserve existing on-disk values is to alternate between signed
> >> and unsigned interpretations of abstimes.  That is, right now, abstime
> >> is signed with origin 1970.  The conversion I'm arguing we should make
> >> real soon now is to unsigned with origin 1970.  If the project lives
> >> so long, in circa 70 years we'd switch it to signed with origin 2106.
> >> Yadda yadda.
> 
> > Doesn't this plan amount to breaking pg_upgrade compatibility and
> > hoping that nobody notice?
> 
> Well, what we'd need to do is document that the type is only meant to be
> used to store dates within say +/- 30 years from current time.  As long
> as people adhere to that use-case, the proposal would work conveniently
> long into the future ...
> 
> > If we had two actually
> > separate types and let people convert columns from the old type to the
> > new type with just a validation scan, that would be engineering at the
> > level of quality that I've come to associate with this project.
> 
> ... whereas that would be seriously INconvenient.  It's not just the
> ALTER TABLE pushups, which presumably would be something you could do
> and forget.  It's that the new type name would be something you'd have
> to change your applications to know about, and then you (or more likely
> your successor) would have to do it over again decades later.
> 
> I'd definitely be on board with just dropping the type altogether despite
> Mark's concern.  But I am not sure that the way you are proposing would
> please anybody except pedants.

+1 for just dropping the types, preferably modifying the contrib
extensions that depend on them; less preferably, dropping those, too.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-19 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jul 17, 2017 at 6:12 PM, Tom Lane  wrote:
>> So, thinking about how that would actually work ... the thing to do in
>> order to preserve existing on-disk values is to alternate between signed
>> and unsigned interpretations of abstimes.  That is, right now, abstime
>> is signed with origin 1970.  The conversion I'm arguing we should make
>> real soon now is to unsigned with origin 1970.  If the project lives
>> so long, in circa 70 years we'd switch it to signed with origin 2106.
>> Yadda yadda.

> Doesn't this plan amount to breaking pg_upgrade compatibility and
> hoping that nobody notice?

Well, what we'd need to do is document that the type is only meant to be
used to store dates within say +/- 30 years from current time.  As long
as people adhere to that use-case, the proposal would work conveniently
long into the future ...

> If we had two actually
> separate types and let people convert columns from the old type to the
> new type with just a validation scan, that would be engineering at the
> level of quality that I've come to associate with this project.

... whereas that would be seriously INconvenient.  It's not just the
ALTER TABLE pushups, which presumably would be something you could do
and forget.  It's that the new type name would be something you'd have
to change your applications to know about, and then you (or more likely
your successor) would have to do it over again decades later.

I'd definitely be on board with just dropping the type altogether despite
Mark's concern.  But I am not sure that the way you are proposing would
please anybody except pedants.

regards, tom lane




Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash

2017-07-19 Thread Robert Haas
On Wed, Jul 19, 2017 at 8:26 AM, Neha Sharma
 wrote:
> I am getting FailedAssertion while executing the attached script.However,I
> am not able to produce the core dump for the same,the script runs in
> background and takes around a day time to produce the mentioned error.

Maybe you should run it for another day with core dumps enabled (try
the "-c" option to pg_ctl) and then pull a backtrace if you get one.
This is a pretty generic script so it's going to be hard to guess
where the problem is otherwise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Using non-sequential timelines in order to help with possible collisions

2017-07-19 Thread Robert Haas
On Wed, Jul 19, 2017 at 11:23 AM, Brian Faherty
 wrote:
> Hey hackers,
>   I was working with replication and recovery the other day and noticed that
> there were scenarios where I could cause multiple servers to enter the same
> timeline while possibly having divergent data. One such scenario is Master A
> and Replica B are both on timeline 1. There is an event that causes Replica
> B to become promoted which changes it to timeline 2. Following this, you
> perform a restore on Master A to a point before the event happened. Once
> Postgres completes this recovery on Master A, it will switch over to
> timeline 2. There are now WAL files that have been written to timeline 2
> from both servers.
>
> From this scenario, I would like to suggest considering using non-sequential
> timelines. From what I have investigated so far, I believe the *.history
> files in the WAL directory already have all the timelines id's in them and
> are in order. If we could make those timeline ids to be a bit more
> unique/random, and still rely on the ordering in the *.history file, I think
> this would help prevent multiple servers on the same timeline with divergent
> data.
>
> I was hoping to begin a conversation on whether or not non-sequential
> timelines are a good idea before I looked at the code around timelines.

It's interesting that you bring this up.  I've also wondered why we
don't use random TLIs.  I suppose I'm internally assuming that it's
because the people who wrote the code are far more brilliant and
knowledgeable of this area than I could ever be and that doing
anything else would create some kind of awful problem, but maybe
that's not so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] new function for tsquery creation

2017-07-19 Thread Robert Haas
On Wed, Jul 19, 2017 at 12:43 PM, Victor Drobny  wrote:
> Let me introduce new function for full text search query creation(which is
> called 'queryto_tsquery'). It takes 'google like' query string and
> translates it to tsquery.

I haven't looked at the code, but that sounds like a neat idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pl/perl extension fails on Windows

2017-07-19 Thread Robert Haas
On Wed, Jul 19, 2017 at 12:12 PM, Tom Lane  wrote:
> Ashutosh Sharma  writes:
>> Actually the function used for generation of handshake Key i.e HS_KEYp()
>> considers 'sizeof(PerlInterpreter)' to generate the key and somehow the
>> sizeof PerlInterpreter is not uniform in plperl and perl modules incase of
>> Windows but on Linux it remains same in both the modules.
>
> Yipes.

+1 for "yipes".  It sounds like we should really try to fix the
underlying problem, rather than just working around the check.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-19 Thread Robert Haas
On Mon, Jul 17, 2017 at 6:12 PM, Tom Lane  wrote:
> So, thinking about how that would actually work ... the thing to do in
> order to preserve existing on-disk values is to alternate between signed
> and unsigned interpretations of abstimes.  That is, right now, abstime
> is signed with origin 1970.  The conversion I'm arguing we should make
> real soon now is to unsigned with origin 1970.  If the project lives
> so long, in circa 70 years we'd switch it to signed with origin 2106.
> Yadda yadda.

Doesn't this plan amount to breaking pg_upgrade compatibility and
hoping that nobody notices?  Existing values will be silently
reinterpreted according to the new rules.  If we had two actually
separate types and let people convert columns from the old type to the
new type with just a validation scan, that would be engineering at the
level of quality that I've come to associate with this project.  If
this type is so marginal that we don't want to do that kind of work,
then I think we should just rip it out; that doesn't preclude someone
maintaining it in their own fork, or even adding it back as a new type
in a contrib module with a #define for the base year.  Silently
changing the interpretation of the same data in the core code, though,
seems both far too clever and not clever enough.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] new function for tsquery creation

2017-07-19 Thread Victor Drobny

Dear all,

Postgres currently has a few functions to create tsqueries for full text 
search. The main one is the to_tsquery function, which allows building a 
query with any operation, but to build a correct query all of the operators 
have to be specified explicitly. To make this easier, Postgres has functions 
like plainto_tsquery and phraseto_tsquery, which build tsqueries from plain 
strings. But they are not flexible enough.


Let me introduce a new function for full text search query creation (which 
is called 'queryto_tsquery'). It takes a 'Google-like' query string and 
translates it to a tsquery.

The main features are the following:
- All the text inside double quotes is treated as a phrase ("a b c" 
  -> 'a <-> b <-> c').
- A new operator AROUND(N): it matches if the distance between words (or 
  phrases) is less than or equal to N.
- '-' as an alias for ! ('-rat' is the same as '!rat').
- 'OR' as an alias for | ('dog OR cat' is the same as 'dog | cat').

Like plainto_tsquery and phraseto_tsquery, it will fill in missing operators 
by itself, but operators that are already present won't be ignored. This 
allows the two approaches to be combined.


In the attachment you can find a patch with the new features, tests and 
documentation for it.

What do you think about it?

Thank you very much for the attention!

--
--
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companydiff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e073f7b..d6fb4ce 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9494,6 +9494,18 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple

 
  
+  queryto_tsquery
+
+ queryto_tsquery( config regconfig ,  query text)
+
+tsquery
+produce tsquery from google like query
+queryto_tsquery('english', 'The Fat Rats')
+'fat'  'rat'
+   
+   
+
+ 
   querytree
  
  querytree(query tsquery)
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index fe630a6..999e4ad 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -797,13 +797,15 @@ UPDATE tt SET ti =

 PostgreSQL provides the
 functions to_tsquery,
-plainto_tsquery, and
-phraseto_tsquery
+plainto_tsquery,
+phraseto_tsquery and
+queryto_tsquery
 for converting a query to the tsquery data type.
 to_tsquery offers access to more features
 than either plainto_tsquery or
 phraseto_tsquery, but it is less forgiving
-about its input.
+about its input. queryto_tsquery provides a 
+different, Google like syntax to create tsquery.

 

@@ -960,8 +962,68 @@ SELECT phraseto_tsquery('english', 'The Fat  Rats:C');
 -
  'fat' - 'rat' - 'c'
 
+
+
+
+queryto_tsquery( config regconfig,  querytext text) returns tsquery
+
+
+   
+queryto_tsquery creates a tsquery from unformatted text.
+But unlike plainto_tsquery and phraseto_tsquery it won't
+ignore operators that are already present. This function supports the following operators:
+
+ 
+  
+   '"some text" - any text inside quote signs will be treated as a phrase and will be
+performed like in phraseto_tsquery.
+  
+ 
+ 
+  
+   'OR' - standard logical operator. It is just an alias for '|'' sign.
+  
+ 
+ 
+  
+   'terma AROUND(N) termb' - this operation will match if the distance between 
+   terma and termb is less than N.
+  
+ 
+ 
+  
+   '-' - standard logical negation sign. It is an alias for '!' sign.
+  
+ 
+
+Other missing operators will be replaced by AND like in plainto_tsquery.

 
+   
+Examples:
+
+  select queryto_tsquery('The fat rats');
+   queryto_tsquery 
+  -
+   'fat' & 'rat'
+  (1 row)
+
+
+  select queryto_tsquery('"supernovae stars" AND -crab');
+ queryto_tsquery  
+  --
+   'supernova' <-> 'star' & !'crab'
+(1 row)
+
+
+  select queryto_tsquery('-run AROUND(5) "gnu debugger" OR "I like bananas"');
+queryto_tsquery  
+  ---
+   !'run' AROUND(5) 'gnu' <-> 'debugg' | 'like' <-> 'banana'
+  (1 row)
+
+
+
   
 
   
diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index 18368d1..10fd8c3 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -414,7 +414,8 @@ add_to_tsvector(void *_state, char *elem_value, int elem_len)
  * and different variants are ORed together.
  */
 static void
-pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval, int16 weight, bool prefix)
+pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval,
+	

Re: [HACKERS] tupconvert.c API change in v10 release notes

2017-07-19 Thread Tom Lane
Justin Pryzby  writes:
> FYI, I happened across this commit comment:
> 3f902354b08ac788600f0ae54fcbfc1d4e3ea765
> |   So, let's accept the removal of the guarantee about
> |   the output tuple's rowtype marking, recognizing that this is a API change
> |   that could conceivably break third-party callers of tupconvert.c.  (So,
> |   let's remember to mention it in the v10 release notes.)

> ..but couldn't see that the commit or change is so referenced.

Yeah, I see nothing about 3f902354b in release-10.sgml either.
We've had varying policies over the years about whether to mention
internal API changes in the release notes or not, but this one
I think does belong there, since it's a silent API break rather
than one that would easily be caught due to compiler errors.
Bruce, did you have any specific reasoning for leaving it out?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] tupconvert.c API change in v10 release notes

2017-07-19 Thread Justin Pryzby
FYI, I happened across this commit comment:

3f902354b08ac788600f0ae54fcbfc1d4e3ea765
|   So, let's accept the removal of the guarantee about
|   the output tuple's rowtype marking, recognizing that this is a API change
|   that could conceivably break third-party callers of tupconvert.c.  (So,
|   let's remember to mention it in the v10 release notes.)

..but couldn't see that the commit or change is so referenced.

Justin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/perl extension fails on Windows

2017-07-19 Thread Tom Lane
Ashutosh Sharma  writes:
> Actually the function used for generation of handshake Key i.e HS_KEYp()
> considers 'sizeof(PerlInterpreter)' to generate the key and somehow the
> sizeof PerlInterpreter is not uniform in plperl and perl modules incase of
> Windows but on Linux it remains same in both the modules.

Yipes.  So actually, this is catching a live ABI problem, which presumably
we've escaped seeing bad effects from only through sheer good luck.
I suppose that the discrepancies in the struct contents only occur after
the last field that plperl happens to touch directly --- but that is
unlikely to be true forever.

> typedef struct interpreter PerlInterpreter;
> struct interpreter {
> #  include "intrpvar.h"
> };
> where intrpvar.h has different variables defined inside it and most of the
> variables definition are protected with various macros. And there are some
> macros that are just defined in perl but not in plperl module which means
> the sizeof(PerlInterpreter) on the two modules are going to be different
> and thereby resulting in a  different key.

I imagine the route to a solution is to fix things so that the relevant
macros are all defined correctly in both cases.  But why they aren't
already is certainly an interesting question.  Have you identified just
which fields are added or missing relative to what libperl thinks?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Using non-sequential timelines in order to help with possible collisions

2017-07-19 Thread Brian Faherty
Hey hackers,
  I was working with replication and recovery the other day and noticed
that there were scenarios where I could cause multiple servers to enter the
same timeline while possibly having divergent data. One such scenario is
Master A and Replica B are both on timeline 1. There is an event that
causes Replica B to become promoted which changes it to timeline 2.
Following this, you perform a restore on Master A to a point before the
event happened. Once Postgres completes this recovery on Master A, it will
switch over to timeline 2. There are now WAL files that have been written
to timeline 2 from both servers.

From this scenario, I would like to suggest considering using
non-sequential timelines. From what I have investigated so far, I believe
the *.history files in the WAL directory already have all the timeline
id's in them, in order. If we could make those timeline ids a bit more
unique/random, and still rely on the ordering in the *.history file, I
think this would help prevent multiple servers from ending up on the same
timeline with divergent data.
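
For reference, a timeline history file already records that ordered chain:
each line of, say, a hypothetical 00000003.history gives a parent timeline
id, the LSN at which the switch happened, and a reason, e.g. (values
invented for illustration):

1	0/3000158	no recovery target specified
2	0/5000098	before 2017-07-18 09:00:00+00

so randomized timeline ids could still be put in order by walking this file.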

I was hoping to begin a conversation on whether or not non-sequential
timelines are a good idea before I looked at the code around timelines.

-- 
Brian Faherty


Re: [HACKERS] pl/perl extension fails on Windows

2017-07-19 Thread Ashutosh Sharma
On Thu, Jul 13, 2017 at 10:30 PM, Tom Lane  wrote:
> Andrew Dunstan  writes:
>> It would be nice to get to the bottom of why we're getting a version
>> mismatch on Windows, since we're clearly not getting one on Linux.
>
> Yeah, that's what's bothering me: as long as that remains unexplained,
> I don't have any confidence that we're fixing the right thing.

Okay, I tried to explore on this a bit and my findings are as follows.

On Linux, the handshake keys generated in the plperl code, i.e. inside
XS_EXTERNAL() in Util.c (generated from Util.xs at build time), and in the
perl code (inside the Perl_xs_handshake function) are the same, which means
the following condition inside Perl_xs_handshake() becomes true and
therefore there is no mismatch error.

   got = INT2PTR(void*, (UV)(key & HSm_KEY_MATCH));
   need = (void *)(HS_KEY(FALSE, FALSE, "", "") & HSm_KEY_MATCH);
   if (UNLIKELY(got != need))
   goto bad_handshake;


However, on Windows, the handshake keys generated in the plperl code inside
XS_EXTERNAL() and in the perl code, i.e. inside Perl_xs_handshake(), are
different, thereby resulting in a mismatch error.

Actually, the function used to generate the handshake key, i.e. HS_KEYp(),
uses 'sizeof(PerlInterpreter)' to generate the key, and somehow the size of
PerlInterpreter is not uniform between the plperl and perl modules in the
case of Windows, whereas on Linux it remains the same in both modules.

This is how PerlInterpreter is defined in the Perl source:

typedef struct interpreter PerlInterpreter;
struct interpreter {
#  include "intrpvar.h"
};

where intrpvar.h has different variables defined inside it, and most of the
variable definitions are protected with various macros. There are some
macros that are defined in perl but not in the plperl module, which means
the sizeof(PerlInterpreter) in the two modules is going to be different,
thereby resulting in a different key. But then the point is: why is no such
difference observed on Linux? Well, as of now, I haven't found the reason
behind it and I am still investigating it.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


>
> regards, tom lane


Re: [HACKERS] JSONB - JSONB operator feature request

2017-07-19 Thread david . turon
Hi,

hstore has only key-value pairs, but json can have the same behavior - 
only equal key/value pairs are removed:

SELECT '{"a": 1, "b": {"c": 2}}'::JSONB - '{"a": 1, "b": {"c": 3}}'::JSONB 

'{"b": {"c": 2}}'

SELECT '{"a": 1, "b": {"c": 2}}'::JSONB - '{"a": 2, "b": {"c": 2}}'::JSONB 

'{"a": 1}'

David


-- 
-
Ing. David TUROŇ
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:+420 591 166 224
fax:+420 596 621 273
mobil:  +420 732 589 152
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-



Od: David Fetter 
Komu:   david.tu...@linuxbox.cz
Kopie:  pgsql-hackers@postgresql.org
Datum:  18. 07. 2017 18:24
Předmět:Re: [HACKERS] JSONB - JSONB operator feature request



On Tue, Jul 18, 2017 at 01:36:32PM +0200, david.tu...@linuxbox.cz wrote:
> Hi,
> 
> some users and me used hstore - hstore for example storing only changed 
> rows in trigger like:
> 
> hstore(NEW) - hstore(OLD)
> 
> There isn't same operator/function in JSON/JSONB. We can only remove 
keys 
> from JSONB, but not equal key-value pairs. Is there any chance to have 
> same feature with JSON/JSONB in postgres core?

What would - mean precisely for JSON[B]?

For example, what would you expect

SELECT '{"a": 1, "b": {"c": 2}}'::JSONB - '{"b": 1, "b": {"c": 3}}'::JSONB

to yield?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] merge psql ef/ev sf/sv handling functions

2017-07-19 Thread Fabien COELHO



While reviewing Corey's \if patch, I complained that there was some amount
of copy-paste in "psql/command.c".

Here is an attempt at merging some functions which removes 160 lines of
code.


Thank you for the patch. Is this an item for PG11?


Yep.

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables

2017-07-19 Thread Tom Lane
Etsuro Fujita  writes:
> * Modified rewrite_targetlist(), which is a new function added to 
> preptlist.c, so that we do const-simplification to junk TLEs that 
> AddForeignUpdateTargets() added, as that API allows the FDW to add junk 
> TLEs containing non-Var expressions to the query's targetlist.

This does not seem like a good idea to me.  eval_const_expressions is not
a cheap thing, and for most use-cases those cycles will be wasted, and it
has never been the responsibility of preprocess_targetlist to do this sort
of thing.

Please put the responsibility of doing const-expression simplification
in these cases somewhere closer to where the problem is being created.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-19 Thread Fabien COELHO


Hello Alik,


I am attaching patch v3.


Among other things I fixed small typo in description of 
random_exponential function in pgbench.sgml file.


Ok. Probably this typo fix should be committed separately and independently.

A few comments about v3:

Patch applies cleanly, compiles, works.

About the maths: As already said, I'm not at ease with a random_zipfian 
function which does not display a (good) zipfian distribution. At the 
minimum the documentation should be clear about the approximations implied 
depending on the parameter value.


In the literature the theta parameter seems to often be called alpha
or s (e.g. see https://en.wikipedia.org/wiki/Zipf%27s_law). I would suggest
sticking to "s" instead of "theta".

About the code: looks simpler than the previous version, which is good.

Double constants should be used when the underlying type is a double,
instead of relying on implicit int to double promotion (0 -> 0.0, 1 -> 1.0).

The function zipfZeta(n, theta) does not really compute the zeta(n) function,
so I think a better name should be chosen. It seems to compute
H_{n,theta}, the generalized harmonic number. Idem for the "thetan" field in the struct.
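
For reference, writing s for the exponent, that quantity and the zipfian
probabilities it normalizes are (my reading of the patch):

    H_{n,s} = \sum_{k=1}^{n} \frac{1}{k^{s}}, \qquad
    P(X = k) = \frac{k^{-s}}{H_{n,s}}, \quad k = 1, \ldots, n.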

The handling of cache overflow by randomly removing one entry looks like
a strange idea. Rather remove the oldest entry?

ISTM that it should print a warning once if the cache array overflows as 
performance would drop heavily.


If the zipf cache is constant size, there is no point in using dynamic 
allocation, just declare an array...


Comments need updating: e.g. "theta parameter of previous execution", which
dates back to when there was only one value.

There should be a comment to explain that the structure is in the thread
for thread safety.

There should be non regression tests somehow. If the "improve pgbench
tap test infrastructure" get through, things should be added there.

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2017-07-19 Thread Alexander Kuzmenkov

That's not actually a problem as I am reusing an older v4 from Heikki
now for the future, but I wanted to let you know about that first.

Thank you, I'll look into that.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Something for the TODO list: deprecating abstime and friends

2017-07-19 Thread Tom Lane
Mark Dilger  writes:
>> On Jul 18, 2017, at 9:13 PM, Mark Dilger  wrote:
>> There was not much conversation about this, so I went ahead with what
>> I think makes a logical first step.  The attached patch removes the tinterval
>> datatype from the sources.

> As predicted, this second patch (which should be applied *after* the prior
> tinterval_abatement patch) removes the reltime datatype from the sources.

Seems like a good plan.  Please be sure to add these patches to the next
commitfest, so we remember them when the time comes.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-19 Thread Mark Rofail
To summarise, these are the options we have to solve the limitation of
@>(anyarray, anyelement), where it produces the following error: operator
does not exist: integer[] @> smallint

Option 1: Multiple Operators
Have separate operators for every combination of datatypes instead of a
single polymorphic definition (i.e. int4[] @>> int8, int4[] @>> int4, int4[]
@>> int2, int4[] @>> numeric.)

Drawback: High maintenance.


Option 2: Explicit casting
We compare the datatypes of the two operands and cast to the appropriate
datatype.

Drawback: figuring out the appropriate cast may require considerable
computation
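
For concreteness, the failing case and what an explicit cast amounts to
(illustrated here with the existing anyarray @> anyarray containment
operator, since the element form is what this thread proposes to add):

-- fails: the polymorphic arguments cannot be resolved
SELECT ARRAY[1,2,3]::int4[] @> 2::smallint;
-- ERROR:  operator does not exist: integer[] @> smallint

-- works: the element is cast and wrapped so both sides resolve to int4[]
SELECT ARRAY[1,2,3]::int4[] @> ARRAY[(2::smallint)::int4];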


Option 3: Unsafe Polymorphic datatypes
This is a little out there, but since @>(anyarray, anyelement) has to resolve
both arguments to the same datatype, how about defining new datatypes without
this constraint, where we handle the datatypes ourselves? It would be
something like @>(unsafeAnyarray, unsafeAnyelement).

Drawback: a lot of defensive programming has to be implemented to guard
against any exception.


Another thing
Until this is settled, another thing I have to go through is performance
testing, to provide evidence that all we did actually enhances the
performance of the RI checks. How can I go about this?

Best Regards,
Mark Rofail


Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-19 Thread Mark Rofail
On Tue, Jul 18, 2017 at 11:14 PM, Alvaro Herrera 
wrote:
>
> Why did we add an operator and not a support
> procedure?


I thought the support procedures were constant within an opclass. They
implement the mandatory functions required of an opclass. I don't see why we
would need to implement new ones, since they already deal with the left-hand
operand, which is the referencing column and is always an array, so anyarray
would suffice.

Also, the support procedures don't interact with the left and right operands
simultaneously. And we want to target the combinations int4[] @>> int8,
int4[] @>> int4, int4[] @>> int2, int4[] @>> numeric.

So I think implementing operators is the way to go.

Best Regards,
Mark Rofail.


Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables

2017-07-19 Thread Etsuro Fujita

On 2017/07/13 21:10, Etsuro Fujita wrote:

Attached is an updated version of the patch.

Here is an updated version of the patch.  Changes are:

* Modified rewrite_targetlist(), which is a new function added to 
preptlist.c, so that we do const-simplification to junk TLEs that 
AddForeignUpdateTargets() added, as that API allows the FDW to add junk 
TLEs containing non-Var expressions to the query's targetlist.

* Updated docs in fdwhandler.sgml.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 6924,6929  update bar set f2 = f2 + 100 returning *;
--- 6924,6988 
7 | 277
  (6 rows)
  
+ -- Test that UPDATE/DELETE with inherited target works with row-level triggers
+ CREATE TRIGGER trig_row_before
+ BEFORE UPDATE OR DELETE ON bar2
+ FOR EACH ROW EXECUTE PROCEDURE trigger_data(23,'skidoo');
+ CREATE TRIGGER trig_row_after
+ AFTER UPDATE OR DELETE ON bar2
+ FOR EACH ROW EXECUTE PROCEDURE trigger_data(23,'skidoo');
+ explain (verbose, costs off)
+ update bar set f2 = f2 + 100;
+   QUERY PLAN  

+ 
--
+  Update on public.bar
+Update on public.bar
+Foreign Update on public.bar2
+  Remote SQL: UPDATE public.loct2 SET f2 = $2 WHERE ctid = $1 RETURNING 
f1, f2, f3
+->  Seq Scan on public.bar
+  Output: bar.f1, (bar.f2 + 100), bar.ctid
+->  Foreign Scan on public.bar2
+  Output: bar2.f1, (bar2.f2 + 100), bar2.f3, bar2.ctid, bar2.*
+  Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
+ (9 rows)
+ 
+ update bar set f2 = f2 + 100;
+ NOTICE:  trig_row_before(23, skidoo) BEFORE ROW UPDATE ON bar2
+ NOTICE:  OLD: (3,333,33),NEW: (3,433,33)
+ NOTICE:  trig_row_before(23, skidoo) BEFORE ROW UPDATE ON bar2
+ NOTICE:  OLD: (4,344,44),NEW: (4,444,44)
+ NOTICE:  trig_row_before(23, skidoo) BEFORE ROW UPDATE ON bar2
+ NOTICE:  OLD: (7,277,77),NEW: (7,377,77)
+ NOTICE:  trig_row_after(23, skidoo) AFTER ROW UPDATE ON bar2
+ NOTICE:  OLD: (3,333,33),NEW: (3,433,33)
+ NOTICE:  trig_row_after(23, skidoo) AFTER ROW UPDATE ON bar2
+ NOTICE:  OLD: (4,344,44),NEW: (4,444,44)
+ NOTICE:  trig_row_after(23, skidoo) AFTER ROW UPDATE ON bar2
+ NOTICE:  OLD: (7,277,77),NEW: (7,377,77)
+ explain (verbose, costs off)
+ delete from bar where f2 < 400;
+  QUERY PLAN   
   
+ 
-
+  Delete on public.bar
+Delete on public.bar
+Foreign Delete on public.bar2
+  Remote SQL: DELETE FROM public.loct2 WHERE ctid = $1 RETURNING f1, f2, f3
+->  Seq Scan on public.bar
+  Output: bar.ctid
+  Filter: (bar.f2 < 400)
+->  Foreign Scan on public.bar2
+  Output: bar2.ctid, bar2.*
+  Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 WHERE ((f2 < 
400)) FOR UPDATE
+ (10 rows)
+ 
+ delete from bar where f2 < 400;
+ NOTICE:  trig_row_before(23, skidoo) BEFORE ROW DELETE ON bar2
+ NOTICE:  OLD: (7,377,77)
+ NOTICE:  trig_row_after(23, skidoo) AFTER ROW DELETE ON bar2
+ NOTICE:  OLD: (7,377,77)
+ -- cleanup
+ DROP TRIGGER trig_row_before ON bar2;
+ DROP TRIGGER trig_row_after ON bar2;
  drop table foo cascade;
  NOTICE:  drop cascades to foreign table foo2
  drop table bar cascade;
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***
*** 1609,1614  explain (verbose, costs off)
--- 1609,1634 
  update bar set f2 = f2 + 100 returning *;
  update bar set f2 = f2 + 100 returning *;
  
+ -- Test that UPDATE/DELETE with inherited target works with row-level triggers
+ CREATE TRIGGER trig_row_before
+ BEFORE UPDATE OR DELETE ON bar2
+ FOR EACH ROW EXECUTE PROCEDURE trigger_data(23,'skidoo');
+ 
+ CREATE TRIGGER trig_row_after
+ AFTER UPDATE OR DELETE ON bar2
+ FOR EACH ROW EXECUTE PROCEDURE trigger_data(23,'skidoo');
+ 
+ explain (verbose, costs off)
+ update bar set f2 = f2 + 100;
+ update bar set f2 = f2 + 100;
+ 
+ explain (verbose, costs off)
+ delete from bar where f2 < 400;
+ delete from bar where f2 < 400;
+ 
+ -- cleanup
+ DROP TRIGGER trig_row_before ON bar2;
+ DROP TRIGGER trig_row_after ON bar2;
  drop table foo cascade;
  drop table bar cascade;
  drop table loct1;
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***
*** 432,438  AddForeignUpdateTargets (Query *parsetree,
  
  
  
!  This function is called in the rewriter, not the planner, so the
   information available is a bit different from that available to the
   planning routines.
   parsetree is the parse tree for the UPDATE or
--- 432,438 
  
  
  
!  Although this function is called in the planner, the
   

[HACKERS] Dealing with logical replication

2017-07-19 Thread Tatsuo Ishii
Now that we are going to have logical replication in PostgreSQL 10, I
have started thinking about how Pgpool-II can deal with it. For example,
logical replication does not replicate DDLs. Wouldn't it be convenient for
users if Pgpool-II did that automatically? Or even did it for
TRUNCATE?

Or would that be against the design philosophy of logical replication?

Comments are welcome.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Cache lookup errors with functions manipulation object addresses

2017-07-19 Thread Michael Paquier
Hi all,

Per an offline report from Moshe Jacobson, it is easy to trigger
cache lookup errors by calling pg_describe_object() with invalid
object IDs. I had a closer look at things in
this area and noticed that there are other user-facing failures, as many
things use the same interface:
=# select pg_identify_object('pg_class'::regclass, 0::oid, 0);
ERROR:  XX000: cache lookup failed for relation 0
=# select pg_describe_object('pg_class'::regclass, 0::oid, 0);
ERROR:  XX000: cache lookup failed for relation 0
=# select pg_identify_object_as_address('pg_class'::regclass, 0::oid, 0);
ERROR:  XX000: cache lookup failed for relation 0

As per my previous opinion on the matter in
https://www.postgresql.org/message-id/87wpxfygg9@credativ.de, I
still think that "cache lookup" failures should not be things a user
is able to trigger at will, and that those errors should be replaced
by proper NULL results. That's clearly not an item for PG10, but we
could consider improving things for PG11. Still, we are talking about
adding NULLness handling in getObjectDescription(), which goes into
low-level functions to grab the name of some objects, and some of
those functions have their own way to deal with incorrect objects
(format_type_be returns "???" for example for functions).
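
To make that concrete, the idea would be for such lookups to degrade
gracefully instead of raising an internal error (hypothetical future
behavior, not what the server does today):

-- today
SELECT pg_describe_object('pg_class'::regclass, 0::oid, 0);
-- ERROR:  cache lookup failed for relation 0

-- proposed
SELECT pg_describe_object('pg_class'::regclass, 0::oid, 0);
-- returns NULL instead of erroring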

Would we want to improve the error handling of such objects? Or is that
not worth the effort? Álvaro, what's your take on the matter, as you
worked a lot on that?

Thoughts,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] merge psql ef/ev sf/sv handling functions

2017-07-19 Thread Masahiko Sawada
On Wed, Jul 19, 2017 at 2:41 PM, Fabien COELHO  wrote:
>
>>> While reviewing Corey's \if patch, I complained that there was some
>>> amount
>>> of copy-paste in "psql/command.c".
>>>
>>> Here is an attempt at merging some functions which removes 160 lines of
>>> code.
>>
>>
>> Thank you for the patch. Is this an item for PG11?
>
>
> Yes, as it is submitted to CF 2017-09.
>

Thanks!
It is already registered in the next CF. I missed it, sorry.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pluggable storage

2017-07-19 Thread Haribabu Kommi
On Sat, Jul 15, 2017 at 12:30 PM, Robert Haas  wrote:

> On Fri, Jul 14, 2017 at 8:35 AM, Haribabu Kommi
>  wrote:
> > To replace tuple with slot, I took trigger and SPI calls as the first
> step
> > in modifying
> > from tuple to slot, Here I attached a WIP patch. The notable changes are,
> >
> > 1. Replace most of the HeapTuple with Slot in SPI interface functions.
> > 2. In SPITupleTable, Instead of HeapTuple, it is changed to
> TupleTableSlot.
> > But this change may not be a proper approach, because a duplicate copy of
> > TupleTableSlot is generated and stored.
> > 3. Changed all trigger interfaces to accept TupleTableSlot Instead of
> > HeapTuple.
> > 4. ItemPointerData is added as a member to the TupleTableSlot structure.
> > 5. Modified the ExecInsert and others to work directly on TupleTableSlot
> > instead
> > of tuple(not completely).
>
> What problem are you trying to solve with these changes?  I'm not
> saying that it's a bad idea, but I think you should spell out the
> motivation a little more explicitly.
>

Sorry for not providing complete details. I am trying these experiments
to find out the best way to return the tuple from Storage methods by
designing a proper API.

The changes that I am doing are meant to reduce the dependency on the
HeapTuple format in favor of values/nulls arrays. If there is no dependency
on the HeapTuple format in the upper layers of PostgreSQL storage,
then we can directly define the storage API to return values/nulls arrays
instead of one big chunk of tuple data like HeapTuple or StorageTuple(void
*).

I am finding that eliminating the HeapTuple usage in the upper layers
needs some major changes. How about not changing anything in the upper
layers of storage for now and just supporting pluggable tuples with one of
the following approaches for a first version?

1. Design an API that returns values/nulls arrays and converts them into a
HeapTuple whenever one is required in the upper layers. All the existing
heap form/deform tuple functions are used for every tuple, with some
adjustments.

2. Design an API that returns a StorageTuple (void *) whose first member
represents the TYPE of the storage, so that the corresponding registered
functions can be called to form/deform the tuple whenever there is
a need for the tuple.

3. Design an API that returns a StorageTuple (void *), where the necessary
format information of that tuple can be obtained from the tupledesc; wherever
the tuple is present, a tupledesc exists in most of the cases. How
about adding some kind of information to the tupledesc to find out the tuple
format and call the necessary functions?

4. Design an API that always returns a StorageTuple, which is converted to
a HeapTuple by a function hook if one gets registered (for heap storage
registering the hook is not required, because it is already in HeapTuple
format). This function hook should be placed in the heap form/deform
functions.

Are there any other, better ideas for a first version?

Regards,
Hari Babu
Fujitsu Australia