Re: [HACKERS] Asymmetry between parent and child wrt "false" quals

2017-03-21 Thread Ashutosh Bapat
On Tue, Mar 21, 2017 at 2:19 PM, Amit Langote
 wrote:
> On 2017/03/21 14:59, Ashutosh Bapat wrote:
>> When I run a query like below on a child-less table, the plan comes out to be
>>
>> explain verbose SELECT * FROM uprt1_l WHERE a = 1 AND a = 2;
>>   QUERY PLAN
>> --
>>  Result  (cost=0.00..11.50 rows=1 width=13)
>>Output: a, b, c
>>One-Time Filter: false
>>->  Seq Scan on public.uprt1_l  (cost=0.00..11.50 rows=1 width=13)
>>  Output: a, b, c
>>  Filter: (uprt1_l.a = 1)
>> (6 rows)
>>
>> where as the same query run on a parent with children, the plan is
>> postgres=# \d prt1_l
>> Table "public.prt1_l"
>>  Column |   Type| Collation | Nullable | Default
>> +---+---+--+-
>>  a  | integer   |   | not null |
>>  b  | integer   |   |  |
>>  c  | character varying |   |  |
>> Partition key: RANGE (a)
>> Number of partitions: 3 (Use \d+ to list them.)
>>
>> postgres=# explain verbose SELECT * FROM prt1_l WHERE a = 1 AND a = 2;
>> QUERY PLAN
>> ---
>>  Result  (cost=0.00..0.00 rows=0 width=40)
>>Output: prt1_l.a, prt1_l.b, prt1_l.c
>>One-Time Filter: false
>> (3 rows)
>>
>> For a parent table with children, set_append_rel_size() evaluates
>> restrictions in loop
>>  880 foreach(l, root->append_rel_list)
>>  881 {
>>  882 AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
>>
>> starting at 1021. If any of the restrictions are evaluated to false,
>> it set the child as dummy. If all children are dummy, the appendrel is
>> set to dummy.
>>
>> But for a child-less table, even if the "false" qual is available in
>> baserestrictinfo in set_rel_size(), we do not mark the relation as
>> dummy. Instead, paths are created for it and only at the time of
>> planning we add the gating plan when there is a pseudo constant quals.
>> Why do we have different behaviours in these two cases?
>
> I think the case where there is no child table would not be handled by
> set_append_rel_size(), because rte->inh would be false.  Instead, I
> thought the test at the beginning of relation_excluded_by_constraints()
> would have detected this somehow; the comment there says the following:
>
> /*
>  * Regardless of the setting of constraint_exclusion, detect
>  * constant-FALSE-or-NULL restriction clauses.  Because const-folding will
>  * reduce "anything AND FALSE" to just "FALSE", any such case should
>  * result in exactly one baserestrictinfo entry.
>
> But the qual (a = 1 and a = 2) is *not* reduced to exactly one
> constant-false-or-null baserestrictinfo entry; instead I see that there
> are two RestrictInfos viz. a = 1 and const-FALSE at that point.  I think
> the const-folding mentioned in the above comment does not occur after
> equivalence class processing, which would be required to conclude that (a
> = 1 and a = 2) reduces to constant-false.  OTOH, (a = 1 and false) can be
> reduced to constant-false much earlier when performing
> preprocess_qual_conditions().

Right.

>
> That said, I am not sure if it's worthwhile to modify the test at the
> beginning of relation_excluded_by_constraints() to iterate over
> rel->baserestrictinfos to look for any const-FALSE quals, instead of doing
> it only when there *only* the const-FALSE qual.
>
I don't think we should do it in relation_excluded_by_constraints().
We should do it outside like what is being done in
set_append_rel_size(). Probably we should extract common code into a
function and call it for both kinds of relations.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Potential data loss of 2PC files

2017-03-21 Thread Michael Paquier
On Wed, Mar 22, 2017 at 12:46 AM, Teodor Sigaev  wrote:
 If that can happen, don't we have the same problem in many other places?
 Like, all the SLRUs? They don't fsync the directory either.
>>>
>>> Right, pg_commit_ts and pg_clog enter in this category.
>>
>>
>> Implemented as attached.
>>
 Is unlink() guaranteed to be durable, without fsyncing the directory? If
 not, then we need to fsync() the directory even if there are no files in
 it
 at the moment, because some might've been removed earlier in the
 checkpoint
 cycle.
>
> What is protection if pg crashes after unlimk() but before fsync()? Right,
> it's rather small window for such scenario, but isn't better to  have
> another protection?

This may apply in some cases, but my lookup on the matter regarding
this patch is that we don't actually need something like that yet,
because there are no cases where we could apply it:
- Removal of past WAL segments is on a per-node base.
- Installation of new segments is guessable.
- Removal of backup_label and tablespace map is based on the timings
of pg_start/stop_backup, once again on a per-node basis.
This patch reduces the window to the minimum we can do: once
durable_unlink() leaves, we can guarantee that the system is in a
durable state.

> Like WAL-logging of WAL segment removing...

The pace of removal of the WAL segments is different on each node, and
should be different on each node (for example there could be
replication slots with different retention policies on cascading
downstream standbys). So that's not a concept you can apply here.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Possible regression with gather merge.

2017-03-21 Thread Mithun Cy
I accidently encountered a case where gather merge was picked as
default but disabling same by setting max_parallel_workers_per_gather
= 0; produced a non-parallel plan which was faster than gather merge,
but its cost is marked too high when compared to gather merge.

I guess we need some cost adjustment is planner code.

Test setting
=
create table test as (select id, (random()*1)::int as v1, random() as
v2 from generate_series(1,100) id);
create index test_v1_idx on test (v1);


Server setting is default.


postgres=# explain analyze select * from test order by v1, v2 limit 10;
   QUERY
PLAN

 Limit  (cost=19576.71..19577.88 rows=10 width=16) (actual
time=265.989..265.995 rows=10 loops=1)
   ->  Gather Merge  (cost=19576.71..116805.80 rows=84 width=16)
(actual time=265.987..265.992 rows=10 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 ->  Sort  (cost=18576.69..19618.36 rows=416667 width=16)
(actual time=250.202..250.424 rows=911 loops=3)
   Sort Key: v1, v2
   Sort Method: external merge  Disk: 9272kB
   ->  Parallel Seq Scan on test  (cost=0.00..9572.67
rows=416667 width=16) (actual time=0.053..41.397 rows=33 loops=3)
 Planning time: 0.193 ms
 Execution time: 271.222 ms

postgres=# set max_parallel_workers_per_gather = 0;
SET
postgres=# explain analyze select * from test order by v1, v2 limit 10;
 QUERY PLAN
-
 Limit  (cost=37015.64..37015.67 rows=10 width=16) (actual
time=211.582..211.584 rows=10 loops=1)
   ->  Sort  (cost=37015.64..39515.64 rows=100 width=16) (actual
time=211.581..211.582 rows=10 loops=1)
 Sort Key: v1, v2
 Sort Method: top-N heapsort  Memory: 25kB
 ->  Seq Scan on test  (cost=0.00..15406.00 rows=100
width=16) (actual time=0.085..107.522 rows=100 loops=1)
 Planning time: 0.093 ms
 Execution time: 211.608 ms
(7 rows)



-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-21 Thread Michael Paquier
On Tue, Mar 21, 2017 at 10:37 PM, Kuntal Ghosh
 wrote:
> On Tue, Mar 21, 2017 at 10:52 AM, Michael Paquier
>  wrote:
> We reserve a slot for each possible BackendId, plus one for each
> possible auxiliary process type. For a non-auxiliary process,
> BackendId is used to refer the backend status in PgBackendStatus
> array. So, a bgworker without any BackendId can't initialize its'
> entry in PgBackendStatus array. In simple terms, it will not be shown
> in pg_stat_activity. I've added some comments regarding this in
> pgstat_bestart().

- * Called from InitPostgres.
- * MyDatabaseId, session userid, and application_name must be set
- * (hence, this cannot be combined with pgstat_initialize).
+ *
+ * Apart from auxiliary processes, MyBackendId, MyDatabaseId,
+ * session userid, and application_name must be set for a
+ * backend (hence, this cannot be combined with pgstat_initialize).
That looks right.

> And, any bgworker having valid BackendId will have either a valid
> userid or BOOTSTRAP_SUPERUSERID.

So looking at the area of the code in more details, my memories have
failed me a bit. InitPostgres() is setting up MyBackendId via
SharedInvalBackendInit(), something that will never be called for
backend processes not connected to a database. The bgworker for
logical replication does not do that.

>> If you want to test this configuration, feel free to use this background 
>> worker:
>> https://github.com/michaelpq/pg_plugins/tree/master/hello_world
>> This just prints an entry to the logs every 10s, and does not connect
>> to any database. Adding a call to pgstat_bestart() in hello_main
>> triggers the assertion.
>>
> In this case, pgstat_bestart() shouldn't be called from this module as
> the spawned bgworker will have invalid BackendId. By the way, thanks
> for sharing the repo link. Found a lot of interesting things to
> explore and learn. :)

RIP to this process. Not sure if that's worth the documentation. I
imagine that people usually implement bgworkers by deleting lines in
worker_spi and keeping its structure.

>> Except for the two issues pointed out in this email, I am pretty cool
>> with this patch. Nice work.
> Thank you. :)
>
> Please find the updated patches.

Okay, switched as ready for committer. One note for the committer
though: keeping the calls of pgstat_bestart() out of
BackgroundWorkerInitializeConnection() and
BackgroundWorkerInitializeConnectionByOid() keeps users the
possibility to not have backends connected to the database show up in
pg_stat_activity. This may matter for some users (cloud deployment for
example). I am as well in favor in keeping the work of those routines
minimal, without touching at pgstat.

if (!bootstrap)
CommitTransactionCommand();
+
return;
Some useless noise here.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Silence perl warning in check-world

2017-03-21 Thread Peter Eisentraut
On 3/13/17 16:51, Jeff Janes wrote:
> In some older versions of perl (like v5.10), I get a warning that:
> 
> Use of implicit split to @_ is deprecated at
> src/test/recovery/t/006_logical_decoding.pl
>  line 26.
> 
> Splitting into a dummy variable silences that warning, as in the
> attached.  There may be a better way to silence the warning.
>  (Presumably it actually is clobbering @_ in those older versions of
> Perl, but that doesn't seem to cause problems in this particular case.)

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Create replication slot in pg_basebackup if requested and not yet present

2017-03-21 Thread Peter Eisentraut
On 3/21/17 15:34, Robert Haas wrote:
> So I tend to think that there should always be some explicit user
> action to cause the creation of a slot, like --create-slot-if-needed
> or --create-slot=name.  That still won't prevent careless use of that
> option but it's less dangerous than assuming that a user who refers to
> a nonexistent slot intended to create it when, perhaps, they just
> typo'd it.

I have the same concern.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregates and row types

2017-03-21 Thread Tom Lane
Thomas Munro  writes:
> Is is expected that the first query below can be analysed and planned,
> but the second can't?

> explain select x from (select row(42)) s(x);

> explain select count(x) from (select row(42)) s(x);
> ERROR:  record type has not been registered

Well, ideally that wouldn't happen, but making it go away isn't
all that trivial.  Try coercing the ROW() expression to some named
composite type.  For example, in the regression database this works:

regression=# select count(x) from (select row(42)::int4_tbl) s(x);
 count 
---
 1
(1 row)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v1] Add and report the new "in_hot_standby" GUC pseudo-variable.

2017-03-21 Thread Peter Eisentraut
On 3/17/17 13:56, Elvis Pranskevichus wrote:
> Currently, clients wishing to know when the server exits hot standby
> have to resort to polling, which is often suboptimal.
> 
> This adds the new "in_hot_standby" GUC variable that is reported via a
> ParameterStatus message.

The terminology chosen here is not very clear.  What is the opposite of
"in hot standby"?  Warm standby?  Cold standby?  Not standby at all?
Promoted to primary (writable)?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 11:49 PM, Robert Haas  wrote:
> I'm a little worried that this whole question of changing the file
> naming scheme is a diversion which will result in torpedoing any
> chance of getting some kind of improvement here for v11.  I don't
> think the patch is all that far from being committable but it's not
> going to get there if we start redesigning the world around it.

Ha.  A little Freudian slip there, since I obviously meant v10.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 8:10 PM, Stephen Frost  wrote:
>> We've already
>> created quite a few incompatibilities in this release, and I'm not
>> entirely eager to just keep cranking them out at top speed.
>
> That position would seem to imply that you're in favor of keeping the
> current default of 16MB, but that doesn't make sense given that you
> started this discussion advocating to make it larger.  Changing your
> position is certainly fine, but it'd be good to be more clear if that's
> what you meant here or if you were just referring to the file naming
> scheme but you do still want to increase the default size.

To be honest, I'd sort of forgotten about the change which is the
nominal subject of this thread - I was more focused on the patch,
which makes it configurable.  I was definitely initially in favor of
raising the value, but I got cold feet, a bit, when Alvaro pointed out
that going to 64MB would require a substantial increase in
min_wal_size.  I'm not sure people with small installations will
appreciate seeing that value cranked up from 5 segments * 16MB = 80MB
to, say, 3 segments * 64MB = 192MB.  That's an extra 100+ MB of space
that doesn't really do anything for you.  And nobody's done any
benchmarking to see whether having only 3 segments is even a workable,
performant configuration, so maybe we'll end up with 5 * 64MB = 320MB
by default.

I'm a little worried that this whole question of changing the file
naming scheme is a diversion which will result in torpedoing any
chance of getting some kind of improvement here for v11.  I don't
think the patch is all that far from being committable but it's not
going to get there if we start redesigning the world around it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PUBLICATIONS and pg_dump

2017-03-21 Thread Peter Eisentraut
On 2/26/17 14:25, Petr Jelinek wrote:
> Yeah that was oversight in initial patch, publications and their
> membership was supposed to be dumped only when table filter is not used.
> I mistakenly made it check for data_only instead of using the
> selectDumpableObject machinery.

Committed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: GetOldestXminExtend for ignoring arbitrary vacuum flags

2017-03-21 Thread Haribabu Kommi
On Wed, Mar 22, 2017 at 1:53 PM, Seki, Eiji 
wrote:

>
> Thank you for your review, again.
>
> I think your proposals are better, so I reflected them.


Thanks for the updated patch. Patch looks good to me.
I marked it as "ready for committer".

While reviewing this patch, I found that PGXACT->vacuumFlags
variable name needs a rename because with the addition of
PROC_IN_LOGICAL_DECODING flag "vacuumFlags" doesn't
only use it for vacuum operation. I feel this variable can be renamed
as just "flags", but anyway that is a different patch.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Allow pg_dumpall to work without pg_authid

2017-03-21 Thread Craig Ringer
On 15 March 2017 at 21:56, Stephen Frost  wrote:
> Greetings,
>
> * Sachin Kotwal (kotsac...@gmail.com) wrote:
>> Thanks. I understand this is small but new feature and not bug fix.
>> But we should be able to backpatch if there is no dependency.
>
> No, it's a new feature and won't be back-patched.
>
>> It will help users to get benefit of this feature for g96 and pg95 in RDS
>> until they will have pg10 in RDS.
>
> There is no need to wait for pg10 to be in RDS to use PG10's pg_dumpall
> against RDS databases.  pg_dump and pg_dumpall are very intentionally
> designed and intended to work against older versions of PG, so as soon
> as PG10 is released you'll be able to run PG10's pg_dumpall against your
> 9.6 or 9.5 RDS databases.

However, there's no guarantee that 9.5 or 9.6 will be able to
_restore_ dumps made with Pg10's pg_dumpall and pg_dump.

We don't have any output-compatibility in pg_dump to limit it to
features from some $older_release .

But ... you can always backpatch it yourself into older Pg and build
your own pg_dump and pg_dumpall.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2017-03-21 Thread Craig Ringer
On 22 March 2017 at 09:49, Craig Ringer  wrote:

>> Overall, though, I think that 0001 looks far better than any previous
>> iteration.  It's simple.  It looks safe.  It seems unlikely to break
>> anything that works now.  Woo hoo!
>
> Funny that this started with "hey, here's a simple, non-invasive
> function for looking up the status of an arbitrary xid".

Changes made per discussion.

Removed the comments on TransactionIdDidCommit and
TransactionIdDidAbort . It's not going to be relevant for the immense
majority of callers anyway, and callers that are looking up arbitrary
user supplied XIDs will (hopefully) be looking at
TransactionIdInRecentPast anyway.

I'll be leaving the 'xid' vs 'bigint' issues elsewhere in Pg for next
release, nowhere near time for that now.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 5766e6506e569b0d91e22e90c5e6786ce9fefdf6 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Mon, 23 Jan 2017 13:25:30 +0800
Subject: [PATCH 1/3] Fix race between clog truncation and lookup

There was previously no way to look up an arbitrary xid without
running the risk of having clog truncated out from under you.

This hasn't previously been a problem because anything looking up xids in clog
knows they're protected by datminxid, but that's not the case for arbitrary
user-supplied XIDs. clog is truncated before we advanced oldestXid under
XidGenLock, so holding XidGenLock during a clog lookup is insufficient to
prevent the race. There's no way to look up a SLRU with soft-failure;
attempting a lookup produces an I/O error. There's also no safe way to trap and
swallow the SLRU lookup error due mainly to locking issues.

To address this, introduce a copy of oldestXid, oldestClogXid, that is advanced
before clog truncation under a new LWLock, CLogTruncationLock. Lookups of
arbitrary XIDs must take and hold CLogTruncationLock to prevent concurrent
advance of the minimum valid xid in clog.

This race also exists in a worse form on standby servers. On a standby we only
advance oldestXid when we replay the next checkpoint, so there's a much larger
window between clog truncation and subsequent updating of the limit. Fix this
by recording the oldest xid in clog truncation records and applying the update
to oldestClogXid under ClogTruncationLock before replaying the clog truncation.

No attempt is made to eagerly update oldestXid on the standby, so it may fall
behind oldestClogXid until the next checkpoint.

Note that there's no need to take ClogTruncationLock for normal clog lookups
protected by datfrozenxid, only if accepting arbitrary XIDs that might not be
protected by vacuum thresholds.
---
 doc/src/sgml/monitoring.sgml |  4 +++
 src/backend/access/rmgrdesc/clogdesc.c   | 12 +++--
 src/backend/access/transam/clog.c| 46 +---
 src/backend/access/transam/transam.c |  4 +--
 src/backend/access/transam/varsup.c  | 23 +++-
 src/backend/access/transam/xlog.c| 11 
 src/backend/commands/vacuum.c|  2 +-
 src/backend/storage/lmgr/lwlocknames.txt |  1 +
 src/include/access/clog.h|  8 +-
 src/include/access/transam.h |  7 +
 10 files changed, 101 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dcb2d33..1c84ce5 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1018,6 +1018,10 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
  Waiting to read or update old snapshot control information.
 
 
+ CLogTruncationLock
+ Waiting to truncate the transaction log or waiting for transaction log truncation to finish.
+
+
  clog
  Waiting for I/O on a clog (transaction status) buffer.
 
diff --git a/src/backend/access/rmgrdesc/clogdesc.c b/src/backend/access/rmgrdesc/clogdesc.c
index 352de48..ef268c5 100644
--- a/src/backend/access/rmgrdesc/clogdesc.c
+++ b/src/backend/access/rmgrdesc/clogdesc.c
@@ -23,12 +23,20 @@ clog_desc(StringInfo buf, XLogReaderState *record)
 	char	   *rec = XLogRecGetData(record);
 	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
 
-	if (info == CLOG_ZEROPAGE || info == CLOG_TRUNCATE)
+	if (info == CLOG_ZEROPAGE)
 	{
 		int			pageno;
 
 		memcpy(, rec, sizeof(int));
-		appendStringInfo(buf, "%d", pageno);
+		appendStringInfo(buf, "page %d", pageno);
+	}
+	else if (info == CLOG_TRUNCATE)
+	{
+		xl_clog_truncate xlrec;
+
+		memcpy(, rec, sizeof(xl_clog_truncate));
+		appendStringInfo(buf, "page %d; oldestXact %u",
+			xlrec.pageno, xlrec.oldestXact);
 	}
 }
 
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 5b1d13d..2d33510 100644
--- a/src/backend/access/transam/clog.c
+++ 

Re: [HACKERS] Allow pg_dumpall to work without pg_authid

2017-03-21 Thread Tom Lane
Peter Eisentraut  writes:
> No answer.  Can we remove this chunk?

>> +if (no_role_passwords && binary_upgrade)

Perhaps, but why?  ISTM that trying to run pg_upgrade as non-superuser
is a nonstarter for a number of reasons, while if you're superuser you
do not need --no-role-passwords.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Pavan Deolasee
On Wed, Mar 22, 2017 at 3:51 AM, Mithun Cy 
wrote:

> On Tue, Mar 21, 2017 at 8:10 PM, Robert Haas 
> wrote:
> > If the WAL writing hides the loss, then I agree that's not a big
> > concern.  But if the loss is still visible even when WAL is written,
> > then I'm not so sure.
>
> The tests table schema was taken from earlier tests what Pavan has posted
> [1], hence it is UNLOGGED all I tried to stress the tests. Instead of
> updating 1 row at a time through pgbench (For which I and Pavan both did
> not see any regression), I tried to update all the rows in the single
> statement.
>

Sorry, I did not mean to suggest that you set it up wrongly, I was just
trying to point out that the test case itself may not be very practical.
But given your recent numbers, the regression is clearly non-trivial and
something we must address.


> I have changed the settings as recommended and did a quick test as above
> in our machine by removing UNLOGGED world in create table statement.
>
> Patch Tested : Only 0001_interesting_attrs_v18.patch in [2]
>
> Response time recorded shows there is a much higher increase in response
> time from 10% to 25% after the patch.
>
>
Thanks for repeating the tests. They are very useful. It might make sense
to reverse the order or do 6 tests each and alternate between patched and
unpatched master just to get rid of any other anomaly.

BTW may I request another test with the attached patch? In this patch, we
check if the PageIsFull() even before deciding which attributes to check
for modification. If the page is already full, there is hardly any chance
of doing a HOT update  (there could be a corner case where the new tuple is
smaller than the tuple used in previous UPDATE and we have just enough
space to do HOT update this time, but I can think that's too narrow).

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0001_interesting_attrs_v19.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-21 Thread Amit Kapila
On Tue, Mar 21, 2017 at 11:49 PM, Ashutosh Sharma  wrote:
>>
>> I can confirm that that fixes the seg faults for me.
>
> Thanks for confirmation.
>
>>
>> Did you mean you couldn't reproduce the problem in the first place, or that
>> you could reproduce it and now the patch fixes it?  If the first of those, I
>> forget to say you do have to wait for hot standby to reach a consistency and
>> open for connections, and then connect to the standby ("psql -p 9874"),
>> before the seg fault will be triggered.
>
> I meant that I was not able to reproduce the issue on HEAD.
>
>>
>> But, there are places where hash_xlog_vacuum_get_latestRemovedXid diverges
>> from btree_xlog_delete_get_latestRemovedXid, which I don't understand the
>> reason for the divergence.  Is there a reason we dropped the PANIC if we
>> have not reached consistency?
>
> Well, I'm not quite sure how would standby allow any backend to
> connect to it until it has reached to a consistent state. If you see
> the definition of btree_xlog_delete_get_latestRemovedXid(), just
> before consistency check there is a if-condition 'if
> (CountDBBackends(InvalidOid) == 0)' which means
> we are checking for consistent state only after knowing that there are
> some backends connected to the standby. So, Is there a possibility of
> having some backend connected to standby server without having it in
> consistent state.
>

I don't think so, but I think we should have reachedConsistency check
and elog(PANIC,..) similar to btree.  If you see other conditions
where we PANIC in btree or hash xlog code, you will notice that those
are also theoretically not possible cases.  It seems this is to save
database from getting corrupt or behaving insanely if due to some
reason (like a coding error or others) the check fails.

In a quick look, I don't find any other divergence in both the
function, is there any other divergence in both functions, if so, I
think we should at the very least mention something about it in the
function header.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Questionable tag usage

2017-03-21 Thread Tom Lane
Peter Eisentraut  writes:
>> On 1/4/17 11:50 PM, Tom Lane wrote:
>>> Anyway, bottom line is I'm not terribly excited about fixing just this
>>> one place.  I think we need to decide whether we like the new more-verbose
>>> output for links.  If we don't, we need to fix the markup rules to not do
>>> that.  If we do, there are a lot of places that need adjustment to be less
>>> duplicative, and we should try to be somewhat systematic about fixing
>>> them.

> This question is still open.  Do we want to keep the new linking style
> Section 1.2.3, "Title", or revert back to the old style just Section
> 1.2.3?  It's a simple toggle setting.

I'd vote for reverting for now.  If someone wants to run through the
docs and make considered decisions about where the more verbose style
is a win and where it isn't, then we could make the style change.
But that does not seem like a high-priority task --- and at the moment,
what we've got is a huge pile of docs that were written with the
less verbose style of markup in mind.  So my bet is that there's a lot
of places where more-verbose is not a win.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: GetOldestXminExtend for ignoring arbitrary vacuum flags

2017-03-21 Thread Seki, Eiji
On 2017-03-21 07:46:47 Haribabu Kommi wrote:
>On Tue, Mar 21, 2017 at 3:16 PM, Seki, Eiji  wrote:
>+/* Use these flags in GetOldestXmin as "flags" */
>
>How about some thing like the following.
>/* Use the following flags as an input "flags" to GetOldestXmin function */
>
>
>+/* Ignore vacuum backends (Note: this also ignores analyze with vacuum 
>backends) */
>+#define   PROCARRAY_FLAGS_VACUUM  
>PROCARRAY_FLAGS_DEFAULT | PROCARRAY_VACUUM_FLAG
>+/* Ignore analyze backends (Note: this also ignores vacuum with analyze 
>backends) */
>+#define   PROCARRAY_FLAGS_ANALYZE 
>PROCARRAY_FLAGS_DEFAULT | PROCARRAY_ANALYZE_FLAG
>
>Whenever the above flags are passed to the GetOldestXmin() function,
>it just verifies whether any one of the flags are set in the backend flags
>or not. And also actually the PROC_IN_ANALYZE flag will set when
>analyze operation is started and reset at the end. I feel, it is not
>required to mention the Note section.
>
>+/* Ignore vacuum backends and analyze ones */
>
>How about changing it as "Ignore both vacuum and analyze backends".

Thank you for your review, again.

I think your proposals are better, so I reflected them.

--
Regards,
Eiji Seki
Fujitsu




get_oldest_xmin_with_ignore_flags_v4.patch
Description: get_oldest_xmin_with_ignore_flags_v4.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2017-03-21 Thread Craig Ringer
Hi all

Updated timeline following patch attached.

There's a change in read_local_xlog_page to ensure we maintain
ThisTimeLineID properly, otherwise it's just comment changes.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From d42ceaec47793f67c55523d1aeb72be61c4f2dea Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Thu, 1 Sep 2016 10:16:55 +0800
Subject: [PATCH] Teach xlogreader to follow timeline switches

The XLogReader was timeline-agnostic and assumed that all WAL segments
requested would be on ThisTimeLineID.

When decoding from a logical slot, it's necessary for xlog reading to
be able to read xlog from historical (i.e. not current) timelines.
Otherwise decoding fails after failover to a physical replica because
the oldest still-needed archives are in the historical timeline.

Supporting logical decoding timeline following is a pre-requisite for
logical decoding on physical standby servers. It also makes it
possible to promote a replica with logical slots to a master and
replay from those slots, allowing logical decoding applications to
follow physical failover.

Logical slots cannot actually be created or advanced on a replica so this is
mostly foundation work for subsequent changes to enable logical decoding on
standbys.

Tests are included to exercise the functionality using a cold disk-level copy
of the master that's started up as a replica with slots intact, but the
intended use of the functionality is with logical decoding on a standby.

Note that an earlier version of logical decoding timeline following
was committed to 9.6 as 24c5f1a103ce, 3a3b309041b0, 82c83b337202, and
f07d18b6e94d. It was then reverted by c1543a81a7a8 just after 9.6
feature freeze when issues were discovered too late to safely fix them
in the 9.6 release cycle.

The prior approach failed to consider that a record could be split
across pages that are on different segments, where the new segment
contains the start of a new timeline. In that case the old segment
might be missing or renamed with a .partial suffix.

This patch reworks the logic to be page-based and in the process
simplify how the last timeline for a segment is looked up.
---
 src/backend/access/transam/xlogutils.c | 213 +++--
 src/backend/replication/logical/logicalfuncs.c |   8 +-
 src/backend/replication/walsender.c|  11 +-
 src/include/access/xlogreader.h|  16 ++
 src/include/access/xlogutils.h |   3 +
 src/test/recovery/Makefile |   2 +
 .../recovery/t/009_logical_decoding_timelines.pl   | 130 +
 7 files changed, 364 insertions(+), 19 deletions(-)
 create mode 100644 src/test/recovery/t/009_logical_decoding_timelines.pl

diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index b2b9fcb..28c07d3 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -19,6 +19,7 @@
 
 #include 
 
+#include "access/timeline.h"
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "access/xlogutils.h"
@@ -662,6 +663,7 @@ XLogRead(char *buf, TimeLineID tli, XLogRecPtr startptr, Size count)
 	/* state maintained across calls */
 	static int	sendFile = -1;
 	static XLogSegNo sendSegNo = 0;
+	static TimeLineID sendTLI = 0;
 	static uint32 sendOff = 0;
 
 	p = buf;
@@ -677,7 +679,8 @@ XLogRead(char *buf, TimeLineID tli, XLogRecPtr startptr, Size count)
 		startoff = recptr % XLogSegSize;
 
 		/* Do we need to switch to a different xlog segment? */
-		if (sendFile < 0 || !XLByteInSeg(recptr, sendSegNo))
+		if (sendFile < 0 || !XLByteInSeg(recptr, sendSegNo) ||
+			sendTLI != tli)
 		{
 			char		path[MAXPGPATH];
 
@@ -704,6 +707,7 @@ XLogRead(char *buf, TimeLineID tli, XLogRecPtr startptr, Size count)
 	path)));
 			}
 			sendOff = 0;
+			sendTLI = tli;
 		}
 
 		/* Need to seek in the file? */
@@ -754,6 +758,133 @@ XLogRead(char *buf, TimeLineID tli, XLogRecPtr startptr, Size count)
 }
 
 /*
+ * Determine which timeline to read an xlog page from and set the
+ * XLogReaderState's currTLI to that timeline ID.
+ *
+ * We care about timelines in xlogreader when we might be reading xlog
+ * generated prior to a promotion, either if we're currently a standby in
+ * recovery or if we're a promoted master reading xlogs generated by the old
+ * master before our promotion.
+ *
+ * wantPage must be set to the start address of the page to read and
+ * wantLength to the amount of the page that will be read, up to
+ * XLOG_BLCKSZ. If the amount to be read isn't known, pass XLOG_BLCKSZ.
+ *
+ * We switch to an xlog segment from the new timeline eagerly when on a
+ * historical timeline, as soon as we reach the start of the xlog segment
+ * containing the timeline switch.  The server copied the segment to the new
+ * timeline so all the data up to the switch point 

[HACKERS] Replication status in logical replication

2017-03-21 Thread Masahiko Sawada
Hi all,

When using logical replication, I ran into a situation where the
pg_stat_replication.state is not updated until any wal record is sent
after started up. For example, I set up logical replication with 2
subscriber and restart the publisher server, but I see the following
status for a while (maybe until autovacuum run).

=# select application_name, state, sent_location, write_location,
flush_location, replay_location, sync_state from pg_stat_replication ;
 application_name |  state  | sent_location | write_location |
flush_location | replay_location | sync_state
--+-+---+++-+
 node1| catchup | 0/16329F8 | 0/16329F8  |
0/16329F8  | 0/16329F8   | potential
 node2| catchup | 0/16329F8 | 0/16329F8  |
0/16329F8  | 0/16329F8   | async
(2 rows)

It seems that all wal senders have caught up but
pg_stat_replication.state is still "catchup". The reason of this
behavior is that WalSndCaughtUp is updated only in WalSndWaitForWal in
logical replication during running, and in logical_read_xlog_page
always try to read next wal record (i.g. it calls
WalSndWaitForWal(targetPagePtr + reqLen)). So WalSndWaitForWal cannot
update WalSndCaughtUp until any new wal record is created after
started up and wal sender read it.

Attached patch fixes this behavior by updating WalSndCaughtUp before
trying to read next WAL if already caught up.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


logical_repl_caught_up.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication access control patches

2017-03-21 Thread Peter Eisentraut
On 3/20/17 15:10, Petr Jelinek wrote:
> Hmm but REPLICATION role can do basebackup/consume wal, so how does
> giving it limited publication access help? Wouldn't we need some
> SUBSCRIPTION role/grant used instead for logical replication connections
> instead of REPLICATION for this to make sense?

Since we're splitting up the pg_hba.conf setup for logical and physical
connections, it would probably not matter.

But just to think it through, how could we split this up sensibly?

Here is the complete list of things that rolreplication allows:

- create/drop replication slot
- pg_logical_slot_get_changes() and friends
- connect to walsender

For logical replication, we could slice it up this way:

- new user attribute allowing the creating of logical replication slots
- store owner of slot, allow drop and get based on ownership
- allow anyone to connect as walsender

Another problem is that the walsender command to create a replication
slot allows you to load an arbitrary plugin.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-21 Thread Peter Geoghegan
On Tue, Mar 21, 2017 at 7:18 PM, Peter Geoghegan  wrote:
>> As shown in 0008-hj-shared-buf-file-v8.patch.  Thoughts?
>
> A less serious issue I've also noticed is that you add palloc() calls,
> implicitly using the current memory context, within buffile.c.
> BufFileOpenTagged() has some, for example. However, there is a note
> that we don't need to save the memory context when we open a BufFile
> because we always repalloc(). That is no longer the case here.

Similarly, I think that your new type of BufFile has no need to save
CurrentResourceOwner, because it won't ever actually be used. I
suppose that you should at least note this in comments.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2017-03-21 Thread Craig Ringer
On 21 March 2017 at 09:05, Craig Ringer  wrote:

> Thanks, that's a helpful point. The commit in question is 978b2f65. I
> didn't notice that it introduced XLogReader use in twophase.c, though
> I should've realised given the discussion about fetching recent 2pc
> info from xlog. I don't see any potential for issues at first glance,
> but I'll go over it in more detail. The main concern is validity of
> ThisTimeLineID, but since it doesn't run in recovery I don't see much
> of a problem there. That also means it can afford to use the current
> timeline-oblivious read_local_xlog_page safely.
>
> TAP tests for 2pc were added by 3082098. I'll check to make sure they
> have appropriate coverage for this.

The TAP tests pass fine, and I can't see any likely issues either.

XLogReader for 2PC doesn't happen on standby, and RecoveryInProgress()
will update ThisTimeLineID on promotion.

>> Did you check whether ThisTimeLineID is actually always valid in the
>> processes logical decoding could run in?  IIRC it's not consistently
>> update during recovery in any process but the startup process.
>
> I share your concerns that it may not be well enough maintained.
> Thankyou for the reminder, that's been on my TODO and got lost when I
> had to task-hop to other priorities.

The main place we maintain ThisTimeLineID (outside StartupXLOG of
course) is in walsender's GetStandbyFlushRecPtr, which calls
GetWalRcvWriteRecPtr. That's not used in walsender's logical decoding
or in the SQL interface.

I've changed the order of operations in read_local_xlog_page to ensure
that RecoveryInProgress() updates ThisTimeLineID if we're promoted,
and made it update ThisTimeLineID from GetXLogReplayRecPtr otherwise.

pg_logical_slot_get_changes_guts was fine already.

Because xlog read callbacks must not attempt to read pages past the
flush limit (master) or replay limit (standby), it doesn't matter if
ThisTimeLineID is completely up to date, only that it's valid as-of
that LSN.

I did identify one problem. The startup process renames the last
segment in a timeline to .partial when it processes a timeline switch.
See xlog.c:7597. So if we have the following order of operations:

* Update ThisTimeLineID to 2 at latest redo ptr
* XLogReadDetermineTimeline chooses timeline 2 to read from
* startup process replays timeline switch to TL 3 and renames last
segment in old timeline to .partial
* XLogRead() tries to open segment with TL 2

we'll fail. I don't think it matters much though. We're not actually
supporting streaming decoding from standby this release by the looks,
and even if we did the effect would be limited to an ERROR and a
reconnect. It doesn't look like there's really any sort of lock or
other synchronisation we can rely on to prevent this, and we should
probably just live with it. If we have already opened the segment
we'll just keep reading from it without noticing the rename; if we
haven't and are switching to it just as it's renamed we'll ERROR when
we try to open it.

I had cascading and promotion tests in progress for decoding on
standby, but doubt there's much point finishing them off now that it's
not likely that decoding on standby can be added for this CF.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-21 Thread Peter Geoghegan
On Tue, Mar 21, 2017 at 5:07 AM, Thomas Munro
 wrote:
>> buffile.c should stop pretending to care about anything other than
>> temp files, IMV. 100% of all clients that want temporary files go
>> through buffile.c. 100% of all clients that want non-temp files (files
>> which are not marked FD_TEMPORARY) access fd.c directly, rather than
>> going through buffile.c.
>
> I still need BufFile because I want buffering.
>
> There are 3 separate characteristics enabled by flags with 'temporary'
> in their name.  I think we should consider separating the concerns by
> splitting and renaming them:
>
> 1.  Segmented BufFile behaviour.  I propose renaming BufFile's isTemp
> member to isSegmented, because that is what it really does.   I want
> that feature independently without getting confused about lifetimes.
> Tested with small MAX_PHYSICAL_FILESIZE as you suggested.

I would have proposed to get rid of the isTemp field entirely. It is
always true with current usage, any only #ifdef NOT_USED code presumes
that it could be any other way. BufFile is all about temp files, which
ISTM should be formalized. The whole point of BufFile is to segment
fd.c temp file segments. Who would ever want to use BufFile without
that capability anyway?

> 2.  The temp_file_limit system.  Currently this applies to fd.c files
> opened with FD_TEMPORARY.  You're right that we shouldn't be able to
> escape that sanity check on disk space just because we want to manage
> disk file ownership differently.  I propose that we create a new flag
> FD_TEMP_FILE_LIMIT that can be set independentlyisTemp of the flags
> controlling disk file lifetime.  When working with SharedBufFileSet,
> the limit applies to each backend in respect of files it created,
> while it has them open.  This seems a lot simpler than any
> shared-temp-file-limit type scheme and is vaguely similar to the way
> work_mem applies in each backend for parallel query.

I agree that that makes sense as a user-visible behavior of
temp_file_limit. This user-visible behavior is what I actually
implemented for parallel CREATE INDEX.

> 3.  Delete-on-close/delete-at-end-of-xact.  I don't want to use that
> facility so I propose disconnecting it from the above.  We c{ould
> rename those fd.c-internal flags FD_TEMPORARY and FD_XACT_TEMPORARY to
> FD_DELETE_AT_CLOSE and FD_DELETE_AT_EOXACT.

This reliably unlink()s all files, albeit while relying on unlink()
ENOENT as a condition that terminates deletion of one particular
worker's BufFile's segments. However, because you effectively no
longer use resowner.c, ISTM that there is still a resource leak in
error paths. ResourceOwnerReleaseInternal() won't call FileClose() for
temp-ish files (that are not quite temp files in the current sense) in
the absence of no other place managing to do so, such as
BufFileClose(). How can you be sure that you'll actually close() the
FD itself (not vFD) within fd.c in the event of an error? Or Delete(),
which does some LRU maintenance for backend's local VfdCache?

If I follow the new code correctly, then it doesn't matter that you've
unlink()'d to take care of the more obvious resource management chore.
You can still have a reference leak like this, if I'm not mistaken,
because you still have backend local state (local VfdCache) that is
left totally decoupled with the new "shadow resource manager" for
shared BufFiles.

> As shown in 0008-hj-shared-buf-file-v8.patch.  Thoughts?

A less serious issue I've also noticed is that you add palloc() calls,
implicitly using the current memory context, within buffile.c.
BufFileOpenTagged() has some, for example. However, there is a note
that we don't need to save the memory context when we open a BufFile
because we always repalloc(). That is no longer the case here.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow pg_dumpall to work without pg_authid

2017-03-21 Thread Peter Eisentraut
On 3/13/17 16:41, Peter Eisentraut wrote:
> Why this?

No answer.  Can we remove this chunk?

> + if (no_role_passwords && binary_upgrade)
> + {
> + fprintf(stderr, _("%s: options --no-role-passwords and
> --binary-upgrade cannot be used together\n"),
> + progname);
> + fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
> + progname);
> + exit_nicely(1);
> + }

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Updating the "tool sets" documentation for modern FreeBSD

2017-03-21 Thread Peter Eisentraut
On 3/13/17 21:16, Thomas Munro wrote:
> For several operating systems we give handy package manager one-liners
> to install all the requirements for building our documentation.  All
> current production FreeBSD releases have a friendly new package
> manager a bit like apt/yum, so here's a documentation patch to give
> the one line command.  Also, a brief note about gmake vs make in the
> doc subdir.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speedup twophase transactions

2017-03-21 Thread Michael Paquier
On Fri, Mar 17, 2017 at 5:15 PM, Michael Paquier
 wrote:
> On Fri, Mar 17, 2017 at 5:00 PM, Nikhil Sontakke
>  wrote:
>> Micheal, it looks like you are working on a final version of this patch? I
>> will wait to review it from my end, then.
>
> I have to admit that I am beginning to get drawn into it...

And here is what I got. I have found a couple of inconsistencies in
the patch, roughly:
- During recovery entries marked with ondisk = true should have their
start and end LSN reset to InvalidXLogRecPtr. This was actually
leading to some inconsistencies in MarkAsPreparing() for 2PC
transactions staying around for more than 2 checkpoints.
- RecoverPreparedTransactions(), StandbyRecoverPreparedTransactions()
and PrescanPreparedTransactions() doing both a scan of pg_twophase and
the shared memory entries was way too complicated. I have changed
things so as only memory entries are scanned by those routines, but an
initial scan of pg_twophase is done before recovery.
- Some inconsistencies in the comments and some typos found on the way.
- Simplification of some routines used in redo, as well as simplified
the set of routines made available to users.

Tests are passing for me, an extra lookup would be nice.
-- 
Michael


twophase_recovery_shmem_michael.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unacccented Fractions

2017-03-21 Thread Peter Eisentraut
On 3/13/17 23:01, David E. Wheeler wrote:
> I noticed that unaccent.rules has spaces in front of the unaccented 
> representation of fraction glyphs:

> This makes sense to me, as I’d like “1¼”, for example to become “1 1/4”. 
> However, that’s not what seems to happen:

These leading spaces come all the way from the input files that are used
to produce the unaccent rules, where they are presumably put for exactly
the reason you point out.  However, the unaccent lexer is not prepared
for that.  It just looks for any amount of whitespace separating the two
data columns.  So that might have to be fixed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [WIP] RE: [HACKERS] DECLARE STATEMENT setting up a connection in ECPG

2017-03-21 Thread Haribabu Kommi
On Tue, Mar 7, 2017 at 4:09 PM, Ideriha, Takeshi <
ideriha.take...@jp.fujitsu.com> wrote:

>
> Attached 004_declareStmt_test_v5.patch is a rebased one.
> The rest of patches are same as older version.
>

Thanks for the update patch. I started reviewing the patches.

There was a minor conflict in applying 004_declareXX patch.

Some comments in 001_declareStmt_preproc_v5.patch:

+ if (INFORMIX_MODE)
+ {
+ if (pg_strcasecmp(stmt+strlen("close "), "database") == 0)
+ {
+ if (connection)
+ mmerror(PARSE_ERROR, ET_ERROR, "AT option not allowed in CLOSE DATABASE
statement");
+
+ fprintf(base_yyout, "{ ECPGdisconnect(__LINE__, \"CURRENT\");");
+ whenever_action(2);
+ free(stmt);
+ break;
+ }
+ }

The same code block is present in the stmtClosePortalStmt condition to
verify the close
statement. Because of the above code addition, the code present in
stmtClosePortalStmt
is a dead code. So remove the code present in stmtClosePortalStmt or try to
reuse it.


+static void
+output_cursor_name(char *str)

This function needs some comments to explain the code flow for better
understanding.

+/*
+ * Translate the EXEC SQL DECLARE STATEMENT into ECPGdeclare function
+ */

How about using Transform instead of Translate? (similar references also)


I am yet to review the other patches.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Questionable tag usage

2017-03-21 Thread Peter Eisentraut
On 1/11/17 11:55, Peter Eisentraut wrote:
> On 1/4/17 11:50 PM, Tom Lane wrote:
>> Anyway, bottom line is I'm not terribly excited about fixing just this
>> one place.  I think we need to decide whether we like the new more-verbose
>> output for links.  If we don't, we need to fix the markup rules to not do
>> that.  If we do, there are a lot of places that need adjustment to be less
>> duplicative, and we should try to be somewhat systematic about fixing
>> them.
> 
> We can turn off the new link appearance and change it back to the old
> one.  We had previously speculated that this change might be awkward in
> some cases, but without any concrete cases.  If we have concrete cases,
> we can change it.

This question is still open.  Do we want to keep the new linking style
Section 1.2.3, "Title", or revert back to the old style just Section
1.2.3?  It's a simple toggle setting.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2017-03-21 Thread Craig Ringer
On 22 March 2017 at 01:49, Robert Haas  wrote:

> /me smacks forehead.  Actually, it should be CLogTruncationLock, with
> a capital L, for consistency with CLogControlLock.

Will do.

> The new lock needs to be added to the table in monitoring.sgml.

Same.

> I don't think the new header comments in TransactionIdDidCommit and
> TransactionIdDidAbort are super-clear.  I'm not sure you're going to
> be able to explain it there in a reasonable number of words, but I
> think that speaking of "testing against oldestClogXid" will leave
> people wondering what exactly that means. Maybe just write "caller is
> responsible for ensuring that the clog records covering XID being
> looked up can't be truncated away while the lookup is in progress",
> and then leave the bit about CLogTruncationLock to be explained by the
> callers that do that.  Or you could drop these comments entirely.

OK. I'll revisit and see if I can clean it up, otherwise remove it.

> Overall, though, I think that 0001 looks far better than any previous
> iteration.  It's simple.  It looks safe.  It seems unlikely to break
> anything that works now.  Woo hoo!

Funny that this started with "hey, here's a simple, non-invasive
function for looking up the status of an arbitrary xid".

Mature, complex systems eh?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-03-21 Thread Peter Eisentraut
This patch is looking pretty good to me, modulo the failing pg_dump tests.

Attached is a fixup patch.  I have mainly updated some comments and
variable naming for (my) clarity.  No functional changes.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 04f772f350773cce9890386c4a5924ee251ebbe6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 21 Mar 2017 15:38:39 -0400
Subject: [PATCH] fixup! Logical replication support for initial data copy

---
 src/backend/catalog/pg_subscription.c  |   6 +-
 src/backend/commands/subscriptioncmds.c|   2 +-
 .../libpqwalreceiver/libpqwalreceiver.c|   9 +-
 src/backend/replication/logical/tablesync.c| 190 +++--
 src/backend/replication/logical/worker.c   |  22 +--
 src/include/replication/logical.h  |   2 -
 src/include/replication/worker_internal.h  |   8 +-
 src/test/regress/expected/subscription.out |   2 -
 src/test/regress/sql/subscription.sql  |   2 -
 src/test/subscription/t/001_rep_changes.pl |   6 +-
 src/test/subscription/t/004_sync.pl|   2 +-
 11 files changed, 127 insertions(+), 124 deletions(-)

diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 9b74892548..e420ec14d2 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -321,10 +321,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn,
 			return SUBREL_STATE_UNKNOWN;
 		}
 
-		ereport(ERROR,
-(errcode(ERRCODE_UNDEFINED_OBJECT),
- errmsg("subscription table %u in subscription %u does not exist",
-		relid, subid)));
+		elog(ERROR, "subscription table %u in subscription %u does not exist",
+			 relid, subid);
 	}
 
 	/* Get the state. */
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index cba2d5c085..0784ca7951 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -992,7 +992,7 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
 (errmsg("could not receive list of replicated tables from the publisher: %s",
 		res->err)));
 
-	/* Proccess tables. */
+	/* Process tables. */
 	slot = MakeSingleTupleTableSlot(res->tupledesc);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, slot))
 	{
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 3176182523..4dd8eef1f9 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -792,13 +792,12 @@ libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname,
  * Convert tuple query result to tuplestore.
  */
 static void
-libpqrcv_proccessTuples(PGresult *pgres, WalRcvExecResult *walres,
+libpqrcv_processTuples(PGresult *pgres, WalRcvExecResult *walres,
 		const int nRetTypes, const Oid *retTypes)
 {
 	int		tupn;
 	int		coln;
 	int		nfields = PQnfields(pgres);
-	char   *cstrs[MaxTupleAttributeNumber];
 	HeapTuple		tuple;
 	AttInMetadata  *attinmeta;
 	MemoryContext	rowcontext;
@@ -830,9 +829,11 @@ libpqrcv_proccessTuples(PGresult *pgres, WalRcvExecResult *walres,
 	   "libpqrcv query result context",
 	   ALLOCSET_DEFAULT_SIZES);
 
-	/* Proccess returned rows. */
+	/* Process returned rows. */
 	for (tupn = 0; tupn < PQntuples(pgres); tupn++)
 	{
+		char   *cstrs[MaxTupleAttributeNumber];
+
 		CHECK_FOR_INTERRUPTS();
 
 		/* Do the allocations in temporary context. */
@@ -885,7 +886,7 @@ libpqrcv_exec(WalReceiverConn *conn, const char *query,
 		case PGRES_SINGLE_TUPLE:
 		case PGRES_TUPLES_OK:
 			walres->status = WALRCV_OK_TUPLES;
-			libpqrcv_proccessTuples(pgres, walres, nRetTypes, retTypes);
+			libpqrcv_processTuples(pgres, walres, nRetTypes, retTypes);
 			break;
 
 		case PGRES_COPY_IN:
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 6c67a5ea9f..3e16b0d576 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -36,7 +36,7 @@
  *		  - if the apply is in front of the sync in the wal stream the new
  *			state is set to CATCHUP and apply loops until the sync process
  *			catches up to the same LSN as apply
- *		  - if the sync if in front of the apply in the wal stream the new
+ *		  - if the sync is in front of the apply in the wal stream the new
  *			state is set to SYNCDONE
  *		  - if both apply and sync are at the same position in the wal stream
  *			the state of the table is set to READY
@@ -104,7 +104,6 @@
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 
-static List *table_states = NIL;
 static bool table_states_valid = false;
 
 StringInfo	copybuf = 

Re: [HACKERS] Logical replication existing data copy

2017-03-21 Thread Peter Eisentraut
On 3/20/17 19:54, Petr Jelinek wrote:
> Here is fixed version, also rebased on top of all the changes to pg_dump
> tests. Subscriptions are dumped unless --no-subscriptions is specified.

The problem with that is that pg_dump will now fail for unprivileged
users.  That's a separate problem to solve.  I suggest not overloading
this patch with that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem in Parallel Bitmap Heap Scan?

2017-03-21 Thread Dilip Kumar
On Wed, Mar 22, 2017 at 5:38 AM, Thomas Munro
 wrote:
> Isn't that one row short?  What happened to this one?
>
>  10.0.0.0/8 | 10::/8

Actually, In my last test I did not connect to regression database, I
have simply taken table and the few rows from inet.sql so it was only
16 rows even with seqscan.

Here are the updated results when I connect to regression database and re-test.

regression=# SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr
ORDER BY i;
 c  |i
+--
 10.0.0.0/8 | 9.1.2.3/8
 10.0.0.0/8 | 10.1.2.3/8
 10.0.0.0/32| 10.1.2.3/8
 10.0.0.0/8 | 10.1.2.3/8
 10.1.0.0/16| 10.1.2.3/16
 10.1.2.0/24| 10.1.2.3/24
 10.1.2.3/32| 10.1.2.3
 10.0.0.0/8 | 11.1.2.3/8
 192.168.1.0/24 | 192.168.1.226/24
 192.168.1.0/24 | 192.168.1.255/24
 192.168.1.0/24 | 192.168.1.0/25
 192.168.1.0/24 | 192.168.1.255/25
 192.168.1.0/26 | 192.168.1.226
 10.0.0.0/8 | 10::/8
 :::1.2.3.4/128 | ::4.3.2.1/24
 10:23::f1/128  | 10:23::f1/64
 10:23::8000/113| 10:23::
(17 rows)

regression=# explain analyze SELECT * FROM inet_tbl WHERE i <>
'192.168.1.0/24'::cidr
ORDER BY i;
QUERY PLAN
---
 Gather Merge  (cost=16.57..16.67 rows=10 width=64) (actual
time=4.972..4.983 rows=17 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   ->  Sort  (cost=16.56..16.58 rows=10 width=64) (actual
time=0.107..0.110 rows=8 loops=2)
 Sort Key: i
 Sort Method: quicksort  Memory: 26kB
 ->  Parallel Bitmap Heap Scan on inet_tbl  (cost=12.26..16.39
rows=10 width=64) (actual time=0.051..0.053 rows=8 loops=2)
   Recheck Cond: (i <> '192.168.1.0/24'::inet)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on inet_idx3  (cost=0.00..12.26
rows=17 width=0) (actual time=0.016..0.016 rows=17 loops=1)
 Index Cond: (i <> '192.168.1.0/24'::inet)
 Planning time: 0.113 ms
 Execution time: 5.691 ms
(13 rows)


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] identity columns

2017-03-21 Thread Peter Eisentraut
On 3/21/17 16:11, Vitaly Burovoy wrote:
> I've just checked and still get an error about a type, not about
> absence of a column:
> test=# CREATE TABLE itest3 (a int generated by default as identity, b text);
> CREATE TABLE
> test=# ALTER TABLE itest3 ALTER COLUMN o ADD GENERATED ALWAYS AS IDENTITY;
> ERROR:  identity column type must be smallint, integer, or bigint

OK, I have a fix.

> My argument is consistency.
> Since IDENTITY is a property of a column (similar to DEFAULT, NOT
> NULL, attributes, STORAGE, etc.), it follows a different rule: it is
> either set or not set. If it did not set before, the "SET" DDL "adds"
> it, if that property already present, the DDL replaces it.
> There is no "ADD" clause in DDLs like "...ALTER table ALTER column..."
> (only "SET", "RESET" and "DROP")[2].
> Your patch introduces the single DDL version with "...ALTER column
> ADD..." for a property.

But it creates a sequence, so it creates state.  So mistakes could
easily be masked.  With my patch, if you do ADD twice, you get an error.
 With your proposal, you'd have to use SET, and you could overwrite
existing sequence state without realizing it.

>> It does change the type, but changing the type doesn't change the
>> limits.  That is a property of how ALTER SEQUENCE works, which was
>> separately discussed.
> 
> Are you about the thread[1]? If so, I'd say the current behavior is not good.
> I sent an example with users' bad experience who will know nothing
> about sequences (because they'll deal with identity columns).
> Would it be better to change bounds of a sequence if they match the
> bounds of an old type (to the bounds of a new type)?

That's an idea, but that's for a separate patch.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-03-21 Thread Ideriha, Takeshi

>>I like the idea taking advantage of linestyle utilities 
>>to implement rst and markdown format efficiently instead of newly developing 
>>pset format things.
>>But I'm thinking two comments below needs change to something about not 
>>focusing only linestyle. 
>>That's because they really take care of both '\pset linestyle and \pset 
>>format' and it may lead to misunderstanding to readers.
>> 
>>---
>>/* Line style control structures */
>>const printTextFormat pg_markdown =
>> 
>>/* get selected or default line style */
>>const printTextFormat *
>>get_line_style(const printTableOpt *opt)
>>---

>It is in command.c?
>I have it done that \pset format changes linestyle


>psql (9.6.2, server 9.6.1)
>Type "help" for help.

>jelen=# \pset linestyle ascii
>Line style is ascii.
>jelen=# \pset format rst
>Output format is rst.
>jelen=# \pset linestyle
>Line style is rst.
>jelen=# 
>Peter wrote that this is not right, but i don`t know how it should like, 
>because most of this is done on linestyle, format is used only for switch from 
>console.

Thank you for explanation!

That's about print.c, but my explanation was poor...
My point was a slight thing about comments in source code.
I've just wanted to say comments needs change to match actual code.
These comments says about line style but the corresponding codes are about both 
line style and format. 

But these points should be considered after how to implement the new formats 
are decided.
So please don't care.
(I've just thought implementing the new formats using linestyle code came into 
consensus.)

Regards, 
Ideriha Takeshi

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch proposal

2017-03-21 Thread Venkata B Nagothi
On Tue, Mar 21, 2017 at 8:46 AM, David Steele  wrote:

> Hi Venkata,
>
> On 2/28/17 11:59 PM, Venkata B Nagothi wrote:
>
>> On Wed, Mar 1, 2017 at 1:14 AM, Venkata B Nagothi > > wrote:
>> On Tue, Jan 31, 2017 at 6:49 AM, David Steele > > wrote:
>>
>> Do you know when those will be ready?
>>
>> Attached are both the patches with tests included.
>>
>
> Thanks for adding the tests.  They bring clarity to the patch.
>

Thank you again reviewing the patch and your comments !


> Unfortunately, I don't think the first patch (recoveryStartPoint) will
> work as currently implemented.  The problem I see is that the new function
> recoveryStartsHere() depends on pg_control containing a checkpoint right at
> the end of the backup.  There's no guarantee that this is true, even if
> pg_control is copied last.  That means a time, lsn, or xid that occurs
> somewhere in the middle of the backup can be selected without complaint
> from this code depending on timing.
>

Yes, that is true.  The latest best position, checkpoint position, xid and
timestamp of the restored backup of the backup is shown up in the
pg_controldata, which means, that is the position from which the recovery
would start. Which in-turn means, WALs start getting replayed from that
position towards --> minimum recovery position (which is the end backup,
which again means applying WALs generated between start and the end backup)
all the way through to  --> recovery target position. The best start
position to check with would the position shown up in the pg_control file,
which is way much better compared to the current postgresql behaviour.

The tests pass (or rather fail as expected) because they are written using
> values before the start of the backup.  It's actually the end of the backup
> (where the database becomes consistent on recovery) that defines where PITR
> can start and this distinction definitely matters for very long backups.  A
> better test would be to start the backup, get a time/lsn/xid after writing
> some data, and then make sure that time/lsn/xid is flagged as invalid on
> recovery.
>

Yes, i agree, the databases restored from the backup would start the
recovery and would become consistent once the end-backup is reached. The
point here is that, when the backup is restored and recovery proceeds
further, there is NO WAY to identify the next consistent position or
end-position of the backup. This patch is only implementing a check to
ensure recovery starts and proceeds at the right position instead of
pre-maturely getting promoted which is the current behaviour.

The current behaviour is that, if the XID shown-up by the pg_controldata is
say 1400 and the specified recovery_target_xid is say 200, then, postgresql
just completes the recovery and promotes at the immediate consistent
recovery target, which means, the DBA has to all the way start the
restoration and recovery process again to promote the database at the
intended position.

It is also problematic to assume that transaction IDs commit in order. If
> checkPoint.latestCompletedXid contains 5 then a recovery to xid 4 may or
> may not be successful depending on the commit order.  However, it appears
> in this case the patch would disallow recovery to 4.
>

If the txid 4 is the recovery target xid, then, the appropriate backup
taken previous to txid 4 must be used or as an alternative recovery target
like recovery_target_timestamp must be used to proceed to the respective
recovery target xid.


> I think this logic would need to be baked into recoveryStopsAfter() and/or
> recoveryStopsBefore() in order to work.  It's not clear to me what that
> would look like, however, or if the xid check is even practical.
>

The above two functions would determine if the recovery process has to stop
at a particular position or not and the patch has nothing to do with it.

To summarise, this patch only determines if the WAL replay has to start at
all. In other words, if the specified recovery target falls prior to the
restored database position, then, the WAL replay should not start, which
prevent the database from getting promoted at an unintended recovery target.

Any thoughts/comments ?

Regards,

Venkata Balaji N
Database Consultant


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-21 Thread Peter Geoghegan
On Tue, Mar 21, 2017 at 2:49 PM, Thomas Munro
 wrote:
> I'm going to experiment with refactoring the v10 parallel CREATE INDEX
> patch to use the SharedBufFileSet interface from
> hj-shared-buf-file-v8.patch today and see what problems I run into.

I would be happy if you took over parallel CREATE INDEX completely. It
makes a certain amount of sense, and not just because I am no longer
able to work on it.

You're the one doing things with shared BufFiles that are of
significant complexity. Certainly more complicated than what parallel
CREATE INDEX needs in every way, and necessarily so. I will still have
some more feedback on your shared BufFile design, though, while it's
fresh in my mind.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-21 Thread Peter Geoghegan
On Tue, Mar 21, 2017 at 12:15 PM, Robert Haas  wrote:
> Wouldn't it break on-disk compatibility with existing btree indexes?

Yes, it would, but see my later remarks on pd_prune_xid. I think that
that would be safe.

> I think we're still trying to solve a problem that Simon postulated in
> advance of evidence that shows how much of a problem it actually is.

I don't think we're still doing that. I think we're discussing the
risk of recycling being broken indefinitely when it doesn't happen in
time.

> Not only might that be unnecessary, but if we don't have a test
> demonstrating the problem, we also don't have a test demonstrating
> that a given approach fixes it.

Preventing recycling from happening until we feel like it is probably
fine. It is not fine to break it forever, though. The specific problem
is that there is an XID stored in dead B-Tree + SP-GiST pages that is
used in the subsequent RecentGlobalXmin interlock that determines if
recycling is safe (if there is no possible index scan that could land
on the dead page). You know, the _bt_page_recyclable() check.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Create replication slot in pg_basebackup if requested and not yet present

2017-03-21 Thread Michael Banck
Hi,

Am Mittwoch, den 22.03.2017, 00:40 +0100 schrieb Michael Banck:
> I guess if we decide (physical) slots should not be created implicitly,
> then using the same UI as pg_receivewal makes sense for the sake of
> consistency, i.e. "--slot=name --create-slot [--if-not-exists]". That is
> rather verbose though and I don't think the --if-not-exists is great UX,
> but maybe there is some use-case for erroring out if a slot already
> exists that I haven't thought of.

Thinking about this some more, there's the case of somebody accidentally
using an already existing slot that was meant for another standby which
happens to be down just at that moment.

>From some quick testing, that makes replication fail once the other
standby is started up again, but soon after the new standby is taken
down, replication picks up without apparent problems for the other
standby.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-21 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, Mar 21, 2017 at 6:02 PM, David Steele  wrote:
> > The biggest downside I can see is that this would change the naming scheme
> > for the default of 16MB compared to previous versions of Postgres.  However,
> > for all other wal-seg-size values changes would need to be made anyway.
> 
> I think changing the naming convention for 16MB WAL segments, which is
> still going to be what 99% of people use, is an awfully large
> compatibility break for an awfully marginal benefit.

It seems extremely unlikely to me that we're going to actually see users
deviate from whatever we set the default to and so I'm not sure that
this is a real concern.  We aren't changing what 9.6 and below's naming
scheme is, just what PG10+ do, and PG10+ are going to have a different
default WAL size.

I realize the current patch still has the 16MB default even though a
rather large portion of the early discussion appeared in favor of
changing it to 64MB.  Once we've done that, I don't think it makes one
whit of difference what the naming scheme looks like when you're using
16MB sizes because essentially zero people are going to actually use
such a setting.

> We've already
> created quite a few incompatibilities in this release, and I'm not
> entirely eager to just keep cranking them out at top speed.  

That position would seem to imply that you're in favor of keeping the
current default of 16MB, but that doesn't make sense given that you
started this discussion advocating to make it larger.  Changing your
position is certainly fine, but it'd be good to be more clear if that's
what you meant here or if you were just referring to the file naming
scheme but you do still want to increase the default size.

I'll admit that we might have a few more people using non-default sizes
once we make it an initdb-option (though I'm tempted to suggest that one
might be able to count them using their digits ;), but it seems very
unlikely that they would do so to reduce it back down to 16MB, so I'm
really not seeing the naming scheme change as a serious
backwards-incompatibility change.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Problem in Parallel Bitmap Heap Scan?

2017-03-21 Thread Thomas Munro
On Wed, Mar 22, 2017 at 1:21 AM, Dilip Kumar  wrote:
> postgres=# SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr
> ORDER BY i;
>  c  |i
> +--
>  10.0.0.0/8 | 9.1.2.3/8
>  10.0.0.0/8 | 10.1.2.3/8
>  10.0.0.0/32| 10.1.2.3/8
>  10.0.0.0/8 | 10.1.2.3/8
>  10.1.0.0/16| 10.1.2.3/16
>  10.1.2.0/24| 10.1.2.3/24
>  10.1.2.3/32| 10.1.2.3
>  10.0.0.0/8 | 11.1.2.3/8
>  192.168.1.0/24 | 192.168.1.226/24
>  192.168.1.0/24 | 192.168.1.255/24
>  192.168.1.0/24 | 192.168.1.0/25
>  192.168.1.0/24 | 192.168.1.255/25
>  192.168.1.0/26 | 192.168.1.226
>  :::1.2.3.4/128 | ::4.3.2.1/24
>  10:23::f1/128  | 10:23::f1/64
>  10:23::8000/113| 10:23::
> (16 rows)

Isn't that one row short?  What happened to this one?

 10.0.0.0/8 | 10::/8

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

2017-03-21 Thread Michael Paquier
On Wed, Mar 22, 2017 at 1:56 AM, David Steele  wrote:
> Hi Michael,
>
> On 3/10/17 9:15 AM, Peter Eisentraut wrote:
>>
>> On 3/9/17 17:03, Michael Paquier wrote:
>>>
>>> Having something like --limit-retained-segments partially addresses
>>> it, as long as there is a way to define an automatic mode, based on
>>> statvfs() obviously.
>>
>>
>> But that is not portable/usable enough, as we have determined, I think.
>>
>> Have you looked into using inotify for implementing your use case?
>
>
> This thread has been idle for quite a while.  Please respond and/or post a
> new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be marked
> "Returned with Feedback".

I have no idea what to do here, so I just marked it as returned with feedback.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Create replication slot in pg_basebackup if requested and not yet present

2017-03-21 Thread Michael Banck
Am Dienstag, den 21.03.2017, 15:34 -0400 schrieb Robert Haas:
> On Sun, Mar 19, 2017 at 12:01 PM, Magnus Hagander  wrote:
> > I think maybe we should output a message when the slot is created, at least
> > in verbose mode, to make sure people realize that happened. Does that seem
> > reasonable?
> 
> Slots are great until you leave one lying around by accident.  I'm
> afraid that no matter what we do, we're going to start getting
> complaints from people who mess that up.  For example, somebody
> creates a replica using the new super-easy method, and then blows it
> away without dropping the slot from the master, and then days or weeks
> later pg_wal fills up and takes the server down.  The user says, oh,
> these old write-ahead log files should have gotten removed, and
> removes them all.  Oops.

Hrm. 

Maybe it would be useful to log a warning like "slot XY has not been
active for X minutes", based on some threshold, though possibly the
people not keeping an eye on their replication slots won't spot that in
the logfiles, either.

> So I tend to think that there should always be some explicit user
> action to cause the creation of a slot, like --create-slot-if-needed
> or --create-slot=name.  That still won't prevent careless use of that
> option but it's less dangerous than assuming that a user who refers to
> a nonexistent slot intended to create it when, perhaps, they just
> typo'd it.

I guess if we decide (physical) slots should not be created implicitly,
then using the same UI as pg_receivewal makes sense for the sake of
consistency, i.e. "--slot=name --create-slot [--if-not-exists]". That is
rather verbose though and I don't think the --if-not-exists is great UX,
but maybe there is some use-case for erroring out if a slot already
exists that I haven't thought of.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-21 Thread Peter Geoghegan
On Tue, Mar 21, 2017 at 2:03 PM, Robert Haas  wrote:
> I agree that the extent to which code reuse is possible here is
> somewhat unclear, but I am 100% confident that the answer is non-zero.
> You and Thomas both need BufFiles that can be shared across multiple
> backends associated with the same ParallelContext.  I don't understand
> how you can argue that it's reasonable to have two different ways of
> sharing the same kind of object across the same set of processes.

I didn't argue that. Rather, I argued that there is going to be
significant additional requirements for PHJ, because it has to support
arbitrary many BufFiles, rather than either 1 or 2 (one per
tuplesort/logtapeset). Just how "signficant" that would be I cannot
say, regrettably. (Or, we're going to have to make logtape.c multiplex
BufFiles, which risks breaking other logtape.c routines that aren't
even used just yet.)

>> Isn't that an essential part of having a refcount, in general? You
>> were the one that suggested refcounting.
>
> No, quite the opposite.  My point in suggesting adding a refcount was
> to avoid needing to have a single owner.  Instead, the process that
> decrements the reference count to zero becomes responsible for doing
> the cleanup.  What you've done with the ref count is use it as some
> kind of medium for transferring responsibility from backend A to
> backend B; what I want is to allow backends A, B, C, D, E, and F to
> attach to the same shared resource, and whichever one of them happens
> to be the last one out of the room shuts off the lights.

Actually, that's quite possible with the design I came up with. The
restriction that Thomas can't live with as I've left things is that
you have to know the number of BufFiles ahead of time. I'm pretty sure
that that's all it is. (I do sympathize with the fact that that isn't
very helpful to him, though.)

> As I've said before, I think that's an anti-goal.  This is a different
> problem, and trying to reuse the solution we chose for the
> non-parallel case doesn't really work.  resowner.c could end up owning
> a shared reference count which it's responsible for decrementing --
> and then decrementing it removes the file if the result is zero.  But
> it can't own performing the actual unlink(), because then we can't
> support cases where the file may have multiple readers, since whoever
> owns the unlink() might try to zap the file out from under one of the
> others.

Define "zap the file". I think, based on your remarks here, that
you've misunderstood my design. I think you should at least understand
it fully if you're going to dismiss it.

It is true that a worker resowner can unlink() the files
mid-unification, in the same manner as with conventional temp files,
and not decrement its refcount in shared memory, or care at all in any
special way. This is okay because the leader (in the case of parallel
tuplesort) will realize that it should not "turn out the lights",
finding that remaining reference when it calls BufFileClose() in
registered callback, as it alone must. It doesn't matter that the
unlink() may have already occurred, or may be just about to occur,
because we are only operating on already-opened files, and never on
the link itself (we don't have to stat() the file link for example,
which is naturally only a task for the unlink()'ing backend anyway).
You might say that the worker only blows away the link itself, not the
file proper, since it may still be open in leader (say).

** We rely on the fact that files are themselves a kind of reference
counted thing, in general; they have an independent existence from the
link originally used to open() them. **

The reason that there is a brief wait in workers for parallel
tuplesort is because it gives us the opportunity to have the
immediately subsequent worker BufFileClose() not turn out the lights
in worker, because leader must have a reference on the BufFile when
workers are released. So, there is a kind of interlock that makes sure
that there is always at least 1 owner.

** There would be no need for an additional wait but for the fact the
leader wants to unify multiple worker BufFiles as one, and must open
them all at once for the sake of simplicity. But that's just how
parallel tuplesort in particular happens to work, since it has only
one BufFile in the leader, which it wants to operate on with
everything set up up-front. **

Thomas' design cannot reliably know how many segments there are in
workers in error paths, which necessitates his unlink()-ENOENT-ignore
hack. My solution is that workers/owners look after their own temp
segments in the conventional way, until they reach BufFileClose(),
which may never come if there is an error. The only way that clean-up
won't happen in conventional resowner.c-in-worker fashion is if
BufFileClose() is reached in owner/worker. BufFileClose() must be
reached when there is no error, which has to happen anyway when using
temp files. (Else there is 

[HACKERS] Aggregates and row types

2017-03-21 Thread Thomas Munro
Hi,

Is is expected that the first query below can be analysed and planned,
but the second can't?

explain select x from (select row(42)) s(x);

explain select count(x) from (select row(42)) s(x);
ERROR:  record type has not been registered

That may be a strange thing to want to do, but it's something I
noticed and thought I'd ask about, when I was trying (and failing) to
find a query that would get transient types into a hash table in a
parallel worker for a test case yesterday.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Make pg_stop_backup() archive wait optional

2017-03-21 Thread David Steele

On 3/21/17 2:34 PM, Fujii Masao wrote:

On Tue, Mar 21, 2017 at 11:03 AM, Tsunakawa, Takayuki
 wrote:

From: David Steele [mailto:da...@pgmasters.net]

Well, that's embarrassing.  When I recreated the function to add defaults
I messed up the AS clause and did not pay attention to the results of the
regression tests, apparently.

Attached is a new version rebased on 88e66d1.  Catalog version bump has
also been omitted.


I was late to reply because yesterday was a national holiday in Japan.  I 
marked this as ready for committer.  The patch applied cleanly and worked as 
expected.


The patch basically looks good to me, but one comment is;
backup.sgml (at least the description for "Making a non-exclusive
low level backup) seems to need to be updated.


Agreed.  Added in the attached patch and rebased on 8027556.

Thanks!
--
-David
da...@pgmasters.net
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 2d67521..c04eb46 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -887,7 +887,7 @@ SELECT pg_start_backup('label', false, false);
 
  In the same connection as before, issue the command:
 
-SELECT * FROM pg_stop_backup(false);
+SELECT * FROM pg_stop_backup(false [, true ]);
 
  This terminates the backup mode and performs an automatic switch to
  the next WAL segment.  The reason for the switch is to arrange for
@@ -895,6 +895,15 @@ SELECT * FROM pg_stop_backup(false);
  ready to archive.
 
 
+ If the backup process monitors the WAL archiving process independently,
+ the second parameter (which defaults to true) can be set to false to
+ prevent pg_stop_backup from blocking until all WAL is
+ archived.  Instead, the function will return as soon as the stop backup
+ record is written to the WAL.  This option must be used with caution:
+ if WAL archiving is not monitored correctly then the result might be a
+ useless backup.
+
+
  The pg_stop_backup will return one row with three
  values. The second of these fields should be written to a file named
  backup_label in the root directory of the backup. The
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9408a25..4dc30ca 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18355,7 +18355,7 @@ SELECT set_config('log_statement_stats', 'off', false);
   
   

-pg_stop_backup(exclusive 
boolean)
+pg_stop_backup(exclusive 
boolean , wait_for_archive boolean 
)
 
setof record
Finish performing exclusive or non-exclusive on-line backup 
(restricted to superusers by default, but other users can be granted EXECUTE to 
run the function)
@@ -18439,7 +18439,13 @@ postgres=# select pg_start_backup('label_goes_here');
 pg_start_backup. In a non-exclusive backup, the contents of
 the backup_label and tablespace_map are returned
 in the result of the function, and should be written to files in the
-backup (and not in the data directory).
+backup (and not in the data directory).  There is an optional second
+parameter of type boolean.  If false, the pg_stop_backup
+will return immediately after the backup is completed without waiting for
+WAL to be archived.  This behavior is only useful for backup
+software which independently monitors WAL archiving. Otherwise, WAL
+required to make the backup consistent might be missing and make the backup
+useless.

 

diff --git a/src/backend/access/transam/xlogfuncs.c 
b/src/backend/access/transam/xlogfuncs.c
index 96aa15e..1ef9f2b 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -190,6 +190,7 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
boolnulls[3];
 
boolexclusive = PG_GETARG_BOOL(0);
+   boolwait_for_archive = PG_GETARG_BOOL(1);
XLogRecPtr  stoppoint;
 
/* check to see if caller supports us returning a tuplestore */
@@ -232,7 +233,7 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
 * Stop the exclusive backup, and since we're in an exclusive 
backup
 * return NULL for both backup_label and tablespace_map.
 */
-   stoppoint = do_pg_stop_backup(NULL, true, NULL);
+   stoppoint = do_pg_stop_backup(NULL, wait_for_archive, NULL);
exclusive_backup_running = false;
 
nulls[1] = true;
@@ -250,7 +251,7 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
 * Stop the non-exclusive backup. Return a copy of the backup 
label
 * and tablespace map so they can be written to disk by the 
caller.
 */
-   stoppoint = do_pg_stop_backup(label_file->data, true, NULL);
+   stoppoint = do_pg_stop_backup(label_file->data, 
wait_for_archive, NULL);
nonexclusive_backup_running = false;
   

Re: [HACKERS] increasing the default WAL segment size

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 6:02 PM, David Steele  wrote:
> The biggest downside I can see is that this would change the naming scheme
> for the default of 16MB compared to previous versions of Postgres.  However,
> for all other wal-seg-size values changes would need to be made anyway.

I think changing the naming convention for 16MB WAL segments, which is
still going to be what 99% of people use, is an awfully large
compatibility break for an awfully marginal benefit.  We've already
created quite a few incompatibilities in this release, and I'm not
entirely eager to just keep cranking them out at top speed.  Where
it's necessary to achieve forward progress in some area, sure, but
this feels gratuitous to me.  I agree that we might have picked your
scheme if we were starting from scratch, but I have a hard time
believing it's a good idea to do it now just because of this patch.
Changing the WAL segment size has been supported for a long time, and
I don't see the fact that it will now potentially be
initdb-configurable rather than configure-configurable as a sufficient
justification for whacking around the naming scheme -- even though I
don't love the naming scheme we've got.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Removes uninitialized variable compiler warning

2017-03-21 Thread Todd Sedano
We can disregard this patch. It appears that PointerGetDatum initializes
lenlemm.


On Mon, Mar 20, 2017 at 10:04 AM, Tom Lane  wrote:

> Todd Sedano  writes:
> > This patch removes a compiler warning.
> > warning: variable 'lenlemm' is uninitialized when used here
> > [-Wuninitialized]
>
> Hm, on what compiler?  AFAICS, that parsetext() function hasn't
> changed meaningfully since 2007, and nobody complained of
> uninitialized-variable warnings in it before.
>
> We're generally willing to try to silence such warnings on mainstream
> compilers, but not on weird ones ...
>
> regards, tom lane
>


Re: [HACKERS] increasing the default WAL segment size

2017-03-21 Thread Peter Eisentraut
On 3/21/17 15:22, Robert Haas wrote:
> If you take the approach that Beena did, then you lose the
> correspondence with LSNs, which is admittedly not great but there are
> already helper functions available to deal with LSN -> filename
> mappings and I assume those will continue to work. If you take the
> opposite approach, then WAL filenames stop being consecutive, which
> seems to me to be far worse in terms of user and tool confusion.

Anecdotally, I think having the file numbers consecutive is very
important, for debugging and feel-good factor.

If you want to raise the segment size and preserve the LSN mapping, then
pick 256 MB as your next size.

I do think, however, that this has the potential of creating another
ongoing source of confusion similar to oid vs relfilenode, where the
numbers are often the same, except when they are not.  With hindsight, I
would have made the relfilenodes completely different from the OIDs.  We
chose to keep them (mostly) the same as the OIDs, for compatibility.  We
are seemingly making a similar kind of decision here.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Monitoring roles patch

2017-03-21 Thread Peter Eisentraut
On 2/24/17 05:14, Dave Page wrote:
> - Adds a default role called pg_read_all_gucs
> - Allows members of pg_read_all_gucs to, well, read all GUCs
> - Grants pg_read_all_gucs to pg_monitor

Maybe pg_read_all_settings?  Consistent with pg_settings.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cast result of copyNode()

2017-03-21 Thread Mark Dilger

> On Mar 21, 2017, at 2:13 PM, David Steele  wrote:
> 
> Hi Mark,
> 
> On 3/9/17 3:34 PM, Peter Eisentraut wrote:
>> On 3/7/17 18:27, Mark Dilger wrote:
>>> You appear to be using a #define macro to wrap a function of the same name
>>> with the code:
>>> 
>>> #define copyObject(obj) ((typeof(obj)) copyObject(obj))
>> 
>> Yeah, that's a bit silly.  Here is an updated version that changes that.
> 
> Do you know when you'll have a chance to take a look at the updated patch?

The patch applies cleanly, compiles, and passes all the regression tests
for me on my laptop.  Peter appears to have renamed the function copyObject
as copyObjectImpl, which struct me as odd when I first saw it, but I don't have
a better name in mind, so that seems ok.

If the purpose of this patch is to avoid casting so many things down to (Node 
*),
perhaps some additional work along the lines of the patch I'm attaching are
appropriate.  (This patch applies on top Peter's v2 patch).  The idea being to
keep objects as (Expr *) where appropriate, rather than casting through (Node *)
quite so much.

I'm not certain that this is (a) merely a bad idea, (b) a different idea than 
what
Peter is proposing, and as such should be submitted independently, or
(c) something that aught to be included in Peter's patch prior to commit.
I only applied this idea to one file, and maybe not completely in that file, 
because
I'd like feedback before going any further along these lines.

mark



ideas_on_top_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] few fts functions for jsonb

2017-03-21 Thread Dmitry Dolgov
> On 21 March 2017 at 03:03, Andrew Dunstan 
wrote:
>
> However, I think it should probably be broken up into a couple of pieces -
> one for the generic json/jsonb transforms infrastructure (which probably
> needs some more comments) and one for the FTS functions that will use it.

Sure, here are two patches with separated functionality and a bit more
commentaries for the transform functions.
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 6a7aab2..bac08c0 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -42,6 +42,8 @@
 #define JB_PATH_CREATE_OR_INSERT \
 	(JB_PATH_INSERT_BEFORE | JB_PATH_INSERT_AFTER | JB_PATH_CREATE)
 
+#define is_jsonb_data(type) (type == WJB_KEY || type == WJB_VALUE || type == WJB_ELEM)
+
 /* state for json_object_keys */
 typedef struct OkeysState
 {
@@ -52,6 +54,23 @@ typedef struct OkeysState
 	int			sent_count;
 } OkeysState;
 
+/* state for iterate_json function */
+typedef struct IterateJsonState
+{
+	JsonLexContext		*lex;
+	JsonIterateAction	action;			/* an action that will be applied to each json value */
+	void*action_state;	/* any necessary context for iteration */
+} IterateJsonState;
+
+/* state for transform_json function */
+typedef struct TransformJsonState
+{
+	JsonLexContext		*lex;
+	StringInfo			strval;			/* resulting json */
+	JsonTransformAction	action;			/* an action that will be applied to each json value */
+	void*action_state;	/* any necessary context for transformation */
+} TransformJsonState;
+
 /* state for json_get* functions */
 typedef struct GetState
 {
@@ -271,6 +290,18 @@ static void setPathArray(JsonbIterator **it, Datum *path_elems,
 			 int level, Jsonb *newval, uint32 nelems, int op_type);
 static void addJsonbToParseState(JsonbParseState **jbps, Jsonb *jb);
 
+/* function supporting iterate_json(b) */
+static void apply_action(void *state, char *token, JsonTokenType tokentype);
+
+/* function supporting transform_json(b) */
+static void transform_object_start(void *state);
+static void transform_object_end(void *state);
+static void transform_array_start(void *state);
+static void transform_array_end(void *state);
+static void transform_object_field_start(void *state, char *fname, bool isnull);
+static void transform_array_element_start(void *state, bool isnull);
+static void transform_scalar(void *state, char *token, JsonTokenType tokentype);
+
 
 /*
  * SQL function json_object_keys
@@ -4130,3 +4161,206 @@ setPathArray(JsonbIterator **it, Datum *path_elems, bool *path_nulls,
 		}
 	}
 }
+
+/*
+ * Iterate over jsonb string values or elements, and pass them together with
+ * an iteration state to a specified JsonIterateAction.
+ */
+void *
+iterate_jsonb_values(Jsonb *jb, void *state, JsonIterateAction action)
+{
+	JsonbIterator		*it;
+	JsonbValue			v;
+	JsonbIteratorToken	type;
+
+	it = JsonbIteratorInit(>root);
+
+	while ((type = JsonbIteratorNext(, , false)) != WJB_DONE)
+	{
+		if ((type == WJB_VALUE || type == WJB_ELEM) && v.type == jbvString)
+		{
+			action(state, v.val.string.val, v.val.string.len);
+		}
+	}
+
+	return state;
+}
+
+/*
+ * Iterate over json string values or elements, and pass them together with an
+ * iteration state to a specified JsonIterateAction.
+ */
+void *
+iterate_json_values(text *json, void *action_state, JsonIterateAction action)
+{
+	JsonLexContext *lex = makeJsonLexContext(json, true);
+	JsonSemAction *sem = palloc0(sizeof(JsonSemAction));
+	IterateJsonState   *state = palloc0(sizeof(IterateJsonState));
+
+	state->lex = lex;
+	state->action = action;
+	state->action_state = action_state;
+
+	sem->semstate = (void *) state;
+	sem->scalar = apply_action;
+
+	pg_parse_json(lex, sem);
+
+	return state;
+}
+
+/*
+ * An auxiliary function for iterate_json_values to invoke a specified
+ * JsonIterateAction.
+ */
+static void
+apply_action(void *state, char *token, JsonTokenType tokentype)
+{
+	IterateJsonState   *_state = (IterateJsonState *) state;
+	if (tokentype == JSON_TOKEN_STRING)
+		(*_state->action) (_state->action_state, token, strlen(token));
+}
+
+/*
+ * Iterate over a jsonb, and apply a specified JsonTransformAction to every
+ * string value or element. Any necessary context for a JsonTransformAction can
+ * be passed in the action_state variable. Function returns a copy of an original jsonb
+ * object with transformed values.
+ */
+Jsonb *
+transform_jsonb(Jsonb *jsonb, void *action_state, JsonTransformAction transform_action)
+{
+	JsonbIterator		*it;
+	JsonbValue			v, *res = NULL;
+	JsonbIteratorToken	type;
+	JsonbParseState		*st = NULL;
+	text*out;
+	boolis_scalar = false;
+
+	it = JsonbIteratorInit(>root);
+	is_scalar = it->isScalar;
+
+	while ((type = JsonbIteratorNext(, , false)) != WJB_DONE)
+	{
+		if ((type == WJB_VALUE || type == WJB_ELEM) && v.type == jbvString)
+		{
+			out = transform_action(action_state, v.val.string.val, v.val.string.len);
+			v.val.string.val = 

Re: [HACKERS] PL/Python: Add cursor and execute methods to plan object

2017-03-21 Thread Andrew Dunstan


On 03/16/2017 05:32 PM, David Steele wrote:
> On 2/25/17 1:27 PM, Peter Eisentraut wrote:
>> Something that has been bothering me in PL/Python for a long time is the
>> non-object-oriented way in which plans are prepared and executed:
>>
>> plan = plpy.prepare(...)
>> res = plpy.execute(plan, ...)
>>
>> where plpy.execute() takes either a plan or a query string.
>>
>> I think a better style would be
>>
>> plan = plpy.prepare(...)
>> res = plan.execute(...)
>>
>> so that the "plan" is more like a statement handle that one finds in
>> other APIs.
>>
>> This ended up being very easy to implement, so I'm proposing to allow
>> this new syntax as an alternative.
>>
>> I came across this again as I was developing the background sessions API
>> for PL/Python.  So I'm also wondering here which style people prefer so
>> I can implement it there.
> This patch applies cleanly at cccbdde.
>
> Any Python folks out there who would like to take a crack at reviewing this?
>
>


I'm not particularly a Python folk, but I've done enough over the years
with PLs, including PLPython, that I think I can review this :-)

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-03-21 Thread David Steele

Hi Alexander,

On 3/11/17 7:06 AM, Pavel Stehule wrote:


I am sending a updated version with separated sort direction in special
variable

There is a question. Has desc direction sense for columns like schema or
table name?

Using desc, asc for size is natural. But for tablename?


Do you know when you'll have a chance to review the updated patch?

Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Mithun Cy
On Tue, Mar 21, 2017 at 8:10 PM, Robert Haas  wrote:
> If the WAL writing hides the loss, then I agree that's not a big
> concern.  But if the loss is still visible even when WAL is written,
> then I'm not so sure.

The tests table schema was taken from earlier tests what Pavan has posted
[1], hence it is UNLOGGED all I tried to stress the tests. Instead of
updating 1 row at a time through pgbench (For which I and Pavan both did
not see any regression), I tried to update all the rows in the single
statement. I have changed the settings as recommended and did a quick test
as above in our machine by removing UNLOGGED world in create table
statement.

Patch Tested : Only 0001_interesting_attrs_v18.patch in [2]

Machine: Scylla [ Last time I did same tests on IBM power2 but It is not
immediately available. So trying on another intel based performance
machine.]

[mithun.cy@scylla bin]$ lscpu
Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):56
On-line CPU(s) list:   0-55
Thread(s) per core:2
Core(s) per socket:14
Socket(s): 2
NUMA node(s):  2
Vendor ID: GenuineIntel
CPU family:6
Model: 63
Model name:Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz
Stepping:  2
CPU MHz:   1235.800
BogoMIPS:  4594.35
Virtualization:VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache:  256K
L3 cache:  35840K
NUMA node0 CPU(s): 0-13,28-41
NUMA node1 CPU(s): 14-27,42-55

[mithun.cy@scylla bin]$ cat /proc/meminfo
MemTotal:   65687464 kB


Postgresql.conf non default settings
===
shared_buffers  = 24 GB
max_wal_size = 10GB
min_wal_size = 5GB
synchronous_commit=off
autovacuum = off  /*manually doing vacumm full before every update. */

This system has 2 storage I have kept datadir on spinning disc and pg_wal
on ssd.

Tests :

DROP TABLE IF EXISTS testtab;

CREATE TABLE testtab (

col1 integer,

col2 text,

col3 float,

col4 text,

col5 text,

col6 char(30),

col7 text,

col8 date,

col9 text,

col10 text

);

INSERT INTO testtab

SELECT generate_series(1,1000),

md5(random()::text),

random(),

md5(random()::text),

md5(random()::text),

md5(random()::text)::char(30),

md5(random()::text),

now(),

md5(random()::text),

md5(random()::text);

CREATE INDEX testindx ON testtab (col1, col2, col3, col4, col5, col6, col7,
col8, col9);
Performance measurement tests: Ran12 times to eliminate run to run
latencies.
==
VACUUM FULL;
BEGIN;
UPDATE testtab SET col2 = md5(random()::text);
ROLLBACK;

Response time recorded shows there is a much higher increase in response
time from 10% to 25% after the patch.


[1] Re: rewrite HeapSatisfiesHOTAndKey

[2] Re: Patch: Write Amplification Reduction Method (WARM)

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


WARM_test.ods
Description: application/vnd.oasis.opendocument.spreadsheet

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-21 Thread David Steele

On 3/21/17 3:22 PM, Robert Haas wrote:

On Tue, Mar 21, 2017 at 9:04 AM, Stephen Frost  wrote:

In short, I'm also concerned about this change to make WAL file names no
longer match up with LSNs and also about the odd stepping that you get
as a result of this change when it comes to WAL file names.


OK, that's a bit surprising to me, but what do you want to do about
it?  If you take the approach that Beena did, then you lose the
correspondence with LSNs, which is admittedly not great but there are
already helper functions available to deal with LSN -> filename
mappings and I assume those will continue to work. If you take the
opposite approach, then WAL filenames stop being consecutive, which
seems to me to be far worse in terms of user and tool confusion.


They are already non-consecutive.  Does 00010002 really 
logically follow 0001000100FF?  Yeah, sort of, if you know 
the rules.



Also
note that, both currently and with the patch, you can also reduce the
WAL segment size.  David's proposed naming scheme doesn't handle that
case, I think, and I think it would be all kinds of a bad idea to use
one file-naming approach for segments < 16MB and a separate approach
for segments > 16MB.  That's not making anything easier for users or
tool authors.


I believe it does handle that case, actually.  The minimum WAL segment 
size is 1MB so they would increase like:


00010001
000100010010
000100010020
...
00010001FFF0

You could always calculate the next WAL file by adding 
(wal_seg_size_in_mb << 20) to the previous WAL file's LSN.  This would 
even work for WAL segments > 4GB.  Overall, I think this would make 
calculating WAL ranges simpler than it is now.


The biggest downside I can see is that this would change the naming 
scheme for the default of 16MB compared to previous versions of 
Postgres.  However, for all other wal-seg-size values changes would need 
to be made anyway.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump, pg_dumpall and data durability

2017-03-21 Thread Michael Paquier
On Wed, Mar 22, 2017 at 6:24 AM, Andrew Dunstan
 wrote:
> This is really a pretty small patch all things considered, and pretty
> low-risk (although I haven;t been threough the code in fine detail yet).
> In the end I'm persuaded by Andres' point that there's actually no
> practical alternative way to make sure the data is actually synced to disk.
>
> If nobody else wants to pick it up I will, unless there is a strong
> objection.

Thanks!
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-21 Thread Thomas Munro
On Wed, Mar 22, 2017 at 10:03 AM, Robert Haas  wrote:
> On Tue, Mar 21, 2017 at 3:50 PM, Peter Geoghegan  wrote:
>> I disagree with that. It is a
>> trade-off, I suppose. I have now run out of time to work through it
>> with you or Thomas, though.
>
> Bummer.

I'm going to experiment with refactoring the v10 parallel CREATE INDEX
patch to use the SharedBufFileSet interface from
hj-shared-buf-file-v8.patch today and see what problems I run into.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication apply to run with sync commit off by default

2017-03-21 Thread Petr Jelinek
On 21/03/17 18:54, Robert Haas wrote:
> On Mon, Mar 20, 2017 at 7:56 PM, Petr Jelinek
>  wrote:
>> On 18/03/17 13:31, Petr Jelinek wrote:
>>> On 07/03/17 06:23, Petr Jelinek wrote:
 there has been discussion at the logical replication initial copy thread
 [1] about making apply work with sync commit off by default for
 performance reasons and adding option to change that per subscription.

 Here I propose patch to implement this - it adds boolean column
 subssynccommit to pg_subscription catalog which decides if
 synchronous_commit should be off or local for apply. And it adds
 SYNCHRONOUS_COMMIT = boolean to the list of WITH options for CREATE and
 ALTER SUBSCRIPTION. When nothing is specified it will set it to false.

 The patch is built on top of copy patch currently as there are conflicts
 between the two and this helps a bit with testing of copy patch.

 [1]
 https://www.postgresql.org/message-id/CA+TgmoY7Lk2YKArcp4O=Qu=xoor8j71mad1oteojawmuje3...@mail.gmail.com

>>>
>>> I rebased this patch against recent changes and the latest version of
>>> copy patch.
>>
>> And another rebase after pg_dump tests commit.
> 
> +else if (strcmp(defel->defname, "nosynchronous_commit") == 0
> && synchronous_commit)
> +{
> +if (synchronous_commit_given)
> +ereport(ERROR,
> +(errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("conflicting or redundant options")));
> +
> +synchronous_commit_given = true;
> +*synchronous_commit = !defGetBoolean(defel);
> +}
> 
> Uh, what's this nosynchronous_commit thing?

Ah originally I didn't have it as bool just as (no)synchronous_commit,
forgot to rip this out.


> 
> +  local otherwise to false. The
> +  default value is false independently of the default
> +  synchronous_commit setting for the instance.
> 
> This phrasing isn't very clear or accurate, IMHO.  I'd say something
> like "The value of this parameter overrides the synchronous_commit
> setting.  The default value is false."  And I'd make the word
> synchronous_commit in that sentence a link to the GUC, so that it's
> absolutely unmistakable what we mean by "the synchronous_commit
> setting".

Okay.

> 
>  /*
> + * We need to make new connection to new slot if slot name has changed
> + * so exit here as well if that's the case.
> + */
> +if (strcmp(newsub->slotname, MySubscription->slotname) != 0)
> +{
> +ereport(LOG,
> +(errmsg("logical replication worker for subscription
> \"%s\" will "
> +"restart because the replication slot name
> was changed",
> +MySubscription->name)));
> +
> +walrcv_disconnect(wrconn);
> +proc_exit(0);
> +}
> 
> Looks unrelated.
> 

Oops, need to fix this separately.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 1234142533027e411a2ea5feff60f782402cefe2 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Mon, 6 Mar 2017 13:07:45 +0100
Subject: [PATCH] Add option to modify sync commit per subscription

This also changes default behaviour of subscription workers to
synchronous_commit = off
---
 doc/src/sgml/catalogs.sgml | 11 +++
 doc/src/sgml/ref/alter_subscription.sgml   |  1 +
 doc/src/sgml/ref/create_subscription.sgml  | 12 
 src/backend/catalog/pg_subscription.c  |  1 +
 src/backend/commands/subscriptioncmds.c| 49 --
 src/backend/replication/logical/launcher.c |  2 +-
 src/backend/replication/logical/worker.c   | 10 ++
 src/bin/pg_dump/pg_dump.c  | 12 +++-
 src/bin/pg_dump/pg_dump.h  |  1 +
 src/bin/pg_dump/t/002_pg_dump.pl   |  2 +-
 src/bin/psql/describe.c|  5 ++-
 src/include/catalog/pg_subscription.h  | 11 ---
 src/test/regress/expected/subscription.out | 27 
 src/test/regress/sql/subscription.sql  |  3 +-
 14 files changed, 115 insertions(+), 32 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 228ec78..f71d9c9 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6395,6 +6395,17 @@
  
 
  
+  subsynccommit
+  bool
+  
+  
+   If true, the apply for the subscription will run with
+   synchronous_commit set to local.
+   Otherwise it will have it set to false.
+  
+ 
+
+ 
   subconninfo
   text
   
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 6335e17..712de98 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -26,6 +26,7 @@ ALTER SUBSCRIPTION name WITH ( where suboption can 

Re: [HACKERS] PATCH: recursive json_populate_record()

2017-03-21 Thread David Steele

On 3/21/17 2:31 PM, Andrew Dunstan wrote:

On 03/21/2017 01:37 PM, David Steele wrote:

>>

This thread has been idle for months since Tom's review.

The submission has been marked "Returned with Feedback".  Please feel
free to resubmit to a future commitfest.


Please revive. I am planning to look at this later this week.


Revived in "Waiting on Author" state.

--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump, pg_dumpall and data durability

2017-03-21 Thread Andrew Dunstan


On 03/04/2017 01:08 AM, Robert Haas wrote:
> On Thu, Mar 2, 2017 at 5:02 AM, Michael Paquier
>  wrote:
>> On Thu, Mar 2, 2017 at 2:26 AM, David Steele  wrote:
>>> This patch is in need of a committer.  Any takers?
>>> I didn't see a lot of enthusiasm from committers on the thread
>> Stephen at least has showed interest.
>>
>>> so if nobody picks it up by the end of the CF I'm going to mark the patch 
>>> RWF.
>> Yes, that makes sense. If no committer is willing to have a look at
>> code-level and/or has room for this patch then it is as good as
>> doomed. Except for bug fixes, I have a couple of other patches that
>> are piling up so they would likely get the same treatment. There is
>> nothing really we can do about that.
> Before we reach the question of whether committers have time to look
> at this, we should first consider the question of whether it has
> achieved consensus.  I'll try to summarize the votes:
>
> Tom Lane: premise pretty dubious
> Robert Haas: do we even want this?
> Peter Eisentraut: I had voiced a similar concern [to Robert's] previously
> Albe Laurenz: I think we should have that
> Andres Freund: [Tom's counterproposal won't work]
> Robert Haas: [Andres has a good point, still nervous] I'm not sure
> there's any better alternative to what's being proposed, though.
> Fujii Masao: One idea is to provide the utility or extension which
> fsync's the specified files and directories
>
> Here's an attempt to translate those words into numerical votes.  As
> per my usual practice, I will count the patch author as +1:
>
> Michael Paquier: +1
> Tom Lane: -1
> Peter Eisentraut: -1
> Albe Laurenz: +1
> Andres Freund: +1
> Robert Haas: +0.5
> Fujii Masao: -0.5
>
> So, my interpretation is that, out of 7 votes, we have -2.5 and +3.5.
> Perhaps that is a consensus for proceeding, but if so it's a pretty
> marginal one.  I think the next step for this patch is to consider why
> we shouldn't, in lieu of what's proposed here, add a pg_fsync utility
> that fsyncs a file or recursively fsyncs a directory, ship that, and
> let people use it on their pg_dump files and/or base backups if they
> wish.  I am not altogether convinced that's a better option, but I am
> also not altogether convinced that it's worse.  Also, if anyone else
> wishes to vote, or if anyone to whom I've attributed a vote wishes to
> assign a numerical value to their vote other than the one I've
> assigned, please feel free.
>
> The issue with this patch isn't that nobody is interested, but that
> not everybody thinks it's a good idea.
>


This is really a pretty small patch all things considered, and pretty
low-risk (although I haven;t been threough the code in fine detail yet).
In the end I'm persuaded by Andres' point that there's actually no
practical alternative way to make sure the data is actually synced to disk.

If nobody else wants to pick it up I will, unless there is a strong
objection.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we cacheline align PGXACT?

2017-03-21 Thread David Steele

Hi Alexander

On 3/10/17 8:08 AM, Alexander Korotkov wrote:


Results look good for me.  Idea of committing both of patches looks
attractive.
We have pretty much acceleration for read-only case and small
acceleration for read-write case.
I'll run benchmark on 72-cores machine as well.


Have you had a chance to run those tests yet?

Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Declarative partitioning optimization for large amount of partitions

2017-03-21 Thread David Steele

On 3/10/17 7:16 AM, Aleksander Alekseev wrote:


Thanks a lot for the review!


Anyone want to take a crack at reviewing this new version?

Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cast result of copyNode()

2017-03-21 Thread David Steele

Hi Mark,

On 3/9/17 3:34 PM, Peter Eisentraut wrote:

On 3/7/17 18:27, Mark Dilger wrote:

You appear to be using a #define macro to wrap a function of the same name
with the code:

#define copyObject(obj) ((typeof(obj)) copyObject(obj))


Yeah, that's a bit silly.  Here is an updated version that changes that.


Do you know when you'll have a chance to take a look at the updated patch?

--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 3:50 PM, Peter Geoghegan  wrote:
> On Tue, Mar 21, 2017 at 12:06 PM, Robert Haas  wrote:
>> From my point of view, the main point is that having two completely
>> separate mechanisms for managing temporary files that need to be
>> shared across cooperating workers is not a good decision.  That's a
>> need that's going to come up over and over again, and it's not
>> reasonable for everybody who needs it to add a separate mechanism for
>> doing it.  We need to have ONE mechanism for it.
>
> Obviously I understand that there is value in code reuse in general.
> The exact extent to which code reuse is possible here has been unclear
> throughout, because it's complicated for all kinds of reasons. That's
> why Thomas and I had 2 multi-hour Skype calls all about it.

I agree that the extent to which code reuse is possible here is
somewhat unclear, but I am 100% confident that the answer is non-zero.
You and Thomas both need BufFiles that can be shared across multiple
backends associated with the same ParallelContext.  I don't understand
how you can argue that it's reasonable to have two different ways of
sharing the same kind of object across the same set of processes.  And
if that's not reasonable, then somehow we need to come up with a
single mechanism that can meet both your requirements and Thomas's
requirements.

>> It's just not OK in my book for a worker to create something that it
>> initially owns and then later transfer it to the leader.
>
> Isn't that an essential part of having a refcount, in general? You
> were the one that suggested refcounting.

No, quite the opposite.  My point in suggesting adding a refcount was
to avoid needing to have a single owner.  Instead, the process that
decrements the reference count to zero becomes responsible for doing
the cleanup.  What you've done with the ref count is use it as some
kind of medium for transferring responsibility from backend A to
backend B; what I want is to allow backends A, B, C, D, E, and F to
attach to the same shared resource, and whichever one of them happens
to be the last one out of the room shuts off the lights.

>> The cooperating backends should have joint ownership of the objects from
>> the beginning, and the last process to exit the set should clean up
>> those resources.
>
> That seems like a facile summary of the situation. There is a sense in
> which there is always joint ownership of files with my design. But
> there is also a sense is which there isn't, because it's impossible to
> do that while not completely reinventing resource management of temp
> files. I wanted to preserve resowner.c ownership of fd.c segments.

As I've said before, I think that's an anti-goal.  This is a different
problem, and trying to reuse the solution we chose for the
non-parallel case doesn't really work.  resowner.c could end up owning
a shared reference count which it's responsible for decrementing --
and then decrementing it removes the file if the result is zero.  But
it can't own performing the actual unlink(), because then we can't
support cases where the file may have multiple readers, since whoever
owns the unlink() might try to zap the file out from under one of the
others.

> You maintain that it's better to have the leader unlink() everything
> at the end, and suppress the errors when that doesn't work, so that
> that path always just plows through.

I don't want the leader to be responsible for anything.  I want the
last process to detach to be responsible for cleanup, regardless of
which process that ends up being.  I want that for lots of good
reasons which I have articulated including (1) it's how all other
resource management for parallel query already works, e.g. DSM, DSA,
and group locking; (2) it avoids the need for one process to sit and
wait until another process assumes ownership, which isn't a feature
even if (as you contend, and I'm not convinced) it doesn't hurt much;
and (3) it allows for use cases where multiple processes are reading
from the same shared BufFile without the risk that some other process
will try to unlink() the file while it's still in use.  The point for
me isn't so much whether unlink() ever ignores errors as whether
cleanup (however defined) is an operation guaranteed to happen exactly
once.

> I disagree with that. It is a
> trade-off, I suppose. I have now run out of time to work through it
> with you or Thomas, though.

Bummer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add missing support for new node fields

2017-03-21 Thread Fabien COELHO


Hello Robert,


IMHO, what would be a lot more useful than something that generates
{read,equal,copy,out}funcs.c automatically would be something that
just checks them for trivial errors of omission.


Hmmm. Checking for errors is actually more complicated than generating the
function: basically you have to generate the function, at least
implicitely, then parse the actual functions, then compare the two, then 
generate meaningful messages. Thrice the work.



The idea would be that if you added a field that wasn't supposed to be
copied, you'd have to add something to copyfuncs.c that said, e.g.

/* NOTCOPIED: mymember */


Yep, I was thinking of maybe use directives added to header files to 
handle some special cases, but the real special cases would maybe more 
readily turned to manual to keep a simpler generation script.


I do not fancy relying on another representation/language because of Tom's 
objection that it would mean another language to learn, and I do not think 
that it is desirable in pg.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add missing support for new node fields

2017-03-21 Thread Fabien COELHO


Hello Andres,


It is not done yet, but it looks that it can work in the end with limited
effort. Currently it works for copy & equal.


It'd have to do out/read as well imo.


Sure. This part is WIP, though.


Is there some interest to generate the x00kB of sources rather than edit
them everytime, or forgetting it from time to time, or does everyone like it
as it is?


From my POV yes.  But it's not quite as trivial as just generating it
based on fields. Some fields are intentionally skipped, e.g. location,
for equalfuncs, but not copy/out/readfuncs. So there needs to be a way
to specify such special rules.


Indeed, I noticed these particularities... For some case it can be 
automated with limited input. For really special cases (eg a few read/out 
functions) , the script is told to skip some node types and the special 
manual version is taken from the file instead of being generated.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Monitoring roles patch

2017-03-21 Thread Denish Patel
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Looks good.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Generic type subscripting

2017-03-21 Thread Dmitry Dolgov
> On 21 March 2017 at 18:16, David Steele  wrote:
>
> This thread has been idle for over a week.

Yes, sorry for the late reply. I'm still trying to find a better solution
for
some of the problems, that arose in this patch.

> On 15 March 2017 at 00:10, Tom Lane  wrote:
> Dmitry Dolgov <9erthali...@gmail.com> writes:
>
> I looked through this a bit.
>

Thank you for the feedback.

> > On 10 March 2017 at 06:20, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:
> > I see a possible problem here:  This design only allows one subscripting
> > function.  But what you'd really want in this case is at least two: one
> > taking an integer type for selecting by array index, and one taking text
> > for selecting by field name.
>
> I would guess that what we really want for jsonb is the ability to
> intermix integer and text subscripts, so that
>  jsonbcol['foo'][42]['bar']
> would extract the "bar" field of an object in position 42 of an
> array in field "foo" of the given jsonb value.
>

Maybe I misunderstood you, but isn't it already possible?

```
=# select ('{"a": [{"b": 1}, {"c": 2}]}'::jsonb)['a'][0]['b'];
 jsonb
---
 1
(1 row)

=# select * from test;
data
-
 {"a": [{"b": 1}, {"c": 2}]}
(1 row)

=# update test set data['a'][0]['b'] = 42;
UPDATE 1

=# select * from test;
data
-
 {"a": [{"b": 42}, {"c": 2}]}
(1 row)
```

> I'm not totally excited about the naming you've chosen though,
> particularly the function names "array_subscripting()" and
> "jsonb_subscripting()" --- those are too generic, and a person coming to
> them for the first time would probably expect that they actually execute
> subscripting, when they do no such thing.  Names like
> "array_subscript_parse()" might be better.  Likewise the name of the
> new pg_type column doesn't really convey what it does, though I suppose
> "typsubscriptparse" is too much of a mouthful.  "typsubparse" seems short
> enough but might be confusing too.

It looks quite tempting for me to replace the word "subscript" by an
abbreviation all over the patch. Then it will become something like
"typsbsparse" which is not that mouthful, but I'm not sure if it will be
easily
recognizable.

> I wonder also if we should try to provide some helper functions rather
> than expecting every data type to know all there is to know about parsing
> and execution of subscripting.  Not sure what would be helpful, however.

I don't really see what details we can hide behind this helper, because
almost
all code there is type specific (e.g. to check if subscript indexes are
correct), can you elaborate on that?

> The documentation needs a lot of work of course, and I do not think
> you're doing either yourself or anybody else any favors with the proposed
> additions to src/tutorial/.

Yes, unfortunately, I forget to update documentation from the previous
version
of the patch. I'll fix it soon in the next version.

> Aren't SBS_VALIDATION and SBS_EXEC just hangovers from the previous
> design?  They're still in parse_node.h, and they're still mentioned in
> the docs, but I don't see them used in actual code anywhere.

Yes, these are from the previous version too, I'll remove them.

> Another question here is whether every datatype will be on board
> with the current rules about null subscripts, or whether we need to
> delegate null-handling to the datatype-specific execution function.
> I'm not sure; it would complicate the API significantly for what might be
> useless flexibility.

It looks for me that it's a great idea to perform null-handling inside
datatype
specific code and I'm not sure, what would be complicated? All necessary
information for that is already in `SubscriptingExecData` (I just have to
use
`eisnull` in a different way).

> I do not think that the extra SubscriptingBase data structure is paying
> for itself either; you're taking a hit in readability from the extra level
> of struct, and as far as I can find it's not buying you one single thing,
> because there's no code that operates on a SubscriptingBase argument.
> I'd just drop that idea and make two independent struct types, or else
> stick with the original ArrayRef design that made one struct serve both
> purposes.  (IOW, my feeling that a separate assign struct would be a
> readability improvement isn't exactly getting borne out by what you've
> done here.  But maybe there's a better way to do that.)

I'm thinking to replace these structures by more meaningful ones, something
like:

```
typedef struct SubscriptContainerRef
{
Expr xpr;
Oid refcontainertype;
Oid refelemtype;
int32 reftypmod;
Oid refcollid;
} SubscriptBase;

typedef struct SubscriptExtractRef
{
Exprxpr;
SubscriptContainerRef *containerExpr;
List*refupperindexpr;
List*reflowerindexpr;
Oidrefevalfunc;
} SubscriptExtractRef;

typedef struct SubscriptAssignRef
{

Re: [HACKERS] identity columns

2017-03-21 Thread Vitaly Burovoy
I haven't seen a patch (I'll do it later), just few notes:

On 3/21/17, Peter Eisentraut  wrote:
 3. Strange error (not about absence of a column; but see pp.5 and 8):
 test=# ALTER TABLE idnt ALTER COLUMN o ADD GENERATED ALWAYS AS
 IDENTITY;
 ERROR:  identity column type must be smallint, integer, or bigint
>>>
>>> What's wrong with that?
>>
>> The column mentioned there does not exist. Therefore the error should
>> be about it, not about a type of absent column:
>
> This was already fixed in the previous version.

I've just checked and still get an error about a type, not about
absence of a column:
test=# CREATE TABLE itest3 (a int generated by default as identity, b text);
CREATE TABLE
test=# ALTER TABLE itest3 ALTER COLUMN o ADD GENERATED ALWAYS AS IDENTITY;
ERROR:  identity column type must be smallint, integer, or bigint

>> Since we extend the spec by "ADD GENERATED...", "SEQUENCE NAME" and
>> allow more than one identity column, why can't we extend it by
>> allowing "SET GENERATED" for non-identity columns and drop "ADD
>> GENERATED..."?
>
> SQL commands generally don't work that way.  Either they create or they
> alter.  There are "OR REPLACE" options when you do both.

I'd agree with you if we are talking about database's objects like
tables, functions, columns etc.

> So I think it
> is better to keep these two things separate.  Also, while you argue that
> we *could* do it the way you propose, I don't really see an argument why
> it would be better.

My argument is consistency.
Since IDENTITY is a property of a column (similar to DEFAULT, NOT
NULL, attributes, STORAGE, etc.), it follows a different rule: it is
either set or not set. If it did not set before, the "SET" DDL "adds"
it, if that property already present, the DDL replaces it.
There is no "ADD" clause in DDLs like "...ALTER table ALTER column..."
(only "SET", "RESET" and "DROP")[2].
Your patch introduces the single DDL version with "...ALTER column
ADD..." for a property.

>> 16. changing a type does not change an underlying sequence type (its
>> limits):
>
> It does change the type, but changing the type doesn't change the
> limits.  That is a property of how ALTER SEQUENCE works, which was
> separately discussed.

Are you about the thread[1]? If so, I'd say the current behavior is not good.
I sent an example with users' bad experience who will know nothing
about sequences (because they'll deal with identity columns).
Would it be better to change bounds of a sequence if they match the
bounds of an old type (to the bounds of a new type)?

> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


[1] 
https://www.postgresql.org/message-id/flat/898deb94-5265-37cf-f199-4f79ef864...@2ndquadrant.com
[2] https://www.postgresql.org/docs/current/static/sql-altertable.html
-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin autosummarization -- autovacuum "work items"

2017-03-21 Thread Alvaro Herrera
Thomas Munro wrote:

> What is your motivation for using DSA?  It seems you are creating an
> area and then using it to make exactly one allocation of a constant
> size known up front to hold your fixed size workitems array.  You
> don't do any dynamic allocation at runtime, apart from the detail that
> it happens to allocated on demand.  Perhaps it would make sense if you
> had a separate space per database or something like that, so that the
> shared memory need would be dynamic?

Well, the number of work items is currently fixed; but if you have many
BRIN indexes, then you'd overflow (lose requests).  By using DSA I am
making it easy to patch this afterwards so that an arbitrary number of
requests can be recorded.  

> It looks like outstanding autosummarisation work will be forgotten if
> you restart before it is processed.

That's true.  However, it would be easy to make index scans also request
work items if they find a full page range that should have been
summarized, so if they are lost, it's not a big deal.

> Over in another thread[1] we
> exchanged emails on another way to recognise that summarisation work
> needs to be done, if we are only interested in unsummarised ranges at
> the end of the heap.  I haven't studied BRIN enough to know if that is
> insufficient: can you finish up with unsummarised ranges not in a
> contiguous range at the end of the heap?

If we include my other patch to remove the index tuple for a certain
range, then yes, it can happen.  (That proposed patch requires manual
action, but range invalidation could also be invoked automatically when,
say, a certain number of tuples are removed from a page range.)

> If so, perhaps the BRIN
> index itself should also have a way to record that certain non-final
> ranges are unsummarised but should be summarised asynchronously?

I think this is unnecessary, and would lead to higher operating
overhead.  With this patch, it's very cheap to file a work item.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Bruce Momjian
On Tue, Mar 21, 2017 at 04:56:16PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > On Tue, Mar 21, 2017 at 04:43:58PM -0300, Alvaro Herrera wrote:
> > > Bruce Momjian wrote:
> > > 
> > > > I don't think it makes sense to try and save bits and add complexity
> > > > when we have no idea if we will ever use them,
> > > 
> > > If we find ourselves in dire need of additional bits, there is a known
> > > mechanism to get back 2 bits from old-style VACUUM FULL.  I assume that
> > > the reason nobody has bothered to write the code for that is that
> > > there's no *that* much interest.
> > 
> > We have no way of tracking if users still have pages that used the bits
> > via pg_upgrade before they were removed.
> 
> Yes, that's exactly the code that needs to be written.

Yes, but once it is written it will take years before those bits can be
used on most installations.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin autosummarization -- autovacuum "work items"

2017-03-21 Thread Alvaro Herrera
Thomas Munro wrote:

> Another thought about this design:  Why autovacuum?

One reason is that autovacuum is already there, so it's convenient to
give it the responsibility for this kind of task.  Another reason is
that autovacuum is already doing this, via vacuum.  I don't see the
need to have a completely different process set for tasks that belong to
the system's cleanup process.

With this infrastructure, we could have other types of individual tasks
that could be run by autovacuum.  GIN pending list cleanup, for
instance, or VM bit setting.  Both of those are currently being doing
whenever VACUUM fires, but only because at the time they were written
there was no other convenient place to hook them onto.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Alvaro Herrera
Bruce Momjian wrote:
> On Tue, Mar 21, 2017 at 04:43:58PM -0300, Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> > 
> > > I don't think it makes sense to try and save bits and add complexity
> > > when we have no idea if we will ever use them,
> > 
> > If we find ourselves in dire need of additional bits, there is a known
> > mechanism to get back 2 bits from old-style VACUUM FULL.  I assume that
> > the reason nobody has bothered to write the code for that is that
> > there's no *that* much interest.
> 
> We have no way of tracking if users still have pages that used the bits
> via pg_upgrade before they were removed.

Yes, that's exactly the code that needs to be written.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-21 Thread Peter Geoghegan
On Tue, Mar 21, 2017 at 12:06 PM, Robert Haas  wrote:
> From my point of view, the main point is that having two completely
> separate mechanisms for managing temporary files that need to be
> shared across cooperating workers is not a good decision.  That's a
> need that's going to come up over and over again, and it's not
> reasonable for everybody who needs it to add a separate mechanism for
> doing it.  We need to have ONE mechanism for it.

Obviously I understand that there is value in code reuse in general.
The exact extent to which code reuse is possible here has been unclear
throughout, because it's complicated for all kinds of reasons. That's
why Thomas and I had 2 multi-hour Skype calls all about it.

> It's just not OK in my book for a worker to create something that it
> initially owns and then later transfer it to the leader.

Isn't that an essential part of having a refcount, in general? You
were the one that suggested refcounting.

> The cooperating backends should have joint ownership of the objects from
> the beginning, and the last process to exit the set should clean up
> those resources.

That seems like a facile summary of the situation. There is a sense in
which there is always joint ownership of files with my design. But
there is also a sense is which there isn't, because it's impossible to
do that while not completely reinventing resource management of temp
files. I wanted to preserve resowner.c ownership of fd.c segments.

You maintain that it's better to have the leader unlink() everything
at the end, and suppress the errors when that doesn't work, so that
that path always just plows through. I disagree with that. It is a
trade-off, I suppose. I have now run out of time to work through it
with you or Thomas, though.

> But even if were true that the waits will always be brief, I still
> think the way you've done it is a bad idea, because now tuplesort.c
> has to know that it needs to wait because of some detail of
> lower-level resource management about which it should not have to
> care.  That alone is a sufficient reason to want a better approach.

There is already a point at which the leader needs to wait, so that it
can accumulate stats that nbtsort.c cares about. So we already need a
leader wait point within nbtsort.c (that one is called directly by
nbtsort.c). Doesn't seem like too bad of a wart to have the same thing
for workers.

>> I believe that the main reason that you like the design I came up with
>> on the whole is that it's minimally divergent from the serial case.
>
> That's part of it, I guess, but it's more that the code you've added
> to do parallelism here looks an awful lot like what's gotten added to
> do parallelism in other cases, like parallel query.  That's probably a
> good sign.

It's also a good sign that it makes CREATE INDEX approximately 3 times faster.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Bruce Momjian
On Tue, Mar 21, 2017 at 04:43:58PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > I don't think it makes sense to try and save bits and add complexity
> > when we have no idea if we will ever use them,
> 
> If we find ourselves in dire need of additional bits, there is a known
> mechanism to get back 2 bits from old-style VACUUM FULL.  I assume that
> the reason nobody has bothered to write the code for that is that
> there's no *that* much interest.

We have no way of tracking if users still have pages that used the bits
via pg_upgrade before they were removed.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Alvaro Herrera
Bruce Momjian wrote:

> I don't think it makes sense to try and save bits and add complexity
> when we have no idea if we will ever use them,

If we find ourselves in dire need of additional bits, there is a known
mechanism to get back 2 bits from old-style VACUUM FULL.  I assume that
the reason nobody has bothered to write the code for that is that
there's no *that* much interest.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Create replication slot in pg_basebackup if requested and not yet present

2017-03-21 Thread Robert Haas
On Sun, Mar 19, 2017 at 12:01 PM, Magnus Hagander  wrote:
> I think maybe we should output a message when the slot is created, at least
> in verbose mode, to make sure people realize that happened. Does that seem
> reasonable?

Slots are great until you leave one lying around by accident.  I'm
afraid that no matter what we do, we're going to start getting
complaints from people who mess that up.  For example, somebody
creates a replica using the new super-easy method, and then blows it
away without dropping the slot from the master, and then days or weeks
later pg_wal fills up and takes the server down.  The user says, oh,
these old write-ahead log files should have gotten removed, and
removes them all.  Oops.

So I tend to think that there should always be some explicit user
action to cause the creation of a slot, like --create-slot-if-needed
or --create-slot=name.  That still won't prevent careless use of that
option but it's less dangerous than assuming that a user who refers to
a nonexistent slot intended to create it when, perhaps, they just
typo'd it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 6:36 AM, Dilip Kumar  wrote:
> On Tue, Mar 21, 2017 at 3:36 PM, Rafia Sabih
>  wrote:
>> On Wed, Mar 15, 2017 at 8:55 PM, Robert Haas  wrote:
>>> Note this:
>>>
>>> if (completed || !fcache->returnsSet)
>>> postquel_end(es);
>>>
>>> When the SQL function doesn't return a set, then we can allow
>>> parallelism even when lazyEval is set, because we'll only call
>>> ExecutorStart() once.  But my impression is that something like this:
>
> How about taking the decision of execute_once based on
> fcache->returnsSet instead of based on lazyEval?
>
> change
> + ExecutorRun(es->qd, ForwardScanDirection, count, !es->lazyEval);
> to
> + ExecutorRun(es->qd, ForwardScanDirection, count, !fcache->returnsSet);
>
> IMHO, Robert have the same thing in mind?

Yeah, something like that.

>>SELECT * FROM blah() LIMIT 3
>>
>>...will trigger three separate calls to ExecutorRun(), which is a
>>problem if the plan is a parallel plan.
>
> And you also need to test this case what Robert have mentioned up thread.

+1

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 9:04 AM, Stephen Frost  wrote:
> In short, I'm also concerned about this change to make WAL file names no
> longer match up with LSNs and also about the odd stepping that you get
> as a result of this change when it comes to WAL file names.

OK, that's a bit surprising to me, but what do you want to do about
it?  If you take the approach that Beena did, then you lose the
correspondence with LSNs, which is admittedly not great but there are
already helper functions available to deal with LSN -> filename
mappings and I assume those will continue to work. If you take the
opposite approach, then WAL filenames stop being consecutive, which
seems to me to be far worse in terms of user and tool confusion.  Also
note that, both currently and with the patch, you can also reduce the
WAL segment size.  David's proposed naming scheme doesn't handle that
case, I think, and I think it would be all kinds of a bad idea to use
one file-naming approach for segments < 16MB and a separate approach
for segments > 16MB.  That's not making anything easier for users or
tool authors.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-21 Thread Robert Haas
On Tue, Mar 14, 2017 at 6:10 PM, Peter Geoghegan  wrote:
> On Tue, Mar 14, 2017 at 2:48 PM, Peter Geoghegan  wrote:
>> I think that that's safe, but it is a little disappointing that it
>> does not allow us to skip work in the case that you really had in mind
>> when writing the patch. Better than nothing, though, and perhaps still
>> a good start. I would like to hear other people's opinions.
>
> Hmm. So, the SQL-callable function txid_current() exports a 64-bit
> XID, which includes an "epoch". If PostgreSQL used these 64-bit XIDs
> natively, we'd never need to freeze. Obviously we don't do that
> because the per-tuple overhead of visibility information is already
> high, and that would make it much worse. But, we can easily afford the
> extra overhead in a completely dead page. Maybe we can overcome the
> _bt_page_recyclable() limitation by being cleverer about how it
> determines if recycling is safe -- it can examine epoch, too. This
> would also be required in the similar function
> vacuumRedirectAndPlaceholder(), which is a part of SP-GiST.
>
> We already have BTPageOpaqueData.btpo, a union whose contained type
> varies based on the page being dead. We could just do the same with
> some other field in that struct, and then store epoch there. Clearly
> nobody really cares about most data that remains on the page. Index
> scans just need to be able to land on it to determine that it's dead,
> and VACUUM needs to be able to determine whether or not there could
> possibly be such an index scan at the time it considers recycling..
>
> Does anyone see a problem with that?

Wouldn't it break on-disk compatibility with existing btree indexes?

I think we're still trying to solve a problem that Simon postulated in
advance of evidence that shows how much of a problem it actually is.
Not only might that be unnecessary, but if we don't have a test
demonstrating the problem, we also don't have a test demonstrating
that a given approach fixes it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 2:03 PM, Peter Geoghegan  wrote:
> I think that since the comment refers to code from before 1999, it can
> go. Any separate patch to remove it would have an entirely negative
> linediff.

It's a good general principle that a patch should do one thing well
and not make unrelated changes.  I try hard to adhere to that
principle in my commits, and I think other committers generally do
(and should), too.  Of course, different people draw the line in
different places.  If you can convince another committer to include
that change in their commit of this patch, well, that's not my cup of
tea, but so be it.  If you want me to consider committing this, you're
going to have to submit that part separately, preferably on a separate
thread with a suitably descriptive subject line.

> Obviously iff you write to the file in the leader, there is little
> that the worker can do afterwards, but it's not a given that you'd
> want to do that, and this patch actually never does. You could equally
> well say that PHJ fails to provide for my requirement for having the
> leader write to the files sensibly in order to recycle blocks, a
> requirement that its shared BufFile mechanism expressly does not
> support.

>From my point of view, the main point is that having two completely
separate mechanisms for managing temporary files that need to be
shared across cooperating workers is not a good decision.  That's a
need that's going to come up over and over again, and it's not
reasonable for everybody who needs it to add a separate mechanism for
doing it.  We need to have ONE mechanism for it.

The second point is that I'm pretty convinced that the design you've
chosen is fundamentally wrong.  I've attempt to explain that multiple
times, starting about three months ago with
http://postgr.es/m/ca+tgmoyp0vzpw64dfmqt1jhy6szyavjoglkj3ermzzzn2f9...@mail.gmail.com
and continuing across many subsequent emails on multiple threads.
It's just not OK in my book for a worker to create something that it
initially owns and then later transfer it to the leader.  The
cooperating backends should have joint ownership of the objects from
the beginning, and the last process to exit the set should clean up
those resources.

>> That would cut hundreds of
>> lines from this patch with no real disadvantage that I can see --
>> including things like worker_wait(), which are only needed because of
>> the shortcomings of the underlying mechanism.
>
> I think it would definitely be a significant net gain in LOC. And,
> worker_wait() will probably be replaced by the use of the barrier
> abstraction anyway.

No, because if you do it Thomas's way, the worker can exit right away,
without waiting.  You don't have to wait via a different method; you
escape waiting altogether.  I understand that your point is that the
wait will always be brief, but I think that's probably an optimistic
assumption and definitely an unnecessary assumption.  It's optimistic
because there is absolutely no guarantee that all workers will take
the same amount of time to sort the data they read.  It is absolutely
not the case that all data sets sort at the same speed.  Because of
the way parallel sequential scan works, we're somewhat insulated from
that; workers that sort faster will get a larger chunk of the table.
However, that only means that workers will finish generating their
sorted runs at about the same time, not that they will finish merging
at the same time.  And, indeed, if some workers end up with more data
than others (so that they finish building runs at about the same time)
then some will probably take longer to complete the merging than
others.

But even if were true that the waits will always be brief, I still
think the way you've done it is a bad idea, because now tuplesort.c
has to know that it needs to wait because of some detail of
lower-level resource management about which it should not have to
care.  That alone is a sufficient reason to want a better approach.

I completely accept that whatever abstraction we use at the BufFile
level has to be something that can be plumbed into logtape.c, and if
Thomas's mechanism can't be bolted in there in a sensible way then
that's a problem.  But I feel quite strongly that the solution to that
problem isn't to adopt the approach you've taken here.

>> + * run.  Parallel workers always use quicksort, however.
>>
>> Comment fails to mention a reason.
>
> Well, I don't think that there is any reason to use replacement
> selection at all, what with the additional merge heap work last year.
> But, the theory there remains that RS is good when you can get one big
> run and no merge. You're not going to get that with parallel sort in
> any case, since the leader must merge. Besides, merging in the workers
> happens in the workers. And, the backspace requirement of 32MB of
> workMem per participant pretty much eliminates any use of RS that
> you'd get otherwise.

So, please mention that briefly in 

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Bruce Momjian
On Tue, Mar 21, 2017 at 11:54:25PM +0530, Pavan Deolasee wrote:
> We can also save HEAP_WARM_UPDATED flag since this is required only for
> abort-handling case. We can find a way to push that information down to the 
> old
> tuple if UPDATE aborts and we detect the broken chain. Again, not fully 
> thought
> through, but doable. Of course, we will have to carefully evaluate all code
> paths and make sure that we don't lose that information ever.
> 
> If the consumption of bits become a deal breaker then I would first trade the
> HEAP_LATEST_TUPLE bit and then HEAP_WARM_UPDATED just from correctness
> perspective.

I don't think it makes sense to try and save bits and add complexity
when we have no idea if we will ever use them, but again, I am back to
my original question of whether we have done sufficient research, and if
everyone says "yes", I am find with that.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Bruce Momjian
On Tue, Mar 21, 2017 at 11:45:09PM +0530, Pavan Deolasee wrote:
> Early in the discussion we talked about allowing multiple changes per
> WARM chain if they all changed the same index and were in the same
> direction so there were no duplicates, but it was complicated.  There
> was also discussion about checking the index during INSERT/UPDATE to see
> if there was a duplicate.  However, those ideas never led to further
> discussion.
> 
> 
> Well, once I started thinking about how to do vacuum etc, I realised that any
> mechanism which allows unlimited (even handful) updates per chain is going to
> be very complex and error prone. But if someone has ideas to do that, I am
> open. I must say though, it will make an already complex problem even more
> complex.

Yes, that is where we got stuck.  Have enough people studied the issue
to know that there are no simple answers?

> I know the current patch yields good results, but only on a narrow test
> case,
> 
> 
> Hmm. I am kinda surprised you say that because I never thought it was a narrow
> test case that we are targeting here. But may be I'm wrong.

Well, it is really a question of how often you want to do a second WARM
update (not possible) vs. the frequency of lazy vacuum.  I assumed that
would be a 100X or 10kX difference, but I am not sure myself either.  My
initial guess was that only allowing a single WARM update between lazy
vacuums would show no improvementin in real-world workloads, but maybe I
am wrong.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Bruce Momjian
On Tue, Mar 21, 2017 at 07:05:15PM +0100, Petr Jelinek wrote:
> >> Well, I don't want to rule it out either, but if we do a release to
> >> which you can't pg_upgrade, it's going to be really painful for a lot
> >> of users.  Many users can't realistically upgrade using pg_dump, ever.
> >> So they'll be stuck on the release before the one that breaks
> >> compatibility for a very long time.
> > 
> > Right.  If we weren't setting tuple and tid bits we could improve it
> > easily in PG 11, but if we use them for a single-change WARM chain for
> > PG 10, we might need bits that are not available to improve it later.
> > 
> 
> I thought there is still couple of bits available.

Yes, there are.  The issue is that we don't know how we would improve it
so we don't know how many bits we need, and my concern is that we
haven't discussed the improvement ideas enough to know we have done the
best we can for PG 10.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting huge pages on Windows

2017-03-21 Thread David Steele

On 3/8/17 8:36 PM, Tsunakawa, Takayuki wrote:

From: pgsql-hackers-ow...@postgresql.org

[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Ashutosh Sharma
To start with, I ran the regression test-suite and didn't find any failures.
But, then I am not sure if huge_pages are getting used or not. However,
upon checking the settings for huge_pages and I found it as 'on'. I am
assuming, if huge pages is not being used due to shortage of large pages,
it should have fallen back to non-huge pages.


You are right, the server falls back to non-huge pages when the large pages run 
short.


I also ran the pgbench tests on read-only workload and here are the results
I got.

pgbench -c 4 -j 4 - T 600 bench

huge_pages=on, TPS = 21120.768085
huge_pages=off, TPS = 20606.288995


Thanks.  It's about 2% improvement, which is the same as what I got.


From: Thomas Munro [mailto:thomas.mu...@enterprisedb.com]

The line beginning 'Huge pages are known as...' has been accidentally
duplicated.


Oops, how careless I was.  Fixed.  As Ashutosh referred, I added a very simple 
suggestion to use Windows Group Policy tool.


Amit, Magnus, you are signed up as reviewers for this patch.  Do you 
know when you'll have a chance to take a look?


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

2017-03-21 Thread David Steele

On 3/7/17 9:42 PM, Pavan Deolasee wrote:


Fair point. I'm not going to "persist" with the idea too long. It seemed
like a good, low-risk feature to me which can benefit certain use cases
quite reasonably. It's not uncommon to create indexes (or reindex
existing indexes to remove index bloats) on extremely large tables and
avoiding a second heap scan can hugely benefit such cases. Holding up
the patch for something for which we don't even have a proposal yet
seemed a bit strange at first, but I see the point.

Anyways, for a recently vacuumed table of twice the size of RAM and on a
machine with SSDs, the patched CIC version runs about 25% faster. That's
probably the best case scenario.

I also realised that I'd attached a slightly older version of the patch.
So new version attached, but I'll remove the patch from CF if I don't
see any more votes in favour of the patch.


I have marked this submission "Returned with Feedback".  Please feel 
free to resubmit to a future commitfest or open a discussion on hackers 
if you develop a new approach.


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Make pg_stop_backup() archive wait optional

2017-03-21 Thread Fujii Masao
On Tue, Mar 21, 2017 at 11:03 AM, Tsunakawa, Takayuki
 wrote:
> From: David Steele [mailto:da...@pgmasters.net]
>> Well, that's embarrassing.  When I recreated the function to add defaults
>> I messed up the AS clause and did not pay attention to the results of the
>> regression tests, apparently.
>>
>> Attached is a new version rebased on 88e66d1.  Catalog version bump has
>> also been omitted.
>
> I was late to reply because yesterday was a national holiday in Japan.  I 
> marked this as ready for committer.  The patch applied cleanly and worked as 
> expected.

The patch basically looks good to me, but one comment is;
backup.sgml (at least the description for "Making a non-exclusive
low level backup) seems to need to be updated.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: recursive json_populate_record()

2017-03-21 Thread Andrew Dunstan


On 03/21/2017 01:37 PM, David Steele wrote:
> On 3/16/17 11:54 AM, David Steele wrote:
>> On 2/1/17 12:53 AM, Michael Paquier wrote:
>>> On Thu, Jan 26, 2017 at 6:49 AM, Tom Lane  wrote:
 Nikita Glukhov  writes:
> On 25.01.2017 23:58, Tom Lane wrote:
>> I think you need to take a second look at the code you're producing
>> and realize that it's not so clean either.  This extract from
>> populate_record_field, for example, is pretty horrid:

> But what if we introduce some helper macros like this:

> #define JsValueIsNull(jsv) \
>  ((jsv)->is_json ? !(jsv)->val.json.str \
>  : !(jsv)->val.jsonb || (jsv)->val.jsonb->type == jbvNull)

> #define JsValueIsString(jsv) \
>  ((jsv)->is_json ? (jsv)->val.json.type == JSON_TOKEN_STRING \
>  : (jsv)->val.jsonb && (jsv)->val.jsonb->type == jbvString)

 Yeah, I was wondering about that too.  I'm not sure that you can make
 a reasonable set of helper macros that will fix this, but if you want
 to try, go for it.

 BTW, just as a stylistic thing, I find "a?b:c||d" unreadable: I have
 to go back to the manual every darn time to convince myself whether
 that means (a?b:c)||d or a?b:(c||d).  It's better not to rely on
 the reader (... or the author) having memorized C's precedence rules
 in quite that much detail.  Extra parens help.
>>>
>>> Moved to CF 2017-03 as discussion is going on and more review is
>>> needed on the last set of patches.
>>>
>>
>> The current patches do not apply cleanly at cccbdde:
>>
>> $ git apply ../other/0001-introduce-JsonContainerXxx-macros-v04.patch
>> error: patch failed: src/backend/utils/adt/jsonb_util.c:328
>> error: src/backend/utils/adt/jsonb_util.c: patch does not apply
>> error: patch failed: src/backend/utils/adt/jsonfuncs.c:1266
>> error: src/backend/utils/adt/jsonfuncs.c: patch does not apply
>> error: patch failed: src/include/utils/jsonb.h:218
>> error: src/include/utils/jsonb.h: patch does not apply
>>
>> In addition, it appears a number of suggestions have been made by Tom
>> that warrant new patches.  Marked "Waiting on Author".
>
> This thread has been idle for months since Tom's review.
>
> The submission has been marked "Returned with Feedback".  Please feel
> free to resubmit to a future commitfest.
>
>

Please revive. I am planning to look at this later this week.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze on Cygwin w/ concurrency

2017-03-21 Thread Andrew Dunstan


On 03/20/2017 11:47 PM, Noah Misch wrote:
> "pgbench -i -s 50; pgbench -S -j2 -c16 -T900 -P5" freezes consistently on
> Cygwin 2.2.1 and Cygwin 2.6.0.  (I suspect most other versions are affected.)
> I've pinged[1] the Cygwin bug thread with some additional detail.  If a Cygwin
> buildfarm member starts using --enable-tap-tests, you may see failures in the
> pgbench test suite.  (lorikeet used --enable-tap-tests from 2017-03-18 to
> 2017-03-20, but it failed before reaching the pgbench test suite.)  Curious
> that "make check" has too little concurrency to see more effects from this.


Yeah, I abandoned --enable-tap-test on lorikeet, didn't have time to get
to the bottom of the problems. Glad I'm not totally alone keeping this
alive.

cheers

andrew


-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Pavan Deolasee
On Tue, Mar 21, 2017 at 10:34 PM, Robert Haas  wrote:

> On Tue, Mar 21, 2017 at 12:49 PM, Bruce Momjian  wrote:
> > On Tue, Mar 21, 2017 at 09:25:49AM -0400, Robert Haas wrote:
> >> On Tue, Mar 21, 2017 at 8:41 AM, Pavan Deolasee
> >> > TBH I see many artificial scenarios here. It will be very useful if
> he can
> >> > rerun the query with some of these restrictions lifted. I'm all for
> >> > addressing whatever we can, but I am not sure if this test
> demonstrates a
> >> > real world usage.
> >>
> >> That's a very fair point, but if these patches - or some of them - are
> >> going to get committed then these things need to get discussed.  Let's
> >> not just have nothing-nothing-nothing giant unagreed code drop.
> >
> > First, let me say I love this feature for PG 10, along with
> > multi-variate statistics.
> >
> > However, not to be a bummer on this, but the persistent question I have
> > is whether we are locking ourselves into a feature that can only do
> > _one_ index-change per WARM chain before a lazy vacuum is required.  Are
> > we ever going to figure out how to do more changes per WARM chain in the
> > future, and is our use of so many bits for this feature going to
> > restrict our ability to do that in the future.
> >
> > I know we have talked about it, but not recently, and if everyone else
> > is fine with it, I am too, but I have to ask these questions.
>
> I think that's a good question.  I previously expressed similar
> concerns.  On the one hand, it's hard to ignore the fact that, in the
> cases where this wins, it already buys us a lot of performance
> improvement.  On the other hand, as you say (and as I said), it eats
> up a lot of bits, and that limits what we can do in the future.


I think we can save a bit few bits, at some additional costs and/or
complexity. It all depends on what matters us more. For example, we can
choose not to use HEAP_LATEST_TUPLE bit and instead always find the root
tuple the hard way. Since only WARM would ever need to find that
information, may be it's ok since WARM's other advantage will justify that.
Or we cache the information computed during earlier heap_prune_page call
and use that (just guessing that we can make it work, no concrete idea at
this moment).

We can also save HEAP_WARM_UPDATED flag since this is required only for
abort-handling case. We can find a way to push that information down to the
old tuple if UPDATE aborts and we detect the broken chain. Again, not fully
thought through, but doable. Of course, we will have to carefully evaluate
all code paths and make sure that we don't lose that information ever.

If the consumption of bits become a deal breaker then I would first trade
the HEAP_LATEST_TUPLE bit and then HEAP_WARM_UPDATED just from correctness
perspective.

Thanks,
Pavan
-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [WIP] RE: DECLARE STATEMENT setting up a connection in ECPG

2017-03-21 Thread David Steele

Hi Haribabu,

On 3/7/17 12:09 AM, Ideriha, Takeshi wrote:



I tried applying your patches. But it failed...
The error messages are as below.


Attached 004_declareStmt_test_v5.patch is a rebased one.
The rest of patches are same as older version.

Regards,
Ideriha, Takeshi


You are signed up to review this patch.  Do you know when you'll have a 
chance to do that?


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] perlcritic

2017-03-21 Thread David Steele

Hi Daniel,

On 3/6/17 12:02 PM, Dagfinn Ilmari Mannsåker wrote:

ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:


Hi Peter,

Peter Eisentraut  writes:


I posted this about 18 months ago but then ran out of steam. [ ] Here
is an updated patch.  The testing instructions below still apply.
Especially welcome would be ideas on how to address some of the places
I have marked with ## no critic.


Attached is a patch on top of yours that addresses all the ## no critic
annotations except RequireFilenameMatchesPackage, which can't be fixed
without more drastic reworking of the plperl build process.

Tested on perl 5.8.1 and 5.24.0 by configuring with --with-perl and
--enable-tap-tests followed by make check-world, and running pgindent
--build.


Attached is an updated version of the patch, in which
src/tools/msvc/gendef.pl actually compiles.  If someone on Windows could
test it, that would be great.


You are signed up to review this patch.  Do you know when you'll have a 
chance to do that?


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-21 Thread Ashutosh Sharma
Hi Jeff,

>
> I can confirm that that fixes the seg faults for me.

Thanks for confirmation.

>
> Did you mean you couldn't reproduce the problem in the first place, or that
> you could reproduce it and now the patch fixes it?  If the first of those, I
> forget to say you do have to wait for hot standby to reach a consistency and
> open for connections, and then connect to the standby ("psql -p 9874"),
> before the seg fault will be triggered.

I meant that I was not able to reproduce the issue on HEAD.

>
> But, there are places where hash_xlog_vacuum_get_latestRemovedXid diverges
> from btree_xlog_delete_get_latestRemovedXid, which I don't understand the
> reason for the divergence.  Is there a reason we dropped the PANIC if we
> have not reached consistency?

Well, I'm not quite sure how would standby allow any backend to
connect to it until it has reached to a consistent state. If you see
the definition of btree_xlog_delete_get_latestRemovedXid(), just
before consistency check there is a if-condition 'if
(CountDBBackends(InvalidOid) == 0)' which means
we are checking for consistent state only after knowing that there are
some backends connected to the standby. So, Is there a possibility of
having some backend connected to standby server without having it in
consistent state.

That is a case which should never happen, but
> it seems worth preserving the PANIC.  And why does this code get 'unused'
> from XLogRecGetBlockData(record, 0, ), while the btree code gets it from
> xlrec?  Is that because the record being replayed is structured differently
> between btree and hash, or is there some other reason?
>

Yes, That's right.  In case of btree index, we have used
XLogRegisterData() to add data to WAL record and to fetch the same we
use XLogRecGetData(). In case of hash index we have associated the
same data with some registered buffer using
XLogRegisterBufferData(0,...) and to fetch the same we use
XLogRecGetBlockData(0,...).  Now, if you see XLogRecordAssemble, it
first adds all the data assciated with registered buffers into the WAL
record followed by the main data (). Hence, the WAL record in  btree
and hash are organised differently.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Scan Mode in Hash Index

2017-03-21 Thread Jesper Pedersen

Hi,

On 02/14/2017 12:27 AM, Ashutosh Sharma wrote:

Currently, Hash Index scan works tuple-at-a-time, i.e. for every
qualifying tuple in a page, it acquires and releases the lock which
eventually increases the lock/unlock traffic. For example, if an index
page contains 100 qualified tuples, the current hash index scan has to
acquire and release the lock 100 times to read those qualified tuples
which is not good from performance perspective and it also impacts
concurency with VACUUM.

Considering above points, I would like to propose a patch that allows
hash index scan to work in page-at-a-time mode. In page-at-a-time
mode, once lock is acquired on a target bucket page, the entire page
is scanned and all the qualified tuples are saved into backend's local
memory. This reduces the lock/unlock calls for retrieving tuples from
a page. Moreover, it also eliminates the problem of re-finding the
position of the last returned index tuple and more importanly it
allows VACUUM to release lock on current page before moving to the
next page which eventually improves it's concurrency with scan.

Attached patch modifies hash index scan code for page-at-a-time mode.
For better readability, I have splitted it into 3 parts,



Due to the commits on master these patches applies with hunks.

The README should be updated to mention the use of page scan.

hash.h needs pg_indent.


1) 0001-Rewrite-hash-index-scans-to-work-a-page-at-a-time.patch: this
patch rewrites the hash index scan module to work in page-at-a-time
mode. It basically introduces two new functions-- _hash_readpage() and
_hash_saveitem(). The former is used to load all the qualifying tuples
from a target bucket or overflow page into an items array. The latter
one is used by _hash_readpage to save all the qualifying tuples found
in a page into an items array. Apart from that, this patch bascially
cleans _hash_first(), _hash_next and hashgettuple().



For _hash_next I don't see this - can you explain ?

+ *
+ * On failure exit (no more tuples), we release pin and set
+ * so->currPos.buf to InvalidBuffer.


+ * Returns true if any matching items are found else returns false.

s/Returns/Return/g


2) 0002-Remove-redundant-function-_hash_step-and-some-of-the.patch:
this patch basically removes the redundant function _hash_step() and
some of the unused members of HashScanOpaqueData structure.



Looks good.


3) 0003-Improve-locking-startegy-during-VACUUM-in-Hash-Index.patch:
this patch basically improves the locking strategy for VACUUM in hash
index. As the new hash index scan works in page-at-a-time, vacuum can
release the lock on previous page before acquiring a lock on the next
page, hence, improving hash index concurrency.



+* As the new hash index scan work in page at a time mode,

Remove 'new'.


I have also done the benchmarking of this patch and would like to
share the results for the same,

Firstly, I have done the benchmarking with non-unique values and i
could see a performance improvement of 4-7%. For the detailed results
please find the attached file 'results-non-unique values-70ff', and
ddl.sql, test.sql are test scripts used in this experimentation. The
detail of non-default GUC params and pgbench command are mentioned in
the result sheet. I also did the benchmarking with unique values at
300 and 1000 scale factor and its results are provided in
'results-unique-values-default-ff'.



I'm seeing similar results, and especially with write heavy scenarios.

Best regards,
 Jesper



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 2:03 PM, Petr Jelinek
 wrote:
> This is why I like the idea of pluggable storage, if we ever get that it
> would buy us ability to implement completely different heap format
> without breaking pg_upgrade.

You probably won't be surprised to hear that I agree.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables and relfilenode

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 1:21 PM, Simon Riggs  wrote:
>> The decision not to require the attribute numbers to match doesn't
>> necessarily mean we can't get rid of the Append node, though.  First
>> of all, in a lot of practical cases the attribute numbers will all
>> match.  Second, if they don't, the most that would be required is a
>> projection step, which could usually be done without a separate node
>> because most nodes are projection-capable.  And maybe not even that
>> much is needed; I'd have to go back and look at what Tom was worried
>> about the last time this came up.  (Hmm, maybe the problem had to do
>> with varnos matching, rather then attribute numbers?)
>
> There used to be some code there to fix them up, not sure where that went.

Me neither.  To be clear in case I haven't been already, I'm totally
fine with somebody doing the work to get rid of the Append node; I
just think it'll take some investigation and work that hasn't been
done yet.

(I'm also a little skeptical about the value of the work.  The Append
node doesn't cost much; what's expensive is that the planner isn't
smart about planning queries that involve Append nodes and so getting
rid of one can improve the whole plan shape.  But I think the answer
to that problem is optimizations like partition-wise join and
partition-wise aggregate, which can handle cases where an Append has
any number of surviving children.  Eliminating the Append only helps
when the number of surviving children is exactly one.  Now, that's not
to say I'm going to fight a patch if somebody writes one, but I think
to some extent it's just a band-aid.)

>> Another and independent problem with eliding the Append node is that,
>> if we did that, we'd still have to guarantee that the parent relation
>> corresponding to the Append node got locked somehow.  Otherwise, we'll
>> be accessing the tuple routing information for a table on which we
>> don't have a lock.  That's probably a solvable problem, too, but it
>> hasn't been solved yet.
>
> Hmm, why would we need to access tuple routing information?

I didn't state that very well.  It's not so much that we need access
to the tuple routing information as that we need to replan if it
changes, because if the tuple routing information changes then we
might need to include partitions that were previously being pruned.
If we haven't got some kind of a lock on the parent, I'm pretty sure
that's not going to work reliably.  Synchronization of invalidation
traffic relies on DDL statements holding a lock that conflicts with
the lock held by the process using the table; if there is no such
lock, we might fail to notice that we need to replan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Pavan Deolasee
On Tue, Mar 21, 2017 at 10:47 PM, Bruce Momjian  wrote:

> On Tue, Mar 21, 2017 at 01:04:14PM -0400, Robert Haas wrote:
> > > I know we have talked about it, but not recently, and if everyone else
> > > is fine with it, I am too, but I have to ask these questions.
> >
> > I think that's a good question.  I previously expressed similar
> > concerns.  On the one hand, it's hard to ignore the fact that, in the
> > cases where this wins, it already buys us a lot of performance
> > improvement.  On the other hand, as you say (and as I said), it eats
> > up a lot of bits, and that limits what we can do in the future.  On
> > the one hand, there is a saying that a bird in the hand is worth two
> > in the bush.  On the other hand, there is also a saying that one
> > should not paint oneself into the corner.
> >
> > I'm not sure we've had any really substantive discussion of these
> > issues.  Pavan's response to my previous comments was basically "well,
> > I think it's worth it", which is entirely reasonable, because he
> > presumably wouldn't have written the patch that way if he thought it
> > sucked.  But it might not be the only opinion.
>
> Early in the discussion we talked about allowing multiple changes per
> WARM chain if they all changed the same index and were in the same
> direction so there were no duplicates, but it was complicated.  There
> was also discussion about checking the index during INSERT/UPDATE to see
> if there was a duplicate.  However, those ideas never led to further
> discussion.
>

Well, once I started thinking about how to do vacuum etc, I realised that
any mechanism which allows unlimited (even handful) updates per chain is
going to be very complex and error prone. But if someone has ideas to do
that, I am open. I must say though, it will make an already complex problem
even more complex.


>
> I know the current patch yields good results, but only on a narrow test
> case,


Hmm. I am kinda surprised you say that because I never thought it was a
narrow test case that we are targeting here. But may be I'm wrong.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


  1   2   3   >