Re: [HACKERS] WIP: SCRAM authentication

2015-08-11 Thread Robert Haas
On Tue, Aug 11, 2015 at 2:35 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 8/11/15 10:28 AM, Robert Haas wrote:
 Right.  So what?  First, you upgrade all of the clients one by one to
 a new version of the connector that supports SCRAM.

 In my experience, upgrading clients is, in many settings, significantly
 harder than upgrading servers.  So I think any plan that starts like the
 above isn't going to work.

Well, the real sequence of steps is actually:

1. Get SCRAM authentication committed to 9.6, and release 9.6.
2. Get driver authors to begin supporting SCRAM in their drivers.
3. Get users to update to those new drivers.
4. Enable SCRAM authentication for any given role once all clients
that need that role are SCRAM-capable.

Since supporting a new authentication method means adding a new
protocol message, there's no feasible method for rolling out SCRAM
without users at some point updating to a newer driver.  So there's
really not much choice but for (1)-(3) to happen.  If you're saying
it's likely to be a really long time before steps (2) and (3) are
completed for substantially all installations, I quite agree.

The thing we're actually debating here is whether enabling SCRAM
authentication for a role should also mean disabling MD5
authentication for that same role, or whether you should be able to
have two password verifiers stored for that role, one for SCRAM and
the other MD5.  If we allowed that, then you could turn SCRAM on for a
role, and only later turn MD5 off.  I think that's a bad plan because,
in most scenarios, allowing two forms of authentication for the same
account is strictly less secure than allowing only one.  And also
because it means adding a bunch of new system catalog machinery and
SQL syntax.  Instead, I think that, for any given role, you should get
to pick the way that password authentication works for that role:
either MD5, or SCRAM, but not whichever of those two the client
prefers.

We don't actually have any reason to believe that MD5 is insecure in
the way that we're using it.  But if it turns out that somebody finds
an effective preimage attack against MD5, which I think is what
Heikki said it would take to make our use unsafe, then it's not going
to be good enough to let people use SCRAM.  They're going to need to
enforce the use of SCRAM.  So I really don't understand why we want to
add a lot of complexity to let people mix the two authentication types
on the same user account.  Unless SCRAM is the ONLY way to
authenticate to the server, the fact that some clients are using it
does not help you.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] can't coax query planner into using all columns of a gist index

2015-08-11 Thread Gideon Dresdner
Greetings,

I had a discussion on IRC today with RhodiumToad regarding optimizing a
specific query. We didn't manage to figure out how to get postgres to hit a
GIST index.

The bigger picture is that I am trying to do some bioinformatics and
thought that postgres would be a great way of getting the 1000 genomes
project in the palm of my hand instead of submitting C++ programs to a
cluster, waiting for the results, etc.

Anyway, there is a multicolumn index that we created:

CREATE INDEX qcregions_chrregion_index ON qcregions USING gist(chr, region)
WHERE type = 'include'

- chr is an int between 1 and 24 (inclusive)
- region is an int4range
- type is an enum with two values, 'include' and 'exclude'.

Here are three relevant explain outputs:

Example 1:

EXPLAIN SELECT * FROM (values (12,5000), (13,5001) ) v(c,r)
WHERE EXISTS (SELECT region FROM qcregions
WHERE type = 'include' and region @> v.r and chr = v.c);
  QUERY PLAN

---
 Nested Loop Semi Join  (cost=0.41..8.82 rows=1 width=8)
   ->  Values Scan on *VALUES*  (cost=0.00..0.03 rows=2 width=8)
   ->  Index Scan using qcregions_chrregion_index on qcregions
 (cost=0.41..3464.26 rows=874 width=17)
 Index Cond: ((chr = *VALUES*.column1) AND (region @>
*VALUES*.column2))
(4 rows)

Time: 1.284 ms

Example 2:

-- set enable_bitmapscan = true
EXPLAIN SELECT * FROM (select * from vcf limit 2) AS vcf
WHERE EXISTS (SELECT region FROM qcregions
WHERE qcregions.chr = vcf.chr
  AND qcregions.type = 'include'
  AND vcf.pos <@ qcregions.region);
QUERY PLAN

--
 Nested Loop Semi Join  (cost=4862.57..18775.78 rows=1 width=64)
   ->  Limit  (cost=0.00..0.04 rows=2 width=64)
 ->  Seq Scan on vcf  (cost=0.00..1894654.40 rows=84801840 width=64)
   ->  Bitmap Heap Scan on qcregions  (cost=4862.57..7873.60 rows=874
width=17)
 Recheck Cond: ((vcf.pos <@ region) AND (type =
'include'::qcregiontype) AND (chr = vcf.chr))
 ->  BitmapAnd  (cost=4862.57..4862.57 rows=874 width=0)
   ->  Bitmap Index Scan on qcregions_chrregion_index
 (cost=0.00..977.76 rows=20980 width=0)
 Index Cond: (vcf.pos <@ region)
   ->  Bitmap Index Scan on qcregions_chr_index
 (cost=0.00..3884.12 rows=215158 width=0)
 Index Cond: (chr = vcf.chr)
(10 rows)

Time: 0.708 ms

Example 3 (same as example 2 but with enable_bitmapscan = false).

-- set enable_bitmapscan = false
 EXPLAIN SELECT * FROM (select * from vcf limit 2) AS vcf
WHERE EXISTS (SELECT region FROM qcregions
WHERE qcregions.chr = vcf.chr
  AND qcregions.type = 'include'
  AND vcf.pos <@ qcregions.region);
QUERY PLAN

--
 Nested Loop Semi Join  (cost=0.43..38691.26 rows=1 width=64)
   ->  Limit  (cost=0.00..0.04 rows=2 width=64)
 ->  Seq Scan on vcf  (cost=0.00..1894654.40 rows=84801840 width=64)
   ->  Index Scan using qcregions_chr_index on qcregions
 (cost=0.43..12891.38 rows=874 width=17)
 Index Cond: (chr = vcf.chr)
 Filter: ((type = 'include'::qcregiontype) AND (vcf.pos <@ region))
(6 rows)

Time: 1.214 ms

I am running psql (PostgreSQL) 9.4.4 on a laptop with 4 cores and 16 GB RAM.

Looking forward to hearing your thoughts,
Gideon.


Re: [HACKERS] Commitfest remaining Needs Review items

2015-08-11 Thread Stephen Frost
Heikki,

* Heikki Linnakangas (hlinn...@iki.fi) wrote:
 * self-defined policy for sepgsql regression test
 
 Stephen: would you have the time to handle this, please?

We'll definitely get this addressed soon.  Apologies for taking so long
to get to it; I was in Brazil last week and over the weekend, and then
had a bit of civic duty today (jury duty).

Thanks!

Stephen




Re: [HACKERS] replication slot restart_lsn initialization

2015-08-11 Thread Gurjeet Singh
On Mon, Aug 10, 2015 at 7:12 AM, Andres Freund and...@anarazel.de wrote:

 On 2015-07-07 09:42:54 -0700, Gurjeet Singh wrote:
   /*
  + * Grab and save an LSN value to prevent WAL recycling past that point.
  + */
  +void
  +ReplicationSlotRegisterRestartLSN()
  +{

 Didn't like that description and function name very much. What does
 'grabbing' mean here? Should probably mention that it works on the
 currently active slot and modifies it.


In your version, I don't see a comment that refers to the fact that it
works on the currently active (global variable) slot.



 It's now ReplicationSlotReserveWal()


Okay.



  + ReplicationSlot *slot = MyReplicationSlot;
  +
  + Assert(slot != NULL);
  + Assert(slot->data.restart_lsn == InvalidXLogRecPtr);
  +
  + /*
  +  * The replication slot mechanism is used to prevent removal of
 required
  +  * WAL. As there is no interlock between this and checkpoints
 required, WAL
  +  * segment could be removed before
 ReplicationSlotsComputeRequiredLSN() has
  +  * been called to prevent that. In the very unlikely case that
 this happens
  +  * we'll just retry.
  +  */

 You removed some punctuation in that sentence converting a sentence in
 bad english into one without the original meaning ;). See the attached
 for a new version.


Your version looks better.



  +/*
* Flush all replication slots to disk.
*
* This needn't actually be part of a checkpoint, but it's a convenient
  @@ -876,7 +942,7 @@ StartupReplicationSlots(void)
   }
 
   /* 
  - * Manipulation of ondisk state of replication slots
  + * Manipulation of on-disk state of replication slots
*
* NB: none of the routines below should take any notice whether a slot
 is the
* current one or not, that's all handled a layer above.
  diff --git a/src/backend/replication/slotfuncs.c
 b/src/backend/replication/slotfuncs.c
  index 9a2793f..01b376a 100644
  --- a/src/backend/replication/slotfuncs.c
  +++ b/src/backend/replication/slotfuncs.c
  @@ -40,6 +40,7 @@ Datum
   pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
   {
 	Name		name = PG_GETARG_NAME(0);
  + 	bool		immediately_reserve = PG_GETARG_BOOL(1);
 	Datum		values[2];
 	bool		nulls[2];
 	TupleDesc	tupdesc;
  @@ -58,10 +59,28 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
/* acquire replication slot, this will check for conflicting names
 */
ReplicationSlotCreate(NameStr(*name), false, RS_PERSISTENT);
 
  - values[0] = NameGetDatum(MyReplicationSlot->data.name);
  + if (immediately_reserve)
  + {
  + /* Allocate restart-LSN, if the user asked for it */
  + ReplicationSlotRegisterRestartLSN();
  +
  + /* Write this slot to disk */
  + ReplicationSlotMarkDirty();
  + ReplicationSlotSave();
 
  - nulls[0] = false;
  - nulls[1] = true;
  + values[0] = NameGetDatum(MyReplicationSlot->data.name);
  + values[1] = LSNGetDatum(MyReplicationSlot->data.restart_lsn);
  +
  + nulls[0] = false;
  + nulls[1] = false;
  + }
  + else
  + {
  + values[0] = NameGetDatum(MyReplicationSlot->data.name);
  +
  + nulls[0] = false;
  + nulls[1] = true;
  + }

 I moved
  values[0] = NameGetDatum(MyReplicationSlot->data.name);
 nulls[0] = false;
 to outside the conditional block, there seems no reason to have it in
 there?


The assignment to values[0] is being done twice. We can do away with the
one in the else part of the if condition.
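
For anyone testing this, a usage sketch based on the signature in this
patch (the new second argument, immediately_reserve, presumably defaults
to false for backward compatibility; slot names here are made up):

  -- reserve WAL right away; restart_lsn comes back non-NULL
  SELECT * FROM pg_create_physical_replication_slot('my_slot', true);

  -- previous behaviour; restart_lsn stays NULL until a consumer attaches
  SELECT * FROM pg_create_physical_replication_slot('my_other_slot');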

Also, there was a typo in my patch [1] and it seems to have made it into
the commit: <acronym>LSN</>. Tom seems to have just fixed it in
commit 750fc78b.

Best regards,

[1]: I alerted you to it in a personal email, but it probably fell through
the cracks.
-- 
Gurjeet Singh http://gurjeet.singh.im/


Re: [HACKERS] max_connections and standby server

2015-08-11 Thread Andres Freund
On 2015-08-11 02:06:53 -0400, Tom Lane wrote:
 Hm.  Surely KnownAssignedXIDs could be resized at need.

It's in shared memory so GetSnapshotData() can access it, so not trivially.

 lock table on the standby, that could be completely occupied by locks
 taken by hot-standby backend processes, so I don't see why we're insisting
 on anything particular as to its size.

The startup process alone needs to be able to hold all the master's
exclusive locks at once since they're WAL logged (and have to be).

I don't think common locks held by other processes are an actual problem
- if max_connections and max_locks_per_xact are the same, they can only
hold as many locks as the master could. They'd all conflict with WAL
replay of the exclusive locks anyway.
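
(For reference, the documentation sizes the shared lock table for roughly
max_locks_per_transaction * (max_connections + max_prepared_transactions)
lockable objects; with the defaults of 64 and 100, and no prepared
transactions, that is about 6400 objects shared by all sessions.)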

Now you probably could create a problematic situation by creating
hundreds of advisory locks or something. But that's a fairly different
scenario from an idle server not being able to replay the primary's WAL
records because it can't keep track of all the locks.


Now you can argue that it's uncommon to hold that many AE locks on the
primary in the first place. But I'm not sure it's true. The most common
reasons I've seen for exceeding locks are dumps and restores - and the
latter is all AELs.

Greetings,

Andres Freund




Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-08-11 Thread Simon Riggs
On 10 August 2015 at 17:50, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 Masahiko Sawada wrote:

  This topic may have been already discussed but, why don't we use just
  total scanned pages and total pages?

 Because those numbers don't extrapolate nicely.  If the density of dead
 tuples is irregular across the table, such absolute numbers might be
 completely meaningless: you could scan 90% of the table without seeing
 any index scan, and then at the final 10% be hit by many index scans
 cleaning dead tuples.  Thus you would see progress go up to 90% very
 quickly and then take hours to have it go to 91%.  (Additionally, and a
 comparatively minor point: since you don't know how many index scans are
 going to happen, there's no way to know the total number of blocks
 scanned, unless you don't count index blocks at all, and then the
 numbers become a lie.)

 If you instead track number of heap pages separately from index pages,
 and indicate how many index scans have taken place, you have a chance of
 actually figuring out how many heap pages are left to scan and how many
 more index scans will occur.


I think this overstates the difficulty.

Autovacuum knows what % of a table needs to be cleaned - that is how it is
triggered. When a vacuum runs we should calculate how many TIDs we will
collect and therefore how many trips to the indexes we need for given
memory. We can use the VM to find out how many blocks we'll need to scan in
the table. So overall, we know how many blocks we need to scan.
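
As a rough illustration of the memory term (assuming vacuum's usual array
of 6-byte item pointers bounded by maintenance_work_mem): with
maintenance_work_mem = 64MB one pass can remember about 64MB / 6 bytes,
i.e. roughly 11 million dead TIDs, so the expected number of index scans
is about ceil(estimated dead tuples / 11 million).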

I think just storing (total num blocks, scanned blocks) is sufficiently
accurate to be worth holding, rather than make it even more complex.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Reducing ClogControlLock contention

2015-08-11 Thread Amit Kapila
On Wed, Jul 1, 2015 at 3:49 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 1 July 2015 at 11:14, Andres Freund and...@anarazel.de wrote:

 On 2015-07-01 09:08:11 +0100, Simon Riggs wrote:
  On 1 July 2015 at 09:00, Amit Kapila amit.kapil...@gmail.com wrote:
 
   On Tue, Jun 30, 2015 at 12:32 PM, Simon Riggs si...@2ndquadrant.com
   a.  the semantics of new LWLock (CommitLock) introduced
   by patch seems to be different in the sense that it is just taken in
   Exclusive mode (and no Shared mode is required) as per your
proposal. We
   could use existing LWLock APi's, but on the other hand we could even
   invent new LWLock API for this kind of locking.
  
 
  LWLock API code is already too complex, so -1 for more changes there

 I don't think that's a valid argument. It's better to have the
 complexity in one place (lwlock) than have rather similar complexity in
 several other places. The clog control lock is far from the only place
 that would benefit from tricks along these lines.


 What tricks are being used??

 Please explain why taking 2 locks is bad here, yet works fine elsewhere.


One thing that could be risky in this new scheme of locking
is that now in functions TransactionIdSetPageStatus and
TransactionIdSetStatusBit, we modify slru's shared state with Control Lock
in Shared mode whereas I think it is mandated in the code that those
should be modified with ControlLock in Exclusive mode.  This could have
some repercussions.

Another thing is that in this flow, with the patch there will be three
locks involved rather than two (we take per-buffer locks before doing
I/O).  One effect of this is that currently, while doing I/O, concurrent
committers are allowed to proceed because we release the ControlLock
before doing the I/O, whereas with the patch they will be blocked by the
CommitLock.  Now maybe this scenario is less common and doesn't matter
much if the patch improves the more common scenario; however, it is an
indication of what Andres tried to highlight: having more locks for this
might make the patch more complicated.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] max_connections and standby server

2015-08-11 Thread Simon Riggs
On 11 August 2015 at 06:42, Tom Lane t...@sss.pgh.pa.us wrote:

 Tatsuo Ishii is...@postgresql.org writes:
  I think this is because pg_control on the standby remembers that the
  previous primary server's max_connections = 1100 even if the standby
  server fails to start. Shouldn't we update the pg_control file only when
  the standby succeeds in starting?

 Somebody refresh my memory as to why we have this restriction (that is,
 slave's max_connections >= master's max_connections) in the first place?
 Seems like it should not be a necessary requirement, and working towards
 getting rid of it would be far better than any other answer.


That was the consensus on how to control things on the standby, as 9.0
closed.

Yes, there are various other ways of specifying those things and these days
they could be made to react more dynamically.

There are various major improvements to hot standby that could have
happened, but that time has been spent on the more useful logical
replication which is slowly making its way into core. More coming in 9.6.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Priority table or Cache table

2015-08-11 Thread Haribabu Kommi
On Mon, Aug 10, 2015 at 3:09 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Thu, Aug 6, 2015 at 12:24 PM, Haribabu Kommi kommi.harib...@gmail.com
 wrote:

 What is the configuration for test (RAM of m/c, shared_buffers,
 scale_factor, etc.)?

Here are the details:

CPU - 16 core, RAM - 252 GB

shared_buffers - 1700MB, buffer_cache_ratio - 70
wal_buffers - 16MB, synchronous_commit - off
checkpoint_timeout - 15min, max_wal_size - 5GB.

pgbench scale factor - 75 (1GB)

Load test table size - 1GB

  Threads    Head     Patched    Diff
  1          3123     3238       3.68%
  2          5997     6261       4.40%
  4          11102    11407      2.75%

  I suspect this may be because of buffer locks that are
  causing the problem,
  whereas in the older approach of different buffer pools, each buffer pool
  has its own locks.
  I will try to collect the profile output and analyze it.

 Any better ideas?


 I think you should try to find out during the test for how many pages
 it needs to perform a clock sweep (add some new counter like
 numBufferBackendClocksweep in BufferStrategyControl to find out the
 same).  In theory your patch should reduce the number of times it needs
 to perform a clock sweep.

 I think in this approach even if you make some buffers as non-replaceable
 (buffers for which BM_BUFFER_CACHE_PAGE is set), still clock sweep
 needs to access all the buffers.  I think we might want to find some way to
 reduce that if this idea helps.

 Another thing is that, this idea looks somewhat similar (although not same)
 to current Ring Buffer concept, where Buffers for particular types of scan
 uses buffers from Ring.  I think it is okay to prototype as you have done
 in patch and we can consider to do something on those lines if at all
 this patch's idea helps.

Thanks for the details. I will try the same.

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] max_connections and standby server

2015-08-11 Thread Andres Freund
On 2015-08-11 13:53:15 +0900, Tatsuo Ishii wrote:
 Today I encountered an interesting situation.
 
 1) A streaming replication primary server and a standby server are
running. At this point max_connections = 100 on both servers.
 
 2) Shutdown both servers.
 
 3) Change max_connections to 1100 on both servers and restart both
servers.
 
 4) The primary server happily started but the standby server won't
start because of insufficient resources.
 
 5) Shutdown both servers.
 
 6) Restore max_connections to 100 on both servers and restart both
servers.
 
 7) The primary server happily started but the standby server won't
start, for the reason below.
 
 32695 2015-08-11 13:46:22 JST FATAL:  hot standby is not possible because 
 max_connections = 100 is a lower setting than on the master server (its value 
 was 1100)
 32695 2015-08-11 13:46:22 JST CONTEXT:  xlog redo parameter change: 
 max_connections=1100 max_worker_processes=8 max_prepared_xacts=10 
 max_locks_per_xact=64 wal_level=hot_standby wal_log_hints=off
 32693 2015-08-11 13:46:22 JST LOG:  startup process (PID 32695) exited with 
 exit code 1
 32693 2015-08-11 13:46:22 JST LOG:  terminating any other active server 
 processes
 
 I think this is because pg_control on the standby remembers that the
 previous primary server's max_connections = 1100 even if the standby
 server fails to start. Shouldn't we update the pg_control file only when
 the standby succeeds in starting?

I don't think that'd help. There's a WAL record generated that contains
the master's settings (C.f. XLogReportParameters()) and when replaying
we check that the local settings are compatible with the master's. So
you'll either have to have higher settings on the standby for at least
one restart or, maybe easier given 4), simply start the standby for a
second with hot_standby = off, and then re-enable it after it has
replayed pending WAL.
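
For the archives, the second option would look something like this (a
sketch only; commands and paths vary by installation):

  # on the standby, in postgresql.conf:
  hot_standby = off

  $ pg_ctl -D $PGDATA start    # replays the XLOG_PARAMETER_CHANGE record
  $ pg_ctl -D $PGDATA stop -m fast

  # then set hot_standby = on again and restart the standby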

Greetings,

Andres Freund




Re: [HACKERS] Priority table or Cache table

2015-08-11 Thread Amit Kapila
On Tue, Aug 11, 2015 at 11:31 AM, Haribabu Kommi kommi.harib...@gmail.com
wrote:

 On Mon, Aug 10, 2015 at 3:09 PM, Amit Kapila amit.kapil...@gmail.com
 wrote:
  On Thu, Aug 6, 2015 at 12:24 PM, Haribabu Kommi 
 kommi.harib...@gmail.com
  wrote:
 
  What is the configuration for test (RAM of m/c, shared_buffers,
  scale_factor, etc.)?

 Here are the details:

 CPU - 16 core, RAM - 252 GB

 shared_buffers - 1700MB, buffer_cache_ratio - 70
 wal_buffers - 16MB, synchronous_commit - off
 checkpoint_timeout - 15min, max_wal_size - 5GB.

 pgbench scale factor - 75 (1GB)

 Load test table size - 1GB


It seems that the test table can fit easily in shared buffers; I am not
sure this patch will be of benefit in such cases. Why do you think it can
be beneficial here?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Priority table or Cache table

2015-08-11 Thread Haribabu Kommi
On Tue, Aug 11, 2015 at 4:43 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, Aug 11, 2015 at 11:31 AM, Haribabu Kommi kommi.harib...@gmail.com
 wrote:

 On Mon, Aug 10, 2015 at 3:09 PM, Amit Kapila amit.kapil...@gmail.com
 wrote:
  On Thu, Aug 6, 2015 at 12:24 PM, Haribabu Kommi
  kommi.harib...@gmail.com
  wrote:
 
  What is the configuration for test (RAM of m/c, shared_buffers,
  scale_factor, etc.)?

 Here are the details:

 CPU - 16 core, RAM - 252 GB

 shared_buffers - 1700MB, buffer_cache_ratio - 70
 wal_buffers - 16MB, synchronous_commit - off
 checkpoint_timeout - 15min, max_wal_size - 5GB.

 pgbench scale factor - 75 (1GB)

 Load test table size - 1GB


 It seems that the test table can fit easily in shared buffers; I am not
 sure this patch will be of benefit in such cases. Why do you think it can
 be beneficial here?

Yes, this configuration combination may not be the best for the test.

The idea behind these settings is to provide enough shared buffers to cache
tables by tuning buffer_cache_ratio from 0 to 70% of shared buffers,
so that the cache tables have enough shared buffers and the rest of the
shared buffers can be used for normal tables, i.e. the load test table.

I will try to evaluate some more performance tests with different shared
buffers settings and load.

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] max_connections and standby server

2015-08-11 Thread Tom Lane
Michael Paquier michael.paqu...@gmail.com writes:
 On Tue, Aug 11, 2015 at 2:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Somebody refresh my memory as to why we have this restriction (that is,
  slave's max_connections >= master's max_connections) in the first place?
 Seems like it should not be a necessary requirement, and working towards
 getting rid of it would be far better than any other answer.

 If I recall correctly, that's because KnownAssignedXIDs and the lock
 table need to be large enough on the standby for the largest snapshot
 possible (procarray.c).

Hm.  Surely KnownAssignedXIDs could be resized at need.  As for the shared
lock table on the standby, that could be completely occupied by locks
taken by hot-standby backend processes, so I don't see why we're insisting
on anything particular as to its size.

regards, tom lane




Re: [HACKERS] Reducing ClogControlLock contention

2015-08-11 Thread Simon Riggs
On 11 August 2015 at 10:55, Amit Kapila amit.kapil...@gmail.com wrote:

 On Wed, Jul 1, 2015 at 3:49 PM, Simon Riggs si...@2ndquadrant.com wrote:
 
  On 1 July 2015 at 11:14, Andres Freund and...@anarazel.de wrote:
 
  On 2015-07-01 09:08:11 +0100, Simon Riggs wrote:
   On 1 July 2015 at 09:00, Amit Kapila amit.kapil...@gmail.com wrote:
  
On Tue, Jun 30, 2015 at 12:32 PM, Simon Riggs 
 si...@2ndquadrant.com
a.  the semantics of new LWLock (CommitLock) introduced
by patch seems to be different in the sense that it is just taken in
Exclusive mode (and no Shared mode is required) as per your
 proposal. We
could use existing LWLock APi's, but on the other hand we could even
invent new LWLock API for this kind of locking.
   
  
   LWLock API code is already too complex, so -1 for more changes there
 
  I don't think that's a valid argument. It's better to have the
  complexity in one place (lwlock) than have rather similar complexity in
  several other places. The clog control lock is far from the only place
  that would benefit from tricks along these lines.
 
 
  What tricks are being used??
 
  Please explain why taking 2 locks is bad here, yet works fine elsewhere.
 

 One thing that could be risky in this new scheme of locking
 is that now in functions TransactionIdSetPageStatus and
 TransactionIdSetStatusBit, we modify slru's shared state with Control Lock
 in Shared mode whereas I think it is mandated in the code that those
 should be modified with ControlLock in Exclusive mode.  This could have
 some repercussions.


Do you know of any? This is a technical forum, so if we see a problem we
say what it is, and if we don't, that's usually classed as a positive point
in a code review.


 Another thing is that in this flow, with the patch there will be three
 locks involved rather than two (we take per-buffer locks before doing
 I/O).  One effect of this is that currently, while doing I/O, concurrent
 committers are allowed to proceed because we release the ControlLock
 before doing the I/O, whereas with the patch they will be blocked by the
 CommitLock.  Now maybe this scenario is less common and doesn't matter
 much if the patch improves the more common scenario; however, it is an
 indication of what Andres tried to highlight: having more locks for
 this might make the patch more complicated.


It's easy to stripe the CommitLock in that case, if it is a problem.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Reducing ClogControlLock contention

2015-08-11 Thread Amit Kapila
On Tue, Aug 11, 2015 at 3:44 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 11 August 2015 at 10:55, Amit Kapila amit.kapil...@gmail.com wrote:

  What tricks are being used??
 
  Please explain why taking 2 locks is bad here, yet works fine
 elsewhere.
 

 One thing that could be risky in this new scheme of locking
 is that now in functions TransactionIdSetPageStatus and
 TransactionIdSetStatusBit, we modify slru's shared state with Control Lock
 in Shared mode whereas I think it is mandated in the code that those
 should be modified with ControlLock in Exclusive mode.  This could have
 some repercussions.


 Do you know of any? This is a technical forum, so if we see a problem we
 say what it is, and if we don't, that's usually classed as a positive point
 in a code review.


One of the main reasons for saying this is that the file-level
comments in slru.c say that for accessing (examine or modify)
the shared state, the Control lock *must* be held in Exclusive mode,
except in the function SimpleLruReadPage_ReadOnly().  So, while
I agree that I should think more about whether there is any breakage due
to the patch, I don't find any explanation, either in your e-mail or in the
patch, of why it is safe to modify the state after the patch when it was not
before.  If you think it is safe, then at least modify the comments in
slru.c.



 Another thing is that in this flow, with the patch there will be three
 locks involved rather than two (we take per-buffer locks before doing
 I/O).  One effect of this is that currently, while doing I/O, concurrent
 committers are allowed to proceed because we release the ControlLock
 before doing the I/O, whereas with the patch they will be blocked by the
 CommitLock.  Now maybe this scenario is less common and doesn't matter
 much if the patch improves the more common scenario; however, it is an
 indication of what Andres tried to highlight: having more locks for
 this might make the patch more complicated.


 It's easy to stripe the CommitLock in that case, if it is a problem.


Sure, I think other places in the code that take both the other locks
also need to be checked and updated.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] statement_timeout affects query results fetching?

2015-08-11 Thread Shay Rojansky
Thanks (once again!) for the valuable suggestions Robert.

The idea of chunking/buffering via cursors has been raised before for
another purpose - allowing multiple queries concurrently at the API level
(where concurrently means interleaving when reading the resultsets). This
would imply exposing the number of rows fetched to the user like you
suggested. However, I don't think there's a way we can remove the API
option to *not* buffer (as I said before, ADO.NET even provides a standard
API feature for reading column-by-column), and therefore the general
problem remains...

I think the right solution for us at the driver level would be to switch to
driver-enforced timeouts, i.e. to no longer use statement_timeout but look
at socket read times instead. I'll look into doing that for our next
version.

Thanks for all your thoughts!

On Mon, Aug 10, 2015 at 2:30 PM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Aug 10, 2015 at 5:25 AM, Shay Rojansky r...@roji.org wrote:
  Thanks for the explanation Robert, that makes total sense. However, it
 seems
  like the utility of PG's statement_timeout is much more limited than I
  thought.
 
  In case you're interested, I dug a little further and it seems that
  Microsoft's client for SQL Server implements the following timeout
 (source):
 
  cumulative time-out (for all network packets that are read during the
  invocation of a method) for all network reads during command execution or
  processing of the results. A time-out can still occur after the first
 row is
  returned, and does not include user processing time, only network read
 time.
 
  Since it doesn't seem possible to have a clean query-processing-only
 timeout
  at the backend, we may be better off doing something similar to the above
  and enforce timeouts on the client only. Any further thoughts on this
 would
  be appreciated.

 An alternative you may want to consider is using the Execute message
 with a non-zero row count and reading all of the returned rows as they
 come back, buffering them in memory.  When those have all been
 consumed, issue another Execute message and get some more rows.
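
 A rough sketch of the message flow (illustrative only, not any
 particular driver's API):

   Parse("", "SELECT ...")
   Bind("p1", "")
   Execute("p1", 1000)      -- return at most 1000 rows
   ... up to 1000 DataRow messages ...
   PortalSuspended          -- portal stays open, query is suspended
   Execute("p1", 1000)      -- next batch from the same portal
   ...
   CommandComplete          -- the final batch ends with this instead
   Sync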

 AFAICS, the biggest problem with this is that there's no good way to
 bound the number of rows returned by size rather than by number, which
 has been complained about before by somebody else in a situation
 similar to yours.  Another problem is that I believe it will cause
 cursor_tuple_fraction to kick in, which may change query plans.  But
 it does have the advantage that the query will be suspended from the
 server's point of view, which I *think* will toll statement_timeout.

 You might also consider exposing some knobs to the user, so that they
 can set the number of rows fetched in one go, and let that be all the
 rows or only some of them.

 We really need a better way of doing this, but I think this is the way
 other drivers are handling it now.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



Re: [HACKERS] Raising our compiler requirements for 9.6

2015-08-11 Thread Andres Freund
On 2015-08-05 15:46:36 +0200, Andres Freund wrote:
 On 2015-08-05 15:08:29 +0200, Andres Freund wrote:
  We might later want to change some of the harder to maintain macros to
  inline functions, but that seems better done separately.
 
 Here's a conversion for fastgetattr() and heap_getattr(). Not only is
 the resulting code significantly more readable, but the conversion also
 shrinks the code size:
 
 $ ls -l src/backend/postgres_stock src/backend/postgres
 -rwxr-xr-x 1 andres andres 37054832 Aug  5 15:18 src/backend/postgres_stock
 -rwxr-xr-x 1 andres andres 37209288 Aug  5 15:23 src/backend/postgres
 
 $ size src/backend/postgres_stock src/backend/postgres
     text    data     bss     dec     hex filename
  7026843   49982  298584 7375409  708a31 src/backend/postgres_stock
  7023851   49982  298584 7372417  707e81 src/backend/postgres
 
 stock is the binary compiled without the conversion.
 
 A lot of the size difference is debugging information (which now needs
 less duplicated information on each callsite), but you can see that the
 text section (the actual executable code) shrank by 3k.
 
 stripping the binary shows exactly that:
 -rwxr-xr-x 1 andres andres 7076760 Aug  5 15:44 src/backend/postgres_s
 -rwxr-xr-x 1 andres andres 7079512 Aug  5 15:45 src/backend/postgres_stock_s
 
 To be sure this doesn't cause problems I ran a readonly pgbench. There's
 a very slight, but reproducible, performance benefit. I don't think
 that's a reason for the change, I just wanted to make sure there's no
 regressions.

Slightly updated version attached. The only changes are updates to some
comments referencing the 'fastgetattr macro' and the like. Oh, and an
additional newline.

In my opinion this drastically increases readability and thus should be
applied. Will do so sometime tomorrow unless there's protest.

Btw, I found that many changes are much more readable when changing
git's config to use histogram diffs (git config --global diff.algorithm
histogram or --histogram).

Regards,

Andres
From db76501e9c987c9a0b847ea8a2256a1ec9150b37 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 11 Aug 2015 13:02:45 +0200
Subject: [PATCH] Convert fastgetattr() and heap_getattr() into inline
 functions.

The current macro is very hard to read. Now that we can unconditionally
use inline functions there's no reason to keep them that way.

In my gcc 5 based environment this shaves off nearly 200k of the final
binary size. A lot of that is debugging information, but the
.text (i.e. code) section alone shrinks by 3k.

Discussion: 20150805134636.gf12...@awork2.anarazel.de
---
 src/backend/access/common/heaptuple.c |   6 +-
 src/backend/access/heap/heapam.c  |  46 --
 src/include/access/htup_details.h | 160 --
 3 files changed, 78 insertions(+), 134 deletions(-)

diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index 045e0a7..017c551 100644
--- a/src/backend/access/common/heaptuple.c
+++ b/src/backend/access/common/heaptuple.c
@@ -328,8 +328,8 @@ heap_attisnull(HeapTuple tup, int attnum)
 /* 
  *		nocachegetattr
  *
- *		This only gets called from fastgetattr() macro, in cases where
- *		we can't use a cacheoffset and the value is not null.
+ *		This only gets called from fastgetattr(), in cases where we can't use
+ *		a cacheoffset and the value is not null.
  *
  *		This caches attribute offsets in the attribute descriptor.
  *
@@ -544,7 +544,7 @@ nocachegetattr(HeapTuple tuple,
  *
  *		Fetch the value of a system attribute for a tuple.
  *
- * This is a support routine for the heap_getattr macro.  The macro
+ * This is a support routine for the heap_getattr.  The inline function
  * has already determined that the attnum refers to a system attribute.
  * 
  */
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3701d8e..ff93b3c 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -983,52 +983,6 @@ heapgettup_pagemode(HeapScanDesc scan,
 }
 
 
-#if defined(DISABLE_COMPLEX_MACRO)
-/*
- * This is formatted so oddly so that the correspondence to the macro
- * definition in access/htup_details.h is maintained.
- */
-Datum
-fastgetattr(HeapTuple tup, int attnum, TupleDesc tupleDesc,
-			bool *isnull)
-{
-	return (
-			(attnum) > 0 ?
-			(
-			 (*(isnull) = false),
-			 HeapTupleNoNulls(tup) ?
-			 (
-			  (tupleDesc)->attrs[(attnum) - 1]->attcacheoff >= 0 ?
-			  (
-			   fetchatt((tupleDesc)->attrs[(attnum) - 1],
-		(char *) (tup)->t_data + (tup)->t_data->t_hoff +
-		(tupleDesc)->attrs[(attnum) - 1]->attcacheoff)
-			   )
-			  :
-			  nocachegetattr((tup), (attnum), (tupleDesc))
-			  )
-			 :
-			 (
-			  att_isnull((attnum) - 1, (tup)->t_data->t_bits) ?
-			  (
-			   (*(isnull) = true),
-			   (Datum) NULL
-			   )
-			  :
-			  (
-			   nocachegetattr((tup), (attnum), (tupleDesc))
-			   )
-			  )
-			

Re: [HACKERS] How to compare different datums within from a tuple?

2015-08-11 Thread Anastasia Lubennikova

 Can someone tell me how I can compare two datum fields when I do not
 know the data type in advance inside an executor function?


In a nutshell, there is no way to compare Datums.
Datum is an abstract data type. It's the backend-internal representation of
a single value of any SQL data type.
The code using Datum has to know which type it is, since the Datum itself
doesn't contain that information.


 For example, x less than y where x and y are of various types that form
 intervals. I have found the method ExecTuplesMatch, but it is only for
 equality comparison, I think. Another one is ApplySortComparator... maybe
 that's the correct way to go?

 Some code to make things clearer...

 Datum x = heap_getattr(out-tts_tuple,
 node-xpos,
 out-tts_tupleDescriptor,
 isNull1);
 Datum y = slot_getattr(curr, node-ypos, isNull2);

 if (compareDatumWithCorrectMethod(x,y)  0)
 {
  /* do something */
 }

 Thx,
 Peter




Maybe you can use a DatumGetXXX function to extract the value; for example,
DatumGetInt32.
http://doxygen.postgresql.org/postgres_8h.html#aacbc8a3ac6d52d85feaf0b7ac1b1160c
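
If the executor node knows the attribute's type OID and collation (both
available from the tuple descriptor), another option is to look up the
type's default btree comparison function through the type cache. A rough,
untested sketch for backend code:

#include "postgres.h"
#include "fmgr.h"
#include "utils/typcache.h"

/* Compare two non-null datums of the same type; returns <0, 0 or >0. */
static int
compare_datums(Datum x, Datum y, Oid typid, Oid collation)
{
    TypeCacheEntry *tce = lookup_type_cache(typid, TYPECACHE_CMP_PROC_FINFO);

    if (!OidIsValid(tce->cmp_proc_finfo.fn_oid))
        elog(ERROR, "no btree comparison function for type %u", typid);

    /* btree support function 1 gives the usual three-way result */
    return DatumGetInt32(FunctionCall2Coll(&tce->cmp_proc_finfo,
                                           collation, x, y));
}

ApplySortComparator() is layered over the same kind of comparator, so if
you already have SortSupport set up, that route works as well.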
-- 
Best regards,
Lubennikova Anastasia


[HACKERS] RLS and LEAKPROOF functions

2015-08-11 Thread Ted Toth
I built an index on a jsonb column of a table with RLS enabled but it
was not being used. Turns out that the function jsonb_contains needed
to be LEAKPROOF (thanks Joe Conway). However I do not actually know if
jsonb_contains is LEAKPROOF. Whose responsibility is it to ensure
functions are leakproof? It would be useful if this function were
documented and there was an indication as to its leakproofness.
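
For what it's worth, leakproofness is recorded per function in
pg_proc.proleakproof, so you can at least inspect it, and a superuser can
change it; something like:

  SELECT proname, proleakproof
  FROM pg_proc
  WHERE proname = 'jsonb_contains';

  -- superuser only; do this only if you have convinced yourself that the
  -- function cannot leak information about its arguments
  ALTER FUNCTION jsonb_contains(jsonb, jsonb) LEAKPROOF;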

Ted




Re: [HACKERS] [patch] A \pivot command for psql

2015-08-11 Thread David Fetter
On Tue, Aug 11, 2015 at 10:13:48AM -0400, Robert Haas wrote:
 On Sun, Aug 9, 2015 at 8:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  psql is a minority API, you know.
 
 Not for me.  psql has already got a bunch of bells and whistles to
 format things in particular ways that people have wanted, and I'm not
 really sure why the bar for this proposal should be any higher.  Is
 this less useful than \pset linestyle unicode, complete with trying to
 auto-detect whether to enable the feature?  Than automatically
 switching between expanded mode and regular mode based on the width of
 the TTY?  I considered those features rather frivolous, but it seemed
 to me that their inclusion was widely supported and that a number of
 people were really quite happy about them.
 
 I guess you could argue that those are inherently client-side features
 and this is not, and I'll grant that point.  But a client-side feature
 in 9.5

9.5?!?  I hope you meant 9.6 because if 9.5 is still open for new
features, we're going to have to explain to all the people who thought
they missed the cut-off why this one is jumping the line in front of
them.

 beats a server-side feature in 10.6 every day of the week.

Having done some of the preliminary research into a server-side
implementation, I see it a different way.

First, this is more of a 9.6 feature than a 10.6, at least in terms of
the first cut functionality, and I don't see good reasons why the
performance would need to be terrible in the first cut.  In
particular, the PIVOT case looks like it could pretty easily use the
FILTER machinery directly, while the UNPIVOT case could use what
LATERAL does.  Doubtless, further optimizations are possible for each,
and for interesting sub-cases, but that's a project for later, as
performance optimizations should be.

Second, if we put this feature as-is in psql, we're stuck supporting
it in psql until the end of time, even if (when, I believe) we have a
fuller and likely not perfectly compatible feature on the back-end.

That said, a thing in psql that could slice serialized output into
columns would be handy as a broad, general part of reporting in psql,
and would mesh quite nicely with a back-end PIVOT (SERIALIZATION FOO)
or whatever syntax we land on.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] checkpointer continuous flushing

2015-08-11 Thread Fabien COELHO


Hello Andres,


[...] Right. At the cost of keeping track of all files...


Sure. Pg already tracks all files, and probably some more tracking would
be necessary for an early fsync feature to know which files have already
been fsync'ed and which have not.



Yes, the other version has a higher space overhead.


Yep, this is my concern.

I'm not convinced that's meaningful in comparison to shared buffers in 
space. And rather doubtful it's a loss performance-wise in a loaded
server. All the buffer headers are touched on other cores and doing the 
sort with indirection will greatly increase bus traffic.


The measures I collected and reported showed that the sorting time is 
basically insignificant, so bus traffic induced by sorting does not seem 
to be an issue.


[...] It's not particularly desirable to have a performance feature that 
works less well if the server is heavily and concurrently loaded. The 
likelihood of bogus sort results will increase with the churn rate in 
shared buffers.


Hm.

In conclusion I'm not convinced that it is worth the memory, but I'm also 
tired of arguing, and hopefully nobody else cares about a few more bytes 
per shared_buffers, so why should I care?


Here is a v8; I reduced the memory overhead of the heavyweight approach
from 24 to 16 bytes per buffer, so it is medium-weight:-). It might be
compacted further down to 12 bytes by combining the 2 bits of forkNum
with either relNode or blockNum, and using a uint64_t comparison field
with all the data so that the comparison code would be simpler and faster.
I also fixed the computation of the shmem size, which I had not updated
when switching to shmem.

The patches still include the two GUCs, but it is easy to remove one or the
other. They are useful if someone wants to test. The default is on for
sort, and off for flush. Maybe it should be on for both.


--
Fabien.

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e900dcc..1cec243 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2454,6 +2454,28 @@ include_dir 'conf.d'
   </listitem>
  </varlistentry>
 
+ <varlistentry id="guc-checkpoint-sort" xreflabel="checkpoint_sort">
+  <term><varname>checkpoint_sort</varname> (<type>bool</type>)
+  <indexterm>
+   <primary><varname>checkpoint_sort</> configuration parameter</primary>
+  </indexterm>
+  </term>
+  <listitem>
+   <para>
+    Whether to sort buffers before writing them out to disk on checkpoint.
+    For HDD storage, this setting allows grouping together
+    neighboring pages written to disk, thus improving performance by
+    reducing random write activity.
+    This sorting should have limited performance effects on SSD backends
+    as such storage has good random write performance, but it may
+    help with wear-leveling, so it may be worth keeping anyway.
+    The default is <literal>on</>.
+    This parameter can only be set in the <filename>postgresql.conf</>
+    file or on the server command line.
+   </para>
+  </listitem>
+ </varlistentry>
+
 <varlistentry id="guc-checkpoint-warning" xreflabel="checkpoint_warning">
  <term><varname>checkpoint_warning</varname> (<type>integer</type>)
  <indexterm>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index e3941c9..f538698 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -546,6 +546,18 @@
   </para>
 
   <para>
+   When hard-disk drives (HDD) are used for terminal data storage
+   <xref linkend="guc-checkpoint-sort"> allows sorting pages
+   so that neighboring pages on disk will be flushed together by
+   checkpoints, reducing the random write load and improving performance.
+   If solid-state drives (SSD) are used, sorting pages induces no benefit
+   as their random write I/O performance is good: this feature could then
+   be disabled by setting <varname>checkpoint_sort</> to <value>off</>.
+   It is possible that sorting may help with SSD wear leveling, so it may
+   be kept on that account.
+  </para>
+
+  <para>
   The number of WAL segment files in <filename>pg_xlog</> directory depends on
   <varname>min_wal_size</>, <varname>max_wal_size</> and
   the amount of WAL generated in previous checkpoint cycles. When old log
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 68e33eb..bee38ab 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7995,11 +7995,13 @@ LogCheckpointEnd(bool restartpoint)
 				sync_secs,
 				total_secs,
 				longest_secs,
+				sort_secs,
 				average_secs;
 	int			write_usecs,
 				sync_usecs,
 				total_usecs,
 				longest_usecs,
+				sort_usecs,
 				average_usecs;
 	uint64		average_sync_time;
 
@@ -8030,6 +8032,10 @@ LogCheckpointEnd(bool restartpoint)
 		CheckpointStats.ckpt_end_t,
 		&total_secs, &total_usecs);
 
+	TimestampDifference(CheckpointStats.ckpt_sort_t,
+		CheckpointStats.ckpt_sort_end_t,
+		&sort_secs, &sort_usecs);
+
 	/*
 	 * Timing values 

Re: [HACKERS] Reducing ClogControlLock contention

2015-08-11 Thread Simon Riggs
On 11 August 2015 at 14:53, Amit Kapila amit.kapil...@gmail.com wrote:


 One more point: why do we need the CommitLock before calling
 SimpleLruReadPage_ReadOnly() in the patch, and if it is not required,
 can we use LWLockAcquire(shared->buffer_locks[slotno], LW_EXCLUSIVE)
 instead of the CommitLock?


That prevents read only access, not just commits, so that isn't a better
suggestion.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Reducing ClogControlLock contention

2015-08-11 Thread Simon Riggs
On 11 August 2015 at 11:39, Amit Kapila amit.kapil...@gmail.com wrote:

 On Tue, Aug 11, 2015 at 3:44 PM, Simon Riggs si...@2ndquadrant.com
 wrote:

 On 11 August 2015 at 10:55, Amit Kapila amit.kapil...@gmail.com wrote:

  What tricks are being used??
 
  Please explain why taking 2 locks is bad here, yet works fine
 elsewhere.
 

 One thing that could be risky in this new scheme of locking
 is that now in functions TransactionIdSetPageStatus and
 TransactionIdSetStatusBit, we modify slru's shared state with Control
 Lock
 in Shared mode whereas I think it is mandated in the code that those
 should be modified with ControlLock in Exclusive mode.  This could have
 some repercussions.


 Do you know of any? This is a technical forum, so if we see a problem we
 say what it is, and if we don't, that's usually classed as a positive point
 in a code review.


 One of the main reasons for saying this is that the file-level
 comments in slru.c say that for accessing (examine or modify)
 the shared state, the Control lock *must* be held in Exclusive mode,
 except in the function SimpleLruReadPage_ReadOnly().  So, while
 I agree that I should think more about whether there is any breakage due
 to the patch, I don't find any explanation, either in your e-mail or in the
 patch, of why it is safe to modify the state after the patch when it was not
 before.  If you think it is safe, then at least modify the comments in
 slru.c.


except...

I explained that a reader will never be reading data that is concurrently
changed by a writer, so it was safe to break the general rule for this
specific case only.

Yes, I will modify comments in the patch.


 Another thing is that in this flow, with the patch there will be three
 locks involved rather than two (we take per-buffer locks before doing
 I/O).  One effect of this is that currently, while doing I/O, concurrent
 committers are allowed to proceed because we release the ControlLock
 before doing the I/O, whereas with the patch they will be blocked by the
 CommitLock.  Now maybe this scenario is less common and doesn't matter
 much if the patch improves the more common scenario; however, it is an
 indication of what Andres tried to highlight: having more locks for
 this might make the patch more complicated.


 It's easy to stripe the CommitLock in that case, if it is a problem.


 Sure, I think other places in the code that take both the other locks
 also need to be checked and updated.


 Not sure what that means, but there are no other places calling CommitLock

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [patch] A \pivot command for psql

2015-08-11 Thread Daniel Verite
David Fetter wrote:

 That depends on what you mean by dynamic columns.  The approach
 taken in the tablefunc extension is to use functions which return
 SETOF RECORD, which in turn need to be cast at runtime.

For me, PIVOT with dynamic columns would be a pivot query
whose output columns are not enumerated as input in the
SQL query itself, in any form.

 The second, more on point, is to specify a serialization for the rows
 in the dynamic columns case.  Their syntax is PIVOT XML, but I
 would rather do something more like PIVOT (SERIALIZATION XML).

The SERIALIZATION looks interesting, but I believe these days JSON
would make more sense than XML, both as easier for the client-side and
because of all the json_* functions we now have to mix json with
relational structures.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [HACKERS] Reducing ClogControlLock contention

2015-08-11 Thread Amit Kapila
On Tue, Aug 11, 2015 at 4:09 PM, Amit Kapila amit.kapil...@gmail.com
wrote:

 On Tue, Aug 11, 2015 at 3:44 PM, Simon Riggs si...@2ndquadrant.com
wrote:


 Another thing is that in this flow, with the patch there will be three
 locks involved rather than two (we take per-buffer locks before doing
 I/O).  One effect of this is that currently, while doing I/O, concurrent
 committers are allowed to proceed because we release the ControlLock
 before doing the I/O, whereas with the patch they will be blocked by the
 CommitLock.  Now maybe this scenario is less common and doesn't matter
 much if the patch improves the more common scenario; however, it is an
 indication of what Andres tried to highlight: having more locks for
 this might make the patch more complicated.


 It's easy to stripe the CommitLock in that case, if it is a problem.


 Sure, I think other places in code that take both the other locks also
 needs to be checked for updation.


One more point: why do we need the CommitLock before calling
SimpleLruReadPage_ReadOnly() in the patch, and if it is not required,
can we use LWLockAcquire(shared->buffer_locks[slotno], LW_EXCLUSIVE)
instead of the CommitLock?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


[HACKERS] statistics for array types

2015-08-11 Thread Jeff Janes
When reviewing some recent patches, I decided the statistics gathered for
arrays had some pre-existing shortcomings.

The main one is that when the arrays contain rare elements there is no
histogram to fall back upon when the MCE array is empty, the way there is
for scalar stats.  So it has to punt completely and resort to saying that
it is 0.5% selectivity without recourse to any data at all.

The rationale for applying the threshold before things are eligible for
inclusion in the MCE array seems to be that this puts some theoretical
bound on the amount of error we are likely to have in that element.  But I
think it is better to exceed that theoretical bound than it is to have no
data at all.

The attached patch forces there to be at least one element in MCE, keeping
the one element with the highest predicted frequency if the MCE would
otherwise be empty.  Then any other element queried for is assumed to be no
more common than this most common element.

I'd also briefly considered just having the part of the code that pulls the
stats out of pg_stats interpret a MCE array as meaning that nothing is more
frequent than the threshold, but that would mean that that part of the code
needs to know about how the threshold is chosen, which just seems wrong.
And it would need to know the difference between NULL MCE because no stats
were gathered, versus because stats were gathered but nothing met the
threshold.

I'm only marginally interested in this area, but since I did the
investigation into it already I wanted to present it here.  I'd be quite
happy if someone else wanted to take over the patch (or the concept behind
it).

Here is an example, where the estimate drops from 500 rows to 3 rows where
the correct number is zero:

create table foobar as
select (
  select
array_agg(floor((random()*10+g.y/100+f.x/1000))::int)
from generate_series(1,10) g(y)
  ) foo
from generate_series(1,10) f(x);

Unpatched:

analyze;

explain (analyze) select * from foobar where foo @> '{567}';
   QUERY PLAN

 Seq Scan on foobar  (cost=0.00..2387.00 rows=500 width=61) (actual
time=76.448..76.448 rows=0 loops=1)
   Filter: (foo @> '{567}'::integer[])
   Rows Removed by Filter: 10
 Planning time: 0.211 ms
 Execution time: 76.492 ms


With patch:

analyze;

explain (analyze) select * from foobar where foo @> '{567}';
  QUERY PLAN
--
 Seq Scan on foobar  (cost=0.00..2387.00 rows=3 width=61) (actual
time=78.604..78.604 rows=0 loops=1)
   Filter: (foo @> '{567}'::integer[])
   Rows Removed by Filter: 10
 Planning time: 0.199 ms
 Execution time: 78.665 ms

Cheers,

Jeff


array_type_analyze_MCE_V001.patch
Description: Binary data



Re: [HACKERS] [patch] A \pivot command for psql

2015-08-11 Thread Robert Haas
On Sun, Aug 9, 2015 at 8:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 psql is a minority API, you know.

Not for me.  psql has already got a bunch of bells and whistles to
format things in particular ways that people have wanted, and I'm not
really sure why the bar for this proposal should be any higher.  Is
this less useful than \pset linestyle unicode, complete with trying to
auto-detect whether to enable the feature?  Than automatically
switching between expanded mode and regular mode based on the width of
the TTY?  I considered those features rather frivolous, but it seemed
to me that their inclusion was widely supported and that a number of
people were really quite happy about them.

I guess you could argue that those are inherently client-side features
and this is not, and I'll grant that point.  But a client-side feature
in 9.5 beats a server-side feature in 10.6 every day of the week.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SCRAM authentication

2015-08-11 Thread Robert Haas
On Sun, Aug 9, 2015 at 2:42 PM, Josh Berkus j...@agliodbs.com wrote:
 On 08/09/2015 08:09 AM, Robert Haas wrote:
 Why do we need to be able to authenticate using more than one
 mechanism?  If you have some clients that can't support SCRAM yet, you
 might as well continue using MD5 across the board until that changes.
 You're not going to get much real security out of using MD5 for some
 authentication attempts and SCRAM for other ones,

 Speaking as someone who has sheperded several clients through
 infrastructure upgrades, I have to disagree with this.

 First, people don't upgrade large infrastructures with multiple
 applications, ETL processes and APIs which connect with the database all
 at once.  They do it one component at a time, verify that component is
 working, and then move on to the next one.  Even within a single
 application, there could be many servers to upgrade, and you can't do
 them all simultaneously.

Right.  So what?  First, you upgrade all of the clients one by one to
a new version of the connector that supports SCRAM.
Second, once all of the clients that access a particular user account
can support SCRAM, you switch that account to use SCRAM.  The problem
you're talking about here would arise if, say, the JDBC maintainers
ripped out MD5 support at the same time they added SCRAM support.  But
that would be dumb.

 Second, you're forgetting hosted PostgreSQL, where there may be only one
 user available to each database owner.  So assigning a new login role
 for SCRAM isn't even an option.

Why do I care about having some of my authentication to a particular
role happen via SCRAM if I can't have it all happen via SCRAM?
Presumably, the benefit of SCRAM is that it's more secure than MD5
authentication.  So, if I can log into role X via MD5 but role Y only
via SCRAM, there might be some security benefit in that: an attacker
who can subvert MD5 but cannot subvert SCRAM can hack into role X but
not into role Y.  Good.  But allowing someone to authenticate to role
X via *either* SCRAM or MD5 doesn't help at all.  In fact it's
strictly worse, from a security perspective, than allowing someone to
authenticate via exactly one of those two methods, because now if the
attacker can subvert *either* of them he can break into that account.

In the hosted PostgreSQL situation you mention, there are really only
two cases.  Either all clients that need to log into that account can
support SCRAM, in which case we can use SCRAM and shut MD5 off.  Or
else they don't, and then we need to leave MD5 enabled.  But if we
have to leave MD5 enabled, then what exactly do we get out of using
SCRAM for some subset of the clients that can support it?

There may be a good answer to this question, but I don't think I've
seen it spelled out clearly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SCRAM authentication

2015-08-11 Thread Josh Berkus
On 08/11/2015 07:28 AM, Robert Haas wrote:
 There may be a good answer to this question, but I don't think I've
 seen it spelled out clearly.

Please see my follow-up post about making by-login-role migration easier
for users.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GinPageIs* don't actually return a boolean

2015-08-11 Thread Robert Haas
On Tue, Aug 11, 2015 at 12:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 We do not use !! elsewhere for this purpose, and I for one find it a
 pretty ugly locution.

We do, actually, now, in contrib/pg_xlogdump/pg_xlogdump.c.  I'd be in
favor of getting rid of those.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SCRAM authentication

2015-08-11 Thread Josh Berkus
On 08/11/2015 09:35 AM, Robert Haas wrote:
 On Tue, Aug 11, 2015 at 12:29 PM, Josh Berkus j...@agliodbs.com wrote:
 On 08/11/2015 07:28 AM, Robert Haas wrote:
 There may be a good answer to this question, but I don't think I've
 seen it spelled out clearly.

 Please see my follow-up post about making by-login-role migration easier
 for users.
 
 I read it, and now I've reread it, but I don't see how it addresses
 the points I raised.

I'm not disagreeing with your security argument, BTW, which is why I'm
trying to come up with ways that make it easy for users to switch to
SCRAM via gradual rollout.

You're suggesting, then, that the switchover should be relatively easy,
because drivers will support both MD5 and SCRAM, and once all drivers
support both, the DBA can just swap verifiers?

That makes sense if drivers go that way.  I'm concerned that some
drivers will have a different call for a SCRAM connection than for an
MD5 one; we'd want to exert our project influence to prevent that from
happening.

That also makes it a bit harder to test the new auth on a few app
servers before a general rollout, but there's ways around that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GinPageIs* don't actually return a boolean

2015-08-11 Thread Alvaro Herrera
Andres Freund wrote:
 On 2015-08-11 12:43:03 -0400, Robert Haas wrote:
  On Tue, Aug 11, 2015 at 12:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
   We do not use !! elsewhere for this purpose, and I for one find it a
   pretty ugly locution.
  
  We do, actually, now, in contrib/pg_xlogdump/pg_xlogdump.c.  I'd be in
  favor of getting rid of those.
 
 And a bunch more places actually. Blame me. I'll come up with a patch
 fixing the macros and converting existing !! style ones.

Actually there's one that's not yours ...

alvin: master 0$ git grep '!!' -- \*.c | grep -v '!!!'
contrib/isn/isn.c: * It's a helper function, not intended to be used!!
contrib/sepgsql/uavc.c: sepgsql_audit_log(!!denied,
src/backend/access/gist/gist.c: /* yes!!, found 
*/
src/backend/access/gist/gist.c:  * awful!!, we need search tree to find 
parent ... , but before we
src/backend/access/gist/gistbuild.c:/* yes!!, found it */
src/backend/access/transam/xact.c:  Assert(!!(parsed->xinfo & 
XACT_XINFO_HAS_ORIGIN) == (origin_id != InvalidRepOriginId));
src/backend/executor/nodeModifyTable.c: 
* ctid!! */
src/backend/replication/logical/reorderbuffer.c:Assert(!create || 
!!txn);
src/backend/storage/lmgr/lwlock.c:  
!!(state & LW_VAL_EXCLUSIVE),
src/backend/storage/lmgr/lwlock.c:  
!!(state & LW_FLAG_HAS_WAITERS),
src/backend/storage/lmgr/lwlock.c:  
!!(state & LW_FLAG_RELEASE_OK))));
src/backend/tsearch/wparser_def.c: * order must be the same as in typedef enum 
{} TParserState!!
src/backend/utils/adt/like.c: * A big hack of the regexp.c code!! 
Contributed by
src/bin/pg_ctl/pg_ctl.c: * manager checkpoint, it's got nothing to do with 
database checkpoints!!
src/interfaces/ecpg/preproc/c_keywords.c: * !!WARNING!!: This list must be 
sorted, because binary
src/interfaces/ecpg/preproc/ecpg_keywords.c: * !!WARNING!!: This list must be 
sorted, because binary
src/pl/plpgsql/src/pl_scanner.c: * !!WARNING!!: These lists must be sorted by 
ASCII name, because binary

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] A \pivot command for psql

2015-08-11 Thread David Fetter
On Tue, Aug 11, 2015 at 05:13:03PM +0200, Daniel Verite wrote:
   David Fetter wrote:
 
  That depends on what you mean by dynamic columns.  The approach
  taken in the tablefunc extension is to use functions which return
  SETOF RECORD, which in turn need to be cast at runtime.
 
 For me, PIVOT with dynamic columns would be a pivot query
 whose output columns are not enumerated as input in the
 SQL query itself, in any form.

I'm pretty sure people will want to be able to specify them in some
form.  On one implementation, it looks like:

select * from (
   select times_purchased as "Purchase Frequency", state_code
   from customers t
)
pivot xml
(
count(state_code)
for state_code in (select state_code from preferred_states)
)
order by 1

Another basically punts by making you responsible for generating the
SQL dynamically, a move I regard as a horrible UX failure.

  The second, more on point, is to specify a serialization for the rows
  in the dynamic columns case.  Their syntax is PIVOT XML, but I
  would rather do something more like PIVOT (SERIALIZATION XML).
 
 The SERIALIZATION  looks interesting, but I believe these days JSON
 would make more sense than XML, both as easier for the client-side and
 because of all the json_* functions we now have to mix json with
 relational structures.

I proposed SERIALIZATION as a parameter precisely so we could use
different ones for different cases.  JSON is certainly popular this
year, as XML was in prior years.  I may be wrong, but I'm certain that
there will be new ones, even popular ones, that haven't yet been
invented.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 64-bit XIDs again

2015-08-11 Thread Simon Riggs
On 30 July 2015 at 14:26, Alexander Korotkov a.korot...@postgrespro.ru
wrote:


 As I mentioned in CSN thread, it would be nice to replace XID with CSN
 when setting hint bits for tuple. In this case when hint bits are set we
 don't need any additional lookups to check visibility.

 http://www.postgresql.org/message-id/CAPpHfdv7BMwGv=ofug3s-jgvfkqhi79pr_zk1wsk-13oz+c...@mail.gmail.com
 Introducing 32-bit CSN doesn't seem reasonable for me, because it would
 double our troubles with wraparound.


Your idea to replace XIDs with CSNs instead of hinting them was a good one.
It removes the extra-lookup we thought we needed to check visibility with
CSN snapshots.

I agree 32-bit CSNs would not be a good idea though, a 64-bit CSN is needed.

If we break a CSN down into an Epoch and a 32-bit value then it becomes
more easily possible. The Epoch for XID and CSN can be the same - whichever
wraps first we just increment the Epoch.

By doing this we can reuse the page-level epoch for both XID and CSN. Now
hinting a tuple is just replacing a 32-bit XID with a 32-bit CSN.

We would probably need an extra flag bit for the case where the CSN is one
epoch later than the XID.
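
For illustration, recovering the full 64-bit value from a hinted tuple
would be something like this (names invented; a sketch of the scheme,
not proposed code):

#include <stdbool.h>
#include <stdint.h>

/*
 * Rebuild a 64-bit CSN from the page-level epoch shared with XIDs, the
 * 32-bit CSN stored where the hinted XID used to be, and the proposed
 * flag bit saying the CSN belongs to the epoch after the page's.
 */
static inline uint64_t
full_csn(uint32_t page_epoch, uint32_t csn32, bool csn_next_epoch)
{
    uint64_t    epoch = (uint64_t) page_epoch + (csn_next_epoch ? 1 : 0);

    return (epoch << 32) | csn32;
}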

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] GinPageIs* don't actually return a boolean

2015-08-11 Thread Robert Haas
On Tue, Aug 11, 2015 at 11:42 AM, Andres Freund and...@anarazel.de wrote:
 #define GinPageIsLeaf(page)	( GinPageGetOpaque(page)->flags & GIN_LEAF )
 #define GinPageIsData(page)	( GinPageGetOpaque(page)->flags & GIN_DATA )
 #define GinPageIsList(page)	( GinPageGetOpaque(page)->flags & GIN_LIST )
 ...

 These macros don't actually return a boolean that's comparable with our
 true/false. That doesn't strike me as a good idea.

 If there's actually a boolean type defined by some included header (in
 which case we don't overwrite it in c.h!) this actually can lead to
 tests failing. If e.g. stdbool.h is included in c.h the tests fail with
 gcc-4.9.

!! is unknown to our codebase except where you've added it, and
personally, I hate that idiom.  I think we should write (val) != 0
instead of !!val.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Moving SS_finalize_plan processing to the end of planning

2015-08-11 Thread Robert Haas
On Sun, Aug 9, 2015 at 3:50 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I've started to work on path-ification of the upper planner (finally),

I don't have any specific technical comments on what you've proposed
here, but I'm excited to hear that this is moving forward.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pgbench bug in head

2015-08-11 Thread Fabien COELHO


Probably since 1bc90f7a (shame on me), pgbench does not report skipped 
transaction (-L) counts properly: data are not aggregated over all 
threads as they should be, either under --progress or in the end-of-run 
report.


The attached patch fixes these regressions.

--
Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 6f5bd99..2e55c90 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3447,8 +3447,8 @@ main(int argc, char **argv)
 
 		/* thread level stats */
 		throttle_lag += thread->throttle_lag;
-		throttle_latency_skipped = threads->throttle_latency_skipped;
-		latency_late = thread->latency_late;
+		throttle_latency_skipped += threads->throttle_latency_skipped;
+		latency_late += thread->latency_late;
 		if (throttle_lag_max < thread->throttle_lag_max)
 			throttle_lag_max = thread->throttle_lag_max;
 		INSTR_TIME_ADD(conn_total_time, thread->conn_time);
@@ -3759,7 +3759,10 @@ threadRun(void *arg)
 }
 
 for (i = 0; i < progress_nthreads; i++)
+{
+	skipped += thread[i].throttle_latency_skipped;
 	lags += thread[i].throttle_lag;
+}
 
total_run = (now - thread_start) / 1000000.0;
tps = 1000000.0 * (count - last_count) / run;
@@ -3767,7 +3770,6 @@ threadRun(void *arg)
 sqlat = 1.0 * (sqlats - last_sqlats) / (count - last_count);
stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
lag = 0.001 * (lags - last_lags) / (count - last_count);
-skipped = thread->throttle_latency_skipped - last_skipped;
 
 fprintf(stderr,
 		"progress: %.1f s, %.1f tps, "
@@ -3777,7 +3779,8 @@ threadRun(void *arg)
 {
 	fprintf(stderr, ", lag %.3f ms", lag);
 	if (latency_limit)
-		fprintf(stderr, ", " INT64_FORMAT " skipped", skipped);
+		fprintf(stderr, ", " INT64_FORMAT " skipped",
+				skipped - last_skipped);
 }
 fprintf(stderr, "\n");
 
@@ -3786,7 +3789,7 @@ threadRun(void *arg)
 last_sqlats = sqlats;
 last_lags = lags;
 last_report = now;
-last_skipped = thread->throttle_latency_skipped;
+last_skipped = skipped;
 
 /*
  * Ensure that the next report is in the future, in case

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] A \pivot command for psql

2015-08-11 Thread Daniel Verite
David Fetter wrote:

 Second, if we put this feature as-is in psql, we're stuck supporting
 it in psql until the end of time, even if (when, I believe) we have a
 fuller and likely not perfectly compatible feature on the back-end.

To me, doing \pivot in psql vs. PIVOT in the backend is a false
dichotomy; both would be desirable to have, in either order.

Having PIVOT in SQL is more important because it will help in a
broader range of cases.

But, even if I fail to be convincing on this, it will still be
much easier in a psql context to just type \pivot than turning a
non-pivot query into a pivoted equivalent, even if we achieve the
best possible (in ease of use) server-side SQL implementation.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GinPageIs* don't actually return a boolean

2015-08-11 Thread Andres Freund
On 2015-08-11 12:04:48 -0400, Robert Haas wrote:
 On Tue, Aug 11, 2015 at 11:42 AM, Andres Freund and...@anarazel.de wrote:
  #define GinPageIsLeaf(page)	( GinPageGetOpaque(page)->flags & GIN_LEAF )
  #define GinPageIsData(page)	( GinPageGetOpaque(page)->flags & GIN_DATA )
  #define GinPageIsList(page)	( GinPageGetOpaque(page)->flags & GIN_LIST )
  ...
 
  These macros don't actually return a boolean that's comparable with our
  true/false. That doesn't strike me as a good idea.
 
  If there's actually a boolean type defined by some included header (in
  which case we don't overwrite it in c.h!) this actually can lead to
  tests failing. If e.g. stdbool.h is included in c.h the tests fail with
  gcc-4.9.
 
 !! is unknown to our codebase except where you've added it, and
 personally, I hate that idiom.  I think we should write (val) != 0
 instead of !!val.

Hm. I find !! slightly simpler and it's pretty widely used in other
projects, but I don't care much. As long as we fix the underlying issue,
which != 0 certainly does...


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GinPageIs* don't actually return a boolean

2015-08-11 Thread Andres Freund
On 2015-08-11 12:43:03 -0400, Robert Haas wrote:
 On Tue, Aug 11, 2015 at 12:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  We do not use !! elsewhere for this purpose, and I for one find it a
  pretty ugly locution.
 
 We do, actually, now, in contrib/pg_xlogdump/pg_xlogdump.c.  I'd be in
 favor of getting rid of those.

And a bunch more places actually. Blame me. I'll come up with a patch
fixing the macros and converting existing !! style ones.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GinPageIs* don't actually return a boolean

2015-08-11 Thread Andres Freund
On 2015-08-11 17:42:37 +0200, Andres Freund wrote:
 Hi,
 
 #define GinPageIsLeaf(page)	( GinPageGetOpaque(page)->flags & GIN_LEAF )
 #define GinPageIsData(page)	( GinPageGetOpaque(page)->flags & GIN_DATA )
 #define GinPageIsList(page)	( GinPageGetOpaque(page)->flags & GIN_LIST )
 ...
 
 These macros don't actually return a boolean that's comparable with our
 true/false. That doesn't strike me as a good idea.
 
 If there's actually a boolean type defined by some included header (in
 which case we don't overwrite it in c.h!) this actually can lead to
 tests failing. If e.g. stdbool.h is included in c.h the tests fail with
 gcc-4.9.

I guess the reason here is that if it's a boolean type known to the
compiler it's free to assume that it only contains true/false...

 I think we should add a !! to these macros to make sure it's an actual
 boolean.
 
 This has been the case since gin's initial commit in 8a3631f8d86cdd9b0 .

Even worse, we have that kind of macro all over. I thought
e.g. HeapTupleHeaderIs* would be safe agains that, but that turns out
not be the case.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SCRAM authentication

2015-08-11 Thread Robert Haas
On Tue, Aug 11, 2015 at 12:29 PM, Josh Berkus j...@agliodbs.com wrote:
 On 08/11/2015 07:28 AM, Robert Haas wrote:
 There may be a good answer to this question, but I don't think I've
 seen it spelled out clearly.

 Please see my follow-up post about making by-login-role migration easier
 for users.

I read it, and now I've reread it, but I don't see how it addresses
the points I raised.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GinPageIs* don't actually return a boolean

2015-08-11 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 #define GinPageIsLeaf(page)	( GinPageGetOpaque(page)->flags & GIN_LEAF )
 #define GinPageIsData(page)	( GinPageGetOpaque(page)->flags & GIN_DATA )
 #define GinPageIsList(page)	( GinPageGetOpaque(page)->flags & GIN_LIST )

 These macros don't actually return a boolean that's comparable with our
 true/false. That doesn't strike me as a good idea.

Agreed, this is risky.  For example, if the bit being tested is to the
left of the lowest byte of flags, storing the result into a bool
variable would do the wrong thing.
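
A self-contained illustration of that failure mode, with a made-up flag
bit and the historical one-byte bool:

#include <stdio.h>

typedef char pgbool;            /* stand-in for our 1-byte bool */

#define SOME_FLAG 0x0100        /* hypothetical flag above the low byte */

int
main(void)
{
    unsigned int flags = SOME_FLAG;
    pgbool      broken = flags & SOME_FLAG;         /* truncated to 0 */
    pgbool      fixed = (flags & SOME_FLAG) != 0;   /* always 0 or 1 */

    printf("broken = %d, fixed = %d\n", broken, fixed);    /* 0, 1 */
    return 0;
}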

 I think we should add a !! to these macros to make sure it's an actual
 boolean.

Please write it more like

#define GinPageIsLeaf(page)  ((GinPageGetOpaque(page)->flags & GIN_LEAF) != 0)

We do not use !! elsewhere for this purpose, and I for one find it a
pretty ugly locution.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GinPageIs* don't actually return a boolean

2015-08-11 Thread Andres Freund
Hi,

#define GinPageIsLeaf(page)	( GinPageGetOpaque(page)->flags & GIN_LEAF )
#define GinPageIsData(page)	( GinPageGetOpaque(page)->flags & GIN_DATA )
#define GinPageIsList(page)	( GinPageGetOpaque(page)->flags & GIN_LIST )
...

These macros don't actually return a boolean that's comparable with our
true/false. That doesn't strike me as a good idea.

If there's actually a boolean type defined by some included header (in
which case we don't overwrite it in c.h!) this actually can lead to
tests failing. If e.g. stdbool.h is included in c.h the tests fail with
gcc-4.9.

I think we should add a !! to these macros to make sure it's an actual
boolean.

This has been the case since gin's initial commit in 8a3631f8d86cdd9b0 .

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mention column name in error messages

2015-08-11 Thread Robert Haas
On Sun, Aug 9, 2015 at 11:44 AM, Franck Verrot fra...@verrot.fr wrote:
 On Wed, Jul 1, 2015 at 12:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 What seems more likely to lead to a usable patch is to arrange for the
 extra information you want to be emitted as error context, via an error
 context callback that gets installed at the right times.  ...
 ...
 with no need for int8in to be directly aware of the context.  You should
 try adapting that methodology for the cases you're worried about.


 Hi Tom (and others),

 Sorry it took so long for me to follow up on this, hopefully I found a
 couple
 a hours today to try writing another patch.

 In any case, thanks for reviewing my first attempt and taking time to write
 such a detailed critique... I've learned a lot!

 I am now using the error context callback stack. The current column name
 and column type are passed to the callback packed inside  a new structure
 of type TransformExprState.
 Those information are then passed to `errhint` and will be presented to the
 user later on (in case of coercion failure).


 Please find the WIP patch attached.
 (I've pushed the patch on my GH fork[1] too).

To make sure this doesn't get forgotten about, you may want to add it here:

https://commitfest.postgresql.org/action/commitfest_view/open

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SCRAM authentication

2015-08-11 Thread Robert Haas
On Tue, Aug 11, 2015 at 12:49 PM, Josh Berkus j...@agliodbs.com wrote:
 You're suggesting, then, that the switchover should be relatively easy,
 because drivers will support both MD5 and SCRAM, and once all drivers
 support both, the DBA can just swap verifiers?

Yes, that's what I was imagining would happen.  I can't imagine driver
authors wanting to remove support from MD5, because even if SCRAM goes
into 9.6, pre-9.6 servers are going to exist for many years to come,
and people are going to want to talk to them.

It seems to me that the protocol flow should be:

(1) Client sends StartupMessage.

(2) Server checks whether this user has an MD5 password verifier or a
SCRAM password verifier.  If the former, it responds with
AuthenticationMD5Password or AuthenticationCleartextPassword just as
it would do today, I guess based on pg_hba.conf.  If the latter, it
responds with a new protocol message AuthenticationScram.

So, if you switch the password verifier, the clients will all
automatically begin using SCRAM, because the server will tell them to.
And if they can't, they'll fail.
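
In rough pseudo-C, step (2) on the server side would be something like
the following.  get_verifier_type, the VERIFIER_* values, and
AUTH_REQ_SCRAM are invented for illustration; sendAuthRequest and
AUTH_REQ_MD5 mirror what auth.c already has:

/* Sketch only: pick the authentication request by stored verifier type. */
switch (get_verifier_type(port->user_name))
{
    case VERIFIER_MD5:
        /* today's behavior: client answers with an MD5 response */
        sendAuthRequest(port, AUTH_REQ_MD5);
        break;
    case VERIFIER_SCRAM:
        /* new protocol message; SCRAM-incapable clients fail here */
        sendAuthRequest(port, AUTH_REQ_SCRAM);
        break;
}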

 That makes sense if drivers go that way.  I'm concerned that some
 drivers will have a different call for a SCRAM connection than for an
 MD5 one; we'd want to exert our project influence to prevent that from
 happening.

I'm not sure that would be a disaster, but do any existing drivers
have a different call for a cleartext password
(pg_hba.conf='password') than they do for an MD5 password
(pg_hba.conf='md5')?  If not, I'm not sure why they'd add that just
because there is now a third way of doing password-based
authentication.

 That also makes it a bit harder to test the new auth on a few app
 servers before a general rollout, but there's ways around that.

Well, staging servers are a good idea...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump and search_path

2015-08-11 Thread Robert Haas
On Mon, Aug 10, 2015 at 1:10 PM, Steve Thames sthame...@gmail.com wrote:
 Please consider making the arbitrary determination of search_path by pg_dump
 an optional behavior. Or better yet, just have it generate a backup that
 accurately reflects the database it is backing up.

Hmm, I don't think it's a question of making it optional.  I think the
current behavior is just a bug, and should be fixed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump and search_path

2015-08-11 Thread Steve Thames
Thank you gentlemen for clarifying this.

I found this problem when my database modeling tool saw a change in the
database (the nextval() parameters) after a database restore.
I guess the tool must be reading adsrc for this information.

Cheers,
Steve Thames

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Tuesday, August 11, 2015 10:41 AM
To: Alvaro Herrera
Cc: Steve Thames; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] pg_dump and search_path

Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Don't ever rely on adsrc.  It's useless.  Use pg_get_expr(adbin) 
 instead.  That's safe, for instance, if the sequence gets renamed.

It's probably past time we got rid of that column altogether.  It just
wastes space and cycles.  There was an argument for not being too quick to
get rid of it, but we deprecated it in 7.2 ... surely people have had more
than enough time to fix their applications.

regards, tom lane



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump and search_path

2015-08-11 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Don't ever rely on adsrc.  It's useless.  Use pg_get_expr(adbin)
 instead.  That's safe, for instance, if the sequence gets renamed.

It's probably past time we got rid of that column altogether.  It just
wastes space and cycles.  There was an argument for not being too quick
to get rid of it, but we deprecated it in 7.2 ... surely people have had
more than enough time to fix their applications.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Intentional usage of old style function declarations?

2015-08-11 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 We have a bunch of callbacks that use old-style C function
 declarations. I.e. functions with empty parens allowing all types of
 data being passed:

 E.g.
 extern bool expression_tree_walker(Node *node, bool (*walker) (),
void *context);
 extern Node *expression_tree_mutator(Node *node, Node *(*mutator) (),
  void *context);

 I find that to be fairly ugly. Was that intentional?

Yes.  If we had the signature of the walker written out explicitly as say

  bool (*walker) (Node *node, void *context)

then every walker function would have to be declared that way (rather than
being declared with its true context pointer type), requiring casting away
from void * inside the walker.  Or else we could explicitly cast walker
function names to a typedef for walker_function everywhere we call
expression_tree_walker.  Both are notationally pains in the rear, and
what's more, either solution totally destroys the argument that you've
gained any type-safety.  I don't think a forced cast is better than a weak
declaration.
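
A compilable miniature of the trade-off; Node, my_context and walk are
stand-ins for the real planner types and entry point:

#include <stdbool.h>
#include <stddef.h>

typedef struct Node { int type; } Node;                 /* stand-in */
typedef struct my_context { int count; } my_context;    /* hypothetical */

/* old-style declaration: empty parens accept any walker signature */
static bool
walk(Node *node, bool (*walker) (), void *context)
{
    return node != NULL && walker(node, context);
}

/* the walker keeps its true context type; no casts inside its body */
static bool
my_walker(Node *node, my_context *ctx)
{
    ctx->count++;
    return false;
}

int
main(void)
{
    Node        n = {0};
    my_context  ctx = {0};

    /* with an exact prototype this call would need a forced cast */
    return walk(&n, my_walker, &ctx) ? 1 : 0;
}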

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump and search_path

2015-08-11 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Aug 10, 2015 at 1:10 PM, Steve Thames sthame...@gmail.com wrote:
 Please consider making the arbitrary determination of search_path by pg_dump
 an optional behavior. Or better yet, just have it generate a backup that
 accurately reflects the database it is backing up.

 Hmm, I don't think it's a question of making it optional.  I think the
 current behavior is just a bug, and should be fixed.

It is not a bug, and as far as I can see what Steve is complaining about
isn't even pg_dump's behavior: it is just how regclass constants work.
regclass_out only qualifies the name if it wouldn't be found in the
current search path.  This is a display behavior and has nothing to do
with what the actual value of the constant is:

regression=# create schema s1;
CREATE SCHEMA
regression=# create table s1.t1 (f1 serial);
CREATE TABLE
regression=# \d s1.t1
                            Table "s1.t1"
 Column |  Type   |                     Modifiers
--------+---------+-----------------------------------------------------
 f1     | integer | not null default nextval('s1.t1_f1_seq'::regclass)

regression=# set search_path = s1;
SET
regression=# \d s1.t1
                          Table "s1.t1"
 Column |  Type   |                    Modifiers
--------+---------+--------------------------------------------------
 f1     | integer | not null default nextval('t1_f1_seq'::regclass)


Now, if pg_dump produced a file that failed to restore this state
of affairs correctly, that would be a bug.  But I have seen no
evidence suggesting that it doesn't get it right.  The way that the
commands are spelled in the dump file is an implementation detail.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SCRAM authentication

2015-08-11 Thread Josh Berkus
On 08/11/2015 10:06 AM, Robert Haas wrote:
 On Tue, Aug 11, 2015 at 12:49 PM, Josh Berkus j...@agliodbs.com wrote:
 That makes sense if drivers go that way.  I'm concerned that some
 drivers will have a different call for a SCRAM connection than for an
 MD5 one; we'd want to exert our project influence to prevent that from
 happening.
 
 I'm not sure that would be a disaster, but do any existing drivers
 have a different call for a cleartext password
 (pg_hba.conf='password') than they do for an MD5 password
 (pg_hba.conf='md5')?  If not, I'm not sure why they'd add that just
 because there is now a third way of doing password-based
 authentication.

Well, there is a different send-and-response cycle to the SCRAM
approach, no?  Plus, I've seen driver authors do strange things in the
past, including PHP's various drivers and pypgsql, which IIRC required
you to manually pick a protocol version.  I'm not saying we should plan
for bad design, we should just get the word out to driver authors that
we think it would be a good idea to support both methods transparently.

 That also makes it a bit harder to test the new auth on a few app
 servers before a general rollout, but there's ways around that.
 
 Well, staging servers are a good idea...

Don't get me started. :-b

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Intentional usage of old style function declarations?

2015-08-11 Thread Andres Freund
Hi,

We have a bunch of callbacks that use old-style C function
declarations. I.e. functions with empty parens allowing all types of
data being passed:

E.g.
extern bool expression_tree_walker(Node *node, bool (*walker) (),
   void *context);
extern Node *expression_tree_mutator(Node *node, Node *(*mutator) (),
 void *context);

I find that to be fairly ugly. Was that intentional? Fixing it would
imply adding a fair number of (Node *) casts as there's suddenly actual
parameter type checking done.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump and search_path

2015-08-11 Thread Alvaro Herrera
Steve Thames wrote:

SELECT a.attnum, n.nspname, c.relname, d.adsrc AS default_value 
  FROM pg_attribute AS a 
  JOIN pg_class AS c ON a.attrelid = c.oid 
  JOIN pg_namespace AS n ON c.relnamespace = n.oid 
 LEFT JOIN pg_attrdef   AS d ON d.adrelid  = c.oid AND d.adnum = a.attnum
 WHERE a.attnum > 0
   AND n.nspname = 'testschema'  
   AND c.relname = 'testtable';

Don't ever rely on adsrc.  It's useless.  Use pg_get_expr(adbin)
instead.  That's safe, for instance, if the sequence gets renamed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to compare different datums within from a tuple?

2015-08-11 Thread Peter Eisentraut
On 8/10/15 12:36 PM, Peter Moser wrote:
 Can someone tell me, how I can compare two datum fields, when I do not
 know the data type in advance inside an executor function?
 
 For example, "x less than y" where x and y are of various types that
 form intervals. I have found the method ExecTuplesMatch, but it is only
 for equality comparison, I think. Another one is ApplySortComparator...
 maybe that's the correct way to go?
 
 Some code to make things clearer...
 
 Datum x = heap_getattr(out->tts_tuple,
 node->xpos,
 out->tts_tupleDescriptor,
 &isNull1);
 Datum y = slot_getattr(curr, node->ypos, &isNull2);
 
 if (compareDatumWithCorrectMethod(x,y) < 0)
 {
  /* do something */
 }

The tuple descriptor will contain the data type of the datum, so you can
use that to look up the default btree operator class and call the
respective operators in there.  But note that there is no single notion
of comparison in the system.  Comparison depends on operator class,
access method, possibly collation, null value treatment.  Some types
don't support comparison beyond equality.  A robust patch would need to
take that into account.
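
Concretely, the lookup can go through the type cache.  A minimal
sketch, assuming the attribute's type has a default btree opclass and
leaving proper error and collation handling aside:

#include "postgres.h"
#include "fmgr.h"
#include "utils/typcache.h"

/*
 * Compare two datums of the same type via the type's default btree
 * comparison function.  Returns <0, 0 or >0, like strcmp.
 */
static int
compare_datums(Oid typid, Oid collation, Datum x, Datum y)
{
    TypeCacheEntry *typcache;

    typcache = lookup_type_cache(typid, TYPECACHE_CMP_PROC_FINFO);
    if (!OidIsValid(typcache->cmp_proc_finfo.fn_oid))
        elog(ERROR, "no btree comparison function for type %u", typid);

    return DatumGetInt32(FunctionCall2Coll(&typcache->cmp_proc_finfo,
                                           collation, x, y));
}

The typid itself comes from the tuple descriptor, e.g.
out->tts_tupleDescriptor->attrs[node->xpos - 1]->atttypid.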



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump and search_path

2015-08-11 Thread Alvaro Herrera
Steve Thames wrote:
 Thank you gentlemen for clarifying this.
 
 I found this problem when my database modeling tool saw a change in the
 database (the nextval() parameters) after a database restore.
 I guess the tool must be reading adsrc for this information.

You can tell for sure by setting log_statement=all.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SCRAM authentication

2015-08-11 Thread Peter Eisentraut
On 8/11/15 10:28 AM, Robert Haas wrote:
 Right.  So what?  First, you upgrade all of the clients one by one to
 a new version of the connector that supports SCRAM.

In my experience, upgrading clients is, in many settings, significantly
harder than upgrading servers.  So I think any plan that starts like the
above isn't going to work.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_connections and standby server

2015-08-11 Thread Michael Paquier
On Tue, Aug 11, 2015 at 2:57 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Tue, Aug 11, 2015 at 2:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Tatsuo Ishii is...@postgresql.org writes:
 I think this is because pg_control on the standby remembers that the
 previous primary server's max_connections = 1100 even if the standby
 server fails to start. Shouldn't we update pg_control file only when
 standby succeeds to start?

 Somebody refresh my memory as to why we have this restriction (that is,
slave's max_connections >= master's max_connections) in the first place?
 Seems like it should not be a necessary requirement, and working towards
 getting rid of it would be far better than any other answer.

 If I recall correctly, that's because KnownAssignedXIDs and the lock
 table need to be large enough on the standby for the largest snapshot
 possible (procarray.c).

... And the maximum number of locks possible on master (for the lock
table, wasn't it for the concurrent number of AccessExclusiveLocks,
btw?).
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GIN pending clean up is not interruptable

2015-08-11 Thread Jeff Janes
When a user backend (as opposed to vacuum or autoanalyze) gets burdened
with cleaning up the GIN pending list, it does not
call CHECK_FOR_INTERRUPTS().

Since cleaning does a lot of random IO, it can take a long time and it is
not nice to be uninterruptable.

The attached patch adds an else branch to call CHECK_FOR_INTERRUPTS().

But I think we could instead just call vacuum_delay_point unconditionally.
It calls CHECK_FOR_INTERRUPTS(), and if not in a throttled vacuum it does
nothing else.  (That is how ANALYZE handles it.)
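
Either way the shape of the loop is the same; a sketch (is_vacuum is a
placeholder for however the caller's identity gets passed down, not the
actual ginfast.c code):

for (;;)
{
    if (is_vacuum)
        vacuum_delay_point();       /* calls CHECK_FOR_INTERRUPTS() too */
    else
        CHECK_FOR_INTERRUPTS();     /* the else branch the patch adds */

    /* ... read and process the next pending-list page ... */
}

The unconditional variant just replaces the whole if/else with a single
vacuum_delay_point() call.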

This issue is in all branches.

Cheers,

Jeff


gin_freelist_interrupt.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-08-11 Thread Robert Haas
On Fri, Aug 7, 2015 at 3:37 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 I could have a discussion with Fujita-san about this topic.

 Also, let me share the discussion towards an entire solution.

 The root cause of this problem is that a Scan node with scanrelid==0
 represents a relation join that can involve multiple relations, so the
 TupleDesc of its records will not match any base relation; however,
 ExecScanFetch() was not updated when scanrelid==0 became supported.

 The FDW/CSP behind a Scan node with scanrelid==0 is responsible for
 generating records according to the fdw_/custom_scan_tlist that
 reflects the definition of the relation join, and only the FDW/CSP
 knows how to combine these base relations.
 In addition, host-side expressions (like Plan->qual) are initialized
 to reference the records generated by the FDW/CSP, so the least
 invasive approach is to allow the FDW/CSP to have its own recheck
 logic, I think.

 Below is the structure of ExecScanFetch().

   ExecScanFetch(ScanState *node,
                 ExecScanAccessMtd accessMtd,
                 ExecScanRecheckMtd recheckMtd)
   {
       EState     *estate = node->ps.state;

       if (estate->es_epqTuple != NULL)
       {
           /*
            * We are inside an EvalPlanQual recheck.  Return the test tuple if
            * one is available, after rechecking any access-method-specific
            * conditions.
            */
           Index       scanrelid = ((Scan *) node->ps.plan)->scanrelid;

           Assert(scanrelid > 0);
           if (estate->es_epqTupleSet[scanrelid - 1])
           {
               TupleTableSlot *slot = node->ss_ScanTupleSlot;
               :
               return slot;
           }
       }
       return (*accessMtd) (node);
   }

 When we are inside EPQ, it fetches a tuple from the es_epqTuple[] array
 and checks its visibility (ForeignRecheck() always says 'yep, it is
 visible'), then ExecScan() applies its qualifiers via ExecQual().
 So, as long as the FDW/CSP can return a record built from the tuples in
 the es_epqTuple[] array that satisfies the TupleDesc of this relation,
 the rest of the code paths are common.

 I have an idea to solve the problem.
 It adds a recheckMtd() call when scanrelid==0, just before the assertion
 above, and adds an FDW callback in ForeignRecheck().
 The role of this new callback is to set up the supplied TupleTableSlot
 and check its visibility, without prescribing how to do so.
 That is left to the FDW driver; for example, it may invoke an
 alternative plan consisting of only built-in logic.

 Invocation of an alternative plan is one of the most feasible ways to
 implement EPQ logic in an FDW, so I think FDW also needs a mechanism
 that takes child path nodes, like custom_paths in the CustomPath node.
 Once a valid path node is linked to this list, createplan.c transforms
 it into the relevant plan node, which the FDW can then initialize and
 invoke during execution, as in ForeignRecheck().

 This design can solve another problem Fujita-san has also mentioned.
 If a scan qualifier is pushed down to the remote query and its
 expression node is saved in the private area of the ForeignScan, the
 callback in ForeignRecheck() can evaluate the qualifier by itself.
 (Note that only the FDW driver can know where and how the pushed-down
 expression node is saved in the private area.)

 In summary, the following three enhancements are a straightforward
 way to fix the problem he reported.
 1. Add a special path to call recheckMtd in ExecScanFetch if scanrelid==0
 2. Add a callback of FDW in ForeignRecheck() - to construct a record
    according to the fdw_scan_tlist definition and to evaluate its
    visibility, or to evaluate the pushed-down qualifier if it is a base
    relation.
 3. Add List *fdw_paths in ForeignPath, like custom_paths of CustomPath,
    to construct plan nodes for EPQ evaluation.

 On the other hand, we also need to pay attention to the development
 timeline. It is really a problem for v9.5; however, it looks to me
 like the straightforward solution needs enhancements to the FDW APIs.

 I'd like to see people's comment.

I'm not an expert in this area, but this plan does not seem unreasonable to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN pending clean up is not interruptable

2015-08-11 Thread Andres Freund
On 2015-08-11 15:07:15 -0700, Jeff Janes wrote:
 When a user backend (as opposed to vacuum or autoanalyze) gets burdened
 with cleaning up the GIN pending list, it does not
 call CHECK_FOR_INTERRUPTS().
 
 Since cleaning does a lot of random IO, it can take a long time and it is
 not nice to be uninterruptable.

Agreed.

 The attached patch adds an else branch to call CHECK_FOR_INTERRUPTS().
 
 But I think we could instead just call vacuum_delay_point unconditionally.
 It calls CHECK_FOR_INTERRUPTS(), and if not in a throttled vacuum it does
 nothing else.  (That is how ANALYZE handles it.)

Hm, I find that not exactly pretty. I'd rather just add an unconditional
CHECK_FOR_INTERRUPTS to the function.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN pending clean up is not interruptable

2015-08-11 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On 2015-08-11 15:07:15 -0700, Jeff Janes wrote:
 The attached patch adds an else branch to call CHECK_FOR_INTERRUPTS().
 
 But I think we could instead just call vacuum_delay_point unconditionally.
 It calls CHECK_FOR_INTERRUPTS(), and if not in a throttled vacuum it does
 nothing else.  (That is how ANALYZE handles it.)

 Hm, I find that not exactly pretty. I'd rather just add an unconditional
 CHECK_FOR_INTERRUPTS to the function.

CHECK_FOR_INTERRUPTS is very cheap.  But I tend to agree that you should
be using vacuum_delay_point.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Test code is worth the space

2015-08-11 Thread Noah Misch
On Mon, Aug 10, 2015 at 07:02:17AM +0100, Simon Riggs wrote:
 Almost every patch I review has either zero or insufficient tests.
 
 If we care about robustness, then we must discuss tests. Here are my two
 recent experiences:
 
 I agree we could do with x10 as many tests, but that doesn't mean all tests
 make sense, so there needs to be some limiting principles to avoid adding
 pointless test cases. I recently objected to adding a test case to
 Isolation tester for the latest ALTER TABLE patch because in that case
 there is no concurrent behavior to test. There is already a regression test
 that tests lock levels for those statements, so in my view it is more
 sensible to modify the existing test than to add a whole new isolation test
 that does nothing more than demonstrate the lock levels work as described,
 which we already knew.

Committers press authors to delete tests more often than we press them to
resubmit with more tests.  No wonder so many patches have insufficient tests;
we treat those patches more favorably, on average.  I have no objective
principles for determining whether a test is pointlessly redundant, but I
think the principles should become roughly 10x more permissive than the
(unspecified) ones we've been using.

 On another review I suggested we add a function to core to allow it to be
 used in regression tests. A long debate ensued, deciding that we must be
 consistent and put diagnostic functions in contrib. My understanding is
 that we are not allowed to write regression tests that use contrib modules,
 yet the consistent place to put diagnostic functions is contrib -
 therefore, we are never allowed to write tests utilizing diagnostic
 functions. We are allowed to put modules for testing in another directory,
 but these are not distributed to users so cannot be used for production
 diagnosis. Systemic fail, advice needed.

If you want a user-visible function for production diagnosis, set aside the
fact that you wish to use it in a test, and find the best place for it.  Then,
place the tests with the function.  (That is, if the function belongs in
contrib for other reasons, put tests calling it in the contrib module itself.)

Place non-user-visible test support functions in regress.c, or use one of the
options Robert described.

Thanks,
nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reducing ClogControlLock contention

2015-08-11 Thread Amit Kapila
On Tue, Aug 11, 2015 at 7:27 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 11 August 2015 at 14:53, Amit Kapila amit.kapil...@gmail.com wrote:


 One more point here why do we need CommitLock before calling
 SimpleLruReadPage_ReadOnly() in the patch and if it is not required,
 then can we use LWLockAcquire(shared-buffer_locks[slotno],
LW_EXCLUSIVE);
 instead of CommitLock?


 That prevents read only access, not just commits, so that isn't a better
suggestion.

Read-only access of what (the clog page?)?

Here we are mainly doing three operations: read the clog page, write the
transaction status on the clog page, and update the shared control
state.  So basically two resources are involved, the clog page and the
shared control state; which one of those are you talking about?

Apart from the above, the code below assumes that we have an exclusive
lock on the clog page, which we don't in the proposed patch, as someone
can read the same page while we are modifying it.  In the current code,
this assumption is valid because during a write we take CLogControlLock
in Exclusive mode and while reading we take it in Shared mode.
TransactionIdSetStatusBit()
{
..
/* note this assumes exclusive access to the clog page */
byteval = *byteptr;
byteval &= ~(((1 << CLOG_BITS_PER_XACT) - 1) << bshift);
byteval |= (status << bshift);
*byteptr = byteval;
..
}

Now even if this is a problem, I think we can solve it with some
lower-level lock or maybe with an atomic operation, but I have mentioned
it to get your opinion on the same.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Reducing ClogControlLock contention

2015-08-11 Thread Amit Kapila
On Tue, Aug 11, 2015 at 7:32 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 11 August 2015 at 11:39, Amit Kapila amit.kapil...@gmail.com wrote:


 Another thing is that in this flow, with patch there will be three
locks
 (we take per-buffer locks before doing I/O) that will get involved
rather than
 two, so one effect of this patch will be that currently while doing
I/O,
 concurrent committers will be allowed to proceed as we release
ControlLock
 before doing the same whereas with Patch, they will not be allowed as
they
 are blocked by CommitLock.  Now may be this scenario is less common and
 doesn't matter much if the patch improves the more common scenario,
 however this is an indication of what Andres tries to highlight that
having more
 locks for this might make patch more complicated.


 It's easy to stripe the CommitLock in that case, if it is a problem.


 Sure, I think other places in code that take both the other locks also
 needs to be checked for updation.


  Not sure what that means, but there are no other places calling
CommitLock


What I mean is that if we want to ensure that we don't keep CommitLock
while doing I/O, then we might also want to ensure the same while
waiting for I/O.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] replication slot restart_lsn initialization

2015-08-11 Thread Andres Freund
On 2015-08-11 15:59:59 -0700, Gurjeet Singh wrote:
 In your version, I don't see a comment that refers to the fact that it
 works on the currently active (global variable) slot.

Hm?

/*
 * Reserve WAL for the currently active slot.
 *
 * Compute and set restart_lsn in a manner that's appropriate for the type of
 * the slot and concurrency safe.
 */
  I moved
  values[0] = NameGetDatum(MyReplicationSlot->data.name);
  nulls[0] = false;
  to outside the conditional block, there seems no reason to have it in
  there?
 
 
 The assignment to values[0] is being done twice. We can do away with the
 one in the else part of the if condition.

Ugh, that's a mistake.

 [1]: I alerted you to it in a personal email, but probably it fell through
 the cracks.

I usually just check the lists for newer patch versions, sorry...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Raising our compiler requirements for 9.6

2015-08-11 Thread Noah Misch
On Tue, Aug 11, 2015 at 01:04:48PM +0200, Andres Freund wrote:
 On 2015-08-05 15:46:36 +0200, Andres Freund wrote:
  On 2015-08-05 15:08:29 +0200, Andres Freund wrote:
   We might later want to change some of the harder to maintain macros to
   inline functions, but that seems better done separately.
  
  Here's a conversion for fastgetattr() and heap_getattr()

 Slightly updated version attached.

 In my opinion this drastically increases readability and thus should be
 applied. Will do so sometime tomorrow unless there's protest.

-1 to introducing more inline functions before committable code replaces what
you've already pushed for this thread.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] All-zero page in GIN index causes assertion failure

2015-08-11 Thread Alvaro Herrera
Alvaro Herrera wrote:

  BTW, shouldn't there be a step in BRIN vacuum that scans all the BRIN pages?
  If an empty page is missing from the FSM for any reason, there's nothing to
  add it there.
 
 Probably.  I didn't change this part yet.  There are two things to fix:
 1. since we use log_newpage_buffer(), we log the initialization but not
 the recording into FSM, so the page would be forgotten about.  This can
 be tested with PageIsEmpty().  An alternative to the vacuum scan is to
 use our own WAL record that not only logs the initialization itself but
 also the FSM update.  Not sure this is worth the trouble.
 
 2. additionally, if brin_getinsertbuffer extends the relation but we
 crash before the caller initializes it, the page would be detected by
 PageIsNew instead and would also need initialization.

Added this part.  It's using log_newpage_buffer too.  The vacuum scan
fixes the whole FSM, though, so after vacuum the FSM is up to date.
I think we could shave off a few bytes by using a separate WAL record,
but I'm not sure it's worth the trouble.

I intend to push this tomorrow.

I now think the free space calculations are broken, but I'll leave that
for later.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
commit 7e75695cf90e2fdea09dc5b5a2071026736fa749
Author: Alvaro Herrera alvhe...@alvh.no-ip.org
AuthorDate: Thu Aug 6 16:20:29 2015 -0300
CommitDate: Tue Aug 11 22:55:46 2015 -0300

Close some holes in BRIN page assignment

In some corner cases, it is possible for the BRIN index relation to be
extended by brin_getinsertbuffer but the new page not be used
immediately for anything in brin_doupdate; when this happens, the page
is initialized and the FSM is updated with the info about that page.  A
later index insert/update can use the page, but since the page is
already initialized, the initialization itself is not WAL-logged.  This
causes recovery to fail altogether.

There is another, related corner case inside brin_getinsertbuffer
itself, in which we extend the relation to put a new index tuple there,
but later find out that we cannot do so, and do not return the buffer;
the page obtained from extension is not even initialized.  The resulting
page is lost forever.

To fix, shuffle the code so that initialization is not done in
brin_getinsertbuffer at all, in normal cases; instead, the
initialization is done by its callers (brin_doinsert and brin_doupdate)
once they're certain that the page is going to be used.  When either of
those functions or brin_getinsertbuffer itself determines that the new
page cannot be used, before bailing out they initialize the page as an
empty regular page, enter it in the FSM and WAL-log all this.  This way,
the page is usable for future index insertions, and WAL replay doesn't
find itself trying to insert tuples into pages whose initialization never
made it to the WAL.

Additionally, add a new step to vacuuming so that all pages of the index
are scanned; whenever an uninitialized page is found, it is initialized
as empty and WAL-logged.  We also take this opportunity to update the
FSM, in case it has gotten out of date (which is a good thing, because
those operations are not WAL-logged).

Thanks to Heikki Linnakangas for finding the problem that kicked off some
additional analysis of BRIN page assignment.
---
 src/backend/access/brin/brin.c          |  43 ++
 src/backend/access/brin/brin_pageops.c  | 232 +
 src/include/access/brin_page.h          |   1 +
 src/include/access/brin_pageops.h       |   2 +
 4 files changed, 256 insertions(+), 22 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index fc9f964..cbd40c6 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -68,6 +68,7 @@ static void brinsummarize(Relation index, Relation heapRel,
 static void form_and_insert_tuple(BrinBuildState *state);
 static void union_tuples(BrinDesc *bdesc, BrinMemTuple *a,
 			 BrinTuple *b);
+static void brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy);
 
 
 /*
@@ -736,6 +737,8 @@ brinvacuumcleanup(PG_FUNCTION_ARGS)
 	heapRel = heap_open(IndexGetRelation(RelationGetRelid(info->index), false),
 						AccessShareLock);
 
+	brin_vacuum_scan(info->index, info->strategy);
+
 	brinsummarize(info->index, heapRel,
 				  &stats->num_index_tuples, &stats->num_index_tuples);
 
@@ -1150,3 +1153,43 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
 
 	MemoryContextDelete(cxt);
 }
+
+/*
+ * brin_vacuum_scan
+ *		Do a complete scan of the index during VACUUM.
+ *
+ * This routine scans the complete index looking for uncatalogued index pages,
+ * i.e. those that might have been lost due to a crash after index extension
+ * and such.
+ */
+static void
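
The rest of the routine is small; a rough sketch of its shape follows
(reconstructed for illustration and not guaranteed to match the committed
code; brin_page_cleanup stands in for the per-page repair logic added in
brin_pageops.c):

#include "postgres.h"

#include "access/brin_pageops.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "storage/freespace.h"

static void
brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy)
{
	BlockNumber nblocks;
	BlockNumber blkno;

	/*
	 * Scan every page of the index.  The per-page helper initializes
	 * (and WAL-logs) any page still in PageIsNew() state and records
	 * its free space in the FSM.
	 */
	nblocks = RelationGetNumberOfBlocks(idxrel);
	for (blkno = 0; blkno < nblocks; blkno++)
	{
		Buffer		buf;

		CHECK_FOR_INTERRUPTS();

		buf = ReadBufferExtended(idxrel, MAIN_FORKNUM, blkno,
								 RBM_NORMAL, strategy);
		brin_page_cleanup(idxrel, buf);
		ReleaseBuffer(buf);
	}

	/* Propagate the per-page free space updates up the FSM tree. */
	FreeSpaceMapVacuum(idxrel);
}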

[HACKERS] TransactionIdGetCommitTsData and its dereferenced pointers

2015-08-11 Thread Michael Paquier
Hi all,

TransactionIdGetCommitTsData@commit_ts.c does the following:
if (ts)
    *ts = entry.time;
[...]
return *ts != 0;
This is a bad idea: if TransactionIdGetCommitTsData is called with
ts == NULL, it will simply crash. It seems to me that it makes little
sense to allow ts == NULL anyway, because this function is intended to
return a timestamp in all cases, hence I think that we should simply
Assert(ts != NULL) and remove those checks.
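
To spell the hazard out with a contrived caller (invented for
illustration, not code from the tree):

static void
show_commit_time(TransactionId xid)
{
	TimestampTz ts;

	/* OK: the caller provides storage and the function fills it in. */
	if (TransactionIdGetCommitTsData(xid, &ts, NULL))
		elog(LOG, "xid %u committed at %s", xid, timestamptz_to_str(ts));

	/*
	 * Crashes as the code stands: the "if (ts)" guards skip the
	 * assignments, but the function still executes "return *ts != 0",
	 * dereferencing the NULL pointer.
	 */
	(void) TransactionIdGetCommitTsData(xid, NULL, NULL);
}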

Petr, Alvaro, perhaps you intended something like the attached patch?
Or alternatively the variant below, which would keep the function from
crashing when a caller does pass a NULL timestamp pointer, though I
doubt that was the original intention:
return ts != NULL ? *ts != 0 : false;
Regards,
-- 
Michael
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 5ad35c0..d35da03 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -266,6 +266,8 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	TransactionId oldestCommitTs;
 	TransactionId newestCommitTs;
 
+	Assert(ts != NULL);
+
 	/* Error if module not enabled */
 	if (!track_commit_timestamp)
 		ereport(ERROR,
@@ -294,8 +296,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 		TransactionIdPrecedes(xid, oldestCommitTs) ||
 		TransactionIdPrecedes(newestCommitTs, xid))
 	{
-		if (ts)
-			*ts = 0;
+		*ts = 0;
 		if (nodeid)
 			*nodeid = InvalidRepOriginId;
 		return false;
@@ -312,8 +313,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 		LWLockAcquire(CommitTsLock, LW_SHARED);
 		if (commitTsShared->xidLastCommit == xid)
 		{
-			if (ts)
-				*ts = commitTsShared->dataLastCommit.time;
+			*ts = commitTsShared->dataLastCommit.time;
 			if (nodeid)
 				*nodeid = commitTsShared->dataLastCommit.nodeid;
 
@@ -330,8 +330,8 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 		   SizeOfCommitTimestampEntry * entryno,
 		   SizeOfCommitTimestampEntry);
 
-	if (ts)
-		*ts = entry.time;
+	*ts = entry.time;
+
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
diff --git a/src/bin/pg_dump/pg_backup_tar.c b/src/bin/pg_dump/pg_backup_tar.c
index 8730c5e..3724d7e 100644
--- a/src/bin/pg_dump/pg_backup_tar.c
+++ b/src/bin/pg_dump/pg_backup_tar.c
@@ -379,8 +379,16 @@ tarOpen(ArchiveHandle *AH, const char *filename, char mode)
 	}
 	else
 	{
+		int		old_umask;
+
 		tm = pg_malloc0(sizeof(TAR_MEMBER));
 
+		/*
+		 * For security, no temporary file created can be group or other
+		 * accessible.
+		 */
+		old_umask = umask(S_IRWXG | S_IRWXO);
+
 #ifndef WIN32
 		tm->tmpFH = tmpfile();
 #else
@@ -415,6 +423,8 @@ tarOpen(ArchiveHandle *AH, const char *filename, char mode)
 		if (tm->tmpFH == NULL)
 			exit_horribly(modulename, "could not generate temporary file name: %s\n", strerror(errno));
 
+		umask(old_umask);
+
 #ifdef HAVE_LIBZ
 
 		if (AH->compression != 0)
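
For what it's worth, the umask calls in that hunk are the standard
save/mask/restore idiom.  As a standalone POSIX illustration (invented
for this note, not pg_dump code):

#include <stdio.h>
#include <sys/stat.h>

/*
 * Create a temporary file that can never be group- or world-accessible,
 * then put the process's original file-creation mask back.
 */
static FILE *
open_private_tmpfile(void)
{
	mode_t		old_umask;
	FILE	   *fh;

	/* Mask out all group/other bits: the file is created 0600 at most. */
	old_umask = umask(S_IRWXG | S_IRWXO);

	fh = tmpfile();

	/* Restore the original mask whether or not tmpfile() succeeded. */
	umask(old_umask);

	return fh;
}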



Re: [HACKERS] can't coax query planner into using all columns of a gist index

2015-08-11 Thread Tom Lane
Gideon Dresdner gide...@gmail.com writes:
 I had a discussion on IRC today with RhodiumToad regarding optimizing a
 specific query. We didn't manage to figure out how to get postgres to hit a
 GIST index.

FWIW, I couldn't reproduce the described behavior.  Can you provide a
self-contained test case?  Are you sure your server is 9.4.4?

regards, tom lane

