date:20160412

Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-04-12 Thread Michael Paquier

On Wed, Apr 13, 2016 at 11:24 AM, Etsuro Fujita
 wrote:
> On 2016/04/13 3:14, Robert Haas wrote:
>>
>> I'm wondering why we are fixing this specific case and not any of the
>> other calls to PQexec() or PQexecParams() in postgres_fdw.c.
>>
>> I mean, many of those instances are cases where the query isn't likely
>> to run for very long, but certainly "FETCH %d FROM c%u" is in theory
>> just as bad as the new code introduced in 9.6.  In practice, it
>> probably isn't, because we're probably only fetching 50 rows at a time
>> rather than potentially a lot more, but if we're fixing this code up
>> to be interrupt-safe, maybe we should fix it all at the same time.
>> Even for the short-running queries like CLOSE and DEALLOCATE, it seems
>> possible that there could be a network-related hang which you might
>> want to interrupt.
>
>
> Actually, I was wondering, too, but I didn't propose that because, as far as
> I know, there are no reports from the field.  But I agree with you.

For something that is HEAD-only that's a great idea to put everything
into the same flag like that.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Performance degradation in commit 6150a1b0

2016-04-12 Thread Robert Haas

On Tue, Apr 12, 2016 at 10:30 PM, Noah Misch  wrote:
> That sounds like this open item is ready for CLOSE_WAIT status; is it?

I just retested this on power2.  Here are the results.  I retested
3fed4174 and 6150a1b0 plus master as of deb71fa9.  5-minute pgbench -S
runs, scale factor 300, with predictable prewarming to minimize
variation, as well as numactl --interleave.  Each result is a median
of three.

1 client: 3fed4174 = 13701.014931, 6150a1b0 = 13669.626916, master =
19685.571089
8 clients: 3fed4174 = 126676.357079, 6150a1b0 = 125239.911105, master
= 122940.079404
32 clients: 3fed4174 = 323989.685428, 6150a1b0 = 338638.095126, master
= 333656.861590
64 clients: 3fed4174 = 495434.372578, 6150a1b0 = 457794.475129, master
= 493034.922791
128 clients: 3fed4174 = 376412.090366, 6150a1b0 = 363157.294391,
master = 625498.280370

On this test 8, 32, and 64 clients are coming out about the same as
3fed4174, but 1 client and 128 clients are dramatically improved with
current master.  The 1-client result is a lot more surprising than the
128-client result; I don't know what's going on there.  But anyway I
don't see a regression here.

So, yes, I would say this should go to CLOSE_WAIT at this point,
unless Amit or somebody else turns up further evidence of a continuing
issue here.

Random points of possible interest:

1. During a 128-client run, top shows about 45% user time, 10% system
time, 45% idle.

2. About 3 minutes into a 128-client run, perf looks like this
(substantially abridged):

3.55%  postgres postgres [.] GetSnapshotData
2.15%  postgres postgres [.] LWLockAttemptLock
 |--32.82%-- LockBuffer
  |  |--48.59%-- _bt_relandgetbuf
  |  |--44.07%-- _bt_getbuf
  |--29.81%-- ReadBuffer_common
  |--23.88%-- GetSnapshotData
  |--5.30%-- LockAcquireExtended
 2.12%  postgres postgres [.] LWLockRelease
 2.02%  postgres postgres [.] _bt_compare
 1.88%  postgres postgres [.]
hash_search_with_hash_value
  |--47.21%-- BufTableLookup
  |--10.93%-- LockAcquireExtended
  |--5.43%-- GetPortalByName
  |--5.21%-- ReadBuffer_common
  |--4.68%-- RelationIdGetRelation
 1.87%  postgres postgres [.] AllocSetAlloc
 1.42%  postgres postgres [.] PinBuffer.isra.3
 0.96%  postgres libc-2.17.so [.] __memcpy_power7
 0.89%  postgres postgres [.]
UnpinBuffer.constprop.7
 0.80%  postgres postgres [.] PostgresMain
 0.80%  postgres postgres [.]
pg_encoding_mbcliplen
 0.71%  postgres postgres [.] hash_any
 0.62%  postgres postgres [.] AllocSetFree
 0.59%  postgres postgres [.] palloc
 0.57%  postgres libc-2.17.so [.] _int_free

A context-switch profile, somewhat amazingly, shows no context
switches for anything other than waiting on client read, implying that
performance is entirely constrained by memory bandwidth and CPU speed,
not lock contention.

> If someone does retest this, it would be informative to see how the system
> performs with 6150a1b0 reverted.  Your testing showed performance of 6150a1b0
> alone and of 6150a1b0 plus predecessors of 008608b and 4835458.  I don't
> recall seeing figures for 008608b + 4835458 - 6150a1b0, though.

That revert isn't trivial: even what exactly that would mean at this
point is somewhat subjective.  I'm also not sure there is much point.
6150a1b08a9fe7ead2b25240be46dddeae9d98e1 was written in such a way
that only platforms with single-byte spinlocks were going to have a
BufferDesc that fits into 64 bytes, which in retrospect was a bit
short-sighted.  Because the changes that were made to get it back down
to 64 bytes might also have other performance-relevant consequences,
it's a bit hard to be sure that that was the precise thing that caused
the regression.  And of course there was a fury of other commits going
in at the same time, some even on related topics, which further adds
to the difficulty of pinpointing this precisely.  All that is a bit
unfortunate in some sense, but I think we're just going to have to
keep moving forward and hope for the best.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-04-12 Thread Pavel Stehule

2016-04-12 22:48 GMT+02:00 Tom Lane :

> Pavel Stehule  writes:
> > I had a idea about additional options of COPY RAW statements. One can be
> > CAST function. These CAST functions can be used to any for any format.
>
> Uh, what?  CAST() is not about external representations of values, and
> overloading it for that purpose doesn't seem like a particularly good
> idea: you'd have to figure out what the conversions meant inside SQL as
> well as externally.  Also, maybe I missed something, but a different
> representation of individual data values within a COPY wasn't what we
> were after here.
>

I didn't think about this idea to deep - so there can be more than one
problem. More - I though about it before you designed RAW_TEXT mode - that
can coverage this use case too.

Originally I had only RAW mode, what can be difficult for JSONB, so my
solution was

COPY target(jsonbcol) FROM jsondata OPTIONS(RAW, CAST(json_to_jsonb)).

Now this idea is obsolete, because anybody can do

COPY target(jsonbcol) FROM jsondata OPTIONS(RAW_TEXT)

What is much more simple.

Using explicit casts in COPY statement was motivated by possible
requirement do some manipulations with data before their storing to table.
It is idea, and probably wrong idea.

I don't want to increase complexity of COPY statement too much. My goal is
enhance COPY to import single objects simply. And if you need some more
complex, then you can write some simple application where can be used
classic COPY or COPY RAW again (because it doesn't require escaping).

Regards

Pavel

>
> regards, tom lane
>

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-12 Thread Andres Freund

On 2016-04-12 23:52:14 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> 
> > I'm kinda inclined to apply that portion (or just the whole patch with
> > the spurious #ifdef 0 et al fixed) into 9.6; and add the necessary
> > checks in a few places. Because I really think this is likely to hit
> > unsuspecting users.
> 
> !!!
> 
> Be sure to consult with the RMT before doing anything of the sort.

I didn't plan to do anything without a few +1's. I don't think we can
release with the state of things as is though. I don't see a less
intrusive way than to get rid of that spinlock on all platforms capable
of significant concurrency.

So, RMT, what are your thoughts on this?

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-12 Thread Alvaro Herrera

Andres Freund wrote:

> I'm kinda inclined to apply that portion (or just the whole patch with
> the spurious #ifdef 0 et al fixed) into 9.6; and add the necessary
> checks in a few places. Because I really think this is likely to hit
> unsuspecting users.

!!!

Be sure to consult with the RMT before doing anything of the sort.
It might as well decide to revert the whole patch.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Performance degradation in commit 6150a1b0

2016-04-12 Thread Noah Misch

On Tue, Apr 12, 2016 at 05:36:07PM -0400, Robert Haas wrote:
> So the current status of this issue is:
> 
> 1. Andres committed a patch (008608b9d51061b1f598c197477b3dc7be9c4a64)
> to reduce the size of an LWLock by an amount equal to the size of a
> mutex (modulo alignment).
> 
> 2. Andres also committed a patch
> (48354581a49c30f5757c203415aa8412d85b0f70) to remove the spinlock from
> a BufferDesc, which also reduces its size, I think, because it
> replaces members of types BufFlags (2 bytes), uint8, slock_t, and
> unsigned with a single member of type pg_atomic_uint32.
> 
> The reason why these changes are relevant is because Andres thought
> the observed regression might be related to the BufferDesc growing to
> more than 64 bytes on POWER, which in turn could cause buffer
> descriptors to get split across cache lines.  However, in the
> meantime, I did some performance tests on the same machine that Amit
> used for testing in the email that started this thread:
> 
> http://www.postgresql.org/message-id/ca+tgmozjda6k7-17k4a48rvb0upr98hvuancfnnlrgsdb1u...@mail.gmail.com
> 
> The upshot of that is that (1) the performance degradation I saw was
> significant but smaller than what Amit reported in the OP, and (2) it
> looked like the patches Andres gave me to test at the time got
> performance back to about the same level we were at before 6150a1b0.
> So there's room for optimism that this is fixed, but perhaps some
> retesting is in order, since what was committed was, I think, not
> identical to what I tested.

That sounds like this open item is ready for CLOSE_WAIT status; is it?

If someone does retest this, it would be informative to see how the system
performs with 6150a1b0 reverted.  Your testing showed performance of 6150a1b0
alone and of 6150a1b0 plus predecessors of 008608b and 4835458.  I don't
recall seeing figures for 008608b + 4835458 - 6150a1b0, though.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-04-12 Thread Etsuro Fujita


On 2016/04/13 3:14, Robert Haas wrote:

I'm wondering why we are fixing this specific case and not any of the
other calls to PQexec() or PQexecParams() in postgres_fdw.c.

I mean, many of those instances are cases where the query isn't likely
to run for very long, but certainly "FETCH %d FROM c%u" is in theory
just as bad as the new code introduced in 9.6.  In practice, it
probably isn't, because we're probably only fetching 50 rows at a time
rather than potentially a lot more, but if we're fixing this code up
to be interrupt-safe, maybe we should fix it all at the same time.
Even for the short-running queries like CLOSE and DEALLOCATE, it seems
possible that there could be a network-related hang which you might
want to interrupt.


Actually, I was wondering, too, but I didn't propose that because, as 
far as I know, there are no reports from the field.  But I agree with you.



How about we encapsulate the while (PQisBusy(...)) loop into a new
function pgfdw_get_result(), which can be called after first calling
PQsendQueryParams()?  So then this code will say dmstate->result =
pgfdw_get_result(dmstate->conn).  And we can do something similar for
the other call to PQexecParams() in create_cursor().  Then let's also
add something like pgfdw_exec_query() which calls PQsendQuery() and
then pgfdw_get_result, and use that to replace all of the existing
calls to PQexec().

Then all the SQL postgres_fdw executes would be interruptible, not
just the new stuff.


Seems like a good idea.  Will do.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-04-12 Thread Noah Misch

On Tue, Apr 12, 2016 at 10:08:23PM +0200, Magnus Hagander wrote:
> On Tue, Apr 12, 2016 at 8:39 AM, Noah Misch  wrote:
> > On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote:
> > > Well, if we *don't* do the rewrite before we release it, then we have to
> > > instead put information about the new version of the functions into the
> > old
> > > structure I think.
> > >
> > > So I think it's an open issue.
> >
> > Works for me...
> >
> > [This is a generic notification.]
> >
> > The above-described topic is currently a PostgreSQL 9.6 open item.  Magnus,
> > since you committed the patch believed to have created it, you own this
> > open
> > item.  If that responsibility lies elsewhere, please let us know whose
> > responsibility it is to fix this.  Since new open items may be discovered
> > at
> > any time and I want to plan to have them all fixed well in advance of the
> > ship
> > date, I will appreciate your efforts toward speedy resolution.  Please
> > present, within 72 hours, a plan to fix the defect within seven days of
> > this
> > message.  Thanks.
> >
> 
> I won't have time to do the bigger rewrite/reordeirng by then, but I can
> certainly commit to having the smaller updates done to cover the new
> functionality in less than a week. If nothing else, that'll be something
> for me to do on the flight over to pgconf.us.

Thanks for that plan; it sounds good.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel Queries and PostGIS

2016-04-12 Thread David Rowley

On 1 April 2016 at 17:12, David Rowley  wrote:
> On 30 March 2016 at 09:14, Robert Haas  wrote:
>> On Tue, Mar 29, 2016 at 3:48 PM, Paul Ramsey  
>> wrote:
 I have no idea why the cost adjustments that you need are different
 for the scan case and the aggregate case.  That does seem problematic,
 but I just don't know why it's happening.
>>>
>>> What might be a good way to debug it? Is there a piece of code I can
>>> look at to try and figure out the contribution of COST in either case?
>>
>> Well, the cost calculations are mostly in costsize.c, but I dunno how
>> much that helps.  Maybe it would help if you posted some EXPLAIN
>> ANALYZE output for the different cases, with and without parallelism?
>>
>> One thing I noticed about this output (from your blog)...
>>
>> Finalize Aggregate
>>  (cost=16536.53..16536.79 rows=1 width=8)
>>  (actual time=2263.638..2263.639 rows=1 loops=1)
>>->  Gather
>>(cost=16461.22..16461.53 rows=3 width=32)
>>(actual time=754.309..757.204 rows=4 loops=1)
>>  Number of Workers: 3
>>  ->  Partial Aggregate
>>  (cost=15461.22..15461.23 rows=1 width=32)
>>  (actual time=676.738..676.739 rows=1 loops=4)
>>->  Parallel Seq Scan on pd
>>(cost=0.00..13856.38 rows=64 width=2311)
>>(actual time=3.009..27.321 rows=42 loops=4)
>>  Filter: (fed_num = 47005)
>>  Rows Removed by Filter: 17341
>>  Planning time: 0.219 ms
>>  Execution time: 2264.684 ms
>>
>> ...is that the finalize aggregate phase is estimated to be very cheap,
>> but it's actually wicked expensive.  We get the results from the
>> workers in only 750 ms, but it takes another second and a half to
>> aggregate those 4 rows???
>
> hmm, actually I've just realised that create_grouping_paths() should
> be accounting agg_costs differently depending if it's partial
> aggregation, finalize aggregation, or just normal. count_agg_clauses()
> needs to be passed the aggregate type information to allow the walker
> function to cost the correct portions of the aggregate correctly based
> on what type of aggregation the costs will be used for.  In short,
> please don't bother to spend too much time tuning your costs until I
> fix this.
>
> As of now the Partial Aggregate is including the cost of the final
> function... that's certainly broken, as it does not call that
> function.
>
> I will try to get something together over the weekend to fix this, but
> I have other work to do until then.

Hi Paul,

As of deb71fa, committed by Robert today, you should have a bit more
control over parallel aggregate costings. You can how raise the
transfn cost, or drop the combinefn cost to encourage parallel
aggregation. Keep in mind the derialfn and deserialfn costs are now
also accounted for too.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why doesn't src/backend/port/win32/socket.c implement bind()?

2016-04-12 Thread Tom Lane

Magnus Hagander  writes:
> On Mon, Jan 11, 2016 at 6:19 AM, Amit Kapila 
> wrote:
>> On Sun, Jan 10, 2016 at 11:55 PM, Tom Lane  wrote:
>>> I think the reason why we're getting "No error" instead of a useful
>>> strerror report is that socket.c doesn't provide an implementation
>>> of bind() that includes TranslateSocketError().

>> listen also doesn't have such an implementation and probably few others.
>> I think here we should add a win32 wrapper over bind and listen
>> API's which ensures TranslateSocketError() should be called for
>> error cases.

> Yeah, that seems like a good idea.

I finally got around to doing this, after being annoyed by yet another
Windows buildfarm failure with no clear indication as to the cause:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird=2016-04-12%2022%3A30%3A12

While we wait to see if that actually helps give useful errors,
I had a thought about what may be happening here.  PostgresNode.pm
picks a random high port number and tests to see if it's free using
pg_isready, with (unless I'm misreading) any non-zero result code
being taken as "it's free".  The problem here is that that completely
fails to recognize a port being used by a non-Postgres process as
not-free --- most likely, you'll get PQPING_NO_RESPONSE for that case.
If there's other stuff using high ports on a particular buildfarm machine,
you'd expect occasional random test failures due to this.  The observed
fact that some buildfarm critters are much more prone to this type of
failure than others is well explained by this hypothesis.

I think we should forget about pg_isready altogether here, and instead
write some code that either tries to bind() the target port number itself,
or tries a low-level TCP connection request to the target port.  I'm
not sure what's the most convenient way to accomplish either in Perl.

The bind() solution would provide a more trustworthy answer, but it
might actually create more problems than it solves if the OS requires a
cooling-off period before giving the port out to a different process.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Detrimental performance impact of ringbuffers on performance

2016-04-12 Thread Stephen Frost

Robert, Andres,

* Andres Freund (and...@anarazel.de) wrote:
> On 2016-04-12 14:29:10 -0400, Robert Haas wrote:
> > On Wed, Apr 6, 2016 at 6:57 AM, Andres Freund  wrote:
> > That does not seem like a good idea from here.  One of the ideas I
> > still want to explore at some point is having a background process
> > identify the buffers that are just about to be evicted and stick them
> > on the freelist so that the backends don't have to run the clock sweep
> > themselves on a potentially huge number of buffers, at perhaps
> > substantial CPU cost.  Amit's last attempt at this didn't really pan
> > out, but I'm not convinced that the approach is without merit.
> 
> FWIW, I've posted an implementation of this in the checkpoint flushing
> thread; I saw quite substantial gains with it. It was just entirely
> unrealistic to push that into 9.6.

That is fantastic to hear and I certainly agree that we should be
working on that approach.

> > And, on the other hand, if we don't do something like that, it will be
> > quite an exceptional case to find anything on the free list.  Doing it
> > just to speed up developer benchmarking runs seems like the wrong
> > idea.
> 
> I don't think it's just developer benchmarks. I've seen a number of
> customer systems where significant portions of shared buffers were
> unused due to this.

Ditto.

I agree that we should be smarter when we have a bunch of free
shared_buffers space and we're doing sequential work.  I don't think we
want to immediately grab all that free space for the sequential work but
perhaps there's a reasonable heuristic we could use- such as if the free
space available is twice what we expect our sequential read to be, then
go ahead and load it into shared buffers?

The point here isn't to get rid of the ring buffers but rather to use
the shared buffer space when we have plenty of it and there isn't
contention for it.

Thanks!

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] fd.c: flush data problems on osx

2016-04-12 Thread Tom Lane

Michael Paquier  writes:
> On Mon, Mar 21, 2016 at 9:09 PM, Stas Kelvich  
> wrote:
>> On 21 Mar 2016, at 14:53, Andres Freund  wrote:
>>> I think we still need to fix the mmap() implementation to support the
>>> offset = 0, nbytes = 0 case (via fseek(SEEK_END).

>> It is already in this diff. Iâve added this few messages ago.

There are bigger issues in this code, actually, to wit assuming that
it should always be possible to mmap the target region.  That's patently
false on 32-bit machines, where you likely won't have a gigabyte of free
address space.

For this reason, I think that issuing a WARNING on mmap failure is a
damfool idea, and giving up on the flush attempt even more so.  We
should just silently fall through to the next implementation if we
cannot mmap the target region.

While I'm whinging: what happens when off_t is wider than size_t?  It's
entirely possible that the user has configured the relation segment size
to more than 4GB, so I do not think that's academic.  I think we also need
a test for length > SIZE_T_MAX and then fall through to another
implementation, rather than mapping and msync'ing some initial fragment of
the file, which is what will happen now.

> A similar change seems to be needed in initdb.c's pre_sync_fname.

Hmm, do we need to move this logic into src/common?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Pglogical questions and problems

2016-04-12 Thread Joshua D. Drake


Hello,

Alright -- here we go PostgreSQL 9.5.2, Ubuntu Trusty, All packages from 
apt.postgresql.org except PgLogical which is from 2Q:


I have the following setup:

Origin0->Replica0
 Table: logical_test(id bigserial primary key)

Origin0:

SELECT pglogical.create_node(
  node_name := 'origin',
  dsn := 'host=192.168.1.65 port=5432 dbname=logical'
);

CREATE TABLE logical_test (id bigserial primary key);

SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);

Replica0:

SELECT pglogical.create_node(
  node_name := 'replica0',
  dsn := 'host=192.168.1.66 port=5432 dbname=logical'
);

SELECT pglogical.create_subscription(
  subscription_name := 'replica0relations',
  provider_dsn := 'host=192.168.1.65 port=5432 dbname=logical',
  synchronize_structure := TRUE,
  synchronize_data := TRUE
);

Replicating from Origin0->Replica0 works. I then added Replica1:

SELECT pglogical.create_node(
  node_name := 'replica2',
  dsn := 'host=192.168.1.67 port=5432 dbname=logical'
);

SELECT pglogical.create_subscription(
  subscription_name := 'logical_subscriber2',
  provider_dsn := 'host=192.168.1.66 port=5432 dbname=logical',
  synchronize_data := TRUE,
  synchronize_structure := TRUE
);

The initial sync works, I end up with the table and all rows. However if 
I perform an update and add or modify rows, only the origin and replica0 
update. Replica1 provides the following:


2016-04-12 15:38:10 PDT [25712-2] ERROR:  cache lookup failed for 
replication origin 'pgl_logical_origin_replica084e3989'
2016-04-12 15:38:10 PDT [1192-89105] LOG:  worker process: pglogical 
apply 16384:1108649370 (PID 25712) exited with exit code 1


And continues to provide this rather non-useful message continuously in 
a loop.


I tried dropping the subscription:

logical=# select pglogical.drop_subscription(subscription_name := 
'logical_subscriber2');

 drop_subscription
---
 1

And dropping the tables:

logical=# drop table logical_test;
DROP TABLE

logical=# \d
No relations found.

Then add the subscription:

SELECT pglogical.create_subscription(
  subscription_name := 'logical_subscriber2',
  provider_dsn := 'host=192.168.1.66 port=5432 dbname=logical',
  synchronize_data := TRUE,
  synchronize_structure := TRUE
);

logical=# \d
 List of relations
 Schema |Name |   Type   |  Owner
+-+--+--
 public | logical_test| table| postgres
 public | logical_test_id_seq | sequence | postgres
(2 rows)

logical=# select count(*) from logical_test;
 count
---
  1100
(1 row)

That is accurate but if I try to add rows from the Origin:

logical=# truncate logical_test;
TRUNCATE TABLE
logical=# select count(*) from logical_test;
 count
---
 0
(1 row)

Replica0:

logical=# select count(*) from logical_test;
 count
---
 0
(1 row)

Replica2:

logical=# select count(*) from logical_test;
 count
---
  1100
(1 row)

Replica2 log:

2016-04-12 15:43:39 PDT [4881-1] LOG:  starting apply for subscription 
logical_subscriber2
2016-04-12 15:43:39 PDT [4881-2] ERROR:  cache lookup failed for 
replication origin 'pgl_logical_origin_replica084e3989'
2016-04-12 15:43:39 PDT [1192-100644] LOG:  worker process: pglogical 
apply 16384:1108649370 (PID 4881) exited with exit code 1


So what am I missing?

Origin pg_hba.conf:

hostssl  replicationpostgres192.168.1.66/32 md5
hostssl  logicalpostgres192.168.1.66/32 md5

Replica0 pg_hba.conf:

hostssl  logical postgres127.0.0.1/32   md5
hostssl  logical postgres192.168.1.66/32md5
hostssl  logical postgres192.168.1.67/32md5
hostssl  replication postgres192.168.1.67/32md5

Replica2 pg_hba.conf:

hostssl  logical postgres127.0.0.1/32   md5
hostssl  logical postgres192.168.1.66/32md5
hostssl  logical postgres192.168.1.67/32md5
hostssl  replication postgres192.168.1.67/32md5

All auth is done via .pgpass.

Sincerely,

JD

















--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Choosing parallel_degree

2016-04-12 Thread Julien Rouhaud

On 11/04/2016 22:53, Julien Rouhaud wrote:
> On 11/04/2016 17:44, Robert Haas wrote:
>>
>> We should probably add the number of workers actually obtained to the
>> EXPLAIN ANALYZE output.  That's been requested before.
>>
> 
> If it's not too late for 9.6, it would be very great.
> 

Just in case I attach a patch to implement it. I'll add it to the next
commitfest.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 713cd0e..8a1fbab 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1339,8 +1339,16 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (plan->qual)
show_instrumentation_count("Rows 
Removed by Filter", 1,

   planstate, es);
-   ExplainPropertyInteger("Number of Workers",
+   ExplainPropertyInteger("Number of Workers 
planned",
   
gather->num_workers, es);
+   if (es->analyze)
+   {
+   int nworkers;
+
+   nworkers = ((GatherState *) 
planstate)->nworkers_launched;
+   ExplainPropertyInteger("Number of 
Workers launched",
+   
   nworkers, es);
+   }
if (gather->single_copy)
ExplainPropertyText("Single Copy",
  
gather->single_copy ? "true" : "false",
diff --git a/src/backend/executor/nodeGather.c 
b/src/backend/executor/nodeGather.c
index 3f0ed69..3834ed6 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -166,6 +166,7 @@ ExecGather(GatherState *node)
 */
pcxt = node->pei->pcxt;
LaunchParallelWorkers(pcxt);
+   node->nworkers_launched = pcxt->nworkers_launched;
 
/* Set up tuple queue readers to read the results. */
if (pcxt->nworkers_launched > 0)
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index dbec07e..ee4e189 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1956,6 +1956,7 @@ typedef struct GatherState
struct ParallelExecutorInfo *pei;
int nreaders;
int nextreader;
+   int nworkers_launched;
struct TupleQueueReader **reader;
TupleTableSlot *funnel_slot;
boolneed_to_scan_locally;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Speedup twophase transactions

2016-04-12 Thread Michael Paquier

On Wed, Apr 13, 2016 at 1:53 AM, Stas Kelvich  wrote:
>> On 12 Apr 2016, at 15:47, Michael Paquier  wrote:
>>
>> It looks to be the case... The PREPARE phase replayed after the
>> standby is restarted in recovery creates a series of exclusive locks
>> on the table created and those are not taken on HEAD. Once those are
>> replayed the LOCK_STANDBY record is conflicting with it. In the case
>> of the failure, the COMMIT PREPARED record cannot be fetched from
>> master via the WAL stream so the relation never becomes visible.
>
> Yep, it is. It is okay for prepared xact hold a locks for created/changed 
> tables,
> but code in standby_redo() was written in assumption that there are no 
> prepared
> xacts at the time of recovery. I’ll look closer at checkpointer code and will 
> send
> updated patch.
>
> And thanks again.

That's too late for 9.6 unfortunately, don't forget to add that in the next CF!
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel Aggregate costs don't consider combine/serial/deserial funcs

2016-04-12 Thread David Rowley

On 13 April 2016 at 08:52, Robert Haas  wrote:
> On Sun, Apr 10, 2016 at 8:47 AM, David Rowley
>  wrote:
>> I realised a few days ago that the parallel aggregate code does not
>> cost for the combine, serialisation and deserialisation functions at
>> all.
>
> Oops.
>
>> I've attached a patch which fixes this.
>
> I've committed this patch.  I wonder if it's going to produce compiler
> warnings for some people, complaining about possible use of an
> uninitialized variable.  That would kind of suck.  I don't much mind
> having to insert a dummy assignment to shut the compiler up; a smarter
> compiler will just throw it out anyway.  I'm less enthused about a
> dummy MemSet.  The compiler is less likely to be able to get rid of
> that, and it's more expensive if it doesn't.  But let's see what
> happens.

Thanks for committing.

I wondered that too, so checked a couple of compilers and got no
warnings, but the buildfarm should let us know. The other option would
be to palloc() them, and have them set to NULL initially... that's not
very nice either... Another option would be to protect the final
parallel path generation with if (grouped_rel->partial_pathlist &&
grouped_rel->consider_parallel). I'd imagine any compiler smart enough
to work out that uninitialised is not possible would also be able to
remove the check for grouped_rel->consider_parallel, but *shrug*, I
don't often look at the assembly that compilers generate, so I might
be giving them too much credit.

>> One small point which I was a little unsure of in the attached is,
>> should the "if (aggref->aggdirectargs)" part of
>> count_agg_clauses_walker() be within the "if
>> (!context->combineStates)". I simply couldn't decide. We currently
>> have no aggregates which this affects anyway, per; select * from
>> pg_aggregate where aggcombinefn <> 0 and aggkind <> 'n'; so for now
>> I've left it outwith.
>
> The direct arguments would be evaluated in the worker, but not in the
> leader, right?  Or am I confused?

That seems right, but I just can't think of how its possible to
parallelise these aggregates anyway.

>> Another thing I thought of is that it's not too nice that I have to
>> pass 3 bools to count_agg_clauses() in order to tell it what to do. I
>> was tempted to invent some bitmask flags for this, then modify
>> create_agg_path() to use the same flags, but I thought I'd better not
>> cause too much churn with this patch.
>
> I'm kinda tempted to say this should be using an enum.  I note that
> serialStates has a subtly different meaning here than in some other
> places where you have used the same term.

hmm, I'm not sure how it's subtly different. Do you mean the
preference towards costing the finalfn when finalizeAggs is true, and
ignoring the serialfn in this case? nodeAgg.c should do the same,
although it'll deserialize in such a case. We can never finalize and
serialize in the same node.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Performance degradation in commit 6150a1b0

2016-04-12 Thread Robert Haas

On Thu, Mar 31, 2016 at 10:13 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Mar 31, 2016 at 6:45 AM, Andres Freund  wrote:
>>> On 2016-03-31 06:43:19 -0400, Robert Haas wrote:
 To which proposal are you referring?
>
>>> 1) in 
>>> http://www.postgresql.org/message-id/20160328130904.4mhugvkf4f3wg...@awork2.anarazel.de
>
>> OK.  So, Noah, my proposed strategy is to wait and see if Andres can
>> make that work, and if not, then revisit the issue of what to do.
>
> I thought that proposal had already crashed and burned, on the grounds
> that byte-size spinlocks require instructions that many PPC machines
> don't have.

So the current status of this issue is:

1. Andres committed a patch (008608b9d51061b1f598c197477b3dc7be9c4a64)
to reduce the size of an LWLock by an amount equal to the size of a
mutex (modulo alignment).

2. Andres also committed a patch
(48354581a49c30f5757c203415aa8412d85b0f70) to remove the spinlock from
a BufferDesc, which also reduces its size, I think, because it
replaces members of types BufFlags (2 bytes), uint8, slock_t, and
unsigned with a single member of type pg_atomic_uint32.

The reason why these changes are relevant is because Andres thought
the observed regression might be related to the BufferDesc growing to
more than 64 bytes on POWER, which in turn could cause buffer
descriptors to get split across cache lines.  However, in the
meantime, I did some performance tests on the same machine that Amit
used for testing in the email that started this thread:

http://www.postgresql.org/message-id/ca+tgmozjda6k7-17k4a48rvb0upr98hvuancfnnlrgsdb1u...@mail.gmail.com

The upshot of that is that (1) the performance degradation I saw was
significant but smaller than what Amit reported in the OP, and (2) it
looked like the patches Andres gave me to test at the time got
performance back to about the same level we were at before 6150a1b0.
So there's room for optimism that this is fixed, but perhaps some
retesting is in order, since what was committed was, I think, not
identical to what I tested.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0

2016-04-12 Thread David G. Johnston

On Tue, Apr 12, 2016 at 2:04 PM, Robert Haas  wrote:

> On Tue, Apr 12, 2016 at 4:32 PM, David G. Johnston
>  wrote:
> > I give a solid +10 to Robert's opinions on the matter and aside from
> > figuring out if and how to fit first-number versioning dynamics into our
> > release policies I think the community is doing a sufficient job on the
> > communication and planning front.  The biggest resource need is quality
> > control.  I dislike the fact that we are currently in a situation where
> the
> > first 3 point releases each year are considered "live betas" based
> > especially on both 9.3 and 9.5 post-release significant bug counts.
>
> /me blinks.
>
> I find it shocking that you would compare 9.5 to 9.3 that way.  Yeah,
> we've had a few bugs in 9.5: in particular, it was disappointing to
> have to disable abbreviated keys.  But I'm not sure that I really
> believe that affected massive numbers of users in a really negative
> way - many locales were just fine, and not every locale that had a
> problem with some string data necessarily had a problem with the
> strings people were actually storing.  But we haven't eaten anybody's
> data, at least not beyond what can be fixed by a REINDEX, unless I
> missed something here.
>

I probably over-implied my feelings regarding 9.5 since, yes, abbreviated
keys was largely out of realm of reasonable expectation.


I think I am colored by my involvement in attempting to help research this
one:

9.5.1 "Fix an oversight that caused hash joins to miss joining to some
tuples of the inner relation in rare cases (Tomas Vondra, Tom Lane)"

This one struck me as well for some reason...

9.5.1 "Fix overeager pushdown of HAVING clauses when grouping sets are
used."

The two ROW() comparison fixes for 9.5.2

I'm not exactly someone looking to poke a stick in PostgreSQL's side so
regardless of degree of "validity" of my feelings that fact that I have
them is likely informative.  Or I'm just in a particularly overly-sensitive
mood right now - I wouldn't fully discount the possibility.


> The fact is that this is a fairly hard problem to solve.  Some bugs
> are not going to get found before people try the software, and we
> can't make them try it while it's in beta.  We can only do our best to
> do good code review, but inevitably we will miss some things.
>

Agreed, and to the point of using corporate resources for improvement of
existing work as opposed to roadmap stuff.


> As for your proposal that we blindly consider $(N+1).0 to follow $N.4,
> I'm not particularly enthralled with that.  I think it's a good idea
> to look for a release that's got some particularly nifty feature(s)
> and use that as the time to move the first digit.  And, sure, plan to
> have that happen every 4-6 years or so, but adjust based on what
> actually gets into which releases.
>
>
The main point on that post was we emphasize not only the new stuff in the
just released version but that we re-celebrate everything that has been
accomplished in the previous 4 releases as well.  If the first-digit is
getting such significant attention we might as well play to that fact and
try and remind people who've missed out on prior releases that we are
continually innovating.  That, and the fact that the last N.0 release that
was so highly touted just went out of support.

Otherwise I don't see why we don't just start increment the first-digit
yearly.  Sure, every so often we think we've done enough to warrant an
increase there but the philosophy as a community doesn't actually match our
particular choice - we don't, in advance, place any special importance on
the first-digit releases, as evidenced by this discussion.

David J.

Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0

2016-04-12 Thread Robert Haas

On Tue, Apr 12, 2016 at 4:32 PM, David G. Johnston
 wrote:
> I give a solid +10 to Robert's opinions on the matter and aside from
> figuring out if and how to fit first-number versioning dynamics into our
> release policies I think the community is doing a sufficient job on the
> communication and planning front.  The biggest resource need is quality
> control.  I dislike the fact that we are currently in a situation where the
> first 3 point releases each year are considered "live betas" based
> especially on both 9.3 and 9.5 post-release significant bug counts.

/me blinks.

I find it shocking that you would compare 9.5 to 9.3 that way.  Yeah,
we've had a few bugs in 9.5: in particular, it was disappointing to
have to disable abbreviated keys.  But I'm not sure that I really
believe that affected massive numbers of users in a really negative
way - many locales were just fine, and not every locale that had a
problem with some string data necessarily had a problem with the
strings people were actually storing.  But we haven't eaten anybody's
data, at least not beyond what can be fixed by a REINDEX, unless I
missed something here.

The fact is that this is a fairly hard problem to solve.  Some bugs
are not going to get found before people try the software, and we
can't make them try it while it's in beta.  We can only do our best to
do good code review, but inevitably we will miss some things.

As for your proposal that we blindly consider $(N+1).0 to follow $N.4,
I'm not particularly enthralled with that.  I think it's a good idea
to look for a release that's got some particularly nifty feature(s)
and use that as the time to move the first digit.  And, sure, plan to
have that happen every 4-6 years or so, but adjust based on what
actually gets into which releases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0

2016-04-12 Thread Robert Haas

On Tue, Apr 12, 2016 at 4:07 PM, Oleg Bartunov  wrote:
> Our roadmap http://www.postgresql.org/developer/roadmap/ is the problem. We
> don't have clear roadmap and that's why we cannot plan future feature full
> release. There are several postgres-centric companies, which have most of
> developers, who do all major contributions. All these companies has their
> roadmaps, but not the community. I think 9.6 release is inflection point,
> where we should combine our roadmaps and release the one for the community.
> Than we could plan releases and our customers will see what to expect. I
> can't say for other companies, but we have big demand for many features from
> russian customers and we have to compete with other databases. Having
> community roadmap will helps us to work with customers and plan our
> resources.

I don't think it's realistic to plan what is going to go into a
certain release.  We don't know whether we want the patch until we see
the patch and review it and decide whether it's good.  We can't make
the decision, first, that the patch will be in the release, and then,
second, write the patch.  What we *can* do, as Josh says, is discuss
our plans.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel Aggregate costs don't consider combine/serial/deserial funcs

2016-04-12 Thread Robert Haas

On Sun, Apr 10, 2016 at 8:47 AM, David Rowley
 wrote:
> I realised a few days ago that the parallel aggregate code does not
> cost for the combine, serialisation and deserialisation functions at
> all.

Oops.

> I've attached a patch which fixes this.

I've committed this patch.  I wonder if it's going to produce compiler
warnings for some people, complaining about possible use of an
uninitialized variable.  That would kind of suck.  I don't much mind
having to insert a dummy assignment to shut the compiler up; a smarter
compiler will just throw it out anyway.  I'm less enthused about a
dummy MemSet.  The compiler is less likely to be able to get rid of
that, and it's more expensive if it doesn't.  But let's see what
happens.

> One small point which I was a little unsure of in the attached is,
> should the "if (aggref->aggdirectargs)" part of
> count_agg_clauses_walker() be within the "if
> (!context->combineStates)". I simply couldn't decide. We currently
> have no aggregates which this affects anyway, per; select * from
> pg_aggregate where aggcombinefn <> 0 and aggkind <> 'n'; so for now
> I've left it outwith.

The direct arguments would be evaluated in the worker, but not in the
leader, right?  Or am I confused?

> Another thing I thought of is that it's not too nice that I have to
> pass 3 bools to count_agg_clauses() in order to tell it what to do. I
> was tempted to invent some bitmask flags for this, then modify
> create_agg_path() to use the same flags, but I thought I'd better not
> cause too much churn with this patch.

I'm kinda tempted to say this should be using an enum.  I note that
serialStates has a subtly different meaning here than in some other
places where you have used the same term.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-04-12 Thread Tom Lane

Pavel Stehule  writes:
> I had a idea about additional options of COPY RAW statements. One can be
> CAST function. These CAST functions can be used to any for any format.

Uh, what?  CAST() is not about external representations of values, and
overloading it for that purpose doesn't seem like a particularly good
idea: you'd have to figure out what the conversions meant inside SQL as
well as externally.  Also, maybe I missed something, but a different
representation of individual data values within a COPY wasn't what we
were after here.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Problems with huge_pages and IBM Power8

2016-04-12 Thread Tom Lane

reiner peterke  writes:
>> On Apr 12, 2016, at 10:20 PM, Andres Freund  wrote:
>> I've a bit of a hard time believing that this is related to huge pages.

> Well all i have at the moment is that when we disabled huge pages on the 
> kernel level and then restarted postgres there were no additional crashes.

That log excerpt shows pretty conclusively that both your semaphores and
your shared memory were yanked out from under you.  That's not a Postgres
bug, and it seems unlikely to be related to huge pages either.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Problems with huge_pages and IBM Power8

2016-04-12 Thread reiner peterke


> On Apr 12, 2016, at 10:20 PM, Andres Freund  wrote:
> 
> On 2016-04-12 21:58:14 +0200, reiner peterke wrote:
>> Hi
>> 
>> We have been doing some testing with Postgres (9.5.2) compiled on a Power8 
>> running Centos 7
>> 
>> When working with huge_pages, we initially got this error.
>> 
>> munmap(0x3efbe400) failed: Invalid argument
> 
> *munmap*, not mmap failed? that's odd; because there the hugepagesize
> shouldn't have much of an influence. If something fails it should be the
> initial mmap.  
I’ll double check in the morning, but i did copy it from the log.

> Could you show a strace of a failed start with an
> unmodified postgres?

we didn’t have the error when not using huge_pages.

>> after a bit of investigation we noticed that hugepagesize is har coded
>> to 2MB
> 
> Note it's not actually hardcoded to some size. It's just about rounding
> the size to a multiple of 2MB due to an older kernel bug:
>   /*
>* Round up the request size to a suitable large value.
>*
>* Some Linux kernel versions are known to have a bug, which 
> causes
>* mmap() with MAP_HUGETLB to fail if the request size is not a
>* multiple of any supported huge page size. To work around 
> that, we
>* round up the request size to nearest 2MB. 2MB is the most 
> common
>* huge page page size on affected systems.
> 
> 
>> Going further, we tried testing hugepages also on Ubuntu 16.04, also on the 
>> power8.  On Ubuntu Postgres did not like the hugepages at all (set also to 
>> 16MB)  and consistently crashed.
> 
>> Looking for some insight into this issue.  the error from the postgres
>> log on ubuntu is below.  It apperas to be related to semephores.
> 
> I've a bit of a hard time believing that this is related to huge pages.

Well all i have at the moment is that when we disabled huge pages on the kernel 
level and then restarted postgres there were no additional crashes.
Unfortunately I cannot access the server now.  I will look further tomorrow.


> 
> 
> Greetings,
> 
> Andres Freund

Sincerely,

Reiner Peterke



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0

2016-04-12 Thread David G. Johnston

On Tue, Apr 12, 2016 at 1:07 PM, Oleg Bartunov  wrote:

>
>
>> * we don't *know* that any of the above items will require a backwards
>> compatibility break.
>>
>> People keep talking about "we might want to break compatibility/file
>> format one day".  But nobody is working on anything which will and
>> justifies it.
>>
>
> Our roadmap http://www.postgresql.org/developer/roadmap/ is the problem.
> We don't have clear roadmap and that's why we cannot plan future feature
> full release. There are several postgres-centric companies, which have most
> of developers, who do all major contributions. All these companies has
> their roadmaps, but not the community. I think 9.6 release is inflection
> point, where we should combine our roadmaps and release the one for the
> community. Than we could plan releases and our customers will see what to
> expect. I can't say for other companies, but we have big demand for many
> features from russian customers and we have to compete with other
> databases. Having community roadmap will helps us to work with customers
> and plan our resources.
>

I've already posited just having our release numbers operate on 5-year
increments 10.0 - 10.4; 11.0 - 11.4, etc on advocacy but was met with
silence.  In any case this comment is just furtherance of the tail wagging
the dog.  I see no fundamental reason to have to plan something momentous
enough, and actively schedule work to meet the plan, in order to justify a
10.0 release.

There is a bunch of hand-waving here, and its an environment I'm not
immersed in, but it seems that creating a roadmap today is tantamount to
waterfall design - useful in moderation but has largely shown to be
undesirable at scale.  Aside from the 1-year release cycle the project is
reasonably agile and well receptive to outside observation, questions, and
contributions.  If you have spare resources you need to keep busy just ask
how you can help.  To be honest the community would likely rather have
those people help review and test everything that is presently in-progress
- future goals on a roadmap are not nearly as important.  And if you have a
demand you think needs to be fulfill put the information out there and get
input.

If you are claiming the balance between community and profit is skewed
undesirably you will need to put forth a more concrete argument in order to
convince me.  For me, having plans reside in the profit-motive parts of the
community and having core simply operate openly seems to strike a solid
balance.

I give a solid +10 to Robert's opinions on the matter and aside from
figuring out if and how to fit first-number versioning dynamics into our
release policies I think the community is doing a sufficient job on the
communication and planning front.  The biggest resource need is quality
control.  I dislike the fact that we are currently in a situation where the
first 3 point releases each year are considered "live betas" based
especially on both 9.3 and 9.5 post-release significant bug counts.

David J.

Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0

2016-04-12 Thread Josh berkus

On 04/12/2016 01:07 PM, Oleg Bartunov wrote:
> 
> Our roadmap http://www.postgresql.org/developer/roadmap/ is the problem.
> We don't have clear roadmap and that's why we cannot plan future feature
> full release. 

As someone who's worked at multiple proprietary software companies,
having a roadmap doesn't magically make code happen.

> There are several postgres-centric companies, which have
> most of developers, who do all major contributions. All these companies
> has their roadmaps, but not the community. I think 9.6 release is
> inflection point, where we should combine our roadmaps and release the
> one for the community. Than we could plan releases and our customers
> will see what to expect. I can't say for other companies, but we have
> big demand for many features from russian customers and we have to
> compete with other databases. Having community roadmap will helps us to
> work with customers and plan our resources.

It would be good to have a place for the companies who do PostgreSQL
feature work would publish their current efforts and timelines, so we at
least have a go-to place for "here's what someone's working on".  But
only if that information is going to be *updated*, something we're very
bad at.  And IMHO, a "roadmap" which is less that 50% accurate is a
waste of time.

There's an easy way for you to kick this off though: have PostgresPro
publish a wiki page or Trello board or github repo or whatever with your
roadmap and invite other full-time PostgreSQL contributors to add their
pieces.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Problems with huge_pages and IBM Power8

2016-04-12 Thread Tom Lane

Andres Freund  writes:
> On 2016-04-12 21:58:14 +0200, reiner peterke wrote:
>> Looking for some insight into this issue.  the error from the postgres
>> log on ubuntu is below.  It apperas to be related to semephores.

> I've a bit of a hard time believing that this is related to huge pages.

I'm betting that's this:

http://www.postgresql.org/message-id/cak7teys9-o4bterbs3xuk2bffnnd55u2sm9j5r2fi7v6bhj...@mail.gmail.com

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Problems with huge_pages and IBM Power8

2016-04-12 Thread Andres Freund

On 2016-04-12 21:58:14 +0200, reiner peterke wrote:
> Hi
> 
> We have been doing some testing with Postgres (9.5.2) compiled on a Power8 
> running Centos 7
> 
> When working with huge_pages, we initially got this error.
> 
> munmap(0x3efbe400) failed: Invalid argument

*munmap*, not mmap failed? that's odd; because there the hugepagesize
shouldn't have much of an influence. If something fails it should be the
initial mmap.  Could you show a strace of a failed start with an
unmodified postgres?

> after a bit of investigation we noticed that hugepagesize is har coded
> to 2MB

Note it's not actually hardcoded to some size. It's just about rounding
the size to a multiple of 2MB due to an older kernel bug:
/*
 * Round up the request size to a suitable large value.
 *
 * Some Linux kernel versions are known to have a bug, which 
causes
 * mmap() with MAP_HUGETLB to fail if the request size is not a
 * multiple of any supported huge page size. To work around 
that, we
 * round up the request size to nearest 2MB. 2MB is the most 
common
 * huge page page size on affected systems.

> Going further, we tried testing hugepages also on Ubuntu 16.04, also on the 
> power8.  On Ubuntu Postgres did not like the hugepages at all (set also to 
> 16MB)  and consistently crashed.

> Looking for some insight into this issue.  the error from the postgres
> log on ubuntu is below.  It apperas to be related to semephores.

I've a bit of a hard time believing that this is related to huge pages.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-04-12 Thread Pavel Stehule

2016-04-12 12:22 GMT+02:00 Ants Aasma :

> On 8 Apr 2016 9:14 pm, "Pavel Stehule"  wrote:
> > 2016-04-08 20:54 GMT+02:00 Andrew Dunstan :
> >> I should add that I've been thinking about this some more, and that I
> now agree that something should be done to support this at the SQL level,
> mainly so that clients can manage very large pieces of data in a
> stream-oriented fashion rather than having to marshall the data in memory
> to load/unload via INSERT/SELECT. Anything that is client-side only is
> likely to have this memory issue.
> >>
> >> At the same time I'm still not entirely convinced that COPY is a good
> vehicle for this. It's designed for bulk records, and already quite
> complex. Maybe we need something new that uses the COPY protocol but is
> more specifically tailored for loading or sending large singleton pieces of
> data.
> >
> >
> > Now it is little bit more time to think more about. But It is hard to
> design some more simpler than is COPY syntax. What will support both
> directions.
>
> Sorry for arriving late and adding to the bikeshedding. Maybe the
> answer is to make COPY pluggable. It seems to me that it would be
> relatively straightforward to add an extension mechanism for copy
> output and input plugins that could support any format expressible as
> a binary stream. Raw output would then be an almost trivial plugin.
> Others could implement JSON, protocol buffers, Redis bulk load, BSON,
> ASN.1 or whatever else serialisation format du jour. It will still
> have the same backwards compatibility issues as adding the raw output,
> but the payoff is greater.
>

I had a idea about additional options of COPY RAW statements. One can be
CAST function. These CAST functions can be used to any for any format.

COPY has two parts - client, and server side. Currently we cannot to expand
libpq, and we cannot to expand psql. So we have to send data to client in
target format and all transformations should be done on server side.
Personally, I strongly prefer to write Linux server side extensions against
MSWin client side extensions. The client (psql) is able to use a pipe - so
any client side transformation can be done outer psql.

Regards

Pavel




>
> Regards,
> Ants Aasma
>

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-12 Thread Kevin Grittner

On Tue, Apr 12, 2016 at 2:53 PM, Kevin Grittner  wrote:
> Readonly with client and job counts matching scale.

Single-socket i7, BTW.

>> A lot of this will be different between
>> single-socket and multi-socket servers; as soon as you have the latter
>> the likelihood of contention being bad goes up dramatically.
>
> Yeah, I know, and 4 socket has been at least an order of magnitude
> more problematic in my experience than 2 socket.  And the problems
> are far, far, far worse on kernels prior to 3.8, especially on 3.x
> before 3.8, so it's hard to know how to take any report of problems
> on a 4 node NUMA machine without knowing the kernel version.

Also, with 4 node NUMA I have seen far better scaling with
hyper-threading turned off.  I know there are environments where it
helps, but high-concurrency on multi-node NUMA is not one of them.

So, anyway, mentioning the HT setting is important, too.

Kevin Grittner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-04-12 Thread Magnus Hagander

On Tue, Apr 12, 2016 at 8:39 AM, Noah Misch  wrote:

> On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote:
> > On Thu, Apr 7, 2016 at 5:15 AM, Noah Misch  wrote:
> > > Unless you especially want to self-impose the same tight resolution
> > > schedule
> > > that 9.6 regressions experience, let's move this to the "Non-bugs"
> section.
> > > Which do you prefer?  I don't think the opportunity for more
> documentation
> > > in
> > > light of 7117685 constitutes a regression, and I don't want "Open
> Issues"
> > > to
> > > double as a parking lot for slow-moving non-regressions.
> > >
> >
> > Well, if we *don't* do the rewrite before we release it, then we have to
> > instead put information about the new version of the functions into the
> old
> > structure I think.
> >
> > So I think it's an open issue.
>
> Works for me...
>
> [This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 9.6 open item.  Magnus,
> since you committed the patch believed to have created it, you own this
> open
> item.  If that responsibility lies elsewhere, please let us know whose
> responsibility it is to fix this.  Since new open items may be discovered
> at
> any time and I want to plan to have them all fixed well in advance of the
> ship
> date, I will appreciate your efforts toward speedy resolution.  Please
> present, within 72 hours, a plan to fix the defect within seven days of
> this
> message.  Thanks.
>

I won't have time to do the bigger rewrite/reordeirng by then, but I can
certainly commit to having the smaller updates done to cover the new
functionality in less than a week. If nothing else, that'll be something
for me to do on the flight over to pgconf.us.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0

2016-04-12 Thread Oleg Bartunov

On Tue, Apr 12, 2016 at 9:25 PM, Josh berkus  wrote:

> On 04/12/2016 10:43 AM, Robert Haas wrote:
> > 1. Large backward compatibility breaks are bad.  Therefore, if any of
> > these things are absolutely impossible to do without major
> > compatibility breaks, we shouldn't do them at all.
>
> +1
>
> > 2. Small backward compatibility breaks are OK, but don't require doing
> > anything special to the version number.
>
> +1
>
> > 3. There's no value in aggregating many small backward compatibility
> > breaks into a single release.  That increases pain for users, rather
> > than decreasing it, and slows down development, too, because you have
> > to wait for the special magic release where it's OK to hose users.  We
> > typically have a few small backward compatibility breaks in each
> > release, and that's working fine, so I see little reason to change it.
>
> +1
>
> > 4. To the extent that I can guess what the things on Simon's list
> > means from what he wrote, and that's a little difficult because his
> > descriptions were very short, I think that everything on that list is
> > either (a) a bad idea or (b) something that we can do without any
> > compatibility break at all.
>
> +1
>
> Here's the features I can imagine being worth major backwards
> compatibility breaks:
>
> 1. Fully pluggable storage with a clean API.
>
> 2. Total elimination of VACUUM or XID freezing
>
> 3. Fully transparent-to-the user MM replication/clustering or sharding.
>
> 4. Perfect partitioning (i.e. transparent to the user, supports keys &
> joins, supports expressions on partition key, etc.)
>
> 5. Transparent upgrade-in-place (i.e. allowing 10.2 to use 10.1's tables
> without pg_upgrade or other modification).
>
> 6. Fully pluggable parser/executor with a good API
>
> That's pretty much it.  I can't imagine anything else which would
> justify imposing a huge upgrade barrier on users.  And, I'll point out,
> that in the above list:
>
> * nobody is currently working on anything in core except #4.
>

We are working on #3 (HA multimaster).


>
> * we don't *know* that any of the above items will require a backwards
> compatibility break.
>
> People keep talking about "we might want to break compatibility/file
> format one day".  But nobody is working on anything which will and
> justifies it.
>

Our roadmap http://www.postgresql.org/developer/roadmap/ is the problem. We
don't have clear roadmap and that's why we cannot plan future feature full
release. There are several postgres-centric companies, which have most of
developers, who do all major contributions. All these companies has their
roadmaps, but not the community. I think 9.6 release is inflection point,
where we should combine our roadmaps and release the one for the community.
Than we could plan releases and our customers will see what to expect. I
can't say for other companies, but we have big demand for many features
from russian customers and we have to compete with other databases. Having
community roadmap will helps us to work with customers and plan our
resources.


>
> --
> --
> Josh Berkus
> Red Hat OSAS
> (any opinions are my own)
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

[HACKERS] Problems with huge_pages and IBM Power8

2016-04-12 Thread reiner peterke

Hi

We have been doing some testing with Postgres (9.5.2) compiled on a Power8 
running Centos 7

When working with huge_pages, we initially got this error.

munmap(0x3efbe400) failed: Invalid argument

after a bit of investigation we noticed that hugepagesize is har coded to 2MB

src/backend/port/sysv_shmem.c (ligne 360)
 
...
int hugepagesize = 2 * 1024 * 1024;

But on the power they were configured to 16MB.  Recompiling to 16MB (8 * 1024 * 
1024) and we had no problems with the tests.

My initial questions are.

1 what is the hugepagesize hard coded to 2MB?
2 are there any side effect in setting it to 16MB?
3 since on the poer hugepages can have different values, would it be possible 
to have this value configurable?

Going further, we tried testing hugepages also on Ubuntu 16.04, also on the 
power8.  On Ubuntu Postgres did not like the hugepages at all (set also to 
16MB)  and consistently crashed.

Looking for some insight into this issue.  the error from the postgres log on 
ubuntu is below.
It apperas to be related to semephores. 

I don't have the compile optiona at the moment, I can provide those are other 
detais as needed.

Reiner

2016-04-12 12:26:42 CEST : 0 FATAL:  semctl(7864340, 14, SETVAL, 0) failed: 
Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  server process (PID 13352) exited with exit 
code 1
2016-04-12 12:26:42 CEST : 0 LOG:  terminating any other active server processes
2016-04-12 12:26:42 CEST facturation:system_dba 0 10.32.32.200WARNING:  
terminating connection because of crash of another server process
2016-04-12 12:26:42 CEST facturation:system_dba 0 10.32.32.200DETAIL:  The 
postmaster has commanded this server process to roll back the current 
transaction and exit, because another server process exited abnormally and 
possibly corrupted shared memory.
2016-04-12 12:26:42 CEST facturation:system_dba 0 10.32.32.200HINT:  In a 
moment you should be able to reconnect to the database and repeat your command.
2016-04-12 12:26:42 CEST postgres:admin 0 10.32.16.3WARNING:  terminating 
connection because of crash of another server process
2016-04-12 12:26:42 CEST postgres:admin 0 10.32.16.3DETAIL:  The postmaster has 
commanded this server process to roll back the current transaction and exit, 
because another server process exited abnormally and possibly corrupted shared 
memory.
2016-04-12 12:26:42 CEST postgres:admin 0 10.32.16.3HINT:  In a moment you 
should be able to reconnect to the database and repeat your command.
2016-04-12 12:26:42 CEST postgres:perf_user 0 ::1WARNING:  terminating 
connection because of crash of another server process
2016-04-12 12:26:42 CEST postgres:perf_user 0 ::1DETAIL:  The postmaster has 
commanded this server process to roll back the current transaction and exit, 
because another server process exited abnormally and possibly corrupted shared 
memory.
2016-04-12 12:26:42 CEST postgres:perf_user 0 ::1HINT:  In a moment you should 
be able to reconnect to the database and repeat your command.
2016-04-12 12:26:42 CEST : 0 WARNING:  terminating connection because of crash 
of another server process
2016-04-12 12:26:42 CEST : 0 DETAIL:  The postmaster has commanded this server 
process to roll back the current transaction and exit, because another server 
process exited abnormally and possibly corrupted shared memory.
2016-04-12 12:26:42 CEST : 0 HINT:  In a moment you should be able to reconnect 
to the database and repeat your command.
2016-04-12 12:26:42 CEST : 0 LOG:  all server processes terminated; 
reinitializing
2016-04-12 12:26:42 CEST : 0 LOG:  could not remove shared memory segment 
"/PostgreSQL.1612071802": No such file or directory
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7274497, 0, IPC_RMID, ...) failed: 
Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7307267, 0, IPC_RMID, ...) failed: 
Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7340036, 0, IPC_RMID, ...) failed: 
Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7372805, 0, IPC_RMID, ...) failed: 
Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7405574, 0, IPC_RMID, ...) failed: 
Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7438343, 0, IPC_RMID, ...) failed: 
Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7471112, 0, IPC_RMID, ...) failed: 
Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7503881, 0, IPC_RMID, ...) failed: 
Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7536650, 0, IPC_RMID, ...) failed: 
Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7569419, 0, IPC_RMID, ...) failed: 
Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7602188, 0, IPC_RMID, ...) failed: 
Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7634957, 0, IPC_RMID, ...) failed: 
Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7667726, 0, IPC_RMID, ...) failed: 
Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7700495, 0, IPC_RMID, ...)

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-12 Thread Kevin Grittner

On Tue, Apr 12, 2016 at 2:28 PM, Andres Freund  wrote:
> On 2016-04-12 14:17:12 -0500, Kevin Grittner wrote:
>> Well, something is different between your environment and mine,
>> since I saw no difference at scale 100 and 2.2% at scale 200.
>
> In a readonly test or r/w?

Readonly with client and job counts matching scale.

> A lot of this will be different between
> single-socket and multi-socket servers; as soon as you have the latter
> the likelihood of contention being bad goes up dramatically.

Yeah, I know, and 4 socket has been at least an order of magnitude
more problematic in my experience than 2 socket.  And the problems
are far, far, far worse on kernels prior to 3.8, especially on 3.x
before 3.8, so it's hard to know how to take any report of problems
on a 4 node NUMA machine without knowing the kernel version.

>> knowing more about your hardware, OS, configuration, etc., might
>> allow me to duplicate a problem so I can fix
>
>> For example, I used a "real" pg config, like I would for a production
>> machine (because that seems to me to be the environment that is most
>> important): the kernel is 3.13 (not one with pessimal scheduling) and
>> has tuning for THP, the deadline scheduler, the vm.*dirty* settings,
>> etc.  Without knowing even the kernel and what tuning the OS and pg
>> have had on your box, I could take a lot of shots in the dark without
>> hitting anything.
>
> That shouldn't really matter much for a read-only, shared_buffer
> resident, test? There's no IO and THP pretty much plays no role because
> there's very few memory allocations (removing the pressure causing the
> well known degradations).

I hate to assume which differences matter without trying, but some
of them seem less probable than others.

>> Oh, and the output of `numactl --hardware` would be good to have.
>> Thanks for all information you can provide.
>
> That was on Alexander's/PgPro's machine. Numactl wasn't installed, and I
> didn't have root. But it has four numa domains (gathered via /sys/).

On the machines I've used, it will give you the hardware report
without being root.  But of course, it can't do that if it's not
installed.  I hadn't yet seen a machine with multiple NUMA memory
segments that didn't have the numactl executable installed; I'll
keep in mind that can happen.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-12 Thread Fujii Masao

On Tue, Apr 12, 2016 at 9:04 AM, Masahiko Sawada  wrote:
> On Mon, Apr 11, 2016 at 8:47 PM, Fujii Masao  wrote:
>> On Mon, Apr 11, 2016 at 5:52 PM, Masahiko Sawada  
>> wrote:
>>> On Mon, Apr 11, 2016 at 1:31 PM, Fujii Masao  wrote:
 On Mon, Apr 11, 2016 at 10:58 AM, Masahiko Sawada  
 wrote:
> On Sat, Apr 9, 2016 at 12:32 PM, Tom Lane  wrote:
>> Jeff Janes  writes:
>>> When I compile now without cassert, I get the compiler warning:
>>
>>> syncrep.c: In function 'SyncRepUpdateConfig':
>>> syncrep.c:878:6: warning: variable 'parse_rc' set but not used
>>> [-Wunused-but-set-variable]
>>
>> If there's a good reason for that to be an Assert, I don't see it.
>> There are no callers of SyncRepUpdateConfig that look like they
>> need to, or should expect not to have to, tolerate errors.
>> I think the way to fix this is to turn the Assert into a plain
>> old test-and-ereport-ERROR.
>>
>
> I've changed the draft patch Amit implemented so that it doesn't parse
> twice(check_hook and assign_hook).
> So assertion that was in assign_hook is no longer necessary.
>
> Please find attached.

 Thanks for the patch!

 When I emptied s_s_names, reloaded the configration file, set it to 
 'standby1'
 and reloaded the configuration file again, the master crashed with
 the following error.

 *** glibc detected *** postgres: wal sender process postgres [local]
 streaming 0/3015F18: munmap_chunk(): invalid pointer:
 0x024d9a40 ***
 === Backtrace: =
 *** glibc detected *** postgres: wal sender process postgres [local]
 streaming 0/3015F18: munmap_chunk(): invalid pointer:
 0x024d9a40 ***
 /lib64/libc.so.6[0x3be8e75f4e]
 postgres: wal sender process postgres [local] streaming 0/3015F18[0x97dae2]
 === Backtrace: =
 /lib64/libc.so.6[0x3be8e75f4e]
 postgres: wal sender process postgres [local] streaming
 0/3015F18(set_config_option+0x12cb)[0x982242]
 postgres: wal sender process postgres [local] streaming
 0/3015F18(SetConfigOption+0x4b)[0x9827ff]
 postgres: wal sender process postgres [local] streaming 0/3015F18[0x97dae2]
 postgres: wal sender process postgres [local] streaming 0/3015F18[0x988b4e]
 postgres: wal sender process postgres [local] streaming
 0/3015F18(set_config_option+0x12cb)[0x982242]
 postgres: wal sender process postgres [local] streaming 0/3015F18[0x98af40]
 postgres: wal sender process postgres [local] streaming
 0/3015F18(SetConfigOption+0x4b)[0x9827ff]
 postgres: wal sender process postgres [local] streaming
 0/3015F18(ProcessConfigFile+0x9f)[0x98a98b]
 postgres: wal sender process postgres [local] streaming 0/3015F18[0x988b4e]
 postgres: wal sender process postgres [local] streaming 0/3015F18[0x98af40]
 postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b50fd]
 postgres: wal sender process postgres [local] streaming
 0/3015F18(ProcessConfigFile+0x9f)[0x98a98b]
 postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b359c]
 postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b50fd]
 postgres: wal sender process postgres [local] streaming
 0/3015F18(exec_replication_command+0x1a7)[0x7b47b6]
 postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b359c]
 postgres: wal sender process postgres [local] streaming
 0/3015F18(PostgresMain+0x772)[0x8141b6]
 postgres: wal sender process postgres [local] streaming
 0/3015F18(exec_replication_command+0x1a7)[0x7b47b6]
 postgres: wal sender process postgres [local] streaming 0/3015F18[0x7896f7]
 postgres: wal sender process postgres [local] streaming
 0/3015F18(PostgresMain+0x772)[0x8141b6]
 postgres: wal sender process postgres [local] streaming 0/3015F18[0x788e62]
 postgres: wal sender process postgres [local] streaming 0/3015F18[0x7896f7]
 postgres: wal sender process postgres [local] streaming 0/3015F18[0x785544]
 postgres: wal sender process postgres [local] streaming 0/3015F18[0x788e62]
 postgres: wal sender process postgres [local] streaming
 0/3015F18(PostmasterMain+0x1134)[0x784c08]
 postgres: wal sender process postgres [local] streaming 0/3015F18[0x785544]
 postgres: wal sender process postgres [local] streaming 0/3015F18[0x6ce12e]
 /lib64/libc.so.6(__libc_start_main+0xfd)[0x3be8e1ed5d]
 postgres: wal sender process postgres [local] streaming
 0/3015F18(PostmasterMain+0x1134)[0x784c08]
 postgres: wal sender process postgres [local] streaming 0/3015F18[0x467e99]

>>>
>>> Thank you for reviewing.
>>>
>>> SyncRepUpdateConfig() seems to be no longer necessary.
>>
>> Really? I was

Re: [HACKERS] Remove unused function from walsender.c

2016-04-12 Thread Fujii Masao

On Tue, Apr 12, 2016 at 6:36 AM, Simon Riggs  wrote:
> On 11 April 2016 at 08:05, Fujii Masao  wrote:
>
>>
>> There is an unused function GetOldestWALSendPointer() in walsender.c.
>> Per comment, it was introduced because we may need it in the future for
>> synchronous replication.
>>
>> Now we have very similar function SyncRepGetOldestSyncRecPtr() in
>> syncrep.c. Which makes me think that GetOldestWALSendPointer()
>> no longer needs to be maintained. So, is it time to remove that unused
>> function?
>
>
> Seems sensible cleanup to me.

Yep, pushed the cleanup patch.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-12 Thread Andres Freund

Hi,

On 2016-04-12 14:17:12 -0500, Kevin Grittner wrote:
> Well, something is different between your environment and mine,
> since I saw no difference at scale 100 and 2.2% at scale 200.

In a readonly test or r/w?  A lot of this will be different between
single-socket and multi-socket servers; as soon as you have the latter
the likelihood of contention being bad goes up dramatically.

> So,
> knowing more about your hardware, OS, configuration, etc., might
> allow me to duplicate a problem so I can fix

> For example, I used a "real" pg config, like I would for a production
> machine (because that seems to me to be the environment that is most
> important): the kernel is 3.13 (not one with pessimal scheduling) and
> has tuning for THP, the deadline scheduler, the vm.*dirty* settings,
> etc.  Without knowing even the kernel and what tuning the OS and pg
> have had on your box, I could take a lot of shots in the dark without
> hitting anything.

That shouldn't really matter much for a read-only, shared_buffer
resident, test? There's no IO and THP pretty much plays no role because
there's very few memory allocations (removing the pressure causing the
well known degradations).

> Oh, and the output of `numactl --hardware` would be good to have.
> Thanks for all information you can provide.

That was on Alexander's/PgPro's machine. Numactl wasn't installed, and I
didn't have root. But it has four numa domains (gathered via /sys/).

> It was the only reported case to that point, so the additional data
> point is valuable, if I can tell where that point is.  And you
> don't have any evidence that even with your configuration that any
> performance regression remains for those who have the default value
> for old_snapshot_threshold?

I haven't tested yet.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Upper planner pathification

2016-04-12 Thread Tom Lane

Robert Haas  writes:
> On Thu, Mar 17, 2016 at 2:31 PM, Tom Lane  wrote:
>> So what I would now propose is
>> 
>> typedef void (*create_upper_paths_hook_type) (PlannerInfo *root,
>>   UpperRelationKind stage,
>>   RelOptInfo *input_rel);
>> 
>> and have this invoked at each stage right before we call
>> create_grouping_paths, create_window_paths, etc.

> Works for me.

Now that the commitfest crunch is over, I went back and took care of
this loose end.  I ended up passing the output_rel as well for each
step, to save hook users from having to look that up for themselves.
The hook calls are placed immediately before set_cheapest() for each
upper rel, so that all core-built Paths are available for inspection.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-12 Thread Kevin Grittner

On Tue, Apr 12, 2016 at 1:56 PM, Andres Freund  wrote:
> On 2016-04-12 13:44:00 -0500, Kevin Grittner wrote:
>> On Tue, Apr 12, 2016 at 12:38 PM, Andres Freund  wrote:
>>> On 2016-04-12 16:49:25 +, Kevin Grittner wrote:
 On a big NUMA machine with 1000 connections in saturation load
 there was a performance regression due to spinlock contention, for
 acquiring values which were never used.  Just fill with dummy
 values if we're not going to use them.
>>>
>>> FWIW, I could see massive regressions with just 64 connections.
>>
>> With what settings?
>
> You mean pgbench or postgres? The former -M prepared -c 64 -j 64 -S. The
> latter just a large enough shared buffers to contains the scale 300
> database, and adapted maintenance_work_mem. Nothing special.

Well, something is different between your environment and mine,
since I saw no difference at scale 100 and 2.2% at scale 200.  So,
knowing more about your hardware, OS, configuration, etc., might
allow me to duplicate a problem so I can fix it.  For example, I
used a "real" pg config, like I would for a production machine
(because that seems to me to be the environment that is most
important): the kernel is 3.13 (not one with pessimal scheduling)
and has tuning for THP, the deadline scheduler, the vm.*dirty*
settings, etc.  Without knowing even the kernel and what tuning the
OS and pg have had on your box, I could take a lot of shots in the
dark without hitting anything.  Oh, and the output of `numactl
--hardware` would be good to have.  Thanks for all information you
can provide.

>>  With or without the patch to avoid the locks when off?
>
> Without. Your commit message made it sound like you need unrealistic or
> at least unusual numbers of connections, and that's afaics not the case.

It was the only reported case to that point, so the additional data
point is valuable, if I can tell where that point is.  And you
don't have any evidence that even with your configuration that any
performance regression remains for those who have the default value
for old_snapshot_threshold?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Allow Pin/UnpinBuffer to operate in a lockfree manner.

2016-04-12 Thread Piotr Stefaniak


On 2016-04-12 07:00, Andres Freund wrote:

On 2016-04-12 00:32:13 -0400, Tom Lane wrote:

I wrote:

This commit has broken buildfarm member gaur, and no doubt pademelon
will be equally unhappy once it catches up to HEAD.


Spoke too soon, actually: pademelon did not get as far as initdb.

cc: "bufmgr.c", line 4032: error 1521: Incorrect initialization.
cc: "bufmgr.c", line 4058: error 1521: Incorrect initialization.

Apparently, assigning the result of init_spin_delay() is not as portable
as you thought.


Hm. I'm not sure why not though? Is it because delayStatus isn't static
or global scope?


It's because C89 requires initializers for aggregate and union types to 
be constant expressions (3.5.7p3):

All the expressions in an initializer for an object that has static
storage duration or in an initializer list for an object that has
aggregate or union type shall be constant expressions.


C99 removes this constraint (6.7.8p4):

All the expressions in an initializer for an object that has static storage 
duration shall be
constant expressions or string literals.





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgsql: Add the "snapshot too old" feature

2016-04-12 Thread Andres Freund

On 2016-04-12 13:57:26 -0500, Kevin Grittner wrote:
> I have booked two large NUMA machines for Friday to look for any
> remaining performance regressions on high-concurrency loads on such
> machines.  Anyone with ideas on particular tests that they feel
> should be included, please let me know before then.

The worst case is probably statements which are utterly trivial to
execute, but do require a snapshot both during planning and
execution. Like e.g. SELECT 1;. That's probably a good thing to get
started with.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgsql: Add the "snapshot too old" feature

2016-04-12 Thread Kevin Grittner

On Mon, Apr 11, 2016 at 12:31 PM, Kevin Grittner  wrote:
> On Mon, Apr 11, 2016 at 8:20 AM, Alexander Korotkov

>> So, for read-only benchmark this patch introduces more than 5 times
>> regression on big machine.
>
> I did not schedule a saturation test on a big machine.  I guess I
> should have done.

I have booked two large NUMA machines for Friday to look for any
remaining performance regressions on high-concurrency loads on such
machines.  Anyone with ideas on particular tests that they feel
should be included, please let me know before then.

Thanks!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-12 Thread Andres Freund

On 2016-04-12 13:44:00 -0500, Kevin Grittner wrote:
> On Tue, Apr 12, 2016 at 12:38 PM, Andres Freund  wrote:
> > On 2016-04-12 16:49:25 +, Kevin Grittner wrote:
> >> On a big NUMA machine with 1000 connections in saturation load
> >> there was a performance regression due to spinlock contention, for
> >> acquiring values which were never used.  Just fill with dummy
> >> values if we're not going to use them.
> >
> > FWIW, I could see massive regressions with just 64 connections.
> 
> With what settings?

You mean pgbench or postgres? The former -M prepared -c 64 -j 64 -S. The
latter just a large enough shared buffers to contains the scale 300
database, and adapted maintenance_work_mem. Nothing special.

>  With or without the patch to avoid the locks when off?

Without. Your commit message made it sound like you need unrealistic or
at least unusual numbers of connections, and that's afaics not the case.

> > I'm a bit scared of having an innoccuous sounding option regress things
> > by a factor of 10. I think, in addition to this fix, we need to actually
> > solve the scalability issue here to a good degree.  One way to do so is
> > to apply the parts of 0001 in
> > http://archives.postgresql.org/message-id/20160330230914.GH13305%40awork2.anarazel.de
> > defining PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY and rely on that. Another
> > to apply the whole patch and simply put the lsn in an 8 byte atomic.
> 
> I think that we are well due for atomic access to aligned 8-byte
> values.  That would eliminate one potential hot spot in the
> "snapshot too old" code, for sure.

I'm kinda inclined to apply that portion (or just the whole patch with
the spurious #ifdef 0 et al fixed) into 9.6; and add the necessary
checks in a few places. Because I really think this is likely to hit
unsuspecting users.

FWIW, accessing a frequently changing value from a significant number of
connections, at a high frequency, isn't exactly free without a spinlock
either. But it should be much less bad.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0

2016-04-12 Thread Robert Haas

On Tue, Apr 12, 2016 at 2:27 PM, Andres Freund  wrote:
> none but 2) seem likely to require a substantial compatibility break.

And even that doesn't require one, if you keep the only system around
and make the new system optional via some sort of pluggable storage
API.  Which, to me, seems like the only sensible approach from a
development perspective.  If you decide to rip out the entire heapam
and replace it with something new in one fell swoop, you might as well
bother not writing the patch.  It's got the same chance of being
accepted either way.

I really think the time has come that we need an API for the heap the
same way we already have for indexes.  Regardless of exactly how we
choose to implement that, I think a large part of it will end up
looking similar to what we already have for FDWs.  We can either use
the FDW API itself and add whatever additional methods we need for
this purpose, or copy it to a new file, rename everything, and have
two slightly different versions.  AFAICS, the things we need that the
FDW API doesn't currently provide are:

1. The ability to have a local relfilenode associated with the data.
Or, ideally, several, so you have a separate set of files for each
index.

2. The ability to WAL-log changes to that relfilenode (or those
relfilenodes) in a sensible way.  Not sure whether the new generic
XLOG stuff is good enough for a first go-round here or if more is
needed.

3. The ability to intercept DDL commands directed at the table and
handle them in some arbitrary way.  This is really optional; people
could always provide a function-based API until we devise something
better.

4. The ability to build standard PostgreSQL indexes on top of the
data, if the underlying format still has a useful notion of CTIDs.
That is, if the underlying format is basically like our heap format,
but optimized in some way - e.g. append-only table that can't update
or delete with a smaller tuple header and page compression - then it
can reuse our indexing.  If it does something else, like an
index-organized table where rows can move around to different physical
positions, then it has to provide its own indexing facilities.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-12 Thread Kevin Grittner

On Tue, Apr 12, 2016 at 12:38 PM, Andres Freund  wrote:
> On 2016-04-12 16:49:25 +, Kevin Grittner wrote:
>> On a big NUMA machine with 1000 connections in saturation load
>> there was a performance regression due to spinlock contention, for
>> acquiring values which were never used.  Just fill with dummy
>> values if we're not going to use them.
>
> FWIW, I could see massive regressions with just 64 connections.

With what settings?  With or without the patch to avoid the locks when off?

> I'm a bit scared of having an innoccuous sounding option regress things
> by a factor of 10. I think, in addition to this fix, we need to actually
> solve the scalability issue here to a good degree.  One way to do so is
> to apply the parts of 0001 in
> http://archives.postgresql.org/message-id/20160330230914.GH13305%40awork2.anarazel.de
> defining PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY and rely on that. Another
> to apply the whole patch and simply put the lsn in an 8 byte atomic.

I think that we are well due for atomic access to aligned 8-byte
values.  That would eliminate one potential hot spot in the
"snapshot too old" code, for sure.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Detrimental performance impact of ringbuffers on performance

2016-04-12 Thread Andres Freund

On 2016-04-12 14:29:10 -0400, Robert Haas wrote:
> On Wed, Apr 6, 2016 at 6:57 AM, Andres Freund  wrote:
> > While benchmarking on hydra
> > (c.f. 
> > http://archives.postgresql.org/message-id/20160406104352.5bn3ehkcsceja65c%40alap3.anarazel.de),
> > which has quite slow IO, I was once more annoyed by how incredibly long
> > the vacuum at the the end of a pgbench -i takes.
> >
> > The issue is that, even for an entirely shared_buffers resident scale,
> > essentially no data is cached in shared buffers. The COPY to load data
> > uses a 16MB ringbuffer. Then vacuum uses a 256KB ringbuffer. Which means
> > that copy immediately writes and evicts all data. Then vacuum reads &
> > writes the data in small chunks; again evicting nearly all buffers. Then
> > the creation of the ringbuffer has to read that data *again*.
> >
> > That's fairly idiotic.
> >
> > While it's not easy to fix this in the general case, we introduced those
> > ringbuffers for a reason after all, I think we at least should add a
> > special case for loads where shared_buffers isn't fully used yet.  Why
> > not skip using buffers from the ringbuffer if there's buffers on the
> > freelist? If we add buffers gathered from there to the ringlist, we
> > should have few cases that regress.
> 
> That does not seem like a good idea from here.  One of the ideas I
> still want to explore at some point is having a background process
> identify the buffers that are just about to be evicted and stick them
> on the freelist so that the backends don't have to run the clock sweep
> themselves on a potentially huge number of buffers, at perhaps
> substantial CPU cost.  Amit's last attempt at this didn't really pan
> out, but I'm not convinced that the approach is without merit.

FWIW, I've posted an implementation of this in the checkpoint flushing
thread; I saw quite substantial gains with it. It was just entirely
unrealistic to push that into 9.6.


> And, on the other hand, if we don't do something like that, it will be
> quite an exceptional case to find anything on the free list.  Doing it
> just to speed up developer benchmarking runs seems like the wrong
> idea.

I don't think it's just developer benchmarks. I've seen a number of
customer systems where significant portions of shared buffers were
unused due to this.

Unless you have an OLTP system, you can right now easily end up in a
situation where, after a restart, you'll never fill shared_buffers.
Just because sequential scans for OLAP and COPY use ringbuffers. It sure
isn't perfect to address the problem while there's free space in s_b,
but it sure is better than to just continue to have significant portions
of s_b unused.


> > Additionally, maybe we ought to increase the ringbuffer sizes again one
> > of these days? 256kb for VACUUM is pretty damn low.
> 
> But all that does is force the backend to write to the operating
> system, which is where the real buffering happens.

Relying on that has imo proven to be a pretty horrible idea.


> The bottom line
> here, IMHO, is not that there's anything wrong with our ring buffer
> implementation, but that if you run PostgreSQL on a system where the
> I/O is hitting a 5.25" floppy (not to say 8") the performance may be
> less than ideal.  I really appreciate IBM donating hydra - it's been
> invaluable over the years for improving PostgreSQL performance - but I
> sure wish they had donated a better I/O subsystem.

It's really not just hydra. I've seen the same problem on 24 disk raid-0
type installations. The small ringbuffer leads to reads/writes being
constantly interspersed, apparently defeating readahead.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Detrimental performance impact of ringbuffers on performance

2016-04-12 Thread Robert Haas

On Wed, Apr 6, 2016 at 6:57 AM, Andres Freund  wrote:
> While benchmarking on hydra
> (c.f. 
> http://archives.postgresql.org/message-id/20160406104352.5bn3ehkcsceja65c%40alap3.anarazel.de),
> which has quite slow IO, I was once more annoyed by how incredibly long
> the vacuum at the the end of a pgbench -i takes.
>
> The issue is that, even for an entirely shared_buffers resident scale,
> essentially no data is cached in shared buffers. The COPY to load data
> uses a 16MB ringbuffer. Then vacuum uses a 256KB ringbuffer. Which means
> that copy immediately writes and evicts all data. Then vacuum reads &
> writes the data in small chunks; again evicting nearly all buffers. Then
> the creation of the ringbuffer has to read that data *again*.
>
> That's fairly idiotic.
>
> While it's not easy to fix this in the general case, we introduced those
> ringbuffers for a reason after all, I think we at least should add a
> special case for loads where shared_buffers isn't fully used yet.  Why
> not skip using buffers from the ringbuffer if there's buffers on the
> freelist? If we add buffers gathered from there to the ringlist, we
> should have few cases that regress.

That does not seem like a good idea from here.  One of the ideas I
still want to explore at some point is having a background process
identify the buffers that are just about to be evicted and stick them
on the freelist so that the backends don't have to run the clock sweep
themselves on a potentially huge number of buffers, at perhaps
substantial CPU cost.  Amit's last attempt at this didn't really pan
out, but I'm not convinced that the approach is without merit.

And, on the other hand, if we don't do something like that, it will be
quite an exceptional case to find anything on the free list.  Doing it
just to speed up developer benchmarking runs seems like the wrong
idea.

> Additionally, maybe we ought to increase the ringbuffer sizes again one
> of these days? 256kb for VACUUM is pretty damn low.

But all that does is force the backend to write to the operating
system, which is where the real buffering happens.  The bottom line
here, IMHO, is not that there's anything wrong with our ring buffer
implementation, but that if you run PostgreSQL on a system where the
I/O is hitting a 5.25" floppy (not to say 8") the performance may be
less than ideal.  I really appreciate IBM donating hydra - it's been
invaluable over the years for improving PostgreSQL performance - but I
sure wish they had donated a better I/O subsystem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0

2016-04-12 Thread Andres Freund

On 2016-04-12 11:25:21 -0700, Josh berkus wrote:
> Here's the features I can imagine being worth major backwards
> compatibility breaks:
> 
> 1. Fully pluggable storage with a clean API.
> 
> 2. Total elimination of VACUUM or XID freezing
> 
> 3. Fully transparent-to-the user MM replication/clustering or sharding.
> 
> 4. Perfect partitioning (i.e. transparent to the user, supports keys &
> joins, supports expressions on partition key, etc.)
> 
> 5. Transparent upgrade-in-place (i.e. allowing 10.2 to use 10.1's tables
> without pg_upgrade or other modification).
> 
> 6. Fully pluggable parser/executor with a good API
> 
> That's pretty much it.  I can't imagine anything else which would
> justify imposing a huge upgrade barrier on users.  And, I'll point out,
> that in the above list:
> 
> * nobody is currently working on anything in core except #4.
> 
> * we don't *know* that any of the above items will require a backwards
> compatibility break.

none but 2) seem likely to require a substantial compatibility break.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0

2016-04-12 Thread Josh berkus

On 04/12/2016 10:43 AM, Robert Haas wrote:
> 1. Large backward compatibility breaks are bad.  Therefore, if any of
> these things are absolutely impossible to do without major
> compatibility breaks, we shouldn't do them at all.

+1

> 2. Small backward compatibility breaks are OK, but don't require doing
> anything special to the version number.

+1

> 3. There's no value in aggregating many small backward compatibility
> breaks into a single release.  That increases pain for users, rather
> than decreasing it, and slows down development, too, because you have
> to wait for the special magic release where it's OK to hose users.  We
> typically have a few small backward compatibility breaks in each
> release, and that's working fine, so I see little reason to change it.

+1

> 4. To the extent that I can guess what the things on Simon's list
> means from what he wrote, and that's a little difficult because his
> descriptions were very short, I think that everything on that list is
> either (a) a bad idea or (b) something that we can do without any
> compatibility break at all.

+1

Here's the features I can imagine being worth major backwards
compatibility breaks:

1. Fully pluggable storage with a clean API.

2. Total elimination of VACUUM or XID freezing

3. Fully transparent-to-the user MM replication/clustering or sharding.

4. Perfect partitioning (i.e. transparent to the user, supports keys &
joins, supports expressions on partition key, etc.)

5. Transparent upgrade-in-place (i.e. allowing 10.2 to use 10.1's tables
without pg_upgrade or other modification).

6. Fully pluggable parser/executor with a good API

That's pretty much it.  I can't imagine anything else which would
justify imposing a huge upgrade barrier on users.  And, I'll point out,
that in the above list:

* nobody is currently working on anything in core except #4.

* we don't *know* that any of the above items will require a backwards
compatibility break.

People keep talking about "we might want to break compatibility/file
format one day".  But nobody is working on anything which will and
justifies it.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-04-12 Thread Robert Haas

On Mon, Apr 11, 2016 at 9:45 PM, Michael Paquier
 wrote:
> On Mon, Apr 11, 2016 at 5:16 PM, Etsuro Fujita
>  wrote:
>> On 2016/04/11 12:30, Michael Paquier wrote:
>>>
>>> +   if ((cancel = PQgetCancel(entry->conn)))
>>> +   {
>>> +   PQcancel(cancel, errbuf, sizeof(errbuf));
>>> +   PQfreeCancel(cancel);
>>> +   }
>>> Wouldn't it be better to issue a WARNING here if PQcancel does not
>>> succeed?
>>
>> Seems like a good idea.  Attached is an updated version of the patch.
>
> Thanks for the new version. The patch looks good to me.

I'm wondering why we are fixing this specific case and not any of the
other calls to PQexec() or PQexecParams() in postgres_fdw.c.

I mean, many of those instances are cases where the query isn't likely
to run for very long, but certainly "FETCH %d FROM c%u" is in theory
just as bad as the new code introduced in 9.6.  In practice, it
probably isn't, because we're probably only fetching 50 rows at a time
rather than potentially a lot more, but if we're fixing this code up
to be interrupt-safe, maybe we should fix it all at the same time.
Even for the short-running queries like CLOSE and DEALLOCATE, it seems
possible that there could be a network-related hang which you might
want to interrupt.

How about we encapsulate the while (PQisBusy(...)) loop into a new
function pgfdw_get_result(), which can be called after first calling
PQsendQueryParams()?  So then this code will say dmstate->result =
pgfdw_get_result(dmstate->conn).  And we can do something similar for
the other call to PQexecParams() in create_cursor().  Then let's also
add something like pgfdw_exec_query() which calls PQsendQuery() and
then pgfdw_get_result, and use that to replace all of the existing
calls to PQexec().

Then all the SQL postgres_fdw executes would be interruptible, not
just the new stuff.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0

2016-04-12 Thread Robert Haas

On Tue, Apr 12, 2016 at 12:32 PM, Justin Clift  wrote:
> Yeah.  Moving the discussion here was more to determine which items
> really would need a backwards compatible break.  eg no other approach can
> be found.
>
> Seems I started it off badly, as no-one's yet jumped in to discuss the
> initial points. :(

I'm going to throw down the gauntlet (again) and say more or less what
I previously said on the pgsql-advocacy thread.  I think that:

1. Large backward compatibility breaks are bad.  Therefore, if any of
these things are absolutely impossible to do without major
compatibility breaks, we shouldn't do them at all.

2. Small backward compatibility breaks are OK, but don't require doing
anything special to the version number.

3. There's no value in aggregating many small backward compatibility
breaks into a single release.  That increases pain for users, rather
than decreasing it, and slows down development, too, because you have
to wait for the special magic release where it's OK to hose users.  We
typically have a few small backward compatibility breaks in each
release, and that's working fine, so I see little reason to change it.

4. To the extent that I can guess what the things on Simon's list
means from what he wrote, and that's a little difficult because his
descriptions were very short, I think that everything on that list is
either (a) a bad idea or (b) something that we can do without any
compatibility break at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-12 Thread Andres Freund

On 2016-04-12 16:49:25 +, Kevin Grittner wrote:
> On a big NUMA machine with 1000 connections in saturation load
> there was a performance regression due to spinlock contention, for
> acquiring values which were never used.  Just fill with dummy
> values if we're not going to use them.

FWIW, I could see massive regressions with just 64 connections.

I'm a bit scared of having an innoccuous sounding option regress things
by a factor of 10. I think, in addition to this fix, we need to actually
solve the scalability issue here to a good degree.  One way to do so is
to apply the parts of 0001 in
http://archives.postgresql.org/message-id/20160330230914.GH13305%40awork2.anarazel.de
defining PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY and rely on that. Another
to apply the whole patch and simply put the lsn in an 8 byte atomic.

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-12 Thread Alvaro Herrera

Kevin Grittner wrote:

> FWIW, I spent a fair amount of time trying to make it PGC_SIGHUP,
> since it would be very nice to allow that; but I kept running into
> one problem after another with it, some of which were very hard to
> see how to fix.  My inclination is that trying to comment all the
> places that would need something done if we do this in some later
> release would be distracting until such time as we get there, and
> might give a false sense of security to anyone who fixes all the
> places the comments were scattered.

Okay, that's fair.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-12 Thread Tom Lane

Kevin Grittner  writes:
> On Tue, Apr 12, 2016 at 12:08 PM, Alvaro Herrera
>  wrote:
>> old_snapshot_threshold is PGC_POSTMASTER, so this is okay AFAICS, but
>> perhaps it'd be a good idea to add a oneline comment to guc.c indicating
>> to verify this code if there's an intention to lift that limitation --

> Perhaps, but this would be one of at least a dozen land mines that
> exist for trying to modify this setting to be read on reload.
> FWIW, I spent a fair amount of time trying to make it PGC_SIGHUP,
> since it would be very nice to allow that; but I kept running into
> one problem after another with it, some of which were very hard to
> see how to fix.

It'd be good if you document the problems you found somewhere, before
you forget them, just in case somebody does want to try to lift the
restriction.  I agree that scattered code comments wouldn't be the way.
Just a quick email to -hackers to get the info into the archives
might be enough.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-12 Thread Kevin Grittner

On Tue, Apr 12, 2016 at 12:08 PM, Alvaro Herrera
 wrote:
> Kevin Grittner wrote:
>> Avoid extra locks in GetSnapshotData if old_snapshot_threshold < 0
>>
>> On a big NUMA machine with 1000 connections in saturation load
>> there was a performance regression due to spinlock contention, for
>> acquiring values which were never used.  Just fill with dummy
>> values if we're not going to use them.
>
> old_snapshot_threshold is PGC_POSTMASTER, so this is okay AFAICS, but
> perhaps it'd be a good idea to add a oneline comment to guc.c indicating
> to verify this code if there's an intention to lift that limitation --
> snapshots taken before the reload would have invalid lsn/timestamp, so
> the current check would simply skip the check, which would be the wrong
> thing to do.
>
> I think it's reasonable to want to enable this feature with a simple
> reload, so if we ever do that it'd be good to have a pointer about that
> gotcha.  (I'm not proposing you do that, only add the comment for a
> future hacker.)

Perhaps, but this would be one of at least a dozen land mines that
exist for trying to modify this setting to be read on reload.
FWIW, I spent a fair amount of time trying to make it PGC_SIGHUP,
since it would be very nice to allow that; but I kept running into
one problem after another with it, some of which were very hard to
see how to fix.  My inclination is that trying to comment all the
places that would need something done if we do this in some later
release would be distracting until such time as we get there, and
might give a false sense of security to anyone who fixes all the
places the comments were scattered.

If there is a consensus that the comments would be worthwhile, I
can do a pass over the code I had before I gave up on PGC_SIGHUP
and try to add comments to all the appropriate spots based on
differences due to when the GUC was changed.  If we don't want
that, I'm not sure why this one spot is a better place for such a
comment than all the others.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-12 Thread Alvaro Herrera

Kevin Grittner wrote:
> Avoid extra locks in GetSnapshotData if old_snapshot_threshold < 0
> 
> On a big NUMA machine with 1000 connections in saturation load
> there was a performance regression due to spinlock contention, for
> acquiring values which were never used.  Just fill with dummy
> values if we're not going to use them.

old_snapshot_threshold is PGC_POSTMASTER, so this is okay AFAICS, but
perhaps it'd be a good idea to add a oneline comment to guc.c indicating
to verify this code if there's an intention to lift that limitation --
snapshots taken before the reload would have invalid lsn/timestamp, so
the current check would simply skip the check, which would be the wrong
thing to do.

I think it's reasonable to want to enable this feature with a simple
reload, so if we ever do that it'd be good to have a pointer about that
gotcha.  (I'm not proposing you do that, only add the comment for a
future hacker.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Speedup twophase transactions

2016-04-12 Thread Stas Kelvich

> On 12 Apr 2016, at 15:47, Michael Paquier  wrote:
> 
> On Mon, Apr 11, 2016 at 7:16 PM, Stas Kelvich wrote:
>> Michael, it looks like that you are the only one person who can reproduce 
>> that bug. I’ve tried on bunch of OS’s and didn’t observe that behaviour, 
>> also looking at your backtraces I can’t get who is holding this lock (and 
>> all of that happens before first prepare record is replayed).
> 
> Where did you try it. FWIW, I can reproduce that on Linux and OSX, and
> only manually though:

Thanks a lot, Michael! Now I was able to reproduce that. Seems that when
you was performing manual setup, master instance issued checkpoint, but in
my script that didn’t happened because of shorter timing. There are tests
with checkpoint between prepare/commit in proposed test suite, but none of
them was issuing ddl.

> It looks to be the case... The PREPARE phase replayed after the
> standby is restarted in recovery creates a series of exclusive locks
> on the table created and those are not taken on HEAD. Once those are
> replayed the LOCK_STANDBY record is conflicting with it. In the case
> of the failure, the COMMIT PREPARED record cannot be fetched from
> master via the WAL stream so the relation never becomes visible.

Yep, it is. It is okay for prepared xact hold a locks for created/changed 
tables,
but code in standby_redo() was written in assumption that there are no prepared
xacts at the time of recovery. I’ll look closer at checkpointer code and will 
send
updated patch.

And thanks again.

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-12 Thread Teodor Sigaev


There are only 3 fundamental options I see, the cleaner can wait,
"help", or move on.

"Helping" is what it does now and is dangerous.

Moving on gives the above-discussed unthrottling problem.

Waiting has two problems.  The act of waiting will cause autovacuums
to be canceled, unless ugly hacks are deployed to prevent that.   If
we deploy those ugly hacks, then we have the problem that a user
backend will end up waiting on an autovacuum to finish the cleaning,
and the autovacuum is taking its sweet time due to
autovacuum_vacuum_cost_delay.  (The "helping" model avoids this
problem because the user backend can just catch up with and pass the
io-throttled autovac process)
With pending cleanup patch backend will try to get lock on metapage with 
ConditionalLockPage. Will it interrupt autovacum worker?





For completeness sake, a fourth option would to move on, but only
after inserting the tuple directly into the main index structure
(rather then the pending list) like would be done with fastupdate off,
once the pending list is already oversized.  This is my favorite, but
there is no chance of it going into 9.6, much less being backpatched.

Agree, will reimplement for 9.7



Alvaro's recommendation, to let the cleaner off the hook once it
passes the page which was the tail page at the time it started, would
prevent any process from getting pinned down indefinitely, but would
not prevent the size of the list from increasing without bound.  I
think that would probably be good enough, because the current
throttling behavior is purely accidentally and doesn't *guarantee* a
limit on the size of the pending list.

Added, see attached patch (based on v3.1)
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


gin_alone_cleanup-4.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0

2016-04-12 Thread Justin Clift

On 12 Apr 2016, at 17:23, Merlin Moncure  wrote:
> On Mon, Apr 11, 2016 at 11:39 AM, Justin Clift  wrote:
>> Moving over a conversation from the pgsql-advocacy mailing list.  In it
>> Simon (CC'd) raised the issue of potentially creating a 
>> backwards-compatibility
>> breaking release at some point in the future, to deal with things that
>> might have no other solution (my wording).
>> 
>> Relevant part of that thread there for reference:
>> 
>>  
>> http://www.postgresql.org/message-id/CANP8+jLtk1NtaJyXc=hAqX=0k+ku4zfavgvbkfs+_sor9he...@mail.gmail.com
>> 
>> Simon included a short starter list of potentials which might be in
>> that category:
>> 
>>  * SQL compliant identifiers
>>  * Remove RULEs
>>  * Change recovery.conf
>>  * Change block headers
>>  * Retire template0, template1
>>  * Optimise FSM
>>  * Add heap metapage
>>  * Alter tuple headers
>>  et al
>> 
>> This still is better placed on -hackers though, so lets have the
>> conversation here to figure out if a "backwards compatibility breaking"
>> release really is needed or not.
> 
> A couple of points here:
> *) I don't think having a version number that starts with 10 instead
> of 9 magically fixes backwards compatibility problems and I think
> that's a dangerous precedent to set unless we're willing to fork
> development and support version 9 indefinitely including major release
> versions.
> 
> *) Compatibility issues at the SQL level have to be taken much more
> seriously than other things (like internal layouts or .conf issues).
> 
> *) We need to do an honest cost benefit analysis before breaking
> things.  Code refactors placed on your users puts an enormous cost
> that is often underestimated.  I have some fairly specific examples of
> the costs related to the text cast removal for example.  It's not a
> pretty picture.

Yeah.  Moving the discussion here was more to determine which items
really would need a backwards compatible break.  eg no other approach can
be found.

Seems I started it off badly, as no-one's yet jumped in to discuss the
initial points. :(

+ Justin

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-12 Thread Teodor Sigaev


This restricts the memory used by ordinary backends when doing the
cleanup to be work_mem. Shouldn't we let them use
maintenance_work_mem? Only one backend can be doing this clean up of a
given index at any given time, so we don't need to worry about many
concurrent allocations of maintenance_work_mem.  This seems very
similar in spirit to index creation, where a backend is allowed to use
maintenance_work_mem.
Because it could be a several indexes in one pg instance. And each cleaner could 
eat maintenance_work_mem.





Also, do we plan on backpatching this?  While there are no known

Yes


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Some other things about contrib/bloom and generic_xlog.c

2016-04-12 Thread Robert Haas

On Tue, Apr 12, 2016 at 9:36 AM, Craig Ringer  wrote:
> Repeating the mapping at each checkpoint sounds pretty reasonable and means
> we always know what we need. There's no need to bloat each record with an
> extension name and no need for any kind of ugly global registration. The
> mapping table would be small and simple. I like it.
>
> Of course, it's all maybe-in-future stuff at this point, but I think that's
> a really good way to approach it.
>
> There's no way around the fact that user defined redo functions can affect
> reliability. But then, so can user-defined data types, functions, bgworkers,
> plpython functions loading ctypes, plpython functions getting creative in
> the datadir, and admins who paste into the wrong window. The scope for
> problems is somewhat greater but not IMO prohibitively so.

I am in agreement with you on both the merits of this particular thing
and the general principle you are articulating regarding how to think
about these things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0

2016-04-12 Thread Merlin Moncure

On Mon, Apr 11, 2016 at 11:39 AM, Justin Clift  wrote:
> Moving over a conversation from the pgsql-advocacy mailing list.  In it
> Simon (CC'd) raised the issue of potentially creating a 
> backwards-compatibility
> breaking release at some point in the future, to deal with things that
> might have no other solution (my wording).
>
> Relevant part of that thread there for reference:
>
>   
> http://www.postgresql.org/message-id/CANP8+jLtk1NtaJyXc=hAqX=0k+ku4zfavgvbkfs+_sor9he...@mail.gmail.com
>
> Simon included a short starter list of potentials which might be in
> that category:
>
>   * SQL compliant identifiers
>   * Remove RULEs
>   * Change recovery.conf
>   * Change block headers
>   * Retire template0, template1
>   * Optimise FSM
>   * Add heap metapage
>   * Alter tuple headers
>   et al
>
> This still is better placed on -hackers though, so lets have the
> conversation here to figure out if a "backwards compatibility breaking"
> release really is needed or not.

A couple of points here:
*) I don't think having a version number that starts with 10 instead
of 9 magically fixes backwards compatibility problems and I think
that's a dangerous precedent to set unless we're willing to fork
development and support version 9 indefinitely including major release
versions.

*) Compatibility issues at the SQL level have to be taken much more
seriously than other things (like internal layouts or .conf issues).

*) We need to do an honest cost benefit analysis before breaking
things.  Code refactors placed on your users puts an enormous cost
that is often underestimated.  I have some fairly specific examples of
the costs related to the text cast removal for example.  It's not a
pretty picture.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-12 Thread Anastasia Lubennikova

Attached version has fix of pg_dump suggested by Stephen 
Frost 
in -committers thread.

http://postgresql.nabble.com/pgsql-CREATE-INDEX-INCLUDING-column-td5897653.html
Sooner or later, I'd like to see this patch finished.

For now, it has two complaints:
- support of expressions as included columns.
Frankly, I don't understand, why it's a problem of the patch.
The patch is  already big enough and it will be much easier to add 
expressions support in the following patch, after the first one will be 
stable.

I wonder, if someone has objections to that?
Yes, it's a kind of delayed feature. But should we wait for every patch 
when it will be entirely completed?


- lack of review and testing
Obviously I did as much testing as I could.
So, if reviewers have any concerns about the patch, I'm waiting forward 
to see them.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..891325d 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals);
@@ -1485,7 +1485,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey);
 Datum
 dblink_get_pkey(PG_FUNCTION_ARGS)
 {
-	int16		numatts;
+	int16		indnkeyatts;
 	char	  **results;
 	FuncCallContext *funcctx;
 	int32		call_cntr;
@@ -1511,7 +1511,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT);
 
 		/* get the array of attnums */
-		results = get_pkey_attnames(rel, );
+		results = get_pkey_attnames(rel, );
 
 		relation_close(rel, AccessShareLock);
 
@@ -1531,9 +1531,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
 		funcctx->attinmeta = attinmeta;
 
-		if ((results != NULL) && (numatts > 0))
+		if ((results != NULL) && (indnkeyatts > 0))
 		{
-			funcctx->max_calls = numatts;
+			funcctx->max_calls = indnkeyatts;
 
 			/* got results, keep track of them */
 			funcctx->user_fctx = results;
@@ -2023,10 +2023,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS)
  * get_pkey_attnames
  *
  * Get the primary key attnames for the given relation.
- * Return NULL, and set numatts = 0, if no primary key exists.
+ * Return NULL, and set indnkeyatts = 0, if no primary key exists.
  */
 static char **
-get_pkey_attnames(Relation rel, int16 *numatts)
+get_pkey_attnames(Relation rel, int16 *indnkeyatts)
 {
 	Relation	indexRelation;
 	ScanKeyData skey;
@@ -2036,8 +2036,8 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 	char	  **result = NULL;
 	TupleDesc	tupdesc;
 
-	/* initialize numatts to 0 in case no primary key exists */
-	*numatts = 0;
+	/* initialize indnkeyatts to 0 in case no primary key exists */
+	*indnkeyatts = 0;
 
 	tupdesc = rel->rd_att;
 
@@ -2058,12 +2058,12 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 		/* we're only interested if it is the primary key */
 		if (index->indisprimary)
 		{
-			*numatts = index->indnatts;
-			if (*numatts > 0)
+			*indnkeyatts = index->indnkeyatts;
+			if (*indnkeyatts > 0)
 			{
-result = (char **) palloc(*numatts * sizeof(char *));
+result = (char **) palloc(*indnkeyatts * sizeof(char *));
 
-for (i = 0; i < *numatts; i++)
+for (i = 0; i < *indnkeyatts; i++)
 	result[i] = SPI_fname(tupdesc, index->indkey.values[i]);
 			}
 			break;
diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 7352b29..142730a 100644
--- a/contrib/tcn/tcn.c
+++ b/contrib/tcn/tcn.c
@@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 		/* we're only interested if it is the primary key and valid */
 		if (index->indisprimary && IndexIsValid(index))
 		{
-			int			numatts = index->indnatts;
+			int			indnkeyatts = index->indnkeyatts;
 
-			if (numatts > 0)
+			if (indnkeyatts > 0)
 			{
 int			i;
 
@@ -150,7 +150,7 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 appendStringInfoCharMacro(payload, ',');
 appendStringInfoCharMacro(payload, operation);
 
-for (i = 0; i < numatts; i++)
+for (i = 0; i < indnkeyatts; i++)
 {
 	int			colno = index->indkey.values[i];
 
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index d6b60db..342d5ec 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -3557,6 +3557,14

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-12 Thread Andres Freund

On 2016-04-12 19:42:11 +0530, Amit Kapila wrote:
> Yes, it seems generally it is a good idea, but not sure if it is a complete
> fix for variation in performance we are seeing when we change shared memory
> structures.

I didn't suspect it would be. More whether it'd be beneficial
performance wise. FWIW, I haven't seen the variations you're observing
on any machine so far.

I think at high concurrency levels we're quite likely to interact with
the exact strategy used for the last-level/l3 cache. pgprocno,
allPgXact, BufferDescs are all arrays with a regular stride that we
access across several numa nodes, at a very high rate. At some point
that makes very likely that cache conflicts occur in set associative
caches.

> Andres suggested me on IM to take performance data on x86 m/c
> by padding PGXACT and the data for the same is as below:
> 
> median of 3, 5-min runs

Thanks for running these.

I presume these were *without* pg_prewarming the contents? It'd be
interesting to do the same with prewarming; aligning these structures
should be unrelated to the separate issue of bufferdesc order having a
rather massive performance effect.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Some other things about contrib/bloom and generic_xlog.c

2016-04-12 Thread Alexander Korotkov

On Tue, Apr 12, 2016 at 6:42 PM, Tom Lane  wrote:

> Alexander Korotkov  writes:
> > I agree with both of these ideas. Patch is attached.
>
> Pushed with minor cleanup.
>

Great, thanks!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] [COMMITTERS] pgsql: Allow Pin/UnpinBuffer to operate in a lockfree manner.

2016-04-12 Thread Tom Lane

I wrote:
> It looks like that compiler adheres to the C89 restriction that an
> initializer for an array or struct must contain only link-time-constant
> expressions, even if the target object is of dynamic scope.
> The macro works with a link-time-constant pointer argument, but not
> with one that must be evaluated at runtime.

It strikes me that that means you could stick with this initialization
method if you made the macro argument be a literal constant string name,
like "buffer spinlock", and printed that rather than the address in
s_lock_stuck.  This would be different from what we do now, but not
necessarily any less useful.  I'm pretty dubious of the usefulness of
the addresses you're printing right now, because most of them are indeed
not the spinlock itself.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Some other things about contrib/bloom and generic_xlog.c

2016-04-12 Thread Tom Lane

Alexander Korotkov  writes:
> I agree with both of these ideas. Patch is attached.

Pushed with minor cleanup.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Some other things about contrib/bloom and generic_xlog.c

2016-04-12 Thread Alexander Korotkov

On Tue, Apr 12, 2016 at 6:34 PM, Teodor Sigaev  wrote:

> I agree with both of these ideas. Patch is attached.
>>
>
> Hmm, you accidentally forget to change flag type of GenericXLogRegister in
> contrib/bloom signature.


Fixed.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


generic-xlog-register-2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Some other things about contrib/bloom and generic_xlog.c

2016-04-12 Thread Teodor Sigaev


I agree with both of these ideas. Patch is attached.


Hmm, you accidentally forget to change flag type of GenericXLogRegister in 
contrib/bloom signature.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Some other things about contrib/bloom and generic_xlog.c

2016-04-12 Thread Tom Lane

Alexander Korotkov  writes:
> Attached patch is intended to fix this.  It zeroes "hole" in both
> GenericXLogFinish() and generic_redo().

Pushed.  I rewrote the comments a bit and modified GenericXLogFinish
so that it doesn't copy data only to overwrite it with zeroes.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Preprocessor condition fix

2016-04-12 Thread Tom Lane

Christian Ullrich  writes:
> * Tom Lane wrote:
>> Oh?  Then we should not need that one (the /D switch in win32.mak) at all.
>> Should we just remove it?

> We have both confirmed several times that nothing depends on it. I think it 
> can go.

Done.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Allow Pin/UnpinBuffer to operate in a lockfree manner.

2016-04-12 Thread Tom Lane

Andres Freund  writes:
> On 2016-04-12 00:32:13 -0400, Tom Lane wrote:
>> Apparently, assigning the result of init_spin_delay() is not as portable
>> as you thought.

> Hm. I'm not sure why not though? Is it because delayStatus isn't static
> or global scope?

It looks like that compiler adheres to the C89 restriction that an
initializer for an array or struct must contain only link-time-constant
expressions, even if the target object is of dynamic scope.
The macro works with a link-time-constant pointer argument, but not
with one that must be evaluated at runtime.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-12 Thread Amit Kapila

On Tue, Apr 12, 2016 at 3:48 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Tue, Apr 12, 2016 at 12:40 AM, Andres Freund 
> wrote:
>
>> I did get access to the machine (thanks!). My testing shows that
>> performance is sensitive to various parameters influencing memory
>> allocation. E.g. twiddling with max_connections changes
>> performance. With max_connections=400 and the previous patches applied I
>> get ~122 tps, with 402 ~162 tps.  This sorta confirms that we're
>> dealing with an alignment/sharing related issue.
>>
>> Padding PGXACT to a full cache-line seems to take care of the largest
>> part of the performance irregularity. I looked at perf profiles and saw
>> that most cache misses stem from there, and that the percentage (not
>> absolute amount!) changes between fast/slow settings.
>>
>> To me it makes intuitive sense why you'd want PGXACTs to be on separate
>> cachelines - they're constantly dirtied via SnapshotResetXmin(). Indeed
>> making it immediately return propels performance up to 172, without
>> other changes. Additionally cacheline-padding PGXACT speeds things up to
>> 175 tps.
>>
>
> It seems like padding PGXACT to a full cache-line is a great improvement.
> We have not so many PGXACTs to care about bytes wasted to padding.
>

Yes, it seems generally it is a good idea, but not sure if it is a complete
fix for variation in performance we are seeing when we change shared memory
structures.  Andres suggested me on IM to take performance data on x86 m/c
by padding PGXACT and the data for the same is as below:

median of 3, 5-min runs

Client_Count/Patch_ver 8 64 128
HEAD 59708 329560 173655
PATCH 61480 379798 157580

Here, at 128 client-count the performance with patch still seems to have
variation.  The highest tps with patch (170363) is close to HEAD (175718).
This could be run-to-run variation, but I think it indicates that there are
more places where we might need such padding or may be optimize them, so
that they are aligned.

I can do some more experiments on similar lines, but I am out on vacation
and might not be able to access the m/c for 3-4 days.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pad_pgxact_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0

2016-04-12 Thread Justin Clift

On 12 Apr 2016, at 14:12, Yury Zhuravlev  wrote:
> Justin Clift wrote:
>> Simon included a short starter list of potentials which might be in
>> that category:
>> 
>>  * SQL compliant identifiers
>>  * Remove RULEs
>>  * Change recovery.conf
>>  * Change block headers
>>  * Retire template0, template1
>>  * Optimise FSM
>>  * Add heap metapage
>>  * Alter tuple headers
>>  et al
> 
> + CMake build I think.
> 
> Now I can build:
> * postgres
> * bin/* programs
> * pl/* languages
> * contrib/* (with cmake PGXS analogue)
> 
> Can run regression and isolation tests for postgres/pl* and all contrib 
> modules. 
> There is still a lot of work but I hope everything will turn out. Also it 
> would be good to get help.
> 
> Thanks.
> 
> PS https://github.com/stalkerg/postgres_cmake

If/when PostgreSQL can be built and tested with CMake... why would the
resulting code + database files + network protocol (etc) not be compatible
with previous versions? :)

+ Justin

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Preprocessor condition fix

2016-04-12 Thread Tom Lane

Christian Ullrich  writes:
> * Tom Lane wrote:
>> Hm, my grep found another one ...

> Oh, sorry. I saw that one, but thought it was intentional because _WIN64 
> is defined automatically anyway.

Oh?  Then we should not need that one (the /D switch in win32.mak) at all.
Should we just remove it?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Preprocessor condition fix

2016-04-12 Thread Christian Ullrich

* From: Tom Lane [mailto:t...@sss.pgh.pa.us]

> Christian Ullrich  writes:

> > * Tom Lane wrote:
> >> Hm, my grep found another one ...
> 
> > Oh, sorry. I saw that one, but thought it was intentional because _WIN64
> > is defined automatically anyway.
> 
> Oh?  Then we should not need that one (the /D switch in win32.mak) at all.
> Should we just remove it?

We have both confirmed several times that nothing depends on it. I think it can 
go.

-- 
Christian



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Some other things about contrib/bloom and generic_xlog.c

2016-04-12 Thread Craig Ringer

On 12 April 2016 at 20:48, Robert Haas  wrote:

> > So how do we look at a generic log record, say "ok, the upstream that
> wrote
> > this says it needs to invoke registered generic xlog hook 42, which is
> >  from  at " ?
>
> Record enough information in WAL that the rmgrs can have names instead
> of ID numbers.  Stick the list of extension rmgrs in use into each
> checkpoint record and call it good.
>

Repeating the mapping at each checkpoint sounds pretty reasonable and means
we always know what we need. There's no need to bloat each record with an
extension name and no need for any kind of ugly global registration. The
mapping table would be small and simple. I like it.

Of course, it's all maybe-in-future stuff at this point, but I think that's
a really good way to approach it.

There's no way around the fact that user defined redo functions can affect
reliability. But then, so can user-defined data types, functions,
bgworkers, plpython functions loading ctypes, plpython functions getting
creative in the datadir, and admins who paste into the wrong window. The
scope for problems is somewhat greater but not IMO prohibitively so.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Some other things about contrib/bloom and generic_xlog.c

2016-04-12 Thread Alexander Korotkov

On Tue, Apr 12, 2016 at 3:33 AM, Tom Lane  wrote:

> ... BTW, with respect to the documentation angle, it seems to me
> that it'd be better if GenericXLogRegister were renamed to
> GenericXLogRegisterBuffer, or perhaps GenericXLogRegisterPage.
> I think this would make the documentation clearer, and it would
> also make it easier to add other sorts of Register actions later,
> if we ever think of some (which seems not unlikely, really).
>
> Another thing to think about is whether we're going to regret
> hard-wiring the third argument as a boolean.  Should we consider
> making it a bitmask of flags, instead?  It's not terribly hard
> to think of other flags we might want there in future; for example
> maybe something to tell GenericXLogFinish whether it's worth trying
> to identify data movement on the page rather than just doing the
> byte-by-byte delta calculation.


I agree with both of these ideas. Patch is attached.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


generic-xlog-register.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Code cleanup in the wake of recent LWLock refactoring.

2016-04-12 Thread Pavel Stehule

2016-04-12 14:51 GMT+02:00 Robert Haas :

> On Tue, Apr 12, 2016 at 4:18 AM, Pavel Stehule 
> wrote:
> > If there will be simple way, how to fix it, then I'll fix my extensions.
> But
> > new API is working only when the extension has own share memory segment.
> For
> > some complex extensions like Orafce it means expensive refactoring.
> >
> > What is worst, this refactoring is difficult now, because I support older
> > versions when I have not private shared segments.
>
> You don't need a separate shared memory segment.  You might want to
> have a look at what we did for pldebugger:
>
>
> http://git.postgresql.org/gitweb/?p=pldebugger.git;a=commitdiff;h=14c6caf08bb14a7404a8241e47a80ef5f97e451e
>
> I think the problem is similar to the one you are facing with orafce.
>

I'll try it. Thank you for info

Pavel


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0

2016-04-12 Thread Yury Zhuravlev


Justin Clift wrote:

Simon included a short starter list of potentials which might be in
that category:

  * SQL compliant identifiers
  * Remove RULEs
  * Change recovery.conf
  * Change block headers
  * Retire template0, template1
  * Optimise FSM
  * Add heap metapage
  * Alter tuple headers
  et al


+ CMake build I think.

Now I can build:
* postgres
* bin/* programs
* pl/* languages
* contrib/* (with cmake PGXS analogue)

Can run regression and isolation tests for postgres/pl* and all contrib 
modules. 

There is still a lot of work but I hope everything will turn out. Also it 
would be good to get help.


Thanks.

PS https://github.com/stalkerg/postgres_cmake

--
Yury Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Choosing parallel_degree

2016-04-12 Thread Simon Riggs

On 12 April 2016 at 13:53, Robert Haas  wrote:

> On Mon, Apr 11, 2016 at 5:45 PM, Simon Riggs 
> wrote:
> > On 8 April 2016 at 17:49, Robert Haas  wrote:
> >> With the patch, you can - if you wish - substitute
> >> some other number for the one the planner comes up with.
> >
> > I saw you're using AccessExclusiveLock, the reason being it affects
> SELECTs.
> >
> > That is supposed to apply when things might change the answer from a
> SELECT,
> > whereas this affects only the default for a plan.
> >
> > Can I change this to a lower setting? I would have done this before
> applying
> > the patch, but you beat me to it.
>
> I don't have a problem with reducing the lock level there, if we're
> convinced that it's safe.


I'll run up a patch, with appropriate comments.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Some other things about contrib/bloom and generic_xlog.c

2016-04-12 Thread Alexander Korotkov

On Sun, Apr 10, 2016 at 7:49 AM, Tom Lane  wrote:

> 1. It doesn't seem like generic_xlog.c has thought very carefully about
> the semantics of the "hole" between pd_lower and pd_upper.  The mainline
> XLOG code goes to some lengths to ensure that the hole stays all-zeroes;
> for example RestoreBlockImage() explicitly zeroes the hole when restoring
> from a full-page image that has a hole.  But generic_xlog.c's redo routine
> does not do anything comparable, nor does GenericXLogFinish make any
> effort to ensure that the "hole" is all-zeroes after normal application of
> a generic update.  The reason this is of interest is that it means the
> contents of the "hole" could diverge between master and slave, or differ
> between the original state of a database and what it is after a crash and
> recovery.  That would at least complicate forensic comparisons of pages,
> and I think it might also break checksumming.  We thought that this was
> important enough to take the trouble of explicitly zeroing holes during
> mainline XLOG replay.  Shouldn't generic_xlog.c take the same trouble?
>

Attached patch is intended to fix this.  It zeroes "hole" in both
GenericXLogFinish() and generic_redo().

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


generic-xlog-zero-gap.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-04-12 Thread Robert Haas

On Mon, Apr 11, 2016 at 1:17 PM, Andres Freund  wrote:
> I can think of a number of relatively easy ways to address this:
> 1) Just zap (or issue?) all pending flush requests when getting an
>smgrinval/smgrclosenode
> 2) Do 1), but filter for the closed relnode
> 3) Actually handle the case of the last open segment not being
>RELSEG_SIZE properly in _mdfd_getseg() - mdnblocks() does so.
>
> I'm kind of inclined to do both 3) and 1).

#3 seems like it's probably about 15 years overdue, so let's do that anyway.

I don't quite understand why #1 fixes the problem - not because I
doubt you, but just because I haven't studied it much.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Choosing parallel_degree

2016-04-12 Thread Robert Haas

On Mon, Apr 11, 2016 at 5:45 PM, Simon Riggs  wrote:
> On 8 April 2016 at 17:49, Robert Haas  wrote:
>> With the patch, you can - if you wish - substitute
>> some other number for the one the planner comes up with.
>
> I saw you're using AccessExclusiveLock, the reason being it affects SELECTs.
>
> That is supposed to apply when things might change the answer from a SELECT,
> whereas this affects only the default for a plan.
>
> Can I change this to a lower setting? I would have done this before applying
> the patch, but you beat me to it.

I don't have a problem with reducing the lock level there, if we're
convinced that it's safe.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Choosing parallel_degree

2016-04-12 Thread tushar

On 04/11/2016 08:57 PM, Julien Rouhaud wrote:

>Expected = Expecting worker8 information , also loops=10 (including the
>Master)
>

Even if you set a per-table parallel_degree higher than
max_parallel_degree, it'll be maxed at max_parallel_degree.

Then, the explain shows that the planner assumed it'll launch 9 workers,
but only 8 were available (or needed perhaps) at runtime.

Right, if we increase max_worker_processes value in postgresql.conf file 
then

we are able to see the worker information in explain plan.

if parallel_degree value is higher than max_parallel_degree i.e

parallel_degree = 20,  max_parallel_degree=10  => [ select query 
accepting 10 workers ]

but in general  where table doesn't have parallel_degree set and 
max_parallel_degree is

set to 10 then select query is showing only 2 workers .

--
regards,tushar

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Code cleanup in the wake of recent LWLock refactoring.

2016-04-12 Thread Robert Haas

On Tue, Apr 12, 2016 at 4:18 AM, Pavel Stehule  wrote:
> If there will be simple way, how to fix it, then I'll fix my extensions. But
> new API is working only when the extension has own share memory segment. For
> some complex extensions like Orafce it means expensive refactoring.
>
> What is worst, this refactoring is difficult now, because I support older
> versions when I have not private shared segments.

You don't need a separate shared memory segment.  You might want to
have a look at what we did for pldebugger:

http://git.postgresql.org/gitweb/?p=pldebugger.git;a=commitdiff;h=14c6caf08bb14a7404a8241e47a80ef5f97e451e

I think the problem is similar to the one you are facing with orafce.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Some other things about contrib/bloom and generic_xlog.c

2016-04-12 Thread Robert Haas

On Mon, Apr 11, 2016 at 9:54 PM, Craig Ringer  wrote:
> If you rely on shared_preload_libraries, then "oops, I forgot to put
> myextension in shared_preload_libraries on the downstream" becomes a Bad
> Thing. There's no way to go back and redo the work you've passed over. You
> have to recreate the standby. Worse, it's silently wrong. The server has no
> idea it was meant to do anything and no way to tell the user it couldn't do
> what it was meant to do.

No, that's not right.  If you rely on shared_preload_libraries, then
you'd just have the server die like this:

FATAL: I don't know how to replay a record for extension rmgr "craigrules".
HINT: Add the necessary library to shared_preload_libraries and
restart the server.

That seems 100% fine, although maybe we could adopt the convention
that the user-visible extension rmgr name should match the shared
library name, just as we do for logical decoding plugins (IIRC), and
the server can try to load a library by that name and continue.

> We can't register redo routines in the catalogs because we don't have access
> to the syscaches etc during redo (right?).

Correct.

> So how do we look at a generic log record, say "ok, the upstream that wrote
> this says it needs to invoke registered generic xlog hook 42, which is
>  from  at " ?

Record enough information in WAL that the rmgrs can have names instead
of ID numbers.  Stick the list of extension rmgrs in use into each
checkpoint record and call it good.

> Personally I'd be willing to accept the limitations of the s_p_l and hook
> approach to the point where I think we should add the hook and accept the
> caveats of its use ... but I understand the problems with it. I understand
> not having custom redo routine support in this iteration of the patch. It's
> somewhat orthogonal job to this patch anyway - while handy for specific
> relation generic xlog, custom redo routines make most sense when combined
> with generic xlog that isn't necessarily associated with a specific
> relation. This patch doesn't support that.

Yeah.  And while this patch may (probably does) have technical defects
that need to be cured, I don't think that's an argument against this
approach.  For prototyping, this is totally fine.  For
performance-critical stuff, probably not so much.  But if you don't
give people a way to prototype stuff and extend the system, then they
won't be as likely to innovate and make new stuff, and then we won't
get as many new patches for core proposing new and innovative things.
I think it's great that we are finally getting close to make the
access method system truly extensible - that has been a clear need for
a long time, and I think we are going about it in the right way.  We
can add more in the future.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Speedup twophase transactions

2016-04-12 Thread Michael Paquier

On Mon, Apr 11, 2016 at 7:16 PM, Stas Kelvich wrote:
> Michael, it looks like that you are the only one person who can reproduce 
> that bug. I’ve tried on bunch of OS’s and didn’t observe that behaviour, also 
> looking at your backtraces I can’t get who is holding this lock (and all of 
> that happens before first prepare record is replayed).

Where did you try it. FWIW, I can reproduce that on Linux and OSX, and
only manually though:
1) Create a master and a streaming standby.
2) Run the following on master:
BEGIN;
CREATE TABLE foo (a int);
PREPARE TRANSACTION 'tx';
3) stop -m immediate the standby
4) COMMIT PREPARED 'tx' on master
5) Restart standby, and the node will wait for a lock

> Can you investigate it more? Particularly find out who holds the lock?

OK, so if I look at this backtrace that's a standby LOCK record, but
we already know that:
frame #9: 0x000107600383
postgres`LockAcquireExtended(locktag=0x7fff58a4b228, lockmode=8,
sessionLock='\x01', dontWait='\0', reportMemoryError='\0') + 2819 at
lock.c:998
frame #10: 0x0001075f9cd6
postgres`StandbyAcquireAccessExclusiveLock(xid=867, dbOid=16384,
relOid=16385) + 358 at standby.c:627
  * frame #11: 0x0001075fa23b
postgres`standby_redo(record=0x7f90a9841e38) + 251 at
standby.c:809
frame #12: 0x000107298d37 postgres`StartupXLOG + 9351 at xlog.c:6871

Here is the record pointer:
(lldb) p *record
(XLogReaderState) $4 = {
  read_page = 0x00010729b3c0 (postgres`XLogPageRead at xlog.c:10973)
  system_identifier = 6272572355656465658
  private_data = 0x7fff58a4bf40
  ReadRecPtr = 50424128
  EndRecPtr = 50424176
  decoded_record = 0x7f90a9843178
  main_data = 0x7f90a9804b78 "\x01"
  main_data_len = 16
  main_data_bufsz = 784
  record_origin = 0
  blocks = {
[0] = {
  in_use = '\0'
  rnode = (spcNode = 1663, dbNode = 16384, relNode = 2674)
  forknum = MAIN_FORKNUM
  blkno = 22
  flags = '\x10'
  has_image = '\0'
  bkp_image = 0x7f90a984826b "\x01"
  hole_offset = 892
  hole_length = 2076
  bimg_len = 6116
  bimg_info = '\x01'
  has_data = '\0'
  data = 0x7f90a98595d8 "\a"
  data_len = 0
  data_bufsz = 154
}

And in our case this corresponds to the record with LSN 0/03016940
that cannot take an exclusive LOCK:
rmgr: Transaction len (rec/tot):784/   813, tx:867, lsn:
0/03016610, prev 0/030165D8, desc: PREPARE
rmgr: Standby len (rec/tot): 16/42, tx:  0, lsn:
0/03016940, prev 0/03016610, desc: LOCK xid 867 db 16384 rel 16385
rmgr: Standby len (rec/tot): 28/54, tx:  0, lsn:
0/03016970, prev 0/03016940, desc: RUNNING_XACTS nextXid 868
latestCompletedXid 866 oldestRunningXid 867; 1 xacts: 867

There are two XID locks taken before that:
rmgr: Standby len (rec/tot): 16/42, tx:867, lsn:
0/03016578, prev 0/03014D40, desc: LOCK xid 867 db 16384 rel 16385
rmgr: Standby len (rec/tot): 16/42, tx:  0, lsn:
0/030165A8, prev 0/03016578, desc: LOCK xid 867 db 16384 rel 16385

And pg_locks on the standby is reporting that another lock has been
taken but not released:
=# select locktype, pid, mode, granted, fastpath from pg_locks where
relation = 16385;
 locktype |  pid  |mode | granted | fastpath
--+---+-+-+--
 relation | 68955 | AccessExclusiveLock | f   | f
 relation |  null | AccessExclusiveLock | t   | f
(2 rows)
In this case 68955 is the startup process trying to take the lock for
the LOCK record and it is not granted yet:
ioltas 68955   0.0  0.0  2593064   3228   ??  TXs   4:44PM   0:00.05
postgres: startup process   recovering 00010003
waiting

Now there is already a lock that has been taken and granted,
conflicting here... As the relation is only PREPARE'd yet and not
COMMIT PREPARED at this stage of the replay, isn't this lock taken
during the PREPARE phase, then it is not released by your new code
paths, no?

[One LOCK_DEBUG build later...]

It looks to be the case... The PREPARE phase replayed after the
standby is restarted in recovery creates a series of exclusive locks
on the table created and those are not taken on HEAD. Once those are
replayed the LOCK_STANDBY record is conflicting with it. In the case
of the failure, the COMMIT PREPARED record cannot be fetched from
master via the WAL stream so the relation never becomes visible.

Attached as two log files that are the result of a couple of runs,
those are the logs of the standby after being restarted in crash
recovery
- 2pc_master_logs.log, for HEAD.
- 2pc_patch_logs.log, with your last patch applied.
Feel free to have a look at them.
Regards,
-- 
Michael


2pc_master_logs.log
Description: Binary data


2pc_patch_logs.log
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Update copyright in genericdesc.c

2016-04-12 Thread Stephen Frost

* Amit Langote (langote_amit...@lab.ntt.co.jp) wrote:
> Attached fixes copyright in file mentioned in $subject.

Fixed.

Thanks!

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Some other things about contrib/bloom and generic_xlog.c

2016-04-12 Thread Alexander Korotkov

On Tue, Apr 12, 2016 at 3:08 AM, Michael Paquier 
wrote:

> What I thought we should be able to do with that should not be only
> limited to indexes, aka to:
> - Be able to register and log full pages
> - Be able to log data associated to a block
> - Be able to have record data
> - Use at recovery custom routines to apply those WAL records
> The first 3 ones are done via the set of routines generating WAL
> records for the rmgr RM_GENERIC_ID. The 4th one needs a hook in
> generic_redo. Looking at the thread where this patch has been debated
> I am not seeing a discussion regarding that.
>

I've designed generic xlog under following restrictions:
 - We don't want users to register their own redo functions since it could
affect reliability. Unfortunately, I can't find original discussion now.
 - API should be as simple as possible for extension author.

However, I think we should add some way for custom redo functions one day.
If we will add pluggable storage engines (I hope one day we will), then we
would face some engines which don't use our buffer manager.  For such
pluggable storage engines, current generic WAL wouldn't work, but
user-defined redo functions would.

Now, my view is that we will need kind of two-phase recovery in order to
provide user-defined redo functions:
1) In the first phase, all built-in objects are recovered.  After this
phase, we can use system catalog.
2) In the second phase, user-defined redo functions are called on the base
of consistent system catalog.

I think we can implement such two-phase recovery even now on the base of
generic logical messages.  We can create logical slot.  When extension is
loaded it starts second phase of recovery by reading generic logical
messages from this logical slot.  Logical slot position should be also
advanced on checkpoint.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] raw output from copy

2016-04-12 Thread Ants Aasma

On 8 Apr 2016 9:14 pm, "Pavel Stehule"  wrote:
> 2016-04-08 20:54 GMT+02:00 Andrew Dunstan :
>> I should add that I've been thinking about this some more, and that I now 
>> agree that something should be done to support this at the SQL level, mainly 
>> so that clients can manage very large pieces of data in a stream-oriented 
>> fashion rather than having to marshall the data in memory to load/unload via 
>> INSERT/SELECT. Anything that is client-side only is likely to have this 
>> memory issue.
>>
>> At the same time I'm still not entirely convinced that COPY is a good 
>> vehicle for this. It's designed for bulk records, and already quite complex. 
>> Maybe we need something new that uses the COPY protocol but is more 
>> specifically tailored for loading or sending large singleton pieces of data.
>
>
> Now it is little bit more time to think more about. But It is hard to design 
> some more simpler than is COPY syntax. What will support both directions.

Sorry for arriving late and adding to the bikeshedding. Maybe the
answer is to make COPY pluggable. It seems to me that it would be
relatively straightforward to add an extension mechanism for copy
output and input plugins that could support any format expressible as
a binary stream. Raw output would then be an almost trivial plugin.
Others could implement JSON, protocol buffers, Redis bulk load, BSON,
ASN.1 or whatever else serialisation format du jour. It will still
have the same backwards compatibility issues as adding the raw output,
but the payoff is greater.

Regards,
Ants Aasma

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-12 Thread Alexander Korotkov

On Tue, Apr 12, 2016 at 12:40 AM, Andres Freund  wrote:

> I did get access to the machine (thanks!). My testing shows that
> performance is sensitive to various parameters influencing memory
> allocation. E.g. twiddling with max_connections changes
> performance. With max_connections=400 and the previous patches applied I
> get ~122 tps, with 402 ~162 tps.  This sorta confirms that we're
> dealing with an alignment/sharing related issue.
>
> Padding PGXACT to a full cache-line seems to take care of the largest
> part of the performance irregularity. I looked at perf profiles and saw
> that most cache misses stem from there, and that the percentage (not
> absolute amount!) changes between fast/slow settings.
>
> To me it makes intuitive sense why you'd want PGXACTs to be on separate
> cachelines - they're constantly dirtied via SnapshotResetXmin(). Indeed
> making it immediately return propels performance up to 172, without
> other changes. Additionally cacheline-padding PGXACT speeds things up to
> 175 tps.
>

It seems like padding PGXACT to a full cache-line is a great improvement.
We have not so many PGXACTs to care about bytes wasted to padding.  But
could it have another negative side-effect?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Choosing parallel_degree

2016-04-12 Thread Simon Riggs

On 12 April 2016 at 07:58, Amit Kapila  wrote:

> On Tue, Apr 12, 2016 at 3:15 AM, Simon Riggs 
> wrote:
>
>> On 8 April 2016 at 17:49, Robert Haas  wrote:
>>
>>
>>> With the patch, you can - if you wish - substitute
>>> some other number for the one the planner comes up with.
>>
>>
>> I saw you're using AccessExclusiveLock, the reason being it affects
>> SELECTs.
>>
>> That is supposed to apply when things might change the answer from a
>> SELECT, whereas this affects only the default for a plan.
>>
>>
> By this theory, shouldn't any other parameter like n_distinct_inherited
> which just effects the plan required lower lock level?
>

It should, yes, and I'm as surprised to see it isn't as you are.

Thread: Fabrizio was asked by Robert to provide or document an analysis of
why each setting was OK to change; 9 days later he had not done so or
replied, so I committed a reduced version of the patch that matched
existing tests and code comments.

I guess we could have salvaged some more from it, but we didn't and there's
never enough time.

If RMT allows, that can be changed or it can wait.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-04-12 Thread Fabien COELHO



Hello Andres,


I can think of a number of relatively easy ways to address this:
1) Just zap (or issue?) all pending flush requests when getting an
  smgrinval/smgrclosenode
2) Do 1), but filter for the closed relnode
3) Actually handle the case of the last open segment not being
  RELSEG_SIZE properly in _mdfd_getseg() - mdnblocks() does so.

I'm kind of inclined to do both 3) and 1).


Alas I'm a little bit outside my area of expertise. Just for suggestion 
purpose, possibly totally wrong (please accept my apology if this is the 
case), the ideas I had while thinking about these issues, some may be 
quite close to your suggestions above:


 - keep track of file/segment closing/reopenings (say with a counter), so
   that if a flush is about a file/segment which has been closed or
   reopened, it is just skipped. I'm not sure this is enough, because one
   process may do the file cleaning and another may want to flush, although
   I guess there is some kind of locking/shared data structures to manage
   such interactions between pg processes.

 - because of "empty the bucket when filled behavior" of the current
   implementation, a buffer to be flused may be kept for a very
   long time in the bucket. I think that flushing advices become stale
   after a while so should not be issued (the buffer may have been
   written again, ...), or the bucket should be flushed after a while
   even of not full.

Also, a detail in "pg_flush_data", there are a serie of three if/endif, 
but it is not obvious to me whether they are mutually exclusive while 
looking at the source, so I was wondering whether the code could try 
several flushings approaches one after the other... This is probably not 
the case, but I would be more at ease with a if/elif/elif/endif structure 
there so that is clearer.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Code cleanup in the wake of recent LWLock refactoring.

2016-04-12 Thread Pavel Stehule

Hi

2016-02-13 15:54 GMT+01:00 Tom Lane :

> Simon Riggs  writes:
> > On 10 February 2016 at 16:36, Tom Lane  wrote:
> >> FWIW, I wasn't paying attention either, but I'm convinced by Robert's
> >> argument.  Avoiding coupling between extensions is worth an API break.
>
> > Old APIs - why can't we keep it?
>
> Because with the old API, a bug in extension A may go unnoticed in A's
> testing but break when it's combined with extension B.  That causes
> headaches all around, not just to the extension authors but to their
> users.  The new API ensures detection of didn't-request-enough-locks
> bugs regardless of which other extensions are installed.  That is worth
> the cost of a forced API update, in Robert's judgement and mine too.
>
> (Having said that, I wonder if we could put back the old API as a shim
> layer *without* the allocate-some-excess-locks proviso.  That would
> get us to a situation where standalone testing of a broken extension
> would disclose its bug, without breaking non-buggy extensions.)
>

If there will be simple way, how to fix it, then I'll fix my extensions.
But new API is working only when the extension has own share memory
segment. For some complex extensions like Orafce it means expensive
refactoring.

What is worst, this refactoring is difficult now, because I support older
versions when I have not private shared segments.

Regards

Pavel



> regards, tom lane
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Re: [HACKERS] Choosing parallel_degree

2016-04-12 Thread tushar


On 04/11/2016 09:14 PM, Robert Haas wrote:

postgres=# explain analyze verbose select * from abd  where n<=1;
>>ERROR:  requested shared memory size overflows size_t
>>
>>if we remove the analyze keyword then query running successfully.
>>
>>Expected = Is it not better to throw the error at the time of setting
>>max_parallel_degree, if not supported ?

>
>+1

It surprises me that that request overflowed size_t.  I guess we
should look into why that's happening.  Did you test this on a 32-bit
system?

No, I tested on 64 bit machine.

--
regards,tushar



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parser extensions (maybe for 10?)

2016-04-12 Thread Craig Ringer

On 12 April 2016 at 13:51, Tom Lane  wrote:

> Craig Ringer  writes:
> > The other area where there's room for extension without throwing out the
> > whole thing and rebuilding is handling of new top-level statements. We
> can
> > probably dispatch the statement text to a sub-parser provided by an
> > extension that registers interest in that statement name when we attempt
> to
> > parse it and fail. Even then I'm pretty sure it won't be possible to do
> so
> > while still allowing multi-statements. I wish we didn't support
> > multi-statements, but we're fairly stuck with them.
>
> Well, as I said, I've been there and done that.  Things get sticky
> when you notice that those "new top-level statements" would like to
> contain sub-clauses (e.g. arithmetic expressions) that should be defined
> by the core grammar.  And maybe the extension would also like to
> define additions to the expression grammar, requiring a recursive
> callback into the extension.  It gets very messy very fast.
>

Yuck. You'd ping-pong between two parsers, and have to try to exchange
sensible starting states. Point taken.

So even that seemingly not-that-bad restricted option turns out to be far
from it, which just goes to show what a pit of snakes parser extensibility
is...

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

[HACKERS] Update copyright in genericdesc.c

2016-04-12 Thread Amit Langote

Hi,

Attached fixes copyright in file mentioned in $subject.

Thanks,
Amit
diff --git a/src/backend/access/rmgrdesc/genericdesc.c b/src/backend/access/rmgrdesc/genericdesc.c
index caa9a03..0796bb8 100644
--- a/src/backend/access/rmgrdesc/genericdesc.c
+++ b/src/backend/access/rmgrdesc/genericdesc.c
@@ -4,7 +4,7 @@
  *	  rmgr descriptor routines for access/transam/generic_xlog.c
  *
  *
- * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  * src/backend/access/rmgrdesc/genericdesc.c

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Choosing parallel_degree

2016-04-12 Thread Amit Kapila

On Tue, Apr 12, 2016 at 3:15 AM, Simon Riggs  wrote:

> On 8 April 2016 at 17:49, Robert Haas  wrote:
>
>
>> With the patch, you can - if you wish - substitute
>> some other number for the one the planner comes up with.
>
>
> I saw you're using AccessExclusiveLock, the reason being it affects
> SELECTs.
>
> That is supposed to apply when things might change the answer from a
> SELECT, whereas this affects only the default for a plan.
>
>
By this theory, shouldn't any other parameter like n_distinct_inherited
which just effects the plan required lower lock level?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-12 Thread Noah Misch

On Thu, Apr 07, 2016 at 05:53:54PM -0700, Jeff Janes wrote:
> On Thu, Apr 7, 2016 at 4:33 PM, Tom Lane  wrote:
> > Jeff Janes  writes:
> >> To summarize the behavior change:
> >
> >> In the released code, an inserting backend that violates the pending
> >> list limit will try to clean the list, even if it is already being
> >> cleaned.  It won't accomplish anything useful, but will go through the
> >> motions until eventually it runs into a page the lead cleaner has
> >> deleted, at which point it realizes there is another cleaner and it
> >> stops.  This acts as a natural throttle to how fast insertions can
> >> take place into an over-sized pending list.
> >
> > Right.
> >
> >> The proposed change removes that throttle, so that inserters will
> >> immediately see there is already a cleaner and just go back about
> >> their business.  Due to that, unthrottled backends could add to the
> >> pending list faster than the cleaner can clean it, leading to
> >> unbounded growth in the pending list and could cause a user backend to
> >> becoming apparently unresponsive to the user, indefinitely.  That is
> >> scary to backpatch.
> >
> > It's scary to put into HEAD, either.  What if we simply don't take
> > that specific behavioral change?  It doesn't seem like this is an
> > essential part of fixing the bug as you described it.  (Though I've
> > not read the patch, so maybe I'm just missing the connection.)
> 
> There are only 3 fundamental options I see, the cleaner can wait,
> "help", or move on.
> 
> "Helping" is what it does now and is dangerous.
> 
> Moving on gives the above-discussed unthrottling problem.
> 
> Waiting has two problems.  The act of waiting will cause autovacuums
> to be canceled, unless ugly hacks are deployed to prevent that.   If
> we deploy those ugly hacks, then we have the problem that a user
> backend will end up waiting on an autovacuum to finish the cleaning,
> and the autovacuum is taking its sweet time due to
> autovacuum_vacuum_cost_delay.

Teodor, this thread has been quiet for four days, and the deadline to fix this
open item expired 23 hours ago.  Do you have a new plan for fixing it?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

1 2 >

1 - 100 of 104 matches

Mail list logo