date:20140415

Craig Ringer cr...@2ndquadrant.com writes:
 On 04/15/2014 03:39 AM, Tom Lane wrote:
 I still wish we could get rid of this problem by fixing the Windows build
 recipes so that the PGDLLEXPORT marking wasn't needed.  We proved to
 ourselves recently that getting rid of PGDLLIMPORT on global variables
 wouldn't work, but I'm not sure that the function end of it was really
 investigated.

 My understanding is that we *can* drop PGDLLEXPORT on functions without
 actively breaking anything. But we probably shouldn't.

 If we omit PGDLLEXPORT, the linker of the DLL/executable that imports
 the extern function will generate a thunk from the .LIB file for the
 target DLL during linkage; this thunk within the DLL/EXE with the
 undefined extern then jumps to the real address within the defining DLL/EXE.

TBH, if the only argument for this is a small efficiency difference,
then to my mind it barely requires discussion.  I don't give one hoot
about micro-optimization for the Windows platform; I'm satisfied if
it works at all there.  And I seriously doubt that a couple more cycles to
call any function implemented in a loadable module would matter anyway.

 I actually think we should *add* a LIBPQEXPORT that handles this for
 libpq, much like PGDLLEXPORT does for postgres(.exe). And in the
 process, rename PGDLLEXPORT to POSTGRESEXPORT or PGSERVEREXPORT or
 something.

My reaction to that is not bloody likely.  I remarked on this upthread
already, but there is absolutely no way that I want to clutter our source
code with platform-specific markings like that.

Perhaps somebody could try a Windows build with PGDLLEXPORT defined to
empty, and verify that it works, and if so do a pgbench comparison
against a build done the existing way?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 [ assorted comments about custom-scan patch, but particularly ]

 * The prune hook makes me feel very uneasy. It seems weirdly specific
 implementation detail, made stranger by the otherwise lack of data
 maintenance API calls. Calling that for every dirty page sounds like
 an issue and my patch rejection indicator is flashing red around that.

 Yeah.  After a fast review of the custom-scan and cache-scan patches, it
 seems to me that my original fears are largely confirmed: the custom scan
 patch is not going to be sufficient to allow development of any truly new
 plan type.  Yeah, you can plug in some new execution node types, but
 actually doing anything interesting is going to require patching other
 parts of the system.  Are we going to say to all comers, sure, we'll put
 a hook call anywhere you like, just ask?  I can't see this as being the
 way to go.

Without prejudice to the rest of what you said, this argument doesn't
hold much water with me.  I mean, anything that our extensibility
mechanism doesn't support today will require new hooks, but does that
mean we're never going to add any more hooks?  I sure hope not.  When
hooks are proposed here, we evaluate on them on their merits and
attempt to judge the likelihood that a hook in a particular place will
be useful, but generally we're not averse to adding them, and as long
as the paths aren't too performance-critical, I don't think we should
be averse to adding them.

We have a great system today for letting people add new data types and
things of that sort, but anything that penetrates more deeply into the
heart of the system pretty much can't be done; this is why various
companies, such as our respective employers, have developed and
maintained forks of the PostgreSQL code base instead of just hooking
in to the existing code.  We probably can't solve that problem
completely, but that doesn't mean we should throw in the towel.

And in particular, I think it's pretty normal that a new facility like
custom scans might create additional demand for new hooks.  If
something was completely impossible before, and the new facility makes
it almost-possible, then why shouldn't someone ask for a hook there?
A prune hook probably has no business in the custom scan patch proper,
but whether it's a good idea or a bad one should be decided on the
merits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

Robert Haas robertmh...@gmail.com writes:
 On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yeah.  After a fast review of the custom-scan and cache-scan patches, it
 seems to me that my original fears are largely confirmed: the custom scan
 patch is not going to be sufficient to allow development of any truly new
 plan type.  Yeah, you can plug in some new execution node types, but
 actually doing anything interesting is going to require patching other
 parts of the system.

 Without prejudice to the rest of what you said, this argument doesn't
 hold much water with me.  I mean, anything that our extensibility
 mechanism doesn't support today will require new hooks, but does that
 mean we're never going to add any more hooks?  I sure hope not.

No, that's not what I said.  ISTM that the argument for the custom-scan
API is that it allows interesting new things to be done *without further
modifying the core code*.  But the example application (cache_scan) fails
to demonstrate that, and indeed seems to be a counterexample.  Whether
we'd accept cache_scan on its own merits is a separate question.  The
problem for me is that custom-scan isn't showing that it can support what
was claimed without doing serious damage to modularity and maintainability
of the core code.

What this may mean is that we need more attention to refactoring of the
core code.  But just removing static from any function that looks like
it might be handy isn't my idea of well-considered refactoring.  More the
opposite in fact: if those things turn into APIs that we have to support,
it's going to kill any ability to do such refactoring.

A concrete example here is setrefs.c, whose responsibilities tend to
change from release to release.  I think if we committed custom-scan
as is, we'd have great difficulty changing setrefs.c's transformations
ever again, at least if we hoped to not break users of the custom-scan
API.  I'm not sure what the solution is --- but turning setrefs into
a white box instead of a black box isn't it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-04-15 Thread Stephen Frost

* Tom Lane (t...@sss.pgh.pa.us) wrote:
 A concrete example here is setrefs.c, whose responsibilities tend to
 change from release to release.  I think if we committed custom-scan
 as is, we'd have great difficulty changing setrefs.c's transformations
 ever again, at least if we hoped to not break users of the custom-scan
 API.  I'm not sure what the solution is --- but turning setrefs into
 a white box instead of a black box isn't it.

Yeah, this was my (general) complaint as well and the answer that I kept
getting back is well, it's ok, you can still break it between major
releases and the custom scan users will just have to deal with it.

I'm a bit on the fence about that, itself, but the other half of that
coin is that we could end up with parts of the *core* code that think
it's ok to go pulling in these functions, once they're exposed, and that
could end up making things quite ugly and difficult to maintain going
forward.

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yeah.  After a fast review of the custom-scan and cache-scan patches, it
 seems to me that my original fears are largely confirmed: the custom scan
 patch is not going to be sufficient to allow development of any truly new
 plan type.  Yeah, you can plug in some new execution node types, but
 actually doing anything interesting is going to require patching other
 parts of the system.

 Without prejudice to the rest of what you said, this argument doesn't
 hold much water with me.  I mean, anything that our extensibility
 mechanism doesn't support today will require new hooks, but does that
 mean we're never going to add any more hooks?  I sure hope not.

 No, that's not what I said.  ISTM that the argument for the custom-scan
 API is that it allows interesting new things to be done *without further
 modifying the core code*.  But the example application (cache_scan) fails
 to demonstrate that, and indeed seems to be a counterexample.  Whether
 we'd accept cache_scan on its own merits is a separate question.  The
 problem for me is that custom-scan isn't showing that it can support what
 was claimed without doing serious damage to modularity and maintainability
 of the core code.

I think there's two separate things in there, one of which I agree
with and one of which I disagree with.  I agree that we must avoid
damaging the modularity and maintainability of the core code; I don't
agree that custom-scan needs to be able to do interesting things with
zero additional changes to the core code.  If we come up with three
interesting applications for custom scan that require 5 new hooks
between them, I'll consider that a major success - assuming those
hooks don't unduly limit future changes we may wish to make in the
core code.  I think your concern about exposing APIs that may not be
terribly stable is well-founded, but I don't think that means we
shouldn't expose *anything*.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-04-15 Thread Andres Freund

Hi,

On 2014-04-15 11:07:11 -0400, Robert Haas wrote:
 On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
 [ discussion ]

What I think this discussion shows that this patch isn't ready for
9.4. The first iteration of the patch came in 2013-11-06. Imo that's
pretty damn late for a relatively complex patch. And obviously we don't
have agreement on the course forward.
I don't think we need to stop discussing, but I think it's pretty clear
that this isn't 9.4 material. And that it's far from Ready for Committer.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

Andres Freund and...@2ndquadrant.com writes:
 What I think this discussion shows that this patch isn't ready for
 9.4. The first iteration of the patch came in 2013-11-06. Imo that's
 pretty damn late for a relatively complex patch. And obviously we don't
 have agreement on the course forward.
 I don't think we need to stop discussing, but I think it's pretty clear
 that this isn't 9.4 material. And that it's far from Ready for Committer.

Yeah.  I'm still not exactly convinced that custom-scan will ever allow
independent development of new plan types (which, with all due respect to
Robert, is what it was being sold as last year in Ottawa).  But I'm not
opposed in principle to committing it, if we can find a way to have a
cleaner API for things like setrefs.c.  It seems like late-stage planner
processing in general is an issue for this patch (createplan.c and
subselect.c are also looking messy).  EXPLAIN isn't too great either.

I'm not sure exactly what to do about those cases, but I wonder
whether things would get better if we had the equivalent of
expression_tree_walker/mutator capability for plan nodes.  The state
of affairs in setrefs and subselect, at least, is a bit reminiscent
of the bad old days when we had lots of different bespoke code for 
traversing expression trees.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-15 Thread Merlin Moncure

On Mon, Apr 14, 2014 at 7:45 PM, Peter Geoghegan p...@heroku.com wrote:
 On Mon, Apr 14, 2014 at 5:30 PM, Bruce Momjian br...@momjian.us wrote:
 I am glad you are looking at this.  You are right that it requires a
 huge amount of testing, but clearly our code needs improvement in this
 area.

 Thanks.

 Does anyone recall the original justification for the recommendation
 that shared_buffers never exceed 8GiB? I'd like to revisit the test
 case, if such a thing exists.

There are many reports of improvement from lowering shared_buffers.
The problem is that it tends to show up on complex production
workloads and that there is no clear evidence pointing to problems
with the clock sweep; it could be higher up in the partition locks or
something else entirely (like the O/S).  pgbench is also not the
greatest tool for sniffing out these cases: it's too random and for
large database optimization is generally an exercise in de-randomizing
i/o patterns.  We really, really need a broader testing suite that
covers more usage patterns.

I was suspicious for a while that spinlock contention inside the
clocksweep was causing stalls and posted a couple of different patches
to try and reduce the chance of that.  I basically gave up when I
couldn't demonstrate that case in simulated testing.

I still think there is no good reason for the clock to pedantically
adjust usage count on contented buffers...better to throw a single
TTAS and bail to the next buffer if either 'T' signals a lock.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] psql \d+ and oid display

2014-04-15 Thread Bruce Momjian

On Thu, Apr 10, 2014 at 08:05:11PM -0400, Bruce Momjian wrote:
 On Thu, Apr 10, 2014 at 07:58:55PM -0400, Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   It also has changed the OID status to only display if it exists.  One
   question that came up with Robert is whether OID status should appear
   for \d as well, now that is only shows up when present.
  
  Yeah, I was wondering about that too.  If part of the argument here is
  to make these two displays act more alike, it seems inconsistent that
  one is emitted by \d while the other only comes out with \d+.
  
  Of course, there are two ways to fix that: maybe the replica info
  also only belongs in \d+?
 
 OK, I changed my patch to only show replica info for \d+.  If we decide
 to change them to both display for \d, I will update it again.

OK, hearing only quiet, I have applied the patch.  I like that we now
document the replication identity default, which should go a long way to
making this clearer.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C

On Mon, Apr 14, 2014 at 2:16 AM, Christian Ullrich ch...@chrullrich.net wrote:
 I meant creating a new one, yes. If, say, PGSQL_BACKGROUND_JOB was set,
 the postmaster etc. would ignore the events.

Why not just pass a command-line switch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-15 Thread Peter Geoghegan

On Tue, Apr 15, 2014 at 9:30 AM, Merlin Moncure mmonc...@gmail.com wrote:
 There are many reports of improvement from lowering shared_buffers.
 The problem is that it tends to show up on complex production
 workloads and that there is no clear evidence pointing to problems
 with the clock sweep; it could be higher up in the partition locks or
 something else entirely (like the O/S).  pgbench is also not the
 greatest tool for sniffing out these cases: it's too random and for
 large database optimization is generally an exercise in de-randomizing
 i/o patterns.  We really, really need a broader testing suite that
 covers more usage patterns.

I find it quite dissatisfying that we know so little about this.

I'm finding that my patch helps much less when shared_buffers is sized
large enough to fit the index entirely (although there are still some
localized stalls on master, where there are none with patched).
shared_buffers is still far too small to fit the entire heap. With
shared_buffers=24GB (which still leaves just under 8GB of memory for
the OS to use as cache, since this system has 32GB of main memory),
the numbers are much less impressive relative to master with the same
configuration. Both sets of numbers are still better than what you've
already seen with shared_buffers=8GB, since of course the no more
than 8GB recommendation is not an absolute, and as you say its
efficacy seemingly cannot be demonstrated with pgbench.

My guess is that the patch doesn't help because once there is more
than enough room to cache the entire index (slightly over twice as
many buffers as would be required to do so), even on master it becomes
virtually impossible to evict those relatively popular index pages,
since they still have an early advantage. It doesn't matter that
master's clock sweep has what I've called an excessively short-term
perspective, because there is always enough pressure relative to the
number of leaf pages being pinned to prefer to evict heap pages. There
is still a lot of buffers that can fit some moderate proportion of all
heap pages even after buffering the entire index (something like
~13GB).

You might say that with this new shared_buffers setting, clock sweep
doesn't need to have a good memory, because it can immediately
observe the usefulness of B-Tree leaf pages.

There is no need to limit myself to speculation here, of course. I'll
check it out using pg_buffercache.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C

2014-04-15 Thread Christian Ullrich

* From: Robert Haas

 On Mon, Apr 14, 2014 at 2:16 AM, Christian Ullrich
 ch...@chrullrich.net wrote:

  I meant creating a new one, yes. If, say, PGSQL_BACKGROUND_JOB was
  set, the postmaster etc. would ignore the events.
 
 Why not just pass a command-line switch?

Because, as I wrote in the message you are quoting, I did not think that
having a command-line option for the sole purpose of telling the 
postmaster who its parent is was a suitable solution. 

I had already given up on that idea based on Amit's advice, and I will 
create a patch based on a command-line option.

While I have you here, though, any suggestions on what the name of that
option should be? I think --background is about right. Also, how should
I treat the option on non-Windows platforms? Should it just not be there
(= error), or be ignored if present?

-- 
Christian



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C

2014-04-15 Thread Christian Ullrich

* From: Bruce Momjian

 On Mon, Apr 14, 2014 at 09:34:14AM +0530, Amit Kapila wrote:
  The problem can be solved this way, but the only question here is
  whether it is acceptable for users to have a new console window for
  server.
 
  Can others also please share their opinion if this fix (start server
  in new console) seems acceptable or shall we try by passing some
  information from pg_ctl and then ignore CTRL+C  CTRL+BREAK for it?
 
 I wanted to point out that I think this patch is only appropriate for
 head, not backpatching.  Also, on Unix we have to handle signals that

Yes, of course.

 come from the kill command.  Can you send CTRL+C from other applications
 on Windows?

Yes again, using GenerateConsoleCtrlEvent() you can send these events to
any (console-attached) process you have the required permissions for, 
but that is not an issue for the same reason it isn't one on Unix. All
the target process sees is the event, it cannot determine (nor does it 
care) where the event came from.

-- 
Christian



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C

2014-04-15 Thread Christian Ullrich

* From: Amit Kapila

 On Mon, Apr 14, 2014 at 11:46 AM, Christian Ullrich
 ch...@chrullrich.net wrote:


  * From: Amit Kapila
  Do you mean to say use some existing environment variable?
  Introducing an environment variable to solve this issue or infact
  using some existing environ variable doesn't seem to be the best way
  to pass such information.
 
  I meant creating a new one, yes. If, say, PGSQL_BACKGROUND_JOB was
  set, the postmaster etc. would ignore the events.
 
 Do you plan to reset it and if yes when?
 I think there is chance that after setting this environment variable,
 some other instance of server (postmaster) can read it and missed the
 signal which it should have otherwise processed.

We have decided not to go this way, but just for completeness:

Environment inheritance works the same way on Windows as on Unix. When
a process is started with a modified environment (one of the plentiful
arguments of CreateProcess() et al.), only that process and its 
descendants see the modification. I had not planned to set a system-level
or user-level variable.

-- 
Christian

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Something flaky in the relfilenode mapping infrastructure

2014-04-15 Thread Andres Freund

On 2014-03-28 21:36:11 +0100, Andres Freund wrote:
 Hi,
 
 On 2014-03-27 08:02:35 -0400, Tom Lane wrote:
  Buildfarm member prairiedog thinks there's something unreliable about
  commit f01d1ae3a104019d6d68aeff85c4816a275130b3:
 
  *** 
  /Users/buildfarm/bf-data/HEAD/pgsql.13462/src/test/regress/expected/alter_table.out
   Thu Mar 27 04:12:40 2014
  --- 
  /Users/buildfarm/bf-data/HEAD/pgsql.13462/src/test/regress/results/alter_table.out
Thu Mar 27 04:52:02 2014
  ***
  *** 2333,2339 
) mapped;
 incorrectly_mapped | have_mappings 
+---
  !   0 | t
(1 row)
 
 That's rather odd. It has survived for a couple of months on the other
 buildfarm animals now... Could one of you apply the attached patch
 adding more details to eventual failures?

So I had made a notice to recheck on
this. 
http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prairiedogbr=HEAD
indicates there haven't been any further failures... So, for now I
assume this was caused by some problem fixed elsewhere.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Something flaky in the relfilenode mapping infrastructure

Andres Freund and...@2ndquadrant.com writes:
 On 2014-03-27 08:02:35 -0400, Tom Lane wrote:
 Buildfarm member prairiedog thinks there's something unreliable about
 commit f01d1ae3a104019d6d68aeff85c4816a275130b3:

 So I had made a notice to recheck on
 this. 
 http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prairiedogbr=HEAD
 indicates there haven't been any further failures... So, for now I
 assume this was caused by some problem fixed elsewhere.

Hard to say.  In any case, I agree we can't make any progress unless we
see it again.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] test script, was Re: [COMMITTERS] pgsql: psql: conditionally display oids and replication identity

2014-04-15 Thread David Fetter

On Tue, Apr 15, 2014 at 02:46:34PM -0400, Bruce Momjian wrote:
 On Tue, Apr 15, 2014 at 02:32:53PM -0400, Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   psql: conditionally display oids and replication identity
  
  Buildfarm isn't terribly pleased with this --- looks like you missed
  contrib/test_decoding/
 
 Fixed.  I added a personal script option that allows me to test contrib,
 but forgot to run it.

Is that script of general utility for committers?  If so, it might be
good to include it in the distribution.  I'd be happy to go through
and perl-ify it, document it, etc.  Or maybe it could be a new make
target...

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] test script, was Re: [COMMITTERS] pgsql: psql: conditionally display oids and replication identity

2014-04-15 Thread Andres Freund

On 2014-04-15 12:32:36 -0700, David Fetter wrote:
 On Tue, Apr 15, 2014 at 02:46:34PM -0400, Bruce Momjian wrote:
  On Tue, Apr 15, 2014 at 02:32:53PM -0400, Tom Lane wrote:
   Bruce Momjian br...@momjian.us writes:
psql: conditionally display oids and replication identity
   
   Buildfarm isn't terribly pleased with this --- looks like you missed
   contrib/test_decoding/
  
  Fixed.  I added a personal script option that allows me to test contrib,
  but forgot to run it.
 
 Is that script of general utility for committers?  If so, it might be
 good to include it in the distribution.  I'd be happy to go through
 and perl-ify it, document it, etc.  Or maybe it could be a new make
 target...

make check-world

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] test script, was Re: [COMMITTERS] pgsql: psql: conditionally display oids and replication identity

David Fetter da...@fetter.org writes:
 On Tue, Apr 15, 2014 at 02:46:34PM -0400, Bruce Momjian wrote:
 Fixed.  I added a personal script option that allows me to test contrib,
 but forgot to run it.

 Is that script of general utility for committers?  If so, it might be
 good to include it in the distribution.  I'd be happy to go through
 and perl-ify it, document it, etc.  Or maybe it could be a new make
 target...

I'm pretty sure make check-world would've covered this already.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Excessive WAL generation and related performance issue

2014-04-15 Thread Joe Conway

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 04/14/2014 04:34 PM, Joe Conway wrote:
 On 04/14/2014 04:25 PM, Andres Freund wrote:
 On 2014-04-14 16:22:48 -0700, Joe Conway wrote:
 That'll help performance, but lets say I generally keep WAL
 files for PITR and don't turn that off before starting --
 shouldn't I be very surprised to need over 3TB of archive
 storage when loading a 50GB table with a couple of indexes?
 
 The point is that more frequent checkpoints will increase the
 WAL volume *significantly* because more full page writes will
 have to be generated.
 
 OK, I'll see how much it can be brought down through checkpoint
 tuning and report back.

One more question before I get to that. I had applied the following
patch to XLogInsert

8--
diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index 2f71590..e39cd37 100644
- --- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -737,10 +737,12 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData
*rdata)
uint32  len,
write_len;
unsignedi;
+   unsignedinorm;
boolupdrqst;
booldoPageWrites;
boolisLogSwitch = (rmid == RM_XLOG_ID  info ==
XLOG_SWITCH);
uint8   info_orig = info;
+   uint32  xl_tot_len;

/* cross-check on whether we should be here or not */
if (!XLogInsertAllowed())
@@ -924,8 +926,23 @@ begin:;
 * header.
 */
INIT_CRC32(rdata_crc);
+   i = 0;
+   inorm = 0;
for (rdt = rdata; rdt != NULL; rdt = rdt-next)
+   {
COMP_CRC32(rdata_crc, rdt-data, rdt-len);
+
+   if (rdt_lastnormal == rdt)
+   {
+   inorm = i;
+   i = 0;
+   }
+   else
+   i++;
+   }
+   xl_tot_len = SizeOfXLogRecord + write_len;
+   if ((inorm + i)  4 || xl_tot_len  2000)
+   elog(LOG,
XLogInsert;tot_nml_blks;%d;tot_bkp_blks;%d;tot_Xlog_Len;%d, inorm,
i, xl_tot_len);

START_CRIT_SECTION();
8--

The idea was to record number of normal and backup blocks, and total
size of the record. I have quite a few entries in the log from the
test run which are like:

8--
2014-04-11 08:42:06.904 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;5;tot_Xlog_Len;16168
2014-04-11 09:03:12.790 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;5;tot_Xlog_Len;16172
2014-04-11 10:16:57.949 PDT;LOG:
XLogInsert;tot_nml_blks;3;tot_bkp_blks;5;tot_Xlog_Len;16150
8--

and

8--
2014-04-11 11:17:08.313 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;12332
2014-04-11 11:17:08.338 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;16020
2014-04-11 11:17:08.389 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;12332
8--


In other words, based on my inserted logic, it appears that there are
5 and 6 backup blocks on a fairly regular basis.

However in xlog.h it says:
8--
 * If we backed up any disk blocks with the XLOG record, we use flag
 * bits in xl_info to signal it.  We support backup of up to 4 disk
 * blocks per XLOG record.
8--

So is my logic to record number of backup blocks wrong, or is the
comment wrong, or am I otherwise misunderstanding something?

Thanks,

Joe

- -- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting,  24x7 Support
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJTTZxJAAoJEDfy90M199hl15MQAKTcv9BoZTsXDleSu9JrU1ha
UhHUnDALRmxWLgyPYsgtifxMQ3jLp5eLrkMHGnQbVD17619OgHckuOiEphc2bdQp
MfZlv3jrEqxnmsh6qKhK1J23mHj0cohWXQ9EUoyjE6tlZueLPyMigaIV662KP1d2
pUXCh6IEJYMMaPfqhR5Mxi62s+HMkpAULhafWeEeAwcU1eYNijFWlyxJWlsv7D6X
9ZuDSmRtqnAP0g23GcbxNkL/I9Yv090Uxar7um2Rw5SEUV+Uv1kMY0GVCjHluE0k
qZhSF1tE2jypThhSnv5xRHT3ZzdKoJtNmfLekjws7+dFZbSBLgNOj4EdV0H/wUgf
NqO71kkeRhd44uMRzii0cr03LwBiwqC2apCYoZy7s0X3rl10hZfKgVEKkyhaZ4VJ
QdfR1WdY/hC7mKW7NPnkycF+Es1ykEfuPnKHHsyJ3fHeFGxkKD3I6A8jGnNnS6VL
ba+jx+t3qnrcKQAW8lqQ3rAij5Jkb97Ljibc7o6w8cgnGA4S0tqsE6jDrdDR1FO4
ns5uULTs4REU8clFwiKNZnQfINRUUfqY1mtlRneJMANeafm0j2CyIzvqLqB2mdOH
YL9SS2lIngQlVSfgpu7EiSS7sJx8XGe3a3YFE9DoTBpq009scrscH40+kuN823wp
yruufkzaBN6lyAjo3zoR
=GQDN
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Excessive WAL generation and related performance issue

Joe Conway m...@joeconway.com writes:
 In other words, based on my inserted logic, it appears that there are
 5 and 6 backup blocks on a fairly regular basis.

 However in xlog.h it says:
 8--
  * If we backed up any disk blocks with the XLOG record, we use flag
  * bits in xl_info to signal it.  We support backup of up to 4 disk
  * blocks per XLOG record.
 8--

 So is my logic to record number of backup blocks wrong, or is the
 comment wrong, or am I otherwise misunderstanding something?

The comment is correct, so you did something wrong.  From memory,
there's a goto-label retry loop in that function; maybe you need
to zero your counters after the retry label?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Excessive WAL generation and related performance issue

2014-04-15 Thread Heikki Linnakangas


On 04/15/2014 11:53 PM, Joe Conway wrote:

One more question before I get to that. I had applied the following
patch to XLogInsert

8--
diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index 2f71590..e39cd37 100644
- --- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -737,10 +737,12 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData
*rdata)
 uint32  len,
 write_len;
 unsignedi;
+   unsignedinorm;
 boolupdrqst;
 booldoPageWrites;
 boolisLogSwitch = (rmid == RM_XLOG_ID  info ==
XLOG_SWITCH);
 uint8   info_orig = info;
+   uint32  xl_tot_len;

 /* cross-check on whether we should be here or not */
 if (!XLogInsertAllowed())
@@ -924,8 +926,23 @@ begin:;
  * header.
  */
 INIT_CRC32(rdata_crc);
+   i = 0;
+   inorm = 0;
 for (rdt = rdata; rdt != NULL; rdt = rdt-next)
+   {
 COMP_CRC32(rdata_crc, rdt-data, rdt-len);
+
+   if (rdt_lastnormal == rdt)
+   {
+   inorm = i;
+   i = 0;
+   }
+   else
+   i++;
+   }
+   xl_tot_len = SizeOfXLogRecord + write_len;
+   if ((inorm + i)  4 || xl_tot_len  2000)
+   elog(LOG,
XLogInsert;tot_nml_blks;%d;tot_bkp_blks;%d;tot_Xlog_Len;%d, inorm,
i, xl_tot_len);

 START_CRIT_SECTION();
8--

The idea was to record number of normal and backup blocks, and total
size of the record. I have quite a few entries in the log from the
test run which are like:

8--
2014-04-11 08:42:06.904 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;5;tot_Xlog_Len;16168
2014-04-11 09:03:12.790 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;5;tot_Xlog_Len;16172
2014-04-11 10:16:57.949 PDT;LOG:
XLogInsert;tot_nml_blks;3;tot_bkp_blks;5;tot_Xlog_Len;16150
8--

and

8--
2014-04-11 11:17:08.313 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;12332
2014-04-11 11:17:08.338 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;16020
2014-04-11 11:17:08.389 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;12332
8--


In other words, based on my inserted logic, it appears that there are
5 and 6 backup blocks on a fairly regular basis.

However in xlog.h it says:
8--
  * If we backed up any disk blocks with the XLOG record, we use flag
  * bits in xl_info to signal it.  We support backup of up to 4 disk
  * blocks per XLOG record.
8--

So is my logic to record number of backup blocks wrong, or is the
comment wrong, or am I otherwise misunderstanding something?


You're counting XLogRecData structs, not backup blocks. Each backup 
block typically consists of three XLogRecData structs, one to record a 
BkpBlock struct, one to record the data before the unused hole in the 
middle of the page, and one for after. Sometimes just two, if there is 
no hole to skip. See the loop just before the CRC calculation, that's 
where the XLogRecData entries for backup blocks are created.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Excessive WAL generation and related performance issue

2014-04-15 Thread Joe Conway

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 04/15/2014 02:15 PM, Heikki Linnakangas wrote:
 You're counting XLogRecData structs, not backup blocks. Each
 backup block typically consists of three XLogRecData structs, one
 to record a BkpBlock struct, one to record the data before the
 unused hole in the middle of the page, and one for after.
 Sometimes just two, if there is no hole to skip. See the loop just
 before the CRC calculation, that's where the XLogRecData entries
 for backup blocks are created.
 
 - Heikki

Ah, thanks for the explanation!

Joe

- -- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting,  24x7 Support
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJTTaLxAAoJEDfy90M199hlwuQP/3Tuea1TUe+4L21ZdProNIIF
fUtejFNUwqhkyWNcnePlubgSyTEIfHGEG9hatrB5/MWdzpiyEvXdDkvV1ODakEhJ
CVKZbnQ4dmnrevypy2f2YdhlbB9du/DDFhFcPOZGbn+vLywwM9oMPS8tQmsol37e
aITe2GnD5LpEcmCSqzz04OL+xAxKLe8fXaI9dDsTRWXb9qdj4pDHI706CeixwSFb
sGsGcIHXmnWieMby9qfWc0WGpc38iMRRkE+LeaEULhsycFP/2x09irXdhbl5T1SH
4PItwX0/ZgLskklG2gaD4HpNe75+Emj1i22PHDYhXSoAzpykHUf+kZZwMUr0AbaF
5QVCer071jHaMacpaVC7/qwUt8zISx4/1wtJuQzfk5H3P2q4L+b/xPmod5/cqs9z
/wFp+9kjMT4349sSMe1eDPTDoIZKgRh8Eiag5IfJtrOAjoK+FN+k8uWNikiyFDMu
z/3l+6mbfrl7FAmfeXLFC9fqhhGOiGLHoZufB/4qFgEikj4S94Hx9Q0nHqkMsFvM
Fcd3qcpLI06Xku7LmBPRvdZ8OVFGWAirH1jBlrdsvC9E5VoZxgByxg90EaTlwjAQ
1ZaGOsbQoXdOPOwe/rGx2ONGwgZp8uFwHSXzUY+CJucvfYQh2AD67AlEhS7Jb9NC
ummpulzJ6arce0815KaT
=Y+m4
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Need Multixact Freezing Docs

2014-04-15 Thread Josh Berkus

Hackers,

We need documentation on how users should intelligently set the
multixact freeze settings.  I'm happy to write the actual text, but I
definitely don't have any idea how to set these myself.  Under what
circumstances should they be different from freeze_max_age?  How?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

On Tue, Apr 15, 2014 at 12:33 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Apr 12, 2014 at 1:32 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 I have checked that other place in code also check handle to
 decide if API has failed.  Refer function PGSharedMemoryIsInUse().
 So I think fix to call GetLastError() after checking handle is okay.
 Attached patch fixes this issue.  After patch, the server shows below
 log which is exactly what is expected from test_shm_mq

 In PostgreSQL code, hmap == NULL, rather than !hmap, is the preferred
 way to test for a NULL pointer.  I notice that the !hmap style is used
 throughout this code, so I guess cleaning that up is a matter for a
 separate commit.

 I think in that case we might want to cleanup some other similar usage
 (PGSharedMemoryCreate) of !hmap.

Ah.  Well, in that case maybe we should just leave it alone, since
it's been like that forever and nobody's cared until now.

 For the create case, I'm wondering if we should put the block that
 tests for !hmap *before* the _dosmaperr() and check for EEXIST.  What
 is your opinion?

 Either way is okay, but I think the way you are suggesting is better as it
 will make code consistent with other place (PGSharedMemoryCreate()).

OK, can you prepare a patch?


-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] How can we make beta testing better?

2014-04-15 Thread Josh Berkus

Hackers,

I think 9.3 has given us evidence that our users aren't giving new
versions of PostgreSQL substantial beta testing, or if they are, they
aren't sharing the results with us.

How can we make beta testing better and more effective?  How can we get
more users to actually throw serious workloads at new versions and share
the results?

I've tried a couple of things over the last two years and they haven't
worked all that well.  Since we're about to go into another beta testing
period, we need something new.  Ideas?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Question about optimising (Postgres_)FDW

2014-04-15 Thread Hannu Krosing

Hi

I am playing around with postgres_fdw and found that the following code ...

--
CREATE EXTENSION postgres_fdw;

CREATE SERVER loop foreign data wrapper postgres_fdw
  OPTIONS (port '5432', dbname 'testdb');
 
CREATE USER MAPPING FOR PUBLIC SERVER loop;

create table onemillion (
id serial primary key,
inserted timestamp default clock_timestamp(),
data text
);

insert into onemillion(data) select random() from
generate_series(1,100);

CREATE FOREIGN TABLE onemillion_pgfdw (
id int,
inserted timestamp,
data text
) SERVER loop
OPTIONS (table_name 'onemillion',
 use_remote_estimate 'true');

testdb=# explain analyse
select * from onemillion_pgfdw where id in (select id from onemillion
where data  '0.9' limit 100);
   QUERY
PLAN   
-
 Nested Loop  (cost=122.49..10871.06 rows=50 width=44) (actual
time=4.269..93.444 rows=100 loops=1)
   -  HashAggregate  (cost=22.06..23.06 rows=100 width=4) (actual
time=1.110..1.263 rows=100 loops=1)
 -  Limit  (cost=0.00..20.81 rows=100 width=4) (actual
time=0.038..1.026 rows=100 loops=1)
   -  Seq Scan on onemillion  (cost=0.00..20834.00
rows=100115 width=4) (actual time=0.036..0.984 rows=100 loops=1)
 Filter: (data  '0.9'::text)
 Rows Removed by Filter: 805
   -  Foreign Scan on onemillion_pgfdw  (cost=100.43..108.47 rows=1
width=29) (actual time=0.772..0.773 rows=1 loops=100)
 Total runtime: 93.820 ms
(8 rows)

Time: 97.283 ms
--

... actually performs 100 distinct SELECT * FROM onemillion WHERE id =
$1 calls on remote side.

Is there a way to force it to prefer a plan where the results of (select
id from onemillion where data  '0.9' limit 100)
are passed to FDW as a single IN ( = ANY(...)) query and are retrieved
all at once ?

If not, how hord would it be to add this feature ?

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Get more from indices.

Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
 [ pathkey_and_uniqueindx_v10_20130411.patch ]

I thought some more about this patch, and realized that it's more or less
morally equivalent to allowing references to ungrouped variables when the
query has a GROUP BY clause listing all the columns of the primary key.
In that case the parser is effectively pretending that the GROUP BY list
contains additional implicit entries that are functionally dependent on
the entries that are actually there.  In this patch, what we want to do
is recognize that trailing entries in an ORDER BY list are semantically
no-ops and can be ignored because they are functionally dependent on
earlier entries.

Now, the reason that the parser restricts the functional dependency
deduction to a primary key is that it wants to be able to identify a
constraint OID that the query is dependent on to be semantically valid.
In this case, we don't need such an OID, so just finding any old unique
index on not-null columns is good enough.  (If someone drops the index,
the optimization might become incorrect, but that would force replanning
anyway.)

However, this way of thinking about it shows that the patch is missing
possible optimizations.  If we have ORDER BY a, b, c and (a,b) is the
primary key, then including c in the ORDER BY list is semantically
redundant, *whether or not we use an indexscan on the pkey index at all*.
More: if we have ORDER BY a, b, c and the primary key is (b,a), we
can still discard c from the sort requirement, even though the pkey
index as such isn't helpful for producing the required order.

So hacking up the pathkeys attributed to the indexscan is the wrong thing.
Rather, what we should be looking to do is decide that c is a useless
pathkey and remove it from the query_pathkeys, much as we'd do if we found
c = constant in WHERE.  That would allow optimization of other query
plans besides scan-the-pkey-index plans.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: iff - if

2014-04-15 Thread Thom Brown

On 15 April 2014 23:19, Andreas 'ads' Scherbaum adsm...@wars-nicht.de wrote:

 Hi,

 stumbled over a number of iff in the source where if is meant - not sure
 what the real story behind this is, but attached is a patch to fix the about
 80 occurrences.

 This only appears in comments, not in any code path.

Yeah, apparently those are intentional, and mean if and only if (i.e. =)

-- 
Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Need Multixact Freezing Docs

2014-04-15 Thread Josh Berkus

On 04/15/2014 02:25 PM, Josh Berkus wrote:
 Hackers,
 
 We need documentation on how users should intelligently set the
 multixact freeze settings.  I'm happy to write the actual text, but I
 definitely don't have any idea how to set these myself.  Under what
 circumstances should they be different from freeze_max_age?  How?
 

Also: how do I check the multixact age of a table?  There doesn't seem
to be any data for this ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-15 Thread Ants Aasma

On Mon, Apr 14, 2014 at 8:11 PM, Peter Geoghegan p...@heroku.com wrote:
PostgreSQL implements a clock sweep algorithm, which gets us something
approaching an LRU for the buffer manager in trade-off for less
contention on core structures. Buffers have a usage_count/popularity
that currently saturates at 5 (BM_MAX_USAGE_COUNT). The classic CLOCK
algorithm only has one bit for what approximates our usage_count (so
it's either 0 or 1). I think that at its core CLOCK is an algorithm
that has some very desirable properties that I am sure must be
preserved. Actually, I think it's more accurate to say we use a
variant of clock pro, a refinement of the original CLOCK.

PostgreSQL replacement algorithm is more similar to Generalized CLOCK
or GCLOCK, as described in [1]. CLOCK-Pro [2] is a different algorithm
that approximates LIRS[3]. LIRS is what MySQL implements[4] and
CLOCK-Pro is implemented by NetBSD [5] and there has been some work on
trying it on Linux [6]. Both LIRS and CLOCK-Pro work by keeping double
the cache size metadata entries and detect pages that have been
recently referenced. Basically they provide an adaptive tradeoff
between LRU and LFU.

In the past, various hackers have noted problems they've observed with
this scheme. A common pathology is to see frantic searching for a
victim buffer only to find all buffer usage_count values at 5. It may
take multiple revolutions of the clock hand before a victim buffer is
found, as usage_count is decremented for each and every buffer. Also,
BufFreelistLock contention is considered a serious bottleneck [1],
which is related.

There's a paper on a non blocking GCLOCK algorithm, that does lock
free clock sweep and buffer pinning[7]. If we decide to stay with
GCLOCK it may be interesting, although I still believe that some
variant of buffer nailing[8] is a better idea, my experience shows
that most of the locking overhead is cache line bouncing ignoring the
extreme cases where our naive spinlock implementation blows up.

Let's leave aside inner/root pages though, because they're so
dramatically useful when in a primary index on a tpb-b table that
they'll always be cached by any non-terrible algorithm. It beggars
belief that the still relatively dense (and consequently *popular*)
B+Tree leaf pages get so little credit for being of such long-term
utility (in the view of our clock sweep algorithm as implemented). The
algorithm has what could be loosely described as an excessively
short-term perspective. There is clearly a better balance to be had
here. I don't think the answer is that we have the B-Tree code give
its pages some special treatment that makes them harder to evict,
although I will admit that I briefly entertained the idea.

There has been some research that indicates that for TPC-A workloads
giving index pages higher weights increases hitrates[1].

I think the hardest hurdle for any changes in this area will be
showing that we don't have any nasty regressions. I think the best way
to do that would be to study separately the performance overhead of
the replacement algorithm and optimality of the replacement choices.
If we capture a bunch of buffer reference traces by instrumenting
PinBuffer, we can pretty accurately simulate the behavior of different
algorithm and tuning choices with different shared buffer sizes.
Obviously full scale tests are still needed due to interactions with
OS, controller and disk caches and other miscellaneous influences. But
even so, simulation would get us much better coverage of various
workloads and at least some confidence that it's a good change
overall. It will be very hard and time consuming to gather equivalent
evidence with full scale tests.

[1] http://www.csd.uoc.gr/~hy460/pdf/p35-nicola.pdf
[2] http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-05-3.pdf
[3] http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/jiang02_LIRS.pdf
[4] http://lists.mysql.com/commits/28601
[5] http://fxr.watson.org/fxr/source/uvm/uvm_pdpolicy_clockpro.c?v=NETBSD
[6] http://lwn.net/Articles/147879/
[7]
http://derby-nb.googlecode.com/svn-history/r41/trunk/derby-nb/ICDE10_conf_full_409.pdf
[8]
http://www.postgresql.org/message-id/ca+tgmozypeyhwauejvyy9a5andoulcf33wtnprfr9sycw30...@mail.gmail.com

Regards,
Ants Aasma
--
Cybertec Schönig Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Question about optimising (Postgres_)FDW

Hannu Krosing ha...@2ndquadrant.com writes:
 Is there a way to force it to prefer a plan where the results of (select
 id from onemillion where data  '0.9' limit 100)
 are passed to FDW as a single IN ( = ANY(...)) query and are retrieved
 all at once ?

You could write the query like that:

select * from onemillion_pgfdw where id = any (array(select id from
onemillion where data  '0.9' limit 100));

Or at least you should be able to, except when I try it I get

explain analyze
select * from onemillion_pgfdw where id = any (array(select id from
onemillion where data  '0.9' limit 100));
ERROR:  operator does not exist: integer = integer[]
HINT:  No operator matches the given name and argument type(s). You might need 
to add explicit type casts.
CONTEXT:  Remote SQL command: EXPLAIN SELECT id, inserted, data FROM 
public.onemillion WHERE ((id = ANY ((SELECT null::integer[]

so there's something the remote-estimate code is getting wrong here.
(It seems to work without remote_estimate, though.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-15 Thread Peter Geoghegan

On Tue, Apr 15, 2014 at 3:59 PM, Ants Aasma a...@cybertec.at wrote:
 PostgreSQL replacement algorithm is more similar to Generalized CLOCK
 or GCLOCK, as described in [1]. CLOCK-Pro [2] is a different algorithm
 that approximates LIRS[3]. LIRS is what MySQL implements[4] and
 CLOCK-Pro is implemented by NetBSD [5] and there has been some work on
 trying it on Linux [6]. Both LIRS and CLOCK-Pro work by keeping double
 the cache size metadata entries and detect pages that have been
 recently referenced. Basically they provide an adaptive tradeoff
 between LRU and LFU.

That's good to know.

 There's a paper on a non blocking GCLOCK algorithm, that does lock
 free clock sweep and buffer pinning[7]. If we decide to stay with
 GCLOCK it may be interesting, although I still believe that some
 variant of buffer nailing[8] is a better idea, my experience shows
 that most of the locking overhead is cache line bouncing ignoring the
 extreme cases where our naive spinlock implementation blows up.

You might be right about that, but lets handle one problem at a time.
Who knows what the bottleneck will end up being if and when we address
the naivety around frequency? I want to better characterize that
problem first.

 There has been some research that indicates that for TPC-A workloads
 giving index pages higher weights increases hitrates[1].

Frankly, there doesn't need to be any research on this, because it's
just common sense that probabilistically, leaf pages are much more
useful than heap pages in servicing index scan queries if we assume a
uniform distribution. If we don't assume that, then they're still more
useful on average.

 I think the hardest hurdle for any changes in this area will be
 showing that we don't have any nasty regressions. I think the best way
 to do that would be to study separately the performance overhead of
 the replacement algorithm and optimality of the replacement choices.
 If we capture a bunch of buffer reference traces by instrumenting
 PinBuffer, we can pretty accurately simulate the behavior of different
 algorithm and tuning choices with different shared buffer sizes.
 Obviously full scale tests are still needed due to interactions with
 OS, controller and disk caches and other miscellaneous influences. But
 even so, simulation would get us much better coverage of various
 workloads and at least some confidence that it's a good change
 overall. It will be very hard and time consuming to gather equivalent
 evidence with full scale tests.

I think I agree with all of that. The fact that we as a community
don't appear to have too much to say about what workloads to
prioritize somewhat frustrates this. The other problem is that sizing
shared_buffers appropriately involves a surprising amount of deference
to rules of thumb that in practice no one is quite prepared to
rigorously defend - who is to say what apportionment of memory to
Postgres is appropriate here? I too was hopeful that we could evaluate
this work purely in terms of observed improvements to hit rate (at
least initially), but now I doubt even that. It would be great to be
able to say here are the parameters of this discussion, and have
everyone immediately agree with that, but in this instance that's
legitimately not possible.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: iff - if

2014-04-15 Thread Andrew Dunstan



On 04/15/2014 06:26 PM, Thom Brown wrote:

On 15 April 2014 23:19, Andreas 'ads' Scherbaum adsm...@wars-nicht.de wrote:

Hi,

stumbled over a number of iff in the source where if is meant - not sure
what the real story behind this is, but attached is a patch to fix the about
80 occurrences.

This only appears in comments, not in any code path.

Yeah, apparently those are intentional, and mean if and only if (i.e. =)



This is a reasonably common idiom, or used to be.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: iff - if

2014-04-15 Thread Steve Crawford


On 04/15/2014 05:36 PM, Andrew Dunstan wrote:


On 04/15/2014 06:26 PM, Thom Brown wrote:
On 15 April 2014 23:19, Andreas 'ads' Scherbaum 
adsm...@wars-nicht.de wrote:

Hi,

stumbled over a number of iff in the source where if is meant - 
not sure
what the real story behind this is, but attached is a patch to fix 
the about

80 occurrences.

This only appears in comments, not in any code path.
Yeah, apparently those are intentional, and mean if and only if 
(i.e. =)




This is a reasonably common idiom, or used to be.


If it has fallen into disuse the news has failed to reach me:

http://en.wikipedia.org/wiki/If_and_only_if
http://www.mathwords.com/i/if_and_only_if.htm
http://mathworld.wolfram.com/Iff.html
...

Cheers,
Steve


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: iff - if

Steve Crawford scrawf...@pinpointresearch.com writes:
 On 04/15/2014 05:36 PM, Andrew Dunstan wrote:
 On 04/15/2014 06:26 PM, Thom Brown wrote:
 Yeah, apparently those are intentional, and mean if and only if 

 This is a reasonably common idiom, or used to be.

 If it has fallen into disuse the news has failed to reach me:
 http://en.wikipedia.org/wiki/If_and_only_if
 http://www.mathwords.com/i/if_and_only_if.htm
 http://mathworld.wolfram.com/Iff.html

The last discussion about this suggested that non-English speakers
might be familiar with other abbreviations for the concept.
But anyway, yes, those are not typos.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: iff - if

2014-04-15 Thread Christopher Browne

On Tue, Apr 15, 2014 at 6:19 PM, Andreas 'ads' Scherbaum 
adsm...@wars-nicht.de wrote:


 Hi,

 stumbled over a number of iff in the source where if is meant - not
 sure what the real story behind this is, but attached is a patch to fix the
 about 80 occurrences.


IFF is a common idiom in mathematics (and logic, which means it will be
of interest in computer science, which shares boundaries with both) which
stands for If And Only If.

Unless it has very recently fallen out of fashion in mathematics, it's
almost certainly intentional to use iff in the comments, and not merely a
typo.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?

Re: [HACKERS] Patch: iff - if

2014-04-15 Thread David G Johnston

Thom Brown-2 wrote
 On 15 April 2014 23:19, Andreas 'ads' Scherbaum lt;

 adsmail@

 gt; wrote:

 Hi,

 stumbled over a number of iff in the source where if is meant - not
 sure
 what the real story behind this is, but attached is a patch to fix the
 about
 80 occurrences.

 This only appears in comments, not in any code path.
 
 Yeah, apparently those are intentional, and mean if and only if (i.e.
 =)

Just looking at the first few items someone's good intention is being ruined
by bad execution...especially:

Add...references to the section...a block is in [iff] chapters aren't
autolabelled...

Many of these are not mathematical propositions but flow-control logic for
which IF is indeed the correct term-of-art; though re-reading the example
above that is probably one of the more logical ones...

Now:

...new tuple was inserted, and its HEAP_ONLY_TUPLE flag is set [if/iff] a
HOT update was done

needs to be evaluated on its merits; namely does HEAP_ONLY_TUPLE ever get
set if a HOT update was not done?  If not then IFF is proper and necessary
to convey that fact (though even this one is marginal and both versions are
likely unambiguous in reality).

In short returns BOOLEAN can reasonably be said to properly use IF in
almost all cases - especially for those functions that are check-oriented
(and thus obviously if the opposite condition is present the opposite result
would be returned).  IFF is not incorrect here but for me is misplaced and
too-much-information.

Given that it is unlikely the mis/over-use of IFF in the comments is a
meaningful problem I would vote for letting the imprecision go and avoid the
code churn.  If the surrounding code was being altered anyway then having
the corresponding comment updated seems acceptable; but a blanket
find/replace doesn't seem helpful.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Patch-iff-if-tp5800153p5800172.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] The question about the type numeric

2014-04-15 Thread sure.postgres

Hi hackers,

I am learning about numeric .
The comment of NumericShort format is:
 * In the NumericShort format, the remaining 14 bits of the header word
 * (n_short.n_header) are allocated as follows: 1 for sign (positive or
 * negative), 6 for dynamic scale, and 7 for weight.  In practice, most
 * commonly-encountered values can be represented this way.

So the Max of the NumericShort format should be up to 508 digits before the 
decimal point.
So the sign of the number 12345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567 
should be 0x807F.
The number is 257 digits before the decimal point.
But the sign is 0.
So is there anything wrong?

Thank you!

2014-04-15



wangshuo
HighGo Software Co.,Ltd.
Address: A203 Block D QILU Soft Park, High-Tech Zone, Lixia district, Jinan 
Shandong, China(Head Office)
Tel:+86-0531-55701530
Fax:+86-0531-55701544
Website：www.highgo.com
Mobile：18766416137

Re: [HACKERS] The question about the type numeric

2014-04-15 Thread David G Johnston

sure.postgres wrote
Hi hackers,

I am learning about numeric .
The comment of NumericShort format is:
* In the NumericShort format, the remaining 14 bits of the header word
* (n_short.n_header) are allocated as follows: 1 for sign (positive or
* negative), 6 for dynamic scale, and 7 for weight. In practice, most
* commonly-encountered values can be represented this way.

So the Max of the NumericShort format should be up to 508 digits before
the decimal point.
So the sign of the number
12345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567
should be 0x807F.
The number is 257 digits before the decimal point.
But the sign is 0.
So is there anything wrong?

[1000 00][0][00111 0][001]

I appreciate that you got no responses on the original e-mail but if you are
going to re-raise the question at least have the courtesy to respond to your
original thread and not go and start a new one. And maybe trying rephrasing
the question since most likely your original question was not worded in such
a way to garner a response.

I may have this totally wrong but I don't see why the sign of your number
should be anything but zero since that is, I presume, the value of the
specific bit for a positive number - which yours is.

So, in short, nothing seems to be wrong. If you think something is wrong
you should probably state what that is explicitly and ask someone to explain
what is happening.

I would have said all this when I saw the first e-mail but I wasn't (and
still am not) totally clear on what you are asking and was hoping someone
more familiar could make better sense of it.

David J.

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/The-question-about-the-type-numeric-tp5800173p5800174.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2014-04-15 Thread Amit Kapila

On Wed, Apr 16, 2014 at 3:01 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Apr 15, 2014 at 12:33 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas robertmh...@gmail.com wrote:
 For the create case, I'm wondering if we should put the block that
 tests for !hmap *before* the _dosmaperr() and check for EEXIST.  What
 is your opinion?

 Either way is okay, but I think the way you are suggesting is better as it
 will make code consistent with other place (PGSharedMemoryCreate()).

 OK, can you prepare a patch?

Please find attached patch to address this issue.
One minor point to note is that now we have to call GetLastError() twice,
once inside error path and once to check EEXIST, but I think that is okay
as existing code in PGSharedMemoryCreate() does it that way.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_dsm_invalid_errcode_issue-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C

2014-04-15 Thread Amit Kapila

On Tue, Apr 15, 2014 at 11:53 PM, Christian Ullrich
ch...@chrullrich.net wrote:
 * From: Robert Haas
 Why not just pass a command-line switch?

 Because, as I wrote in the message you are quoting, I did not think that
 having a command-line option for the sole purpose of telling the
 postmaster who its parent is was a suitable solution.

 I had already given up on that idea based on Amit's advice, and I will
 create a patch based on a command-line option.

 While I have you here, though, any suggestions on what the name of that
 option should be? I think --background is about right.

--background as switch name seems to be okay.

 Also, how should
 I treat the option on non-Windows platforms? Should it just not be there
 (= error), or be ignored if present?

I think ignored for non-windows is better way to proceed.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Add TAP tests for client programs

On 4/14/14, 10:30 PM, Andrew Dunstan wrote:
 
 On 04/14/2014 10:17 PM, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
 Add TAP tests for client programs
 I assume the buildfarm would need to be taught about this?


 
 
 Yes. It probably won't be a huge change, but it will need a bit of code.

It might be more future-proof if the build farm just called make
check-world and used some other way to identify the individual tests in
that output.  Otherwise, we'll need a new build farm release every time
a test suite is added.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Add TAP tests for client programs

Peter Eisentraut pete...@gmx.net writes:
 On 4/14/14, 10:30 PM, Andrew Dunstan wrote:
 Yes. It probably won't be a huge change, but it will need a bit of code.

 It might be more future-proof if the build farm just called make
 check-world and used some other way to identify the individual tests in
 that output.  Otherwise, we'll need a new build farm release every time
 a test suite is added.

That argument would be more convincing if make check-world worked
on Windows ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] New option in pg_basebackup to exclude pg_log files during base backup

On 4/9/14, 10:57 AM, Magnus Hagander wrote:
 So it'd be an array, and by default you'd have something like:
 basebackup_skip_path = $log_directory
 ?
 
 Maybe use it to skip backup labels by default as well.
 basebackup_skip_path = $log_directory, $backup_label_files
 
 
 I hadn't considered any details, but yes, someting along that line. And
 then you could also include arbitrary filenames or directories should
 you want.

What are the use cases for excluding anything else?

pg_basebackup ought to have some intelligence about what files are
appropriate to include or exclude, depending on what the user is trying
to do.  It shouldn't become a general file copying tool.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] The question about the type numeric

2014-04-15 Thread sure.postgres

Hi hackers,

I am learning about numeric .
The comment of NumericShort format is:
 * In the NumericShort format, the remaining 14 bits of the header word
 * (n_short.n_header) are allocated as follows: 1 for sign (positive or
 * negative), 6 for dynamic scale, and 7 for weight.  In practice, most
 * commonly-encountered values can be represented this way.

So the Max of the NumericShort format should be up to 508 digits before the 
decimal point.
So the sign of the number 12345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567 
should be 0x807F.
The number is 257 digits before the decimal point.
But the sign is 0.
So is there anything wrong?

2014-04-15



wangshuo
HighGo Software Co.,Ltd.
Address: A203 Block D QILU Soft Park, High-Tech Zone, Lixia district, Jinan 
Shandong, China(Head Office)
Tel:+86-0531-55701530
Fax:+86-0531-55701544
Website：www.highgo.com
Mobile：18766416137

Re: [HACKERS] [COMMITTERS] pgsql: Add TAP tests for client programs

On 4/15/14, 11:11 PM, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
 On 4/14/14, 10:30 PM, Andrew Dunstan wrote:
 Yes. It probably won't be a huge change, but it will need a bit of code.
 
 It might be more future-proof if the build farm just called make
 check-world and used some other way to identify the individual tests in
 that output.  Otherwise, we'll need a new build farm release every time
 a test suite is added.
 
 That argument would be more convincing if make check-world worked
 on Windows ...

What about it doesn't work on Windows?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1

On 4/14/14, 3:28 PM, Peter Eisentraut wrote:
 On 4/4/14, 10:07 AM, Andres Freund wrote:
 If
 somebody previously tried to do the correct thing and attached
 PGDLLEXPORT to their own *function* prototoype, it would cause problems
 now.
 
 What is the difference (on affected platforms) between
 
 Datum funcname(PG_FUNCTION_ARGS);
 
 and writing (effectively)
 
 PGDLLEXPORT Datum funcname(PG_FUNCTION_ARGS);
 Datum funcname(PG_FUNCTION_ARGS);
 
 or for that matter
 
 Datum funcname(PG_FUNCTION_ARGS);
 PGDLLEXPORT Datum funcname(PG_FUNCTION_ARGS);
 
 
 If there isn't a difference, then my patch is fine.  Otherwise, it might
 be good to document the issues for extension authors.

Let me point out again that my patch doesn't actually do anything about
PGDLLEXPORT or the like.  It just adds automatic prototypes into
PG_FUNCTION_INFO_V1, to reduce compiler warnings in extensions and
reduce some boilerplate in general.

If it turns out that this might help someone optimize the Windows build,
then great.  But I gather it won't, so so what.  Or maybe it can be made
to work, in which case extension authors will get compiler errors about
places they need to clean up.  So it could still help that way, but who
knows, that's not the point.

If there are still concerns in this area, we could commit just the part
that adds the prototype to PG_FUNCTION_INFO_V1 and let that sit for a
while, and then remove the (now redundant) explicit prototypes in a
later release, so if there is a need to revert it, it won't be so big.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] [doc] EXPLAIN CREATE MATERIALIZED VIEW AS?

2014-04-15 Thread Amit Langote

Hi,

Attached adds CREATE MATERIALIZED VIEW AS to the list of statements
that can be EXPLAINed.

--
Amit


explain-create-materialized-view-as.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1

Peter Eisentraut pete...@gmx.net writes:
 Let me point out again that my patch doesn't actually do anything about
 PGDLLEXPORT or the like.  It just adds automatic prototypes into
 PG_FUNCTION_INFO_V1, to reduce compiler warnings in extensions and
 reduce some boilerplate in general.

Hmm ... for some reason I had gotten it in my head that you were adding
PGDLLEXPORT to the autogenerated extern declarations, but at least the
version of the patch in 1389762012.24046.2.ca...@vanquo.pezone.net
doesn't do that, so the point is moot.

I still object to the aspect of the patch that moves the externs for
_PG_init/_PG_fini into fmgr.h: that is conceptually wrong and will create
more confusion than the trivial code savings is worth.  But I won't
complain if you commit the PG_FUNCTION_INFO_V1 changes.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-04-15 Thread Kouhei Kaigai

 Andres Freund and...@2ndquadrant.com writes:
  What I think this discussion shows that this patch isn't ready for
  9.4. The first iteration of the patch came in 2013-11-06. Imo that's
  pretty damn late for a relatively complex patch. And obviously we
  don't have agreement on the course forward.
  I don't think we need to stop discussing, but I think it's pretty
  clear that this isn't 9.4 material. And that it's far from Ready for
 Committer.
 
Yep, today is the expected feature freeze date towards v9.4.
It is little bit late to include v9.4 features, unfortunately.

 Yeah.  I'm still not exactly convinced that custom-scan will ever allow
 independent development of new plan types (which, with all due respect to
 Robert, is what it was being sold as last year in Ottawa).  But I'm not
 opposed in principle to committing it, if we can find a way to have a cleaner
 API for things like setrefs.c.  It seems like late-stage planner processing
 in general is an issue for this patch (createplan.c and subselect.c are
 also looking messy).  EXPLAIN isn't too great either.
 
 I'm not sure exactly what to do about those cases, but I wonder whether
 things would get better if we had the equivalent of
 expression_tree_walker/mutator capability for plan nodes.  The state of
 affairs in setrefs and subselect, at least, is a bit reminiscent of the
 bad old days when we had lots of different bespoke code for traversing
 expression trees.

Hmm. If we have something like expression_tree_walker/mutator for plan nodes,
we can pass a walker/mutator function's pointer instead of exposing static
functions that takes recursive jobs.
If custom-plan provider (that has sub-plans) got a callback with walker/
mutator pointer, all it has to do for sub-plans are calling this new
plan-tree walking support routine with supplied walker/mutator.
It seems to me more simple design than what I did.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-04-15 Thread Kouhei Kaigai

 On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Yeah.  After a fast review of the custom-scan and cache-scan
  patches, it seems to me that my original fears are largely
  confirmed: the custom scan patch is not going to be sufficient to
  allow development of any truly new plan type.  Yeah, you can plug in
  some new execution node types, but actually doing anything
  interesting is going to require patching other parts of the system.
 
  Without prejudice to the rest of what you said, this argument doesn't
  hold much water with me.  I mean, anything that our extensibility
  mechanism doesn't support today will require new hooks, but does that
  mean we're never going to add any more hooks?  I sure hope not.
 
  No, that's not what I said.  ISTM that the argument for the
  custom-scan API is that it allows interesting new things to be done
  *without further modifying the core code*.  But the example
  application (cache_scan) fails to demonstrate that, and indeed seems
  to be a counterexample.  Whether we'd accept cache_scan on its own
  merits is a separate question.  The problem for me is that custom-scan
  isn't showing that it can support what was claimed without doing
  serious damage to modularity and maintainability of the core code.
 
 I think there's two separate things in there, one of which I agree with
 and one of which I disagree with.  I agree that we must avoid damaging the
 modularity and maintainability of the core code; I don't agree that
 custom-scan needs to be able to do interesting things with zero additional
 changes to the core code.  If we come up with three interesting applications
 for custom scan that require 5 new hooks between them, I'll consider that
 a major success - assuming those hooks don't unduly limit future changes
 we may wish to make in the core code.  I think your concern about exposing
 APIs that may not be terribly stable is well-founded, but I don't think
 that means we shouldn't expose *anything*.
 
I agree 100%.

We usually change hook definition release-by-release, and it is author's
responsibility to follow the newer interface if he continues to maintain
his extension on the newer release also.
Probably, it is a gray stuff neither black nor white. If we can design
a perfect interface, it might be good but has no evolution further.
Of course, it does not justify poor designed interface, but an important
stuff is to find out a best way at this moment. It may take core
refactoring, not just exposing static functions. What I tried to implement
is the only way to implement it.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Question about optimising (Postgres_)FDW

2014-04-15 Thread Hannu Krosing

On 04/16/2014 01:25 AM, Tom Lane wrote:
 Hannu Krosing ha...@2ndquadrant.com writes:
 Is there a way to force it to prefer a plan where the results of (select
 id from onemillion where data  '0.9' limit 100)
 are passed to FDW as a single IN ( = ANY(...)) query and are retrieved
 all at once ?
 You could write the query like that:

 select * from onemillion_pgfdw where id = any (array(select id from
 onemillion where data  '0.9' limit 100));
My actual use-case was about a join between a local and a remote table
and without rewriting the query (they come from ORM)

I was hoping to be able to nudge postgresql towards a better plan via some
tuning of table/fdw options or GUCs.

for example, would postgresql use the WHERE id IN (...) query on remote
side for a query like

select r.data, l.data
  from onemillion_pgfdw r
  join onemillion l
on r.id = l.id and l.data  '0.999';

if it recognizes that the local side returns only 1000 rows ?

or would it still use 1000 individual WHERE id = $1 queries.

Is getting the foreign data via IN and then turning the data into a hash
for joining one of the plans it considers at all ?

Best
Hannu


 Or at least you should be able to, except when I try it I get

 explain analyze
 select * from onemillion_pgfdw where id = any (array(select id from
 onemillion where data  '0.9' limit 100));
 ERROR:  operator does not exist: integer = integer[]
 HINT:  No operator matches the given name and argument type(s). You might 
 need to add explicit type casts.
 CONTEXT:  Remote SQL command: EXPLAIN SELECT id, inserted, data FROM 
 public.onemillion WHERE ((id = ANY ((SELECT null::integer[]

 so there's something the remote-estimate code is getting wrong here.
 (It seems to work without remote_estimate, though.)

   regards, tom lane


-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-15 Thread Amit Kapila

On Wed, Apr 16, 2014 at 5:00 AM, Peter Geoghegan p...@heroku.com wrote:
On Tue, Apr 15, 2014 at 3:59 PM, Ants Aasma a...@cybertec.at wrote:
There's a paper on a non blocking GCLOCK algorithm, that does lock
free clock sweep and buffer pinning[7]. If we decide to stay with
GCLOCK it may be interesting, although I still believe that some
variant of buffer nailing[8] is a better idea, my experience shows
that most of the locking overhead is cache line bouncing ignoring the
extreme cases where our naive spinlock implementation blows up.

You might be right about that, but lets handle one problem at a time.
Who knows what the bottleneck will end up being if and when we address
the naivety around frequency? I want to better characterize that
problem first.

Just to summarize you about the previous discussion and the
improvements that we decided to do in this area based on feedback
are as follows:

1. Bgwriter needs to be improved so that it can help in reducing
usage count and finding next victim buffer (run the clock sweep
and add buffers to the free list).
2. SetLatch for bgwriter (wakeup bgwriter) when elements in freelist
are less.
3. Split the workdone globallock (Buffreelist) in StrategyGetBuffer
(a spinlock for the freelist, and an lwlock for the clock sweep).
Here we can try to make it lock free based on atomic ops as
well.
4. Bgwriter needs to be more aggressive, logic based on which it
calculates how many buffers it needs to process needs to be
improved.
5. Contention around buffer mapping locks.
6. Cacheline bouncing around the buffer header spinlocks, is there
anything we can do to reduce this?
7. Choose Optimistically used buffer in StrategyGetBuffer().
8. Don't bump the usage count every time buffer is pinned.

I have already addressed some of these improvements in patch[1]
and for other's, I have plan to work on them for 9.5.

I think here you want to address the improvements related to usage
count and see if it can get us win in some of commonly used scenario's,
without affecting any other commonly used scenario. I feel this is good
idea to pursue and see if we can get good benefits with it.

Infact few days back, I had ran some tests manually to see the
problems around BufFreeListLock (currently I don't have script ready)
and more recently Jason Petersen has done some benchmarking
in this area which you can refer it here[2].

I wonder if we can work together to improve things in this area.

[1]
http://www.postgresql.org/message-id/006e01ce926c$c7768680$56639380$@kap...@huawei.com
[2]
https://googledrive.com/host/0Bx33JCTmOADOeTIwaE9KX21yWEk/Concurrency%20Limits%20with%20Large%20Working%20Sets

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Question about optimising (Postgres_)FDW

2014-04-15 Thread Hannu Krosing

On 04/16/2014 06:12 AM, Hannu Krosing wrote:
 On 04/16/2014 01:25 AM, Tom Lane wrote:
 Hannu Krosing ha...@2ndquadrant.com writes:
 Is there a way to force it to prefer a plan where the results of (select
 id from onemillion where data  '0.9' limit 100)
 are passed to FDW as a single IN ( = ANY(...)) query and are retrieved
 all at once ?
 You could write the query like that:

 select * from onemillion_pgfdw where id = any (array(select id from
 onemillion where data  '0.9' limit 100));
 My actual use-case was about a join between a local and a remote table
 and without rewriting the query (they come from ORM)

 I was hoping to be able to nudge postgresql towards a better plan via some
 tuning of table/fdw options or GUCs.

 for example, would postgresql use the WHERE id IN (...) query on remote
 side for a query like

 select r.data, l.data
   from onemillion_pgfdw r
   join onemillion l
 on r.id = l.id and l.data  '0.999';

 if it recognizes that the local side returns only 1000 rows ?

 or would it still use 1000 individual WHERE id = $1 queries.

 Is getting the foreign data via IN and then turning the data into a hash
 for joining one of the plans it considers at all ?
It sees that could we need an extra tuning parameter for choosing the

ID IN (...) + HASH plan over individual SELECT .. WHERE ID = $1

something between

`fdw_startup_cost` and `fdw_tuple_cost`

to signify that an IN query returning 1000 rows runs faster than 1000 =
queries

as I understan currently they both would be estimated as

fdw_startup_cost + 1000 * fdw_tuple_cost

the new parameter could be fdw_call_cost or fdw_query_cost and would
estimate
how much each individual call to fdw costs, thus favouring calls which
return more
data in one call

Cheers
Hannu

 Best
 Hannu

 Or at least you should be able to, except when I try it I get

 explain analyze
 select * from onemillion_pgfdw where id = any (array(select id from
 onemillion where data  '0.9' limit 100));
 ERROR:  operator does not exist: integer = integer[]
 HINT:  No operator matches the given name and argument type(s). You might 
 need to add explicit type casts.
 CONTEXT:  Remote SQL command: EXPLAIN SELECT id, inserted, data FROM 
 public.onemillion WHERE ((id = ANY ((SELECT null::integer[]

 so there's something the remote-estimate code is getting wrong here.
 (It seems to work without remote_estimate, though.)

  regards, tom lane



-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?