Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1

2014-04-16 Thread Craig Ringer
On 04/15/2014 10:17 PM, Tom Lane wrote:
  I actually think we should *add* a LIBPQEXPORT that handles this for
  libpq, much like PGDLLEXPORT does for postgres(.exe). And in the
  process, rename PGDLLEXPORT to POSTGRESEXPORT or PGSERVEREXPORT or
  something.
 My reaction to that is "not bloody likely."  I remarked on this upthread
 already, but there is absolutely no way that I want to clutter our source
 code with platform-specific markings like that.
 
 Perhaps somebody could try a Windows build with PGDLLEXPORT defined to
 empty, and verify that it works, and if so do a pgbench comparison
 against a build done the existing way?

Good idea.

Personally, I don't care about Windows enough to do that myself. I want it
to work, but performance optimisation there is beyond what I'm bothered with.

Another useful test would be to modify libpq as described above, so that
its headers set __declspec(dllexport) on its exports while libpq itself is
being compiled, and __declspec(dllimport) when they're included while
compiling other binaries that link to libpq. Then use *that* in pgbench too
and see if it makes any meaningful difference.
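
For reference, the usual shape of such a header-side macro is roughly the
following (LIBPQEXPORT and BUILDING_LIBPQ are illustrative names, not
existing symbols - a sketch, not tested):

#ifdef _WIN32
#ifdef BUILDING_LIBPQ
#define LIBPQEXPORT __declspec(dllexport)	/* compiling libpq itself */
#else
#define LIBPQEXPORT __declspec(dllimport)	/* compiling a client of libpq */
#endif
#else
#define LIBPQEXPORT							/* no-op on other platforms */
#endif

extern LIBPQEXPORT PGconn *PQconnectdb(const char *conninfo);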

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] BGWorkers, shared memory pointers, and postmaster restart

2014-04-16 Thread Craig Ringer
Hi all

I've been using the dynamic BGWorker support for some recent work, and I
think I've found an issue with how postmaster restarts are handled.

TL;DR: I don't think there's a safe way to use a BGWorker (static or
dynamic) with bgw_restart_time != BGW_NEVER_RESTART and a bgw_main_arg
Datum that points into shared memory, and I think we might need an API
change to fix that.



If the postmaster restarts due to a crash, it calls the shared memory
startup hooks with the "found" flag set to false, so the structures are
re-initialized. This makes sense, as shm is presumed to be corrupt.

Dynamic BGWorkers are killed as part of the restart. However, they're not
unregistered, and they're relaunched if a bgw_restart_time is set - with
the same bgw_main_arg Datum as the original launch.

If this Datum is a pointer into the shared memory that just got zeroed
and re-inited, exciting things happen.
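
For concreteness, the hazardous pattern looks roughly like this (a sketch
only; the names are mine):

	BackgroundWorker worker;
	BackgroundWorkerHandle *handle;

	memset(&worker, 0, sizeof(worker));
	snprintf(worker.bgw_name, BGW_MAXLEN, "my worker");
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
	worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
	worker.bgw_restart_time = 5;	/* != BGW_NEVER_RESTART */
	snprintf(worker.bgw_library_name, BGW_MAXLEN, "my_extension");
	snprintf(worker.bgw_function_name, BGW_MAXLEN, "my_worker_main");
	/* the problem: a pointer into shared memory as the argument */
	worker.bgw_main_arg = PointerGetDatum(my_shm_state);
	RegisterDynamicBackgroundWorker(&worker, &handle);

After a postmaster restart the worker is relaunched with the same Datum,
but it now points into shared memory that was zeroed and re-inited
underneath it.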

In my case, the worker restart generally happens after shm is re-inited,
and since it gets reinitialized with the same contents, the dynamic
bgworker registered during postmaster startup restarts normally. I only
noticed the problem because my shared memory init hook launches
bgworkers once shm is set up, and I was getting two copies of a bgworker
after postmaster restart.

This also affects static bgworkers that use a pointer into shared memory
to pass data on EXEC_BACKEND platforms (Windows).

For dynamic bgworkers, it seems like it'd be OK to just require
extensions to re-register them after a postmaster restart, or to add a
BGW_UNREGISTER_ON_POSTMASTER_RESTART flag to let that be controlled, so
bgworkers that didn't receive pointers into shm wouldn't have to deal with
it. That way, any pointer into shared memory that's being re-initialized
would be discarded when the bgworker was unregistered during postmaster
restart. Dynamic bgworkers are new in 9.4, so we still have the freedom
to change their behaviour.

But that won't fix static bgworkers that have a pointer into shm as an
argument. On non-EXEC_BACKEND platforms we can just pass a pointer to
palloc'd postmaster memory, but that won't work if we have to exec()
after fork(). I don't think it's reasonable to say "well, don't do that"
- if you can only send a single pass-by-value Datum to a bgworker's main
function, bgworkers' utility is greatly reduced.


I could always set up an exit hook / SIGQUIT handler that tries to
unregister dynamic bgworkers using their BGWorkerHandles, from the
worker that initially registered them. But if the worker that registered
them is the one that crashed and triggered a postmaster restart, this
won't do any good, so it's a half-solution at best.

I can't stash the BGWorkerHandles for the launched workers in shm and
unregister them during postmaster restart either. For one thing, shm is
presumed corrupt. For another, BGWorkerHandle is an incomplete struct
with no exposed size, so I can't copy it into shm anyway.

This seems like a bit of a pickle. Am I missing something obvious?

The only way around it that I can currently see is to have a single
static bgworker with no pointer argument launch and manage all the other
workers required for the extension. It would launch them all with
bgw_restart_time = BGW_NEVER_RESTART and set itself as the
bgw_notify_pid. If they die, it unregisters them and registers new
ones. Effectively, it's doing the work the bgworker code does already,
except that it makes sure the bgworkers don't get relaunched on
postmaster restart. Since it doesn't depend on shm being valid, this
should work, but it's messy and seems like there should be a better way.
A sketch follows.
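
A compressed sketch of that supervisor pattern (untested; names illustrative):

	/* supervisor launches children it will manage itself */
	child.bgw_restart_time = BGW_NEVER_RESTART;	/* supervisor handles restarts */
	child.bgw_notify_pid = MyProcPid;			/* SIGUSR1 on child state change */
	RegisterDynamicBackgroundWorker(&child, &handle);

	/* ... later, when notified ... */
	if (GetBackgroundWorkerPid(handle, &pid) == BGWH_STOPPED)
	{
		/* child died: forget the old registration, register a fresh one */
	}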

Do we need to change how we define BGWorkers to require that a BGWorker
with a Datum that points to shared memory be automatically unregistered
by the postmaster on restart? This would have to apply to static
BGWorkers too, so we'd want to think about adding a flag for it and
maybe even backpatching the flag into 9.3; it'd only affect extensions
that actually used the combination described above.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Peter Geoghegan
On Tue, Apr 15, 2014 at 9:44 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Apr 14, 2014 at 1:11 PM, Peter Geoghegan p...@heroku.com wrote:
 In the past, various hackers have noted problems they've observed with
 this scheme. A common pathology is to see frantic searching for a
 victim buffer only to find all buffer usage_count values at 5. It may
 take multiple revolutions of the clock hand before a victim buffer is
 found, as usage_count is decremented for each and every buffer.  Also,
 BufFreelistLock contention is considered a serious bottleneck [1],
 which is related.

 I think that the basic problem here is that usage counts increase when
 buffers are referenced, but they decrease when buffers are evicted,
 and those two things are not in any necessary way connected to each
 other.  In particular, if no eviction is happening, reference counts
 will converge to the maximum value.  I've read a few papers about
 algorithms that attempt to segregate the list of buffers into "hot"
 and "cold" lists, and an important property of such algorithms is that
 they mustn't be allowed to make everything hot.

It's possible that I've misunderstood what you mean here, but do you
really think it's likely that everything will be hot, in the event of
using something like what I've sketched here? I think it's an
important measure against this general problem that buffers really
earn the right to be considered hot, so to speak. With my prototype,
in order for a buffer to become as hard to evict as possible, at a
minimum it must be *continually* pinned for at least 30 seconds.
That's actually a pretty tall order. Although, as I said, I wouldn't
be surprised if it was worth making it possible for buffers to be even
more difficult to evict than that. It should be extremely difficult to
evict a root B-Tree page, and to a lesser extent inner pages even
under a lot of cache pressure, for example. There are lots of
workloads in which that can happen, and I have a hard time believing
that it's worth it to evict them, given the extraordinary difference in
their utility as compared to a lot of other things. I can imagine a
huge barrier against evicting what is actually a relatively tiny
number of pages being worth it.

I don't want to dismiss what you're saying about heating and cooling
being unrelated, but I don't find the conclusion that "not everything
can be hot" obvious. Maybe heat should be relative rather than
absolute, and maybe that's actually what you meant. There is surely
some workload where buffer access actually is perfectly uniform, and
what do you do there? What temperature are those buffers?

It occurs to me that within the prototype patch, even though
usage_count is incremented in a vastly slower fashion (in a wall time
sense), clock sweep doesn't take advantage of that. I should probably
investigate having clock sweep become more aggressive in decrementing
in response to realizing that it won't get some buffer's usage_count
down to zero on the next revolution either. There are certainly
problems with that, but they might be fixable. Within the patch, in
order for it to be possible for the usage_count to be incremented in
the interim, an average of 1.5 seconds must pass, so if clock sweep
were to anticipate another no-set-to-zero revolution, it seems pretty
likely that it would be exactly right, or if not then close enough,
since it can only really fail to correct for some buffers getting
incremented once more in the interim. Conceptually, it would be like
multiple logical revolutions were merged into one actual one,
sufficient to have the next revolution find a victim buffer.
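
A sketch of the kind of adjustment I mean (my own pseudocode, not part of
the patch; min_count_seen_last_pass is hypothetical bookkeeping):

	/*
	 * If the previous full revolution saw no buffer with usage_count == 0
	 * and the smallest count seen was min_count, decrement by min_count
	 * instead of 1 on this pass, collapsing min_count logical revolutions
	 * into one and guaranteeing at least one victim candidate.
	 */
	int		decrement = Max(1, min_count_seen_last_pass);

	if (buf->usage_count > decrement)
		buf->usage_count -= decrement;
	else
		buf->usage_count = 0;		/* victim candidate */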

-- 
Peter Geoghegan




Re: [HACKERS] Proposal: variant of regclass

2014-04-16 Thread Tatsuo Ishii
 Well, I noticed that, too, but I didn't think it was my job to tell
 the patch author what functions he should have wanted.  A follow-on
 patch to add to_regprocedure and to_regoperator wouldn't be much work,
 if you want that.
 
 And here is a patch for that.

Looks good to me, except for duplicate OIDs. Included is a patch to fix that.
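
(For anyone picking OIDs for a follow-up version: the in-tree script

    $ src/include/catalog/unused_oids

lists the currently free ranges.)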

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 0809a6d..db8691a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15294,10 +15294,18 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
   </indexterm>
 
    <indexterm>
+    <primary>to_regprocedure</primary>
+   </indexterm>
+
+   <indexterm>
     <primary>to_regoper</primary>
    </indexterm>
 
    <indexterm>
+    <primary>to_regoperator</primary>
+   </indexterm>
+
+   <indexterm>
     <primary>to_regtype</primary>
    </indexterm>
 
@@ -15482,11 +15490,21 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
        <entry>get the oid of the named function</entry>
       </row>
       <row>
+       <entry><literal><function>to_regprocedure(<parameter>func_name</parameter>)</function></literal></entry>
+       <entry><type>regprocedure</type></entry>
+       <entry>get the oid of the named function</entry>
+      </row>
+      <row>
        <entry><literal><function>to_regoper(<parameter>operator_name</parameter>)</function></literal></entry>
        <entry><type>regoper</type></entry>
        <entry>get the oid of the named operator</entry>
       </row>
       <row>
+       <entry><literal><function>to_regoperator(<parameter>operator_name</parameter>)</function></literal></entry>
+       <entry><type>regoperator</type></entry>
+       <entry>get the oid of the named operator</entry>
+      </row>
+      <row>
        <entry><literal><function>to_regtype(<parameter>type_name</parameter>)</function></literal></entry>
        <entry><type>regtype</type></entry>
        <entry>get the oid of the named type</entry>
@@ -15658,10 +15676,12 @@ SELECT collation for ('foo' COLLATE de_DE);
 
   <para>
    The <function>to_regclass</function>, <function>to_regproc</function>,
-   <function>to_regoper</function> and <function>to_regtype</function>
-   translate relation, function, operator, and type names to objects of
-   type <type>regclass</>, <type>regproc</>, <type>regoper</> and
-   <type>regtype</>, respectively.  These functions differ from a cast from
+   <function>to_regprocedure</function>, <function>to_regoper</function>,
+   <function>to_regoperator</function>, and <function>to_regtype</function>
+   functions translate relation, function, operator, and type names to objects
+   of type <type>regclass</>, <type>regproc</>, <type>regprocedure</type>,
+   <type>regoper</>, <type>regoperator</type>, and <type>regtype</>,
+   respectively.  These functions differ from a cast from
    text in that they don't accept a numeric OID, and that they return null
    rather than throwing an error if the name is not found (or, for
    <function>to_regproc</function> and <function>to_regoper</function>, if
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index ed2bdbf..6210f45 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -323,6 +323,38 @@ regprocedurein(PG_FUNCTION_ARGS)
 }
 
 /*
+ * to_regprocedure	- converts proname(args) to proc OID
+ *
+ * If the name is not found, we return NULL.
+ */
+Datum
+to_regprocedure(PG_FUNCTION_ARGS)
+{
+	char	   *pro_name = PG_GETARG_CSTRING(0);
+	List	   *names;
+	int			nargs;
+	Oid			argtypes[FUNC_MAX_ARGS];
+	FuncCandidateList clist;
+
+	/*
+	 * Parse the name and arguments, look up potential matches in the current
+	 * namespace search list, and scan to see which one exactly matches the
+	 * given argument types.	(There will not be more than one match.)
+	 */
+	parseNameAndArgTypes(pro_name, false, &names, &nargs, argtypes);
+
+	clist = FuncnameGetCandidates(names, nargs, NIL, false, false, true);
+
+	for (; clist; clist = clist->next)
+	{
+		if (memcmp(clist->args, argtypes, nargs * sizeof(Oid)) == 0)
+			PG_RETURN_OID(clist->oid);
+	}
+
+	PG_RETURN_NULL();
+}
+
+/*
  * format_procedure		- converts proc OID to pro_name(args)
  *
  * This exports the useful functionality of regprocedureout for use
@@ -722,6 +754,45 @@ regoperatorin(PG_FUNCTION_ARGS)
 }
 
 /*
+ * to_regoperator	- converts oprname(args) to operator OID
+ *
+ * If the name is not found, we return NULL.
+ */
+Datum
+to_regoperator(PG_FUNCTION_ARGS)
+{
+	char	   *opr_name_or_oid = PG_GETARG_CSTRING(0);
+	Oid			result;
+	List	   *names;
+	int			nargs;
+	Oid			argtypes[FUNC_MAX_ARGS];
+
+	/*
+	 * Parse the name and arguments, look up potential matches in the current
+	 * namespace search list, and scan to see which one exactly matches the
+	 * given argument types.	(There will not be more than one match.)
+	 */
+	parseNameAndArgTypes(opr_name_or_oid, true, &names, &nargs, argtypes);
+	if (nargs == 1)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_PARAMETER),
+				 errmsg("missing argument"),
+				 errhint("Use NONE to denote the missing 

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Andres Freund
Hi,

It's good to see focus on this - some improvements around s_b are sorely
needed.

On 2014-04-14 10:11:53 -0700, Peter Geoghegan wrote:
 1) Throttles incrementation of usage_count temporally. It becomes
 impossible to increment usage_count for any given buffer more
 frequently than every 3 seconds, while decrementing usage_count is
 totally unaffected.

I think this is unfortunately completely out of the question. For one, a
gettimeofday() for every buffer pin will become a significant performance
problem. Even the computation of the xact/stmt start/stop timestamps
shows up pretty heavily in profiles today - and they are far less
frequent than buffer pins. And that's on x86 Linux, where gettimeofday()
is implemented as something more lightweight than a full syscall.

The other significant problem I see with this is that it's not adaptive
to the actual throughput of buffers in s_b. In many cases there are
hundreds of clock cycles through shared buffers in 3 seconds. By only
increasing the usagecount that often you've destroyed the little
semblance of a working LRU there is right now.

It also wouldn't work well for situations with a fast-changing
workload > s_b. If you have frequent queries that take a second or so
and access some data repeatedly (index nodes or whatnot), only increasing
the usagecount once will mean they'll continually fall back to disk access.

 2) Has usage_count saturate at 10 (i.e. BM_MAX_USAGE_COUNT = 10), not
 5 as before. ... . This step on its own would be assumed extremely
 counter-productive by those in the know, but I believe that other
 measures ameliorate the downsides. I could be wrong about how true
 that is in other cases, but then the case helped here isn't what you'd
 call a narrow benchmark.

I don't see which of the mechanisms you've suggested counters this?

I think having more granular usagecount is a good idea, but I don't
think it can realistically be implemented with the current method of
choosing victim buffers. The amount of cacheline misses around that is
already a major scalability limit; we surely can't make this even
worse. I think it'd be possible to get back to this if we had a better
bgwriter implementation.

Greetings,

Andres Freund




Re: [HACKERS] The question about the type numeric

2014-04-16 Thread amulsul
But the sign is 0.
So is there anything wrong?
 
Have a look in src/backend/utils/adt/numeric.c, lines 154 & 155, where the
POS & NEG definitions are given as:
 154 #define NUMERIC_POS 0x0000
 155 #define NUMERIC_NEG 0x4000
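
So, paraphrasing the numeric.c macros, the sign test amounts to something
like this (a sketch; see NUMERIC_SIGN_MASK and friends in numeric.c for the
real thing):

#define NUMERIC_SIGN_MASK	0xC000
#define NUMERIC_POS			0x0000
#define NUMERIC_NEG			0x4000

/* a value is negative only when the NEG bits are set in the header word */
bool is_negative = ((num->choice.n_header & NUMERIC_SIGN_MASK) == NUMERIC_NEG);

A sign word of 0 therefore just means the value is positive (or zero),
which is expected, not a bug.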

Regards,
Amul Sul







Re: [HACKERS] ECPG FETCH readahead

2014-04-16 Thread Boszormenyi Zoltan

On 2014-04-16 10:54, Boszormenyi Zoltan wrote:

On 2014-01-18 18:08, Boszormenyi Zoltan wrote:

Hi,


Alvaro Herrera wrote:

Boszormenyi Zoltan wrote:

Rebased patches after the regression test and other details were fixed
in the infrastructure part.


This thread started in 2010, and various pieces have been applied
already and some others have changed in nature.  Would you please post a
new patchset, containing rebased patches that still need application, in
a new email thread to be linked in the commitfest entry?


I hope Thunderbird did the right thing and didn't include the reference
message ID when I told it to "edit as new". So supposedly this is a new
thread, but with all the cc: addresses kept.

I have rebased all remaining patches and kept the numbering.
All the patches from 18 to 25 that were previously in the
ECPG infrastructure CF link are included here.

There is no functional change.


Because of the recent bugfixes in the ECPG area, the patchset
didn't apply cleanly anymore. Rebased with no functional change.


The patches themselves are a little bigger, though. That's because they
are generated with 10 lines of context; this was set in my
$HOME/.gitconfig:

[diff]
context = 10
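
(The same effect can be had per-invocation, without touching the config,
with git diff -U10 or git format-patch -U10.)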

Best regards,
Zoltán Böszörményi





Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Peter Geoghegan
On Wed, Apr 16, 2014 at 12:53 AM, Andres Freund and...@2ndquadrant.com wrote:
 I think this is unfortunately completely out of the question. For one, a
 gettimeofday() for every buffer pin will become a significant performance
 problem. Even the computation of the xact/stmt start/stop timestamps
 shows up pretty heavily in profiles today - and they are far less
 frequent than buffer pins. And that's on x86 Linux, where gettimeofday()
 is implemented as something more lightweight than a full syscall.

Come on, Andres. Of course exactly what I've done here is completely
out of the question as a patch that we can go and commit right now.
I've noted numerous caveats about bloating the buffer descriptors, and about
it being a proof of concept. I'm pretty sure we can come up with a
scheme to significantly cut down on the number of gettimeofday() calls
if it comes down to it. In any case, I'm interested in advancing our
understanding of the problem right now. Let's leave the minutiae to
one side for the time being.

 The other significant problem I see with this is that it's not adaptive
 to the actual throughput of buffers in s_b. In many cases there are
 hundreds of clock cycles through shared buffers in 3 seconds. By only
 increasing the usagecount that often you've destroyed the little
 semblance of a working LRU there is right now.

If a usage_count can get to BM_MAX_USAGE_COUNT from its initial
allocation within an instant, that's bad. It's that simple. Consider
all the ways in which that can happen almost by accident.

You could probably reasonably argue that the trade-off, or lack of
adaptation (between an LRU and an LFU), that this particular sketch of
mine represents is inappropriate or sub-optimal, but I don't
understand why you're criticizing the patch for doing what I expressly
set out to do. I wrote that I think a very real problem may be that
approximating an LRU is bad because an actual LRU is bad.

 It also wouldn't work well for situations with a fast-changing
 workload > s_b. If you have frequent queries that take a second or so
 and access some data repeatedly (index nodes or whatnot), only increasing
 the usagecount once will mean they'll continually fall back to disk access.

No, it shouldn't, because there is a notion of buffers getting a fair
chance to prove themselves. Now, it might well be the case that there
are workloads where what I've done to make that happen in this
prototype doesn't work out too well - I've already said so. But should
a buffer get a usage count of 5 just because the user inserted 5
tuples within a single DML command, for example? If so, why?

-- 
Peter Geoghegan




Re: [HACKERS] Patch: iff - if

2014-04-16 Thread Andreas 'ads' Scherbaum

On 04/16/2014 12:19 AM, Andreas 'ads' Scherbaum wrote:


stumbled over a number of "iff" in the source where "if" is meant - not
sure what the real story behind this is, but attached is a patch to fix
the roughly 80 occurrences.


Looks like I missed something in my math lessons ...


--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Andres Freund
On 2014-04-16 01:58:23 -0700, Peter Geoghegan wrote:
 On Wed, Apr 16, 2014 at 12:53 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  I think this is unfortunately completely out of the question. For one, a
  gettimeofday() for every buffer pin will become a significant performance
  problem. Even the computation of the xact/stmt start/stop timestamps
  shows up pretty heavily in profiles today - and they are far less
  frequent than buffer pins. And that's on x86 Linux, where gettimeofday()
  is implemented as something more lightweight than a full syscall.
 
 Come on, Andres. Of course exactly what I've done here is completely
 out of the question as a patch that we can go and commit right now.
 I've noted numerous caveats about bloating the buffer descriptors, and about
 it being a proof of concept. I'm pretty sure we can come up with a
 scheme to significantly cut down on the number of gettimeofday() calls
 if it comes down to it. In any case, I'm interested in advancing our
 understanding of the problem right now. Let's leave the minutiae to
 one side for the time being.

*I* don't think any scheme that involves measuring the time around
buffer pins is going to be acceptable. It's better that I say that now
rather than when you've invested significant time into the approach, no?

  The other significant problem I see with this is that it's not adaptive
  to the actual throughput of buffers in s_b. In many cases there are
  hundreds of clock cycles through shared buffers in 3 seconds. By only
  increasing the usagecount that often you've destroyed the little
  semblance of a working LRU there is right now.
 
 If a usage_count can get to BM_MAX_USAGE_COUNT from its initial
 allocation within an instant, that's bad. It's that simple. Consider
 all the ways in which that can happen almost by accident.

Yes, I agree that that's a problem. It immediately going down to zero is
a problem as well, though. And that's what will happen in many scenarios,
because you have time limits on increasing the usagecount, but not on
decreasing it.

  It also wouldn't work well for situations with a fast changing
  workload  s_b. If you have frequent queries that take a second or so
  and access some data repeatedly (index nodes or whatnot) only increasing
  the usagecount once will mean they'll continually fall back to disk access.
 
 No, it shouldn't, because there is a notion of buffers getting a fair
 chance to prove themselves.

If you have a workload with > (BM_MAX_USAGE_COUNT + 1) clock
cycles/second, how does *any* buffer have a chance to prove itself?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Dynamic Background Workers and clean exit

2014-04-16 Thread Petr Jelinek

Hello,

I've recently been doing some work with dynamic bgworkers and noticed 
that I have no way of saying "I am done now and want to exit cleanly", 
because bgworkers get restarted automatically on exit code 0 no matter 
what the restart interval is set to.


I understand the rationale for this behavior with static bgworkers, 
which are meant to run forever, but dynamic ones are spawned by code, 
so they should also be able to terminate cleanly, as they presumably 
have finite work to do. Also, I think the clean shutdown of a dynamic 
bgworker should not be logged (or at least not at the same log level 
as it is now), since it's business as usual.


Since dynamic bgworkers are new in 9.4, the behavior can still be 
changed.


Thoughts?


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Peter Geoghegan
On Wed, Apr 16, 2014 at 2:18 AM, Andres Freund and...@2ndquadrant.com wrote:
 *I* don't think any scheme that involves measuring the time around
 buffer pins is going to be acceptable. It's better that I say that now
 rather than when you've invested significant time into the approach, no?

Well, I do think that it will be possible to make something like that
work. LRU-K/LRU-2 involves remembering the last two access times (not
just the last one). Researchers considered preeminent authorities on
caching algorithms thought that was a good idea in 1993. There are
plenty of other examples of similar schemes too.

 No, it shouldn't, because there is a notion of buffers getting a fair
 chance to prove themselves.

 If you have a workload with > (BM_MAX_USAGE_COUNT + 1) clock
 cycles/second, how does *any* buffer have a chance to prove itself?

There could be lots of ways. I thought about representing that more
directly. I don't think that it's useful to have a large number of
revolutions in search of a victim under any circumstances.
Fundamentally, you're asking "what if any scheme here leans too
heavily towards frequency?". That could certainly be a problem, as
I've said, and we could think about adaptation over heuristics, as
I've said, but it is very obviously a big problem that clock sweep
doesn't really care about frequency one bit right now.

Why should I be the one with all the answers? Aren't you interested in
the significance of the patch, and the test case?

-- 
Peter Geoghegan




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Andres Freund
Hi,

On 2014-04-16 02:57:54 -0700, Peter Geoghegan wrote:
 Why should I be the one with all the answers?

Who said you need to be? The only thing I am saying is that I don't
agree with some of your suggestions.

I only responded to the thread now because downthread (in
CAM3SWZQa2OAVUrfPL-df=we1smozkbr392sw_novukzepxh...@mail.gmail.com) you
further argued using the timestamp - which I think is a flawed
concept. So I thought it'd be fair to argue against it now, rather than
later.

 Aren't you interested in the significance of the patch, and the test case?

Not particularly in the specifics, to be honest. The tradeoffs of the
techniques you used there seem prohibitive to me. It's easy to make
individual cases faster by sacrificing others.
Sometimes it's useful to prototype solutions while narrowing the scope
for evaluation to get faster feedback, but I don't see these solutions
as applicable in the general case...

I think it's very important to improve upon the current state. It's imo
one of postgres' biggest issues. But it's also far from trivial,
otherwise it'd be done already.

I *personally* don't think it's very likely that we can improve
significantly upon the current state as long as every process regularly
participates in the clock sweep. ISTM that prevents many more elaborate
techniques from being used (cache misses/bus traffic, locking). But that's
just gut feeling.
I also think there are bigger issues than the actual LRU/whatever
behaviour, namely the scalability issues around shared buffers making
both small and big s_b settings major bottlenecks. But that's just where
I have seen more issues personally.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] BGWorkers, shared memory pointers, and postmaster restart

2014-04-16 Thread Craig Ringer
On 04/16/2014 02:37 PM, Craig Ringer wrote:
 Hi all
 
 I've been using the dynamic BGWorker support for some recent work, and I
 think I've found an issue with how postmaster restarts are handled.
 
 TL;DR: I don't think there's a safe way to use a BGWorker (static or
 dynamic) with bgw_restart_time != BGW_NEVER_RESTART and a bgw_main_arg
 Datum that points into shared memory, and I think we might need an API
 change to fix that.

Andres sensibly points out that part of this is easily solved by passing
the bgworker an index into an array in a named shmem block. A simple
pass-by-value Datum that can be turned into a pointer to a shmem struct.

This still doesn't solve the other half of the issue, which is how to
handle dynamic bgworkers after a postmaster restart. They're effectively
lost/leaked: there's no way to retain a bgworker handle across restart,
and no way to list bgworkers, nor is there any idempotent way to say
"start a worker to do x only if it doesn't already exist" (unique
names, magic cookie hashes, whatever).

With the current API the only solution to the second half that I see is
to have bgworkers run in non-restart mode and manage them yourself. Not
ideal.

Instead we need one of (a sketch of the first option follows the list):

- A flag like BGW_UNREGISTER_ON_RESTART;

- To always unregister dynamic bgws on postmaster shm clear + restart;

- A way to list bgws, inspect their BackgroundWorker structs and obtain
their handles; or

- A way to idempotently register a bgw only if it doesn't already exist
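
As a sketch, the first option would look something like this at
registration time (BGW_UNREGISTER_ON_RESTART is hypothetical; nothing like
it exists today):

	worker.bgw_flags = BGWORKER_SHMEM_ACCESS | BGW_UNREGISTER_ON_RESTART;
	worker.bgw_restart_time = 5;	/* restart after a crash of the worker... */
	RegisterDynamicBackgroundWorker(&worker, &handle);
	/* ...but the registration itself is dropped on postmaster restart */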





-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] BGWorkers, shared memory pointers, and postmaster restart

2014-04-16 Thread Andres Freund
On 2014-04-16 19:11:37 +0800, Craig Ringer wrote:
 On 04/16/2014 02:37 PM, Craig Ringer wrote:
  Hi all
  
  I've been using the dynamic BGWorker support for some recent work, and I
  think I've found an issue with how postmaster restarts are handled.
  
  TL;DR: I don't think there's a safe way to use a BGWorker (static or
  dynamic) with bgw_restart_time != BGW_NEVER_RESTART and a bgw_main_arg
  Datum that points into shared memory, and I think we might need an API
  change to fix that.
 
 Andres sensibly points out that part of this is easily solved by passing
 the bgworker an index into an array in a named shmem block. A simple
 pass-by-value Datum that can be turned into a pointer to a shmem struct.
 
 This still doesn't solve the other half of the issue, which is how to
 handle dynamic bgworkers after a postmaster restart. They're effectively
 lost/leaked: there's no way to retain a bgworker handle across restart,
 and no way to list bgworkers, nor is there any idempotent way to say
 "start a worker to do x only if it doesn't already exist" (unique
 names, magic cookie hashes, whatever).
 
 With the current API the only solution to the second half that I see is
 to have bgworkers run in non-restart mode and manage them yourself. Not
 ideal.
 
 Instead we need one of:
 
 - A flag like BGW_UNREGISTER_ON_RESTART;
 
 - To always unregister dynamic bgws on postmaster shm clear + restart;
 
 - A way to list bgws, inspect their BackgroundWorker structs and obtain
 their handles; or
 
 - A way to idempotently register a bgw only if it doesn't already exist

I think we should go for always unregistering dynamic bgws. There's
really little justification for keeping them around after a crash cycle.

While not the nicest place architecturally, it seems easy enough to do
in BackgroundWorkerShmemInit(), which happens to be called conveniently
in a crash/restart cycle...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Question about optimising (Postgres_)FDW

2014-04-16 Thread Etsuro Fujita

(2014/04/16 6:55), Hannu Krosing wrote:

--
CREATE EXTENSION postgres_fdw;

CREATE SERVER loop foreign data wrapper postgres_fdw
   OPTIONS (port '5432', dbname 'testdb');

CREATE USER MAPPING FOR PUBLIC SERVER loop;

create table onemillion (
 id serial primary key,
 inserted timestamp default clock_timestamp(),
 data text
);

insert into onemillion(data) select random() from
generate_series(1,1000000);

CREATE FOREIGN TABLE onemillion_pgfdw (
 id int,
 inserted timestamp,
 data text
) SERVER loop
OPTIONS (table_name 'onemillion',
  use_remote_estimate 'true');

testdb=# explain analyse
select * from onemillion_pgfdw where id in (select id from onemillion
where data > '0.9' limit 100);
                                   QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=122.49..10871.06 rows=50 width=44) (actual
time=4.269..93.444 rows=100 loops=1)
   ->  HashAggregate  (cost=22.06..23.06 rows=100 width=4) (actual
time=1.110..1.263 rows=100 loops=1)
         ->  Limit  (cost=0.00..20.81 rows=100 width=4) (actual
time=0.038..1.026 rows=100 loops=1)
               ->  Seq Scan on onemillion  (cost=0.00..20834.00
rows=100115 width=4) (actual time=0.036..0.984 rows=100 loops=1)
                     Filter: (data > '0.9'::text)
                     Rows Removed by Filter: 805
   ->  Foreign Scan on onemillion_pgfdw  (cost=100.43..108.47 rows=1
width=29) (actual time=0.772..0.773 rows=1 loops=100)
  Total runtime: 93.820 ms
(8 rows)

Time: 97.283 ms
--

... actually performs 100 distinct SELECT * FROM onemillion WHERE id =
$1 calls on the remote side.


Maybe I'm missing something, but I think that you can do what I think 
you'd like to do by the following procedure:


postgres=# ALTER SERVER loop OPTIONS (ADD fdw_startup_cost '1000');
ALTER SERVER
postgres=# EXPLAIN VERBOSE SELECT * FROM onemillion_pgsql WHERE id in 
(SELECT id FROM onemillion WHERE data > '0.9' LIMIT 100);

                                  QUERY PLAN
------------------------------------------------------------------------------
 Hash Semi Join  (cost=1023.10..41983.21 rows=100 width=30)
   Output: onemillion_pgsql.id, onemillion_pgsql.inserted, 
onemillion_pgsql.data
   Hash Cond: (onemillion_pgsql.id = onemillion.id)
   ->  Foreign Scan on public.onemillion_pgsql  (cost=1000.00..39334.00 
rows=100 width=29)
         Output: onemillion_pgsql.id, onemillion_pgsql.inserted, 
onemillion_pgsql.data
         Remote SQL: SELECT id, inserted, data FROM public.onemillion
   ->  Hash  (cost=21.85..21.85 rows=100 width=4)
         Output: onemillion.id
         ->  Limit  (cost=0.00..20.85 rows=100 width=4)
               Output: onemillion.id
               ->  Seq Scan on public.onemillion  (cost=0.00..20834.00 
rows=99918 width=4)
                     Output: onemillion.id
                     Filter: (onemillion.data > '0.9'::text)
 Planning time: 0.690 ms
(14 rows)

or, as Tom mentioned, by disabling the use_remote_estimate option:

postgres=# ALTER FOREIGN TABLE onemillion_pgsql OPTIONS (SET 
use_remote_estimate 'false');

ALTER FOREIGN TABLE
postgres=# EXPLAIN VERBOSE SELECT * FROM onemillion_pgsql WHERE id in 
(SELECT id FROM onemillion WHERE data > '0.9' LIMIT 100);

                                  QUERY PLAN
------------------------------------------------------------------------------
 Hash Semi Join  (cost=123.10..41083.21 rows=100 width=30)
   Output: onemillion_pgsql.id, onemillion_pgsql.inserted, 
onemillion_pgsql.data
   Hash Cond: (onemillion_pgsql.id = onemillion.id)
   ->  Foreign Scan on public.onemillion_pgsql  (cost=100.00..38434.00 
rows=100 width=30)
         Output: onemillion_pgsql.id, onemillion_pgsql.inserted, 
onemillion_pgsql.data
         Remote SQL: SELECT id, inserted, data FROM public.onemillion
   ->  Hash  (cost=21.85..21.85 rows=100 width=4)
         Output: onemillion.id
         ->  Limit  (cost=0.00..20.85 rows=100 width=4)
               Output: onemillion.id
               ->  Seq Scan on public.onemillion  (cost=0.00..20834.00 
rows=99918 width=4)
                     Output: onemillion.id
                     Filter: (onemillion.data > '0.9'::text)
 Planning time: 0.215 ms
(14 rows)

Thanks,

Best regards,
Etsuro Fujita




Re: [HACKERS] Request for Patch Feedback: Lag Lead Window Functions Can Ignore Nulls

2014-04-16 Thread Nicholas White
Thanks for the detailed feedback, I'm sorry it took so long to
incorporate it. I've attached the latest version of the patch, fixing
in particular:

 We have this block:
I've re-written this so it only does a single pass through the window
definitions (my patch originally added a second pass), and only does
the clone if required.

 In gram.y there are some spurious whitespaces at end-of-line.
Fixed - I didn't know about diff --check, it's very useful!

 Also, in parsenodes.h, you had the [MANDATORY] and such tags.
I've re-written the comments (without tags) to make them much easier to
understand. I agree they were ugly!

 Exactly what case does the "in this case" phrase refer to?
Clarified in the comments.

A style issue.  You have this:
Fixed

 And a final style comment.
Fixed

 Finally, I'm not really sure about the column you added to the regression 
 tests table.
Indeed, it was a bit artificial. I've re-written the tests to use a
separate table as you suggest.

Thanks -

Nick
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 0809a6d..5da852e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -13185,6 +13185,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
  lag(<replaceable class="parameter">value</replaceable> <type>any</type>
      [, <replaceable class="parameter">offset</replaceable> <type>integer</type>
      [, <replaceable class="parameter">default</replaceable> <type>any</type> ]])
+     [ { RESPECT | IGNORE } NULLS ]
    </function>
   </entry>
   <entry>
@@ -13199,7 +13200,9 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
    <replaceable class="parameter">default</replaceable> are evaluated
    with respect to the current row.  If omitted,
    <replaceable class="parameter">offset</replaceable> defaults to 1 and
-   <replaceable class="parameter">default</replaceable> to null
+   <replaceable class="parameter">default</replaceable> to null. If
+   <literal>IGNORE NULLS</> is specified then the function will be evaluated
+   as if the rows containing nulls didn't exist.
   </entry>
  </row>
 
@@ -13212,6 +13215,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
  lead(<replaceable class="parameter">value</replaceable> <type>any</type>
       [, <replaceable class="parameter">offset</replaceable> <type>integer</type>
       [, <replaceable class="parameter">default</replaceable> <type>any</type> ]])
+      [ { RESPECT | IGNORE } NULLS ]
    </function>
   </entry>
   <entry>
@@ -13226,7 +13230,9 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
    <replaceable class="parameter">default</replaceable> are evaluated
    with respect to the current row.  If omitted,
    <replaceable class="parameter">offset</replaceable> defaults to 1 and
-   <replaceable class="parameter">default</replaceable> to null
+   <replaceable class="parameter">default</replaceable> to null. If
+   <literal>IGNORE NULLS</> is specified then the function will be evaluated
+   as if the rows containing nulls didn't exist.
   </entry>
  </row>
 
@@ -13320,11 +13326,10 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
   <note>
    <para>
     The SQL standard defines a <literal>RESPECT NULLS</> or
-    <literal>IGNORE NULLS</> option for <function>lead</>, <function>lag</>,
-    <function>first_value</>, <function>last_value</>, and
-    <function>nth_value</>.  This is not implemented in
-    <productname>PostgreSQL</productname>: the behavior is always the
-    same as the standard's default, namely <literal>RESPECT NULLS</>.
+    <literal>IGNORE NULLS</> option for <function>first_value</>,
+    <function>last_value</>, and <function>nth_value</>.  This is not
+    implemented in <productname>PostgreSQL</productname>: the behavior is
+    always the same as the standard's default, namely <literal>RESPECT NULLS</>.
     Likewise, the standard's <literal>FROM FIRST</> or <literal>FROM LAST</>
     option for <function>nth_value</> is not implemented: only the
     default <literal>FROM FIRST</> behavior is supported.  (You can achieve
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 2fcc630..5cea825 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2431,7 +2431,6 @@ window_gettupleslot(WindowObject winobj, int64 pos, TupleTableSlot *slot)
  * API exposed to window functions
  ***/
 
-
 /*
  * WinGetPartitionLocalMemory
  *		Get working memory that lives till end of partition processing
@@ -2467,6 +2466,17 @@ WinGetCurrentPosition(WindowObject winobj)
 }
 
 /*
+ * WinGetFrameOptions
+ *		Returns the frame option flags
+ */
+int
+WinGetFrameOptions(WindowObject winobj)
+{
+	Assert(WindowObjectIsValid(winobj));
+	return winobj->winstate->frameOptions;
+}
+
+/*
  * WinGetPartitionRowCount
  *		Return total number of rows contained in the current partition.
  *
diff --git a/src/backend/parser/gram.y 

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Robert Haas
On Wed, Apr 16, 2014 at 3:22 AM, Peter Geoghegan p...@heroku.com wrote:
 It's possible that I've misunderstood what you mean here, but do you
 really think it's likely that everything will be hot, in the event of
 using something like what I've sketched here? I think it's an
 important measure against this general problem that buffers really
 earn the right to be considered hot, so to speak. With my prototype,
 in order for a buffer to become as hard to evict as possible, at a
 minimum it must be *continually* pinned for at least 30 seconds.
 That's actually a pretty tall order. Although, as I said, I wouldn't
 be surprised if it was worth making it possible for buffers to be even
 more difficult to evict than that. It should be extremely difficult to
 evict a root B-Tree page, and to a lesser extent inner pages even
 under a lot of cache pressure, for example. There are lots of
 workloads in which that can happen, and I have a hard time believing
 that it's worth it to evict them, given the extraordinary difference in
 their utility as compared to a lot of other things. I can imagine a
 huge barrier against evicting what is actually a relatively tiny
 number of pages being worth it.

I'm making a general statement about a property that I think a buffer
eviction algorithm ought to have.  I actually didn't say anything
about the algorithm you've chosen one way or the other.  Obviously,
you've built in some protections against everything becoming hot, and
that's a good thing as far as it goes.  But you also have a greatly
increased risk of everything becoming cold.  All you need is a rate of
buffer eviction that circles shared_buffers more often than once every
3 seconds, and everything will gradually cool down until you once
again can't distinguish which stuff is hot from which stuff isn't.

 I don't want to dismiss what you're saying about heating and cooling
 being unrelated, but I don't find the conclusion that "not everything
 can be hot" obvious. Maybe heat should be relative rather than
 absolute, and maybe that's actually what you meant. There is surely
 some workload where buffer access actually is perfectly uniform, and
 what do you do there? What temperature are those buffers?

Obviously, some value lower than the maximum and higher than the
minimum.  If they're all at max temperature and then a new buffer (a
btree root or vm page, for example) comes along and is much hotter,
there's no room on the scale left to express that.  If they're all at
min temperature and then a new buffer comes along that is just used
once and thrown out, there's no room left on the scale for that buffer
to emerge as a good candidate for eviction.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Ants Aasma
On Wed, Apr 16, 2014 at 7:44 AM, Robert Haas robertmh...@gmail.com wrote:
 I think that the basic problem here is that usage counts increase when
 buffers are referenced, but they decrease when buffers are evicted,
 and those two things are not in any necessary way connected to each
 other.  In particular, if no eviction is happening, reference counts
 will converge to the maximum value.  I've read a few papers about
 algorithms that attempt to segregate the list of buffers into "hot"
 and "cold" lists, and an important property of such algorithms is that
 they mustn't be allowed to make everything hot.  It's easy to be too
 simplistic, here: an algorithm that requires that no more than half
 the list be hot will fall over badly on a workload where the working
 set exceeds the available cache and the really hot portion of the
 working set is 60% of the available cache.  So you need a more
 sophisticated algorithm than that.  But that core property that not
 all buffers can be hot must somehow be preserved, and our algorithm
 doesn't.

FWIW in CLOCK-Pro segregating buffers between hot and cold is tied to
eviction and the clock sweep, the ratio between hot and cold is
dynamically adapted based on prior experience. The main downside is
that it seems to require an indirection somewhere either in the clock
sweep or buffer lookup. Maybe it's possible to avoid that with some
clever engineering if we think hard enough.

CLOCK-Pro may also have too little memory of hotness, making it too
easy to blow the whole cache away with a burst of activity. It may be
useful to have a (possibly tunable) notion of fairness where one
query/backend can't take over the cache even though it may be an
overall win in terms of total number of I/Os performed. Maybe we need
to invent Generalized CLOCK-Pro with a larger number of levels,
ranging from cold, hot and scalding to infernal. :)

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [HACKERS] BGWorkers, shared memory pointers, and postmaster restart

2014-04-16 Thread Craig Ringer
On 04/16/2014 07:21 PM, Andres Freund wrote:
 On 2014-04-16 19:11:37 +0800, Craig Ringer wrote:
 On 04/16/2014 02:37 PM, Craig Ringer wrote:
 Hi all

 I've been using the dynamic BGWorker support for some recent work, and I
 think I've found an issue with how postmaster restarts are handled.

 TL;DR: I don't think there's a safe way to use a BGWorker (static or
 dynamic) with bgw_restart_time != BGW_NEVER_RESTART and a bgw_main_arg
 Datum that points into shared memory, and I think we might need an API
 change to fix that.

 Andres sensibly points out that part of this is easily solved by passing
 the bgworker an index into an array in a named shmem block. A simple
 pass-by-value Datum that can be turned into a pointer to a shmem struct.

 This still doesn't solve the other half of the issue, which is how to
 handle dynamic bgworkers after a postmaster restart. They're effectively
 lost/leaked: there's no way to retain a bgworker handle across restart,
 and no way to list bgworkers, nor is there any idempotent way to say
 "start a worker to do x only if it doesn't already exist" (unique
 names, magic cookie hashes, whatever).

 With the current API the only solution to the second half that I see is
 to have bgworkers run in non-restart mode and manage them yourself. Not
 ideal.

 Instead we need one of:

 - A flag like BGW_UNREGISTER_ON_RESTART;

 - To always unregister dynamic bgws on postmaster shm clear + restart;

 - A way to list bgws, inspect their BackgroundWorker structs and obtain
 their handles; or

 - A way to idempotently register a bgw only if it doesn't already exist
 
 I think we should go for always unregistering dynamic bgws. There's
 really little justification for keeping them around after a crash cycle.

Seems simplest too, and extensions can reasonably expect to have to
relaunch things after postmaster restart. We just have to document
*where* they should do that, and that only dynamic bgworkers get cleared
on postmaster restart, not static ones.

I'd like to propose the following text as an addition to the comment in
bgworker.h:


 The bgw_main_arg passed to a bgworker should be a pass-by-value Datum
 (and not a pointer). A pointer to TopMemoryContext postmaster
 memory is permitted for static bgworkers, but won't work on
 EXEC_BACKEND platforms. If any nontrivial data must be passed to a
 bgworker, a named shared memory segment should be allocated to
 contain it. For multiple workers, the bgw_main_arg may be an index
 into an array of fixed-width structs in that shm segment.

 Note that shared memory is re-initialized if the postmaster restarts
 after a backend crash. Backends should always use a shared memory
 startup hook to initialize their shared memory so that it is
 re-initialized on postmaster restart.


I don't think there's any good example code on how to do this in-tree, and
it's too long for a sane comment. I wonder if I should add an example to
contrib/worker_spi/.
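
Something along these lines, perhaps (a rough sketch of the pattern
described above; MAX_WORKERS and the struct are illustrative):

	/* a named shmem block holding an array of per-worker slots */
	typedef struct MyWorkerSlot
	{
		int32		counter;		/* whatever per-worker state is needed */
	} MyWorkerSlot;

	static MyWorkerSlot *slots;

	static void
	my_shmem_startup(void)
	{
		bool		found;

		slots = ShmemInitStruct("my_ext worker slots",
								MAX_WORKERS * sizeof(MyWorkerSlot), &found);
		if (!found)
			memset(slots, 0, MAX_WORKERS * sizeof(MyWorkerSlot));
		/* runs again after a crash/restart cycle, re-initing the slots */
	}

	/* registration passes only the array index, never a pointer */
	worker.bgw_main_arg = Int32GetDatum(slot_index);

	/* ...and the worker recovers its slot from the index */
	MyWorkerSlot *my_slot = &slots[DatumGetInt32(main_arg)];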

There's no race anywhere, btw: bgworkers are killed, then shm is cleared,
then shm callbacks are rerun, then bgworkers are launched. So long as
shm is reinited appropriately by an extension, static bgworkers that
refer to that shm should be OK.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Merlin Moncure
On Tue, Apr 15, 2014 at 11:27 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Wed, Apr 16, 2014 at 5:00 AM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Apr 15, 2014 at 3:59 PM, Ants Aasma a...@cybertec.at wrote:
 There's a paper on a non blocking GCLOCK algorithm, that does lock
 free clock sweep and buffer pinning[7]. If we decide to stay with
 GCLOCK it may be interesting, although I still believe that some
 variant of buffer nailing[8] is a better idea, my experience shows
 that most of the locking overhead is cache line bouncing ignoring the
 extreme cases where our naive spinlock implementation blows up.

 You might be right about that, but lets handle one problem at a time.
 Who knows what the bottleneck will end up being if and when we address
 the naivety around frequency? I want to better characterize that
 problem first.

 Just to summarize you about the previous discussion and the
 improvements that we decided to do in this area based on feedback
 are as follows:

 1. Bgwriter needs to be improved so that it can help in reducing
 usage count and finding next victim buffer (run the clock sweep
 and add buffers to the free list).
 2. SetLatch for bgwriter (wakeup bgwriter) when elements in freelist
 are less.
 3. Split the workdone globallock (Buffreelist) in StrategyGetBuffer
 (a spinlock for the freelist, and an lwlock for the clock sweep).
 Here we can try to make it lock free based on atomic ops as
 well.
 4. Bgwriter needs to be more aggressive, logic based on which it
 calculates how many buffers it needs to process needs to be
 improved.
 5. Contention around buffer mapping locks.
 6. Cacheline bouncing around the buffer header spinlocks, is there
 anything we can do to reduce this?
 7. Choose Optimistically used buffer in StrategyGetBuffer().
 8. Don't bump the usage count every time buffer is pinned.

What about:  9. Don't wait on locked buffer in the clock sweep.

merlin




Re: [HACKERS] bgworker crashed or not?

2014-04-16 Thread Andres Freund
On 2014-02-03 11:29:22 -0500, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Mon, Feb 3, 2014 at 10:20 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  So
  exit(0) - done, permanently
  exit(1) - done until restart interval
  exit(other) - crash
  and there's no way to obtain the "restart immediately" behavior?
 
  That's what I was thinking about, yes.  Of course, when the restart
  interval is 0, "done until restart interval" is the same as "restart
  immediately", so for anyone who wants to *always* restart immediately
  there is no problem.  Where you will run into trouble is if you
  sometimes want to wait for the restart interval and other times want
  to restart immediately.  But I'm not sure that's a real use case.  If
  it is, I suggest that we assign it some other, more obscure exit code
  and reserve 0 and 1 for what I believe will probably be the common
  cases.
 
 Agreed, but after further reflection it seems like if you've declared
 a restart interval, then "done until restart interval" is probably the
 common case.  So how about
 
 exit(0) - done until restart interval, or permanently if there is none
 exit(1) - done permanently, even if a restart interval was declared
 exit(other) - crash
 
 I don't offhand see a point in an "exit and restart immediately" case.
 Why exit at all, if you could just keep running?  If you *can't* just
 keep running, it probably means you know you've bollixed something,
 so that the crash case is probably what to do anyway.

There's the case where you want to quickly go over all the databases,
but only use one bgworker for it. I don't think there's another way to
do that.
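
For reference, Tom's proposal above would give workers this convention (a
sketch, not committed behavior):

	proc_exit(0);	/* done until bgw_restart_time, or permanently if none */
	proc_exit(1);	/* done permanently, even with a restart interval */
	proc_exit(2);	/* any other code: treated as a crash */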

I think we really should bite the bullet and change this before 9.4
comes out. The current static bgworker facility has only been out there
for one release, and dynamic bgworkers aren't released yet at all. If we
wait on this until 9.5, we'll annoy many more people.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Andres Freund
On 2014-04-16 07:55:44 -0500, Merlin Moncure wrote:
  1. Bgwriter needs to be improved so that it can help in reducing
  usage count and finding next victim buffer (run the clock sweep
  and add buffers to the free list).
  2. SetLatch for bgwriter (wakeup bgwriter) when elements in freelist
  are less.
  3. Split the work done under the global lock (BufFreelistLock) in
  StrategyGetBuffer (a spinlock for the freelist, and an LWLock for the
  clock sweep). Here we can try to make it lock-free based on atomic ops
  as well.
  4. Bgwriter needs to be more aggressive; the logic by which it
  calculates how many buffers it needs to process needs to be
  improved.
  5. Contention around buffer mapping locks.
  6. Cacheline bouncing around the buffer header spinlocks: is there
  anything we can do to reduce this?
  7. Optimistically choose the next victim buffer in StrategyGetBuffer().
  8. Don't bump the usage count every time buffer is pinned.
 
 What about:  9. Don't wait on locked buffer in the clock sweep.

I don't think we do that? Or are you referring to locked buffer headers?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Question about optimising (Postgres_)FDW

2014-04-16 Thread Hannu Krosing
On 04/16/2014 01:35 PM, Etsuro Fujita wrote:
 (2014/04/16 6:55), Hannu Krosing wrote:
...

 Maybe I'm missing something, but I think that you can do what I think
 you'd like to do by the following procedure:
No, what I'd like PostgreSQL to do is to

1. select the id+set from local table
2. select the rows from remote table with WHERE ID IN (set selected in
step 1)
3. then join the original set to selected set, with any suitable join
strategy

The things I do not want are

A. selecting all rows from remote table
(this is what your examples below do)

or

B. selecting rows from remote table by single selects using ID = $
(this is something that I managed to do by some tweaking of costs)

as A will be always slow if there are millions of rows in remote table
and B is slow(ish) when the idset is over a few hundred ids

I hope this is a bit better explanation than the one I provided before.

Cheers
Hannu

P.S. I am not sure if this is a limitation of postgres_fdw or postgres
itself

P.P.S. I tested a little with Multicorn, and PostgreSQL did not
request row
counts for any IN plans, so it may be that the planner does not consider
this
kind of plan at all. (Testing was on PgSQL 9.3.4.)

Hannu

 postgres=# ALTER SERVER loop OPTIONS (ADD fdw_startup_cost '1000');
 ALTER SERVER
 postgres=# EXPLAIN VERBOSE SELECT * FROM onemillion_pgsql WHERE id in
 (SELECT id FROM onemillion WHERE data > '0.9' LIMIT 100);
   QUERY PLAN
 ---

  Hash Semi Join  (cost=1023.10..41983.21 rows=100 width=30)
Output: onemillion_pgsql.id, onemillion_pgsql.inserted,
 onemillion_pgsql.data
Hash Cond: (onemillion_pgsql.id = onemillion.id)
    ->  Foreign Scan on public.onemillion_pgsql
 (cost=1000.00..39334.00 rows=100 width=29)
  Output: onemillion_pgsql.id, onemillion_pgsql.inserted,
 onemillion_pgsql.data
  Remote SQL: SELECT id, inserted, data FROM public.onemillion
->  Hash  (cost=21.85..21.85 rows=100 width=4)
  Output: onemillion.id
  ->  Limit  (cost=0.00..20.85 rows=100 width=4)
Output: onemillion.id
->  Seq Scan on public.onemillion  (cost=0.00..20834.00
 rows=99918 width=4)
  Output: onemillion.id
  Filter: (onemillion.data > '0.9'::text)
  Planning time: 0.690 ms
 (14 rows)

 or, as Tom mentioned, by disabling the use_remote_estimate option:

 postgres=# ALTER FOREIGN TABLE onemillion_pgsql OPTIONS (SET
 use_remote_estimate 'false');
 ALTER FOREIGN TABLE
 postgres=# EXPLAIN VERBOSE SELECT * FROM onemillion_pgsql WHERE id in
 (SELECT id FROM onemillion WHERE data > '0.9' LIMIT 100);
   QUERY PLAN
 --

  Hash Semi Join  (cost=123.10..41083.21 rows=100 width=30)
Output: onemillion_pgsql.id, onemillion_pgsql.inserted,
 onemillion_pgsql.data
Hash Cond: (onemillion_pgsql.id = onemillion.id)
    ->  Foreign Scan on public.onemillion_pgsql  (cost=100.00..38434.00
 rows=100 width=30)
  Output: onemillion_pgsql.id, onemillion_pgsql.inserted,
 onemillion_pgsql.data
  Remote SQL: SELECT id, inserted, data FROM public.onemillion
->  Hash  (cost=21.85..21.85 rows=100 width=4)
  Output: onemillion.id
  ->  Limit  (cost=0.00..20.85 rows=100 width=4)
Output: onemillion.id
->  Seq Scan on public.onemillion  (cost=0.00..20834.00
 rows=99918 width=4)
  Output: onemillion.id
  Filter: (onemillion.data > '0.9'::text)
  Planning time: 0.215 ms
 (14 rows)

 Thanks,

 Best regards,
 Etsuro Fujita







Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Merlin Moncure
On Tue, Apr 15, 2014 at 11:44 PM, Robert Haas robertmh...@gmail.com wrote:
 I think that the basic problem here is that usage counts increase when
 buffers are referenced, but they decrease when buffers are evicted,
 and those two things are not in any necessary way connected to each
 other.  In particular, if no eviction is happening, reference counts
 will converge to the maximum value.  I've read a few papers about
 algorithms that attempt to segregate the list of buffers into hot
 and cold lists, and an important property of such algorithms is that
 they mustn't be allowed to make everything hot.  It's easy to be too
 simplistic, here: an algorithm that requires that no more than half
 the list be hot will fall over badly on a workload where the working
 set exceeds the available cache and the really hot portion of the
 working set is 60% of the available cache.  So you need a more
 sophisticated algorithm than that.  But that core property that not
 all buffers can be hot must somehow be preserved, and our algorithm
 doesn't.

A while back you sketched out an idea that did something like that:
hotly accessed buffers became 'perma-pinned' such that they no longer
participated in the clock sweep for eviction and there was a side-line
process that did a two stage eviction (IIRC) from the super hot stack
in order to mitigate locking.  This idea had a couple of nice
properties:

1) very hot buffers no longer get refcounted, reducing spinlock
contention (which has been documented in real world workloads)
2) the eviction loop shrinks.  Although you still have to check the 'very
hot' flag, that's an unlocked check (again, IIRC) and no further
processing is done.

The downside of this approach was complexity, and edge cases that are
difficult to test for.  I would like to point out though that while I/O
efficiency gains are nice, I think contention issues are the bigger
fish to fry.

On Mon, Apr 14, 2014 at 12:11 PM, Peter Geoghegan p...@heroku.com wrote:
 1) Throttles incrementation of usage_count temporally. It becomes
 impossible to increment usage_count for any given buffer more
 frequently than every 3 seconds, while decrementing usage_count is
 totally unaffected.

hm, that's expensive.  how about a heuristic based on the number of
buffer allocations and the size of the buffer pool?

On Wed, Apr 16, 2014 at 8:14 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-04-16 07:55:44 -0500, Merlin Moncure wrote:
 What about:  9. Don't wait on locked buffer in the clock sweep.

 I don't think we do that? Or are you referring to locked buffer headers?

Right -- exactly.  I posted a patch for this a while back. It's quite
trivial: implement a trylock variant of the buffer header lock macro
and further guard the check with a non-locking test (which TAS()
already does generally, but the idea is to avoid the cache line lock
in likely cases of contention).  I believe this to be unambiguously
better: even if it's self-healing or unlikely, there is no good reason
to jump into a spinlock fray or even request a contended cache line
while holding a critical lock.
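
For the record, a minimal standalone sketch of that shape, using C11
atomics as a stand-in for the real slock_t machinery (BufferHeader and
buf_trylock are invented names for illustration, not the macros from
the actual patch):

#include <stdatomic.h>
#include <stdbool.h>

typedef struct BufferHeader
{
	atomic_int	locked;			/* stand-in for the header spinlock */
	int			usage_count;
} BufferHeader;

/*
 * Non-locking pre-check first, so we don't request the cache line in
 * exclusive mode when the lock already looks busy; then a single
 * test-and-set.  Returns true if we got the lock.
 */
static bool
buf_trylock(BufferHeader *buf)
{
	if (atomic_load_explicit(&buf->locked, memory_order_relaxed) != 0)
		return false;
	return atomic_exchange_explicit(&buf->locked, 1,
									memory_order_acquire) == 0;
}

static void
buf_unlock(BufferHeader *buf)
{
	atomic_store_explicit(&buf->locked, 0, memory_order_release);
}

The sweep loop would then do "if (!buf_trylock(buf)) continue;" and
move on to the next buffer instead of spinning on a contended header.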

merlin




Re: [HACKERS] Need Multixact Freezing Docs

2014-04-16 Thread Alvaro Herrera
Josh Berkus wrote:
 On 04/15/2014 02:25 PM, Josh Berkus wrote:
  Hackers,
  
  We need documentation on how users should intelligently set the
  multixact freeze settings.  I'm happy to write the actual text, but I
  definitely don't have any idea how to set these myself.  Under what
  circumstances should they be different from freeze_max_age?  How?

Measure the consumption rate of multixacts, compare it to the
consumption rate of xids, and set the freeze ages so that they are
reached more or less at the same time, so that freezing for either of
them also freezes the other one.  For example, if multixacts are
consumed at half the rate of xids, set the multixact ages to half the
xid values.  You need to set both table_freeze_ages to values that
would be reached later than either min_freeze_age is reached, if you
get what I mean.  The idea is that a full scan of a table fixes both
things at once, saving a followup full scan shortly after the first one.

You can see the current multixact value in pg_controldata output.  Keep
timestamped values of that somewhere (a table?) so that you can measure
consumption rate.  I don't think we provide SQL-level access to those
values.

 Also: how do I check the multixact age of a table?  There doesn't seem
 to be any data for this ...

pg_class.relminmxid is the oldest multixact value that might be present
in a table.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [BUG FIX] Compare returned value by socket() against PGINVALID_SOCKET instead of 0

2014-04-16 Thread Alvaro Herrera
Bruce Momjian wrote:
 
 Yes, I saw that yesterday and fixed it.  I also did a dry run of
 backpatching and only 8.4 had conflicts, so I think we are good there.
 (This is like the readdir() fix all over again.)
 
 Once this is applied I will work on changing the libpq socket type to
 use portable pgsocket, but I am not planning to backpatch that unless we
 find a bug.

Are we done with this patch?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] bgworker crashed or not?

2014-04-16 Thread Petr Jelinek

On 16/04/14 15:10, Andres Freund wrote:

 I think we really should bite the bullet and change this before 9.4
 comes out. The current static bgworker facility has only been out there
 for one release, and dynamic bgworkers aren't released yet at all. If we
 wait until 9.5 for this, we'll annoy many more people.

+1

 On 2014-02-03 11:29:22 -0500, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Feb 3, 2014 at 10:20 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 So
 exit(0) -> done, permanently
 exit(1) -> done until restart interval
 exit(other) -> crash
 and there's no way to obtain the restart immediately behavior?

Also, I think if we do it this way the incompatibility impact is rather
small for most existing bgworkers; like Robert, I haven't seen anybody
actually using exit code 0 currently. I am sure somebody does, but it
seems to be a very small minority.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Andres Freund
On 2014-04-16 08:25:23 -0500, Merlin Moncure wrote:
 The downside of this approach was complexity and difficult to test for
 edge case complexity.  I would like to point out though that while i/o
 efficiency gains are nice, I think contention issues are the bigger
 fish to fry.

That's my feeling as well.

 
 On Wed, Apr 16, 2014 at 8:14 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2014-04-16 07:55:44 -0500, Merlin Moncure wrote:
  What about:  9. Don't wait on locked buffer in the clock sweep.
 
  I don't think we do that? Or are you referring to locked buffer headers?
 
 Right -- exactly.  I posted a patch for this a while back. It's quite
 trivial: implement a trylock variant of the buffer header lock macro
 and further guard the check with a non-locking test (which TAS()
 already does generally, but the idea is to avoid the cache line lock
 in likely cases of contention).  I believe this to be unambiguously
 better: even if it's self-healing or unlikely, there is no good reason
 to jump into a spinlock fray or even request a contended cache line
 while holding a critical lock.

IIRC you had problems proving the benefits of that, right?

I think that's because the locking times of buffer headers are short
enough that it's really unlikely to read a locked buffer header
spinlock. The spinlock acquisition will have made the locker the
exclusive owner of the spinlock's cacheline in the majority of cases,
and as soon as that happens the cache miss/transfer will take far
longer than the lock is held.

I think this is the wrong level to optimize things. Imo there's two
possible solutions (that don't exclude each other):

* perform the clock sweep in one process so there's a very fast way to
  get to a free buffer. Possibly in a partitioned way.

* Don't take a global exclusive lock while performing the clock
  sweep. Instead increase StrategyControl->nextVictimBuffer in chunks
  under an exclusive lock, and then scan the potential victim buffers in
  those chunks without a global lock held.
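
A minimal sketch of that second idea, with a pthread mutex standing in
for the real strategy lock and invented names (StrategyCtl,
claim_sweep_chunk) rather than the actual bufmgr definitions:

#include <pthread.h>

#define NBUFFERS	16384
#define SWEEP_CHUNK	16

typedef struct StrategyCtl
{
	pthread_mutex_t lock;		/* stand-in for the global lock */
	int			nextVictimBuffer;
} StrategyCtl;

/*
 * Claim SWEEP_CHUNK consecutive clock positions under the lock; the
 * caller then examines the usage counts of those buffers with no
 * global lock held, and only comes back here for the next chunk.
 */
static int
claim_sweep_chunk(StrategyCtl *ctl)
{
	int			start;

	pthread_mutex_lock(&ctl->lock);
	start = ctl->nextVictimBuffer;
	ctl->nextVictimBuffer = (start + SWEEP_CHUNK) % NBUFFERS;
	pthread_mutex_unlock(&ctl->lock);
	return start;
}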

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [BUG FIX] Compare returned value by socket() against PGINVALID_SOCKET instead of 0

2014-04-16 Thread Bruce Momjian
On Wed, Apr 16, 2014 at 10:34:55AM -0300, Alvaro Herrera wrote:
 Bruce Momjian wrote:
  
  Yes, I saw that yesterday and fixed it.  I also did a dry run of
  backpatching and only 8.4 had conflicts, so I think we are good there.
  (This is like the readdir() fix all over again.)
  
  Once this is applied I will work on changing the libpq socket type to
  use portable pgsocket, but I am not planning to backpatch that unless we
  find a bug.
 
 Are we done with this patch?

I am about to patch it now.   I was going to do it yesterday but was
concerned I wouldn't be online long enough to catch any buildfarm
failure.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] test failure on latest source

2014-04-16 Thread Marco Atzeri

On 13/04/2014 18:09, Tom Lane wrote:

Andres Freund and...@2ndquadrant.com writes:

On 2014-04-12 16:35:48 -0400, Tom Lane wrote:

In principle, that commit shouldn't have affected behavior for pg_hba
entries with numeric address fields ...



Hm. getaddrinfo.c has this bit:
/* Unsupported flags. */
if (flags & NI_NAMEREQD)
return EAI_AGAIN;


Yeah, and that flag is only ever specified when attempting to do reverse
lookup on a client address to see if it matches a non-numeric pg_hba
entry.

regards, tom lane




just to recap, as I think no one has yet proposed/implemented
a solution.

first failure I see on cygwin is from
-
$ git log |head
commit fc752505a99a4e2c781a070d3d42a25289c22e3c
Author: Tom Lane t...@sss.pgh.pa.us
Date:   Wed Apr 2 17:11:24 2014 -0400

Fix assorted issues in client host name lookup.
[cut]
---

previous one is fine
--
commit f33a71a7865a1dd54f04b370e2637f88665f8db8
Author: Tom Lane t...@sss.pgh.pa.us
Date:   Wed Apr 2 14:30:08 2014 -0400

De-anonymize the union in JsonbValue.

Needed for strict C89 compliance.


error log

cat 
/pub/devel/postgresql/prova/build_orig/src/test/regress/log/postmaster.log
LOG:  could not resolve localhost: Non-recoverable failure in name 
resolution

LOG:  disabling statistics collector for lack of working socket
WARNING:  autovacuum not started because of misconfiguration
HINT:  Enable the track_counts option.
LOG:  invalid IP address 127.0.0.1: Non-recoverable failure in name 
resolution
CONTEXT:  line 86 of configuration file 
/pub/devel/postgresql/prova/build_orig/src/test/regress/./tmp_check/data/pg_hba.conf

FATAL:  could not load pg_hba.conf
---

and of course localhost and 127.0.0.1 are valid

 $ ping localhost

Pinging GE-MATZERI-EU [127.0.0.1] with 32 bytes of data:
Reply from 127.0.0.1: bytes=32 time<1ms TTL=128
[cut]

 $ ping 127.0.0.1

Pinging 127.0.0.1 with 32 bytes of data:
Reply from 127.0.0.1: bytes=32 time<1ms TTL=128
Reply from 127.0.0.1: bytes=32 time<1ms TTL=128






Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Robert Haas
On Wed, Apr 16, 2014 at 9:35 AM, Andres Freund and...@2ndquadrant.com wrote:
 I think this is the wrong level to optimize things. Imo there's two
 possible solutions (that don't exclude each other):

 * perform the clock sweep in one process so there's a very fast way to
   get to a free buffer. Possibly in a partitioned way.

 * Don't take a global exclusive lock while performing the clock
   sweep. Instead increase StrategyControl-nextVictimBuffer in chunks
   under an exclusive lock, and then scan the potential victim buffers in
   those chunks without a global lock held.

I definitely agree with both of these ideas.  But isn't it sort of
off-topic for this thread?  There are two issues here:

1. Improving the rate at which we can evict buffers, which is what
you're talking about here.

2. Improving the choice of which buffers we evict, which is what
Peter's talking about, or at least what I think he's talking about.

Those things are both important, but they're different, and I'm not
sure that working on one precludes working on the other.  There's
certainly the potential for overlap, but not necessarily.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] bgworker crashed or not?

2014-04-16 Thread Robert Haas
On Wed, Apr 16, 2014 at 9:10 AM, Andres Freund and...@2ndquadrant.com wrote:
 Agreed, but after further reflection it seems like if you've declared
 a restart interval, then done until restart interval is probably the
 common case.  So how about

 exit(0) -> done until restart interval, or permanently if there is none
 exit(1) -> done permanently, even if a restart interval was declared
 exit(other) -> crash

 I don't offhand see a point in an exit and restart immediately case.
 Why exit at all, if you could just keep running?  If you *can't* just
 keep running, it probably means you know you've bollixed something,
 so that the crash case is probably what to do anyway.

 There's the case where you want to quickly go over all the databases,
 but only use one bgworker for it. I don't think there's another way to
 do that.

 I think we really should bite the bullet and change this before 9.4
 comes out. The current static bgworker facility has only been out there
 for one release, and dynamic bgworkers aren't released yet at all. If we
 wait with this for 9.5, we'll annoy many more people.

So, exactly what do you want to change?  If you want to keep the
restart-immediately behavior, then that argues for NOT changing
exit(0).

I actually think the right answer here might be to give background
workers a way to change their configured restart interval.  We've
already got a shared memory area that the postmaster reads to know
what to do when starting a dynamic background worker, so it probably
wouldn't be that hard.  But I'm not volunteering to undertake that on
April 16th.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-04-16 Thread Robert Haas
On Tue, Apr 15, 2014 at 2:52 AM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 Hello, thank you for the discussion.

 At Tue, 1 Apr 2014 11:41:20 -0400, Robert Haas wrote
 I don't find that very radical at all.  The backup_label file is
 *supposed* to be removed on the master if it crashes during the
 backup; and it should never be removed from the backup itself.  At
 least that's how I understand it.  Unfortunately, people too often

 The code indeed seems to assume that, and I couldn't think of any
 measure to avoid that dead end once the recovery sequence reads a
 backup label accidentally left behind. I thought of removing the
 backup label during immediate shutdown on previous versions, like
 9.4 does.

 CancelBackup only does a stat-unlink-rename sequence, so I think
 this doesn't obstruct the immediate shutdown sequence. And this
 doesn't change any visible behavior or interfaces except for
 this case. What do you think about this? Isn't this also
 applicable to older versions?

I don't think we should consider changing long-established behavior in
the back-branches.  The old behavior may not be ideal, but that
doesn't mean it's a bug.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Andres Freund
On 2014-04-16 10:29:29 -0400, Robert Haas wrote:
 On Wed, Apr 16, 2014 at 9:35 AM, Andres Freund and...@2ndquadrant.com wrote:
  I think this is the wrong level to optimize things. Imo there's two
  possible solutions (that don't exclude each other):
 
  * perform the clock sweep in one process so there's a very fast way to
get to a free buffer. Possibly in a partitioned way.
 
  * Don't take a global exclusive lock while performing the clock
    sweep. Instead increase StrategyControl->nextVictimBuffer in chunks
under an exclusive lock, and then scan the potential victim buffers in
those chunks without a global lock held.
 
 I definitely agree with both of these ideas.  But isn't it sort of
 off-topic for this thread?

Yes, I agree it's somewhat offtopic - I only started on it (I think)
because Merlin commented on it. But I also agree with Merlin's that
comment at the moment that the scalability issues (concurrency and size
of shared buffers). If you can't use a large enough s_b to contain a
significant portion of your workload, you're relying on the OS cache
anyway.

 1. Improving the rate at which we can evict buffers, which is what
 you're talking about here.
 
 2. Improving the choice of which buffers we evict, which is what
 Peter's talking about, or at least what I think he's talking about.
 
 Those things are both important, but they're different, and I'm not
 sure that working on one precludes working on the other.  There's
 certainly the potential for overlap, but not necessarily.

I don't think that they necessarily preclude each other
either. But my gut feeling tells me that it'll be hard to have
interesting algorithmic improvements on the buffer eviction choice
because any additional complexity around that will have prohibitively
high scalability impacts due to the coarse locking.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [BUG FIX] Compare returned value by socket() against PGINVALID_SOCKET instead of 0

2014-04-16 Thread Bruce Momjian
On Fri, Apr 11, 2014 at 08:28:55AM -0400, Bruce Momjian wrote:
 On Fri, Apr 11, 2014 at 10:03:08AM +0530, Amit Kapila wrote:
  On Fri, Apr 11, 2014 at 10:00 AM, Amit Kapila amit.kapil...@gmail.com 
  wrote:
   On Thu, Apr 10, 2014 at 5:21 PM, Bruce Momjian br...@momjian.us wrote:
   On Thu, Apr 10, 2014 at 11:05:49AM +0530, Amit Kapila wrote:
  
   Ah, yes, good point.  This is going to require backpatching then.
  
   I also think so.
  
   I think it's better to use check like below, just for matter of
   consistency with other place
   if (sock == INVALID_SOCKET)
  
   Agreed.  That is how I have coded the patch.
  
   Sorry, I didn't check the latest patch before that comment.
  
   I verified that your last patch is fine.  Regression test also went fine.
  
  I have noticed a small thing which I forgot to mention in my previous
  mail.  I think the extra line added below is not required.
  
int
PQsocket(const PGconn *conn)
{
  +
 
 Yes, I saw that yesterday and fixed it.  I also did a dry run of
 backpatching and only 8.4 had conflicts, so I think we are good there.
 (This is like the readdir() fix all over again.)

Patch applied back through 9.0.  8.4 didn't have the infrastructure for
a proper fix.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Andres Freund
Hi,


Stephen flagged an ENOPARSE:

On 2014-04-16 16:49:55 +0200, Andres Freund wrote:
 But I also agree with Merlin's that comment at the moment that the
 scalability issues (concurrency and size of shared buffers).

That should have been:

But I also agree with Merlin's comment that at the moment the
scalability issues are bigger than the cache eviction choices.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] The question about the type numeric

2014-04-16 Thread Robert Haas
On Tue, Apr 15, 2014 at 5:37 AM, sure.postgres sure.postg...@gmail.com wrote:
 Hi hackers,

 I am learning about numeric.
 The comment of NumericShort format is:
  * In the NumericShort format, the remaining 14 bits of the header word
  * (n_short.n_header) are allocated as follows: 1 for sign (positive or
  * negative), 6 for dynamic scale, and 7 for weight.  In practice, most
  * commonly-encountered values can be represented this way.

 So the max of the NumericShort format should be up to 508 digits before the
 decimal point.
 So the sign of the number 12345678901234567890123456789012345678901234567890
 12345678901234567890123456789012345678901234567890123456789012345678901234567890
 12345678901234567890123456789012345678901234567890123456789012345678901234567890
 12345678901234567890123456789012345678901234567
 should be 0x807F.
 The number has 257 digits before the decimal point.
 But the sign is 0.
 So is there anything wrong?

I'm not sure I understand the question, but if it helps, the sign bit
will be set (1) for negative values and clear (0) for positive values.
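
To make the layout described in that comment concrete, here is a hedged
sketch of decoding the 14 low bits of the header word. The mask values
follow the stated 1/6/7-bit split and are illustrative; check the real
definitions in numeric.c rather than relying on these:

#include <stdbool.h>
#include <stdint.h>

#define SHORT_SIGN_MASK		0x2000	/* bit 13: 1 = negative */
#define SHORT_DSCALE_MASK	0x1F80	/* bits 7-12: dynamic scale */
#define SHORT_DSCALE_SHIFT	7
#define SHORT_WEIGHT_MASK	0x007F	/* bits 0-6: weight */

static bool
numeric_short_is_negative(uint16_t header)
{
	return (header & SHORT_SIGN_MASK) != 0;
}

static int
numeric_short_dscale(uint16_t header)
{
	return (header & SHORT_DSCALE_MASK) >> SHORT_DSCALE_SHIFT;
}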

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] test failure on latest source

2014-04-16 Thread Alvaro Herrera
Marco Atzeri wrote:
 On 13/04/2014 18:09, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
 On 2014-04-12 16:35:48 -0400, Tom Lane wrote:
 In principle, that commit shouldn't have affected behavior for pg_hba
 entries with numeric address fields ...
 
 Hm. getaddrinfo.c has this bit:
 /* Unsupported flags. */
 if (flags & NI_NAMEREQD)
 return EAI_AGAIN;
 
 Yeah, and that flag is only ever specified when attempting to do reverse
 lookup on a client address to see if it matches a non-numeric pg_hba
 entry.

I don't know if this is relevant, but perhaps we're defining the
constants in a way that conflicts with the values defined by cygwin.  A
very quick search finds a 2007 patch for Mutt[1] that seems to have
NI_NAMEREQD defined as 8 somewhere, while 4 is NI_NOFQDN.  But we have
this in getaddrinfo.h:

#ifndef NI_NAMEREQD
#define NI_NAMEREQD 4
#endif

So maybe we're doing something wrong.  Indeed, my system has in
/usr/include/netdb.h

# define NI_NAMEREQD    8   /* Don't return numeric addresses.  */

You'd do well to research this more, I think.

[1] http://marc.info/?l=mutt-devm=117752314512877w=2

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] bgworker crashed or not?

2014-04-16 Thread Andres Freund
On 2014-04-16 10:37:12 -0400, Robert Haas wrote:
 On Wed, Apr 16, 2014 at 9:10 AM, Andres Freund and...@2ndquadrant.com wrote:
  Agreed, but after further reflection it seems like if you've declared
  a restart interval, then done until restart interval is probably the
  common case.  So how about
 
  exit(0) -> done until restart interval, or permanently if there is none
  exit(1) -> done permanently, even if a restart interval was declared
  exit(other) -> crash
 
  I don't offhand see a point in an exit and restart immediately case.
  Why exit at all, if you could just keep running?  If you *can't* just
  keep running, it probably means you know you've bollixed something,
  so that the crash case is probably what to do anyway.
 
  There's the case where you want to quickly go over all the databases,
  but only use one bgworker for it. I don't think there's another way to
  do that.
 
  I think we really should bite the bullet and change this before 9.4
  comes out. The current static bgworker facility has only been out there
  for one release, and dynamic bgworkers aren't released yet at all. If we
  wait with this for 9.5, we'll annoy many more people.
 
 So, exactly what do you want to change?  If you want to keep the
 restart-immediately behavior, then that argues for NOT changing
 exit(0).

I don't necessarily want to keep that behaviour. It can be emulated
easily enough by setting the restart interval to 0. I just wanted to
explain that it's not completely unreasonable to want such a short
restart interval.

 I actually think the right answer here might be to give background
 workers a way to change their configured restart interval.  We've
 already got a shared memory area that the postmaster reads to know how
 what to do when starting a dynamic background worker, so it probably
 wouldn't be that hard.  But I'm not volunteering to undertake that on
 April 16th.

That seems like a separate feature - and I don't see a need to rush that
into 9.4. All I want is that if we decide to change the API, we should do
it now.

What I dislike about the status quo is that:
1) The regular exit(0) behaves anything but regularly.
2) The only way to exit regularly is logged as a failure. Thus there's
   no real way to flag a failure that's actually a failure.

So I think your proposal above already is an improvement upon the status
quo. Maybe we also should change the logging of bgworker exits.

I think we probably also need a way to exit that's treated as an error,
but doesn't lead to a PANIC restart.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Robert Haas
On Wed, Apr 16, 2014 at 10:49 AM, Andres Freund and...@2ndquadrant.com wrote:
 1. Improving the rate at which we can evict buffers, which is what
 you're talking about here.

 2. Improving the choice of which buffers we evict, which is what
 Peter's talking about, or at least what I think he's talking about.

 Those things are both important, but they're different, and I'm not
 sure that working on one precludes working on the other.  There's
 certainly the potential for overlap, but not necessarily.

 I don't think that they necessarily preclude each other
 either. But my gut feeling tells me that it'll be hard to have
 interesting algorithmic improvements on the buffer eviction choice
 because any additional complexity around that will have prohibitively
 high scalability impacts due to the coarse locking.

Doesn't that amount to giving up?  I mean, I'm not optimistic about
the particular approach Peter's chosen here being practical for the
reasons that you and I already articulated.  But I don't think that
means there *isn't* a viable approach; and I think Peter's test
results demonstrate that the additional complexity of a better
algorithm can more than pay for itself.  That's a pretty important
point to keep in mind.

Also, I think the scalability problems around buffer eviction are
eminently solvable, and in particular I'm hopeful that Amit is going
to succeed in solving them.  Suppose we have a background process
(whether the background writer or some other) that runs the clock
sweep, identifies good candidates for eviction, and pushes them on a
set of, say, 16 free-lists protected by spinlocks.  (The optimal
number of free-lists probably depends on the size of shared_buffers.)
Backends try to reclaim by popping buffers off of one of these
free-lists and double-checking whether the page is still a good
candidate for eviction (i.e. it's still clean and unpinned).  If the
free list is running low, they kick the background process via a latch
to make sure it's awake and working to free up more stuff, and if
necessary, advance the clock sweep themselves.  This can even be done
by multiple processes at once, if we adopt your idea of advancing the
clock sweep hand by N buffers at a time and then scanning them
afterwards without any global lock.  In such a world, it's still not
permissible for reclaim calculations to be super-complex, but you hope
that most of the activity is happening in the background process, so
cycle-shaving becomes less critical.
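
A toy sketch of the reclaim side of that design, with pthread mutexes
standing in for spinlocks and all names (FreeList, pop_free_buffer, the
sizes and thresholds) invented for illustration:

#include <pthread.h>
#include <stdbool.h>

#define NUM_FREELISTS	16
#define FREELIST_SIZE	256
#define LOW_WATERMARK	32

typedef struct FreeList
{
	pthread_mutex_t lock;			/* stand-in for a per-list spinlock */
	int			ids[FREELIST_SIZE];	/* candidate buffer ids */
	int			n;
} FreeList;

static FreeList freelists[NUM_FREELISTS];

/*
 * Pop a candidate buffer from "our" partition; returns -1 if empty.
 * The caller re-checks that the buffer is still clean and unpinned
 * before evicting it, and sets the latch of the background sweeper
 * when *run_low says the list needs refilling.
 */
static int
pop_free_buffer(int backend_id, bool *run_low)
{
	FreeList   *fl = &freelists[backend_id % NUM_FREELISTS];
	int			buf = -1;

	pthread_mutex_lock(&fl->lock);
	if (fl->n > 0)
		buf = fl->ids[--fl->n];
	*run_low = (fl->n < LOW_WATERMARK);
	pthread_mutex_unlock(&fl->lock);
	return buf;
}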

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [BUG FIX] Compare returned value by socket() against PGINVALID_SOCKET instead of 0

2014-04-16 Thread Bruce Momjian
On Fri, Apr 11, 2014 at 08:28:55AM -0400, Bruce Momjian wrote:
 Once this is applied I will work on changing the libpq socket type to
 use portable pgsocket, but I am not planning to backpatch that unless we
 find a bug.

Attached is a follow up patch which stores socket values in libpq as
pgsocket, rather than int, and maps it to -1 only for the PQsocket()
external return value.  In the interest of time, I will apply this later
today, and only to head as it does not fix a bug.

This is the last open item I was working on.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
new file mode 100644
index 51d4de4..90b944a
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
*** pqDropConnection(PGconn *conn)
*** 398,406 
  	/* Drop any SSL state */
  	pqsecure_close(conn);
  	/* Close the socket itself */
! 	if (conn->sock >= 0)
  		closesocket(conn->sock);
! 	conn->sock = -1;
  	/* Discard any unread/unsent data */
  	conn->inStart = conn->inCursor = conn->inEnd = 0;
  	conn->outCount = 0;
--- 398,406 
  	/* Drop any SSL state */
  	pqsecure_close(conn);
  	/* Close the socket itself */
! 	if (conn->sock != PGINVALID_SOCKET)
  		closesocket(conn->sock);
! 	conn->sock = PGINVALID_SOCKET;
  	/* Discard any unread/unsent data */
  	conn->inStart = conn->inCursor = conn->inEnd = 0;
  	conn->outCount = 0;
*** keep_going:		/* We will come back to
*** 1631,1654 
  		   addr_cur->ai_addrlen);
  	conn->raddr.salen = addr_cur->ai_addrlen;
  
! 	/* Open a socket */
! 	{
! 		/*
! 		 * While we use 'pgsocket' as the socket type in the
! 		 * backend, we use 'int' for libpq socket values.
! 		 * This requires us to map PGINVALID_SOCKET to -1
! 		 * on Windows.
! 		 * See http://msdn.microsoft.com/en-us/library/windows/desktop/ms740516%28v=vs.85%29.aspx
! 		 */
! 		pgsocket sock = socket(addr_cur->ai_family, SOCK_STREAM, 0);
! #ifdef WIN32
! 		if (sock == PGINVALID_SOCKET)
! 			conn->sock = -1;
! 		else
! #endif
! 			conn->sock = sock;
! 	}
! 	if (conn->sock == -1)
  	{
  		/*
  		 * ignore socket() failure if we have more addresses
--- 1631,1638 
  		   addr_cur->ai_addrlen);
  	conn->raddr.salen = addr_cur->ai_addrlen;
  
! 	conn->sock = socket(addr_cur->ai_family, SOCK_STREAM, 0);
! 	if (conn->sock == PGINVALID_SOCKET)
  	{
  		/*
  		 * ignore socket() failure if we have more addresses
*** makeEmptyPGconn(void)
*** 2717,2723 
  	conn->client_encoding = PG_SQL_ASCII;
  	conn->std_strings = false;	/* unless server says differently */
  	conn->verbosity = PQERRORS_DEFAULT;
! 	conn->sock = -1;
  	conn->auth_req_received = false;
  	conn->password_needed = false;
  	conn->dot_pgpass_used = false;
--- 2701,2707 
  	conn->client_encoding = PG_SQL_ASCII;
  	conn->std_strings = false;	/* unless server says differently */
  	conn->verbosity = PQERRORS_DEFAULT;
! 	conn->sock = PGINVALID_SOCKET;
  	conn->auth_req_received = false;
  	conn->password_needed = false;
  	conn->dot_pgpass_used = false;
*** closePGconn(PGconn *conn)
*** 2882,2888 
  	 * Note that the protocol doesn't allow us to send Terminate messages
  	 * during the startup phase.
  	 */
! 	if (conn->sock >= 0 && conn->status == CONNECTION_OK)
  	{
  		/*
  		 * Try to send close connection message to backend. Ignore any
--- 2866,2872 
  	 * Note that the protocol doesn't allow us to send Terminate messages
  	 * during the startup phase.
  	 */
! 	if (conn->sock != PGINVALID_SOCKET && conn->status == CONNECTION_OK)
  	{
  		/*
  		 * Try to send close connection message to backend. Ignore any
*** PQgetCancel(PGconn *conn)
*** 3103,3109 
  	if (!conn)
  		return NULL;
  
! 	if (conn->sock < 0)
  		return NULL;
  
  	cancel = malloc(sizeof(PGcancel));
--- 3087,3093 
  	if (!conn)
  		return NULL;
  
! 	if (conn->sock == PGINVALID_SOCKET)
  		return NULL;
  
  	cancel = malloc(sizeof(PGcancel));
*** PQrequestCancel(PGconn *conn)
*** 3284,3290 
  	if (!conn)
  		return FALSE;
  
! 	if (conn->sock < 0)
  	{
  		strlcpy(conn->errorMessage.data,
  				"PQrequestCancel() -- connection is not open\n",
--- 3268,3274 
  	if (!conn)
  		return FALSE;
  
! 	if (conn->sock == PGINVALID_SOCKET)
  	{
  		strlcpy(conn->errorMessage.data,
  				"PQrequestCancel() -- connection is not open\n",
*** PQsocket(const PGconn *conn)
*** 5329,5335 
  {
  	if (!conn)
  		return -1;
! 	return conn->sock;
  }
  
  int
--- 5313,5319 
  {
  	if (!conn)
  		return -1;
! 	return (conn->sock != PGINVALID_SOCKET) ? conn->sock : -1;
  }
  
  int
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
new file mode 100644
index 8ccf6d3..50e4035
*** 

Re: [HACKERS] test failure on latest source

2014-04-16 Thread Marco Atzeri



On 16/04/2014 17:14, Alvaro Herrera wrote:

Marco Atzeri wrote:

On 13/04/2014 18:09, Tom Lane wrote:

Andres Freund and...@2ndquadrant.com writes:

On 2014-04-12 16:35:48 -0400, Tom Lane wrote:

In principle, that commit shouldn't have affected behavior for pg_hba
entries with numeric address fields ...



Hm. getaddrinfo.c has this bit:
/* Unsupported flags. */
if (flags & NI_NAMEREQD)
return EAI_AGAIN;


Yeah, and that flag is only ever specified when attempting to do reverse
lookup on a client address to see if it matches a non-numeric pg_hba
entry.


I don't know if this is relevant, but perhaps we're defining the
constants in a way that conflicts with the values defined by cygwin.  A
very quick search finds a 2007 patch for Mutt[1] that seems to have
NI_NAMEREQD defined as 8 somewhere, while 4 is NI_NOFQDN.  But we have
this in getaddrinfo.h:

#ifndef NI_NAMEREQD
#define NI_NAMEREQD 4
#endif

So maybe we're doing something wrong.  Indeed, my system has in
/usr/include/netdb.h

# define NI_NAMEREQD    8   /* Don't return numeric addresses.  */

You'd do well to research this more, I think.

[1] http://marc.info/?l=mutt-devm=117752314512877w=2


on cygwin both 32 and 64 bit I see

netdb.h:#define NI_NAMEREQD 0x4 /* Not being able to resolve is 
an error. */


same on
w32api/ws2tcpip.h:#define NI_NAMEREQD 0x04

curiously I also see on
roken-common.h:#define NI_NAMEREQD  0x02

$ cygcheck -f /usr/include/roken-common.h
libkrb5-devel-1.5.3-1

not sure if it has any relevance at all in this case.


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Andres Freund
On 2014-04-16 11:28:04 -0400, Robert Haas wrote:
 On Wed, Apr 16, 2014 at 10:49 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  1. Improving the rate at which we can evict buffers, which is what
  you're talking about here.
 
  2. Improving the choice of which buffers we evict, which is what
  Peter's talking about, or at least what I think he's talking about.
 
  Those things are both important, but they're different, and I'm not
  sure that working on one precludes working on the other.  There's
  certainly the potential for overlap, but not necessarily.
 
  I don't think that they necessarily preclude each other
  either. But my gut feeling tells me that it'll be hard to have
  interesting algorithmic improvements on the buffer eviction choice
  because any additional complexity around that will have prohibitively
  high scalability impacts due to the coarse locking.
 
 Doesn't that amount to giving up?  I mean, I'm not optimistic about
 the particular approach Peter's chosen here being practical for the
 reasons that you and I already articulated.  But I don't think that
 means there *isn't* a viable approach; and I think Peter's test
 results demonstrate that the additional complexity of a better
 algorithm can more than pay for itself.  That's a pretty important
 point to keep in mind.

Well, I think it could be a very good idea to invest more resources
(cpu, bus, memory) in buffer management - but doing so right *now* where
it's all done under one monolithic lock will have noticeable
consequences for many workloads. Spending more cycles per buffer won't
be very noticeable if it's not done under a gigantic lock - right now it
will be.

 [ reasonable proposal ].  In such a world, it's still not
 permissible for reclaim calculations to be super-complex, but you hope
 that most of the activity is happening in the background process, so
 cycle-shaving becomes less critical.

Yes, agreed.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] bgworker crashed or not?

2014-04-16 Thread Robert Haas
On Wed, Apr 16, 2014 at 11:28 AM, Andres Freund and...@2ndquadrant.com wrote:
 I actually think the right answer here might be to give background
 workers a way to change their configured restart interval.  We've
 already got a shared memory area that the postmaster reads to know
 what to do when starting a dynamic background worker, so it probably
 wouldn't be that hard.  But I'm not volunteering to undertake that on
 April 16th.

 That seems like a separate feature - and I don't see a need to rush that
 into 9.4. All I want is that if we decide to change the API, we should do
 it now.

Well, I'm pretty OK with that, since I proposed it before and still
think it's important, but I'd rather leave it alone for another
release than rush into something we'll just end up changing again.

 What I dislike about the status quo is that:
 1) The regular exit(0) behaves anything but regularly.
 2) The only way to exit regularly is logged as a failure. Thus there's
    no real way to flag a failure that's actually a failure.

 So I think your proposal above already is an improvement upon the status
 quo.

I don't have time to write a patch for that, but I'm OK with
committing it (or having someone else commit it) even at this late
date, unless someone objects.

 Maybe we also should change the logging of bgworker exits.

It's pretty clear that the logging around bgworkers is way too verbose
for anything other than a long-running daemon, but I don't think we
should try to fix that problem right now.  It needs more careful
thought than we have time to give it at this juncture.  One idea might
be to let the registrant of the background worker specify a logging
level, or maybe just a flag bit to indicate verbose (LOG) or quiet
(say, DEBUG1 or DEBUG2) logging.
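
A tiny sketch of what that flag-bit variant could look like at
registration time (BGWORKER_QUIET is an invented name, not part of the
real bgworker API; BGWORKER_SHMEM_ACCESS, BGW_NEVER_RESTART, and
RegisterBackgroundWorker are real):

#include "postgres.h"
#include "postmaster/bgworker.h"

/* hypothetical flag: ask the postmaster to log this worker's exits at DEBUG1 */
#define BGWORKER_QUIET	0x0004

static void
register_quiet_worker(void)
{
	BackgroundWorker worker = {0};

	worker.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_QUIET;
	worker.bgw_restart_time = BGW_NEVER_RESTART;
	snprintf(worker.bgw_name, BGW_MAXLEN, "quiet worker");
	RegisterBackgroundWorker(&worker);
}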

 I think we probably also need a way to exit that's treated as an error,
 but doesn't lead to a PANIC restart.

Why can't that be handled through ereport(ERROR/FATAL) rather than
through the choice of exit status?  It seems to me that the only point
of the exit status is or should be to provide feedback to the
postmaster on how it should respond to the background worker's
untimely demise.  If any other information needs to be conveyed, the
worker should log that itself rather than trying to tell the
postmaster what to log.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] test failure on latest source

2014-04-16 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 I don't know if this is relevant, but perhaps we're defining the
 constants in a way that conflicts with the values defined by cygwin.

Hm, that's a thought, though I still don't see how it's relevant to the
reported failure.  Perhaps Cygwin is defining these constants somewhere
other than netdb.h?  Because getaddrinfo.h definitely pulls that one
in first.

The bigger picture though is that this code isn't failing on the
buildfarm.  So what we need to ask is what's different about Marco's
machine.

regards, tom lane




Re: [HACKERS] bgworker crashed or not?

2014-04-16 Thread Andres Freund
On 2014-04-16 11:37:47 -0400, Robert Haas wrote:
  I think we probably also need a way to exit that's treated as an error,
  but doesn't lead to a PANIC restart.
 
 Why can't that be handled through ereport(ERROR/FATAL) rather than
 through the choice of exit status?  It seems to me that the only point
 of the exit status is or should be to provide feedback to the
 postmaster on how it should respond to the background worker's
 untimely demise.  If any other information needs to be conveyed, the
 worker should log that itself rather than trying to tell the
 postmaster what to log.

I dislike that because it essentially requires the bgworker to have a
full error catching environment like PostgresMain() has. That seems
bad for many cases.
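
For context, a rough sketch of what such an error-catching environment
looks like, loosely modeled on the sigsetjmp dance in PostgresMain().
PG_exception_stack, EmitErrorReport() and FlushErrorState() are the real
hook points; do_one_unit_of_work() is invented for illustration:

#include "postgres.h"
#include "postmaster/bgworker.h"

/* hypothetical per-iteration work, not a real function */
extern void do_one_unit_of_work(void);

static sigjmp_buf local_sigjmp_buf;

void
my_worker_main(Datum main_arg)
{
	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
	{
		/* an elog(ERROR) longjmp'd here: report and clean up */
		EmitErrorReport();
		FlushErrorState();
	}

	/* arm the trap: from now on elog(ERROR) lands above */
	PG_exception_stack = &local_sigjmp_buf;

	for (;;)
		do_one_unit_of_work();
}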

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] bgworker crashed or not?

2014-04-16 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-04-16 11:37:47 -0400, Robert Haas wrote:
 Why can't that be handled through ereport(ERROR/FATAL) rather than
 through the choice of exit status?

 I dislike that because it essentially requires the bgworker to have a
 full error catching environment like PostgresMain() has. That seems
 bad for many cases.

That sounds like utter nonsense.  No sane bgworker code is going to be
able to discount the possibility of something throwing an elog(ERROR).
Now, you might not need the ability to do a transaction abort, but
you'll have to at least be able to terminate the process cleanly.

regards, tom lane




Re: [HACKERS] bgworker crashed or not?

2014-04-16 Thread Andres Freund
On 2014-04-16 11:54:25 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-04-16 11:37:47 -0400, Robert Haas wrote:
  Why can't that be handled through ereport(ERROR/FATAL) rather than
  through the choice of exit status?
 
  I dislike that because it essentially requires the bgworker to have a
  full error catching environment like PostgresMain() has. That seems
  bad for many cases.
 
 That sounds like utter nonsense.  No sane bgworker code is going to be
 able to discount the possibility of something throwing an elog(ERROR).
 Now, you might not need the ability to do a transaction abort, but
 you'll have to at least be able to terminate the process cleanly.

Why? Throwing an error without an exception stack seems to work
sensibly? The error is promoted to FATAL and the normal FATAL handling
is performed? C.f. pg_re_throw() called via errfinish() in the ERROR
case.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C

2014-04-16 Thread Robert Haas
On Tue, Apr 15, 2014 at 2:23 PM, Christian Ullrich ch...@chrullrich.net wrote:
 * From: Robert Haas
 On Mon, Apr 14, 2014 at 2:16 AM, Christian Ullrich
 ch...@chrullrich.net wrote:

  I meant creating a new one, yes. If, say, PGSQL_BACKGROUND_JOB was
  set, the postmaster etc. would ignore the events.

 Why not just pass a command-line switch?

 Because, as I wrote in the message you are quoting, I did not think that
 having a command-line option for the sole purpose of telling the
 postmaster who its parent is was a suitable solution.

True, but you didn't say why, which is what I asked.  You just said
you didn't think it was a good idea, without elaborating.

 While I have you here, though, any suggestions on what the name of that
 option should be? I think --background is about right. Also, how should
 I treat the option on non-Windows platforms? Should it just not be there
 (= error), or be ignored if present?

Well, we had a recent discussion that's related to this, about a
not-entirely-dissimilar problem on Solaris:

http://www.postgresql.org/message-id/22636.1392419...@sss.pgh.pa.us

The proposal there was --daemonize.  That seems somehow inapposite for
Windows, though.  I don't really have a strong opinion at this point
on what the best naming is; I don't love --background, but I haven't
got a better idea, either.

On the topic of how the option should be handled on non-Windows
platforms, I guess I'd vote for accepting it and ignoring it.  The
Solaris issue can (and, IMHO, should) be fixed by teaching pg_ctl to
use fork()+exec()+setsid() rather than system()+hope the shell behaves
the way we'd like, so we don't currently have an apparent need for any
special handling for this case on any platform other than Windows.
But that could change in the future, so I think it'd be smart to
choose a somewhat generic option name and just define it as a no-op on
non-Windows platforms for now.
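
For reference, a minimal sketch of the fork()+exec()+setsid() approach
mentioned above, i.e. roughly what pg_ctl would do itself instead of
trusting the shell (start_detached is an invented name, and error
handling is pared down):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

static void
start_detached(char *const argv[])
{
	pid_t		pid = fork();

	if (pid < 0)
	{
		perror("fork");
		exit(1);
	}
	if (pid == 0)
	{
		/* child: detach from the controlling terminal and session */
		if (setsid() < 0)
			_exit(1);
		execv(argv[0], argv);
		_exit(127);				/* exec failed */
	}
	/* parent returns; the child runs the postmaster */
}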

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] bgworker crashed or not?

2014-04-16 Thread Robert Haas
On Wed, Apr 16, 2014 at 12:00 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-04-16 11:54:25 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-04-16 11:37:47 -0400, Robert Haas wrote:
  Why can't that be handled through ereport(ERROR/FATAL) rather than
  through the choice of exit status?

  I dislike that because it essentially requires the bgworker to have a
  full error catching environment like PostgresMain() has. That seems
  bad for many cases.

 That sounds like utter nonsense.  No sane bgworker code is going to be
 able to discount the possibility of something throwing an elog(ERROR).
 Now, you might not need the ability to do a transaction abort, but
 you'll have to at least be able to terminate the process cleanly.

 Why? Throwing an error without an exception stack seems to work
 sensibly? The error is promoted to FATAL and the normal FATAL handling
 is performed? C.f. pg_re_throw() called via errfinish() in the ERRROR
 case.

And... so what's the problem?  You seemed to be saying that the
background worker would need a more developed error-handling
environment in order to do proper logging, but here you're saying
(rightly, I believe) that it doesn't.  Even if it did, though, I think
the right solution is to install one, not make it the postmaster's job
to try to read the tea leaves in the worker's exit code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Dynamic Shared Memory stuff

2014-04-16 Thread Robert Haas
On Tue, Apr 15, 2014 at 10:46 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Wed, Apr 16, 2014 at 3:01 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Apr 15, 2014 at 12:33 AM, Amit Kapila amit.kapil...@gmail.com 
 wrote:
 On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas robertmh...@gmail.com wrote:
 For the create case, I'm wondering if we should put the block that
 tests for !hmap *before* the _dosmaperr() and check for EEXIST.  What
 is your opinion?

 Either way is okay, but I think the way you are suggesting is better as it
 will make code consistent with other place (PGSharedMemoryCreate()).

 OK, can you prepare a patch?

 Please find attached patch to address this issue.
 One minor point to note is that now we have to call GetLastError() twice,
 once inside the error path and once to check EEXIST, but I think that is okay
 as existing code in PGSharedMemoryCreate() does it that way.

OK.  I committed this blindly, but I don't have a Windows dev
environment, so please keep an eye on the Windows buildfarm members
and provide follow-on patches if any of them get unhappy about this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Question about optimising (Postgres_)FDW

2014-04-16 Thread Hannu Krosing
On 04/16/2014 03:16 PM, Hannu Krosing wrote:
 On 04/16/2014 01:35 PM, Etsuro Fujita wrote:
 (2014/04/16 6:55), Hannu Krosing wrote:
 ...
 Maybe I'm missing something, but I think that you can do what I think
 you'd like to do by the following procedure:
 No, what I'd like PostgreSQL to do is to

 1. select the id+set from local table
 2. select the rows from remote table with WHERE ID IN (set selected in
 step 1)
 3. then join the original set to selected set, with any suitable join
 strategy

 The things I do not want are

 A. selecting all rows from remote table
 (this is what your examples below do)

 or

 B. selecting rows from remote table by single selects using ID = $
 (this is something that I managed to do by some tweaking of costs)

 as A will always be slow if there are millions of rows in the remote table
 and B is slow(ish) when the id set is over a few hundred ids

 I hope this is a better explanation than the one I provided before.

 Cheers
 Hannu

 P.S. I am not sure if this is a limitation of postgres_fdw or postgres
 itself

 P.P.S. I tested a little with Multicorn, and PostgreSQL did not
 request row
 counts for any IN plans, so it may be that the planner does not consider
 this
 kind of plan at all. (testing was on PgSQL 9.3.4)

 Hannu
Also a sample run of the two plans to illustrate my point

How it is run now:

testdb=# explain analyse verbose
select r.data, l.data
  from onemillion_pgfdw r
  join onemillion l
on r.id = l.id and l.id between 10 and 100100;
 
                                   QUERY PLAN
--
 Hash Join  (cost=111.61..198.40 rows=1 width=16) (actual
time=7534.360..8731.541 rows=101 loops=1)
   Output: r.data, l.data
   Hash Cond: (r.id = l.id)
   ->  Foreign Scan on public.onemillion_pgfdw r  (cost=100.00..178.25
rows=2275 width=12) (actual time=1.628..8364.688 rows=100 loops=1)
 Output: r.id, r.inserted, r.data
 Remote SQL: SELECT id, data FROM public.onemillion
   ->  Hash  (cost=10.39..10.39 rows=98 width=12) (actual
time=0.179..0.179 rows=101 loops=1)
 Output: l.data, l.id
 Buckets: 1024  Batches: 1  Memory Usage: 5kB
  ->  Index Scan using onemillion_pkey on public.onemillion l 
(cost=0.42..10.39 rows=98 width=12) (actual time=0.049..0.124 rows=101
loops=1)
   Output: l.data, l.id
   Index Cond: ((l.id >= 10) AND (l.id <= 100100))
 Total runtime: 8732.213 ms
(13 rows)

Time: 8733.799 ms


And how the above query should be planned/executed:

testdb=# explain analyse verbose
select r.data, l.data
  from (select * from onemillion_pgfdw where id = any (array(select id
from onemillion where id between 10 and 100100))) r
  join onemillion l
on r.id = l.id;

                                   QUERY PLAN

 Nested Loop  (cost=110.81..1104.30 rows=111 width=16) (actual
time=2.756..3.738 rows=101 loops=1)
   Output: onemillion_pgfdw.data, l.data
   InitPlan 1 (returns $0)
 ->  Index Only Scan using onemillion_pkey on public.onemillion 
(cost=0.42..10.39 rows=98 width=4) (actual time=0.055..0.118 rows=101
loops=1)
   Output: onemillion.id
   Index Cond: ((onemillion.id >= 10) AND (onemillion.id <=
100100))
   Heap Fetches: 101
   ->  Foreign Scan on public.onemillion_pgfdw  (cost=100.00..163.41
rows=111 width=12) (actual time=2.729..3.012 rows=101 loops=1)
 Output: onemillion_pgfdw.id, onemillion_pgfdw.inserted,
onemillion_pgfdw.data
 Remote SQL: SELECT id, data FROM public.onemillion WHERE ((id =
ANY ($1::integer[])))
   ->  Index Scan using onemillion_pkey on public.onemillion l 
(cost=0.42..8.37 rows=1 width=12) (actual time=0.005..0.006 rows=1
loops=101)
 Output: l.id, l.inserted, l.data
 Index Cond: (l.id = onemillion_pgfdw.id)
 Total runtime: 4.469 ms
(14 rows)

Time: 6.437 ms




 postgres=# ALTER SERVER loop OPTIONS (ADD fdw_startup_cost '1000');
 ALTER SERVER
 postgres=# EXPLAIN VERBOSE SELECT * FROM onemillion_pgsql WHERE id in
 (SELECT id FROM onemillion WHERE data > '0.9' LIMIT 100);
   QUERY PLAN
 ---

  Hash Semi Join  (cost=1023.10..41983.21 rows=100 width=30)
Output: onemillion_pgsql.id, onemillion_pgsql.inserted,
 onemillion_pgsql.data
Hash Cond: (onemillion_pgsql.id = onemillion.id)
    ->  Foreign Scan on public.onemillion_pgsql 
 (cost=1000.00..39334.00 rows=100 width=29)
  

Re: [HACKERS] bgworker crashed or not?

2014-04-16 Thread Andres Freund
On 2014-04-16 12:04:26 -0400, Robert Haas wrote:
 And... so what's the problem?  You seemed to be saying that the
 background worker would need a more developed error-handling
 environment in order to do proper logging, but here you're saying
 (rightly, I believe) that it doesn't.  Even if it did, though, I think
 the right solution is to install one, not make it the postmaster's job
 to try to read the tea leaves in the worker's exit code.

Well, currently it will log the message that has been thrown, but that might
lack context. LogChildExit() already has code to print the activity of the
bgworker after it crashed.

Note that a FATAL error will always use an exit(1) - so we better not
make that a special case in the code :/.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BGWorkers, shared memory pointers, and postmaster restart

2014-04-16 Thread Robert Haas
On Wed, Apr 16, 2014 at 7:11 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 TL;DR: I don't think there's a safe way to use a BGWorker (static or
 dynamic) with bgw_restart_time != BGW_NEVER_RESTART and a bgw_main_arg
 Datum that points into shared memory, and think we might need an API
 change to fix that.

 Andres sensibly points out that part of this is easily solved by passing
 the bgworker an index into an array in a named shmem block. A simple
 pass-by-value Datum that can be turned into a pointer to a shmem struct.

Yes.  I think the answer to your original complaint is that we don't
currently support that, and adding support for that is material for a
future release.

 This still doesn't solve the other half of the issue, which is how to
 handle dynamic bgworkers after a postmaster restart. They're effectively
 lost/leaked: there's no way to retain a bgworker handle across restart,
 and no way to list bgworkers, nor is there any idempotent way to say
 "Start a worker to do x only if it doesn't already exist" (unique
 names, magic cookie hashes, whatever).

Fair point.

 With the current API the only solution to the second half that I see is
 to have bgworkers run in non-restart mode and manage them yourself. Not
 ideal.

 Instead we need one of:

 - A flag like BGW_UNREGISTER_ON_RESTART;

I would be OK with this, maybe modulo the name.

 - To always unregister dynamic bgws on postmaster shm clear + restart;

I don't particularly care for this.  Let's suppose the background
worker is a long-running daemon, like a PG equivalent of cron.  In
static-background worker land, the admin has to restart the cluster to
get this going.  In dynamic-background worker land, he can load it on
the fly.  But once he gets it up and running, he wants it to stay up
and running, surviving crashes and everything.  That's a big part of
the value of having a background worker interface in the first place.

 - A way to list bgws, inspect their BackgroundWorker structs and obtain
 their handles; or

This is certainly a good idea.

 - A way to idempotently register a bgw only if it doesn't already exist

This is probably a good idea, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
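
The "pass an index, not a pointer" idea mentioned above might look
roughly like this on the worker side; WorkerSlot, NUM_SLOTS, and
"my_worker_slots" are invented names for illustration, and locking
around ShmemInitStruct() is elided. On the registration side,
bgw_main_arg would simply be Int32GetDatum(idx).

#include "postgres.h"

#include "postmaster/bgworker.h"
#include "storage/ipc.h"
#include "storage/shmem.h"

typedef struct WorkerSlot		/* hypothetical per-worker state */
{
	int			command;
} WorkerSlot;

#define NUM_SLOTS 8

void
my_worker_main(Datum main_arg)
{
	bool		found;
	WorkerSlot *slots;
	int			idx = DatumGetInt32(main_arg);	/* an index, not a pointer */

	/*
	 * Look the array up by name; this still works even if shared memory
	 * was thrown away and re-initialized after a postmaster restart.
	 */
	slots = (WorkerSlot *)
		ShmemInitStruct("my_worker_slots",
						mul_size(sizeof(WorkerSlot), NUM_SLOTS), &found);

	/* ... operate on slots[idx] ... */

	proc_exit(0);
}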


Re: [HACKERS] bgworker crashed or not?

2014-04-16 Thread Robert Haas
On Wed, Apr 16, 2014 at 12:11 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-04-16 12:04:26 -0400, Robert Haas wrote:
 And... so what's the problem?  You seemed to be saying that the
 background worker would need a more developed error-handling
 environment in order to do proper logging, but here you're saying
 (rightly, I believe) that it doesn't.  Even if it did, though, I think
 the right solution is to install one, not make it the postmaster's job
 to try to read the tea leaves in the worker's exit code.

 Well, currently it will log the message that has been thrown, but that might
 lack context. LogChildExit() already has code to print the activity of the
 bgworker after it crashed.

I'm still not seeing the problem.  It's the background worker's job to
make sure that the right stuff gets logged, just as it would be for
any other backend.  Trying to bolt some portion of the responsibility
for that onto the postmaster is 100% wrong.

 Note that a FATAL error will always use an exit(1) - so we better not
 make that a special case in the code :/.

Well, everything's a special case, in that every error code has (and
should have) its own meaning.  But the handling we give to exit code 1
is not obviously incompatible with what you probably want to have
happen after a FATAL.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: variant of regclass

2014-04-16 Thread Robert Haas
On Wed, Apr 16, 2014 at 3:27 AM, Tatsuo Ishii is...@postgresql.org wrote:
 Well, I noticed that, too, but I didn't think it was my job to tell
 the patch author what functions he should have wanted.  A follow-on
 patch to add to_regprocedure and to_regoperator wouldn't be much work,
 if you want that.

 And here is a patch for that.

 Looks good to me except duplicate oids. Included is a patch to fix that.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic Background Workers and clean exit

2014-04-16 Thread Robert Haas
On Wed, Apr 16, 2014 at 5:27 AM, Petr Jelinek p...@2ndquadrant.com wrote:
 Hello,

 I've recently been doing some work with dynamic bgworkers and noticed that I
 have no way of saying "I am done now and want to exit cleanly", because
 bgworkers get restarted automatically on exit code 0 no matter what the
 restart interval is set to.

 I understand the rationale for this behavior when using static bgworkers
 which are meant to run forever, but dynamic ones are spawned dynamically by
 code, so they should also be able to terminate cleanly, as they presumably
 have finite work to do. Also I think the clean shutdown of a dynamic
 bgworker should not be logged (or at least not with the same log level as it
 is now) since it's business as usual.

 Since dynamic bgworkers are new in 9.4, the behavior can still be
 changed.

 Thoughts?

See the other thread where this is being discussed...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] AF_UNSPEC vs PF_UNSPEC

2014-04-16 Thread Tom Lane
While wondering what the heck is going on in
http://www.postgresql.org/message-id/534e8fbb.9060...@gmail.com
I chanced to notice that pgstat.c and a couple of other places set
up arguments for getaddrinfo() like this:

hints.ai_family = PF_UNSPEC;

whereas the Single Unix Spec says clearly that AF_UNSPEC is what
to write if you're not intending to constrain the address family.
AF_UNSPEC is what we use in the majority of places, too.

On Linux, at least, these symbols have the same value so it doesn't
matter; but I wonder whether they are different on recent Cygwin.

Anyway, I think this is clearly wrong and we should change it.
I see a PF_INET that presumably ought to be AF_INET in
pg_dump/parallel.c, too.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
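
For reference, the spec-conforming pattern looks like this as a
standalone sketch (not PostgreSQL source):

#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

static int
lookup_any_family(const char *host, const char *port)
{
	struct addrinfo hints;
	struct addrinfo *res = NULL;
	int			ret;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;	/* AF_*, not PF_*: don't constrain the family */
	hints.ai_socktype = SOCK_STREAM;

	ret = getaddrinfo(host, port, &hints, &res);
	if (ret != 0)
		return -1;				/* see gai_strerror(ret) for details */
	freeaddrinfo(res);
	return 0;
}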


Re: [HACKERS] bgworker crashed or not?

2014-04-16 Thread Andres Freund
On 2014-04-16 12:20:01 -0400, Robert Haas wrote:
 On Wed, Apr 16, 2014 at 12:11 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2014-04-16 12:04:26 -0400, Robert Haas wrote:
  And... so what's the problem?  You seemed to be saying that the
  background worker would need a more developed error-handling
  environment in order to do proper logging, but here you're saying
  (rightly, I believe) that it doesn't.  Even if it did, though, I think
  the right solution is to install one, not make it the postmaster's job
  to try to read the tea leaves in the worker's exit code.
 
  Well, currently it will log the message that has been thrown, but that might
  lack context. LogChildExit() already has code to print the activity of the
  bgworker after it crashed.
 
 I'm still not seeing the problem.  It's the background worker's job to
 make sure that the right stuff gets logged, just as it would be for
 any other backend.  Trying to bolt some portion of the responsibility
 for that onto the postmaster is 100% wrong.

Well, it already has taken on that responsibility; it's not my idea to
add it. I merely want to control more precisely what happens.

  Note that a FATAL error will always use an exit(1) - so we better not
  make that a special case in the code :/.
 
 Well, everything's a special case, in that every error code has (and
 should have) its own meaning.  But the handling we give to exit code 1
 is not obviously incompatible with what you probably want to have
 happen after a FATAL.

That was essentially directed at Tom's proposal in
http://www.postgresql.org/message-id/32224.1391444...@sss.pgh.pa.us - I
don't think that'd be compatible with FATAL errors.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ctrl+C from sh can shut down daemonized PostgreSQL cluster

2014-04-16 Thread Robert Haas
On Mon, Feb 17, 2014 at 8:26 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 It certainly might be --- I have no idea.  What surprised me is that we
 are relying solely on system() to block signals to pg_ctl-spawned
 servers.  The question is whether that is sufficient and whether we
 should be doing more.  I don't think we have to make adjustments just
 for Solaris 9.

 We aren't relying on system(); it does no such thing, according to the
 POSIX spec.  If it did, pg_ctl would be unable to print any errors to the
 terminal, because dissociating from the foreground process group generally
 also disables your ability to print on the terminal.

 I poked around in the POSIX spec a bit, and if I'm reading it correctly,
 the only thing that typically results in the postmaster becoming
 dissociated from the terminal is use of & to launch it.  In a shell
 with job control, that should result in the process being put into a
 background process group that won't receive terminal signals nor be
 permitted to do I/O to it.  This is distinct from dissociating altogether
 because you can use fg to return the process to foreground; if we did a
 setsid() we'd lose that option, if I'm reading the standards correctly.
 So I'm loath to see the postmaster doing setsid() for itself.

 POSIX also mandates that interactive shells have job control enabled by
 default.

 However ... the & isn't issued in the user's interactive shell.  It's
 seen by the shell launched by pg_ctl's system() call.  So it appears to
 be standards-conforming if that shell does nothing to move the launched
 postmaster into the background.

 The POSIX spec describes a shell switch -m that forces subprocesses
 to be launched in their own process groups.  So maybe what we ought
 to do is teach pg_ctl to do something like

system("set -m; postgres ...");

 Dunno if this is really portable, though it ought to be.

 Alternatively, we could do what the comments in pg_ctl have long thought
 desirable, namely get rid of use of system() in favor of fork()/exec().
 With that, pg_ctl could do a setsid() inside the child process.

I like this last approach.  It seems to me that the problem with
system() is that it's relying on the user's shell to have sane
behavior.  And while that obviously will work fine in a whole lot of
cases, I don't see why we should rely on it.  I don't think "your
shell must support -m with POSIX-like semantics" is really a
requirement we want to impose on PostgreSQL users.  The shell can't
make any system calls that we don't have access to ourselves, and
setsid() seems like the right one to use.  Maybe it's begging for
trouble to change anything here at all, but I think if we are going to
make a change it ought to be in the direction of making us less
dependent on the vagaries of the user's shell, not more.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
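
A rough sketch of the fork()/exec() replacement being discussed, with
setsid() in the child; error handling is trimmed and the argv contents
are placeholders, so this is an illustration rather than pg_ctl source:

#include <sys/types.h>
#include <unistd.h>

static pid_t
start_postmaster_detached(char *const argv[])
{
	pid_t		pid = fork();

	if (pid < 0)
		return -1;				/* fork failed */
	if (pid == 0)
	{
		/*
		 * Child: start a new session so the postmaster is dissociated
		 * from the launching shell's terminal, then exec the server.
		 */
		if (setsid() < 0)
			_exit(1);
		execv(argv[0], argv);
		_exit(1);				/* exec failed */
	}
	return pid;					/* parent (pg_ctl) keeps running */
}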


Re: [HACKERS] bgworker crashed or not?

2014-04-16 Thread Robert Haas
On Wed, Apr 16, 2014 at 12:28 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-04-16 12:20:01 -0400, Robert Haas wrote:
 On Wed, Apr 16, 2014 at 12:11 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2014-04-16 12:04:26 -0400, Robert Haas wrote:
  And... so what's the problem?  You seemed to be saying that the
  background worker would need a more developed error-handling
  environment in order to do proper logging, but here you're saying
  (rightly, I believe) that it doesn't.  Even if it did, though, I think
  the right solution is to install one, not make it the postmaster's job
  to try to read the tea leaves in the worker's exit code.
 
  Well, currently it will log the message that has been thrown, but that might
  lack context. LogChildExit() already has code to print the activity of the
  bgworker after it crashed.

 I'm still not seeing the problem.  It's the background worker's job to
 make sure that the right stuff gets logged, just as it would be for
 any other backend.  Trying to bolt some portion of the responsibility
 for that onto the postmaster is 100% wrong.

 Well, it already has taken on that responsibility; it's not my idea to
 add it. I merely want to control more precisely what happens.

I think that's doubling down on an already-questionable design principle.

Or if I may be permitted a more colloquial idiom:

Luke, it's a trap.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ctrl+C from sh can shut down daemonized PostgreSQL cluster

2014-04-16 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Feb 17, 2014 at 8:26 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alternatively, we could do what the comments in pg_ctl have long thought
 desirable, namely get rid of use of system() in favor of fork()/exec().
 With that, pg_ctl could do a setsid() inside the child process.

 I like this last approach.

Me too, but it looked like a less-than-trivial bit of work to me
(else I might just have gone and done it).  Are you volunteering?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ctrl+C from sh can shut down daemonized PostgreSQL cluster

2014-04-16 Thread Robert Haas
On Wed, Apr 16, 2014 at 12:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Feb 17, 2014 at 8:26 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alternatively, we could do what the comments in pg_ctl have long thought
 desirable, namely get rid of use of system() in favor of fork()/exec().
 With that, pg_ctl could do a setsid() inside the child process.

 I like this last approach.

 Me too, but it looked like a less-than-trivial bit of work to me
 (else I might just have gone and done it).  Are you volunteering?

I don't have time right at the moment, but maybe at some point, if
nobody else gets to it first.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Tracking replication slot blockings

2014-04-16 Thread Magnus Hagander
I'm thinking it could be interesting to know how many times (or in some
other useful unit than times - how often) a specific replication slot has
blocked xlog rotation. Since this AFAIK only happens during checkpoints,
it seems it should be reasonably cheap to track? It would serve as an
indicator of which slave(s) are having enough trouble keeping up to
potentially cause issues.

Not having looked at that code at all yet, would this be something that's
simple to add?

Or is it a silly idea? :)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] test failure on latest source

2014-04-16 Thread Marco Atzeri

On 16/04/2014 17:40, Tom Lane wrote:


The bigger picture though is that this code isn't failing on the
buildfarm.  So what we need to ask is what's different about Marco's
machine.


good question.
I checked again and found that the failure occurs only on
the Cygwin 64-bit build, not on the Cygwin 32-bit one.

I was sure I had tested both, but it seems this time
I missed the comparison.


regards, tom lane


Regards
Marco





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tracking replication slot blockings

2014-04-16 Thread Andres Freund
Hi,

On 2014-04-16 18:51:41 +0200, Magnus Hagander wrote:
 I'm thinking it could be interesting to know how many times (or in some
 other useful unit than times - how often) a specific replication slot has
 blocked xlog rotation. Since this AFAIK only happens during checkpoints,
 it seems it should be reasonably cheap to track? It would serve as an
 indicator of which slave(s) are having enough trouble keeping up to
 potentially cause issues.

The xlog removal code just checks the global minimum required LSN - it
doesn't check the individual slots. So you'd need to add a bit more code
to that location. But it'd be easy.

But I think I'd just monitor/graph the byte difference for all slots
using pg_replication_slots...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tracking replication slot blockings

2014-04-16 Thread Magnus Hagander
On Wed, Apr 16, 2014 at 6:56 PM, Andres Freund and...@2ndquadrant.com wrote:

 Hi,

 On 2014-04-16 18:51:41 +0200, Magnus Hagander wrote:
  I'm thinking it could be interesting to know how many times (or in some
  other useful unit than times - how often) a specific replication slot
 has
  blocked xlog rotation. Since this AFAIK only happens during
 checkpoints,
  it seems it should be reasonably cheap to track? It would serve as an
  indicator of which slave(s) are having enough trouble keeping up to
  potentially cause issues.

  The xlog removal code just checks the global minimum required LSN - it
 doesn't check the individual slots. So you'd need to add a bit more code
 to that location. But it'd be easy.


Do we have statistics there somewhere - how often that global minimum
blocks something? That on its own might be a start :)


But I think I'd just monitor/graph the byte difference for all slots
 using pg_replication_slots...


Yeah, that would work when monitored continuously. I was more looking for
the "hey, could this be what happened?" view into a system that did not
previously have any monitoring installed and therefore has no such history.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] LDAP: bugfix and deprecated OpenLDAP API

2014-04-16 Thread Magnus Hagander
On Mon, Oct 21, 2013 at 3:31 PM, Albe Laurenz laurenz.a...@wien.gv.at wrote:

 Peter Eisentraut wrote:
  --- 3511,3534 
   }
 
   /*
  ! * Perform an explicit anonymous bind.
  ! * This is not necessary in principle, but we want to set a timeout
  ! * of PGLDAP_TIMEOUT seconds and return 2 if the connection fails.
  ! * Unfortunately there is no standard conforming way to do that.
*/
 
  This comment has become a bit confusing.  What exactly is nonstandard?
  Setting a timeout, or returning 2?  The code below actually returns 3.

 I have improved the comment.

  + #ifdef HAVE_LIBLDAP
  +/* in OpenLDAP, use the LDAP_OPT_NETWORK_TIMEOUT option */
 
  We don't use HAVE_LIBLDAP anywhere else to mean OpenLDAP.  Existing
  LDAP-related code uses #ifdef WIN32.

 Changed.

   + #else
 
  There should be a comment here indicating what this #else belongs to
  (#else /* HAVE_LIBLDAP */, or whatever we end up using).

 Changed.

  +/* the nonstandard ldap_connect function performs an anonymous
 bind */
  +if (ldap_connect(ld, time) != LDAP_SUCCESS)
  +{
  +/* error or timeout in ldap_connect */
  +free(url);
  +ldap_unbind(ld);
  +return 2;
  +}
  + #endif
 
  here too

 Changed.

  Bonus: Write a commit message for your patch.  (Consider using git
  format-patch.)

 Suggested commit message is included in the patch.


Sorry about the huge delay in trying to get around to this one.

For the sake of the archives - I looked at the fact that the windows
codepath always returns 2, whereas the unix codepath separates at least one
case out as a return 3. I can't see a pattern in the windows return codes
that would make it possible to do the same though, without explicitly
listing different error codes mapping to each of them - so I don't think
it's worth doing that.

Thus - applied, and backpatched all the way. Thanks!

(And I just realized I forgot to credit reviewers in the commit message. My
apologies - I blame confusion after having to re-set up my Windows build
environments all over again..)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Tracking replication slot blockings

2014-04-16 Thread Andres Freund
On 2014-04-16 19:09:09 +0200, Magnus Hagander wrote:
 On Wed, Apr 16, 2014 at 6:56 PM, Andres Freund and...@2ndquadrant.com wrote:
  The xlog removal code just checks the global minimum required LSN - it
  doesn't check the individual slots. So you'd need to add a bit more code
  to that location. But it'd be easy.
 
 
 Do we have statistics there somewhere - how often that global minimum
  blocks something? That on its own might be a start :)

Nope. Check xlog.c:KeepLogSeg(), it's pretty simple stuff ;). It's the
same place where wal_keep_segments is enforced...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
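
In the spirit of KeepLogSeg(), the accounting Magnus is after could be
sketched like this; slot_min_lsn, keep_upto, and slots_blocked_count
are hypothetical names for this illustration, not symbols from xlog.c:

#include "postgres.h"

#include "access/xlog_internal.h"	/* XLByteToSeg() */

/*
 * If the slots' minimum required LSN is what limits segment recycling,
 * count that as one "blocked" event.
 */
static void
account_slot_blocking(XLogRecPtr slot_min_lsn,
					  XLogSegNo *keep_upto,
					  uint64 *slots_blocked_count)
{
	XLogSegNo	slot_segno;

	XLByteToSeg(slot_min_lsn, slot_segno);
	if (slot_segno < *keep_upto)
	{
		*keep_upto = slot_segno;	/* the slot is the limiting factor */
		(*slots_blocked_count)++;	/* the statistic asked about above */
	}
}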


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Claudio Freire
On Wed, Apr 16, 2014 at 4:22 AM, Peter Geoghegan p...@heroku.com wrote:

 I don't want to dismiss what you're saying about heating and cooling
 being unrelated, but I don't find the conclusion that "not everything
 can be hot" obvious. Maybe heat should be relative rather than
 absolute, and maybe that's actually what you meant. There is surely
 some workload where buffer access actually is perfectly uniform, and
 what do you do there? What temperature are those buffers?

In that case, hotness, or retention priority, should be relative to
re-population cost.

IE: whether it's likely to still be in the OS cache or not, whether
it's dirty or not, etc.

 It occurs to me that within the prototype patch, even though
 usage_count is incremented in a vastly slower fashion (in a wall time
 sense), clock sweep doesn't take advantage of that. I should probably
 investigate having clock sweep become more aggressive in decrementing
 in response to realizing that it won't get some buffer's usage_count
 down to zero on the next revolution either. There are certainly
 problems with that, but they might be fixable. Within the patch, in
 order for it to be possible for the usage_count to be incremented in
 the interim, an average of 1.5 seconds must pass, so if clock sweep
 were to anticipate another no-set-to-zero revolution, it seems pretty
 likely that it would be exactly right, or if not then close enough,
 since it can only really fail to correct for some buffers getting
 incremented once more in the interim. Conceptually, it would be like
 multiple logical revolutions were merged into one actual one,
 sufficient to have the next revolution find a victim buffer.

Why use time at all? Why not synchronize usage bumpability to clock sweeps?

I'd use a simple bit that the clock sweep clears, and the users set.
Only one increase per sweep.

Or maybe use a decreasing loop count instead of a bit. In any case,
measuring time in terms of clock sweeps sounds like a better
proposition.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
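
Claudio's "one increase per sweep" bit could be sketched as follows;
SketchBufferDesc and its fields are invented for illustration and are
not PostgreSQL's BufferDesc:

typedef struct SketchBufferDesc
{
	int			usage_count;
	bool		bumped_this_sweep;	/* set on access, cleared by the sweep */
} SketchBufferDesc;

/* on buffer access */
static void
bump_usage(SketchBufferDesc *buf)
{
	if (!buf->bumped_this_sweep)
	{
		buf->usage_count++;
		buf->bumped_this_sweep = true;	/* at most one bump per sweep */
	}
}

/* in the clock sweep, when passing over the buffer */
static void
sweep_visit(SketchBufferDesc *buf)
{
	if (buf->usage_count > 0)
		buf->usage_count--;
	buf->bumped_this_sweep = false;		/* re-arm for the next cycle */
}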


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Peter Geoghegan
On Wed, Apr 16, 2014 at 4:01 AM, Andres Freund and...@2ndquadrant.com wrote:
 Aren't you interested in the significance of the patch, and the test case?

 Not particularly in the specifics to be honest. The tradeoffs of the
 techniques you used in there seem prohibitive to me. It's easy to make
 individual cases faster by sacrificing others.

You're the one poring over the specifics of what I've done, to my
consternation. I am not prepared to defend the patch at that level, as
I've made abundantly clear. I've called it a sketch, a proof of
concept half a dozen times already. I don't understand your difficulty
with that. I also don't understand how you can be so dismissive of the
benchmark, given the numbers involved. You're being unreasonable.

If I didn't write this patch, and I talked to people about this issue
at pgCon, I'm not sure that anyone would be convinced that it was a
problem, or at least that it was this much of a problem.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
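
One way to read the "merged revolutions" idea above is a sweep that
scales its decrement when the previous pass could not produce a victim;
everything here is hypothetical, not the prototype patch's code:

/*
 * If even the coldest buffer seen on the last pass had usage_count N,
 * one physical pass can stand in for N logical revolutions.
 */
static void
sweep_decrement(int *usage_count, int min_usage_seen_last_pass)
{
	int			step = (min_usage_seen_last_pass > 0) ?
						min_usage_seen_last_pass : 1;

	*usage_count = (*usage_count > step) ? *usage_count - step : 0;
}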


Re: [HACKERS] [GENERAL] pg_upgrade tablespaces

2014-04-16 Thread Bruce Momjian
On Sun, Jan 12, 2014 at 11:04:41PM -0500, Bruce Momjian wrote:
  In the pgsql_old installation you have symlinks pointing back to the
  current default location. As well pg_tablespace points back to
  /usr/local/pgsql/data/ The issue is that there is not actually
  anything there in the way of a tablespace. So when pg_upgrade runs
  it tries to upgrade from /usr/local/pgsql/data/tblspc_dir to
  /usr/local/pgsql/data/tblspc_dir where the first directory either
  does not exist, or, if the user went ahead and created the directory
  in the new installation, is empty. What is really wanted is to
  upgrade from /usr/local/pgsql_old/data/tblspc_dir to
  /usr/local/pgsql/data/tblspc_dir. Right now the only way that
  happens is with user intervention.
 
 Right, it points to _nothing_ in the _new_ cluster.  Perhaps the
 simplest approach would be to check all the pg_tablespace locations to
 see if they point at real directories.  If not, we would have to have
 the user update pg_tablespace and the symlinks.  :-(  Actually, even in
 9.2+, those symlinks are going to point at the same nothing.  That
 would support checking the symlinks in all versions.

I have developed the attached patch which checks all tablespaces to make
sure the directories exist.  I plan to backpatch this.

The reason we haven't seen this bug reported more frequently is that a
_database_ defined in a non-existent tablespace directory already throws
a backend error, so this check is only necessary where tables/indexes
(not databases) are defined in non-existent tablespace directories.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/contrib/pg_upgrade/tablespace.c b/contrib/pg_upgrade/tablespace.c
new file mode 100644
index 783ee93..5a3dc60
*** a/contrib/pg_upgrade/tablespace.c
--- b/contrib/pg_upgrade/tablespace.c
***
*** 11,16 
--- 11,18 
  
  #include "pg_upgrade.h"
  
+ #include <sys/types.h>
+ 
  static void get_tablespace_paths(void);
  static void set_tablespace_directory_suffix(ClusterInfo *cluster);
  
*** get_tablespace_paths(void)
*** 65,73 
--- 67,101 
  	i_spclocation = PQfnumber(res, "spclocation");
  
  	for (tblnum = 0; tblnum < os_info.num_old_tablespaces; tblnum++)
+ 	{
+ 		struct stat statBuf;
+ 
  		os_info.old_tablespaces[tblnum] = pg_strdup(
  					 PQgetvalue(res, tblnum, i_spclocation));
  
+ 		/*
+ 		 * Check that the tablespace path exists and is a directory.
+ 		 * Effectively, this is checking only for tables/indexes in
+ 		 * non-existent tablespace directories.  Databases located in
+ 		 * non-existent tablespaces already throw a backend error.
+ 		 */
+ 		if (stat(os_info.old_tablespaces[tblnum], &statBuf) != 0)
+ 		{
+ 			if (errno == ENOENT)
+ 				report_status(PG_FATAL,
+ 							  "tablespace directory \"%s\" does not exist\n",
+ 							  os_info.old_tablespaces[tblnum]);
+ 			else
+ 				report_status(PG_FATAL,
+ 							  "cannot stat() tablespace directory \"%s\": %s\n",
+ 							  os_info.old_tablespaces[tblnum], getErrorText(errno));
+ 		}
+ 		if (!S_ISDIR(statBuf.st_mode))
+ 			report_status(PG_FATAL,
+ 						  "tablespace path \"%s\" is not a directory\n",
+ 						  os_info.old_tablespaces[tblnum]);
+ 	}
+ 
  	PQclear(res);
  
  	PQfinish(conn);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Robert Haas
On Wed, Apr 16, 2014 at 1:42 PM, Peter Geoghegan p...@heroku.com wrote:
 On Wed, Apr 16, 2014 at 4:01 AM, Andres Freund and...@2ndquadrant.com wrote:
 Aren't you interested in the significance of the patch, and the test case?

 Not particularly in the specifics to be honest. The tradeoffs of the
 techniques you used in there seem prohibitive to me. It's easy to make
 individual cases faster by sacrificing others.

 You're the one poring over the specifics of what I've done, to my
 consternation. I am not prepared to defend the patch at that level, as
 I've made abundantly clear. I've called it a sketch, a proof of
 concept half a dozen times already. I don't understand your difficulty
 with that. I also don't understand how you can be so dismissive of the
 benchmark, given the numbers involved. You're being unreasonable.

I don't think he's being unreasonable, and I don't understand why
you're getting bent out of shape about it.  You proposed a patch, he
articulated a problem, you don't want to fix it right now.  All of
which is fine.  Why the ad hominem accusations?

 If I didn't write this patch, and I talked to people about this issue
 at pgCon, I'm not sure that anyone would be convinced that it was a
 problem, or at least that it was this much of a problem.

I agree with that, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Need Multixact Freezing Docs

2014-04-16 Thread Josh Berkus

 You can see the current multixact value in pg_controldata output.  Keep
 timestamped values of that somewhere (a table?) so that you can measure
 consumption rate.  I don't think we provide SQL-level access to those
 values.

Bleh.  Do we provide SQL-level access in 9.4?  If not, I think that's a
requirement before release.  Telling users to monitor a setting using a
restricted-permission command-line utility which produces a
version-specific text file they have to parse is not going to win us a
lot of fans.

 
 Also: how do I check the multixact age of a table?  There doesn't seem
 to be any data for this ...
 
 pg_class.relminmxid is the oldest multixact value that might be present
 in a table.

On every database I've tested, age(relminmxid) returns int_max.  So this
is apparently broken.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Peter Geoghegan
On Wed, Apr 16, 2014 at 10:56 AM, Robert Haas robertmh...@gmail.com wrote:
 I don't think he's being unreasonable, and I don't understand why
 you're getting bent out of shape about it.  You proposed a patch, he
 articulated a problem, you don't want to fix it right now.  All of
 which is fine.  Why the ad hominem accusations?

I just think it's bad form to hold something like this to the same
standards as a formal commitfest submission. I am well aware that the
patch probably has several scalability issues.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case against multixact GUCs

2014-04-16 Thread Josh Berkus
On 03/12/2014 09:45 AM, Heikki Linnakangas wrote:
 In hindsight, I think permanent multixids in their current form was a
 mistake. Before 9.3, the thing that made multixids special was that they
 could just be thrown away at a restart. They didn't need freezing. Now
 that they do, why not just use regular XIDs for them? We had to
 duplicate much of the wraparound and freezing logic for multixids that
 simply would not have been an issue if we had used regular XIDs instead.
 
 We could've perhaps kept the old multixids for their original purpose,
 as transient xids that can be forgotten about after all the old
 snapshots are gone. But for the permanent ones, it would've been simpler
 if we handled them more like subxids; make them part of the same XID
 space as regular XIDs.
 
 This is pretty hand-wavy of course, and it's too late now.

So, if we ripped out all the multixact stuff for 9.4, what would that
cost us?  I'm serious.  The multixact stuff has been broken since 9.3
was released, and it's *still* broken.  We can't give users any guidance
or tools on how to set multixact stuff, and autovacuum doesn't handle it
properly.

Seems like this was just a bad patch and we should rip it out.  What
features do we lose?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case against multixact GUCs

2014-04-16 Thread Andres Freund
On 2014-04-16 11:10:52 -0700, Josh Berkus wrote:
 On 03/12/2014 09:45 AM, Heikki Linnakangas wrote:
  In hindsight, I think permanent multixids in their current form was a
  mistake. Before 9.3, the thing that made multixids special was that they
  could just be thrown away at a restart. They didn't need freezing. Now
  that they do, why not just use regular XIDs for them? We had to
  duplicate much of the wraparound and freezing logic for multixids that
  simply would not have been an issue if we had used regular XIDs instead.
  
  We could've perhaps kept the old multixids for their original purpose,
  as transient xids that can be forgotten about after all the old
  snapshots are gone. But for the permanent ones, it would've been simpler
  if we handled them more like subxids; make them part of the same XID
  space as regular XIDs.
  
  This is pretty hand-wavy of course, and it's too late now.
 
 So, if we ripped out all the multixact stuff for 9.4, what would that
 cost us?

Ripping multixacts out in general? Err, right. We'd lose shared row
level locks...

I think ripping out stuff at this point would be the cause of many, many
more bugs than it'd prevent.

 I'm serious.  The multixact stuff has been broken since 9.3
 was released, and it's *still* broken. We can't give users any guidance
 or tools on how to set multixact stuff, and autovacuum doesn't handle it
 properly.

Sorry, but I think you're blowing some GUCs *WAY* out of proportion.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case against multixact GUCs

2014-04-16 Thread Josh Berkus
On 04/16/2014 11:22 AM, Andres Freund wrote:
 I'm serious.  The multixact stuff has been broken since 9.3
 was released, and it's *still* broken. We can't give users any guidance
 or tools on how to set multixact stuff, and autovacuum doesn't handle it
 properly.
 
 Sorry, but I think you're blowing some GUCs *WAY* out of proportion.

I'm not talking about the GUCs. I'm talking about the data corruption
bugs.  Including the new one this week.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case against multixact GUCs

2014-04-16 Thread Andres Freund
On 2014-04-16 11:25:49 -0700, Josh Berkus wrote:
 On 04/16/2014 11:22 AM, Andres Freund wrote:
  I'm serious.  The multixact stuff has been broken since 9.3
  was released, and it's *still* broken. We can't give users any guidance
  or tools on how to set multixact stuff, and autovacuum doesn't handle it
  properly.
  
  Sorry, but I think you're blowing some GUCs *WAY* out of proportion.
 
 I'm not talking about the GUCs.

That was about:
"We can't give users any guidance or tools on how to set multixact
stuff, and autovacuum doesn't handle it properly."

 I'm talking about the data corruption bugs.

That was covered by "at this point ripping this out seems likely to
cause many more bugs than it would solve".

 Including the new one this week.

Lets hold our horses a bit, we don't know what's happening there for
now.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Merlin Moncure
On Wed, Apr 16, 2014 at 1:07 PM, Peter Geoghegan p...@heroku.com wrote:
 On Wed, Apr 16, 2014 at 10:56 AM, Robert Haas robertmh...@gmail.com wrote:
 I don't think he's being unreasonable, and I don't understand why
 you're getting bent out of shape about it.  You proposed a patch, he
 articulated a problem, you don't want to fix it right now.  All of
 which is fine.  Why the ad hominem accusations?

 I just think it's bad form to hold something like this to the same
 standards as a formal commitfest submission. I am well aware that the
 patch probably has several scalability issues.

In fairness to Andres, while *you* may know that issuing an expensive
syscall in a tight loop is on the list of Forbidden Things, a lot of
people don't and it's pretty reasonable to issue methodology
objections in order to get them documented.

Anyways, I'm still curious if you can post similar numbers basing the
throttling on gross allocation counts instead of time.  Meaning: some
number of buffer allocations has to have occurred before you consider
eviction.  Besides being faster I think it's a better implementation:
an intermittently loaded server will give more consistent behavior.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 Anyways, I'm still curious if you can post similar numbers basing the
 throttling on gross allocation counts instead of time.  Meaning: some
 number of buffer allocations has to have occurred before you consider
 eviction.  Besides being faster I think it's a better implementation:
 an intermittently loaded server will give more consistent behavior.

Yeah --- I think wall-clock-based throttling is fundamentally the wrong
thing anyway.  Are we going to start needing a CPU speed measurement to
tune the algorithm with?  Not the place to be going.  But driving it off
the number of allocations that've been done could be sensible.  (OTOH,
that means you need a central counter, which itself would be a
bottleneck.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case against multixact GUCs

2014-04-16 Thread Josh Berkus
On 04/16/2014 11:30 AM, Andres Freund wrote:
 On 2014-04-16 11:25:49 -0700, Josh Berkus wrote:
 On 04/16/2014 11:22 AM, Andres Freund wrote:
 I'm serious.  The multixact stuff has been broken since 9.3
 was released, and it's *still* broken. We can't give users any guidance
 or tools on how to set multixact stuff, and autovacuum doesn't handle it
 properly.

 Sorry, but I think you're blowing some GUCs *WAY* out of proportion.

 I'm not talking about the GUCs.
 
 That was about:
 We can't give users any guidance or tools on how to set multixact
 stuff, and autovacuum doesn't handle it properly.

OK.  I will point out that if multixact freeze was an *intentional*
feature, we'd never have accepted it given the total lack of either
documentation or monitorability.

 
 I'm talking about the data corruption bugs.
 
 That was covered by "at this point ripping this out seems likely to
 cause many more bugs than it would solve".

That's certainly possible.  I just don't think the option of reversing
those patches should be off the table.  Things have been bad enough that
that might be the best option.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Robert Haas
On Wed, Apr 16, 2014 at 2:07 PM, Peter Geoghegan p...@heroku.com wrote:
 On Wed, Apr 16, 2014 at 10:56 AM, Robert Haas robertmh...@gmail.com wrote:
 I don't think he's being unreasonable, and I don't understand why
 you're getting bent out of shape about it.  You proposed a patch, he
 articulated a problem, you don't want to fix it right now.  All of
 which is fine.  Why the ad hominem accusations?

 I just think it's bad form to hold something like this to the same
 standards as a formal commitfest submission. I am well aware that the
 patch probably has several scalability issues.

I don't agree.  I think it's perfectly appropriate to raise potential
issues at the earliest possible time.  People have regularly been
heard to complain in this forum that those objecting to a patch did
not object soon enough for them to make changes.  That's a hard
problem to fix because we can't force people whose salaries we're not
paying to pay attention to patches over whatever else they may have to do,
but we shouldn't label it as a bad thing when people choose to get
involved and provide feedback early.  Early feedback is exactly what
we want to encourage here.

And regardless of any of that, I think person X is being
unreasonable is a personal attack that has exactly zero place on this
mailing list.  We are here to talk about technology, not anyone's
character.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Merlin Moncure
On Wed, Apr 16, 2014 at 1:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Merlin Moncure mmonc...@gmail.com writes:
 Anyways, I'm still curious if you can post similar numbers basing the
 throttling on gross allocation counts instead of time.  Meaning: some
 number of buffer allocations has to have occurred before you consider
 eviction.  Besides being faster I think it's a better implementation:
 an intermittently loaded server will give more consistent behavior.

 Yeah --- I think wall-clock-based throttling is fundamentally the wrong
 thing anyway.  Are we going to start needing a CPU speed measurement to
 tune the algorithm with?  Not the place to be going.  But driving it off
 the number of allocations that've been done could be sensible.  (OTOH,
 that means you need a central counter, which itself would be a
 bottleneck.)

sure -- note we already track that in BufferStrategyControl
(everything in buffer allocation is already centrally managed
essentially).

/*
 * Statistics.  These counters should be wide enough that they can't
 * overflow during a single bgwriter cycle.
 */
uint32  completePasses; /* Complete cycles of the clock sweep */
uint32  numBufferAllocs;        /* Buffers allocated since last reset */

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
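
A hedged sketch of throttling on allocations rather than wall time,
reusing the numBufferAllocs idea from the snippet above; BUMP_INTERVAL
and last_bump_alloc are hypothetical names for this illustration:

#define BUMP_INTERVAL 1024			/* assumed tuning constant */

/*
 * Allow at most one usage_count bump per BUMP_INTERVAL buffer
 * allocations; cur_allocs would come from a counter like
 * numBufferAllocs.  Unsigned subtraction keeps this wraparound-safe.
 */
static bool
usage_bump_allowed(uint32 cur_allocs, uint32 *last_bump_alloc)
{
	if (cur_allocs - *last_bump_alloc >= BUMP_INTERVAL)
	{
		*last_bump_alloc = cur_allocs;	/* remember the last bump */
		return true;
	}
	return false;
}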


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Peter Geoghegan
On Wed, Apr 16, 2014 at 12:17 PM, Robert Haas robertmh...@gmail.com wrote:
 I don't agree.  I think it's perfectly appropriate to raise potential
 issues at the earliest possible time.

If I didn't *strongly* emphasize my intent in writing the patch up
front, I'd certainly agree. I just don't see why what I've done cannot
be accepted in the spirit in which it was intended.

 And regardless of any of that, I think person X is being
 unreasonable is a personal attack that has exactly zero place on this
 mailing list.  We are here to talk about technology, not anyone's
 character.

Telling someone they're being unreasonable is not a personal attack.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-16 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 On Wed, Apr 16, 2014 at 1:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yeah --- I think wall-clock-based throttling is fundamentally the wrong
 thing anyway.  Are we going to start needing a CPU speed measurement to
 tune the algorithm with?  Not the place to be going.  But driving it off
 the number of allocations that've been done could be sensible.  (OTOH,
 that means you need a central counter, which itself would be a
 bottleneck.)

 sure -- note we already track that in BufferStrategyControl
 (everything in buffer allocation is already centrally managed
 essentially).

Indeed, but I'd think getting rid of that property would be one of the
top priorities for any attempt to do anything at all in this area of
the code.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Need Multixact Freezing Docs

2014-04-16 Thread Alvaro Herrera
Josh Berkus wrote:
 
  You can see the current multixact value in pg_controldata output.  Keep
  timestamped values of that somewhere (a table?) so that you can measure
  consumption rate.  I don't think we provide SQL-level access to those
  values.
 
 Bleh.  Do we provide SQL-level access in 9.4?  If not, I think that's a
 requirement before release.

Yeah, good idea.  Want to propose a patch?

  Also: how do I check the multixact age of a table?  There doesn't seem
  to be any data for this ...
  
  pg_class.relminmxid is the oldest multixact value that might be present
  in a table.
 
 On every database I've tested, age(relminmxid) returns int_max.  So this
 is apparently broken.

Hmm, are you sure it's INT_MAX and not 4244967297?  Heikki reported
that: http://www.postgresql.org/message-id/52401aea.9000...@vmware.com
The absolute value is not important; I think that's mostly harmless.  I
don't think applying age() to a multixact value is meaningful, though;
that's only good for Xids.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Need Multixact Freezing Docs

2014-04-16 Thread Josh Berkus
On 04/16/2014 01:30 PM, Alvaro Herrera wrote:
 Josh Berkus wrote:

 You can see the current multixact value in pg_controldata output.  Keep
 timestamped values of that somewhere (a table?) so that you can measure
 consumption rate.  I don't think we provide SQL-level access to those
 values.

 Bleh.  Do we provide SQL-level access in 9.4?  If not, I think that's a
 requirement before release.
 
 Yeah, good idea.  Want to propose a patch?

Yeah, lemme dig into this.  I really think we need it for 9.4, feature
frozen or not.

 Also: how do I check the multixact age of a table?  There doesn't seem
 to be any data for this ...

 pg_class.relminmxid is the oldest multixact value that might be present
 in a table.

 On every database I've tested, age(relminmxid) returns int_max.  So this
 is apparently broken.
 
 Hmm, are you sure it's INT_MAX and not 4244967297?  Heikki reported
 that: http://www.postgresql.org/message-id/52401aea.9000...@vmware.com
 The absolute value is not important; I think that's mostly harmless.  I
 don't think applying age() to a multixact value is meaningful, though;
 that's only good for Xids.

Yeah, I'm sure:


josh=# select relname, age(relminmxid) from pg_class;
 relname |age
-+
 pg_statistic| 2147483647
 pg_type | 2147483647
 random  | 2147483647
 dblink_pkey_results | 2147483647
 pg_toast_17395  | 2147483647

...

So if age() doesn't mean anything, then how are users to know when they
need to freeze?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Need Multixact Freezing Docs

2014-04-16 Thread Alvaro Herrera
Josh Berkus wrote:

  Josh Berkus wrote:
 
  You can see the current multixact value in pg_controldata output.  Keep
  timestamped values of that somewhere (a table?) so that you can measure
  consumption rate.  I don't think we provide SQL-level access to those
  values.
 
  Bleh.  Do we provide SQL-level access in 9.4?  If not, I think that's a
  requirement before release.
  
  Yeah, good idea.  Want to propose a patch?
 
 Yeah, lemme dig into this.  I really think we need it for 9.4, feature
 frozen or not.

Great, thanks.

 josh=# select relname, age(relminmxid) from pg_class;
  relname |age
 -+
  pg_statistic| 2147483647
  pg_type | 2147483647
  random  | 2147483647
  dblink_pkey_results | 2147483647
  pg_toast_17395  | 2147483647
 
 ...
 
  So if age() doesn't mean anything, then how are users to know when they
  need to freeze?

I don't understand.  Autovacuum will freeze this automatically when the
threshold is reached.  Users don't need to do anything.
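
For readers trying to tune this, the thresholds involved are ordinary GUCs (present since 9.3), so they can at least be inspected even without SQL-level access to the current multixact value; the values in the comments are the shipped defaults:

show vacuum_multixact_freeze_min_age;       -- 5000000
show vacuum_multixact_freeze_table_age;     -- 150000000
show autovacuum_multixact_freeze_max_age;   -- 400000000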

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] bgworker crashed or not?

2014-04-16 Thread Petr Jelinek

On 16/04/14 18:34, Robert Haas wrote:

On Wed, Apr 16, 2014 at 12:28 PM, Andres Freund and...@2ndquadrant.com wrote:

On 2014-04-16 12:20:01 -0400, Robert Haas wrote:

On Wed, Apr 16, 2014 at 12:11 PM, Andres Freund and...@2ndquadrant.com wrote:
I'm still not seeing the problem.  It's the background worker's job to
make sure that the right stuff gets logged, just as it would be for
any other backend.  Trying to bolt some portion of the responsibility
for that onto the postmaster is 100% wrong.


Well, it already has taken on that responsibility, it's not my idea to
add it. I merely want to control more precisely what happens.


I think that's doubling down on an already-questionable design principle.

Or if I may be permitted a more colloquial idiom:

Luke, it's a trap.



Well, the logging is just too spammy in general when it comes to dynamic
bgworkers, but that's easy to fix in the future; no need to make
decisions for 9.4.


However, I really don't like that I have to exit with exit code 1, which
normally signals failure, if I want to shut down my dynamic bgworker
once it has finished its work. And this behavior is something we can set
properly only once...



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] New functions for sslinfo extension

2014-04-16 Thread Воронин Дмитрий
Hello all, postgresmen! I want to present some functions for the sslinfo extension module:

1) ssl_get_count_of_extensions() --- get count of X509v3 extensions from client certificate;
2) ssl_get_extension_names() --- get short names of X509v3 extensions from client certificate;
3) ssl_get_extension_value(text) --- get value of extension from certificate (argument --- short name of extension);
4) ssl_is_critical_extension(text) --- returns true, if extension is critical and false, if is not (argument --- short name of extension).

I wrote those functions in C with libpq. I attach the code of the module and the SQL file for loading with this letter.

Best regards,
Dmitry Voronin

#include "postgres.h"
#include "fmgr.h"
#include "utils/numeric.h"
#include "libpq/libpq-be.h"
#include "miscadmin.h"
#include "utils/builtins.h"
#include "mb/pg_wchar.h"
#include "funcapi.h"

#include <openssl/x509.h>
#include <openssl/x509v3.h>


PG_MODULE_MAGIC;

X509_EXTENSION	*get_extension(X509* certificate, char *name);
Datum 		ssl_get_extension_value(PG_FUNCTION_ARGS);
Datum		ssl_is_critical_extension(PG_FUNCTION_ARGS);
Datum 		ssl_get_count_of_extensions(PG_FUNCTION_ARGS);
Datum		ssl_get_extension_names(PG_FUNCTION_ARGS);


X509_EXTENSION *get_extension(X509 *certificate, char *name) {
	int 			extension_nid = 0;
	int 			locate = 0;

	extension_nid = OBJ_sn2nid(name);
	if (0 == extension_nid) {
		extension_nid = OBJ_ln2nid(name);
		if (0 == extension_nid)
			return NULL;
	}
	locate = X509_get_ext_by_NID(certificate, extension_nid, -1);
	return X509_get_ext(certificate, locate);
}


PG_FUNCTION_INFO_V1(ssl_get_extension_value);
Datum
ssl_get_extension_value(PG_FUNCTION_ARGS) {
	X509 			*certificate = MyProcPort->peer;
	X509_EXTENSION 		*extension = NULL;
	char 			*extension_name = text_to_cstring(PG_GETARG_TEXT_P(0));
	BIO 			*bio = NULL;
	char 			*value = NULL;
	text 			*result = NULL;
	char			nullterm = '\0';

	if (NULL == certificate)
		PG_RETURN_NULL();

	extension = get_extension(certificate, extension_name);
	if (NULL == extension)
		elog(ERROR, "Extension by name \"%s\" is not found in certificate", extension_name);

	bio = BIO_new(BIO_s_mem());
	X509V3_EXT_print(bio, extension, -1, -1);
	BIO_write(bio, &nullterm, 1);	/* NUL-terminate the printed value */
	BIO_get_mem_data(bio, &value);

	result = cstring_to_text(value);
	BIO_free(bio);

	PG_RETURN_TEXT_P(result);
}


PG_FUNCTION_INFO_V1(ssl_is_critical_extension);
Datum
ssl_is_critical_extension(PG_FUNCTION_ARGS) {
	X509 			*certificate = MyProcPort->peer;
	X509_EXTENSION 		*extension = NULL;
	char 			*extension_name = text_to_cstring(PG_GETARG_TEXT_P(0));
	int 			critical = 0;

	if (NULL == certificate)
		PG_RETURN_NULL();

	extension = get_extension(certificate, extension_name);
	if (NULL == extension)
		elog(ERROR, "Extension name \"%s\" is not found in certificate", extension_name);

	critical = X509_EXTENSION_get_critical(extension);
	PG_RETURN_BOOL(critical);
}


PG_FUNCTION_INFO_V1(ssl_get_count_of_extensions);
Datum
ssl_get_count_of_extensions(PG_FUNCTION_ARGS) {
	X509 			*certificate = MyProcPort->peer;

	if (NULL == certificate)
		PG_RETURN_NULL();

	PG_RETURN_INT32(X509_get_ext_count(certificate));
}


PG_FUNCTION_INFO_V1(ssl_get_extension_names);
Datum
ssl_get_extension_names(PG_FUNCTION_ARGS) {
	X509				*certificate = MyProcPort->peer;
	FuncCallContext 		*funcctx;
	STACK_OF(X509_EXTENSION) 	*extension_stack = NULL;
	MemoryContext 			oldcontext;
	int				call = 0;
	int				max_calls = 0;
	X509_EXTENSION			*extension = NULL;
	ASN1_OBJECT			*object = NULL;
	int				extension_nid = 0;
	text				*result = NULL;

	if (NULL == certificate)
		PG_RETURN_NULL();

	extension_stack = certificate->cert_info->extensions;
	if (NULL == extension_stack)
		PG_RETURN_NULL();

	if (SRF_IS_FIRSTCALL()) {
		funcctx = SRF_FIRSTCALL_INIT();
		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
		funcctx->max_calls = X509_get_ext_count(certificate);
		MemoryContextSwitchTo(oldcontext);
	}
	funcctx = SRF_PERCALL_SETUP();

	call = funcctx->call_cntr;
	max_calls = funcctx->max_calls;

	if (call < max_calls) {
		extension = sk_X509_EXTENSION_value(extension_stack, call);
		object = X509_EXTENSION_get_object(extension);
		extension_nid = OBJ_obj2nid(object);

		if (0 == extension_nid)
			elog(ERROR, "Unknown extension in certificate");

		result = cstring_to_text(OBJ_nid2sn(extension_nid));

		SRF_RETURN_NEXT(funcctx, PointerGetDatum(result));
	}

	/* the extension stack belongs to the certificate; do not free it here */
	SRF_RETURN_DONE(funcctx);
}


sslextensions.control
Description: Binary data


sslextensions--1.0.sql
Description: application/sql
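
A hypothetical session showing the functions in use, assuming the module is built and installed as an extension named sslextensions (matching the attached control file) and the client connected over SSL with a certificate; the extension names and outputs in the comments are illustrative only:

create extension sslextensions;

select ssl_get_count_of_extensions();                 -- e.g. 3
select * from ssl_get_extension_names();              -- e.g. basicConstraints, keyUsage, ...
select ssl_get_extension_value('basicConstraints');   -- e.g. CA:FALSE
select ssl_is_critical_extension('basicConstraints'); -- t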



Re: [HACKERS] bogus tsdict, tsparser, etc object identities

2014-04-16 Thread Alvaro Herrera
Alvaro Herrera wrote:

 This problem is new in 9.3, so backpatching to that.  Prior to that we
 didn't have object identities.

Done.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Need Multixact Freezing Docs

2014-04-16 Thread Josh Berkus

 So if age() doesn't mean anything, then how are users to know when they
 need to freeze?
 
 I don't understand.  Autovacuum will freeze this automatically when the
 threshold is reached.  Users don't need to do anything.

What I'm asking is:

- how do users know if Autovacuum is keeping up with multixact freezing?
- how do users get data on multixact usage so that they can tune the
parameters?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] assertion failure 9.3.4

2014-04-16 Thread Alvaro Herrera

So, from top to bottom I see the following elements:

* backend is executing a query
* this query is getting captured by pg_stat_statements
* the query is also getting captured by autoexplain, in chain from
  pg_stat_statements
* autoexplain runs the query, which invokes a plpgsql function
* this plpgsql function runs another query, and this one is captured by
  pg_stat_statements
* and also by autoexplain
* The autoexplain run of this inner query invokes a trigger
* .. which is a FK trigger, and therefore runs ri_PerformCheck which
  runs another query
* This FK check query gets captured by pg_stat_statements
* and also by autoexplain, which runs it
* this autoexplain run of the third query invokes SeqNext
* this does a heap access, which uses HeapTupleSatisfiesMVCC
* this one tries to read the update XID and gets an assertion failure.

Oh my.

Would it be possible for you to provide a self-contained setup that
behaves like this?  I think a "bt full" would be useful since it'd
provide the queries at each of the three stages.

I'm not quite clear on why the third query, the one in ri_PerformCheck,
is invoking a sequence.  AFAICS it's this:

/* ----------
 * The query string built is
 *	SELECT 1 FROM ONLY <pktable> x WHERE pkatt1 = $1 [AND ...]
 *		   FOR KEY SHARE OF x
 * The type id's for the $ parameters are those of the
 * corresponding FK attributes.
 * ----------
 */
so either I'm misreading the whole thing, or there is something very odd
going on here.

Are you aware of something unusual in the FK setups?
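
For anyone trying to build such a setup: any plain foreign key produces a check query of exactly that shape. A minimal sketch with hypothetical names; the INSERT into the referencing table fires the RI trigger, and ri_PerformCheck then runs the quoted SELECT ... FOR KEY SHARE against the referenced table:

create table pktable (pkatt1 int primary key);
create table fktable (fkatt1 int references pktable (pkatt1));

insert into pktable values (1);
-- fires the FK trigger, which internally executes
--   SELECT 1 FROM ONLY pktable x WHERE pkatt1 = $1 FOR KEY SHARE OF x
insert into fktable values (1);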

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] improve the help message about psql -F

2014-04-16 Thread Bruce Momjian
On Tue, Feb 11, 2014 at 10:02:20PM -0500, Peter Eisentraut wrote:
 If you are going to change the help string for -F, you should also
 update the help string for -R, and possibly for -z and -0.

Patch applied with all the suggestions merged in;  commitfest item
marked as committed:

  -F, --field-separator=STRING
                           field separator for unaligned output (default: "|")
  -H, --html               HTML table output mode
  -P, --pset=VAR[=ARG]     set printing option VAR to ARG (see \pset command)
  -R, --record-separator=STRING
                           record separator for unaligned output (default: newline)
  -t, --tuples-only        print rows only
  -T, --table-attr=TEXT    set HTML table tag attributes (e.g., width, border)
  -x, --expanded           turn on expanded table output
  -z, --field-separator-zero
                           set field separator for unaligned output to zero byte
  -0, --record-separator-zero
                           set record separator for unaligned output to zero byte


-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] pg_get_viewdefs() indentation considered harmful

2014-04-16 Thread Bruce Momjian
On Sat, Jan 25, 2014 at 01:02:36PM -0500, Andrew Dunstan wrote:
 
 On 01/25/2014 11:06 AM, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Jan 24, 2014 at 8:53 PM, Greg Stark st...@mit.edu wrote:
 Indeed even aside from the performance questions, once you're indented
 5-10 times the indention stops being useful at all. The query would
 probably be even more readable if we just made indentation modulo 40
 so once you get too far indented it wraps around which is not unlike
 how humans actually indent things in this case.
 Ha!  That seems a little crazy, but *capping* the indentation at some
 reasonable value might not be dumb.
 I could go for either of those approaches, if applied uniformly, and
 actually Greg's suggestion sounds a bit better: it seems more likely
 to preserve some readability in deeply nested constructs.
 
 With either approach you need to ask where the limit value is going
 to come from.  Is it OK to just hard-wire a magic number, or do we
 need to expose a knob somewhere?
 
  
 
 
 Simply capping it is probably the best bang for the buck. I suspect
 most people would prefer to have "q1 union q2 union q3 union q4"
 with the subqueries all indented to the same level. But I understand
 the difficulties in doing so.
 
 A knob seems like overkill. I'd just hardwire some number, say three
 or four levels of indentation.

Did we address this?
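
For reference, the growth is easy to reproduce: every nested subquery adds one indentation level to the pretty-printed output, so machine-generated views with deep nesting walk off the right margin. A tiny example with made-up names:

create view v as
  select x from (
    select x from (
      select 1 as x) a) b;

select pg_get_viewdef('v'::regclass, true);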

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] shouldn't we log permission errors when accessing the configured trigger file?

2014-04-16 Thread Bruce Momjian
On Mon, Jan 27, 2014 at 03:45:38PM +0100, Magnus Hagander wrote:
 On Mon, Jan 27, 2014 at 3:43 PM, Robert Haas robertmh...@gmail.com wrote:
 
 On Sun, Jan 26, 2014 at 1:03 PM, Andres Freund and...@2ndquadrant.com
 wrote:
  For some reason CheckForStandbyTrigger() doesn't report permission
  errors when stat()ing the trigger file. Shouldn't we fix that?
 
  static bool
  CheckForStandbyTrigger(void)
  {
  ...
  if (stat(TriggerFile, &stat_buf) == 0)
  {
  ereport(LOG,
  (errmsg("trigger file found: %s",
 TriggerFile)));
  unlink(TriggerFile);
  triggered = true;
  fast_promote = true;
  return true;
  }
 
  Imo the stat() should warn about all errors but ENOENT?
 
 Seems reasonable.  It could lead to quite a bit of log spam, I
 suppose, but the way things are now could be pretty mystifying if
 you've located your trigger file somewhere outside $PGDATA, and a
 parent directory is lacking permissions.
 
 
 +1. Since it actually indicates something that's quite broken (since with that
 you can never make the trigger work until you fix it), the log spam seems like
 it would be appropriate. (Logspam is never nice, but a single log line is also
 very easy to miss - this should log enough that you wouldn't) 

I have developed the attached patch to address this issue.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
new file mode 100644
index 0106cdf..88ad51f
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*** CheckForStandbyTrigger(void)
*** 11102,11107 ****
--- 11102,11113 ----
  		fast_promote = true;
  		return true;
  	}
+ 	else if (errno != ENOENT)
+ 		ereport(ERROR,
+ (errcode_for_file_access(),
+  errmsg("could not stat trigger file \"%s\": %m",
+ 		TriggerFile)));
+ 
  	return false;
  }
  



[HACKERS] Re: test script, was Re: [COMMITTERS] pgsql: psql: conditionally display oids and replication identity

2014-04-16 Thread Bruce Momjian
On Tue, Apr 15, 2014 at 12:32:36PM -0700, David Fetter wrote:
 On Tue, Apr 15, 2014 at 02:46:34PM -0400, Bruce Momjian wrote:
  On Tue, Apr 15, 2014 at 02:32:53PM -0400, Tom Lane wrote:
   Bruce Momjian br...@momjian.us writes:
psql: conditionally display oids and replication identity
   
   Buildfarm isn't terribly pleased with this --- looks like you missed
   contrib/test_decoding/
  
  Fixed.  I added a personal script option that allows me to test contrib,
  but forgot to run it.
 
 Is that script of general utility for committers?  If so, it might be
 good to include it in the distribution.  I'd be happy to go through
 and perl-ify it, document it, etc.  Or maybe it could be a new make
 target...

My script is a wrapper around src/tools/pgtest.  I am attaching it in case
there are snippets in there that are useful to people.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
:

. traprm

[ "$1" = "-c" ] && CONTRIB="Y" && shift
[ "$1" = "-d" ] && DOCS="Y" && shift
[ "$1" = "-c" ] && CONTRIB="Y" && shift

export QUIET=$(($QUIET + 1))

. cd_pgtop

chown -R postgres .

echo "Checking SGML"
cd doc/src/sgml
# make postgres.sgml first so we don't filter on a configure check
make -q postgres.sgml
make check > $TMP/0 2>&1
if grep -v 'fully-tagged' < $TMP/0 | egrep -qi 'Error|Warning'
then	echo "SGML error"
	cat $TMP/0
	exit 1
fi

[ "$(pwd)" != '/pgsql/8.4/doc/src/sgml' ] && make check-tabs

# Run only at night to check for HISTORY build problems
# in HISTORY.html.
if [ ! -t 0 -o "$DOCS" = "Y" ]
then	make INSTALL.html > $TMP/0 2>&1
	if egrep -qi 'Error|Warning' < $TMP/0
	then	echo "SGML error"
		cat $TMP/0
		exit 1
	fi
	# removed in PG 9.4
	make HISTORY.html > $TMP/0 2>&1
	if grep -q 'Error' < $TMP/0
	then	echo "SGML error"
		cat $TMP/0
		exit 1
	fi
fi

# fails on /bin/sh
cd - > /dev/null

echo "Checking duplicate oids"
cd src/include/catalog
duplicate_oids > $TMP/0
if [ -s $TMP/0 ]
then	echo "Duplicate system oids"
	cat $TMP/0
	exit 1
fi
cd - > /dev/null

pggit diff --check || exit 1

echo "Running pgtest"
(aspg /pg/tools/pgtest --silent "$@"; echo $? > $TMP/ret) |
# use only one grep so we don't buffer output
egrep -v 'In file included from gram.y:|warning: unused variable .yyg.|yy_try_NUL_trans'

if [ "$CONTRIB" ]
then	cd contrib
	make --silent
	# install-check is much faster because no initdbs
	aspg make --silent check
	cd - > /dev/null
fi

rm -fr src/test/regress/tmp_check


[ -t 0 ] && bell

exit $(cat $TMP/ret)



Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2014-04-16 Thread Bruce Momjian
On Thu, Feb  6, 2014 at 09:40:32AM +0100, Andres Freund wrote:
 On 2014-02-05 12:36:42 -0500, Robert Haas wrote:
   It may well be that your proposal is spot on.  But I'd like to see some
   data-structure-by-data-structure measurements, rather than assuming that
   alignment must be a good thing.
  
   I am fine with just aligning BufferDescriptors properly. That has
   clearly shown massive improvements.
  
  I thought your previous idea of increasing BUFFERALIGN to 64 bytes had
  a lot to recommend it.
 
 Good.
 
 I wonder if we shouldn't move that bit of logic:
   if (size >= BUFSIZ)
   newStart = BUFFERALIGN(newStart);
 out of ShmemAlloc() and instead have a ShmemAllocAligned() and
 ShmemInitStructAligned() that does it. So we can sensibly control it
 per struct.
 
  But that doesn't mean it doesn't need testing.
 
 I feel the need here to say that I never said it doesn't need testing
 and never thought it didn't...

Where are we on this?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +



