Re: [HACKERS] [v9.5] Custom Plan API

2014-05-07 Thread Kouhei Kaigai
> On 7 May 2014 02:05, Kouhei Kaigai  wrote:
> > Prior to the development cycle towards v9.5, I'd like to reopen the
> > discussion of custom-plan interface. Even though we had lots of
> > discussion during the last three commit-fests, several issues are
> > still under discussion. So, I'd like to clarify direction of the
> > implementation, prior to the first commit-fest.
> >
> > (1) DDL support and system catalog
> >
> > Simon suggested that DDL command should be supported to track custom-
> > plan providers being installed, and to avoid nonsense hook calls if it
> > is an obvious case that custom-plan provider can help. It also makes
> > sense to give a chance to load extensions once installed.
> > (In the previous design, I assumed modules are loaded by LOAD command
> > or *_preload_libraries parameters).
> >
> > I tried to implement the following syntax:
> >
> >   CREATE CUSTOM PLAN  FOR (scan|join|any) HANDLER ;
> 
> Thank you for exploring that thought and leading the way on this research.
> I've been thinking about this also.
> 
> What I think we need is a declarative form that expresses the linkage between
> base table(s) and related data structures that can be used to optimize
> a query, while still providing accurate results.
> 
> In other DBMS, we have concepts such as a JoinIndex or a MatView which allow
> some kind of lookaside behaviour. Just for clarity, a concrete example is
> Oracle's Materialized Views which can be set using ENABLE QUERY REWRITE
> so that the MatView can be used as an alternative path for a query. We do
> already have this concept in PostgreSQL, where an index can be used to
> perform an IndexOnlyScan rather than accessing the heap itself.
> 
> We have considerable evidence that the idea of alternate data structures
> results in performance gains.
> * KaiGai's work - https://wiki.postgresql.org/wiki/PGStrom
> * http://www.postgresql.org/message-id/52c59858.9090...@garret.ru
> * http://citusdata.github.io/cstore_fdw/
> * University of Manchester - exploring GPUs as part of the AXLE project
> * Barcelona SuperComputer Centre - exploring FPGAs, as part of the AXLE
> project
> * Some other authors have also cited gains using GPU technology in databases
> 
> So I would like to have a mechanism that provides a *generic* Lookaside
> for a table or foreign table.
> 
> Tom and Kevin have previously expressed that MatViews would represent a
> planning problem, in the general case. One way to solve that planning issue
> is to link structures directly together, in the same way that an index and
> a table are linked. We can then process the lookaside in the same way we
> handle a partial index - check prerequisites and if usable, calculate a
> cost for the alternate path.
> We need not add planning time other than to the tables that might benefit
> from that.
> 
> Roughly, I'm thinking of this...
> 
> CREATE LOOKASIDE ON foo
>TO foo_mat_view;
> 
> and also this...
> 
> CREATE LOOKASIDE ON foo
>TO foo_as_a_foreign_table   /* e.g. PGStrom */
> 
> This would allow the planner to consider alternate plans for foo_mv during
> set_plain_rel_pathlist() similarly to the way it considers index paths,
> in one of the common cases that the mat view covers just one table.
> 
> This concept is similar to ENABLE QUERY REWRITE in Oracle, but this thought
> goes much further, to include any generic user-defined data structure or
> foreign table.
> 
Let me clarify. This mechanism allows us to add alternative scan/join paths,
including built-in ones, not only custom enhanced plan/exec nodes, doesn't it?
It is probably a variation of the above proposition if we install a handler
function that proposes built-in path nodes in response to a scan/join request.

> Do we need this? For MVs, we *might* be able to deduce that the MV is
> rewritable for "foo", but that is not deducible for Foreign Tables, by
> current definition, so I prefer the explicit definition of objects that
> are linked - since doing this for indexes is already familiar to people.
> 
> Having an explicit linkage between data structures allows us to enhance
> an existing application by transparently adding new structures, just as
> we already do with indexes. Specifically, that we allow more than one
> lookaside structure on any one table.
> 
Not only alternative data structures, but also alternative methods of
scanning/joining the same data structure are important, aren't they?

> Forget the exact name, that's not important. But I think the requirements
> here are...
> 
> * Explicit definition that we are attaching an alternate path onto a table
> (conceptually similar to adding an index)
> 
I think the syntax should allow a set of "tables", not only a particular table.
It would inform the core planner that this lookaside/customplan (the name is
not important) can provide an alternative path for the set of relations being
considered. So, it reduces the number of function calls during the planning
stage.

> * Ability to check that the alternate path is via

Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Simon Riggs
On 6 May 2014 17:55, Andres Freund  wrote:

>> All this changes is the cost of
>> IndexScans that would use more than 25% of shared_buffers worth of
>> data. Hopefully not many of those in your workload. Changing the cost
>> doesn't necessarily prevent index scans either. And if there are many
>> of those in your workload AND you run more than one at same time, then
>> the larger setting will work against you. So the benefit window for
>> such a high setting is slim, at best.
>
> Why? There's many workloads where indexes are larger than shared buffers
> but fit into the operating system's cache. And that's precisely what
> effective_cache_size is about.
> Especially on bigger machines shared_buffers can't be set big enough to
> actually use all the machine's memory. It's not uncommon to have 4GB
> shared buffers on a machine with 512GB RAM... It'd be absolutely
> disastrous to set effective_cache_size to 1GB for an analytics workload.

In this case, a setting of effective_cache_size > (4 * shared_buffers)
could be appropriate, as long as we are certain we have the memory.

We don't have any stats on peak memory usage to be certain - although
in that case it's pretty clear.

If we had stats on how effective the indexscan was at multiple-hitting
earlier read blocks, we'd be able to autotune, but I accept that
without that we do still need the parameter.

>> I specifically picked 25% of shared_buffers because that is the point
>> at which sequential scans become more efficient and use the cache more
>> efficiently. If our cost models are correct, then switching away from
>> index scans shouldn't hurt at all.
>
> More often than not indexes are smaller than the table size, so this
> argument doesn't seem to make much sense.

If we believe that 25% of shared_buffers worth of heap blocks would
flush the cache doing a SeqScan, why should we allow 400% of
shared_buffers worth of index blocks? In your example, that would be
1GB of heap blocks, or 16GB of index blocks.  If our table is 100GB
with a 32GB index, then yes, that is 1% of the heap and 50% of the
index. But that doesn't matter, since I am discussing the point at
which we prevent the cache being churned. Given your example we do not
allow a SeqScan of a table larger than 1GB to flush cache, since we
use BAS_BULKREAD. If we allow an indexscan plan that touches 16GB of
an index, that will very clearly flush out our 4GB of shared_buffers,
increasing the time for later queries even if they only have to read
from OS buffers back into shared_buffers. That will still show itself
as a CPU spike, which is what people say they are seeing.

I think I'm arguing myself towards using a BufferAccessStrategy of
BAS_BULKREAD for large IndexScans, BitMapIndexScans and
BitMapHeapScans. Yes, we can make plans assuming we can use OS cache,
but we shouldn't be churning shared_buffers when we execute those
plans. "large" here meaning the same thing as it does for SeqScans,
which is a scan that seems likely to touch more than 25% of shared
buffers. I'll work up a patch.

Perhaps it would also be useful to consider using a sequential scan of
the index relation for less selective BitmapIndexScans, just as we do
very effectively during VACUUM. Maybe that is a better idea than
bitmap indexes.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 9.4 checksum errors in recovery with gin index

2014-05-07 Thread Jeff Janes
When recovering from a crash (with injection of a partial page write at
time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum
verification failure.

16396 is a gin index.

If I have it ignore checksum failures, there is no apparent misbehavior.
I'm trying to bisect it, but it could take a while and I thought someone
might have some theories based on the log:

29075  2014-05-06 23:29:51.411 PDT:LOG:  0: database system was not
properly shut down; automatic recovery in progress
29075  2014-05-06 23:29:51.411 PDT:LOCATION:  StartupXLOG, xlog.c:6361
29075  2014-05-06 23:29:51.412 PDT:LOG:  0: redo starts at 11/323FE1C0
29075  2014-05-06 23:29:51.412 PDT:LOCATION:  StartupXLOG, xlog.c:6600
29075  2014-05-06 23:29:51.471 PDT:WARNING:  01000: page verification
failed, calculated checksum 35967 but expected 7881
29075  2014-05-06 23:29:51.471 PDT:CONTEXT:  xlog redo Delete list pages
(16), node: 1663/16384/16396 blkno: 0
29075  2014-05-06 23:29:51.471 PDT:LOCATION:  PageIsVerified, bufpage.c:145
29075  2014-05-06 23:29:51.471 PDT:FATAL:  XX001: invalid page in block
28486 of relation base/16384/16396
29075  2014-05-06 23:29:51.471 PDT:CONTEXT:  xlog redo Delete list pages
(16), node: 1663/16384/16396 blkno: 0
29075  2014-05-06 23:29:51.471 PDT:LOCATION:  ReadBuffer_common,
bufmgr.c:483
27799  2014-05-06 23:29:51.473 PDT:LOG:  0: startup process (PID 29075)
exited with exit code 1
27799  2014-05-06 23:29:51.473 PDT:LOCATION:  LogChildExit,
postmaster.c:3281
27799  2014-05-06 23:29:51.473 PDT:LOG:  0: aborting startup due to
startup process failure

Cheers,

Jeff


Re: [HACKERS] 9.4 checksum errors in recovery with gin index

2014-05-07 Thread Andres Freund
Hi,

On 2014-05-07 00:35:35 -0700, Jeff Janes wrote:
> When recovering from a crash (with injection of a partial page write at
> time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum
> verification failure.
> 
> 16396 is a gin index.

Over which type? What was the load? make check?

> If I have it ignore checksum failures, there is no apparent misbehavior.
>  I'm trying to bisect it, but it could take a while and I thought someone
> might have some theories based on the log:

If you have the WAL a pg_xlogdump grepping for everything referring to
that block would be helpful.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] postgresql.auto.conf read from wrong directory

2014-05-07 Thread Amit Kapila
On Tue, May 6, 2014 at 7:04 PM, Christoph Berg  wrote:
> Hi,
>
> if you split configuration and data by setting data_directory,
> postgresql.auto.conf is written to the data directory
> (/var/lib/postgresql/9.4/main in Debian), but tried to be read from
> the etc directory (/etc/postgresql/9.4/main).
>
> One place to fix it would be in ProcessConfigFile in
> src/backend/utils/misc/guc-file.l:162 by always setting
> CallingFileName = NULL in src/backend/utils/misc/guc-file.l:162, but
> that breaks later in AbsoluteConfigLocation() when data_directory is
> NULL. (As the comment in ProcessConfigFile says.)

This problem occurs because we don't have the value of data_directory
set in postgresql.conf by the time we want to parse .auto.conf file
during server start.  The value of data_directory is only available after
processing of config files.  To fix it, we need to store the value of
data_directory during parse of postgresql.conf file so that we can use it
until data_directory is actually set.  The attached patch fixes the problem.
Could you please confirm whether it fixes the problem in your
environment/scenario.

Another way to fix it would be to directly set the config value of
data_directory during parsing, but I did not prefer that approach because
the parse routine is called from multiple paths that process the values
separately later on.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_processing_auto_conf-v1.patch
Description: Binary data



Re: [HACKERS] [v9.5] Custom Plan API

2014-05-07 Thread Simon Riggs
On 7 May 2014 08:17, Kouhei Kaigai  wrote:

> Let me clarify. This mechanism allows to add alternative scan/join paths
> including built-in ones, not only custom enhanced plan/exec node, isn't it?
> Probably, it is a variation of above proposition if we install a handler
> function that proposes built-in path nodes towards the request for scan/join.

Yes, I am looking for a way to give you the full extent of your
requirements, within the Postgres framework. I have time and funding
to assist you in achieving this in a general way that all may make use
of.

> Not only alternative data structure, alternative method to scan/join towards
> same data structure is also important, isn't it?

Agreed. My proposal is that if the planner allows the lookaside to an
FDW then we pass the query for full execution on the FDW. That means
that the scan, aggregate and join could take place via the FDW. i.e.
"Custom Plan" == lookaside + FDW

Or put another way, if we add Lookaside then we can just plug in the
pgstrom FDW directly and we're done. And everybody else's FDW will
work as well, so Citus etc. will not need to recode.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Heikki Linnakangas

On 05/06/2014 11:30 PM, Peter Geoghegan wrote:

On Tue, May 6, 2014 at 12:48 PM, Andres Freund  wrote:

Enthusiastically seconded. I've asked for that about three times without much 
success. If it had been my decision the patch wouldn't have been merged without 
that and other adjustments.


I'm almost certain that the only feedback of yours that I didn't
incorporate was that I didn't change the name of JsonbValue, a
decision I stand by, and also that I didn't add ascii art to
illustrate the on-disk format. I can write a patch that adds the
latter soon.


That would be great.

I found the serialization routine, convertJsonb() to be a bit of a mess. 
It's maintaining a custom stack of levels, which can be handy if you 
need to avoid recursion, but it's also relying on the native stack. And 
I didn't understand the point of splitting it into the "walk" and "put" 
functions; the division of labor between the two was far from clear 
IMHO. I started refactoring that, and ended up with the attached.


One detail that I found scary is that the estSize field in JsonbValue is 
not just any rough estimate. It's used in the allocation of the output 
buffer for convertJsonb(), so it has to be large enough or you hit an 
assertion or buffer overflow. I believe it was correct as it was, but 
that kind of programming is always scary. I refactored the 
convertJsonb() function to use a StringInfo buffer instead, and removed 
estSize altogether.


This is still work-in-progress, but I thought I'd post this now to let 
people know I'm working on it. For example, the StringInfo isn't 
actually very well suited for this purpose, it might be better to have a 
custom buffer that's enlarged when needed.


For my own sanity, I started writing some docs on the on-disk format. 
See the comments in jsonb.h for my understanding of it. I moved around 
the structs a bit in jsonb.h, to make the format clearer, but the actual 
on-disk format is unchanged.


- Heikki

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index cf5d6f2..8413da7 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -65,7 +65,7 @@ jsonb_recv(PG_FUNCTION_ARGS)
 	if (version == 1)
 		str = pq_getmsgtext(buf, buf->len - buf->cursor, &nbytes);
 	else
-		elog(ERROR, "Unsupported jsonb version number %d", version);
+		elog(ERROR, "unsupported jsonb version number %d", version);
 
 	return jsonb_from_cstring(str, nbytes);
 }
@@ -249,7 +249,6 @@ jsonb_in_object_field_start(void *pstate, char *fname, bool isnull)
 	v.type = jbvString;
 	v.val.string.len = checkStringLen(strlen(fname));
 	v.val.string.val = pnstrdup(fname, v.val.string.len);
-	v.estSize = sizeof(JEntry) + v.val.string.len;
 
 	_state->res = pushJsonbValue(&_state->parseState, WJB_KEY, &v);
 }
@@ -290,8 +289,6 @@ jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype)
 	JsonbInState *_state = (JsonbInState *) pstate;
 	JsonbValue	v;
 
-	v.estSize = sizeof(JEntry);
-
 	switch (tokentype)
 	{
 
@@ -300,7 +297,6 @@ jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype)
 			v.type = jbvString;
 			v.val.string.len = checkStringLen(strlen(token));
 			v.val.string.val = pnstrdup(token, v.val.string.len);
-			v.estSize += v.val.string.len;
 			break;
 		case JSON_TOKEN_NUMBER:
 
@@ -312,7 +308,6 @@ jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype)
 			v.type = jbvNumeric;
 			v.val.numeric = DatumGetNumeric(DirectFunctionCall3(numeric_in, CStringGetDatum(token), 0, -1));
 
-			v.estSize += VARSIZE_ANY(v.val.numeric) +sizeof(JEntry) /* alignment */ ;
 			break;
 		case JSON_TOKEN_TRUE:
 			v.type = jbvBool;
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 1caaa4a..49a1d4d 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -37,49 +37,16 @@
 #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), \
 			 JENTRY_POSMASK))
 
-/*
- * State used while converting an arbitrary JsonbValue into a Jsonb value
- * (4-byte varlena uncompressed representation of a Jsonb)
- *
- * ConvertLevel:  Bookkeeping around particular level when converting.
- */
-typedef struct convertLevel
-{
-	uint32		i;/* Iterates once per element, or once per pair */
-	uint32	   *header;			/* Pointer to current container header */
-	JEntry	   *meta;			/* This level's metadata */
-	char	   *begin;			/* Pointer into convertState.buffer */
-} convertLevel;
-
-/*
- * convertState:  Overall bookkeeping state for conversion
- */
-typedef struct convertState
-{
-	/* Preallocated buffer in which to form varlena/Jsonb value */
-	Jsonb	   *buffer;
-	/* Pointer into buffer */
-	char	   *ptr;
-
-	/* State for  */
-	convertLevel *allState,		/* Overall state array */
-			   *contPtr;		/* Cur container pointer (in allState) */
-
-	/* Current size of buffer containing allState array */
-	Size		levelSz;
-
-} convertState;
-
 static int	compareJsonbScalarValue(JsonbValue *a, Json

[HACKERS] PGDLLEXPORTing all GUCs?

2014-05-07 Thread Craig Ringer
Hi all

As part of development on BDR the issue of GUC globals not being
PGDLLEXPORT'ed has been run into a few times.

Is there any reason _not_ to PGDLLEXPORT all GUCs, other than cosmetic
concerns?

Barring objections I'll post a patch to do this tomorrow.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] [v9.5] Custom Plan API

2014-05-07 Thread Kouhei Kaigai




> -Original Message-
> From: Simon Riggs [mailto:si...@2ndquadrant.com]
> Sent: Wednesday, May 07, 2014 5:02 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Tom Lane; Robert Haas; Andres Freund; PgHacker; Stephen Frost; Shigeru
> Hanada; Jim Mlodgenski; Peter Eisentraut; Kohei KaiGai
> Subject: Re: [v9.5] Custom Plan API
> 
> On 7 May 2014 08:17, Kouhei Kaigai  wrote:
> 
> > Let me clarify. This mechanism allows to add alternative scan/join
> > paths including built-in ones, not only custom enhanced plan/exec node,
> isn't it?
> > Probably, it is a variation of above proposition if we install a
> > handler function that proposes built-in path nodes towards the request
> for scan/join.
> 
> Yes, I am looking for a way to give you the full extent of your requirements,
> within the Postgres framework. I have time and funding to assist you in
> achieving this in a general way that all may make use of.
> 
> > Not only alternative data structure, alternative method to scan/join
> > towards same data structure is also important, isn't it?
> 
> Agreed. My proposal is that if the planner allows the lookaside to an FDW
> then we pass the query for full execution on the FDW. That means that the
> scan, aggregate and join could take place via the FDW. i.e.
> "Custom Plan" == lookaside + FDW
> 
> Or put another way, if we add Lookaside then we can just plug in the pgstrom
> FDW directly and we're done. And everybody else's FDW will work as well,
> so Citus etcc will not need to recode.
> 
Hmm. That sounds as though you intend to make FDW perform as a central facility
to host pluggable plan/exec stuff. Even though we have several things to be
clarified, I also think it's a direction worth investigating.

Let me list, in no particular order, the things to be clarified or developed.

* Join replacement by FDW; we still don't have consensus about join replacement
  by FDW. It will probably be designed primarily for a remote-join
  implementation; however, the work involved is similar. We may need to revisit
  Hanada-san's earlier proposition.

* Lookaside for ANY relation; I want the planner to try GPU-scan for any
  relation once the module is installed, to reduce the user's administration
  cost. This requires that a lookaside can specify a particular foreign server,
  not a foreign table, and then create a ForeignScan node that is not
  associated with a particular foreign table.

* A ForeignScan node that is not associated with a particular foreign table.
  Once we try to apply a ForeignScan node instead of Sort or Aggregate, the
  existing FDW implementation needs to be improved. These nodes scan a
  materialized relation (generated on the fly); however, existing FDW code
  assumes a ForeignScan node is always associated with a particular foreign
  table. We need to eliminate this restriction.

* An FDW method for MultiExec. When we can stack multiple ForeignScan
  nodes, it is helpful to let them exchange scanned tuples in their own
  data format. Suppose two ForeignScan nodes are stacked: one performs
  like Sort, the other like Scan. If they internally handle a column-
  oriented data format, TupleTableSlot is not the best way to exchange data.

* Lookaside on INSERT/UPDATE/DELETE. This can probably be implemented
  using the writable FDW feature. Not a big issue, but don't forget it...

How about your opinion?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-07 Thread Simon Riggs
On 7 May 2014 10:06, Kouhei Kaigai  wrote:

> Let me list up the things to be clarified / developed randomly.
>
> * Join replacement by FDW; We still don't have consensus about join 
> replacement
>   by FDW. Probably, it will be designed to remote-join implementation 
> primarily,
>   however, things to do is similar. We may need to revisit the Hanada-san's
>   proposition in the past.

Agreed. We need to push down joins into FDWs and we need to push down
aggregates also, so they can be passed to FDWs. I'm planning to look
at aggregate push down.

> * Lookaside for ANY relations; I want planner to try GPU-scan for any 
> relations
>   once installed, to reduce user's administration cost.
>   It needs lookaside allow to specify a particular foreign-server, not 
> foreign-
>   table, then create ForeignScan node that is not associated with a particular
>   foreign-table.

IMHO we would not want to add indexes to every column, on every table,
nor would we wish to use lookaside for all tables. It is a good thing
to be able to add optimizations for individual tables. GPUs are not
good for everything; it is good to be able to leverage their
strengths, yet avoid their weaknesses.

If you do want that, you can write an Event Trigger that automatically
adds a lookaside for any table.

> * ForeignScan node that is not associated with a particular foreign-table.
>   Once we try to apply ForeignScan node instead of Sort or Aggregate, existing
>   FDW implementation needs to be improved. These nodes scan on a materialized
>   relation (generated on the fly), however, existing FDW code assumes
>   ForeignScan node is always associated with a particular foreign-table.
>   We need to eliminate this restriction.

I don't think we need to do that, given the above.

> * FDW method for MultiExec. In case when we can stack multiple ForeignScan
>   nodes, it's helpful to support to exchange scanned tuples in their own
>   data format. Let's assume two ForeignScan nodes are stacked. One performs
>   like Sort, another performs like Scan. If they internally handle column-
>   oriented data format, TupleTableSlot is not a best way for data exchange.

I agree TupleTableSlot may not be best way for bulk data movement. We
probably need to look at buffering/bulk movement between executor
nodes in general, which would be of benefit for the FDW case also.
This would be a problem even for Custom Scans as originally presented
also, so I don't see much change there.

> * Lookaside on the INSERT/UPDATE/DELETE. Probably, it can be implemented
>   using writable FDW feature. Not a big issue, but don't forget it...

Yes, possible.


I hope these ideas make sense. This is early days and there may be
other ideas and much detail yet to come.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] postgresql.auto.conf read from wrong directory

2014-05-07 Thread Christoph Berg
Re: Amit Kapila 2014-05-07 

> This problem occurs because we don't have the value of data_directory
> set in postgresql.conf by the time we want to parse .auto.conf file
> during server start.  The value of data_directory is only available after
> processing of config files.  To fix it, we need to store the value of
> data_directory during parse of postgresql.conf file so that we can use it
> till data_directory is actually set.  Attached patch fixes the problem.
> Could you please once confirm if it fixes the problem in your
> env./scenario.

Hi Amit,

the patch fixes the problem. Thanks for looking into this.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Heikki Linnakangas
So, apart from cleaning up the code, we really need to take a close look 
at the on-disk format now. The code can be cleaned up later, too, but 
we're going to be stuck with the on-disk format forever, so it's 
critical to get that right.


First, a few observations:

* JENTRY_ISFIRST is redundant. Whenever you deal with the Jentry struct, 
you know from the context which element in the array it is.


* JENTRY_ISNEST is set but never used.

* JENTRY_ISBOOL is defined as (JENTRY_ISNUMERIC | JENTRY_ISNEST), which 
seems confusing.


I'm going to proceed refactoring those things, which will change the 
on-disk format. It's late in the release cycle - these things really 
should've been cleaned up earlier - but it's important to get the 
on-disk format right. Shout if you have any objections.


- Heikki




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Michael Paquier
On Wed, May 7, 2014 at 8:20 PM, Heikki Linnakangas
 wrote:
> So, apart from cleaning up the code, we really need to take a close look at
> the on-disk format now. The code can be cleaned up later, too, but we're
> going to be stuck with the on-disk format forever, so it's critical to get
> that right.
>
> First, a few observations:
>
> * JENTRY_ISFIRST is redundant. Whenever you deal with the Jentry struct, you
> know from the context which element in the array it is.
>
> * JENTRY_ISNEST is set but never used.
>
> * JENTRY_ISBOOL is defined as (JENTRY_ISNUMERIC | JENTRY_ISNEST), which
> seems confusing.
>
> I'm going to proceed refactoring those things, which will change the on-disk
> format. It's late in the release cycle - these things really should've been
> cleaned up earlier - but it's important to get the on-disk format right.
> Shout if you have any objections.
+1. It is saner to do that now than never.
-- 
Michael




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Andres Freund
Hi,

On 2014-05-07 14:20:19 +0300, Heikki Linnakangas wrote:
> So, apart from cleaning up the code, we really need to take a close look at
> the on-disk format now. The code can be cleaned up later, too, but we're
> going to be stuck with the on-disk format forever, so it's critical to get
> that right.

+1

> First, a few observations:

Agreed.

I'd like to add that:
* Imo we need space in jsonb on-disk values to indicate a format
  version. We won't fully get this right the first time.
* The jentry representation should be changed so it's possible to get the type
  of an entry without checking individual types. That'll make code like
  findJsonbValueFromSuperHeader() (well, whatever you've renamed it to)
  much easier to read. Preferably it should have the same values (after
  shifting/masking) as the JBE variants. And it needs space for further
  types (or representations thereof).
* I wonder if the hash/object pair representation is wise and if it
  shouldn't be keys combined with offsets first, and then the
  values. That will make access much faster. And that's what jsonb
  essentially is about.
* I think both arrays and hashes should grow individual structs. With
  space for additional flags.

* I have doubts of the wisdom of allowing to embed jbvBinary values in
  JsonbValues. Although that can be changed later since it's not on disk.

> I'm going to proceed refactoring those things, which will change the on-disk
> format. It's late in the release cycle - these things really should've been
> cleaned up earlier - but it's important to get the on-disk format right.
> Shout if you have any objections.

I don't think it's likely that beta1 will be binary compatible with the
final version at this point.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Robert Haas
On Wed, May 7, 2014 at 3:18 AM, Simon Riggs  wrote:
> If we believe that 25% of shared_buffers worth of heap blocks would
> flush the cache doing a SeqScan, why should we allow 400% of
> shared_buffers worth of index blocks?

I think you're comparing apples and oranges.  The 25% threshold is
answering the question "How big does a sequential scan have to be
before it's likely to flush so much unrelated data out of
shared_buffers that it hurts the performance of other things running
on the system?". So it's not really about whether or not things will
*fit* in the cache, but rather a judgement about at what point caching
that stuff is going to be of less value than continuing to cache other
things.  Also, it's specifically a judgement about shared_buffers, not
system memory.

But effective_cache_size is used to estimate the likelihood that an
index scan which accesses the same heap or index block twice will
still be in cache on the second hit, and thus need to be faulted in
only once.  So this *is* a judgment about what will fit - generally
over a very short time scale.  And, since bringing a page into
shared_buffers from the OS cache is much less expensive than bringing
a page into memory from disk, it's really about what will fit in
overall system memory, not just shared_buffers.
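For context, the planner's arithmetic for that estimate lives in
index_pages_fetched() in costsize.c, which is based on the Mackert-Lohman
approximation. A simplified, self-contained rendering of that
approximation might look like this; it is a sketch, not the exact backend
code (which, for instance, also scales the cache size by the table's
share of effective_cache_size and rounds with ceil()):

```c
#include <assert.h>

/*
 * Simplified sketch of the Mackert-Lohman approximation underlying
 * PostgreSQL's index_pages_fetched(): given a table of T pages, N tuple
 * fetches in random order, and an effective cache of b pages, estimate
 * how many page fetches actually occur (a page evicted from the cache
 * and hit again counts as another fetch).  Illustrative only; the real
 * code in costsize.c differs in details such as rounding and how b is
 * derived from effective_cache_size.
 */
double
pages_fetched(double N, double T, double b)
{
    double  pf;

    if (T <= b)
    {
        /* Whole table fits in cache: every page is fetched at most once. */
        pf = (2.0 * T * N) / (2.0 * T + N);
        if (pf > T)
            pf = T;
    }
    else
    {
        /* Beyond this many tuple fetches, evictions force re-fetches. */
        double  lim = (2.0 * T * b) / (2.0 * T - b);

        if (N <= lim)
            pf = (2.0 * T * N) / (2.0 * T + N);
        else
            pf = b + (N - lim) * (T - b) / T;
    }
    return pf;
}
```

The larger b is, the closer the estimate gets to one fetch per page; with
a small b the fetch count can exceed T because evicted pages are read
again. That is precisely the "what will fit" judgment described above.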

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Robert Haas
On Wed, May 7, 2014 at 7:40 AM, Andres Freund  wrote:
>> I'm going to proceed refactoring those things, which will change the on-disk
>> format. It's late in the release cycle - these things really should've been
>> cleaned up earlier - but it's important to get the on-disk format right.
>> Shout if you have any objections.

+1.

> I don't think it's likely that beta1 will be binary compatible with the
> final version at this point.

I rather think we're not ready for beta1 at this point (but I expect
to lose that argument).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Magnus Hagander
On Wed, May 7, 2014 at 1:20 PM, Heikki Linnakangas
wrote:

> So, apart from cleaning up the code, we really need to take a close look
> at the on-disk format now. The code can be cleaned up later, too, but we're
> going to be stuck with the on-disk format forever, so it's critical to get
> that right.
>
> First, a few observations:
>
> * JENTRY_ISFIRST is redundant. Whenever you deal with the Jentry struct,
> you know from the context which element in the array it is.
>
> * JENTRY_ISNEST is set but never used.
>
> * JENTRY_ISBOOL is defined as (JENTRY_ISNUMERIC | JENTRY_ISNEST), which
> seems confusing.
>
> I'm going to proceed refactoring those things, which will change the
> on-disk format. It's late in the release cycle - these things really
> should've been cleaned up earlier - but it's important to get the on-disk
> format right. Shout if you have any objections.


+1. It's now or never. If the on-disk format needs changing, not doing it
now is going to leave us with a "jsonc" in the future... Better bite the
bullet now.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Andres Freund
On 2014-05-07 08:50:33 -0400, Robert Haas wrote:
> On Wed, May 7, 2014 at 7:40 AM, Andres Freund  wrote:
> > I don't think it's likely that beta1 will be binary compatible with the
> > final version at this point.
> 
> I rather think we're not ready for beta1 at this point (but I expect
> to lose that argument).

Well, I guess it depends on what we define 'beta1' to be. Imo evaluating
problematic pieces of new code, locating unfinished pieces is part of
that. I don't see much point in forbidding incompatible changes in beta1
personally. That robs the development cycle of the only period where
users can actually test the new version in a halfway sane manner and
report back with things that are apparently broken.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Magnus Hagander
On Wed, May 7, 2014 at 2:56 PM, Andres Freund wrote:

> On 2014-05-07 08:50:33 -0400, Robert Haas wrote:
> > On Wed, May 7, 2014 at 7:40 AM, Andres Freund 
> wrote:
> > > I don't think it's likely that beta1 will be binary compatible with the
> > > final version at this point.
> >
> > I rather think we're not ready for beta1 at this point (but I expect
> > to lose that argument).
>
> Well, I guess it depends on what we define 'beta1' to be. Imo evaluating
> problematic pieces of new code, locating unfinished pieces is part of
> that. I don't see much point in forbidding incompatible changes in beta1
> personally. That robs the development cycle of the only period where
> users can actually test the new version in a halfway sane manner and
> report back with things that are apparently broken.
>
>
We need to be very careful to tell people about it though. Preferably if
we *know* a dump/reload will be needed to go beta1->beta2, we should
actually document that in the release notes of beta1 already. So people can
make proper plans.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Andres Freund
On 2014-05-07 15:00:01 +0200, Magnus Hagander wrote:
> On Wed, May 7, 2014 at 2:56 PM, Andres Freund wrote:
> 
> > On 2014-05-07 08:50:33 -0400, Robert Haas wrote:
> > > On Wed, May 7, 2014 at 7:40 AM, Andres Freund 
> > wrote:
> > > > I don't think it's likely that beta1 will be binary compatible with the
> > > > final version at this point.
> > >
> > > I rather think we're not ready for beta1 at this point (but I expect
> > > to lose that argument).
> >
> > Well, I guess it depends on what we define 'beta1' to be. Imo evaluating
> > problematic pieces of new code, locating unfinished pieces is part of
> > that. I don't see much point in forbidding incompatible changes in beta1
> > personally. That robs the development cycle of the only period where
> > users can actually test the new version in a halfway sane manner and
> > report back with things that are apparently broken.
> >
> >
> We need to be very careful to tell people about it though. Preferably if
> we *know* a dump/reload will be needed to go beta1->beta2, we should
> actually document that in the release notes of beta1 already. So people can
> make proper plans..

Yes, I think it actually makes sense to add that to *all* beta release
notes. Even in beta2, although slightly weakened.
That's not a new thing btw. E.g. 9.3 had a catversion bump between
beta1/2:
git diff 09bd2acbe5ac866ce9..817a89423f429a6a8b -- 
src/include/catalog/catversion.h

The more interesting note probably is that there quite possibly won't be
pg_upgrade'ability...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Magnus Hagander
On Wed, May 7, 2014 at 3:04 PM, Andres Freund wrote:

> On 2014-05-07 15:00:01 +0200, Magnus Hagander wrote:
> > On Wed, May 7, 2014 at 2:56 PM, Andres Freund  >wrote:
> >
> > > On 2014-05-07 08:50:33 -0400, Robert Haas wrote:
> > > > On Wed, May 7, 2014 at 7:40 AM, Andres Freund 
> > > wrote:
> > > > > I don't think it's likely that beta1 will be binary compatible
> with the
> > > > > final version at this point.
> > > >
> > > > I rather think we're not ready for beta1 at this point (but I expect
> > > > to lose that argument).
> > >
> > > Well, I guess it depends on what we define 'beta1' to be. Imo
> evaluating
> > > problematic pieces of new code, locating unfinished pieces is part of
> > > that. I don't see much point in forbidding incompatible changes in
> beta1
> > > personally. That robs the development cycle of the only period where
> > > users can actually test the new version in a halfway sane manner and
> > > report back with things that are apparently broken.
> > >
> > >
> > We need to be very careful to tell people about it though. Preferably if
> > we *know* a dump/reload will be needed to go beta1->beta2, we should
> > actually document that in the release notes of beta1 already. So people
> can
> > make proper plans..
>
> Yes, I think it actually makes sense to add that to *all* beta release
> notes. Even in beta2, although slightly weakened.
> That's not a new thing btw. E.g. 9.3 had a catversion bump between
> beta1/2:
> git diff 09bd2acbe5ac866ce9..817a89423f429a6a8b --
> src/include/catalog/catversion.h
>
> The more interesting note probably is that there quite possibly won't be
> pg_upgrade'ability...
>
>
Yeah, that's the big thing really.

Requiring pg_upgrade between betas might even be "good" in the sense that
then we get more testing of pg_upgrade :) But requiring a dump/reload is
going to hurt people more.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] PGDLLEXPORTing all GUCs?

2014-05-07 Thread Tom Lane
Craig Ringer  writes:
> Is there any reason _not_ to PGDLLEXPORT all GUCs, other than cosmetic
> concerns?

That seems morally equivalent to "is there a reason not to make every
static variable global?".

IOW, no, I don't accept this proposition.  Every time we DLLEXPORT some
variable, we lose the freedom to redefine it later.  So DLLEXPORT'ing GUCs
should be on a case by case basis, just as for any other variable.  In
some cases it might be smarter to export a wrapper function.
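The wrapper-function alternative can be sketched as follows; the names
here are invented for illustration, not an actual backend API:

```c
#include <assert.h>

/*
 * Sketch of exporting a GUC's value through a wrapper function rather
 * than PGDLLEXPORT'ing the variable itself.  The variable stays private
 * to the defining translation unit, so its storage and representation
 * can be redefined later without breaking third-party modules.  Names
 * are hypothetical.
 */
static int example_guc = 1024;   /* in reality, set by the guc.c machinery */

/* Extensions call this accessor; they never touch the variable. */
int
GetExampleGuc(void)
{
    return example_guc;
}
```

An extension built against this interface keeps working even if the
variable later becomes, say, a computed value or a different type.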

regards, tom lane




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Tom Lane
Magnus Hagander  writes:
> On Wed, May 7, 2014 at 2:56 PM, Andres Freund wrote:
>> Well, I guess it depends on what we define 'beta1' to be. Imo evaluating
>> problematic pieces of new code, locating unfinished pieces is part of
>> that. I don't see much point in forbidding incompatible changes in beta1
>> personally. That robs the development cycle of the only period where
>> users can actually test the new version in a halfway sane manner and
>> report back with things that are apparently broken.

> We need to be very careful to tell people about it though. Preferably if
> we *know* a dump/reload will be needed to go beta1->beta2, we should
> actually document that in the release notes of beta1 already. So people can
> make proper plans..

This seems like much ado about very little.  The policy will be the same
as it ever was: once beta1 is out, we prefer to avoid forcing an initdb,
but we'll do it if we have to.

In any case, +1 for fixing whatever needs to be fixed now; I expect to
have a fix for the limited-GIN-index-length issue later today, and that
really is also an on-disk format change, though it won't affect short
index entries.  ("Short" is TBD; I was thinking of hashing keys longer
than say 128 bytes.)

regards, tom lane




Re: [HACKERS] PGDLLEXPORTing all GUCs?

2014-05-07 Thread Andres Freund
On 2014-05-07 09:35:06 -0400, Tom Lane wrote:
> Craig Ringer  writes:
> > Is there any reason _not_ to PGDLLEXPORT all GUCs, other than cosmetic
> > concerns?
> 
> That seems morally equivalent to "is there a reason not to make every
> static variable global?".
> 
> IOW, no, I don't accept this proposition.  Every time we DLLEXPORT some
> variable, we lose the freedom to redefine it later.  So DLLEXPORT'ing GUCs
> should be on a case by case basis, just as for any other variable.  In
> some cases it might be smarter to export a wrapper function.

I think what Craig actually tries to propose is to mark all GUCs
currently exported in headers PGDLLIMPORT. Currently it's easy to have
extensions that work on sane systems but not on Windows. If they're already
exposed in headers I don't think changes get any harder just because they
can also get used on Windows...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Andres Freund
On 2014-05-07 09:44:36 -0400, Tom Lane wrote:
> Magnus Hagander  writes:
> > On Wed, May 7, 2014 at 2:56 PM, Andres Freund wrote:
> >> Well, I guess it depends on what we define 'beta1' to be. Imo evaluating
> >> problematic pieces of new code, locating unfinished pieces is part of
> >> that. I don't see much point in forbidding incompatible changes in beta1
> >> personally. That robs the development cycle of the only period where
> >> users can actually test the new version in a halfway sane manner and
> >> report back with things that are apparently broken.
> 
> > We need to be very careful to tell people about it though. Preferably if
> > we *know* a dump/reload will be needed to go beta1->beta2, we should
> > actually document that in the release notes of beta1 already. So people can
> > make proper plans..
> 
> This seems like much ado about very little.  The policy will be the same
> as it ever was: once beta1 is out, we prefer to avoid forcing an initdb,
> but we'll do it if we have to.

I think Magnus' point is that we should tell users that we'll try but
won't guarantee anything. -hackers knowing about it doesn't mean users
will know.

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Simon Riggs
On 7 May 2014 13:31, Robert Haas  wrote:
> On Wed, May 7, 2014 at 3:18 AM, Simon Riggs  wrote:
>> If we believe that 25% of shared_buffers worth of heap blocks would
>> flush the cache doing a SeqScan, why should we allow 400% of
>> shared_buffers worth of index blocks?
>
> I think you're comparing apples and oranges.

I understood the distinction, which is why I changed the direction of
my thinking to say

> Yes, we can make plans assuming we can use OS cache,
> but we shouldn't be churning shared_buffers when we execute those
> plans.

and hence why I proposed

> I think I'm arguing myself towards using a BufferAccessStrategy of
> BAS_BULKREAD for large IndexScans, BitMapIndexScans and
> BitMapHeapScans.

which I hope will be effective in avoiding churn in shared_buffers
even though we may use much larger memory from the OS.
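The intended effect can be illustrated with a toy model: a bulk scan
that recycles a small ring of buffer slots touches arbitrarily many
pages while evicting only a ring's worth of the pool. This is a
self-contained sketch of the idea only; in the backend the equivalent
would be passing GetAccessStrategy(BAS_BULKREAD) to ReadBufferExtended(),
not code like this:

```c
#include <assert.h>

#define NBUFFERS   16384   /* stand-in for the shared_buffers pool size */
#define RING_SIZE  32      /* small ring, in the spirit of BAS_BULKREAD */

/*
 * Toy model of a ring access strategy: a scan of npages pages cycles
 * through RING_SIZE slots instead of spreading across the whole pool,
 * so the number of distinct pool slots it dirties is bounded by the
 * ring size no matter how large the scan is.  used_slots[] records
 * which slots of the pool the scan has touched.
 */
int
scan_with_ring(int npages, int used_slots[NBUFFERS])
{
    int distinct = 0;

    for (int page = 0; page < npages; page++)
    {
        int slot = page % RING_SIZE;   /* recycle the same few buffers */

        if (!used_slots[slot])
        {
            used_slots[slot] = 1;
            distinct++;
        }
    }
    return distinct;   /* pool slots churned by the scan */
}
```

However large the scan, only RING_SIZE slots of the pool are disturbed,
which is the "avoid churning shared_buffers" property being argued for.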

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Tom Lane
Simon Riggs  writes:
> I think I'm arguing myself towards using a BufferAccessStrategy of
> BAS_BULKREAD for large IndexScans, BitMapIndexScans and
> BitMapHeapScans.

As soon as you've got some hard evidence to present in favor of such
changes, we can discuss it.  I've got other things to do besides
hypothesize.

In the meantime, it seems like there is an emerging consensus that nobody
much likes the existing auto-tuning behavior for effective_cache_size,
and that we should revert that in favor of just increasing the fixed
default value significantly.  I see no problem with a value of say 4GB;
that's very unlikely to be worse than the pre-9.4 default (128MB) on any
modern machine.

Votes for or against?

regards, tom lane




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Merlin Moncure
On Wed, May 7, 2014 at 9:07 AM, Tom Lane  wrote:
> Simon Riggs  writes:
>> I think I'm arguing myself towards using a BufferAccessStrategy of
>> BAS_BULKREAD for large IndexScans, BitMapIndexScans and
>> BitMapHeapScans.
>
> As soon as you've got some hard evidence to present in favor of such
> changes, we can discuss it.  I've got other things to do besides
> hypothesize.

Let me throw out one last point:

It's pretty likely that s_b is going to be raised higher as a
percentage of RAM.   I never really bought into the conventional
wisdom of 25% and have had to set it lower many times.  Nevertheless,
it was a documented suggestion.

The core issues are:
1) There is no place to enter total system memory available to the
database in postgresql.conf
2) Memory settings (except for the above) are given as absolute
amounts, not percentages.

It would be a lot easier to standardize configurations, particularly if
there were a way to support #1 with auto-detection.
Then e_c_s, s_b, work_mem, and various other settings could be given as
standard (and perhaps somewhat conservative) percentages based on the
best and hopefully factually supported recommendations.  I oversee
dozens of servers in a virtualized environment (as most enterprise shops
are these days).  Everything is 'right sized', often on demand, and
often nobody bothers to adjust the various settings.
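Sketched as a hypothetical postgresql.conf fragment -- none of this
syntax exists in any released PostgreSQL; it is only what such a
proposal might look like:

```
# Hypothetical syntax -- not valid in any PostgreSQL release
system_memory = auto        # detect total RAM, or set it explicitly
shared_buffers = 20%        # percentages of system_memory
effective_cache_size = 70%
work_mem = 0.2%
```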

> In the meantime, it seems like there is an emerging consensus that nobody
> much likes the existing auto-tuning behavior for effective_cache_size,
> and that we should revert that in favor of just increasing the fixed
> default value significantly.  I see no problem with a value of say 4GB;
> that's very unlikely to be worse than the pre-9.4 default (128MB) on any
> modern machine.

In lieu of something fancy like the above, adjusting the defaults
seems a better way to go (so I vote to revert).

merlin




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Andres Freund
On 2014-05-07 10:07:07 -0400, Tom Lane wrote:
> In the meantime, it seems like there is an emerging consensus that nobody
> much likes the existing auto-tuning behavior for effective_cache_size,
> and that we should revert that in favor of just increasing the fixed
> default value significantly.  I see no problem with a value of say 4GB;
> that's very unlikely to be worse than the pre-9.4 default (128MB) on any
> modern machine.
> 
> Votes for or against?

+1 for increasing it to 4GB and remove the autotuning. I don't like the
current integration into guc.c much and a new static default doesn't
seem to be worse than the current autotuning. 

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Robert Haas
On Wed, May 7, 2014 at 10:12 AM, Andres Freund  wrote:
> On 2014-05-07 10:07:07 -0400, Tom Lane wrote:
>> In the meantime, it seems like there is an emerging consensus that nobody
>> much likes the existing auto-tuning behavior for effective_cache_size,
>> and that we should revert that in favor of just increasing the fixed
>> default value significantly.  I see no problem with a value of say 4GB;
>> that's very unlikely to be worse than the pre-9.4 default (128MB) on any
>> modern machine.
>>
>> Votes for or against?
>
> +1 for increasing it to 4GB and remove the autotuning. I don't like the
> current integration into guc.c much and a new static default doesn't
> seem to be worse than the current autotuning.

It was my proposal originally, so I assume I'd be counted as in favor,
but for the sake of clarity: +1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Magnus Hagander
On Wed, May 7, 2014 at 4:12 PM, Andres Freund wrote:

> On 2014-05-07 10:07:07 -0400, Tom Lane wrote:
> > In the meantime, it seems like there is an emerging consensus that nobody
> > much likes the existing auto-tuning behavior for effective_cache_size,
> > and that we should revert that in favor of just increasing the fixed
> > default value significantly.  I see no problem with a value of say 4GB;
> > that's very unlikely to be worse than the pre-9.4 default (128MB) on any
> > modern machine.
> >
> > Votes for or against?
>
> +1 for increasing it to 4GB and remove the autotuning. I don't like the
> current integration into guc.c much and a new static default doesn't
> seem to be worse than the current autotuning.
>

+1.

If we can't make the autotuning better than that, we're better off holding
off on that one until we can actually figure out something better. (At
which point perhaps we can reach the level where we can just remove it...
but that's all handwaving about the future, of course.)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Tom Lane
Robert Haas  writes:
> On Wed, May 7, 2014 at 3:18 AM, Simon Riggs  wrote:
>> If we believe that 25% of shared_buffers worth of heap blocks would
>> flush the cache doing a SeqScan, why should we allow 400% of
>> shared_buffers worth of index blocks?

> I think you're comparing apples and oranges.  The 25% threshold is
> answering the question "How big does a sequential scan have to be
> before it's likely to flush so much unrelated data out of
> shared_buffers that it hurts the performance of other things running
> on the system?". So it's not really about whether or not things will
> *fit* in the cache, but rather a judgement about at what point caching
> that stuff is going to be of less value than continuing to cache other
> things.  Also, it's specifically a judgement about shared_buffers, not
> system memory.

> But effective_cache_size is used to estimate the likelihood that an
> index scan which accesses the same heap or index block twice will
> still be in cache on the second hit, and thus need to be faulted in
> only once.  So this *is* a judgment about what will fit - generally
> over a very short time scale.  And, since bringing a page into
> shared_buffers from the OS cache is much less expensive than bringing
> a page into memory from disk, it's really about what will fit in
> overall system memory, not just shared_buffers.

Another point is that the 25% seqscan threshold actually controls some
specific caching decisions, which effective_cache_size does not.  Raising
effective_cache_size "too high" is unlikely to result in cache thrashing;
in fact I'd guess the opposite.  What that would do is cause the planner
to prefer indexscans over seqscans in more cases involving large tables.
But if you've got a table+index that's bigger than RAM, seqscans are
probably going to be worse for the OS cache than indexscans, because
they're going to require bringing in more data.

So I still think this whole argument is founded on shaky hypotheses
with a complete lack of hard data showing that a smaller default for
effective_cache_size would be better.  The evidence we have points
in the other direction.

regards, tom lane




Re: [HACKERS] bgworker crashed or not?

2014-05-07 Thread Petr Jelinek

On 07/05/14 02:25, Petr Jelinek wrote:

On 06/05/14 19:05, Robert Haas wrote:

What I'm inclined to do is change the logic so that:

(1) After a crash-and-restart sequence, zero rw->rw_crashed_at, so
that anything which is still registered gets restarted immediately.


Yes, that's quite an obvious change which I missed completely :).


(2) If a shmem-connected backend fails to release the deadman switch
or exits with an exit code other than 0 or 1, we crash-and-restart.  A
non-shmem-connected backend never causes a crash-and-restart.


+1


(3) When a background worker exits without triggering a
crash-and-restart, an exit code of precisely 0 causes the worker to be
unregistered; any other exit code has no special effect, so
bgw_restart_time controls.


+1



Ok, attached patch is my try at the proposed changes.

I don't like the reset_bgworker_crash_state function name, maybe you'll 
come up with something better...


It passes my tests for the desired behavior; hopefully I didn't miss some
scenario.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index a6b25d8..e913395 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -472,13 +472,13 @@ bgworker_quickdie(SIGNAL_ARGS)
 	on_exit_reset();
 
 	/*
-	 * Note we do exit(0) here, not exit(2) like quickdie.  The reason is that
-	 * we don't want to be seen this worker as independently crashed, because
-	 * then postmaster would delay restarting it again afterwards.  If some
-	 * idiot DBA manually sends SIGQUIT to a random bgworker, the "dead man
-	 * switch" will ensure that postmaster sees this as a crash.
+	 * Note we do exit(2) here, not exit(0), for the same reasons quickdie does.
+	 * Another reason is that a bgworker is not restarted after exit(0).
+	 *
+	 * The bgw_restart_time does not apply here and the bgworker will be
+	 * restarted immediately.
 	 */
-	exit(0);
+	exit(2);
 }
 
 /*
@@ -832,8 +832,9 @@ RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
  * running but is no longer.
  *
  * In the latter case, the worker may be stopped temporarily (if it is
- * configured for automatic restart, or if it exited with code 0) or gone
- * for good (if it is configured not to restart and exited with code 1).
+ * configured for automatic restart and exited with code 1) or gone for
+ * good (if it exited with code other than 1 or if it is configured not
+ * to restart).
  */
 BgwHandleStatus
 GetBackgroundWorkerPid(BackgroundWorkerHandle *handle, pid_t *pidp)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6d09887..99d5e85 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -396,6 +396,7 @@ static void TerminateChildren(int signal);
 
 static int	CountChildren(int target);
 static int	CountUnconnectedWorkers(void);
+static void reset_bgworker_crash_state(void);
 static void maybe_start_bgworker(void);
 static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
 static pid_t StartChildProcess(AuxProcType type);
@@ -2616,7 +2617,10 @@ reaper(SIGNAL_ARGS)
 			if (PgStatPID == 0)
 PgStatPID = pgstat_start();
 
-			/* some workers may be scheduled to start now */
+			/* reset the state of crashed bgworkers so their restart is not delayed */
+			reset_bgworker_crash_state();
+
+			/* workers may be scheduled to start now */
 			maybe_start_bgworker();
 
 			/* at this point we are really open for business */
@@ -2845,14 +2849,8 @@ CleanupBackgroundWorker(int pid,
 		snprintf(namebuf, MAXPGPATH, "%s: %s", _("worker process"),
  rw->rw_worker.bgw_name);
 
-		/* Delay restarting any bgworker that exits with a nonzero status. */
-		if (!EXIT_STATUS_0(exitstatus))
-			rw->rw_crashed_at = GetCurrentTimestamp();
-		else
-			rw->rw_crashed_at = 0;
-
 		/*
-		 * Additionally, for shared-memory-connected workers, just like a
+		 * For shared-memory-connected workers, just like a
 		 * backend, any exit status other than 0 or 1 is considered a crash
 		 * and causes a system-wide restart.
 		 */
@@ -2860,21 +2858,21 @@ CleanupBackgroundWorker(int pid,
 		{
 			if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
 			{
-rw->rw_crashed_at = GetCurrentTimestamp();
+rw->rw_crashed_at = 0;
 HandleChildCrash(pid, exitstatus, namebuf);
 return true;
 			}
 		}
 
-		if (!ReleasePostmasterChildSlot(rw->rw_child_slot))
-		{
-			/*
-			 * Uh-oh, the child failed to clean itself up.  Treat as a crash
-			 * after all.
-			 */
+		/* Mark any bgworker that exits with a nonzero status or fails to clean
+		 * up after itself as crashed and ready for restart at bgw_restart_time. */
+		if (!EXIT_STATUS_0(exitstatus) || !ReleasePostmasterChildSlot(rw->rw_child_slot))
 			rw->rw_crashed_at = GetCurrentTimestamp();
-			HandleChildCrash(pid, exitstatus, namebuf);
-			return tr

Re: [HACKERS] PGDLLEXPORTing all GUCs?

2014-05-07 Thread Craig Ringer
On 05/07/2014 09:45 PM, Andres Freund wrote:
> I think what Craig actually tries to propose is to mark all GUCs
> currently exported in headers PGDLLIMPORT. Currently it's easy to have
> extensions that work on sane systems but not on Windows. If they're already
> exposed in headers I don't think changes get any harder just because they
> can also get used on Windows...

Yes, rather.

Exporting GUCs that're currently static wouldn't make sense.

I'm just talking about making what works on !windows work on Windows. If
a GUC is declared extern in a header, it should be PGDLLIMPORT.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PGDLLEXPORTing all GUCs?

2014-05-07 Thread Tom Lane
Andres Freund  writes:
> On 2014-05-07 09:35:06 -0400, Tom Lane wrote:
>> Craig Ringer  writes:
>>> Is there any reason _not_ to PGDLLEXPORT all GUCs, other than cosmetic
>>> concerns?

>> That seems morally equivalent to "is there a reason not to make every
>> static variable global?".

> I think what Craig actually tries to propose is to mark all GUCs
> currently exported in headers PGDLLIMPORT.

There are few if any GUCs that aren't exposed in headers, just so that
guc.c can communicate with the owning modules.  That doesn't mean that
we want everybody in the world messing with them.

To my mind, we PGDLLEXPORT some variable only after deciding that yeah,
we're okay with having third-party modules touching that.  Craig's
proposal is to remove human judgement from that process.

regards, tom lane




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Andrew Dunstan


On 05/07/2014 10:12 AM, Andres Freund wrote:

On 2014-05-07 10:07:07 -0400, Tom Lane wrote:

In the meantime, it seems like there is an emerging consensus that nobody
much likes the existing auto-tuning behavior for effective_cache_size,
and that we should revert that in favor of just increasing the fixed
default value significantly.  I see no problem with a value of say 4GB;
that's very unlikely to be worse than the pre-9.4 default (128MB) on any
modern machine.

Votes for or against?

+1 for increasing it to 4GB and removing the autotuning. I don't like the
current integration into guc.c much and a new static default doesn't
seem to be worse than the current autotuning.





+1. If we ever want to implement an auto-tuning heuristic it seems we're 
going to need some hard empirical evidence to support it, and that 
doesn't seem likely to appear any time soon.


cheers

andrew




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Simon Riggs
On 7 May 2014 15:07, Tom Lane  wrote:
> Simon Riggs  writes:
>> I think I'm arguing myself towards using a BufferAccessStrategy of
>> BAS_BULKREAD for large IndexScans, BitMapIndexScans and
>> BitMapHeapScans.
>
> As soon as you've got some hard evidence to present in favor of such
> changes, we can discuss it.  I've got other things to do besides
> hypothesize.

Now we have a theory to test, I'll write a patch and we can collect
evidence for, or against.

> In the meantime, it seems like there is an emerging consensus that nobody
> much likes the existing auto-tuning behavior for effective_cache_size,
> and that we should revert that in favor of just increasing the fixed
> default value significantly.  I see no problem with a value of say 4GB;
> that's very unlikely to be worse than the pre-9.4 default (128MB) on any
> modern machine.
>
> Votes for or against?

+1 for fixed 4GB and remove the auto-tuning code.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Simon Riggs
On 7 May 2014 15:10, Merlin Moncure  wrote:

> The core issues are:
> 1) There is no place to enter total system memory available to the
> database in postgresql.conf
> 2) Memory settings (except for the above) are given as absolute
> amounts, not percentages.

Those sound useful starting points.

The key issue for me is that effective_cache_size is a USERSET. It
applies per-query, just like work_mem (though work_mem is per query
node).

If we had "total system memory" we wouldn't know how to divide it up
amongst users since we have no functionality for "workload
management".

It would be very nice to be able to tell Postgres that "I have 64GB
RAM, use it wisely". At present, any and all users can set
effective_cache_size and work_mem to any value they please, any time
they wish and thus overuse available memory. Which is why I've had to
write plugins to manage the memory allocations better in userspace.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PGDLLEXPORTing all GUCs?

2014-05-07 Thread David G Johnston
Tom Lane wrote:
> Andres Freund writes:
>> On 2014-05-07 09:35:06 -0400, Tom Lane wrote:
>>> Craig Ringer writes:
>>>> Is there any reason _not_ to PGDLLEXPORT all GUCs, other than cosmetic
>>>> concerns?
>
>>> That seems morally equivalent to "is there a reason not to make every
>>> static variable global?".
>
>> I think what Craig actually tries to propose is to mark all GUCs
>> currently exported in headers PGDLLIMPORT.
>
> There are few if any GUCs that aren't exposed in headers, just so that
> guc.c can communicate with the owning modules.  That doesn't mean that
> we want everybody in the world messing with them.
>
> To my mind, we PGDLLEXPORT some variable only after deciding that yeah,
> we're okay with having third-party modules touching that.  Craig's
> proposal is to remove human judgement from that process.

So third-party modules that use GUCs that are not PGDLLEXPORTed are doing so
improperly - even if it works for them because they only care about/test
non-Windows platforms?

David J.








Re: [HACKERS] PGDLLEXPORTing all GUCs?

2014-05-07 Thread Andres Freund
On 2014-05-07 10:29:36 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2014-05-07 09:35:06 -0400, Tom Lane wrote:
> >> Craig Ringer  writes:
> >>> Is there any reason _not_ to PGDLLEXPORT all GUCs, other than cosmetic
> >>> concerns?
> 
> >> That seems morally equivalent to "is there a reason not to make every
> >> static variable global?".
> 
> > I think what Craig actually tries to propose is to mark all GUCs
> > currently exported in headers PGDLLIMPORT.
> 
> There are few if any GUCs that aren't exposed in headers, just so that
> guc.c can communicate with the owning modules.  That doesn't mean that
> we want everybody in the world messing with them.
> 
> To my mind, we PGDLLEXPORT some variable only after deciding that yeah,
> we're okay with having third-party modules touching that.  Craig's
> proposal is to remove human judgement from that process.

It's just levelling the playing field between platforms. If I had an idea
how, I'd still like to just PGDLLIMPORT *all* 'export'ed variables on
windows.
The problem is that there's a lot of variables which aren't exported and
which we'll only discover after the release. Just look at what
e.g. postgres_fdw needed. It's not particularly unlikely that others
fdws need some of those as well. But they can't change the release at
the same time.

If we want to declare variables off limits to extension/external code we
need a solution that works on !windows as well.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PGDLLEXPORTing all GUCs?

2014-05-07 Thread Tom Lane
Andres Freund  writes:
> On 2014-05-07 10:29:36 -0400, Tom Lane wrote:
>> To my mind, we PGDLLEXPORT some variable only after deciding that yeah,
>> we're okay with having third-party modules touching that.  Craig's
>> proposal is to remove human judgement from that process.

> It's just levelling the playing field between platforms. If I had an idea
> how, I'd still like to just PGDLLIMPORT *all* 'export'ed variables on
> windows.
> The problem is that there's a lot of variables which aren't exported and
> which we'll only discover after the release. Just look at what
> e.g. postgres_fdw needed. It's not particularly unlikely that others
> fdws need some of those as well. But they can't change the release at
> the same time.

[ shrug... ] That problem is uncorrelated with GUC status, however.
If that's your argument for a patch, then the patch should DLLEXPORT
*every single non-static variable*.  Which is a discussion we've already
had, and rejected.

I'd not be against an automatic mechanism for that, and indeed put
considerable work into trying to make it happen a few months ago.  But
I'll resist wholesale cluttering of the source code with those markers.
As long as we have to have them, I think we should use them in the way
I outlined, that we mark only variables that are "considered okay to
access".  In fact, GUCs are exactly the *last* variables that should get
marked that way automatically, because so many of them are global only
because of the need for guc.c to communicate with the owning module,
not because we want anything else touching them.

regards, tom lane




Re: [HACKERS] Schizophrenic coding in gin_extract_jsonb(_hash)

2014-05-07 Thread Tom Lane
Peter Geoghegan  writes:
> On Tue, May 6, 2014 at 8:08 PM, Tom Lane  wrote:
>> The early-exit code path supposes that JB_ROOT_COUNT is absolutely
>> reliable as an indicator that there's nothing in the jsonb value.
>> On the other hand, the realloc logic inside the iteration loop implies
>> that JB_ROOT_COUNT is just an untrustworthy estimate.  Which theory is
>> correct?  And why is there not a comment to be seen anywhere?  If the code
>> is correct then this logic is certainly worthy of a comment or three.

> JsonbIteratorNext() is passed "false" as its skipNested argument. It's
> recursive.

And?

I think you're just proving the point that this code is woefully
underdocumented.  If there were, somewhere, some comment explaining
what the heck JB_ROOT_COUNT actually counts, maybe I wouldn't be asking
this question.  jsonb.h is certainly not divulging any such information.

regards, tom lane




Re: [HACKERS] PGDLLEXPORTing all GUCs?

2014-05-07 Thread Robert Haas
On Wed, May 7, 2014 at 11:19 AM, Tom Lane  wrote:
> Andres Freund  writes:
>> On 2014-05-07 10:29:36 -0400, Tom Lane wrote:
>>> To my mind, we PGDLLEXPORT some variable only after deciding that yeah,
>>> we're okay with having third-party modules touching that.  Craig's
>>> proposal is to remove human judgement from that process.
>
>> It's just levelling the playing field between platforms. If I had an idea
>> how, I'd still like to just PGDLLIMPORT *all* 'export'ed variables on
>> windows.
>> The problem is that there's a lot of variables which aren't exported and
>> which we'll only discover after the release. Just look at what
>> e.g. postgres_fdw needed. It's not particularly unlikely that others
>> fdws need some of those as well. But they can't change the release at
>> the same time.
>
> [ shrug... ] That problem is uncorrelated with GUC status, however.
> If that's your argument for a patch, then the patch should DLLEXPORT
> *every single non-static variable*.  Which is a discussion we've already
> had, and rejected.
>
> I'd not be against an automatic mechanism for that, and indeed put
> considerable work into trying to make it happen a few months ago.  But
> I'll resist wholesale cluttering of the source code with those markers.
> As long as we have to have them, I think we should use them in the way
> I outlined, that we mark only variables that are "considered okay to
> access".  In fact, GUCs are exactly the *last* variables that should get
> marked that way automatically, because so many of them are global only
> because of the need for guc.c to communicate with the owning module,
> not because we want anything else touching them.

I don't accept this argument.  In EnterpriseDB's Advanced Server fork
of PostgreSQL, we've marked a bunch of extra things PGDLLEXPORT
precisely because we have external modules that need access to them.
Of course, what that means is that when PostgreSQL changes things
around in a future release, we have to go revise those external
modules as well.  However, that just doesn't happen often enough to
actually pose a significant problem for us.  It's not really accurate
to think that people are only going to rely on the things that we
choose to PGDLLEXPORT.  It's more accurate to think that when we don't
mark things PGDLLEXPORT, we're forcing people to decide between (1)
giving up on writing their extension, (2) having that extension not
work on Windows, (3) submitting a patch to add a PGDLLEXPORT marking
and waiting an entire release cycle for that to go G.A., and then
still not being able to support older versions, or (4) forking
PostgreSQL.  That's an unappealing list of options.

I would not go so far as to say that we should PGDLLEXPORT absolutely
every non-static variable.  But I think adding those markings to every
GUC we've got is perfectly reasonable.  I am quite sure that the
2ndQuadrant folks know that they'll be on the hook to update any code
they write that uses those variables if and when a future version of
PostgreSQL whacks things around.  But that's not a new problem - the
same thing happens when a function signature changes, or when a
variable that does happen to have a PGDLLEXPORT marking changes.  And
at least in my experience it's also not a large problem.  The amount
of time EnterpriseDB spends updating our (large!) amount of
proprietary code in response to such changes is a small percentage of
our overall development time.

Enabling extensibility is a far more important goal than keeping
people from committing to interfaces that may change in the future,
especially since the latter is a losing battle anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PGDLLEXPORTing all GUCs?

2014-05-07 Thread Tom Lane
Robert Haas  writes:
> I don't accept this argument.  In EnterpriseDB's Advanced Server fork
> of PostgreSQL, we've marked a bunch of extra things PGDLLEXPORT
> precisely because we have external modules that need access to them.

Well, that's an argument for marking every darn global variable as
PGDLLEXPORT.  But it's *not* an argument for marking GUCs in particular
that way.  In particular, you are conveniently ignoring the point that
GUCs are much more likely to be global as an artifact of the way guc.c
is modularized than because we actually think they should be globally
accessible.

If Craig has a concrete argument why all GUCs should be accessible
to external modules, then let's see it (after which we'd better debate
exposing the few that are in fact static in guc.c).  Or if you want
to hang your hat on the platform-leveling argument, then we should be
re-debating exporting *all* global variables.  But as far as the actually
proposed patch goes, all I'm hearing is very confused thinking.

regards, tom lane




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-07 Thread Stephen Frost
* Simon Riggs (si...@2ndquadrant.com) wrote:
> Agreed. My proposal is that if the planner allows the lookaside to an
> FDW then we pass the query for full execution on the FDW. That means
> that the scan, aggregate and join could take place via the FDW. i.e.
> "Custom Plan" == lookaside + FDW

How about we get that working for FDWs to begin with and then we can
come back to this idea..?  We're pretty far from join-pushdown or
aggregate-pushdown to FDWs, last I checked, and having those would be a
massive win for everyone using FDWs.

Thanks,

Stephen




Re: [HACKERS] Schizophrenic coding in gin_extract_jsonb(_hash)

2014-05-07 Thread Heikki Linnakangas

On 05/07/2014 06:27 PM, Tom Lane wrote:

I think you're just proving the point that this code is woefully
underdocumented.  If there were, somewhere, some comment explaining
what the heck JB_ROOT_COUNT actually counts, maybe I wouldn't be asking
this question.  jsonb.h is certainly not divulging any such information.


After having reverse-engineered the convertJsonb code, I think I can 
explain what JB_ROOT_COUNT is.


If the root of the Jsonb datum is an array, it's the number of elements 
in that top-level array. If it's an object, it's the number of key/value 
pairs in that top-level object. Some of the elements of that array (or 
values of the object) can be arrays or objects themselves.


gin_extract_jsonb recursively extracts all the elements, keys and values 
of any sub-object too, but JB_ROOT_COUNT only counts the top-level elements.


(I hope this is made a bit more clear in the comments I added in the 
patch I posted this morning)


- Heikki




Re: [HACKERS] PGDLLEXPORTing all GUCs?

2014-05-07 Thread Andres Freund
On 2014-05-07 12:21:55 -0400, Tom Lane wrote:
> Robert Haas  writes:
> > I don't accept this argument.  In EnterpriseDB's Advanced Server fork
> > of PostgreSQL, we've marked a bunch of extra things PGDLLEXPORT
> > precisely because we have external modules that need access to them.
> 
> Well, that's an argument for marking every darn global variable as
> PGDLLEXPORT.  But it's *not* an argument for marking GUCs in particular
> that way.  In particular, you are conveniently ignoring the point that
> GUCs are much more likely to be global as an artifact of the way guc.c
> is modularized than because we actually think they should be globally
> accessible.

GUCs in general are user configurable things. So it's not particularly
unreasonable to assume that a significant fraction of them are of
interest to extensions. And it's not like exporting them opens the way to
many additional dangers - they already can be overwritten.
In fact, I am pretty sure that nearly all of these cases are about
*reading* the underlying variable not writing them. It's pretty darn
less convenient and far slower to get the config variable as text and
then convert it to the underlying type.
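To illustrate the difference (parse_mem_kb() below is an invented helper for this sketch, not a PostgreSQL API): reading a GUC through the text interface means re-parsing a unit suffix on every access, whereas an exported C variable is a single load.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Invented helper sketching what an extension must do when it can only
 * get a memory GUC back as text: strip the unit suffix and convert the
 * value to kilobytes.
 */
static long
parse_mem_kb(const char *val)
{
	char	   *end;
	long		n = strtol(val, &end, 10);

	if (strcmp(end, "MB") == 0)
		return n * 1024L;
	if (strcmp(end, "GB") == 0)
		return n * 1024L * 1024L;
	return n;					/* treat "kB" or no unit as kilobytes */
}

/* With the variable exported, the same information is one direct read in
 * the GUC's native representation (the value here is a stand-in). */
static int	shared_mem_guc_kb = 65536;
```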

> (after which we'd better debate  exposing the few that are in fact
> static in guc.c).

I plan to do that for most of them - completely independently of this
topic. I think 'export'ing a variable in several files is a horrible idea.

> Or if you want
> to hang your hat on the platform-leveling argument, then we should be
> re-debating exporting *all* global variables.

Yes. I am wondering whether that's not the most sensible course. It's a
pita, but essentially we'd have to do a global s/export/pg_export/ in
the headers.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-07 Thread Stephen Frost
* Simon Riggs (si...@2ndquadrant.com) wrote:
> IMHO we would not want to add indexes to every column, on every table,
> nor would we wish to use lookaside for all tables. It is a good thing
> to be able to add optimizations for individual tables. GPUs are not
> good for everything; it is good to be able to leverage their
> strengths, yet avoid their weaknesses.

It's the optimizer's job to figure out which path to pick though, based
on which will have the lowest cost.

> If you do want that, you can write an Event Trigger that automatically
> adds a lookaside for any table.

This sounds terribly ugly and like we're pushing optimization decisions
on to the user instead of just figuring out what the best answer is.

> I agree TupleTableSlot may not be best way for bulk data movement. We
> probably need to look at buffering/bulk movement between executor
> nodes in general, which would be of benefit for the FDW case also.
> This would be a problem even for Custom Scans as originally presented
> also, so I don't see much change there.

Being able to do bulk movement would be useful, but (as I proposed
months ago) being able to do asynchronous returns would be extremely
useful also, when you consider FDWs and Append()- the main point there
being that you want to keep the FDWs busy and working in parallel.

Thanks,

Stephen




Re: [HACKERS] PGDLLEXPORTing all GUCs?

2014-05-07 Thread Robert Haas
On Wed, May 7, 2014 at 12:21 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I don't accept this argument.  In EnterpriseDB's Advanced Server fork
>> of PostgreSQL, we've marked a bunch of extra things PGDLLEXPORT
>> precisely because we have external modules that need access to them.
>
> Well, that's an argument for marking every darn global variable as
> PGDLLEXPORT.  But it's *not* an argument for marking GUCs in particular
> that way.  In particular, you are conveniently ignoring the point that
> GUCs are much more likely to be global as an artifact of the way guc.c
> is modularized than because we actually think they should be globally
> accessible.
>
> If Craig has a concrete argument why all GUCs should be accessible
> to external modules, then let's see it (after which we'd better debate
> exposing the few that are in fact static in guc.c).  Or if you want
> to hang your hat on the platform-leveling argument, then we should be
> re-debating exporting *all* global variables.  But as far as the actually
> proposed patch goes, all I'm hearing is very confused thinking.

I think there's actually a very good reason to think that GUCs are
good candidates for this treatment, which is that, by definition, the
GUC is a public interface: you can change it with a SET command.  It's
certainly easier to imagine an extension wanting access to
update_process_title than, say, criticalRelcachesBuilt.

But maybe you're right and we should revisit the idea of exposing
everything.  A quick grep through src/include suggests that GUCs are a
big percentage of what's not marked PGDLLEXPORT anyway, and the other
stuff that's in there is stuff like PgStartTime and PostmasterPid
which hardly seem like silly things to expose.

I certainly think we should err on the side of exposing stuff that
people think might be useful rather than pretending that we can stop
them from using symbols by refusing to PGDLLEXPORT them.  We can't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Schizophrenic coding in gin_extract_jsonb(_hash)

2014-05-07 Thread Tom Lane
Heikki Linnakangas  writes:
> gin_extract_jsonb recursively extracts all the elements, keys and values 
> of any sub-object too, but JB_ROOT_COUNT only counts the top-level elements.

Got it.  So if the top level is empty, we can exit early, but otherwise we
use its length * 2 as a guess at how big the output will be; which will
be right if it's an object without further substructure, and otherwise
might need enlargement.
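That guess-then-grow pattern can be sketched as follows (the names are illustrative, not the actual gin_extract_jsonb code):

```c
#include <stdlib.h>

/*
 * Size the output array from the top-level count (root_count * 2 covers
 * the keys and values of an object with no nested substructure), then
 * treat that as an untrusted estimate and double the allocation whenever
 * recursion into sub-objects produces more entries than guessed.
 */
typedef struct
{
	int		   *entries;
	int			nentries;
	int			nalloced;
} EntryBuf;

static void
entrybuf_init(EntryBuf *buf, int root_count)
{
	buf->nalloced = 2 * (root_count > 0 ? root_count : 1);
	buf->nentries = 0;
	buf->entries = malloc(buf->nalloced * sizeof(int));
}

static void
entrybuf_add(EntryBuf *buf, int value)
{
	if (buf->nentries >= buf->nalloced)
	{
		buf->nalloced *= 2;		/* the estimate was low: enlarge */
		buf->entries = realloc(buf->entries, buf->nalloced * sizeof(int));
	}
	buf->entries[buf->nentries++] = value;
}
```

The early-exit case corresponds to root_count being zero before entrybuf_init() is ever reached: with nothing at the top level there is provably nothing nested either, so no buffer is needed at all.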

> (I hope this is made a bit more clear in the comments I added in the 
> patch I posted this morning)

Didn't read that yet, but will incorporate this info into the jsonb_gin
patch I'm working on.

regards, tom lane




Re: [HACKERS] PGDLLEXPORTing all GUCs?

2014-05-07 Thread Tom Lane
Robert Haas  writes:
> On Wed, May 7, 2014 at 12:21 PM, Tom Lane  wrote:
>> If Craig has a concrete argument why all GUCs should be accessible
>> to external modules, then let's see it (after which we'd better debate
>> exposing the few that are in fact static in guc.c).

> I think there's actually a very good reason to think that GUCs are
> good candidates for this treatment, which is that, by definition, the
> GUC is a public interface: you can change it with a SET command.

Sure, and we provide public APIs for accessing/setting GUCs.  The SET
side of that is most emphatically *not* "just set the C variable".
Yeah, you can get away with reading them like that, assuming you want
the internal representation not the user-visible one.  In any case,
I've not heard the use-case why all (and only) GUCs might need to be
readable in that way.

Again, I'm not arguing against a proposal that we should automatically
export all globally-declared variables for platform-levelling reasons.
I *am* saying that I find a proposal to do that just to GUCs to be
unsupported by any argument made so far.

regards, tom lane




Re: [HACKERS] PGDLLEXPORTing all GUCs?

2014-05-07 Thread Andres Freund
On 2014-05-07 13:08:52 -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Wed, May 7, 2014 at 12:21 PM, Tom Lane  wrote:
> >> If Craig has a concrete argument why all GUCs should be accessible
> >> to external modules, then let's see it (after which we'd better debate
> >> exposing the few that are in fact static in guc.c).
> 
> > I think there's actually a very good reason to think that GUCs are
> > good candidates for this treatment, which is that, by definition, the
> > GUC is a public interface: you can change it with a SET command.
> 
> Sure, and we provide public APIs for accessing/setting GUCs.

As strings. Not a useful representation for C code... And to avoid units
getting tacked on you need to first get the config option number, then
allocate an array on the stack, call GetConfigOptionByNum(), then parse
the result into the underlying type.

Meh.

> The SET
> side of that is most emphatically *not* "just set the C variable".
> Yeah, you can get away with reading them like that, assuming you want
> the internal representation not the user-visible one.  In any case,
> I've not heard the use-case why all (and only) GUCs might need to be
> readable in that way.

I think you're making up the 'and only' part. There's lots of variables
that should/need to be exported. Just look at the amount of mess you
cleaned up with various extensions not actually working on
windows...
Last time round you argued against exporting all variables. So Craig
seems to have chosen a subset that's likely to be needed.

> Again, I'm not arguing against a proposal that we should automatically
> export all globally-declared variables for platform-levelling reasons.
> I *am* saying that I find a proposal to do that just to GUCs to be
> unsupported by any argument made so far.

Well, then let's start discussing that proposal. I propose defining
a 'pg_export' macro that's suitably defined by the buildsystem.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] 9.4 checksum errors in recovery with gin index

2014-05-07 Thread Jeff Janes
On Wed, May 7, 2014 at 12:48 AM, Andres Freund wrote:

> Hi,
>
> On 2014-05-07 00:35:35 -0700, Jeff Janes wrote:
> > When recovering from a crash (with injection of a partial page write at
> > time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum
> > verification failure.
> >
> > 16396 is a gin index.
>
> Over which type? What was the load? make check?
>

A gin index on text[].

The load is a variation of the crash recovery tester I've been using the
last few years, this time adapted to use a gin index in a rather unnatural
way.  I just increment a counter on a random row repeatedly via a unique
key, but for this purpose that unique key is stuffed into text[] along with
a bunch of cruft.  The cruft is text representations of negative integers,
the actual key is text representation of nonnegative integers.

The test harness (patch to induce crashes, and two driving programs) and a
preserved data directory are here:

https://drive.google.com/folderview?id=0Bzqrh1SO9FcESDZVeFk5djJaeHM&usp=sharing

(role jjanes, database jjanes)

As far as I can tell, this problem goes back to the beginning of page
checksums.



> > If I have it ignore checksum failures, there is no apparent misbehavior.
> >  I'm trying to bisect it, but it could take a while and I thought someone
> > might have some theories based on the log:
>
> If you have the WAL a pg_xlogdump grepping for everything referring to
> that block would be helpful.
>

The only record which mentions block 28486 by name is this one:

rmgr: Gin len (rec/tot):   1576/  1608, tx:   77882205, lsn:
11/30F4C2C0, prev 11/30F4C290, bkp: , desc: Insert new list page, node:
1663/16384/16396 blkno: 28486

However, I think that that record precedes the recovery start point.

Cheers,

Jeff


Re: [HACKERS] PGDLLEXPORTing all GUCs?

2014-05-07 Thread Robert Haas
On Wed, May 7, 2014 at 1:08 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, May 7, 2014 at 12:21 PM, Tom Lane  wrote:
>>> If Craig has a concrete argument why all GUCs should be accessible
>>> to external modules, then let's see it (after which we'd better debate
>>> exposing the few that are in fact static in guc.c).
>
>> I think there's actually a very good reason to think that GUCs are
>> good candidates for this treatment, which is that, by definition, the
>> GUC is a public interface: you can change it with a SET command.
>
> Sure, and we provide public APIs for accessing/setting GUCs.  The SET
> side of that is most emphatically *not* "just set the C variable".
> Yeah, you can get away with reading them like that, assuming you want
> the internal representation not the user-visible one.  In any case,
> I've not heard the use-case why all (and only) GUCs might need to be
> readable in that way.

My experience is that GUCs are a common thing to want expose to
extensions, and that C code usually wants the internal form, not the
string form.  I'm not arguing that nothing else needs to be exposed,
but if there's a better argument possible for exposing the GUC
variables than the fact that a bunch of people with experience
developing PostgreSQL extensions view that as a real need, I can't
think what it is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-07 Thread Simon Riggs
On 7 May 2014 17:43, Stephen Frost  wrote:
> * Simon Riggs (si...@2ndquadrant.com) wrote:
>> IMHO we would not want to add indexes to every column, on every table,
>> nor would we wish to use lookaside for all tables. It is a good thing
>> to be able to add optimizations for individual tables. GPUs are not
>> good for everything; it is good to be able to leverage their
>> strengths, yet avoid their weaknesses.
>
> It's the optimizer's job to figure out which path to pick though, based
> on which will have the lowest cost.

Of course. I'm not suggesting otherwise.

>> If you do want that, you can write an Event Trigger that automatically
>> adds a lookaside for any table.
>
> This sounds terribly ugly and like we're pushing optimization decisions
> on to the user instead of just figuring out what the best answer is.

I'm proposing that we use a declarative approach, just like we do when
we say CREATE INDEX.

The idea is that we only consider a lookaside when a lookaside has
been declared. Same as when we add an index, the optimizer considers
whether to use that index. What we don't want to happen is that the
optimizer considers a GIN plan, even when a GIN index is not
available.

I'll explain it more at the developer meeting. It probably sounds a
bit weird at first.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] 9.4 checksum errors in recovery with gin index

2014-05-07 Thread Andres Freund
Hi,

On 2014-05-07 10:21:26 -0700, Jeff Janes wrote:
> On Wed, May 7, 2014 at 12:48 AM, Andres Freund wrote:
> 
> > Hi,
> >
> > On 2014-05-07 00:35:35 -0700, Jeff Janes wrote:
> > > When recovering from a crash (with injection of a partial page write at
> > > time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum
> > > verification failure.
> > >
> > > 16396 is a gin index.
> >
> > Over which type? What was the load? make check?
> >
> 
> A gin index on text[].
> 
> The load is a variation of the crash recovery tester I've been using the
> last few years, this time adapted to use a gin index in a rather unnatural
> way.  I just increment a counter on a random row repeatedly via a unique
> key, but for this purpose that unique key is stuffed into text[] along with
> a bunch of cruft.  The cruft is text representations of negative integers,
> the actual key is text representation of nonnegative integers.
> 
> The test harness (patch to induce crashes, and two driving programs) and a
> preserved data directory are here:
> 
> https://drive.google.com/folderview?id=0Bzqrh1SO9FcESDZVeFk5djJaeHM&usp=sharing
> 
> (role jjanes, database jjanes)
> 
> As far as I can tell, this problem goes back to the beginning of page
> checksums.

Interesting.

> > > If I have it ignore checksum failures, there is no apparent misbehavior.
> > >  I'm trying to bisect it, but it could take a while and I thought someone
> > > might have some theories based on the log:
> >
> > If you have the WAL a pg_xlogdump grepping for everything referring to
> > that block would be helpful.
> >
> 
> The only record which mentions block 28486 by name is this one:

Hm, try running it with -b specified.

> rmgr: Gin len (rec/tot):   1576/  1608, tx:   77882205, lsn:
> 11/30F4C2C0, prev 11/30F4C290, bkp: , desc: Insert new list page, node:
> 1663/16384/16396 blkno: 28486
> 
> However, I think that that record precedes the recovery start point.

If that's the case it seems likely that a PageSetLSN() or PageSetDirty()
are missing somewhere...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-07 Thread Stephen Frost
* Simon Riggs (si...@2ndquadrant.com) wrote:
> On 7 May 2014 17:43, Stephen Frost  wrote:
> > It's the optimizer's job to figure out which path to pick though, based
> > on which will have the lowest cost.
> 
> Of course. I'm not suggesting otherwise.
> 
> >> If you do want that, you can write an Event Trigger that automatically
> >> adds a lookaside for any table.
> >
> > This sounds terribly ugly and like we're pushing optimization decisions
> > on to the user instead of just figuring out what the best answer is.
> 
> I'm proposing that we use a declarative approach, just like we do when
> we say CREATE INDEX.

There's quite a few trade-offs when it comes to indexes though.  I'm
trying to figure out when you wouldn't want to use a GPU, if it's
available to you and the cost model says it's faster?  To me, that's
kind of like saying you want a declarative approach for when to use a
HashJoin.

> The idea is that we only consider a lookaside when a lookaside has
> been declared. Same as when we add an index, the optimizer considers
> whether to use that index. What we don't want to happen is that the
> optimizer considers a GIN plan, even when a GIN index is not
> available.

Yes, I understood your proposal- I just don't agree with it. ;)

For MatViews and/or Indexes, there are trade-offs to be had as it
relates to disk space, insert speed, etc.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Josh Berkus
On 05/07/2014 07:31 AM, Andrew Dunstan wrote:

> +1. If we ever want to implement an auto-tuning heuristic it seems we're
> going to need some hard empirical evidence to support it, and that
> doesn't seem likely to appear any time soon.

4GB default it is, then.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Peter Geoghegan
On Wed, May 7, 2014 at 4:40 AM, Andres Freund  wrote:
> * IMO we need space in jsonb on-disk values to indicate a format
>   version. We won't fully get this right.

That would be nice.

> * The jentry representation should be changed so it's possible to get the type
>   of an entry without checking individual types. That'll make code like
>   findJsonbValueFromSuperHeader() (well, whatever you've renamed it to)
>   much easier to read. Preferably so it can have the same values (after
>   shifting/masking) as the JBE variants. And it needs space for further
>   types (or representations thereof).

I don't know what you mean. Every place that macros like
JBE_ISNUMERIC() are used subsequently involves access to (say) numeric
union fields. At best, you could have those functions check that the
types match, and then handle each case in a switch that only looked at
(say) the "candidate", but that doesn't really save much at all. It
wouldn't take much to have the macros give enum constant values back
as you suggest, though.

> * I wonder if the hash/object pair representation is wise and if it
>   shouldn't be keys combined with offsets first, and then the
>   values. That will make access much faster. And that's what jsonb
>   essentially is about.

I like that the physical layout reflects the layout of the original
JSON document. Besides, I don't think this is obviously the case. Are
you sure that it won't be more useful to make children as close to
their parents as possible? Particularly given idiomatic usage. Jsonb,
in point of fact, *is* fast.

> * I think both arrays and hashes should grow individual structs. With
>   space for additional flags.

Why?


-- 
Peter Geoghegan




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Josh Berkus
On 05/06/2014 10:35 PM, Peter Geoghegan wrote:
> +1. In my view, we probably should have set it to a much higher
> absolute default value. The main problem with setting it to any
> multiple of shared_buffers that I can see is that shared_buffers is a
> very poor proxy for what effective_cache_size is supposed to
> represent. In general, the folk wisdom around sizing shared_buffers
> has passed its sell-by date.

Unfortunately nobody has the time/resources to do the kind of testing
required for a new recommendation for shared_buffers.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Peter Geoghegan
On Wed, May 7, 2014 at 11:04 AM, Josh Berkus  wrote:
> Unfortunately nobody has the time/resources to do the kind of testing
> required for a new recommendation for shared_buffers.

I meant to suggest that the buffer manager could be improved to the
point that the old advice becomes obsolete. Right now, it's much
harder to analyze shared_buffers than it should be, presumably because
of the problems with the buffer manager. I think that if we could
formulate better *actionable* advice around what we have right now,
that would have already happened.

We ought to be realistic about the fact that the current
recommendations around sizing shared_buffers are nothing more than
folk wisdom. That's the best we have right now, but that seems quite
unsatisfactory to me.

-- 
Peter Geoghegan




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Andres Freund
On 2014-05-07 10:55:28 -0700, Peter Geoghegan wrote:
> On Wed, May 7, 2014 at 4:40 AM, Andres Freund  wrote:
> > * The jentry representation should be changed so it's possible to get the
> >   type of an entry without checking individual types. That'll make code like
> >   findJsonbValueFromSuperHeader() (well, whatever you've renamed it to)
> >   much easier to read. Preferably so it can have the same values (after
> >   shifting/masking) as the JBE variants. And it needs space for further
> >   types (or representations thereof).
>
> I don't know what you mean. Every place that macros like
> JBE_ISNUMERIC() are used subsequently involves access to (say) numeric
> union fields. At best, you could have those functions check that the
> types match, and then handle each case in a switch that only looked at
> (say) the "candidate", but that doesn't really save much at all. It
> wouldn't take much to have the macros give enum constant values back
> as you suggest, though.

Compare
for (i = 0; i < count; i++)
{
	JEntry *e = array + i;

	if (JBE_ISNULL(*e) && key->type == jbvNull)
	{
		result->type = jbvNull;
		result->estSize = sizeof(JEntry);
	}
	else if (JBE_ISSTRING(*e) && key->type == jbvString)
	{
		result->type = jbvString;
		result->val.string.val = data + JBE_OFF(*e);
		result->val.string.len = JBE_LEN(*e);
		result->estSize = sizeof(JEntry) + result->val.string.len;
	}
	else if (JBE_ISNUMERIC(*e) && key->type == jbvNumeric)
	{
		result->type = jbvNumeric;
		result->val.numeric = (Numeric) (data + INTALIGN(JBE_OFF(*e)));

		result->estSize = 2 * sizeof(JEntry) +
			VARSIZE_ANY(result->val.numeric);
	}
	else if (JBE_ISBOOL(*e) && key->type == jbvBool)
	{
		result->type = jbvBool;
		result->val.boolean = JBE_ISBOOL_TRUE(*e) != 0;
		result->estSize = sizeof(JEntry);
	}
	else
		continue;

	if (compareJsonbScalarValue(key, result) == 0)
		return result;
}
with

for (i = 0; i < count; i++)
{
	JEntry *e = array + i;

	if (!JBE_TYPE_IS_SCALAR(*e))
		continue;

	if (JBE_TYPE(*e) != key->type)
		continue;

	result = getJsonbValue(e);

	if (compareJsonbScalarValue(key, result) == 0)
		return result;
}

Yes, it's not a fair comparison because I made up getJsonbValue(). But
it's *much* more readable regardless. And there are more places that could
use it. Like the second half of findJsonbValueFromSuperHeader(). FWIW,
that's one of the requests I definitely made before.

> > * I wonder if the hash/object pair representation is wise and if it
> >   shouldn't be keys combined with offsets first, and then the
> >   values. That will make access much faster. And that's what jsonb
> >   essentially is about.
>
> I like that the physical layout reflects the layout of the original
> JSON document.

Don't see much value in that. This is a binary representation and it'd
be bijective.

> Besides, I don't think this is obviously the case. Are
> you sure that it won't be more useful to make children as close to
> their parents as possible? Particularly given idiomatic usage.

Because - if done right - it would allow elementwise access without
scanning previous values.

> > * I think both arrays and hashes should grow individual structs. With
> >   space for additional flags.

> Why?

Because a) it will make the code more readable b) it'll allow for
adding different representations of hashes/arrays.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Jeff Janes
On Tue, May 6, 2014 at 9:55 AM, Andres Freund wrote:

> On 2014-05-06 17:43:45 +0100, Simon Riggs wrote:
>


> > All this changes is the cost of
> > IndexScans that would use more than 25% of shared_buffers worth of
> > data. Hopefully not many of those in your workload. Changing the cost
> > doesn't necessarily prevent index scans either. And if there are many
> > of those in your workload AND you run more than one at same time, then
> > the larger setting will work against you. So the benefit window for
> > such a high setting is slim, at best.
>

Not only do you need to run more than one at a time, but they also must use
mostly disjoint sets of data, in order for the larger estimate to be bad.


>
> Why? There's many workloads where indexes are larger than shared buffers
> but fit into the operating system's cache. And that's precisely what
> effective_cache_size is about.
>

It is more about the size of the table referenced by the index, rather than
the size of the index.  The point is that doing a large index scan might
lead you to visit the same table blocks repeatedly within quick succession.
(If a small index scan is on the inner side of a nested loop, then you
might access the same index leaf blocks and the same table blocks
repeatedly--that is why it is only mostly about the table size, rather than
exclusively.)

Cheers,

Jeff


Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Merlin Moncure
On Wed, May 7, 2014 at 1:13 PM, Peter Geoghegan  wrote:
> On Wed, May 7, 2014 at 11:04 AM, Josh Berkus  wrote:
>> Unfortunately nobody has the time/resources to do the kind of testing
>> required for a new recommendation for shared_buffers.
>
> I meant to suggest that the buffer manager could be improved to the
> point that the old advice becomes obsolete. Right now, it's much
> harder to analyze shared_buffers than it should be, presumably because
> of the problems with the buffer manager. I think that if we could
> formulate better *actionable* advice around what we have right now,
> that would have already happened.
>
> We ought to be realistic about the fact that the current
> recommendations around sizing shared_buffers are nothing more than
> folk wisdom. That's the best we have right now, but that seems quite
> unsatisfactory to me.

I think the stock advice is worse than nothing because it is A. based
on obsolete assumptions and B. doesn't indicate what the tradeoffs are
or what kinds of symptoms adjusting the setting could alleviate.  The
documentation should be reduced to things that are known, for example:

*) raising shared buffers does not 'give more memory to postgres for
caching' -- it can only reduce it via double paging
*) reads from shared buffers are generally somewhat faster than faulting
to o/s buffers
*) an s_b larger than the working dataset size can be a good configuration,
especially for read-only loads
*) shared buffers can interact badly with the o/s in some configurations
with large settings
*) shared buffers can reduce write i/o in certain workloads
*) interplay with checkpoints
*) shared buffers have different mechanisms for managing contention than
o/s buffers

merlin




[HACKERS] making bgworkers without shmem access actually not have shmem access

2014-05-07 Thread Robert Haas
I've complained about this problem a few times before: there's nothing
to prevent a background worker which doesn't request shared memory
access from calling InitProcess() and then accessing shared memory
anyway.  The attached patch is a first crack at fixing it.
Unfortunately, there's still a window between when we fork() and when
we detach shared memory, but that's also true for processes like
syslogger, whose death nevertheless does not cause a
crash-and-restart.  Also unfortunately, in the EXEC_BACKEND case, we
actually have to attach shared memory first, to get our background
worker entry details. This is undesirable, but I'm not sure there's
any good way to fix it at all, and certainly not in time for 9.4.
Hopefully, the window when shared memory is attached with this patch
is short enough to make things relatively safe.

At a minimum, it's got to be better than the status quo, where shared
memory is accessible throughout the entire lifetime of
non-shmem-access background workers.

Attached is also a new background worker called say_hello which I used
for testing this patch.  That's obviously not for commit.

Comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index a6b25d8..8b8ae89 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,9 +20,11 @@
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/postmaster.h"
 #include "storage/barrier.h"
+#include "storage/dsm.h"
 #include "storage/ipc.h"
 #include "storage/latch.h"
 #include "storage/lwlock.h"
+#include "storage/pg_shmem.h"
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
 #include "storage/procsignal.h"
@@ -400,12 +402,16 @@ BackgroundWorkerStopNotifications(pid_t pid)
 BackgroundWorker *
 BackgroundWorkerEntry(int slotno)
 {
+	static BackgroundWorker myEntry;
 	BackgroundWorkerSlot *slot;
 
 	Assert(slotno < BackgroundWorkerData->total_slots);
 	slot = &BackgroundWorkerData->slot[slotno];
 	Assert(slot->in_use);
-	return &slot->worker;		/* can't become free while we're still here */
+
+	/* must copy this in case we don't intend to retain shmem access */
+	memcpy(&myEntry, &slot->worker, sizeof myEntry);
+	return &myEntry;
 }
 #endif
 
@@ -542,6 +548,19 @@ StartBackgroundWorker(void)
 	snprintf(buf, MAXPGPATH, "bgworker: %s", worker->bgw_name);
 	init_ps_display(buf, "", "", "");
 
+	/*
+	 * If we're not supposed to have shared memory access, then detach from
+	 * shared memory.  If we didn't request shared memory access, the
+	 * postmaster won't force a cluster-wide restart if we exit unexpectedly,
+	 * so we'd better make sure that we don't mess anything up that would
+	 * require that sort of cleanup.
+	 */
+	if ((worker->bgw_flags & BGWORKER_SHMEM_ACCESS) == 0)
+	{
+		dsm_detach_all();
+		PGSharedMemoryDetach();
+	}
+
 	SetProcessingMode(InitProcessing);
 
 	/* Apply PostAuthDelay */
@@ -616,19 +635,29 @@ StartBackgroundWorker(void)
 	/* We can now handle ereport(ERROR) */
 	PG_exception_stack = &local_sigjmp_buf;
 
-	/* Early initialization */
-	BaseInit();
-
 	/*
-	 * If necessary, create a per-backend PGPROC struct in shared memory,
-	 * except in the EXEC_BACKEND case where this was done in
-	 * SubPostmasterMain. We must do this before we can use LWLocks (and in
-	 * the EXEC_BACKEND case we already had to do some stuff with LWLocks).
+	 * If the background worker requests shared memory access, set that up now;
+	 * else, detach all shared memory segments.
 	 */
-#ifndef EXEC_BACKEND
 	if (worker->bgw_flags & BGWORKER_SHMEM_ACCESS)
+	{
+		/*
+		 * Early initialization.  Some of this could be useful even for
+		 * background workers that aren't using shared memory, but they can
+		 * call the individual startup routines for those subsystems if needed.
+		 */
+		BaseInit();
+
+		/*
+		 * Create a per-backend PGPROC struct in shared memory, except in the
+		 * EXEC_BACKEND case where this was done in SubPostmasterMain. We must
+		 * do this before we can use LWLocks (and in the EXEC_BACKEND case we
+		 * already had to do some stuff with LWLocks).
+		 */
+#ifndef EXEC_BACKEND
 		InitProcess();
 #endif
+	}
 
 	/*
 	 * If bgw_main is set, we use that value as the initial entrypoint.
diff --git a/contrib/say_hello/Makefile b/contrib/say_hello/Makefile
new file mode 100644
index 000..2da89d7
--- /dev/null
+++ b/contrib/say_hello/Makefile
@@ -0,0 +1,17 @@
+# contrib/say_hello/Makefile
+
+MODULES = say_hello
+
+EXTENSION = say_hello
+DATA = say_hello--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/say_hello
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/say_hello/say_hello--1.0.sql b/contrib/say_hello/say_hello--1.0.sql
new file mode 100644
index 000..c82b7df
--- /d

Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Andres Freund
On 2014-05-07 13:32:41 -0500, Merlin Moncure wrote:
> On Wed, May 7, 2014 at 1:13 PM, Peter Geoghegan  wrote:
> > On Wed, May 7, 2014 at 11:04 AM, Josh Berkus  wrote:
> >> Unfortunately nobody has the time/resources to do the kind of testing
> >> required for a new recommendation for shared_buffers.
> >
> > I meant to suggest that the buffer manager could be improved to the
> > point that the old advice becomes obsolete. Right now, it's much
> > harder to analyze shared_buffers than it should be, presumably because
> > of the problems with the buffer manager. I think that if we could
> > formulate better *actionable* advice around what we have right now,
> > that would have already happened.
> >
> > We ought to be realistic about the fact that the current
> > recommendations around sizing shared_buffers are nothing more than
> > folk wisdom. That's the best we have right now, but that seems quite
> > unsatisfactory to me.
> 
> I think the stock advice is worse than nothing because it is A. based
> on obsolete assumptions and B. doesn't indicate what the tradeoffs are
> or what kinds of symptoms adjusting the setting could alleviate.  The
> documentation should be reduced to things that are known, for example:
> 
> *) raising shared buffers does not 'give more memory to postgres for
> caching' -- it can only reduce it via double paging

That's absolutely not a necessary consequence. If pages are in s_b for a
while the OS will be perfectly happy to throw them away.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Josh Berkus
On 05/07/2014 11:13 AM, Peter Geoghegan wrote:
> We ought to be realistic about the fact that the current
> recommendations around sizing shared_buffers are nothing more than
> folk wisdom. That's the best we have right now, but that seems quite
> unsatisfactory to me.

So, as one of several people who put literally hundreds of hours into
the original benchmarking which established the sizing recommendations
for shared_buffers (and other settings), I find the phrase "folk wisdom"
personally offensive.  So, can we stop with this?

Otherwise, I don't think I can usefully participate in this discussion.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] making bgworkers without shmem access actually not have shmem access

2014-05-07 Thread Tom Lane
Robert Haas  writes:
> I've complained about this problem a few times before: there's nothing
> to prevent a background worker which doesn't request shared memory
> access from calling InitProcess() and then accessing shared memory
> anyway.  The attached patch is a first crack at fixing it.

> Comments?

Looks reasonable to me.

regards, tom lane




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Peter Geoghegan
On Wed, May 7, 2014 at 11:38 AM, Andres Freund  wrote:
>> *) raising shared buffers does not 'give more memory to postgres for
>> caching' -- it can only reduce it via double paging
>
> That's absolutely not a necessary consequence. If pages are in s_b for a
> while the OS will be perfectly happy to throw them away.

The biggest problem with double buffering is not that it wastes
memory. Rather, it's that it wastes memory bandwidth. I think that
lessening that problem will be the major benefit of making larger
shared_buffers settings practical.

-- 
Peter Geoghegan




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Tom Lane
Andres Freund  writes:
> * The jentry representation should be changed so it's possible to get the type
>   of an entry without checking individual types. That'll make code like
>   findJsonbValueFromSuperHeader() (well, whatever you've renamed it to)
>   much easier to read. Preferably so it can have the same values (after
>   shifting/masking) as the JBE variants. And it needs space for further
>   types (or representations thereof).

Further types?  What further types?  JSON seems unlikely to grow any
other value types than what it's got.

regards, tom lane




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Robert Haas
On Wed, May 7, 2014 at 2:40 PM, Josh Berkus  wrote:
> On 05/07/2014 11:13 AM, Peter Geoghegan wrote:
>> We ought to be realistic about the fact that the current
>> recommendations around sizing shared_buffers are nothing more than
>> folk wisdom. That's the best we have right now, but that seems quite
>> unsatisfactory to me.
>
> So, as one of several people who put literally hundreds of hours into
> the original benchmarking which established the sizing recommendations
> for shared_buffers (and other settings), I find the phrase "folk wisdom"
> personally offensive.  So, can we stop with this?
>
> Otherwise, I don't think I can usefully participate in this discussion.

+1.

I think it is quite accurate to say that we can't predict precisely
what value of shared_buffers will perform best for a particular
workload and on a particular system.  There are people out there using
very large values and very small ones, according to what they have
found most effective.  But that does not mean, as the phrase "folk
wisdom" might be taken to imply, that we don't know anything at all
about what actually works well in practice.  Because we do know quite
a bit about that.  I and people I work with have been able to improve
performance greatly on many systems by providing guidance based on
what this community has been able to understand on this topic, and
dismissing it as rubbish is wrong.

Also, I seriously doubt that a one-size-fits-all guideline about
setting shared_buffers will ever be right for every workload.
Workloads, by their nature, are complex beasts.  The size of the
workload varies, and which portions of it are how hot vary, and the
read-write mix varies, and those are not problems with PostgreSQL;
those are problems with data.  That is not to say that we can't do
anything to make PostgreSQL work better across a wider range of
settings for shared_buffers, but it is to say that no matter how much
work we do on the code, setting this optimally for every workload will
probably remain complex.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Andres Freund
On 2014-05-07 11:45:04 -0700, Peter Geoghegan wrote:
> On Wed, May 7, 2014 at 11:38 AM, Andres Freund  wrote:
> >> *) raising shared buffers does not 'give more memory to postgres for
> >> caching' -- it can only reduce it via double paging
> >
> > That's absolutely not a necessary consequence. If pages are in s_b for a
> > while the OS will be perfectly happy to throw them away.
> 
> The biggest problem with double buffering is not that it wastes
> memory. Rather, it's that it wastes memory bandwidth.

Doesn't match my experience. Even with the current buffer manager
there's usually enough locality to keep important pages in s_b for a
meaningful time. I *have* seen workloads that should have fit into
memory not fit because of double buffering.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Robert Haas
On Wed, May 7, 2014 at 2:49 PM, Andres Freund  wrote:
> On 2014-05-07 11:45:04 -0700, Peter Geoghegan wrote:
>> On Wed, May 7, 2014 at 11:38 AM, Andres Freund  
>> wrote:
>> >> *) raising shared buffers does not 'give more memory to postgres for
>> >> caching' -- it can only reduce it via double paging
>> >
>> > That's absolutely not a necessary consequence. If pages are in s_b for a
>> > while the OS will be perfectly happy to throw them away.
>>
>> The biggest problem with double buffering is not that it wastes
>> memory. Rather, it's that it wastes memory bandwidth.
>
> Doesn't match my experience. Even with the current buffer manager
> there's usually enough locality to keep important pages in s_b for a
> meaningful time. I *have* seen workloads that should have fit into
> memory not fit because of double buffering.

Same here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Andres Freund
On 2014-05-07 14:48:51 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > * The jentry representation should be changed so it's possible to get
> >   the type of an entry without checking individual types. That'll make
> >   code like findJsonbValueFromSuperHeader() (well, whatever you've
> >   renamed it to) much easier to read. Preferably so it can have the
> >   same values (after shifting/masking) as the JBE variants. And it
> >   needs space for further types (or representations thereof).
> 
> Further types?  What further types?  JSON seems unlikely to grow any
> other value types than what it's got.

I am not thinking about user level exposed types, but e.g. a hash/object
representation that allows duplicated keys and keeps the original
order. Because I am pretty sure that'll come.
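To illustrate the shift/mask idea, a minimal sketch of an entry word whose type is recoverable with a single mask (names and layout are invented for illustration; the real JEntry format in jsonb.h differs):

```c
#include <stdint.h>

/* Hypothetical entry word: low 28 bits hold the offset/length, the top
 * bits hold a small type code.  One shift+mask yields the type, with
 * unused code points left over for future representations. */
typedef uint32_t DemoEntry;

#define DEMO_OFFLENMASK 0x0FFFFFFFu
#define DEMO_TYPEMASK   0xF0000000u
#define DEMO_TYPESHIFT  28

enum DemoType { DEMO_STRING = 1, DEMO_NUMERIC, DEMO_BOOL, DEMO_NULL };

static inline DemoEntry
demo_entry_make(enum DemoType type, uint32_t offlen)
{
    return ((uint32_t) type << DEMO_TYPESHIFT) | (offlen & DEMO_OFFLENMASK);
}

/* No per-type flag checks needed: one expression recovers the type. */
static inline enum DemoType
demo_entry_type(DemoEntry entry)
{
    return (enum DemoType) ((entry & DEMO_TYPEMASK) >> DEMO_TYPESHIFT);
}
```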

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Peter Geoghegan
On Wed, May 7, 2014 at 11:40 AM, Josh Berkus  wrote:
> So, as one of several people who put literally hundreds of hours into
> the original benchmarking which established the sizing recommendations
> for shared_buffers (and other settings), I find the phrase "folk wisdom"
> personally offensive.  So, can we stop with this?

I have also put a lot of time into benchmarking. No personal offence
was intended, and I'm glad that we have some advice to give to users,
but the fact of the matter is that current *official* recommendations
are very vague.


-- 
Peter Geoghegan




Re: [HACKERS] making bgworkers without shmem access actually not have shmem access

2014-05-07 Thread Robert Haas
On Wed, May 7, 2014 at 2:44 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I've complained about this problem a few times before: there's nothing
>> to prevent a background worker which doesn't request shared memory
>> access from calling InitProcess() and then accessing shared memory
>> anyway.  The attached patch is a first crack at fixing it.
>
>> Comments?
>
> Looks reasonable to me.

Thanks for the fast review.  Committed, after fixing one further bug I spotted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Peter Geoghegan
On Wed, May 7, 2014 at 11:50 AM, Robert Haas  wrote:
> But that does not mean, as the phrase "folk
> wisdom" might be taken to imply, that we don't know anything at all
> about what actually works well in practice.

Folk wisdom doesn't imply that. It implies that we think this works,
and we may well be right, but there isn't all that much rigor behind
some of it. I'm not blaming anyone for this state of affairs. I've
heard plenty of people repeat the "don't exceed 8GB" rule - I
regularly repeated it myself. I cannot find any rigorous defense of
this, though. If you're aware of one, please point it out to me.

-- 
Peter Geoghegan




Re: [HACKERS] making bgworkers without shmem access actually not have shmem access

2014-05-07 Thread Petr Jelinek

On 07/05/14 20:37, Robert Haas wrote:

At a minimum, it's got to be better than the status quo, where shared
memory is accessible throughout the entire lifetime of
non-shmem-access background workers.



Seems reasonable to me, it might need to be revisited to at least try to 
figure out if we can make EXEC_BACKEND safer, but it's definitely better 
than the current state.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Andrew Dunstan


On 05/07/2014 02:52 PM, Andres Freund wrote:

On 2014-05-07 14:48:51 -0400, Tom Lane wrote:

Andres Freund  writes:

* The jentry representation should be changed so it's possible to get the type
   of an entry without checking individual types. That'll make code like
   findJsonbValueFromSuperHeader() (well, whatever you've renamed it to)
   much easier to read. Preferably so it can have the same values (after
   shifting/masking) as the JBE variants. And it needs space for further
   types (or representations thereof).

Further types?  What further types?  JSON seems unlikely to grow any
other value types than what it's got.

I am not thinking about user level exposed types, but e.g. a hash/object
representation that allows duplicated keys and keeps the original
order. Because I am pretty sure that'll come.





That makes one of you. We've had hstore for years and nobody that I 
recall has asked for preservation of key order or duplicates. And any 
app that relies on either in JSON is still, IMNSHO, broken by design.


OTOH, there are suggestions of "superjson" types that support other 
scalar types such as timestamps.


cheers

andrew




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Josh Berkus
On 05/07/2014 11:52 AM, Peter Geoghegan wrote:
> On Wed, May 7, 2014 at 11:40 AM, Josh Berkus  wrote:
>> So, as one of several people who put literally hundreds of hours into
>> the original benchmarking which established the sizing recommendations
>> for shared_buffers (and other settings), I find the phrase "folk wisdom"
>> personally offensive.  So, can we stop with this?
> 
> I have also put a lot of time into benchmarking. No personal offence
> was intended, and I'm glad that we have some advice to give to users,
> but the fact of the matter is that current *official* recommendations
> are very vague.

Well, they should be vague; the only hard data we have is rather
out-of-date (I think 8.2 was our last set of tests).  If we gave users
specific, detailed recommendations, we'd be misleading them.

For that matter, our advice on shared_buffers ... and our design for it
... is going to need to change radically soon, since Linux is getting an
ARC with a frequency cache as well as a recency cache, and FreeBSD and
OpenSolaris already have them.

FWIW, if someone could fund me for a month, I'd be happy to create a
benchmarking setup where we could test these kinds of things; I have
pretty clear ideas how to build one.  I imagine some of our other
consultants could make the same offer.  However, it's too much work for
anyone to get done "in their spare time".

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Tom Lane
btw ... in jsonb.h there is this comment:

 * Jsonb Keys and string array elements are treated equivalently when
 * serialized to text index storage.  One day we may wish to create an opclass
 * that only indexes values, but for now keys and values are stored in GIN
 * indexes in a way that doesn't really consider their relationship to each
 * other.

Is this an obsolete speculation about writing jsonb_hash_ops, or is there
still something worth preserving there?  If so, what?

regards, tom lane




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Peter Geoghegan
On Wed, May 7, 2014 at 11:50 AM, Robert Haas  wrote:
>> Doesn't match my experience. Even with the current buffer manager
>> there's usually enough locality to keep important pages in s_b for a
>> meaningful time. I *have* seen workloads that should have fit into
>> memory not fit because of double buffering.
>
> Same here.

I think that it depends on whether or not you're thinking about the
worst case. Most people are not going to be in the category you
describe here. Plenty of people in the Postgres community run with
very large shared_buffers settings on non-I/O-bound workloads, and
report good results - often massive, quickly apparent improvements.
I'm mostly concerned with obsoleting the 8GB hard ceiling rule here.

It probably doesn't matter whether and by how much one factor is worse
than the other, though. I found the section "5.2 Temporal Control:
Buffering" in the following paper, which discusses the subject, quite
interesting: http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf
-- 
Peter Geoghegan




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Peter Geoghegan
On Wed, May 7, 2014 at 12:08 PM, Tom Lane  wrote:
>  * Jsonb Keys and string array elements are treated equivalently when
>  * serialized to text index storage.  One day we may wish to create an opclass
>  * that only indexes values, but for now keys and values are stored in GIN
>  * indexes in a way that doesn't really consider their relationship to each
>  * other.
>
> Is this an obsolete speculation about writing jsonb_hash_ops, or is there
> still something worth preserving there?  If so, what?

This is not obsolete. It would equally apply to a GiST opclass that
wanted to support our current definition of existence. Array elements
are keys simply by virtue of being strings, but otherwise are treated
as values. See the large comment block within gin_extract_jsonb().

-- 
Peter Geoghegan




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Tom Lane
And while I'm looking at it ...

The jsonb_ops storage format for values is inherently lossy, because it
cannot distinguish the string values "n", "t", "f" from JSON null or
boolean true, false respectively; nor does it know the difference between
a number and a string containing digits.  This appears to not quite be a
bug because the consistent functions force recheck for all queries that
care about values (as opposed to keys).  But if it's documented anywhere
I don't see where.  And in any case, is it a good idea?  We could fairly
easily change things so that these cases are guaranteed distinguishable.
We're using an entire byte to convey one bit of information (key or
value); I'm inclined to redefine the flag byte so that it tells not just
that but which JSON datatype is involved.
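A sketch of that suggestion with invented flag names (not the actual jsonb_ops code): if the per-entry flag byte carries the JSON datatype of a value rather than just key-vs-value, the one-character string "n" and a JSON null produce distinguishable index keys:

```c
#include <string.h>

/* Invented flag values for illustration only. */
#define JDEMO_KEY       0x01
#define JDEMO_STR_VAL   0x02
#define JDEMO_NUM_VAL   0x03
#define JDEMO_BOOL_VAL  0x04
#define JDEMO_NULL_VAL  0x05

/* Build a flag-prefixed index key into out[]; returns its length.
 * The same byte that used to convey one bit (key or value) now also
 * identifies the datatype, at no extra storage cost. */
static size_t
jdemo_make_key(unsigned char flag, const char *data, unsigned char *out)
{
    size_t len = strlen(data);

    out[0] = flag;
    memcpy(out + 1, data, len);
    return len + 1;
}
```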

regards, tom lane




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Robert Haas
On Wed, May 7, 2014 at 2:58 PM, Peter Geoghegan  wrote:
> On Wed, May 7, 2014 at 11:50 AM, Robert Haas  wrote:
>> But that does not mean, as the phrase "folk
>> wisdom" might be taken to imply, that we don't know anything at all
>> about what actually works well in practice.
>
> Folk wisdom doesn't imply that. It implies that we think this works,
> and we may well be right, but there isn't all that much rigor behind
> some of it. I'm not blaming anyone for this state of affairs. I've
> heard plenty of people repeat the "don't exceed 8GB" rule - I
> regularly repeated it myself. I cannot find any rigorous defense of
> this, though. If you're aware of one, please point it out to me.

I'm not sure the level of rigor you'd like to see is going to be
available here.  Complex systems have complex behavior; that's life.

At any rate, I'm not aware of any rigorous defense of the "don't
exceed 8GB" rule.  But, #1, I'd never put it that simply.   What I've
found is more like this: If it's possible to size shared_buffers so
that the working set fits entirely within shared_buffers, that
configuration is worthy of strong consideration.  Otherwise, you
probably want to keep shared_buffers low in order to avoid
checkpoint-related I/O spikes and minimize double buffering; try 25%
of system memory up to 512MB on Windows or up to 2GB on 32-bit Linux
or up to 8GB on 64-bit Linux for starters, and then tune based on your
workload.
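The starting-point arithmetic above can be written down directly (a rule of thumb from this thread, not an official formula):

```c
#include <stdint.h>

/* 25% of system memory, capped per platform: 512MB on Windows,
 * 2GB on 32-bit Linux, 8GB on 64-bit Linux (all values in MB).
 * A heuristic starting point only; tune from there per workload. */
static uint64_t
suggest_shared_buffers_mb(uint64_t system_mb, uint64_t platform_cap_mb)
{
    uint64_t quarter = system_mb / 4;

    return quarter < platform_cap_mb ? quarter : platform_cap_mb;
}
```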

And #2, I think the origin of the 8GB number on 64-bit non-Windows
systems is that people found that checkpoint-related I/O spikes became
intolerable when you went too much above that number.  On some
systems, the threshold is lower than that - for example, I believe
Merlin and others have reported numbers more like 2GB than 8GB - and
on other systems, the threshold is higher - indeed, some people go way
higher and never hit it at all.  I agree that it would be nice to
better-characterize why different users hit it at different levels,
but it's probably highly dependent on hardware, workload, and kernel
version, so I tend to doubt it can be characterized very simply.

If I had to guess, I'd bet that fixing Linux's abominable behavior
around the fsync() call would probably go a long way toward making
higher values of shared_buffers more practical.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Peter Geoghegan
On Wed, May 7, 2014 at 12:27 PM, Tom Lane  wrote:
> The jsonb_ops storage format for values is inherently lossy, because it
> cannot distinguish the string values "n", "t", "f" from JSON null or
> boolean true, false respectively; nor does it know the difference between
> a number and a string containing digits.  This appears to not quite be a
> bug because the consistent functions force recheck for all queries that
> care about values (as opposed to keys).  But if it's documented anywhere
> I don't see where.

The fact that we *don't* set the reset flag for
JsonbExistsStrategyNumber, and why that's okay, is prominently
documented. So I'd say that it is.

> And in any case, is it a good idea?  We could fairly
> easily change things so that these cases are guaranteed distinguishable.
> We're using an entire byte to convey one bit of information (key or
> value); I'm inclined to redefine the flag byte so that it tells not just
> that but which JSON datatype is involved.

It seemed simpler to do it that way. As I've said before, jsonb_ops is
mostly concerned with hstore-style indexing. It could also be
particularly useful for expressional indexes on "tags" arrays of
strings, which is a common use-case.

jsonb_hash_ops on the other hand is for testing containment, which is
useful for querying heavily nested documents, where typically there is
a very low selectivity for keys. It's not the default largely because
I was concerned about not supporting all indexable operators by
default, and because I suspected that it would be preferred to have a
default closer to hstore.

Anyway, doing things that way for values won't obviate the need to set
the reset flag, unless you come up with a much more sophisticated
scheme. Existence (of keys) is only tested in respect of the
top-level. Containment (where values are tested) is more complicated.

-- 
Peter Geoghegan




Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, May 7, 2014 at 12:08 PM, Tom Lane  wrote:
>> Is this an obsolete speculation about writing jsonb_hash_ops, or is there
>> still something worth preserving there?  If so, what?

> This is not obsolete. It would equally apply to a GiST opclass that
> wanted to support our current definition of existence. Array elements
> are keys simply by virtue of being strings, but otherwise are treated
> as values. See the large comment block within gin_extract_jsonb().

It's not that aspect I'm complaining about, it's the bit about "one day we
may wish to write".  This comment should restrict itself to describing
what jsonb_ops does, not make already-or-soon-to-be-obsolete statements
about what other opclasses might or might not do.

regards, tom lane




Re: [HACKERS] 9.4 checksum errors in recovery with gin index

2014-05-07 Thread Jeff Janes
On Wed, May 7, 2014 at 10:34 AM, Andres Freund wrote:

> Hi,
>
> On 2014-05-07 10:21:26 -0700, Jeff Janes wrote:
> > On Wed, May 7, 2014 at 12:48 AM, Andres Freund  >wrote:
> > > If you have the WAL a pg_xlogdump grepping for everything referring to
> > > that block would be helpful.
> > >
> >
> > The only record which mentions block 28486 by name is this one:
>
> Hm, try running it with -b specified.
>

Still nothing more.


>
> > rmgr: Gin len (rec/tot):   1576/  1608, tx:   77882205, lsn:
> > 11/30F4C2C0, prev 11/30F4C290, bkp: , desc: Insert new list page,
> > node: 1663/16384/16396 blkno: 28486
> >
> > However, I think that that record precedes the recovery start point.
>
> If that's the case it seems likely that a PageSetLSN() or PageSetDirty()
> are missing somewhere...
>

Wouldn't that result in corrupted data after crash recovery when checksum
failures are ignored?  I haven't seen any of that.

Cheers,

Jeff


Re: [HACKERS] Wanted: jsonb on-disk representation documentation

2014-05-07 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, May 7, 2014 at 12:27 PM, Tom Lane  wrote:
>> The jsonb_ops storage format for values is inherently lossy, because it
>> cannot distinguish the string values "n", "t", "f" from JSON null or
>> boolean true, false respectively; nor does it know the difference between
>> a number and a string containing digits.  This appears to not quite be a
>> bug because the consistent functions force recheck for all queries that
>> care about values (as opposed to keys).  But if it's documented anywhere
>> I don't see where.

> The fact that we *don't* set the reset flag for
> JsonbExistsStrategyNumber, and why that's okay is prominently
> documented. So I'd say that it is.

Meh.  Are you talking about the large comment block in gin_extract_jsonb?
The readability of that comment starts to go downhill with its use of
"reset" to refer to what everything else calls a "recheck" flag, and in
any case it's claiming that we *don't* need a recheck for exists (a
statement I suspect to be false, but more later).  It entirely fails to
explain that other query types *do* need a recheck because of arbitrary
decisions about not representing JSON datatype information fully.  There's
another comment in gin_consistent_jsonb that's just as misleading, because
it mentions (vaguely) some reasons why recheck is necessary without
mentioning this one.

A larger issue here is that, to the extent that this comment even has
the information I'm after, the comment is in the wrong place.  It is not
attached to the code that's actually making the lossy representational
choices (ie, make_scalar_key), nor to the code that decides whether or not
a recheck is needed (ie, gin_consistent_jsonb).  I don't think that basic
architectural choices like these should be relegated to comment blocks
inside specific functions to begin with.  A README file would be better,
perhaps, but there's not a specific directory associated with the jsonb
code; so I think this sort of info belongs either in jsonb.h or in the
file header comment for jsonb_gin.c.

> Anyway, doing things that way for values won't obviate the need to set
> the reset flag, unless you come up with a much more sophisticated
> scheme. Existence (of keys) is only tested in respect of the
> top-level. Containment (where values are tested) is more complicated.

I'm not expecting that it will make things better for the current query
operators.  I am thinking that exact rather than lossy storage might help
for future operators wanting to use this same index representation.
Once we ship 9.4, it's going to be very hard to change the index
representation, especially for the default opclass (cf the business about
which opclass is default for type inet).  So it behooves us to not throw
information away needlessly.

regards, tom lane




Re: [HACKERS] bgworker crashed or not?

2014-05-07 Thread Robert Haas
On Tue, May 6, 2014 at 8:25 PM, Petr Jelinek  wrote:
>> What I'm inclined to do is change the logic so that:
>>
>> (1) After a crash-and-restart sequence, zero rw->rw_crashed_at, so
>> that anything which is still registered gets restarted immediately.
>
> Yes, that's quite obvious change which I missed completely :).

I've committed the portion of your patch which does this, with pretty
extensive changes.  I moved the function which resets the crash times
to bgworker.c, renamed it, and made it just reset all of the crash
times unconditionally; there didn't seem to be any point in skipping
the irrelevant ones, so it seemed best to keep things simple.  I also
moved the call from reaper() where you had it to
PostmasterStateMachine(), because the old placement did not work for
me when I carried out the following steps:

1. Kill a background worker with a 60-second restart time using
SIGTERM, so that it does exit(1).
2. Before it restarts, kill a regular backend with SIGQUIT.

Let me know if you think I've got it wrong.

>> (2) If a shmem-connected backend fails to release the deadman switch
>> or exits with an exit code other than 0 or 1, we crash-and-restart.  A
>> non-shmem-connected backend never causes a crash-and-restart.
>
> +1

I did this as a separate commit,
e2ce9aa27bf20eff2d991d0267a15ea5f7024cd7, just moving the check for
ReleasePostmasterChildSlot inside the if statement.   Then I realized
that was bogus, so I just pushed
eee6cf1f337aa488a20e9111df446cdad770e645 to fix that.  Hopefully it's
OK now.

>> (3) When a background worker exits without triggering a
>> crash-and-restart, an exit code of precisely 0 causes the worker to be
>> unregistered; any other exit code has no special effect, so
>> bgw_restart_time controls.
>
> +1

This isn't done yet.
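Rules (2) and (3) above amount to a small decision table; a sketch with invented names (the real logic lives in the postmaster code):

```c
#include <stdbool.h>

enum WorkerAction
{
    ACTION_RESTART_LATER,   /* honor bgw_restart_time */
    ACTION_UNREGISTER,      /* forget the worker entirely */
    ACTION_CRASH_RESTART    /* whole-cluster crash-and-restart */
};

static enum WorkerAction
worker_exit_action(bool shmem_connected, bool released_deadman, int exit_code)
{
    /* (2) Only a shmem-connected worker can take the cluster down:
     * failing to release the deadman switch, or exiting with a code
     * other than 0 or 1, forces a crash-and-restart. */
    if (shmem_connected &&
        (!released_deadman || (exit_code != 0 && exit_code != 1)))
        return ACTION_CRASH_RESTART;

    /* (3) exit(0) unregisters the worker; any other exit code just
     * means it is restarted per bgw_restart_time. */
    if (exit_code == 0)
        return ACTION_UNREGISTER;

    return ACTION_RESTART_LATER;
}
```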

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Jeff Janes
On Wed, May 7, 2014 at 11:04 AM, Josh Berkus  wrote:

> On 05/06/2014 10:35 PM, Peter Geoghegan wrote:
> > +1. In my view, we probably should have set it to a much higher
> > absolute default value. The main problem with setting it to any
> > multiple of shared_buffers that I can see is that shared_buffers is a
> > very poor proxy for what effective_cache_size is supposed to
> > represent. In general, the folk wisdom around sizing shared_buffers
> > has past its sell-by date.
>
> Unfortunately nobody has the time/resources to do the kind of testing
> required for a new recommendation for shared_buffers.
>

I think it is worse than that.  I don't think we know what such testing
would even look like.  SSD?  BBU?  max_connections=2 with 256 cores?
pgbench -N?  Capture and replay of Amazon's workload?

If we could spell out/agree upon what kind of testing we would find
convincing, that would probably go a long way to getting some people to
work on carrying out the tests.  Unless the conclusion was "please have 3TB
of RAM and a 50-disk RAID", in which case there might be few takers.

Cheers,

Jeff


[HACKERS] jsonb existence queries are misimplemented by jsonb_ops

2014-05-07 Thread Tom Lane
I wrote:
> The readability of that comment starts to go downhill with its use of
> "reset" to refer to what everything else calls a "recheck" flag, and in
> any case it's claiming that we *don't* need a recheck for exists (a
> statement I suspect to be false, but more later).

And, indeed, it's false:

regression=# create table j (f1 jsonb);
CREATE TABLE
regression=# insert into j values ('{"foo": {"bar": "baz"}}');
INSERT 0 1
regression=# insert into j values ('{"foo": {"blah": "baz"}}');
INSERT 0 1
regression=# insert into j values ('{"fool": {"bar": "baz"}}');
INSERT 0 1
regression=# create index on j using gin(f1);
CREATE INDEX
regression=# select * from j where f1 ? 'bar';
 f1 
----
(0 rows)

regression=# set enable_seqscan to 0;
SET
regression=# select * from j where f1 ? 'bar';
            f1            
--------------------------
 {"foo": {"bar": "baz"}}
 {"fool": {"bar": "baz"}}
(2 rows)

The indexscan is incorrectly returning rows where the queried key exists
but isn't at top-level.

We could fix this either by giving up on no-recheck for existence queries,
or by changing the way that non-top-level keys get indexed.  However
I suspect the latter would break containment queries, or at least make
their lives a lot more difficult.

Another idea would be to change the definition of the exists operator
so that it *does* look into sub-objects.  It seems rather random to me
that containment looks into sub-objects but exists doesn't.  However,
possibly there are good reasons for the non-orthogonality.

Thoughts?

regards, tom lane




Re: [HACKERS] 9.4 checksum errors in recovery with gin index

2014-05-07 Thread Heikki Linnakangas

On 05/07/2014 10:35 AM, Jeff Janes wrote:

When recovering from a crash (with injection of a partial page write at
time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum
verification failure.

16396 is a gin index.

If I have it ignore checksum failures, there is no apparent misbehavior.
  I'm trying to bisect it, but it could take a while and I thought someone
might have some theories based on the log:

29075  2014-05-06 23:29:51.411 PDT:LOG:  0: database system was not
properly shut down; automatic recovery in progress
29075  2014-05-06 23:29:51.411 PDT:LOCATION:  StartupXLOG, xlog.c:6361
29075  2014-05-06 23:29:51.412 PDT:LOG:  0: redo starts at 11/323FE1C0
29075  2014-05-06 23:29:51.412 PDT:LOCATION:  StartupXLOG, xlog.c:6600
29075  2014-05-06 23:29:51.471 PDT:WARNING:  01000: page verification
failed, calculated checksum 35967 but expected 7881
29075  2014-05-06 23:29:51.471 PDT:CONTEXT:  xlog redo Delete list pages


A-ha. The WAL record of list page deletion doesn't create full-page 
images of the deleted pages. That's pretty sensible, as a deleted page 
doesn't contain any data that needs to be retained. However, if a 
full-page image is not taken, then the page should be completely 
recreated from scratch at replay. It's not doing that.


Thanks for the testing! I'll have a stab at fixing that tomorrow. 
Basically, ginRedoDeleteListPages() needs to re-initialize the deleted 
pages.
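A toy model of the fix being described (invented structures; the real change belongs in ginRedoDeleteListPages()): since no full-page image was logged, replay must rebuild the deleted page from scratch rather than patch whatever, possibly torn, contents are on disk:

```c
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 64
#define DEMO_F_DELETED 0x01

typedef struct DemoPage
{
    uint32_t lsn;
    uint8_t  flags;
    uint8_t  body[DEMO_PAGE_SIZE];
} DemoPage;

/* Redo for "delete list page": re-initialize the page completely.
 * Patching in place would leave stale bytes (and hence a checksum
 * mismatch) behind after a partial page write. */
static void
demo_redo_delete_list_page(DemoPage *page, uint32_t record_lsn)
{
    memset(page, 0, sizeof(*page));
    page->flags = DEMO_F_DELETED;
    page->lsn = record_lsn;
}
```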


- Heikki




Re: [HACKERS] jsonb existence queries are misimplemented by jsonb_ops

2014-05-07 Thread Tom Lane
I wrote:
> Another idea would be to change the definition of the exists operator
> so that it *does* look into sub-objects.  It seems rather random to me
> that containment looks into sub-objects but exists doesn't.  However,
> possibly there are good reasons for the non-orthogonality.

No, wait, containment *doesn't* look into sub-objects:

regression=# select * from j where f1 @> '{"foo": {"bar": "baz"}}';
           f1            
-------------------------
 {"foo": {"bar": "baz"}}
(1 row)

regression=# select * from j where f1 @> '{"bar": "baz"}';
 f1 
----
(0 rows)

This is rather surprising in view of the way that section 8.14.4
goes on about nesting.  But I guess the user-facing docs for jsonb
are in little better shape than the internal docs.

regards, tom lane




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Josh Berkus
On 05/07/2014 01:36 PM, Jeff Janes wrote:
> On Wed, May 7, 2014 at 11:04 AM, Josh Berkus  wrote:

>> Unfortunately nobody has the time/resources to do the kind of testing
>> required for a new recommendation for shared_buffers.

> I think it is worse than that.  I don't think we know what such testing
> would even look like.  SSD?  BBU? max_connections=2 with 256 cores?
>  pgbench -N?  capture and replay of Amazon's workload?
> 
> If we could spell out/agree upon what kind of testing we would find
> convincing, that would probably go a long way to getting some people to
> work on carrying out the tests.  Unless the conclusion was "please have 3TB
> or RAM and a 50 disk RAID", then there might be few takers.

Well, step #1 would be writing some easy-to-run benchmarks which carry
out selected workloads and measure response times.  The minimum starting
set would include one OLTP/Web benchmark, and one DW benchmark.

I'm not talking about the software to run the workload; we have that, in
several varieties.  I'm talking about the actual database generator and
queries to run.  That's the hard work.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Jeff Janes
On Wed, May 7, 2014 at 11:38 AM, Andres Freund wrote:

> On 2014-05-07 13:32:41 -0500, Merlin Moncure wrote:
> >
> > *) raising shared buffers does not 'give more memory to postgres for
> > caching' -- it can only reduce it via double paging
>
> That's absolutely not a necessary consequence. If pages are in s_b for a
> while the OS will be perfectly happy to throw them away.
>

Is that an empirical observation?  I ran some simulations a couple of
years ago, and also wrote some instrumentation to test that theory under
favorably engineered (but still plausible) conditions, and couldn't get
more than a small fraction of s_b to be so tightly bound in that the kernel
could forget about them.  Unless of course the entire workload or close to
it fits in s_b.

Cheers,

Jeff


Re: [HACKERS] jsonb existence queries are misimplemented by jsonb_ops

2014-05-07 Thread Peter Geoghegan
On Wed, May 7, 2014 at 1:47 PM, Tom Lane  wrote:
> No, wait, containment *doesn't* look into sub-objects:
>
> regression=# select * from j where f1 @> '{"foo": {"bar": "baz"}}';
>            f1
> -------------------------
>  {"foo": {"bar": "baz"}}
> (1 row)
>
> regression=# select * from j where f1 @> '{"bar": "baz"}';
>  f1
> ----
> (0 rows)


Yes it does. It's just not intended to work like that. You have to
"give a path" to what you're looking for.

-- 
Peter Geoghegan




Re: [HACKERS] bgworker crashed or not?

2014-05-07 Thread Petr Jelinek

On 07/05/14 22:32, Robert Haas wrote:

On Tue, May 6, 2014 at 8:25 PM, Petr Jelinek  wrote:
I've committed the portion of your patch which does this, with pretty
extensive changes.  I moved the function which resets the crash times
to bgworker.c, renamed it, and made it just reset all of the crash
times unconditionally; there didn't seem to be any point in skipping
the irrelevant ones, so it seemed best to keep things simple.  I also
moved the call from reaper() where you had it to
PostmasterStateMachine(), because the old placement did not work for
me when I carried out the following steps:

1. Kill a background worker with a 60-second restart time using
SIGTERM, so that it does exit(1).
2. Before it restarts, kill a regular backend with SIGQUIT.

Let me know if you think I've got it wrong.



No, I think it's fine; I didn't try that combination and just wanted to 
put it as deep in the call chain as possible.




(2) If a shmem-connected backend fails to release the deadman switch
or exits with an exit code other than 0 or 1, we crash-and-restart.  A
non-shmem-connected backend never causes a crash-and-restart.


+1


I did this as a separate commit,
e2ce9aa27bf20eff2d991d0267a15ea5f7024cd7, just moving the check for
ReleasePostmasterChildSlot inside the if statement.   Then I realized
that was bogus, so I just pushed
eee6cf1f337aa488a20e9111df446cdad770e645 to fix that.  Hopefully it's
OK now.


The fixed one looks ok to me.




(3) When a background worker exits without triggering a
crash-and-restart, an exit code of precisely 0 causes the worker to be
unregistered; any other exit code has no special effect, so
bgw_restart_time controls.


+1


This isn't done yet.



Unless I am missing something, this change was included in every patch I 
sent - setting rw->rw_terminate = true; in CleanupBackgroundWorker for a 
zero exit code, plus comment changes. Or do you have objections to this 
approach?


Anyway missing parts attached.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/bgworker.sgml b/doc/src/sgml/bgworker.sgml
index fd32d6c..5f86100 100644
--- a/doc/src/sgml/bgworker.sgml
+++ b/doc/src/sgml/bgworker.sgml
@@ -166,10 +166,12 @@ typedef struct BackgroundWorker
   
 
   
-   Background workers are expected to be continuously running; if they exit
-   cleanly, postgres will restart them immediately.  Consider doing
-   interruptible sleep when they have nothing to do; this can be achieved by
-   calling WaitLatch().  Make sure the
+   Background workers are normally expected to be running continuously;
+   they get restarted automatically in case of a crash (see 
+   bgw_restart_time documentation for details),
+   but if they exit cleanly, postgres will not restart them.
+   Consider doing interruptible sleep when they have nothing to do; this can be
+   achieved by calling WaitLatch(). Make sure the
WL_POSTMASTER_DEATH flag is set when calling that function, and
verify the return code for a prompt exit in the emergency case that
postgres itself has terminated.
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 64c9722..b642f37 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -884,8 +884,9 @@ RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
  * running but is no longer.
  *
  * In the latter case, the worker may be stopped temporarily (if it is
- * configured for automatic restart, or if it exited with code 0) or gone
- * for good (if it is configured not to restart and exited with code 1).
+ * configured for automatic restart and exited with code 1) or gone for
+ * good (if it exited with code other than 1 or if it is configured not
+ * to restart).
  */
 BgwHandleStatus
 GetBackgroundWorkerPid(BackgroundWorkerHandle *handle, pid_t *pidp)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 79d1c50..a0331e6 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2849,7 +2849,11 @@ CleanupBackgroundWorker(int pid,
 		if (!EXIT_STATUS_0(exitstatus))
 			rw->rw_crashed_at = GetCurrentTimestamp();
 		else
+		{
+			/* Zero exit status means terminate */
 			rw->rw_crashed_at = 0;
+			rw->rw_terminate = true;
+		}
 
 		/*
 		 * Additionally, for shared-memory-connected workers, just like a
diff --git a/src/include/postmaster/bgworker.h b/src/include/postmaster/bgworker.h
index c9550cc..d3dd1f2 100644
--- a/src/include/postmaster/bgworker.h
+++ b/src/include/postmaster/bgworker.h
@@ -16,10 +16,11 @@
  * that the failure can only be transient (fork failure due to high load,
  * memory pressure, too many processes, etc); more permanent problems, like
  * failure to connect to a database, are detected later in the worker and dealt
- * with just by having the worker exit normally.  A worker which exits with a
- * return code of 0 will be immediately re

Re: [HACKERS] jsonb existence queries are misimplemented by jsonb_ops

2014-05-07 Thread Peter Geoghegan
On Wed, May 7, 2014 at 1:51 PM, Peter Geoghegan  wrote:
> Yes it does. It's just not intended to work like that. You have to
> "give a path" to what you're looking for.


There is exactly one exception explicitly noted as such in the
user-visible docs, which is arrays and the containment of single
elements.
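The containment rule being discussed can be sketched as a small Python model. This is illustrative only, not the actual jsonb implementation, and `jsonb_contains` is a hypothetical name; it captures the two points from this thread: the right-hand value must give the full path from the top level, and the one documented exception is a bare scalar matching a top-level array element.

```python
def jsonb_contains(left, right):
    """Model of jsonb @> containment as described in this thread."""
    if isinstance(right, dict):
        # Every key in the right-hand object must appear at this level
        # of the left-hand object; nested objects are matched by path,
        # not by searching sub-objects.
        return (isinstance(left, dict) and
                all(k in left and jsonb_contains(left[k], v)
                    for k, v in right.items()))
    if isinstance(right, list):
        # Each element of the right-hand array must be contained in
        # some element of the left-hand array.
        return (isinstance(left, list) and
                all(any(jsonb_contains(l, r) for l in left)
                    for r in right))
    # The noted exception: a single scalar is contained in a
    # top-level array that holds it.
    if isinstance(left, list):
        return right in left
    return left == right

# Mirrors Tom's psql session: the full path matches, the sub-object alone
# does not.
jsonb_contains({"foo": {"bar": "baz"}}, {"foo": {"bar": "baz"}})  # True
jsonb_contains({"foo": {"bar": "baz"}}, {"bar": "baz"})           # False
```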

-- 
Peter Geoghegan




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2014-05-07 Thread Andres Freund
On 2014-05-07 13:51:57 -0700, Jeff Janes wrote:
> On Wed, May 7, 2014 at 11:38 AM, Andres Freund wrote:
> 
> > On 2014-05-07 13:32:41 -0500, Merlin Moncure wrote:
> > >
> > > *) raising shared buffers does not 'give more memory to postgres for
> > > caching' -- it can only reduce it via double paging
> >
> > That's absolutely not a necessary consequence. If pages are in s_b for a
> > while the OS will be perfectly happy to throw them away.
> >
> 
> Is that an empirical observation?

Yes.

> I ran some simulations a couple of years
> ago, and also wrote some instrumentation to test that theory under
> favorably engineered (but still plausible) conditions, and couldn't get
> more than a small fraction of s_b to be so tightly bound in that the kernel
> could forget about them.  Unless of course the entire workload, or close to
> it, fits in s_b.

I think it depends on your IO access patterns. If the whole working set
fits into the kernel's page cache and there's no other demand for pages,
it will stay in. If you constantly rewrite most of your pages, they'll
also stay in the OS cache, because they'll get written out. If the churn
in shared_buffers is so high (because it's so small in comparison to the
hot core data set) that there are dozens if not hundreds of clock sweeps
a second, you'll also have no locality.
It's also *hugely* kernel version specific :(

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



