date:20140505

Re: [HACKERS] Fix initdb for path with whitespace and at char

2014-05-05 Thread Amit Kapila

On Mon, May 5, 2014 at 6:38 PM, Heikki Linnakangas
 wrote:
> On 05/01/2014 07:55 AM, Amit Kapila wrote:
>> 4. Similar to Andrew, I also could not reproduce this problem on my
>> Windows system (Windows 7 64 bit)
>> e:\>"e:\PostgreSQL\master\install 1\ins@1\bin\initdb.exe" -D "e:
>> \PostgreSQL\master\Data 1"
>> e:\>"e:\PostgreSQL\master\install 1\ins@1\bin\pg_ctl.exe" start -D "e:
>> \PostgreSQL\master\Data 1"
>>
>> Above commands work fine.
>
>
> Hmm, I'm also testing on 64-bit Windows 7, and it failed for me. Note that I
> already committed the narrow fix for initdb - which I also backpatched - to
> master, so to reproduce you'll need to revert that locally (commit
> 503de546).

After reverting the specified commit, I could reproduce the issue and
verified HEAD(it is fixed).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-05-05 Thread Mark Kirkwood


On 06/05/14 16:28, Amit Kapila wrote:

On Mon, May 5, 2014 at 11:57 AM, Mark Kirkwood
 wrote:

On 05/05/14 15:22, Amit Kapila wrote:
Right, but have a look at the 1st message in this thread - the current
behavior (and to a large extent the above condition) means that setting
cost limits per table does not work in any remotely intuitive way.

ITSM that the whole purpose of a per table setting in this context is to
override the behavior of auto vacuum throttling - and currently this does
not happen unless you get real brutal (i.e setting the cost delay to zero in
addition to setting cost limit...making the whole cost limit a bit
pointless).

I think meaning of per table setting is just that it overrides the default
value of autovacuum_vacuum_cost_limit for that table and the rest of
calculation or concept remains same.  This is what currently code does
and the same is mentioned in docs as far as I can understand.

As per current behaviour the per-table cost_limit is also adjusted based
on the setting of GUC autovacuum_vacuum_cost_limit and right now it
follows a simple principle that the total cost limit for all workers should be
equal to autovacuum_vacuum_cost_limit.  Even code has below comment:

/*
* Adjust cost limit of each active worker to balance the total of cost
* limit to autovacuum_vacuum_cost_limit.
*/

Now If you want to change for the case where user specifies value per
table which is more than autovacuum_vacuum_cost_limit or otherwise,
then I think the new definition should be bit more clear and it is better
not to impact current calculation for default values.

I could think of 2 ways to change this:

a. if user has specified cost_limit value for table, then it just uses it
 rather than rebalancing based on value of system-wide guc variable
 autovacuum_vacuum_cost_limit
b. another could be to restrict setting per-table value to be lesser than
 system-wide value?

The former is used for auto vacuum parameters like scale_factor and
later is used for parameters like freeze_max_age.

Thoughts?

Alvaro, do you think above options makes sense to solve this problem?


Yes indeed - the code currently working differently from what one would 
expect. However the usual reason for handing knobs to the user for 
individual object is so that special configurations can be applied to 
them. The current method of operation of the per table knobs does not do 
this (not without clubbing 'em on the head)


The (ahem) sensible way that one would expect (perhaps even need) 
autovacuum throttling to work is:


- set sensible defaults for all the usual (well behaved) tables
- set a few really aggressive overrides for a handful of the naughty ones

Runaway free space bloat is one of the things that can really mangle a 
postgres system (I've been called in to rescue a few in my time)... 
there needs to be a way to control those few badly behaved tables ... 
without removing the usefulness of throttling the others.


Regards

Mark



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] elog a stack trace

2014-05-05 Thread MauMau


From: "Jeff Janes" 

Does anyone have any hints on how to get a stack trace programmatically,
rather than trying to run ps to get the pid and then attach gdb to a
ephemeral process and hoping the situation has not been changed while you
are doing that?  I'd like to set

log_error_verbosity = stack

or

elog_stack(,...)

But those don't exist.


On Windows, you can use Stackwalk() or Stackwalk64() Win32 API.  Several 
years ago, for some software, I implemented a feature that outputs the stack 
trace of a crashing process to its server log file.


It would be very nice if PostgreSQL outputs the stack trace of a crashing 
postgres process in the server log.  That eliminates the need for users in 
many cases to send the huge core files to the support staff or to use the 
debugger to get the stack trace.  I've recently heard that MySQL has this 
feature.


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-05-05 Thread Amit Kapila

On Mon, May 5, 2014 at 11:57 AM, Mark Kirkwood
 wrote:
> On 05/05/14 15:22, Amit Kapila wrote:
> Right, but have a look at the 1st message in this thread - the current
> behavior (and to a large extent the above condition) means that setting
> cost limits per table does not work in any remotely intuitive way.
>
> ITSM that the whole purpose of a per table setting in this context is to
> override the behavior of auto vacuum throttling - and currently this does
> not happen unless you get real brutal (i.e setting the cost delay to zero in
> addition to setting cost limit...making the whole cost limit a bit
> pointless).

I think meaning of per table setting is just that it overrides the default
value of autovacuum_vacuum_cost_limit for that table and the rest of
calculation or concept remains same.  This is what currently code does
and the same is mentioned in docs as far as I can understand.

As per current behaviour the per-table cost_limit is also adjusted based
on the setting of GUC autovacuum_vacuum_cost_limit and right now it
follows a simple principle that the total cost limit for all workers should be
equal to autovacuum_vacuum_cost_limit.  Even code has below comment:

/*
* Adjust cost limit of each active worker to balance the total of cost
* limit to autovacuum_vacuum_cost_limit.
*/

Now If you want to change for the case where user specifies value per
table which is more than autovacuum_vacuum_cost_limit or otherwise,
then I think the new definition should be bit more clear and it is better
not to impact current calculation for default values.

I could think of 2 ways to change this:

a. if user has specified cost_limit value for table, then it just uses it
rather than rebalancing based on value of system-wide guc variable
autovacuum_vacuum_cost_limit
b. another could be to restrict setting per-table value to be lesser than
system-wide value?

The former is used for auto vacuum parameters like scale_factor and
later is used for parameters like freeze_max_age.

Thoughts?

Alvaro, do you think above options makes sense to solve this problem?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Docs for 9.4's worker_spi?

2014-05-05 Thread Alvaro Herrera

Robert Haas wrote:
> On Mon, May 5, 2014 at 1:30 PM, Bruce Momjian  wrote:
> > Seems there is no documentation for the 9.4 worker_spi contrib module.  Is
> > this OK?  The comment at the top of the C file says:
> >
> >  *  Sample background worker code that demonstrates various coding
> >  *  patterns: establishing a database connection; starting and 
> > committing
> >  *  transactions; using GUC variables, and heeding SIGHUP to reread
> >  *  the configuration file; reporting to pg_stat_activity; using the
> >  *  process latch to sleep and exit in case of postmaster death.
> 
> worker_spi is not new in 9.4; it was added in 9.3 by Alvaro in commit
> da07a1e856511dca59cbb1357616e26baa64428e.  I agree that it should be
> documented like all of the other contrib modules,

Uh.  It's a code sample.  I didn't ever consider documenting it was a
necessity, or I would have done it.  I don't think it needs any
documentation now, either (in Docbook form I mean.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Docs for 9.4's worker_spi?

2014-05-05 Thread Robert Haas

On Mon, May 5, 2014 at 1:30 PM, Bruce Momjian  wrote:
> Seems there is no documentation for the 9.4 worker_spi contrib module.  Is
> this OK?  The comment at the top of the C file says:
>
>  *  Sample background worker code that demonstrates various coding
>  *  patterns: establishing a database connection; starting and committing
>  *  transactions; using GUC variables, and heeding SIGHUP to reread
>  *  the configuration file; reporting to pg_stat_activity; using the
>  *  process latch to sleep and exit in case of postmaster death.

worker_spi is not new in 9.4; it was added in 9.3 by Alvaro in commit
da07a1e856511dca59cbb1357616e26baa64428e.  I agree that it should be
documented like all of the other contrib modules, and maybe I need to
be involved in that since I did make some changes in 9.4.

I'm actually pretty uncertain that worker_spi is a particularly good
example of how to write a useful background worker.  The initial
version was awfully buggy and incomplete and I'm not all that sure
we've fixed all of those problems.  Aside from that, now that we have
dynamic background workers and dynamic shared memory, I suspect it's
possible to write something a lot more useful than what we've got here
now, and I think maybe we should do that and then get rid of this
completely.  It's too late to do that for 9.4, but maybe I can pull
something together for 9.5.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_shmem_allocations view

2014-05-05 Thread Robert Haas

On Mon, May 5, 2014 at 6:54 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, May 5, 2014 at 3:09 PM, Tom Lane  wrote:
>>> And the controlled shared segment is likely to be how big exactly?  It's
>>> probably not even possible for it to be smaller than a page size, 4K or
>>> so depending on the OS.  I agree with Andres that a name would be a good
>>> idea; complaining about the space needed to hold it is penny-wise and
>>> pound-foolish.
>> ...
>> Now, all that having been said, I recognize that human-readable names
>> are a generally useful thing, so I'm not going to hold my breath until
>> I turn blue if other people really want this, and it may turn out to
>> be useful someday.  But if anyone is curious whether I'm *confident*
>> that it will be useful someday: at this point, no.
>
> I'm not confident that it'll be useful either.  But I am confident that
> if we don't put it in now, and decide we want it later, there will be
> complaints when we change the API.  Better to have an ignored parameter
> than no parameter.

I'm generally skeptical of that philosophy.  If we put in an ignored
parameter, people may pass pointers to NULL or to garbage or to an
overly-long string, and they won't know it's broken until we make it
do something; at which point their code will begin to fail without
warning.  Speaking as an employee of a company that maintains several
PostgreSQL extensions that sometimes need to be updated for newer
server versions, I'd rather have a clean API break that makes the
build fail than a "soft" break that supposedly lets things continue
working but maybe breaks them in subtler ways.  Another problem with
this idea is that we might never get around to making it do anything,
and then the dead parameter is just a stupid and unnecessary wart.

If we're going to do anything at all here for 9.4, I recommend
ignoring the fact we're in feature freeze and going whole hog: add the
name, add the monitoring view, and add the monitoring view for the
main shared memory segment just for good measure.  That way, if we get
the design wrong or something, we have a chance of getting some
feedback.  If we're not going to do that, then I vote for doing
nothing and considering later whether to break it for 9.5, by which
time we may have some evidence as to whether and how this code is
really being used.  Anyone who expects PostgreSQL's C API to be
completely stable is going to be regularly disappointed, as most
recently demonstrated by the Enormous Header Churn of the 9.3 era.  I
don't particularly mind being the cause of further disappointment; as
long as the breakage is obvious rather than subtle, the fix usually
takes about 10 minutes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] `pg_dump -Fd` doesn't check write return status...

2014-05-05 Thread Bruce Momjian

On Tue, Apr 22, 2014 at 03:19:15PM -0400, Bruce Momjian wrote:
> On Sat, Mar 1, 2014 at 12:27:19PM -0800, Sean Chittenden wrote:
> > The attached patch fixes the case when `pg_dump -Fd …` is called
> > on a partition where write(2) fails for some reason or another. In
> > this case, backup jobs were returning with a successful exit code even
> > though most of the files in the dump directory were all zero length.
> >
> > I haven’t tested this patch’s failure conditions but the fix seems
> > simple enough: cfwrite() needs to have its return status checked
> > everywhere and exit_horribly() upon any failure. In this case,
> > callers of _WriteData() were not checking the return status and were
> > discarding the negative return status (e.g. ENOSPC).
> >
> > I made a cursory pass over the code and found one other instance where
> > write status wasn’t being checked and also included that.
> 
> As is often the case with pg_dump, the problems you saw are a small part
> of a larger set of problems in that code --- there is general ignoring of
> read and write errors.
> 
> I have developed a comprehensive patch that addresses all the issues I
> could find.  The use of function pointers and the calling of functions
> directly and through function pointers makes the fix quite complex.  I
> ended up placing checks at the lowest level and removing checks at
> higher layers where they were sporadically placed.  
> 
> Patch attached.  I have tested dump/restore of all four dump output
> formats with the 9.3 regression database and all tests passed.  I
> believe this patch is too complex to backpatch, and I don't know how it
> could be fixed more simply.

Patch applied.  Thanks for the report.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian

On Mon, May  5, 2014 at 05:22:59PM -0700, Peter Geoghegan wrote:
> On Mon, May 5, 2014 at 5:13 PM, Bruce Momjian  wrote:
> > OK, I understand now.  I also split the item into two separate ones so I
> > could highlight things.  Please see the new ouput --- I ended up
> > creating a pg_stat_statements section because there are now three items:
> >
> > http://momjian.us/pgsql_docs/release-9-4.html
> 
> 
> I agree with the need for a separate section.
> 
> You wrote:
> 
> "This allows programs to reuse the query text already retrieved by
> referencing queryid."
> 
> That's perhaps a little misleading, since queryid should virtually
> always match the original normalized query text (so we only have to
> get a new query text when there is a new queryid). I probably would
> have phrased it:
> 
> "This allows monitoring tools to only fetch query texts for newly
> observe entries, as determined by queryid"

OK, done.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Peter Geoghegan

On Mon, May 5, 2014 at 5:13 PM, Bruce Momjian  wrote:
> OK, I understand now.  I also split the item into two separate ones so I
> could highlight things.  Please see the new ouput --- I ended up
> creating a pg_stat_statements section because there are now three items:
>
> http://momjian.us/pgsql_docs/release-9-4.html


I agree with the need for a separate section.

You wrote:

"This allows programs to reuse the query text already retrieved by
referencing queryid."

That's perhaps a little misleading, since queryid should virtually
always match the original normalized query text (so we only have to
get a new query text when there is a new queryid). I probably would
have phrased it:

"This allows monitoring tools to only fetch query texts for newly
observe entries, as determined by queryid"

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian

On Mon, May  5, 2014 at 04:20:26PM -0700, Peter Geoghegan wrote:
> On Mon, May 5, 2014 at 4:14 PM, Bruce Momjian  wrote:
> > We rarely get into specific numers like this.  It says "higher limit(s)" and
> > hopefully that is enough.  If you want to create a documentation 'id' I
> > can like to that for the "higher limits" text.
> 
> You don't have to mention a number. Just the fact that there is now
> *no* limit on stored query text length, which is the point. It's also
> nice that the per-entry shared memory overhead is much lower, which as
> I said I think warrants a mention in passing.

OK, I understand now.  I also split the item into two separate ones so I
could highlight things.  Please see the new ouput --- I ended up
creating a pg_stat_statements section because there are now three items:

http://momjian.us/pgsql_docs/release-9-4.html

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Andrew Dunstan



On 05/05/2014 07:34 PM, Peter Geoghegan wrote:

On Mon, May 5, 2014 at 4:26 PM, Andrew Dunstan  wrote:

After all, everything that's not a number or boolean is typed as text (just
as it is in JSON). We don't, for example, map anything to timestamp types.

JSON doesn't have a timestamp primitive type. Of those types that it
has, their internal representation, and their behavior in all relevant
contexts is more or less consistent with what you'd expect of the
mapped-to type. I think that's a very significant point - you will be
able to extract numerics, and manipulate them as numerics in a future
release without using text casting hacks. null values are not typed as
text either. Besides, the on-disk representation of numeric is quite a
lot more compact, and this could easily matter.





OK, but if we must talk about it then at least we should do so with 
precision and accuracy. The current wording is too sloppy.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Peter Geoghegan

On Mon, May 5, 2014 at 4:26 PM, Andrew Dunstan  wrote:
> After all, everything that's not a number or boolean is typed as text (just
> as it is in JSON). We don't, for example, map anything to timestamp types.

JSON doesn't have a timestamp primitive type. Of those types that it
has, their internal representation, and their behavior in all relevant
contexts is more or less consistent with what you'd expect of the
mapped-to type. I think that's a very significant point - you will be
able to extract numerics, and manipulate them as numerics in a future
release without using text casting hacks. null values are not typed as
text either. Besides, the on-disk representation of numeric is quite a
lot more compact, and this could easily matter.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Andrew Dunstan



On 05/05/2014 07:16 PM, Bruce Momjian wrote:

On Mon, May  5, 2014 at 10:58:57AM -0700, Peter Geoghegan wrote:

On Mon, May 5, 2014 at 8:28 AM, Andrew Dunstan  wrote:

How about:

This data type allows for faster access to values in the json document
and faster and more useful indexing of json.

We should refer to the fact that jsonb is internally typed. This isn't
all that obvious now, but it is evident for example when you sort a
set of raw scalar numeric jsonb values, which has a sane ordering (the
implementation invokes numeric_cmp()). You also get an internal,
per-number-scalar display scale, just like the numeric type proper.
I'm not all that sure about how to go about succinctly expressing
this, but clearly it's important.

Current text is:

Add structured (non-text) data type (JSONB) for storing JSON data (Oleg
Bartunov, Teodor Sigaev, Alexander Korotkov, Peter Geoghegan, and Andrew
Dunstan)

This allows for faster access to values in the JSON document and faster
and more useful indexing of JSON.  JSONB values are also typed as
appropriate scalar SQL types.

Is that OK?




No. If you must say something then start the last sentence with "Scalar 
values in JSONB documents are typed ...". Personally, I think we're 
making too much of this. After all, everything that's not a number or 
boolean is typed as text (just as it is in JSON). We don't, for example, 
map anything to timestamp types.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Peter Geoghegan

On Mon, May 5, 2014 at 4:16 PM, Bruce Momjian  wrote:
> This allows for faster access to values in the JSON document and 
> faster
> and more useful indexing of JSON.  JSONB values are also typed as
> appropriate scalar SQL types.
>
> Is that OK?

That seems reasonable.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Peter Geoghegan

On Mon, May 5, 2014 at 4:14 PM, Bruce Momjian  wrote:
> We rarely get into specific numers like this.  It says "higher limit(s)" and
> hopefully that is enough.  If you want to create a documentation 'id' I
> can like to that for the "higher limits" text.

You don't have to mention a number. Just the fact that there is now
*no* limit on stored query text length, which is the point. It's also
nice that the per-entry shared memory overhead is much lower, which as
I said I think warrants a mention in passing.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian

On Mon, May  5, 2014 at 10:58:57AM -0700, Peter Geoghegan wrote:
> On Mon, May 5, 2014 at 8:28 AM, Andrew Dunstan  wrote:
> > How about:
> >
> >This data type allows for faster access to values in the json document
> > and faster and more useful indexing of json.
> 
> We should refer to the fact that jsonb is internally typed. This isn't
> all that obvious now, but it is evident for example when you sort a
> set of raw scalar numeric jsonb values, which has a sane ordering (the
> implementation invokes numeric_cmp()). You also get an internal,
> per-number-scalar display scale, just like the numeric type proper.
> I'm not all that sure about how to go about succinctly expressing
> this, but clearly it's important.

Current text is:

Add structured (non-text) data type (JSONB) for storing JSON data (Oleg
Bartunov, Teodor Sigaev, Alexander Korotkov, Peter Geoghegan, and Andrew
Dunstan)

This allows for faster access to values in the JSON document and faster
and more useful indexing of JSON.  JSONB values are also typed as
appropriate scalar SQL types.

Is that OK?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian

On Mon, May  5, 2014 at 02:23:13PM -0700, Peter Geoghegan wrote:
> On Sun, May 4, 2014 at 6:49 AM, Andres Freund  wrote:
> > +  
> > +   
> > +Have pg_stat_statements use a flat file for query text storage, 
> > allowing higher limits (Peter Geoghegan)
> > +   
> > +
> > +   
> > +Also add the ability to retrieve all pg_stat_statements 
> > information except the query text.  This allows programs to reuse the query
> > +text already retrieved by referencing queryid.
> > +   
> > +  
> >
> > This isn't an optional thing, is it?
> 
> This is intended to be used by time-series monitoring tools that
> aggregate and graph pg_stat_statements data temporally. They usually
> won't need query texts, and so can only retrieve them lazily. The
> pg_stat_statements view presents exactly the same interface for ad-hoc
> querying, though.
> 
> The point of the first item is that there is no longer *any*
> limitation on the size of stored query texts. They are no longer
> truncated to track_activity_query_size bytes. The shared memory
> overhead is also decreased substantially, allowing us to increase the
> default pg_stat_statements.max setting from 1,000 to 5,000, while
> still reducing the overall shared memory overhead (assuming a default
> track_activity_query_size). I think that the removal of the
> limitation, and the substantial lowering of the per-entry footprint
> should both be explicitly noted.

We rarely get into specific numers like this.  It says "higher limit(s)" and
hopefully that is enough.  If you want to create a documentation 'id' I
can like to that for the "higher limits" text.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Thomas Munro

On 5 May 2014 17:22, Andres Freund  wrote:

> On 2014-05-05 13:07:48 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > On 2014-05-05 10:07:46 -0400, Tom Lane wrote:
> > >> Also, -1 for adding another log_line_prefix escape.  If you're routing
> > >> multiple clusters logging to the same place (which is already a bit
> > >> unlikely IMO), you can put distinguishing strings in log_line_prefix
> > >> already.  And it's not like we've got an infinite supply of letters
> > >> for those escapes.
> >
> > > Using syslog and including the same config file from multiple clusters
> > > isn't that uncommon. But I can live without it.
> >
> > So, if you are sharing a config file, how is it that you can set a
> > per-cluster cluster_name but not a per-cluster log_line_prefix?
>
> Well, it's a included file. With log_line_prefix support just
> cluster_name has to be set per cluster. Without you need string
> interpolation in the config management to include cluster_name in
> log_line_prefix.
>

Attached as cluster-name-in-ps-v3-a.patch is a new version with
the following changes:

* the brackets removed, as suggested by Tom Lane

* static initialization of cluster_name to avoid any possibility
  of an uninitialized/null pointer being used before GUC
  initialization, as suggested by Andres Freund

* cluster_name moved to config group CONN_AUTH_SETTINGS, on the
  basis that it's similar to bonjour_name, but it isn't
  really... open to suggestions for a better config_group!

* a small amount of documentation in config.sgml (with
  cluster_name under Connection Settings, which probably
  isn't right either)

* sample conf file updated to show cluster_name and %C in
  log_line_prefix

A shorter version without the log_line_prefix support that Tom didn't
like is attached as cluster-name-in-ps-v3-b.patch.  I will try to add these
to the open commitfest, and see if there is something I can usefully
review in return.

I verified that SHOW cluster_name works as expected and you can't
change it with SET.

Thanks,

Thomas Munro
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4a33f87..0fd414e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -681,6 +681,24 @@ include 'filename'
   
  
 
+ 
+  cluster_name (string)
+  
+   cluster_name configuration parameter
+  
+  
+   
+Sets the cluster name that appears in the process title for all
+processes in this cluster.  No name is shown if this parameter is set
+to the empty string '' (which is the default).  The
+process title is typically viewed by the ps command, or in
+Windows by using the Process Explorer.  The cluster
+name can also be included in the log_line_prefix.
+This parameter can only be set at server start.
+   
+  
+ 
+
  
   tcp_keepalives_idle (integer)
   
@@ -4212,6 +4230,11 @@ local0.*/var/log/postgresql
 

 
+ %C
+ Cluster name
+ no
+
+
  %a
  Application name
  yes
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 977fc66..0adfee7 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2345,6 +2345,12 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 else
 	appendStringInfo(buf, "%lx.%x", (long) (MyStartTime), MyProcPid);
 break;
+			case 'C':
+if (padding != 0)
+	appendStringInfo(buf, "%*s", padding, cluster_name);
+else
+	appendStringInfoString(buf, cluster_name);
+break;
 			case 'p':
 if (padding != 0)
 	appendStringInfo(buf, "%*d", padding, MyProcPid);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 15020c4..71897b8 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -449,6 +449,7 @@ int			temp_file_limit = -1;
 
 int			num_temp_buffers = 1024;
 
+const char *cluster_name = "";
 char	   *data_directory;
 char	   *ConfigFileName;
 char	   *HbaFileName;
@@ -3091,6 +3092,17 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
+		{"cluster_name", PGC_POSTMASTER, CONN_AUTH_SETTINGS,
+			gettext_noop("Sets the name of the cluster that appears in 'ps' output."),
+			NULL,
+			GUC_IS_NAME
+		},
+		&cluster_name,
+		"",
+		NULL, NULL, NULL
+	},
+
+	{
 		{"data_directory", PGC_POSTMASTER, FILE_LOCATIONS,
 			gettext_noop("Sets the server's data directory."),
 			NULL,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 70e5a51..8ad292a 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -74,6 +74,8 @@
 	# (change requires restart)
 #bonjour_name = ''			# defaults to the computer name
 	# (change requires restart)
+#cluster_name = ''			# defaults to the computer name
+

Re: [HACKERS] pg_shmem_allocations view

2014-05-05 Thread Tom Lane

Robert Haas  writes:
> On Mon, May 5, 2014 at 3:09 PM, Tom Lane  wrote:
>> And the controlled shared segment is likely to be how big exactly?  It's
>> probably not even possible for it to be smaller than a page size, 4K or
>> so depending on the OS.  I agree with Andres that a name would be a good
>> idea; complaining about the space needed to hold it is penny-wise and
>> pound-foolish.
> ...
> Now, all that having been said, I recognize that human-readable names
> are a generally useful thing, so I'm not going to hold my breath until
> I turn blue if other people really want this, and it may turn out to
> be useful someday.  But if anyone is curious whether I'm *confident*
> that it will be useful someday: at this point, no.

I'm not confident that it'll be useful either.  But I am confident that
if we don't put it in now, and decide we want it later, there will be
complaints when we change the API.  Better to have an ignored parameter
than no parameter.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recursive ReceiveSharedInvalidMessages not safe

2014-05-05 Thread Tom Lane

Andres Freund  writes:
> On 2014-05-05 15:41:22 -0400, Tom Lane wrote:
>> Looks all right to me.  Yeah, the right shift might have undefined
>> high-order bits, but we don't care because we're storing the result
>> into an int16.

> Doesn't at the very least
>   rnode.backend = (msg->sm.backend_hi << 16) | (int) 
> msg->sm.backend_lo;
> have to be
>   rnode.backend = ((int) msg->sm.backend_hi << 16) | (int) 
> msg->sm.backend_lo;

A promotion to int (or wider) is implicit in any arithmetic operation,
last I checked the C standard.  The "(int)" on the other side isn't
necessary either.

We might actually be better off casting both sides to unsigned int,
just to enforce that the left shifting is done with unsigned semantics.

>>> c) The ProcessMessageList() calls access msg->rc.id to test for the type
>>> of the existing message. That's not nice.

>> Huh?

> The checks should access msg->id, not msg->rc.id...

Meh.  If they're not the same thing, we've got big trouble anyway.
But if you want to change it as a cosmetic thing, no objection.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] 9.4 wrap around issues

2014-05-05 Thread Jeff Janes

While stress testing the crash-recovery system, I keep running into
wraparound shutdowns that I think should not be occurring.  I go out of my
way give autovac a chance to complete every now and then, more often than
it should need to in order to keep up with the xid usage.

I think the problem traces down to the fact that updating
pg_database.datfrozenxid is WAL logged, but
updating ShmemVariableCache->oldestXid is not.

So a crash at just the right time means that the databases no longer think
they need to be vacuumed for wrap around, but the system as a whole thinks
that they do.

Should crash recovery end with the system reading the recovered pg_database
and recomputing ShmemVariableCache->oldestXid ?

I don't know that any non-pathological case could trigger this in
production.  But 2^32 is not getting any larger, while maximum common
database throughput and size is.

I bisect this down to f9b50b7c18c8ce7de1fee59409fe2 "Fix removal of files
in pgstats directories".  I think that before that commit, the leftover
stats file was somehow tricking the system into vacuuming the databases
more aggressively following a crash.

Cheers,

Jeff

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Peter Geoghegan

On Sun, May 4, 2014 at 6:49 AM, Andres Freund  wrote:
> +  
> +   
> +Have pg_stat_statements use a flat file for query text storage, 
> allowing higher limits (Peter Geoghegan)
> +   
> +
> +   
> +Also add the ability to retrieve all pg_stat_statements information 
> except the query text.  This allows programs to reuse the query
> +text already retrieved by referencing queryid.
> +   
> +  
>
> This isn't an optional thing, is it?

This is intended to be used by time-series monitoring tools that
aggregate and graph pg_stat_statements data temporally. They usually
won't need query texts, and so can only retrieve them lazily. The
pg_stat_statements view presents exactly the same interface for ad-hoc
querying, though.

The point of the first item is that there is no longer *any*
limitation on the size of stored query texts. They are no longer
truncated to track_activity_query_size bytes. The shared memory
overhead is also decreased substantially, allowing us to increase the
default pg_stat_statements.max setting from 1,000 to 5,000, while
still reducing the overall shared memory overhead (assuming a default
track_activity_query_size). I think that the removal of the
limitation, and the substantial lowering of the per-entry footprint
should both be explicitly noted.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_shmem_allocations view

2014-05-05 Thread Robert Haas

On Mon, May 5, 2014 at 3:09 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Sun, May 4, 2014 at 7:50 AM, Andres Freund  wrote:
>>> Thinking about this, I think it was a mistake to not add a 'name' field
>>> to dynamic shared memory's dsm_control_item.
>
>> Well, right now a dsm_control_item is 8 bytes.  If we add a name field
>> of our usual 64 bytes, they'll each be 9 times bigger.
>
> And the controlled shared segment is likely to be how big exactly?  It's
> probably not even possible for it to be smaller than a page size, 4K or
> so depending on the OS.  I agree with Andres that a name would be a good
> idea; complaining about the space needed to hold it is penny-wise and
> pound-foolish.

The control segment is sized to support a number of dynamic shared
memory segments not exceeding 64 + 2 *MaxBackends.  With default
settings, that currently works out to 288 segments, or 2306 bytes.
So, adding a 64-byte name to each of those structures would increase
the size from 2k to about 20k.

So, sure, that's not a lot of memory.  But I'm still not convinced
that's it's very useful.  What I think is going to happen is that (1)
most people won't be used dynamic shared memory at all, so they won't
have any use for this; (2) those people who do run an extension that
uses dynamic shared memory will most likely only be running one such
extension, so they won't need a name to know what the segments are
being used for; and (3) if and when we eventually get parallel query,
dynamic shared memory segments will be widely used, but a bunch of
segments that are all named "parallel_query" or "parallel_query.$PID"
isn't going to be too informative.

Now, all that having been said, I recognize that human-readable names
are a generally useful thing, so I'm not going to hold my breath until
I turn blue if other people really want this, and it may turn out to
be useful someday.  But if anyone is curious whether I'm *confident*
that it will be useful someday: at this point, no.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] TABLESPACE and directory for Foreign tables?

2014-05-05 Thread Josh Berkus

On 05/05/2014 11:53 AM, Tom Lane wrote:
> Sure.  They should all keep them outside $PGDATA, making it not-our-
> problem.  When and if we're prepared to consider it our problem, we
> will be sure to advise people.

OK.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian

On Mon, May  5, 2014 at 10:40:29AM -0700, Josh Berkus wrote:
> Major enhancement list:
> 
> Per discussion on the -advocacy list, the features we've picked out as
> "major enhancements" for advocacy reasons this round are:
> 
> * Materialized Views (because with refresh concurrently, MatViews are
> now useful, which they weren't, very, in 9.3)
> 
> * Logical Decoding/Changeset Extraction which we're calling "Data Change
> Streaming API" in the advocacy stuff.  Not sure if you want to use that
> name in the technical realease notes.
> 
> * Dynamic Background Workers (this one is a major feature, but won't be
> front-listed in advocacy doc because it's hard to explain and nobody's
> built anything cool with it yet.  But it should go under major
> enhancements in the release notes)
> 
> * JSONB and related features (such as indexing)
> 
> * ALTER SYSTEM SET
> 
> Lemme know if you need description text for any of the above.

OK, I added doc links and this list of major features.  You can see the
results here:

http://momjian.us/pgsql_docs/release-9-4.html

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recursive ReceiveSharedInvalidMessages not safe

2014-05-05 Thread Andres Freund

On 2014-05-05 15:41:22 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > a) SICleanupQueue() sometimes releases and reacquires the write lock
> >held on the outside. That's pretty damn fragile, not to mention
> >ugly. Even slight reformulations of the code in SIInsertDataEntries()
> >can break this... Can we please extend the comment in the latter that
> >to mention the lock dropping explicitly?
> 
> Want to write a better comment?

Will do.

> > b) we right/left shift -1 in a signed int by 16 in
> >CacheInvalidateSmgr/LocalExecuteInvalidationMessage(). IIRC that's
> >implementation defined behaviour.
> 
> Looks all right to me.  Yeah, the right shift might have undefined
> high-order bits, but we don't care because we're storing the result
> into an int16.

Doesn't at the very least
rnode.backend = (msg->sm.backend_hi << 16) | (int) 
msg->sm.backend_lo;
have to be
rnode.backend = ((int) msg->sm.backend_hi << 16) | (int) 
msg->sm.backend_lo;

> > c) The ProcessMessageList() calls access msg->rc.id to test for the type
> >of the existing message. That's not nice.
> 
> Huh?

The checks should access msg->id, not msg->rc.id...

> > After far, far too much confused head scratching, code reading, random
> > elog()s et al I found out that this is just because of a deficiency in
> > valgrind's undefinedness tracking. [...]
> > Unfortunately this cannot precisely be caught by valgrind's
> > suppressions. Thus I'd like to add memset(SharedInvalidationMessage msg,
> > 0) in AddCatcacheInvalidationMessage() et al. to suppress these
> > warnings. Imo we can just add them unconditionally, but if somebody else
> > prefers we can add #ifdef USE_VALGRIND around them.
> 
> I'd be okay with USE_VALGRIND.  I'm not particularly hot on adding a
> memset for everybody just to make valgrind less confused.  Especially
> since that's really going to hide any problems, not fix them.

I can't see it having any measureable overhead. Even old compilers will
optimize the memset() away and just initialize the padding... But
anyway, USE_VALGRIND is fine with me.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Issue with GRANT/COMMENT ON FUNCTION with default

2014-05-05 Thread Alvaro Herrera

Jim Nasby wrote:
> Prior to default parameters on functions, GRANT and COMMENT accepted full 
> parameter syntax. IE:
> 
> GRANT EXECUTE ON test(t text) TO public
> 
> as opposed to regprocedure, which only accepts the data types ( test(text), 
> not test(t text) ).
> 
> They do not accept DEFAULT though:
> 
> GRANT EXECUTE ON FUNCTION test(t text DEFAULT '') to public;
> ERROR:  syntax error at or near "DEFAULT"
> LINE 1: GRANT EXECUTE ON FUNCTION test(t text DEFAULT '') to public;
> 
> Presumably this is just an oversight?

I have to say that accepting DEFAULT there seems pretty odd to me.  What
if you specify the wrong default?  Do you get a "no such function"
error?  That would be pretty unhelpful.  But then accepting it ignoring
the fact that the default is wrong would be rather strange.

> Related to that, is it intentional that the regprocedure cast
> disallows *any* decorators to the function, other than type? If
> regprocedure at least accepted the full function parameter definition
> you could use it to get a definitive reference to a function.

Does pg_identify_object() give you what you want?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Andrew Dunstan



On 05/05/2014 02:33 PM, Josh Berkus wrote:

On 05/05/2014 11:31 AM, Bruce Momjian wrote:

JSONB values are also mapped to SQL scalar data types, rather
than being treated always as strings.

+ ... allowing for correct sorting of JSON according to internal datums.




The problem is that at least in one sense that's not what we're doing. 
The canonical ordering of object keys is not at all standard lexical 
ordering.


I really don't think this is Release Notes material. The fact that jsonb 
maps scalar values to internal postgres types is an implementation artefact.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Peter Geoghegan

On Mon, May 5, 2014 at 1:13 PM, Andrew Dunstan  wrote:
> The fact that jsonb maps scalar values to internal postgres types is an
> implementation artefact.

I wouldn't go that far, but I take your point. The fact that the
primitive/scalar JSON types are in a very real sense strongly typed is
a very important detail, though. The fact that what became jsonb is
like a nested, typed hstore has always been prominently emphasized,
for good reasons.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Issue with GRANT/COMMENT ON FUNCTION with default

2014-05-05 Thread Jim Nasby


Prior to default parameters on functions, GRANT and COMMENT accepted full 
parameter syntax. IE:

GRANT EXECUTE ON test(t text) TO public

as opposed to regprocedure, which only accepts the data types ( test(text), not 
test(t text) ).

They do not accept DEFAULT though:

GRANT EXECUTE ON FUNCTION test(t text DEFAULT '') to public;
ERROR:  syntax error at or near "DEFAULT"
LINE 1: GRANT EXECUTE ON FUNCTION test(t text DEFAULT '') to public;

Presumably this is just an oversight? Related to that, is it intentional that 
the regprocedure cast disallows *any* decorators to the function, other than 
type? If regprocedure at least accepted the full function parameter definition 
you could use it to get a definitive reference to a function.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] doPickSplit stack buffer overflow in XLogInsert?

2014-05-05 Thread Andres Freund

Hi,

We really should fix this one of these days.

On 2014-03-26 18:45:54 -0700, Peter Geoghegan wrote:
> Attached patch silences the "Invalid read of size n" complaints of
> Valgrind. I agree with your general thoughts around backpatching. Note
> that the patch addresses a distinct complaint from Kevin's, as
> Valgrind doesn't take issue with the invalid reads past the end of
> spgxlogPickSplit variables on the stack.

I don't think that's entirely sufficient. The local spgxlogPickSplit
xlrec;/spgxlogMoveLeafs xlrec; variables are also inserted while
MAXLIGNing their size. That's slightly harder to fix :(. I don't have a
better idea than also allocating them dynamically :(

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recursive ReceiveSharedInvalidMessages not safe

2014-05-05 Thread Tom Lane

Andres Freund  writes:
> a) SICleanupQueue() sometimes releases and reacquires the write lock
>held on the outside. That's pretty damn fragile, not to mention
>ugly. Even slight reformulations of the code in SIInsertDataEntries()
>can break this... Can we please extend the comment in the latter that
>to mention the lock dropping explicitly?

Want to write a better comment?

> b) we right/left shift -1 in a signed int by 16 in
>CacheInvalidateSmgr/LocalExecuteInvalidationMessage(). IIRC that's
>implementation defined behaviour.

Looks all right to me.  Yeah, the right shift might have undefined
high-order bits, but we don't care because we're storing the result
into an int16.

> c) The ProcessMessageList() calls access msg->rc.id to test for the type
>of the existing message. That's not nice.

Huh?

> After far, far too much confused head scratching, code reading, random
> elog()s et al I found out that this is just because of a deficiency in
> valgrind's undefinedness tracking. [...]
> Unfortunately this cannot precisely be caught by valgrind's
> suppressions. Thus I'd like to add memset(SharedInvalidationMessage msg,
> 0) in AddCatcacheInvalidationMessage() et al. to suppress these
> warnings. Imo we can just add them unconditionally, but if somebody else
> prefers we can add #ifdef USE_VALGRIND around them.

I'd be okay with USE_VALGRIND.  I'm not particularly hot on adding a
memset for everybody just to make valgrind less confused.  Especially
since that's really going to hide any problems, not fix them.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recursive ReceiveSharedInvalidMessages not safe

2014-05-05 Thread Andres Freund

On 2014-05-05 14:15:58 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > While investigating an issue pointed out by valgrind around undefined
> > bytes in inval.c SHAREDINVALSMGR_ID processing I noticed that there's a
> > bug in ReceiveSharedInvalidMessages(). It tries to be safe against
> > recursion but it's not:
> > When it recurses into ReceiveSharedInvalidMessages() from it's main loop
> > from inside the inval callback while nextmsg = nummsgs it'll overwrite
> > the 'messages' array with new contents. But at this point the old
> > content of one entry in the messages array is still passed to
> > the LocalExecuteInvalidationMessage() that caused the recursion.
> 
> Hm, yeah, so if the called inval function continues to use the message
> contents after doing something that could result in a recursive call,
> it might be looking at trashed data.

FWIW, with a bit of changes around it to make it more visible
(malloc/freeing the array) this sometimes triggers during make check. So
it's quite plausible that this is the caused the relfilenodemap
regression error we were a bit confused about.

I started to look at this code because valgrind was bleating at me
inside inval.c and for the heck of I couldn't understand wh. Since then
this lead me into quite a wild goose chase. Leading me to discover a
couple of things I am not perfectly happy about:

a) SICleanupQueue() sometimes releases and reacquires the write lock
   held on the outside. That's pretty damn fragile, not to mention
   ugly. Even slight reformulations of the code in SIInsertDataEntries()
   can break this... Can we please extend the comment in the latter that
   to mention the lock dropping explicitly?
b) we right/left shift -1 in a signed int by 16 in
   CacheInvalidateSmgr/LocalExecuteInvalidationMessage(). IIRC that's
   implementation defined behaviour.
c) The ProcessMessageList() calls access msg->rc.id to test for the type
   of the existing message. That's not nice. The compiler is entirely
   free to e.g. copy the entire struct to local memory causing
   unitialized memory to be accessed. Entirely cosmetic, but ...

d)
The valgrind splats I was very occasionally getting were:
==21013== Conditional jump or move depends on uninitialised value(s)
==21013==at 0x8DD755: hash_search_with_hash_value (dynahash.c:885)
==21013==by 0x8DD60F: hash_search (dynahash.c:811)
==21013==by 0x799A58: smgrclosenode (smgr.c:357)
==21013==by 0x8B6ABF: LocalExecuteInvalidationMessage (inval.c:566)
==21013==by 0x77BBCF: ReceiveSharedInvalidMessages (sinval.c:127)
==21013==by 0x8B6C3F: AcceptInvalidationMessages (inval.c:640)
==21013==by 0x77FDCC: LockRelationOid (lmgr.c:107)
==21013==by 0x4A6F33: relation_open (heapam.c:1029)
==21013==by 0x4A71EB: heap_open (heapam.c:1195)
==21013==by 0x8B4228: SearchCatCache (catcache.c:1223)
==21013==by 0x8C61B1: SearchSysCache (syscache.c:909)
==21013==by 0x8C62CD: GetSysCacheOid (syscache.c:987)
==21013==  Uninitialised value was created by a stack allocation
==21013==at 0x8B64C3: AddCatcacheInvalidationMessage (inval.c:328)

After far, far too much confused head scratching, code reading, random
elog()s et al I found out that this is just because of a deficiency in
valgrind's undefinedness tracking. The problem is that it doesn't really
understand shared memory sufficiently. When a backend stored a message
in the SI ringbuffer it sets a bitmask about initialized memory for a
specific position in the ringubffer. If the *same* backend reads from
the same position in the ringbuffer (4096 messages later) into which
another backend has stored a *different* type of message the bitmap of
initialized bytes will possibly be wrong if the padding is distributed
differently.

Unfortunately this cannot precisely be caught by valgrind's
suppressions. Thus I'd like to add memset(SharedInvalidationMessage msg,
0) in AddCatcacheInvalidationMessage() et al. to suppress these
warnings. Imo we can just add them unconditionally, but if somebody else
prefers we can add #ifdef USE_VALGRIND around them.

I can do a), c), d), if others agree but I am not sure about b)?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_shmem_allocations view

2014-05-05 Thread Andres Freund

On 2014-05-05 15:09:02 -0400, Tom Lane wrote:
> > I'm quite in favor of having something like this for the main shared
> > memory segment, but I think that's 9.5 material at this point.
> 
> If you're prepared to break the current APIs later to add a name parameter
> (which would have to be required, if it's to be useful at all), then sure,
> put the question off till 9.5.

I understood Robert to mean that it's too late for my proposed view for
9.4 - and I agree - but I wholeheartedly agree with you that we should
add a name parameter to the dsm API *now*. We can just Assert() that it's
nonzero if we don't think it's useful for now.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_shmem_allocations view

2014-05-05 Thread Andres Freund

On 2014-05-05 15:04:07 -0400, Robert Haas wrote:
> On Sun, May 4, 2014 at 7:50 AM, Andres Freund  wrote:
> > On 2014-05-04 13:44:17 +0200, Andres Freund wrote:
> >> postgres=# SELECT * FROM pg_shmem_allocations ORDER BY size DESC;
> >>  key | off |size | 
> >> allocated
> >> -+-+-+---
> >>  Buffer Blocks   |   286242528 | 17179869184 | t
> >>  Buffer Descriptors  |   152024800 |   134217728 | t
> >> ...
> >>  OldSerXidControlData| 17584357344 |  16 | t
> >> (44 rows)
> >
> > Thinking about this, I think it was a mistake to not add a 'name' field
> > to dynamic shared memory's dsm_control_item. Right now it's very hard to
> > figure out which extension allocated a dsm segment. Imo we should change
> > that before 9.4 is out. I am not suggesting to use it to identify
> > segments, but just as an identifier, passed in into dsm_create().
> >
> > Imo there should be a corresponding pg_dynshmem_allocations to
> > pg_shmem_allocations.
> 
> Well, right now a dsm_control_item is 8 bytes.  If we add a name field
> of our usual 64 bytes, they'll each be 9 times bigger.  We're not
> talking about a lot of bytes in absolute terms, but I guess I'm not in
> favor of an 800% size increase without somewhat more justification
> than you've provided here.  Who is using dynamic shared memory for
> enough different things at this point to get confused?

The kernel side overhead of creating a shared memory segment are so much
higher that this really isn't a meaningful saving. Also, are you really
considering a couple hundred bytes to be a problem?
I think it's quite a sensible thing for an administrator to ask where
all the memory has gone. The more users for dsm there the more important
that'll get. Right now pretty much the only thing a admin could do is to
poke around in /proc to see which backend has mapped the segment and try
to figure out via the logs what caused it to do so. Not nice.

> I'm quite in favor of having something like this for the main shared
> memory segment, but I think that's 9.5 material at this point.

Clearly. For one the version I posted here missed allocations which
aren't done via ShmemInitStruct (lwlock main array and hash table
allocations primarily). For another it's too late ;)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_shmem_allocations view

2014-05-05 Thread Tom Lane

Robert Haas  writes:
> On Sun, May 4, 2014 at 7:50 AM, Andres Freund  wrote:
>> Thinking about this, I think it was a mistake to not add a 'name' field
>> to dynamic shared memory's dsm_control_item.

> Well, right now a dsm_control_item is 8 bytes.  If we add a name field
> of our usual 64 bytes, they'll each be 9 times bigger.

And the controlled shared segment is likely to be how big exactly?  It's
probably not even possible for it to be smaller than a page size, 4K or
so depending on the OS.  I agree with Andres that a name would be a good
idea; complaining about the space needed to hold it is penny-wise and
pound-foolish.

> I'm quite in favor of having something like this for the main shared
> memory segment, but I think that's 9.5 material at this point.

If you're prepared to break the current APIs later to add a name parameter
(which would have to be required, if it's to be useful at all), then sure,
put the question off till 9.5.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_shmem_allocations view

2014-05-05 Thread Robert Haas

On Sun, May 4, 2014 at 7:50 AM, Andres Freund  wrote:
> On 2014-05-04 13:44:17 +0200, Andres Freund wrote:
>> postgres=# SELECT * FROM pg_shmem_allocations ORDER BY size DESC;
>>  key | off |size | allocated
>> -+-+-+---
>>  Buffer Blocks   |   286242528 | 17179869184 | t
>>  Buffer Descriptors  |   152024800 |   134217728 | t
>> ...
>>  OldSerXidControlData| 17584357344 |  16 | t
>> (44 rows)
>
> Thinking about this, I think it was a mistake to not add a 'name' field
> to dynamic shared memory's dsm_control_item. Right now it's very hard to
> figure out which extension allocated a dsm segment. Imo we should change
> that before 9.4 is out. I am not suggesting to use it to identify
> segments, but just as an identifier, passed in into dsm_create().
>
> Imo there should be a corresponding pg_dynshmem_allocations to
> pg_shmem_allocations.

Well, right now a dsm_control_item is 8 bytes.  If we add a name field
of our usual 64 bytes, they'll each be 9 times bigger.  We're not
talking about a lot of bytes in absolute terms, but I guess I'm not in
favor of an 800% size increase without somewhat more justification
than you've provided here.  Who is using dynamic shared memory for
enough different things at this point to get confused?

I'm quite in favor of having something like this for the main shared
memory segment, but I think that's 9.5 material at this point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Peter Geoghegan

On Mon, May 5, 2014 at 11:31 AM, Bruce Momjian  wrote:
> JSONB values are also mapped to SQL scalar data types, rather
> than being treated always as strings.

Something like that. Perhaps you should just go with what the
documentation says: "Primitive JSON types described by RFC 7159 are
effectively internally mapped onto native PostgreSQL types."

The fact that the default B-Tree operator class defines a type-wise
ordering is beside the point. That's just currently the most obvious
way in which this "shadow typing" is evident. At some point we're
going to have to figure out ways to manipulate jsonb using shadow type
specific operators or functions, so you can for example "extract"
actual numeric values easily. I don't want users to assume that those
don't exist right now due to a fundamental limitation of our
implementation.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] TABLESPACE and directory for Foreign tables?

2014-05-05 Thread Tom Lane

Josh Berkus  writes:
> On 05/05/2014 10:53 AM, Tom Lane wrote:
>> A larger and more philosophical point is that such a direction of
>> development could hardly be called a "foreign" data wrapper.  People
>> would expect Postgres to take full responsibility for such files,
>> including data integrity considerations such as fsync-at-checkpoints
>> and WAL support.  Even if we wanted the FDW abstractions to allow
>> for that, we're very far away from it.  And frankly I'd maintain
>> that FDW is the wrong abstraction.

> Certainly pluggable storage would be a better abstraction; but we don't
> have that yet.  In the meantime, we have one FDW which creates files
> *right now*, and we might have more in the future, so I'm trying to
> establish some guidelines as to how such FDWs should behave.

The guideline is simple: don't do that.  We should absolutely not
encourage this until/unless we have infrastructure to support it.
Just because one FDW author thought this would be a cool thing to do
does not make it a cool thing to do, and definitely not a cool thing
to encourage others to emulate.

> Regardless
> of whether or not you think FDWs should be managing files, it's better
> for users if all FDWs which manage files manage them in the same way.

Sure.  They should all keep them outside $PGDATA, making it not-our-
problem.  When and if we're prepared to consider it our problem, we
will be sure to advise people.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] New and interesting replication issues with 9.2.8 sync rep

2014-05-05 Thread Josh Berkus

On 05/05/2014 10:53 AM, Andres Freund wrote:
> Still a user error. You need to reclone.
> 
> Depending on how archiving and the target timeline was configured the
> timeline increase won't be treated as an error...

Andres and I hashed this out on IRC.  The basic problem was that I was
relying on pg_stat_replication to point out when a successful
replication connection was established.  However, he pointed out cases
where pg_stat_replication will report sync or streaming even though
replication has failed due to differences in WAL position.  That appears
to be what happened here.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] avoiding tuple copying in btree index builds

2014-05-05 Thread Robert Haas

On Mon, May 5, 2014 at 2:13 PM, Andres Freund  wrote:
> On 2014-05-05 13:52:39 -0400, Robert Haas wrote:
>> Today, I discovered that when building a btree index, the btree code
>> uses index_form_tuple() to create an index tuple from the heap tuple,
>> calls tuplesort_putindextuple() to copy that tuple into the sort's
>> memory context, and then frees the original one it built.  This seemed
>> inefficient, so I wrote a patch to eliminate the tuple copying.  It
>> works by adding a function tuplesort_putindextuplevalues(), which
>> builds the tuple in the sort's memory context and thus avoids the need
>> for a separate copy.  I'm not sure if that's the best approach, but
>> the optimization seems wortwhile.
>
> Hm. It looks like we could quite easily just get rid of
> tuplesort_putindextuple(). The hash usage doesn't look hard to convert.

I glanced at that, but it wasn't obvious to me how to convert the hash
usage.  If you have an idea, I'm all ears.

>> I tested it by repeatedly executing "REINDEX INDEX
>> pgbench_accounts_pkey" on a PPC64 machine.  pgbench_accounts contains
>> 10 million records.  With unpatched master as of
>> b2f7bd72c4d3e80065725c72e85778d5f4bdfd4a, I got times of 6.159s,
>> 6.177s, and 6.201s.  With the attached patch, I got times of 5.787s,
>> 5.972s, and 5.913s, a savings of almost 5%.  Not bad considering the
>> amount of work involved.
>
> Yes, that's certainly worthwile. Nice.

Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Josh Berkus

On 05/05/2014 11:31 AM, Bruce Momjian wrote:
>   JSONB values are also mapped to SQL scalar data types, rather
> than being treated always as strings.

+ ... allowing for correct sorting of JSON according to internal datums.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian

On Mon, May  5, 2014 at 10:58:57AM -0700, Peter Geoghegan wrote:
> On Mon, May 5, 2014 at 8:28 AM, Andrew Dunstan  wrote:
> > How about:
> >
> >This data type allows for faster access to values in the json document
> > and faster and more useful indexing of json.
> 
> We should refer to the fact that jsonb is internally typed. This isn't
> all that obvious now, but it is evident for example when you sort a
> set of raw scalar numeric jsonb values, which has a sane ordering (the
> implementation invokes numeric_cmp()). You also get an internal,
> per-number-scalar display scale, just like the numeric type proper.
> I'm not all that sure about how to go about succinctly expressing
> this, but clearly it's important.

How about:

JSONB values are also mapped to SQL scalar data types, rather
than being treated always as strings.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] TABLESPACE and directory for Foreign tables?

2014-05-05 Thread Andres Freund

On 2014-05-05 11:17:18 -0700, Josh Berkus wrote:
> On 05/05/2014 10:53 AM, Tom Lane wrote:
> > Josh Berkus  writes:
> >> I'm working with the cstore_fdw project, which has an interesting
> >> property for an FDW: the FDW itself creates the files which make up the
> >> database.   This raises a couple of questions:
> > 
> >> 1) Do we want to establish a standard directory for FDWs which create
> >> files, such as $PGDATA/base/{database-oid}/fdw/ ?  Or would we want to
> >> leave it up to each FDW to decide?
> > 
> > I think we ought to vigorously discourage FDWs from storing any files
> > inside $PGDATA.  This cannot lead to anything except grief.  Just for
> > starters, what will operations such as pg_basebackup do with them?
> 
> That was one advantage to putting them in PGDATA; you get a copy of the
> files with pg_basebackup.

A corrupted copy. There's no WAL replay to correct skew due to write
activity while copying.

> Of course, they don't replicate after that,
> but they potentially could, in the future, with Logical Streaming
> Replication.

Nope. They're not in the WAL, so they won't be streamed out.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] TABLESPACE and directory for Foreign tables?

2014-05-05 Thread Magnus Hagander

On Mon, May 5, 2014 at 8:17 PM, Josh Berkus  wrote:

> On 05/05/2014 10:53 AM, Tom Lane wrote:
> > Josh Berkus  writes:
> >> I'm working with the cstore_fdw project, which has an interesting
> >> property for an FDW: the FDW itself creates the files which make up the
> >> database.   This raises a couple of questions:
> >
> >> 1) Do we want to establish a standard directory for FDWs which create
> >> files, such as $PGDATA/base/{database-oid}/fdw/ ?  Or would we want to
> >> leave it up to each FDW to decide?
> >
> > I think we ought to vigorously discourage FDWs from storing any files
> > inside $PGDATA.  This cannot lead to anything except grief.  Just for
> > starters, what will operations such as pg_basebackup do with them?
>
> That was one advantage to putting them in PGDATA; you get a copy of the
> files with pg_basebackup.  Of course, they don't replicate after that,
> but they potentially could, in the future, with Logical Streaming
> Replication.
>

Presumably they'd also be inconsistent? And as such not really useful
unless you actually shut the database down before you back it up (e.g.
don't use pg_basebackup)?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] TABLESPACE and directory for Foreign tables?

2014-05-05 Thread Josh Berkus

On 05/05/2014 10:53 AM, Tom Lane wrote:
> Josh Berkus  writes:
>> I'm working with the cstore_fdw project, which has an interesting
>> property for an FDW: the FDW itself creates the files which make up the
>> database.   This raises a couple of questions:
> 
>> 1) Do we want to establish a standard directory for FDWs which create
>> files, such as $PGDATA/base/{database-oid}/fdw/ ?  Or would we want to
>> leave it up to each FDW to decide?
> 
> I think we ought to vigorously discourage FDWs from storing any files
> inside $PGDATA.  This cannot lead to anything except grief.  Just for
> starters, what will operations such as pg_basebackup do with them?

That was one advantage to putting them in PGDATA; you get a copy of the
files with pg_basebackup.  Of course, they don't replicate after that,
but they potentially could, in the future, with Logical Streaming
Replication.

> A larger and more philosophical point is that such a direction of
> development could hardly be called a "foreign" data wrapper.  People
> would expect Postgres to take full responsibility for such files,
> including data integrity considerations such as fsync-at-checkpoints
> and WAL support.  Even if we wanted the FDW abstractions to allow
> for that, we're very far away from it.  And frankly I'd maintain
> that FDW is the wrong abstraction.

Certainly pluggable storage would be a better abstraction; but we don't
have that yet.  In the meantime, we have one FDW which creates files
*right now*, and we might have more in the future, so I'm trying to
establish some guidelines as to how such FDWs should behave.  Regardless
of whether or not you think FDWs should be managing files, it's better
for users if all FDWs which manage files manage them in the same way.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Recursive ReceiveSharedInvalidMessages not safe

2014-05-05 Thread Tom Lane

Andres Freund  writes:
> While investigating an issue pointed out by valgrind around undefined
> bytes in inval.c SHAREDINVALSMGR_ID processing I noticed that there's a
> bug in ReceiveSharedInvalidMessages(). It tries to be safe against
> recursion but it's not:
> When it recurses into ReceiveSharedInvalidMessages() from it's main loop
> from inside the inval callback while nextmsg = nummsgs it'll overwrite
> the 'messages' array with new contents. But at this point the old
> content of one entry in the messages array is still passed to
> the LocalExecuteInvalidationMessage() that caused the recursion.

Hm, yeah, so if the called inval function continues to use the message
contents after doing something that could result in a recursive call,
it might be looking at trashed data.

> It looks to me like this is broken since at least fad153ec. I think the
> fix is just to make the current 'SharedInvalidationMessage *msg' not be
> pointers but a local copiy of the to-be-processed entry.

Yeah, that should do it.  I think I'd been trying to avoid copying
messages more times than necessary, but evidently I optimized away
one copy step too many :-(

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] avoiding tuple copying in btree index builds

2014-05-05 Thread Andres Freund

Hi,

On 2014-05-05 13:52:39 -0400, Robert Haas wrote:
> Today, I discovered that when building a btree index, the btree code
> uses index_form_tuple() to create an index tuple from the heap tuple,
> calls tuplesort_putindextuple() to copy that tuple into the sort's
> memory context, and then frees the original one it built.  This seemed
> inefficient, so I wrote a patch to eliminate the tuple copying.  It
> works by adding a function tuplesort_putindextuplevalues(), which
> builds the tuple in the sort's memory context and thus avoids the need
> for a separate copy.  I'm not sure if that's the best approach, but
> the optimization seems wortwhile.

Hm. It looks like we could quite easily just get rid of
tuplesort_putindextuple(). The hash usage doesn't look hard to convert.

> I tested it by repeatedly executing "REINDEX INDEX
> pgbench_accounts_pkey" on a PPC64 machine.  pgbench_accounts contains
> 10 million records.  With unpatched master as of
> b2f7bd72c4d3e80065725c72e85778d5f4bdfd4a, I got times of 6.159s,
> 6.177s, and 6.201s.  With the attached patch, I got times of 5.787s,
> 5.972s, and 5.913s, a savings of almost 5%.  Not bad considering the
> amount of work involved.

Yes, that's certainly worthwile. Nice.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Peter Geoghegan

On Mon, May 5, 2014 at 8:28 AM, Andrew Dunstan  wrote:
> How about:
>
>This data type allows for faster access to values in the json document
> and faster and more useful indexing of json.

We should refer to the fact that jsonb is internally typed. This isn't
all that obvious now, but it is evident for example when you sort a
set of raw scalar numeric jsonb values, which has a sane ordering (the
implementation invokes numeric_cmp()). You also get an internal,
per-number-scalar display scale, just like the numeric type proper.
I'm not all that sure about how to go about succinctly expressing
this, but clearly it's important.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] New and interesting replication issues with 9.2.8 sync rep

2014-05-05 Thread Andres Freund

On 2014-05-05 10:30:17 -0700, Josh Berkus wrote:
> On 05/05/2014 10:25 AM, Andres Freund wrote:
> > On 2014-05-05 10:16:27 -0700, Josh Berkus wrote:
> >> On 05/03/2014 01:07 AM, Andres Freund wrote:
> >>> On 2014-05-02 18:57:08 -0700, Josh Berkus wrote:
>  Just got a report of a replication issue with 9.2.8 from a community 
>  member:
> 
>  Here's the sequence:
> 
>  1) A --> B (sync rep)
> 
>  2) Shut down B
> 
>  3) Shut down A
> 
>  4) Start up B as a master
> 
>  5) Start up A as sync replica of B
> 
>  6) A successfully joins B as a sync replica, even though its transaction
>  log is 1016 bytes *ahead* of B.
> 
>  7) Transactions written to B all hang
> 
>  8) Xlog on A is now corrupt, although the database itself is OK
> >>>
> >>> This is fundamentally borked practice.
> >>>
>  Now, the above sequence happened because of the user misunderstanding
>  what sync rep really means.  However, A should not have been able to
>  connect with B in replication mode, especially in sync rep mode; that
>  should have failed.  Any thoughts on why it didn't?
> >>>
> >>> I'd guess that B, while starting up, has written further WAL records
> >>> bringing it further ahead of A.
> >>
> >> Apparently not; from what I've seen pg_stat_replication even *shows*
> >> that the replica is ahead of the master.

That's the shutdown record from A that I've talked about.

>  Futher, Postgres should have
> >> recognized that there was a timeline branch point before A's last
> >> record, no?
> > 
> > There wasn't any timeline increase because - as far as I understand the
> > above - there wasn't any promotion. The cluster was shut down and
> > recovery.conf was created/removed respectively.
> 
> Ah, oops, left out a step.  B was promoted.

Still a user error. You need to reclone.

Depending on how archiving and the target timeline was configured the
timeline increase won't be treated as an error...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] TABLESPACE and directory for Foreign tables?

2014-05-05 Thread Tom Lane

Josh Berkus  writes:
> I'm working with the cstore_fdw project, which has an interesting
> property for an FDW: the FDW itself creates the files which make up the
> database.   This raises a couple of questions:

> 1) Do we want to establish a standard directory for FDWs which create
> files, such as $PGDATA/base/{database-oid}/fdw/ ?  Or would we want to
> leave it up to each FDW to decide?

I think we ought to vigorously discourage FDWs from storing any files
inside $PGDATA.  This cannot lead to anything except grief.  Just for
starters, what will operations such as pg_basebackup do with them?

A larger and more philosophical point is that such a direction of
development could hardly be called a "foreign" data wrapper.  People
would expect Postgres to take full responsibility for such files,
including data integrity considerations such as fsync-at-checkpoints
and WAL support.  Even if we wanted the FDW abstractions to allow
for that, we're very far away from it.  And frankly I'd maintain
that FDW is the wrong abstraction.

> 2) Do we want to support the TABLESPACE directive for FDWs?

A fortiori, no.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] avoiding tuple copying in btree index builds

2014-05-05 Thread Robert Haas

Hi,

Today, I discovered that when building a btree index, the btree code
uses index_form_tuple() to create an index tuple from the heap tuple,
calls tuplesort_putindextuple() to copy that tuple into the sort's
memory context, and then frees the original one it built.  This seemed
inefficient, so I wrote a patch to eliminate the tuple copying.  It
works by adding a function tuplesort_putindextuplevalues(), which
builds the tuple in the sort's memory context and thus avoids the need
for a separate copy.  I'm not sure if that's the best approach, but
the optimization seems wortwhile.

I tested it by repeatedly executing "REINDEX INDEX
pgbench_accounts_pkey" on a PPC64 machine.  pgbench_accounts contains
10 million records.  With unpatched master as of
b2f7bd72c4d3e80065725c72e85778d5f4bdfd4a, I got times of 6.159s,
6.177s, and 6.201s.  With the attached patch, I got times of 5.787s,
5.972s, and 5.913s, a savings of almost 5%.  Not bad considering the
amount of work involved.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 542ed43..0743474 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -171,28 +171,21 @@ btbuildCallback(Relation index,
 void *state)
 {
 	BTBuildState *buildstate = (BTBuildState *) state;
-	IndexTuple	itup;
-
-	/* form an index tuple and point it at the heap tuple */
-	itup = index_form_tuple(RelationGetDescr(index), values, isnull);
-	itup->t_tid = htup->t_self;
 
 	/*
 	 * insert the index tuple into the appropriate spool file for subsequent
 	 * processing
 	 */
 	if (tupleIsAlive || buildstate->spool2 == NULL)
-		_bt_spool(itup, buildstate->spool);
+		_bt_spool(buildstate->spool, htup->t_self, values, isnull);
 	else
 	{
 		/* dead tuples are put into spool2 */
 		buildstate->haveDead = true;
-		_bt_spool(itup, buildstate->spool2);
+		_bt_spool(buildstate->spool2, htup->t_self, values, isnull);
 	}
 
 	buildstate->indtuples += 1;
-
-	pfree(itup);
 }
 
 /*
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 9ddc275..8cd3a91 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -185,9 +185,10 @@ _bt_spooldestroy(BTSpool *btspool)
  * spool an index entry into the sort file.
  */
 void
-_bt_spool(IndexTuple itup, BTSpool *btspool)
+_bt_spool(BTSpool *btspool, ItemPointerData self, Datum *values, bool *isnull)
 {
-	tuplesort_putindextuple(btspool->sortstate, itup);
+	tuplesort_putindextuplevalues(btspool->sortstate, btspool->index,
+  self, values, isnull);
 }
 
 /*
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 8b520c1..3b0a06c 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1150,6 +1150,29 @@ tuplesort_putindextuple(Tuplesortstate *state, IndexTuple tuple)
 	 */
 	COPYTUP(state, &stup, (void *) tuple);
 
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Collect one index tuple while collecting input data for sort, building
+ * it from caller-supplied values.
+ */
+void
+tuplesort_putindextuplevalues(Tuplesortstate *state, Relation rel,
+			  ItemPointerData self, Datum *values,
+  bool *isnull)
+{
+	MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
+	SortTuple	stup;
+
+	stup.tuple = index_form_tuple(RelationGetDescr(rel), values, isnull);
+	((IndexTuple) stup.tuple)->t_tid = self;
+	USEMEM(state, GetMemoryChunkSpace(stup.tuple));
+	/* set up first-column key value */
+	stup.datum1 = index_getattr((IndexTuple) stup.tuple,
+1,
+RelationGetDescr(state->indexRel),
+&stup.isnull1);
 	puttuple_common(state, &stup);
 
 	MemoryContextSwitchTo(oldcontext);
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 1a8b16d..5ef1f9b 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -717,7 +717,8 @@ typedef struct BTSpool BTSpool; /* opaque type known only within nbtsort.c */
 extern BTSpool *_bt_spoolinit(Relation heap, Relation index,
 			  bool isunique, bool isdead);
 extern void _bt_spooldestroy(BTSpool *btspool);
-extern void _bt_spool(IndexTuple itup, BTSpool *btspool);
+extern void _bt_spool(BTSpool *btspool, ItemPointerData self,
+		  Datum *values, bool *isnull);
 extern void _bt_leafbuild(BTSpool *btspool, BTSpool *spool2);
 
 /*
diff --git a/src/include/utils/tuplesort.h b/src/include/utils/tuplesort.h
index 05445f0..2b62a98 100644
--- a/src/include/utils/tuplesort.h
+++ b/src/include/utils/tuplesort.h
@@ -85,6 +85,9 @@ extern void tuplesort_puttupleslot(Tuplesortstate *state,
 	   TupleTableSlot *slot);
 extern void tuplesort_putheaptuple(Tuplesortstate *state, HeapTuple tup);
 extern void tuplesort_putindextuple(Tuplesortstate *state, IndexTuple tuple);
+extern void tuplesort_putindextuplevalues(Tuplesortstate

Re: [HACKERS] TABLESPACE and directory for Foreign tables?

2014-05-05 Thread Fabrízio de Royes Mello

On Mon, May 5, 2014 at 2:26 PM, Josh Berkus  wrote:
>
> All,
>
> I'm working with the cstore_fdw project, which has an interesting
> property for an FDW: the FDW itself creates the files which make up the
> database.   This raises a couple of questions:
>
> 1) Do we want to establish a standard directory for FDWs which create
> files, such as $PGDATA/base/{database-oid}/fdw/ ?  Or would we want to
> leave it up to each FDW to decide?
>

-1. Each FDW must decide.


> 2) Do we want to support the TABLESPACE directive for FDWs?
>
> While cstore is the first FDW to create its own files, it won't
> necessarily be the last; I could imagine CSV_FDW doing so as well, or a
> future SQLite_FDW which does the same.  So I think the above questions
> are worth answering in general.  And we're planning to implement
> automated file management for cstore_fdw fairly soon, so we want to make
> it consistent with whatever we're doing in Postgres 9.5.
>

-1. We cannot guarantee the consistency of files using an "ALTER FOREIGN
TABLE ... SET TABLESPACE ..." as a normal "ALTER TABLE ..." does.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian

On Mon, May  5, 2014 at 10:40:29AM -0700, Josh Berkus wrote:
> On 05/04/2014 05:46 AM, Bruce Momjian wrote:
> > I have completed the initial version of the 9.4 release notes.  You can
> > view them here:
> > 
> > http://www.postgresql.org/docs/devel/static/release-9-4.html
> > 
> > I will be adding additional markup in the next few days.
> > 
> > Feedback expected and welcomed.  I expect to be modifying this until we
> > release 9.4 final.  I have marked items where I need help with question
> > marks.
> > 
> 
> Major enhancement list:
> 
> Per discussion on the -advocacy list, the features we've picked out as
> "major enhancements" for advocacy reasons this round are:
> 
> * Materialized Views (because with refresh concurrently, MatViews are
> now useful, which they weren't, very, in 9.3)
> 
> * Logical Decoding/Changeset Extraction which we're calling "Data Change
> Streaming API" in the advocacy stuff.  Not sure if you want to use that
> name in the technical realease notes.
> 
> * Dynamic Background Workers (this one is a major feature, but won't be
> front-listed in advocacy doc because it's hard to explain and nobody's
> built anything cool with it yet.  But it should go under major
> enhancements in the release notes)
> 
> * JSONB and related features (such as indexing)
> 
> * ALTER SYSTEM SET
> 
> Lemme know if you need description text for any of the above.

OK, great!  Once I have the markup done, I will beef up the descriptions
if needed and copy the text up to the major items section so we have
that all ready for beta.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Josh Berkus

On 05/04/2014 05:46 AM, Bruce Momjian wrote:
> I have completed the initial version of the 9.4 release notes.  You can
> view them here:
> 
>   http://www.postgresql.org/docs/devel/static/release-9-4.html
> 
> I will be adding additional markup in the next few days.
> 
> Feedback expected and welcomed.  I expect to be modifying this until we
> release 9.4 final.  I have marked items where I need help with question
> marks.
> 

Major enhancement list:

Per discussion on the -advocacy list, the features we've picked out as
"major enhancements" for advocacy reasons this round are:

* Materialized Views (because with refresh concurrently, MatViews are
now useful, which they weren't, very, in 9.3)

* Logical Decoding/Changeset Extraction which we're calling "Data Change
Streaming API" in the advocacy stuff.  Not sure if you want to use that
name in the technical realease notes.

* Dynamic Background Workers (this one is a major feature, but won't be
front-listed in advocacy doc because it's hard to explain and nobody's
built anything cool with it yet.  But it should go under major
enhancements in the release notes)

* JSONB and related features (such as indexing)

* ALTER SYSTEM SET

Lemme know if you need description text for any of the above.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] New and interesting replication issues with 9.2.8 sync rep

2014-05-05 Thread Andres Freund

On 2014-05-05 10:16:27 -0700, Josh Berkus wrote:
> On 05/03/2014 01:07 AM, Andres Freund wrote:
> > On 2014-05-02 18:57:08 -0700, Josh Berkus wrote:
> >> Just got a report of a replication issue with 9.2.8 from a community 
> >> member:
> >>
> >> Here's the sequence:
> >>
> >> 1) A --> B (sync rep)
> >>
> >> 2) Shut down B
> >>
> >> 3) Shut down A
> >>
> >> 4) Start up B as a master
> >>
> >> 5) Start up A as sync replica of B
> >>
> >> 6) A successfully joins B as a sync replica, even though its transaction
> >> log is 1016 bytes *ahead* of B.
> >>
> >> 7) Transactions written to B all hang
> >>
> >> 8) Xlog on A is now corrupt, although the database itself is OK
> > 
> > This is fundamentally borked practice.
> > 
> >> Now, the above sequence happened because of the user misunderstanding
> >> what sync rep really means.  However, A should not have been able to
> >> connect with B in replication mode, especially in sync rep mode; that
> >> should have failed.  Any thoughts on why it didn't?
> > 
> > I'd guess that B, while starting up, has written further WAL records
> > bringing it further ahead of A.
> 
> Apparently not; from what I've seen pg_stat_replication even *shows*
> that the replica is ahead of the master.  Futher, Postgres should have
> recognized that there was a timeline branch point before A's last
> record, no?

There wasn't any timeline increase because - as far as I understand the
above - there wasn't any promotion. The cluster was shut down and
recovery.conf was created/removed respectively.

To me this is a operator error. We could try to defend against it more
vigorously, but thats's hard to do without breaking actual usecases.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Docs for 9.4's worker_spi?

2014-05-05 Thread Bruce Momjian

Seems there is no documentation for the 9.4 worker_spi contrib module.  Is
this OK?  The comment at the top of the C file says:

 *  Sample background worker code that demonstrates various coding
 *  patterns: establishing a database connection; starting and committing
 *  transactions; using GUC variables, and heeding SIGHUP to reread
 *  the configuration file; reporting to pg_stat_activity; using the
 *  process latch to sleep and exit in case of postmaster death.


-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] New and interesting replication issues with 9.2.8 sync rep

2014-05-05 Thread Josh Berkus

On 05/05/2014 10:25 AM, Andres Freund wrote:
> On 2014-05-05 10:16:27 -0700, Josh Berkus wrote:
>> On 05/03/2014 01:07 AM, Andres Freund wrote:
>>> On 2014-05-02 18:57:08 -0700, Josh Berkus wrote:
 Just got a report of a replication issue with 9.2.8 from a community 
 member:

 Here's the sequence:

 1) A --> B (sync rep)

 2) Shut down B

 3) Shut down A

 4) Start up B as a master

 5) Start up A as sync replica of B

 6) A successfully joins B as a sync replica, even though its transaction
 log is 1016 bytes *ahead* of B.

 7) Transactions written to B all hang

 8) Xlog on A is now corrupt, although the database itself is OK
>>>
>>> This is fundamentally borked practice.
>>>
 Now, the above sequence happened because of the user misunderstanding
 what sync rep really means.  However, A should not have been able to
 connect with B in replication mode, especially in sync rep mode; that
 should have failed.  Any thoughts on why it didn't?
>>>
>>> I'd guess that B, while starting up, has written further WAL records
>>> bringing it further ahead of A.
>>
>> Apparently not; from what I've seen pg_stat_replication even *shows*
>> that the replica is ahead of the master.  Futher, Postgres should have
>> recognized that there was a timeline branch point before A's last
>> record, no?
> 
> There wasn't any timeline increase because - as far as I understand the
> above - there wasn't any promotion. The cluster was shut down and
> recovery.conf was created/removed respectively.

Ah, oops, left out a step.  B was promoted.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] TABLESPACE and directory for Foreign tables?

2014-05-05 Thread Josh Berkus

All,

I'm working with the cstore_fdw project, which has an interesting
property for an FDW: the FDW itself creates the files which make up the
database.   This raises a couple of questions:

1) Do we want to establish a standard directory for FDWs which create
files, such as $PGDATA/base/{database-oid}/fdw/ ?  Or would we want to
leave it up to each FDW to decide?

2) Do we want to support the TABLESPACE directive for FDWs?

While cstore is the first FDW to create its own files, it won't
necessarily be the last; I could imagine CSV_FDW doing so as well, or a
future SQLite_FDW which does the same.  So I think the above questions
are worth answering in general.  And we're planning to implement
automated file management for cstore_fdw fairly soon, so we want to make
it consistent with whatever we're doing in Postgres 9.5.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Andres Freund

On 2014-05-05 13:07:48 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2014-05-05 10:07:46 -0400, Tom Lane wrote:
> >> Also, -1 for adding another log_line_prefix escape.  If you're routing
> >> multiple clusters logging to the same place (which is already a bit
> >> unlikely IMO), you can put distinguishing strings in log_line_prefix
> >> already.  And it's not like we've got an infinite supply of letters
> >> for those escapes.
> 
> > Using syslog and including the same config file from multiple clusters
> > isn't that uncommon. But I can live without it.
> 
> So, if you are sharing a config file, how is it that you can set a
> per-cluster cluster_name but not a per-cluster log_line_prefix?

Well, it's a included file. With log_line_prefix support just
cluster_name has to be set per cluster. Without you need string
interpolation in the config management to include cluster_name in
log_line_prefix.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] New and interesting replication issues with 9.2.8 sync rep

2014-05-05 Thread Josh Berkus

On 05/03/2014 01:07 AM, Andres Freund wrote:
> On 2014-05-02 18:57:08 -0700, Josh Berkus wrote:
>> Just got a report of a replication issue with 9.2.8 from a community member:
>>
>> Here's the sequence:
>>
>> 1) A --> B (sync rep)
>>
>> 2) Shut down B
>>
>> 3) Shut down A
>>
>> 4) Start up B as a master
>>
>> 5) Start up A as sync replica of B
>>
>> 6) A successfully joins B as a sync replica, even though its transaction
>> log is 1016 bytes *ahead* of B.
>>
>> 7) Transactions written to B all hang
>>
>> 8) Xlog on A is now corrupt, although the database itself is OK
> 
> This is fundamentally borked practice.
> 
>> Now, the above sequence happened because of the user misunderstanding
>> what sync rep really means.  However, A should not have been able to
>> connect with B in replication mode, especially in sync rep mode; that
>> should have failed.  Any thoughts on why it didn't?
> 
> I'd guess that B, while starting up, has written further WAL records
> bringing it further ahead of A.

Apparently not; from what I've seen pg_stat_replication even *shows*
that the replica is ahead of the master.  Futher, Postgres should have
recognized that there was a timeline branch point before A's last
record, no?

I'm working on getting permission to access the DB files.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] regexp_replace( , , , NULL ) returns null?

2014-05-05 Thread Jim Nasby


On 5/2/14, 8:57 PM, Tom Lane wrote:

Jim Nasby  writes:

ISTM it’d be a lot better if it treated NULL flags the same as ‘’...


In Oracle's universe that probably makes sense, but to me it's not
sensible.  Why should "unknown" flags produce a non-unknown result?


Only because they're more options than data.


I find it hard to envision many use-cases where you wouldn't actually
have the flags as a constant, anyway; they're too fundamental to the
behavior of the function.


Unless you're wrapping this function; handling the case of the flags being 
optional becomes easier then.

(FWIW, I'm creating a version that accepts an array of search/replace 
arguments.)
--
Jim Nasby, Lead Data Architect   (512) 569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Tom Lane

Andres Freund  writes:
> On 2014-05-05 10:07:46 -0400, Tom Lane wrote:
>> Also, -1 for adding another log_line_prefix escape.  If you're routing
>> multiple clusters logging to the same place (which is already a bit
>> unlikely IMO), you can put distinguishing strings in log_line_prefix
>> already.  And it's not like we've got an infinite supply of letters
>> for those escapes.

> Using syslog and including the same config file from multiple clusters
> isn't that uncommon. But I can live without it.

So, if you are sharing a config file, how is it that you can set a
per-cluster cluster_name but not a per-cluster log_line_prefix?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] [PATCH] Fix use of free in walsender error handling after a sysid mismatch.

2014-05-05 Thread Andres Freund

Hi,

Walsender does a PQClear(con) but then accesses data acquired with
PQgetvalue(). That's clearly not ok.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
>From 1b7df22ff6da40ad4fcfa4eeacc1822373baa4e6 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 5 May 2014 18:03:44 +0200
Subject: [PATCH] Fix use of free in walsender error handling after a sysid
 mismatch.

Found via valgrind.

The bug exists since the introduction of the walsender, so backpatch
to 9.0.
---
 src/backend/replication/libpqwalreceiver/libpqwalreceiver.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 96f31c4..88d27c7 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -152,6 +152,7 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
 			 GetSystemIdentifier());
 	if (strcmp(primary_sysid, standby_sysid) != 0)
 	{
+		primary_sysid = pstrdup(primary_sysid);
 		PQclear(res);
 		ereport(ERROR,
 (errmsg("database system identifier differs between the primary and standby"),
-- 
1.8.5.rc2.dirty


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] tab completion for setting search_path

2014-05-05 Thread Jeff Janes

On Sat, May 3, 2014 at 1:11 AM, Andres Freund wrote:

> On 2014-05-03 00:13:45 -0700, Jeff Janes wrote:
> > On Friday, May 2, 2014, Jeff Janes  wrote:
> >
> > > I've been working with an app that uses a schema name whose spelling is
> > > hard to type, and the lack of tab completion for "SET search_path TO"
> was
> > > bugging me.  So see attached.
> > >
> > > I filter out the system schemata, but not public.
>
> That'd be nice.
>
> > diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
> > new file mode 100644
> > index 6d26ffc..dec3d4a
> > *** a/src/bin/psql/tab-complete.c
> > --- b/src/bin/psql/tab-complete.c
> > *** psql_completion(const char *text, int st
> > *** 3230,3235 
> > --- 3230,3242 
> >
> >   COMPLETE_WITH_LIST(my_list);
> >   }
> > + else if (pg_strcasecmp(prev2_wd, "search_path") == 0)
> > + {
> > + COMPLETE_WITH_QUERY(Query_for_list_of_schemas
> > + " AND
> nspname not like 'pg\\_%%' "
> > + " AND
> nspname not like 'information_schema' "
> > + " UNION
> SELECT 'DEFAULT' ");
> > + }
>
> Why should we exclude system schemata? That seems more likely to be
> confusing than helpful? I can see a point in excluding another backend's
> temp tables, but otherwise?
>

I've personally never had a need to set the search_path to a system schema,
and I guess I was implicitly modelling this on what is returned by \dn, not
by \dnS.   I wouldn't object much to including them; that would be better
than not having any completion.  I just don't see much point.

And now playing a bit with the system ones, I think it would be more
confusing to offer them.  pg_catalog and pg_temp_ always get
searched, whether you put them in the search_path or not.

Cheers,

Jeff

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian

On Mon, May  5, 2014 at 11:28:21AM -0400, Andrew Dunstan wrote:
> No, I think you're missing the point of what I'm saying, and I think
> the addition is at best misleading. I would avoid talk of key value
> pairs, anyway, that's not all that's in a json document (it might be
> an array, for example).
> 
> How about:
> 
>This data type allows for faster access to values in the json document and 
> faster and more useful indexing of json.

OK, I used you wording.  You are right I didn't understand it.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Stephen Frost

* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2014-05-05 08:03:04 -0400, Stephen Frost wrote:
> > Uhh, that's not at all true.  You can trivially have multiple IPs on a
> > box w/o jails or containers (aliased interfaces) and then run PG on the
> > default port- which I find to be *far* more convenient than having the
> > same IP and a bunch of different ports.
> 
> Only that you then need different socket directories. Do you really do
> that regularly?

Yup.  I've wished for that to be the default in Debian quite a few
times, actually...  Much easier to deal with firewall rules and users
who are coming from pgAdmin and friends if they only have to deal with
understanding what hosts they need to connect to, and not worry about
the port also.

> Anyway, I am happy having the cluster_name thingy.

Agreed.

Thanks,

Stpehen

signature.asc
Description: Digital signature

Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Andres Freund

On 2014-05-05 10:07:46 -0400, Tom Lane wrote:
> Stephen Frost  writes:
> > Including the value of listen_addresses along w/ the port would make it
> > useful.  If we really don't want the cluster-name concept (which,
> > personally, I like quite a bit), how about including the listen_address
> > value if it isn't '*'?
> 
> Nah, let's do cluster name.  That way, somebody who's only got one
> postmaster isn't suddenly going to see a lot of useless clutter,
> ie the user gets to decide what to add to ps output.  "SHOW cluster_name"
> might be useful at the application level as well, I suspect.

Hm. What about unifiyng this with event_source? Not sure if it's a good
idea, but it's a pretty similar thing.

> Also, -1 for adding another log_line_prefix escape.  If you're routing
> multiple clusters logging to the same place (which is already a bit
> unlikely IMO), you can put distinguishing strings in log_line_prefix
> already.  And it's not like we've got an infinite supply of letters
> for those escapes.

Using syslog and including the same config file from multiple clusters
isn't that uncommon. But I can live without it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Andres Freund

On 2014-05-05 08:03:04 -0400, Stephen Frost wrote:
> Craig,
> 
> * Craig Ringer (cr...@2ndquadrant.com) wrote:
> >   postgres[5433]: checkpointer process
> > 
> > at least as useful. The only time that's not unique is in a BSD jail or
> > lxc container, and in those cases IIRC ps can show you the
> > jail/container anyway.
> 
> Uhh, that's not at all true.  You can trivially have multiple IPs on a
> box w/o jails or containers (aliased interfaces) and then run PG on the
> default port- which I find to be *far* more convenient than having the
> same IP and a bunch of different ports.

Only that you then need different socket directories. Do you really do
that regularly?

Anyway, I am happy having the cluster_name thingy.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Andres Freund

On 2014-05-05 09:52:33 -0400, Tom Lane wrote:
> Andres Freund  writes:
> >>> Aren't you potentially dereferencing a NULL pointer here?
> 
> >> Hmm -- I thought the GUC machinery would make sure cluster_name either
> >> pointed to the default I provided, an empty string, or a string read from
> >> the configuration file.  Perhaps I need to go and read up on how GUCs work.
> 
> > That's true - but I am not sure you can guarantee it's only called after
> > the GUC machinery has started up.
> 
> The elog code MUST be able to work before GUC initialization is done.
> What do you think will happen if we fail to open postgresql.conf,
> for example?

But elog. shouldn't call init_ps_display(), right? Anyway, I am all for
initializing it sensibly, after all it was I that suggested doing so above...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)

2014-05-05 Thread David G Johnston

Merlin Moncure-2 wrote
> On Sat, May 3, 2014 at 5:48 AM, Marko Tiikkaja <

> marko@

> > wrote:
>> On 5/2/14, 10:10 PM, Merlin Moncure wrote:
>>>
>>> On Fri, May 2, 2014 at 3:03 PM, Tom Lane <

> tgl@.pa

> > wrote:

 Meh.  Then you could have a query that works fine until you add a
 column
 to the table, and it stops working.  If nobody ever used column names
 identical to table names it'd be all right, but unfortunately people
 seem to do that a lot...
>>>
>>>
>>> That's already the case with select statements
>>
>> I don't think that's true if you table-qualify your column references and
>> don't use SELECT *.
>>
>>
>>> and, if a user were
>>> concerned about that, always have the option of aliasing the table as
>>> nearly 100% of professional developers do:
>>>
>>> SELECT f FROM foo f;
>>> etc.
>>
>>
>> So e.g.:
>>
>>   UPDATE foo f SET f = ..;
>>
>> would resolve to the table, despite there being a column called "f"? That
>> would break backwards compatibility.
>>
>> How about:
>>
>>   UPDATE foo SET ROW(foo) = (1,2,3);
>>
>> ISTM that this could be parsed unambiguously, though it's perhaps a bit
>> ugly.
> 
> Hm, that's a bit too ugly: row(foo) in this case means 'do special
> behavior X' whereas in all other cases it means make an anonymous
> rowtype with one attribute of type 'foo'.
> 
> How about:
> UPDATE foo SET (foo).* = (1,2,3);

Wouldn't

UPDATE foo SET (foo.*) = (1,2,3)

be better since it would cleanly support non-complete types like

UPDATE foo SET (foo.col1, foo.col3) = (1,3)

Though I am not that concerned about overloading the use of "ROW" in context
of an UPDATE.

As with normal usage of ROW why not make its presence optional - support
both syntaxes?

Keywords like USING and SET have different meanings when used in
DELETE/UPDATE so having ROW behave similarly wouldn't be that confusing -
and it does seem to have an ambiguity if you restrict this interpretation of
ROW to only the SET of the update statement.

Is there any need or requirement for (or against) interleaving normal and
row-valued, or even multiple row-valued, SET expressions?

UPDATE foo SET (foo.col1, foo.col3) = (1,3), foo.col2 = 2

David J.





--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Supporting-multiple-column-assignment-in-UPDATE-9-5-project-tp5802240p5802471.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)

2014-05-05 Thread Merlin Moncure

On Mon, May 5, 2014 at 10:32 AM, Andrew Dunstan  wrote:
>
> On 05/05/2014 11:20 AM, Pavel Stehule wrote:
>>
>>
>>
>>
>> How about:
>> UPDATE foo SET (foo).* = (1,2,3);
>>
>>
>> It is looking little bit strange
>>
>> I like previous proposal UPDATE foo SET foo = (1,2,3);
>>
>
> What if the table has a field called foo? Won't it then be ambiguous?

See upthread: it prefers the field to the table if both are there
(exactly as SELECT does).

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)

2014-05-05 Thread Andrew Dunstan



On 05/05/2014 11:20 AM, Pavel Stehule wrote:




How about:
UPDATE foo SET (foo).* = (1,2,3);


It is looking little bit strange

I like previous proposal UPDATE foo SET foo = (1,2,3);



What if the table has a field called foo? Won't it then be ambiguous?

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Andrew Dunstan



On 05/05/2014 09:38 AM, Bruce Momjian wrote:

On Sun, May  4, 2014 at 10:24:54AM -0400, Andrew Dunstan wrote:

On 05/04/2014 10:12 AM, Petr Jelinek wrote:

On 04/05/14 14:46, Bruce Momjian wrote:

I have completed the initial version of the 9.4 release notes.  You can
view them here:

http://www.postgresql.org/docs/devel/static/release-9-4.html

I will be adding additional markup in the next few days.

Feedback expected and welcomed.  I expect to be modifying this until we
release 9.4 final.  I have marked items where I need help with question
marks.


Nice work,

one comment from me would be about jsonb:

+   
+
+ Add structured (non-text) data type (jsonb) for storing
JSON data (Oleg Bartunov,  Teodor Sigaev, Peter Geoghegan and
Andrew Dunstan)
+
+
+
+ This data type allows faster indexing and access to json
keys/value pairs.
+
+   

I think the explanation should be expanded to say that this data
type also has generic indexing not just faster indexing.



It's also slightly ambiguous - it's not immediately clear that the
"faster" also applies to the "access".

OK, how is this:

  This data type allows for faster indexing and access to json
  key/value pairs, as well as efficient indexing of all key/value
  pairs in a JSON document.



No, I think you're missing the point of what I'm saying, and I think the 
addition is at best misleading. I would avoid talk of key value pairs, 
anyway, that's not all that's in a json document (it might be an array, 
for example).


How about:

   This data type allows for faster access to values in the json document and 
faster and more useful indexing of json.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)

2014-05-05 Thread Pavel Stehule

2014-05-05 17:02 GMT+02:00 Merlin Moncure :

> On Sat, May 3, 2014 at 5:48 AM, Marko Tiikkaja  wrote:
> > On 5/2/14, 10:10 PM, Merlin Moncure wrote:
> >>
> >> On Fri, May 2, 2014 at 3:03 PM, Tom Lane  wrote:
> >>>
> >>> Meh.  Then you could have a query that works fine until you add a
> column
> >>> to the table, and it stops working.  If nobody ever used column names
> >>> identical to table names it'd be all right, but unfortunately people
> >>> seem to do that a lot...
> >>
> >>
> >> That's already the case with select statements
> >
> > I don't think that's true if you table-qualify your column references and
> > don't use SELECT *.
> >
> >
> >> and, if a user were
> >> concerned about that, always have the option of aliasing the table as
> >> nearly 100% of professional developers do:
> >>
> >> SELECT f FROM foo f;
> >> etc.
> >
> >
> > So e.g.:
> >
> >   UPDATE foo f SET f = ..;
> >
> > would resolve to the table, despite there being a column called "f"? That
> > would break backwards compatibility.
> >
> > How about:
> >
> >   UPDATE foo SET ROW(foo) = (1,2,3);
> >
> > ISTM that this could be parsed unambiguously, though it's perhaps a bit
> > ugly.
>
> Hm, that's a bit too ugly: row(foo) in this case means 'do special
> behavior X' whereas in all other cases it means make an anonymous
> rowtype with one attribute of type 'foo'.
>
> How about:
> UPDATE foo SET (foo).* = (1,2,3);
>

It is looking little bit strange

I like previous proposal UPDATE foo SET foo = (1,2,3);

Pavel


>
> merlin
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)

2014-05-05 Thread Merlin Moncure

On Sat, May 3, 2014 at 5:48 AM, Marko Tiikkaja  wrote:
> On 5/2/14, 10:10 PM, Merlin Moncure wrote:
>>
>> On Fri, May 2, 2014 at 3:03 PM, Tom Lane  wrote:
>>>
>>> Meh.  Then you could have a query that works fine until you add a column
>>> to the table, and it stops working.  If nobody ever used column names
>>> identical to table names it'd be all right, but unfortunately people
>>> seem to do that a lot...
>>
>>
>> That's already the case with select statements
>
> I don't think that's true if you table-qualify your column references and
> don't use SELECT *.
>
>
>> and, if a user were
>> concerned about that, always have the option of aliasing the table as
>> nearly 100% of professional developers do:
>>
>> SELECT f FROM foo f;
>> etc.
>
>
> So e.g.:
>
>   UPDATE foo f SET f = ..;
>
> would resolve to the table, despite there being a column called "f"? That
> would break backwards compatibility.
>
> How about:
>
>   UPDATE foo SET ROW(foo) = (1,2,3);
>
> ISTM that this could be parsed unambiguously, though it's perhaps a bit
> ugly.

Hm, that's a bit too ugly: row(foo) in this case means 'do special
behavior X' whereas in all other cases it means make an anonymous
rowtype with one attribute of type 'foo'.

How about:
UPDATE foo SET (foo).* = (1,2,3);

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian

On Mon, May  5, 2014 at 10:18:44AM -0400, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > On Sun, May  4, 2014 at 03:49:27PM +0200, Andres Freund wrote:
> 
> > > +  
> > > +   
> > > +Add huge_pages configuration parameter to attempt to use huge 
> > > translation look-aside buffer (TLB) pages on Linux (Christian Kruse,
> > > +Richard Poole, Abhijit Menon-Sen)
> > > +   
> > > 
> > > I think this is too detailed - how about: "... to use huge pages ..."?
> > 
> > I am not sure on this as the default is "try", but I updated the wording
> > like your suggested:
> > 
> > Add huge_pages configuration parameter to enable huge
> > translation look-aside buffer (TLB) pages on Linux
> 
> Please see
> http://www.postgresql.org/message-id/20140226161302.gd4...@eldon.alvh.no-ip.org
> about the "TLB huge pages" terminology.  Maybe "..enable huge MMU
> pages..." as suggested by Peter G. in that sub-thread.

You are totally correct.  I was referencing tlb from the _old_ name for
the variable, but didn't update the description when the variable was
renamed.

New text is:

Add huge_pages configuration parameter to use huge memory pages on 
Linux (Christian Kruse,
Richard Poole, Abhijit Menon-Sen)

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Stephen Frost

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Including the value of listen_addresses along w/ the port would make it
> > useful.  If we really don't want the cluster-name concept (which,
> > personally, I like quite a bit), how about including the listen_address
> > value if it isn't '*'?
> 
> Nah, let's do cluster name.  That way, somebody who's only got one
> postmaster isn't suddenly going to see a lot of useless clutter,
> ie the user gets to decide what to add to ps output.  "SHOW cluster_name"
> might be useful at the application level as well, I suspect.

Ah, yes, agreed, that could be quite useful.

> I still think the brackets are unnecessary though.

Either way is fine for me on this.

> Also, -1 for adding another log_line_prefix escape.  If you're routing
> multiple clusters logging to the same place (which is already a bit
> unlikely IMO), you can put distinguishing strings in log_line_prefix
> already.  And it's not like we've got an infinite supply of letters
> for those escapes.

Agreed.

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Alvaro Herrera

Bruce Momjian wrote:
> On Sun, May  4, 2014 at 03:49:27PM +0200, Andres Freund wrote:

> > +  
> > +   
> > +Add huge_pages configuration parameter to attempt to use huge 
> > translation look-aside buffer (TLB) pages on Linux (Christian Kruse,
> > +Richard Poole, Abhijit Menon-Sen)
> > +   
> > 
> > I think this is too detailed - how about: "... to use huge pages ..."?
> 
> I am not sure on this as the default is "try", but I updated the wording
> like your suggested:
> 
>   Add huge_pages configuration parameter to enable huge
>   translation look-aside buffer (TLB) pages on Linux

Please see
http://www.postgresql.org/message-id/20140226161302.gd4...@eldon.alvh.no-ip.org
about the "TLB huge pages" terminology.  Maybe "..enable huge MMU
pages..." as suggested by Peter G. in that sub-thread.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian

On Sun, May  4, 2014 at 08:46:07AM -0400, Bruce Momjian wrote:
> I have completed the initial version of the 9.4 release notes.  You can
> view them here:
> 
>   http://www.postgresql.org/docs/devel/static/release-9-4.html
> 
> I will be adding additional markup in the next few days.
> 
> Feedback expected and welcomed.  I expect to be modifying this until we
> release 9.4 final.  I have marked items where I need help with question
> marks.

I have updated output reflecting all hackers list feedback received so
far:

http://momjian.us/pgsql_docs/release-9-4.html

(The developer copy of the docs take a while to update.)  Now on to the
markup additions.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Tom Lane

Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> How about dropping the brackets, and the cluster-name concept, and
>> just doing
>> 
>> postgres: 5432 checkpointer process

> -1 for my part, as I'd just end up with a bunch of those and no
> distinction between the various processes.  In other words, without a
> cluster distinction, it's useless.

Yeah, after I sent that I got to the bit about running multiple
postmasters with different IP-address bindings.  I agree the port number
alone wouldn't be enough in that scenario.

> Including the value of listen_addresses along w/ the port would make it
> useful.  If we really don't want the cluster-name concept (which,
> personally, I like quite a bit), how about including the listen_address
> value if it isn't '*'?

Nah, let's do cluster name.  That way, somebody who's only got one
postmaster isn't suddenly going to see a lot of useless clutter,
ie the user gets to decide what to add to ps output.  "SHOW cluster_name"
might be useful at the application level as well, I suspect.

I still think the brackets are unnecessary though.

Also, -1 for adding another log_line_prefix escape.  If you're routing
multiple clusters logging to the same place (which is already a bit
unlikely IMO), you can put distinguishing strings in log_line_prefix
already.  And it's not like we've got an infinite supply of letters
for those escapes.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Minor improvement to fdwhandler.sgml

2014-05-05 Thread Robert Haas

On Wed, Apr 30, 2014 at 4:10 AM, Etsuro Fujita
 wrote:
> (2014/04/28 23:31), Robert Haas wrote:
>> On Thu, Apr 24, 2014 at 7:59 AM, Etsuro Fujita
>>  wrote:
>>>
>>> The patch attached improves docs in fdwhandler.sgml a little bit.
>>
>>
>> When you submit a patch, it's helpful to describe what the patch
>> actually does, rather than just saying it makes things better.  For
>> example, I think that this patch could be described as "in
>> fdwhandler.sgml, mark references to scan_clauses with 
>> tags".
>
>
> I thought so.  Sorry, my explanation wasn't enough.
>
>
>> A problem with that idea is that scan_clauses is not a field in any
>> struct.
>
>
> I was mistaken.  I think those should be marked with  tags. Patch
> attached.

OK, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Stephen Frost

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> How about dropping the brackets, and the cluster-name concept, and
> just doing
> 
>  postgres: 5432 checkpointer process

-1 for my part, as I'd just end up with a bunch of those and no
distinction between the various processes.  In other words, without a
cluster distinction, it's useless.

Including the value of listen_addresses along w/ the port would make it
useful.  If we really don't want the cluster-name concept (which,
personally, I like quite a bit), how about including the listen_address
value if it isn't '*'?  I could see that also helping users who
installed from a distro and got '127.0.0.1' and don't understand why
they can't connect...

Of course, these are users who can use 'ps' but not 'netstat'.  Not sure
how big that set really is.

Thanks,

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Tom Lane

Andres Freund  writes:
> On 2014-05-05 10:49:19 +, Thomas Munro wrote:
>> Hah -- I agree, but on systems using setproctitle, the program name and ":
>> " are provided already, so the end result would have to be different on
>> those systems and I figured it should be the same everywhere if possible.

> Fair point.

How about dropping the brackets, and the cluster-name concept, and
just doing

 postgres: 5432 checkpointer process

>>> Aren't you potentially dereferencing a NULL pointer here?

>> Hmm -- I thought the GUC machinery would make sure cluster_name either
>> pointed to the default I provided, an empty string, or a string read from
>> the configuration file.  Perhaps I need to go and read up on how GUCs work.

> That's true - but I am not sure you can guarantee it's only called after
> the GUC machinery has started up.

The elog code MUST be able to work before GUC initialization is done.
What do you think will happen if we fail to open postgresql.conf,
for example?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] tab completion for setting search_path

2014-05-05 Thread Bernd Helmle




--On 3. Mai 2014 10:11:33 +0200 Andres Freund  
wrote:



diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
new file mode 100644
index 6d26ffc..dec3d4a
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
*** psql_completion(const char *text, int st
*** 3230,3235 
--- 3230,3242 

COMPLETE_WITH_LIST(my_list);
}
+   else if (pg_strcasecmp(prev2_wd, "search_path") == 0)
+   {
+   COMPLETE_WITH_QUERY(Query_for_list_of_schemas
+   " AND nspname not 
like 'pg\\_%%' "
+   " AND nspname not 
like 'information_schema' "
+   " UNION SELECT 
'DEFAULT' ");
+   }


Why should we exclude system schemata? That seems more likely to be
confusing than helpful? I can see a point in excluding another backend's
temp tables, but otherwise?


I put my hands on this a while ago, too, but had a different notion in 
mind, which schema the completion should select. I came up with the 
following:




Just complete to a schema someone has CREATE or USAGE privs. However, the 
reason i stopped working on it was that i really want to have a completion 
to a list of schemas as well and i couldn't figure a good and easy way to 
do this atm.


--
Thanks

Bernd


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian

On Mon, May  5, 2014 at 09:23:15AM +0300, Heikki Linnakangas wrote:
> On 05/04/2014 03:46 PM, Bruce Momjian wrote:
> >Auto-resize the catalog cache (Heikki Linnakangas)
> >
> >This reduces memory consumption for backends accessing only a few
> >tables, and improves performance for backend accessing many tables.
> 
> Move this to "General performance" section.

OK, moved.

> >Improve spinlock speed on x86_64 CPUs (test on i386?) (Heikki Linnakangas)
> 
> I believe this refers to commit b03d196b. The "test on i386" note
> can just be removed. Someone ought to test it on 32-bit i386 to see
> if the same change would be beneficial there, but that's a TODO item
> more than a release notes item. I doubt anyone's interested to spend
> time performance testing spinlocks on 32-bit i386, though, so I
> think we're going to just retain the current situation for the next
> decade.

Agreed.  Text removed.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian

On Mon, May  5, 2014 at 09:10:11AM +0300, Heikki Linnakangas wrote:
> On 05/04/2014 03:46 PM, Bruce Momjian wrote:
> >I have completed the initial version of the 9.4 release notes.
> 
> Thanks!
> 
> >You can view them here:
> >
> > http://www.postgresql.org/docs/devel/static/release-9-4.html
> >
> >I will be adding additional markup in the next few days.
> >
> >Feedback expected and welcomed.  I expect to be modifying this until we
> >release 9.4 final.  I have marked items where I need help with question
> >marks.
> 
> Two rather large changes to the B-tree algorithms are missing:
> 
> * Fix race condition in B-tree page deletion (commit efada2b8)
> 
> * Make the handling of interrupted B-tree page splits more robust
> (commit 40dae7ec)
> 
> These changes are not visible to users, but they are good toknow for
> beta testers.

Thanks, added.  I though these were fixes for 9.4-only bugs, not fixes
for earlier bugs.  Thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian

On Sun, May  4, 2014 at 10:24:54AM -0400, Andrew Dunstan wrote:
> 
> On 05/04/2014 10:12 AM, Petr Jelinek wrote:
> >On 04/05/14 14:46, Bruce Momjian wrote:
> >>I have completed the initial version of the 9.4 release notes.  You can
> >>view them here:
> >>
> >>http://www.postgresql.org/docs/devel/static/release-9-4.html
> >>
> >>I will be adding additional markup in the next few days.
> >>
> >>Feedback expected and welcomed.  I expect to be modifying this until we
> >>release 9.4 final.  I have marked items where I need help with question
> >>marks.
> >>
> >
> >Nice work,
> >
> >one comment from me would be about jsonb:
> >
> >+   
> >+
> >+ Add structured (non-text) data type (jsonb) for storing
> >JSON data (Oleg Bartunov,  Teodor Sigaev, Peter Geoghegan and
> >Andrew Dunstan)
> >+
> >+
> >+
> >+ This data type allows faster indexing and access to json
> >keys/value pairs.
> >+
> >+   
> >
> >I think the explanation should be expanded to say that this data
> >type also has generic indexing not just faster indexing.
> >
> 
> 
> It's also slightly ambiguous - it's not immediately clear that the
> "faster" also applies to the "access".

OK, how is this:

 This data type allows for faster indexing and access to json
 key/value pairs, as well as efficient indexing of all key/value
 pairs in a JSON document.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cascading replication and archive_command

2014-05-05 Thread Heikki Linnakangas


On 05/05/2014 04:19 PM, Michael Renner wrote:

Hi,

apparently a few users were puzzled that archive_command is ignored
on slave servers, which comes as a surprise since streaming
replication will work fine from slaves and as far as I’ve checked the
documentation also doesn’t point out the fact that archive_command
gets a different treatment.

Is this intentional or an oversight? Should this be fixed in the code
(feature parity to SR) or in the documentation making this more
explicit?


It was intentional, although I can certainly understand the viewpoint 
that archive_command should also archive in the standby. IIRC people 
argued it both ways when the cascading replication was discussed


The current assumption is that the archive is shared by the master and 
standby (or standbys), so that there is no point in archiving the same 
file again in the standby. On the contrary, re-archiving the same file 
would fail, so you would need to disable archiving in the standby, and 
re-enable it when promoting, which would be more complicated.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] folder:lk/lk date:yesterday..

2014-05-05 Thread Heikki Linnakangas


On 04/30/2014 06:39 PM, Andres Freund wrote:

Hi,

Coverity flagged a couple of issues that seem to worth addressing by
changing the code instead of ignoring them:

01) heap_xlog_update() looks to coverity as if it could trigger a NULL
 pointer dereference. That's because it thinks that oldtup.t_data is
 NULL if XLR_BKP_BLOCK(0) while reconstructing incremental
 tuples. That fortunately can't happen as incremental updates are
 only performed if the tuples are on the same page.
 Add an Assert().
02) Be a bit more consistent in what type to use for a size
 variable. Inconsequential, but more consistent.
03) Don't leak memory after connection aborts in pg_recvlogical.
04) Use a sensible parameter for memset() when randomizing memory in
 reorderbuffer. Inconsequential.

Could somebody please pick these up?


Committed, thanks.


I have to say, I am not particularly happy with the complexity of the
control flow in heap_xlog_update() :(.


Agreed :-(. It might be good to split it into two functions, one for 
same-page updates and another for others. And have two different WAL 
record structs for the cases, too.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian

On Sun, May  4, 2014 at 06:06:52PM +0400, Oleg Bartunov wrote:
> Bruce,
> 
> you forgot Alexander Korotkov, who contributed jsonb_hash_ops opclass
> for GIN.  Something like
> "Alexander Korotkov introduced an elegant jsonb_hash ops for GIN,
> which competes with MongoDB performance in contains operator".
> 
> Here is a link to discussion -
> http://www.postgresql.org/message-id/53304439.8000...@dunslane.net

OK, added.  As I remember Alexander's name was not in the initial commit
but mentioned in a later one, and I somehow missed that.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian

On Sun, May  4, 2014 at 03:49:27PM +0200, Andres Freund wrote:
> Hi,
> 
> On 2014-05-04 08:46:07 -0400, Bruce Momjian wrote:
> > Feedback expected and welcomed.  I expect to be modifying this until we
> > release 9.4 final.  I have marked items where I need help with question
> > marks.
> 
> Thanks for doing that work. Some comments inline:

Oh, I forgot to thank committers for making this job easier every year. 
The commit titles are helpful, the descriptions good, and many items are
marked as backward incompatible where appropriate.  Git hashes make
looking up commit details easy.

> +
> + 
> +  Remove system column pg_class.reltoastidxid (Michael Paquier)
> + 
> +
> + 
> +  Instead use normal index access methods.
> + 
> +
> 
> The explanation doesn't seem helpful to me. I'd just remove it as the
> full explanation seems to be to complex and not actually interesting.

OK, description removed.  I know pg_upgrade has to be modified to handle
this.  Hopefully others will understand how to adjust things.  You are
right that the full description is complex.

> 
> +   
> +Server
> +
> + 
> +
> +  
> +   
> +Have VACUUM properly report dead but not removable rows to the 
> statistics collector (Hari Babu)
> +   
> +
> +   
> +Previously these were reported as live rows.
> +   
> +  
> +
> +  
> +   
> +Improve SSL renegotiation handling (Álvaro Herrera)
> +   
> +  
> +
> +  
> +   
> +During immediate shutdown, send uncatchable termination signals to 
> child processes that have not already shutdown (MauMau,
> +Álvaro Herrera)
> +   
> +
> +   
> +This reduces the likelihood of orphaned child processes after 
> postmaster shutdown.
> +   
> +  
> +
> +  
> +   
> +Improve randomness of the database system identifier (Tom Lane)
> +   
> +  
> +
> +  
> +   
> +Allow background workers to be dynamically started and terminated 
> (Robert Haas)
> +   
> 
> This is much more important than the previous features imo. Also, I'd
> add the word "registered" in the above list.
> 
> +  
> +   
> +Allow dynamic allocation of shared memory segments (Robert Haas, 
> Amit Kapila)
> +   
> +  
> 
> Should also be earlier in the list.

OK, added "registered" and I moved this item and the one above to #2 and
#3.  Let me know if that isn't good.  I was thinking the dynamic stuff
wasn't useful until we had parallelism working, but I think I was wrong.

> 
> +  
> +   
> +Freeze tuples when tables are written with CLUSTER or VACUUM FULL 
> (Robert Haas, Andres Freund)
> +   
> +
> +   
> +This avoids later freezing overhead.
> +   
> +  
> 
> This sentence seems a bit awkward to me. Maybe "This avoids the need to
> freeze tuples at some later point in time."

OK, I wrote, "This avoids the need to freeze the tuples in the future."
> 
> +  
> +   
> +Improve speed of accessing many sequence values (David Rowley)
> +   
> +  
> 
> I think this description isn't accurate. Isn't it something like
> "Improve speed of accesessing many different sequences in the same backend."

Yes, better, thanks.

> +  
> +   
> +Use memory barriers to avoid some spinlock use (Heikki Linnakangas)
> +   
> +  
> 
> This is about 1a3d104475ce01326fc00601ed66ac4d658e37e5? If so I'd put it
> as: "Reduce spinlock contention during WAL replay." or similar.

Uh, yes.  I am not sure why I fixed on the memory barrier/spinlock part.
I used your text and moved it to the replication section.

> This imo deserves to be further up this list as this is one of the
> biggest bottlenecks in HS node performance.

I didn't move it too high up as it isn't a user-visible change, but I
put it at the top of the non-visible part.  Let me know if it needs
further promotion.

> +  
> +   
> +Add huge_pages configuration parameter to attempt to use huge 
> translation look-aside buffer (TLB) pages on Linux (Christian Kruse,
> +Richard Poole, Abhijit Menon-Sen)
> +   
> 
> I think this is too detailed - how about: "... to use huge pages ..."?

I am not sure on this as the default is "try", but I updated the wording
like your suggested:

Add huge_pages configuration parameter to enable huge
translation look-aside buffer (TLB) pages on Linux

> +   
> +This can improve performance on large memory systems.
> +   
> +  
> 
> Should this be in the performance section?

Well, performance is listed as "General Performance". We also have index
changes that help performance too but they are under "Index".  I think
it is probably in the right place as "configuration" is more specific
than "general performance".

> +  
> +   
> +Add configuration variable data_checksums to report whether data 
> page checksums

[HACKERS] Cascading replication and archive_command

2014-05-05 Thread Michael Renner

Hi,

apparently a few users were puzzled that archive_command is ignored on slave 
servers, which comes as a surprise since streaming replication will work fine 
from slaves and as far as I’ve checked the documentation also doesn’t point out 
the fact that archive_command gets a different treatment.

Is this intentional or an oversight? Should this be fixed in the code (feature 
parity to SR) or in the documentation making this more explicit?

best,
Michael

[1] 
http://serverfault.com/questions/514136/postgresql-archive-command-on-a-cascading-standby-server
 as well as personal contacts


signature.asc
Description: Message signed with OpenPGP using GPGMail

Re: [HACKERS] Fix initdb for path with whitespace and at char

2014-05-05 Thread Heikki Linnakangas


On 05/01/2014 07:55 AM, Amit Kapila wrote:

On Wed, Apr 30, 2014 at 4:01 PM, Heikki Linnakangas
 wrote:

I committed the non-invasive fixes to backbranches (and master too, just to
keep it in sync), but the attached is what I came up with for master.

There are a couple of places in the code where we have #ifdef WIN32 code
that uses CreateProcess with "CMD /C ..." directly.


1. Do we need similar handling for CreatePipe case where it directly uses
executable path such as in function pipe_read_line()?
Currently the caller of pipe_read_line() calls canonicalize_path() to change
win32 specific path, is that sufficient or do we need SYSTEMQUOTE type
of handling for it.


No, SYSTEMQUOTE style handling is not required with CreateProcess. 
find_other_exec, which is the only caller of pipe_read_line, adds one 
pair of double-quotes around the executable's path, which is enough for 
CreateProcess.



2.
system.c
#include 
Do we really need inclusion of assert.h or this is for future use?


You're right, that's not needed.


3. One more observation is that currently install.bat doesn't work
for such paths:
install.bat "e:\PostgreSQL\master\install 1\ins@1"
1\ins@1""=="" was unexpected at this time.


Yeah, I noticed that. I haven't looked into what it would take to fix 
that, but for now you can just install to a path that doesn't contain 
whitespace, and move it from there. install.bat is only used by 
developers / packagers, so that's not a big deal.



4. Similar to Andrew, I also could not reproduce this problem on my
Windows system (Windows 7 64 bit)
e:\>"e:\PostgreSQL\master\install 1\ins@1\bin\initdb.exe" -D "e:
\PostgreSQL\master\Data 1"
e:\>"e:\PostgreSQL\master\install 1\ins@1\bin\pg_ctl.exe" start -D "e:
\PostgreSQL\master\Data 1"

Above commands work fine.


Hmm, I'm also testing on 64-bit Windows 7, and it failed for me. Note 
that I already committed the narrow fix for initdb - which I also 
backpatched - to master, so to reproduce you'll need to revert that 
locally (commit 503de546).


I fixed the issues with malloc that Tom pointed out and committed the 
wrapper functions to git master. But please let me know if there's still 
a problem with it.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Thomas Munro

On 5 May 2014 10:49, Thomas Munro  wrote:

> On 5 May 2014 10:10, Andres Freund  wrote:
>
>> I guess the question is where this should be available as well. At the
>> very least I'd want to reference it in log_line_prefix as well?
>>
>
> Good idea, I will look into that.
>

See v2 patch attached which lets you use %C for cluster name in the log
prefix.

Maybe it would be overkill, but seeing the various responses about which
information belongs in the ps string, I guess we could also use a format
string with %blah fields for that.  Then the Debian/Ubuntu package users
who tend to think of the major version + name as the complete cluster
identifier could use "[%V/%C] ..." to get "postgres: [9.3/main] ...", and
others could throw in a "%P" to see a port number.
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 977fc66..0adfee7 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2345,6 +2345,12 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 else
 	appendStringInfo(buf, "%lx.%x", (long) (MyStartTime), MyProcPid);
 break;
+			case 'C':
+if (padding != 0)
+	appendStringInfo(buf, "%*s", padding, cluster_name);
+else
+	appendStringInfoString(buf, cluster_name);
+break;
 			case 'p':
 if (padding != 0)
 	appendStringInfo(buf, "%*d", padding, MyProcPid);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 15020c4..7f7fd52 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -449,6 +449,7 @@ int			temp_file_limit = -1;
 
 int			num_temp_buffers = 1024;
 
+char	   *cluster_name;
 char	   *data_directory;
 char	   *ConfigFileName;
 char	   *HbaFileName;
@@ -3091,6 +3092,17 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
+		{"cluster_name", PGC_POSTMASTER, CUSTOM_OPTIONS,
+			gettext_noop("Sets the name of the cluster that appears in 'ps' output."),
+			NULL,
+			GUC_IS_NAME
+		},
+		&cluster_name,
+		"",
+		NULL, NULL, NULL
+	},
+
+	{
 		{"data_directory", PGC_POSTMASTER, FILE_LOCATIONS,
 			gettext_noop("Sets the server's data directory."),
 			NULL,
diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c
index 6294ca3..ead7ea4 100644
--- a/src/backend/utils/misc/ps_status.c
+++ b/src/backend/utils/misc/ps_status.c
@@ -29,6 +29,7 @@
 #include "libpq/libpq.h"
 #include "miscadmin.h"
 #include "utils/ps_status.h"
+#include "utils/guc.h"
 
 extern char **environ;
 bool		update_process_title = true;
@@ -264,15 +265,24 @@ init_ps_display(const char *username, const char *dbname,
 	 * apparently setproctitle() already adds a `progname:' prefix to the ps
 	 * line
 	 */
-	snprintf(ps_buffer, ps_buffer_size,
-			 "%s %s %s ",
-			 username, dbname, host_info);
+#define PROGRAM_NAME_PREFIX ""
 #else
-	snprintf(ps_buffer, ps_buffer_size,
-			 "postgres: %s %s %s ",
-			 username, dbname, host_info);
+#define PROGRAM_NAME_PREFIX "postgres: "
 #endif
 
+	if (*cluster_name == '\0')
+	{
+		snprintf(ps_buffer, ps_buffer_size,
+ PROGRAM_NAME_PREFIX "%s %s %s ",
+ username, dbname, host_info);
+	}
+	else
+	{
+		snprintf(ps_buffer, ps_buffer_size,
+ PROGRAM_NAME_PREFIX "[%s] %s %s %s ",
+ cluster_name, username, dbname, host_info);
+	}
+
 	ps_buffer_cur_len = ps_buffer_fixed_size = strlen(ps_buffer);
 
 	set_ps_display(initial_str, true);
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index be68f35..639288a 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -223,6 +223,7 @@ extern int	temp_file_limit;
 
 extern int	num_temp_buffers;
 
+extern char *cluster_name;
 extern char *data_directory;
 extern char *ConfigFileName;
 extern char *HbaFileName;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Stephen Frost

* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2014-05-05 07:58:30 -0400, Stephen Frost wrote:
> > > I guess the question is where this should be available as well. At the
> > > very least I'd want to reference it in log_line_prefix as well?
> > 
> > I'm not entirely sure that I see the point of having it in
> > log_line_prefix- each cluster logs to its own log file which includes
> > the cluster name (at least on Debian/Ubuntu and friends).  The only use
> > case I can imagine here would be for syslog, but you could just *set*
> > the cluster name in the log_line_prefix, as it'd be (by definition..)
> > configurable per cluster.
> 
> So I've to configure it in multiple locations? I don't see the point. I
> usually try to configure as much in common/template config files that
> are included. Everything that doesn't have to be overwritten is good.

I see the point- we've already got quite a few %whatever's and adding
mostly useless ones may just add confusion and unnecessary complexity.
If you've already got your postgresql.conf templated through
puppet/chef/salt/whatever then having to add another '%{ CLUSTER_NAME }%'
or whatever shouldn't be a terribly difficult thing.

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Andres Freund

On 2014-05-05 07:58:30 -0400, Stephen Frost wrote:
> > I guess the question is where this should be available as well. At the
> > very least I'd want to reference it in log_line_prefix as well?
> 
> I'm not entirely sure that I see the point of having it in
> log_line_prefix- each cluster logs to its own log file which includes
> the cluster name (at least on Debian/Ubuntu and friends).  The only use
> case I can imagine here would be for syslog, but you could just *set*
> the cluster name in the log_line_prefix, as it'd be (by definition..)
> configurable per cluster.

So I've to configure it in multiple locations? I don't see the point. I
usually try to configure as much in common/template config files that
are included. Everything that doesn't have to be overwritten is good.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Stephen Frost

Craig,

* Craig Ringer (cr...@2ndquadrant.com) wrote:
>   postgres[5433]: checkpointer process
> 
> at least as useful. The only time that's not unique is in a BSD jail or
> lxc container, and in those cases IIRC ps can show you the
> jail/container anyway.

Uhh, that's not at all true.  You can trivially have multiple IPs on a
box w/o jails or containers (aliased interfaces) and then run PG on the
default port- which I find to be *far* more convenient than having the
same IP and a bunch of different ports.

What you *can't* have is two clusters with the same name and same major
version, at least on the Debian/Ubuntu distributions, and as such, I
would argue to also include the major version rather than include the
port, which you could get from pg_lsclusters.

> Showing the port would help new-ish users a lot; many seem to be very
> confused by which PostgreSQL instance(s) they're connecting to and which
> are running. Especially on Mac OS X, where people often land up with
> Apple's PostgreSQL, Homebrew, Postgres.app, and who knows what else
> running at the same time.

I'm far from convinced that showing the port in the ps output will
really help these users..  Not to mention that you can get that from
'netstat -anp' anyway.

Thanks,

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Stephen Frost

* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2014-05-05 10:00:34 +, Thomas Munro wrote:
> > When running more than one cluster I often find myself looking at
> > the output of 'iotop' or other tools wondering which
> > cluster's "wal receiver process" or "checkpointer process" etc
> > I'm seeing.
> 
> I wonder about that pretty regularly. To the point that I've a hacky
> version of this locally. So +1 for me for the idea in general.

Ditto.

> > If cluster_name is not set, it defaults to the empty string and
> > the ps output is unchanged.  If it's set to 'foox' the ps output
> > includes that string in square brackets:
> > 
> >   postgres: [foox] checkpointer process
> >   postgres: [foox] writer process
> >   postgres: [foox] wal writer process
> >   postgres: [foox] autovacuum launcher process
> >   postgres: [foox] stats collector process
> >   postgres: [foox] munro foodb [local] idle
> 
> "postgres: [foox] ..." should rather be "postgres[foox]: ..." imo ;)
> 
> I guess the question is where this should be available as well. At the
> very least I'd want to reference it in log_line_prefix as well?

I'm not entirely sure that I see the point of having it in
log_line_prefix- each cluster logs to its own log file which includes
the cluster name (at least on Debian/Ubuntu and friends).  The only use
case I can imagine here would be for syslog, but you could just *set*
the cluster name in the log_line_prefix, as it'd be (by definition..)
configurable per cluster.

I'd much rather see other things added as log_line_prefix options..  An
interesting thought that just occured to me would be to allow any GUC to
be added to log_line_prefix using some kind of extended % support (eg:
'%{my_guc_here}' or something...).  Would also be useful for extensions
which add GUCs then?  Not sure about specifics, but does seem like an
interesting idea.

Oh, and I know people will shoot me for bringing it up again, but I'd
still like to see the CSV format be configurable ala log_line_prefix,
and the same for any kind of logging (or auditing) to a table which we
might eventually support.  Yes, we need to work out how to do file
changes when it's updated and stick a header on each new file with the
columns included.

Thanks,

Stephen

signature.asc
Description: Digital signature

1 2 >

1 - 100 of 110 matches

Mail list logo