Re: [HACKERS] WAL replay bugs
On Thu, Apr 17, 2014 at 10:33 AM, Tom Lane wrote: >> Any objections to changing those two? > > Not here. I've always suspected #2 was going to bite us someday anyway. +1 -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] assertion failure 9.3.4
On 04/17/2014 09:04 PM, Peter Geoghegan wrote: On Thu, Apr 17, 2014 at 7:15 AM, Andrew Dunstan wrote: track_activity_query_size = 10240 shared_preload_libraries = 'auto_explain,pg_stat_statements' As you can see, auto_explain's log_min_duration hasn't been set, so it shouldn't be doing anything very much, I should think. track_activity_query_size is 10240? Not 1024? yes. I normally find 1024 is far too small. cheers andrew
Re: [HACKERS] DISCARD ALL (Again)
On Thursday, April 17, 2014, Joshua D. Drake wrote: > > On 04/17/2014 07:07 PM, David G Johnston wrote: > >> >> On 04/17/2014 05:24 PM, Tom Lane wrote: >> > On the whole I'm not sure this is something we ought to get into. >> > If you really need a fresh session, maybe you should start a >> > fresh session. >> >> >> Isn't the whole point to avoid the reconnection overhead, especially for >> connection poolers? DISCARD ALL shouldn't cause any cleanup that >> wouldn't otherwise occur when a session disconnects. True global data >> (not just session global) should be excluded. >> > > The GD is global to the session only (Like temp tables). Yes. Tom's response makes it sound like the proposal is to throw away the entire language environment for the whole server (thus needing super user privilege) so I'm pointing out that what we are discussing is not that invasive. > > >> A better wording of the promise would be: "discard all" leaves the >> session in the same state it would be in if the underlying connection >> were dropped and re-established. >> > > Except that it doesn't. > > But is this what you intend it to mean, by implementing these features, or are you thinking something different? David J.
Re: [HACKERS] DISCARD ALL (Again)
On 04/17/2014 07:07 PM, David G Johnston wrote: On 04/17/2014 05:24 PM, Tom Lane wrote: > On the whole I'm not sure this is something we ought to get into. > If you really need a fresh session, maybe you should start a > fresh session. Isn't the whole point to avoid the reconnection overhead, especially for connection poolers? DISCARD ALL shouldn't cause any cleanup that wouldn't otherwise occur when a session disconnects. True global data (not just session global) should be excluded. The GD is global to the session only (Like temp tables). A better wording of the promise would be: "discard all" leaves the session in the same state it would be in if the underlying connection were dropped and re-established. Except that it doesn't. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc Political Correctness is for cowards.
Re: [HACKERS] DISCARD ALL (Again)
* David G Johnston (david.g.johns...@gmail.com) wrote: > > On 04/17/2014 05:24 PM, Tom Lane wrote: > > > On the whole I'm not sure this is something we ought to get into. > > > If you really need a fresh session, maybe you should start a > > > fresh session. > > > Isn't the whole point to avoid the reconnection overhead, especially for > connection poolers? DISCARD ALL shouldn't cause any cleanup that wouldn't > otherwise occur when a session disconnects. True global data (not just > session global) should be excluded. Err, this is per-backend global data, so it will get cleaned up on session disconnect. > A better wording of the promise would be: "discard all" leaves the session > in the same state it would be in if the underlying connection were dropped > and re-established. This is not accurate. Thanks, Stephen
Re: [HACKERS] DISCARD ALL (Again)
> On 04/17/2014 05:24 PM, Tom Lane wrote: > > On the whole I'm not sure this is something we ought to get into. > > If you really need a fresh session, maybe you should start a > > fresh session. > > Isn't the whole point to avoid the reconnection overhead, especially for connection poolers? DISCARD ALL shouldn't cause any cleanup that wouldn't otherwise occur when a session disconnects. True global data (not just session global) should be excluded. A better wording of the promise would be: "discard all" leaves the session in the same state it would be in if the underlying connection were dropped and re-established. David J.
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 6:50 PM, Greg Stark wrote: > On Thu, Apr 17, 2014 at 4:48 PM, Peter Geoghegan wrote: >> Although I will add that not caching highly useful inner pages for the >> medium term, because that index isn't being used at all for 5 minutes >> probably is very bad. Using the 4,828 buffers that it would take to >> store all the inner pages (as in my large primary index example) to go >> store something else is probably penny wise and pound foolish. > > But there could easily be 20 unused indexes for every 1 index that is > being used. Sure, but then there might not be. Obviously there is a trade-off to be made between recency and frequency. One interesting observation in the LRU-K paper is that for their test case, a pure LFU actually works very well, despite, as the authors acknowledge, being a terrible algorithm in the real world. That's because their test case is so simple, and concerns only one table/index, with a uniform distribution. -- Peter Geoghegan
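To make the LFU observation concrete, here is a toy sketch in Python. It is purely illustrative and is not PostgreSQL's buffer manager: it shows why frequency-based eviction protects constantly touched pages (such as B-tree inner pages) under a simple uniform workload, while hinting at the real-world failure mode the LRU-K authors acknowledge — stale entries with historically high counts never age out.

```python
# Minimal pure-LFU cache sketch (illustration only, not PostgreSQL code).
class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}    # key -> value
        self.counts = {}  # key -> access frequency

    def get(self, key):
        if key in self.data:
            self.counts[key] += 1
            return self.data[key]
        return None

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # Evict the least-frequently-used entry; a hot inner page
            # with a large count is effectively never the victim.
            victim = min(self.counts, key=self.counts.get)
            del self.data[victim], self.counts[victim]
        self.data[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1

cache = LFUCache(2)
cache.put("inner-page", "hot")
for _ in range(100):          # inner pages are touched constantly
    cache.get("inner-page")
cache.put("leaf-1", "cold")
cache.put("leaf-2", "cold")   # evicts leaf-1, not the hot inner page
assert cache.get("inner-page") == "hot"
assert cache.get("leaf-1") is None
```

The flip side, of course, is that once the workload shifts, the old high-count pages linger — which is why pure LFU is a poor real-world policy.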
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 4:48 PM, Peter Geoghegan wrote: > Although I will add that not caching highly useful inner pages for the > medium term, because that index isn't being used at all for 5 minutes > probably is very bad. Using the 4,828 buffers that it would take to > store all the inner pages (as in my large primary index example) to go > store something else is probably penny wise and pound foolish. But there could easily be 20 unused indexes for every 1 index that is being used.
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
On Sat, Apr 12, 2014 at 1:21 PM, MauMau wrote: > Hello, Amit san, Tom san, > > I'm sorry for my late response. I've just caught up with the discussion. > I'm almost convinced. > > Please find attached the revised patch. I'd like to follow the idea of > adding a switch to pg_ctl. The newly added "-e event_source" sets the > event source name for pg_ctl to use. When -e is used with pg_ctl register, > it will be added to the command line for Windows service (pg_ctl > runservice). Currently the -e option is accepted with all the options that can be provided to pg_ctl. Shouldn't we accept it only with the service-related options, since that is the only time it will be used? Basically, write_stderr() will write to the event log only in the case of a service. Another minor point: you have forgotten to remove the declaration below: + static void get_config_value(const char *name, char *buf, int buf_size); Sorry for the delayed response; I am not sure that I will be able to complete the review of the patch in the next few days, as I will be on vacation. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] How can we make beta testing better?
On Thu, Apr 17, 2014 at 5:26 PM, Jeff Janes wrote: > A lot of the bugs that turned up are not the kind I would expect to have > been found in most beta testing done by non-hacking users. Weren't they > mostly around rare race conditions, crash recovery, and freezing? Actually I was struck by how the bugs in 9.3 were the kinds of bugs that should have turned up pretty quickly by user testing. They certainly turned up pretty quickly after users put their production applications on it. They *didn't* require rare race conditions, just certain patterns of workloads for long enough to reliably reproduce. They were specifically *not* the kinds of bugs that regression testing would have found. Regression testing only finds bugs you anticipate and think to put in the specification of correct behaviour. If you had thought of these problems you would have tested them manually, and in any case you would have seen the omissions immediately on inspecting the code. Crash recovery and freezing aren't rare things once you have hot standbys everywhere and run 24x7 applications (or load tests) on your systems. We could make freezing more frequent by having a mode that bumps the xid by a few million randomly. That would still be pretty hit and miss whether it happens to wrap around in any particular state. -- greg
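For context on why wraparound is worth forcing in tests: 32-bit transaction IDs compare circularly, modulo 2^32. The following sketch mirrors the idea behind TransactionIdPrecedes (interpret the difference as a signed 32-bit value); it is an illustration, not PostgreSQL's actual source.

```python
# Circular comparison of 32-bit xids (illustrative sketch).
def xid_precedes(a, b):
    diff = (a - b) & 0xFFFFFFFF
    if diff & 0x80000000:       # sign bit set: negative in two's complement
        diff -= 1 << 32
    return diff < 0

assert xid_precedes(100, 200)
# After advancing roughly 2 billion xids, "100" flips to being in the
# future relative to the current xid -- the wraparound hazard that
# freezing exists to prevent:
assert not xid_precedes(100, (100 + 2**31 + 5) & 0xFFFFFFFF)
```

Randomly bumping the xid counter by a few million, as suggested above, simply makes this flip happen often enough to be exercised during a beta cycle.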
Re: [HACKERS] Buildfarm "master-next" branch?
On 04/17/2014 10:38 PM, Tom Lane wrote: > IMO the best single thing that could happen for the buildfarm is if > we had more critters (at least twice as many) running a wider variety of > platforms, compilers, and configuration options than there are today. > More frequent runs would come out of that automatically. I'll be bringing up a new Windows buildfarm member once I've got a current project knocked off. It's a pretty fast dedicated Windows Server 2012 box with a wide range of SDKs on it that can do 32-bit and 64-bit builds. Should help a little. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] assertion failure 9.3.4
On Thu, Apr 17, 2014 at 7:15 AM, Andrew Dunstan wrote: > track_activity_query_size = 10240 > shared_preload_libraries = 'auto_explain,pg_stat_statements' > > As you can see, auto_explain's log_min_duration hasn't been set, so it > shouldn't be doing anything very much, I should think. track_activity_query_size is 10240? Not 1024? -- Peter Geoghegan
Re: [HACKERS] DISCARD ALL (Again)
"Joshua D. Drake" writes: > On 04/17/2014 05:24 PM, Tom Lane wrote: >> 1. The things that DISCARD ALL resets are explicitly enumerated in its >> documentation page; it is not an open-ended promise to clean up anything >> anybody happens to think of. > Actually, it is. Unless we change the wording in the documentation Well, we definitely need to tweak the short description of DISCARD ALL. But the long description a bit further down the page is quite clear. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DISCARD ALL (Again)
On 04/17/2014 05:24 PM, Tom Lane wrote: "Joshua D. Drake" writes: Does it seem reasonable based on the docs: DISCARD ALL: Releases all temporary resources associated with the current session and resets the session to its initial state. That we should also release the GD? There are a couple of reasons why this isn't anywhere near as open-and-shut as you seem to think: 1. The things that DISCARD ALL resets are explicitly enumerated in its documentation page; it is not an open-ended promise to clean up anything anybody happens to think of. Actually, it is. Unless we change the wording in the documentation, because right now DISCARD ALL states that it will return the session back to its initial state. It does not do that. [... snipping technical details for those who would be better at arguing than I ... ] On the whole I'm not sure this is something we ought to get into. If you really need a fresh session, maybe you should start a fresh session. This isn't nearly as simple as it sounds. That said, the workaround we came up with was to have server_reset_query in pg_bouncer call a function called discard_gd() that explicitly discards the GD for that session. But if we are going to get into it, I think the permissions aspect needs some careful thought. Agreed. Sincerely, JD
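The discard_gd() workaround mentioned here might look roughly like the following. This is a sketch of the semantics only: GD below is a stand-in dict for PL/Python's per-session global dictionary (which is a plain dict on the Python side), and the real function would be installed with CREATE FUNCTION ... LANGUAGE plpythonu and invoked from the pooler's server_reset_query. Nothing here has been tested against a live server.

```python
# Stand-in for PL/Python's session-global dictionary.
GD = {}

def cache_plan(name, plan):
    # A typical use of GD: memoize a prepared plan across function calls.
    GD.setdefault(name, plan)
    return GD[name]

def discard_gd():
    # Body of the hypothetical workaround function: DISCARD ALL does not
    # touch GD, so the pooler must clear it explicitly.
    GD.clear()

cache_plan("fetch_queue", "<prepared plan>")
assert "fetch_queue" in GD
discard_gd()          # what server_reset_query would invoke
assert GD == {}
```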
Re: [HACKERS] DISCARD ALL (Again)
"Joshua D. Drake" writes: > Does it seem reasonable based on the docs: > DISCARD ALL: > Releases all temporary resources associated with the current session and > resets the session to its initial state. > That we should also release the GD? There are a couple of reasons why this isn't anywhere near as open-and-shut as you seem to think: 1. The things that DISCARD ALL resets are explicitly enumerated in its documentation page; it is not an open-ended promise to clean up anything anybody happens to think of. In particular, I would object to adding any such functionality that doesn't correspond to a more specific form of DISCARD, RESET, or whatever. 2. While I'm no Python expert, I believe GD is just a specific instance of a general capability for global state in Python. Are we going to promise that any and all user-created data inside Python goes away? What about other PLs? Will users thank us if this suddenly starts happening? 3. Since PL/Python is a superuser-only language, destroying either all global state or just GD should require superuser privileges. It will definitely break things to make DISCARD ALL require superuser privs. It's not very much more attractive to say that it only destroys Python state if issued by a superuser. We could in fact implement #2, I imagine, by destroying and recreating the entire language interpreter. So I could imagine implementing a "DISCARD INTERPRETERS" kind of command that would zap the current interpreter(s) for whichever PL languages happened to feel like cooperating with the command. But I'm not at all sure about the superuser-privileges aspect of that. More: IIRC, in at least some of the PLs, the interpreters are SQL-user-specific. Should a non-superuser session user have the ability to destroy state in interpreters running for other user IDs (presumably, state created by SECURITY DEFINER functions)? I'd guess not. On the whole I'm not sure this is something we ought to get into. 
If you really need a fresh session, maybe you should start a fresh session. But if we are going to get into it, I think the permissions aspect needs some careful thought. regards, tom lane
Re: [HACKERS] Verbose output of pg_dump not show schema name
On Fri, Apr 18, 2014 at 4:29 AM, Fabrízio de Royes Mello wrote: > > On Thu, Apr 17, 2014 at 12:46 PM, Bruce Momjian wrote: >> >> On Thu, Apr 17, 2014 at 11:44:37AM -0400, Tom Lane wrote: >> > Bruce Momjian writes: >> > > The idea is that we only need quotes when there are odd characters in >> > > the identifier. We do that right now in some places, though I can't >> > > find them in pg_dump. I know psql does that, see quote_ident(). >> > >> > I think our general style rule is that identifiers embedded in messages >> > are always double-quoted. There's an exception for type names, but >> > not otherwise. You're confusing the message case with printing SQL. >> >> OK. I was unclear if a status _display_ was a message like an error >> message. >> > > The attached patch fixes the missing double-quoting in the "dumping contents of > table.." message and adds the schema name to other messages: > - "reading indexes for table \"%s\".\"%s\"\n" > - "reading foreign key constraints for table \"%s\".\"%s\"\n" > - "reading triggers for table \"%s\".\"%s\"\n" > - "finding the columns and types of table \"%s\".\"%s\"\n" > - "finding default expressions of table \"%s\".\"%s\"\n" > - "finding check constraints for table \"%s\".\"%s\"\n" Cool additions. There may be a more elegant way to check if namespace is NULL, but I couldn't come up with one myself. So the patch may be fine. -- Michael
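For readers unfamiliar with the quote-when-needed behavior Bruce refers to, a rough Python sketch of the idea behind psql's quote_ident(): quote only when the identifier is not a plain lower-case name. Keyword handling is omitted, and this is an illustration, not psql's actual code.

```python
import re

def quote_ident_if_needed(name):
    # A plain lower-case identifier needs no quoting; anything else is
    # wrapped in double quotes, with embedded quotes doubled.
    # (Real quote_ident also quotes reserved keywords -- omitted here.)
    if re.fullmatch(r"[a-z_][a-z0-9_$]*", name):
        return name
    return '"' + name.replace('"', '""') + '"'

assert quote_ident_if_needed("accounts") == "accounts"
assert quote_ident_if_needed("Weird name") == '"Weird name"'
```

The thread's conclusion, though, is that identifiers in pg_dump's *messages* are always double-quoted regardless, since that is the project's message style rule.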
Re: [HACKERS] assertion failure 9.3.4
All, So I have encountered a 2nd report of this issue, or of an issue which sounds very similar: - corruption in two "queue" tables - the tables are written in a high-concurrency, lock-contested environment - user uses SELECT FOR UPDATE with these tables. - pg_stat_statements.so is loaded, but the extension is not installed - four rows were added to the queue tables, while not being added to the PK index. This allowed duplicate PKs to be added. Currently the user is testing not loading the pg_stat_statements.so to see if the problem goes away. They have a destruction test environment, so we should be able to confirm/deny in a couple days. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] How can we make beta testing better?
On 04/15/2014 09:53 PM, Rod Taylor wrote: > A documented beta test process/toolset which does the following would help: > 1) Enables full query logging > 2) Creates a replica of a production DB, record $TIME when it stops. > 3) Allow user to make changes (upgrade to 9.4, change hardware, change > kernel settings, ...) > 4) Plays queries from the CSV logs starting from $TIME mimicking actual > timing and transaction boundaries > > If Pg can make it easy to duplicate activities currently going on in > production inside another environment, I would be pleased to fire a couple > billion queries through it over the next few weeks. > > #4 should include reporting useful to the project, such as a sampling of > queries which performed significantly worse and a few relative performance > stats for overall execution time. So we have some software we've been procrastinating on OSS'ing, which does: 1) Takes full query CSV logs from a running postgres instance 2) Runs them against a target instance in parallel 3) Records response times for all queries tsung and pgreplay also do this, but have some limitations which make them impractical for a general set of logs to replay. What it would need is: A) scripting around coordinated backups B) Scripting for single-command runs, including changing pg.conf to record data. C) tools to *analyze* the output data, including error messages. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
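The replay idea described in this thread — read timestamped query logs and re-issue each query at its original relative offset — could be prototyped along these lines. The two-column CSV layout and the stub executor are hypothetical (the real PostgreSQL csvlog format has many more columns, and a real tool would execute over libpq and handle transaction boundaries); this is a sketch of the pacing logic only.

```python
import csv
import io
import time

def replay(log_file, execute, speedup=1.0):
    """Dispatch each logged query at its original relative offset,
    recording per-query response times."""
    rows = [(float(ts), query) for ts, query in csv.reader(log_file)]
    start_wall = time.monotonic()
    t0 = rows[0][0]
    timings = []
    for ts, query in rows:
        # Sleep until this query's original offset (scaled by speedup).
        due = (ts - t0) / speedup
        delay = due - (time.monotonic() - start_wall)
        if delay > 0:
            time.sleep(delay)
        began = time.monotonic()
        execute(query)
        timings.append((query, time.monotonic() - began))
    return timings

# Tiny synthetic log; execute is a stub instead of a real connection.
log = io.StringIO("0.0,SELECT 1\n0.05,SELECT 2\n")
timings = replay(log, execute=lambda q: None, speedup=10.0)
print([q for q, _ in timings])  # -> ['SELECT 1', 'SELECT 2']
```

The analysis step (item C above) would then compare these timings against a baseline run to flag queries that regressed.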
Re: [HACKERS] DISCARD ALL (Again)
On Thu, Apr 17, 2014 at 6:51 PM, Alvaro Herrera wrote: > > It does sound like a legitimate feature request to me. I don't remember if > we honored the request to add resetting of cached sequences, though; if > we didn't, this one is probably going to be tough too. > +1 > Another point is that to implement this I think there will need to be > another per-PL entry point to discard session data; are we okay with > that? Since this probably means a new column in pg_language, we > couldn't even consider the idea of back-patching. Unless we add a hook, > which is registered in the PL's _PG_init()? > This week I had some similar trouble using "dblink" and "pgbouncer". As expected, "DISCARD ALL" doesn't clear the extension's resources. I was wondering whether it is possible for every extension to register its own "discard" procedure, so that "DISCARD ALL" can execute all registered extension cleanup procedures. Does that make sense? > Are we going to backpatch a doc change that says "releases all temporary > resources, except for plpython's and plperl's GD"? Surely not ... > Maybe this doc could read: "releases all temporary resources, except for extensions" Greetings, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog sobre TI: http://fabriziomello.blogspot.com >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] DISCARD ALL (Again)
Alvaro Herrera-9 wrote: > Are we going to backpatch a doc change that says "releases all temporary > resources, except for plpython's and plperl's GD"? Surely not ... GD = Global Dictionary I don't see why something like the following wouldn't have value. "For those languages that make use of a Global Dictionary (or other global) construct to maintain state across function calls, the content of these global constructs may not be reset upon a call to DISCARD ALL - please see your language's documentation for specifics." Then, more attention should be paid within the language sections both to the constructs in general and also their limitations in the face of DISCARD ALL. "Note, the session-level GD construct is not informed of the execution of DISCARD ALL and thus is not cleared when such is executed." It would be helpful to note any work-around options as well. It sounds like, generally, PostgreSQL should promise to notify each language/session of a pending DISCARD ALL and then leave it up to the language to define how it will respond. The documentation should reflect this promise/action division-of-responsibilities. David J.
Re: [HACKERS] DISCARD ALL (Again)
Joshua D. Drake wrote: > > On 04/17/2014 02:17 PM, Josh Berkus wrote: > > > >On 04/17/2014 01:44 PM, Joshua D. Drake wrote: > >>Does it seem reasonable based on the docs: > >> > >>DISCARD ALL: > >> > >>Releases all temporary resources associated with the current session and > >>resets the session to its initial state. > >> > >>That we should also release the GD? > > > >It does, but that's a feature request, not a bug. Same with pl/perl. > > > > Well I would argue it is a documentation bug/error and that yes, we > have a legitimate feature request for DISCARD ALL to clear the GD. It does sound like a legitimate feature request to me. I don't remember if we honored the request to add resetting of cached sequences, though; if we didn't, this one is probably going to be tough too. Another point is that to implement this I think there will need to be another per-PL entry point to discard session data; are we okay with that? Since this probably means a new column in pg_language, we couldn't even consider the idea of back-patching. Unless we add a hook, which is registered in the PL's _PG_init()? Are we going to backpatch a doc change that says "releases all temporary resources, except for plpython's and plperl's GD"? Surely not ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] DISCARD ALL (Again)
On 04/17/2014 02:17 PM, Josh Berkus wrote: On 04/17/2014 01:44 PM, Joshua D. Drake wrote: Does it seem reasonable based on the docs: DISCARD ALL: Releases all temporary resources associated with the current session and resets the session to its initial state. That we should also release the GD? It does, but that's a feature request, not a bug. Same with pl/perl. Well I would argue it is a documentation bug/error and that yes, we have a legitimate feature request for DISCARD ALL to clear the GD. Sincerely, JD
Re: [HACKERS] How can we make beta testing better?
On 04/17/14 15:16, Merlin Moncure wrote: On Tue, Apr 15, 2014 at 4:47 PM, Josh Berkus wrote: Hackers, I think 9.3 has given us evidence that our users aren't giving new versions of PostgreSQL substantial beta testing, or if they are, they aren't sharing the results with us. How can we make beta testing better and more effective? How can we get more users to actually throw serious workloads at new versions and share the results? I've tried a couple of things over the last two years and they haven't worked all that well. Since we're about to go into another beta testing period, we need something new. Ideas? I've seen lots of bugs reported and fixed in the beta period over the years. My take is that it's basically unrealistic to expect volunteer beta testers to replace bona fide regression testing. I think it's a pretty fair statement that we've had some QC issues in the general area of replication technologies. What this is indicating to me is that replication needs substantially more coverage in 'make check'. Since I'm wishing for things, it would be nice to see an expansion of the buildfarm so that we could [optionally] run various performance tests as well as various replication scenarios. Then we could go back to users and say, please donate 'repeatable tests and machines to run them on' and reap the long term value. Not at all making light out of any of this...it's a huge project. The problem with testing replication is that it doesn't fit well into our standard regression testing. There are way too many moving parts as well as dependencies on the underlying OS and network topology. You will discover a ton of race conditions once you actually move from testing with multiple postmasters (so you can kill one) on the same box to using multiple virtual machines and instead of completely severing a network connection using some packet shaping/filtering to introduce packet loss, limited bandwidth, async routing and so on.
At least that is my experience from throwing that sort of sh*t at Slony at full speed. Not trying to discourage anyone from trying. Just saying that it doesn't fit into our existing regression test framework. Jan -- Jan Wieck Senior Software Engineer http://slony.info
Re: [HACKERS] How can we make beta testing better?
On Tue, Apr 15, 2014 at 2:47 PM, Josh Berkus wrote: > Hackers, > > I think 9.3 has given us evidence that our users aren't giving new > versions of PostgreSQL substantial beta testing, or if they are, they > aren't sharing the results with us. > A lot of the bugs that turned up are not the kind I would expect to have been found in most beta testing done by non-hacking users. Weren't they mostly around rare race conditions, crash recovery, and freezing? > > How can we make beta testing better and more effective? How can we get > more users to actually throw serious workloads at new versions and share > the results? > If we are interested in positive results as well as negative, we should change https://wiki.postgresql.org/wiki/HowToBetaTest "pgsql-hackers: bugs, questions, and successful test reports are welcome here if you are already subscribed to pgsql-hackers. Note that pgsql-hackers is a high-traffic mailing list with a lot of development discussion." So successful reports are welcome, provided that you are willing to subscribe to a list that generates tons of noise you won't understand. That doesn't sound all that welcoming. (I already am subscribed, but I still usually don't report successful tests, because "yeah, I did a bunch of stuff, and nothing failed in an obvious way" just doesn't sound very useful, and it is hard to get motivated to write up an exhaustive description of a test that doesn't prove anything anyway--maybe if I did for a few more hours, it would have found a problem.) If we want to know how much beta testing is really going on, perhaps we could do a survey asking people whether they did any beta testing, and if so whether they reported the results. Otherwise it would be hard to distinguish "We aren't doing enough testing" from "We do lots of testing, but it isn't strenuous enough to find the problems, or is testing the wrong aspects of the system". Cheers, Jeff
Re: [HACKERS] Patch: iff -> if
On 04/17/2014 12:33 PM, Nicolas Barbier wrote: 2014-04-17 Michael Paquier : Is there no equivalent in German? For example in French there is "ssi". gdw (genau dann, wenn) More likely that you see \equiv or: \leftrightarrow Regards, -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project
Re: [HACKERS] DISCARD ALL (Again)
On 04/17/2014 01:44 PM, Joshua D. Drake wrote: > Does it seem reasonable based on the docs: > > DISCARD ALL: > > Releases all temporary resources associated with the current session and > resets the session to its initial state. > > That we should also release the GD? It does, but that's a feature request, not a bug. Same with pl/perl. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] assertion in 9.4 with wal_level=logical
On 04/17/2014 04:33 PM, Andres Freund wrote: Hi, On 2014-04-17 16:23:54 -0400, Steve Singer wrote: With master/9.4 from today (52e757420fa98a76015c2c88432db94269f3e8f4) I am getting an assertion when doing a truncate via SPI when I have wal_level=logical. Stack trace is below. I am just replicating a table with normal slony (2.2) I don't need to establish any replication slots to get this. Uh, that's somewhat nasty... You probably only get that because of slony's habit of share locking catalogs. Could that be? Yes, slony does a select from pg_catalog and pg_namespace in the stored function just before doing the truncate. For now, to circumvent the problem you could just revert 4a170ee9e0ebd7021cb1190fabd5b0cbe2effb8e. I'll look into fixing it properly. Greetings, Andres Freund
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 1:33 PM, Peter Geoghegan wrote: > I can't imagine that this is much of a problem in practice. Although I will add that not caching highly useful inner pages for the medium term, just because that index hasn't been used at all for 5 minutes, probably is very bad. Using the 4,828 buffers that it would take to store all the inner pages (as in my large primary index example) to store something else instead is probably penny wise and pound foolish. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
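For readers following the thread, the replacement policy under discussion can be sketched as a toy model: each buffer carries a usage_count that is bumped on access and capped (BM_MAX_USAGE_COUNT is 5 in PostgreSQL), and the sweeping clock hand decrements counts until it finds a buffer at zero to evict. The class below is an illustrative simplification, not the actual freelist.c implementation; the page labels are hypothetical.

```python
class ClockSweep:
    """Toy model of clock-sweep buffer replacement with capped usage counts."""

    MAX_USAGE = 5  # PostgreSQL's BM_MAX_USAGE_COUNT

    def __init__(self, nbuffers):
        self.pages = [None] * nbuffers  # page held by each buffer slot
        self.usage = [0] * nbuffers     # per-slot usage_count
        self.hand = 0                   # position of the clock hand
        self.index = {}                 # page -> slot, for hit lookups

    def access(self, page):
        slot = self.index.get(page)
        if slot is not None:            # cache hit: bump usage_count, capped
            self.usage[slot] = min(self.usage[slot] + 1, self.MAX_USAGE)
            return slot
        slot = self._evict()            # cache miss: run the clock sweep
        old = self.pages[slot]
        if old is not None:
            del self.index[old]
        self.pages[slot] = page
        self.usage[slot] = 1
        self.index[page] = slot
        return slot

    def _evict(self):
        while True:
            slot = self.hand
            self.hand = (self.hand + 1) % len(self.pages)
            if self.pages[slot] is None or self.usage[slot] == 0:
                return slot             # free or cooled-down: evict here
            self.usage[slot] -= 1       # otherwise give it another lap


cache = ClockSweep(nbuffers=4)
for page in ["root", "root", "leaf1", "leaf2", "root"]:
    cache.access(page)
# "root" has been touched three times, so its usage_count is higher than
# the leaves' and it survives more sweep laps before becoming evictable.
```

Note that this also illustrates the thread's complaint: a burst of misses decrements everything the hand passes, so even a capped-out inner page can be swept away during a large sequential read of leaf pages.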
Re: [HACKERS] assertion in 9.4 with wal_level=logical
On 2014-04-17 17:40:01 -0300, Alvaro Herrera wrote: > For once, this looks more like a problem in logical decoding, which is > trying to assert about the tuple being updated; the assertion failing is > the one added a week ago about not palloc'ing in a critical section. It's this (older) assertion in HeapTupleHeaderGetCmax(): Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup))); That can allocate memory if xmax is a multixact... Does anybody have a better idea to solve this than adding a CritSectionCount == 0 && in there? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 1:39 PM, Andres Freund wrote: > On 2014-04-17 13:33:27 -0700, Peter Geoghegan wrote: >> Just over 99.6% of pages (leaving aside the meta page) in the big 10 >> GB pgbench_accounts_pkey index are leaf pages. > > That's a rather nice number. I knew it was big, but I'd have guessed > it'd be a percent lower. Yes, it's usually past 99.5% for int4. It's really bad if it's as low as 96%, and I think that often points to what are arguably bad indexing choices, like indexing text columns that have long text strings. > Do you happen to have the same stat handy for a sensibly wide text or > numeric real world index? It'd be interesting to see what the worst case > there is. Yes, as it happens I do: http://www.postgresql.org/message-id/CAM3SWZTcXrdDZSpA11qZXiyo4_jtxwjaNdZpnY54yjzq7d64=a...@mail.gmail.com I was working off my Mouse Genome database, which is actually real-world data used by medical researchers, stored in a PostgreSQL database by those researchers and made available for the benefit of other medical researchers. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] DISCARD ALL (Again)
Hello, Over at my quaint establishment we have been working on some plpython work that makes use of GD. We wrote this code with the assumption (per the docs) that when you issued a DISCARD ALL, the GD would be cleared. Apparently this is not the case. The docs themselves are clearly wrong, and this has been brought up before multiple times (that DISCARD ALL doesn't do what people think it does). Does it seem reasonable based on the docs: DISCARD ALL: Releases all temporary resources associated with the current session and resets the session to its initial state. That we should also release the GD? Sincerely, Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc Political Correctness is for cowards. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
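To make the gap concrete, here is a toy model (plain Python, nothing to do with the server's actual implementation) of the session state that DISCARD ALL is documented to reset. The `clear_gd` flag models the behavior being requested in this thread, which PostgreSQL does not currently implement; the attribute names are hypothetical.

```python
class Session:
    """Toy model of per-session state affected by DISCARD ALL."""

    def __init__(self):
        self.prepared = {}        # prepared statements (cleared, as by DEALLOCATE ALL)
        self.temp_tables = set()  # temporary tables (dropped, as by DISCARD TEMP)
        self.settings = {}        # session-level SET values (reset, as by RESET ALL)
        self.plpython_gd = {}     # PL/Python's session-global GD dictionary

    def discard_all(self, clear_gd=False):
        """Reset session state. With clear_gd=False this mirrors the
        observed behavior: GD survives. clear_gd=True is the proposal."""
        self.prepared.clear()
        self.temp_tables.clear()
        self.settings.clear()
        if clear_gd:
            self.plpython_gd.clear()
```

The point of the thread is that the documented promise ("resets the session to its initial state") corresponds to `clear_gd=True`, while the implementation behaves like `clear_gd=False`.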
Re: [HACKERS] assertion in 9.4 with wal_level=logical
Steve Singer wrote: > With master/9.4 from today (52e757420fa98a76015c2c88432db94269f3e8f4) > > I am getting an assertion when doing a truncate via SPI when I have > wal_level=logical. > > Stack trace is below. > > I am just replicating a table with normal slony (2.2) I don't need > to establish any replication slots to get this. For once, this looks more like a problem in logical decoding, which is trying to assert about the tuple being updated; the assertion failing is the one added a week ago about not palloc'ing in a critical section. Andres told me on IM that there's another weird thing going on (which is how the catalog tuple gets a multixact in the first place) which is that Slony does a SELECT FOR SHARE on the catalog tuple. One simple approach would be to just disable that particular assert when in a critical section. -- Álvaro Herrera   http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On 2014-04-17 13:33:27 -0700, Peter Geoghegan wrote: > Just over 99.6% of pages (leaving aside the meta page) in the big 10 > GB pgbench_accounts_pkey index are leaf pages. That's a rather nice number. I knew it was big, but I'd have guessed it'd be a percent lower. Do you happen to have the same stat handy for a sensibly wide text or numeric real world index? It'd be interesting to see what the worst case there is. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] assertion in 9.4 with wal_level=logical
Hi, On 2014-04-17 16:23:54 -0400, Steve Singer wrote: > With master/9.4 from today (52e757420fa98a76015c2c88432db94269f3e8f4) > > I am getting an assertion when doing a truncate via SPI when I have > wal_level=logical. > > Stack trace is below. > > I am just replicating a table with normal slony (2.2) I don't need to > establish any replication slots to get this. Uh, that's somewhat nasty... You probably only get that because of slony's habit of share locking catalogs. Could that be? For now, to circumvent the problem you could just revert 4a170ee9e0ebd7021cb1190fabd5b0cbe2effb8e. I'll look into fixing it properly. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 8:10 AM, Greg Stark wrote: > I don't think "common sense" is compelling. I think you need to pin > down exactly what it is about btree intermediate pages that the LRU > isn't capturing and not just argue they're more useful. The LRU is > already capturing which pages are more heavily used than others so you > need to identify what it is that makes index pages *even more* useful > than their frequency and recency of access indicates. Not just that > they're more useful than an average page. See example 1.1 within the LRU-K paper. > So what I think is missing is that indexes are always accessed from > the root down to the leaf. So the most recent page accessed will > always be the leaf. And in whatever chain of pages was used to reach > the last leaf page the least recently accessed will always be the > root. But we'll need the root page again on the subsequent descent > even if it's to reach the same leaf page we kept in ram in preference > to it. I can't imagine that this is much of a problem in practice. 
Consider the break-down of pages within indexes when pgbench scale is 5,000, as in my original benchmark:

[local] pg@pgbench=# with tots as (
    SELECT count(*) c, type, relname
    from (select relname, relpages, generate_series(1, relpages - 1) i
          from pg_class c
          join pg_namespace n on c.relnamespace = n.oid
          where relkind = 'i' and nspname = 'public') r,
         lateral (select * from bt_page_stats(relname, i)) u
    group by relname, type)
select tots.relname,
       relpages - 1 as non_meta_pages,
       c,
       c/sum(c) over(partition by tots.relname) as prop_of_index,
       type
from tots
join pg_class c on c.relname = tots.relname
order by 2 desc, 1, type;

        relname        | non_meta_pages |    c    |       prop_of_index        | type
-----------------------+----------------+---------+----------------------------+------
 pgbench_accounts_pkey |        1370950 |    4828 |     0.00352164557423684307 | i
 pgbench_accounts_pkey |        1370950 | 1366121 |     0.99647762500455888253 | l
 pgbench_accounts_pkey |        1370950 |       1 | 0.000000729421204274408257 | r
 pgbench_tellers_pkey  |            274 |     273 |     0.99635036496350364964 | l
 pgbench_tellers_pkey  |            274 |       1 |     0.00364963503649635036 | r
 pgbench_branches_pkey |             28 |      27 |     0.96428571428571428571 | l
 pgbench_branches_pkey |             28 |       1 |     0.03571428571428571429 | r
(7 rows)

Time: 14562.297 ms

Just over 99.6% of pages (leaving aside the meta page) in the big 10 GB pgbench_accounts_pkey index are leaf pages. The inner pages and root page are at an enormous advantage. In this example, the other indexes don't even have what would be separately classified as an inner page (and not a root page) at all, because it's perfectly sufficient to only have a root page to get to any one of, say, 273 leaf pages (in the case of pgbench_tellers_pkey here). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
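The arithmetic behind that 99.6% figure is easy to reproduce: each level of the tree is smaller than the one below it by roughly the fanout factor. The sketch below is a back-of-the-envelope model, not a measurement; the fanout of ~285 is merely inferred from the 1,366,121 leaf pages versus 4,828 non-leaf pages reported above.

```python
import math

def btree_page_breakdown(n_leaf, fanout):
    """Estimate how many inner/root pages a B-tree needs to address
    n_leaf leaf pages, assuming a uniform fanout per inner page."""
    levels = [n_leaf]
    while levels[-1] > 1:
        # each level above needs one downlink per page below it
        levels.append(math.ceil(levels[-1] / fanout))
    n_upper = sum(levels[1:])            # inner pages plus the single root
    leaf_share = n_leaf / (n_leaf + n_upper)
    return n_upper, leaf_share

n_upper, leaf_share = btree_page_breakdown(n_leaf=1366121, fanout=285)
# n_upper lands close to the 4,828 non-leaf pages seen in bt_page_stats,
# and leaf_share comes out just over 0.996, matching the table above.
```

The same model shows why pgbench_tellers_pkey needs no separate inner level at all: with 273 leaves and a fanout in the hundreds, a single root page suffices.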
[HACKERS] assertion in 9.4 with wal_level=logical
With master/9.4 from today (52e757420fa98a76015c2c88432db94269f3e8f4) I am getting an assertion when doing a truncate via SPI when I have wal_level=logical. Stack trace is below. I am just replicating a table with normal slony (2.2) I don't need to establish any replication slots to get this.

(gdb) where
#0  0x7fc9b4f58295 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7fc9b4f5b438 in __GI_abort () at abort.c:90
#2  0x007a10f7 in ExceptionalCondition (
    conditionName=conditionName@entry=0x955d90 "!(CritSectionCount == 0 || (CurrentMemoryContext) == ErrorContext || (MyAuxProcType == CheckpointerProcess))",
    errorType=errorType@entry=0x7da7b0 "FailedAssertion",
    fileName=fileName@entry=0x955a2e "mcxt.c", lineNumber=lineNumber@entry=670) at assert.c:54
#3  0x007c3090 in palloc (size=16) at mcxt.c:670
#4  0x004dd83f in mXactCacheGetById (members=0x7fff679a3d18, multi=58) at multixact.c:1411
#5  GetMultiXactIdMembers (multi=58, members=members@entry=0x7fff679a3d18, allow_old=allow_old@entry=0 '\000') at multixact.c:1080
#6  0x0049e43f in MultiXactIdGetUpdateXid (xmax=, t_infomask=) at heapam.c:6042
#7  0x004a1ccc in HeapTupleGetUpdateXid (tuple=) at heapam.c:6083
#8  0x007cf7d9 in HeapTupleHeaderGetCmax (tup=tup@entry=0x7fc9ac838e38) at combocid.c:122
#9  0x0049eb98 in log_heap_new_cid (relation=relation@entry=0x7fc9b5a67dc0, tup=tup@entry=0x7fff679a3ea0) at heapam.c:7047
#10 0x004a48a5 in heap_update (relation=relation@entry=0x7fc9b5a67dc0, otid=otid@entry=0x2678c6c, newtup=newtup@entry=0x2678c68, cid=26, crosscheck=crosscheck@entry=0x0, wait=wait@entry=1 '\001', hufd=hufd@entry=0x7fff679a4080, lockmode=lockmode@entry=0x7fff679a407c) at heapam.c:3734
#11 0x004a5842 in simple_heap_update (relation=relation@entry=0x7fc9b5a67dc0, otid=otid@entry=0x2678c6c, tup=tup@entry=0x2678c68) at heapam.c:4010
#12 0x00797cf7 in RelationSetNewRelfilenode (relation=relation@entry=0x7fc9ab270b68, freezeXid=19459, minmulti=minmulti@entry=58) at relcache.c:2956
#13 0x0059ddde in ExecuteTruncate (stmt=0x3a, stmt@entry=0x2678a58) at tablecmds.c:1187
#14 0x006c3870 in standard_ProcessUtility (parsetree=0x2678a58, queryString=, context=, params=0x0, dest=, completionTag=) at utility.c:515
#15 0x005e79d9 in _SPI_execute_plan (plan=plan@entry=0x7fff679a4320, paramLI=paramLI@entry=0x0, snapshot=snapshot@entry=0x0, crosscheck_snapshot=crosscheck_snapshot@entry=0x0, read_only=read_only@entry=0 '\000', fire_triggers=fire_triggers@entry=1 '\001', tcount=tcount@entry=0) at spi.c:2171
#16 0x005e83c1 in SPI_execute (src=src@entry=0x25bde90 "truncate only \"disorder\".\"do_restock\"", read_only=0 '\000', tcount=tcount@entry=0) at spi.c:386

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New windows compiler warning from 585bca39
David Rowley writes: > I've attached a tiny patch that fixes a new compiler warning on the windows > build... Applied, thanks. > Perhaps the #ifndef could be placed in a nicer spot in the patch, but the > attached should at least describe where the problem lies... Yeah, I thought it better to make a separate declaration to wrap in #ifndef. pgindent is probably going to insist on adding some vertical whitespace around the #if, and that'll look horrid if it's just in the middle of a list of variables. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thursday, April 17, 2014, Merlin Moncure wrote: > > no -- I got you. My point was, that's a pure guess unless you base it > on evidence recorded on the page itself. Without that evidence, > (which requires writing) the operating system is in a better place to make > that guess so it's probably better to defer that decision. > Well, we'd only need that info to be stored in the buffer cache somehow -- it wouldn't have to go to disk or cause more I/O, of course. My thinking was that we could track it with the existing counter too, avoiding even that small amount of locking to write to the buffer page. Thanks, Stephen
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 2:28 PM, Stephen Frost wrote: > > > On Thursday, April 17, 2014, Merlin Moncure wrote: >> >> yeah -- the thing is, we are already too spendy on >> supplemental write i/o (hint bits, visible bits, freezing, etc) and >> likely not worth it to throw something else on the pile unless the >> page is already dirty; the medium term trend in storage is that read >> vs write performance is becoming increasingly asymmetric, particularly >> on the random side so it's very unlikely to balance out. > > Guess I wasn't clear but I was thinking to read the page in, not do any > writing, and do it in an asynchronous way to the process doing the evicting. no -- I got you. My point was, that's a pure guess unless you base it on evidence recorded on the page itself. Without that evidence (which requires writing), the operating system is in a better place to make that guess, so it's probably better to defer that decision. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Verbose output of pg_dump not show schema name
On Thu, Apr 17, 2014 at 12:46 PM, Bruce Momjian wrote: > On Thu, Apr 17, 2014 at 11:44:37AM -0400, Tom Lane wrote: > > Bruce Momjian writes: > > > The idea is that we only need quotes when there are odd characters in > > > the identifier. We do that right now in some places, though I can't > > > find them in pg_dump. I know psql does that, see quote_ident(). > > > > I think our general style rule is that identifiers embedded in messages > > are always double-quoted. There's an exception for type names, but > > not otherwise. You're confusing the message case with printing SQL. > > OK. I was unclear if a status _display_ was a message like an error > message. The attached patch fixes the missing double quotes in the "dumping contents of table..." message and adds the schema name to other messages: - "reading indexes for table \"%s\".\"%s\"\n" - "reading foreign key constraints for table \"%s\".\"%s\"\n" - "reading triggers for table \"%s\".\"%s\"\n" - "finding the columns and types of table \"%s\".\"%s\"\n" - "finding default expressions of table \"%s\".\"%s\"\n" - "finding check constraints for table \"%s\".\"%s\"\n" Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog sobre TI: http://fabriziomello.blogspot.com >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 9464540..7f73e8d 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -706,8 +706,8 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 			_becomeOwner(AH, te);
 			_selectOutputSchema(AH, te->namespace);
 
-			ahlog(AH, 1, "processing data for table \"%s\"\n",
-				  te->tag);
+			ahlog(AH, 1, "processing data for table \"%s\".\"%s\"\n",
+				  AH->currSchema, te->tag);
 
 			/*
 			 * In parallel restore, if we created the table earlier in
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index a6c0428..78ec5bf 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1400,7 +1400,17 @@ dumpTableData_copy(Archive *fout, void *dcontext)
 	const char *column_list;
 
 	if (g_verbose)
-		write_msg(NULL, "dumping contents of table %s\n", classname);
+	{
+		/* Print namespace information if available */
+		if (tbinfo->dobj.namespace != NULL &&
+			tbinfo->dobj.namespace->dobj.name != NULL)
+			write_msg(NULL, "dumping contents of table \"%s\".\"%s\"\n",
+					  tbinfo->dobj.namespace->dobj.name,
+					  classname);
+		else
+			write_msg(NULL, "dumping contents of table \"%s\"\n",
+					  classname);
+	}
 
 	/*
 	 * Make sure we are in proper schema.  We will qualify the table name
@@ -4974,8 +4984,17 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			continue;
 
 		if (g_verbose)
-			write_msg(NULL, "reading indexes for table \"%s\"\n",
-					  tbinfo->dobj.name);
+		{
+			/* Print namespace information if available */
+			if (tbinfo->dobj.namespace != NULL &&
+				tbinfo->dobj.namespace->dobj.name != NULL)
+				write_msg(NULL, "reading indexes for table \"%s\".\"%s\"\n",
+						  tbinfo->dobj.namespace->dobj.name,
+						  tbinfo->dobj.name);
+			else
+				write_msg(NULL, "reading indexes for table \"%s\"\n",
+						  tbinfo->dobj.name);
+		}
 
 		/* Make sure we are in proper schema so indexdef is right */
 		selectSourceSchema(fout, tbinfo->dobj.namespace->dobj.name);
@@ -5340,8 +5359,17 @@ getConstraints(Archive *fout, TableInfo tblinfo[], int numTables)
 			continue;
 
 		if (g_verbose)
-			write_msg(NULL, "reading foreign key constraints for table \"%s\"\n",
-					  tbinfo->dobj.name);
+		{
+			/* Print namespace information if available */
+			if (tbinfo->dobj.namespace != NULL &&
+				tbinfo->dobj.namespace->dobj.name != NULL)
+				write_msg(NULL, "reading foreign key constraints for table \"%s\".\"%s\"\n",
+						  tbinfo->dobj.namespace->dobj.name,
+						  tbinfo->dobj.name);
+			else
+				write_msg(NULL, "reading foreign key constraints for table \"%s\"\n",
+						  tbinfo->dobj.name);
+		}
 
 		/*
 		 * select table schema to ensure constraint expr is qualified if
@@ -5678,8 +5706,17 @@ getTriggers(Archive *fout, TableInfo tblinfo[], int numTables)
 			continue;
 
 		if (g_verbose)
-			write_msg(NULL, "reading triggers for table \"%s\"\n",
-					  tbinfo->dobj.name);
+		{
+			/* Print namespace information if available */
+			if (tbinfo->dobj.namespace != NULL &&
+				tbinfo->dobj.namespace->dobj.name != NULL)
+				write_msg(NULL, "reading triggers for table \"%s\".\"%s\"\n",
+						  tbinfo->dobj.namespace->dobj.name,
+						  tbinfo->dobj.name);
+			else
+				write_msg(NULL, "reading triggers for table \"%s\"\n",
+						  tbinfo->dobj.name);
+		}
 
 		/*
 		 * select table schema to ensure regproc name is qualified if needed
@@ -6291,8 +6328,17 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
 		 * the o
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thursday, April 17, 2014, Merlin Moncure wrote: > > yeah -- the thing is, we are already too spendy already on > supplemental write i/o (hint bits, visible bits, freezing, etc) and > likely not worth it to throw something else on the pile unless the > page is already dirty; the medium term trend in storage is that read > vs write performance is becoming increasingly asymmetric, particularly > on the random side so it's very unlikely to balance out. > Guess I wasn't clear but I was thinking to read the page in, not do any writing, and do it in a asynchronous way to the process doing the evicting. Thanks, Stephen
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 2:16 PM, Stephen Frost wrote: > * Merlin Moncure (mmonc...@gmail.com) wrote: >> I don't think this would work unless we would keep some kind of >> tracking information on the page itself which seems not worth a write >> operation to do (maybe if the page is dirtied it could be snuck in >> there though...). IOW, it would only make sense to do this if we knew >> that this page was likely to be read in again. This might be true in >> general on particular workloads but is probably a pretty flimsy >> assumption without supporting evidence; probably better to let the O/S >> deal with it. > > The trouble is that we're ending up "hiding" the information from the OS > about the frequency of utilization of that page. You have a good point > and we wouldn't want to do this for pages that are just accessed once or > similar, but perhaps just mark a page that's reached the 'max' as having > been 'hot' and then, for those pages, advise the OS that while we're > under pressure and need to push this page out, it was once pretty hotly > used and therefore we may want it again soon. > > For pages that never reach the 'max' level, we wouldn't do anything on > the assumption that those were only temporarily needed. yeah -- the thing is, we are already too spendy on supplemental write i/o (hint bits, visible bits, freezing, etc) and likely not worth it to throw something else on the pile unless the page is already dirty; the medium term trend in storage is that read vs write performance is becoming increasingly asymmetric, particularly on the random side so it's very unlikely to balance out. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How can we make beta testing better?
On Tue, Apr 15, 2014 at 4:47 PM, Josh Berkus wrote: > Hackers, > > I think 9.3 has given us evidence that our users aren't giving new > versions of PostgreSQL substantial beta testing, or if they are, they > aren't sharing the results with us. > > How can we make beta testing better and more effective? How can we get > more users to actually throw serious workloads at new versions and share > the results? > > I've tried a couple of things over the last two years and they haven't > worked all that well. Since we're about to go into another beta testing > period, we need something new. Ideas? I've seen lots of bugs reported and fixed in the beta period over the years. My take is that it's basically unrealistic to expect volunteer beta testers to replace bona fide regression testing. I think it's a pretty fair statement that we've had some QC issues in the general area of replication technologies. What this is indicating to me is that replication needs substantially more coverage in 'make check'. Since I'm wishing for things, it would be nice to see an expansion of the buildfarm so that we could [optionally] run various performance tests as well as various replication scenarios. Then we could go back to users and say, please donate 'repeatable tests and machines to run them on' and reap the long term value. Not at all making light out of any of this...it's a huge project. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > I wonder if it would help to actually tell the OS to read in buffers > > that we're *evicting*... On the general notion that if the OS already > > has them buffered then it's almost a no-op, and if it doesn't and it's > > actually a 'hot' buffer that we're gonna need again shortly, the OS will > > have it. > > But if it's actually gone cold, you're just forcing unnecessary read I/O, > not to mention possibly causing something slightly warmer to be lost from > kernel cache. Certainly possible- see the email I just sent about another thought around this. Obviously, none of these thoughts are really fully formed solutions and are, instead, just speculation and ideas. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
* Merlin Moncure (mmonc...@gmail.com) wrote: > I don't think this would work unless we would keep some kind of > tracking information on the page itself which seems not worth a write > operation to do (maybe if the page is dirtied it could be snuck in > there though...). IOW, it would only make sense to do this if we knew > that this page was likely to be read in again. This might be true in > general on particular workloads but is probably a pretty flimsy > assumption without supporting evidence; probably better to let the O/S > deal with it. The trouble is that we're ending up "hiding" the information from the OS about the frequency of utilization of that page. You have a good point and we wouldn't want to do this for pages that are just accessed once or similar, but perhaps just mark a page that's reached the 'max' as having been 'hot' and then, for those pages, advise the OS that while we're under pressure and need to push this page out, it was once pretty hotly used and therefore we may want it again soon. For pages that never reach the 'max' level, we wouldn't do anything on the assumption that those were only temporarily needed. Just some thoughts. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
Stephen Frost writes: > I wonder if it would help to actually tell the OS to read in buffers > that we're *evicting*... On the general notion that if the OS already > has them buffered then it's almost a no-op, and if it doesn't and it's > actually a 'hot' buffer that we're gonna need again shortly, the OS will > have it. But if it's actually gone cold, you're just forcing unnecessary read I/O, not to mention possibly causing something slightly warmer to be lost from kernel cache. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 2:00 PM, Stephen Frost wrote: > * Merlin Moncure (mmonc...@gmail.com) wrote: >> I doubt that's necessary though -- if the postgres caching algorithm >> improves such that there is a better tendency for hot pages to stay in >> s_b, Eventually the O/S will deschedule the page for something else >> that needs it. In other words, otherwise preventable double >> buffering is really a measurement of bad eviction policy because it >> manifests in volatility of frequency accessed pages. > > I wonder if it would help to actually tell the OS to read in buffers > that we're *evicting*... On the general notion that if the OS already > has them buffered then it's almost a no-op, and if it doesn't and it's > actually a 'hot' buffer that we're gonna need again shortly, the OS will > have it. > > In other words, try to make the OS more like a secondary cache to ours > by encouraging it to cache things we're evicting. I don't think this would work unless we would keep some kind of tracking information on the page itself which seems not worth a write operation to do (maybe if the page is dirtied it could be snuck in there though...). IOW, it would only make sense to do this if we knew that this page was likely to be read in again. This might be true in general on particular workloads but is probably a pretty flimsy assumption without supporting evidence; probably better to let the O/S deal with it. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
* Merlin Moncure (mmonc...@gmail.com) wrote: > I doubt that's necessary though -- if the postgres caching algorithm > improves such that there is a better tendency for hot pages to stay in > s_b, Eventually the O/S will deschedule the page for something else > that needs it. In other words, otherwise preventable double > buffering is really a measurement of bad eviction policy because it > manifests in volatility of frequency accessed pages. I wonder if it would help to actually tell the OS to read in buffers that we're *evicting*... On the general notion that if the OS already has them buffered then it's almost a no-op, and if it doesn't and it's actually a 'hot' buffer that we're gonna need again shortly, the OS will have it. In other words, try to make the OS more like a secondary cache to ours by encouraging it to cache things we're evicting. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 11:53 AM, Merlin Moncure wrote: > No. but if you were very judicious, maybe you could hint the o/s > (posix_fadvise) about pages that are likely to stay hot that you don't > need them. Mitsumasa KONDO wrote a patch like that. I don't think the results were that promising, but things change quickly. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 1:48 PM, Andres Freund wrote: > On 2014-04-17 21:44:47 +0300, Heikki Linnakangas wrote: >> On 04/17/2014 09:38 PM, Stephen Frost wrote: >> >* Greg Stark (st...@mit.edu) wrote: >> >>On Thu, Apr 17, 2014 at 12:21 PM, Stephen Frost wrote: >> >>>Ehhh. No. If it's a hot page that we've been holding in *our* cache >> >>>long enough, the kernel will happily evict it as 'cold' from *its* >> >>>cache, leading to... >> >> >> >>This is a whole nother problem. >> >> >> >>It is worrisome that we could be benchmarking the page replacement >> >>algorithm in Postgres and choose a page replacement algorithm that >> >>chooses pages that performs well because it tends to evict pages that >> >>are in the OS cache. And then one day (hopefully not too far off) >> >>we'll fix the double buffering problem and end up with a strange >> >>choice of page replacement algorithm. >> > >> >That's certainly possible but I don't see the double buffering problem >> >going away any time particularly soon and, even if it does, it's likely >> >to either a) mean we're just using the kernel's cache (eg: something w/ >> >mmap, etc), or b) will involve so many other changes that this will end >> >up getting changed anyway. In any case, while I think we should >> >document any such cache management system we employ as having this risk, >> >I don't think we should worry about it terribly much. >> >> Note that if we somehow come up with a page replacement algorithm that tends >> to evict pages that are in the OS cache, we have effectively solved the >> double buffering problem. When a page is cached in both caches, evicting it >> from one of them eliminates the double buffering. Granted, you might prefer >> to evict it from the OS cache instead, and such an algorithm could be bad in >> other ways. But if a page replacement algorithm happens avoid double >> buffering, that's a genuine merit for that algorithm. 
> > I don't think it's a good idea to try to synchronize algorithms with the > OSs. There's so much change about the caching logic in e.g. linux that > it won't stay effective for very long. No. but if you were very judicious, maybe you could hint the o/s (posix_fadvise) about pages that are likely to stay hot that you don't need them. I doubt that's necessary though -- if the postgres caching algorithm improves such that there is a better tendency for hot pages to stay in s_b, Eventually the O/S will deschedule the page for something else that needs it. In other words, otherwise preventable double buffering is really a measurement of bad eviction policy because it manifests in volatility of frequency accessed pages. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On 04/17/2014 09:38 PM, Stephen Frost wrote: * Greg Stark (st...@mit.edu) wrote: On Thu, Apr 17, 2014 at 12:21 PM, Stephen Frost wrote: Ehhh. No. If it's a hot page that we've been holding in *our* cache long enough, the kernel will happily evict it as 'cold' from *its* cache, leading to... This is a whole nother problem. It is worrisome that we could be benchmarking the page replacement algorithm in Postgres and choose a page replacement algorithm that chooses pages that performs well because it tends to evict pages that are in the OS cache. And then one day (hopefully not too far off) we'll fix the double buffering problem and end up with a strange choice of page replacement algorithm. That's certainly possible but I don't see the double buffering problem going away any time particularly soon and, even if it does, it's likely to either a) mean we're just using the kernel's cache (eg: something w/ mmap, etc), or b) will involve so many other changes that this will end up getting changed anyway. In any case, while I think we should document any such cache management system we employ as having this risk, I don't think we should worry about it terribly much. Note that if we somehow come up with a page replacement algorithm that tends to evict pages that are in the OS cache, we have effectively solved the double buffering problem. When a page is cached in both caches, evicting it from one of them eliminates the double buffering. Granted, you might prefer to evict it from the OS cache instead, and such an algorithm could be bad in other ways. But if a page replacement algorithm happens avoid double buffering, that's a genuine merit for that algorithm. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
* Andres Freund (and...@2ndquadrant.com) wrote: > > Note that if we somehow come up with a page replacement algorithm that tends > > to evict pages that are in the OS cache, we have effectively solved the > > double buffering problem. When a page is cached in both caches, evicting it > > from one of them eliminates the double buffering. Granted, you might prefer > > to evict it from the OS cache instead, and such an algorithm could be bad in > > other ways. But if a page replacement algorithm happens avoid double > > buffering, that's a genuine merit for that algorithm. > > I don't think it's a good idea to try to synchronize algorithms with the > OSs. There's so much change about the caching logic in e.g. linux that > it won't stay effective for very long. There's also more than one OS... Thanks, Stephen
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On 2014-04-17 21:44:47 +0300, Heikki Linnakangas wrote: > On 04/17/2014 09:38 PM, Stephen Frost wrote: > >* Greg Stark (st...@mit.edu) wrote: > >>On Thu, Apr 17, 2014 at 12:21 PM, Stephen Frost wrote: > >>>Ehhh. No. If it's a hot page that we've been holding in *our* cache > >>>long enough, the kernel will happily evict it as 'cold' from *its* > >>>cache, leading to... > >> > >>This is a whole nother problem. > >> > >>It is worrisome that we could be benchmarking the page replacement > >>algorithm in Postgres and choose a page replacement algorithm that > >>chooses pages that performs well because it tends to evict pages that > >>are in the OS cache. And then one day (hopefully not too far off) > >>we'll fix the double buffering problem and end up with a strange > >>choice of page replacement algorithm. > > > >That's certainly possible but I don't see the double buffering problem > >going away any time particularly soon and, even if it does, it's likely > >to either a) mean we're just using the kernel's cache (eg: something w/ > >mmap, etc), or b) will involve so many other changes that this will end > >up getting changed anyway. In any case, while I think we should > >document any such cache management system we employ as having this risk, > >I don't think we should worry about it terribly much. > > Note that if we somehow come up with a page replacement algorithm that tends > to evict pages that are in the OS cache, we have effectively solved the > double buffering problem. When a page is cached in both caches, evicting it > from one of them eliminates the double buffering. Granted, you might prefer > to evict it from the OS cache instead, and such an algorithm could be bad in > other ways. But if a page replacement algorithm happens avoid double > buffering, that's a genuine merit for that algorithm. I don't think it's a good idea to try to synchronize algorithms with the OSs. There's so much change about the caching logic in e.g. 
linux that it won't stay effective for very long. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Thu, Apr 17, 2014 at 9:52 AM, Bruce Momjian wrote: > Where are we on this? My hope is that I can get agreement on a way forward during pgCon. Or, at the very least, explain the issues as I see them in a relatively accessible and succinct way to those interested. -- Peter Geoghegan
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
* Greg Stark (st...@mit.edu) wrote: > On Thu, Apr 17, 2014 at 12:21 PM, Stephen Frost wrote: > > Ehhh. No. If it's a hot page that we've been holding in *our* cache > > long enough, the kernel will happily evict it as 'cold' from *its* > > cache, leading to... > > This is a whole nother problem. > > It is worrisome that we could be benchmarking the page replacement > algorithm in Postgres and choose a page replacement algorithm that > chooses pages that performs well because it tends to evict pages that > are in the OS cache. And then one day (hopefully not too far off) > we'll fix the double buffering problem and end up with a strange > choice of page replacement algorithm. That's certainly possible but I don't see the double buffering problem going away any time particularly soon and, even if it does, it's likely to either a) mean we're just using the kernel's cache (eg: something w/ mmap, etc), or b) will involve so many other changes that this will end up getting changed anyway. In any case, while I think we should document any such cache management system we employ as having this risk, I don't think we should worry about it terribly much. > It also means that every benchmark is super sensitive to the how large > a fraction of system memory Postgres is managing. If A benchmark of a > page replacement algorithm with 3GB shared buffers might perform well > compared to others on a system with 8GB or 32GB total RAM but actually > be choosing pages very poorly in normal terms and perform terribly on > a system with 4GB total ram. I'm not following you here- benchmarks are already sensitive to how much of the system's memory PG is managing (and how much ends up being *dedicated* to PG's cache and therefore unavailable for other work). 
> Ideally what I would like to see is instrumentation of Postgres's > buffer pinning so we can generate various test loads and then just run > the different algorithms on them and measure precisely how many page > evictions it's causing and when how often it's choosing pages that > need to be read in soon after and so on. We shouldn't have to run > Postgres to get these counts at all, just run the algorithm as we read > through a text file (or database table) listing the pages being > accessed. Go for it. I'd love to see that also. Thanks, Stephen
Re: [HACKERS] How can we make beta testing better?
On 04/17/2014 05:39 AM, Greg Stark wrote: > On Wed, Apr 16, 2014 at 12:53 AM, Rod Taylor wrote: >> 4) Plays queries from the CSV logs starting from $TIME mimicking actual >> timing and transaction boundaries > > This ^^ > > But I recall a number of previous attempts including plugins for > general load testing systems, what happened to them? > > Honestly if you really want to load test properly though what you > really want to do is deploy a copy of your entire application and feed > it requests simulating user traffic. That results in more accurate > representation and gives you data that's easier to act on. Software is available which can do this. The problem is getting the workload in the first place. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 9:21 AM, Stephen Frost wrote: > * Robert Haas (robertmh...@gmail.com) wrote: >> several orders of magnitude more often. That's clearly bad. On >> systems that are not too heavily loaded it doesn't matter too much >> because we just fault the page right back in from the OS pagecache. > > Ehhh. No. If it's a hot page that we've been holding in *our* cache > long enough, the kernel will happily evict it as 'cold' from *its* > cache, leading to... > >> But I've done pgbench runs where such decisions lead to long stalls, >> because the page has to be brought back in from disk, and there's a >> long I/O queue; or maybe just because the kernel thinks PostgreSQL is >> issuing too many I/O requests and makes some of them wait to cool >> things down. > > Exactly this. Yes, I believe that's why this is so effective. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL replay bugs
Heikki Linnakangas writes: > Two things that are not bugs, but I'd like to change just to make this > tool easier to maintain, and to generally clean things up: > 1. When creating a sequence, we first use simple_heap_insert() to insert > the sequence tuple, which creates a WAL record. Then we write a new > sequence RM WAL record about the same thing. The reason is that the WAL > record written by regular heap_insert is bogus for a sequence tuple. > After replaying just the heap insertion, but not the other record, the > page doesn't have the magic value indicating that it's a sequence, i.e. > it's broken as a sequence page. That's OK because we only do this when > creating a new sequence, so if we crash between those two records, the > whole relation is not visible to anyone. Nevertheless, I'd like to fix > that by using PageAddItem directly to insert the tuple, instead of > simple_heap_insert. We have to override the xmin field of the tuple > anyway, and we don't need any of the other services like finding the > insert location, toasting, visibility map or freespace map updates, that > simple_heap_insert() provides. > 2. _bt_restore_page, when restoring a B-tree page split record. It adds > tuples to the page in reverse order compared to how it's done in master. > There is a comment noting that, and it asks "Is it worth changing just > on general principles?". Yes, I think it is. > Any objections to changing those two? Not here. I've always suspected #2 was going to bite us someday anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL replay bugs
On 04/08/2014 06:41 AM, Michael Paquier wrote: On Tue, Apr 8, 2014 at 3:16 AM, Heikki Linnakangas wrote: I've been playing with a little hack that records a before and after image of every page modification that is WAL-logged, and writes the images to a file along with the LSN of the corresponding WAL record. I set up a master-standby replication with that hack in place in both servers, and ran the regression suite. Then I compared the after images after every WAL record, as written on master, and as replayed by the standby. Assuming that adding some dedicated hooks in the core able to do actions before and after a page modification occur is not *that* costly (well I imagine that it is not acceptable in terms of performance), could it be possible to get that in the shape of a extension that could be used to test WAL record consistency? This may be an idea to think about... Yeah, working on it. It can live as a patch set if nothing else. This has been very fruitful, I just committed another fix for a bug I found with this earlier today. There are quite a few things that cause differences between master and standby. We have hint bits in many places, unused space that isn't zeroed etc. Two things that are not bugs, but I'd like to change just to make this tool easier to maintain, and to generally clean things up: 1. When creating a sequence, we first use simple_heap_insert() to insert the sequence tuple, which creates a WAL record. Then we write a new sequence RM WAL record about the same thing. The reason is that the WAL record written by regular heap_insert is bogus for a sequence tuple. After replaying just the heap insertion, but not the other record, the page doesn't have the magic value indicating that it's a sequence, i.e. it's broken as a sequence page. That's OK because we only do this when creating a new sequence, so if we crash between those two records, the whole relation is not visible to anyone. 
Nevertheless, I'd like to fix that by using PageAddItem directly to insert the tuple, instead of simple_heap_insert. We have to override the xmin field of the tuple anyway, and we don't need any of the other services like finding the insert location, toasting, visibility map or freespace map updates, that simple_heap_insert() provides. 2. _bt_restore_page, when restoring a B-tree page split record. It adds tuples to the page in reverse order compared to how it's done in master. There is a comment noting that, and it asks "Is it worth changing just on general principles?". Yes, I think it is. Any objections to changing those two? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 12:21 PM, Stephen Frost wrote: > Ehhh. No. If it's a hot page that we've been holding in *our* cache > long enough, the kernel will happily evict it as 'cold' from *its* > cache, leading to... This is a whole nother problem. It is worrisome that we could be benchmarking the page replacement algorithm in Postgres and choose a page replacement algorithm that chooses pages that performs well because it tends to evict pages that are in the OS cache. And then one day (hopefully not too far off) we'll fix the double buffering problem and end up with a strange choice of page replacement algorithm. It also means that every benchmark is super sensitive to the how large a fraction of system memory Postgres is managing. If A benchmark of a page replacement algorithm with 3GB shared buffers might perform well compared to others on a system with 8GB or 32GB total RAM but actually be choosing pages very poorly in normal terms and perform terribly on a system with 4GB total ram. Ideally what I would like to see is instrumentation of Postgres's buffer pinning so we can generate various test loads and then just run the different algorithms on them and measure precisely how many page evictions it's causing and when how often it's choosing pages that need to be read in soon after and so on. We shouldn't have to run Postgres to get these counts at all, just run the algorithm as we read through a text file (or database table) listing the pages being accessed. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Mon, Feb 10, 2014 at 06:40:30PM +, Peter Geoghegan wrote: > On Sun, Jan 19, 2014 at 2:17 AM, Peter Geoghegan wrote: > > I'm just throwing an error when locking the tuple returns > > HeapTupleInvisible, and the xmin of the tuple is our xid. > > I would like some feedback on this point. We need to consider how > exactly to avoid updating the same tuple inserted by our command. > Updating a tuple we inserted cannot be allowed to happen, not least > because to do so causes livelock. > > A related consideration that I raised in mid to late January that > hasn't been commented on is avoiding updating the same tuple twice, > and where we come down on that with respect to where our > responsibility to the user starts and ends. For example, SQL MERGE > officially forbids this, but MySQL's INSERT...ON DUPLICATE KEY UPDATE > seems not to, probably due to implementation considerations. Where are we on this? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix memset usage in pgcrypto
On Mon, Mar 31, 2014 at 09:03:41PM -0400, Bruce Momjian wrote: > On Thu, Dec 26, 2013 at 03:42:12PM +0200, Marko Kreen wrote: > > http://www.viva64.com/en/b/0227/ reported that on-stack memset()s > > might be optimized away by compilers. Fix it. > > > > * Replace memset() with px_memset() > > * Add px_memset to copy_crlf() > > * ADd px_memset to pgp-s2k.c > > Where are we on this patch? Seems it needs backpatching too. Patch backpatched through 8.4. Thanks. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
* Robert Haas (robertmh...@gmail.com) wrote: > several orders of magnitude more often. That's clearly bad. On > systems that are not too heavily loaded it doesn't matter too much > because we just fault the page right back in from the OS pagecache. Ehhh. No. If it's a hot page that we've been holding in *our* cache long enough, the kernel will happily evict it as 'cold' from *its* cache, leading to... > But I've done pgbench runs where such decisions lead to long stalls, > because the page has to be brought back in from disk, and there's a > long I/O queue; or maybe just because the kernel thinks PostgreSQL is > issuing too many I/O requests and makes some of them wait to cool > things down. Exactly this. > Of course, the overhead of repeated clock sweeps to push down the > usage counts isn't a great thing either. I'm not saying that isn't a > problem. But I think bad decisions about what to evict are also a > problem. Using a bit more CPU here and there, particularly if it's done in a background worker, or ideally multiple background workers (for each buffer pool) would be much better than evicting a hot page that isn't in the kernel's buffer either 'cause we've held on to it long enough that the kernel thinks it's cold. Thanks, Stephen
Re: [HACKERS] shouldn't we log permission errors when accessing the configured trigger file?
On Wed, Apr 16, 2014 at 06:55:14PM -0400, Bruce Momjian wrote: > > Seems reasonable. It could lead to quite a bit of log spam, I > > suppose, but the way things are now could be pretty mystifying if > > you've located your trigger file somewhere outside $PGDATA, and a > > parent directory is lacking permissions. > > > > > > +1. Since it actually indicates something that's quite broken (since with > > that > > you can never make the trigger work until you fix it), the log spam seems > > like > > it would be appropriate. (Logspam is never nice, but a single log line is > > also > > very easy to miss - this should log enough that you wouldn't) > > I have developed the attached patch to address this issue. Patch applied. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Verbose output of pg_dump not show schema name
On Thu, Apr 17, 2014 at 11:44:37AM -0400, Tom Lane wrote: > Bruce Momjian writes: > > The idea is that we only need quotes when there are odd characters in > > the identifier. We do that right now in some places, though I can't > > find them in pg_dump. I know psql does that, see quote_ident(). > > I think our general style rule is that identifiers embedded in messages > are always double-quoted. There's an exception for type names, but > not otherwise. You're confusing the message case with printing SQL. OK. I was unclear if a status _display_ was a message like an error message. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Verbose output of pg_dump not show schema name
Bruce Momjian writes: > The idea is that we only need quotes when there are odd characters in > the identifier. We do that right now in some places, though I can't > find them in pg_dump. I know psql does that, see quote_ident(). I think our general style rule is that identifiers embedded in messages are always double-quoted. There's an exception for type names, but not otherwise. You're confusing the message case with printing SQL. regards, tom lane
Re: [HACKERS] [GENERAL] pg_upgrade & tablespaces
On Wed, Apr 16, 2014 at 01:49:20PM -0400, Bruce Momjian wrote: > On Sun, Jan 12, 2014 at 11:04:41PM -0500, Bruce Momjian wrote: > > > In the pgsql_old installation you have symlinks pointing back to the > > > current default location. As well pg_tablespace points back to > > > /usr/local/pgsql/data/ The issue is that there is not actually > > > anything there in the way of a tablespace. So when pg_upgrade runs > > > it tries to upgrade from /usr/local/pgsql/data/tblspc_dir to > > > /usr/local/pgsql/data/tblspc_dir where the first directory either > > > does not exist. or if the user went ahead and created the directory > > > in the new installation, is empty. What is really wanted is to > > > upgrade from /usr/local/pgsql_old/data/tblspc_dir to > > > /usr/local/pgsql/data/tblspc_dir. Right now the only way that > > > happens is with user intervention. > > > > Right, it points to _nothing_ in the _new_ cluster. Perhaps the > > simplest approach would be to check all the pg_tablespace locations to > > see if they point at real directories. If not, we would have to have > > the user update pg_tablespace and the symlinks. :-( Actually, even in > > 9.2+, those symlinks are going to point at the same "nothing". That > > would support checking the symlinks in all versions. > > I have developed the attached patch which checks all tablespaces to make > sure the directories exist. I plan to backpatch this. > > The reason we haven't seen this bug reported more frequently is that a > _database_ defined in a non-existent tablespace directory already throws > an backend error, so this check is only necessary where tables/indexes > (not databases) are defined in non-existant tablespace directories. Patch applied and backpatched to 9.3. I beefed up the C comment to explain how this can happen: Check that the tablespace path exists and is a directory. Effectively, this is checking only for tables/indexes in non-existent tablespace directories. 
Databases located in non-existent tablespaces already throw a backend error. Non-existent tablespace directories can occur when a data directory that contains user tablespaces is moved as part of pg_upgrade preparation and the symbolic links are not updated. Thanks for the report and debugging. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Verbose output of pg_dump not show schema name
On Thu, Apr 17, 2014 at 12:07:39PM -0300, Fabrízio de Royes Mello wrote: > > Can you get that to _conditionally_ double-quote the strings? > > Sorry, I didn't understand what you means? Your idea is to check if the > namespace is available and then don't show the double-quote, is that? The idea is that we only need quotes when there are odd characters in the identifier. We do that right now in some places, though I can't find them in pg_dump. I know psql does that, see quote_ident(). > > In fact, > > maybe we don't even need the double-quotes. How do we double-quote > > other places? > > > > Checking that more deeply I found some other places that show the table name > and all of them are double-quoted. OK. > Just the "dumping contents of table.." message isn't double-quoted: > > $ grep 'table %s' src/bin/pg_dump/*.c > src/bin/pg_dump/pg_dump.c: write_msg(NULL, "dumping contents of > table %s\n", > > > So maybe we must double-quote of all string, i.e. "public.foo", including the > missing bellow. No, I think double-quoting each part is the correct way. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Tue, Apr 15, 2014 at 7:30 PM, Peter Geoghegan wrote: > Frankly, there doesn't need to be any research on this, because it's > just common sense that probabilistically, leaf pages are much more > useful than heap pages in servicing index scan queries if we assume a > uniform distribution. If we don't assume that, then they're still more > useful on average. I don't think "common sense" is compelling. I think you need to pin down exactly what it is about btree intermediate pages that the LRU isn't capturing and not just argue they're more useful. The LRU is already capturing which pages are more heavily used than others so you need to identify what it is that makes index pages *even more* useful than their frequency and recency of access indicates. Not just that they're more useful than an average page. So what I think is missing is that indexes are always accessed from the root down to the leaf. So the most recent page accessed will always be the leaf. And in whatever chain of pages was used to reach the last leaf page the least recently accessed will always be the root. But we'll need the root page again on the subsequent descent even if it's to reach the same leaf page we kept in ram in preference to it. Now it doesn't *always* make sense to keep an intermediate page over leaf pages. Imagine an index that we always do full traversals of. We'll always descend from the root down the left-most pages and then follow the right pointers across. All the other intermediate pages will be cold. If we do an occasional descent probing for other keys those leaf pages shouldn't be cached since they won't be needed again for the common full index traversals and the next occasional probe will probably be looking for different keys. But if we're often probing for the same keys the last thing we want to do is throw away one of the intermediate pages for those keys when we could throw away a leaf page. But that's what would happen in a strict LRU. 
It's almost like what we would really want to do is mark the pages as least recently used in the opposite order from the order they're actually accessed when descending. Or perhaps bump the usage count to max+1 when it's an intermediate page so that it takes one extra cycle of decrementing before it's considered old compared to a leaf page. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Verbose output of pg_dump not show schema name
On Thu, Apr 17, 2014 at 11:36 AM, Bruce Momjian wrote: > > On Thu, Apr 17, 2014 at 11:29:03AM -0300, Fabrízio de Royes Mello wrote: > > > > This database have a lot of different schemas with same structure and if I > > > > need do view the status of dump I don't know what schema the table are dump > > > > from. > > > Yes this may be helpful. The attached quick'n dirty patch implements it. > > > > > > > Very nice... thanks!!! > > > > I add schema name do the following messages too: > > > > pg_restore: processing data for table "public"."bar" > > pg_restore: processing data for table "public"."foo" > > pg_restore: processing data for table "s1"."bar" > > pg_restore: processing data for table "s1"."foo" > > pg_restore: processing data for table "s2"."bar" > > pg_restore: processing data for table "s2"."foo" > > pg_restore: processing data for table "s3"."bar" > > pg_restore: processing data for table "s3"."foo" > > Can you get that to _conditionally_ double-quote the strings? Sorry, I didn't understand what you means? Your idea is to check if the namespace is available and then don't show the double-quote, is that? > In fact, > maybe we don't even need the double-quotes. How do we double-quote > other places? > Checking that more deeply I found some other places that show the table name and all of them are double-quoted. 
$ grep 'table \\\"%s' src/bin/pg_dump/*.c src/bin/pg_dump/common.c:write_msg(NULL, "failed sanity check, parent OID %u of table \"%s\" (OID %u) not found\n", src/bin/pg_dump/pg_backup_archiver.c:ahlog(AH, 1, "processing data for table \"%s\".\"%s\"\n", src/bin/pg_dump/pg_backup_archiver.c:ahlog(AH, 1, "table \"%s\" could not be created, will not restore its data\n", src/bin/pg_dump/pg_backup_db.c:warn_or_exit_horribly(AH, modulename, "COPY failed for table \"%s\": %s", src/bin/pg_dump/pg_dump.c:write_msg(NULL, "Dumping the contents of table \"%s\" failed: PQgetCopyData() failed.\n", classname); src/bin/pg_dump/pg_dump.c:write_msg(NULL, "Dumping the contents of table \"%s\" failed: PQgetResult() failed.\n", classname); src/bin/pg_dump/pg_dump.c:write_msg(NULL, "WARNING: owner of table \"%s\" appears to be invalid\n", src/bin/pg_dump/pg_dump.c:write_msg(NULL, "reading indexes for table \"%s\"\n", src/bin/pg_dump/pg_dump.c:write_msg(NULL, "reading foreign key constraints for table \"%s\"\n", src/bin/pg_dump/pg_dump.c:write_msg(NULL, "reading triggers for table \"%s\"\n", src/bin/pg_dump/pg_dump.c:exit_horribly(NULL, "query produced null referenced table name for foreign key trigger \"%s\" on table \"%s\" (OID of table: %u)\n", src/bin/pg_dump/pg_dump.c:write_msg(NULL, "finding the columns and types of table \"%s\".\"%s\"\n", src/bin/pg_dump/pg_dump.c:write_msg(NULL, "finding the columns and types of table \"%s\"\n", src/bin/pg_dump/pg_dump.c: "invalid column numbering in table \"%s\"\n", src/bin/pg_dump/pg_dump.c:write_msg(NULL, "finding default expressions of table \"%s\"\n", src/bin/pg_dump/pg_dump.c: "invalid adnum value %d for table \"%s\"\n", src/bin/pg_dump/pg_dump.c:write_msg(NULL, "finding check constraints for table \"%s\"\n", src/bin/pg_dump/pg_dump.c:write_msg(NULL, ngettext("expected %d check constraint on table \"%s\" but found %d\n", src/bin/pg_dump/pg_dump.c: "expected %d check constraints on table \"%s\" but found %d\n", 
src/bin/pg_dump/pg_dump.c:exit_horribly(NULL, "invalid column number %d for table \"%s\"\n",
src/bin/pg_dump/pg_dump.c:write_msg(NULL, "invalid argument string (%s) for trigger \"%s\" on table \"%s\"\n",
src/bin/pg_dump/pg_dump.c:write_msg(NULL, "query to get rule \"%s\" for table \"%s\" failed: wrong number of rows returned\n",

Just the "dumping contents of table.." message isn't double-quoted:

$ grep 'table %s' src/bin/pg_dump/*.c
src/bin/pg_dump/pg_dump.c:write_msg(NULL, "dumping contents of table %s\n",

So maybe we should double-quote all the strings, i.e. "public.foo", including the one currently missing them.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 10:18 AM, Robert Haas wrote: > Because all the usage counts are the same, the eviction at > this point is completely indiscriminate. We're just as likely to kick > out a btree root page or a visibility map page as we are to kick out a > random heap page, even though the former have probably been accessed > several orders of magnitude more often. That's clearly bad. That's not clear at all. In that circumstance regardless of what page you evict you're incurring precisely one page fault i/o when the page is read back in. Incurring that i/o is bad but it's unavoidable and it's the same badness regardless of what page it's for. The only way to prefer one page over another is if one page won't be needed for long enough for the page to be useful for caching this new buffer (or mixture of buffers) for multiple accesses. If you can't do that then it doesn't matter which buffer you use since it'll just be evicted to read back in the original page again. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On 2014-04-17 10:48:15 -0400, Bruce Momjian wrote: > On Thu, Apr 17, 2014 at 10:40:40AM -0400, Robert Haas wrote: > > That can happen, but the real problem I was trying to get at is that > > when all the buffers get up to max usage count, they all appear > > equally important. But in reality they're not. So when we do start > > evicting those long-resident buffers, it's essentially random which > > one we kick out. > > True. Ideally we would have some way to know that _all_ the buffers had > reached the maximum and kick off a sweep to decrement them all. I am > unclear how we would do that. One odd idea would be to have a global > counter that is incremented everytime a buffer goes from 4 to 5 (max) > --- when the counter equals 50% of all buffers, do a clock sweep. Of > course, then the counter becomes a bottleneck. I have my doubts that we'll make the current scheme, where buffer reclaim essentially is O(NBuffers), work much better. Especially as CPU cache effects make such large, high frequency, accesses really expensive. I think we need more drastic changes. I am *not* suggesting that we do that, but I believe it'd be possible to implement a full LRU and be faster than today in scenarios with nontrivial amounts of shared buffers. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 10:48 AM, Bruce Momjian wrote: >> > I understand now. If there is no memory pressure, every buffer gets the >> > max usage count, and when a new buffer comes in, it isn't the max so it >> > is swiftly removed until the clock sweep has time to decrement the old >> > buffers. Decaying buffers when there is no memory pressure creates >> > additional overhead and gets into timing issues of when to decay. >> >> That can happen, but the real problem I was trying to get at is that >> when all the buffers get up to max usage count, they all appear >> equally important. But in reality they're not. So when we do start >> evicting those long-resident buffers, it's essentially random which >> one we kick out. > > True. Ideally we would have some way to know that _all_ the buffers had > reached the maximum and kick off a sweep to decrement them all. I am > unclear how we would do that. One odd idea would be to have a global > counter that is incremented everytime a buffer goes from 4 to 5 (max) > --- when the counter equals 50% of all buffers, do a clock sweep. Of > course, then the counter becomes a bottleneck. Yeah, I think that's the right general line of thinking. But it doesn't have to be as coarse-grained as "do a whole clock sweep". It can be, you know, for every buffer that gets incremented from 4 to 5, run the clock sweep far enough to decrement the usage count of some other buffer by one. That's similar to your idea but you can do it a bit at a time rather than having to make a complete pass over shared_buffers all at once. Your other point, that the counter can become the bottleneck, is quite right also and a major problem in this area. I don't know how to solve it right at the moment, but I'm hopeful that there may be a way. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 10:40:40AM -0400, Robert Haas wrote: > On Thu, Apr 17, 2014 at 10:32 AM, Bruce Momjian wrote: > > On Thu, Apr 17, 2014 at 10:18:43AM -0400, Robert Haas wrote: > >> I also believe this to be the case on first principles and my own > >> experiments. Suppose you have a workload that fits inside > >> shared_buffers. All of the usage counts will converge to 5. Then, > >> somebody accesses a table that is not cached, so something's got to be > >> evicted. Because all the usage counts are the same, the eviction at > >> this point is completely indiscriminate. We're just as likely to kick > >> out a btree root page or a visibility map page as we are to kick out a > >> random heap page, even though the former have probably been accessed > >> several orders of magnitude more often. That's clearly bad. On > >> systems that are not too heavily loaded it doesn't matter too much > >> because we just fault the page right back in from the OS pagecache. > >> But I've done pgbench runs where such decisions lead to long stalls, > >> because the page has to be brought back in from disk, and there's a > >> long I/O queue; or maybe just because the kernel thinks PostgreSQL is > >> issuing too many I/O requests and makes some of them wait to cool > >> things down. > > > > I understand now. If there is no memory pressure, every buffer gets the > > max usage count, and when a new buffer comes in, it isn't the max so it > > is swiftly removed until the clock sweep has time to decrement the old > > buffers. Decaying buffers when there is no memory pressure creates > > additional overhead and gets into timing issues of when to decay. > > That can happen, but the real problem I was trying to get at is that > when all the buffers get up to max usage count, they all appear > equally important. But in reality they're not. So when we do start > evicting those long-resident buffers, it's essentially random which > one we kick out. True. 
Ideally we would have some way to know that _all_ the buffers had reached the maximum and kick off a sweep to decrement them all. I am unclear how we would do that. One odd idea would be to have a global counter that is incremented every time a buffer goes from 4 to 5 (max) --- when the counter equals 50% of all buffers, do a clock sweep. Of course, then the counter becomes a bottleneck.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 10:32 AM, Bruce Momjian wrote: > On Thu, Apr 17, 2014 at 10:18:43AM -0400, Robert Haas wrote: >> I also believe this to be the case on first principles and my own >> experiments. Suppose you have a workload that fits inside >> shared_buffers. All of the usage counts will converge to 5. Then, >> somebody accesses a table that is not cached, so something's got to be >> evicted. Because all the usage counts are the same, the eviction at >> this point is completely indiscriminate. We're just as likely to kick >> out a btree root page or a visibility map page as we are to kick out a >> random heap page, even though the former have probably been accessed >> several orders of magnitude more often. That's clearly bad. On >> systems that are not too heavily loaded it doesn't matter too much >> because we just fault the page right back in from the OS pagecache. >> But I've done pgbench runs where such decisions lead to long stalls, >> because the page has to be brought back in from disk, and there's a >> long I/O queue; or maybe just because the kernel thinks PostgreSQL is >> issuing too many I/O requests and makes some of them wait to cool >> things down. > > I understand now. If there is no memory pressure, every buffer gets the > max usage count, and when a new buffer comes in, it isn't the max so it > is swiftly removed until the clock sweep has time to decrement the old > buffers. Decaying buffers when there is no memory pressure creates > additional overhead and gets into timing issues of when to decay. That can happen, but the real problem I was trying to get at is that when all the buffers get up to max usage count, they all appear equally important. But in reality they're not. So when we do start evicting those long-resident buffers, it's essentially random which one we kick out. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buildfarm "master-next" branch?
Andrew Dunstan writes: > On 04/17/2014 09:17 AM, Robert Haas wrote: >> In terms of improving the buildfarm infrastructure, the thing I would >> most like to have is more frequent runs. IMO the best single thing that could happen for the buildfarm is if we had more critters (at least twice as many) running a wider variety of platforms, compilers, and configuration options than there are today. More frequent runs would come out of that automatically. >> ... But that would require more resources for the >> buildfarm machines, which are provided on a strictly volunteer basis, >> so it's hard to see how to arrange that. I don't think we've tried hard lately to get people to sign up. Maybe we should ask the -advocacy crew to do something. >> But the ability to easily spin up temporary branches for testing would >> also be great. Unfortunately, I suspect that only a minority of the >> buildfarm owners would choose to participate, which would make it less >> useful, but if we could solve that problem I'd be all in favor of it. > ... Of course, all this would be done in my copious spare time *cough*. I'm > not sure this would be the best use of it. I agree that this would not be worth the effort needed to make it happen. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Verbose output of pg_dump not show schema name
On Thu, Apr 17, 2014 at 11:29:03AM -0300, Fabrízio de Royes Mello wrote: > > > This database have a lot of different schemas with same structure and if I > > > need do view the status of dump I don't know what schema the table are > > > dump > > > from. > > Yes this may be helpful. The attached quick'n dirty patch implements it. > > > > Very nice... thanks!!! > > I add schema name do the following messages too: > > pg_restore: processing data for table "public"."bar" > pg_restore: processing data for table "public"."foo" > pg_restore: processing data for table "s1"."bar" > pg_restore: processing data for table "s1"."foo" > pg_restore: processing data for table "s2"."bar" > pg_restore: processing data for table "s2"."foo" > pg_restore: processing data for table "s3"."bar" > pg_restore: processing data for table "s3"."foo" Can you get that to _conditionally_ double-quote the strings? In fact, maybe we don't even need the double-quotes. How do we double-quote other places? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 10:18:43AM -0400, Robert Haas wrote: > I also believe this to be the case on first principles and my own > experiments. Suppose you have a workload that fits inside > shared_buffers. All of the usage counts will converge to 5. Then, > somebody accesses a table that is not cached, so something's got to be > evicted. Because all the usage counts are the same, the eviction at > this point is completely indiscriminate. We're just as likely to kick > out a btree root page or a visibility map page as we are to kick out a > random heap page, even though the former have probably been accessed > several orders of magnitude more often. That's clearly bad. On > systems that are not too heavily loaded it doesn't matter too much > because we just fault the page right back in from the OS pagecache. > But I've done pgbench runs where such decisions lead to long stalls, > because the page has to be brought back in from disk, and there's a > long I/O queue; or maybe just because the kernel thinks PostgreSQL is > issuing too many I/O requests and makes some of them wait to cool > things down. I understand now. If there is no memory pressure, every buffer gets the max usage count, and when a new buffer comes in, it isn't the max so it is swiftly removed until the clock sweep has time to decrement the old buffers. Decaying buffers when there is no memory pressure creates additional overhead and gets into timing issues of when to decay. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Verbose output of pg_dump not show schema name
On Thu, Apr 17, 2014 at 2:14 AM, Michael Paquier wrote:
>
> On Thu, Apr 17, 2014 at 11:41 AM, Fabrízio de Royes Mello wrote:
> > Hi all,
> >
> > Is there some reason the verbose output of pg_dump doesn't show the schema
> > name?
> >
> > An output example of using "pg_dump -Fd -j8 -v"
> Specifying a target directory with "-f" is better here...

Yeah... I'm just showing the relevant options used... ;-)

> > This database has a lot of different schemas with the same structure, and if I
> > need to view the status of a dump I don't know which schema the table is
> > dumped from.
> Yes this may be helpful. The attached quick'n dirty patch implements it.

Very nice... thanks!!!

I added the schema name to the following messages too:

pg_restore: processing data for table "public"."bar"
pg_restore: processing data for table "public"."foo"
pg_restore: processing data for table "s1"."bar"
pg_restore: processing data for table "s1"."foo"
pg_restore: processing data for table "s2"."bar"
pg_restore: processing data for table "s2"."foo"
pg_restore: processing data for table "s3"."bar"
pg_restore: processing data for table "s3"."foo"

And:

pg_dump: finding the columns and types of table "s1"."foo"
pg_dump: finding the columns and types of table "s1"."bar"
pg_dump: finding the columns and types of table "s2"."foo"
pg_dump: finding the columns and types of table "s2"."bar"
pg_dump: finding the columns and types of table "s3"."foo"
pg_dump: finding the columns and types of table "s3"."bar"
pg_dump: finding the columns and types of table "public"."foo"
pg_dump: finding the columns and types of table "public"."bar"

And:

pg_dump: processing data for table "public"."bar"
pg_dump: dumping contents of table public.bar
pg_dump: processing data for table "public"."foo"
pg_dump: dumping contents of table public.foo
pg_dump: processing data for table "s1"."bar"
pg_dump: dumping contents of table s1.bar
pg_dump: processing data for table "s1"."foo"
pg_dump: dumping contents of table s1.foo
pg_dump: processing data for
table "s2"."bar"
pg_dump: dumping contents of table s2.bar
pg_dump: processing data for table "s2"."foo"
pg_dump: dumping contents of table s2.foo
pg_dump: processing data for table "s3"."bar"
pg_dump: dumping contents of table s3.bar
pg_dump: processing data for table "s3"."foo"
pg_dump: dumping contents of table s3.foo

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 9464540..7f73e8d 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -706,8 +706,8 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 			_becomeOwner(AH, te);
 			_selectOutputSchema(AH, te->namespace);
 
-			ahlog(AH, 1, "processing data for table \"%s\"\n",
-				  te->tag);
+			ahlog(AH, 1, "processing data for table \"%s\".\"%s\"\n",
+				  AH->currSchema, te->tag);
 
 			/*
 			 * In parallel restore, if we created the table earlier in
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index a6c0428..f763a88 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1400,7 +1400,17 @@ dumpTableData_copy(Archive *fout, void *dcontext)
 	const char *column_list;
 
 	if (g_verbose)
-		write_msg(NULL, "dumping contents of table %s\n", classname);
+	{
+		/* Print namespace information if available */
+		if (tbinfo->dobj.namespace != NULL &&
+			tbinfo->dobj.namespace->dobj.name != NULL)
+			write_msg(NULL, "dumping contents of table %s.%s\n",
+					  tbinfo->dobj.namespace->dobj.name,
+					  classname);
+		else
+			write_msg(NULL, "dumping contents of table %s\n",
+					  classname);
+	}
 
 	/*
 	 * Make sure we are in proper schema.  We will qualify the table name
@@ -6291,8 +6301,17 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
 	 * the output of an indexscan on pg_attribute_relid_attnum_index.
 	 */
 	if (g_verbose)
-		write_msg(NULL, "finding the columns and types of table \"%s\"\n",
-				  tbinfo->dobj.name);
+	{
+		/* Print namespace information if available */
+		if (tbinfo->dobj.namespace != NULL &&
+			tbinfo->dobj.namespace->dobj.name != NULL)
+			write_msg(NULL, "finding the columns and types of table \"%s\".\"%s\"\n",
+					  tbinfo->dobj.namespace->dobj.name,
+					  tbinfo->dobj.name);
+		else
+			write_msg(NULL, "finding the columns and types of table \"%s\"\n",
+					  tbinfo->dobj.name);
+	}
 
 	resetPQExpBuffer(q);

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buildfarm "master-next" branch?
On 04/17/2014 09:17 AM, Robert Haas wrote:

In terms of improving the buildfarm infrastructure, the thing I would most like to have is more frequent runs. It would be great if pushing a commit to the master repository triggered an immediate build on every buildfarm animal so that you could see all of the failures in a short period of time. But that would require more resources for the buildfarm machines, which are provided on a strictly volunteer basis, so it's hard to see how to arrange that.

Some buildfarm owners run at pretty high frequency - I know there are cron jobs on some running every 15 minutes. My own Linux and FBSD machines run every hour. Windows builds take longer - depending on other use of resources they can run a couple of hours per branch. Also, my two Windows machines doing buildfarm work are running a total of 5 animals, so the runs are staggered - on Windows 8 the two animals each run every 3 hours. Note that each run potentially builds all the branches, if there has been some backported change, and the Windows animals are set up so that if animal A on the same machine is running when animal B's run time comes around, animal B skips its scheduled run. So sometimes you do have to wait a bit.

If someone were to provide me with a bunch of nice fast Windows VMs I would set them up with one animal apiece with frequent runs and we might get a lot better coverage. But I am tapped out as far as the resources I can provide go.

But the ability to easily spin up temporary branches for testing would also be great. Unfortunately, I suspect that only a minority of the buildfarm owners would choose to participate, which would make it less useful, but if we could solve that problem I'd be all in favor of it. I'm not volunteering to do the work, though.

The buildfarm's original purpose was to give early warning of platform-specific problems of code we had *already* decided on. Now projects morph, so we might decide to do something like this.
But we'd need to think long and hard about it. Postgres has not historically used short-lived branches. I don't much like Craig's idea of a long-lived testing branch that we're going to do commits and reverts on. If we're going to do something like this it would be much better to make some provision for short-lived topic branches. e.g. say we allowed branches with names like testme-platformname-featurename, (platformname here could be a magic "all", or a comma-separated list of names such as linux, freebsd, windows). Wnen testing is done, we could merge the branch if the testing worked out OK, or drop it if the testing proved to be a failure. There would be some work to make the buildfarm client suitable for this. And we'd probably need a "testing dashboard" so as to keep the main dashboard page free of test branch results. Of course, all this would be done in my copious spare time *cough*. I'm not sure this would be the best use of it. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 9:40 AM, Greg Stark wrote: > On Wed, Apr 16, 2014 at 12:44 AM, Robert Haas wrote: >> This isn't a fundamental property of the usage-count idea; it's an >> artifact of the fact that usage count decreases are tied to eviction >> pressure rather than access pressure. For example, suppose we made a >> rule that if the total usage counts of all buffers exceed 3 * >> NBuffers, then every time you bump the usage count of a buffer from N >> to N+1, you're required to advance the clock sweep far enough to >> decrease the reference count of a buffer by one. > > This sounds like the right way to reason about it. > > From what I remember in school the idea with the clock sweep is to set > the usage flags to the maximum whenever the buffer is used and > decrement (actually iirc typically shift right) it when the clock > sweep goes by. Ie, simulate a LRU where when the buffer is accessed it > jumps to the head of the list and when the clock comes by it moves > gradually down the list. > > What you're pointing out is that the clock might not come by very > often resulting everything being at the head of the list. In that case > I'm not clear it really matters what gets evicted though. And the cpu > effort of running the clock n times sounds bad but doing the work > earlier doesn't really change the amount of work being done, it just > amortizes it over more calls. > > But if you want to do that it seems to me the way to do it is every > time a buffer is pinned set to the maximum and then run the clock > max_value - previous_value. So the total usage counts of all buffers > remains constant. If that results in contention one way to reduce it > is to do this probabilistically. Run the clock 1% of the time but run > it 100x as much as you would normally. > > But I think you've misidentified the problem and what those other > algorithms are trying to solve. The problem is not that Postgres will > pick a bad buffer to evict. 
pick a bad buffer to evict. If all the buffers have been used since the
> last time the clock came around then they're all "hot" anyways and it
> doesn't really matter which one we evict. The problem is that we
> expend an inordinate amount of work finding the few non-hot buffers.
> When you have a really large amount of memory and 99.9% of it is hot
> but 0.1% is whatever random non-hot page was needed last then there's
> an obvious buffer to evict when you need a new one. But we spend a lot
> of work decrementing every hot buffer's usage count 4 times only to
> have them immediately incremented again just to find the 1 buffer
> where the usage count was 4 or 3. The goal of these algorithms that
> divide the buffers into groups is to avoid having to do so much work
> to find the colder buffers. Once the hot buffers migrate to the hot
> pool we only need to run the clock there when we find we have new hot
> pages that we want to promote. All the thrashing in the cold pool can
> be more efficient because there's many fewer pages to consider.

Well, I think Peter has proved that PostgreSQL *will* pick a bad buffer to evict. The proof is that when he changed the choice of buffer to evict, he got a significant performance improvement. I also believe this to be the case on first principles and my own experiments.

Suppose you have a workload that fits inside shared_buffers. All of the usage counts will converge to 5. Then, somebody accesses a table that is not cached, so something's got to be evicted. Because all the usage counts are the same, the eviction at this point is completely indiscriminate. We're just as likely to kick out a btree root page or a visibility map page as we are to kick out a random heap page, even though the former have probably been accessed several orders of magnitude more often. That's clearly bad. On systems that are not too heavily loaded it doesn't matter too much because we just fault the page right back in from the OS pagecache.
But I've done pgbench runs where such decisions lead to long stalls, because the page has to be brought back in from disk, and there's a long I/O queue; or maybe just because the kernel thinks PostgreSQL is issuing too many I/O requests and makes some of them wait to cool things down. Of course, the overhead of repeated clock sweeps to push down the usage counts isn't a great thing either. I'm not saying that isn't a problem. But I think bad decisions about what to evict are also a problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] assertion failure 9.3.4
On 04/16/2014 10:28 PM, Tom Lane wrote: Andrew Dunstan writes: On 04/16/2014 07:19 PM, Tom Lane wrote: Yeah, it would be real nice to see a self-contained test case for this. Well, that might be hard to put together, but I did try running without pg_stat_statements and auto_explain loaded and the error did not occur. Not sure where that gets us in terms of deciding on a culprit. Could we at least see the exact parameter settings for pg_stat_statements and auto_explain? (And any other GUCs with nondefault values?) Here are all the settings from the run that failed: listen_addresses = '*' port = 5432 fsync = on synchronous_commit = off checkpoint_segments = 128 checkpoint_completion_target = 0.9 shared_buffers = 512MB max_connections = 300 work_mem = 128MB maintenance_work_mem = 32MB effective_cache_size = 16GB effective_io_concurrency = 2 logging_collector = on log_destination = 'stderr' log_filename = 'postgresql-%a.log' log_rotation_size = 0 log_truncate_on_rotation = on log_line_prefix = '%t [%p] %l: ' log_connections = on log_disconnections = on log_statement = 'all' track_activity_query_size = 10240 shared_preload_libraries = 'auto_explain,pg_stat_statements' As you can see, auto_explain's log_min_duration hasn't been set, so it shouldn't be doing anything very much, I should think. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BGWorkers, shared memory pointers, and postmaster restart
On Wed, Apr 16, 2014 at 8:46 PM, Craig Ringer wrote: > On 04/17/2014 12:16 AM, Robert Haas wrote: >> On Wed, Apr 16, 2014 at 7:11 AM, Craig Ringer wrote: > >>> - A flag like BGW_UNREGISTER_ON_RESTART; >> >> I would be OK with this, maybe modulo the name. >> >>> - To always unregister dynamic bgws on postmaster shm clear + restart; >> >> I don't particularly care for this. Let's suppose the background >> worker is a long-running daemon, like a PG equivalent of cron. In >> static-background worker land, the admin has to restart the cluster to >> get this going. In dynamic-background worker land, he can load it on >> the fly. But once he gets it up and running, he wants it to stay up >> and running, surviving crashes and everything. That's a big part of >> the value of having a background worker interface in the first place. > > It'd be the job of the extension that provides the bgworker to make sure > that it gets relaunched on postmaster restart. > > I tend to agree though. The problem I'm describing only causes issues > for extensions that launch dynamic bgworkers during extension startup. > > Extensions that launch bgworkers in response to SQL commands and don't > rely on passing state to the bgworker via shared memory shouldn't have > to deal with restarting them on postmaster restart. > >> >>> - A way to list bgws, inspect their BackgroundWorker structs and obtain >>> their handles; or >> >> This is certainly a good idea. >> >>> - A way to idempotently register a bgw only if it doesn't already exist >> >> This is probably a good idea, too. > > I think we need _one_ solution for 9.4, and need it soon. > > The simplest is probably to flush all dynamic bgworkers. But I think > you're probably right - that solves the problem discussed here, but is > likely to make life harder for other use cases. 
> > A last-minute API addition for bgworker listing/inspection might not be
> a great idea - it's too late for it to see much testing and analysis and
> it might introduce bigger API problems than it solves.
>
> Duplicate-free registration might be OK, but there are some questions
> around how we'd handle differing parameters, what should be the
> determinant for uniqueness, whether we should go for idempotency or
> return/raise an error to indicate it already exists, etc. So similar
> issue with doing it at the last minute.
>
> To me, that says "let's add a flag to allow a dynamic bgworker to be
> unregistered on postmaster restart". Seems simple and low risk.
>
> I'll follow up with a proposed patch, then we can spend some quality
> shed time on the flag name ;-)

I think I can live with that. However, I tend to think that the best solution here is really "don't try to pass pointers via the BackgroundWorker structure, because it doesn't [ expletive ] work." We've had several instances of that already.

When I added support for dynamic background workers, I had to add bgw_library_name and bgw_function_name members to that structure because bgw_main won't work for extension code unless the library is loaded from shared_preload_libraries AND either we're not running under EXEC_BACKEND (i.e. Windows) or the system is kind enough to load the shared library at the same address in both processes, in which case it will accidentally fail to fail. We would have had to give bgw_sighup and bgw_sigterm the same treatment, but since they weren't really necessary in the first place we just ripped them out instead.

worker_spi also had a bad case of this disease. It went to elaborate lengths to pass a pointer via bgw_main_arg, but the pointer was to *backend-private memory*, so it was completely broken on EXEC_BACKEND builds.
I'm not sure whether it actually failed there, or managed to work just because the background worker backend also ran the _PG_init hook and managed to accidentally place the same data structure at the same address. I fixed that as part of introducing the dynamic background worker facility; now it passes an index instead of a pointer. What you're complaining about here is basically another instance of the same problem. It's not as bad because the main shared memory segment never moves or has any memory freed except after a crash-and-restart cycle, so your suggested plug seems likely to be adequate. But it also requires that you can allocate enough space in shared memory to pass around whatever state you need to carry around, and space in the main shared memory segment is at a premium; storage there also can't be freed. Noah suggested to me a while back that we might do better to change bgw_main_arg to something like char[64] rather than Datum, which would allow passing a reasonable-size payload without having to futz with shared memory. Then, instead of passing a pointer, you can pass the name of a shared memory region to look up and an index into the data structure stored there, or something like that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Co
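The index-instead-of-pointer pattern described above can be sketched as follows. This is a minimal illustration of the general idea only, not PostgreSQL code: `worker_configs` and `start_worker` are hypothetical names standing in for a table at a well-known location (e.g. in the main shared memory segment, or rebuilt by `_PG_init` in each process) and for a worker entry point receiving `bgw_main_arg`.

```python
# Illustrative sketch only: why passing a small index survives process
# re-execution while passing a raw pointer does not.  The names here
# (worker_configs, start_worker) are hypothetical, not PostgreSQL APIs.

# A table that both the registering process and the worker process can
# locate independently -- the analogue of a structure in the main shared
# memory segment, or of worker_spi's array indexed by bgw_main_arg.
worker_configs = [
    {"name": "worker_a", "naptime": 10},
    {"name": "worker_b", "naptime": 60},
]

def start_worker(main_arg):
    """The worker entry point receives a small scalar (an index), not a
    pointer, so it still works when the new process has a different
    address space (EXEC_BACKEND / Windows)."""
    config = worker_configs[main_arg]
    return "worker %s started, naptime %d" % (config["name"],
                                              config["naptime"])
```

A pointer into the registering backend's private memory would be meaningless in the worker's address space; an index into a structure both sides can find is not.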
Re: [HACKERS] four minor proposals for 9.5
Pavel Stehule writes: > We can introduce new feature without hard dependency on CSV format Look, the long and the short of it is that there is not consensus that this measurement is worth creating a new CSV log column for. And from that, there is also not consensus that it's worth putting into log_line_prefix. Therefore, this proposal is dead. Please stop prolonging the argument. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] four minor proposals for 9.5
2014-04-17 7:12 GMT+02:00 Amit Kapila : > On Mon, Apr 14, 2014 at 6:27 PM, Robert Haas > wrote: > > I agree. I don't think the idea of pushing this into the > > log_line_prefix stuff as a one-off is a very good one. Sure, we could > > wedge it in there, but we've got an existing precedent that everything > > that you can get with log_line_prefix also shows up in the CSV output > > file. And it's easy to imagine LOTS more counters that somebody might > > want to have. Time spent planning, time spent executing, time spent > > waiting for disk I/O, time spent returning results to client, and I'm > > sure people will think of many others. I think this will balloon out > > of control if we don't have a more systematic design for this sort of > > thing. > > Can't we think of some infrastructure similar to what is done for > log_duration and log_min_duration_statement? > Currently it prints like below: > LOG: duration: 343.000 ms statement: create table t1(c1 int); > > Let us say a user wants to track the lock wait time a statement has > spent; then enable some config parameter (either log_lock_duration > or some other convenient way) > > LOG: lock duration: 'x' ms statement: create table t1(c1 int); > isn't that analogous to log_line_prefix? We can introduce this feature without a hard dependency on the CSV format. I think the requirements are clear: simply parseable (a vector of numbers is ideal) and simply activated/deactivated (maybe a list of flags in a GUC). Regards Pavel > > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com >
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 16, 2014 at 12:44 AM, Robert Haas wrote: > This isn't a fundamental property of the usage-count idea; it's an > artifact of the fact that usage count decreases are tied to eviction > pressure rather than access pressure. For example, suppose we made a > rule that if the total usage counts of all buffers exceed 3 * > NBuffers, then every time you bump the usage count of a buffer from N > to N+1, you're required to advance the clock sweep far enough to > decrease the reference count of a buffer by one. This sounds like the right way to reason about it. From what I remember in school, the idea with the clock sweep is to set the usage flags to the maximum whenever the buffer is used and decrement (actually, iirc, typically shift right) them when the clock sweep goes by. I.e., simulate an LRU where the buffer jumps to the head of the list when it is accessed and moves gradually down the list when the clock comes by. What you're pointing out is that the clock might not come by very often, resulting in everything being at the head of the list. In that case I'm not clear it really matters what gets evicted, though. And the CPU effort of running the clock n times sounds bad, but doing the work earlier doesn't really change the amount of work being done; it just amortizes it over more calls. But if you want to do that, it seems to me the way to do it is: every time a buffer is pinned, set it to the maximum and then run the clock max_value - previous_value times. That way the total usage counts of all buffers remain constant. If that results in contention, one way to reduce it is to do this probabilistically: run the clock 1% of the time but run it 100x as much as you would normally. But I think you've misidentified the problem and what those other algorithms are trying to solve. The problem is not that Postgres will pick a bad buffer to evict.
If all the buffers have been used since the last time the clock came around, then they're all "hot" anyway and it doesn't really matter which one we evict. The problem is that we expend an inordinate amount of work finding the few non-hot buffers. When you have a really large amount of memory and 99.9% of it is hot but 0.1% is whatever random non-hot page was needed last, then there's an obvious buffer to evict when you need a new one. But we spend a lot of work decrementing every hot buffer's usage count 4 times only to have them immediately incremented again, just to find the 1 buffer where the usage count was 4 or 3. The goal of these algorithms that divide the buffers into groups is to avoid having to do so much work to find the colder buffers. Once the hot buffers migrate to the hot pool, we only need to run the clock there when we find we have new hot pages that we want to promote. All the thrashing in the cold pool can be more efficient because there are many fewer pages to consider. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
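Robert's rule quoted at the top of this exchange can be modeled in a few lines. The sketch below ties usage-count decreases to access pressure: once the sum of all usage counts exceeds 3 * NBuffers, every bump pays for itself by advancing the clock until some other count is decremented. This is a toy model of the proposal, not PostgreSQL's actual bufmgr code, and the constants are arbitrary.

```python
# Toy model of the proposed rule: when total usage counts exceed
# 3 * NBuffers, each usage-count bump must advance the clock sweep far
# enough to decrement one buffer's count.  Not PostgreSQL code.

NBUFFERS = 8
MAX_USAGE = 5

usage = [0] * NBUFFERS
hand = 0

def sweep_decrement():
    """Advance the clock hand until one usage count is decremented."""
    global hand
    while True:
        i = hand
        hand = (hand + 1) % NBUFFERS
        if usage[i] > 0:
            usage[i] -= 1
            return

def pin(buf):
    """Bump a buffer's usage count, paying for the bump with a sweep
    step once total usage pressure is high enough."""
    if usage[buf] < MAX_USAGE:
        usage[buf] += 1
    if sum(usage) > 3 * NBUFFERS:
        sweep_decrement()

# Sustained pinning: total pressure stays bounded by 3 * NBUFFERS,
# so the sweep never has to decrement everything 4 times just to
# find a victim.
for i in range(100):
    pin(i % NBUFFERS)
```

The invariant that the total usage count stays at or below 3 * NBuffers is exactly what prevents the "everything at the head of the list" state described above.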
Re: [HACKERS] Minor performance improvement in transition to external sort
On Wed, Apr 16, 2014 at 7:38 PM, Bruce Momjian wrote: > On Thu, Apr 10, 2014 at 06:03:15PM +0100, Simon Riggs wrote: >> On 6 February 2014 18:21, Jeff Janes wrote: >> > On Tue, Feb 4, 2014 at 2:22 PM, Jeremy Harris wrote: >> >> >> >> The attached patch replaces the existing siftup method for heapify with >> >> a siftdown method. Tested with random integers it does 18% fewer >> >> compares and takes 10% less time for the heapify, over the work_mem >> >> range 1024 to 1048576. >> > >> > >> > Thanks for working on this. >> >> +1 >> >> Your patch isn't linked properly from the CF manager though. >> >> If you like patches like this then there's a long(er) list of >> optimizations already proposed previously around sorting. It would be >> good to have someone work through them for external sorts. I believe >> Noah is working on parallel internal sort (as an aside). >> >> There's also an optimization possible for merge joins where we use the >> output of the first sort as an additional filter on the second sort. >> That can help when we're going to join two disjoint tables. > > Where should this be recorded? TODO? Commitfest manager? IIUC, the original patch was withdrawn; any remaining action items should probably go to TODO. I'm not sure which specific idea you're referring to, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
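The siftup-vs-siftdown distinction the original patch was about can be sketched briefly. Building a heap bottom-up with sift-down (Floyd's method) is O(n), versus O(n log n) comparisons for repeated sift-up insertion, which is where the reported reduction in compares comes from. This is an illustration of the algorithm only, not tuplesort.c's implementation.

```python
# Bottom-up heapify via sift-down (Floyd's method): sift every internal
# node down, starting from the last one.  Sketch of the algorithm the
# patch discussion refers to; not PostgreSQL's tuplesort.c code.

def sift_down(heap, start, end):
    """Push heap[start] down until the min-heap property holds below it."""
    root = start
    while 2 * root + 1 <= end:
        child = 2 * root + 1
        # pick the smaller of the two children
        if child + 1 <= end and heap[child + 1] < heap[child]:
            child += 1
        if heap[child] < heap[root]:
            heap[root], heap[child] = heap[child], heap[root]
            root = child
        else:
            return

def heapify(items):
    """Build a min-heap in place in O(n) by sifting down each internal
    node, instead of sift-up insertion of each element (O(n log n))."""
    n = len(items)
    for start in range(n // 2 - 1, -1, -1):
        sift_down(items, start, n - 1)
    return items
```

The asymptotic win comes from most nodes being near the bottom of the heap, where sift-down has almost no work to do.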
Re: [HACKERS] Need Multixact Freezing Docs
On Wed, Apr 16, 2014 at 4:39 PM, Josh Berkus wrote: >> Hmm, are you sure it's INT_MAX and not 4244967297? Heikki reported >> that: http://www.postgresql.org/message-id/52401aea.9000...@vmware.com >> The absolute value is not important; I think that's mostly harmless. I >> don't think applying age() to a multixact value is meaningful, though; >> that's only good for Xids. > > Yeah, I'm sure: >
> josh=# select relname, age(relminmxid) from pg_class;
>        relname       |    age
> ---------------------+------------
>  pg_statistic        | 2147483647
>  pg_type             | 2147483647
>  random              | 2147483647
>  dblink_pkey_results | 2147483647
>  pg_toast_17395      | 2147483647
> ...
> So if age() doesn't mean anything, then how are users to know when they > need to freeze? Or, in other words, this is another example of xid-freezing infrastructure that needed to be copied for mxid-freezing and wasn't. We need an analogue of age() for mxids. Perhaps just mxid_age()? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buildfarm "master-next" branch? (was: Dynamic Shared Memory stuff)
On Wed, Apr 16, 2014 at 8:24 PM, Craig Ringer wrote: > On 04/17/2014 12:08 AM, Robert Haas wrote: >> On Tue, Apr 15, 2014 at 10:46 PM, Amit Kapila >> wrote: >>> On Wed, Apr 16, 2014 at 3:01 AM, Robert Haas wrote: On Tue, Apr 15, 2014 at 12:33 AM, Amit Kapila wrote: > On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas > wrote: >> For the create case, I'm wondering if we should put the block that >> tests for !hmap *before* the _dosmaperr() and check for EEXIST. What >> is your opinion? > > Either way is okay, but I think the way you are suggesting is better as it > will make code consistent with other place (PGSharedMemoryCreate()). OK, can you prepare a patch? >>> >>> Please find attached patch to address this issue. >>> One minor point to note is that now we have to call GetLastError() twice, >>> once inside error path and once to check EEXIST, but I think that is okay >>> as existing code in PGSharedMemoryCreate() does it that way. >> >> OK. I committed this blindly, but I don't have a Windows dev >> environment, so please keep an eye on the Windows buildfarm members >> and provide follow-on patches if any of them get unhappy about this. > > Given that we're doing this a fair bit, is it reasonable to define a > "master-next" branch in git and have the buildfarm (or at least the > Windows members) build that? > > Permit master-next to be rebased and reset. > > That way it's possible to fire stuff off and see what happens on the > buildfarm without introducing broken commits unnecessarily. > > Thoughts? In this particular case, I have a lot of confidence that Amit tested this on his own machine before sending in the patch; and moreover, he wrote that code in the first place. So it's no worse than it would have been if that change had been in the originally committed version, which I didn't personally test on Windows, either, but which has nevertheless mostly passed buildfarm testing. 
Arguably, if I'm going to be hacking on platform-dependent things very often, I should get my own Windows build environment set up so that I can test it myself, but it hasn't quite been worth it to me thus far, and Amit has proven to be pretty reliable in terms of getting things right. In terms of improving the buildfarm infrastructure, the thing I would most like to have is more frequent runs. It would be great if pushing a commit to the master repository triggered an immediate build on every buildfarm animal so that you could see all of the failures in a short period of time. But that would require more resources for the buildfarm machines, which are provided on a strictly volunteer basis, so it's hard to see how to arrange that. But the ability to easily spin up temporary branches for testing would also be great. Unfortunately, I suspect that only a minority of the buildfarm owners would choose to participate, which would make it less useful, but if we could solve that problem I'd be all in favor of it. I'm not volunteering to do the work, though. Honestly, I don't think we have a huge problem here today. Yeah, the buildfarm turns pretty colors on a fairly regular basis, but those issues are also generally fixed very quickly. With the unfortunate exception of the seemingly never-ending stream of multixact-related bugs, a user who took a snapshot of our master branch at a randomly selected point during the 9.4 development cycle would likely have gotten code reliable enough to be run in production. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How can we make beta testing better?
On Wed, Apr 16, 2014 at 12:53 AM, Rod Taylor wrote: > 4) Plays queries from the CSV logs starting from $TIME mimicking actual > timing and transaction boundaries This ^^ But I recall a number of previous attempts, including plugins for general load-testing systems; what happened to them? Honestly, though, if you really want to load test properly, what you really want to do is deploy a copy of your entire application and feed it requests simulating user traffic. That results in a more accurate representation and gives you data that's easier to act on. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: iff -> if
2014-04-17 Michael Paquier : > Is there no equivalent in German? For example in French there is "ssi". gdw (genau dann, wenn) Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Include planning time in EXPLAIN ANALYZE output.
On 04/17/2014 01:35 AM, Tom Lane wrote: I'll go change it. Thanks for fixing this. The new name "Execution time" is much clearer. -- Andreas Karlsson -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On 2014-04-16 19:18:02 -0400, Bruce Momjian wrote: > On Thu, Feb 6, 2014 at 09:40:32AM +0100, Andres Freund wrote: > > On 2014-02-05 12:36:42 -0500, Robert Haas wrote: > > > >> It may well be that your proposal is spot on. But I'd like to see some > > > >> data-structure-by-data-structure measurements, rather than assuming > > > >> that > > > >> alignment must be a good thing. > > > > > > > > I am fine with just aligning BufferDescriptors properly. That has > > > > clearly shown massive improvements. > > > > > > I thought your previous idea of increasing BUFFERALIGN to 64 bytes had > > > a lot to recommend it. > > > > Good. > > > > I wonder if we shouldn't move that bit of logic: > > if (size >= BUFSIZ) > > newStart = BUFFERALIGN(newStart); > > out of ShmemAlloc() and instead have a ShmemAllocAligned() and > > ShmemInitStructAligned() that does it. So we can sensibly can control it > > per struct. > > > > > But that doesn't mean it doesn't need testing. > > > > I feel the need here, to say that I never said it doesn't need testing > > and never thought it didn't... > > Where are we on this? It needs somebody with time to evaluate possible performance regressions - I personally won't have time to look into this in detail before pgcon. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
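The BUFFERALIGN proposal discussed above is just the usual power-of-two round-up arithmetic. As a sketch, the rounding that the TYPEALIGN-style macros perform (here written in Python, not C, and not copied from c.h) looks like this, with the buffer alignment raised to the 64-byte cache-line size under discussion:

```python
# The arithmetic behind TYPEALIGN/BUFFERALIGN-style macros: round a
# length or offset up to the next multiple of a power-of-two alignment.
# Sketch only; the constant and names mirror the discussion, not c.h.

def typealign(alignval, length):
    # equivalent to the C expression ((length + alignval - 1) & ~(alignval - 1))
    return (length + alignval - 1) & ~(alignval - 1)

CACHE_LINE = 64  # the proposed 64-byte buffer alignment

def buffer_align(offset):
    """Round a shared-memory offset up to a cache-line boundary so that
    structures like buffer descriptors start cache-line aligned."""
    return typealign(CACHE_LINE, offset)
```

With descriptors starting on cache-line boundaries, two adjacent descriptors no longer share a line, which is the contention effect the thread's measurements were about.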
Re: [HACKERS] slow startup due to LWLockAssign() spinlock
On 2014-04-16 19:33:52 -0400, Bruce Momjian wrote: > On Tue, Feb 4, 2014 at 12:58:49AM +0100, Andres Freund wrote: > > On 2014-02-03 11:22:45 -0500, Tom Lane wrote: > > > Andres Freund writes: > > > > On larger, multi-socket, machines, startup takes a fair bit of time. As > > > > I was profiling anyway I looked into it and noticed that just about all > > > > of it is spent in LWLockAssign() called by InitBufferPool(). Starting > > > > with shared_buffers=48GB on the server Nate Boley provided, takes about > > > > 12 seconds. Nearly all of it spent taking the ShmemLock spinlock. > > > > Simply modifying LWLockAssign() to not take the spinlock when > > > > !IsUnderPostmaster speeds it up to 2 seconds. While certainly not making > > > > LWLockAssign() prettier it seems enough of a speedup to be worthwile > > > > nonetheless. > > > > > > Hm. This patch only works if the postmaster itself never assigns any > > > LWLocks except during startup. That's *probably* all right, but it > > > seems a bit scary. Is there any cheap way to make the logic actually > > > be what your comment claims, namely "Interlocking is not necessary during > > > postmaster startup"? I guess we could invent a ShmemInitInProgress global > > > flag ... > > > > So, here's a flag implementing things with that flag. I kept your name, > > as it's more in line with ipci.c's naming, but it looks kinda odd > > besides proc_exit_inprogress. > > Uh, where are we on this? I guess it's waiting for the next CF :(. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
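The flag-based approach Tom suggested and Andres implemented can be sketched abstractly: take the allocator's interlock only once other processes can exist. The sketch below uses hypothetical names (`shmem_init_in_progress` standing in for the proposed ShmemInitInProgress flag) and a Python lock as a stand-in for the ShmemLock spinlock; it is an illustration of the idea, not PostgreSQL's code.

```python
# Sketch of the startup optimization discussed here: skip the allocator
# interlock while the postmaster is still the only process.  The names
# are hypothetical stand-ins, not PostgreSQL's actual API.
import threading

shmem_lock = threading.Lock()        # stand-in for the ShmemLock spinlock
shmem_init_in_progress = True        # cleared before children can exist
next_free = 0                        # bump-allocation pointer

def shmem_alloc(size):
    """Bump-allocate from the (simulated) shared segment, taking the
    lock only when concurrent allocators are possible."""
    global next_free
    if shmem_init_in_progress:
        # Single-process startup: interlocking is unnecessary, which is
        # where the reported 12s -> 2s startup improvement came from.
        offset = next_free
        next_free += size
        return offset
    with shmem_lock:
        offset = next_free
        next_free += size
        return offset
```

Tom's concern upthread is visible in the sketch: correctness depends on the flag being cleared before any other process can call the allocator.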
[HACKERS] New windows compiler warning from 585bca39
I've attached a tiny patch that fixes a new compiler warning in the Windows build... Perhaps the #ifndef could be placed in a nicer spot in the patch, but the attached should at least describe where the problem lies... (ClCompile target) -> src\interfaces\libpq\fe-connect.c(3365): warning C4101: 'msgid' : unreferenced local variable [D:\Postgres\b\libpq.vcxproj] 1 Warning(s) fe-connect_win32_warning_fix.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about optimising (Postgres_)FDW
(2014/04/16 22:16), Hannu Krosing wrote: On 04/16/2014 01:35 PM, Etsuro Fujita wrote: Maybe I'm missing something, but I think that you can do what I think you'd like to do by the following procedure: No, what I'd like PostgreSQL to do is to 1. select the id+set from the local table 2. select the rows from the remote table with WHERE ID IN () 3. then join the original set to the selected set, with any suitable join strategy The things I do not want are A. selecting all rows from the remote table (this is what your examples below do) or B. selecting rows from the remote table by single selects using "ID = $" (this is something that I managed to do by some tweaking of costs) as A will always be slow if there are millions of rows in the remote table and B is slow(ish) when the idset is over a few hundred ids I hope this is a better explanation than the one I provided before. Ah, I understand what you'd like to do. Thank you for the explanation. P.S. I am not sure if this is a limitation of postgres_fdw or postgres itself If I understand correctly, neither the current postgres_fdw planning function nor the current postgres planner itself supports such a plan. For that I think we would probably need to implement a distributed query processing technique such as semijoin or bloomjoin in those modules. Thanks, P.S. or, that as Tom mentioned, by disabling the use_remote_estimate function: I misunderstood the meaning of what Tom pointed out. Sorry for that. Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Include planning time in EXPLAIN ANALYZE output.
I found it a bit confusing when the planning time is greater than the total time, so +1 for "Execution time". On Thu, Apr 17, 2014 at 3:35 AM, Tom Lane wrote: > Bruce Momjian writes: >> Where are we on this? I still see: > >> test=> EXPLAIN ANALYZE SELECT 1; >>QUERY PLAN >> >> >>Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 >> rows=1 loops=1) >>Planning time: 0.009 ms >> -->Total runtime: 0.009 ms >> (3 rows) > > There seemed to be a clear majority of votes in favor of changing it to > say "Execution time". Robert was arguing for no change, but I don't think > that's tenable in view of the fact that the addition of the "Planning > time" line is already a change, and one that makes the old wording > confusing. > > I'll go change it. > > regards, tom lane > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers