Re: [HACKERS] cheaper snapshots redux

2011-08-26 Thread Gokulakannan Somasundaram
On Tue, Aug 23, 2011 at 5:25 AM, Robert Haas  wrote:

> I've been giving this quite a bit more thought, and have decided to
> abandon the scheme described above, at least for now.  It has the
> advantage of avoiding virtually all locking, but it's extremely
> inefficient in its use of memory in the presence of long-running
> transactions.  For example, if there's an open transaction that's been
> sitting around for 10 million transactions or so and has an XID
> assigned, any new snapshot is going to need to probe into the big
> array for any XID in that range.  At 8 bytes per entry, that means
> we're randomly accessing about ~80MB of memory-mapped data.  That
> seems problematic both in terms of blowing out the cache and (on small
> machines) possibly even blowing out RAM.  Nor is that the worst case
> scenario: a transaction could sit open for 100 million transactions.
>
First, I respectfully disagree with you on the point of 80MB. I would say
it's very rare that a small system (with <1 GB RAM) has a long-running
transaction sitting idle while 10 million other transactions complete.
Should an optimization that targets high enterprise workloads be abandoned
for the sake of a very small system?

Second, if we make use of memory-mapped files, why should we expect all
80MB of data to always reside in memory? Won't it get paged out by the
operating system when memory is scarce? Or do you have some specific OS
in mind?

Thanks,
Gokul.


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-08-26 Thread Noah Misch
On Fri, Aug 26, 2011 at 10:46:33AM +0200, Tomas Vondra wrote:
> Hmmm, let me explain what led me to this patch - right now I'm doing a
> comparison of filesystems with various block sizes (both fs and db
> blocks). I've realized that the db block size significantly influences
> frequency of checkpoints and amount of data to write, so I'm collecting
> data from pg_stat_bgwriter too. The benchmark goes like this:
> 
> 1. collect pg_stat_bgwriter stats
> 2. run pgbench for 10 minutes
> 3. collect pg_stat_bgwriter stats (to compute difference with (1))
> 4. kill the postmaster
> 
> The problem is that when checkpoint stats are collected, there might be a
> checkpoint in progress and in that case the stats are incomplete. In some
> cases (especially with very small db blocks) this has a significant impact
> because the checkpoints are less frequent.

Could you remove this hazard by adding a step "2a. psql -c CHECKPOINT"?
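
For illustration, a minimal sketch of that extra step in SQL -- CHECKPOINT
does not return until the checkpoint completes, so the stats read in step
3 then reflect a clean boundary:

-- step 2a: force any in-progress checkpoint work to completion
CHECKPOINT;
-- step 3: collect the counters and diff them against step 1
SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint
FROM pg_stat_bgwriter;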

-- 
Noah Misch    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cheaper snapshots redux

2011-08-26 Thread Robert Haas
On Thu, Aug 25, 2011 at 6:29 PM, Jim Nasby  wrote:
> Actually, I wasn't thinking about the system dynamically sizing shared memory 
> on its own... I was only thinking of providing the ability for a user to 
> change something like shared_buffers and allow that change to take effect 
> with a SIGHUP instead of requiring a full restart.

I agree.  That would be awesome.  Sadly, I don't have time to work on it.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cheaper snapshots redux

2011-08-26 Thread Robert Haas
On Thu, Aug 25, 2011 at 6:24 PM, Jim Nasby  wrote:
> On Aug 25, 2011, at 8:24 AM, Robert Haas wrote:
>> My hope (and it might turn out that I'm an optimist) is that even with
>> a reasonably small buffer it will be very rare for a backend to
>> experience a wraparound condition.  For example, consider a buffer
>> with ~6500 entries, approximately 64 * MaxBackends, the approximate
>> size of the current subxip arrays taken in aggregate.  I hypothesize
>> that a typical snapshot on a running system is going to be very small
>> - a handful of XIDs at most - because, on the average, transactions
>> are going to commit in *approximately* increasing XID order and, if
>> you take the regression tests as representative of a real workload,
>> only a small fraction of transactions will have more than one XID.  So
>
> BTW, there's a way to actually gather some data on this by using PgQ (part of 
> Skytools and used by Londiste). PgQ works by creating "ticks" at regular 
> intervals, where a tick is basically just a snapshot of committed XIDs. 
> Presumably Slony does something similar.
>
> I can provide you with sample data from our production systems if you're 
> interested.

Yeah, that would be great.
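
(As a rough way to sample this without PgQ: the built-in txid functions,
available since 8.3, can report how many in-progress XIDs a snapshot
carries.  A sketch:

SELECT txid_current_snapshot() AS snap,
       count(*) AS xip_count
FROM txid_snapshot_xip(txid_current_snapshot());

Sampling that periodically would give a feel for typical snapshot sizes.)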

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dropdb and dropuser: IF EXISTS

2011-08-26 Thread Robert Haas
On Fri, Aug 26, 2011 at 12:08 AM, Josh Kupershmidt  wrote:
> I noticed a few places where it would be handy if dropdb took a flag
> like "--if-exists" which would basically just add in the 'IF EXISTS'
> clause to the DROP DATABASE statement. For example, scripts like
> find_static or mbregress.sh use dropdb && createdb, but they generate
> noisy errors from dropdb when run for the first time since there's no
> --if-exists flag. (They could just pipe 'DROP DATABASE IF EXISTS ...'
> to psql, but what's the point of having dropdb if it's not used?)
>
> Attached is a very quick patch implementing the "--if-exists" or "-X"
> option for dropdb and dropuser. I didn't bother adding in a check to
> make sure the server version was 8.2+ since we're not even supporting
> 8.1 nowadays, though that'd be easy enough to add in.

+1 for --if-exists, but -X isn't doing a lot for me, especially since
we've used -X for other purposes in other commands.  I'd just skip
having a short form for this one.
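
(For reference, the SQL these utilities would wrap; IF EXISTS has been
accepted here since 8.2, and the names below are just examples.)

DROP DATABASE IF EXISTS regression;
DROP ROLE IF EXISTS regress_user;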

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inputting relative datetimes

2011-08-26 Thread Robert Haas
On Fri, Aug 26, 2011 at 4:32 PM, Jim Nasby  wrote:
> On Aug 25, 2011, at 5:08 AM, Dean Rasheed wrote:
>>> Funny you should mention intervals...
>>>
>>> timestamptz 'today' - interval '5 days'
>>> timestamptz 'now' + interval '2 hours'
>>>
>>
>> Yes, but what I am trying to achieve is a way of entering such
>> relative timestamps using a single input value, so that absolute and
>> relative timestamps can both be bound to a SQL query using just one
>> variable.
>
> Even if the community doesn't want to add this to core, I think it would be a 
> great add-on to put on PGXN. If you don't feel up to writing it yourself, 
> perhaps you could pay one of the consulting companies to do it for you?

Frankly, our current date parsing code is pretty darn strange and
flaky.  If nobody's found the energy to rationalize that, what are the
chances that we can add a whole bunch more functionality without also
adding a whole bunch more bugs?  For examples of the sorts of things
we haven't gotten around to fixing, see:

http://archives.postgresql.org/pgsql-hackers/2011-03/msg01295.php

Another problem here is that it seems possible, even likely, that
everyone will have their own particular flavor of what they'd like to
see accepted: two weeks ago, a week ago Tuesday, next Saturday, last
Friday, two weeks from Saturday, Christmas plus three fortnights...
if it weren't already spaghetti code...  give it time.  I'm not
necessarily opposed to the idea (especially as a contrib module), but
I'm a little nervous that we might be overestimating the extent to
which Dean's needs are universal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removal of useless include references

2011-08-26 Thread Bruce Momjian
Alvaro Herrera wrote:
> Excerpts from Bruce Momjian's message of Fri Aug 26 01:35:45 -0300 2011:
> > It has been years since I ran src/tools/pginclude/pgrminclude to remove
> > unnecessary include files.  (I have already fixed things so include
> > files can be compiled on their own.)
> > 
> > The attached patch removes unneeded include references, and marks some
> > includes as needing to be skipped by pgrminclude.
> 
> In btree_gist I think you should remove #include "postgres.h" from the
> .h file and put it in the .c files instead, as is customary.  I think
> that would make the other changes incorrect.  ltree.h and pg_upgrade.h
> also get this wrong.

Thanks, done.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removal of useless include references

2011-08-26 Thread Bruce Momjian
Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian  writes:
> > > It has been years since I ran src/tools/pginclude/pgrminclude to remove
> > > unnecessary include files.  (I have already fixed things so include
> > > files can be compiled on their own.)
> > 
> > > The attached patch removes unneeded include references, and marks some
> > > includes as needing to be skipped by pgrminclude.
> > 
> > > I am sure applying this patch will break builds on some platforms and
> > > some option combinations so I will monitor the buildfarm when I apply it
> > > and make adjustments.
> > 
> > The last time you did this was in July 2006.  It took us two weeks to
> > mostly recover, but we were still dealing with some fallout in December,
> > cf
> > http://archives.postgresql.org/pgsql-hackers/2006-12/msg00491.php
> > 
> > We had the buildfarm then, had had it for a couple years.  The notion
> > that watching the buildfarm is enough is fully disproven by history.
> > 
> > Unless you have a better test plan than last time (which this isn't),
> > I don't think this should be done at all.  The benefits are microscopic
> > and the pain real.
> 
> I don't have a better plan.  There are #ifdef code blocks that often
> don't get processed and therefore this can't be done better.  I will
> abandon the idea.

OK, try #2.  I already had code that removed #if/#else/#endif code in
*.h files for better testing, so I extended that to all *.c files.  This
reduces the size of the diff from 6.6k lines to 4.7k lines, and it makes
it much less likely that there will be problems from running
pgrminclude.

The current patch is here:

http://momjian.us/expire/pgrminclude.diff

I tested the patch on BSD and Linux.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cryptic error message in low-memory conditions

2011-08-26 Thread Tom Lane
Daniel Farina  writes:
> Some Postgres servers will error out for a while with the following
> error message:
> "expected authentication request from server, but received c"
> [ and this seems to be triggered by fork failures in the server ]

Fascinating.

I poked at this for a while by the expedient of hot-wiring postmaster.c
to always fail the fork request:

*** src/backend/postmaster/postmaster.c.orig	Tue Aug 23 17:13:09 2011
--- src/backend/postmaster/postmaster.c	Fri Aug 26 19:08:25 2011
***************
*** 3182,3188 ****
  #ifdef EXEC_BACKEND
  	pid = backend_forkexec(port);
  #else							/* !EXEC_BACKEND */
! 	pid = fork_process();
  	if (pid == 0)				/* child */
  	{
  		free(bn);
--- 3182,3189 ----
  #ifdef EXEC_BACKEND
  	pid = backend_forkexec(port);
  #else							/* !EXEC_BACKEND */
! 	pid = -1;	// fork_process();
! 	errno = ENOMEM;
  	if (pid == 0)				/* child */
  	{
  		free(bn);

The basic case did what I was expecting:

$ psql "dbname=regression"
psql: could not fork new process for connection: Cannot allocate memory

which was unsurprising since I'm quite sure that code path got tested
in basically this fashion, back in the dark ages.  But sure enough,
over a TCP connection with SSL support turned on, it fails as described.
strace shows the postmaster is sending what it's supposed to.
I eventually figured out the problem: libpq sees the "E" response,
thinks it must be talking to a postmaster too old to understand
NEGOTIATE_SSL_CODE, and closes the socket and tries again.  But *it
forgets to clear out its input buffer*, so the body of the error message
is still there waiting to be read, and the next connection sees the "c"
as the first byte of the first postmaster response.  The fix for this is

*** src/interfaces/libpq/fe-connect.c.orig	Thu Jul 28 10:39:57 2011
--- src/interfaces/libpq/fe-connect.c	Fri Aug 26 18:52:18 2011
***************
*** 2064,2069 ****
--- 2064,2072 ----
  			conn->allow_ssl_try = false;
  			/* Assume it ain't gonna handle protocol 3, either */
  			conn->pversion = PG_PROTOCOL(2, 0);
+ 			/* Discard any unread/unsent data */
+ 			conn->inStart = conn->inCursor = conn->inEnd = 0;
+ 			conn->outCount = 0;
  			/* Must drop the old connection */
  			closesocket(conn->sock);
  			conn->sock = -1;

and similarly in the other places in fe-connect.c where we're abandoning
an open connection (not sure every one of those has to have it, but it
seems a good safety practice).

That gets us to the right place in the default SSL mode:

$ psql "host=localhost sslmode=prefer dbname=regression"
psql: could not fork new process for connection: Cannot allocate memory

but we're still not there if you're trying to force SSL:

$ psql "host=localhost sslmode=require dbname=regression"
psql: server does not support SSL, but SSL was required

The reason for this is that that same bit of code supposes that any
"E" response must mean that the postmaster didn't recognize
NEGOTIATE_SSL_CODE.  It doesn't (and of course shouldn't) pay any
attention to the actual textual error message.

Now, seeing as how NEGOTIATE_SSL_CODE has been understood by every build
since PG 7.0, I believe that this is dead code and we could remove it;
it seems exceedingly unlikely that any modern build of libpq will ever
be used to talk to a server that responds to that with "E".

In fact it's worse than just delivering a misleading error message in
the "require" case, because if you're not doing "require" then what
happens next is that the code forces the protocol level down to 2 and
tries again, supposing that it must be dealing with an ancient server.
In the normal case where you are talking to a server under load rather
than a deliberately-broken one, it's entirely possible that the second
connection attempt succeeds.  And what you've got then is that the
connection is operating in protocol 2 rather than what the user probably
expected, disabling assorted functionality that he may well be depending
on.  I don't recall having seen reports that could match that syndrome,
but then again Daniel's complaint is a new one on me too.

So I'm thinking we'd be well advised to eliminate the assumption that an
"E" response could be fixed by downgrading to protocol 2, and instead
just make this code report the error message it got from the postmaster.
That's more than a one-liner so I don't have a patch for it yet.

Lastly, I noticed that if I tried this repeatedly on a Unix socket,
I sometimes got

psql: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
could not send startup packet: Broken pipe

rather than the expected result.

Re: [HACKERS] Removal of useless include references

2011-08-26 Thread Alvaro Herrera
Excerpts from Bruce Momjian's message of Fri Aug 26 01:35:45 -0300 2011:
> It has been years since I ran src/tools/pginclude/pgrminclude to remove
> unnecessary include files.  (I have already fixed things so include
> files can be compiled on their own.)
> 
> The attached patch removes unneeded include references, and marks some
> includes as needing to be skipped by pgrminclude.

In btree_gist I think you should remove #include "postgres.h" from the
.h file and put it in the .c files instead, as is customary.  I think
that would make the other changes incorrect.  ltree.h and pg_upgrade.h
also get this wrong.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore --no-post-data and --post-data-only

2011-08-26 Thread Andrew Dunstan



On 08/26/2011 04:46 PM, Jim Nasby wrote:
> On Aug 26, 2011, at 12:15 PM, Andrew Dunstan wrote:
>> I knew there would be some bike-shedding about how we specify these
>> things, which is why I haven't written docs yet.
>
> While we're debating what shade of yellow to paint the shed...
>
> My actual use case is to be able to "inject" SQL into a SQL-formatted
> dump either pre- or post-data (I'm on 8.3, so I don't actually dump any
> data; I'm *mostly* emulating the ability to dump data on just certain
> tables).
>
> So for what I'm doing, the ideal interface would be a way to tell pg_dump
> "When you're done dumping all table structures but before you get to any
> constraints, please run $COMMAND and inject its output into the dump
> output." For some of the data obfuscation we're doing it would be easiest
> if $COMMAND was a perl script instead of SQL, but we could probably
> convert it.
>
> Of course, many other folks actually need the ability to just spit out
> specific portions of the dump; I'm hoping we can come up with something
> that supports both concepts.



Well, the Unix approach is to use tools that do one thing well to build 
up more complex tools. Making pg_dump run some external command to 
inject things into the stream seems like the wrong thing given this 
philosophy. Use pg_dump to get the bits you want (pre-data, post-data) 
and sandwich them around whatever else you want. As for getting data 
from just certain tables, I just posted a patch for pg_dump to exclude 
data for certain tables, and we could look at providing a positive as 
well as a negative filter if there is sufficient demand.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore --no-post-data and --post-data-only

2011-08-26 Thread Tom Lane
Alvaro Herrera  writes:
> The "--section=data --section=indexes" proposal seems very reasonable to
> me -- more so than "--sections='data indexes'".

+1 ... not only easier to code and less squishily defined, but more like
the existing precedent for other pg_dump switches, such as --table.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tsvector concatenation - backend crash

2011-08-26 Thread Tom Lane
jes...@krogh.cc writes:
>> Attached SQL files give (at least in my hands) a reliable backend crash
>> with this stacktrace, reproduced on both 9.0.4 and HEAD. I'm sorry
>> I cannot provide a more trimmed-down set of vectors that reproduces the
>> bug, thus the "obfuscated" dataset. But even deleting single terms in
>> the vectors makes the bug go away.

I found it.  tsvector_concat does this to compute the worst-case output
size needed:

/* conservative estimate of space needed */
out = (TSVector) palloc0(VARSIZE(in1) + VARSIZE(in2));

Unfortunately, that's not really worst case: it could be that the output
will require more alignment padding bytes than the inputs did, if there
is a mix of lexemes with and without position data.  For example, if in1
contains one lexeme of odd length without position data, and in2
contains one lexeme of even length with position data (and no pad byte),
and in1's lexeme sorts before in2's, then we will need a pad byte in the
second lexeme where there was none before.
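
In SQL terms, the shape of that minimal case would be something like the
following (a sketch only -- whether it actually crashes depends on what
happens to sit just past the palloc'd block):

-- odd-length lexeme without positions, concatenated with an
-- even-length lexeme that carries position data and sorts after it
SELECT 'abc'::tsvector || 'food:1'::tsvector;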

The core of the fix is to suppose that we might need a newly-added pad
byte for each lexeme:

out = (TSVector) palloc0(VARSIZE(in1) + VARSIZE(in2) + i1 + i2);

which really is an overestimate but I don't feel a need to be tenser
about it.  What I actually committed is a bit longer because I added
some comments and some Asserts ...

> Ok, I found 8.3.0 to be "good" so I ran a git bisect on it. It gave
> me this commit:
> 
> e6dbcb72fafa4031c73cc914e829a6dec96ab6b6 is the first bad commit
> commit e6dbcb72fafa4031c73cc914e829a6dec96ab6b6
> Author: Tom Lane 
> Date:   Fri May 16 16:31:02 2008 +
> 
> Extend GIN to support partial-match searches, and extend tsquery to
> support
> prefix matching using this facility.

AFAICT this is a red herring: the bug exists all the way back to where
tsvector_concat was added, in 8.3.  I think the reason that your test
case happens to not crash before this commit is that it changed the sort
ordering rules for lexemes.  As you can see from my minimal example
above, we might need different numbers of pad bytes depending on how the
lexemes sort relative to each other.

Anyway, patch is committed; thanks for the report!

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Cryptic error message in low-memory conditions

2011-08-26 Thread Daniel Farina
Hello list,

This is something that I've only recently somewhat pinned down to a cause...

Some Postgres servers will error out for a while with the following
error message:

"expected authentication request from server, but received c"

If one uses Their Favorite Search Engine, this message is scattered
around the internet, all in reference to Postgres, I think, but none
of the top results seem to have any lucid responses or cause listed.
We've seen this reproduce -- sometimes for minutes at a time -- and
after catching one in the act I am reasonably confident that one
common cause of this is systems that are low on memory, which I
confirmed by looking at postgres logs and matching them up against our
monitoring system.

Critical statistics first: the systems run Linux with overcommit off,
so malloc returns NULL now and again. There is no OOM killer activity.
 SSL is the transport, and SQL role password authentication is in use.
 There is no swap.

Here's an example of the various kinds of failure one can get from
connecting to a system that is low on memory:

2011-08-26 16:03:06 | INFO "psql? failed with exception #"
2011-08-26 16:02:27 | INFO "psql? failed with exception #"
2011-08-26 16:01:51 | INFO "psql? failed with exception #"
2011-08-26 16:01:15 | INFO "psql? failed with exception #"
2011-08-26 16:00:39 | INFO "psql? failed with exception #"
2011-08-26 16:00:01 | INFO "psql? failed with exception #"
2011-08-26 15:59:25 | INFO "psql? failed with exception #"
2011-08-26 15:58:48 | INFO "psql? failed with exception #"
2011-08-26 15:58:12 | INFO "psql? failed with exception #"

On the backend side, one can see that often there is a failure to
fork, which is basically expected in this condition.  Various
statements will also report OOM.

An error message that says nothing about being out of memory is the
norm: one typically gets no express indication that the system is out of
memory but otherwise responsive.  This puts someone doing monitoring
(like us) in a tricky position: the user of the database is free to use
their memory -- that's what it's for -- but we cannot determine that the
server is basically online when memory is fully utilized.  This defeats
the ever-common "authenticate and run SELECT 1;" basic monitoring style
frequently used to determine the most basic levels of uptime.

If the 'out of memory' condition were reported most of the time, we
could act differently, but for now we basically have to assume that
postgres is offline and poke around.  It's also interesting to note
that the systems are basically responsive (ssh always seems able to
fork, and as I poke around, tools like 'ls' et al. seem to be fine), and
sometimes the load average isn't even extreme -- a leaky application
with too many connections can cause this, so it's not like every tiny
last scrap of memory has been consumed.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore --no-post-data and --post-data-only

2011-08-26 Thread Jim Nasby
On Aug 26, 2011, at 12:15 PM, Andrew Dunstan wrote:
> I knew there would be some bike-shedding about how we specify these things, 
> which is why I haven't written docs yet.

While we're debating what shade of yellow to paint the shed...

My actual use case is to be able to "inject" SQL into a 
SQL-formatted dump either pre- or post-data (I'm on 8.3, so I don't actually 
dump any data; I'm *mostly* emulating the ability to dump data on just certain 
tables).

So for what I'm doing, the ideal interface would be a way to tell pg_dump "When 
you're done dumping all table structures but before you get to any constraints, 
please run $COMMAND and inject its output into the dump output." For some of 
the data obfuscation we're doing it would be easiest if $COMMAND was a perl 
script instead of SQL, but we could probably convert it.

Of course, many other folks actually need the ability to just spit out specific 
portions of the dump; I'm hoping we can come up with something that supports 
both concepts.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inputting relative datetimes

2011-08-26 Thread Jim Nasby
On Aug 25, 2011, at 5:08 AM, Dean Rasheed wrote:
>> Funny you should mention intervals...
>> 
>> timestamptz 'today' - interval '5 days'
>> timestamptz 'now' + interval '2 hours'
>> 
> 
> Yes, but what I am trying to achieve is a way of entering such
> relative timestamps using a single input value, so that absolute and
> relative timestamps can both be bound to a SQL query using just one
> variable.


Even if the community doesn't want to add this to core, I think it would be a 
great add-on to put on PGXN. If you don't feel up to writing it yourself, 
perhaps you could pay one of the consulting companies to do it for you?
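
As a sketch of what such an add-on might look like -- the function name
and the accepted grammar here are invented for illustration:

CREATE FUNCTION parse_reltime(text) RETURNS timestamptz AS $$
  SELECT CASE WHEN $1 ~ '^[+-]'
              THEN now() + $1::interval   -- relative: '+2 hours', '-5 days'
              ELSE $1::timestamptz        -- absolute: '2011-08-26 16:00'
         END
$$ LANGUAGE sql STABLE;

Both kinds of value could then be bound to one parameter, e.g.
WHERE ts >= parse_reltime($1).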
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore --no-post-data and --post-data-only

2011-08-26 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Fri Aug 26 15:36:36 -0300 2011:
> On Fri, Aug 26, 2011 at 1:15 PM, Andrew Dunstan  wrote:
> > I don't have anything in principle against your '--sections="foo bar"'
> > suggestion, but it would be more work to program. Simpler, and probably more
> > consistent with how we do other things, would be allowing multiple --section
> > options, if we don't want to have named options such as I have provided.
> 
> I wouldn't object to that, but "more work to program" probably means
> about an extra 10 lines of code in this particular case.

The "--section=data --section=indexes" proposal seems very reasonable to
me -- more so than "--sections='data indexes'".

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore --no-post-data and --post-data-only

2011-08-26 Thread Robert Haas
On Fri, Aug 26, 2011 at 1:15 PM, Andrew Dunstan  wrote:
> I don't have anything in principle against your '--sections="foo bar"'
> suggestion, but it would be more work to program. Simpler, and probably more
> consistent with how we do other things, would be allowing multiple --section
> options, if we don't want to have named options such as I have provided.

I wouldn't object to that, but "more work to program" probably means
about an extra 10 lines of code in this particular case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-08-26 Thread Tomas Vondra
On 26 August 2011, 19:17, Greg Smith wrote:
> On 08/26/2011 03:54 AM, Magnus Hagander wrote:
>> In theory, this could be the "progress view" or
>> "progress field" talked about around Gregs previous patch - or it
>> could just be modifying the commandstring in pg_stat_activity.
>
> Right.  The whole progress indicator idea is hard to do for queries in
> general.  But there's enough of these other progress indicator ideas
> around now that it may be worth putting a standard way to handle them in
> here.  It sounds like that would be sufficient to address the area Tomas
> is trying to instrument better.  I badly want a progress indicator on
> CREATE INDEX CONCURRENTLY too, to at least let me know what phase of the
> build process it's on.  That's turned into a major headache recently.
>
> If we run with the idea of just allowing backends to publish a progress
> text string, I think this one maps into a similar space as the
> autovacuum one.  Publishing how many seconds the operation has been
> running for may be reasonable too.  Whether the overhead of the timing
> calls necessary to compute that will be high or not depends on the
> refresh rate of the progress info.  My suggestion before was to name
> these as key=value pairs for easy parsing; here's three examples now:
>
> autovacuum:  pgbench_accounts h=182701 m=301515 d=321345 s=62.231
> (cache hits, cache misses, dirty writes, seconds)
>
> background writer:  checkpoint b=511 t=3072 s=5.321
> (buffers written, total, seconds)
>
> create index concurrently:  pgbench_accounts p=1 b=62 t=6213 s=81.232
> (phase, blocks processed, total block estimate, seconds)
>
> I think that the idea of making this easily human readable is
> optimistic, because it will make all these strings big enough to start
> mattering.  Given that, we almost have to assume the only consumers of
> this data will be able to interpret it using the documentation.  I'd be
> happy with just the minimal data set in each case, not including any
> statistics you can easily derive from the values given (like the MB/s
> readings).  Adding that figure in particular to more of the log messages
> would be nice though.

I'm a bit confused - are you talking about updating the process title or
about writing the info to the log? The process title is probably fine for
watching the progress interactively, but it really does not solve what I
need.

I need to be able to infer the progress for past events, so I'd have to
sample the 'ps ax' output regularly.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-08-26 Thread Tomas Vondra
On 26 August 2011, 9:54, Magnus Hagander wrote:
> This seems like the wrong thing to write to the log. It's really only
> useful *during* the checkpoint run, isn't it? If so, I think it should
> go in a pg_stat view. In theory, this could be the "progress view" or
> "progress field" talked about around Gregs previous patch - or it
> could just be modifying the commandstring in pg_stat_activity. Either
> way, it should be updated in shared memory in that case (like current
> query is), and not sent with a message to the collector.

I personally find it handy for example when I need to find out why
performance degraded at a certain point in the past. Now I can see there
was a checkpoint (thanks to log_checkpoints=on), but I don't know any
details about its progress.

I already collect info from pg_stat_bgwriter; that's why I thought I
could update it more often. The log file is a natural destination for such
information, IMHO. I see this as an extension of the current checkpoint
messages that are written to the log.

The commandstring approach is probably fine for interactive work, but not
very handy when you need to analyze something that already happened.
Unless you collected the data, of course.
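
As an aside, for the interactive case such a progress string could be
polled much like autovacuum's activity is today; a sketch using the
9.1-era pg_stat_activity columns:

SELECT procpid, current_query
FROM pg_stat_activity
WHERE current_query LIKE 'autovacuum:%';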

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-08-26 Thread Greg Smith

On 08/26/2011 03:54 AM, Magnus Hagander wrote:
> In theory, this could be the "progress view" or
> "progress field" talked about around Greg's previous patch - or it
> could just be modifying the commandstring in pg_stat_activity.


Right.  The whole progress indicator idea is hard to do for queries in 
general.  But there's enough of these other progress indicator ideas 
around now that it may be worth putting a standard way to handle them in 
here.  It sounds like that would be sufficient to address the area Tomas 
is trying to instrument better.  I badly want a progress indicator on 
CREATE INDEX CONCURRENTLY too, to at least let me know what phase of the 
build process it's on.  That's turned into a major headache recently.


If we run with the idea of just allowing backends to publish a progress 
text string, I think this one maps into a similar space as the 
autovacuum one.  Publishing how many seconds the operation has been 
running for may be reasonable too.  Whether the overhead of the timing 
calls necessary to compute that will be high or not depends on the 
refresh rate of the progress info.  My suggestion before was to name 
these as key=value pairs for easy parsing; here's three examples now:


autovacuum:  pgbench_accounts h=182701 m=301515 d=321345 s=62.231
(cache hits, cache misses, dirty writes, seconds)

background writer:  checkpoint b=511 t=3072 s=5.321
(buffers written, total, seconds)

create index concurrently:  pgbench_accounts p=1 b=62 t=6213 s=81.232
(phase, blocks processed, total block estimate, seconds)

I think that the idea of making this easily human readable is 
optimistic, because it will make all these strings big enough to start 
mattering.  Given that, we almost have to assume the only consumers of 
this data will be able to interpret it using the documentation.  I'd be 
happy with just the minimal data set in each case, not including any 
statistics you can easily derive from the values given (like the MB/s 
readings).  Adding that figure in particular to more of the log messages 
would be nice though.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore --no-post-data and --post-data-only

2011-08-26 Thread Andrew Dunstan



On 08/26/2011 12:46 PM, Robert Haas wrote:
> On Fri, Aug 26, 2011 at 11:22 AM, Andrew Dunstan  wrote:
>>> But we could also add these switches to pg_dump too if people feel it's
>>> worthwhile. I haven't looked but the logic should not be terribly hard.
>>
>> Something like the attached, in fact, which seems pretty simple.
>
> It seems like there are three sets of things you might want here:
> pre-data, data, post-data.  So in the end we could end up with:
>
> --pre-data-only
> --post-data-only
> --data-only
> --no-pre-data
> --no-post-data
> --no-data
>
> And then maybe someone will want just the create index commands and
> not the constraint commands.  It seems like it might be more elegant
> to come up with a single switch where you can list which things you
> want:
>
> --sections='predata data'
> --sections='postdata'
> --sections='index'
>
> Just thinking out loud


I knew there would be some bike-shedding about how we specify these 
things, which is why I haven't written docs yet.


All the possibilities you specify except for the indexes section can be 
done by using these switches in combination with -s and -a.


For anything more fine-grained, I'm inclined to say that people need to 
roll their own. pg_restore's --list and --use-list give you extremely 
fine-grained control. I have working scripts which use these for example 
to filter out londiste and pgq objects, certain large tables, audit 
objects and more. As an example of the complexity I think we should 
avoid, which section would UNIQUE and PRIMARY KEY constraints belong in? 
"constraints" because that's what they are, or "indexes" because that's 
what they create? No matter which answer you choose someone will claim 
you have violated POLA.


Chopping things into pre-data, data and post-data would get us around 
99% of the cases we could reasonably provide for in my experience. That 
seems enough :-)


I don't have anything in principle against your '--sections="foo bar"' 
suggestion, but it would be more work to program. Simpler, and probably 
more consistent with how we do other things, would be allowing multiple 
--section options, if we don't want to have named options such as I have 
provided.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore --no-post-data and --post-data-only

2011-08-26 Thread Jeff Davis
On Fri, 2011-08-26 at 12:46 -0400, Robert Haas wrote:
> --sections='predata data'
> --sections='postdata'
> --sections='index'

Agreed. After command line options reach a certain level of complexity,
I think it's worth looking for a more general way to express them.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore --no-post-data and --post-data-only

2011-08-26 Thread Robert Haas
On Fri, Aug 26, 2011 at 11:22 AM, Andrew Dunstan  wrote:
>> But we could also add these switches to pg_dump too if people feel it's
>> worthwhile. I haven't looked but the logic should not be terribly hard.
>
> Something like the attached, in fact, which seems pretty simple.

It seems like there are three sets of things you might want here:
pre-data, data, post-data.  So in the end we could end up with:

--pre-data-only
--post-data-only
--data-only
--no-pre-data
--no-post-data
--no-data

And then maybe someone will want just the create index commands and
not the constraint commands.  It seems like it might be more elegant
to come up with a single switch where you can list which things you
want:

--sections='predata data'
--sections='postdata'
--sections='index'

Just thinking out loud

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] pg_upgrade problem

2011-08-26 Thread hubert depesz lubaczewski
On Fri, Aug 26, 2011 at 12:18:55AM -0400, Bruce Momjian wrote:
> 
> OK, this was very helpful.  I found out that there is a bug in current
> 9.0.X, 9.1.X, and HEAD that I introduced recently when I excluded temp
> tables.  (The bug is not in any released version of pg_upgrade.)  The
> attached, applied patches should fix it for you.  I assume you are
> running 9.0.X, and not 9.0.4.

pg_upgrade worked. Now I'm doing reindex and later on vacuumdb -az.

will keep you posted.

Best regards,

depesz


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore --no-post-data and --post-data-only

2011-08-26 Thread Andrew Dunstan



On 08/25/2011 06:15 PM, Andrew Dunstan wrote:
> But we could also add these switches to pg_dump too if people feel
> it's worthwhile. I haven't looked but the logic should not be terribly
> hard.

Something like the attached, in fact, which seems pretty simple.

cheers

andrew



diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index f6cd7eb..e9b4cc6 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -140,6 +140,8 @@ static int	column_inserts = 0;
 static int	no_security_labels = 0;
 static int	no_unlogged_table_data = 0;
 static int	serializable_deferrable = 0;
+static int  exclude_post_data = 0;
+static int  post_data_only = 0;
 
 
 static void help(const char *progname);
@@ -334,6 +336,8 @@ main(int argc, char **argv)
 		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
 		{"no-security-labels", no_argument, &no_security_labels, 1},
 		{"no-unlogged-table-data", no_argument, &no_unlogged_table_data, 1},
+		{"no-post-data", no_argument, &exclude_post_data, 1},
+		{"post-data-only", no_argument, &post_data_only, 1},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -790,7 +794,7 @@ main(int argc, char **argv)
 	dumpStdStrings(g_fout);
 
 	/* The database item is always next, unless we don't want it at all */
-	if (include_everything && !dataOnly)
+	if (include_everything && !dataOnly && !post_data_only)
 		dumpDatabase(g_fout);
 
 	/* Now the rearrangeable objects. */
@@ -876,6 +880,8 @@ help(const char *progname)
 	printf(_("  --no-unlogged-table-datado not dump unlogged table data\n"));
 	printf(_("  --quote-all-identifiers quote all identifiers, even if not key words\n"));
 	printf(_("  --serializable-deferrable   wait until the dump can run without anomalies\n"));
+	printf(_("  --no-post-data  do not dump constraints, indexes, rules, triggers\n"));
+	printf(_("  --post-data-onlyonly dump constraints, indexes, rules, triggers\n"));
 	printf(_("  --use-set-session-authorization\n"
 			 "  use SET SESSION AUTHORIZATION commands instead of\n"
 	"  ALTER OWNER commands to set ownership\n"));
@@ -7023,6 +7029,25 @@ collectComments(Archive *fout, CommentItem **items)
 static void
 dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 {
+
+	int skip = 0;
+
+	switch (dobj->objType)
+	{
+		case DO_INDEX:
+		case DO_TRIGGER:
+		case DO_CONSTRAINT:
+		case DO_FK_CONSTRAINT:
+		case DO_RULE:
+			skip = exclude_post_data;
+			break;
+		default:
+			skip = post_data_only;
+	}
+
+	if (skip)
+		return;
+	
 	switch (dobj->objType)
 	{
 		case DO_NAMESPACE:

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Questions and experiences writing a Foreign Data Wrapper

2011-08-26 Thread Dave Page
On Fri, Aug 26, 2011 at 3:09 PM, Tom Lane  wrote:
> Andrew Dunstan  writes:
>> On 08/26/2011 07:27 AM, Albe Laurenz wrote:
>>> http://oracle-fdw.projects.postgresql.org/
>>>
>>> Would it make sense to mention that in chapter 5.10
>>> of the documentation?
>
>> I don't think so, any more than any other external module should be
>> mentioned in the docs. There are known FDWs for several well known
>> external databases, several file formats, and more exotic data sources
>> such as twitter. I don't think we want to  maintain a list of these in
>> the docs.
>
> Wiki page, maybe?

For example this one: http://wiki.postgresql.org/wiki/Foreign_data_wrappers


-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Fast GiST index build

2011-08-26 Thread Alexander Korotkov
On Thu, Aug 25, 2011 at 10:53 PM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:

>
>> In the tests on the first version of the patch I found the index quality
>> of the regular build much better than that of the buffering build
>> (without neighbor relocation). Now they're similar, though that's because
>> the index quality of the regular build became worse. Thereby, in the
>> current tests the regular build is faster than in the previous ones. I
>> see the following possible causes:
>> 1) I didn't save the source random data, so now it's new random data.
>> 2) Some environment parameters of my test setup may have changed, though
>> I doubt it.
>> Despite these possible explanations, it seems quite strange to me.
>>
>
> That's pretty surprising. Assuming the data is truly random, I wouldn't
> expect a big difference in the index quality of one random data set over
> another. If the index quality depends so much on, say, the distribution of
> the few first tuples that are inserted to it, that's a quite interesting
> find on its own, and merits some further research.

Yeah, it's pretty strange. Using the same random datasets in different tests
can help to exclude one possible cause of the difference.


>> In order to compare index build methods on higher-quality indexes, I've
>> tried to build indexes with my double sorting split method (see:
>> http://syrcose.ispras.ru/2011/files/SYRCoSE2011_Proceedings.pdf#page=36).
>> On the uniform dataset, search is about 10 times faster! And, as
>> expected, the regular index build becomes much slower. It has been
>> running for more than 60 hours and only 50% of the index is complete
>> (estimated by file sizes).
>>
>> Also, automatic switching to buffering build shows better index quality
>> results in all the tests, though it's hard for me to explain why.
>>
>
> Hmm, makes me a bit uneasy that we're testing with a modified page
> splitting algorithm. But if the new algorithm is that good, could you post
> that as a separate patch, please?
>
I've posted it in another message and I will try to get it into a more
appropriate form. Let me clarify this a little. I don't think my split
algorithm is 10 times better than state-of-the-art algorithms. I think the
currently used new linear split shows unreasonably bad results in many
cases. For example, uniformly distributed data is a pretty easy case, and
with almost any splitting algorithm we can get an index with almost zero
overlaps. But the new linear split produces huge overlaps in this case.
That's why I decided to experiment with another split algorithm.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] WIP: Fast GiST index build

2011-08-26 Thread Alexander Korotkov
On Thu, Aug 25, 2011 at 11:08 PM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:

> Could you share the test scripts, patches and data sets etc. needed to
> reproduce the tests you've been running? I'd like to try them out on a test
> server.
>

1) I've updated links to the datasets on the wiki page.
2) The script for index quality testing, fastbuild_test.php, is attached.
In order to run it you need PHP with the pdo and pdo_pgsql modules. The
plantuner module is also required (it is used to force the planner to use
a specific index). After running that script, the following query returns
a relative index quality score:

select indexname, avg(count::real/(select count from test_result a2 where
a2.indexname = 'usnoa2_idx3' and a2.predicate = a1.predicate and
a2.tablename = a1.tablename)::real) from test_result a1 where a1.tablename =
'usnoa2' group by indexname;

where 'usnoa2' is the table name and 'usnoa2_idx3' is the name of the index
whose quality is taken as 1.
3) A patch that makes plantuner work with HEAD is also attached.
4) The patch with my split algorithm implementation is attached. Its
current form is appropriate only for testing purposes.
5) For index creation I use a simple script, attached as 'indexes.sql'.
I run a similar script with different index names for my split patch.

Feel free to ask questions about all this stuff.

--
With best regards,
Alexander Korotkov.


fastbuild_test.php.gz
Description: GNU Zip compressed data


plantuner.patch.gz
Description: GNU Zip compressed data


my_split.patch.gz
Description: GNU Zip compressed data
select pg_stat_statements_reset();
set log_statement_stats = on;
set synchronize_seqscans = off;

create index uniform_idx1 on uniform using gist(point) with (buffering=on);
create index uniform_idx2 on uniform using gist(point) with (buffering=auto);
create index uniform_idx3 on uniform using gist(point) with (buffering=off);

create index usnoa2_idx1 on usnoa2 using gist(point) with (buffering=on);
create index usnoa2_idx2 on usnoa2 using gist(point) with (buffering=auto);
create index usnoa2_idx3 on usnoa2 using gist(point) with (buffering=off);

create index usnoa2_shuffled_idx1 on usnoa2_shuffled using gist(point) with (buffering=on);
create index usnoa2_shuffled_idx2 on usnoa2_shuffled using gist(point) with (buffering=auto);
create index usnoa2_shuffled_idx3 on usnoa2_shuffled using gist(point) with (buffering=off);


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Questions and experiences writing a Foreign Data Wrapper

2011-08-26 Thread Tom Lane
Andrew Dunstan  writes:
> On 08/26/2011 07:27 AM, Albe Laurenz wrote:
>> http://oracle-fdw.projects.postgresql.org/
>> 
>> Would it make sense to mention that in chapter 5.10
>> of the documentation?

> I don't think so, any more than any other external module should be 
> mentioned in the docs. There are known FDWs for several well known 
> external databases, several file formats, and more exotic data sources 
> such as twitter. I don't think we want to  maintain a list of these in 
> the docs.

Wiki page, maybe?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Questions and experiences writing a Foreign Data Wrapper

2011-08-26 Thread Andrew Dunstan



On 08/26/2011 07:27 AM, Albe Laurenz wrote:
> I wrote:
>> I wrote a FDW for Oracle to a) learn some server coding
>> and b) see how well the FDW API works for me.
>
> I have released the software on PgFoundry:
> http://oracle-fdw.projects.postgresql.org/
>
> Would it make sense to mention that in chapter 5.10
> of the documentation?

I don't think so, any more than any other external module should be 
mentioned in the docs. There are known FDWs for several well known 
external databases, several file formats, and more exotic data sources 
such as twitter. I don't think we want to  maintain a list of these in 
the docs.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tsvector concatenation - backend crash

2011-08-26 Thread Tom Lane
Jesper Krogh  writes:
> On 2011-08-26 05:28, Tom Lane wrote:
>> Hm ... I can reproduce this on one of my usual machines, but not
>> another.  What platform are you on exactly?

> 64 bit Ubuntu Lucid (amd64).

Huh, weird ... because the platform it's not failing for me on is
Fedora 14 x86_64.  Which is annoying, because that machine has better
tools for looking for memory stomps than the 32-bit HP box where I
do see the problem.  Anyway, will see what I can find.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removal of useless include references

2011-08-26 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian  writes:
> > It has been years since I ran src/tools/pginclude/pgrminclude to remove
> > unnecessary include files.  (I have already fixed things so include
> > files can be compiled on their own.)
> 
> > The attached patch removes unneeded include references, and marks some
> > includes as needing to be skipped by pgrminclude.
> 
> > I am sure applying this patch will break builds on some platforms and
> > some option combinations so I will monitor the buildfarm when I apply it
> > and make adjustments.
> 
> The last time you did this was in July 2006.  It took us two weeks to
> mostly recover, but we were still dealing with some fallout in December,
> cf
> http://archives.postgresql.org/pgsql-hackers/2006-12/msg00491.php
> 
> We had the buildfarm then, had had it for a couple years.  The notion
> that watching the buildfarm is enough is fully disproven by history.
> 
> Unless you have a better test plan than last time (which this isn't),
> I don't think this should be done at all.  The benefits are microscopic
> and the pain real.

I don't have a better plan.  There are #ifdef code blocks that often
don't get processed and therefore this can't be done better.  I will
abandon the idea.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Questions and experiences writing a Foreign Data Wrapper

2011-08-26 Thread Albe Laurenz
I wrote:
> I wrote a FDW for Oracle to a) learn some server coding
> and b) see how well the FDW API works for me.

I have released the software on PgFoundry:
http://oracle-fdw.projects.postgresql.org/

Would it make sense to mention that in chapter 5.10
of the documentation?

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tsvector concatenation - backend crash

2011-08-26 Thread jesper
> Hi
>
> Attached SQL files give (at least in my hands) a reliable backend crash
> with this stacktrace, reproduced on both 9.0.4 and HEAD. I'm sorry
> I cannot provide a more trimmed-down set of vectors that reproduces the
> bug, thus the "obfuscated" dataset. But even deleting single terms in
> the vectors makes the bug go away.

Ok, I found 8.3.0 to be "good" so I ran a git bisect on it. It gave
me this commit:

e6dbcb72fafa4031c73cc914e829a6dec96ab6b6 is the first bad commit
commit e6dbcb72fafa4031c73cc914e829a6dec96ab6b6
Author: Tom Lane 
Date:   Fri May 16 16:31:02 2008 +

Extend GIN to support partial-match searches, and extend tsquery to
support
prefix matching using this facility.

Teodor Sigaev and Oleg Bartunov

:04 04 febf59ba02bcd4ce3863e880c6bbd989e0b7b1d2
5e96383e628dd27b5c68b0186af18f80fb7ef129 M  doc
:04 04 b920deca6f074b83dd5d2bd0446785a23019d11a
3f10e54cdeac63129f34865adcadf34ff74ff9a8 M  src
bisect run success

Which means that 8.3 releases are OK, but 8.4 and forward have the problem.

Which at least touches the same area... the patch is over 3K lines, though,
and my C skills are not "that good".

Attached is the git bisect script, just for the archives.

Jesper

git-bisect-script
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-08-26 Thread Kohei KaiGai
Robert, thanks for your review.

> For me, the line you removed from dml.out causes the regression tests to fail.
>
Fixed. Why did I remove this line??

> I don't understand what this is going for:
>
> +       /*
> +        * To boost up trusted procedure checks on db_procedure object
> +        * class, we also confirm the decision when user calls a procedure
> +        * labeled as 'tcontext'.
> +        */
>
> Can you explain?
>
Yes. It also caches the expected security label when a client labeled
"scontext" tries to execute a procedure labeled "tcontext", to reduce
the number of system call invocations in fmgr_hook and
needs_fmgr_hook.
If the expected security label is not the same as "scontext", it means
the procedure performs as a trusted procedure that switches the
security label of the client during its execution; like a security
definer function.
The pair of security labels is the only factor that determines whether
the procedure is a trusted procedure or not. Thus, it is suitable to
cache in the userspace avc.
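
To illustrate (a hedged sketch with made-up names; this is not the
actual uavc.c code):

#include <stdbool.h>
#include <string.h>

typedef struct
{
    char   *scontext;   /* security label of the client */
    char   *tcontext;   /* security label of the procedure */
    char   *ncontext;   /* label the client runs with while executing
                         * the procedure; computed once, when the
                         * cache entry is built */
} avc_cache;

static bool
is_trusted_procedure(const avc_cache *cache)
{
    /*
     * Calling the procedure switches the client's label exactly when
     * ncontext differs from scontext, and ncontext follows from the
     * (scontext, tcontext) pair alone, so the answer can be cached
     * and no extra system call is needed at call time.
     */
    return strcmp(cache->ncontext, cache->scontext) != 0;
}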

As an aside, the reason why we don't cache the default security label
assigned to newly created named objects (such as tables, ...) is that
selinux allows setting up an exceptional default security label for a
particular name, so it is not suitable for the avc structure.
(I'm waiting for this interface to get included into libselinux.)

> sepgsql_avc_check_perms_label has a formatting error on the line that
> says "result = false".  It's not indented correctly.
>
OK, I fixed it.

> Several functions do this: sepgsql_avc_check_valid(); do { ... } while
> (!sepgsql_avc_check_valid);  I don't understand why we need a loop
> there.
>
It prevents inconsistent access control decisions in the face of
concurrent security policy reloading.
I want the following steps to be executed atomically:
 1) Look up the object class number on the kernel side
 2) Look up the permission bits on the kernel side
 3) Ask the kernel side for its access control decision.

selinux_status_updated() returns 1 if any selinux status on the kernel
side (one that requires flushing the userspace caches) has changed
since the last invocation.
In that case, we retry the whole process from the beginning, to ensure
that the whole access control decision is made by either the old or
the new policy, never a mixture of both.
Thus, I enclosed these blocks in a do {...} while() loop.
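
The shape of the loop is roughly this (a hedged sketch, not the patch
itself; flush_userspace_avc() and compute_decision() are stand-ins for
the real cache internals, and the header declaring
selinux_status_updated() may vary with the libselinux version):

#include <stdbool.h>
#include <selinux/avc.h>        /* selinux_status_updated() */

extern void flush_userspace_avc(void);      /* hypothetical helpers */
extern bool compute_decision(const char *scontext, const char *tcontext,
                             const char *tclass, unsigned int required);

static bool
check_perms_consistently(const char *scontext, const char *tcontext,
                         const char *tclass, unsigned int required)
{
    bool        allowed;
    bool        policy_changed;

    do
    {
        allowed = compute_decision(scontext, tcontext, tclass, required);

        /*
         * If the kernel reloaded its policy while we were deciding,
         * drop the caches and redo the whole decision, so the result
         * reflects exactly one policy -- old or new, never a mixture.
         */
        policy_changed = (selinux_status_updated() > 0);
        if (policy_changed)
            flush_userspace_avc();
    } while (policy_changed);

    return allowed;
}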

> The comment for sepgsql_avc_check_perms_label uses the word "elsewhere"
> when it really means "otherwise".
>
OK, I fixed it.

> Changing the calling sequence of sepgsql_get_label() would perhaps be
> better separated out into its own patch.
>
OK, I reverted it.

Thanks,
-- 
KaiGai Kohei 
 configure.in   |4 +-
 contrib/sepgsql/Makefile   |2 +-
 contrib/sepgsql/dml.c  |   59 +++---
 contrib/sepgsql/hooks.c|   64 +++---
 contrib/sepgsql/proc.c |   68 ++-
 contrib/sepgsql/relation.c |   69 +++
 contrib/sepgsql/schema.c   |   39 ++--
 contrib/sepgsql/selinux.c  |2 +-
 contrib/sepgsql/sepgsql.h  |   18 ++-
 contrib/sepgsql/uavc.c |  511 
 doc/src/sgml/sepgsql.sgml  |   12 +-
 11 files changed, 649 insertions(+), 199 deletions(-)

diff --git a/configure.in b/configure.in
index a844afc..b444358 100644
--- a/configure.in
+++ b/configure.in
@@ -964,8 +964,8 @@ fi
 
 # for contrib/sepgsql
 if test "$with_selinux" = yes; then
-  AC_CHECK_LIB(selinux, selinux_sepgsql_context_path, [],
-   [AC_MSG_ERROR([library 'libselinux', version 2.0.93 or newer, is required for SELinux support])])
+  AC_CHECK_LIB(selinux, selinux_status_open, [],
+   [AC_MSG_ERROR([library 'libselinux', version 2.0.99 or newer, is required for SELinux support])])
 fi
 
 # for contrib/uuid-ossp
diff --git a/contrib/sepgsql/Makefile b/contrib/sepgsql/Makefile
index 1978ccf..e273d8f 100644
--- a/contrib/sepgsql/Makefile
+++ b/contrib/sepgsql/Makefile
@@ -1,7 +1,7 @@
 # contrib/sepgsql/Makefile
 
 MODULE_big = sepgsql
-OBJS = hooks.o selinux.o label.o dml.o \
+OBJS = hooks.o selinux.o uavc.o label.o dml.o \
 	schema.o relation.o proc.o
 DATA_built = sepgsql.sql
 REGRESS = label dml misc
diff --git a/contrib/sepgsql/dml.c b/contrib/sepgsql/dml.c
index 22666b7..3199337 100644
--- a/contrib/sepgsql/dml.c
+++ b/contrib/sepgsql/dml.c
@@ -150,12 +150,11 @@ check_relation_privileges(Oid relOid,
 		  uint32 required,
 		  bool abort)
 {
-	char		relkind = get_rel_relkind(relOid);
-	char	   *scontext = sepgsql_get_client_label();
-	char	   *tcontext;
+	ObjectAddress	object;
 	char	   *audit_name;
 	Bitmapset  *columns;
 	int			index;
+	char		relkind = get_rel_relkind(relOid);
 	bool		result = true;
 
 	/*
@@ -184,45 +183,43 @@ check_relation_privileges(Oid relOid,
 	/*
 	 * Check permissions on the relation
 	 */
-	tcontext = sepgsql_get_label(RelationRelationId, relOid, 0);
-	audit_name = getObjectDescriptionOids(RelationRelationId, relOid);
+	object.classId = RelationRelationId;
+	object.objectId = relOid;
+	object.objectSubId = 0;
+	audit_name

Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-08-26 Thread Tomas Vondra
On 26 August 2011, 9:35, Greg Smith wrote:
> On 08/25/2011 04:57 PM, Tomas Vondra wrote:
>> (b) sends bgwriter stats (so that the buffers_checkpoint is updated)
>>
> As for throwing more log data out, I'm not sure what new analysis you're
> thinking of that it allows.  I/O gets increasingly spiky as you zoom in
> on it; averaging over a shorter period can easily end up providing less
> insight about trends.  If anything, I spend more time summarizing the
> data that's already there, rather than wanting to break it down further.
> It's already providing way too much detail for most people.  Customers
> tell me they don't care to see checkpoint stats unless they're across a
> day or more of sampling, so even the current "once every ~5 minutes" is
> way more info than they want.  I have all this log parsing code and
> things that look at pg_stat_bgwriter to collect that data and produce
> higher level reports.  And lots of it would break if any of this patch
> is added and people turn it on.  I imagine other log/stat parsing
> programs might suffer issues too.  That's your other hurdle for change
> here: the new analysis techniques have to be useful enough to justify
> the downstream tool disruption that will inevitably follow.

I was aware that by continuously updating pg_stat_bgwriter, the data won't
be synchronized (i.e. the buffers_checkpoint counter will change while
the number of requested/timed checkpoints remains the same).

But does that really break the tools that process the data? When you're
working with summarized data, the result should be more or less the same,
as the difference will be smoothed out by averaging etc. At most one
checkpoint can be "in progress" at any time, so if you get 24 checkpoints
a day, the discrepancy is at most 1/24 of a checkpoint. Yes, it's a
difference, but a small one.

A really crazy workaround would be to change checkpoints_requested /
checkpoints_timed to double, and use that to indicate the current
progress of the checkpoint. So for example 10.54 would mean 10
checkpoints completed and one checkpoint in progress, with 54% of its
blocks already written. But yes, that's a bit crazy.
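
For what it's worth, the encoding itself would be trivial (just a
sketch, not a proposal for actual field names):

#include <math.h>

/*
 * Integer part: completed checkpoints; fractional part: fraction of
 * blocks already written by the checkpoint currently in progress.
 */
static double
encode_progress(long completed, double fraction_written)
{
    return (double) completed + fraction_written;  /* 10 + 0.54 = 10.54 */
}

static void
decode_progress(double value, long *completed, double *fraction)
{
    *completed = (long) floor(value);
    *fraction = value - floor(value);
}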

> If you have an idea for how to use this extra data for something useful,
> let's talk about what that is and see if it's possible to build it in
> instead.  This problem is harder than it looks, mainly because the way
> the OS caches writes here makes trying to derive hard numbers from what
> the background writer is doing impossible.  When the database writes
> things out, and when they actually get written to disk, they are not the
> same event.  The actual write is often during the sync phase, and not
> being able to tracking that beast is where I see the most problems at.
> The write phase, the easier part to instrument in the database, that is
> pretty boring.  That's why the last extra logging I added here focused
> on adding visibility to the sync activity instead.

Hmmm, let me explain what led me to this patch - right now I'm doing a
comparison of filesystems with various block sizes (both fs and db
blocks). I've realized that the db block size significantly influences
the frequency of checkpoints and the amount of data to write, so I'm
collecting data from pg_stat_bgwriter too. The benchmark goes like this:

1. collect pg_stat_bgwriter stats
2. run pgbench for 10 minutes
3. collect pg_stat_bgwriter stats (to compute difference with (1))
4. kill the postmaster
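
Steps 1 and 3 are just a snapshot query against pg_stat_bgwriter; a
minimal libpq sketch of that kind of collection (connection setup and
most error handling trimmed):

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

static void
print_bgwriter_snapshot(PGconn *conn, const char *tag)
{
    PGresult   *res = PQexec(conn,
        "SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint "
        "FROM pg_stat_bgwriter");

    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
        PQclear(res);
        exit(1);
    }

    printf("%s: timed=%s req=%s buffers_checkpoint=%s\n", tag,
           PQgetvalue(res, 0, 0), PQgetvalue(res, 0, 1),
           PQgetvalue(res, 0, 2));

    PQclear(res);
}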

The problem is that when the checkpoint stats are collected, there might
be a checkpoint in progress, and in that case the stats are incomplete.
In some cases (especially with very small db blocks) this has a
significant impact, because the checkpoints are less frequent.

I can't infer this from other data (e.g. iostat), because that doesn't
tell me which I/O is caused by the checkpoint.

Yes, this does not consider sync timing, but in my case that's not a big
issue (the page cache is rather small so the data are actually forced to
the drive soon).

I could probably live with keeping the current pg_stat_bgwriter logic
(i.e. updating just once) and writing checkpoint status just to the log. I
don't think that should break any current tools that parse logs, because
the message is completely different (prefixed with 'checkpoint status') so
any reasonably written tool should be OK.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-08-26 Thread Magnus Hagander
On Thu, Aug 25, 2011 at 22:57, Tomas Vondra  wrote:
> Hello,
>
> I'd like to propose a small patch that allows better checkpoint progress
> monitoring. The patch is quite simple - it adds a new integer GUC,
> "checkpoint_update_limit", and every time the checkpoint writes this
> number of buffers, it does two things:
>
> (a) logs a "checkpoint status" message into the server log, with info
> about total number of buffers to write, number of already written buffers,
> current and average write speed and estimate of remaining time
>
> (b) sends bgwriter stats (so that the buffers_checkpoint is updated)
>
> I believe this will make checkpoint tuning easier, especially with large
> shared buffers and when there's other write activity (so that it's
> difficult to see the checkpoint I/O).
>
> The default value (0) means this continuous logging is disabled.

This seems like the wrong thing to write to the log. It's really only
useful *during* the checkpoint run, isn't it? If so, I think it should
go in a pg_stat view. In theory, this could be the "progress view" or
"progress field" talked about around Greg's previous patch - or it
could just be modifying the command string in pg_stat_activity. Either
way, it should be updated in shared memory in that case (like the
current query is), and not sent in a message to the collector.

IMHO, of course ;)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-08-26 Thread Greg Smith

On 08/25/2011 04:57 PM, Tomas Vondra wrote:
> (b) sends bgwriter stats (so that the buffers_checkpoint is updated)

The idea behind only updating the stats in one chunk, at the end, is
that it makes one specific thing easier to do.  Let's say you're running
a monitoring system that is grabbing snapshots of pg_stat_bgwriter
periodically.  If you want to figure out how much work a checkpoint did,
you only need two data points to compute that right now.  Whenever
you see either of the checkpoint count numbers increase, you just
subtract off the previous sample; now you've got a delta for how many
buffers that checkpoint wrote out.  You can derive the buffer count
information that appears in the logs quite easily this way.  The intent
was to make that possible to do, so that people can figure this out
without needing to parse the log data.


Spreading out the updates defeats that idea.  It also makes it possible 
to see the buffer writes more in real-time, as they happen.  You can 
make a case for both approaches having their use cases; the above is 
just summarizing the logic behind why it's done the way it is right 
now.  I don't think many people are actually doing things with this to 
the level where their tool will care.  The most popular consumer of 
pg_stat_bgwriter data I see is Munin graphing changes, and I don't think 
it will care either way.


Giving people the option of doing it the other way is a reasonable idea, 
but I'm not sure there's enough use case there to justify adding a GUC 
just for that.  My next goal here is to eliminate checkpoint_segments, 
not to add yet another tunable extremely few users would ever touch.


As for throwing more log data out, I'm not sure what new analysis you're
thinking of that it allows.  I/O gets increasingly spiky as you zoom in
on it; averaging over a shorter period can easily end up providing less
insight about trends.  If anything, I spend more time summarizing the
data that's already there, rather than wanting to break it down further.
It's already providing way too much detail for most people.  Customers
tell me they don't care to see checkpoint stats unless they're across a
day or more of sampling, so even the current "once every ~5 minutes" is
way more info than they want.  I have all this log parsing code and
things that look at pg_stat_bgwriter to collect that data and produce
higher level reports.  And lots of it would break if any of this patch
is added and people turn it on.  I imagine other log/stat parsing
programs might suffer issues too.  That's your other hurdle for change
here: the new analysis techniques have to be useful enough to justify
the downstream tool disruption that will inevitably follow.


If you have an idea for how to use this extra data for something useful,
let's talk about what that is and see if it's possible to build it in
instead.  This problem is harder than it looks, mainly because the way
the OS caches writes here makes it impossible to derive hard numbers
from what the background writer is doing.  When the database writes
things out, and when they actually get written to disk, they are not the
same event.  The actual write often happens during the sync phase, and
not being able to track that beast is where I see the most problems.
The write phase, the easier part to instrument in the database, is
pretty boring.  That's why the last extra logging I added here focused
on adding visibility to the sync activity instead.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers