Re: [HACKERS] [GENERAL] pg_upgrade -u

2013-05-28 Thread Joshua D. Drake


On 05/28/2013 07:55 PM, Bruce Momjian wrote:

>> Perhaps just documenting the behavior is all that is needed, but -U is
>> everywhere and I think that's a good thing.
>
> [ moved to hacker ]
>
> Wow, I never realized other tools used -U for user, instead of -u.
> Should I change pg_upgrade to use -U for 9.4?  I can keep supporting -u
> as an undocumented option.

Yes, -U makes the most sense, as that is what is used with the other
tools. I think you should just drop -u; this isn't something people are
doing every day (like psql). The backward-compatibility argument is
pretty slim.


JD





Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-28 Thread Alvaro Herrera
Bruce Momjian wrote:
> On Mon, May 27, 2013 at 03:06:13PM -0400, Alvaro Herrera wrote:
> > Bruce Momjian wrote:

> > I would have each data segment be self-identifying, i.e. have a magic
> > number at the beginning of the file and the relation OID, some fork
> > identification and the segment number somewhere -- probably the special
> > space of the first page.
> 
> Is this something we want on the TODO?  I was not clear how to do this
> without making the first page format special or wasting space on all the
> other pages.

I don't think the special space necessarily has to be identically sized
in all the pages -- I admit I haven't looked closely, but the special
size is part of the page header.
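
A sketch of how such a self-identifying header might look if it lived in
the special space of a segment's first page (the field layout here is
invented for illustration, not from any patch):

#include <stdint.h>

/* Hypothetical self-identification for a data segment; names and
 * widths are illustrative only. */
typedef struct SegmentIdData
{
    uint32_t    magic;      /* magic number marking a data segment */
    uint32_t    relid;      /* relation OID */
    uint8_t     forknum;    /* main / fsm / vm / init fork */
    uint32_t    segno;      /* segment number within the relation */
} SegmentIdData;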

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [GENERAL] pg_upgrade -u

2013-05-28 Thread Bruce Momjian
On Wed, May 22, 2013 at 03:05:57PM -0400, Ray Stell wrote:
> > However, if we pass these items into the scripts, we then force
> > these values to be used, even if the user wants to use a different
> > value.  It is a balance between supplying defaults vs. requiring the
> > user to supply or change the values used during the upgrade.
> >
> > At this point, I have favored _not_ supplying defaults in the
> > script.  Do you have an alternative argument in favor of supplying
> > defaults?
>
>
>
> Well, the story really began when I ran initdb with a -U arg. vacuumdb
> takes the -U also, but pg_upgrade does not.
>
> It seems like if I have to supply a -u in order to get pg_upgrade
> to function in the case where there is no default superuser in the
> cluster, then an associated vacuumdb command requires a -U arg.
>
> Perhaps just documenting the behavior is all that is needed, but -U is
> everywhere and I think that's a good thing.

[ moved to hacker ]

Wow, I never realized other tools used -U for user, instead of -u. 
Should I change pg_upgrade to use -U for 9.4?  I can keep supporting -u
as an undocumented option.

I have applied the attached patch to document that you might need to set
connection parameters for vacuumdb.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml
new file mode 100644
index d676d28..4ac0e56
*** a/doc/src/sgml/pgupgrade.sgml
--- b/doc/src/sgml/pgupgrade.sgml
*************** psql --username postgres --file script.s
*** 442,448 ****
  
   Because optimizer statistics are not transferred by pg_upgrade, you will
   be instructed to run a command to regenerate that information at the end
!  of the upgrade.
  
 
  
--- 442,449 ----
  
   Because optimizer statistics are not transferred by pg_upgrade, you will
   be instructed to run a command to regenerate that information at the end
!  of the upgrade.  You might need to set connection parameters to
!  match your new cluster.
  
 
  



Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-05-28 Thread Greg Smith

On 5/28/13 10:00 PM, Jon Nelson wrote:
> On Tue, May 28, 2013 at 10:36 AM, Greg Smith wrote:
>> On 5/28/13 11:12 AM, Jon Nelson wrote:
>>> It opens a new file, fallocates 16MB, calls fdatasync.
>>
>> Outside of the run for performance testing, I think it would be good at this
>> point to validate that there is really a 16MB file full of zeroes resulting
>> from these operations.  I am not really concerned that posix_fallocate might
>> be slower in some cases; that seems unlikely.  I am concerned that it might
>> result in a file that isn't structurally the same as the 16MB of zero writes
>> implementation used now.
>
> util-linux comes with fallocate which (might?) suffice for testing in
> that respect, no?
> If that is a real concern, it could be made part of the autoconf
> testing, perhaps.

I was just thinking of something to run in your test program, not
another build-time check.  Just run the new allocation sequence, and
then check the resulting file for a) correct length, and b) 16MB of
zero bytes.  I would like to build some confidence that posix_fallocate
is operating correctly in this context on at least one platform.  My
experience with Linux handling this class of functions correctly has
left me skeptical of them working until that's proven to be the case.
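
Something along these lines would do for that check (a sketch only; the
16MB segment size comes from the discussion above, and the function name
and path handling are assumptions):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

#define SEGSZ (16 * 1024 * 1024)    /* expected file size */

/* Return 0 if 'path' is exactly 16MB and contains only zero bytes. */
int check_zeroed(const char *path)
{
    char        buf[8192];
    static const char zero[8192];   /* stays all zeroes */
    struct stat st;
    ssize_t     n;
    off_t       total = 0;
    int         fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size != SEGSZ)
    {
        close(fd);
        return -1;              /* a) wrong length (or stat failure) */
    }
    while ((n = read(fd, buf, sizeof(buf))) > 0)
    {
        if (memcmp(buf, zero, n) != 0)
        {
            close(fd);
            return -1;          /* b) found a non-zero byte */
        }
        total += n;
    }
    close(fd);
    return (total == SEGSZ) ? 0 : -1;
}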


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] preserving forensic information when we freeze

2013-05-28 Thread Andres Freund
On 2013-05-28 21:26:49 -0400, Robert Haas wrote:
> On Tue, May 28, 2013 at 8:00 PM, Andres Freund  wrote:
> > I only suggested MOVED_IN/OFF because I didn't remember
> > HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID was still free ;). So, unless that
> > combination could have been produced in the past in a way I couldn't
> > find so far, I am all for it. I don't see a problem with breaking
> > backward compatibility on that level and I don't think there's any way
> > to get there without some level of low level compat break.
>
> I'm not sure why you call it a compatibility break.  It's true that an
> old PostgreSQL can't read new heaps, but we've never guaranteed that
> direction anyway, and every time we add or reinterpret any infomask
> bits, that's true.  For example, the fklocks patch did that.  What's
> important is that a new PostgreSQL will still be able to read old
> heaps; that is, pg_upgrade compatibility is preserved.

Oh, not the on-disk format. But as you said, code that looks at xmin is
going to need to change. fklocks broke code that relied on
HeapTupleHeaderGetXmax(); this would break code that looks at
xmin. Removing/renaming *GetXmin similarly sounds like a good idea to
make the breakage explicit.

> > I am all for adding a comment why this is safe though. We thought about
> > it for some while before we were convinced...
>
> I'm fine with that, but the logic is present in multiple places, and I
> did not want to comment them all identically.  If there's a central
> place in which a comment would be appropriate, let's add it there; or
> IOW, what do you suggest in detail?

That's a good point. Generally lots of this is under-documented and
under-commented, and can only be learned by spending a year or so in the
postgres code. I'd
say rename README.HOT to README and add a section there and reference it
from those two places? It already has a mostly general explanation of
concurrent index builds. Don't have a better idea.

> > Hm. As previously said, I am less than convinced of those adhoc
> > mechanisms and I think this should get properly integrated into the
> > normal cache invalidation mechanisms.
> > But: I think this is safe since we compare the stored/cached xmin/tid
> > with one gotten from the SearchSysCache just before which will point to
> > the correct location as of the last AcceptInvalidationMessages(). I
> can't think of a way where this would allow the case you describe.
>
> I don't understand why it can't.  AcceptInvalidationMessages()
> guarantees that we're looking at the latest version of the catalog,
> but it doesn't say anything about whether the latest catalog state
> happens to look like any earlier catalog state.

Well, AcceptInvalidationMessages() will get a new version of the tuple
with a new tid/xmin combo. So what would need to happen is the function
being updated (to a new location), then the old version needs to get
removed, then the new one updated again, reusing the old
location. All the while either both versions need to have been frozen or
we need to have wrapped around to the same xid. All that without the
function being executed in between, which would have invalidated the old
state and stored a new xmin/tid.
But you're right, nothing except chance prevents that from happening,
not sure what I thought of earlier.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-05-28 Thread Jon Nelson
On Tue, May 28, 2013 at 10:36 AM, Greg Smith  wrote:
> On 5/28/13 11:12 AM, Jon Nelson wrote:
>>
>> It opens a new file, fallocates 16MB, calls fdatasync.
>
>
> Outside of the run for performance testing, I think it would be good at this
> point to validate that there is really a 16MB file full of zeroes resulting
> from these operations.  I am not really concerned that posix_fallocate might
> be slower in some cases; that seems unlikely.  I am concerned that it might
> result in a file that isn't structurally the same as the 16MB of zero writes
> implementation used now.

util-linux comes with fallocate which (might?) suffice for testing in
that respect, no?
If that is a real concern, it could be made part of the autoconf
testing, perhaps.

> The timing program you're writing has some aspects that are similar to the
> contrib/pg_test_fsync program.  You might borrow some code from there
> usefully.

Thanks! If it looks like what I'm attaching will not do, then I'll
look at that as a possible next step.

> To clarify the suggestion I was making before about including performance
> test results:  that doesn't necessarily mean the testing code must run using
> only the database.  That's better if possible, but as Robert says it may not
> be for some optimizations.  The important thing is to have something
> measuring the improvement that a reviewer can duplicate, and if that's a
> standalone benchmark problem that's still very useful.  The main thing I'm
> wary of is any "this should be faster" claims that don't come with any
> repeatable measurements at all.  Very often theories about the fastest way
> to do something don't match what's actually seen in testing.

Ack.
A note: The attached test program uses *fsync* instead of *fdatasync*
after calling fallocate (or writing out 16MB of zeroes), per an
earlier suggestion.

--
Jon
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/time.h>
#include <sys/stat.h>

#define SIXTEENMB (1024*1024*16)
#define EIGHTKB (1024*8)

/* Fill the file with 16MB from 'buf', 8KB at a time. */
void writeout(int fd, char *buf)
{
    int i;
    for (i = 0; i < SIXTEENMB / EIGHTKB; ++i) {
        if (write(fd, buf, EIGHTKB) != EIGHTKB) {
            fprintf(stderr, "Error in write: %m!\n");
            exit(1);
        }
    }
}

int main(int argc, char *argv[])
{
    int with_fallocate, open_close_iterations, rewrite_iterations;
    int fd, i, j;
    double tt;
    struct timeval tv1, tv2;
    char *buf0, *buf1;
    const char *filename; /* for convenience */

    if (argc != 4) {
        fprintf(stderr, "Usage: %s <filename> <open_close_iterations> <rewrite_iterations>\n", argv[0]);
        exit(1);
    }

    filename = argv[1];

    open_close_iterations = atoi(argv[2]);
    if (open_close_iterations < 0) {
        fprintf(stderr, "Error parsing 'open_close_iterations'!\n");
        exit(1);
    }

    rewrite_iterations = atoi(argv[3]);
    if (rewrite_iterations < 0) {
        fprintf(stderr, "Error parsing 'rewrite_iterations'!\n");
        exit(1);
    }

    /* one buffer of zeroes for the initial fill, one of non-zero
     * bytes for the rewrite passes */
    buf0 = malloc(SIXTEENMB);
    if (!buf0) {
        fprintf(stderr, "Unable to allocate memory!\n");
        exit(1);
    }
    memset(buf0, 0, SIXTEENMB);

    buf1 = malloc(SIXTEENMB);
    if (!buf1) {
        fprintf(stderr, "Unable to allocate memory!\n");
        exit(1);
    }
    memset(buf1, 1, SIXTEENMB);

    for (with_fallocate = 0; with_fallocate < 2; ++with_fallocate) {
        gettimeofday(&tv1, NULL);   /* time the whole batch of iterations */
        for (i = 0; i < open_close_iterations; ++i) {
            fd = open(filename, O_CREAT | O_EXCL | O_WRONLY, 0600);
            if (fd < 0) {
                fprintf(stderr, "Error opening file: %m\n");
                exit(1);
            }
            if (with_fallocate) {
                if (posix_fallocate(fd, 0, SIXTEENMB) != 0) {
                    fprintf(stderr, "Error in posix_fallocate!\n");
                    exit(1);
                }
            } else {
                writeout(fd, buf0);
            }
            if (fsync(fd)) {
                fprintf(stderr, "Error in fsync: %m!\n");
                exit(1);
            }
            for (j = 0; j < rewrite_iterations; ++j) {
                lseek(fd, 0, SEEK_SET);
                writeout(fd, buf1);
                if (fdatasync(fd)) {
                    fprintf(stderr, "Error in fdatasync: %m!\n");
                    exit(1);
                }
            }
            if (close(fd)) {
                fprintf(stderr, "Error in close: %m!\n");
                exit(1);
            }
            unlink(filename);       /* don't check for error */
        }
        gettimeofday(&tv2, NULL);
        /* elapsed seconds: tv_sec/tv_usec arithmetic in microseconds */
        tt = (tv2.tv_usec + tv2.tv_sec * 1000000.0) - (tv1.tv_usec + tv1.tv_sec * 1000000.0);
        tt /= 1000000.0;
        fprintf(stderr,
            "with%s posix_fallocate: %d open/close iterations, %d rewrite in %0.4fs\n",
            (with_fallocate ? "" : "out"), open_close_iterations, rewrite_iterations, tt);
        sleep(5);
    }
    /* cleanup */
    free(buf0);
    free(buf1);
    return 0;
}



Re: [HACKERS] all_visible replay aborting due to uninitialized pages

2013-05-28 Thread Andres Freund
On 2013-05-28 21:36:17 -0400, Robert Haas wrote:
> On Tue, May 28, 2013 at 1:58 PM, Andres Freund  wrote:
> > Guessing around I looked and noticed the following problematic pattern:
> > 1) A: wants to do an update, doesn't have enough freespace
> > 2) A: extends the relation on the filesystem level 
> > (RelationGetBufferForTuple)
> > 3) A: does PageInit (RelationGetBufferForTuple)
> > 4) A: aborts, e.g. due to a serialization failure (heap_update)
> >
> > At this point the page is initialized in memory, but not wal logged. It
> > isn't pinned or locked either.
> >
> > 5) B: vacuum finds that page and it's empty. So it marks it all
> > visible. But since the page wasn't written out (we haven't even marked
> > it dirty in 3.) the standby doesn't know that and reports the page as
> > being uninitialized.
> >
> > ISTM the best backbranchable fix for this is to teach lazy_scan_heap to
> > log an FPI for the heap page via visibilitymap_set in that rather
> > limited case.
> >
> > Happy to provide a patch unless somebody has a better idea?
> 
> Good catch.  However, I would suggest using log_newpage() before
> visibilitymap_set() rather than trying to stick extra logic into
> visibilitymap_set().  I think that will be cleaner and simpler.

Thought about that, but given that 9.3's visibilitymap_set will already
FPI heap pages, I concluded it wouldn't really be an improvement,
since it's only one log_heap_page or so there. Not sure what's
better. Will write the patch and see how it goes.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] pg_stat_replication when standby is unreachable

2013-05-28 Thread Abhishek Rai
Hello Postgres gurus,

I'm writing a thin clustering layer on top of Postgres using the
synchronous replication feature.  The goal is to enable HA and survive
permanent loss of a single node.  Using an external coordinator
(Zookeeper), one of the nodes is elected as the primary.  The primary node
then picks up another healthy node as its standby, and starts serving.
Thereafter, the cluster monitors the primary and the standby, and triggers
a re-election if either of them goes down.

Detecting primary health is easy.  But what is the best way to know if the
standby is live?  Since this is not a hot-standby, I cannot send queries to
it.  Currently, I'm sending the following query to the primary:

  SELECT * FROM pg_stat_replication;

I've noticed that when I terminate the standby (cleanly or through kill
-9), the result of the above query goes from 1 row to zero rows.  The result
comes back to 1 row when the standby restarts and reconnects.  I was
wondering if there is any kind of guarantee about the results of
pg_stat_replication as the standby suffers a network partition, and/or
restarts and reconnects with the primary.  Are there any parameters that
control this behavior?

I tried looking at src/backend/replication/walsender.c/WalSndLoop() but am
still not clear on the expected behavior.
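
A minimal libpq sketch of such a liveness probe (the connection string,
function name, and the use of the row count as the liveness signal are
assumptions, not from the original post):

#include <stdio.h>
#include <libpq-fe.h>

/* Returns the number of connected standbys, or -1 on error. */
static int
standby_count(const char *conninfo)
{
    PGconn     *conn = PQconnectdb(conninfo);
    PGresult   *res;
    int         rows = -1;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        PQfinish(conn);
        return -1;
    }
    res = PQexec(conn, "SELECT * FROM pg_stat_replication");
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        rows = PQntuples(res);      /* 0 rows => no standby attached */
    PQclear(res);
    PQfinish(conn);
    return rows;
}

int main(void)
{
    /* "host=primary dbname=postgres" is a placeholder conninfo */
    int n = standby_count("host=primary dbname=postgres");

    printf("standbys attached: %d\n", n);
    return (n > 0) ? 0 : 1;
}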

Thanks for your time,
Abhishek


Re: [HACKERS] all_visible replay aborting due to uninitialized pages

2013-05-28 Thread Robert Haas
On Tue, May 28, 2013 at 1:58 PM, Andres Freund  wrote:
> Guessing around I looked and noticed the following problematic pattern:
> 1) A: wants to do an update, doesn't have enough freespace
> 2) A: extends the relation on the filesystem level (RelationGetBufferForTuple)
> 3) A: does PageInit (RelationGetBufferForTuple)
> 4) A: aborts, e.g. due to a serialization failure (heap_update)
>
> At this point the page is initialized in memory, but not wal logged. It
> isn't pinned or locked either.
>
> 5) B: vacuum finds that page and it's empty. So it marks it all
> visible. But since the page wasn't written out (we haven't even marked
> it dirty in 3.) the standby doesn't know that and reports the page as
> being uninitialized.
>
> ISTM the best backbranchable fix for this is to teach lazy_scan_heap to
> log an FPI for the heap page via visibilitymap_set in that rather
> limited case.
>
> Happy to provide a patch unless somebody has a better idea?

Good catch.  However, I would suggest using log_newpage() before
visibilitymap_set() rather than trying to stick extra logic into
visibilitymap_set().  I think that will be cleaner and simpler.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] preserving forensic information when we freeze

2013-05-28 Thread Robert Haas
On Tue, May 28, 2013 at 8:00 PM, Andres Freund  wrote:
> I only suggested MOVED_IN/OFF because I didn't remember
> HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID was still free ;). So, unless that
> combination could have been produced in the past in a way I couldn't
> find so far, I am all for it. I don't see a problem with breaking
> backward compatibility on that level and I don't think there's any way
> to get there without some level of low level compat break.

I'm not sure why you call it a compatibility break.  It's true that an
old PostgreSQL can't read new heaps, but we've never guaranteed that
direction anyway, and every time we add or reinterpret any infomask
bits, that's true.  For example, the fklocks patch did that.  What's
important is that a new PostgreSQL will still be able to read old
heaps; that is, pg_upgrade compatibility is preserved.

> I am all for adding a comment why this is safe though. We thought about
> it for some while before we were convinced...

I'm fine with that, but the logic is present in multiple places, and I
did not want to comment them all identically.  If there's a central
place in which a comment would be appropriate, let's add it there; or
IOW, what do you suggest in detail?

> Hm. As previously said, I am less than convinced of those adhoc
> mechanisms and I think this should get properly integrated into the
> normal cache invalidation mechanisms.
> But: I think this is safe since we compare the stored/cached xmin/tid
> with one gotten from the SearchSysCache just before which will point to
> the correct location as of the last AcceptInvalidationMessages(). I
> can't think of a way where this would allow the case you describe.

I don't understand why it can't.  AcceptInvalidationMessages()
guarantees that we're looking at the latest version of the catalog,
but it doesn't say anything about whether the latest catalog state
happens to look like any earlier catalog state.

> I don't think this is especially problematic though. If you do a tidscan
> starting from a tid that is so old that it can be removed: you're doing
> it wrong. The tid could have been reused for something else anyway. I
> think the ctid chaining is only meaningful if the tuple got updated very
> recently, i.e. you hold a snapshot that prevents the removal of the
> root tuple's snapshot.

That logic seems sound to me.

> Nice work!

Thanks!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-28 Thread Tatsuo Ishii
> On 05/28/2013 04:05 PM, Bruce Momjian wrote:
>>
>> On Tue, May 28, 2013 at 03:39:10PM -0700, Joshua D. Drake wrote:
>>>
>>> On 05/28/2013 03:36 PM, Bruce Momjian wrote:
>>>
>>>>> The other option would be to do it on query execute but that doesn't
>>>>> seem as efficient as it would have to be parsed each time. Although
>>>>> it would still be better than reading the actual SQL.
>>>>
>>>> Well, you could do SET TRANSACTION READ ONLY, and that would prevent any
>>>> write transactions.  You could assume it is a read query, and get the
>>>> error and resubmit on the master if that happens, but that sounds
>>>> inefficient.  I thought you were asking for something where you could
>>>> submit a query and it would report back as read/write or read-only.
>>>
>>> No I am suggesting something that before anything happens with the
>>> parser, the protocol knows what is up. So things like pgpool-ii
>>> don't even need a parser, it just knows it is a read only query
>>> because the protocol says so.
>>
>> Oh, that is an interesting idea.  The application is indicating it is
>> read-only via the protocol, and poolers can optimize that.  Don't we
>> have the ability to pass arbitrary GUC values back through the
>> protocol,
>> e.g. transaction_read_only?  If not, that might be a way to do this
>> cleanly.
>>
> 
> I don't know but I don't think so. Anything that is calling SET is
> going to run through the parser.

Right. The SET command needs to be parsed by the parser. However, we
already have embedded parameters in the startup packet, which need
to be recognized by the pooler anyway. See the "StartupMessage" section in:
http://www.postgresql.org/docs/9.3/static/protocol-message-formats.html

I am not sure whether the backend currently permits
default_transaction_read_only = on in the startup packet,
though.
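
For what it's worth, from libpq the GUC could be carried in the startup
packet via the "options" connection parameter, roughly as sketched below
(host/dbname are placeholders; whether the server accepts this particular
GUC at startup is exactly the open question above):

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    /* "options" is sent in the startup packet, so a pooler could
     * inspect it without parsing any SQL. */
    PGconn     *conn = PQconnectdb(
        "host=localhost dbname=postgres "
        "options='-c default_transaction_read_only=on'");

    if (PQstatus(conn) != CONNECTION_OK)
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
    PQfinish(conn);
    return 0;
}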
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-28 Thread Jaime Casanova
On Tue, May 28, 2013 at 4:26 PM, Joshua D. Drake  wrote:
>
> On 05/28/2013 02:18 PM, Bruce Momjian wrote:
>
>>> I would like to see the ability to define if a query is read only at
>>> the protocol level, so that load balances that speak libpq can know
>>> what to do with the query without parsing it.
>>
>>
>> Sounds nice, but how would we do that?  That would require libpq to know
>> it, right?  Do we pass anything back after parsing but before execution?
>>   Could it be optional?  What about functions that modify the database
>> --- isn't that only known at execution time?
>
>
> I can't speak to the actual C code that would be required but from a user
> space, I could see something like this:
>
> con = psycopg2.connect(database='testdb', user='test', transaction-type='r')
>
> Thus when the connection is made, before anything else is done, we know it
> is a read only connection and therefore any load balancer speaking libpq
> would also know it is a read only. The default of course would be r/w and
> you would use a different connection handler for r/w or w queries.
>

You can do that today already, kind of:

Create an entry in pgbouncer that connects to
host=read-only.servers.dns, and make read-only.servers.dns point to
more than one IP.
Then, when the application wants to do load balancing, it just connects
to the entry that points to read-only.servers.dns and lets the magic
happen.

What would be great is for this to happen transparently to the application.

> The other option would be to do it on query execute but that doesn't seem as
> efficient as it would have to be parsed each time. Although it would still
> be better than reading the actual SQL.
>

Another idea, as someone else mentioned, and which I think has been
discussed before, is a function that says whether the query is read-only
or not... maybe even exporting the plan so we don't need to replan again...

Not sure if that is possible, just hand-waving...


--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157




Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-28 Thread Craig Ringer
On 05/29/2013 05:11 AM, Bruce Momjian wrote:

> Sure, it is on the TODO list:
>
>   https://wiki.postgresql.org/wiki/Todo#.2Fcontrib.2Fpg_upgrade
>
> I can only get a link to pg_upgrade from there, so look two sections
> below that for "Wire Protocol Changes".
Thanks.

The direct link is
https://wiki.postgresql.org/wiki/Todo#Wire_Protocol_Changes for anyone
looking for it later.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] Logging of PAM Authentication Failure

2013-05-28 Thread Craig Ringer
On 05/28/2013 04:04 PM, Amit Langote wrote:
> On Tue, May 28, 2013 at 2:32 PM, Craig Ringer  wrote:
>> On 05/11/2013 03:25 AM, Robert Haas wrote:
>>> Not really.  We could potentially fix it by extending the wire
>>> protocol to allow the server to respond to the client's startup packet
>>> with a further challenge, and extend libpq to report that challenge
>>> back to the user and allow sending a response.  But that would break
>>> on-the-wire compatibility, which we haven't done in a good 10 years,
>>> and certainly wouldn't be worthwhile just for this.
>> We were just talking about "things we'd like to do in wire protocol 4".
>>
>> Allowing multi-stage authentication has come up repeatedly and should
>> perhaps go on that list. The most obvious case being "ident auth failed,
>> demand md5".
>>
> I wonder what you think about continuing to use the already
> established connection to the server while you move onto perform
> authentication using next method in the list.
That's precisely what I'm talking about. It'd be nice to avoid the ugly
two-connection approach for SSL too, by allowing STARTTLS or similar
within the protocol.

Being able to negotiate connections - client says "peer?", server says
"failed, peer id doesn't match postgresql username", client says "md5
?" server says "yup, that's ok" - would be nice. For example,
use Kerberos or SSPI where clients are suitably enabled, then fall back
to MD5 where Kerberos or SSPI single-sign-on isn't available.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] commit fest schedule for 9.4

2013-05-28 Thread Josh Berkus

> Shared responsibility is no-one's responsibility. If we are to have
> multiple CF managers, I think it would be good to have one who's mainly
> responsible, and the second one's job is to nag the first manager if
> nothing happens, and quickly take over if necessary, i.e. a hot standby
> arrangement, rather than two equal CF managers.

Because, of course, we have zero experience coordinating tasks with
other contributors over the internet.

I agree that one CFM needs to be "senior" (there are a lot of judgement
calls involved) but I don't agree that CFM needs to be a single-threaded
task.

Speaking of which, I still need a backup CFM for CF1. Any volunteers?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] preserving forensic information when we freeze

2013-05-28 Thread Andres Freund
On 2013-05-28 09:21:27 -0400, Robert Haas wrote:
> I have attempted to implement this.  Trouble is, we're out of infomask
> bits.  Using an infomask2 bit might work but we don't have many of
> them left either, so it's worth casting about a bit for a better
> solution.   Andres proposed using HEAP_MOVED_IN|HEAP_MOVED_OFF for
> this purpose, but I think we're better off trying to reclaim those
> bits in a future release.  Exactly how to do that is a topic for
> another email, but I believe it's very possible.  What I'd like to
> propose instead is using HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID to
> indicate that xmin is frozen.  This bit pattern isn't used for
> anything else, so there's no confusion possible with existing data
> already on disk, but it's necessary to audit all code that checks
> HEAP_XMIN_INVALID to make sure it doesn't get confused.  I've done
> this, and there's little enough of it that it seems pretty easy to
> handle.

I only suggested MOVED_IN/OFF because I didn't remember
HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID was still free ;). So, unless that
combination could have been produced in the past in a way I couldn't
find so far, I am all for it. I don't see a problem with breaking
backward compatibility on that level and I don't think there's any way
to get there without some level of low level compat break.

> - When we follow HOT chains, we determine where the HOT chain ends by
> matching the xmax of each tuple with the xmin of the next tuple.  If
> they don't match, we conclude that the HOT chain has ended.  I
> initially thought this logic might be buggy even as things stand today
> if the latest tuple in the chain is frozen, but as Andres pointed out
> to me, that's not so.  If the newest tuple in the chain is all-visible
> (the earliest point at which we can theoretically freeze it), all
> earlier tuples are dead altogether, and heap_page_prune() is always
> called after locking the buffer and before freezing, so any tuple we
> freeze must be the first in its HOT chain.  For the same reason, this
> logic doesn't need any adjustment for the new freezing system: it's
> never looking at anything old enough to be frozen in the first place.

I am all for adding a comment why this is safe though. We thought about
it for some while before we were convinced...

> - Various procedural languages use the combination of TID and XMIN to
> determine whether a function needs to be recompiled.  Although the
> possibility of this doing the wrong thing seems quite remote, it's not
> obvious to me why it's theoretically correct even as things stand
> today.  Suppose that a previously-frozen tuple is vacuumed away and
> another tuple is placed at the same TID and then frozen.  Then, we
> check whether the cache is still valid and, behold, it is.  This would
> actually get better with this patch, since it wouldn't be enough
> merely for the old and new tuples to both be frozen; they'd have to
> have had the same XID prior to freezing.  I think that could only
> happen if a backend sticks around for at least 2^32 transactions, but
> I don't know what would prevent it in that case.

Hm. As previously said, I am less than convinced of those adhoc
mechanisms and I think this should get properly integrated into the
normal cache invalidation mechanisms.
But: I think this is safe since we compare the stored/cached xmin/tid
with one gotten from the SearchSysCache just before which will point to
the correct location as of the last AcceptInvalidationMessages(). I
can't think of a way where this would allow the case you describe.

> - heap_get_latest_tid() appears broken even without this patch.  It's
> only used on user-specified TIDs, either in a TID scan, or due to the
> use of currtid_byreloid() and currtid_byrelname().  It attempts to find
> the latest version of the tuple referenced by the given TID by
> following the CTID links.  Like HOT, it uses XMAX/XMIN matching to
> detect when the chain is broken.  However, unlike HOT, update chains
> can in general span multiple blocks.  It is not now nor has it ever
> been safe to assume that the next tuple in the chain can't be frozen
> before the previous one is vacuumed away.  Andres came up with the
> best example: suppose the tuple to be frozen physically precedes its
> predecessor; then, an in-progress vacuum might reach the to-be-frozen
> tuple before it reaches (and removes) the previous row version.  In
> newer releases, the same problem could be caused by vacuum's
> occasional page-skipping behavior.  As with the previous point, the
> "don't actually change xmin when we freeze" approach actually makes it
> harder for a chain to get "broken" when it shouldn't, but I suspect
> it's just moving us from one set of extremely-obscure failure cases to
> another.

I don't think this is especially problematic though. If you do a tidscan
starting from a tid that is so old that it can be removed: you're doing
it wrong. The tid could have been reused for something else anyway. I
think the ctid chaining is only meaningful if the tuple got updated very
recently, i.e. you hold a snapshot that prevents the removal of the
root tuple's snapshot.
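
To make the xmax/xmin matching rule discussed above concrete, here is a
simplified, self-contained sketch (the types are stand-ins invented for
illustration, not the real heap structures in heapam.c):

#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

typedef struct
{
    TransactionId xmin;    /* inserting transaction */
    TransactionId xmax;    /* updating transaction, or 0 */
    int next;              /* index of next version, -1 = chain end */
} Tuple;

/* Follow a chain from 'start'; stop when xmax/xmin stop matching. */
static int chain_length(Tuple *tuples, int start)
{
    int len = 1, cur = start;

    while (tuples[cur].next != -1 &&
           tuples[tuples[cur].next].xmin == tuples[cur].xmax)
    {
        cur = tuples[cur].next;
        len++;
    }
    return len;
}

int main(void)
{
    Tuple t[3] = {
        { 100, 200, 1 },   /* updated by xact 200 ... */
        { 200, 300, 2 },   /* ... which created this version */
        { 999,   0, -1 },  /* xmin mismatch: not part of the chain */
    };

    printf("chain length: %d\n", chain_length(t, 0));  /* prints 2 */
    return 0;
}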

Re: [HACKERS] getting rid of freezing

2013-05-28 Thread Robert Haas
On Tue, May 28, 2013 at 12:29 PM, Josh Berkus  wrote:
> On 05/28/2013 07:17 AM, Andres Freund wrote:
>> On 2013-05-26 16:58:58 -0700, Josh Berkus wrote:
>>> I was talking this over with Jeff on the plane, and we wanted to be
>>> clear on your goals here:  are you looking to eliminate the *write* cost
>>> of freezing, or just the *read* cost of re-reading already frozen pages?
>>
>> Both. The latter is what I have seen causing more hurt, but the former
>> alone is painful enough.
>
> I guess I don't see how your proposal is reducing the write cost for
> most users then?
>
> - for users with frequently, randomly updated data, PD_ALL_VISIBLE would
> never be set, so they still need to be rewritten to freeze

Do these users never run vacuum?  As of 9.3, vacuum phase 2 will
typically set PD_ALL_VISIBLE on each relevant page.  The only time
that this WON'T happen is if an insert, update, or delete hits the
page after phase 1 of vacuum and before phase 2 of vacuum.  I don't
think that's going to be the common case.

> - for users with append-only tables, allvisible would never be set since
> those pages don't get vacuumed

There's no good solution for append-only tables.  Eventually, they
will get vacuumed, and when that happens, PD_ALL_VISIBLE will be set,
and freezing will also happen.  I don't think anything that is being
proposed here is going to make that a whole lot better, but it
shouldn't make it any worse than it is now, either.  Since it's
probably not solvable without a rewrite of the heap AM, I'm not going
to feel too bad about that.

> - it would prevent us from getting rid of allvisible, which has a
> documented and known write overhead

Again, I think this is going to be much less of an issue with 9.3, for
the reason explained above.  In 9.2 and prior, we'd scan a page with
dead tuples, prune them to line pointers, vacuum the indexes, and then
mark the dead pointers as unused.  Then, the NEXT vacuum would revisit
the same page and dirty it again ONLY to mark it all-visible.  But in
9.3, the first vacuum will mark the page all-visible at the same time
it marks the dead line pointers unused.  So the write overhead of
PD_ALL_VISIBLE should basically be gone.  If it's not, it would be
good to know why.

> If we just wanted to reduce read cost, why not just take a simpler
> approach and give the visibility map a "isfrozen" bit?  Then we'd know
> which pages didn't need rescanning without nearly as much complexity.

That would break pg_upgrade, which would have to remove visibility map
forks when upgrading.  More importantly, it would require another
round of complex changes to the write-ahead logging in this area.
It's not obvious to me that we'd end up ahead of where we are today,
although perhaps I am a pessimist.

> That would also make it more effective to do precautionary vacuum freezing.

But wouldn't it be a whole lot nicer if we just didn't have to do
vacuum freezing AT ALL?  The point here is to absorb freezing into
some other operation that we already have to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-28 Thread Joshua D. Drake


On 05/28/2013 04:05 PM, Bruce Momjian wrote:
> On Tue, May 28, 2013 at 03:39:10PM -0700, Joshua D. Drake wrote:
>> On 05/28/2013 03:36 PM, Bruce Momjian wrote:
>>>> The other option would be to do it on query execute but that doesn't
>>>> seem as efficient as it would have to be parsed each time. Although
>>>> it would still be better than reading the actual SQL.
>>>
>>> Well, you could do SET TRANSACTION READ ONLY, and that would prevent any
>>> write transactions.  You could assume it is a read query, and get the
>>> error and resubmit on the master if that happens, but that sounds
>>> inefficient.  I thought you were asking for something where you could
>>> submit a query and it would report back as read/write or read-only.
>>
>> No I am suggesting something that before anything happens with the
>> parser, the protocol knows what is up. So things like pgpool-ii
>> don't even need a parser, it just knows it is a read only query
>> because the protocol says so.
>
> Oh, that is an interesting idea.  The application is indicating it is
> read-only via the protocol, and poolers can optimize that.  Don't we
> have the ability to pass arbitrary GUC values back through the protocol,
> e.g. transaction_read_only?  If not, that might be a way to do this
> cleanly.

I don't know but I don't think so. Anything that is calling SET is going
to run through the parser.


JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats




Re: [HACKERS] preserving forensic information when we freeze

2013-05-28 Thread Robert Haas
On Tue, May 28, 2013 at 7:27 PM, Andres Freund  wrote:
> On 2013-05-28 09:39:13 -0700, Josh Berkus wrote:
>> On 05/28/2013 06:21 AM, Robert Haas wrote:
>> > As a general statement, I view this work as something that is likely
>> > needed no matter which one of the "remove freezing" approaches that
>> > have been proposed we choose to adopt.  It does not fix anything in
>> > and of itself, but it (hopefully) removes an objection to the entire
>> > line of inquiry.
>>
>> Agreed.  I have some ideas on how to reduce the impact of freezing as
>> well (of course), and the description of your approach certainly seems
>> to benefit them, especially as it removes the whole "forensic
>> information" objection.
>>
>> One question though: if we're not removing the xmin, how do we know the
>> maximum xid to which we can prune clog?  I can imagine several ways
>> given your approach.
>
> Simply don't count xids which are frozen. Currently we ignore an xid
> because it's a special value; after this, because the tuple has a certain
> hint bit (combination) set.

Right, what he said.  Calculating the XID before which we no longer
need CLOG is just a matter of looking at all the tuples that we don't
know to be frozen and taking the oldest XID from among those.  This
patch changes the definition of "frozen" but that's a pretty minor
detail of the CLOG-truncation calculation.  So, in essence, this patch
doesn't really make much difference in that area either way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-28 Thread Robert Haas
On Tue, May 28, 2013 at 10:53 AM, Benedikt Grundmann wrote:
> Today we have seen
>
> 2013-05-28 04:11:12.300 EDT,,,30600,,51a41946.7788,1,,2013-05-27 22:41:10
> EDT,,0,ERROR,XX000,"xlog flush request 1E95/AFB2DB10 is not satisfied ---
> flushed only to 1E7E/21CB79A0","writing block 9 of relation
> base/16416/293974676"""
> 2013-05-28 04:11:13.316 EDT,,,30600,,51a41946.7788,2,,2013-05-27 22:41:10
> EDT,,0,ERROR,XX000,"xlog flush request 1E95/AFB2DB10 is not satisfied ---
> flushed only to 1E7E/21CB79A0","writing block 9 of relation
> base/16416/293974676"""
>
> while taking the backup of the primary.  We have been running for a few days
> like that and today is the first day where we see these problems again.  So
> it's not entirely deterministic / we don't know yet what we have to do to
> reproduce.
>
> So this makes Robert's theory more likely.  However we have also using this
> method (LVM + rsync with hardlinks from primary) for years without these
> problems.  So the big question is what changed?

Well... I don't know.  But my guess is there's something wrong with
the way you're using hardlinks.  Remember, a hardlink means two
logical pointers to the same file on disk.  So if either file gets
modified after the fact, then the other pointer is going to see the
changes.  The xlog flush request not satisfied stuff could happen if,
for example, the backup is pointing to a file, and the primary is
pointing to the same file, and the primary modifies the file after the
backup is taken (thus modifying the backup after-the-fact).
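
A little standalone demonstration of that hazard (file names invented;
error checking omitted for brevity):

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    char    buf[16];
    int     n;
    int     fd = open("orig", O_CREAT | O_TRUNC | O_WRONLY, 0600);

    write(fd, "old", 3);
    close(fd);

    link("orig", "backup");     /* "backup" is a second name, not a copy */

    fd = open("orig", O_TRUNC | O_WRONLY);
    write(fd, "new", 3);        /* the "primary" modifies its file... */
    close(fd);

    fd = open("backup", O_RDONLY);
    n = read(fd, buf, sizeof(buf) - 1);
    buf[n > 0 ? n : 0] = '\0';
    printf("backup now reads: %s\n", buf);  /* ...and this prints "new" */
    close(fd);

    unlink("orig");
    unlink("backup");
    return 0;
}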

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] preserving forensic information when we freeze

2013-05-28 Thread Andres Freund
On 2013-05-28 09:39:13 -0700, Josh Berkus wrote:
> On 05/28/2013 06:21 AM, Robert Haas wrote:
> > As a general statement, I view this work as something that is likely
> > needed no matter which one of the "remove freezing" approaches that
> > have been proposed we choose to adopt.  It does not fix anything in
> > and of itself, but it (hopefully) removes an objection to the entire
> > line of inquiry.
> 
> Agreed.  I have some ideas on how to reduce the impact of freezing as
> well (of course), and the description of your approach certainly seems
> to benefit them, especially as it removes the whole "forensic
> information" objection.
> 
> One question though: if we're not removing the xmin, how do we know the
> maximum xid to which we can prune clog?  I can imagine several ways
> given your approach.

Simply don't count xids which are frozen. Currently we ignore an xid
because it's a special value; after this, because the tuple has a certain
hint bit (combination) set.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] getting rid of freezing

2013-05-28 Thread Andres Freund
On 2013-05-28 09:29:26 -0700, Josh Berkus wrote:
> On 05/28/2013 07:17 AM, Andres Freund wrote:
> > On 2013-05-26 16:58:58 -0700, Josh Berkus wrote:
> >> I was talking this over with Jeff on the plane, and we wanted to be
> >> clear on your goals here:  are you looking to eliminate the *write* cost
> >> of freezing, or just the *read* cost of re-reading already frozen pages?
> > 
> > Both. The latter is what I have seen causing more hurt, but the former
> > alone is painful enough.
> 
> I guess I don't see how your proposal is reducing the write cost for
> most users then?
> 
> - for users with frequently, randomly updated data, pdallvisible would
> not be ever set, so they still need to be rewritten to freeze

If all the data is updated frequently, it simply never needs to be
frozen, since it never gets old enough.

> - for users with append-only tables, allvisible would never be set since
> those pages don't get vacuumed

They do get vacuumed at least every autovacuum_freeze_max_age even
now. And we should vacuum them more often to make index-only scans work
without manual intervention.

> - it would prevent us from getting rid of allvisible, which has a
> documented and known write overhead

Aha.

> This means that your optimization would benefit only users whose pages
> get updated occasionally (enough to trigger vaccuum) but not too
> frequently (which would unset allvisible).  While we lack statistics,
> intuition suggests that this is a minority of databases.

I don't think that follows.

> If we just wanted to reduce read cost, why not just take a simpler
> approach and give the visibility map a "isfrozen" bit?  Then we'd know
> which pages didn't need rescanning without nearly as much complexity.
> That would also make it more effective to do precautionary vacuum freezing.

Because we would still write/dirty/xlog the changes three times?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Unsigned integer types

2013-05-28 Thread Andrew Dunstan


On 05/28/2013 07:00 PM, Bruce Momjian wrote:
> On Tue, May 28, 2013 at 05:57:41PM -0500, Jim Nasby wrote:
>>> Did you try 'oid' as an unsigned int4?
>>
>> Using an internal catalog type for user data seems like a horrible idea to me...
>
> Uh, not sure if we can say oid is only an internal catalog type.  It is
> certainly used for storing large object references.

pg_largeobject has oids. I don't think the fact that we use oids to
store references to pg_largeobject should blind us to the fact that oid
should be an opaque type. Using them as substitute unsigned ints seems
like a horrible idea to me too.



cheers

andrew





Re: [HACKERS] getting rid of freezing

2013-05-28 Thread Josh Berkus
On 05/28/2013 07:17 AM, Andres Freund wrote:
> On 2013-05-26 16:58:58 -0700, Josh Berkus wrote:
>> I was talking this over with Jeff on the plane, and we wanted to be
>> clear on your goals here:  are you looking to eliminate the *write* cost
>> of freezing, or just the *read* cost of re-reading already frozen pages?
> 
> Both. The latter is what I have seen causing more hurt, but the former
> alone is painful enough.

I guess I don't see how your proposal is reducing the write cost for
most users then?

- for users with frequently, randomly updated data, PD_ALL_VISIBLE would
never be set, so they still need to be rewritten to freeze
- for users with append-only tables, allvisible would never be set since
those pages don't get vacuumed
- it would prevent us from getting rid of allvisible, which has a
documented and known write overhead

This means that your optimization would benefit only users whose pages
get updated occasionally (enough to trigger vacuum) but not too
frequently (which would unset allvisible).  While we lack statistics,
intuition suggests that this is a minority of databases.

If we just wanted to reduce read cost, why not just take a simpler
approach and give the visibility map a "isfrozen" bit?  Then we'd know
which pages didn't need rescanning without nearly as much complexity.
That would also make it more effective to do precautionary vacuum freezing.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] preserving forensic information when we freeze

2013-05-28 Thread Josh Berkus
On 05/28/2013 06:21 AM, Robert Haas wrote:
> As a general statement, I view this work as something that is likely
> needed no matter which one of the "remove freezing" approaches that
> have been proposed we choose to adopt.  It does not fix anything in
> and of itself, but it (hopefully) removes an objection to the entire
> line of inquiry.

Agreed.  I have some ideas on how to reduce the impact of freezing as
well (of course), and the description of your approach certainly seems
to benefit them, especially as it removes the whole "forensic
information" objection.

One question though: if we're not removing the xmin, how do we know the
maximum xid to which we can prune clog?  I can imagine several ways
given your approach.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-28 Thread Bruce Momjian
On Tue, May 28, 2013 at 03:39:10PM -0700, Joshua D. Drake wrote:
> 
> On 05/28/2013 03:36 PM, Bruce Momjian wrote:
> 
> >>The other option would be to do it on query execute but that doesn't
> >>seem as efficient as it would have to be parsed each time. Although
> >>it would still be better than reading the actual SQL.
> >
> >Well, you could do SET TRANSACTION READ ONLY, and that would prevent any
> >write transactions.  You could assume it is a read query, and get the
> >error and resubmit on the master if that happens, but that sounds
> >inefficient.  I thought you were asking for something where you could
> >submit a query and it would report back as read/write or read-only.
> 
> No I am suggesting something that before anything happens with the
> parser, the protocol knows what is up. So things like pgpool-ii
> don't even need a parser, it just knows it is a read only query
> because the protocol says so.

Oh, that is an interesting idea.  The application is indicating it is
read-only via the protocol, and poolers can optimize that.  Don't we
have the ability to pass arbitrary GUC values back through the protocol,
e.g. transaction_read_only?  If not, that might be a way to do this
cleanly.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Unsigned integer types

2013-05-28 Thread Bruce Momjian
On Tue, May 28, 2013 at 05:57:41PM -0500, Jim Nasby wrote:
> On 5/28/13 4:07 PM, Bruce Momjian wrote:
> >On Tue, May 28, 2013 at 11:17:42AM +0200, Maciej Gajewski wrote:
> >>2. INTEGER
> >>
> >>I had to store a record with several uint32. I had to store an awful
> >>lot of them; hundreds of GB of data per day. Roughly half of the record
> >>consists of uint32 fields.
> >>Increasing the data type to bigint would mean that I could store 3
> >>instead of 4 days worth of data on available storage.
> >>Continuing with int4 meant that I would have to deal with the data in
> >>a special way when it enters and leaves the DB. It's easy in C: just
> >>cast uint32_t to int32_t. But python code requires more complex
> >>changes. And the web backend too...
> >>
> >>It's suffering either way!
> >>
> >>Just imagine the conversation I had to have with my boss: "Either
> >>we'll increase the budget for storage, or we need to touch every bit of
> >>the system".
> >
> >Did you try 'oid' as an unsigned int4?
> 
> Using an internal catalog type for user data seems like a horrible idea to 
> me...

Uh, not sure if we can say oid is only an internal catalog type.  It is
certainly used for storing large object references.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Unsigned integer types

2013-05-28 Thread David Johnston
Maciej Gajewski wrote
> I'm also afraid that with
> the extension I'd be left on my own maintaining it forever. While if
> this could go into the core product, it would live forever.

Clarification from the gallery: are we talking an extension or a custom
PostgreSQL build/fork?

If it is an extension, then stick it up on GitHub and let whoever finds it
valuable help contribute to keeping it relevant.

No use letting perfection stand in the way of usability.  If the current
solutions are too slow then exploring the extension aspect - even if it
falls short - is worthwhile.  At minimum you learn from the experience and
maybe someone else (or even yourself) can build on that foundation.

David J.








Re: [HACKERS] Unsigned integer types

2013-05-28 Thread Jim Nasby

On 5/28/13 4:07 PM, Bruce Momjian wrote:
> On Tue, May 28, 2013 at 11:17:42AM +0200, Maciej Gajewski wrote:
>> 2. INTEGER
>>
>> I had to store a record with several uint32. I had to store an awful
>> lot of them; hundreds of GB of data per day. Roughly half of the record
>> consists of uint32 fields.
>> Increasing the data type to bigint would mean that I could store 3
>> instead of 4 days worth of data on available storage.
>> Continuing with int4 meant that I would have to deal with the data in
>> a special way when it enters and leaves the DB. It's easy in C: just
>> cast uint32_t to int32_t. But python code requires more complex
>> changes. And the web backend too...
>>
>> It's suffering either way!
>>
>> Just imagine the conversation I had to have with my boss: "Either
>> we'll increase the budget for storage, or we need to touch every bit of
>> the system".
>
> Did you try 'oid' as an unsigned int4?

Using an internal catalog type for user data seems like a horrible idea to me...

I'll also add that Maciej hasn't explained why these types couldn't be an 
extension (in fact, I'm pretty sure there's already code for this out there, 
though possibly not utilizing the extension framework).

If you don't need implicit casting it should actually be pretty easy to do this 
externally, and I don't think maintenance would be an issue (it's not like 
uint's change...).
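
For reference, the cast trick mentioned in the quoted text, as a tiny
standalone example (on the usual two's-complement platforms the bit
pattern survives the round trip; the values are made up):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t original = 4000000000u;          /* doesn't fit in int32 */
    int32_t  stored   = (int32_t) original;   /* what goes into int4 */
    uint32_t restored = (uint32_t) stored;    /* what comes back out */

    printf("stored as %d, restored as %u\n", stored, restored);
    return 0;   /* restored == original: the bit pattern is preserved */
}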
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Logging of PAM Authentication Failure

2013-05-28 Thread Bruce Momjian
On Tue, May 28, 2013 at 01:32:53PM +0800, Craig Ringer wrote:
> On 05/11/2013 03:25 AM, Robert Haas wrote:
> > Not really.  We could potentially fix it by extending the wire
> > protocol to allow the server to respond to the client's startup packet
> > with a further challenge, and extend libpq to report that challenge
> > back to the user and allow sending a response.  But that would break
> > on-the-wire compatibility, which we haven't done in a good 10 years,
> > and certainly wouldn't be worthwhile just for this.
> We were just talking about "things we'd like to do in wire protocol 4".
> 
> Allowing multi-stage authentication has come up repeatedly and should
> perhaps go on that list. The most obvious case being "ident auth failed,
> demand md5".

Added to TODO.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-28 Thread Joshua D. Drake


On 05/28/2013 03:36 PM, Bruce Momjian wrote:


The other option would be to do it on query execute but that doesn't
seem as efficient as it would have to be parsed each time. Although
it would still be better than reading the actual SQL.


Well, you could do SET TRANSACTION READ ONLY, and that would prevent any
write transactions.  You could assume it is a read query, and get the
error and resubmit on the master if that happens, but that sounds
inefficient.  I thought you were asking for something where you could
submit a query and it would report back as read/write or read-only.


No, I am suggesting something where, before anything happens with the 
parser, the protocol knows what is up. So things like pgpool-II don't 
even need a parser; they just know it is a read-only query because the 
protocol says so.


JD



--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-28 Thread Bruce Momjian
On Mon, May 27, 2013 at 03:06:13PM -0400, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > OK, I have added a section to the TODO list for this:
> > 
> > Desired changes that would prevent upgrades with pg_upgrade
> > 
> > 32-bit page checksums 
> > 
> > Are there any others?
> 
> I would have each data segment be self-identifying, i.e. have a magic
> number at the beginning of the file and the relation OID, some fork
> identification and the segment number somewhere -- probably the special
> space of the first page.

Is this something we want on the TODO?  I was not clear how to do this
without making the first page format special or wasting space on all the
other pages.
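
To make the idea concrete, here is the kind of tag being discussed (purely
illustrative -- the field layout and magic value are made up, shown as a
Python struct sketch):

import struct

# hypothetical self-identifying tag for a data segment: magic number,
# relation OID, fork number, segment number -- 16 bytes that could live
# in the special space of the segment's first page
SEG_MAGIC = 0x53454701
tag = struct.pack('<IIII', SEG_MAGIC, 16416, 0, 42)
magic, reloid, forknum, segno = struct.unpack('<IIII', tag)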

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-28 Thread Bruce Momjian
On Tue, May 28, 2013 at 02:26:06PM -0700, Joshua D. Drake wrote:
> >Sounds nice, but how would we do that?  That would require libpq to know
> >it, right?  Do we pass anything back after parsing but before execution?
> >  Could it be optional?  What about functions that modify the database
> >--- isn't that only known at execution time?
> 
> I can't speak to the actual C code that would be required but from a
> user space, I could see something like this:
> 
> con = psycopg2.connect(database='testdb', user='test',
> transaction-type='r')
> 
> Thus when the connection is made, before anything else is done, we
> know it is a read only connection and therefore any load balancer
> speaking libpq would also know it is a read only. The default of
> course would be r/w and you would use a different connection handler
> for r/w or w queries.
> 
> The other option would be to do it on query execute but that doesn't
> seem as efficient as it would have to be parsed each time. Although
> it would still be better than reading the actual SQL.

Well, you could do SET TRANSACTION READ ONLY, and that would prevent any
write transactions.  You could assume it is a read query, and get the
error and resubmit on the master if that happens, but that sounds
inefficient.  I thought you were asking for something where you could
submit a query and it would report back as read/write or read-only.
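
For illustration, the error-and-resubmit idea would look something like
this from a client (a rough psycopg2 sketch -- the hostnames are made up,
and 25006 is the read_only_sql_transaction error code):

import psycopg2

standby = psycopg2.connect(host='standby.example.com', dbname='testdb')
primary = psycopg2.connect(host='primary.example.com', dbname='testdb')

def run(sql, params=None):
    # optimistically assume the statement is read-only
    try:
        cur = standby.cursor()
        cur.execute(sql, params)
        rows = cur.fetchall()
        standby.commit()
        return rows
    except psycopg2.Error as e:
        standby.rollback()
        if e.pgcode != '25006':   # not a read-only violation
            raise
        # it was a write after all; resubmit on the master
        cur = primary.cursor()
        cur.execute(sql, params)
        primary.commit()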

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump with postgis extension dumps rules separately

2013-05-28 Thread Joe Conway

On 04/06/2013 04:49 PM, Joe Conway wrote:
> If I create a database and install postgis as an extension, and
> then run pg_dump I get this:
> 
> [...] CREATE EXTENSION IF NOT EXISTS postgis WITH SCHEMA public; 
> [...] CREATE RULE geometry_columns_delete AS ON DELETE TO
> geometry_columns DO INSTEAD NOTHING; [...]
> 
> Shouldn't that CREATE RULE be implicitly part of the CREATE
> EXTENSION?
> 
> If so, is this a pg_dump bug, PostGIS bug, or pilot error?
> 
> FWIW I see CREATE OR REPLACE RULE statements in the PostGIS
> extension SQL script.

The attached one-liner seems to do the trick. It should probably be
backpatched to 9.1. Remaining questions:

1) Are there other database object types, likely to be included in
   extension scripts, that are also lacking dependency records to
   their extension?

2) How should we handle already installed extensions, which will still
   lack dependency records after this bugfix?

Thanks,

Joe


-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index cb59f13..c48c661 100644
*** a/src/backend/rewrite/rewriteDefine.c
--- b/src/backend/rewrite/rewriteDefine.c
*** InsertRule(char *rulname,
*** 181,186 
--- 181,189 
  			   DEPENDENCY_NORMAL);
  	}
  
+ 	/* dependency on extension */
+ 	recordDependencyOnCurrentExtension(&myself, is_update);
+ 
  	/* Post creation hook for new rule */
  	InvokeObjectPostCreateHook(RewriteRelationId, rewriteObjectId, 0);
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-28 Thread Alvaro Herrera
Bruce Momjian wrote:
> On Mon, May 27, 2013 at 05:21:16PM -0700, Joshua D. Drake wrote:

> > I would like to see the ability to define if a query is read only at
> > the protocol level, so that load balances that speak libpq can know
> > what to do with the query without parsing it.
> 
> Sounds nice, but how would we do that?  That would require libpq to know
> it, right?  Do we pass anything back after parsing but before execution?
>  Could it be optional?  What about functions that modify the database
> --- isn't that only known at execution time?

Well, if you hit anything that tries to acquire an Xid, and you're in a
context that said only read-only was acceptable, just raise an error.

In a similar vein, I vaguely recall we discussed (after some security
vulnerability involving SQL injection) a mode where we accept only
one command per PQexec() call, i.e. reject execution of commands that
contain multiple queries.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] FIX: auto_explain docs

2013-05-28 Thread Tomas Vondra
Hi,

I've just noticed that this patch in 2012-01 commitfest

  https://commitfest.postgresql.org/action/patch_view?id=729

added log_timing option to auto_explain, but it never actually made it
to the docs. Attached is a patch for current master, but 9.2 should be
patched too.

regards
Tomas
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index 8325f03..03b2309 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -141,6 +141,27 @@ LOAD 'auto_explain';
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term>
+     <varname>auto_explain.log_timing</varname> (<type>boolean</type>)
+    </term>
+    <indexterm>
+     <primary><varname>auto_explain.log_timing</varname> configuration parameter</primary>
+    </indexterm>
+    <listitem>
+     <para>
+      <varname>auto_explain.log_timing</varname> causes <command>EXPLAIN (ANALYZE,
+      TIMING off)</command> output, rather than just <command>EXPLAIN
+      (ANALYZE)</command> output. The overhead of repeatedly reading the system
+      clock can slow down the query significantly on some systems, so it may be
+      useful to set this parameter to <literal>FALSE</literal> when only actual
+      row counts, and not exact times, are needed. This parameter is only
+      effective when <varname>auto_explain.log_analyze</varname> is also enabled.
+      It defaults to <literal>TRUE</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term>
      <varname>auto_explain.log_nested_statements</varname> (<type>boolean</type>)
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-28 Thread Joshua D. Drake


On 05/28/2013 02:18 PM, Bruce Momjian wrote:


I would like to see the ability to define if a query is read only at
the protocol level, so that load balances that speak libpq can know
what to do with the query without parsing it.


Sounds nice, but how would we do that?  That would require libpq to know
it, right?  Do we pass anything back after parsing but before execution?
  Could it be optional?  What about functions that modify the database
--- isn't that only known at execution time?


I can't speak to the actual C code that would be required but from a 
user space, I could see something like this:


con = psycopg2.connect(database='testdb', user='test', 
transaction-type='r')


Thus when the connection is made, before anything else is done, we know 
it is a read only connection and therefore any load balancer speaking 
libpq would also know it is a read only. The default of course would be 
r/w and you would use a different connection handler for r/w or w queries.
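
For what it's worth, psycopg2 can already get part of the way there at 
the session level (a sketch):

import psycopg2

con = psycopg2.connect(database='testdb', user='test')
con.set_session(readonly=True)   # server rejects writes from here on

But that is enforced by the server at execution time; nothing in the 
wire protocol itself tells a load balancer about it, which is the point 
here.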


The other option would be to do it on query execute but that doesn't 
seem as efficient as it would have to be parsed each time. Although it 
would still be better than reading the actual SQL.


Sincerely,

Joshua D. Drake




--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-28 Thread Bruce Momjian
On Mon, May 27, 2013 at 02:09:05PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Mon, May 27, 2013 at 09:17:50AM -0400, Bruce Momjian wrote:
> >> Yes, we should be collecting things we want to do for a pg_upgrade break
> >> so we can see the list all in one place.
> 
> > OK, I have added a section to the TODO list for this:
> 
> > Desired changes that would prevent upgrades with pg_upgrade
> > 32-bit page checksums 
> 
> > Are there any others?
> 
> GiST indexes really oughta have a metapage so there can be a version
> number in them.
> 
> Also, if we are going to unify hstore and json, it'd be nice if we could
> change the existing binary representation of hstore (per discussions at
> Oleg and Teodor's talk --- this will be moot if we invent a new core
> type, but it'd be better not to have to).
> 
> There are probably some other data-type-specific cleanups we could
> make, but I have to go get on an airplane so no time to think about it.

OK, GiST and hstore added to TODO list.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-28 Thread Bruce Momjian
On Mon, May 27, 2013 at 05:21:16PM -0700, Joshua D. Drake wrote:
> 
> On 05/27/2013 04:58 PM, Craig Ringer wrote:
> >
> >On 05/28/2013 12:41 AM, Simon Riggs wrote:
> >>I'm happy with that.
> >>
> >>I was also thinking about collecting changes not related just to disk
> >>format, if any exist.
> >Any wire protocol or syntax changes?
> >
> >I can't seem to find a "things we want to do in wire protocol v4" doc in
> >the wiki but I know I've seen occasional discussion of things that can't
> >be done without protocol changes. Anyone with a better memory than me
> >able to pitch in?
> >
> >What'd be required to support in-band query cancellation? Sending
> >per-statement GUCs (to allow true statement timeout)?
> >
> 
> I would like to see the ability to define if a query is read only at
> the protocol level, so that load balances that speak libpq can know
> what to do with the query without parsing it.

Sounds nice, but how would we do that?  That would require libpq to know
it, right?  Do we pass anything back after parsing but before execution?
 Could it be optional?  What about functions that modify the database
--- isn't that only known at execution time?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-28 Thread Bruce Momjian
On Tue, May 28, 2013 at 07:58:33AM +0800, Craig Ringer wrote:
> On 05/28/2013 12:41 AM, Simon Riggs wrote:
> > I'm happy with that.
> >
> > I was also thinking about collecting changes not related just to disk
> > format, if any exist.
> Any wire protocol or syntax changes?
> 
> I can't seem to find a "things we want to do in wire protocol v4" doc in
> the wiki but I know I've seen occasional discussion of things that can't
> be done without protocol changes. Anyone with a better memory than me
> able to pitch in?

Sure, it is on the TODO list:

https://wiki.postgresql.org/wiki/Todo#.2Fcontrib.2Fpg_upgrade

I can only get a link to pg_upgrade from there, so look two sections
below that for "Wire Protocol Changes".

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unsigned integer types

2013-05-28 Thread Bruce Momjian
On Tue, May 28, 2013 at 11:17:42AM +0200, Maciej Gajewski wrote:
> 2. INTEGER
> 
> I had to store a record with several uint32. I had to store an awful
> lot of them; hundreds of GB of data per day. Roughly half of the record
> consists of uint32 fields.
> Increasing the data type to bigint would mean that I could store 3
> instead of 4 days worth of data on available storage.
> Continuing with int4 meant that I would have to deal with the data in a
> special way when it enters and leaves the DB. It's easy in C: just
> cast uint32_t to int32_t. But python code requires more complex
> changes. And the web backend too...
> 
> It's suffering either way!
> 
> Just imagine the conversation I had to have with my boss: "Either
> we'll increase the budget for storage, or we need to touch every bit of
> the system".

Did you try 'oid' as an unsigned int4?
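
A quick sanity check that oid really holds the full unsigned 32-bit
range (a psycopg2 sketch; the database name is made up):

import psycopg2

conn = psycopg2.connect(dbname='testdb')
cur = conn.cursor()
cur.execute("CREATE TEMP TABLE t (x oid)")
cur.execute("INSERT INTO t VALUES ('4294967295'::oid)")   # 2^32 - 1
cur.execute("SELECT x FROM t")
print(cur.fetchone()[0])   # 4294967295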

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] Fix conversion for Decimal arguments in plpython functions

2013-05-28 Thread Szymon Guz
Hi,
I've got a patch.

This is for a plpython enhancement.

There is an item at the TODO list
http://wiki.postgresql.org/wiki/Todo#Server-Side_Languages
"Fix loss of information during conversion of numeric type to Python float"

This patch uses the decimal.Decimal type from the Python standard library for
the plpython function's numeric argument instead of float.
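
To show the intended behavior (a sketch -- the function name is made up,
using the 9.x language name plpythonu):

CREATE FUNCTION numeric_repr(n numeric) RETURNS text AS $$
    # with the patch applied, n arrives as decimal.Decimal, not float
    return '%s: %s' % (type(n).__name__, n)
$$ LANGUAGE plpythonu;

SELECT numeric_repr(0.12345678901234567890123456789);
-- before the patch: a float, with most of those digits lost
-- with the patch:   Decimal: 0.12345678901234567890123456789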

Patch contains changes in code, documentation and tests.

Most probably there is something wrong, as this is my first Postgres patch
:)

thanks,
Szymon


plpython_decimal.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover

2013-05-28 Thread Bruce Momjian
On Tue, May 28, 2013 at 03:49:14PM -0400, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > Oh, I see.  Have we historically been OK with these as long as it is
> > clear it is the PG copyright?  I know we had to do some cleanups in the
> > past, but I don't remember the details, obviously.
> 
> We've had requests from companies because they wanted to distribute
> Postgres and lawyers weren't comfortable with copyright statements in
> assorted files.  In those cases we've asked the people mentioned in such
> copyright statements, got approval to remove the offending copyright
> lines, and removed them.

OK, so it was different _licenses_ that were the problem.  OK.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover

2013-05-28 Thread Christophe Pettus

On May 28, 2013, at 12:49 PM, Alvaro Herrera wrote:

> We've had requests from companies because they wanted to distribute
> Postgres and lawyers weren't comfortable with copyright statements in
> assorted files.  In those cases we've asked the people mentioned in such
> copyright statements, got approval to remove the offending copyright
> lines, and removed them.

I assume this topic has come up and been rejected for some reason, but just in 
case: The Django project requires an explicit agreement for contributions that 
end up in the main source tree for it, part of which is the acceptance of the 
Django license and copyright notice.  (I don't have my copy right in front of 
me, but I don't think it's a full-on assignment of copyright.)

--
-- Christophe Pettus
   x...@thebuild.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover

2013-05-28 Thread Alvaro Herrera
Bruce Momjian wrote:

> Oh, I see.  Have we historically been OK with these as long as it is
> clear it is the PG copyright?  I know we had to do some cleanups in the
> past, but I don't remember the details, obviously.

We've had requests from companies because they wanted to distribute
Postgres and lawyers weren't comfortable with copyright statements in
assorted files.  In those cases we've asked the people mentioned in such
copyright statements, got approval to remove the offending copyright
lines, and removed them.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover

2013-05-28 Thread Andres Freund
On 2013-05-28 14:50:57 -0400, Bruce Momjian wrote:
> On Tue, May 28, 2013 at 08:37:44PM +0200, Andres Freund wrote:
> > On 2013-05-28 14:32:07 -0400, Bruce Momjian wrote:
> > > > We have a lot of code in PostgreSQL source tree with different
> > > > copyright notices, and there's no problem with that as long as the
> > > > code is licensed under the PostgreSQL license. For patches that add
> > >
> > > Really?  Where?  I think we have removed them all, as far as I know.
> > > A quick grep shows:
> > >
> > >   $ grep -r 'Portions Copyright'|egrep -v 'Global|Regents'
> > >   ./src/backend/regex/regexport.c: * Portions Copyright (c) 1998, 1999 
> > > Henry Spencer
> > >   ./src/backend/regex/regprefix.c: * Portions Copyright (c) 1998, 1999 
> > > Henry Spencer
> > >   ./src/include/regex/regexport.h: * Portions Copyright (c) 1998, 1999 
> > > Henry Spencer
> > >   ./src/include/getopt_long.h: * Portions Copyright (c) 1987, 1993, 1994
> > >   ./src/bin/pg_dump/pg_backup_directory.c: *  Portions Copyright (c) 
> > > 2000, Philip Warner
> > >   ./src/port/getopt_long.c: * Portions Copyright (c) 1987, 1993, 1994
> > >   ./src/port/getopt_long.c: * Portions Copyright (c) 2003
> > 
> > Just remove the "Portions" part from your grep, and you will see quite
> > some more...
> 
> Oh, I see.  Have we historically been OK with these as long as it is
> clear it is the PG copyright?  I know we had to do some cleanups in the
> past, but I don't remember the details, obviously.

I don't see a problem with a different copyrights as long as the
licenses are compatible. I remember code getting (re-)moved because it
was GPL, which is a different thing to having a different copyright.

I don't have all that wide a view of the history, though.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover

2013-05-28 Thread Bruce Momjian
On Tue, May 28, 2013 at 08:37:44PM +0200, Andres Freund wrote:
> On 2013-05-28 14:32:07 -0400, Bruce Momjian wrote:
> > > We have a lot of code in PostgreSQL source tree with different
> > > copyright notices, and there's no problem with that as long as the
> > > code is licensed under the PostgreSQL license. For patches that add
> >
> > Really?  Where?  I think we have removed them all, as far as I know.
> > A quick grep shows:
> >
> > $ grep -r 'Portions Copyright'|egrep -v 'Global|Regents'
> > ./src/backend/regex/regexport.c: * Portions Copyright (c) 1998, 1999 
> > Henry Spencer
> > ./src/backend/regex/regprefix.c: * Portions Copyright (c) 1998, 1999 
> > Henry Spencer
> > ./src/include/regex/regexport.h: * Portions Copyright (c) 1998, 1999 
> > Henry Spencer
> > ./src/include/getopt_long.h: * Portions Copyright (c) 1987, 1993, 1994
> > ./src/bin/pg_dump/pg_backup_directory.c: *  Portions Copyright (c) 
> > 2000, Philip Warner
> > ./src/port/getopt_long.c: * Portions Copyright (c) 1987, 1993, 1994
> > ./src/port/getopt_long.c: * Portions Copyright (c) 2003
> 
> Just remove the "Portions" part from your grep, and you will see quite
> some more...

Oh, I see.  Have we historically been OK with these as long as it is
clear it is the PG copyright?  I know we had to do some cleanups in the
past, but I don't remember the details, obviously.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover

2013-05-28 Thread Andres Freund
On 2013-05-28 14:32:07 -0400, Bruce Momjian wrote:
> > We have a lot of code in PostgreSQL source tree with different
> > copyright notices, and there's no problem with that as long as the
> > code is licensed under the PostgreSQL license. For patches that add
>
> Really?  Where?  I think we have removed them all, as far as I know.
> A quick grep shows:
>
>   $ grep -r 'Portions Copyright'|egrep -v 'Global|Regents'
>   ./src/backend/regex/regexport.c: * Portions Copyright (c) 1998, 1999 
> Henry Spencer
>   ./src/backend/regex/regprefix.c: * Portions Copyright (c) 1998, 1999 
> Henry Spencer
>   ./src/include/regex/regexport.h: * Portions Copyright (c) 1998, 1999 
> Henry Spencer
>   ./src/include/getopt_long.h: * Portions Copyright (c) 1987, 1993, 1994
>   ./src/bin/pg_dump/pg_backup_directory.c: *  Portions Copyright (c) 
> 2000, Philip Warner
>   ./src/port/getopt_long.c: * Portions Copyright (c) 1987, 1993, 1994
>   ./src/port/getopt_long.c: * Portions Copyright (c) 2003

Just remove the "Portions" part from your grep, and you will see quite
some more...

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover

2013-05-28 Thread Bruce Momjian
On Thu, May 23, 2013 at 01:48:24PM -0400, Heikki Linnakangas wrote:
> On 23.05.2013 08:03, Simon Riggs wrote:
> >On 23 May 2013 12:10, Heikki Linnakangas  wrote:
> >
> >>Please take a look: https://github.com/vmware/pg_rewind
> >
> >The COPYRIGHT file shows that VMware is claiming copyright on unstated
> >parts of the code for this. As such, its not a normal submission to
> >the PostgreSQL project, which involves placing copyright with the
> >PGDG.
> 
> We have a lot of code in PostgreSQL source tree with different
> copyright notices, and there's no problem with that as long as the
> code is licensed under the PostgreSQL license. For patches that add

Really?  Where?  I think we have removed them all, as far as I know.
A quick grep shows:

$ grep -r 'Portions Copyright'|egrep -v 'Global|Regents'
./src/backend/regex/regexport.c: * Portions Copyright (c) 1998, 1999 
Henry Spencer
./src/backend/regex/regprefix.c: * Portions Copyright (c) 1998, 1999 
Henry Spencer
./src/include/regex/regexport.h: * Portions Copyright (c) 1998, 1999 
Henry Spencer
./src/include/getopt_long.h: * Portions Copyright (c) 1987, 1993, 1994
./src/bin/pg_dump/pg_backup_directory.c: *  Portions Copyright (c) 
2000, Philip Warner
./src/port/getopt_long.c: * Portions Copyright (c) 1987, 1993, 1994
./src/port/getopt_long.c: * Portions Copyright (c) 2003

Can someone comment on the "Philip Warner" item?  Would someone contact
him to clarify we can remove the mention?  CC'ing him.

> or modify code in PostgreSQL, we generally have copyright notices
> with just PGDG, to avoid having a long list of copyright notices of
> a lot of companies and individuals on every file. I'm no lawyer, but
> I believe there's no difference from the legal point of view.

Probably, but some mentions can cause concern when our code is reviewed
by companies, so simplicity is good.

> >As a result, while it sounds interesting, people should be aware of
> >that and I suggest we shouldn't discuss that code on this list, to
> >avoid any disputes should we decide to include a similar facility in
> >core Postgres in the future.
> 
> That's just paranoia. There are a lot of tools out there on
> pgfoundry, with various copyright holders and even difference
> licenses, and it's fine to talk about all those on this list.
> Besides, the code is licensed under the PostgreSQL license, so if
> someone decides we should have this e.g in contrib, you can just
> grab the sources and commit. Thirdly, there's no reason to refrain
> from even discussing this, even if someone would include a similar
> facility in core Postgres - this is about copyrights, not patents
> (and yes, this contribution has been cleared by VMware legal
> department; VMware doesn't hold any patents on this)

I think Simon has a good point, as VMWare has asserted patents on some
changes to their version of Postgres in the past, so if the copyright
mentions VMWare, we can't assume it is patent-free.  Just the fact you
had to check with the VMware legal department verifies there is cause
for concern about things coming from VMWare.  In fact, I am curious what
level of contribution requires a legal check, but I am not sure you can
even share that information.

Anyway, I would love to think we don't need to worry about this, but I
think we do --- not in this case, but in general.  I acknowledge that
VMWare has been disciplined in sharing only patent-free information, at
the community's request.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] all_visible replay aborting due to uninitialized pages

2013-05-28 Thread Andres Freund
Hi,

A customer of ours reported a standby losing sync with the primary due
to the following error:
CONTEXT:  xlog redo visible: rel 1663/XXX/XXX; blk 173717
WARNING:  page 173717 of relation base/XXX/XXX is uninitialized
...
PANIC:  WAL contains references to invalid pages

Guessing around I looked and noticed the following problematic pattern:
1) A: wants to do an update, doesn't have enough freespace
2) A: extends the relation on the filesystem level (RelationGetBufferForTuple)
3) A: does PageInit (RelationGetBufferForTuple)
4) A: aborts, e.g. due to a serialization failure (heap_update)

At this point the page is initialized in memory, but not wal logged. It
isn't pinned or locked either.

5) B: vacuum finds that page, and it's empty, so it marks it all
visible. But since the page wasn't written out (we haven't even marked
it dirty in step 3), the standby doesn't know that and reports the page
as being uninitialized.

ISTM the best backbranchable fix for this is to teach lazy_scan_heap to
log an FPI for the heap page via visibilitymap_set in that rather
limited case.

Happy to provide a patch unless somebody has a better idea?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XLogInsert scaling, revisited

2013-05-28 Thread Alvaro Herrera
Heikki Linnakangas wrote:
> I've been slowly continuing to work that I started last winder to
> make XLogInsert scale better. I have tried quite a few different
> approaches since then, and have settled on the attached. This is
> similar but not exactly the same as what I did in the patches I
> posted earlier.

Did this go anywhere?


-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-28 Thread Joshua D. Drake


On 05/28/2013 08:36 AM, Hannu Krosing wrote:


The conversation does not change.

Further, we are not Firefox. We are not user software. We are
developer software.

At least some of the real-world problems with PostgreSQL
come from the "We are developer software" mentality.

Yes, we are developer software, but we are also
DBA/maintainer/infrastructure-manager


I would not hire any of those three that weren't smart enough to 
understand our versioning scheme or didn't have the wits to open a web 
browser and google:


PostgreSQL versioning

The answer is link #1 on Google.

That said, I won't raise a stink. I am not really of a strong opinion 
either way except to say we are not solving a problem. We are just 
tickling each other's fancies.


JD




--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GRANT role_name TO role_name ON database_name

2013-05-28 Thread Clark C. Evans
I'd really love the ability to grant a *user* 
role-based privileges database by database.

For background, I have several databases running 
in a single cluster, one database per business unit.  
Each database has the same core schema with the same 
basic role permissions, but with significant 
customizations.  Even if it were technically possible 
to make them a single database, it would be unwise 
for administrative reasons.  Each user may have
access to any number of databases, but, within 
each database may be assigned to different roles.

For example, we may have an 'auditor' role which 
gives specific access to some trigger-maintained 
change history.  But, a given user may only be an 
auditor for the business units they are assigned.
That said, they may have other roles in other 
business units.  My requirements are very fluid
here and dictated by regulatory requirements.

Currently, we work around the lack of per-database
role permissions by prefixing roles with the name
of the database.  This is quite tedious though, 
it requires specialized logic to overlay creation,
backups, restores, updating and deleting databases.
It's very irritating, requires custom code and
conventions, even though it works.
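
For illustration, the convention amounts to something like this (a
sketch with made-up names; the identifiers come from our own config,
never from user input):

import psycopg2

db, role, user = 'unit_a', 'auditor', 'alice'
conn = psycopg2.connect(dbname=db)
conn.autocommit = True
cur = conn.cursor()
cur.execute('CREATE ROLE %s_%s NOLOGIN' % (db, role))   # unit_a_auditor
cur.execute('GRANT %s_%s TO %s' % (db, role, user))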

About 5 years ago, I think I asked for roles to 
become database specific.  I now think that is a 
bit draconian given the cluster-wide permission
structure used by PostgreSQL.  However, perhaps
a way to make it optionally limited to a given 
database would simplify my permission tracking?

Best,

Clark


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extent Locks

2013-05-28 Thread Stephen Frost
* Jaime Casanova (ja...@2ndquadrant.com) wrote:
> On Tue, May 28, 2013 at 10:53 AM, Andres Freund  
> wrote:
> > But I agree. This needs to work without much manual intervention. I
> > think we just need to make autovacuum truncate only if it finds more
> > free space than whatever amount we might have added at that relation
> > size plus some slop.
> 
> And how do you decide the amount of that "slop"?

How about % of table size?

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Extent Locks

2013-05-28 Thread Jaime Casanova
On Tue, May 28, 2013 at 10:53 AM, Andres Freund  wrote:
>
> But I agree. This needs to work without much manual intervention. I
> think we just need to make autovacuum truncate only if it finds more
> free space than whatever amount we might have added at that relation
> size plus some slop.
>

And how do you decide the amount of that "slop"?

--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] potential bug in JSON

2013-05-28 Thread Szymon Guz
On 28 May 2013 17:53, Josh Berkus  wrote:

> On 05/28/2013 08:38 AM, Szymon Guz wrote:
> > I've found a potential bug. Why does the "->" operator return JSON instead
> > of TEXT? It doesn't make sense to me, and the documentation doesn't
> > mention it.
>
> Yes, it most certainly does:
> http://www.postgresql.org/docs/9.3/static/functions-json.html
>
> If you want to get text, use the ->> operator.
>
>
>
Yeah, I noticed that. It was a little bit misleading to me that "->" is for
getting a field and "->>" is for getting a field as text, especially when
casting the result of "->" to TEXT doesn't return the same value as "->>".
Maybe "as JSON" should be added to those operators which don't return
text?

Szymon


Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-28 Thread Robert Haas
On Tue, May 28, 2013 at 11:56 AM, Josh Berkus  wrote:
>
>>> This argument comes up every couple of years and the people that
>>> are trying to solve the problem by changing the versioning are
>>> ignoring the fact that there is no problem to solve.
>
> We just had this discussion on -advocacy (where it belongs, frankly)

+1.

> a
> couple months ago:
>
> http://www.postgresql.org/message-id/512e8ef8.3000...@agliodbs.com
>
> To sum up: the negatives of changing our version numbering scheme
> outweighed the positives.

And +1 to that, too.

FWIW, I think we may want to consider retitling 9.4 as 10.0, not
because of any binary compatibility break (which, for the record, I
oppose) but because of features.  It's a little early to make that
call just yet, of course, but I have a good feeling about this cycle.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extent Locks

2013-05-28 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2013-05-28 10:07:06 -0400, Stephen Frost wrote:
> > I'm really not, at all, excited about adding in GUCs for this.
> 
> But I thought you were in favor of doing complex stuff like mapping
> segments filled somewhere else into place :P

That wouldn't require a GUC.. ;)

> But I agree. This needs to work without much manual intervention. I
> think we just need to make autovacuum truncate only if it finds more
> free space than whatever amount we might have added at that relation
> size plus some slop.

Agreed.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-28 Thread Josh Berkus

>> This argument comes up every couple of years and the people that
>> are trying to solve the problem by changing the versioning are
>> ignoring the fact that there is no problem to solve.

We just had this discussion on -advocacy (where it belongs, frankly) a
couple months ago:

http://www.postgresql.org/message-id/512e8ef8.3000...@agliodbs.com

To sum up: the negatives of changing our version numbering scheme
outweighed the positives.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] potential bug in JSON

2013-05-28 Thread Josh Berkus
On 05/28/2013 08:38 AM, Szymon Guz wrote:
> I've found a potential bug. Why does the "->" operator return JSON instead
> of TEXT? It doesn't make sense to me, and the documentation doesn't
> mention it.

Yes, it most certainly does:
http://www.postgresql.org/docs/9.3/static/functions-json.html

If you want to get text, use the ->> operator.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extent Locks

2013-05-28 Thread Andres Freund
On 2013-05-28 10:07:06 -0400, Stephen Frost wrote:
> * Jaime Casanova (ja...@2ndquadrant.com) wrote:
> > btw, we can also use a next_extend_blocks GUC/reloption as a limit for
> > autovacuum so it will allow that empty pages at the end of the table
> 
> I'm really not, at all, excited about adding in GUCs for this.

But I thought you were in favor of doing complex stuff like mapping
segments filled somewhere else into place :P

But I agree. This needs to work without much manual intervention. I
think we just need to make autovacuum truncate only if it finds more
free space than whatever amount we might have added at that relation
size plus some slop.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] potential bug in JSON

2013-05-28 Thread Andrew Dunstan


On 05/28/2013 11:38 AM, Szymon Guz wrote:
I've found a potential bug. Why does the "->" operator return JSON instead 
of TEXT? It doesn't make sense to me, and the documentation doesn't 
mention it.


postgres=# SELECT ('{"id": 1}'::json -> 'id')::int;
ERROR:  cannot cast type json to integer
LINE 1: SELECT ('{"id": 1}'::json -> 'id')::int;

postgres=# SELECT ('{"id": 1}'::json -> 'id')::text::int;
 int4
------
    1
(1 row)







This is not a bug. It is documented and by design.

If you want text, use the ->> operator. That's exactly what it's for.
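
E.g., from psycopg2 (a trivial sketch):

import psycopg2

conn = psycopg2.connect(dbname='postgres')
cur = conn.cursor()
cur.execute("""SELECT ('{"id": 1}'::json ->> 'id')::int""")
print(cur.fetchone()[0])   # 1 -- ->> yields text, which casts cleanly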

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extent Locks

2013-05-28 Thread Merlin Moncure
On Tue, May 28, 2013 at 9:07 AM, Stephen Frost  wrote:
> * Jaime Casanova (ja...@2ndquadrant.com) wrote:
>> btw, we can also use a next_extend_blocks GUC/reloption as a limit for
>> autovacuum so it will allow that empty pages at the end of the table
>
> I'm really not, at all, excited about adding in GUCs for this.  We just
> need to realize when the only available space in the relation is at the
> end and people are writing to it, and avoid truncating pages off the end --
> if we don't already have locks that prevent vacuum from doing this.
> I'd want to see where it's actually happening before stressing
> over it terribly much.

+1   autovacuum configuration is already much too complex as it
is...we should be removing/consolidating options, not adding them.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] potential bug in JSON

2013-05-28 Thread Szymon Guz
I've found a potential bug. Why does the "->" operator return JSON instead of
TEXT? It doesn't make sense to me, and the documentation doesn't mention
it.

postgres=# SELECT ('{"id": 1}'::json -> 'id')::int;
ERROR:  cannot cast type json to integer
LINE 1: SELECT ('{"id": 1}'::json -> 'id')::int;

postgres=# SELECT ('{"id": 1}'::json -> 'id')::text::int;
 int4
------
    1
(1 row)


postgres=# SELECT version();
                                   version
---------------------------------------------------------------------------
 PostgreSQL 9.3beta1 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3, 64-bit
(1 row)


Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-28 Thread Hannu Krosing
On 05/28/2013 06:13 AM, Joshua D. Drake wrote:
>
> On 05/27/2013 06:53 PM, Craig Ringer wrote:
>>
>> On 05/28/2013 09:39 AM, Gavin Flower wrote:
>>> Yes, I hate the Firefox style number inflation.
>> I was arguing *for* it ;-)
>>
>> I don't like it much either, but (a) we do about one release a year, not
>> one every few weeks and (b) it's very clear from a quick look at Stack
>> Overflow or first-posts to pgsql-general how confusing two-part major
>> versions are to users. If it's a bit less aesthetically pleasing I'm OK
>> with that.
>>
>
> This argument comes up every couple of years and the people that
> are trying to solve the problem by changing the versioning are
> ignoring the fact that there is no problem to solve.
>
> Consider the following exchange:
>
> Client: I have X problem with PostgreSQL
> CMD: What version?
> Client: 9
> CMD: Which version of 9?
> Client: 9.0.2
> CMD: You should be running 9.2.4 or at least 9.0.13
If the problem has the "at least" part, then the first part is superfluous.

If somebody wants to figure out how to run CTEs on "postgresql 8", then
you need to ask for the exact major version (the first two digits); if
they want to run streaming replication there, you can skip an e-mail
exchange and tell them right away that SR was added in version 9.0.

> ...
>
> The conversation does not change.
>
> Further, we are not Firefox. We are not user software. We are
> developer software.
At least some of the real-world problems with PostgreSQL
come from the "We are developer software" mentality.

Yes, we are developer software, but we are also
DBA/maintainer/infrastructure-manager
software, which needs to live a long time after the "development" is
finished.

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-05-28 Thread Greg Smith

On 5/28/13 11:12 AM, Jon Nelson wrote:

It opens a new file, fallocates 16MB, calls fdatasync.


Outside of the run for performance testing, I think it would be good at 
this point to validate that there is really a 16MB file full of zeroes 
resulting from these operations.  I am not really concerned that 
posix_fallocate might be slower in some cases; that seems unlikely.  I 
am concerned that it might result in a file that isn't structurally the 
same as the 16MB of zero writes implementation used now.
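
Something like this would do for that check (a sketch; the path is 
whatever file the test wrote):

import os

path = 'testfile'
size = 16 * 1024 * 1024
assert os.path.getsize(path) == size
with open(path, 'rb') as f:
    assert f.read() == b'\0' * size
# a file with unwritten holes betrays itself here: fewer blocks
# allocated than the apparent size implies
print('allocated bytes:', os.stat(path).st_blocks * 512)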


The timing program you're writing has some aspects that are similar to 
the contrib/pg_test_fsync program.  You might borrow some code from 
there usefully.


To clarify the suggestion I was making before about including 
performance test results:  that doesn't necessarily mean the testing 
code must run using only the database.  That's better if possible, but 
as Robert says it may not be for some optimizations.  The important 
thing is to have something measuring the improvement that a reviewer can 
duplicate, and if that's a standalone benchmark problem that's still 
very useful.  The main thing I'm wary of is any "this should be faster" 
claims that don't come with any repeatable measurements at all.  Very 
often theories about the fastest way to do something don't match what's 
actually seen in testing.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-05-28 Thread Andres Freund
On 2013-05-28 10:12:05 -0500, Jon Nelson wrote:
> On Tue, May 28, 2013 at 9:19 AM, Robert Haas  wrote:
> > On Tue, May 28, 2013 at 10:15 AM, Andres Freund  
> > wrote:
> >> On 2013-05-28 10:03:58 -0400, Robert Haas wrote:
> >>> On Sat, May 25, 2013 at 2:55 PM, Jon Nelson  
> >>> wrote:
> >>> >> The biggest thing missing from this submission is information about 
> >>> >> what
> >>> >> performance testing you did.  Ideally performance patches are 
> >>> >> submitted with
> >>> >> enough information for a reviewer to duplicate the same test the 
> >>> >> author did,
> >>> >> as well as hard before/after performance numbers from your test 
> >>> >> system.  It
> >>> >> often turns tricky to duplicate a performance gain, and being able to 
> >>> >> run
> >>> >> the same test used for initial development eliminates a lot of the 
> >>> >> problems.
> >>> >
> >>> > This has been a bit of a struggle. While it's true that WAL file
> >>> > creation doesn't happen with great frequency, and while it's also true
> >>> > that - with strace and other tests - it can be proven that
> >>> > fallocate(16MB) is much quicker than writing it zeroes by hand,
> >>> > proving that in the larger context of a running install has been
> >>> > challenging.
> >>>
> >>> It's nice to be able to test things in the context of a running
> >>> install, but sometimes a microbenchmark is just as good.  I mean, if
> >>> posix_fallocate() is faster, then it's just faster, right?
> >>
> >> Well, it's a bit more complex than that. Fallocate doesn't actually
> >> initializes the disk space in most filesystems, just marks it as
> >> allocated and zeroed which is one of the reasons it can be noticeably
> >> faster. But that can make the runtime overhead of writing to those pages
> >> higher.
> >
> > Maybe it would be good to measure that impact.  Something like this:
> >
> > 1. Write 16MB of zeroes to an empty file in the same size chunks we're
> > currently using (8kB?).  Time that.  Rewrite the file with real data.
> > Time that.
> > 2. posix_fallocate() an empty file out to 16MB.  Time that.  Rewrite
> > the fie with real data.  Time that.
> >
> > Personally, I have trouble believing that writing 16MB of zeroes by
> > hand is "better" than telling the OS to do it for us.  If that's so,
> > the OS is just stupid, because it ought to be able to optimize the
> > crap out of that compared to anything we can do.  Of course, it is
> > more than possible that the OS is in fact stupid.  But I'd like to
> > hope not.
>
> I wrote a little C program to do something very similar to that (which
> I'll hope to post later today).
> It opens a new file, fallocates 16MB, calls fdatasync.  Then it loops
> 10 times:  seek to the start of the file, writes 16MB of ones, calls
> fdatasync.

You need to call fsync(), not fdatasync(), the first time round; fdatasync()
doesn't guarantee that the file's metadata is synced.

> Then it closes and removes the file, re-opens it, and this time writes
> out 16MB of zeroes, calls fdatasync, and then does the same loop as
> above. The program times the process from file open to file unlink,
> inclusive.
>
> The results - for me - are pretty consistent: using fallocate is
> 12-13% quicker than writing out zeroes.

Cool!

> I used fdatasync twice in an attempt to mimic what the WAL writer does.

Not sure what you mean by that though?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Process memory architecture

2013-05-28 Thread Merlin Moncure
On Mon, May 27, 2013 at 7:29 AM, Stephen Frost  wrote:
> * Atri Sharma (atri.j...@gmail.com) wrote:
>> Yes, too many indexes wont hurt much.BTW,wont making too many indexes
>> on columns that probably dont have as many values as to deserve
>> them(so,essentially,indiscriminately making indexes) hurt the
>> performance/memory usage?
>
> I'd expect the performance issue would be from planner time more than
> memory usage- but if there is a serious memory usage issue here, then
> it'd be valuable to have a test case showing what's happening.  We may
> not be releasing the sys cache in some cases or otherwise have a bug in
> this area.

Note, backends do use private memory to cache various things
(relcache, etc).   Absolutely pathological workloads (tons of tables,
tons of (especially) views, etc.) can drive this into the gigabytes,
and there is no effective way to monitor or control it.  Normally,
it's not a very big deal though.

So, to be a bit more specific, the index *data* (like all on disk
structures) is buffered in shared memory.  But certain plans/metadata
stuff is in private memory.
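
The shared side is at least visible -- e.g. with the pg_buffercache
extension installed (a sketch; superuser and CREATE EXTENSION
pg_buffercache assumed):

import psycopg2

conn = psycopg2.connect(dbname='postgres')
cur = conn.cursor()
cur.execute("""
    SELECT c.relname, count(*) AS buffers
      FROM pg_buffercache b
      JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
     GROUP BY c.relname
     ORDER BY buffers DESC
     LIMIT 10
""")
for relname, buffers in cur.fetchall():
    print(relname, buffers)

The private memory has no comparable view.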

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-05-28 Thread Jon Nelson
On Tue, May 28, 2013 at 9:19 AM, Robert Haas  wrote:
> On Tue, May 28, 2013 at 10:15 AM, Andres Freund  
> wrote:
>> On 2013-05-28 10:03:58 -0400, Robert Haas wrote:
>>> On Sat, May 25, 2013 at 2:55 PM, Jon Nelson  
>>> wrote:
>>> >> The biggest thing missing from this submission is information about what
>>> >> performance testing you did.  Ideally performance patches are submitted 
>>> >> with
>>> >> enough information for a reviewer to duplicate the same test the author 
>>> >> did,
>>> >> as well as hard before/after performance numbers from your test system.  
>>> >> It
>>> >> often turns tricky to duplicate a performance gain, and being able to run
>>> >> the same test used for initial development eliminates a lot of the 
>>> >> problems.
>>> >
>>> > This has been a bit of a struggle. While it's true that WAL file
>>> > creation doesn't happen with great frequency, and while it's also true
>>> > that - with strace and other tests - it can be proven that
>>> > fallocate(16MB) is much quicker than writing it zeroes by hand,
>>> > proving that in the larger context of a running install has been
>>> > challenging.
>>>
>>> It's nice to be able to test things in the context of a running
>>> install, but sometimes a microbenchmark is just as good.  I mean, if
>>> posix_fallocate() is faster, then it's just faster, right?
>>
>> Well, it's a bit more complex than that. Fallocate doesn't actually
>> initializes the disk space in most filesystems, just marks it as
>> allocated and zeroed which is one of the reasons it can be noticeably
>> faster. But that can make the runtime overhead of writing to those pages
>> higher.
>
> Maybe it would be good to measure that impact.  Something like this:
>
> 1. Write 16MB of zeroes to an empty file in the same size chunks we're
> currently using (8kB?).  Time that.  Rewrite the file with real data.
> Time that.
> 2. posix_fallocate() an empty file out to 16MB.  Time that.  Rewrite
> the fie with real data.  Time that.
>
> Personally, I have trouble believing that writing 16MB of zeroes by
> hand is "better" than telling the OS to do it for us.  If that's so,
> the OS is just stupid, because it ought to be able to optimize the
> crap out of that compared to anything we can do.  Of course, it is
> more than possible that the OS is in fact stupid.  But I'd like to
> hope not.

I wrote a little C program to do something very similar to that (which
I'll hope to post later today).
It opens a new file, fallocates 16MB, calls fdatasync.  Then it loops
10 times:  seek to the start of the file, writes 16MB of ones, calls
fdatasync.
Then it closes and removes the file, re-opens it, and this time writes
out 16MB of zeroes, calls fdatasync, and then does the same loop as
above. The program times the process from file open to file unlink,
inclusive.

The results - for me - are pretty consistent: using fallocate is
12-13% quicker than writing out zeroes.  I used fdatasync twice to
attempt to mimic what the WAL writer does.

--
Jon


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planning incompatibilities for Postgres 10.0

2013-05-28 Thread Merlin Moncure
On Sat, May 25, 2013 at 11:27 AM, Merlin Moncure  wrote:
> On Sat, May 25, 2013 at 4:39 AM, Simon Riggs  wrote:
>> There are a number of changes we'd probably like to make to the way
>> things work in Postgres. This thread is not about discussing what
>> those are, just to say that requirements exist and have been discussed
>> in various threads over time.
>>
>> The constraint on such changes is that we've decided that we must have
>> an upgrade path from release to release.
>>
>> So I'd like to make a formal suggestion of a plan for how we cope with this:
>>
>> 1. Implement online upgrade in 9.4 via the various facilities we have
>> in-progress. That looks completely possible.
>>
>> 2. Name the next release after that 10.0 (would have been 9.5). We
>> declare now that
>> a) 10.0 will support on-line upgrade from 9.4 (only)
>> b) various major incompatibilities will be introduced in 10.0 - the
>> change in release number will indicate to everybody that is the case
>> c) agree that there will be no pg_upgrade patch from 9.4 to 10.0, so
>> that we will not be constrained by that
>>
>> This plan doesn't presume any particular change. Each change would
>> need to be discussed on a separate thread, with a separate case for
>> each. All I'm suggesting is that we have a coherent plan for the
>> timing of such changes, so we can bundle them together into one
>> release.
>>
>> By doing this now we give ourselves lots of time to plan changes that
>> will see us good for another decade. If we don't do this, then we
>> simply risk losing the initiative by continuing to support legacy
>> formats and approaches.
>
> Huh.  I don't think that bumping the version number to 10.0 vs 9.5 is
> justification to introduce breaking changes.  In fact, I would rather
> see 10.0 be the version where we formally stop doing that.  I
> understand that some stuff needs to be improved but it often doesn't
> seem to be worth the cost in the long run.

Please disregard this comment -- I didn't realize the topic was
regarding the on-disk format -- I mistakenly thought it was opening the
door for user level feature changes.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)

2013-05-28 Thread Benedikt Grundmann
Today we have seen

2013-05-28 04:11:12.300 EDT,,,30600,,51a41946.7788,1,,2013-05-27 22:41:10
EDT,,0,ERROR,XX000,"xlog flush request 1E95/AFB2DB10 is not satisfied ---
flushed only to 1E7E/21CB79A0","writing block 9 of relation
base/16416/293974676"""
2013-05-28 04:11:13.316 EDT,,,30600,,51a41946.7788,2,,2013-05-27 22:41:10
EDT,,0,ERROR,XX000,"xlog flush request 1E95/AFB2DB10 is not satisfied ---
flushed only to 1E7E/21CB79A0","writing block 9 of relation
base/16416/293974676"""

while taking the *backup of the primary*.  We have been running for a few
days like that and today is the first day where we see these problems
again.  So it's not entirely deterministic / we don't know yet what we have
to do to reproduce it.

So this makes Robert's theory more likely.  However, we have also been using this
method (LVM + rsync with hardlinks from primary) for years without these
problems.  So the big question is what changed?

One hypothesis is that it is related to the primary being in hot_standby
instead of minimal or archive wal_level (which we used before we switched
to 9.2).

Here are the entries in the log related to the startup of the corrupt
testing cluster:

2013-05-27 22:41:10.029 EDT,,,30598,,51a41946.7786,1,,2013-05-27 22:41:10
EDT,,0,LOG,0,"database system was interrupted; last known up at
2013-05-26 21:01:09 EDT",""
2013-05-27 22:41:10.029 EDT,,,30599,"",51a41946.7787,1,"",2013-05-27
22:41:10 EDT,,0,LOG,0,"connection received: host=172.27.65.204
port=55279",""
2013-05-27 22:41:10.030 EDT,,,30598,,51a41946.7786,2,,2013-05-27 22:41:10
EDT,,0,LOG,0,"database system was not properly shut down; automatic
recovery in progress",""
2013-05-27 22:41:10.030 EDT,"as-elephant","postgres",30599,"
172.27.65.204:55279",51a41946.7787,2,"",2013-05-27 22:41:10
EDT,,0,FATAL,57P03,"the database system is starting up",""
2013-05-27 22:41:10.031 EDT,,,30598,,51a41946.7786,3,,2013-05-27 22:41:10
EDT,,0,LOG,0,"redo starts at 1E7E/2152B178",""
2013-05-27 22:41:10.094 EDT,,,30598,,51a41946.7786,4,,2013-05-27 22:41:10
EDT,,0,LOG,0,"record with zero length at 1E7E/215AC6B8",""
2013-05-27 22:41:10.094 EDT,,,30598,,51a41946.7786,5,,2013-05-27 22:41:10
EDT,,0,LOG,0,"redo done at 1E7E/215AC688",""
2013-05-27 22:41:10.094 EDT,,,30598,,51a41946.7786,6,,2013-05-27 22:41:10
EDT,,0,LOG,0,"last completed transaction was at log time 2013-05-26
21:09:08.06351-04",""
2013-05-27 22:41:10.134 EDT,,,30595,,51a41945.7783,1,,2013-05-27 22:41:09
EDT,,0,LOG,0,"database system is ready to accept connections",""
2013-05-27 22:41:10.134 EDT,,,30603,,51a41946.778b,1,,2013-05-27 22:41:10
EDT,,0,LOG,0,"autovacuum launcher started",""
2013-05-27 22:41:15.037 EDT,,,30608,"",51a4194b.7790,1,"",2013-05-27
22:41:15 EDT,,0,LOG,0,"connection received: host=172.27.65.204
port=55283",""

This means we currently do NOT have a way to make backups that we trust.
We are very open to suggestions of any alternative methods we should
consider using.  The database is of non trivial size by now:

proddb=> select pg_size_pretty(pg_database_size('proddb'));
 pg_size_pretty
----------------
 1294 GB
(1 row)

The backup script itself is by now a rather long OCaml program, so I doubt
the value in posting it to this list.  But here is the log of what it did
which should be fairly self-explanatory:

proddb backup: starting
proddb backup: /bin/bash -c "/usr/bin/ssh -l root tot-dbc-001 ls -d
/net/nyc-isilon1/ifs/data/backup-dbc//proddb/proddb'.*-*-*.*-*-*'"
(enqueued)
proddb backup: /bin/bash -c "/usr/bin/ssh -l root tot-dbc-001 ls -d
/net/nyc-isilon1/ifs/data/backup-dbc//proddb/proddb'.*-*-*.*-*-*'" (running
as pid: [23422])
proddb backup: /bin/bash -c "/usr/bin/ssh -l root tot-dbc-001 ls -d
/net/nyc-isilon1/ifs/data/backup-dbc//proddb/proddb'.*-*-*.*-*-*'" ([23422]
exited normally)
proddb backup: /bin/mkdir -p
/net/nyc-isilon1/ifs/data/backup-dbc//proddb/proddb.in-progress (enqueued)
proddb backup: /bin/mkdir -p
/net/nyc-isilon1/ifs/data/backup-dbc//proddb/proddb.in-progress (running as
pid: [23433])
proddb backup: /bin/mkdir -p
/net/nyc-isilon1/ifs/data/backup-dbc//proddb/proddb.in-progress ([23433]
exited normally)
proddb backup: /bin/bash -c "/usr/bin/ssh -l root tot-dbc-001
/mnt/global/base/bin/db tools backup backup -v -src '"((dbname proddb)
(hostname tot-dbc-001) (superuser postgres_prod) (basedir /database)
(version 9.2))"' -dst '"((username postgres) (hostname 127.0.0.1)
(backup_dir
/net/nyc-isilon1/ifs/data/backup-dbc//proddb/proddb.in-progress)
(last_backup_dir (..//proddb.2013-05-26.20-00-00)
proddb backup: /bin/bash -c "/usr/bin/ssh -l root tot-dbc-001
/mnt/global/base/bin/db tools backup backup -v -src '"((dbname proddb)
(hostname tot-dbc-001) (superuser postgres_prod) (basedir /database)
(version 9.2))"' -dst '"((username postgres) (hostname 127.0.0.1)
(backup_dir
/net/nyc-isilon1/ifs/data/backup-dbc//proddb/proddb.in-progress)
(l

Re: [HACKERS] PostgreSQL Process memory architecture

2013-05-28 Thread Robert Haas
On Mon, May 27, 2013 at 10:23 AM, Atri Sharma  wrote:
>> We may still be able to do better than what we're doing
>> today, but I'm still suspicious that you're going to run into other
>> issues with having 500 indexes on a table anyway.
>
> +1. I am suspicious that the large number of indexes is the problem
> here, even if the problem is not with bookkeeping associated with
> those indexes.

Right.  The problem seems likely to be that each additional index
requires a relcache entry, which uses some backend-local memory.  But
NOT having those backend-local relcache entries would likely be
devastating for performance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] background worker and normal exit

2013-05-28 Thread Andres Freund
On 2013-05-28 10:33:47 -0400, Robert Haas wrote:
> On Tue, May 28, 2013 at 10:31 AM, Andres Freund  
> wrote:
> > On 2013-05-28 10:23:46 -0400, Robert Haas wrote:
> >> On Sun, May 26, 2013 at 6:48 PM, Michael Paquier
> >> > - set bgw_restart_time to BGW_NEVER_RESTART and have the bgworker exit
> >> > with
> >> > a non-0 status code.
> >>
> >> That might be good enough, though.
> >
> > I suggested that to Fujii at pgcon, and it seems to work for him. But I
> > think this sucks since you loose support for a restart upon a FATAL or
> > similar error. And you cannot mark that as something non-fatal in the
> > log. To this day I laugh about the following oddity in the xorg log:
> >
> > [30.087] (II) RADEON(0): RandR 1.2 enabled, ignore the following RandR 
> > disabled message.
> > [30.088] (--) RandR disabled
> >
> > I really don't want to go there.
> >
> > You actually can only return a 1 since everything else will tear down
> > the whole cluster...
> >
> > We actually were discussing this recently:
> > http://archives.postgresql.org/message-id/20130423134833.GD8499%40alap2.anarazel.de
> >
> > I think a separate return code for "exited gracefully, don't restart"
> > would be a good idea.
> 
> Yeah.  Or maybe the restart-timing/restart-when logic should just
> apply to the exit(0) case as well.  Not sure what the downside of that
> would be.

Losing the ability to restart a process where the reasons for exiting
are non-fatal and shouldn't be logged noisily, or are already logged.  I
could actually use both capabilities.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] background worker and normal exit

2013-05-28 Thread Robert Haas
On Tue, May 28, 2013 at 10:31 AM, Andres Freund  wrote:
> On 2013-05-28 10:23:46 -0400, Robert Haas wrote:
>> On Sun, May 26, 2013 at 6:48 PM, Michael Paquier
>> > - set bgw_restart_time to BGW_NEVER_RESTART and have the bgworker exit
>> > with
>> > a non-0 status code.
>>
>> That might be good enough, though.
>
> I suggested that to Fujii at pgcon, and it seems to work for him. But I
> think this sucks since you lose support for a restart upon a FATAL or
> similar error. And you cannot mark that as something non-fatal in the
> log. To this day I laugh about the following oddity in the xorg log:
>
> [30.087] (II) RADEON(0): RandR 1.2 enabled, ignore the following RandR 
> disabled message.
> [30.088] (--) RandR disabled
>
> I really don't want to go there.
>
> You actually can only return a 1 since everything else will tear down
> the whole cluster...
>
> We actually were discussing this recently:
> http://archives.postgresql.org/message-id/20130423134833.GD8499%40alap2.anarazel.de
>
> I think a separate return code for "exited gracefully, don't restart"
> would be a good idea.

Yeah.  Or maybe the restart-timing/restart-when logic should just
apply to the exit(0) case as well.  Not sure what the downside of that
would be.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] background worker and normal exit

2013-05-28 Thread Andres Freund
On 2013-05-28 10:23:46 -0400, Robert Haas wrote:
> On Sun, May 26, 2013 at 6:48 PM, Michael Paquier
> > - set bgw_restart_time to BGW_NEVER_RESTART and have the bgworker exit with
> > a non-0 status code.
> 
> That might be good enough, though.

I suggested that to Fujii at pgcon, and it seems to work for him. But I
think this sucks since you lose support for a restart upon a FATAL or
similar error. And you cannot mark that as something non-fatal in the
log. To this day I laugh about the following oddity in the xorg log:

[30.087] (II) RADEON(0): RandR 1.2 enabled, ignore the following RandR 
disabled message.
[30.088] (--) RandR disabled

I really don't want to go there.

You actually can only return a 1 since everything else will tear down
the whole cluster...

We actually were discussing this recently:
http://archives.postgresql.org/message-id/20130423134833.GD8499%40alap2.anarazel.de

I think a separate return code for "exited gracefully, don't restart"
would be a good idea.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] getting rid of freezing

2013-05-28 Thread Andres Freund
On 2013-05-26 16:58:58 -0700, Josh Berkus wrote:
> I was talking this over with Jeff on the plane, and we wanted to be
> clear on your goals here:  are you looking to eliminate the *write* cost
> of freezing, or just the *read* cost of re-reading already frozen pages?

Both. The latter is what I have seen causing more hurt, but the former
alone is painful enough.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] background worker and normal exit

2013-05-28 Thread Robert Haas
On Sun, May 26, 2013 at 6:48 PM, Michael Paquier
 wrote:
>> Hmm, so you can't have workers just "do something once" and exit? I have
>> to admit I didn't follow bgworkers closely in the past, but could you give
>> a short insight on why this is currently not possible?
>
> Bgworkers are expected to run all the time, and will be restarted each time
> they exit cleanly with a status code 0. Note that they are *still* restarted
> immediately even if bgw_restart_time is set to BGW_NEVER_RESTART or to a
> certain value.
> There are actually two ways to have them perform a one-time
> task:
> - put it in indefinite sleep after the task is accomplished

That's not really the same thing...

> - set bgw_restart_time to BGW_NEVER_RESTART and have the bgworker exit with
> a non-0 status code.

That might be good enough, though.
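
For illustration, here is a minimal sketch of that second approach against
the current bgworker API (the field names are from my memory of it, so
treat the details as assumptions rather than the exact interface):

#include "postgres.h"
#include "fmgr.h"
#include "postmaster/bgworker.h"
#include "storage/ipc.h"

PG_MODULE_MAGIC;

void		_PG_init(void);

static void
one_shot_main(Datum main_arg)
{
	/* ... perform the one-time task here ... */

	/* Non-zero exit plus BGW_NEVER_RESTART means we stay down. */
	proc_exit(1);
}

void
_PG_init(void)
{
	BackgroundWorker worker;

	memset(&worker, 0, sizeof(worker));
	worker.bgw_name = "one-shot worker";
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
	worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
	worker.bgw_restart_time = BGW_NEVER_RESTART;
	worker.bgw_main = one_shot_main;
	RegisterBackgroundWorker(&worker);
}

But as noted elsewhere in this thread, you then can't distinguish "done,
don't restart" from a real FATAL exit in the logs.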

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-05-28 Thread Robert Haas
On Tue, May 28, 2013 at 10:15 AM, Andres Freund  wrote:
> On 2013-05-28 10:03:58 -0400, Robert Haas wrote:
>> On Sat, May 25, 2013 at 2:55 PM, Jon Nelson  
>> wrote:
>> >> The biggest thing missing from this submission is information about what
>> >> performance testing you did.  Ideally performance patches are submitted 
>> >> with
>> >> enough information for a reviewer to duplicate the same test the author 
>> >> did,
>> >> as well as hard before/after performance numbers from your test system.  
>> >> It
>> >> often turns tricky to duplicate a performance gain, and being able to run
>> >> the same test used for initial development eliminates a lot of the 
>> >> problems.
>> >
>> > This has been a bit of a struggle. While it's true that WAL file
>> > creation doesn't happen with great frequency, and while it's also true
>> > that - with strace and other tests - it can be proven that
>> > fallocate(16MB) is much quicker than writing out zeroes by hand,
>> > proving that in the larger context of a running install has been
>> > challenging.
>>
>> It's nice to be able to test things in the context of a running
>> install, but sometimes a microbenchmark is just as good.  I mean, if
>> posix_fallocate() is faster, then it's just faster, right?
>
> Well, it's a bit more complex than that. Fallocate doesn't actually
> initialize the disk space in most filesystems, just marks it as
> allocated and zeroed which is one of the reasons it can be noticeably
> faster. But that can make the runtime overhead of writing to those pages
> higher.

Maybe it would be good to measure that impact.  Something like this:

1. Write 16MB of zeroes to an empty file in the same size chunks we're
currently using (8kB?).  Time that.  Rewrite the file with real data.
Time that.
2. posix_fallocate() an empty file out to 16MB.  Time that.  Rewrite
the file with real data.  Time that.

Personally, I have trouble believing that writing 16MB of zeroes by
hand is "better" than telling the OS to do it for us.  If that's so,
the OS is just stupid, because it ought to be able to optimize the
crap out of that compared to anything we can do.  Of course, it is
more than possible that the OS is in fact stupid.  But I'd like to
hope not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-05-28 Thread Andres Freund
On 2013-05-28 10:03:58 -0400, Robert Haas wrote:
> On Sat, May 25, 2013 at 2:55 PM, Jon Nelson  wrote:
> >> The biggest thing missing from this submission is information about what
> >> performance testing you did.  Ideally performance patches are submitted 
> >> with
> >> enough information for a reviewer to duplicate the same test the author 
> >> did,
> >> as well as hard before/after performance numbers from your test system.  It
> >> often turns tricky to duplicate a performance gain, and being able to run
> >> the same test used for initial development eliminates a lot of the 
> >> problems.
> >
> > This has been a bit of a struggle. While it's true that WAL file
> > creation doesn't happen with great frequency, and while it's also true
> > that - with strace and other tests - it can be proven that
> > fallocate(16MB) is much quicker than writing out zeroes by hand,
> > proving that in the larger context of a running install has been
> > challenging.
> 
> It's nice to be able to test things in the context of a running
> install, but sometimes a microbenchmark is just as good.  I mean, if
> posix_fallocate() is faster, then it's just faster, right?

Well, it's a bit more complex than that. Fallocate doesn't actually
initialize the disk space in most filesystems, just marks it as
allocated and zeroed which is one of the reasons it can be noticeably
faster. But that can make the runtime overhead of writing to those pages
higher.

I wonder whether noticeably upping checkpoint_segments and then
a) COPYing in a large table, or
b) running pgbench on a previously initialized table
would show the difference.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] getting rid of freezing

2013-05-28 Thread Robert Haas
On Sat, May 25, 2013 at 6:14 AM, Simon Riggs  wrote:
>> One thought I had is that it might be beneficial to freeze when a page
>> ceases to be all-visible, rather than when it becomes all-visible.
>> Any operation that makes the page not-all-visible is going to emit an
>> FPI anyway, so we don't have to worry about torn pages in that case.
>> Under such a scheme, we'd have to enforce the rule that xmin and xmax
>> are ignored for any page that is all-visible; and when a page ceases
>> to be all-visible, we have to go back and really freeze the
>> pre-existing tuples.  I think we might be able to use the existing
>> all_visible_cleared/new_all_visible_cleared flags to trigger this
>> behavior, without adding anything new to WAL at all.
>
> I like the idea but it would mean we'd have to freeze in the
> foreground path rather than in a background path.

That's true, but I think with this approach it would be really cheap.
The overhead of setting a few bits in a page is very small compared to
the overhead of emitting a WAL record.  We'd have to test it, but I
wouldn't be surprised to find the cost is too small to measure.

> Have we given up on the double buffering idea to remove FPIs
> completely? If we did that, then this wouldn't work.

I don't see why those things are mutually exclusive.  What is the relationship?

> Anyway, I take it the direction of this idea is that "we don't need a
> separate freezemap, just use the vismap". That seems to be forcing
> ideas down a particular route we may regret. I'd rather just keep
> those things separate, even if we manage to merge the WAL actions for
> most of the time.

Hmm.  To me it seems highly desirable to merge those things, because
they're basically the same thing.  The earliest time at which we can
freeze a tuple is when it's all-visible, and the only argument I've
ever heard for waiting longer is to preserve the original xmin for
forensic purposes, which I think we can do anyway.  I have posted a
patch for that on another thread.  I don't like having two separate
concepts where one will do; I think the fact that it is structured
that way today is mostly an artifact of one setting being page-level
and the other tuple-level, which is a thin excuse for so much
complexity.

> I think the right way is actually to rethink and simplify all this
> complexity of Freezing/Pruning/Hinting/Visibility

I agree, but I think that's likely to have to wait until we get a
pluggable storage API, and then a few years beyond that for someone to
develop the technology to enable the new and better way.  In the
meantime, if we can eliminate or even reduce the impact of freezing in
the near term, I think that's worth doing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extent Locks

2013-05-28 Thread Stephen Frost
* Jaime Casanova (ja...@2ndquadrant.com) wrote:
> btw, we can also use a next_extend_blocks GUC/reloption as a limit for
> autovacuum so it will allow that many empty pages at the end of the table

I'm really not, at all, excited about adding in GUCs for this.  We just
need to realize when the only available space in the relation is at the
end and people are writing to it, and avoid truncating pages off the end --
if we don't have locks that prevent vacuum from doing this
already.  I'd want to see where it's actually happening before stressing
over it terribly much.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [BUGS] COPY .... (FORMAT binary) syntax doesn't work

2013-05-28 Thread Robert Haas
On Mon, May 27, 2013 at 10:31 AM, Tom Lane  wrote:
> Simon Riggs  writes:
>> On 26 May 2013 17:10, Tom Lane  wrote:
>>> More readable would be to invent an intermediate nonterminal falling
>>> between ColId and ColLabel, whose expansion would be "IDENT |
>>> unreserved_keyword | col_name_keyword | type_func_name_keyword", and
>>> then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst.
>>> Any thoughts about a name for that new nonterminal?
>
>> Do you think complicating the parser in that way is worth the trouble
>> for this case? Could that slow down parsing?
>
> It makes the grammar tables a bit larger (1% or so IIRC).  There would
> be some distributed penalty for that, but probably not much.  Of course
> there's always the slippery-slope argument about that.
>
>> We don't actually have to fix it; clearly not too many people are
>> bothered, since no complaints in 3 years. Documenting 'binary' seems
>> better.
>
> Well, my thought is there are other cases.  For instance:
>
> regression=# create role binary;
> ERROR:  syntax error at or near "binary"
> LINE 1: create role binary;
> ^
> regression=# create user cross;
> ERROR:  syntax error at or near "cross"
> LINE 1: create user cross;
> ^
>
> If we don't have to treat type_func_name_keywords as reserved in these
> situations, shouldn't we avoid doing so?

I am almost always in favor of making more things less reserved, so +1 from me.
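
For concreteness, a sketch of what the new nonterminal might look like in
gram.y, mirroring the existing ColId productions (the name is just a
placeholder, not a settled choice):

NonReservedWord:	IDENT						{ $$ = $1; }
			| unreserved_keyword				{ $$ = pstrdup($1); }
			| col_name_keyword					{ $$ = pstrdup($1); }
			| type_func_name_keyword			{ $$ = pstrdup($1); }
		;

NonReservedWord_or_Sconst:
			NonReservedWord						{ $$ = $1; }
			| Sconst							{ $$ = $1; }
		;

COPY's option list (and CREATE ROLE, etc.) would then use
NonReservedWord_or_Sconst where they currently use ColId_or_Sconst.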

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-05-28 Thread Robert Haas
On Sat, May 25, 2013 at 2:55 PM, Jon Nelson  wrote:
>> The biggest thing missing from this submission is information about what
>> performance testing you did.  Ideally performance patches are submitted with
>> enough information for a reviewer to duplicate the same test the author did,
>> as well as hard before/after performance numbers from your test system.  It
>> often turns tricky to duplicate a performance gain, and being able to run
>> the same test used for initial development eliminates a lot of the problems.
>
> This has been a bit of a struggle. While it's true that WAL file
> creation doesn't happen with great frequency, and while it's also true
> that - with strace and other tests - it can be proven that
> fallocate(16MB) is much quicker than writing out zeroes by hand,
> proving that in the larger context of a running install has been
> challenging.

It's nice to be able to test things in the context of a running
install, but sometimes a microbenchmark is just as good.  I mean, if
posix_fallocate() is faster, then it's just faster, right?  It's
likely to be pretty hard to get reproducible numbers for how much this
actually helps in the real world because write tests are inherently
pretty variable depending on a lot of factors we don't control, so
even if Jon has got the best possible test, the numbers may bounce
around so much that you can't really measure the (probably small) gain
from this approach.  But that doesn't seem like a reason not to adopt
the approach and take whatever gain there is.  At least, not that I
can see.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Running pgindent

2013-05-28 Thread Robert Haas
On Tue, May 28, 2013 at 9:40 AM, Bruce Momjian  wrote:
> On Wed, May 22, 2013 at 01:52:28PM -0400, Bruce Momjian wrote:
>> Do we want to run pgindent soon?
>
> OK, should I run it this week?  Wednesday, 1800 GMT?

wfm.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Running pgindent

2013-05-28 Thread Bruce Momjian
On Tue, May 28, 2013 at 09:49:32AM -0400, Magnus Hagander wrote:
> On Tue, May 28, 2013 at 9:48 AM, Robert Haas  wrote:
> > On Tue, May 28, 2013 at 9:40 AM, Bruce Momjian  wrote:
> >> On Wed, May 22, 2013 at 01:52:28PM -0400, Bruce Momjian wrote:
> >>> Do we want to run pgindent soon?
> >>
> >> OK, should I run it this week?  Wednesday, 1800 GMT?
> >
> > wfm.
> 
> +1.

OK, consider it scheduled, 2013-05-29, 1400 ET, 1800 GMT.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Running pgindent

2013-05-28 Thread Magnus Hagander
On Tue, May 28, 2013 at 9:48 AM, Robert Haas  wrote:
> On Tue, May 28, 2013 at 9:40 AM, Bruce Momjian  wrote:
>> On Wed, May 22, 2013 at 01:52:28PM -0400, Bruce Momjian wrote:
>>> Do we want to run pgindent soon?
>>
>> OK, should I run it this week?  Wednesday, 1800 GMT?
>
> wfm.

+1.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extent Locks

2013-05-28 Thread Jaime Casanova
On Tue, May 28, 2013 at 8:38 AM, Jaime Casanova  wrote:
>
> We can also think in GUC/reloption for next_extend_blocks so formula
> is needed, or of course the automated calculation that has been
> proposed
>

s/so formula is needed/so *no* formula is needed

btw, we can also use a next_extend_blocks GUC/reloption as a limit for
autovacuum so it will allow that many empty pages at the end of the table

--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: 24x7 support and training
Phone: +593 4 5107566 Cell: +593 987171157


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Running pgindent

2013-05-28 Thread Bruce Momjian
On Wed, May 22, 2013 at 01:52:28PM -0400, Bruce Momjian wrote:
> Do we want to run pgindent soon?

OK, should I run it this week?  Wednesday, 1800 GMT?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MVCC catalog access

2013-05-28 Thread Robert Haas
On Sun, May 26, 2013 at 9:10 PM, Michael Paquier
 wrote:
> Perhaps we see little difference in performance because PGPROC has been
> separated into PGPROC and PGXACT, reducing lock contention when getting
> snapshot data?
>
> By the way, I grabbed a 32-core machine and did some more performance tests
> with some open connections with XIDs assigned using pg_cxn v2 given by
> Robert in his previous mail to make sure that the snapshots get pretty
> large.

Thanks for checking this on another machine.  It's interesting that
you were able to measure a hit for relcache rebuild, whereas I was
not, but it doesn't look horrible.

IMHO, we should press forward with this approach.  Considering that
these are pretty extreme test cases, I'm inclined to view the
performance loss as acceptable.  We've never really viewed DDL as
something that needs to be micro-optimized, and there is ample
testimony to that fact in the existing code and in the treatment of
prior patches in this area.  This is not to say that we want to go
around willy-nilly making it slower, but I think there will be very
few users for which the number of microseconds it takes to create or
drop an SQL object is performance-critical, especially when you
consider that (1) the effect will be quite a bit less when the objects
are tables, since in that case the snapshot cost will tend to be
drowned out by the filesystem cost and (2) people who don't habitually
keep hundreds and hundreds of connections open - which hopefully most
people don't - won't see the effect anyway.   Against that, this
removes the single largest barrier to allowing more concurrent DDL, a
feature that I suspect will make a whole lot of people *very* happy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extent Locks

2013-05-28 Thread Jaime Casanova
On Tue, May 28, 2013 at 7:36 AM, Stephen Frost  wrote:
>
> On the other hand, I do feel like people are worried about
> over-extending a relation and wasting disk space- but with the way that
> vacuum can clean up pages at the end, that would only be a temporary
> situation anyway.
>

Hi,

Maybe I'm wrong, but this should be easily solved by an
autovacuum_no_truncate_empty_pages or an autovacuum_empty_pages_limit
GUC/reloption.
To clarify the second one: autovacuum would allow empty pages up to that
limit, and would remove only the excess beyond it.

We can also think in GUC/reloption for next_extend_blocks so formula
is needed, or of course the automated calculation that has been
proposed

--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: 24x7 support and training
Phone: +593 4 5107566 Cell: +593 987171157


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ASYNC Privileges proposal

2013-05-28 Thread Bruce Momjian
On Mon, May 20, 2013 at 02:44:58AM +0100, Chris Farmiloe wrote:
> Hey all,
> 
> I find the current LISTEN / NOTIFY rather limited in the context of databases
> with multiple roles. As it stands it is not possible to restrict the use of
> LISTEN or NOTIFY to specific roles, and therefore notifications (and their
> payloads) cannot really be trusted as coming from any particular source.
> 
> If the payloads of notifications could be trusted, then applications could 
> make
> better use of them, without fear of leaking any sensitive information to 
> anyone
> who shouldn't be able to see it. 
> 
> I'd like to propose a new ASYNC database privilege that would control whether 
> a
> role can use LISTEN, NOTIFY and UNLISTEN statements and the associated
> pg_notify function.
> 
> ie: 
> GRANT ASYNC ON DATABASE  TO bob;
> REVOKE ASYNC ON DATABASE  FROM bob;
> 
> SECURITY DEFINER functions could then be used anywhere that a finer grained
> access control was required.
> 
> I had a quick play to see what might be involved [attached], and would like to
> hear people's thoughts; good idea, bad idea, not like that! etc.

I question the usefulness of allowing listen/notify to be restricted to
an entire class of users.  The granularity of this seems too broad,
though I am not sure if allowing messages to be sent to a specific user
is easily achievable.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move unused buffers to freelist

2013-05-28 Thread Robert Haas
>> Instead, I suggest modifying BgBufferSync, specifically this part right
>> here:
>>
>> else if (buffer_state & BUF_REUSABLE)
>> reusable_buffers++;
>>
>> What I would suggest is that if the BUF_REUSABLE flag is set here, use
>> that as the trigger to do StrategyMoveBufferToFreeListEnd().
>
> I think at this point we also need to lock the buffer header to check refcount
> and usage_count before moving to freelist, or do you think it is not
> required?

If BUF_REUSABLE is set, that means we just did exactly what you're
saying.  Why do it twice?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] preserving forensic information when we freeze

2013-05-28 Thread Robert Haas
Various people, including at least Heikki, Andres, and myself, have
proposed various schemes for avoiding freezing that amount to doing it
sooner, when we're already writing WAL anyway, or at least when the
buffer is already dirty anyway, or at least while the buffer is
already in shared_buffers anyway.  Various people, including at least
Tom and Andres, have raised the point that this would lose
possibly-useful forensic information that they have in the past found
to be of tangible value in previous debugging of databases that have
somehow gotten messed up.  I don't know who originally proposed it,
but I've had many conversations about how we could address this
concern: instead of replacing xmin when we freeze, just set an
infomask bit that means "xmin is frozen" and leave the old, literal
xmin in place.  FrozenTransactionId would still exist and still be
understood, of course, but new freezing operations wouldn't use it.

I have attempted to implement this.  Trouble is, we're out of infomask
bits.  Using an infomask2 bit might work but we don't have many of
them left either, so it's worth casting about a bit for a better
solution.   Andres proposed using HEAP_MOVED_IN|HEAP_MOVED_OFF for
this purpose, but I think we're better off trying to reclaim those
bits in a future release.  Exactly how to do that is a topic for
another email, but I believe it's very possible.  What I'd like to
propose instead is using HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID to
indicate that xmin is frozen.  This bit pattern isn't used for
anything else, so there's no confusion possible with existing data
already on disk, but it's necessary to audit all code that checks
HEAP_XMIN_INVALID to make sure it doesn't get confused.  I've done
this, and there's little enough of it that it seems pretty easy to
handle.
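
To make the bit pattern concrete, here is a standalone sketch of the test
(the constants mirror htup_details.h, but HEAP_XMIN_FROZEN and the helper
names are placeholders of mine, not committed code):

#include <stdbool.h>
#include <stdint.h>

#define HEAP_XMIN_COMMITTED		0x0100
#define HEAP_XMIN_INVALID		0x0200
/* both bits set at once = "xmin is frozen" */
#define HEAP_XMIN_FROZEN		(HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)

typedef struct MockTupleHeader
{
	uint16_t	t_infomask;
} MockTupleHeader;

static bool
mock_xmin_frozen(MockTupleHeader *tup)
{
	return (tup->t_infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN;
}

static bool
mock_xmin_invalid(MockTupleHeader *tup)
{
	/* exclude the combined pattern: frozen is not invalid */
	return (tup->t_infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_INVALID;
}

The second function is exactly the audit point: any existing check that
does a plain bit-AND against HEAP_XMIN_INVALID would otherwise match
frozen tuples too.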

A somewhat larger problem is that this requires auditing every place
that looks at a tuple xmin and deciding whether any changes are needed
to handle the possibility that the tuple may be frozen even though
xmin != FrozenTransactionId.  This is a somewhat more significant
change, but it doesn't seem to be too bad.  But there are a couple of
cases that are tricky enough that they seem worth expounding upon:

- When we follow HOT chains, we determine where the HOT chain ends by
matching the xmax of each tuple with the xmin of the next tuple.  If
they don't match, we conclude that the HOT chain has ended.  I
initially thought this logic might be buggy even as things stand today
if the latest tuple in the chain is frozen, but as Andres pointed out
to me, that's not so.  If the newest tuple in the chain is all-visible
(the earliest point at which we can theoretically freeze it), all
earlier tuples are dead altogether, and heap_page_prune() is always
called after locking the buffer and before freezing, so any tuple we
freeze must be the first in its HOT chain.  For the same reason, this
logic doesn't need any adjustment for the new freezing system: it's
never looking at anything old enough to be frozen in the first place.

- Various procedural languages use the combination of TID and XMIN to
determine whether a function needs to be recompiled.  Although the
possibility of this doing the wrong thing seems quite remote, it's not
obvious to me why it's theoretically correct even as things stand
today.  Suppose that a previously-frozen tuple is vacuumed away and
another tuple is placed at the same TID and then frozen.  Then, we
check whether the cache is still valid and, behold, it is.  This would
actually get better with this patch, since it wouldn't be enough
merely for the old and new tuples to both be frozen; they'd have to
have had the same XID prior to freezing.  I think that could only
happen if a backend sticks around for at least 2^32 transactions, but
I don't know what would prevent it in that case.

- heap_get_latest_tid() appears broken even without this patch.  It's
only used on user-specified TIDs, either in a TID scan, or due to the
use of currtid_byreloid() and currtid_byrelname().  It attempts to find
the latest version of the tuple referenced by the given TID by
following the CTID links.  Like HOT, it uses XMAX/XMIN matching to
detect when the chain is broken.  However, unlike HOT, update chains
can in general span multiple blocks.  It is not now nor has it ever
been safe to assume that the next tuple in the chain can't be frozen
before the previous one is vacuumed away.  Andres came up with the
best example: suppose the tuple to be frozen physically precedes its
predecessor; then, an in-progress vacuum might reach the to-be-frozen
tuple before it reaches (and removes) the previous row version.  In
newer releases, the same problem could be caused by vacuum's
occasional page-skipping behavior.  As with the previous point, the
"don't actually change xmin when we freeze" approach actually makes it
harder for a chain to get "broken" when it shouldn't, but I suspect
it's just moving us from one set of extremely-obscure failure case

Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"

2013-05-28 Thread Cédric Villemain
On Saturday 25 May 2013 at 16:41:24, Cédric Villemain wrote:
> > > If it seems to be on the right way, I'll keep fixing EXTENSION building
> > > with VPATH.
> > 
> > I haven't tried the patch, but let me just say that Debian (and
> > apt.postgresql.org) would very much like the VPATH situation getting
> > improved. At the moment we seem to have to invent a new build recipe
> > for every extension around.

The attached patch adds support for VPATH with USE_PGXS.
It just changes the recipe for the install: target in pgxs.mk.

I am unsure whether automatic variables can be used this way with all UNIX
variations of make...

I also didn't touch MODULE and PROGRAM (yet)
-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: 24x7 Support - Development, Expertise and Training
diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index 31746f3..2575855 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
@@ -121,33 +121,40 @@ all: all-lib
 endif # MODULE_big
 
 
-install: all installdirs
-ifneq (,$(EXTENSION))
-	$(INSTALL_DATA) $(addprefix $(srcdir)/, $(addsuffix .control, $(EXTENSION))) '$(DESTDIR)$(datadir)/extension/'
-endif # EXTENSION
-ifneq (,$(DATA)$(DATA_built))
-	$(INSTALL_DATA) $(addprefix $(srcdir)/, $(DATA)) $(DATA_built) '$(DESTDIR)$(datadir)/$(datamoduledir)/'
-endif # DATA
-ifneq (,$(DATA_TSEARCH))
-	$(INSTALL_DATA) $(addprefix $(srcdir)/, $(DATA_TSEARCH)) '$(DESTDIR)$(datadir)/tsearch_data/'
-endif # DATA_TSEARCH
+install: all installdirs installcontrol installdata installdatatsearch installdocs installscripts
 ifdef MODULES
 	$(INSTALL_SHLIB) $(addsuffix $(DLSUFFIX), $(MODULES)) '$(DESTDIR)$(pkglibdir)/'
 endif # MODULES
+ifdef PROGRAM
+	$(INSTALL_PROGRAM) $(PROGRAM)$(X) '$(DESTDIR)$(bindir)'
+endif # PROGRAM
+
+installcontrol: $(addsuffix .control, $(EXTENSION))
+ifneq (,$(EXTENSION))
+	$(INSTALL_DATA) $< '$(DESTDIR)$(datadir)/extension/'
+endif
+
+installdata: $(DATA) $(DATA_built)
+ifneq (,$(DATA)$(DATA_built))
+	$(INSTALL_DATA) $^ '$(DESTDIR)$(datadir)/$(datamoduledir)/'
+endif
+
+installdatatsearch: $(DATA_TSEARCH)
+ifneq (,$(DATA_TSEARCH))
+	$(INSTALL_DATA) $^ '$(DESTDIR)$(datadir)/tsearch_data/'
+endif
+
+installdocs: $(DOCS)
 ifdef DOCS
 ifdef docdir
-	$(INSTALL_DATA) $(addprefix $(srcdir)/, $(DOCS)) '$(DESTDIR)$(docdir)/$(docmoduledir)/'
+	$(INSTALL_DATA) $^ '$(DESTDIR)$(docdir)/$(docmoduledir)/'
 endif # docdir
 endif # DOCS
-ifdef PROGRAM
-	$(INSTALL_PROGRAM) $(PROGRAM)$(X) '$(DESTDIR)$(bindir)'
-endif # PROGRAM
+
+installscripts: $(SCRIPTS) $(SCRIPTS_built)
 ifdef SCRIPTS
-	$(INSTALL_SCRIPT) $(addprefix $(srcdir)/, $(SCRIPTS)) '$(DESTDIR)$(bindir)/'
+	$(INSTALL_SCRIPT) $^ '$(DESTDIR)$(bindir)/'
 endif # SCRIPTS
-ifdef SCRIPTS_built
-	$(INSTALL_SCRIPT) $(SCRIPTS_built) '$(DESTDIR)$(bindir)/'
-endif # SCRIPTS_built
 
 ifdef MODULE_big
 install: install-lib


signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"

2013-05-28 Thread Cédric Villemain
On Tuesday 28 May 2013 at 14:16:38, Cédric Villemain wrote:
> > Once all our contribs can build with USE_PGXS I
> > fix the VPATH.
> > 
> > The last step is interesting: installcheck/REGRESS. For this one, if I
> > can know exactly what's required (for debian build for example), then I
> > can also fix this target.
> 
> There is a hack to link the regression data files from the srcdir
> to the builddir when doing 'make VPATH', but it failed when used in
> conjunction with USE_PGXS and an out-of-tree build of an extension.
> 
> The issue is the absence of the data/ directory in the builddir.
> 
> The attached patch fixes that.

This version uses $(MKDIR_P) instead of mkdir -p.

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: 24x7 Support - Development, Expertise and Training
diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index bbcfe04..e8ff584 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
test_files_build := $(patsubst $(srcdir)/%, $(abs_builddir)/%, $(test_files_src))
 
 all: $(test_files_build)
 $(test_files_build): $(abs_builddir)/%: $(srcdir)/%
+	$(MKDIR_P) '$(dir $@)'
 	ln -s $< $@
 endif # VPATH
 


signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"

2013-05-28 Thread Cédric Villemain
> Once all our contribs can build with USE_PGXS I
> fix the VPATH.

The attached patch sets VPATH for out-of-tree extension builds.

If the makefile is not in the current directory (where we launch 'make'),
then we assume we are building out of the source tree and set VPATH to the
directory of the *first* makefile...

Thus it fixes:
mkdir /tmp/a && cd /tmp/a
make -f extension_src/Makefile USE_PGXS=1


Note that the patch fixes things.  Still, I am not really happy with the rule
used to get the srcdir.

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: 24x7 Support - Development, Expertise and Training
diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index e8ff584..64732ff 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
@@ -61,9 +61,18 @@ ifdef PGXS
 top_builddir := $(dir $(PGXS))../..
 include $(top_builddir)/src/Makefile.global
 
+# If Makefile is not in current directory we are building the extension with
+# VPATH so we set the variable here
+# XXX what about top_srcdir ?
+ifeq ($(CURDIR),$(dir $(firstword $(MAKEFILE_LIST))))
 top_srcdir = $(top_builddir)
 srcdir = .
 VPATH =
+else
+top_srcdir = $(top_builddir)
+srcdir = $(dir $(firstword $(MAKEFILE_LIST)))
+VPATH = $(dir $(firstword $(MAKEFILE_LIST)))
+endif
 
 # These might be set in Makefile.global, but if they were not found
 # during the build of PostgreSQL, supply default values so that users


signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS] Extent Locks

2013-05-28 Thread Stephen Frost
* Craig Ringer (cr...@2ndquadrant.com) wrote:
> On 05/17/2013 11:38 AM, Robert Haas wrote:
> > maybe with a bit of modest pre-extension.

> When it comes to pre-extension, is it realistic to get a count of
> backends waiting on the lock and extend the relation by (say) 2x the
> number of waiting backends?

Having the process which has the lock do more work before releasing it,
and having the other processes realize that there is room available
after blocking on the lock (and not trying to extend the relation
themselves), might help.  One concern that came up in Ottawa is
over autovacuum coming along and discovering empty pages at the end of
the relation and deciding to try and truncate it.  I'm not convinced
that would happen due to the locks involved but if we actually extend
the relation by enough that the individual processes can continue
writing for a while before another extension is needed, then perhaps it
could.

On the other hand, I do feel like people are worried about
over-extending a relation and wasting disk space- but with the way that
vacuum can clean up pages at the end, that would only be a temporary
situation anyway.

> If it's possible this would avoid the need to attempt any
> recency-of-last-extension based preallocation with the associated
> problem of how to store and access the last-extended time efficiently,
> while still hopefully reducing contention on the relation extension lock
> and without delaying the backend doing the extension too much more.

I do like the idea of estimating how many blocks are being asked
for, based on how many other backends are trying to write, but I've been
thinking a simple algorithm might also work well, e.g.:

alloc_size = 1 page
extend_time = 0
while (writing)
    if (blocked and extend_time < 5s)
        alloc_size *= 2
    extend_start_time = now()
    extend(alloc_size)
    extend_time = now() - extend_start_time

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Unsigned integer types

2013-05-28 Thread Andrew Dunstan


On 05/28/2013 05:17 AM, Maciej Gajewski wrote:

I'm afraid that implementing uints as an extension would introduce
some performance penalty (I may be wrong).


You are.


I'm also afraid that with
the extension I'd be left on my own maintaining it forever, while if
this could go into the core product, it would live forever.


This is an argument against ever doing anything as an extension.


You have not at all addressed the real problem with doing what you are 
asking for, the one that Tom Lane stated:



Basically, there is zero chance this will happen unless you can find
a way of fitting them into the numeric promotion hierarchy that doesn't
break a lot of existing applications.  We have looked at this more than
once, if memory serves, and failed to come up with a workable design
that didn't seem to violate the POLA.




cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] storing plpython global pointer

2013-05-28 Thread Szymon Guz
On 28 May 2013 14:15, Jan Urbański  wrote:

> On 28/05/13 14:04, Szymon Guz wrote:
>
>> Hi,
>> I need to store a global pointer for plpython usage. This is a PyObject*
>> which can be initialized once per session, I think
>>
>> Where should I keep such a pointer?
>>
>
> Hi,
>
> you probably could use a global variable, similar to PLy_interp_globals
> that's defined in plpy_main.c.
>
> Another method would be to expose the Decimal constructor in the plpy
> module. You could modify plpy_plpymodule.c to import decimal and expose the
> Decimal constructor as plpy.Decimal.
>
> Best,
> Jan
>

I think I'd rather go with the first solution, as this function should not
be accessible inside the plpython function. That's what I was thinking
about as well, but I wasn't sure.

thanks,
Szymon
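
For reference, a minimal sketch of the first approach, in the style of
PLy_interp_globals in plpy_main.c (the names here are placeholders, not
existing plpython code):

#include <Python.h>

/* Cached once per session; stays valid until the backend exits. */
static PyObject *PLy_decimal_ctor = NULL;

static void
PLy_init_decimal(void)
{
	if (PLy_decimal_ctor == NULL)
	{
		PyObject   *mod = PyImport_ImportModule("decimal");

		if (mod != NULL)
		{
			PLy_decimal_ctor = PyObject_GetAttrString(mod, "Decimal");
			Py_DECREF(mod);
		}
	}
}

The pointer is filled in lazily on first use and then reused for the rest
of the session, without being exposed inside PL/Python functions.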


Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"

2013-05-28 Thread Cédric Villemain
> Once all our contribs can build with USE_PGXS I
> fix the VPATH.
> 
> The last step is interesting: installcheck/REGRESS. For this one, if I can
> know exactly what's required (for debian build for example), then I can
> also fix this target.

There is a hack to link the regression data files from the srcdir
to the builddir when doing 'make VPATH', but it failed when used in
conjunction with USE_PGXS and an out-of-tree build of an extension.

The issue is the absence of the data/ directory in the builddir.

The attached patch fixes that.
-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: 24x7 Support - Development, Expertise and Training
diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index bbcfe04..e8ff584 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
test_files_build := $(patsubst $(srcdir)/%, $(abs_builddir)/%, $(test_files_src))
 
 all: $(test_files_build)
 $(test_files_build): $(abs_builddir)/%: $(srcdir)/%
+	mkdir -p $(dir $@)
 	ln -s $< $@
 endif # VPATH
 


signature.asc
Description: This is a digitally signed message part.

