Re: [HACKERS] Re: [WIP] Performance Improvement by reducing WAL for Update Operation

2012-09-26 Thread Amit Kapila
  On Thursday, September 27, 2012 10:19 AM
> Noah Misch  writes:
> > You cannot assume executor-unmodified columns are also unmodified from
> > heap_update()'s perspective.  Expansion in one column may instigate TOAST
> > compression of a logically-unmodified column, and that counts as a change for
> > xlog delta purposes.
> 
> Um ... what about BEFORE triggers?
 
This optimization will not apply in case a BEFORE trigger updates the tuple.
 
> 
> Frankly, I think that expecting the executor to tell you which columns
> have been modified is a non-starter.  We have a solution for HOT and
> it's silly to do the same thing differently just a few lines away.
> 
My apprehension is that it can hurt the performance advantage if we compare
all attributes to check which have been modified, and that too under a buffer
exclusive lock.
In case of HOT, only the index attributes get compared.

I agree that doing things differently at 2 nearby places is not good.
So I will do it the same way as for HOT, then take the performance data again,
and if there is no big impact we can do it that way.


With Regards,
Amit Kapila.





Re: [HACKERS] Re: [WIP] Performance Improvement by reducing WAL for Update Operation

2012-09-26 Thread Tom Lane
Noah Misch  writes:
> You cannot assume executor-unmodified columns are also unmodified from
> heap_update()'s perspective.  Expansion in one column may instigate TOAST
> compression of a logically-unmodified column, and that counts as a change for
> xlog delta purposes.

Um ... what about BEFORE triggers?

Frankly, I think that expecting the executor to tell you which columns
have been modified is a non-starter.  We have a solution for HOT and
it's silly to do the same thing differently just a few lines away.

regards, tom lane




Re: [HACKERS] Switching timeline over streaming replication

2012-09-26 Thread Amit Kapila
On Thursday, September 27, 2012 6:30 AM Josh Berkus wrote:
> >   Yes that is correct. I thought timeline change happens only when somebody
> > does PITR.
> >   Can you please tell me why we change timeline after promotion, because the
> > original Timeline concept was for PITR and I am not able to trace from code
> > the reason why on promotion it is required?
> 
> The idea behind the timeline switch is to prevent a server from
> subscribing to a master which is actually behind it.  For example,
> consider this sequence:
> 
> 1. M1->async->S1
> 2. M1 is at xid 2001 and fails.
> 3. S1 did not receive transaction 2001 and is at xid 2000
> 4. S1 is promoted.
> 5. S1 processed a new, different transaction 2001
> 6. M1 is repaired and brought back up
> 7. M1 is subscribed to S1
> 8. M1 is now corrupt.
> 
> That's why we need the timeline switch.

Thanks.
I understood this point, but currently this use case is not documented in the
Timelines documentation (Section 24.3.5).

With Regards,
Amit Kapila.





Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)

2012-09-26 Thread Kohei KaiGai
Hi Alvaro,

Let me volunteer for reviewing, of course, though pgsql_fdw is currently in my queue...

If some other folks can also volunteer to review it soon, that would be welcome.

2012/9/26 Alvaro Herrera :
> Excerpts from Alvaro Herrera's message of mié sep 26 13:04:34 -0300 2012:
>> Excerpts from Kohei KaiGai's message of mié abr 25 06:40:23 -0300 2012:
>>
>> > I tried to implement a patch according to the idea. It allows extensions
>> > to register an entry point of the self-managed daemon processes,
>> > then postmaster start and stop them according to the normal manner.
>>
>> Here's my attempt at this.  This is loosely based on your code, as well
>> as parts of what Simon sent me privately.
>
> Actually please consider this version instead, in which the support to
> connect to a database and run transactions actually works.  I have also
> added a new sample module (worker_spi) which talks to the server using
> the SPI interface.
>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services



-- 
KaiGai Kohei 




Re: [HACKERS] 64-bit API for large object

2012-09-26 Thread Kohei KaiGai
I checked this patch. It looks good, but there are still some points to be
discussed.

* I have a question. What is the meaning of INT64_IS_BUSTED?
  It seems to me a marker to indicate a platform without 64bit support.
  However, the commit 901be0fad4034c9cf8a3588fd6cf2ece82e4b8ce
  says as follows:
  | Remove all the special-case code for INT64_IS_BUSTED, per decision that
  | we're not going to support that anymore.

* At inv_seek(), it seems to me it checks offset correctness in a wrong way,
  as follows:
|   case SEEK_SET:
|       if (offset < 0)
|           elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
|       obj_desc->offset = offset;
|       break;
  That would be a right assumption if the large object size were restricted
  to 2GB. But the largest positive int64 is larger than the expected limit,
  so it seems to me it should be compared with (INT_MAX * PAGE_SIZE) instead.

* At inv_write(), it definitely needs a check to prevent data writes beyond the
  4TB boundary. In the case when obj_desc->offset is a bit below 4TB, an
  additional 1GB write will break the head of the large object because of
  "pageno" overflow.

* Please also add checks in inv_read() to prevent LargeObjectDesc->offset from
  unexpectedly overflowing the 4TB boundary.

* At inv_truncate(), variable "off" is re-defined to int64. Is that change
  really needed? Its only usage is to store the result of "len % LOBLKSIZE".

Thanks,

2012/9/24 Nozomi Anzai :
> Here is 64-bit API for large object version 2 patch.
>
>> I checked this patch. It can be applied onto the latest master branch
>> without any problems. My comments are below.
>>
>> 2012/9/11 Tatsuo Ishii :
>> > Ok, here is the patch to implement 64-bit API for large object, to
>> > allow to use up to 4TB large objects(or 16TB if BLCKSZ changed to
>> > 32KB). The patch is based on Jeremy Drake's patch posted on September
>> > 23, 2005
>> > (http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php)
>> > and reasonably updated/edited to adopt PostgreSQL 9.3 by Nozomi Anzai
>> > for the backend part and Yugo Nagata for the rest(including
>> > documentation patch).
>> >
>> > Here are changes made in the patch:
>> >
>> > 1) Frontend lo_* libpq functions(fe-lobj.c)(Yugo Nagata)
>> >
>> > lo_initialize() gathers backend 64-bit large object handling
>> > function's oid, namely lo_lseek64, lo_tell64, lo_truncate64.
>> >
>> > If client calls lo_*64 functions and backend does not support them,
>> > lo_*64 functions return error to caller. There might be an argument
>> > since calls to lo_*64 functions can automatically be redirected to
>> > 32-bit older API. I don't know this is worth the trouble though.
>> >
>> I think it should definitely return an error code when a user tries to
>> use lo_*64 functions against a v9.2 or older backend, because falling
>> back to the 32-bit API can raise unexpected errors if the application
>> intends to seek beyond 2GB.
>>
>> > Currently lo_initialize() throws an error if one of oids are not
>> > available. I doubt we do the same way for 64-bit functions since this
>> > will make 9.3 libpq unable to access large objects stored in pre-9.2
>> > PostgreSQL servers.
>> >
>> It seems to me the way to handle this is to split the cases of pre-9.2 and
>> 9.3-or-later using a condition of "conn->sversion >= 90300".
>
> Fixed so, and tested it by deleting the lo_tell64 row from pg_proc.
>
>
>> > To pass 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr
>> > is a pointer to 64-bit integer and actual data is placed somewhere
>> > else. There might be other way: add new member to union u to store
>> > 64-bit integer:
>> >
>> > typedef struct
>> > {
>> >     int     len;
>> >     int     isint;
>> >     union
>> >     {
>> >         int    *ptr;        /* can't use void (dec compiler barfs) */
>> >         int     integer;
>> >         int64   bigint;     /* 64-bit integer */
>> >     }       u;
>> > } PQArgBlock;
>> >
>> > I'm a little bit worried about this way because PQArgBlock is a public
>> > interface.
>> >
>> I'm inclined to add a new field to the union; that seems to me the
>> straightforward approach.
>> For example, the manner in lo_seek64() seems confusing to me:
>> it sets 1 on the "isint" field even though a pointer is actually delivered.
>>
>> +   argv[1].isint = 1;
>> +   argv[1].len = 8;
>> +   argv[1].u.ptr = (int *) &len;
>
> Your proposal was not adopted per discussion.
>
>
>> > Also we add new type "pg_int64":
>> >
>> > #ifndef NO_PG_INT64
>> > #define HAVE_PG_INT64 1
>> > typedef long long int pg_int64;
>> > #endif
>> >
>> > in postgres_ext.h per suggestion from Tom Lane:
>> > http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php
>> >
>> I'm uncertain about the context of this discussion.
>>
>> Does it matter if we include <stdint.h> and use int64_t instead of the
>> self-defined type?

[HACKERS] Re: [WIP] Performance Improvement by reducing WAL for Update Operation

2012-09-26 Thread Noah Misch
On Mon, Sep 24, 2012 at 10:57:02AM +, Amit kapila wrote:
> Rebased version of patch based on latest code.

I like the direction you're taking with this patch; the gains are striking,
especially considering the isolation of the changes.

You cannot assume executor-unmodified columns are also unmodified from
heap_update()'s perspective.  Expansion in one column may instigate TOAST
compression of a logically-unmodified column, and that counts as a change for
xlog delta purposes.  You do currently skip the optimization for relations
having a TOAST table, but TOAST compression can still apply.  Observe this
with text columns of storage mode PLAIN.  I see two ways out: skip the new
behavior when need_toast=true, or compare all inline column data, not just
what the executor modified.  One can probably construct a benchmark favoring
either choice.  I'd lean toward the latter; wide tuples are the kind this
change can most help.  If the marginal advantage of ignoring known-unmodified
columns proves important, we can always bring it back after designing a way to
track which columns changed in the toaster.

Given that, why not treat the tuple as an opaque series of bytes and not worry
about datum boundaries?  When several narrow columns change together, say a
sequence of sixteen smallint columns, you will use fewer binary delta commands
by representing the change with a single 32-byte substitution.  If an UPDATE
changes just part of a long datum, the delta encoding algorithm will still be
able to save considerable space.  That case arises in many forms:  changing
one word in a long string, changing one element in a long array, changing one
field of a composite-typed column.  Granted, this makes the choice of delta
encoding algorithm more important.

Like Heikki, I'm left wondering why your custom delta encoding is preferable
to an encoding from the literature.  Your encoding has much in common with
VCDIFF, even sharing two exact command names.  If a custom encoding is the
right thing, code comments or a README section should at least discuss the
advantages over an established alternative.  Idle thought: it might pay off to
use 1-byte sizes and offsets most of the time.  Tuples shorter than 256 bytes
are common; for longer tuples, we can afford wider offsets.

The benchmarks you posted upthread were helpful.  I think benchmarking with
fsync=off is best if you don't have a battery-backed write controller or SSD.
Otherwise, fsync time dominates a pgbench run.  Please benchmark recovery.  To
do so, set up WAL archiving and take a base backup from a fresh cluster.  Run
pgbench for awhile.  Finally, observe the elapsed time to recover your base
backup to the end of archived WAL.


> *** a/src/backend/access/common/heaptuple.c
> --- b/src/backend/access/common/heaptuple.c

> + /*
> +  * encode_xlog_update
> +  *      Forms a diff tuple from old and new tuple with the modified columns.
> +  *
> +  *      att - attribute list.
> +  *      oldtup - pointer to the old tuple.
> +  *      heaptup - pointer to the modified tuple.
> +  *      wal_tup - pointer to the wal record which needs to be formed from old
> +  *                and new tuples by using the modified columns list.
> +  *      modifiedCols - modified columns list by the update command.
> +  */
> + void
> + encode_xlog_update(Form_pg_attribute *att, HeapTuple oldtup,
> +                    HeapTuple heaptup, HeapTuple wal_tup,
> +                    Bitmapset *modifiedCols)

This name is too generic for an extern function.  Maybe "heap_delta_encode"?

> + void
> + decode_xlog_update(HeapTupleHeader htup, uint32 old_tup_len, char *data,
> +                    uint32 *new_tup_len, char *waldata, uint32 wal_len)

Likewise, maybe "heap_delta_decode" here.

> *** a/src/backend/access/heap/heapam.c
> --- b/src/backend/access/heap/heapam.c
> ***
> *** 71,77 
>   #include "utils/syscache.h"
>   #include "utils/tqual.h"
>   
> - 
>   /* GUC variable */
>   boolsynchronize_seqscans = true;
>   

Spurious whitespace change.

> ***
> *** 3195,3204  l2:
>       /* XLOG stuff */
>       if (RelationNeedsWAL(relation))
>       {
> !         XLogRecPtr  recptr = log_heap_update(relation, buffer, oldtup.t_self,
> !                                              newbuf, heaptup,
> !                                              all_visible_cleared,
> !                                              all_visible_cleared_new);
>   
>           if (newbuf != buffer)
>           {
> --- 3203,3233 
>       /* XLOG stuff */
>       if (RelationNeedsWAL(relation))
>       {
> !         XLogRecPtr  recptr;
> ! 
> !         /*
> !          * Apply the xlog diff update algorithm only for hot updates.
>

Re: [HACKERS] pg_upgrade does not completely honor --new-port

2012-09-26 Thread Bruce Momjian
On Tue, Sep 25, 2012 at 05:36:54PM +0300, Devrim Gunduz wrote:
> 
> Hi,
> 
> I just performed a test upgrade from 9.1 to 9.2, and used --new-port
> variable. However, the analyze_new_cluster.sh does not include the new
> port, thus when I run it, it fails. Any chance to add the port number to
> the script?

Well, the reason people normally use the port number is to do a live
check, but obviously when the script is created it isn't doing a check. 
I am worried that if I do embed the port number in there, then if they
change the port after the upgrade, they now can't use the script.  I
assume users would have PGPORT set before running the script, no?

> Also, is it worth to add the value specified in --new-bindir as a prefix
> to vacuumdb command in the same script?

Wow, I never thought of adding a path to those scripts, but it certainly
makes sense.  I have created the attached patch which does this.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
index bed10f8..2785eb7 100644
--- a/contrib/pg_upgrade/check.c
+++ b/contrib/pg_upgrade/check.c
@@ -477,7 +477,7 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %sthis script and run:%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %svacuumdb --all %s%s\n", ECHO_QUOTE,
+	fprintf(script, "echo %s\"%s/vacuumdb\" --all %s%s\n", ECHO_QUOTE, new_cluster.bindir,
 	/* Did we copy the free space files? */
 			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
 			"--analyze-only" : "--analyze", ECHO_QUOTE);
@@ -498,7 +498,7 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %s--%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "vacuumdb --all --analyze-only\n");
+	fprintf(script, "\"%s/vacuumdb\" --all --analyze-only\n", new_cluster.bindir);
 	fprintf(script, "echo%s\n", ECHO_BLANK);
 	fprintf(script, "echo %sThe server is now available with minimal optimizer statistics.%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
@@ -519,7 +519,7 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %s---%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "vacuumdb --all --analyze-only\n");
+	fprintf(script, "\"%s/vacuumdb\" --all --analyze-only\n", new_cluster.bindir);
 	fprintf(script, "echo%s\n\n", ECHO_BLANK);
 
 #ifndef WIN32
@@ -532,7 +532,7 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %s-%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "vacuumdb --all %s\n",
+	fprintf(script, "\"%s/vacuumdb\" --all %s\n", new_cluster.bindir,
 	/* Did we copy the free space files? */
 			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
 			"--analyze-only" : "--analyze");



Re: [HACKERS] Modest proposal: run check_keywords.pl on every build

2012-09-26 Thread Andrew Dunstan


On 09/26/2012 09:16 PM, Tom Lane wrote:

I've had it with mopping up after oversights like this one:
http://archives.postgresql.org/pgsql-hackers/2012-09/msg01057.php

We made a script (src/tools/check_keywords.pl) to check for this type of
error years ago, and even added it to the top-level "maintainer-check"
target awhile back, but nonetheless people continue to screw it up on
a regular basis (twice already this year alone).  I think we should move
the script into src/backend/parser and run it as part of the make gram.c
target.  (Doing it that way will not cause perl to become required for
building from tarballs, since gram.c is supplied prebuilt in tarballs.
And we already require perl for builds from raw git pulls.)

Any objections?





works for me.

cheers

andrew




Re: [HACKERS] Switching timeline over streaming replication

2012-09-26 Thread m...@rpzdesign.com

Josh:

The good part is that you are the first person to ask for a copy, so I will
send you the hook code that I have; if you can be a good sport and put it on
GitHub, that is great. You can give us both credit for a joint effort: I do
the code, you put it on GitHub.

The not so good part is that the community has a bunch of other trigger work
and other stuff going on, so there was not much interest in non-WAL
replication hook code.

I do not have time to debate implementation nor wait for the release of 9.3
with my needs not met, so I will just keep patching the hook code into
whatever release code base comes along.

The bad news is that I have not implemented the logic of the external
replication daemon.

The other good and bad news is that you are free to receive the messages from
the hook code thru the unix socket and implement replication any way you want,
and the bad news is that you are free to IMPLEMENT replication any way you
want.

I am going to implement master-master-master-master SELF HEALING replication,
but that is just my preference. It should take about a week to get it
operational and another week to see how it works in my geographically
dispersed servers in the cloud.

Send me a note if it is ok to send you a zip file with the source code files
that I touched in the 9.2 code base so you can shove it up on GitHub.

Cheers,

marco


On 9/26/2012 6:48 PM, Josh Berkus wrote:

I was able to patch the 9.2.0 code base in 1 day and change my entire
architecture strategy for replication
into self healing async master-master-master and the tiniest bit of
sharding code imaginable

Sounds cool.   Do you have a fork available on Github?  I'll try it out.







[HACKERS] Modest proposal: run check_keywords.pl on every build

2012-09-26 Thread Tom Lane
I've had it with mopping up after oversights like this one:
http://archives.postgresql.org/pgsql-hackers/2012-09/msg01057.php

We made a script (src/tools/check_keywords.pl) to check for this type of
error years ago, and even added it to the top-level "maintainer-check"
target awhile back, but nonetheless people continue to screw it up on
a regular basis (twice already this year alone).  I think we should move
the script into src/backend/parser and run it as part of the make gram.c
target.  (Doing it that way will not cause perl to become required for
building from tarballs, since gram.c is supplied prebuilt in tarballs.
And we already require perl for builds from raw git pulls.)

Any objections?

regards, tom lane




Re: [HACKERS] Switching timeline over streaming replication

2012-09-26 Thread Josh Berkus

>   Yes that is correct. I thought timeline change happens only when somebody
> does PITR.
>   Can you please tell me why we change timeline after promotion, because the
> original
>   Timeline concept was for PITR and I am not able to trace from code the
> reason 
>   why on promotion it is required?

The idea behind the timeline switch is to prevent a server from
subscribing to a master which is actually behind it.  For example,
consider this sequence:

1. M1->async->S1
2. M1 is at xid 2001 and fails.
3. S1 did not receive transaction 2001 and is at xid 2000
4. S1 is promoted.
5. S1 processed a new, different transaction 2001
6. M1 is repaired and brought back up
7. M1 is subscribed to S1
8. M1 is now corrupt.

That's why we need the timeline switch.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Does the SQL standard actually define LATERAL anywhere?

2012-09-26 Thread Tom Lane
Peter Eisentraut  writes:
> On Sat, 2012-09-01 at 00:29 -0400, Tom Lane wrote:
>> this is a spec-defined syntax so surely the SQL standard ought to tell
>> us what to do.  But I'm darned if I see anything in the standard that
>> defines the actual *behavior* of a LATERAL query.

> I have it another read and couldn't find anything either.  As written,
> LATERAL is effectively a noise word, AFAICT.

It's not a noise word --- its effects on scope of name visibility are
spelled out clearly enough.  What is not clear is what the query
semantics are supposed to be in the presence of references that would
be disallowed in the traditional understanding of a FROM clause.

regards, tom lane




Re: [HACKERS] Switching timeline over streaming replication

2012-09-26 Thread Josh Berkus

> I was able to patch the 9.2.0 code base in 1 day and change my entire
> architecture strategy for replication
> into self healing async master-master-master and the tiniest bit of
> sharding code imaginable

Sounds cool.   Do you have a fork available on Github?  I'll try it out.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] EVENT Keyword and CREATE TABLE

2012-09-26 Thread Tom Lane
Brian Weaver  writes:
> In some of our old tables going back several years we have a column named
> 'event' as in:
> event character varying(1000)
> I was working off trunk and the database refuses to create this table
> any longer. Is this by design or is it a regression bug?

It's a bug.  The event-trigger patch added EVENT as a new keyword, but
forgot to list it in the unreserved_keywords production, which is
necessary to make it actually act unreserved.

I've committed a fix, and manually verified there are no other such
errors at present, but this isn't the first time this has happened.
We probably need to put in some automated cross-check, or it won't
be the last time either.

regards, tom lane




Re: [HACKERS] Does the SQL standard actually define LATERAL anywhere?

2012-09-26 Thread Peter Eisentraut
On Sat, 2012-09-01 at 00:29 -0400, Tom Lane wrote:
> this is a spec-defined syntax so surely the SQL standard ought to tell
> us what to do.  But I'm darned if I see anything in the standard that
> defines the actual *behavior* of a LATERAL query.

I gave it another read and couldn't find anything either.  As written,
LATERAL is effectively a noise word, AFAICT.






Re: [HACKERS] autovacuum stress-testing our system

2012-09-26 Thread Tomas Vondra
On 26.9.2012 18:14, Jeff Janes wrote:
> On Wed, Sep 26, 2012 at 8:25 AM, Tomas Vondra  wrote:
>> Dne 26.09.2012 16:51, Jeff Janes napsal:
>>
>>
>>> What is generating the endless stream you are seeing is that you have
>>> 1000 databases so if naptime is one minute you are vacuuming 16 per
>>> second.  Since every database gets a new process, that process needs
>>> to read the file as it doesn't inherit one.
>>
>>
>> Right. But that makes the 10ms timeout even more strange, because the
>> worker is then using the data for very long time (even minutes).
> 
> On average that can't happen, or else your vacuuming would fall way
> behind.  But I agree, there is no reason to have very fresh statistics
> to start with.  naptime/5 seems like a good cutoff for me for the
> start up reading.  If a table only becomes eligible for vacuuming in
> the last 20% of the naptime, I see no reason that it can't wait
> another round.  But that just means the statistics collector needs to
> write the file less often, the workers still need to read it once per
> database since each one only vacuums one database and don't inherit
> the data from the launcher.

So what happens if there are two workers vacuuming the same database?
Wouldn't that make it more probable that a table was already vacuumed by the
other worker?

See the comment at the beginning of autovacuum.c, where it also states
that the statfile is reloaded before each table (probably because of the
calls to autovac_refresh_stats which in turn calls clear_snapshot).

>>> I think forking it off to to another value would be better.  If you
>>> are an autovacuum worker which is just starting up and so getting its
>>> initial stats, you can tolerate a stats file up to "autovacuum_naptime
>>> / 5.0" stale.  If you are already started up and are just about to
>>> vacuum a table, then keep the staleness at PGSTAT_RETRY_DELAY as it
>>> currently is, so as not to redundantly vacuum a table.
>>
>>
>> I always thought there's a "no more than one worker per database" limit,
>> and that the file is always reloaded when switching to another database.
>> So I'm not sure how could a worker see such a stale table info? Or are
>> the workers keeping the stats across multiple databases?
> 
> If you only have one "active" database, then all the workers will be
> in it.  I don't how likely it is that they will leap frog each other
> and collide.  But anyway, if you have 1000s of databases, then each one
> will generally require zero vacuums per naptime (as you say, it is
> mostly read only), so it is the reads upon start-up, not the reads per
> table that needs vacuuming, which generates most of the traffic.  Once
> you separate those two parameters out, playing around with the
> PGSTAT_RETRY_DELAY one seems like a needless risk.

OK, right. My fault.

Yes, our databases are mostly read-only - more precisely, whenever we load
data, we immediately do VACUUM ANALYZE on the tables, so autovacuum
never kicks in on them. The only thing it works on are system catalogs
and such stuff.

>>>> 3) logic detecting the proper PGSTAT_RETRY_DELAY value - based mostly on the
>>>> time it takes to write the file (e.g. 10x the write time or something).
>>>
>>>
>>> This is already in place.
>>
>>
>> Really? Where?
> 
> I had thought that this part was effectively the same thing:
> 
>  * We don't recompute min_ts after sleeping, except in the
>  * unlikely case that cur_ts went backwards.
> 
> But I think I did not understand your proposal.
> 
>>
>> I've checked the current master, and the only thing I see in
>> pgstat_write_statsfile
>> is this (line 3558):
>>
>>   last_statwrite = globalStats.stats_timestamp;
>>
>> https://github.com/postgres/postgres/blob/master/src/backend/postmaster/pgstat.c#L3558
>>
>>
>> I don't think that's doing what I meant. That really doesn't scale the
>> timeout according to write time. What happens right now is that when the
>> stats file is written at time 0 (starts at zero, write finishes at 100 ms),
>> and a worker asks for the file at 99 ms (i.e. 1ms before the write finishes),
>> it will set the time of the inquiry to last_statrequest and then do this
>>
>>    if (last_statwrite < last_statrequest)
>>       pgstat_write_statsfile(false);
>>
>> i.e. comparing it to the start of the write. So another write will start
>> right after the file is written out. And over and over.
> 
> Ah.  I had wondered about this before too, and wondered if it would be
> a good idea to have it go back to the beginning of the stats file, and
> overwrite the timestamp with the current time (rather than the time it
> started writing it), as the last action it does before the rename.  I
> think that would automatically make it adaptive to the time it takes
> to write out the file, in a fairly simple way.

Yeah, I was thinking about that too.

>> Moreover there's the 'rename' step making the new file invisible for the
>> worker processes, which makes the thing 

Re: [HACKERS] autovacuum stress-testing our system

2012-09-26 Thread Tomas Vondra
On 26.9.2012 18:29, Tom Lane wrote:
> Alvaro Herrera  writes:
>> Excerpts from Euler Taveira's message of mié sep 26 11:53:27 -0300 2012:
>>> On 26-09-2012 09:43, Tomas Vondra wrote:
 5) splitting the single stat file into multiple pieces - e.g. per database,
 written separately, so that the autovacuum workers don't need to read all
 the data even for databases that don't need to be vacuumed. This might be
 combined with (4).
> 
>>> IMHO that's the definitive solution. It would be one file per database plus a
>>> global one. That way, the check would only read the global.stat and process
>>> those database that were modified. Also, an in-memory map could store that
>>> information to speed up the checks.
> 
>> +1
> 
> That would help for the case of hundreds of databases, but how much
> does it help for lots of tables in a single database?

Well, it wouldn't, but it wouldn't make it worse either. Or at least
that's how I understand it.

> I'm a bit suspicious of the idea that we should encourage people to use
> hundreds of databases per installation anyway: the duplicated system
> catalogs are going to be mighty expensive, both in disk space and in
> their cache footprint in shared buffers.  There was some speculation
> at the last PGCon about how we might avoid the duplication, but I think
> we're years away from any such thing actually happening.

You don't need to encourage us to do that ;-) We know it's not perfect
and are considering a good alternative - e.g. several databases (~10) with
schemas inside, replacing the current database-only approach. This way
we'd get multiple stat files (thus gaining the benefits) with less
overhead (shared catalogs).

And yes, using tens of thousands of tables (serving as "caches") for a
reporting solution is "interesting" (as in the old Chinese curse) too.

> What seems to me like it could help more is fixing things so that the
> autovac launcher needn't even launch a child process for databases that
> haven't had any updates lately.  I'm not sure how to do that, but it
> probably involves getting the stats collector to produce some kind of
> summary file.

Yes, I've proposed something like this in my original mail - setting a
"dirty" flag on objects (a database in this case) whenever a table in it
gets eligible for vacuum/analyze.

Tomas




Re: [HACKERS] system_information.triggers & truncate triggers

2012-09-26 Thread Gavin Flower

On 27/09/12 02:59, Christopher Browne wrote:

On Wed, Sep 26, 2012 at 10:02 AM, Tom Lane  wrote:

Daniel Farina  writes:

On Tue, Sep 25, 2012 at 10:55 PM, Jaime Casanova  wrote:

The definition of information_schema.triggers contains this:
-- TRIGGER_TYPE_UPDATE; we intentionally omit TRIGGER_TYPE_TRUNCATE
so it seems that we are not showing TRUNCATE triggers intentionally,
but that comment fails to explain why

Wouldn't it be because TRUNCATE is a PostgreSQL language extension?

Yeah.  The SQL standard specifies the allowed values in that column,
and TRUNCATE is not among them.

For similar reasons, you won't find exclusion constraints represented
in the information_schema views, and there are some other cases that
I don't recall this early in the morning.

The point of the information_schema (at least IMHO) is to present
standard-conforming information about standard-conforming database
objects in a standard-conforming way, so that cross-DBMS applications
can rely on what they'll see there.  If you are doing anything that's
not described by the SQL standard, you will get at best an incomplete
view of it from the information_schema.  In that case you're a lot
better off looking directly at the underlying catalogs.

(Yes, I'm aware that some other DBMS vendors have a more liberal
interpretation of what standards compliance means in this area.)

Let me grouse about this a bit...  

I appreciate that standards compliance means that information_schema
needs to be circumspect as to what it includes.

But it is irritating that information_schema provides a representation
of (for instance) triggers that, at first, looks nice and clean and
somewhat version-independent, only to fall over because there's a
class of triggers that it consciously ignores.

If I'm wanting to do schema analytics on this (and I do), I'm left
debating between painful choices:

a) Use information_schema for what it *does* have, and then add in a
surprising-looking hack that's pretty version-dependent to draw in the
other triggers that it left out

b) Ignore the seeming-nice information_schema representation, and
construct a version-dependent extraction covering everything that more
or less duplicates the work being done by information_schema.triggers.

I'd really like to have something like

c) Something like information_schema that "takes the
standards-conformance gloves off" and gives a nice representation of
all the triggers.

Make no mistake, I'm not casting aspersions at how pg_trigger was
implemented; I have no complaint there, as it's quite fair that the
internal representation won't be totally "human-readability-friendly."
  That is a structure that is continuously accessed by backends, and it
is entirely proper to bias implementation to internal considerations.
But I'd sure like ways to get at more analytically-friendly
representations.

A different place where I wound up having to jump through considerable
hoops when doing schema analytics was vis-a-vis identifying functions.
  I need to be able to compare schemas across databases, so oid-based
identification of functions is a total non-starter.  It appears that
the best identification of a function would be based on the
combination of schema name, function name, and the concatenation of
argument data types.  It wasn't terribly difficult to construct that
third bit, but it surely would be nice if there was a view capturing
it, and possibly even serializing it into a table to enable indexing
on it.  Performance-wise, function comparisons turned out to be one of
the most expensive things I did, specifically because of that mapping
surrounding arguments.
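
For illustration, that kind of function identity can be built with a query
along these lines (a sketch only, using pg_get_function_identity_arguments to
serialize the argument types; not necessarily the exact query used):

    SELECT n.nspname AS schema,
           p.proname AS function_name,
           pg_catalog.pg_get_function_identity_arguments(p.oid) AS argument_types
      FROM pg_catalog.pg_proc p
      JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace
     WHERE n.nspname NOT IN ('pg_catalog', 'information_schema');

Serializing and comparing that third column is the expensive part referred to
above.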




I agree with your comments, but I couldn't help thinking about Grouse
shooting! :-)



http://www.telegraph.co.uk/news/features/7944546/Grouse-shooting-season.html
[...]
Grouse shooting season
Grouse-shooters have been looking forward to mid-August with bridal 
excitement since the Game Act of 1831 made it illegal to shoot out of 
season.

[...]





[HACKERS] EVENT Keyword and CREATE TABLE

2012-09-26 Thread Brian Weaver
I think I just got bitten hard by a commit in mid July... git sha1 3855968.

In some of our old tables going back several years we have a column named
'event' as in:

CREATE TABLE tblaudittrail (
id bigint NOT NULL,
siteid integer NOT NULL,
entrytype character varying(25),
form character varying(50),
recordid integer,
field character varying(25),
changedfrom character varying(500),
changedto character varying(500),
changedon timestamp with time zone,
changedby character varying(25),
event character varying(1000)
);

I was working off trunk and the database refuses to create this table
any longer. Is this by design or is it a regression bug?

Thanks

-- Brian
-- 

/* insert witty comment here */




Re: [HACKERS] data to json enhancements

2012-09-26 Thread Hannu Krosing

On 09/26/2012 06:46 PM, Tom Lane wrote:

Andrew Dunstan  writes:

Drawing together various discussions both here and elsewhere (e.g. the
PostgresOpen hallway track) I propose to work on the following:
1. make datum_to_json() honor a type's cast to json if it exists. The
fallback is to use the type's string representation, as now.
2. add a cast hstore -> json (any others needed for core / contrib types ?)
3. add a to_json(anyelement) function
4. add a new aggregate function json_agg(anyrecord) -> json to simplify
and make more efficient turning a resultset into json.
Comments welcome.

ISTM the notion of to_json(anyelement) was already heavily discussed and
had spec-compliance issues ... in fact, weren't you one of the people
complaining?  What exactly does #3 mean that is different from the
previous thread?

Also, on reflection I'm not sure about commandeering cast-to-json for
this --- aren't we really casting to "json member" or something like
that?  The distinction between a container and its contents seems
important here.  With a container type as source, it might be important
to do something different if we're coercing it to a complete JSON
value versus something that will be just one member.  I'm handwaving
here because I don't feel like going back to re-read the RFC, but
it seems like something that should be considered carefully before
we lock down an assumption that there can never be a difference.

"json value" is a superset of "json object", so no special handling
should be required here - they nest cleanly.

(you can check http://www.json.org/ for definition.)

I agree that one of the standards did say that "JSON generators"
should produce only JSON-serialised arrays and dictionaries and
not "JSON values" - that is, one of literal null, true or false, a json
array or dictionary, a number, or a double-quoted string.

But if we did that we would really be the _only_ ones who would
place this restriction on their to_json function.

As I already reported in the previous discussion, neither python, ruby,
nor either of the two javascript engines I tested (mozilla & chrome) places
this restriction on its json serialisation functions.

Maybe the "JSON generator" in the standard means something else, like
a stand-alone program or server.

The only mention/definition for Generator in the RFC is

"5. Generators

A JSON generator produces JSON text.  The resulting text MUST
strictly conform to the JSON grammar."

To be formally standards compliant I propose that we officially define our
json type to store "a json value", and as a json object is really just a
subset of json value we can provide (or just advise the use of) a domain
json_object, which has a CHECK that the first non-whitespace char is { or [.
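
For illustration, such a domain could look roughly like this (a sketch only;
the name and the exact check expression are just one possible spelling of the
rule described above):

    -- sketch: accept only json whose first non-whitespace character is { or [
    CREATE DOMAIN json_object AS json
        CHECK (VALUE::text ~ '^\s*[[{]');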


regards, tom lane








Re: [HACKERS] pg_signal_backend() asymmetry

2012-09-26 Thread Noah Misch
I'm marking this patch Ready for Committer.  I suggest backpatching it to 9.2;
the patch corrects an oversight in 9.2 changes.  There's more compatibility
value in backpatching than in retaining distinct behavior for 9.2 only.

On Thu, Jun 28, 2012 at 09:32:41AM -0700, Josh Kupershmidt wrote:
> !     if (!superuser())
>       {
>           /*
> !          * Since the user is not superuser, check for matching roles.
>            */
> !         if (proc->roleId != GetUserId())
> !             return SIGNAL_BACKEND_NOPERMISSION;
>       }

I would have collapsed the conditionals and deleted the obvious comment:

if (!(superuser() || proc->roleId == GetUserId()))
return SIGNAL_BACKEND_NOPERMISSION;

The committer can do that if desired; no need for another version.

Thanks,
nm




Re: [HACKERS] pg_reorg in core?

2012-09-26 Thread Bruce Momjian
On Mon, Sep 24, 2012 at 03:55:35PM -0700, Josh Berkus wrote:
> On 9/24/12 3:43 PM, Simon Riggs wrote:
> > On 24 September 2012 17:36, Josh Berkus  wrote:
> >>
>  For me, the Postgres user interface should include
>  * REINDEX CONCURRENTLY
> >>
> >> I don't see why we don't have REINDEX CONCURRENTLY now.
> > 
> > Same reason for everything on (anyone's) TODO list.
> 
> Yes, I'm just pointing out that it would be a very small patch for
> someone, and that AFAIK it didn't make it on the TODO list yet.

I see it on the TODO list, and it has been there for years:

https://wiki.postgresql.org/wiki/Todo#Indexes
Add REINDEX CONCURRENTLY, like CREATE INDEX CONCURRENTLY 

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] alter enum add value if not exists

2012-09-26 Thread Bruce Momjian
On Sat, Sep 22, 2012 at 06:08:19PM -0400, Tom Lane wrote:
> Hannu Krosing  writes:
> > On 09/22/2012 11:49 PM, Andrew Dunstan wrote:
> >> Not really, I guess we should for the sake of consistency, although TBH
> >> I find it just useless noise and rather wish we hadn't started the 
> >> trend when we did the first DROP IF NOT EXISTS stuff.
> 
> > Time for a GUC
> > existence_notice = none | exists | not_exists | all
> 
> Not another one :-( ... isn't client_min_messages good enough?
> 
> We sort of had this discussion before w.r.t. the notices about creating
> primary key indexes etc.  I wonder whether we should make a formal
> effort to split NOTICE message level into, say, NOTICE and NOVICE
> levels, where the latter contains all the "training wheels" stuff that
> experienced users would really rather not see.  Or maybe just redefine
> NOTICE as meaning novice-oriented messages, and push anything that
> doesn't seem to fit that categorization into another existing message
> level?

I have always wanted a "novice" level, so we could warn about things
like unjoined tables.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] data to json enhancements

2012-09-26 Thread Andrew Dunstan


On 09/26/2012 01:46 PM, Tom Lane wrote:

Andrew Dunstan  writes:

Drawing together various discussions both here and elsewhere (e.g. the
PostgresOpen hallway track) I propose to work on the following:
1. make datum_to_json() honor a type's cast to json if it exists. The
fallback is to use the type's string representation, as now.
2. add a cast hstore -> json (any others needed for core / contrib types ?)
3. add a to_json(anyelement) function
4. add a new aggregate function json_agg(anyrecord) -> json to simplify
and make more efficient turning a resultset into json.
Comments welcome.

ISTM the notion of to_json(anyelement) was already heavily discussed and
had spec-compliance issues ... in fact, weren't you one of the people
complaining?  What exactly does #3 mean that is different from the
previous thread?



I thought I got shouted down on that issue. The main reason we didn't 
include it was that it was getting rather late for those changes, IIRC - 
we only just got in any json stuff at all in under the wire.


And in fact you can have a json value now that's not an array or object:

   andrew=# select json '1' as num, json '"foo"' as txt;
    num |  txt
   -----+-------
    1   | "foo"



Also, on reflection I'm not sure about commandeering cast-to-json for
this --- aren't we really casting to "json member" or something like
that?  The distinction between a container and its contents seems
important here.  With a container type as source, it might be important
to do something different if we're coercing it to a complete JSON
value versus something that will be just one member.  I'm handwaving
here because I don't feel like going back to re-read the RFC, but
it seems like something that should be considered carefully before
we lock down an assumption that there can never be a difference.


I think in view of the above this would be moot, no?


cheers

andrew




Re: [HACKERS] data to json enhancements

2012-09-26 Thread Tom Lane
Andrew Dunstan  writes:
> Drawing together various discussions both here and elsewhere (e.g. the 
> PostgresOpen hallway track) I propose to work on the following:

> 1. make datum_to_json() honor a type's cast to json if it exists. The 
> fallback is to use the type's string representation, as now.
> 2. add a cast hstore -> json (any others needed for core / contrib types ?)
> 3. add a to_json(anyelement) function
> 4. add a new aggregate function json_agg(anyrecord) -> json to simplify 
> and make more efficient turning a resultset into json.

> Comments welcome.

ISTM the notion of to_json(anyelement) was already heavily discussed and
had spec-compliance issues ... in fact, weren't you one of the people
complaining?  What exactly does #3 mean that is different from the
previous thread?

Also, on reflection I'm not sure about commandeering cast-to-json for
this --- aren't we really casting to "json member" or something like
that?  The distinction between a container and its contents seems
important here.  With a container type as source, it might be important
to do something different if we're coercing it to a complete JSON
value versus something that will be just one member.  I'm handwaving
here because I don't feel like going back to re-read the RFC, but
it seems like something that should be considered carefully before
we lock down an assumption that there can never be a difference.

regards, tom lane




Re: [HACKERS] autovacuum stress-testing our system

2012-09-26 Thread Jeff Janes
On Wed, Sep 26, 2012 at 9:29 AM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Excerpts from Euler Taveira's message of mié sep 26 11:53:27 -0300 2012:
>>> On 26-09-2012 09:43, Tomas Vondra wrote:
 5) splitting the single stat file into multiple pieces - e.g. per database,
 written separately, so that the autovacuum workers don't need to read all
 the data even for databases that don't need to be vacuumed. This might be
 combined with (4).
>
>>> IMHO that's the definitive solution. It would be one file per database plus a
>>> global one. That way, the check would only read the global.stat and process
>>> those database that were modified. Also, an in-memory map could store that
>>> information to speed up the checks.
>
>> +1
>
> That would help for the case of hundreds of databases, but how much
> does it help for lots of tables in a single database?

It doesn't help that case, but that case doesn't need much help.  If
you have N statistics-kept objects in total spread over M databases,
of which T objects need vacuuming per naptime, the stats file traffic
is proportional to N*(M+T).  If T is low, then there is generally is
no problem if M is also low.  Or at least, the problem is much smaller
than when M is high for a fixed value of N.

> I'm a bit suspicious of the idea that we should encourage people to use
> hundreds of databases per installation anyway:

I agree with that, but we could still do a better job of tolerating
it; without encouraging it.  If someone volunteers to write the code
to do this, what trade-offs would there be?

Cheers,

Jeff




[HACKERS] data to json enhancements

2012-09-26 Thread Andrew Dunstan
Drawing together various discussions both here and elsewhere (e.g. the 
PostgresOpen hallway track) I propose to work on the following:


1. make datum_to_json() honor a type's cast to json if it exists. The 
fallback is to use the type's string representation, as now.

2. add a cast hstore -> json (any others needed for core / contrib types ?)
3. add a to_json(anyelement) function
4. add a new aggregate function json_agg(anyrecord) -> json to simplify
and make more efficient turning a resultset into json.


Comments welcome.
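
For illustration, the intended usage of points 3 and 4 might look roughly like
this (a sketch of the proposal only, not existing behavior):

    -- sketch: any value rendered as json
    SELECT to_json(now());

    -- sketch: a whole result set aggregated into a single json array
    SELECT json_agg(t)
      FROM (VALUES (1, 'a'), (2, 'b')) AS t(id, label);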

I also propose to work on some accessor functions to pull data out of 
json (as opposed to producing json from non-json). I will email 
separately about that when I have firmed up a proposed API a bit more.




cheers

andrew






Re: [HACKERS] autovacuum stress-testing our system

2012-09-26 Thread Tom Lane
Alvaro Herrera  writes:
> Excerpts from Euler Taveira's message of mié sep 26 11:53:27 -0300 2012:
>> On 26-09-2012 09:43, Tomas Vondra wrote:
>>> 5) splitting the single stat file into multiple pieces - e.g. per database,
>>> written separately, so that the autovacuum workers don't need to read all
>>> the data even for databases that don't need to be vacuumed. This might be
>>> combined with (4).

>> IMHO that's the definitive solution. It would be one file per database plus a
>> global one. That way, the check would only read the global.stat and process
>> those database that were modified. Also, an in-memory map could store that
>> information to speed up the checks.

> +1

That would help for the case of hundreds of databases, but how much
does it help for lots of tables in a single database?

I'm a bit suspicious of the idea that we should encourage people to use
hundreds of databases per installation anyway: the duplicated system
catalogs are going to be mighty expensive, both in disk space and in
their cache footprint in shared buffers.  There was some speculation
at the last PGCon about how we might avoid the duplication, but I think
we're years away from any such thing actually happening.

What seems to me like it could help more is fixing things so that the
autovac launcher needn't even launch a child process for databases that
haven't had any updates lately.  I'm not sure how to do that, but it
probably involves getting the stats collector to produce some kind of
summary file.

regards, tom lane




Re: [HACKERS] [9.1] 2 bugs with extensions

2012-09-26 Thread Dimitri Fontaine
Marko Kreen  writes:
>> Can we work out a minimal example to reproduce the bug?
>
> Yes, the above text or sql/pgq_coop/sql/pgq_coop_init_ext.sql
>
> I guess you could replace pgq_coop with any extension just
> consisting of just functions.

I did just that, with a single function, and couldn't reproduce the
problem either in 9.1 nor in master, with relocatable = true then with
relocatable = false and schema = 'pg_catalog' as in your repository.

The extension has those files contents:

  Makefile
EXTENSION = extschema
DATA = extschema--unpackaged--1.0.sql extschema.sql extschema--1.0.sql

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

  extschema.control
# extschema extension
comment = 'debug extension schema handling'
default_version = '1.0'
relocatable = false
schema = 'pg_catalog'

  extschema.sql
create schema ext;
create or replace function ext.extschema() returns int language sql
as $$ select 1; $$;

  extschema--unpackaged--1.0.sql
alter extension extschema add schema ext;
alter extension extschema add function ext.extschema();

  extschema--1.0.sql
create schema ext;
create or replace function ext.extschema() returns int language sql
as $$ select 1; $$;
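
For reference, the upgrade path for this test extension can be exercised
roughly as follows in psql (a sketch; not necessarily the exact commands used):

    \i extschema.sql                             -- create the loose schema and function
    CREATE EXTENSION extschema FROM unpackaged;  -- runs extschema--unpackaged--1.0.sql
    DROP EXTENSION extschema;                    -- drops the member objects again
    CREATE EXTENSION extschema;                  -- fresh install runs extschema--1.0.sql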

So I guess that needs some more work to reproduce the bug.

>> (The Makefile in skytools/sql/pgq_coop fails on my OS)
>
> How does it fail?  Are you using gnu make?  What version?

I guess sed is the problem here, it's a BSD variant:

  dim ~/skytools/sql/pgq_coop make pgq_coop--unpackaged--3.1.sql
  sed: 1: "/^\\/{s/\\i //;p}\n": extra characters at the end of p command
  sed: 1: "/^\\/{s/\\i //;p}\n": extra characters at the end of p command
  sed: 1: "/^\\/{s/\\i //;p}\n": extra characters at the end of p command
  cat pgq_coop.upgrade.sql > pgq_coop--unpackaged--3.1.sql

  sed --version  
  sed: illegal option -- -
  usage: sed script [-Ealn] [-i extension] [file ...]
 sed [-Ealn] [-i extension] [-e script] ... [-f script_file] ... [file 
...]

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] autovacuum stress-testing our system

2012-09-26 Thread Jeff Janes
On Wed, Sep 26, 2012 at 8:25 AM, Tomas Vondra  wrote:
> Dne 26.09.2012 16:51, Jeff Janes napsal:
>
>
>> What is generating the endless stream you are seeing is that you have
>> 1000 databases so if naptime is one minute you are vacuuming 16 per
>> second.  Since every database gets a new process, that process needs
>> to read the file as it doesn't inherit one.
>
>
> Right. But that makes the 10ms timeout even more strange, because the
> worker is then using the data for very long time (even minutes).

On average that can't happen, or else your vacuuming would fall way
behind.  But I agree, there is no reason to have very fresh statistics
to start with.  naptime/5 seems like a good cutoff for me for the
start up reading.  If a table only becomes eligible for vacuuming in
the last 20% of the naptime, I see no reason that it can't wait
another round.  But that just means the statistics collector needs to
write the file less often, the workers still need to read it once per
database since each one only vacuums one database and don't inherit
the data from the launcher.


>> I think forking it off to to another value would be better.  If you
>> are an autovacuum worker which is just starting up and so getting its
>> initial stats, you can tolerate a stats file up to "autovacuum_naptime
>> / 5.0" stale.  If you are already started up and are just about to
>> vacuum a table, then keep the staleness at PGSTAT_RETRY_DELAY as it
>> currently is, so as not to redundantly vacuum a table.
>
>
> I always thought there's a "no more than one worker per database" limit,
> and that the file is always reloaded when switching to another database.
> So I'm not sure how could a worker see such a stale table info? Or are
> the workers keeping the stats across multiple databases?

If you only have one "active" database, then all the workers will be
in it.  I don't know how likely it is that they will leapfrog each other
and collide.  But anyway, if you have 1000s of databases, then each one
will generally require zero vacuums per naptime (as you say, it is
mostly read only), so it is the reads upon start-up, not the reads per
table that needs vacuuming, which generates most of the traffic.  Once
you separate those two parameters out, playing around with the
PGSTAT_RETRY_DELAY one seems like a needless risk.

>
>>> 3) logic detecting the proper PGSTAT_RETRY_DELAY value - based mostly on
>>> the
>>> time
>>>it takes to write the file (e.g. 10x the write time or something).
>>
>>
>> This is already in place.
>
>
> Really? Where?

I had thought that this part was effectively the same thing:

 * We don't recompute min_ts after sleeping, except in the
 * unlikely case that cur_ts went backwards.

But I think I did not understand your proposal.

>
> I've checked the current master, and the only thing I see in
> pgstat_write_statsfile
> is this (line 3558):
>
>   last_statwrite = globalStats.stats_timestamp;
>
> https://github.com/postgres/postgres/blob/master/src/backend/postmaster/pgstat.c#L3558
>
>
> I don't think that's doing what I meant. That really doesn't scale the
> timeout
> according to write time. What happens right now is that when the stats file
> is
> written at time 0 (starts at zero, write finishes at 100 ms), and a worker
> asks
> for the file at 99 ms (i.e. 1ms before the write finishes), it will set the
> time
> of the inquiry to last_statrequest and then do this
>
>if (last_statwrite < last_statrequest)
>   pgstat_write_statsfile(false);
>
> i.e. comparing it to the start of the write. So another write will start
> right
> after the file is written out. And over and over.

Ah.  I had wondered about this before too, and wondered if it would be
a good idea to have it go back to the beginning of the stats file, and
overwrite the timestamp with the current time (rather than the time it
started writing it), as the last action it does before the rename.  I
think that would automatically make it adaptive to the time it takes
to write out the file, in a fairly simple way.
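Something like this ordering, sketched with plain stdio and a made-up
layout rather than the real pgstat.stat format:

/* Sketch only: write the body first, stamp the header with the time the
 * write finished, and only then rename the file into place. */
#include <stdio.h>
#include <time.h>

static int
write_stats_file(const char *tmp_path, const char *final_path,
                 const char *body, size_t body_len)
{
    FILE   *fp = fopen(tmp_path, "wb");
    time_t  stamp = 0;

    if (fp == NULL)
        return -1;

    fwrite(&stamp, sizeof(stamp), 1, fp);   /* reserve the timestamp slot */
    fwrite(body, 1, body_len, fp);          /* the potentially slow part  */

    stamp = time(NULL);                     /* time the write *finished*  */
    fseek(fp, 0L, SEEK_SET);
    fwrite(&stamp, sizeof(stamp), 1, fp);   /* overwrite the header slot  */

    if (fclose(fp) != 0)
        return -1;

    return rename(tmp_path, final_path);    /* atomic swap into place */
}

int
main(void)
{
    const char body[] = "fake stats payload";

    return write_stats_file("pgstat.tmp", "pgstat.stat", body, sizeof(body));
}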



>
> Moreover there's the 'rename' step making the new file invisible for the
> worker
> processes, which makes the thing a bit more complicated.

I think renames are assumed to be atomic.  Either it sees the old one,
or the new one, but never sees neither.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oid registry

2012-09-26 Thread Dimitri Fontaine
Peter Eisentraut  writes:
> Here is a problem:  If I write an "hstore-ng" extension, I have two

In the patch for Finer Extension Dependencies, the offer is that you
have the hstore-ng extension provide the 'hstore' feature.

  https://commitfest.postgresql.org/action/patch_view?id=727

Now, that says nothing about the type's OID nor how to trust it.

In my proposal we would have a pg_features catalog, that are just
keywords used in control files so that you can change your mind as far
as dependencies are concerned from one release of your extension to the
next (adding or removing some, splitting the extension in parts or
joining them again, etc).

Those feature entries do not exist yet, and are a very specific set of
OIDs, so we could maybe provision a large number of them here and refuse
to assign them to untrusted sources.

Again, the complex part of that problem, to me, is not about how to
manage those numbers (running a registry, adding syntax support, etc) as
much as how to manage a distributed network of trust.

When applying for an OID or a feature identifier, the registry team
could maybe GPG sign "something" with the PostgreSQL private key, the
backend only having the public key embedded in its code and binary. IIRC
that's enough to then validate the feature name/oid (with the signature,
a new control file parameter) and allow it to get installed in the
registry reserved range.

We then need a way to ask for the list of objects provided by the
extension providing that feature (it's been signed, it's now trusted),
something that we already know how to do (joining pg_depend etc). The
result of that query would be cached on the client side, because we're
not trying to change the way pg_type OID assignment ranges work. We
could maybe specialize all this down to pg_type and distribute a list of
reserved OIDs there too, but it seems like that ship has already sailed
(nice pg_upgrade chicken and egg problem).

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum stress-testing our system

2012-09-26 Thread Tomas Vondra

Dne 26.09.2012 17:29, Alvaro Herrera napsal:
> Excerpts from Tomas Vondra's message of mié sep 26 12:25:58 -0300 2012:
>> Dne 26.09.2012 16:51, Jeff Janes napsal:
>>> I think forking it off to another value would be better.  If you
>>> are an autovacuum worker which is just starting up and so getting its
>>> initial stats, you can tolerate a stats file up to "autovacuum_naptime
>>> / 5.0" stale.  If you are already started up and are just about to
>>> vacuum a table, then keep the staleness at PGSTAT_RETRY_DELAY as it
>>> currently is, so as not to redundantly vacuum a table.
>>
>> I always thought there's a "no more than one worker per database" limit,
>
> There is no such limitation.

OK, thanks. Still, reading/writing the small (per-database) files would be
much faster so it would be easy to read/write them more often on demand.


Tomas


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum stress-testing our system

2012-09-26 Thread Alvaro Herrera
Excerpts from Tomas Vondra's message of mié sep 26 12:25:58 -0300 2012:
> Dne 26.09.2012 16:51, Jeff Janes napsal:

> > I think forking it off to another value would be better.  If you
> > are an autovacuum worker which is just starting up and so getting its
> > initial stats, you can tolerate a stats file up to 
> > "autovacuum_naptime
> > / 5.0" stale.  If you are already started up and are just about to
> > vacuum a table, then keep the staleness at PGSTAT_RETRY_DELAY as it
> > currently is, so as not to redundantly vacuum a table.
> 
> I always thought there's a "no more than one worker per database" 
> limit,

There is no such limitation.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [9.1] 2 bugs with extensions

2012-09-26 Thread Marko Kreen
On Wed, Sep 26, 2012 at 5:36 PM, Dimitri Fontaine
 wrote:
> After many distractions I've at least been able to do something about
> that bug report.

Thanks.

>> 2) If there is schema with functions, but nothing else,
>> create extension pgq_coop from 'unpackaged';
>> drop extension pgq_coop;
>> create extension pgq_coop;
>> ERROR:  schema "pgq_coop" already exists
>
> From reading the scripts, it's not clear to me, but it appears that the
> schema existed before the create from unpackaged then got added to the
> extension by way of
>
>   ALTER EXTENSION pgq_coop ADD SCHEMA pgq_coop;
>
> Is that true?

Yes.

> Can we work out a minimal example to reproduce the bug?

Yes, the above text or sql/pgq_coop/sql/pgq_coop_init_ext.sql

I guess you could replace pgq_coop with any extension just
consisting of just functions.

> (The Makefile in skytools/sql/pgq_coop fails on my OS)

How does it fail?  Are you using gnu make?  What version?

-- 
marko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum stress-testing our system

2012-09-26 Thread Alvaro Herrera
Excerpts from Euler Taveira's message of mié sep 26 11:53:27 -0300 2012:
> On 26-09-2012 09:43, Tomas Vondra wrote:

> > 5) splitting the single stat file into multiple pieces - e.g. per database,
> >written separately, so that the autovacuum workers don't need to read all
> >the data even for databases that don't need to be vacuumed. This might be
> >combined with (4).
> > 
> IMHO that's the definitive solution. It would be one file per database plus a
> global one. That way, the check would only read the global.stat and process
> those database that were modified. Also, an in-memory map could store that
> information to speed up the checks.

+1

> The only downside I can see is that you
> will increase the number of opened file descriptors.

Note that most users of pgstat will only have two files open (instead of
one as currently) -- one for shared, one for their own database.  Only
pgstat itself and autovac launcher would need to open pgstat files for
all databases; but both do not have a need to open other files
(arbitrary tables) so this shouldn't be a major problem.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum stress-testing our system

2012-09-26 Thread Tomas Vondra

Dne 26.09.2012 16:51, Jeff Janes napsal:
> On Wed, Sep 26, 2012 at 5:43 AM, Tomas Vondra  wrote:
>
>> First - our system really is not a "common" one - we do have ~1000 of
>> databases of various size, each containing up to several thousands of
>> tables (several user-defined tables, the rest serve as caches for a
>> reporting application - yes, it's a bit weird design but that's life).
>> This all leads to pgstat.stat significantly larger than 60 MB.
>>
>> ...
>> Now, let's suppose the write takes >10 ms, which is the PGSTAT_RETRY_DELAY
>> values. With our current pgstat.stat filesize/num of relations, this is
>> quite common. Actually the common write time in our case is ~100 ms, even
>> if we move the file into tmpfs. That means that almost all the calls to
>> backend_read_statsfile (which happen in all pgstat_fetch_stat_*entry
>> calls) result in continuous stream of inquiries from the autovacuum
>> workers, writing/reading of the file.
>
> I don't think it actually does.  What you are missing is the same
> thing I was missing a few weeks ago when I also looked into something
> like this.
>
> 3962:
>
>      * We don't recompute min_ts after sleeping, except in the
>      * unlikely case that cur_ts went backwards.
>
> That means the file must have been written within 10 ms of when we
> *first* asked for it.

Yeah, right - I've missed the first "if (pgStatDBHash)" check right at
the beginning.


> What is generating the endless stream you are seeing is that you have
> 1000 databases so if naptime is one minute you are vacuuming 16 per
> second.  Since every database gets a new process, that process needs
> to read the file as it doesn't inherit one.

Right. But that makes the 10ms timeout even more strange, because the
worker is then using the data for a very long time (even minutes).

> ...
>
>> First, I'm interested in feedback - did I get all the details right,
>> or am I missing something important?
>>
>> Next, I'm thinking about ways to solve this:
>>
>> 1) turning off autovacuum, doing regular VACUUM ANALYZE from cron
>
> Increasing autovacuum_naptime seems like a far better way to do
> effectively the same thing.

Agreed. One of my colleagues turned autovacuum off a few years back and
that was a nice lesson in how not to solve this kind of issue.

>> 2) tweaking the timer values, especially increasing PGSTAT_RETRY_DELAY and
>>    so on to consider several seconds to be fresh enough - Would be nice to
>>    have this as a GUC variables, although we can do another private patch
>>    on our own. But more knobs is not always better.
>
> I think forking it off to another value would be better.  If you
> are an autovacuum worker which is just starting up and so getting its
> initial stats, you can tolerate a stats file up to "autovacuum_naptime
> / 5.0" stale.  If you are already started up and are just about to
> vacuum a table, then keep the staleness at PGSTAT_RETRY_DELAY as it
> currently is, so as not to redundantly vacuum a table.

I always thought there's a "no more than one worker per database" limit,
and that the file is always reloaded when switching to another database.
So I'm not sure how could a worker see such a stale table info? Or are
the workers keeping the stats across multiple databases?

>> 3) logic detecting the proper PGSTAT_RETRY_DELAY value - based mostly on
>>    the time it takes to write the file (e.g. 10x the write time or
>>    something).
>
> This is already in place.

Really? Where?

I've checked the current master, and the only thing I see in
pgstat_write_statsfile is this (line 3558):

  last_statwrite = globalStats.stats_timestamp;

https://github.com/postgres/postgres/blob/master/src/backend/postmaster/pgstat.c#L3558

I don't think that's doing what I meant. That really doesn't scale the
timeout according to write time. What happens right now is that when the
stats file is written at time 0 (starts at zero, write finishes at 100 ms),
and a worker asks for the file at 99 ms (i.e. 1ms before the write finishes),
it will set the time of the inquiry to last_statrequest and then do this

   if (last_statwrite < last_statrequest)
      pgstat_write_statsfile(false);

i.e. comparing it to the start of the write. So another write will start
right after the file is written out. And over and over.

Moreover there's the 'rename' step making the new file invisible for the
worker processes, which makes the thing a bit more complicated.

What I'm suggesting is that there should be some sort of tracking of the
write time, and then deciding whether the file is fresh enough using 10x
that value. So when a file is written in 100 ms, it'd be considered OK for
the next 900 ms, i.e. 1 sec in total. Sure, we could use 5x or some other
coefficient, it doesn't really matter.
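A rough sketch of the bookkeeping I have in mind (made-up names; the 10x
is just the coefficient discussed above):

/* Sketch only: remember how long the last write took and scale the
 * freshness window by it instead of using a fixed PGSTAT_RETRY_DELAY. */
#include <stdbool.h>
#include <stdio.h>

#define FRESHNESS_FACTOR 10

typedef struct StatsWriteInfo
{
    long    last_write_start_ms;    /* when the last write began */
    long    last_write_cost_ms;     /* how long it took          */
} StatsWriteInfo;

static bool
statsfile_fresh_enough(const StatsWriteInfo *w, long now_ms)
{
    /* a write that took 100 ms keeps the file fresh for 1000 ms in total */
    return (now_ms - w->last_write_start_ms)
            <= FRESHNESS_FACTOR * w->last_write_cost_ms;
}

int
main(void)
{
    StatsWriteInfo w = {0, 100};    /* write started at 0, took 100 ms */

    printf("fresh at 500 ms: %d, fresh at 1500 ms: %d\n",
           statsfile_fresh_enough(&w, 500),
           statsfile_fresh_enough(&w, 1500));
    return 0;
}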


>> 5) splitting the single stat file into multiple pieces - e.g. per database,
>>    written separately, so that the autovacuum workers don't need to read
>>    all the data even for databases that don't need to be vacuumed. This
>>    might be combined with (4).
>
> I think this needs to happen eventually.

Re: [HACKERS] system_information.triggers & truncate triggers

2012-09-26 Thread Euler Taveira
On 26-09-2012 11:08, Simon Riggs wrote:
> I suggest we implement that with some kind of switch/case in the view
> definition.
> 
-- parameter can be set in a session and defaults to on
SET compliance_information_schema TO off;


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] system_information.triggers & truncate triggers

2012-09-26 Thread Christopher Browne
On Wed, Sep 26, 2012 at 10:02 AM, Tom Lane  wrote:
> Daniel Farina  writes:
>> On Tue, Sep 25, 2012 at 10:55 PM, Jaime Casanova  
>> wrote:
>>> The definition of information_schema.triggers contains this:
>>> -- TRIGGER_TYPE_UPDATE; we intentionally omit TRIGGER_TYPE_TRUNCATE
>>> so it seems that we are not showing TRUNCATE triggers intentionally,
>>> but that comment fails to explain why
>
>> Wouldn't it be because TRUNCATE is a PostgreSQL language extension?
>
> Yeah.  The SQL standard specifies the allowed values in that column,
> and TRUNCATE is not among them.
>
> For similar reasons, you won't find exclusion constraints represented
> in the information_schema views, and there are some other cases that
> I don't recall this early in the morning.
>
> The point of the information_schema (at least IMHO) is to present
> standard-conforming information about standard-conforming database
> objects in a standard-conforming way, so that cross-DBMS applications
> can rely on what they'll see there.  If you are doing anything that's
> not described by the SQL standard, you will get at best an incomplete
> view of it from the information_schema.  In that case you're a lot
> better off looking directly at the underlying catalogs.
>
> (Yes, I'm aware that some other DBMS vendors have a more liberal
> interpretation of what standards compliance means in this area.)

Let me grouse about this a bit...  

I appreciate that standards compliance means that information_schema
needs to be circumspect as to what it includes.

But it is irritating that information_schema provides a representation
of (for instance) triggers that, at first, looks nice and clean and
somewhat version-independent, only to fall over because there's a
class of triggers that it consciously ignores.

If I'm wanting to do schema analytics on this (and I do), I'm left
debating between painful choices:

a) Use information_schema for what it *does* have, and then add in a
surprising-looking hack that's pretty version-dependent to draw in the
other triggers that it left out

b) Ignore the seeming-nice information_schema representation, and
construct a version-dependent extraction covering everything that more
or less duplicates the work being done by information_schema.triggers.

I'd really like to have something like

c) Something like information_schema that "takes the
standards-conformance gloves off" and gives a nice representation of
all the triggers.

Make no mistake, I'm not casting aspersions at how pg_trigger was
implemented; I have no complaint there, as it's quite fair that the
internal representation won't be totally "human-readability-friendly."
 That is a structure that is continuously accessed by backends, and it
is entirely proper to bias implementation to internal considerations.
But I'd sure like ways to get at more analytically-friendly
representations.

A different place where I wound up having to jump through considerable
hoops when doing schema analytics was vis-a-vis identifying functions.
 I need to be able to compare schemas across databases, so oid-based
identification of functions is a total non-starter.  It appears that
the best identification of a function would be based on the
combination of schema name, function name, and the concatenation of
argument data types.  It wasn't terribly difficult to construct that
third bit, but it surely would be nice if there was a view capturing
it, and possibly even serializing it into a table to enable indexing
on it.  Performance-wise, function comparisons turned out to be one of
the most expensive things I did, specifically because of that mapping
surrounding arguments.
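A sketch of the kind of identity key I mean (the argument type names would
be fetched from the catalogs separately, e.g. via format_type() or
pg_get_function_identity_arguments()):

/* Sketch: build a cross-database identity key "schema.name(type,...)". */
#include <stdio.h>

static void
function_identity(char *buf, size_t len, const char *schema,
                  const char *name, const char **argtypes, int nargs)
{
    size_t  used = (size_t) snprintf(buf, len, "%s.%s(", schema, name);

    for (int i = 0; i < nargs && used < len; i++)
        used += (size_t) snprintf(buf + used, len - used, "%s%s",
                                  i ? "," : "", argtypes[i]);
    if (used < len)
        snprintf(buf + used, len - used, ")");
}

int
main(void)
{
    const char *args[] = {"integer", "text"};
    char        key[128];

    function_identity(key, sizeof(key), "public", "my_func", args, 2);
    puts(key);          /* prints: public.my_func(integer,text) */
    return 0;
}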


-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum stress-testing our system

2012-09-26 Thread Euler Taveira
On 26-09-2012 09:43, Tomas Vondra wrote:
> I've been struggling with autovacuum generating a lot of I/O and CPU on some
> of our
> systems - after a night spent analyzing this behavior, I believe the current
> autovacuum accidentally behaves a bit like a stress-test in some corner cases
> (but
> I may be seriously wrong, after all it was a long night).
> 
It is known that the statistics collector doesn't scale to a lot of databases.
It wouldn't be a problem if we didn't have automatic maintenance (aka autovacuum).

> Next, I'm thinking about ways to solve this:
> 
> 1) turning off autovacuum, doing regular VACUUM ANALYZE from cron - certainly 
> an
>option, but it's rather a workaround than a solution and I'm not very fond 
> of
>it. Moreover it fixes only one side of the problem - triggering the 
> statfile
>writes over and over. The file will be written anyway, although not that
>frequently.
> 
It solves your problem if you combine scheduled VA with pgstat.stat in a
tmpfs. I don't see it as a definitive solution if we want to scale auto
maintenance for several hundreds or even thousands databases in a single
cluster (Someone could think it is not that common but in hosting scenarios
this is true. DBAs don't want to run several VMs or pg servers just to
minimize the auto maintenance scalability problem).

> 2) tweaking the timer values, especially increasing PGSTAT_RETRY_DELAY and so 
> on
>to consider several seconds to be fresh enough - Would be nice to have this
>as a GUC variables, although we can do another private patch on our own. 
> But
>more knobs is not always better.
> 
It doesn't solve the problem. Also it could be a problem for autovacuum (which
makes assumptions based on those fixed values).

> 3) logic detecting the proper PGSTAT_RETRY_DELAY value - based mostly on the 
> time
>it takes to write the file (e.g. 10x the write time or something).
> 
Such adaptive logic would be good only if it takes a small fraction of the time
to execute. It has to pay attention to the limits. It appears to be a candidate
for exploration.

> 4) keeping some sort of "dirty flag" in stat entries - and then writing only 
> info
>about objects were modified enough to be eligible for vacuum/analyze (e.g.
>increasing number of index scans can't trigger autovacuum while inserting
>rows can). Also, I'm not worried about getting a bit older num of index 
> scans,
>so 'clean' records might be written less frequently than 'dirty' ones.
> 
It minimizes your problem but harms collector tools (that want fresh
statistics about databases).

> 5) splitting the single stat file into multiple pieces - e.g. per database,
>written separately, so that the autovacuum workers don't need to read all
>the data even for databases that don't need to be vacuumed. This might be
>combined with (4).
> 
IMHO that's the definitive solution. It would be one file per database plus a
global one. That way, the check would only read the global.stat and process
those database that were modified. Also, an in-memory map could store that
information to speed up the checks. The only downside I can see is that you
will increase the number of opened file descriptors.
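A sketch of the layout I have in mind (hypothetical file names; today
everything lives in a single pgstat.stat):

/* Sketch only: one global file plus one file per database, so a reader
 * touches just global.stat and the file for its own database. */
#include <stdio.h>

typedef unsigned int Oid;           /* stand-in for the PostgreSQL Oid type */

static void
db_stats_path(char *buf, size_t len, Oid dboid)
{
    snprintf(buf, len, "pg_stat_tmp/db_%u.stat", dboid);
}

int
main(void)
{
    char    path[64];

    /* a backend in database 16384 reads global.stat plus its own file */
    db_stats_path(path, sizeof(path), 16384);
    printf("read pg_stat_tmp/global.stat and %s\n", path);
    return 0;
}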

> Ideas? Objections? Preferred options?
> 
I prefer to attack 3, sort of 4 (explained in 5 -- in-memory map) and 5.

Out of curiosity, did you run perf (or some other performance analyzer) to
verify if some (stats and/or autovac) functions pop up in the report?


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] system_information.triggers & truncate triggers

2012-09-26 Thread Simon Riggs
On 26 September 2012 15:02, Tom Lane  wrote:
> Daniel Farina  writes:
>> On Tue, Sep 25, 2012 at 10:55 PM, Jaime Casanova  
>> wrote:
>>> The definition of information_schema.triggers contains this:
>>> -- TRIGGER_TYPE_UPDATE; we intentionally omit TRIGGER_TYPE_TRUNCATE
>>> so it seems that we are not showing TRUNCATE triggers intentionally,
>>> but that comment fails to explain why
>
>> Wouldn't it be because TRUNCATE is a PostgreSQL language extension?
>
> Yeah.  The SQL standard specifies the allowed values in that column,
> and TRUNCATE is not among them.
>
> For similar reasons, you won't find exclusion constraints represented
> in the information_schema views, and there are some other cases that
> I don't recall this early in the morning.
>
> The point of the information_schema (at least IMHO) is to present
> standard-conforming information about standard-conforming database
> objects in a standard-conforming way, so that cross-DBMS applications
> can rely on what they'll see there.  If you are doing anything that's
> not described by the SQL standard, you will get at best an incomplete
> view of it from the information_schema.  In that case you're a lot
> better off looking directly at the underlying catalogs.
>
> (Yes, I'm aware that some other DBMS vendors have a more liberal
> interpretation of what standards compliance means in this area.)

While I understand and even agree with that, I think we also need
another view: information schema as a standard way of representing all
data, even that which extends the standard. Especially so, since
others take the latter view also.

I suggest we implement that with some kind of switch/case in the view
definition.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum stress-testing our system

2012-09-26 Thread Jeff Janes
On Wed, Sep 26, 2012 at 5:43 AM, Tomas Vondra  wrote:

> First - our system really is not a "common" one - we do have ~1000 of
> databases of
> various size, each containing up to several thousands of tables (several
> user-defined
> tables, the rest serve as caches for a reporting application - yes, it's a
> bit weird
> design but that's life). This all leads to pgstat.stat significantly larger
> than 60 MB.
>
...
> Now, let's suppose the write takes >10 ms, which is the PGSTAT_RETRY_DELAY
> values.
> With our current pgstat.stat filesize/num of relations, this is quite
> common.
> Actually the common write time in our case is ~100 ms, even if we move the
> file
> into tmpfs. That means that almost all the calls to backend_read_statsfile
> (which
> happen in all pgstat_fetch_stat_*entry calls) result in continuous stream of
> inquiries from the autovacuum workers, writing/reading of the file.

I don't think it actually does.  What you are missing is the same
thing I was missing a few weeks ago when I also looked into something
like this.

3962:

 * We don't recompute min_ts after sleeping, except in the
 * unlikely case that cur_ts went backwards.

That means the file must have been written within 10 ms of when we
*first* asked for it.

What is generating the endless stream you are seeing is that you have
1000 databases so if naptime is one minute you are vacuuming 16 per
second.  Since every database gets a new process, that process needs
to read the file as it doesn't inherit one.

...
>
> First, I'm interested in feedback - did I get all the details right, or am I
> missing something important?
>
> Next, I'm thinking about ways to solve this:
>
> 1) turning off autovacuum, doing regular VACUUM ANALYZE from cron

Increasing autovacuum_naptime seems like a far better way to do
effectively the same thing.

>
> 2) tweaking the timer values, especially increasing PGSTAT_RETRY_DELAY and
> so on
>to consider several seconds to be fresh enough - Would be nice to have
> this
>as a GUC variables, although we can do another private patch on our own.
> But
>more knobs is not always better.

I think forking it off to another value would be better.  If you
are an autovacuum worker which is just starting up and so getting its
initial stats, you can tolerate a stats file up to "autovacuum_naptime
/ 5.0" stale.  If you are already started up and are just about to
vacuum a table, then keep the staleness at PGSTAT_RETRY_DELAY as it
currently is, so as not to redundantly vacuum a table.


>
> 3) logic detecting the proper PGSTAT_RETRY_DELAY value - based mostly on the
> time
>it takes to write the file (e.g. 10x the write time or something).

This is already in place.

> 5) splitting the single stat file into multiple pieces - e.g. per database,
>written separately, so that the autovacuum workers don't need to read all
>the data even for databases that don't need to be vacuumed. This might be
>combined with (4).

I think this needs to happen eventually.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum stress-testing our system

2012-09-26 Thread Alvaro Herrera

Really, as far as autovacuum is concerned, it would be much more useful
to be able to reliably detect that a table has been recently vacuumed,
without having to request a 10ms-recent pgstat snapshot.  That would
greatly reduce the amount of time autovac spends on pgstat requests.
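Purely as a sketch of the idea (nothing like this exists today -- the names
are made up):

/* Hypothetical: a small shared array recording when each table was last
 * vacuumed, consulted before asking pgstat for a fresh snapshot. */
#include <stdbool.h>

typedef unsigned int Oid;

typedef struct RecentVacuumEntry
{
    Oid     relid;
    long    last_vacuum_ms;     /* filled in by the worker that vacuumed it */
} RecentVacuumEntry;

static bool
recently_vacuumed(const RecentVacuumEntry *entries, int n, Oid relid,
                  long now_ms, long threshold_ms)
{
    for (int i = 0; i < n; i++)
        if (entries[i].relid == relid &&
            now_ms - entries[i].last_vacuum_ms < threshold_ms)
            return true;        /* skip the expensive stats-file request */
    return false;
}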

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] htup header reorganization breaks many extension modules

2012-09-26 Thread Peter Eisentraut
On 9/26/12 10:07 AM, Tom Lane wrote:
> I can't get excited about this either.  This isn't the first, or the
> last, change that add-on modules can expect to have to make to track
> newer Postgres versions.  If we allow Peter's complaint to become the
> new project policy, we'll never be able to make any header
> rearrangements at all, nor change any internal APIs.

I'm not saying we can never change anything about the internal headers,
but we can make a small effort not to create useless annoyances.

That said, could someone clarify the header comments in the new headers?
 We currently have

 * htup.h
 *POSTGRES heap tuple definitions.

 * htup_details.h
 *POSTGRES heap tuple header definitions.

The names of the headers don't match their documented purpose very much.
 Is GETSTRUCT a "detail" of the heap tuple definition, or is it related
to "tuple headers"?  It's not really either, but I guess it is related
to tuple headers because you need to know about the headers to get to
the stuff past it.

When I see headers stuff.h and stuff_details.h, it makes me think that
you should only use stuff.h, and stuff_details.h are internal things.
But a lot of external modules use GETSTRUCT, so they might get confused.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oid registry

2012-09-26 Thread Peter Eisentraut
On 9/25/12 8:48 PM, Andrew Dunstan wrote:
> That still leaves the other uses for having well known Oids (or possibly
> UUIDs) for non-builtin types (e.g. so Drivers don't have to look them up
> in the catalogs, or having issues when types are added to the core.)

Here is a problem:  If I write an "hstore-ng" extension, I have two options:

1. "Steal" the OID/UUID assigned to hstore so my extension integrates
nicely with all the drivers.

2. Don't steal the OID/UUID assigned to hstore and have a very low
chance that my extension will ever be supported nicely by a lot of drivers.

Neither option is very nice.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [9.1] 2 bugs with extensions

2012-09-26 Thread Dimitri Fontaine
Hi,

After many distractions I've at least been able to do something about
that bug report.

Marko Kreen  writes:
> 1) Dumpable sequences are not supported - if sequence is tagged
>with pg_catalog.pg_extension_config_dump(), the pg_dump tries
>to run COPY on it.

I can only reproduce that in 9.1. I first tried in HEAD where pg_dump
will just entirely skip the sequence, which is not the right answer
either, but at least does not spit out that message.

> pg_dump: Error message from server: ERROR:  cannot copy from sequence
> "batch_id_seq"
> pg_dump: The command was: COPY pgq.batch_id_seq  TO stdout;

Please find attached a patch that fixes it in 9.1, in all classic
pg_dump, --data-only and --schema-only.

 git diff --stat
  src/bin/pg_dump/pg_dump.c |   68 +++-
  1 files changed, 54 insertions(+), 14 deletions(-)

Once that's ok for 9.1, I'll get to work on a fix for master, oh and
look at what the situation is in 9.2, which I guess is the same as in
master actually.

> 2) If there is schema with functions, but nothing else,
> create extension pgq_coop from 'unpackaged';
> drop extension pgq_coop;
> create extension pgq_coop;
> ERROR:  schema "pgq_coop" already exists

From reading the scripts, it's not clear to me, but it appears that the
schema existed before the create from unpackaged then got added to the
extension by way of

  ALTER EXTENSION pgq_coop ADD SCHEMA pgq_coop;

Is that true? Can we work out a minimal example to reproduce the bug?
(The Makefile in skytools/sql/pgq_coop fails on my OS) 

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
***
*** 181,187  static void dumpTrigger(Archive *fout, TriggerInfo *tginfo);
  static void dumpTable(Archive *fout, TableInfo *tbinfo);
  static void dumpTableSchema(Archive *fout, TableInfo *tbinfo);
  static void dumpAttrDef(Archive *fout, AttrDefInfo *adinfo);
! static void dumpSequence(Archive *fout, TableInfo *tbinfo);
  static void dumpIndex(Archive *fout, IndxInfo *indxinfo);
  static void dumpConstraint(Archive *fout, ConstraintInfo *coninfo);
  static void dumpTableConstraintComment(Archive *fout, ConstraintInfo *coninfo);
--- 181,187 
  static void dumpTable(Archive *fout, TableInfo *tbinfo);
  static void dumpTableSchema(Archive *fout, TableInfo *tbinfo);
  static void dumpAttrDef(Archive *fout, AttrDefInfo *adinfo);
! static void dumpSequence(Archive *fout, TableInfo *tbinfo, bool extMember);
  static void dumpIndex(Archive *fout, IndxInfo *indxinfo);
  static void dumpConstraint(Archive *fout, ConstraintInfo *coninfo);
  static void dumpTableConstraintComment(Archive *fout, ConstraintInfo *coninfo);
***
*** 1536,1541  dumpTableData(Archive *fout, TableDataInfo *tdinfo)
--- 1536,1547 
  	DataDumperPtr dumpFn;
  	char	   *copyStmt;
  
+ 	if (tbinfo->relkind == RELKIND_SEQUENCE)
+ 	{
+ 		dumpSequence(fout, tbinfo, true);
+ 		return;
+ 	}
+ 
  	if (!dump_inserts)
  	{
  		/* Dump/restore using COPY */
***
*** 1604,1609  makeTableDataInfo(TableInfo *tbinfo, bool oids)
--- 1610,1630 
  {
  	TableDataInfo *tdinfo;
  
+ 	/*
+ 	 * Nothing to do if we already decided to dump the table.  This will
+ 	 * happen for "config" tables.
+ 	 */
+ 	if (tbinfo->dataObj != NULL)
+ 		return;
+ 
+ 	/* Skip VIEWs (no data to dump) */
+ 	if (tbinfo->relkind == RELKIND_VIEW)
+ 		return;
+ 	/* Skip FOREIGN TABLEs (no data to dump) */
+ 	if (tbinfo->relkind == RELKIND_FOREIGN_TABLE)
+ 		return;
+ 
+ 	/* OK, let's dump it */
  	tdinfo = (TableDataInfo *) malloc(sizeof(TableDataInfo));
  
  	tdinfo->dobj.objType = DO_TABLE_DATA;
***
*** 11975,11981  dumpTable(Archive *fout, TableInfo *tbinfo)
  		char	   *namecopy;
  
  		if (tbinfo->relkind == RELKIND_SEQUENCE)
! 			dumpSequence(fout, tbinfo);
  		else if (!dataOnly)
  			dumpTableSchema(fout, tbinfo);
  
--- 11996,12002 
  		char	   *namecopy;
  
  		if (tbinfo->relkind == RELKIND_SEQUENCE)
! 			dumpSequence(fout, tbinfo, false);
  		else if (!dataOnly)
  			dumpTableSchema(fout, tbinfo);
  
***
*** 13118,13124  findLastBuiltinOid_V70(void)
  }
  
  static void
! dumpSequence(Archive *fout, TableInfo *tbinfo)
  {
  	PGresult   *res;
  	char	   *startv,
--- 13139,13145 
  }
  
  static void
! dumpSequence(Archive *fout, TableInfo *tbinfo, bool extMember)
  {
  	PGresult   *res;
  	char	   *startv,
***
*** 13219,13225  dumpSequence(Archive *fout, TableInfo *tbinfo)
  	 *
  	 * Add a 'SETVAL(seq, last_val, iscalled)' as part of a "data" dump.
  	 */
! 	if (!dataOnly)
  	{
  		/*
  		 * DROP must be fully qualified in case same name appears in
--- 13240,13246 
  	 *
  	 * Add a 'SETVAL(seq, last_val, iscalled)' as part of a "data" dump.
  	 */
! 	if (!extMember && !dataOnly)
  	{
  		/*
  		 * DROP must be fully qualified in case same name app

Re: [HACKERS] htup header reorganization breaks many extension modules

2012-09-26 Thread Alvaro Herrera
Excerpts from Peter Eisentraut's message of mié sep 26 11:18:51 -0300 2012:
> On 9/26/12 10:07 AM, Tom Lane wrote:
> > I can't get excited about this either.  This isn't the first, or the
> > last, change that add-on modules can expect to have to make to track
> > newer Postgres versions.  If we allow Peter's complaint to become the
> > new project policy, we'll never be able to make any header
> > rearrangements at all, nor change any internal APIs.
> 
> I'm not saying we can never change anything about the internal headers,
> but we can make a small effort not to create useless annoyances.

I proposed a possible way out of the problem elsewhere.  Please comment
on that.

> That said, could someone clarify the header comments in the new headers?
>  We currently have
> 
>  * htup.h
>  *POSTGRES heap tuple definitions.
> 
>  * htup_details.h
>  *POSTGRES heap tuple header definitions.

htup.h is what you need if you want to pass tuples around.  It's
particularly useful for other headers that want to declare their
functions as receiving or returning tuples.  htup_details.h is what you
need if you want to operate on tuples, such as creating them or
examining them.

I guess those comments aren't all that well thought out; suggestions
welcome.

> The names of the headers don't match their documented purpose very much.
>  Is GETSTRUCT a "detail" of the heap tuple definition, or is it related
> to "tuple headers"?  It's not really either, but I guess it is related
> to tuple headers because you need to know about the headers to get to
> the stuff past it.

> When I see headers stuff.h and stuff_details.h, it makes me think that
> you should only use stuff.h, and stuff_details.h are internal things.
> But a lot of external modules use GETSTRUCT, so they might get confused.

The other proposal was htup_internal.h but that would have been much
more indicative of stuff that's supposed to be used only for internal
consumption of files in backend/access/heap and such, which is why I
stayed away from that name.  I think htup_details.h is a good enough
compromise.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oid registry

2012-09-26 Thread Tom Lane
Antonin Houska  writes:
> I'm also implementing an extension where direct access to non-fixed OIDs
> (i.e. no catalog cache lookup by name) would be very helpful. I spent some
> time thinking about a workaround that makes OID registry unnecessary.
> How about the following?

> 1. Add a new varlena column to pg_proc catalog table, say 'ext_types',
> containing C array of OIDs.

> 2. Let each extension declare requirements like the following in its
> configuration files:

> "I expect  type at 0-th position of 'ext_types' array."
> "I expect  type at 1-st position of 'ext_types' array."
> etc.

I think this just begs the question: how do you specify  and
how do you know that whatever was found is what you want?

Beyond that, nothing in what you said can't be done today by a function
that does type name lookups and caches the results internally.  And I'd
just as soon not burden the function-call infrastructure with more
overhead to support something only a small fraction of functions would
need.

Another point is that server-internal features don't help client-side
code, which is actually where most of the pain is AFAICT.  We aren't
providing any infrastructure that helps clients interpret PQftype()
values for non-core types.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oid registry

2012-09-26 Thread Andrew Dunstan


On 09/25/2012 08:56 PM, Robert Haas wrote:
> On Tue, Sep 25, 2012 at 10:23 AM, Tom Lane  wrote:
>> Andrew Dunstan  writes:
>>> Given your previous comments, perhaps we could just start handing out
>>> Oids (if there is any demand) numbered, say, 9000 and up. That should
>>> keep us well clear of any existing use.
>>
>> No, I think you missed my point entirely: handing out OIDs at the top
>> of the manual assignment range is approximately the worst possible
>> scenario.  I foresee having to someday move FirstBootstrapObjectId
>> down to 9000, or 8000, or even less, to cope with growth of the
>> auto-assigned OID set.  So we need to keep manually assigned OIDs
>> reasonably compact near the bottom of the range, and it doesn't matter
>> at all whether such OIDs are used internally or reserved for external
>> developers.  Nor do I see a need for such reserved OIDs to "look
>> different" from internally-used OIDs.  Reserved is reserved.
>
> I'm not sure how much anyone cares, but just in case anyone does... it
> would be mighty useful to EnterpriseDB to have a range of OIDs that
> are guaranteed not to be assigned to anyone else, because we're
> maintaining a fork of PostgreSQL that regularly merges with the
> mainline.  We're actually likely to get crunched in our fork well
> before PostgreSQL itself does.  There are enough other forks of
> PostgreSQL out there that there may be other people who are in a similar
> situation, though I am not sure how much we want to cater to people
> doing such things.  That having been said, I can't really see how it
> would be practical anyway unless we raise the 16384 lower limit for
> user-assigned OIDs considerably.  And I'm not sure how to do that
> without breaking pg_upgrade.

How many would EDB need for it to be useful?

> I am somewhat opposed to the idea of an OID registry.  I can't see why
> the space of things people want to reserve OIDs for would be only tens
> rather than thousands.  There are surely plenty of extensions that
> would like to depend on specific other extensions, or on core.  If we
> legislate that hard-coded OIDs are the way to do that, I think we're
> going to end up with a lot of 'em.

Maybe you have a better feel than I do for how many live addon types are
out there. Maybe there are hundreds or thousands, but if there are I am
blissfully unaware of them.

I did like the alternative idea upthread of UUIDs for types, which would
give them a virtually unlimited space.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] htup header reorganization breaks many extension modules

2012-09-26 Thread Tom Lane
"Albe Laurenz"  writes:
> Hitoshi Harada wrote:
>> But it's only add #include "access/htup_details.h"?  I'd not argue
>> it's harmful unless I missed your point.

> I guess the point is that you need an #ifdef if you want a module
> to be able to build with both 9.3 and lower versions.

I can't get excited about this either.  This isn't the first, or the
last, change that add-on modules can expect to have to make to track
newer Postgres versions.  If we allow Peter's complaint to become the
new project policy, we'll never be able to make any header
rearrangements at all, nor change any internal APIs.
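For reference, the per-module change in question is a small version guard;
a sketch of what extension code typically ends up with (PG_VERSION_NUM and
the header names are real, the rest is generic extension boilerplate):

#include "postgres.h"
#include "fmgr.h"
#include "access/htup.h"
#if PG_VERSION_NUM >= 90300
#include "access/htup_details.h"    /* GETSTRUCT(), heap_getattr() moved here */
#endif

PG_MODULE_MAGIC;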

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] system_information.triggers & truncate triggers

2012-09-26 Thread Tom Lane
Daniel Farina  writes:
> On Tue, Sep 25, 2012 at 10:55 PM, Jaime Casanova  
> wrote:
>> The definition of information_schema.triggers contains this:
>> -- TRIGGER_TYPE_UPDATE; we intentionally omit TRIGGER_TYPE_TRUNCATE
>> so it seems that we are not showing TRUNCATE triggers intentionally,
>> but that comment fails to explain why

> Wouldn't it be because TRUNCATE is a PostgreSQL language extension?

Yeah.  The SQL standard specifies the allowed values in that column,
and TRUNCATE is not among them.

For similar reasons, you won't find exclusion constraints represented
in the information_schema views, and there are some other cases that
I don't recall this early in the morning.

The point of the information_schema (at least IMHO) is to present
standard-conforming information about standard-conforming database
objects in a standard-conforming way, so that cross-DBMS applications
can rely on what they'll see there.  If you are doing anything that's
not described by the SQL standard, you will get at best an incomplete
view of it from the information_schema.  In that case you're a lot
better off looking directly at the underlying catalogs.

(Yes, I'm aware that some other DBMS vendors have a more liberal
interpretation of what standards compliance means in this area.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doc patch to note which system catalogs have oids

2012-09-26 Thread Robert Haas
On Mon, Sep 24, 2012 at 9:18 PM, Stephen Frost  wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> I think this is fundamentally wrong, or at least misleading, because it
>> documents OID as if it were an ordinary column.  Somebody who did
>> "select * from pg_class" and didn't see any "oid" in the result would
>> think the docs were wrong.
>
> Given that it's been quite some time since we defaulted to including
> OIDs in tables, and the high level of confusion that individuals trying
> to join pg_class and pg_namespace together go through due to select *
> not including the oid column, I wonder if perhaps we should consider
> changing that.  Might be possible to do for just the catalog tables (to
> minimize the risk of breaking poorly-written applications), or provide
> a GUC to control including the oid column in select *.
>
>> It's possible that it's worth expending a boilerplate paragraph in each
>> of those pages to say "this catalog has OIDs" (or that it doesn't).
>> But this isn't the way.
>
> I'm afraid I disagree with this.  The oid column, in the system
> catalog, is user-facing and I like having it included as a column in the
> table in the docs, so users know what to use when doing joins.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] htup header reorganization breaks many extension modules

2012-09-26 Thread Alvaro Herrera
Excerpts from Peter Eisentraut's message of mar sep 25 21:30:59 -0300 2012:
> I haven't followed the details of the htup header reorganization, but I
> have noticed that a number of external extension modules will be broken
> because of the move of GETSTRUCT() and to a lesser extent
> heap_getattr().  Of course some #ifdefs can fix that, but it seems
> annoying to make everyone do that.  Maybe this could be reconsidered to
> reduce the impact on other projects.

Hmm.  My original patch didn't have this problem :-(  What it did was to
keep htup.h the "everything needed to work on tuples" header; so
external modules would not have become broken.  I didn't realize this at
the time, which is why I didn't argue to keep it that way instead of
having the new header contain most innards.

I guess we could rename the headers, so that htup.h is what's now called
htup_details.h, and htup_basics.h for what's currently htup.h.  This
would have a lot of fallout in core code, but eliminate impact on
external modules.

That said, I am not really sure that I want to promise header
compatibility forever.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_reorg in core?

2012-09-26 Thread Andres Freund
On Wednesday, September 26, 2012 02:39:36 PM Michael Paquier wrote:
> Do you think it is acceptable to consider that the user has to do the
> cleanup of the old or new index himself if there is a failure?
The problem I see is that if you want the thing to be efficient you might end up
doing step 1) for all/a bunch of indexes, then 2), then ...  In that case you can
have loads of invalid indexes around.

> You could also reissue the reindex command and avoid an additional command.
> When launching a concurrent reindex, it could be possible to check if there
> is already an index that has been created to replace the old one that failed
> previously. In order to control that, why not adding an additional field in
> pg_index?
> When creating a new index concurrently, we register in its pg_index entry
> the oid of the index that it has to replace. When reissuing the command
> after a failure, it is then possible to check if there is already an index
> that has been issued by a previous REINDEX CONCURRENT command and based on
> the flag values of the old and new indexes it is then possible to replay the
> command from the step where it previously failed.
I don't really like this idea but we might end up there anyway because we
probably need to keep track of whether an index is actually only a "replacement"
index that shouldn't exist on its own. Otherwise it's hard to know which
indexes to drop if it failed halfway through.

Greetings,

Andres
-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oid registry

2012-09-26 Thread Antonin Houska

I'm also implementing an extension where direct access to non-fixed OIDs
(i.e. no catalog cache lookup by name) would be very helpful. I spent some
time thinking about a workaround that makes OID registry unnecessary.
How about the following?

1. Add a new varlena column to pg_proc catalog table, say 'ext_types',
containing C array of OIDs.

2. Let each extension declare requirements like the following in its
configuration files:

"I expect  type at 0-th position of 'ext_types' array."
"I expect  type at 1-st position of 'ext_types' array."
etc.

3. Ensure that CREATE EXTENSION command reads all these type names, finds
the appropriate OIDs in pg_type and puts them to the appropriate position
in the 'ext_types' column for each function of the new extension (while
in-core types would have it set to NULL of course).

4. Implement a macro to get the 0-th, 1-st, etc. value from
pg_proc.ext_types (via FMGR).


Is there any serious circumstance I forgot or does it seem to be e.g.
too invasive?
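To make point 4 a bit more concrete, a purely hypothetical sketch (none of
these names exist in PostgreSQL; ext_types is only the proposed column):

/* Hypothetical sketch of the proposal -- not existing PostgreSQL code. */
typedef unsigned int Oid;           /* stand-in for the real Oid typedef */

typedef struct ExtTypes
{
    int     ntypes;     /* entries resolved at CREATE EXTENSION time */
    Oid    *oids;       /* OIDs in the order declared by the extension */
} ExtTypes;

/* would be filled in by the fmgr from pg_proc.ext_types for the function
 * being called, so the C code can fetch "its" types by position */
#define EXT_TYPE_OID(ext, n) \
    (((n) >= 0 && (n) < (ext)->ntypes) ? (ext)->oids[(n)] : (Oid) 0)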


Kind regards,
Tony H.

On 09/25/2012 12:59 AM, Andrew Dunstan wrote:
> This rather overdue mail arises out of the developer's meeting back in
> May, where we discussed an item I raised suggesting an Oid registry.
>
> The idea came from some difficulties I encountered when I did the
> backport of the JSON work we did in 9.2 to 9.1, but has wider
> application. Say someone writes an extension that defines type A. You
> want to write an extension that takes advantage of that type, but it's
> difficult if you don't know the type's Oid, and of course currently
> there is no way of knowing for sure what it is unless it's a builtin
> type. So the proposal is to have an Oid registry, in which authors
> could in effect reserve an Oid (or a couple of Oids) for a type. We
> would guarantee that these Oids would be reserved in just the same way
> Oids for builtins are reserved, and #define symbolic constants for the
> reserved Oids. To make that viable, we'd need to extend the CREATE
> commands for any objects we could reserve Oids for to allow for the
> specification of the Oids, somewhat like:
>
>    CREATE TYPE newtype ( ) WITH (TYPEOID = 123456, TYPEARRAYOID =
>    234567);
>
> I'm not sure what objects we would need this for other than types, but
> CREATE TYPE would be a good start anyway.
>
> Another suggestion that was made during the discussion was that we
> should also reserve a range of Oids for private use, and guarantee
> that those would never be allocated, somewhat analogously to RFC1918
> IP addresses.
>
> thoughts?
>
> If there is general agreement I want to get working on this fairly soon.
>
> cheers
>
> andrew








--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] autovacuum stress-testing our system

2012-09-26 Thread Tomas Vondra

Hi,

I've been struggling with autovacuum generating a lot of I/O and CPU on some
of our systems - after a night spent analyzing this behavior, I believe the
current autovacuum accidentally behaves a bit like a stress-test in some
corner cases (but I may be seriously wrong, after all it was a long night).

First - our system really is not a "common" one - we do have ~1000 of
databases of various size, each containing up to several thousands of tables
(several user-defined tables, the rest serve as caches for a reporting
application - yes, it's a bit weird design but that's life). This all leads
to pgstat.stat significantly larger than 60 MB.


Now, the two main pieces of information from pgstat.c are the timer
definitions

-- pgstat.c : 80 ------------------------------------------------------

#define PGSTAT_STAT_INTERVAL    500     /* Minimum time between stats file
                                         * updates; in milliseconds. */

#define PGSTAT_RETRY_DELAY      10      /* How long to wait between checks for
                                         * a new file; in milliseconds. */

#define PGSTAT_MAX_WAIT_TIME    10000   /* Maximum time to wait for a stats
                                         * file update; in milliseconds. */

#define PGSTAT_INQ_INTERVAL     640     /* How often to ping the collector for
                                         * a new file; in milliseconds. */

#define PGSTAT_RESTART_INTERVAL 60      /* How often to attempt to restart a
                                         * failed statistics collector;
                                         * in seconds. */

#define PGSTAT_POLL_LOOP_COUNT  (PGSTAT_MAX_WAIT_TIME / PGSTAT_RETRY_DELAY)
#define PGSTAT_INQ_LOOP_COUNT   (PGSTAT_INQ_INTERVAL / PGSTAT_RETRY_DELAY)

-----------------------------------------------------------------------

and then this loop (the current HEAD does this a bit differently, but the
9.2 code is a bit more readable and suffers the same issue):

-- pgstat.c : 3560 ----------------------------------------------------

    /*
     * Loop until fresh enough stats file is available or we ran out of time.
     * The stats inquiry message is sent repeatedly in case collector drops
     * it; but not every single time, as that just swamps the collector.
     */
    for (count = 0; count < PGSTAT_POLL_LOOP_COUNT; count++)
    {
        TimestampTz file_ts = 0;

        CHECK_FOR_INTERRUPTS();

        if (pgstat_read_statsfile_timestamp(false, &file_ts) &&
            file_ts >= min_ts)
            break;

        /* Not there or too old, so kick the collector and wait a bit */
        if ((count % PGSTAT_INQ_LOOP_COUNT) == 0)
            pgstat_send_inquiry(min_ts);

        pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
    }

    if (count >= PGSTAT_POLL_LOOP_COUNT)
        elog(WARNING, "pgstat wait timeout");

    /* Autovacuum launcher wants stats about all databases */
    if (IsAutoVacuumLauncherProcess())
        pgStatDBHash = pgstat_read_statsfile(InvalidOid, false);
    else
        pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false);

-----------------------------------------------------------------------

What this code does it that it checks the statfile, and if it's not 
stale (the
timestamp of the write start is not older than PGSTAT_RETRY_DELAY 
milliseconds),

the loop is terminated and the file is read.

Now, let's suppose the write takes >10 ms, which is the 
PGSTAT_RETRY_DELAY values.
With our current pgstat.stat filesize/num of relations, this is quite 
common.
Actually the common write time in our case is ~100 ms, even if we move 
the file
into tmpfs. That means that almost all the calls to 
backend_read_statsfile (which
happen in all pgstat_fetch_stat_*entry calls) result in continuous 
stream of

inquiries from the autovacuum workers, writing/reading of the file.

We're not getting 'pgstat wait timeout' though, because the file finally gets
written before PGSTAT_MAX_WAIT_TIME elapses.
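
For some rough numbers, here is a tiny standalone sketch (not PostgreSQL code,
just arithmetic on the 9.2 constants quoted above plus the ~60 MB file size and
~100 ms write time we observe):

#include <stdio.h>

/* Constants as quoted from 9.2's pgstat.c above (milliseconds). */
#define PGSTAT_RETRY_DELAY      10
#define PGSTAT_MAX_WAIT_TIME    10000
#define PGSTAT_INQ_INTERVAL     640

int
main(void)
{
    int     poll_loop_count = PGSTAT_MAX_WAIT_TIME / PGSTAT_RETRY_DELAY;
    int     inq_loop_count = PGSTAT_INQ_INTERVAL / PGSTAT_RETRY_DELAY;
    /* Observed values from this report, not constants: */
    double  statfile_mb = 60.0;     /* pgstat.stat size */
    double  write_ms = 100.0;       /* time to write it, even on tmpfs */

    printf("polls per wait: up to %d (one every %d ms)\n",
           poll_loop_count, PGSTAT_RETRY_DELAY);
    printf("inquiries per wait: up to %d\n",
           (poll_loop_count + inq_loop_count - 1) / inq_loop_count);
    /* If the collector ends up rewriting the file back to back: */
    printf("statfile rewrites: ~%.0f per second, ~%.0f MB/s written\n",
           1000.0 / write_ms, statfile_mb * 1000.0 / write_ms);
    return 0;
}

That works out to up to 1000 polls and ~16 inquiries per wait, and roughly
600 MB/s of write traffic if the 60 MB file really is rewritten back to back.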

By moving the file to a tmpfs we've minimized the I/O impact, but now the
collector and the autovacuum launcher consume ~75% of a CPU (i.e. ~one core)
and do nothing except burn power, because the database is almost read-only.
Not a good thing in the "green computing" era, I guess.

First, I'm interested in feedback - did I get all the details right, or am I
missing something important?

Next, I'm thinking about ways to solve this:

1) turning off autovacuum and doing regular VACUUM ANALYZE from cron -
   certainly an option, but it's more a workaround than a solution and I'm not
   very fond of it. Moreover, it fixes only one side of the problem -
   triggering the statfile writes over and over. The file will be written
   anyway, although not that frequently.

2) tweaking the timer values, especially increasing 

Re: [HACKERS] pg_reorg in core?

2012-09-26 Thread Michael Paquier
On Wed, Sep 26, 2012 at 8:13 PM, Andres Freund wrote:

> On Tuesday, September 25, 2012 01:48:34 PM Michael Paquier wrote:
> > On Tue, Sep 25, 2012 at 5:55 PM, Andres Freund wrote:
> > > On Tuesday, September 25, 2012 04:37:05 AM Michael Paquier wrote:
> > > > On Tue, Sep 25, 2012 at 8:13 AM, Andres Freund wrote:
> > > > Could you clarify what you mean here by cleanup?
> > > > I am afraid I do not get your point here.
> > >
> > > Sorry, was a bit tired when writing the above.
> > >
> > > The point is that, to work concurrently, the CONCURRENT operations
> > > commit/start multiple transactions internally. It can be interrupted
> > > (user, shutdown, error, crash) and leave transient state behind every
> > > time it does so. What I wanted to say is that we need to take care that
> > > each of those can easily be cleaned up afterwards.
> >
> > Sure, many errors may happen.
> > But, in the case of CREATE INDEX CONCURRENTLY, there is no cleanup method
> > implemented as far as I know (I might be missing something though). Isn't
> > an index only considered invalid in case of a failure during concurrent
> > creation?
> Well, you can DROP or REINDEX the invalid index.
>
> There are several scenarios where you can get invalid indexes. Unique
> violations, postgres restarts, aborted index creation...
>
> > In the case of REINDEX it would be essential to create such a cleanup
> > mechanism, as I cannot imagine a production database with an index that
> > has been marked as invalid due to a concurrent reindex failure - assuming
> > here, of course, that REINDEX CONCURRENTLY would use the same level of
> > error handling as CREATE INDEX CONCURRENTLY.
> Not sure what you're getting at?
>
I just meant that when CREATE INDEX CONCURRENTLY fails, the index created is
considered invalid, so it cannot be used by the planner.

Based on what you told me before:
1) build new index with indisready = false
   newindex.indisready = true
   wait
2) newindex.indisvalid = true
   wait
3) swap(oldindex.relfilenode, newindex.relfilenode)
   oldindex.indisvalid = false
   wait
4) oldindex.indisready = false
   wait
   drop new index with old relfilenode

If the reindex fails at step 1 or 2, the new index is not usable, so the
relation will end up with an index which is not valid. If it fails at step 4,
the old index is invalid. If it fails at step 3, both indexes are valid and
both are usable for the given relation.
Do you think it is acceptable to expect the user to do the cleanup of the old
or new index himself if there is a failure?


> > One of the possible cleanup mechanisms off the top of my head is a
> > callback at transaction abort; each callback would need to be different
> > for each subtransaction used during the concurrent operation.
> > In case the callback itself fails, well, the old and/or new indexes become
> > invalid.
> That's not going to work. E.g. the session might have been aborted or such.
> Also, there is not much you can do from a callback at transaction end, as
> you cannot do catalog modifications.
>
> I was thinking of REINDEX CONCURRENTLY CONTINUE or something vaguely
> similar.
>
You could also reissue the reindex command and avoid an additional command.
When launching a concurrent reindex, it could be possible to check whether
there is already an index that has been created to replace the old one that
failed previously. In order to control that, why not add an additional field
in pg_index?
When creating a new index concurrently, we would register in its pg_index
entry the oid of the index that it has to replace. When reissuing the command
after a failure, it is then possible to check whether there is already an
index that was created by a previous REINDEX CONCURRENTLY command, and based
on the flag values of the old and new indexes it is then possible to replay
the command from the step where it previously failed.
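
To make that concrete, here is a rough standalone sketch of the idea (the
field name "indreplaces" is invented here purely for illustration; it is not
an existing column or a settled proposal):

#include <stdbool.h>
#include <stdio.h>

typedef unsigned int Oid;            /* stand-in for the real Oid typedef */
#define InvalidOid ((Oid) 0)

/* Hypothetical, trimmed-down view of a pg_index row with the extra field
 * discussed above: the concurrently built replacement remembers which index
 * it was meant to replace. */
typedef struct IndexEntrySketch
{
    Oid     indexrelid;     /* OID of this index */
    bool    indisvalid;     /* usable by the planner? */
    bool    indisready;     /* receiving inserts? */
    Oid     indreplaces;    /* invented: OID of the old index this one was
                             * built to replace, or InvalidOid */
} IndexEntrySketch;

/* On a reissued REINDEX CONCURRENTLY, a leftover entry pointing at the old
 * index tells us a previous run already got past the initial build, so the
 * command could resume from the step where it failed. */
static bool
is_leftover_replacement(const IndexEntrySketch *e, Oid old_index)
{
    return e->indreplaces == old_index;
}

int
main(void)
{
    IndexEntrySketch leftover = {54321, false, true, 12345};

    if (is_leftover_replacement(&leftover, 12345))
        printf("found a half-built replacement for index 12345; resume\n");
    return 0;
}
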
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] Enum binary access

2012-09-26 Thread Petr Chmelar

Hi Merlin,

thanks, your code works perfectly.

Cheers,

Petr

On 10.9.2012 16:33, Merlin Moncure wrote:

On Mon, Sep 10, 2012 at 8:42 AM, Petr Chmelar  wrote:

Hi there,

we tried to create the libpqtypes enum binary send handler, but it doesn't work:

// register types
PGregisterType user_def[] = { {"seqtype", enum_put, enum_get} };
PQregisterTypes(connector->getConn(), PQT_USERDEFINED, user_def, 1, 0);

// enum_put throws format error
int enum_put (PGtypeArgs *args ) {

 char *val = va_arg(args->ap, char *);
 char *out = NULL;
 int vallen = 0, len = 0, oid = 0;
 float sortorder = 0.0;

 if (!args || !val) return 0;

 /* expand buffer enough */
 vallen = strlen(val);
 len = sizeof(int) + sizeof(float) + (vallen * sizeof(char));
 if (args->put.expandBuffer(args, len) == -1) return -1;

 /* put header (oid, sortorder) and value */
 out = args->put.out;
 memcpy(out, &oid, sizeof(int));
 out += sizeof(int);
 memcpy(out, &sortorder, sizeof(float));
 out += sizeof(float);
 memcpy(out, val, vallen);

 return len;
}

// enum_get (FYI, get works OK)
int enum_get (PGtypeArgs *args) {
 char *val = PQgetvalue(args->get.result, args->get.tup_num,
args->get.field_num);
 int len = PQgetlength(args->get.result, args->get.tup_num,
args->get.field_num);
 char **result = va_arg(args->ap, char **);
 *result = (char *) PQresultAlloc((PGresult *) args->get.result, len *
sizeof(char));
 memcpy(*result, val, len * sizeof(char));
 return 0;
}

Postgres doesn't accept an enum sent like this and throws a format error. This
should serve as a prototype for derived types - there is no real type named
"enum". Libpqtypes doesn't seem to provide simplified binary manipulation for
enum types.

What should we do, please? Can you fix it? I think there are more people who
need to access enum types in binary mode.

I was able to get it to work. Here is what I did:

*) your 'get' routine should probably be allocating a terminating
byte.  In binary situations, PQgetlength does not return the length of
the null terminator.

*) the backend binary format for enums is just the label text string, on both
the get and put side.  So putting the oid and sort order was breaking the put
side.  Removed that and everything worked.

*) see (very messy quickly written) code below:


#include "libpq-fe.h"
#include "libpqtypes.h"
#include "string.h"


// enum_put (fixed: the binary format is just the label text)
int enum_put (PGtypeArgs *args ) {

 char *val = va_arg(args->ap, char *);
 char *out = NULL;
 int vallen = 0, len = 0, oid = 0;
 float sortorder = 0.0;

 if (!args || !val) return 0;

 /* expand buffer enough */
 len = vallen = strlen(val);
 if (args->put.expandBuffer(args, len) == -1) return -1;

 out = args->put.out;
 memcpy(out, val, vallen);

 return len;
}

// enum_get (FYI, get works OK)
int enum_get (PGtypeArgs *args) {
 char *val = PQgetvalue(args->get.result, args->get.tup_num,
args->get.field_num);
 int len = PQgetlength(args->get.result, args->get.tup_num,
args->get.field_num) + 1;
 char **result = va_arg(args->ap, char **);
 *result = (char *) PQresultAlloc((PGresult *) args->get.result,
len * sizeof(char));
 memcpy(*result, val, len * sizeof(char));
 (*result)[len - 1] = '\0';
 return 0;
}


int main()
{
   PGtext t = "b";

   PGconn *conn = PQconnectdb("port=5492 host=localhost");
   PQinitTypes(conn);

   PGregisterType user_def[] = { {"e", enum_put, enum_get} };

   if(!PQregisterTypes(conn, PQT_USERDEFINED, user_def, 1, 0))
 fprintf(stderr, "*ERROR: %s\n", PQgeterror());


   PGresult *res = PQexecf(conn, "select %e", t);

   if(!res)
 fprintf(stderr, "*ERROR: %s\n", PQgeterror());

   if (!PQgetf(res, 0, "%e", 0, &t))
   {
 fprintf(stderr, "*ERROR: %s\n", PQgeterror());
   }

   printf("%s\n", t);

   PQclear(res);
}








Re: [HACKERS] system_information.triggers & truncate triggers

2012-09-26 Thread Daymel Bonne Solís

On 09/26/2012 03:08 AM, Daniel Farina wrote:

On Tue, Sep 25, 2012 at 10:55 PM, Jaime Casanova  wrote:

On Wed, Sep 26, 2012 at 12:17 AM, Daymel Bonne Solís  wrote:

Hello hackers:

I need a list of all triggers created in my database, but the view
system_information.triggers does not show truncate triggers, but it does for
insert, update and delete triggers.

The same problem is found in versions 9.1 and 9.2.


The definition of information_schema.triggers contains this:
"""
 FROM pg_namespace n, pg_class c, pg_trigger t,
  -- hard-wired refs to TRIGGER_TYPE_INSERT, TRIGGER_TYPE_DELETE,
  -- TRIGGER_TYPE_UPDATE; we intentionally omit TRIGGER_TYPE_TRUNCATE
  (VALUES (4, 'INSERT'),
  (8, 'DELETE'),
  (16, 'UPDATE')) AS em (num, text)
"""

so it seems that we are not showing TRUNCATE triggers intentionally,
but that comment fails to explain why

Wouldn't it be because TRUNCATE is a PostgreSQL language extension?


I think this case should be explicitly stated in the documentation.

Regards.





Re: [HACKERS] pg_reorg in core?

2012-09-26 Thread Andres Freund
On Tuesday, September 25, 2012 01:48:34 PM Michael Paquier wrote:
> On Tue, Sep 25, 2012 at 5:55 PM, Andres Freund wrote:
> > On Tuesday, September 25, 2012 04:37:05 AM Michael Paquier wrote:
> > > On Tue, Sep 25, 2012 at 8:13 AM, Andres Freund wrote:
> > > Could you clarify what you mean here by cleanup?
> > > I am afraid I do not get your point here.
> > 
> > Sorry, was a bit tired when writing the above.
> > 
> > The point is that, to work concurrently, the CONCURRENT operations
> > commit/start multiple transactions internally. It can be interrupted
> > (user, shutdown, error, crash) and leave transient state behind every
> > time it does so. What I wanted to say is that we need to take care that
> > each of those can easily be cleaned up afterwards.
> 
> Sure, many errors may happen.
> But, in the case of CREATE INDEX CONCURRENTLY, there is no cleanup method
> implemented as far as I know (I might be missing something though). Isn't an
> index only considered invalid in case of a failure during concurrent creation?
Well, you can DROP or REINDEX the invalid index.

There are several scenarios where you can get invalid indexes. Unique 
violations, postgres restarts, aborted index creation...

> In the case of REINDEX it would be essential to create such a cleanup
> mechanism, as I cannot imagine a production database with an index that has
> been marked as invalid due to a concurrent reindex failure - assuming here,
> of course, that REINDEX CONCURRENTLY would use the same level of error
> handling as CREATE INDEX CONCURRENTLY.
Not sure what you're getting at?

> One of the possible cleanup mechanisms off the top of my head is a
> callback at transaction abort; each callback would need to be different for
> each subtransaction used during the concurrent operation.
> In case the callback itself fails, well, the old and/or new indexes become
> invalid.
That's not going to work. E.g. the session might have been aborted or such.
Also, there is not much you can do from a callback at transaction end, as you
cannot do catalog modifications.

I was thinking of REINDEX CONCURRENTLY CONTINUE or something vaguely similar.
 
> > > > 2. no support for concurrent on system tables (not easy for shared
> > > > catalogs)
> > > 
> > > Doesn't this exclude all the tables that are in the schema catalog?
> > 
> > No. Only SELECT array_to_string(array_agg(relname), ', ') FROM pg_class
> > WHERE relisshared AND relkind = 'r';
> > their toast tables and their indexes are shared. The problem is that for
> > those you cannot create a separate index and let it update concurrently
> > because you cannot write into each database's pg_class/pg_index.

> Yes indeed, I didn't think about things that are shared among databases.
> Blocking that is pretty simple, only a matter of places checked.

It's just a bit sad to make the thing not really appear lockless ;)


Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

2012-09-26 Thread Виктор Егоров
You're right, REINDEX was not done.

I've stopped the VACUUM, did a proper server restart (pg_ctl -m fast -w restart)
and will work on rebuilding relations.
Seems like I also have another issue ahead of me with a bunch of bloated tables.

Thanks for the support.

2012/9/26 Andres Freund :
>> Last entry from the 9.1.6 recommended VACUUM (FREEZE, VERBOSE, ANALYZE)
> It recommended doing a REINDEX first though? I guess you didn't do that?
>
> ...
>
> I guess you cannot cancel the vacuum? Last time it was in a cycle without
> checking interrupts in between.
>
> Can you restart the server?


-- 
Victor Y. Yegorov




Re: [HACKERS] htup header reorganization breaks many extension modules

2012-09-26 Thread Michael Paquier
On Wed, Sep 26, 2012 at 4:14 PM, Albe Laurenz wrote:

> Hitoshi Harada wrote:
> > On Tue, Sep 25, 2012 at 5:30 PM, Peter Eisentraut wrote:
> >> I haven't followed the details of the htup header reorganization, but I
> >> have noticed that a number of external extension modules will be broken
> >> because of the move of GETSTRUCT() and to a lesser extent
> >> heap_getattr().  Of course some #ifdefs can fix that, but it seems
> >> annoying to make everyone do that.  Maybe this could be reconsidered to
> >> reduce the impact on other projects.
>
> > But it's only adding #include "access/htup_details.h"?  I'd not argue
> > it's harmful unless I missed your point.
>
> I guess the point is that you need an #ifdef if you want a module
> to be able to build with both 9.3 and lower versions.
>
> Otherwise the compiler will complain about the missing include
> file on older versions.
>
The modules of Postgres depend on the core and not the opposite, so isn't it
the responsibility of the maintainers of the modules to ensure that what they
make is still compilable with Postgres?
This can be simply fixed by providing, as mentioned, ifdefs controlled by
PG_VERSION_NUM around the inclusion of htup_details.h, so the correction
effort is not that large...
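
For what it's worth, the guard in question is tiny; a minimal sketch (assuming
90300 is the right PG_VERSION_NUM cutoff for the release cycle that introduced
htup_details.h) would be something like:

/* In the extension's .c file: GETSTRUCT() and friends moved to
 * access/htup_details.h during the 9.3 cycle, so include it conditionally
 * to keep building against older server versions as well. */
#include "postgres.h"
#include "access/htup.h"
#if PG_VERSION_NUM >= 90300
#include "access/htup_details.h"
#endif
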
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] Oid registry

2012-09-26 Thread Hannu Krosing

On 09/26/2012 02:48 AM, Andrew Dunstan wrote:


On 09/25/2012 08:35 PM, Peter Eisentraut wrote:

On Tue, 2012-09-25 at 18:22 -0400, Tom Lane wrote:

Actually, after reading another message you sent, I thought you were
going to respond that your proposed transforms feature would cover it.

I had thought about this some time ago, but it's clearer to think of
casts as associating two types, versus transforms associating a type and
a language.  JSON and XML tend to be types.



OK, I think this solves the object_to_json problem after all - we'll 
look for a cast to json and if it's not there use the string form of 
the object. Nice.

This is solved then :)

Would it be possible to also use the cast mechanism to do anyarray-to-json
casts as a parallel spelling for array_to_json(), and a record-to-json cast
for row_to_json()?

BTW, is anybody currently working on going the opposite way as well, that is,
loading rows/records from JSON?


That still leaves the other uses for having well-known Oids (or possibly
UUIDs) for non-builtin types (e.g. so drivers don't have to look them up in
the catalogs, or run into issues when types are added to core).
One way to solve this would be to pass non-system oids to clients as their
names; this would need a change in the protocol.


Or we could make it so that the server sends a special record with
typename->typeoid mappings at the first use of a non-system type.



cheers

andrew









Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

2012-09-26 Thread Andres Freund
Hi,

On Wednesday, September 26, 2012 07:57:06 AM Виктор Егоров wrote:
> I'm afraid I'm exactly in this situation now.
:(
 
> Last entry from the 9.1.6 recommended VACUUM (FREEZE, VERBOSE, ANALYZE)
It recommended doing a REINDEX first though? I guess you didn't do that?

> was: INFO:  "meta_version_chunks": found 55363 removable, 32566245
> nonremovable row versions in 450292 out of 450292 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> There were 588315 unused item pointers.
> 0 pages are entirely empty.
> CPU 2.44s/5.77u sec elapsed 2150.18 sec.
> INFO:  vacuuming "pg_toast.pg_toast_16582"
> 
> And here're are the locks held by the VACCUM backend:
> select oid, relname, relkind, relpages, reltuples::numeric(15,0),
>        reltoastrelid, reltoastidxid
>   from pg_class
>  where oid in (select relation from pg_locks where pid = 1380);
>
>   oid  |       relname        | relkind | relpages | reltuples | reltoastrelid | reltoastidxid
> -------+----------------------+---------+----------+-----------+---------------+---------------
>  16585 | pg_toast_16582       | t       | 16460004 |  58161600 |             0 |         16587
>  16587 | pg_toast_16582_index | i       |   188469 |  58161600 |             0 |             0
>  16582 | meta_version_chunks  | r       |   450292 |  32566200 |         16585 |             0
> 
> I will not touch anything and would like to get some recommendations on how
> to proceed.

On Wednesday, September 26, 2012 08:12:37 AM Виктор Егоров wrote:
> Forget to mention, that:
> - VACUUM is running on the master;
> - current state is unchanged for 20 hours.
I guess you cannot cancel the vacuum? Last time it was in a cycle without 
checking interrupts in between.

Can you restart the server?

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] htup header reorganization breaks many extension modules

2012-09-26 Thread Albe Laurenz
Hitoshi Harada wrote:
> On Tue, Sep 25, 2012 at 5:30 PM, Peter Eisentraut wrote:
>> I haven't followed the details of the htup header reorganization, but I
>> have noticed that a number of external extension modules will be broken
>> because of the move of GETSTRUCT() and to a lesser extent
>> heap_getattr().  Of course some #ifdefs can fix that, but it seems
>> annoying to make everyone do that.  Maybe this could be reconsidered to
>> reduce the impact on other projects.

> But it's only adding #include "access/htup_details.h"?  I'd not argue
> it's harmful unless I missed your point.

I guess the point is that you need an #ifdef if you want a module
to be able to build with both 9.3 and lower versions.

Otherwise the compiler will complain about the missing include
file on older versions.

Yours,
Laurenz Albe




Re: [HACKERS] system_information.triggers & truncate triggers

2012-09-26 Thread Daniel Farina
On Tue, Sep 25, 2012 at 10:55 PM, Jaime Casanova  wrote:
> On Wed, Sep 26, 2012 at 12:17 AM, Daymel Bonne Solís  wrote:
>> Hello hackers:
>>
>> I need a list of all triggers created in my database, but the view
>> system_information.triggers does not show truncate triggers, but it does for
>> insert, update and delete triggers.
>>
>> The same problem is found in versions 9.1 and 9.2.
>>
>
> The definition of information_schema.triggers contains this:
> """
> FROM pg_namespace n, pg_class c, pg_trigger t,
>  -- hard-wired refs to TRIGGER_TYPE_INSERT, TRIGGER_TYPE_DELETE,
>  -- TRIGGER_TYPE_UPDATE; we intentionally omit TRIGGER_TYPE_TRUNCATE
>  (VALUES (4, 'INSERT'),
>  (8, 'DELETE'),
>  (16, 'UPDATE')) AS em (num, text)
> """
>
> so it seems that we are not showing TRUNCATE triggers intentionally,
> but that comment fails to explain why

Wouldn't it be because TRUNCATE is a PostgreSQL language extension?
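
For reference, those hard-wired numbers are the tgtype bits from
catalog/pg_trigger.h; the bit for truncate triggers would be the next one. A
small sketch of the values (assuming the pg_trigger.h definitions):

#include <stdio.h>

/* tgtype bits as defined in catalog/pg_trigger.h; the first three match the
 * hard-wired 4/8/16 used by the view quoted above. */
#define TRIGGER_TYPE_INSERT     (1 << 2)    /*  4 */
#define TRIGGER_TYPE_DELETE     (1 << 3)    /*  8 */
#define TRIGGER_TYPE_UPDATE     (1 << 4)    /* 16 */
#define TRIGGER_TYPE_TRUNCATE   (1 << 5)    /* 32 - the bit the view omits */

int
main(void)
{
    /* Truncate triggers can still be found by testing this bit against
     * pg_trigger.tgtype directly, e.g. tgtype & 32. */
    printf("TRIGGER_TYPE_TRUNCATE = %d\n", TRIGGER_TYPE_TRUNCATE);
    return 0;
}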

-- 
fdr

