Re: [HACKERS] why can't plpgsql return a row-expression?

2012-12-05 Thread Pavel Stehule
2012/12/4 Asif Rehman asifr.reh...@gmail.com:
 Hi,

 Here is the updated patch. I overlooked the loop, checking to free the
 conversions map. Here are the results now.

 postgres=# select sum(a) from generate_series(1,3000) g(i), lateral foo(i)
 as (a numeric, b numeric);
sum
 --
  303000.0
 (1 row)

 postgres=# select sum(a) from generate_series(1,3000) g(i), lateral foo(i)
 as (a float, b numeric);
sum
 --
  303000.00012

 Regards,
 --Asif

Yes, it is fixed.

OK, I have no objection.

Regards

Pavel Stehule



 On Wed, Dec 5, 2012 at 12:40 AM, Pavel Stehule pavel.steh...@gmail.com
 wrote:

 Hello

 I fully agree with Asif's arguments.

 The previous tupconvert implementation was really strict, so the
 enhanced tupconvert has very minimal impact on existing users, and I
 measured the same speed for a plpgsql function on patched and unpatched PostgreSQL.

 tested

 CREATE OR REPLACE FUNCTION public.foo(i integer)
  RETURNS SETOF record
  LANGUAGE plpgsql
 AS $function$
 declare r record;
 begin
   r := (10.1,20.1); for i in 1..10 loop return next r; end loop; return;
 end;
 $function$

 select sum(a) from generate_series(1,3000) g(i), lateral foo(i) as (a
 numeric, b numeric);

 More: everywhere we use tupconvert internally we use a cached
 conversion map, so I am sure the speed of ANALYZE cannot be
 impacted by this patch.

 There are two other issues:

 It allows writing new, different slow code: an I/O cast is about 5-10-20%
 slower, and with this patch anybody has new possibilities for writing bad
 code. But that is not a problem of this patch. It is related to plpgsql's
 design, and probably we should move some conversions outside
 plpgsql so the conversion map can be reused and plpgsql enhanced. I
 have a simple half-solution: plpgsql_check_function can detect this
 situation in most typical use cases and can raise a performance
 warning. So this issue is not a stopper for me, because it is not a new
 issue in plpgsql.
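
 As an illustration (a sketch reusing the foo() function above; only the
 second query pays the per-column I/O casts that this patch makes possible):

 -- declared column types match the record's numeric fields: no extra cast
 select * from foo(1) as (a numeric, b numeric);

 -- "a float" does not match the numeric field, so every returned row goes
 -- through an I/O cast
 select * from foo(1) as (a float, b numeric);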

 The second issue is more significant:

 there is a bug:

 postgres=# select sum(a) from generate_series(1,3000) g(i), lateral
 foo(i) as (a numeric, b numeric);
sum
 --
  303000.0
 (1 row)

 postgres=# select sum(a) from generate_series(1,3000) g(i), lateral
 foo(i) as (a float, b numeric);
   sum
 ---
  7.33675699577682e-232
 (1 row)

 it produces a wrong result.

 And with a minimal change it kills the session:

 create or replace function foo(i integer)
 returns setof record as $$
 declare r record;
 begin
   r := (10,20); for i in 1..10 loop return next r; end loop; return;
 end;
 $$ language plpgsql;

 postgres=# select sum(a) from generate_series(1,3000) g(i), lateral
 foo(i) as (a numeric, b numeric);
 The connection to the server was lost. Attempting reset: Failed.

 create or replace function fo(i integer)
 returns record as $$
 declare r record;
 begin
   r := (10,20); return r;
 end;
 $$ language plpgsql;

 postgres=# select sum(a) from generate_series(1,3000) g(i), lateral
 fo(i) as (a int, b numeric);
   sum
 ---
  3
 (1 row)

 postgres=# select sum(a) from generate_series(1,3000) g(i), lateral
 fo(i) as (a numeric, b numeric);
 The connection to the server was lost. Attempting reset: Failed.

 Regards

 Pavel Stehule


 2012/12/3 Asif Rehman asifr.reh...@gmail.com:
  Hi,
 
  Thanks for the review. Please see the updated patch.
 
  Hmm.  We're running an I/O cast during do_tup_convert() now, and look
  up the required functions for each tuple.  I think if we're going to
  support this operation at that level, we need to look up the necessary
  functions at convert_tuples_by_foo level, and then apply them
  unconditionally if they've been set up.
 
  Done. The TupleConversionMap struct now keeps an array of the function OIDs
  that need to be applied, though only for those cases where the attribute
  type IDs do not match. This way no unnecessary casting will happen.
 
 
  Also, what are the implications for existing users of tupconvert?  Do we
  want to apply casting during ANALYZE for example?  AFAICS the patch
  shouldn't break any case that works today, but I don't see that there
  has been any analysis of this.
 
  I believe this part of the code should not impact existing users of
  tupconvert, as it relaxes a restriction that would previously have
  generated an error. It could have a small impact on performance, but
  only for cases where the attribute type IDs are different and are
  implicitly castable.

  Besides, existing users involve tupconvert in the case of inheritance,
  and in that case an exact match is expected.
 
  Regards,
  --Asif






Re: [HACKERS] WIP: store additional info in GIN index

2012-12-05 Thread Alexander Korotkov
On Wed, Dec 5, 2012 at 1:56 AM, Tomas Vondra t...@fuzzy.cz wrote:

 On 4.12.2012 20:12, Alexander Korotkov wrote:
  Hi!
 
  On Sun, Dec 2, 2012 at 5:02 AM, Tomas Vondra t...@fuzzy.cz wrote:
 
  I've tried to apply the patch with the current HEAD, but I'm getting
  segfaults whenever VACUUM runs (either called directly or from
 autovac
  workers).
 
  The patch applied cleanly against 9b3ac49e and needed a minor fix
 when
  applied on HEAD (because of an assert added to ginRedoCreatePTree),
 but
  that shouldn't be a problem.
 
 
  Thanks for testing! Patch is rebased with HEAD. The bug you reported was
  fixed.

 Applies fine, but I get a segfault in dataPlaceToPage at gindatapage.c.
 The whole backtrace is here: http://pastebin.com/YEPuWeuV

 The messages written into PostgreSQL log are quite variable - usually it
 looks like this:

 2012-12-04 22:31:08 CET 31839 LOG:  database system was not properly
 shut down; automatic recovery in progress
 2012-12-04 22:31:08 CET 31839 LOG:  redo starts at 0/68A76E48
 2012-12-04 22:31:08 CET 31839 LOG:  unexpected pageaddr 0/1BE64000 in
 log segment 00010069, offset 15089664
 2012-12-04 22:31:08 CET 31839 LOG:  redo done at 0/69E63638

 but I've seen this message too

 2012-12-04 22:20:29 CET 31709 LOG:  database system was not properly
 shut down; automatic recovery in progress
 2012-12-04 22:20:29 CET 31709 LOG:  redo starts at 0/AEAFAF8
 2012-12-04 22:20:29 CET 31709 LOG:  record with zero length at 0/C7D5698
 2012-12-04 22:20:29 CET 31709 LOG:  redo done at 0/C7D55E


 I wasn't able to prepare a simple testcase to reproduce this, so I've
 attached two files from my fun project where I noticed it. It's a
 simple DB + a bit of Python for indexing mbox archives inside Pg.

 - create.sql - a database structure with a bunch of GIN indexes on
tsvector columns on messages table

 - load.py - script for parsing mbox archives / loading them into the
 messages table (warning: it's a bit messy)


 Usage:

 1) create the DB structure
 $ createdb archives
 $ psql archives < create.sql

 2) fetch some archives (I consistently get SIGSEGV after first three)
 $ wget
 http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-01.gz
 $ wget
 http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-02.gz
 $ wget
 http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-03.gz

 3) gunzip and load them using the python script
 $ gunzip pgsql-hackers.*.gz
 $ ./load.py --db archives pgsql-hackers.*

 4) et voila - a SIGSEGV :-(


 I suspect this might be related to the fact that the load.py script uses
 savepoints quite heavily to handle UNIQUE_VIOLATION (duplicate messages).
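
 The pattern in question is roughly the following, shown as plain SQL rather
 than the Python in load.py (the column name is made up):

 BEGIN;
 SAVEPOINT before_msg;
 INSERT INTO messages (message_id) VALUES ('<some-message-id@example.com>');
 -- on a unique_violation error the loader issues:
 ROLLBACK TO SAVEPOINT before_msg;
 -- and moves on to the next message inside the same transaction
 COMMIT;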


Thanks for the bug report. It is fixed in the attached patch.

--
With best regards,
Alexander Korotkov.


ginaddinfo.3.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] DEALLOCATE IF EXISTS

2012-12-05 Thread Heikki Linnakangas

On 30.11.2012 12:05, Vik Reykja wrote:

On Tue, Nov 27, 2012 at 3:15 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:



I fail to see the point of DEALLOCATE IF EXISTS. Do you have a real use case
for this, or was this just a case of adding IF EXISTS to all commands for
the sake of completeness?

Usually the client knows what statements have been prepared, but perhaps
you want to make sure everything is deallocated in some error handling case
or similar. But in that case, you might as well just issue a regular
DEALLOCATE and ignore errors. Or even more likely, you'll want to use
DEALLOCATE ALL.
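
For example (a minimal sketch of that existing alternative; the statement
names are arbitrary):

PREPARE q1 AS SELECT 1;
PREPARE q2 AS SELECT 2;
DEALLOCATE ALL;  -- drops every prepared statement in the session; never errors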


Hmm.  The test case I had for it, which was very annoying in an "I want to
be lazy" sort of way, I am unable to reproduce now.  So I guess this
becomes a "make it like the others" case, and the community can decide
whether that's desirable.

In my personal case, which again I can't reproduce because it's been a
while since I've done it, DEALLOCATE ALL would have worked.  I was
basically preparing a query to work on it in the same conditions that it
would be executed in a function, and I was only working on one of these at
a time so ALL would have been fine.


Ok. Being the lazy person that I am, I'm going to just mark this as 
rejected then. There is no consensus that we should decorate every DDL 
command with IF EXISTS, and even if we did, it's not clear that it 
should include DEALLOCATE. But thanks for the effort anyway!


- Heikki




Re: [HACKERS] the number of pending entries in GIN index with FASTUPDATE=on

2012-12-05 Thread Heikki Linnakangas

On 28.11.2012 04:11, Kyotaro HORIGUCHI wrote:

3. pgstatginindex shows only version, pending_pages, and
   pending_tuples. Why don't you show the other properties such as
   entry pages, data pages, entries, and total pages as
   pgstatindex does?


I didn't expose those because they are only accurate as of the last VACUUM.
But if you think they are useful, I don't object to exposing them.


Ok, my point was the apparent consistency of the functions. I
don't have any distinct wish about this.


We can always add more fields later if there's a need.


I'll mark this as Ready for Committer.


Thanks, committed.
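
(For anyone wanting to try it: the function lives in the pgstattuple
extension and can be queried roughly like this; the index name here is
made up.)

CREATE EXTENSION pgstattuple;
SELECT version, pending_pages, pending_tuples
  FROM pgstatginindex('messages_body_idx');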

- Heikki




Re: [HACKERS] proposal: fix corner use case of variadic fuctions usage

2012-12-05 Thread Heikki Linnakangas

On 30.11.2012 12:18, Vik Reykja wrote:

I am trying to review this patch and on my work computer everything
compiles and tests perfectly. However, on my laptop, the regression tests
don't pass, failing with "cache lookup failed for type XYZ", where XYZ is
some number that does not appear to be any type OID.

I don't really know where to go from here. I am asking that other people
try this patch to see if they get errors as well.


I get the same error. I'll mark this as "waiting on author" in the
commitfest.


- Heikki




Re: [HACKERS] proposal: fix corner use case of variadic fuctions usage

2012-12-05 Thread Pavel Stehule
2012/12/5 Heikki Linnakangas hlinnakan...@vmware.com:
 On 30.11.2012 12:18, Vik Reykja wrote:

 I am trying to review this patch and on my work computer everything
 compiles and tests perfectly. However, on my laptop, the regression tests
 don't pass with cache lookup failed for type XYZ where XYZ is some
 number
 that does not appear to be any type oid.

 I don't really know where to go from here. I am asking that other people
 try this patch to see if they get errors as well.



 I get the same error. I'll mark this as waiting on author in the
 commitfest.

Was it with the new patch?

http://archives.postgresql.org/message-id/cafj8prdyxfxzcl2freslu-dpqaokou-ekky12tbzdo559pw...@mail.gmail.com

Regards

Pavel



 - Heikki




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Heikki Linnakangas

On 20.11.2012 21:25, Simon Riggs wrote:

On 19 November 2012 16:25, Robert Haas robertmh...@gmail.com wrote:


Beyond that, I think much of the appeal of the extension feature is
that it dumps as CREATE EXTENSION hstore; and nothing more.  That
allows you to migrate a dump between systems with different but
compatible versions of the hstore and have things work as intended.
I'm not opposed to the idea of being able to make extensions without
files on disk work ... but I consider it a niche use case; the
behavior we have right now works well for me and hopefully for others
most of the time.


Distributing software should only happen by files?

So why does Stackbuilder exist on the Windows binary?

Why does yum exist? What's wrong with ftp huh?

Why does CPAN?

I've a feeling this case might be a sensible way forwards, not a niche at all.


I have to join Robert in scratching my head over this. I don't 
understand what the use case is. Can you explain? I don't understand the 
comparison with stackbuilder, yum, ftp and CPAN. CPAN seems close to 
pgxn, but what does that have to do with this patch?


On 20.11.2012 11:08, Dimitri Fontaine wrote:
 Apparently I'm not the only one doing extensions without anything to
 compile, all SQL:

 http://keithf4.com/extension_tips_3

No doubt about that. I'm sure extensions written in pure SQL or PL/pgSQL 
are very common. But what does that have to do with this patch?


- Heikki




Re: [HACKERS] proposal: fix corner use case of variadic fuctions usage

2012-12-05 Thread Heikki Linnakangas

On 05.12.2012 10:38, Pavel Stehule wrote:

2012/12/5 Heikki Linnakangas hlinnakan...@vmware.com:

I get the same error. I'll mark this as waiting on author in the
commitfest.


it was with new patch?

http://archives.postgresql.org/message-id/cafj8prdyxfxzcl2freslu-dpqaokou-ekky12tbzdo559pw...@mail.gmail.com


Ah, no, sorry. The new patch passes the regression tests just fine. Never mind.

- Heikki




Re: [HACKERS] proposal: fix corner use case of variadic fuctions usage

2012-12-05 Thread Pavel Stehule
2012/12/5 Heikki Linnakangas hlinnakan...@vmware.com:
 On 05.12.2012 10:38, Pavel Stehule wrote:

  2012/12/5 Heikki Linnakangas hlinnakan...@vmware.com:

 I get the same error. I'll mark this as waiting on author in the
 commitfest.


 it was with new patch?


 http://archives.postgresql.org/message-id/cafj8prdyxfxzcl2freslu-dpqaokou-ekky12tbzdo559pw...@mail.gmail.com


 Ah, no, sorry. The new patch passes regressions just fine. Never mind..

:)

Pavel


 - Heikki




Re: [HACKERS] Removing PD_ALL_VISIBLE

2012-12-05 Thread Jeff Davis
On Fri, 2012-11-30 at 13:16 -0800, Jeff Davis wrote:
 I tried for quite a while to show any kind of performance difference
 between checking the VM and checking PD_ALL_VISIBLE on a 12-core box (24
 if you count HT).
 
 Three patches in question:
   1. Current unpatched master
   2. patch that naively always checks the VM page, pinning and unpinning
 each time
   3. Same as #2, but tries to keep buffers pinned (I had to fix a bug in
 this patch though -- new version forthcoming)

New patch attached.

Nathan Boley kindly lent me access to a 64-core box, and that shows a
much more interesting result. The previous test (on the 12-core)
basically showed no difference between any of the patches.

Now, I see why on the 64 core box: the interesting region seems to be
around 32 concurrent connections.

The left column is the concurrency, and the right is the runtime. This
test was for concurrent scans of a 350MB table (each process did 4 scans
and quit). Test program attached.

Patch 1 (scan test):

001 004.299533
002 004.434378
004 004.708533
008 004.518470
012 004.487033
016 004.513915
024 004.765459
032 006.425780
048 007.089146
064 007.908850
072 009.461419
096 013.098646
108 015.278592
128 019.797206

Patch 2 (scan test):

001 004.385206
002 004.596340
004 004.616684
008 004.832248
012 004.858336
016 004.689959
024 005.016797
032 006.857642
048 012.049407
064 025.774772
072 032.680710
096 059.147500
108 083.654806
128 120.350200

Patch 3 (scan test):

001 004.464991
002 004.95
004 004.562364
008 004.649633
012 004.628159
016 004.518748
024 004.768348
032 004.834177
048 007.003305
064 008.242714
072 009.732261
096 013.231056
108 014.996977
128 020.488570

As you can see, patch #2 starts to show a difference at around 32 and
completely falls over by 48 connections. This is expected because it's
the naive approach that pins the VM page every time it needs it.

Patch #1 and #3 are effectively the same, subsequent runs (and with more
measurements around concurrency 32) show that the differences are just
noise (which seems to be greater around the inflection point of 32). All
of the numbers that seem to show any difference can end up with patch #1
better or patch #3 better, depending on the run.

I tried the delete test, too, but I still couldn't see any difference.
(I neglected to mention in my last email: I aborted after each delete so
that it would be repeatable). The inflection point there is
significantly lower, so I assume it must be contending over something
else. I tried making the table unlogged to see if that would change
things, but it didn't change much. This test only scales linearly to
about 8 or so. Or, there could be something wrong with my test.

So, I conclude that contention is certainly a problem for scans for
patch #2, but patch #3 seems to fix that completely by holding the
buffer pins. The deletes are somewhat inconclusive, but I just can't see
a difference.

Holding more pins does have a distributed cost in theory, as Tom points
out, but I don't know where to begin testing that. We'll have to make a
decision between (a) maintaining the extra complexity and doing the
extra page writes involved with PD_ALL_VISIBLE; or (b) holding onto one
extra pin per table being scanned. Right now, if PD_ALL_VISIBLE did not
exist, it would be pretty hard to justify putting it in as far as I can
tell.

Regards,
Jeff Davis

#include <libpq-fe.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/wait.h>

void
test(char *query, int niter)
{
  PGconn	*conn;
  PGresult	*result;
  int		 i;

  conn = PQconnectdb("host=/tmp dbname=postgres");
  if (PQstatus(conn) != CONNECTION_OK)
    {
      fprintf(stderr, "connection failed!\n");
      exit(1);
    }

  result = PQprepare(conn, "", query, 0, NULL);
  if (PQresultStatus(result) != PGRES_COMMAND_OK)
    {
      fprintf(stderr, "PREPARE failed: %s", PQerrorMessage(conn));
      PQclear(result);
      exit(1);
    }
  PQclear(result);

  for (i = 0; i < niter; i++)
    {
      result = PQexecPrepared(conn, "", 0, NULL, NULL, NULL, 0);
      if (PQresultStatus(result) != PGRES_TUPLES_OK)
	{
	  fprintf(stderr, "EXECUTE PREPARED failed: %s\n", PQerrorMessage(conn));
	  PQclear(result);
	  exit(1);
	}
      PQclear(result);
    }

  PQfinish(conn);
}

int
main(int argc, char *argv[])
{
  int	 niter;
  int	 nprocs;
  char	*query = "SELECT COUNT(*) FROM foo";
  int	 i;
  pid_t *procs;
  struct timeval tv1, tv2;

  if (argc != 3)
    {
      fprintf(stderr, "expected 3 arguments, got %d\n", argc);
      exit(1);
    }

  nprocs = atoi(argv[1]);
  niter = atoi(argv[2]);

  procs = malloc(sizeof(pid_t) * nprocs);

  gettimeofday(&tv1, NULL);

  for (i = 0; i < nprocs; i++)
    {
      pid_t pid = fork();
      if (pid == 0)
	{
	  test(query, niter);
	  exit(0);
	}
      else
	{
	  procs[i] = pid;
	}
    }

  for (i = 0; i < nprocs; i++)
    {
      int status;
      waitpid(procs[i], &status, 0);
      if (!WIFEXITED(status))
	{
	  fprintf(stderr, "child did not exit!\n");
	  exit(1);

Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-05 Thread Simon Riggs
On 5 December 2012 00:16, Josh Berkus j...@agliodbs.com wrote:

 Sure, and the DevOps staff would be using the EXPLAIN feature (if we had
 it).  After which they could do little anyway except complain to the ORM
 authors, who might or might not give a damn.  I don't see that there's
 enough value-added from what you suggest to justify the development
 time.

 You're still thinking of a schema change as a SQL script.  ORM-based
 applications usually do not run their schema changes as SQL scripts,
 thus there's nothing to EXPLAIN.  Anything which assumes the presence of
 a distinct, user-accessible SQL script is going to leave out a large
 class of our users.

And anything which assumes the *absence* of a manual script is also
leaving out a large class of users. ORMs are very important, but not
the only thing we serve.

Please assume that "script" meant a set of SQL statements that are
executed in a specific sequence to change a database model from one
version to another. Anything which requires editing of all (or worse,
just some) of the SQL statements is not a good solution. For ORMs,
this requires each ORM to make its own change to support that
functionality and to have a separate mode where it is used. For manual
scripts, this requires specific editing, which fails, as already
described. Either way EXPLAIN is bad, since editing/separate modes can
introduce bugs.

I think we need a parameter called

schema_change_reporting = off (default) | on   [USERSET]

which displays relevant statistics/reports about the actions taken by
DDL statements. That will also highlight locks and the need to reduce
their lock levels.

That's best used as a function to turn it on and then a function to
produce the report.

 However, as I said, if we had the EXPLAIN ALTER, we could use
 auto-explain to log the ALTER plans (finally, a good use for
 auto-explain).  So that's a workable workaround. And EXPLAIN ALTER would
 offer us more flexibility than any logging option, of course.

Auto explain executes things twice, which is not possible for DDL, so
it won't work.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-05 Thread John R Pierce

On 12/5/2012 1:42 AM, Simon Riggs wrote:

I think we need a parameter called

schema_change_reporting = off (default) | on   [USERSET]

which displays relevant statistics/reports about the actions taken by
DDL statements. That will also highlight locks and the need to reduce
their lock levels.


where does this get displayed?   is it just tossed into the postgres log 
files?








Re: [HACKERS] [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Simon Riggs
On 5 December 2012 00:35, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 So apparently this is something we broke since Nov 18.  Don't know what
 yet --- any thoughts?

 Further experimentation shows that reverting commit
 ffc3172e4e3caee0327a7e4126b5e7a3c8a1c8cf makes it work.  So there's
 something wrong/incomplete about that fix.

 This is a bit urgent since we now have to consider whether to withdraw
 9.2.2 and issue a hasty 9.2.3.  Do we have a regression here since
 9.2.1, and if so how bad is it?

I'll look at this now.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Dimitri Fontaine
Heikki Linnakangas hlinnakan...@vmware.com writes:
 No doubt about that. I'm sure extensions written in pure SQL or PL/pgSQL are
 very common. But what does that have to do with this patch?

This patch is all about enabling users to create extensions without
having to ship them as root on the file system of the database
server(s) first.

When you have to code your extension in C, you know you're in for
shipping an executable binary (.so, .dylib or .dll), and for security
reasons it's well understood that you will have to get root privileges
on the server's file system to ship your binary before you are able to
ask PostgreSQL to load it and execute the code in there.

When you can code your extension using dynamic code such as SQL or
PL/pgSQL, PL/pythonu or PL/perl, there's absolutely no good reason I can
see to have to ship it on the server's file system first.

Technically creating an extension inline (sending its definition in
the CREATE EXTENSION query itself) solves the problem of having to
access the server's file system as root.
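
For illustration, the inline form looks roughly like this (using the syntax
exercised in the review downthread; the extension name and body here are
made up):

CREATE EXTENSION pair WITH VERSION '1.0' AS
$pair$
  CREATE TYPE pair AS (k text, v text);
$pair$;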

Then, pg_dump will next include CREATE EXTENSION foo; as usual, and at
pg_restore time that accesses files on the file system. But maybe you
still are not granted root access to the server's file system on the
pg_restore target, right? So now you need to be able to include the
extension's script in the dump.

Now, we don't want to have more than one kind of extension. That's what
we all learnt when reviewing my proposal from last year. Having
more than one way to ship an extension is good; having two different
animals with two different incompatible behaviors named the same thing
is bad. The solution we found is to be able to include an
extension's script in pg_dump's output, and that's what my current
patch implements, per last year's review.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Review of Row Level Security

2012-12-05 Thread Kohei KaiGai
Thanks for your review, in spite of the large number of lines.

My comments are below.

2012/12/4 Simon Riggs si...@2ndquadrant.com:
 Patch looks good and also like it will/can be ready for 9.3. I'm happy
 to put time into this as committer and/or reviewer and take further
 responsibility for it, unless others wish to.

 LIKES

 * It's pretty simple to understand and use

 * Check qual is stored in pre-parsed form. That is fast, and also
 secure, since changing search_path of the user doesn't change the
 security check at all. Nice.

 * Performance overhead is low, integration with indexing is clear and
 effective and it works with partitioning

 * It works, apart from design holes listed below, easily plugged IMHO


 DISLIKEs

 * Who gets to see stats on the underlying table? Are the stats
 protected by security? How does ANALYZE work?

I think ANALYZE should run on the raw tables without the row-security
policy. Even though statistics are a gray zone, they are not a complete set
of the raw table contents, so all one can do is infer the original data
from the processed statistical values. The situation is similar to iterating
probes using PK/FK violations. In general, this is called a covert channel,
and it is out of scope for a regular access control mechanism (including MAC).
So I don't think we need special protection on pg_stat even if row-security
is configured.

 * INSERT ignores the SECURITY clause, on the ground that this has no
 meaning. So its possible to INSERT data you can't see. For example,
 you can insert medical records for *another* patient, or insert new
 top secret information. This also causes a security hole... since
 inserted rows can violate defined constraints, letting you know that
 other keys you can't see exist. Why don't we treat the SECURITY clause
 as a CHECK constraint? That makes intuitive sense to me.

 * UPDATE allows you to bypass the SECURITY clause, to produce new rows
 that you can't see. (Not good). But you can't get them back again, cos
 you can't see them.

The above two comments suggest to me that you are proposing to apply
checks at both the row-scanning stage (UPDATE case) and the row-modifying
stage (INSERT and UPDATE), to prevent touchable rows from getting
away anywhere.
In the previous discussion, it was suggested that we could implement this
feature using a before-row trigger. However, I love the idea of supporting
the same row-security policy integrated with a CHECK constraint; that
removes the need for individual users to define triggers.
One problem is the case where a row-security policy contains a SubLink node.
I expect it will take some special-case handling, but I also guess it is
not that hard to implement.
Let me investigate the code around here.

 * TRUNCATE works, and allows you to remove all rows of a table, even
 ones you can't see to run a DELETE on. Er...

It was my oversight. My preference is to rewrite the TRUNCATE command
as a DELETE statement in the case where a row-security policy is active on
the target table.
In this case, a NOTICE message may be helpful so that users do not assume
the table is always empty after the command.

 None of those things are cool, at all.

 Oracle defaults to putting VPD on all event types: INSERT, UPDATE,
 DELETE, SELECT. ISTM we should be doing the same, not just say we can
 add an INSERT trigger if you want.

 Adding a trigger just begs the question as to why we are bothering in
 the first place, since this functionality could already be added by
 INSERT, UPDATE or DELETE triggers, if they are a full replacement for
 this feature. The only answer is ease of use

 We can easily add syntax like this

 [ROW SECURITY CHECK (  ) [ON [ ALL | INSERT, UPDATE, DELETE, SELECT 
 [..,

 with the default being ALL

I think it is a flaw of Oracle. :-)
When a user can define a leaky function, it becomes possible to leak the
contents of invisible rows at the moment the executor fetches the rows,
prior to the modification stage, even if we allow configuring individual
row-security policies for SELECT and for DELETE or UPDATE commands.
My preference is one policy on a particular table for all the commands.

 * The design has nothing at all to do with SECURITY LABELs. Why did we
 create them, I wonder? I understood we would have row-level label
 security. Doesn't that require us to have a data type, such as
 reglabel or similar enum? Seems strange. Oracle has two features:
 Oracle Label Security and Row Level Security -

I think it should be implemented as the next step. It additionally takes
two independent features: (1) functionality to inject a column to store the
security label at table definition time, and (2) functionality to assign a
default security label when a new row is inserted.
As Oracle constructs OLS on top of the VPD feature, the base row-security
feature should be upstreamed first.

 OTHER

 * The docs should explain a little better how to optimize using RLS.
 Specifically, the fact that indexable operators are marked leakproof
 and thus can be optimized ahead of the rlsquals. The docs say rls
 quals 

Re: [HACKERS] Fwd: question on foreign key lock

2012-12-05 Thread Filip Rembiałkowski
Robert, thank you for  the answer.

1. "need exclusive lock anyway to add triggers."
Why does adding a trigger need an exclusive lock?
Someone would say blocking reads is not needed (since the possible trigger
events are Insert/Update/Delete/Truncate).

2. "will create a risk of deadlock."
From a user perspective, a risk of deadlock is sometimes better than
excessive locking. Transactional DDL users should be prepared for
exceptions/retries anyway.

3. I made a naive test of simply changing AccessExclusiveLock to
ExclusiveLock, and seeing how many regression tests it breaks. It
breaks none :-)
Current Git head gives me 2 fails/133 tests regardless of this change.
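
For reference, the statement in question is just an ordinary foreign key
addition, e.g. (table names made up); this is what ends up taking the
AccessExclusiveLock on the referenced table, per the comment quoted below:

CREATE TABLE b (id int PRIMARY KEY);
CREATE TABLE a (b_id int);
ALTER TABLE a ADD FOREIGN KEY (b_id) REFERENCES b (id);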


regards,
Filip










On Mon, Nov 12, 2012 at 5:20 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Nov 8, 2012 at 3:45 AM, Filip Rembiałkowski
 filip.rembialkow...@gmail.com wrote:
 maybe this is a better group for this question?

  I can't see why creating a foreign key on table A referencing table B
  generates an AccessExclusiveLock on B.
  It seems (to a layman :-) ) that only writes to B should be blocked.

 I'm really interested if this is either expected effect or any open TODO
 item or suboptimal behavior of postgres.

 This comment explains it:

 /*
  * Grab an exclusive lock on the pk table, so that someone doesn't delete
  * rows out from under us. (Although a lesser lock would do for that
  * purpose, we'll need exclusive lock anyway to add triggers to the pk
  * table; trying to start with a lesser lock will just create a risk of
  * deadlock.)
  */
  pkrel = heap_openrv(fkconstraint->pktable, AccessExclusiveLock);

 Concurrent DDL is something that's been discussed in detail on this
 list in the past; unfortunately, there are some tricky race conditions
 around the shared invalidation queue and SnapshotNow that make it hard to
 implement properly.  I'm hoping to have some time to work on this at
 some point, but it hasn't happened yet.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




Re: [HACKERS] Switching timeline over streaming replication

2012-12-05 Thread Amit Kapila
On Tuesday, December 04, 2012 10:01 PM Heikki Linnakangas wrote:
 After some diversions to fix bugs and refactor existing code, I've
 committed a couple of small parts of this patch, which just add some
 sanity checks to notice incorrect PITR scenarios. Here's a new version
 of the main patch based on current HEAD.

After testing with the new patch, the following problems are observed. 

Defect - 1: 

1. start primary A 
2. start standby B following A 
3. start cascade standby C following B. 
4. start another standby D following C. 
5. Promote standby B. 
6. After a successful timeline switch in cascade standbys C & D, stop D. 
7. Restart D, Startup is successful and connecting to standby C. 
8. Stop C. 
9. Restart C, startup is failing. 

Defect-2: 
1. start primary A 
2. start standby B following A 
3. start cascade standby C following B. 
4. Start another standby D following C. 
5. Execute the following commands in the primary A. 
   create table tbl(f int); 
   insert into tbl values(generate_series(1,1000)); 
6. Promote standby B. 
7. Execute the following commands in the primary B. 
   insert into tbl values(generate_series(1001,2000)); 
   insert into tbl values(generate_series(2001,3000)); 
8. Stop standby D normally and restart D. Restart is failing. 
9. Stop standby C normally and restart C. Restart is failing. 

Note: "Stop the node" means doing a smart shutdown.

With Regards,
Amit Kapila.





Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-05 Thread Petr Jelinek
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Josh Berkus
 Sent: 05 December 2012 01:17
 To: Tom Lane
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] ALTER TABLE ... NOREWRITE option
 
 However, as I said, if we had the EXPLAIN ALTER, we could use auto-explain
 to log the ALTER plans (finally, a good use for auto-explain).  So that's a
 workable workaround. And EXPLAIN ALTER would offer us more flexibility
 than any logging option, of course.
 

+1

And for the minority who want to check for themselves (like me) in their
tooling, this is also a usable solution.

Regards
Petr Jelinek





Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-05 Thread Simon Riggs
On 5 December 2012 09:46, John R Pierce pie...@hogranch.com wrote:
 On 12/5/2012 1:42 AM, Simon Riggs wrote:

 I think we need a parameter called

 schema_change_reporting = off (default) | on   [USERSET]

 which displays relevant statistics/reports about the actions taken by
 DDL statements. That will also highlight locks and the need to reduce
 their lock levels.


 where does this get displayed?   is it just tossed into the postgres log
 files?

Good question. Where would you like?

It needs to be per-session rather than global.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Is ramdisk usefull ?

2012-12-05 Thread Gilles Fauvie
Hello everybody,

There is a message from Tom Lane (2008!) about ramdisks:

begin /


 So, IMHO, saying trust your OS + PostgreSQL is not a 100% perfect
 approach for the people who are asking to keep their objects on RAM,
 even though I know that there is nothing we can say right now.

Well, nothing is a 100% solution.  But my opinion is that people who
think they are smarter than an LRU caching algorithm are typically
mistaken.  If the table is all that heavily used, it will stay in memory
just fine.  If it's not sufficiently heavily used to stay in memory
according to an LRU algorithm, maybe the memory space really should be
spent on something else.

Now there are certainly cases where a standard caching algorithm falls
down --- the main one I can think of offhand is where you would like to
give one class of queries higher priority than another, and so memory
space should preferentially go to tables that are needed by the first
class.  But if that's your problem, "pin these tables in memory" is
still an awfully crude solution to the problem.  I'd be inclined to
think instead about a scheme that lets references made by
higher-priority queries bump buffers' use-counts by more than 1,
or some other way of making the priority considerations visible to an
automatic cache management algorithm.

regards, tom lane


end /

I'm working on a migration project (IDS2 to PostgreSQL) for a big
organisation in Belgium.
We need very quick access to a PostgreSQL table. I did some tests with
and without a ramdisk, and I noticed that queries are faster using
the ramdisk (context: more than 1,000k reads a day and only 5 or 6 writes a
day).
Is this message still up to date? Can you confirm that it's better to let
the PostgreSQL engine do its own work? Is there another solution to give
more or less priority to a query or a table?
Giving less priority to a query also interests me, because we need to
launch big batch queries. My first idea was to create a slave server using
streaming replication. Maybe there is another solution?

Best Regards,

Gilles Fauvie



-- 

*Gilles Fauvie - Open Source Consultant
*Email: gfau...@integer.be
LinkedIn: http://be.linkedin.com/in/fauviegilles

*INTEGER SPRL/BVBA
*P. +32 (0)67 88 36 13 | F. +32 (0)67 47 52 04
http://www.integer.be | Email: customerc...@integer.be

To contact our technical support, we highly recommend that you use the
email address customerc...@integer.be. This will help us save time and we
can respond more quickly to your questions.



[HACKERS] Review: Dumping an Extension's Script

2012-12-05 Thread Ali Dar
I have done a high level review of the patch, and here are my comments:

1) I don't know if the community is really looking for this functionality. I
followed the thread a bit and apparently there is a disagreement over it. I
will not comment on that. Let the big guys decide if they really want this
feature.

2) I think the comments are very limited; it would be really great if you
could add thorough comments for whatever new code you have added. A couple of
examples are given below:
i) You have added a 'script' variable to the 'ExtensionControlFile' structure,
but no comment. All the other variables have one; you should follow suit so
we can understand what's going on at first glance.
ii) What is the need for the 'require' option?
It would be great if you could go through all the comments, add the
missing ones and improve the old ones.
3) There are spelling mistakes in the existing comments, for example:
i) File: pg_dump.c, line 850
ii) File: pg_dump.h, line 136
It would be a good idea to run a spell check on your old and new comments.

4) The 'if_no_exits' flag of the new grammar rule is set to false, even though
the rule states that the extension is created as 'IF NOT EXISTS'.

5) Why is the 'relocatable' option added to the new grammar rule for CREATE
EXTENSION? In a regular 'create extension' statement, when the 'schema' option
is specified, it means that the extension is relocatable. So why is this
option needed? If there is a specific reason why the 'schema' option wasn't
used, then please explain.

6) I am unable to figure out why the new 'require' option is needed (comments
would have helped). Is it because when we dump the extension with the -X
flag, it will first dump the referenced extension?
I created the following extension (hstore was already created):
create extension testex with version '1' requires 'hstore' as
$testex$ create type testtype as(a char) $testex$;

I then dumped the extension using the -X option; only the 'testex' extension
was dumped and not 'hstore'. Looking at the query used to fetch extensions
in the dump, the extensions it depends on should also be fetched, but it's
not working like that.

7) You have removed an old PG comment in pg_dump.c, line 7526, above the
following check:
if (!extinfo->dobj.dump || dataOnly)
    return;
I guess it was done by mistake?

8) In extension.c the following check is added:
/*
 * If the extension script is not in the command, then the user is not
 * the extension packager and we want to read about relocatable and
 * requires in the control file, not in the SQL command.
 */
if (d_relocatable || d_requires)
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("parameter \"%s\" is only expected when using CREATE EXTENSION AS",
                    d_relocatable ? "relocatable" : "requires")));

Is the above code really reachable? If the user has not specified a
script (i.e. the old CREATE EXTENSION syntax is used), then 'd_relocatable'
and 'd_requires' will never be true, because the old 'create extension'
syntax doesn't allow them.


9) For every extension being dumped, and then for every object inside that
extension, the code traverses all the fetched items of the database to
check whether that item belongs to that specific extension. Because of that,
'DumpableObject **dobjs' is traversed fully every time a single extension
is dumped. This seems to be a big performance hit for a huge database,
and if many extensions are dumped in a huge database, the dump
might become very slow. Can you think of any other scheme? Maybe you should
follow PG's design, and fetch the extension objects in dumpExtension() and
dump them there.

10) pg_dumpall is not handled at all. The user can't use the -X switch if he
wants to dump all the databases with extensions.

Regards,
Ali Dar


Re: [HACKERS] [WIP] pg_ping utility

2012-12-05 Thread Alvaro Herrera
Phil Sorber wrote:
 On Mon, Dec 3, 2012 at 11:59 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:

  - Same thing with this example:
  +   <para>
  +    Standard Usage:
  +<screen>
  + <prompt>$</prompt> <userinput>pg_isready</userinput>
  + <prompt>$</prompt> <userinput>echo $?</userinput>
  + <computeroutput>0</computeroutput>
  +</screen>
  +   </para>
  For the time being PQPING_OK returns 0 because it is on top of the enum
  PGPing, but this might change if, for one reason or another, the order of
  outputs is changed.
 
 So I understand what you mean by "the ordering might change", but this
 is actual output from the shell. I'm not sure how to convey that
 sentiment properly here and still have a real example. Perhaps just
 remove the example?

No, I think it is the reference docs on the returned value that must be
fixed.  That is, instead of saying that the return value corresponds to
the enum values, you should be saying that it will return
<literal>0</literal> if it's okay, 1 in another case and 2 in yet
another case.  And then next to the PQping() enum, add a comment that
the values must not be messed around with, because pg_isready exposes
them to users and shell scripts.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Review: Dumping an Extension's Script

2012-12-05 Thread Cédric Villemain
 1) I don't know if community is really looking for this functionality, I
 followed the thread a bit and appraently there is a disagreement over it. I
 will not comments on that. Let the big guys decide if they really want this
 feature.

I saw very few distinct people voting on that.
I think this is a killer feature to improve extension usage.

The use case is very clear and confirmed (and has been explained by Dimitri
already: deploy an extension into several hundred databases without setting up
tons of new connections, IP/user/password associations, etc.; just a single SQL
command).
BTW I can dump everything from port 5432 except extensions (I can even run
pg_basebackup instead of rsync!).

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation




Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-05 Thread Tom Lane
John R Pierce pie...@hogranch.com writes:
 On 12/5/2012 1:42 AM, Simon Riggs wrote:
 I think we need a parameter called
 
 schema_change_reporting = off (default) | on   [USERSET]
 
 which displays relevant statistics/reports about the actions taken by
 DDL statements. That will also highlight locks and the need to reduce
 their lock levels.

 where does this get displayed?   is it just tossed into the postgres log 
 files?

And perhaps more to the point, what's the advantage compared to
good old log_statement = ddl?

regards, tom lane




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-05 Thread Robert Haas
On Tue, Dec 4, 2012 at 2:05 PM, Jan Wieck janwi...@yahoo.com wrote:
 So the question on the table is which of these three intervals
 should be GUCs, and what values to use if they aren't.

 I could live with all the above defaults, but would like to see more
 comments on them.

I largely agree with what's already been said.  The only interval that
seems to me to maybe need its own knob is the total time after which
the autovacuum worker will give up.  If we set it to 2 *
deadlock_timeout, some people might find that a reason to raise
deadlock_timeout.  Since people *already* raise deadlock_timeout to
obscenely high values (a minute?  an hour???) and then complain that
things blow up in their face, I think there's a decent argument to be
made that piggybacking anything else on that setting is unwise.

Against that, FWICT, this problem only affects a small number of
users: Jan is the only person I can ever remember reporting this
issue.  I'm not dumb enough to think he's the only person who it
affects; but my current belief is that it's not an enormously common
problem.  So the main argument I can see against adding a GUC is that
the problem is too marginal to justify a setting of its own.  What I
really see as the key issue is: suppose we hardcode this to say 2
seconds.  Is that going to fix the problem effectively for 99% of the
people who have this problem, or for 25% of the people who have this
problem?  In the former case, we probably don't need a GUC; in the
latter case, we probably do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-05 Thread Alvaro Herrera
Here's a first attempt at a new documentation chapter.  This goes in
the "Server Programming" part, just after the SPI chapter.

I just noticed that worker_spi could use some more sample code; for
example, auth_counter was getting its own LWLock and also its own shmem
area, which would be helpful to demonstrate, I think.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
<!-- doc/src/sgml/bgworker.sgml -->

<chapter id="bgworker">
 <title>Background Worker Processes</title>

 <indexterm zone="bgworker">
  <primary>Processes</primary>
  <secondary>Additional</secondary>
 </indexterm>

 <para>
  PostgreSQL can be extended to run user-supplied code in separate processes.
  Such processes are started, stopped and monitored by <command>postgres</command>,
  which permits them to have a lifetime closely linked to the server's status.
  These processes have the option to attach to <productname>PostgreSQL</>'s
  shared memory area and connect to databases internally.
 </para>

 <warning>
  <para>
   There are considerable robustness and security risks in using background
   worker processes, because being written in the <literal>C</> language
   gives them unrestricted access to data.  Administrators wishing to enable
   modules that include background worker processes should exercise extreme
   caution.  Only carefully audited modules should be permitted to run
   background worker processes.
  </para>
 </warning>

 <para>
  Only modules listed in <varname>shared_preload_libraries</> can run
  background workers.  A module wishing to run a background worker needs
  to register it by calling
  <function>RegisterBackgroundWorker(<type>BackgroundWorker *worker</type>)</function>.
  The structure <structname>BackgroundWorker</structname> is defined thus:
<programlisting>
typedef struct BackgroundWorker
{
    char       *bgw_name;
    int         bgw_flags;
    BgWorkerStartTime bgw_start_time;
    int         bgw_restart_time;   /* in seconds, or BGW_NEVER_RESTART */
    bgworker_main_type  bgw_main;
    void       *bgw_main_arg;
    bgworker_sighdlr_type bgw_sighup;
    bgworker_sighdlr_type bgw_sigterm;
} BackgroundWorker;
</programlisting>
  </para>

  <para>
   <structfield>bgw_name</> is a string to be used in log messages, process
   lists and similar contexts.
  </para>

  <para>
   <structfield>bgw_flags</> is a bitwise-or'd bitmask indicating the
   capabilities that the module would like to have.  Possible values are
   <literal>BGWORKER_SHMEM_ACCESS</literal> (requesting shared memory access)
   and <literal>BGWORKER_BACKEND_DATABASE_CONNECTION</literal> (requesting the
   ability to establish a database connection, through which it can later run
   transactions and queries).
  </para>

  <para>
   <structfield>bgw_start_time</structfield> is the server state during which
   <command>postgres</> should start the process; it can be one of
   <literal>BgWorkerStart_PostmasterStart</> (start as soon as
   <command>postgres</> itself has finished its own initialization; processes
   requesting this are not eligible for database connections),
   <literal>BgWorkerStart_ConsistentState</> (start as soon as a consistent state
   has been reached in a HOT standby, allowing processes to connect to
   databases and run read-only queries), and
   <literal>BgWorkerStart_RecoveryFinished</> (start as soon as the system has
   entered normal read-write state).  Note the last two values are equivalent
   in a server that's not a HOT standby.
  </para>

  <para>
   <structfield>bgw_restart_time</structfield> is the interval, in seconds, that
   <command>postgres</command> should wait before restarting the process,
   in case it crashes.  It can be any positive value, or
   <literal>BGW_NEVER_RESTART</literal>, indicating not to restart the process
   in case of a crash.
  </para>

  <para>
   <structfield>bgw_main</structfield> is a pointer to the function to run once
   the process is started.  <structfield>bgw_main_arg</structfield> will be
   passed to it as its only argument.  Note that
   <literal>MyBgworkerEntry</literal> is a pointer to a copy of the
   <structname>BackgroundWorker</structname> structure passed
   at registration time.
  </para>

  <para>
   <structfield>bgw_sighup</structfield> and <structfield>bgw_sigterm</> are
   pointers to functions that will be installed as signal handlers for the new
   process.
  </para>

  <para>Once running, the process can connect to a database by calling
   <function>BackgroundWorkerInitializeConnection(<parameter>char *dbname</parameter>, <parameter>char *username</parameter>)</function>.
   This allows the process to run transactions and queries using the
   <literal>SPI</literal> interface.  If <varname>dbname</> is NULL,
   the session is not connected to any particular database, but shared catalogs
   can be accessed.  If <varname>username</> is NULL, the process will run as
   the superuser created during <command>initdb</>.
  </para>

  <para>
   Signals are initially blocked when control reaches the
   <structfield>bgw_main</> function, and must be unblocked by it; this is to
   allow the process to 

Re: [HACKERS] [WIP] pg_ping utility

2012-12-05 Thread Phil Sorber
On Wed, Dec 5, 2012 at 8:53 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Phil Sorber wrote:
 On Mon, Dec 3, 2012 at 11:59 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:

  - Same thing with this example:
  +   <para>
  +    Standard Usage:
  +<screen>
  + <prompt>$</prompt> <userinput>pg_isready</userinput>
  + <prompt>$</prompt> <userinput>echo $?</userinput>
  + <computeroutput>0</computeroutput>
  +</screen>
  +   </para>
  For the time being PQPING_OK returns 0 because it is on top of the enum
  PGPing, but this might change if, for one reason or another, the order of
  outputs is changed.

 So I understand what you mean by the ordering might change, but this
 is actual output from the shell. I'm not sure how to convey that
 sentiment properly here and still have a real example. Perhaps just
 remove the example?

 No, I think it is the reference docs on the returned value that must be
 fixed.  That is, instead of saying that the return value correspond to
 the enum values, you should be saying that it will return
 literal0/literal if it's okay, 1 in another case and 2 in yet
 another case.  And then next to the PQping() enum, add a comment that
 the values must not be messed around with because pg_isready exposes
 them to users and shell scripts.

+1 I'm on board with this.


 --
 Álvaro Herrera                http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Review: Dumping an Extension's Script

2012-12-05 Thread Dimitri Fontaine
Hi,

Thanks for your review!

Ali Dar ali.munir@gmail.com writes:
 1) I don't know if community is really looking for this functionality, I
 followed the thread a bit and appraently there is a disagreement over it. I
 will not comments on that. Let the big guys decide if they really want this
 feature.

Let's open the extension feature to users who are not at ease with
building packages for their distribution of choice, or to users who are
not granted root privileges on their database servers.

After all you don't even need to be a superuser to CREATE an EXTENSION.

 2) I think comments are very limited, I would be really great if you can
 add thorough comments of whatever new that you have added. Couple of
 examples are given below:

I've added some comments in the attached new revision of the patch, and
fixed the errors you've been listing.

 ii) What is the need of require option?
 It would be great if you can go through all the comments and try to add the
 missing comments and improve the old ones.

When CREATE EXTENSION is given the extension's script directly in the SQL
command itself, the control file properties are not to be found in a
separate file. Most of those properties relate to the packaging and the
reading of the file, so we don't need them by construction, but that's not
the case for two of them.

The parameter require is used in the create extension code to be able
to set the search_path correctly before running the extension's script,
so that dependencies are found automatically.

The parameter relocatable ends up in the pg_extension catalogs so that we
are able to generate an ERROR if the user attempts to ALTER EXTENSION
… SET SCHEMA, and we need a way to provide it in the SQL command now (so
that pg_dump emits a non-lossy command).

Here's the comment I've been adding to gram.y to explain that. Note that
I wouldn't expect to find such a comment at this place, but I didn't see
a better place to detail it…

/*
 * This option is needed only when the script is given
 * inline. We then don't have any control file but still
 * need to set the search_path according to the dependency
 * list when running the extension's script.
 */
if (strcmp($1, "requires") == 0)

 3) There are spelling mistakes in the existing comments, for example:

Fixed, thanks.

 4) The 'if_no_exits' flag of the new grammar rule is set to false, even though
 the rule states that the extension is created with 'IF NOT EXISTS'.

Oh, a copy/paste bug, the worst kind. Fixed, thanks.

 5) Why is the 'relocatable' option added to the new grammar rule of create
 extension? In a regular 'create extension' statement, when the 'schema' option
 is specified, it means that the extension is relocatable. So why is this option
 needed? If there is a specific reason why the 'schema' option wasn't used, then
 please explain.

When the schema option is specified, it does NOT mean that the extension
is relocatable. The relocatable option is chosen by the extension's
author to cope with some dependencies and other problems, and is
important for reading the script (@extschema@ substitutions). Also, it's
then essential to disallow ALTER EXTENSION … SET SCHEMA … even if that
very command could work, so we keep that parameter in the catalogs.

It's one of the only two control parameters that still need to be
passed in the SQL command now.
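
For illustration, a minimal sketch of the inline form this patch enables,
modeled on the reviewer's example quoted further down; the exact spelling of
the relocatable clause is up to the patch's grammar and may differ:

-- sketch only: an SQL-only extension created inline, with its dependency
-- list given in the command instead of in a control file
CREATE EXTENSION myext WITH VERSION '1' REQUIRES 'hstore' AS
$myext$
  CREATE TYPE myext_pair AS (a int, b int);
$myext$;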

 6) I am unable to figure out why the new 'require' option is needed (comments
 would have helped). Is it because, when we dump the extension with the -X
 flag, it will first dump the referenced extension?

That's not how it works, no. We can of course review that, but I don't
currently see any reason to force dumping required extensions scripts.

The 'require' processing didn't change at all in this patch. You are
just allowed to give the list in the SQL command rather than the control
file, that's all.

 I created the following extension(hstore was already created):
 create extension testex with version '1' requires 'hstore' as
 $testex$ create type testtype as(a char) $testex$;

 I then dumped the extension using -X option, only 'testex' extension was
 dumped and not 'hstore'.

I suppose that at restore time you will have to have hstore on disk
with its .so file, but maybe won't have testex readily deployed on the
new server. So my understanding is that what's happening is what needs
to happen. What do you think?

 7) You have removed an old PG comment in pg_dump.c line 7526 above the
 following check:

Oops. Added back. Thanks.

 8) In extension.c the following check is added:
 /*
  * If the extension script is not in the command, then the user is not
  * the extension packager and we want to read about relocatable and
  * requires in the control file, not in the SQL command.
  */
 if (d_relocatable || d_requires)
 ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("parameter \"%s\" is only expected when using CREATE EXTENSION AS",
 d_relocatable ? "relocatable" : 

Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-05 Thread Kevin Grittner
Robert Haas wrote:

 Since people *already* raise deadlock_timeout to obscenely high
 values (a minute? an hour???) and then complain that things blow
 up in their face, I think there's a decent argument to be made
 that piggybacking anything else on that setting is unwise.

If people are really doing that, then I tend to agree. I wasn't
aware of that practice.

 Against that, FWICT, this problem only affects a small number of
 users: Jan is the only person I can ever remember reporting this
 issue. I'm not dumb enough to think he's the only person who it
 affects; but my current belief is that it's not an enormously
 common problem. So the main argument I can see against adding a
 GUC is that the problem is too marginal to justify a setting of
 its own. What I really see as the key issue is: suppose we
 hardcode this to say 2 seconds. Is that going to fix the problem
 effectively for 99% of the people who have this problem, or for
 25% of the people who have this problem? In the former case, we
 probably don't need a GUC; in the latter case, we probably do.

Given the fact that autovacuum will keep throwing workers at it to
essentially loop indefinitely at an outer level, I don't think the
exact setting of this interval is all that critical either. My gut
feel is that anything in the 2 second to 5 second range would be
sane, so I won't argue over any explicit setting within that range.
Below that, I think the overhead of autovacuum coming back to the
table repeatedly would probably start to get too high; above that
we could be causing some small, heavily-updated table to be
neglected by autovacuum -- especially if you get multiple
autovacuum workers tied up in this delay on different tables at the
same time.

Regarding how many people are affected, I have seen several reports
of situations where users claim massive impact on performance when
autovacuum kicks in. The reports have not included enough detail to
quantify the impact or in most cases to establish a cause, but this
seems like it could have a noticeable impact, especially if the
deadlock timeout was set to more than a second.

-Kevin




Re: [HACKERS] WIP json generation enhancements

2012-12-05 Thread David E. Wheeler
On Nov 26, 2012, at 11:12 AM, Peter Eisentraut pete...@gmx.net wrote:

 Although my intuition would be [], the existing concatenation-like
 aggregates return null for no input rows, so this probably ought to be
 consistent with those.

This annoys me at times, but I wrap such calls in COALESCE() and forget about 
it. So I agree to keep it consistent with other array-returning aggregate 
functions.

Best,

David





Re: [HACKERS] json accessors

2012-12-05 Thread David E. Wheeler
On Dec 4, 2012, at 10:05 AM, Josh Berkus j...@agliodbs.com wrote:

json_get(json, variadic text) = json
 
 Given that I already do the equivalent in Python, this would suit me
 well.  Not sure about other users ...

Well, given that sometimes you will have mixed arrays and objects, how would 
you distinguish 42 as an object key or an array index?

Best,

David





Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Heikki Linnakangas

On 05.12.2012 12:22, Dimitri Fontaine wrote:

Heikki Linnakangas hlinnakan...@vmware.com  writes:

No doubt about that. I'm sure extensions written in pure SQL or PL/pgSQL are
very common. But what does that have to do with this patch?


This patch is all about enabling users to create extension without
having to ship them as root on the file system of the database(s)
server(s) first.
...
When you can code your extension using dynamic code such as SQL or
PL/pgSQL, PL/pythonu or PL/perl, there's absolutely no good reason to
have to do the ship on the server's file system first that I can see.

Technically creating an extension inline (sending its definition in
the CREATE EXTENSION query itself) solves the problem of having to
access the server's file system as root.


Ok, I'm with you this far.


Then, next pg_dump will include CREATE EXTENSION foo; as usual and at
pg_restore time that access files on the file systems. But maybe you
still are not granted access to the server's file system as root on the
pg_restore target, right? So now you need to be able to include the
extension's script into the dump.


Now you lost me. I can see the need to install an extension without 
access to the filesystem - but it does not follow that you need to be 
able to dump an extension script. In general, I think you're confusing 
three things:


1. The way an extension is deployed. It could be by copying the files to 
the file system, by sending them over libpq, or shipped in .rpms by the 
OS, or something else.


2. The way an extension's files are laid out before it's deployed. 
Typically, you want to keep an extension's source code (whether it's C 
or SQL or plpython) in a version control system.


3. Being able to deploy extensions to the server without superuser or 
root access


I think it would make this discussion a lot clearer if we keep those 
concerns separate. It's useful to have a mechanism to deploy an 
extension over libpq. It's not clear to me if you're envisioning to 
change 2. I don't think we should; having a .sql file and a .control 
file seems perfectly fine to me.


I'd suggest that we just need a way to upload an extension to the server 
via libpq. Something like UPLOAD EXTENSION foo, which goes into COPY 
mode and you can stream over a zip file containing the .sql and .control 
file that make up the extension. The server would unzip the file into 
the right directory.
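
To make that concrete, a hypothetical session might look like the sketch
below; none of this syntax exists, it simply restates Heikki's proposal:

-- hypothetical only; UPLOAD EXTENSION is the command proposed above
UPLOAD EXTENSION foo;
-- the server would switch to COPY-in mode here, the client would stream
-- a zip archive containing foo.control and foo--1.0.sql, and the server
-- would unpack it into its extension directory
\.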


Now, point 3 is yet another issue. If you need to copy the extension 
files to /usr/share/, you need root (or similar) access on the 
filesystem. We could allow extensions to be located somewhere in the 
data directory instead. Like $PGDATA/extensions. But again, that would 
be an independent change from 1 and 2.


And I still don't understand why pg_dump needs to know about any of this...

- Heikki




Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-05 Thread Dimitri Fontaine
Simon Riggs si...@2ndquadrant.com writes:
 On 5 December 2012 09:46, John R Pierce pie...@hogranch.com wrote:
 On 12/5/2012 1:42 AM, Simon Riggs wrote:

 I think we need a parameter called

 schema_change_reporting = off (default) | on   [USERSET]

 which displays relevant statistics/reports about the actions taken by
 DDL statements. That will also highlight locks and the need to reduce
 their lock levels.


 where does this get displayed?   is it just tossed into the postgres log
 files?

 Good question. Where would you like?

What about pg_stat_ddl, a new system view?  It would maybe need to
have some xid/cid like ordering to be able to reproduce the script. It
could also maybe use the ddl_rewrite module I'm working on for the event
triggers framework, as far as normalized SQL goes.

The rewrite information would be a boolean column in that view, I guess.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] json accessors

2012-12-05 Thread David E. Wheeler
On Nov 28, 2012, at 4:10 PM, Merlin Moncure mmonc...@gmail.com wrote:

 Yes, it's iterative. And for deeply nested json it might be somewhat
 inefficient, although the parser is pretty fast AFAICT. But it's a start.
 
 not completely buying that: see comments below.  not supporting xpath
 style decompositions seems wrong to me.  IOW, json_get should be set
 returning (perhaps via wild cards in the keytext) or we need
 json_each.

The problem I see with the current proposal is that this limitation, it seems 
to me, would prevent the ability to index nested keys. If you're essentially 
composing and decomposing JSON values as you drill down, the intermediate JSON 
values between the original one and the final return value can't be indexed, 
can they?

For sufficiently large columns, I expect I would want a GIN index to speed JSON 
value extraction queries. Possible with this proposal?

Best,

David

PS: Sorry for the delayed replies, digging my way out of a couple weeks of back 
posts…





Re: [HACKERS] json accessors

2012-12-05 Thread Andrew Dunstan


On 12/05/2012 12:11 PM, David E. Wheeler wrote:

On Dec 4, 2012, at 10:05 AM, Josh Berkus j...@agliodbs.com wrote:


json_get(json, variadic text) = json

Given that I already do the equivalent in Python, this would suit me
well.  Not sure about other users ...

Well, given that sometimes you will have mixed arrays and objects, how would you 
distinguish 42 as an object key or an array index?



if the thing is an array, test to see if the string is a valid integer 
string, and if so use the integer value.
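
A hypothetical call just to illustrate that rule; json_get here is the
function proposed upthread, not something that exists today:

-- '1' is read as an array index because the value under 'tags' is an
-- array; whether elements are numbered from 0 or 1 is up to the patch
SELECT json_get('{"tags": ["red", "green", "blue"]}'::json, 'tags', '1');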



cheers

andrew




Re: [HACKERS] json accessors

2012-12-05 Thread Andrew Dunstan


On 12/05/2012 12:14 PM, David E. Wheeler wrote:

On Nov 28, 2012, at 4:10 PM, Merlin Moncure mmonc...@gmail.com wrote:


Yes, it's iterative. And for deeply nested json it might be somewhat
inefficient, although the parser is pretty fast AFAICT. But it's a start.

not completely buying that: see comments below.  not supporting xpath
style decompositions seems wrong to me.  IOW, json_get should be set
returning (perhaps via wild cards in the keytext) or we need
json_each.

The problem I see with the current proposal is that this limitation, it seems 
to me, would prevent the ability to index nested keys. If you're essentially 
composing and decomposing JSON values as you drill down, the intermediate JSON 
values between the original one and the final return value can't be indexed, 
can they?

For sufficiently large columns, I expect I would want a GIN index to speed JSON 
value extraction queries. Possible with this proposal?


Probably not.


cheers

andrew





Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Andres Freund
On 2012-12-05 19:13:10 +0200, Heikki Linnakangas wrote:
 On 05.12.2012 12:22, Dimitri Fontaine wrote:
 Heikki Linnakangas hlinnakan...@vmware.com  writes:
 No doubt about that. I'm sure extensions written in pure SQL or PL/pgSQL are
 very common. But what does that have to do with this patch?
 
 This patch is all about enabling users to create extension without
 having to ship them as root on the file system of the database(s)
 server(s) first.
 ...
 When you can code your extension using dynamic code such as SQL or
 PL/pgSQL, PL/pythonu or PL/perl, there's absolutely no good reason to
 have to do the ship on the server's file system first that I can see.
 
 Technically creating an extension inline (sending its definition in
 the CREATE EXTENSION query itself) solves the problem of having to
 access the server's file system as root.

 Ok, I'm with you this far.

 Then, next pg_dump will include CREATE EXTENSION foo; as usual and at
 pg_restore time that access files on the file systems. But maybe you
 still are not granted access to the server's file system as root on the
 pg_restore target, right? So now you need to be able to include the
 extension's script into the dump.

 Now you lost me. I can see the need to install an extension without access
 to the filesystem - but it does not follow that you need to be able to dump
 an extension script. In general, I think you're confusing three things:

 1. The way an extension is deployed. It could be by copying the files to the
 file system, by sending them over libpq, or shipped in .rpms by the OS, or
 something else.

 2. The way an extension's files are laid out before it's deployed.
 Typically, you want to keep an extension's source code (whether it's C or
 SQL or plpython) in a version control system.

 3. Being able to deploy extensions to the server without superuser or root
 access

 I think it would make this discussion a lot clearer if we keep those
 concerns separate. It's useful to have a mechanism to deploy an extension
 over libpq. It's not clear to me if you're envisioning to change 2. I don't
 think we should; having a .sql file and a .control file seems perfectly fine
 to me.

 I'd suggest that we just need a way to upload an extension to the server via
 libpq. Something like UPLOAD EXTENSION foo, which goes into COPY mode and
 you can stream over a zip file containing the .sql and .control file that
 make up the extension. The server would unzip the file into the right
 directory.

Not sure what is better here. Dimitri's way seems to be easier to manage
for people who maintain their database in update scripts and such, and
yours seems to be a bit simpler from the backend perspective.

 Now, point 3 is yet another issue. If you need to copy the extension files
 to /usr/share/, you need root (or similar) access on the filesystem. We
 could allow extensions to be located somewhere in the data directory
 instead. Like $PGDATA/extensions. But again, that would be an independent
 change from 1 and 2.

I think installing them into some global space is not a sensible
interim step. Having an UPLOAD EXTENSION in one database affect all other
databases or even clusters (because you e.g. updated the version) would
be really confusing.

Which leads to:

 And I still don't understand why pg_dump needs to know about any of this...

Extensions should be fully per-database and we want pg_dump backups to
be restorable into another database/cluster/server. So having a mode
for pg_dump that actually makes dumps that are usable for recovering
after a disaster seems sensible to me. Otherwise you need to redeploy
from the VCS or whatever, which isn't really what you want when
restoring a database backup.

Comparing the situation to the one where you have extensions provided by
the packaging system or by /contrib or whatever doesn't seem to be all
that valid to me.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] json accessors

2012-12-05 Thread David E. Wheeler
On Dec 5, 2012, at 9:21 AM, Andrew Dunstan and...@dunslane.net wrote:

 For sufficiently large columns, I expect I would want a GIN index to speed 
 JSON value extraction queries. Possible with this proposal?
 
 Probably not.

That greatly reduces its utility for querying, though not, of course, for using 
it in procedural code.

Wouldn't using a jsonpath-style implementation allow for indexing?

Best,

David



Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Heikki Linnakangas

On 05.12.2012 19:27, Andres Freund wrote:

And I still don't understand why pg_dump needs to know about any of this...


Extensions should be fully per-database and we want pg_dump backups to
be restorable into another database/clusters/servers. So having a mode
for pg_dump that actually makes dumps that are usable for recovering
after a disaster seems sensible to me. Otherwise you need to redeploy
from the VCS or whatever, which isn't really what you want when
restoring a database backup.


Ok - but that is yet another issue, not to be confused with how you 
deploy extensions. If we are to have such a mode in pg_dump, it should 
be able to dump *all* extensions, regardless of how they were deployed. 
(ok, might be difficult for extensions that include .so files or 
similar, but certainly for an extension that only contains a .sql file 
and a .control file, it shouldn't matter how it was deployed).


And whether extension control files (or the same information stored in a 
table or wherever) should be per-database or per cluster - that's *yet* 
another separate issue. You could argue for either behavior.


- Heikki




Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-05 Thread Simon Riggs
On 5 December 2012 15:01, Tom Lane t...@sss.pgh.pa.us wrote:
 John R Pierce pie...@hogranch.com writes:
 On 12/5/2012 1:42 AM, Simon Riggs wrote:
 I think we need a parameter called

 schema_change_reporting = off (default) | on   [USERSET]

 which displays relevant statistics/reports about the actions taken by
 DDL statements. That will also highlight locks and the need to reduce
 their lock levels.

 where does this get displayed?   is it just tossed into the postgres log
 files?

 And perhaps more to the point, what's the advantage compared to
 good old log_statement = ddl?

That logs DDL statements for the whole system and isn't user settable.
It wouldn't be useful to extend that, since the user wouldn't be able
to enable/disable and the stats would get dumped in the server log.

Need something more user specific.

Ideas

* pg_stat_ddl (from Dimitri) which would be a temp table containing results
* Stream of NOTICE statements back to the client - that seems easier
* err...

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Andres Freund
On 2012-12-05 19:40:58 +0200, Heikki Linnakangas wrote:
 On 05.12.2012 19:27, Andres Freund wrote:
 And I still don't understand why pg_dump needs to know about any of this...
 
 Extensions should be fully per-database and we want pg_dump backups to
 be restorable into another database/clusters/servers. So having a mode
 for pg_dump that actually makes dumps that are usable for recovering
 after a disaster seems sensible to me. Otherwise you need to redeploy
 from the VCS or whatever, which isn't really what you want when
 restoring a database backup.

 Ok - but that it yet another issue, not to be confused with how you deploy
 extensions. If we are to have such a mode in pg_dump, it should be able to
 dump *all* extensions, regardless of how they were deployed. (ok, might be
 difficult for extensions that include .so files or similar, but certainly
 for an extension that only contains a .sql file and a .control file, it
 shouldn't matter how it was deployed).

For me it seems pretty natural to support dumping extensions the way they
got created. I.e. a plain CREATE EXTENSION ...; if the extension was
preinstalled, and some form that includes the extension source if you
installed it via the connection.

Extensions that were installed in some global manner *should* not be
dumped with ones installed over the connection. E.g. dumping /contrib or
packaged modules seems to be a bad idea to me.

That would possibly be useful as a debugging tool, but I don't see much
point besides that.

 And whether extension control files (or the same information stored in a
 table or wherever) should be per-database or per cluster - that's *yet*
 another separate issue. You could argue for either behavior.

What would be the case for the per-cluster in the case of uploaded
extensions?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Dimitri Fontaine
Heikki Linnakangas hlinnakan...@vmware.com writes:
 Ok - but that it yet another issue, not to be confused with how you deploy
 extensions. If we are to have such a mode in pg_dump, it should be able to
 dump *all* extensions, regardless of how they were deployed. (ok, might be
 difficult for extensions that include .so files or similar, but certainly
 for an extension that only contains a .sql file and a .control file, it
 shouldn't matter how it was deployed).

That's what you have in the current patch. Try

   = create extension hstore;
   $ pg_dump --extension-script hstore

It works as far as the script is concerned, and the control file is not
needed any more because the script as dumped does not need it, except
for the two parameters 'require' and 'relocatable', that are added in
the SQL command.

The binary file is not taken care of by this mechanism. Remember that in
most cases pg_restore will not be granted to deploy it at the right
place anyway, for security reasons.

 And whether extension control files (or the same information stored in a
 table or wherever) should be per-database or per cluster - that's *yet*
 another separate issue. You could argue for either behavior.

At the SQL level, extensions do live in a database. The only reason why
we currently have them on the file system is binary executables (.so,
.dylib, .dll). And those are not per database, not even per cluster, not
even per major version, they are *per server*. It's something that makes
me very sad, and that I want to have the chance to fix later, but that
won't happen in 9.3, and certainly not in that very patch…

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2012-12-05 19:13:10 +0200, Heikki Linnakangas wrote:
 And I still don't understand why pg_dump needs to know about any of this...

 Extensions should be fully per-database and we want pg_dump backups to
 be restorable into another database/clusters/servers.

Wait a minute.  I haven't bought into either of those statements, and
most particularly not the first one.

Upthread, Dimitri claimed that he wasn't creating two different kinds of
extensions with this patch, but the more I read about it the more it
seems that he *is* making a fundamentally different kind of animal.
And I don't think it's necessarily a good idea, especially not if we
still call it an extension.

I kind of like Heikki's idea of leaving CREATE EXTENSION alone and
inventing a separate UPLOAD EXTENSION operation, but there's a problem
with that: in many, probably most, installations, the server does not
and should not have permission to scribble on the directories where the
extension scripts are stored.  Possibly we could circumvent that by
creating an auxiliary extensions directory under $PGDATA.  (But then
it starts to seem like pg_dumpall --- not pg_dump --- ought to include
those files in its output...)

regards, tom lane




Re: [HACKERS] json accessors

2012-12-05 Thread Merlin Moncure
On Wed, Dec 5, 2012 at 11:14 AM, David E. Wheeler da...@justatheory.com wrote:
 On Nov 28, 2012, at 4:10 PM, Merlin Moncure mmonc...@gmail.com wrote:

 Yes, it's iterative. And for deeply nested json it might be somewhat
 inefficient, although the parser is pretty fast AFAICT. But it's a start.

 not completely buying that: see comments below.  not supporting xpath
 style decompositions seems wrong to me.  IOW, json_get should be set
 returning (perhaps via wild cards in the keytext) or we need
 json_each.

 The problem I see with the current proposal is that this limitation, it seems 
 to me, would prevent the ability to index nested keys. If you're essentially 
 composing and decomposing JSON values as you drill down, the intermediate 
 JSON values between the original one and the final return value can't be 
 indexed, can they?

 For sufficiently large columns, I expect I would want a GIN index to speed 
 JSON value extraction queries. Possible with this proposal?

I think best practices for JSON manipulation (at least in performance
sensitive cases with large documents) are going to be to fully
decompose into sql structures and manipulate after the fact.  JSON's
primary role is to serve as a data exchange format and Andrew's API (with the
tweaks he came up with) seems to facilitate that pretty well; full
decomposition is a snap.

Indexing large documents for fancy querying is a niche case but also
quite complex.  This isn't very well covered by xmlpath either btw --
I think for inspiration we should be looking at hstore.

That said, how would you do that?  The first thing that jumps into my
mind is to cut right to the chase:  Maybe the semantics could be
defined so that implementing haystack @> needle would reasonably cover
most cases.

So my takeaways are:
*) decomposition != precise searching.  andrew's api handles the
former and stands on its own merits.

*) xmlpath/jsonpath do searching (and decomposition) but are very
clunky from an sql perspective and probably an absolute no-go in terms of
GIST/GIN.  postgres spiritually wants to do things via operators and
we should (if possible) at least consider that first
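
For reference, the hstore precedent already works today along these lines
(hstore ships in contrib; the json analogue is what is being debated):

CREATE EXTENSION hstore;
CREATE TABLE docs (id serial PRIMARY KEY, attrs hstore);
CREATE INDEX docs_attrs_idx ON docs USING gin (attrs);
-- containment expressed as an operator, answerable from the GIN index
SELECT id FROM docs WHERE attrs @> 'color=>red'::hstore;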

merlin




Re: [HACKERS] json accessors

2012-12-05 Thread Andrew Dunstan


On 12/05/2012 12:29 PM, David E. Wheeler wrote:

On Dec 5, 2012, at 9:21 AM, Andrew Dunstan and...@dunslane.net wrote:


For sufficiently large columns, I expect I would want a GIN index to speed JSON 
value extraction queries. Possible with this proposal?

Probably not.

That greatly reduces its utility for querying, though not, of course, for using 
it in procedural code.

Wouldn't using a jsonpath-style implementation allow for indexing?



Indexing tree-like data isn't at all easy. We don't index XML either. 
There has been discussion of this sort of indexing in the past, and a 
couple of people have said they would work on it, but I have not seen a 
proposal or a single line of code.


Jsonpath on its own would not do what you're suggesting. A first 
approach to indexing treeish data requires that you generate all the 
possible paths and index that. That would be quite explosive in volume. 
And anyway, jsonpath is not on offer here.


I'm sorry what I have offered isn't what you want, but plenty of other 
people have told me it will go a long way toward meeting their needs.


cheers

andrew






Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 And whether extension control files (or the same information stored in a 
 table or wherever) should be per-database or per cluster - that's *yet* 
 another separate issue. You could argue for either behavior.

I think anyone arguing for the former is confusing an installed
extension with a not-installed one.  Maybe it would help if we adopted
different terminologies.  Perhaps call the control+sql files a template,
while using extension for the installed entity?

regards, tom lane




Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-05 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On 5 December 2012 15:01, Tom Lane t...@sss.pgh.pa.us wrote:
 And perhaps more to the point, what's the advantage compared to
 good old log_statement = ddl?

 That logs DDL statements for the whole system and isn't user settable.

Not true; you can set it with ALTER ROLE/DATABASE, and a superuser can
adjust it for his own session.

 * pg_stat_ddl (from Dimitri) which would be a temp table containing results
 * Stream of NOTICE statements back to client that seems easier
 * err...

And an ORM is going to do what with either, pray tell?  None of these
are of any use except with an interactive session; in which the user
could perfectly well use EXPLAIN, anyway.

regards, tom lane




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Andres Freund
On 2012-12-05 12:55:42 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2012-12-05 19:13:10 +0200, Heikki Linnakangas wrote:
  And I still don't understand why pg_dump needs to know about any of this...

  Extensions should be fully per-database and we want pg_dump backups to
  be restorable into another database/clusters/servers.

 Wait a minute.  I haven't bought into either of those statements, and
 most particularly not the first one.

Ok.

 Upthread, Dimitri claimed that he wasn't creating two different kinds of
 extensions with this patch, but the more I read about it the more it
 seems that he *is* making a fundamentally different kind of animal.
 And I don't think it's necessarily a good idea, especially not if we
 still call it an extension.

I have to admit I haven't read the whole discussion about this. And I
also have to say that I have no idea yet whether I like the current
implementation because I haven't looked at it yet. I just wanted to give
input to the separate problems Heikki listed. Because I wished for
something roughly like this for years...

To me it seems to be sensible that extensions which are preinstalled on
the system are global and extensions which a single user inside a single
database created are per database.
Imo that doesn't make them all that fundamentally different.

 I kind of like Heikki's idea of leaving CREATE EXTENSION alone and
 inventing a separate UPLOAD EXTENSION operation, but there's a problem
 with that: in many, probably most, installations, the server does not
 and should not have permission to scribble on the directories where the
 extension scripts are stored.  Possibly we could circumvent that by
 creating an auxiliary extensions directory under $PGDATA.  (But then
 it starts to seem like pg_dumpall --- not pg_dump --- ought to include
 those files in its output...)

UPLOAD EXTENSION seems to be a good idea.

But I really really would like them to go to a per-database directory
not a per-cluster one. Otherwise the coordination between different
database owners inside a cluster will get really hairy. I want to be
able to install different versions of an application into different
databases.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Heikki Linnakangas

On 05.12.2012 20:07, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com  writes:

And whether extension control files (or the same information stored in a
table or wherever) should be per-database or per cluster - that's *yet*
another separate issue. You could argue for either behavior.


I think anyone arguing for the former is confusing an installed
extension with a not-installed one.  Maybe it would help if we adopted
different terminologies.  Perhaps call the control+sql files a template,
while using extension for the installed entity?


+1 on the naming.

You could still argue that templates should be per-database. It would 
make life easier for someone who is database owner but not superuser, 
for example, allowing you to install an extension that only affects your 
own database (assuming we set up the permissions so that that's 
possible, of course).


- Heikki




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 At the SQL level, extensions do live in a database. The only reason why
 we currently have them on the file system is binary executables (.so,
 .dylib, .dll). And those are not per database, not even per cluster, not
 even per major version, they are *per server*. It's something that makes
 me very sad, and that I want to have the chance to fix later, but that
 won't happen in 9.3, and certainly not in that very patch…

I think you're wasting your time to imagine that that case will ever be
fixed.  Allowing the server to scribble on executable files would set
off all kinds of security alarm bells, and rightly so.  If Postgres ever
did ship with such a thing, I rather imagine that I'd be required to
patch it out of Red Hat releases (not that SELinux wouldn't prevent
it from happening anyway).

I do see an argument for allowing SQL-only extensions to be installed
this way, since that doesn't allow the execution of anything the user
couldn't execute anyway.  There's no need to worry about anything except
control and script files though.

regards, tom lane




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Heikki Linnakangas

On 05.12.2012 20:13, Andres Freund wrote:

But I really really would like them to go to a per-database directory
not a per-cluster one. Otherwise the coordination between different
database owners inside a cluster will get really hairy. I want to be
able to install different versions of an application into different
databases.


Extension authors should be careful to maintain backwards-compatibility, 
so that it would be enough to have the latest version installed. If you 
break compatibility, you probably should rename the extension.


That said, I can understand that in practice you'd want to have 
different versions installed at the same time, so that you don't need to 
re-test everything when upgrading an extension, and don't need to trust 
that the extension author didn't accidentally break 
backwards-compatibility anyway.


If you really meant different versions of an application, and not 
different versions of an extension, then it seems to me that you're 
abusing the extension infrastructure for something else. If you have 
some functions that you consider part of the application, even if those 
functions might be useful in other applications too, you probably don't 
want to treat them as an extension.


- Heikki




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Andres Freund
On 2012-12-05 20:15:42 +0200, Heikki Linnakangas wrote:
 On 05.12.2012 20:07, Tom Lane wrote:
 Heikki Linnakangas hlinnakan...@vmware.com  writes:
 And whether extension control files (or the same information stored in a
 table or wherever) should be per-database or per cluster - that's *yet*
 another separate issue. You could argue for either behavior.
 
 I think anyone arguing for the former is confusing an installed
 extension with a not-installed one.

Not sure whether it would be the best design, but having something like
UPLOAD EXTENSION which can only exist in the installed form would be
enough for nearly all the use-cases I experienced.

  Maybe it would help if we adopted
 different terminologies.  Perhaps call the control+sql files a template,
 while using extension for the installed entity?

 +1 on the naming.

+1 on the idea of naming them separately, I am not happy with template,
but then I don't have a better suggestion.

 You could still argue that templates should be per-database. It would make
 life easier for someone who is database owner but not superuser, for
 example, allowing you to install an extension that only affects your own
 database (assuming we set up the permissions so that that's possible, of
 course).

+1. We could even have two variants, UPLOAD [GLOBAL]
EXTENSION/TEMPLATE. ISTM that we would need some kind of search path
anyway so adding that separation seems to be a minimal amount of
additional effort.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Andres Freund
On 2012-12-05 20:23:29 +0200, Heikki Linnakangas wrote:
 On 05.12.2012 20:13, Andres Freund wrote:
 But I really really would like them to go to a per-database directory
 not a per-cluster one. Otherwise the coordination between different
 database owners inside a cluster will get really hairy. I want to be
 able to install different versions of an application into different
 databases.

 Extension authors should be careful to maintain backwards-compatibility, so
 that it would be enough to have the latest version installed. If you break
 compatibility, you probably should rename the extension.

In theory yes. In practice:

 That said, I can understand that in practice you'd want to have different
 versions installed at the same time, so that you don't need to re-test
 everything when upgrading an extension, and don't need to trust that the
 extension author didn't accidentally break backwards-compatibility anyway.

;)

 If you really meant different versions of an application, and not
 different versions of an extension, then it seems to me that you're
 abusing the extension infrastructure for something else. If you have some
 functions that you consider part of the application, even if those functions
 might be useful in other applications too, you probably don't want to treat
 them as an extension.

I was thinking of reusable parts of applications that might be used in
more than one application.

*But* I think this also is a good basis to encapsulate individual
non-shared parts of an application. Why not?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Andres Freund
On 2012-12-05 13:18:16 -0500, Tom Lane wrote:
 Dimitri Fontaine dimi...@2ndquadrant.fr writes:
  At the SQL level, extensions do live in a database. The only reason why
  we currently have them on the file system is binary executables (.so,
  .dylib, .dll). And those are not per database, not even per cluster, not
  even per major version, they are *per server*. It's something that makes
  me very sad, and that I want to have the chance to fix later, but that
  won't happen in 9.3, and certainly not in that very patch…

Maybe I am missing something, but you already can separate them per
major version. You co-wrote the Debian infrastructure to do so for some
Debian packages, so I am not sure what you mean here.

Adding some *NON WRITABLE* per-cluster library directory doesn't seem to
be as controversial as other suggestions.


 I think you're wasting your time to imagine that that case will ever be
 fixed.  Allowing the server to scribble on executable files would set
 off all kinds of security alarm bells, and rightly so.  If Postgres ever
 did ship with such a thing, I rather imagine that I'd be required to
 patch it out of Red Hat releases (not that SELinux wouldn't prevent
 it from happening anyway).

+1

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-05 Thread Josh Berkus
Simon,

 And anything which assumes the *absence* of a manual script is also
 leaving out a large class of users. ORMs are very important, but not
 the only thing we serve.

Yes.  In the long run, we'll probably need two solutions.  An
interactive EXPLAIN, and something which logs or aborts for the ORM folks.

 Please assume that script meant a set of SQL statements that are
 executed in a specific sequence to change a database model from one
 version to another. Anything which requires editing of all (or worse,
 just some) of the SQL statements is not a good solution. For ORMs,
 this requires each ORM to make its own change to support that
 functionality and to have a separate mode where it is used. 

Exactly.  And only the ORMs which are very close to PostgreSQL would be
willing to do this.  Most would not.

 I think we need a parameter called
 
 schema_change_reporting = off (default) | on   [USERSET]

The problem with anything which reports back to the session is that even
when DBAs are running SQL scripts, migrations are seldom run in an
interactive session.  For example, I manage all migrations for large
projects using Python and YAML files, and Sqitch uses Perl and JSON
wrappers for the SQL.  Doing migrations via psql -f filename -q is
also very common.  So anything reported back in an interactive session
would be lost.

That's why we need a mechanism which either logs, or aborts on specific
actions.  From the perspective of the DevOps staff, abort is possibly
the better option, but there may be issues with it on our end.  That was
the attraction of the original NOREWRITE patch, although as I said that
suffers from new keywords and a total lack of extensibility.

What about adding something like:

ddl_action = [ none, log, warn, abort ]
ddl_events = [ all, rewrite, exclusive, access_exclusive ]

I realize I'm getting out into the weeds here, but I'm thinking "as a
contract DBA, what would *really* help me?", and something like the above
would do it. This would allow me to do something like:

"I wanna test this Rails migration, and have it die if it tries to do a
full table rewrite or take an access_exclusive lock. And I'll check the
logs afterwards if it blows up."

ddl_action = 'log,abort'
ddl_events = 'rewrite,access_exclusive'

This would make it very easy to set some rules for the organization, and
enforce them with automated testing.
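
Spelled out with the hypothetical GUCs above (neither of which exists), that
test run might look like:

SET ddl_action = 'log,abort';
SET ddl_events = 'rewrite,access_exclusive';
-- assuming balance is an integer column, this forces a full table rewrite,
-- so under these settings the statement would be logged and then aborted
ALTER TABLE accounts ALTER COLUMN balance TYPE numeric(12,2);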

 Auto explain executes things twice, which is not possible for DDL, so
 it won't work.

I keep trying to find a use for auto-explain.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] json accessors

2012-12-05 Thread David E. Wheeler
On Dec 5, 2012, at 9:57 AM, Merlin Moncure mmonc...@gmail.com wrote:

 Indexing large documents for fancy querying is a niche case but also
 quite complex.  This isn't very well covered by xmlpath either btw --
 I think for inspiration we should be looking at hstore.

Agreed, although hstore, IIRC, does not support nesting.

 That said, how would you do that?  The first thing that jumps into my
 mind is to cut right to the chase:  Maybe the semantics could be
 defined so that implementing haystack @> needle would reasonably cover
 most cases.

Yes.

 So my takeaways are:
 *) decomposition != precise searching.  andrew's api handles the
 former and stands on it's own merits.

Agreed.

 *) xmlpath/jsonpath do searching (and decomposition) but are very
 clunky from sql perspective and probably absolutely nogo in terms if
 GIST/GIN.  postgres spiritually wants to do things via operators and
 we should (if possible) at least consider that first

I don't understand how xmlpath/jsonpath is not able to be implemented with 
operators.

Best,

David





Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Robert Haas
On Wed, Dec 5, 2012 at 5:22 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 This patch is all about enabling users to create extension without
 having to ship them as root on the file system of the database(s)
 server(s) first.

Right, but it changes the way that existing extensions *dump*, which
seems to me to undo the use case that works now.

I mean, the advantage of dumping an extension as CREATE EXTENSION
hstore is that you can reload that dump on a different server with a
newer version of hstore installed, and it'll still work.  If we go
back to dumping all of the SQL commands that compose that extension,
then it'll break again, in exactly the way things were broken before
we had extensions in the first place.  Back in the bad old days, you'd
dump your old database (which had all of the SQL commands for say
hstore) and then reload it on your new database - and it would fail,
because the old SQL commands didn't match the new binaries.  Oops.
With the extension mechanism, it all works just fine: the old database
emits CREATE EXTENSION hstore and the new database can execute that
just fine.  You still have a problem if the extension has meanwhile
been changed in a backwards-incompatible way that doesn't work for
your application (i.e. you're using the => operator which has since
been removed) but hopefully that doesn't happen too often, and in any
event it seems relatively unavoidable.  And it takes nothing away from
the problem that extensions DO solve, which is incompatibilities
between the SQL file and the shared library.

 When you can code your extension using dynamic code such as SQL or
 PL/pgSQL, PL/pythonu or PL/perl, there's absolutely no good reason to
 have to do the ship on the server's file system first that I can see.

 Technically creating an extension inline (sending its definition in
 the CREATE EXTENSION query itself) solves the problem of having to
 access the server's file system as root.

True, but so does not putting the code into an extension at all.  You
can just create loose functions and operators.  It's unclear to me
what advantage the extension mechanism provides if there's no shared
library and no on-disk files involved.

 Then, next pg_dump will include CREATE EXTENSION foo; as usual and at
 pg_restore time that access files on the file systems. But maybe you
 still are not granted access to the server's file system as root on the
 pg_restore target, right? So now you need to be able to include the
 extension's script into the dump.

Granting for the moment that there's a reason to call this an
extension at all, rather than a schema or just a bunch of random
CREATE commands, which is not obvious to me, yes, you need to include
it in the dump.  But sure then the extension needs to be marked as
being, somehow, a different flavor of extension that can only use SQL
(not shlibs) and needs to be dumped in-line, because otherwise, as
noted above, we break things for the flavor of extensions we've
already got.

Also, even there, it seems to me that it ought to work something like this:

CREATE EXTENSION inline_extension NULL; -- create an extension with no members
CREATE FUNCTION blahblahblah ...
ALTER EXTENSION inline_extension ADD FUNCTION blahblab ...
and so on for all the other members

That is, the extension members should just become dumpable objects.
This seems quite bizarre since the whole point of extensions AIUI is
to avoid dumping the members, but it's better than what the patch
implements.  In the patch, IIRC, you emit all the members as a
separate dump that gets enclosed by dollar quotes.  This strikes me
as ugly, and I think you can construct circular dependency situations
in which it will fail outright.
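
A slightly fuller sketch of the dump shape Robert describes; CREATE
EXTENSION ... NULL is his placeholder syntax and does not exist, while
ALTER EXTENSION ... ADD is an existing command:

CREATE EXTENSION inline_extension NULL;   -- hypothetical "empty extension" form
CREATE FUNCTION add_one(int) RETURNS int
    LANGUAGE sql AS 'SELECT $1 + 1';
ALTER EXTENSION inline_extension ADD FUNCTION add_one(int);
-- ...and so on for each member object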

 Now, we don't want to have more than one kind of extensions. That's what
 we learnt all together when reviewing my proposal from last year. Having
 more than one way to ship an extension is good, having two different
 animals with two different incompatible behaviors named the same thing
 is bad. The solution we found is then to be able to include an
 extension's script into pg_dump's output, and that's what my current
 patch implements, per last year review.

I don't think I agree.  I don't see a problem having more than one
kind of extensions, but I'm worried that you're trying to shoehorn
something that isn't really an extension into an extension-sized box.
And I sure don't want that to mean let's break stuff that works right
now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] json accessors

2012-12-05 Thread David E. Wheeler
On Dec 5, 2012, at 10:04 AM, Andrew Dunstan and...@dunslane.net wrote:

 Indexing tree-like data isn't at all easy. We don't index XML either. There 
 has been discussion of this sort of indexing it in the past, and a couple of 
 people have said they would work on it, but I have not seen a proposal or a 
 single line of code.

Yeah, I forgot that xmlpath was not indexable.

 Jsonpath on its own would not do what you're suggesting. A first approach to 
 indexing treeish data requires that you generate all the possible paths and 
 index that. That would be quite explosive in volume. And anyway, jsonpath is 
 not on offer here.

Yeah, explosive for sure, but for sufficiently small JSON values, that 
shouldn’t be much of an issue. I expect GINs to be expensive anyway (see 
full-text indexing).

I am not invested in jsonpath; I just cited it as an example of using a single 
function call to do a nested search. Obviously `json_get(json, variadic text)` 
allows this, too, and could potentially use a GIN index of a JSON tree to 
perform the variadic text search at some point in the future, yes?

 I'm sorry what I have offered isn't what you want, but plenty of other people 
 have told me it will go a long way meeting their needs.

*Sigh.* I guess I have not been clear.

The stuff you propose is *awesome*. I love it. The syntax with the chaining 
operators warms my heart, and I can’t wait to make *extensive* use of it in my 
procedural code. Maybe I would never *need* to do column queries of JSON 
contents often enough to require an expensive index.

So I'm happy with this stuff, as long as it does not get in the way of 
supporting indexing at some point in the future. I can’t wait to start using it!

Best,

David





Re: [HACKERS] json accessors

2012-12-05 Thread Josh Berkus

 *) xmlpath/jsonpath do searching (and decomposition) but are very
 clunky from sql perspective and probably absolutely nogo in terms if
 GIST/GIN.  postgres spiritually wants to do things via operators and
 we should (if possible) at least consider that first

Why is it a no-go for GiST?  Ltree works, doesn't it?  If we only support
equality lookups, in what way is a JSON doc different from a collection
of ltree rows?

We'd probably want to use SP-GiST for better index size/performance, but
I don't see that this is impossible.  Just some difficult code.
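
ltree (also in contrib) is the precedent Josh mentions; roughly:

CREATE EXTENSION ltree;
CREATE TABLE doc_paths (id int, path ltree);
CREATE INDEX doc_paths_path_idx ON doc_paths USING gist (path);
-- lquery pattern match, answerable from the GiST index
SELECT id FROM doc_paths WHERE path ~ 'top.*.color'::lquery;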

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Robert Haas
On Wed, Dec 5, 2012 at 12:47 PM, Andres Freund and...@2ndquadrant.com wrote:
 For me it seems pretty natural to support dumping extension the way they
 got created. I.e. a plain CREATE EXTENSION ...; if the extension was
 preinstalled and some form that includes the extension source if you
 installed it via the connection.

 Extensions that were installed in some global manner *should* not be
 dumped with ones installed over the connection. E.g. dumping /contrib or
 packaged modules seems to be a bad idea to me.

 That would possibly be useful as a debugging tool, but I don't see much
 point besides that.

I agree with all of that.

What I can't quite figure out is - AIUI, extensions are a way of
bundling shared libraries with SQL scripts, and a way of managing the
dump and restore process.  If you just have SQL, there's no bundling
to do, and if you reverse out the pg_dump changes (which is more or
less what's being proposed here), then what do you have left other
than the good feeling of being part of an extension?  At that point,
it seems to me that you've gone to a lot of work to add a layer of
packaging that serves no real purpose.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-05 Thread Robert Haas
On Wed, Dec 5, 2012 at 11:24 AM, Kevin Grittner kgri...@mail.com wrote:
 Robert Haas wrote:
 Since people *already* raise deadlock_timeout to obscenely high
 values (a minute? an hour???) and then complain that things blow
 up in their face, I think there's a decent argument to be made
 that piggybacking anything else on that setting is unwise.

 If people are really doing that, then I tend to agree. I wasn't
 aware of that practice.

It's probably not quite common enough to be called a practice, but I
have encountered it a number of times in support situations.  Alas, I
no longer remember the details of exactly what misery it caused, but I
do remember it wasn't good.  :-)

 Against that, FWICT, this problem only affects a small number of
 users: Jan is the only person I can ever remember reporting this
 issue. I'm not dumb enough to think he's the only person who it
 affects; but my current belief is that it's not an enormously
 common problem. So the main argument I can see against adding a
 GUC is that the problem is too marginal to justify a setting of
 its own. What I really see as the key issue is: suppose we
 hardcode this to say 2 seconds. Is that going to fix the problem
 effectively for 99% of the people who have this problem, or for
 25% of the people who have this problem? In the former case, we
 probably don't need a GUC; in the latter case, we probably do.

 Given the fact that autovacuum will keep throwing workers at it to
 essentially loop indefinitely at an outer level, I don't think the
 exact setting of this interval is all that critical either. My gut
 feel is that anything in the 2 second to 5 second range would be
 sane, so I won't argue over any explicit setting within that range.
 Below that, I think the overhead of autovacuum coming back to the
 table repeatedly would probably start to get too high; below that
 we could be causing some small, heavily-updated table to be
 neglected by autovacuum -- especially if you get multiple
 autovacuum workers tied up in this delay on different tables at the
 same time.

I think that part of what's tricky here is that the dynamics of this
problem depend heavily on table size.  I handled one support case
where lowering autovacuum_naptime to 15s was an indispensable part of
the solution, so in that case having an autovacuum worker retry for
more than a few seconds sounds kind of insane.  OTOH, that case
involved a small, rapidly changing table.  If you've got an enormous
table where vacuum takes an hour to chug through all of it, abandoning
the effort to truncate the table after a handful of seconds might
sound equally insane.

Maybe it'd be sensible to relate the retry time to the time spent
vacuuming the table.  Say, if the amount of time spent retrying
exceeds 10% of the time spent vacuuming the table, with a minimum of
1s and a maximum of 1min, give up.  That way, big tables will get a
little more leeway than small tables, which is probably appropriate.
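
As a back-of-the-envelope sketch of that heuristic (not actual server
code), the retry budget would work out to something like:

-- 10% of the time spent vacuuming, clamped to [1 second, 60 seconds]
SELECT least(greatest(0.10 * extract(epoch FROM v.vacuum_duration), 1), 60)
       AS retry_budget_seconds
FROM (VALUES (interval '30 minutes')) AS v(vacuum_duration);
-- => 60, i.e. a long-running vacuum gets the full one-minute budget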

 Regarding how many people are affected, I have seen several reports
 of situations where users claim massive impact on performance when
 autovacuum kicks in. The reports have not included enough detail to
 quantify the impact or in most cases to establish a cause, but this
 seems like it could have a noticeable impact, especially if the
 deadlock timeout was set to more than a second.

Yeah, I agree this could be a cause of those types of reports, but I
don't have any concrete evidence that any of the cases I've worked
were actually due to this specific issue.  The most recent case of
this type I worked on was due to I/O saturation - which, since it
happened to be EC2, really meant network saturation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Andres Freund
On 2012-12-05 13:50:27 -0500, Robert Haas wrote:
 On Wed, Dec 5, 2012 at 12:47 PM, Andres Freund and...@2ndquadrant.com wrote:
  For me it seems pretty natural to support dumping extension the way they
  got created. I.e. a plain CREATE EXTENSION ...; if the extension was
  preinstalled and some form that includes the extension source if you
  installed it via the connection.
 
  Extensions that were installed in some global manner *should* not be
  dumped with ones installed over the connection. E.g. dumping /contrib or
  packaged modules seems to be a bad idea to me.
 
  That would possibly be useful as a debugging tool, but I don't see much
  point besides that.

 I agree with all of that.

 What I can't quite figure out is - AIUI, extensions are a way of
 bundling shared libraries with SQL scripts, and a way of managing the
 dump and restore process.  If you just have SQL, there's no bundling
 to do, and if you reverse out the pg_dump changes (which is more or
 less what's being proposed here), then what do you have left other
 than the good feeling of being part of an extension?  At that point,
 it seems to me that you've gone to a lot of work to add a layer of
 packaging that serves no real purpose.

Manageability.

E.g. for years I had a set of (trigger) functions that counted the number
of rows in a table in a lockless manner. That's used in 10+ applications
of former clients of mine. All (plpg)sql.
Imagine I want to ship an updated version that 1. removes some
*internal* functions, 2. adds some internal function. 3. adds a new
*external* function.
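
A minimal sketch of that kind of lockless counter -- the usual
delta-row approach, with made-up object names -- just to make the use
case concrete:

CREATE TABLE row_counts (tablename text, delta bigint);

CREATE FUNCTION count_rows() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO row_counts VALUES (TG_TABLE_NAME, 1);
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO row_counts VALUES (TG_TABLE_NAME, -1);
    END IF;
    RETURN NULL;   -- fired AFTER, so the return value is ignored
END;
$$;

CREATE TRIGGER track_count AFTER INSERT OR DELETE ON some_table
    FOR EACH ROW EXECUTE PROCEDURE count_rows();

-- current count: SELECT sum(delta) FROM row_counts WHERE tablename = 'some_table';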

Now most of the clients use completely different development models and
completely different ways of managing upgrades. I needed to integrate
my teensy module into all of them.

If we had a way to package it nicely they could just upload the
extension inside their own workflows and I (or they) would be freed from
integrating foreign update scripts into their workflow.

Imagine embedding a PGXN module into your application which is used on
many servers and doesn't need superuser privileges or anything. Same
thing.

That's not something all that uncommon is it?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Robert Haas
On Wed, Dec 5, 2012 at 2:01 PM, Andres Freund and...@2ndquadrant.com wrote:
 E.g. for years I had a set of (trigger) functions that counted the number
 of rows in a table in a lockless manner. That's used in 10+ applications
 of former clients of mine. All (plpg)sql.
 Imagine I want to ship an updated version that 1. removes some
 *internal* functions, 2. adds some internal function. 3. adds a new
 *external* function.

 Now most of the clients use completely different development models and
 completely different ways of managing upgrades. I needed to integrate
 my teensy module into all of them.

 If we had a way to package it nicely they could just upload the
 extension inside their own workflows and I (or they) would be freed from
 integrating foreign update scripts into their workflow.

OK, but let me play devil's advocate here.   Under the status quo, if
they used loose database objects, they would need to execute some
database code that does this:

DROP FUNCTION internalfunc1(int);
CREATE FUNCTION internalfunc2(int);
CREATE FUNCTION externalfunc3(int);

IIUC, under this proposal, the client would instead need to execute
some SQL code that looks something like this (I'm faking the syntax here,
forgive me, but the patch doesn't seem to contemplate ALTER):

ALTER EXTENSION myextension UPDATE TO 1.1 USING SCRIPT $$
   ALTER EXTENSION myextension DROP FUNCTION internalfunc1(int);
   DROP FUNCTION internalfunc1(int);
   CREATE FUNCTION internalfunc2(int);
   ALTER EXTENSION myextension ADD FUNCTION internalfunc2(int);
   CREATE FUNCTION externalfunc3(int);
   ALTER EXTENSION myextension ADD FUNCTION externalfunc3(int);
$$;

That doesn't really look like an improvement to me.  What am I missing?

 Imagine embedding a PGXN module into your application which is used on
 many servers and doesn't need superuser privileges or anything. Same
 thing.

 That's not something all that uncommon is it?

Not at all.  I'm not questioning the use case at all; I'm questioning
whether extensions are the right tool for addressing it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-05 Thread Robert Haas
On Wed, Dec 5, 2012 at 1:41 PM, Josh Berkus j...@agliodbs.com wrote:
 That's why we need a mechanism which either logs, or aborts on specific
 actions.  From the perspective of the DevOps staff, abort is possibly
 the better option, but there may be issues with it on our end.  That was
 the attraction of the original NOREWRITE patch, although as I said that
 suffers from new keywords and a total lack of extensibility.

You know, event triggers seem like an awfully good solution to this
problem.  All we'd need is a new event called table_rewrite:

CREATE EVENT TRIGGER my_event_trigger
ON table_rewrite
EXECUTE PROCEDURE consider_whether_to_throw_an_error();

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Andres Freund
On 2012-12-05 14:10:34 -0500, Robert Haas wrote:
 On Wed, Dec 5, 2012 at 2:01 PM, Andres Freund and...@2ndquadrant.com wrote:
  E.g. for years I had a set of (trigger) functions that counted the number
  of rows in a table in a lockless manner. That's used in 10+ applications
  of former clients of mine. All (plpg)sql.
  Imagine I want to ship an updated version that 1. removes some
  *internal* functions, 2. adds some internal function. 3. adds a new
  *external* function.
 
  Now most of the clients use completely different development models and
  completely different ways of managing upgrades. I needed to integrate
  my teensy module into all of them.
 
  If we had a way to package it nicely they could just upload the
  extension inside their own workflows and I (or they) would be freed from
  integrating foreign update scripts into their workflow.

 OK, but let me play devil's advocate here.   Under the status quo, if
 they used loose database objects, they would need to execute some
 database code that does this:

 DROP FUNCTION internalfunc1(int);
 CREATE FUNCTION internalfunc2(int);
 CREATE FUNCTION externalfunc3(int);

They would need to do exactly that if their database had version 1.1 and
they upgrade to 1.3 but not if they already had 1.2...

 IIUC, under this proposal, the client would instead need to execute
 some SQL code that looks something like this (I'm faking the syntax here,
 forgive me, but the patch doesn't seem to contemplate ALTER):

 ALTER EXTENSION myextension UPDATE TO 1.1 USING SCRIPT $$
ALTER EXTENSION myextension DROP FUNCTION internalfunc1(int);
DROP FUNCTION internalfunc1(int);
CREATE FUNCTION internalfunc2(int);
ALTER EXTENSION myextension ADD FUNCTION internalfunc2(int);
CREATE FUNCTION externalfunc3(int);
ALTER EXTENSION myextension ADD FUNCTION externalfunc3(int);
 $$;

 That doesn't really look like an improvement to me.  What am I missing?

They should be able to simply slurp the extension from a file, possibly
even install it outside their own update mechanism. Given that you don't
know which version was installed beforehand, that's not really possible
without some infrastructure.

And they should be able to drop the extension again afterwards without
it leaving a trace. Nearly all I have seen out there fails at that, and
the extension mechanism provides tracking of that.

  Imagine embedding a PGXN module into your application which is used on
  many servers and doesn't need superuser privileges or anything. Same
  thing.
 
  That's not something all that uncommon is it?

 Not at all.  I'm not questioning the use case at all; I'm questioning
 whether extensions are the right tool for addressing it.

Do you have some alternative suggestion?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Switching timeline over streaming replication

2012-12-05 Thread Heikki Linnakangas

On 05.12.2012 14:32, Amit Kapila wrote:

On Tuesday, December 04, 2012 10:01 PM Heikki Linnakangas wrote:

After some diversions to fix bugs and refactor existing code, I've
committed a couple of small parts of this patch, which just add some
sanity checks to notice incorrect PITR scenarios. Here's a new version
of the main patch based on current HEAD.


After testing with the new patch, the following problems are observed.

Defect - 1:

 1. start primary A
 2. start standby B following A
 3. start cascade standby C following B.
 4. start another standby D following C.
 5. Promote standby B.
 6. After successful time line switch in cascade standby C  D, stop D.
 7. Restart D, Startup is successful and connecting to standby C.
 8. Stop C.
 9. Restart C, startup is failing.


Ok, the error I get in that scenario is:

C 2012-12-05 19:55:43.840 EET 9283 FATAL:  requested timeline 2 does not 
contain minimum recovery point 0/3023F08 on timeline 1
C 2012-12-05 19:55:43.841 EET 9282 LOG:  startup process (PID 9283) 
exited with exit code 1
C 2012-12-05 19:55:43.841 EET 9282 LOG:  aborting startup due to startup 
process failure


It seems that the commits I made to master already:

http://archives.postgresql.org/pgsql-committers/2012-12/msg00116.php
http://archives.postgresql.org/pgsql-committers/2012-12/msg00111.php

were a few bricks shy of a load. The problem is that if recovery stops 
at a checkpoint record that changes timeline, so that minRecoveryPoint 
points to the end of the checkpoint record, we still record the old TLI 
as the TLI of minRecoveryPoint. This is because 1) there's a silly bug 
in the patch; replayEndTLI is not updated along with replayEndRecPtr. 
But even after fixing that, we're not good.


The problem is that replayEndRecPtr is currently updated *before* 
calling the redo function. replayEndRecPtr is what becomes 
minRecoveryPoint when XLogFlush() is called. If the record is a 
checkpoint record, redoing it will switch recovery to the new timeline, 
but replayEndTLI will not be updated until the next record.


IOW, as far as minRecoveryPoint is concerned, a checkpoint record that 
switches timeline is considered to be part of the old timeline. But when 
a server is promoted and a new timeline is created, the checkpoint 
record is considered to be part of the new timeline; that's what we 
write in the page header and in the control file.


That mismatch causes the error. I'd like to fix this by always treating 
the checkpoint record to be part of the new timeline. That feels more 
correct. The most straightforward way to implement that would be to peek 
at the xlog record before updating replayEndRecPtr and replayEndTLI. If 
it's a checkpoint record that changes TLI, set replayEndTLI to the new 
timeline before calling the redo-function. But it's a bit of a 
modularity violation to peek into the record like that.


Or we could just revert the sanity check at beginning of recovery that 
throws the requested timeline 2 does not contain minimum recovery point 
0/3023F08 on timeline 1 error. The error I added to redo of checkpoint 
record that says unexpected timeline ID %u in checkpoint record, before 
reaching minimum recovery point %X/%X on timeline %u checks basically 
the same thing, but at a later stage. However, the way 
minRecoveryPointTLI is updated still seems wrong to me, so I'd like to 
fix that.


I'm thinking of something like the attached (with some more comments 
before committing). Thoughts?


- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 702ea7c..bdae7a4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5822,6 +5822,7 @@ StartupXLOG(void)
 			 */
 			do
 			{
+				TimeLineID	EndTLI;
 #ifdef WAL_DEBUG
 				if (XLOG_DEBUG ||
 					(rmid == RM_XACT_ID && trace_recovery_messages >= DEBUG2) ||
@@ -5895,8 +5896,20 @@ StartupXLOG(void)
 				 * Update shared replayEndRecPtr before replaying this record,
 				 * so that XLogFlush will update minRecoveryPoint correctly.
 				 */
+				if (record->xl_rmid == RM_XLOG_ID &&
+					(record->xl_info & ~XLR_INFO_MASK) == XLOG_CHECKPOINT_SHUTDOWN)
+				{
+					CheckPoint	checkPoint;
+
+					memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
+					EndTLI = checkPoint.ThisTimeLineID;
+				}
+				else
+					EndTLI = ThisTimeLineID;
+
 				SpinLockAcquire(&xlogctl->info_lck);
 				xlogctl->replayEndRecPtr = EndRecPtr;
+				xlogctl->replayEndTLI = EndTLI;
 				recoveryPause = xlogctl->recoveryPause;
 				SpinLockRelease(&xlogctl->info_lck);
 



Re: [HACKERS] json accessors

2012-12-05 Thread Merlin Moncure
On Wed, Dec 5, 2012 at 12:49 PM, Josh Berkus j...@agliodbs.com wrote:

 *) xmlpath/jsonpath do searching (and decomposition) but are very
 clunky from sql perspective and probably absolutely nogo in terms of
 GIST/GIN.  postgres spiritually wants to do things via operators and
 we should (if possible) at least consider that first

 Why is it a nogo for GiST?  Ltree works, doesn't it?  If we only support
 equality lookups in what way is a JSON doc different from a collection
 of ltree rows?

 We'd probably want to use SP-GiST for better index size/performance, but
 I don't see that this is impossible.  Just some difficult code.

huh -- good point.   xpath at least is quite complicated and likely
impractical (albeit not impossible) to marry with GIST in a meaningful
way.   jsonpath (at least AIUI from here:
http://code.google.com/p/json-path/) seems to be lighter weight, as are
all things json when stacked up against xml.

merlin




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Robert Haas
On Wed, Dec 5, 2012 at 2:19 PM, Andres Freund and...@2ndquadrant.com wrote:
 IIUC, under this proposal, the client would instead need to execute
 some SQL code that looks something like this (I'm faking the syntax here,
 forgive me, but the patch doesn't seem to contemplate ALTER):

 ALTER EXTENSION myextension UPDATE TO 1.1 USING SCRIPT $$
ALTER EXTENSION myextension DROP FUNCTION internalfunc1(int);
DROP FUNCTION internalfunc1(int);
CREATE FUNCTION internalfunc2(int);
ALTER EXTENSION myextension ADD FUNCTION internalfunc2(int);
CREATE FUNCTION externalfunc3(int);
ALTER EXTENSION myextension ADD FUNCTION externalfunc3(int);
 $$;

 That doesn't really look like an improvement to me.  What am I missing?

 They should be able to simply slurp the extension from a file, possibly
 even install it outside their own update mechanism. Given that you don't
 know which version was installed beforehand, that's not really possible
 without some infrastructure.

 And they should be able to drop the extension again afterwards without
 it leaving a trace. Nearly all I have seen out there fails at that, and
 the extension mechanism provides tracking of that.

Ah, OK.  Well, it sounds like this might be a decent fit for the
TEMPLATE concept proposed upthread, then.

I have no objection whatsoever to the concept of storing the SQL and
control files somewhere that doesn't need access to the server
filesystem - in fact, I think I previously proposed allowing those to
be stored in a database table.  You could do that with something like:

CREATE TEMPLATE yadda;
ALTER TEMPLATE yadda ADD FILE 'yadda--1.0.sql' CONTENT $$...$$;

...or whatever.  And that'd be 100% fine with me, and it could dump
and restore just that way, and life would be good.  Or at least, it
sounds to me like that would meet the requirements you are
articulating without breaking anything that works today.  In fact, it
sounds pretty cool.

The root of my objections upthread, I think, is that the design this
patch puts on the table seems to me to conflate the extension (which
is always a database object) with the template (which is *currently*
always a filesystem object).  I think that's bound to create some
problems.  In the patch as it exists today, I think those problems are
going to leak out in the form of breaking some of the things for which
extensions can currently be used, but even if we address those points
I have a sneaking suspicion that there will be others.

For example, your point (in the portion of your email I'm not quoting
here) about an upgrade across multiple version is well-taken - you
need a different script depending on the version that's currently
installed.  Fixing that, though, seems to require a catalog of upgrade
scripts, so that the server can look at the installed version and the
available scripts and decide how to proceed.  That catalog currently
takes the form of separate files in the filesystem, but I don't see
any reason why we can't store it somewhere else.  What I'm not sold on
is the idea of shuttling it across as part of CREATE/ALTER EXTENSION
statements.  I'm willing to be persuaded, but right now I can't see
how that's ever going to be either robust or convenient.  Making it part of
a separate SQL object type gets around that problem rather nicely,
IMHO.
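
Extending the sketch above, the upgrade scripts themselves could live in
that same catalog, so that the server can pick an upgrade path based on
whichever version is currently installed (syntax invented here, mirroring
the earlier CREATE TEMPLATE example):

ALTER TEMPLATE yadda ADD FILE 'yadda--1.0--1.1.sql' CONTENT $$...$$;
ALTER TEMPLATE yadda ADD FILE 'yadda--1.1--1.2.sql' CONTENT $$...$$;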

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-05 Thread Josh Berkus

 You know, event triggers seem like an awfully good solution to this
 problem.  All we'd need is a new event called table_rewrite:
 
 CREATE EVENT TRIGGER my_event_trigger
 ON table_rewrite
 EXECUTE PROCEDURE consider_whether_to_throw_an_error();

Oh, wow, that would be perfect.  That also solves the problem of I
don't care if I rewrite really_small_table, but I do care if I rewrite
really_big_table.
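
A sketch of what that could look like, assuming the hypothetical
table_rewrite event also exposes which table is being rewritten (the
helper function used below is invented purely for illustration):

CREATE FUNCTION consider_whether_to_throw_an_error() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    rel regclass := hypothetical_rewritten_table();  -- not a real function
BEGIN
    IF pg_total_relation_size(rel) > 10::bigint * 1024 * 1024 * 1024 THEN
        RAISE EXCEPTION 'refusing to rewrite large table %', rel;
    END IF;
    -- small tables: let the rewrite proceed
END;
$$;

CREATE EVENT TRIGGER no_big_rewrites
    ON table_rewrite
    EXECUTE PROCEDURE consider_whether_to_throw_an_error();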


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] json accessors

2012-12-05 Thread Andres Freund
On 2012-12-05 10:49:35 -0800, Josh Berkus wrote:

  *) xmlpath/jsonpath do searching (and decomposition) but are very
  clunky from sql perspective and probably absolutely nogo in terms of
  GIST/GIN.  postgres spiritually wants to do things via operators and
  we should (if possible) at least consider that first

 Why is it a nogo for GiST?  Ltree works, doesn't it?  If we only support
 equality lookups in what way is a JSON doc different from a collection
 of ltree rows?

The space requirements for the paths are quite different. It's not that
hard to build indexing support; it's hard to build efficient support.

The more you hide from postgres (i.e. behind a single very complex
operator/function) the harder it is for the planner to detect whether
your expression is indexable or not.

 We'd probably want to use SP-GiST for better index size/performance, but
 I don't see that this is impossible.  Just some difficult code.

I don't immediately see why SP-GiST would be beneficial. What kind of
access structure do you have in mind?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] json accessors

2012-12-05 Thread Andrew Dunstan


On 12/05/2012 01:49 PM, Josh Berkus wrote:

*) xmlpath/jsonpath do searching (and decomposition) but are very
clunky from sql perspective and probably absolutely nogo in terms of
GIST/GIN.  postgres spiritually wants to do things via operators and
we should (if possible) at least consider that first

Why is it a nogo for GiST?  Ltree works, doesn't it?  If we only support
equality lookups in what way is a JSON doc different from a collection
of ltree rows?

We'd probably want to use SP-GiST for better index size/performance, but
I don't see that this is impossible.  Just some difficult code.



The set of paths for a single json datum can be huge, as opposed to one 
for a single ltree datum. That strikes me as a serious barrier. In any 
case, nobody I know of is even offering to do this - when they do we can 
look at the design. Until then I'm assuming nothing.


cheers

andrew




Re: [HACKERS] json accessors

2012-12-05 Thread Andrew Dunstan


On 12/05/2012 01:48 PM, David E. Wheeler wrote:

I'm sorry what I have offered isn't what you want, but plenty of other people 
have told me it will go a long way meeting their needs.

*Sigh.* I guess I have not been clear.

The stuff you propose is *awesome*. I love it. The syntax with the chaining 
operators warms my heart, and I can’t wait to make *extensive* use of it in my 
procedural code. Maybe I would never *need* to do column queries of JSON 
contents often enough to require an expensive index.



OK, sorry if I misunderstood. I guess I'm trying pretty hard to 
concentrate on what can be accomplished now, and other people are 
talking about blue sky possibilities.





So I'm happy with this stuff, as long as it does not get in the way of 
supporting indexing at some point in the future. I can’t wait to start using it!


I don't see why it should get in the way of anything like that. If 
anything, the parser design changes I have proposed should make later 
development much easier.



cheers

andrew




Re: [HACKERS] json accessors

2012-12-05 Thread Merlin Moncure
On Wed, Dec 5, 2012 at 12:42 PM, David E. Wheeler da...@justatheory.com wrote:
 On Dec 5, 2012, at 9:57 AM, Merlin Moncure mmonc...@gmail.com wrote:

 Indexing large documents for fancy querying is a niche case but also
 quite complex.  This isn't very well covered by xmlpath either btw --
 I think for inspiration we should be looking at hstore.

 Agreed, although hstore, IIRC, does not support nesting.

 That said, how would you do that?  The first thing that jumps into my
 mind is to cut right to the chase:  Maybe the semantics could be
 defined so that implementing haystack @ needle would reasonably cover
 most cases.

 Yes.

 So my takeaways are:
 *) decomposition != precise searching.  andrew's api handles the
 former and stands on its own merits.

 Agreed.

 *) xmlpath/jsonpath do searching (and decomposition) but are very
 clunky from sql perspective and probably absolutely nogo in terms of
 GIST/GIN.  postgres spiritually wants to do things via operators and
 we should (if possible) at least consider that first

 I don't understand how xmlpath/jsonpath is not able to be implemented with 
 operators.

yeah -- I phrased that badly -- by 'operators' I meant that both sides
would be json documents, with an absolute minimum of fanciness such as
wildcards and predicate matches: basically, 'overlaps' and
(especially) 'contains'.
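
In other words, something in the spirit of the following (the @> notation
is modeled on hstore and purely illustrative -- no such json operator
existed at this point, and the table is made up):

SELECT *
FROM docs
WHERE body @> '{"address": {"city": "Portland"}}';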

merlin




Re: [HACKERS] json accessors

2012-12-05 Thread David E. Wheeler
On Dec 5, 2012, at 11:51 AM, Andrew Dunstan and...@dunslane.net wrote:

 So I'm happy with this stuff, as long as it does not get in the way of 
 supporting indexing at some point in the future. I can’t wait to start using 
 it!
 
 I don't see why it should get in the way of anything like that. If anything, 
 the parser design changes I have proposed should make later development much 
 easier.

Awesome, thanks!

David





Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Alvaro Herrera
Robert Haas escribió:

 I have no objection whatsoever to the concept of storing the SQL and
 control files somewhere that doesn't need access to the server
 filesystem - in fact, I think I previously proposed allowing those to
 be stored in a database table.  You could do that with something like:
 
 CREATE TEMPLATE yadda;
 ALTER TEMPLATE yadda ADD FILE 'yadda--1.0.sql' CONTENT $$...$$;
 
 ...or whatever.

This seems unnecessary to me.  What the patch at hand does is take the
file (actually, the contents of the file) and execute it directly,
without installing anything on disk.  The precise contents of the
extension are still tracked through pg_depend, so you can drop it without
having previously saved either the control file or the SQL script.  (In
fact, that's how DROP EXTENSION works currently.)

There's also the pg_dump side of things; with your proposal we would be
forced to move over the yadda--1.0.sql file from the old server to the
new one; or, equivalently, put the whole ALTER TEMPLATE .. CONTENT
command in the dump, which is equivalent to what Dimitri's patch does;
so there doesn't seem to be a point.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Dimitri Fontaine
Andres Freund and...@2ndquadrant.com writes:
 Maybe I am missing something, but you already can separate them per
 major version. You co-wrote the debian infrastructure to do so for some
 debian packages, so I am not sure what you mean here.

The debian infrastructure I've helped build is all about compiling an
extension source package once and having as many binary artefacts as you
have major versions of PostgreSQL lying around.  So yes, in debian you
can actually install such extensions at different on-disk locations per
major version. Sorry for the confusion.

 Adding some *NON WRITABLE* per-cluster library directory doesn't seem to
 be as controversial as other suggestions.

Well, it means a per-initdb (user-driven) location in which to store binary
files; ask Tom what he thinks about that with his Red Hat Packager… hat.

 On 2012-12-05 13:18:16 -0500, Tom Lane wrote:
 I think you're wasting your time to imagine that that case will ever be
 fixed.  Allowing the server to scribble on executable files would set
 off all kinds of security alarm bells, and rightly so.  If Postgres ever
 did ship with such a thing, I rather imagine that I'd be required to
 patch it out of Red Hat releases (not that SELinux wouldn't prevent
 it from happening anyway).

That part I did understand. I still can't be happy about it, but I won't
get back with any proposal where that's put into question. That said,
while you're talking about it, what if it's an opt-in GUC?

 I do see an argument for allowing SQL-only extensions to be installed
 this way, since that doesn't allow the execution of anything the user
 couldn't execute anyway.  There's no need to worry about anything except
 control and script files though.

 […please make sure you're not drinking (coffee) before reading further…]

Now if we can't fix the executable files situation, what about making
the C coded extensions not require an executable anymore? I'm thinking
about studying what it would take exactly to write a PL/C where the
PostgreSQL backend would basically compile the embedded C code at CREATE
FUNCTION time and store bytecode or binary in the probin column.

I've stumbled across more than one dynamic code or retargetable
compiler thing already, and several of those even have compatible
licences. Maybe the most promising ones are PL/LLVM or embedding the QEMU
code transformation code (a fork of the tcc compiler).

So, we're talking about a PL/C language, in-core or extension, where you
could define say hstore without shipping any executable binary. Yeah,
I'm crazy that way. Now I'll get back to the main thread…


Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Andres Freund
On 2012-12-05 21:16:52 +0100, Dimitri Fontaine wrote:
 Andres Freund and...@2ndquadrant.com writes:
  Adding some *NON WRITABLE* per-cluster library directory doesn't seem to
  be as controversial as other suggestions.

 Well, it means a per-initdb (user driven) location where to store binary
 files, ask Tom what he thinks about that with his Red Hat Packager… Hat.

I think it might be different if the directory is non-writable and
cannot be made writable by the user running postgres.

  I do see an argument for allowing SQL-only extensions to be installed
  this way, since that doesn't allow the execution of anything the user
  couldn't execute anyway.  There's no need to worry about anything except
  control and script files though.

  […please make sure you're not drinking (coffee) before reading further…]

 Now if we can't fix the executable files situation, what about making
 the C coded extensions not require an executable anymore? I'm thinking
 about studying what it would take exactly to write a PL/C where the
 PostgreSQL backend would basically compile the embedded C code at CREATE
 FUNCTION time and store bytecode or binary in the probin column.

 So, we're talking about a PL/C language, in-core or extension, where you
 could define say hstore without shipping any executable binary. Yeah,
 I'm crazy that way. Now I'll get back to the main thread…

Imo that's not a sensible thing to pursue.

It would seriously lower the barrier to running user-provided
code. Yes, you can already execute unrestricted perl, python, whatever,
but with selinux and similar things in place you won't be allowed to run
your own binaries. And currently execute-or-write protection will
prevent you from executing compiled code, so you would have to disable
that as well...

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Dimitri Fontaine
So,

Lots of things are being said, most of them are really interesting and
some of them are just re-hashing what we said about a year ago in the
Inline Extensions thread, whose conclusion was that the key to not
having two different beasts (inline, file based) was for pg_dump to
process them all the same way.

Meanwhile, I think I need to address that reaction first:

Robert Haas robertmh...@gmail.com writes:
 What I can't quite figure out is - AIUI, extensions are a way of
 bundling shared libraries with SQL scripts, and a way of managing the
 dump and restore process.

Not quite.

Extensions are user defined CASCADE support.

It's all about pg_depend extensibility.

Extensions are a way to manage dependencies of SQL objects in a way that
allow you to manage them as a single entity. Now you can 

  CREATE EXTENSION
  ALTER EXTENSION
  DROP EXTENSION

and all you're doing is managing a bunch of SQL objects at once.
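
Concretely (hstore is just a familiar example extension here):

CREATE EXTENSION hstore;   -- creates the type, operators, functions, opclasses...
DROP EXTENSION hstore;     -- ...and pg_depend lets this one command remove them all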

The fact that it allows implementing a working dump/restore of the native
extensions known as contribs was the first step, not the whole goal.

You will notice that there's nothing in the whole extension patch and
framework that refers to a module, those little executable binary
files whose names depend on the OS PostgreSQL is running on, and that you
manage at the file system level, in postgresql.conf with some GUCs, and
with the LOAD command.

You will also notice that we have been *very* careful not to taint any
extension related SQL command with the notion of files. That part is
well separated away and only meant to be known by extension authors and
packagers, not by mere mortals such as DBAs or users. The current patch
is willing to push that a little further away, making it optional even
to extension authors.

Those two facts didn't just happen. And I was not alone in designing the
system that way. Let's continue the design and its implementation! :)

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-05 Thread Simon Riggs
On 5 December 2012 19:15, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Dec 5, 2012 at 1:41 PM, Josh Berkus j...@agliodbs.com wrote:
 That's why we need a mechanism which either logs, or aborts on specific
 actions.  From the perspective of the DevOps staff, abort is possibly
 the better option, but there may be issues with it on our end.  That was
 the attraction of the original NOREWRITE patch, although as I said that
 suffers from new keywords and a total lack of extensibility.

 You know, event triggers seem like an awfully good solution to this
 problem.  All we'd need is a new event called table_rewrite:

 CREATE EVENT TRIGGER my_event_trigger
 ON table_rewrite
 EXECUTE PROCEDURE consider_whether_to_throw_an_error();

+1, I was just thinking that myself.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Alvaro Herrera
Andres Freund escribió:
 On 2012-12-05 21:16:52 +0100, Dimitri Fontaine wrote:

  Now if we can't fix the executable files situation, what about making
  the C coded extensions not require an executable anymore? I'm thinking
  about studying what it would take exactly to write a PL/C where the
  PostgreSQL backend would basically compile the embedded C code at CREATE
  FUNCTION time and store bytecode or binary in the probin column.

 Imo thats not a sensible thing to pursue.

+1.  Certainly a pg_dump patch's thread is not the place to propose it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] why can't plpgsql return a row-expression?

2012-12-05 Thread Tom Lane
Asif Rehman asifr.reh...@gmail.com writes:
 Here is the updated patch. I overlooked the loop, checking to free the
 conversions map. Here are the results now.

I looked at this patch briefly.  It seems to me to be completely
schizophrenic about the coercion rules.  This bit:

-   if (atttypid != att->atttypid ||
-       (atttypmod != att->atttypmod && atttypmod >= 0))
+   if ((atttypmod != att->atttypmod && atttypmod >= 0) ||
+       !can_coerce_type(1, &atttypid, &(att->atttypid), COERCE_IMPLICIT_CAST))

says that we'll allow column types to differ if there is an implicit
cast from the source to the target (or at least I think that's what's
intended, although it's got the source and target backwards).  Fine, but
then why don't we use the cast machinery to do the conversion?  This
is taking plpgsql's cowboy let's-coerce-via-IO-no-matter-whether-that's-
right-or-not approach and sticking it into a core part of the system.
There's no guarantee at all that applying typoutput then typinput
will match the conversion semantics you get from an actual cast, and
in many practical cases such as int4 to int8 it'll be drastically less
efficient anyway.  It would make more sense to do something similar to
coerce_record_to_complex(), that is modify the expression tree to
coerce the columns using the regular cast machinery.
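
A trivial illustration of the difference, in ordinary SQL and independent
of the patch itself:

SELECT 1.7::float8::int4;               -- 2: the real float8 -> int4 cast rounds
SELECT int4in(float8out(1.7::float8));  -- error: '1.7' is not valid integer input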

Also, the typmod part of the test seems completely broken.  For one
thing, comparing typmods isn't sane if the types themselves aren't
the same.  And it's quite unclear to me why we'd want to have an
anything-goes policy for type coercion, or even an implicit-casts-only
policy, but then insist that the typmods match exactly.  This coding
will allow varchar(42) to text, but not varchar(42) to varchar(43)
... where's the sense in that?

The patch also seems to go a great deal further than what was asked for
originally, or indeed is mentioned in the documentation patch, namely
fixing the restriction on RETURN to allow composite-typed expressions.
Specifically it's changing the code that accepts composite input
arguments.  Do we actually want that?  If so, shouldn't it be
documented?

I'm inclined to suggest that we drop all the coercion stuff and just
do what Robert actually asked for originally, which was the mere
ability to return a composite value as long as it matched the function's
result type.  I'm not convinced that we want automatic implicit type
coercions here.  In any case I'm very much against sticking such a thing
into general-purpose support code like tupconvert.c.  That will create a
strong likelihood that plpgsql's poorly-designed coercion semantics will
leak into other aspects of the system.

regards, tom lane




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Dimitri Fontaine
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 +1.  Certainly a pg_dump patch's thread is not the place to propose it.

Sure. Sorry about that; the goal of that previous message was to let
people better understand my whole vision of what an Extension and a
contrib are, where we are, and what I wanted us to build.

I refined those ideas in another email though, so you can safely ignore
this sub-thread. I'll get back to the question of storing .so in a per
database location with an opt-in GUC later, when appropriate.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2012-12-05 21:16:52 +0100, Dimitri Fontaine wrote:
 Now if we can't fix the executable files situation, what about making
 the C coded extensions not require an executable anymore? I'm thinking
 about studying what it would take exactly to write a PL/C where the
 PostgreSQL backend would basically compile the embedded C code at CREATE
 FUNCTION time and store bytecode or binary in the probin column.
 
 So, we're talking about a PL/C language, in-core or extension, where you
 could define say hstore without shipping any executable binary. Yeah,
 I'm crazy that way. Now I'll get back to the main thread…

 Imo thats not a sensible thing to pursue.

That would be another thing that Red Hat would refuse to ship, as would
any other distro with an ounce of concern about security.  But in any
case there's no way that we could implement it portably.

regards, tom lane




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 On 2012-12-05 13:18:16 -0500, Tom Lane wrote:
 I think you're wasting your time to imagine that that case will ever be
 fixed.  Allowing the server to scribble on executable files would set
 off all kinds of security alarm bells, and rightly so.  If Postgres ever
 did ship with such a thing, I rather imagine that I'd be required to
 patch it out of Red Hat releases (not that SELinux wouldn't prevent
 it from happening anyway).

 That part I did understand. I still can't be happy about it, but I won't
 get back with any proposal where that's put into questions. That said,
 while you're talking about it, what if it's an opt-in GUC?

GUC or no GUC, it'd still be letting an unprivileged network-exposed
application (PG) do something that's against any sane system-level
security policy.  Lipstick is not gonna help this pig.

regards, tom lane




[HACKERS] strange isolation test buildfarm failure on guaibasaurus

2012-12-05 Thread Stefan Kaltenbrunner
Hi all!

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=guaibasaurusdt=2012-12-05%2016%3A17%3A01


seems like a rather odd failure in the isolation test (client)


Stefan





Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-05 Thread Andres Freund
On 2012-12-05 12:09:58 -0300, Alvaro Herrera wrote:
 Here's a first attempt at a new documentation chapter.  This goes in
 part Server Programming, just after the SPI chapter.

 I just noticed that worker_spi could use some more sample code, for
 example auth_counter was getting its own LWLock and also its own shmem
 area, which would be helpful to demonstrate I think.

I am not exactly renowned for my english skills, but I have made a pass
over the file, made some slight changes, and extended it in two places.
I've also added a comment with a question that came to my mind when
reading the docs...

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
--- /tmp/bgworker.old.sgml	2012-12-05 22:12:00.220609454 +0100
+++ /tmp/bgworker.sgml	2012-12-05 22:26:11.604712943 +0100
@@ -11,16 +11,16 @@
  para
   PostgreSQL can be extended to run user-supplied code in separate processes.
   Such processes are started, stopped and monitored by commandpostgres/command,
-  which permits them have a lifetime closely linked to the server's status.
+  which permits them to have a lifetime closely linked to the server's status.
   These processes have the option to attach to productnamePostgreSQL/'s
-  shared memory area and connect to databases internally.
+  shared memory area and to connect to databases internally.
  /para
 
  warning
   para
There are considerable robustness and security risks in using background
-   worker processes, because them being written in the literalC/ language
-   gives them unrestricted access to data.  Administrators wishing to enable
+   worker processes because, being written in the literalC/ language,
+   they have unrestricted access to data.  Administrators wishing to enable
modules that include background worker process should exercise extreme
caution.  Only carefully audited modules should be permitted to run
background worker processes.
@@ -31,7 +31,8 @@
   Only modules listed in varnameshared_preload_libraries/ can run
   background workers.  A module wishing to register a background worker needs
   to register it by calling
-  functionRegisterBackgroundWorker(typeBackgroundWorker *worker/type)/function.
+  functionRegisterBackgroundWorker(typeBackgroundWorker *worker/type)/function
+  from it's function_PG_Init()/.
   The structure structnameBackgroundWorker/structname is defined thus:
 programlisting
 typedef struct BackgroundWorker
@@ -50,12 +51,12 @@
 
   para
structfieldbgw_name/ is a string to be used in log messages, process
-   lists and similar contexts.
+   listings and similar contexts.
   /para
 
   para
structfieldbgw_flags/ is a bitwise-or'd bitmask indicating the
-   capabilities that the module would like to have.  Possible values are
+   capabilities that the module wants.  Possible values are
literalBGWORKER_SHMEM_ACCESS/literal (requesting shared memory access)
and literalBGWORKER_BACKEND_DATABASE_CONNECTION/literal (requesting the
ability to establish a database connection, through which it can later run
@@ -68,33 +69,35 @@
literalBgWorkerStart_PostmasterStart/ (start as soon as
commandpostgres/ itself has finished its own initialization; processes
requesting this are not eligible for database connections),
-   literalBgWorkerStart_ConsistentState/ (start as soon as consistent state
+   literalBgWorkerStart_ConsistentState/ (start as soon as a consistent state
has been reached in a HOT standby, allowing processes to connect to
databases and run read-only queries), and
literalBgWorkerStart_RecoveryFinished/ (start as soon as the system has
entered normal read-write state).  Note the last two values are equivalent
in a server that's not a HOT standby.
   /para
-  
+
   para
structfieldbgw_restart_time/structfield is the interval, in seconds, that
-   commandpostgres/command should wait before restarting the process,
-   in case it crashes.  It can be any positive value, or literalBGW_NEVER_RESTART/literal, indicating not to restart the process in case of a crash.
+   commandpostgres/command should wait before restarting the process, in
+   case it crashes.  It can be any positive value,
+   or literalBGW_NEVER_RESTART/literal, indicating not to restart the
+   process in case of a crash.
   /para
 
   para
structfieldbgw_main/structfield is a pointer to the function to run once
the process is started.  structfieldbgw_main_arg/structfield will be
-   passed to it as its only argument.  Note that
-   literalMyBgworkerEntry/literal is a pointer to a copy of the
-   structnameBackgroundWorker/structname structure passed
-   at registration time.
+   passed to it as its only argument.  Note that the global variable
+   literalMyBgworkerEntry/literal points to a copy of the
+   structnameBackgroundWorker/structname structure passed at registration
+   time.
   /para
 
   para

Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Andres Freund
On 2012-12-05 16:20:41 -0500, Tom Lane wrote:
 Dimitri Fontaine dimi...@2ndquadrant.fr writes:
  On 2012-12-05 13:18:16 -0500, Tom Lane wrote:
  I think you're wasting your time to imagine that that case will ever be
  fixed.  Allowing the server to scribble on executable files would set
  off all kinds of security alarm bells, and rightly so.  If Postgres ever
  did ship with such a thing, I rather imagine that I'd be required to
  patch it out of Red Hat releases (not that SELinux wouldn't prevent
  it from happening anyway).

  That part I did understand. I still can't be happy about it, but I won't
  get back with any proposal where that's put into questions. That said,
  while you're talking about it, what if it's an opt-in GUC?

 GUC or no GUC, it'd still be letting an unprivileged network-exposed
 application (PG) do something that's against any sane system-level
 security policy.  Lipstick is not gonna help this pig.

What about the non-writable per cluster directory? Thats something I've
actively wished for in the past when developing a C module thats also
used in other clusters.

Greetings,

Andres Freund




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On 2012-12-05 16:20:41 -0500, Tom Lane wrote:
 GUC or no GUC, it'd still be letting an unprivileged network-exposed
 application (PG) do something that's against any sane system-level
 security policy.  Lipstick is not gonna help this pig.

 What about the non-writable per cluster directory? Thats something I've
 actively wished for in the past when developing a C module thats also
 used in other clusters.

I see no security objection to either per-cluster or per-database
script+control-file directories, as long as they can only contain
SQL scripts and not executable files.

If we allow such things to be installed by less-than-superusers,
we'll have to think carefully about what privileges are given
when running the script.  I forget at the moment how much of that
we already worked out back in the 9.1 era; I remember it was discussed
but not whether we had a bulletproof solution.

regards, tom lane




Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-05 Thread Alvaro Herrera
Andres Freund wrote:
 On 2012-12-05 12:09:58 -0300, Alvaro Herrera wrote:
  Here's a first attempt at a new documentation chapter.  This goes in
  part Server Programming, just after the SPI chapter.
 
  I just noticed that worker_spi could use some more sample code, for
  example auth_counter was getting its own LWLock and also its own shmem
  area, which would be helpful to demonstrate I think.
 
 I am not exactly renowned for my english skills, but I have made a pass
 over the file, made some slight changes, and extended it in two places.

Thanks, I have applied it.

 I've also added a comment with a question that came to my mind when
 reading the docs...

para
 structfieldbgw_sighup/structfield and structfieldbgw_sigterm/ are
 pointers to functions that will be installed as signal handlers for the 
 new
 -   process.
 +   process. XXX: Can they be NULL?
/para

Hm.  The code doesn't check, so what happens is probably a bug anyhow.
I don't know whether sigaction crashes in this case; its manpage doesn't
say.  I guess the right thing to do is have RegisterBackgroundWorker
check for a NULL sighandler, and set it to something standard if so (say
SIG_IGN for SIGHUP and maybe quickdie() or similar for SIGTERM).
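
Something like the following, as a self-contained sketch (the struct here is
a cut-down stand-in for the real BackgroundWorker registration struct, and
default_sigterm() merely stands in for quickdie(); only the defaulting logic
is the point, not the surrounding API):

#include <signal.h>
#include <stddef.h>
#include <unistd.h>

typedef void (*bgw_sighdlr) (int);

/* cut-down stand-in for the real registration struct */
typedef struct BackgroundWorkerStub
{
    bgw_sighdlr bgw_sighup;
    bgw_sighdlr bgw_sigterm;
} BackgroundWorkerStub;

/* stand-in for quickdie(): just exit promptly */
static void
default_sigterm(int signo)
{
    (void) signo;
    _exit(1);
}

/* what RegisterBackgroundWorker() could do before installing the handlers */
static void
fill_default_handlers(BackgroundWorkerStub *worker)
{
    if (worker->bgw_sighup == NULL)
        worker->bgw_sighup = SIG_IGN;           /* ignore SIGHUP */
    if (worker->bgw_sigterm == NULL)
        worker->bgw_sigterm = default_sigterm;  /* exit on SIGTERM */
}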

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-05 Thread Simon Riggs
On 5 December 2012 15:09, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Here's a first attempt at a new documentation chapter.  This goes in
 part Server Programming, just after the SPI chapter.

 I just noticed that worker_spi could use some more sample code, for
 example auth_counter was getting its own LWLock and also its own shmem
 area, which would be helpful to demonstrate I think.

to run once - to run when

Prefer
BgWorkerStart_ConsistentState to be renamed to BgWorkerRun_InHotStandby
BgWorkerStart_RecoveryFinished to be renamed to BgWorkerRun_InNormalMode

presumably a process will shut down if (BgWorkerRun_InHotStandby &&
!BgWorkerRun_InNormalMode)


-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Andres Freund
On 2012-12-05 16:42:38 -0500, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  On 2012-12-05 16:20:41 -0500, Tom Lane wrote:
  GUC or no GUC, it'd still be letting an unprivileged network-exposed
  application (PG) do something that's against any sane system-level
  security policy.  Lipstick is not gonna help this pig.

  What about the non-writable per-cluster directory? That's something I've
  actively wished for in the past when developing a C module that's also
  used in other clusters.

 I see no security objection to either per-cluster or per-database
 script+control-file directories, as long as they can only contain
 SQL scripts and not executable files.

Well, I was explicitly talking about C code above. The question doesn't
really have much to do with this thread, sorry. Given that I am proposing
the directory to be explicitly read-only, with permissions that don't
allow postgres to change it, it's not really suitable for this topic...

Greetings,

Andres Freund




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Robert Haas
On Wed, Dec 5, 2012 at 2:54 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Robert Haas wrote:
 I have no objection whatsoever to the concept of storing the SQL and
 control files somewhere that doesn't need access to the server
 filesystem - in fact, I think I previously proposed allowing those to
 be stored in a database table.  You could do that with something like:

 CREATE TEMPLATE yadda;
 ALTER TEMPLATE yadda ADD FILE 'yadda--1.0.sql' CONTENT $$...$$;

 ...or whatever.

 This seems unnecessary to me.  What the patch at hand does is take the
 file (actually, the contents of the file) and execute it directly,
 without installing anything on disk.  The precise contents of the
 extension are still tracked through pg_depend, so you can drop it without
 having previously saved either the control file or the SQL script.  (In
 fact, that's how DROP EXTENSION works currently.)

Yeah, DROP will work.  But what about ALTER .. UPDATE?

 There's also the pg_dump side of things; with your proposal we would be
 forced to move over the yadda--1.0.sql file from the old server to the
 new one; or, equivalently, put the whole ALTER TEMPLATE .. CONTENT
 command in the dump, which is equivalent to what Dimitri's patch does;
 so there doesn't seem to be a point.

Well, there's certainly a point, because IIUC Dimitri's patch dumps
the file into the pg_dump output no matter whether the file originally
came from an SQL command or the filesystem.  IMHO, anyone who thinks
that isn't going to break things rather badly isn't thinking hard
enough.
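
As an aside, the pg_depend tracking mentioned above is easy to inspect
directly.  Assuming an installed extension named yadda (the name is only an
example), a query along these lines lists the member objects that DROP
EXTENSION would remove:

SELECT d.classid::regclass AS catalog, d.objid, d.deptype
  FROM pg_depend d
  JOIN pg_extension e ON d.refobjid = e.oid
 WHERE d.refclassid = 'pg_extension'::regclass
   AND d.deptype = 'e'          -- 'e' marks extension membership
   AND e.extname = 'yadda';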

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-05 Thread Alvaro Herrera
Simon Riggs wrote:
 On 5 December 2012 15:09, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
  Here's a first attempt at a new documentation chapter.  This goes in
  part Server Programming, just after the SPI chapter.
 
  I just noticed that worker_spi could use some more sample code, for
  example auth_counter was getting its own LWLock and also its own shmem
  area, which would be helpful to demonstrate I think.
 
 to run once - to run when

Thanks.

 Prefer
 BgWorkerStart_ConsistentState to be renamed to BgWorkerRun_InHotStandby
 BgWorkerStart_RecoveryFinished to be renamed to BgWorkerRun_InNormalMode
 
 presumably a process will shut down if (BgWorkerRun_InHotStandby &&
 !BgWorkerRun_InNormalMode)

Hmm, no, I haven't considered that.  You mean that a bgworker that
specifies to start at BgWorkerStart_ConsistentState will stop once
normal mode is reached?  Currently they don't do that.  And since we
don't have the notion that workers stop working, it wouldn't work --
postmaster would start them back up immediately.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: question on foreign key lock

2012-12-05 Thread Robert Haas
On Wed, Dec 5, 2012 at 7:08 AM, Filip Rembiałkowski
filip.rembialkow...@gmail.com wrote:
 Robert, thank you for  the answer.

 1. need exclusive lock anyway to add triggers.
 Why does adding a trigger need an exclusive lock?
 Someone would say blocking reads is not needed (since possible trigger
 events are: Insert/Update/Delete/Truncate).

 2. will create a risk of deadlock.
 From a user perspective, a risk of deadlock is sometimes better than
 excessive locking. Transactional DDL users should be prepared for
 exceptions/retries anyway.

 3. I made a naive test of simply changing AccessExclusiveLock to
 ExclusiveLock, and seeing how many regression tests it breaks. It
 breaks none :-)
 Current Git head gives me 2 fails/133 tests regardless of this change.

Sure.  You could probably downgrade it quite a bit further without
breaking the regression tests, but that doesn't mean it's safe in all
cases.  Rather than having this discussion all over again, I suggest
that you have a look at commits
2dbbda02e7e688311e161a912a0ce00cde9bb6fc,
2c3d9db56d5d49bdc777b174982251c01348e3d8,
a195e3c34f1eeb6a607c342121edf48e49067ea9, and the various mailing list
discussions pertaining thereto, particularly the thread ALTER TABLE
lock strength reduction patch is unsafe which was started by Tom
Lane.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 Yeah, DROP will work.  But what about ALTER .. UPDATE?

What about it?

 Well, there's certainly a point, because IIUC Dimitri's patch dumps
 the file into the pg_dump output no matter whether the file originally
 came from an SQL command or the filesystem.  IMHO, anyone who thinks
 that isn't going to break things rather badly isn't thinking hard
 enough.

Only if you ask for it using --extension-script. The default behaviour
didn't change, whether you decide to install your extension from the
file system or the PostgreSQL port.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Fwd: question on foreign key lock

2012-12-05 Thread Andres Freund
On 2012-12-05 17:05:41 -0500, Robert Haas wrote:
 On Wed, Dec 5, 2012 at 7:08 AM, Filip Rembiałkowski
 filip.rembialkow...@gmail.com wrote:
  Robert, thank you for  the answer.
 
  1. need exclusive lock anyway to add triggers.
  Why does adding a trigger need an exclusive lock?
  Someone would say blocking reads is not needed (since possible trigger
  events are: Insert/Update/Delete/Truncate).
 
  2. will create a risk of deadlock.
  From a user perspective, a risk of deadlock is sometimes better than
  excessive locking. Transactional DDL users should be prepared for
  exceptions/retries anyway.
 
  3. I made a naive test of simply changing AccessExclusiveLock to
  ExclusiveLock, and seeing how many regression tests it breaks. It
  breaks none :-)
  Current Git head gives me 2 fails/133 tests regardless of this change.

 Sure.  You could probably downgrade it quite a bit further without
 breaking the regression tests, but that doesn't mean it's safe in all
 cases.  Rather than having this discussion all over again, I suggest
 that you have a look at commits
 2dbbda02e7e688311e161a912a0ce00cde9bb6fc,
 2c3d9db56d5d49bdc777b174982251c01348e3d8,
 a195e3c34f1eeb6a607c342121edf48e49067ea9, and the various mailing list
 discussions pertaining thereto, particularly the thread ALTER TABLE
 lock strength reduction patch is unsafe which was started by Tom
 Lane.

Just to give an example about the complexities surrounding this:
Lowering the lock level for foreign key creation probably would be
dangerous for query planning more precisely join removal.

S1: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
S1: SELECT * FROM a;

S2: DELETE FROM a WHERE a.id IN (all_duplicate_ids);
S2: ALTER TABLE a ADD CONSTRAINT a_unique UNIQUE (id);
S2: ALTER TABLE b ADD CONSTRAINT b_fkey FOREIGN KEY (b_id) REFERENCES a(id);

S1: SELECT b.id FROM a LEFT JOIN b ON(b.id = a.id);

The last S1 query might now remove the join to b because of the foreign
key (which it sees due to SnapshotNow semantics), although rows that
violate the unique key (which is required for the foreign key) are still
visible to S1's repeatable-read snapshot. The existence of those duplicate
values would change the result, though!


(come to think of it, I think we might still hit the above case if S1
doesn't access a before the foreign key gets altered...)

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Dumping an Extension's Script

2012-12-05 Thread Alvaro Herrera
Dimitri Fontaine wrote:
 Robert Haas robertmh...@gmail.com writes:

  Well, there's certainly a point, because IIUC Dimitri's patch dumps
  the file into the pg_dump output no matter whether the file originally
  came from an SQL command or the filesystem.  IMHO, anyone who thinks
  that isn't going to break things rather badly isn't thinking hard
  enough.
 
 Only if you ask for it using --extension-script. The default behaviour
 didn't change, whether you decide to install your extension from the
 file system or the PostgreSQL port.

What happens on a normal pg_dump of the complete database?  For
extensions that were installed using strings instead of files, do I get
a string back?  Because if not, the restore is clearly going to fail
anyway.

I mean, clearly the user doesn't want to list the extensions, figure
which ones were installed by strings, and then do pg_dump
--extension-script on them.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-05 Thread Dimitri Fontaine
Simon Riggs si...@2ndquadrant.com writes:
 CREATE EVENT TRIGGER my_event_trigger
 ON table_rewrite
 EXECUTE PROCEDURE consider_whether_to_throw_an_error();

 +1, I was just thinking that myself.

+1, and I think that can happen for 9.3, as soon as we agree on the list
of code points where we want that event to fire. ALTER TABLE variants
that are rewriting the heap, sure. CLUSTER? VACUUM FULL? TRUNCATE?
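
To make the idea concrete, a minimal PL/pgSQL sketch (the table_rewrite
event name and its exact firing points are of course still to be agreed on,
and the policy below is purely illustrative):

CREATE OR REPLACE FUNCTION consider_whether_to_throw_an_error()
RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- refuse rewrites unconditionally; a real policy could check the time
  -- of day, the table involved, and so on
  RAISE EXCEPTION 'table rewrites are not allowed right now';
END;
$$;

CREATE EVENT TRIGGER no_rewrites
  ON table_rewrite
  EXECUTE PROCEDURE consider_whether_to_throw_an_error();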

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-05 Thread Andres Freund
On 2012-12-05 19:03:44 -0300, Alvaro Herrera wrote:
 Simon Riggs wrote:
  Prefer
  BgWorkerStart_ConsistentState to be renamed to BgWorkerRun_InHotStandby
  BgWorkerStart_RecoveryFinished to be renamed to BgWorkerRun_InNormalMode
 
  presumably a process will shut down if (BgWorkerRun_InHotStandby &&
  !BgWorkerRun_InNormalMode)

 Hmm, no, I haven't considered that.  You mean that a bgworker that
 specifies to start at BgWorkerStart_ConsistentState will stop once
 normal mode is reached?  Currently they don't do that.  And since we
 don't have the notion that workers stop working, it wouldn't work --
 postmaster would start them back up immediately.

I personally don't see too much demand for this from a use-case
perspective. Simon, did you have anything in mind that made you ask
this?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-05 Thread Andres Freund
On 2012-12-05 18:42:42 -0300, Alvaro Herrera wrote:
 <para>
  <structfield>bgw_sighup</structfield> and <structfield>bgw_sigterm</> are
  pointers to functions that will be installed as signal handlers for the new
  -   process.
  +   process. XXX: Can they be NULL?
 </para>

 Hm.  The code doesn't check, so what happens is probably a bug anyhow.
 I don't know whether sigaction crashes in this case; its manpage doesn't
 say.  I guess the right thing to do is have RegisterBackgroundWorker
 check for a NULL sighandler, and set it to something standard if so (say
 SIG_IGN for SIGHUP and maybe quickdie() or similar for SIGTERM).

AFAIR a NULL sigaction is used to query the current handler, which
indirectly might lead to problems due to the wrong handler being called.

Setting up SIG_IGN and quickdie() in that case seems sensible.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



