Re: [HACKERS] UNION ALL has higher cost than inheritance

2010-10-21 Thread Itagaki Takahiro
On Thu, Oct 21, 2010 at 2:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The plan for UNION initially involves a couple of SubqueryScan nodes,
 which impose an extra cost of cpu_tuple_cost per tuple.  Those later
 get optimized away, but we don't try to readjust the cost estimates
 for that.

Thanks. It also explains another question of mine: why Merge Append cannot
be used for UNION ALL plans. Inheritance is better than UNION ALL
in many more cases, thanks to Merge Append.
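
For reference, a minimal setup that reproduces plans of this shape -- the
original table definitions aren't shown here, so the names, row counts,
and index below are assumptions chosen to match the plans:

CREATE TABLE parent (i int);
CREATE TABLE child () INHERITS (parent);
CREATE INDEX child_i_idx ON child (i);
INSERT INTO parent VALUES (0);                      -- one row in the parent
INSERT INTO child SELECT generate_series(1, 1000);  -- ~1000 rows in the child
ANALYZE parent;
ANALYZE child;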

=# EXPLAIN SELECT * FROM parent ORDER BY i LIMIT 10;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=1.02..1.58 rows=10 width=4)
   ->  Result  (cost=1.02..56.79 rows=1001 width=4)
         ->  Merge Append  (cost=1.02..56.79 rows=1001 width=4)
               Sort Key: public.parent.i
               ->  Sort  (cost=1.01..1.01 rows=1 width=4)
                     Sort Key: public.parent.i
                     ->  Seq Scan on parent  (cost=0.00..1.00 rows=1 width=4)
               ->  Index Scan using child_i_idx on child parent  (cost=0.00..43.25 rows=1000 width=4)
(8 rows)

=# EXPLAIN (SELECT * FROM ONLY parent ORDER BY i) UNION ALL (SELECT *
FROM child ORDER BY i) ORDER BY i LIMIT 10;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=75.91..75.93 rows=10 width=4)
   ->  Sort  (cost=75.91..78.41 rows=1001 width=4)
         Sort Key: parent.i
         ->  Append  (cost=1.01..54.28 rows=1001 width=4)
               ->  Sort  (cost=1.01..1.01 rows=1 width=4)
                     Sort Key: parent.i
                     ->  Seq Scan on parent  (cost=0.00..1.00 rows=1 width=4)
               ->  Index Scan using child_i_idx on child  (cost=0.00..43.25 rows=1000 width=4)
(8 rows)

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Domains versus arrays versus typmods

2010-10-21 Thread Richard Huxton

On 20/10/10 01:47, Robert Haas wrote:

On Tue, Oct 19, 2010 at 6:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Comments?


It might be reasonable to back-patch whatever we decide on into 9.0,
because it is so new, but I would be reluctant to go back further
unless we have some evidence that it's bothering people.  It seems to
me that this can could have a lot of worms in it, and I fear that
there could be several rounds of fixes, which I would rather not
inflict on users of supposedly-stable branches.


The work-around I applied when I stumbled across this was just to apply 
an explicit cast before my function's RETURN. That neatly solves my 
particular problem (which I at first thought was a formatting issue 
somewhere in my app).


The real danger with this is the opportunity to end up with occasional 
bad data in tables, quite possibly unnoticed. If I'd come across this in 
an existing system rather than a new app I'm pretty sure it would have 
confused me for a lot longer than it did.

--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-21 Thread Stephen R. van den Berg
Robert Haas wrote:
On Wed, Oct 20, 2010 at 5:30 PM, Stephen R. van den Berg s...@cuci.nl wrote:
 Ideal would be: put the table-oid inside the header of each page
 (either in the official header, or in the special area).
 This way even lost blocks can be correlated to the same table.
 I'd still vote for the latest known table definition in the first
 page.  It's by no means perfect, but it will help 99% of all
 salvage attempts by an order of magnitude.

 and break on-disk compatibility just to make it easier to

If it's inserted in the special area, it will not break any
compatibility.

I don't think we should shrink the amount of usable space by 4 bytes
per block

Instead of putting it in every page, it could be inserted (say) once
every megabyte (if done in the special area) to avoid excessive
overhead.

I'm pretty dubious about the proposal to stuff an otherwise-useless
metapage in every heap, too.

The information is supposed to go in the special area, so it will not
be an extra page.

  If you have many small tables, you just
doubled your disk utilization 

For small tables, the table description typically is small as well,
so in the common case it all will still fit in one page.

- worse than that, maybe, if some of
them are empty.

An empty table does not contain any critical information which needs
to be restored (by definition :-); so the code that inserts the
table definition in the special area could easily be instructed *not*
to write out this information unless the table actually has entries.

  If we needed a metapage anyway and had extra space to
play with, stuffing some useful forensic information in there might be
worthwhile, but I have a hard time thinking that forensics alone is a
sufficient justification for such a change.

The change could easily be made backward compatible to all on disk
formats which support the special area pointer (I'm not sure how far
back that is).
-- 
Stephen.

For children with short attention spans: boomerangs that don't come back.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-21 Thread Dimitri Fontaine
David E. Wheeler da...@kineticode.com writes:
 Might I suggest instead a META.json file like PGXN requires? Here's a
 simple example:

I don't see what it buys us in this very context. The main thing to
realize here is that I wrote almost no code to parse the control file. I don't
think the extension patch should depend on the JSON-in-core patch.

Now, once we have JSON in core and before the release, given a good
reason to have much more complex configuration files that don't look at
all like postgresql.conf, I guess we could revisit.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to reliably detect if it's a promoting standby

2010-10-21 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 20.10.2010 18:06, Tatsuo Ishii wrote:
 Apart this, I wonder why walsender/walreceiver do not transfer archive
 logs as well.

 What do you mean?

I'd be pleased if Tatsuo's idea has anything to do with this mail:

  http://archives.postgresql.org/message-id/m239s1f2oi@2ndquadrant.fr

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_hba.conf host name wildcard support

2010-10-21 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 Peter Eisentraut pete...@gmx.net writes:
 1. TCP Wrappers style, leading dot indicates suffix match.
 So .example.com matches anything.example.com.  Not sure how useful that
 would be, but it could be implemented in about 3 lines of code.

 I'd lean to #1 myself.

FWIW, +1

-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Bug in plpython's Python Generators

2010-10-21 Thread Jean-Baptiste Quenot
Hi there,

I can't make Python generators work reliably.  According to the
documentation, this should work:

CREATE OR REPLACE FUNCTION foobar()
RETURNS SETOF text AS
$$
for s in ('Hello', 'World'):
    plpy.execute('select 1')
    yield s
$$
LANGUAGE 'plpythonu';

I get this error when calling the function:

test=# select foobar();
ERROR:  error fetching next item from iterator
CONTEXT:  PL/Python function foobar


When I remove the dummy plpy.execute() call, it works:

CREATE OR REPLACE FUNCTION foobar()
RETURNS SETOF text AS
$$
for s in ('Hello', 'World'):
    yield s
$$
LANGUAGE 'plpythonu';

test=# select foobar();
 foobar
--------
 Hello
 World
(2 rows)


Seems like calls to plpy.execute() conflict with generators.  This is
the case both on versions 8.4.4 and 9.0.1.
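
A possible workaround sketch (untested, name is mine): collect the rows
eagerly so that no plpy.execute() call happens while the generator is
suspended; PL/Python also accepts a returned list for SETOF functions:

CREATE OR REPLACE FUNCTION foobar_workaround()
RETURNS SETOF text AS
$$
# run plpy.execute() only inside this loop, never between yields
results = []
for s in ('Hello', 'World'):
    plpy.execute('select 1')
    results.append(s)
return results
$$
LANGUAGE 'plpythonu';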

All the best,
-- 
Jean-Baptiste Quenot

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Fix snapshot taking inconsistencies

2010-10-21 Thread Marko Tiikkaja

Hi,

Here's an updated patch.

I'm still not too fond of the logic in spi.c, but I can't see a better 
way to do this.  If someone sees a better way, I'm not going to object.


I also made some changes to the SQL functions now that we have a 
different API.  The previous code pushed and popped snapshots quite heavily.


I'd also like to see more regression tests for SQL functions, but I'm 
going to submit my suggestions as a separate patch.



Regards,
Marko Tiikkaja
*** a/src/backend/catalog/pg_proc.c
--- b/src/backend/catalog/pg_proc.c
***
*** 755,761  fmgr_sql_validator(PG_FUNCTION_ARGS)
--- 755,763 
    Oid          funcoid = PG_GETARG_OID(0);
    HeapTuple    tuple;
    Form_pg_proc proc;
+   List        *raw_parsetree_list;
    List        *querytree_list;
+   ListCell    *list_item;
    bool         isnull;
    Datum        tmp;
    char        *prosrc;
***
*** 828,836  fmgr_sql_validator(PG_FUNCTION_ARGS)
 */
if (!haspolyarg)
{
!       querytree_list = pg_parse_and_rewrite(prosrc,
!                                             proc->proargtypes.values,
!                                             proc->pronargs);
        (void) check_sql_fn_retval(funcoid, proc->prorettype,
                                   querytree_list,
                                   NULL, NULL);
--- 830,854 
 */
if (!haspolyarg)
{
!       /*
!        * Parse and rewrite the queries in the function text.
!        *
!        * Even though check_sql_fn_retval is only interested in the last
!        * query, we analyze all of them here to check for any errors.
!        */
!       raw_parsetree_list = pg_parse_query(prosrc);
!
!       querytree_list = NIL;
!       foreach(list_item, raw_parsetree_list)
!       {
!           Node *parsetree = (Node *) lfirst(list_item);
!
!           querytree_list = pg_analyze_and_rewrite(parsetree, prosrc,
!                               proc->proargtypes.values, proc->pronargs);
!       }
!
!       Assert(querytree_list != NIL);
!
        (void) check_sql_fn_retval(funcoid, proc->prorettype,
                                   querytree_list,
                                   NULL, NULL);
*** a/src/backend/executor/functions.c
--- b/src/backend/executor/functions.c
***
*** 84,89  typedef struct
--- 84,90 
    bool        returnsSet;     /* true if returning multiple rows */
    bool        returnsTuple;   /* true if returning whole tuple result */
    bool        shutdown_reg;   /* true if registered shutdown callback */
+   bool        snapshot;       /* true if pushed an active snapshot */
    bool        readonly_func;  /* true to run in read only mode */
    bool        lazyEval;       /* true if using lazyEval for result query */
  
***
*** 93,107  typedef struct
  
    JunkFilter *junkFilter;     /* will be NULL if function returns VOID */
  
!   /* head of linked list of execution_state records */
!   execution_state *func_state;
  } SQLFunctionCache;
  
  typedef SQLFunctionCache *SQLFunctionCachePtr;
  
  
  /* non-export function prototypes */
! static execution_state *init_execution_state(List *queryTree_list,
                         SQLFunctionCachePtr fcache,
                         bool lazyEvalOK);
  static void init_sql_fcache(FmgrInfo *finfo, bool lazyEvalOK);
--- 94,107 
  
    JunkFilter *junkFilter;     /* will be NULL if function returns VOID */
  
!   List       *func_state;
  } SQLFunctionCache;
  
  typedef SQLFunctionCache *SQLFunctionCachePtr;
  
  
  /* non-export function prototypes */
! static List *init_execution_state(List *queryTree_list,
                         SQLFunctionCachePtr fcache,
                         bool lazyEvalOK);
  static void init_sql_fcache(FmgrInfo *finfo, bool lazyEvalOK);
***
*** 123,183  static void sqlfunction_destroy(DestReceiver *self);
  
  
  /* Set up the list of per-query execution_state records for a SQL function */
! static execution_state *
  init_execution_state(List 

Re: [HACKERS] UNION ALL has higher cost than inheritance

2010-10-21 Thread Tom Lane
Itagaki Takahiro itagaki.takah...@gmail.com writes:
 On Thu, Oct 21, 2010 at 2:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The plan for UNION initially involves a couple of SubqueryScan nodes,
 which impose an extra cost of cpu_tuple_cost per tuple.  Those later
 get optimized away, but we don't try to readjust the cost estimates
 for that.

 Thanks. It also explains another question of mine: why Merge Append cannot
 be used for UNION ALL plans.

Hmm, seems like the example you show ought to work.  I wonder if there
was an oversight in that patch...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-21 Thread Tom Lane
Stephen R. van den Berg s...@cuci.nl writes:
 Robert Haas wrote:
 and break on-disk compatibility just to make it easier to

 If it's inserted in the special area, it will not break any
 compatibility.

I'll tell you what I really don't like about this proposal: we discuss
some scheme or other for taking over the special space in heap pages
at least once a year.  None of them have been particularly compelling
so far, but one may come along that is; especially given that we're now
trying to maintain on-disk compatibility across versions.  So I think
the opportunity cost of assigning a use to that space is mighty high.
I don't find this idea important enough to justify foreclosing future
uses for the special space.

The real bottom line is this: if you care enough about your data to
be willing to expend a large amount of effort on manual recovery
attempts, why didn't you have a decent backup scheme in place?
There are way too many scenarios where you'll have no hope of doing
any such manual recovery anyway.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-21 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Itagaki Takahiro itagaki.takah...@gmail.com writes:
 Why does only hstore have version '9.1'? Any other modules have
 '9.1devel'.

 It's the only contrib that's not using PGXS but instead directly
 includes $(top_builddir)/src/Makefile.global,

... well, that's just a bug in hstore.  *All* the contrib modules
should be using PGXS, unless they have a damn good reason not to;
which is not apparent for hstore.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable snapshot isolation patch

2010-10-21 Thread Kevin Grittner
Jeff Davis pg...@j-davis.com wrote:
 
 Also, it appears to be non-deterministic, to a degree at least,
 so you may not observe the problem in the exact way that I do.
 The SELECTs only look at the root and the predicate doesn't match.
 So each SELECT sets an SIReadLock on block 0 and exits the search.
 Looks good so far.
 
 T1 then inserts, and it has to modify page 0, so it does
 FlagRWConflict(). That sets writer->inConflict = reader and
 reader->outConflict = writer (where writer is T1 and reader is
 T2); and T1->outConflict and T2->inConflict remain NULL.
 
 Then T2 inserts, and I didn't catch that part in as much detail in
 gdb, but it apparently has no effect on that state, so we still
 have T1->inConflict = T2, T1->outConflict = NULL, T2->inConflict =
 NULL, and T2->outConflict = T1.
 
I now see where the wheels fall off.  The GiST query initially stops
at a high level, so predicate locks only go that deep, and the
*first* insert of a conflicting row must ripple up and modify a
locked page; but *subsequent* inserts may only need to modify the
leaf level.  Even though your particular example doesn't involve a
cycle and therefore doesn't require a rollback for correctness
(although it might tend to generate a false positive if index page
locks were working correctly), you've exposed a flaw in the
GiST AM implementation of predicate locks.
 
On a first glance, it appears that we would need to check for
conflicts as we move down through the index to find the right spot
for an insert, not as we modify pages for the insert.  I hope
there's some more subtle technique or some way to qualify it;
otherwise a search which stops at the root page would generate a
conflict out to just about any index insertion from a concurrent
transaction.
 
I will add this to my list of issues to fix based on your review,
unless it's something you would like to tackle -- I'm not going to
chase away anyone who wants to help with this.  :-)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-21 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 ... well, that's just a bug in hstore.  *All* the contrib modules
 should be using PGXS, unless they have a damn good reason not to;
 which is not apparent for hstore.

Here's a patch.
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

*** a/contrib/hstore/Makefile
--- b/contrib/hstore/Makefile
***
*** 1,9 
  # contrib/hstore/Makefile
  
- subdir = contrib/hstore
- top_builddir = ../..
- include $(top_builddir)/src/Makefile.global
- 
  MODULE_big = hstore
  OBJS = hstore_io.o hstore_op.o hstore_gist.o hstore_gin.o hstore_compat.o \
  	crc32.o
--- 1,5 
***
*** 12,15  DATA_built = hstore.sql
--- 8,21 
  DATA = uninstall_hstore.sql
  REGRESS = hstore
  
+ ifdef USE_PGXS
+ PG_CONFIG = pg_config
+ PGXS := $(shell $(PG_CONFIG) --pgxs)
+ include $(PGXS)
+ else
+ subdir = contrib/hstore
+ top_builddir = ../..
+ include $(top_builddir)/src/Makefile.global
  include $(top_srcdir)/contrib/contrib-global.mk
+ endif
+ 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-21 Thread David E. Wheeler
On Oct 21, 2010, at 12:33 AM, Dimitri Fontaine wrote:

 I don't see what it buys us in this very context. The main thing to
 realize here is that I wrote almost no code to parse the control file. I don't
 think the extension patch should depend on the JSON-in-core patch.
 
 Now, once we have JSON in core and before the release, given a good
 reason to have much more complex configuration files that don't look at
 all like postgresql.conf, I guess we could revisit.

Sure. The reason to do it, though, is so that extension authors can create just 
one metadata file, instead of two (or three, if one must also put such data 
into the Makefile).

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL and HugePage

2010-10-21 Thread Mark Wong
On Wed, Oct 20, 2010 at 1:13 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Oct 20, 2010 at 3:47 PM, daveg da...@sonic.net wrote:
 On Wed, Oct 20, 2010 at 12:28:25PM -0700, Greg Stark wrote:
 On Wed, Oct 20, 2010 at 12:17 PM, Greg Stark gsst...@mit.edu wrote:
  I don't think it's a big cost once all the processes
  have been forked if you're reusing them beyond perhaps slightly more
  efficient cache usage.

 Hm, this site claims to get a 13% win just from the reduced tlb misses
 using a preload hack with Pg 8.2. That would be pretty substantial.

 http://oss.linbit.com/hugetlb/

 That was my motivation in trying a patch. TLB misses can be a substantial
 overhead. I'm not current on the state of play, but working at Sun's
 benchmark lab on a DB TPC-B benchmark something for the first generation
 of MP systems, something like 30% of all bus traffic was TLB misses. The
 next iteration of the hardware had a much larger TLB.

 I have a client with 512GB memory systems, currently with 128GB configured
 as postgresql buffer cache. Which is 32M TLB entries trying to fit in the
 few dozen cpu TLB slots. I suspect there may be some contention.

 I'll benchmark of course.

 Do you mean 128GB shared buffers, or shared buffers + OS cache?  I
 think that the general wisdom is that performance tails off beyond
 8-10GB of shared buffers anyway, so a performance improvement on 128GB
 shared buffers might not mean much unless you can also show that 128GB
 shared buffers actually performs better than some smaller amount.

I'm sure someone will correct me if I'm wrong, but when I looked at
this a couple years ago I believe a side effect of using hugetlbs is
that these segments are never swapped out.

I made a weak attempt to patch postgres to use hugetlbs when
allocating shared memory.  If I can find that patch I'll send it out..

Mark

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-21 Thread Dimitri Fontaine
David E. Wheeler da...@kineticode.com writes:
 Sure. The reason to do it, though, is so that extension authors can create
 just one metadata file, instead of two (or three, if one must also put such
 data into the Makefile).

That's a good idea, but my guess is that the implementation cost of
supporting the control format in your perl infrastructure is at least an
order of magnitude lower than the cost for me to support your current
JSON file format, so I lean towards you having an automated way to fill
in the json file from the control one...

The Makefile supports $(VERSION) because chances are it's already there
(think packaging or tarball release targets). Having yet another place
to manually maintain a version number ain't appealing.

In the latest patch, though, the only other thing you find in the
Makefile about the extension is its basename, which must match that of
both the .control and the .sql files. And it's possible for $(EXTENSION)
to be a list of them, too, because of contrib/spi.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL and HugePage

2010-10-21 Thread Mark Wong
On Tue, Oct 19, 2010 at 8:30 PM, daveg da...@sonic.net wrote:
 On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote:
 On 20/10/10 16:05, Mark Kirkwood wrote:
 
 
 shmget and friends are hugetlbpage  aware, so it seems it should 'just
 work'.
 

 Heh - provided you specify

 SHM_HUGETLB


 in the relevant call that is :-)

 I had a patch for this against 8.3 that I could update if there is any
 interest. I suspect it is helpful.

Oh, probably better than me digging up my broken one.  Send it out as
is if you don't want to update it. :)

Regards,
Mark

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable snapshot isolation patch

2010-10-21 Thread Kevin Grittner
Jeff Davis pg...@j-davis.com wrote:
 
 That looks like a reasonable state to me, but I'm not sure exactly
 what the design calls for. I am guessing that the real problem is
 in PreCommit_CheckForSerializationFailure(), where there are 6
 conditions that must be met for an error to be thrown. T2 falls
 out right away at condition 1. T1 falls out on condition 4. I
 don't really understand condition 4 at all -- can you explain it?
 And can you explain conditions 5 and 6 too?
 
Since most transactions are rolled back on a conflict detection
during a read or write attempt, there are only a few very specific
conditions which can slip through to where they need to be
detected on commit.  Here's the code with the six conditions:
 
if (MySerializableXact->inConflict != InvalidSerializableXact
    && MySerializableXact->inConflict != MySerializableXact
    && !(MySerializableXact->inConflict->rolledBack)
    && MySerializableXact->inConflict->inConflict !=
       InvalidSerializableXact
    && !SxactIsCommitted(MySerializableXact->inConflict)
    && !SxactIsCommitted(MySerializableXact->inConflict->inConflict))
 
Condition 4 is testing whether MySerializableXact is on the out
side of a pivot -- in the parlance of most examples, is
MySerializableXact TN?
 
Condition 5 and 6 confirm that neither T0 nor T1 have committed
first; we can only have a problem if TN commits first.
 
Basically, when we already have a pivot, but no transaction has yet
committed, we wait to see if TN commits first.  If so, we have a
problem; if not, we don't.  There's probably some room for improving
performance by cancelling T0 or T1 instead of TN, at least some of
the time; but in this pass we are always cancelling the transaction
in whose process we detect the need to cancel something.
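 
For anyone following along, a hypothetical two-session illustration of
the simple write-skew pattern these commit-time checks are about (the
table and values are invented for illustration):
 
-- session T0                                -- session T1
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM docs WHERE cls = 1;
                                             BEGIN ISOLATION LEVEL SERIALIZABLE;
                                             SELECT count(*) FROM docs WHERE cls = 2;
INSERT INTO docs VALUES (2);                 -- T0 writes what T1 read
                                             INSERT INTO docs VALUES (1);
                                             -- T1 writes what T0 read
COMMIT;
                                             COMMIT;  -- no serial order produces this
                                                      -- result, so one must be rolled back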
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-21 Thread David E. Wheeler
On Oct 21, 2010, at 8:12 AM, Dimitri Fontaine wrote:

 That's a good idea, but my guess is that the implementation cost of
 supporting the control format in your perl infrastructure is at least an
 order of magnitude lower than the cost for me to support your current
 JSON file format, so I lean towards you having an automated way to fill
 in the json file from the control one...

Well, it *will* be easier. Eventually. Right now, the file has to be edited by 
hand. Which I can tell you from experience is rather…error-prone.

Anyway, I wouldn't push for a JSON file format until a parser was just there 
for you to use without too much trouble.

 The Makefile supports $(VERSION) because chances are it's already there
 (think packaging or tarball release targets). Having yet another place
 where to manually maintain a version number ain't appealing.

Be aware that PGXS sets a $(VERSION) variable already, so you'll need to use 
another name, I think, to guard from conflicts. This is in Makefile.global:

VERSION = 9.0.1
MAJORVERSION = 9.0

Maybe use EXTVERSION? You don't want to overwrite the core version because a 
makefile author could use it to change the build (pgTAP does this, for example).

 In the latest patch, though, the only other thing you find in the
 Makefile about the extension is its basename, which must be the one of
 both the .control and the .sql files. And it's possible for $(EXTENSION)
 to be a list of them, too, because of contrib/spi.

Right, that makes sense.

Best,

David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-21 Thread Dimitri Fontaine
David E. Wheeler da...@kineticode.com writes:
 Be aware that PGXS sets a $(VERSION) variable already, so you'll need
 to use another name, I think, to guard from conflicts. This is in
 Makefile.global:

Of course that's not a problem for contribs, and I used EXTVERSION in a
previous version of the patch. I guess I will get back to using
$(EXTVERSION) in the Makefile next time I have to produce a patch.

This part of the problem hasn't received much thought yet, and it shows.
The rest of the patch has been in my head for months, though, so I
expect fewer problems there...

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-21 Thread Alvaro Herrera
Excerpts from Dimitri Fontaine's message of jue oct 21 12:53:18 -0300 2010:

 This part of the problem hasn't received much thought yet, and it shows.
 The rest of the patch has been in my head for months, though, so I
 expect fewer problems there...

Keep on it.  You're doing a terrific job.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-21 Thread Robert Haas
On Thu, Oct 21, 2010 at 11:12 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 David E. Wheeler da...@kineticode.com writes:
 Sure. The reason to do it, though, is so that extension authors can create
 just one metadata file, instead of two (or three, if one must also put such
 data into the Makefile).

 That's a good idea, but my guess is that the implementation cost of
 supporting the control format in your perl infrastructure is at least an
 order of magnitude lower than the cost for me to support your current
 JSON file format, so I lean towards you having an automated way to fill
 in the json file from the control one...

Ah, truth to power.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Domains versus arrays versus typmods

2010-10-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Oct 20, 2010 at 10:03 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 My point is that anyplace that is relying on the surface typelem,
 without drilling down to see what the base type is, is wrong.
 So yeah, those lookups are (will be) necessary.

 OK.  In that case, +1 from me.

I've come across another interesting definitional issue, which is what
properties should domains have with respect to matching to polymorphic
arguments.  Currently, the polymorphic matching functions take domains
at face value (ie, without noticing their relationships to their base
types), with one exception: because they use get_element_type() to
decide if a type matches ANYARRAY, domains over arrays will be
considered to match ANYARRAY.  This leads to some weird inconsistencies
and at least one genuine bug.  Observe (this is with 9.0.x HEAD):

regression=# create domain myi as int;
CREATE DOMAIN
regression=# select array[1,2] || 3;
 ?column? 
----------
 {1,2,3}
(1 row)

regression=# select array[1,2] || 3::myi;
ERROR:  operator does not exist: integer[] || myi

In this case, one might expect myi to be automatically downcast to int
so that it could be matched up with the int array, but that's not
happening.  However:

regression=# create domain myia as int[];
CREATE DOMAIN
regression=# select array[1,2]::myia || 3;
 ?column? 
----------
 {1,2,3}
(1 row)

So we will downcast myia to int[], or at least one might assume that's
what's happening.  But actually it's worse than that: the result of this
operation is thought to be myia not int[], because myia itself is taken
as matching ANYARRAY, and the operator result is the same ANYARRAY type.
Thus, this case goes off the rails completely:

regression=# create domain myia2 as int[] check(array_length(value,1) = 2);
CREATE DOMAIN
regression=# select array[1,2]::myia2;
 array 
-------
 {1,2}
(1 row)

regression=# select array[1,2,3]::myia2;
ERROR:  value for domain myia2 violates check constraint myia2_check
regression=# select array[1,2]::myia2 || 3;
 ?column? 
----------
 {1,2,3}
(1 row)

The result of the || is considered to be myia2, as can be seen for
example this way:

regression=# create view vvv as select array[1,2]::myia2 || 3 as x;
CREATE VIEW
regression=# \d vvv
 View public.vvv
 Column | Type  | Modifiers 
+---+---
 x  | myia2 | 

So we have a value that's claimed to belong to the domain, but it
doesn't meet the domain's constraints.

What I am intending to do about this in the short run is to leave the
anyarray-ness tests in the polymorphic-compatibility-checking functions
as-is.  That will mean (with the change in typelem for domains) that a
domain over array doesn't match ANYARRAY unless you explicitly downcast
it.  I argue that this is consistent with the current behavior of not
auto-downcasting domains to match the element type of an array.  We
could go back and change it later, but if we do, we should try to make
both cases provide auto-downcast-when-needed behavior.  I have not dug
into just what code changes would be needed for that.  Auto-downcast
wouldn't be exactly compatible with the current behavior anyway, since
it would result in a different claimed type for the operator result.
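
For illustration (my reading of the proposed behavior, not an excerpt
from the patch), the explicit downcast would look like this:

SELECT array[1,2]::myia2::int[] || 3;   -- downcast to int[] first; result is int[], not myia2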

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable snapshot isolation patch

2010-10-21 Thread Jeff Davis
On Thu, 2010-10-21 at 10:29 -0500, Kevin Grittner wrote:
 Basically, when we already have a pivot, but no transaction has yet
 committed, we wait to see if TN commits first.  If so, we have a
 problem; if not, we don't.  There's probably some room for improving
 performance by cancelling T0 or T1 instead of TN, at least some of
 the time; but in this pass we are always cancelling the transaction
 in whose process we detect the need to cancel something.

Well, in this case we do clearly have a problem, because the result is
not equal to the serial execution of the transactions in either order.

So the question is: at what point is the logic wrong? It's either:
  1. PreCommit_CheckForSerializationFailure() is missing a failure case.
  2. The state prior to entering that function (which I believe I
sufficiently described) is wrong.

If it's (2), then what should the state look like, and how is the GiST
code supposed to result in that state?

I know some of these questions are answered in the relevant research,
but I'd like to discuss this concrete example specifically.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Domains versus arrays versus typmods

2010-10-21 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 regression=# select array[1,2] || 3::myi;
 ERROR:  operator does not exist: integer[] || myi
 
 In this case, one might expect myi to be automatically downcast to
 int so that it could be matched up with the int array, but that's
 not happening.
 
I guess it should allow that, although for my uses of domains it's
hard to see a reasonable use case for it, so that part doesn't
bother me too much.
 
 regression=# create domain myia as int[];
 CREATE DOMAIN
 regression=# select array[1,2]::myia || 3;
  ?column? 
 ----------
  {1,2,3}
 (1 row)
 
 So we will downcast myia to int[], or at least one might assume
 that's what's happening.  But actually it's worse than that: the
 result of this operation is thought to be myia not int[], because
 myia itself is taken as matching ANYARRAY, and the operator result
 is the same ANYARRAY type.
 
That is actually what I would want and expect.  Let's say I have an
array of attorney bar numbers, and I add one more as a literal. 
While an argument could be made that the integer should be cast to
a bar number before being added to the array, we don't require that
for an assignment to a simple variable in the domain, so it would be
surprising to require a cast here, and even more surprising for the
concatenation to result in an array of primitive integers rather
than an array of attorney bar numbers.
 
 regression=# create domain myia2 as int[]
 check(array_length(value,1) = 2);
 CREATE DOMAIN
 
 regression=# select array[1,2]::myia2 || 3;
  ?column? 
 ----------
  {1,2,3}
 (1 row)
 
 So we have a value that's claimed to belong to the domain, but it
 doesn't meet the domain's constraints.
 
Yeah, that's obviously wrong.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Domains versus arrays versus typmods

2010-10-21 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 So we will downcast myia to int[], or at least one might assume
 that's what's happening.  But actually it's worse than that: the
 result of this operation is thought to be myia not int[], because
 myia itself is taken as matching ANYARRAY, and the operator result
 is the same ANYARRAY type.
 
 That is actually what I would want and expect.  Let's say I have an
 array of attorney bar numbers, and I add one more as a literal. 
 While an argument could be made that the integer should be cast to
 a bar number before being added to the array, we don't require that
 for an assignment to a simple variable in the domain, so it would be
 surprising to require a cast here, and even more surprising for the
 concatenation to result in an array of primitive integers rather
 than an array of attorney bar numbers.

I disagree with that argument: you are confusing an array over a domain
type with a domain over an array type.  In the latter case, the domain
could have additional constraints (such as the length constraint in my
other example), and there's no reason to assume that || or other array
operators would preserve those constraints.

A perhaps comparable example is

create domain verysmallint as int check (value < 10);

select 9::verysmallint + 1;

The result of the addition is int, not verysmallint, which is why you
don't get an error.
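
To make that concrete (my addition, using the domain just defined): the
addition yields the base type, so the domain's constraint is never
applied to the result:

SELECT pg_typeof(9::verysmallint + 1);   -- integer, not verysmallint
SELECT 9::verysmallint + 1;              -- returns 10 with no constraint error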

From an abstract-data-type point of view, the fact that any of these
operations are even allowed without an explicit downcast is a bit
uncool: it exposes the implementation of the domain type, which one
could argue shouldn't be allowed, at least not without some notational
marker showing you know what you're doing.  But the SQL committee
seems to have decided to ignore that tradition and allow auto-downcasts.
Nonetheless, when a domain type is fed to an operator that works on its
base type, it has to be clearly understood that there *is* an implied
downcast, and whatever special properties the domain may have had will
be lost.  As far as the operator and its result are concerned, the
domain is just its base type.

I'm not against fixing these cases so that auto downcasts happen, I'm
just saying that it's outside the scope of what I'm going to do in
response to the current bug.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.

2010-10-21 Thread Bruce Momjian
Kevin Grittner wrote:
 Tom Lane  wrote:
  
   I'm all for doing this client-side.
  
 Well, that makes a big difference.  Unless someone can make a
 convincing argument for why we should modify the server side to
 support this, I think we should just focus on this one client-side
 patch.
  
 I'd be happy to give it a closer look, but I may not be able to do so
 for a few weeks, and won't complain if someone beats me to it.

Uh, why would we do this client-side rather than server-side?  If we do
it server-side, all interfaces get it.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Domains versus arrays versus typmods

2010-10-21 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 you are confusing an array over a domain type with a domain over
 an array type.
 
Yes I was.  Sorry.  Argument withdrawn.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.

2010-10-21 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Tom Lane  wrote:
 I'm all for doing this client-side.

 Uh, why would we do this client-side rather than server-side?  If we do
 it server-side, all interfaces get it.

1. The API that's being implemented is JDBC-specific.

2. Even if you want to argue that it would be generally useful to have
such a feature, it would certainly require additional client-side
programming to make each client interface make use of it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.

2010-10-21 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Tom Lane  wrote:
  I'm all for doing this client-side.
 
  Uh, why would we do this client-side rather than server-side?  If we do
  it server-side, all interfaces get it.
 
 1. The API that's being implemented is JDBC-specific.
 
 2. Even if you want to argue that it would be generally useful to have
 such a feature, it would certainly require additional client-side
 programming to make each client interface make use of it.

Wouldn't it be simpler to code this in the server and have the client
use that facility, rather than have each client implement it?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.

2010-10-21 Thread Kevin Grittner
Bruce Momjian br...@momjian.us wrote:
 
 why would we do this client-side rather than server-side?
 
Because the timeout is supposed to be a limit on the time allowed
for specific Java methods to complete, which might be running a
large number of SQL statements within one invocation, and which may
include significant network latency.  It's a lot of work to get
pretty close on the server side, and you can never really
implement exactly what the JDBC API is requesting.
 
What if you have an app which can draw data from any of a number of
remote databases, and you want to use this limit so if one becomes
unavailable for some reason you can re-run the request on another
within a reasonable time?  The network connection goes down after
you submit your request, you've got a period of minutes or hours
until TCP gives up, and the user expects a response within a few
seconds...
 
If you implement something with server-side semantics, there's
nothing to prevent an application which is PostgreSQL-aware from
accessing it through JDBC, of course.  statement_timeout and other
GUCs can be set locally to your heart's content.
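 
As a concrete sketch of that server-side alternative (the values here
are arbitrary):
 
SET statement_timeout = '5s';           -- session-wide cap on any single statement
-- or scoped to a single transaction:
BEGIN;
SET LOCAL statement_timeout = '5s';
-- ... run the statements here ...
COMMIT;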
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.

2010-10-21 Thread Bruce Momjian
Kevin Grittner wrote:
 Bruce Momjian br...@momjian.us wrote:
  
  why would we do this client-side rather than server-side?
  
 Because the timeout is supposed to be a limit on the time allowed
 for specific Java methods to complete, which might be running a
 large number of SQL statements within one invocation, and which may
 include significant network latency.  It's a lot of work to get
 pretty close on the server side, and you can never really
 implement exactly what the JDBC API is requesting.
  
 What if you have an app which can draw data from any of a number of
 remote databases, and you want to use this limit so if one becomes
 unavailable for some reason you can re-run the request on another
 within a reasonable time?  The network connection goes down after
 you submit your request, you've got a period of minutes or hours
 until TCP gives up, and the user expects a response within a few
 seconds...
  
 If you implement something with server-side semantics, there's
 nothing to prevent an application which is PostgreSQL-aware from
 accessing it through JDBC, of course.  statement_timeout and other
 GUCs can be set locally to your heart's content.

OK, thanks.  Just had to ask.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Exposing an installation's default value of unix_socket_directory

2010-10-21 Thread Tom Lane
I just noticed that there doesn't seem to be any good way of finding
out what a postmaster's default value of unix_socket_directory is.
If you try to SHOW it you just get an empty string.  We could probably
fix things so that SHOW exposes the actual setting, but (1) there might
be security arguments against that, and (2) if your problem is that you
would like to find out the value so's you can connect to said
postmaster, SHOW isn't going to help you.
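
That is, a connected client currently sees only an empty value (output
sketched from the behavior described above):

=# SHOW unix_socket_directory;
 unix_socket_directory 
-----------------------
 
(1 row)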

One possible response would be to add an item to what pg_config knows
about, eg pg_config --socketdir.  This doesn't answer every possible
use-case either, but it would be helpful for some scenarios.

Thoughts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UNION ALL has higher cost than inheritance

2010-10-21 Thread Greg Stark
On Thu, Oct 21, 2010 at 6:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Thanks. It also explains my another question why Merge Append cannot
 be used for UNION ALL plans.

 Hmm, seems like the example you show ought to work.  I wonder if there
 was an oversight in that patch...


Huh, that definitely worked in the earlier versions of the patch (as
much as it worked at all)

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in plpython's Python Generators

2010-10-21 Thread Alvaro Herrera
Excerpts from Jean-Baptiste Quenot's message of jue oct 21 09:20:16 -0300 2010:

 I get this error when calling the function:
 
 test=# select foobar();
 ERROR:  error fetching next item from iterator

I can reproduce this here.  The first bug to solve is, I think, getting
a more meaningful error report.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-21 Thread Stephen R. van den Berg
Tom Lane wrote:
Stephen R. van den Berg s...@cuci.nl writes:
 If it's inserted in the special area, it will not break any
 compatibility.

I'll tell you what I really don't like about this proposal: we discuss
some scheme or other for taking over the special space in heap pages
at least once a year.  None of them have been particularly compelling
so far, but one may come along that is; especially given that we're now
trying to maintain on-disk compatibility across versions.  So I think
the opportunity cost of assigning a use to that space is mighty high.
I don't find this idea important enough to justify foreclosing future
uses for the special space.

Well, I had (of course) thought of that, and the classical solution to
this is to specify a certain attribute-based format in order not to
cannibalise the space and block it from other future uses.

I.e. in the special area, we could start using something like:
2-byte field length (including the length field), 1-byte identifier,
field content.

For the recovery information I'd like to reserve:
identifier: 00: table OID
01: table layout

The real bottom line is this: if you care enough about your data to
be willing to expend a large amount of effort on manual recovery
attempts, why didn't you have a decent backup scheme in place?

Two obvious valid answers would be: Stupidity and/or ignorance,
sometimes a strain of bad luck.
I know it is a sad state of affairs, but not all users of postgresql
are equally knowledgeable/intelligent/responsible.

There are way too many scenarios where you'll have no hope of doing
any such manual recovery anyway.

True.  It's all a matter of statistics.  Judging by the number of reports
I find by googling net-history, I'd have to conclude that the proposed
extra information would have helped more than half of them.
-- 
Stephen.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ask for review of MERGE

2010-10-21 Thread Greg Smith

Robert Haas wrote:

I think the right way to write UPSERT is something
along the lines of:

MERGE INTO Stock t USING (VALUES (10, 1)) s(item_id, balance) ON
s.item_id = t.item_id ...
  


That led in the right direction; after a bit more fiddling I was finally 
able to get something that does what I wanted:  a single table UPSERT 
implemented with this MERGE implementation.  Here's a log of a test 
session, suitable for eventual inclusion in the regression tests:


CREATE TABLE Stock(item_id int UNIQUE, balance int);
INSERT INTO Stock VALUES (10, 2200);
INSERT INTO Stock VALUES (20, 1900);
SELECT * FROM Stock ORDER BY item_id;

 item_id | balance
---------+---------
      10 |    2200
      20 |    1900

MERGE INTO Stock t
USING (VALUES(10,100)) AS s(item_id,balance)
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance)
;

MERGE 1

SELECT * FROM Stock ORDER BY item_id;
 item_id | balance
---------+---------
      10 |    2300
      20 |    1900

MERGE INTO Stock t
USING (VALUES(30,2000)) AS s(item_id,balance)
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance)
;

MERGE 1
SELECT * FROM Stock ORDER BY item_id;
 item_id | balance
---------+---------
      10 |    2300
      20 |    1900
      30 |    2000

I'm still a little uncertain as to whether any of my other examples 
should have worked under the spec but just didn't work here, but I'll 
worry about that later.


Here's what the query plan looks like on a MATCH:

Merge  (cost=0.00..8.29 rows=1 width=22) (actual time=0.166..0.166 rows=0 loops=1)
  Action 1: Update When Matched
  Action 2: Insert When Not Mactched
  MainPlan:
  ->  Nested Loop Left Join  (cost=0.00..8.29 rows=1 width=22) (actual time=0.050..0.061 rows=1 loops=1)
        ->  Values Scan on *VALUES*  (cost=0.00..0.01 rows=1 width=8) (actual time=0.009..0.010 rows=1 loops=1)
        ->  Index Scan using stock_item_id_key on stock t  (cost=0.00..8.27 rows=1 width=14) (actual time=0.026..0.030 rows=1 loops=1)
              Index Cond: (*VALUES*.column1 = item_id)
Total runtime: 0.370 ms


And here's a miss:

Merge  (cost=0.00..8.29 rows=1 width=22) (actual time=0.145..0.145 rows=0 loops=1)
  Action 1: Update When Matched
  Action 2: Insert When Not Mactched
  MainPlan:
  ->  Nested Loop Left Join  (cost=0.00..8.29 rows=1 width=22) (actual time=0.028..0.033 rows=1 loops=1)
        ->  Values Scan on *VALUES*  (cost=0.00..0.01 rows=1 width=8) (actual time=0.004..0.005 rows=1 loops=1)
        ->  Index Scan using stock_item_id_key on stock t  (cost=0.00..8.27 rows=1 width=14) (actual time=0.015..0.015 rows=0 loops=1)
              Index Cond: (*VALUES*.column1 = item_id)
Total runtime: 0.255 ms

Next steps here:
1) Performance/concurrency tests against the trigger-based UPSERT approach (a sketch of that approach is included below).
2) Finish bit rot cleanup against HEAD.
3) Work out more complicated test cases to try and find more unexpected 
behavior edge cases and general bugs.
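
For the comparison in step 1, a sketch of the classic function-based
UPSERT against the same Stock table (the usual loop-until-done pattern;
not part of the MERGE patch, and the function name is mine):

CREATE OR REPLACE FUNCTION merge_stock(k int, delta int) RETURNS void AS
$$
BEGIN
    LOOP
        -- first, try to update an existing row
        UPDATE Stock SET balance = balance + delta WHERE item_id = k;
        IF found THEN
            RETURN;
        END IF;
        -- not there: try to insert; a concurrent insert of the same key
        -- raises unique_violation and we loop back to the UPDATE
        BEGIN
            INSERT INTO Stock VALUES (k, delta);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            NULL;   -- loop and retry
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT merge_stock(10, 100);   -- same net effect as the first MERGE above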


--
Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable snapshot isolation patch

2010-10-21 Thread Kevin Grittner
Jeff Davis pg...@j-davis.com wrote:
 
 in this case we do clearly have a problem, because the result is
 not equal to the serial execution of the transactions in either
 order.
 
Yeah, you're right.  I misread that example -- newbie with the
PERIOD type.
 
 So the question is: at what point is the logic wrong? It's either:
   1. PreCommit_CheckForSerializationFailure() is missing a failure
 case.
   2. The state prior to entering that function (which I believe I
 sufficiently described) is wrong.
 
It's (2).  For the reasons I described in my previous email.  Even
though I misread the specifics of your example, I was close enough to
see where the problem was accurately.  :-/
 
 If it's (2), then what should the state look like, and how is the
 GiST code supposed to result in that state?
 
The second insert should create conflicts similar to what the first
did, but in the other direction -- simple write skew.  How GiST is
supposed to catch this is the big question.  My logic that a
conflicting insert will modify a page read by the other transaction
only holds until someone inserts a conflicting entry.  That's why it
wasn't reliably failing until you had an example where both
transactions access the same leaf page.
 
In your example, session 1's insert creates the leaf entry and
propagates entries up to the root.  When session 2 inserts, it
can just modify the leaf, so the conflict is missed.  As I said, the
most obvious way to fix this is to look for conflicts while
descending to the leaf for an insert.  I'm almost sure we can do
better than that, but I haven't finished thinking it through.  A
rough idea might be that when we find a conflict on an insert, we
acquire additional predicate locks on everything between the lowest
point of conflict and the leaf; the rest of the logic would remain
as-is.  I haven't finished mulling that over, but it seems likely to
work.  If we did that, session 2 would detect the conflict on the
insert to the leaf, and all would be well.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, work in progress

2010-10-21 Thread Peter Eisentraut
On tor, 2010-10-14 at 22:54 -0400, Robert Haas wrote:
 and maybe not that bad, but I wonder if there is some preparatory
 refactoring that could be done to trim it down a bit.  I notice, for
 example, that a lot of places that looked at asc/desc, nulls
 first/last now look at asc/desc, nulls first/last, collation.  In
 particular, all the pathkey stuff is like this.  And similarly places
 that used to care about type, typmod now have to care about type,
 typmod, collation.  There might be ways to restructure some of this
 code so that these things can be changed without having to touch quite
 so many places.

Yeah, I think that's what I'll try to do next.

We already have TypeName as a structure that contains type and typmod
(and collation, in my patch).  We could make that a primnode instead of
a parsenode, and use it in more places, or we could make a new leaner
structure that only contains the numeric info.

We could then, for example, change things like this:

typedef struct Var
{
    Expr        xpr;
    ...
    Oid         vartype;
    int32       vartypmod;
    ...
}

into this

typedef struct Var
{
    Expr        xpr;
    ...
    TypeName/TypeFoo vartype;
    ...
}

This would save boatloads of duplicate code.

 It looks like you've define collations as objects that exist within
 particular namespaces, but there's no CREATE COLLATION statement, so I
 don't see what purpose this serves.  I suppose we could leave that to
 be added later, but is there actually a use case for having collations
 in individual schemas, or should we treat them more like we do casts -
 i.e. as database-global objects?

The SQL standard defines it that way, and there should be a CREATE
COLLATION statement later.  Application-specific collation sequences
might not be unreasonable in the future.

 Why does the executor ever need to see collate clauses?

Hmm, maybe not.  I think it did in an earlier working draft.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in plpython's Python Generators

2010-10-21 Thread Alvaro Herrera
Excerpts from Alvaro Herrera's message of jue oct 21 15:32:53 -0300 2010:
 Excerpts from Jean-Baptiste Quenot's message of jue oct 21 09:20:16 -0300 
 2010:
 
  I get this error when calling the function:
  
  test=# select foobar();
  ERROR:  error fetching next item from iterator
 
 I can reproduce this here.  The first bug to solve is, I think, getting
 a more meaningful error report.

Something like this.  Somebody that really knows their way around Python
has to clean this up.

alvherre=# select * from foobar();
ERROR:  error extrayendo el próximo elemento del iterador
CONTEXTO:  falló SPI_execute: SPI_ERROR_UNCONNECTED
función PL/Python «foobar»

I think all error cases in plpython need some improvement so that they
show the error message from Python.  Right now they are ignored.

... and presumably somebody can fix the real bug that Jean-Baptiste hit,
too.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


plpy.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing an installation's default value of unix_socket_directory

2010-10-21 Thread Cédric Villemain
2010/10/21 Tom Lane t...@sss.pgh.pa.us:
 I just noticed that there doesn't seem to be any good way of finding
 out what a postmaster's default value of unix_socket_directory is.
 If you try to SHOW it you just get an empty string.  We could probably
 fix things so that SHOW exposes the actual setting, but (1) there might
 be security arguments against that, and (2) if your problem is that you
 would like to find out the value so's you can connect to said
 postmaster, SHOW isn't going to help you.

 One possible response would be to add an item to what pg_config knows
 about, eg pg_config --socketdir.  This doesn't answer every possible
 use-case either, but it would be helpful for some scenarios.

 Thoughts?

I agree this is interesting information to get, but I wonder how
pg_config can know that, and it looks to me that this information has
nothing to do with pg_config.

pg_config is all about installation, socket_dir is a postgresql.conf setting.

I'd prefer a 'pg_ctl status' output.


                        regards, tom lane

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, work in progress

2010-10-21 Thread Robert Haas
On Thu, Oct 21, 2010 at 2:44 PM, Peter Eisentraut pete...@gmx.net wrote:
 On tor, 2010-10-14 at 22:54 -0400, Robert Haas wrote:
 and maybe not that bad, but I wonder if there is some preparatory
 refactoring that could be done to trim it down a bit.  I notice, for
 example, that a lot of places that looked at asc/desc, nulls
 first/last now look at asc/desc, nulls first/last, collation.  In
 particular, all the pathkey stuff is like this.  And similarly places
 that used to care about type, typmod now have to care about type,
 typmod, collation.  There might be ways to restructure some of this
 code so that these things can be changed without having to touch quite
 so many places.

 Yeah, I think that's what I'll try to do next.

 We already have TypeName as a structure that contains type and typmod
 (and collation, in my patch).  We could make that a primnode instead of
 a parsenode, and use it in more places, or we could make a new leaner
 structure that only contains the numeric info.

 We could then, for example, change things like this:

 typedef struct Var
 {
    Expr        xpr;
    ...
    Oid         vartype;
    int32       vartypmod;
    ...
 }

 into this

 typedef struct Var
 {
    Expr        xpr;
    ...
    TypeName/TypeFoo vartype;
    ...
 }

 This would save boatloads of duplicate code.

I think that the idea of having a node that represents a type in all
its glory is a very good one.  I'm somewhat inclined not to reuse
TypeName, because I think we'll end up wanting to use this in places
where names and location are not available.  In fact, judging by
some of the logic in LookupTypeNames(), we have some cases like that
already, which might be worth trying to clean up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL and HugePage

2010-10-21 Thread daveg
On Thu, Oct 21, 2010 at 08:16:27AM -0700, Mark Wong wrote:
 On Tue, Oct 19, 2010 at 8:30 PM, daveg da...@sonic.net wrote:
  On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote:
  On 20/10/10 16:05, Mark Kirkwood wrote:
  
  
  shmget and friends are hugetlbpage  aware, so it seems it should 'just
  work'.
  
 
  Heh - provided you specify
 
  SHM_HUGETLB
 
 
  in the relevant call that is :-)
 
  I had a patch for this against 8.3 that I could update if there is any
  interest. I suspect it is helpful.
 
 Oh, probably better than me digging up my broken one.  Send it out as
 is if you don't want to update it. :)

I'll update it and see if I can get a largish machine to test, at least with
pgbench on. But not today alas.

-dg
 
-- 
David Gould   da...@sonic.net  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing an installation's default value of unix_socket_directory

2010-10-21 Thread Alvaro Herrera
Excerpts from Cédric Villemain's message of jue oct 21 16:01:30 -0300 2010:

 I agree this is interesting information to get, but I wonder how
 pg_config can know that, and it looks to me that this information has
 nothing to do with pg_config.
 
 pg_config is all about installation, socket_dir is a postgresql.conf setting.

Yeah -- how is pg_config to know?  All it can tell you is what was the
compiled-in default.

pg_ctl would be nice, but we'd have to make it parse the config file
(there has been talk about that).  In any case, if you don't know where
the socket is, presumably you don't know where the config file is,
either.  You've just moved the problem.

Maybe you should go the SHOW route.  The user could connect via TCP and
find out the socket directory that way.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing an installation's default value of unix_socket_directory

2010-10-21 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 One possible response would be to add an item to what pg_config knows
 about, eg pg_config --socketdir.  This doesn't answer every possible
 use-case either, but it would be helpful for some scenarios.

 Thoughts?

Following some links one can find out
  http://packages.debian.org/source/unstable/postgresql-9.0
  
http://ftp.de.debian.org/debian/pool/main/p/postgresql-9.0/postgresql-9.0_9.0.1-1.debian.tar.gz

And check that the debian package patches src/include/pg_config_manual.h
rather than using the other facility proposed in the comments:

 * here's where to twiddle it.  You can also override this at runtime
 * with the postmaster's -k switch.

-#define DEFAULT_PGSOCKET_DIR  "/tmp"
+#define DEFAULT_PGSOCKET_DIR  "/var/run/postgresql"

But still, I wonder what role this -k switch would play here; pg_control
certainly won't know about that.  It's probably worse to give a wrong
value than none, but that's easy to fix by having a good label
for the line, I guess.

All in all it would be good to have that in pg_control.
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-21 Thread Greg Stark
On Thu, Oct 21, 2010 at 11:30 AM, Stephen R. van den Berg s...@cuci.nl wrote:
 For the recovery information I'd like to reserve:
 identifier: 00: table OID
            01: table layout


So here's a proposal for something that could maybe be implemented. I
think I'm leaning against this currently as there are just too many
caveats even for this limited functionality. I'm more inclined to go
with the idea someone else proposed of dumping an audit log of all DDL
on the table or, after any DDL, dumping the create table statements as
pg_dump would generate them to a separate fork.

But here goes just to give you an idea what I think is doable and how
limited it would be:

In this meta data object put:

table oid
table name
number of columns
array of typlen for those columns (-1 for varlena and size for fixed length)

That would have a maximum size of just over 6k, which is too large for
a database built with a 4k block size but fits with the default 8k.  It
would have enough information to find all the table columns, but not
enough to interpret their contents -- neither their names nor their
types.
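
To put it concretely, that payload is roughly what this catalog query
returns for a single table -- the query is only an illustration
('some_table' is a placeholder), and note that it deliberately carries
no column names and no type information:

SELECT c.oid, c.relname, c.relnatts,
       array_agg(a.attlen ORDER BY a.attnum) AS typlens
FROM pg_class c
     JOIN pg_attribute a ON a.attrelid = c.oid
WHERE c.oid = 'some_table'::regclass AND a.attnum > 0
GROUP BY c.oid, c.relname, c.relnatts;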

Including the type brings in a new set of complications. Even if you
assume the built-in typoids never change there are going to be typoids
that correspond to user defined types. Without the names of those
types the oids are pretty useless.

Just the typoids would put you over 8k in the worst case and the names
would put you into the realm of needing arbitrary numbers of blocks
for even average cases. Similarly, including the column names would
require potentially many blocks.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing an installation's default value of unix_socket_directory

2010-10-21 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Cédric Villemain's message of jue oct 21 16:01:30 -0300 2010:
 I agree this is interesting information to get, but I wonder how
 pg_config can know that, and it looks to me that this information has
 nothing to do with pg_config.
 
 pg_config is all about installation, socket_dir is a postgresql.conf setting.

 Yeah -- how is pg_config to know?  All it can tell you is what was the
 compiled-in default.

That's what I wanted, actually.  If you've set a non-default value in
postgresql.conf, SHOW will tell you about that, but it fails to expose
the default value.
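
Concretely, with nothing set you currently get just this (output
sketched from the behavior described, not captured):

regression=# SHOW unix_socket_directory;
 unix_socket_directory
------------------------
 
(1 row)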

 Maybe you should go the SHOW route.  The user could connect via TCP and
 find out the socket directory that way.

Yeah, the SHOW case is not useless by any means.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing an installation's default value of unix_socket_directory

2010-10-21 Thread Robert Haas
2010/10/21 Tom Lane t...@sss.pgh.pa.us:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Cédric Villemain's message of jue oct 21 16:01:30 -0300 2010:
 I agree this is interesting information to get, but I wonder how
 pg_config can know that, and it looks to me that this information has
 nothing to do with pg_config.

 pg_config is all about installation, socket_dir is a postgresql.conf 
 setting.

 Yeah -- how is pg_config to know?  All it can tell you is what was the
 compiled-in default.

 That's what I wanted, actually.  If you've set a non-default value in
 postgresql.conf, SHOW will tell you about that, but it fails to expose
 the default value.

 Maybe you should go the SHOW route.  The user could connect via TCP and
 find out the socket directory that way.

 Yeah, the SHOW case is not useless by any means.

I think adding this to pg_config is sensible.  Sure, the user could
have moved the socket directory.  But it's a place to start looking.
So why not?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-21 Thread Josh Berkus
On 10/20/10 6:54 PM, Robert Haas wrote:
 I find it impossible to believe that's
 a good decision, and IMHO we should be focusing on how to make the
 parameters PGC_SIGHUP rather than PGC_POSTMASTER, which would give us
 most of the same benefits without throwing away hard-won performance.

I'd be happy to accept that.  Is it possible, though?

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL and HugePage

2010-10-21 Thread David Fetter
On Thu, Oct 21, 2010 at 12:10:22PM -0700, David Gould wrote:
 On Thu, Oct 21, 2010 at 08:16:27AM -0700, Mark Wong wrote:
  On Tue, Oct 19, 2010 at 8:30 PM, daveg da...@sonic.net wrote:
   On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote:
   On 20/10/10 16:05, Mark Kirkwood wrote:
   
   
   shmget and friends are hugetlbpage  aware, so it seems it should 'just
   work'.
   
  
   Heh - provided you specify
  
   SHM_HUGETLB
  
  
   in the relevant call that is :-)
  
   I had a patch for this against 8.3 that I could update if there is any
   interest. I suspect it is helpful.
  
  Oh, probably better than me digging up my broken one.  Send it out as
  is if you don't want to update it. :)
 
 I'll update it and see if I can get a largish machine to test, at least with
 pgbench on. But not today alas.

If you'd be so kind as to update it, others can probably find the
aforementioned largish machine to test it on :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-21 Thread Tom Lane
Stephen R. van den Berg s...@cuci.nl writes:
 Tom Lane wrote:
 There are way too many scenarios where you'll have no hope of doing
 any such manual recovery anyway.

 True.  It's all a matter of statistics.  Judging by the number of reports
 I find by googling net-history, I'd have to conclude that the proposed
 extra information would have helped more than half of them.

Uh, no, it would have helped whatever minuscule fraction of them had the
tools and the expertise to make use of the information.  This should not
be confused with a magic fix-it button.  If you've lost your system
catalogs you're facing a whole lot of work that will at best recover an
untrustworthy version of some of your data.  Most of the data-loss
reports I've seen appeared to come from people who wouldn't be capable
of doing such recovery work even if better tools were available.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-21 Thread Robert Haas
On Thu, Oct 21, 2010 at 4:21 PM, Josh Berkus j...@agliodbs.com wrote:
 On 10/20/10 6:54 PM, Robert Haas wrote:
 I find it impossible to believe that's
 a good decision, and IMHO we should be focusing on how to make the
 parameters PGC_SIGHUP rather than PGC_POSTMASTER, which would give us
 most of the same benefits without throwing away hard-won performance.

 I'd be happy to accept that.  Is it possible, though?

I sketched an outline of the problem AIUI here:

http://archives.postgresql.org/pgsql-hackers/2010-10/msg01348.php

I think it's possible; I'm not quite sure how hard it is.
Unfortunately, I've not had as much PG-hacking time lately as I'd
like...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, work in progress

2010-10-21 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 We already have TypeName as a structure that contains type and typmod
 (and collation, in my patch).  We could make that a primnode instead of
 a parsenode, and use it in more places, or we could make a new leaner
 structure that only contains the numeric info.

TypeName per se is completely inappropriate for use beyond the first
stage of parsing, because it requires catalog lookups to make any sense
of.  I think the post-parsing representation should still start with a
type OID.  I can agree with replacing typmod with a struct, though.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, work in progress

2010-10-21 Thread Robert Haas
On Thu, Oct 21, 2010 at 4:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Eisentraut pete...@gmx.net writes:
 We already have TypeName as a structure that contains type and typmod
 (and collation, in my patch).  We could make that a primnode instead of
 a parsenode, and use it in more places, or we could make a new leaner
 structure that only contains the numeric info.

 TypeName per se is completely inappropriate for use beyond the first
 stage of parsing, because it requires catalog lookups to make any sense
 of.  I think the post-parsing representation should still start with a
 type OID.  I can agree with replacing typmod with a struct, though.

I think we should have both the type OID and the typmod in the struct.
 Carrying the type OID separately from the typmod has caused us enough
heartache already.  No?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing an installation's default value of unix_socket_directory

2010-10-21 Thread A.M.

On Oct 21, 2010, at 4:19 PM, Robert Haas wrote:

 2010/10/21 Tom Lane t...@sss.pgh.pa.us:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Cédric Villemain's message of jue oct 21 16:01:30 -0300 2010:
 I agree this is interesting information to get, but I wonder how
 pg_config can know that, and it looks to me that this information has
 nothing to do with pg_config.
 
 pg_config is all about installation, socket_dir is a postgresql.conf 
 setting.
 
 Yeah -- how is pg_config to know?  All it can tell you is what was the
 compiled-in default.
 
 That's what I wanted, actually.  If you've set a non-default value in
 postgresql.conf, SHOW will tell you about that, but it fails to expose
 the default value.
 
 Maybe you should go the SHOW route.  The user could connect via TCP and
 find out the socket directory that way.
 
 Yeah, the SHOW case is not useless by any means.
 
 I think adding this to pg_config is sensible.  Sure, the user could
 have moved the socket directory.  But it's a place to start looking.
 So why not?


Because pg_config is supposed to return the current state of a cluster?
Because it might indicate a connection to the wrong server?

Cheers,
M
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, work in progress

2010-10-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Oct 21, 2010 at 4:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 TypeName per se is completely inappropriate for use beyond the first
 stage of parsing, because it requires catalog lookups to make any sense
 of.  I think the post-parsing representation should still start with a
 type OID.  I can agree with replacing typmod with a struct, though.

 I think we should have both the type OID and the typmod in the struct.
  Carrying the type OID separately from the typmod has caused us enough
 heartache already.  No?

I think that that would probably involve a whole lot more notational
busywork than just replacing typmod with something more complicated.
However, we're talking about breaking vast amounts of code in either
case, so maybe making it even vaster isn't a real consideration.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, work in progress

2010-10-21 Thread Josh Berkus

 I think that that would probably involve a whole lot more notational
 busywork than just replacing typmod with something more complicated.
 However, we're talking about breaking vast amounts of code in either
 case, so maybe making it even vaster isn't a real consideration.

Gods, yes.  Please let's not extend typmod any further without an overhaul.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] find -path isn't portable

2010-10-21 Thread Tom Lane
buildfarm member koi, having recently been rescued from git purgatory,
is failing like this:

configure: using CFLAGS=-O -Kinline
configure: using CPPFLAGS= -I/usr/local/include/libxml2  -I/usr/local/include
configure: using LDFLAGS= -L/usr/local/lib  -L/usr/local/lib
preparing build tree... UX:find: ERROR: Illegal option -- -path
UX:find: TO FIX: Usage: find [path-list] [predicate-list]
UX:ln: ERROR: Cannot create 
/home/ohp/pgfarmbuild/HEAD/pgsql.10464//config/Makefile: No such file or 
directory
configure: error: failed

which indicates it can't cope with your recent patch to prep_buildtree.
I was afraid that might be a problem.  Can we please stick to the
find(1) options defined by the Single Unix Spec?
http://www.opengroup.org/onlinepubs/007908799/xcu/find.html

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing an installation's default value of unix_socket_directory

2010-10-21 Thread Tom Lane
A.M. age...@themactionfaction.com writes:
 On Oct 21, 2010, at 4:19 PM, Robert Haas wrote:
 I think adding this to pg_config is sensible.  Sure, the user could
 have moved the socket directory.  But it's a place to start looking.
 So why not?

 Because pg_config is supposed to return the current state of a cluster?

pg_config is not supposed to do any such thing.  It exists specifically
and solely to tell you about build options that were baked into the
compiled code.

Actually, the only reason this is even up for discussion is that there's
no configure option to set DEFAULT_PGSOCKET_DIR.  If there were, and
debian were using it, then pg_config --configure would tell what I wish
to know.  I thought for a bit about proposing we add such an option,
but given the current state of play it might be more misleading than
helpful: as long as distros are accustomed to changing this setting via
a patch, you couldn't trust pg_config --configure to tell you what a
given installation actually has compiled into it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] crash in plancache with subtransactions

2010-10-21 Thread Alvaro Herrera
Hi,

A customer was hitting some misbehavior in one of their internal tests and
I tracked it down to plancache not behaving properly with
subtransactions: in particular, a plan is not being marked dead when
the subtransaction on which it is planned rolls back.  It was reported
in 8.4, but I can reproduce the problem on 9.0 too with this small
script:

drop schema alvherre cascade;
drop schema test cascade;
create schema test;
create schema alvherre;
set search_path = 'alvherre';

create or replace function dummy(text) returns text language sql
as $$ SELECT relname::text FROM pg_class c WHERE c.oid = $1::regclass 
$$;

create or replace function broken(p_name_table text) returns void
language plpgsql as $$
declare
v_table_full text := alvherre.dummy(p_name_table);
begin
return;
end;
$$;

BEGIN;
 create table test.stuffs (stuff text);
 SAVEPOINT a;
 select broken('nonexistant.stuffs');

 ROLLBACK TO a;
 select broken('test.stuffs');

rollback;


The symptom is that the second call to broken() fails with this error
message:

ERROR:  relation "test.stuffs" does not exist
CONTEXT:  SQL function "dummy" statement 1
PL/pgSQL function "broken" line 3 during statement block local variable
initialization

Note that this is totally bogus, because the relation being referenced
does indeed exist.  In fact, if you commit the transaction and call the
function again, it works.

Also, the state after the first call is a bit bogus: if you repeat the
whole sequence starting at the BEGIN line, it causes a crash on 8.4.

I hacked up plancache a bit so that it marks plans as dead when the
subtransaction resource owner releases it.  It adds a new arg to
ReleaseCachedPlan(); if true, the plan is marked dead.  All current
callers, except the one in ResourceOwnerReleaseInternal(), use false
thus preserving the current behavior.  resowner sets this as true when
aborting a (sub)transaction.

I have to admit that it seems somewhat the wrong API, but I don't see a
better way.  (I thought about relcache or syscache inval, but as far as
I can tell there isn't any here.)  I'm open to suggestions.

Patch attached.

-- 
Álvaro Herrera alvhe...@alvh.no-ip.org


0001-Mark-a-cache-plan-as-dead-when-aborting-its-creating.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] find -path isn't portable

2010-10-21 Thread Alvaro Herrera
Excerpts from Tom Lane's message of jue oct 21 17:48:18 -0300 2010:
 buildfarm member koi, having recently been rescued from git purgatory,
 is failing like this:
 
 configure: using CFLAGS=-O -Kinline
 configure: using CPPFLAGS= -I/usr/local/include/libxml2  -I/usr/local/include
 configure: using LDFLAGS= -L/usr/local/lib  -L/usr/local/lib
 preparing build tree... UX:find: ERROR: Illegal option -- -path
 UX:find: TO FIX: Usage: find [path-list] [predicate-list]
 UX:ln: ERROR: Cannot create 
 /home/ohp/pgfarmbuild/HEAD/pgsql.10464//config/Makefile: No such file or 
 directory
 configure: error: failed
 
 which indicates it can't cope with your recent patch to prep_buildtree.
 I was afraid that might be a problem.  Can we please stick to the
 find(1) options defined by the Single Unix Spec?
 http://www.opengroup.org/onlinepubs/007908799/xcu/find.html

Ugh.  I'll look into this.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UNION ALL has higher cost than inheritance

2010-10-21 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Thu, Oct 21, 2010 at 6:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Thanks. It also explains my another question why Merge Append cannot
 be used for UNION ALL plans.

 Hmm, seems like the example you show ought to work.  I wonder if there
 was an oversight in that patch...

 Huh, that definitely worked in the earlier versions of the patch (as
 much as it worked at all)

Actually, it works as long as the UNION is in a subquery:

regression=# EXPLAIN select * from (
(SELECT * FROM ONLY parent ORDER BY i) UNION ALL
(SELECT * FROM child ORDER BY i)) ss ORDER BY i LIMIT 10;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Limit  (cost=168.76..169.13 rows=10 width=4)
   ->  Result  (cost=168.76..294.51 rows=3400 width=4)
         ->  Merge Append  (cost=168.76..294.51 rows=3400 width=4)
               Sort Key: parent.i
               ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
                     Sort Key: parent.i
                     ->  Seq Scan on parent  (cost=0.00..34.00 rows=2400 width=4)
               ->  Index Scan using child_i_idx on child  (cost=0.00..43.25 rows=1000 width=4)
(8 rows)

The oversight here is that we don't use appendrel planning for
a top-level UNION ALL construct.  That didn't use to matter,
because you always got the same stupid Append plan either way.
Now it seems like we ought to have some more intelligence for the
top-level SetOp case.  I smell some code refactoring coming up.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-21 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 Most of the data-loss reports I've seen appeared to come from
 people who wouldn't be capable of doing such recovery work even if
 better tools were available.
 
No doubt; but the recovery work often winds up in the hands of
people with more skills than those responsible for the loss. 
Whoever comes in to help with recovery is looking for every bit of
traction they can get.  You'd be amazed at some of the odd straws
people can grasp to help recover data.
 
I don't know how big the special area is, but if enough room could
be carved out to include even the relation ID or filename, it might
be a big help to someone.  I'm pretty skeptical about including
information about attributes, though.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we have a database specification in .pgpass?

2010-10-21 Thread Bruce Momjian
Peter Eisentraut wrote:
 On ons, 2010-10-13 at 14:32 -0400, Bruce Momjian wrote:
  We have a database specification in .pgpass:
  
  hostname:port:database:username:password
  
  What is the purpose of 'database' since username/password combinations
  are global, not per database?  I would like to documents its purpose.
 
 As a side note, the thing at the other end of a connection is not
 necessarily a PostgreSQL server.  It could be a connection pool proxy.
 I don't know if any implementation could make use of the database
 field at the moment, but it should be kept in mind.
 
 That said, it would probably be good to document that the database field
 is currently only useful in certain limited circumstances.

Agreed, done.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 70d9202..a911c50 100644
*** a/doc/src/sgml/libpq.sgml
--- b/doc/src/sgml/libpq.sgml
*** myEventProc(PGEventId evtId, void *evtIn
*** 6331,6336 ****
--- 6331,6338 ----
 or the default socket directory) connections coming from the local
 machine. In a standby server, a database name of <literal>replication</>
 matches streaming replication connections made to the master server.
+The <literal>database</> field is of limited usefulness because
+users have the same password for all databases in the same cluster.
 </para>
  
 <para>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UNION ALL has higher cost than inheritance

2010-10-21 Thread David E. Wheeler
On Oct 21, 2010, at 2:17 PM, Tom Lane wrote:

 The oversight here is that we don't use appendrel planning for
 a top-level UNION ALL construct.  That didn't use to matter,
 because you always got the same stupid Append plan either way.
 Now it seems like we ought to have some more intelligence for the
 top-level SetOp case.  I smell some code refactoring coming up.

Does it smell like chicken?

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] pg_filedump binary for CentOS

2010-10-21 Thread Bruce Momjian
Greg Stark wrote:
 On Thu, Oct 14, 2010 at 2:53 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Bruce Momjian br...@momjian.us writes:
  Should we consider moving pg_filedump into our /contrib?
 
  Can't: it's GPL.
 
 
 I don't particularly see a problem with having GPL'd contrib modules.
 It would mean any users hoping to redistribute the package couldn't
 include those modules except under the GPL. But most repackagers don't
 include the contrib modules anyways. Even ones that do and want to
 include those modules would only have to include the source to that
 module.
 
 I can see not wanting to let that camel's nose in for fear of having
 packagers always be uncertain about the status of each contrib module
 though.

I think we should just link to the tool from our docs so there is no
license complexity.  Where do we add it?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Timeout and wait-forever in sync rep

2010-10-21 Thread Bruce Momjian
Fujii Masao wrote:
 Hi,
 
 As the result of the discussion, I think that we need the following two
 parameters for the case where the standby goes down.

Can we have a parameter that calls an operating system command when a
standby is declared dead, to notify the administrator?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash in plancache with subtransactions

2010-10-21 Thread Tom Lane
Alvaro Herrera alvhe...@alvh.no-ip.org writes:
 A customer was hitting some misbehavior in one of their internal tests and
 I tracked it down to plancache not behaving properly with
 subtransactions: in particular, a plan is not being marked dead when
 the subtransaction on which it is planned rolls back.

I don't believe that it's plancache's fault; the real problem is that
plpgsql is keeping simple expression execution trees around longer
than it should.  Your patch masks the problem by forcing those trees to
be rebuilt, but it's the execution trees not the plan trees that contain
stale data.

I'm not immediately sure why plpgsql_subxact_cb is failing to clean up
correctly in this example, but that seems to be where to look.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] docs on contrib modules that can't pg_upgrade?

2010-10-21 Thread Bruce Momjian
Robert Treat wrote:
 Howdy folks,
 
 Was wondering if there are any docs on which contrib modules don't work with 
 pg_upgrade? I seem to remember discussion on this during the 9.0 cycle, but 
 couldn't find it in the mail archive, and don't see anything in the wiki.  
 What 
 brings this up is I'm currently working on an 8.3 upgrade and it has 
 pg_freespacemap which breaks things; I think easy enough to work-around in 
 this case, but I am sure for other contribs, or for folks with a lot of 
 machinery built on top of a contrib, that won't always be the case. If 
 something like this doesn't exist, I'll start a wiki page on it, but thought 
 I'd ask first.

I don't know of any /contrib modules that will not upgrade;  if we had
any they would be mentioned in the pg_upgrade docs and checked in the
source code.

We had a possible issue with hstore, but that was changed in a
backward-compatible way for 9.0.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-21 Thread Mark Kirkwood

On 19/10/10 13:16, Josh Berkus wrote:

Robert asked me to write this up, so here it is.

It is critical that we make replication easier to set up, administrate 
and monitor than it currently is.  In my conversations with people, 
this is more important to our users and the adoption of PostgreSQL 
than synchronous replication is.


First, I'm finding myself constantly needing to tutor people on how to 
set up replication.  The mere fact that it requires a minimum 1-hour 
class to explain how to use it, or a 10-page tutorial, tells us it's 
too complex.  As further evidence, Bruce and I explained binary 
replication to several MySQL geeks at OpenSQLCamp last weekend, and 
they were horrified at the number and complexity of the steps 
required.  As it currently is, binary replication is not going to win 
us a lot of new users from the web development or virtualization world.




+1

I've been having the same experience - how to set this up and do 
failover and failback etc. occupies quite a bit of time in courses I've 
been teaching here in NZ and Australia. Making this whole replication 
business much simpler is definitely the way to go.


A good example of how simple it can be is mongodb, where it is 
essentially one command to set up a 2 replica system with a voting arbiter:


$ mongo
> rs.initiate(
   {
     _id : "replication_set0",
     members : [
       { _id : 0, host : "192.168.2.100" },
       { _id : 1, host : "192.168.2.101" },
       { _id : 2, host : "192.168.2.103", arbiterOnly : true }
     ]
   }
 )



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-21 Thread Robert Haas
On Thu, Oct 21, 2010 at 5:21 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:

 Most of the data-loss reports I've seen appeared to come from
 people who wouldn't be capable of doing such recovery work even if
 better tools were available.

 No doubt; but the recovery work often winds up in the hands of
 people with more skills than those responsible for the loss.
 Whoever comes in to help with recovery is looking for every bit of
 traction they can get.  You'd be amazed at some of the odd straws
 people can grasp to help recover data.

 I don't know how big the special area is, but if enough room could
 be carved out to include even the relation ID or filename, it might
 be a big help to someone.  I'm pretty skeptical about including
 information about attributes, though.

Unfortunately, the use case for the relation ID or filename is much
thinner than the case for the column definitions.  You're less likely
to lose that information, and if you do lose it you can probably guess
by file size or by running strings on the data files.  The really hard
thing is to remember just exactly what columns you had in there, in
what order... and oh yeah there was that column we dropped.  But like
you, I'm pretty dubious about trying to store all that stuff.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash in plancache with subtransactions

2010-10-21 Thread Alvaro Herrera
Excerpts from Tom Lane's message of jue oct 21 19:36:07 -0300 2010:
 Alvaro Herrera alvhe...@alvh.no-ip.org writes:
  A customer was hitting some misbehavior in one of their internal tests and
  I tracked it down to plancache not behaving properly with
  subtransactions: in particular, a plan is not being marked dead when
  the subtransaction on which it is planned rolls back.
 
 I don't believe that it's plancache's fault; the real problem is that
 plpgsql is keeping simple expression execution trees around longer
 than it should.  Your patch masks the problem by forcing those trees to
 be rebuilt, but it's the execution trees not the plan trees that contain
 stale data.

Ahh, this probably explains why I wasn't able to reproduce the
problem without involving subxacts, or prepared plans, that seemed to
follow mostly the same paths around plancache cleanup.

It's also the likely reason this hasn't been reported earlier.

 I'm not immediately sure why plpgsql_subxact_cb is failing to clean up
 correctly in this example, but that seems to be where to look.

Will take a look ... if the girls let me ...

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable snapshot isolation patch

2010-10-21 Thread Kevin Grittner
Jeff Davis pg...@j-davis.com wrote:
 
 When using locks in an unconventional way, it would be helpful to
 describe the invalid schedules that you're preventing. Perhaps an
 example if you think it would be reasonably simple? Also some
 indication of how another process is intended to modify the list
 without walking it.
 
I've just pushed some comment changes intended to address this.  Did
I hit the mark?
 
-Kevin
 
P.S.  Sorry for the delay in responding to such simple requests --
I've been tied up with a family medical crisis; I hope to crank
through much of what you've raised this weekend.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating-point timestamps versus Range Types

2010-10-21 Thread Bruce Momjian
Robert Haas wrote:
 On Mon, Oct 18, 2010 at 2:29 PM, Jeff Davis pg...@j-davis.com wrote:
  A reasonable conversion path might be to offer integer timestamps using
  a different type name (e.g. inttimestamp) that always means integer
  timestamps. Then, they could convert using ALTER TABLE, then do an
  in-place upgrade. We could even make pg_upgrade optionally convert
  inttimestamp to timestamp in O(1) on an integer-timestamps build.
 
 I think in retrospect it would certainly have been better to make
 integer timestamps and float timestamps two separate data types,
 rather than two versions of the same data type.  Whether it's worth
 providing that now after the fact is not clear to me.  I'd be inclined
 to wait and see whether we get many complaints...
 
 One problem with changing types in pg_upgrade is that type OIDs can
 get embedded in the on-disk representation - I believe that this
 happens for arrays, for instance.  So I think it's practical for
 pg_upgrade to change type names during a version upgrade, but not type
 OIDs.

One thing we have talked about is converting the page as the backend
reads it in.  Since the timestamps are the same size as float or
integer, that might be possible.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating-point timestamps versus Range Types

2010-10-21 Thread Greg Stark
On Thu, Oct 21, 2010 at 4:49 PM, Bruce Momjian br...@momjian.us wrote:
 One thing we have talked about is converting the page on read-in from
 the backend.  Since the timestamps are the same size as float or
 integer, that might be possible.

Did we have a solution for the problem that understanding which
columns are timestamps requires having a tuple descriptor and parsing
every tuple? That seems like it would a) be slow and b) require a
lot of high level code in the middle of a low-level codepath.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-21 Thread Bruce Momjian
Greg Stark wrote:
 On Tue, Oct 19, 2010 at 12:03 PM, Robert Haas robertmh...@gmail.com wrote:
  The trick is that it would require us to have two pg_class tables, two
  pg_attribute tables, two pg_attrdef tables, etc.: in each case, one
  permanent and one temporary. ?I am not sure how complex that will turn
  out to be.
 
 Tom suggested using inheritance for this.
 
 I find it strange to try constructing catalog tables to represent
 these local definitions which never need to be read by any other
 backend and in any case are 1:1 copies of the global catalog entries.
 
 It seems to me simpler and more direct to just nail relcache
 entries for these objects into memory and manipulate them directly.
 They can be constructed from the global catalog tables and then
 tweaked to point to the backend local temporary tables.

Funny, but that is how I implemented temporary tables in 1999, and it
lasted until 2002 when schema support was added.  It actually worked because
all the lookups go through the syscache.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-21 Thread Stephen R. van den Berg
Kevin Grittner wrote:
Tom Lane t...@sss.pgh.pa.us wrote:
 Most of the data-loss reports I've seen appeared to come from
 people who wouldn't be capable of doing such recovery work even if
 better tools were available.

No doubt; but the recovery work often winds up in the hands of
people with more skills than those responsible for the loss. 
Whoever comes in to help with recovery is looking for every bit of
traction they can get.  You'd be amazed at some of the odd straws
people can grasp to help recover data.

And, obviously, the amount of time spent (by the professional) to
recover the data goes down a lot when more information is available,
which reduces cost and shortens the downtime for the sad sod who
lost the data in the first place.
-- 
Stephen.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-21 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Currently it isn't possible to create temporary tables on read-only
  standby servers, and I don't see it listed on the TODO list.  Can I add
  it?
 
 Not unless you have some credible concept for how it might ever be
 implemented.  You can't create temp tables because you can't modify
 system catalogs, and if you did somehow create them you couldn't put
 anything in them because you can't generate XIDs on a slave ... much
 less commit them.  We have talked about ways that temp tables might be
 created without touching the real system catalogs, but the XID issue
 seems a complete showstopper.

So, this is one of those odd cases where we know people are going to ask
for a feature (temp tables on slaves), but we are not ready to put it on
our TODO list.  Where do we document that this isn't going to happen? 
In "Features we don't want"?  That title doesn't really match. 
"Features we don't know how to do" doesn't sound good.  ;-)

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-21 Thread Bruce Momjian
Josh Berkus wrote:
 Greg,
 
  The way things stand you *always* need archived logs. Even if you have
  streaming set up it might try to use archived logs if it falls too far
  behind.
 
 Actually, you don't.  If you're willing to accept possible
 desynchronization and recloning of the standbys, then you can skip the
 archive logs.

Agreed, but as a reality check:  when I proposed that wal_keep_segments
= -1 would keep all WAL segments (for use while the file system was
being backed up), I was told administrators should compute how much free
disk space they had.  Obviously ease of use is not our #1 priority.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash in plancache with subtransactions

2010-10-21 Thread Alvaro Herrera
Excerpts from Tom Lane's message of jue oct 21 19:36:07 -0300 2010:

 I'm not immediately sure why plpgsql_subxact_cb is failing to clean up
 correctly in this example, but that seems to be where to look.

I think the reason is that one econtext is pushed for function
execution, and another one for blocks that contain exceptions.  The
example function does not contain exceptions -- the savepoints are
handled by the external SQL code.

I'll have a closer look tomorrow.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-21 Thread Bruce Momjian
Robert Haas wrote:
 On Wed, Oct 20, 2010 at 3:40 PM, Greg Stark gsst...@mit.edu wrote:
  On Wed, Oct 20, 2010 at 6:29 AM, Robert Haas robertmh...@gmail.com wrote:
  Exactly.  It doesn't take many 3-7% slowdowns to add up to being 50%
  or 100% slower, and that sucks.  In fact, I'm still not convinced that
  we were wise to boost default_statistics_target as much as we did.  I
  argued for a smaller boost at the time.
 
  Well we don't want to let ourselves be paralyzed by FUD so it was
  important to identify specific concerns and then tackle those
  concerns. Once we identified the worst-case planning cases we profiled
  them and found that the inflection point of the curve was fairly
  clearly above 100 but that there were cases where values below 1,000
  caused problems. So I'm pretty happy with the evidence-based approach.
 
 The inflection point of the curve was certainly a good thing for us to
 look at but the fact remains that we took a hit on a trivial
 benchmark, and we can't afford to take too many of those.

Agreed.  If people start wondering if our new major releases are perhaps
_slower_ than previous ones, we have lost a huge amount of momentum.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-21 Thread Bruce Momjian
Robert Haas wrote:
 On Thu, Oct 21, 2010 at 4:21 PM, Josh Berkus j...@agliodbs.com wrote:
  On 10/20/10 6:54 PM, Robert Haas wrote:
  I find it impossible to believe that's
  a good decision, and IMHO we should be focusing on how to make the
  parameters PGC_SIGHUP rather than PGC_POSTMASTER, which would give us
  most of the same benefits without throwing away hard-won performance.
 
  I'd be happy to accept that.  Is it possible, though?
 
 I sketched an outline of the problem AIUI here:
 
 http://archives.postgresql.org/pgsql-hackers/2010-10/msg01348.php
 
 I think it's possible; I'm not quite sure how hard it is.
 Unfortunately, I've not had as much PG-hacking time lately as I'd
 like...

Have we documented these TODOs?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-21 Thread Robert Haas
On Thu, Oct 21, 2010 at 8:22 PM, Bruce Momjian br...@momjian.us wrote:
 Josh Berkus wrote:
 Greg,

  The way things stand you *always* need archived logs. Even if you have
  streaming set up it might try to use archived logs if it falls too far
  behind.

 Actually, you don't.  If you're willing to accept possible
 desynchronization and recloning of the standbys, then you can skip the
 archive logs.

 Agreed, but as a reality check:  when I proposed that wal_keep_segments
 = -1 would keep all WAL segments (for use while the file system was
 being backed up), I was told administrators should compute how much free
 disk space they had.  Obviously ease of use is not our #1 priority.

Amen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql autocompletion for \z and \dg

2010-10-21 Thread Robert Haas
On Wed, Oct 20, 2010 at 10:56 PM, Josh Kupershmidt schmi...@gmail.com wrote:
 It looks like psql's tab completion for the \z and \dg commands in
 psql are missing. I couldn't see a reason for this, so attached patch
 fixes.

 Also, this patch proposes to change psql's \? help text to say that
 \dg and \du are the same, since AFAICT they do exactly the same thing.

Please add this in the usual spot:
https://commitfest.postgresql.org/action/commitfest_view/open

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-21 Thread Josh Berkus

 Agreed, but as a reality check:  when I proposed that wal_keep_segments
 = -1 would keep all WAL segments (for use while the file system was
 being backed up), I was told administrators should compute how much free
 disk space they had.  Obviously ease of use is not our #1 priority.

Depends.  Running out of disk space isn't exactly user-friendly either.
 And detecting how much free space is available would be a painful bit
of platform-dependent code ...

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-21 Thread Greg Stark
On Thu, Oct 21, 2010 at 5:46 PM, Josh Berkus j...@agliodbs.com wrote:

 Agreed, but as a reality check:  when I proposed that wal_keep_segments
 = -1 would keep all WAL segments (for use while the file system was
 being backed up), I was told administrators should compute how much free
 disk space they had.  Obviously ease of use is not our #1 priority.

 Depends.  Running out of disk space isn't exactly user-friendly either.
  And detecting how much free space is available would be a painful bit
 of platform-dependent code ...

Nor can we assume we're the only thing using disk space.

However the user-unfriendliness isn't the fact that administrators
need to determine how much disk they're willing to dedicate to
Postgres. The user-unfriendliness is that they then have to specify
this in terms of WAL log files and also have to know that we sometimes
keep more than that and so on.

We've done a good job in the past of converting GUC variables to
meaningful units for administrators and users but it's an ongoing
effort. If we need a GUC to control the amount of disk space we use it
should be in units of MB/GB/TB. If we need a GUC for controlling how
much WAL history to keep for recovering standbys or replicas then it
should be specified in units of time.

Units like "number of WAL files", or worse, in the case of
checkpoint_segments, "number of WAL files / 2 - 1" or something like
that, are terrible. They require arcane knowledge for the
administrator to have a clue how to set them.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-21 Thread Bruce Momjian
Greg Stark wrote:
 On Tue, Oct 19, 2010 at 1:12 PM, Stephen R. van den Berg s...@cuci.nl wrote:
  In order to simplify recovery at this point (enormously), it would
  have been very helpful (at almost negligible cost), to have the name
  of the table, the name of the columns, and the types of the
  columns available.
 
  Why don't we insert that data into the first page of a regular table
  file, in the special data area?
 
  I'd be willing to create a patch for that (should be pretty easy),
  if nobody considers it to be a bad idea.
 
 There isn't necessarily one value for these attributes.  You can
 rename columns and that rename may succeed and commit or fail and
 rollback. You can drop or add columns, and some rows will have the
 added columns while others will not.
 some rows, then abort -- all in a transaction. So some (aborted) rows
 will have extra columns that aren't even present in the current table
 definition.
 
 All this isn't to say the idea you're presenting is impossible or a
 bad idea. If this meta information was only a hint for forensic
 purposes and you take into account these caveats it might still be
 useful. But I'm not sure how useful. I mean, you can't really decipher
 everything properly without the data in the catalog -- and you have to
 premise this on the idea that you've lost everything in the catalog
 but not the data in other tables. Which seems like a narrow use case.

I was thinking we could dump a flat file every 15 minutes into each
database directory that had recovery-useful information.  It wouldn't be
perfect, but would probably be sufficient for most forensics.
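
As a very rough sketch of what such a flat file could contain, a small
libpq program along these lines could dump one line per column for every
ordinary table (the connection string, the output format, and the whole
approach are assumptions for illustration, not a proposed implementation):

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn     *conn = PQconnectdb("dbname=postgres");  /* placeholder connection string */
    PGresult   *res;
    int         i;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* recovery-useful catalog facts: table OID, table name, column, type name */
    res = PQexec(conn,
                 "SELECT c.oid, c.relname, a.attnum, a.attname, t.typname "
                 "FROM pg_class c "
                 "JOIN pg_attribute a ON a.attrelid = c.oid "
                 "JOIN pg_type t ON t.oid = a.atttypid "
                 "WHERE c.relkind = 'r' AND a.attnum > 0 AND NOT a.attisdropped "
                 "ORDER BY c.oid, a.attnum");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* one tab-separated line per column */
    for (i = 0; i < PQntuples(res); i++)
        printf("%s\t%s\t%s\t%s\t%s\n",
               PQgetvalue(res, i, 0), PQgetvalue(res, i, 1),
               PQgetvalue(res, i, 2), PQgetvalue(res, i, 3),
               PQgetvalue(res, i, 4));

    PQclear(res);
    PQfinish(conn);
    return 0;
}

Writing something like that output into each database directory every few
minutes would at least capture the table names, column names, and type names
that forensic work typically needs.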

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-21 Thread Robert Haas
On Thu, Oct 21, 2010 at 8:52 PM, Greg Stark gsst...@mit.edu wrote:
 On Thu, Oct 21, 2010 at 5:46 PM, Josh Berkus j...@agliodbs.com wrote:

 Agreed, but as a reality check:  when I proposed that wal_keep_segments
 = -1 would keep all WAL segments (for use while the file system was
 being backed up), I was told administrators should compute how much free
 disk space they had.  Obviously ease of use is not our #1 priority.

 Depends.  Running out of disk space isn't exactly user-friendly either.
  And detecting how much free space is available would be a painful bit
 of platform-dependent code ...

 Nor can we assume we're the only thing using disk space.

 However the user-unfriendliness isn't the fact that administrators
 need to determine how much disk they're willing to dedicate to
 Postgres. The user-unfriendliness is that they then have to specify
 this in terms of WAL log files and also have to know that we sometimes
 keep more than that and so on.

 We've done a good job in the past of converting GUC variables to
 meaningful units for administrators and users but it's an ongoing
 effort. If we need a GUC to control the amount of disk space we use it
 should be in units of MB/GB/TB. If we need a GUC for controlling how
 much WAL history to keep for recovering standbys or replicas then it
 should be specified in units of time.

 Units like "number of WAL files", or worse, in the case of
 checkpoint_segments, "number of WAL files / 2 - 1" or something like
 that, are terrible. They require arcane knowledge for the
 administrator to have a clue how to set them.

Very true.  But the lack of a -1 setting for wal_keep_segments means
that if you would like to take a backup without archiving, you must
set wal_keep_segments to a value greater than or equal to the rate at
which you generate WAL segments multiplied by the time it takes you to
run a backup.  If that doesn't qualify as requiring arcane knowledge,
I'm mystified as to what ever could.
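
For concreteness, here is a back-of-the-envelope version of that calculation.
All the numbers (WAL generation rate, backup duration, safety factor) are
illustrative assumptions, not measurements or recommendations:

#include <stdio.h>

int main(void)
{
    double  wal_mb_per_hour = 480.0;   /* observed WAL generation rate (assumed) */
    double  backup_hours    = 4.0;     /* how long the base backup takes (assumed) */
    double  segment_mb      = 16.0;    /* default WAL segment size */
    double  safety_factor   = 2.0;     /* slack for bursts (assumed) */

    int keep = (int) (wal_mb_per_hour * backup_hours * safety_factor / segment_mb + 0.5);

    printf("wal_keep_segments >= %d (about %.1f GB of pg_xlog)\n",
           keep, keep * segment_mb / 1024.0);
    return 0;
}

With those made-up numbers it comes out to 240 segments, roughly 3.75 GB of
pg_xlog -- exactly the kind of arithmetic an administrator should not have to
do by hand.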

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql autocompletion for \z and \dg

2010-10-21 Thread Josh Kupershmidt
On Thu, Oct 21, 2010 at 8:45 PM, Robert Haas robertmh...@gmail.com wrote:
 Please add this in the usual spot:
 https://commitfest.postgresql.org/action/commitfest_view/open

Aye sir, added.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-21 Thread Bruce Momjian
Robert Haas wrote:
  However the user-unfriendliness isn't the fact that administrators
  need to determine how much disk they're willing to dedicate to
  Postgres. The user-unfriendliness is that they then have to specify
  this in terms of WAL log files and also have to know that we sometimes
  keep more than that and so on.
 
  We've done a good job in the past of converting GUC variables to
  meaningful units for administrators and users but it's an ongoing
  effort. If we need a GUC to control the amount of disk space we use it
  should be in units of MB/GB/TB. If we need a GUC for controlling how
  much WAL history to keep for recovering standbys or replicas then it
  should be specified in units of time.
 
  Units like "number of WAL files", or worse, in the case of
  checkpoint_segments, "number of WAL files / 2 - 1" or something like
  that, are terrible. They require arcane knowledge for the
  administrator to have a clue how to set them.
 
 Very true.  But the lack of a -1 setting for wal_keep_segments means
 that if you would like to take a backup without archiving, you must
 set wal_keep_segments to a value greater than or equal to the rate at
 which you generate WAL segments multiplied by the time it takes you to
 run a backup.  If that doesn't qualify as requiring arcane knowledge,
 I'm mystified as to what ever could.

LOL.  Time machine required (both forward and backward time options).

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-21 Thread Stephen R. van den Berg
Greg Stark wrote:
On Thu, Oct 21, 2010 at 11:30 AM, Stephen R. van den Berg s...@cuci.nl wrote:
 For the recovery information I'd like to reserve:
 identifier: 00: table OID
             01: table layout

So here's a proposal for something that could maybe be implemented. I
think I'm leaning against this currently as there are just too many
caveats even for this limited functionality. I'm more inclined to go
with the idea someone else proposed of dumping an audit log of all DDL
on the table, or, after any DDL, dumping the create table statements as
pg_dump would generate them to a separate fork.

Yes, but that has two distinct downsides I'd like to avoid:
- It can grow uncontrollably in the case of someone using alter table
  on a (too) regular basis.
- It separates the data from the tablefile it pertains to (which could
  complicate recovery (a lot)).

In this meta data object put:

table oid
table name
number of columns
array of typlen for those columns (-1 for varlena and size for fixed length)

[...]

Including the type brings in a new set of complications. Even if you
assume the built-in typoids never change there are going to be typoids
that correspond to user defined types. Without the names of those
types the oids are pretty useless.

Just the typoids would put you over 8k in the worst case and the names
would put you into the realm of needing arbitrary numbers of blocks
for even average cases. Similarly, including the column names would
require potentially many blocks.

All valid points/concerns.
But let's approach this from the side of the forensics analyst instead,
and see what information would typically really be needed (for argument's
sake, let's call the unfortunate sod who lost the catalog of his database
the customer):

The customer usually still has access to the developer, or some developer
documentation which documents which columns are used for what.  It would
most likely document most columns (especially the column names, to a
lesser extent, the column types), but might be lacking some of the
more recent changes which (unfortunately) were done on the live
database using alter table, and hadn't been documented properly yet
(I'm talking about typical real world cases I've encountered).

Which means that the analyst would primarily be concerned with getting back
the information of the column types and the column names.  The next
step would be to know about slight deviations from the documented
columns (dropped or added columns since the last revision of the docs).

Next are custom types.  Custom types are likely to be better documented,
hence the column name would usually be enough to recover the definition
of a custom type from the docs.

Assuming the above holds true in the common case, I would
propose to implement something along the following lines:

Entries for the special area in tables:

 0007 00 tableoid                To be sprinkled in every
                                 megabyte or so.

 01 00 cc tablename...           cc = number of columns

 01 01 ...                       column widths, one entry per column
                                 (a special value for varlena widths)

 01 02 ...                       typeoids, one entry per column
                                 (is 16 bits wide enough for the
                                 common cases?  If not, disregard my
                                 suggestion and make these 32 bits
                                 wide each)

 01 03 nn col1name nn col2name ...
                                 nn = length of the column name
                                 that follows

                                 End of filled special area, no
                                 need to parse beyond here in the
                                 current page.

The 01 special area types, on the other hand, could be present in any
number of pages.
If they are present, they shall be present starting at the
first page of the file, and possibly in the following page(s),
until a page is encountered without them.

Multiple occurrences of 01 01, 01 02, or 01 03 shall be concatenated
to form the complete information set.
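
To visualize, here is a rough C sketch of what one such entry could look
like; the field names, widths, and the assumption that every entry starts
with a two-byte length are illustrative guesses, not a settled format:

#include <stdint.h>

/* Illustrative sketch only; layout details are assumptions. */
typedef struct RecoveryHintEntry
{
    uint16_t    len;        /* total length of this entry (0x0007 for the OID entry) */
    uint8_t     type;       /* 00 = table OID, 01 = table layout */
    char        body[];     /* type 00:  uint32 table OID
                             * type 01:  one subtype byte, then:
                             *   00: cc = uint8 column count, then the table name
                             *   01: one width per column (special value for varlena)
                             *   02: one typeoid per column (16 or 32 bits each)
                             *   03: per column: nn = uint8 name length, then the name */
} RecoveryHintEntry;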
-- 
Stephen.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-21 Thread Stephen R. van den Berg
Bruce Momjian wrote:
Greg Stark wrote:
 On Tue, Oct 19, 2010 at 1:12 PM, Stephen R. van den Berg s...@cuci.nl 
 wrote:
  In order to simplify recovery at this point (enormously), it would
  have been very helpful (at almost negligible cost), to have the name
  of the table, the name of the columns, and the types of the
  columns available.

  Why don't we insert that data into the first page of a regular table
  file, in the special data area?

I was thinking we could dump a flat file every 15 minutes into each
database directory that had recovery-useful information.  It wouldn't be
perfect, but would probably be sufficient for most forensics.

It would definitely be better than the current state.
But it still disconnects the information from the files it belongs to (a bit).
From a cost/benefit ratio point of view, I'd still prefer to interlace
the information into the tablefiles (which also scales better in case of
numerous tables).
-- 
Stephen.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-21 Thread Josh Berkus

 Very true.  But the lack of a -1 setting for wal_keep_segments means
 that if you would like to take a backup without archiving, you must
 set wal_keep_segments to a value greater than or equal to the rate at
 which you generate WAL segments multiplied by the time it takes you to
 run a backup.  If that doesn't qualify as requiring arcane knowledge,
 I'm mystified as to what ever could.

Speaking of which, what's the relationship between checkpoint_segments
and wal_keep_segments?  PG seems perfectly willing to let me set the
latter higher than the former, and it's not documented.

If checkpoint_segments were a hard limit, then we could let admins set
wal_keep_segments to -1, knowing that they'd set checkpoint_segments to
the max space they had available.

Although we might want to rename those.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating-point timestamps versus Range Types

2010-10-21 Thread Bruce Momjian
Greg Stark wrote:
 On Thu, Oct 21, 2010 at 4:49 PM, Bruce Momjian br...@momjian.us wrote:
  One thing we have talked about is converting the page on read-in from
  the backend.  Since the timestamps are the same size as float or
  integer, that might be possible.
 
 Did we have a solution for the problem that understanding which
 columns are timestamps requires having a tuple descriptor and parsing
 every tuple? That seems like it would a) be slow and b) require a
 lot of high level code in the middle of a low-level codepath.

Yep, that's what it requires.  It would rewrite the page in the new format.

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-21 Thread Robert Haas
On Thu, Oct 21, 2010 at 9:09 PM, Josh Berkus j...@agliodbs.com wrote:

 Very true.  But the lack of a -1 setting for wal_keep_segments means
 that if you would like to take a backup without archiving, you must
 set wal_keep_segments to a value greater than or equal to the rate at
 which you generate WAL segments multiplied by the time it takes you to
 run a backup.  If that doesn't qualify as requiring arcane knowledge,
 I'm mystified as to what ever could.

 Speaking of which, what's the relationship between checkpoint_segments
 and wal_keep_segments?  PG seems perfectly willing to let me set the
 latter higher than the former, and it's not documented.

I think it's pretty well explained in the fine manual.

http://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-WAL-KEEP-SEGMENTS

 If checkpoint_segments were a hard limit, then we could let admins set
 wal_keep_segments to -1, knowing that they'd set checkpoint_segments to
 the max space they had available.

This assumes that more checkpoint segments is always better, which
isn't true.  I might have 100 GB of disk space free, but not want to
replay WAL for 4 days if I have a crash.

I do think that the current default of checkpoint_segments=3 is
pathologically insane, but that's another can of worms.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-21 Thread Josh Berkus

 I think it's pretty well explained in the fine manual.
 
 http://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-WAL-KEEP-SEGMENTS

Nope.  No relationship to checkpoint_segments is explained there.  Try
again?

 If checkpoint_segments were a hard limit, then we could let admins set
 wal_keep_segments to -1, knowing that they'd set checkpoint_segments to
 the max space they had available.
 
 This assumes that more checkpoint segments is always better, which
 isn't true.  I might have 100 GB of disk space free, but not want to
 replay WAL for 4 days if I have a crash.

No, it assumes no such thing.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-21 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Greg Stark wrote:
 It seems to me simpler and more direct to just nail relcache
 entries for these objects into memory and manipulate them directly.
 They can be constructed from the global catalog tables and then
 tweaked to point to the backend local temporary tables.

 Funny, but that is how I implemented temporary tables in 1999, and it lasted
 until 2002 when schema support was added.  It actually worked because
 all the lookups go through the syscache.

... and as I recall, we got rid of it principally because the temp
tables weren't visible to ordinary catalog lookups, thus breaking
all sorts of client-side logic.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-21 Thread Robert Haas
On Thu, Oct 21, 2010 at 10:03 PM, Josh Berkus j...@agliodbs.com wrote:

 I think it's pretty well explained in the fine manual.

 http://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-WAL-KEEP-SEGMENTS

 Nope.  No relationship to checkpoint_segments is explained there.  Try
 again?

Well, it says This sets only the minimum number of segments retained
in pg_xlog; the system might need to retain more segments for WAL
archival or to recover from a checkpoint.  So in other words, the
relationship with checkpoint segments is that whichever one currently
requires retaining a larger number of segments applies.  That's all
the relationship there is.  I'm not sure I understand the question.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating-point timestamps versus Range Types

2010-10-21 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Greg Stark wrote:
 Did we have a solution for the problem that understanding which
 columns are timestamps requires having a tuple descriptor and parsing
 every tuple? That seems like it would a) be slow and b) require a
 lot of high level code in the middle of a low-level codepath.

 Yep, that's what it requires.  It would rewrite the page in the new format.

In the case of the recent hstore fixes, we were able to put the burden
on the hstore functions themselves to do any necessary conversion.
I wonder if it'd be possible to do something similar here?  I haven't
chased the bits in any detail, but I'm thinking that integer timestamps
in a plausible range might all look like denormalized floats, and
conversely plausible float timestamps would look like ridiculously large
integer timestamps.  Would we be willing to make such assumptions to
support in-place upgrade of timestamps?
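
To make that concrete, here is a sketch of the kind of bit-pattern test that
reasoning suggests; the plausibility window and the whole approach are
assumptions for illustration, not a worked-out or agreed design:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <stdbool.h>

/* Roughly 200 years around 2000-01-01, in microseconds and in seconds;
 * the width of the "plausible" window is an arbitrary assumption. */
#define PLAUSIBLE_USECS  (200.0 * 365.25 * 86400.0 * 1000000.0)
#define PLAUSIBLE_SECS   (200.0 * 365.25 * 86400.0)

typedef enum { TS_LOOKS_INTEGER, TS_LOOKS_FLOAT, TS_AMBIGUOUS } TsGuess;

static TsGuess
guess_timestamp_format(const unsigned char raw[8])
{
    int64_t     as_int;
    double      as_float;
    bool        int_ok;
    bool        float_ok;

    memcpy(&as_int, raw, sizeof(as_int));       /* integer format: usec since 2000-01-01 */
    memcpy(&as_float, raw, sizeof(as_float));   /* float format: seconds since 2000-01-01 */

    /* A plausible integer timestamp has a "small" 64-bit magnitude; read as a
     * float8, the same bits are a denormal or otherwise implausibly tiny value. */
    int_ok = llabs(as_int) < (int64_t) PLAUSIBLE_USECS;

    /* A plausible float timestamp is a finite double of moderate magnitude;
     * read as an integer, the same bits are a ridiculously large usec count. */
    float_ok = isfinite(as_float) &&
               fabs(as_float) < PLAUSIBLE_SECS &&
               fabs(as_float) > 1e-3;

    if (int_ok && !float_ok)
        return TS_LOOKS_INTEGER;
    if (float_ok && !int_ok)
        return TS_LOOKS_FLOAT;
    return TS_AMBIGUOUS;        /* neither (or both) readings look plausible */
}

The ambiguous cases, and whether they can occur in real data, are of course
exactly the bits that would need to be chased before relying on anything like
this.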

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-21 Thread Itagaki Takahiro
On Fri, Oct 22, 2010 at 1:31 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Of course, you know what that means? Yes, another version of the patch, that
 will build the control file out of the control.in at build time rather
 than install time, and that's back to using EXTVERSION both in the
 Makefile and in the .control.in file.

Here is a detailed report for the v9 patch.

* extension.v9.patch.gz seems to contain other changes in the repo.
Did you use an old master to get the diff?

* Typo in doc/xfunc.sgml. They should probably be replaceable.
openjade:xfunc.sgml:2510:32:E: element REPLACABLE undefined
openjade:xfunc.sgml:2523:46:E: element REPLACABLE undefined

* There are some inconsistencies between the extension names in the \dx+ view
and the actual names used by CREATE EXTENSION.
  - auto_username = insert_username
  - intarray = _int
  - xml2 = pgxml
We might need to rename them, or add 'installer'/'uninstaller' entries
into control files to support extension names that differ from the .so name.

* pg_execute_from_file() and encoding
It expects the file to be in the server encoding, but that is not always true
because we support multiple encodings in the same installation.
How about adding an encoding parameter to the function?
  = pg_execute_from_file( file, encoding )
CREATE EXTENSION could have optional ENCODING option.
  = CREATE EXTENSION name [ ENCODING 'which' ]

I strongly hope for multi-encoding support for my Japanese textsearch
extension. Comments in the extension are written in UTF-8, but both
UTF-8 and EUC_JP are equally used as database encodings.

* Error messages in pg_execute_from_file()
- "must be superuser to get file information" should be
  "must be superuser to execute file".
- "File '%s' could not be executed" should be
  "could not execute file: '%s'". Our message style guide is here:
  http://www.postgresql.org/docs/9.0/static/error-style-guide.html
Many messages in extension.c are also to be adjusted.

commands/extension.c needs to be cleaned up a bit more:
* fsize in read_extension_control_file() is not used.
* ferror() test just after AllocateFile() is not needed;
  NULL checking is enough.
* malloc() in add_extension_custom_variable_classes().
I think the README says nothing about malloc() except assign_hook
cases; palloc would be better here.


BTW, did you register your patch in the next commitfest?
It would be better to do so for tracking purposes.
https://commitfest.postgresql.org/action/commitfest_view?id=8

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

