Re: [HACKERS] [GENERAL] Floating point error
Daniel Farina wrote:

On Mon, Mar 4, 2013 at 2:27 PM, Maciek Sakrejda m.sakre...@gmail.com wrote:

On Sun, Mar 3, 2013 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:

The real difficulty is that there may be more than one storable value that corresponds to 1.23456 to six decimal digits. To be certain that we can reproduce the stored value uniquely, we have to err in the other direction, and print *more* decimal digits than the underlying precision justifies, rather than a bit less. Some of those digits are going to look like garbage to the naked eye.

I think part of the difficulty here is that psql (if I understand this correctly) conflates the wire-format text representations with what should be displayed to the user. E.g., a different driver might parse the wire representation into a native representation, and then format that native representation when it is to be displayed.

That's what the JDBC driver does, so it doesn't care about how the wire format actually looks. pg_dump cares about reproducing values exactly, and not about whether things are nice-looking, so it cranks up extra_float_digits. The JDBC driver might be justified in doing likewise, to ensure that the identical binary float value is stored on both client and server --- but that isn't even a valid goal unless you assume that the server's float implementation is the same as Java's, which is a bit of a leap of faith, even if IEEE 754 is nigh universal these days.

I would hope that any driver cares about reproducing values exactly (or at least as exactly as the semantics of the client and server representations of the data type allow). Once you start talking operations, sure, things get a lot more complicated and you're better off not relying on any particular semantics. But IEEE 754 unambiguously defines certain bit patterns to correspond to certain values, no?
If both client and server talk IEEE 754 floating point, it should be possible to round-trip values with no fuss and end up with the same bits you started with (and as far as I can tell, it is, as long as extra_float_digits is set to the max), even if the implementations of actual operations on these numbers behave very differently on client and server.

I think given that many ORMs can cause UPDATEs on tuple fields that have not changed as part of saving an object, stable round trips seem like a desirable feature.

But all these things are already available: any driver that cares can set extra_float_digits=3, and if it prefers the binary format, the wire protocol supports sending floating point values as such.

I also find the rationale for extra_float_digits quite mysterious for the same reason: why would most programs care about precision less than pg_dump does? If a client wants floating point numbers to look nice, I think the rendering should be on them (e.g. psql and pgadmin), and the default should be to expose whatever precision is available to clients that want an accurate representation of what is in the database. This kind of change may have practical problems that make it impractical to alter at this time (considering the workaround is to set extra_float_digits), but I can't quite grasp the rationale of "well, the only program that cares about the most precision available is pg_dump." It seems like most programs would care just as much.

I don't think that it is about looking nice. C doesn't promise you more than FLT_DIG or DBL_DIG digits of precision, so PostgreSQL cannot either. If you allow more, that would mean that if you store the same number on different platforms and query it, it might come out differently. Among other things, that would be a problem for the regression tests.
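The round-trip claim above is easy to check directly: printing a double with DBL_DIG (15) significant digits can lose the low-order bits, while DBL_DIG + 2 (17) digits — which is what extra_float_digits=3 buys — always recovers the exact stored value. A small illustrative sketch in Python (whose floats are IEEE 754 doubles), not anything from the PostgreSQL code:

```python
import struct

def roundtrips(value, digits):
    """Format `value` to `digits` significant decimal digits, parse it
    back, and report whether the original double's bits are recovered."""
    text = "%.*g" % (digits, value)
    return struct.pack("<d", float(text)) == struct.pack("<d", value)

v = 1.1 + 2.2             # stored double is 3.3000000000000003
print(roundtrips(v, 15))  # False: 15 digits print "3.3", a different double
print(roundtrips(v, 17))  # True: 17 digits always round-trip a double
```

This is exactly the tension in the thread: 15 digits is the most that is guaranteed meaningful, but 17 digits is what is needed to reproduce the stored bits uniquely.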
Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 3 March 2013 23:39, Tom Lane t...@sss.pgh.pa.us wrote:

Nicolas Barbier nicolas.barb...@gmail.com writes:

2013/3/3 Kevin Grittner kgri...@ymail.com:

Nicolas Barbier nicolas.barb...@gmail.com wrote:

I think that automatically using materialized views even when the query doesn’t mention them directly is akin to automatically using indexes without having to mention them in the query.

Oh, I understand that concept perfectly well, I just wonder how often it is useful in practice.

There's a much more fundamental reason why this will never happen, which is that the query planner is not licensed to decide that you only want an approximate and not an exact answer to your query. If MVs were guaranteed always up-to-date, maybe we could think about automatic use of them --- but that's a far different feature from what Kevin has here.

It's not a different feature; it's what most people expect a feature called MV to deliver. That's not a matter of opinion, it's simply how every other database works currently - Oracle, Teradata, and SQL Server at least. The fact that we don't allow MVs to automatically optimize queries is acceptable, as long as that is clearly marked in some way to avoid confusion, and I don't mean buried on p5 of the docs. What we have here is a partial implementation that can be improved upon over the next few releases. I hope no one is going to claim in the release notes that Materialized Views have been implemented in this release, because unqualified that would be seriously misleading and might even stifle further funding to improve things to the level already implemented elsewhere.

Just to reiterate, I fully support committing this partial feature into Postgres in this release, because it will be a long haul to complete the full feature and what we have here is a reasonable stepping stone to get there. Transactionally up-to-date MVs can be used like indexes in the planner.
The idea that this is impossible because of the permutations involved is somewhat ridiculous; there is much published work on optimising that, and some obvious optimisations. Clearly the cost varies according to the number of MVs and the number of tables they touch, not the overall complexity of the query. The overhead is probably the same as or less than that of partial indexes, which we currently think is acceptable. In any case, if you don't want that overhead, don't use MVs.

Non-transactionally up-to-date MVs could also be used like indexes in the planner, if we gave the planner the licence it (clearly) lacks. If using an MV makes a two-hour query return in 1 minute, then using an MV that is 15 minutes out of date is likely to be a win. The licence is some kind of user parameter/option that specifies how stale an answer a query can return. For many queries that involve averages and sums, a stale or perhaps an approximate answer would hardly differ anyway. So I think there is room somewhere there for a staleness time specification by the user, allowing approximation.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
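The staleness-bounded plan choice Simon describes can be sketched abstractly. This is a hypothetical illustration only — nothing like `max_staleness` exists in the patch under discussion; all names here are invented:

```python
import time
from dataclasses import dataclass

@dataclass
class MatView:
    name: str
    est_cost: float       # estimated cost of answering the query via this MV
    last_refresh: float   # epoch seconds of the last refresh

def choose_plan(base_cost, candidate_mvs, max_staleness, now=None):
    """Pick the cheapest plan whose answer is no staler than the user's
    bound.  The base tables are always fresh, so they are always a
    candidate; an MV is eligible only if it was refreshed recently enough."""
    now = time.time() if now is None else now
    best_name, best_cost = "base tables", base_cost
    for mv in candidate_mvs:
        if now - mv.last_refresh <= max_staleness and mv.est_cost < best_cost:
            best_name, best_cost = mv.name, mv.est_cost
    return best_name

# Simon's example: a two-hour query vs. an MV refreshed 15 minutes ago,
# under a user-specified 30-minute staleness bound.
mv = MatView("daily_sales_mv", est_cost=60.0, last_refresh=1000.0 - 15 * 60)
print(choose_plan(7200.0, [mv], max_staleness=30 * 60, now=1000.0))
```

With a tighter bound (say 10 minutes) the same call falls back to the base tables, which is the planner "licence" being argued about.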
Re: [HACKERS] Enabling Checksums
On 5 March 2013 01:04, Daniel Farina dan...@heroku.com wrote:

Corruption has easily occupied more than one person-month of time last year for us. This year to date I've burned two weeks, although admittedly this was probably the result of statistical clustering. Other colleagues of mine have probably put in a week or two in aggregate this year to date. The ability to quickly, accurately, and maybe at some later date proactively find good backups to run WAL recovery from is one of the biggest strides we can make in the operation of Postgres. The especially ugly cases are where the page header is not corrupt, so full page images can carry along malformed tuples... basically, when the corruption works its way into the WAL, we're in much worse shape. Checksums would hopefully prevent this case, converting them into corrupt pages that will not be modified. It would be better yet if I could write tools to find the last good version of pages, and so I think tight integration with Postgres will see a lot of benefits that would be quite difficult and non-portable when relying on file system checksumming. You are among the best positioned to make assessments of the cost of the feature, but I thought you might appreciate a perspective on the benefits, too. I think they're large, and for me they are the highest pole in the tent for what makes Postgres stressful to operate as-is today. It's a testament to the quality of the programming in Postgres that Postgres programming error is not the largest problem.

That's good perspective. I think we all need to be clear that committing this patch also commits the community (via the committer) to significant work and responsibility around this, and my minimum assessment of it is 1 month per year for 3-5 years, much of that on the committer. In effect this will move time and annoyance experienced by users of Postgres back onto developers of Postgres.
That is where it should be, but the effect will be large and easily noticeable, IMHO. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] 9.2.3 crashes during archive recovery
Hello, I could reproduce the behavior and think I understand the cause. The head of origin/REL9_2_STABLE shows the behavior I mentioned in the last message when using the shell script attached; 9.3dev runs as expected.

In XLogPageRead, when RecPtr goes beyond the last page, the current xlog file is released and a new page requested. The variables were as below at that point:

StandbyRequested == true
StandbyMode == false
ArchiveRecoveryRequested == true
InArchiveRecovery == false

In this case, XLogPageRead immediately returns NULL before trying to get xlogs via streaming or from the archive. So ReadRecord returns NULL, then unexpectedly exits the 'main redo apply loop' and increases the timeline ID as if it were promoted. This seems fixed by letting it try all requested sources. The attached patch does this, and the test script runs as expected.

We found that PostgreSQL with this patch unexpectedly becomes primary when starting up as standby. We'll do further investigation of the behavior.

Anyway, I've committed this to master and 9.2 now. This seems to fix the issue. We'll examine this further.

regards,
-- Kyotaro Horiguchi NTT Open Source Software Center

#! /bin/sh
version=abf5c5
version=924b
killall -9 postgres
source pgsetpath $version 0
rm -rf $PGDATA/* $PGARC/*
PGDATA0=$PGDATA
PGPORT0=$PGPORT
initdb -D $PGDATA0
cat >> $PGDATA0/postgresql.conf <<EOF
wal_level = hot_standby
checkpoint_segments = 300
checkpoint_timeout = 1h
archive_mode = on
archive_command = 'cp %p $PGARC/%f'
max_wal_senders = 3
hot_standby = on
EOF
cat >> $PGDATA0/pg_hba.conf <<EOF
local replication horiguti trust
EOF
echo "## Startup master"
pg_ctl -D $PGDATA0 -w start
source pgsetpath $version 1 -p 5433
PGDATA1=$PGDATA
PGPORT1=$PGPORT
rm -rf $PGDATA/* $PGARC/*
echo "## basebackup"
pg_basebackup -h /tmp -p $PGPORT0 -F p -X s -D $PGDATA1
chmod 700 $PGDATA1
cat >> $PGDATA1/recovery.conf <<EOF
standby_mode = yes
primary_conninfo='host=/tmp port=5432'
restore_command='a=$PGARC; if [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
#restore_command='a=$PGARC; if [ ! -d \$a ]; then echo Archive directory \$a is not found.; exit 1; elif [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
EOF
echo "## Startup standby"
pg_ctl -D $PGDATA1 start
echo "## Sleep for 5 seconds"
sleep 5
echo "## Shutdown standby"
pg_ctl -D $PGDATA1 -w stop -m f
echo "## Shutdown master in immediate mode"
pg_ctl -D $PGDATA0 -w stop -m i
cat >> $PGDATA0/recovery.conf <<EOF
standby_mode = yes
primary_conninfo='host=/tmp port=5433'
restore_command='a=$PGARC; if [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
EOF
echo "## Starting master as a standby"
if [ "$1" == w ]; then touch /tmp/xlogwait; fi
PGPORT=5432 pg_ctl -D $PGDATA0 start
#psql postgres -c "select pg_is_in_recovery();"

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92adc4e..00b5bc5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10604,7 +10604,19 @@ retry:
 											  sources);
 				switched_segment = true;
 				if (readFile < 0)
+				{
+					if (!InArchiveRecovery && ArchiveRecoveryRequested)
+					{
+						InArchiveRecovery = true;
+						goto retry;
+					}
+					else if (!StandbyMode && StandbyModeRequested)
+					{
+						StandbyMode = true;
+						goto retry;
+					}
 					return false;
+				}
 			}
 		}
 	}
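The control flow of the fix — on a failed read, enable the next requested-but-inactive source and retry, instead of failing while sources remain — can be sketched abstractly. This is an illustrative model with invented names, not the C code from XLogPageRead:

```python
def read_with_escalation(try_read, state):
    """Retry a failed xlog page read, enabling each requested source in
    turn (archive, then streaming) before giving up, mirroring the
    patch's goto-retry logic."""
    while not try_read(state):
        if state["archive_requested"] and not state["in_archive_recovery"]:
            state["in_archive_recovery"] = True   # retry from the archive
        elif state["standby_requested"] and not state["standby_mode"]:
            state["standby_mode"] = True          # retry via streaming
        else:
            return False                          # every source exhausted
    return True

# The failing scenario from the report: both sources requested, neither
# active, and the page only becomes readable once streaming is enabled.
state = {"archive_requested": True, "in_archive_recovery": False,
         "standby_requested": True, "standby_mode": False}
print(read_with_escalation(lambda s: s["standby_mode"], state))  # True
```

Without the escalation loop, the first failed read returns False immediately, which is exactly the premature exit from the redo loop described above.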
Re: [HACKERS] Enabling Checksums
On 04.03.2013 09:11, Simon Riggs wrote:

On 3 March 2013 18:24, Greg Smith g...@2ndquadrant.com wrote:

The 16-bit checksum feature seems functional, with two sources of overhead. There's some CPU time burned to compute checksums when pages enter the system. And there's extra overhead for WAL logging hint bits. I'll quantify both of those better in another message.

It's crunch time. Do you and Jeff believe this patch should be committed to Postgres core? Are there objectors?

In addition to my hostility towards this patch in general, there are some specifics in the patch I'd like to raise (read out in a grumpy voice):

If you enable checksums, the free space map never gets updated in a standby. It will slowly drift to be completely out of sync with reality, which could lead to significant slowdown and bloat after failover.

Since the checksums are an all-or-nothing cluster-wide setting, the three extra flags in the page header, PD_CHECKSUMS1, PD_CHECKSUM2 and PD_HEADERCHECK, are not needed. Let's leave them out. That keeps the code simpler, and leaves the bits free for future use. If we want to enable such a per-page setting in the future, we can add it later. For a per-relation scheme, they're not needed.

+ * The checksum algorithm is a modified Fletcher 64-bit (which is
+ * order-sensitive). The modification is because, at the end, we have two
+ * 64-bit sums, but we only have room for a 16-bit checksum. So, instead of
+ * using a modulus of 2^32 - 1, we use 2^8 - 1; making it also resemble a
+ * Fletcher 16-bit. We don't use Fletcher 16-bit directly, because processing
+ * single bytes at a time is slower.

How does the error detection rate of this compare with e.g. CRC-16? Is there any ill effect from truncating the Fletcher sums like this?

+	/*
+	 * Store the sums as bytes in the checksum. We add one to shift the range
+	 * from 0..255 to 1..256, to make zero invalid for checksum bytes (which
+	 * seems wise).
+	 */
+	p8Checksum[0] = (sum1 % 255) + 1;
+	p8Checksum[1] = (sum2 % 255) + 1;

That's a bit odd. We don't avoid zero in the WAL CRC, and I don't recall seeing that in other checksum implementations either. 16 bits is not very wide for a checksum, and this eats about 1% of the space of valid values. I can see that it might be a handy debugging aid to avoid 0. But there's probably no need to avoid 0 in both bytes; it seems enough to avoid a completely zero return value.

XLogCheckBuffer() and XLogCheckBufferNeedsBackup() read the page LSN without a lock. That's not atomic, so it could incorrectly determine that a page doesn't need to be backed up. We used to always hold an exclusive lock on the buffer when it's called, which prevents modifications to the LSN, but that's no longer the case.

Shouldn't SetBufferCommitInfoNeedsSave() check the BM_PERMANENT flag? I think it will generate WAL records for unlogged tables as it is.

- Heikki
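The truncation being questioned is easy to model byte-at-a-time. This is a simplified sketch of the scheme described in the quoted comment, not PostgreSQL's actual implementation (which processes the page in wider chunks for speed):

```python
def page_checksum16(data: bytes) -> bytes:
    """Order-sensitive two-sum (Fletcher-style) checksum: two running
    64-bit sums, each reduced to a byte with mod 255, then shifted by one
    so neither checksum byte can be zero -- the property being debated."""
    sum1 = sum2 = 0
    for b in data:
        sum1 = (sum1 + b) & 0xFFFFFFFFFFFFFFFF   # wrap like a 64-bit sum
        sum2 = (sum2 + sum1) & 0xFFFFFFFFFFFFFFFF
    return bytes([(sum1 % 255) + 1, (sum2 % 255) + 1])

print(page_checksum16(b"ab") != page_checksum16(b"ba"))  # True: order-sensitive
```

Note the cost of the +1 shift that Heikki points out: each byte takes only 255 of its 256 possible values, so roughly 1% of the 16-bit space is never produced.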
Re: [HACKERS] Support for REINDEX CONCURRENTLY
Hi,

Have you benchmarked the toastrelidx removal stuff in any way? If not, that's fine, but if yes I'd be interested.

On 2013-03-04 22:33:53 +0900, Michael Paquier wrote:

--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1238,7 +1238,7 @@ toast_save_datum(Relation rel, Datum value, struct varlena * oldexternal, int options)
 {
 	Relation	toastrel;
-	Relation	toastidx;
+	Relation   *toastidxs;
 	HeapTuple	toasttup;
 	TupleDesc	toasttupDesc;
 	Datum		t_values[3];
@@ -1257,15 +1257,26 @@ toast_save_datum(Relation rel, Datum value,
 	char	   *data_p;
 	int32		data_todo;
 	Pointer		dval = DatumGetPointer(value);
+	ListCell   *lc;
+	int			count = 0;

I find count a confusing name for a loop iteration variable... i or idxno, or ...

+	int			num_indexes;

 	/*
 	 * Open the toast relation and its index. We can use the index to check
 	 * uniqueness of the OID we assign to the toasted item, even though it has
-	 * additional columns besides OID.
+	 * additional columns besides OID. A toast table can have multiple identical
+	 * indexes associated to it.
 	 */
 	toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
 	toasttupDesc = toastrel->rd_att;
-	toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+	if (toastrel->rd_indexvalid == 0)
+		RelationGetIndexList(toastrel);

Hm, I think we should move this into a macro, this is cropping up at more and more places.

-	index_insert(toastidx, t_values, t_isnull,
-				 &(toasttup->t_self),
-				 toastrel,
-				 toastidx->rd_index->indisunique ?
-				 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+	for (count = 0; count < num_indexes; count++)
+		index_insert(toastidxs[count], t_values, t_isnull,
+					 &(toasttup->t_self),
+					 toastrel,
+					 toastidxs[count]->rd_index->indisunique ?
+					 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);

The indisunique check looks like a copy-pasto to me, albeit not yours...

 	/*
 	 * Create the TOAST pointer value that we'll return
@@ -1475,10 +1493,13 @@ toast_delete_datum(Relation rel, Datum value)
 	struct varlena *attr = (struct varlena *) DatumGetPointer(value);
 	struct varatt_external toast_pointer;

+	/*
+	 * We actually use only the first index but taking a lock on all is
+	 * necessary.
+	 */

Hm, is it guaranteed that the first index is valid?

+	foreach(lc, toastrel->rd_indexlist)
+		toastidxs[count++] = index_open(lfirst_oid(lc), RowExclusiveLock);

 	/*
-	 * If we're swapping two toast tables by content, do the same for their
-	 * indexes.
+	 * If we're swapping two toast tables by content, do the same for all of
+	 * their indexes. The swap can actually be safely done only if all the indexes
+	 * have valid Oids.
 	 */

What's an index without a valid oid?

 	if (swap_toast_by_content &&
-		relform1->reltoastidxid && relform2->reltoastidxid)
-		swap_relation_files(relform1->reltoastidxid,
-							relform2->reltoastidxid,
-							target_is_pg_class,
-							swap_toast_by_content,
-							InvalidTransactionId,
-							InvalidMultiXactId,
-							mapped_tables);
+		relform1->reltoastrelid && relform2->reltoastrelid)
+	{
+		Relation	toastRel1, toastRel2;
+
+		/* Open relations */
+		toastRel1 = heap_open(relform1->reltoastrelid, RowExclusiveLock);
+		toastRel2 = heap_open(relform2->reltoastrelid, RowExclusiveLock);

Shouldn't those be Access Exclusive Locks?

+		/* Obtain index list if necessary */
+		if (toastRel1->rd_indexvalid == 0)
+			RelationGetIndexList(toastRel1);
+		if (toastRel2->rd_indexvalid == 0)
+			RelationGetIndexList(toastRel2);
+
+		/* Check if the swap is possible for all the toast indexes */

So there's no error being thrown if this turns out not to be possible?

+		if (!list_member_oid(toastRel1->rd_indexlist, InvalidOid) +
Re: [HACKERS] Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
Hello,

I think that the only other behavioural glitch I spotted was that it fails to catch one overflow case, which should generate an out of range error:

select format('|%*s|', -2147483648, 'foo');
Result: |foo|

because -(-2147483648) = 0 in signed 32-bit integers.

Ouch. Thanks for pointing it out. Fixed - another overflow check added.

Your change shown below seems to assume that the two's complement of the most negative number in integer types is identical to itself, and looks to work as expected at least on linux/x86_64. But the C standard leaves that undefined (as far as I hear :-).

| if (width != 0)
| {
|     int32 _width = -width;
|
|     if (SAMESIGN(width, _width))
|         ereport(ERROR,

Instead, I think we can reject it by simply comparing against INT_MIN. I modified the patch like so and put in some styling modifications. Finally, enlargeStringInfo fails soon afterward for large numbers. We might want to keep the width under a certain limit to avoid memory exhaustion. Anyway, I'll send this patch to the committers as it is in this message.

best wishes,
-- Kyotaro Horiguchi NTT Open Source Software Center

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9b7e967..b2d2ed6 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1519,21 +1519,13 @@
        <primary>format</primary>
       </indexterm>
       <literal><function>format</function>(<parameter>formatstr</parameter> <type>text</type>
-      [, <parameter>str</parameter> <type>any</type> [, ...] ])</literal>
+      [, <parameter>formatarg</parameter> <type>any</type> [, ...] ])</literal>
      </entry>
      <entry><type>text</type></entry>
      <entry>
       Format arguments according to a format string.
-      This function is similar to the C function
-      <function>sprintf</>, but only the following conversion specifications
-      are recognized: <literal>%s</literal> interpolates the corresponding
-      argument as a string; <literal>%I</literal> escapes its argument as
-      an SQL identifier; <literal>%L</literal> escapes its argument as an
-      SQL literal; <literal>%%</literal> outputs a literal <literal>%</>.
-      A conversion can reference an explicit parameter position by preceding
-      the conversion specifier with <literal><replaceable>n</>$</>, where
-      <replaceable>n</replaceable> is the argument position.
-      See also <xref linkend="plpgsql-quote-literal-example">.
+      This function is similar to the C function <function>sprintf</>.
+      See <xref linkend="functions-string-format">.
      </entry>
      <entry><literal>format('Hello %s, %1$s', 'World')</literal></entry>
      <entry><literal>Hello World, World</literal></entry>
@@ -2847,6 +2839,186 @@
  </tgroup>
 </table>

+  <sect2 id="functions-string-format">
+   <title><function>format</function></title>
+
+   <indexterm>
+    <primary>format</primary>
+   </indexterm>
+
+   <para>
+    The function <function>format</> produces formatted output according to
+    a format string in a similar way to the C function <function>sprintf</>.
+   </para>
+
+   <para>
+<synopsis>
+format(<parameter>formatstr</> <type>text</> [, <parameter>formatarg</> <type>any</> [, ...] ])
+</synopsis>
+    <replaceable>formatstr</> is a format string that specifies how the
+    result should be formatted. Text in the format string is copied directly
+    to the result, except where <firstterm>format specifiers</> are used.
+    Format specifiers act as placeholders in the string, allowing subsequent
+    function arguments to be formatted and inserted into the result.
+   </para>
+
+   <para>
+    Format specifiers are introduced by a <literal>%</> character and take
+    the form
+<synopsis>
+%[<replaceable>parameter</>][<replaceable>flags</>][<replaceable>width</>]<replaceable>type</>
+</synopsis>
+    <variablelist>
+     <varlistentry>
+      <term><replaceable>parameter</replaceable> (optional)</term>
+      <listitem>
+       <para>
+        An expression of the form <literal><replaceable>n</>$</> where
+        <replaceable>n</> is the index of the argument to use for the format
+        specifier's value. An index of 1 means the first argument after
+        <replaceable>formatstr</>. If the <replaceable>parameter</> field is
+        omitted, the default is to use the next argument.
+       </para>
+<screen>
+SELECT format('Testing %s, %s, %s', 'one', 'two', 'three');
+<lineannotation>Result: </><computeroutput>Testing one, two, three</>
+
+SELECT format('Testing %3$s, %2$s, %1$s', 'one', 'two', 'three');
+<lineannotation>Result: </><computeroutput>Testing three, two, one</>
+</screen>
+       <para>
+        Note that unlike the C function <function>sprintf</> defined in the
+        Single UNIX Specification, the <function>format</> function in
+        <productname>PostgreSQL</> allows format specifiers with and without
+        explicit <replaceable>parameter</> fields to be mixed in the same
+        format string. A format specifier without a
+        <replaceable>parameter</> field always uses the next
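The undefined-behaviour concern discussed above is easy to see with a model of 32-bit two's-complement negation. This is an illustrative sketch: `neg32` merely simulates what C integer negation typically does in practice on common platforms (the C standard leaves overflowing signed negation undefined), and `width_is_valid` shows the INT_MIN comparison being proposed:

```python
INT_MIN = -2**31

def neg32(x):
    """Negate x with 32-bit two's-complement wraparound, as C integer
    negation typically behaves on common platforms."""
    r = (-x) & 0xFFFFFFFF
    return r - 0x100000000 if r >= 0x80000000 else r

def width_is_valid(width):
    """The safer check suggested above: reject INT_MIN directly instead
    of negating the width and testing the sign of the result."""
    return width != INT_MIN

print(neg32(INT_MIN))           # -2147483648: INT_MIN "negates" to itself
print(width_is_valid(INT_MIN))  # False
```

Because `-INT_MIN` wraps back to `INT_MIN`, the negated value has the same sign as the original, which is exactly what the SAMESIGN test in the quoted snippet relies on, and why comparing against INT_MIN up front is the portable alternative.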
Re: [HACKERS] 9.2.3 crashes during archive recovery
Sorry, I sent the wrong script. The head of origin/REL9_2_STABLE shows the behavior I mentioned in the last message when using the shell script attached; 9.3dev runs as expected.

regards,
-- Kyotaro Horiguchi NTT Open Source Software Center

#! /bin/sh
pgpath=$HOME/bin/pgsql_924b
echo $PATH | grep $pgpath | wc -l
if [ `echo $PATH | grep $pgpath | wc -l` == 0 ]; then
    PATH=$pgpath/bin:$PATH
fi
PGDATA0=$HOME/data/pgdata_0
PGDATA1=$HOME/data/pgdata_1
PGARC0=$HOME/data/pgarc_0
PGARC1=$HOME/data/pgarc_1
PGPORT0=5432
PGPORT1=5433
unset PGPORT
unset PGDATA
echo "Postgresql is \"`which postgres`\""
killall -9 postgres
rm -rf $PGDATA0/* $PGARC0/*
initdb -D $PGDATA0
cat >> $PGDATA0/postgresql.conf <<EOF
port=5432
wal_level = hot_standby
checkpoint_segments = 300
checkpoint_timeout = 1h
archive_mode = on
archive_command = 'cp %p $PGARC0/%f'
max_wal_senders = 3
hot_standby = on
#log_min_messages = debug5
EOF
cat >> $PGDATA0/pg_hba.conf <<EOF
local replication horiguti trust
EOF
echo "## Startup master"
pg_ctl -D $PGDATA0 -w -o "-p $PGPORT0" start
rm -rf $PGDATA1/* $PGARC1/*
echo "## basebackup"
pg_basebackup -h /tmp -p $PGPORT0 -F p -X s -D $PGDATA1
chmod 700 $PGDATA1
cat >> $PGDATA1/recovery.conf <<EOF
standby_mode = yes
primary_conninfo='host=/tmp port=5432'
restore_command='a=$PGARC1; if [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
EOF
echo "## Startup standby"
pg_ctl -D $PGDATA1 -o "-p $PGPORT1" start
echo "## Sleep for 5 seconds"
sleep 5
echo "## Shutdown standby"
pg_ctl -D $PGDATA1 -w stop -m f
echo "## Shutdown master in immediate mode"
pg_ctl -D $PGDATA0 -w stop -m i
cat >> $PGDATA0/recovery.conf <<EOF
standby_mode = yes
primary_conninfo='host=/tmp port=5433'
restore_command='a=$PGARC0; if [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
EOF
echo "## Starting master as a standby"
pg_ctl -D $PGDATA0 start
Re: [HACKERS] 9.2.3 crashes during archive recovery
Hi,

Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord(); the attached patch records minRecoveryPoint. [crash recovery -> record minRecoveryPoint in control file -> archive recovery] I think that this is the original intention of Heikki's patch.

I also found a bug in the latest 9.2_stable: it does not get the latest timeline and recovery history file in archive recovery when the master and standby timelines are different.

Best regards,

(2013/03/05 18:22), Kyotaro HORIGUCHI wrote:

Hello, I could reproduce the behavior and think I understand the cause. The head of origin/REL9_2_STABLE shows the behavior I mentioned in the last message when using the shell script attached; 9.3dev runs as expected.

In XLogPageRead, when RecPtr goes beyond the last page, the current xlog file is released and a new page requested. The variables were as below at that point:

StandbyRequested == true
StandbyMode == false
ArchiveRecoveryRequested == true
InArchiveRecovery == false

In this case, XLogPageRead immediately returns NULL before trying to get xlogs via streaming or from the archive. So ReadRecord returns NULL, then unexpectedly exits the 'main redo apply loop' and increases the timeline ID as if it were promoted. This seems fixed by letting it try all requested sources. The attached patch does this, and the test script runs as expected.

We found that PostgreSQL with this patch unexpectedly becomes primary when starting up as standby. We'll do further investigation of the behavior.

Anyway, I've committed this to master and 9.2 now. This seems to fix the issue. We'll examine this further.
regards,
-- Mitsumasa KONDO NTT OSS Center

--- a/src/backend/access/transam/xlog.c	2013-03-04 15:13:49.0 -0500
+++ b/src/backend/access/transam/xlog.c	2013-03-05 06:43:49.435093827 -0500
@@ -4446,7 +4446,7 @@
 	if (targetTLI == 1)
 		return list_make1_int((int) targetTLI);

-	if (InArchiveRecovery)
+	if (ArchiveRecoveryRequested)
 	{
 		TLHistoryFileName(histfname, targetTLI);
 		fromArchive =
@@ -10603,8 +10603,11 @@
 		readFile = XLogFileReadAnyTLI(readId, readSeg, emode,
 									  sources);
 		switched_segment = true;
-		if (readFile < 0)
+		if (readFile < 0){
+			if (StandbyModeRequested)
+				return true;
 			return false;
+		}
 	}
 }
Re: [HACKERS] Materialized views WIP patch
Simon Riggs si...@2ndquadrant.com wrote:

On 3 March 2013 23:39, Tom Lane t...@sss.pgh.pa.us wrote:

Nicolas Barbier nicolas.barb...@gmail.com writes:

2013/3/3 Kevin Grittner kgri...@ymail.com:

Nicolas Barbier nicolas.barb...@gmail.com wrote:

I think that automatically using materialized views even when the query doesn’t mention them directly is akin to automatically using indexes without having to mention them in the query.

Oh, I understand that concept perfectly well, I just wonder how often it is useful in practice.

There's a much more fundamental reason why this will never happen, which is that the query planner is not licensed to decide that you only want an approximate and not an exact answer to your query. If MVs were guaranteed always up-to-date, maybe we could think about automatic use of them --- but that's a far different feature from what Kevin has here.

It's not a different feature; it's what most people expect a feature called MV to deliver. That's not a matter of opinion, it's simply how every other database works currently - Oracle, Teradata, and SQL Server at least. The fact that we don't allow MVs to automatically optimize queries is acceptable, as long as that is clearly marked in some way to avoid confusion, and I don't mean buried on p5 of the docs. What we have here is a partial implementation that can be improved upon over the next few releases. I hope no one is going to claim in the release notes that Materialized Views have been implemented in this release, because unqualified that would be seriously misleading and might even stifle further funding to improve things to the level already implemented elsewhere.

Just to reiterate, I fully support committing this partial feature into Postgres in this release, because it will be a long haul to complete the full feature and what we have here is a reasonable stepping stone to get there. Transactionally up-to-date MVs can be used like indexes in the planner.
The idea that this is impossible because of the permutations involved is somewhat ridiculous; there is much published work on optimising that, and some obvious optimisations. Clearly the cost varies according to the number of MVs and the number of tables they touch, not the overall complexity of the query. The overhead is probably the same as or less than that of partial indexes, which we currently think is acceptable. In any case, if you don't want that overhead, don't use MVs.

Non-transactionally up-to-date MVs could also be used like indexes in the planner, if we gave the planner the licence it (clearly) lacks. If using an MV makes a two-hour query return in 1 minute, then using an MV that is 15 minutes out of date is likely to be a win. The licence is some kind of user parameter/option that specifies how stale an answer a query can return. For many queries that involve averages and sums, a stale or perhaps an approximate answer would hardly differ anyway. So I think there is room somewhere there for a staleness time specification by the user, allowing approximation.

I don't think I disagree with any of what Simon says other than his feelings about the planning cost. Imagine that there are ten MVs that might apply to a complex query, but some of them are mutually exclusive, so there are a large number of permutations of MVs which could be used to replace parts of the original query. And maybe some of the base tables have indexes which could reduce execution cost that aren't present in some or all of the MVs. And each MV has a number of indexes. The combinatorial explosion of possible plans would make it hard to constrain plan time without resorting to some crude rules about what to choose. That's not an unsolvable problem, but I see it as a pretty big problem. I have no doubt that someone could take a big data warehouse and add one or two MVs and show a dramatic improvement in the run time of a query. Almost as big as if the query were rewritten to use the MV directly.
It would make for a very nice presentation, and as long as they are used sparingly this could be a useful tool for a data warehouse environment which is playing with alternative ways to optimize slow queries which pass a lot of data. In other environments, I feel that it gets a lot harder to show a big win. The good news is that it sounds like we agree on the ideal long-term feature set. I'm just a lot more excited, based on the use-cases I've seen, about the addition of incremental updates than substituting MVs into query plans which reference the underlying tables. Perhaps that indicates a chance to reach the final feature set sooner, through everyone scratching their own itches. :-) And we both seem to feel that some system for managing acceptable levels of MV freshness is a necessary feature in order to go very much further. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list
[HACKERS] Materialized view assertion failure in HEAD
I'm getting an assertion failure in HEAD with materialized views, see below for backtrace. To reproduce, just run make installcheck, dump the regression database and then restore it; the server crashes during restore.

(gdb) bt
#0  0x7f283a366425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x7f283a369b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00888429 in ExceptionalCondition (conditionName=0xa0c840 "!(!matviewRel->rd_rel->relhasoids)", errorType=0xa0c78a "FailedAssertion", fileName=0xa0c780 "matview.c", lineNumber=135) at assert.c:54
#3  0x005dbfc4 in ExecRefreshMatView (stmt=0x1b70a28, queryString=0x1b6ff98 "REFRESH MATERIALIZED VIEW tvmm;\n\n\n", params=0x0, completionTag=0x7fff47af0a60 "") at matview.c:135
#4  0x007758a5 in standard_ProcessUtility (parsetree=0x1b70a28, queryString=0x1b6ff98 "REFRESH MATERIALIZED VIEW tvmm;\n\n\n", params=0x0, dest=0x1b70da0, completionTag=0x7fff47af0a60 "", context=PROCESS_UTILITY_TOPLEVEL) at utility.c:1173
#5  0x00773e3f in ProcessUtility (parsetree=0x1b70a28, queryString=0x1b6ff98 "REFRESH MATERIALIZED VIEW tvmm;\n\n\n", params=0x0, dest=0x1b70da0, completionTag=0x7fff47af0a60 "", context=PROCESS_UTILITY_TOPLEVEL) at utility.c:341
#6  0x00772d7e in PortalRunUtility (portal=0x1aeb6d8, utilityStmt=0x1b70a28, isTopLevel=1 '\001', dest=0x1b70da0, completionTag=0x7fff47af0a60 "") at pquery.c:1185
#7  0x00772f56 in PortalRunMulti (portal=0x1aeb6d8, isTopLevel=1 '\001', dest=0x1b70da0, altdest=0x1b70da0, completionTag=0x7fff47af0a60 "") at pquery.c:1317
#8  0x00772481 in PortalRun (portal=0x1aeb6d8, count=9223372036854775807, isTopLevel=1 '\001', dest=0x1b70da0, altdest=0x1b70da0, completionTag=0x7fff47af0a60 "") at pquery.c:814
#9  0x0076c155 in exec_simple_query (query_string=0x1b6ff98 "REFRESH MATERIALIZED VIEW tvmm;\n\n\n") at postgres.c:1048
#10 0x00770517 in PostgresMain (argc=2, argv=0x1ab4bb0, username=0x1ab4a88 "joe") at postgres.c:3969
#11 0x0070ccec in BackendRun (port=0x1ae5ac0) at postmaster.c:3989
#12 0x0070c401 in BackendStartup (port=0x1ae5ac0) at postmaster.c:3673
#13 0x00708ce6 in ServerLoop () at postmaster.c:1575
#14 0x00708420 in PostmasterMain (argc=3, argv=0x1ab2420) at postmaster.c:1244
#15 0x006704f7 in main (argc=3, argv=0x1ab2420) at main.c:197
Re: [HACKERS] odd behavior in materialized view
On Tue, Mar 5, 2013 at 7:36 AM, Kevin Grittner kgri...@ymail.com wrote: Fujii Masao masao.fu...@gmail.com wrote: When I accessed the materialized view in the standby server, I got the following ERROR message. Looks odd to me. Is this a bug? ERROR: materialized view hogeview has not been populated HINT: Use the REFRESH MATERIALIZED VIEW command. The procedure to reproduce this error message is: In the master server: CREATE TABLE hoge (i int); INSERT INTO hoge VALUES (generate_series(1,100)); CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge; DELETE FROM hoge; REFRESH MATERIALIZED VIEW hogeview; SELECT count(*) FROM hogeview; In the standby server SELECT count(*) FROM hogeview; SELECT count(*) goes well in the master, and expectedly returns 0. OTOH, in the standby, it emits the error message. Will investigate. Thanks! And I found another problem. When I ran the following SQLs in the master, PANIC error occurred in the standby. CREATE TABLE hoge (i int); INSERT INTO hoge VALUES (generate_series(1,100)); CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge; VACUUM ANALYZE; The PANIC error messages that I got in the standby are WARNING: page 0 of relation base/12297/16387 is uninitialized CONTEXT: xlog redo visible: rel 1663/12297/16387; blk 0 PANIC: WAL contains references to invalid pages CONTEXT: xlog redo visible: rel 1663/12297/16387; blk 0 base/12297/16387 is the file of the materialized view 'hogeview'. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sql_drop Event Trigger
On Mon, Mar 4, 2013 at 11:13 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Another question. If I do ALTER TABLE foo DROP COLUMN bar, do we need to fire an event trigger for the dropped column? Right now we don't, ISTM we should. And if we want that, then the above set of three properties doesn't cut it. +1. Similar questions arise for object-access-hooks, among other places. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] sql_drop Event Trigger
On Mon, Mar 4, 2013 at 4:59 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Alvaro Herrera escribió: I think this is mostly ready to go in. I'll look at your docs, and unless there are more objections will commit later or early tomorrow. Actually it still needs a bit more work: the error messages in pg_event_trigger_dropped_object need to be reworked. It's a bit annoying that the function throws an error if the function is called in a CREATE command, rather than returning an empty set; or is it just me? That seems like a reasonable thing to change. Here's v6 with docs and regression tests too. Note the new function in objectaddress.c; without that, I was getting regression failures because catalogs such as pg_amop and pg_default_acl are not present in its supporting table. I agree that this is reasonably close to being committable. I do have a bit of concern about the save-and-restore logic for the dropped-object list -- it necessitates that the processing of every DDL command that can potentially drop objects be bracketed with a PG_TRY/PG_CATCH block. While that's relatively easy to guarantee today (but doesn't ALTER TABLE need similar handling?), it seems to me that it might get broken in the future. How hard would it be to pass this information up and down the call stack instead of doing it this way? Or at least move the save/restore logic into something inside the deletion machinery itself so that new callers don't have to worry about it? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimizing pglz compressor
I spent some more time on this, and came up with the attached patch. It includes the changes I posted earlier, to use indexes instead of pointers in the hash table. In addition, it makes the hash table size variable, depending on the length of the input. This further reduces the startup cost on small inputs. I changed the hash method slightly, because the old method would not use any bits from the 3rd byte with a small hash table size; fortunately the change didn't seem to have a negative impact with larger hash table sizes either. I wrote a little C extension to test this. It contains a function which runs pglz_compress() on a bytea input N times. I ran that with different kinds of inputs, and got the following results:

unpatched:

     testname      |   auto
-------------------+-----------
 5k text           |  1425.750
 512b text         |  1287.413
 2k random         | -1053.160
 100k random       | -1056.379
 512b random       | -1018.474
 64b of text       | -2547.106
 64b random        | -3731.700
 100k of same byte |  1093.885
 100k text         |   849.430
(9 rows)

pglz-replace-pointers-with-indexes.patch (the patch I posted earlier):

     testname      |   auto
-------------------+-----------
 5k text           |  1251.576
 512b text         |   946.348
 2k random         |  -815.095
 100k random       |  -808.356
 512b random       |  -614.418
 64b of text       |  -744.382
 64b random        | -1060.758
 100k of same byte |  1192.397
 100k text         |   796.530
(9 rows)

pglz-variable-size-hash-table.patch:

     testname      |   auto
-------------------+-----------
 5k text           |  1276.905
 512b text         |   823.796
 2k random         |  -844.484
 100k random       |  -848.080
 512b random       |  -615.039
 64b of text       |  -393.448
 64b random        |  -568.685
 100k of same byte |  1186.759
 100k text         |   799.978
(9 rows)

These values are from a single run of the test, but I did repeat them several times to make sure there isn't too much variability in them to render the results meaningless. The negative values mean that pglz_compress gave up on the compression in the test, ie. it shows how long it took for pglz_compress to realize that it can't compress the input. Compare the abs() values. 
With the variable-size hash table, I'm not sure how significant the earlier patch to use indexes instead of pointers is. But I don't think it makes things any worse, so it's included in this. On 01.03.2013 17:42, Heikki Linnakangas wrote: On 01.03.2013 17:37, Alvaro Herrera wrote: My take on this is that if this patch is necessary to get Amit's patch to a commitable state, it's fair game. I don't think it's necessary for that, but let's see.. With these tweaks, I was able to make pglz-based delta encoding perform roughly as well as Amit's patch. So I think that's the approach we should take, as it's a bit simpler and more versatile. I'll follow up on that in the other thread. - Heikki diff --git a/src/backend/utils/adt/pg_lzcompress.c b/src/backend/utils/adt/pg_lzcompress.c index 66c64c1..b8a69ec 100644 --- a/src/backend/utils/adt/pg_lzcompress.c +++ b/src/backend/utils/adt/pg_lzcompress.c @@ -112,7 +112,7 @@ * of identical bytes like trailing spaces) and for bigger ones * our 4K maximum look-back distance is too small. * - * The compressor creates a table for 8192 lists of positions. + * The compressor creates a table for lists of positions. * For each input position (except the last 3), a hash key is * built from the 4 next input bytes and the position remembered * in the appropriate list. Thus, the table points to linked @@ -120,7 +120,10 @@ * matching strings. This is done on the fly while the input * is compressed into the output area. Table entries are only * kept for the last 4096 input positions, since we cannot use - * back-pointers larger than that anyway. + * back-pointers larger than that anyway. The size of the hash + * table depends on the size of the input - a larger table has + * a larger startup cost, as it needs to be initialized to zero, + * but reduces the number of hash collisions on long inputs. 
* * For each byte in the input, it's hash key (built from this * byte and the next 3) is used to find the appropriate list @@ -180,8 +183,7 @@ * Local definitions * -- */ -#define PGLZ_HISTORY_LISTS 8192 /* must be power of 2 */ -#define PGLZ_HISTORY_MASK (PGLZ_HISTORY_LISTS - 1) +#define PGLZ_MAX_HISTORY_LISTS 8192 /* must be power of 2 */ #define PGLZ_HISTORY_SIZE 4096 #define PGLZ_MAX_MATCH 273 @@ -198,9 +200,9 @@ */ typedef struct PGLZ_HistEntry { - struct PGLZ_HistEntry *next; /* links for my hash key's list */ - struct PGLZ_HistEntry *prev; - int hindex; /* my current hash key */ + int16 next; /* links for my hash key's list */ + int16 prev; + uint32 hindex; /* my current hash key */ const char *pos; /* my input position */ }
Re: [HACKERS] Fix pgstattuple/pgstatindex to use regclass-type as the argument
On Sun, Mar 3, 2013 at 5:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: Maybe this is acceptable collateral damage. I don't know. But we definitely stand a chance of breaking applications if we change pgstatindex like this. It might be better to invent a differently-named function to replace pgstatindex. If this were a built-in function, we might have to make a painful decision between breaking backward compatibility and leaving this broken forever, but as it isn't, we don't. I think your suggestion of adding a new function is exactly right. We can remove the old one in a future release, and support both in the meantime. It strikes me that if extension versioning is for anything, this is it. We encountered, not long ago, a case where someone couldn't pg_upgrade from 9.0 to 9.2. The reason is that they had defined a view which happened to reference pg_stat_activity.procpid, which we renamed. Oops. Granted, few users do that, and granted, we can't always refrain from changing system catalog structure. But it seems to me that it's good to avoid the pain where we can. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
2013/3/5 Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp: Hello, I think that the only other behavioural glitch I spotted was that it fails to catch one overflow case, which should generate an out of range error: select format('|%*s|', -2147483648, 'foo'); Result: |foo| because -(-2147483648) = 0 in signed 32-bit integers. Ouch. Thanks for pointing that out. Fixed - another overflow check added. Your change shown below seems to assume that the two's complement of the most negative number in integer types is identical to itself, and looks to work as expected at least on linux/x86_64. But the C standard defines it as undefined (as far as I hear :-).

| if (width != 0)
| {
|     int32 _width = -width;
|
|     if (SAMESIGN(width, _width))
|         ereport(ERROR,

this pattern was used elsewhere in pg Instead, I think we can deny it by simply comparing with INT_MIN. I modified the patch like so and put some modifications on styling. ok - I don't have enough experience with this topic and I cannot say what is preferred. Finally, enlargeStringInfo fails just after for large numbers. We should perhaps keep it under a certain length to avoid memory exhaustion. I thought about it, but I don't know a correct value - probably any width specification higher than 1MB will be bogus and can be blocked?? Our VARLENA max size is 1GB so width should never be higher than 1GB. What do you think about these limits? Regards Pavel Anyway, I'll send this patch to committers as it is in this message. best wishes, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] [GENERAL] Floating point error
Daniel Farina dan...@heroku.com wrote: This kind of change may have many practical problems that may make it un-pragmatic to alter at this time (considering the workaround is to set the extra float digits), but I can't quite grasp the rationale for well, the only program that cares about the most precision available is pg_dump. It seems like most programs would care just as much. Something to keep in mind is that when you store 0.01 into a double precision column, the precise value stored, when written in decimal, is: 0.01000000000000000020816681711721685132943093776702880859375 Of course, a value like that can't be precisely written in decimal with only a few digits. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Suggested new CF status: Pending Discussion
On Mon, Mar 4, 2013 at 11:50 PM, Jeff Janes jeff.ja...@gmail.com wrote: Is that true of all commitfests, or only the last one in a cycle? If the former, I think the existence of the waiting on author category belies this point. The original point of Waiting on Author was that a patch might need minor adjustments before it could get committed. And, generally, for the first few CommitFests, that's what happens. But in the final CommitFest, some people's notion of what a tweak in expands to include multiple complete rewrites. If a patch is basically OK, or only minor points are being discussed, it's appropriate to punt it back to the author and say, hey, fix these things and then we can commit this. But if the whole method of operation of the patch needs rethinking, then, in my opinion anyway, it should get pushed out to the last CommitFest. As for the last CommitFest, it's less that we should apply a different standard and more that people fight back against the same standard a lot harder when it means waiting a whole release cycle. Very little of the recently-committed stuff was ready to commit on January 15th, or even close to it, and the percentage of what's left that falls into that category is probably dropping steadily. At this point, if there's not a consensus on it, the correct status is Returned with Feedback. Specifically, the feedback that we're not going to commit it this CommitFest because we don't have consensus on it yet. That is a fair point, and I think Tom has said something similar. But it leaves open the question of who it is that is supposed to be implementing it. Is it the commit-fest manager who decides there is not sufficient consensus, or the author, or a self-assigned reviewer? It can be any of those. I know that I certainly would not rush into an ongoing a conversation, in which several of the participants have their commit-bits, and say I'm calling myself the reviewer and am calling it dead, please stop discussing this. 
Or even just, stop discussing it as an item for 9.3. Perhaps not, but you could certainly express an opinion, if you had one. If you express well-thought-out opinions a sufficient number of times, the idea is that they (along with various other things) lead to you getting a commit bit, too. I think the role of the commit-fest manager is that of a traffic-cop, not a magistrate. But if we are going to have Commitfest II: The summary judgement, that needs to be run by a magistrate, as a separate process from the ordinary part of a commitfest. I found that when I could spend 20 or 30 hours a week just managing the CommitFest, I could do a fairly good job separating the stuff that had a real chance of being ready with a modest amount of work from the stuff that didn't. But that basically involved me understanding, in a fair degree of detail, every patch in the CommitFest. Once I got my commit bit, that became a lot less practical, because understanding every patch in the CommitFest broadly and some of them well enough to commit them added up to more work than I could do. Also, that style of CommitFest management involved me spending a lot of time arguing with people who didn't want their patch punted no matter how well-considered the arguments, and the novelty of that eventually wore off. To be fair, many people were very reasonable about the whole thing - but the holdouts could suck up a huge amount of time and emotional energy. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] odd behavior in materialized view
Fujii Masao masao.fu...@gmail.com wrote: And I found another problem. When I ran the following SQLs in the master, PANIC error occurred in the standby. CREATE TABLE hoge (i int); INSERT INTO hoge VALUES (generate_series(1,100)); CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge; VACUUM ANALYZE; The PANIC error messages that I got in the standby are WARNING: page 0 of relation base/12297/16387 is uninitialized CONTEXT: xlog redo visible: rel 1663/12297/16387; blk 0 PANIC: WAL contains references to invalid pages CONTEXT: xlog redo visible: rel 1663/12297/16387; blk 0 base/12297/16387 is the file of the materialized view 'hogeview'. Yeah, that looks like it will be fixed by the fix for the first problem. The write of a first page without any rows to indicate that it is a scannable empty relation must be WAL-logged. I should have something later today. Thanks for spotting this. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Materialized view assertion failure in HEAD
Joachim Wieland j...@mcknight.de wrote: I'm getting an assertion failure in HEAD with materialized views, see below for backtrace. To reproduce, just run make installcheck, dump the regression database and then restore it, the server crashes during restore. #2 0x00888429 in ExceptionalCondition (conditionName=0xa0c840 "!(!matviewRel->rd_rel->relhasoids)", errorType=0xa0c78a "FailedAssertion", fileName=0xa0c780 "matview.c", lineNumber=135) at assert.c:54 Will investigate. You don't have default_with_oids = on, do you? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
On 5 March 2013 13:46, Pavel Stehule pavel.steh...@gmail.com wrote: 2013/3/5 Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp: Hello, I think that the only other behavioural glitch I spotted was that it fails to catch one overflow case, which should generate an out of range error: select format('|%*s|', -2147483648, 'foo'); Result: |foo| because -(-2147483648) = 0 in signed 32-bit integers. Ouch. Thanks for pointing that out. Fixed - another overflow check added. Your change shown below seems to assume that the two's complement of the most negative number in integer types is identical to itself, and looks to work as expected at least on linux/x86_64. But the C standard defines it as undefined (as far as I hear :-).

| if (width != 0)
| {
|     int32 _width = -width;
|
|     if (SAMESIGN(width, _width))
|         ereport(ERROR,

this pattern was used elsewhere in pg Instead, I think we can deny it by simply comparing with INT_MIN. I modified the patch like so and put some modifications on styling. ok - I don't have enough experience with this topic and I cannot say what is preferred. Finally, enlargeStringInfo fails just after for large numbers. We should perhaps keep it under a certain length to avoid memory exhaustion. I thought about it, but I don't know a correct value - probably any width specification higher than 1MB will be bogus and can be blocked?? Our VARLENA max size is 1GB so width should never be higher than 1GB. What do you think about these limits? I think it's fine as it is. It's no different from repeat() for example. We allow repeat('a', 10) so allowing format('%10s', '') seems reasonable, although probably not very useful. Upping either beyond 1GB generates an out of memory error, which also seems reasonable -- I can't imagine why you would want such a long string. Regards, Dean
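The undefined-behavior hazard under discussion is concrete: in two's complement there is no positive counterpart for INT_MIN, so -width overflows when width is -2147483648, and signed overflow is undefined in C. Comparing against INT_MIN before negating, as suggested, sidesteps the negation entirely. A minimal sketch (the function name and the error-as-return-code shape are illustrative; the actual format() code reports the error with ereport):

```c
#include <limits.h>
#include <stdbool.h>

/* Return false for INT_MIN, whose negation would overflow a 32-bit
 * int; otherwise set *abs_width to the magnitude and return true.
 * Checking before negating avoids signed-overflow undefined behavior. */
static bool
width_magnitude(int width, int *abs_width)
{
    if (width == INT_MIN)
        return false;           /* caller reports "out of range" */
    *abs_width = (width < 0) ? -width : width;
    return true;
}
```

This assumes int is 32 bits (int32), as it is on the platforms PostgreSQL targets.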
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On Tue, Mar 5, 2013 at 10:35 PM, Michael Paquier michael.paqu...@gmail.com wrote: Thanks for the review. All your comments are addressed and updated patches are attached. I got the compile warnings: tuptoaster.c:1539: warning: format '%s' expects type 'char *', but argument 3 has type 'Oid' tuptoaster.c:1539: warning: too many arguments for format The patch doesn't handle the index on the materialized view correctly. =# CREATE TABLE hoge (i int); CREATE TABLE =# CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge; SELECT 0 =# CREATE INDEX hogeview_idx ON hogeview(i); CREATE INDEX =# REINDEX TABLE hogeview; REINDEX =# REINDEX TABLE CONCURRENTLY hogeview; NOTICE: table hogeview has no indexes REINDEX Regards, -- Fujii Masao
Re: [HACKERS] [GENERAL] Floating point error
On 05.03.2013 15:59, Kevin Grittner wrote: Daniel Farina dan...@heroku.com wrote: This kind of change may have many practical problems that may make it un-pragmatic to alter at this time (considering the workaround is to set the extra float digits), but I can't quite grasp the rationale for well, the only program that cares about the most precision available is pg_dump. It seems like most programs would care just as much. Something to keep in mind is that when you store 0.01 into a double precision column, the precise value stored, when written in decimal, is: 0.01000000000000000020816681711721685132943093776702880859375 Of course, a value like that can't be precisely written in decimal with only a few digits. It would be nice to have a base-2 text format to represent floats. It wouldn't be as human-friendly as base-10, but it could be used when you don't want to lose precision. pg_dump in particular. - Heikki
Re: [HACKERS] Materialized views WIP patch
On 5 March 2013 12:15, Kevin Grittner kgri...@ymail.com wrote: I don't think I disagree with any of what Simon says other than his feelings about the planning cost. Imagine that there are ten MVs that might apply to a complex query, but some of them are mutually exclusive, so there are a large number of permutations of MVs which could be used to replace parts of the original query. And maybe some of the base tables have indexes which could reduce execution cost which aren't present in some or all of the MVs. And each MV has a number of indexes. The combinatorial explosion of possible plans would make it hard to constrain plan time without resorting to some crude rules about what to choose. That's not an unsolvable problem, but I see it as a pretty big problem. If we are proposing that we shouldn't try to optimise because it's not usefully solvable, then I would disagree. If we are saying that more plans are possible with MVs, then I'd say, yes there *could* be - that's one of the purposes. That represents more options for optimisation and we should be happy, not sad about that. Yes, we would need some thought about how those potential optimisations can be applied without additional planning cost, but I see that as a long term task, not a problem. The question is will the potential for additional planning time actually materialise into a planning problem? (See below). I have no doubt that someone could take a big data warehouse and add one or two MVs and show a dramatic improvement in the run time of a query. Almost as big as if the query were rewritten to use the MV directly. It would make for a very nice presentation, and as long as they are used sparingly this could be a useful tool for a data warehouse environment which is playing with alternative ways to optimize slow queries which pass a lot of data. In other environments, I feel that it gets a lot harder to show a big win. Are there realistically going to be more options to consider? 
In practice, no, because in most cases we won't be considering both MVs and indexes. Splatting MVs on randomly won't show any more improvement than splatting indexes on randomly helps. Specific optimisations help in specific cases only. And of course, adding extra data structures that have no value certainly does increase planning time. Presumably we'd need some way of seeing how frequently MVs were picked, so we could drop unused MVs just like we can indexes. * Indexes are great at speeding up various kinds of search queries. If you don't have any queries like that, they help very little. * MVs help in specific and restricted use cases, but can otherwise be thought of as a new kind of index structure. MVs help with joins and aggregations, so if you don't do much of that, you'll see no benefit. That knowledge also allows us to develop heuristics for sane optimisation. If the MV has a GROUP BY in it, and a query doesn't, then it probably won't help much to improve query times. If it involves a join you aren't using, then that won't help either. My first guess would be that we don't even bother looking for MV plans unless it has an aggregated result, or a large scale join. We do something similar when we look for plans that use indexes when we have appropriate quals - no quals, no indexes. As a result, I don't see MVs increasing planning times for requests that would make little or no use of them. There will be more planning time on queries that could make use of them and that is good because we really care about that. Sure, a badly written planner might cost more time than it saves. All of this work requires investment from someone with the time and/or experience to make a good go at it. I'm not pushing Tom towards it, nor anyone else, but I do want to see the door kept open for someone to do this when possible (i.e. not GSoC). 
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Suggested new CF status: Pending Discussion
On 4 March 2013 18:59, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sun, Mar 3, 2013 at 9:27 PM, Josh Berkus j...@agliodbs.com wrote: Except that the implication of waiting on author is that, if there's no updates in a couple weeks, we bounce it. And the author doesn't necessarily control a bikeshedding discussion about syntax, for example. That's true. I think, though, that the basic problem is that we've lost track of the ostensible purpose of a CommitFest, which is to commit the patches that *are already ready* for commit. Mumble. That's *part* of the purpose of a CF, but not all. It's also meant to be a time when people concentrate on reviewing patches, and surely discussions about syntax or whatever have to be part of that. I recall in fact that at the last developer meeting, there was discussion about trying to get people to do more formal reviewing of design ideas that hadn't even made it to the submittable-patch stage. So I feel it's counterproductive to try to narrow the concept of a CF to only ready to commit patches. Agreed. Ready to commit is a state, and everybody has a different view of the state of each patch. So we need a point where we all sync up and decide by some mechanism what the group's opinion of the state is and handle accordingly. Or put another way, I very much welcome the chance for feedback from others on my work, and appreciate the opportunity for review of others work. If we can commit at any time, then every discussion needs to be followed in detail otherwise we get you should have said that earlier repeatedly. By all stopping, having a cup of tea and deciding what we're happy with and then continue, it gives the opportunity for efficient thread scheduling of our work. Between CFs we can work mostly independently. People starting new discussions while we have a big patch queue is annoying, because it just makes the whole queue slow down and even less gets done in the long run. 
We need to look at good overall throughput, not continually interrupt each other, causing single threading and overall loss of efficiency. Same idea as if we had a meeting: it would be cool if everybody listened to the meeting, rather than taking calls and then walking back in to repeat discussions that already happened. And overall, one long rolling discussion on hackers is the same thing as saying all developers spend all day every day in a meeting - we should recognise how unproductive that is and work out ways to avoid it. Process changes every year because the skills, capacities and requirements change each year. So change doesn't imply the last approach was broken. The general question is how can we work efficiently together and still produce high quality software? But having said that, maybe the last CF of a cycle has to be treated more nearly as you suggest. Certainly if we hold off ending the CF in hopes of committing stuff that wasn't nearly ready to commit at its beginning, then we're back to bad old habits that seldom lead to anything but a late release. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
2013/3/5 Dean Rasheed dean.a.rash...@gmail.com: On 5 March 2013 13:46, Pavel Stehule pavel.steh...@gmail.com wrote: 2013/3/5 Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp: Hello, I think that the only other behavioural glitch I spotted was that it fails to catch one overflow case, which should generate an "out of range" error: select format('|%*s|', -2147483648, 'foo'); Result: |foo| because -(-2147483648) is still -2147483648 in signed 32-bit integers. Ouch. Thanks for pointing that out. Fixed - another overflow check added. Your change shown below seems to assume that the two's complement of the most negative number in integer types is identical to itself, and it looks to work as expected at least on linux/x86_64. But the C standard leaves it undefined (as far as I hear :-). | if (width != 0) | { | int32 _width = -width; | | if (SAMESIGN(width, _width)) | ereport(ERROR, this pattern was used elsewhere in pg Instead, I think we can reject it by simply comparing with INT_MIN. I modified the patch like so and put some modifications on styling. ok - I don't have enough experience with this topic and I cannot say which is preferred. Finally, enlargeStringInfo fails soon afterward for large numbers. We should perhaps keep it under a certain length to avoid memory exhaustion. I thought about it, but I don't know a correct value - probably any width specification higher than 1MB will be bogus and can be blocked?? Our VARLENA max size is 1GB so width should not be higher than 1GB ever. What do you think about these limits? I think it's fine as it is. It's no different from repeat() for example. We allow repeat('a', 10) so allowing format('%10s', '') seems reasonable, although probably not very useful. Upping either beyond 1GB generates an out of memory error, which also seems reasonable -- I can't imagine why you would want such a long string. 
Regards, Dean ok Pavel
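The overflow case discussed above can be reproduced outside the server. A minimal sketch (Python used here for illustration, emulating 32-bit two's-complement negation) shows why INT_MIN is the one value a SAMESIGN-style check must catch, and why comparing against INT_MIN directly works just as well:

```python
INT32_MIN = -2**31

def neg_int32(x):
    """Emulate 32-bit two's-complement negation with wraparound."""
    r = (-x) & 0xFFFFFFFF
    return r - 2**32 if r >= 2**31 else r

# For every int32 except INT32_MIN, negation flips the sign.
assert neg_int32(5) == -5
assert neg_int32(2147483647) == -2147483647

# -(-2147483648) wraps back to -2147483648: same sign as the input,
# which is exactly what a SAMESIGN(width, _width) test detects.
# Comparing width == INT32_MIN up front catches the same case without
# relying on behavior the C standard leaves undefined.
assert neg_int32(INT32_MIN) == INT32_MIN
```

In C the wraparound itself is undefined behavior, which is why the review suggests the explicit INT_MIN comparison rather than negating and inspecting the result.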
Re: [HACKERS] Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
Kyotaro HORIGUCHI escribió: Umm. sorry, If you have no problem with this, I'll send this to committer. I just found that this patch already has a reviewer. I've only seen the Status field in the patch list.. Patches can be reviewed by more than one person. It's particularly useful if they have different things to say. So don't hesitate to review patches that have already been reviewed by other people. In fact, you can even review committed patches; it's not unlikely that you will be able to find bugs in those, too. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] sql_drop Event Trigger
Robert Haas escribió: On Mon, Mar 4, 2013 at 4:59 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Alvaro Herrera escribió: I think this is mostly ready to go in. I'll look at your docs, and unless there are more objections will commit later or early tomorrow. Actually it still needs a bit more work: the error messages in pg_event_trigger_dropped_object need to be reworked. It's a bit annoying that the function throws an error if the function is called in a CREATE command, rather than returning an empty set; or is it just me? That seems like a reasonable thing to change. I'm thinking just removing the check altogether, and if the list of objects dropped is empty, then the empty set is returned. That is, if you call the function directly in user-invoked SQL, you get empty; if you call the function in a CREATE event trigger function, you get empty. Since this makes the boolean drop_in_progress useless, it'd be removed as well. I do have a bit of concern about the save-and-restore logic for the dropped-object list -- it necessitates that the processing of every DDL command that can potentially drop objects be bracketed with a PG_TRY/PG_CATCH block. While that's relatively easy to guarantee today (but doesn't ALTER TABLE need similar handling?), it seems to me that it might get broken in the future. How hard would it be to pass this information up and down the call stack instead of doing it this way? This would be rather messy, I think; there are too many layers, and too many different functions. Or at least move the save/restore logic into something inside the deletion machinery itself so that new callers don't have to worry about it? Hmm, maybe this can be made to work, I'll see about it. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [GENERAL] Floating point error
On Tue, Mar 5, 2013 at 12:03 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote: I don't think that it is about looking nice. C doesn't promise you more than FLT_DIG or DBL_DIG digits of precision, so PostgreSQL cannot either. If you allow more, that would mean that if you store the same number on different platforms and query it, it might come out differently. Among other things, that would be a problem for the regression tests. Thank you: I think this is what I was missing, and what wasn't clear from the proposed doc patch. But then how can pg_dump assume that it's always safe to set extra_float_digits = 3? Why the discrepancy between default behavior and what pg_dump gets? It can't know whether the dump is to be restored into the same system or a different one (and AFAICT, there's not even an option to tweak extra_float_digits there).
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On 2013-03-05 22:35:16 +0900, Michael Paquier wrote: Thanks for the review. All your comments are addressed and updated patches are attached. Please see below for the details, and if you find anything else just let me know. On Tue, Mar 5, 2013 at 6:27 PM, Andres Freund and...@2ndquadrant.com wrote: Have you benchmarked the toastrelidx removal stuff in any way? If not, that's fine, but if yes I'd be interested. No I haven't. Is it really that easily measurable? I think not, but me too I'd be interested in looking at such results. I don't think it's really measurable, at least not for modifications. But istm that the onus to prove that, to some degree, is upon the patch. + if (toastrel->rd_indexvalid == 0) + RelationGetIndexList(toastrel); Hm, I think we should move this into a macro, this is cropping up at more and more places. This is not necessary. RelationGetIndexList does a similar check at its top, so I simply removed all those checks. Well, in some of those cases a function call might be noticeable (probably only in the toast fetch path). That's why I suggested putting the above in a macro... + for (count = 0; count < num_indexes; count++) + index_insert(toastidxs[count], t_values, t_isnull, + &(toasttup->t_self), + toastrel, + toastidxs[count]->rd_index->indisunique ? + UNIQUE_CHECK_YES : UNIQUE_CHECK_NO); The indisunique check looks like a copy-pasto to me, albeit not yours... Yes it is the same for all the indexes normally, but it looks more solid to me to do that as it is. So unchanged. Hm, if the toast indexes aren't unique anymore loads of stuff would be broken. Anyway, not your fault. + /* Obtain index list if necessary */ + if (toastRel1->rd_indexvalid == 0) + RelationGetIndexList(toastRel1); + if (toastRel2->rd_indexvalid == 0) + RelationGetIndexList(toastRel2); + + /* Check if the swap is possible for all the toast indexes */ So there's no error being thrown if this turns out not to be possible? There are no errors also in the former process... 
This should fail silently, no? Not sure what you mean by "former process"? So far I don't see any reason why it would be a good idea to fail silently. We end up with corrupt data if the swap is silently not performed. + if (count == 0) + snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index", + OIDOldHeap); + else + snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_cct%d", + OIDOldHeap, count); + RenameRelationInternal(lfirst_oid(lc), + NewToastName); + count++; + } Hm. It seems wrong that this layer needs to know about _cct. Any other idea? For the time being I removed cct and added only a suffix based on the index number... Hm. It seems like throwing an error would be sufficient, that path is only entered for shared catalogs, right? Having multiple toast indexes would be a bug. + /* + * Index is considered as a constraint if it is PRIMARY KEY or EXCLUSION. + */ + isconstraint = indexRelation->rd_index->indisprimary || + indexRelation->rd_index->indisexclusion; unique constraints aren't mattering here? No they are not. Unique indexes are not counted as constraints in the case of index_create. Previous versions of the patch did that but there are issues with unique indexes using expressions. Hm. index_create's comment says: * isconstraint: index is owned by PRIMARY KEY, UNIQUE, or EXCLUSION constraint There are unique indexes that are constraints and some that are not. Looking at ->indisunique is not sufficient to determine whether it's one or not. We probably should remove the fsm of the index altogether after this? The freespace map? Not sure it is necessary here. Isn't it going to be removed with the relation anyway? I had a thinko here, forgot what I said. I thought the freespace map would be the one from the old index, but that's clearly bogus. 
Comes from writing reviews after having to leave home at 5 in the morning to catch a plane ;) +void +index_concurrent_drop(Oid indexOid) +{ + Oid constraintOid = get_index_constraint(indexOid); + ObjectAddress object; + Form_pg_index indexForm; + Relation pg_index; + HeapTuple indexTuple; + bool
Re: [HACKERS] [GENERAL] Floating point error
HL == Heikki Linnakangas hlinnakan...@vmware.com writes: HL It would be nice to have a base-2 text format to represent floats. HL It wouldn't be as human-friendly as base-10, but it could be used HL when you don't want to lose precision. pg_dump in particular. Hexadecimal notation for floats exists. The printf format flag is %a for minuscule and %A for majuscule. The result of 1./3. is 0xa.aabp-5. This site has some info and a conversion demo: http://gregstoll.dyndns.org/~gregstoll/floattohex/ -JimC -- James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6
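The hexadecimal float notation mentioned above is exposed in Python as float.hex()/float.fromhex() (the counterpart of C's printf "%a" and strtod). A small sketch showing that it round-trips a double exactly, with no decimal-digit guesswork:

```python
x = 1.0 / 3.0

# Hexadecimal float notation: an exact base-2 text representation
# of the IEEE 754 double (same idea as C's printf("%a", x)).
h = x.hex()
assert h == '0x1.5555555555555p-2'

# Parsing it back reproduces the identical bits - a lossless text
# format of the kind Heikki suggests for pg_dump.
assert float.fromhex(h) == x
```

This is why a base-2 text format sidesteps the extra_float_digits question entirely: the conversion in each direction is exact by construction.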
Re: [HACKERS] sql_drop Event Trigger
Robert Haas escribió: Or at least move the save/restore logic into something inside the deletion machinery itself so that new callers don't have to worry about it? I don't think that works well; precisely the point of the initialize/finalize pair of functions is to bracket the drop so that the objects reported by the deletion machinery are stored in the right list. I tried this macro: /* * Wrap a code fragment that executes a command that may drop database objects, * so that the event trigger environment is appropriately setup. * * Note this macro will call EventTriggerDDLCommandEnd if the object type is * supported; caller must make sure to call EventTriggerDDLCommandStart by * itself. */ #define ExecuteDropCommand(isCompleteQuery, codeFragment, has_objtype, objtype) \ do { \ slist_head _save_objlist; \ bool _supported; \ \ _supported = has_objtype ? EventTriggerSupportsObjectType(objtype) : true; \ \ if (isCompleteQuery) \ { \ EventTriggerInitializeDrop(&_save_objlist); \ } \ PG_TRY(); \ { \ codeFragment; \ if (isCompleteQuery && _supported) \ { \ EventTriggerDDLCommandEnd(parsetree); \ } \ } \ PG_CATCH(); \ { \ if (isCompleteQuery && _supported) \ { \ EventTriggerFinalizeDrop(&_save_objlist); \ } \ PG_RE_THROW(); \ } \ PG_END_TRY(); \ EventTriggerFinalizeDrop(&_save_objlist); \ } while (0) This looks nice in DropOwned: case T_DropOwnedStmt: if (isCompleteQuery) EventTriggerDDLCommandStart(parsetree); ExecuteDropCommand(isCompleteQuery, DropOwnedObjects((DropOwnedStmt *) parsetree), false, 0); break; And it works for DropStmt too: ExecuteDropCommand(isCompleteQuery, switch (stmt->removeType) { case OBJECT_INDEX: case OBJECT_TABLE: case OBJECT_SEQUENCE: case OBJECT_VIEW: case OBJECT_MATVIEW: case OBJECT_FOREIGN_TABLE: RemoveRelations((DropStmt *) parsetree); break; default: RemoveObjects((DropStmt *) parsetree); break; }, true, stmt->removeType); but a rather serious problem is that pgindent refuses to work completely on this (which is understandable, IMV). 
My editor doesn't like the braces inside something that looks like a function call, either. We use this pattern (a codeFragment being called by a macro) as ProcessMessageList in inval.c, but the code fragments there are much simpler. And in AlterTable, using the macro would be much uglier because the code fragment is more convoluted. I'm not really sure if it's worth having the above macro if the only caller is DropOwned. Hmm, maybe I should be considering a pair of macros instead -- UTILITY_START_DROP and UTILITY_END_DROP. I'll give this a try. Other ideas are welcome. FWIW, I noticed that the AlterTable case lacks a call to DDLCommandEnd; reporting that to Dimitri led to him noticing that DefineStmt also lacks one. This is a simple bug in the already-committed implementation which should probably be fixed separately from this patch. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Enabling Checksums
Thank you for the review. On Tue, 2013-03-05 at 11:35 +0200, Heikki Linnakangas wrote: If you enable checksums, the free space map never gets updated in a standby. It will slowly drift to be completely out of sync with reality, which could lead to significant slowdown and bloat after failover. Will investigate. Since the checksums are an all-or-nothing cluster-wide setting, the three extra flags in the page header, PD_CHECKSUMS1, PD_CHECKSUM2 and PD_HEADERCHECK, are not needed. Let's leave them out. That keeps the code simpler, and leaves the bits free for future use. If we want to enable such per-page setting in the future, we can add it later. For a per-relation scheme, they're not needed. They don't really need to be there, I just put them there because it seemed wise if we ever want to allow online enabling/disabling of checksums. But I will remove them. How does the error detection rate of this compare with e.g CRC-16? Is there any ill effect from truncating the Fletcher sums like this? I don't recall if I published these results or not, but I loaded a table, and used pageinspect to get the checksums of the pages. I then did some various GROUP BY queries to see if I could find any clustering or stepping of the checksum values, and I could not. The distribution seemed very uniform across the 255^2 space. I tried to think of other problems, like missing errors in the high or low bits of a word or a page (similar to the issue with mod 256 described below), but I couldn't find any. I'm not enough of an expert to say more than that about the error detection rate. Fletcher is probably significantly faster than CRC-16, because I'm just doing int32 addition in a tight loop. Simon originally chose Fletcher, so perhaps he has more to say. That's a bit odd. We don't avoid zero in the WAL crc, and I don't recall seeing that in other checksum implementations either. 16-bits is not very wide for a checksum, and this eats about 1% of the space of valid values. 
I can see that it might be a handy debugging aid to avoid 0. But there's probably no need to avoid 0 in both bytes, it seems enough to avoid a completely zero return value. http://en.wikipedia.org/wiki/Fletcher%27s_checksum If you look at the section on Fletcher-16, it discusses the choice of the modulus. If we used 256, then an error anywhere except the lowest byte of a 4-byte word read from the page would be missed. Considering that I was using only 255 values anyway, I thought I might as well shift the values away from zero. We could get slightly better by using all combinations. I also considered chopping the 64-bit ints into 16-bit chunks and XORing them together. But when I saw the fact that we avoided zero with the other approach, I kind of liked it, and kept it. XLogCheckBuffer() and XLogCheckBufferNeedsBackup() read the page LSN without a lock. That's not atomic, so it could incorrectly determine that a page doesn't need to be backed up. We used to always hold an exclusive lock on the buffer when it's called, which prevents modifications to the LSN, but that's no longer the case. Will investigate, but it sounds like a buffer header lock will fix it. Shouldn't SetBufferCommitInfoNeedsSave() check the BM_PERMANENT flag? I think it will generate WAL records for unlogged tables as it is. Yes, thank you. Also, in FlushBuffer(), this patch moves the clearing of the BM_JUST_DIRTIED bit to before the WAL flush. That seems to expand the window during which a change to a page will prevent it from being marked clean. Do you see any performance problem with that? The alternative is to take the buffer header lock twice: once to get the LSN, then WAL flush, then another header lock to clear BM_JUST_DIRTIED. Not sure if that's better or worse. This goes back to Simon's patch, so he may have a comment here, as well. I'll post a new patch with these comments addressed, probably tomorrow so that I have some time to self-review and do some basic testing. 
Regards, Jeff Davis
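The Fletcher-16 discussion above (modulus 255 vs 256, and shifting the sums away from zero) can be made concrete. This is a textbook Fletcher-16 sketch in Python for illustration, not the patch's actual C code; the zero-avoiding variant mirrors the idea of keeping each sum in 1..255:

```python
def fletcher16(data: bytes) -> int:
    """Textbook Fletcher-16 with modulus 255.

    Using mod 255 rather than 256 means a corrupted byte anywhere in a
    word affects both running sums; with mod 256 carries out of the low
    byte would be silently discarded, as discussed above.
    """
    s1 = s2 = 0
    for b in data:
        s1 = (s1 + b) % 255
        s2 = (s2 + s1) % 255
    return (s2 << 8) | s1

# Known test vector from the Fletcher checksum literature.
assert fletcher16(b"abcde") == 0xC8F0

# Each mod-255 sum lies in 0..254; shifting both into 1..255 (as the
# patch does) guarantees a nonzero checksum at the cost of roughly 1%
# of the 16-bit value space.
def fletcher16_nonzero(data: bytes) -> int:
    c = fletcher16(data)
    return (((c >> 8) + 1) << 8) | ((c & 0xFF) + 1)

assert fletcher16_nonzero(b"") != 0
```

The speed argument in the thread follows from the inner loop being two additions and two cheap modulo reductions per byte, versus the table lookups or bit-by-bit polynomial division of a CRC-16.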
Re: [HACKERS] [GENERAL] Floating point error
Maciek Sakrejda m.sakre...@gmail.com writes: Thank you: I think this is what I was missing, and what wasn't clear from the proposed doc patch. But then how can pg_dump assume that it's always safe to set extra_float_digits = 3? It's been proven (don't have a link handy, but the paper is at least a dozen years old) that 3 extra digits are sufficient to accurately reconstruct any IEEE single or double float value, given properly written conversion functions in libc. So that's where that number comes from. Now, if either end is not using IEEE floats, you may or may not get equivalent results --- but it's pretty hard to make any guarantees at all in such a case. Why the discrepancy between default behavior and what pg_dump gets? Basically, the default behavior is tuned to the expectations of people who think that what they put in is what they should get back, ie we don't want the system doing this by default: regression=# set extra_float_digits = 3; SET regression=# select 0.1::float4; float4 ------------- 0.100000001 (1 row) regression=# select 0.1::float8; float8 ---------------------- 0.100000000000000006 (1 row) We would get a whole lot more bug reports, not fewer, if that were the default behavior. regards, tom lane
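The "3 extra digits suffice" claim above can be sketched numerically (Python used for illustration; DBL_DIG for IEEE doubles is 15, and 15 + 2 = 17 significant digits is the classical round-trip bound, so extra_float_digits = 3 is more than enough):

```python
x = 1.0 / 3.0

# DBL_DIG (15) significant digits is only what C guarantees can be
# stored without change; *printing* just that many can lose the value.
assert float("%.15g" % x) != x

# 17 significant digits always reconstruct the exact IEEE double,
# which is why a few extra digits make dump/restore lossless.
assert float("%.17g" % x) == x
```

The extra digits look like garbage to the naked eye (0.100000000000000006 for 0.1), which is exactly the default-behavior trade-off Tom describes.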
Re: [HACKERS] [GENERAL] Floating point error
On Tue, Mar 5, 2013 at 10:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: Why the discrepancy between default behavior and what pg_dump gets? Basically, the default behavior is tuned to the expectations of people who think that what they put in is what they should get back, ie we don't want the system doing this by default: regression=# set extra_float_digits = 3; SET regression=# select 0.1::float4; float4 ------------- 0.100000001 (1 row) regression=# select 0.1::float8; float8 ---------------------- 0.100000000000000006 (1 row) We would get a whole lot more bug reports, not fewer, if that were the default behavior. Isn't this a client rendering issue, rather than an on-the-wire encoding issue?
Re: [HACKERS] [GENERAL] Floating point error
Maciek Sakrejda m.sakre...@gmail.com writes: On Tue, Mar 5, 2013 at 10:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: Basically, the default behavior is tuned to the expectations of people who think that what they put in is what they should get back, ie we don't want the system doing this by default: regression=# set extra_float_digits = 3; SET regression=# select 0.1::float4; float4 ------------- 0.100000001 (1 row) regression=# select 0.1::float8; float8 ---------------------- 0.100000000000000006 (1 row) We would get a whole lot more bug reports, not fewer, if that were the default behavior. Isn't this a client rendering issue, rather than an on-the-wire encoding issue? Nope, at least not unless you ask for binary output format (which introduces a whole different set of portability gotchas, so it's not the default either). regards, tom lane
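The binary format Tom mentions avoids the decimal-digits question altogether: a float8 in binary result format is the 8 big-endian IEEE 754 bytes, so the transfer is bit-exact when both ends use IEEE floats. A minimal sketch of that round trip (Python's struct module stands in for the wire packing):

```python
import struct

x = 0.1  # not exactly representable in binary floating point

# float8 in binary format: 8 big-endian IEEE 754 bytes on the wire.
wire = struct.pack('!d', x)
assert len(wire) == 8

# Unpacking reproduces the identical double - no text conversion,
# hence no extra_float_digits concerns.
(y,) = struct.unpack('!d', wire)
assert y == x
```

The portability gotchas are on the client side: the bytes are only meaningful if the client's float representation is also IEEE 754, which is the "leap of faith" discussed earlier in the thread.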
Re: [HACKERS] transforms
On 13-03-03 08:13 PM, Josh Berkus wrote: This (creating the extensions) works fine for me on a Ubuntu 10.x system Now if only we can work out the combinatorics issue ... The plpython2u extensions worked fine but I haven't been able to get this to work with plpython3u (python 3.1). create extension hstore_plpython3u; ERROR: could not load library /usr/local/pgsql93git/lib/hstore_plpython3.so: /usr/local/pgsql93git/lib/hstore_plpython3.so: undefined symbol: _Py_NoneStruct Peter mentioned in the submission that the unit tests don't pass with python3; it doesn't work for me outside the tests either. Steve
Re: [HACKERS] [GENERAL] Floating point error
This conversation has moved beyond my ability to be useful but I want to remind everyone of my original issues in case it helps you improve the docs: 1) Data shown in psql did not match data retrieved by JDBC. I had to debug pretty deep into the JDBC code to confirm that a value I was staring at in psql was different in JDBC. Pretty weird, but I figured it had something to do with floating point malarky. 2) The problem in #1 could not be reproduced when running on our test database. Again very weird, because as far as psql was showing me the values in the two databases were identical. I used COPY to transfer some data from the production database to the test database. I now know that what you see in psql is not necessarily what you see in JDBC. I also know that you need to set extra_float_digits = 3 before using COPY to transfer data from one database to another or risk differences in floating point values. Sounds like both pg_dump and the JDBC driver must be doing this or its equivalent on their own. If the numeric types page of the documentation had mentioned the extra_float_digits then I might have been able to solve my own problem. I'd like you to add some mention of it even if it is just handwaving but will let you guys hash it out from here. Either way, PostgreSQL rocks! Tom On Mar 5, 2013, at 12:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: Maciek Sakrejda m.sakre...@gmail.com writes: On Tue, Mar 5, 2013 at 10:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: Basically, the default behavior is tuned to the expectations of people who think that what they put in is what they should get back, ie we don't want the system doing this by default: regression=# set extra_float_digits = 3; SET regression=# select 0.1::float4; float4 ------------- 0.100000001 (1 row) regression=# select 0.1::float8; float8 ---------------------- 0.100000000000000006 (1 row) We would get a whole lot more bug reports, not fewer, if that were the default behavior. Isn't this a client rendering issue, rather than an on-the-wire encoding issue? 
Nope, at least not unless you ask for binary output format (which introduces a whole different set of portability gotchas, so it's not the default either). regards, tom lane -- Tom Duffey tduf...@trillitech.com 414-751-0600 x102
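Tom Duffey's two symptoms above can be reproduced in miniature (Python for illustration): two doubles that differ in their low bits render identically at default precision, so psql shows "the same value" while a client that parses the full-precision value sees the difference:

```python
a = 1.0 / 3.0
b = float("%.15g" % a)   # a copy that went through default text output

# The two stored doubles really differ...
assert a != b

# ...yet both render identically at default (15-digit) precision:
# the "psql shows identical values, JDBC sees different ones" symptom.
assert "%.15g" % a == "%.15g" % b

# Only the extra digits reveal the difference.
assert "%.17g" % a != "%.17g" % b
```

The same mechanism explains the COPY behavior: without extra_float_digits = 3, the text COPY output is the 15-digit rendering, so the restored value can be a different double that prints the same.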
Re: [HACKERS] transforms
Peter mentioned in the submission that the unit tests don't pass with python3; it doesn't work for me outside the tests either. Also, note that the feature is the concept of Transforms, not the individual transforms which Peter provides. The idea is to enable a new kind of extension. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Materialized views WIP patch
Simon, Kevin, all: Actually, there was already an attempt at automated MV query planning as a prior university project. We could mine that for ideas. Hmmm. I thought it was on pgfoundry, but it's not. Does anyone have access to ACM databases etc. so they could search for this? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Parameterized paths vs index clauses extracted from OR clauses
Robert Haas robertmh...@gmail.com writes: On Thu, Feb 28, 2013 at 3:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: Whichever way we go, the resulting patch is likely to be too large and invasive for me to feel terribly comfortable about back-patching it into 9.2. AFAICT this issue only arises for indexquals extracted out of larger OR conditions, so maybe it's not worth taking such a risk for. EnterpriseDB has received a number of complaints from our customers resulting from planner behavior changes which were back-patched; so I am not sanguine about back-patching unless the situation is pretty darn dire and the fix is pretty darn near certain to be an improvement in every case. Well, the point is not so much about whether it's an improvement as that 9.2's current behavior is a regression from 9.1 and earlier. People may not like changes in minor releases, but they don't like regressions either. A downside of this approach is that to preserve the same-number-of-rows assumption, we'd end up having to enforce the extracted clauses as filter clauses in parameterized paths, even if they'd not proved to be of any use as index quals. I'm not sure I fully grasp why this is a downside. Explain further? Because we'd be checking redundant clauses. You'd get something like

Nested Loop
  Filter: (foo OR (bar AND baz))
  -> ... some outer scan here ...
  -> Index Scan
       Filter: (foo OR bar)

If "foo OR bar" is useful as an indexqual condition in the inner scan, that's one thing. But if it isn't, the cycles expended to check it in the inner scan are possibly wasted, because we'll still have to check the full original OR clause later. It's possible that the filter condition removes enough rows from the inner scan's result to justify the redundant checks, but it's at least as possible that it doesn't. 
Since there's little point in using a parameterized path in the first place unless it enables you to drastically reduce the number of rows being processed, I would anticipate that maybe the consequences aren't too bad, but I'm not sure. Yeah, we could hope that the inner scan is already producing few enough rows that it doesn't matter much. But I think that we'd end up checking the added qual even in a non-parameterized scan; there's no mechanism for pushing quals into the general qual lists and then retracting them later. (Hm, maybe what we need is a marker for "enforce this clause only if you feel like it"?) regards, tom lane
Re: [HACKERS] Reproducible Bus error in 9.2.3 during database dump restoration (Ubuntu Server 12.04 LTS)
Dmitry Koterov dmi...@koterov.ru wrote: LOG: server process (PID 18705) was terminated by signal 7: Bus error So far I have only heard of this sort of error when PostgreSQL is running in a virtual machine and the VM software is buggy. If you are not running in a VM, my next two suspects would be hardware/BIOS configuration issues, or an antivirus product. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Materialized views WIP patch
On Tue, Mar 5, 2013 at 7:15 AM, Kevin Grittner kgri...@ymail.com wrote: I don't think I disagree with any of what Simon says other than his feelings about the planning cost. Imagine that there are ten MVs that might apply to a complex query, but some of them are mutually exclusive, so there are a large number of permutations of MVs which could be used to replace parts of the original query. And maybe some of base tables have indexes which could reduce execution cost which aren't present in some or all of the MVs. And each MV has a number of indexes. The combinatorial explosion of possible plans would make it hard to constrain plan time without resorting to some crude rules about what to choose. That's not an unsolvable problem, but I see it as a pretty big problem. I'm not sure I agree. Suppose you have a query like SELECT * FROM a INNER JOIN b ON a.x = b.x INNER JOIN c ON a.y = c.y WHERE some stuff. The query planner will construct paths for scans on a, b, and c. Then it will construct joinrels for (a b), (a c), (b c), and eventually (a b c) and calculate a set of promising paths for each of them. If there is a materialized view available for one of those joinrels, all we really need to do is add the possible paths for scanning that materialized view to the joinrel. They'll either be better than the existing paths, or they won't. If they're better, the paths that otherwise would have gotten chosen will get kicked out. If they're worse, the materialized-view paths will get kicked out. Either way, we don't have any more paths than we would have otherwise - so no combinatorial explosion. There is one case where we might end up with more paths than we had before. Suppose there's a materialized view on the query SELECT a.x, a.y, a.z, b.t FROM a INNER JOIN b ON a.x = b.x ORDER BY a.z and the users enters just that query. Suppose further that the materialized view has an index on column z, but table a does not. 
In that case, the best paths for the joinrel (a b) not involving the materialized view will be an unordered path, but we could scan the materialized view using the index and so will have a path that is ordered by a.z to add to the joinrel. This path will stick around even if it's more expensive than the unordered path because we know it avoids a final sort. So in that case we do have more paths, but they are potentially useful paths, so I don't see that as a problem. It seems to me that the tricky part of this is not that it might add a lot more paths (because I don't think it will, and if it does I think they're useful paths), but that figuring out whether or not a materialized view matches any particular joinrel might be expensive. I think we're going to need a pretty accurate heuristic for quickly discarding materialized views that can't possibly be relevant to the query as a whole, or to particular joinrels. There are a couple of possible ways to approach that. The most manual of these is probably to have a command like ALTER TABLE x {ENABLE | DISABLE } REWRITE MATERIALIZED y, where the user has to explicitly ask for materialized views to be considered, or else they aren't. That sort of fine-grained control might have other benefits as well. I think a more automated solution is also possible, if we want it. For a materialized view to match a query, all of the baserels in the materialized view must also be present in the query. (Actually, there are situations where this isn't true; e.g. the materialized view has an extra table, but it's joined in a way that could be pruned away by the join removal code, but I think we could ignore such somewhat-pathological cases at least initially.) It seems to me that if we could figure out a very-cheap way to throw away all of the materialized views that don't pass that basic test, we'd be reasonably close to a workable solution. 
A database with tons of materialized views defined on the same set of target relations might still have some planning time problems, but so might a database with tons of indexes. In that regard, what I was thinking about is to use something sort of like a Bloom filter. Essentially, produce a fingerprint for each materialized view. For the sake of argument, let's say that the fingerprint is a 64-bit integer, although it could be a bit string of any length. To construct the fingerprint, hash the OID of each relation involved in the view onto a bit position between 0 and 63. Set the bits for all relations involved in the materialized view. Then, construct a fingerprint for the query in the same way. Any materialized view where (matview_fingerprint & query_fingerprint) != matview_fingerprint needn't be considered; moreover, for a given joinrel, any materialized view where matview_fingerprint != joinrel_fingerprint (computed using the same scheme) needn't be considered. Of course, a matching fingerprint doesn't mean that the materialized view matches, or even necessarily that it involves the correct relations.
Re: [HACKERS] Materialized views WIP patch
Josh Berkus j...@agliodbs.com wrote: There is no shortage of literature on the topic, although any papers from the ACM could certainly be of interest due to the general high quality of papers published there. Adding anything like this to 9.3 is clearly out of the question, though, so I really don't want to spend time researching this now, or encouraging others to do so until after we have a 9.3 release candidate. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] sql_drop Event Trigger
Alvaro Herrera escribió: Hmm, maybe I should be considering a pair of macros instead -- UTILITY_START_DROP and UTILITY_END_DROP. I'll give this a try. Other ideas are welcome. This seems to work. See attached; I like the result because there's no clutter and it supports all three cases without a problem. I also added a new output column to pg_event_trigger_dropped_objects, subobject_name, which is NULL except when a column is being dropped, in which case it contains the column name. Also, the object type column now says table column instead of table when dropping a column. Another question arose in testing: this reports dropping of temp objects, too, but of course not always: particularly not when temp objects are dropped at the end of a session (or the beginning of a session that reuses a previously used temp schema). I find this rather inconsistent and I wonder if we should instead suppress reporting of temp objects. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/doc/src/sgml/event-trigger.sgml b/doc/src/sgml/event-trigger.sgml index 71241c8..1e67d86 100644 --- a/doc/src/sgml/event-trigger.sgml +++ b/doc/src/sgml/event-trigger.sgml @@ -36,8 +36,8 @@ The literalddl_command_start/ event occurs just before the execution of a literalCREATE/, literalALTER/, or literalDROP/ command. As an exception, however, this event does not occur for - DDL commands targeting shared objects - databases, roles, and tablespaces - - or for command targeting event triggers themselves. The event trigger + DDL commands targeting shared objects mdash; databases, roles, and tablespaces + mdash; or for command targeting event triggers themselves. The event trigger mechanism does not support these object types. 
literalddl_command_start/ also occurs just before the execution of a literalSELECT INTO/literal command, since this is equivalent to @@ -46,6 +46,16 @@ /para para +To list all objects that have been deleted as part of executing a +command, use the set returning +function literalpg_event_trigger_dropped_objects()/ from +your literalddl_command_end/ event trigger code (see +xref linkend=functions-event-triggers). Note that +the trigger is executed after the objects have been deleted from the +system catalogs, so it's not possible to look them up anymore. + /para + + para Event triggers (like other functions) cannot be executed in an aborted transaction. Thus, if a DDL command fails with an error, any associated literalddl_command_end/ triggers will not be executed. Conversely, @@ -433,6 +443,11 @@ entry align=centerliteralX/literal/entry /row row +entry align=leftliteralDROP OWNED/literal/entry +entry align=centerliteralX/literal/entry +entry align=centerliteralX/literal/entry + /row + row entry align=leftliteralDROP RULE/literal/entry entry align=centerliteralX/literal/entry entry align=centerliteralX/literal/entry diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 9b7e967..5721d1b 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -15702,4 +15702,54 @@ FOR EACH ROW EXECUTE PROCEDURE suppress_redundant_updates_trigger(); xref linkend=SQL-CREATETRIGGER. /para /sect1 + + sect1 id=functions-event-triggers + titleEvent Trigger Functions/title + + indexterm + primarypg_event_trigger_dropped_objects/primary + /indexterm + + para +Currently productnamePostgreSQL/ provides one built-in event trigger +helper function, functionpg_event_trigger_dropped_objects/, which +lists all object dropped by the command in whose literalddl_command_end/ +event it is called. 
If the function is run in a context other than a +literalddl_command_end/ event trigger function, or if it's run in the +literalddl_command_end/ event of a command that does not drop objects, +it will return the empty set. +/para + + para +The functionpg_event_trigger_dropped_objects/ function can be used +in an event trigger like this: +programlisting +CREATE FUNCTION test_event_trigger_for_drops() +RETURNS event_trigger LANGUAGE plpgsql AS $$ +DECLARE +obj record; +BEGIN +FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects() +LOOP +RAISE NOTICE '% dropped object: % %.% %', + tg_tag, + obj.object_type, + obj.schema_name, + obj.object_name, + obj.subobject_name; +END LOOP; +END +$$; +CREATE EVENT TRIGGER test_event_trigger_for_drops + ON ddl_command_end + EXECUTE PROCEDURE test_event_trigger_for_drops(); +/programlisting +/para + + para + For more information about event triggers, + see xref linkend=event-triggers. +
Re: [HACKERS] Materialized views WIP patch
Robert Haas robertmh...@gmail.com wrote: All that having been said, it's hard for me to imagine that anyone really cares about any of this until we have an incremental update feature, which right now we don't. These are actually independent of one another, as long as we nail down how we're determining freshness -- which is probably needed for either. Someone who's immersed in tuning long-running DW queries might be interested in this before incremental update. (They might load the data once per month, so refreshing the MVs as a step in that process might be cheaper than incrementally maintaining them.) Someone could base freshness on pg_relation_is_scannable() and start working on automatic query rewrite right now, if they wanted to. Actually, I'm betting that's going to be significantly harder than automatic-query-rewrite, when all is said and done. Automatic-query-rewrite, if and when we get it, will not be easy and will require a bunch of work from someone with a good understanding of the planner, but it strikes me as the sort of thing that might work out to one large project and then it's done. I still think we're going to hit the wall on planning time under certain circumstances and need to tweak that over the course of several releases, but now is not the time to get into the details of why I think that. We've spent way too much time on it already for the point we're at in the 9.3 cycle. I've kept my concerns hand-wavy on purpose, and am trying hard to resist the temptation to spend a lot of time demonstrating the problems. Whereas, incremental update sounds to me like a series of projects over a series of releases targeting various special cases, where we can always point to some improvements vs. release N-1 but we're never actually done Exactly. I predict that we will eventually have some special sort of trigger for maintaining MVs based on base table changes to handle the ones that are just too expensive (in developer time or run time) to fully automate. 
But there is a lot of low-hanging fruit for automation. Even a reasonably simplistic and partial implementation of incremental update will benefit a lot of users. Agreed. But in terms of relative difficulty, it's not at all obvious to me that that's the easier part of the project. I totally agree that getting something working to use MVs in place of underlying tables is not all that different or more difficult than using partial indexes. I just predict that we'll get a lot of complaints about cases where it results in worse performance and we'll need to deal with those issues. I don't see that as being brain surgery; just a messy matter of trying to get this pretty theory to work well in the real world -- probably using a bunch of not-so-elegant heuristics. And in the end, the best you can hope for is performance not noticeably worse than you would get if you modified your query to explicitly use the MV(s) -- you're just saving yourself the rewrite. Well, OK, there is the point that, (like indexes) if you run the query which hits the base tables with different parameters, and a new plan is generated each time, it might pick different MVs or exclude them as is most efficient for the given parameters. That's the Holy Grail of all this. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Materialized views WIP patch
2013/3/5 Robert Haas robertmh...@gmail.com: All that having been said, it's hard for me to imagine that anyone really cares about any of this until we have an incremental update feature, which right now we don't. Actually, I'm betting that's going to be significantly harder than automatic-query-rewrite, when all is said and done. I agree. E.g., things such as keeping a matview consistent relative to changes applied to the base tables during the same transaction, might be mightily difficult to implement in a performant way. OTOH, matviews that can only be used for optimization if their base tables were not changed “too recently” (e.g., by transactions that are still in flight, including the current transaction), are probably kind of useful in themselves as long as those base tables are not updated all the time. Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Robert Haas robertmh...@gmail.com writes: On Tue, Mar 5, 2013 at 7:15 AM, Kevin Grittner kgri...@ymail.com wrote: I don't think I disagree with any of what Simon says other than his feelings about the planning cost. I'm not sure I agree. Suppose you have a query like SELECT * FROM a INNER JOIN b ON a.x = b.x INNER JOIN c ON a.y = c.y WHERE some stuff. The query planner will construct paths for scans on a, b, and c. Then it will construct joinrels for (a b), (a c), (b c), and eventually (a b c) and calculate a set of promising paths for each of them. If there is a materialized view available for one of those joinrels, all we really need to do is add the possible paths for scanning that materialized view to the joinrel. That only works to the extent that a materialized view can be described by a path. My impression is that most of the use-cases for MVs will involve aggregates or similar data reduction operators, and we don't currently implement anything about aggregates at the Path level. Arguably it would be useful to do so; in particular, we could get rid of the currently hard-wired mechanism for choosing between sorted and hashed aggregation, and perhaps there'd be a less grotty way to deal with index-optimized MIN/MAX aggregates. But there's a great deal to do to make that happen, and up to now I haven't seen any indication that it would do much except add overhead. FWIW, my opinion is that doing anything like this in the planner is going to be enormously expensive. Index matching is already pretty expensive, and that has the saving grace that we only do it once per base relation. Your sketch above implies trying to match to MVs once per considered join relation, which will be combinatorially worse. Even with a lot of sweat over reducing the cost of the matching, it will hurt. All that having been said, it's hard for me to imagine that anyone really cares about any of this until we have an incremental update feature, which right now we don't. Agreed. 
Even if we're willing to have an approximate results are OK GUC (which frankly strikes me as a horrid idea), people would certainly not be willing to turn it on without some guarantee as to how stale the results could be. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
2013/3/5 Kevin Grittner kgri...@ymail.com: Exactly. I predict that we will eventually have some special sort of trigger for maintaining MVs based on base table changes to handle the ones that are just too expensive (in developer time or run time) to fully automate. But there is a lot of low-hanging fruit for automation. I think it would be totally OK to restrict the possible definitions for matviews that can be maintained fully incrementally to something like: SELECT attributes and aggregations FROM trivial joins WHERE trivial condition GROUP BY attributes; Those definitions are the most useful for optimizing the things that matviews are good at (joins and aggregation). Nicolas PS. Sorry for having fired off this discussion that obviously doesn’t really relate to the current patch. -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Nicolas Barbier nicolas.barb...@gmail.com wrote: PS. Sorry for having fired off this discussion that obviously doesn’t really relate to the current patch. I know it's hard to resist. While I think there will be a number of people for whom the current patch will be a convenience and will therefore use it, it is hard to look at what's there and *not* go if only it also... Perhaps it would be worth looking for anything in the patch that you think might be painting us into a corner where it would be hard to do all the other cool things. While it's late enough in the process that changing anything like that which you find would be painful, it might be a lot more painful later if we release without doing something about it. My hope, of course, is that you won't find any such thing. With this patch I've tried to provide a minimal framework onto which these other things can be bolted. I've tried hard not to do anything which would make it hard to extend, but new eyes may see something I missed. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] transforms
On 3/3/13 6:14 PM, Josh Berkus wrote: Currently Peter is punting (as is proper in a new patch) by having a separate extension for each combination (hstore/plperl, hstore/plpython, ltree/plpython, etc.). This is obviously not a maintainable approach in the long run. There is also a {Perl, Python, Ruby, etc.} binding for {libpq, libmysql, libpng, libyaml, libssl, libgmp, etc.}, each as a separately downloadable and buildable package. I don't think anyone has ever seriously considered a system by which if, say, you have Python and libyaml installed, pyyaml magically appears. Might be nice, but maybe not. The solution, in practice, is that you download pyyaml, and it pulls in any required dependencies. This would work the same way. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transforms
Peter, There is also a {Perl, Python, Ruby, etc.} binding for {libpq, libmysql, libpng, libyaml, libssl, libgmp, etc.}, each as a separately downloadable and buildable package. I don't think anyone has ever seriously considered a system by which if, say, you have Python and libyaml installed, pyyaml magically appears. Might be nice, but maybe not. The solution, in practice, is that you download pyyaml, and it pulls in any required dependencies. This would work the same way. That would be good too, although we don't currently have that capability; if I try to install hstore_plpython without plpython installed, it just errors out. Aside from that, what can we reasonably do for 9.3 to get this feature in? Maybe we add a transforms/ subdirectory of contrib, so that it can be as crowded as we want? Or put the transforms on PGXN for now? I want to see this feature go in so that the community starts writing transforms this year instead of next year. BTW, dependencies seem to be working OK for DROP:

postgres=# drop extension plpythonu;
ERROR:  cannot drop extension plpythonu because other objects depend on it
DETAIL:  extension hstore_plpythonu depends on extension plpythonu
  function look_up_hstore(hstore,text) depends on transform for hstore language plpythonu
  extension ltree_plpythonu depends on extension plpythonu
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
STATEMENT:  drop extension plpythonu;
ERROR:  cannot drop extension plpythonu because other objects depend on it
DETAIL:  extension hstore_plpythonu depends on extension plpythonu
  function look_up_hstore(hstore,text) depends on transform for hstore language plpythonu
  extension ltree_plpythonu depends on extension plpythonu
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Materialized views WIP patch
On 03/05/2013 01:09 PM, Kevin Grittner wrote: Josh Berkus j...@agliodbs.com wrote: There is no shortage of literature on the topic, although any papers from the ACM could certainly be of interest due to the general high quality of papers published there. Adding anything like this to 9.3 is clearly out of the question, though, so I really don't want to spend time researching this now, or encouraging others to do so until after we have a 9.3 release candidate. Good point. Just FYI, for once we start work on 9.4: some university team got planner stuff for matviews working with Postgres, using version 8.0 or something. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] sql_drop Event Trigger
Okay, I added a couple of lines to skip reporting dropped temp schemas, and to skip any objects belonging to any temp schema (not just my own, note). Not posting a new version because the change is pretty trivial. Now, one last thing that comes up is what about objects that don't have straight names (such as pg_amop, pg_amproc, pg_default_acl etc etc); the only thing you get is a catalog OID and an object OID ... but they are pretty useless because by the time you get to the ddl_command_end trigger, the objects are gone from the catalog. Maybe we should report *something* about those. Say, perhaps the object description ... but if we want that, it should be untranslated (i.e. not just what getObjectDescription gives you, because that may be translated, so we would need to patch it so that it only translates if the caller requests it). Another example is reporting of functions: right now you get the function name, but if there are overloaded functions there's no way to know which one was dropped. Example:

alvherre=# create function f(int, int) returns int language sql as $$ select $1 + $2; $$;
CREATE FUNCTION
alvherre=# create function f(int, int, int) returns int language sql as $$ select $1 + $2 + $3; $$;
CREATE FUNCTION
alvherre=# drop function f(int, int);
DROP FUNCTION
alvherre=# select * from dropped_objects ;
   type   | schema | object | subobj | curr_user | sess_user
----------+--------+--------+--------+-----------+-----------
 function | public | f      |        | alvherre  | alvherre

Maybe we could use the subobject_name field (what you see as subobj above) to store the function signature (perhaps excluding the function name), for example. So you'd get object=f subobject=(int,int). Or maybe we should stash the whole function signature as name and leave subobject NULL. The reason I'm worrying about this is that it might be important for some use cases. For instance, replication cases probably don't care about that at all. But if we want to be able to use event triggers for auditing user activity, we need this info.
Thoughts? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] transforms
Peter, I'm still getting intermittent linking failures: postgres=# drop extension plperl cascade; NOTICE: drop cascades to extension hstore_plperl DROP EXTENSION postgres=# create extension plperl; CREATE EXTENSION postgres=# create extension hstore_plperl; ERROR: could not load library /home/josh/pg93/lib/postgresql/hstore_plperl.so: /home/josh/pg93/lib/postgresql/hstore_plperl.so: undefined symbol: hstoreUniquePairs STATEMENT: create extension hstore_plperl; ERROR: could not load library /home/josh/pg93/lib/postgresql/hstore_plperl.so: /home/josh/pg93/lib/postgresql/hstore_plperl.so: undefined symbol: hstoreUniquePairs postgres=# There appears to be something wonky which breaks when I've been running 9.2, shut it down, and fire up 9.3. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transforms
postgres=# create extension plperl; CREATE EXTENSION postgres=# create extension hstore_plperl; ERROR: could not load library /home/josh/pg93/lib/postgresql/hstore_plperl.so: /home/josh/pg93/lib/postgresql/hstore_plperl.so: undefined symbol: hstoreUniquePairs STATEMENT: create extension hstore_plperl; ERROR: could not load library /home/josh/pg93/lib/postgresql/hstore_plperl.so: /home/josh/pg93/lib/postgresql/hstore_plperl.so: undefined symbol: hstoreUniquePairs postgres=# There appears to be something wonky which breaks when I've been running 9.2, shut it down, and fire up 9.3. More on this: the problem appears to be that the symbols for hstore are loaded only if I've just just created the extension in that database: postgres=# create database plperlh postgres-# ; CREATE DATABASE postgres=# \c plperlh; You are now connected to database plperlh as user josh. plperlh=# create extension plperl; CREATE EXTENSION plperlh=# create extension hstore; CREATE EXTENSION plperlh=# create extension hstore_plperl; CREATE EXTENSION plperlh=# plperlh=# \c postgres You are now connected to database postgres as user josh. postgres=# create extension hstore_plperl; ERROR: could not load library /home/josh/pg93/lib/postgresql/hstore_plperl.so: /home/josh/pg93/lib/postgresql/hstore_plperl.so: undefined symbol: PL_thr_key STATEMENT: create extension hstore_plperl; ERROR: could not load library /home/josh/pg93/lib/postgresql/hstore_plperl.so: /home/josh/pg93/lib/postgresql/hstore_plperl.so: undefined symbol: PL_thr_key -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transforms
On 03/05/2013 02:52 PM, Josh Berkus wrote: plperlh=# \c postgres You are now connected to database postgres as user josh. postgres=# create extension hstore_plperl; ERROR: could not load library /home/josh/pg93/lib/postgresql/hstore_plperl.so: /home/josh/pg93/lib/postgresql/hstore_plperl.so: undefined symbol: PL_thr_key STATEMENT: create extension hstore_plperl; ERROR: could not load library /home/josh/pg93/lib/postgresql/hstore_plperl.so: /home/josh/pg93/lib/postgresql/hstore_plperl.so: undefined symbol: PL_thr_key What happens if you set your LD_LIBRARY_PATH to reflect each installation before you start the database? JD -- Command Prompt, Inc. - http://www.commandprompt.com/ PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC @cmdpromptinc - 509-416-6579 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transforms
What happens if you set your LD_LIBRARY_PATH to reflect each installation before you start the database? No change, at least not setting it to $PGHOME/lib. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On Tue, Mar 5, 2013 at 11:22 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Mar 5, 2013 at 10:35 PM, Michael Paquier michael.paqu...@gmail.com wrote: Thanks for the review. All your comments are addressed and updated patches are attached. I got the compile warnings: tuptoaster.c:1539: warning: format '%s' expects type 'char *', but argument 3 has type 'Oid' tuptoaster.c:1539: warning: too many arguments for format Fixed. Thanks for catching that. The patch doesn't handle the index on the materialized view correctly. Hehe... I didn't know that materialized views could have indexes... I fixed it, will send updated patch once I am done with Andres' comments. -- Michael
Re: [HACKERS] Materialized view assertion failure in HEAD
On Tue, Mar 5, 2013 at 9:11 AM, Kevin Grittner kgri...@ymail.com wrote: Will investigate. You don't have default_with_oids = on, do you? No, this was a default install with a default postgresql.conf
[HACKERS] Writable foreign tables: how to identify rows
One of the core problems for a writable-foreign-tables feature is how to identify a previously-fetched row for UPDATE or DELETE actions. In an ordinary Postgres table, we use the ctid system column for that, but a remote table doesn't necessarily have such a thing. (Oracle has a rowid that acts a lot like our ctids, but I don't believe the concept is common in other RDBMSes.) Without any magic row identifier such as these, I suppose an FDW would need to insist on knowing the primary key for the remote table, so that it could update based on the values of the pkey column(s). The current writable-foreign-tables patch goes to great lengths to try to cater for magic row identifiers of unspecified data types; which is something I encouraged in the beginning, but now that I see the results I think the messiness-to-usefulness quotient is awfully low. Basically what it's doing is hacking the TupleDesc associated with a foreign table so that the descriptor (sometimes) includes extra columns. I don't think that's workable at all --- there are too many places that assume that relation TupleDescs match up with what's in the catalogs. I think if we're going to support magic row identifiers, they need to be actual system columns, complete with negative attnums and entries in pg_attribute. This would require FDWs to commit to the data type of a magic row identifier at foreign-table creation time, but that doesn't seem like much of a restriction: probably any one FDW would have only one possible way to handle a magic identifier. So I'm envisioning adding an FDW callback function that gets called at table creation and returns an indication of which system columns the foreign table should have, and then we actually make pg_attribute entries for those columns. For postgres_fdw, that would really be enough, since it could just cause a ctid column to be created with the usual definition. Then it could put the remote ctid into the usual t_self field in returned tuples. 
Supporting magic identifiers that aren't of the TID data type is considerably harder, mainly because it's not clear how heap_getsysattr() could know how to fetch the column value. I have some rough ideas about that, but I suggest that we might want to punt on that extension for the time being. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Writable foreign tables: how to identify rows
Tom, One of the core problems for a writable-foreign-tables feature is how to identify a previously-fetched row for UPDATE or DELETE actions. In an ordinary Postgres table, we use the ctid system column for that, but a remote table doesn't necessarily have such a thing. (Oracle has a rowid that acts a lot like our ctids, but I don't believe the concept is common in other RDBMSes.) Without any magic row identifier such as these, I suppose an FDW would need to insist on knowing the primary key for the remote table, so that it could update based on the values of the pkey column(s). Well, a good test case for this could be the various key-value stores, where the foreign row identifier (FRI) is an immutable string key of arbitrary size. If we can support that, there's a lot of datastores we can support, and even a rule of your target must have a single-column key would be tolerable for non-postgres FDWs for quite a while. I think if we're going to support magic row identifiers, they need to be actual system columns, complete with negative attnums and entries in pg_attribute. This would require FDWs to commit to the data type of a magic row identifier at foreign-table creation time, but that doesn't seem like much of a restriction: probably any one FDW would have only one possible way to handle a magic identifier. So I'm envisioning adding an FDW callback function that gets called at table creation and returns an indication of which system columns the foreign table should have, and then we actually make pg_attribute entries for those columns. Per the above, it would be good to allow the system column to also be a column which is exposed to the user. Supporting magic identifiers that aren't of the TID data type is considerably harder, mainly because it's not clear how heap_getsysattr() could know how to fetch the column value. I have some rough ideas about that, but I suggest that we might want to punt on that extension for the time being. 
Well, if it gets pgsql_fdw in, I'm for it. I'd always assumed that writeable non-postgres targets were going to be hard. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
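Tom's fallback for FDWs without a magic row identifier, updating by primary key, can be sketched in a few lines. This is a toy illustration (the helper name and %s placeholder style are assumptions, not postgres_fdw code): the FDW remembers the key values fetched with each row and pins every key column in the remote WHERE clause.

```python
def build_remote_update(table, set_cols, pkey_cols):
    """Build a parameterized remote UPDATE keyed on the primary key.

    The FDW must have fetched the pkey columns along with the row, so
    it can target exactly the row it previously retrieved, even when
    the remote store has no ctid-like row identifier."""
    set_clause = ", ".join(f"{c} = %s" for c in set_cols)
    where_clause = " AND ".join(f"{c} = %s" for c in pkey_cols)
    return f"UPDATE {table} SET {set_clause} WHERE {where_clause}"

# A composite key works the same way; every key column is pinned.
sql = build_remote_update("remote_table", ["val"], ["id1", "id2"])
print(sql)  # UPDATE remote_table SET val = %s WHERE id1 = %s AND id2 = %s
```

The same scheme degrades gracefully to the key-value-store case Josh mentions: there the "primary key" is simply the single immutable string key.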
Re: [HACKERS] Writable foreign tables: how to identify rows
On Wed, Mar 6, 2013 at 6:00 AM, Tom Lane t...@sss.pgh.pa.us wrote: One of the core problems for a writable-foreign-tables feature is how to identify a previously-fetched row for UPDATE or DELETE actions. In an ordinary Postgres table, we use the ctid system column for that, but a remote table doesn't necessarily have such a thing. (Oracle has a rowid that acts a lot like our ctids, but I don't believe the concept is common in other RDBMSes.) Without any magic row identifier such as these, I suppose an FDW would need to insist on knowing the primary key for the remote table, so that it could update based on the values of the pkey column(s). The current writable-foreign-tables patch goes to great lengths to try to cater for magic row identifiers of unspecified data types; which is something I encouraged in the beginning, but now that I see the results I think the messiness-to-usefulness quotient is awfully low. Basically what it's doing is hacking the TupleDesc associated with a foreign table so that the descriptor (sometimes) includes extra columns. I don't think that's workable at all --- there are too many places that assume that relation TupleDescs match up with what's in the catalogs. I think if we're going to support magic row identifiers, they need to be actual system columns, complete with negative attnums and entries in pg_attribute. This would require FDWs to commit to the data type of a magic row identifier at foreign-table creation time, but that doesn't seem like much of a restriction: probably any one FDW would have only one possible way to handle a magic identifier. So I'm envisioning adding an FDW callback function that gets called at table creation and returns an indication of which system columns the foreign table should have, and then we actually make pg_attribute entries for those columns. For postgres_fdw, that would really be enough, since it could just cause a ctid column to be created with the usual definition. 
Then it could put the remote ctid into the usual t_self field in returned tuples. +1 for adding a new system attribute. We did something similar in Postgres-XC, though the problem there was much simpler because we always knew that the remote FDW is a Postgres instance running the same version. So we added a new system column node_id which does not get any disk storage, but gets set during execution time depending on which node the tuple belongs to. The ctid system column is of course set to the remote ctid. In the context of postgres_fdw, I am not sure if we need an additional system column like node_id. Would there be a case where the tuples to be modified come from different foreign tables, and at runtime we need to decide where to execute the UPDATE/DELETE operation? If we start supporting inheritance involving foreign tables as child tables, this will become a reality. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
[HACKERS] Index Uniqueness
Hey, I want to work towards the following feature in the TODO list: "Prevent index uniqueness checks when UPDATE does not modify the column: Uniqueness (index) checks are done when updating a column even if the column is not modified by the UPDATE. However, HOT already short-circuits this in common cases, so more work might not be helpful." So I wanted to know if someone is already working on this. Thanks, Abhinav Batra
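The TODO item above reduces to a cheap comparison of old and new values for the indexed columns. A minimal sketch of that decision, with hypothetical names (the real check would live in the executor's index maintenance path, and HOT already avoids it in common cases):

```python
def needs_uniqueness_check(index_cols, old_row, new_row):
    """Re-check a unique index only if some indexed column changed.

    Mirrors the TODO item: an UPDATE that leaves every column of a
    unique index untouched cannot introduce a new duplicate, so the
    uniqueness probe can be skipped for that index."""
    return any(old_row[c] != new_row[c] for c in index_cols)

old = {"id": 1, "email": "a@example.com", "name": "A"}
new = dict(old, name="B")  # only a non-indexed column changes
print(needs_uniqueness_check(["email"], old, new))  # False: check skippable
```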
Re: [HACKERS] Reproducible Bus error in 9.2.3 during database dump restoration (Ubuntu Server 12.04 LTS)
On Tue, Mar 5, 2013 at 3:04 PM, Kevin Grittner kgri...@ymail.com wrote: Dmitry Koterov dmi...@koterov.ru wrote: LOG: server process (PID 18705) was terminated by signal 7: Bus error So far I have only heard of this sort of error when PostgreSQL is running in a virtual machine and the VM software is buggy. If you are not running in a VM, my next two suspects would be hardware/BIOS configuration issues, or an antivirus product. For posterity, what's the hardware platform? Software bus errors are more likely on non-x86 hardware. merlin
Re: [HACKERS] Materialized views WIP patch
Kevin Grittner kgri...@ymail.com wrote: REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA Given the short time, I left out the [, ...]. If people think that is important to get into this release, a follow-on patch might be possible. Barring objections, I will use the above and push tomorrow. I'm still working on docs, and the changes related to the syntax change are still only lightly tested, but as far as I know, all is complete except for the docs. I'm still working on those and expect to have them completed late today. I'm posting this patch to allow a chance for final review of the code changes before I push. Was the remaining work on docs done? I would like to test MVs and am waiting for the docs to be completed. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] 9.2.3 crashes during archive recovery
Hmm.. Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord(); the attached patch records minRecoveryPoint. [crash recovery -> record minRecoveryPoint in control file -> archive recovery] I think that this is the original intention of Heikki's patch. It could be. Before that, my patch did not wake up hot standby because I didn't take care of minRecoveryPoint in it :-p On the other hand, your patch fixes that point, but ReadRecord runs on the false page data. However, since the wrong record on the false page can be identified as broken, it should be an undesirable behavior. If we want to show the final solution by ourselves, we need to make another patch to settle them all. Let me take further thought.. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] Parameterized paths vs index clauses extracted from OR clauses
On Tue, Mar 5, 2013 at 3:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, the point is not so much about whether it's an improvement as that 9.2's current behavior is a regression from 9.1 and earlier. People may not like changes in minor releases, but they don't like regressions either. That's true, but I'm still worried that we're just moving the unhappiness around from one group of people to another group of people, and I don't have a lot of confidence about which group is larger. A downside of this approach is that to preserve the same-number-of-rows assumption, we'd end up having to enforce the extracted clauses as filter clauses in parameterized paths, even if they'd not proved to be of any use as index quals. I'm not sure I fully grasp why this is a downside. Explain further? Because we'd be checking redundant clauses. You'd get something like

Nested Loop
  Filter: (foo OR (bar AND baz))
  -> ... some outer scan here ...
  -> Index Scan
       Filter: (foo OR bar)

If foo OR bar is useful as an indexqual condition in the inner scan, that's one thing. But if it isn't, the cycles expended to check it in the inner scan are possibly wasted, because we'll still have to check the full original OR clause later. It's possible that the filter condition removes enough rows from the inner scan's result to justify the redundant checks, but it's at least as possible that it doesn't. Yeah, that's pretty unappealing. It probably doesn't matter much if foo is just a column reference, but what if it's an expensive function? For that matter, what if it's a volatile function that we can't execute twice without changing the results? Since there's little point in using a parameterized path in the first place unless it enables you to drastically reduce the number of rows being processed, I would anticipate that maybe the consequences aren't too bad, but I'm not sure. Yeah, we could hope that the inner scan is already producing few enough rows that it doesn't matter much.
But I think that we'd end up checking the added qual even in a non-parameterized scan; there's no mechanism for pushing quals into the general qual lists and then retracting them later. (Hm, maybe what we need is a marker for "enforce this clause only if you feel like it"?) Not sure I get the parenthesized bit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
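The redundant-evaluation hazard being discussed can be modeled directly: the inner scan filters on the extracted clause (foo OR bar), and the join then re-checks the original (foo OR (bar AND baz)), so an expensive foo predicate can run twice per surviving row. A toy model (not planner code; the counting wrapper is illustrative only):

```python
def evaluate(row, foo, bar, baz):
    """Toy model of the double evaluation: the inner scan filters on the
    extracted (foo OR bar), then the join re-checks the full original
    clause (foo OR (bar AND baz)). Returns (row kept?, foo call count)."""
    calls = {"foo": 0}

    def foo_pred(r):
        calls["foo"] += 1
        return foo(r)

    # Inner-scan filter: the extracted, index-matchable clause.
    if not (foo_pred(row) or bar(row)):
        return False, calls["foo"]
    # Join-level re-check of the original clause; foo may run again.
    keep = foo_pred(row) or (bar(row) and baz(row))
    return keep, calls["foo"]

keep, foo_calls = evaluate({}, lambda r: True, lambda r: False, lambda r: False)
print(keep, foo_calls)  # True 2 -- the expensive foo predicate ran twice
```

When foo is a cheap column reference the second evaluation is noise; when it is an expensive function, the redundant check is exactly the cost Tom is worried about.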
Re: [HACKERS] Request for vote to move forward with recovery.conf overhaul
On Sun, Mar 3, 2013 at 9:25 PM, Greg Smith g...@2ndquadrant.com wrote: -Allow a pg_ctl standby and pg_ctl recover command that work similarly to promote. This should slim down the work needed for the first replication setup people do. -Make it obvious when people try to use recovery.conf that it's not supported anymore. -Provide a migration path for tool authors strictly in the form of some documentation and error message hints. That was it as far as concessions to backward compatibility. The wrap-up I did started at http://www.postgresql.org/message-id/4ee91248.8010...@2ndquadrant.com and only had a few responses/controversy from there. Robert wrote a good summary: 1. Get rid of recovery.conf - error out if it is seen 2. For each parameter that was previously a recovery.conf parameter, make it a GUC 3. For the parameter that was does recovery.conf exist?, replace it with does standby.enabled exist?. All that works for me. I thought this stopped from there because no one went back to clean up Fujii's submission, which Simon and Michael have now put more time into. There is not much distance between it and the last update Michael sent. Here's the detailed notes from my original proposal, with updates to incorporate the main feedback I got then; note that much of this is documentation rather than code: -Creating a standby.enabled file puts the system into recovery mode. That feature needs to save some state, and making those decisions based on existence of a file is already a thing we do. Rather than emulating the rename to recovery.done that happens now, the server can just delete it, to keep from incorrectly returning to a state it's exited. A UI along the lines of the promote one, allowing pg_ctl standby, should fall out of here. This file can be relocated to the config directory, similarly to how the include directory looks for things. 
There was a concern that this would require write permissions that don't exist on systems that relocate configs, like Debian/Ubuntu. That doesn't look to be a real issue though. Here's a random Debian server showing the postgres user can write to all of those:

$ ls -ld /etc/postgresql
drwxr-xr-x 4 root root 4096 Jun 29 2012 /etc/postgresql
$ ls -ld /etc/postgresql/9.1
drwxr-xr-x 3 postgres postgres 4096 Jul 1 2012 /etc/postgresql/9.1
$ ls -ld /etc/postgresql/9.1/main
drwxr-xr-x 2 postgres postgres 4096 Mar 3 11:00 /etc/postgresql/9.1/main

-A similar recovery.enabled file turns on PITR recovery. -It should be possible to copy a postgresql.conf file from master to standby and just use it. For example: --standby_mode = on: Ignored unless you've started the server with standby.enabled, won't bother the master if you include it. --primary_conninfo: This will look funny on the master showing it connecting to itself, but it will get ignored there too. -If startup finds a recovery.conf file where it used to live, abort--someone is expecting the old behavior. Hint to RTFM or include a short migration guide right on the spot. That can have a nice section about how you might use the various postgresql.conf include* features if they want to continue managing those files separately. Example: rename what you used to make recovery.conf as replication.conf and use include_if_exists if you want to be able to rename it to recovery.done like before. Or drop it into a config/ directory (similarly to the proposal for SET PERSISTENT) where the rename to recovery.done will make it then skipped. (Only files ending with .conf are processed by include_dir) -Tools such as pgpool that want to write a simple configuration file, only touching the things that used to go into recovery.conf, can tell people to do the same trick. End their postgresql.conf with a call to \include_if_exists replication.conf as part of setup.
While I don't like pushing problems toward tool vendors, as one I think validating whether this has been done doesn't require the sort of fully GUC-compatible parser people (rightly) want to avoid. A simple scan of the postgresql.conf looking for the recommended text at the front of a line could confirm whether that bit is there. And adding a single include_if_exists line to the end of the postgresql.conf is not a terrible edit job to consider pushing toward tools. None of this is any more complicated than the little search and replace job that initdb does right now. -If you want to do something special yourself to clean up when recovery finishes, perhaps to better emulate the old "those settings go away" implementation, there's already recovery_end_command available for that. Let's say you wanted to force the old name and did include_if_exists config/recovery.conf. Now you could do: recovery_end_command = 'rm -f /tmp/pgsql.trigger.5432 && mv conf.d/recovery.conf conf.d/recovery.done' No argument with any of that, either. If that's what we're implementing, I'm on board. --
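The trigger-file behavior in the proposal, enter standby or PITR mode if the file exists, then delete it so the server cannot silently re-enter a state it exited, can be stated compactly. A hedged sketch with hypothetical function and mode names (the real decision would be made early in server startup):

```python
import os

def determine_startup_mode(config_dir):
    """Decide the startup mode from trigger files, per the proposal:
    standby.enabled -> streaming standby, recovery.enabled -> PITR,
    neither -> normal startup. The file is deleted once consumed
    (instead of being renamed to *.done) so a later restart does not
    incorrectly return to a state the server already exited."""
    for trigger, mode in (("standby.enabled", "standby"),
                          ("recovery.enabled", "pitr")):
        path = os.path.join(config_dir, trigger)
        if os.path.exists(path):
            os.unlink(path)
            return mode
    return "normal"
```

A `pg_ctl standby` command would then just be "create standby.enabled and start the server", symmetric with how promote uses a trigger file today.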
Re: [HACKERS] pg_ctl idempotent option
On Mon, 2013-01-14 at 06:37 -0500, Peter Eisentraut wrote: Here is a patch to add an option -I/--idempotent to pg_ctl, the result of which is that pg_ctl doesn't error on start or stop if the server is already running or already stopped. So apparently, pg_upgrade needs the existing behavior, so making the idempotent option the only behavior won't work. Therefore, I think this patch is still useful as originally presented. I've made one change that pg_ctl won't print any messages if the -I option is used and the server is already started/stopped. diff --git a/contrib/start-scripts/linux b/contrib/start-scripts/linux index b950cf5..03c6e50 100644 --- a/contrib/start-scripts/linux +++ b/contrib/start-scripts/linux @@ -84,17 +84,17 @@ case $1 in echo -n Starting PostgreSQL: test x$OOM_SCORE_ADJ != x echo $OOM_SCORE_ADJ /proc/self/oom_score_adj test x$OOM_ADJ != x echo $OOM_ADJ /proc/self/oom_adj - su - $PGUSER -c $DAEMON -D '$PGDATA' $PGLOG 21 + su - $PGUSER -c $DAEMON -I -D '$PGDATA' $PGLOG 21 echo ok ;; stop) echo -n Stopping PostgreSQL: - su - $PGUSER -c $PGCTL stop -D '$PGDATA' -s -m fast + su - $PGUSER -c $PGCTL stop -I -D '$PGDATA' -s -m fast echo ok ;; restart) echo -n Restarting PostgreSQL: - su - $PGUSER -c $PGCTL stop -D '$PGDATA' -s -m fast -w + su - $PGUSER -c $PGCTL stop -I -D '$PGDATA' -s -m fast -w test x$OOM_SCORE_ADJ != x echo $OOM_SCORE_ADJ /proc/self/oom_score_adj test x$OOM_ADJ != x echo $OOM_ADJ /proc/self/oom_adj su - $PGUSER -c $DAEMON -D '$PGDATA' $PGLOG 21 diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml index 3107514..549730d 100644 --- a/doc/src/sgml/ref/pg_ctl-ref.sgml +++ b/doc/src/sgml/ref/pg_ctl-ref.sgml @@ -39,6 +39,7 @@ arg choice=optoption-o/option replaceableoptions/replaceable/arg arg choice=optoption-p/option replaceablepath/replaceable/arg arg choice=optoption-c/option/arg + arg choice=optoption-I/option/arg /cmdsynopsis cmdsynopsis @@ -55,6 +56,7 @@ arg choice=plainoptioni[mmediate]/option/arg 
/group /arg + arg choice=optoption-I/option/arg /cmdsynopsis cmdsynopsis @@ -271,6 +273,25 @@ titleOptions/title /varlistentry varlistentry + termoption-I/option/term + termoption--idempotent/option/term + listitem + para +When used with the literalstart/literal or literalstop/literal +actions, return exit code 0 if the server is already running or +stopped, respectively. Otherwise, an error is raised and a nonzero +exit code is returned in these cases. + /para + + para +This option is useful for System-V-style init scripts, which require +the literalstart/literal and literalstop/literal actions to be +idempotent. + /para + /listitem + /varlistentry + + varlistentry termoption-l replaceable class=parameterfilename/replaceable/option/term termoption--log replaceable class=parameterfilename/replaceable/option/term listitem diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index 7d5e168..f6b92d1 100644 --- a/src/bin/pg_ctl/pg_ctl.c +++ b/src/bin/pg_ctl/pg_ctl.c @@ -86,6 +86,7 @@ static char *pgdata_opt = NULL; static char *post_opts = NULL; static const char *progname; +static bool idempotent = false; static char *log_file = NULL; static char *exec_path = NULL; static char *register_servicename = PostgreSQL; /* FIXME: + version ID? 
*/ @@ -774,9 +775,15 @@ { old_pid = get_pgpid(); if (old_pid != 0) - write_stderr(_(%s: another server might be running; - trying to start server anyway\n), - progname); + { + if (idempotent) +exit(0); + else + { +write_stderr(_(%s: another server might be running\n), progname); +exit(1); + } + } } read_post_opts(); @@ -860,6 +867,8 @@ if (pid == 0)/* no pid file */ { + if (idempotent) + exit(0); write_stderr(_(%s: PID file \%s\ does not exist\n), progname, pid_file); write_stderr(_(Is server running?\n)); exit(1); @@ -1764,9 +1773,9 @@ printf(_(%s is a utility to initialize, start, stop, or control a PostgreSQL server.\n\n), progname); printf(_(Usage:\n)); printf(_( %s init[db] [-D DATADIR] [-s] [-o \OPTIONS\]\n), progname); - printf(_( %s start [-w] [-t SECS] [-D DATADIR] [-s] [-l FILENAME] [-o \OPTIONS\]\n), progname); - printf(_( %s stop[-W] [-t SECS] [-D DATADIR] [-s] [-m SHUTDOWN-MODE]\n), progname); - printf(_( %s restart [-w] [-t SECS] [-D DATADIR] [-s] [-m SHUTDOWN-MODE]\n + printf(_( %s start [-w] [-t SECS] [-D DATADIR] [-s] [-I] [-l FILENAME] [-o \OPTIONS\]\n), progname); + printf(_( %s stop[-W] [-t SECS] [-D DATADIR] [-s] [-I] [-m SHUTDOWN-MODE]\n), progname); + printf(_( %s restart [-w] [-t SECS] [-D DATADIR] [-s] [-m SHUTDOWN-MODE]\n [-o \OPTIONS\]\n), progname); printf(_( %s reload [-D DATADIR] [-s]\n),
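Stripped to its decision logic, the -I/--idempotent option just turns two error cases into exit status 0: "already running" on start and "not running" on stop. A small model of those exit codes (illustrative, not the pg_ctl source):

```python
def start_exit_code(already_running, idempotent):
    """Exit status for `pg_ctl start`: 0 on success, 1 if another server
    appears to be running -- unless --idempotent turns that case into
    success, as System-V-style init scripts expect."""
    if already_running:
        return 0 if idempotent else 1
    return 0  # proceed with startup

def stop_exit_code(pid_file_found, idempotent):
    """Exit status for `pg_ctl stop`: 1 if there is no PID file (server
    not running) -- unless --idempotent makes already-stopped OK."""
    if not pid_file_found:
        return 0 if idempotent else 1
    return 0  # proceed with shutdown
```

This is why the init-script hunks in the patch can drop their own "is it already running?" guards: the idempotent flag absorbs those cases.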
Re: [HACKERS] sql_drop Event Trigger
Alvaro Herrera escribió: Now, one last thing that comes up is what about objects that don't have straight names (such as pg_amop, pg_amproc, pg_default_acl etc etc); the only thing you get is a catalog OID and an object OID ... but they are pretty useless because by the time you get to the ddl_command_end trigger, the objects are gone from the catalog. Maybe we should report *something* about those. Here's another idea --- have three columns: type, schema (as in the current patch and as shown above), and a third one for object identity. For tables and other objects that have simple names, the identity would be their names. For columns, it'd be tablename.columnname. For functions, it'd be the complete signature. And so on. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
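The three-column idea, type, schema, and identity, can be illustrated with a toy formatter. The identity formats follow the examples in the message (plain name for tables, tablename.columnname for columns, full signature for functions); the helper name and the sample function below are assumptions for illustration only:

```python
def object_identity(obj_type, **attrs):
    """Render the proposed 'identity' column for a dropped object,
    per object type: tables by name, columns as table.column,
    functions by their complete signature."""
    if obj_type == "table":
        return attrs["name"]
    if obj_type == "column":
        return f'{attrs["table"]}.{attrs["name"]}'
    if obj_type == "function":
        args = ", ".join(attrs["argtypes"])
        return f'{attrs["name"]}({args})'
    raise ValueError(f"no identity rule for {obj_type}")

print(object_identity("function", name="lower", argtypes=["text"]))  # lower(text)
```

The payoff is that even after ddl_command_end, when the catalog rows are gone, the trigger still receives a human-usable description rather than a dangling pair of OIDs.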
Re: [HACKERS] Writable foreign tables: how to identify rows
On Wed, Mar 6, 2013 at 12:35 PM, Pavan Deolasee pavan.deola...@gmail.com wrote: +1 for adding a new system attribute. We did something similar in Postgres-XC, though the problem there was much simpler because we always knew that the remote FDW is a Postgres instance running the same version. So we added a new system column node_id which does not get any disk storage, but gets set during execution time depending on which node the tuple belongs to. The ctid system column of course is set to the remote ctid. Just thinking aloud... This could also be merged into the XC code and reduce the XC code footprint on PG core. -- Michael
Re: [HACKERS] Parameterized paths vs index clauses extracted from OR clauses
Robert Haas robertmh...@gmail.com writes: On Tue, Mar 5, 2013 at 3:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: If foo OR bar is useful as an indexqual condition in the inner scan, that's one thing. But if it isn't, the cycles expended to check it in the inner scan are possibly wasted, because we'll still have to check the full original OR clause later. It's possible that the filter condition removes enough rows from the inner scan's result to justify the redundant checks, but it's at least as possible that it doesn't. Yeah, that's pretty unappealing. It probably doesn't matter much if foo is just a column reference, but what if it's an expensive function? For that matter, what if it's a volatile function that we can't execute twice without changing the results? Well, *that* worry at least is a nonissue: if the clause contains volatile functions then it's not a candidate to be an indexqual anyway. The code that pulls out these simplified OR clauses is only looking for clauses that can be shown to match existing indexes, so it won't pick anything like that. But expensive functions could be a hazard. (Hm, maybe what we need is a marker for "enforce this clause only if you feel like it"?) Not sure I get the parenthesized bit. I was thinking that we'd extract the simplified clause and then mark it as something to be used only if it can be used as an indexqual. However, on second thought that still leaves us with the problem that some parameterized paths yield more rows than others. Maybe that assumption simply has to go ... regards, tom lane
Re: [HACKERS] Writable foreign tables: how to identify rows
2013/3/6 Tom Lane t...@sss.pgh.pa.us: One of the core problems for a writable-foreign-tables feature is how to identify a previously-fetched row for UPDATE or DELETE actions. In an ordinary Postgres table, we use the ctid system column for that, but a remote table doesn't necessarily have such a thing. (Oracle has a rowid that acts a lot like our ctids, but I don't believe the concept is common in other RDBMSes.) Without any magic row identifier such as these, I suppose an FDW would need to insist on knowing the primary key for the remote table, so that it could update based on the values of the pkey column(s). The current writable-foreign-tables patch goes to great lengths to try to cater for magic row identifiers of unspecified data types; which is something I encouraged in the beginning, but now that I see the results I think the messiness-to-usefulness quotient is awfully low. Basically what it's doing is hacking the TupleDesc associated with a foreign table so that the descriptor (sometimes) includes extra columns. I don't think that's workable at all --- there are too many places that assume that relation TupleDescs match up with what's in the catalogs. I think if we're going to support magic row identifiers, they need to be actual system columns, complete with negative attnums and entries in pg_attribute. This would require FDWs to commit to the data type of a magic row identifier at foreign-table creation time, but that doesn't seem like much of a restriction: probably any one FDW would have only one possible way to handle a magic identifier. So I'm envisioning adding an FDW callback function that gets called at table creation and returns an indication of which system columns the foreign table should have, and then we actually make pg_attribute entries for those columns. For postgres_fdw, that would really be enough, since it could just cause a ctid column to be created with the usual definition. 
Then it could put the remote ctid into the usual t_self field in returned tuples. Supporting magic identifiers that aren't of the TID data type is considerably harder, mainly because it's not clear how heap_getsysattr() could know how to fetch the column value. I have some rough ideas about that, but I suggest that we might want to punt on that extension for the time being. Thoughts? Sorry, -1 for me. The proposed design tried to kill two birds with one stone. The reason why the new GetForeignRelWidth() can reserve multiple slots for pseudo-columns is that it intends to push down complex calculations in the target list to the remote computing resource. For example, please assume the third target entry below is a very complex calculation, so it consumes many CPU cycles. SELECT x, y, (x-y)^2 FROM remote_table; The FDW driver will be able to replace it with just a reference to a pseudo-column that holds the calculation result of (x-y)^2 on the remote side. Since it does not perform the calculation on the local side, it can assist CPU off-loading. If we have a particular system attribute to carry the remote rowid at foreign-table declaration time, we will lose the opportunity for this kind of optimization. When we handle a query including sub-queries, it generates its TupleDesc at run time without problems. I don't think we have any special reason why we cannot admit foreign tables to adopt an alternative TupleDesc constructed at run time. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp
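KaiGai's pushdown argument can be restated concretely: if the FDW may reserve result slots at run time, an expensive target expression such as (x-y)^2 can be replaced by a reference to a pseudo-column computed remotely. A toy sketch of that split (hypothetical helper, not the proposed GetForeignRelWidth() API):

```python
def push_down_targets(targets, is_expensive):
    """Split a target list into entries fetched as-is and expressions
    delegated to the remote side as pseudo-columns: the local plan
    then just references the alias instead of re-computing the value."""
    remote_exprs, fetch_list = [], []
    for t in targets:
        if is_expensive(t):
            alias = f"pseudo_{len(remote_exprs)}"
            remote_exprs.append(f"{t} AS {alias}")
            fetch_list.append(alias)
        else:
            fetch_list.append(t)
    return fetch_list, remote_exprs

# SELECT x, y, (x-y)^2 FROM remote_table: the squaring stays remote.
fetch, remote = push_down_targets(["x", "y", "(x-y)^2"],
                                  lambda t: "^" in t)
print(fetch)  # ['x', 'y', 'pseudo_0']
```

Tom's counter-proposal of fixed system columns declared at table creation time cannot express this, because the set of pseudo-columns here depends on the query, not the table.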
Re: [HACKERS] Request for vote to move forward with recovery.conf overhaul
Thanks for taking the time to type a complete summary of the situation. That really helps. On Mon, Mar 4, 2013 at 11:25 AM, Greg Smith g...@2ndquadrant.com wrote: On 1/23/13 6:36 AM, Michael Paquier wrote: The only problem I see is if the same parameter is defined in recovery.conf and postgresql.conf, is the priority given to recovery.conf? This sort of thing is what dragged down the past work on this. I don't really agree with the idea this thread is based on, that it was backward compatibility that kept this from being finished last year. I put a good bit of time into a proposal that a few people seemed happy with; all its ideas seem to have washed away. That balanced a couple of goals: -Allow a pg_ctl standby and pg_ctl recover command that work similarly to promote. This should slim down the work needed for the first replication setup people do. -Make it obvious when people try to use recovery.conf that it's not supported anymore. -Provide a migration path for tool authors strictly in the form of some documentation and error message hints. That was it as far as concessions to backward compatibility. The wrap-up I did started at http://www.postgresql.org/message-id/4ee91248.8010...@2ndquadrant.com and only had a few responses/controversy from there. Robert wrote a good summary: 1. Get rid of recovery.conf - error out if it is seen 2. For each parameter that was previously a recovery.conf parameter, make it a GUC 3. For the parameter that was "does recovery.conf exist?", replace it with "does standby.enabled exist?". Agreed on that. I thought this stopped from there because no one went back to clean up Fujii's submission, which Simon and Michael have now put more time into. There is not much distance between it and the last update Michael sent. Just to be picky.
I recommend using the version dated the 23rd of January as a work base if we definitely get rid of recovery.conf; the patch file is called 20130123_recovery_unite.patch. The second patch I sent on the 27th tried to support both recovery.conf and postgresql.conf, but this finishes with a lot of duplicated code, as two sets of functions are necessary to deparse options for both postgresql.conf and recovery.conf... Here's the detailed notes from my original proposal, with updates to incorporate the main feedback I got then; note that much of this is documentation rather than code: -Creating a standby.enabled file puts the system into recovery mode. That feature needs to save some state, and making those decisions based on existence of a file is already a thing we do. Rather than emulating the rename to recovery.done that happens now, the server can just delete it, to keep from incorrectly returning to a state it's exited. A UI along the lines of the promote one, allowing pg_ctl standby, should fall out of here. This file can be relocated to the config directory, similarly to how the include directory looks for things. There was a concern that this would require write permissions that don't exist on systems that relocate configs, like Debian/Ubuntu. That doesn't look to be a real issue though. Here's a random Debian server showing the postgres user can write to all of those:

$ ls -ld /etc/postgresql
drwxr-xr-x 4 root root 4096 Jun 29 2012 /etc/postgresql
$ ls -ld /etc/postgresql/9.1
drwxr-xr-x 3 postgres postgres 4096 Jul 1 2012 /etc/postgresql/9.1
$ ls -ld /etc/postgresql/9.1/main
drwxr-xr-x 2 postgres postgres 4096 Mar 3 11:00 /etc/postgresql/9.1/main

-A similar recovery.enabled file turns on PITR recovery. -It should be possible to copy a postgresql.conf file from master to standby and just use it. For example: --standby_mode = on: Ignored unless you've started the server with standby.enabled, won't bother the master if you include it.
--primary_conninfo: This will look funny on the master showing it connecting to itself, but it will get ignored there too. -If startup finds a recovery.conf file where it used to live at, abort--someone is expecting the old behavior. Hint to RTFM or include a short migration guide right on the spot. That can have a nice section about how you might use the various postgresql.conf include* features if they want to continue managing those files separately. Example: rename what you used to make recovery.conf as replication.conf and use include_if_exists if you want to be able to rename it to recovery.done like before. Or drop it into a config/ directory (similarly to the proposal for SET PERSISTENT) where the rename to recovery.done will make it then skipped. (Only files ending with .conf are processed by include_dir) -Tools such as pgpool that want to write a simple configuration file, only touching the things that used to go into recovery.conf, can tell people to do the same trick. End their postgresql.conf with a call to
Re: [HACKERS] 9.2.3 crashes during archive recovery
Hi, I suppose the attached patch is close to the solution. I think that this is the original intention of Heikki's patch. I noticed that archive recovery will be turned on in next_record_is_invalid thanks to your patch. On the other hand, your patch fixes that point but ReadRecord runs on the false page data. However, since the wrong record on the false page can be identified as broken, it should be an undesirable behavior. All we should do to update minRecoveryPoint as needed when XLogPageRead fails in ReadRecord is to simply jump to next_record_is_invalid if archive recovery is requested but we are still doing crash recovery. Your modification in readTimeLineHistory and my modifications in XLogPageRead seem unnecessary for the objective in this thread, so they are removed. -- Kyotaro Horiguchi NTT Open Source Software Center

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92adc4e..28d6f2e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4010,7 +4010,15 @@ ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
 retry:
 	/* Read the page containing the record */
 	if (!XLogPageRead(RecPtr, emode, fetching_ckpt, randAccess))
+	{
+		/*
+		 * If archive recovery was requested when crash recovery failed, go to
+		 * the label next_record_is_invalid to switch to archive recovery.
+		 */
+		if (!InArchiveRecovery && ArchiveRecoveryRequested)
+			goto next_record_is_invalid;
 		return NULL;
+	}
 
 	pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf);
 	targetRecOff = RecPtr->xrecoff % XLOG_BLCKSZ;
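The control flow the patch adds can be summarized: when XLogPageRead fails while we are still in crash recovery but archive recovery has been requested, fall through to the archive-recovery path instead of giving up. A schematic of just that decision (the real logic lives in ReadRecord in xlog.c; the names below mirror the patch's variables):

```python
def read_record_action(page_read_ok, in_archive_recovery, archive_requested):
    """Model ReadRecord's new behavior when XLogPageRead fails: if we
    are still in crash recovery but archive recovery was requested,
    switch over (the goto next_record_is_invalid in the patch) rather
    than returning NULL and stopping recovery."""
    if page_read_ok:
        return "return_record"
    if not in_archive_recovery and archive_requested:
        return "switch_to_archive_recovery"
    return "return_null"
```

The point of the fix is the middle branch: before the patch, a failed page read during crash recovery ended recovery even when the requested archive recovery could have continued past the bad page.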