Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix
On Fri, Sep 20, 2013 at 12:48 AM, Peter Eisentraut <pete...@gmx.net> wrote:
> Something is weird in your latest patch.  The header is:
>
> diff -u -r b/postgresql/doc/src/sgml/config.sgml a/postgresql/doc/src/sgml/config.sgml
> --- b/postgresql/doc/src/sgml/config.sgml	2013-09-09 23:40:52.356371191 +1200
> +++ a/postgresql/doc/src/sgml/config.sgml	2013-09-19 22:13:26.236681468 +1200
>
> This doesn't apply with patch -p1 or -p0.  (You need -p2.)  Your previous
> patch in this thread seemed OK.  You might want to check what you did there.

I moved the source around and I've patched against it again.  New patch
attached.

David

Attachment: log_line_formatting_v0.3.patch
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Sep 19, 2013 at 11:32 AM, Fujii Masao-2 [via PostgreSQL] <ml-node+s1045698n5771565...@n5.nabble.com> wrote:
> On Thu, Sep 19, 2013 at 2:25 PM, samthakur74 [hidden email] wrote:
>
> I got the segmentation fault when I tested the case where the
> least-executed query statistics is discarded, i.e., when I executed
> different queries more than pg_stat_statements.max times.  I guess that
> the patch might have a bug.

Thanks, will try to fix it.

> pg_stat_statements--1.1.sql should be removed.

Yes, will do that.

> +     <entry><structfield>queryid</structfield></entry>
> +     <entry><type>bigint</type></entry>
> +     <entry></entry>
> +     <entry>Unique value of each representative statement for the current
> +      statistics session.  This value will change for each new statistics
> +      session.</entry>
>
> What does "statistics session" mean?

The time period when statistics are gathered by the statistics collector
without being reset.  So the statistics session continues across normal
shutdowns, but in case of abnormal situations like crashes, format
upgrades, or statistics being reset for any other reason, a new time
period of statistics collection starts, i.e. a new statistics session.
The queryid value generation is linked to the statistics session to
emphasize the fact that in case of crashes, format upgrades or any other
situation of statistics reset, the queryid for the same queries will also
change.

> I'm afraid that this behavior narrows down the use case of queryid very
> much.  For example, since the queryid of the same query would not be the
> same in the master and the standby servers, we cannot associate those two
> statistics by using the queryid.  The queryid changes through the crash
> recovery, so we cannot associate the query statistics generated before
> the crash with that generated after the crash recovery even if the query
> is the same.

Yes, these are limitations in this approach.  The other approaches
suggested included:
1. Expose the query id hash value as is, in the view, but document the
   fact that it will be unstable between releases.
2. Expose the query id hash value via an undocumented function and let
   more expert users decide if they want to use it.
The approach of using the statistics session id to generate queryid is a
compromise between not exposing it at all and exposing it without warning
users that the hash value computed from the query tree is unstable between
releases.

> This is not directly related to the patch itself, but why does the
> queryid need to be calculated based on also the statistics session?

If we expose the hash value of the query tree without using the statistics
session, it is possible that users might make the wrong assumption that
this value remains stable across version upgrades, when in reality it
depends on whether the version has made changes to the query tree
internals.  So to explicitly ensure that users do not make this wrong
assumption, hash value generation uses the statistics session id, which is
newly created under the situations described above.  Will update the
documentation to clearly explain the term "statistics session" in this
context.

> Yep, that's helpful!

Regards,
Sameer
Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix
David Rowley wrote:
> I moved the source around and I've patched against it again.  New patch
> attached.

Thank you, marked as ready for committer.

Yours,
Laurenz Albe
Re: [HACKERS] Minor inheritance/check bug: Inconsistent behavior
On Thu, Sep 19, 2013 at 10:28 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>> Marking this as Ready for committer.
>>
>> Thanks a ton for reviewing the patch.
>
> I rewrote the comments for this patch and fixed the incorrect formatting
> in parse_relation.c.  It used spaces instead of tabs, which is why if you
> look at the patch file you'll see that the alignment looked off.  See new
> version, attached.

Thanks, attached version looks better.

> However, I have a few residual questions:
>
> 1. Does copy.c also need to be fixed?  I see that it also calls
> ExecConstraints(), and I don't see it setting tableOid anywhere.  We
> certainly want COPY and INSERT to behave the same way.

I had missed it, by confusing it with OID, as OID is set in COPY.  I think
along with COPY, it needs to be set in ATRewriteTable() as well, because
during rewrite also we check constraints.  I will send an updated patch
after point 2 is concluded.

> 2. Should constraints also allow access to the oid column?  Right now
> you can do that but it doesn't seem to work, e.g.:
>
> rhaas=# create table foo (a int, check (oid::integer % 2 = 0)) with oids;
> CREATE TABLE

I have tried various combinations; it is giving an error at my end:

postgres=# create table tbl_with_oid(c1 int) WITH OIDS;
CREATE TABLE
postgres=# alter table tbl_with_oid add check (oid > 16403);
ERROR:  system column oid reference in check constraint is invalid
postgres=# alter table tbl_with_oid add check (oid = 0);
ERROR:  system column oid reference in check constraint is invalid
postgres=# alter table tbl_with_oid add check (oid::integer % 2 = 0);
ERROR:  system column oid reference in check constraint is invalid
postgres=# create table foo (a int, check (oid::integer % 2 = 0)) with oids;
ERROR:  system column oid reference in check constraint is invalid

> Just prohibiting that might be fine; it doesn't seem that useful anyway.

Currently only tableOid is allowed; other system columns will error out
due to the below check:

+	/* In constraint check, no system column is allowed except tableOid */
+	if (pstate->p_expr_kind == EXPR_KIND_CHECK_CONSTRAINT &&
+		attnum < InvalidAttrNumber && attnum != TableOidAttributeNumber)

> But if we're setting the table OID, maybe we should set the OID too, and
> then just allow this.

It can be done for OID as well.  I don't have any strong reason for either
doing or not doing it; if you think it can be of use then we can add it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Minor inheritance/check bug: Inconsistent behavior
On Fri, Sep 20, 2013 at 12:57 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Thu, Sep 19, 2013 at 10:28 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>>> Marking this as Ready for committer.
>>>
>>> Thanks a ton for reviewing the patch.
>>
>> I rewrote the comments for this patch and fixed the incorrect formatting
>> in parse_relation.c.  It used spaces instead of tabs, which is why if
>> you look at the patch file you'll see that the alignment looked off.
>> See new version, attached.
>
> Thanks, attached version looks better.
>
>> However, I have a few residual questions:
>>
>> 1. Does copy.c also need to be fixed?  I see that it also calls
>> ExecConstraints(), and I don't see it setting tableOid anywhere.  We
>> certainly want COPY and INSERT to behave the same way.
>
> I had missed it, by confusing it with OID, as OID is set in COPY.  I
> think along with COPY, it needs to be set in ATRewriteTable() as well,
> because during rewrite also we check constraints.  I will send an updated
> patch after point 2 is concluded.
>
>> 2. Should constraints also allow access to the oid column?  Right now
>> you can do that but it doesn't seem to work, e.g.:
>>
>> rhaas=# create table foo (a int, check (oid::integer % 2 = 0)) with oids;
>> CREATE TABLE
>
> I have tried various combinations; it is giving an error at my end:
>
> postgres=# create table tbl_with_oid(c1 int) WITH OIDS;
> CREATE TABLE
> postgres=# alter table tbl_with_oid add check (oid > 16403);
> ERROR:  system column oid reference in check constraint is invalid
> postgres=# alter table tbl_with_oid add check (oid = 0);
> ERROR:  system column oid reference in check constraint is invalid
> postgres=# alter table tbl_with_oid add check (oid::integer % 2 = 0);
> ERROR:  system column oid reference in check constraint is invalid
> postgres=# create table foo (a int, check (oid::integer % 2 = 0)) with oids;
> ERROR:  system column oid reference in check constraint is invalid

I am sorry, I just realized after pressing the send button that you want
to say the above point without considering the above patch.

>> Just prohibiting that might be fine; it doesn't seem that useful anyway.
>
> Currently only tableOid is allowed; other system columns will error out
> due to the below check:
>
> +	/* In constraint check, no system column is allowed except tableOid */
> +	if (pstate->p_expr_kind == EXPR_KIND_CHECK_CONSTRAINT &&
> +		attnum < InvalidAttrNumber && attnum != TableOidAttributeNumber)
>
>> But if we're setting the table OID, maybe we should set the OID too, and
>> then just allow this.
>
> It can be done for OID as well.  I don't have any strong reason for
> either doing or not doing it; if you think it can be of use then we can
> add it.

One more thing: I think it will be better to update the information in the
docs, probably in the CREATE TABLE page where CHECK constraints are
explained, and where DDL constraints are explained at this link:
http://www.postgresql.org/docs/devel/static/ddl-constraints.html

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
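For a concrete picture of the tableoid case being discussed, here is a
minimal sketch of the intended post-patch behavior (it assumes the patch,
which sets the tuple's tableOid before ExecConstraints() runs, is applied;
behavior on unpatched servers is the inconsistency described above):

    -- with inheritance, a tableoid CHECK pins rows to one specific relation
    CREATE TABLE cities (name text, population int);
    CREATE TABLE capitals (state char(2)) INHERITS (cities);

    ALTER TABLE capitals ADD CHECK (tableoid = 'capitals'::regclass);

    INSERT INTO capitals VALUES ('Madison', 233000, 'WI');  -- passes the check

The same constraint attached to a different table in the hierarchy would
reject its rows, which is exactly why tableOid must be filled in for COPY
and ATRewriteTable() as well as for INSERT.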
Re: [HACKERS] Dump/Reload broken with relocatable extensions
On 09/19/2013 11:40 PM, Dimitri Fontaine wrote:
> Vik Fearing <vik.fear...@dalibo.com> writes:
>> I don't know if this has been discussed before, a cursory search of the
>> archives didn't turn up anything interesting.  I perhaps didn't put in
>> the right keywords.
>
> For others not to spend too much time on this: it seems like a problem
> with the extension not abiding by the rules about its relocatable
> property and the @extschema@ thingy.
>
> http://www.postgresql.org/docs/9.3/static/extend-extensions.html#AEN54999

I can't get this to work.  If I modify my function to be

CREATE FUNCTION no_accents(text) RETURNS boolean
AS 'select $1 = unaccent($1);'
LANGUAGE sql STABLE STRICT
SET search_path = '@extschema@';

then I get

d=# create extension unaccent;
ERROR:  function unaccent(text) does not exist
LINE 1: select $1 = unaccent($1);
                    ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
QUERY:  select $1 = unaccent($1);

If I modify it to be

CREATE FUNCTION no_accents(text) RETURNS boolean
AS 'select $1 = unaccent($1);'
LANGUAGE sql STABLE STRICT;
ALTER FUNCTION no_accents(text) SET search_path = '@extschema@';

then I get the same restore problem I originally described.

What am I doing wrong?

-- 
Vik
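One workaround while this gets sorted out, shown only as a sketch and
assuming unaccent was installed into the public schema, is to
schema-qualify the call so the function body no longer depends on
search_path at all (at the cost of the relocatability this thread is
about):

    CREATE FUNCTION no_accents(text) RETURNS boolean
    AS 'select $1 = public.unaccent($1);'
    LANGUAGE sql STABLE STRICT;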
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Fri, Sep 20, 2013 at 1:11 AM, Daniel Farina <dan...@fdr.io> wrote:
> I think the n-call underestimation propagation may not be quite precise
> for various detailed reasons (having to do with 'sticky' queries) and to
> make it precise is probably more work than it's worth.  And, on more
> reflection, I'm also having a hard time imagining people intuiting that
> value usefully.  So, here's a version removing that.

I forgot about removal of the relevant SGML, amended here in v6.

Attachment: pg_stat_statements-identification-v6.patch
Re: [HACKERS] Assertions in PL/PgSQL
On 16 September 2013 03:43, Marko Tiikkaja <ma...@joh.to> wrote:
> On 2013-09-15 23:23, Jaime Casanova wrote:
>> If using ASSERT as keyword is not acceptable, not that i agree but in
>> case.  What about using RAISE EXCEPTION WHEN (condition)

I was going to suggest the same idea: extend the RAISE syntax without
introducing new keywords.  Something like:

    RAISE assert_exception WHEN assert_condition ;

where assert_exception is a new exception label which maps to a new
internal sqlstate.

> I think it would be extremely surprising if a command like that got
> optimized away based on a GUC, so I don't think that would be a good idea.

In pl_gram.y, in the rule stmt_raise, determine that this RAISE is for
ASSERT, and then return NULL if plpgsql_curr_compile->enable_assertions
is false.  Isn't this possible?

> Regards,
> Marko Tiikkaja
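For comparison, the behavior under discussion can be approximated in
today's PL/pgSQL with an explicit test; a minimal sketch (ASSERT,
enable_assertions, and the RAISE ... WHEN form above are proposals in this
thread, not existing syntax):

    CREATE FUNCTION transfer(amount numeric) RETURNS void AS $$
    BEGIN
        -- hand-rolled assertion: what ASSERT / RAISE ... WHEN would replace
        IF amount <= 0 THEN
            RAISE EXCEPTION 'assertion failed: amount must be positive';
        END IF;
        -- ... rest of the function ...
    END;
    $$ LANGUAGE plpgsql;

The GUC question is precisely whether such a check could be compiled out
wholesale without surprising anyone.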
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
On Fri, Sep 20, 2013 at 08:58:53AM +0900, Tatsuo Ishii wrote:
> For example, CREATE TABLE t1(t NCHAR(10)) will succeed if NCHAR is UTF-8
> and database encoding is UTF-8.  Even succeed if NCHAR is SHIFT-JIS and
> database encoding is UTF-8 because there is a conversion between UTF-8
> and SHIFT-JIS.  However will not succeed if NCHAR is SHIFT-JIS and
> database encoding is ISO-8859-1 because there's no conversion between
> them.

As far as I can tell the whole reason for introducing NCHAR is to support
SHIFT-JIS; there hasn't been call for any other encodings, that I can
remember anyway.

So rather than this whole NCHAR thing, why not just add a type sjistext,
and a few type casts, and call it a day...

Have a nice day,
-- 
Martijn van Oosterhout   <klep...@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he
> does not attach much importance to his own thoughts.
  -- Arthur Schopenhauer
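The encoding-conversion machinery such a type would lean on is already
exposed at the SQL level; a quick illustration using the built-in SJIS
conversions (any ASCII sample string works):

    -- round-trip text through SHIFT_JIS with the built-in conversion
    -- routines; an NCHAR or sjistext type would sit on top of the same
    -- conversion infrastructure
    SELECT convert_to('some text', 'SJIS');                        -- text -> bytea
    SELECT convert_from(convert_to('some text', 'SJIS'), 'SJIS');  -- and back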
Re: [HACKERS] Patch for fail-back without fresh backup
> Attached patch combines documentation patch and source-code patch.

I have had a stab at reviewing the documentation.  Have a look.

--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1749,6 +1749,50 @@ include 'filename'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-synchronous-transfer" xreflabel="synchronous_transfer">
+      <term><varname>synchronous_transfer</varname> (<type>enum</type>)</term>
+      <indexterm>
+       <primary><varname>synchronous_transfer</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        This parameter controls the synchronous nature of WAL transfer and
+        maintains file system level consistency between master server and
+        standby server. It specifies whether master server will wait for file
+        system level change (for example : modifying data page) before
+        the corresponding WAL records are replicated to the standby server.
+       </para>
+       <para>
+        Valid values are <literal>commit</>, <literal>data_flush</> and
+        <literal>all</>. The default value is <literal>commit</>, meaning
+        that master will only wait for transaction commits, this is equivalent
+        to turning off <literal>synchronous_transfer</> parameter and standby
+        server will behave as a <quote>synchronous standby</> in
+        Streaming Replication. For value <literal>data_flush</>, master will
+        wait only for data page modifications but not for transaction
+        commits, hence the standby server will act as <quote>asynchronous
+        failback safe standby</>. For value <literal>all</>, master will wait
+        for data page modifications as well as for transaction commits and
+        resultant standby server will act as <quote>synchronous failback safe
+        standby</>. The wait is on background activities and hence will not
+        create performance overhead. To configure synchronous failback safe
+        standby, <xref linkend="guc-synchronous-standby-names"> should be set.
+       </para>
+      </listitem>
+     </varlistentry>

@@ -2258,14 +2302,25 @@ include 'filename'
       </indexterm>
       <listitem>
       <para>
-        Specifies a comma-separated list of standby names that can support
-        <firstterm>synchronous replication</>, as described in
-        <xref linkend="synchronous-replication">.
-        At any one time there will be at most one active synchronous standby;
-        transactions waiting for commit will be allowed to proceed after
-        this standby server confirms receipt of their data.
-        The synchronous standby will be the first standby named in this list
-        that is both currently connected and streaming data in real-time
+        Specifies a comma-separated list of standby names. If this parameter
+        is set then standby will behave as synchronous standby in replication,
+        as described in <xref linkend="synchronous-replication"> or synchronous
+        failback safe standby, as described in <xref linkend="failback-safe">.
+        At any time there will be at most one active standby; when standby is
+        synchronous standby in replication, transactions waiting for commit
+        will be allowed to proceed after this standby server confirms receipt
+        of their data. But when standby is synchronous failback safe standby,
+        data page modifications as well as transaction commits will be allowed
+        to proceed only after this standby server confirms receipt of their data.
+        If this parameter is set to empty value and
+        <xref linkend="guc-synchronous-transfer"> is set to <literal>data_flush</>
+        then standby is called as asynchronous failback safe standby and only
+        data page modifications will wait before corresponding WAL record is
+        replicated to standby.
+       </para>
+       <para>
+        Synchronous standby in replication will be the first standby named in
+        this list that is both currently connected and streaming data in real-time
         (as shown by a state of <literal>streaming</literal> in the
         <link linkend="monitoring-stats-views-table">
         <literal>pg_stat_replication</></link> view).

--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
+
+  <sect2 id="failback-safe">
+   <title>Setting up failback safe standby</title>
+
+   <indexterm zone="high-availability">
+    <primary>Setting up failback safe standby</primary>
+   </indexterm>
+
+   <para>
+    PostgreSQL streaming replication offers durability, but if the master
+    crashes and a particular WAL record is unable to reach to standby server,
+    then that WAL record is present on master server but not on standby
+    server. In such a case master is ahead of standby server in term of WAL
+    records and data in database. This leads to file-system level
+    inconsistency between master and standby server. For example a heap page
+    update on the master might not have been reflected on standby when
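For orientation, a failback-safe synchronous setup under the proposed
patch would look something like this in postgresql.conf (a sketch
assembled from the parameter descriptions above, not tested
configuration):

    synchronous_transfer = all              # wait for data-page flushes and commits
    synchronous_standby_names = 'standby1'  # required for the synchronous variants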
Re: [HACKERS] Assertions in PL/PgSQL
On 9/20/13 12:09 PM, Amit Khandekar wrote:
> On 16 September 2013 03:43, Marko Tiikkaja <ma...@joh.to> wrote:
>> I think it would be extremely surprising if a command like that got
>> optimized away based on a GUC, so I don't think that would be a good
>> idea.
>
> In pl_gram.y, in the rule stmt_raise, determine that this RAISE is for
> ASSERT, and then return NULL if plpgsql_curr_compile->enable_assertions
> is false.  Isn't this possible?

Of course it's possible.  But I, as a PostgreSQL user writing PL/PgSQL
code, would be extremely surprised if this new cool option to RAISE didn't
work for some reason.  If we use ASSERT the situation is different; most
people will realize it's a new command and works differently from RAISE.

Regards,
Marko Tiikkaja
Re: [HACKERS] record identical operator - Review
On Thu, Sep 19, 2013 at 06:31:38PM -0400, Steve Singer wrote:
> I think there is agreement that better (as in more obscure) operators
> than === and !== need to be picked and we need to find a place in the
> user-docs to warn users of the behaviour of these operators.  Hannu has
> proposed
>
> *==*  binary equal, surely very equal by any other definition as well
> !==?  maybe not equal -- binary inequal, may still be equal by other
>       comparison methods

It's a pity operators must be non-alpha and can't be named.  Something
like:

SELECT foo OPERATOR(byte_equivalent) bar;

is simultaneously obscure, yet clear.

Have a nice day,
-- 
Martijn van Oosterhout   <klep...@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he
> does not attach much importance to his own thoughts.
  -- Arthur Schopenhauer
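For what it's worth, the OPERATOR() construct already exists, but it takes
a (possibly schema-qualified) operator symbol rather than an alphabetic
name; a quick sketch with the stock record equality operator:

    -- existing syntax: qualify an operator symbol, here built-in equality
    SELECT ROW(1, 2) OPERATOR(pg_catalog.=) ROW(1, 2);  -- returns true

So a new byte-wise operator could be written unambiguously as
OPERATOR(pg_catalog.xxx) for whatever symbol xxx is eventually chosen, but
an alphabetic name like byte_equivalent is not possible there.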
Re: [HACKERS] Assertions in PL/PgSQL
On 9/19/13 9:09 PM, Robert Haas wrote:
> Personally, I'm pretty skeptical about the value of adding dedicated
> syntax for this.  I mean, I'll be the first to admit that PL/pgsql is a
> clunky and awkward language.  But somebody's always proposing something
> that they think will make it better, and I feel somehow that if we
> accept all of those proposals at face value, we'll just end up with a
> mess.  So IMHO the bar for adding new syntax to PL/pgsql should be
> reasonably high.  YMMV, of course, and probably does.

I see where you're coming from, and agree, to an extent.

> The issue of how this is spelled is somewhat secondary for me.  I think
> ASSERT is probably as good as anything.  But let's not kid ourselves:
> even reserving this word only in PL/pgsql WILL break things for some
> users, and there are LOTS of people out there with LOTS of procedural
> code.  Every tiny backward-incompatibility reduces by just a little bit
> the percentage of those people who can upgrade and increases the delay
> before they do.  This is an area where past experience has made me
> quite wary.

The thing is, what this means is that to add a new feature to the
language, you have to make the syntax so damn ugly that no one wants to
use it (see row_count, for example) or you will break some poor user's
function.  And now we got all this cool functionality which nobody wants
to use, and the language itself actually gets progressively worse.  All
this is starting to sound like it's already too late to make PL/PgSQL
better, and we should just start afresh.

Regards,
Marko Tiikkaja
Re: [HACKERS] psql tab completion for updatable foreign tables
>>> Okay, are you adding this to the september commitfest?
>>
>> OK, I've done that.  I think that it's too late for 9.3.

+1 for the idea.

I have tested the patch and got surprising results with CentOS.  The patch
is working fine on CentOS 6.2 and RHEL 6.3, but it is giving a problem on
CentOS 6.3 (tab completes for local tables but not for foreign tables).

I have used the following script.  Two postgres servers are running, using
ports 5432 and 5433.  The server with port 5432 has postgres_fdw installed
and will interact with the remote server running under port 5433.

psql -p 5433 -c "CREATE TABLE aa_remote (a int, b int)" postgres

postgres=# CREATE EXTENSION postgres_fdw;
postgres=# CREATE SERVER postgres_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost', port '5433', dbname 'postgres');
postgres=# CREATE USER MAPPING FOR PUBLIC SERVER postgres_server OPTIONS (password '');
postgres=# CREATE FOREIGN TABLE aa_foreign (a int, b int) SERVER postgres_server OPTIONS (table_name 'aa_remote');
postgres=# INSERT INTO aa_foreign VALUES (1,2);

But while doing any operation on aa_foreign, tab does not complete on
CentOS 6.3 (it completes for local tables but not for foreign tables).
Is that a problem?

Regards,
Samrat Revagade
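One way to tell whether the catalogs or the client binary is at fault on
the misbehaving box is to run, by hand, the sort of query psql's tab
completion issues (this is only an approximation, not the exact completion
query from the patch):

    -- foreign tables have relkind 'f'; if this returns aa_foreign, the
    -- catalog side is fine and the problem is in psql's completion logic
    SELECT c.relname
    FROM pg_catalog.pg_class c
    WHERE c.relkind IN ('r', 'f')
      AND c.relname LIKE 'aa%';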
Re: [HACKERS] logical changeset generation v6
Hi,

On 2013-09-19 12:05:35 -0400, Robert Haas wrote:
> No question.  I'm not saying that that optimization shouldn't go in right
> after the main patch does, but IMHO right now there are too many things
> going in the 0004 patch to discuss them all simultaneously.  I'd like to
> find a way of splitting this up that will let us block-and-tackle
> individual pieces of it, even if we end up committing them all one right
> after the other.

Fine with me.  I was criticized for splitting up stuff too much before ;).
Expect a finer-grained series.

> But that raises an interesting question: why is the overhead so bad?  I
> mean, this shouldn't be any worse than having a series of transactions
> running concurrently with pgbench that take a snapshot and hold it for as
> long as it takes the decoding process to decode the most-recently
> committed transaction.

Pgbench really slows down scarily if there are some slightly longer
running transactions around...

> Is the issue here that we can't advance xmin until we've fsync'd the
> fruits of decoding down to disk?

Basically yes.  We only advance the xmin of the slot so far that we could
still build a valid snapshot to decode the first transaction not confirmed
to have been synced to disk by the client.

> If so, that's mighty painful.  But we'd really only need to hold back
> xmin in that situation when some catalog change has occurred meanwhile,
> which for pgbench means never.  So something seems fishy here.

It's less simple than that.  We need to protect against concurrent DDL
producing deleted rows that we will still need.

> We need HeapTupleSatisfiesVacuum() to return HEAPTUPLE_RECENTLY_DEAD not
> HEAPTUPLE_DEAD for such rows, right?  The way to do that is to guarantee
> that if TransactionIdDidCommit(xmax) is true, TransactionIdPrecedes(xmax,
> OldestXmin) is also true.  So, we need to peg OldestXmin (as passed to
> HTSV) to the xid of the oldest transaction we're still decoding.

I am not sure how you could do that if somewhere in the future DDL has
started, since there's no interlock preventing anyone from doing so.

>>> - It still bothers me that we're going to have mandatory slots for
>>> logical replication and no slots for physical replication.
>>
>> If people think this needs to be a general facility from the start, I
>> can be convinced that way, but I think there's so much to discuss
>> around the semantics and different usecases that I'd much prefer to
>> discuss that later.
>
> I'm worried that if we don't know how the physical replication slots are
> going to work, they'll end up being randomly different from the logical
> replication slots, and that'll be an API wart which we'll have a hard
> time getting rid of later.

Hm.  I actually think that minus some s/Logical//g and a mv there won't be
much need to change on the slot interface itself.  What we need for
physical rep is basically to
a) store the position up to where the primary has fsynced the WAL,
b) store the xmin horizon the standby currently has.
Sure, we can store more stats (most of pg_stat_replication, perhaps some
more) but that's not functionally critical and not hard to extend.

The points I find daunting are the semantics, like:
* How do we control whether a standby is allowed to prevent WAL file
  removal?  What if archiving is configured?
* How do we control whether a standby is allowed to peg xmin?
* How long do we peg an xmin/WAL file removal if the standby is gone?
* How does the user interface look to remove a slot if a standby is gone?
* How do we decide/control which commands use a slot in which cases?

>>> - What is the purpose of (Un)SuspendDecodingSnapshots?  It seems that
>>> should be explained somewhere.  I have my doubts about how safe that
>>> is.
>>
>> I'll document the details if they aren't right now.  Consider what
>> happens if somebody does something like: VACUUM FULL pg_am;.  If we
>> were to build the relation descriptor of pg_am in an "historical
>> snapshot", as you coin it, we'd have the wrong filenode in there.  And
>> consequently any future lookups in pg_am will open a file that doesn't
>> exist.  That problem only exists for non-nailed relations that are
>> accessed during decoding.
>
> But if it's some user table flagged with the terribly-named
> treat_as_catalog_table flag, then they could have not only changed the
> relfilenode but also the tupledesc.  And then you can't just wave your
> hands at the problem.

Heh.  Well caught.  There's a comment about that somewhere... Those are
problematic; my plan so far is to throw my hands up and forbid ALTER
TABLEs that rewrite those.

>> I know you don't like that flag and especially its name.  I am open to
>> suggestions to a) rename it b) find a better solution.  I am pretty
>> sure a) is possible but I have severe doubts about any realistic b).
>
> Yes, I don't like it either.  I am not sure what to replace it with
> though.  It's easy enough to fit something in GetCatalogSnapshot() and I
> don't have a problem with that, but I am less happy with adding code like
> that to GetSnapshotData() for
[HACKERS] SSI freezing bug
Hi,

Prompted by Andres Freund's comments on my "Freezing without Write I/O"
patch, I realized that there's an existing bug in the way predicate
locking handles freezing (or rather, it doesn't handle it).

When a tuple is predicate-locked, the key of the lock is ctid+xmin.
However, when a tuple is frozen, its xmin is changed to FrozenXid.  That
effectively invalidates any predicate lock on the tuple, as checking for a
lock on the same tuple later won't find it as the xmin is different.

Attached is an isolationtester spec to demonstrate this.

- Heikki

# Test predicate locks with freezing
#
# This test has two serializable transactions.  Both select two rows
# from the table, and then update one of them.
# If these were serialized (run one at a time), the transaction that
# runs later would see one of the rows to be updated.
#
# Any overlap between the transactions must cause a serialization failure.
#
# Normally that works, but freezing a tuple screws up predicate locking.
# After freezing a tuple, any predicate locks on it are effectively lost.

setup
{
  CREATE TABLE test (i int PRIMARY KEY, t text);
  INSERT INTO test VALUES (5, 'apple'), (7, 'pear'), (11, 'banana');
}

teardown
{
  DROP TABLE test;
}

session "s1"
setup		{ BEGIN ISOLATION LEVEL SERIALIZABLE; set enable_seqscan=off; }
step "r1"	{ SELECT * FROM test WHERE i IN (5, 7) }
step "w1"	{ UPDATE test SET t = 'pear_xact1' WHERE i = 7 }
step "c1"	{ COMMIT; }

session "s2"
setup		{ BEGIN ISOLATION LEVEL SERIALIZABLE; set enable_seqscan=off; }
step "r2"	{ SELECT * FROM test WHERE i IN (5, 7) }
step "w2"	{ UPDATE test SET t = 'apple_xact2' WHERE i = 5 }
step "c2"	{ COMMIT; }

session "freezer"
step "freeze"	{ VACUUM FREEZE test; }

# This should cause a serialization error - and it does if you remove the
# "freeze" step.
permutation "r1" "r2" "freeze" "w1" "w2" "c1" "c2"
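The mechanism is easy to observe from plain SQL, since xmin is exposed as
a system column (exact pre-freeze xmin values will differ; after VACUUM
FREEZE it reads 2, i.e. FrozenTransactionId, on servers of this vintage):

    CREATE TABLE test (i int PRIMARY KEY, t text);
    INSERT INTO test VALUES (5, 'apple'), (7, 'pear'), (11, 'banana');

    SELECT ctid, xmin, i FROM test;   -- xmin = the inserting transaction's xid
    VACUUM FREEZE test;
    SELECT ctid, xmin, i FROM test;   -- xmin = 2: the ctid+xmin lock key changed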
Re: [HACKERS] dynamic shared memory
Hi,

On 2013-09-19 11:44:34 -0400, Robert Haas wrote:
> On Wed, Sep 18, 2013 at 1:42 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>> --- /dev/null
>> +++ b/src/backend/storage/ipc/dsm.c
>> +#define PG_DYNSHMEM_STATE_FILE		PG_DYNSHMEM_DIR "/state"
>> +#define PG_DYNSHMEM_NEW_STATE_FILE	PG_DYNSHMEM_DIR "/state.new"
>>
>> Hm, I guess you don't want to add it to global/ or so because of the
>> mmap implementation where you presumably scan the directory?
>
> Yes, and also because I thought this way would make it easier to teach
> things like pg_basebackup (or anybody's home-brew scripts) to just skip
> that directory completely.  Actually, I was wondering if we ought to have
> a directory under pgdata whose explicit charter it was to contain files
> that shouldn't be copied as part of a base backup.
> pg_do_not_back_this_up.

Wondered exactly about that as soon as you've mentioned pg_basebackup.
pg_local/?

>> +	/* Determine size for new control segment. */
>> +	maxitems = PG_DYNSHMEM_FIXED_SLOTS
>> +		+ PG_DYNSHMEM_SLOTS_PER_BACKEND * MaxBackends;
>>
>> It seems likely that MaxConnections would be sufficient?
>
> I think we could argue about the best way to set this until the cows
> come home, but I don't think it probably matters much at this point.  We
> can always change the formula later as we gain experience.  However, I
> don't have a principled reason for assuming that only user-connected
> backends will create dynamic shared memory segments.

Hm, yes.  I had MaxBackends down as MaxConnections + autovacuum stuff; but
max_worker_processes are in there now, so you're right that doesn't make
sense.

>> +/*
>> + * Read and parse the state file.
>>
>> Perhaps CRC32 the content?
>
> I don't see the point.  If the file contents are garbage that happens to
> look like a number, we'll go "oh, there isn't any such segment" or "oh,
> there is such a segment but it doesn't look like a control segment, so
> forget it".  There are a lot of things we really ought to be CRCing to
> avoid corruption risk, but I can't see how this is remotely one of them.

I was worried about a partially written file or one containing contents
from two different postmaster cycles, but it's actually far too small for
that...  I initially had thought you'd write the contents of the entire
shared control segment there, not just its id.

>> +	/* Create or truncate the file. */
>> +	statefd = open(PG_DYNSHMEM_NEW_STATE_FILE, O_RDWR|O_CREAT|O_TRUNC, 0600);
>>
>> Doesn't this need a | PG_BINARY?
>
> It's a text file.  Do we need PG_BINARY anyway?

I'd say yes.  Non-binary mode stuff on windows does stuff like
transforming LF <=> CRLF on reading/writing, which makes sizes not match
up and similar ugliness.  Imo there's little reason to use non-binary mode
for anything written for postgres' own consumption.

>> Why are you using open() and not BasicOpenFile or even OpenTransientFile?
>
> Because those don't work in the postmaster.

Oh, that's news to me.  Seems strange, especially for BasicOpenFile.

>> +	/* Write contents. */
>> +	snprintf(statebuf, PG_DYNSHMEM_STATE_BUFSIZ, "%lu\n",
>> +			 (unsigned long) dsm_control_handle);
>>
>> Why are we upcasting the length of dsm_control_handle here?  Also,
>> doesn't this need the usual UINT64_FORMAT thingy?
>
> dsm_handle is an alias for uint32.  Is that always exactly an unsigned
> int or can it sometimes be an unsigned long?  I thought the latter, so
> couldn't figure out how to write this portably without casting to a type
> that explicitly matched the format string.

That should always be an unsigned int on platforms we support.  Note that
we've printed TransactionIds that way (i.e. %u) for a long time and they
are a uint32 as well.

>> Not sure whether it's sensible to only LOG in these cases.  After all
>> there's something unexpected happening.  The robustness argument
>> doesn't count since we're already shutting down.
>
> I see no point in throwing an error.  The fact that we're having trouble
> cleaning up one dynamic shared memory segment doesn't mean we shouldn't
> try to clean up others, or that any remaining postmaster shutdown hooks
> shouldn't be executed.

Well, it means we'll do a regular shutdown instead of PANICing and *not*
writing a checkpoint.  If something has corrupted our state to the point
we cannot unregister shared memory we registered, something has gone
terribly wrong.  Quite possibly we've scribbled over our control
structures or such.  In that case it's not proper to do a normal shutdown;
we're quite possibly writing bad data.

>> +		ereport(ERROR,
>> +				(errcode(ERRCODE_INTERNAL_ERROR),
>> +				 errmsg("dynamic shared memory control segment is not valid")));
>>
>> Imo that's a PANIC or at the very least a FATAL.
>
> Sure, that's a tempting option, but it doesn't seem to serve any very
> necessary point.  There's no data corruption problem if we proceed
> here.  Most
Re: [HACKERS] SSI freezing bug
Hi,

On 2013-09-20 13:55:36 +0300, Heikki Linnakangas wrote:
> When a tuple is predicate-locked, the key of the lock is ctid+xmin.
> However, when a tuple is frozen, its xmin is changed to FrozenXid.  That
> effectively invalidates any predicate lock on the tuple, as checking for
> a lock on the same tuple later won't find it as the xmin is different.
>
> Attached is an isolationtester spec to demonstrate this.

Do you have any idea to fix that besides keeping the xmin horizon below
the lowest of the xids that are predicate locked?  Which seems nasty to
compute and is probably not trivial to fit into the procarray.c machinery.

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] SSI freezing bug
On 2013-09-20 13:53:04 +0200, Andres Freund wrote:
> On 2013-09-20 13:55:36 +0300, Heikki Linnakangas wrote:
>> When a tuple is predicate-locked, the key of the lock is ctid+xmin.
>> However, when a tuple is frozen, its xmin is changed to FrozenXid.
>> That effectively invalidates any predicate lock on the tuple, as
>> checking for a lock on the same tuple later won't find it as the xmin
>> is different.
>>
>> Attached is an isolationtester spec to demonstrate this.
>
> Do you have any idea to fix that besides keeping the xmin horizon below
> the lowest of the xids that are predicate locked?  Which seems nasty to
> compute and is probably not trivial to fit into the procarray.c
> machinery.

A better solution probably is to promote tuple-level locks, if they exist,
to a relation-level one upon freezing, I guess?

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [PATCH] pg_sleep(interval)
Here is a review of the pg_sleep(INTERVAL) patch version 1:

- patch applies cleanly on current head
- the function documentation is updated and clear
- the function definition relies on a SQL function to call the underlying
  pg_sleep(INT) function.  I'm fine with this, especially as if someone
  calls this function, he/she is not in a hurry:-)
- this one-liner implementation is straightforward
- the provided feature is simple, and seems useful
- configure/make/make check work ok

However:

- the new function is *not* tested anywhere!  I would suggest simply to
  replace some pg_sleep(int) instances by corresponding pg_sleep(interval)
  instances in the non-regression tests.

- some concerns have been raised that it breaks pg_sleep(TEXT), which
  currently works thanks to the implicit TEXT -> INT cast.  I would
  suggest to add pg_sleep(TEXT) explicitly, like:

  CREATE FUNCTION pg_sleep(TEXT) RETURNS VOID VOLATILE STRICT
  AS $$ select pg_sleep($1::INTEGER) $$ LANGUAGE SQL;

  That would be another one-liner, to update the documentation and to add
  some tests as well!  ISTM that providing pg_sleep(TEXT) cleanly resolves
  the upward-compatibility issue raised.

-- 
Fabien
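For reference, the patched function presumably reduces to a one-liner
along these lines (a sketch; the actual patch may spell it differently):

    CREATE FUNCTION pg_sleep(interval) RETURNS void
    VOLATILE STRICT
    AS $$ SELECT pg_sleep(extract(epoch FROM $1)) $$ LANGUAGE sql;

    SELECT pg_sleep(interval '1.5 seconds');  -- sleeps a second and a half

extract(epoch FROM ...) yields double precision, which is what the
existing pg_sleep already takes, so no new C code is needed.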
Re: [HACKERS] Assertions in PL/PgSQL
On Fri, Sep 20, 2013 at 6:24 AM, Marko Tiikkaja <ma...@joh.to> wrote:
>> The issue of how this is spelled is somewhat secondary for me.  I think
>> ASSERT is probably as good as anything.  But let's not kid ourselves:
>> even reserving this word only in PL/pgsql WILL break things for some
>> users, and there are LOTS of people out there with LOTS of procedural
>> code.  Every tiny backward-incompatibility reduces by just a little bit
>> the percentage of those people who can upgrade and increases the delay
>> before they do.  This is an area where past experience has made me
>> quite wary.
>
> The thing is, what this means is that to add a new feature to the
> language, you have to make the syntax so damn ugly that no one wants to
> use it (see row_count, for example) or you will break some poor user's
> function.  And now we got all this cool functionality which nobody wants
> to use, and the language itself actually gets progressively worse.  All
> this is starting to sound like it's already too late to make PL/PgSQL
> better, and we should just start afresh.

To some extent I agree that PL/pgsql is hopeless.  I think there are some
things we can do to improve it, but most of what gets proposed at least in
this forum strikes me as tinkering around the edges, and it can't make up
for fundamentally bad language design decisions.

Part of the problem, of course, is that most programming languages don't
get re-released every year.  It's not that it would be OK for C to
suddenly reserve a bunch of new keywords; it's that they don't try.  And
when they do release new language versions (like C99) some people (like
us) don't adopt them, for fear of being unable to run our code on older
systems.  Such considerations apply with equal force to PL/pgsql, but it
gets a new release every year rather than every decade, so the problems
are magnified.

The other part of the problem is that the language isn't designed from the
beginning to be extensible.  In Perl, for example, they chose to mark
variables with a leading $, @, or % and functions with a leading &.  That
last marking has largely fallen into desuetude, but the point is that - to
the extent that you do have and use such markers - you can add new
keywords without breaking anything.  Some languages can also distinguish
keywords positionally; for example, ABORT doesn't need to be reserved in
PostgreSQL's SQL dialect because it can only appear as a command at the
beginning of a line, and it can't be a column, type, or function name in
that position.  Such an approach might even work for ASSERT in PL/pgsql,
if there's a clean way to disambiguate vs. the assignment syntax.  But
even if we can make that work, we're going to continue to face this
problem with each new language extension.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Where to load modules from?
On Thu, Sep 19, 2013 at 5:54 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2013-09-19 22:56:52 +0200, Dimitri Fontaine wrote:
>> Robert Haas <robertmh...@gmail.com> writes:
>>>> I think I'd prefer a GUC that allows specifying multiple directories
>>>> that are searched in order to a single symlinked directory.
>>>
>>> Why?  I ask because I have the opposite preference, based on the
>>> precedent of pg_xlog.
>>
>> Because I want to specify multiple paths.  E.g. one with modules for a
>> specific postgres version, one for the cluster and one for my
>> development directory.  Now we could recursively search a directory
>> that contains symlinks to directories, but that seems ugly.
>
> I see.  My main hesitation is around security.  I feel somehow that
> changing a GUC to trojan the system would be easier for a remote user to
> accomplish than having to replace a directory with a symlink.
>
> In an effort to reach consensus, what about having both, with the GUC
> being empty by default?  That way you have a default per-cluster place
> where to stuff binaries to be loaded, and a GUC to manage finer settings
> if needs be.

> Well, we can have the guc have a default value of $datadir/pg_lib or
> such.  But using two independent mechanisms seems like a bad idea to me.

Heartily agreed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Where to load modules from?
On 2013-09-20 08:06:56 -0400, Robert Haas wrote:
> On Thu, Sep 19, 2013 at 5:54 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>> Because I want to specify multiple paths.  E.g. one with modules for a
>> specific postgres version, one for the cluster and one for my
>> development directory.  Now we could recursively search a directory
>> that contains symlinks to directories, but that seems ugly.
>
> I see.  My main hesitation is around security.  I feel somehow that
> changing a GUC to trojan the system would be easier for a remote user to
> accomplish than having to replace a directory with a symlink.

If they can change a PGC_POSTMASTER GUC, they already can easily enough do:
shared_preload_libraries='/path/to/my/bad/so.so'
That's already allowed.

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Where to load modules from?
On Fri, Sep 20, 2013 at 8:10 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2013-09-20 08:06:56 -0400, Robert Haas wrote:
>> On Thu, Sep 19, 2013 at 5:54 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>>> Because I want to specify multiple paths.  E.g. one with modules for a
>>> specific postgres version, one for the cluster and one for my
>>> development directory.  Now we could recursively search a directory
>>> that contains symlinks to directories, but that seems ugly.
>>
>> I see.  My main hesitation is around security.  I feel somehow that
>> changing a GUC to trojan the system would be easier for a remote user
>> to accomplish than having to replace a directory with a symlink.
>
> If they can change a PGC_POSTMASTER GUC, they already can easily enough do:
> shared_preload_libraries='/path/to/my/bad/so.so'
> That's already allowed.

OK.  Well, in that case, it seems we wouldn't be opening any new doors.
So... our usual comma-separated GUC syntax?  Empty means no extra places
to search.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Minor inheritance/check bug: Inconsistent behavior
On Fri, Sep 20, 2013 at 3:41 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> [...]
> One more thing: I think it will be better to update the information in
> the docs, probably in the CREATE TABLE page where CHECK constraints are
> explained, and where DDL constraints are explained at this link:
> http://www.postgresql.org/docs/devel/static/ddl-constraints.html

Yes, agreed.  So, are you going to prepare a new version with
documentation and addressing the other points mentioned?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] record identical operator
* Kevin Grittner (kgri...@ymail.com) wrote:
> ... like just refreshing a view so that the new contents are the same as
> what you would see if you re-ran the query defining the matview.

I've heard this notion a few times of wanting to make sure that what you
get from running the query matches the matview.  While that makes sense
when the equality operator and what-you-see actually match, it doesn't
when they don't, because the what-you-see ends up being non-deterministic
and can change based on the order the datums are seen in during query
processing, which can change with different data ordering on disk and even
due to simply different plans for the same data, I believe.

Consider a GROUP BY with a citext column as one of the key fields.
You're going to get whatever value the aggregate happened to come across
first when building the HashAgg.  Having the binary equality operator
doesn't help that, and it seems like it could even result in change storms
happening due to a different plan when the actual data didn't change.

Thanks,

Stephen
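A concrete way to see the effect being described, using citext's
case-insensitive equality (which spelling "wins" the group is
plan-dependent):

    CREATE EXTENSION IF NOT EXISTS citext;
    CREATE TABLE t (c citext);
    INSERT INTO t VALUES ('PostgreSQL'), ('POSTGRESQL');

    -- both rows fall into one group; whether the output shows
    -- 'PostgreSQL' or 'POSTGRESQL' depends on which value the
    -- aggregation happened to see first
    SELECT c, count(*) FROM t GROUP BY c;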
Re: [HACKERS] Freezing without write I/O
On Thu, Sep 19, 2013 at 6:27 PM, Ants Aasma <a...@cybertec.at> wrote:
> I'm tackling similar issues in my patch.  What I'm thinking is that we
> should change our existing supposedly atomic accesses to be more
> explicit and make the API compatible with C11 atomics[1].  For now I
> think the changes should be something like this:
>
> * Have separate typedefs for atomic variants of datatypes (I don't
>   think we have a whole lot of them).  With C11 atomics support, this
>   would change to
>
>   typedef _Atomic TransactionId AtomicTransactionId;

What's the point of this?

> * Require that pg_atomic_init(type, var, val) be used to init atomic
>   variables.  Initially it would just pass through to assignment, as
>   all supported platforms can do 32bit atomic ops.  C11 compatible
>   compilers will delegate to atomic_init().

I don't think this is a bad idea for decoration, but again I'm not sure
how much fundamental value it adds.  If it makes it easier for people to
write code that works in C11 and fails on C89, we lose.

> * Create atomic read and write macros that look something like:
>
>   #define pg_atomic_load(type, var) (*((volatile type*) (var)))
>
>   and
>
>   #define pg_atomic_store(type, var, val) do { \
>       *((volatile type*) (var)) = (val); \
>   } while(0)
>
>   C11 would pass through to the compiler's implementation with relaxed
>   memory ordering.

Same comment.

> * Rename pg_read_barrier()/pg_write_barrier()/pg_memory_barrier() to
>   pg_acquire_fence()/pg_release_fence()/pg_acq_rel_fence().  Herb Sutter
>   makes a convincing argument why loosening up the barrier semantics is
>   the sane choice here. [2]  C11 support can then pass through to
>   atomic_thread_fence().

I am entirely unconvinced that we need this.  Some people use acquire +
release fences, some people use read + write fences, and there are all
combinations (read-acquire, read-release, read-acquire-release,
write-acquire, write-release, write-acquire-release, both-acquire,
both-release, both-acquire-release).  I think we're going to have enough
trouble deciding between the primitives we already have, let alone with a
whole mess of new ones.  Mistakes will only manifest themselves on
certain platforms, and in many cases the races are so tight that actual
failures are very unlikely to be observed in regression testing.

Personally, I think the biggest change that would help here is to mandate
that spinlock operations serve as compiler fences.  That would eliminate
the need for scads of volatile references all over the place.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Where to load modules from?
Robert Haas <robertmh...@gmail.com> writes:
> So... our usual comma-separated GUC syntax?  Empty means no extra places
> to search.

Sounds pretty good to me.  The only advantage of using an initdb place
would have been the opportunity to actually register modules and WAL-log
that step so that the standby pg_modules directory gets filled
automatically.  I realise that might be another discussion and patch
entirely.

I'll prepare a patch using GUCs, just doing the bare minimum for now,
allowing to load modules from GUC-directed places.

Regards,
-- 
Dimitri Fontaine                          06 63 07 10 78
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] record identical operator - Review
Steve Singer st...@ssinger.info wrote: On 09/12/2013 06:27 PM, Kevin Grittner wrote: Attached is a patch for a bit of infrastructure I believe to be necessary for correct behavior of REFRESH MATERIALIZED VIEW CONCURRENTLY as well as incremental maintenance of matviews. Here is attempt at a review of the v1 patch. Thanks for looking at it. There has been a heated discussion on list about if we even want this type of operator. It's not a question of whether we want to allow operators which define comparisons between objects in a non-standard way. We already have 11 such cases; this would add one more to make 12. In all cases there are different operators for making a comparison that return potentially different results from the default operators, to support uses that are specific to that type. I think there is agreement that better (as in more obscure) operators than === and !== need to be picked and we need to find a place in the user-docs to warn users of the behaviour of this operators. Hannu has proposed *==* binary equal, surely very equal by any other definition as wall !==? maybe not equal -- binary inequal, may still be equal by other comparison methods and no one has yet objected to this. I do. The issue is that there are a great many places that there are multiple definitions of equality. We generally try to use something which tries to convey something about the nature of the operation, since the fact that it can have different results is a given. There is nothing in those operators that says binary image comparison. I thought about prepending a hash character in front of normal operators, because that somehow suggests binary operations to my eye (although I have no idea whether it does so for anyone else), but those operators are already used for an alternative set of comparisons for intervals, with a different meaning. I'm still trying to come up with something I really like. My vote would be to update the patch to use those operator names and then figure out where we can document them. It it means we have to section describing the equal operator for records as well then maybe we should do that so we can get on with things. Code Review The record_image_eq and record_image_cmp functions are VERY similar. It seems to me that you could have the meet of these functions into a common function (that isn't exposed via SQL) that then behaves differently in 2 or 3 spots based on a parameter that controls if it detoasts the tuple for the compare or returns false on the equals. Did you look at the record_eq and record_cmp functions which exist before this patch? Is there a reason we should do it one way for the default operators and another way for this alternative? Do you think we should change record_eq and record_cmp to do things the way you suggest? Beyond that I don't see any issues with the code. You call out a question about two records being compared have a different number of columns should it always be an error, or only an error when they match on the columns they do have in common. The current behaviour is select * FROM r1,r2 where r1==r2; a | b | a | b | c ---+---+---+---+--- 1 | 1 | 1 | 2 | 1 (1 row) update r2 set b=1; UPDATE 1 test=# select * FROM r1,r2 where r2==r1; ERROR: cannot compare record types with different numbers of columns This seems like a bad idea to me. I don't like that I get a type comparison error only sometimes based on the values of the data in the column. 
If I'm a user testing code that compares two records with this operator and I test my query with one value pair and it works, then I'd naively expect to get a true or false on all other value sets of the same record type. Again, this is a result of following the precedent set by the existing record comparison operators. test=# create table r1 (c1 int, c2 int); CREATE TABLE test=# create table r2 (c1 int, c2 int, c3 int); CREATE TABLE test=# insert into r1 values (1, 2); INSERT 0 1 test=# insert into r2 values (3, 4, 5); INSERT 0 1 test=# select * from r1 join r2 on r1 <> r2; c1 | c2 | c1 | c2 | c3 ----+----+----+----+---- 1 | 2 | 3 | 4 | 5 (1 row) test=# update r1 set c1 = 3, c2 = 4; UPDATE 1 test=# select * from r1 join r2 on r1 <> r2; ERROR: cannot compare record types with different numbers of columns Would you be in favor of forcing a change to the record comparison operators which have been in use for years? If not, is there a good reason to have them behave differently in this regard? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
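To make the distinction concrete: citext is the usual example of a type whose default equality is looser than byte-for-byte identity. The following runs on any installation with the citext extension; only the final claim about !== is speculative, since the operator spellings are still under discussion.

test=# create extension citext;
CREATE EXTENSION
test=# select 'PostgreSQL'::citext = 'postgresql'::citext as default_eq,
              'PostgreSQL'::citext::text = 'postgresql'::citext::text as byte_eq;
 default_eq | byte_eq
------------+---------
 t          | f
(1 row)

Under the patch, ROW('PostgreSQL'::citext) !== ROW('postgresql'::citext) would be expected to return true even though the default record = calls the rows equal, which is exactly the distinction REFRESH MATERIALIZED VIEW CONCURRENTLY needs in order to notice that the stored row no longer matches the query result.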
Re: [HACKERS] Freezing without write I/O
Hi, I agree with most of what you said - I think that's a little bit too much change for too little benefit. On 2013-09-20 08:32:29 -0400, Robert Haas wrote: Personally, I think the biggest change that would help here is to mandate that spinlock operations serve as compiler fences. That would eliminate the need for scads of volatile references all over the place. They effectively already do, don't they? It's an external, non-inlinable function call (s_lock, not the actual TAS). And even if it were to get inlined via LTO optimization, the inline assembler we're using is doing the __asm__ __volatile__ (..., memory) dance. That's a full compiler barrier. The non-asm implementations call OS/compiler primitives that are also fences. In the case I brought up here there is no spinlock or anything similar. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Where to load modules from?
On 2013-09-20 14:35:31 +0200, Dimitri Fontaine wrote: Robert Haas robertmh...@gmail.com writes: So... our usual comma-separated GUC syntax? Empty means no extra places to search. +1. The only advantage of using an initdb place would have been the opportunity to actually register modules and WAL log that step so that the standby pg_modules directory gets filled automatically. -many I realise that might be another discussion and patch entirely. Yes. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Freezing without write I/O
On Fri, Sep 20, 2013 at 8:40 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-20 08:32:29 -0400, Robert Haas wrote: Personally, I think the biggest change that would help here is to mandate that spinlock operations serve as compiler fences. That would eliminate the need for scads of volatile references all over the place. They effectively already do, don't they? It's an external, non-inlinable function call (s_lock, not the actual TAS). And even if it were to get inlined via LTO optimization, the inline assembler we're using is doing the __asm__ __volatile__ (..., memory) dance. That's a full compiler barrier. The non-asm implementations call OS/compiler primitives that are also fences. In the case I brought up here there is no spinlock or anything similar. Well, that doesn't match my previous discussions with Tom Lane, or this comment: * Another caution for users of these macros is that it is the caller's * responsibility to ensure that the compiler doesn't re-order accesses * to shared memory to precede the actual lock acquisition, or follow the * lock release. Typically we handle this by using volatile-qualified * pointers to refer to both the spinlock itself and the shared data * structure being accessed within the spinlocked critical section. * That fixes it because compilers are not allowed to re-order accesses * to volatile objects relative to other such accesses. That's not to disagree with your point. If TAS is a compiler barrier, then we oughta be OK. Unless something can migrate into the spot between a failed TAS and the call to s_lock? But I'm pretty sure that we've repeatedly had to change code to keep things from falling over in this area, see e.g. commits fa72121594534dda6cc010f0bf6b3e8d22987526, 07eeb9d109c057854b20707562ce517efa591761, d3fc362ec2ce1cf095950dddf24061915f836c22, and 584f818bef68450d23d1b75afbaf19febe38fd37 (the last apparently a live bug). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assertions in PL/PgSQL
2013/9/20 Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 20, 2013 at 6:24 AM, Marko Tiikkaja ma...@joh.to wrote: The issue of how this is spelled is somewhat secondary for me. I think ASSERT is probably as good as anything. But let's not kid ourselves: even reserving this word only in PL/pgsql WILL break things for some users, and there are LOTS of people out there with LOTS of procedural code. Every tiny backward-incompatibility reduces by just a little bit the percentage of those people who can upgrade and increases the delay before they do. This is an area where past experience has made me quite wary. The thing is, what this means is that to add a new feature to the language, you have to make the syntax so damn ugly that no one wants to use it (see row_count, for example) or you will break some poor user's function. And now we've got all this cool functionality which nobody wants to use, and the language itself actually gets progressively worse. All this is starting to sound like it's already too late to make PL/PgSQL better, and we should just start afresh. To some extent I agree that PL/pgsql is hopeless. I think there are some things we can do to improve it, but most of what gets proposed at least in this forum strikes me as tinkering around the edges, and it can't make up for fundamentally bad language design decisions. Part of the problem, of course, is that most programming languages don't get re-released every year. It's not that it would be OK for C to suddenly reserve a bunch of new keywords; it's that they don't try. And when they do release new language versions (like C99), some people (like us) don't adopt them, for fear of being unable to run our code on older systems. Such considerations apply with equal force to PL/pgsql, but it gets a new release every year rather than every decade, so the problems are magnified. The other part of the problem is that the language isn't designed from the beginning to be extensible. In Perl, for example, they chose to mark variables with a leading $, @, or % and functions with a leading &. That last marking has largely fallen into desuetude, but the point is that - to the extent that you do have and use such markers - you can add new keywords without breaking anything. Some languages can also distinguish keywords positionally; for example, ABORT doesn't need to be reserved in PostgreSQL's SQL dialect because it can only appear as a command at the beginning of a line, and it can't be a column, type, or function name in that position. Such an approach might even work for ASSERT in PL/pgsql, if there's a clean way to disambiguate vs. the assignment syntax. But even if we can make that work, we're going to continue to face this problem with each new language extension. PL/pgSQL never had ADA's completeness, uniformity and beauty. But it is not too bad, and one new specialized statement doesn't kill us. The proposed functionality is often needed, and we have no tools (macros) to implement it simply. We support conditions for a few statements - so enhancing the RAISE statement is possible, to some form of RAISE EXCEPTION WHEN NOT FOUND AND use_asserts USING message = 'there are no rows'; but this form is relatively long and less readable (it can be difficult to find asserts in code and separate them from custom exceptions). I am fully for some variant of an ASSERT statement. The benefit is higher than the cost. The ASSERT keyword is simple, readable - and I can accept it, if we find a syntax for the complete functionality (although I prefer a PRAGMA introduction).
Regards, Pavel
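For contrast, the assertion idiom available today, with no new syntax at all, is an ordinary helper function that raises on failure. This is only a sketch; the function name, message, and usage are illustrative:

CREATE OR REPLACE FUNCTION assert(cond boolean, msg text DEFAULT 'assertion failed')
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
    -- treat NULL conditions as failures too
    IF cond IS NOT TRUE THEN
        RAISE EXCEPTION 'ASSERT failed: %', msg;
    END IF;
END;
$$;

-- inside any PL/pgSQL block:
-- PERFORM assert(a > 0, 'a must be positive');

The complaints upthread apply in full: it is a function call rather than a statement, it cannot be compiled out in production, and the PERFORM noise is exactly the sort of thing a real ASSERT statement would remove.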
Re: [HACKERS] Assertions in PL/PgSQL
On 09/20/2013 01:59 PM, Robert Haas wrote: The other part of the problem is that the language isn't designed from the beginning to be extensible. In Perl, for example, they chose to mark variables with a leading $, @, or % and functions with a leading &. That last marking has largely fallen into desuetude, but the point is that - to the extent that you do have and use such markers - you can add new keywords without breaking anything. Some languages can also distinguish keywords positionally; for example, ABORT doesn't need to be reserved in PostgreSQL's SQL dialect because it can only appear as a command at the beginning of a line, and it can't be a column, type, or function name in that position. Such an approach might even work for ASSERT in PL/pgsql, if there's a clean way to disambiguate vs. the assignment syntax. But even if we can make that work, we're going to continue to face this problem with each new language extension. Perhaps we could use the pragma approach here and add some types of new functionality in comments: --#ASSERT ... or even --#pragma ASSERT ... It is still not guaranteed to be 100% compatible, but at least changing comments should be a relatively safe way of fixing your functions. And you could have another pragma to disable some pragmas, which you could SET in a GUC (global, session or per function) for extra ugliness ;) -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
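A function body decorated in the style Hannu sketches might look like this; the --#ASSERT spelling is purely hypothetical and the accounts table is illustrative. Today such lines are inert comments unless a preprocessor, or the GUC-controlled machinery he describes, gives them meaning:

CREATE FUNCTION transfer(src int, dst int, amt numeric)
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
    --#ASSERT amt > 0
    UPDATE accounts SET balance = balance - amt WHERE id = src;
    UPDATE accounts SET balance = balance + amt WHERE id = dst;
    --#ASSERT NOT EXISTS (SELECT 1 FROM accounts WHERE balance < 0)
END;
$$;

The attraction is backward compatibility: servers that know nothing about the pragma simply ignore the comments.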
Re: [HACKERS] Patch for fail-back without fresh backup
On Fri, Sep 20, 2013 at 3:40 PM, Sameer Thakur samthaku...@gmail.com wrote: Attached patch combines the documentation patch and the source-code patch. I have had a stab at reviewing the documentation. Have a look. Thanks. The attached patch implements the suggestions on documentation, but comments from Fujii-san still need to be implemented. We will implement them soon. synchronous_transfer_v9.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Where to load modules from?
Robert Haas escribió: On Sun, Sep 15, 2013 at 10:51 AM, Peter Eisentraut pete...@gmx.net wrote: On Sun, 2013-09-15 at 16:09 +0200, Dimitri Fontaine wrote: Peter Eisentraut pete...@gmx.net writes: It shouldn't be in the commit fest if it has no patch. What should I do if my goal is to get community consensus on the best way to solve a problem, and want to start the discussion with some proposals? Post it to the pgsql-hackers list. The idea of using the CommitFest process to request design review was floated at one of the last couple of developer meetings in Ottawa. Personally, I'm for it. I did it for minmax indexes on CF1 and nobody complained. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor inheritance/check bug: Inconsistent behavior
On Fri, Sep 20, 2013 at 5:53 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 20, 2013 at 3:41 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Fri, Sep 20, 2013 at 12:57 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Sep 19, 2013 at 10:28 PM, Robert Haas robertmh...@gmail.com wrote: Marking this as Ready for committer. Thanks a ton for reviewing the patch. I rewrote the comments for this patch and fixed the incorrect formatting in parse_relation.c. It used spaces instead of tabs, which is why if you look at the patch file you'll see that the alignment looked off. See new version, attached. Thanks, the attached version looks better. However, I have a few residual questions: 1. Does copy.c also need to be fixed? I see that it also calls ExecConstraints(), and I don't see it setting tableOid anywhere. We certainly want COPY and INSERT to behave the same way. I had missed it by confusing it with OID, as OID is set in COPY. I think along with COPY, it needs to be set in ATRewriteTable() as well, because we also check constraints during rewrite. I will send an updated patch after point 2 is concluded. 2. Should constraints also allow access to the oid column? Right now you can do that but it doesn't seem to work, e.g.: rhaas=# create table foo (a int, check (oid::integer % 2 = 0)) with oids; CREATE TABLE I have tried various combinations; it is giving an error at my end. postgres=# create table tbl_with_oid(c1 int) with oids; CREATE TABLE postgres=# alter table tbl_with_oid add check (oid > 16403); ERROR: system column oid reference in check constraint is invalid postgres=# alter table tbl_with_oid add check (oid = 0); ERROR: system column oid reference in check constraint is invalid postgres=# alter table tbl_with_oid add check (oid::integer % 2 = 0); ERROR: system column oid reference in check constraint is invalid postgres=# create table foo (a int, check (oid::integer % 2 = 0)) with oids; ERROR: system column oid reference in check constraint is invalid I am sorry, I just realized after pressing the send button that you meant the above point without considering the above patch. Just prohibiting that might be fine; it doesn't seem that useful anyway. Currently only tableOid is allowed; other system columns should error out due to the below check: + /* In constraint check, no system column is allowed except tableOid */ + if (pstate->p_expr_kind == EXPR_KIND_CHECK_CONSTRAINT && + attnum < InvalidAttrNumber && attnum != TableOidAttributeNumber) But if we're setting the table OID, maybe we should set the OID too, and then just allow this. It can be done for OID as well. I don't have any strong reason for either doing or not doing it; if you think it can be of use then we can add it. One more thing: I think it will be better to update the information in the docs, probably in the Create Table page where CHECK constraints are explained and where DDL constraints are explained at link: http://www.postgresql.org/docs/devel/static/ddl-constraints.html Yes, agreed. So, are you going to prepare a new version with documentation and addressing the other points mentioned? Handling for OID is not clear: shall we allow it or not in a check constraint? In my current patch OID will not be allowed in check constraints. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
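For the archives, the kind of constraint the patch makes dependable looks roughly like this; the semantics shown assume the patch applied, since the whole point of the fix is that tableOid was not reliably set during INSERT and COPY before:

create table parent (a int);
create table child () inherits (parent);
alter table parent add check (tableoid = 'parent'::regclass);
insert into parent values (1);  -- ok: tableoid is parent's own oid
insert into child values (1);   -- fails the inherited check

That gives a declarative way to keep rows out of the wrong table in an inheritance hierarchy, which is the inconsistent-behavior case the thread started from.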
Re: [HACKERS] record identical operator - Review
Steve, Thanks for providing a summary. * Steve Singer (st...@ssinger.info) wrote: The arguments for this patch are * We want the materialized view to return the same value as would be returned if the query were executed directly. This not only means that the values should be the same according to a datatype's = operator but that they should appear the same 'to the eyeball'. With the cases where the equality operator doesn't match the 'to the eyeball' appearance, this really seems to be a pipe dream to me, unless we *also* define an ordering or similar which would then change the existing semantics for end users (which might be reasonable), but I'm not sure how we could do that without input from the type; iow, I don't think we can say well, we'll order by byte representation. It's also going to possibly be a performance hit, since we're going to have to do both an equality op call during a HashAgg/HashJoin/whatever and a which one is bigger comparison of these two equal things, and then I wonder if we'd need to allow users to somehow specify I don't want the bigger of the equal things, I want the smaller in this query for this equality check. I'd really like to see how we're going to provide for that when the user is doing a GROUP BY, without breaking something else or causing problems with later SQL spec revisions. * Supporting the materialized view refresh check via SQL makes a lot of sense but doing that requires exposing something via SQL ... which we don't currently expose for the queries that we already support users running today. Users seem to generally be accepting of that too, even though what they end up with in the output isn't necessarily consistent from query to query. The issue here is that we're trying to make the mat view look like what the query would do when run at the same time, which is a bit of a losing battle, imv. * If we adequately document what we mean by record_image_identical and the operator we pick for this then users shouldn't be surprised at what they get if they use this That's a bit over-simplistic, really. We do try to keep to the POLA (principle of least astonishment) and that's not going to be easy here. I think there is agreement that better (as in more obscure) operators than === and !== need to be picked and we need to find a place in the user docs to warn users of the behaviour of these operators. Hannu has proposed I'm a bit on the fence about this, after having discussed my concerns with Robert at PostgresOpen. If we're going to expose and support these at the SQL level, and we can figure out some semantics and consistency for using them that isn't totally baffling, then perhaps having them as simple and clear operator names would be reasonable. One concern here is if the SQL spec might decide some day that '===' is a useful operator for something else (perhaps even a 'they look the same when cast to text' test). *==* binary equal, surely very equal by any other definition as well !==? maybe not equal -- binary unequal, may still be equal by other comparison methods Those look alright to me also, though we'd need to work out the other operator names, right? And then figure out if we want to use those other operators (less-than, greater-than, etc) when doing equality operations to figure out which equal value to return to the user. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Minor inheritance/check bug: Inconsistent behavior
On Fri, Sep 20, 2013 at 9:43 AM, Amit Kapila amit.kapil...@gmail.com wrote: Handling for OID is not clear, shall we allow it or not in check constraint? In my current patch OID will not be allowed in check constraint. I vote for allowing it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator
Stephen Frost sfr...@snowman.net wrote: * Kevin Grittner (kgri...@ymail.com) wrote: ... like just refreshing a view so that the new contents are the same as what you would see if you re-ran the query defining the matview. I've heard this notion a few times of wanting to make sure that what you get from running the query matches the matview. While that makes sense when the equality operator and what-you-see actually match, it doesn't when they don't because the what-you-see ends up being non-deterministic and can change based on the order the datums are seen in during the query processing which can change with different data ordering on disk and even due to simply different plans for the same data, I believe. That's a fair point to some extent. Where notions of equality differ, it is not always non-deterministic, but it can be. For citext you are correct. For a sum() of numeric data, the number of decimal positions will be the largest value seen; the value present in the query results will not vary by order of rows scanned or by plan. The result of this is that with the patch, an incremental refresh of a matview will always match what the query returned at the time it was run (there is no *correctness* problem) but if someone uses a query with non-deterministic results they may have a lot of activity on a concurrent refresh even if there was no change to the underlying data -- so you could have a *performance* penalty in cases where the query returns something different, compared to leaving the old equal but not the same results. Consider a GROUP BY with a citext column as one of the key fields. You're going to get whatever value the aggregate happened to come across first when building the HashAgg. Having the binary equality operator doesn't help that and seems like it could even result in change storms happening due to a different plan when the actual data didn't change. Yup. A person who wants to GROUP BY a citext value for a matview might want to fold it to a consistent capitalization to avoid that issue. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
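Both halves of that are easy to verify directly; assuming the citext extension is installed, and using numeric's scale rules:

select 'ABC'::citext = 'abc'::citext;
-- true, even though the stored bytes differ

select sum(x) from (values (1.00), (2.000)) v(x);
-- always 3.000: the scale follows the widest input, not the order rows are scanned in

The citext case is where a concurrent refresh could churn; the numeric case is deterministic, so a refresh rewrites the row only when the underlying data really changed.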
Re: [HACKERS] Minor inheritance/check bug: Inconsistent behavior
On Fri, Sep 20, 2013 at 7:17 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 20, 2013 at 9:43 AM, Amit Kapila amit.kapil...@gmail.com wrote: Handling for OID is not clear, shall we allow it or not in check constraint? In my current patch OID will not be allowed in check constraint. I vote for allowing it. Okay, I shall send the updated patch by tomorrow. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
Stephen Frost sfr...@snowman.net wrote: The issue here is that we're trying to make the mat view look like what the query would do when run at the same time, which is a bit of a losing battle, imv. If we're not doing that, I don't see the point of matviews. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
On Friday, September 20, 2013, Kevin Grittner wrote: Stephen Frost sfr...@snowman.net wrote: The issue here is that we're trying to make the mat view look like what the query would do when run at the same time, which is a bit of a losing battle, imv. If we're not doing that, I don't see the point of matviews. When taken out of context, I can see how that might not come across correctly. I was merely pointing out that it's a losing battle in the context of types which have equality operators which claim equality but cast to text with results that don't match. Thanks, Stephen
Re: [HACKERS] SSI freezing bug
Andres Freund and...@2ndquadrant.com wrote: On 2013-09-20 13:55:36 +0300, Heikki Linnakangas wrote: When a tuple is predicate-locked, the key of the lock is ctid+xmin. However, when a tuple is frozen, its xmin is changed to FrozenXid. That effectively invalidates any predicate lock on the tuple, as checking for a lock on the same tuple later won't find it as the xmin is different. Attached is an isolationtester spec to demonstrate this. Do you have any idea to fix that besides keeping the xmin horizon below the lowest of the xids that are predicate locked? Which seems nasty to compute and is probably not trivial to fit into the procarray.c machinery? A better solution probably is to promote tuple-level locks, if they exist, to a relation-level one upon freezing, I guess? That would work. A couple of other ideas would be to use the oldest serializable xmin, which is being calculated in predicate.c, to either prevent freezing of newer transactions or to summarize predicate locks (using the existing SLRU-based summarization functions) which use xmin values for xids which are being frozen. The transactions must already be committed, and so are eligible for summarization. I'm not sure which is best. Will review, probably this weekend. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql tab completion for updatable foreign tables
On 20 September 2013 11:29, Samrat Revagade revagade.sam...@gmail.com wrote: Okay, are you adding this to the September commitfest? OK, I've done that. I think that it's too late for 9.3. +1 for the idea. I have tested the patch and got surprising results with CentOS. The patch is working fine on CentOS 6.2 and RHEL 6.3, but it is giving a problem on CentOS 6.3 (tab completes for local tables but not for foreign tables). I have used the following script: Two postgres servers are running, using ports 5432 and 5433. The server on port 5432 has postgres_fdw installed and will interact with the remote server running on port 5433. psql -p 5433 -c "CREATE TABLE aa_remote (a int, b int)" postgres postgres=# CREATE EXTENSION postgres_fdw; postgres=# CREATE SERVER postgres_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost', port '5433', dbname 'postgres'); postgres=# CREATE USER MAPPING FOR PUBLIC SERVER postgres_server OPTIONS (password ''); postgres=# CREATE FOREIGN TABLE aa_foreign (a int, b int) SERVER postgres_server OPTIONS (table_name 'aa_remote'); postgres=# INSERT into aa_foreign values (1,2); But while doing any operation on aa_foreign, tab does not complete on CentOS 6.3 (it completes for local tables but not for foreign tables). Is that a problem? Hmm. It works for me. What does pg_relation_is_updatable() return for your foreign table? Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
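For anyone reproducing this, the completion machinery keys off pg_relation_is_updatable(), so the underlying check can be run by hand:

select pg_relation_is_updatable('aa_foreign'::regclass, false);

The result is the same event bitmask information_schema uses (8 = INSERT, 4 = UPDATE, 16 = DELETE), so a result of 0 would mean the FDW reports no update support, which would explain tab completion skipping the table.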
Re: [HACKERS] record identical operator - Review
Stephen Frost sfr...@snowman.net wrote: On Friday, September 20, 2013, Kevin Grittner wrote: Stephen Frost sfr...@snowman.net wrote: The issue here is that we're trying to make the mat view look like what the query would do when run at the same time, which is a bit of a losing battle, imv. If we're not doing that, I don't see the point of matviews. When taken out of context, I can see how that might not come across correctly. I was merely pointing out that it's a losing battle in the context of types which have equality operators which claim equalness but cast to text with results that don't match there. I think the patch I've submitted wins that battle. The only oddity is that if someone uses a query for a matview which can provide results with rows which are equal to previous result rows under the default record opclass but different under this patch's opclass, RMVC could update to the latest query results when someone thinks that might not be necessary. Workarounds would include using a query with deterministic results (like using the upper() or lower() function on a grouped citext column in the result set) or not using the CONCURRENTLY option. Neither seems too onerous. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
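Spelled out, the workaround is just to fold the grouped expression in the matview definition; table and column names here are illustrative, and the unique index reflects the submitted patch's requirement for REFRESH ... CONCURRENTLY:

CREATE MATERIALIZED VIEW mv AS
    SELECT lower(name::text) AS name, count(*) AS n
    FROM t
    GROUP BY lower(name::text);

CREATE UNIQUE INDEX mv_name_idx ON mv (name);
REFRESH MATERIALIZED VIEW CONCURRENTLY mv;

With the grouping key folded, repeated refreshes produce byte-identical rows, and the concurrent refresh finds nothing to rewrite when the underlying data hasn't changed.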
Re: [HACKERS] Freezing without write I/O
On 2013-09-20 08:54:26 -0400, Robert Haas wrote: On Fri, Sep 20, 2013 at 8:40 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-20 08:32:29 -0400, Robert Haas wrote: Personally, I think the biggest change that would help here is to mandate that spinlock operations serve as compiler fences. That would eliminate the need for scads of volatile references all over the place. They effectively already do, don't they? It's an external, non-inlinable function call (s_lock, not the actual TAS). And even if it were to get inlined via LTO optimization, the inline assembler we're using is doing the __asm__ __volatile__ (..., memory) dance. That's a full compiler barrier. The non-asm implementations call OS/compiler primitives that are also fences. In the case I brought up here there is no spinlock or anything similar. Well, that doesn't match my previous discussions with Tom Lane, or this comment: * Another caution for users of these macros is that it is the caller's * responsibility to ensure that the compiler doesn't re-order accesses * to shared memory to precede the actual lock acquisition, or follow the * lock release. Typically we handle this by using volatile-qualified * pointers to refer to both the spinlock itself and the shared data * structure being accessed within the spinlocked critical section. * That fixes it because compilers are not allowed to re-order accesses * to volatile objects relative to other such accesses. To me that doesn't really make much sense, to be honest. Note that the next paragraph tells us that * On platforms with weak memory ordering, the TAS(), TAS_SPIN(), and * S_UNLOCK() macros must further include hardware-level memory fence * instructions to prevent similar re-ordering at the hardware level. * TAS() and TAS_SPIN() must guarantee that loads and stores issued after * the macro are not executed until the lock has been obtained. Conversely, * S_UNLOCK() must guarantee that loads and stores issued before the macro * have been executed before the lock is released. So, TAS has to work as a memory barrier if the architecture doesn't have strong cache coherency guarantees itself. But memory barriers have to be compiler barriers? That's not to disagree with your point. If TAS is a compiler barrier, then we oughta be OK. Unless something can migrate into the spot between a failed TAS and the call to s_lock? We're talking compiler barriers, right? In that case failure or success do not play a role, or am I missing something? But I'm pretty sure that we've repeatedly had to change code to keep things from falling over in this area, see e.g. commits: 584f818bef68450d23d1b75afbaf19febe38fd37 (the last apparently a live bug) and d3fc362ec2ce1cf095950dddf24061915f836c22: I've quickly checked those out, and things looked mighty different back then. And incidentally several of the inline assembly implementations back then didn't specify that they clobber memory (which is what makes it a compiler barrier). fa72121594534dda6cc010f0bf6b3e8d22987526: Not sure here. Several of the inline assembly bits were changed to clobber memory, but not all. 07eeb9d109c057854b20707562ce517efa591761: Hm. Those mostly look to be overly cautious to me. I think we should go through the various implementations and make sure they are actual compiler barriers and then change the documented policy.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator
* Kevin Grittner (kgri...@ymail.com) wrote: The result of this is that with the patch, an incremental refresh of a matview will always match what the query returned at the time it was run (there is no *correctness* problem) but if someone uses a query with non-deterministic results they may have a lot of activity on a concurrent refresh even if there was no change to the underlying data -- so you could have a *performance* penalty in cases where the query returns something different, compared to leaving the old equal but not the same results. You mean 'at the time of the incremental refresh', right? Yet that may or may not match running that query directly by an end-user as the plan that a user might get for the entire query could be different than what is run for an incremental update, or due to statistics updates, etc. Consider a GROUP BY with a citext column as one of the key fields. You're going to get whatever value the aggregate happened to come across first when building the HashAgg. Having the binary equality operator doesn't help that and seems like it could even result in change storms happening due to a different plan when the actual data didn't change. Yup. A person who wants to GROUP BY a citext value for a matview might want to fold it to a consistent capitalization to avoid that issue. I'm trying to figure out why that's a perfectly acceptable solution for users running views with GROUP BYs, but apparently it isn't sufficient for mat views? In other words, you would suggest telling users sorry, you can't rely on the value returned by a GROUP BY on that citext column using a normal view, but we're going to try and do better for mat views. I don't intend the above to imply that we should never update values in mat views when we can do so in a reliable way to ensure that the value matches what a view would return. This matches our notion of UPDATE, where we will still UPDATE a value even if the old value and the new value are equal according to the type's equality operator, when the conditional for the UPDATE is using a reliable type (eg: integer). Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] record identical operator
On 2013-09-20 10:51:46 -0400, Stephen Frost wrote: I'm trying to figure out why that's a perfectly acceptable solution for users running views with GROUP BYs, but apparently it isn't sufficient for mat views? Err, because users wrote a GROUP BY? They haven't (necessarily) in the cases of the matviews we're talking about? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator
Stephen Frost sfr...@snowman.net wrote: * Kevin Grittner (kgri...@ymail.com) wrote: The result of this is that with the patch, an incremental refresh of a matview will always match what the query returned at the time it was run (there is no *correctness* problem) but if someone uses a query with non-deterministic results they may have a lot of activity on a concurrent refresh even if there was no change to the underlying data -- so you could have a *performance* penalty in cases where the query returns something different, compared to leaving the old equal but not the same results. You mean 'at the time of the incremental refresh', right? Yet that may or may not match running that query directly by an end-user as the plan that a user might get for the entire query could be different than what is run for an incremental update, or due to statistics updates, etc. I'm confused. The refresh *does* run the query. Sure, if the query is run again it could return different results. I'm missing the point here. Consider a GROUP BY with a citext column as one of the key fields. You're going to get whatever value the aggregate happened to come across first when building the HashAgg. Having the binary equality operator doesn't help that and seems like it could even result in change storms happening due to a different plan when the actual data didn't change. Yup. A person who wants to GROUP BY a citext value for a matview might want to fold it to a consistent capitalization to avoid that issue. I'm trying to figure out why that's a perfectly acceptable solution for users running views with GROUP BYs, but apparently it isn't sufficient for mat views? In other words, you would suggest telling users sorry, you can't rely on the value returned by a GROUP BY on that citext column using a normal view, but we're going to try and do better for mat views. Again, I'm lost. If they don't do something to make the result deterministic, it could be different on each run of the VIEW, and on each REFRESH of the matview. I don't see why that is an argument for trying to suppress the effort needed to make the matview match the latest run of the query. I don't intend the above to imply that we should never update values in mat views when we can do so in a reliable way to ensure that the value matches what a view would return. This matches our notion of UPDATE, where we will still UPDATE a value even if the old value and the new value are equal according to the type's equality operator, when the conditional for the UPDATE is using a reliable type (eg: integer). Well, we provide a trigger function to suppress the UPDATE operation if the old and new values are identical -- in terms of what is stored. We do not attempt to use the default btree equals operator to suppress updates to different values in some circumstances. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
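The trigger in question is the stock suppress_redundant_updates_trigger(); wiring it up takes one statement (the table name is illustrative), and the z-prefixed name follows the documented convention of making it fire after any other per-row triggers:

CREATE TRIGGER z_suppress_redundant
    BEFORE UPDATE ON some_table
    FOR EACH ROW
    EXECUTE PROCEDURE suppress_redundant_updates_trigger();

Like the operators being discussed, it compares the stored representation of the old and new tuples, not any type-specific notion of equality.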
Re: [HACKERS] record identical operator
* Andres Freund (and...@2ndquadrant.com) wrote: On 2013-09-20 10:51:46 -0400, Stephen Frost wrote: I'm trying to figure out why that's a perfectly acceptable solution for users running views with GROUP BYs, but apparently it isn't sufficient for mat views? Err, because users wrote a GROUP BY? They haven't (necessarily) in the cases of the matviews we're talking about? Sure; my thinking was going back to what Hannu had suggested where we have a mechanism to see if the value was updated (using xmin or similar) and then update it in the mat view in that case, without actually doing a comparison at all. That wouldn't necessarily work for the GROUP BY case, but that situation doesn't work reliably today anyway. If we solve the GROUP BY case in SQL for these types then we could use the same for mat views, but we've been happy enough to ignore the issue thus far. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Freezing without write I/O
Andres Freund escribió: Hi, On 2013-09-19 14:42:19 +0300, Heikki Linnakangas wrote: On 18.09.2013 16:22, Andres Freund wrote: * Why can we do a GetOldestXmin(allDbs = false) in BeginXidLSNRangeSwitch()? Looks like a bug. I think I got the arguments backwards, was supposed to be allDbs = true and ignoreVacuum = false. I'm not sure if vacuums could be ignored, but 'false' is the safe option. Not sure either... The ignoreVacuum flag specifies to ignore backends running non-full VACUUM, that is, processes that are known never to generate new Xids, never obtain Xmin, and never to insert tuples anywhere. With all these restrictions, I think it's safe to use ignoreVacuum=true here. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator
On 2013-09-20 11:05:06 -0400, Stephen Frost wrote: * Andres Freund (and...@2ndquadrant.com) wrote: On 2013-09-20 10:51:46 -0400, Stephen Frost wrote: I'm trying to figure out why that's a perfectly acceptable solution for users running views with GROUP BYs, but apparently it isn't sufficient for mat views? Err, because users wrote a GROUP BY? They haven't (neccessarily) in the cases of the matviews we're talking about? Sure; my thinking was going back to what Hannu had suggested where we have a mechanism to see if the value was updated (using xmin or similar) and then update it in the mat view in that case, without actually doing a comparison at all. VACUUM, HOT pruning. Have fun. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator
* Kevin Grittner (kgri...@ymail.com) wrote: Stephen Frost sfr...@snowman.net wrote: You mean 'at the time of the incremental refresh', right? Yet that may or may not match running that query directly by an end-user as the plan that a user might get for the entire query could be different than what is run for an incremental update, or due to statistics updates, etc. I'm confused. The refresh *does* run the query. Sure, if the query is run again it could return different results. I'm missing the point here. Perhaps I'm assuming things are farther along than they are. I assumed that 'incremental mat view' updates were actually 'partial' - iow, that it keeps track of the rows in the mat view which are impacted by the set of rows in the base relations and then runs specific queries to pull out and operate on the base-rel set-of-rows to update the specific mat-view-rows. If it's running the whole query again then it's certainly more likely to get the same results that the user did, but it's not a guarantee, and that's only a happy coincidence while we're still doing the whole-query approach (which I hope we'd eventually progress beyond...). I'm trying to figure out why that's a perfectly acceptable solution for users running views with GROUP BYs, but apparently it isn't sufficient for mat views? In other words, you would suggest telling users sorry, you can't rely on the value returned by a GROUP BY on that citext column using a normal view, but we're going to try and do better for mat views. Again, I'm lost. Perhaps my reply to Andres will help clear that up? If they don't do something to make the result deterministic, it could be different on each run of the VIEW, and on each REFRESH of the matview. I don't see why that is an argument for trying to suppress the effort needed to make the matview match the latest run of the query. Is this really just run the new query and look if the old and new rows are different wrt 'incremental' view updates? If so, I think we're trying to design something here that we're going to totally break very shortly when we move beyond that kind of sledgehammer approach to incremental mat view updates, and I'd rather we figure out a solution which will work beyond that. I don't intend the above to imply that we should never update values in mat views when we can do so in a reliable way to ensure that the value matches what a view would return. This matches our notion of UPDATE, where we will still UPDATE a value even if the old value and the new value are equal according to the type's equality operator, when the conditional for the UPDATE is using a reliable type (eg: integer). Well, we provide a trigger function to suppress the UPDATE operation if the old and new values are identical -- in terms of what is stored. We do not attempt to use the default btree equals operator to suppress updates to different values in some circumstances. This feels like we're trying to figure out how to create a key for a whole row based on the binary representation of that row and then use that to check if we should update a given row or not. This seems a bit radical, but perhaps that's what we should just do then, rather than invent binary equivalence operators for what-things-look-like-now for this - extract the row columns using their binary _send functions and then hash the result to build a key that you can use.
At least then we're using a documented (well, we could probably improve on this, but still) and stable representation of the data that at least some of our users already deal with. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] [PERFORM] encouraging index-only scans
On Thu, Sep 19, 2013 at 6:59 PM, Andres Freund and...@2ndquadrant.com wrote: The reason I suggested keeping track of the xids of unremovable tuples is that the current logic doesn't handle that at all. We just unconditionally set n_dead_tuples to zero after a vacuum, even if not a single row could actually be cleaned out. Which has the effect that we will not start a vacuum until enough bloat (or, after changing this, new inserts) has collected to start vacuum anew. Which then will do twice the work. Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do much good either - we would just immediately trigger a new vacuum the next time we check, even if the xmin horizon is still the same. One idea would be to store the xmin we used for the vacuum somewhere. Could we make that part of the pgstats infrastructure? Or store it in a new pg_class column? Then we could avoid re-triggering until it advances. Or, maybe better, we could remember the oldest XID that we weren't able to remove due to xmin considerations and re-trigger when the horizon passes that point. However, I do have one concern: it might lead to excessive index-vacuuming. Right now, we skip the index vac step only if ZERO dead tuples are found during the heap scan. Even one dead tuple (or line pointer) will cause an index vac cycle, which may easily be excessive. So I further propose that we introduce a threshold for index-vac, so that we only do an index vac cycle if the number of dead tuples exceeds, say, 0.1% of the table size. Yes, that's a pretty valid concern. But we can't really do it that easily. a) We can only remove dead line pointers when we know there's no index pointing to them anymore. Which we only know after the corresponding index entries have been removed. b) We cannot check the validity of an index pointer if there's no heap tuple for it. Sure, we could check whether we're pointing to a dead line pointer, but the random I/O costs of that are prohibitive. Now, we could just mark line pointers as dead and not mark that page as all-visible, and pick it up again on the next vacuum cycle. But that would suck long-term. I think the only real solution here is to store removed tuples' tids (i.e. items we've marked as dead) somewhere. Whenever we've found sufficient tuples to be removed from indexes, we do phase 2. I don't really agree with that. Yes, we could make that change, and yes, it might be better than what we're doing today, but it would be complex and have its own costs. And it doesn't mean that lesser steps are without merit. A vacuum pass over the heap buys us a LOT of space for reuse even without touching the indexes: we don't reclaim the line pointers, but we do reclaim the space for the tuples themselves, which is a big deal. So being able to do that more frequently without causing problems has a lot of value, I think. The fact that we get to set all-visible bits along the way makes future vacuums cheaper, and makes index scans work better, so that's good too. And the first vacuum to find a dead tuple will dirty the page to truncate it to a dead line pointer, while any subsequent revisits prior to the index vac cycle will only examine the page without dirtying it. All in all, just leaving the page to be caught by a future vacuum doesn't seem that bad to me, at least for a first cut.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
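Until something along those lines exists, the knobs available today are the per-table autovacuum storage parameters; the 0.1%-of-table index-vac threshold discussed above would presumably become a similar, currently hypothetical, parameter. A per-table tightening looks like this (table name illustrative):

ALTER TABLE big_table SET (
    autovacuum_vacuum_threshold = 1000,
    autovacuum_vacuum_scale_factor = 0.01  -- trigger at ~1% dead tuples instead of the 20% default
);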
Re: [HACKERS] record identical operator
* Andres Freund (and...@2ndquadrant.com) wrote: On 2013-09-20 11:05:06 -0400, Stephen Frost wrote: Sure; my thinking was going back to what Hannu had suggested where we have a mechanism to see if the value was updated (using xmin or similar) and then update it in the mat view in that case, without actually doing a comparison at all. VACUUM, HOT pruning. Have fun. Yea, clearly oversimplified, but I do expect that we're going to reach a point where we're looking at the rows being updated in the base rels and which rows they map to in the view, and then marking those rows as needing to be updated. That whole mechanism doesn't depend on this are-they-binary-equal approach and is what I had anticipated as the path we'd be going down in the future. The above is also what I recall had been discussed at the hackers meeting, along with some ideas/papers about how to specifically implement partial updates, hence my assumption that was what we were talking about. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Assertions in PL/PgSQL
Pavel Stehule escribió: PL/pgSQL never had ADA's completeness, uniformity and beauty. But it is not too bad, and one new specialized statement doesn't kill us. The proposed functionality is often needed, and we have no tools (macros) to implement it simply. We support conditions for a few statements - so enhancing the RAISE statement is possible. Extending RAISE is one option. Another option is to decorate BEGIN and END with an assertion option; the assertion would be checked when the block is entered (in BEGIN) or finished (in END). BEGIN ASSERT (a = 1) WITH (name = a_is_one) a := a + 1; END; BEGIN ASSERT (a > 0) a := a + 1; END ASSERT (a = 2) WITH (name = a_is_two); This would play nice with loops too, where the assertion is checked on every iteration. And you can have empty blocks if you want the assertion to be standalone in the middle of some block. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Freezing without write I/O
On 2013-09-20 16:47:24 +0200, Andres Freund wrote: I think we should go through the various implementations and make sure they are actual compiler barriers and then change the documented policy. From a quick look:
* S_UNLOCK for PPC isn't a compiler barrier
* S_UNLOCK for MIPS isn't a compiler barrier
* I don't know enough about Unixware (do we even still support that as a platform?) to judge
* Tru64 Alpha I have no clue about
* PA-RISC's tas() might not be a compiler barrier for !GCC
* PA-RISC's S_UNLOCK might not be a compiler barrier
* HP-UX !GCC might not be
* IRIX 5 seems to be a compiler barrier
* SINIX - I don't care
* AIX PPC - compiler barrier
* Sun - TAS is implemented in external assembly, normal function call, compiler barrier
* Win(32|64) - compiler barrier
* Generic S_UNLOCK is *NOT* necessarily a compiler barrier
Ok, so I might have been a bit too optimistic... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assertions in PL/PgSQL
2013/9/20 Alvaro Herrera alvhe...@2ndquadrant.com wrote: Pavel Stehule escribió: PL/pgSQL never had ADA's completeness, uniformity and beauty. But it is not too bad, and one new specialized statement doesn't kill us. The proposed functionality is often needed, and we have no tools (macros) to implement it simply. We support conditions for a few statements - so enhancing the RAISE statement is possible. Extending RAISE is one option. Another option is to decorate BEGIN and END with an assertion option; the assertion would be checked when the block is entered (in BEGIN) or finished (in END). BEGIN ASSERT (a = 1) WITH (name = a_is_one) a := a + 1; END; BEGIN ASSERT (a > 0) a := a + 1; END ASSERT (a = 2) WITH (name = a_is_two); This would play nice with loops too, where the assertion is checked on every iteration. And you can have empty blocks if you want the assertion to be standalone in the middle of some block. It can work, but it looks too strange. -1 Pavel -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [PERFORM] encouraging index-only scans
On 2013-09-20 11:30:26 -0400, Robert Haas wrote: On Thu, Sep 19, 2013 at 6:59 PM, Andres Freund and...@2ndquadrant.com wrote: The reason I suggested keeping track of the xids of unremovable tuples is that the current logic doesn't handle that at all. We just unconditionally set n_dead_tuples to zero after a vacuum even if not a single row could actually be cleaned out. Which has the effect that we will not start a vacuum until enough bloat (or after changing this, new inserts) has collected to start vacuum anew. Which then will do twice the work. Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do much good either - we would just immediately trigger a new vacuum the next time we check, even if the xmin horizon is still the same. One idea would be to store the xmin we used for the vacuum somewhere. Could we make that part of the pgstats infrastructure? Or store it in a new pg_class column? Then we could avoid re-triggering until it advances. Or, maybe better, we could remember the oldest XID that we weren't able to remove due to xmin considerations and re-trigger when the horizon passes that point. I suggested a slightly more complex variant of this upthread: http://archives.postgresql.org/message-id/20130907053449.GE626072%40alap2.anarazel.de Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] encouraging index-only scans
On Fri, Sep 20, 2013 at 11:51 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-20 11:30:26 -0400, Robert Haas wrote: On Thu, Sep 19, 2013 at 6:59 PM, Andres Freund and...@2ndquadrant.com wrote: The reason I suggested keeping track of the xids of unremovable tuples is that the current logic doesn't handle that at all. We just unconditionally set n_dead_tuples to zero after a vacuum even if not a single row could actually be cleaned out. Which has the effect that we will not start a vacuum until enough bloat (or after changing this, new inserts) has collected to start vacuum anew. Which then will do twice the work. Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do much good either - we would just immediately trigger a new vacuum the next time we check, even if the xmin horizon is still the same. One idea would be to store the xmin we used for the vacuum somewhere. Could we make that part of the pgstats infrastructure? Or store it in a new pg_class column? Then we could avoid re-triggering until it advances. Or, maybe better, we could remember the oldest XID that we weren't able to remove due to xmin considerations and re-trigger when the horizon passes that point. I suggested a slightly more complex variant of this upthread: http://archives.postgresql.org/message-id/20130907053449.GE626072%40alap2.anarazel.de Ah, yeah. Sorry, I forgot about that. Personally, I'd try the simpler version first. But I think whoever takes the time to implement this will probably get to pick. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range types do not display in pg_stats
On 09/19/2013 02:55 PM, Mike Blackwell wrote: Interesting. Is this a 9.3 issue? I ran the above against my 9.2.4 server and got no rows in pg_stats. Did I miss something? Yeah, that was on 9.3. I think the issue on 9.2 is the same; it just manifests differently. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range types do not display in pg_stats
Robert, It probably has to do with the CASE stakind stuff in the definition of the pg_stats view. Try \d+ pg_stats to see what I mean. Ok, if this is not a known bug, I'll see if I can work up a fix. No promises, given the hairiness ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Looking for information on our elephant
On 09/19/2013 04:16 PM, Tatsuo Ishii wrote: Hi, I'm looking for information on our elephant: especially how/why we chose the elephant as our mascot. What I've been told: Our first logo was one Jan designed, which was an elephant in a diamond. While the idea was nice, it looked terrible, so when Marc and Greg launched PostgreSQL Inc. they hired an actual designer, who created the blue elephant we have now. This design was then contributed to the community. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Freezing without write I/O
Hi, IMO it's a bug if S_UNLOCK is not a compiler barrier. Moreover, for volatile, remember: https://www.securecoding.cert.org/confluence/display/seccode/DCL17-C.+Beware+of+miscompiled+volatile-qualified+variables Who is double-checking compiler output? :) regards Didier

On Fri, Sep 20, 2013 at 5:11 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-20 16:47:24 +0200, Andres Freund wrote: I think we should go through the various implementations and make sure they are actual compiler barriers and then change the documented policy. From a quick look:
* S_UNLOCK for PPC isn't a compiler barrier
* S_UNLOCK for MIPS isn't a compiler barrier
* I don't know enough about unixware (do we still support that as a platform even) to judge
* Tru64 Alpha I have no clue about
* PA-RISC's tas() might not be a compiler barrier for !GCC
* PA-RISC's S_UNLOCK might not be a compiler barrier
* HP-UX !GCC might not be
* IRIX 5 seems to be a compiler barrier
* SINIX - I don't care
* AIX PPC - compiler barrier
* Sun - TAS is implemented in external assembly, normal function call, compiler barrier
* Win(32|64) - compiler barrier
* Generic S_UNLOCK *NOT* necessarily a compiler barrier.
Ok, so I might have been a bit too optimistic... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
On 09/20/2013 08:38 AM, Kevin Grittner wrote: Did you look at the record_eq and record_cmp functions which exist before this patch? Is there a reason we should do it one way for the default operators and another way for this alternative? Do you think we should change record_eq and record_cmp to do things the way you suggest?

I think the record_eq and record_cmp functions would be better if they shared code as well, but I don't think changing that should be part of this patch; you aren't otherwise touching those functions. I know some people dislike code that is switch based and prefer duplication; my preference is to avoid duplication.

This seems like a bad idea to me. I don't like that I get a type comparison error only sometimes, based on the values of the data in the column. If I'm a user testing code that compares two records with this operator, and I test my query with one value pair and it works, then I'd naively expect to get a true or false on all other value sets of the same record type.

Again, this is a result of following the precedent set by the existing record comparison operators.

test=# create table r1 (c1 int, c2 int);
CREATE TABLE
test=# create table r2 (c1 int, c2 int, c3 int);
CREATE TABLE
test=# insert into r1 values (1, 2);
INSERT 0 1
test=# insert into r2 values (3, 4, 5);
INSERT 0 1
test=# select * from r1 join r2 on r1 < r2;
 c1 | c2 | c1 | c2 | c3
----+----+----+----+----
  1 |  2 |  3 |  4 |  5
(1 row)

test=# update r1 set c1 = 3, c2 = 4;
UPDATE 1
test=# select * from r1 join r2 on r1 < r2;
ERROR:  cannot compare record types with different numbers of columns

Would you be in favor of forcing a change to the record comparison operators which have been in use for years? If not, is there a good reason to have them behave differently in this regard?

I hadn't picked up on the fact that you copied that part of the behaviour from the existing comparison operators. No, we would need a really good reason for changing the behaviour of the comparison operators, and I don't think we have that. I agree that the binary identical operators should behave similarly to the existing comparison operators and error out on the column number mismatch after comparing the columns that are present in both.

Steve Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
On Thu, Sep 19, 2013 at 7:58 PM, Tatsuo Ishii is...@postgresql.org wrote: What about limiting to use NCHAR with a database which has same encoding or compatible encoding (on which the encoding conversion is defined)? This way, NCHAR text can be automatically converted from NCHAR to the database encoding in the server side thus we can treat NCHAR exactly same as CHAR afterward. I suppose what encoding is used for NCHAR should be defined in initdb time or creation of the database (if we allow this, we need to add a new column to know what encoding is used for NCHAR). For example, CREATE TABLE t1(t NCHAR(10)) will succeed if NCHAR is UTF-8 and database encoding is UTF-8. Even succeed if NCHAR is SHIFT-JIS and database encoding is UTF-8 because there is a conversion between UTF-8 and SHIFT-JIS. However will not succeed if NCHAR is SHIFT-JIS and database encoding is ISO-8859-1 because there's no conversion between them. I think the point here is that, at least as I understand it, encoding conversion and sanitization happens at a very early stage right now, when we first receive the input from the client. If the user sends a string of bytes as part of a query or bind placeholder that's not valid in the database encoding, it's going to error out before any type-specific code has an opportunity to get control. Look at textin(), for example. There's no encoding check there. That means it's already been done at that point. To make this work, someone's going to have to figure out what to do about *that*. Until we have a sketch of what the design for that looks like, I don't see how we can credibly entertain more specific proposals. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
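For reference, textin() (approximately as it appears in src/backend/utils/adt/varlena.c) really is this thin; any encoding verification has already happened by the time it runs:

/*
 * textin - converts "..." to internal representation.
 * Note: no encoding check here; the bytes were validated (and, if
 * necessary, converted) when the client message was first received.
 */
Datum
textin(PG_FUNCTION_ARGS)
{
	char	   *inputText = PG_GETARG_CSTRING(0);

	PG_RETURN_TEXT_P(cstring_to_text(inputText));
}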
Re: [HACKERS] record identical operator
Dimitri Fontaine escribió: My understanding is that if you choose citext then you don't care at all about the case, so that the two relations actually yield the same results for the right definition of "same" here: the citext one. For the record, I don't think citext means that the user doesn't care about the case; it only means they want the comparisons to be case-insensitive, but case should nonetheless be preserved. That is, case-insensitive, case-preserving. A parallel is MS-DOS file name semantics. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
On Thu, Sep 19, 2013 at 6:42 PM, MauMau maumau...@gmail.com wrote: National character types support may be important to some potential users of PostgreSQL and the popularity of PostgreSQL, not me. That's why national character support is listed in the PostgreSQL TODO wiki. We might be losing potential users just because their selection criteria include national character support.

We'd have to go back and search the archives to figure out why that item was added to the TODO, but I'd be surprised if anyone ever had it in mind to create additional types that behave just like existing types but with different names. I don't think that you'll be able to get consensus around that path on this mailing list.

I am not keen to introduce support for nchar and nvarchar as differently-named types with identical semantics. Similar examples already exist:
- varchar and text: the only difference is the existence of an explicit length limit
- numeric and decimal
- int and int4, smallint and int2, bigint and int8
- real/double precision and float

I agree that the fact we have both varchar and text feels like a wart. The other examples mostly involve different names for the same underlying type, and so are different from what you are asking for here.

I understand your feeling. The concern about incompatibility can be eliminated by thinking the following way. How about this?
- NCHAR can be used with any database encoding.
- At first, NCHAR is exactly the same as CHAR. That is, the implementation-defined character set described in the SQL standard is the database character set.
- In the future, the character set for NCHAR can be selected at database creation, like Oracle's CREATE DATABASE NATIONAL CHARACTER SET AL16UTF16. The default is the database character set.

Hmm. So under that design, a database could support up to a total of two character sets, the one that you get when you say 'foo' and the other one that you get when you say n'foo'. I guess we could do that, but it seems a bit limited. If we're going to go to the trouble of supporting multiple character sets, why not support an arbitrary number instead of just two? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Could ANALYZE estimate bloat?
On 09/20/2013 11:59 AM, Josh Berkus wrote: Hackers, I've been tinkering with a number of table bloat checks, and it occurred to me that the problem is that these are all approximations based on overall gross statistics, and as such highly inaccurate. It seems like we could have ANALYZE, while sampling from the table, also check the amount of dead space per page and use that as an estimate of the % of dead space overall. We'd still need something else for indexes, but this seems like it would be a good start. No? I think this is a great idea. -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix
On Fri, Sep 20, 2013 at 3:15 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote: David Rowley wrote: I moved the source around and I've patched against it again. New patch attached. Thank you, marked as ready for committer.

+/*
+ * helper function for processing the format string in
+ * log_line_prefix()
+ * This function returns NULL if it finds something which
+ * it deems invalid for the log_line_prefix string.
+ */

Comments should be formatted as a single paragraph. If you want multiple paragraphs, you need to include a line that's blank except for the leading *.

+static const char
+*process_log_prefix_padding(const char *p, int *ppadding)

The asterisk should be on the previous line, separated from "char" by a space.

+ appendStringInfo(buf, "%*ld", padding, log_line_number);

Is %* supported by all versions of printf() on every platform we support? Earlier there was some discussion of performance. Was that tested? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
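For reference, %* in a printf format takes the field width from the argument list; this is standard C behavior, though whether every supported platform's libc implements it is exactly the question raised above. A small standalone illustration:

#include <stdio.h>

int
main(void)
{
	long		line = 42;

	printf("[%*ld]\n", 6, line);	/* width 6, right-justified: "[    42]" */
	printf("[%*ld]\n", -6, line);	/* negative width left-justifies: "[42    ]" */
	return 0;
}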
[HACKERS] Could ANALYZE estimate bloat?
Hackers, I've been tinkering with a number of table bloat checks, and it occurred to me that the problem is that these are all approximations based on overall gross statistics, and as such highly inaccurate. It seems like we could have ANALYZE, while sampling from the table, also check the amount of dead space per page and use that as an estimate of the % of dead space overall. We'd still need something else for indexes, but this seems like it would be a good start. No? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
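One possible shape for this (a rough sketch with invented names, not from any patch): while ANALYZE's acquire_sample_rows() already has each sampled page pinned, it could total up the free space on those pages and extrapolate. Free space is only a proxy for dead space - a fuller version would also count dead line pointers and dead tuples per page:

static double
sampled_free_space_fraction(Relation rel, BlockNumber *sampled, int nsampled)
{
	Size		freespace = 0;
	int			i;

	for (i = 0; i < nsampled; i++)
	{
		Buffer		buf = ReadBuffer(rel, sampled[i]);

		LockBuffer(buf, BUFFER_LOCK_SHARE);
		freespace += PageGetHeapFreeSpace(BufferGetPage(buf));
		UnlockReleaseBuffer(buf);
	}

	/* fraction of the sampled pages' space that is currently unused */
	return (double) freespace / ((double) nsampled * BLCKSZ);
}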
Re: [HACKERS] [RFC] Extend namespace of valid guc names
On Thu, Sep 19, 2013 at 6:19 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-19 14:57:53 -0400, Robert Haas wrote: On Tue, Sep 17, 2013 at 6:56 PM, Andres Freund and...@2ndquadrant.com wrote: I think that ship has long since sailed. postgresql.conf has allowed foo.bar style GUCs via custom_variable_classes for a long time, and these days we don't even require that but allow them generally. Also, SET, postgres -c, and SELECT set_config() already don't have the restriction to one dot in the variable name. Well, we should make it consistent, but I'm not sure that settles the question of which direction to go. I suggested making it consistent in either form elsewhere in this thread, but Tom wasn't convinced. I think because of backward compatibility concerns we can hardly restrict the valid namespace in all forms to the most restrictive form (i.e. guc-file.l). Quite a few people use GUCs as variables. Maybe we can forbid the more absurd variants (SET ... = 3; SELECT set_config('...', '1', true)), but restricting it entirely seems to cause pain for no reason. Yeah, I don't think I understand Tom's reasoning. I mean, why shouldn't there be ONE rule for what can be a GUC? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
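To illustrate the inconsistency being discussed (behavior as described above; the exact rules may vary by version):

-- Accepted by SET and set_config(), despite the extra qualifier dots:
SET my.nested.setting = 'on';
SELECT set_config('my.nested.setting', 'off', false);
SELECT current_setting('my.nested.setting');

-- ...whereas the equivalent line in postgresql.conf,
--     my.nested.setting = on
-- is rejected by the stricter guc-file.l parser.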
Re: [HACKERS] [PATCH] pg_sleep(interval)
On Fri, Sep 20, 2013 at 7:59 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:

- the new function is *not* tested anywhere! I would suggest simply to replace some pg_sleep(int) instances by corresponding pg_sleep(interval) instances in the non-regression tests.

- some concerns have been raised that it breaks pg_sleep(TEXT), which currently works thanks to the implicit TEXT -> INT cast. I would suggest to add pg_sleep(TEXT) explicitly, like:

CREATE FUNCTION pg_sleep(TEXT) RETURNS VOID
VOLATILE STRICT AS $$ select pg_sleep($1::INTEGER) $$ LANGUAGE SQL;

That would be another one-liner, to update the documentation and to add some tests as well! ISTM that providing pg_sleep(TEXT) cleanly resolves the upward-compatibility issue raised.

I think that's ugly and I'm not one bit convinced it will resolve all the upgrade-compatibility issues. Realistically, all sleeps are going to be reasonably well measured in seconds anyway. If you want to sleep for some other interval, convert that interval to a number of seconds first. Another problem is that, as written, this is vulnerable to search_path hijacking attacks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pg_sleep(interval)
Hello Robert,

- some concerns have been raised that it breaks pg_sleep(TEXT), which currently works thanks to the implicit TEXT -> INT cast. I would suggest to add pg_sleep(TEXT) explicitly, like:

CREATE FUNCTION pg_sleep(TEXT) RETURNS VOID
VOLATILE STRICT AS $$ select pg_sleep($1::INTEGER) $$ LANGUAGE SQL;

That would be another one-liner, to update the documentation and to add some tests as well! ISTM that providing pg_sleep(TEXT) cleanly resolves the upward-compatibility issue raised.

I think that's ugly and I'm not one bit convinced it will resolve all the upgrade-compatibility issues. Realistically, all sleeps are going to be reasonably well measured in seconds anyway.

I do not know that. From a usual database point of view, it does not make much sense to slow down a database anyway, and this function is never needed... so it really depends on the use case. If someone wants to simulate a long-standing transaction to check its effect on some features such as dumping data or replication or whatever, maybe pg_sleep(INTERVAL '5 hours') is nicer than pg_sleep(18000), if you are not too good at dividing by 60, 3600 or 86400...

If you want to sleep for some other interval, convert that interval to a number of seconds first.

You could say that for any use of INTERVAL. ISTM that the point of the interval type is just to be more readable than a number of seconds to express a delay.

Another problem is that, as written, this is vulnerable to search_path hijacking attacks.

Yes, sure. I was not suggesting to create the function directly as above; this is just the test I made to check whether it worked as I thought, i.e. that providing a TEXT version works and interacts properly with the INTEGER and INTERVAL versions. My guess is that the function definition would be inserted directly in pg_proc as other pg_* functions at initdb time.

-- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
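For what it's worth, an interval variant that sidesteps the search_path problem by schema-qualifying the call - a sketch of the idea only, not the patch's actual definition, which would presumably be a built-in pg_proc entry:

CREATE FUNCTION pg_sleep(INTERVAL) RETURNS VOID
VOLATILE STRICT LANGUAGE SQL
AS $$ SELECT pg_catalog.pg_sleep(extract(epoch FROM $1)) $$;

-- usage:
SELECT pg_sleep(INTERVAL '5 hours');   -- rather than pg_sleep(18000)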
[HACKERS] File_fdw documentation patch to clarify OPTIONS clause
The file_fdw documentation for the OPTIONS clause references the COPY command's documentation. In the case of COPY, the value for some options (e.g. HEADER, OIDS, ...) is optional. While the file_fdw documentation makes no mention of it, the values for all options are required. Attached is a patch to add a note to the docs mentioning this fact. __ Mike Blackwell | Technical Analyst, Distribution Services/Rollout Management | RR Donnelley 1750 Wallace Ave | St Charles, IL 60174-3401 Office: 630.313.7818 mike.blackw...@rrd.com http://www.rrdonnelley.com/ file-fdw-doc.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
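For example (hypothetical table and file names), note that header must be written with an explicit value, unlike in COPY where a bare HEADER suffices:

CREATE EXTENSION file_fdw;
CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE staff (
    id    integer,
    name  text
) SERVER file_server
OPTIONS (filename '/tmp/staff.csv', format 'csv', header 'true');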
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
On 9/20/13 2:22 PM, Robert Haas wrote: I am not keen to introduce support for nchar and nvarchar as differently-named types with identical semantics. Similar examples already exist:
- varchar and text: the only difference is the existence of an explicit length limit
- numeric and decimal
- int and int4, smallint and int2, bigint and int8
- real/double precision and float
I agree that the fact we have both varchar and text feels like a wart. The other examples mostly involve different names for the same underlying type, and so are different from what you are asking for here. Also note that we already have NCHAR [VARYING]. It's mapped to char or varchar, respectively, in the parser, just like int, real, etc. are handled. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSL renegotiation
Here's the patch I propose to handle renegotiation cleanly. I noticed in testing that SSL_renegotiate_pending() doesn't seem to actually work --- if I throw an ereport(FATAL) at the point where I expect the renegotiation to be complete, it always dies, even if I give it megabytes of extra traffic while waiting for the renegotiation to complete. I suspect this is an OpenSSL bug. Instead, in this patch I check the internal renegotiation counter: grab its current value when starting the renegotiation, and consider it complete when the counter has advanced. This works fine.

Another thing not covered by the original code snippet I proposed upthread is to avoid renegotiating when there is already a renegotiation in progress. This bug has been observed in the field.

Per discussion, I made it close the connection with a FATAL error if the limit is reached and the renegotiation hasn't taken place. To do otherwise is not acceptable from a security PoV.

Sean Chittenden mentioned that when retrying the handshake, we should be careful to do it only a few times, not forever, to prevent a malfeasant from grabbing hold of a connection indefinitely. I've added that too, hardcoding the number of retries to 20.

Also, I made this code request a renegotiation slightly before the limit is actually reached, because I noticed that in some cases some traffic can go by before the renegotiation is actually completed. The difference should be pretty minimal.

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

*** a/src/backend/libpq/be-secure.c
--- b/src/backend/libpq/be-secure.c
***************
*** 101,106 ****
--- 101,109 ----
  char	   *ssl_crl_file;
   */
  int		ssl_renegotiation_limit;
+ 
+ /* are we in the middle of a renegotiation? */
+ static bool in_ssl_renegotiation = false;
+ 
  #ifdef USE_SSL
  static SSL_CTX *SSL_context = NULL;
  static bool ssl_loaded_verify_locations = false;
***************
*** 321,350 ****
  secure_write(Port *port, void *ptr, size_t len)
  
  	if (port->ssl)
  	{
  		int		err;
  
! 		if (ssl_renegotiation_limit && port->count > ssl_renegotiation_limit * 1024L)
  		{
  			SSL_set_session_id_context(port->ssl, (void *) &SSL_context,
  									   sizeof(SSL_context));
  			if (SSL_renegotiate(port->ssl) <= 0)
  				ereport(COMMERROR,
  						(errcode(ERRCODE_PROTOCOL_VIOLATION),
! 						 errmsg("SSL renegotiation failure")));
! 			if (SSL_do_handshake(port->ssl) <= 0)
! 				ereport(COMMERROR,
! 						(errcode(ERRCODE_PROTOCOL_VIOLATION),
! 						 errmsg("SSL renegotiation failure")));
! 			if (port->ssl->state != SSL_ST_OK)
! 				ereport(COMMERROR,
! 						(errcode(ERRCODE_PROTOCOL_VIOLATION),
! 						 errmsg("SSL failed to send renegotiation request")));
! 			port->ssl->state |= SSL_ST_ACCEPT;
! 			SSL_do_handshake(port->ssl);
! 			if (port->ssl->state != SSL_ST_OK)
! 				ereport(COMMERROR,
! 						(errcode(ERRCODE_PROTOCOL_VIOLATION),
! 						 errmsg("SSL renegotiation failure")));
! 			port->count = 0;
  		}
  
  	wloop:
--- 324,384 ----
  	if (port->ssl)
  	{
  		int		err;
+ 		static long renegotiation_count = 0;
  
! 		/*
! 		 * If SSL renegotiations are enabled and we're getting close to the
! 		 * limit, start one now; but avoid it if there's one already in
! 		 * progress.  Request the renegotiation before the limit has actually
! 		 * expired; we give it 1% of the specified limit, or 1kB, whichever is
! 		 * greater.
! 		 */
! 		if (ssl_renegotiation_limit && !in_ssl_renegotiation &&
! 			port->count > Min((ssl_renegotiation_limit - 1) * 1024L,
! 							  (long) rint(ssl_renegotiation_limit * 1024L * 0.99)))
  		{
+ 			in_ssl_renegotiation = true;
+ 
  			SSL_set_session_id_context(port->ssl, (void *) &SSL_context,
  									   sizeof(SSL_context));
  			if (SSL_renegotiate(port->ssl) <= 0)
  				ereport(COMMERROR,
  						(errcode(ERRCODE_PROTOCOL_VIOLATION),
! 						 errmsg("SSL failure during renegotiation start")));
! 			else
! 			{
! 				int		handshake;
! 				int		retries = 20;
! 
! 				/*
! 				 * The way we determine that a renegotiation has completed is
! 				 * by observing OpenSSL's internal renegotiation counter.
! 				 * Memoize it when starting the renegotiation, and assume that
! 				 * the renegotiation is complete when the counter advances.
! 				 *
! 				 * OpenSSL provides SSL_renegotiation_pending(), but this
! 				 * doesn't seem to work in testing.
! 				 */
! 				renegotiation_count = SSL_num_renegotiations(port->ssl);
! 
! 				/*
! 				 * A handshake can fail, so be prepared to retry it, but only
! 				 * a few times.
! 				 */
! 				for (;;)
! 				{
! 					handshake = SSL_do_handshake(port->ssl);
! 					if (handshake > 0)
! 						break;	/* done */
! 					ereport(COMMERROR,
! 							(errcode(ERRCODE_PROTOCOL_VIOLATION),
! 							 errmsg("SSL handshake failure on renegotiation, retrying")));
! 					if (retries-- <= 0)
! 						ereport(FATAL,
! 								(errcode(ERRCODE_PROTOCOL_VIOLATION),
! 								 errmsg("unable to complete SSL handshake")));
! 				}
! 			}
Re: [HACKERS] gaussian distribution pgbench
KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: I created a gaussian distribution pgbench patch that can access records with gaussian frequency, and I submitted it to this commitfest. Thanks! I have moved this to the Open CommitFest, though. https://commitfest.postgresql.org/action/commitfest_view/open You had accidentally added it to the CF In Progress. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Could ANALYZE estimate bloat?
Josh, * Josh Berkus (j...@agliodbs.com) wrote: I've been tinkering with a number of table bloat checks, and it occurred to me that the problem is that these are all approximations based on overall gross statistics, and as such highly inaccurate. Greg Smith and I discussed some improvements around this area @Open. I'd suggest you talk with him about the ideas that he has. It seems like we could have ANALYZE, while sampling from the table, also check the amount of dead space per page and use that as an estimate of the % of dead space overall. We'd still need something else for indexes, but this seems like it would be a good start. No? Err, I thought that was one of the things that ANALYZE *did* collect, through the 'live tuples' number? Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] logical changeset generation v6
On Thu, Sep 19, 2013 at 7:43 AM, Andres Freund and...@2ndquadrant.com wrote: More generally, the thing that bugs me about this approach is that logical replication is not really special, but what you've done here MAKES it special. There are plenty of other situations where we are too aggressive about holding back xmin. A single-statement transaction that selects from a single table holds back xmin for the entire cluster, and that is Very Bad. It would probably be unfair to say, well, you have to solve that problem first. But on the other hand, how do we know that the approach you've adopted here isn't going to make the more general problem harder to solve? It seems invasive at a pretty low level. I agree that it's invasive, but I am doubtful that pegging the xmin in a more granular fashion precludes this kind of optimization. We might have to generalize what Andres has done, which could mean eventually throwing it out and starting from scratch, but I have a hard time seeing how that implies an appreciable cost above solving the general problem first (now that Andres has already implemented the RecentGlobalDataXmin thing). As I'm sure you appreciate, the cost of doing the opposite - of solving the general problem first - may be huge: waiting another release for logical changeset generation. The reason why I think it's actually different is that the user actually has control over how long transactions are running on the primary. They don't really control how fast a replication consumer consumes and how often it sends feedback messages. Right. That's about what I said last year. I find the following analogy useful: A logical changeset generation implementation without RecentGlobalDataXmin is kind of like an old-fashioned nuclear reactor, like the one they had at Chernobyl. Engineers have to actively work in order to prevent it from overheating. However, an implementation with RecentGlobalDataXmin is like a modern, much safer nuclear reactor. Engineers have to actively work to keep the reactor heated. Which is to say, with RecentGlobalDataXmin a standby that dies cannot bloat the master too much (almost as with hot_standby_feedback - that too requires active participation from the standby to do harm to the master). Without RecentGlobalDataXmin, the core system and the plugin at the very least need to worry about that case when a standby dies. I have a little bit of feedback that I forgot to mention in my earlier reviews, because I thought it was too trivial then: something about the name pg_receivellog annoys me in a way that the name pg_receivexlog does not. Specifically, it looks like someone meant to type pg_receivelog but fat-fingered it. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gaussian distribution pgbench
Hello Mitsumasa,

In the general transaction situation, clients accessing all records equally hardly ever happens. I think gaussian distribution access patterns are most of transaction patterns in general. My patch realizes nearly this access pattern.

That is great! I was just looking for something like that!

I have not looked at the patch yet, but from the plots you sent, it seems that it is a gaussian distribution over the keys. However, this pattern induces stronger cache effects which are maybe not too realistic, because neighboring keys in the middle are more likely to be chosen. It seems to me that this is not desirable.

Have you considered adding a randomization layer? That is, once you have a key in [1 .. n] centered around n/2, you perform a pseudo-random transformation into the same domain so that key values are scattered over the whole domain.

-- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
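One cheap way to build such a randomization layer (a sketch, not from the patch): after drawing the gaussian-distributed key in [1 .. n], apply a fixed pseudo-random permutation of the domain, e.g. multiplication by a constant coprime to n:

static int64
scatter_key(int64 key, int64 n)
{
	/* any multiplier coprime to n yields a bijection on [0, n) */
	const int64 mult = 2654435761;	/* Knuth's multiplicative-hash prime */

	return 1 + (((key - 1) * mult) % n);
}

Neighboring hot keys then land far apart in the key space, preserving the gaussian skew in access frequency while removing the artificial cache locality.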
Re: [HACKERS] Could ANALYZE estimate bloat?
On 09/20/2013 02:17 PM, Stephen Frost wrote: No? Err, I thought that was one of the things like ANALYZE *did* collect through the 'live tuples' number? Nope. live tuples is updated by the stats collector, NOT by analyze. Also, live vs. dead tuples doesn't tell me how much free *space* is available on pages. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
From: Tatsuo Ishii is...@postgresql.org What about limiting to use NCHAR with a database which has same encoding or compatible encoding (on which the encoding conversion is defined)? This way, NCHAR text can be automatically converted from NCHAR to the database encoding in the server side thus we can treat NCHAR exactly same as CHAR afterward. I suppose what encoding is used for NCHAR should be defined in initdb time or creation of the database (if we allow this, we need to add a new column to know what encoding is used for NCHAR). For example, CREATE TABLE t1(t NCHAR(10)) will succeed if NCHAR is UTF-8 and database encoding is UTF-8. Even succeed if NCHAR is SHIFT-JIS and database encoding is UTF-8 because there is a conversion between UTF-8 and SHIFT-JIS. However will not succeed if NCHAR is SHIFT-JIS and database encoding is ISO-8859-1 because there's no conversion between them. Thanks for the idea; it sounds flexible for wider use. Your cooperation in devising an implementation with as little code as possible would be much appreciated. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
From: Valentine Gogichashvili val...@gmail.com the whole NCHAR appeared as a hack for systems that did not have it from the beginning. It would not be needed if all text were magically stored in UNICODE or UTF from the beginning and the idea of a character were the same as the idea of a rune and not a byte. I guess so, too. PostgreSQL has very powerful possibilities for storing any kind of encoding. So maybe it makes sense to add the ENCODING as another column property, the same way a COLLATION was added? Some other people in this community suggested that. And the SQL standard suggests the same -- specifying a character encoding for each column: CHAR(n) CHARACTER SET ch. Text operations should work automatically, as in memory all strings will be converted to the database encoding. This approach will also open a possibility to implement custom ENCODINGs for the column data storage, like snappy compression or even BSON, gobs or protobufs for much more compact type storage. Thanks for your idea; it sounds interesting, although I don't understand it well yet. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
From: Martijn van Oosterhout klep...@svana.org As far as I can tell the whole reason for introducing NCHAR is to support SHIFT-JIS, there hasn't been call for any other encodings, that I can remember anyway. Could you elaborate on this, giving some info sources? So rather than this whole NCHAR thing, why not just add a type sjistext, and a few type casts and call it a day... The main reason for supporting NCHAR types is to ease migration from other DBMSs, not requiring DDL changes. So sjistext does not match that purpose. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
From: Robert Haas robertmh...@gmail.com I don't think that you'll be able to get consensus around that path on this mailing list. I agree that the fact we have both varchar and text feels like a wart.

Is that right? I don't feel the varchar/text case is a wart. I think text was introduced for a positive reason, to ease migration from other DBMSs. The manual says: http://www.postgresql.org/docs/current/static/datatype-character.html Although the type text is not in the SQL standard, several other SQL database management systems have it as well. And isn't EnterpriseDB doing similar things for Oracle compatibility, although I'm not sure about the details? Could you share your reasons why we won't get consensus?

I understand your feeling. The concern about incompatibility can be eliminated by thinking the following way. How about this?
- NCHAR can be used with any database encoding.
- At first, NCHAR is exactly the same as CHAR. That is, the implementation-defined character set described in the SQL standard is the database character set.
- In the future, the character set for NCHAR can be selected at database creation, like Oracle's CREATE DATABASE NATIONAL CHARACTER SET AL16UTF16. The default is the database character set.

Hmm. So under that design, a database could support up to a total of two character sets, the one that you get when you say 'foo' and the other one that you get when you say n'foo'. I guess we could do that, but it seems a bit limited. If we're going to go to the trouble of supporting multiple character sets, why not support an arbitrary number instead of just two?

I agree with you about the arbitrary number. Tatsuo san gave us a good suggestion. Let's consider how to implement that. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
From: Robert Haas robertmh...@gmail.com On Thu, Sep 19, 2013 at 7:58 PM, Tatsuo Ishii is...@postgresql.org wrote: What about limiting to use NCHAR with a database which has same encoding or compatible encoding (on which the encoding conversion is defined)? This way, NCHAR text can be automatically converted from NCHAR to the database encoding in the server side thus we can treat NCHAR exactly same as CHAR afterward. I suppose what encoding is used for NCHAR should be defined in initdb time or creation of the database (if we allow this, we need to add a new column to know what encoding is used for NCHAR). For example, CREATE TABLE t1(t NCHAR(10)) will succeed if NCHAR is UTF-8 and database encoding is UTF-8. Even succeed if NCHAR is SHIFT-JIS and database encoding is UTF-8 because there is a conversion between UTF-8 and SHIFT-JIS. However will not succeed if NCHAR is SHIFT-JIS and database encoding is ISO-8859-1 because there's no conversion between them. I think the point here is that, at least as I understand it, encoding conversion and sanitization happens at a very early stage right now, when we first receive the input from the client. If the user sends a string of bytes as part of a query or bind placeholder that's not valid in the database encoding, it's going to error out before any type-specific code has an opportunity to get control. Look at textin(), for example. There's no encoding check there. That means it's already been done at that point. To make this work, someone's going to have to figure out what to do about *that*. Until we have a sketch of what the design for that looks like, I don't see how we can credibly entertain more specific proposals. OK, I see your point. Let's consider that design. I'll study the code regarding this. Does anybody, especially Tatsuo san, Tom san, or Peter san, have any good ideas? Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Tue, Sep 17, 2013 at 9:29 AM, Robert Haas robertmh...@gmail.com wrote: On Sat, Sep 14, 2013 at 6:27 PM, Peter Geoghegan p...@heroku.com wrote: Note that today there is no guarantee that the original waiter for a duplicate-inserting xact to complete will be the first one to get a second chance. ProcLockWakeup() only wakes as many waiters from the head of the queue as can all be granted the lock without any conflicts. So I don't think there is a race condition in that path. Right, but what about XactLockTableWait() itself? It only acquires a ShareLock on the xid of the got-there-first inserter that potentially hasn't yet committed/aborted. There will be no conflicts between multiple second-chance-seeking blockers trying to acquire this lock concurrently, and so in fact there is (what I guess you'd consider to be) a race condition in the current btree insertion code. So my earlier point about according an upsert implementation license to optimize ordering of retries across multiple unique indexes -- that it isn't really inconsistent with the current code when dealing with only one unique index insertion -- has not been invalidated. EvalPlanQualFetch() and Do_MultiXactIdWait() also call XactLockTableWait(), for similar reasons. In my patch, the later row locking code used by INSERT...ON DUPLICATE KEY LOCK FOR UPDATE calls XactLockTableWait() too. So far it's been a slippery slope type argument that can be equally well used to argue against some facet of almost any substantial patch ever proposed. I don't completely agree with that characterization, but you do have a point. Obviously, if the differences in the area of interruptibility, starvation, deadlock risk, etc. can be made small enough relative to the status quo, then those aren't reasons to reject the approach. That all seems fair to me. That's the standard that I'd apply as a reviewer myself. But I'm skeptical that you're going to be able to accomplish that, especially without adversely affecting maintainability. I think the way that you're proposing to use lwlocks here is sufficiently different from what the rest of the system does that it's going to be hard to avoid system-wide effects that can't easily be caught during code review; Fair enough. In case it isn't already totally clear to someone, I concede that it isn't going to be workable to hold even shared buffer locks across all these operations. Let's get past that, though. and like Andres, I don't share your skepticism about alternative approaches. Well, I expressed skepticism about one alternative approach in particular, which is the promise tuples approach. Andres seems to think that I'm overly concerned about bloat, but I'm not sure he appreciates why I'm so sensitive to it in this instance. I'll be particularly sensitive to it if value locks need to be held indefinitely rather than there being a speculative grab-the-value-locks attempt (because that increases the window in which another session can necessitate that we retry at row locking time quite considerably - see below). I think the fundamental problem with UPSERT, MERGE, and this proposal is what happens when the conflicting tuple is present but not visible to your scan, either because it hasn't committed yet or because it has committed but is not visible to your snapshot. Yeah, you're right. 
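For reference, the XactLockTableWait() pattern mentioned above looks roughly like this (paraphrased from src/backend/storage/lmgr/lmgr.c; details may differ). Since any number of ShareLocks on the xid can be granted together, nothing orders which blocked inserter gets its second chance first:

void
XactLockTableWait(TransactionId xid)
{
	LOCKTAG		tag;

	for (;;)
	{
		Assert(TransactionIdIsValid(xid));

		SET_LOCKTAG_TRANSACTION(tag, xid);

		/* blocks until the owning xact releases its own ExclusiveLock */
		(void) LockAcquire(&tag, ShareLock, false, false);
		LockRelease(&tag, ShareLock, false);

		if (!TransactionIdIsInProgress(xid))
			break;
		xid = SubTransGetParent(xid);
	}
}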
As I mentioned to Andres already, when row locking happens and there is this kind of conflict, my approach is to retry from scratch (go right back to before value lock acquisition) in the sort of scenario that generally necessitates EvalPlanQual() looping, or to throw a serialization failure where that's appropriate. After an unsuccessful attempt at row locking there could well be an interim wait for another xact to finish, before retrying (at read committed isolation level). This is why I think that value locking/retrying should be cheap, and should avoid bloat if at all possible. Forgive me if I'm making a leap here, but it seems like what you're saying is that the semantics of upsert that one might naturally expect are *arguably* fundamentally impossible, because they entail potentially locking a row that isn't current to your snapshot, and you cannot throw a serialization failure at read committed. I respectfully suggest that that exact definition of upsert isn't a useful one, because other snapshot isolation/MVCC systems operating within the same constraints must have the same issues, and yet they manage to implement something that could be called upsert that people seem happy with. I also discovered, after it was committed, that it didn't help in the way I expected: http://www.postgresql.org/message-id/CA+TgmoY8P3sD=ouvig+xzjmzk5-phunv39rtfyzuqxu8hjt...@mail.gmail.com Well, at the time you didn't also provide raw commit latency benchmark results for your hardware using a tool like pg_test_fsync, which I'd consider absolutely essential to such a discussion. That's
Re: [HACKERS] Could ANALYZE estimate bloat?
* Josh Berkus (j...@agliodbs.com) wrote: Also, live vs. dead tuples doesn't tell me how much free *space* is available on pages. I'm not really sure that you'd get much better from ANALYZE than you get from tracking the inserted/updated/deleted tuples. Collecting that information when VACUUM'ing the table would certainly provide much more accurate results, which could possibly be stored in a page-level bitmap of completely empty pages at the beginning of each 1G segment. Alternatively, the bitmap could be updated during processing instead of waiting for a VACUUM. Greg and I hypothesized that such a bitmap might be used to truncate individual 1G segments in the middle of a relation rather than only at the end, perhaps all the way down to a point where only a header plus the page-level bitmap in the 1G segment are left. This was discussed in the context of a VACUUM which used the try-to-elevate-the-lock approach already used to truncate the last 1G segment of the relation, though I've also wondered if it could take page-level locks starting at the end of the 1G segment and walking backwards until it's unable to acquire a lock or a non-empty page is found. Of course, we're aware of the issues around the storage management system and interfaces which might make this entirely unrealistic, but it's getting to a point where, I think (not sure about Greg), we need to deal with that in some way to improve on issues like this. Thanks, Stephen signature.asc Description: Digital signature
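Purely to illustrate the shape of that idea (all names invented; this deliberately ignores the storage-manager issues just mentioned): one bit per page of a 1GB segment is only a 16kB bitmap with the default 8kB block size:

#define PAGES_PER_SEGMENT	RELSEG_SIZE		/* 131072 in default builds */

typedef struct SegmentEmptyMap
{
	uint8		empty[PAGES_PER_SEGMENT / 8];	/* 1 bit per page */
} SegmentEmptyMap;

/* could this whole segment be truncated down to header + bitmap? */
static bool
segment_fully_empty(const SegmentEmptyMap *map)
{
	int			i;

	for (i = 0; i < PAGES_PER_SEGMENT / 8; i++)
	{
		if (map->empty[i] != 0xFF)
			return false;
	}
	return true;
}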
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Peter, * Peter Geoghegan (p...@heroku.com) wrote: Forgive me if I'm making a leap here, but it seems like what you're saying is that the semantics of upsert that one might naturally expect are *arguably* fundamentally impossible, because they entail potentially locking a row that isn't current to your snapshot, and you cannot throw a serialization failure at read committed. I respectfully suggest that that exact definition of upsert isn't a useful one, I'm not sure I follow this completely - you're saying that a definition of 'upsert' which includes having to lock rows which aren't in your current snapshot (for reasons stated) isn't a useful one. Is the implication that a useful definition of 'upsert' is that it *doesn't* have to lock rows which aren't in your current snapshot, and if so, then what would the semantics of that upsert look like? because other snapshot isolation/MVCC systems operating within the same constraints must have the same issues, and yet they manage to implement something that could be called upsert that people seem happy with. This I am generally in agreement with, to the extent that 'upsert' is something we really want and we should figure out a way to get there from here, but it wouldn't be the first time that we worked out a better solution than existing implementations. So, another '+1' from me wrt your working this issue, and please don't get too discouraged that there's a lot of pressure to find a magic bullet - I think part of it is exactly because everyone wants this and wants it to be better than what's out there today. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix
On Sat, Sep 21, 2013 at 7:18 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 20, 2013 at 3:15 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote: David Rowley wrote: I moved the source around and I've patched against it again. New patch attached. Thank you, marked as ready for committer.

+/*
+ * helper function for processing the format string in
+ * log_line_prefix()
+ * This function returns NULL if it finds something which
+ * it deems invalid for the log_line_prefix string.
+ */

Comments should be formatted as a single paragraph. If you want multiple paragraphs, you need to include a line that's blank except for the leading *.

+static const char
+*process_log_prefix_padding(const char *p, int *ppadding)

The asterisk should be on the previous line, separated from "char" by a space.

I've attached another revision of the patch which cleans up the comment for the process_log_prefix_padding function to be more like the comments earlier in the same file. I have also moved the asterisk to the previous line.

+ appendStringInfo(buf, "%*ld", padding, log_line_number);

Is %* supported by all versions of printf() on every platform we support?

Do you specifically mean %*ld, or %* in general? I can see various other places where %*s is used in the source by looking at grep -r "%\*" . But if you do mean specifically %*ld, then we did already use %ld here, and since there are places which use %*s, would it be wrong to assume that %*ld is OK?

Earlier there was some discussion of performance. Was that tested?

I've done some performance tests now. I assume that the processing of the log line prefix would be drowned out by any testing of the number of transactions per second, so I put a few lines at the start of send_message_to_server_log(), which ended up looking like:

static void
send_message_to_server_log(ErrorData *edata)
{
	StringInfoData buf;
	StringInfoData tmpbuf;
	int		i;
	float	startTime, endTime;

	startTime = (float) clock() / CLOCKS_PER_SEC;
	for (i = 0; i < 100; i++)
	{
		initStringInfo(&tmpbuf);
		log_line_prefix(&tmpbuf, edata);
		pfree(tmpbuf.data);
	}
	endTime = (float) clock() / CLOCKS_PER_SEC;
	printf("log_line_prefix (%s) in %f seconds\n", Log_line_prefix, endTime - startTime);

	initStringInfo(&buf);
	...

I am seeing a slowdown in the processing of the 2 log_line_prefix strings that I tested with. 
Here are the results:

Patched (%a %u %d %p %t %m %i %e %c %l %s %v %x )

david@ubuntu64:~/9.4/bin$ ./postgres -D /home/david/9.4/data/
log_line_prefix (%a %u %d %p %t %m %i %e %c %l %s %v %x ) in 3.93 seconds
62324 2013-09-20 05:37:30 NZST 2013-09-20 05:37:30.455 NZST 0 523b3656.f374 101 2013-09-20 05:37:26 NZST 0 LOG: database system was shut down at 2013-09-20 05:36:21 NZST
log_line_prefix (%a %u %d %p %t %m %i %e %c %l %s %v %x ) in 3.94 seconds
62329 2013-09-20 05:37:38 NZST 2013-09-20 05:37:38.724 NZST 0 523b365a.f379 101 2013-09-20 05:37:30 NZST 0 LOG: autovacuum launcher started
log_line_prefix (%a %u %d %p %t %m %i %e %c %l %s %v %x ) in 3.96 seconds
62323 2013-09-20 05:37:38 NZST 2013-09-20 05:37:38.756 NZST 0 523b3656.f373 101 2013-09-20 05:37:26 NZST 0 LOG: database system is ready to accept connections
log_line_prefix (%a %u %d %p %t %m %i %e %c %l %s %v %x ) in 4.82 seconds
psql david postgres 62331 2013-09-20 05:38:43 NZST 2013-09-20 05:38:43.490 NZST SELECT 22012 523b3688.f37b 101 2013-09-20 05:38:16 NZST 2/4 0 ERROR: division by zero
psql david postgres 62331 2013-09-20 05:38:43 NZST 2013-09-20 05:38:43.490 NZST SELECT 22012 523b3688.f37b 102 2013-09-20 05:38:16 NZST 2/4 0 STATEMENT: select 1/0;
log_line_prefix (%a %u %d %p %t %m %i %e %c %l %s %v %x ) in 4.69 seconds
psql david postgres 62331 2013-09-20 05:39:35 NZST 2013-09-20 05:39:35.900 NZST SELECT 22012 523b3688.f37b 203 2013-09-20 05:38:16 NZST 2/5 0 ERROR: division by zero
psql david postgres 62331 2013-09-20 05:39:35 NZST 2013-09-20 05:39:35.900 NZST SELECT 22012 523b3688.f37b 204 2013-09-20 05:38:16 NZST 2/5 0 STATEMENT: select 1/0;

Unpatched (%a %u %d %p %t %m %i %e %c %l %s %v %x )

david@ubuntu64:~/9.4/bin$ ./postgres -D /home/david/9.4/data/
log_line_prefix (%a %u %d %p %t %m %i %e %c %l %s %v %x ) in 3.12 seconds
925 2013-09-20 05:45:48 NZST 2013-09-20 05:45:48.483 NZST 0 523b3849.39d 101 2013-09-20 05:45:45 NZST 0 LOG: database system was shut down at 2013-09-20 05:40:47 NZST
log_line_prefix (%a %u %d %p %t %m %i %e %c %l %s %v %x ) in 3.10 seconds
924 2013-09-20 05:45:54 NZST 2013-09-20 05:45:54.970 NZST 0 523b3849.39c 101 2013-09-20 05:45:45 NZST 0 LOG: database system is ready to accept connections
log_line_prefix (%a %u %d %p %t %m %i %e %c %l %s %v %x ) in 3.12 seconds
931 2013-09-20 05:45:55 NZST 2013-09-20 05:45:55.015 NZST 0 523b384c.3a3 101 2013-09-20
Re: [HACKERS] pgbench progress report improvements
For others following along, Pavel Stehule reviewed this on a different thread: http://www.postgresql.org/message-id/cafj8pravyuhv8x3feewasggvdcijogvlzsavd7rmwn9mw8r...@mail.gmail.com

On Tue, Aug 06, 2013 at 10:47:14AM +0200, Fabien wrote: Improve pgbench measurements progress report.

These changes are loosely coupled; please separate them into several patch files:

- Use progress option both under init bench. Activate progress report by default, every 5 seconds. When initializing, --quiet reverts to the old every 100,000 insertions behavior...

Patch (1): Change the default from --progress=0 to --progress=5.

This has a disadvantage of causing two extra gettimeofday() calls per transaction. That's not cheap on all platforms; a user comparing pgbench results across versions will need to make a point of disabling this again to make his results comparable. Given that threat and uncertainty about which default would be more popular, I think we should drop this part.

Patch (2): Make --initialize mode respect --progress.

The definition you've chosen for --quiet makes that option contrary to its own name: message-per-100k-tuples is typically more voluminous than your proposed new default of message-per-5s. In fact, since --quiet currently switches from message-per-100k-tuples to message-per-5s, your patch gives it the exact opposite of its present effect.

During the 9.3 development cycle, we _twice_ made changes pertaining to --initialize message frequency:
http://www.postgresql.org/message-id/flat/20120620.170427.347012755716399568.t-is...@sraoss.co.jp
http://www.postgresql.org/message-id/flat/1346472039.18010.10.ca...@vanquo.pezone.net
That gives me pause about working through yet another change; we keep burning time on this minor issue.

- Measure transaction latency instead of computing it. The previous computed figure does not make sense under throttling, as sleep throttling time was included in the figures. The latency and its standard deviation are printed out under progress and in the final report.

Patch (3): Report the standard deviation of latency.

Seems potentially useful; see inline comments below. In my limited testing, the interesting latency cases involved stray clusters of transactions showing 10x-100x mean latency. If that's typical, I'd doubt mean/stddev is a great analytical tool. But I have little reason to believe that my experience here was representative.

Patch (4): Redefine latency as reported by pgbench and report lag more.

If we do this, should we also augment the --log output to contain the figures necessary to reconstruct this calculation? I think that would mean adding fields for the time when the first command of the transaction was sent.

- Take thread start time at the beginning of the thread. Otherwise it includes pretty slow thread/fork system start times in the measurements. May help with bug #8326.

Patch (5)

That theory is sound, but I would like at least one report reproducing that behavior and finding that this change improved it.

- Reduce and compensate throttling underestimation bias. This is a consequence of relying on an integer random generator. It was about 0.5% with 1000 distinct values. Multiplier added to compensate the 0.05% bias with 1 distinct integer values.

Patch (6)

See inline comment below.

- Updated documentation help message.

Include any relevant documentation and --help updates with the patch necessitating them. If there are errors in the documentation today, put fixes for those in a separate patch (7). 
Additionally, this patch contains numerous whitespace-only changes. Do not mix those into a patch that makes a functional change. Most of them would be done or undone by the next pgindent run, so I suggest simply dropping them. See the first entry here: https://wiki.postgresql.org/wiki/Submitting_a_Patch#Reasons_your_patch_might_be_returned

Patches (5) and (6) are nicely small and free of UI questions. At a minimum, let's try to get those done in this CommitFest. I propose dropping patches (1) and (2). Patches (3) and (4) have open questions, but I think they have good value potential.

Sample output under benchmarking with --progress=1:

progress: 36.0 s, 10.9 tps, 91.864 +- 124.874 ms lat
progress: 37.0 s, 115.2 tps, 8.678 +- 1.792 ms lat

"x +- y" does not convey that y is a standard deviation. I suggest getting the term "stddev" in there somehow, maybe like this:

progress: 37.0 s, 115.2 tps, latency avg=8.678 ms stddev=1.792

--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c

-	 -R, --rate=SPEC target rate in transactions per second\n
+	 -R, --rate=NUM  target rate in transactions per second\n

This would fall under patch (7) if you feel it's needed.

@@ -928,14 +931,18 @@ top:
 * Use inverse transform sampling to randomly generate a delay,