[HACKERS] \copy (query) delimiter syntax error

2007-02-03 Thread Michael Fuhr
psql's \copy (query) with a delimiter yields a syntax error:

  test=> \copy foo to foo.txt delimiter '|'
  (works)

  test=> \copy (select * from foo) to foo.txt
  (works)

  test=> \copy (select * from foo) to foo.txt delimiter '|'
  ERROR:  syntax error at or near "USING"
  LINE 1: COPY ( select * from foo ) TO STDOUT USING DELIMITERS '|'

The problem is that \copy sends USING DELIMITERS for backward
compatibility (comment on line 502 of src/bin/psql/copy.c) but that
COPY (query) doesn't support USING DELIMITERS:

  CopyStmt:   COPY opt_binary qualified_name opt_column_list opt_oids
  copy_from copy_file_name copy_delimiter opt_with copy_opt_list
  ...
  | COPY select_with_parens TO copy_file_name opt_with
copy_opt_list

  copy_delimiter:
  /* USING DELIMITERS kept for backward compatibility. 2002-06-15 */
 opt_using DELIMITERS Sconst

What should be fixed -- COPY or \copy?  Does psql's \copy still
need backward compatibility with now-unsupported pre-7.3 servers?
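For reference, the 8.2 grammar accepts the modern option list after
COPY (query); only the legacy USING DELIMITERS spelling is rejected.
A quick sketch using the same example table:

```sql
-- Accepted in 8.2: COPY (query) with the modern option syntax
COPY (SELECT * FROM foo) TO STDOUT DELIMITER '|';

-- Rejected in 8.2: USING DELIMITERS exists only in the table-name production
COPY (SELECT * FROM foo) TO STDOUT USING DELIMITERS '|';
```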

-- 
Michael Fuhr

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Referential Integrity and SHARE locks

2007-02-03 Thread Simon Riggs
On Fri, 2007-02-02 at 16:50 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  On Fri, 2007-02-02 at 15:57 -0500, Tom Lane wrote:
  , and it doesn't scale to
  more than two holders, and I don't think it works for combinations of
  share and exclusive lock.  Also, what happened to the third type of lock?
 
  Well, we just need to record the maximum two lock holders (given the
  semantics described). The third lock type is both locks at once.
 
 You're not going to support shared locks?  That will be a big step
 backwards ...

I did say that Shared locks were supported also. The lack of ordering of
multitransactions is a hole in my suggestion, so I need to reconsider.

  Anyway, implementation aside, I wanted to agree the overall TODO, so we
  can think through the best way over a long period, if you agree in
  general.
 
 No, I don't.  I think knowledge of which columns are in a PK is quite a
 few levels away from the semantics of row locking.  To point out just
 one problem, what happens when you add or drop a PK?  Or drop and
 replace with a different column set?  Yes, I know dropping one requires
 exclusive lock on the table, but the transaction doing it could hold row
 locks within the table, and now it's very unclear what they mean.

There are issues, yes. Dropping PKs is a very irregular occurrence, and it
is unlikely to be part of a complex transaction. It wouldn't bother me
to say that if a transaction already holds a RowExclusiveLock or a
RowShareLock it cannot upgrade to an AccessExclusiveLock.

For me, PKs are intimately tied to the rows they uniquely identify; we
should be mentally linking them and seeing them as a replacement for
Oids, which we still see as intimately linked at low level. We'll be
missing many optimizations if we don't; we've discussed others this week
already.

The TODO I was requesting you consider was this:

Develop non-conflicting locking scheme to allow RI checks to co-exist
peacefully with non-PK UPDATEs on the referenced table.

That is, IMHO, a general statement of an important unresolved issue with
our Referential Integrity implementation. That is in no way intended as
any form of negative commentary on the excellent detailed work that has
got us so far already.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] \copy (query) delimiter syntax error

2007-02-03 Thread Andrew Dunstan
Michael Fuhr wrote:
 psql's \copy (query) with a delimiter yields a syntax error:

   test=> \copy foo to foo.txt delimiter '|'
   (works)

   test=> \copy (select * from foo) to foo.txt
   (works)

   test=> \copy (select * from foo) to foo.txt delimiter '|'
   ERROR:  syntax error at or near "USING"
   LINE 1: COPY ( select * from foo ) TO STDOUT USING DELIMITERS '|'

 The problem is that \copy sends USING DELIMITERS for backward
 compatibility (comment on line 502 of src/bin/psql/copy.c) but that
 COPY (query) doesn't support USING DELIMITERS:

   CopyStmt:   COPY opt_binary qualified_name opt_column_list opt_oids
   copy_from copy_file_name copy_delimiter opt_with
 copy_opt_list
   ...
   | COPY select_with_parens TO copy_file_name opt_with
 copy_opt_list

   copy_delimiter:
   /* USING DELIMITERS kept for backward compatibility.
 2002-06-15 */
  opt_using DELIMITERS Sconst

 What should be fixed -- COPY or \copy?  Does psql's \copy still
 need backward compatibility with now-unsupported pre-7.3 servers?



I'd say fix psql. Not sure how far back we should backpatch it. It's
interesting that this has been there since 8.0 and is only now discovered.

cheers

andrew





Re: [HACKERS] \copy (query) delimiter syntax error

2007-02-03 Thread Michael Fuhr
On Sat, Feb 03, 2007 at 10:52:29AM -0600, Andrew Dunstan wrote:
 I'd say fix psql. Not sure how far back we should backpatch it. It's
 interesting that this has been there since 8.0 and is only now discovered.

The problem is new in 8.2 because COPY (query) doesn't support USING
DELIMITERS.  COPY tablename does, so it has worked all along.

-- 
Michael Fuhr



Re: [HACKERS] Referential Integrity and SHARE locks

2007-02-03 Thread Stephan Szabo
On Sat, 3 Feb 2007, Simon Riggs wrote:

 On Fri, 2007-02-02 at 16:50 -0500, Tom Lane wrote:
  No, I don't.  I think knowledge of which columns are in a PK is quite a
  few levels away from the semantics of row locking.  To point out just
  one problem, what happens when you add or drop a PK?  Or drop and
  replace with a different column set?  Yes, I know dropping one requires
  exclusive lock on the table, but the transaction doing it could hold row
  locks within the table, and now it's very unclear what they mean.

 There are issues, yes. Dropping PKs is a very irregular occurrence, and it
 is unlikely to be part of a complex transaction. It wouldn't bother me
 to say that if a transaction already holds a RowExclusiveLock or a
 RowShareLock it cannot upgrade to an AccessExclusiveLock.

The lock check seems like a strange constraint, given that it's not
necessarily going to be anything that conflicts with the row locks. I'm
not sure there'd be a better idea given this sort of scheme, but it still
seems strange.

 The TODO I was requesting you consider was this:

 Develop non-conflicting locking scheme to allow RI checks to co-exist
 peacefully with non-PK UPDATEs on the referenced table.

 That is, IMHO, a general statement of an important unresolved issue with
 our Referential Integrity implementation. That is in no way intended as
 any form of negative commentary on the excellent detailed work that has
 got us so far already.

Well, if we really want to solve that completely then we really need
column locking, or at least locking at the level of arbitrary (possibly
overlapping) unique constraints, not just the PK because foreign keys
don't necessarily reference the primary key.  But the PK case is certainly
the most common and it'd certainly be nice to cover that case.



Re: [HACKERS] --enable-debug does not work with gcc

2007-02-03 Thread Matthias Luedtke

Hi all,

Magnus Hagander wrote:

Personally, in my development tree I use a Makefile.custom containing

# back off optimization unless profiling
ifeq ($(PROFILE),)
   CFLAGS:= $(patsubst -O2,-O1,$(CFLAGS))
endif
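The effect of that override is easy to check outside the PostgreSQL tree
with a throwaway makefile (a sketch; assumes GNU make, and `$(info ...)`
is used so no recipe tabs are needed):

```shell
# Standalone sketch of the Makefile.custom trick (assumes GNU make).
# $(info ...) prints the final CFLAGS while the makefile is being read.
cd "$(mktemp -d)"
cat > Makefile <<'EOF'
CFLAGS := -Wall -O2 -g
ifeq ($(PROFILE),)
CFLAGS := $(patsubst -O2,-O1,$(CFLAGS))
endif
$(info $(CFLAGS))
all: ;
EOF
make -s                # prints -Wall -O1 -g  (optimization backed off)
make -s PROFILE=-pg    # prints -Wall -O2 -g  (full -O2 kept for profiling)
```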

-O1 still generates uninitialized variable warnings but the code is a
lot saner to step through ... not perfect, but saner.  It's been a
workable compromise for a long time.  I don't recommend developing with
-O0, exactly because it disables some mighty valuable warnings.

Agreed.  I use -O1 by default myself, unless I am doing performance testing.


Something for the developers' FAQ perhaps? I confess I did not know of
Makefile.custom :-D


I did not know, either, but would have been glad, had I found such info
a few weeks ago when I started digging into the pg source ;)

And while we are at it, how about some CFLAGS=-DOPTIMIZER_DEBUG
sweetness for the debugging section in the Dev FAQ?

Are there any other macros that enable some more debug output? (Or is
there a place where all this has already been documented?)

Cheers,
Matthias




Re: [HACKERS] \copy (query) delimiter syntax error

2007-02-03 Thread Andrew Dunstan



Michael Fuhr wrote:

On Sat, Feb 03, 2007 at 10:52:29AM -0600, Andrew Dunstan wrote:
  

I'd say fix psql. Not sure how far back we should backpatch it. It's
interesting that this has been there since 8.0 and is only now discovered.



The problem is new in 8.2 because COPY (query) doesn't support USING
DELIMITERS.  COPY tablename does, so it has worked all along.

  


oh, good point. OK, I have cut this quick patch that will continue to 
accept the legacy syntax in psql in non-inline-query cases, but will 
make psql unreservedly emit new style syntax for COPY to the backend.  
Does that seem reasonable, or is it too much of a change for the stable 
branch?


cheers

andrew
Index: src/bin/psql/copy.c
===
RCS file: /cvsroot/pgsql/src/bin/psql/copy.c,v
retrieving revision 1.72
diff -c -r1.72 copy.c
*** src/bin/psql/copy.c	5 Jan 2007 22:19:49 -	1.72
--- src/bin/psql/copy.c	3 Feb 2007 19:37:34 -
***
*** 118,123 
--- 118,124 
  	char	   *token;
  	const char *whitespace = " \t\n\r";
  	char		nonstd_backslash = standard_strings() ? 0 : '\\';
+ 	bool		have_query = false;
  
  	if (args)
  		line = pg_strdup(args);
***
*** 163,168 
--- 164,170 
  			xstrcat(result->table, " ");
  			xstrcat(result->table, token);
  		}
+ 		have_query = true;
  	}
  
  	token = strtokx(NULL, whitespace, ".,()", "\"",
***
*** 268,291 
  	0, false, false, pset.encoding);
  
  	/*
! 	 * Allows old COPY syntax for backward compatibility 2002-06-19
  	 */
! 	if (token && pg_strcasecmp(token, "using") == 0)
! 	{
! 		token = strtokx(NULL, whitespace, NULL, NULL,
! 		0, false, false, pset.encoding);
! 		if (!(token && pg_strcasecmp(token, "delimiters") == 0))
! 			goto error;
! 	}
! 	if (token && pg_strcasecmp(token, "delimiters") == 0)
  	{
! 		token = strtokx(NULL, whitespace, NULL, "'",
! 		nonstd_backslash, true, false, pset.encoding);
! 		if (!token)
! 			goto error;
! 		result->delim = pg_strdup(token);
! 		token = strtokx(NULL, whitespace, NULL, NULL,
! 		0, false, false, pset.encoding);
  	}
  
  	if (token)
--- 270,297 
  	0, false, false, pset.encoding);
  
  	/*
! 	 * Allows old COPY syntax for backward compatibility.
! 	 * Skip if we have an inline query instead of a table name.
  	 */
! 	if (! have_query)
  	{
! 		if (token && pg_strcasecmp(token, "using") == 0)
! 		{
! 			token = strtokx(NULL, whitespace, NULL, NULL,
! 			0, false, false, pset.encoding);
! 			if (!(token && pg_strcasecmp(token, "delimiters") == 0))
! 				goto error;
! 		}
! 		if (token && pg_strcasecmp(token, "delimiters") == 0)
! 		{
! 			token = strtokx(NULL, whitespace, NULL, "'",
! 			nonstd_backslash, true, false, pset.encoding);
! 			if (!token)
! 				goto error;
! 			result->delim = pg_strdup(token);
! 			token = strtokx(NULL, whitespace, NULL, NULL,
! 			0, false, false, pset.encoding);
! 		}
  	}
  
  	if (token)
***
*** 480,511 
  
  	printfPQExpBuffer(query, "COPY ");
  
- 	/* Uses old COPY syntax for backward compatibility 2002-06-19 */
- 	if (options->binary)
- 		appendPQExpBuffer(query, "BINARY ");
- 
  	appendPQExpBuffer(query, "%s ", options->table);
  
  	if (options->column_list)
  		appendPQExpBuffer(query, "%s ", options->column_list);
  
- 	/* Uses old COPY syntax for backward compatibility 2002-06-19 */
- 	if (options->oids)
- 		appendPQExpBuffer(query, "WITH OIDS ");
- 
  	if (options->from)
  		appendPQExpBuffer(query, "FROM STDIN");
  	else
  		appendPQExpBuffer(query, "TO STDOUT");
  
  
! 	/* Uses old COPY syntax for backward compatibility 2002-06-19 */
  	if (options->delim)
! 		emit_copy_option(query, " USING DELIMITERS ", options->delim);
  
- 	/* There is no backward-compatible CSV syntax */
  	if (options->null)
! 		emit_copy_option(query, " WITH NULL AS ", options->null);
  
  	if (options->csv_mode)
  		appendPQExpBuffer(query, " CSV");
--- 486,513 
  
  	printfPQExpBuffer(query, "COPY ");
  
  	appendPQExpBuffer(query, "%s ", options->table);
  
  	if (options->column_list)
  		appendPQExpBuffer(query, "%s ", options->column_list);
  
  	if (options->from)
  		appendPQExpBuffer(query, "FROM STDIN");
  	else
  		appendPQExpBuffer(query, "TO STDOUT");
  
  
! 	if (options->binary)
! 		appendPQExpBuffer(query, " BINARY ");
! 
! 	if (options->oids)
! 		appendPQExpBuffer(query, " OIDS ");
! 
  	if (options->delim)
! 		emit_copy_option(query, " DELIMITER ", options->delim);
  
  	if (options->null)
! 		emit_copy_option(query, " NULL AS ", options->null);
  
  	if (options->csv_mode)
  		appendPQExpBuffer(query, " CSV");



Re: [HACKERS] Proposal: Commit timestamp

2007-02-03 Thread Jan Wieck

On 2/1/2007 11:23 PM, Jim Nasby wrote:

On Jan 25, 2007, at 6:16 PM, Jan Wieck wrote:
If a per database configurable tslog_priority is given, the  
timestamp will be truncated to milliseconds and the increment logic  
is done on milliseconds. The priority is added to the timestamp.  
This guarantees that no two timestamps for commits will ever be  
exactly identical, even across different servers.


Wouldn't it be better to just store that information separately,  
rather than mucking with the timestamp?


Though, there's another issue here... I don't think NTP is good for  
any better than a few milliseconds, even on a local network.


How exact does the conflict resolution need to be, anyway? Would it  
really be a problem if transaction B committed 0.1 seconds after  
transaction A yet the cluster thought it was the other way around?


Since the timestamp is basically a Lamport counter which is just bumped 
by the clock as well, it doesn't need to be too precise.



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #



Re: [HACKERS] Proposal: Commit timestamp

2007-02-03 Thread Theo Schlossnagle


On Feb 3, 2007, at 3:52 PM, Jan Wieck wrote:


On 2/1/2007 11:23 PM, Jim Nasby wrote:

On Jan 25, 2007, at 6:16 PM, Jan Wieck wrote:
If a per database configurable tslog_priority is given, the   
timestamp will be truncated to milliseconds and the increment  
logic  is done on milliseconds. The priority is added to the  
timestamp.  This guarantees that no two timestamps for commits  
will ever be  exactly identical, even across different servers.
Wouldn't it be better to just store that information separately,   
rather than mucking with the timestamp?
Though, there's another issue here... I don't think NTP is good  
for  any better than a few milliseconds, even on a local network.
How exact does the conflict resolution need to be, anyway? Would  
it  really be a problem if transaction B committed 0.1 seconds  
after  transaction A yet the cluster thought it was the other way  
around?


Since the timestamp is basically a Lamport counter which is just  
bumped by the clock as well, it doesn't need to be too precise.


Unless I'm missing something, you are _treating_ the counter as a  
Lamport timestamp, when in fact it is not and thus does not provide  
semantics of a Lamport timestamp.  As such, any algorithms that use  
lamport timestamps as a basis or assumption for the proof of their  
correctness will not translate (provably) to this system.


How is your counter semantically equivalent to Lamport timestamps?

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/





Re: [HACKERS] --enable-debug does not work with gcc

2007-02-03 Thread Peter Eisentraut
Matthias Luedtke wrote:
 And while we are at it, how about some CFLAGS=-DOPTIMIZER_DEBUG
 sweetness for the debugging section in the Dev FAQ?

Most of the debugging macros are not documented because their purpose 
only becomes apparent from reading the source code.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] Proposal: Commit timestamp

2007-02-03 Thread Jan Wieck

On 2/3/2007 4:05 PM, Theo Schlossnagle wrote:

On Feb 3, 2007, at 3:52 PM, Jan Wieck wrote:


On 2/1/2007 11:23 PM, Jim Nasby wrote:

On Jan 25, 2007, at 6:16 PM, Jan Wieck wrote:
If a per database configurable tslog_priority is given, the   
timestamp will be truncated to milliseconds and the increment  
logic  is done on milliseconds. The priority is added to the  
timestamp.  This guarantees that no two timestamps for commits  
will ever be  exactly identical, even across different servers.
Wouldn't it be better to just store that information separately,   
rather than mucking with the timestamp?
Though, there's anothe issue here... I don't think NTP is good  
for  any better than a few milliseconds, even on a local network.
How exact does the conflict resolution need to be, anyway? Would  
it  really be a problem if transaction B committed 0.1 seconds  
after  transaction A yet the cluster thought it was the other way  
around?


Since the timestamp is basically a Lamport counter which is just  
bumped by the clock as well, it doesn't need to be too precise.


Unless I'm missing something, you are _treating_ the counter as a  
Lamport timestamp, when in fact it is not and thus does not provide  
semantics of a Lamport timestamp.  As such, any algorithms that use  
lamport timestamps as a basis or assumption for the proof of their  
correctness will not translate (provably) to this system.


How is your counter semantically equivalent to Lamport timestamps?


Yes, you must be missing something.

The last used timestamp is remembered. When a remote transaction is 
replicated, the remembered timestamp is set to max(remembered, remote). 
For a local transaction, the remembered timestamp is set to 
max(remembered+1ms, systemclock) and that value is used as the 
transaction commit timestamp.
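In other words, a sketch of that rule (my own Python pseudocode; the class
and method names and the millisecond granularity are illustrative, this is
not the proposed C implementation):

```python
from dataclasses import dataclass

@dataclass
class CommitTimestampClock:
    """Sketch of the scheme described above: a Lamport-style counter
    that is also bumped by the wall clock.  Times are in milliseconds."""
    remembered: int = 0  # last timestamp handed out

    def on_remote_commit(self, remote_ts: int) -> None:
        # Replicated transaction: fold the remote timestamp in.
        self.remembered = max(self.remembered, remote_ts)

    def on_local_commit(self, system_clock_ms: int) -> int:
        # Local transaction: advance at least 1 ms past the last value,
        # but follow the system clock whenever it is ahead.
        self.remembered = max(self.remembered + 1, system_clock_ms)
        return self.remembered

clock = CommitTimestampClock()
print(clock.on_local_commit(1000))   # 1000  (system clock is ahead)
clock.on_remote_commit(5000)         # remote node's clock is ahead of ours
print(clock.on_local_commit(1002))   # 5001  (monotonic despite clock skew)
```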



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #



Re: [HACKERS] Proposal: Change of pg_trigger.tg_enabled and adding

2007-02-03 Thread Jan Wieck
Attached is the implementation of the proposed changes as a patch for 
discussion.


The chosen syntax is backward compatible and uses

ALTER TABLE tab ENABLE TRIGGER trig (fires on origin - default)
ALTER TABLE tab DISABLE TRIGGER trig (disabled)
ALTER TABLE tab ENABLE REPLICA TRIGGER trig (fires on replica only)
ALTER TABLE tab ENABLE ALWAYS TRIGGER trig (fires always)

A session's current role is controlled by the PG_SUSET GUC 
session_replication_role. The possible states are origin, replica and 
local. The local state is identical to origin with respect to trigger 
firing. It is intended to be used to issue statements that do not get 
replicated at all.
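Putting the pieces together, a subscriber-side session under this proposal
might look like the following sketch (the table and trigger names are
hypothetical; the GUC values are those proposed above):

```sql
-- Fire this trigger only when rows arrive via replication
ALTER TABLE orders ENABLE REPLICA TRIGGER orders_audit;

-- The replication engine marks its session as applying remote changes
SET session_replication_role = replica;

-- A maintenance session whose statements should not be replicated at all
SET session_replication_role = local;
```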


The commands psql and pg_dump are adjusted in a backward compatible 
manner. Although I noticed that psql currently is incompatible with at 
least 8.1 databases due to querying indisvalid on \d.


Comments?


Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
Index: src/backend/commands/tablecmds.c
===
RCS file: /usr/local/pgsql/CvsRoot/pgsql/src/backend/commands/tablecmds.c,v
retrieving revision 1.213
diff -u -r1.213 tablecmds.c
--- src/backend/commands/tablecmds.c2 Feb 2007 00:07:02 -   1.213
+++ src/backend/commands/tablecmds.c2 Feb 2007 20:27:47 -
@@ -3,7 +3,7 @@
  * tablecmds.c
  *   Commands for creating and altering table structures and settings
  *
- * Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1996-2007, PostgreSQL Global DevelopmEnt Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  *
@@ -252,7 +252,7 @@
 static void ATExecSetTableSpace(Oid tableOid, Oid newTableSpace);
 static void ATExecSetRelOptions(Relation rel, List *defList, bool isReset);
 static void ATExecEnableDisableTrigger(Relation rel, char *trigname,
-					   bool enable, bool skip_system);
+					   char fires_when, bool skip_system);
 static void ATExecAddInherit(Relation rel, RangeVar *parent);
 static void ATExecDropInherit(Relation rel, RangeVar *parent);
 static void copy_relation_data(Relation rel, SMgrRelation dst);
@@ -2192,6 +2192,8 @@
pass = AT_PASS_MISC;
break;
case AT_EnableTrig: /* ENABLE TRIGGER variants */
+   case AT_EnableAlwaysTrig:
+   case AT_EnableReplicaTrig:
case AT_EnableTrigAll:
case AT_EnableTrigUser:
case AT_DisableTrig:/* DISABLE TRIGGER variants */
@@ -2364,24 +2366,40 @@
case AT_ResetRelOptions:/* RESET (...) */
ATExecSetRelOptions(rel, (List *) cmd->def, true);
break;
-   case AT_EnableTrig: /* ENABLE TRIGGER name */
-   ATExecEnableDisableTrigger(rel, cmd->name, true, false);
+
+   case AT_EnableTrig: /* ENABLE TRIGGER name 
*/
+   ATExecEnableDisableTrigger(rel, cmd->name, 
+   TRIGGER_FIRES_ON_ORIGIN, false);
+   break;
+   case AT_EnableAlwaysTrig:   /* ENABLE ALWAYS TRIGGER name */
+   ATExecEnableDisableTrigger(rel, cmd->name, 
+   TRIGGER_FIRES_ALWAYS, false);
+   break;
+   case AT_EnableReplicaTrig:  /* ENABLE REPLICA TRIGGER name 
*/
+   ATExecEnableDisableTrigger(rel, cmd->name, 
+   TRIGGER_FIRES_ON_REPLICA, false);
break;
case AT_DisableTrig:/* DISABLE TRIGGER name */
-   ATExecEnableDisableTrigger(rel, cmd->name, false, 
false);
+   ATExecEnableDisableTrigger(rel, cmd->name, 
+   TRIGGER_DISABLED, false);
break;
case AT_EnableTrigAll:  /* ENABLE TRIGGER ALL */
-   ATExecEnableDisableTrigger(rel, NULL, true, false);
+   ATExecEnableDisableTrigger(rel, NULL, 
+   TRIGGER_FIRES_ON_ORIGIN, false);
break;
case AT_DisableTrigAll: /* DISABLE TRIGGER ALL */
-   ATExecEnableDisableTrigger(rel, NULL, false, false);
+   ATExecEnableDisableTrigger(rel, NULL, 
+   TRIGGER_DISABLED, false);
break;
case AT_EnableTrigUser: /* ENABLE 

Re: [HACKERS] Proposal: Commit timestamp

2007-02-03 Thread Theo Schlossnagle


On Feb 3, 2007, at 4:38 PM, Jan Wieck wrote:


On 2/3/2007 4:05 PM, Theo Schlossnagle wrote:

On Feb 3, 2007, at 3:52 PM, Jan Wieck wrote:

On 2/1/2007 11:23 PM, Jim Nasby wrote:

On Jan 25, 2007, at 6:16 PM, Jan Wieck wrote:
If a per database configurable tslog_priority is given, the
timestamp will be truncated to milliseconds and the increment   
logic  is done on milliseconds. The priority is added to the   
timestamp.  This guarantees that no two timestamps for commits   
will ever be  exactly identical, even across different servers.
Wouldn't it be better to just store that information  
separately,   rather than mucking with the timestamp?
Though, there's another issue here... I don't think NTP is good   
for  any better than a few milliseconds, even on a local network.
How exact does the conflict resolution need to be, anyway?  
Would  it  really be a problem if transaction B committed 0.1  
seconds  after  transaction A yet the cluster thought it was the  
other way  around?


Since the timestamp is basically a Lamport counter which is just   
bumped by the clock as well, it doesn't need to be too precise.
Unless I'm missing something, you are _treating_ the counter as a   
Lamport timestamp, when in fact it is not and thus does not  
provide  semantics of a Lamport timestamp.  As such, any  
algorithms that use  lamport timestamps as a basis or assumption  
for the proof of their  correctness will not translate (provably)  
to this system.

How is your counter semantically equivalent to Lamport timestamps?


Yes, you must be missing something.

The last used timestamp is remembered. When a remote transaction is  
replicated, the remembered timestamp is set to max(remembered,  
remote). For a local transaction, the remembered timestamp is set  
to max(remembered+1ms, systemclock) and that value is used as the  
transaction commit timestamp.


A Lamport clock, IIRC, requires a cluster-wide tick.  This seems based  
only on activity and is thus an observational tick only which means  
various nodes can have various perspectives at different times.


Given that time skew is prevalent, why is the system clock involved  
at all?


As is usual with distributed systems problems, they are very hard to  
explain casually and also hard to review from a theoretical angle  
without a proof.  Are you basing this off a paper?  If so which one?   
If not, have you written a rigorous proof of correctness for this  
approach?


// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/





Re: [HACKERS] Proposal: Commit timestamp

2007-02-03 Thread Jan Wieck

On 2/3/2007 4:58 PM, Theo Schlossnagle wrote:

On Feb 3, 2007, at 4:38 PM, Jan Wieck wrote:


On 2/3/2007 4:05 PM, Theo Schlossnagle wrote:

On Feb 3, 2007, at 3:52 PM, Jan Wieck wrote:

On 2/1/2007 11:23 PM, Jim Nasby wrote:

On Jan 25, 2007, at 6:16 PM, Jan Wieck wrote:
If a per database configurable tslog_priority is given, the
timestamp will be truncated to milliseconds and the increment   
logic  is done on milliseconds. The priority is added to the   
timestamp.  This guarantees that no two timestamps for commits   
will ever be  exactly identical, even across different servers.
Wouldn't it be better to just store that information  
separately,   rather than mucking with the timestamp?
Though, there's another issue here... I don't think NTP is good   
for  any better than a few milliseconds, even on a local network.
How exact does the conflict resolution need to be, anyway?  
Would  it  really be a problem if transaction B committed 0.1  
seconds  after  transaction A yet the cluster thought it was the  
other way  around?


Since the timestamp is basically a Lamport counter which is just   
bumped by the clock as well, it doesn't need to be too precise.
Unless I'm missing something, you are _treating_ the counter as a   
Lamport timestamp, when in fact it is not and thus does not  
provide  semantics of a Lamport timestamp.  As such, any  
algorithms that use  lamport timestamps as a basis or assumption  
for the proof of their  correctness will not translate (provably)  
to this system.

How is your counter semantically equivalent to Lamport timestamps?


Yes, you must be missing something.

The last used timestamp is remembered. When a remote transaction is  
replicated, the remembered timestamp is set to max(remembered,  
remote). For a local transaction, the remembered timestamp is set  
to max(remembered+1ms, systemclock) and that value is used as the  
transaction commit timestamp.


A Lamport clock, IIRC, requires a cluster-wide tick.  This seems based  
only on activity and is thus an observational tick only which means  
various nodes can have various perspectives at different times.


Given that time skew is prevalent, why is the system clock involved  
at all?


This question was already answered.

As is usual with distributed systems problems, they are very hard to  
explain casually and also hard to review from a theoretical angle  
without a proof.  Are you basing this off a paper?  If so which one?   
If not, have you written a rigorous proof of correctness for this  
approach?


I don't have any such paper and the proof of concept will be the 
implementation of the system. I do however see enough resistance against 
this proposal to withdraw the commit timestamp at this time. The new 
replication system will therefore require the installation of a patched, 
non-standard PostgreSQL version, compiled from sources cluster wide in 
order to be used. I am aware that this will dramatically reduce its 
popularity but it is impossible to develop this essential feature as an 
external module.


I thank everyone for their attention.


Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #



Re: [HACKERS] Proposal: Commit timestamp

2007-02-03 Thread Theo Schlossnagle


On Feb 3, 2007, at 5:09 PM, Jan Wieck wrote:


On 2/3/2007 4:58 PM, Theo Schlossnagle wrote:

I don't have any such paper and the proof of concept will be the  
implementation of the system. I do however see enough resistance  
against this proposal to withdraw the commit timestamp at this  
time. The new replication system will therefore require the  
installation of a patched, non-standard PostgreSQL version,  
compiled from sources cluster wide in order to be used. I am aware  
that this will dramatically reduce its popularity but it is  
impossible to develop this essential feature as an external module.


I thank everyone for their attention.


Actually, I believe the commit timestamp stuff would be very useful  
in general.  I would certainly like to see rigorous proofs of any   
multi-master replication technology built on top of them.  I believe  
that while your replication stuff might rely on the commit  
timestamps, the commit timestamps rely on nothing else (except the work  
that you have been churning on).


Using commit timestamps, one can easily implement cross vendor  
database replication.  These can be used to implement something like  
trigger-selective redo logs.  I think they can be used to produce DML  
logs that will require a lot less accounting to manage replicating  
tables from PostgreSQL into another database (like Oracle or MySQL).


// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/





Re: [HACKERS] Proposal: Commit timestamp

2007-02-03 Thread Bruce Momjian
Jan Wieck wrote:
 I don't have any such paper and the proof of concept will be the 
 implementation of the system. I do however see enough resistance against 
 this proposal to withdraw the commit timestamp at this time. The new 
 replication system will therefore require the installation of a patched, 
 non-standard PostgreSQL version, compiled from sources cluster wide in 
 order to be used. I am aware that this will dramatically reduce its 
 popularity but it is impossible to develop this essential feature as an 
 external module.
 
 I thank everyone for their attention.

Going and working on it on your own doesn't seem like the proper
solution.  I don't see people objecting to adding it, but they want it to
work, which I am sure you want too.  You have to show how it will work
and convince others of that, and then you have a higher chance it will
work, and be in the PostgreSQL codebase.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Proposal: Change of pg_trigger.tg_enabled and adding

2007-02-03 Thread Joshua D. Drake
Jan Wieck wrote:
 Attached is the implementation of the proposed changes as a patch for
 discussion.
 
 The chosen syntax is backward compatible and uses
 
 ALTER TABLE tab ENABLE TRIGGER trig (fires on origin - default)
 ALTER TABLE tab DISABLE TRIGGER trig (disabled)
 ALTER TABLE tab ENABLE REPLICA TRIGGER trig (fires on replica only)
 ALTER TABLE tab ENABLE ALWAYS TRIGGER trig (fires always)
 

snip

 
 The commands psql and pg_dump are adjusted in a backward compatible
 manner. Although I noticed that psql currently is incompatible with at
 least 8.1 databases due to querying indisvalid on \d.
 
 Comments?

This is interesting. If I understand correctly, the idea here is to be
able to determine which triggers will get fired based on the role the
database plays?

E.g; I have a REPLICA TRIGGER and thus I can use that on a
subscriber/slave to take replicated data and create reports automatically.

How do we deal with other problems such as a PROMOTED state?

Sincerely,

Joshua D. Drake





-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] recovery.conf parsing problems

2007-02-03 Thread Bruce Momjian

Added to TODO:

  o Allow recovery.conf to use the same syntax as
    postgresql.conf, including quoting


http://archives.postgresql.org/pgsql-hackers/2006-12/msg00497.php


---

Simon Riggs wrote:
 On Wed, 2006-12-13 at 19:28 +, Simon Riggs wrote:
  On Wed, 2006-12-13 at 04:23 +, Andrew - Supernews wrote:
   While testing a PITR recovery, I discovered that recovery.conf doesn't
   seem to allow specifying ' in the command string, making it hard to
   protect the restore_command against problematic filenames (whitespace
   etc.). This doesn't seem to be a problem for archive_command, which
   allows \' (e.g. archive_command = '/path/to/script \'%f\' \'%p\'').
 
 Immediate workaround is to use a script to encapsulate the actual
 desired request.
 
   Should this be fixed?
  
  Yes, I'll look into that.
 
 OK, I would propose to extend guc-file.l with enough code that the
 parsing of postgresql.conf and recovery.conf is identical (it isn't
 yet).
 
 There'll be care taken to ensure that the various options are not
 settable in the wrong file.
 
 Any requests for specific implementation details? I'll be looking to
 remove code from xlog.c, if possible.
 
 Implementation won't be immediate because of other current work.
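For reference, Andrew's report above shows the escaping that postgresql.conf accepts today but recovery.conf rejects; once the TODO is done, the second form below should become legal too (the script paths are purely illustrative):

```
# postgresql.conf -- backslash-escaped quotes are accepted here today
archive_command = '/usr/local/bin/archive-one \'%f\' \'%p\''

# recovery.conf -- currently fails on \'; should accept the same syntax
restore_command = '/usr/local/bin/restore-one \'%f\' \'%p\''
```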
 
 -- 
   Simon Riggs 
   EnterpriseDB   http://www.enterprisedb.com
 
 
 




Re: [HACKERS] 8.1.5 release note

2007-02-03 Thread Bruce Momjian
Tatsuo Ishii wrote:
As usual, the following item in the 8.1.5 release note is pretty vague:
 
  * Efficiency improvements in hash tables and bitmap index scans(Tom)
 
 In particular, I'm wondering what was actually improved in bitmap index
 scans. I see several commit messages regarding bitmap index scans, but
 I cannot figure out which one is related to this item.

Sorry for the delay. I don't know the answer.  The commit messages are
the only way.




[HACKERS] SRF optimization question

2007-02-03 Thread Jeremy Drake
I am writing a set returning function in C.  There are cases where I can
know definitively, upfront, that this function will only return one row.
I have noticed, through the happenstance of a partially converted function, that
I can mark a normal, non-set returning function as returning SETOF
something, while not utilizing the SRF macros and using PG_RETURN_DATUM,
and it still works as returning one row.

I am wondering whether it is an acceptable optimization, when I know
up-front that a function will only return one row, to avoid all of the
SRF overhead of setting up a new memory context and a function context
struct, and requiring an extra call to my function to tell Postgres that
I am done sending rows, and to simply not use the SRF stuff and interact
with Postgres as though I were not returning SETOF.  Is this a sane
idea, or did I just stumble into an accidental feature when I changed my
CREATE FUNCTION statement without changing my C code?



-- 
UNIX was half a billion (500000000) seconds old on
Tue Nov  5 00:53:20 1985 GMT (measuring since the time(2) epoch).
-- Andy Tannenbaum



Re: [HACKERS] Dead code in _bt_split?

2007-02-03 Thread Bruce Momjian

Heikki, did this code cleanup get included in your recent btree split
fix?

---

Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
  In that case, newitemleft would be false, right?
  I'm saying the piece marked with X below is unreachable:
 
 Oh, I see.  Hmm ... probably so, I think that chunk of code was just
 copied and pasted from where it occurs within the loop.
 
   regards, tom lane
 




Re: [HACKERS] Proposal: Change of pg_trigger.tg_enabled and adding

2007-02-03 Thread Jan Wieck

On 2/3/2007 5:25 PM, Joshua D. Drake wrote:

Jan Wieck wrote:

Attached is the implementation of the proposed changes as a patch for
discussion.

The chosen syntax is backward compatible and uses

ALTER TABLE tab ENABLE TRIGGER trig (fires on origin - default)
ALTER TABLE tab DISABLE TRIGGER trig (disabled)
ALTER TABLE tab ENABLE REPLICA TRIGGER trig (fires on replica only)
ALTER TABLE tab ENABLE ALWAYS TRIGGER trig (fires always)



snip



The commands psql and pg_dump are adjusted in a backward compatible
manner. Although I noticed that psql currently is incompatible with at
least 8.1 databases due to querying indisvalid on \d.

Comments?


This is interesting. If I understand correctly the idea here is to be
able to determine which triggers will get fired based on the role the
database plays?


Not the database, the session actually has a role, which defaults to 
origin. The default configuration for triggers (including RI triggers) 
is O (fires on origin). If the session does


SET session_replication_role = replica;

only triggers configured A (always) or R (replica) will fire. Not those 
configured O (origin) or D (disabled). This means that a row based 
replication system like Slony only has to set the replication role of 
the session in order to disable triggers. It does not need to touch the 
system catalog or even ALTER TABLE to do its work. This would even 
suppress Slony-I's deny-access trigger, which is in place on subscribers 
to prevent accidental updates on a replica.


Doing it on the session level is even more important for row based 
multimaster. At the same time as a user session does an update that 
needs to be added to the replication log, the replication engine in 
another session must be able to apply a remote transaction's updates 
without firing the log trigger.





E.g; I have a REPLICA TRIGGER and thus I can use that on a
subscriber/slave to take replicated data and create reports automatically.

How do we deal with other problems such as a PROMOTED state?


Promoted as in transfer of origin to a replica? In the case of a 
master-slave system like Slony-I, the origin of a table has the log 
trigger, that captures row changes, while a replica has a trigger that 
simply bails out with an error. Transfer of ownership is done by 
dropping one and creating the other trigger.



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #



[HACKERS] Remove log segment and log_id fields from pg_controldata

2007-02-03 Thread Bruce Momjian

The original discussion of this patch was here:

http://archives.postgresql.org/pgsql-hackers/2006-11/msg00876.php

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Simon Riggs wrote:
 On Tue, 2006-12-05 at 17:26 -0500, Tom Lane wrote:
  Simon Riggs [EMAIL PROTECTED] writes:
   On Tue, 2006-12-05 at 16:24 -0500, Tom Lane wrote:
   Sure, what would happen is that every backend passing through this code
   would execute the several lines of computation needed to decide whether
   to call RequestCheckpoint.
  
   Right, but the calculation uses RedoRecPtr, which may not be completely
   up to date. So presumably you want to re-read the shared memory value
   again to make sure we are exactly accurate and allow only one person to
   call checkpoint? Either way we have to take a lock. Insert lock causes
   deadlock, so we would need to use infolock. 
  
  Not at all.  It's highly unlikely that RedoRecPtr would be so out of
  date as to result in a false request for a checkpoint, and if it does,
  so what?  Worst case is we perform an extra checkpoint.
 
 On its own, I wouldn't normally agree...
 
  Also, given the current structure of the routine, this is probably not
  the best place for that code at all --- it'd make more sense for it to
  be in the just-finished-a-segment code stretch, which would ensure that
  it's only done by one backend once per segment.
 
 But that's a much better plan since it requires no locking.
 
 There are a lot more changes there for such a simple fix, though, and
 lots more potential bugs, but I've coded it as you suggest and removed
 the fields from pg_control.
 
 Patch passes make check, applies cleanly on HEAD.
 pg_resetxlog and pg_controldata tested.
 
 
 -- 
   Simon Riggs 
   EnterpriseDB   http://www.enterprisedb.com
 

[ Attachment, skipping... ]

 




Re: [HACKERS] psql possible TODO

2007-02-03 Thread Bruce Momjian

Added to TODO:

o Add \# to list command history like \s, but with line numbers

 http://archives.postgresql.org/pgsql-hackers/2006-12/msg00255.php


---

Joshua D. Drake wrote:
 Hello,
 
 O.k. this is my currently thinking:
 
 \# will list just like \s except it will include line numbers:
 
 1 SELECT * FROM FOO; (not a tab of course)
 2 UPDATE bar SET foo = 6;
 
 \# 2  will execute query number 2
 \#e 2 will open the numbered query in $EDITOR
 
 I would love to figure out a way to auto explain these queries without
 obnoxious syntax. Any ideas?
 
 
 Sincerely,
 
 Joshua D. Drake
 
 
 
 
 
 




Re: [PATCHES] [HACKERS] elog(FATAL)ing non-existent roles during client

2007-02-03 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Gavin Sherry wrote:
 On Tue, 5 Dec 2006, Gavin Sherry wrote:
 
  On Thu, 30 Nov 2006, Tom Lane wrote:
 
   Gavin Sherry [EMAIL PROTECTED] writes:
 I wonder if we should check if the role exists for the other
 authentication methods too? get_role_line() should be very cheap and it
 would prevent unnecessary authentication work if we did it before
 contacting, for example, the client ident server. Even with trust, it
 would save work because otherwise we do not check if the user exists
 until InitializeSessionUserId(), at which time we've set up our proc
 entry etc.
  
   This only saves work if the supplied ID is in fact invalid, which one
   would surely think isn't the normal case; otherwise it costs more.
 
  Yes.
 
   I could see doing this in the ident path, because contacting a remote
   ident server is certainly expensive on both sides.  I doubt it's a good
   idea in the trust case.
 
  Agreed. How about Kerberos too, applying the same logic?
 
 Attached is a patch which adds the checks.
 
 Gavin

[ Attachment, skipping... ]

 




Re: [HACKERS] Compacting a relation

2007-02-03 Thread Bruce Momjian
Tom Lane wrote:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  vacuumlazy.c contains a hint "Consider compacting this relation" but
  AFAICT, there is no indication anywhere how compacting is supposed to
  be achieved.  I guess this means VACUUM FULL or CLUSTER, but I don't
  think the hint can be processed effectively by a user.
 
 So change it ...

New message is:

  errhint("Consider using VACUUM FULL on this relation or increasing the "
          "configuration parameter \"max_fsm_pages\".")));




Re: [HACKERS] custom variable classes

2007-02-03 Thread Bruce Momjian

Added to TODO:

  * Allow custom variable classes that can restrict who can set the values

http://archives.postgresql.org/pgsql-hackers/2006-11/msg00911.php


---

Tom Lane wrote:
 Andrew Dunstan [EMAIL PROTECTED] writes:
  One thing I want to look at for 8.3 is improving custom variable 
  classes. Right now these are all user settable, which makes them quite 
  inappropriate for security related settings (such as which perl modules 
  to load for use by trusted plperl). I'm wondering if we should perhaps 
  allow something like:
 
custom_variable_classes = 'foo'
foo:security_level.bar = 'blurfl'
 
 This seems really ugly --- for one thing, what if the DBA gets it wrong?
 
 The value won't mean anything until the code that uses it gets loaded,
 at which time the correct GucContext for the variable will be supplied.
 How about we just compare that to the current definition source of the
 value, and if we see it's been set improperly, revert the value to
 default?
 
 It might also be a good idea to disallow ALTER USER/DATABASE SET for a
 custom variable that's a placeholder, since we don't at that time know
 if the value should be SUSET or not; and there seems no pressing need to
 allow these ALTERs to be done without having loaded the defining module.
 
 [ thinks for a bit... ]  With that provision in place, I think it would
 be safe to revert to the reset value instead of the wired-in default,
 which would be marginally nicer.
 
   regards, tom lane
 




Re: [HACKERS] Proposal: Commit timestamp

2007-02-03 Thread Jan Wieck

On 2/3/2007 5:20 PM, Bruce Momjian wrote:

Jan Wieck wrote:
I don't have any such paper and the proof of concept will be the 
implementation of the system. I do however see enough resistance against 
this proposal to withdraw the commit timestamp at this time. The new 
replication system will therefore require the installation of a patched, 
non-standard PostgreSQL version, compiled from sources cluster wide in 
order to be used. I am aware that this will dramatically reduce its 
popularity but it is impossible to develop this essential feature as an 
external module.


I thank everyone for their attention.


Going off and working on it on your own doesn't seem like the proper
solution.  I don't see people objecting to adding it, but they want it
to work, which I am sure you want too.  You have to show how it will
work and convince others of that, and then there is a much better chance
it will work and end up in the PostgreSQL codebase.


Bruce,

I think I have explained in sufficient detail how this Lamport 
timestamp will be unique and ever increasing, with the node ID being 
used as a tie breaker. The only thing that matters for last-update-wins 
conflict resolution is that whatever timestamp is associated with a 
row, the update you do to it must carry a later timestamp, so that all 
other nodes will overwrite the data. If a third node gets the two 
updates out of order, it will apply the second node's update, and since 
the row then carries a later timestamp than the first update arriving 
late, it will throw that information away. All nodes are in sync again.


This is all that is needed for last-update-wins resolution. And as said 
before, the only reason the clock is involved at all is so that nodes 
can continue autonomously when they lose connectivity without conflict 
resolution going crazy later on, which it would do if the timestamps 
were simple counters. It doesn't require microsecond-synchronized 
clocks, and the system clock isn't just used as a Lamport timestamp.


The problem, it seems to me, is that people want a full-scale proof of 
concept for the whole multimaster replication system I'm planning, 
instead of thinking in isolation about this one aspect, its intended 
use case, and other possible uses for it (like table logging). And we 
all know that that discussion will take us way beyond the 8.3 feature 
freeze date, so the whole thing will never get done.


I don't want to work on this on my own, and I sure would prefer it to 
be a default PostgreSQL feature. As said, I have learned some things 
from Slony-I. One of them is that I will not go through any more ugly 
workarounds just to avoid requiring a patched backend. If the features 
I really need aren't going to be in the default codebase, people will 
have to install from patched sources.


Finally, again, Slony-I could have made good use of this feature. With 
a logical commit timestamp, I would never even have thought about that 
other wart called xxid. It would all have been so much easier.



Jan




Re: [HACKERS] Referential Integrity and SHARE locks

2007-02-03 Thread Jan Wieck

On 2/2/2007 4:51 AM, Simon Riggs wrote:

It sounds like if we don't put a SHARE lock on the referenced table then
we can end the transaction in an inconsistent state if the referenced
table has concurrent UPDATEs or DELETEs. BUT those operations do impose
locking rules back onto the referencing tables that would not be granted
until after any changes to the referencing table complete, whereupon
they would restrict or cascade. So an inconsistent state doesn't seem
possible to me.

What am I missing?



You're missing MVCC. The newly inserted reference only becomes visible 
when it is committed. If the order of actions is: insert and check for 
the PK, the other transaction deletes the PK and commits, the inserted 
FK commits ... then the other transaction didn't see it coming.



Jan

