Re: [HACKERS] Improving deadlock error messages

2007-04-26 Thread Jim Nasby

On Apr 23, 2007, at 11:38 PM, Tom Lane wrote:

Neil Conway [EMAIL PROTECTED] writes:

On Sat, 2007-04-21 at 19:43 -0400, Neil Conway wrote:

Attached is a very quick hack of a patch to do this.


Does anyone have any feedback on this approach? If people are  
satisfied
with this solution, I can get a cleaned up patch ready to apply  
shortly.


I'm really still opposed to the entire concept.  You're proposing to
put a lot of fragile-looking code into a seldom-exercised error path.
I fear bugs will survive a long time in there, and the net effect will
be that we get no information when we need it most.  The numeric
printouts may be ugly, but they are reliable.


If we're that worried about test coverage for deadlocks, what about  
adding a test to the regression tests? IIRC the framework can  
coordinate between multiple connections now...
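For illustration, the classic two-session schedule such a test would have
to orchestrate (table and column names made up here):

-- session 1                          -- session 2
BEGIN;                                BEGIN;
UPDATE t SET x = 0 WHERE id = 1;
                                      UPDATE t SET x = 0 WHERE id = 2;
UPDATE t SET x = 0 WHERE id = 2;      -- session 1 blocks here
                                      UPDATE t SET x = 0 WHERE id = 1;
                                      -- deadlock detected: one session is
                                      -- aborted with an error, the other
                                      -- proceeds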

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Avoiding unnecessary reads in recovery

2007-04-26 Thread Jim Nasby

On Apr 25, 2007, at 2:48 PM, Heikki Linnakangas wrote:
In recovery, with full_page_writes=on, we read in each page only
to overwrite the contents with a full page image. That's a waste of  
time, and can have a surprisingly large effect on recovery time.


As a quick test on my laptop, I initialized a DBT-2 test with 5  
warehouses, and let it run for 2 minutes without think-times to  
generate some WAL. Then I did a kill -9 on the postmaster, and took a
copy of the data directory to use for testing recovery.


With CVS HEAD, the recovery took ~ 2 minutes. With the attached
patch, it took 5 seconds. (Yes, I used the same not-yet-recovered
data directory in both tests, and cleared the OS cache with
echo 1 > /proc/sys/vm/drop_caches.)


I was surprised how big a difference it makes, but when you think
about it, it's logical. Without the patch, it's doing roughly the
same I/O as the test itself: reading in pages, modifying them, and
writing them back. With the patch, all the reads are done
sequentially from the WAL, and the pages are then written back in a
batch at the end of WAL replay, which is a lot more efficient.


It's interesting that (with the patch) full_page_writes can  
*shorten* your recovery time. I've always thought it to have a  
purely negative effect on performance.


I'll leave it up to the jury if this tiny little change is  
appropriate after feature freeze...


While working on this, this comment in ReadBuffer caught my eye:


/*
 * During WAL recovery, the first access to any data page should
 * overwrite the whole page from the WAL; so a clobbered page
 * header is not reason to fail.  Hence, when InRecovery we may
 * always act as though zero_damaged_pages is ON.
 */
if (zero_damaged_pages || InRecovery)
{


But that assumption only holds if full_page_writes is enabled,
right? I changed that in the attached patch as well; even if the rest
of the patch isn't accepted, that part of it should still be applied, I think.
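(In outline, the adjusted test would be something like the following --
a sketch; whether the full_page_writes flag is directly visible at this
spot in bufmgr.c is my assumption:)

if (zero_damaged_pages || (InRecovery && fullPageWrites))
{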


So what happens if a backend is running with full_page_writes = off,  
someone edits postgresql.conf to turn it on and forgets to reload/
restart, and then we crash? You'll come up in recovery mode thinking  
that f_p_w was turned on, when in fact it wasn't.


ISTM that we need to somehow log what the status of full_page_writes  
is, if it's going to affect how recovery works.

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[HACKERS] [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

2007-04-26 Thread Dave Page
This was another occurrence of the strange create index failure on
Narwhal - unfortunately, despite having 'keep_error_builds' = 1 in my
BF config it seems to have removed the tree, so I can't get the dump that
Tom wanted.

Does anyone know why the keep_error_builds option didn't work in this
case? Or have I misinterpreted its meaning?

Regards, Dave.

 Original Message 
Subject: PGBuildfarm member narwhal Branch HEAD Status changed from OK
to InstallCheck failure
Date: Wed, 25 Apr 2007 19:34:34 -0700 (PDT)
From: PG Build Farm [EMAIL PROTECTED]
To: [EMAIL PROTECTED],
[EMAIL PROTECTED]


The PGBuildfarm member narwhal had the following event on branch HEAD:

Status changed from OK to InstallCheck failure

The snapshot timestamp for the build that triggered this notification
is: 2007-04-26 02:00:01

The specs of this machine are:
OS:  Windows Server 2003 R2 / 5.2.3790
Arch: i686
Comp: GCC / 3.4.2 (mingw-special)

For more information, see
http://www.pgbuildfarm.org/cgi-bin/show_history.pl?nm=narwhal&br=HEAD

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-26 Thread Michael Meskes
On Wed, Apr 25, 2007 at 03:17:19PM -0400, Tom Lane wrote:
 Why in the world is that like that?  We don't have such a kluge
 anyplace else we use va_list.  stringinfo.c for instance has
 never needed any such thing.

I don't remember the exact details but this was added a long time ago
before 8.0 because we had some problems with one of the archs. I was
surprised about the differences back then too, but haven't checked
since.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-26 Thread Michael Meskes
On Wed, Apr 25, 2007 at 04:38:30PM -0400, Tom Lane wrote:
 My recommendation is to get rid of the APREF hack, deal only in
 va_list, not va_list *, and inline ECPGget_variable into the two
 places it's used to avoid the question of passing va_lists around
 after they've been modified.  The routine's not that big (especially
 seeing that only half of it is actually shared by the two callers)
 and it's just not worth the notational effort, let alone any portability
 risks, to factor it out.

Having spent countless hours debugging this stuff I fully agree with
you. It's not just ECPGget_variable though. I also had to inline
create_statement. This is only called once, so no big deal, but the
calling function gets bigger by quite a margin that way. It was
definitely easier to read with these functions, but again all this
hassle isn't worth it.
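To illustrate the underlying trap (a minimal standalone sketch, not ecpg
code): on some ABIs va_list is an array type, so passing it by value
lets the callee advance the caller's list; on others it is a real copy.

#include <stdarg.h>

static int
eat_one(va_list ap)			/* receives the list "by value" */
{
	return va_arg(ap, int);
}

static int
sum_two(int n, ...)
{
	va_list		ap;
	int			a,
				b;

	va_start(ap, n);
	a = eat_one(ap);
	b = va_arg(ap, int);	/* indeterminate per C99: ap may or may not
							 * have advanced -- exactly the APREF mess */
	va_end(ap);
	return a + b;
}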

Attached you'll find a patch that should inline both functions and
remove the APREF stuff. This successfully runs the regression suite on
my Linux box. Please test it on those archs that needed special
treatment before I commit.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] too much WAL volume

2007-04-26 Thread Zeugswetter Andreas ADI SD

  Writing to a different area was considered in pg, but there were more
  negative issues than positive.
  So imho pg_compresslog is the correct path forward. The current
  discussion is only about whether we want a more complex pg_compresslog
  and no change to current WAL, or an increased WAL size for a less
  complex implementation.
  Both would be able to compress the WAL to the same archive log size.
 
 Huh?  As conceived, pg_compresslog does nothing to lower log
 volume for general purposes, just on-disk storage size for
 archiving.  It doesn't help us at all with the tremendous
 amount of log we put out for an OLTP server, for example.

Ok, that is not related to the original discussion though.
I have thus changed the subject, and removed [PATCHES].

You cannot directly compare the pg WAL size with other DBs' since they
write parts to other areas (e.g. the physical log in Informix). You would
need to include those writes in a fair comparison.
It is definitely not true that writing to a different area has only
advantages. The consensus was that writing the page images to the WAL
has more pros. We could revisit the pros and cons, though.

Other options involve special OS and hardware support (we already have
that), or accepting a high risk of needing a restore after a power
outage (we don't have that, because we use no mechanism to detect such
a failure).

I am not sure that shrinking per-WAL-record size (other than the full
page images), e.g. by only logging changed bytes and not whole tuples,
would have a huge impact on OLTP tx/sec, since the limiting factor is
I/Os per second and not MB per second. Recent developments like HOT seem
a lot more promising in this regard since they avoid I/O.

Andreas

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Re: [Pgbuildfarm-members] [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

2007-04-26 Thread Andrew Dunstan
Dave Page wrote:
 This was another occurrence of the strange create index failure on
 Narwhal - unfortunately, despite having 'keep_error_builds' = 1 in my
 BF config it seems to have removed the tree, so I can't get the dump that
 Tom wanted.

 Does anyone know why the keep_error_builds option didn't work in this
 case? Or have I misinterpreted its meaning?



Well, anyone will usually be me ;-)

To the best of my knowledge and belief it does work. This is the code it
should cause to be executed:

my $timestr = strftime "%Y-%m-%d-%H:%M:%S", localtime($now);
move($pgsql, "pgsqlkeep.$timestr");
move("inst", "instkeep.$timestr")   if (-d "inst") ;

Please see if you can trap the script stdout/stderr somewhere so that next
time it fails you can send it to me.

How exactly do you have the buildfarm runs automated? Maybe that does a
little cleaning of its own?

cheers

andrew





---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-26 Thread Michael Meskes
ARGH!

This time with patch.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
diff -ruN --exclude CVS /home/postgres/pgsql-ecpg/ecpglib/descriptor.c ecpglib/descriptor.c
--- /home/postgres/pgsql-ecpg/ecpglib/descriptor.c	2006-10-04 09:38:18.0 +0200
+++ ecpglib/descriptor.c	2007-04-26 12:02:20.0 +0200
@@ -507,7 +507,6 @@
 	do
 	{
 		enum ECPGdtype itemtype;
-		enum ECPGttype type;
 		const char *tobeinserted = NULL;
 		bool		malloced;
 
@@ -516,13 +515,29 @@
 		if (itemtype == ECPGd_EODT)
 			break;
 
-		type = va_arg(args, enum ECPGttype);
-#if defined(__GNUC__) && (defined (__powerpc__) || defined(__amd64__) || defined(__x86_64__))
-		ECPGget_variable(args, type, var, false);
-#else
-		ECPGget_variable(&args, type, var, false);
-#endif
+		var->type = va_arg(args, enum ECPGttype);
+		var->pointer = va_arg(args, char *);
 
+		var->varcharsize = va_arg(args, long);
+		var->arrsize = va_arg(args, long);
+		var->offset = va_arg(args, long);
+
+		if (var->arrsize == 0 || var->varcharsize == 0)
+			var->value = *((char **) (var->pointer));
+		else
+			var->value = var->pointer;
+
+		/*
+		 * negative values are used to indicate an array without given bounds
+		 */
+		/* reset to zero for us */
+		if (var->arrsize < 0)
+			var->arrsize = 0;
+		if (var->varcharsize < 0)
+			var->varcharsize = 0;
+
+		var->next = NULL;
+		
 		switch (itemtype)
 		{
 			case ECPGd_data:
diff -ruN --exclude CVS /home/postgres/pgsql-ecpg/ecpglib/execute.c ecpglib/execute.c
--- /home/postgres/pgsql-ecpg/ecpglib/execute.c	2007-03-29 14:04:34.0 +0200
+++ ecpglib/execute.c	2007-04-26 12:01:33.0 +0200
@@ -79,136 +79,6 @@
 	}
 }
 
-#if defined(__GNUC__) && (defined (__powerpc__) || defined(__amd64__) || defined(__x86_64__))
-#define APREF ap
-#else
-#define APREF *ap
-#endif
-
-void
-ECPGget_variable(va_list APREF, enum ECPGttype type, struct variable * var, bool indicator)
-{
-	var->type = type;
-	var->pointer = va_arg(APREF, char *);
-
-	var->varcharsize = va_arg(APREF, long);
-	var->arrsize = va_arg(APREF, long);
-	var->offset = va_arg(APREF, long);
-
-	if (var->arrsize == 0 || var->varcharsize == 0)
-		var->value = *((char **) (var->pointer));
-	else
-		var->value = var->pointer;
-
-	/*
-	 * negative values are used to indicate an array without given bounds
-	 */
-	/* reset to zero for us */
-	if (var->arrsize < 0)
-		var->arrsize = 0;
-	if (var->varcharsize < 0)
-		var->varcharsize = 0;
-
-	var->next = NULL;
-
-	if (indicator)
-	{
-		var->ind_type = va_arg(APREF, enum ECPGttype);
-		var->ind_pointer = va_arg(APREF, char *);
-		var->ind_varcharsize = va_arg(APREF, long);
-		var->ind_arrsize = va_arg(APREF, long);
-		var->ind_offset = va_arg(APREF, long);
-
-		if (var->ind_type != ECPGt_NO_INDICATOR
-			&& (var->ind_arrsize == 0 || var->ind_varcharsize == 0))
-			var->ind_value = *((char **) (var->ind_pointer));
-		else
-			var->ind_value = var->ind_pointer;
-
-		/*
-		 * negative values are used to indicate an array without given bounds
-		 */
-		/* reset to zero for us */
-		if (var->ind_arrsize < 0)
-			var->ind_arrsize = 0;
-		if (var->ind_varcharsize < 0)
-			var->ind_varcharsize = 0;
-	}
-}
-
-/*
- * create a list of variables
- * The variables are listed with input variables preceding outputvariables
- * The end of each group is marked by an end marker.
- * per variable we list:
- * type - as defined in ecpgtype.h
- * value - where to store the data
- * varcharsize - length of string in case we have a stringvariable, else 0
- * arraysize - 0 for pointer (we don't know the size of the array),
- * 1 for simple variable, size for arrays
- * offset - offset between ith and (i+1)th entry in an array,
- * normally that means sizeof(type)
- * ind_type - type of indicator variable
- * ind_value - pointer to indicator variable
- * ind_varcharsize - empty
- * ind_arraysize -	arraysize of indicator array
- * ind_offset - indicator offset
- */
-static bool
-create_statement(int lineno, int compat, int force_indicator, struct connection * connection, struct statement ** stmt, const char *query, va_list APREF)
-{
-	struct variable **list = ((*stmt)->inlist);
-	enum ECPGttype type;
-
-	if (!(*stmt = (struct statement *) ECPGalloc(sizeof(struct statement), lineno)))
-		return false;
-
-	(*stmt)->command = ECPGstrdup(query, lineno);
-	(*stmt)->connection = connection;
-	(*stmt)->lineno = lineno;
-	(*stmt)->compat = compat;
-	(*stmt)->force_indicator = force_indicator;
-
-	list = ((*stmt)->inlist);
-
-	type = va_arg(APREF, enum ECPGttype);
-
-	while (type != ECPGt_EORT)
-	{
-		if (type == ECPGt_EOIT)
-			list = ((*stmt)->outlist);
-		else
-		{
-			struct variable *var,
-	   *ptr;
-
-			if (!(var = (struct variable *) ECPGalloc(sizeof(struct variable), lineno)))
-				return false;
-
-			ECPGget_variable(ap, type, var, true);
-
-			/* if variable is NULL, the statement hasn't been prepared */
-			

[HACKERS] Re: [Pgbuildfarm-members] [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

2007-04-26 Thread Dave Page
Andrew Dunstan wrote:
 Dave Page wrote:
 This was another occurrence of the strange create index failure on
 Narwhal - unfortunately, despite having 'keep_error_builds' = 1 in my
 BF config it seems to have removed the tree, so I can't get the dump that
 Tom wanted.

 Does anyone know why the keep_error_builds option didn't work in this
 case? Or have I misinterpreted its meaning?

 
 
 Well, anyone will usually be me ;-)

Yeah, I know but I didn't want to presume :-)

 To the best of my knowledge and belief it does work. This is the code it
 should cause to be executed:
 
   my $timestr = strftime "%Y-%m-%d-%H:%M:%S", localtime($now);
   move($pgsql, "pgsqlkeep.$timestr");
   move("inst", "instkeep.$timestr")   if (-d "inst") ;
 
 Please see if you can trap the script stdout/stderr somewhere so that next
 time it fails you can send it to me.

I already log the run output - there's nothing unusual in there.

 How exactly do you have the buildfarm runs automated? Maybe that does a
 little cleaning of its own?

A simple batch script, run from the task scheduler which fires off the
script. It doesn't do any cleanup at all.

Regards, Dave.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-26 Thread Andrew Dunstan
Michael Meskes wrote:

 Attached you'll find a patch that should inline both functions and
 remove the APREF stuff. This successfully runs the regression suite on
 my Linux box. Please test it on those archs that needed special
 treatment before I commit.


If you commit to HEAD it will be automatically tested on the buildfarm.

cheers

andrew


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-26 Thread Michael Meskes
On Thu, Apr 26, 2007 at 06:28:29AM -0500, Andrew Dunstan wrote:
 If you commit to HEAD it will be automatically tested on the buildfarm.

True. But it might also break a lot of other archs without helping on
those troubled ones. I thought this way would be better.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Avoiding unnecessary reads in recovery

2007-04-26 Thread Zeugswetter Andreas ADI SD

 So what happens if a backend is running with full_page_writes 
 = off, someone edits postgresql.conf to turn it on and 
 forgets to reload/restart, and then we crash? You'll come up 
 in recovery mode thinking that f_p_w was turned on, when in 
 fact it wasn't.
 
 ISTM that we need to somehow log what the status of 
 full_page_writes is, if it's going to affect how recovery works.

Optimally, recovery should do this only when confronted with a full page
image. The full page is in the same WAL record that first touches a page,
so this should not need to depend on a setting.
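For illustration, a sketch of such a per-record check (using the pre-8.3
XLogRecord layout; the helper name is made up):

/*
 * A record carries a full-page image iff any backup-block flag bit is
 * set in xl_info, so recovery can decide per record instead of
 * trusting a GUC setting.
 */
static bool
record_has_full_page_image(XLogRecord *record)
{
	return (record->xl_info & XLR_BKP_BLOCK_MASK) != 0;
}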

Andreas

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Avoiding unnecessary reads in recovery

2007-04-26 Thread Tom Lane
Jim Nasby [EMAIL PROTECTED] writes:
 So what happens if a backend is running with full_page_writes = off,  
 someone edits postgresql.conf to turn it on and forgets to reload/
 restart, and then we crash? You'll come up in recovery mode thinking  
 that f_p_w was turned on, when in fact it wasn't.

One of the advantages of the proposed patch is that it avoids having to
make any assumptions like that.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] RESET command seems pretty disjointed now

2007-04-26 Thread Neil Conway
On Tue, 2007-04-24 at 18:04 +0300, Marko Kreen wrote:
 Attached patch addresses all 3 comments.  As it will be a
 top-level command, I put the code into commands/discard.c

Applied with some minor tweaks -- thanks for the patch. I didn't bother
moving the regression tests out of guc.sql, although they don't really
belong there any longer.

-Neil



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] RESET command seems pretty disjointed now

2007-04-26 Thread Bruce Momjian

Patch applied from Neil.

---

Marko Kreen wrote:
 On 4/23/07, Neil Conway [EMAIL PROTECTED] wrote:
  On Tue, 2007-04-17 at 16:34 +0300, Marko Kreen wrote:
   Attached patch does the following conversions:
 
  ISTM it would be cleaner to use an enum to identify the different
  variants of the DISCARD command, rather than a character string.
 
  Is guc.c still the logical place for the implementation of DISCARD?
  Something under backend/commands might be better, although I don't see a
  real obvious place for it.
 
  The psql tab completion code requires updating for the new DISCARD
  command.
 
 Attached patch addresses all 3 comments.  As it will be a
 top-level command, I put the code into commands/discard.c
 
 -- 
 marko

[ Attachment, skipping... ]

 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-26 Thread Mark Wong

On 4/26/07, Michael Meskes [EMAIL PROTECTED] wrote:

On Wed, Apr 25, 2007 at 04:38:30PM -0400, Tom Lane wrote:
 My recommendation is to get rid of the APREF hack, deal only in
 va_list, not va_list *, and inline ECPGget_variable into the two
 places it's used to avoid the question of passing va_lists around
 after they've been modified.  The routine's not that big (especially
 seeing that only half of it is actually shared by the two callers)
 and it's just not worth the notational effort, let alone any portability
 risks, to factor it out.

Having spent countless hours debugging this stuff I fully agree with
you. It's not just ECPGget_variable though. I also had to inline
create_statement. This is only called once, so no big deal, but the
calling function gets bigger by quite a margin that way. It was
definitely easier to read with these functions, but again all this
hassle isn't worth it.

Attached you'll find a patch that should inline both functions and
remove the APREF stuff. This successfully runs the regression suite on
my Linux box. Please test it on those archs that needed special
treatment before I commit.


I applied it to REL8_2_STABLE/pgsql and ran 'run_build.pl --test
--keepall REL8_2_STABLE'.  Looks like it passed everything for me. :)

Regards,
Mark

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] ECPG failure on BF member Vaquita (Windows Vista)

2007-04-26 Thread Tom Lane
Michael Meskes [EMAIL PROTECTED] writes:
 Having spent countless hours debugging this stuff I fully agree with
 you. It's not just ECPGget_variable though. I also had to inline
 create_statement.

AFAICS you do not need to inline create_statement.  The risk factor
is where you call a routine that does something with a va_list, and
then you want to do something else (other than va_end) with that va_list
after it returns.  The one use of create_statement doesn't do that,
hence no problem.  (We know this works, because stringinfo.c does it.)

BTW, I think there is a small bug there: you fail to call va_end() in the
failure path.  I'm not sure if there are any modern machines where
va_end() isn't a no-op, but nonetheless the code isn't meeting the spec.
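
In outline, the call site in question looks like this (a sketch, not the
actual ecpg code):

	va_start(args, query);
	if (!create_statement(lineno, compat, force_indicator,
						  con, &stmt, query, args))
	{
		va_end(args);		/* the cleanup that is currently missing */
		return false;
	}
	va_end(args);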

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] psql default options

2007-04-26 Thread Gregory Stark

For a long time one of the big gripes we've gotten is that when using psql
interactively, if you turn autocommit off and make a typo on the nth command,
you've suddenly lost all your work. The response was always that we needed a
generic subtransaction facility to handle it.

We've had such a facility for a while but we're still biting people with the
same problem.

I would like to suggest that we make psql default when in interactive mode to
using AUTOCOMMIT=false and ON_ERROR_ROLLBACK=true.

Defaulting to autocommit=false is somewhat surprising and a big change. I
recall the first time I used Oracle I was completely flummoxed by having
autocommit off. I know that'll be a hard sell.

But as I used it I grew quite fond of having autocommit off. It's saved my
bacon more than once. I strongly recommend it to anyone working with data they
care about even a little bit. Being able to double-check your results
before committing them is extremely valuable.

So basically what I'm saying is that what I already recommend users do
should just be the default when using psql interactively.
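
For what it's worth, anyone who wants this behaviour today can put it in
~/.psqlrc; ON_ERROR_ROLLBACK also accepts 'interactive', which matches
the interactive-only idea:

\set AUTOCOMMIT off
\set ON_ERROR_ROLLBACK interactive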

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] About the simple_heap_update function

2007-04-26 Thread rupesh bajaj

Hi,
I tried to update a tuple in the pg_attribute table using the function
simple_heap_update while some query processing is going on. But when I then
look up the value of that tuple (just updated) once again via the
SearchSysCache function in the same query processing, I am not able to see
the updated values. Can you please tell me what the reason is, or what I
should do to make the update visible?

Thanks,
Rupesh Bajaj


[HACKERS] Hi, I wanto joinin the developer group of postgresql

2007-04-26 Thread shieldy

Hi, I want to join the developer group of PostgreSQL.
But I just do not know how to take the first step. I installed
PostgreSQL and also got the PostgreSQL source code; after that, I installed
Cygwin on my computer (as my OS is Windows XP). But now I wonder what my
next step is, as I want to extend some aspects of PostgreSQL's spatial data
support. Can you give me some suggestions on how I should go on? Thank you!


Re: [HACKERS] Buildfarm: Stage logs not available for MSVC builds

2007-04-26 Thread Andrew Dunstan

-hackers probably isn't the place for such complaints.

The problem is not because of MSVC, but because of member misconfiguration,
by the look of it. The tar command string will need to be set in the
config file and tar installed. I found that I needed bsdtar for Windows
for this to work. See

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=weevil&dt=2007-04-13%2014:54:06


cheers

andrew


Dave Page wrote:
 I just noticed that the stage logs aren't displayed against MSVC build
 hosts as they are for regular hosts, eg:

 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mastodon&dt=2007-04-25%2001:00:02

 vs.

 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=narwhal&dt=2007-04-25%2002:00:03

 Is this WIP, or a bug to be fixed?

 Regards Dave.

 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] About the simple_heap_update function

2007-04-26 Thread Tom Lane
rupesh bajaj [EMAIL PROTECTED] writes:
 I tried to update a tuple in the pg_attribute table using the function
 simple_heap_update while some query processing is going on. But when I then
 look up the value of that tuple (just updated) once again via the
 SearchSysCache function in the same query processing, I am not able to see
 the updated values. Can you please tell me what the reason is, or what I
 should do to make the update visible?

You'd need a CommandCounterIncrement() ...
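
That is, in outline (a sketch; locking, error handling and the actual
tuple construction omitted, with rel and tup standing for the catalog
relation and the updated tuple):

	simple_heap_update(rel, &tup->t_self, tup);
	CatalogUpdateIndexes(rel, tup);		/* keep the catalog indexes in sync */
	CommandCounterIncrement();			/* make the update visible to later
										 * SearchSysCache lookups in the
										 * same command */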

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [DOCS] row-level stats and last analyze time

2007-04-26 Thread Neil Conway
On Tue, 2007-04-24 at 17:38 -0400, Neil Conway wrote:
 which included other modifications to reduce the pgstat I/O volume in
 8.1. I don't think this particular change was wise

I looked into this a bit further:

(1) I believe the reasoning for Tom's earlier change was not to reduce
the I/O between the backend and the pgstat process: it was to keep the
in-memory stats hash tables small, and to reduce the amount of data that
needs to be written to disk. When the only stats messages we get for a
table are VACUUM or ANALYZE messages, we discard the message in the
pgstat daemon.

(2) If stats_row_level is false, there won't be a stats hash entry for
any tables, so we can skip sending the VACUUM or ANALYZE message in the
first place, by the same logic. (This is more debatable if the user just
disabled stats_row_level for the current session, although since only a
super-user can do that, perhaps that's OK.)

(3) I don't like the fact that the current coding is so willing to throw
away VACUUM and ANALYZE pgstat messages. I think it is quite plausible
that the DBA might be interested in the last-VACUUM and last-ANALYZE
information for a table which hasn't had live operations applied to it
recently. The rest of the pgstat code has a similarly disappointing
willingness to silently discard messages it doesn't think are worth
keeping (e.g. pgstat_recv_autovac() is ignored for databases with no
other activity, and pgstat_count_xact_commit/rollback() is a no-op
unless *either* row-level or block-level stats are enabled.)

If we're so concerned about saving space in the stats hash tables for
tables that don't see non-VACUUM / non-ANALYZE activity, why not arrange
to record the timestamps for database-wide VACUUMs and ANALYZEs
separately from table-local VACUUMs and ANALYZEs? That is, a table's
last_vacuum time could effectively be the max of the last database-wide
vacuum time and the last VACUUM on that particular table. (Recording the
time of the last database-wide VACUUM might be worth doing anyway, e.g.
for avoiding wraparound failure).
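
(For reference, the information at stake is what these view columns
report:)

SELECT relname, last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze
FROM pg_stat_user_tables;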

Comments?

-Neil



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [DOCS] row-level stats and last analyze time

2007-04-26 Thread Alvaro Herrera
Neil Conway wrote:

 (3) I don't like the fact that the current coding is so willing to throw
 away VACUUM and ANALYZE pgstat messages. I think it is quite plausible
 that the DBA might be interested in the last-VACUUM and last-ANALYZE
 information for a table which hasn't had live operations applied to it
 recently. The rest of the pgstat code has a similarly disappointing
 willingness to silently discard messages it doesn't think are worth
 keeping (e.g. pgstat_recv_autovac() is ignored for databases with no
 other activity, and pgstat_count_xact_commit/rollback() is a no-op
 unless *either* row-level or block-level stats are enabled.)

One thing to keep in mind is that autovac drives some decisions from
whether the database has a pgstat entry or not.  In particular it means
it doesn't bother processing non-connectable databases, unless they are
close to Xid wraparound.

I think this behavior is a useful one, since usually vacuuming those
databases is a waste of time anyway.  Whether to drive it from pgstat or
from somewhere else is another matter, but if you want to drive it from
another mechanism, keep in mind that the autovacuum launcher (which is
the process that makes this decision) is not connected to any database
so it cannot examine any catalog's content.  There are of course ways
around that: for example you could put the information in the
pg_database flatfile.  But it's something to keep in mind if you want to
change it.

 If we're so concerned about saving space in the stats hash tables for
 tables that don't see non-VACUUM / non-ANALYZE activity, why not arrange
 to record the timestamps for database-wide VACUUMs and ANALYZEs
 separately from table-local VACUUMs and ANALYZEs? That is, a table's
 last_vacuum time could effectively be the max of the last database-wide
 vacuum time and the last VACUUM on that particular table. (Recording the
 time of the last database-wide VACUUM might be worth doing anyway, e.g.
 for avoiding wraparound failure).

Another thing to keep in mind is that autovacuum does not do
database-wide vacuums anymore -- they are not needed.  Xid wraparound
decisions are handled on a table-by-table basis, so information about
when the last database-wide vacuum was is not needed.

Note that Xid wraparound decisions are driven by information in
pg_class.  So it's not a problem that pgstat may lose the info from this
POV.

The bottom line is that the current pgstat behavior and autovacuum are
closely related.  So if you want to change pgstats you should also keep
an eye on how it's going to affect autovac.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Interaction of PITR backups and Bulkoperationsavoiding WAL

2007-04-26 Thread Bruce Momjian
Simon Riggs wrote:
 On Fri, 2007-03-09 at 11:47 -0500, Tom Lane wrote:
  Simon Riggs [EMAIL PROTECTED] writes:
   On Fri, 2007-03-09 at 11:15 -0500, Tom Lane wrote:
   It strikes me that allowing archive_command to be changed on the fly
   might not be such a good idea though, or at least it shouldn't be
   possible to flip it from empty to nonempty during live operation.
  
   I'd rather fix it the proposed way than force a restart. ISTM wrong to
   have an availability feature cause downtime.
  
  I don't think that people are very likely to need to turn archiving on
  and off on-the-fly.  Your proposed solution introduces a great deal of
  complexity (and risk of future bugs-of-omission, to say nothing of race
  conditions) to solve a non-problem.  We have better things to be doing
  with our development time.
 
 It's certainly a quicker fix. Unless others object, I'll set
 archive_command to only be changeable at server startup.

Simon, has this patch been submitted?

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Implicit casts to text

2007-04-26 Thread Bruce Momjian

Where are we on this?

---

Tom Lane wrote:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  FWIW, is the attached patch about what you had in mind?  (It probably only 
  covers normal types at the moment.)
 
 Hm, I hadn't realized that it would take as little work as that ...
 I have an itchy feeling that you missed something but I'm not sure
 what.
 
 One thing I had wanted to do is take out the existing functions and
 pg_cast entries that are effectively just providing hard-wired
 equivalents to this, but that's merely housekeeping.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Implicit casts to text

2007-04-26 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Where are we on this?

Since there weren't any objections, I guess we can do it ;-)

I'll try to do something with Peter's patch plus removing the deadwood.
Would you add his patch to the queue so I don't forget?

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Implicit casts to text

2007-04-26 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Where are we on this?
 
 Since there weren't any objections, I guess we can do it ;-)
 
 I'll try to do something with Peter's patch plus removing the deadwood.
 Would you add his patch to the queue so I don't forget?

Added.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] psql default options

2007-04-26 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 I would like to suggest that we make psql default when in interactive mode to
 using AUTOCOMMIT=false and ON_ERROR_ROLLBACK=true.

That is *way* too big a behavioral change to make depend on something as
fragile as whether psql thinks it's interactive or not.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] elog(FATAL) vs shared memory

2007-04-26 Thread Bruce Momjian

Where are we on this?

---

Tom Lane wrote:
 In this thread:
 http://archives.postgresql.org/pgsql-bugs/2007-03/msg00145.php
 we eventually determined that the reported lockup had three components:
 
 (1) something (still not sure what --- Martin and Mark, I'd really like
 to know) was issuing random SIGTERMs to various postgres processes
 including autovacuum.
 
 (2) if a SIGTERM happens to arrive while btbulkdelete is running,
 the next CHECK_FOR_INTERRUPTS will do elog(FATAL), causing elog.c
 to do proc_exit(0), leaving the vacuum still recorded as active in
 the shared memory array maintained by _bt_start_vacuum/_bt_end_vacuum.
 The PG_TRY block in btbulkdelete doesn't get a chance to clean up.
 
 (3) eventually, either we try to re-vacuum the same index or
 accumulation of bogus active entries overflows the array.
 Either way, _bt_start_vacuum throws an error, which btbulkdelete
 PG_CATCHes, leading to _bt_end_vacuum trying to re-acquire the LWLock
 already taken by _bt_start_vacuum, meaning that the process hangs up.
 And then so does anything else that needs to take that LWLock...
 
 Point (3) is already fixed in CVS, but point (2) is a lot nastier.
 What it essentially says is that trying to clean up shared-memory
 state in a PG_TRY block is unsafe: you can't be certain you'll
 get to do it.  Now this is not a big deal during normal SIGTERM or
 SIGQUIT database shutdown, because we're going to abandon the shared
 memory segment anyway.  However, if we ever want to support individual
 session kill via SIGTERM, it's a problem.  Even if we were not
 interested in someday considering that a supported feature, it seems
 that dealing with random SIGTERMs is needed for robustness in at least
 some environments.
 
 AFAICS, there are basically two ways we might try to approach this:
 
 Plan A: establish the rule that you mustn't try to clean up shared
 memory state in a PG_CATCH block.  Anything you need to do like that
 has to be handled by an on_shmem_exit hook function, so it will be
 called during a FATAL exit.  (Or maybe you can do it in PG_CATCH for
 normal ERROR cases, but you need a backing on_shmem_exit hook to
 clean up for FATAL.)
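 
 For the btree case, Plan A would look roughly like this (a sketch; the
 callback name is invented):
 
 static void
 bt_end_vacuum_callback(int code, Datum arg)
 {
 	/* runs during a FATAL exit too, unlike a PG_CATCH block */
 	_bt_end_vacuum((Relation) DatumGetPointer(arg));
 }
 
 	/* registered once, before _bt_start_vacuum does its work: */
 	on_shmem_exit(bt_end_vacuum_callback, PointerGetDatum(rel));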
 
 Plan B: change the handling of FATAL errors so that they are thrown
 like normal errors, and the proc_exit call happens only when we get
 out to the outermost control level in postgres.c.  This would mean
 that PG_CATCH blocks get a chance to clean up before the FATAL exit
 happens.  The problem with that is that a non-cooperative PG_CATCH
 block might think it could recover from the error, and then the exit
 does not happen at all.  We'd need a coding rule that PG_CATCH blocks
 *must* re-throw FATAL errors, which seems at least as ugly as Plan A.
 In particular, all three of the external-interpreter PLs are willing
 to return errors into the external interpreter, and AFAICS we'd be
 entirely at the mercy of the user-written Perl or Python or Tcl code
 whether it re-throws the error or not.
 
 So Plan B seems unacceptably fragile.  Does anyone see a way to fix it,
 or perhaps a Plan C with a totally different idea?  Plan A seems pretty
 ugly but it's the best I can come up with.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Modifying TOAST thresholds

2007-04-26 Thread Bruce Momjian

I have seen no one do performance testing of this, so it seems it will
have to wait for 8.4.

---

Gregory Stark wrote:
 Tom Lane [EMAIL PROTECTED] writes:
 
  What I would definitely like to see for 8.3 is some performance testing
  done to determine whether we ought to change the current defaults.
  (Both TOAST_TUPLES_PER_PAGE and EXTERN_TUPLES_PER_PAGE ought to be looked
  at.)
 
 It will take some thinking before it's even clear what we should be measuring.
 The optimal value will depend heavily on the usage pattern, so the best value
 for the default will be something hand-wavy, like the smallest tuple size
 where the cost of a select including the column is greater than the time saved
 on a select not including the column, or something like that.
 
 -- 
   Gregory Stark
   EnterpriseDB  http://www.enterprisedb.com
 
 
 ---(end of broadcast)---
 TIP 7: You can help support the PostgreSQL project by donating at
 
 http://www.postgresql.org/about/donate

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-04-26 Thread Koichi Suzuki

Josh,

Josh Berkus wrote:

Koichi, Andreas,


1) To deal with partial/inconsistent writes to the data files at crash
recovery, we need full page writes at the first modification to pages
after each checkpoint.   It consumes much of the WAL space.


We need to find a way around this someday.  Other DBs don't do this; it may be
because they're less durable, or because they fixed the problem.


Maybe both.   Fixing the problem may need some means to detect
partial/inconsistent writes to the data files, which may need
additional CPU resources.





I don't think there should be only one setting.   It depends on how the
database is operated.   Leaving wal_add_optimization_info = off as the
default does not bring any change in WAL and archive log handling.   I
understand some people may not be happy with an additional 3% or so
increase in WAL size, especially people who don't need archive logs at
all.   So I prefer to leave the default off.


Except that, is there any reason to turn this off if we are archiving?  Maybe 
it should just be slaved to archive_command ... if we're not using PITR, it's 
off, if we are, it's on.


Hmm, this sounds workable.  On the other hand, existing users who are
happy with the current archiving would have to change their current
archiving command to pg_compresslog, or their archive log size will
increase a bit.  I'd like to hear some more on this.
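
For the record, the kind of setup under discussion would look roughly
like this (the exact pg_compresslog/pg_decompresslog invocations here
are illustrative, not their documented syntax):

# postgresql.conf
archive_command = 'pg_compresslog %p /mnt/archive/%f'
# recovery.conf, on restore
restore_command = 'pg_decompresslog /mnt/archive/%f %p'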





1) is there any throughput benefit for platforms with fast CPU but
constrained I/O (e.g. 2-drive webservers)?  Any penalty for servers with
plentiful I/O?

I've only run benchmarks with the archive process running, because
wal_add_optimization_info=on does not make sense if we don't archive
WAL.   In this situation, total I/O decreases because writes to the
archive log decrease.   Because of the 3% or so increase in WAL size,
there will be an increase in WAL writes, but the decrease in archive
writes makes up for it.


Yeah, I was just looking for a way to make this a performance feature.  I see 
now that it can't be.  ;-)


As to the performance feature, I tested the patch against 8.3HEAD. 
With pgbench, throughput was as follows:

Case1. Archiver: cp command, wal_add_optimization_info = off,
   full_page_writes=on
Case2. Archiver: pg_compresslog, wal_add_optimization_info = on,
   full_page_writes=on
DB Size: 1.65GB, Total transactions: 1,000,000

Throughput was:
Case1: 632.69TPS
Case2: 653.10TPS ... 3% gain.

Archive Log Size:
Case1: 1.92GB
Case2: 0.57GB (about 30% of Case1)... Before compression, the size
was 1.92GB.  Because this is measured in whole WAL segment files,
there can be up to 16MB of error in the measurement.  If we count
this, the increase in WAL I/O will be less than 1%.





3) How is this better than command-line compression for log-shipping? 
e.g. why do we need it in the database?

I don't fully understand what command-line compression means.   Simon
suggested that this patch can be used with log-shipping, and I agree.
If we compare it with gzip or other general-purpose compression, the
compression ratio, CPU usage and I/O of pg_compresslog are all
considerably better than those of gzip.


OK, that answered my question.


This is why I don't like Josh's suggested name of wal_compressable
either.
WAL is compressible either way; only pg_compresslog would need to be
more complex if you don't turn off the full page optimization. I think a
good name would tell that you are turning off an optimization.
(thus my wal_fullpage_optimization on/off)


Well, as a PG hacker I find the name wal_fullpage_optimization quite baffling 
and I think our general user base will find it even more so.  Now that I have 
Koichi's explanation of the problem, I vote for simply slaving this to the 
PITR settings and not having a separate option at all.


Could I have a more specific suggestion on this?

Regards;


--
-
Koichi Suzuki

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] [PATCHES] Fix for large file support

2007-04-26 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Tom Lane wrote:
 [ redirecting to -hackers for wider comment ]
 
 Zdenek Kotala [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  LET_OS_MANAGE_FILESIZE is a good way. I think I fixed one problem with
  this option: the size of the offset. I went through the code and did not
  see any other problem there. However, as you mentioned, it needs more
  testing. I am going to take a server with a large disk array and test it.
 
  I would like to add an --enable-largefile switch to configure to make
  this accessible to a wider group of users. What do you think about it?
 
 Yeah, I was going to suggest the same thing --- but not with that switch
 name.  We already use enable/disable-largefile to control whether 64-bit
 file access is built at all (this mostly affects pg_dump at the moment).
 
 I think the clearest way might be to flip the sense of the variable.
 I never found LET_OS_MANAGE_FILESIZE to be a good name anyway.  I'd
 suggest USE_SEGMENTED_FILES, which defaults to on, and you can
 turn it off via --disable-segmented-files if configure confirms your
 OS has largefile support (thus you could not specify both this and
 --disable-largefile).
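 
 In other words, usage would look like this (a sketch of the proposed
 switch; it does not exist yet):
 
 	./configure --disable-segmented-files
 	./configure --disable-largefile --disable-segmented-files  # rejected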
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] [HACKERS] CIC and deadlocks

2007-04-26 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Pavan Deolasee wrote:
 On 4/11/07, Tom Lane [EMAIL PROTECTED] wrote:
 
 
  [ itch... ]  The problem is with time-extended execution of
  GetSnapshotData; what happens if the other guy lost the CPU for a good
  long time while in the middle of GetSnapshotData?  He might set his
  xmin based on info you saw as long gone.
 
  You might be correct that it's safe, but the argument would have to
  hinge on the OldestXmin process being unable to commit because of
  someone holding shared ProcArrayLock; a point you are definitely not
  making above.  (Study the comments in GetSnapshotData for awhile,
  also those in xact.c's commit-related code.)
 
 
 My argument was based on what you said above, but I obviously did not
 state it well :)
 
 Anyways, I think it's better to be safe, and we agree that it's not such a
 bad thing to take an exclusive lock on the procarray because CIC is not
 something that happens very often. Attached is a revised patch which takes
 an exclusive lock on the procarray, the rest remaining the same.
 
 Thanks,
 Pavan
 
 -- 
 
 EnterpriseDB http://www.enterprisedb.com

[ Attachment, skipping... ]

 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PATCHES] Reviewers Guide to Deferred Transactions/TransactionGuarantee

2007-04-26 Thread Bruce Momjian
Simon Riggs wrote:
  That should go away entirely; to me the main point of the separate
  wal-writer process is to take over responsibility for not letting too
  many dirty wal buffers accumulate.
 
 Yes
 
 
 I'll make the agreed changes by next Wed/Thurs. 

I have seen no patch yet with the agreed changes.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Background LRU Writer/free list

2007-04-26 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Greg Smith wrote:
 I'm mostly done with my review of the Automatic adjustment of 
 bgwriter_lru_maxpages patch.  In addition to issues already brought up 
 with that code, there are some small things that need to be done to merge 
 it with the recent pg_stat_bgwriter patch, and I have some concerns about 
 its unbounded scanning of the buffer pool; I'll write that up in more 
 detail or just submit an improved patch as I get time this week.
 
 But there's a fundamental question that has been bugging me, and I think 
 it impacts the direction that code should take.  Unless I'm missing 
 something in my reading, buffers written out by the LRU writer aren't ever 
 put onto the free list.  I assume this is to stop from prematurely 
 removing buffers that contain useful data.  In cases where a substantial 
 percentage of the buffer cache is dirty, the LRU writer has to scan a 
 significant portion of the pool looking for one of the rare clean buffers, 
 then write it out.  When a client goes to grab a free buffer afterward, it 
 has to scan the same section of the pool to find the now clean buffer, 
 which seems redundant.
 
 With the new patch, the LRU writer is fairly well bounded in that it 
 doesn't write out more than it thinks it will need; you shouldn't get into 
 a situation where many more pages are written than will be used in the 
 near future.  Given that mindset, shouldn't pages the LRU scan writes just 
 get moved onto the free list?
 
 --
 * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
 
 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PATCH] A crash and subsequent recovery of the master can cause the slave to get out-of-sync

2007-04-26 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Florian G. Pflug wrote:
 Hi
 
 I believe I have discovered the following problem in pgsql 8.2 and HEAD,
 concerning warm-standbys using WAL log shipping.
 
 The problem is that after a crash, the master might complete incomplete
 actions via rm_cleanup() - but since it won't wal-log those changes,
 the slave won't know about this. This will at least prevent the creation
 of any further restart points on the slave (because safe_restartpoint
 will never return true again) - and it might even cause data corruption,
 if subsequent WAL records are interpreted wrongly by the slave because
 it sees different data than the master did when it generated them.
 
 Attached is a patch that lets RecoveryRestartPoint call all
 rm_cleanup() methods and create a restart point whenever it encounters
 a shutdown checkpoint in the WAL (because those are generated after
 recovery). This ought not cause a performance degradation, because
 shutdown checkpoints will occur very infrequently.
 
 The patch is per discussion with Simon Riggs.
 
 I've not yet had a chance to test this patch; I only made sure
 that it compiles. I'm sending this out now because I hope this
 might make it into 8.2.4.
 
 greetings, Florian Pflug


 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] elog(FATAL) vs shared memory

2007-04-26 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Where are we on this?

Still trying to think of a less messy solution...

 What it essentially says is that trying to clean up shared-memory
 state in a PG_TRY block is unsafe: you can't be certain you'll
 get to do it.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] elog(FATAL) vs shared memory

2007-04-26 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Where are we on this?
 
 Still trying to think of a less messy solution...

OK, put in the patches hold queue for 8.4.

---


 
  What it essentially says is that trying to clean up shared-memory
  state in a PG_TRY block is unsafe: you can't be certain you'll
  get to do it.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] pgsql crollable cursor doesn't support one form of postgresql's cu

2007-04-26 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Pavel Stehule wrote:
 
 Pavel Stehule [EMAIL PROTECTED] writes:
   I found one unsupported form of plpgsql's FETCH statement which is
   supported by PostgreSQL.
 
   PostgreSQL knows
   FETCH 3 FROM 
 
   but plpgsql always needs a direction keyword.
 
 No, I think that's OK, because that form specifies fetching 3 rows,
 which plpgsql's FETCH doesn't support.
 
 
 It's true. There is the same question for the MOVE statement too. Another
 difference is the unsupported keyword IN.
 
 It can be fixed:
 
 *** ./gram.y.orig 2007-04-19 20:27:17.0 +0200
 --- ./gram.y  2007-04-19 20:41:16.0 +0200
 ***
 *** 2059,2071 
   else if (pg_strcasecmp(yytext, "absolute") == 0)
   {
   fetch->direction = FETCH_ABSOLUTE;
 ! fetch->expr = plpgsql_read_expression(K_FROM, "FROM");
   check_FROM = false;
   }
   else if (pg_strcasecmp(yytext, "relative") == 0)
   {
   fetch->direction = FETCH_RELATIVE;
 ! fetch->expr = plpgsql_read_expression(K_FROM, "FROM");
   check_FROM = false;
   }
   else if (pg_strcasecmp(yytext, forward) == 0)
 --- 2059,2071 
   else if (pg_strcasecmp(yytext, absolute) == 0)
   {
   fetch-direction = FETCH_ABSOLUTE;
 ! fetch-expr = read_sql_construct(K_FROM, K_IN, FROM/IN, 
 SELECT , 
 true, true, NULL);
   check_FROM = false;
   }
   else if (pg_strcasecmp(yytext, relative) == 0)
   {
   fetch-direction = FETCH_RELATIVE;
 ! fetch-expr = read_sql_construct(K_FROM, K_IN, FROM/IN, 
 SELECT , 
 true, true, NULL);
   check_FROM = false;
   }
   else if (pg_strcasecmp(yytext, forward) == 0)
 ***
 *** 2076,2081 
 --- 2076,2087 
   {
   fetch-direction = FETCH_BACKWARD;
   }
 + else if (tok != T_SCALAR)
 + {
 + plpgsql_push_back_token(tok);
 + fetch-expr = read_sql_construct(K_FROM, K_IN, FROM/IN, 
 SELECT , 
 true, true, NULL);
 + check_FROM = false;
 + }
   else
   {
   /* Assume there's no direction clause */
 ***
 *** 2083,2091 
   check_FROM = false;
   }
 
 ! /* check FROM keyword after direction's specification */
 ! if (check_FROM  yylex() != K_FROM)
 ! yyerror(expected \FROM\);
 
   return fetch;
   }
 --- 2089,2097 
   check_FROM = false;
   }
 
 ! /* check FROM or IN keyword after direction's specification */
 ! if (check_FROM  (yylex() != K_FROM  yylex() != K_IN))
 ! yyerror(expected \FROM/IN\);
 
   return fetch;
   }
 
 Regards
 Pavel Stehule
 
 
 
 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Re: [COMMITTERS] pgsql: Remove some of the most blatant brain-fade in the recent guc

2007-04-26 Thread Bruce Momjian

Is anyone working on this fix?

---

Tom Lane wrote:
 Log Message:
 ---
 Remove some of the most blatant brain-fade in the recent guc patch
 (it's so nice to have a buildfarm member that actively rejects naked
 uses of strcasecmp).  This coding is still pretty awful, though, since
 it's going to be O(N^2) in the number of guc variables.  May I direct
 your attention to bsearch?
 
 Modified Files:
 --
 pgsql/src/backend/utils/misc:
 guc.c (r1.387 - r1.388)
 
 (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/misc/guc.c.diff?r1=1.387r2=1.388)
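 
 For reference, the bsearch-based lookup Tom is pointing at might look
 roughly like this (a sketch only: guc_variables, num_guc_variables and
 guc_var_compare are the existing names in guc.c, the wrapper itself is
 assumed):
 
     static struct config_generic *
     find_option_sorted(const char *name)
     {
         struct config_generic **res;
         struct config_generic key;
         struct config_generic *keyp = &key;
 
         key.name = name;
         /* O(log N) search over the sorted array of GUC variables */
         res = (struct config_generic **)
             bsearch(&keyp, guc_variables, num_guc_variables,
                     sizeof(struct config_generic *), guc_var_compare);
         return res ? *res : NULL;
     }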
 
 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] [HACKERS] autovacuum does not start in HEAD

2007-04-26 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


ITAGAKI Takahiro wrote:
 I wrote:
  I found that autovacuum launcher does not launch any workers in HEAD.
 
 The attached autovacuum-fix.patch should fix the problem. I changed
 the decision about the next autovacuum target to use 'greater or
 equal' instead of 'greater'.
 
 The point is timer resolution: on some platforms the timer only has
 millisecond resolution. We initialize adl_next_worker with the current
 time in rebuild_database_list(), but we can then read the very same
 value again in do_start_worker(), because on those low-resolution
 platforms there is no measurable difference between the two readings.
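 
 In sketch form (variable and function names are stand-ins, not the
 actual patch):
 
     /*
      * With a millisecond-resolution clock, the time read in
      * do_start_worker() can be exactly equal to the adl_next_worker
      * value stored by rebuild_database_list(), so a strict '>'
      * comparison never becomes true and no worker is ever launched.
      */
     if (current_time >= avdb->adl_next_worker)      /* was: '>' */
         launch_worker(avdb);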
 
 
 Another attached patch, autovacuum-debug.patch, is just for
 printf-style debugging. I got the following logs without the fix --
 autovacuum never runs.
 
 # SELECT oid, datname FROM pg_database ORDER BY oid;
    oid  |  datname
  -------+-----------
       1 | template1
   11494 | template0
   11495 | postgres
   16384 | bench
 (4 rows)
 
 # pgbench bench -s1 -c1 -t10
 [with autovacuum_naptime = 10s and log_min_messages = debug1]
 
 LOG:  do_start_worker skip : 230863399.25, 230863399.25, 230863409.25
 LOG:  rebuild_database_list: db=11495, time=230863404.25
 LOG:  rebuild_database_list: db=16384, time=230863409.25
 DEBUG:  autovacuum: processing database bench
 LOG:  do_start_worker skip : 230863404.25, 230863404.25, 230863414.25
 LOG:  do_start_worker skip : 230863404.25, 230863409.25, 230863414.25
 LOG:  rebuild_database_list: db=11495, time=230863409.25
 LOG:  rebuild_database_list: db=16384, time=230863414.25
 LOG:  do_start_worker skip : 230863409.25, 230863409.25, 230863419.25
 LOG:  do_start_worker skip : 230863409.25, 230863414.25, 230863419.25
 LOG:  rebuild_database_list: db=11495, time=230863414.25
 LOG:  rebuild_database_list: db=16384, time=230863419.25
 ...
 (no autovacuum activities forever)
 
 Regards,
 ---
 ITAGAKI Takahiro
 NTT Open Source Software Center
 

[ Attachment, skipping... ]

[ Attachment, skipping... ]

 
 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Feature freeze progress report

2007-04-26 Thread Bruce Momjian
Now that we are halfway through the scheduled feature freeze, I wanted
to share my thoughts about this period.

Having just pushed all open items into the patches queue or 8.4 hold
queue, I am seeing that we have many more in-process patches than we
normally do at this stage of the process.  I think there are three
reasons for this:

1)  The patches are not necessarily larger, but they are more complex,
because most of the easy TODO items have already been implemented in
previous PostgreSQL releases.

2)  We have a number of new developers who took on some of these complex
TODO items, and some of those items were significantly beyond the
developers' capabilities at the start of the process.

3)  Many of the complex patches are hard to review because they deal
with very complex areas of the code, like reliability or transaction
semantics.

Our community could probably handle a few of these complex patches, but
the volume for this release is significantly higher than in previous
releases.  The community is doing a good job of giving patch writers
feedback, and new patch versions are getting generated.  However, this
amount of patch churn is not normal.

There are a few possible results that might come out of this:

1)  Feature freeze will be much longer.
2)  More patches will be postponed for later releases than usual.
3)  Most patches will be included but beta will be longer because
of bug fixing.
4)  Most patches will be included but beta will not be any longer.

I think we all hope for #4, but right now, I don't know the probability
of that.  We are going to have to think creatively in the coming weeks
to increase the likelihood of a #4 result.  However, right now, I can't
think of what we can do to improve the odds.  I think the community has
to come up with ideas on how to accomplish this.

[ FYI, I leave on a 2-week trip tomorrow/Friday.]

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [COMMITTERS] pgsql: Remove some of the most blatant brain-fade in the recent guc

2007-04-26 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Is anyone working on this fix?

I dunno, but that patch is gonna get reverted altogether if someone
doesn't fix the fact that it broke PGCLIENTENCODING ...

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

2007-04-26 Thread Tom Lane
Dave Page [EMAIL PROTECTED] writes:
 I've been seeing this failure intermittently on Narwhal HEAD, and once
 on 8.1. Other branches have been OK, as have other animals running on
 the same physical box. Narwhal-HEAD is run more often than any other
 builds however.

Oh, this is interesting:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=baiji&dt=2007-04-26%2022:00:02

Different compiler, different OS, not quite the same block number (109,
whereas IIRC all the previous examples have complained of block 104).
Is this the same physical machine as narwhal?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] too much WAL volume

2007-04-26 Thread Greg Smith

On Thu, 26 Apr 2007, Zeugswetter Andreas ADI SD wrote:


I am not sure that shrinking the per-WAL-record size (other than the full
page images), e.g. by logging only the changed bytes and not whole tuples,
would have a huge impact on OLTP tx/sec, since the limiting factor is
I/Os per second and not MB per second.


With the kind of caching controller that's necessary for any serious OLTP 
work with Postgres, the number of I/Os per second isn't really an important 
number.  The total volume of writes to the WAL volume can be, though.  It's 
difficult but not impossible to encounter a workload that becomes 
bottlenecked by WAL volume on a good OLTP server, particularly because 
the WAL is often going to a single disk or a RAID-1 pair.  Whether those 
workloads also have the appropriate properties such that their WAL could 
be shrunk usefully in real time is a good question.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] pgsql crollable cursor doesn't support one form of postgresql's cu

2007-04-26 Thread Neil Conway
I haven't read the rest of the thread yet, but is this hunk not buggy?
yylex() is side-effecting, so the two calls to yylex() do not do what
the comment suggests.

  *** 2083,2091 ****
    			check_FROM = false;
    		}
  
  ! 		/* check FROM keyword after direction's specification */
  ! 		if (check_FROM && yylex() != K_FROM)
  ! 			yyerror("expected \"FROM\"");
  
    		return fetch;
    	}
  --- 2089,2097 ----
    			check_FROM = false;
    		}
  
  ! 		/* check FROM or IN keyword after direction's specification */
  ! 		if (check_FROM && (yylex() != K_FROM && yylex() != K_IN))
  ! 			yyerror("expected \"FROM/IN\"");
  
    		return fetch;
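 
 A side-effect-free version would call yylex() once and test the saved
 token, along these lines (a sketch, not the final patch):
 
     if (check_FROM)
     {
         int tok = yylex();
 
         if (tok != K_FROM && tok != K_IN)
             yyerror("expected \"FROM/IN\"");
     }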

-Neil



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] pgsql crollable cursor doesn't support one form ofpostgresql's cu

2007-04-26 Thread Pavel Stehule

Hello,

That's true, there is a bug. I'll send an updated version tomorrow.

Regards
Pavel


I haven't read the rest of the thread yet, but is this hunk not buggy?
yylex() is side-effecting, so the two calls to yylex() do not do what
the comment suggests.


  ! 		/* check FROM or IN keyword after direction's specification */
  ! 		if (check_FROM && (yylex() != K_FROM && yylex() != K_IN))
  ! 			yyerror("expected \"FROM/IN\"");

   return fetch;

-Neil







---(end of broadcast)---
TIP 6: explain analyze is your friend