Re: [PATCHES] Relation forks FSM rewrite patches

2008-07-03 Thread Mark Kirkwood

Heikki Linnakangas wrote:
Here's an updated version of the relation forks patch, and an 
incremental FSM rewrite patch on top of that. The relation forks patch 
is ready for review. The FSM implementation is more work-in-progress 
still, but I'd like to get some review on that as well, with the goal 
of doing more performance testing and committing it after the commit 
fest.


The one part that I'm not totally satisfied with in the relation forks 
patch is the smgrcreate() function. The question is: which 
piece of code decides which forks to create for a relation, and when 
to create them? I settled on the concept that all forks that a 
relation will need are created at once, in one smgrcreate() call. 
There's no function to create additional forks later on. Likewise, 
there's no function to unlink individual forks, you have to unlink the 
whole relation.


Currently, heap_create() decides which forks to create. That's fine at 
the moment, but will become problematic in the future, as it's 
indexam-specific which forks an index requires. That decision should 
really be done in the indexam. One possibility would be to only create 
the main fork in heap_create(), and let indexam create any additional 
forks it needs in ambuild function.
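
For concreteness, the one-call shape described above would look roughly
like the following standalone sketch. Every name in it is invented for
illustration and is not taken from the patch:

#include <stdio.h>

/* Illustrative fork list -- the real patch defines its own set. */
typedef enum ForkNumber
{
    MAIN_FORKNUM = 0,
    FSM_FORKNUM,
    MAX_FORKNUM = FSM_FORKNUM
} ForkNumber;

/* Hypothetical per-fork creation step (stands in for the smgr call). */
static void
create_fork(const char *relpath, ForkNumber forknum)
{
    printf("creating %s, fork %d\n", relpath, (int) forknum);
}

/* One call creates every fork the relation will ever need; there is no
 * later "add a fork" entry point, and unlinking is whole-relation only. */
static void
smgrcreate_all(const char *relpath)
{
    ForkNumber  forknum;

    for (forknum = MAIN_FORKNUM; forknum <= MAX_FORKNUM; forknum++)
        create_fork(relpath, forknum);
}

int
main(void)
{
    smgrcreate_all("base/16384/16385");
    return 0;
}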




I've had a bit of a play with this, and it looks pretty nice. One point 
that comes to mind is that we are increasing the number of required file 
descriptors by a factor of 2 (and possibly 3 if we use a fork for the 
visibility map). Is this a problem, do you think?
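
To put a rough number on it, a back-of-envelope sketch (the relation
count is an invented workload figure, and one descriptor per open fork
is assumed, ignoring segment files):

#include <stdio.h>

int
main(void)
{
    int open_relations = 1000;  /* hypothetical workload figure */
    int forks_per_rel = 3;      /* main + FSM + visibility map */

    /* one descriptor per open fork, ignoring 1GB segment splitting */
    printf("worst-case descriptors: %d\n", open_relations * forks_per_rel);
    return 0;
}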


Cheers

Mark



Re: [HACKERS] [PATCHES] WITH RECURSIVE patch V0.1

2008-05-18 Thread Mark Mielke

Merlin Moncure wrote:

On Sun, May 18, 2008 at 5:22 PM, Zoltan Boszormenyi [EMAIL PROTECTED] wrote:
  

Can we get the rows in tree order, please? I.e. something like this:



Is ordering by tree order defined in the standard when no explicit
order is given?  If not, it probably returns them in the order they
are pulled up, which might be the fastest way.


+1 for the fastest way, which I expect to often be "find all level 1 
matches, find all level 2 matches, ...". If ORDER BY is important, it 
should be specified (although it may be difficult or impossible to 
properly represent ORDER BY for a tree? not sure?) I think most uses of 
recursion require extra client side code to deal with anyway, so only 
relative order is important (order within a particular branch).


There are things I'd like to use this for right now. Currently I use 
plpgsql procedures to implement my own recursion. :-)


Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]



Re: [PATCHES] configure option for XLOG_BLCKSZ

2008-05-02 Thread Mark Wong
On Fri, May 2, 2008 at 12:04 AM, Joshua D. Drake [EMAIL PROTECTED] wrote:

 Tom Lane wrote:

  Mark Wong [EMAIL PROTECTED] writes:
 
   I saw that a patch was committed that exposed a configure switch for
   BLCKSZ.  I was hoping that I could do the same for XLOG_BLCKSZ.
  
 
  Well, we certainly *could*, but what's the use-case really?  The case
  for varying BLCKSZ is marginal already, and I've seen none at all for
  varying XLOG_BLCKSZ.  Why do we need to make it easier than editing
  pg_config_manual.h?
 

  The use case I could see is for performance testing but I would concur that
 it doesn't take much to modify pg_config_manual.h. In thinking about it,
 this might actually be a foot gun. You have a new pg guy who downloads the
 source and thinks to himself, "Hey, I have a 4k block size as formatted on
 my hard disk." Then all of a sudden he has a PostgreSQL that is incompatible
 with everything else.

As someone who has tested varying both those parameters it feels
awkward to have a configure option for one and not the other, or vice
versa.  I have slightly stronger feelings for having them both as
configure options because it's easier to script, but feel a little
more strongly about having BLCKSZ and XLOG_BLCKSZ both as either
configure options or in pg_config_manual.h.  To have them such that
one needs to change them in different manners makes a tad more work in
automating testing.  So my case is just for ease of testing.
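
For reference, the manual route being weighed against a configure switch
is a one-line edit of pg_config_manual.h, something like the following
(the 16kB value is purely illustrative):

/*
 * pg_config_manual.h (excerpt) -- the manual alternative to a configure
 * switch.  The 16kB value is shown only for illustration; it must be a
 * power of 2, and changing it requires an initdb.
 */
#define XLOG_BLCKSZ 16384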

Regards,
Mark



Re: [PATCHES] configure option for XLOG_BLCKSZ

2008-05-02 Thread Mark Wong
On Fri, May 2, 2008 at 8:50 AM, Tom Lane [EMAIL PROTECTED] wrote:
 Mark Wong [EMAIL PROTECTED] writes:

  As someone who has tested varying both those parameters it feels
   awkward to have a configure option for one and not the other, or vice
   versa.  I have slightly stronger feelings for having them both as
   configure options because it's easier to script, but feel a little
   more strongly about having BLCKSZ and XLOG_BLCKSZ both as either
   configure options or in pg_config_manual.h.  To have them such that
   one needs to change them in different manners makes a tad more work in
   automating testing.  So my case is just for ease of testing.

  Well, that's a fair point.  Another issue though is whether it makes
  sense for XLOG_BLCKSZ to be different from BLCKSZ at all, at least in
  the default case.  They are both the unit of I/O and it's not clear
  why you'd want different units.  Mark, has your testing shown any
  indication that they really ought to be separately configurable?
  I could see having the same configure switch set both of 'em.

I still believe it makes sense to have them separated.  I did have
some data, which has since been destroyed, that suggested there were
some system characterization differences for OLTP workloads with
PostgreSQL.  Let's hope those disks get delivered to Portland soon. :)

Regards,
Mark



Re: [PATCHES] configure option for XLOG_BLCKSZ

2008-05-02 Thread Mark Wong
On Fri, May 2, 2008 at 9:16 AM, Tom Lane [EMAIL PROTECTED] wrote:
 Mark Wong [EMAIL PROTECTED] writes:

  I still believe it makes sense to have them separated.  I did have
   some data, which has since been destroyed, that suggested there were
   some system characterization differences for OLTP workloads with
   PostgreSQL.  Let's hope those disks get delivered to Portland soon. :)

  Fair enough.  It's not that much more code to have another configure
  switch --- will go do that.

  If we are allowing blocksize and relation seg size to have configure
  switches, seems that symmetry would demand that XLOG_SEG_SIZE be
  configurable as well.  Thoughts?

I don't have a feel for this one, but when we get the disks set up we
can certainly test to see what effects it has. :)

Regards,
Mark



[PATCHES] configure option for XLOG_BLCKSZ

2008-05-01 Thread Mark Wong
Hi all,

I saw that a patch was committed that exposed a configure switch for
BLCKSZ.  I was hoping that I could do the same for XLOG_BLCKSZ.  I
think I got the configure.in, sgml, pg_config_manual.h, and
pg_config.h.in changes correct.

Regards,
Mark
Index: configure
===
RCS file: /projects/cvsroot/pgsql/configure,v
retrieving revision 1.592
diff -c -r1.592 configure
*** configure	2 May 2008 01:08:22 -	1.592
--- configure	2 May 2008 04:39:34 -
***
*** 1374,1379 
--- 1374,1380 
--with-libs=DIRS    alternative spelling of --with-libraries
--with-pgport=PORTNUM   set default port number [5432]
--with-blocksize=BLOCKSIZE  set block size in kB [8]
+   --with-xlog-blocksize=BLOCKSIZE  set xlog block size in kB [8]
--with-segsize=SEGSIZE  set segment size in GB [1]
--with-tcl  build Tcl modules (PL/Tcl)
--with-tclconfig=DIRtclConfig.sh is in DIR
***
*** 2602,2607 
--- 2603,2658 
  _ACEOF
  
  
+ { echo "$as_me:$LINENO: checking for xlog block size" >&5
+ echo $ECHO_N "checking for xlog block size... $ECHO_C" >&6; }
+ 
+ pgac_args="$pgac_args with_xlog_blocksize"
+ 
+ 
+ # Check whether --with-xlog-blocksize was given.
+ if test "${with_xlog_blocksize+set}" = set; then
+   withval=$with_xlog_blocksize;
+   case $withval in
+     yes)
+       { { echo "$as_me:$LINENO: error: argument required for --with-xlog-blocksize option" >&5
+ echo "$as_me: error: argument required for --with-xlog-blocksize option" >&2;}
+    { (exit 1); exit 1; }; }
+       ;;
+     no)
+       { { echo "$as_me:$LINENO: error: argument required for --with-xlog-blocksize option" >&5
+ echo "$as_me: error: argument required for --with-xlog-blocksize option" >&2;}
+    { (exit 1); exit 1; }; }
+       ;;
+     *)
+       xlog_blocksize=$withval
+       ;;
+   esac
+ 
+ else
+   xlog_blocksize=8
+ fi
+ 
+ 
+ case ${xlog_blocksize} in
+   1) XLOG_BLCKSZ=1024;;
+   2) XLOG_BLCKSZ=2048;;
+   4) XLOG_BLCKSZ=4096;;
+   8) XLOG_BLCKSZ=8192;;
+  16) XLOG_BLCKSZ=16384;;
+  32) XLOG_BLCKSZ=32768;;
+   *) { { echo "$as_me:$LINENO: error: Invalid block size. Allowed values are 1,2,4,8,16,32." >&5
+ echo "$as_me: error: Invalid block size. Allowed values are 1,2,4,8,16,32." >&2;}
+    { (exit 1); exit 1; }; }
+ esac
+ { echo "$as_me:$LINENO: result: ${xlog_blocksize}kB" >&5
+ echo "${ECHO_T}${xlog_blocksize}kB" >&6; }
+ 
+ 
+ cat >>confdefs.h <<_ACEOF
+ #define XLOG_BLCKSZ ${XLOG_BLCKSZ}
+ _ACEOF
+ 
+ 
  #
  # File segment size
  #
Index: configure.in
===
RCS file: /projects/cvsroot/pgsql/configure.in,v
retrieving revision 1.558
diff -c -r1.558 configure.in
*** configure.in	2 May 2008 01:08:26 -	1.558
--- configure.in	2 May 2008 04:39:34 -
***
*** 249,254 
--- 249,278 
   Changing BLCKSZ requires an initdb.
  ]) 
  
+ AC_MSG_CHECKING([for xlog block size])
+ PGAC_ARG_REQ(with, xlog-blocksize, [  --with-xlog-blocksize=BLOCKSIZE  set xlog block size in kB [[8]]],
+  [xlog_blocksize=$withval],
+  [xlog_blocksize=8])
+ case ${xlog_blocksize} in
+   1) XLOG_BLCKSZ=1024;;
+   2) XLOG_BLCKSZ=2048;;
+   4) XLOG_BLCKSZ=4096;;
+   8) XLOG_BLCKSZ=8192;;
+  16) XLOG_BLCKSZ=16384;;
+  32) XLOG_BLCKSZ=32768;;
+   *) AC_MSG_ERROR([Invalid block size. Allowed values are 1,2,4,8,16,32.])
+ esac
+ AC_MSG_RESULT([${xlog_blocksize}kB])
+ 
+ AC_DEFINE_UNQUOTED([XLOG_BLCKSZ], ${XLOG_BLCKSZ}, [
+  Size of a WAL file block.  This need have no particular relation to BLCKSZ.
+  XLOG_BLCKSZ must be a power of 2, and if your system supports O_DIRECT I/O,
+  XLOG_BLCKSZ must be a multiple of the alignment requirement for direct-I/O
+  buffers, else direct I/O may fail.
+ 
+  Changing XLOG_BLCKSZ requires an initdb.
+ ]) 
+ 
  #
  # File segment size
  #
Index: doc/src/sgml/installation.sgml
===
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/installation.sgml,v
retrieving revision 1.308
diff -c -r1.308 installation.sgml
*** doc/src/sgml/installation.sgml	2 May 2008 01:08:26 -	1.308
--- doc/src/sgml/installation.sgml	2 May 2008 04:39:36 -
***
*** 1104,1109 
--- 1104,1123 
    </varlistentry>
  
    <varlistentry>
+    <term><option>--with-xlog-blocksize=<replaceable>BLOCKSIZE</replaceable></option></term>
+    <listitem>
+     <para>
+      Set the <firstterm>xlog block size</>, in kilobytes.  This is the unit
+      of storage and I/O within the WAL files.  The default, 8 kilobytes,
+      is suitable for most situations; but other values may be useful
+      in special cases.
+      The value must be a power of 2 between 1 and 32 (kilobytes).
+      Note that changing this value requires an initdb.
+     </para>
+    </listitem>
+   </varlistentry>
+ 
    <varlistentry>
     <term><option>--disable-spinlocks</option></term>
     <listitem>

[PATCHES] Proposing correction to posix_fadvise() usage in xlog.c

2008-02-29 Thread Mark Wong
I believe I have a correction to the usage of posix_fadvise() in
xlog.c.  Basically, posix_fadvise() is being called right before the
WAL segment file is closed, which effectively does nothing, as
opposed to calling it when the file is opened.  This proposed correction
calls posix_fadvise() in three locations in order to make sure
POSIX_FADV_DONTNEED is set correctly, since there are three cases for
opening a WAL segment file for writing.
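
The intended placement looks roughly like this minimal sketch, assuming
a POSIX system; the function name and error handling are illustrative
rather than lifted from the patch:

#include <fcntl.h>
#include <stdio.h>

/*
 * Sketch only: open a WAL segment for writing and immediately advise the
 * kernel that its pages will not be re-read, rather than advising just
 * before close.
 */
static int
open_walseg_for_write(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
    {
        perror("open");
        return -1;
    }
#ifdef POSIX_FADV_DONTNEED
    /* offset 0, len 0 means "through end of file" */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#endif
    return fd;
}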

I'm hesitant to post any data I have because I only have a little PC
with a SATA drive in it.  My hardware knowledge on SATA controllers
and drives is a little weak, but my testing with dbt-2 is showing the
performance dropping.  I am guessing that SATA drives have the write cache
enabled by default, so it seems to make sense that using
POSIX_FADV_DONTNEED will cause writes to be slower by writing through
the disk cache.  Again, assuming that is possible with SATA hardware.

If memory serves, one of the wins here is supposed to be that in a
scenario where we are not expecting to re-read writes to the WAL, we
also do not want the writes to disk to flush out other data from the
operating system disk cache.  But I'm not sure how best to test
correctness.

Anyway, I hope I'm not way off but I'm sure someone will correct me. :)

Regards,
Mark


pgsql-log-fadvise.patch
Description: Binary data



[PATCHES] Auto create (top level) directory for create tablespace

2007-12-15 Thread Mark Kirkwood
I thought it made sense for CREATE TABLESPACE to attempt to create the 
top level location directory - and also for tablespace redo to do 
likewise during WAL replay.


Tablespace creation then behaves a bit more like initdb with respect to 
directory creation, which I found quite nice.


Patch against HEAD, passes regression tests.

Cheers

Mark
Index: src/backend/commands/tablespace.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/commands/tablespace.c,v
retrieving revision 1.51
diff -c -r1.51 tablespace.c
*** src/backend/commands/tablespace.c	15 Nov 2007 21:14:34 -	1.51
--- src/backend/commands/tablespace.c	15 Dec 2007 19:04:45 -
***
*** 199,204 
--- 199,205 
  	Oid			tablespaceoid;
  	char	   *location;
  	char	   *linkloc;
+ 	struct stat	st;
  	Oid			ownerId;
  
  	/* Must be super user */
***
*** 297,302 
--- 298,317 
  	recordDependencyOnOwner(TableSpaceRelationId, tablespaceoid, ownerId);
  
  	/*
+ 	 * Try to create the target directory if it does not exist.
+ 	 */
+ 	if (stat(location, &st) < 0)
+ 	{
+ 		if (mkdir(location, 0700) != 0)
+ 			ereport(ERROR,
+ 					(errcode_for_file_access(),
+ 					 errmsg("could not create location directory \"%s\": %m",
+ 							location)));
+ 	}
+ 
+ 
+ 	/*
  	 * Attempt to coerce target directory to safe permissions.	If this fails,
  	 * it doesn't exist or has the wrong owner.
  	 */
***
*** 1279,1284 
--- 1294,1314 
  		xl_tblspc_create_rec *xlrec = (xl_tblspc_create_rec *) XLogRecGetData(record);
  		char	   *location = xlrec->ts_path;
  		char	   *linkloc;
+ 		struct stat	st;
+ 
+ 		/*
+ 		 * Try to create the target directory if it does not exist.
+ 		 */
+ 		if (stat(location, &st) < 0)
+ 		{
+ 			if (mkdir(location, 0700) != 0)
+ 			{
+ 				ereport(ERROR,
+ 						(errcode_for_file_access(),
+ 						 errmsg("could not create location directory \"%s\": %m",
+ 								location)));
+ 			}
+ 		}
  
  		/*
  		 * Attempt to coerce target directory to safe permissions.	If this
Index: src/test/regress/output/tablespace.source
===
RCS file: /projects/cvsroot/pgsql/src/test/regress/output/tablespace.source,v
retrieving revision 1.5
diff -c -r1.5 tablespace.source
*** src/test/regress/output/tablespace.source	3 Jun 2007 22:16:03 -	1.5
--- src/test/regress/output/tablespace.source	15 Dec 2007 19:04:46 -
***
*** 57,63 
  
  -- Will fail with bad path
  CREATE TABLESPACE badspace LOCATION '/no/such/location';
! ERROR:  could not set permissions on directory "/no/such/location": No such file or directory
  -- No such tablespace
  CREATE TABLE bar (i int) TABLESPACE nosuchspace;
  ERROR:  tablespace "nosuchspace" does not exist
--- 57,63 
  
  -- Will fail with bad path
  CREATE TABLESPACE badspace LOCATION '/no/such/location';
! ERROR:  could not create location directory "/no/such/location": No such file or directory
  -- No such tablespace
  CREATE TABLE bar (i int) TABLESPACE nosuchspace;
  ERROR:  tablespace "nosuchspace" does not exist



Re: [PATCHES] Auto create (top level) directory for create tablespace

2007-12-15 Thread Mark Kirkwood

Tom Lane wrote:

Mark Kirkwood [EMAIL PROTECTED] writes:
  
I thought it made sense for CREATE TABLESPACE to attempt to create the 
top level location directory -



I thought we had deliberately made it not do that.  Auto-recreate during
replay sounds even worse.  The problem is that a tablespace would
normally be under a mount point, and auto-create has zero chance of
getting such a path right.

Ignoring this point is actually a fine recipe for destroying your data;
see Joe Conway's report a couple years back about getting burnt by a
soft NFS mount.  If the DB directory is not there, auto-creating it is
a horrible idea.

  


Hmm - ok, unmounted filesystems could bite you. However, they could bite 
folks creating the directory manually too... (I guess you could argue it 
is less likely though).


On the replay front, the use case I was thinking about is a standby 
database - the classic foot gun there is to create a tablespace on the 
source box and forget to add the appropriate directory on the target, 
and bang! replay fails.


It does seem to me like there are scenarios where either behavior is 
undesirable... a possible option is a configuration parameter to choose 
between auto creation or not. However I'm happy to go with the consensus 
here - if it's universally deemed to be a terrible idea, then let's ditch 
the patch :-)


Best wishes

Mark



[PATCHES] Cache lookup failed for relation X [was: DROP FUNCTION cache lookup failed for relation X]

2007-11-01 Thread Mark Kirkwood

Tom Lane wrote:


Still in the think-about-it mode, personally ... my proposed fix is
certainly much too invasive to consider back-patching, so unless someone
comes up with a way-simpler idea, it's 8.3 material at best ...


I ran into a variant of this today - simply creating and dropping a 
table repeatedly while doing \d from another session:


Session 1:

perl -e 'while (1) {print "drop table if exists z0; \n create table z0 
(a int, b int);\n drop table z0;\n"}' | psql cache > z0.log 2>&1


Session 2:

psql cache
=# \d
ERROR:  cache lookup failed for relation 945897 (in RelationIsVisible, 
namespace.c:406)



The previous discussion centered around locking in dependency.c whilst 
dropping related objects - but does this apply when there is just one? 
Anyway, I tried to understand what was happening, and the attached rather 
hacky patch seems to cure the behaviour - so I've submitted it as a 
discussion aid, rather than 'the way of fixing this'... since I'm hoping 
there is a better way :-)


regards

Mark


Index: src/backend/catalog/namespace.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/catalog/namespace.c,v
retrieving revision 1.99
diff -c -r1.99 namespace.c
*** src/backend/catalog/namespace.c	27 Aug 2007 03:36:08 -	1.99
--- src/backend/catalog/namespace.c	1 Nov 2007 07:55:34 -
***
*** 19,26 
--- 19,28 
   */
  #include "postgres.h"
  
+ #include "access/heapam.h"
  #include "access/xact.h"
  #include "catalog/dependency.h"
+ #include "catalog/indexing.h"
  #include "catalog/namespace.h"
  #include "catalog/pg_authid.h"
  #include "catalog/pg_conversion.h"
***
*** 41,46 
--- 43,49 
  #include "storage/ipc.h"
  #include "utils/acl.h"
  #include "utils/builtins.h"
+ #include "utils/fmgroids.h"
  #include "utils/guc.h"
  #include "utils/inval.h"
  #include "utils/lsyscache.h"
***
*** 398,409 
  	Form_pg_class relform;
  	Oid			relnamespace;
  	bool		visible;
  
  	reltup = SearchSysCache(RELOID,
  			ObjectIdGetDatum(relid),
  			0, 0, 0);
  	if (!HeapTupleIsValid(reltup))
! 		elog(ERROR, "cache lookup failed for relation %u", relid);
  	relform = (Form_pg_class) GETSTRUCT(reltup);
  
  	recomputeNamespacePath();
--- 401,441 
  	Form_pg_class relform;
  	Oid			relnamespace;
  	bool		visible;
+ 	bool		fromcache = true;
  
  	reltup = SearchSysCache(RELOID,
  			ObjectIdGetDatum(relid),
  			0, 0, 0);
  	if (!HeapTupleIsValid(reltup))
! 	{
! 		/* See if we can get it directly. */
! 		Relation		relation;
! 		HeapScanDesc	scan;
! 		ScanKeyData 	scanKeyData;
! 
! 		fromcache = false;
! 
! 		ScanKeyInit(&scanKeyData,
! 					ObjectIdAttributeNumber,
! 					BTEqualStrategyNumber, F_OIDEQ,
! 					ObjectIdGetDatum(relid));
! 
! 		relation = heap_open(RelationRelationId, AccessShareLock);
! 
! 		scan = heap_beginscan(relation, ActiveSnapshot,
! 							  1, &scanKeyData);
! 
! 		reltup = heap_getnext(scan, ForwardScanDirection);
! 		reltup = heap_copytuple(reltup);
! 		if (!HeapTupleIsValid(reltup))
! 			elog(ERROR, "cache lookup failed for relation %u", relid);
! 
! 		heap_endscan(scan);
! 
! 		heap_close(relation, AccessShareLock);
! 	}
  	relform = (Form_pg_class) GETSTRUCT(reltup);
  
  	recomputeNamespacePath();
***
*** 446,452 
  		}
  	}
  
! 	ReleaseSysCache(reltup);
  
  	return visible;
  }
--- 478,487 
  		}
  	}
  
! 	if (fromcache)
! 		ReleaseSysCache(reltup);
! 	else
! 		heap_freetuple(reltup);
  
  	return visible;
  }



Re: [PATCHES] [DOCS] OS/X startup scripts

2007-05-14 Thread Mark Kirkwood

Alvaro Herrera wrote:

David Fetter wrote:

On Mon, May 14, 2007 at 03:31:40PM +1200, Mark Kirkwood wrote:

David Fetter wrote:

cvs diff works just great until you want to add or remove a file
without write permissions to the CVS repository, i.e. when you've
checked out as anonymous.


I usually saved an untouched version of the tree to compare against,
so something like:

$ cvs diff -Nacr pgsql.orig pgsql

gives a complete patch including added/deleted files. It is a bit
primitive, but is pretty easy to do!


Except that it also includes diffs for generated files, which tend to be
huge.  To work around that you need to create a list of files to
exclude, and the whole thing (which was cumbersome already) starts to
get unmanageable.

I do use the cvsup mirror myself which makes things so much easier and
faster.



Not so, you just do a

$ make maintainer-clean

before the diff, which clears all those out.

While maintaining a cvs mirror is also good, for a small piece of work 
(or work on a well-defined area of the code) this requires no fooling 
around with repositories at all (which is nice).


Cheers

Mark



Re: [PATCHES] [DOCS] OS/X startup scripts

2007-05-13 Thread Mark Kirkwood

David Fetter wrote:

cvs diff works just great until you want to add or remove a file
without write permissions to the CVS repository, i.e. when you've
checked out as anonymous.



I usually saved an untouched version of the tree to compare against, so 
something like:


$ cvs diff -Nacr pgsql.orig pgsql

gives a complete patch including added/deleted files. It is a bit 
primitive, but is pretty easy to do!


Cheers

Mark



Re: [PATCHES] Updated bitmap index patch

2007-05-04 Thread Mark Kirkwood

Mark Kirkwood wrote:



I have applied this to today's HEAD and performed some quick tests - 
looks good! I have to re-create a TPC-H dataset to test one of the 
previous bugs, so I'll probably look at that tomorrow or so.




The TPC-H query that previously produced a SIGSEGV now runs and 
gives the correct answer.


Cheers

Mark



Re: [PATCHES] Updated bitmap index patch

2007-05-02 Thread Mark Kirkwood

Gavin Sherry wrote:

Hi all,

Attached is an updated bitmap index patch. It contains bug fixes, API
changes, binary changes (page identifier to distinguish it from other
indexes) and has been brought up to HEAD.



I have applied this to today's HEAD and performed some quick tests - 
looks good! I have to re-create a TPC-H dataset to test one of the 
previous bugs, so I'll probably look at that tomorrow or so.



I worked on a few approaches to VACUUM, none very satisfactory. The
problem is, breaking a compressed word representing matches can have
serious consequences -- at the least, creation of new words, at the worst,
creation of a new page. If a lot of this were to happen, REINDEX would be
much more efficient (this is what earlier patches did).

One approach I looked at was modifying the existing read API to be able to
do something like "kill prior tuple". This, I think, made the API quite
complex and it was hard to implement, since the existing mechanism
decompresses words on the fly and it would be hard to identify which TID
is no longer a match. So, I dropped this idea pretty quickly.

The second approach is to just manually traverse each vector and change
matches to non-matches where necessary. The complexity then is in managing
the consequences of breaking compressed words, doing WAL (efficiently) and
calculating free space. I've only partially implemented this approach. At
this stage, I don't have time to finish it due to other commitments.



The second approach seems like the better way to go (as far as I 
understand the issues...). How much work is remaining on this? - not 
sure that I'll have time to look at it either ... but I may as well know 
the size of the job :-) !


Cheers

Mark



Re: [PATCHES] scan_recycle_buffers

2007-03-10 Thread Mark Kirkwood

Simon Riggs wrote:

On Sat, 2007-03-10 at 07:59 +, Simon Riggs wrote:

On Fri, 2007-03-09 at 18:05 -0500, Tom Lane wrote:

Simon Riggs [EMAIL PROTECTED] writes:

On Fri, 2007-03-09 at 16:45 -0500, Tom Lane wrote:

We do that anyway; but certainly Simon's patch ought not be injecting
an additional one.

It should be possible to pass that down from the planner to the
executor, in certain cases.

Huh?  See HeapScanDesc->rs_nblocks.

Many thanks.


New patch enclosed, implementation as you've requested.

Not ready to apply yet, but good for testing.



A quick test using the setup from the "Buffer cache is not scan 
resistant" thread:


Firstly vanilla 8.3 from 20070310:

Shared Buffers  Elapsed  vmstat IO rate
--------------  -------  --------------
400MB           101 s    122 MB/s
128KB           79 s     155 MB/s  [1]

Now apply cycle scan v2:

Shared Buffers  Scan_recycle_buffers  Elapsed  vmstat IO rate
--------------  --------------------  -------  --------------
400MB           0                     101 s    122 MB/s
400MB           8                     78 s     155 MB/s
400MB           16                    77 s     155 MB/s
400MB           32                    78 s     155 MB/s
400MB           64                    82 s     148 MB/s
400MB           128                   93 s     128 MB/s

Certainly seems to have the desired effect!

Cheers

Mark

[1] I'm not seeing 166 MB/s like previous 8.2.3 data, however 8.3 PGDATA 
is located further toward the end of the disk array - which I suspect is 
limiting the IO rate a little.




Re: [PATCHES] [HACKERS] HOT WIP Patch - version 2

2007-02-20 Thread mark
On Tue, Feb 20, 2007 at 08:31:45PM +0530, Pavan Deolasee wrote:
 I see your point, but as you mentioned do we really care ? The chain
 needs to be broken so that the intermediate DEAD tuples can be
 vacuumed. We can't vacuum them normally because they could
 be a part of live HOT-update chain. Resetting the HOT-updated
 status of the root tuple helps to mark the index entry LP_DELETE
 once the entire HOT-update chain is dead.
 ...

For some reason this paragraph raised a query in my mind. Will we
be able to toggle this new hot update code at configure time, so
that we can measure what sort of effect this change has once it is
complete?

Even if only during the early testing cycles for the next release, I
think it would be useful.
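
Concretely, the kind of compile-time switch being asked about might look
like this - entirely hypothetical, the HOT patch has no such toggle:

/*
 * Hypothetical compile-time switch for benchmarking with and without
 * HOT; it only illustrates the configure-time toggle being requested.
 */
#ifdef DISABLE_HOT_UPDATES
#define hot_updates_enabled()   0
#else
#define hot_updates_enabled()   1
#endif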

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/




Re: [PATCHES] patch adding new regexp functions

2007-02-17 Thread Mark Dilger

Jeremy Drake wrote:

The regexp_split function code was based on some code that a friend of
mine wrote which used PCRE rather than postgres' internal regexp support.
I don't know exactly what his use-case was, but he probably had
one because he wrote the function and had it returning SETOF text ;)
Perhaps he can share a general idea of what it was (nudge nudge)?


db=# CREATE OR REPLACE FUNCTION split(p TEXT, t TEXT) RETURNS SETOF TEXT AS $$
db$# my ($p, $t) = @_;
db$# return [ split(/$p/,$t) ];
db$# $$ LANGUAGE plperl;
CREATE FUNCTION
Time: 1.254 ms
db=# select distinct word from (select * from split('\\W+','mary had a little 
lamb, whose fleece was black as soot') as word) as ss;

  word  
--------
 a
 as
 black
 fleece
 had
 lamb
 little
 mary
 soot
 was
 whose
(11 rows)

Time: 30.517 ms



As you can see, this can easily be done with a plperl function.  Some people may 
not want to install plperl, or may not want to allow arbitrary patterns to be 
handed to perl in this fashion.  That was not my concern.  I was simply trying 
to see if I could make it faster in a C-language coded function.


In the end I dropped the project because the plperl function works fast enough 
for me and I don't have any objection to plperl from a security standpoint, etc.


mark



Re: [PATCHES] Avg performance for int8/numeric

2007-02-16 Thread Mark Kirkwood

Bruce Momjian wrote:

I have tested this patch but it generates regression failures.

There was some code drift, so I am attaching an updated version of the
patch, and the regression diffs.  The 'four' column is an 'int4' so my
guess is that somehow the wrong aggregate is being called.



Good catch - I must have neglected to run the regression test after 
amending the number of array arguments for the numeric avg :-(.


Hmmm - changing the number of array args for avg means we can't mix 
transition functions for variance with final functions for avg - which 
is exactly what the regression suite does with the 'newavg' aggregate.


I've 'fixed' this by amending the definition of 'newavg' to use the 
same transition and final functions as 'avg'. However I found myself 
asking if this lost us the point of that test - so I looked back at the 
older postgres versions (e.g. 7.1.3) and saw that back *then* 'newavg' 
and 'avg' were defined using the same functions...so I think making the 
change as indicated is ok.


I've attached a new patch with this change.

Cheers

Mark
diff -Nacr pgsql.orig/src/backend/utils/adt/numeric.c pgsql/src/backend/utils/adt/numeric.c
*** pgsql.orig/src/backend/utils/adt/numeric.c	Sun Jan 21 11:36:20 2007
--- pgsql/src/backend/utils/adt/numeric.c	Fri Feb 16 18:09:24 2007
***
*** 2165,2170 
--- 2165,2204 
  	return result;
  }
  
+ /*
+  * Improve avg performance by not calculating sum(X*X).
+  */
+ static ArrayType *
+ do_numeric_avg_accum(ArrayType *transarray, Numeric newval)
+ {
+ 	Datum	   *transdatums;
+ 	int			ndatums;
+ 	Datum		N,
+ sumX;
+ 	ArrayType  *result;
+ 
+ 	/* We assume the input is array of numeric */
+ 	deconstruct_array(transarray,
+ 					  NUMERICOID, -1, false, 'i',
+ 					  &transdatums, NULL, &ndatums);
+ 	if (ndatums != 2)
+ 		elog(ERROR, "expected 2-element numeric array");
+ 	N = transdatums[0];
+ 	sumX = transdatums[1];
+ 
+ 	N = DirectFunctionCall1(numeric_inc, N);
+ 	sumX = DirectFunctionCall2(numeric_add, sumX,
+ 			   NumericGetDatum(newval));
+ 
+ 	transdatums[0] = N;
+ 	transdatums[1] = sumX;
+ 
+ 	result = construct_array(transdatums, 2,
+ 			 NUMERICOID, -1, false, 'i');
+ 
+ 	return result;
+ }
+ 
  Datum
  numeric_accum(PG_FUNCTION_ARGS)
  {
***
*** 2175,2180 
--- 2209,2226 
  }
  
  /*
+  * Optimized case for average of numeric.
+  */
+ Datum
+ numeric_avg_accum(PG_FUNCTION_ARGS)
+ {
+ 	ArrayType  *transarray = PG_GETARG_ARRAYTYPE_P(0);
+ 	Numeric		newval = PG_GETARG_NUMERIC(1);
+ 
+ 	PG_RETURN_ARRAYTYPE_P(do_numeric_avg_accum(transarray, newval));
+ }
+ 
+ /*
   * Integer data types all use Numeric accumulators to share code and
   * avoid risk of overflow.	For int2 and int4 inputs, Numeric accumulation
   * is overkill for the N and sum(X) values, but definitely not overkill
***
*** 2219,2224 
--- 2265,2286 
  	PG_RETURN_ARRAYTYPE_P(do_numeric_accum(transarray, newval));
  }
  
+ /*
+  * Optimized case for average of int8.
+  */
+ Datum
+ int8_avg_accum(PG_FUNCTION_ARGS)
+ {
+ 	ArrayType  *transarray = PG_GETARG_ARRAYTYPE_P(0);
+ 	Datum		newval8 = PG_GETARG_DATUM(1);
+ 	Numeric		newval;
+ 
+ 	newval = DatumGetNumeric(DirectFunctionCall1(int8_numeric, newval8));
+ 
+ 	PG_RETURN_ARRAYTYPE_P(do_numeric_avg_accum(transarray, newval));
+ }
+ 
+ 
  Datum
  numeric_avg(PG_FUNCTION_ARGS)
  {
***
*** 2232,2242 
  	deconstruct_array(transarray,
  					  NUMERICOID, -1, false, 'i',
  					  &transdatums, NULL, &ndatums);
! 	if (ndatums != 3)
! 		elog(ERROR, "expected 3-element numeric array");
  	N = DatumGetNumeric(transdatums[0]);
  	sumX = DatumGetNumeric(transdatums[1]);
- 	/* ignore sumX2 */
  
  	/* SQL92 defines AVG of no values to be NULL */
  	/* N is zero iff no digits (cf. numeric_uminus) */
--- 2294,2303 
  	deconstruct_array(transarray,
  					  NUMERICOID, -1, false, 'i',
  					  &transdatums, NULL, &ndatums);
! 	if (ndatums != 2)
! 		elog(ERROR, "expected 2-element numeric array");
  	N = DatumGetNumeric(transdatums[0]);
  	sumX = DatumGetNumeric(transdatums[1]);
  
  	/* SQL92 defines AVG of no values to be NULL */
  	/* N is zero iff no digits (cf. numeric_uminus) */
diff -Nacr pgsql.orig/src/include/catalog/catversion.h pgsql/src/include/catalog/catversion.h
*** pgsql.orig/src/include/catalog/catversion.h	Fri Feb 16 18:06:34 2007
--- pgsql/src/include/catalog/catversion.h	Fri Feb 16 18:09:24 2007
***
*** 53,58 
   */
  
  /*			yyyymmddN */
! #define CATALOG_VERSION_NO	200702131
  
  #endif
--- 53,58 
   */
  
  /*			yyyymmddN */
! #define CATALOG_VERSION_NO	200702151
  
  #endif
diff -Nacr pgsql.orig/src/include/catalog/pg_aggregate.h pgsql/src/include/catalog/pg_aggregate.h
*** pgsql.orig/src/include/catalog/pg_aggregate.h	Sun Jan 21 11:36:22 2007
--- pgsql/src/include/catalog/pg_aggregate.h	Fri Feb 16 18:09:24 2007
***
*** 80,89 
   */
  
  /* avg */
! DATA(insert ( 2100	int8_accum		numeric_avg		0	1231

Re: [PATCHES] [pgsql-patches] Patch to avoid gprof profiling overwrites

2007-01-31 Thread Mark Kirkwood

[EMAIL PROTECTED] wrote:

And the patch is where?


You caught me; guess I'd better make something up fast, huh?

Here it is, thanks.

-- Korry



---

[EMAIL PROTECTED] wrote:
 It's difficult to profile a backend server process  (using gprof)
 because each process overwrites any earlier profile as it exits.
 
 It is especially tricky to nab a useful profile if you happen to have

 autovacuum enabled.
 
 This patch reduces the problem by forcing the backend to 'cd' to a new

 directory ($PGDATA/gprof/pid) just before calling exit(), but only if
 the backend was compiled with -DLINUX_PROFILE.



Nice addition - tested on FreeBSD 6.2, works great!

The name for the define variable could perhaps be better - it feels silly 
adding -DLINUX_PROFILE on FreeBSD! (maybe just PROFILE or GPROF_PROFILE?)


Cheers

Mark







Re: [PATCHES] [pgsql-patches] Patch to avoid gprof profiling overwrites

2007-01-31 Thread Mark Kirkwood

[EMAIL PROTECTED] wrote:
The name for the define variable could perhaps be better - it feels silly 
adding -DLINUX_PROFILE on FreeBSD! (maybe just PROFILE or GPROF_PROFILE?)


That wasn't my choice, there is other code elsewhere that depends on 
that symbol, I just added a little bit more.




Right - but LINUX_PROFILE was added to correct Linux-specific oddities 
with the time counter accumulation, whereas your patch is not 
Linux-specific at all. So I think a more representative symbol is required.


Cheers

Mark



Re: [PATCHES] [pgsql-patches] Patch to avoid gprof profiling overwrites

2007-01-31 Thread Mark Kirkwood

Mark Kirkwood wrote:

[EMAIL PROTECTED] wrote:
The name for the define variable could perhaps be better - it feels 
silly adding -DLINUX_PROFILE on FreeBSD! (maybe just PROFILE or 
GPROF_PROFILE?)


That wasn't my choice, there is other code elsewhere that depends on 
that symbol, I just added a little bit more.




Right - but LINUX_PROFILE was added to correct Linux-specific oddities 
with the time counter accumulation, whereas your patch is not 
Linux-specific at all. So I think a more representative symbol is required.




In fact - thinking about this a bit more, probably a construction like:

#if defined(LINUX_PROFILE) || defined(PROFILE)

or similar would work - because I think those of us not on Linux do 
*not* want to define LINUX_PROFILE, as our timer accumulation for forked 
processes works fine as it is...but we don't want to make Linux guys have 
to define LINUX_PROFILE *and* PROFILE...
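
Put together with the exit-time chdir described upthread
($PGDATA/gprof/pid), the guarded block might look roughly like this
sketch; includes (<sys/stat.h>, <unistd.h>, <stdio.h>) and error
handling are elided, and the names are illustrative:

#if defined(LINUX_PROFILE) || defined(PROFILE)
    /*
     * Sketch: give each exiting backend its own gprof/<pid> directory so
     * its gmon.out does not overwrite another backend's.  Assumes the
     * current working directory is $PGDATA.
     */
    {
        char    gprofDirName[32];

        snprintf(gprofDirName, sizeof(gprofDirName),
                 "gprof/%d", (int) getpid());
        mkdir("gprof", 0700);       /* may already exist; that's fine */
        mkdir(gprofDirName, 0700);
        chdir(gprofDirName);        /* gmon.out lands here at exit() */
    }
#endif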




Re: [PATCHES] README for vcbuild

2007-01-08 Thread Mark Kirkwood

Magnus Hagander wrote:



Note that PostgreSQL builds natively with Visual C++. You must *there for*
make sure that you do *NOT* have any tools from Cygwin or Mingw available
in the system PATH. Also, make sure you don't have any Cygwin/Mingw
environment variables leaking through.




Little typo:

Note that PostgreSQL builds natively with Visual C++. You must 
*therefore* ...


Cheers

Mark



Re: [PATCHES] Avg performance for int8/numeric

2006-11-24 Thread Mark Kirkwood

Neil Conway wrote:

On Fri, 2006-11-24 at 11:08 +1300, Mark Kirkwood wrote:
- Modifies do_numeric_accum to have an extra bool parameter and does not 
calc sumX2 when it is false.


I think it would be clearer to reorganize this function slightly, and
have only a single branch on useSumX2. On first glance it isn't
obvious that transdatums[2] is defined (but unchanged) when useSumX2
is false.



Right - new patch attached that adds a new function do_numeric_avg_accum 
that only uses N and sum(X). This means I could amend the avg aggregates 
for numeric and int8 to have initvalues of {0,0}.


Cheers

Mark
? gmon.out
Index: src/backend/utils/adt/numeric.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/numeric.c,v
retrieving revision 1.96
diff -c -r1.96 numeric.c
*** src/backend/utils/adt/numeric.c	4 Oct 2006 00:29:59 -	1.96
--- src/backend/utils/adt/numeric.c	25 Nov 2006 00:00:58 -
***
*** 2097,2102 
--- 2097,2136 
  	return result;
  }
  
+ /*
+  * Improve avg performance by not calculating sum(X*X).
+  */
+ static ArrayType *
+ do_numeric_avg_accum(ArrayType *transarray, Numeric newval)
+ {
+ 	Datum	   *transdatums;
+ 	int			ndatums;
+ 	Datum		N,
+ sumX;
+ 	ArrayType  *result;
+ 
+ 	/* We assume the input is array of numeric */
+ 	deconstruct_array(transarray,
+ 					  NUMERICOID, -1, false, 'i',
+ 					  &transdatums, NULL, &ndatums);
+ 	if (ndatums != 2)
+ 		elog(ERROR, "expected 2-element numeric array");
+ 	N = transdatums[0];
+ 	sumX = transdatums[1];
+ 
+ 	N = DirectFunctionCall1(numeric_inc, N);
+ 	sumX = DirectFunctionCall2(numeric_add, sumX,
+ 			   NumericGetDatum(newval));
+ 
+ 	transdatums[0] = N;
+ 	transdatums[1] = sumX;
+ 
+ 	result = construct_array(transdatums, 2,
+ 			 NUMERICOID, -1, false, 'i');
+ 
+ 	return result;
+ }
+ 
  Datum
  numeric_accum(PG_FUNCTION_ARGS)
  {
***
*** 2107,2112 
--- 2141,2158 
  }
  
  /*
+  * Optimized case for average of numeric.
+  */
+ Datum
+ numeric_avg_accum(PG_FUNCTION_ARGS)
+ {
+ 	ArrayType  *transarray = PG_GETARG_ARRAYTYPE_P(0);
+ 	Numeric		newval = PG_GETARG_NUMERIC(1);
+ 
+ 	PG_RETURN_ARRAYTYPE_P(do_numeric_avg_accum(transarray, newval));
+ }
+ 
+ /*
   * Integer data types all use Numeric accumulators to share code and
   * avoid risk of overflow.	For int2 and int4 inputs, Numeric accumulation
   * is overkill for the N and sum(X) values, but definitely not overkill
***
*** 2151,2156 
--- 2197,2218 
  	PG_RETURN_ARRAYTYPE_P(do_numeric_accum(transarray, newval));
  }
  
+ /*
+  * Optimized case for average of int8.
+  */
+ Datum
+ int8_avg_accum(PG_FUNCTION_ARGS)
+ {
+ 	ArrayType  *transarray = PG_GETARG_ARRAYTYPE_P(0);
+ 	Datum		newval8 = PG_GETARG_DATUM(1);
+ 	Numeric		newval;
+ 
+ 	newval = DatumGetNumeric(DirectFunctionCall1(int8_numeric, newval8));
+ 
+ 	PG_RETURN_ARRAYTYPE_P(do_numeric_avg_accum(transarray, newval));
+ }
+ 
+ 
  Datum
  numeric_avg(PG_FUNCTION_ARGS)
  {
***
*** 2164,2174 
  	deconstruct_array(transarray,
  					  NUMERICOID, -1, false, 'i',
  					  &transdatums, NULL, &ndatums);
! 	if (ndatums != 3)
! 		elog(ERROR, "expected 3-element numeric array");
  	N = DatumGetNumeric(transdatums[0]);
  	sumX = DatumGetNumeric(transdatums[1]);
- 	/* ignore sumX2 */
  
  	/* SQL92 defines AVG of no values to be NULL */
  	/* N is zero iff no digits (cf. numeric_uminus) */
--- 2226,2235 
  	deconstruct_array(transarray,
  					  NUMERICOID, -1, false, 'i',
  					  &transdatums, NULL, &ndatums);
! 	if (ndatums != 2)
! 		elog(ERROR, "expected 2-element numeric array");
  	N = DatumGetNumeric(transdatums[0]);
  	sumX = DatumGetNumeric(transdatums[1]);
  
  	/* SQL92 defines AVG of no values to be NULL */
  	/* N is zero iff no digits (cf. numeric_uminus) */
Index: src/include/catalog/pg_aggregate.h
===
RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_aggregate.h,v
retrieving revision 1.58
diff -c -r1.58 pg_aggregate.h
*** src/include/catalog/pg_aggregate.h	4 Oct 2006 00:30:07 -	1.58
--- src/include/catalog/pg_aggregate.h	25 Nov 2006 00:01:01 -
***
*** 80,89 
   */
  
  /* avg */
! DATA(insert ( 2100	int8_accum		numeric_avg		0	1231	{0,0,0} ));
  DATA(insert ( 2101	int4_avg_accum	int8_avg		0	1016	{0,0} ));
  DATA(insert ( 2102	int2_avg_accum	int8_avg		0	1016	{0,0} ));
! DATA(insert ( 2103	numeric_accum	numeric_avg		0	1231	{0,0,0} ));
  DATA(insert ( 2104	float4_accum	float8_avg		0	1022	{0,0,0} ));
  DATA(insert ( 2105	float8_accum	float8_avg		0	1022	{0,0,0} ));
  DATA(insert ( 2106	interval_accum	interval_avg	0	1187	{0 second,0 second} ));
--- 80,89 
   */
  
  /* avg */
! DATA(insert ( 2100	int8_avg_accum	numeric_avg		0	1231	{0,0} ));
  DATA(insert ( 2101	int4_avg_accum	int8_avg		0	1016	{0,0} ));
  DATA(insert ( 2102	int2_avg_accum	int8_avg		0	1016	{0,0} ));
! DATA(insert ( 2103

Re: [PATCHES] Avg performance for int8/numeric

2006-11-24 Thread Mark Kirkwood

Neil Conway wrote:




(it is still slower than doing sum/count - possibly due to the
construct/deconstruct overhead of the numeric transition array).


This would indeed be worth profiling. If it turns out that array
overhead is significant, I wonder if we could use a composite type for
the transition variable instead. That might also make it easier to
represent the N value as an int8 rather than a numeric.
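
To make the composite-type idea concrete, the transition state might
carry something like the following; a hedged sketch with illustrative
names, holding N as a plain int64 as suggested:

/*
 * Hypothetical composite transition state for avg(numeric): nothing to
 * deconstruct/reconstruct per row, and N needs no numeric arithmetic.
 */
typedef struct NumericAvgState
{
    int64       N;          /* count of input rows */
    Numeric     sumX;       /* running sum of inputs */
} NumericAvgState;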



I've profiled the 2nd patch using the setup indicated below. The first 
64 lines of the flat profile are attached. The complete profile is here:


http://homepages.paradise.net.nz/markir/download/postgres/postgres-avg.gprof.gz

Setup:

avg=# \d avgtest
      Table "public.avgtest"
 Column |     Type      | Modifiers
--------+---------------+-----------
 id     | integer       |
 val0   | bigint        |
 val1   | numeric(12,2) |
 val2   | numeric(10,0) |

avg=# analyze verbose avgtest;
INFO:  analyzing "public.avgtest"
INFO:  "avgtest": scanned 3000 of 87689 pages, containing 342138 live 
rows and 0 dead rows; 3000 rows in sample, 1580 estimated total rows

ANALYZE
Time: 252.033 ms
avg=# select avg(val2) from avgtest;
 avg
-
 714285.21428580
(1 row)

Time: 35196.028 ms
avg=# \q

regards

Mark
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 14.42  2.16 2.16 12977 0.00 0.00  AllocSetAlloc
  9.08  3.52 1.36 2000 0.00 0.00  add_abs
  5.54  4.35 0.83 1000 0.00 0.00  slot_deform_tuple
  5.41  5.16 0.81 60001673 0.00 0.00  AllocSetFree
  4.34  5.81 0.65 1000 0.00 0.00  construct_md_array
  4.21  6.44 0.63 2003 0.00 0.00  make_result
  3.54  6.97 0.53 1000 0.00 0.00  numeric_add
  3.27  7.46 0.49 3003 0.00 0.00  set_var_from_num
  3.00  7.91 0.45 12652 0.00 0.00  MemoryContextAlloc
  2.74  8.32 0.41 1001 0.00 0.00  heapgettup_pagemode
  2.54  8.70 0.38 1000 0.00 0.00  
advance_transition_function
  2.40  9.06 0.36 3006 0.00 0.00  alloc_var
  2.27  9.40 0.34 1000 0.00 0.00  do_numeric_avg_accum
  2.00  9.70 0.30 1001 0.00 0.00  CopyArrayEls
  2.00 10.00 0.30 1000 0.00 0.00  numeric_inc
  1.94 10.29 0.29 2002 0.00 0.00  ArrayGetNItems
  1.94 10.58 0.29 1001 0.00 0.00  deconstruct_array
  1.87 10.86 0.28 2002 0.00 0.00  ArrayCastAndSet
  1.74 11.12 0.26 60001672 0.00 0.00  pfree
  1.67 11.37 0.25 1001 0.00 0.00  slot_getattr
  1.60 11.61 0.24 1000 0.00 0.00  advance_aggregates
  1.54 11.84 0.23 4006 0.00 0.00  free_var
  1.54 12.07 0.23 1001 0.00 0.00  datumCopy
  1.47 12.29 0.22 1001 0.00 0.00  SeqNext
  1.40 12.50 0.21 2000 0.00 0.00  add_var
  1.34 12.70 0.20 2003 0.00 0.00  strip_var
  1.34 12.90 0.20 1001 0.00 0.00  ExecScan
  1.27 13.09 0.19 1003 0.00 0.00  AllocSetReset
  1.20 13.27 0.18 1003 0.00 0.00  ExecProcNode
  1.13 13.44 0.17 7010 0.00 0.00  pg_detoast_datum
  0.93 13.58 0.14 1000 0.00 0.00  numeric_avg_accum
  0.93 13.72 0.14 2 0.07 6.61  ExecAgg
  0.87 13.85 0.13 1001 0.00 0.00  datumGetSize
  0.87 13.98 0.13 87860 0.00 0.00  heapgetpage
  0.73 14.09 0.11 1001 0.00 0.00  DirectFunctionCall2
  0.73 14.20 0.11 1000 0.00 0.00  construct_array
  0.60 14.29 0.09 1148 0.00 0.00  DirectFunctionCall1
  0.53 14.37 0.08 1001 0.00 0.00  ExecStoreTuple
  0.53 14.45 0.08 1000 0.00 0.00  HeapTupleSatisfiesSnapshot
  0.40 14.51 0.06 1103 0.00 0.00  heap_getnext
  0.33 14.56 0.05   254419 0.00 0.00  
hash_search_with_hash_value
  0.27 14.60 0.04 1001 0.00 0.00  MemoryContextReset
  0.27 14.64 0.04 1000 0.00 0.00  ExecEvalVar
  0.27 14.68 0.04 1000 0.00 0.00  XidInSnapshot
  0.27 14.72 0.04   511482 0.00 0.00  LWLockRelease
  0.27 14.76 0.04   164939 0.00 0.00  hash_any
  0.27 14.80 0.04 87760 0.00 0.00  StrategyGetBuffer
  0.20 14.83 0.03 1009 0.00 0.00  TransactionIdPrecedes
  0.20 14.86 0.03 87760 0.00 0.00  FileRead
  0.13 14.88 0.02 1001 0.00 0.00  ExecSeqScan
  0.13 14.90 0.02   511481 0.00 0.00  LWLockAcquire
  0.13 14.92 0.02 88217

Re: [PATCHES] Avg performance for int8/numeric

2006-11-24 Thread Mark Kirkwood

Luke Lonergan wrote:

So, if I understand this correctly, we're calling Alloc and ContextAlloc 10
times for every row being summed?

There are approx 10M rows and the profile snippet below shows 100M calls to
each of those.



Unless I've accidentally run gprof on the profile output for a 100M row 
case I had lying around :-( ... I'll check




Re: [PATCHES] Avg performance for int8/numeric

2006-11-24 Thread Mark Kirkwood

Mark Kirkwood wrote:

Luke Lonergan wrote:
So, if I understand this correctly, we're calling Alloc and 
ContextAlloc 10

times for every row being summed?

There are approx 10M rows and the profile snippet below shows 100M 
calls to

each of those.



Unless I've accidentally run gprof on the profile output for a 100M row 
case I had lying around :-( ... I'll check




I haven't (so the profile as attached is ok)...



[PATCHES] pg_buffercache tidyup

2006-10-22 Thread Mark Kirkwood

This is probably terrible timing, but I noticed Tom had done some nice
tidying up on pg_freespacemap to eliminate the clumsy conversion to and
from strings. This patch does a similar thing for pg_buffercache.

I did wonder about not showing buffers that are invalid or not in use
(currently displays all attributes bar the id as NULL). Comments?

Cheers

Mark

Index: contrib/pg_buffercache/pg_buffercache_pages.c
===
RCS file: /projects/cvsroot/pgsql/contrib/pg_buffercache/pg_buffercache_pages.c,v
retrieving revision 1.10
diff -c -r1.10 pg_buffercache_pages.c
*** contrib/pg_buffercache/pg_buffercache_pages.c	19 Oct 2006 18:32:46 -	1.10
--- contrib/pg_buffercache/pg_buffercache_pages.c	22 Oct 2006 06:27:52 -
***
*** 8,13 
--- 8,14 
   */
  #include "postgres.h"
  #include "funcapi.h"
+ #include "access/heapam.h"
  #include "catalog/pg_type.h"
  #include "storage/buf_internals.h"
  #include "storage/bufmgr.h"
***
*** 44,52 
  typedef struct
  {
  
! 	AttInMetadata *attinmeta;
  	BufferCachePagesRec *record;
- 	char	   *values[NUM_BUFFERCACHE_PAGES_ELEM];
  
  }	BufferCachePagesContext;
  
--- 45,52 
  typedef struct
  {
  
! 	TupleDesc	tupdesc;
  	BufferCachePagesRec *record;
  
  }	BufferCachePagesContext;
  
***
*** 77,83 
  		/* Switch context when allocating stuff to be used in later calls */
  		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
  
! 		/* Construct a tuple to return. */
  		tupledesc = CreateTemplateTupleDesc(NUM_BUFFERCACHE_PAGES_ELEM, false);
  		TupleDescInitEntry(tupledesc, (AttrNumber) 1, "bufferid",
  						   INT4OID, -1, 0);
--- 77,87 
  		/* Switch context when allocating stuff to be used in later calls */
  		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
  
! 
! 		/* Create a user function context for cross-call persistence */
! 		fctx = (BufferCachePagesContext *) palloc(sizeof(BufferCachePagesContext));
! 
! 		/* Construct a tuple to return, and save its descriptor. */
  		tupledesc = CreateTemplateTupleDesc(NUM_BUFFERCACHE_PAGES_ELEM, false);
  		TupleDescInitEntry(tupledesc, (AttrNumber) 1, "bufferid",
  						   INT4OID, -1, 0);
***
*** 92,118 
  		TupleDescInitEntry(tupledesc, (AttrNumber) 6, "isdirty",
  						   BOOLOID, -1, 0);
  
! 		/* Generate attribute metadata needed later to produce tuples */
! 		funcctx->attinmeta = TupleDescGetAttInMetadata(tupledesc);
! 
! 		/*
! 		 * Create a function context for cross-call persistence and initialize
! 		 * the buffer counters.
! 		 */
! 		fctx = (BufferCachePagesContext *) palloc(sizeof(BufferCachePagesContext));
! 		funcctx->max_calls = NBuffers;
! 		funcctx->user_fctx = fctx;
  
  		/* Allocate NBuffers worth of BufferCachePagesRec records. */
  		fctx->record = (BufferCachePagesRec *) palloc(sizeof(BufferCachePagesRec) * NBuffers);
  
! 		/* allocate the strings for tuple formation */
! 		fctx->values[0] = (char *) palloc(3 * sizeof(uint32) + 1);
! 		fctx->values[1] = (char *) palloc(3 * sizeof(uint32) + 1);
! 		fctx->values[2] = (char *) palloc(3 * sizeof(uint32) + 1);
! 		fctx->values[3] = (char *) palloc(3 * sizeof(uint32) + 1);
! 		fctx->values[4] = (char *) palloc(3 * sizeof(uint32) + 1);
! 		fctx->values[5] = (char *) palloc(2);
  
  		/* Return to original context when allocating transient memory */
  		MemoryContextSwitchTo(oldcontext);
--- 96,109 
  		TupleDescInitEntry(tupledesc, (AttrNumber) 6, "isdirty",
  						   BOOLOID, -1, 0);
  
! 		fctx->tupdesc = BlessTupleDesc(tupledesc);
  
  		/* Allocate NBuffers worth of BufferCachePagesRec records. */
  		fctx->record = (BufferCachePagesRec *) palloc(sizeof(BufferCachePagesRec) * NBuffers);
  
! 		/* Set max calls and remember the user function context. */
! 		funcctx->max_calls = NBuffers;
! 		funcctx->user_fctx = fctx;
  
  		/* Return to original context when allocating transient memory */
  		MemoryContextSwitchTo(oldcontext);
***
*** 167,184 
  	if (funcctx->call_cntr < funcctx->max_calls)
  	{
  		uint32		i = funcctx->call_cntr;
! 		char	   *values[NUM_BUFFERCACHE_PAGES_ELEM];
! 		int			j;
! 
! 		/*
! 		 * Use a temporary values array, initially pointing to fctx->values,
! 		 * so it can be reassigned w/o losing the storage for subsequent
! 		 * calls.
! 		 */
! 		for (j = 0; j < NUM_BUFFERCACHE_PAGES_ELEM; j++)
! 		{
! 			values[j] = fctx->values[j];
! 		}
  
  
  		/*
--- 158,165 
  	if (funcctx->call_cntr < funcctx->max_calls)
  	{
  		uint32		i = funcctx->call_cntr;
! 		Datum		values[NUM_BUFFERCACHE_PAGES_ELEM];
! 		bool		nulls[NUM_BUFFERCACHE_PAGES_ELEM];
  
  
  		/*
***
*** 189,224 
  			fctx->record[i].isvalid == false)
  		{
  
! 			sprintf(values[0], "%u", fctx->record[i].bufferid);
! 			values[1] = NULL;
! 			values[2] = NULL;
! 			values[3] = NULL;
! 			values[4] = NULL;
! 			values[5] = NULL;
  
  		}
  		else
  		{
  
! 			sprintf(values[0], "%u", fctx->record[i].bufferid

Re: [PATCHES] WIP: Hierarchical Queries - stage 1

2006-09-22 Thread Mark Cave-Ayland
Hi Tom,

Thanks for your initial thoughts on this.


On Wed, 2006-09-20 at 20:13 -0400, Tom Lane wrote:

(cut)

 You really can't get away with having the identical representation for
 CTEs and ordinary sub-selects in the range table.  For instance, it
 looks like your patch will think that
 
   select ... from (select ...) as x, x, ...
 
 is legal when it certainly is not.  I think you need either a new
 RTEKind or an additional flag in the RTE to show that it's a CTE rather
 than a plain subselect.  I'm not entirely sure that you even want the
 CTEs in the rangetable at all --- that still needs some thought.

For semantic reasons, I can see why you are questioning whether or not
the CTE should be contained within the rangetable - there is an implicit
hint that RTEs reflect entries within the FROM clause, but then I also
see that the rewriter adds RTEs when substituting view definitions into
queries. The comments in parsenodes.h also suggest that an RTE is a
namespace/data source reference for a named entity within the query.

The main problem I can see with keeping the CTEs outside the rangetable
is that according to the source, jointree nodes must currently have
RANGETBLREF nodes as leaf nodes; as I understand it, your suggestion of
maintaining the CTEs separately would involve something along the lines
of keeping a separate CTETable and creating some form of CTETBLREF node
that could be referenced within the jointree. While arguably it may be
semantically neater, it appears to involve quite a bit of extra work...
could you explain in more detail why you feel that CTEs should
remain outside the rangetable?
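
For concreteness, the "new RTEKind" route mentioned above would amount
to roughly the following (abridged enum; RTE_CTE is hypothetical here):

/* Abridged sketch of parsenodes.h; RTE_CTE is the hypothetical addition. */
typedef enum RTEKind
{
    RTE_RELATION,               /* ordinary relation reference */
    RTE_SUBQUERY,               /* subquery in FROM */
    RTE_JOIN,                   /* join */
    RTE_FUNCTION,               /* function in FROM */
    RTE_VALUES,                 /* VALUES (...), (...), ... */
    RTE_CTE                     /* reference to a common table expression */
} RTEKind;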

 This comes back to the question of whether the CTE per se should be an
 RTE at all.  Maybe only the reference to it should be an RTE.  The
 behavior when seeing a plain RangeVar in FROM would be to first search
 the side list of valid CTEs, and only on failure go looking for a real
 table.

This is effectively what the patch does, albeit not particularly
elegantly. I'll spend some time on making those changes a bit more
refined.


Kind regards,

Mark.






[PATCHES] WIP: Hierarchical Queries - stage 1

2006-09-20 Thread Mark Cave-Ayland
Hi everyone,

After spending several days reading PostgreSQL source code (and another
couple of days coding), I've managed to come up with some alpha code
that attempts to implement non-recursive WITH common table expressions.
Having got this far, I feel that I need to ask for advice from some more
experienced PostgreSQL hackers and so I post the alpha version here.

The patch alters the parser grammar and, using code modified
from that used when processing subselects in the FROM clause, attempts
to treat the CTE as if it were presented the same as a FROM subselect.

My main issue at the moment is that the code in transformFromClauseItem
seems a terrible hack, mainly because the grammar returns each string
within the FROM clause as a RangeVar, and transformFromClauseItem
assumes that each RangeVar represents a physical relation. Of course,
this is not the case when referencing a CTE and so the code first checks
to see if an entry has already been created when processing the WITH
clause; if it does then we return NULL to indicate that
transformFromClause should do nothing. Messy, but I wanted to see what
other developers thought before jumping in and rewriting this part of
the code.

Another point to think about is what should a query return if the SELECT
doesn't refer to a CTE? For example:

- WITH foo(a, b) AS (SELECT * FROM pg_class) SELECT * FROM pg_class;

Since I've inserted the CTE as a range table entry of type RTE_SUBQUERY
then the current behaviour is to perform a cross-join between foo and
bar which I'm not sure is the correct behaviour since CTEs seem to be
more like views in this respect.
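
(To spell out what that means: with the CTE injected as a plain
RTE_SUBQUERY, the query above behaves roughly like the implicit cross
join

  SELECT * FROM (SELECT * FROM pg_class) AS foo(a, b), pg_class;

which is presumably not what the user intended.)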

Finally, the current code fails when supplying an additional alias to a
CTE in the select statement and then trying to refer to it, e.g.

- with myrel(p1) as (select oid from pg_class) select myrel.p1 from
myrel AS foo, pg_class AS bar WHERE myrel.p1 = bar.oid; -- WORKS

- with myrel(p1) as (select oid from pg_class) select myrel.p1 from
myrel AS foo, pg_class AS bar WHERE foo.p1 = bar.oid; -- FAILS

ERROR:  missing FROM-clause entry for table foo at character 103

So in this case, should foo be accepted as a valid alias for myrel? My
feeling is that I will end up with an RTE_CTE range table entry kind
which borrows from the current subquery behaviour, but it would be very
useful to see where the similarity between the two range table entry
types ends.
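
For comparison, an alias on an ordinary table hides the original name
completely, so by analogy one would expect the opposite of the results
above:

  SELECT pg_class.oid FROM pg_class AS bar;  -- fails: the alias hides pg_class
  SELECT bar.oid FROM pg_class AS bar;       -- works

i.e. foo.p1 ought to be the reference that works, and myrel.p1 the one
that fails.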


TIA,

Mark.

Index: src/backend/parser/analyze.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/parser/analyze.c,v
retrieving revision 1.351
diff -c -r1.351 analyze.c
*** src/backend/parser/analyze.c	18 Sep 2006 16:04:04 -	1.351
--- src/backend/parser/analyze.c	20 Sep 2006 19:02:44 -
***
*** 2087,2092 
--- 2087,2095 
  	/* make FOR UPDATE/FOR SHARE info available to addRangeTableEntry */
  	pstate->p_locking_clause = stmt->lockingClause;
  
+ 	/* process the WITH clause */
+ 	transformWithClause(pstate, stmt->withClause);
+ 
  	/* process the FROM clause */
  	transformFromClause(pstate, stmt->fromClause);
  
Index: src/backend/parser/gram.y
===
RCS file: /projects/cvsroot/pgsql/src/backend/parser/gram.y,v
retrieving revision 2.565
diff -c -r2.565 gram.y
*** src/backend/parser/gram.y	3 Sep 2006 22:37:05 -	2.565
--- src/backend/parser/gram.y	20 Sep 2006 19:02:52 -
***
*** 346,351 
--- 346,354 
  %type <str>		OptTableSpace OptConsTableSpace OptTableSpaceOwner
  %type <list>	opt_check_option
  
+ %type <list>	OptCommonTableExpressionList CommonTableExpressionList
+ %type <node>	CommonTableExpression
+ 
  
  /*
   * If you make any token changes, update the keyword table in
***
*** 5594,5599 
--- 5597,5638 
  			| WITHOUT HOLD	{ $$ = FALSE; }
  		;
  
+ 
+ 
+ /*
+  *
+  *		COMMON TABLE EXPRESSIONS
+  *		WITH STATEMENTS
+  *
+  */
+ 
+ 
+ OptCommonTableExpressionList:
+ 			WITH CommonTableExpressionList			{ $$ = $2; }
+ 			| /* EMPTY */	{ $$ = NIL; }
+ 		;
+ 
+ CommonTableExpressionList:
+ 			CommonTableExpression		{ $$ = list_make1($1); }
+ 			| CommonTableExpressionList ',' CommonTableExpression		{ $$ = lappend($1, $3); }
+ 		;
+ 
+ CommonTableExpression:
+ 			ColId '(' columnList ')' AS select_with_parens
+ {
+ 	Alias *a = makeNode(Alias);
+ 	a->aliasname = $1;
+ 	a->colnames = $3;
+ 
+ 	RangeSubselect *n = makeNode(RangeSubselect);
+ 	n->subquery = $6;
+ 	n->alias = a;
+ 	$$ = (Node *) n;
+ }
+ 		;
+ 
+ 
+ 
  /*
   *
   *		QUERY:
***
*** 5705,5723 
   * However, this is not checked by the grammar; parse analysis must check it.
   */
  simple_select:
! 			SELECT

Re: [HACKERS] [PATCHES] Patch for UUID datatype (beta)

2006-09-19 Thread mark
On Tue, Sep 19, 2006 at 08:20:13AM -0500, Jim C. Nasby wrote:
 On Mon, Sep 18, 2006 at 07:45:07PM -0400, [EMAIL PROTECTED] wrote:
  I would not use a 100% random number generator for a UUID value as was
  suggested. I prefer inserting the MAC address and the time, to at
  least allow me to control if a collision is possible. This is not easy
  to do using a few lines of C code. I'd rather have a UUID type in core
  with no generation routine, than no UUID type in core because the code
  is too complicated to maintain, or not portable enough.
 As others have mentioned, using MAC address doesn't remove the
 possibility of a collision.

It does, as I control the MAC address. I can choose not to overwrite it.
I can choose to ensure that any cases where it is overwritten, it is
overwritten with a unique value. Random number does not provide this
level of control.

 Maybe a good compromise that would allow a generator function to go into
 the backend would be to combine the current time with a random number.
 That will ensure that you won't get a dupe, so long as your clock never
 runs backwards.

Which standard UUID generation function would you be thinking of? 
Inventing a new one doesn't seem sensible. I'll have to read over the
versions again...
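
(As a strawman - not any of the standard versions - a time-plus-random
value can be faked in plain SQL; purely illustrative, with arbitrary
field widths:

  SELECT lpad(to_hex((extract(epoch FROM now()) * 1000000)::bigint), 16, '0')
         || lpad(to_hex((random() * 2147483647)::int), 8, '0') AS pseudo_uuid;

but as I said, inventing a new format doesn't seem sensible.)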

Cheers,
mark





Re: [HACKERS] [PATCHES] Patch for UUID datatype (beta)

2006-09-19 Thread mark
On Tue, Sep 19, 2006 at 11:21:51PM -0400, Alvaro Herrera wrote:
 [EMAIL PROTECTED] wrote:
  On Tue, Sep 19, 2006 at 08:20:13AM -0500, Jim C. Nasby wrote:
   On Mon, Sep 18, 2006 at 07:45:07PM -0400, [EMAIL PROTECTED] wrote:
I would not use a 100% random number generator for a UUID value as was
suggested. I prefer inserting the MAC address and the time, to at
least allow me to control if a collision is possible. This is not easy
to do using a few lines of C code. I'd rather have a UUID type in core
with no generation routine, than no UUID type in core because the code
is too complicated to maintain, or not portable enough.
   As others have mentioned, using MAC address doesn't remove the
   possibility of a collision.
  It does, as I control the MAC address.
 What happens if you have two postmaster running on the same machine?

Could be bad things. :-)

For the case of two postmaster processes, I assume you mean two
different databases? If you never intend to merge the data between the
two databases, the problem is irrelevant. There is a much greater
chance that any UUID form is more unique, or can be guaranteed to be
unique, within a single application instance, than across all
application instances in existence. If you do intend to merge the
data, you may have a problem.

If I have two connections to PostgreSQL - would the plpgsql procedures
be executed from two different processes? With an in-core generation
routine, I think it is possible for it to collide unless inter-process
synchronization is used (unlikely) to ensure generation of unique
time/sequence combinations each time. I use this right now (mostly),
but as I've mentioned, it isn't my favourite. It's convenient. I don't
believe it provides the sort of guarantees that a SERIAL provides.

A model that intended to try and guarantee uniqueness would provide a
UUID generation service for the entire host, that was not specific to
any application, or database, possibly accessible via the loopback
address. It would ensure that at any given time, either the time is
new, or the sequence is new for the time. If computer time ever went
backwards, it could keep the last time issued persistent, and
increment from this point forward through the clock sequence values
until real time catches up. An alternative would be along the lines of
a /dev/uuid device, that like /dev/random, would be responsible for
outputting unique uuid values for the system. Who does this? Probably
nobody. I'm tempted to implement it, though, for my uses. :-)

Cheers,
mark





Re: [HACKERS] [PATCHES] Patch for UUID datatype (beta)

2006-09-18 Thread mark
On Mon, Sep 18, 2006 at 10:33:22AM -0400, Tom Lane wrote:
 Andreas Pflug [EMAIL PROTECTED] writes:
  Isn't guaranteed uniqueness the very attribute that's expected? AFAIK
  there's a commonly accepted algorithm providing this.
 Anyone who thinks UUIDs are guaranteed unique has been drinking too much
 of the kool-aid.  They're at best probably unique.  Some generator
 algorithms might make it more probable than others, but you simply
 cannot guarantee it for UUIDs generated on noncommunicating machines.

The versions that include a MAC address, time, and serial number for
the machine come pretty close, presuming that the user has not
overwritten the MAC address with something else. It's unique at
manufacturing time. If the generation is performed from a library
with the same state, on the same machine, on the off chance that you
do request multiple generations at the same exact time (from my
experience, this is already unlikely) the serial number should be
bumped for that time.

So yeah - if you set your MAC address, or if your machine time is ever
set back, or if you assume a serial number of 0 each time (generation
routine isn't shared among processes on the system), you can get overlap.
All of these can be controlled, making it possible to eliminate overlap.

 One of the big reasons that I'm hesitant to put a UUID generation
 function into core is the knowledge that none of them are or can be
 perfect ... so people might need different ones depending on local
 conditions.  I'm inclined to think that a reasonable setup would put
 the datatype (with input, output, comparison and indexing support)
 into core, but provide a generation function as a contrib module,
 making it easily replaceable.

I have UUID generation in core in my current implementation. In the
last year that I've been using it, I have already chosen twice to
generate UUIDs from my calling program. I find it faster, as it avoids
having to call out to PostgreSQL twice: once to generate the UUID, and
once to insert the row using it. I have no strong need for UUID
generation to be in core, and believe there does exist strong reasons
not to. Performance is better when not in core. Portability of
PostgreSQL is better when not in core. Ability to control how UUID is
defined is better when not in core.

The only thing an in-core version provides is convenience for those
that do not have easy access to a UUID generation library. I don't
care for that convenience.

Cheers,
mark





Re: [HACKERS] [PATCHES] Patch for UUID datatype (beta)

2006-09-18 Thread mark
On Mon, Sep 18, 2006 at 04:17:50PM -0500, Jim C. Nasby wrote:
 On Mon, Sep 18, 2006 at 12:23:16PM -0400, [EMAIL PROTECTED] wrote:
  I have UUID generation in core in my current implementation. In the
  last year that I've been using it, I have already chosen twice to
  generate UUIDs from my calling program. I find it faster, as it avoids
  have to call out to PostgreSQL twice: once to generate the UUID, and
  once to insert the row using it. I have no strong need for UUID
  generation to be in core, and believe there does exist strong reasons
  not to. Performance is better when not in core. Portability of
  PostgreSQL is better when not in core. Ability to control how UUID is
  defined is better when not in core.
 That's kinda short-sighted. You're assuming that the only place you'll
 want to generate UUIDs is outside the database. What about a stored
 procedure that's adding data to the database? How about populating a
 table via a SELECT INTO? There's any number of cases where you'd want to
 generate a UUID inside the database.

contrib module.
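
That is, once the contrib module is loaded the same cases just work.
Assuming a hypothetical generation function newid(), the stored
procedure and SELECT INTO cases reduce to:

  INSERT INTO orders (id, created) VALUES (newid(), now());
  SELECT newid() AS id, s.* INTO archived_sessions FROM sessions s;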

  The only thing an in-core version provides is convenience for those
  that do not have easy access to a UUID generation library. I don't
  care for that convenience.
 It's not about access to a library, it's about how do you get to that
 library from inside the database, which may not be very easy.
 You may not care for that convenience, but I certainly would.

Then load the contrib module. I do both. I'd happily reduce my contrib
module to be based upon a built-in UUID type within PostgreSQL, providing
the necessary UUID generation routines.

I would not use a 100% random number generator for a UUID value as was
suggested. I prefer inserting the MAC address and the time, to at
least allow me to control if a collision is possible. This is not easy
to do using a few lines of C code. I'd rather have a UUID type in core
with no generation routine, than no UUID type in core because the code
is too complicated to maintain, or not portable enough.

Cheers,
mark





Re: [PATCHES] WIP: splitting BLCKSZ

2006-04-03 Thread Mark Wong
Here's an updated patch with help from Simon.  Once I get a test system
going again in the lab I'll start posting some data.  I'm planning a
combination of block sizes (BLCKSZ and XLOG_BLCKSZ) and number of WAL
buffers.

Thanks,
Mark

Index: src/backend/access/transam/xlog.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.229
diff -c -r1.229 xlog.c
*** src/backend/access/transam/xlog.c	28 Mar 2006 22:01:16 -	1.229
--- src/backend/access/transam/xlog.c	3 Apr 2006 15:03:09 -
***
*** 113,122 
  
  /*
   * Limitation of buffer-alignment for direct IO depends on OS and filesystem,
!  * but BLCKSZ is assumed to be enough for it.
   */
  #ifdef O_DIRECT
! #define ALIGNOF_XLOG_BUFFER		BLCKSZ
  #else
  #define ALIGNOF_XLOG_BUFFER		ALIGNOF_BUFFER
  #endif
--- 113,122 
  
  /*
   * Limitation of buffer-alignment for direct IO depends on OS and filesystem,
!  * but XLOG_BLCKSZ is assumed to be enough for it.
   */
  #ifdef O_DIRECT
! #define ALIGNOF_XLOG_BUFFER		XLOG_BLCKSZ
  #else
  #define ALIGNOF_XLOG_BUFFER		ALIGNOF_BUFFER
  #endif
***
*** 130,136 
  
  /* User-settable parameters */
  int			CheckPointSegments = 3;
! int			XLOGbuffers = 8;
  char	   *XLogArchiveCommand = NULL;
  char	   *XLOG_sync_method = NULL;
  const char	XLOG_sync_method_default[] = DEFAULT_SYNC_METHOD_STR;
--- 130,136 
  
  /* User-settable parameters */
  int			CheckPointSegments = 3;
! int			XLOGbuffers = 16;
  char	   *XLogArchiveCommand = NULL;
  char	   *XLOG_sync_method = NULL;
  const char	XLOG_sync_method_default[] = DEFAULT_SYNC_METHOD_STR;
***
*** 374,380 
  	 * and xlblocks values depends on WALInsertLock and WALWriteLock.
  	 */
  	char	   *pages;			/* buffers for unwritten XLOG pages */
! 	XLogRecPtr *xlblocks;		/* 1st byte ptr-s + BLCKSZ */
  	Size		XLogCacheByte;	/* # bytes in xlog buffers */
  	int			XLogCacheBlck;	/* highest allocated xlog buffer index */
  	TimeLineID	ThisTimeLineID;
--- 374,380 
  	 * and xlblocks values depends on WALInsertLock and WALWriteLock.
  	 */
  	char	   *pages;			/* buffers for unwritten XLOG pages */
! 	XLogRecPtr *xlblocks;		/* 1st byte ptr-s + XLOG_BLCKSZ */
  	Size		XLogCacheByte;	/* # bytes in xlog buffers */
  	int			XLogCacheBlck;	/* highest allocated xlog buffer index */
  	TimeLineID	ThisTimeLineID;
***
*** 397,403 
  
  /* Free space remaining in the current xlog page buffer */
  #define INSERT_FREESPACE(Insert)  \
! 	(BLCKSZ - ((Insert)->currpos - (char *) (Insert)->currpage))
  
  /* Construct XLogRecPtr value for current insertion point */
  #define INSERT_RECPTR(recptr,Insert,curridx)  \
--- 397,403 
  
  /* Free space remaining in the current xlog page buffer */
  #define INSERT_FREESPACE(Insert)  \
! 	(XLOG_BLCKSZ - ((Insert)->currpos - (char *) (Insert)->currpage))
  
  /* Construct XLogRecPtr value for current insertion point */
  #define INSERT_RECPTR(recptr,Insert,curridx)  \
***
*** 441,447 
  static uint32 readSeg = 0;
  static uint32 readOff = 0;
  
! /* Buffer for currently read page (BLCKSZ bytes) */
  static char *readBuf = NULL;
  
  /* Buffer for current ReadRecord result (expandable) */
--- 441,447 
  static uint32 readSeg = 0;
  static uint32 readOff = 0;
  
! /* Buffer for currently read page (XLOG_BLCKSZ bytes) */
  static char *readBuf = NULL;
  
  /* Buffer for current ReadRecord result (expandable) */
***
*** 706,712 
  	 * If cache is half filled then try to acquire write lock and do
  	 * XLogWrite. Ignore any fractional blocks in performing this check.
  	 */
! 	LogwrtRqst.Write.xrecoff -= LogwrtRqst.Write.xrecoff % BLCKSZ;
  	if (LogwrtRqst.Write.xlogid != LogwrtResult.Write.xlogid ||
  		(LogwrtRqst.Write.xrecoff >= LogwrtResult.Write.xrecoff +
  		 XLogCtl->XLogCacheByte / 2))
--- 706,712 
  	 * If cache is half filled then try to acquire write lock and do
  	 * XLogWrite. Ignore any fractional blocks in performing this check.
  	 */
! 	LogwrtRqst.Write.xrecoff -= LogwrtRqst.Write.xrecoff % XLOG_BLCKSZ;
  	if (LogwrtRqst.Write.xlogid != LogwrtResult.Write.xlogid ||
  		(LogwrtRqst.Write.xrecoff >= LogwrtResult.Write.xrecoff +
  		 XLogCtl->XLogCacheByte / 2))
***
*** 1228,1239 
  	{
  		/* crossing a logid boundary */
  		NewPageEndPtr.xlogid += 1;
! 		NewPageEndPtr.xrecoff = BLCKSZ;
  	}
  	else
! 		NewPageEndPtr.xrecoff += BLCKSZ;
  	XLogCtl->xlblocks[nextidx] = NewPageEndPtr;
! 	NewPage = (XLogPageHeader) (XLogCtl->pages + nextidx * (Size) BLCKSZ);
  
  	Insert->curridx = nextidx;
  	Insert->currpage = NewPage;
--- 1228,1239 
  	{
  		/* crossing a logid boundary */
  		NewPageEndPtr.xlogid += 1;
! 		NewPageEndPtr.xrecoff = XLOG_BLCKSZ;
  	}
  	else
! 		NewPageEndPtr.xrecoff += XLOG_BLCKSZ;
  	XLogCtl->xlblocks[nextidx] = NewPageEndPtr;
! 	NewPage = (XLogPageHeader) (XLogCtl->pages + nextidx * (Size) 

Re: [PATCHES] WIP: splitting BLCKSZ

2006-03-23 Thread Mark Wong
On Wed, 22 Mar 2006 14:19:48 -0500
Tom Lane [EMAIL PROTECTED] wrote:

 Mark Wong [EMAIL PROTECTED] writes:
  I proposed to explore splitting BLCKSZ into separate values for logging
  and data to see if there might be anything to gain:
  http://archives.postgresql.org/pgsql-hackers/2006-03/msg00745.php
  My first pass was to do more or less a search and replace (attached) and
  I am already running into trouble with a 'make check' (below).  I'm
  guessing that when initdb is run, I'm not properly saving the values
  that I've defined for DATA_BLCKSZ and possibly LOG_BLCKSZ.
 
 I'd suggest leaving BLCKSZ as-is and inventing XLOG_BLCKSZ to be used
 only within the WAL code; should make for a *far* smaller patch.
 Offhand I don't think that anything except xlog.c knows the WAL block
 size --- it should be fairly closely associated with dependencies on
 XLOG_SEG_SIZE, if you are looking for something to grep for.

Ok, I have attached something much smaller.  Appears to pass a 'make
check' but I'll keep going to make sure it's really correct and works.

Thanks,
Mark

Index: src/backend/access/transam/xlog.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.227
diff -c -r1.227 xlog.c
*** src/backend/access/transam/xlog.c	5 Mar 2006 15:58:22 -	1.227
--- src/backend/access/transam/xlog.c	23 Mar 2006 19:13:31 -
***
*** 113,122 
  
  /*
   * Limitation of buffer-alignment for direct IO depends on OS and filesystem,
!  * but BLCKSZ is assumed to be enough for it.
   */
  #ifdef O_DIRECT
! #define ALIGNOF_XLOG_BUFFER		BLCKSZ
  #else
  #define ALIGNOF_XLOG_BUFFER		ALIGNOF_BUFFER
  #endif
--- 113,122 
  
  /*
   * Limitation of buffer-alignment for direct IO depends on OS and filesystem,
!  * but XLOG_BLCKSZ is assumed to be enough for it.
   */
  #ifdef O_DIRECT
! #define ALIGNOF_XLOG_BUFFER		XLOG_BLCKSZ
  #else
  #define ALIGNOF_XLOG_BUFFER		ALIGNOF_BUFFER
  #endif
***
*** 374,380 
  	 * and xlblocks values depends on WALInsertLock and WALWriteLock.
  	 */
  	char	   *pages;			/* buffers for unwritten XLOG pages */
! 	XLogRecPtr *xlblocks;		/* 1st byte ptr-s + BLCKSZ */
  	Size		XLogCacheByte;	/* # bytes in xlog buffers */
  	int			XLogCacheBlck;	/* highest allocated xlog buffer index */
  	TimeLineID	ThisTimeLineID;
--- 374,380 
  	 * and xlblocks values depends on WALInsertLock and WALWriteLock.
  	 */
  	char	   *pages;			/* buffers for unwritten XLOG pages */
! 	XLogRecPtr *xlblocks;		/* 1st byte ptr-s + XLOG_BLCKSZ */
  	Size		XLogCacheByte;	/* # bytes in xlog buffers */
  	int			XLogCacheBlck;	/* highest allocated xlog buffer index */
  	TimeLineID	ThisTimeLineID;
***
*** 397,403 
  
  /* Free space remaining in the current xlog page buffer */
  #define INSERT_FREESPACE(Insert)  \
! 	(BLCKSZ - ((Insert)->currpos - (char *) (Insert)->currpage))
  
  /* Construct XLogRecPtr value for current insertion point */
  #define INSERT_RECPTR(recptr,Insert,curridx)  \
--- 397,403 
  
  /* Free space remaining in the current xlog page buffer */
  #define INSERT_FREESPACE(Insert)  \
! 	(XLOG_BLCKSZ - ((Insert)->currpos - (char *) (Insert)->currpage))
  
  /* Construct XLogRecPtr value for current insertion point */
  #define INSERT_RECPTR(recptr,Insert,curridx)  \
***
*** 441,447 
  static uint32 readSeg = 0;
  static uint32 readOff = 0;
  
! /* Buffer for currently read page (BLCKSZ bytes) */
  static char *readBuf = NULL;
  
  /* Buffer for current ReadRecord result (expandable) */
--- 441,447 
  static uint32 readSeg = 0;
  static uint32 readOff = 0;
  
! /* Buffer for currently read page (XLOG_BLCKSZ bytes) */
  static char *readBuf = NULL;
  
  /* Buffer for current ReadRecord result (expandable) */
***
*** 662,668 
  			{
  COMP_CRC32(rdata_crc,
  		   page,
! 		   BLCKSZ);
  			}
  			else
  			{
--- 662,668 
  			{
  COMP_CRC32(rdata_crc,
  		   page,
! 		   XLOG_BLCKSZ);
  			}
  			else
  			{
***
*** 672,678 
  		   bkpb->hole_offset);
  COMP_CRC32(rdata_crc,
  		   page + (bkpb->hole_offset + bkpb->hole_length),
! 		   BLCKSZ - (bkpb->hole_offset + bkpb->hole_length));
  			}
  		}
  	}
--- 672,678 
  		   bkpb->hole_offset);
  COMP_CRC32(rdata_crc,
  		   page + (bkpb->hole_offset + bkpb->hole_length),
! 		   XLOG_BLCKSZ - (bkpb->hole_offset + bkpb->hole_length));
  			}
  		}
  	}
***
*** 705,711 
  	 * If cache is half filled then try to acquire write lock and do
  	 * XLogWrite. Ignore any fractional blocks in performing this check.
  	 */
! 	LogwrtRqst.Write.xrecoff -= LogwrtRqst.Write.xrecoff % BLCKSZ;
  	if (LogwrtRqst.Write.xlogid != LogwrtResult.Write.xlogid ||
  		(LogwrtRqst.Write.xrecoff >= LogwrtResult.Write.xrecoff +
  		 XLogCtl->XLogCacheByte / 2))
--- 705,711 
  	 * If cache is half filled then try

Re: [PATCHES] [HACKERS] pg_freespacemap question

2006-03-19 Thread Mark Kirkwood

Mark Kirkwood wrote:

Tom Lane wrote:



I do notice a rather serious shortcoming of pg_freespacemap in its
current incarnation, which is that it *only* shows you the per-page free
space data, and not any of the information that would let you determine
what the FSM is doing to filter the raw data.  The per-relation
avgRequest and lastPageCount fields would be interesting for instance.
Perhaps there should be a second view with one row per relation to
carry the appropriate data.



Ok - I did wonder about 2 views, but was unsure if the per-relation 
stuff was interesting. Given that it looks like it is interesting, I'll 
see about getting a second view going.




This patch implements the second view for FSM relations. I have renamed 
the functions and views to be:


pg_freespacemap_relations
pg_freespacemap_pages

This patch depends on the previous one (which was called simply 
'pg_freespacemap.patch').
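
Usage should end up something like the following (a sketch only - the
column names are my guess at how the per-relation fields will be
exposed):

  SELECT c.relname, r.avgrequest, r.lastpagecount
  FROM pg_freespacemap_relations r JOIN pg_class c
       ON c.relfilenode = r.relfilenode;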


Cheers

Mark


pg_freespacemap-1.patch.gz
Description: application/gzip



[PATCHES] Pg_buffercache tidy

2006-03-19 Thread Mark Kirkwood

This patch wraps the DDL in a BEGIN...COMMIT (as suggested by Jim for
pg_freespacemap a while ago).

In addition I have amended the example queries to correctly project out
only relations in the current database (relations with the same name
from different databases were getting counted together).

Cheers

Mark

Index: pg_buffercache.sql.in
===
RCS file: /projects/cvsroot/pgsql/contrib/pg_buffercache/pg_buffercache.sql.in,v
retrieving revision 1.3
diff -c -r1.3 pg_buffercache.sql.in
*** pg_buffercache.sql.in   27 Feb 2006 16:09:48 -  1.3
--- pg_buffercache.sql.in   19 Mar 2006 10:45:06 -
***
*** 1,4 
--- 1,5 
  -- Adjust this setting to control where the objects get created.
+ BEGIN;
  SET search_path = public;
  
  -- Register the function.
***
*** 16,18 
--- 17,21 
  -- Don't want these to be available at public.
  REVOKE ALL ON FUNCTION pg_buffercache_pages() FROM PUBLIC;
  REVOKE ALL ON pg_buffercache FROM PUBLIC;
+ 
+ COMMIT;
Index: README.pg_buffercache
===
RCS file: /projects/cvsroot/pgsql/contrib/pg_buffercache/README.pg_buffercache,v
retrieving revision 1.2
diff -c -r1.2 README.pg_buffercache
*** README.pg_buffercache   31 May 2005 00:07:47 -  1.2
--- README.pg_buffercache   19 Mar 2006 10:55:44 -
***
*** 76,83 
   isdirty boolean);
  
regression=# SELECT c.relname, count(*) AS buffers
!FROM pg_class c, pg_buffercache b
!WHERE b.relfilenode = c.relfilenode
 GROUP BY c.relname
 ORDER BY 2 DESC LIMIT 10;
 relname | buffers
--- 76,84 
   isdirty boolean);
  
regression=# SELECT c.relname, count(*) AS buffers
!FROM pg_class c INNER JOIN pg_buffercache b
!ON b.relfilenode = c.relfilenode INNER JOIN pg_database d
!ON (b.reldatabase = d.oid AND d.datname = current_database())
 GROUP BY c.relname
 ORDER BY 2 DESC LIMIT 10;
 relname | buffers




Re: [PATCHES] [HACKERS] pg_freespacemap question

2006-03-12 Thread Mark Kirkwood

Tatsuo Ishii wrote:

BTW, I noticed difference of outputs from pg_freespacemap and
pgstattuple.

I ran pgbench and inspected accounts table by using these tools.

pg_freespacemap:
sum of bytes: 250712

pgstattuple:
free_space: 354880

Shouldn't they be identical?


I would have thought so - unless there are not enough pages left in the 
FSM...


pg_freespacemap is reporting on what gets into the FSM - so provided I 
haven't put a bug in there somewhere (!) - we need to look at how VACUUM 
reports free space to the FSM


cheers

Mark



Re: [PATCHES] [HACKERS] pg_freespacemap question

2006-03-12 Thread Mark Kirkwood

Tom Lane wrote:

Tatsuo Ishii wrote:


BTW, I noticed difference of outputs from pg_freespacemap and
pgstattuple.

I ran pgbench and inspected accounts table by using these tools.

pg_freespacemap:
sum of bytes: 250712

pgstattuple:
free_space: 354880

Shouldn't they be identical?



vacuum/fsm disregard pages with uselessly small amounts of
free space (less than the average tuple size, IIRC).


Ah - that's what I was seeing! Thanks.
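
(So the two numbers can only be compared loosely; summing the map for
one relation, e.g.

  SELECT sum(m.blockfreebytes) AS fsm_free
  FROM pg_freespacemap m JOIN pg_class c
       ON c.relfilenode = m.relfilenode
  WHERE c.relname = 'accounts';

will typically come out lower than pgstattuple's free_space, since the
sub-threshold pages never make it into the map.)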


I do notice a rather serious shortcoming of pg_freespacemap in its
current incarnation, which is that it *only* shows you the per-page free
space data, and not any of the information that would let you determine
what the FSM is doing to filter the raw data.  The per-relation
avgRequest and lastPageCount fields would be interesting for instance.
Perhaps there should be a second view with one row per relation to
carry the appropriate data.



Ok - I did wonder about 2 views, but was unsure if the per-relation 
stuff was interesting. Given that it looks like it is interesting, I'll 
see about getting a second view going.


Cheers

Mark



Re: [PATCHES] [HACKERS] pg_freespacemap question

2006-03-08 Thread Mark Kirkwood

Tom Lane wrote:

Mark Kirkwood [EMAIL PROTECTED] writes:

Good points! I had not noticed this test case. Probably NULL is better 



Would setting it to 'BLCKSZ - (fixed index header stuff)' be better, 



No, I don't think so, because that will just make it harder to recognize
what's what (remember that BLCKSZ isn't really a constant, and the index
overhead is not the same for all AMs either).  The point here is that
for indexes the FSM tracks whole-page availability, not the amount of
free space within pages.  So I think NULL is a reasonable representation
of that.  Using NULL will make it easy to filter the results if you want
to see only heap-page data or only index-page data, whereas it will be
very hard to do that if the view adopts an ultimately-artificial
convention about the amount of available space on an index page.



Right - after suggesting it I realized that coding the different index
overhead for each possible AM would have been ... difficult :-). A patch
is attached to implement the NULL free bytes and other recommendations:

1/ Index free bytes set to NULL
2/ Comment added to the README briefly mentioning the index business
3/ Columns reordered more logically
4/ 'Blockid' column removed
5/ Free bytes column renamed to just 'bytes' instead of 'blockfreebytes'

Now 5/ was only hinted at, but seemed worth doing while I was there
(hopefully I haven't made it too terse now).

cheers

Mark


Index: pg_freespacemap.c
===
RCS file: /projects/cvsroot/pgsql/contrib/pg_freespacemap/pg_freespacemap.c,v
retrieving revision 1.2
diff -c -r1.2 pg_freespacemap.c
*** pg_freespacemap.c   14 Feb 2006 15:03:59 -  1.2
--- pg_freespacemap.c   9 Mar 2006 03:38:10 -
***
*** 12,18 
  #include storage/freespace.h
  #include utils/relcache.h
  
! #define   NUM_FREESPACE_PAGES_ELEM	6
  
  #if defined(WIN32) || defined(__CYGWIN__)
  /* Need DLLIMPORT for some things that are not so marked in main headers */
--- 12,18 
  #include storage/freespace.h
  #include utils/relcache.h
  
! #define   NUM_FREESPACE_PAGES_ELEM	5
  
  #if defined(WIN32) || defined(__CYGWIN__)
  /* Need DLLIMPORT for some things that are not so marked in main headers */
***
*** 29,40 
  typedef struct
  {
  
-   uint32  blockid;
-   uint32  relfilenode;
uint32  reltablespace;
uint32  reldatabase;
uint32  relblocknumber;
!   uint32  blockfreebytes;
  
  } FreeSpacePagesRec;
  
--- 29,40 
  typedef struct
  {
  
uint32  reltablespace;
uint32  reldatabase;
+   uint32  relfilenode;
uint32  relblocknumber;
!   uint32  bytes;
!   boolisindex;
  
  } FreeSpacePagesRec;
  
***
*** 91,107 
  
/* Construct a tuple to return. */
tupledesc = CreateTemplateTupleDesc(NUM_FREESPACE_PAGES_ELEM, 
false);
!   TupleDescInitEntry(tupledesc, (AttrNumber) 1, "blockid",
!  INT4OID, -1, 0);
!   TupleDescInitEntry(tupledesc, (AttrNumber) 2, "relfilenode",
   OIDOID, -1, 0);
!   TupleDescInitEntry(tupledesc, (AttrNumber) 3, "reltablespace",
   OIDOID, -1, 0);
!   TupleDescInitEntry(tupledesc, (AttrNumber) 4, "reldatabase",
   OIDOID, -1, 0);
!   TupleDescInitEntry(tupledesc, (AttrNumber) 5, "relblocknumber",
   INT8OID, -1, 0);
!   TupleDescInitEntry(tupledesc, (AttrNumber) 6, "blockfreebytes",
   INT4OID, -1, 0);
  
/* Generate attribute metadata needed later to produce tuples */
--- 91,105 
  
/* Construct a tuple to return. */
tupledesc = CreateTemplateTupleDesc(NUM_FREESPACE_PAGES_ELEM, 
false);
!   TupleDescInitEntry(tupledesc, (AttrNumber) 1, "reltablespace",
   OIDOID, -1, 0);
!   TupleDescInitEntry(tupledesc, (AttrNumber) 2, "reldatabase",
   OIDOID, -1, 0);
!   TupleDescInitEntry(tupledesc, (AttrNumber) 3, "relfilenode",
   OIDOID, -1, 0);
!   TupleDescInitEntry(tupledesc, (AttrNumber) 4, "relblocknumber",
   INT8OID, -1, 0);
!   TupleDescInitEntry(tupledesc

Re: [PATCHES] [SQL] Interval subtracting

2006-03-03 Thread Mark Dilger

Attached is the new patch.  To summarize:

  - new function justify_interval(interval)
  - modified function justify_hours(interval)
  - modified function justify_days(interval)

These functions are defined to meet the requirements as discussed in this 
thread.  Specifically:


  - justify_hours makes certain the sign bit on the hours
    matches the sign bit on the days.  It only checks the
    sign bit on the days, and not the months, when
    determining if the hours should be positive or negative.
    After the call, -24 < hours < 24.

  - justify_days makes certain the sign bit on the days
    matches the sign bit on the months.  Its behavior does
    not depend on the hours, nor does it modify the hours.
    After the call, -30 < days < 30.

  - justify_interval makes sure the sign bits on all three
    fields months, days, and hours are all the same.  After
    the call, -24 < hours < 24 AND -30 < days < 30.
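
For illustration, a few worked examples of those rules (mine - the
patch adds no regression tests for justify_interval):

  select justify_hours('3 days -10 hours'::interval);     -- 2 days 14:00:00
  select justify_days('3 months -10 days'::interval);     -- 2 mons 20 days
  select justify_interval('1 month -25 hours'::interval); -- 28 days 23:00:00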

'make check' passes all tests.  There are no tests for justify_interval, as it 
is new.  But the existing tests for justify_hours and justify_days appear to 
still work, even though the behavior has changed.  Apparently, their test cases 
are not sensitive to the particular changes that have occurred.


I would include new tests in the patch but do not know on which reference 
machine/platform the patches are supposed to be generated.


mark
Index: src/backend/utils/adt/timestamp.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/timestamp.c,v
retrieving revision 1.160
diff --context -r1.160 timestamp.c
*** src/backend/utils/adt/timestamp.c   22 Nov 2005 22:30:33 -  1.160
--- src/backend/utils/adt/timestamp.c   3 Mar 2006 20:23:26 -
***
*** 1975,1980 
--- 1975,2054 
  }
  
  /*
+  *  interval_justify_interval()
+  *
+  *  Adjust interval so 'month', 'day', and 'time' portions are within
+  *  customary bounds.  Specifically:
+  *
+  *	0 <= abs(time) < 24 hours
+  *	0 <= abs(day)  < 30 days
+  *
+  *  Also, the sign bit on all three fields is made equal, so either
+  *  all three fields are negative or all are positive.
+  */
+ Datum
+ interval_justify_interval(PG_FUNCTION_ARGS)
+ {
+   Interval   *span = PG_GETARG_INTERVAL_P(0);
+   Interval   *result;
+   
+ #ifdef HAVE_INT64_TIMESTAMP
+   int64   wholeday;
+ #else
+   double  wholeday;
+ #endif
+   int32   wholemonth;
+ 
+   result = (Interval *) palloc(sizeof(Interval));
+   result->month = span->month;
+   result->day = span->day;
+   result->time = span->time;
+ 
+ #ifdef HAVE_INT64_TIMESTAMP
+   TMODULO(result->time, wholeday, USECS_PER_DAY);
+ #else
+   TMODULO(result->time, wholeday, (double) SECS_PER_DAY);
+ #endif
+   result->day += wholeday;	/* could overflow... */
+ 
+   wholemonth = result->day / DAYS_PER_MONTH;
+   result->day -= wholemonth * DAYS_PER_MONTH;
+   result->month += wholemonth;
+ 
+   if (result->month < 0 && result->day > 0)
+   {
+   result->day -= DAYS_PER_MONTH;
+   result->month++;
+   }
+   else if (result->month > 0 && result->day < 0)
+   {
+   result->day += DAYS_PER_MONTH;
+   result->month--;
+   }
+ 
+   if (result->time < 0 && result->day > 0)
+   {
+ #ifdef HAVE_INT64_TIMESTAMP
+   result->time += USECS_PER_DAY;
+ #else
+   result->time += (double) SECS_PER_DAY;
+ #endif
+   result->day--;
+   }
+   else if (result->time > 0 && result->day < 0)
+   {
+ #ifdef HAVE_INT64_TIMESTAMP
+   result->time -= USECS_PER_DAY;
+ #else
+   result->time -= (double) SECS_PER_DAY;
+ #endif
+   result->day++;
+   }
+ 
+   PG_RETURN_INTERVAL_P(result);
+ }
+ 
+ /*
   *interval_justify_hours()
   *
   *Adjust interval so 'time' contains less than a whole day, adding
***
*** 2006,2011 
--- 2080,2104 
  #endif
	result->day += wholeday;	/* could overflow... */
  
+   if (result->time < 0 && result->day > 0)
+   {
+ #ifdef HAVE_INT64_TIMESTAMP
+   result->time += USECS_PER_DAY;
+ #else
+   result->time += (double) SECS_PER_DAY;
+ #endif
+   result->day--;
+   }
+   else if (result->time > 0 && result->day < 0)
+   {
+ #ifdef HAVE_INT64_TIMESTAMP
+   result->time -= USECS_PER_DAY;
+ #else
+   result->time -= (double) SECS_PER_DAY;
+ #endif
+   result->day++;
+   }
+ 
PG_RETURN_INTERVAL_P(result);
  }
  
***
*** 2031,2036 
--- 2124,2140 
	result->day -= wholemonth * DAYS_PER_MONTH;
	result->month += wholemonth;
  
+   if (result->month < 0 && result->day > 0)
+   {
+   result->day -= DAYS_PER_MONTH;
+   result->month++;
+   }
+   else if (result->month > 0 && result->day < 0)
+   {
+   result->day += DAYS_PER_MONTH

Re: [PATCHES] [SQL] Interval subtracting

2006-03-01 Thread Mark Dilger

Mark Dilger wrote:

Mark Dilger wrote:


Tom Lane wrote:


Milen A. Radev [EMAIL PROTECTED] writes:


Milorad Poluga wrote:

SELECT '10 years 1 mons 1 days'::interval - '9 years 10 mons 15 
days'::interval

 ?column?
-----------------
 3 mons -14 days

Why not '2 mons 16 days'?






Please read the last paragraph in section 8.5.1.4 of the manual
(http://www.postgresql.org/docs/8.1/static/datatype-datetime.html#AEN4775) 


. It mentions the functions named justify_days and justify_hours
that could do what you need.





justify_days doesn't currently do anything with this result --- it
thinks its charter is only to reduce day components that are >= 30 days.
However, I think a good case could be made that it should normalize
negative days too; that is, the invariant on its result should be
0 <= days < 30, not merely days < 30.  Similarly for justify_hours.
Comments anyone?  Patch anyone?




Sure, if nobody objects to this change I can write the patch.

mark



I've modified the code and it now behaves as follows:

  select justify_days('3 months -12 days'::interval);
    justify_days
   ----------------
    2 mons 18 days

  select justify_days('3 months -33 days'::interval);
   justify_days
  ---
   1 mon 27 days

  select justify_hours('3 months -33 days -12 hours'::interval);
 justify_hours
  ---
   3 mons -34 days +12:00:00

  select justify_days(justify_hours('3 months -33 days -12 hours'::interval));
       justify_days
  ------------------------
   1 mon 26 days 12:00:00

  select justify_hours('-73 hours'::interval);
 justify_hours
  ---
   -4 days +23:00:00

  select justify_days('-62 days'::interval);
 justify_days
  --
   -3 mons +28 days


I find the last two results somewhat peculiar, as the new functionality 
pushes the negative values upwards (from hours to days, days to 
months).  Changing '-73 hours' to '-3 days -1 hour' might be more 
intuitive?  The '-4 days +23 hours' is however consistent with the 
behavior in the other cases.


Thoughts?  I will package this up into a patch fairly soon.

mark


The patch is attached.  Since the functionality is being intentionally changed, 
not surprisingly the regression tests for timestamp, timestamptz and horology 
failed.  The regression.diffs are also attached.


I intended to update the docs for justify_days and justify_hours, but the docs 
don't detail the behavior at a sufficient level for any change to be warranted.


mark
Index: src/backend/utils/adt/timestamp.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/timestamp.c,v
retrieving revision 1.160
diff --context=5 -r1.160 timestamp.c
*** src/backend/utils/adt/timestamp.c   22 Nov 2005 22:30:33 -  1.160
--- src/backend/utils/adt/timestamp.c   1 Mar 2006 18:35:36 -
***
*** 2003,2013 
	TMODULO(result->time, wholeday, USECS_PER_DAY);
  #else
	TMODULO(result->time, wholeday, (double) SECS_PER_DAY);
  #endif
	result->day += wholeday;	/* could overflow... */
! 
PG_RETURN_INTERVAL_P(result);
  }
  
  /*
   *interval_justify_days()
--- 2003,2021 
	TMODULO(result->time, wholeday, USECS_PER_DAY);
  #else
	TMODULO(result->time, wholeday, (double) SECS_PER_DAY);
  #endif
	result->day += wholeday;	/* could overflow... */
!   if (result->time < 0)
!   {
! #ifdef HAVE_INT64_TIMESTAMP
!   result->time += USECS_PER_DAY;
! #else
!   result->time += (double) SECS_PER_DAY;
! #endif
!   result->day--;
!   }
PG_RETURN_INTERVAL_P(result);
  }
  
  /*
   *interval_justify_days()
***
*** 2028,2037 
--- 2036,2050 
result-time = span-time;
  
	wholemonth = result->day / DAYS_PER_MONTH;
	result->day -= wholemonth * DAYS_PER_MONTH;
	result->month += wholemonth;
+   if (result->day < 0)
+   {
+   result->day += DAYS_PER_MONTH;
+   result->month--;
+   }
  
PG_RETURN_INTERVAL_P(result);
  }
  
  /* timestamp_pl_interval()
*** ./expected/timestamp.out	Sat Jun 25 20:04:18 2005
--- ./results/timestamp.out Wed Mar  1 10:21:00 2006
***
*** 442,448 
  SELECT '' AS 54, d1 - timestamp without time zone '1997-01-02' AS diff
 FROM TIMESTAMP_TBL WHERE d1 BETWEEN '1902-01-01' AND '2038-01-01';
   54 |  diff  
! +
  | @ 9863 days ago
  | @ 39 days 17 hours 32 mins 1 sec
  | @ 39 days 17 hours 32 mins 1 sec
--- 442,448 
  SELECT '' AS 54, d1 - timestamp without time zone '1997-01-02' AS diff
 FROM TIMESTAMP_TBL WHERE d1 BETWEEN '1902-01-01' AND '2038-01-01';
   54 |   diff
! +---
  | @ 9863 days ago
  | @ 39 days 17 hours 32 mins

Re: [PATCHES] Free WAL caches on switching segments

2006-02-14 Thread Mark Kirkwood

Tom Lane wrote:

Mark Kirkwood [EMAIL PROTECTED] writes:


Tom Lane wrote:


Sounds like a recipe for ensuring it never will be tested.  What's
needed here is some actual tests, not preparation...



Does the OP have a test scenario that those of us with appropriate OS's 
could try? Come to think of it, what are the appropriate OS's? (I see 
NetBSD mentioned so I suppose all the *BSDs, but what others?).



The test run by the OP was just pgbench, 


Ah - right, missed that sorry.


which is probably not the
greatest scenario for showing the benefits of this patch, but at least
it's neutral ground.  You need a situation in which the kernel is under
memory stress, else early free of disk cache buffers isn't going to make
any difference whatever --- so choose a pgbench scale factor that makes
the database noticeably larger than the test machine's RAM.  Other than
that, follow the usual guidelines for producing trustworthy pgbench
numbers: number of clients smaller than scale factor, number of
transactions per client at least 1000 or so (to eliminate startup
transients), repeat test a couple times to make sure numbers are
reproducible.



Thinking about this, presumably any write intensive, multi-user 
benchmark would seem to be suitable, so would something like OSDL's 
DBT-2 actually be better to try?


Cheers

Mark

(P.s - academic in my case, unless I try out the latest NetBSD or Linux 
on one of my FreeBSD boxes)




Re: [PATCHES] Free WAL caches on switching segments

2006-02-13 Thread Mark Kirkwood

Tom Lane wrote:

Bruce Momjian pgman@candle.pha.pa.us writes:


Yes, your vote counts very much.  What if I apply the patch, but mark
the posix_advise() call in a NOT_USED macro block, so it will be ready
for people to test, but will not be used until we are sure.



Sounds like a recipe for ensuring it never will be tested.  What's
needed here is some actual tests, not preparation...



Gotta second that.

Does the OP have a test scenario that those of us with appropriate OS's 
could try? Come to think of it, what are the appropriate OS's? (I see 
NetBSD mentioned so I suppose all the *BSDs, but what others?).


Cheers

Mark



[PATCHES] Want to add to contrib.... xmldbx

2006-01-29 Thread Mark Woodward
I have a fairly simple extension I want to add to contrib. It is an XML
parser that is designed to work with a specific dialect.

I have a PHP extension called xmldbx, it allows the PHP system to
serialize its web session data to an XML stream. (or just serialize
variables) PHP's normal serializer is a non-standard home grown format
that is impossible to read. The xmldbx format may not be too much easier,
but I have a format document for it. I also have a PostgreSQL extension
that can take serialized data and use it in a query, like:

select xmldbx('data.userdata.id', sessions.data);

If the PHP variable has $userdata['id'] set to some value in its session
data, it will be returned.

This allows a lot of flexibility and a tight integration between
PostgreSQL and PHP.

For more information:
http://www.mohawksoft.org -- (The open source end of mohawk softare :-)

What do you all think?




Re: [PATCHES] Summary table trigger example race condition

2006-01-10 Thread Mark Kirkwood

Jim C. Nasby wrote:

On Sun, Jan 08, 2006 at 04:13:01PM +1300, Mark Kirkwood wrote:

After re-examining the original code, it looks like it was not actually 
vulnerable to a race condition! (it does the UPDATE, then if not found 
will do an INSERT, and handle unique violation with a repeat of the same 
UPDATE - i.e three DML statements, which are enough to handle the race 
in this case).



What happens if someone deletes the row between the failed insert and
the second update? :)



In this example the rows in the summary table never get deleted by 
DELETE operations on that main one - the trigger just decrements the 
various amounts - i.e. DELETE becomes UPDATE, so no problem there.
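
(i.e. the DELETE arm of the trigger function does something like the
following - names illustrative, not the exact code from the example:

  IF TG_OP = 'DELETE' THEN
      UPDATE summary SET amount = amount - OLD.amount
       WHERE key = OLD.key;
      RETURN OLD;
  END IF;

so the summary row itself stays put.)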



AFAICT, example 36-1 is the only way to handle this without creating a
race condition.



For the general case indeed you are correct - a good reason for this 
change :-). In addition, the fact that it is actually quite difficult
to be sure that any race condition is actually being handled makes 
(another) good reason for using the most robust method in the 'official' 
examples.


Best wishes

Mark




Re: [PATCHES] Summary table trigger example race condition

2006-01-10 Thread Mark Kirkwood

Mark Kirkwood wrote:

Jim C. Nasby wrote:


On Sun, Jan 08, 2006 at 04:13:01PM +1300, Mark Kirkwood wrote:



What happens if someone deletes the row between the failed insert and
the second update? :)



In this example the rows in the summary table never get deleted by 
DELETE operations on that main one - the trigger just decrements the 
various amounts - i.e. DELETE becomes UPDATE, so no problem there.




Sorry Jim, I just realized you probably meant someone directly deleting 
rows in the summary table itself. Well yes, that would certainly fox it!


I guess I was implicitly considering a use case where the only direct 
DML on the summary table would be some sort of ETL process, which would 
probably lock out other changes (and re-create the summary table data 
afterwards in all likelihood).


Cheers

Mark



Re: [PATCHES] Summary table trigger example race condition

2006-01-07 Thread Mark Kirkwood

Mark Kirkwood wrote:

Jim C. Nasby wrote:


On Fri, Jan 06, 2006 at 02:00:34PM +1300, Mark Kirkwood wrote:

However, I think the actual change is not quite right - after running 




DOH! It would be good if doc/src had a better mechanism for handling
code; one that would allow for writing the code natively (so you don't
have to worry about translating < into &lt; and > into &gt;) and for
unit testing the different pieces of code.



Yes it would - I usually build the SGML -> HTML, then cut the code out
of a browser session to test - the pain is waiting for the docs to build.



Anyway, updated patch attached.



This one is good!



After re-examining the original code, it looks like it was not actually 
vulnerable to a race condition! (it does the UPDATE, then if not found 
will do an INSERT, and handle unique violation with a repeat of the same 
UPDATE - i.e three DML statements, which are enough to handle the race 
in this case).


However Jim's change handles the race needing only two DML statements in 
a loop, which seems much more elegant! In addition it provides a nice 
example of the 'merge' style code shown in e.g. 36-1.
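
For reference, that pattern is roughly (from memory of example 36-1,
assuming a summary table s(key, amount)):

  LOOP
      -- first try to update an existing summary row
      UPDATE s SET amount = amount + delta WHERE key = k;
      IF found THEN
          RETURN;
      END IF;
      -- no row yet, so try to insert; a concurrent insert shows up
      -- as a unique-key failure, in which case we loop back
      BEGIN
          INSERT INTO s (key, amount) VALUES (k, delta);
          RETURN;
      EXCEPTION WHEN unique_violation THEN
          -- do nothing, and retry the UPDATE
      END;
  END LOOP;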


Cheers

Mark




Re: [PATCHES] Summary table trigger example race condition

2006-01-05 Thread Mark Kirkwood

Jim C. Nasby wrote:

On Fri, Jan 06, 2006 at 02:00:34PM +1300, Mark Kirkwood wrote:

However, I think the actual change is not quite right - after running 



DOH! It would be good if doc/src had a better mechanism for handling
code; one that would allow for writing the code natively (so you don't
have to worry about translating < into &lt; and > into &gt;) and for
unit testing the different pieces of code.



Yes it would - I usually build the SGML -> HTML, then cut the code out
of a browser session to test - the pain is waiting for the docs to build.



Anyway, updated patch attached.



This one is good!

Cheers

Mark



Re: [PATCHES] TODO Item - Add system view to show free space map

2005-12-16 Thread Mark Kirkwood

Mark Kirkwood wrote:

Bruce Momjian wrote:


Mark Kirkwood wrote:


Simon Riggs wrote:



I like this, but not because I want to read it myself, but because I
want to make autovacuum responsible for re-allocating free space when it
runs out. This way we can have an autoFSM feature in 8.2




Not wanting to denigrate the value of the interesting but slightly OT
direction this thread has taken - but does anybody want to 
comment/review the patch itself :-) ?




I saw this question about a transaction block and your reply:

http://archives.postgresql.org/pgsql-patches/2005-10/msg00226.php

but no new patch.  I know someone suggested pgfoundry but it seems most
natural in /contrib.  Do you want to update the patch?



In the expectation of further revisions, I was going to batch that one 
in with the 'rest' - given that there have not been any, I'll submit a 
revised patch.




Here it is - I seem to have had trouble sending any attachments to this 
list recently. Bruce has the patch (sent privately), so it's in the patches
queue, but I thought I would have another go at getting it to -patches so
others can review it more easily!


cheers

Mark


contrib-freespacemap.patch.gz
Description: application/gzip



Re: [PATCHES] TODO Item - Add system view to show free space map

2005-11-17 Thread Mark Kirkwood

Simon Riggs wrote:



I like this, but not because I want to read it myself, but because I
want to make autovacuum responsible for re-allocating free space when it
runs out. This way we can have an autoFSM feature in 8.2




Not wanting to denigrate the value of the interesting but slightly OT
direction this thread has taken - but does anybody want to 
comment/review the patch itself :-) ?


Cheers

Mark



[PATCHES] TODO Item - Add system view to show free space map contents

2005-10-27 Thread Mark Kirkwood

This patch implements a view to display the free space map contents - e.g :

regression=# SELECT c.relname, m.relblocknumber, m.blockfreebytes
 FROM pg_freespacemap m INNER JOIN pg_class c
 ON c.relfilenode = m.relfilenode LIMIT 10;
relname | relblocknumber | blockfreebytes
++
sql_features|  5 |   2696
sql_implementation_info |  0 |   7104
sql_languages   |  0 |   8016
sql_packages|  0 |   7376
sql_sizing  |  0 |   6032
pg_authid   |  0 |   7424
pg_toast_2618   | 13 |   4588
pg_toast_2618   | 12 |   1680
pg_toast_2618   | 10 |   1436
pg_toast_2618   |  7 |   1136
(10 rows)

[I found being able to display the FSM pretty cool, even if I say so
myself].

It is written as a contrib module (similar to pg_buffercache) so as to
make any revisions non-initdb requiring.

The code needs to know about several of the (currently) internal data
structures in freespace.c, so I moved these into freespace.h. Similarly
for the handy macros to actually compute the free space. Let me know if 
this was the wrong way to proceed!


Additionally, access to the FSM pointer itself is required; I added a
function in freespace.c to return this, rather than making it globally
visible - again, if the latter is a better approach, it is easily changed.

cheers

Mark

P.s : Currently don't have access to a windows box, so had to just 'take 
a stab' at what DLLIMPORTs were required.



diff -Ncar pgsql.orig/contrib/pg_freespacemap/Makefile 
pgsql/contrib/pg_freespacemap/Makefile
*** pgsql.orig/contrib/pg_freespacemap/Makefile Thu Jan  1 12:00:00 1970
--- pgsql/contrib/pg_freespacemap/Makefile  Thu Oct 27 17:52:10 2005
***
*** 0 
--- 1,17 
+ # $PostgreSQL$
+ 
+ MODULE_big = pg_freespacemap
+ OBJS  = pg_freespacemap.o
+ 
+ DATA_built = pg_freespacemap.sql 
+ DOCS = README.pg_freespacemap
+ 
+ ifdef USE_PGXS
+ PGXS := $(shell pg_config --pgxs)
+ include $(PGXS)
+ else
+ subdir = contrib/pg_freespacemap
+ top_builddir = ../..
+ include $(top_builddir)/src/Makefile.global
+ include $(top_srcdir)/contrib/contrib-global.mk
+ endif
diff -Ncar pgsql.orig/contrib/pg_freespacemap/README.pg_freespacemap 
pgsql/contrib/pg_freespacemap/README.pg_freespacemap
*** pgsql.orig/contrib/pg_freespacemap/README.pg_freespacemap   Thu Jan  1 
12:00:00 1970
--- pgsql/contrib/pg_freespacemap/README.pg_freespacemapThu Oct 27 
18:06:20 2005
***
*** 0 
--- 1,98 
+ Pg_freespacemap - Real time queries on the free space map (FSM).
+ -----------------------------------------------------------------
+ 
+   This module consists of a C function 'pg_freespacemap()' that returns 
+   a set of records, and a view 'pg_freespacemap' to wrapper the function.
+ 
+   The module provides the ability to examine the contents of the free space
+   map, without having to restart or rebuild the server with additional
+   debugging code.
+ 
+   By default public access is REVOKED from both of these, just in case there
+   are security issues lurking.
+ 
+ 
+ Installation
+ ------------
+ 
+   Build and install the main Postgresql source, then this contrib module:
+ 
+   $ cd contrib/pg_freespacemap
+   $ gmake
+   $ gmake install
+ 
+ 
+   To register the functions:
+ 
+   $ psql -d database -f pg_freespacemap.sql
+ 
+ 
+ Notes
+ -----
+ 
+   The definition of the columns exposed in the view is:
+ 
+Column |  references  | Description
+   +--+
+blockid|  | Id, 1.. max_fsm_pages
+relfilenode| pg_class.relfilenode | Refilenode of the relation.
+reltablespace  | pg_tablespace.oid| Tablespace oid of the relation.
+reldatabase| pg_database.oid  | Database for the relation.
+relblocknumber |  | Offset of the page in the relation.
+blockfreebytes |  | Free bytes in the block/page.
+ 
+ 
+   There is one row for each page in the free space map.
+ 
+   Because the map is shared by all the databases, there are pages from
+   relations not belonging to the current database.
+ 
+   When the pg_freespacemap view is accessed, internal free space map locks are
+   taken, and a copy of the map data is made for the view to display. 
+   This ensures that the view produces a consistent set of results, while not 
+   blocking normal activity longer than necessary.  Nonetheless there 
+   could be some impact on database performance if this view is read often.
+ 
+ 
+ Sample output
+ -------------
+ 
+   regression=# \d pg_freespacemap
+   View

Re: [PATCHES] TODO Item - Add system view to show free space map

2005-10-27 Thread Mark Kirkwood

Jim C. Nasby wrote:

Shouldn't the DDL in pg_freespacemap.sql.in be wrapped in a transaction?
Specifically I'm considering the case of the script stopping before the
REVOKEs.


That's nice (probably should have done it in pg_buffercache)!



Re: [PATCHES] TODO Item - Add system view to show free space map

2005-10-27 Thread Mark Kirkwood

Christopher Kings-Lynne wrote:

Want to host it on pgfoundry until 8.2 is released?



Absolutely - I'll let it run the gauntlet of feedback to fix the silly
mistakes I put in :-), then do patches for 8.1 and 8.0 (maybe 7.4 and 
7.3 as well - if it rains a lot).


cheers

Mark




[PATCHES] TODO item - tid operator

2005-10-24 Thread Mark Kirkwood
Please find enclosed a patch that adds a '<>' operator for tid types
(per TODO item). Patch is against current sources (beta4).


If I have understood the requirement properly, the relevant function 
'tidne' already existed in src/backend/utils/adt/tid.c, just needed an 
ifdef NOT_USED removed. The '<>' operator, however, needed to be added.


I have renumbered the OIDs for the various tid functions in order to 
keep them together with the 'new' tidne one in pg_proc.h - as we have 
run out of OIDs in the old range. Similarly for the '=' operator in 
pg_operator.h.


with the patch applied, all 98 regression tests pass, plus it seems to 
work ok :-)



test=# select ctid, id from foo where ctid != '(0,1)';
 ctid  | id
---+
 (0,2) |  2
 (0,3) |  3
(2 rows)

regards

Mark
Index: src/include/catalog/pg_proc.h
===
RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_proc.h,v
retrieving revision 1.387
diff -c -r1.387 pg_proc.h
*** src/include/catalog/pg_proc.h   15 Oct 2005 02:49:42 -  1.387
--- src/include/catalog/pg_proc.h   24 Oct 2005 02:34:49 -
***
*** 1625,1637 
  DATA(insert OID = 1291 (  array_length_coerce PGNSP PGUID 12 f f t f s 3 2277 
2277 23 16 _null_ _null_ _null_ array_length_coerce - _null_ ));
  DESCR(adjust any array to new element typmod);
  
- DATA(insert OID = 1292 ( tideq   PGNSP PGUID 12 f f t 
f i 2 16 27 27 _null_ _null_ _null_ tideq - _null_ ));
- DESCR(equal);
- DATA(insert OID = 1293 ( currtid PGNSP PGUID 12 f f t f v 2 
27 26 27 _null_ _null_ _null_ currtid_byreloid - _null_ ));
- DESCR(latest tid of a tuple);
- DATA(insert OID = 1294 ( currtid2PGNSP PGUID 12 f f t f v 2 
27 25 27 _null_ _null_ _null_ currtid_byrelname - _null_ ));
- DESCR(latest tid of a tuple);
- 
  DATA(insert OID = 2168 ( pg_database_size PGNSP PGUID 12 f f t f 
v 1 20 19 _null_ _null_ _null_ pg_database_size_name - _null_ ));
  DESCR(Calculate total disk space usage for the specified database);
  
--- 1625,1630 
***
*** 3765,3770 
--- 3758,3772 
  DATA(insert OID = 2592 (  gist_circle_compressPGNSP PGUID 12 f f t f 
i 1 2281 2281 _null_ _null_ _null_ gist_circle_compress - _null_ ));
  DESCR(GiST support);
  
+ /* Tid functions */
+ DATA(insert OID = 2601 ( tideq   PGNSP PGUID 12 f f t 
f i 2 16 27 27 _null_ _null_ _null_ tideq - _null_ ));
+ DESCR(equal);
+ DATA(insert OID = 2602 ( currtid PGNSP PGUID 12 f f t f v 2 
27 26 27 _null_ _null_ _null_ currtid_byreloid - _null_ ));
+ DESCR(latest tid of a tuple);
+ DATA(insert OID = 2603 ( currtid2PGNSP PGUID 12 f f t f v 2 
27 25 27 _null_ _null_ _null_ currtid_byrelname - _null_ ));
+ DESCR(latest tid of a tuple);
+ DATA(insert OID = 2604 ( tidne   PGNSP PGUID 12 f f t 
f i 2 16 27 27 _null_ _null_ _null_ tidne - _null_ ));
+ DESCR(not equal);
  
  /*
   * Symbolic values for provolatile column: these indicate whether the result
Index: src/include/catalog/pg_operator.h
===
RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_operator.h,v
retrieving revision 1.137
diff -c -r1.137 pg_operator.h
*** src/include/catalog/pg_operator.h   15 Oct 2005 02:49:42 -  1.137
--- src/include/catalog/pg_operator.h   24 Oct 2005 02:34:50 -
***
*** 128,135 
  DATA(insert OID = 389 (  !!PGNSP PGUID l f   0  20  1700   
0   0   0   0  0   0 numeric_fac - - ));
  DATA(insert OID = 385 (  = PGNSP PGUID b t  29  29  16 385  
 0   0   0   0   0 cideq eqsel eqjoinsel ));
  DATA(insert OID = 386 (  = PGNSP PGUID b t  22  22  16 386  
 0   0   0   0   0 int2vectoreq eqsel eqjoinsel ));
! DATA(insert OID = 387 (  = PGNSP PGUID b f  27  27  16 387  
 0   0   0   0   0 tideq eqsel eqjoinsel ));
! #define TIDEqualOperator   387
  
  DATA(insert OID = 410 ( =  PGNSP PGUID b t  20  20  
16 410 411 412 412 412 413 int8eq eqsel eqjoinsel ));
DATA(insert OID = 411 (  <> PGNSP PGUID b f  20  20  16 411 
410 0 0 0 0 int8ne neqsel neqjoinsel ));
--- 128,136 
  DATA(insert OID = 389 (  !!PGNSP PGUID l f   0  20  1700   
0   0   0   0  0   0 numeric_fac - - ));
  DATA(insert OID = 385 (  = PGNSP PGUID b t  29  29  16 385  
 0   0   0   0   0 cideq eqsel eqjoinsel ));
  DATA(insert OID = 386 (  = PGNSP PGUID b t  22  22  16 386  
 0   0   0   0   0 int2vectoreq eqsel eqjoinsel ));
! DATA(insert OID = 390 (  <> PGNSP PGUID b f  27  27  16 390  
 0   0   0   0   0 tidne neqsel neqjoinsel ));
! DATA(insert OID = 391 (  = PGNSP PGUID b f  27  27  16 391

Re: [PATCHES] TODO item - tid <> operator

2005-10-24 Thread Mark Kirkwood

Neil Conway wrote:



You also probably want to add the declaration for tidne() to
include/utils/builtins.h



Doh! - Indeed, I obviously missed a few extra compile warnings!

Revised patch attached.




Index: src/backend/utils/adt/tid.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/tid.c,v
retrieving revision 1.49
diff -c -r1.49 tid.c
*** src/backend/utils/adt/tid.c 27 May 2005 00:57:49 -  1.49
--- src/backend/utils/adt/tid.c 24 Oct 2005 20:55:57 -
***
*** 174,180 
   arg1->ip_posid == arg2->ip_posid);
  }
  
- #ifdef NOT_USED
  Datum
  tidne(PG_FUNCTION_ARGS)
  {
--- 174,179 
***
*** 185,191 
   BlockIdGetBlockNumber(&(arg2->ip_blkid)) ||
   arg1->ip_posid != arg2->ip_posid);
  }
- #endif
  
  /*
   *Functions to get latest tid of a specified tuple.
--- 184,189 
Index: src/include/utils/builtins.h
===
RCS file: /projects/cvsroot/pgsql/src/include/utils/builtins.h,v
retrieving revision 1.267
diff -c -r1.267 builtins.h
*** src/include/utils/builtins.h18 Oct 2005 20:38:58 -  1.267
--- src/include/utils/builtins.h24 Oct 2005 20:55:58 -
***
*** 530,535 
--- 530,536 
  extern Datum tidrecv(PG_FUNCTION_ARGS);
  extern Datum tidsend(PG_FUNCTION_ARGS);
  extern Datum tideq(PG_FUNCTION_ARGS);
+ extern Datum tidne(PG_FUNCTION_ARGS);
  extern Datum currtid_byreloid(PG_FUNCTION_ARGS);
  extern Datum currtid_byrelname(PG_FUNCTION_ARGS);
  
Index: src/include/catalog/pg_proc.h
===
RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_proc.h,v
retrieving revision 1.387
diff -c -r1.387 pg_proc.h
*** src/include/catalog/pg_proc.h   15 Oct 2005 02:49:42 -  1.387
--- src/include/catalog/pg_proc.h   24 Oct 2005 20:56:01 -
***
*** 1625,1637 
  DATA(insert OID = 1291 (  array_length_coerce PGNSP PGUID 12 f f t f s 3 2277 
2277 23 16 _null_ _null_ _null_ array_length_coerce - _null_ ));
  DESCR(adjust any array to new element typmod);
  
- DATA(insert OID = 1292 ( tideq   PGNSP PGUID 12 f f t 
f i 2 16 27 27 _null_ _null_ _null_ tideq - _null_ ));
- DESCR(equal);
- DATA(insert OID = 1293 ( currtid PGNSP PGUID 12 f f t f v 2 
27 26 27 _null_ _null_ _null_ currtid_byreloid - _null_ ));
- DESCR(latest tid of a tuple);
- DATA(insert OID = 1294 ( currtid2PGNSP PGUID 12 f f t f v 2 
27 25 27 _null_ _null_ _null_ currtid_byrelname - _null_ ));
- DESCR(latest tid of a tuple);
- 
  DATA(insert OID = 2168 ( pg_database_size PGNSP PGUID 12 f f t f 
v 1 20 19 _null_ _null_ _null_ pg_database_size_name - _null_ ));
  DESCR(Calculate total disk space usage for the specified database);
  
--- 1625,1630 
***
*** 3765,3770 
--- 3758,3772 
  DATA(insert OID = 2592 (  gist_circle_compressPGNSP PGUID 12 f f t f 
i 1 2281 2281 _null_ _null_ _null_ gist_circle_compress - _null_ ));
  DESCR(GiST support);
  
+ /* Tid functions */
+ DATA(insert OID = 2601 ( tideq   PGNSP PGUID 12 f f t 
f i 2 16 27 27 _null_ _null_ _null_ tideq - _null_ ));
+ DESCR(equal);
+ DATA(insert OID = 2602 ( currtid PGNSP PGUID 12 f f t f v 2 
27 26 27 _null_ _null_ _null_ currtid_byreloid - _null_ ));
+ DESCR(latest tid of a tuple);
+ DATA(insert OID = 2603 ( currtid2PGNSP PGUID 12 f f t f v 2 
27 25 27 _null_ _null_ _null_ currtid_byrelname - _null_ ));
+ DESCR(latest tid of a tuple);
+ DATA(insert OID = 2604 ( tidne   PGNSP PGUID 12 f f t 
f i 2 16 27 27 _null_ _null_ _null_ tidne - _null_ ));
+ DESCR(not equal);
  
  /*
   * Symbolic values for provolatile column: these indicate whether the result
Index: src/include/catalog/pg_operator.h
===
RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_operator.h,v
retrieving revision 1.137
diff -c -r1.137 pg_operator.h
*** src/include/catalog/pg_operator.h   15 Oct 2005 02:49:42 -  1.137
--- src/include/catalog/pg_operator.h   24 Oct 2005 20:56:02 -
***
*** 128,135 
  DATA(insert OID = 389 (  !!PGNSP PGUID l f   0  20  1700   
0   0   0   0  0   0 numeric_fac - - ));
  DATA(insert OID = 385 (  = PGNSP PGUID b t  29  29  16 385  
 0   0   0   0   0 cideq eqsel eqjoinsel ));
  DATA(insert OID = 386 (  = PGNSP PGUID b t  22  22  16 386  
 0   0   0   0   0 int2vectoreq eqsel eqjoinsel ));
! DATA(insert OID = 387 (  = PGNSP PGUID b f  27  27  16 387  
 0   0   0   0   0 tideq eqsel eqjoinsel ));
! #define 

Re: [PATCHES] [HACKERS] Autovacuum loose ends

2005-08-12 Thread Mark Wong
On Fri, 12 Aug 2005 17:49:41 -0400
Alvaro Herrera [EMAIL PROTECTED] wrote:

 On Fri, Aug 12, 2005 at 10:49:43AM -0700, Mark Wong wrote:
  I thought I'd run a couple of tests to see if it would be helpful
  against CVS from Aug 3, 2005.
  
  Here's a run with autovacuum turned off:
  http://www.testing.osdl.org/projects/dbt2dev/results/dev4-015/42/
  5186.55 notpm
  
  Autvacuum on with default settings:
  http://www.testing.osdl.org/projects/dbt2dev/results/dev4-015/38/
  5462.23 notpm
 
 Just noticed what seems to be a bug: in
 
 http://www.testing.osdl.org/projects/dbt2dev/results/dev4-015/42/db/index_info.input
 
 plot "index_info.data" using 1:2 title "i_customer" with lines, \
 "index_info.data" using 1:2 title "i_orders" with lines, \
 "index_info.data" using 1:3 title "pk_customer" with lines, \
 "index_info.data" using 1:4 title "pk_district" with lines, \
 "index_info.data" using 1:5 title "pk_item" with lines, \
 "index_info.data" using 1:6 title "pk_new_order" with lines, \
 "index_info.data" using 1:7 title "pk_order_line" with lines, \
 "index_info.data" using 1:8 title "pk_orders" with lines, \
 "index_info.data" using 1:9 title "pk_stock" with lines, \
 "index_info.data" using 1:11 title "pk_warehouse" with lines
 
 Notice how the subindexes are wrong ... I think it should be 1:3 for
 i_orders, no?  Apparently indexes_scan.data has the same problem.

Whoops!  I think I fixed it for real now and the charts should be
updated now.  It was broken slightly more previously.

 It called my attention that the pk_warehouse index seems to have a very
 different usage in both runs in index_info, but in indexes_scan they
 seem similar.

Thanks,
Mark

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] [HACKERS] Autovacuum loose ends

2005-08-12 Thread Mark Wong
On Fri, 12 Aug 2005 18:42:09 -0400
Alvaro Herrera [EMAIL PROTECTED] wrote:

 On Fri, Aug 12, 2005 at 03:16:04PM -0700, Mark Wong wrote:
  On Fri, 12 Aug 2005 17:49:41 -0400
  Alvaro Herrera [EMAIL PROTECTED] wrote:
  
   Notice how the subindexes are wrong ... I think it should be 1:3 for
   i_orders, no?  Apparently indexes_scan.data has the same problem.
  
  Whoops!  I think I fixed it for real now and the charts should be
  updated now.  It was broken slightly more previously.
 
 Hmm, did you fix the 42 case only?  The other one is broken too ...

The other dev4-015 cases should be fixed too.
 
 Also, it seems the tran_lock.out file captured wrong input -- I think
 you mean WHERE transactionid IS NULL in the query instead of WHERE
 transaction IS NULL.

Hmm, ok I can try that in a future test run.  I'm not very familiar with
this table, what's the difference between transaction and transactionid?

 I wonder what the big down-spikes (?) at minutes ~45 and ~85 correspond
 to.  Are those checkpoints?  The IO vmstat chart would indicate that, I
 think.

That's correct, those should be checkpoints. 
 
 Anyway, it's interesting to see the performance go up with autovacuum
 on.  I certainly didn't expect that in this kind of test.

I think in Mary's case it was hurting, but she's running the workload
dramatically different.  I think she was planning to revisit that after
we sort out what's going on with the grouped WAL writes.

Mark

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] remove BufferBlockPointers for speed and space

2005-08-11 Thread Mark Kirkwood

Mark Kirkwood wrote:

Looks to me like -O2 makes the difference very small (on this 
platform/gcc combo) - is 5/169 worth doing?


Ahem - misunderstood your comment here, sorry.


Qingqing Zhou wrote:


compiled with gcc testbuf.c. I tried -O2 actually, and it turns out that 
the timing is reduced a lot so not believable.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] [PATCHES] O_DIRECT for WAL writes

2005-08-11 Thread Mark Wong
Ok, I finally got a couple of tests done against CVS from Aug 3, 2005.
I'm not sure if I'm showing anything insightful though.  I've learned
that fdatasync and O_DSYNC are simply fsync and O_SYNC respectively on
Linux, which you guys may have already known.  There appears to be a
fair performance decrease in using open_sync.  Just to double check, am
I correct in understanding only open_sync uses O_DIRECT?

fdatasync
http://www.testing.osdl.org/projects/dbt2dev/results/dev4-015/38/
5462 notpm

open_sync
http://www.testing.osdl.org/projects/dbt2dev/results/dev4-015/40/
4860 notpm

Mark

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PATCHES] O_DIRECT for WAL writes

2005-08-06 Thread Mark Wong
Here are comments that Daniel McNeil made earlier, which I've neglected
to forward earlier.  I've cc'ed him and Mark Havercamp, which some of
you got to meet the other day.

Mark

-

With O_DIRECT on Linux, when the write() returns the i/o has been
transferred to the disk.  

Normally, this i/o will be DMAed directly from user-space to the
device.  The current exception is when doing an O_DIRECT write to a 
hole in a file.  (If a program does a truncate() or lseek()/write()
that makes a file larger, the file system does not allocate space
between the old end of file and the new end of file.)  An O_DIRECT
write to a hole like this requires the file system to allocate space,
but there is a race condition between the O_DIRECT write doing the
allocation and then writing to initialize the newly allocated data, and
any other process that attempts a buffered (page cache) read of the
same area in the file -- it was possible for the read to return data from
the allocated region before the O_DIRECT write().  The fix in Linux
is for the O_DIRECT write() to fall back to use buffer i/o to do
the write() and flush the data from the page cache to the disk.

A write() with O_DIRECT only means the data has been transferred to
the disk.   Depending on the file system and mount options, it does
not mean the meta data for the file has been written to disk (see
fsync man page).  Fsync() will guarantee the data and metadata have
been written to disk.

Lastly, if a disk has a write back cache, an O_DIRECT write() does not
guarantee that the disk has put the data on the physical media.
I think some of the journal file systems now support i/o barriers
on commit which will flush the disk write back cache.  (I'm still
looking at the kernel code to see how this is done).

Conclusion:

O_DIRECT + fsync() can make sense.  It avoids the copying of data
to the page cache before being written and will also guarantee
that the file's metadata is also written to disk.  It also
prevents the page cache from filling up with write data that
will never be read (I assume it is only read if a recovery
is necessary - which should be rare).  It can also
helps disks with write back cache when using the journaling
file system that use i/o barriers.  You would want to use
large writes, since the kernel page cache won't be writing
multiple pages for you.
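
To make that concrete, here is a minimal C sketch of the O_DIRECT +
fsync() pattern described above.  This is an illustration only, not code
from any patch: it assumes Linux with glibc, a 512-byte alignment
requirement, and made-up names (write_log_block, ALIGN, BUFSZ).

#define _GNU_SOURCE             /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define ALIGN 512               /* assumed device alignment for O_DIRECT */
#define BUFSZ (8 * 8192)        /* one large write instead of many small ones */

int
write_log_block(const char *path, const char *data, size_t len)
{
    void   *buf;
    int     fd;

    /* O_DIRECT requires an aligned buffer (and aligned offset/length). */
    if (posix_memalign(&buf, ALIGN, BUFSZ) != 0)
        return -1;
    memset(buf, 0, BUFSZ);
    memcpy(buf, data, len < BUFSZ ? len : BUFSZ);

    /*
     * Writing into a preallocated (zero-filled) file avoids the
     * write-to-a-hole case that falls back to buffered i/o.
     */
    fd = open(path, O_WRONLY | O_DIRECT);
    if (fd < 0)
    {
        free(buf);
        return -1;
    }

    /* The write bypasses the page cache; fsync still flushes metadata. */
    if (write(fd, buf, BUFSZ) != (ssize_t) BUFSZ || fsync(fd) != 0)
    {
        close(fd);
        free(buf);
        return -1;
    }

    close(fd);
    free(buf);
    return 0;
}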

I need to look at the kernel code more to comment on O_DIRECT with
O_SYNC.

Questions:

Does the database transaction logger preallocate the log file?

Does the logger care about the order in which each write hits the disk?

Now someone else can comment on my comments.

Daniel

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] COPY FROM performance improvements

2005-07-21 Thread Mark Wong
I just ran through a few tests with the v14 patch against 100GB of data
from dbt3 and found a 30% improvement; 3.6 hours vs 5.3 hours.  Just to
give a few details, I only loaded data and started a COPY in parallel
for each of the data files:
http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/

Here's a visual of my disk layout, for those familiar with the database schema:

http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/layout-dev4-010-dbt3.html

I have 6 arrays of fourteen 15k rpm drives in a split-bus configuration
attached to a 4-way itanium2 via 6 compaq smartarray pci-x controllers.

Let me know if you have any questions.

Mark

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Mark Wong
On Thu, 14 Jul 2005 17:22:18 -0700
Alon Goldshuv [EMAIL PROTECTED] wrote:

 I revisited my patch and removed the code duplications that were there, and
 added support for CSV with buffered input, so CSV now runs faster too
 (although it is not as optimized as the TEXT format parsing). So now
 TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original file.

Hi Alon,

I'm curious, what kind of system are you testing this on?  I'm trying to
load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
interested in the results you would expect.

Mark

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Mark Wong
Hi Alon,

Yeah, that helps.  I just need to break up my scripts a little to just
load the data and not build indexes.

Is the following information good enough to give a guess about the data
I'm loading, if you don't mind? ;)  Here's a link to my script to create
tables:
http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb44f7f23437e432&path=scripts/pgsql/create_tables.sh.in

File sizes:
-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 customer.tbl
-rw-r--r--  1 markw 50  74G Jul  8 15:03 lineitem.tbl
-rw-r--r--  1 markw 50 2.1K Jul  8 15:03 nation.tbl
-rw-r--r--  1 markw 50  17G Jul  8 15:03 orders.tbl
-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 part.tbl
-rw-r--r--  1 markw 50  12G Jul  8 15:03 partsupp.tbl
-rw-r--r--  1 markw 50  391 Jul  8 15:03 region.tbl
-rw-r--r--  1 markw 50 136M Jul  8 15:03 supplier.tbl

Number of rows:
# wc -l *.tbl
 15000000 customer.tbl
600037902 lineitem.tbl
       25 nation.tbl
150000000 orders.tbl
 20000000 part.tbl
 80000000 partsupp.tbl
        5 region.tbl
  1000000 supplier.tbl

Thanks,
Mark

On Tue, 19 Jul 2005 14:05:56 -0700
Alon Goldshuv [EMAIL PROTECTED] wrote:

 Hi Mark,
 
 I improved the data *parsing* capabilities of COPY, and didn't touch the
 data conversion or data insertion parts of the code. The parsing improvement
 will vary largely depending on the ratio of parsing -to- converting and
 inserting. 
 
 Therefore, the speed increase really depends on the nature of your data:
 
 100GB file with
 long data rows (lots of parsing)
 Small number of columns (small number of attr conversions per row)
 less rows (less tuple insertions)
 
 Will show the best performance improvements.
 
 However, same file size 100GB with
 Short data rows (minimal parsing)
 large number of columns (large number of attr conversions per row)
 AND/OR
 more rows (more tuple insertions)
 
 Will show improvements but not as significant.
 In general I'll estimate 40%-95% improvement in load speed for the 1st case
 and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
 etc... This is for TEXT format. As for CSV, it may be faster but not as much
 as I specified here. BINARY will stay the same as before.
 
 HTH
 Alon.
 
 
 
 
 
 
 On 7/19/05 12:54 PM, Mark Wong [EMAIL PROTECTED] wrote:
 
  On Thu, 14 Jul 2005 17:22:18 -0700
  Alon Goldshuv [EMAIL PROTECTED] wrote:
  
  I revisited my patch and removed the code duplications that were there, and
  added support for CSV with buffered input, so CSV now runs faster too
  (although it is not as optimized as the TEXT format parsing). So now
  TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original 
  file.
  
  Hi Alon,
  
  I'm curious, what kind of system are you testing this on?  I'm trying to
  load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
  interested in the results you would expect.
  
  Mark
  
 

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Mark Wong
Whoopsies, yeah good point about the PRIMARY KEY.  I'll fix that.

Mark

On Tue, 19 Jul 2005 18:17:52 -0400
Andrew Dunstan [EMAIL PROTECTED] wrote:

 Mark,
 
 You should definitely not be doing this sort of thing, I believe:
 
 CREATE TABLE orders (
   o_orderkey INTEGER,
   o_custkey INTEGER,
   o_orderstatus CHAR(1),
   o_totalprice REAL,
   o_orderDATE DATE,
   o_orderpriority CHAR(15),
   o_clerk CHAR(15),
   o_shippriority INTEGER,
   o_comment VARCHAR(79),
   PRIMARY KEY (o_orderkey))
 
 Create the table with no constraints, load the data, then set up primary keys 
 and whatever other constraints you want using ALTER TABLE. Last time I did a 
 load like this (albeit 2 orders of magnitude smaller) I saw a 50% speedup 
 from deferring constraint creation.
 
 
 cheers
 
 andrew
 
 
 
 Mark Wong wrote:
 
 Hi Alon,
 
 Yeah, that helps.  I just need to break up my scripts a little to just
 load the data and not build indexes.
 
 Is the following information good enough to give a guess about the data
 I'm loading, if you don't mind? ;)  Here's a link to my script to create
 tables:
 http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb44f7f23437e432&path=scripts/pgsql/create_tables.sh.in
 
 File sizes:
 -rw-r--r--  1 markw 50 2.3G Jul  8 15:03 customer.tbl
 -rw-r--r--  1 markw 50  74G Jul  8 15:03 lineitem.tbl
 -rw-r--r--  1 markw 50 2.1K Jul  8 15:03 nation.tbl
 -rw-r--r--  1 markw 50  17G Jul  8 15:03 orders.tbl
 -rw-r--r--  1 markw 50 2.3G Jul  8 15:03 part.tbl
 -rw-r--r--  1 markw 50  12G Jul  8 15:03 partsupp.tbl
 -rw-r--r--  1 markw 50  391 Jul  8 15:03 region.tbl
 -rw-r--r--  1 markw 50 136M Jul  8 15:03 supplier.tbl
 
 Number of rows:
 # wc -l *.tbl
  15000000 customer.tbl
 600037902 lineitem.tbl
        25 nation.tbl
 150000000 orders.tbl
  20000000 part.tbl
  80000000 partsupp.tbl
         5 region.tbl
   1000000 supplier.tbl
 
 Thanks,
 Mark
 
 On Tue, 19 Jul 2005 14:05:56 -0700
 Alon Goldshuv [EMAIL PROTECTED] wrote:
 
   
 
 Hi Mark,
 
 I improved the data *parsing* capabilities of COPY, and didn't touch the
 data conversion or data insertion parts of the code. The parsing improvement
 will vary largely depending on the ratio of parsing -to- converting and
 inserting. 
 
 Therefore, the speed increase really depends on the nature of your data:
 
 100GB file with
 long data rows (lots of parsing)
 Small number of columns (small number of attr conversions per row)
 less rows (less tuple insertions)
 
 Will show the best performance improvements.
 
 However, same file size 100GB with
 Short data rows (minimal parsing)
 large number of columns (large number of attr conversions per row)
 AND/OR
 more rows (more tuple insertions)
 
 Will show improvements but not as significant.
 In general I'll estimate 40%-95% improvement in load speed for the 1st case
 and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
 etc... This is for TEXT format. As for CSV, it may be faster but not as much
 as I specified here. BINARY will stay the same as before.
 
 HTH
 Alon.
 
 
 
 
 
 
 On 7/19/05 12:54 PM, Mark Wong [EMAIL PROTECTED] wrote:
 
 
 
 On Thu, 14 Jul 2005 17:22:18 -0700
 Alon Goldshuv [EMAIL PROTECTED] wrote:
 
   
 
 I revisited my patch and removed the code duplications that were there, 
 and
 added support for CSV with buffered input, so CSV now runs faster too
 (although it is not as optimized as the TEXT format parsing). So now
 TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original 
 file.
 
 
 Hi Alon,
 
 I'm curious, what kind of system are you testing this on?  I'm trying to
 load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
 interested in the results you would expect.
 
 Mark
 
   
 
 

---(end of broadcast)---
TIP 6: explain analyze is your friend


[PATCHES] Bad link Makefile patch

2005-07-06 Thread Mark Deric
I built the v8.0.3 product from postgresql-8.0.3-1PGDG.src.rpm on
RedHat9 (I'm thinking the source RPM for RH9 should not have exactly the
same name as the FC3 version, since they are different files).  When I
tried to roll it into an RPM CD builder transaction, I got 'RPM
dependency errors': CRITICAL ERROR: Unable to resolve dependency
libecpg.so.3 for postgresql-libs and CRITICAL ERROR: Unable to resolve
dependency libpq.so.3 for postgresql-libs.  I was only including the
base rpm, -server, and -lib in the RPM transaction set.

The culprit was a nasty bit at $TOP/src/Makefile.shlib:243.  This piece
ensures that for the link step of libecpg.so.5.0 and
libecpg_compat.so.2.0, /usr/lib is searched for libpq and libecpg.so
_before_ the locally built copy is searched.

At least for libecpg.so.5.0 and libecpg_compat.so.2.0, the attached
patch fixes the problem.  Not sure if there would be other instances in
-contrib, etc.

Hope this helps and that I'm not redundant with your other fans.

Regards,
Mark


userlib_link.patch
Description: Binary data

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[PATCHES] More to Bad link Makefile patch

2005-07-06 Thread Mark Deric
So, my co-workers chided me mercilessly into getting the -contrib RPM working
as well; the full patch is now attached.

BTW, I did search the archive and this problem did not stick out; but it
could be my crummy reference skills.  And, no, I didn't read all 3500
emails since the v8.0.3 release.

Also, there's some good chance that the binary RPMs on the
postgresql.org FTP site have the same flaw that I encountered as
described below.

Regards,
Mark

On Wed, 6 Jul 2005 09:08:50 -0700
Mark Deric [EMAIL PROTECTED] wrote:

 I built the v8.0.3 product from postgresql-8.0.3-1PGDG.src.rpm on
 RedHat9 (I'm thinking the source RPM for RH9 should not have exactly
 the same name as the FC3 version, since they are different files). 
 When I tried to roll it into an RPM CD builder transaction, I got 'RPM
 dependency errors': CRITICAL ERROR: Unable to resolve dependency
 libecpg.so.3 for postgresql-libs and CRITICAL ERROR: Unable to
 resolve dependency libpq.so.3 for postgresql-libs.  I was only
 including the base rpm, -server, and -lib in the RPM transaction set.
 
 The culprit was a nasty bit at $TOP/src/Makefile.shlib:243.  This
 piece ensures that for the link step of libecpg.so.5.0 and
 libecpg_compat.so.2.0, /usr/lib is searched for libpq and libecpg.so
 _before_ the locally built copy is searched.
 
 At least for libecpg.so.5.0 and libecpg_compat.so.2.0, the attached
 patch fixes the problem.  Not sure if there would be other instances
 in-contrib, etc.
 
 Hope this helps and that I'm not redundant with your other fans.
 
 Regards,
 Mark
 


userlib_link.patch
Description: Binary data

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PATCHES] TODO Item - Return compressed length of TOAST datatypes

2005-07-06 Thread Mark Kirkwood

Neil Conway wrote:

Bruce Momjian wrote:


+ /*
+  * Return the length of a datum, possibly compressed
+  */
+ Datum
+ pg_column_size(PG_FUNCTION_ARGS)
+ {
+ 	Datum		value = PG_GETARG_DATUM(0);
+ 	int			result;
+ 
+ 	/* fn_extra stores the fixed column length, or -1 for varlena. */
+ 	if (fcinfo->flinfo->fn_extra == NULL)	/* first call? */
+ 	{
+ 		/* On the first call lookup the datatype of the supplied argument */



[...]

Is this optimization worth the code complexity?



I didn't performance test it, but the idea of hammering the catalogs for
each value to be processed seemed a bad thing.


+ tp = SearchSysCache(TYPEOID,
+ ObjectIdGetDatum(argtypeid),
+ 0, 0, 0);
+ if (!HeapTupleIsValid(tp))
+ {
+ /* Oid not in pg_type, should never happen. */
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+  errmsg("invalid typid: %u", argtypeid)));
+ }



elog(ERROR) is usually used for "can't happen" errors; also, the usual 
error message in this scenario is "cache lookup failed [...]". Perhaps 
better to use get_typlen() here, anyway.




Ahem... ah yes, get_typlen()! (ducks) - that is clearly much better!

I have attached a little change to varlena.c that uses it. I left the
ereport as it was, but am not fussed about it either way.

BTW - Bruce, I like the changes, makes the beast more friendly to use!

Best wishes

Mark


Index: src/backend/utils/adt/varlena.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/varlena.c,v
retrieving revision 1.125
diff -c -r1.125 varlena.c
*** src/backend/utils/adt/varlena.c 6 Jul 2005 19:02:52 -   1.125
--- src/backend/utils/adt/varlena.c 7 Jul 2005 02:52:50 -
***
*** 28,34 
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
  #include "utils/pg_locale.h"
- #include "utils/syscache.h"
  
  
  typedef struct varlena unknown;
--- 28,33 
***
*** 2364,2385 
{
/* On the first call lookup the datatype of the supplied 
argument */
Oid argtypeid = 
get_fn_expr_argtype(fcinfo->flinfo, 0);
!   HeapTuple   tp;
!   int typlen;
  
!   tp = SearchSysCache(TYPEOID,
!   
ObjectIdGetDatum(argtypeid),
!   0, 0, 0);
!   if (!HeapTupleIsValid(tp))
{
/* Oid not in pg_type, should never happen. */
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
!				 errmsg("invalid typid: %u", argtypeid)));
}
!   
!   typlen = ((Form_pg_type) GETSTRUCT(tp))->typlen;
!   ReleaseSysCache(tp);
fcinfo->flinfo->fn_extra = 
MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,

  sizeof(int));
*(int *)fcinfo->flinfo->fn_extra = typlen;
--- 2363,2379 
{
/* On the first call lookup the datatype of the supplied 
argument */
Oid argtypeid = 
get_fn_expr_argtype(fcinfo->flinfo, 0);
!   int typlen = get_typlen(argtypeid);
  
!   
!   if (typlen == 0)
{
/* Oid not in pg_type, should never happen. */
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
!				 errmsg("cache lookup failed for type %u", argtypeid)));
}
! 
fcinfo->flinfo->fn_extra = 
MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,

  sizeof(int));
*(int *)fcinfo->flinfo->fn_extra = typlen;


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PATCHES] TODO Item - Return compressed length of TOAST datatypes

2005-07-06 Thread Mark Kirkwood

Alvaro Herrera wrote:

On Thu, Jul 07, 2005 at 03:01:46PM +1200, Mark Kirkwood wrote:


Neil Conway wrote:




elog(ERROR) is usually used for can't happen errors.


I have attached a little change to varlena.c that uses it. I left the
ereport as it was, but am not fussed about it either way.



I am, because it gives useless messages to the translators to work on.
elog parameters are not marked for translation, ereport are (errmsg and
friends, really).  So please don't do that.



Ok, didn't realize the difference! Revised patch attached that uses elog.

Neil, I will run some tests to see if there is any performance saving 
with the first-call-only business.


Cheers

Mark

Index: src/backend/utils/adt/varlena.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/varlena.c,v
retrieving revision 1.125
diff -c -r1.125 varlena.c
*** src/backend/utils/adt/varlena.c 6 Jul 2005 19:02:52 -   1.125
--- src/backend/utils/adt/varlena.c 7 Jul 2005 03:40:44 -
***
*** 28,34 
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
  #include "utils/pg_locale.h"
- #include "utils/syscache.h"
  
  
  typedef struct varlena unknown;
--- 28,33 
***
*** 2364,2385 
{
/* On the first call lookup the datatype of the supplied 
argument */
Oid argtypeid = 
get_fn_expr_argtype(fcinfo->flinfo, 0);
!   HeapTuple   tp;
!   int typlen;
  
!   tp = SearchSysCache(TYPEOID,
!   
ObjectIdGetDatum(argtypeid),
!   0, 0, 0);
!   if (!HeapTupleIsValid(tp))
{
/* Oid not in pg_type, should never happen. */
!   ereport(ERROR,
!   (errcode(ERRCODE_INTERNAL_ERROR),
!				 errmsg("invalid typid: %u", argtypeid)));
}
!   
!   typlen = ((Form_pg_type) GETSTRUCT(tp))->typlen;
!   ReleaseSysCache(tp);
fcinfo->flinfo->fn_extra = 
MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,

  sizeof(int));
*(int *)fcinfo->flinfo->fn_extra = typlen;
--- 2363,2377 
{
/* On the first call lookup the datatype of the supplied 
argument */
Oid argtypeid = 
get_fn_expr_argtype(fcinfo->flinfo, 0);
!   int typlen = get_typlen(argtypeid);
  
!   
!   if (typlen == 0)
{
/* Oid not in pg_type, should never happen. */
!   elog(ERROR, "cache lookup failed for type %u", argtypeid);
}
! 
fcinfo->flinfo->fn_extra = 
MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,

  sizeof(int));
*(int *)fcinfo->flinfo->fn_extra = typlen;


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] TODO Item - Return compressed length of TOAST datatypes

2005-07-06 Thread Mark Kirkwood

Neil Conway wrote:

Mark Kirkwood wrote:


I didn't performance test it, but the idea of hammering the catalogs for
each value to be processed seemed a bad thing.



Well, the syscache already sits in front of the catalogs themselves. I'd 
be curious to see what the performance difference actually is...





I did some tests with two tables, one small and one large, and I am seeing
a consistent difference favoring the first-call-only type checking:

 Table public.dim1
 Column |  Type  | Modifiers
++---
 d1key  | integer| not null
 dat| character varying(100) | not null
 dattyp | character varying(20)  | not null
 dfill  | character varying(100) | not null

INFO:  dim1: scanned 24 of 24 pages, containing 1001 live rows and 0
dead rows; 1001 rows in sample, 1001 estimated total rows


SELECT max(pg_column_size(dfill)) FROM dim1


Run  1st call only check?  elapsed(ms)
1    y                     5.5
2    y                     3.1
3    n                     11.3
4    n                     4.1

Now try a bigger table:

Table public.fact0
 Column |  Type  | Modifiers
++---
 d0key  | integer| not null
 d1key  | integer| not null
 d2key  | integer| not null
 fval   | integer| not null
 ffill  | character varying(100) | not null


INFO:  fact0: scanned 3000 of 18182 pages, containing 165000 live rows
and 0 dead rows; 3000 rows in sample, 110 estimated total rows

SELECT max(pg_column_size(ffill)) FROM fact0

Run  1st call only check?  elapsed(ms)
1    y                     2497
2    y                     2484
3    y                     2496
4    y                     2480
5    n                     3572
6    n                     3570
7    n                     3558
8    n                     3457
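
The pattern being timed here - look the type information up once on the
first call and cache it in fcinfo->flinfo->fn_extra - can be sketched in
isolation like this.  This is a hypothetical, stripped-down example for
illustration only (the function name my_column_size is made up, and the
size logic is simplified relative to the real pg_column_size patch):

#include "postgres.h"
#include "fmgr.h"
#include "utils/lsyscache.h"

PG_FUNCTION_INFO_V1(my_column_size);

Datum
my_column_size(PG_FUNCTION_ARGS)
{
	int		   *typlen_p = (int *) fcinfo->flinfo->fn_extra;

	if (typlen_p == NULL)		/* first call for this expression node? */
	{
		Oid			argtypeid = get_fn_expr_argtype(fcinfo->flinfo, 0);

		/* Allocate in fn_mcxt so the cached value survives across calls. */
		typlen_p = (int *) MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
											  sizeof(int));
		*typlen_p = get_typlen(argtypeid);	/* -1 means varlena */
		fcinfo->flinfo->fn_extra = typlen_p;
	}

	/* Assume the function is declared STRICT, so NULLs never reach here. */
	if (*typlen_p == -1)		/* varlena: report its detoasted VARSIZE */
		PG_RETURN_INT32(VARSIZE(PG_DETOAST_DATUM(PG_GETARG_DATUM(0))));

	PG_RETURN_INT32(*typlen_p);	/* fixed-width type: length is constant */
}

This avoids one syscache probe per row; the tables above suggest that is
worth roughly 30% on the larger scan.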


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PATCHES] TODO Item - Return compressed length of TOAST datatypes

2005-07-06 Thread Mark Kirkwood

Tom Lane wrote:

Alvaro Herrera [EMAIL PROTECTED] writes:


On Thu, Jul 07, 2005 at 03:01:46PM +1200, Mark Kirkwood wrote:


I have attached a little change to varlena.c that uses it. I left the
ereport as it was, but am not fussed about it either way.




I am, because it gives useless messages to the translators to work on.



Exactly.  I had already made a private note to fix that patch ---
inventing your own random wording and punctuation for an extremely
standard error message is simply Not Acceptable.



Apologies all, my "not fussed about it" was highly misleading - I had 
already changed the error message as per Neil's suggestion in the 
revised patch, but that was not obvious from my comment :-(


regards

Mark

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PATCHES] TODO Item - Return compressed length of TOAST datatypes

2005-06-20 Thread Mark Kirkwood

I did a few cleanups on the last patch. Please examine this one instead.
The changes are:

1. Add documentation for pg_datum_length builtin.
2. Correct some typos in the code comments.
3. Move the code in toastfuncs.c to varlena.c as it is probably the
correct place.
4. Use ereport instead of elog.
5. Quiet compiler warning in pg_datum_length.

Best wishes

Mark

Mark Kirkwood wrote:

The next iteration -

Hopefully I have got the idea basically right.

I wonder if I have done the "am I a varlena" check the long way... pls advise
if so!




diff -Nacr ./doc/src/sgml/func.sgml.orig ./doc/src/sgml/func.sgml
*** ./doc/src/sgml/func.sgml.orig   Mon Jun 20 15:38:23 2005
--- ./doc/src/sgml/func.sgmlMon Jun 20 15:45:51 2005
***
*** 2187,2192 
--- 2187,2200 
    </row>
  
    <row>
+    <entry><literal><function>pg_datum_length</function>(<parameter>string</parameter>)</literal></entry>
+    <entry><type>integer</type></entry>
+    <entry>Number of bytes (before toast decompression) in string</entry>
+    <entry><literal>pg_datum_length( 'jo\\000se'::bytea)</literal></entry>
+    <entry><literal>5</literal></entry>
+   </row>
+ 
+   <row>
     <entry><literal><function>position</function>(<parameter>substring</parameter> in <parameter>string</parameter>)</literal></entry>
     <entry><type>integer</type></entry>
     <entry>Location of specified substring</entry>
diff -Nacr ./src/backend/access/heap/tuptoaster.c.orig 
./src/backend/access/heap/tuptoaster.c
*** ./src/backend/access/heap/tuptoaster.c.orig Mon Jun 20 17:11:37 2005
--- ./src/backend/access/heap/tuptoaster.c  Mon Jun 20 17:11:44 2005
***
*** 1436,1438 
--- 1436,1482 
  
return result;
  }
+ 
+ /* --
+  * toast_datum_size
+  *
+  *Show the (possibly compressed) size of a datum
+  * --
+  */
+ Size 
+ toast_datum_size(Datum value)
+ {
+ 
+   varattrib   *attr = (varattrib *) DatumGetPointer(value);
+   Sizeresult;
+ 
+   if (VARATT_IS_EXTERNAL(attr))
+   {
+   /*
+* Attribute is stored externally - If it is compressed too, 
+* then we need to get the external datum and calculate its 
size,
+* otherwise we just use the external rawsize.
+*/
+   if (VARATT_IS_COMPRESSED(attr))
+   {
+   varattrib   *attrext = 
toast_fetch_datum(attr);
+   result = VARSIZE(attrext);
+   pfree(attrext);
+   }
+   else
+   {
+   result = attr->va_content.va_external.va_rawsize;
+   }
+   }
+   else
+   {
+   /*
+* Attribute is stored inline either compressed or not, just
+* calculate the size of the datum in either case.
+*/
+   result = VARSIZE(attr);
+   }
+ 
+   return result;
+   
+ }
diff -Nacr ./src/backend/utils/adt/varlena.c.orig 
./src/backend/utils/adt/varlena.c
*** ./src/backend/utils/adt/varlena.c.orig  Mon Jun 20 14:28:03 2005
--- ./src/backend/utils/adt/varlena.c   Mon Jun 20 17:17:58 2005
***
*** 28,33 
--- 28,34 
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
  #include "utils/pg_locale.h"
+ #include "utils/syscache.h"
  
  
  typedef struct varlena unknown;
***
*** 2330,2333 
--- 2331,2396 
  
result_text = PG_STR_GET_TEXT(hexsum);
PG_RETURN_TEXT_P(result_text);
+ }
+ 
+ /* 
+  * Show the (possibly compressed) length of a datum.
+  */
+ Datum
+ pg_datum_length(PG_FUNCTION_ARGS)
+ {
+ 
+   Datum   value = PG_GETARG_DATUM(0);
+   int result;
+ 
+ 
+   if (fcinfo->flinfo->fn_extra == NULL)
+   {
+   /*
+* On the first call lookup the datatype of the supplied 
argument
+* and check if it is a varlena.
+*/
+   Oid argtypeid = 
get_fn_expr_argtype(fcinfo->flinfo, 0);
+   HeapTuple   tp;
+   int typlen = 0;
+   
+ 
+   tp = SearchSysCache(TYPEOID,
+   
ObjectIdGetDatum(argtypeid),
+   0, 0, 0);
+   if (HeapTupleIsValid(tp)) 
+   {
+   Form_pg_type typtup = (Form_pg_type) GETSTRUCT(tp);
+   typlen = typtup->typlen;
+   ReleaseSysCache(tp);
+   }
+   else
+   {
+   /* Oid not in pg_type, should never happen. */
+   ereport(ERROR,
+   (errcode(ERRCODE_INTERNAL_ERROR),
+errmsg(invalid typid: %u, 
argtypeid

Re: [PATCHES] TODO Item - Return compressed length of TOAST datatypes

2005-06-19 Thread Mark Kirkwood

The next iteration -

Hopefully I have got the idea basically right.

I wonder if I have done the "am I a varlena" check the long way... pls advise
if so!

Cheers

Mark

Tom Lane wrote:


My recollection of that discussion is that we just wanted something
that would return the actual VARSIZE() of the datum.  You're building
something way too confusing ...

A more interesting point is that the way you've declared the function,
it will only work on text values, which is pretty limiting.  Ideally
we'd define this thing as pg_datum_length(any-varlena-type) returns int
but there is no pseudotype corresponding to any varlena type.

I was thinking about this the other day in connection with my proposal
to make something that could return the TOAST value OID of an
out-of-line datum.  I think the only non-restrictive way to do it would
be to declare the function as taking any, and then add a runtime check
using the get_fn_expr_argtype() mechanism to test that what you've been
given is in fact a varlena datatype.





diff -Nacr ./src/backend/access/heap/tuptoaster.c.orig 
./src/backend/access/heap/tuptoaster.c
*** ./src/backend/access/heap/tuptoaster.c.orig Mon Mar 21 13:23:58 2005
--- ./src/backend/access/heap/tuptoaster.c  Sun Jun 19 14:24:43 2005
***
*** 1436,1438 
--- 1436,1482 
  
return result;
  }
+ 
+ /* --
+  * toast_datum_size
+  *
+  *Show the (possibly compressed) size of a datum
+  * --
+  */
+ Size 
+ toast_datum_size(Datum value)
+ {
+ 
+   varattrib   *attr = (varattrib *) DatumGetPointer(value);
+   Sizeresult;
+ 
+   if (VARATT_IS_EXTERNAL(attr))
+   {
+   /*
+* Attribute is stored externally - If it is compressed too, 
+* then we need to get the external datum and calculate its 
size,
+* otherwise we just use the external rawsize.
+*/
+   if (VARATT_IS_COMPRESSED(attr))
+   {
+   varattrib   *attrext = 
toast_fetch_datum(attr);
+   result = VARSIZE(attrext);
+   pfree(attrext);
+   }
+   else
+   {
+   result = attr-va_content.va_external.va_rawsize;
+   }
+   }
+   else
+   {
+   /*
+* Attribute is stored inline either compressed or not, just
+* calculate the size of the datum in either case.
+*/
+   result = VARSIZE(attr);
+   }
+ 
+   return result;
+   
+ }
diff -Nacr ./src/backend/utils/adt/Makefile.orig 
./src/backend/utils/adt/Makefile
*** ./src/backend/utils/adt/Makefile.orig   Fri Apr  2 09:28:45 2004
--- ./src/backend/utils/adt/MakefileFri Jun 17 17:52:09 2005
***
*** 24,30 
tid.o timestamp.o varbit.o varchar.o varlena.o version.o xid.o \
network.o mac.o inet_net_ntop.o inet_net_pton.o \
ri_triggers.o pg_lzcompress.o pg_locale.o formatting.o \
!   ascii.o quote.o pgstatfuncs.o encode.o
  
  like.o: like.c like_match.c
  
--- 24,30 
tid.o timestamp.o varbit.o varchar.o varlena.o version.o xid.o \
network.o mac.o inet_net_ntop.o inet_net_pton.o \
ri_triggers.o pg_lzcompress.o pg_locale.o formatting.o \
!   ascii.o quote.o pgstatfuncs.o encode.o toastfuncs.o
  
  like.o: like.c like_match.c
  
diff -Nacr ./src/backend/utils/adt/toastfuncs.c.orig 
./src/backend/utils/adt/toastfuncs.c
*** ./src/backend/utils/adt/toastfuncs.c.orig   Fri Jun 17 17:52:09 2005
--- ./src/backend/utils/adt/toastfuncs.cSun Jun 19 17:35:26 2005
***
*** 0 
--- 1,74 
+ /*-
+  *
+  * toastfuncs.c
+  *  Functions for accessing information about toasted data.
+  *
+  * Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  *
+  * IDENTIFICATION
+  *  $PostgreSQL$
+  *
+  *-
+  */
+ 
+ #include "postgres.h"
+ #include "fmgr.h"
+ 
+ #include "catalog/pg_type.h"
+ #include "access/xact.h"
+ #include "access/tuptoaster.h"
+ #include "utils/builtins.h"
+ #include "utils/syscache.h"
+ 
+ Datum
+ pg_datum_length(PG_FUNCTION_ARGS)
+ {
+ 
+   Datum   value = PG_GETARG_DATUM(0);
+   int size;
+ 
+ 
+   if (fcinfo->flinfo->fn_extra == NULL)
+   {
+   /*
+* On the first call check lookup the datatype of the supplied 
argument
+* and check if is a varlena .
+*/
+   Oid argtypeid = 
get_fn_expr_argtype(fcinfo->flinfo, 0);
+   HeapTuple   tp;
+   int typlen

Re: [PATCHES] TODO Item - Return compressed length of TOAST datatypes

2005-06-18 Thread Mark Kirkwood

Tom Lane wrote:

Mark Kirkwood [EMAIL PROTECTED] writes:


I thought I would have a look at:
(Datatypes) Add function to return compressed length of TOAST data values.



My recollection of that discussion is that we just wanted something
that would return the actual VARSIZE() of the datum.  You're building
something way too confusing ...



I was guessing a little about exactly what was wanted - for some reason 
  I couldn't access the mail archives for the last few days (however, 
can now).



A more interesting point is that the way you've declared the function,
it will only work on text values, which is pretty limiting.  Ideally
we'd define this thing as pg_datum_length(any-varlena-type) returns int
but there is no pseudotype corresponding to any varlena type.

I was thinking about this the other day in connection with my proposal
to make something that could return the TOAST value OID of an
out-of-line datum.  I think the only non-restrictive way to do it would
be to declare the function as taking any, and then add a runtime check
using the get_fn_expr_argtype() mechanism to test that what you've been
given is in fact a varlena datatype.



Yes - was thinking I needed to check if the Datum was a varlena (and 
didn't know how to do it...), so thanks for the pointer!


With these thoughts in mind, I'll have another go :-)
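
For reference, the runtime check Tom describes might look roughly like
this - a sketch only, assuming the function is declared as taking "any";
the helper name and error wording are made up, not from any patch:

#include "postgres.h"
#include "fmgr.h"
#include "utils/lsyscache.h"

/*
 * Sketch: verify that the single "any" argument is a varlena type.
 */
static void
check_arg_is_varlena(FunctionCallInfo fcinfo)
{
	Oid			argtypeid = get_fn_expr_argtype(fcinfo->flinfo, 0);

	if (get_typlen(argtypeid) != -1)	/* varlena types have typlen == -1 */
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("argument must be of a variable-length type")));
}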

Cheers

Mark


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


[PATCHES] TODO Item - Return compressed length of TOAST datatypes (WIP)

2005-06-17 Thread Mark Kirkwood

I thought I would have a look at:

(Datatypes) Add function to return compressed length of TOAST data values.

A WIP patch is attached for comment (wanted to check I hadn't bitten off
more than I could chew *before* asking questions!).

A few questions come to mind:

1) The name - I have called it 'toast_compressed_length'. Seems longish
- I'm wondering if just 'compressed_length' is ok?

2) What should be returned for toasted data that is not compressed (or
plain stored data for that matter)? The WIP patch just gives the
uncompressed size (I notice I may need to subtract VARHDRSZ in some cases).

3) What should be returned for non-varlena types? The WIP patch is
treating everything as a varlena, so is returning incorrect information
for that case.

4) The builtin is declared as immutable - I am not so sure about that (I
am wondering if altering a column's storage from MAIN -> EXTENDED and
then updating the column to be itself will fool it).

5) Any multi-byte locale considerations?

regards

Mark








diff -Nacr src/include/catalog/pg_proc.h.orig src/include/catalog/pg_proc.h
*** src/include/catalog/pg_proc.h.orig  Fri Jun 17 15:30:17 2005
--- src/include/catalog/pg_proc.h   Fri Jun 17 17:08:18 2005
***
*** 3655,3660 
--- 3655,3664 
  DATA(insert OID = 2560 (  pg_postmaster_start_time PGNSP PGUID 12 f f t f s 0 
1184  _null_ _null_ _null_ pgsql_postmaster_start_time - _null_ ));
  DESCR(postmaster start time);
  
+ /* Toast compressed length */
+ DATA(insert OID = 2561 (  toast_compressed_lengthPGNSP PGUID 12 f f t 
f i 1 23 25 _null_ _null_ _null_  toast_compressed_length - _null_ ));
+ DESCR(toast compressed length);
+ 
  
  /*
   * Symbolic values for provolatile column: these indicate whether the result
diff -Nacr src/include/access/tuptoaster.h.orig src/include/access/tuptoaster.h
*** src/include/access/tuptoaster.h.origThu Jun 16 21:12:57 2005
--- src/include/access/tuptoaster.h Thu Jun 16 21:14:06 2005
***
*** 138,141 
--- 138,149 
   */
  extern Size toast_raw_datum_size(Datum value);
  
+ /* --
+  * toast_compressed_datum_size -
+  *
+  *Return the compressed (toasted) size of a varlena datum
+  * --
+  */
+ extern Size toast_compressed_datum_size(Datum value);
+ 
  #endif   /* TUPTOASTER_H */
diff -Nacr src/include/utils/pg_lzcompress.h.orig 
src/include/utils/pg_lzcompress.h
*** src/include/utils/pg_lzcompress.h.orig  Thu Jun 16 21:21:37 2005
--- src/include/utils/pg_lzcompress.h   Thu Jun 16 21:21:11 2005
***
*** 228,231 
--- 228,238 
  extern intpglz_get_next_decomp_char_from_lzdata(PGLZ_DecompState *dstate);
  extern intpglz_get_next_decomp_char_from_plain(PGLZ_DecompState *dstate);
  
+ /* --
+  * Function to get compressed size.
+  * Internal use only.
+  * --
+  */
+ extern intpglz_fetch_size(PGLZ_Header *source);
+ 
  #endif   /* _PG_LZCOMPRESS_H_ */
diff -Nacr src/include/utils/builtins.h.orig src/include/utils/builtins.h
*** src/include/utils/builtins.h.orig   Fri Jun 17 15:25:01 2005
--- src/include/utils/builtins.hFri Jun 17 15:27:30 2005
***
*** 828,831 
--- 828,834 
  /* catalog/pg_conversion.c */
  extern Datum pg_convert_using(PG_FUNCTION_ARGS);
  
+ /* toastfuncs.c */
+ Datum   toast_compressed_length(PG_FUNCTION_ARGS);
+ 
  #endif   /* BUILTINS_H */
diff -Nacr src/backend/access/heap/tuptoaster.c.orig 
src/backend/access/heap/tuptoaster.c
*** src/backend/access/heap/tuptoaster.c.orig   Thu Jun 16 20:56:59 2005
--- src/backend/access/heap/tuptoaster.cFri Jun 17 15:12:30 2005
***
*** 1436,1438 
--- 1436,1499 
  
return result;
  }
+ 
+ /* --
+  * toast_compressed_datum_size
+  *
+  *Show the compressed size of a datum
+  * --
+  */
+ Size 
+ toast_compressed_datum_size(Datum value)
+ {
+ 
+ 
+   Sizesize;
+   varattrib   *attr = (varattrib *) DatumGetPointer(value);
+ 
+   if (!PointerIsValid(attr))
+   {
+   /*
+* No storage or NULL.
+*/
+   size = 0;
+   }
+   else if (VARATT_IS_EXTERNAL(attr))
+   {
+   /*
+* Attribute is stored externally 
+* If  it is compressed too, then we need to get the external 
datum
+* and interrogate *its* compressed size
+* otherwise just use the external rawsize (i.e. no compression)
+*/
+   if (VARATT_IS_COMPRESSED(attr))
+   {
+   varattrib   *attrext = 
toast_fetch_datum(attr);
+   size = pglz_fetch_size((PGLZ_Header *)attrext);
+   pfree(attrext);
+   }
+   else
+   {
+ 
+   size = attr->va_content.va_external.va_rawsize;
+   }
+   }
+   else if (VARATT_IS_COMPRESSED

Re: [PATCHES] [HACKERS] pg_buffercache causes assertion failure

2005-05-30 Thread Mark Kirkwood

Mark Kirkwood wrote:

Mark Kirkwood wrote:



I couldn't use int4 as the underlying datatype is unsigned int (not 
available as exposed Pg type). However, using int8 sounds promising 
(is int8 larger than unsigned int on 64-bit platforms?).



Blocknumber is defined as uint32 in block.h - so it should always be safe 
to represent as an int8, I am thinking.


I will look at patching pg_buffercache, changing numeric -> int8 for the 
relblocknumber column. This seems a tidier solution than using numeric, 
and loses the numeric overhead.


This patch changes the use of numeric to int8 to represent the
relblocknumber column.

regards

Mark

diff -Ncar pgsql.orig/contrib/pg_buffercache/README.pg_buffercache 
pgsql/contrib/pg_buffercache/README.pg_buffercache
*** pgsql.orig/contrib/pg_buffercache/README.pg_buffercache Tue May 31 
11:02:41 2005
--- pgsql/contrib/pg_buffercache/README.pg_buffercache  Tue May 31 11:05:48 2005
***
*** 66,78 
 relfilenode| oid |
 reltablespace  | oid |
 reldatabase| oid |
!relblocknumber | numeric |
 isdirty| boolean |
View definition:
 SELECT p.bufferid, p.relfilenode, p.reltablespace, p.reldatabase, 
p.relblocknumber, p.isdirty
   FROM pg_buffercache_pages() p(bufferid integer, relfilenode oid, 
!  reltablespace oid, reldatabase oid, relblocknumber numeric(10,0), 
   isdirty boolean);
  
regression=# SELECT c.relname, count(*) AS buffers
--- 66,78 
 relfilenode| oid |
 reltablespace  | oid |
 reldatabase| oid |
!relblocknumber | bigint  |
 isdirty| boolean |
View definition:
 SELECT p.bufferid, p.relfilenode, p.reltablespace, p.reldatabase, 
p.relblocknumber, p.isdirty
   FROM pg_buffercache_pages() p(bufferid integer, relfilenode oid, 
!  reltablespace oid, reldatabase oid, relblocknumber bigint, 
   isdirty boolean);
  
regression=# SELECT c.relname, count(*) AS buffers
diff -Ncar pgsql.orig/contrib/pg_buffercache/pg_buffercache.sql.in 
pgsql/contrib/pg_buffercache/pg_buffercache.sql.in
*** pgsql.orig/contrib/pg_buffercache/pg_buffercache.sql.in Tue May 31 
11:02:41 2005
--- pgsql/contrib/pg_buffercache/pg_buffercache.sql.in  Tue May 31 09:15:03 2005
***
*** 11,17 
  CREATE VIEW pg_buffercache AS
SELECT P.* FROM pg_buffercache_pages() AS P
(bufferid integer, relfilenode oid, reltablespace oid, reldatabase oid, 
!relblocknumber numeric(10), isdirty bool);
   
  -- Don't want these to be available at public.
  REVOKE ALL ON FUNCTION pg_buffercache_pages() FROM PUBLIC;
--- 11,17 
  CREATE VIEW pg_buffercache AS
SELECT P.* FROM pg_buffercache_pages() AS P
(bufferid integer, relfilenode oid, reltablespace oid, reldatabase oid, 
!relblocknumber int8, isdirty bool);
   
  -- Don't want these to be available at public.
  REVOKE ALL ON FUNCTION pg_buffercache_pages() FROM PUBLIC;
diff -Ncar pgsql.orig/contrib/pg_buffercache/pg_buffercache_pages.c 
pgsql/contrib/pg_buffercache/pg_buffercache_pages.c
*** pgsql.orig/contrib/pg_buffercache/pg_buffercache_pages.cTue May 31 
11:02:41 2005
--- pgsql/contrib/pg_buffercache/pg_buffercache_pages.c Tue May 31 11:23:46 2005
***
*** 93,99 
TupleDescInitEntry(tupledesc, (AttrNumber) 4, reldatabase,
OIDOID, 
-1, 0);
TupleDescInitEntry(tupledesc, (AttrNumber) 5, relblockbumber,
!   
NUMERICOID, -1, 0);
TupleDescInitEntry(tupledesc, (AttrNumber) 6, isdirty,

BOOLOID, -1, 0);
  
--- 93,99 
TupleDescInitEntry(tupledesc, (AttrNumber) 4, reldatabase,
OIDOID, 
-1, 0);
TupleDescInitEntry(tupledesc, (AttrNumber) 5, relblockbumber,
!   
INT8OID, -1, 0);
TupleDescInitEntry(tupledesc, (AttrNumber) 6, isdirty,

BOOLOID, -1, 0);
  


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PATCHES] [HACKERS] pg_buffercache causes assertion failure

2005-05-30 Thread Mark Kirkwood

Tom Lane wrote:

Mark Kirkwood [EMAIL PROTECTED] writes:


This patch changes the use of numeric to int8 to represent the
relblocknumber column.



Applied, thanks.




This reminds me:

I did some patches for 7.4 and 8.0 a while ago (attached) - while I do 
not expect these to be applied (unless it's ok for contrib to get extra 
modules in stable releases...), is there somewhere for things like this 
to go?


cheers

Mark

P.S.: They are amended to use int8 too :-)
diff -Naur pgsql-7.4.7.orig/contrib/Makefile pgsql-7.4.7/contrib/Makefile
--- pgsql-7.4.7.orig/contrib/Makefile   Fri Mar 18 11:44:25 2005
+++ pgsql-7.4.7/contrib/MakefileFri Mar 18 10:55:55 2005
@@ -25,6 +25,7 @@
noupdate\
oid2name\
pg_autovacuum   \
+   pg_buffercache  \
pg_dumplo   \
pg_logger   \
pgbench \
diff -Naur pgsql-7.4.7.orig/contrib/README pgsql-7.4.7/contrib/README
--- pgsql-7.4.7.orig/contrib/README Fri Mar 18 11:44:19 2005
+++ pgsql-7.4.7/contrib/README  Fri Mar 18 10:55:55 2005
@@ -136,6 +136,10 @@
Automatically performs vacuum
by Matthew T. O'Connor matthew@zeut.net
 
+pg_buffercache -
+   Real-time queries on the shared buffer cache
+   by Mark Kirkwood [EMAIL PROTECTED]
+
 pg_dumplo -
Dump large objects
by Karel Zak [EMAIL PROTECTED]
diff -Naur pgsql-7.4.7.orig/contrib/pg_buffercache/Makefile 
pgsql-7.4.7/contrib/pg_buffercache/Makefile
--- pgsql-7.4.7.orig/contrib/pg_buffercache/MakefileThu Jan  1 12:00:00 1970
+++ pgsql-7.4.7/contrib/pg_buffercache/Makefile Fri Mar 18 10:55:55 2005
@@ -0,0 +1,17 @@
+# $PostgreSQL: pgsql/contrib/pg_buffercache/Makefile,v 1.1 2005/03/12 15:36:24 
neilc Exp $
+
+MODULE_big = pg_buffercache
+OBJS   = pg_buffercache_pages.o
+
+DATA_built = pg_buffercache.sql 
+DOCS = README.pg_buffercache
+
+ifdef USE_PGXS
+PGXS = $(shell pg_config --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_buffercache
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff -Naur pgsql-7.4.7.orig/contrib/pg_buffercache/README.pg_buffercache 
pgsql-7.4.7/contrib/pg_buffercache/README.pg_buffercache
--- pgsql-7.4.7.orig/contrib/pg_buffercache/README.pg_buffercache   Thu Jan 
 1 12:00:00 1970
+++ pgsql-7.4.7/contrib/pg_buffercache/README.pg_buffercacheFri Mar 18 
11:47:28 2005
@@ -0,0 +1,110 @@
+Pg_buffercache - Real time queries on the shared buffer cache.
+--
+
+  This module consists of a C function 'pg_buffercache_pages()' that returns 
+  a set of records, plus a view 'pg_buffercache' that wraps the function.
+
+  The intent is to do for the buffer cache what pg_locks does for locks, i.e. -
+  the ability to examine what is happening at any given time without having to 
+  restart or rebuild the server with debugging code added.
+
+  By default public access is REVOKED from both of these, just in case there
+  are security issues lurking.
+
+
+Installation
+
+
+  Build and install the main PostgreSQL source, then this contrib module:
+
+  $ cd contrib/pg_buffercache
+  $ gmake
+  $ gmake install
+
+
+  To register the functions:
+
+  $ psql -d database -f pg_buffercache.sql
+
+
+Notes
+-
+
+  The definition of the columns exposed in the view is:
+
+   Column         |  references          | Description
+  ----------------+----------------------+-------------------------------------
+   bufferid       |                      | Id, 1-shared_buffers.
+   relfilenode    | pg_class.relfilenode | Relfilenode of the relation.
+   reldatabase    | pg_database.oid      | Database for the relation.
+   relblocknumber |                      | Offset of the page in the relation.
+   isdirty        |                      | Is the page dirty?
+
+
+  There is one row for each buffer in the shared cache. Unused buffers are
+  shown with all fields null except bufferid.
+
+  Because the cache is shared by all the databases, there are pages from
+  relations not belonging to the current database.
+
+  When the pg_buffercache view is accessed, internal buffer manager locks are 
+  taken, and a copy of the buffer cache data is made for the view to display. 
+  This ensures that the view produces a consistent set of results, while not 
+  blocking normal buffer activity longer than necessary.  Nonetheless there 
+  could be some impact on database performance if this view is read often.
+
+
+Sample output
+-
+
+  regression=# \d pg_buffercache;
+       View "public.pg_buffercache"
+      Column      |  Type   | Modifiers
+  ----------------+---------+-----------
+   bufferid       | integer |
+   relfilenode    | oid     |
+   reldatabase    | oid     |
+   relblocknumber | bigint  |
+   isdirty        | boolean |
+  View definition:
+   SELECT p.bufferid, p.relfilenode, p.reldatabase, 
+  p.relblocknumber, p.isdirty
+ FROM

Re: [PATCHES] [HACKERS] pg_buffercache causes assertion failure

2005-05-30 Thread Mark Kirkwood

Neil Conway wrote:

On Tue, 2005-05-31 at 13:07 +1200, Mark Kirkwood wrote:

I did some patches for 7.4 and 8.0 a while ago (attached) - while I do 
not expect these to be applied



Right, I don't see a need to backport this.



is there somewhere for things like this to go?



Pg Foundry?



Of course! Thanks


Mark




Re: [PATCHES] Proof of concept for MIN/MAX optimization

2005-04-14 Thread Mark Kirkwood
Tom Lane wrote:
Mark Kirkwood [EMAIL PROTECTED] writes:
The handling of nulls is a little unexpected (still todo?) :

Yeah, that was broken in the first draft :-( ... I think it's OK in
the committed version though.
(post cvs update) yeah - looks good!
regards
Mark


Re: [PATCHES] Proof of concept for MIN/MAX optimization

2005-04-13 Thread Mark Kirkwood
Tom Lane wrote:
I haven't done the work yet to add a column to pg_aggregate, so this
just knows about optimizing a couple of hard-wired cases (int4 and
text).  Other than that it's reasonably complete, I think.  Comments?
Looks good:
regression=# explain select max(unique1) from tenk1;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Result  (cost=0.15..0.16 rows=1 width=0)
   InitPlan
     ->  Limit  (cost=0.00..0.15 rows=1 width=4)
           ->  Index Scan Backward using tenk1_unique1 on tenk1  (cost=0.00..1512.59 rows=1 width=4)
(4 rows)

The handling of nulls is a little unexpected (still todo?):
regression=# insert into tenk2 (unique1,unique2) values(null, 1);
INSERT 0 1
regression=# select max(unique1) from tenk2;
 max
-----
(1 row)
regression=# set enable_indexscan=0;
SET
regression=# select max(unique1) from tenk2;
 max
-----
 
(1 row)
cheers
Mark



Re: [PATCHES] [HACKERS] contrib/pg_buffercache

2005-03-31 Thread Mark Kirkwood
Neil Conway wrote:
Mark Kirkwood wrote:
Great that it fixes it... however, I had submitted a tidier patch that 
puts the macro in the header

How is this tidier?
err... puts the macro in the header
(I don't see a reason for pg_buffercache_pages.h at all, actually.)
(chuckles) - well, that puts a different complexion on it, best leave it 
as is then! (in fact, feel free to move the declaration from 
pg_buffercache_pages.h to pg_buffercache_pages.c, and eliminate the 
header altogether)

cheers
Mark


Re: [PATCHES] [HACKERS] contrib/pg_buffercache

2005-03-31 Thread Mark Kirkwood
Andrew Dunstan wrote:
I didn't see the original of the later patch, which is why I sent in 
mine. I honestly don't care that much either way, although I'm inclined 
to agree that the header file is just unnecessary noise.

Well, looks like a strong feeling for no header :-). I didn't really 
think about *not* having one to be honest. So, anyway - instead of 
inviting Neil to spend valuable time eliminating it, I should do it 
myself...

So, please ignore my previous patch to the header file, and consider 
this one - which eliminates it completely.

best wishes
Mark
diff -Nacr pg_buffercache.orig/pg_buffercache_pages.c pg_buffercache/pg_buffercache_pages.c
*** pg_buffercache.orig/pg_buffercache_pages.c	Fri Apr  1 11:25:49 2005
--- pg_buffercache/pg_buffercache_pages.c	Fri Apr  1 11:28:22 2005
***************
*** 12,18 ****
  #include "storage/buf_internals.h"
  #include "storage/bufmgr.h"
  #include "utils/relcache.h"
- #include "pg_buffercache_pages.h"
  
  
  #define NUM_BUFFERCACHE_PAGES_ELEM	6
--- 12,17 ----
***************
*** 21,26 ****
--- 20,27 ----
  extern DLLIMPORT BufferDesc   *BufferDescriptors;
  extern DLLIMPORT volatile uint32  InterruptHoldoffCount;
  #endif
+ 
+ Datum pg_buffercache_pages(PG_FUNCTION_ARGS);
  
  
  /*
diff -Nacr pg_buffercache.orig/pg_buffercache_pages.h pg_buffercache/pg_buffercache_pages.h
*** pg_buffercache.orig/pg_buffercache_pages.h	Fri Apr  1 11:25:49 2005
--- pg_buffercache/pg_buffercache_pages.h	Thu Jan  1 12:00:00 1970
***************
*** 1,18 ****
- /*-------------------------------------------------------------------------
-  *
-  * pg_buffercache_pages.h
-  *	  Prototypes for pg_buffercache_pages
-  *
-  *
-  *	$PostgreSQL: pgsql/contrib/pg_buffercache/pg_buffercache_pages.h,v 1.1 2005/03/12 15:36:24 neilc Exp $
-  *
-  *-------------------------------------------------------------------------
-  */
- 
- 
- #ifndef PG_BUFFERCACHE_PAGES_H
- #define PG_BUFFERCACHE_PAGES_H
- 
- extern Datum pg_buffercache_pages(PG_FUNCTION_ARGS);
- 
- #endif   /* PG_BUFFERCACHE_PAGES_H */
--- 0 ----



Re: [PATCHES] [HACKERS] contrib/pg_buffercache

2005-03-30 Thread Mark Kirkwood
Neil Conway wrote:
Andrew Dunstan wrote:
I have confirmed that the attached patch works on Cygwin as well as 
Windows. Please apply.

Applied, thanks.
 
Great that it fixes it... however, I had submitted a tidier patch that 
puts the macro in the header (probably after asking Andrew to test the 
first one, oops - sorry Andrew). I have tested it on win32 native.

Do you want to back out the first one and use this instead?
regards
Mark
*** pg_buffercache_pages.h.orig Thu Mar 17 10:12:20 2005
--- pg_buffercache_pages.h  Thu Mar 17 13:44:45 2005
***
*** 15,18 
--- 15,24 
  
  extern Datum pg_buffercache_pages(PG_FUNCTION_ARGS);
  
+ /* A little hackery for Windows and Cygwin */
+ #if defined (WIN32) || defined (__CYGWIN__)
+ extern DLLIMPORT BufferDesc   *BufferDescriptors;
+ extern DLLIMPORT volatile uint32  InterruptHoldoffCount;
+ #endif
+ 
  #endif  /* PG_BUFFERCACHE_PAGES_H */




Re: [PATCHES] [HACKERS] WAL: O_DIRECT and multipage-writer

2005-03-22 Thread Mark Wong
On Tue, Jan 25, 2005 at 06:06:23PM +0900, ITAGAKI Takahiro wrote:
 Environment:
   OS : Linux kernel 2.6.9
   CPU: Pentium 4 3GHz
   disk   : ATA 5400rpm (Data and WAL are placed on same partition.)
   memory : 1GB
   config : shared_buffers=1, wal_buffers=256,
XLOG_SEG_SIZE=256MB, checkpoint_segment=4

Hi Itagaki,

In light of this thread, have you compared the performance on
Linux-2.4?

Direct io on block device has performance regression on 2.6.x kernel
http://www.ussg.iu.edu/hypermail/linux/kernel/0503.1/0328.html

Mark



Re: [PATCHES] [HACKERS] contrib/pg_buffercache

2005-03-16 Thread Mark Kirkwood
Andrew Dunstan wrote:
It fixes the build error on Windows - haven't tried because I don't have
time, but I know it won't work on Cygwin, because WIN32 isn't (usually)
defined on Cygwin - see previous almost endless discussions.
Yes - I recall that discussion a while ago.
This patch should sort the issue.
One question: should I be using defined(__MINGW32__) as opposed to
defined(WIN32)? I figured I didn't need to, as in this case it is not
necessary to distinguish between native and Cygwin.
regards
Mark


*** pg_buffercache_pages.h.orig Thu Mar 17 10:12:20 2005
--- pg_buffercache_pages.h  Thu Mar 17 13:44:45 2005
***
*** 15,18 
--- 15,24 
  
  extern Datum pg_buffercache_pages(PG_FUNCTION_ARGS);
  
+ /* A little hackery for Windows and Cygwin */
+ #if defined (WIN32) || defined (__CYGWIN__)
+ extern DLLIMPORT BufferDesc   *BufferDescriptors;
+ extern DLLIMPORT volatile uint32  InterruptHoldoffCount;
+ #endif
+ 
  #endif  /* PG_BUFFERCACHE_PAGES_H */




Re: [PATCHES] Display Pg buffer cache (WIP)

2005-03-10 Thread Mark Kirkwood
A couple of minor amendments here:
- remove link to libpq (from cut+paste of dblink's Makefile)
- add comment for pg_buffercache module in contrib/README
- change my listed email to this one (I have resigned)
regards
Mark
diff -Nacr pgsql.orig/contrib/Makefile pgsql/contrib/Makefile
*** pgsql.orig/contrib/Makefile Thu Mar  3 11:29:53 2005
--- pgsql/contrib/Makefile  Wed Mar  9 10:10:41 2005
***
*** 26,31 
--- 26,32 
noupdate\
oid2name\
pg_autovacuum   \
+   pg_buffercache  \
pg_dumplo   \
pg_trgm \
pgbench \
diff -Nacr pgsql.orig/contrib/README pgsql/contrib/README
*** pgsql.orig/contrib/README   Thu Mar  3 11:29:53 2005
--- pgsql/contrib/READMEFri Mar 11 12:11:18 2005
***
*** 136,141 
--- 136,145 
Automatically performs vacuum
by Matthew T. O'Connor matthew@zeut.net
  
+ pg_buffercache -
+   Real-time queries on the shared buffer cache.
+   by Mark Kirkwood [EMAIL PROTECTED]
+ 
  pg_dumplo -
Dump large objects
by Karel Zak [EMAIL PROTECTED]
diff -Nacr pgsql.orig/contrib/pg_buffercache/Makefile 
pgsql/contrib/pg_buffercache/Makefile
*** pgsql.orig/contrib/pg_buffercache/Makefile  Thu Jan  1 12:00:00 1970
--- pgsql/contrib/pg_buffercache/Makefile   Thu Mar 10 08:19:20 2005
***
*** 0 
--- 1,19 
+ # $PostgreSQL$
+ 
+ MODULE_big = pg_buffercache
+ OBJS  = pg_buffercache_pages.o
+ 
+ DATA_built = pg_buffercache.sql 
+ DOCS = README.pg_buffercache
+ REGRESS = pg_buffercache
+ 
+ 
+ ifdef USE_PGXS
+ PGXS = $(shell pg_config --pgxs)
+ include $(PGXS)
+ else
+ subdir = contrib/pg_buffercache
+ top_builddir = ../..
+ include $(top_builddir)/src/Makefile.global
+ include $(top_srcdir)/contrib/contrib-global.mk
+ endif
diff -Nacr pgsql.orig/contrib/pg_buffercache/README.pg_buffercache 
pgsql/contrib/pg_buffercache/README.pg_buffercache
*** pgsql.orig/contrib/pg_buffercache/README.pg_buffercache Thu Jan  1 
12:00:00 1970
--- pgsql/contrib/pg_buffercache/README.pg_buffercache  Fri Mar 11 12:11:03 2005
***
*** 0 
--- 1,112 
+ Pg_buffercache - Real-time queries on the shared buffer cache.
+ --
+ 
+   This module consists of a C function 'pg_buffercache_pages()' that returns 
+   a set of records, plus a view 'pg_buffercache' to wrap the function.
+ 
+   The intent is to do for the buffer cache what pg_locks does for locks, i.e.
+   the ability to examine what is happening at any given time without having to 
+   restart or rebuild the server with debugging code added.
+ 
+   By default public access is REVOKED from both of these, just in case there
+   are security issues lurking.
+ 
+ 
+ Installation
+ 
+ 
+   Build and install the main PostgreSQL source, then this contrib module:
+ 
+   $ cd contrib/pg_buffercache
+   $ gmake
+   $ gmake install
+ 
+ 
+   To register the functions:
+ 
+   $ psql -d database -f pg_buffercache.sql
+ 
+ 
+ Notes
+ -
+ 
+   The definition of the columns exposed in the view is:
+ 
+    Column         |  references           | Description
+   ----------------+-----------------------+-------------------------------------
+    bufferid       |                       | Id, 1-shared_buffers.
+    relfilenode    | pg_class.relfilenode  | Relfilenode of the relation.
+    reltablespace  | pg_tablespace.oid     | Tablespace oid of the relation.
+    reldatabase    | pg_database.oid       | Database for the relation.
+    relblocknumber |                       | Offset of the page in the relation.
+    isdirty        |                       | Is the page dirty?
+ 
+ 
+   There is one row for each buffer in the shared cache. Unused buffers are
+   shown with all fields null except bufferid.
+ 
+   Because the cache is shared by all the databases, there are pages from
+   relations not belonging to the current database.
+ 
+   When the pg_buffercache view is accessed, internal buffer manager locks are 
+   taken, and a copy of the buffer cache data is made for the view to display. 
+   This ensures that the view produces a consistent set of results, while not 
+   blocking normal buffer activity longer than necessary.  Nonetheless there 
+   could be some impact on database performance if this view is read often.
+ 
+ 
+ Sample output
+ -
+ 
+   regression=# \d pg_buffercache;
+        View "public.pg_buffercache"
+       Column      |  Type   | Modifiers
+   ----------------+---------+-----------
+    bufferid       | integer |
+    relfilenode    | oid     |
+    reltablespace  | oid     |
+    reldatabase    | oid     |
+    relblocknumber | numeric |
+    isdirty        | boolean |
+   View definition:
+SELECT p.bufferid, p.relfilenode, p.reltablespace, p.reldatabase, 
+   p.relblocknumber, p.isdirty
+  FROM pg_buffercache_pages() p(bufferid integer, relfilenode oid

Re: [PATCHES] Display Pg buffer cache (WIP)

2005-03-08 Thread Mark Kirkwood
Tom Lane wrote:
I would rather see this as a contrib module.  There has been *zero*
consensus that we need this in core, nor any discussion about whether
it might be a security hole.
Hmmm, I guess my motivation for thinking it might be useful in core was
that it was similar to 'pg_locks' and 'pg_stats', i.e. exposing some
internals in a convenient queryable manner (useful for problem solving).
On the security front, I should have added a 'REVOKE ALL ... FROM PUBLIC'
on the view to at least control access (fortunately easy to add).
cheers
Mark


Re: [PATCHES] Display Pg buffer cache (WIP)

2005-03-08 Thread Mark Kirkwood
Neil Conway wrote:
Mark Kirkwood wrote:

+ TupleDescInitEntry(tupledesc, (AttrNumber) 5, "relblocknumber",
+ NUMERICOID, -1, 0); 
I think this should be an int4, not numeric.
I was looking for a UINT4OID :-), but numeric seemed the best
compromise (safer than overflowing int4). I guess I could add a type
'blocknumber' that is actually the same as 'oid', but it seems like a
lot of effort for one column.
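
For what it's worth, the overflow concern is easy to see outside the backend:
BlockNumber is an unsigned 32-bit type, so its upper half wraps if forced into
a signed int4, while a 64-bit integer holds it losslessly. A quick standalone
illustration in plain C (not backend code):

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        uint32_t blockno = 4294967294u;         /* near the top of BlockNumber's range */
        int32_t  as_int4 = (int32_t) blockno;   /* wraps (typically to -2) */
        int64_t  as_int8 = (int64_t) blockno;   /* lossless - hence int8 or numeric */

        printf("int4: %d  int8: %lld\n", as_int4, (long long) as_int8);
        return 0;
    }
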
cheers
Mark


Re: [PATCHES] Display Pg buffer cache (WIP)

2005-03-08 Thread Mark Kirkwood
Tom Lane wrote:
One reason for making it contrib is that I don't think you've got it
entirely right yet, and there will be several iterations before it
settles down.  In a contrib module that is no problem, in core it's a
forced initdb each time.
Yeah - certainly less intrusive as a contrib if amendments are required!
Barring a huge groundswell of support for it in core :-) I will resubmit
a patch for it as a contrib module.
cheers
Mark


Re: [PATCHES] Display Pg buffer cache (WIP)

2005-03-08 Thread Mark Kirkwood
Mark Kirkwood wrote:
Tom Lane wrote:
One reason for making it contrib is that I don't think you've got it
entirely right yet, and there will be several iterations before it
settles down.  In a contrib module that is no problem, in core it's a
forced initdb each time.

Yeah - certainly less intrusive as a contrib if amendments are required!
Barring a huge groundswell of support for it in core :-) I will resubmit
a patch for it as a contrib module.
And here it is...
Mark

diff -Nacr pgsql.orig/contrib/Makefile pgsql/contrib/Makefile
*** pgsql.orig/contrib/Makefile Thu Mar  3 11:29:53 2005
--- pgsql/contrib/Makefile  Wed Mar  9 10:10:41 2005
***
*** 26,31 
--- 26,32 
noupdate\
oid2name\
pg_autovacuum   \
+   pg_buffercache  \
pg_dumplo   \
pg_trgm \
pgbench \
diff -Nacr pgsql.orig/contrib/pg_buffercache/Makefile 
pgsql/contrib/pg_buffercache/Makefile
*** pgsql.orig/contrib/pg_buffercache/Makefile  Thu Jan  1 12:00:00 1970
--- pgsql/contrib/pg_buffercache/Makefile   Wed Mar  9 11:42:05 2005
***
*** 0 
--- 1,21 
+ # $PostgreSQL$
+ 
+ MODULE_big = pg_buffercache
+ PG_CPPFLAGS = -I$(libpq_srcdir)
+ OBJS  = pg_buffercache_pages.o
+ SHLIB_LINK = $(libpq)
+ 
+ DATA_built = pg_buffercache.sql 
+ DOCS = README.pg_buffercache
+ REGRESS = pg_buffercache
+ 
+ 
+ ifdef USE_PGXS
+ PGXS = $(shell pg_config --pgxs)
+ include $(PGXS)
+ else
+ subdir = contrib/pg_buffercache
+ top_builddir = ../..
+ include $(top_builddir)/src/Makefile.global
+ include $(top_srcdir)/contrib/contrib-global.mk
+ endif
diff -Nacr pgsql.orig/contrib/pg_buffercache/README.pg_buffercache 
pgsql/contrib/pg_buffercache/README.pg_buffercache
*** pgsql.orig/contrib/pg_buffercache/README.pg_buffercache Thu Jan  1 
12:00:00 1970
--- pgsql/contrib/pg_buffercache/README.pg_buffercache  Wed Mar  9 12:26:56 2005
***
*** 0 
--- 1,112 
+ Pg_buffercache - Real-time queries on the shared buffer cache.
+ --
+ 
+   This module consists of a C function 'pg_buffercache_pages()' that returns 
+   a set of records, plus a view 'pg_buffercache' to wrap the function.
+ 
+   The intent is to do for the buffer cache what pg_locks does for locks, i.e.
+   the ability to examine what is happening at any given time without having to 
+   restart or rebuild the server with debugging code added.
+ 
+   By default public access is REVOKED from both of these, just in case there
+   are security issues lurking.
+ 
+ 
+ Installation
+ 
+ 
+   Build and install the main PostgreSQL source, then this contrib module:
+ 
+   $ cd contrib/pg_buffercache
+   $ gmake
+   $ gmake install
+ 
+ 
+   To register the functions:
+ 
+   $ psql -d database -f pg_buffercache.sql
+ 
+ 
+ Notes
+ -
+ 
+   The definition of the columns exposed in the view is:
+ 
+    Column         |  references           | Description
+   ----------------+-----------------------+-------------------------------------
+    bufferid       |                       | Id, 1-shared_buffers.
+    relfilenode    | pg_class.relfilenode  | Relfilenode of the relation.
+    reltablespace  | pg_tablespace.oid     | Tablespace oid of the relation.
+    reldatabase    | pg_database.oid       | Database for the relation.
+    relblocknumber |                       | Offset of the page in the relation.
+    isdirty        |                       | Is the page dirty?
+ 
+ 
+   There is one row for each buffer in the shared cache. Unused buffers are
+   shown with all fields null except bufferid.
+ 
+   Because the cache is shared by all the databases, there are pages from
+   relations not belonging to the current database.
+ 
+   When the pg_buffercache view is accessed, internal buffer manager locks are 
+   taken, and a copy of the buffer cache data is made for the view to display. 
+   This ensures that the view produces a consistent set of results, while not 
+   blocking normal buffer activity longer than necessary.  Nonetheless there 
+   could be some impact on database performance if this view is read often.
+ 
+ 
+ Sample output
+ -
+ 
+   regression=# \d pg_buffercache;
+        View "public.pg_buffercache"
+       Column      |  Type   | Modifiers
+   ----------------+---------+-----------
+    bufferid       | integer |
+    relfilenode    | oid     |
+    reltablespace  | oid     |
+    reldatabase    | oid     |
+    relblocknumber | numeric |
+    isdirty        | boolean |
+   View definition:
+SELECT p.bufferid, p.relfilenode, p.reltablespace, p.reldatabase, 
+   p.relblocknumber, p.isdirty
+  FROM pg_buffercache_pages() p(bufferid integer, relfilenode oid, 
+  reltablespace oid, reldatabase oid, relblocknumber numeric(10,0), 
+  isdirty boolean);
+ 
+   regression=# SELECT c.relname, count(*) AS buffers
+FROM pg_class c, pg_buffercache b

Re: [PATCHES] Display Pg buffer cache (WIP)

2005-03-07 Thread Mark Kirkwood
The latest iteration.
I have added documentation and updated the expected output so that the
regression tests pass.
In addition, after looking at the various system view names, I decided
that 'pg_cache_dump' does not fit in nicely - so I chose a more
Pg-suitable name of 'pg_buffercache'. Some renaming of the backend
functions happened too. Finally, since I was saving blocknum, it went
into the view as well.
Hopefully I am dealing with invalid buffer tags sensibly now. The
per-buffer spin lock is still being held - although it is obviously
trivial to remove if not actually required.
regards
Mark
P.S.: I remembered to use diff -c
Mark Kirkwood wrote:
Neil Conway wrote:
Tom Lane wrote:
It'd be possible to dispense with the per-buffer spinlocks so long as
you look only at the tag (and perhaps the TAG_VALID flag bit).  The
tags can't be changing while you hold the BufMappingLock.  

That's what I had thought at first, but this comment in 
buf_internals.h dissuaded me: "buf_hdr_lock must be held to examine or 
change the tag, flags, usage_count, refcount, or wait_backend_id 
fields". The comment already notes this isn't true if you've got the 
buffer pinned; it would be worth adding another exception for holding 
the BufMappingLock, IMHO.

I'm dubious that there's any point in recording information as
transient as the refcounts and dirtybits

I think it's worth recording dirty bits -- it provides an indication 
of the effectiveness of the bgwriter, for example. Reference counts 
could be done away with, although I doubt it would have a significant 
effect on the time spent holding the lock.


Let's suppose refcount is eliminated. I will then be examining the tag,
flags and buf_id elements of the buffer. Holding the BufMappingLock
prevents the tag changing, but what about the flags?
In addition Tom pointed out that I am not examining the BM_TAG_VALID or
BM_VALID flag bits (I am only checking if tag.blockNum equals
InvalidBlockNumber). My initial thought is to handle !BM_TAG_VALID or
!BM_VALID similarly to InvalidBlockNumber, i.e. all non-buf_id fields set
to NULL.
Mark



diff -Nacr pgsql.orig/doc/src/sgml/catalogs.sgml pgsql/doc/src/sgml/catalogs.sgml
*** pgsql.orig/doc/src/sgml/catalogs.sgml	Mon Mar  7 12:20:17 2005
--- pgsql/doc/src/sgml/catalogs.sgml	Tue Mar  8 12:03:50 2005
***************
*** 3875,3880 ****
--- 3875,3885 ----
  
  <tbody>
   <row>
+   <entry><link linkend="view-pg-buffercache"><structname>pg_buffercache</structname></link></entry>
+   <entry>shared buffer cache</entry>
+  </row>
+ 
+  <row>
    <entry><link linkend="view-pg-indexes"><structname>pg_indexes</structname></link></entry>
    <entry>indexes</entry>
   </row>
***************
*** 3917,3922 ****
--- 3922,4021 ----
   </tbody>
  </tgroup>
 </table>
+ </sect1>
+ 
+ <sect1 id="view-pg-buffercache">
+  <title><structname>pg_buffercache</structname></title>
+ 
+  <indexterm zone="view-pg-buffercache">
+   <primary>pg_buffercache</primary>
+  </indexterm>
+  <para>
+   The view <structname>pg_buffercache</structname> provides access to
+   some information from the shared buffer cache.
+  </para>
+ 
+  <para>
+   There is one row for each buffer in the shared cache. Unused buffers are
+   shown with all fields null except <structfield>bufferid</structfield>.
+   Because the cache is shared by all the databases, there are pages from
+   relations not belonging to the current database.
+  </para>
+ 
+ 
+  <table>
+   <title><structname>pg_buffercache</structname> Columns</title>
+ 
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry>bufferid</entry>
+      <entry><type>integer</type></entry>
+      <entry></entry>
+      <entry>
+       The buffer number. This is numbered 1 to
+       <varname>shared_buffers</varname>.
+      </entry>
+     </row>
+     <row>
+      <entry>relfilenode</entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.relfilenode</literal></entry>
+      <entry>
+       The on-disk file for the relation that this page came from.
+      </entry>
+     </row>
+     <row>
+      <entry>reltablespace</entry>
+      <entry><type>oid</type></entry>
+      <entry>
+       <literal><link linkend="catalog-pg-tablespace"><structname>pg_tablespace</structname></link>.oid</literal>
+      </entry>
+      <entry>Tablespace the corresponding relation is in.</entry>
+     </row>
+     <row>
+      <entry>reldatabase</entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+      <entry>
+       Database the corresponding relation belongs to, or zero if the
+       relation is a globally-shared table
+      </entry>
+     </row>

Re: [PATCHES] Display Pg buffer cache (WIP)

2005-03-06 Thread Mark Kirkwood
Neil Conway wrote:
Tom Lane wrote:
It'd be possible to dispense with the per-buffer spinlocks so long as
you look only at the tag (and perhaps the TAG_VALID flag bit).  The
tags can't be changing while you hold the BufMappingLock.  

That's what I had thought at first, but this comment in buf_internals.h 
dissuaded me: "buf_hdr_lock must be held to examine or change the tag, 
flags, usage_count, refcount, or wait_backend_id fields". The comment 
already notes this isn't true if you've got the buffer pinned; it would 
be worth adding another exception for holding the BufMappingLock, IMHO.

I'm dubious that there's any point in recording information as
transient as the refcounts and dirtybits

I think it's worth recording dirty bits -- it provides an indication of 
the effectiveness of the bgwriter, for example. Reference counts could 
be done away with, although I doubt it would have a significant effect 
on the time spent holding the lock.


Let's suppose refcount is eliminated. I will then be examining the tag,
flags and buf_id elements of the buffer. Holding the BufMappingLock
prevents the tag changing, but what about the flags?
In addition Tom pointed out that I am not examining the BM_TAG_VALID or
BM_VALID flag bits (I am only checking if tag.blockNum equals
InvalidBlockNumber). My initial thought is to handle !BM_TAG_VALID or
!BM_VALID similarly to InvalidBlockNumber, i.e. all non-buf_id fields set
to NULL.
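
A minimal sketch of how that could look, assuming the buf_hdr_lock spinlock
described in the comment quoted above (the helper and its record struct are
illustrative, not part of the patch):

    /* Illustrative record; the real patch uses CacheDumpRec. */
    typedef struct
    {
        RelFileNode rnode;
        bool        isdirty;
        bool        isvalid;    /* false => show all non-buf_id fields as NULL */
    } BufSnap;

    /* Copy one header's fields while holding its spinlock. */
    static void
    examine_buffer(volatile BufferDesc *bufHdr, BufSnap *snap)
    {
        SpinLockAcquire(&bufHdr->buf_hdr_lock);
        if (!(bufHdr->flags & BM_TAG_VALID) ||
            bufHdr->tag.blockNum == InvalidBlockNumber)
            snap->isvalid = false;      /* unused or half-initialised buffer */
        else
        {
            snap->rnode = bufHdr->tag.rnode;
            snap->isdirty = (bufHdr->flags & BM_DIRTY) != 0;
            snap->isvalid = true;
        }
        SpinLockRelease(&bufHdr->buf_hdr_lock);
    }
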
Mark



Re: [PATCHES] Display Pg buffer cache (WIP)

2005-03-05 Thread Mark Kirkwood
Neil Conway wrote:
If you do decide to hold the BufMappingLock, it might make
sense to:
1. allocate an array of NBuffers elements
2. acquire BufferMappingLock in share mode
3. sequentially scan through the buffer pool, copying data into the array
4. release the lock
5. on each subsequent call to the SRF, format and return an element of 
the array

Which should reduce the time the lock is held. This will require 
allocating NBuffers * size_of_stats memory (where size_of_stats will be 
something like 16 bytes).
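
A minimal sketch of how those numbered steps translate into code (the lock and
global names follow the discussion above; the helper itself is illustrative,
not the actual patch):

    /* Roughly the 16-byte record estimated above. */
    typedef struct
    {
        uint32      bufferid;
        RelFileNode rnode;
        bool        isdirty;
    } BufSnapRec;

    static BufSnapRec *
    snapshot_buffer_pool(MemoryContext cxt)
    {
        /* 1. allocate an array of NBuffers elements in the SRF's context */
        BufSnapRec *snap = (BufSnapRec *)
            MemoryContextAlloc(cxt, NBuffers * sizeof(BufSnapRec));
        int         i;

        /* 2. acquire BufMappingLock in share mode */
        LWLockAcquire(BufMappingLock, LW_SHARED);

        /* 3. sequentially scan the buffer pool, copying into the array */
        for (i = 0; i < NBuffers; i++)
        {
            BufferDesc *bufHdr = &BufferDescriptors[i];

            snap[i].bufferid = i + 1;
            snap[i].rnode = bufHdr->tag.rnode;
            snap[i].isdirty = (bufHdr->flags & BM_DIRTY) != 0;
        }

        /* 4. release the lock; step 5 then formats snap[call_cntr] per call */
        LWLockRelease(BufMappingLock);
        return snap;
    }
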

That is a better approach, so I've used it in this new iteration.
In addition to holding the BufMappingLock, each buffer header is (spin)
locked before examining it; hopefully this is correct - BTW, I like the
new buffer lock design.
I'm still using BuildTupleFromCStrings, so there is considerable use of
sprintf conversion and temporary char * stuff. I would like this to be
a bit cleaner, so any suggestions welcome.
regards
Mark

diff -Naur pgsql.orig/src/backend/catalog/system_views.sql 
pgsql/src/backend/catalog/system_views.sql
--- pgsql.orig/src/backend/catalog/system_views.sql Fri Mar  4 14:23:09 2005
+++ pgsql/src/backend/catalog/system_views.sql  Fri Mar  4 14:21:33 2005
@@ -277,3 +277,9 @@
 DO INSTEAD NOTHING;
 
 GRANT SELECT, UPDATE ON pg_settings TO PUBLIC;
+
+CREATE VIEW pg_cache_dump AS
+   SELECT D.* FROM pg_cache_dump() AS D
+   (bufferid integer, relfilenode oid, reltablespace oid, reldatabase oid, 
+    isdirty bool, refcount int4);
+
diff -Naur pgsql.orig/src/backend/utils/adt/cachedump.c 
pgsql/src/backend/utils/adt/cachedump.c
--- pgsql.orig/src/backend/utils/adt/cachedump.cThu Jan  1 12:00:00 1970
+++ pgsql/src/backend/utils/adt/cachedump.c Sat Mar  5 20:21:45 2005
@@ -0,0 +1,221 @@
+/*-
+ *
+ * cachedump.c
+ *display some contents of the buffer cache
+ *
+ * Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *   $PostgreSQL$
+ *-
+ */
+#include "postgres.h"
+#include "funcapi.h"
+#include "catalog/pg_type.h"
+#include "storage/buf_internals.h"
+#include "storage/bufmgr.h"
+#include "utils/relcache.h"
+#include "utils/builtins.h"
+
+
+#define NUM_CACHE_DUMP_ELEM	6
+
+/*
+ * Record structure holding the to be exposed cache data.
+ */
+typedef struct
+{
+   uint32  bufferid;
+   Oid relfilenode;
+   Oid reltablespace;
+   Oid reldatabase;
+   boolisdirty;
+   uint32  refcount;
+   BlockNumber blocknum;
+
+} CacheDumpRec;
+
+
+/*
+ * Function context for data persisting over repeated calls.
+ */
+typedef struct 
+{
+   AttInMetadata   *attinmeta;
+   CacheDumpRec*record;
+   char*values[NUM_CACHE_DUMP_ELEM];
+} CacheDumpContext;
+
+
+/*
+ * Function returning data from the shared buffer cache - buffer number,
+ * relation node/tablespace/database, dirty indicator and refcount.
+ */
+Datum
+cache_dump(PG_FUNCTION_ARGS)
+{
+   FuncCallContext *funcctx;
+   Datum   result;
+   MemoryContext   oldcontext;
+   CacheDumpContext*fctx;  /* User function context. */
+   TupleDesc   tupledesc;
+   HeapTuple   tuple;
+
+   if (SRF_IS_FIRSTCALL())
+   {
+   RelFileNode rnode;
+   uint32  i;
+   BufferDesc  *bufHdr;
+
+
+   funcctx = SRF_FIRSTCALL_INIT();
+
+		/* Switch context when allocating stuff to be used in later calls */
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* construct a tuple to return */
+		tupledesc = CreateTemplateTupleDesc(NUM_CACHE_DUMP_ELEM, false);
+		TupleDescInitEntry(tupledesc, (AttrNumber) 1, "bufferid",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupledesc, (AttrNumber) 2, "relfilenode",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupledesc, (AttrNumber) 3, "reltablespace",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupledesc, (AttrNumber) 4, "reldatabase",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupledesc, (AttrNumber) 5, "isdirty",
+						   BOOLOID, -1, 0);
+		TupleDescInitEntry(tupledesc, (AttrNumber) 6, "refcount"

Re: [PATCHES] Display Pg buffer cache (WIP)

2005-03-04 Thread Mark Kirkwood
Tom Lane wrote:
Mark Kirkwood [EMAIL PROTECTED] writes:
I am using anoncvs from yesterday, so if Tom's new scheme is
*very* new I may be missing it.

It's not committed yet ;-)
Yep - that's pretty new, apologies for slow grey matter... I have been 
following the discussion about the new buffer design, but didn't think 
to connect it to this context!

cheers
Mark


Re: [PATCHES] Display Pg buffer cache (WIP)

2005-03-03 Thread Mark Kirkwood
Neil Conway wrote:
I don't like accessing shared data structures without acquiring the 
appropriate locks; even if it doesn't break anything, it seems like just 
asking for trouble. In order to be able to inspect a buffer's tag in 
Tom's new locking scheme (not yet in HEAD, but will be in 8.1), you need 
only hold a per-buffer lock. That will mean acquiring and releasing a 
spinlock for each buffer, which isn't _too_ bad...


I am looking at this bit now.
I take the first part to mean that I don't need to wrap
LWLockAcquire(BufMgrLock, LW_EXCLUSIVE) ... LWLockRelease(BufMgrLock)
around the inspection of the buffers (?)
This per-buffer lock, are we talking about LockBuffer(buf,
BUFFER_LOCK_SHARE) ... ReleaseBuffer(buf)?
If so, it looks like I need to do some stuff with ResourceOwners,
otherwise ReleaseBuffer will fail (or am I wandering up the wrong track
here?). I am using anoncvs from yesterday, so if Tom's new scheme is
*very* new I may be missing it.
Thanks
Mark


Re: [PATCHES] WIP: buffer manager rewrite (take 2)

2005-03-02 Thread Mark Wong
On Wed, Feb 16, 2005 at 07:50:28PM -0500, Tom Lane wrote:
 Second iteration of buffer manager rewrite.  This uses the idea of a
 usage counter instead of just a recently-used flag bit.  I allowed the
 counter to go up to 5, but some playing around with that value would
 be interesting.  (Tweak BM_MAX_USAGE_COUNT in
 src/include/storage/buf_internals.h, then recompile the files in
 src/backend/storage/buffer/.)  Also there are more GUC variables now
 for controlling the bgwriter.
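
For anyone following along, the usage-counter idea is a classic clock sweep; a
minimal single-threaded sketch (no pinning or locking, purely illustrative):

    #define BM_MAX_USAGE_COUNT 5    /* the tweakable ceiling mentioned above */

    /* On each buffer access: give the buffer more "lives", up to the ceiling. */
    static void
    buffer_hit(unsigned char *usage_count, int buf)
    {
        if (usage_count[buf] < BM_MAX_USAGE_COUNT)
            usage_count[buf]++;
    }

    /* Victim search: advance the clock hand, decrementing counts until one
     * reaches zero; recently-hit buffers survive several sweeps. */
    static int
    clock_sweep(unsigned char *usage_count, int nbuffers, int *hand)
    {
        for (;;)
        {
            int     victim = *hand;

            *hand = (*hand + 1) % nbuffers;
            if (usage_count[victim] == 0)
                return victim;
            usage_count[victim]--;
        }
    }
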
 

I see a huge performance increase when applied to CVS from 20050301.

Baseline against 8.0.1:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/309/
throughput 3639.97

CVS from 20050301:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/314/
throughput 5483.01
I only ran this for 30 minutes, as opposed to 60, but it looks
promising.

So about a 50% increase in throughput for my test.  Not too shabby. ;)

Mark



Re: [PATCHES] WIP: buffer manager rewrite (take 2)

2005-03-02 Thread Mark Wong
On Wed, Mar 02, 2005 at 08:48:35PM -0500, Tom Lane wrote:
 Mark Wong [EMAIL PROTECTED] writes:
  CVS from 20050301 [ plus clock-sweep buffer manager ] :
  http://www.osdl.org/projects/dbt2dev/results/dev4-010/314/
  throughput 5483.01
  I only ran this for 30 minutes, as opposed to 60, but it looks
  promising.
 
  So about a 50% increase in throughput for my test.  Not to shabby. ;)
 
 Sweet ... and the response-time improvement is just stunning.  I think
 we may have a winner.  Is there a reason you didn't run the test longer
 though?

I would normally run for 1 hour, but I guess my fingers were thinking
something different at the time. =P

Mark



[PATCHES] Display Pg buffer cache (WIP)

2005-03-02 Thread Mark Kirkwood
I found it useful to be able to look at what relations are occupying the
cache (mainly for debugging and comparing with page estimates).
I am wondering if it might be a useful addition generally?
How it works: an SRF called dump_cache() is created, which is then exposed
as the system view pg_dump_cache. So stuff like the following can be
performed (cache of 2000 buffers immediately after 'make installcheck'):
regression=# SELECT c.relname, count(*) AS buffers
regression-# FROM pg_class c, pg_dump_cache d
regression-# WHERE d.relfilenode = c.relfilenode
regression-# GROUP BY c.relname
regression-# ORDER BY 2 DESC LIMIT 10;
             relname             | buffers
---------------------------------+---------
 tenk1                           |     345
 tenk2                           |     345
 onek                            |     138
 pg_attribute                    |     102
 road                            |      81
 pg_attribute_relid_attnam_index |      74
 onek2                           |      69
 pg_proc                         |      59
 pg_proc_proname_args_nsp_index  |      56
 hash_f8_heap                    |      49
(10 rows)
regression=#
As of now the patch breaks 1 regression test (rules - that new view...)
and does not peek into the contents of the buffers themselves (I was
mainly interested in which relations were there), and does not worry too
much about a consistent view of the buffer contents (as I didn't want it
to block all other activity!).
If it seems like a worthwhile addition I will amend the regression
expected and resubmit.
Any comments?
Mark
diff -Naur pgsql.orig/src/backend/catalog/system_views.sql 
pgsql/src/backend/catalog/system_views.sql
--- pgsql.orig/src/backend/catalog/system_views.sql Thu Mar  3 11:29:55 2005
+++ pgsql/src/backend/catalog/system_views.sql  Thu Mar  3 11:41:24 2005
@@ -277,3 +277,8 @@
 DO INSTEAD NOTHING;
 
 GRANT SELECT, UPDATE ON pg_settings TO PUBLIC;
+
+CREATE VIEW pg_dump_cache AS
+   SELECT D.* FROM pg_dump_cache() AS D
+   (bufferid integer, relfilenode oid, reltablespace oid, reldatabase oid);
+
diff -Naur pgsql.orig/src/backend/utils/adt/Makefile 
pgsql/src/backend/utils/adt/Makefile
--- pgsql.orig/src/backend/utils/adt/Makefile   Thu Mar  3 11:29:53 2005
+++ pgsql/src/backend/utils/adt/MakefileThu Mar  3 11:32:10 2005
@@ -24,7 +24,7 @@
tid.o timestamp.o varbit.o varchar.o varlena.o version.o xid.o \
network.o mac.o inet_net_ntop.o inet_net_pton.o \
ri_triggers.o pg_lzcompress.o pg_locale.o formatting.o \
-   ascii.o quote.o pgstatfuncs.o encode.o
+   ascii.o quote.o pgstatfuncs.o encode.o dumpcache.o
 
 like.o: like.c like_match.c
 
diff -Naur pgsql.orig/src/backend/utils/adt/dumpcache.c 
pgsql/src/backend/utils/adt/dumpcache.c
--- pgsql.orig/src/backend/utils/adt/dumpcache.cThu Jan  1 12:00:00 1970
+++ pgsql/src/backend/utils/adt/dumpcache.c Thu Mar  3 11:51:53 2005
@@ -0,0 +1,119 @@
+/*-
+ *
+ * dumpcache.c
+ *	  display some contents of the buffer cache
+ *
+ *-
+ */
+#include "postgres.h"
+#include "funcapi.h"
+#include "catalog/pg_type.h"
+#include "storage/buf_internals.h"
+#include "storage/bufmgr.h"
+#include "utils/relcache.h"
+
+
+extern Datum dump_cache(PG_FUNCTION_ARGS);
+
+
+/*
+ * Function context for data persisting over repeated calls.
+ */
+typedef struct {
+   int buffer;
+   AttInMetadata   *attinmeta;
+   BufferDesc  *bufhdr;
+   RelFileNode rnode;
+   char*values[3];
+} dumpcache_fctx;
+
+
+/*
+ * Return a tuple that has bufferid, relfilenoide, reltablespace and 
+ * reldatabase OIDs.
+ */
+Datum
+dump_cache(PG_FUNCTION_ARGS)
+{
+   FuncCallContext *funcctx;
+   Datum   result;
+   MemoryContext   oldcontext;
+   dumpcache_fctx  *fctx;  /* User function context. */
+   TupleDesc   tupledesc;
+   HeapTuple   tuple;
+
+   if (SRF_IS_FIRSTCALL())
+   {
+   funcctx = SRF_FIRSTCALL_INIT();
+
+		/* Switch context when allocating stuff to be used in later calls */
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		/* construct a tuple to return */
+		tupledesc = CreateTemplateTupleDesc(4, false);
+		TupleDescInitEntry(tupledesc, (AttrNumber) 1, "bufferid",
+						   INT4OID, -1, 0);
+		TupleDescInitEntry(tupledesc, (AttrNumber) 2, "relfilenode",
+						   OIDOID, -1, 0);
+		TupleDescInitEntry(tupledesc, (AttrNumber) 3, "reltablespace"

Re: [PATCHES] [HACKERS] WAL: O_DIRECT and multipage-writer (+ memory leak)

2005-03-01 Thread Mark Wong
On Thu, Feb 03, 2005 at 07:25:55PM +0900, ITAGAKI Takahiro wrote:
 Hello everyone.
 
 I fixed two bugs in the patch that I sent before.
 Check and test new one, please.

Ok, finally got back into the office and was able to run 1 set of
tests.

So the new baseline result with 8.0.1:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/309/
Throughput: 3639.97

Results with the patch but open_direct not set:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/308/
Throughput: 3494.72

Results with the patch and open_direct set:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/312/
Throughput: 3489.69

You can verify that the wal_sync_method is set to open_direct under
the database parameters link, but I'm wondering if I missed
something.  It looks a little odd that the performance dropped.
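
One thing worth double-checking with O_DIRECT: every transfer must be aligned
(buffer address, length, and file offset), which pushes extra work onto the
caller and can eat the expected gain. A minimal standalone sketch of an
aligned direct write on Linux ('testwal' is a scratch file, and 512 is an
assumed alignment - the real requirement depends on the device and
filesystem):

    #define _GNU_SOURCE             /* exposes O_DIRECT on Linux */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
        void   *buf;
        int     fd = open("testwal", O_WRONLY | O_CREAT | O_DIRECT, 0600);

        if (fd < 0)
            return 1;
        if (posix_memalign(&buf, 512, 8192) != 0)   /* aligned buffer required */
            return 1;
        memset(buf, 0, 8192);
        if (write(fd, buf, 8192) != 8192)           /* fails with EINVAL if misaligned */
            return 1;
        close(fd);
        free(buf);
        return 0;
    }
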

Mark



Re: [PATCHES] WIP: bufmgr rewrite per recent discussions

2005-02-22 Thread Mark Cave-Ayland

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane
 Sent: 17 February 2005 15:46
 To: Mark Cave-Ayland
 Cc: pgsql-patches@postgresql.org
 Subject: Re: [PATCHES] WIP: bufmgr rewrite per recent discussions

(cut)

  3. Pad the LWLock struct (in
  src/backend/storage/lmgr/lwlock.c) to some power of 2 up to 
  128 bytes --- same issue of space wasted versus cross-lock 
 contention.
 
  Having seen the results above, is it still worth looking at this?
 
 Yeah, probably, because there are other possible contention 
 sources besides buffers that might be alleviated by padding 
 LWLocks.  For instance the buffer manager global locks and 
 the LockMgrLock are probably all in the same cache line at the moment.

Hi Tom,

Here are the results from the LWLock test. Firstly here are the results with
your second patch with no modifications as a refresher:


PATCH #2 No modifications


10001
10  
204.909702  205.01051   345.098727  345.411606  375.812059  376.37741
195.100496  195.197463  348.791481  349.111363  314.718619  315.139878
199.637965  199.735195  313.561366  313.803225  365.061177  365.666103
195.935529  196.029082  325.893744  326.171754  370.040623  370.625072
196.661374  196.756481  314.468751  314.711517  319.643145  320.099164

Mean:
198.4490132 198.5457462 329.5628138 329.841893  349.0551246 349.5815254


Here are the results with ALIGNOF_BUFFER=128 and padding LWLock to 64 bytes:


PATCH #2 with ALIGNOF_BUFFER = 128 and LWLock padded to 64 bytes

10001
10  
199.672932  199.768756  307.051571  307.299088  367.394745  368.016266
196.443585  196.532912  344.898219  345.204228  375.300921  375.979186
191.098411  191.185807  329.485633  329.77679   305.413304  305.841889
201.110132  201.210049  314.219784  314.476356  314.03306   314.477869
196.615748  196.706032  337.315295  337.62437   370.537538  371.16593

Mean:
196.9881616 197.0807112 326.5941004 326.8761664 346.5359136 347.096228


And finally here are the results with ALIGNOF_BUFFER = 128 and LWLock padded
to 128 bytes:


PATCH #2 with ALIGNOF_BUFFER = 128 and LWLock padded to 128 bytes

10001
10  
195.357405  195.449704  346.916069  347.235895  373.354775  373.934842
190.428061  190.515077  323.932436  324.211975  361.908206  362.476886
206.059573  206.159472  338.288825  338.590642  306.22198   306.618689
195.336711  195.427757  309.316534  309.56603   305.295391  305.695336
188.896205  188.983969  322.889651  323.245394  377.673313  378.269907

Mean:
195.215591  195.3071958 328.268703  328.5699872 344.890733  345.399132


So again I don't see any performance improvement. However, I did manage to
find out what was causing the 'freezing' I mentioned in my earlier email. By
temporarily setting fsync = false in postgresql.conf, the freezing goes away,
so I'm guessing it's something to do with disk/kernel caches and buffering.
Since the drives are software RAID 1 with ext3, I guess the server is
running I/O bound under load, which is perhaps why padding the data
structures doesn't seem to make much difference. I'm not sure whether this
makes the test results particularly useful though :(
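
For reference, the padding Tom suggested is usually done with a union so the
compiler guarantees the stride; a sketch under the assumption of a 64-byte
cache line (the stand-in struct below is simplified, not the real LWLock):

    /* Simplified stand-in for the struct in src/backend/storage/lmgr/lwlock.c. */
    typedef struct LWLock
    {
        int     mutex;          /* spinlock protecting the fields below */
        char    exclusive;      /* # of exclusive holders (0 or 1) */
        int     shared;         /* # of shared holders */
    } LWLock;

    /* Pad each lock to a full cache line so two hot locks never share one
     * line and ping-pong between CPUs; sizeof(LWLockPadded) == 64. */
    typedef union LWLockPadded
    {
        LWLock  lock;
        char    pad[64];
    } LWLockPadded;
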


Kind regards,

Mark.


WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT 

T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk




