Re: [PATCHES] External Sort performance patch

2006-02-19 Thread Simon Riggs
On Sun, 2006-02-19 at 01:16 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  Following patch implements matching sort cost calculations in the
  planner in sort_cost()
 
 As given, this didn't even compile.  Cleaned up and applied.

Well given it was a patch-on-patch, I guess I did cause you more
difficulty than was necessary. I'll submit combined patches only in
future. If this truly didn't compile, then I must have submitted an
earlier version in error when transferring the patch between machines.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] [HACKERS] Patch Submission Guidelines

2006-02-19 Thread Simon Riggs
On Thu, 2006-02-16 at 15:09 -0500, Robert Treat wrote:
 On Thursday 16 February 2006 00:27, Tom Lane wrote:
  Robert Treat [EMAIL PROTECTED] writes:
   As stated, the following patch adds a list of patch submission guidelines
   based on Simon Riggs suggestions to the developers FAQ.
 
  A couple minor comments ...
 
 
 Attached patch updated based on previous feedback.

Looks like a useful addition to me and I'll do my best to follow it.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] implement prepared queries in plperl

2006-02-19 Thread Andrew Dunstan



Dmitry Karasik wrote:

[patch snipped]

I have cleaned this patch somewhat by removing some bitrot that occurred 
since it was submitted, and adjusting formatting to something more 
closely resembling postgresql style (please remember to follow our style 
in future).


The attached works on HEAD and passes the supplied regression tests.

But why do we have to call spi_freeplan? pltcl, which has prepared 
queries, doesn't require this AFAICS. If memory leaks are an issue, 
maybe we should bless the object into a class with a DESTROY method that 
calls spi_freeplan automatically (not sure to do that in XS but I assume 
it's possible).


cheers

andrew


Index: SPI.xs
===
RCS file: /cvsroot/pgsql/src/pl/plperl/SPI.xs,v
retrieving revision 1.18
diff -c -r1.18 SPI.xs
*** SPI.xs	8 Jan 2006 22:27:52 -	1.18
--- SPI.xs	19 Feb 2006 16:17:40 -
***
*** 111,117 
  		int limit = 0;
  	CODE:
  		if (items  2)
! 			croak(Usage: spi_exec_query(query, limit) or spi_exec_query(query));
  		if (items == 2)
  			limit = SvIV(ST(1));
  		ret_hash = plperl_spi_exec(query, limit);
--- 111,118 
  		int limit = 0;
  	CODE:
  		if (items  2)
! 			croak(Usage: spi_exec_query(query, limit) 
!   or spi_exec_query(query));
  		if (items == 2)
  			limit = SvIV(ST(1));
  		ret_hash = plperl_spi_exec(query, limit);
***
*** 141,145 
--- 142,225 
  	OUTPUT:
  		RETVAL
  
+ SV*
+ spi_spi_prepare(query, ...)
+ 	char* query;
+ 	CODE:
+ 		int i;
+ 		SV** argv;
+ 		if (items  1) 
+ 			Perl_croak(aTHX_ Usage: spi_prepare(query, ...));
+ 		argv = ( SV**) palloc(( items - 1) * sizeof(SV*));
+ 		if ( argv == NULL) 
+ 			Perl_croak(aTHX_ spi_prepare: not enough memory);
+ 		for ( i = 1; i  items; i++) 
+ 			argv[i - 1] = ST(i);
+ 		RETVAL = plperl_spi_prepare(query, items - 1, argv);
+ 		pfree( argv);
+ 	OUTPUT:
+ 		RETVAL
+ 
+ SV*
+ spi_spi_exec_prepared(query, ...)
+ 	char * query;
+ 	PREINIT:
+ 		HV *ret_hash;
+ 	CODE:
+ 		HV *attr = NULL;
+ 		int i, offset = 1, argc;
+ 		SV ** argv;
+ 		if ( items  1) 
+ 			Perl_croak(aTHX_ Usage: spi_exec_prepared(query, [\\%%attr,]  
+ 	   [EMAIL PROTECTED]));
+ 		if ( items  1  SvROK( ST( 1))  SvTYPE( SvRV( ST( 1))) == SVt_PVHV)
+ 		{ 
+ 			attr = ( HV*) SvRV(ST(1));
+ 			offset++;
+ 		}
+ 		argc = items - offset;
+ 		argv = ( SV**) palloc( argc * sizeof(SV*));
+ 		if ( argv == NULL) 
+ 			Perl_croak(aTHX_ spi_exec_prepared: not enough memory);
+ 		for ( i = 0; offset  items; offset++, i++) 
+ 			argv[i] = ST(offset);
+ 		ret_hash = plperl_spi_exec_prepared(query, attr, argc, argv);
+ 		RETVAL = newRV_noinc((SV*)ret_hash);
+ 		pfree( argv);
+ 	OUTPUT:
+ 		RETVAL
+ 
+ SV*
+ spi_spi_query_prepared(query, ...)
+ 	char * query;
+ 	CODE:
+ 		int i;
+ 		SV ** argv;
+ 		if ( items  1) 
+ 			Perl_croak(aTHX_ Usage: spi_query_prepared(query, 
+ 	   [EMAIL PROTECTED]));
+ 		argv = ( SV**) palloc(( items - 1) * sizeof(SV*));
+ 		if ( argv == NULL) 
+ 			Perl_croak(aTHX_ spi_query_prepared: not enough memory);
+ 		for ( i = 1; i  items; i++) 
+ 			argv[i - 1] = ST(i);
+ 		RETVAL = plperl_spi_query_prepared(query, items - 1, argv);
+ 		pfree( argv);
+ 	OUTPUT:
+ 		RETVAL
+ 
+ void
+ spi_spi_freeplan(query)
+ 	char *query;
+ 	CODE:
+ 		plperl_spi_freeplan(query);
+ 
+ void
+ spi_spi_cursor_close(cursor)
+ 	char *cursor;
+ 	CODE:
+ 		plperl_spi_cursor_close(cursor);
+ 
+ 
  BOOT:
  items = 0;  /* avoid 'unused variable' warning */
Index: plperl.c
===
RCS file: /cvsroot/pgsql/src/pl/plperl/plperl.c,v
retrieving revision 1.101
diff -c -r1.101 plperl.c
*** plperl.c	28 Jan 2006 16:20:31 -	1.101
--- plperl.c	19 Feb 2006 16:17:41 -
***
*** 56,61 
--- 56,62 
  #include utils/typcache.h
  #include miscadmin.h
  #include mb/pg_wchar.h
+ #include parser/parse_type.h
  
  /* define this before the perl headers get a chance to mangle DLLIMPORT */
  extern DLLIMPORT bool check_function_bodies;
***
*** 99,104 
--- 100,117 
  	MemoryContext	  tmp_cxt;
  } plperl_call_data;
  
+ /**
+  * The information we cache about prepared and saved plans
+  **/
+ typedef struct plperl_query_desc
+ {
+ 	char		qname[sizeof(long) * 2 + 1];
+ 	void	   *plan;
+ 	int			nargs;
+ 	Oid		   *argtypes;
+ 	FmgrInfo   *arginfuncs;
+ 	Oid		   *argtypioparams;
+ } plperl_query_desc;
  
  /**
   * Global data
***
*** 107,112 
--- 120,126 
  static bool plperl_safe_init_done = false;
  static PerlInterpreter *plperl_interp = NULL;
  static HV  *plperl_proc_hash = NULL;
+ static HV  *plperl_query_hash = NULL;
  
  static bool plperl_use_strict = false;
  
***
*** 233,239 
  	

Re: [PATCHES] patch fixing the old RETURN NEXT bug

2006-02-19 Thread Neil Conway
On Sun, 2006-02-12 at 20:15 +0300, Sergey E. Koposov wrote:
 I'm proposing the fix of this bug:
 http://archives.postgresql.org/pgsql-hackers/2005-02/msg00498.php

I think the suggested logic for compatible_tupdesc() is still wrong. For
example, the patch rejects the following:

create table usno (ra real, dec real, bmag real, rmag real, ipix int8);
create function ret_next_check() returns setof usno as $$
declare
r record;
begin
for r in select * from usno loop
return next r;
end loop;
return;
end;
$$ language plpgsql;

insert into usno values (1.0, 2.0, 3.0, 4.0, 5);
select * from ret_next_check();
alter table usno drop column ipix;
select * from ret_next_check(); -- fails, should succeed

Also, this patch should include updates to the regression tests.

-Neil



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PATCHES] ScanDirections

2006-02-19 Thread James William Pye
Small patch that makes ScanDirection a char(f|b|n) instead of a int or long or
whatever enum makes it. I tried to track down all the areas that depend on it
being specifically -1, 0, or 1. heapam appeared to be the biggest/only one.
(At least `gmake check` passed.)
-- 
Regards, James William Pye
Index: src/backend/access/gist/gistget.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/gist/gistget.c,v
retrieving revision 1.55
diff -c -r1.55 gistget.c
*** src/backend/access/gist/gistget.c   14 Jan 2006 22:03:35 -  1.55
--- src/backend/access/gist/gistget.c   20 Feb 2006 00:41:03 -
***
*** 96,102 
  gistgettuple(PG_FUNCTION_ARGS)
  {
IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
!   ScanDirection dir = (ScanDirection) PG_GETARG_INT32(1);
GISTScanOpaque so;
ItemPointerData tid;
boolres;
--- 96,102 
  gistgettuple(PG_FUNCTION_ARGS)
  {
IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
!   ScanDirection dir = (ScanDirection) PG_GETARG_CHAR(1);
GISTScanOpaque so;
ItemPointerData tid;
boolres;
Index: src/backend/access/hash/hash.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/hash/hash.c,v
retrieving revision 1.86
diff -c -r1.86 hash.c
*** src/backend/access/hash/hash.c  11 Feb 2006 23:31:33 -  1.86
--- src/backend/access/hash/hash.c  20 Feb 2006 00:41:04 -
***
*** 164,170 
  hashgettuple(PG_FUNCTION_ARGS)
  {
IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
!   ScanDirection dir = (ScanDirection) PG_GETARG_INT32(1);
HashScanOpaque so = (HashScanOpaque) scan-opaque;
Relationrel = scan-indexRelation;
Pagepage;
--- 164,170 
  hashgettuple(PG_FUNCTION_ARGS)
  {
IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
!   ScanDirection dir = (ScanDirection) PG_GETARG_CHAR(1);
HashScanOpaque so = (HashScanOpaque) scan-opaque;
Relationrel = scan-indexRelation;
Pagepage;
Index: src/backend/access/heap/heapam.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.206
diff -c -r1.206 heapam.c
*** src/backend/access/heap/heapam.c11 Jan 2006 08:43:11 -  1.206
--- src/backend/access/heap/heapam.c20 Feb 2006 00:41:05 -
***
*** 172,178 
   *tuple as indicated by dir; return the next tuple in 
scan-rs_ctup,
   *or set scan-rs_ctup.t_data = NULL if no more tuples.
   *
!  * dir == 0 means re-fetch the tuple indicated by scan-rs_ctup.
   *
   * Note: the reason nkeys/key are passed separately, even though they are
   * kept in the scan descriptor, is that the caller may not want us to check
--- 172,179 
   *tuple as indicated by dir; return the next tuple in 
scan-rs_ctup,
   *or set scan-rs_ctup.t_data = NULL if no more tuples.
   *
!  * dir == NoMovementScanDirection means re-fetch the tuple indicated by
!  * scan-rs_ctup.
   *
   * Note: the reason nkeys/key are passed separately, even though they are
   * kept in the scan descriptor, is that the caller may not want us to check
***
*** 189,200 
   */
  static void
  heapgettup(HeapScanDesc scan,
!  int dir,
   int nkeys,
   ScanKey key)
  {
HeapTuple   tuple = (scan-rs_ctup);
Snapshotsnapshot = scan-rs_snapshot;
BlockNumber page;
Pagedp;
int lines;
--- 190,202 
   */
  static void
  heapgettup(HeapScanDesc scan,
!  ScanDirection dir,
   int nkeys,
   ScanKey key)
  {
HeapTuple   tuple = (scan-rs_ctup);
Snapshotsnapshot = scan-rs_snapshot;
+   boolbackward = ScanDirectionIsBackward(dir);
BlockNumber page;
Pagedp;
int lines;
***
*** 205,211 
/*
 * calculate next starting lineoff, given scan direction
 */
!   if (dir  0)
{
/*
 * forward scan direction
--- 207,213 
/*
 * calculate next starting lineoff, given scan direction
 */
!   if (ScanDirectionIsForward(dir))
{
/*
 * forward scan direction
***
*** 242,248 
  
linesleft = lines - lineoff + 1;
}
!   else if (dir  0)
{
/*
 * reverse scan direction
--- 244,250 
  
linesleft = lines - lineoff + 1;
}
!   else if (backward)
  

Re: [PATCHES] ScanDirections

2006-02-19 Thread Neil Conway
On Sun, 2006-02-19 at 17:47 -0700, James William Pye wrote:
 Small patch that makes ScanDirection a char(f|b|n) instead of a int or long or
 whatever enum makes it.

Why?

-Neil



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] plpython tracebacks

2006-02-19 Thread Neil Conway
On Mon, 2006-02-06 at 16:05 -0600, P. Scott DeVos wrote:
 I have been working with plpython for several months and have
 been hampered by the lack of a traceback being logged when a
 plpython function raises an error.  I have written a patch causes
 the PLy_traceback function to fully log the traceback.  The
 output looks just like the traceback output provided by the
 python interpreter.

Can any PL/Python folks comment on whether they want this patch?

 Feedback appreciated.

Context diffs are preferred. Also, diffs should be against the root of
the source tree, and attached as MIME attachements if possible (it seems
the mailing list software munges the patch somewhat otherwise).

 + hdr = PyString_FromString(Traceback (most recent call last):);
 + tmpl = PyString_FromString(  File \%s\, line %s, in %s);
 + ftr = PyString_FromString();
 +
 + tb_list = PyList_New(0); /* create the list of strings */
 + PyList_Append(tb_list, hdr); /* Append the header to the list */

Minor nit: lowercase Append. Similarly, consistent capitalization for
all the preceding comments in the block (adjacent to the variable
declarations) would be nice.

   estr = eob ? PyString_AsString(eob) : Unknown Exception;
 - xstr = PLy_printf(%s: %s, estr, vstr);
 + xstr = PLy_printf(%s%s: %s, tbstr, estr, vstr);

tbstr points into storage owned by tb_log, but the reference to tb_log
has already been released.

-Neil



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] ScanDirections

2006-02-19 Thread James William Pye
On Sun, Feb 19, 2006 at 08:02:09PM -0500, Neil Conway wrote:
 Why?

*Very* minor. one byte instead of four.

Yes, not really worth the time, but I was poking around at code anyways and
decided to write up a little patch.
-- 
Regards, James William Pye

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] ScanDirections

2006-02-19 Thread Russell Smith

James William Pye wrote:

On Sun, Feb 19, 2006 at 08:02:09PM -0500, Neil Conway wrote:


Why?



*Very* minor. one byte instead of four.


Is one byte any faster anyway?

I would be expecting that with 64bit PC's a 64bit item is as fast, if 
not faster than a 32bit, or 8 bit.


Yes, not really worth the time, but I was poking around at code anyways and
decided to write up a little patch.



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PATCHES] ScanDirections

2006-02-19 Thread Neil Conway
On Sun, 2006-02-19 at 18:12 -0700, James William Pye wrote:
 *Very* minor. one byte instead of four.

Arguably, enumerations are more readable than #defines and easier to
debug. Why do we care about saving 3 bytes for ScanDirection?

-Neil



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PATCHES] plpython: fix memory leak

2006-02-19 Thread Neil Conway
Attached is a patch that fixes three Python reference leaks in
PLy_traceback(): the objects returned by PyErr_Fetch() are owned by the
caller, so their reference count should be decremented. The memory leak
can be reproduced as follows:

create function import_fail() returns text as
'import foosocket
return succeeded, that wasn''t supposed to happen'
LANGUAGE plpythonu;

create function test_py() returns void as $$
begin
for i in 1 .. 1000 loop
perform import_fail();
end loop;
return;
end;$$ language plpgsql;

select test_py();

On my system, each invocation of test_py() leaks about 2MB. With the
patch applied, each invocation leaks about 500KB. So obviously there are
some more leaks here -- I searched briefly for additional problems, but
couldn't see anything obvious.

I don't have time at the moment to track down the remaining problems, so
I'd like to apply the patch as-is: I'll come back to the remaining
memory leaks later, unless someone beats me to it.

Barring any objections, I'll apply this patch to HEAD and back branches
tomorrow.

-Neil

# 
# old_revision [734d5f45e00c6d9d6072160562ea351d3e2ef238]
# 
# patch src/pl/plpython/plpython.c
#  from [103ec857e125a1ca082966ee70f7a2d7dbc19928]
#to [97efa8b35c7e6431677f738015f7f5bf86089791]
# 

--- src/pl/plpython/plpython.c	103ec857e125a1ca082966ee70f7a2d7dbc19928
+++ src/pl/plpython/plpython.c	97efa8b35c7e6431677f738015f7f5bf86089791
@@ -2516,6 +2516,7 @@
 	}
 
 	PyErr_NormalizeException(e, v, tb);
+	Py_XDECREF(tb);
 
 	eob = PyObject_Str(e);
 	if (v  ((vob = PyObject_Str(v)) != NULL))
@@ -2534,9 +2535,10 @@
 
 	Py_DECREF(eob);
 	Py_XDECREF(vob);
+	Py_XDECREF(v);
 
 	/*
-	 * intuit an appropriate error level for based on the exception type
+	 * intuit an appropriate error level based on the exception type
 	 */
 	if (PLy_exc_error  PyErr_GivenExceptionMatches(e, PLy_exc_error))
 		*xlevel = ERROR;
@@ -2545,6 +2547,7 @@
 	else
 		*xlevel = ERROR;
 
+	Py_DECREF(e);
 	return xstr;
 }
 

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] ScanDirections

2006-02-19 Thread James William Pye
On Sun, Feb 19, 2006 at 08:21:31PM -0500, Neil Conway wrote:
 Arguably, enumerations are more readable than #defines and easier to
 debug.

Agreed. However, I didn't think it would impede much here as I figured gdb would
display 'f' or 'b' or 'n', dunno for sure tho as I didn't try.

Why do we care about saving 3 bytes for ScanDirection?

It's a drop in the bucket--or maybe water vapor in the vacinity, so I doubt I
could find a stunning reason.
-- 
Regards, James William Pye

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] ScanDirections

2006-02-19 Thread Tom Lane
James William Pye [EMAIL PROTECTED] writes:
 Small patch that makes ScanDirection a char(f|b|n) instead of a int or
 long or whatever enum makes it.

Why is this a good idea?  It strikes me as likely to break things,
without really buying anything.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] ScanDirections

2006-02-19 Thread James William Pye
On Sun, Feb 19, 2006 at 09:04:09PM -0500, Tom Lane wrote:
 James William Pye [EMAIL PROTECTED] writes:
  Small patch that makes ScanDirection a char(f|b|n) instead of a int or
  long or whatever enum makes it.
 
 Why is this a good idea?

A more appropriately sized variable for the data that it will be holding?
Certainly not a stunning improvement considering that it's only three bytes.

 It strikes me as likely to break things, without really buying anything.

I tried to be careful and track everything down. However, I imagine I could
have easily missed something, so I understand that concern.


In any case, some parts of the patch merely makes some code make use of the
sdir.h macros instead of depending on -1/0/1--heapam in particular. At least
those portions should be applied, no?
-- 
Regards, James William Pye

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] ScanDirections

2006-02-19 Thread Tom Lane
James William Pye [EMAIL PROTECTED] writes:
 In any case, some parts of the patch merely makes some code make use of the
 sdir.h macros instead of depending on -1/0/1--heapam in particular. At least
 those portions should be applied, no?

I can't see any great advantage there either ...

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] ScanDirections

2006-02-19 Thread Neil Conway
On Sun, 2006-02-19 at 19:46 -0700, James William Pye wrote:
 In any case, some parts of the patch merely makes some code make use of the
 sdir.h macros instead of depending on -1/0/1--heapam in particular. At least
 those portions should be applied, no?

I agree: using the symbolic names for the ScanDirection alternatives is
more readable than using magic numbers, and costs us nothing.

-Neil



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] WIP: further sorting speedup

2006-02-19 Thread Luke Lonergan
Title: Re: [PATCHES] WIP: further sorting speedup



Cool!

Well test this sometime soon and get back to you. Were kind of jammed this week, hopefully well get some time.

So you know, weve done some more work on the external sort to remove the tape abstraction from the code, which makes a significant improvement. This involved removing both the Knuth tapes, and the logtape.c codepath. The result is a reasonable improvement in performance (tens of percent), and a dramatic reduction in the amount of code.

Since were looking for a 4-fold improvement based on comparisons to other commercial databases, we feel were not done yet. Our next step (before we got jammed getting our latest MPP release out) was to implement these:
Locate the cause for the excessive time in heap_getattr (you just did it)
Implement something other than replacement selection for creating runs to optimize cache use

- Luke

On 2/19/06 6:40 PM, Tom Lane [EMAIL PROTECTED] wrote:

After applying Simon's recent sort patch, I was doing some profiling and
noticed that sorting spends an unreasonably large fraction of its time
extracting datums from tuples (heap_getattr or index_getattr). The
attached patch does something about this by pulling out the leading sort
column of a tuple when it is received by the sort code or re-read from a
tape. This increases the space needed by 8 or 12 bytes (depending on
sizeof(Datum)) per in-memory tuple, but doesn't cost anything as far as
the on-disk representation goes. The effort needed to extract the datum
at this point is well repaid because the tuple will normally undergo
multiple comparisons while it remains in memory. In some quick tests
the patch seemed to make for a significant speedup, on the order of 30%,
despite increasing the number of runs emitted because of the smaller
available memory.

The choice to pull out just the leading column, rather than all columns,
is driven by concerns of (a) code complexity and (b) memory space.
Having the extra columns pre-extracted wouldn't buy anything anyway
in the common case where the leading key determines the result of
a comparison.

This is still WIP because it leaks memory intra-query (I need to fix it
to clean up palloc'd space better). I thought I'd post it now in case
anyone wants to try some measurements for their own favorite test cases.
In particular it would be interesting to see what happens for a
multi-column sort with lots of duplicated keys in the first column,
which is the case where the least advantage would be gained.

Comments?

regards, tom lane









Re: [PATCHES] WIP: further sorting speedup

2006-02-19 Thread Tom Lane
Luke Lonergan [EMAIL PROTECTED] writes:
 So you know, we=B9ve done some more work on the external sort to remove the
 =B3tape=B2 abstraction from the code, which makes a significant improvement.

Improvement where?  That code's down in the noise so far as I can tell.
I see results like this (with the patched code):

CPU: P4 / Xeon with 2 hyper-threads, speed 2793.08 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped)
with a unit mask of 0x01 (mandatory) count 24
samples  %symbol name
147310   31.9110  tuplesort_heap_siftup
6838114.8130  comparetup_index
34063 7.3789  btint4cmp
22573 4.8899  AllocSetAlloc
19317 4.1845  writetup_index
18953 4.1057  tuplesort_gettuple_common
18100 3.9209  mergepreread
17083 3.7006  GetMemoryChunkSpace
12527 2.7137  LWLockAcquire
11686 2.5315  LWLockRelease
6172  1.3370  tuplesort_heap_insert
5392  1.1680  index_form_tuple
5323  1.1531  PageAddItem
4943  1.0708  LogicalTapeWrite
4525  0.9802  LogicalTapeRead
4487  0.9720  LockBuffer
4217  0.9135  heapgettup
3891  0.8429  IndexBuildHeapScan
3862  0.8366  ltsReleaseBlock

It appears that a lot of the cycles blamed on tuplesort_heap_siftup are
due to cache misses associated with referencing memtuples[] entries
that have fallen out of L2 cache.  Not sure how to improve that though.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] WIP: further sorting speedup

2006-02-19 Thread Luke Lonergan
Title: Re: [PATCHES] WIP: further sorting speedup



The improvement was pre-Simons patch, and it came from implementing a single pass merge instead of a variable pass based on the number of tapes, as it is in Knuths tape algorithm. Also, the additional tricks in logtape.c were higher in the profile than what I see here.

Simons patch had the effect of reducing the number of passes by increasing the number of tapes depending on the memory available, but thats a long tail effect as seen in figure (70?) in Knuth.

Where Id like this to go is the implementation of a two pass create runs, merge, where the second merge can be avoided unless random access is needed (as discussed previously on list).

In the run creation phase, the idea would be to implement something like quicksort or another L2-cache friendly algorithm (ideas?)

- Luke


On 2/19/06 8:19 PM, Tom Lane [EMAIL PROTECTED] wrote:

Luke Lonergan [EMAIL PROTECTED] writes:
 So you know, we=B9ve done some more work on the external sort to remove the
 =B3tape=B2 abstraction from the code, which makes a significant improvement.

Improvement where? That code's down in the noise so far as I can tell.
I see results like this (with the patched code):

CPU: P4 / Xeon with 2 hyper-threads, speed 2793.08 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped)
with a unit mask of 0x01 (mandatory) count 24
samples % symbol name
147310 31.9110 tuplesort_heap_siftup
68381 14.8130 comparetup_index
34063 7.3789 btint4cmp
22573 4.8899 AllocSetAlloc
19317 4.1845 writetup_index
18953 4.1057 tuplesort_gettuple_common
18100 3.9209 mergepreread
17083 3.7006 GetMemoryChunkSpace
12527 2.7137 LWLockAcquire
11686 2.5315 LWLockRelease
6172 1.3370 tuplesort_heap_insert
5392 1.1680 index_form_tuple
5323 1.1531 PageAddItem
4943 1.0708 LogicalTapeWrite
4525 0.9802 LogicalTapeRead
4487 0.9720 LockBuffer
4217 0.9135 heapgettup
3891 0.8429 IndexBuildHeapScan
3862 0.8366 ltsReleaseBlock

It appears that a lot of the cycles blamed on tuplesort_heap_siftup are
due to cache misses associated with referencing memtuples[] entries
that have fallen out of L2 cache. Not sure how to improve that though.

regards, tom lane









Re: [PATCHES] plpython tracebacks

2006-02-19 Thread Harald Armin Massa
Neil,I have written a patch causes the PLy_traceback function to fully log the traceback.The
 output looks just like the traceback output provided by the python interpreter.Can any PL/Python folks comment on whether they want this patch?I am just a humble user of PL/Py, but tracebacks are very wellcome. Not qualified to judge it's code quality I am.
Harald-- GHUM Harald Massapersuadere et programmareHarald Armin MassaReinsburgstraße 202b70197 Stuttgart0173/9409607


Re: [PATCHES] plpython tracebacks

2006-02-19 Thread daveg
On Sun, Feb 19, 2006 at 08:09:09PM -0500, Neil Conway wrote:
 On Mon, 2006-02-06 at 16:05 -0600, P. Scott DeVos wrote:
  I have been working with plpython for several months and have
  been hampered by the lack of a traceback being logged when a
  plpython function raises an error.  I have written a patch causes
  the PLy_traceback function to fully log the traceback.  The
  output looks just like the traceback output provided by the
  python interpreter.
 
 Can any PL/Python folks comment on whether they want this patch?

Yes. Absolutely. This is a required feature for any serious Python user.

One of the great things about python is that almost all debugging can be
done with just the excellent standard python tracebacks. Not having them is a
a not only a inconvience, it signals that the implementation is incomplete
in major ways and, unready for real use.

-dg

-- 
David Gould  [EMAIL PROTECTED]
If simplicity worked, the world would be overrun with insects.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org