"Luke Lonergan" <[EMAIL PROTECTED]> writes:
> So you know, we've done some more work on the external sort to remove the
> "tape" abstraction from the code, which makes a significant improvement.

Improvement where?  That code's down in the noise so far as I can tell.
I see results like this (with the patched code):

CPU: P4 / Xeon with 2 hyper-threads, speed 2793.08 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 240000
samples  %        symbol name
147310   31.9110  tuplesort_heap_siftup
68381    14.8130  comparetup_index
34063     7.3789  btint4cmp
22573     4.8899  AllocSetAlloc
19317     4.1845  writetup_index
18953     4.1057  tuplesort_gettuple_common
18100     3.9209  mergepreread
17083     3.7006  GetMemoryChunkSpace
12527     2.7137  LWLockAcquire
11686     2.5315  LWLockRelease
6172      1.3370  tuplesort_heap_insert
5392      1.1680  index_form_tuple
5323      1.1531  PageAddItem
4943      1.0708  LogicalTapeWrite
4525      0.9802  LogicalTapeRead
4487      0.9720  LockBuffer
4217      0.9135  heapgettup
3891      0.8429  IndexBuildHeapScan
3862      0.8366  ltsReleaseBlock

It appears that a lot of the cycles blamed on tuplesort_heap_siftup are
due to cache misses associated with referencing memtuples[] entries that
have fallen out of L2 cache.  Not sure how to improve that though.

			regards, tom lane
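
For illustration only, here is a minimal sketch of the kind of array-based binary heap the profile points at. It is not the tuplesort.c code: HeapEntry, heap_insert and heap_remove_top are hypothetical names, and the entries hold a bare int key rather than a real sort tuple. It shows the access pattern behind the cache-miss remark: while sifting down from the root, the loop visits index i, then 2i+1 or 2i+2, so the distance between successive slots roughly doubles at every level. Once the array is much larger than L2, each step below the first few levels is likely to touch a different cache line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memtuples[] entry; a real entry is larger. */
typedef struct {
    int key;
} HeapEntry;

/*
 * Insert key into a binary min-heap stored in heap[0..n-1].
 * The caller must provide room for n + 1 entries.
 * Returns the new entry count.
 */
size_t heap_insert(HeapEntry *heap, size_t n, int key)
{
    size_t i = n++;

    /* Walk toward the root, moving larger parents down one level. */
    while (i > 0) {
        size_t parent = (i - 1) / 2;

        if (heap[parent].key <= key)
            break;
        heap[i] = heap[parent];
        i = parent;
    }
    heap[i].key = key;
    return n;
}

/*
 * Remove the smallest entry (heap[0]) and restore the heap property by
 * sifting the last entry down from the root.
 * Returns the new entry count.
 */
size_t heap_remove_top(HeapEntry *heap, size_t n)
{
    assert(n > 0);

    HeapEntry last = heap[--n];
    size_t i = 0;

    for (;;) {
        size_t child = 2 * i + 1;   /* left child: index doubles per level */

        if (child >= n)
            break;
        /* Pick the smaller of the two children. */
        if (child + 1 < n && heap[child + 1].key < heap[child].key)
            child++;
        if (last.key <= heap[child].key)
            break;
        heap[i] = heap[child];      /* far-apart slots once i is large */
        i = child;
    }
    if (n > 0)
        heap[i] = last;
    return n;
}
```

Reading heap[0] and then calling heap_remove_top repeatedly yields the keys in ascending order. Each call walks one root-to-leaf path, which is where the scattered memory references come from.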