On Wed, 2007-12-19 at 12:08 -0500, Mark Mielke wrote: > Stupid question #2: Is it well recognized that the CPU is the > bottleneck in the PostgreSQL sorting mechanism? Or might it be memory > bandwidth and I/O? >
I think it depends a lot on several factors. It's probably a different bottleneck for integers versus localized text, and depends on the available memory and I/O characteristics. > > It would seem to me that any sort worth parallelizing (administrative > and synchronization overhead), must have data larger than the L2 > cache. If larger than the L2 cache, it becomes real memory speed. If > real memory speed, wouldn't one CPU without hardware synchronization, > be able to fill the memory read/write pipe? We do an external merge sort, which involves merging M runs together. You seem to be implying that we can generate the output run at disk speed, and therefore the CPU speed is unimportant. I suspect that the comparison costs are enough that the above statement isn't true in all cases, particularly in the case of localized text. Also, there is probably a lot of memory copying going on, and that probably destroys a lot of the effectiveness of L2 caching. When L2 caching is ineffective, the CPU spends a lot of time just waiting on memory. In that case, it's better to have P threads of execution all waiting on memory operations in parallel. This would explain why "1p2t" would outperform a "1p1t" in Ron's reference above. These are just my first thoughts, however. There is a lot of existing research out there that we can look into, and also a lot of tests that we can run before jumping into this. I think parallel sorting can be looked into separately from the other sorting improvements. Regards, Jeff Davis ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org