On Mon, 2007-03-05 at 21:34 -0800, Sherry Moore wrote:

> - Based on a lot of the benchmarks and workloads I traced, the
>   target buffer of read operations are typically accessed again
>   shortly after the read, while writes are usually not. Therefore,
>   the default operation mode is to bypass L2 for writes, but not
>   for reads.

Hi Sherry,

I'm trying to relate what you've said to how we should proceed from
here. My understanding of what you've said is:

- Tom's assessment that the observed performance quirk could be fixed in
  the OS kernel is correct, and you have the numbers to prove it.

- Solaris currently only does NTA for 128K reads, which we don't
  currently issue. If we were to request 16 blocks at a time (16 x 8K =
  128K with the default block size), we would get this benefit on
  Solaris, at least. The copyout_max_cached parameter can be patched,
  but it isn't a normal system tunable.

- Other workloads you've traced *do* reuse the same buffer again very
  soon afterwards when reading sequentially (but not when writing).
  Reducing the working set size is an effective technique for improving
  performance if we don't have a kernel that does NTA or we don't read
  in big enough chunks (we need both to get NTA to kick in).

And what you haven't said:

- All of this is orthogonal to the issue of buffer cache spoiling in
  PostgreSQL itself. That issue does still exist as a non-OS issue, but
  we've been discussing in detail the specific case of L2 cache effects
  with specific kernel calls. All of the test results have been
  stand-alone, so we've not done any measurements in that area. I say
  this because you make the point that reducing the working set size of
  write workloads has no effect on the L2 cache issue, but ISTM it's
  still potentially a cache spoiling issue.

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com