On Wed, Oct 8, 2014 at 11:22 PM, Robert Haas <robertmh...@gmail.com> wrote:
>
> On Sun, Jun 29, 2014 at 9:12 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Meh. Even "SELECT 1" is going to be doing *far* more pallocs than that to
> > get through raw parsing, parse analysis, planning, and execution startup.
> > If you can find a few hundred pallocs we can avoid in trivial queries,
> > it would get interesting; but I'll be astonished if saving 4 is measurable.
>
> I got nerd-sniped by this problem today, probably after staring at the
> profiling data very similar to what led Andres to ask the question in
> the first place. The gains are indeed not measurable on a
> macrobenchmark, but I had to write the patch to figure that out, so
> here it is.
>
> Although the gain isn't a measurable percentage of total runtime, it
> does appear to be a significant percentage of the palloc traffic on
> trivial queries. I did 30-second SELECT-only pgbench runs at scale
> factor 10, using prepared queries. According to perf, on unpatched
> master, StartTransactionCommand accounts for 11.46% of the calls to
> AllocSetAlloc. (I imagine this is by runtime, not by call count, but
> it probably doesn't matter much either way.) But with this patch,
> StartTransactionCommand's share drops to 4.43%. Most of that is
> coming from AtStart_Inval(), which wouldn't be hard to fix either.
>
> I'm inclined to think that tamping down palloc traffic is worthwhile
> even if we can't directly measure the savings on a macrobenchmark.
> AllocSetAlloc is often at or near the top of profiling results, but
> there's rarely a single caller responsible for enough of those calls
> for to make it worth optimizing. But that means that if we refuse to
> take patches that save just a few pallocs, we're basically giving up
> on ever improving anything in this area, and that doesn't seem like a
> good idea either; it's only going to get slowly worse over time as we
> add more features.

Yeah, this makes sense. Besides, if somebody later comes up with a much
larger patch that removes a few dozen pallocs and shows a noticeable
gain, it will be difficult to work out which particular pallocs account
for the improvement, since removing only a few of them might not give
any noticeable gain on its own.

> BTW, if anyone's curious what the biggest source of AllocSetAlloc
> calls is on this workload, the answer is ExecInitIndexScan(), which
> looks to be indirectly responsible for almost half the total (or just
> over half, with the patch applied). That apparently calls a lot of
> different things that allocate small amounts of memory, and it
> accounts for nearly all of the AllocSetAlloc traffic that originates
> from PortalStart().

One way to dig into this problem is to look at each individual palloc
in the PortalStart() path and see which ones can be optimized. Another
way is to introduce a concept of Prepared Portals, something similar to
Prepared Statements, but for Portals. This would not only avoid the
pallocs in the executor startup phase, but could also improve
performance by avoiding many other calls related to executor startup.
I understand that this would be a much bigger project; however, the
gains could also be substantial for prepared statements when all or
most of the data fits in memory. It seems other databases have similar
concepts for execution (for example, Oracle uses
Cursor_sharing/Shared cursors to achieve the same).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com