Sorry for the late reply. To Michael: thank you. I know this commitfest is
ongoing and I'm not targeting it.
On Thu, Mar 7, 2019 at 4:54 PM Heikki Linnakangas <hlinn...@iki.fi> wrote:

> On 06/03/2019 22:06, Paul Guo wrote:
> > The patch also modifies heap_multi_insert() a bit to do some further
> > code-level optimization by using static memory, instead of using a
> > memory context and dynamic allocation.
>
> If toasting is required, heap_prepare_insert() creates a palloc'd tuple.
> That is still leaked to the current memory context.

Thanks. I checked the code for that, but apparently I missed that one. I'll
see what the proper context is for CTAS. For the COPY code, maybe just
revert my change.

> Leaking into the current memory context is not a bad thing, because
> resetting a memory context is faster than doing a lot of pfree() calls.
> The callers just need to be prepared for that, and use a short-lived
> memory context.
>
> > By the way, while looking at the code, I noticed that there are 9 local
> > arrays with large lengths in toast_insert_or_update(), which seems to
> > be a risk of stack overflow. Maybe we should make them static or global.
>
> Hmm. We currently reserve 512 kB between the kernel's limit, and the
> limit we check in check_stack_depth(). See STACK_DEPTH_SLOP. Those
> arrays add up to 52800 bytes on a 64-bit machine, if I did my math
> right. So there's still a lot of headroom. I agree that it nevertheless
> seems a bit excessive, though.

I was worried about some recursive calling of it, but there should probably
be no such concern for toast_insert_or_update().

> > With the patch,
> >
> > Time: 4728.142 ms (00:04.728)
> > Time: 14203.983 ms (00:14.204)
> > Time: 1008.669 ms (00:01.009)
> >
> > Baseline,
> > Time: 11096.146 ms (00:11.096)
> > Time: 13106.741 ms (00:13.107)
> > Time: 1100.174 ms (00:01.100)
>
> Nice speedup!
>
> > While for toast and large column sizes there is a < 10% decrease, for
> > small column sizes the improvement is super good.
> > Actually, if I hardcode the batch count as 4, all test cases are
> > better, but the improvement for small column sizes is smaller than
> > with the current patch. The number 4 is quite case-specific, so I
> > cannot hardcode it in the patch. Of course we could tune it further,
> > but the current value seems to be a good trade-off?
>
> Have you done any profiling, on why the multi-insert is slower with
> large tuples? In principle, I don't see why it should be slower.

Thanks for the suggestion. I'll explore this a bit more.

> - Heikki