On Fri, 29 Mar 2019 at 01:15, Andres Freund <and...@anarazel.de> wrote:
> On 2019-03-28 20:48:47 +1300, David Rowley wrote:
> > I had a look at this and performance has improved again, thanks.
> > However, I'm not sure if the patch is exactly what we need, let me
> > explain.
>
> I'm not entirely sure either, I just haven't really seen an alternative
> that's convincing.
I wonder if, instead of having the array of slots in ResultRelInfo, we could have a struct that's local to copy.c containing the array and the number of tuples stored so far. For partitioned tables, we could store this struct in a hash table keyed by partition Oid. When the partition changes, check whether we've got that partition's Oid in the hash table and keep adding tuples until the buffer fills. We could keep a global count of the number of tuples stored across all the slot arrays and flush all of them when it gets full.

The trade-off here would be that instead of flushing on each partition change, we'd do a hash table lookup on each partition change and possibly create a new array of slots. This would allow us to get rid of the code that conditionally switches batching on/off based on how often the partition is changing. Whether this is better would hinge on a hash lookup plus multi-row inserts being faster than single-row inserts.

I'm just not too sure about how to handle getting rid of the slots when we flush all the tuples. Getting rid of them might be a waste, but it might also stop the code from creating tens of millions of slots in the worst case. Maybe to fix that we could, when we flush the tuples, get rid of the slots in arrays that didn't get any use at all, as indicated by a 0 tuple count. This would require a hash seq scan, but maybe we could keep that cheap by flushing early if we get too many distinct partitions. That would save the table from getting bloated if there happened to be a point in the copy stream with high numbers of distinct partitions carrying just a few tuples each. Multi-inserts won't help much in that case anyway.

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services