On Mon, Nov 02, 2020 at 07:53:45AM +0100, Luc Vlaming wrote:
> On 30.10.20 05:51, Justin Pryzby wrote:
> > On Thu, Oct 22, 2020 at 01:29:53PM +0100, Simon Riggs wrote:
> > > On Fri, 16 Oct 2020 at 22:05, Justin Pryzby <pry...@telsasoft.com> wrote:
> > > >
> > > > > I made this conditional on BEGIN BULK/SET bulk, so I'll solicit
> > > > > comments on that.
> > >
> > > I think it would be better if this was self-tuning, so that we don't
> > > allocate a bulkinsert state until we've done say 100 (?) rows
> > > inserted.
> >
> > I made it an optional, non-default behavior in response to the
> > legitimate concern about a performance regression for the cases where a
> > loader needs to be as fast as possible - as compared with our case,
> > where we want instead to optimize for our reports by making the loaders
> > responsible for their own writes, rather than leaving behind many dirty
> > pages, and clobbering the cache, too.
> >
> > Also, INSERT SELECT doesn't immediately help us (telsasoft), since we
> > use INSERT .. VALUES () .. ON CONFLICT. This would handle that case,
> > which is great, even though that wasn't a design goal. It could also be
> > an integer GUC to allow configuring the size of the ring buffer.
> >
> > > You should also use table_multi_insert() since that will give further
> > > performance gains by reducing block access overheads. Switching from
> > > single row to multi-row should also only happen once we've loaded a
> > > few rows, so we don't introduce overheads for smaller SQL statements.
> >
> > Good idea... multi_insert (which reduces the overhead of individual
> > inserts) is mostly independent of BulkInsert state (which uses a ring
> > buffer to avoid dirtying the cache). I made this 0002.
> >
> > This makes INSERT SELECT several times faster, and it doesn't clobber
> > the cache either.
> >
> > Time: 4700.606 ms (00:04.701)
> >   123 | 1
> >    37 | 2
> >    20 | 3
> >    11 | 4
> >  4537 | 5
> > 11656 |
> >
> > Time: 1125.302 ms (00:01.125)
> >  2171 | 1
> >    37 | 2
> >    20 | 3
> >    11 | 4
> >   111 | 5
> > 14034 |
> >
> > When enabled, this passes nearly all regression tests, and all but 2 of
> > the changes are easily understood. The 2nd patch still needs work.
>
> Hi,
>
> Came across this thread because I'm working on an improvement for
> relation extension, to improve the speed of the bulk insert itself in
> (highly) parallel cases, and would like to make sure that our approaches
> work nicely
Thanks for looking.

Since this is a GUC, I thought it would accommodate users optimizing for
either inserts or selects, as well as users who don't want to change their
application (they can "ALTER SYSTEM SET bulk_insert=on"). I'm not thrilled
about adding a new GUC, but that seems to be required for "BEGIN BULK",
which was the obvious way to make it an opt-in feature.

I guess it'd be easy to add a counter to ModifyTableState, although it
makes the code a bit less clean and conceivably performs "discontinuously":
inserting 100 rows/sec for the first 999 rows and then 200 rows/sec
afterwards. If you "mix" small inserts and big inserts, it would be a bad
strategy to optimize for the small ones. Anyway, in a quick test, small
inserts were not slower:
https://www.postgresql.org/message-id/20200713015700.GA23581%40telsasoft.com

Do you have an example that regresses with bulk insert?

The two patches are separate, and it's possible they should be enabled
differently or independently.

--
Justin

> Given what I've seen and tried so far with various benchmarks, I would
> also really like to see a different approach here. "BEGIN BULK" can be
> problematic, for example, if you mix small and large amounts of inserts
> in the same transaction, or if your application possibly does a bulk
> insert but otherwise runs mostly OLTP transactions.
>
> To me, the idea from Simon sounds good: only use a bulk insert state
> after inserting e.g. 1000 rows. This also seems more applicable to most
> applications, compared to requiring a change to any application that
> wishes to have faster ingest.
>
> Another approach could be to combine this, for example, with a few extra
> requirements to limit the amount of regressions, and first learn more
> about how this behaves in the field.
> We could, for example (just throwing out some ideas), require that:
> - the relation has a certain size;
> - a BufferStrategy of at most a certain size is used;
> - there is a certain number of lock waiters on the relation extension
>   lock (like we do with bulk extend);
> - we have extended the relation by at least e.g. 4 MB without using the
>   FSM, thereby proving that we are doing bulk operations instead of
>   random small extensions everywhere in the relation that use the FSM.
>
> Another option is that we first try to improve the bulk operation
> facilities in general, and then have another shot at this? Not sure if
> there is some benchmark / query that shows where such a 10x slowdown
> would appear, but maybe that would be worth a look as well.
>
> Regards,
> Luc