Thanks, Robert, for sharing your thoughts.

On Mon, Dec 6, 2021 at 11:16 PM Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Dec 6, 2021 at 9:23 AM Ashutosh Sharma <ashu.coe...@gmail.com>
> wrote:
> > One last point - If we try to clone a huge database, as expected
> > CREATE DATABASE emits a lot of WAL, causing a lot of intermediate
> > checkpoints, which seems to affect the performance slightly.
>
> Yes, I think this needs to be characterized better. If you have a big
> shared_buffers setting and a lot of those buffers are dirty and the
> template database is small, all of which is fairly normal, then this
> new approach should be much quicker. On the other hand, what if the
> situation is reversed? Perhaps you have a small shared_buffers and not
> much of it is dirty and the template database is gigantic. Then maybe
> this new approach will be slower. But right now I think we don't know
> where the crossover point is, and I think we should try to figure that
> out.

Yes, I think so too.

> So for example, imagine tests with shared_buffers of 1GB, 8GB, and
> 64GB, and template databases with sizes of whatever the default is,
> 1GB, 10GB, and 100GB. Repeatedly make 75% of the pages dirty and then
> create a new database from one of the templates. And then just measure
> the performance. Maybe for large databases this approach is just
> really the pits -- and if your max_wal_size is too small, it
> definitely will be. But, I don't know, maybe with reasonable settings
> it's not that bad. Writing everything to disk twice - once to WAL and
> once to the target directory - has to be more expensive than doing it
> once. But on the other hand, it's all sequential I/O and the data
> pages don't need to be fsync'd, so perhaps the overhead is relatively
> mild. I don't know.

So far, I haven't found much performance overhead with a few GB of data
in the template database. With the default settings the overhead is only
slight; perhaps setting a higher value of max_wal_size would reduce it
further.

--
With Regards,
Ashutosh Sharma.
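[Editor's note: the test matrix Robert describes could be driven by a script roughly like the sketch below. It is only a sketch, not a tested harness: the template database names (tpl_default, tpl_1gb, tpl_10gb, tpl_100gb) are assumed to exist already, the pgbench run is just one convenient way to dirty a large share of the buffer pool, and changing shared_buffers requires a server restart.]

```shell
#!/bin/sh
# Sketch: vary shared_buffers and template size, dirty the buffer pool,
# then time CREATE DATABASE. Assumes $PGDATA points at the cluster and
# templates tpl_default, tpl_1gb, tpl_10gb, tpl_100gb already exist
# (these names are hypothetical, chosen for illustration).

for sb in 1GB 8GB 64GB; do
    # shared_buffers cannot be changed on the fly; restart with the new value.
    pg_ctl -D "$PGDATA" -o "-c shared_buffers=$sb" -w restart

    for tpl in tpl_default tpl_1gb tpl_10gb tpl_100gb; do
        # Dirty a large fraction of shared buffers with a write workload.
        # (A 30s pgbench run is an assumption; tune until ~75% is dirty.)
        pgbench -n -T 30 -c 8 postgres

        # Measure the clone time with the shell's own timer.
        time psql -d postgres -c "CREATE DATABASE clonedb TEMPLATE $tpl"

        # Drop the clone so the next iteration starts clean.
        psql -d postgres -c "DROP DATABASE clonedb"
    done
done
```

Running this against both the patched and unpatched servers, and also with a larger max_wal_size, should show where the crossover point between the two CREATE DATABASE strategies lies.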