On 4 August 2015 at 21:04, Jeff Janes <jeff.ja...@gmail.com> wrote:
>> Couple of questions here...
>>
>> * the docs say "it's desirable to have pending-list cleanup occur in the
>> background", but there is no way to invoke that, except via VACUUM.  I
>> think we need a separate function to be able to call this as a background
>> action.  If we had that, we wouldn't need much else, would we?
>
> I thought maybe the new bgworker framework would be a way to have a
> backend signal a bgworker to do the cleanup when it notices the pending
> list is getting large.  But that wouldn't directly fix this issue,
> because the bgworker still wouldn't recycle that space (without further
> changes); only vacuum workers do that currently.
>
> But I don't think this could be implemented as an extension, because the
> signalling code has to be in core, so (not having studied the matter at
> all) I don't know if it is a good fit for bgworker.

We need to expose two functions:

1. a function to perform the recycling directly (BRIN has an equivalent
   function)

2. a function to see how big the pending list is for a particular index,
   i.e. do we need to run function 1?

We can then build a bgworker that polls the pending list and issues a
recycle if and when needed - which is how autovac started.

>> * why do we have two parameters: gin_pending_list_limit and fastupdate?
>> What happens if we set gin_pending_list_limit but don't set fastupdate?
>
> Fastupdate is on by default.  If it were turned off, then
> gin_pending_list_limit would be mostly irrelevant for those tables.
> Fastupdate could have been implemented as a magic value (0 or -1) for
> gin_pending_list_limit, but that would break backwards compatibility
> (and arguably would not be a better way of doing things, anyway).

>> * how do we know how to set that parameter?  Is there a way of knowing
>> gin_pending_list_limit has been reached?
>
> I don't think there is an easy answer to that.
> The trade-offs are
> complex and depend on things like how well cached the parts of the index
> needing insertions are, how many lexemes/array elements are in an
> average document, and how many documents inserted near the same time as
> each other share lexemes in common.  And of course what you need to
> optimize for, latency or throughput, and if latency, search latency or
> insert latency.

So we also need a way to count the number of times the pending list is
flushed.  Perhaps record that on the metapage, so we can see how often it
has happened - and another function to view the stats on that.

>> This and the OP seem like 9.5 open items to me.
>
> I don't think so.  Freeing gin_pending_list_limit from being forcibly
> tied to work_mem is a good thing.  Even if I don't know exactly how to
> set gin_pending_list_limit, I know I don't want it to be 4GB just
> because work_mem was set there for some temporary reason.  I'm happy to
> leave it at its default and let its fine tuning be a topic for people
> who really care about every microsecond of performance.

OK, I accept this.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services