On 4 August 2015 at 21:04, Jeff Janes <jeff.ja...@gmail.com> wrote:
>> Couple of questions here...
>>
>> * the docs say "it's desirable to have pending-list cleanup occur in the
>> background", but there is no way to invoke that, except via VACUUM.  I
>> think we need a separate function to be able to call this as a background
>> action.  If we had that, we wouldn't need much else, would we?
>
> I thought maybe the new bgworker framework would be a way to have a
> backend signal a bgworker to do the cleanup when it notices the pending
> list is getting large.  But that wouldn't directly fix this issue,
> because the bgworker still wouldn't recycle that space (without further
> changes); only vacuum workers do that currently.
>
> But I don't think this could be implemented as an extension, because the
> signalling code has to be in core, so (not having studied the matter at
> all) I don't know if it is a good fit for bgworker.

We need to expose two functions:

1. a function to perform the recycling directly (BRIN has an equivalent
   function)

2. a function to see how big the pending list is for a particular index,
   i.e. do we need to run function 1?

We can then build a bgworker that polls the pending list and issues a
recycle if and when needed - which is how autovac started.

>> * why do we have two parameters: gin_pending_list_limit and fastupdate?
>> What happens if we set gin_pending_list_limit but don't set fastupdate?
>
> Fastupdate is on by default.  If it were turned off, then
> gin_pending_list_limit would be mostly irrelevant for those tables.
> Fastupdate could have been implemented as a magic value (0 or -1) for
> gin_pending_list_limit, but that would break backwards compatibility
> (and arguably would not be a better way of doing things, anyway).

>> * how do we know how to set that parameter?  Is there a way of knowing
>> gin_pending_list_limit has been reached?
>
> I don't think there is an easy answer to that.
> The trade-offs are
> complex and depend on things like how well cached the parts of the index
> needing insertions are, how many lexemes/array elements are in an
> average document, and how many documents inserted near the same time as
> each other share lexemes in common.  And of course what you need to
> optimize for, latency or throughput, and if latency, search latency or
> insert latency.

So we also need a way to count the number of times the pending list is
flushed.  Perhaps record that on the metapage, so we can see how often it
has happened - and another function to view the stats on that.

>> This and the OP seem like 9.5 open items to me.
>
> I don't think so.  Freeing gin_pending_list_limit from being forcibly
> tied to work_mem is a good thing.  Even if I don't know exactly how to
> set gin_pending_list_limit, I know I don't want it to be 4GB just
> because work_mem was set there for some temporary reason.  I'm happy to
> leave it at its default and let its fine tuning be a topic for people
> who really care about every microsecond of performance.

OK, I accept this.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services