Hi,

On 2019-08-05 15:36:59 -0400, Robert Haas wrote:
> On Fri, Aug 2, 2019 at 6:42 PM Andres Freund <and...@anarazel.de> wrote:
> > Hm. This leaves toast_insert_or_update() as a name. That makes it
> > sound like it's generic toast code, rather than heap specific?
>
> I'll rename it to heap_toast_insert_or_update(). But I think I'll put
> that in 0004 with the other renames.
Makes sense.

> > It's definitely nice how a lot of repetitive code has been
> > deduplicated. Also makes it easier to see how algorithmically
> > expensive toast_insert_or_update() is :(.
>
> Yep.
>
> > Shouldn't this condition be the other way round?
>
> I had to fight pretty hard to stop myself from tinkering with the
> algorithm -- this can clearly be done better, but I wanted to make it
> match the existing structure as far as possible. It also only needs to
> be tested once, not on every loop iteration, so if we're going to
> start changing things, we should go further than just swapping the
> order of the tests. For now I prefer to draw a line in the sand and
> change nothing.

Makes sense.

> > Couldn't most of these be handled via colflags, instead of having
> > numerous individual checks? I.e. if you had TOASTCOL_COMPRESSED,
> > TOASTCOL_IGNORED, TOASTCOL_MAIN, TOASTFLAG_EXTERNAL, etc, all but
> > the size check ought to boil down to a single mask test?
>
> I'm not really seeing how more flags would significantly simplify this
> logic, but I might be missing something.

Well, right now you have a number of ifs for each attribute. If you
encoded all the parameters into flags, you could change that to a single
flag test - as far as I can tell, all the parameters could be
represented that way, if you moved MAIN etc. into flags. A single if on
the flags (plus the size check) is cheaper than several separate checks.

> > I'm quite unconvinced that making the chunk size specified by the AM
> > is a good idea to do in isolation. We have TOAST_MAX_CHUNK_SIZE in
> > pg_control etc. It seems a bit dangerous to let AMs provide the
> > size, without being very clear that any change of the value will
> > make data inaccessible. It'd be different if the max were only used
> > during toasting.
>
> I was actually thinking about proposing that we rip
> TOAST_MAX_CHUNK_SIZE out of pg_control.
> No real effort has been made here to make this something that users
> could configure, and I don't know of a good reason to configure it. It
> also seems pretty out of place in a world where there are multiple AMs
> floating around -- why should heap, and only heap, be checked there?
> Granted, it does have some pride of place, but still.
>
> > I think the *size* checks should be weakened so we check:
> > 1) After each chunk, whether the already assembled chunks exceed the
> >    expected size.
> > 2) After all chunks have been collected, check that the size is
> >    exactly what we expect.
> >
> > I don't think that removes a meaningful amount of error checking.
> > Missing tuples etc. get detected by the chunk_ids not being
> > consecutive. The overall size is still verified.
> >
> > The obvious problem with this is the slice fetching logic. For
> > slices with an offset of 0, it's obviously trivial to implement. For
> > the higher slice logic, I'd assume we'd have to fetch the first
> > slice by estimating where the start chunk is based on the currently
> > suggested chunk size, and restarting the scan earlier/later if not
> > accurate. In most cases it'll be accurate, so we'd not lose
> > efficiency.
>
> I don't feel entirely convinced that there's any rush to do all of
> this right now, and the more I change the harder it is to make sure
> that I haven't broken anything. How strongly do you feel about this
> stuff?

I think we should either leave the hardcoded limit in place, or make it
actually not fixed. Ripping-it-out-but-not-actually just seems like a
trap for the unwary, without much point.

Greetings,

Andres Freund