Hi,

On 2019-08-05 15:36:59 -0400, Robert Haas wrote:
> On Fri, Aug 2, 2019 at 6:42 PM Andres Freund <and...@anarazel.de> wrote:
> > Hm. This leaves toast_insert_or_update() as a name. That makes it
> > sound like it's generic toast code, rather than heap specific?
>
> I'll rename it to heap_toast_insert_or_update(). But I think I'll put
> that in 0004 with the other renames.
Makes sense.

> > It's definitely nice how a lot of repetitive code has been
> > deduplicated. Also makes it easier to see how algorithmically
> > expensive toast_insert_or_update() is :(.
>
> Yep.
>
> > Shouldn't this condition be the other way round?
>
> I had to fight pretty hard to stop myself from tinkering with the
> algorithm -- this can clearly be done better, but I wanted to make it
> match the existing structure as far as possible. It also only needs to
> be tested once, not on every loop iteration, so if we're going to
> start changing things, we should go further than just swapping the
> order of the tests. For now I prefer to draw a line in the sand and
> change nothing.

Makes sense.

> > Couldn't most of these be handled via colflags, instead of having
> > numerous individual checks? I.e. if you had TOASTCOL_COMPRESSED,
> > TOASTCOL_IGNORED, TOASTCOL_MAIN, TOASTFLAG_EXTERNAL, etc, all but
> > the size check ought to boil down to a single mask test?
>
> I'm not really seeing how more flags would significantly simplify this
> logic, but I might be missing something.

Well, right now you have a number of ifs for each attribute. If you
encoded all the parameters into flags, you could change that to a single
flag test - as far as I can tell, all the parameters could be
represented that way, if you moved MAIN etc. into flags. A single if on
the flags (plus the size check) is cheaper than several separate checks.

> > I'm quite unconvinced that making the chunk size specified by the AM
> > is a good idea to do in isolation. We have TOAST_MAX_CHUNK_SIZE in
> > pg_control etc. It seems a bit dangerous to let AMs provide the
> > size, without being very clear that any change of the value will
> > make data inaccessible. It'd be different if the max were only used
> > during toasting.
>
> I was actually thinking about proposing that we rip
> TOAST_MAX_CHUNK_SIZE out of pg_control.
> No real effort has been made here to make this something that users
> could configure, and I don't know of a good reason to configure it. It
> also seems pretty out of place in a world where there are multiple AMs
> floating around -- why should heap, and only heap, be checked there?
> Granted, it does have some pride of place, but still.
>
> > I think the *size* checks should be weakened so we check:
> > 1) After each chunk, whether the already assembled chunks exceed the
> >    expected size.
> > 2) After all chunks have been collected, check that the size is
> >    exactly what we expect.
> >
> > I don't think that removes a meaningful amount of error checking.
> > Missing tuples etc. get detected by the chunk_ids not being
> > consecutive. The overall size is still verified.
> >
> > The obvious problem with this is the slice fetching logic. For
> > slices with an offset of 0, it's obviously trivial to implement. For
> > the higher slice logic, I'd assume we'd have to fetch the first
> > slice by estimating where the start chunk is based on the currently
> > suggested chunk size, and restarting the scan earlier/later if not
> > accurate. In most cases it'll be accurate, so we'd not lose
> > efficiency.
>
> I don't feel entirely convinced that there's any rush to do all of
> this right now, and the more I change the harder it is to make sure
> that I haven't broken anything. How strongly do you feel about this
> stuff?

I think we should either leave the hardcoded limit in place, or make it
actually not fixed. Ripping-it-out-but-not-actually just seems like a
trap for the unwary, without much point.

Greetings,

Andres Freund