Re: Table AM Interface Enhancements
On Fri, Jun 21, 2024 at 7:37 PM Matthias van de Meent wrote:
> On Tue, 16 Apr 2024 at 12:34, Alexander Korotkov wrote:
> > You're right. No sense trying to fix this. Reverted.
>
> I just noticed that this revert (commit 6377e12a) seems to have introduced two comment blocks atop TableAmRoutine's scan_analyze_next_block, and I can't find a clear reason why these are two separate comment blocks.
>
> Furthermore, both comment blocks seemingly talk about different implementations of a block-based analyze functionality, and I don't have the time to analyze which of these comments is authoritative and which are misplaced or obsolete.

Thank you, I've just removed the first comment. It contains heap-specific information and was copied here from heapam_scan_analyze_next_block().

--
Regards,
Alexander Korotkov
Supabase
Re: Table AM Interface Enhancements
Hi,

On Tue, 16 Apr 2024 at 12:34, Alexander Korotkov wrote:
> You're right. No sense trying to fix this. Reverted.

I just noticed that this revert (commit 6377e12a) seems to have introduced two comment blocks atop TableAmRoutine's scan_analyze_next_block, and I can't find a clear reason why these are two separate comment blocks.

Furthermore, both comment blocks seemingly talk about different implementations of a block-based analyze functionality, and I don't have the time to analyze which of these comments is authoritative and which are misplaced or obsolete.

Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
Re: Table AM Interface Enhancements
On Thu, Nov 23, 2023 at 1:43 PM Alexander Korotkov wrote:
> Hello PostgreSQL Hackers,
>
> I am pleased to submit a series of patches related to the Table Access Method (AM) interface, which I initially announced during my talk at PGCon 2023 [1]. These patches are primarily designed to support the OrioleDB engine, but I believe they could be beneficial for other table AM implementations as well.
>
> The focus of these patches is to introduce more flexibility and capabilities into the Table AM interface. This is particularly relevant for advanced use cases like index-organized tables, alternative MVCC implementations, etc.

Hi Alexander, and great to see some action around the table access method interface. Sorry for being late to the game; I'm wondering a few things about the patches, and I'll start with the first one that caught my eye: 0007-Allow-table-AM-tuple_insert-method-to-return-the--v1.patch

> This allows table AM to return a native tuple slot, which is aware of table AM-specific system attributes.

This patch seems straightforward enough, but from reading the surrounding code and trying to understand the context, I am wondering a few things. Reading the thread, I am unsure if this will go in or not, but I just wanted to point out a concern I had. My apologies if I am raising an issue that is already resolved.

AFAICT, the general contract for working with table tuple slots is creating them for a particular purpose, filling them in, and then passing around a pointer. Since the slot is created using a "source" implementation, the "source" is responsible for memory allocation and also for other updates to the state. Please correct me if I have misunderstood how this is intended to work, but this seems like a good API since it avoids unnecessary allocation and, in particular, unbounded creation of new slots affecting memory usage while a query is executing.
For a plan you want to execute, you just make sure that you have slots of the right kind in each plan node, and there is no need to dynamically allocate more slots. If you want one for the table access method, just make sure to fetch the slot callbacks from the table access method and use those correctly. As a result, the number of slots used during execution is bounded.

Assuming that I've understood it correctly, if a TTS is created and passed to tuple_insert, and it needs to return a different slot, this raises two questions:

- As Andres pointed out: who is responsible for taking care of and dealing with the cleanup of the returned slot here? Note that this is not just a matter of releasing memory; there are other stateful things that the TAM may have created in the slot which need to be dealt with. For this, some sort of callback is needed, and the tuple_insert implementation needs to call that correctly.

- The dual is the cleanup of the "original" slot passed in: a slot of a particular kind is passed in, and you need to deal with this correctly to release the resources allocated by the original slot, again using some sort of callback.

For both these cases, the question is what cleanup function to call. In most cases, the slot comes from a subplan and is not dynamically allocated, i.e., it cannot just use release() since it is reused later. For example, for ExecScanFetch the slot ss_ScanTupleSlot is returned, which is then used with tuple_insert (unless I've misread the code), and which is typically cleared, not released. If clear() is used instead, and you clear this slot as part of inserting a tuple, you can prematurely clear an intermediate result (ss_ScanTupleSlot, in the example above), which can cause strange issues if this result is needed later.
So, given that the dynamic allocation of new slots is unbounded within a query, and it is complicated to make sure that slots are cleared/reset/released correctly depending on context, this seems hard to get working correctly without risking the introduction of bugs. IMHO, it would be preferable to have a very simple contract where you init, set, clear, and release the slot to avoid bugs creeping into the code, which is what the PostgreSQL code mostly has now.

So, the question here is why changing the slot implementation is needed. I do not know the details of OrioleDB, but this slot is immediately used with ExecInsertIndexTuples() after the call in nodeModifyTable. If the need is to pass information from the TAM to the IAM, then it might be better to store this information in the execution state. If there is a case where the correct slot is not created, then fixing that location might be better. (I've noticed that the copyFrom code has a somewhat naïve assumption about what slot implementation should be used, but that is a separate discussion.)

Best wishes,
Mats Kindahl
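To make the ownership question concrete, here is a tiny standalone C model of the contract described above. This is purely illustrative — ToySlot and every function name here are invented for the example; this is not the PostgreSQL TupleTableSlot API. The point it models: once tuple_insert() may hand back a different slot, the caller is juggling two slots with different lifetimes, and must pick clear() for one and release() for the other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy model of a slot: each slot kind carries its own callbacks, and
 * the creator remains responsible for the slot's lifetime. */
typedef struct ToySlot ToySlot;
struct ToySlot
{
    bool        valid;              /* does the slot hold a tuple? */
    void      (*clear) (ToySlot *); /* drop contents, keep slot for reuse */
    void      (*release) (ToySlot *);   /* free the slot itself */
};

static void
toy_clear(ToySlot *slot)
{
    slot->valid = false;            /* slot stays allocated for the next row */
}

static void
toy_release(ToySlot *slot)
{
    free(slot);
}

static ToySlot *
toy_make_slot(void)
{
    ToySlot    *slot = malloc(sizeof(ToySlot));

    slot->valid = false;
    slot->clear = toy_clear;
    slot->release = toy_release;
    return slot;
}

/* Models a tuple_insert() that returns a different, AM-private slot.
 * The caller now owns two slots: the input (reused by the plan, so it
 * may only be cleared) and the returned one (must eventually be
 * released).  Calling the wrong cleanup on either is the bug class
 * discussed above. */
static ToySlot *
toy_tuple_insert(ToySlot *input)
{
    ToySlot    *am_slot = toy_make_slot();

    am_slot->valid = true;          /* pretend the AM filled in system columns */
    input->clear(input);            /* clear, not release, the caller's slot */
    return am_slot;
}
```

In the model, the caller must remember to clear (not release) the input slot and to eventually release the returned AM-private slot; with the single-slot contract, neither question arises.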
Re: Table AM Interface Enhancements
Hi, On 2024-04-16 08:31:24 -0400, Robert Haas wrote: > On Tue, Apr 16, 2024 at 6:52 AM Alexander Korotkov > wrote: > > Taking a closer look at acquire_sample_rows(), I think it would be > > good if table AM implementation would care about block-level (or > > whatever-level) sampling. So that acquire_sample_rows() just fetches > > tuples one-by-one from table AM implementation without any care about > > blocks. Possible table_beginscan_analyze() could take an argument of > > target number of tuples, then those tuples are just fetches with > > table_scan_analyze_next_tuple(). What do you think? > > Andres is the expert here, but FWIW, that plan seems reasonable to me. > One downside is that every block-based tableam is going to end up with > a very similar implementation, which is kind of something I don't like > about the tableam API in general: if you want to make something that > is basically heap plus a little bit of special sauce, you have to copy > a mountain of code. Right now we don't really care about that problem, > because we don't have any other tableams in core, but if we ever do, I > think we're going to find ourselves very unhappy with that aspect of > things. But maybe now is not the time to start worrying. That problem > isn't unique to analyze, and giving out-of-core tableams the > flexibility to do what they want is better than not. I think that can partially be addressed by having more "block oriented AM" helpers in core, like we have for table_block_parallelscan*. Doesn't work for everything, but should for something like analyze. Greetings, Andres Freund
Re: Table AM Interface Enhancements
On 2024-04-16 13:33:53 +0300, Alexander Korotkov wrote: > Reverted. Thanks!
Re: Table AM Interface Enhancements
On Tue, Apr 16, 2024 at 6:52 AM Alexander Korotkov wrote: > Taking a closer look at acquire_sample_rows(), I think it would be > good if table AM implementation would care about block-level (or > whatever-level) sampling. So that acquire_sample_rows() just fetches > tuples one-by-one from table AM implementation without any care about > blocks. Possible table_beginscan_analyze() could take an argument of > target number of tuples, then those tuples are just fetches with > table_scan_analyze_next_tuple(). What do you think? Andres is the expert here, but FWIW, that plan seems reasonable to me. One downside is that every block-based tableam is going to end up with a very similar implementation, which is kind of something I don't like about the tableam API in general: if you want to make something that is basically heap plus a little bit of special sauce, you have to copy a mountain of code. Right now we don't really care about that problem, because we don't have any other tableams in core, but if we ever do, I think we're going to find ourselves very unhappy with that aspect of things. But maybe now is not the time to start worrying. That problem isn't unique to analyze, and giving out-of-core tableams the flexibility to do what they want is better than not. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Table AM Interface Enhancements
On Tue, 16 Apr 2024 at 14:52, Alexander Korotkov wrote:
> On Mon, Apr 15, 2024 at 11:17 PM Andres Freund wrote:
> > On 2024-04-15 16:02:00 -0400, Robert Haas wrote:
> > > Do you want that patch applied, not applied, or applied with some set of modifications?
> >
> > I think we should apply Alexander's proposed revert and then separately discuss what we should do about 041b96802ef.
>
> Taking a closer look at acquire_sample_rows(), I think it would be good if the table AM implementation would care about block-level (or whatever-level) sampling, so that acquire_sample_rows() just fetches tuples one-by-one from the table AM implementation without any care about blocks. Possibly table_beginscan_analyze() could take an argument for the target number of tuples; then those tuples are just fetched with table_scan_analyze_next_tuple(). What do you think?

Hi, Alexander!

I like the idea of splitting abstraction levels for:
1. acquirefuncs (FDW or physical table)
2. new specific row fetch functions (akin to the existing _scan_analyze_next_tuple()) that could be AM-specific

Then scan_analyze_next_block(), or another iteration algorithm, would be contained inside the table AM implementation of _scan_analyze_next_tuple(). Likewise, the init of the scan state would be inside the table AM implementation of _beginscan_analyze(). The scan state (like BlockSamplerData, or other state that could be custom for the AM) could be transferred from _beginscan_analyze() to _scan_analyze_next_tuple() by some opaque AM-specific data structure. If so, we may also need an AM-specific table_endscan_analyze to clean it up.

Regards,
Pavel
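A rough standalone sketch of that shape, as a toy in C (all names invented; this is not proposed code): _beginscan_analyze() hands back an opaque AM-specific state, _scan_analyze_next_tuple() consumes it, and a matching _endscan_analyze() tears it down, so analyze.c never sees the AM's sampler internals.

```c
#include <assert.h>
#include <stdlib.h>

/* Opaque handle: analyze.c only passes it around. */
typedef void *AnalyzeScanState;

/* AM-private state; stands in for BlockSamplerData or anything else
 * a particular AM needs between calls. */
typedef struct
{
    int         next;
    int         limit;
} MyAmAnalyzeState;

/* Models table_beginscan_analyze(): the AM sets up its own sampler
 * state and returns it opaquely. */
static AnalyzeScanState
myam_beginscan_analyze(int target_tuples)
{
    MyAmAnalyzeState *st = malloc(sizeof(MyAmAnalyzeState));

    st->next = 0;
    st->limit = target_tuples;
    return st;
}

/* Models table_scan_analyze_next_tuple(): returns the next sampled
 * "tuple" (an int here), or -1 when the sample is complete. */
static int
myam_scan_analyze_next_tuple(AnalyzeScanState opaque)
{
    MyAmAnalyzeState *st = opaque;

    return (st->next < st->limit) ? st->next++ : -1;
}

/* Models the suggested table_endscan_analyze(): only the AM knows how
 * to tear down its state. */
static void
myam_endscan_analyze(AnalyzeScanState opaque)
{
    free(opaque);
}
```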
Re: Table AM Interface Enhancements
On Mon, Apr 15, 2024 at 11:17 PM Andres Freund wrote:
> On 2024-04-15 16:02:00 -0400, Robert Haas wrote:
> > Do you want that patch applied, not applied, or applied with some set of modifications?
>
> I think we should apply Alexander's proposed revert and then separately discuss what we should do about 041b96802ef.

Taking a closer look at acquire_sample_rows(), I think it would be good if the table AM implementation would care about block-level (or whatever-level) sampling, so that acquire_sample_rows() just fetches tuples one-by-one from the table AM implementation without any care about blocks. Possibly table_beginscan_analyze() could take an argument for the target number of tuples; then those tuples are just fetched with table_scan_analyze_next_tuple(). What do you think?

--
Regards,
Alexander Korotkov
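The shape proposed here can be sketched as a small standalone C model (purely illustrative; the _model names are invented and this is not a patch): acquire_sample_rows() reduces to a plain fetch loop, and all block-level sampling decisions live behind the AM's begin/next callbacks.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy scan state: the AM tracks how many sample tuples analyze asked
 * for and how many it has produced. */
typedef struct
{
    int         targrows;       /* target sample size from analyze */
    int         delivered;      /* tuples produced so far */
} AnalyzeScanModel;

/* Models table_beginscan_analyze(targrows): a real AM would set up its
 * block-level (or whatever-level) sampler here. */
static void
beginscan_analyze_model(AnalyzeScanModel *scan, int targrows)
{
    scan->targrows = targrows;
    scan->delivered = 0;
}

/* Models table_scan_analyze_next_tuple(): the AM decides internally
 * which tuple to hand back next and when the sample is complete. */
static bool
scan_analyze_next_tuple_model(AnalyzeScanModel *scan, int *tuple_out)
{
    if (scan->delivered >= scan->targrows)
        return false;
    *tuple_out = scan->delivered++; /* stand-in for a sampled tuple */
    return true;
}

/* acquire_sample_rows() becomes a fetch loop with no block awareness. */
static int
acquire_sample_rows_model(int targrows)
{
    AnalyzeScanModel scan;
    int         tuple;
    int         nrows = 0;

    beginscan_analyze_model(&scan, targrows);
    while (scan_analyze_next_tuple_model(&scan, &tuple))
        nrows++;
    return nrows;
}
```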
Re: Table AM Interface Enhancements
On Mon, Apr 15, 2024 at 11:11 PM Andres Freund wrote: > On 2024-04-12 01:04:03 +0300, Alexander Korotkov wrote: > > 1) If we just apply my revert patch and leave c6fc50cb4028 and > > 041b96802ef in the tree, then we get our table AM API narrowed. As > > you expressed the current API requires block numbers to be 1:1 with > > the actual physical on-disk location [2]. Not a secret I think the > > current API is quite restrictive. And we're getting the ANALYZE > > interface narrower than it was since 737a292b5de. Frankly speaking, I > > don't think this is acceptable. > > As others already pointed out, c6fc50cb4028 was committed quite a while > ago. I'm fairly unhappy about c6fc50cb4028, fwiw, but didn't realize that > until it was too late. +1 > > In token of all of the above, is the in-tree state that bad? (if we > > abstract the way 27bc1772fc and dd1f6b0c17 were committed). > > To me the 27bc1772fc doesn't make much sense on its own. You added calls > directly to heapam internals to a file in src/backend/commands/, that just > doesn't make sense. > > Leaving that aside, I think the interface isn't good on its own: > table_relation_analyze() doesn't actually do anything, it just sets callbacks, > that then later are called from analyze.c, which doesn't at all fit to the > name of the callback/function. I realize that this is kinda cribbed from the > FDW code, but I don't think that is a particularly good excuse. > > I don't think dd1f6b0c17 improves the situation, at all. 
> It sets global variables to redirect how an individual acquire_sample_rows invocation works:
>
> void
> block_level_table_analyze(Relation relation,
>                           AcquireSampleRowsFunc *func,
>                           BlockNumber *totalpages,
>                           BufferAccessStrategy bstrategy,
>                           ScanAnalyzeNextBlockFunc scan_analyze_next_block_cb,
>                           ScanAnalyzeNextTupleFunc scan_analyze_next_tuple_cb)
> {
>     *func = acquire_sample_rows;
>     *totalpages = RelationGetNumberOfBlocks(relation);
>     vac_strategy = bstrategy;
>     scan_analyze_next_block = scan_analyze_next_block_cb;
>     scan_analyze_next_tuple = scan_analyze_next_tuple_cb;
> }
>
> Notably it does so within the ->relation_analyze tableam callback, which does *NOT* actually do anything other than returning a callback. So if ->relation_analyze() for another relation is called, the acquire_sample_rows() for the earlier relation will do something different. Note that this isn't a theoretical risk; acquire_inherited_sample_rows() actually collects the acquirefunc for all the inherited relations before calling acquirefunc.

You're right. No sense trying to fix this. Reverted.

--
Regards,
Alexander Korotkov
Re: Table AM Interface Enhancements
Hi, On 2024-04-15 16:02:00 -0400, Robert Haas wrote: > On Mon, Apr 15, 2024 at 3:47 PM Andres Freund wrote: > > That said, I don't like the state after applying > > https://postgr.es/m/CAPpHfdvuT6DnguzaV-M1UQ2whYGDojaNU%3D-%3DiHc0A7qo9HBEJw%40mail.gmail.com > > because there's too much coupling. Hence talking about needing to iterate on > > the interface in some form, earlier in the thread. > > Mmph, I can't follow what the actual state of things is here. Are we > waiting for Alexander to push that patch? Is he waiting for somebody > to sign off on that patch? I think Alexander is arguing that we shouldn't revert 27bc1772fc & dd1f6b0c17 in 17. I already didn't think that was an option, because I didn't like the added interfaces, but now am even more certain, given how broken dd1f6b0c17 seems to be: https://postgr.es/m/20240415201057.khoyxbwwxfgzomeo%40awork3.anarazel.de > Do you want that patch applied, not applied, or applied with some set of > modifications? I think we should apply Alexander's proposed revert and then separately discuss what we should do about 041b96802ef. > I find the discussion of "too much coupling" too abstract. I want to > get down to specific proposals for what we should change, or not > change. I think it's a bit hard to propose something concrete until we've decided whether we'll revert 27bc1772fc & dd1f6b0c17. Greetings, Andres Freund
Re: Table AM Interface Enhancements
Hi, On 2024-04-12 01:04:03 +0300, Alexander Korotkov wrote: > 1) If we just apply my revert patch and leave c6fc50cb4028 and > 041b96802ef in the tree, then we get our table AM API narrowed. As > you expressed the current API requires block numbers to be 1:1 with > the actual physical on-disk location [2]. Not a secret I think the > current API is quite restrictive. And we're getting the ANALYZE > interface narrower than it was since 737a292b5de. Frankly speaking, I > don't think this is acceptable. As others already pointed out, c6fc50cb4028 was committed quite a while ago. I'm fairly unhappy about c6fc50cb4028, fwiw, but didn't realize that until it was too late. > In token of all of the above, is the in-tree state that bad? (if we > abstract the way 27bc1772fc and dd1f6b0c17 were committed). To me the 27bc1772fc doesn't make much sense on its own. You added calls directly to heapam internals to a file in src/backend/commands/, that just doesn't make sense. Leaving that aside, I think the interface isn't good on its own: table_relation_analyze() doesn't actually do anything, it just sets callbacks, that then later are called from analyze.c, which doesn't at all fit to the name of the callback/function. I realize that this is kinda cribbed from the FDW code, but I don't think that is a particularly good excuse. I don't think dd1f6b0c17 improves the situation, at all. 
It sets global variables to redirect how an individual acquire_sample_rows invocation works:

void
block_level_table_analyze(Relation relation,
                          AcquireSampleRowsFunc *func,
                          BlockNumber *totalpages,
                          BufferAccessStrategy bstrategy,
                          ScanAnalyzeNextBlockFunc scan_analyze_next_block_cb,
                          ScanAnalyzeNextTupleFunc scan_analyze_next_tuple_cb)
{
    *func = acquire_sample_rows;
    *totalpages = RelationGetNumberOfBlocks(relation);
    vac_strategy = bstrategy;
    scan_analyze_next_block = scan_analyze_next_block_cb;
    scan_analyze_next_tuple = scan_analyze_next_tuple_cb;
}

Notably it does so within the ->relation_analyze tableam callback, which does *NOT* actually do anything other than returning a callback. So if ->relation_analyze() for another relation is called, the acquire_sample_rows() for the earlier relation will do something different. Note that this isn't a theoretical risk; acquire_inherited_sample_rows() actually collects the acquirefunc for all the inherited relations before calling acquirefunc.

This is honestly leaving me somewhat speechless.

Greetings,
Andres Freund
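As a standalone illustration of why the global-variable registration is hazardous, here is a toy C model (all names invented; this is not the actual analyze.c code): registering callbacks for a second relation silently redirects the acquirefunc that was already collected for the first one — the inherited-sampling scenario described above.

```c
#include <assert.h>

typedef int (*NextTupleFunc) (void);

/* The problematic piece: a file-level global holding the callback. */
static NextTupleFunc scan_analyze_next_tuple_g;

static int
heap_next_tuple(void)
{
    return 1;                   /* pretend: a tuple from the heap AM */
}

static int
other_next_tuple(void)
{
    return 2;                   /* pretend: a tuple from some other AM */
}

/* Models acquire_sample_rows(): it uses whatever the global points at
 * *at call time*, not at registration time. */
static int
acquire_sample_rows_model(void)
{
    return scan_analyze_next_tuple_g();
}

/* Models block_level_table_analyze(): registers via the global. */
static void
relation_analyze_model(NextTupleFunc cb)
{
    scan_analyze_next_tuple_g = cb;
}
```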
Re: Table AM Interface Enhancements
On Mon, Apr 15, 2024 at 3:47 PM Andres Freund wrote: > That said, I don't like the state after applying > https://postgr.es/m/CAPpHfdvuT6DnguzaV-M1UQ2whYGDojaNU%3D-%3DiHc0A7qo9HBEJw%40mail.gmail.com > because there's too much coupling. Hence talking about needing to iterate on > the interface in some form, earlier in the thread. Mmph, I can't follow what the actual state of things is here. Are we waiting for Alexander to push that patch? Is he waiting for somebody to sign off on that patch? Do you want that patch applied, not applied, or applied with some set of modifications? I find the discussion of "too much coupling" too abstract. I want to get down to specific proposals for what we should change, or not change. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Table AM Interface Enhancements
Hi, On 2024-04-15 23:14:01 +0400, Pavel Borisov wrote: > Why it makes a difference looks a little bit unclear to me, I can't comment > on this. I noticed that before 041b96802ef we had a block number and block > sampler state that tied acquire_sample_rows() to the actual block > structure. That, and the prefetch calls actually translating the block numbers 1:1 to physical locations within the underlying file. And before 041b96802ef they were tied much more closely by the direct calls to heapam added in 27bc1772fc81. > After we have the whole struct ReadStream which doesn't comprise just a > wrapper for the same variables, but the state that ties > acquire_sample_rows() to the streaming read algorithm (and heap). Yes ... ? I don't see how that is a meaningful difference to the state as of 27bc1772fc81. Nor fundamentally worse than the state 27bc1772fc81^, given that we already issued requests for specific blocks in the file. That said, I don't like the state after applying https://postgr.es/m/CAPpHfdvuT6DnguzaV-M1UQ2whYGDojaNU%3D-%3DiHc0A7qo9HBEJw%40mail.gmail.com because there's too much coupling. Hence talking about needing to iterate on the interface in some form, earlier in the thread. What are you actually arguing for here? Greetings, Andres Freund
Re: Table AM Interface Enhancements
On Mon, 15 Apr 2024 at 22:09, Robert Haas wrote:
> On Mon, Apr 15, 2024 at 12:37 PM Pavel Borisov wrote:
> > In my understanding, the downside of 041b96802ef is bringing read_stream* things from being heap-only-related up to the level of acquire_sample_rows() that is not supposed to be tied to heap. And changing *_analyze_next_block() function signature to use ReadStream explicitly in the signature.
>
> I don't think that really clarifies anything. The ReadStream is basically just acting as a wrapper for a stream of block numbers, and the API took a BlockNumber before. So why does it make any difference?
>
> If I understand correctly, Alexander thinks that, before 041b96802ef, the block number didn't necessarily have to be the physical block number on disk, but could instead be any 32-bit quantity that the table AM wanted to pack into the block number. But I don't think that's true, because acquire_sample_rows() was already passing those block numbers to PrefetchBuffer(), which already requires physical block numbers.

Hi, Robert!

Why it makes a difference is a little unclear to me, so I can't comment on this. I noticed that before 041b96802ef we had a block number and block sampler state that tied acquire_sample_rows() to the actual block structure. After it, we have the whole struct ReadStream, which doesn't comprise just a wrapper for the same variables, but state that ties acquire_sample_rows() to the streaming read algorithm (and heap). Yes, we don't have access methods other than heap implemented for the analyze routine, so the patch works anyway; but viewing acquire_sample_rows() as a general method that is intended to have different implementations in the future, it doesn't look good. That is my impression of 041b96802ef; please forgive me if I haven't understood something.

Regards,
Pavel Borisov
Supabase
Re: Table AM Interface Enhancements
On Mon, Apr 15, 2024 at 12:41 PM Nazir Bilal Yavuz wrote: > I agree with you but I did not understand one thing. If out-of-core > AMs are used, does not all block sampling logic (BlockSampler_Init(), > BlockSampler_Next() etc.) need to be edited as well since these > functions assume block numbers are actual physical on-disk location, > right? I mean if the block number is something different than the > actual physical on-disk location, the acquire_sample_rows() function > looks wrong to me before c6fc50cb4028 as well. Yes, this is also a problem with trying to use non-physical block numbers. We can hypothesize an AM where it works out OK in practice, say because there are always exactly the same number of logical block numbers as there are physical block numbers. Or, because there are always more logical block numbers than physical block numbers, but for some reason the table AM author doesn't care because they believe that in the target use case for their AM the data distribution will be sufficiently uniform that sampling only low-numbered blocks won't really hurt anything. But that does seem a bit strained. In practice, I suspect that table AMs that use logical block numbers might want to replace this line from acquire_sample_rows() with a call to a tableam method that returns the number of logical blocks: totalblocks = RelationGetNumberOfBlocks(onerel); But even that does not seem like enough, because my guess would be that a lot of table AMs would end up with a sparse logical block space. For instance, you might create a logical block number sequence that starts at 0 and just counts up towards 2^32 and eventually either wraps around or errors out. Each new tuple gets the next TID that isn't yet used. 
Well, what's going to happen eventually in a lot of workloads is that the low-numbered logical blocks are going to be mostly or entirely empty, and the data is going to be clustered in the ones that are nearer to the highest logical block number that's so far been assigned. So, then, as you say, you'd want to replace the whole BlockSampler thing entirely. That said, I find it a little bit hard to know what people are already doing or realistically might try to do with table AMs. If somebody says they have a table AM where the number of logical block numbers equals the number of physical block numbers (or is somewhat larger but in a way that doesn't really matter) and the existing block sampling logic works well enough, I can't really disprove that. It puts awfully tight limits on what the AM can be doing, but, OK, sometimes people want to develop AMs for very specific purposes. However, because of the prefetching thing, I think even that fairly narrow use case was already broken before 041b96802efa33d2bc9456f2ad946976b92b5ae1. So I just don't really see how that commit made anything worse in any way that really matters. But maybe it did. People often find extremely creative ways of working around the limitations of the core interfaces. I think it could be the case that someone found a clever way of dodging all of these problems and had something that was working well enough that they were happy with it, and now they can't make it work after the changes for some reason. If that someone is reading this thread and wants to spell that out, we can consider whether there's some relief that we could give to that person, *especially* if they can demonstrate that they raised the alarm before the commit went in. But in the absence of that, my current belief is that nonphysical block numbers were never a supported scenario; hence, the idea that 041b96802efa33d2bc9456f2ad946976b92b5ae1 should be reverted for de-supporting them ought to be rejected. 
-- Robert Haas EDB: http://www.enterprisedb.com
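The sparse-logical-block concern above can be modeled in a few lines of standalone C (invented numbers, not any real AM): if an AM hands out logical block numbers counting up from 0 and older rows get deleted, live data clusters near the high end of the logical block space, so a sampler that only visits low-numbered "blocks" sees nothing.

```c
#include <assert.h>

#define NLOGICAL 1000           /* size of the logical block space */

/* Pretend blocks 0..899 were filled and later emptied; the live data
 * sits in logical blocks 900..999, ten tuples each. */
static int
tuples_in_block(int blkno)
{
    return (blkno >= 900) ? 10 : 0;
}

/* Sample the first nblocks logical blocks, as a physical-block-style
 * sampler capped by a too-small "totalblocks" would. */
static int
sampled_tuples(int nblocks)
{
    int         total = 0;

    for (int b = 0; b < nblocks && b < NLOGICAL; b++)
        total += tuples_in_block(b);
    return total;
}
```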
Re: Table AM Interface Enhancements
On Mon, Apr 15, 2024 at 12:37 PM Pavel Borisov wrote: > In my understanding, the downside of 041b96802ef is bringing read_stream* > things from being heap-only-related up to the level of acquire_sample_rows() > that is not supposed to be tied to heap. And changing *_analyze_next_block() > function signature to use ReadStream explicitly in the signature. I don't think that really clarifies anything. The ReadStream is basically just acting as a wrapper for a stream of block numbers, and the API took a BlockNumber before. So why does it make any difference? If I understand correctly, Alexander thinks that, before 041b96802ef, the block number didn't necessarily have to be the physical block number on disk, but could instead be any 32-bit quantity that the table AM wanted to pack into the block number. But I don't think that's true, because acquire_sample_rows() was already passing those block numbers to PrefetchBuffer(), which already requires physical block numbers. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Table AM Interface Enhancements
Hi, On Mon, 15 Apr 2024 at 18:36, Robert Haas wrote: > > On Sat, Apr 13, 2024 at 5:28 AM Alexander Korotkov > wrote: > > Yes, I think so. Table AM API deals with TIDs and block numbers, but > > doesn't force on what they actually mean. For example, in ZedStore > > [1], data is stored on per-column B-trees, where TID used in table AM > > is just a logical key of that B-trees. Similarly, blockNumber is a > > range for B-trees. > > > > c6fc50cb4028 and 041b96802ef are putting to acquire_sample_rows() an > > assumption that we are sampling physical blocks as they are stored in > > data files. That couldn't anymore be some "logical" block numbers > > with meaning only table AM implementation knows. That was pointed out > > by Andres [2]. I'm not sure if ZedStore is alive, but there could be > > other table AM implementations like this, or other implementations in > > development, etc. Anyway, I don't feel good about narrowing the API, > > which is there from pg12. > > I spent some time looking at this. I think it's valid to complain > about the tighter coupling, but c6fc50cb4028 is there starting in v14, > so I don't think I understand why the situation after 041b96802ef is > materially worse than what we've had for the last few releases. I > think it is worse in the sense that, before, you could dodge the > problem without defining USE_PREFETCH, and now you can't, but I don't > think we can regard nonphysical block numbers as a supported scenario > on that basis. I agree with you but I did not understand one thing. If out-of-core AMs are used, does not all block sampling logic (BlockSampler_Init(), BlockSampler_Next() etc.) need to be edited as well since these functions assume block numbers are actual physical on-disk location, right? I mean if the block number is something different than the actual physical on-disk location, the acquire_sample_rows() function looks wrong to me before c6fc50cb4028 as well. -- Regards, Nazir Bilal Yavuz Microsoft
Re: Table AM Interface Enhancements
On Mon, 15 Apr 2024 at 19:36, Robert Haas wrote: > On Sat, Apr 13, 2024 at 5:28 AM Alexander Korotkov > wrote: > > Yes, I think so. Table AM API deals with TIDs and block numbers, but > > doesn't force on what they actually mean. For example, in ZedStore > > [1], data is stored on per-column B-trees, where TID used in table AM > > is just a logical key of that B-trees. Similarly, blockNumber is a > > range for B-trees. > > > > c6fc50cb4028 and 041b96802ef are putting to acquire_sample_rows() an > > assumption that we are sampling physical blocks as they are stored in > > data files. That couldn't anymore be some "logical" block numbers > > with meaning only table AM implementation knows. That was pointed out > > by Andres [2]. I'm not sure if ZedStore is alive, but there could be > > other table AM implementations like this, or other implementations in > > development, etc. Anyway, I don't feel good about narrowing the API, > > which is there from pg12. > > I spent some time looking at this. I think it's valid to complain > about the tighter coupling, but c6fc50cb4028 is there starting in v14, > so I don't think I understand why the situation after 041b96802ef is > materially worse than what we've had for the last few releases. I > think it is worse in the sense that, before, you could dodge the > problem without defining USE_PREFETCH, and now you can't, but I don't > think we can regard nonphysical block numbers as a supported scenario > on that basis. > > But maybe I'm not correctly understanding the situation? > Hi, Robert! In my understanding, the downside of 041b96802ef is bringing read_stream* things from being heap-only-related up to the level of acquire_sample_rows() that is not supposed to be tied to heap. And changing *_analyze_next_block() function signature to use ReadStream explicitly in the signature. Regards, Pavel.
Re: Table AM Interface Enhancements
On Sat, Apr 13, 2024 at 5:28 AM Alexander Korotkov wrote: > Yes, I think so. Table AM API deals with TIDs and block numbers, but > doesn't force on what they actually mean. For example, in ZedStore > [1], data is stored on per-column B-trees, where TID used in table AM > is just a logical key of that B-trees. Similarly, blockNumber is a > range for B-trees. > > c6fc50cb4028 and 041b96802ef are putting to acquire_sample_rows() an > assumption that we are sampling physical blocks as they are stored in > data files. That couldn't anymore be some "logical" block numbers > with meaning only table AM implementation knows. That was pointed out > by Andres [2]. I'm not sure if ZedStore is alive, but there could be > other table AM implementations like this, or other implementations in > development, etc. Anyway, I don't feel good about narrowing the API, > which is there from pg12. I spent some time looking at this. I think it's valid to complain about the tighter coupling, but c6fc50cb4028 is there starting in v14, so I don't think I understand why the situation after 041b96802ef is materially worse than what we've had for the last few releases. I think it is worse in the sense that, before, you could dodge the problem without defining USE_PREFETCH, and now you can't, but I don't think we can regard nonphysical block numbers as a supported scenario on that basis. But maybe I'm not correctly understanding the situation? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Table AM Interface Enhancements
Hi, Melanie! On Fri, Apr 12, 2024 at 8:48 PM Melanie Plageman wrote: > > On Thu, Apr 11, 2024 at 6:04 PM Alexander Korotkov > wrote: > > > > On Fri, Apr 12, 2024 at 12:04 AM Andres Freund wrote: > > > On 2024-04-11 20:46:02 +0300, Alexander Korotkov wrote: > > > > I hope this work is targeting pg18. > > > > > > I think anything of the scope discussed by Melanie would be very clearly > > > targeting 18. For 17, I don't know yet whether we should revert the > > > ANALYZE streaming read user (041b96802ef), just do a bit of comment > > > polishing, > > > or some other small change. > > > > > > One oddity is that before 041b96802ef, the opportunities for making the > > > interface cleaner were less apparent, because c6fc50cb4028 increased the > > > coupling between analyze.c and the way the table storage works. > > > > Thank you for pointing this out about c6fc50cb4028; I'd missed it. > > > > > > Otherwise, do I get this right that this post-feature-freeze work is on > > > > designing a new API? Yes, 27bc1772fc masked the problem. But it was > > > > committed on Mar 30. > > > > > > Note that there were versions of the patch that were targeting the > > > pre-27bc1772fc interface. > > > > Sure, I've checked this before writing. It looks quite similar to the > > result of applying my revert patch [1] to the head. > > > > Let me describe my view of the current situation. > > > > 1) If we just apply my revert patch and leave c6fc50cb4028 and > > 041b96802ef in the tree, then we get our table AM API narrowed. As > > you put it, the current API requires block numbers to be 1:1 with > > the actual physical on-disk location [2]. It's no secret that I think the > > current API is quite restrictive. And we're making the ANALYZE > > interface narrower than it has been since 737a292b5de. Frankly speaking, I > > don't think this is acceptable. > > > > 2) Pushing down the read stream and prefetch to the heap AM runs into > > difficulties [3], [4]. 
That's quite a significant piece of work to be > > done post-FF. > > I had operated under the assumption that we needed to push the > streaming read code into heap AM because that is what we did for > sequential scan, but now that I think about it, I don't see why we > would have to. Bilal's patch pre-27bc1772fc did not do this. But I > think the code in acquire_sample_rows() isn't more tied to heap AM > after 041b96802ef than it was before it. Are you of the opinion that > the code with 041b96802ef ties acquire_sample_rows() more closely to > heap format? Yes, I think so. The table AM API deals with TIDs and block numbers, but doesn't dictate what they actually mean. For example, in ZedStore [1], data is stored in per-column B-trees, where the TID used in the table AM is just a logical key into those B-trees. Similarly, a blockNumber is a key range in those B-trees. c6fc50cb4028 and 041b96802ef impose on acquire_sample_rows() the assumption that we are sampling physical blocks as they are stored in data files. Block numbers can no longer be "logical" values whose meaning only the table AM implementation knows. That was pointed out by Andres [2]. I'm not sure if ZedStore is still alive, but there could be other table AM implementations like this, or other implementations in development, etc. Anyway, I don't feel good about narrowing an API that has been there since pg12. Links. 1. https://www.pgcon.org/events/pgcon_2020/sessions/session/44/slides/13/Zedstore-PGCon2020-Virtual.pdf 2. https://www.postgresql.org/message-id/20240410212117.mxsldz2w6htrl36v%40awork3.anarazel.de -- Regards, Alexander Korotkov
Re: Table AM Interface Enhancements
On Thu, Apr 11, 2024 at 6:04 PM Alexander Korotkov wrote: > > On Fri, Apr 12, 2024 at 12:04 AM Andres Freund wrote: > > On 2024-04-11 20:46:02 +0300, Alexander Korotkov wrote: > > > I hope this work is targeting pg18. > > > > I think anything of the scope discussed by Melanie would be very clearly > > targeting 18. For 17, I don't know yet whether we should revert the > > ANALYZE streaming read user (041b96802ef), just do a bit of comment > > polishing, > > or some other small change. > > > > One oddity is that before 041b96802ef, the opportunities for making the > > interface cleaner were less apparent, because c6fc50cb4028 increased the > > coupling between analyze.c and the way the table storage works. > > Thank you for pointing this out about c6fc50cb4028; I'd missed it. > > > > Otherwise, do I get this right that this post-feature-freeze work is on > > > designing a new API? Yes, 27bc1772fc masked the problem. But it was > > > committed on Mar 30. > > > > Note that there were versions of the patch that were targeting the > > pre-27bc1772fc interface. > > Sure, I've checked this before writing. It looks quite similar to the > result of applying my revert patch [1] to the head. > > Let me describe my view of the current situation. > > 1) If we just apply my revert patch and leave c6fc50cb4028 and > 041b96802ef in the tree, then we get our table AM API narrowed. As > you put it, the current API requires block numbers to be 1:1 with > the actual physical on-disk location [2]. It's no secret that I think the > current API is quite restrictive. And we're making the ANALYZE > interface narrower than it has been since 737a292b5de. Frankly speaking, I > don't think this is acceptable. > > 2) Pushing down the read stream and prefetch to the heap AM runs into > difficulties [3], [4]. That's quite a significant piece of work to be > done post-FF. 
I had operated under the assumption that we needed to push the streaming read code into heap AM because that is what we did for sequential scan, but now that I think about it, I don't see why we would have to. Bilal's patch pre-27bc1772fc did not do this. But I think the code in acquire_sample_rows() isn't more tied to heap AM after 041b96802ef than it was before it. Are you of the opinion that the code with 041b96802ef ties acquire_sample_rows() more closely to heap format? - Melanie
Re: Table AM Interface Enhancements
On Thu, Apr 11, 2024 at 11:27 PM Robert Haas wrote: > On Thu, Apr 11, 2024 at 1:46 PM Alexander Korotkov > wrote: > > I understand that I'm the bad guy of this release, not sure if my > > opinion counts. > > > > But what is going on here? I hope this work is targeting pg18. > > Otherwise, do I get this right that this post feature-freeze works on > > designing a new API? Yes, 27bc1772fc masked the problem. But it was > > committed on Mar 30. So that couldn't justify why the proper API > > wasn't designed in time. Are we judging different commits with the > > same criteria? > > I mean, Andres already said that the cleanup was needed possibly in > 17, and possibly in 18. > > As far as fairness is concerned, you'll get no argument from me if you > say the streaming read stuff was all committed far later than it > should have been. I said that in the very first email I wrote on the > "post-feature freeze cleanup" thread. But if you're going to argue > that there's no opportunity for anyone to adjust patches that were > sideswiped by the reverts of your patches, and that if any such > adjustments seem advisable we should just revert the sideswiped > patches entirely, I don't agree with that, and I don't see why anyone > would agree with that. I think it's fine to have the discussion, and > if the result of that discussion is that somebody says "hey, we want > to do X in 17 for reason Y," then we can discuss that proposal on its > merits, taking into account the answers to questions like "why wasn't > this done before the freeze?" and "is that adjustment more or less > risky than just reverting?" and "how about we just leave it alone for > now and deal with it next release?". I don't think 041b96802e could be sideswiped by 27bc1772fc. The "Use streaming I/O in ANALYZE" patch has the same issue before 27bc1772fc, which was committed on Mar 30. So, in the worst case 27bc1772fc steals a week of work. 
I can imagine that, without 27bc1772fc, a new API could have been proposed days before FF. This means I saved patch authors from what, in my case, you call a "desperate rush". Huh! > > IMHO, 041b96802e should be just reverted. > > IMHO, it's too early to decide that, because we don't know what change > concretely is going to be proposed, and there has been no discussion > of why that change, whatever it is, belongs in this release or next > release. > > I understand that you're probably not feeling great about being asked > to revert a bunch of stuff here, and I do think it is a fair point to > make that we need to be even-handed and not overreact. Just because > you had some patches that had some problems doesn't mean that > everything that got touched by the reverts can or should be whacked > around a whole bunch more post-freeze, especially since that stuff was > *also* committed very late, in haste, way closer to feature freeze > than it should have been. At the same time, it's also important to > keep in mind that our goal here is not to punish people for being bad, > or to reward them for being good, or really to make any moral > judgements at all, but to produce a quality release. I'm sure that, > where possible, you'd prefer to fix bugs in a patch you committed > rather than revert the whole thing as soon as anyone finds any > problem. I would also prefer that, both for your patches, and for > mine. And everyone else deserves that same consideration. I expressed my thoughts about producing a better release without a desperate rush post-FF in my reply to Andres [2]. Links. 1. https://www.postgresql.org/message-id/CA%2BTgmobZUnJQaaGkuoeo22Sydf9%3DmX864W11yZKd6sv-53-aEQ%40mail.gmail.com 2. https://www.postgresql.org/message-id/CAPpHfdt%2BcCj6j6cR5AyBThP6SyDf6wxAz4dU-0NdXjfpiFca7Q%40mail.gmail.com -- Regards, Alexander Korotkov
Re: Table AM Interface Enhancements
On Fri, Apr 12, 2024 at 12:04 AM Andres Freund wrote: > On 2024-04-11 20:46:02 +0300, Alexander Korotkov wrote: > > I hope this work is targeting pg18. > > I think anything of the scope discussed by Melanie would be very clearly > targeting 18. For 17, I don't know yet whether we should revert the > ANALYZE streaming read user (041b96802ef), just do a bit of comment polishing, > or some other small change. > > One oddity is that before 041b96802ef, the opportunities for making the > interface cleaner were less apparent, because c6fc50cb4028 increased the > coupling between analyze.c and the way the table storage works. Thank you for pointing this out about c6fc50cb4028; I'd missed it. > > Otherwise, do I get this right that this post-feature-freeze work is on > > designing a new API? Yes, 27bc1772fc masked the problem. But it was > > committed on Mar 30. > > Note that there were versions of the patch that were targeting the > pre-27bc1772fc interface. Sure, I've checked this before writing. It looks quite similar to the result of applying my revert patch [1] to the head. Let me describe my view of the current situation. 1) If we just apply my revert patch and leave c6fc50cb4028 and 041b96802ef in the tree, then we get our table AM API narrowed. As you put it, the current API requires block numbers to be 1:1 with the actual physical on-disk location [2]. It's no secret that I think the current API is quite restrictive. And we're making the ANALYZE interface narrower than it has been since 737a292b5de. Frankly speaking, I don't think this is acceptable. 2) Pushing down the read stream and prefetch to the heap AM runs into difficulties [3], [4]. That's quite a significant piece of work to be done post-FF. In light of all of the above, is the in-tree state that bad? (Setting aside the way 27bc1772fc and dd1f6b0c17 were committed.) The in-tree state provides quite a general API for analyze, supporting even non-block storage. 
There is a way to reuse the existing acquire_sample_rows() for table AMs whose block numbers map 1:1 to the actual physical on-disk location. It requires some cleanup of comments and docs, but does not require us to redesign the API post-FF. Links. 1. https://www.postgresql.org/message-id/CAPpHfdvuT6DnguzaV-M1UQ2whYGDojaNU%3D-%3DiHc0A7qo9HBEJw%40mail.gmail.com 2. https://www.postgresql.org/message-id/20240410212117.mxsldz2w6htrl36v%40awork3.anarazel.de 3. https://www.postgresql.org/message-id/CAAKRu_ZxU6hucckrT1SOJxKfyN7q-K4KU1y62GhDwLBZWG%2BROg%40mail.gmail.com 4. https://www.postgresql.org/message-id/CAAKRu_YkphAPNbBR2jcLqnxGhDEWTKhYfLFY%3D0R_oG5LHBH7Gw%40mail.gmail.com -- Regards, Alexander Korotkov
Re: Table AM Interface Enhancements
Hi, On 2024-04-11 20:46:02 +0300, Alexander Korotkov wrote: > I hope this work is targeting pg18. I think anything of the scope discussed by Melanie would be very clearly targeting 18. For 17, I don't know yet whether we should revert the ANALYZE streaming read user (041b96802ef), just do a bit of comment polishing, or some other small change. One oddity is that before 041b96802ef, the opportunities for making the interface cleaner were less apparent, because c6fc50cb4028 increased the coupling between analyze.c and the way the table storage works. > Otherwise, do I get this right that this post-feature-freeze work is on > designing a new API? Yes, 27bc1772fc masked the problem. But it was > committed on Mar 30. Note that there were versions of the patch that were targeting the pre-27bc1772fc interface. Greetings, Andres Freund
Re: Table AM Interface Enhancements
On Thu, Apr 11, 2024 at 1:46 PM Alexander Korotkov wrote: > I understand that I'm the bad guy of this release, not sure if my > opinion counts. > > But what is going on here? I hope this work is targeting pg18. > Otherwise, do I get this right that this post feature-freeze works on > designing a new API? Yes, 27bc1772fc masked the problem. But it was > committed on Mar 30. So that couldn't justify why the proper API > wasn't designed in time. Are we judging different commits with the > same criteria? I mean, Andres already said that the cleanup was needed possibly in 17, and possibly in 18. As far as fairness is concerned, you'll get no argument from me if you say the streaming read stuff was all committed far later than it should have been. I said that in the very first email I wrote on the "post-feature freeze cleanup" thread. But if you're going to argue that there's no opportunity for anyone to adjust patches that were sideswiped by the reverts of your patches, and that if any such adjustments seem advisable we should just revert the sideswiped patches entirely, I don't agree with that, and I don't see why anyone would agree with that. I think it's fine to have the discussion, and if the result of that discussion is that somebody says "hey, we want to do X in 17 for reason Y," then we can discuss that proposal on its merits, taking into account the answers to questions like "why wasn't this done before the freeze?" and "is that adjustment more or less risky than just reverting?" and "how about we just leave it alone for now and deal with it next release?". > IMHO, 041b96802e should be just reverted. IMHO, it's too early to decide that, because we don't know what change concretely is going to be proposed, and there has been no discussion of why that change, whatever it is, belongs in this release or next release. 
I understand that you're probably not feeling great about being asked to revert a bunch of stuff here, and I do think it is a fair point to make that we need to be even-handed and not overreact. Just because you had some patches that had some problems doesn't mean that everything that got touched by the reverts can or should be whacked around a whole bunch more post-freeze, especially since that stuff was *also* committed very late, in haste, way closer to feature freeze than it should have been. At the same time, it's also important to keep in mind that our goal here is not to punish people for being bad, or to reward them for being good, or really to make any moral judgements at all, but to produce a quality release. I'm sure that, where possible, you'd prefer to fix bugs in a patch you committed rather than revert the whole thing as soon as anyone finds any problem. I would also prefer that, both for your patches, and for mine. And everyone else deserves that same consideration. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Table AM Interface Enhancements
On Thu, Apr 11, 2024 at 12:19 PM Melanie Plageman wrote: > > On Wed, Apr 10, 2024 at 5:21 PM Andres Freund wrote: > > > > Hi, > > > > On 2024-04-10 16:50:44 -0400, Melanie Plageman wrote: > > > This brings up a question about the prefetching. We never had to have > > > this discussion for sequential scan streaming read because it didn't > > > (and still doesn't) do prefetching. But, if we push the streaming read > > > code down into the heap AM layer, it will be doing the prefetching. > > > So, do we remove the prefetching from acquire_sample_rows() and expect > > > other table AMs to implement it themselves or use the streaming read > > > API? > > > > The prefetching added to acquire_sample_rows was quite narrowly tailored to > > something heap-like - it pretty much required block numbers to be 1:1 > > with the actual physical on-disk location for the specific AM. So I think > > it's pretty much required for this to be pushed down. > > > > Using a read stream is a few lines for something like this, so I'm not > > worried > > about it. I guess we could have a default implementation for block based > > AMs, > > similar to what we have around table_block_parallelscan_*, but not sure it's > > worth doing that, the complexity is much lower than in the > > table_block_parallelscan_ case. > > This makes sense. > > I am working on pushing streaming ANALYZE into heap AM code, and I ran > into a few roadblocks. > > If we want ANALYZE to make the ReadStream object in heap_beginscan() > (like the read stream implementation of heap sequential and TID range > scans do), I don't see any way around changing the scan_begin table AM > callback to take a BufferAccessStrategy at the least (and perhaps also > the BlockSamplerData). I will also say that, had this been 6 months ago, I would probably suggest we restructure ANALYZE's table AM interface to accommodate read stream setup and to address a few other things I find odd about the current code. 
For example, I think creating a scan descriptor for the analyze scan in acquire_sample_rows() is quite odd. It seems like it would be better done in the relation_analyze callback. The relation_analyze callback saves some state like the callbacks for acquire_sample_rows() and the Buffer Access Strategy. But at least in the heap implementation, it just saves them in static variables in analyze.c. It seems like it would be better to save them in a useful data structure that could be accessed later. We have access to pretty much everything we need at that point (in the relation_analyze callback). I also think heap's implementation of table_beginscan_analyze() doesn't need most of heap_beginscan()/initscan(), so doing this instead of something ANALYZE specific seems more confusing than helpful. - Melanie
Re: Table AM Interface Enhancements
Hi! On Thu, Apr 11, 2024 at 7:19 PM Melanie Plageman wrote: > On Wed, Apr 10, 2024 at 5:21 PM Andres Freund wrote: > > On 2024-04-10 16:50:44 -0400, Melanie Plageman wrote: > > > This brings up a question about the prefetching. We never had to have > > > this discussion for sequential scan streaming read because it didn't > > > (and still doesn't) do prefetching. But, if we push the streaming read > > > code down into the heap AM layer, it will be doing the prefetching. > > > So, do we remove the prefetching from acquire_sample_rows() and expect > > > other table AMs to implement it themselves or use the streaming read > > > API? > > > > The prefetching added to acquire_sample_rows was quite narrowly tailored to > > something heap-like - it pretty much required block numbers to be 1:1 > > with the actual physical on-disk location for the specific AM. So I think > > it's pretty much required for this to be pushed down. > > > > Using a read stream is a few lines for something like this, so I'm not > > worried > > about it. I guess we could have a default implementation for block based > > AMs, > > similar to what we have around table_block_parallelscan_*, but not sure it's > > worth doing that, the complexity is much lower than in the > > table_block_parallelscan_ case. > > This makes sense. > > I am working on pushing streaming ANALYZE into heap AM code, and I ran > into a few roadblocks. > > If we want ANALYZE to make the ReadStream object in heap_beginscan() > (like the read stream implementation of heap sequential and TID range > scans do), I don't see any way around changing the scan_begin table AM > callback to take a BufferAccessStrategy at the least (and perhaps also > the BlockSamplerData). > > read_stream_begin_relation() doesn't just save the > BufferAccessStrategy in the ReadStream; it uses it to set various > other things in the ReadStream object. 
callback_private_data (which in > ANALYZE's case is the BlockSamplerData) is simply saved in the > ReadStream, so it could be set later, but that doesn't sound very > clean to me. > > As such, it seems like a cleaner alternative would be to add a table > AM callback for creating a read stream object that takes the > parameters of read_stream_begin_relation(). But, perhaps it is a bit > late for such additions. > > It also opens us up to the question of whether or not sequential scan > should use such a callback instead of making the read stream object in > heap_beginscan(). > > I am happy to write a patch that does any of the above. But, I want to > raise these questions, because perhaps I am simply missing an obvious > alternative solution. I understand that I'm the bad guy of this release, not sure if my opinion counts. But what is going on here? I hope this work is targeting pg18. Otherwise, do I get this right that this post-feature-freeze work is on designing a new API? Yes, 27bc1772fc masked the problem. But it was committed on Mar 30. So that couldn't justify why the proper API wasn't designed in time. Are we judging different commits with the same criteria? IMHO, 041b96802e should be just reverted. -- Regards, Alexander Korotkov
Re: Table AM Interface Enhancements
On Thu, Apr 11, 2024 at 8:11 PM Jeff Davis wrote: > On Wed, 2024-04-10 at 15:19 +0300, Alexander Korotkov wrote: > > 1) 9bd99f4c26 comprises the reworked patch after working with notes > > from Jeff Davis. I agree it would be better to wait for him to > > express explicit agreement. Before reverting this, I would prefer to > > hear his opinion. > > On this particular feature, I had tried it in the past myself, and > there were a number of minor frustrations and I left it unfinished. I > quickly recognized that commit c95c25f9af was too simple to work. > > Commit 9bd99f4c26 looked substantially better, but I was surprised to > see it committed so soon after the redesign. I thought a revert was > the likely outcome, but I put it on my list of things to review more deeply > in the next couple weeks so I could give productive feedback. Thank you for your feedback, Jeff. > It would benefit from more discussion in v18, and I apologize for not > getting involved earlier when the patch still could have made it into > v17. I believe you don't have to apologize. It's definitely not your fault that I've committed this patch in this shape. -- Regards, Alexander Korotkov
Re: Table AM Interface Enhancements
On Wed, 2024-04-10 at 15:19 +0300, Alexander Korotkov wrote: > 1) 9bd99f4c26 comprises the reworked patch after working with notes > from Jeff Davis. I agree it would be better to wait for him to > express explicit agreement. Before reverting this, I would prefer to > hear his opinion. On this particular feature, I had tried it in the past myself, and there were a number of minor frustrations and I left it unfinished. I quickly recognized that commit c95c25f9af was too simple to work. Commit 9bd99f4c26 looked substantially better, but I was surprised to see it committed so soon after the redesign. I thought a revert was the likely outcome, but I put it on my list of things to review more deeply in the next couple weeks so I could give productive feedback. It would benefit from more discussion in v18, and I apologize for not getting involved earlier when the patch still could have made it into v17. Regards, Jeff Davis
Re: Table AM Interface Enhancements
On Wed, Apr 10, 2024 at 5:21 PM Andres Freund wrote: > > Hi, > > On 2024-04-10 16:50:44 -0400, Melanie Plageman wrote: > > This brings up a question about the prefetching. We never had to have > > this discussion for sequential scan streaming read because it didn't > > (and still doesn't) do prefetching. But, if we push the streaming read > > code down into the heap AM layer, it will be doing the prefetching. > > So, do we remove the prefetching from acquire_sample_rows() and expect > > other table AMs to implement it themselves or use the streaming read > > API? > > The prefetching added to acquire_sample_rows was quite narrowly tailored to > something heap-like - it pretty much required block numbers to be 1:1 > with the actual physical on-disk location for the specific AM. So I think > it's pretty much required for this to be pushed down. > > Using a read stream is a few lines for something like this, so I'm not worried > about it. I guess we could have a default implementation for block based AMs, > similar to what we have around table_block_parallelscan_*, but not sure it's > worth doing that, the complexity is much lower than in the > table_block_parallelscan_ case. This makes sense. I am working on pushing streaming ANALYZE into heap AM code, and I ran into a few roadblocks. If we want ANALYZE to make the ReadStream object in heap_beginscan() (like the read stream implementation of heap sequential and TID range scans do), I don't see any way around changing the scan_begin table AM callback to take a BufferAccessStrategy at the least (and perhaps also the BlockSamplerData). read_stream_begin_relation() doesn't just save the BufferAccessStrategy in the ReadStream; it uses it to set various other things in the ReadStream object. callback_private_data (which in ANALYZE's case is the BlockSamplerData) is simply saved in the ReadStream, so it could be set later, but that doesn't sound very clean to me. 
As such, it seems like a cleaner alternative would be to add a table AM callback for creating a read stream object that takes the parameters of read_stream_begin_relation(). But, perhaps it is a bit late for such additions. It also opens us up to the question of whether or not sequential scan should use such a callback instead of making the read stream object in heap_beginscan(). I am happy to write a patch that does any of the above. But, I want to raise these questions, because perhaps I am simply missing an obvious alternative solution. - Melanie
Re: Table AM Interface Enhancements
Hi Andres, On Wed, Apr 10, 2024 at 7:52 PM Andres Freund wrote: > On 2024-04-08 14:54:46 -0400, Robert Haas wrote: > > Exactly how much is getting reverted here? I see these, all since March > > 23rd: > > IMO: > > > > dd1f6b0c17 Provide a way block-level table AMs could re-use > > acquire_sample_rows() > > Should be reverted. > > > > 9bd99f4c26 Custom reloptions for table AM > > Hm. There are some oddities here: > > - It doesn't seem great that relcache.c now needs to know about the default > values for all kinds of reloptions. > > - why is there table_reloptions() and tableam_reloptions()? > > - Why does extractRelOptions() need a TableAmRoutine parameter, extracted by a > caller, instead of doing that work itself? > > > > > 97ce821e3e Fix the parameters order for > > TableAmRoutine.relation_copy_for_cluster() > > Shouldn't be, this is a clear fix. > > > > b1484a3f19 Let table AM insertion methods control index insertion > > I'm not sure. I'm not convinced this is right, nor the opposite. If the > tableam takes control of index insertion, shouldn't nodeModifyTuple know this > earlier, so it doesn't prepare a bunch of index insertion state? Also, > there's pretty much no motivating explanation in the commit. > > > > 27bc1772fc Generalize relation analyze in table AM interface > > Should be reverted. > > > > 87985cc925 Allow locking updated tuples in tuple_update() and tuple_delete() > > Strongly suspect this should be reverted. The last time this was committed it > was far from ready. It's very easy to cause corruption due to subtle bugs in > this area. > > > > c35a3fb5e0 Allow table AM tuple_insert() method to return the different slot > > If the AM returns a different slot, who is responsible for cleaning it up? And > how is creating a new slot for every insert not going to be a measurable > overhead? > > > > 02eb07ea89 Allow table AM to store complex data structures in rd_amcache > > I am doubtful this is right. 
Is it really sufficient to have a callback for > freeing? What happens when relcache entries are swapped as part of a rebuild? > That works for "flat" caches, but I don't immediately see how it works for > more complicated datastructures. At least from the commit message it's hard > to evaluate how this actually intended to be used. Thank you for your feedback. I've reverted all of above. -- Regards, Alexander Korotkov
Re: Table AM Interface Enhancements
Hi, On 2024-04-10 16:50:44 -0400, Melanie Plageman wrote: > This brings up a question about the prefetching. We never had to have > this discussion for sequential scan streaming read because it didn't > (and still doesn't) do prefetching. But, if we push the streaming read > code down into the heap AM layer, it will be doing the prefetching. > So, do we remove the prefetching from acquire_sample_rows() and expect > other table AMs to implement it themselves or use the streaming read > API? The prefetching added to acquire_sample_rows was quite narrowly tailored to something heap-like - it pretty much required block numbers to be 1:1 with the actual physical on-disk location for the specific AM. So I think it's pretty much required for this to be pushed down. Using a read stream is a few lines for something like this, so I'm not worried about it. I guess we could have a default implementation for block based AMs, similar to what we have around table_block_parallelscan_*, but not sure it's worth doing that, the complexity is much lower than in the table_block_parallelscan_ case. Greetings, Andres
Re: Table AM Interface Enhancements
On Wed, Apr 10, 2024 at 4:33 PM Andres Freund wrote: > > Hi, > > On 2024-04-10 16:24:40 -0400, Melanie Plageman wrote: > > This thread has been moving pretty fast, so could someone point out > > which version of the patch has the modifications to > > acquire_sample_rows() that would be relevant for Bilal (and others > > involved in analyze streaming read) to review? Is it > > v1-0001-revert-Generalize-relation-analyze-in-table-AM-in.patch? > > I think so. It's at least what I've been looking at. I took a look at this patch, and you're right we will need to do follow-on work with streaming ANALYZE. The streaming read code will have to be moved now that acquire_sample_rows() is table-AM agnostic again. I don't think there was ever a version that Bilal wrote where the streaming read code was outside of acquire_sample_rows(). By the time he got that review feedback, 27bc1772fc8 had gone in. This brings up a question about the prefetching. We never had to have this discussion for sequential scan streaming read because it didn't (and still doesn't) do prefetching. But, if we push the streaming read code down into the heap AM layer, it will be doing the prefetching. So, do we remove the prefetching from acquire_sample_rows() and expect other table AMs to implement it themselves or use the streaming read API? - Melanie
Re: Table AM Interface Enhancements
Hi, On 2024-04-10 16:24:40 -0400, Melanie Plageman wrote: > This thread has been moving pretty fast, so could someone point out > which version of the patch has the modifications to > acquire_sample_rows() that would be relevant for Bilal (and others > involved in analyze streaming read) to review? Is it > v1-0001-revert-Generalize-relation-analyze-in-table-AM-in.patch? I think so. It's at least what I've been looking at. Greetings, Andres Freund
Re: Table AM Interface Enhancements
On Wed, Apr 10, 2024 at 4:03 PM Andres Freund wrote: > > Hi, > > On 2024-04-10 15:19:47 +0300, Alexander Korotkov wrote: > > On Mon, Apr 8, 2024 at 9:54 PM Robert Haas wrote: > > > On Mon, Apr 8, 2024 at 12:33 PM Alexander Korotkov > > > wrote: > > > > Yes, it was my mistake. I got rushing trying to fit this to FF, even > > > > doing significant changes just before commit. > > > > I'll revert this later today. > > > > The patch to revert is attached. Given that revert touches the work > > done in 041b96802e, I think it needs some feedback before push. > > Hm. It's a bit annoying to revert it, you're right. I think on its own the > revert looks reasonable from what I've seen so far, will continue looking for > a bit. > > I think we'll need to do some cleanup of 041b96802e separately afterwards - > possibly in 17, possibly in 18. Particularly post-27bc1772fc8 > acquire_sample_rows() was tied hard to heapam, so it made sense for 041b96802e > to create the stream in acquire_sample_rows() and have > block_sampling_read_stream_next() be in analyze.c. But eventually that should > be in access/heap/. Compared to 16, the state post the revert does tie > analyze.c a bit closer to the internals of the AM than before, but I'm not > sure the increase matters. Yes in an earlier version of 041b96802e, I gave the review feedback that the read stream should be pushed down into heap-specific code, but then after 27bc1772fc8, Bilal took the approach of putting the read stream code in acquire_sample_rows() since that was no longer table AM-agnostic. This thread has been moving pretty fast, so could someone point out which version of the patch has the modifications to acquire_sample_rows() that would be relevant for Bilal (and others involved in analyze streaming read) to review? Is it v1-0001-revert-Generalize-relation-analyze-in-table-AM-in.patch? - Melanie
Re: Table AM Interface Enhancements
Hi, On 2024-04-10 15:19:47 +0300, Alexander Korotkov wrote: > On Mon, Apr 8, 2024 at 9:54 PM Robert Haas wrote: > > On Mon, Apr 8, 2024 at 12:33 PM Alexander Korotkov > > wrote: > > > Yes, it was my mistake. I got rushing trying to fit this to FF, even > > > doing significant changes just before commit. > > > I'll revert this later today. > > The patch to revert is attached. Given that revert touches the work > done in 041b96802e, I think it needs some feedback before push. Hm. It's a bit annoying to revert it, you're right. I think on its own the revert looks reasonable from what I've seen so far, will continue looking for a bit. I think we'll need to do some cleanup of 041b96802e separately afterwards - possibly in 17, possibly in 18. Particularly post-27bc1772fc8 acquire_sample_rows() was tied hard to heapam, so it made sense for 041b96802e to create the stream in acquire_sample_rows() and have block_sampling_read_stream_next() be in analyze.c. But eventually that should be in access/heap/. Compared to 16, the state post the revert does tie analyze.c a bit closer to the internals of the AM than before, but I'm not sure the increase matters. Greetings, Andres Freund
Re: Table AM Interface Enhancements
On Wed, Apr 10, 2024 at 05:42:51PM +0400, Pavel Borisov wrote: > Hi, Alexander! > In my view, the actual list of what has raised discussion is: > dd1f6b0c17 Provide a way block-level table AMs could re-use > acquire_sample_rows > () > 27bc1772fc Generalize relation analyze in table AM interface > > Proposals to revert the other patches in a wholesale way look to me like an > ill-performed continuation of a discussion [1]. I can't believe that "Let's For reference, this discussion was: I don't dispute that we could do better, and this is just a simplistic look based on "number of commits per day", but the attached does put it in perspective to some extent. > select which commits close to FF looks worse than the others" based on > whereabouts, not patch contents is a good and productive way for the community > to use. I don't know how you can say these patches are being questioned just because they are near the feature freeze (FF). There are clear concerns, and post-feature freeze is not the time to be evaluating which patches pushed near feature freeze need help. What is the huge rush for these patches, and if they were so important, why was this not done earlier? This can all wait until PG 18. If Supabase or someone else needs these patches for PG 17, they will need to create a patched version of PG 17 with these patches. > At the same time if Andres, who is the most experienced person in the scope of > access methods is willing to give his post-commit re-review of any of the > committed patches and will recommend some of them reverted, it would be a good > sensible input to act accordingly. > patch So the patches were rushed, have problems, and now we are requiring Andres to stop what he is doing to give immediate feedback --- that is not fair to him. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: Table AM Interface Enhancements
On Wed, Apr 10, 2024 at 1:25 PM Robert Haas wrote: > That is somewhat fair, but it is also a lot of work. There are > multiple people asking for you to revert things on multiple threads, > and figuring out all of the revert requests and trying to come to some > consensus about what should be done in each case is going to take an > enormous amount of time. I know you've done lots of good work on > PostgreSQL in the past and I respect that, but I think you also have > to realize that you're asking other people to spend a LOT of time > figuring out what to do about the current situation. I see Andres has > posted more specifically about what he thinks should happen to each of > the table AM patches and I am willing to defer to his opinion, but we > need to make some quick decisions here to either keep things or take > them out. Extensive reworks after feature freeze should not be an > option that is on the table; that's what makes it a freeze. Alexander has been sharply criticized for acting in haste, pushing work in multiple areas when it was clearly not ready. And that seems proportionate to me. I agree that he showed poor judgement in the past few months, and especially in the past few weeks. Not just on one occasion, but on several. That must have consequences. > I also do not think I really believe that there's been so much stuff > committed that a blanket revert would be all that hard to carry off, > if that were the option that the community ended up preferring. It seems to me that emotions are running high right now. I think that it would be a mistake to act in haste when determining next steps. It's very important, but it's not very urgent. I've known Alexander for about 15 years. I think that he deserves some consideration here. Say a week or two, to work through some of the more complicated issues -- and to take a breather. I just don't see any upside to rushing through this process, given where we are now. -- Peter Geoghegan
Re: Table AM Interface Enhancements
On Wed, Apr 10, 2024 at 12:36 PM Alexander Korotkov wrote: > But I have to mention that even that I've committed table AM stuff > close to the FF, there has been quite amount of depended work > committed. So, revert of these patches is promising to be not > something immediate and easy, which requires just the decision. It > would touch others work. And revert patches might also need > review. I get the point that patches got lack of consensus. But in > terms of efforts (not my efforts) it's probably makes sense to get > them some post-commit review. That is somewhat fair, but it is also a lot of work. There are multiple people asking for you to revert things on multiple threads, and figuring out all of the revert requests and trying to come to some consensus about what should be done in each case is going to take an enormous amount of time. I know you've done lots of good work on PostgreSQL in the past and I respect that, but I think you also have to realize that you're asking other people to spend a LOT of time figuring out what to do about the current situation. I see Andres has posted more specifically about what he thinks should happen to each of the table AM patches and I am willing to defer to his opinion, but we need to make some quick decisions here to either keep things or take them out. Extensive reworks after feature freeze should not be an option that is on the table; that's what makes it a freeze. I also do not think I really believe that there's been so much stuff committed that a blanket revert would be all that hard to carry off, if that were the option that the community ended up preferring. > Robert, look. Last year I went through the arrest for expressing my > opinion. That was not what a normal arrest should look like, but a period of survival. My family went through a period of fear, struggle and uncertainty. Now, we're healthy and safe, but there is still uncertainty given asylum seeker status. 
During all this period, I > have to just obey, agree with everything, lie that I apologize about > things I don't apologize. I had to do this, because the price of > expressing myself was not just my life, but also health, freedom and > well-being of my family. > > I owe you great respect for all your work for PostgreSQL, and > especially for your efforts on getting things organized. But it > wouldn't work the way you increase my potential punishment and I just > say that I'm obey and you're right about everything. You may even > initiate the procedure of my exclusion from committers (no idea what > the procedure is), ban from the list etc. I see you express many > valuable points, but my view is not exactly same as yours. And like a > conclusion to some as result of discussion not threats. > > I feel the sense of blame and fear in latest discussions, and I don't > like it. That's OK to place the blame from time to time. But I would > like to add here more joy and respect (and I'm sorry I personally > didn't do enough in this matter). It's important get things right > etc. But in long term relationships may mean more. I am not sure how to respond to this. On a personal level, I am sorry to hear that you were arrested and, if I can be of some help to you, we can discuss that off-list. However, if you're suggesting that there is some kind of equivalence between me criticizing your decisions about what to commit and someone in a position of authority putting you in jail, well, I don't think it's remotely fair to compare those things. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Table AM Interface Enhancements
Hi, On 2024-04-08 14:54:46 -0400, Robert Haas wrote: > Exactly how much is getting reverted here? I see these, all since March 23rd: IMO: > dd1f6b0c17 Provide a way block-level table AMs could re-use > acquire_sample_rows() Should be reverted. > 9bd99f4c26 Custom reloptions for table AM Hm. There are some oddities here: - It doesn't seem great that relcache.c now needs to know about the default values for all kinds of reloptions. - why is there table_reloptions() and tableam_reloptions()? - Why does extractRelOptions() need a TableAmRoutine parameter, extracted by a caller, instead of doing that work itself? > 97ce821e3e Fix the parameters order for > TableAmRoutine.relation_copy_for_cluster() Shouldn't be, this is a clear fix. > b1484a3f19 Let table AM insertion methods control index insertion I'm not sure. I'm not convinced this is right, nor the opposite. If the tableam takes control of index insertion, shouldn't nodeModifyTuple know this earlier, so it doesn't prepare a bunch of index insertion state? Also, there's pretty much no motivating explanation in the commit. > 27bc1772fc Generalize relation analyze in table AM interface Should be reverted. > 87985cc925 Allow locking updated tuples in tuple_update() and tuple_delete() Strongly suspect this should be reverted. The last time this was committed it was far from ready. It's very easy to cause corruption due to subtle bugs in this area. > c35a3fb5e0 Allow table AM tuple_insert() method to return the different slot If the AM returns a different slot, who is responsible for cleaning it up? And how is creating a new slot for every insert not going to be a measurable overhead? > 02eb07ea89 Allow table AM to store complex data structures in rd_amcache I am doubtful this is right. Is it really sufficient to have a callback for freeing? What happens when relcache entries are swapped as part of a rebuild? That works for "flat" caches, but I don't immediately see how it works for more complicated datastructures. 
At least from the commit message it's hard to evaluate how this is actually intended to be used. Greetings, Andres Freund
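The slot-ownership question Andres raises about c35a3fb5e0 (who is responsible for cleaning up a slot the AM returns, and what per-insert slot creation would cost) can be illustrated with a self-contained toy in C. This is not the actual TupleTableSlot machinery; all names here are hypothetical, and the sketch models just one plausible answer: the AM hands back a single reusable slot that it owns, so the caller never frees anything it did not allocate.

```c
#include <stdbool.h>

/* Toy tuple slot: records only who owns (and must free) it. */
typedef struct
{
    bool am_owned;          /* allocated and released by the AM? */
} ToySlot;

/* Toy AM state with one reusable, AM-owned slot. */
typedef struct
{
    ToySlot native_slot;
} ToyAm;

/*
 * Toy tuple_insert: may hand back a slot other than the caller's, but
 * always the same AM-owned one, so there is no per-insert allocation
 * and cleanup responsibility stays with the AM.
 */
static ToySlot *
toy_tuple_insert(ToyAm *am, ToySlot *caller_slot)
{
    (void) caller_slot;     /* a real AM would copy the tuple from it */
    am->native_slot.am_owned = true;
    return &am->native_slot;
}
```

Under this reading, the "measurable overhead" concern only materializes if an AM allocates a fresh slot per insert rather than reusing one; whether the committed interface actually pins down that contract is exactly what the review questions.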
Re: Table AM Interface Enhancements
On Wed, Apr 10, 2024 at 4:19 PM Robert Haas wrote: > > On Wed, Apr 10, 2024 at 8:20 AM Alexander Korotkov > wrote: > > I agree with the facts. But I have a different interpretation on > > this. The patch was committed as 11470f544e on March 23, 2023, then > > reverted on April 3. I've proposed the revised version, but Andres > > complained that this is the new API design days before FF. > > Well, his first complaint that your committed patch was full of bugs: > > https://www.postgresql.org/message-id/20230323003003.plgaxjqahjgkuxrk%40awork3.anarazel.de > > When you commit a patch and another committer writes a post-commit > review saying that your patch has so many serious problems that he > gave up on reviewing before enumerating all of them, that's a really > bad sign. That should be a strong signal to you to step back and take > a close look at whether you really understand the area of the code > that you're touching well enough to be doing whatever it is that > you're doing. If I got a review like that, I would have reverted the > patch instantly, given up for the release cycle, possibly given up on > the patch permanently, and most definitely not tried again to commit > unless I was absolutely certain that I'd learned a lot in the meantime > *and* had the agreement of the committer who wrote that review (or > maybe some other committer who was acknowledged as an expert in that > area of the code). > > What you did instead is try to do a bunch of post-commit fixup in a > desperate rush right before feature freeze, to which Andres > understandably objected. But that was your second mistake, not your > first one. > > > Then the > > patch with this design was published in the thread for the year with > > periodical rebases. So, I think I expressed my intention with that > > design before 2023 FF, nobody prevented me from expressing objections > > or other feedback during the year. 
Then I realized that 2024 FF is > > approaching and decided to give this another try for pg18. > > This doesn't seem to match the facts as I understand them. It appears > to me that there was no activity on the thread from April until > November. The message in November was not written by you. Your first > post to the thread after April of 2023 was on March 19, 2024. Five > days later you said you wanted to commit. That doesn't look to me like > you worked diligently on the patch set throughout the year and other > people had reasonable notice that you planned to get the work done > this cycle. It looks like you ignored the patch for 11 months and then > committed it without any real further feedback from anyone. True, > Pavel did post and say that he thought the patches were in good shape. > But you could hardly take that as evidence that Andres was now content > that the problems he'd raised earlier had been fixed, because (a) > Pavel had also been involved beforehand and had not raised the > concerns that Andres later raised and (b) Pavel wrote nothing in his > email specifically about why he thought your changes or his had > resolved those concerns. I certainly agree that Andres doesn't always > give as much review feedback as I'd like to have from him in, and it's > also true that he doesn't always give that feedback as quickly as I'd > like to have it ... but you know what? > > It's not Andres's job to make sure my patches are not broken. It's my > job. That applies to the patches I write, and the patches written by > other people that I commit. If I commit something and it turns out > that it is broken, that's my bad. If I commit something and it turns > out that it does not have consensus, that is also my bad. It is not > the fault of the other people for not helping me get my patches to a > state where they are up to project standard. It is my fault, and my > fault alone, for committing something that was not ready. 
Now that > does not mean that it isn't frustrating when I can't get the help I > need. It is extremely frustrating. But the solution is not to commit > anyway and then blame the other people for not providing feedback. > > I mean, committing without explicit agreement from someone else is OK > if you're pretty sure that you've got everything sorted out correctly. > But I don't think that the paper trail here supports the narrative > that you worked on this diligently throughout the year and had every > reason to believe it would be acceptable to the community. If I'd > looked at this thread, I would have concluded that you'd abandoned the > project. I would have expected that, when you picked it up again, > there would be a series of emails over a period of time carefully > working through the various issues that had been raised, inviting > specific commentary on specific discussion points, and generally > refining the work, and then maybe a suggestion of a commit at the end. > I would not have expected an email or two basically saying "well, > seems like it's all fixed now," followed by a commit.
Re: Table AM Interface Enhancements
On 4/10/24 09:19, Robert Haas wrote: When you commit a patch and another committer writes a post-commit review saying that your patch has so many serious problems that he gave up on reviewing before enumerating all of them, that's a really bad sign. That should be a strong signal to you to step back and take a close look at whether you really understand the area of the code that you're touching well enough to be doing whatever it is that you're doing. If I got a review like that, I would have reverted the patch instantly, given up for the release cycle, possibly given up on the patch permanently, and most definitely not tried again to commit unless I was absolutely certain that I'd learned a lot in the meantime *and* had the agreement of the committer who wrote that review (or maybe some other committer who was acknowledged as an expert in that area of the code). It's not Andres's job to make sure my patches are not broken. It's my job. That applies to the patches I write, and the patches written by other people that I commit. If I commit something and it turns out that it is broken, that's my bad. If I commit something and it turns out that it does not have consensus, that is also my bad. It is not the fault of the other people for not helping me get my patches to a state where they are up to project standard. It is my fault, and my fault alone, for committing something that was not ready. Now that does not mean that it isn't frustrating when I can't get the help I need. It is extremely frustrating. But the solution is not to commit anyway and then blame the other people for not providing feedback. +many -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Table AM Interface Enhancements
Hi, Alexander! On Wed, 10 Apr 2024 at 16:20, Alexander Korotkov wrote: > On Mon, Apr 8, 2024 at 9:54 PM Robert Haas wrote: > > On Mon, Apr 8, 2024 at 12:33 PM Alexander Korotkov > wrote: > > > Yes, it was my mistake. I got rushing trying to fit this to FF, even > doing significant changes just before commit. > > > I'll revert this later today. > > The patch to revert is attached. Given that revert touches the work > done in 041b96802e, I think it needs some feedback before push. > > > Alexander, > > > > Exactly how much is getting reverted here? I see these, all since March > 23rd: > > > > dd1f6b0c17 Provide a way block-level table AMs could re-use > > acquire_sample_rows() > > 9bd99f4c26 Custom reloptions for table AM > > 97ce821e3e Fix the parameters order for > > TableAmRoutine.relation_copy_for_cluster() > > 867cc7b6dd Revert "Custom reloptions for table AM" > > b1484a3f19 Let table AM insertion methods control index insertion > > c95c25f9af Custom reloptions for table AM > > 27bc1772fc Generalize relation analyze in table AM interface > > 87985cc925 Allow locking updated tuples in tuple_update() and > tuple_delete() > > c35a3fb5e0 Allow table AM tuple_insert() method to return the different > slot > > 02eb07ea89 Allow table AM to store complex data structures in rd_amcache > > > > I'm not really feeling very good about all of this, because: > > > > - 87985cc925 was previously committed as 11470f544e on March 23, 2023, > > and almost immediately reverted. Now you tried again on March 26, > > 2024. I know there was a bunch of rework in the middle, but there are > > times in the year that things can be committed other than right before > > the feature freeze. Like, don't wait a whole year for the next attempt > > and then again do it right before the cutoff. > > I agree with the facts. But I have a different interpretation on > this. The patch was committed as 11470f544e on March 23, 2023, then > reverted on April 3. 
I've proposed the revised version, but Andres > complained that this is the new API design days before FF. Then the > patch with this design was published in the thread for the year with > periodical rebases. So, I think I expressed my intention with that > design before 2023 FF, nobody prevented me from expressing objections > or other feedback during the year. Then I realized that 2024 FF is > approaching and decided to give this another try for pg18. > > But I don't yet see it's wrong with this patch. I waited a year for > feedback. I waited 2 days after saying "I will push this if no > objections". Given your feedback now, I get that it would be better to > do another attempt to commit this earlier. > > I admit my mistake with dd1f6b0c17. I get rushed trying to fix the > things actually making things worse. I apologise for this. But if > I'm forced to revert 87985cc925 without even hearing any reasonable > critics besides imperfection of timing, I feel like this is the > punishment for my mistake with dd1f6b0c17. Pretty unreasonable > punishment in my view. > > > - The Discussion links in the commit messages do not seem to stand for > > the proposition that these particular patches ought to be committed in > > this form. Some of them are just links to the messages where the patch > > was originally posted, which is probably not against policy or > > anything, but it'd be nicer to see links to versions of the patch with > > which people are, in nearby emails, agreeing. Even worse, some of > > these are links to emails where somebody said, "hey, some earlier > > commit does not look good." In particular, > > dd1f6b0c172a643a73d6b71259fa2d10378b39eb has a discussion link where > > Andres complains about 27bc1772fc814946918a5ac8ccb9b5c5ad0380aa, but > > it's not clear how that justifies the new commit. > > I have to repeat again, that I admit my mistake with dd1f6b0c17, > apologize for that, and make my own conclusions to not repeat this. 
> But dd1f6b0c17 seems to be the only one that has a link to the message > with complains. I went through the list of commits above, it seems > that others have just linked to the first message of the thread. > Probably, there is a lack of consensus for some of them. But I never > heard about a policy to link not just the discussion start, but also > exact messages expressing agreeing. And I didn't see others doing > that. > > > - The commit message for 867cc7b6dd says "This reverts commit > > c95c25f9af4bc77f2f66a587735c50da08c12b37 due to multiple design issues > > spotted after commit." That's not a very good justification for then > > trying again 6 days later with 9bd99f4c26, and it's *definitely* not a > > good justification for there being no meaningful discussion links in > > the commit message for 9bd99f4c26. They're just the same links you had > > in the previous attempt, so it's pretty hard for anybody to understand > > what got fixed and whether all of the concerns were really addressed. > > Just looking over the commit,
Re: Table AM Interface Enhancements
On Wed, Apr 10, 2024 at 8:20 AM Alexander Korotkov wrote: > I agree with the facts. But I have a different interpretation on > this. The patch was committed as 11470f544e on March 23, 2023, then > reverted on April 3. I've proposed the revised version, but Andres > complained that this is the new API design days before FF. Well, his first complaint was that your committed patch was full of bugs: https://www.postgresql.org/message-id/20230323003003.plgaxjqahjgkuxrk%40awork3.anarazel.de When you commit a patch and another committer writes a post-commit review saying that your patch has so many serious problems that he gave up on reviewing before enumerating all of them, that's a really bad sign. That should be a strong signal to you to step back and take a close look at whether you really understand the area of the code that you're touching well enough to be doing whatever it is that you're doing. If I got a review like that, I would have reverted the patch instantly, given up for the release cycle, possibly given up on the patch permanently, and most definitely not tried again to commit unless I was absolutely certain that I'd learned a lot in the meantime *and* had the agreement of the committer who wrote that review (or maybe some other committer who was acknowledged as an expert in that area of the code). What you did instead is try to do a bunch of post-commit fixup in a desperate rush right before feature freeze, to which Andres understandably objected. But that was your second mistake, not your first one. > Then the > patch with this design was published in the thread for the year with > periodical rebases. So, I think I expressed my intention with that > design before 2023 FF, nobody prevented me from expressing objections > or other feedback during the year. Then I realized that 2024 FF is > approaching and decided to give this another try for pg18. This doesn't seem to match the facts as I understand them. 
It appears to me that there was no activity on the thread from April until November. The message in November was not written by you. Your first post to the thread after April of 2023 was on March 19, 2024. Five days later you said you wanted to commit. That doesn't look to me like you worked diligently on the patch set throughout the year and other people had reasonable notice that you planned to get the work done this cycle. It looks like you ignored the patch for 11 months and then committed it without any real further feedback from anyone. True, Pavel did post and say that he thought the patches were in good shape. But you could hardly take that as evidence that Andres was now content that the problems he'd raised earlier had been fixed, because (a) Pavel had also been involved beforehand and had not raised the concerns that Andres later raised and (b) Pavel wrote nothing in his email specifically about why he thought your changes or his had resolved those concerns. I certainly agree that Andres doesn't always give as much review feedback as I'd like to have from him, and it's also true that he doesn't always give that feedback as quickly as I'd like to have it ... but you know what? It's not Andres's job to make sure my patches are not broken. It's my job. That applies to the patches I write, and the patches written by other people that I commit. If I commit something and it turns out that it is broken, that's my bad. If I commit something and it turns out that it does not have consensus, that is also my bad. It is not the fault of the other people for not helping me get my patches to a state where they are up to project standard. It is my fault, and my fault alone, for committing something that was not ready. Now that does not mean that it isn't frustrating when I can't get the help I need. It is extremely frustrating. But the solution is not to commit anyway and then blame the other people for not providing feedback. 
I mean, committing without explicit agreement from someone else is OK if you're pretty sure that you've got everything sorted out correctly. But I don't think that the paper trail here supports the narrative that you worked on this diligently throughout the year and had every reason to believe it would be acceptable to the community. If I'd looked at this thread, I would have concluded that you'd abandoned the project. I would have expected that, when you picked it up again, there would be a series of emails over a period of time carefully working through the various issues that had been raised, inviting specific commentary on specific discussion points, and generally refining the work, and then maybe a suggestion of a commit at the end. I would not have expected an email or two basically saying "well, seems like it's all fixed now," followed by a commit. > Do you propose a ban from March 1 to the end of any year? I think the > first doesn't make sense, because it leaves only 2 months a year for > the work. That would create a potential rush during these 2
Re: Table AM Interface Enhancements
On Mon, Apr 8, 2024 at 9:54 PM Robert Haas wrote: > On Mon, Apr 8, 2024 at 12:33 PM Alexander Korotkov > wrote: > > Yes, it was my mistake. I got rushing trying to fit this to FF, even doing > > significant changes just before commit. > > I'll revert this later today. The patch to revert is attached. Given that revert touches the work done in 041b96802e, I think it needs some feedback before push. > Alexander, > > Exactly how much is getting reverted here? I see these, all since March 23rd: > > dd1f6b0c17 Provide a way block-level table AMs could re-use > acquire_sample_rows() > 9bd99f4c26 Custom reloptions for table AM > 97ce821e3e Fix the parameters order for > TableAmRoutine.relation_copy_for_cluster() > 867cc7b6dd Revert "Custom reloptions for table AM" > b1484a3f19 Let table AM insertion methods control index insertion > c95c25f9af Custom reloptions for table AM > 27bc1772fc Generalize relation analyze in table AM interface > 87985cc925 Allow locking updated tuples in tuple_update() and tuple_delete() > c35a3fb5e0 Allow table AM tuple_insert() method to return the different slot > 02eb07ea89 Allow table AM to store complex data structures in rd_amcache > > I'm not really feeling very good about all of this, because: > > - 87985cc925 was previously committed as 11470f544e on March 23, 2023, > and almost immediately reverted. Now you tried again on March 26, > 2024. I know there was a bunch of rework in the middle, but there are > times in the year that things can be committed other than right before > the feature freeze. Like, don't wait a whole year for the next attempt > and then again do it right before the cutoff. I agree with the facts. But I have a different interpretation on this. The patch was committed as 11470f544e on March 23, 2023, then reverted on April 3. I've proposed the revised version, but Andres complained that this is the new API design days before FF. 
Then the patch with this design was published in the thread for the year with periodical rebases. So, I think I expressed my intention with that design before 2023 FF, nobody prevented me from expressing objections or other feedback during the year. Then I realized that 2024 FF is approaching and decided to give this another try for pg18. But I don't yet see what's wrong with this patch. I waited a year for feedback. I waited 2 days after saying "I will push this if no objections". Given your feedback now, I get that it would be better to do another attempt to commit this earlier. I admit my mistake with dd1f6b0c17. I got rushed trying to fix things, actually making them worse. I apologise for this. But if I'm forced to revert 87985cc925 without even hearing any reasonable criticism besides imperfection of timing, I feel like this is the punishment for my mistake with dd1f6b0c17. Pretty unreasonable punishment in my view. > - The Discussion links in the commit messages do not seem to stand for > the proposition that these particular patches ought to be committed in > this form. Some of them are just links to the messages where the patch > was originally posted, which is probably not against policy or > anything, but it'd be nicer to see links to versions of the patch with > which people are, in nearby emails, agreeing. Even worse, some of > these are links to emails where somebody said, "hey, some earlier > commit does not look good." In particular, > dd1f6b0c172a643a73d6b71259fa2d10378b39eb has a discussion link where > Andres complains about 27bc1772fc814946918a5ac8ccb9b5c5ad0380aa, but > it's not clear how that justifies the new commit. I have to repeat again, that I admit my mistake with dd1f6b0c17, apologize for that, and make my own conclusions to not repeat this. 
I went through the list of commits above; it seems the others just link to the first message of the thread. Probably there is a lack of consensus for some of them. But I never heard of a policy to link not just the start of the discussion but also the exact messages expressing agreement. And I didn't see others doing that. > - The commit message for 867cc7b6dd says "This reverts commit > c95c25f9af4bc77f2f66a587735c50da08c12b37 due to multiple design issues > spotted after commit." That's not a very good justification for then > trying again 6 days later with 9bd99f4c26, and it's *definitely* not a > good justification for there being no meaningful discussion links in > the commit message for 9bd99f4c26. They're just the same links you had > in the previous attempt, so it's pretty hard for anybody to understand > what got fixed and whether all of the concerns were really addressed. > Just looking over the commit, it's pretty hard to understand what is > being changed and why: there's not a lot of comment updates, there's > no documentation changes, and there's not a lot of explanation in the > commit message either. Even if this feature is great and all
Re: Table AM Interface Enhancements
On Mon, Apr 8, 2024 at 9:54 PM Robert Haas wrote: > On Mon, Apr 8, 2024 at 12:33 PM Alexander Korotkov > wrote: > > Yes, it was my mistake. I got rushing trying to fit this to FF, even doing > > significant changes just before commit. > > I'll revert this later today. It appears to be a non-trivial revert, because 041b96802e already revised the relation analyze code after 27bc1772fc. That is, I would need to "backport" 041b96802e. Sorry, I'm too tired to do this today. I'll come back to this tomorrow. > Alexander, > > Exactly how much is getting reverted here? I see these, all since March 23rd: > > dd1f6b0c17 Provide a way block-level table AMs could re-use > acquire_sample_rows() > 9bd99f4c26 Custom reloptions for table AM > 97ce821e3e Fix the parameters order for > TableAmRoutine.relation_copy_for_cluster() > 867cc7b6dd Revert "Custom reloptions for table AM" > b1484a3f19 Let table AM insertion methods control index insertion > c95c25f9af Custom reloptions for table AM > 27bc1772fc Generalize relation analyze in table AM interface > 87985cc925 Allow locking updated tuples in tuple_update() and tuple_delete() > c35a3fb5e0 Allow table AM tuple_insert() method to return the different slot > 02eb07ea89 Allow table AM to store complex data structures in rd_amcache It would be discouraging to revert all of this. Some items are very simple; some took a lot of work. I'll come back tomorrow and answer all your points. -- Regards, Alexander Korotkov
Re: Table AM Interface Enhancements
On Mon, Apr 8, 2024 at 12:33 PM Alexander Korotkov wrote: > Yes, it was my mistake. I got rushing trying to fit this to FF, even doing > significant changes just before commit. > I'll revert this later today. Alexander, Exactly how much is getting reverted here? I see these, all since March 23rd: dd1f6b0c17 Provide a way block-level table AMs could re-use acquire_sample_rows() 9bd99f4c26 Custom reloptions for table AM 97ce821e3e Fix the parameters order for TableAmRoutine.relation_copy_for_cluster() 867cc7b6dd Revert "Custom reloptions for table AM" b1484a3f19 Let table AM insertion methods control index insertion c95c25f9af Custom reloptions for table AM 27bc1772fc Generalize relation analyze in table AM interface 87985cc925 Allow locking updated tuples in tuple_update() and tuple_delete() c35a3fb5e0 Allow table AM tuple_insert() method to return the different slot 02eb07ea89 Allow table AM to store complex data structures in rd_amcache I'm not really feeling very good about all of this, because: - 87985cc925 was previously committed as 11470f544e on March 23, 2023, and almost immediately reverted. Now you tried again on March 26, 2024. I know there was a bunch of rework in the middle, but there are times in the year that things can be committed other than right before the feature freeze. Like, don't wait a whole year for the next attempt and then again do it right before the cutoff. - The Discussion links in the commit messages do not seem to stand for the proposition that these particular patches ought to be committed in this form. Some of them are just links to the messages where the patch was originally posted, which is probably not against policy or anything, but it'd be nicer to see links to versions of the patch with which people are, in nearby emails, agreeing. Even worse, some of these are links to emails where somebody said, "hey, some earlier commit does not look good." 
In particular, dd1f6b0c172a643a73d6b71259fa2d10378b39eb has a discussion link where Andres complains about 27bc1772fc814946918a5ac8ccb9b5c5ad0380aa, but it's not clear how that justifies the new commit. - The commit message for 867cc7b6dd says "This reverts commit c95c25f9af4bc77f2f66a587735c50da08c12b37 due to multiple design issues spotted after commit." That's not a very good justification for then trying again 6 days later with 9bd99f4c26, and it's *definitely* not a good justification for there being no meaningful discussion links in the commit message for 9bd99f4c26. They're just the same links you had in the previous attempt, so it's pretty hard for anybody to understand what got fixed and whether all of the concerns were really addressed. Just looking over the commit, it's pretty hard to understand what is being changed and why: there's not a lot of comment updates, there's no documentation changes, and there's not a lot of explanation in the commit message either. Even if this feature is great and all the code is perfect now, it's going to be hard for anyone to figure out how to use it. 97ce821e3e looks like a clear bug fix to me, but I wonder if the rest of this should all just be reverted, with a ban on ever trying it again after March 1 of any year. I'd like to believe that there are only bookkeeping problems here, and that there was in fact clear agreement that all of these changes should be made in this form, and that the commit messages simply failed to reference the most relevant emails. But what I fear, especially in view of Andres's remarks, is that these commits were done in haste without adequate consensus, and I think that's a serious problem. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Table AM Interface Enhancements
On Mon, Apr 8, 2024, 19:08 Andres Freund wrote: > On 2024-04-08 08:37:44 -0700, Andres Freund wrote: > > On 2024-04-08 11:17:51 +0400, Pavel Borisov wrote: > > > On Mon, 8 Apr 2024 at 03:25, Alexander Korotkov > > > > I was under the impression there are not so many out-of-core table > > > > AMs, which have non-dummy analysis implementations. And even if > there > > > > are some, duplicating acquire_sample_rows() isn't a big deal. > > > > > > > > But given your feedback, I'd like to propose to keep both options > > > > open. Turn back the block-level API for analyze, but let table-AM > > > > implement its own analyze function. Then existing out-of-core AMs > > > > wouldn't need to do anything (or probably just set the new API method > > > > to NULL). > > > > > > > I think that providing both new and old interface functions for > block-based > > > and non-block based custom am is an excellent compromise. > > > > I don't agree, that way lies an unmanageable API. To me the new API > doesn't > > look well polished either, so it's not a question of a smoother > transition or > > something like that. > > > > I don't think redesigning extension APIs at this stage of the release > cycle > > makes sense. > > Wait, you already pushed an API redesign? With a design that hasn't even > seen > the list from what I can tell? Without even mentioning that on the list? > You > got to be kidding me. > Yes, it was my mistake. I got rushed trying to fit this in before FF, even making significant changes just before commit. I'll revert this later today. -- Regards, Alexander Korotkov
Re: Table AM Interface Enhancements
On 2024-04-08 08:37:44 -0700, Andres Freund wrote: > On 2024-04-08 11:17:51 +0400, Pavel Borisov wrote: > > On Mon, 8 Apr 2024 at 03:25, Alexander Korotkov > > > I was under the impression there are not so many out-of-core table > > > AMs, which have non-dummy analysis implementations. And even if there > > > are some, duplicating acquire_sample_rows() isn't a big deal. > > > > > > But given your feedback, I'd like to propose to keep both options > > > open. Turn back the block-level API for analyze, but let table-AM > > > implement its own analyze function. Then existing out-of-core AMs > > > wouldn't need to do anything (or probably just set the new API method > > > to NULL). > > > > > I think that providing both new and old interface functions for block-based > > and non-block based custom am is an excellent compromise. > > I don't agree, that way lies an unmanageable API. To me the new API doesn't > look well polished either, so it's not a question of a smoother transition or > something like that. > > I don't think redesigning extension APIs at this stage of the release cycle > makes sense. Wait, you already pushed an API redesign? With a design that hasn't even seen the list from what I can tell? Without even mentioning that on the list? You got to be kidding me.
Re: Table AM Interface Enhancements
Hi, On 2024-04-08 11:17:51 +0400, Pavel Borisov wrote: > On Mon, 8 Apr 2024 at 03:25, Alexander Korotkov > > I was under the impression there are not so many out-of-core table > > AMs, which have non-dummy analysis implementations. And even if there > > are some, duplicating acquire_sample_rows() isn't a big deal. > > > > But given your feedback, I'd like to propose to keep both options > > open. Turn back the block-level API for analyze, but let table-AM > > implement its own analyze function. Then existing out-of-core AMs > > wouldn't need to do anything (or probably just set the new API method > > to NULL). > > > I think that providing both new and old interface functions for block-based > and non-block based custom am is an excellent compromise. I don't agree, that way lies an unmanageable API. To me the new API doesn't look well polished either, so it's not a question of a smoother transition or something like that. I don't think redesigning extension APIs at this stage of the release cycle makes sense. Greetings, Andres Freund
Re: Table AM Interface Enhancements
Hi, Alexander On Mon, 8 Apr 2024 at 13:59, Pavel Borisov wrote: > > > On Mon, 8 Apr 2024 at 13:34, Alexander Korotkov > wrote: > >> On Mon, Apr 8, 2024 at 10:18 AM Pavel Borisov >> wrote: >> > On Mon, 8 Apr 2024 at 03:25, Alexander Korotkov >> wrote: >> >> >> >> Hi, >> >> >> >> On Mon, Apr 8, 2024 at 12:40 AM Andres Freund >> wrote: >> >> > On 2024-03-30 23:33:04 +0200, Alexander Korotkov wrote: >> >> > > I've pushed 0001, 0002 and 0006. >> >> > >> >> > I briefly looked at 27bc1772fc81 and I don't think the state post >> this commit >> >> > makes sense. Before this commit another block based AM could >> implement analyze >> >> > without much code duplication. Now a large portion of analyze.c has >> to be >> >> > copied, because they can't stop acquire_sample_rows() from calling >> >> > heapam_scan_analyze_next_block(). >> >> > >> >> > I'm quite certain this will break a few out-of-core AMs in a way >> that can't >> >> > easily be fixed. >> >> >> >> I was under the impression there are not so many out-of-core table >> >> AMs, which have non-dummy analysis implementations. And even if there >> >> are some, duplicating acquire_sample_rows() isn't a big deal. >> >> >> >> But given your feedback, I'd like to propose to keep both options >> >> open. Turn back the block-level API for analyze, but let table-AM >> >> implement its own analyze function. Then existing out-of-core AMs >> >> wouldn't need to do anything (or probably just set the new API method >> >> to NULL). >> > >> > I think that providing both new and old interface functions for >> block-based and non-block based custom am is an excellent compromise. >> > >> > The patch v1-0001-Turn-back.. is mainly an undo of part of the >> 27bc1772fc81 that had turned off _analyze_next_tuple..analyze_next_block >> for external callers. If some extensions are already adapted to the old >> interface functions, they are free to still use it. >> >> Please, check this. 
Instead of keeping two APIs, it generalizes >> acquire_sample_rows(). The downside is change of >> AcquireSampleRowsFunc signature, which would need some changes in FDWs >> too. >> > To me, both approaches v1-0001-Turn-back... and v2-0001-Generalize... and > patch v2 look good. > > Pavel. > I made some changes to the comments to better reflect the changes in patch v2. See patch v3 (code unchanged from v2). Regards, Pavel v3-0001-Generalize-acquire_sample_rows.patch Description: Binary data
Re: Table AM Interface Enhancements
On Mon, 8 Apr 2024 at 13:34, Alexander Korotkov wrote: > On Mon, Apr 8, 2024 at 10:18 AM Pavel Borisov > wrote: > > On Mon, 8 Apr 2024 at 03:25, Alexander Korotkov > wrote: > >> > >> Hi, > >> > >> On Mon, Apr 8, 2024 at 12:40 AM Andres Freund > wrote: > >> > On 2024-03-30 23:33:04 +0200, Alexander Korotkov wrote: > >> > > I've pushed 0001, 0002 and 0006. > >> > > >> > I briefly looked at 27bc1772fc81 and I don't think the state post > this commit > >> > makes sense. Before this commit another block based AM could > implement analyze > >> > without much code duplication. Now a large portion of analyze.c has > to be > >> > copied, because they can't stop acquire_sample_rows() from calling > >> > heapam_scan_analyze_next_block(). > >> > > >> > I'm quite certain this will break a few out-of-core AMs in a way that > can't > >> > easily be fixed. > >> > >> I was under the impression there are not so many out-of-core table > >> AMs, which have non-dummy analysis implementations. And even if there > >> are some, duplicating acquire_sample_rows() isn't a big deal. > >> > >> But given your feedback, I'd like to propose to keep both options > >> open. Turn back the block-level API for analyze, but let table-AM > >> implement its own analyze function. Then existing out-of-core AMs > >> wouldn't need to do anything (or probably just set the new API method > >> to NULL). > > > > I think that providing both new and old interface functions for > block-based and non-block based custom am is an excellent compromise. > > > > The patch v1-0001-Turn-back.. is mainly an undo of part of the > 27bc1772fc81 that had turned off _analyze_next_tuple..analyze_next_block > for external callers. If some extensions are already adapted to the old > interface functions, they are free to still use it. > > Please, check this. Instead of keeping two APIs, it generalizes > acquire_sample_rows(). The downside is change of > AcquireSampleRowsFunc signature, which would need some changes in FDWs > too. 
> To me, both approaches v1-0001-Turn-back... and v2-0001-Generalize... and patch v2 look good. Pavel.
Re: Table AM Interface Enhancements
On Mon, Apr 8, 2024 at 10:18 AM Pavel Borisov wrote: > On Mon, 8 Apr 2024 at 03:25, Alexander Korotkov wrote: >> >> Hi, >> >> On Mon, Apr 8, 2024 at 12:40 AM Andres Freund wrote: >> > On 2024-03-30 23:33:04 +0200, Alexander Korotkov wrote: >> > > I've pushed 0001, 0002 and 0006. >> > >> > I briefly looked at 27bc1772fc81 and I don't think the state post this >> > commit >> > makes sense. Before this commit another block based AM could implement >> > analyze >> > without much code duplication. Now a large portion of analyze.c has to be >> > copied, because they can't stop acquire_sample_rows() from calling >> > heapam_scan_analyze_next_block(). >> > >> > I'm quite certain this will break a few out-of-core AMs in a way that can't >> > easily be fixed. >> >> I was under the impression there are not so many out-of-core table >> AMs, which have non-dummy analysis implementations. And even if there >> are some, duplicating acquire_sample_rows() isn't a big deal. >> >> But given your feedback, I'd like to propose to keep both options >> open. Turn back the block-level API for analyze, but let table-AM >> implement its own analyze function. Then existing out-of-core AMs >> wouldn't need to do anything (or probably just set the new API method >> to NULL). > > I think that providing both new and old interface functions for block-based > and non-block based custom am is an excellent compromise. > > The patch v1-0001-Turn-back.. is mainly an undo of part of the 27bc1772fc81 > that had turned off _analyze_next_tuple..analyze_next_block for external > callers. If some extensions are already adapted to the old interface > functions, they are free to still use it. Please, check this. Instead of keeping two APIs, it generalizes acquire_sample_rows(). The downside is change of AcquireSampleRowsFunc signature, which would need some changes in FDWs too. -- Regards, Alexander Korotkov v2-0001-Generalize-acquire_sample_rows.patch Description: Binary data
Re: Table AM Interface Enhancements
Hi, Alexander and Andres! On Mon, 8 Apr 2024 at 03:25, Alexander Korotkov wrote: > Hi, > > On Mon, Apr 8, 2024 at 12:40 AM Andres Freund wrote: > > On 2024-03-30 23:33:04 +0200, Alexander Korotkov wrote: > > > I've pushed 0001, 0002 and 0006. > > > > I briefly looked at 27bc1772fc81 and I don't think the state post this > commit > > makes sense. Before this commit another block based AM could implement > analyze > > without much code duplication. Now a large portion of analyze.c has to be > > copied, because they can't stop acquire_sample_rows() from calling > > heapam_scan_analyze_next_block(). > > > > I'm quite certain this will break a few out-of-core AMs in a way that > can't > > easily be fixed. > > I was under the impression there are not so many out-of-core table > AMs, which have non-dummy analysis implementations. And even if there > are some, duplicating acquire_sample_rows() isn't a big deal. > > But given your feedback, I'd like to propose to keep both options > open. Turn back the block-level API for analyze, but let table-AM > implement its own analyze function. Then existing out-of-core AMs > wouldn't need to do anything (or probably just set the new API method > to NULL). > I think that providing both new and old interface functions for block-based and non-block based custom am is an excellent compromise. The patch v1-0001-Turn-back.. is mainly an undo of part of the 27bc1772fc81 that had turned off _analyze_next_tuple..analyze_next_block for external callers. If some extensions are already adapted to the old interface functions, they are free to still use it. > And even for non-block based AMs, the new interface basically requires > > reimplementing all of analyze.c. > . > Non-lock base AM needs to just provide an alternative implementation > for what acquire_sample_rows() does. This seems like reasonable > effort for me, and surely not reimplementing all of analyze.c. > I agree. Regards, Pavel Borisov Supabase
Re: Table AM Interface Enhancements
Hi, On 2024-04-08 02:25:17 +0300, Alexander Korotkov wrote: > On Mon, Apr 8, 2024 at 12:40 AM Andres Freund wrote: > > On 2024-03-30 23:33:04 +0200, Alexander Korotkov wrote: > > > I've pushed 0001, 0002 and 0006. > > > > I briefly looked at 27bc1772fc81 and I don't think the state post this > > commit > > makes sense. Before this commit another block based AM could implement > > analyze > > without much code duplication. Now a large portion of analyze.c has to be > > copied, because they can't stop acquire_sample_rows() from calling > > heapam_scan_analyze_next_block(). > > > > I'm quite certain this will break a few out-of-core AMs in a way that can't > > easily be fixed. > > I was under the impression there are not so many out-of-core table > AMs, which have non-dummy analysis implementations. I know of at least 4 that have some production usage. > And even if there are some, duplicating acquire_sample_rows() isn't a big > deal. I don't agree. The code has evolved a bunch over time; duplicating it into various AMs is a bad idea. > But given your feedback, I'd like to propose to keep both options > open. Turn back the block-level API for analyze, but let table-AM > implement its own analyze function. Then existing out-of-core AMs > wouldn't need to do anything (or probably just set the new API method > to NULL). I think this patch simply hasn't been reviewed even close to carefully enough and should be reverted. It's IMO too late for a redesign. Sorry for not looking earlier, I was mostly out sick for the last few months. I think a dedicated tableam callback for sample acquisition probably makes sense, but if we want that, we need to provide an easy way for AMs that are sufficiently block-like to reuse the code, not have two different ways to implement analyze. ISTM that ->relation_analyze is quite misleading as a name. For one, it just sets some callbacks, no?
But more importantly, it sounds like it'd actually allow to wrap the whole analyze process, rather than just the acquisition of samples. Greetings, Andres Freund
Re: Table AM Interface Enhancements
On Mon, Apr 8, 2024 at 2:25 AM Alexander Korotkov wrote: > On Mon, Apr 8, 2024 at 12:40 AM Andres Freund wrote: > > On 2024-03-30 23:33:04 +0200, Alexander Korotkov wrote: > > > I've pushed 0001, 0002 and 0006. > > > > I briefly looked at 27bc1772fc81 and I don't think the state post this > > commit > > makes sense. Before this commit another block based AM could implement > > analyze > > without much code duplication. Now a large portion of analyze.c has to be > > copied, because they can't stop acquire_sample_rows() from calling > > heapam_scan_analyze_next_block(). > > > > I'm quite certain this will break a few out-of-core AMs in a way that can't > > easily be fixed. > > I was under the impression there are not so many out-of-core table > AMs, which have non-dummy analysis implementations. And even if there > are some, duplicating acquire_sample_rows() isn't a big deal. > > But given your feedback, I'd like to propose to keep both options > open. Turn back the block-level API for analyze, but let table-AM > implement its own analyze function. Then existing out-of-core AMs > wouldn't need to do anything (or probably just set the new API method > to NULL). The attached patch was to illustrate the approach. It surely needs some polishing. -- Regards, Alexander Korotkov
Re: Table AM Interface Enhancements
Hi, On Mon, Apr 8, 2024 at 12:40 AM Andres Freund wrote: > On 2024-03-30 23:33:04 +0200, Alexander Korotkov wrote: > > I've pushed 0001, 0002 and 0006. > > I briefly looked at 27bc1772fc81 and I don't think the state post this commit > makes sense. Before this commit another block based AM could implement analyze > without much code duplication. Now a large portion of analyze.c has to be > copied, because they can't stop acquire_sample_rows() from calling > heapam_scan_analyze_next_block(). > > I'm quite certain this will break a few out-of-core AMs in a way that can't > easily be fixed. I was under the impression there are not so many out-of-core table AMs that have non-dummy analyze implementations. And even if there are some, duplicating acquire_sample_rows() isn't a big deal. But given your feedback, I'd like to propose to keep both options open. Turn back the block-level API for analyze, but let the table AM implement its own analyze function. Then existing out-of-core AMs wouldn't need to do anything (or probably just set the new API method to NULL). > And even for non-block based AMs, the new interface basically requires > reimplementing all of analyze.c. A non-block-based AM just needs to provide an alternative implementation of what acquire_sample_rows() does. This seems like a reasonable effort to me, and surely not reimplementing all of analyze.c. -- Regards, Alexander Korotkov v1-0001-Turn-back-the-block-level-API-for-relation-analyz.patch Description: Binary data
Re: Table AM Interface Enhancements
Hi, On 2024-03-30 23:33:04 +0200, Alexander Korotkov wrote: > I've pushed 0001, 0002 and 0006. I briefly looked at 27bc1772fc81 and I don't think the state post this commit makes sense. Before this commit another block based AM could implement analyze without much code duplication. Now a large portion of analyze.c has to be copied, because they can't stop acquire_sample_rows() from calling heapam_scan_analyze_next_block(). I'm quite certain this will break a few out-of-core AMs in a way that can't easily be fixed. And even for non-block based AMs, the new interface basically requires reimplementing all of analyze.c. What am I missing here? Greetings, Andres Freund
Re: Table AM Interface Enhancements
Hi, Alexander! On Sun, 7 Apr 2024 at 12:34, Pavel Borisov wrote: > Hi, Alexander! > > On Sun, 7 Apr 2024 at 07:33, Alexander Korotkov > wrote: > >> Hi, Pavel! >> >> On Fri, Apr 5, 2024 at 6:58 PM Pavel Borisov >> wrote: >> > On Tue, 2 Apr 2024 at 19:17, Jeff Davis wrote: >> >> >> >> On Tue, 2024-04-02 at 11:49 +0300, Alexander Korotkov wrote: >> >> > I don't like the idea that every custom table AM reltoptions should >> >> > begin with StdRdOptions. I would rather introduce the new data >> >> > structure with table options, which need to be accessed outside of >> >> > table AM. Then reloptions will be a backbox only directly used in >> >> > table AM, while table AM has a freedom on what to store in reloptions >> >> > and how to calculate externally-visible options. What do you think? >> >> >> >> Hi Alexander! >> >> >> >> I agree with all of that. It will take some refactoring to get there, >> >> though. >> >> >> >> One idea is to store StdRdOptions like normal, but if an unrecognized >> >> option is found, ask the table AM if it understands the option. In that >> >> case I think we'd just use a different field in pg_class so that it can >> >> use whatever format it wants to represent its options. >> >> >> >> Regards, >> >> Jeff Davis >> > >> > I tried to rework a patch regarding table am according to the input >> from Alexander and Jeff. >> > >> > It splits table reloptions into two categories: >> > - common for all tables (stored in a fixed size structure and could be >> accessed from outside) >> > - table-am specific (variable size, parsed and accessed by access >> method only) >> >> Thank you for your work. Please, check the revised patch. >> >> It makes CommonRdOptions a separate data structure, not directly >> involved in parsing the reloption. Instead table AM can fill it on >> the base of its reloptions or calculate the other way. Patch comes >> with a test module, which comes with heap-based table AM. 
This table >> AM has "enable_parallel" reloption, which is used as the base to set >> the value of CommonRdOptions.parallel_workers. >> > To me, a patch v10 looks good. > > I think the comment for RelationData now applies only to rd_options, not > to rd_common_options. > >NULLs means "use defaults". > > Regards, > Pavel > I made minor changes to the patch. Please find v11 attached. Regards, Pavel. v11-0001-Custom-reloptions-for-table-AM.patch Description: Binary data
Re: Table AM Interface Enhancements
Hi, Alexander! On Sun, 7 Apr 2024 at 07:33, Alexander Korotkov wrote: > Hi, Pavel! > > On Fri, Apr 5, 2024 at 6:58 PM Pavel Borisov > wrote: > > On Tue, 2 Apr 2024 at 19:17, Jeff Davis wrote: > >> > >> On Tue, 2024-04-02 at 11:49 +0300, Alexander Korotkov wrote: > >> > I don't like the idea that every custom table AM reltoptions should > >> > begin with StdRdOptions. I would rather introduce the new data > >> > structure with table options, which need to be accessed outside of > >> > table AM. Then reloptions will be a backbox only directly used in > >> > table AM, while table AM has a freedom on what to store in reloptions > >> > and how to calculate externally-visible options. What do you think? > >> > >> Hi Alexander! > >> > >> I agree with all of that. It will take some refactoring to get there, > >> though. > >> > >> One idea is to store StdRdOptions like normal, but if an unrecognized > >> option is found, ask the table AM if it understands the option. In that > >> case I think we'd just use a different field in pg_class so that it can > >> use whatever format it wants to represent its options. > >> > >> Regards, > >> Jeff Davis > > > > I tried to rework a patch regarding table am according to the input from > Alexander and Jeff. > > > > It splits table reloptions into two categories: > > - common for all tables (stored in a fixed size structure and could be > accessed from outside) > > - table-am specific (variable size, parsed and accessed by access method > only) > > Thank you for your work. Please, check the revised patch. > > It makes CommonRdOptions a separate data structure, not directly > involved in parsing the reloption. Instead table AM can fill it on > the base of its reloptions or calculate the other way. Patch comes > with a test module, which comes with heap-based table AM. This table > AM has "enable_parallel" reloption, which is used as the base to set > the value of CommonRdOptions.parallel_workers. > To me, a patch v10 looks good. 
I think the RelationData comment ("NULLs means 'use defaults'") now applies only to rd_options, not to rd_common_options. Regards, Pavel
Re: Table AM Interface Enhancements
Hi, Pavel! On Fri, Apr 5, 2024 at 6:58 PM Pavel Borisov wrote: > On Tue, 2 Apr 2024 at 19:17, Jeff Davis wrote: >> >> On Tue, 2024-04-02 at 11:49 +0300, Alexander Korotkov wrote: >> > I don't like the idea that every custom table AM reltoptions should >> > begin with StdRdOptions. I would rather introduce the new data >> > structure with table options, which need to be accessed outside of >> > table AM. Then reloptions will be a backbox only directly used in >> > table AM, while table AM has a freedom on what to store in reloptions >> > and how to calculate externally-visible options. What do you think? >> >> Hi Alexander! >> >> I agree with all of that. It will take some refactoring to get there, >> though. >> >> One idea is to store StdRdOptions like normal, but if an unrecognized >> option is found, ask the table AM if it understands the option. In that >> case I think we'd just use a different field in pg_class so that it can >> use whatever format it wants to represent its options. >> >> Regards, >> Jeff Davis > > I tried to rework a patch regarding table am according to the input from > Alexander and Jeff. > > It splits table reloptions into two categories: > - common for all tables (stored in a fixed size structure and could be > accessed from outside) > - table-am specific (variable size, parsed and accessed by access method only) Thank you for your work. Please check the revised patch. It makes CommonRdOptions a separate data structure, not directly involved in parsing the reloptions. Instead, the table AM can fill it based on its reloptions or calculate it another way. The patch comes with a test module containing a heap-based table AM. This table AM has an "enable_parallel" reloption, which is used as the basis for setting the value of CommonRdOptions.parallel_workers. -- Regards, Alexander Korotkov v10-0001-Custom-reloptions-for-table-AM.patch Description: Binary data
Re: Table AM Interface Enhancements
Hi, hackers! On Tue, 2 Apr 2024 at 19:17, Jeff Davis wrote: > On Tue, 2024-04-02 at 11:49 +0300, Alexander Korotkov wrote: > > I don't like the idea that every custom table AM reltoptions should > > begin with StdRdOptions. I would rather introduce the new data > > structure with table options, which need to be accessed outside of > > table AM. Then reloptions will be a backbox only directly used in > > table AM, while table AM has a freedom on what to store in reloptions > > and how to calculate externally-visible options. What do you think? > > Hi Alexander! > > I agree with all of that. It will take some refactoring to get there, > though. > > One idea is to store StdRdOptions like normal, but if an unrecognized > option is found, ask the table AM if it understands the option. In that > case I think we'd just use a different field in pg_class so that it can > use whatever format it wants to represent its options. > > Regards, > Jeff Davis > I tried to rework a patch regarding table am according to the input from Alexander and Jeff. It splits table reloptions into two categories: - common for all tables (stored in a fixed size structure and could be accessed from outside) - table-am specific (variable size, parsed and accessed by access method only) Please find a patch attached. v9-0001-Custom-reloptions-for-table-AM.patch Description: Binary data
Re: Table AM Interface Enhancements
On Tue, 2024-04-02 at 11:49 +0300, Alexander Korotkov wrote: > I don't like the idea that every custom table AM reltoptions should > begin with StdRdOptions. I would rather introduce the new data > structure with table options, which need to be accessed outside of > table AM. Then reloptions will be a backbox only directly used in > table AM, while table AM has a freedom on what to store in reloptions > and how to calculate externally-visible options. What do you think? Hi Alexander! I agree with all of that. It will take some refactoring to get there, though. One idea is to store StdRdOptions like normal, but if an unrecognized option is found, ask the table AM if it understands the option. In that case I think we'd just use a different field in pg_class so that it can use whatever format it wants to represent its options. Regards, Jeff Davis
Re: Table AM Interface Enhancements
Hi, Jeff! On Tue, Apr 2, 2024 at 8:19 AM Jeff Davis wrote: > On Sat, 2024-03-30 at 23:33 +0200, Alexander Korotkov wrote: > > I've pushed 0001, 0002 and 0006. > > Sorry to jump in to this discussion so late. I had worked on something > like the custom reloptions (0002) in the past, and there were some > complications that don't seem to be addressed in commit c95c25f9af. > > * At minimum I think it needs some direction (comments, docs, tests) > that show how it's supposed to be used. > > * The bytea returned by the reloptions() method is not in a trivial > format. It's a StdRelOptions struct with string values stored after the > end of the struct. To build the bytea internally, there's some > infrastructure like allocateRelOptStruct() and fillRelOptions(), and > it's not very easy to extend those to support a few custom options. > > * If we ever decide to add a string option to StdRdOptions, I think the > design breaks, because the code that looks for those string values > wouldn't know how to skip over the custom options. Perhaps we can just > promise to never do that, but we should make it explicit somehow. > > * Most existing heap reloptions (other than fillfactor) are used by > other parts of the system (like autovacuum) so should be considered > valid for any AM. Most AMs will just want to add a handful of their own > options on top, so it would be good to demonstrate how this should be > done. > > * There are still places that are inappropriately calling > heap_reloptions directly. For instance, in ProcessUtilitySlow(), it > seems to assume that a toast table is a heap? Thank you for the detailed explanation. This piece definitely needs more work. I've just reverted c95c25f9af. I don't like the idea that every custom table AM's reloptions should begin with StdRdOptions. I would rather introduce a new data structure with the table options that need to be accessed outside of the table AM.
Then reloptions will be a black box, directly used only in the table AM, while the table AM has freedom over what to store in reloptions and how to calculate the externally-visible options. What do you think? -- Regards, Alexander Korotkov
Re: Table AM Interface Enhancements
On Sat, 2024-03-30 at 23:33 +0200, Alexander Korotkov wrote: > I've pushed 0001, 0002 and 0006. Sorry to jump in to this discussion so late. I had worked on something like the custom reloptions (0002) in the past, and there were some complications that don't seem to be addressed in commit c95c25f9af. * At minimum I think it needs some direction (comments, docs, tests) that show how it's supposed to be used. * The bytea returned by the reloptions() method is not in a trivial format. It's a StdRelOptions struct with string values stored after the end of the struct. To build the bytea internally, there's some infrastructure like allocateRelOptStruct() and fillRelOptions(), and it's not very easy to extend those to support a few custom options. * If we ever decide to add a string option to StdRdOptions, I think the design breaks, because the code that looks for those string values wouldn't know how to skip over the custom options. Perhaps we can just promise to never do that, but we should make it explicit somehow. * Most existing heap reloptions (other than fillfactor) are used by other parts of the system (like autovacuum) so should be considered valid for any AM. Most AMs will just want to add a handful of their own options on top, so it would be good to demonstrate how this should be done. * There are still places that are inappropriately calling heap_reloptions directly. For instance, in ProcessUtilitySlow(), it seems to assume that a toast table is a heap? Regards, Jeff Davis
Re: Table AM Interface Enhancements
On Tue, 02 Apr 2024 at 07:59, Alexander Korotkov wrote: > On Mon, Apr 1, 2024 at 7:36 PM Tom Lane wrote: >> Coverity complained about what you did in RelationParseRelOptions >> in c95c25f9a: >> >> *** CID 1595992: Null pointer dereferences (FORWARD_NULL) >> /srv/coverity/git/pgsql-git/postgresql/src/backend/utils/cache/relcache.c: >> 499 in RelationParseRelOptions() >> 493 >> 494 /* >> 495 * Fetch reloptions from tuple; have to use a hardwired >> descriptor because >> 496 * we might not have any other for pg_class yet (consider >> executing this >> 497 * code for pg_class itself) >> 498 */ >> >>> CID 1595992: Null pointer dereferences (FORWARD_NULL) >> >>> Passing null pointer "tableam" to "extractRelOptions", which >> >>> dereferences it. >> 499 options = extractRelOptions(tuple, GetPgClassDescriptor(), >> 500 >> tableam, amoptsfn); >> 501 >> >> I see that extractRelOptions only uses the tableam argument for some >> relkinds, and RelationParseRelOptions does set it up for those >> relkinds --- but Coverity's complaint isn't without merit, because >> those two switch statements are looking at *different copies of the >> relkind*, which in theory could be different. This all seems quite >> messy and poorly factored. Can't we do better? Why do we need to >> involve two copies of allegedly the same pg_class tuple, anyhow? > > I wasn't registered at Coverity yet. Now I've registered and am > waiting for approval to access the PostgreSQL analysis data. > > I wonder why Coverity complains about tableam, but not amoptsfn. > Their usage patterns are very similar. > > It appears that relation->rd_rel isn't the full copy of pg_class tuple > (see AllocateRelationDesc). RelationParseRelOptions() is going to > update relation->rd_options, and thus needs a full pg_class tuple to > fetch options out of it. However, it is really unnecessary to access > both tuples at the same time. We can use a full tuple, not > relation->rd_rel, in both cases. See the attached patch. 
> > -- > Regards, > Alexander Korotkov + Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple); +; There is an additional semicolon in the code. -- Regards, Japin Li
Re: Table AM Interface Enhancements
On Mon, Apr 1, 2024 at 7:36 PM Tom Lane wrote: > Coverity complained about what you did in RelationParseRelOptions > in c95c25f9a: > > *** CID 1595992: Null pointer dereferences (FORWARD_NULL) > /srv/coverity/git/pgsql-git/postgresql/src/backend/utils/cache/relcache.c: > 499 in RelationParseRelOptions() > 493 > 494 /* > 495 * Fetch reloptions from tuple; have to use a hardwired > descriptor because > 496 * we might not have any other for pg_class yet (consider > executing this > 497 * code for pg_class itself) > 498 */ > >>> CID 1595992: Null pointer dereferences (FORWARD_NULL) > >>> Passing null pointer "tableam" to "extractRelOptions", which > >>> dereferences it. > 499 options = extractRelOptions(tuple, GetPgClassDescriptor(), > 500 > tableam, amoptsfn); > 501 > > I see that extractRelOptions only uses the tableam argument for some > relkinds, and RelationParseRelOptions does set it up for those > relkinds --- but Coverity's complaint isn't without merit, because > those two switch statements are looking at *different copies of the > relkind*, which in theory could be different. This all seems quite > messy and poorly factored. Can't we do better? Why do we need to > involve two copies of allegedly the same pg_class tuple, anyhow? I wasn't registered at Coverity yet. Now I've registered and am waiting for approval to access the PostgreSQL analysis data. I wonder why Coverity complains about tableam, but not amoptsfn. Their usage patterns are very similar. It appears that relation->rd_rel isn't the full copy of pg_class tuple (see AllocateRelationDesc). RelationParseRelOptions() is going to update relation->rd_options, and thus needs a full pg_class tuple to fetch options out of it. However, it is really unnecessary to access both tuples at the same time. We can use a full tuple, not relation->rd_rel, in both cases. See the attached patch. -- Regards, Alexander Korotkov fix-RelationParseRelOptions-Coverity-complain.patch Description: Binary data
Re: Table AM Interface Enhancements
On Mon, Apr 1, 2024 at 7:36 PM Tom Lane wrote: > > Coverity complained about what you did in RelationParseRelOptions > in c95c25f9a: > > *** CID 1595992: Null pointer dereferences (FORWARD_NULL) > /srv/coverity/git/pgsql-git/postgresql/src/backend/utils/cache/relcache.c: > 499 in RelationParseRelOptions() > 493 > 494 /* > 495 * Fetch reloptions from tuple; have to use a hardwired > descriptor because > 496 * we might not have any other for pg_class yet (consider > executing this > 497 * code for pg_class itself) > 498 */ > >>> CID 1595992: Null pointer dereferences (FORWARD_NULL) > >>> Passing null pointer "tableam" to "extractRelOptions", which > >>> dereferences it. > 499 options = extractRelOptions(tuple, GetPgClassDescriptor(), > 500 > tableam, amoptsfn); > 501 > > I see that extractRelOptions only uses the tableam argument for some > relkinds, and RelationParseRelOptions does set it up for those > relkinds --- but Coverity's complaint isn't without merit, because > those two switch statements are looking at *different copies of the > relkind*, which in theory could be different. This all seems quite > messy and poorly factored. Can't we do better? Why do we need to > involve two copies of allegedly the same pg_class tuple, anyhow? Thank you for reporting this, Tom. I'm planning to investigate this later today. -- Regards, Alexander Korotkov
Re: Table AM Interface Enhancements
Coverity complained about what you did in RelationParseRelOptions in c95c25f9a: *** CID 1595992: Null pointer dereferences (FORWARD_NULL) /srv/coverity/git/pgsql-git/postgresql/src/backend/utils/cache/relcache.c: 499 in RelationParseRelOptions() 493 494 /* 495 * Fetch reloptions from tuple; have to use a hardwired descriptor because 496 * we might not have any other for pg_class yet (consider executing this 497 * code for pg_class itself) 498 */ >>> CID 1595992: Null pointer dereferences (FORWARD_NULL) >>> Passing null pointer "tableam" to "extractRelOptions", which >>> dereferences it. 499 options = extractRelOptions(tuple, GetPgClassDescriptor(), 500 tableam, amoptsfn); 501 I see that extractRelOptions only uses the tableam argument for some relkinds, and RelationParseRelOptions does set it up for those relkinds --- but Coverity's complaint isn't without merit, because those two switch statements are looking at *different copies of the relkind*, which in theory could be different. This all seems quite messy and poorly factored. Can't we do better? Why do we need to involve two copies of allegedly the same pg_class tuple, anyhow? regards, tom lane
Re: Table AM Interface Enhancements
On 31/3/2024 00:33, Alexander Korotkov wrote: I think the way forward might be to introduce the new API, which would isolate executor details from table AM. We may introduce a new data structure InsertWithArbiterContext which would contain EState and a set of callbacks which would avoid table AM from calling the executor directly. That should be our goal for pg18. Now, this is too close to FF to design a new API. I'm a bit late, but have you ever considered adding some sort of index probing routine to the AM interface for estimation purposes? I am working on the problem of dubious estimates: for example, when we don't have MCV statistics, the MCV statistics don't cover equality of multiple clauses, or we detect that a constant value is out of the histogram range. In such cases (especially for [parameterized] JOINs), the optimizer could probe the index and avoid a huge underestimation. This makes sense especially for multicolumn filters/clauses. With a probing AM method, we could invent something for this challenge. -- regards, Andrei Lepikhov Postgres Professional
Re: Table AM Interface Enhancements
Hi, Pavel! I've pushed 0001, 0002 and 0006. On Fri, Mar 29, 2024 at 5:23 PM Pavel Borisov wrote: >> >> I think for better code look this could be removed: >> >vlock: >> > CHECK_FOR_INTERRUPTS(); >> together with CHECK_FOR_INTERRUPTS(); in heapam_tuple_insert_with_arbiter >> placed in the beginning of while loop. > > To clarify things, this I wrote only about CHECK_FOR_INTERRUPTS(); > rearrangement. Thank you for your review of this patch. But I still think there is a problem: this patch moves part of the executor into the table AM, which then directly uses executor data structures and functions. This works, but it is not committable since it breaks the encapsulation. I think the way forward might be to introduce a new API that would isolate executor details from the table AM. We may introduce a new data structure, InsertWithArbiterContext, which would contain an EState and a set of callbacks, and would keep the table AM from calling the executor directly. That should be our goal for pg18. Now it is too close to FF to design a new API. -- Regards, Alexander Korotkov
Re: Table AM Interface Enhancements
> > I think for better code look this could be removed: > >vlock: > > CHECK_FOR_INTERRUPTS(); > together with CHECK_FOR_INTERRUPTS(); in heapam_tuple_insert_with_arbiter > placed in the beginning of while loop. > To clarify: I wrote this only about the CHECK_FOR_INTERRUPTS(); rearrangement. Regards, Pavel
Re: Table AM Interface Enhancements
I've looked at patch 0003. Generally, it does a similar thing to 0001 - it exposes a more generalized method tuple_insert_with_arbiter that encapsulates tuple_insert_speculative/tuple_complete_speculative and at the same time allows extensibility of this, i.e. a different implementation for custom tables other than heap by extensions. Though the code rearrangement is a little bit more complicated, the patch is clear. It doesn't change the behavior for heap tables. tuple_insert_speculative/tuple_complete_speculative are removed from the table AM methods. I think it would not be hard for existing users of these to adapt to the changes. Code block ExecCheckTupleVisible -- ExecCheckTIDVisible moved to heapam_handler.c I've checked that the code block is unchanged except that ExecCheckTIDVisible now gets Relation from the caller instead of constructing it from ResultRelInfo. Also two big code blocks are moved from ExecOnConflictUpdate and ExecInsert to a new method heapam_tuple_insert_with_arbiter. They correspond to the old code with several minor modifications. For ExecOnConflictUpdate, a comment needs to be revised. This one is for the shifted code: > * Try to lock tuple for update as part of speculative insertion. Probably it is worth moving to a comment for heapam_tuple_insert_with_arbiter. For heapam_tuple_insert_with_arbiter, both "return NULL" statements could be shifted one level up to the end of the block: >if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate, , >+ arbiterIndexes)) Also I'd add a comment for heapam_tuple_insert_with_arbiter: /* See comments for table_tuple_insert_with_arbiter() */ A comment to be corrected: src/backend/access/heap/heapam.c: * implement table_tuple_insert_speculative() As Japin said, I'd also propose removing "inline" from heapam_tuple_insert_with_arbiter.
More corrections for comments: %s/If tuple doesn't violates/If tuple doesn't violate/g %s/which comprises the list of/list, which comprises/g %s/conflicting tuple gets locked/conflicting tuple should be locked/g For a better code look, I think this could be removed: >vlock: > CHECK_FOR_INTERRUPTS(); together with the CHECK_FOR_INTERRUPTS(); in heapam_tuple_insert_with_arbiter placed at the beginning of the while loop. Overall the patch looks good enough to me. Regards, Pavel
Re: Table AM Interface Enhancements
On Thu, 28 Mar 2024 at 21:26, Alexander Korotkov wrote: > Hi Pavel! > > Revised patchset is attached. > > On Thu, Mar 28, 2024 at 3:12 PM Pavel Borisov wrote: >> The other extensibility that seems quite clear and uncontroversial to me is >> 0006. >> >> It simply shifts the decision on whether tuple inserts should invoke inserts >> to the related indices to the table am level. It doesn't change the current >> heap insert behavior so it's safe for the existing heap access method. But >> new table access methods could redefine this (only for tables created with >> these am's) and make index inserts independently of ExecInsertIndexTuples >> inside their own implementations of tuple_insert/tuple_multi_insert methods. >> >> I'd propose changing the comment: >> >> 1405 * This function sets `*insert_indexes` to true if expects caller to >> return >> 1406 * the relevant index tuples. If `*insert_indexes` is set to false, >> then >> 1407 * this function cares about indexes itself. >> >> in the following way >> >> Tableam implementation of tuple_insert should set `*insert_indexes` to true >> if it expects the caller to insert the relevant index tuples (as in heap >> implementation). It should set `*insert_indexes` to false if it cares >> about index inserts itself and doesn't want the caller to do index inserts. > > Changed as you proposed. > >> Maybe, a commit message is also better to reformulate to describe better who >> should do what. > > Done. > > Also, I removed extra includes in 0001 as you proposed and edited the > commit message in 0002. > >> I think, with rebase and correction in the comments/commit message patch >> 0006 is ready to be committed. > > I'm going to push 0001, 0002 and 0006 if no objections. Thanks for updating the patches. Here are some minor suggestions. 0003 +static inline TupleTableSlot * +heapam_tuple_insert_with_arbiter(ResultRelInfo *resultRelInfo, I'm not entirely certain whether the "inline" keyword has any effect.
0004 +static bytea * +heapam_indexoptions(amoptions_function amoptions, char relkind, + Datum reloptions, bool validate) +{ + return index_reloptions(amoptions, reloptions, validate); +} Could you please explain why we are not verifying the relkind like heapam_reloptions()? - case RELKIND_VIEW: case RELKIND_MATVIEW: + case RELKIND_VIEW: case RELKIND_PARTITIONED_TABLE: I think this change is unnecessary. + { + Form_pg_class classForm; + HeapTuple classTup; + + /* fetch the relation's relcache entry */ + if (relation->rd_index->indrelid >= FirstNormalObjectId) + { + classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relation->rd_index->indrelid)); + classForm = (Form_pg_class) GETSTRUCT(classTup); + if (classForm->relam >= FirstNormalObjectId) + tableam = GetTableAmRoutineByAmOid(classForm->relam); + else + tableam = GetHeapamTableAmRoutine(); + heap_freetuple(classTup); + } + else + { + tableam = GetHeapamTableAmRoutine(); + } + amoptsfn = relation->rd_indam->amoptions; + } - We can reduce the indentation by moving the classForm and classTup declarations into the if branch. - Perhaps we could remove the braces of the else branch to maintain consistency in the code style. -- Regards, Japin Li
Re: Table AM Interface Enhancements
Hi, Alexander! The other extensibility that seems quite clear and uncontroversial to me is 0006. It simply shifts the decision on whether tuple inserts should invoke inserts to the related indices to the table am level. It doesn't change the current heap insert behavior so it's safe for the existing heap access method. But new table access methods could redefine this (only for tables created with these am's) and make index inserts independently of ExecInsertIndexTuples inside their own implementations of tuple_insert/tuple_multi_insert methods. I'd propose changing the comment: 1405 * This function sets `*insert_indexes` to true if expects caller to return 1406 * the relevant index tuples. If `*insert_indexes` is set to false, then 1407 * this function cares about indexes itself. in the following way Tableam implementation of tuple_insert should set `*insert_indexes` to true if it expects the caller to insert the relevant index tuples (as in heap implementation). It should set `*insert_indexes` to false if it cares about index inserts itself and doesn't want the caller to do index inserts. Maybe, a commit message is also better to reformulate to describe better who should do what. I think, with rebase and correction in the comments/commit message patch 0006 is ready to be committed. Regards, Pavel Borisov.
Re: Table AM Interface Enhancements
Hi, Alexander! Thank you for working on these patches. On Thu, 28 Mar 2024 at 02:14, Alexander Korotkov wrote: > Hi, Pavel! > > Thank you for your feedback. The revised patch set is attached. > > I found that vacuum.c has a lot of heap-specific code. Thus, it > should be OK for analyze.c to keep heap-specific code. Therefore, I > rolled back the movement of functions between files. That leads to a > smaller patch, easier to review. > I agree with you. And with the changes remaining in heapam_handler.c I suppose we can also remove the includes introduced: #include #include "utils/sampling.h" #include "utils/spccache.h" On Wed, Mar 27, 2024 at 2:52 PM Pavel Borisov > wrote: > >> The revised rest of the patchset is attached. > >> 0001 (was 0006) – I prefer the definition of AcquireSampleRowsFunc to > >> stay in vacuum.h. If we move it to sampling.h then we would have to > >> add there includes to define Relation, HeapTuple etc. I'd like to > >> avoid this kind of change. Also, I've deleted > >> table_beginscan_analyze(), because it's only called from > >> tableam-specific AcquireSampleRowsFunc. Also I put some comments to > >> heapam_scan_analyze_next_block() and heapam_scan_analyze_next_tuple() > >> given that there are now no relevant comments for them in tableam.h. > >> I've removed some redundancies from acquire_sample_rows(). And added > >> comments to AcquireSampleRowsFunc based on what we have in FDW docs > >> for this function. Did some small edits as well. As you suggested, > >> turned back declarations for acquire_sample_rows() and compare_rows(). > > > > > > In my comment in the thread I was not thinking about returning the old > name acquire_sample_rows(), it was only about the declarations and the > order of functions to be one code block. To me heapam_acquire_sample_rows() > looks better for a name of heap implementation of *AcquireSampleRowsFunc(). > I suggest returning the name heapam_acquire_sample_rows() from v4. Sorry > for the confusion in this. 
> > I found that the function name acquire_sample_rows is referenced in > quite several places in the source code. So, I would prefer to save > the old name to keep the changes minimal. > The full list shows only a couple of changes in analyze.c and several comments elsewhere. contrib/postgres_fdw/postgres_fdw.c: * of the relation. Same algorithm as in acquire_sample_rows in src/backend/access/heap/vacuumlazy.c:* match what analyze.c's acquire_sample_rows() does, otherwise VACUUM src/backend/access/heap/vacuumlazy.c:* The logic here is a bit simpler than acquire_sample_rows(), as src/backend/access/heap/vacuumlazy.c:* what acquire_sample_rows() does. src/backend/access/heap/vacuumlazy.c:* acquire_sample_rows() does, so be consistent. src/backend/access/heap/vacuumlazy.c:* acquire_sample_rows() will recognize the same LP_DEAD items as dead src/backend/commands/analyze.c:static int acquire_sample_rows(Relation onerel, int elevel, src/backend/commands/analyze.c: acquirefunc = acquire_sample_rows; src/backend/commands/analyze.c: * acquire_sample_rows -- acquire a random sample of rows from the table src/backend/commands/analyze.c:acquire_sample_rows(Relation onerel, int elevel, src/backend/commands/analyze.c: * This has the same API as acquire_sample_rows, except that rows are src/backend/commands/analyze.c: acquirefunc = acquire_sample_rows; My point for renaming is to make it clear that it's the heap implementation of acquire_sample_rows, which could be useful if the heap implementations of table AM methods are later reworked into a separate place. But I'm also ok with the existing naming. > > The changed type of static function that always returned true for heap > looks good to me: > > static void heapam_scan_analyze_next_block > > > > The same is for removing the comparison of always true "block_accepted" > in (heapam_)acquire_sample_rows() > > Ok!
> > > Removing table_beginscan_analyze and call scan_begin() is not in the > same style as other table_beginscan_* functions. Though this is not a > change in functionality, I'd leave this part as it was in v4. > > With the patch, this method doesn't have usages outside of table am. > I don't think we should keep a method, which doesn't have clear > external usage patterns. But I agree that starting a scan with > heap_beginscan() and ending with table_endscan() is not correct. Now > ending this scan with heap_endscan(). > Good! > > Also, a comment about it was introduced in v5: > > > > src/backend/access/heap/heapam_handler.c: * with > table_beginscan_analyze() > > Corrected. > For comments I'd propose: > > %s/In addition, store estimates/In addition, a function should store > estimates/g > > %s/zerp/zero/g > > Fixed. > > >> 0002 (was 0007) – I've turned the redundant "if", which you've pointed > >> out, into an assert. Also, added some comments, most
Re: Table AM Interface Enhancements
> > This seems not needed, it's already inited to InvalidOid before. > +else > +accessMethod = default_table_access_method; > > + accessMethodId = InvalidOid; > > This code came from 374c7a22904. I don't insist on this simplification in > a patch 0002. > A correction of the code quote for the previous message: +else + accessMethodId = InvalidOid;
Re: Table AM Interface Enhancements
Hi, Alexander! The revised rest of the patchset is attached. > 0001 (was 0006) – I prefer the definition of AcquireSampleRowsFunc to > stay in vacuum.h. If we move it to sampling.h then we would have to > add there includes to define Relation, HeapTuple etc. I'd like to > avoid this kind of change. Also, I've deleted > table_beginscan_analyze(), because it's only called from > tableam-specific AcquireSampleRowsFunc. Also I put some comments to > heapam_scan_analyze_next_block() and heapam_scan_analyze_next_tuple() > given that there are now no relevant comments for them in tableam.h. > I've removed some redundancies from acquire_sample_rows(). And added > comments to AcquireSampleRowsFunc based on what we have in FDW docs > for this function. Did some small edits as well. As you suggested, > turned back declarations for acquire_sample_rows() and compare_rows(). > In my comment in the thread I was not thinking about returning the old name acquire_sample_rows(), it was only about the declarations and the order of functions to be one code block. To me heapam_acquire_sample_rows() looks better for a name of heap implementation of *AcquireSampleRowsFunc(). I suggest returning the name heapam_acquire_sample_rows() from v4. Sorry for the confusion in this. The changed type of static function that always returned true for heap looks good to me: static void heapam_scan_analyze_next_block The same is for removing the comparison of always true "block_accepted" in (heapam_)acquire_sample_rows() Removing table_beginscan_analyze and call scan_begin() is not in the same style as other table_beginscan_* functions. Though this is not a change in functionality, I'd leave this part as it was in v4. 
Also, a comment about it was introduced in v5: src/backend/access/heap/heapam_handler.c: * with table_beginscan_analyze() For comments I'd propose: %s/In addition, store estimates/In addition, a function should store estimates/g %s/zerp/zero/g > 0002 (was 0007) – I've turned the redundant "if", which you've pointed > out, into an assert. Also, added some comments, most notably comment > for TableAmRoutine.reloptions based on the indexam docs. > %s/validate sthe/validates the/g This seems not needed, it's already inited to InvalidOid before. +else +accessMethod = default_table_access_method; + accessMethodId = InvalidOid; This code came from 374c7a22904. I don't insist on this simplification in a patch 0002. Overall both patches look good to me. Regards, Pavel Borisov.
Re: Table AM Interface Enhancements
On Fri, 22 Mar 2024 at 08:51, Pavel Borisov wrote: > Hi, Alexander! > > Thank you for working on this patchset and pushing some of these patches! > > I tried to write comments for tts_minimal_is_current_xact_tuple() > and tts_minimal_getsomeattrs() for them to be the same as for the same > functions for heap and virtual tuple slots, as I proposed above in the > thread. (tts_minimal_getsysattr is not introduced by the current patchset, > but anyway) > > Meanwhile I found that (never appearing) error message > for tts_minimal_is_current_xact_tuple needs to be corrected. Please see the > patch in the attachment. > > I need to correct myself: it's for tts_minimal_getsysattr() not tts_minimal_getsomeattrs() Pavel.
Re: Table AM Interface Enhancements
Hi, Alexander! Thank you for working on this patchset and pushing some of these patches! I tried to write comments for tts_minimal_is_current_xact_tuple() and tts_minimal_getsomeattrs() for them to be the same as for the same functions for heap and virtual tuple slots, as I proposed above in the thread. (tts_minimal_getsysattr is not introduced by the current patchset, but anyway) Meanwhile I found that (never appearing) error message for tts_minimal_is_current_xact_tuple needs to be corrected. Please see the patch in the attachment. Regards, Pavel Borisov Add-comments-on-MinimalTupleSlots-usage.patch Description: Binary data
Re: Table AM Interface Enhancements
Hi, Alexander! On Wed, 20 Mar 2024 at 09:22, Pavel Borisov wrote: > Hi, Alexander! > > For 0007: > > Code inside > > +heapam_reloptions(char relkind, Datum reloptions, bool validate) > +{ > + if (relkind == RELKIND_RELATION || > + relkind == RELKIND_TOASTVALUE || > + relkind == RELKIND_MATVIEW) > + return heap_reloptions(relkind, reloptions, validate); > + > + return NULL; > > looks redundant to what is done inside heap_reloptions(). Was this on > purpose? Is it possible to leave only "return heap_reloptions()" ? > > This looks like a duplicate: > src/include/access/reloptions.h:extern bytea > *index_reloptions(amoptions_function amoptions, Datum reloptions, > src/include/access/tableam.h:extern bytea > *index_reloptions(amoptions_function amoptions, Datum reloptions, > > Otherwise the patch looks good and doing what it's proposed to do. > For patch 0006: The change for analyze is in the same style as the previous table AM extensibility patches. The existing table_scan_analyze_next_tuple/table_scan_analyze_next_block extensibility is dropped in favour of the more general method table_relation_analyze. I haven't found existing extensions on GitHub that use these table AMs, so it's probably quite ok to remove extensibility that didn't get any traction for many years. The patch contains a big block of copy-pasted code. I've checked that the code is the same, with only function names replaced to use the table AM instead of the heap AM. I'd propose restoring the static function declarations at the head of the file, which were removed in the patch, and placing heapam_acquire_sample_rows() above compare_rows() so the functions are copied as one whole code block. This is only for a better patch look, not a principal change.
-static int acquire_sample_rows(Relation onerel, int elevel, - HeapTuple *rows, int targrows, - double *totalrows, double *totaldeadrows); -static int compare_rows(const void *a, const void *b, void *arg) Might there also be a better place than vacuum.h for the typedef int (*AcquireSampleRowsFunc)? Maybe sampling.h? The other patch that I'd like to review is 0012: For the typedef enum RowRefType, I think some comments would be useful to avoid confusion about changes like - newrc->allMarkTypes = (1 << newrc->markType); + newrc->allRefTypes = (1 << refType); Also I think the semantic difference between ROW_REF_COPY and ROW_MARK_COPY should be mentioned in the comments and/or commit message. This may include a description of assigning different reftypes in parse_relation.c In a comment there is a small confusion between markType and refType: * The parent's allRefTypes field gets the OR of (1<
Re: Table AM Interface Enhancements
Hi, Alexander! For 0007: Code inside +heapam_reloptions(char relkind, Datum reloptions, bool validate) +{ + if (relkind == RELKIND_RELATION || + relkind == RELKIND_TOASTVALUE || + relkind == RELKIND_MATVIEW) + return heap_reloptions(relkind, reloptions, validate); + + return NULL; looks redundant to what is done inside heap_reloptions(). Was this on purpose? Is it possible to leave only "return heap_reloptions()" ? This looks like a duplicate: src/include/access/reloptions.h:extern bytea *index_reloptions(amoptions_function amoptions, Datum reloptions, src/include/access/tableam.h:extern bytea *index_reloptions(amoptions_function amoptions, Datum reloptions, Otherwise the patch looks good and does what it's proposed to do. Regards, Pavel Borisov.
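To make the suspected redundancy concrete, here is a self-contained sketch with stand-in types (not the real PostgreSQL definitions; the Datum/validate parameters are trimmed for brevity). It assumes, as the review suggests, that heap_reloptions() already dispatches on relkind and returns NULL for relkinds it does not support, so the wrapper's filter changes nothing:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in relkind codes and reloptions type; illustrative only */
#define RELKIND_RELATION   'r'
#define RELKIND_TOASTVALUE 't'
#define RELKIND_MATVIEW    'm'
#define RELKIND_INDEX      'i'

typedef struct bytea { int vl_len; } bytea;
static bytea parsed_options = { 42 };

/* heap_reloptions() already filters by relkind internally and returns
 * NULL for unsupported relkinds -- that is what makes the outer check
 * redundant. */
static bytea *
heap_reloptions(char relkind)
{
	switch (relkind)
	{
		case RELKIND_RELATION:
		case RELKIND_TOASTVALUE:
		case RELKIND_MATVIEW:
			return &parsed_options;		/* the real code parses here */
		default:
			return NULL;
	}
}

/* The wrapper as posted in the patch */
static bytea *
heapam_reloptions_wrapped(char relkind)
{
	if (relkind == RELKIND_RELATION ||
		relkind == RELKIND_TOASTVALUE ||
		relkind == RELKIND_MATVIEW)
		return heap_reloptions(relkind);
	return NULL;
}

/* The simplification being asked about */
static bytea *
heapam_reloptions_direct(char relkind)
{
	return heap_reloptions(relkind);
}
```

Under that assumption the two variants return the same result for every relkind, which is the behavioral equivalence the question hinges on.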
Re: Table AM Interface Enhancements
On Tue, 19 Mar 2024 at 21:05, Alexander Korotkov wrote: > Hi, Pavel! > > On Tue, Mar 19, 2024 at 11:34 AM Pavel Borisov wrote: >> On Tue, 19 Mar 2024 at 03:34, Alexander Korotkov >> wrote: >>> >>> On Sun, Mar 3, 2024 at 1:50 PM Alexander Korotkov >>> wrote: >>> > On Mon, Nov 27, 2023 at 10:18 PM Mark Dilger >>> > wrote: >>> > > >>> > > > On Nov 25, 2023, at 9:47 AM, Alexander Korotkov >>> > > > wrote: >>> > > > >>> > > >> Should the patch at least document which parts of the EState are >>> > > >> expected to be in which states, and which parts should be viewed as >>> > > >> undefined? If the implementors of table AMs rely on any/all aspects >>> > > >> of EState, doesn't that prevent future changes to how that structure >>> > > >> is used? >>> > > > >>> > > > New tuple tuple_insert_with_arbiter() table AM API method needs EState >>> > > > argument to call executor functions: ExecCheckIndexConstraints(), >>> > > > ExecUpdateLockMode(), and ExecInsertIndexTuples(). I think we >>> > > > probably need to invent some opaque way to call this executor function >>> > > > without revealing EState to table AM. Do you think this could work? >>> > > >>> > > We're clearly not accessing all of the EState, just some specific >>> > > fields, such as es_per_tuple_exprcontext. I think you could at least >>> > > refactor to pass the minimum amount of state information through the >>> > > table AM API. >>> > >>> > Yes, the table AM doesn't need the full EState, just the ability to do >>> > specific manipulation with tuples. I'll refactor the patch to make a >>> > better isolation for this. >>> >>> Please find the revised patchset attached. The changes are following: >>> 1. Patchset is rebase. to the current master. >>> 2. Patchset is reordered. I tried to put less debatable patches to the top. >>> 3. tuple_is_current() method is moved from the Table AM API to the >>> slot as proposed by Matthias van de Meent. >>> 4. 
Assert added to the table_free_rd_amcache() as proposed by Pavel Borisov. >> >> >> Patches 0001-0002 are unchanged compared to the last version in thread [1]. >> In my opinion, it's still ready to be committed, which was not done for time >> were too close to feature freeze one year ago. >> >> 0003 - Assert added from previous version. I still have a strong opinion >> that allowing multi-chunked data structures instead of single chunks is >> completely safe and makes natural process of Postgres improvement that is >> self-justified. The patch is simple enough and ready to be pushed. >> >> 0004 (previously 0007) - Have not changed, and there is consensus that this >> is reasonable. I've re-checked the current code. Looks safe considering >> returning a different slot, which I doubted before. So consider this patch >> also ready. >> >> 0005 (previously 0004) - Unused argument in the is_current_xact_tuple() >> signature is removed. Also comparing to v1 the code shifted from tableam >> methods to TTS's level. >> >> I'd propose to remove Assert(!TTS_EMPTY(slot)) for >> tts_minimal_is_current_xact_tuple() and tts_virtual_is_current_xact_tuple() >> as these are only error reporting functions that don't use slot actually. >> >> Comment similar to: >> +/* >> + * VirtualTupleTableSlots never have a storage tuples. We generally >> + * shouldn't get here, but provide a user-friendly message if we do. >> + */ >> also applies to tts_minimal_is_current_xact_tuple() >> >> I'd propose changes for clarity of this comment: >> %s/a storage tuples/storage tuples/g >> %s/generally//g >> >> Otherwise patch 0005 also looks good to me. >> >> I'm planning to review the remaining patches. Meanwhile think pushing what >> is now ready and uncontroversial is a good intention. >> Thank you for the work done on this patchset! > > Thank you, Pavel! > > Regarding 0005, I did apply "a storage tuples" grammar fix. 
Regarding > the rest of the things, I'd like to keep methods > tts_*_is_current_xact_tuple() to be similar to nearby > tts_*_getsysattr(). This is why I'm keeping the rest unchanged. I > think we could refactor that later, but together with > tts_*_getsysattr() methods. > > I'm going to push 0003, 0004 and 0005 if there are no objections. > > And I'll update 0001 and 0002 in their dedicated thread. > When I try to test the patch on Ubuntu 22.04 with GCC 11.4.0, there are some warnings, as follows: /home/japin/Codes/postgres/build/../src/backend/access/heap/heapam_handler.c: In function ‘heapam_acquire_sample_rows’: /home/japin/Codes/postgres/build/../src/backend/access/heap/heapam_handler.c:1603:28: warning: implicit declaration of function ‘get_tablespace_maintenance_io_concurrency’ [-Wimplicit-function-declaration] 1603 | prefetch_maximum = get_tablespace_maintenance_io_concurrency(onerel->rd_rel->reltablespace); |^ /home/japin/Codes/postgres/build/../src/backend/access/heap/heapam_handler.c:1757:30:
Re: Table AM Interface Enhancements
Hi, Alexander! On Tue, 19 Mar 2024 at 03:34, Alexander Korotkov wrote: > On Sun, Mar 3, 2024 at 1:50 PM Alexander Korotkov > wrote: > > On Mon, Nov 27, 2023 at 10:18 PM Mark Dilger > > wrote: > > > > > > > On Nov 25, 2023, at 9:47 AM, Alexander Korotkov < > aekorot...@gmail.com> wrote: > > > > > > > >> Should the patch at least document which parts of the EState are > expected to be in which states, and which parts should be viewed as > undefined? If the implementors of table AMs rely on any/all aspects of > EState, doesn't that prevent future changes to how that structure is used? > > > > > > > > New tuple tuple_insert_with_arbiter() table AM API method needs > EState > > > > argument to call executor functions: ExecCheckIndexConstraints(), > > > > ExecUpdateLockMode(), and ExecInsertIndexTuples(). I think we > > > > probably need to invent some opaque way to call this executor > function > > > > without revealing EState to table AM. Do you think this could work? > > > > > > We're clearly not accessing all of the EState, just some specific > fields, such as es_per_tuple_exprcontext. I think you could at least > refactor to pass the minimum amount of state information through the table > AM API. > > > > Yes, the table AM doesn't need the full EState, just the ability to do > > specific manipulation with tuples. I'll refactor the patch to make a > > better isolation for this. > > Please find the revised patchset attached. The changes are following: > 1. Patchset is rebased to the current master. > 2. Patchset is reordered. I tried to put less debatable patches to the > top. > 3. tuple_is_current() method is moved from the Table AM API to the > slot as proposed by Matthias van de Meent. > 4. Assert added to the table_free_rd_amcache() as proposed by Pavel > Borisov. Patches 0001-0002 are unchanged compared to the last version in thread [1]. In my opinion, they are still ready to be committed; this was not done only because it was too close to feature freeze one year ago.
0003 - Assert added from previous version. I still have a strong opinion that allowing multi-chunked data structures instead of single chunks is completely safe and is a natural, self-justified step in Postgres's improvement. The patch is simple enough and ready to be pushed. 0004 (previously 0007) - Has not changed, and there is consensus that this is reasonable. I've re-checked the current code. It looks safe considering returning a different slot, which I doubted before. So consider this patch also ready. 0005 (previously 0004) - The unused argument in the is_current_xact_tuple() signature is removed. Also, compared to v1, the code has shifted from the tableam methods to the TTS level. I'd propose removing Assert(!TTS_EMPTY(slot)) from tts_minimal_is_current_xact_tuple() and tts_virtual_is_current_xact_tuple(), as these are only error-reporting functions that don't actually use the slot. A comment similar to: +/* + * VirtualTupleTableSlots never have a storage tuples. We generally + * shouldn't get here, but provide a user-friendly message if we do. + */ also applies to tts_minimal_is_current_xact_tuple(). I'd propose changes for clarity of this comment: %s/a storage tuples/storage tuples/g %s/generally//g Otherwise patch 0005 also looks good to me. I'm planning to review the remaining patches. Meanwhile, I think pushing what is now ready and uncontroversial is a good idea. Thank you for the work done on this patchset! Regards, Pavel Borisov, Supabase. [1]. https://www.postgresql.org/message-id/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com
Re: Table AM Interface Enhancements
On Mon, Nov 27, 2023 at 10:18 PM Mark Dilger wrote: > > > On Nov 25, 2023, at 9:47 AM, Alexander Korotkov > > wrote: > > > >> Should the patch at least document which parts of the EState are expected > >> to be in which states, and which parts should be viewed as undefined? If > >> the implementors of table AMs rely on any/all aspects of EState, doesn't > >> that prevent future changes to how that structure is used? > > > > New tuple tuple_insert_with_arbiter() table AM API method needs EState > > argument to call executor functions: ExecCheckIndexConstraints(), > > ExecUpdateLockMode(), and ExecInsertIndexTuples(). I think we > > probably need to invent some opaque way to call this executor function > > without revealing EState to table AM. Do you think this could work? > > We're clearly not accessing all of the EState, just some specific fields, > such as es_per_tuple_exprcontext. I think you could at least refactor to > pass the minimum amount of state information through the table AM API. Yes, the table AM doesn't need the full EState, just the ability to do specific manipulation with tuples. I'll refactor the patch to make a better isolation for this. -- Regards, Alexander Korotkov
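The "opaque way to call executor functions" floated above can be sketched as follows. This is purely hypothetical, with stand-in types and illustrative names (nothing here is from the patchset): instead of exposing EState, the executor hands the table AM a small context holding an opaque pointer plus callbacks for the few operations the AM actually needs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in slot type */
typedef struct TupleTableSlot { int dummy; } TupleTableSlot;

/* Hypothetical reduced-state context: the AM never dereferences
 * 'opaque' (EState in disguise); only the executor-provided callbacks
 * do.  All names are illustrative. */
typedef struct InsertArbiterContext
{
	void	   *opaque;			/* executor state; opaque to the AM */
	bool	  (*check_index_constraints) (void *opaque, TupleTableSlot *slot);
	void	  (*insert_index_tuples) (void *opaque, TupleTableSlot *slot);
} InsertArbiterContext;

/* Toy executor state and callbacks standing in for the real functions */
typedef struct { int inserts; bool conflict; } ToyExecState;

static bool
toy_check(void *opaque, TupleTableSlot *slot)
{
	return !((ToyExecState *) opaque)->conflict;
}

static void
toy_insert(void *opaque, TupleTableSlot *slot)
{
	((ToyExecState *) opaque)->inserts++;
}

/* What a table AM's tuple_insert_with_arbiter() body might look like:
 * it drives the algorithm but touches executor state only via callbacks. */
static bool
am_insert_with_arbiter(InsertArbiterContext *ctx, TupleTableSlot *slot)
{
	if (!ctx->check_index_constraints(ctx->opaque, slot))
		return false;			/* conflict: caller handles ON CONFLICT */
	ctx->insert_index_tuples(ctx->opaque, slot);
	return true;
}
```

The design trade-off is the one Mark raises: the callback struct documents exactly which executor facilities the AM may rely on, so future EState rearrangements only have to preserve the callbacks, not the whole structure.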
Re: Table AM Interface Enhancements
Hi, Matthias! On Fri, Nov 24, 2023 at 1:07 AM Matthias van de Meent wrote: > On Thu, 23 Nov 2023 at 13:43, Alexander Korotkov wrote: > > > > Hello PostgreSQL Hackers, > > > > I am pleased to submit a series of patches related to the Table Access > > Method (AM) interface, which I initially announced during my talk at > > PGCon 2023 [1]. These patches are primarily designed to support the > > OrioleDB engine, but I believe they could be beneficial for other > > table AM implementations as well. > > > > The focus of these patches is to introduce more flexibility and > > capabilities into the Table AM interface. This is particularly > > relevant for advanced use cases like index-organized tables, > > alternative MVCC implementations, etc. > > > > Here's a brief overview of the patches included in this set: > > Note: no significant review of the patches, just a first response on > the cover letters and oddities I noticed: > > Overall, this patchset adds significant API area to TableAmRoutine, > without adding the relevant documentation on how it's expected to be > used. I have to note that, unlike the documentation for index access methods, our documentation for table access methods doesn't have an explanation of the API functions. Instead, it just refers to tableam.h for details. The patches touching tableam.h also revise the relevant comments. These comments are certainly a target for improvement. > With the overall size of the patchset also being very > significant I wouldn't say the volume is very significant. It's just 2K lines, not a great size for a patchset. But it certainly represents changes of great importance. > I don't think this patch is reviewable as is; the goal > isn't clear enough, The goal is to revise the table AM API so that new full-featured implementations can exist. AFAICS, the current API was designed keeping zheap in mind, but even zheap was always shipped with the core patch. All other implementations of table AM that I've seen are very limited.
Basically, there is still no real alternative and functional OLTP table AM. I believe API limitations are one of the reasons for that. > the APIs aren't well explained, and As I mentioned before, the table AM API is documented by the comments in tableam.h. The comments in the patchset certainly aren't perfect, but they are a subject for incremental improvement. > the interactions with the index API are left up in the air. Right. These patches bring more control over interactions with indexes to table AMs without touching the index API. In my PGCon 2016 talk I proposed that a table AM could have its own implementation of an index AM. As you mentioned before, this patchset isn't very small already. Considering it all together with a patchset for index AM redesign would make it a mess. I propose we consider here the patches that are usable by themselves even without index AM changes. The patches tightly coupled with index AM API changes could be considered later, together with those changes. > > 0001-Allow-locking-updated-tuples-in-tuple_update-and--v1.patch > > > > Optimizes the process of locking concurrently updated tuples during > > update and delete operations. Helpful for table AMs where refinding > > existing tuples is expensive. > > Is this essentially an optimized implementation of the "DELETE FROM > ... RETURNING *" per-tuple primitive? Not really. The test for "DELETE FROM ... RETURNING *" was used just to reproduce one of the bugs spotted in [2]. The general idea is to avoid repeated calls for tuple lock. > > 0003-Allow-table-AM-to-store-complex-data-structures-i-v1.patch > > > > Allows table AM to store complex data structure in rd_amcache rather > > than a single chunk of memory. > > I don't think we should allow arbitrarily large and arbitrarily many > chunks of data in the relcache or table caches. Hmm... what and how much PostgreSQL extensions actually cache seems to be far beyond the API's control. > Why isn't one chunk > enough?
It's generally possible to fit everything into one chunk, but that's extremely inconvenient when your cache contains something at least as complex as tuple slots and descriptors. I think the reason we still have the one-chunk restriction is that we don't have a full-featured implementation fitting the API yet. If we had one, I can't imagine its cache would be a single chunk. > > 0004-Add-table-AM-tuple_is_current-method-v1.patch > > > > This allows us to abstract how/whether table AM uses transaction > > identifiers. > > I'm not a fan of the indirection here. Also, assuming that table slots > don't outlive transactions, wouldn't this be a more appropriate fit > with the table tuple slot API? This is a good idea. I will update the patch accordingly. > > 0005-Generalize-relation-analyze-in-table-AM-interface-v1.patch > > > > Provides a more flexible API for sampling tuples, beneficial for > > non-standard table types like index-organized tables. > >
Re: Table AM Interface Enhancements
> > > Additionally changes in 0007 looks dependent from 0005. Does replacement > of slot inside ExecInsert, that is already used in the code below the call > of > > >/* insert the tuple normally */ > >- table_tuple_insert(resultRelationDesc, slot, > >- estate->es_output_cid, > >- 0, NULL); > > could be done without side effects? > I'm sorry that I didn't include all the relevant code in the previous message: /* insert the tuple normally */ - table_tuple_insert(resultRelationDesc, slot, - estate->es_output_cid, - 0, NULL); + slot = table_tuple_insert(resultRelationDesc, slot, + estate->es_output_cid, + (Previously, the slot variable that exists in ExecInsert() and could be used later was not modified in the quoted code block.) Pavel.
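The point of the quoted diff can be shown with a self-contained sketch (stand-in types, not the real PostgreSQL definitions): once tuple_insert() returns a slot, the caller must rebind its local variable to the returned one, because an AM may hand back a different, AM-native slot aware of its own system attributes:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in slot type: one flag marks the AM-native slot */
typedef struct TupleTableSlot { int is_am_native; } TupleTableSlot;

static TupleTableSlot am_native_slot = { 1 };

/* Sketch of the 0007-style tuple_insert: a heap-like AM would return the
 * input slot unchanged, while an AM with its own tuple format may return
 * its native slot instead. */
static TupleTableSlot *
table_tuple_insert(TupleTableSlot *slot)
{
	return &am_native_slot;
}

/* Sketch of the relevant bit of ExecInsert() */
static int
exec_insert(TupleTableSlot *slot)
{
	/* The quoted diff: capture the returned slot... */
	slot = table_tuple_insert(slot);
	/* ...so the code below (index insertion, RETURNING, triggers) sees
	 * the AM-native slot, not the original one. */
	return slot->is_am_native;
}
```

Pavel's side-effect question is exactly about this rebinding: all later uses of the local slot variable in ExecInsert() now see whatever slot the AM returned.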
Re: Table AM Interface Enhancements
Hi, Alexander! I've reviewed patch 0004. It's clear enough, and I think it does what it's supposed to. One thing: in the function signature +bool (*tuple_is_current) (Relation rel, TupleTableSlot *slot); there is a Relation argument, which is unused in both existing heapam methods. It's also unused in the OrioleDB implementation of tuple_is_current. For what purpose is it needed in the interface? No other objections to this patch. I've also looked at 0005-0007. Although this is not a thorough review, they seem to depend on the previous patch 0004. Additionally, the changes in 0007 look dependent on 0005. Can the replacement of the slot inside ExecInsert(), which is already used in the code below the call of >/* insert the tuple normally */ >- table_tuple_insert(resultRelationDesc, slot, >- estate->es_output_cid, >- 0, NULL); be done without side effects? Kind regards, Pavel.
Re: Table AM Interface Enhancements
Hi, Nikita! On Wed, 29 Nov 2023 at 18:27, Nikita Malakhov wrote: > Hi, > > Pavel, as far as I understand Alexander's idea assertion and especially > ereport > here does not make any sense - this method is not considered to report > error, it > silently calls if there is underlying [free] function and simply falls > through otherwise, > also, take into account that it could be located in the uninterruptible > part of the code. > > On the whole topic I have to > > On Wed, Nov 29, 2023 at 4:56 PM Pavel Borisov > wrote: > >> Hi, Alexander! >> >>> I'm planning to review some of the other patches from the current >>> patchset soon. >>> >> >> I've looked into the patch 0003. >> The patch looks in good shape and is uncontroversial to me. Making memory >> structures to be dynamically allocated is simple enough and it allows to >> store complex data like lists etc. I consider places like this that expect >> memory structures to be flat and allocated at once are because the was no >> need in more complex ones previously. If there is a need for them, I think >> they could be added without much doubt, provided the simplicity of the >> change. >> >> For the code: >> +static inline void >> +table_free_rd_amcache(Relation rel) >> +{ >> + if (rel->rd_tableam && rel->rd_tableam->free_rd_amcache) >> + { >> + rel->rd_tableam->free_rd_amcache(rel); >> + } >> + else >> + { >> + if (rel->rd_amcache) >> + pfree(rel->rd_amcache); >> + rel->rd_amcache = NULL; >> + } >> >> here I suggest adding Assert(rel->rd_amcache == NULL) (or maybe better an >> error report) after calling free_rd_amcache to be sure the custom >> implementation has done what it should do. >> >> Also, I think some brief documentation about writing this custom method >> is quite relevant maybe based on already existing comments in the code. 
>> >> Kind regards, >> Pavel >> > > When we do the default single-chunk routine, we invalidate the rd_amcache pointer: + if (rel->rd_amcache) + pfree(rel->rd_amcache); + rel->rd_amcache = NULL; If we delegate this to a method, my idea is to check that the method implementation doesn't leave this pointer valid. If it's not needed, I'm ok with that, but to me the check I proposed seems to make sense. Regards, Pavel
Re: Table AM Interface Enhancements
Hi, Pavel, as far as I understand Alexander's idea, an assertion and especially an ereport here do not make any sense - this method is not supposed to report an error; it silently calls the underlying [free] function if there is one, and simply falls through otherwise. Also, take into account that it could be located in an uninterruptible part of the code. On the whole topic I have to On Wed, Nov 29, 2023 at 4:56 PM Pavel Borisov wrote: > Hi, Alexander! > >> I'm planning to review some of the other patches from the current >> patchset soon. >> > > I've looked into the patch 0003. > The patch looks in good shape and is uncontroversial to me. Making memory > structures to be dynamically allocated is simple enough and it allows to > store complex data like lists etc. I consider places like this that expect > memory structures to be flat and allocated at once are because the was no > need in more complex ones previously. If there is a need for them, I think > they could be added without much doubt, provided the simplicity of the > change. > > For the code: > +static inline void > +table_free_rd_amcache(Relation rel) > +{ > + if (rel->rd_tableam && rel->rd_tableam->free_rd_amcache) > + { > + rel->rd_tableam->free_rd_amcache(rel); > + } > + else > + { > + if (rel->rd_amcache) > + pfree(rel->rd_amcache); > + rel->rd_amcache = NULL; > + } > > here I suggest adding Assert(rel->rd_amcache == NULL) (or maybe better an > error report) after calling free_rd_amcache to be sure the custom > implementation has done what it should do. > > Also, I think some brief documentation about writing this custom method is > quite relevant maybe based on already existing comments in the code. > > Kind regards, > Pavel > -- Regards, Nikita Malakhov Postgres Professional The Russian Postgres Company https://postgrespro.ru/
Re: Table AM Interface Enhancements
Hi, Alexander! > I'm planning to review some of the other patches from the current patchset > soon. > I've looked into patch 0003. The patch looks to be in good shape and is uncontroversial to me. Making memory structures dynamically allocated is simple enough, and it allows storing complex data like lists, etc. I think places like this expect memory structures to be flat and allocated at once only because there was no need for more complex ones previously. If there is a need for them, I think they could be added without much doubt, given the simplicity of the change. For the code: +static inline void +table_free_rd_amcache(Relation rel) +{ + if (rel->rd_tableam && rel->rd_tableam->free_rd_amcache) + { + rel->rd_tableam->free_rd_amcache(rel); + } + else + { + if (rel->rd_amcache) + pfree(rel->rd_amcache); + rel->rd_amcache = NULL; + } here I suggest adding Assert(rel->rd_amcache == NULL) (or maybe better, an error report) after calling free_rd_amcache, to be sure the custom implementation has done what it should do. Also, I think some brief documentation about writing this custom method would be quite relevant, maybe based on already existing comments in the code. Kind regards, Pavel
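The proposed check can be visualized with a self-contained sketch. The types are stand-ins for the real PostgreSQL structures, and assert/free stand in for Assert/pfree; the point is that after a custom free_rd_amcache callback runs, rd_amcache must have been reset to NULL:

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal stand-ins for the structures involved; not the real definitions */
typedef struct Relation Relation;

typedef struct TableAmRoutine
{
	void	  (*free_rd_amcache) (Relation *rel);
} TableAmRoutine;

struct Relation
{
	const TableAmRoutine *rd_tableam;
	void	   *rd_amcache;
};

/* table_free_rd_amcache() with the check Pavel proposes: a custom
 * implementation must leave rd_amcache cleared, or the assert fires. */
static void
table_free_rd_amcache(Relation *rel)
{
	if (rel->rd_tableam && rel->rd_tableam->free_rd_amcache)
	{
		rel->rd_tableam->free_rd_amcache(rel);
		assert(rel->rd_amcache == NULL);	/* the proposed check */
	}
	else
	{
		free(rel->rd_amcache);		/* pfree() in the real code */
		rel->rd_amcache = NULL;
	}
}

/* A well-behaved custom implementation: frees its data and clears the
 * pointer, satisfying the contract the assert enforces. */
static void
custom_free_amcache(Relation *rel)
{
	free(rel->rd_amcache);
	rel->rd_amcache = NULL;
}
```

A custom implementation that freed its chunks but forgot to clear rd_amcache would trip the assert, which is exactly the class of bug the check is meant to catch.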
Re: Table AM Interface Enhancements
Hi, Alexander! I think table AM extensibility is a very good idea generally, not only in the scope of the APIs that are needed in OrioleDB. Thanks for your proposals! For patches 0001-Allow-locking-updated-tuples-in-tuple_update-and--v1.patch 0002-Add-EvalPlanQual-delete-returning-isolation-test-v1.patch The new isolation test is related to the previous patch. These two patches were previously discussed in [2]. As the discussion in [2] seems close to the patches being committed, and the only reason they are not in v16 yet is that it was too close to feature freeze, I've copied the most recent versions of patches 0001 and 0002 from this thread into [2] to finish and commit them there. I'm planning to review some of the other patches from the current patchset soon. [2]. https://www.postgresql.org/message-id/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com Kind regards, Pavel Borisov
Re: Table AM Interface Enhancements
> On Nov 25, 2023, at 9:47 AM, Alexander Korotkov wrote: > >> Should the patch at least document which parts of the EState are expected to >> be in which states, and which parts should be viewed as undefined? If the >> implementors of table AMs rely on any/all aspects of EState, doesn't that >> prevent future changes to how that structure is used? > > New tuple tuple_insert_with_arbiter() table AM API method needs EState > argument to call executor functions: ExecCheckIndexConstraints(), > ExecUpdateLockMode(), and ExecInsertIndexTuples(). I think we > probably need to invent some opaque way to call this executor function > without revealing EState to table AM. Do you think this could work? We're clearly not accessing all of the EState, just some specific fields, such as es_per_tuple_exprcontext. I think you could at least refactor to pass the minimum amount of state information through the table AM API. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Table AM Interface Enhancements
On Fri, Nov 24, 2023 at 5:18 PM Mark Dilger wrote: > > On Nov 23, 2023, at 4:42 AM, Alexander Korotkov > > wrote: > > > > 0006-Generalize-table-AM-API-for-INSERT-.-ON-CONFLICT-v1.patch > > > > Provides a new table AM API method to encapsulate the whole INSERT ... > > ON CONFLICT ... algorithm rather than just implementation of > > speculative tokens. > > I *think* I understand that you are taking the part of INSERT..ON CONFLICT > that lives outside the table AM and pulling it inside so that table AM > authors are free to come up with whatever implementation is more suited for > them. The most straightforward way of doing so results in an EState > parameter in the interface definition. That seems not so good, as the EState > is a fairly complicated structure, and future changes in the executor might > want to rearrange what EState tracks, which would change which values > tuple_insert_with_arbiter() can depend on. I think this is the correct understanding. > Should the patch at least document which parts of the EState are expected to > be in which states, and which parts should be viewed as undefined? If the > implementors of table AMs rely on any/all aspects of EState, doesn't that > prevent future changes to how that structure is used? New tuple tuple_insert_with_arbiter() table AM API method needs EState argument to call executor functions: ExecCheckIndexConstraints(), ExecUpdateLockMode(), and ExecInsertIndexTuples(). I think we probably need to invent some opaque way to call this executor function without revealing EState to table AM. Do you think this could work? > > 0008-Let-table-AM-insertion-methods-control-index-inse-v1.patch > > > > Allows table AM to skip index insertions in the executor and handle > > those insertions itself. > > The new parameter could use more documentation. > > > 0009-Custom-reloptions-for-table-AM-v1.patch > > > > Enables table AMs to define and override reloptions for tables and indexes. 
> > This could use some regression tests to exercise the custom reloptions. Thank you for these notes. I'll take this into account for the next patchset version. -- Regards, Alexander Korotkov
Re: Table AM Interface Enhancements
> On Nov 23, 2023, at 4:42 AM, Alexander Korotkov wrote: > 0006-Generalize-table-AM-API-for-INSERT-.-ON-CONFLICT-v1.patch > > Provides a new table AM API method to encapsulate the whole INSERT ... > ON CONFLICT ... algorithm rather than just implementation of > speculative tokens. I *think* I understand that you are taking the part of INSERT..ON CONFLICT that lives outside the table AM and pulling it inside so that table AM authors are free to come up with whatever implementation is more suited for them. The most straightforward way of doing so results in an EState parameter in the interface definition. That seems not so good, as the EState is a fairly complicated structure, and future changes in the executor might want to rearrange what EState tracks, which would change which values tuple_insert_with_arbiter() can depend on. Should the patch at least document which parts of the EState are expected to be in which states, and which parts should be viewed as undefined? If the implementors of table AMs rely on any/all aspects of EState, doesn't that prevent future changes to how that structure is used? > 0008-Let-table-AM-insertion-methods-control-index-inse-v1.patch > > Allows table AM to skip index insertions in the executor and handle > those insertions itself. The new parameter could use more documentation. > 0009-Custom-reloptions-for-table-AM-v1.patch > > Enables table AMs to define and override reloptions for tables and indexes. This could use some regression tests to exercise the custom reloptions. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Table AM Interface Enhancements
Hi, On Thu, 23 Nov 2023 at 13:43, Alexander Korotkov wrote: > > Hello PostgreSQL Hackers, > > I am pleased to submit a series of patches related to the Table Access > Method (AM) interface, which I initially announced during my talk at > PGCon 2023 [1]. These patches are primarily designed to support the > OrioleDB engine, but I believe they could be beneficial for other > table AM implementations as well. > > The focus of these patches is to introduce more flexibility and > capabilities into the Table AM interface. This is particularly > relevant for advanced use cases like index-organized tables, > alternative MVCC implementations, etc. > > Here's a brief overview of the patches included in this set: Note: no significant review of the patches, just a first response on the cover letters and oddities I noticed: Overall, this patchset adds significant API area to TableAmRoutine, without adding the relevant documentation on how it's expected to be used. With the overall size of the patchset also being very significant, I don't think this patch is reviewable as is; the goal isn't clear enough, the APIs aren't well explained, and the interactions with the index API are left up in the air. > 0001-Allow-locking-updated-tuples-in-tuple_update-and--v1.patch > > Optimizes the process of locking concurrently updated tuples during > update and delete operations. Helpful for table AMs where refinding > existing tuples is expensive. Is this essentially an optimized implementation of the "DELETE FROM ... RETURNING *" per-tuple primitive? > 0003-Allow-table-AM-to-store-complex-data-structures-i-v1.patch > > Allows table AM to store complex data structure in rd_amcache rather > than a single chunk of memory. I don't think we should allow arbitrarily large and arbitrarily many chunks of data in the relcache or table caches. Why isn't one chunk enough? 
> 0004-Add-table-AM-tuple_is_current-method-v1.patch > > This allows us to abstract how/whether table AM uses transaction identifiers. I'm not a fan of the indirection here. Also, assuming that table slots don't outlive transactions, wouldn't this be a more appropriate fit with the table tuple slot API? > 0005-Generalize-relation-analyze-in-table-AM-interface-v1.patch > > Provides a more flexible API for sampling tuples, beneficial for > non-standard table types like index-organized tables. > > 0006-Generalize-table-AM-API-for-INSERT-.-ON-CONFLICT-v1.patch > > Provides a new table AM API method to encapsulate the whole INSERT ... > ON CONFLICT ... algorithm rather than just implementation of > speculative tokens. Does this not still require speculative inserts, with speculative tokens, for secondary indexes? Why make AMs implement that all over again? > 0007-Allow-table-AM-tuple_insert-method-to-return-the--v1.patch > > This allows table AM to return a native tuple slot, which is aware of > table AM-specific system attributes. This seems reasonable. > 0008-Let-table-AM-insertion-methods-control-index-inse-v1.patch > > Allows table AM to skip index insertions in the executor and handle > those insertions itself. Who handles index tuple removal then? I don't see a patch that describes index AM changes for this... > 0009-Custom-reloptions-for-table-AM-v1.patch > > Enables table AMs to define and override reloptions for tables and indexes. > > 0010-Notify-table-AM-about-index-creation-v1.patch > > Allows table AMs to prepare or update specific meta-information during > index creation. I don't think the described use case of this API is OK - a table AM cannot know about the internals of index AMs, and is definitely not allowed to overwrite the information of that index. If I ask for an index that uses the "btree" index, then that needs to be the AM actually used, or an error needs to be raised if it is somehow incompatible with the table AM used. 
It can't be that we silently update information and create an index that is explicitly not what the user asked to create. I also don't see updates in documentation, which I think is quite a shame as I have trouble understanding some parts. > 0012-Introduce-RowID-bytea-tuple-identifier-v1.patch > > `This patch introduces 'RowID', a new bytea tuple identifier, to > overcome the limitations of the current 32-bit block number and 16-bit > offset-based tuple identifier. This is particularly useful for > index-organized tables and other advanced use cases. We don't have any index methods that can handle anything but block+offset TIDs, and I don't see any changes to the IndexAM APIs to support these RowID tuples, so what's the plan here? I don't see any of that in the commit message, nor in the rest of this patchset. > Each commit message contains a detailed explanation of the changes and > their rationale. I believe these enhancements will significantly > improve the flexibility and capabilities of the PostgreSQL Table AM > interface. I've noticed there is not a lot of rationale for several of the changes as to why