Re: Table AM Interface Enhancements

Alexander Korotkov Thu, 11 Apr 2024 10:46:28 -0700

Hi!

On Thu, Apr 11, 2024 at 7:19 PM Melanie Plageman
<melanieplage...@gmail.com> wrote:
> On Wed, Apr 10, 2024 at 5:21 PM Andres Freund <and...@anarazel.de> wrote:
> > On 2024-04-10 16:50:44 -0400, Melanie Plageman wrote:
> > > This brings up a question about the prefetching. We never had to have
> > > this discussion for sequential scan streaming read because it didn't
> > > (and still doesn't) do prefetching. But, if we push the streaming read
> > > code down into the heap AM layer, it will be doing the prefetching.
> > > So, do we remove the prefetching from acquire_sample_rows() and expect
> > > other table AMs to implement it themselves or use the streaming read
> > > API?
> >
> > The prefetching added to acquire_sample_rows was quite narrowly tailored to
> > something heap-like - it pretty much required that block numbers to be 1:1
> > with the actual physical on-disk location for the specific AM.  So I think
> > it's pretty much required for this to be pushed down.
> >
> > Using a read stream is a few lines for something like this, so I'm not 
> > worried
> > about it. I guess we could have a default implementation for block based 
> > AMs,
> > similar what we have around table_block_parallelscan_*, but not sure it's
> > worth doing that, the complexity is much lower than in the
> > table_block_parallelscan_ case.
>
> This makes sense.
>
> I am working on pushing streaming ANALYZE into heap AM code, and I ran
> into a few roadblocks.
>
> If we want ANALYZE to make the ReadStream object in heap_beginscan()
> (like the read stream implementation of heap sequential and TID range
> scans do), I don't see any way around changing the scan_begin table AM
> callback to take a BufferAccessStrategy at the least (and perhaps also
> the BlockSamplerData).
>
> read_stream_begin_relation() doesn't just save the
> BufferAccessStrategy in the ReadStream, it uses it to set various
> other things in the ReadStream object. callback_private_data (which in
> ANALYZE's case is the BlockSamplerData) is simply saved in the
> ReadStream, so it could be set later, but that doesn't sound very
> clean to me.
>
> As such, it seems like a cleaner alternative would be to add a table
> AM callback for creating a read stream object that takes the
> parameters of read_stream_begin_relation(). But, perhaps it is a bit
> late for such additions.
>
> It also opens us up to the question of whether or not sequential scan
> should use such a callback instead of making the read stream object in
> heap_beginscan().
>
> I am happy to write a patch that does any of the above. But, I want to
> raise these questions, because perhaps I am simply missing an obvious
> alternative solution.


I understand that I'm the bad guy of this release, not sure if my
opinion counts.

But what is going on here?  I hope this work is targeting pg18.
Otherwise, do I get this right that this post feature-freeze works on
designing a new API?  Yes, 27bc1772fc masked the problem.  But it was
committed on Mar 30.  So that couldn't justify why the proper API
wasn't designed in time.  Are we judging different commits with the
same criteria?

IMHO, 041b96802e should be just reverted.

------
Regards,
Alexander Korotkov

Re: Table AM Interface Enhancements

Reply via email to