> On 8 Dec 2020, at 16:44, Denis Smirnov <s...@arenadata.io> wrote:
> 
> Andrey, thanks for your feedback!
> 
> I agree that AMs with fixed-size blocks can have very similar code in 
> acquire_sample_rows() (though it is not a rule). But there are several points 
> about the current sampling in master.
> 
> * It is not perfect - AM developers may want to improve it with other 
> sampling algorithms.
> * It was designed under the strong influence of the heap AM - for example, 
> RelationGetNumberOfBlocks() returns uint32, while other AMs can have a larger 
> number of blocks.
> * heapam_acquire_sample_rows() is a small function - I don't think it is 
> much trouble for any AM developer to write something similar.
> * Some AMs may have single-level sampling (only Algorithm Z from Vitter, for 
> example) - why not?
> 
> As a result we get a single, clear method to acquire rows for statistics. 
> If we don’t modify but rather extend the current API (for example, in the way 
> it is done for FDW), the code becomes more complicated and difficult to 
> understand.

This makes sense. The purpose of the API is to provide a flexible abstraction. 
The current table_scan_analyze_next_block()/table_scan_analyze_next_tuple() API 
assumes too much about the AM implementation.
But why do you pass int natts and VacAttrStats **stats to 
acquire_sample_rows()? Are they of any use? They seem to break the abstraction too.
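
To illustrate the single-level sampling point from the quoted list, an AM that
cannot (or does not want to) sample by block could feed every row through a
one-pass reservoir. The following is only a rough sketch under stated
assumptions: it uses the simpler Algorithm R rather than Vitter's Algorithm Z,
it stores plain 64-bit row identifiers instead of tuples, and every name in it
(SketchRng, Reservoir, reservoir_offer, ...) is hypothetical and not part of
the PostgreSQL API. The row counter is 64-bit, so it is not tied to a uint32
block count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; none of this is PostgreSQL API. */

/* Tiny xorshift64 PRNG so the sketch has no external dependencies. */
typedef struct
{
    uint64_t state;             /* must never be zero */
} SketchRng;

static uint64_t
sketch_rng_next(SketchRng *rng)
{
    uint64_t x = rng->state;

    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    rng->state = x;
    return x;
}

/* Single-pass reservoir sampler (Algorithm R, not Vitter's Algorithm Z). */
typedef struct
{
    uint64_t   *rows;           /* caller-provided buffer of `capacity` slots */
    size_t      capacity;       /* target sample size */
    uint64_t    seen;           /* rows offered so far; 64-bit on purpose */
    SketchRng   rng;
} Reservoir;

static void
reservoir_init(Reservoir *r, uint64_t *buf, size_t capacity, uint64_t seed)
{
    assert(capacity > 0);
    r->rows = buf;
    r->capacity = capacity;
    r->seen = 0;
    /* xorshift state must be non-zero, so substitute a constant for seed 0 */
    r->rng.state = seed ? seed : UINT64_C(0x9E3779B97F4A7C15);
}

/* Offer one row; it stays in the sample with probability capacity/(seen+1). */
static void
reservoir_offer(Reservoir *r, uint64_t row_id)
{
    if (r->seen < r->capacity)
    {
        /* The first `capacity` rows fill the reservoir directly. */
        r->rows[r->seen] = row_id;
    }
    else
    {
        /* Pick a slot in [0, seen]; the modulo bias is ignored in this sketch. */
        uint64_t j = sketch_rng_next(&r->rng) % (r->seen + 1);

        if (j < r->capacity)
            r->rows[j] = row_id;
    }
    r->seen++;
}

/* Number of valid entries currently held in the reservoir. */
static size_t
reservoir_size(const Reservoir *r)
{
    return r->seen < r->capacity ? (size_t) r->seen : r->capacity;
}
```

An AM-specific acquire_sample_rows() could then call reservoir_offer() once per
visible row and read back reservoir_size() entries at the end, without any
notion of blocks leaking through the interface.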

Best regards, Andrey Borodin.
