On Thu, Mar 14, 2024 at 9:59 AM John Naylor <johncnaylo...@gmail.com> wrote:
>
> On Wed, Mar 13, 2024 at 9:29 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > On Wed, Mar 13, 2024 at 8:05 PM John Naylor <johncnaylo...@gmail.com> wrote:
> > >
> > > On Wed, Mar 13, 2024 at 8:39 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > >
> > > > As I mentioned above, if we implement the test cases in C, we can use
> > > > the debug-build array in the test code. And we won't use it in AND/OR
> > > > operations tests in the future.
> > >
> > > That's a really interesting idea, so I went ahead and tried that for
> > > v71. This seems like a good basis for testing larger, randomized
> > > inputs, once we decide how best to hide that from the expected output.
> > > The tests use SQL functions do_set_block_offsets() and
> > > check_set_block_offsets(). The latter does two checks against a tid
> > > array, and replaces test_dump_tids().
> >
> > Great! I think that's a very good starter.
> >
> > The lookup_test() (and test_lookup_tids()) do also test that the
> > IsMember() function returns false as expected if the TID doesn't exist
> > in it, and probably we can do these tests in a C function too.
> >
> > BTW do we still want to test the tidstore by using a combination of
> > SQL functions? We might no longer need to input TIDs via a SQL
> > function.
>
> I'm not sure. I stopped short of doing that to get feedback on this
> much. One advantage with SQL functions is we can use generate_series
> to easily input lists of blocks with different numbers and strides,
> and array literals for offsets are a bit easier. What do you think?
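
To make the "blocks with different numbers and strides" idea concrete, here is a rough standalone C sketch of the data that pairing a strided block range with a fixed offset list would produce. It is only an illustration under my own assumptions: the Tid struct and this generate_tids() helper are hypothetical, are not taken from the patch, and deliberately avoid any PostgreSQL headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a heap TID: block number plus offset number. */
typedef struct Tid
{
    uint32_t    block;
    uint16_t    offset;
} Tid;

/*
 * Hypothetical helper: store one TID for every (block, offset) pair, where
 * blocks run from start to stop (inclusive) in steps of stride, similar to
 * generate_series(start, stop, stride). Stops once max_out entries have been
 * written and returns the number of entries stored in out.
 */
static size_t
generate_tids(uint32_t start, uint32_t stop, uint32_t stride,
              const uint16_t *offsets, size_t noffsets,
              Tid *out, size_t max_out)
{
    size_t      n = 0;

    if (stride == 0 || start > stop)
        return 0;

    /* A 64-bit counter keeps the loop from wrapping when stop is UINT32_MAX. */
    for (uint64_t blk = start; blk <= stop; blk += stride)
    {
        for (size_t i = 0; i < noffsets; i++)
        {
            if (n == max_out)
                return n;
            out[n].block = (uint32_t) blk;
            out[n].offset = offsets[i];
            n++;
        }
    }
    return n;
}
```

For example, blocks 0, 10 and 20 combined with offsets {1, 2, 512} give nine TIDs, ordered by block and then by offset. The 64-bit loop counter matters for the maxblkno case, where a 32-bit counter would wrap around instead of terminating.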

While I'm not a fan of the following part, I agree that it makes sense
to use SQL functions for test data generation:

-- Constant values used in the tests.
\set maxblkno 4294967295
-- The maximum number of heap tuples (MaxHeapTuplesPerPage) in 8kB block is 291.
-- We use a higher number to test tidstore.
\set maxoffset 512

It would also be easier for developers to test the tidstore with their
own data set. So I agree with the current approach: use SQL functions
for data generation and do the actual tests inside C functions. Would it
be convenient for developers if we had functions like generate_tids() and
generate_random_tids() to generate TIDs that they can pass to
do_set_block_offsets()? They could then call check_set_block_offsets()
and others for the actual data lookup and iteration tests.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com