I think even if it's not (easily) generalizable across languages, it'd still be a win for C++ (and hopefully languages that bind to C++). Also, I don't think they're meant to completely replace language-specific tests, but rather complement them, and make it easier to add and maintain tests in the overwhelmingly common case.
I do feel it's somewhat painful to write these kinds of tests in C++, largely because of the iteration time and the difficulty of repeating tests across various configurations. I also think this could be an opportunity to leverage things like Hypothesis/property-based testing or perhaps fuzzing to make the kernels even more robust.

-David

On 2021/05/14 18:09:45, Eduardo Ponce <edponc...@gmail.com> wrote:
> Another aspect to keep in mind is that some tests require internal options
> to be changed before executing the compute functions (e.g., check overflow,
> allow NaN comparisons, change validity bits, etc.). Also, there are tests
> that take randomized inputs and others make use of the min/max values for
> each specific data type. Certainly, these details can be generalized across
> languages/testing frameworks but not without careful treatment.
>
> Moreover, each language implementation still needs to test
> language-specific or internal functions, so having a meta test framework
> will not necessarily get rid of language-specific tests.
>
> ~Eduardo
>
> On Fri, May 14, 2021 at 1:56 PM Weston Pace <weston.p...@gmail.com> wrote:
>
> > With that in mind it seems the somewhat recurring discussion on coming
> > up with a language independent standard for logical query plans
> > (https://lists.apache.org/thread.html/rfab15e09c97a8fb961d6c5db8b2093824c58d11a51981a40f40cc2c0%40%3Cdev.arrow.apache.org%3E)
> > would be relevant. Each test case would then be a triplet of (Input
> > Dataframe, Logical Plan, Output Dataframe). So perhaps tackling this
> > effort would make progress on both fronts.
> >
> > On Fri, May 14, 2021 at 7:39 AM Julian Hyde <jhyde.apa...@gmail.com>
> > wrote:
> > >
> > > Do any of these compute functions have analogs in other
> > > implementations of Arrow (e.g. Rust)?
> > >
> > > I believe that as much as possible of Arrow’s compute functionality
> > > should be cross-language. Perhaps there are language-specific
> > > differences in how functions are invoked, but the basic functionality
> > > is the same.
> > >
> > > If people buy into that premise, then a single suite of tests is a
> > > powerful way to make that happen. The tests can be written in a
> > > high-level language and can generate tests in each implementation
> > > language. (For these purposes, the “high-level language” could be a
> > > special text format, could be a data language such as JSON, or could
> > > be a programming language such as Python; it doesn’t matter much.)
> > >
> > > For example,
> > >
> > >   assertThatCall(“foo(1, 2)”, returns(“3”))
> > >
> > > might actually call foo with arguments 1 and 2, or it might generate
> > > a C++ or Rust test that does the same.
> > >
> > > Julian
> > >
> > > > On May 14, 2021, at 8:45 AM, Antoine Pitrou <anto...@python.org>
> > > > wrote:
> > > >
> > > > On 14/05/2021 at 15:30, Wes McKinney wrote:
> > > >> hi folks,
> > > >> As we build more functions (kernels) in the project, I note that the
> > > >> amount of hand-coded C++ code relating to testing function correctness
> > > >> is growing significantly. Many of these tests are quite simple and
> > > >> could be expressed in a text format that can be parsed and evaluated.
> > > >> Thoughts about building something like that to make it easier to write
> > > >> functional correctness tests?
> > > >
> > > > Or perhaps build-up higher level C++ functions if desired?
> > > >
> > > > Or even write some of those tests as part of the PyArrow test suite.
> > > >
> > > > I'm not sure adding a custom (and probably inflexible) text format is
> > > > really a good use of our time.
> > > >
> > > > Regards
> > > >
> > > > Antoine.
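
For concreteness, here is a minimal sketch of the kind of text-driven test discussed in this thread, loosely modelled on the assertThatCall example quoted above. Everything in it is an assumption made for illustration: the `call -> expected` line format, the `KERNELS` registry, and the helper names `parse_call`, `assert_that_call` and `run_cases` are hypothetical, and plain Python operators stand in for real compute kernels. It is not an existing Arrow API or an agreed design.

```python
import ast
import operator

# Hypothetical registry: plain Python operators stand in for compute kernels.
KERNELS = {
    "add": operator.add,
    "multiply": operator.mul,
    "negate": operator.neg,
}


def parse_call(text):
    """Parse 'name(arg, ...)' into (name, [argument values])."""
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a simple function call: {text!r}")
    if node.keywords:
        raise ValueError(f"keyword arguments are not supported: {text!r}")
    # literal_eval only accepts literals, so no arbitrary code is executed.
    args = [ast.literal_eval(arg) for arg in node.args]
    return node.func.id, args


def assert_that_call(call_text, returns):
    """Evaluate the call against the registry and compare with the expected literal."""
    name, args = parse_call(call_text)
    actual = KERNELS[name](*args)
    expected = ast.literal_eval(returns.strip())
    assert actual == expected, (
        f"{call_text.strip()}: expected {expected!r}, got {actual!r}"
    )


def run_cases(spec):
    """Run every 'call -> expected' line in spec; return the number of cases run."""
    count = 0
    for line in spec.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        call_text, arrow, expected = line.rpartition("->")
        if not arrow:
            raise ValueError(f"missing '->' in test case: {line!r}")
        assert_that_call(call_text, expected)
        count += 1
    return count


if __name__ == "__main__":
    cases = """
    # each line is: call -> expected result
    add(1, 2) -> 3
    negate(3) -> -3
    multiply(2, 2.5) -> 5.0
    """
    print(run_cases(cases), "cases passed")
```

The same case file could in principle be fed to a generator that emits C++ or Rust tests instead of evaluating the calls directly, which is the cross-language variant Julian describes. The concerns Eduardo raises (per-test options, randomized inputs, type-specific min/max values) are not addressed by this sketch.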