[ 
https://issues.apache.org/jira/browse/ARROW-13013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17360067#comment-17360067
 ] 

Antoine Pitrou commented on ARROW-13013:
----------------------------------------

I'm a bit lukewarm about this.

First, if we expose a {{AssertCallFunction}} function, what kind of boilerplate 
does writing the tests in Python avoid?

Second, we currently aren't able to run Python tests in Valgrind or 
ASAN/UBSAN-based CI jobs. So this would significantly decrease our test 
coverage, and compute kernels can do quite a bit of low-level memory accesses 
that make such checks really useful.

While C++ can be a bit annoying and verbose at times, I don't think it shows up 
very often in kernel tests. As an exemple, I took a quick look at 
{{scalar_string_test.cc}}, and ~90% of that file seems to be actual testing, 
not boilerplate. For example:

{code:c++}
TYPED_TEST(TestStringKernels, AsciiUpper) {
  this->CheckUnary("ascii_upper", "[]", this->type(), "[]");
  this->CheckUnary("ascii_upper", "[\"aAazZæÆ&\", null, \"\", \"bbb\"]", 
this->type(),
                   "[\"AAAZZæÆ&\", null, \"\", \"BBB\"]");
}

TYPED_TEST(TestStringKernels, AsciiLower) {
  this->CheckUnary("ascii_lower", "[]", this->type(), "[]");
  this->CheckUnary("ascii_lower", "[\"aAazZæÆ&\", null, \"\", \"BBB\"]", 
this->type(),
                   "[\"aaazzæÆ&\", null, \"\", \"bbb\"]");
}

TYPED_TEST(TestStringKernels, AsciiReverse) {
  this->CheckUnary("ascii_reverse", "[]", this->type(), "[]");
  this->CheckUnary("ascii_reverse", R"(["abcd", null, "", "bbb"])", 
this->type(),
                   R"(["dcba", null, "", "bbb"])");

  Datum invalid_input = ArrayFromJSON(this->type(), R"(["aAazZæÆ&", null, "", 
"bbb"])");
  EXPECT_RAISES_WITH_MESSAGE_THAT(Invalid,
                                  testing::HasSubstr("Non-ASCII sequence in 
input"),
                                  CallFunction("ascii_reverse", 
{invalid_input}));
}

TYPED_TEST(TestStringKernels, Utf8Reverse) {
  this->CheckUnary("utf8_reverse", "[]", this->type(), "[]");
  this->CheckUnary("utf8_reverse", R"(["abcd", null, "", "bbb"])", this->type(),
                   R"(["dcba", null, "", "bbb"])");
  this->CheckUnary("utf8_reverse", R"(["aAazZæÆ&", null, "", "bbb", "ɑɽⱤæÆ"])",
                   this->type(), R"(["&ÆæZzaAa", null, "", "bbb", "ÆæⱤɽɑ"])");

  // inputs with malformed utf8 chars would produce garbage output, but the end 
result
  // would produce arrays with same lengths. Hence checking offset buffer 
equality
  auto malformed_input = ArrayFromJSON(this->type(), "[\"ɑ\xFFɑa\", 
\"ɽ\xe1\xbdɽa\"]");
  const Result<Datum>& res = CallFunction("utf8_reverse", {malformed_input});
  
ASSERT_TRUE(res->array()->buffers[1]->Equals(*malformed_input->data()->buffers[1]));
}
{code}
This may be slighlier terser when expressed in Python, but I don't think the 
difference would be enormous (note the last line should use {{BufferEquals}}).



> [C++][Compute][Python] Move (majority of) kernel unit tests to python
> ---------------------------------------------------------------------
>
>                 Key: ARROW-13013
>                 URL: https://issues.apache.org/jira/browse/ARROW-13013
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: Ben Kietzman
>            Priority: Major
>
> mailing list discussion: 
> [https://lists.apache.org/thread.html/r09e0e0fbb8b655bbec8cf5662d224f3dfc4fba894a312900f73ae3bf%40%3Cdev.arrow.apache.org%3E]
> Writing unit tests for compute functions in c++ is laborious, entails a lot 
> of boilerplate, and slows iteration since it requires recompilation when 
> adding new tests. The majority of these test cases need not be written in C++ 
> at all and could instead be made part of the pyarrow test suite.
> In order to make the kernels' C++ implementations easily debuggable from unit 
> tests, we'll have to expose a c++ function named {{AssertCallFunction}} or 
> so. {{AssertCallFunction}} will invoke the named compute::Function and 
> compare actual results to expected without crossing the C++/python boundary, 
> allowing a developer to step through all relevant code with a single 
> breakpoint in GDB. Construction of scalars/arrays/function options and any 
> other inputs to the function is amply supported by {{pyarrow}}, and will 
> happen outside the scope of {{AssertCallFunction}}.
> {{AssertCallFunction}} should not try to derive additional assertions from 
> its arguments - for example {{CheckScalar("add",
> {left, right}
> , expected)}} will first assert that {{left + right == expected}} then 
> {{left.slice(1) + right.slice(1) == expected.slice(1)}} to ensure that 
> offsets are handled correctly. This has value but can be easily expressed in 
> Python and configuration of such behavior would overcomplicate the interface 
> of {{AssertCallFunction}}.
> Unit tests for kernels would then be written in 
> {{arrow/python/pyarrow/test/kernels/test_*.py}}. The C++ unit test for 
> [addition with implicit 
> casts|https://github.com/apache/arrow/blob/b38ab81cb96e393a026d05a22e5a2f62ff6c23d7/cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc#L897-L918]
>  could then be rewritten as
> {code:python}
> def test_addition_implicit_casts():
>     AssertCallFunction("add", [[ 0,    1,   2,    None],
>                                [ 0.25, 1.5, 2.75, None]],
>                        expected=[0.25, 1.5, 2.75, None])
>     # ...
> {code}
> NB: Some unit tests will probably still reside in C++ since we'll need to 
> test things we don't wish to expose in a user facing API, such as "whether a 
> boolean kernel avoids clobbering bits when outputting into a slice". These 
> should be far more manageable since they won't need to assert correct logic 
> across all possible input types



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to