lidavidm commented on a change in pull request #10317:
URL: https://github.com/apache/arrow/pull/10317#discussion_r633518422



##########
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##########
@@ -2317,6 +2365,14 @@ const FunctionDoc utf8_lower_doc(
     "Transform input to lowercase",
     ("For each string in `strings`, return a lowercase version."), 
{"strings"});
 
+const FunctionDoc ascii_reverse_doc(
+    "Reverse ascii input",
+    ("For each ascii string in `strings`, return a reversed version."), 
{"strings"});

Review comment:
       For consistency, can we include a message similar to the other kernels 
about UTF-8 support? e.g.
   
   ```
        "This function assumes the input is fully ASCII.  It it may contain\n"
        "non-ASCII characters, use \"utf8_reverse\" instead."),
   ```

##########
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##########
@@ -2317,6 +2365,14 @@ const FunctionDoc utf8_lower_doc(
     "Transform input to lowercase",
     ("For each string in `strings`, return a lowercase version."), 
{"strings"});
 
+const FunctionDoc ascii_reverse_doc(
+    "Reverse ascii input",
+    ("For each ascii string in `strings`, return a reversed version."), 
{"strings"});
+
+const FunctionDoc utf8_reverse_doc(
+    "Reverse utf8 input",
+    ("For each utf8 string in `strings`, return a reversed version."), 
{"strings"});

Review comment:
       Should we clarify that the reversing is done at the codepoint level, and 
not the grapheme cluster level? (So people would be aware that this kernel 
can't handle an é encoded as e + ´, for instance, or a flag emoji.)

##########
File path: cpp/src/arrow/compute/kernels/scalar_string_test.cc
##########
@@ -91,6 +91,30 @@ TYPED_TEST(TestStringKernels, AsciiLower) {
                    "[\"aaazzæÆ&\", null, \"\", \"bbb\"]");
 }
 
+TYPED_TEST(TestStringKernels, AsciiReverse) {
+  this->CheckUnary("ascii_reverse", "[]", this->type(), "[]");
+  this->CheckUnary("ascii_reverse", R"(["abcd", null, "", "bbb"])", 
this->type(),
+                   R"(["dcba", null, "", "bbb"])");
+
+  Datum invalid_input = ArrayFromJSON(this->type(), R"(["aAazZæÆ&", null, "", 
"bbb"])");
+  EXPECT_RAISES_WITH_MESSAGE_THAT(Invalid, testing::HasSubstr("Invalid UTF8 
sequence"),

Review comment:
       I know this is encoded into the base transform class, but this is rather 
a confusing error message for an ASCII kernel.

##########
File path: cpp/src/arrow/compute/kernels/scalar_string_test.cc
##########
@@ -91,6 +91,30 @@ TYPED_TEST(TestStringKernels, AsciiLower) {
                    "[\"aaazzæÆ&\", null, \"\", \"bbb\"]");
 }
 
+TYPED_TEST(TestStringKernels, AsciiReverse) {
+  this->CheckUnary("ascii_reverse", "[]", this->type(), "[]");
+  this->CheckUnary("ascii_reverse", R"(["abcd", null, "", "bbb"])", 
this->type(),
+                   R"(["dcba", null, "", "bbb"])");
+
+  Datum invalid_input = ArrayFromJSON(this->type(), R"(["aAazZæÆ&", null, "", 
"bbb"])");
+  EXPECT_RAISES_WITH_MESSAGE_THAT(Invalid, testing::HasSubstr("Invalid UTF8 
sequence"),

Review comment:
       `StringTransform` already makes calls to the concrete kernel via 
`Derived::Foo` calls, perhaps constructing the Status could be made into such a 
call.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to