ianmcook commented on a change in pull request #10190:
URL: https://github.com/apache/arrow/pull/10190#discussion_r628209893



##########
File path: r/tests/testthat/test-dplyr-string-functions.R
##########
@@ -297,6 +297,33 @@ test_that("strsplit and str_split", {
 
 })
 
+test_that("arrow_*_split_whitespace functions", {
+
+  # use only ASCII whitespace characters
+  df_ascii <- tibble(x = c("Foo\nand bar", "baz\tand qux and quux"))
+
+  # use only non-ASCII whitespace characters
+  df_utf8 <- tibble(x = c("Foo\u00A0and\u2000bar", "baz\u2006and\u1680qux\u3000and\u2008quux"))

Review comment:
I picked these non-ASCII whitespace characters from this list: https://stackoverflow.com/a/46637343/375432

Some of the characters there, in particular the zero-width ones, were not recognized as whitespace by `arrow_utf8_split_whitespace`, so I didn't use those here.
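One likely reason the zero-width characters behave differently: in Unicode, every separator used in `df_utf8` has the general category Zs (Space_Separator), while common zero-width characters such as U+200B, U+200C, U+200D, U+2060 and U+FEFF have category Cf (Format). The sketch below checks this with base R's PCRE engine, assuming R was built with PCRE Unicode property support (the usual case). Whether `arrow_utf8_split_whitespace` classifies characters along these lines is an assumption, not something confirmed here, and `is_space_separator` is a hypothetical helper, not part of arrow.

```r
# Hypothetical helper (not part of arrow): TRUE if a single character has the
# Unicode general category Zs (Space_Separator). Assumes PCRE Unicode support.
is_space_separator <- function(ch) grepl("^\\p{Zs}$", ch, perl = TRUE)

# Separators used in df_utf8 in the test above
used <- c(
  nbsp        = "\u00A0",  # NO-BREAK SPACE
  en_quad     = "\u2000",  # EN QUAD
  six_per_em  = "\u2006",  # SIX-PER-EM SPACE
  ogham       = "\u1680",  # OGHAM SPACE MARK
  ideographic = "\u3000",  # IDEOGRAPHIC SPACE
  punctuation = "\u2008"   # PUNCTUATION SPACE
)

# Common zero-width characters (general category Cf, not Zs)
zero_width <- c(
  zwsp        = "\u200B",  # ZERO WIDTH SPACE
  zwnj        = "\u200C",  # ZERO WIDTH NON-JOINER
  zwj         = "\u200D",  # ZERO WIDTH JOINER
  word_joiner = "\u2060",  # WORD JOINER
  bom         = "\uFEFF"   # ZERO WIDTH NO-BREAK SPACE (byte order mark)
)

# vapply keeps the names, so each character's result is labeled
used_is_zs <- vapply(used, is_space_separator, logical(1))
zero_width_is_zs <- vapply(zero_width, is_space_separator, logical(1))

print(used_is_zs)
print(zero_width_is_zs)
```

If this assumption holds, it would also explain why the test strings can rely on the Zs characters being split on while the zero-width ones are left intact.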




-- 
This is an automated message from the Apache Git Service.

