[ https://issues.apache.org/jira/browse/ARROW-13259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17374924#comment-17374924 ]
Nic Crane commented on ARROW-13259:
-----------------------------------

Thanks very much [~maartenbreddels] and [~jorisvandenbossche]!

[~lidavidm] - nah, it's fine, I can just copy from the Python implementation and chuck in some R code like:

{code:java}
if (stop == -1) stop = .Machine$integer.max
{code}

CC [~pachamaltese]

> [C++] Enable slicing to end of string using "utf8_slice_codeunits" when string length unknown or different lengths
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-13259
>                 URL: https://issues.apache.org/jira/browse/ARROW-13259
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Nic Crane
>            Priority: Major
>
> We're currently trying to write bindings from the C++ function "utf8_slice_codeunits" to R, specifically trying to replicate the behaviour of stringr::str_sub.
> In both the R and C++ implementations, I can use negative indices to count back from the end of a string (shown below in R, but the latter directly invokes the C++ implementation):
>
> {code:java}
> # stringr version
> > stringr::str_sub("Apache Arrow", -5, -2)
> [1] "Arro"
> # C++ version
> > call_function("utf8_slice_codeunits", Scalar$create("Apache Arrow"),
> >   options = list(start=-5L, stop=-1L))
> Scalar
> Arro
> {code}
>
> Note that in the C++ implementation, I have to add 1 to the stop value because the stop position is exclusive.
>
> The problem is when I'm trying to use negative indices to refer to the final values in a string:
>
> {code:java}
> # stringr version
> > stringr::str_sub("Apache Arrow", -5, -1)
> [1] "Arrow"
> # C++ version
> > call_function("utf8_slice_codeunits", Scalar$create("Apache Arrow"),
> >   options = list(start=-5L, stop=0L))
> Scalar
>
> {code}
>
> The result is blank because a 'stop' value of 0 refers to the start of the string, so the slice would effectively have to walk backwards, which isn't possible (except via the step argument, which I can't get working and which I don't think is what I want anyway).
> I've tried to get around this by writing some code that calculates the length of the string and supplies that to the stop argument, but it didn't work.
> I do have a possible workaround that involves reversing the string, extracting the substring using the negated, swapped start/stop values, and then reversing the result. Before I go down that path, though, I was wondering whether anything can (and should! the answer may be a simple "nope!") be changed in the C++ code to make this possible in a different way?
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
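A minimal C++ sketch of the index arithmetic discussed above, assuming "utf8_slice_codeunits" follows Python-style slice semantics (negative indices count from the end, stop is exclusive, out-of-range values are clamped) and restricting to ASCII input so that each character is one code unit. `SliceCodeunits`, `SliceOptionsSketch` and `FromStrSub` are hypothetical names used only for illustration, not Arrow APIs. The sketch shows why stop = 0 yields an empty string, and how mapping stringr's end = -1 to a very large stop (INT_MAX here, in the spirit of the `.Machine$integer.max` check in the comment) reaches the end of the string without knowing its length.

```cpp
#include <climits>
#include <cstdint>
#include <string>

// Hypothetical helper (not an Arrow API): models Python-style s[start:stop]
// slicing with step 1, assumed here to match "utf8_slice_codeunits" for
// ASCII input where every character is a single code unit.
std::string SliceCodeunits(const std::string& s, std::int64_t start,
                           std::int64_t stop) {
  const std::int64_t n = static_cast<std::int64_t>(s.size());
  // Negative indices count back from the end; the result is clamped to [0, n].
  auto normalize = [n](std::int64_t i) -> std::int64_t {
    if (i < 0) i += n;
    if (i < 0) i = 0;
    if (i > n) i = n;
    return i;
  };
  const std::int64_t begin = normalize(start);
  const std::int64_t end = normalize(stop);
  // stop == 0 normalizes to 0, which is never after begin, so the slice is empty.
  if (end <= begin) return std::string();
  return s.substr(static_cast<std::string::size_type>(begin),
                  static_cast<std::string::size_type>(end - begin));
}

// Hypothetical options pair: 0-based start and exclusive stop.
struct SliceOptionsSketch {
  std::int64_t start;
  std::int64_t stop;
};

// Hypothetical translation of stringr::str_sub(start, end), which is 1-based
// with an inclusive end, into the 0-based, exclusive-stop form above.
SliceOptionsSketch FromStrSub(std::int64_t start, std::int64_t end) {
  SliceOptionsSketch out{};
  // Positive starts shift from 1-based to 0-based; negative starts already
  // count from the end the same way (-5 means "fifth character from the end").
  out.start = start > 0 ? start - 1 : start;
  if (end == -1) {
    // "Up to the last character" has no exclusive negative equivalent
    // (stop = 0 means the start of the string), so use a huge stop instead.
    out.stop = INT_MAX;
  } else if (end < 0) {
    out.stop = end + 1;  // e.g. inclusive -2 becomes exclusive -1
  } else {
    out.stop = end;  // 1-based inclusive end == 0-based exclusive stop
  }
  return out;
}
```

With these helpers, `SliceCodeunits("Apache Arrow", -5, -1)` returns "Arro", `SliceCodeunits("Apache Arrow", -5, 0)` returns an empty string, and slicing with the options from `FromStrSub(-5, -1)` returns "Arrow". Because clamping happens per string, the same translated options also work when the strings in an array have different lengths, which is the case raised in the issue title.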