[ https://issues.apache.org/jira/browse/ARROW-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kshiteej K reassigned ARROW-17301:
----------------------------------

    Assignee: Kshiteej K  (was: ChenTsing)

> [C++] Implement compute function "binary_slice"
> -----------------------------------------------
>
>                 Key: ARROW-17301
>                 URL: https://issues.apache.org/jira/browse/ARROW-17301
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 8.0.1
>            Reporter: ChenTsing
>            Assignee: Kshiteej K
>            Priority: Major
>             Fix For: 11.0.0
>
>
> In some situations, users need a way to extract one or more contiguous
> bytes from each element of a binary or string-like array, for example
> start 1, end 3, step 1, which yields two bytes. This is similar to the
> {{binary_replace_slice}} function. Slicing should be available in two
> units of measurement: bytes and code units.
>
>
> h1. *Application case:*
>
> Here is one example that describes why a function that extracts binary
> data in byte units is needed:
>
> In a distributed database, data has a distribution policy and a related
> hash algorithm for each data type. Here we discuss only string-like and
> binary types. The hash algorithm needs to split a string-like or binary
> value into bytes for its calculation: for example, take bytes 1-4, cast
> them to an integer and shift left by 16 bits, then take bytes 5-6, cast
> them to an integer and combine it with the result of the previous step,
> and so on. The 'utf8_slice_codeunits' function partly meets this
> requirement if all characters are ASCII. But if the string contains
> Chinese text, one Chinese character may occupy three bytes, so a slice
> from start 1 to end 3 selects whole UTF-8 characters, and three such
> characters may take nine bytes. That does not suit the hash algorithm,
> which needs only 3 bytes. So it would help to provide a function that
> works on bytes without a cast and takes the same arguments as
> 'utf8_slice_codeunits'; it might be called 'binary_slice_byteunit'.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)