[ 
https://issues.apache.org/jira/browse/ARROW-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kshiteej K reassigned ARROW-17301:
----------------------------------

    Assignee: Kshiteej K  (was: ChenTsing)

> [C++] Implement compute function "binary_slice"
> -----------------------------------------------
>
>                 Key: ARROW-17301
>                 URL: https://issues.apache.org/jira/browse/ARROW-17301
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 8.0.1
>            Reporter: ChenTsing
>            Assignee: Kshiteej K
>            Priority: Major
>             Fix For: 11.0.0
>
>
> In some situations, users need a way to extract one or more contiguous bytes 
> from a binary or string-like array; for example, start 1, end 3, step 1 
> selects two bytes. The function would be similar to "{{binary_replace_slice}}" 
> and could support two units of measurement: bytes and code units.
>  
>  
> h1. *application case:*
>  
> Here is an example of why a function that extracts binary data in byte units 
> is needed:
>  
> In a distributed database, data has a distribution policy and a corresponding 
> hash algorithm for each data type. Considering only string-like and binary 
> types here, the hash algorithm needs to split a string-like or binary value 
> into bytes for the calculation: for example, take bytes 1-4, cast them to an 
> integer and shift left by 16 bits, then take bytes 5-6, cast them to an 
> integer and combine them with the result of the previous step, and so on. 
> The 'utf8_slice_codeunits' function partly meets this requirement when all 
> characters are ASCII, but not when the string contains Chinese characters: 
> one Chinese character may occupy three bytes, so a slice of three UTF-8 
> characters may take nine bytes, whereas the hash algorithm needs only three 
> bytes. It would therefore help to provide a function that works on bytes 
> directly, without a cast, and takes the same arguments as 
> 'utf8_slice_codeunits'; it might be called 'binary_slice_byteunit'.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
