HaoYang670 commented on code in PR #1665:
URL: https://github.com/apache/arrow-rs/pull/1665#discussion_r867324877


##########
arrow/src/compute/kernels/substring.rs:
##########
@@ -18,13 +18,136 @@
 //! Defines kernel to extract a substring of an Array
 //! Supported array types: \[Large\]StringArray, \[Large\]BinaryArray
 
+use crate::array::DictionaryArray;
 use crate::buffer::MutableBuffer;
+use crate::datatypes::*;
 use crate::{array::*, buffer::Buffer};
 use crate::{
     datatypes::DataType,
     error::{ArrowError, Result},
 };
 use std::cmp::Ordering;
+use std::sync::Arc;
+
+/// Returns an ArrayRef with substrings of all the elements in `array`.
+///
+/// # Arguments
+///
+/// * `start` - The start index of all substrings.
+/// If `start >= 0`, then count from the start of the string,
+/// otherwise count from the end of the string.
+///
+/// * `length`(option) - The length of all substrings.
+/// If `length` is `None`, then the substring is from `start` to the end of the string.
+///
+/// Attention: Both `start` and `length` are counted by byte, not by char.
+///
+/// # Basic usage
+/// ```
+/// # use arrow::array::StringArray;
+/// # use arrow::compute::kernels::substring::substring;
+/// let array = StringArray::from(vec![Some("arrow"), None, Some("rust")]);
+/// let result = substring(&array, 1, Some(4)).unwrap();
+/// let result = result.as_any().downcast_ref::<StringArray>().unwrap();
+/// assert_eq!(result, &StringArray::from(vec![Some("rrow"), None, Some("ust")]));
+/// ```
+///
+/// # Error
+/// - The function errors when the passed array is not a \[Large\]String array or \[Large\]Binary array.
+/// - The function errors if the offset of a substring in the input array is at invalid char boundary (only for \[Large\]String array).
+///
+/// ## Example of trying to get an invalid utf-8 format substring
+/// ```
+/// # use arrow::array::StringArray;
+/// # use arrow::compute::kernels::substring::substring;
+/// let array = StringArray::from(vec![Some("E=mc²")]);
+/// let error = substring(&array, 0, Some(5)).unwrap_err().to_string();
+/// assert!(error.contains("invalid utf-8 boundary"));
+/// ```
+pub fn substring(array: &dyn Array, start: i64, length: Option<u64>) -> Result<ArrayRef> {

Review Comment:
   Just a nit: maybe `length` could be `Option<u32>`, because the longest length will not exceed `(1 << 32) - 1` (for `LargeBinaryArray` and `LargeStringArray`).
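
For reference, the byte-based `start`/`length` semantics described in the doc comment can be sketched on a single `&str` with only the standard library. This is a rough illustration, not the kernel from the PR: `substring_bytes` is a hypothetical helper, and the clamping of out-of-range values is an assumption about the intended behaviour.

```rust
/// Hypothetical std-only helper mirroring the documented semantics for one
/// string; it is not the arrow-rs kernel itself.
fn substring_bytes(s: &str, start: i64, length: Option<u64>) -> Result<&str, String> {
    let len = s.len() as i64;
    // A negative `start` counts back from the end; clamp into [0, len] (assumed).
    let begin = if start >= 0 { start.min(len) } else { (len + start).max(0) };
    // `None` means "up to the end of the string"; otherwise clamp to `len`.
    let end = match length {
        Some(l) => begin.saturating_add(l.min(len as u64) as i64).min(len),
        None => len,
    };
    let (begin, end) = (begin as usize, end as usize);
    // Both positions are byte offsets, so they may fall inside a multi-byte char.
    if !s.is_char_boundary(begin) || !s.is_char_boundary(end) {
        return Err(format!("invalid utf-8 boundary at byte {} or {}", begin, end));
    }
    Ok(&s[begin..end])
}
```

With this sketch, `substring_bytes("arrow", 1, Some(4))` yields `Ok("rrow")`, while `substring_bytes("E=mc²", 0, Some(5))` returns an error because byte 5 falls inside the two-byte `²`.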



