[ https://issues.apache.org/jira/browse/ARROW-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jorge updated ARROW-10030:
--------------------------
    Component/s: Rust
    Description: 
Proposal for comments: [https://docs.google.com/document/d/1d6rV1WmvIH6uW-bcHKrYBSyPddrpXH8Q4CtVfFHtI04/edit?usp=sharing]

(dump of the document above)

Rust Arrow supports two main computational models:
# Batch operations, which leverage some form of vectorization
# Element-by-element operations, which emerge in more complex operations

This document concerns element-by-element operations, which are the most common operations outside of the library.

h2. Element-by-element operations

These operations are programmatically written as:
# Downcast the array to its specific type
# Initialize buffers
# Iterate over indices and perform the operation, appending to the buffers accordingly
# Create ArrayData with the required null bitmap, buffers, children, etc.
# Return ArrayRef from ArrayData

We can split this process into 3 parts:
# Initialization (1 and 2)
# Iteration (3)
# Finalization (4 and 5)

Currently, the API that we offer to our users is:
# as_any() to downcast the array based on its DataType
# Builders for all types, which users can initialize to match the downcasted array
# Iterate
## Use for i in (0..array.len())
## Use Array::value(i) and Array::is_valid(i)/is_null(i)
## Use builder.append_value(new_value) or builder.append_null()
# Finish the builder and wrap the result in an Arc

This API has some issues:
# value(i) +is unsafe+, even though it is not marked as such
# Builders are usually slow due to the checks that they need to perform
# The API is not intuitive

h2. Proposal

This proposal aims at improving this API in 2 specific ways:
* Implement IntoIterator, yielding Iterator<Item=T> and Iterator<Item=Option<T>>
* Implement FromIterator for Item=T and Item=Option<T>

so that users can write:

{code:java}
let array = Int32Array::from(vec![Some(0), None, Some(2), None, Some(4)]);

// to and from iter, with a +1
let result: Int32Array = array
    .iter()
    .map(|e| e.map(|r| r + 1))
    .collect();

let expected = Int32Array::from(vec![Some(1), None, Some(3), None, Some(5)]);
assert_eq!(result, expected);
{code}

This results in an API that is:
# Efficient, as it is our responsibility to write FromIterator implementations that populate the buffers, children, etc. efficiently from an iterator
# Safe, as it does not allow segfaults
# Simple, as users do not need to worry about Builders, buffers, etc., only native Rust.
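To make the proposed iterator round trip concrete without depending on the arrow crate, here is a minimal sketch on a toy type. Everything in it is an assumption made for illustration: `ToyInt32Array`, its two `Vec` buffers, and `add_one` are hypothetical names, and the real implementation would populate Arrow buffers and a null bitmap rather than plain vectors.

```rust
use std::iter::FromIterator;

/// Toy stand-in for a nullable Int32 array (hypothetical, NOT the arrow crate's
/// `Int32Array`): a values buffer plus a per-slot validity flag.
#[derive(Debug, PartialEq)]
struct ToyInt32Array {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl ToyInt32Array {
    /// Iterate over the slots as `Option<i32>`; `None` marks a null slot.
    fn iter(&self) -> impl Iterator<Item = Option<i32>> + '_ {
        self.values
            .iter()
            .zip(self.validity.iter())
            .map(|(value, is_valid)| if *is_valid { Some(*value) } else { None })
    }
}

impl FromIterator<Option<i32>> for ToyInt32Array {
    /// Build both buffers in a single pass over the iterator.
    fn from_iter<I: IntoIterator<Item = Option<i32>>>(iter: I) -> Self {
        let iter = iter.into_iter();
        // Reserve using the iterator's lower size bound to limit reallocations.
        let (lower, _) = iter.size_hint();
        let mut values = Vec::with_capacity(lower);
        let mut validity = Vec::with_capacity(lower);
        for item in iter {
            match item {
                Some(value) => {
                    values.push(value);
                    validity.push(true);
                }
                None => {
                    // Null slots store a placeholder value and are flagged invalid.
                    values.push(0);
                    validity.push(false);
                }
            }
        }
        ToyInt32Array { values, validity }
    }
}

/// Element-by-element "+1" written only with iterator adapters:
/// no index-based `value(i)` access and no explicit builder.
fn add_one(array: &ToyInt32Array) -> ToyInt32Array {
    array.iter().map(|e| e.map(|v| v + 1)).collect()
}
```

With this shape, the initialization and finalization steps live inside `from_iter`, so a caller only writes the iteration step, and null handling follows from mapping over `Option`.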
> [Rust] Support fromIter and toIter
> ----------------------------------
>
>                 Key: ARROW-10030
>                 URL: https://issues.apache.org/jira/browse/ARROW-10030
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust
>            Reporter: Jorge
>            Priority: Major