[jira] [Created] (ARROW-2523) [Rust] Implement CAST operations for arrays
Andy Grove created ARROW-2523:

Summary: [Rust] Implement CAST operations for arrays
Key: ARROW-2523
URL: https://issues.apache.org/jira/browse/ARROW-2523
Project: Apache Arrow
Issue Type: New Feature
Reporter: Andy Grove
Assignee: Andy Grove
Fix For: 0.10.0

I have implemented CAST operations in DataFusion, but I would now like to re-implement them directly in Arrow. I will create a PR after the Rust refactor is complete.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
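To make the request concrete, here is a minimal sketch of what an element-wise CAST over a nullable column could look like. It is not the DataFusion code and not an Arrow API: the function names are hypothetical, a column is modelled as a plain slice of `Option` values, and mapping out-of-range values to null on a narrowing cast is an assumption (returning an error would be an equally plausible choice).

```rust
use std::convert::TryFrom;

/// Hypothetical helper (not an Arrow API): widening cast from i32 to f64.
/// Nulls (None) are passed through unchanged.
pub fn cast_i32_to_f64(values: &[Option<i32>]) -> Vec<Option<f64>> {
    values.iter().map(|&v| v.map(f64::from)).collect()
}

/// Hypothetical helper (not an Arrow API): narrowing cast from i64 to i32.
/// Assumption: a value that does not fit in i32 becomes null rather than
/// wrapping around or aborting the whole cast.
pub fn cast_i64_to_i32(values: &[Option<i64>]) -> Vec<Option<i32>> {
    values
        .iter()
        .map(|&v| v.and_then(|x| i32::try_from(x).ok()))
        .collect()
}
```

The widening cast can never fail, so only the narrowing direction needs a policy for values that do not fit.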
Refactoring the Rust API
I filed a JIRA issue to track this (https://issues.apache.org/jira/browse/ARROW-2521) but thought it was worth raising on the mailing list too.

I am now running into limitations in the way that Array is represented as an enum, and I am unable to implement nested list types with the current design.

When Krisztian Szucs and I were working on the initial code we had two different approaches, and we went with the enum approach at the time because we weren't able to make the other approach (traits + generics) work.

Now that I'm further along the Rust learning curve, I can make the trait + generic approach work. I'm currently prototyping in a separate repo, and it is looking good so far. I have been able to create a struct array containing fields of different types, including nested lists.

I think I'm ready to start the refactor for real in my fork. We only have ~1k LOC so I don't think it will take too long, but because I'm doing this in my spare time I estimate it will take just over one week, and I am aiming to have it complete by 4/30.

I think it's fine to continue merging small PRs in the meantime, but I think we should hold off on any major changes in the coming week.

Thanks,

Andy.
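For readers unfamiliar with the trade-off, the following is a rough sketch of how a trait + generics design can express nested lists and mixed-type structs. It is not the prototype mentioned above: the names `Array`, `PrimitiveArray`, `ListArray` and `StructArray`, their fields, and the use of `usize` offsets are all assumptions made for illustration, and null bitmaps are left out entirely.

```rust
use std::any::Any;

/// Hypothetical common trait: containers hold `Box<dyn Array>` so that
/// children of different element types can live side by side.
pub trait Array {
    fn len(&self) -> usize;
    /// Lets callers downcast back to the concrete array type.
    fn as_any(&self) -> &dyn Any;
}

/// Fixed-width values of a single type.
pub struct PrimitiveArray<T> {
    pub values: Vec<T>,
}

impl<T: 'static> Array for PrimitiveArray<T> {
    fn len(&self) -> usize { self.values.len() }
    fn as_any(&self) -> &dyn Any { self }
}

/// Variable-length lists: list `i` spans `offsets[i]..offsets[i + 1]` of
/// the child. The child is itself any Array, so lists can nest.
pub struct ListArray {
    pub offsets: Vec<usize>,
    pub values: Box<dyn Array>,
}

impl Array for ListArray {
    fn len(&self) -> usize { self.offsets.len().saturating_sub(1) }
    fn as_any(&self) -> &dyn Any { self }
}

/// One child array per field; the fields may have different types.
pub struct StructArray {
    pub columns: Vec<Box<dyn Array>>,
}

impl Array for StructArray {
    fn len(&self) -> usize { self.columns.first().map_or(0, |c| c.len()) }
    fn as_any(&self) -> &dyn Any { self }
}

/// Builds the list-of-lists [[[1, 2], [3]], [[4]]].
pub fn nested_list_example() -> ListArray {
    let ints = PrimitiveArray { values: vec![1i32, 2, 3, 4] };
    let inner = ListArray { offsets: vec![0, 2, 3, 4], values: Box::new(ints) };
    ListArray { offsets: vec![0, 2, 3], values: Box::new(inner) }
}

/// Builds a two-row struct with an f64 field and a nested-list field.
pub fn struct_example() -> StructArray {
    StructArray {
        columns: vec![
            Box::new(PrimitiveArray { values: vec![1.5f64, 2.5] }),
            Box::new(nested_list_example()),
        ],
    }
}
```

Because `ListArray` stores its child as a trait object, nesting to any depth needs no extra enum variants, which is what an enum-based representation makes awkward.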
[jira] [Created] (ARROW-2522) [C++] Version shared library files
Antoine Pitrou created ARROW-2522:

Summary: [C++] Version shared library files
Key: ARROW-2522
URL: https://issues.apache.org/jira/browse/ARROW-2522
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou

We should version installed shared library files (SO under Unix, DLL under Windows) to disambiguate incompatible ABI versions.

CMake provides support for that:
http://pusling.com/blog/?p=352
https://cmake.org/cmake/help/v3.11/prop_tgt/SOVERSION.html
Re: Sync Call Notes
Done, see https://issues.apache.org/jira/browse/ARROW-2522

On 28/04/2018 at 02:01, Wes McKinney wrote:
> Yes, I'd say let's definitely bump the SO version with each major
> release. Is there a JIRA for this already? If not, let's create one.
>
> On Tue, Apr 24, 2018 at 2:02 PM, Antoine Pitrou wrote:
>>
>> On 24/04/2018 at 19:58, Wes McKinney wrote:
>>>
>>> In summary, until the Arrow developer group grows significantly
>>> larger, I think we should expect the users of these libraries to "live
>>> at HEAD". I do think we should make ABI changes transparent and
>>> well-documented so the pain is minimized. For the moment, we still
>>> have a lot of development work to do for more people to "care" about
>>> Apache Arrow and invest in its success long term.
>>
>> Perhaps we can also version the installed SO files and bump their
>> version at each feature release? Right now it's just "libarrow.so.0"
>> (pointing to "libarrow.so.0.0.0").
>>
>> Regards
>>
>> Antoine.
[jira] [Created] (ARROW-2521) [Rust] Review Array design before first release
Andy Grove created ARROW-2521:

Summary: [Rust] Review Array design before first release
Key: ARROW-2521
URL: https://issues.apache.org/jira/browse/ARROW-2521
Project: Apache Arrow
Issue Type: Improvement
Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
Fix For: 0.10.0

Early on, [~kszucs] and I worked on two different designs for how to represent Arrow arrays in Rust, each with its pros and cons.

Krisztian started out with a generics approach, e.g. Array<T>, which was great until we tried to implement structs, which can contain mixed types. We therefore ended up using an enum to represent arrays, which was great until I got to the list types: I don't think I can implement nested lists with this approach.

I am reviewing this again now that I am more familiar with Arrow and my Rust skills have improved greatly since I started working on all of this.

I will be prototyping in a separate repo, and will update this Jira once I have something concrete to share, but I feel it is important to address this before the first official release of the Rust version.

Also, if we are going to consider a refactor like this, it is better to do it now while the codebase is tiny.
Re: Struct memory format question
Thanks, Antoine. I was hoping that was the case.

I filed a PR to make this more specific in the spec.

https://github.com/apache/arrow/pull/1959

It is filed under the Rust issue where this question came up.

Thanks,

Andy.

On Sat, Apr 28, 2018 at 9:35 AM, Antoine Pitrou wrote:
>
> On 28/04/2018 at 16:55, Andy Grove wrote:
> > I have implemented structs in Rust as a vector of Arrays. Each nested array
> > uses a byte-aligned contiguous region of memory, but the array for field 2
> > is not contiguous with the array for field 1.
>
> Child arrays of a nested array do not have to be contiguous with one
> another. They are truly independent arrays that just happen to be
> related through their usage in the nested array.
>
> Regards
>
> Antoine.
[jira] [Created] (ARROW-2520) [Rust] CI should also build against nightly Rust
Andy Grove created ARROW-2520:

Summary: [Rust] CI should also build against nightly Rust
Key: ARROW-2520
URL: https://issues.apache.org/jira/browse/ARROW-2520
Project: Apache Arrow
Issue Type: Improvement
Reporter: Andy Grove
Fix For: 0.10.0

We should build Arrow against Rust nightly, but allow failures.
Re: Struct memory format question
On 28/04/2018 at 16:55, Andy Grove wrote:
> I have implemented structs in Rust as a vector of Arrays. Each nested array
> uses a byte-aligned contiguous region of memory, but the array for field 2
> is not contiguous with the array for field 1.

Child arrays of a nested array do not have to be contiguous with one another. They are truly independent arrays that just happen to be related through their usage in the nested array.

Regards

Antoine.
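The point about independent child arrays can be illustrated with a small sketch. The types below are hypothetical stand-ins, not the Arrow Rust API: each child owns its own heap buffer, nothing ties the two buffers together in memory, and a logical struct row is simply "element i of every child".

```rust
/// Hypothetical child array types; each owns a separate heap allocation.
pub struct Int32Array {
    pub values: Vec<i32>,
}

pub struct Float64Array {
    pub values: Vec<f64>,
}

/// A struct column with two fields, stored as two independent children.
pub struct PersonStruct {
    pub ages: Int32Array,
    pub heights: Float64Array,
}

/// Address range (start, end) in bytes occupied by a slice's elements.
fn byte_range<T>(v: &[T]) -> (usize, usize) {
    let start = v.as_ptr() as usize;
    (start, start + v.len() * std::mem::size_of::<T>())
}

/// True when the two child buffers do not overlap. Each buffer is
/// contiguous on its own; they are not laid out back to back as one block.
pub fn buffers_are_disjoint(s: &PersonStruct) -> bool {
    let (a0, a1) = byte_range(&s.ages.values);
    let (b0, b1) = byte_range(&s.heights.values);
    a1 <= b0 || b1 <= a0
}

/// Reassembles logical row `i` by reading index `i` from every child.
pub fn row(s: &PersonStruct, i: usize) -> (i32, f64) {
    (s.ages.values[i], s.heights.values[i])
}

/// Small example value used to exercise the functions above.
pub fn example_struct() -> PersonStruct {
    PersonStruct {
        ages: Int32Array { values: vec![35, 41] },
        heights: Float64Array { values: vec![1.7, 1.8] },
    }
}
```

Row access depends only on a shared index, which is why the children do not need to be adjacent in memory.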
[jira] [Created] (ARROW-2519) [Rust] Implement min/max for primitive arrays
Andy Grove created ARROW-2519:

Summary: [Rust] Implement min/max for primitive arrays
Key: ARROW-2519
URL: https://issues.apache.org/jira/browse/ARROW-2519
Project: Apache Arrow
Issue Type: New Feature
Reporter: Andy Grove
Assignee: Andy Grove
Fix For: 0.10.0

I would like to efficiently get the min or max value in an array of primitives, to support efficient aggregate queries.
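As an illustration of the requested operation, here is a minimal single-pass sketch. The names `array_min` and `array_max` are hypothetical, the column is modelled as a slice of `Option` values rather than a real Arrow array, and skipping nulls is an assumption. Floating-point NaN values are not treated specially.

```rust
/// Hypothetical helper (not an Arrow API): smallest non-null value,
/// or None if the column is empty or entirely null.
pub fn array_min<T: PartialOrd + Copy>(values: &[Option<T>]) -> Option<T> {
    values.iter().flatten().copied().fold(None, |acc, v| match acc {
        // Keep the current minimum when it is already <= the candidate.
        Some(m) if m <= v => Some(m),
        _ => Some(v),
    })
}

/// Hypothetical helper (not an Arrow API): largest non-null value,
/// or None if the column is empty or entirely null.
pub fn array_max<T: PartialOrd + Copy>(values: &[Option<T>]) -> Option<T> {
    values.iter().flatten().copied().fold(None, |acc, v| match acc {
        // Keep the current maximum when it is already >= the candidate.
        Some(m) if m >= v => Some(m),
        _ => Some(v),
    })
}
```

Both functions make one pass over the data and allocate nothing.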
Struct memory format question
I have implemented structs in Rust as a vector of Arrays. Each nested array uses a byte-aligned contiguous region of memory, but the array for field 2 is not contiguous with the array for field 1.

Is this "compliant"? I have been reading layout.md and I don't think it is clear, but it seems to suggest that all the nested arrays should be contiguous with each other in memory.

Thanks,

Andy.