[jira] [Created] (ARROW-2523) [Rust] Implement CAST operations for arrays

2018-04-28 Thread Andy Grove (JIRA)
Andy Grove created ARROW-2523:
-

 Summary: [Rust] Implement CAST operations for arrays
 Key: ARROW-2523
 URL: https://issues.apache.org/jira/browse/ARROW-2523
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.10.0


I have implemented CAST operations in DataFusion but I would like to 
re-implement this now directly in Arrow. I will create a PR after the Rust 
refactor is complete.
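
As a rough illustration of what an array CAST could look like, here is a minimal sketch. The `PrimitiveArray` type, `cast_with`, and the two cast functions are hypothetical names made up for this example; they are not the DataFusion or Arrow Rust API, and nulls are modelled as `Option` rather than a validity bitmap.

```rust
// Hypothetical sketch only: none of these names come from Arrow or DataFusion.

/// A nullable primitive array, modelled here as a Vec of Options
/// (a real Arrow array would use a value buffer plus a validity bitmap).
#[derive(Debug, PartialEq)]
pub struct PrimitiveArray<T> {
    values: Vec<Option<T>>,
}

impl<T: Copy> PrimitiveArray<T> {
    pub fn new(values: Vec<Option<T>>) -> Self {
        PrimitiveArray { values }
    }

    /// Convert every non-null element with `f`; nulls stay null.
    pub fn cast_with<U, F: Fn(T) -> U>(&self, f: F) -> PrimitiveArray<U> {
        PrimitiveArray {
            values: self.values.iter().map(|v| v.map(&f)).collect(),
        }
    }
}

/// Widening cast from i32 to i64 (lossless).
pub fn cast_i32_to_i64(a: &PrimitiveArray<i32>) -> PrimitiveArray<i64> {
    a.cast_with(i64::from)
}

/// Cast from i32 to f64 (lossless, since every i32 fits in an f64).
pub fn cast_i32_to_f64(a: &PrimitiveArray<i32>) -> PrimitiveArray<f64> {
    a.cast_with(f64::from)
}
```

Narrowing casts (for example i64 to i32) would need a policy for values that do not fit, which this sketch does not address.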



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Refactoring the Rust API

2018-04-28 Thread Andy Grove
I filed a JIRA issue to track this (https://issues.apache.org/jira/browse/ARROW-2521)
but thought it was worth raising on the mailing list too.

I am running into limitations now of the way that Array is represented as
an enum and I am unable to implement List> with the current design.

When Krisztian Szucs and I were working on the initial code we had two
different approaches and we went with this enum approach at the time
because we weren't able to make the other approach (traits + generics) work.

Now that I'm further along the Rust learning curve, I can make the trait +
generic approach work and I'm currently prototyping in a separate repo, and
it is looking good so far. I have been able to create a struct array
containing different type fields including List>.

I think I'm ready to start the refactor for real in my fork. We only have
~1k LOC so I don't think it will take too long, but because I'm doing this
in my spare time I am going to estimate that I will have it complete in
just over one week, aiming for having it complete by 4/30.

I think it's fine to continue merging small PRs in the meanwhile but I
think we should hold off any major changes in the coming week.

Thanks,

Andy.


[jira] [Created] (ARROW-2522) [C++] Version shared library files

2018-04-28 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2522:
-

 Summary: [C++] Version shared library files
 Key: ARROW-2522
 URL: https://issues.apache.org/jira/browse/ARROW-2522
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


We should version installed shared library files (SO under Unix, DLL under 
Windows) to disambiguate incompatible ABI versions.

CMake provides support for that:
http://pusling.com/blog/?p=352
https://cmake.org/cmake/help/v3.11/prop_tgt/SOVERSION.html





Re: Sync Call Notes

2018-04-28 Thread Antoine Pitrou

Done, see https://issues.apache.org/jira/browse/ARROW-2522



On 28/04/2018 at 02:01, Wes McKinney wrote:
> Yes, I'd say let's definitely bump the SO version with each major
> release. Is there a JIRA for this already? If not let's create one
> 
> On Tue, Apr 24, 2018 at 2:02 PM, Antoine Pitrou  wrote:
>>
>> On 24/04/2018 at 19:58, Wes McKinney wrote:
>>>
>>> In summary, until the Arrow developer group grows significantly
>>> larger, I think we should expect the users of these libraries to "live
>>> at HEAD". I do think we should make ABI changes transparent and
>>> well-documented so the pain is minimized. For the moment, we still
>>> have a lot of development work to do for more people to "care" about
>>> Apache Arrow and invest in its success long term.
>>
>> Perhaps we can also version the installed SO files and bump their
>> version at each feature release?  Right now it's just "libarrow.so.0"
>> (pointing to "libarrow.so.0.0.0").
>>
>> Regards
>>
>> Antoine.


[jira] [Created] (ARROW-2521) [Rust] Review Array design before first release

2018-04-28 Thread Andy Grove (JIRA)
Andy Grove created ARROW-2521:
-

 Summary: [Rust] Review Array design before first release
 Key: ARROW-2521
 URL: https://issues.apache.org/jira/browse/ARROW-2521
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.10.0


Early on, [~kszucs] and I worked on two different designs for how to represent 
Arrow arrays in Rust, each with their pros and cons.

Krisztian started out with a generics approach e.g. Array which was great 
until we tried to implement structs, which can contain mixed types so we ended 
up using enum to represent arrays, which was great until I got to the list 
types ... I don't think I can implement nested lists with this approach.
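
To make the trade-off concrete, here is a minimal sketch of a trait-based design in which a list's child can be any array, including another list. All names (`Array`, `PrimitiveArray`, `ListArray`, `StructArray`, `nested_list`) are hypothetical and chosen only for this example; this is not the prototype referred to in this issue.

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical common interface implemented by every array type.
pub trait Array {
    fn len(&self) -> usize;
    /// Allows downcasting a trait object back to its concrete type.
    fn as_any(&self) -> &dyn Any;
}

/// Generic array of fixed-width values.
pub struct PrimitiveArray<T> {
    values: Vec<T>,
}

impl<T> PrimitiveArray<T> {
    pub fn new(values: Vec<T>) -> Self {
        PrimitiveArray { values }
    }
}

impl<T: 'static> Array for PrimitiveArray<T> {
    fn len(&self) -> usize {
        self.values.len()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// List array: offsets into a child array of any type, so nesting is free.
pub struct ListArray {
    offsets: Vec<usize>,
    values: Arc<dyn Array>,
}

impl ListArray {
    pub fn new(offsets: Vec<usize>, values: Arc<dyn Array>) -> Self {
        ListArray { offsets, values }
    }
    pub fn values(&self) -> &Arc<dyn Array> {
        &self.values
    }
}

impl Array for ListArray {
    fn len(&self) -> usize {
        // n lists are described by n + 1 offsets.
        self.offsets.len().saturating_sub(1)
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Struct array: columns of mixed types behind the same trait object.
pub struct StructArray {
    columns: Vec<Arc<dyn Array>>,
}

impl StructArray {
    pub fn new(columns: Vec<Arc<dyn Array>>) -> Self {
        StructArray { columns }
    }
    pub fn num_columns(&self) -> usize {
        self.columns.len()
    }
}

/// Builds a list of lists of i32: [[[1, 2]], [[3], [4, 5]]].
pub fn nested_list() -> ListArray {
    let ints: Arc<dyn Array> = Arc::new(PrimitiveArray::new(vec![1i32, 2, 3, 4, 5]));
    let inner: Arc<dyn Array> = Arc::new(ListArray::new(vec![0, 2, 3, 5], ints));
    ListArray::new(vec![0, 1, 3], inner)
}
```

The cost of this design is dynamic dispatch and a downcast (`as_any`) whenever the concrete type is needed, which an enum avoids.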

I am reviewing this again now that I am more familiar with Arrow and also my 
Rust skills have improved greatly since I started working on all of this.

I will be prototyping in a separate repo, and will update this Jira once I have 
something concrete to share, but I feel it is important to address this before 
the first official release of the Rust version. Also, if we are going to 
consider a refactor like this, it is better to do it now while the codebase is 
tiny.





Re: Struct memory format question

2018-04-28 Thread Andy Grove
Thanks, Antoine. I was hoping that was the case. I filed a PR to make this
more specific in the spec.

https://github.com/apache/arrow/pull/1959

This is filed under a Rust issue where this question came about.

Thanks,

Andy.

On Sat, Apr 28, 2018 at 9:35 AM, Antoine Pitrou  wrote:

>
> On 28/04/2018 at 16:55, Andy Grove wrote:
> > I have implemented structs in Rust as a vector of Arrays. Each nested array
> > uses a byte-aligned contiguous region of memory, but the array for field 2
> > is not contiguous with the array for field 1.
>
> Child arrays of a nested array do not have to be contiguous with one
> another.  They are truly independent arrays that just happen to be
> related through their usage in the nested array.
>
> Regards
>
> Antoine.
>


[jira] [Created] (ARROW-2520) [Rust] CI should also build against nightly Rust

2018-04-28 Thread Andy Grove (JIRA)
Andy Grove created ARROW-2520:
-

 Summary: [Rust] CI should also build against nightly Rust
 Key: ARROW-2520
 URL: https://issues.apache.org/jira/browse/ARROW-2520
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andy Grove
 Fix For: 0.10.0


We should build Arrow against Rust nightly, but allow failures.





Re: Struct memory format question

2018-04-28 Thread Antoine Pitrou

On 28/04/2018 at 16:55, Andy Grove wrote:
> I have implemented structs in Rust as a vector of Arrays. Each nested array
> uses a byte-aligned contiguous region of memory, but the array for field 2
> is not contiguous with the array for field 1.

Child arrays of a nested array do not have to be contiguous with one
another.  They are truly independent arrays that just happen to be
related through their usage in the nested array.
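
A minimal sketch of that layout, assuming hypothetical `ChildArray` and `StructArray` types (not the Arrow Rust API): each child owns its own separately allocated buffer, and the only constraint enforced here is that all children have the same length. The enum is used purely to keep the example short.

```rust
/// Hypothetical child array; each variant owns its own heap buffer,
/// so two children are not laid out next to each other in memory.
#[derive(Debug)]
pub enum ChildArray {
    Int32(Vec<i32>),
    Float64(Vec<f64>),
}

impl ChildArray {
    pub fn len(&self) -> usize {
        match self {
            ChildArray::Int32(v) => v.len(),
            ChildArray::Float64(v) => v.len(),
        }
    }
}

/// Struct array as a vector of independent child arrays.
#[derive(Debug)]
pub struct StructArray {
    len: usize,
    children: Vec<ChildArray>,
}

impl StructArray {
    /// Children only need matching lengths; nothing is required
    /// about where their buffers live relative to one another.
    pub fn try_new(children: Vec<ChildArray>) -> Result<StructArray, String> {
        let len = children.first().map_or(0, |c| c.len());
        if children.iter().any(|c| c.len() != len) {
            return Err("all child arrays must have the same length".to_string());
        }
        Ok(StructArray { len, children })
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn num_fields(&self) -> usize {
        self.children.len()
    }
}
```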

Regards

Antoine.


[jira] [Created] (ARROW-2519) [Rust] Implement min/max for primitive arrays

2018-04-28 Thread Andy Grove (JIRA)
Andy Grove created ARROW-2519:
-

 Summary: [Rust] Implement min/max for primitive arrays
 Key: ARROW-2519
 URL: https://issues.apache.org/jira/browse/ARROW-2519
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.10.0


I would like to efficiently get the min or max value in an array of primitives 
to support efficient aggregate queries.
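
A minimal sketch of the idea, assuming nullable values are modelled as a slice of `Option<T>`; the function names `array_min` and `array_max` are hypothetical and not part of Arrow. Nulls are skipped, and an empty or all-null input yields `None`.

```rust
/// Hypothetical helper: smallest non-null value, or None if there is none.
/// Floating-point NaN handling is not addressed in this sketch.
pub fn array_min<T: PartialOrd + Copy>(values: &[Option<T>]) -> Option<T> {
    values
        .iter()
        .flatten() // skip nulls
        .copied()
        .fold(None, |acc, v| match acc {
            Some(m) if m <= v => Some(m),
            _ => Some(v),
        })
}

/// Hypothetical helper: largest non-null value, or None if there is none.
pub fn array_max<T: PartialOrd + Copy>(values: &[Option<T>]) -> Option<T> {
    values
        .iter()
        .flatten() // skip nulls
        .copied()
        .fold(None, |acc, v| match acc {
            Some(m) if m >= v => Some(m),
            _ => Some(v),
        })
}
```

Both functions make a single pass over the data, which is the property that matters for aggregate queries.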





Struct memory format question

2018-04-28 Thread Andy Grove
I have implemented structs in Rust as a vector of Arrays. Each nested array
uses a byte-aligned contiguous region of memory, but the array for field 2
is not contiguous with the array for field 1.

Is this "compliant" ?

I have been reading layout.md and I don't think it is clear but it seems to
suggest that all the nested arrays should be contiguous to each other in
memory?

Thanks,

Andy.