Wes,

Lots of good questions. I'll answer as much as I can but I'm really not
familiar with the C++ implementation and have built the Rust implementation
based on the specs. One of my goals for the IPC and integration testing is
to get familiar with the other implementations so that I can make sure the
Rust implementation becomes more aligned.

Maybe before I tackle IPC I should get familiar with the C++ and/or Java
API and start comparing how things are done. Java would be easier for me
given my background but is that as far along as the C++ API?

To answer your specific questions as best I can:

- Memory pool: There was a contribution of a pluggable pool that isn't
integrated with the rest of the code base. We should integrate it or remove
it. I filed ARROW-2620 for this. A pluggable memory pool isn't something
that I'm personally interested in since there are upcoming features in Rust
for this anyway as far as I know.

- Arrays: I didn't know 32 bit support was a requirement. Array len() and
null_count() actually return usize now (that changed as part of the
refactor) so I assume that should work on 32 bit but I will need to confirm
that.

- Lists: The spec only has an example for List with a fixed width primitive
so I'm still a little unclear on how lists and nested lists should really
work with other types. I'll have to study the C++ impl I guess to
understand this better. As a general comment, it would be nice to add more
detail to the specs and I'm happy to contribute to that as I learn more.

- List<T> vs ListArray<T>: These should probably be combined but I need to
learn more about List. I'm actually working on a PoC for my DataFusion
project right now that requires lists of user-defined types so this is
becoming a priority for me anyway.

- Should min/max be methods on PrimitiveArray vs top-level functions: The
issue is that min/max are strongly typed so they work well for
PrimitiveArray<T>. I suppose they could be added to the Array trait and
they would return another Array. I'll take a look.

- Null support: This is sketchy at best right now, but NULL support is the
last feature that I need before releasing the next version of my project so
this is a priority for me and I'll be fixing this up over the next 2 weeks.

* What is a "BufferArray": This got renamed to PrimitiveArray and is simply
an array of fixed-width primitives.

- Struct: column vs field: I filed ARROW-2617 to address this.

- Bitmap 1 vs 0: A flag makes sense. I filed ARROW-2618.

- JSON serde should nove to separate module: Makes sense. ARROW-2619.

- List / PrimitiveList / ListBuilder : See earlier comments on my (lack of)
understanding on List types

I hope that helps.

Thanks,

Andy.


On Fri, May 18, 2018 at 3:08 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> hi Andy,
>
> I gave a read through the Rust implementation. I have never programmed
> in Rust (hope to change that someday!), so some of the programming
> constructs are lost on me, but I focused on the Arrow columnar
> questions. My notes / questions follow
>
> cheers
> Wes
>
> ## High level comments
>
> * Do you plan to provide support for memory pools in the same style as
> C++? I
>   see some work already along these lines, but it will impact the API in a
>   number of places
>
> ## array.rs
>
> L35
>
> * Should len and null_count always return u64, even on 32-bit arch?
> * Do you intend to support zero-copy slicing in the way that C++ supports
> it?
>   Probably doesn't need to get done now
>
> L47 (ListArray)
>
> * In the format docs, for List<T>, T can be any type, including other
> nested
>   types. What does this mean here?
> * What is the difference between List<T> and ListArray<T> in Rust, can you
> add
>   a comment?
>
> L157
>
> * Are there benefits to having min and max be methods on PrimitiveArray
>   vs. top-level functions?
>
> L214
>
> * This Add operation does not account for nulls
>
> L247
>
> * What is a "BufferArray"?
>
> L295
>
> * Maybe call Struct attribute "columns" instead "fields"?
> * Same with changing column(i) -> field(i)
>
> ## bitmap.rs
>
> L35
>
> * I would expect a bitmap to be initialized with all 0's instead of all
> 1's, or
>   perhaps this should be a flag to do one or the other
>
> ## builder.rs
>
> * Similar question as above re: usize vs u64
>
> ## datatypes.rs
>
> L68
>
> * It feels like JSON serde for data types, fields, schemas belongs in its
> own
>   module (this thinking might just be influenced by the design of the
> Java/C++
>   libraries)
>
> ## list.rs
>
> L31
>
> * Seems like you might want to call this PrimitiveList, since it only
> supports
>   primitive child types. How will lists of arbitrary arrays be handled?
> (per
>   notes above, and per mailing list discussions)
> * The inner cells of a list may be null; this interface does not appear to
>   support that
>
> ## list_builder.rs
>
> * Same comment, should this be PrimitiveListBuilder?
> * How to build lists with a nested type child?
> * How to append null lists?
>
> On Fri, May 4, 2018 at 6:07 AM, Andy Grove <andygrov...@gmail.com> wrote:
> > Here's my blog post explaining the refactor:
> > https://andygrove.io/2018/05/apache-arrow-traits-generics/
> >
> > The Reddit thread is going to be here for anyone wanting to see the
> > feedback:
> > https://www.reddit.com/r/rust/comments/8gy45t/refactoring_
> apache_arrow_to_use_traits_and/
> >
> > Thanks,
> >
> > Andy.
> >
> > On Wed, May 2, 2018 at 5:10 PM, Andy Grove <andygrov...@gmail.com>
> wrote:
> >
> >> There was an interesting blog post posted to Reddit a couple days ago
> that
> >> is very relevant to this refactor. The author is building a dataframe
> >> library in Rust and started out with an enum to represent arrays and
> then
> >> moved to using generic traits with the enum.
> >>
> >> https://www.reddit.com/r/rust/comments/8g2274/dataframes_
> >> traits_enums_generics_and_dynamic/
> >>
> >> I don't think that approach would work for us and I'm tempted to write a
> >> up a blog post myself explaining the current refactor and why it is
> needed.
> >> I think I'll try and do that this weekend. I'm keen to get a discussion
> >> going around the refactor to make sure we don't need to do another
> refactor
> >> in the future.
> >>
> >> Andy.
> >>
> >>
> >>
> >>
> >>
> >> On Sun, Apr 29, 2018 at 9:59 AM, Andy Grove <andygrov...@gmail.com>
> wrote:
> >>
> >>> So it turns out this refactor isn't as disruptive as I thought and I
> >>> mostly have it working already.
> >>>
> >>> The buffer/builder/list types barely change at all other than the fact
> >>> that we no longer need all those macros after moving to generics.
> >>>
> >>> It really is only array.rs that is pretty much a rewrite.
> >>>
> >>> Also, in my earlier email I got my dates wrong. I am aiming to have
> this
> >>> PR ready by Monday May 7th. The real test for me is integrating it with
> >>> DataFusion to make sure I haven't missed anything.
> >>>
> >>> Here's the branch where I'm working on this:
> >>> https://github.com/andygrove/arrow/tree/refactor_rust_api
> >>>
> >>> Thanks,
> >>>
> >>> Andy.
> >>>
> >>>
> >>>
> >>>
> >>> On Sat, Apr 28, 2018 at 2:10 PM, Andy Grove <andygrov...@gmail.com>
> >>> wrote:
> >>>
> >>>> I filed a PR to track this (https://issues.apache.org/jir
> >>>> a/browse/ARROW-2521) but thought it was worth raising on the mailing
> >>>> list too.
> >>>>
> >>>> I am running into limitations now of the way that Array is represented
> >>>> as an enum and I am unable to implement List<List<T>> with the current
> >>>> design.
> >>>>
> >>>> When Krisztian Szucs and I were working on the initial code we had two
> >>>> different approaches and we went with this enum approach at the time
> >>>> because we weren't able to make the other approach (traits +
> generics) work.
> >>>>
> >>>> Now that I'm further along the Rust learning curve, I can make the
> trait
> >>>> + generic approach work and I'm currently prototyping in a separate
> repo,
> >>>> and it is looking good so far. I have been able to create a struct
> array
> >>>> containing different type fields including List<List<T>>.
> >>>>
> >>>> I think I'm ready to start the refactor for real in my fork. We only
> >>>> have ~1k LOC so I don't think it will take too long, but because I'm
> doing
> >>>> this in my spare time I am going to estimate that I will have it
> complete
> >>>> in just over one week, aiming for having it complete by 4/30.
> >>>>
> >>>> I think it's fine to continue merging small PRs in the meanwhile but I
> >>>> think we should hold off any major changes in the coming week.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Andy.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
>

Reply via email to