Implementing the format fully requires memory management and IO interfaces (i.e. arrow/io/{file.h, interfaces.h, memory.h}), so those parts cannot be split out of a core "format" library.

On Fri, Sep 20, 2019 at 10:36 AM Neal Richardson <neal.p.richard...@gmail.com> wrote:
>
> I wonder if having a core "format" C++ library, which the io, compute,
> etc. library/libraries would depend on, is a natural step.
> Particularly since we're coming up on 1.0 and the format is being
> declared stable.
>
> Neal
>
> On Fri, Sep 20, 2019 at 8:28 AM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > We would have to be even more careful about managing symbol exports.
> > Third party projects would need to link more libraries in their
> > applications (not unlike the way that Boost works now -- I suppose
> > that Boost is the closest analogue to what we're going for)
> >
> > On Fri, Sep 20, 2019 at 2:30 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > >>
> > >> We could indeed split up libarrow into more shared libraries. This
> > >> would mean accepting a lot more maintenance effort though, on a team
> > >> that is already overburdened. I'm not too keen on that in the short
> > >> term.
> > >
> > > Something for longer term to think about. What are you seeing as the
> > > added maintenance here?
> > >
> > > On Thu, Sep 19, 2019 at 5:38 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>
> > >> hi Micah,
> > >>
> > >> On Thu, Sep 19, 2019 at 12:41 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > >> >
> > >> > > * Should optional components be "opt in", "opt out", or a mix?
> > >> > > Currently it's a mix, and that's confusing for people. I think we
> > >> > > should make them all "opt in".
> > >> >
> > >> > Agreed they should all be opt in by default. I think active
> > >> > developers are quite adept at flipping the appropriate CMake flags.
> > >> >
> > >>
> > >> Cool. I opened a tracking JIRA
> > >> https://issues.apache.org/jira/browse/ARROW-6637 and attached many
> > >> issues. Sorry for the new JIRA flood
> > >>
> > >> >
> > >> > > * Do we want to bring the out-of-the-box core build down to zero
> > >> > > dependencies, including not depending on boost::filesystem and
> > >> > > possibly checking in the compiled Flatbuffers files. While it may be
> > >> > > slightly more maintenance work, I think the optics of a
> > >> > > "dependency-free" core build would be beneficial and help the project
> > >> > > marketing-wise.
> > >> >
> > >> > I'm -.5 on checking in generated artifacts but this is mostly
> > >> > stylistic. In the case of flatbuffers it seems like we might be able
> > >> > to get away with vendoring since it should mostly be headers only.
> > >> >
> > >> > I would prefer to try to come up with more granular components and be
> > >> > very conservative on what is "core". I think it should be possible to
> > >> > have a zero dependency build if only MemoryPool, Buffers, Arrays and
> > >> > ArrayBuilders are in a core package [1]. This combined with discussion
> > >> > Antoine started on an ABI compatible C-layer would make basic inter-op
> > >> > within a process reasonable. Moving up the stack to IPC and files,
> > >> > there is probably a way to package headers separately from
> > >> > implementations. This would allow other projects wishing to integrate
> > >> > with Arrow to bring their own implementations without the baggage of
> > >> > boost::filesystem. Would this leave anything besides "flatbuffers" as
> > >> > a hard dependency to support IPC?
> > >> >
> > >>
> > >> We could indeed split up libarrow into more shared libraries. This
> > >> would mean accepting a lot more maintenance effort though, on a team
> > >> that is already overburdened. I'm not too keen on that in the short
> > >> term.
> > >>
> > >> > Thanks,
> > >> > Micah
> > >> >
> > >> >
> > >> > [1] It probably makes sense to go even further and separate out
> > >> > MemoryPool and Buffer, so we can break the circular relationship
> > >> > between parquet and arrow.
> > >>
> > >> Don't think this is possible even then, particularly in light of my
> > >> recent work reading and writing Arrow columnar data "closer to the
> > >> metal" inside Parquet, yielding beneficial performance improvements.
> > >>
> > >> >
> > >> > On Wed, Sep 18, 2019 at 8:03 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > >> >
> > >> > > To be clear I think we should make these changes right after 0.15.0
> > >> > > is released so we aren't playing whackamole with our packaging
> > >> > > scripts. I'm happy to take the lead on the work...
> > >> > >
> > >> > > On Wed, Sep 18, 2019 at 9:54 AM Antoine Pitrou <solip...@pitrou.net> wrote:
> > >> > > >
> > >> > > > On Wed, 18 Sep 2019 09:46:54 -0500
> > >> > > > Wes McKinney <wesmck...@gmail.com> wrote:
> > >> > > > > I think these are both interesting areas to explore further. I'd
> > >> > > > > like to focus on the couple of immediate items I think we should
> > >> > > > > address
> > >> > > > >
> > >> > > > > * Should optional components be "opt in", "opt out", or a mix?
> > >> > > > > Currently it's a mix, and that's confusing for people. I think we
> > >> > > > > should make them all "opt in".
> > >> > > > > * Do we want to bring the out-of-the-box core build down to zero
> > >> > > > > dependencies, including not depending on boost::filesystem and
> > >> > > > > possibly checking in the compiled Flatbuffers files. While it may be
> > >> > > > > slightly more maintenance work, I think the optics of a
> > >> > > > > "dependency-free" core build would be beneficial and help the
> > >> > > > > project marketing-wise.
> > >> > > > >
> > >> > > > > Both of these issues must be addressed whether we undertake a
> > >> > > > > Bazel implementation or some other refactor of the C++ build
> > >> > > > > system.
> > >> > > >
> > >> > > > I think checking in the Flatbuffers files (and also Protobuf and
> > >> > > > Thrift where applicable :-)) would be fine.
> > >> > > >
> > >> > > > As for boost::filesystem, getting rid of it wouldn't be a huge
> > >> > > > task. Still worth deciding whether we want to prioritize
> > >> > > > development time for it, because it's not entirely trivial either.
> > >> > > >
> > >> > > > Regards
> > >> > > >
> > >> > > > Antoine.
> > >> > > >