Re: [DISCUSS] Changing C++ build system default options to produce more barebones builds
> * Should optional components be "opt in", "opt out", or a mix?
> Currently it's a mix, and that's confusing for people. I think we
> should make them all "opt in".

Agreed, they should all be opt in by default. I think active developers are quite adept at flipping the appropriate CMake flags.

> * Do we want to bring the out-of-the-box core build down to zero
> dependencies, including not depending on boost::filesystem and
> possibly checking in the compiled Flatbuffers files. While it may be
> slightly more maintenance work, I think the optics of a
> "dependency-free" core build would be beneficial and help the project
> marketing-wise.

I'm -0.5 on checking in generated artifacts, but this is mostly stylistic. In the case of Flatbuffers, it seems like we might be able to get away with vendoring, since it should be mostly header-only.

I would prefer to try to come up with more granular components and to be very conservative about what counts as "core". I think it should be possible to have a zero-dependency build if only MemoryPool, Buffers, Arrays, and ArrayBuilders are in the core package [1]. Combined with the discussion Antoine started on an ABI-compatible C layer, this would make basic interop within a process reasonable.

Moving up the stack to IPC and files, there is probably a way to package headers separately from implementations. This would allow other projects wishing to integrate with Arrow to bring their own implementations without the baggage of boost::filesystem. Would this leave anything besides Flatbuffers as a hard dependency to support IPC?

Thanks,
Micah

[1] It probably makes sense to go even further and separate out MemoryPool and Buffer, so we can break the circular relationship between parquet and arrow.

On Wed, Sep 18, 2019 at 8:03 AM Wes McKinney wrote:
> To be clear I think we should make these changes right after 0.15.0 is
> released so we aren't playing whack-a-mole with our packaging scripts.
> I'm happy to take the lead on the work...
> On Wed, Sep 18, 2019 at 9:54 AM Antoine Pitrou wrote:
> >
> > On Wed, 18 Sep 2019 09:46:54 -0500
> > Wes McKinney wrote:
> > > I think these are both interesting areas to explore further. I'd like
> > > to focus on the couple of immediate items I think we should address
> > >
> > > * Should optional components be "opt in", "opt out", or a mix?
> > > Currently it's a mix, and that's confusing for people. I think we
> > > should make them all "opt in".
> > > * Do we want to bring the out-of-the-box core build down to zero
> > > dependencies, including not depending on boost::filesystem and
> > > possibly checking in the compiled Flatbuffers files. While it may be
> > > slightly more maintenance work, I think the optics of a
> > > "dependency-free" core build would be beneficial and help the project
> > > marketing-wise.
> > >
> > > Both of these issues must be addressed whether we undertake a Bazel
> > > implementation or some other refactor of the C++ build system.
> >
> > I think checking in the Flatbuffers files (and also Protobuf and Thrift
> > where applicable :-)) would be fine.
> >
> > As for boost::filesystem, getting rid of it wouldn't be a huge task.
> > Still worth deciding whether we want to prioritize development time for
> > it, because it's not entirely trivial either.
> >
> > Regards
> >
> > Antoine.
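For concreteness, if the opt-in flags proposed in the JIRAs from this digest (ARROW_FILESYSTEM, ARROW_JSON, ARROW_CSV, ARROW_HDFS) all land, a zero-dependency "core" configure might look something like the sketch below. Flag names and defaults are still under discussion in this thread, so treat this as illustrative only:

```shell
# Hypothetical minimal-core configure, assuming the opt-in flags
# proposed in ARROW-6608/6610/6611/6612 are all available.
cmake .. \
  -DARROW_FILESYSTEM=OFF \
  -DARROW_JSON=OFF \
  -DARROW_CSV=OFF \
  -DARROW_HDFS=OFF \
  -DARROW_BUILD_TESTS=OFF
```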
Re: Timeline for 0.15.0 release
> The process should be well documented at this point but there are a
> number of steps.

Is [1] the up-to-date documentation for the release? Are there instructions for adding the code signing key to SVN?

I will make a go of it. I will try to mitigate any internet issues by running the process from a cloud instance (I assume that isn't a problem?).

Thanks,
Micah

[1] https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide

On Wed, Sep 18, 2019 at 8:29 AM Wes McKinney wrote: > The process should be well documented at this point but there are a > number of steps. Note that you need to add your code signing key to > the KEYS file in SVN (that's not very hard to do). I think it's fine > to hand off the process to others after the VOTE but it would be > tricky to have multiple RMs involved with producing the source and > binary artifacts for the vote > > On Tue, Sep 17, 2019 at 10:55 PM Micah Kornfield > wrote: > > > > SGTM, as well. > > > > I should have a little bit of time next week if I can help as RM but I > have > > a couple of concerns: > > 1. In the past I've had trouble downloading and validating releases. > I'm a > > bit worried, that I might have similar problems doing the necessary > uploads. > > 2. My internet connection will likely be not great, I don't know if this > > would make it even less likely to be successful. > > > > Does it become problematic if somehow I would have to abandon the process > > mid-release? Is there anyone who could serve as a backup? Are the steps > > well documented? > > > > Thanks, > > Micah > > > > On Tue, Sep 17, 2019 at 4:25 PM Neal Richardson < > neal.p.richard...@gmail.com> > > wrote: > > > > > Sounds good to me. > > > > > > Do we have a release manager yet? Any volunteers? > > > > > > Neal > > > > > > On Tue, Sep 17, 2019 at 4:06 PM Wes McKinney > wrote: > > > > > > > hi all, > > > > > > > > It looks like we're drawing close to be able to make the 0.15.0 > > > > release. 
I would suggest "pencils down" at the end of this week and > > > > see if a release candidate can be produced next Monday September 23. > > > > Any thoughts or objections? > > > > > > > > Thanks, > > > > Wes > > > > > > > > On Wed, Sep 11, 2019 at 11:23 AM Wes McKinney > > > wrote: > > > > > > > > > > hi Eric -- yes, that's correct. I'm planning to amend the Format > docs > > > > > today regarding the EOS issue and also update the C++ library > > > > > > > > > > On Wed, Sep 11, 2019 at 11:21 AM Eric Erhardt > > > > > wrote: > > > > > > > > > > > > I assume the plan is to merge the ARROW-6313-flatbuffer-alignment > > > > branch into master before the 0.15 release, correct? > > > > > > > > > > > > BTW - I believe the C# alignment changes are ready to be merged > into > > > > the alignment branch - https://github.com/apache/arrow/pull/5280/ > > > > > > > > > > > > Eric > > > > > > > > > > > > -Original Message- > > > > > > From: Micah Kornfield > > > > > > Sent: Tuesday, September 10, 2019 10:24 PM > > > > > > To: Wes McKinney > > > > > > Cc: dev ; niki.lj > > > > > > Subject: Re: Timeline for 0.15.0 release > > > > > > > > > > > > I should have a little more bandwidth to help with some of the > > > > packaging starting tomorrow and going into the weekend. > > > > > > > > > > > > On Tuesday, September 10, 2019, Wes McKinney < > wesmck...@gmail.com> > > > > wrote: > > > > > > > > > > > > > Hi folks, > > > > > > > > > > > > > > With the state of nightly packaging and integration builds > things > > > > > > > aren't looking too good for being in release readiness by the > end > > > of > > > > > > > this week but maybe I'm wrong. I'm planning to be working to > close > > > as > > > > > > > many issues as I can and also to help with the ongoing > alignment > > > > fixes. 
> > > > > > > > > > > > > > Wes > > > > > > > > > > > > > > On Thu, Sep 5, 2019, 11:07 PM Micah Kornfield < > > > emkornfi...@gmail.com > > > > > > > > > > > > wrote: > > > > > > > > > > > > > >> Just for reference [1] has a dashboard of the current issues: > > > > > > >> > > > > > > >> > > > > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwi > > > > > > >> ki.apache.org > > > > %2Fconfluence%2Fdisplay%2FARROW%2FArrow%2B0.15.0%2BRelea > > > > > > >> se&data=02%7C01%7CEric.Erhardt%40microsoft.com > > > > %7Ccbead81a42104034 > > > > > > >> > > > > a4f308d736678a45%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6370376 > > > > > > >> > > > > 90648216338&sdata=0Upux3i%2B9X6f8uanGKSGM5VYxR6c2ADWrxSPi1%2FgbH4 > > > > > > >> %3D&reserved=0 > > > > > > >> > > > > > > >> On Thu, Sep 5, 2019 at 3:43 PM Wes McKinney < > wesmck...@gmail.com> > > > > wrote: > > > > > > >> > > > > > > >>> hi all, > > > > > > >>> > > > > > > >>> It doesn't seem like we're going to be in a position to > release > > > at > > > > > > >>> the beginning of next week. I hope that one more week of > work (or > > > > > > >>> less) will be enough to get us there. Aside from merging the > > > > > > >>> alignme
Draft blog post for 0.15 release
Hi all, In preparation for next week, I've started a release announcement blog post here: https://github.com/apache/arrow-site/pull/27 Please fill in the parts you know best. Committers can just push edits to my branch; also feel free to reply to this thread with content, or email me directly, and I'll add it in for you. Neal
[jira] [Created] (ARROW-6616) [Website] Release announcement blog post for 0.15
Neal Richardson created ARROW-6616: -- Summary: [Website] Release announcement blog post for 0.15 Key: ARROW-6616 URL: https://issues.apache.org/jira/browse/ARROW-6616 Project: Apache Arrow Issue Type: Task Components: Website Reporter: Neal Richardson Assignee: Neal Richardson Fix For: 0.15.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6615) [C++] Add filtering option to fs::Selector
Francois Saint-Jacques created ARROW-6615: - Summary: [C++] Add filtering option to fs::Selector Key: ARROW-6615 URL: https://issues.apache.org/jira/browse/ARROW-6615 Project: Apache Arrow Issue Type: New Feature Reporter: Francois Saint-Jacques It would be convenient if Selector could support file path filtering, either via a regex or globbing applied to the path. This is semi-required for filtering files in Dataset to properly apply the file format. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6614) [C++][Dataset] Implement FileSystemDataSourceDiscovery
Francois Saint-Jacques created ARROW-6614: - Summary: [C++][Dataset] Implement FileSystemDataSourceDiscovery Key: ARROW-6614 URL: https://issues.apache.org/jira/browse/ARROW-6614 Project: Apache Arrow Issue Type: New Feature Reporter: Francois Saint-Jacques DataSourceDiscovery is what allows inferring a Schema and constructing a DataSource with a PartitionScheme. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6613) [C++] Remove dependency on boost::filesystem
Antoine Pitrou created ARROW-6613: - Summary: [C++] Remove dependency on boost::filesystem Key: ARROW-6613 URL: https://issues.apache.org/jira/browse/ARROW-6613 Project: Apache Arrow Issue Type: Wish Components: C++ Reporter: Antoine Pitrou Fix For: 1.0.0 See ARROW-2196 for details. boost::filesystem should not be required for base functionality at least (including filesystems, probably). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6612) [C++] Add ARROW_CSV CMake build flag
Wes McKinney created ARROW-6612: --- Summary: [C++] Add ARROW_CSV CMake build flag Key: ARROW-6612 URL: https://issues.apache.org/jira/browse/ARROW-6612 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Fix For: 1.0.0 I think it would be better to make building this part of the project optional rather than unconditional. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6611) [C++] Make ARROW_JSON=OFF the default
Wes McKinney created ARROW-6611: --- Summary: [C++] Make ARROW_JSON=OFF the default Key: ARROW-6611 URL: https://issues.apache.org/jira/browse/ARROW-6611 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Fix For: 1.0.0

The JSON-based functionality is only needed for:
* Integration tests
* Unit tests
* JSON scanning

If the user opts in to unit tests or integration tests, then we can flip it on, but I think that the user should opt in when building libarrow.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6610) [C++] Add ARROW_FILESYSTEM=ON/OFF CMake configuration flag
Wes McKinney created ARROW-6610: --- Summary: [C++] Add ARROW_FILESYSTEM=ON/OFF CMake configuration flag Key: ARROW-6610 URL: https://issues.apache.org/jira/browse/ARROW-6610 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Building this code should not be required in order to take advantage of the columnar core (memory allocation, data structures, IPC) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6609) [C++] Add minimal build Dockerfile example
Wes McKinney created ARROW-6609: --- Summary: [C++] Add minimal build Dockerfile example Key: ARROW-6609 URL: https://issues.apache.org/jira/browse/ARROW-6609 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Assignee: Wes McKinney Fix For: 0.15.0 This will also help developers test a minimal build configuration -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6608) [C++] Make default for ARROW_HDFS to be OFF
Wes McKinney created ARROW-6608: --- Summary: [C++] Make default for ARROW_HDFS to be OFF Key: ARROW-6608 URL: https://issues.apache.org/jira/browse/ARROW-6608 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Fix For: 1.0.0 This is one optional usage of {{boost::filesystem}} that could be eliminated from the simple "core" build -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6607) Support for set/list columns in python
Giora Simchoni created ARROW-6607: - Summary: Support for set/list columns in python Key: ARROW-6607 URL: https://issues.apache.org/jira/browse/ARROW-6607 Project: Apache Arrow Issue Type: Wish Components: Python Environment: python 3.6.7, pandas 0.24.2, pyarrow 0.14.1 on WSL in Windows 10 Reporter: Giora Simchoni

Hi,

Using python 3.6.7, pandas 0.24.2, pyarrow 0.14.1 on WSL in Windows 10...

```python
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [set([1, 2]), set([2, 3]), set([3, 4, 5])]})
df.to_feather('test.ft')
```

I get:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gioras/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2131, in to_feather
    to_feather(self, fname)
  File "/home/gioras/.local/lib/python3.6/site-packages/pandas/io/feather_format.py", line 83, in to_feather
    feather.write_feather(df, path)
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 182, in write_feather
    writer.write(df)
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 93, in write
    table = Table.from_pandas(df, preserve_index=False)
  File "pyarrow/table.pxi", line 1174, in pyarrow.lib.Table.from_pandas
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 496, in dataframe_to_arrays
    for c, f in zip(columns_to_convert, convert_fields)]
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 496, in <listcomp>
    for c, f in zip(columns_to_convert, convert_fields)]
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 487, in convert_column
    raise e
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 481, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 191, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 78, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ('Could not convert {1, 2} with type set: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column b with type object')
```

And obviously `df.drop('b', axis=1).to_feather('test.ft')` works.

Questions:
(1) Is it possible to support these kinds of set/list columns?
(2) Does anyone have an idea of how to deal with this? I *cannot* unnest these set/list columns as this would explode the DataFrame. My only other idea is to convert a set like `{1,2}` into a string `1,2` and parse it after reading the file, hoping that won't be slow.

Update: with a list column the error is different:

```python
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [[1, 2], [2, 3], [3, 4, 5]]})
df.to_feather('test.ft')
```

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gioras/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2131, in to_feather
    to_feather(self, fname)
  File "/home/gioras/.local/lib/python3.6/site-packages/pandas/io/feather_format.py", line 83, in to_feather
    feather.write_feather(df, path)
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 182, in write_feather
    writer.write(df)
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 97, in write
    self.writer.write_array(name, col.data.chunk(0))
  File "pyarrow/feather.pxi", line 67, in pyarrow.lib.FeatherWriter.write_array
  File "pyarrow/error.pxi", line 93, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: list
```

-- This message was sent by Atlassian Jira (v8.3.4#803005)
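A minimal sketch of the string workaround the reporter describes (this only illustrates the round-trip idea, it is not a pyarrow feature; with pandas you would apply these helpers via `df['b'].map(encode_set)` before writing and the inverse after reading):

```python
def encode_set(values):
    # Serialize a set of ints into a deterministic, comma-joined string.
    return ",".join(str(v) for v in sorted(values))

def decode_set(text):
    # Inverse of encode_set: parse the string back into a set of ints.
    return set(int(v) for v in text.split(","))

# Round-trips losslessly for sets of ints:
assert decode_set(encode_set({3, 4, 5})) == {3, 4, 5}
```

The obvious caveat is the per-row Python-level conversion cost, which is exactly the slowness the reporter worries about.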
[DISCUSS] C-level in-process array protocol
Hello,

One thing that was discussed in the sync call is the ability to easily pass arrays at runtime between Arrow implementations or Arrow-supporting libraries in the same process, without bearing the cost of linking to e.g. the C++ Arrow library. (For example: "Duckdb wants to provide an option to return Arrow data of result sets, but they don't like having Arrow as a dependency.")

One possibility would be to define a C-level protocol similar in spirit to the Python buffer protocol, which some people may be familiar with (*). The basic idea is to define a simple C struct, which is ABI-stable and describes an Arrow array adequately. The struct can be stack-allocated. Its definition can also be copied into another project (or interfaced with using a C FFI layer, depending on the language).

There is no formal proposal; this message is meant to stir the discussion.

Issues to work out:

* Memory lifetime issues: where Python simply associates the Py_buffer with a PyObject owner (a garbage-collected Python object), we need another means to control the lifetime of the pointed-to areas. One simple possibility is to include a destructor function pointer in the protocol struct.

* Arrow type representation. We probably need some kind of "format" mini-language to represent Arrow types, so that a type can be described using a `const char*`. Ideally, primitive types at least should be trivially parsable. We may take inspiration from Python here (`struct` module format characters, PEP 3118 format additions).

Example C struct definition (not a formal proposal!):

struct ArrowBuffer {
  void* data;
  int64_t nbytes;
  // Called by the consumer when it doesn't need the buffer anymore
  void (*release)(struct ArrowBuffer*);
  // Opaque user data (for e.g. the release callback)
  void* user_data;
};

struct ArrowArray {
  // Type description
  const char* format;
  // Data description
  int64_t length;
  int64_t null_count;
  int64_t n_buffers;
  // Note: these pointers are probably owned by the ArrowArray struct
  // and will be released and free()ed by the release callback.
  struct ArrowBuffer* buffers;
  struct ArrowArray* dictionary;
  // Called by the consumer when it doesn't need the array anymore
  void (*release)(struct ArrowArray*);
  // Opaque user data (for e.g. the release callback)
  void* user_data;
};

Thoughts?

(*) For the record, the reference for the Python buffer protocol:
https://docs.python.org/3/c-api/buffer.html#buffer-structure
and its C struct definition:
https://github.com/python/cpython/blob/v3.7.4/Include/object.h#L181-L195

Regards

Antoine.
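To make the lifetime question concrete, here is a small sketch (my own illustration, not part of the proposal above; all names are hypothetical) of how a producer could export a heap-owned buffer with a release callback, and how a consumer would dispose of it without knowing which allocator the producer used:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical struct, following the shape sketched in the message above. */
struct ArrowBufferSketch {
    void* data;
    int64_t nbytes;
    /* Called by the consumer when it doesn't need the buffer anymore. */
    void (*release)(struct ArrowBufferSketch*);
    void* user_data;
};

/* Producer-side release callback: frees the heap data and marks the
 * struct as released by nulling its fields, so a double release is
 * detectable by the consumer. */
static void release_heap_buffer(struct ArrowBufferSketch* buf) {
    free(buf->data);
    buf->data = NULL;
    buf->release = NULL;
}

/* Producer exports a buffer of `nbytes` zero-initialized bytes. The
 * struct itself can live on the consumer's stack; only `data` is owned
 * by the producer's allocator. */
static struct ArrowBufferSketch export_buffer(int64_t nbytes) {
    struct ArrowBufferSketch buf;
    buf.data = calloc(1, (size_t)nbytes);
    buf.nbytes = nbytes;
    buf.release = release_heap_buffer;
    buf.user_data = NULL;
    return buf;
}
```

A consumer receiving this struct across a shared-library boundary would read `data`/`nbytes` and then call `buf.release(&buf)` exactly once, which is the ownership transfer the destructor-pointer idea is meant to provide.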
[jira] [Created] (ARROW-6606) [C++] Construct tree structure from std::vector
Francois Saint-Jacques created ARROW-6606: - Summary: [C++] Construct tree structure from std::vector Key: ARROW-6606 URL: https://issues.apache.org/jira/browse/ARROW-6606 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques This will be used by FileSystemDataSource for pushdown predicate pruning of branches. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6605) [C++] Add recursion depth control to fs::Selector
Francois Saint-Jacques created ARROW-6605: - Summary: [C++] Add recursion depth control to fs::Selector Key: ARROW-6605 URL: https://issues.apache.org/jira/browse/ARROW-6605 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques This is similar to the recursive option, but also controls the depth. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6604) [C++] Add support for nested types to MakeArrayFromScalar
Benjamin Kietzman created ARROW-6604: Summary: [C++] Add support for nested types to MakeArrayFromScalar Key: ARROW-6604 URL: https://issues.apache.org/jira/browse/ARROW-6604 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Benjamin Kietzman Assignee: Benjamin Kietzman At the same time move MakeArrayFromScalar and MakeArrayOfNull under src/arrow/array/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6603) [C#] ArrayBuilder API to support writing nulls
Eric Erhardt created ARROW-6603: --- Summary: [C#] ArrayBuilder API to support writing nulls Key: ARROW-6603 URL: https://issues.apache.org/jira/browse/ARROW-6603 Project: Apache Arrow Issue Type: Improvement Components: C# Reporter: Eric Erhardt There is currently no API in the PrimitiveArrayBuilder class to support writing nulls. See this TODO - [https://github.com/apache/arrow/blob/1515fe10c039fb6685df2e282e2e888b773caa86/csharp/src/Apache.Arrow/Arrays/PrimitiveArrayBuilder.cs#L101]. Also see [https://github.com/apache/arrow/issues/5381]. We should add some APIs to support writing nulls. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Timeline for 0.15.0 release
The process should be well documented at this point but there are a number of steps. Note that you need to add your code signing key to the KEYS file in SVN (that's not very hard to do). I think it's fine to hand off the process to others after the VOTE but it would be tricky to have multiple RMs involved with producing the source and binary artifacts for the vote On Tue, Sep 17, 2019 at 10:55 PM Micah Kornfield wrote: > > SGTM, as well. > > I should have a little bit of time next week if I can help as RM but I have > a couple of concerns: > 1. In the past I've had trouble downloading and validating releases. I'm a > bit worried, that I might have similar problems doing the necessary uploads. > 2. My internet connection will likely be not great, I don't know if this > would make it even less likely to be successful. > > Does it become problematic if somehow I would have to abandon the process > mid-release? Is there anyone who could serve as a backup? Are the steps > well documented? > > Thanks, > Micah > > On Tue, Sep 17, 2019 at 4:25 PM Neal Richardson > wrote: > > > Sounds good to me. > > > > Do we have a release manager yet? Any volunteers? > > > > Neal > > > > On Tue, Sep 17, 2019 at 4:06 PM Wes McKinney wrote: > > > > > hi all, > > > > > > It looks like we're drawing close to be able to make the 0.15.0 > > > release. I would suggest "pencils down" at the end of this week and > > > see if a release candidate can be produced next Monday September 23. > > > Any thoughts or objections? > > > > > > Thanks, > > > Wes > > > > > > On Wed, Sep 11, 2019 at 11:23 AM Wes McKinney > > wrote: > > > > > > > > hi Eric -- yes, that's correct. I'm planning to amend the Format docs > > > > today regarding the EOS issue and also update the C++ library > > > > > > > > On Wed, Sep 11, 2019 at 11:21 AM Eric Erhardt > > > > wrote: > > > > > > > > > > I assume the plan is to merge the ARROW-6313-flatbuffer-alignment > > > branch into master before the 0.15 release, correct? 
> > > > > > > > > > BTW - I believe the C# alignment changes are ready to be merged into > > > the alignment branch - https://github.com/apache/arrow/pull/5280/ > > > > > > > > > > Eric > > > > > > > > > > -Original Message- > > > > > From: Micah Kornfield > > > > > Sent: Tuesday, September 10, 2019 10:24 PM > > > > > To: Wes McKinney > > > > > Cc: dev ; niki.lj > > > > > Subject: Re: Timeline for 0.15.0 release > > > > > > > > > > I should have a little more bandwidth to help with some of the > > > packaging starting tomorrow and going into the weekend. > > > > > > > > > > On Tuesday, September 10, 2019, Wes McKinney > > > wrote: > > > > > > > > > > > Hi folks, > > > > > > > > > > > > With the state of nightly packaging and integration builds things > > > > > > aren't looking too good for being in release readiness by the end > > of > > > > > > this week but maybe I'm wrong. I'm planning to be working to close > > as > > > > > > many issues as I can and also to help with the ongoing alignment > > > fixes. 
> > > > > > > > > > > > Wes > > > > > > > > > > > > On Thu, Sep 5, 2019, 11:07 PM Micah Kornfield < > > emkornfi...@gmail.com > > > > > > > > > > wrote: > > > > > > > > > > > >> Just for reference [1] has a dashboard of the current issues: > > > > > >> > > > > > >> > > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwi > > > > > >> ki.apache.org > > > %2Fconfluence%2Fdisplay%2FARROW%2FArrow%2B0.15.0%2BRelea > > > > > >> se&data=02%7C01%7CEric.Erhardt%40microsoft.com > > > %7Ccbead81a42104034 > > > > > >> > > > a4f308d736678a45%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6370376 > > > > > >> > > > 90648216338&sdata=0Upux3i%2B9X6f8uanGKSGM5VYxR6c2ADWrxSPi1%2FgbH4 > > > > > >> %3D&reserved=0 > > > > > >> > > > > > >> On Thu, Sep 5, 2019 at 3:43 PM Wes McKinney > > > wrote: > > > > > >> > > > > > >>> hi all, > > > > > >>> > > > > > >>> It doesn't seem like we're going to be in a position to release > > at > > > > > >>> the beginning of next week. I hope that one more week of work (or > > > > > >>> less) will be enough to get us there. Aside from merging the > > > > > >>> alignment changes, we need to make sure that our packaging jobs > > > > > >>> required for the release candidate are all working. > > > > > >>> > > > > > >>> If folks could remove issues from the 0.15.0 backlog that they > > > don't > > > > > >>> think they will finish by end of next week that would help focus > > > > > >>> efforts (there are currently 78 issues in 0.15.0 still). I am > > > > > >>> looking to tackle a few small features related to dictionaries > > > while > > > > > >>> the release window is still open. > > > > > >>> > > > > > >>> - Wes > > > > > >>> > > > > > >>> On Tue, Aug 27, 2019 at 3:48 PM Wes McKinney < > > wesmck...@gmail.com> > > > > > >>> wrote: > > > > > >>> > > > > > > >>> > hi, > > > > > >>> > > > > > > >>> > I think we should try to release the week of September 9, so > > > > > >>> > development work should be compl
[jira] [Created] (ARROW-6602) [Doc] Add feature / implementation matrix
Antoine Pitrou created ARROW-6602: - Summary: [Doc] Add feature / implementation matrix Key: ARROW-6602 URL: https://issues.apache.org/jira/browse/ARROW-6602 Project: Apache Arrow Issue Type: Improvement Components: Documentation Reporter: Antoine Pitrou Fix For: 1.0.0 We have many different implementations and each implementation makes a different set of features available. It would be nice to have a top-level doc page making it clear which implementation supports what. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Arrow sync call September 19 at 12:00 US/Eastern, 16:00 UTC
I'm unable to join today but hope that participants can review the active DISCUSS threads On Tue, Sep 17, 2019 at 11:28 PM Neal Richardson wrote: > > Hi all, > Belated reminder that the biweekly Arrow call is coming up in less than 12 > hours at https://meet.google.com/vtm-teks-phx. All are welcome to join. > Notes will be sent out to the mailing list afterwards. > > Neal
Re: [DISCUSS] Changing C++ build system default options to produce more barebones builds
To be clear, I think we should make these changes right after 0.15.0 is released so we aren't playing whack-a-mole with our packaging scripts. I'm happy to take the lead on the work...

On Wed, Sep 18, 2019 at 9:54 AM Antoine Pitrou wrote:
>
> On Wed, 18 Sep 2019 09:46:54 -0500
> Wes McKinney wrote:
> > I think these are both interesting areas to explore further. I'd like
> > to focus on the couple of immediate items I think we should address
> >
> > * Should optional components be "opt in", "opt out", or a mix?
> > Currently it's a mix, and that's confusing for people. I think we
> > should make them all "opt in".
> > * Do we want to bring the out-of-the-box core build down to zero
> > dependencies, including not depending on boost::filesystem and
> > possibly checking in the compiled Flatbuffers files. While it may be
> > slightly more maintenance work, I think the optics of a
> > "dependency-free" core build would be beneficial and help the project
> > marketing-wise.
> >
> > Both of these issues must be addressed whether we undertake a Bazel
> > implementation or some other refactor of the C++ build system.
>
> I think checking in the Flatbuffers files (and also Protobuf and Thrift
> where applicable :-)) would be fine.
>
> As for boost::filesystem, getting rid of it wouldn't be a huge task.
> Still worth deciding whether we want to prioritize development time for
> it, because it's not entirely trivial either.
>
> Regards
>
> Antoine.
[jira] [Created] (ARROW-6601) [Java] Add benchmark for JDBC adapter to avoid potential regression
Ji Liu created ARROW-6601: - Summary: [Java] Add benchmark for JDBC adapter to avoid potential regression Key: ARROW-6601 URL: https://issues.apache.org/jira/browse/ARROW-6601 Project: Apache Arrow Issue Type: Task Components: Java Reporter: Ji Liu Add a performance test as well to get a baseline number, to avoid performance regression when we change related code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Changing C++ build system default options to produce more barebones builds
On Wed, 18 Sep 2019 09:46:54 -0500 Wes McKinney wrote:
> I think these are both interesting areas to explore further. I'd like
> to focus on the couple of immediate items I think we should address
>
> * Should optional components be "opt in", "opt out", or a mix?
> Currently it's a mix, and that's confusing for people. I think we
> should make them all "opt in".
> * Do we want to bring the out-of-the-box core build down to zero
> dependencies, including not depending on boost::filesystem and
> possibly checking in the compiled Flatbuffers files. While it may be
> slightly more maintenance work, I think the optics of a
> "dependency-free" core build would be beneficial and help the project
> marketing-wise.
>
> Both of these issues must be addressed whether we undertake a Bazel
> implementation or some other refactor of the C++ build system.

I think checking in the Flatbuffers files (and also Protobuf and Thrift where applicable :-)) would be fine.

As for boost::filesystem, getting rid of it wouldn't be a huge task. Still worth deciding whether we want to prioritize development time for it, because it's not entirely trivial either.

Regards

Antoine.
[jira] [Created] (ARROW-6600) [Java] Implement dictionary-encoded subfields for Union type
Ji Liu created ARROW-6600: - Summary: [Java] Implement dictionary-encoded subfields for Union type Key: ARROW-6600 URL: https://issues.apache.org/jira/browse/ARROW-6600 Project: Apache Arrow Issue Type: Sub-task Components: Java Reporter: Ji Liu Assignee: Ji Liu Implement dictionary-encoded subfields for the {{Union}} type. Each child vector could be encodable or not. Meanwhile, extract common logic into {{DictionaryEncoder}}, and refactor List subfield encoding to keep it consistent with the {{Struct/Union}} types. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Changing C++ build system default options to produce more barebones builds
I think these are both interesting areas to explore further. I'd like to focus on the couple of immediate items I think we should address:

* Should optional components be "opt in", "opt out", or a mix? Currently it's a mix, and that's confusing for people. I think we should make them all "opt in".
* Do we want to bring the out-of-the-box core build down to zero dependencies, including not depending on boost::filesystem and possibly checking in the compiled Flatbuffers files? While it may be slightly more maintenance work, I think the optics of a "dependency-free" core build would be beneficial and help the project marketing-wise.

Both of these issues must be addressed whether we undertake a Bazel implementation or some other refactor of the C++ build system.

On Wed, Sep 18, 2019 at 2:48 AM Uwe L. Korn wrote:
>
> Hello Micah,
>
> I don't think we have explored using bazel yet. I would see it as a possible
> modular alternative but, as you mention, it will be a lot of work and we would
> probably need a mentor who is familiar with bazel; otherwise we will probably
> end up spending too much time on this and get a non-typical bazel setup.
>
> Uwe
>
> On Wed, Sep 18, 2019, at 8:44 AM, Micah Kornfield wrote:
> > It has come up in the past, but I wonder if exploring Bazel as a build
> > system, with its very explicit dependency graph, might help (I'm not sure
> > if something similar is available in CMake).
> >
> > This is also a lot of work, but it could also potentially benefit the
> > developer experience because we can make unit tests depend on individual
> > compilable units instead of all of libarrow. There are trade-offs here as
> > well in terms of public API coverage.
> >
> > On Tue, Sep 17, 2019 at 11:14 PM Uwe L. Korn wrote:
> > >
> > > Hello,
> > >
> > > I can think of two other alternatives that make it more visible what Arrow
> > > core is and what the optional components are:
> > >
> > > * Error out when no component is selected instead of building just the
> > > core Arrow. Here we could add an explanatory message that lists all
> > > components and, for each component, 2-3 words on what it does and what it
> > > requires. This would make the first-time experience much better.
> > > * Split the CMake project into several subprojects. By correctly
> > > structuring the CMakefiles, we should be able to separate out the Arrow
> > > components into separate CMake projects that can be built independently if
> > > needed while all using the same third-party toolchain. We would still have
> > > a top-level CMakeLists.txt that is invoked just like the current one, but
> > > through having subprojects you would no longer be bound to use the single
> > > top-level one. This would also have some benefit for packagers, who could
> > > separate out the build of individual Arrow modules. Furthermore, it would
> > > also make it easier for PoC/academic projects to just take the Arrow Core
> > > sources and drop them in as a CMake subproject; while this is not a good
> > > solution for production-grade software, it is quite common practice to do
> > > this in research.
> > >
> > > I really like this approach and I think this is something we should have
> > > as a long-term target. I'm also happy to implement it given the time, but I
> > > think one CMake refactor per year is the maximum I can do and that was
> > > already eaten up by the dependency detection. Also, I'm unsure about how
> > > much this would block us at the moment vs. the marketing benefit of having
> > > a more modular Arrow; currently I'm leaning toward the side that the
> > > marketing/adoption benefit would be much larger, but we lack someone
> > > frustration-tolerant enough to do the refactoring.
> > >
> > > Uwe
> > >
> > > On Wed, Sep 18, 2019, at 12:18 AM, Wes McKinney wrote:
> > > > hi folks,
> > > >
> > > > Lately there seem to be more and more people suggesting that the
> > > > optional components in the Arrow C++ project are getting in the way of
> > > > using the "core" which implements the columnar format and IPC
> > > > protocol. I am not sure I agree with this argument, but in general I
> > > > think it would be a good idea to make all optional components in the
> > > > project "opt in" rather than "opt out".
> > > >
> > > > To demonstrate where things currently stand, I created a Dockerfile to
> > > > try to make the smallest possible and most dependency-free build:
> > > >
> > > > https://github.com/wesm/arrow/tree/cpp-minimal-dockerfile/dev/cpp_minimal
> > > >
> > > > Here is the output of this build:
> > > >
> > > > https://gist.github.com/wesm/02328fbb463033ed486721b8265f755f
> > > >
> > > > First, let's look at the CMake invocation:
> > > >
> > > > cmake .. -DBOOST_SOURCE=BUNDLED \
> > > >     -DARROW_BOOST_USE_SHARED=OFF \
> > > >     -DARROW_COMPUTE=OFF \
> > > >     -DARROW_DATASET=OFF \
> > > >     -DARROW_JEMALLOC=OFF \
> > > >     -DARROW_JSON=ON \
> > > >     -DARROW_USE_GLOG=OFF \
> > > >     -DARROW_WITH_BZ2=OFF \
> > > >     -DARROW_WITH_ZLIB=OFF \
> > > >     -DARROW_W
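[Editor's note: Uwe's first alternative quoted above, erroring out when no component is selected, could be sketched in CMake along these lines. This is a hedged illustration; the component names, descriptions, and dependency notes are assumptions, not the project's actual build code.]

```cmake
# Sketch: fail fast with a component listing when nothing is selected.
# Component names and one-line descriptions are illustrative.
if(NOT (ARROW_COMPUTE OR ARROW_DATASET OR ARROW_IPC OR ARROW_JSON))
  message(FATAL_ERROR
    "No Arrow component selected. Enable at least one, e.g.:\n"
    "  -DARROW_COMPUTE=ON   analytical kernels\n"
    "  -DARROW_DATASET=ON   multi-file dataset scanning\n"
    "  -DARROW_IPC=ON       IPC protocol (requires Flatbuffers)\n"
    "  -DARROW_JSON=ON      JSON reader")
endif()
```

Compared to silently building only the core, this makes the available components discoverable on the very first `cmake ..` run.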
[jira] [Created] (ARROW-6599) Implement SUM aggregate expression
Andy Grove created ARROW-6599:
-----------------------------
Summary: Implement SUM aggregate expression
Key: ARROW-6599
URL: https://issues.apache.org/jira/browse/ARROW-6599
Project: Apache Arrow
Issue Type: Sub-task
Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
Fix For: 0.15.0

Implement the SUM aggregate function in the new physical query plan.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (ARROW-6598) [Java] Sort the code for ApproxEqualsVisitor
Liya Fan created ARROW-6598:
-----------------------------
Summary: [Java] Sort the code for ApproxEqualsVisitor
Key: ARROW-6598
URL: https://issues.apache.org/jira/browse/ARROW-6598
Project: Apache Arrow
Issue Type: Improvement
Components: Java
Reporter: Liya Fan
Assignee: Liya Fan

As a follow-up issue of ARROW-6458, we finalize the code for ApproxEqualsVisitor.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[NIGHTLY] Arrow Build Report for Job nightly-2019-09-18-0
Arrow Build Report for Job nightly-2019-09-18-0

All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0

Failed Tasks:
- docker-cpp-fuzzit:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-circle-docker-cpp-fuzzit
- docker-docs:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-circle-docker-docs
- docker-spark-integration:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-circle-docker-spark-integration
- conda-linux-gcc-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-azure-conda-linux-gcc-py36

Succeeded Tasks:
- ubuntu-disco:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-azure-ubuntu-disco
- docker-iwyu:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-circle-docker-iwyu
- docker-go:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-circle-docker-go
- debian-stretch:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-azure-debian-stretch
- ubuntu-xenial:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-azure-ubuntu-xenial
- conda-linux-gcc-py27:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-azure-conda-linux-gcc-py27
- conda-win-vs2015-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-azure-conda-win-vs2015-py37
- docker-cpp-release:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-circle-docker-cpp-release
- conda-osx-clang-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-azure-conda-osx-clang-py37
- wheel-manylinux2010-cp37m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-travis-wheel-manylinux2010-cp37m
- docker-dask-integration:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-circle-docker-dask-integration
- wheel-manylinux1-cp27m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-travis-wheel-manylinux1-cp27m
- wheel-manylinux1-cp36m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-travis-wheel-manylinux1-cp36m
- wheel-win-cp35m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-appveyor-wheel-win-cp35m
- docker-r:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-circle-docker-r
- docker-js:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-circle-docker-js
- docker-c_glib:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-circle-docker-c_glib
- docker-python-3.6-nopandas:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-circle-docker-python-3.6-nopandas
- centos-7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-azure-centos-7
- centos-6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-azure-centos-6
- docker-python-2.7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-circle-docker-python-2.7
- wheel-manylinux2010-cp36m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-travis-wheel-manylinux2010-cp36m
- conda-osx-clang-py27:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-azure-conda-osx-clang-py27
- wheel-manylinux2010-cp35m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-travis-wheel-manylinux2010-cp35m
- docker-python-3.7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-circle-docker-python-3.7
- wheel-win-cp36m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-appveyor-wheel-win-cp36m
- docker-cpp-cmake32:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-circle-docker-cpp-cmake32
- docker-pandas-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-circle-docker-pandas-master
- wheel-manylinux2010-cp27mu:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-travis-wheel-manylinux2010-cp27mu
- gandiva-jar-trusty:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-travis-gandiva-jar-trusty
- conda-osx-clang-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-azure-conda-osx-clang-py36
- conda-win-vs2015-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-18-0-azure-conda-win-vs2015-py36
- wheel-manylinux1-cp37m:
  URL: https://g
[jira] [Created] (ARROW-6597) [Python] Segfault in test_pandas with Python 2.7
Antoine Pitrou created ARROW-6597:
-----------------------------
Summary: [Python] Segfault in test_pandas with Python 2.7
Key: ARROW-6597
URL: https://issues.apache.org/jira/browse/ARROW-6597
Project: Apache Arrow
Issue Type: Bug
Components: Python
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou

I get a segfault in test_pandas with Python 2.7. gdb stack trace (excerpt):

{code}
Thread 27 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffb7fff700 (LWP 17725)]
0x7fffcac1a9f9 in arrow::py::internal::PyDate_from_int (val=10957, unit=arrow::DateUnit::DAY, out=0x55e1b9b0)
    at ../src/arrow/python/datetime.cc:229
229         *out = PyDate_FromDate(static_cast(year), static_cast(month),
(gdb) bt
#0  0x7fffcac1a9f9 in arrow::py::internal::PyDate_from_int (val=10957, unit=arrow::DateUnit::DAY, out=0x55e1b9b0)
    at ../src/arrow/python/datetime.cc:229
#1  0x7fffcabaed34 in arrow::Status arrow::py::ConvertDates(arrow::py::PandasOptions const&, arrow::ChunkedArray const&, _object**)::{lambda(int, _object**)#1}::operator()(int, _object**) const (this=0x7fffb7ffde90, value=10957, out=0x55e1b9b0)
    at ../src/arrow/python/arrow_to_pandas.cc:657
#2  0x7fffcabaeb8c in arrow::Status arrow::py::ConvertAsPyObjects(arrow::py::PandasOptions const&, arrow::ChunkedArray const&, _object**)::{lambda(int, _object**)#1}&>(arrow::py::PandasOptions const&, arrow::ChunkedArray const&, arrow::Status arrow::py::ConvertDates(arrow::py::PandasOptions const&, arrow::ChunkedArray const&, _object**)::{lambda(int, _object**)#1}&, _object**)::{lambda(int const&, _object**)#1}::operator()(int const, _object**) const (this=0x7fffb7ffdd88, value=@0x7fffb7ffdcbc: 10957, out_values=0x55e1b9b0)
    at ../src/arrow/python/arrow_to_pandas.cc:417
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
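[Editor's note: for context on the crashing frame, with DateUnit::DAY the function converts a days-since-epoch count into a Python date object, and the val=10957 seen above corresponds to 2000-01-01. A plain-Python sketch of that conversion, not Arrow's actual implementation (which does the arithmetic in C++ and calls the CPython date API directly):]

```python
from datetime import date, timedelta

# Arrow's Date32 type stores days since the Unix epoch (1970-01-01).
# The crashing frame above is converting val=10957 with DateUnit::DAY.
EPOCH = date(1970, 1, 1)

def date_from_days(val: int) -> date:
    """Plain-Python equivalent of the days-to-date conversion."""
    return EPOCH + timedelta(days=val)

print(date_from_days(10957))  # 2000-01-01
```

The crash itself is therefore not in the arithmetic but in the call into the CPython date constructor, which points at an interpreter-state or GIL issue under Python 2.7 rather than a bad input value.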
Re: How can I help?
Hi Weston,

Documenting your use cases would be a great help, imo. If that's open, I am interested in helping with it. I am looking to build some advanced POCs. Please advise.

Thanks
Nick
https://twitter.com/semanticbeeng

On Tue, Sep 17, 2019 at 6:10 PM Weston Platter wrote:
> Hey there,
>
> I've had huge success using Arrow in production at my last couple jobs,
> and wanted to ask how I can help and give back.
>
> On twitter, Wes mentioned that there's some work to be done with python
> packaging and wheels (
> https://twitter.com/wesmckinn/status/1174071228253929472). I've got some
> free time in the next 2-3 weeks and would be happy to pitch in where it
> makes the most sense. Are there a few small tasks I could get started on?
>
> Weston
Re: [DISCUSS] Changing C++ build system default options to produce more barebones builds
Hello Micah,

I don't think we have explored using bazel yet. I would see it as a possible modular alternative but, as you mention, it will be a lot of work and we would probably need a mentor who is familiar with bazel; otherwise we will probably end up spending too much time on this and get a non-typical bazel setup.

Uwe

On Wed, Sep 18, 2019, at 8:44 AM, Micah Kornfield wrote:
> It has come up in the past, but I wonder if exploring Bazel as a build
> system, with its very explicit dependency graph, might help (I'm not sure
> if something similar is available in CMake).
>
> This is also a lot of work, but it could also potentially benefit the
> developer experience because we can make unit tests depend on individual
> compilable units instead of all of libarrow. There are trade-offs here as
> well in terms of public API coverage.
>
> On Tue, Sep 17, 2019 at 11:14 PM Uwe L. Korn wrote:
> >
> > Hello,
> >
> > I can think of two other alternatives that make it more visible what Arrow
> > core is and what the optional components are:
> >
> > * Error out when no component is selected instead of building just the
> > core Arrow. Here we could add an explanatory message that lists all
> > components and, for each component, 2-3 words on what it does and what it
> > requires. This would make the first-time experience much better.
> > * Split the CMake project into several subprojects. By correctly
> > structuring the CMakefiles, we should be able to separate out the Arrow
> > components into separate CMake projects that can be built independently if
> > needed while all using the same third-party toolchain. We would still have
> > a top-level CMakeLists.txt that is invoked just like the current one, but
> > through having subprojects you would no longer be bound to use the single
> > top-level one. This would also have some benefit for packagers, who could
> > separate out the build of individual Arrow modules. Furthermore, it would
> > also make it easier for PoC/academic projects to just take the Arrow Core
> > sources and drop them in as a CMake subproject; while this is not a good
> > solution for production-grade software, it is quite common practice to do
> > this in research.
> >
> > I really like this approach and I think this is something we should have
> > as a long-term target. I'm also happy to implement it given the time, but I
> > think one CMake refactor per year is the maximum I can do and that was
> > already eaten up by the dependency detection. Also, I'm unsure about how
> > much this would block us at the moment vs. the marketing benefit of having
> > a more modular Arrow; currently I'm leaning toward the side that the
> > marketing/adoption benefit would be much larger, but we lack someone
> > frustration-tolerant enough to do the refactoring.
> >
> > Uwe
> >
> > On Wed, Sep 18, 2019, at 12:18 AM, Wes McKinney wrote:
> > > hi folks,
> > >
> > > Lately there seem to be more and more people suggesting that the
> > > optional components in the Arrow C++ project are getting in the way of
> > > using the "core" which implements the columnar format and IPC
> > > protocol. I am not sure I agree with this argument, but in general I
> > > think it would be a good idea to make all optional components in the
> > > project "opt in" rather than "opt out".
> > >
> > > To demonstrate where things currently stand, I created a Dockerfile to
> > > try to make the smallest possible and most dependency-free build:
> > >
> > > https://github.com/wesm/arrow/tree/cpp-minimal-dockerfile/dev/cpp_minimal
> > >
> > > Here is the output of this build:
> > >
> > > https://gist.github.com/wesm/02328fbb463033ed486721b8265f755f
> > >
> > > First, let's look at the CMake invocation:
> > >
> > > cmake .. -DBOOST_SOURCE=BUNDLED \
> > >     -DARROW_BOOST_USE_SHARED=OFF \
> > >     -DARROW_COMPUTE=OFF \
> > >     -DARROW_DATASET=OFF \
> > >     -DARROW_JEMALLOC=OFF \
> > >     -DARROW_JSON=ON \
> > >     -DARROW_USE_GLOG=OFF \
> > >     -DARROW_WITH_BZ2=OFF \
> > >     -DARROW_WITH_ZLIB=OFF \
> > >     -DARROW_WITH_ZSTD=OFF \
> > >     -DARROW_WITH_LZ4=OFF \
> > >     -DARROW_WITH_SNAPPY=OFF \
> > >     -DARROW_WITH_BROTLI=OFF \
> > >     -DARROW_BUILD_UTILITIES=OFF
> > >
> > > Aside from the issue of how to obtain and link Boost, here's a couple of
> > > things:
> > >
> > > * COMPUTE and DATASET IMHO should be off by default
> > > * All compression libraries should be turned off
> > > * GLOG should be off by default
> > > * Utilities should be off (they are used for integration testing)
> > > * Jemalloc should probably be off, but we should make it clear that
> > > opting in will yield better performance
> > >
> > > I found that it wasn't possible to set ARROW_JSON=OFF without breaking
> > > the build. I opened ARROW-6590 to fix this.
> > >
> > > Aside from potentially changing these defaults, there's some things in
> > > the build that we might want to turn into optional pieces:
> > >
> > > * We should see if we can make boost::filesystem not mandatory in the
> > > barebones build, if only to satisfy the pean
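[Editor's note: Uwe's second alternative, splitting the CMake project into several subprojects (quoted above), could be laid out roughly as follows. The directory names, options, and dependency notes are illustrative assumptions, not Arrow's actual source tree.]

```cmake
# Sketch of a top-level umbrella file over independent subprojects.
# Layout and names are illustrative, not Arrow's actual tree.
cmake_minimum_required(VERSION 3.2)
project(arrow-umbrella CXX)

# The dependency-free core is always built.
add_subdirectory(core)

# Each optional component is its own CMake project that could also
# be configured standalone against an installed core.
if(ARROW_IPC)
  add_subdirectory(ipc)         # needs Flatbuffers
endif()
if(ARROW_FILESYSTEM)
  add_subdirectory(filesystem)  # currently needs boost::filesystem
endif()
```

This layout would also let packagers and PoC projects consume individual components (e.g. only `core`) without invoking the full top-level build.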