[jira] [Created] (ARROW-4851) [Java] BoundsChecking.java defaulting behavior for old drill parameter seems off
Micah Kornfield created ARROW-4851: -- Summary: [Java] BoundsChecking.java defaulting behavior for old drill parameter seems off Key: ARROW-4851 URL: https://issues.apache.org/jira/browse/ARROW-4851 Project: Apache Arrow Issue Type: Bug Reporter: Micah Kornfield In order to turn bounds checking off you still need to flip the old deprecated parameter. We should probably change it to true by default, so only the new arrow parameter is used. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Depending on non-released Apache projects (C++ Avro)
Thanks. This work has been pushed off a bit because I need to get existing PRs into better shape. Hopefully 1.9 is released when I pick it up (otherwise I would lean towards forking for the time being as well). On Tue, Mar 12, 2019 at 3:01 AM Uwe L. Korn wrote: > Hello Micah, > > > Uwe, I'm not sure I understand what type of support/help you are thinking > > of. Could you elaborate a little bit more before I reach out? > > I would help them with the same build system improvement we have done in > the recent time (and are currently) doing in Arrow for C++. Nothing I > would explicitly advertise on their ML, only as a heads up if there are > Avro people on the Arrow ML. I cannot give any commitments on this but this > would be probably some high gain for not so much work if that is currently > hindering releases or adoption. > > Uwe > > > > > -Micah > > > > On Tue, Mar 5, 2019 at 4:53 PM Wes McKinney wrote: > > > > > I am OK with that, but if we find ourselves making compromises that > > > affect performance or memory efficiency (where possibly invasive > > > refactoring may be required) perhaps we should reconsider option #3. > > > > > > On Tue, Mar 5, 2019 at 11:29 AM Uwe L. Korn wrote: > > > > > > > > I'm leaning a bit towards 1) but I would love to get some input from > the > > > Avro community as 1) depends also on their side as we will submit some > > > patches upstream that need to be reviewed and someday also released. > > > > > > > > Are AVRO committers subscribed here or should we reach out to them on > > > their ML? Given that we are quite active in the C++ space currently, I > feel > > > that we can contribute quite some infrastructure in building and > packaging > > > that we do eitherway for Arrow. This might be quite helpful for a > project. > > > We have seen with Parquet where much of the development is just > happening > > > as it is part of Arrow. 
(Not suggesting to merge/fork the Avro > codebase but > > > just to apply some of the best practices we learned while building > Arrow). > > > > > > > > Uwe > > > > > > > > On Tue, Mar 5, 2019, at 4:57 PM, Wes McKinney wrote: > > > > > I'd be +0.5 in favor of forking in this particular case. Since > Avro is > > > > > not vectorized (unlike Parquet and ORC) I suspect it may be more > > > > > difficult to get the best performance using a general purpose API > > > > > versus one that is more specialized to producing Arrow record > batches. > > > > > Given that has been relatively light C++ development activity in > > > > > Apache Avro and no releases for 2 years it does give me pause. > > > > > > > > > > We might want to look at Impala's Avro scanner, they are doing some > > > > > LLVM IR cross-compilation also (they're using the Avro C++ library > > > > > though) > > > > > > > > > > > > > > https://github.com/apache/impala/blob/master/be/src/exec/hdfs-avro-scanner-ir.cc > > > > > > > > > https://github.com/apache/impala/blob/master/be/src/exec/hdfs-avro-scanner.cc > > > > > > > > > > On Tue, Mar 5, 2019 at 1:01 AM Micah Kornfield < > emkornfi...@gmail.com> > > > wrote: > > > > > > > > > > > > I'm looking at incorporating Avro in Arrow C++ [1]. It seems > that > > > the Avro > > > > > > C++ library APIs have improved from the last release. However, > it > > > is not > > > > > > clear when a new release will be available (I asked on the JIRA > > > Item for > > > > > > the next release [2] and received no response). > > > > > > > > > > > > I was wondering if there is a policy governing using other Apache > > > projects > > > > > > or how people felt about the following options: > > > > > > 1. Depend on a specific git commit through the third-party > library > > > system. > > > > > > 2. Copy the necessary source code temporarily to our project, > and > > > change > > > > > > to using the next release when it is available. > > > > > > 3. 
Fork the code we need (the main benefit I see here is being > able > > > to > > > > > > refactor it to avoid having to deal with exceptions, easier > > > integration > > > > > > with our IO system and one less 3rd party dependency to deal > with). > > > > > > 4. Wait on the 1.9 release before proceeding. > > > > > > > > > > > > Thanks, > > > > > > Micah > > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/ARROW-1209 > > > > > > [2] https://issues.apache.org/jira/browse/AVRO-2250 > > > > > > > > > > >
[jira] [Created] (ARROW-4850) [CI] Integration test failures do not fail the Travis CI build
Wes McKinney created ARROW-4850: --- Summary: [CI] Integration test failures do not fail the Travis CI build Key: ARROW-4850 URL: https://issues.apache.org/jira/browse/ARROW-4850 Project: Apache Arrow Issue Type: Bug Components: Continuous Integration Reporter: Wes McKinney Fix For: 0.13.0 See https://github.com/apache/arrow/pull/3871 These changes fail the build, but it is reported as a success. The errors can be seen in https://travis-ci.org/apache/arrow/jobs/505028161
[jira] [Created] (ARROW-4849) [C++] Add docker-compose entry for testing Ubuntu Bionic build with system packages
Uwe L. Korn created ARROW-4849: -- Summary: [C++] Add docker-compose entry for testing Ubuntu Bionic build with system packages Key: ARROW-4849 URL: https://issues.apache.org/jira/browse/ARROW-4849 Project: Apache Arrow Issue Type: Improvement Components: C++, Packaging Reporter: Uwe L. Korn Assignee: Uwe L. Korn Fix For: 0.13.0 To better support people on Ubuntu and also show the missing things to get Arrow packaged into Fedora, add an entry to the docker-compose.yml that builds on Ubuntu.
[jira] [Created] (ARROW-4848) Static libparquet not compiled with -DARROW_STATIC on Windows
Jeroen created ARROW-4848: - Summary: Static libparquet not compiled with -DARROW_STATIC on Windows Key: ARROW-4848 URL: https://issues.apache.org/jira/browse/ARROW-4848 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 0.12.1 Reporter: Jeroen When trying to link the R bindings against static libparquet.a + libarrow.a we get a lot of missing arrow symbol warnings from libparquet.a. I think the problem is that libparquet.a was not compiled with -DARROW_STATIC, and therefore cannot be linked against libarrow.a. When arrow cmake is configured with -DARROW_BUILD_SHARED=OFF I think it should automatically use -DARROW_STATIC when compiling libparquet on Windows?
[jira] [Created] (ARROW-4847) [Python] Add pyarrow.table factory function that dispatches to various ctors based on type of input
Wes McKinney created ARROW-4847: --- Summary: [Python] Add pyarrow.table factory function that dispatches to various ctors based on type of input Key: ARROW-4847 URL: https://issues.apache.org/jira/browse/ARROW-4847 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Wes McKinney Assignee: Wes McKinney Fix For: 0.13.0 For example, in {{pyarrow.table(df)}} if {{df}} is a {{pandas.DataFrame}}, then table will dispatch to {{pa.Table.from_pandas}}
[jira] [Created] (ARROW-4846) [Java] Update Jackson to 2.9.8
Wes McKinney created ARROW-4846: --- Summary: [Java] Update Jackson to 2.9.8 Key: ARROW-4846 URL: https://issues.apache.org/jira/browse/ARROW-4846 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Wes McKinney Assignee: Andy Grove Fix For: 0.13.0 We are looking at removing Jackson from arrow-vector dependencies in ARROW-2501
Re: MATLAB, Arrow, ABI's and Linux
Hello Joris, '/data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64': /data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64: undefined symbol: _ZNK5arrow6Status8ToStringEv. This sounds like fromArrowStream.mexa64 was still compiled using -D_GLIBCXX_USE_CXX11_ABI=0. You might try to explicitly pass -D_GLIBCXX_USE_CXX11_ABI=1 or, when building a conda environment, be sure to have the compilers package installed and use the $CC and $CXX environment variables to pick the right compilers. You may also need to LD_PRELOAD the libstdc++.so that is coming with conda and not the one coming from the system. Anaconda/defaults and conda-forge based Arrow packages are both nowadays built with the new ABI. But they are built with slightly different toolchains, so it is best to only install packages from one of the two repositories and don't mix them. Uwe On Tue, Mar 12, 2019, at 5:32 PM, Wes McKinney wrote: > hi Joris, > > You probably ran into the conda-forge compiler migration. I'm not sure > about Anaconda's Apache Arrow libraries since they maintain those > recipes. > > If you need shared libraries using the gcc 4.x ABI you may have to > build them yourself, or use the Linux packages for the platform > where you are working. It would be useful to have a Dockerfile that > produces "portable" shared libraries with the RedHat devtoolset-2 > compiler > > - Wes > > On Tue, Mar 12, 2019 at 11:22 AM Joris Peeters > wrote: > > > > [A] Short background: > > > > We are working on a MEX library that converts a binary array (representing > > an Arrow stream or file) into MATLAB structs. This is in > > parallel/complement to what already exists in the main Arrow project, which > > focuses on feather, but the hope is certainly to contribute back some of > > this work. We talked to MathWorks about this already (as GSA Capital). > > > > The library (e.g. 
fromArrowStream.mexa64) gets published on > > (company-internal) Anaconda, and upon installation the dependencies on > > arrow-cpp, boost-cpp etc are resolved (from remote channels). All .so's end > > up in a user-local conda environment's ../lib, which in MATLAB we make > > available through addpath. Compilation uses -D_GLIBCXX_USE_CXX11_ABI=0. > > > > [B] The issue I'm facing ... > > > > For quite a while (when we depended on arrow-cpp=0.10 from conda-forge) > > this has worked fine, but lately I've encountered increasing issues wrt ABI > > compatibility, amongst arrow, gtest (both at build time) and MATLAB (at > > run-time). > > > > Is arrow-cpp 0.11 only built with the new ABI? I loaded it from both > > defaults and conda-forge channel, and it seems different in this regard > > than conda-forge's 0.10. Either way, I'm now attempting to built my library > > without the -D_GLIBCXX_USE_CXX11_ABI=0 compile flag, as that seems to be > > the more sustainable way forward. > > > > Question: is it possible to load a MEX library that has been compiled with > > the new ABI? When doing this naively, I get an error like: > > > > Invalid MEX-file > > '/data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64': > > /data/home/jpeeter/apps/matlab/MATLAB/R2018a/bin/glnxa64/../../sys/os/glnxa64/libstdc++.so.6: > > version `CXXABI_1.3.11' not found (required by > > /data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/libarrow.so.11). > > > > which is fair enough. > > > > Alternatively, though, when loading MATLAB with an LD_PRELOAD, like > > LD_PRELOAD=/usr/lib64/libstdc++.so.6 ~/apps/matlab/MATLAB/R2018a/bin/matlab > > > > I get this error: > > Invalid MEX-file > > '/data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64': > > /data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64: > > undefined symbol: _ZNK5arrow6Status8ToStringEv. 
> > > > If this isn't possible, is there a reliable/recommended Anaconda-way to > > only bring in libraries that have been compiled with the old ABI? My > > impression was that conda-forge libraries satisfied that, but > > - This no longer seems to be true for arrow-cpp=0.11? I might be mistaken > > here. > > - We are resolving our dependencies through Anaconda, so it could be quite > > brittle for users to explicitly have to specify a channel for certain > > libraries. There are further subtleties wrt resolving boost & arrow from > > different channels etc. Ideally - but not necessarily - users don't depend > > on conda-forge at all, but only on the default channels (as an aside: we > > use Artifactory internally). > > > > Essentially, I'm happy with either, > > - having a way for MATLAB to load MEX with the new ABI, or, > > - reliably depending on libraries compiled with the old ABI > > > > but I've been struggling for a while now to achieve either one of those. > > Any pointers would be most welcome! > > > > (For once, this turned out to be easier on Windows than Linux!) > > > > Best, > > -Joris. >
[jira] [Created] (ARROW-4845) Compiler warnings on Windows
Jeroen created ARROW-4845: - Summary: Compiler warnings on Windows Key: ARROW-4845 URL: https://issues.apache.org/jira/browse/ARROW-4845 Project: Apache Arrow Issue Type: Bug Components: R Affects Versions: 0.12.1 Reporter: Jeroen I am seeing the warnings below when compiling the R bindings on Windows. Most of these seem easy to fix (comparing int with size_t or int32 with int64). {code} array.cpp: In function 'Rcpp::LogicalVector Array__Mask(const std::shared_ptr&)': array.cpp:102:24: warning: comparison of integer expressions of different signedness: 'size_t' {aka 'long long unsigned int'} and 'int64_t' {aka 'long long int'} [-Wsign-compare] for (size_t i = 0; i < array->length(); i++, bitmap_reader.Next()) { ~~^ /mingw64/bin/g++ -std=gnu++11 -I"C:/PROGRA~1/R/R-testing/include" -DNDEBUG -DARROW_STATIC -I"C:/R/library/Rcpp/include"-O2 -Wall -mtune=generic -c array__to_vector.cpp -o array__to_vector.o array__to_vector.cpp: In member function 'virtual arrow::Status arrow::r::Converter_Boolean::Ingest_some_nulls(SEXP, const std::shared_ptr&, R_xlen_t, R_xlen_t) const': array__to_vector.cpp:254:28: warning: comparison of integer expressions of different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' {aka 'long long int'} [-Wsign-compare] for (size_t i = 0; i < n; i++, data_reader.Next(), null_reader.Next(), ++p_data) { ~~^~~ array__to_vector.cpp:258:28: warning: comparison of integer expressions of different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' {aka 'long long int'} [-Wsign-compare] for (size_t i = 0; i < n; i++, data_reader.Next(), ++p_data) { ~~^~~ array__to_vector.cpp: In member function 'virtual arrow::Status arrow::r::Converter_Decimal::Ingest_some_nulls(SEXP, const std::shared_ptr&, R_xlen_t, R_xlen_t) const': array__to_vector.cpp:473:28: warning: comparison of integer expressions of different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' {aka 'long long int'} [-Wsign-compare] for (size_t i = 
0; i < n; i++, bitmap_reader.Next(), ++p_data) { ~~^~~ array__to_vector.cpp:478:28: warning: comparison of integer expressions of different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' {aka 'long long int'} [-Wsign-compare] for (size_t i = 0; i < n; i++, ++p_data) { ~~^~~ array__to_vector.cpp: In member function 'virtual arrow::Status arrow::r::Converter_Int64::Ingest_some_nulls(SEXP, const std::shared_ptr&, R_xlen_t, R_xlen_t) const': array__to_vector.cpp:515:28: warning: comparison of integer expressions of different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' {aka 'long long int'} [-Wsign-compare] for (size_t i = 0; i < n; i++, bitmap_reader.Next(), ++p_data) { ~~^~~ array__to_vector.cpp: In instantiation of 'arrow::Status arrow::r::SomeNull_Ingest(SEXP, R_xlen_t, R_xlen_t, const array_value_type*, const std::shared_ptr&, Lambda) [with int RTYPE = 14; array_value_type = long long int; Lambda = arrow::r::Converter_Date64::Ingest_some_nulls(SEXP, const std::shared_ptr&, R_xlen_t, R_xlen_t) const::; SEXP = SEXPREC*; R_xlen_t = long long int]': array__to_vector.cpp:366:77: required from here array__to_vector.cpp:116:26: warning: comparison of integer expressions of different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' {aka 'long long int'} [-Wsign-compare] for (size_t i = 0; i < n; i++, bitmap_reader.Next(), ++p_data, ++p_values) { ~~^~~ array__to_vector.cpp: In instantiation of 'arrow::Status arrow::r::SomeNull_Ingest(SEXP, R_xlen_t, R_xlen_t, const array_value_type*, const std::shared_ptr&, Lambda) [with int RTYPE = 13; array_value_type = unsigned char; Lambda = arrow::r::Converter_Dictionary::Ingest_some_nulls_Impl(SEXP, const std::shared_ptr&, R_xlen_t, R_xlen_t) const [with Type = arrow::UInt8Type; SEXP = SEXPREC*; R_xlen_t = long long int]::; SEXP = SEXPREC*; R_xlen_t = long long int]': array__to_vector.cpp:341:47: required from 'arrow::Status 
arrow::r::Converter_Dictionary::Ingest_some_nulls_Impl(SEXP, const std::shared_ptr&, R_xlen_t, R_xlen_t) const [with Type = arrow::UInt8Type; SEXP = SEXPREC*; R_xlen_t = long long int]' array__to_vector.cpp:313:78: required from here array__to_vector.cpp:116:26: warning: comparison of integer expressions of different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' {aka 'long long int'} [-Wsign-compare] array__to_vector.cpp: In instantiation of 'arrow::Status arrow::r::SomeNull_Ingest(SEXP, R_xlen_t, R_xlen_t, const array_value_type*, const std::shared_ptr&, Lambda) [with int RTYPE = 13; array_value
[jira] [Created] (ARROW-4844) Static libarrow is missing vendored libdouble-conversion
Jeroen created ARROW-4844: - Summary: Static libarrow is missing vendored libdouble-conversion Key: ARROW-4844 URL: https://issues.apache.org/jira/browse/ARROW-4844 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 0.12.1 Reporter: Jeroen When trying to statically link libarrow.a I get linking errors which suggest that libdouble-conversion.a was not properly embedded in libarrow.a. This problem happens on both MacOS and Windows. For the R bindings we need static libraries. {code} C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(cast.cc.obj):(.text+0x1c77c): undefined reference to `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, int*) const' C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x5fda): undefined reference to `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, int*) const' C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6097): undefined reference to `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, int*) const' C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6589): undefined reference to `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, int*) const' C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: 
C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6647): undefined reference to `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, int*) const' {code}
[jira] [Created] (ARROW-4843) [Rust] [DataFusion] Parquet data source should support DATE
Andy Grove created ARROW-4843: - Summary: [Rust] [DataFusion] Parquet data source should support DATE Key: ARROW-4843 URL: https://issues.apache.org/jira/browse/ARROW-4843 Project: Apache Arrow Issue Type: Improvement Components: Rust, Rust - DataFusion Affects Versions: 0.13.0 Reporter: Andy Grove Fix For: 0.13.0 The new Parquet data source (ARROW-4466) fails with "Unable to convert parquet logical type DATE" when reading some parquet files.
[jira] [Created] (ARROW-4840) [C++] Persist CMake options in generated header
Uwe L. Korn created ARROW-4840: -- Summary: [C++] Persist CMake options in generated header Key: ARROW-4840 URL: https://issues.apache.org/jira/browse/ARROW-4840 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Uwe L. Korn Assignee: Uwe L. Korn Fix For: 0.14.0 (do this after we merged the CMake refactor) We should export all compile-time options like ARROW_WITH_ZSTD or ARROW_PARQUET in a header file so that other libraries depending on Arrow C++ can also respect that in their builds.
Re: Timeline for 0.13 Arrow release
I've cleaned up my issues for Rust, moving most of them to 0.14.0. I have two PRs in progress that I would appreciate reviews on: https://github.com/apache/arrow/pull/3671 - [Rust] Table API (a.k.a DataFrame) https://github.com/apache/arrow/pull/3851 - [Rust] Parquet data source in DataFusion Once these are merged I have some small follow up PRs for 0.13.0 that I can get done this week. Thanks, Andy. On Tue, Mar 12, 2019 at 8:21 AM Wes McKinney wrote: > hi folks, > > I think we are on track to be able to release toward the end of this > month. My proposed timeline: > > * This week (March 11-15): feature/improvement push mostly > * Next week (March 18-22): shift to bug fixes, stabilization, empty > backlog of feature/improvement JIRAs > * Week of March 25: propose release candidate > > Does this seem reasonable? This puts us at about 9-10 weeks from 0.12. > > We need an RM for 0.13, any PMCs want to volunteer? > > Take a look at our release page: > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103091219 > > Out of the open or in-progress issues, we have: > > * C#: 3 issues > * C++ (all components): 51 issues > * Java: 3 issues > * Python: 38 issues > * Rust (all components): 33 issues > > Please help curating the backlogs for each component. There's a > smattering of issues in other categories. There are also 10 open > issues with No Component (and 20 resolved issues), those need their > metadata fixed. > > Thanks, > Wes > > On Wed, Feb 27, 2019 at 1:49 PM Wes McKinney wrote: > > > > The timeline for the 0.13 release is drawing closer. I would say we > > should consider a release candidate either the week of March 18 or > > March 25, which gives us ~3 weeks to close out backlog items. > > > > There are around 220 issues open or in-progress in > > > > https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.13.0+Release > > > > Please have a look. 
If issues are not assigned to someone as the next > > couple of weeks pass by I'll begin moving at least C++ and Python > > issues to 0.14 that don't seem like they're going to get done for > > 0.13. If development stakeholders for C#, Java, Rust, Ruby, and other > > components can review and curate the issues that would be helpful. > > > > You can help keep the JIRA issues tidy by making sure to add Fix > > Version to issues and to make sure to add a Component so that issues > > are properly categorized in the release notes. > > > > Thanks > > Wes > > > > On Sat, Feb 9, 2019 at 10:39 AM Wes McKinney > wrote: > > > > > > See > https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide > > > > > > The source release step is one of the places where problems occur. > > > > > > On Sat, Feb 9, 2019, 10:33 AM > >> > > >> > > >> > On Feb 8, 2019, at 9:19 AM, Uwe L. Korn wrote: > > >> > > > >> > We could dockerize some of the release steps to ensure that they > run in the same environment. > > >> > > >> I may be able to help with said Dockerization. If not for this > release, then for the next. Are there docs on which systems we wish to > target and/or any build steps beyond the current dev container ( > https://github.com/apache/arrow/tree/master/dev/container)? >
[jira] [Created] (ARROW-4842) [C++] Persist CMake options in pkg-config files
Uwe L. Korn created ARROW-4842: -- Summary: [C++] Persist CMake options in pkg-config files Key: ARROW-4842 URL: https://issues.apache.org/jira/browse/ARROW-4842 Project: Apache Arrow Issue Type: Improvement Components: C++, Packaging Reporter: Uwe L. Korn Assignee: Uwe L. Korn Fix For: 0.14.0 Persist options like ARROW_WITH_ZSTD in {{arrow.pc}} so libraries can determine which features are available.
Re: MATLAB, Arrow, ABI's and Linux
hi Joris, You probably ran into the conda-forge compiler migration. I'm not sure about Anaconda's Apache Arrow libraries since they maintain those recipes. If you need shared libraries using the gcc 4.x ABI you may have to build them yourself, or use the Linux packages for the platform where you are working. It would be useful to have a Dockerfile that produces "portable" shared libraries with the RedHat devtoolset-2 compiler - Wes On Tue, Mar 12, 2019 at 11:22 AM Joris Peeters wrote: > > [A] Short background: > > We are working on a MEX library that converts a binary array (representing > an Arrow stream or file) into MATLAB structs. This is in > parallel/complement to what already exists in the main Arrow project, which > focuses on feather, but the hope is certainly to contribute back some of > this work. We talked to MathWorks about this already (as GSA Capital). > > The library (e.g. fromArrowStream.mexa64) gets published on > (company-internal) Anaconda, and upon installation the dependencies on > arrow-cpp, boost-cpp etc are resolved (from remote channels). All .so's end > up in a user-local conda environment's ../lib, which in MATLAB we make > available through addpath. Compilation uses -D_GLIBCXX_USE_CXX11_ABI=0. > > [B] The issue I'm facing ... > > For quite a while (when we depended on arrow-cpp=0.10 from conda-forge) > this has worked fine, but lately I've encountered increasing issues wrt ABI > compatibility, amongst arrow, gtest (both at build time) and MATLAB (at > run-time). > > Is arrow-cpp 0.11 only built with the new ABI? I loaded it from both > defaults and conda-forge channel, and it seems different in this regard > than conda-forge's 0.10. Either way, I'm now attempting to build my library > without the -D_GLIBCXX_USE_CXX11_ABI=0 compile flag, as that seems to be > the more sustainable way forward. > > Question: is it possible to load a MEX library that has been compiled with > the new ABI? 
When doing this naively, I get an error like: > > Invalid MEX-file > '/data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64': > /data/home/jpeeter/apps/matlab/MATLAB/R2018a/bin/glnxa64/../../sys/os/glnxa64/libstdc++.so.6: > version `CXXABI_1.3.11' not found (required by > /data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/libarrow.so.11). > > which is fair enough. > > Alternatively, though, when loading MATLAB with an LD_PRELOAD, like > LD_PRELOAD=/usr/lib64/libstdc++.so.6 ~/apps/matlab/MATLAB/R2018a/bin/matlab > > I get this error: > Invalid MEX-file > '/data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64': > /data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64: > undefined symbol: _ZNK5arrow6Status8ToStringEv. > > If this isn't possible, is there a reliable/recommended Anaconda-way to > only bring in libraries that have been compiled with the old ABI? My > impression was that conda-forge libraries satisfied that, but > - This no longer seems to be true for arrow-cpp=0.11? I might be mistaken > here. > - We are resolving our dependencies through Anaconda, so it could be quite > brittle for users to explicitly have to specify a channel for certain > libraries. There are further subtleties wrt resolving boost & arrow from > different channels etc. Ideally - but not necessarily - users don't depend > on conda-forge at all, but only on the default channels (as an aside: we > use Artifactory internally). > > Essentially, I'm happy with either, > - having a way for MATLAB to load MEX with the new ABI, or, > - reliably depending on libraries compiled with the old ABI > > but I've been struggling for a while now to achieve either one of those. > Any pointers would be most welcome! > > (For once, this turned out to be easier on Windows than Linux!) > > Best, > -Joris.
[jira] [Created] (ARROW-4841) [C++] Persist CMake options in generated CMake config
Uwe L. Korn created ARROW-4841: -- Summary: [C++] Persist CMake options in generated CMake config Key: ARROW-4841 URL: https://issues.apache.org/jira/browse/ARROW-4841 Project: Apache Arrow Issue Type: Improvement Components: C++, Packaging Reporter: Uwe L. Korn Assignee: Uwe L. Korn Fix For: 0.14.0 (do this after we merged the CMake refactor) We should persist all options set during the CMake run also in {{arrowConfig.cmake}} so that CMake projects that depend on Arrow also can determine what they are able to provide.
MATLAB, Arrow, ABI's and Linux
[A] Short background: We are working on a MEX library that converts a binary array (representing an Arrow stream or file) into MATLAB structs. This is in parallel/complement to what already exists in the main Arrow project, which focuses on feather, but the hope is certainly to contribute back some of this work. We talked to MathWorks about this already (as GSA Capital). The library (e.g. fromArrowStream.mexa64) gets published on (company-internal) Anaconda, and upon installation the dependencies on arrow-cpp, boost-cpp etc are resolved (from remote channels). All .so's end up in a user-local conda environment's ../lib, which in MATLAB we make available through addpath. Compilation uses -D_GLIBCXX_USE_CXX11_ABI=0. [B] The issue I'm facing ... For quite a while (when we depended on arrow-cpp=0.10 from conda-forge) this has worked fine, but lately I've encountered increasing issues wrt ABI compatibility, amongst arrow, gtest (both at build time) and MATLAB (at run-time). Is arrow-cpp 0.11 only built with the new ABI? I loaded it from both defaults and conda-forge channel, and it seems different in this regard than conda-forge's 0.10. Either way, I'm now attempting to build my library without the -D_GLIBCXX_USE_CXX11_ABI=0 compile flag, as that seems to be the more sustainable way forward. Question: is it possible to load a MEX library that has been compiled with the new ABI? When doing this naively, I get an error like: Invalid MEX-file '/data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64': /data/home/jpeeter/apps/matlab/MATLAB/R2018a/bin/glnxa64/../../sys/os/glnxa64/libstdc++.so.6: version `CXXABI_1.3.11' not found (required by /data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/libarrow.so.11). which is fair enough. 
Alternatively, though, when loading MATLAB with an LD_PRELOAD, like LD_PRELOAD=/usr/lib64/libstdc++.so.6 ~/apps/matlab/MATLAB/R2018a/bin/matlab I get this error: Invalid MEX-file '/data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64': /data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64: undefined symbol: _ZNK5arrow6Status8ToStringEv. If this isn't possible, is there a reliable/recommended Anaconda-way to only bring in libraries that have been compiled with the old ABI? My impression was that conda-forge libraries satisfied that, but - This no longer seems to be true for arrow-cpp=0.11? I might be mistaken here. - We are resolving our dependencies through Anaconda, so it could be quite brittle for users to explicitly have to specify a channel for certain libraries. There are further subtleties wrt resolving boost & arrow from different channels etc. Ideally - but not necessarily - users don't depend on conda-forge at all, but only on the default channels (as an aside: we use Artifactory internally). Essentially, I'm happy with either, - having a way for MATLAB to load MEX with the new ABI, or, - reliably depending on libraries compiled with the old ABI but I've been struggling for a while now to achieve either one of those. Any pointers would be most welcome! (For once, this turned out to be easier on Windows than Linux!) Best, -Joris.
[jira] [Created] (ARROW-4839) [C#] Add NuGet support
Eric Erhardt created ARROW-4839: --- Summary: [C#] Add NuGet support Key: ARROW-4839 URL: https://issues.apache.org/jira/browse/ARROW-4839 Project: Apache Arrow Issue Type: Improvement Components: C# Reporter: Eric Erhardt Assignee: Eric Erhardt We should add the metadata to the .csproj so we can create a NuGet package without changing any source code. Also, we should add any scripts and documentation on how to create the NuGet package to allow ease of creation at release time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Timeline for 0.13 Arrow release
I disagree. I think we need a forcing function to create some urgency to work out any of the problems that may come up. If we need to make a 0.13.1 to fix problems that are not caught during the release (and producing binary artifacts) then I think that is okay. On Tue, Mar 12, 2019 at 9:53 AM Krisztián Szűcs wrote: > > Hi, > > The CMake refactor is huge enough to cause unexpected post-release > defects. I'd consider shipping it with the next release and let it stabilize > during the development of 0.14. > > > On Tue, Mar 12, 2019 at 3:41 PM Wes McKinney wrote: > > > hi Uwe, > > > > I would be OK with trying to release next week. Let's see what others > > think. Getting the CMake refactor in and packaging sorted out is the > > big priority for the C++-using ecosystem. I haven't seen anything > > hugely pressing in the other impls based on the recent patch flow. > > Having C# NuGet package working would be nice > > > > - Wes > > > > On Tue, Mar 12, 2019 at 9:37 AM Uwe L. Korn wrote: > > > > > > Hello, > > > > > > two things come to my mind: > > > > > > * I'm spending now some time again on the CMake refactor. The only > > blocker I currently see is the issue with detecting the correct gtest > > libraries on Windows. Otherwise this can go in. Being very confident about > > my work: This will break things even though CI is green. A safer bet from > > my side would be to make a release and then merge once the release vote > > passed. I would then like to aim for a shorter cycle for 0.14 to get this > > working and green as a lot of the refactor is needed to get packages into > > distributions and thus also clear a bit the path for the availability of an > > R package for arrow. > > > * I could also volunteer to be RM *if we do the release process > > starting March 18*. I have limited time the week after but enough spare to > > make a release in the week of March 18. 
> > > > > > Uwe > > > > > > On Tue, Mar 12, 2019, at 3:21 PM, Wes McKinney wrote: > > > > hi folks, > > > > > > > > I think we are on track to be able to release toward the end of this > > > > month. My proposed timeline: > > > > > > > > * This week (March 11-15): feature/improvement push mostly > > > > * Next week (March 18-22): shift to bug fixes, stabilization, empty > > > > backlog of feature/improvement JIRAs > > > > * Week of March 25: propose release candidate > > > > > > > > Does this seem reasonable? This puts us at about 9-10 weeks from 0.12. > > > > > > > > We need an RM for 0.13, any PMCs want to volunteer? > > > > > > > > Take a look at our release page: > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103091219 > > > > > > > > Out of the open or in-progress issues, we have: > > > > > > > > * C#: 3 issues > > > > * C++ (all components): 51 issues > > > > * Java: 3 issues > > > > * Python: 38 issues > > > > * Rust (all components): 33 issues > > > > > > > > Please help curating the backlogs for each component. There's a > > > > smattering of issues in other categories. There are also 10 open > > > > issues with No Component (and 20 resolved issues), those need their > > > > metadata fixed. > > > > > > > > Thanks, > > > > Wes > > > > > > > > On Wed, Feb 27, 2019 at 1:49 PM Wes McKinney > > wrote: > > > > > > > > > > The timeline for the 0.13 release is drawing closer. I would say we > > > > > should consider a release candidate either the week of March 18 or > > > > > March 25, which gives us ~3 weeks to close out backlog items. > > > > > > > > > > There are around 220 issues open or in-progress in > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.13.0+Release > > > > > > > > > > Please have a look. 
If issues are not assigned to someone as the next > > > > > couple of weeks pass by I'll begin moving at least C++ and Python > > > > > issues to 0.14 that don't seem like they're going to get done for > > > > > 0.13. If development stakeholders for C#, Java, Rust, Ruby, and other > > > > > components can review and curate the issues that would be helpful. > > > > > > > > > > You can help keep the JIRA issues tidy by making sure to add Fix > > > > > Version to issues and to make sure to add a Component so that issues > > > > > are properly categorized in the release notes. > > > > > > > > > > Thanks > > > > > Wes > > > > > > > > > > On Sat, Feb 9, 2019 at 10:39 AM Wes McKinney > > wrote: > > > > > > > > > > > > See > > https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide > > > > > > > > > > > > The source release step is one of the places where problems occur. > > > > > > > > > > > > On Sat, Feb 9, 2019, 10:33 AM > > > > >> > > > > > >> > > > > > >> > On Feb 8, 2019, at 9:19 AM, Uwe L. Korn > > wrote: > > > > > >> > > > > > > >> > We could dockerize some of the release steps to ensure that > > they run in the same environment. > > > > > >> > > > > > >> I may be able to help with said Dockerization. If not for this > > release, then for the next.
RE: Timeline for 0.13 Arrow release
> Having C# NuGet package working would be nice I've opened https://issues.apache.org/jira/browse/ARROW-4839 for this and I will be working on it this week. I'd like to see an official Arrow NuGet package on www.nuget.org soon, so getting it in this release would work out perfectly from my side.
Re: Timeline for 0.13 Arrow release
Hi, The CMake refactor is huge enough to cause unexpected post-release defects. I'd consider shipping it with the next release and let it stabilize during the development of 0.14. On Tue, Mar 12, 2019 at 3:41 PM Wes McKinney wrote: > hi Uwe, > > I would be OK with trying to release next week. Let's see what others > think. Getting the CMake refactor in and packaging sorted out is the > big priority for the C++-using ecosystem. I haven't seen anything > hugely pressing in the other impls based on the recent patch flow. > Having C# NuGet package working would be nice > > - Wes > > On Tue, Mar 12, 2019 at 9:37 AM Uwe L. Korn wrote: > > > > Hello, > > > > two things come to my mind: > > > > * I'm spending now some time again on the CMake refactor. The only > blocker I currently see is the issue with detecting the correct gtest > libraries on Windows. Otherwise this can go in. Being very confident about > my work: This will break things even though CI is green. A safer bet from > my side would be to make a release and then merge once the release vote > passed. I would then like to aim for a shorter cycle for 0.14 to get this > working and green as a lot of the refactor is needed to get packages into > distributions and thus also clear a bit the path for the availability of an > R package for arrow. > > * I could also volunteer to be RM *if we do the release process > starting March 18*. I have limited time the week after but enough spare to > make a release in the week of March 18. > > > > Uwe > > > > On Tue, Mar 12, 2019, at 3:21 PM, Wes McKinney wrote: > > > hi folks, > > > > > > I think we are on track to be able to release toward the end of this > > > month. My proposed timeline: > > > > > > * This week (March 11-15): feature/improvement push mostly > > > * Next week (March 18-22): shift to bug fixes, stabilization, empty > > > backlog of feature/improvement JIRAs > > > * Week of March 25: propose release candidate > > > > > > Does this seem reasonable? 
This puts us at about 9-10 weeks from 0.12. > > > > > > We need an RM for 0.13, any PMCs want to volunteer? > > > > > > Take a look at our release page: > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103091219 > > > > > > Out of the open or in-progress issues, we have: > > > > > > * C#: 3 issues > > > * C++ (all components): 51 issues > > > * Java: 3 issues > > > * Python: 38 issues > > > * Rust (all components): 33 issues > > > > > > Please help curating the backlogs for each component. There's a > > > smattering of issues in other categories. There are also 10 open > > > issues with No Component (and 20 resolved issues), those need their > > > metadata fixed. > > > > > > Thanks, > > > Wes > > > > > > On Wed, Feb 27, 2019 at 1:49 PM Wes McKinney > wrote: > > > > > > > > The timeline for the 0.13 release is drawing closer. I would say we > > > > should consider a release candidate either the week of March 18 or > > > > March 25, which gives us ~3 weeks to close out backlog items. > > > > > > > > There are around 220 issues open or in-progress in > > > > > > > > > https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.13.0+Release > > > > > > > > Please have a look. If issues are not assigned to someone as the next > > > > couple of weeks pass by I'll begin moving at least C++ and Python > > > > issues to 0.14 that don't seem like they're going to get done for > > > > 0.13. If development stakeholders for C#, Java, Rust, Ruby, and other > > > > components can review and curate the issues that would be helpful. > > > > > > > > You can help keep the JIRA issues tidy by making sure to add Fix > > > > Version to issues and to make sure to add a Component so that issues > > > > are properly categorized in the release notes. 
> > > > > > > > Thanks > > > > Wes > > > > > > > > On Sat, Feb 9, 2019 at 10:39 AM Wes McKinney > wrote: > > > > > > > > > > See > https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide > > > > > > > > > > The source release step is one of the places where problems occur. > > > > > > > > > > On Sat, Feb 9, 2019, 10:33 AM > > > >> > > > > >> > > > > >> > On Feb 8, 2019, at 9:19 AM, Uwe L. Korn > wrote: > > > > >> > > > > > >> > We could dockerize some of the release steps to ensure that > they run in the same environment. > > > > >> > > > > >> I may be able to help with said Dockerization. If not for this > release, then for the next. Are there docs on which systems we wish to > target and/or any build steps beyond the current dev container ( > https://github.com/apache/arrow/tree/master/dev/container)? > > > >
Re: Timeline for 0.13 Arrow release
hi Uwe, I would be OK with trying to release next week. Let's see what others think. Getting the CMake refactor in and packaging sorted out is the big priority for the C++-using ecosystem. I haven't seen anything hugely pressing in the other impls based on the recent patch flow. Having C# NuGet package working would be nice - Wes On Tue, Mar 12, 2019 at 9:37 AM Uwe L. Korn wrote: > > Hello, > > two things come to my mind: > > * I'm spending now some time again on the CMake refactor. The only blocker I > currently see is the issue with detecting the correct gtest libraries on > Windows. Otherwise this can go in. Being very confident about my work: This > will break things even though CI is green. A safer bet from my side would be > to make a release and then merge once the release vote passed. I would then > like to aim for a shorter cycle for 0.14 to get this working and green as a > lot of the refactor is needed to get packages into distributions and thus > also clear a bit the path for the availablity of an R package for arrow. > * I could also volunteer to be RM *if we do the release process starting > March 18*. I have limited time the week after but enough spare to make a > release in the week of March 18. > > Uwe > > On Tue, Mar 12, 2019, at 3:21 PM, Wes McKinney wrote: > > hi folks, > > > > I think we are on track to be able to release toward the end of this > > month. My proposed timeline: > > > > * This week (March 11-15): feature/improvement push mostly > > * Next week (March 18-22): shift to bug fixes, stabilization, empty > > backlog of feature/improvement JIRAs > > * Week of March 25: propose release candidate > > > > Does this seem reasonable? This puts us at about 9-10 weeks from 0.12. > > > > We need an RM for 0.13, any PMCs want to volunteer? 
> > > > Take a look at our release page: > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103091219 > > > > Out of the open or in-progress issues, we have: > > > > * C#: 3 issues > > * C++ (all components): 51 issues > > * Java: 3 issues > > * Python: 38 issues > > * Rust (all components): 33 issues > > > > Please help curating the backlogs for each component. There's a > > smattering of issues in other categories. There are also 10 open > > issues with No Component (and 20 resolved issues), those need their > > metadata fixed. > > > > Thanks, > > Wes > > > > On Wed, Feb 27, 2019 at 1:49 PM Wes McKinney wrote: > > > > > > The timeline for the 0.13 release is drawing closer. I would say we > > > should consider a release candidate either the week of March 18 or > > > March 25, which gives us ~3 weeks to close out backlog items. > > > > > > There are around 220 issues open or in-progress in > > > > > > https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.13.0+Release > > > > > > Please have a look. If issues are not assigned to someone as the next > > > couple of weeks pass by I'll begin moving at least C++ and Python > > > issues to 0.14 that don't seem like they're going to get done for > > > 0.13. If development stakeholders for C#, Java, Rust, Ruby, and other > > > components can review and curate the issues that would be helpful. > > > > > > You can help keep the JIRA issues tidy by making sure to add Fix > > > Version to issues and to make sure to add a Component so that issues > > > are properly categorized in the release notes. > > > > > > Thanks > > > Wes > > > > > > On Sat, Feb 9, 2019 at 10:39 AM Wes McKinney wrote: > > > > > > > > See > > > > https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide > > > > > > > > The source release step is one of the places where problems occur. > > > > > > > > On Sat, Feb 9, 2019, 10:33 AM > > >> > > > >> > > > >> > On Feb 8, 2019, at 9:19 AM, Uwe L. 
Korn wrote: > > > >> > > > > >> > We could dockerize some of the release steps to ensure that they run > > > >> > in the same environment. > > > >> > > > >> I may be able to help with said Dockerization. If not for this > > > >> release, then for the next. Are there docs on which systems we wish to > > > >> target and/or any build steps beyond the current dev container > > > >> (https://github.com/apache/arrow/tree/master/dev/container)? > >
Re: Timeline for 0.13 Arrow release
Hello, two things come to my mind: * I'm spending now some time again on the CMake refactor. The only blocker I currently see is the issue with detecting the correct gtest libraries on Windows. Otherwise this can go in. Being very confident about my work: This will break things even though CI is green. A safer bet from my side would be to make a release and then merge once the release vote passed. I would then like to aim for a shorter cycle for 0.14 to get this working and green as a lot of the refactor is needed to get packages into distributions and thus also clear a bit the path for the availability of an R package for arrow. * I could also volunteer to be RM *if we do the release process starting March 18*. I have limited time the week after but enough spare to make a release in the week of March 18. Uwe On Tue, Mar 12, 2019, at 3:21 PM, Wes McKinney wrote: > hi folks, > > I think we are on track to be able to release toward the end of this > month. My proposed timeline: > > * This week (March 11-15): feature/improvement push mostly > * Next week (March 18-22): shift to bug fixes, stabilization, empty > backlog of feature/improvement JIRAs > * Week of March 25: propose release candidate > > Does this seem reasonable? This puts us at about 9-10 weeks from 0.12. > > We need an RM for 0.13, any PMCs want to volunteer? > > Take a look at our release page: > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103091219 > > Out of the open or in-progress issues, we have: > > * C#: 3 issues > * C++ (all components): 51 issues > * Java: 3 issues > * Python: 38 issues > * Rust (all components): 33 issues > > Please help curating the backlogs for each component. There's a > smattering of issues in other categories. There are also 10 open > issues with No Component (and 20 resolved issues), those need their > metadata fixed. > > Thanks, > Wes > > On Wed, Feb 27, 2019 at 1:49 PM Wes McKinney wrote: > > > > The timeline for the 0.13 release is drawing closer. 
I would say we > > should consider a release candidate either the week of March 18 or > > March 25, which gives us ~3 weeks to close out backlog items. > > > > There are around 220 issues open or in-progress in > > > > https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.13.0+Release > > > > Please have a look. If issues are not assigned to someone as the next > > couple of weeks pass by I'll begin moving at least C++ and Python > > issues to 0.14 that don't seem like they're going to get done for > > 0.13. If development stakeholders for C#, Java, Rust, Ruby, and other > > components can review and curate the issues that would be helpful. > > > > You can help keep the JIRA issues tidy by making sure to add Fix > > Version to issues and to make sure to add a Component so that issues > > are properly categorized in the release notes. > > > > Thanks > > Wes > > > > On Sat, Feb 9, 2019 at 10:39 AM Wes McKinney wrote: > > > > > > See > > > https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide > > > > > > The source release step is one of the places where problems occur. > > > > > > On Sat, Feb 9, 2019, 10:33 AM > >> > > >> > > >> > On Feb 8, 2019, at 9:19 AM, Uwe L. Korn wrote: > > >> > > > >> > We could dockerize some of the release steps to ensure that they run > > >> > in the same environment. > > >> > > >> I may be able to help with said Dockerization. If not for this release, > > >> then for the next. Are there docs on which systems we wish to target > > >> and/or any build steps beyond the current dev container > > >> (https://github.com/apache/arrow/tree/master/dev/container)? >
Re: Publishing C# NuGet package
thanks Eric -- that sounds great. I think we're going to want to cut the 0.13 release candidate around 2 weeks from now, so that gives some time to get the packaging things sorted out. - Wes On Thu, Mar 7, 2019 at 4:46 PM Eric Erhardt wrote: > > > Some changes may need to be made to the release scripts to update C# > > metadata files. The intent is to make it so that the code artifact can be > > pushed to a package manager using the official ASF release artifact. If we > > don't get it 100% right for 0.13 then > at least we can get a preliminary > > package up there and do things 100% by the books in 0.14. > > The way you build a NuGet package is you call `dotnet pack` on the `.csproj` > file. That will build the .NET assembly (.dll) and package it into a NuGet > package (.nupkg, which is a glorified .zip file). That `.nupkg` file is then > published to the nuget.org website. > > In order to publish it to nuget.org, an account will need to be made to > publish it under. Is that something a PMC member can/will do? The intention > is for the published package to be the official "Apache Arrow" nuget package. > > The .nupkg file can optionally be signed. See > https://docs.microsoft.com/en-us/nuget/create-packages/sign-a-package. > > I can create a JIRA to add all the appropriate NuGet metadata to the .csproj > in the repo. That way no file committed into the repo will need to change in > order to create the NuGet package. I can also add the instructions to create > the NuGet into the csharp README file in that PR.
Re: Timeline for 0.13 Arrow release
hi folks, I think we are on track to be able to release toward the end of this month. My proposed timeline: * This week (March 11-15): feature/improvement push mostly * Next week (March 18-22): shift to bug fixes, stabilization, empty backlog of feature/improvement JIRAs * Week of March 25: propose release candidate Does this seem reasonable? This puts us at about 9-10 weeks from 0.12. We need an RM for 0.13, any PMCs want to volunteer? Take a look at our release page: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103091219 Out of the open or in-progress issues, we have: * C#: 3 issues * C++ (all components): 51 issues * Java: 3 issues * Python: 38 issues * Rust (all components): 33 issues Please help curating the backlogs for each component. There's a smattering of issues in other categories. There are also 10 open issues with No Component (and 20 resolved issues), those need their metadata fixed. Thanks, Wes On Wed, Feb 27, 2019 at 1:49 PM Wes McKinney wrote: > > The timeline for the 0.13 release is drawing closer. I would say we > should consider a release candidate either the week of March 18 or > March 25, which gives us ~3 weeks to close out backlog items. > > There are around 220 issues open or in-progress in > > https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.13.0+Release > > Please have a look. If issues are not assigned to someone as the next > couple of weeks pass by I'll begin moving at least C++ and Python > issues to 0.14 that don't seem like they're going to get done for > 0.13. If development stakeholders for C#, Java, Rust, Ruby, and other > components can review and curate the issues that would be helpful. > > You can help keep the JIRA issues tidy by making sure to add Fix > Version to issues and to make sure to add a Component so that issues > are properly categorized in the release notes. 
> > Thanks > Wes > > On Sat, Feb 9, 2019 at 10:39 AM Wes McKinney wrote: > > > > See > > https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide > > > > The source release step is one of the places where problems occur. > > > > On Sat, Feb 9, 2019, 10:33 AM >> > >> > >> > On Feb 8, 2019, at 9:19 AM, Uwe L. Korn wrote: > >> > > >> > We could dockerize some of the release steps to ensure that they run in > >> > the same environment. > >> > >> I may be able to help with said Dockerization. If not for this release, > >> then for the next. Are there docs on which systems we wish to target > >> and/or any build steps beyond the current dev container > >> (https://github.com/apache/arrow/tree/master/dev/container)?
[jira] [Created] (ARROW-4838) [C++] Implement safe Make constructor
Francois Saint-Jacques created ARROW-4838: - Summary: [C++] Implement safe Make constructor Key: ARROW-4838 URL: https://issues.apache.org/jira/browse/ARROW-4838 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques Fix For: 0.14.0 The following classes need validating constructors: * ArrayData * ChunkedArray * RecordBatch * Column * Table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4837) [C++] Support c++filt on a custom path in the run-test.sh script
Krisztian Szucs created ARROW-4837: -- Summary: [C++] Support c++filt on a custom path in the run-test.sh script Key: ARROW-4837 URL: https://issues.apache.org/jira/browse/ARROW-4837 Project: Apache Arrow Issue Type: Improvement Reporter: Krisztian Szucs On conda this is CXXFILT=/opt/conda/bin/x86_64-conda_cos6-linux-gnu-c++filt -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4835) [GLib] Add boolean operations
Kouhei Sutou created ARROW-4835: --- Summary: [GLib] Add boolean operations Key: ARROW-4835 URL: https://issues.apache.org/jira/browse/ARROW-4835 Project: Apache Arrow Issue Type: New Feature Components: GLib Reporter: Kouhei Sutou Assignee: Kouhei Sutou -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4836) "Cannot tell() a compressed stream" when using RecordBatchStreamWriter
Mike Pedersen created ARROW-4836: Summary: "Cannot tell() a compressed stream" when using RecordBatchStreamWriter Key: ARROW-4836 URL: https://issues.apache.org/jira/browse/ARROW-4836 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.12.1 Reporter: Mike Pedersen It does not seem like RecordBatchStreamWriter works with compressed streams:

{code:python}
>>> import pyarrow as pa
>>> pa.__version__
'0.12.1'
>>> stream = pa.output_stream('/tmp/a.gz')
>>> batch = pa.RecordBatch.from_arrays([pa.array([1])], ['a'])
>>> writer = pa.RecordBatchStreamWriter(stream, batch.schema)
>>> writer.write(batch)
Traceback (most recent call last):
  File "", line 1, in
  File "pyarrow/ipc.pxi", line 181, in pyarrow.lib._RecordBatchWriter.write
  File "pyarrow/ipc.pxi", line 196, in pyarrow.lib._RecordBatchWriter.write_batch
  File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Cannot tell() a compressed stream
{code}

As I understand the documentation, this should be possible, right? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Depending on non-released Apache projects (C++ Avro)
Hello Micah, > Uwe, I'm not sure I understand what type of support/help you are thinking > of. Could you elaborate a little bit more before I reach out? I would help them with the same build system improvement we have done in the recent time (and are currently) doing in Arrow for C++. Nothing I would explicitly advertise on their ML, only as a heads up if there are Avro people on the Arrow ML. I cannot give any commitments on this but this would be probably some high gain for not so much work if that is currently hindering releases or adoption. Uwe > > -Micah > > On Tue, Mar 5, 2019 at 4:53 PM Wes McKinney wrote: > > > I am OK with that, but if we find ourselves making compromises that > > affect performance or memory efficiency (where possibly invasive > > refactoring may be required) perhaps we should reconsider option #3. > > > > On Tue, Mar 5, 2019 at 11:29 AM Uwe L. Korn wrote: > > > > > > I'm leaning a bit towards 1) but I would love to get some input from the > > Avro community as 1) depends also on their side as we will submit some > > patches upstream that need to be reviewed and someday also released. > > > > > > Are AVRO committers subscribed here or should we reach out to them on > > their ML? Given that we are quite active in the C++ space currently, I feel > > that we can contribute quite some infrastructure in building and packaging > > that we do eitherway for Arrow. This might be quite helpful for a project. > > We have seen with Parquet where much of the development is just happening > > as it is part of Arrow. (Not suggesting to merge/fork the Avro codebase but > > just to apply some of the best practices we learned while building Arrow). > > > > > > Uwe > > > > > > On Tue, Mar 5, 2019, at 4:57 PM, Wes McKinney wrote: > > > > I'd be +0.5 in favor of forking in this particular case. 
Since Avro is > > > > not vectorized (unlike Parquet and ORC) I suspect it may be more > > > > difficult to get the best performance using a general purpose API > > > > versus one that is more specialized to producing Arrow record batches. > > > > Given that has been relatively light C++ development activity in > > > > Apache Avro and no releases for 2 years it does give me pause. > > > > > > > > We might want to look at Impala's Avro scanner, they are doing some > > > > LLVM IR cross-compilation also (they're using the Avro C++ library > > > > though) > > > > > > > > > > https://github.com/apache/impala/blob/master/be/src/exec/hdfs-avro-scanner-ir.cc > > > > > > https://github.com/apache/impala/blob/master/be/src/exec/hdfs-avro-scanner.cc > > > > > > > > On Tue, Mar 5, 2019 at 1:01 AM Micah Kornfield > > wrote: > > > > > > > > > > I'm looking at incorporating Avro in Arrow C++ [1]. It seems that > > the Avro > > > > > C++ library APIs have improved from the last release. However, it > > is not > > > > > clear when a new release will be available (I asked on the JIRA > > Item for > > > > > the next release [2] and received no response). > > > > > > > > > > I was wondering if there is a policy governing using other Apache > > projects > > > > > or how people felt about the following options: > > > > > 1. Depend on a specific git commit through the third-party library > > system. > > > > > 2. Copy the necessary source code temporarily to our project, and > > change > > > > > to using the next release when it is available. > > > > > 3. Fork the code we need (the main benefit I see here is being able > > to > > > > > refactor it to avoid having to deal with exceptions, easier > > integration > > > > > with our IO system and one less 3rd party dependency to deal with). > > > > > 4. Wait on the 1.9 release before proceeding. 
> > > > > > > > > > Thanks, > > > > > Micah > > > > > > > > > > [1] https://issues.apache.org/jira/browse/ARROW-1209 > > > > > [2] https://issues.apache.org/jira/browse/AVRO-2250 > > > > > > >
Re: [C++] Failing constructors and internal state
+1 on Make and MakeUnsafe On Mon, Mar 11, 2019 at 3:15 PM Francois Saint-Jacques < fsaintjacq...@gmail.com> wrote: > I can settle with Micah's proposal of `MakeValidate`, though I'd prefer > that Make is safe by default and responsible users use MakeUnsafe :), I'll > settle with the pragmatic choice of backward compatibility. > > François > > > On Sun, Mar 10, 2019 at 5:45 PM Wes McKinney wrote: > > > I think having consistent methods for both validated and unvalidated > > construction is a good idea. Being fairly passionate about > > microperformance, I don't think we should penalize responsible users > > of unsafe/unvalidated APIs (e.g. by taking them away and replacing > > them with variants featuring unavoidable computation), this can > > partially be handled through developer documentation (which, ah, we > > will need to write more of) > > > > On Sun, Mar 10, 2019 at 4:01 PM Micah Kornfield > > wrote: > > > > > > I agree there should always be a path to avoid the validation but I > > think there should also be an easy way to have validation included and a > > clear way to tell the difference. IMO, having a strong naming convention > so > > callers can tell the difference, and code reviewers can focus more on > less > > safe method usage, is important. It will help newcomers to the project > > write safer code, which can either be refactored or called out in code > > review for performance issues. It also provides a cue for all developers > > to consider if they are meeting the necessary requirements when using > less > > safe methods. > > > > > > A straw-man proposal for naming conventions: > > > - Constructors are always unvalidated (should still have appropriate > > DCHECKs) > > > - "Make" calls are always unvalidated (should still have appropriate > > DCHECKs) > > > - "MakeValidated" ensures proper structural validation occurs (but no > > data validation). 
> > > - "MakeSanitized" ensures proper structural and data is validations > occur > > > > > > As noted above it might only pay to refactor a small amount of current > > usage to the safer APIs. > > > > > > We could potentially go even further down the rabbit hole and try to > > define standard for a Hungarian notation [1] to make it more obvious what > > invariants are expected for a particular data-structure variable (I'm > > actually -.5 on this). > > > > > > As a personal bias, I would rather have slower code that has lower risk > > of crashing in production than faster code that does. Obviously, there > is > > a tradeoff here, and the ideal is faster code that won't segfault. > > > > > > Thoughts? > > > > > > -Micah > > > > > > [1] > > https://www.joelonsoftware.com/2005/05/11/making-wrong-code-look-wrong/ > > > > > > On Sun, Mar 10, 2019 at 9:38 AM Wes McKinney > > wrote: > > >> > > >> hi folks, > > >> > > >> I think some issues are being conflated here, so let me try to dig > > >> through them. Let's first look at the two cited bugs that were fixed, > > >> if I have this right: > > >> > > >> * ARROW-4766: root cause dereferencing a null pointer > > >> * ARROW-4774: root cause unsanitized Python user input > > >> > > >> None of the 4 remedies listed could have prevented ARROW-4766 AFAICT > > >> since we currently allow for null buffers (the object, not the pointer > > >> inside) in ArrayData. This has been discussed on the mailing list in > > >> the past; "sanitizing" ArrayData to be free of null objects would be > > >> expensive and my general attitude in the C++ library is that we should > > >> not be in the business of paying extra CPU cycles for the 1-5% case > > >> when it is unneeded in the 95-99% of cases. We have DCHECK assertions > > >> to check these issues in debug builds while avoiding the costs in > > >> release builds. 
In the case of checking edge cases in computational > > >> kernels, suffice to say that we should check every kernel on length-0 > > >> input with null buffers to make sure this case is properly handled > > >> > > >> In the case of ARROW-4774, we should work at the language binding > > >> interface to make sure we have convenient "validating" constructors > > >> that check user input for common problems. This can prevent the > > >> duplication of this code across the binding layers (GLib, Python, R, > > >> MATLAB, etc.) > > >> > > >> On the specific 4 steps mentioned by Francois, here are my thoughts: > > >> > > >> 1. Having StatusOr would be useful, but this is a programming > > convenience > > >> > > >> 2. There are a couple purposes of the static factory methods that > > >> exist now, like Table::Make and RecordBatch::Make. One of the reasons > > >> that I added them initially was because of the implicit constructor > > >> behavior of std::vector inside a call to std::make_shared. If you have > > >> a std::vector argument in a class's constructor, then > > >> std::make_shared(..., {...}) will not result in the initializer > > >> list constructing the std::vector. This means some awkwardness like >