[jira] [Created] (ARROW-4851) [Java] BoundsChecking.java defaulting behavior for old drill parameter seems off

2019-03-12 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4851:
--

 Summary: [Java] BoundsChecking.java defaulting behavior for old 
drill parameter seems off
 Key: ARROW-4851
 URL: https://issues.apache.org/jira/browse/ARROW-4851
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Micah Kornfield


To turn bounds checking off, you still need to flip the old deprecated Drill 
parameter. We should probably default it to true, so that only the new Arrow 
parameter needs to be set.
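The reported behavior can be modeled as follows. This is a sketch only: the property names match those used in the Arrow Java code as far as I know, but the exact combination logic in BoundsChecking.java may differ.

```python
# Model of the defaulting behavior described above; property names and the
# exact combination logic are assumptions, not the actual BoundsChecking.java code.

def bounds_checking_enabled(props, old_default="false"):
    """Bounds checking is disabled only when both flags allow unsafe access."""
    old = props.get("drill.enable_unsafe_memory_access", old_default) == "true"
    new = props.get("arrow.enable_unsafe_memory_access", "false") == "true"
    return not (old and new)

# Current behavior: setting only the new Arrow flag is not enough.
assert bounds_checking_enabled({"arrow.enable_unsafe_memory_access": "true"})

# Proposed fix: default the deprecated Drill flag to "true" so the new
# Arrow flag alone controls the behavior.
assert not bounds_checking_enabled(
    {"arrow.enable_unsafe_memory_access": "true"}, old_default="true")
```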



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Depending on non-released Apache projects (C++ Avro)

2019-03-12 Thread Micah Kornfield
Thanks. This work has been pushed off a bit because I need to get existing
PRs into better shape. Hopefully 1.9 will have been released by the time I pick
it up (otherwise I would lean towards forking for the time being as well).

On Tue, Mar 12, 2019 at 3:01 AM Uwe L. Korn  wrote:

> Hello Micah,
>
> > Uwe, I'm not sure I understand what type of support/help you are thinking
> > of.  Could you elaborate a little bit more before I reach out?
>
> I would help them with the same build system improvements we have recently
> made (and are currently making) in Arrow for C++. Nothing I would
> explicitly advertise on their ML, only as a heads-up if there are Avro
> people on the Arrow ML. I cannot give any commitments on this, but it
> would probably be a high gain for not much work if that is currently
> hindering releases or adoption.
>
> Uwe
>
> >
> > -Micah
> >
> > On Tue, Mar 5, 2019 at 4:53 PM Wes McKinney  wrote:
> >
> > > I am OK with that, but if we find ourselves making compromises that
> > > affect performance or memory efficiency (where possibly invasive
> > > refactoring may be required) perhaps we should reconsider option #3.
> > >
> > > On Tue, Mar 5, 2019 at 11:29 AM Uwe L. Korn  wrote:
> > > >
> > > > I'm leaning a bit towards 1) but I would love to get some input from
> > > > the Avro community, as 1) depends also on their side: we will submit
> > > > some patches upstream that need to be reviewed and someday also released.
> > > >
> > > > Are AVRO committers subscribed here or should we reach out to them on
> > > > their ML? Given that we are quite active in the C++ space currently, I
> > > > feel that we can contribute quite some infrastructure in building and
> > > > packaging that we do either way for Arrow. This might be quite helpful
> > > > for a project. We have seen with Parquet that much of the development
> > > > just happens as part of Arrow. (Not suggesting to merge/fork the Avro
> > > > codebase, but just to apply some of the best practices we learned while
> > > > building Arrow.)
> > > >
> > > > Uwe
> > > >
> > > > On Tue, Mar 5, 2019, at 4:57 PM, Wes McKinney wrote:
> > > > > I'd be +0.5 in favor of forking in this particular case. Since Avro
> > > > > is not vectorized (unlike Parquet and ORC) I suspect it may be more
> > > > > difficult to get the best performance using a general-purpose API
> > > > > versus one that is more specialized to producing Arrow record
> > > > > batches. Given that there has been relatively light C++ development
> > > > > activity in Apache Avro and no releases for 2 years, it does give me
> > > > > pause.
> > > > >
> > > > > We might want to look at Impala's Avro scanner; they are also doing
> > > > > some LLVM IR cross-compilation (though they're using the Avro C++
> > > > > library):
> > > > >
> > > > > https://github.com/apache/impala/blob/master/be/src/exec/hdfs-avro-scanner-ir.cc
> > > > > https://github.com/apache/impala/blob/master/be/src/exec/hdfs-avro-scanner.cc
> > > > >
> > > > > On Tue, Mar 5, 2019 at 1:01 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > > > >
> > > > > > I'm looking at incorporating Avro in Arrow C++ [1]. It seems that
> > > > > > the Avro C++ library APIs have improved from the last release.
> > > > > > However, it is not clear when a new release will be available (I
> > > > > > asked on the JIRA item for the next release [2] and received no
> > > > > > response).
> > > > > >
> > > > > > I was wondering if there is a policy governing using other Apache
> > > > > > projects, or how people felt about the following options:
> > > > > > 1. Depend on a specific git commit through the third-party library
> > > > > > system.
> > > > > > 2. Copy the necessary source code into our project temporarily, and
> > > > > > change to using the next release when it is available.
> > > > > > 3. Fork the code we need (the main benefit I see here is being able
> > > > > > to refactor it to avoid having to deal with exceptions, easier
> > > > > > integration with our IO system, and one less third-party dependency
> > > > > > to deal with).
> > > > > > 4. Wait on the 1.9 release before proceeding.
> > > > > >
> > > > > > Thanks,
> > > > > > Micah
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/ARROW-1209
> > > > > > [2] https://issues.apache.org/jira/browse/AVRO-2250
> > > > >
> > >
> >
>


[jira] [Created] (ARROW-4850) [CI] Integration test failures do not fail the Travis CI build

2019-03-12 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4850:
---

 Summary: [CI] Integration test failures do not fail the Travis CI 
build
 Key: ARROW-4850
 URL: https://issues.apache.org/jira/browse/ARROW-4850
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Wes McKinney
 Fix For: 0.13.0


See https://github.com/apache/arrow/pull/3871

These changes fail the build, but it is reported as a success.

The errors can be seen in https://travis-ci.org/apache/arrow/jobs/505028161



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4849) [C++] Add docker-compose entry for testing Ubuntu Bionic build with system packages

2019-03-12 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4849:
--

 Summary: [C++] Add docker-compose entry for testing Ubuntu Bionic 
build with system packages
 Key: ARROW-4849
 URL: https://issues.apache.org/jira/browse/ARROW-4849
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Packaging
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.13.0


To better support people on Ubuntu, and also to show what is still missing to get 
Arrow packaged into Fedora, add an entry to the docker-compose.yml that builds 
on Ubuntu Bionic with system packages.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4848) Static libparquet not compiled with -DARROW_STATIC on Windows

2019-03-12 Thread Jeroen (JIRA)
Jeroen created ARROW-4848:
-

 Summary: Static libparquet not compiled with -DARROW_STATIC on 
Windows
 Key: ARROW-4848
 URL: https://issues.apache.org/jira/browse/ARROW-4848
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.12.1
Reporter: Jeroen


When trying to link the R bindings against the static libparquet.a + libarrow.a, 
we get a lot of missing Arrow symbol errors from libparquet.a. I think the 
problem is that libparquet.a was not compiled with -DARROW_STATIC, and therefore 
cannot be linked against libarrow.a.

When the Arrow CMake build is configured with -DARROW_BUILD_SHARED=OFF, I think 
it should automatically define ARROW_STATIC when compiling libparquet on Windows.
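A hedged sketch of what the suggested behavior could look like on the CMake side. The target name `parquet_static` and the exact option spellings are assumptions for illustration, not the actual Arrow build files:

```cmake
# Hypothetical sketch: when only static libraries are built, make the
# static Parquet target consume Arrow's static-linkage definition so its
# object files reference the statically linked Arrow symbols.
if(NOT ARROW_BUILD_SHARED)
  target_compile_definitions(parquet_static PUBLIC ARROW_STATIC)
endif()
```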




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4847) [Python] Add pyarrow.table factory function that dispatches to various ctors based on type of input

2019-03-12 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4847:
---

 Summary: [Python] Add pyarrow.table factory function that 
dispatches to various ctors based on type of input
 Key: ARROW-4847
 URL: https://issues.apache.org/jira/browse/ARROW-4847
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney
Assignee: Wes McKinney
 Fix For: 0.13.0


For example, in {{pyarrow.table(df)}} if {{df}} is a {{pandas.DataFrame}}, then 
table will dispatch to {{pa.Table.from_pandas}}
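The proposed factory is essentially dispatch on the input's type. The stdlib-only sketch below illustrates the pattern with `functools.singledispatch` and stand-in constructor names; the real implementation would route to `pa.Table.from_pandas`, `pa.Table.from_arrays`, and friends rather than returning tuples.

```python
from functools import singledispatch

# Stand-in constructors: the real factory would call pa.Table.from_pandas,
# pa.Table.from_arrays, pa.Table.from_batches, and so on.
@singledispatch
def table(data):
    raise TypeError(f"cannot create a table from {type(data).__name__}")

@table.register(dict)
def _(data):  # e.g. {"col": [values]} -> the from_pydict-style path
    return ("from_pydict", sorted(data))

@table.register(list)
def _(data):  # e.g. a list of record batches -> the from_batches-style path
    return ("from_batches", len(data))

assert table({"a": [1, 2], "b": [3, 4]}) == ("from_pydict", ["a", "b"])
assert table([object(), object()]) == ("from_batches", 2)
```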



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4846) [Java] Update Jackson to 2.9.8

2019-03-12 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4846:
---

 Summary: [Java] Update Jackson to 2.9.8
 Key: ARROW-4846
 URL: https://issues.apache.org/jira/browse/ARROW-4846
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Wes McKinney
Assignee: Andy Grove
 Fix For: 0.13.0


We are looking at removing Jackson from arrow-vector dependencies in ARROW-2501



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: MATLAB, Arrow, ABI's and Linux

2019-03-12 Thread Uwe L. Korn
Hello Joris,

'/data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64':
/data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64:
undefined symbol: _ZNK5arrow6Status8ToStringEv.

This sounds like fromArrowStream.mexa64 was still compiled with 
-D_GLIBCXX_USE_CXX11_ABI=0. You could try to explicitly pass 
-D_GLIBCXX_USE_CXX11_ABI=1, or, when building in a conda environment, make sure 
the compilers package is installed and use the $CC and $CXX environment 
variables to pick the right compilers. You may also need to LD_PRELOAD the 
libstdc++.so that comes with conda rather than the one from the system.
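One way to see what is going on: under `-D_GLIBCXX_USE_CXX11_ABI=1`, symbols whose signatures involve `std::string` are mangled with an extra `B5cxx11` ABI tag, so a caller built against one ABI cannot resolve symbols exported by a library built with the other. The new-ABI spelling below is an assumption based on the Itanium mangling rules, shown only to illustrate the mismatch:

```python
# The symbol from the error message (old ABI) vs. its presumed new-ABI
# spelling: std::string-returning functions gain a "B5cxx11" abi_tag.
OLD_ABI = "_ZNK5arrow6Status8ToStringEv"
NEW_ABI = "_ZNK5arrow6Status8ToStringB5cxx11Ev"

def missing_symbols(needed, exported):
    """Return the referenced symbols that the exported set cannot satisfy."""
    return sorted(set(needed) - set(exported))

# A MEX file built with the old ABI against a new-ABI libarrow: unresolved.
assert missing_symbols([OLD_ABI], {NEW_ABI}) == [OLD_ABI]
# Rebuilding the MEX file with the new ABI resolves the reference.
assert missing_symbols([NEW_ABI], {NEW_ABI}) == []
```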

Arrow packages from both Anaconda/defaults and conda-forge are nowadays built 
with the new ABI. However, they are built with slightly different toolchains, so 
it is best to install packages from only one of the two repositories and not mix 
them.

Uwe


On Tue, Mar 12, 2019, at 5:32 PM, Wes McKinney wrote:
> hi Joris,
> 
> You probably ran into the conda-forge compiler migration. I'm not sure
> about Anaconda's Apache Arrow libraries since they maintain those
> recipes.
> 
> If you need shared libraries using the gcc 4.x ABI, you may have to
> build them yourself, or use the Linux packages for the platform
> where you are working. It would be useful to have a Dockerfile that
> produces "portable" shared libraries with the RedHat devtoolset-2
> compiler
> 
> - Wes
> 
> On Tue, Mar 12, 2019 at 11:22 AM Joris Peeters
>  wrote:
> >
> > [A] Short background:
> >
> > We are working on a MEX library that converts a binary array (representing
> > an Arrow stream or file) into MATLAB structs. This is in
> > parallel/complement to what already exists in the main Arrow project, which
> > focuses on feather, but the hope is certainly to contribute back some of
> > this work. We talked to MathWorks about this already (as GSA Capital).
> >
> > The library (e.g. fromArrowStream.mexa64) gets published on
> > (company-internal) Anaconda, and upon installation the dependencies on
> > arrow-cpp, boost-cpp etc are resolved (from remote channels). All .so's end
> > up in a user-local conda environment's ../lib, which in MATLAB we make
> > available through addpath. Compilation uses  -D_GLIBCXX_USE_CXX11_ABI=0.
> >
> > [B] The issue I'm facing ...
> >
> > For quite a while (when we depended on arrow-cpp=0.10 from conda-forge)
> > this has worked fine, but lately I've encountered increasing issues wrt ABI
> > compatibility, amongst arrow, gtest (both at build time) and MATLAB (at
> > run-time).
> >
> > Is arrow-cpp 0.11 only built with the new ABI? I loaded it from both
> > defaults and conda-forge channel, and it seems different in this regard
> > than conda-forge's 0.10. Either way, I'm now attempting to build my library
> > without the -D_GLIBCXX_USE_CXX11_ABI=0 compile flag, as that seems to be
> > the more sustainable way forward.
> >
> > Question: is it possible to load a MEX library that has been compiled with
> > the new ABI? When doing this naively, I get an error like:
> >
> > Invalid MEX-file
> > '/data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64':
> > /data/home/jpeeter/apps/matlab/MATLAB/R2018a/bin/glnxa64/../../sys/os/glnxa64/libstdc++.so.6:
> > version `CXXABI_1.3.11' not found (required by
> > /data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/libarrow.so.11).
> >
> > which is fair enough.
> >
> > Alternatively, though, when loading MATLAB with an LD_PRELOAD, like
> > LD_PRELOAD=/usr/lib64/libstdc++.so.6 ~/apps/matlab/MATLAB/R2018a/bin/matlab
> >
> > I get this error:
> > Invalid MEX-file
> > '/data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64':
> > /data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64:
> > undefined symbol: _ZNK5arrow6Status8ToStringEv.
> >
> > If this isn't possible, is there a reliable/recommended Anaconda-way to
> > only bring in libraries that have been compiled with the old ABI? My
> > impression was that conda-forge libraries satisfied that, but
> > - This no longer seems to be true for arrow-cpp=0.11? I might be mistaken
> > here.
> > - We are resolving our dependencies through Anaconda, so it could be quite
> > brittle for users to explicitly have to specify a channel for certain
> > libraries. There are further subtleties wrt resolving boost & arrow from
> > different channels etc. Ideally - but not necessarily - users don't depend
> > on conda-forge at all, but only on the default channels (as an aside: we
> > use Artifactory internally).
> >
> > Essentially, I'm happy with either,
> > - having a way for MATLAB to load MEX with the new ABI, or,
> > - reliably depending on libraries compiled with the old ABI
> >
> > but I've been struggling for a while now to achieve either one of those.
> > Any pointers would be most welcome!
> >
> > (For once, this turned out to be easier on Windows than Linux!)
> >
> > Best,
> > -Joris.
>


[jira] [Created] (ARROW-4845) Compiler warnings on Windows

2019-03-12 Thread Jeroen (JIRA)
Jeroen created ARROW-4845:
-

 Summary: Compiler warnings on Windows
 Key: ARROW-4845
 URL: https://issues.apache.org/jira/browse/ARROW-4845
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Affects Versions: 0.12.1
Reporter: Jeroen


I am seeing the warnings below when compiling the R bindings on Windows. Most 
of these seem easy to fix (comparing int with size_t or int32 with int64).

{code}
array.cpp: In function 'Rcpp::LogicalVector Array__Mask(const 
std::shared_ptr&)':
array.cpp:102:24: warning: comparison of integer expressions of different 
signedness: 'size_t' {aka 'long long unsigned int'} and 'int64_t' {aka 'long 
long int'} [-Wsign-compare]
   for (size_t i = 0; i < array->length(); i++, bitmap_reader.Next()) {
  ~~^
/mingw64/bin/g++  -std=gnu++11 -I"C:/PROGRA~1/R/R-testing/include" -DNDEBUG 
-DARROW_STATIC -I"C:/R/library/Rcpp/include"-O2 -Wall  -mtune=generic 
-c array__to_vector.cpp -o array__to_vector.o
array__to_vector.cpp: In member function 'virtual arrow::Status 
arrow::r::Converter_Boolean::Ingest_some_nulls(SEXP, const 
std::shared_ptr&, R_xlen_t, R_xlen_t) const':
array__to_vector.cpp:254:28: warning: comparison of integer expressions of 
different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' 
{aka 'long long int'} [-Wsign-compare]
   for (size_t i = 0; i < n; i++, data_reader.Next(), null_reader.Next(), 
++p_data) {
  ~~^~~
array__to_vector.cpp:258:28: warning: comparison of integer expressions of 
different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' 
{aka 'long long int'} [-Wsign-compare]
   for (size_t i = 0; i < n; i++, data_reader.Next(), ++p_data) {
  ~~^~~
array__to_vector.cpp: In member function 'virtual arrow::Status 
arrow::r::Converter_Decimal::Ingest_some_nulls(SEXP, const 
std::shared_ptr&, R_xlen_t, R_xlen_t) const':
array__to_vector.cpp:473:28: warning: comparison of integer expressions of 
different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' 
{aka 'long long int'} [-Wsign-compare]
   for (size_t i = 0; i < n; i++, bitmap_reader.Next(), ++p_data) {
  ~~^~~
array__to_vector.cpp:478:28: warning: comparison of integer expressions of 
different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' 
{aka 'long long int'} [-Wsign-compare]
   for (size_t i = 0; i < n; i++, ++p_data) {
  ~~^~~
array__to_vector.cpp: In member function 'virtual arrow::Status 
arrow::r::Converter_Int64::Ingest_some_nulls(SEXP, const 
std::shared_ptr&, R_xlen_t, R_xlen_t) const':
array__to_vector.cpp:515:28: warning: comparison of integer expressions of 
different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' 
{aka 'long long int'} [-Wsign-compare]
   for (size_t i = 0; i < n; i++, bitmap_reader.Next(), ++p_data) {
  ~~^~~
array__to_vector.cpp: In instantiation of 'arrow::Status 
arrow::r::SomeNull_Ingest(SEXP, R_xlen_t, R_xlen_t, const array_value_type*, 
const std::shared_ptr&, Lambda) [with int RTYPE = 14; 
array_value_type = long long int; Lambda = 
arrow::r::Converter_Date64::Ingest_some_nulls(SEXP, const 
std::shared_ptr&, R_xlen_t, R_xlen_t) const::; 
SEXP = SEXPREC*; R_xlen_t = long long int]':
array__to_vector.cpp:366:77:   required from here
array__to_vector.cpp:116:26: warning: comparison of integer expressions of 
different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' 
{aka 'long long int'} [-Wsign-compare]
 for (size_t i = 0; i < n; i++, bitmap_reader.Next(), ++p_data, ++p_values) 
{
~~^~~
array__to_vector.cpp: In instantiation of 'arrow::Status 
arrow::r::SomeNull_Ingest(SEXP, R_xlen_t, R_xlen_t, const array_value_type*, 
const std::shared_ptr&, Lambda) [with int RTYPE = 13; 
array_value_type = unsigned char; Lambda = 
arrow::r::Converter_Dictionary::Ingest_some_nulls_Impl(SEXP, const 
std::shared_ptr&, R_xlen_t, R_xlen_t) const [with Type = 
arrow::UInt8Type; SEXP = SEXPREC*; R_xlen_t = long long 
int]::; SEXP = SEXPREC*; R_xlen_t = long long int]':
array__to_vector.cpp:341:47:   required from 'arrow::Status 
arrow::r::Converter_Dictionary::Ingest_some_nulls_Impl(SEXP, const 
std::shared_ptr&, R_xlen_t, R_xlen_t) const [with Type = 
arrow::UInt8Type; SEXP = SEXPREC*; R_xlen_t = long long int]'
array__to_vector.cpp:313:78:   required from here
array__to_vector.cpp:116:26: warning: comparison of integer expressions of 
different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' 
{aka 'long long int'} [-Wsign-compare]
array__to_vector.cpp: In instantiation of 'arrow::Status 
arrow::r::SomeNull_Ingest(SEXP, R_xlen_t, R_xlen_t, const array_value_type*, 
const std::shared_ptr&, Lambda) [with int RTYPE = 13; 
array_value

[jira] [Created] (ARROW-4844) Static libarrow is missing vendored libdouble-conversion

2019-03-12 Thread Jeroen (JIRA)
Jeroen created ARROW-4844:
-

 Summary: Static libarrow is missing vendored libdouble-conversion
 Key: ARROW-4844
 URL: https://issues.apache.org/jira/browse/ARROW-4844
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.12.1
Reporter: Jeroen


When trying to statically link libarrow.a I get linker errors which suggest 
that libdouble-conversion.a was not properly embedded in libarrow.a. This 
problem happens on both macOS and Windows.

For the R bindings we need static libraries.

{code}
C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
 
C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(cast.cc.obj):(.text+0x1c77c):
 undefined reference to 
`double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
int*) const'
C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
 
C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x5fda):
 undefined reference to 
`double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
int*) const'
C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
 
C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6097):
 undefined reference to 
`double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, 
int*) const'
C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
 
C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6589):
 undefined reference to 
`double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, 
int*) const'
C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
 
C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6647):
 undefined reference to 
`double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, 
int*) const'
{code}
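Until the vendored objects are merged into libarrow.a, one possible workaround for the R bindings is to link the static double-conversion library explicitly. The Makevars fragment below is a hypothetical sketch: the `$(ARROW_LIB_DIR)` variable and the availability of a static libdouble-conversion.a in the toolchain are assumptions.

```make
# Hypothetical Makevars fragment for the R bindings: list the vendored
# library after libparquet/libarrow so the linker can resolve the
# double_conversion::StringToDoubleConverter references.
PKG_LIBS = -L$(ARROW_LIB_DIR) -lparquet -larrow -ldouble-conversion
```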



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4843) [Rust] [DataFusion] Parquet data source should support DATE

2019-03-12 Thread Andy Grove (JIRA)
Andy Grove created ARROW-4843:
-

 Summary: [Rust] [DataFusion] Parquet data source should support 
DATE
 Key: ARROW-4843
 URL: https://issues.apache.org/jira/browse/ARROW-4843
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Affects Versions: 0.13.0
Reporter: Andy Grove
 Fix For: 0.13.0


The new Parquet data source (ARROW-4466) fails with "Unable to convert parquet 
logical type DATE" when reading some parquet files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4840) [C++] Persist CMake options in generated header

2019-03-12 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4840:
--

 Summary: [C++] Persist CMake options in generated header
 Key: ARROW-4840
 URL: https://issues.apache.org/jira/browse/ARROW-4840
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.14.0


(Do this after the CMake refactor is merged.)

We should export all compile-time options, such as ARROW_WITH_ZSTD or 
ARROW_PARQUET, in a header file so that other libraries depending on Arrow C++ 
can respect them in their builds.
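For example, this could be done with a `configure_file()` template, where each option enabled at build time becomes a #define in the installed header. The file name and macro list below are hypothetical:

```c
/* Hypothetical arrow/util/config.h.in, processed by CMake's configure_file().
   Each option enabled at build time becomes a #define in the installed header. */
#cmakedefine ARROW_WITH_ZSTD
#cmakedefine ARROW_PARQUET

/* Downstream code could then test the options at compile time:
   #ifdef ARROW_PARQUET ... #endif */
```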



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Timeline for 0.13 Arrow release

2019-03-12 Thread Andy Grove
I've cleaned up my issues for Rust, moving most of them to 0.14.0.

I have two PRs in progress that I would appreciate reviews on:

https://github.com/apache/arrow/pull/3671 - [Rust] Table API (a.k.a
DataFrame)

https://github.com/apache/arrow/pull/3851 - [Rust] Parquet data source in
DataFusion

Once these are merged I have some small follow-up PRs for 0.13.0 that I can
get done this week.

Thanks,

Andy.


On Tue, Mar 12, 2019 at 8:21 AM Wes McKinney  wrote:

> hi folks,
>
> I think we are on track to be able to release toward the end of this
> month. My proposed timeline:
>
> * This week (March 11-15): feature/improvement push mostly
> * Next week (March 18-22): shift to bug fixes, stabilization, empty
> backlog of feature/improvement JIRAs
> * Week of March 25: propose release candidate
>
> Does this seem reasonable? This puts us at about 9-10 weeks from 0.12.
>
> We need an RM for 0.13; do any PMC members want to volunteer?
>
> Take a look at our release page:
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103091219
>
> Out of the open or in-progress issues, we have:
>
> * C#: 3 issues
> * C++ (all components): 51 issues
> * Java: 3 issues
> * Python: 38 issues
> * Rust (all components): 33 issues
>
> Please help curate the backlogs for each component. There's a
> smattering of issues in other categories. There are also 10 open
> issues with No Component (and 20 resolved issues); those need their
> metadata fixed.
>
> Thanks,
> Wes
>
> On Wed, Feb 27, 2019 at 1:49 PM Wes McKinney  wrote:
> >
> > The timeline for the 0.13 release is drawing closer. I would say we
> > should consider a release candidate either the week of March 18 or
> > March 25, which gives us ~3 weeks to close out backlog items.
> >
> > There are around 220 issues open or in-progress in
> >
> > https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.13.0+Release
> >
> > Please have a look. If issues are not assigned to someone as the next
> > couple of weeks pass by I'll begin moving at least C++ and Python
> > issues to 0.14 that don't seem like they're going to get done for
> > 0.13. If development stakeholders for C#, Java, Rust, Ruby, and other
> > components can review and curate the issues that would be helpful.
> >
> > You can help keep the JIRA issues tidy by making sure to add Fix
> > Version to issues and to make sure to add a Component so that issues
> > are properly categorized in the release notes.
> >
> > Thanks
> > Wes
> >
> > On Sat, Feb 9, 2019 at 10:39 AM Wes McKinney wrote:
> > >
> > > See https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide
> > >
> > > The source release step is one of the places where problems occur.
> > >
> > > On Sat, Feb 9, 2019, 10:33 AM  > >>
> > >>
> > >> > On Feb 8, 2019, at 9:19 AM, Uwe L. Korn wrote:
> > >> >
> > >> > We could dockerize some of the release steps to ensure that they
> > >> > run in the same environment.
> > >>
> > >> I may be able to help with said Dockerization. If not for this
> > >> release, then for the next. Are there docs on which systems we wish
> > >> to target and/or any build steps beyond the current dev container
> > >> (https://github.com/apache/arrow/tree/master/dev/container)?
>


[jira] [Created] (ARROW-4842) [C++] Persist CMake options in pkg-config files

2019-03-12 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4842:
--

 Summary: [C++] Persist CMake options in pkg-config files
 Key: ARROW-4842
 URL: https://issues.apache.org/jira/browse/ARROW-4842
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Packaging
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.14.0


Persist options like ARROW_WITH_ZSTD in {{arrow.pc}} so libraries can determine 
which features are available.
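For instance, arrow.pc could carry a custom variable that consumers read with `pkg-config --variable`; the variable name `features` below is hypothetical:

```
# Hypothetical additions to arrow.pc
features=ARROW_WITH_ZSTD ARROW_PARQUET

# A depending build could then query:
#   pkg-config --variable=features arrow
```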



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: MATLAB, Arrow, ABI's and Linux

2019-03-12 Thread Wes McKinney
hi Joris,

You probably ran into the conda-forge compiler migration. I'm not sure
about Anaconda's Apache Arrow libraries since they maintain those
recipes.

If you need shared libraries using the gcc 4.x ABI, you may have to
build them yourself, or use the Linux packages for the platform
where you are working. It would be useful to have a Dockerfile that
produces "portable" shared libraries with the RedHat devtoolset-2
compiler

- Wes

On Tue, Mar 12, 2019 at 11:22 AM Joris Peeters
 wrote:
>
> [A] Short background:
>
> We are working on a MEX library that converts a binary array (representing
> an Arrow stream or file) into MATLAB structs. This is in
> parallel/complement to what already exists in the main Arrow project, which
> focuses on feather, but the hope is certainly to contribute back some of
> this work. We talked to MathWorks about this already (as GSA Capital).
>
> The library (e.g. fromArrowStream.mexa64) gets published on
> (company-internal) Anaconda, and upon installation the dependencies on
> arrow-cpp, boost-cpp etc are resolved (from remote channels). All .so's end
> up in a user-local conda environment's ../lib, which in MATLAB we make
> available through addpath. Compilation uses  -D_GLIBCXX_USE_CXX11_ABI=0.
>
> [B] The issue I'm facing ...
>
> For quite a while (when we depended on arrow-cpp=0.10 from conda-forge)
> this has worked fine, but lately I've encountered increasing issues wrt ABI
> compatibility, amongst arrow, gtest (both at build time) and MATLAB (at
> run-time).
>
> Is arrow-cpp 0.11 only built with the new ABI? I loaded it from both
> defaults and conda-forge channel, and it seems different in this regard
> than conda-forge's 0.10. Either way, I'm now attempting to build my library
> without the -D_GLIBCXX_USE_CXX11_ABI=0 compile flag, as that seems to be
> the more sustainable way forward.
>
> Question: is it possible to load a MEX library that has been compiled with
> the new ABI? When doing this naively, I get an error like:
>
> Invalid MEX-file
> '/data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64':
> /data/home/jpeeter/apps/matlab/MATLAB/R2018a/bin/glnxa64/../../sys/os/glnxa64/libstdc++.so.6:
> version `CXXABI_1.3.11' not found (required by
> /data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/libarrow.so.11).
>
> which is fair enough.
>
> Alternatively, though, when loading MATLAB with an LD_PRELOAD, like
> LD_PRELOAD=/usr/lib64/libstdc++.so.6 ~/apps/matlab/MATLAB/R2018a/bin/matlab
>
> I get this error:
> Invalid MEX-file
> '/data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64':
> /data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64:
> undefined symbol: _ZNK5arrow6Status8ToStringEv.
>
> If this isn't possible, is there a reliable/recommended Anaconda-way to
> only bring in libraries that have been compiled with the old ABI? My
> impression was that conda-forge libraries satisfied that, but
> - This no longer seems to be true for arrow-cpp=0.11? I might be mistaken
> here.
> - We are resolving our dependencies through Anaconda, so it could be quite
> brittle for users to explicitly have to specify a channel for certain
> libraries. There are further subtleties wrt resolving boost & arrow from
> different channels etc. Ideally - but not necessarily - users don't depend
> on conda-forge at all, but only on the default channels (as an aside: we
> use Artifactory internally).
>
> Essentially, I'm happy with either,
> - having a way for MATLAB to load MEX with the new ABI, or,
> - reliably depending on libraries compiled with the old ABI
>
> but I've been struggling for a while now to achieve either one of those.
> Any pointers would be most welcome!
>
> (For once, this turned out to be easier on Windows than Linux!)
>
> Best,
> -Joris.


[jira] [Created] (ARROW-4841) [C++] Persist CMake options in generated CMake config

2019-03-12 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4841:
--

 Summary: [C++] Persist CMake options in generated CMake config
 Key: ARROW-4841
 URL: https://issues.apache.org/jira/browse/ARROW-4841
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Packaging
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.14.0


(Do this after the CMake refactor is merged.)

We should also persist all options set during the CMake run in 
{{arrowConfig.cmake}} so that CMake projects that depend on Arrow can determine 
which features are available.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


MATLAB, Arrow, ABI's and Linux

2019-03-12 Thread Joris Peeters
[A] Short background:

We are working on a MEX library that converts a binary array (representing
an Arrow stream or file) into MATLAB structs. This is in
parallel/complement to what already exists in the main Arrow project, which
focuses on feather, but the hope is certainly to contribute back some of
this work. We talked to MathWorks about this already (as GSA Capital).

The library (e.g. fromArrowStream.mexa64) gets published on
(company-internal) Anaconda, and upon installation the dependencies on
arrow-cpp, boost-cpp etc are resolved (from remote channels). All .so's end
up in a user-local conda environment's ../lib, which in MATLAB we make
available through addpath. Compilation uses  -D_GLIBCXX_USE_CXX11_ABI=0.

[B] The issue I'm facing ...

For quite a while (when we depended on arrow-cpp=0.10 from conda-forge)
this has worked fine, but lately I've encountered increasing issues wrt ABI
compatibility, amongst arrow, gtest (both at build time) and MATLAB (at
run-time).

Is arrow-cpp 0.11 only built with the new ABI? I loaded it from both
defaults and conda-forge channel, and it seems different in this regard
than conda-forge's 0.10. Either way, I'm now attempting to build my library
without the -D_GLIBCXX_USE_CXX11_ABI=0 compile flag, as that seems to be
the more sustainable way forward.

Question: is it possible to load a MEX library that has been compiled with
the new ABI? When doing this naively, I get an error like:

Invalid MEX-file
'/data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64':
/data/home/jpeeter/apps/matlab/MATLAB/R2018a/bin/glnxa64/../../sys/os/glnxa64/libstdc++.so.6:
version `CXXABI_1.3.11' not found (required by
/data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/libarrow.so.11).

which is fair enough.

Alternatively, though, when loading MATLAB with an LD_PRELOAD, like
LD_PRELOAD=/usr/lib64/libstdc++.so.6 ~/apps/matlab/MATLAB/R2018a/bin/matlab

I get this error:
Invalid MEX-file
'/data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64':
/data/home/jpeeter/apps/anaconda3/envs/testarrow/lib/fromArrowStream.mexa64:
undefined symbol: _ZNK5arrow6Status8ToStringEv.

If this isn't possible, is there a reliable/recommended Anaconda-way to
only bring in libraries that have been compiled with the old ABI? My
impression was that conda-forge libraries satisfied that, but
- This no longer seems to be true for arrow-cpp=0.11? I might be mistaken
here.
- We are resolving our dependencies through Anaconda, so it could be quite
brittle for users to explicitly have to specify a channel for certain
libraries. There are further subtleties wrt resolving boost & arrow from
different channels etc. Ideally - but not necessarily - users don't depend
on conda-forge at all, but only on the default channels (as an aside: we
use Artifactory internally).

Essentially, I'm happy with either,
- having a way for MATLAB to load MEX with the new ABI, or,
- reliably depending on libraries compiled with the old ABI

but I've been struggling for a while now to achieve either one of those.
Any pointers would be most welcome!

(For once, this turned out to be easier on Windows than Linux!)
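As an aside, one quick way to tell which ABI a given .so was built with is to look for std::__cxx11 mangled symbols in its dynamic symbol table (e.g. nm -D --defined-only libarrow.so). A small sketch of that check, with a hypothetical helper and assuming nm is on PATH:

```python
import subprocess

def uses_new_cxx11_abi(nm_output: str) -> bool:
    """True if any symbol is mangled under std::__cxx11, i.e. the library
    was built with -D_GLIBCXX_USE_CXX11_ABI=1 (the new ABI)."""
    return "__cxx11" in nm_output

def check_library(path: str) -> bool:
    # Hypothetical helper: dump the dynamic symbol table and inspect it.
    out = subprocess.run(
        ["nm", "-D", "--defined-only", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return uses_new_cxx11_abi(out)
```

Running check_library on libarrow.so versus the MEX file would confirm whether the two sides disagree on the ABI.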

Best,
-Joris.


[jira] [Created] (ARROW-4839) [C#] Add NuGet support

2019-03-12 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4839:
---

 Summary: [C#] Add NuGet support
 Key: ARROW-4839
 URL: https://issues.apache.org/jira/browse/ARROW-4839
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt
Assignee: Eric Erhardt


We should add the metadata to the .csproj so we can create a NuGet package 
without changing any source code.

Also, we should add any scripts and documentation on how to create the NuGet 
package to allow ease of creation at release time.





Re: Timeline for 0.13 Arrow release

2019-03-12 Thread Wes McKinney
I disagree. I think we need a forcing function to create some urgency
to work out any of the problems that may come up. If we need to make a
0.13.1 to fix problems that are not caught during the release (and
producing binary artifacts) then I think that is okay

On Tue, Mar 12, 2019 at 9:53 AM Krisztián Szűcs
 wrote:
>
> Hi,
>
> The CMake refactor is huge enough to cause unexpected post-release
> defects. I'd consider shipping it with the next release and let it stabilize
> during the development of 0.14.
>
>
> On Tue, Mar 12, 2019 at 3:41 PM Wes McKinney  wrote:
>
> > hi Uwe,
> >
> > I would be OK with trying to release next week. Let's see what others
> > think. Getting the CMake refactor in and packaging sorted out is the
> > big priority for the C++-using ecosystem. I haven't seen anything
> > hugely pressing in the other impls based on the recent patch flow.
> > Having C# NuGet package working would be nice
> >
> > - Wes
> >
> > On Tue, Mar 12, 2019 at 9:37 AM Uwe L. Korn  wrote:
> > >
> > > Hello,
> > >
> > > two things come to my mind:
> > >
> > >  * I'm spending now some time again on the CMake refactor. The only
> > blocker I currently see is the issue with detecting the correct gtest
> > libraries on Windows. Otherwise this can go in. Being very confident about
> > my work: This will break things even though CI is green. A safer bet from
> > my side would be to make a release and then merge once the release vote
> > passed. I would then like to aim for a shorter cycle for 0.14 to get this
> > working and green as a lot of the refactor is needed to get packages into
> > distributions and thus also clear a bit the path for the availability of an
> > R package for arrow.
> > >  * I could also volunteer to be RM *if we do the release process
> > starting March 18*. I have limited time the week after but enough spare to
> > make a release in the week of March 18.
> > >
> > > Uwe
> > >
> > > On Tue, Mar 12, 2019, at 3:21 PM, Wes McKinney wrote:
> > > > hi folks,
> > > >
> > > > I think we are on track to be able to release toward the end of this
> > > > month. My proposed timeline:
> > > >
> > > > * This week (March 11-15): feature/improvement push mostly
> > > > * Next week (March 18-22): shift to bug fixes, stabilization, empty
> > > > backlog of feature/improvement JIRAs
> > > > * Week of March 25: propose release candidate
> > > >
> > > > Does this seem reasonable? This puts us at about 9-10 weeks from 0.12.
> > > >
> > > > We need an RM for 0.13, any PMCs want to volunteer?
> > > >
> > > > Take a look at our release page:
> > > >
> > > >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103091219
> > > >
> > > > Out of the open or in-progress issues, we have:
> > > >
> > > > * C#: 3 issues
> > > > * C++ (all components): 51 issues
> > > > * Java: 3 issues
> > > > * Python: 38 issues
> > > > * Rust (all components): 33 issues
> > > >
> > > > Please help curating the backlogs for each component. There's a
> > > > smattering of issues in other categories. There are also 10 open
> > > > issues with No Component (and 20 resolved issues), those need their
> > > > metadata fixed.
> > > >
> > > > Thanks,
> > > > Wes
> > > >
> > > > On Wed, Feb 27, 2019 at 1:49 PM Wes McKinney 
> > wrote:
> > > > >
> > > > > The timeline for the 0.13 release is drawing closer. I would say we
> > > > > should consider a release candidate either the week of March 18 or
> > > > > March 25, which gives us ~3 weeks to close out backlog items.
> > > > >
> > > > > There are around 220 issues open or in-progress in
> > > > >
> > > > >
> > https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.13.0+Release
> > > > >
> > > > > Please have a look. If issues are not assigned to someone as the next
> > > > > couple of weeks pass by I'll begin moving at least C++ and Python
> > > > > issues to 0.14 that don't seem like they're going to get done for
> > > > > 0.13. If development stakeholders for C#, Java, Rust, Ruby, and other
> > > > > components can review and curate the issues that would be helpful.
> > > > >
> > > > > You can help keep the JIRA issues tidy by making sure to add Fix
> > > > > Version to issues and to make sure to add a Component so that issues
> > > > > are properly categorized in the release notes.
> > > > >
> > > > > Thanks
> > > > > Wes
> > > > >
> > > > > On Sat, Feb 9, 2019 at 10:39 AM Wes McKinney 
> > wrote:
> > > > > >
> > > > > > See
> > https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide
> > > > > >
> > > > > > The source release step is one of the places where problems occur.
> > > > > >
> > > > > > On Sat, Feb 9, 2019, 10:33 AM  > > > > >>
> > > > > >>
> > > > > >> > On Feb 8, 2019, at 9:19 AM, Uwe L. Korn 
> > wrote:
> > > > > >> >
> > > > > >> > We could dockerize some of the release steps to ensure that
> > they run in the same environment.
> > > > > >>
> > > > > >> I may be able to help with said Dockerization. If not for this
> > release, then for the next.

RE: Timeline for 0.13 Arrow release

2019-03-12 Thread Eric Erhardt
> Having C# NuGet package working would be nice

I've opened https://issues.apache.org/jira/browse/ARROW-4839 for this and I 
will be working on it this week. I'd like to see an official Arrow NuGet 
package on www.nuget.org soon, so getting it in this release would work out 
perfectly from my side.


Re: Timeline for 0.13 Arrow release

2019-03-12 Thread Krisztián Szűcs
Hi,

The CMake refactor is huge enough to cause unexpected post-release
defects. I'd consider shipping it with the next release and let it stabilize
during the development of 0.14.


On Tue, Mar 12, 2019 at 3:41 PM Wes McKinney  wrote:

> hi Uwe,
>
> I would be OK with trying to release next week. Let's see what others
> think. Getting the CMake refactor in and packaging sorted out is the
> big priority for the C++-using ecosystem. I haven't seen anything
> hugely pressing in the other impls based on the recent patch flow.
> Having C# NuGet package working would be nice
>
> - Wes
>
> On Tue, Mar 12, 2019 at 9:37 AM Uwe L. Korn  wrote:
> >
> > Hello,
> >
> > two things come to my mind:
> >
> >  * I'm spending now some time again on the CMake refactor. The only
> blocker I currently see is the issue with detecting the correct gtest
> libraries on Windows. Otherwise this can go in. Being very confident about
> my work: This will break things even though CI is green. A safer bet from
> my side would be to make a release and then merge once the release vote
> passed. I would then like to aim for a shorter cycle for 0.14 to get this
> working and green as a lot of the refactor is needed to get packages into
> distributions and thus also clear a bit the path for the availability of an
> R package for arrow.
> >  * I could also volunteer to be RM *if we do the release process
> starting March 18*. I have limited time the week after but enough spare to
> make a release in the week of March 18.
> >
> > Uwe
> >
> > On Tue, Mar 12, 2019, at 3:21 PM, Wes McKinney wrote:
> > > hi folks,
> > >
> > > I think we are on track to be able to release toward the end of this
> > > month. My proposed timeline:
> > >
> > > * This week (March 11-15): feature/improvement push mostly
> > > * Next week (March 18-22): shift to bug fixes, stabilization, empty
> > > backlog of feature/improvement JIRAs
> > > * Week of March 25: propose release candidate
> > >
> > > Does this seem reasonable? This puts us at about 9-10 weeks from 0.12.
> > >
> > > We need an RM for 0.13, any PMCs want to volunteer?
> > >
> > > Take a look at our release page:
> > >
> > >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103091219
> > >
> > > Out of the open or in-progress issues, we have:
> > >
> > > * C#: 3 issues
> > > * C++ (all components): 51 issues
> > > * Java: 3 issues
> > > * Python: 38 issues
> > > * Rust (all components): 33 issues
> > >
> > > Please help curating the backlogs for each component. There's a
> > > smattering of issues in other categories. There are also 10 open
> > > issues with No Component (and 20 resolved issues), those need their
> > > metadata fixed.
> > >
> > > Thanks,
> > > Wes
> > >
> > > On Wed, Feb 27, 2019 at 1:49 PM Wes McKinney 
> wrote:
> > > >
> > > > The timeline for the 0.13 release is drawing closer. I would say we
> > > > should consider a release candidate either the week of March 18 or
> > > > March 25, which gives us ~3 weeks to close out backlog items.
> > > >
> > > > There are around 220 issues open or in-progress in
> > > >
> > > >
> https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.13.0+Release
> > > >
> > > > Please have a look. If issues are not assigned to someone as the next
> > > > couple of weeks pass by I'll begin moving at least C++ and Python
> > > > issues to 0.14 that don't seem like they're going to get done for
> > > > 0.13. If development stakeholders for C#, Java, Rust, Ruby, and other
> > > > components can review and curate the issues that would be helpful.
> > > >
> > > > You can help keep the JIRA issues tidy by making sure to add Fix
> > > > Version to issues and to make sure to add a Component so that issues
> > > > are properly categorized in the release notes.
> > > >
> > > > Thanks
> > > > Wes
> > > >
> > > > On Sat, Feb 9, 2019 at 10:39 AM Wes McKinney 
> wrote:
> > > > >
> > > > > See
> https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide
> > > > >
> > > > > The source release step is one of the places where problems occur.
> > > > >
> > > > > On Sat, Feb 9, 2019, 10:33 AM  > > > >>
> > > > >>
> > > > >> > On Feb 8, 2019, at 9:19 AM, Uwe L. Korn 
> wrote:
> > > > >> >
> > > > >> > We could dockerize some of the release steps to ensure that
> they run in the same environment.
> > > > >>
> > > > >> I may be able to help with said Dockerization. If not for this
> release, then for the next. Are there docs on which systems we wish to
> target and/or any build steps beyond the current dev container (
> https://github.com/apache/arrow/tree/master/dev/container)?
> > >
>


Re: Timeline for 0.13 Arrow release

2019-03-12 Thread Wes McKinney
hi Uwe,

I would be OK with trying to release next week. Let's see what others
think. Getting the CMake refactor in and packaging sorted out is the
big priority for the C++-using ecosystem. I haven't seen anything
hugely pressing in the other impls based on the recent patch flow.
Having C# NuGet package working would be nice

- Wes

On Tue, Mar 12, 2019 at 9:37 AM Uwe L. Korn  wrote:
>
> Hello,
>
> two things come to my mind:
>
>  * I'm spending now some time again on the CMake refactor. The only blocker I 
> currently see is the issue with detecting the correct gtest libraries on 
> Windows. Otherwise this can go in. Being very confident about my work: This 
> will break things even though CI is green. A safer bet from my side would be 
> to make a release and then merge once the release vote passed. I would then 
> like to aim for a shorter cycle for 0.14 to get this working and green as a 
> lot of the refactor is needed to get packages into distributions and thus 
> also clear a bit the path for the availability of an R package for arrow.
>  * I could also volunteer to be RM *if we do the release process starting 
> March 18*. I have limited time the week after but enough spare to make a 
> release in the week of March 18.
>
> Uwe
>
> On Tue, Mar 12, 2019, at 3:21 PM, Wes McKinney wrote:
> > hi folks,
> >
> > I think we are on track to be able to release toward the end of this
> > month. My proposed timeline:
> >
> > * This week (March 11-15): feature/improvement push mostly
> > * Next week (March 18-22): shift to bug fixes, stabilization, empty
> > backlog of feature/improvement JIRAs
> > * Week of March 25: propose release candidate
> >
> > Does this seem reasonable? This puts us at about 9-10 weeks from 0.12.
> >
> > We need an RM for 0.13, any PMCs want to volunteer?
> >
> > Take a look at our release page:
> >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103091219
> >
> > Out of the open or in-progress issues, we have:
> >
> > * C#: 3 issues
> > * C++ (all components): 51 issues
> > * Java: 3 issues
> > * Python: 38 issues
> > * Rust (all components): 33 issues
> >
> > Please help curating the backlogs for each component. There's a
> > smattering of issues in other categories. There are also 10 open
> > issues with No Component (and 20 resolved issues), those need their
> > metadata fixed.
> >
> > Thanks,
> > Wes
> >
> > On Wed, Feb 27, 2019 at 1:49 PM Wes McKinney  wrote:
> > >
> > > The timeline for the 0.13 release is drawing closer. I would say we
> > > should consider a release candidate either the week of March 18 or
> > > March 25, which gives us ~3 weeks to close out backlog items.
> > >
> > > There are around 220 issues open or in-progress in
> > >
> > > https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.13.0+Release
> > >
> > > Please have a look. If issues are not assigned to someone as the next
> > > couple of weeks pass by I'll begin moving at least C++ and Python
> > > issues to 0.14 that don't seem like they're going to get done for
> > > 0.13. If development stakeholders for C#, Java, Rust, Ruby, and other
> > > components can review and curate the issues that would be helpful.
> > >
> > > You can help keep the JIRA issues tidy by making sure to add Fix
> > > Version to issues and to make sure to add a Component so that issues
> > > are properly categorized in the release notes.
> > >
> > > Thanks
> > > Wes
> > >
> > > On Sat, Feb 9, 2019 at 10:39 AM Wes McKinney  wrote:
> > > >
> > > > See 
> > > > https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide
> > > >
> > > > The source release step is one of the places where problems occur.
> > > >
> > > > On Sat, Feb 9, 2019, 10:33 AM  > > >>
> > > >>
> > > >> > On Feb 8, 2019, at 9:19 AM, Uwe L. Korn  wrote:
> > > >> >
> > > >> > We could dockerize some of the release steps to ensure that they run 
> > > >> > in the same environment.
> > > >>
> > > >> I may be able to help with said Dockerization. If not for this 
> > > >> release, then for the next. Are there docs on which systems we wish to 
> > > >> target and/or any build steps beyond the current dev container 
> > > >> (https://github.com/apache/arrow/tree/master/dev/container)?
> >


Re: Timeline for 0.13 Arrow release

2019-03-12 Thread Uwe L. Korn
Hello,

two things come to my mind:

 * I'm spending now some time again on the CMake refactor. The only blocker I 
currently see is the issue with detecting the correct gtest libraries on 
Windows. Otherwise this can go in. Being very confident about my work: This 
will break things even though CI is green. A safer bet from my side would be to 
make a release and then merge once the release vote passed. I would then like 
to aim for a shorter cycle for 0.14 to get this working and green as a lot of 
the refactor is needed to get packages into distributions and thus also clear a 
bit the path for the availability of an R package for arrow.
 * I could also volunteer to be RM *if we do the release process starting March 
18*. I have limited time the week after but enough spare to make a release in 
the week of March 18.

Uwe

On Tue, Mar 12, 2019, at 3:21 PM, Wes McKinney wrote:
> hi folks,
> 
> I think we are on track to be able to release toward the end of this
> month. My proposed timeline:
> 
> * This week (March 11-15): feature/improvement push mostly
> * Next week (March 18-22): shift to bug fixes, stabilization, empty
> backlog of feature/improvement JIRAs
> * Week of March 25: propose release candidate
> 
> Does this seem reasonable? This puts us at about 9-10 weeks from 0.12.
> 
> We need an RM for 0.13, any PMCs want to volunteer?
> 
> Take a look at our release page:
> 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103091219
> 
> Out of the open or in-progress issues, we have:
> 
> * C#: 3 issues
> * C++ (all components): 51 issues
> * Java: 3 issues
> * Python: 38 issues
> * Rust (all components): 33 issues
> 
> Please help curating the backlogs for each component. There's a
> smattering of issues in other categories. There are also 10 open
> issues with No Component (and 20 resolved issues), those need their
> metadata fixed.
> 
> Thanks,
> Wes
> 
> On Wed, Feb 27, 2019 at 1:49 PM Wes McKinney  wrote:
> >
> > The timeline for the 0.13 release is drawing closer. I would say we
> > should consider a release candidate either the week of March 18 or
> > March 25, which gives us ~3 weeks to close out backlog items.
> >
> > There are around 220 issues open or in-progress in
> >
> > https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.13.0+Release
> >
> > Please have a look. If issues are not assigned to someone as the next
> > couple of weeks pass by I'll begin moving at least C++ and Python
> > issues to 0.14 that don't seem like they're going to get done for
> > 0.13. If development stakeholders for C#, Java, Rust, Ruby, and other
> > components can review and curate the issues that would be helpful.
> >
> > You can help keep the JIRA issues tidy by making sure to add Fix
> > Version to issues and to make sure to add a Component so that issues
> > are properly categorized in the release notes.
> >
> > Thanks
> > Wes
> >
> > On Sat, Feb 9, 2019 at 10:39 AM Wes McKinney  wrote:
> > >
> > > See 
> > > https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide
> > >
> > > The source release step is one of the places where problems occur.
> > >
> > > On Sat, Feb 9, 2019, 10:33 AM  > >>
> > >>
> > >> > On Feb 8, 2019, at 9:19 AM, Uwe L. Korn  wrote:
> > >> >
> > >> > We could dockerize some of the release steps to ensure that they run 
> > >> > in the same environment.
> > >>
> > >> I may be able to help with said Dockerization. If not for this release, 
> > >> then for the next. Are there docs on which systems we wish to target 
> > >> and/or any build steps beyond the current dev container 
> > >> (https://github.com/apache/arrow/tree/master/dev/container)?
>


Re: Publishing C# NuGet package

2019-03-12 Thread Wes McKinney
thanks Eric -- that sounds great. I think we're going to want to cut
the 0.13 release candidate around 2 weeks from now, so that gives some
time to get the packaging things sorted out

- Wes

On Thu, Mar 7, 2019 at 4:46 PM Eric Erhardt
 wrote:
>
> > Some changes may need to be made to the release scripts to update C# 
> > metadata files. The intent is to make it so that the code artifact can be 
> > pushed to a package manager using the official ASF release artifact. If we 
> > don't get it 100% right for 0.13 then at least we can get a preliminary 
> > package up there and do things 100% by the books in 0.14.
>
> The way you build a NuGet package is you call `dotnet pack` on the `.csproj` 
> file. That will build the .NET assembly (.dll) and package it into a NuGet 
> package (.nupkg, which is a glorified .zip file). That `.nupkg` file is then 
> published to the nuget.org website.
>
> In order to publish it to nuget.org, an account will need to be made to 
> publish it under. Is that something a PMC member can/will do? The intention 
> is for the published package to be the official "Apache Arrow" nuget package.
>
> The .nupkg file can optionally be signed. See 
> https://docs.microsoft.com/en-us/nuget/create-packages/sign-a-package.
>
> I can create a JIRA to add all the appropriate NuGet metadata to the .csproj 
> in the repo. That way no file committed into the repo will need to change in 
> order to create the NuGet package. I can also add the instructions to create 
> the NuGet into the csharp README file in that PR.
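For reference, the kind of metadata involved is roughly the following (an illustrative sketch only; the exact property values would be decided in the JIRA/PR):

```xml
<!-- Illustrative sketch of NuGet metadata in the Arrow .csproj -->
<PropertyGroup>
  <PackageId>Apache.Arrow</PackageId>
  <Version>0.13.0</Version>
  <Authors>Apache Arrow Developers</Authors>
  <PackageLicenseExpression>Apache-2.0</PackageLicenseExpression>
  <PackageProjectUrl>https://arrow.apache.org/</PackageProjectUrl>
  <RepositoryUrl>https://github.com/apache/arrow</RepositoryUrl>
  <Description>C# implementation of Apache Arrow, a cross-language development platform for in-memory data.</Description>
</PropertyGroup>
```

With that in place, `dotnet pack -c Release` on the project produces the .nupkg with no source changes needed at release time.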


Re: Timeline for 0.13 Arrow release

2019-03-12 Thread Wes McKinney
hi folks,

I think we are on track to be able to release toward the end of this
month. My proposed timeline:

* This week (March 11-15): feature/improvement push mostly
* Next week (March 18-22): shift to bug fixes, stabilization, empty
backlog of feature/improvement JIRAs
* Week of March 25: propose release candidate

Does this seem reasonable? This puts us at about 9-10 weeks from 0.12.

We need an RM for 0.13, any PMCs want to volunteer?

Take a look at our release page:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103091219

Out of the open or in-progress issues, we have:

* C#: 3 issues
* C++ (all components): 51 issues
* Java: 3 issues
* Python: 38 issues
* Rust (all components): 33 issues

Please help curating the backlogs for each component. There's a
smattering of issues in other categories. There are also 10 open
issues with No Component (and 20 resolved issues), those need their
metadata fixed.

Thanks,
Wes

On Wed, Feb 27, 2019 at 1:49 PM Wes McKinney  wrote:
>
> The timeline for the 0.13 release is drawing closer. I would say we
> should consider a release candidate either the week of March 18 or
> March 25, which gives us ~3 weeks to close out backlog items.
>
> There are around 220 issues open or in-progress in
>
> https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.13.0+Release
>
> Please have a look. If issues are not assigned to someone as the next
> couple of weeks pass by I'll begin moving at least C++ and Python
> issues to 0.14 that don't seem like they're going to get done for
> 0.13. If development stakeholders for C#, Java, Rust, Ruby, and other
> components can review and curate the issues that would be helpful.
>
> You can help keep the JIRA issues tidy by making sure to add Fix
> Version to issues and to make sure to add a Component so that issues
> are properly categorized in the release notes.
>
> Thanks
> Wes
>
> On Sat, Feb 9, 2019 at 10:39 AM Wes McKinney  wrote:
> >
> > See 
> > https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide
> >
> > The source release step is one of the places where problems occur.
> >
> > On Sat, Feb 9, 2019, 10:33 AM  >>
> >>
> >> > On Feb 8, 2019, at 9:19 AM, Uwe L. Korn  wrote:
> >> >
> >> > We could dockerize some of the release steps to ensure that they run in 
> >> > the same environment.
> >>
> >> I may be able to help with said Dockerization. If not for this release, 
> >> then for the next. Are there docs on which systems we wish to target 
> >> and/or any build steps beyond the current dev container 
> >> (https://github.com/apache/arrow/tree/master/dev/container)?


[jira] [Created] (ARROW-4838) [C++] Implement safe Make constructor

2019-03-12 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4838:
-

 Summary: [C++] Implement safe Make constructor
 Key: ARROW-4838
 URL: https://issues.apache.org/jira/browse/ARROW-4838
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques
 Fix For: 0.14.0


The following classes need validating constructors:

* ArrayData
* ChunkedArray
* RecordBatch
* Column
* Table
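As a rough illustration of the pattern under discussion (in Python rather than C++, with hypothetical names), a validating factory reports an error instead of constructing an invalid object, while the plain constructor stays unchecked:

```python
from typing import List, Optional, Tuple

class ChunkedArray:
    """Toy stand-in for the C++ class: a sequence of same-typed chunks."""

    def __init__(self, chunks: List[list], dtype: type):
        # Unvalidated constructor: callers are responsible for invariants.
        self.chunks = chunks
        self.dtype = dtype

    @classmethod
    def make(cls, chunks: List[list], dtype: type
             ) -> Tuple[Optional["ChunkedArray"], Optional[str]]:
        # Validating factory: returns (result, error), mirroring how a
        # C++ Make would return Status plus an output parameter.
        for i, chunk in enumerate(chunks):
            if any(not isinstance(v, dtype) for v in chunk):
                return None, f"chunk {i} contains values not of type {dtype.__name__}"
        return cls(chunks, dtype), None
```

The same split extends naturally to the other classes in the list above.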





[jira] [Created] (ARROW-4837) [C++] Support c++filt on a custom path in the run-test.sh script

2019-03-12 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-4837:
--

 Summary: [C++] Support c++filt on a custom path in the run-test.sh 
script
 Key: ARROW-4837
 URL: https://issues.apache.org/jira/browse/ARROW-4837
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Krisztian Szucs


On conda this is CXXFILT=/opt/conda/bin/x86_64-conda_cos6-linux-gnu-c++filt





[jira] [Created] (ARROW-4835) [GLib] Add boolean operations

2019-03-12 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-4835:
---

 Summary: [GLib] Add boolean operations
 Key: ARROW-4835
 URL: https://issues.apache.org/jira/browse/ARROW-4835
 Project: Apache Arrow
  Issue Type: New Feature
  Components: GLib
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-4836) "Cannot tell() a compressed stream" when using RecordBatchStreamWriter

2019-03-12 Thread Mike Pedersen (JIRA)
Mike Pedersen created ARROW-4836:


 Summary: "Cannot tell() a compressed stream" when using 
RecordBatchStreamWriter
 Key: ARROW-4836
 URL: https://issues.apache.org/jira/browse/ARROW-4836
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.12.1
Reporter: Mike Pedersen


It does not seem like RecordBatchStreamWriter works with compressed streams:

{code:python}
>>> import pyarrow as pa
>>> pa.__version__
'0.12.1'
>>> stream = pa.output_stream('/tmp/a.gz')
>>> batch = pa.RecordBatch.from_arrays([pa.array([1])], ['a'])
>>> writer = pa.RecordBatchStreamWriter(stream, batch.schema)
>>> writer.write(batch)
Traceback (most recent call last):
  File "", line 1, in 
  File "pyarrow/ipc.pxi", line 181, in pyarrow.lib._RecordBatchWriter.write
  File "pyarrow/ipc.pxi", line 196, in 
pyarrow.lib._RecordBatchWriter.write_batch
  File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Cannot tell() a compressed stream
{code}

As I understand the documentation, this should be possible, right?
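The error arises because the stream writer uses tell() to track message offsets and alignment, which a compressed sink cannot report. One workaround sketch (a plain-stdlib stand-in, not the pyarrow API) is to write the stream to an in-memory buffer first and compress the whole result afterwards:

```python
import gzip
import io

def write_framed(sink, payloads):
    """Toy stand-in for the IPC writer: it needs sink.tell() to record
    message offsets and compute 8-byte alignment padding."""
    offsets = []
    for payload in payloads:
        offsets.append(sink.tell())
        pad = (-len(payload)) % 8
        sink.write(payload + b"\x00" * pad)
    return offsets

# Workaround: buffer the uncompressed stream, then gzip the whole thing.
buf = io.BytesIO()
offsets = write_framed(buf, [b"hello", b"arrow!!!"])
compressed = gzip.compress(buf.getvalue())
```

This trades memory for compatibility, so it only helps when the stream fits in memory.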





Re: Depending on non-released Apache projects (C++ Avro)

2019-03-12 Thread Uwe L. Korn
Hello Micah,

> Uwe, I'm not sure I understand what type of support/help you are thinking
> of.  Could you elaborate a little bit more before I reach out?

I would help them with the same build system improvements we have recently 
made (and are currently making) in Arrow for C++. It is nothing I would 
explicitly advertise on their ML, only a heads up in case there are Avro people 
on the Arrow ML. I cannot give any commitments on this, but it would probably 
be a high gain for not so much work if that is currently hindering releases or 
adoption.

Uwe

> 
> -Micah
> 
> On Tue, Mar 5, 2019 at 4:53 PM Wes McKinney  wrote:
> 
> > I am OK with that, but if we find ourselves making compromises that
> > affect performance or memory efficiency (where possibly invasive
> > refactoring may be required) perhaps we should reconsider option #3.
> >
> > On Tue, Mar 5, 2019 at 11:29 AM Uwe L. Korn  wrote:
> > >
> > > I'm leaning a bit towards 1) but I would love to get some input from the
> > Avro community as 1) depends also on their side as we will submit some
> > patches upstream that need to be reviewed and someday also released.
> > >
> > > Are AVRO committers subscribed here or should we reach out to them on
> > their ML? Given that we are quite active in the C++ space currently, I feel
> > that we can contribute quite some infrastructure in building and packaging
> > that we do eitherway for Arrow. This might be quite helpful for a project.
> > We have seen with Parquet where much of the development is just happening
> > as it is part of Arrow. (Not suggesting to merge/fork the Avro codebase but
> > just to apply some of the  best practices we learned while building Arrow).
> > >
> > > Uwe
> > >
> > > On Tue, Mar 5, 2019, at 4:57 PM, Wes McKinney wrote:
> > > > I'd be +0.5 in favor of forking in this particular case. Since Avro is
> > > > not vectorized (unlike Parquet and ORC) I suspect it may be more
> > > > difficult to get the best performance using a general purpose API
> > > > versus one that is more specialized to producing Arrow record batches.
> > > > Given that has been relatively light C++ development activity in
> > > > Apache Avro and no releases for 2 years it does give me pause.
> > > >
> > > > We might want to look at Impala's Avro scanner, they are doing some
> > > > LLVM IR cross-compilation also (they're using the Avro C++ library
> > > > though)
> > > >
> > > >
> > https://github.com/apache/impala/blob/master/be/src/exec/hdfs-avro-scanner-ir.cc
> > > >
> > https://github.com/apache/impala/blob/master/be/src/exec/hdfs-avro-scanner.cc
> > > >
> > > > On Tue, Mar 5, 2019 at 1:01 AM Micah Kornfield 
> > wrote:
> > > > >
> > > > > I'm looking at incorporating Avro in Arrow C++ [1]. It  seems that
> > the Avro
> > > > > C++ library APIs  have improved from the last release.  However, it
> > is not
> > > > > clear when a new release will be available (I asked on the  JIRA
> > Item for
> > > > > the next release [2] and received no response).
> > > > >
> > > > > I was wondering if there is a policy governing using other Apache
> > projects
> > > > > or how people felt about the following options:
> > > > > 1.  Depend on a specific git commit through the third-party library
> > system.
> > > > > 2.  Copy the necessary source code temporarily to our project, and
> > change
> > > > > to using the next release when it is available.
> > > > > 3.  Fork the code we need (the main benefit I see here is being able
> > to
> > > > > refactor it to avoid having to deal with exceptions, easier
> > integration
> > > > > with our IO system and one less 3rd party dependency to deal with).
> > > > > 4.  Wait on the 1.9 release before proceeding.
> > > > >
> > > > > Thanks,
> > > > > Micah
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/ARROW-1209
> > > > > [2] https://issues.apache.org/jira/browse/AVRO-2250
> > > >
> >
>


Re: [C++] Failing constructors and internal state

2019-03-12 Thread Krisztián Szűcs
+1 on Make and MakeUnsafe

On Mon, Mar 11, 2019 at 3:15 PM Francois Saint-Jacques <
fsaintjacq...@gmail.com> wrote:

> I can settle with Micah's proposal of `MakeValidated`, though I'd prefer
> that Make be safe by default and responsible users use MakeUnsafe :). I'll
> settle for the pragmatic choice of backward compatibility.
>
> François
>
>
> On Sun, Mar 10, 2019 at 5:45 PM Wes McKinney  wrote:
>
> > I think having consistent methods for both validated and unvalidated
> > construction is a good idea. Being fairly passionate about
> > microperformance, I don't think we should penalize responsible users
> > of unsafe/unvalidated APIs (e.g. by taking them away and replacing
> > them with variants featuring unavoidable computation), this can
> > partially be handled through developer documentation (which, ah, we
> > will need to write more of)
> >
> > On Sun, Mar 10, 2019 at 4:01 PM Micah Kornfield 
> > wrote:
> > >
> > > I agree there should always be a path to avoid the validation, but I
> > > think there should also be an easy way to include validation and a
> > > clear way to tell the difference.  IMO, having a strong naming
> > > convention, so callers can tell the difference and code reviewers can
> > > focus more on less-safe method usage, is important.  It will help
> > > newcomers to the project write safer code, which can then be
> > > refactored or called out in code review for performance issues.  It
> > > also provides a cue for all developers to consider whether they are
> > > meeting the necessary requirements when using less-safe methods.
> > >
> > > A straw-man proposal for naming conventions:
> > > - Constructors are always unvalidated (should still have appropriate
> > > DCHECKs)
> > > - "Make" calls are always unvalidated (should still have appropriate
> > > DCHECKs)
> > > - "MakeValidated" ensures structural validation occurs (but not data
> > > validation)
> > > - "MakeSanitized" ensures both structural and data validations occur
> > >
> > > As noted above it might only pay to refactor a small amount of current
> > usage to the safer APIs.
> > >
> > > We could potentially go even further down the rabbit hole and try to
> > > define a standard Hungarian notation [1] to make it more obvious what
> > > invariants are expected for a particular data-structure variable (I'm
> > > actually -0.5 on this).
> > >
> > > As a personal bias, I would rather have slower code with a lower risk
> > > of crashing in production than faster code that crashes.  Obviously,
> > > there is a tradeoff here, and the ideal is faster code that won't
> > > segfault.
> > >
> > > Thoughts?
> > >
> > > -Micah
> > >
> > > [1]
> > https://www.joelonsoftware.com/2005/05/11/making-wrong-code-look-wrong/
> > >
> > > On Sun, Mar 10, 2019 at 9:38 AM Wes McKinney 
> > wrote:
> > >>
> > >> hi folks,
> > >>
> > >> I think some issues are being conflated here, so let me try to dig
> > >> through them. Let's first look at the two cited bugs that were fixed,
> > >> if I have this right:
> > >>
> > >> * ARROW-4766: root cause dereferencing a null pointer
> > >> * ARROW-4774: root cause unsanitized Python user input
> > >>
> > >> None of the 4 remedies listed could have prevented ARROW-4766 AFAICT
> > >> since we currently allow for null buffers (the object, not the pointer
> > >> inside) in ArrayData. This has been discussed on the mailing list in
> > >> the past; "sanitizing" ArrayData to be free of null objects would be
> > >> expensive and my general attitude in the C++ library is that we should
> > >> not be in the business of paying extra CPU cycles for the 1-5% case
> > >> when it is unneeded in the 95-99% of cases. We have DCHECK assertions
> > >> to check these issues in debug builds while avoiding the costs in
> > >> release builds. In the case of checking edge cases in computational
> > >> kernels, suffice to say that we should check every kernel on length-0
> > >> input with null buffers to make sure this case is properly handled
> > >>
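The length-0 edge case above can be illustrated with a defensive-kernel sketch: a computation that must tolerate a null values buffer whenever the length is zero. The ToyArray type and SumKernel function are invented for illustration, not Arrow's real ArrayData or kernel API.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the defensive pattern discussed above: a "kernel" that must
// tolerate a null values buffer when length is 0.
struct ToyArray {
  const int32_t* values;  // may be nullptr when length == 0
  int64_t length;
};

int64_t SumKernel(const ToyArray& arr) {
  int64_t total = 0;
  // With length == 0 the loop body never runs, so we never dereference a
  // null values pointer; a DCHECK could assert values != nullptr whenever
  // length > 0.
  for (int64_t i = 0; i < arr.length; ++i) {
    total += arr.values[i];
  }
  return total;
}
```

A test suite following the advice above would exercise every kernel with `ToyArray{nullptr, 0}`-style inputs in addition to ordinary data.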
> > >> In the case of ARROW-4774, we should work at the language binding
> > >> interface to make sure we have convenient "validating" constructors
> > >> that check user input for common problems. This can prevent the
> > >> duplication of this code across the binding layers (GLib, Python, R,
> > >> MATLAB, etc.)
> > >>
> > >>  On the specific 4 steps mentioned by Francois, here are my thoughts:
> > >>
> > >> 1. Having StatusOr would be useful, but this is a programming
> > convenience
> > >>
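For readers unfamiliar with the pattern, a bare-bones StatusOr-style wrapper could look like the sketch below. This is an illustration only, not the abseil or Arrow implementation; the StatusOr class and ParsePositive function are invented for this example.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Minimal sketch of a StatusOr-style result type; real implementations
// (e.g. absl::StatusOr) handle many more cases (moves, conversions, etc.).
template <typename T>
class StatusOr {
 public:
  explicit StatusOr(T value) : ok_(true), value_(std::move(value)) {}
  static StatusOr<T> Error(std::string msg) {
    StatusOr<T> r;
    r.msg_ = std::move(msg);
    return r;
  }
  bool ok() const { return ok_; }
  const T& ValueOrDie() const { return value_; }  // caller must check ok() first
  const std::string& message() const { return msg_; }

 private:
  StatusOr() : ok_(false), value_() {}
  bool ok_;
  T value_;
  std::string msg_;
};

// Example: a fallible function returning StatusOr instead of an
// out-parameter plus Status.
StatusOr<int> ParsePositive(int raw) {
  if (raw <= 0) return StatusOr<int>::Error("value must be positive");
  return StatusOr<int>(raw);
}
```

The convenience is that success value and failure reason travel in one return value, instead of the `Status F(..., T* out)` out-parameter convention.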
> > >> 2. There are a couple purposes of the static factory methods that
> > >> exist now, like Table::Make and RecordBatch::Make. One of the reasons
> > >> that I added them initially was because of the implicit constructor
> > >> behavior of std::vector inside a call to std::make_shared. If you have
> > >> a std::vector argument in a class's constructor, then
> > >> std::make_shared(..., {...}) will not result in the initializer
> > >> list constructing the std::vector. This means some awkwardness like
>