RE: [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-23 Thread Melik-Adamyan, Areg
+1 (non-binding)

Is there a plan for C++ API?



[jira] [Created] (ARROW-4354) Explore Codespeed feasibility and ease of customization

2019-01-23 Thread Areg Melik-Adamyan (JIRA)
Areg Melik-Adamyan created ARROW-4354:
-

 Summary: Explore Codespeed feasibility and ease of customization
 Key: ARROW-4354
 URL: https://issues.apache.org/jira/browse/ARROW-4354
 Project: Apache Arrow
  Issue Type: Task
  Components: Developer Tools
Reporter: Areg Melik-Adamyan


@Tanya Schlusser can you please explore this option and report out?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4353) [CI] Add jobs for 32-bit and 64-bit MinGW

2019-01-23 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-4353:
---

 Summary: [CI] Add jobs for 32-bit and 64-bit MinGW
 Key: ARROW-4353
 URL: https://issues.apache.org/jira/browse/ARROW-4353
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-4352) [C++] Add support for system GTest

2019-01-23 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-4352:
---

 Summary: [C++] Add support for system GTest
 Key: ARROW-4352
 URL: https://issues.apache.org/jira/browse/ARROW-4352
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-4351) Fail to build with static parquet

2019-01-23 Thread Jeroen (JIRA)
Jeroen created ARROW-4351:
-

 Summary: Fail to build with static parquet
 Key: ARROW-4351
 URL: https://issues.apache.org/jira/browse/ARROW-4351
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Jeroen


Trying to build a static version of arrow with parquet fails as below. Is this 
configuration possible?

 

{{ -DARROW_BUILD_STATIC=ON}}
{{ -DARROW_BUILD_TESTS=OFF}}
{{ -DARROW_PYTHON=OFF}}
{{ -DARROW_BOOST_USE_SHARED=OFF}}
{{ -DARROW_WITH_SNAPPY=OFF}}
{{ -DARROW_WITH_ZSTD=OFF}}
{{ -DARROW_WITH_LZ4=OFF}}
{{ -DARROW_JEMALLOC=OFF}}
{{ -DARROW_BUILD_SHARED=OFF}}
{{ -DARROW_BOOST_VENDORED=OFF}}
{{ -DARROW_WITH_ZLIB=OFF}}
{{ -DARROW_WITH_BROTLI=OFF}}
{{ -DARROW_USE_GLOG=OFF}}
{{ -DPTHREAD_LIBRARY=OFF}}
{{ -DARROW_BUILD_UTILITIES=ON}}
{{ -DARROW_TEST_LINKAGE="static"}}
{{ -DARROW_HDFS=OFF}}
{{ -DARROW_PARQUET=ON}}

 

...

 

{{==> cmake . -DCMAKE_C_FLAGS_RELEASE=-DNDEBUG 
-DCMAKE_CXX_FLAGS_RELEASE=-DNDEBUG}}
{{Last 15 lines from 
/Users/jeroen/Library/Logs/Homebrew/apache-arrow/01.cmake:}}
{{-- CMAKE_CXX_FLAGS: -Qunused-arguments -O3 -DNDEBUG -Wall 
-Wno-unknown-warning-option -msse4.2 -maltivec -march=armv8-a+crc 
-stdlib=libc++}}
{{-- Looking for backtrace}}
{{-- Looking for backtrace - found}}
{{-- backtrace facility detected in default set of libraries}}
{{-- Found Backtrace: /usr/include}}
{{-- Configuring done}}
{{CMake Error at cmake_modules/BuildUtils.cmake:143 (add_dependencies):}}
{{ The dependency target "arrow_shared" of target "parquet_objlib" does not}}
{{ exist.}}
{{Call Stack (most recent call first):}}
{{ src/parquet/CMakeLists.txt:214 (ADD_ARROW_LIB)}}
{{-- Generating done}}
{{-- Build files have been written to: 
/tmp/apache-arrow-20190123-44858-1as3l4q/apache-arrow-0.12.0/cpp}}





[jira] [Created] (ARROW-4350) [python] pyarrow table convert to pandas dataframe add extra information

2019-01-23 Thread yu peng (JIRA)
yu peng created ARROW-4350:
--

 Summary: [python] pyarrow table convert to pandas dataframe add 
extra information
 Key: ARROW-4350
 URL: https://issues.apache.org/jira/browse/ARROW-4350
 Project: Apache Arrow
  Issue Type: Bug
Reporter: yu peng


{code:python}
In [19]: df = pd.DataFrame({'a': [[[1], [2]], [[2], [3]]], 'b': [1, 2]})

In [20]: df.iloc[0].to_dict()
Out[20]: {'a': [[1], [2]], 'b': 1}

In [21]: pa.Table.from_pandas(df).to_pandas().iloc[0].to_dict()
Out[21]: {'a': array([array([1]), array([2])], dtype=object), 'b': 1}

In [24]: np.array(df.iloc[0].to_dict()['a']).shape
Out[24]: (2, 1)

In [25]: pa.Table.from_pandas(df).to_pandas().iloc[0].to_dict()['a'].shape
Out[25]: (2,)
{code}
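The round trip through pyarrow returns the nested lists as an object array of arrays (shape (2,)) instead of the original nested lists (shape (2, 1)), which is not the expected behavior.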
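
As a stopgap on the user side, the nested arrays that come back from the round trip can be converted to plain lists. The following is a minimal sketch, not a pyarrow API: the helper name `to_nested_list` is hypothetical, and it relies only on the `tolist()` method that NumPy arrays and scalars provide.

```python
def to_nested_list(value):
    """Recursively turn array-like containers into plain Python lists."""
    if isinstance(value, (str, bytes)):
        return value  # strings are sequences too, but should stay intact
    if hasattr(value, "tolist"):
        # NumPy arrays and scalars expose tolist(); for an object array the
        # elements of the resulting list may still be arrays, so keep recursing.
        value = value.tolist()
    if isinstance(value, (list, tuple)):
        return [to_nested_list(item) for item in value]
    return value
```

Applied to the `'a'` entry of the round-tripped row, this should give back a list of lists, so that wrapping it in `np.array(...)` has the same shape as the value taken directly from pandas.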

 





[jira] [Created] (ARROW-4349) [C++] Build all benchmarks on Windows without failing

2019-01-23 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4349:
---

 Summary: [C++] Build all benchmarks on Windows without failing
 Key: ARROW-4349
 URL: https://issues.apache.org/jira/browse/ARROW-4349
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Wes McKinney
Assignee: Wes McKinney
 Fix For: 0.13.0


Some of the benchmarks fail to build on Windows. We don't build them in 
Appveyor, so we might consider changing that or testing them in a nightly build.





Re: [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-23 Thread Renjie Liu
+1 (non-binding)

I also tried to write a similar engine, and would be glad to merge it with DataFusion.



[jira] [Created] (ARROW-4348) encountered error when building parquet

2019-01-23 Thread lei yu (JIRA)
lei yu created ARROW-4348:
-

 Summary: encountered error when building parquet
 Key: ARROW-4348
 URL: https://issues.apache.org/jira/browse/ARROW-4348
 Project: Apache Arrow
  Issue Type: Bug
Reporter: lei yu


I am trying to build only the Parquet C++ libraries on CentOS 7.5. I followed the 
instructions on GitHub and ran the following:

 
{code:bash}
git clone https://github.com/apache/arrow.git
cd arrow/cpp
mkdir debug
cd debug
cmake .. -DARROW_PARQUET=ON -DARROW_OPTIONAL_INSTALL=ON
make parquet
{code}
 

I don't have the third-party libraries installed on my machine, so the build 
tries to download them, but I get an error after it reports that Thrift has 
been downloaded and installed.
{code:java}
No rule to make target `thrift_ep/src/thrift_ep-install/lib/libthriftd.a', 
needed by `src/parquet/parquet_types.cpp'. Stop
{code}

Before the error, it prints:

{code:java}

[ 7%] Performing configure step for 'thrift_ep'
-- thrift_ep configure command succeeded. See also 
/home/ylei/development/third_party/parquet/arrow/arrow/cpp/debug/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-configure-.log
[ 8%] Performing build step for 'thrift_ep'
-- thrift_ep build command succeeded. See also 
/home/ylei/development/third_party/parquet/arrow/arrow/cpp/debug/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-build-.log
[ 9%] Performing install step for 'thrift_ep'
-- thrift_ep install command succeeded. See also 
/home/ylei/development/third_party/parquet/arrow/arrow/cpp/debug/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-install-*.log
[ 10%] Completed 'thrift_ep'
[ 10%] Built target thrift_ep
{code}

I had to build Thrift separately, and then I could build Parquet successfully.





[jira] [Created] (ARROW-4347) [Python] Run Python Travis CI unit tests on Linux when Java codebase changed

2019-01-23 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4347:
---

 Summary: [Python] Run Python Travis CI unit tests on Linux when 
Java codebase changed
 Key: ARROW-4347
 URL: https://issues.apache.org/jira/browse/ARROW-4347
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.13.0


The Java library is also a dependency of the Python tests, but the tests aren't 
triggered if there is a change to the {{java/}} subtree. This blind spot was 
introduced when the CI jobs were split apart:

https://github.com/apache/arrow/blob/master/.travis.yml#L133
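
The gap described above comes down to a path filter that leaves out `java/`. A minimal sketch of such a filter follows, written in Python rather than as the actual Travis configuration; the function name and the prefix list are illustrative assumptions, not the contents of `.travis.yml`.

```python
# Hypothetical set of subtrees whose changes should trigger the Python tests.
# Including "java/" is the point of this issue; the other entries are assumed.
PYTHON_JOB_PREFIXES = ("cpp/", "python/", "java/")


def affects_python_tests(changed_files, prefixes=PYTHON_JOB_PREFIXES):
    """Return True if any changed path lies under a watched subtree."""
    # str.startswith accepts a tuple and matches any of its entries.
    return any(path.startswith(prefixes) for path in changed_files)
```

With `java/` in the prefix list, a change that only touches files under the Java subtree would still schedule the Python job.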





[jira] [Created] (ARROW-4346) [C++] Fix compiler warnings with gcc 8.2.0

2019-01-23 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4346:
---

 Summary: [C++] Fix compiler warnings with gcc 8.2.0
 Key: ARROW-4346
 URL: https://issues.apache.org/jira/browse/ARROW-4346
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.13.0


I just set up a new machine on Ubuntu 18.10 so I'm getting a few papercuts

{code}
/usr/bin/ccache /usr/bin/c++  -DARROW_EXTRA_ERROR_CONTEXT -DARROW_JEMALLOC 
-DARROW_JEMALLOC_INCLUDE_DIR=/home/wesm/code/arrow/cpp/build/jemalloc_ep-prefix/src/jemalloc_ep/dist//include
 -DARROW_NO_DEPRECATED_API -DARROW_USE_GLOG -DARROW_USE_SIMD -DARROW_WITH_ZSTD 
-Isrc -I../src -isystem /home/wesm/cpp-toolchain/include -isystem 
gbenchmark_ep/src/gbenchmark_ep-install/include -isystem jemalloc_ep-prefix/src 
-isystem ../thirdparty/hadoop/include -isystem orc_ep-install/include -isystem 
/home/wesm/cpp-toolchain/include/thrift -Wno-noexcept-type  -fuse-ld=gold -ggdb 
-O0  -Wall -Wconversion -Wno-sign-conversion -Werror -msse4.2  -g -fPIE   
-std=gnu++11 -MD -MT 
src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o 
-MF 
src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o.d 
-o 
src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o 
-c ../src/parquet/encoding-benchmark.cc
In file included from ../src/parquet/encoding-internal.h:33,
 from ../src/parquet/encoding-benchmark.cc:20:
../src/parquet/encoding.h: In instantiation of ‘int 
parquet::Decoder<DType>::DecodeSpaced(parquet::Decoder<DType>::T*, int, int, 
const uint8_t*, int64_t) [with DType = 
parquet::DataType<(parquet::Type::type)6>; parquet::Decoder<DType>::T = 
parquet::ByteArray; uint8_t = unsigned char; int64_t = long int]’:
../src/parquet/encoding.h:110:15:   required from here
../src/parquet/encoding.h:120:11: error: ‘void* memset(void*, int, size_t)’ 
clearing an object of non-trivial type 
‘parquet::Decoder<parquet::DataType<(parquet::Type::type)6> >::T’ {aka ‘struct 
parquet::ByteArray’}; use assignment or value-initialization instead 
[-Werror=class-memaccess]
 memset(buffer + values_read, 0, (num_values - values_read) * sizeof(T));
 ~~^
In file included from ../src/parquet/schema.h:31,
 from ../src/parquet/encoding.h:29,
 from ../src/parquet/encoding-internal.h:33,
 from ../src/parquet/encoding-benchmark.cc:20:
../src/parquet/types.h:155:8: note: 
‘parquet::Decoder<parquet::DataType<(parquet::Type::type)6> >::T’ {aka ‘struct 
parquet::ByteArray’} declared here
 struct ByteArray {
^
cc1plus: all warnings being treated as errors
{code}





Re: [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-23 Thread paddy horan
+1 (non-binding)

Thanks Andy



[jira] [Created] (ARROW-4345) [C++] Add Apache 2.0 license file to the Parquet-testing repository

2019-01-23 Thread Rylan Dmello (JIRA)
Rylan Dmello created ARROW-4345:
---

 Summary: [C++] Add Apache 2.0 license file to the Parquet-testing 
repository
 Key: ARROW-4345
 URL: https://issues.apache.org/jira/browse/ARROW-4345
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, cpp
Affects Versions: 0.12.0
Reporter: Rylan Dmello


The parquet-testing repository is used as a git submodule in the Apache Arrow 
repository, but doesn't currently have a license file:

    [https://github.com/apache/arrow/tree/master/cpp/submodules]

 





Re: Parquet-testing repository license question

2019-01-23 Thread Rylan Dmello
Hi Wes,


Thank you for replying. I will go ahead and create a new JIRA issue for this, 
then.


Rylan






[jira] [Created] (ARROW-4344) [Java] Further cleanup maven output

2019-01-23 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-4344:
---

 Summary: [Java] Further cleanup maven output
 Key: ARROW-4344
 URL: https://issues.apache.org/jira/browse/ARROW-4344
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler


Follow-up to ARROW-4180: I noticed that EchoServer logs output at info level 
that should be changed to debug. Also, upgrading the RAT license check plugin 
will stop it from printing every excluded file, which currently adds up to a 
large amount of output because it is done for every module.





Re: Parquet-testing repository license question

2019-01-23 Thread Wes McKinney
hi Rylan,

We don't distribute this data with Arrow releases, but it is depended
on by some unit tests. For the avoidance of ambiguity we should add
the Apache 2.0 license to that repository.

- Wes



[jira] [Created] (ARROW-4343) [C++] Add as complete as possible Ubuntu Trusty / 14.04 build to docker-compose setup

2019-01-23 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4343:
---

 Summary: [C++] Add as complete as possible Ubuntu Trusty / 14.04 
build to docker-compose setup
 Key: ARROW-4343
 URL: https://issues.apache.org/jira/browse/ARROW-4343
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.13.0


Until we formally stop supporting Trusty, it would be useful to be able to 
verify in Docker that builds work there. I still have an Ubuntu 14.04 machine 
that I use (and I've been filing bugs that I find on it), but I am not sure for 
how much longer.





Parquet-testing repository license question

2019-01-23 Thread Rylan Dmello
Hi,


I noticed that there's a parquet-testing repository linked as a Git submodule 
from the Apache Arrow repository here:


https://github.com/apache/arrow/tree/master/cpp/submodules


The parquet-testing repository currently doesn't seem to have a license file. I 
was wondering about what the licensing implications of this are.


Is the license considered to be implicitly 'inherited' from the parent Apache 
Arrow repository? Perhaps it would be helpful to have an explicit license file 
to eliminate any potential ambiguity here


Thanks,

Rylan


[jira] [Created] (ARROW-4342) [Gandiva][Java] spurious failures in projector cache test

2019-01-23 Thread Pindikura Ravindra (JIRA)
Pindikura Ravindra created ARROW-4342:
-

 Summary: [Gandiva][Java] spurious failures in projector cache test
 Key: ARROW-4342
 URL: https://issues.apache.org/jira/browse/ARROW-4342
 Project: Apache Arrow
  Issue Type: Bug
  Components: Gandiva, Java
Reporter: Pindikura Ravindra


[ERROR] Tests run: 21, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 9.542 
s <<< FAILURE! - in org.apache.arrow.gandiva.evaluator.ProjectorTest

[ERROR] testMakeProjector(org.apache.arrow.gandiva.evaluator.ProjectorTest) 
Time elapsed: 0.079 s <<< FAILURE! java.lang.AssertionError at 
org.apache.arrow.gandiva.evaluator.ProjectorTest.testMakeProjector(ProjectorTest.java:164)





[jira] [Created] (ARROW-4341) [C++] Use TypedBufferBuilder in BooleanBuilder

2019-01-23 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4341:
---

 Summary: [C++] Use TypedBufferBuilder in BooleanBuilder
 Key: ARROW-4341
 URL: https://issues.apache.org/jira/browse/ARROW-4341
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.13.0


Follow up work to ARROW-4031





[jira] [Created] (ARROW-4340) Update IWYU version in the `lint` dockerfile

2019-01-23 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-4340:
--

 Summary: Update IWYU version in the `lint` dockerfile
 Key: ARROW-4340
 URL: https://issues.apache.org/jira/browse/ARROW-4340
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Krisztian Szucs


I was trying to clean up the C++ includes based on the current docker-iwyu 
suggestions, but IWYU needs more customization (symbol maps and pragmas) than 
it currently has. It would also help a lot to use the latest IWYU version (see 
the changelog at https://include-what-you-use.org/).





Re: [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-23 Thread Chao Sun
+1 (non-binding)

Glad to see this coming and I think it is a great complement to existing
modules, e.g., Arrow and Parquet. It also aligns with the overall direction
that the project is going.

Chao

On Wed, Jan 23, 2019 at 9:30 AM Andy Grove  wrote:

> As far as I know, the majority of the PMC are not actively using Rust, so
> as supporting evidence for interest in this donation from the Rust
> community, here is a Reddit thread where I talked about offering DataFusion
> for donation recently:
>
>
> https://www.reddit.com/r/rust/comments/aibk39/datafusion_060_inmemory_query_engine_for_apache/
>
> There were 69 upvotes and many supportive comments, including a couple
> where people specifically mentioned that they liked the fact that
> DataFusion uses Arrow. I would hope that this donation leads to more people
> contributing to Arrow.
>
> Thanks,
>
> Andy.
>
> On Wed, Jan 23, 2019 at 4:26 AM Neville Dipale wrote:
>
> > Hi Andy,
> >
> > +1 : Accept contribution of DataFusion Rust library
> >
> > Thanks
> >
> > On Wed, 23 Jan 2019 at 03:05, Wes McKinney  wrote:
> >
> > > Dear all,
> > >
> > > The developers of DataFusion, an analytical query engine written
> > > in Rust, based on the Arrow columnar memory format, are proposing
> > > to donate the code to Apache Arrow:
> > >
> > > https://github.com/andygrove/datafusion
> > >
> > > The community has had an opportunity to discuss this [1] and
> > > there do not seem to be objections to this. Andy Grove has staged
> > > the code donation in the form of a pull request:
> > >
> > > https://github.com/apache/arrow/pull/3399
> > >
> > > This vote is to determine if the Arrow PMC is in favor of accepting
> > > this donation. If the vote passes, the PMC and the authors of the code
> > > will work together to complete the ASF IP Clearance process
> > > (http://incubator.apache.org/ip-clearance/) and import this Rust
> > > codebase implementation into Apache Arrow.
> > >
> > > [ ] +1 : Accept contribution of DataFusion Rust library
> > > [ ]  0 : No opinion
> > > [ ] -1 : Reject contribution because...
> > >
> > > Here is my vote: +1
> > >
> > > The vote will be open for at least 72 hours.
> > >
> > > Thanks,
> > > Wes
> > >
> > > [1]: https://lists.apache.org/thread.html/2f6c14e9f5a9ab41b0b591b2242741b23e5528fb28e79ac0e2c9349a@%3Cdev.arrow.apache.org%3E


Re: Arrow Sync call: 17:00 UTC (12p Eastern)

2019-01-23 Thread Francois Saint-Jacques
Notes from today's meeting

Attendees:
- François Saint-Jacques (Ursa Labs/RStudio)
- Wes McKinney (Ursa Labs/RStudio)
  - Benchmark Project
  - PR Backlog
- Ben Kietzman (Ursa Labs/RStudio)
- Neville Dipale
- Siddhart Teotia (Dremio)
- Andy Grove
- Ravindra (Dremio)
- Shyam Singh (Dremio)
- Li Jin (Two Sigma)
- Hatem Helal (MathWorks)

- Benchmark Project
  - Call for external help to implement benchmarks in languages like Java,
    Go, Rust...
  - Implement a well documented process for adding/updating a benchmark
  - Automate detection of regression in benchmarks, possibly as a nightly
    job
  - Increase the diversity of platforms (architecture, operating system)

- PR Backlog
  - Call for external help to review the backlog of submitted PRs,
    especially in non-C++ or non-Python components
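
One way the automated regression detection mentioned under Benchmark Project could work is to compare each new measurement against the recent history of the same benchmark. The following is a minimal sketch under that assumption; the function name, the three-sigma threshold, and the idea of keeping a list of past timings are all illustrative and were not decided on the call.

```python
from statistics import mean, stdev


def is_regression(history, latest, n_sigma=3.0):
    """Flag `latest` if it is slower than the baseline by more than n_sigma.

    `history` holds earlier timings (lower is better) for one benchmark.
    """
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    baseline = mean(history)
    spread = stdev(history)
    return latest > baseline + n_sigma * spread
```

A nightly job could run this per benchmark and per platform and report only the flagged cases. A fixed sigma threshold is a simple starting point; noisy machines would probably need a larger history or a more robust statistic.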



On Wed, Jan 23, 2019 at 10:31 AM Wes McKinney  wrote:

> Biweekly public call as usual at
>
> https://meet.google.com/vtm-teks-phx
>
> Minutes to be posted after the call
>


[jira] [Created] (ARROW-4339) [C++] rewrite cpp/README shorter, with a separate contribution guide

2019-01-23 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4339:


 Summary: [C++] rewrite cpp/README shorter, with a separate 
contribution guide
 Key: ARROW-4339
 URL: https://issues.apache.org/jira/browse/ARROW-4339
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Benjamin Kietzman


The README.md for the cpp project has grown long and contains a lot of 
information specific to contributors. Move this information into a separate 
CONTRIBUTING.md

In particular, draw more attention to the cmake option 
`-DBUILD_WARNING_LEVEL=CHECKIN` which is used in Travis and includes `-Werror`





[jira] [Created] (ARROW-4338) tamamlamak

2019-01-23 Thread mehmet ali (JIRA)
mehmet ali  created ARROW-4338:
--

 Summary: tamamlamak
 Key: ARROW-4338
 URL: https://issues.apache.org/jira/browse/ARROW-4338
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Affects Versions: 0.13.0, JS-0.4.0, 1.0.0, JS-0.5.0
Reporter: mehmet ali 
 Fix For: 0.10.0, JS-0.3.1, JS-0.3.0, 0.1.0








[jira] [Created] (ARROW-4337) [C#] Array / RecordBatch Builder Fluent API

2019-01-23 Thread Chris Hutchinson (JIRA)
Chris Hutchinson created ARROW-4337:
---

 Summary: [C#] Array / RecordBatch Builder Fluent API
 Key: ARROW-4337
 URL: https://issues.apache.org/jira/browse/ARROW-4337
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Chris Hutchinson


Implement a fluent API for building arrays and record batches from Arrow 
buffers, flat arrays, spans, enumerables, etc.

A future implementation could extend this API with support for ADO.NET 
DataTables.
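
The term "fluent API" here refers to builders whose methods return the builder itself so that calls can be chained. The following is a language-neutral sketch of the pattern, written in Python although the issue targets C#; the class and method names are hypothetical and do not reflect the Arrow C# library.

```python
class ArrayBuilder:
    """Toy builder whose methods return self so that calls can be chained."""

    def __init__(self):
        self._values = []

    def append(self, value):
        self._values.append(value)
        return self  # returning self is what makes the API fluent

    def append_range(self, values):
        self._values.extend(values)
        return self

    def build(self):
        # Hand back a copy so later appends do not alter the built result.
        return list(self._values)
```

For example, `ArrayBuilder().append(1).append_range([2, 3]).build()` produces `[1, 2, 3]` in a single expression.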





Arrow Sync call: 17:00 UTC (12p Eastern)

2019-01-23 Thread Wes McKinney
Biweekly public call as usual at

https://meet.google.com/vtm-teks-phx

Minutes to be posted after the call


[jira] [Created] (ARROW-4335) [C++] Better document sparse tensor support

2019-01-23 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-4335:
-

 Summary: [C++] Better document sparse tensor support
 Key: ARROW-4335
 URL: https://issues.apache.org/jira/browse/ARROW-4335
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.12.0
Reporter: Antoine Pitrou
Assignee: Kenta Murata


Currently the documentation (including docstrings) for the sparse tensor 
classes and methods is very... sparse. It would be nice to make those 
approachable.

(also, a suggestion: rename {{SparseCSRIndex::indptr()}} to something else? 
perhaps {{SparseCSRIndex::row_indices()}}?)
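
For readers unfamiliar with the name, here is a minimal sketch of the conventional CSR (compressed sparse row) layout, in plain Python. It does not use the Arrow API; it assumes that SparseCSRIndex follows the usual CSR convention, and the helper names dense_to_csr and row_entries are hypothetical, chosen only for illustration. In this convention, indptr holds offsets into the column-index array (one more entry than there are rows) rather than row numbers.

```python
def dense_to_csr(matrix):
    """Convert a dense matrix (list of rows) into conventional CSR arrays.

    Hypothetical helper for illustration; not part of Arrow.
    """
    indptr = [0]   # indptr[i]:indptr[i + 1] is row i's slice of indices/data
    indices = []   # column index of each stored non-zero value
    data = []      # the stored non-zero values, row by row
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                indices.append(col)
                data.append(value)
        # After finishing a row, record where the next row starts.
        indptr.append(len(indices))
    return indptr, indices, data


def row_entries(indptr, indices, data, i):
    """Return the (column, value) pairs stored for row i."""
    start, stop = indptr[i], indptr[i + 1]
    return list(zip(indices[start:stop], data[start:stop]))


dense = [
    [5, 0, 0],
    [0, 0, 0],  # an empty row: indptr repeats the previous offset
    [0, 8, 3],
]
indptr, indices, data = dense_to_csr(dense)
third_row = row_entries(indptr, indices, data, 2)
```

Because an empty row simply repeats the previous offset, the array has one entry per row boundary, which is worth keeping in mind when choosing a replacement name.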





Re: [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-23 Thread Neville Dipale
Hi Andy,

+1 : Accept contribution of DataFusion Rust library

Thanks

On Wed, 23 Jan 2019 at 03:05, Wes McKinney wrote:

> Dear all,
>
> The developers of DataFusion, an analytical query engine written
> in Rust, based on the Arrow columnar memory format, are proposing
> to donate the code to Apache Arrow:
>
> https://github.com/andygrove/datafusion
>
> The community has had an opportunity to discuss this [1] and
> there do not seem to be objections to this. Andy Grove has staged
> the code donation in the form of a pull request:
>
> https://github.com/apache/arrow/pull/3399
>
> This vote is to determine if the Arrow PMC is in favor of accepting
> this donation. If the vote passes, the PMC and the authors of the code
> will work together to complete the ASF IP Clearance process
> (http://incubator.apache.org/ip-clearance/) and import this Rust
> codebase implementation into Apache Arrow.
>
> [ ] +1 : Accept contribution of DataFusion Rust library
> [ ]  0 : No opinion
> [ ] -1 : Reject contribution because...
>
> Here is my vote: +1
>
> The vote will be open for at least 72 hours.
>
> Thanks,
> Wes
>
> [1]:
> https://lists.apache.org/thread.html/2f6c14e9f5a9ab41b0b591b2242741b23e5528fb28e79ac0e2c9349a@%3Cdev.arrow.apache.org%3E
>


Re: A renewed plea for help [was Re: Recruiting more maintainers for Apache Arrow]

2019-01-23 Thread Antoine Pitrou
On Tue, 22 Jan 2019 16:57:42 -0600
Wes McKinney wrote:
> 
> There were 1540 patches merged into the project in 2018 (excluding the
> Parquet merge) -- that's more than 4 patches per day. Evidence
> suggests that the overall patch count for 2019 will be even higher; if
> I had to guess somewhere well over 2000. Out of last year's patches, I
> merged 1028, i.e. 2 out of every 3. If we are to be able to take on
> 2000 or more patches this year, we'll need more help. If you are
> neither a committer nor a PMC member, you can still help with code
> review and discussions to help contributors get their work into
> merge-ready state.

I generally try to review as many PRs as I feel competent to.

What should be the guideline when some PRs for other implementations
(such as C#, Java...) are lingering on?

> I'll do what I have to in order to keep the patches flowing as fast as
> possible into master, but contributors and other maintainers can help
> with the Always Be Closing mindset -- the 80/20 rule or 90/10 rule
> frequently applies. In many cases it is better to merge a patch and
> open up a JIRA for follow up improvements if there is uncertainty
> about whether something is "done".

I'm quite wary of technical debt (which can quickly plague fast-growing
projects) so I tend to be a bit demanding in my reviews :-)

Regards

Antoine.




[jira] [Created] (ARROW-4334) [CI] Setup conda-forge channel globally in travis builds

2019-01-23 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-4334:
--

 Summary: [CI] Setup conda-forge channel globally in travis builds
 Key: ARROW-4334
 URL: https://issues.apache.org/jira/browse/ARROW-4334
 Project: Apache Arrow
  Issue Type: Task
  Components: Continuous Integration
Reporter: Krisztian Szucs


It looks like conda-forge is already set as the top-priority channel: 
[https://github.com/apache/arrow/blob/master/ci/travis_install_conda.sh#L71]

We can most certainly remove all occurrences of {{-c conda-forge}}.


