[jira] [Created] (ARROW-4280) [C++][Documentation] It looks like flex and bison are required for parquet

2019-01-16 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4280:
--

 Summary: [C++][Documentation] It looks like flex and bison are 
required for parquet
 Key: ARROW-4280
 URL: https://issues.apache.org/jira/browse/ARROW-4280
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Documentation
Reporter: Micah Kornfield
Assignee: Micah Kornfield


The parquet build initially failed because it could not find flex and bison.
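
One way to confirm this on a given machine is to check that both tools are on the PATH before configuring the build. The sketch below is only an illustration: the helper name `missing_tools` is hypothetical and is not part of the Arrow build system.

```python
import shutil


def missing_tools(tools=("flex", "bison")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("flex and bison were found on PATH")
```

An empty result only says the executables are discoverable; it does not check their versions.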



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4279) [C++] Rebase https://github.com/apache/parquet-cpp/pull/462# onto arrow repo

2019-01-16 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4279:
--

 Summary: [C++] Rebase 
https://github.com/apache/parquet-cpp/pull/462# onto arrow repo
 Key: ARROW-4279
 URL: https://issues.apache.org/jira/browse/ARROW-4279
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Micah Kornfield


The old commit needs to be changed into a PR against the arrow repo rather than 
parquet-cpp.

Changes needed as part of this:

1.  Allow running both the old and new code paths until the performance 
regression can be eliminated.

2.  Instead of passing nthreads through, consider taking util/task-group from 
arrow as a parameter.
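
The two changes can be illustrated with a rough Python analogue. This is not the Arrow C++ API: `concurrent.futures.Executor` stands in for a task group, and `read_columns`, `read_columns_serial`, and `read_columns_parallel` are hypothetical names whose "decoding" is just a sum.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def read_columns_serial(columns):
    """Stand-in for the old code path: process columns one by one."""
    return [sum(col) for col in columns]


def read_columns_parallel(columns, executor: Executor):
    """Stand-in for the new code path: the caller supplies the executor,
    so no thread count has to be passed down the call chain."""
    return list(executor.map(sum, columns))


def read_columns(columns, executor=None):
    """Use the new path when an executor is given, otherwise the old path."""
    if executor is None:
        return read_columns_serial(columns)
    return read_columns_parallel(columns, executor)


if __name__ == "__main__":
    data = [range(10), range(100), range(1000)]
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both paths stay available and must agree on the result.
        assert read_columns(data) == read_columns(data, pool)
```

Keeping both paths behind one entry point makes it easy to compare them while the regression is investigated.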





[jira] [Created] (ARROW-4278) Create performance benchmark for parquet reading

2019-01-16 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4278:
--

 Summary: Create performance benchmark for parquet reading
 Key: ARROW-4278
 URL: https://issues.apache.org/jira/browse/ARROW-4278
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Micah Kornfield
Assignee: Micah Kornfield


Based on the conversation on 
[https://github.com/apache/parquet-cpp/pull/462#|https://github.com/apache/parquet-cpp/pull/462]
 a good first step seems to be incorporating the benchmark provided by snir to 
measure current performance (probably in C++). The benchmark writes one column 
of integers from 0 to 1000 and then tries to read them.
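
The shape of such a benchmark can be sketched as a timed write-then-read round trip. This is not snir's benchmark and it does not touch Parquet: the writer and reader below are plain-binary stand-ins, and every function name is hypothetical. A real version would swap in the Parquet writer and reader.

```python
import os
import struct
import tempfile
import time


def write_column(path, values):
    """Stand-in writer: store the integers as little-endian 64-bit values."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}q", *values))


def read_column(path):
    """Stand-in reader: load the integers written by write_column."""
    with open(path, "rb") as f:
        raw = f.read()
    return list(struct.unpack(f"<{len(raw) // 8}q", raw))


def benchmark_round_trip(values, repeats=5):
    """Write the column once, then return (best read time in seconds, values read)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_column(path, values)
        best = float("inf")
        result = None
        for _ in range(repeats):
            start = time.perf_counter()
            result = read_column(path)
            best = min(best, time.perf_counter() - start)
        return best, result
    finally:
        os.remove(path)


if __name__ == "__main__":
    seconds, column = benchmark_round_trip(list(range(1001)))
    print(f"read {len(column)} values, best of 5: {seconds:.6f} s")
```

Reporting the best of several repeats reduces noise from file-system caching on the first read.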

 





[jira] [Created] (ARROW-4277) [C++] Add gmock to toolchain

2019-01-16 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4277:
--

 Summary: [C++] Add gmock to toolchain
 Key: ARROW-4277
 URL: https://issues.apache.org/jira/browse/ARROW-4277
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Micah Kornfield
Assignee: Micah Kornfield








[jira] [Created] (ARROW-4276) [Release] Remove needless Bintray authentication from binaries verify script

2019-01-16 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-4276:
---

 Summary: [Release] Remove needless Bintray authentication from 
binaries verify script
 Key: ARROW-4276
 URL: https://issues.apache.org/jira/browse/ARROW-4276
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-4275) [C++] gandiva-decimal_single_test extremely slow

2019-01-16 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-4275:
-

 Summary: [C++] gandiva-decimal_single_test extremely slow
 Key: ARROW-4275
 URL: https://issues.apache.org/jira/browse/ARROW-4275
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Continuous Integration, Gandiva
Affects Versions: 0.11.1
Reporter: Antoine Pitrou


{{gandiva-decimal_single_test}} is extremely slow on CI builds with Valgrind:
{code}
 99/100 Test #128: gandiva-decimal_single_test ...   Passed  397.11 sec
100/100 Test #130: gandiva-decimal_single_test_static    Passed  338.97 sec
{code}

(full log: https://travis-ci.org/apache/arrow/jobs/480198116#L2707)

Something should be done to make it faster.
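
To find other slow tests in the same log, the per-test timings can be extracted with a small script. The sketch below assumes CTest-style result lines like the two quoted above; the function name `slow_tests` and the 60-second threshold are arbitrary choices for illustration.

```python
import re

# Matches lines such as:
#   99/100 Test #128: some-test-name ...   Passed  397.11 sec
LINE = re.compile(r"Test\s+#\d+:\s+(\S+)\s+\.*\s*Passed\s+([\d.]+)\s+sec")


def slow_tests(log_text, threshold_sec=60.0):
    """Return (name, seconds) pairs for passed tests slower than the
    threshold, slowest first."""
    found = []
    for match in LINE.finditer(log_text):
        name = match.group(1).rstrip(".")
        seconds = float(match.group(2))
        if seconds > threshold_sec:
            found.append((name, seconds))
    return sorted(found, key=lambda item: item[1], reverse=True)


sample = """\
 99/100 Test #128: gandiva-decimal_single_test ...   Passed  397.11 sec
100/100 Test #130: gandiva-decimal_single_test_static    Passed  338.97 sec
"""

if __name__ == "__main__":
    for name, seconds in slow_tests(sample):
        print(f"{seconds:8.2f} s  {name}")
```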





[jira] [Created] (ARROW-4274) [Gandiva] static jni library broken after decimal changes

2019-01-16 Thread Pindikura Ravindra (JIRA)
Pindikura Ravindra created ARROW-4274:
-

 Summary: [Gandiva] static jni library broken after decimal changes
 Key: ARROW-4274
 URL: https://issues.apache.org/jira/browse/ARROW-4274
 Project: Apache Arrow
  Issue Type: Bug
  Components: Gandiva
Reporter: Pindikura Ravindra
Assignee: Pindikura Ravindra


With the decimal changes, there can be cpp calls from the IR code. The symbols 
for these need to be visible in the gandiva cpp library, but the jni library 
makes visible only a limited set of symbols from gandiva (the ones specified in 
src/gandiva/jni/symbols.map).

This breaks if the jni library links with static-libstdc++ (dremio builds 
the gandiva binary with stdc++ statically linked), for two reasons:
 # The cpp symbols like std::ios_base::init are not exported via symbols.map. 
This causes LLVM to complain that there are unresolved symbols.
 # There is also a problem with exceptions (string_view.hpp can throw 
exceptions). This also causes LLVM to complain that unwindResume is unresolved.





Re: [VOTE] Release Apache Arrow 0.12.0 RC4

2019-01-16 Thread Krisztián Szűcs
On Wed, Jan 16, 2019 at 1:21 PM Antoine Pitrou  wrote:

>
> Tested on Ubuntu 18.04 (x86-64).
>
> - The source verification went fine.
>
> - The binaries verification doesn't seem to do anything and tells me
> "BINTRAY_PASSWORD is empty"
>
See PR https://github.com/apache/arrow/pull/3397

>
> Regards
>
> Antoine.
>


Re: [VOTE] Release Apache Arrow 0.12.0 RC4

2019-01-16 Thread Andy Grove
+1

The Rust implementation is working correctly.

One thing I did just realize though is that the test files that the Rust
tests rely on are not packaged up as part of the release. The tests work
because I have the PARQUET_TEST_DATA env var pointing to my cloned github
repo, but if I just had this release tarball I would not be able to run the
tests successfully. I'm assuming this is expected behavior?
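
A test harness can make this dependency explicit by checking the variable up front. The sketch below is only an illustration in Python, not how the Rust tests are written; the helper name `resolve_test_data_dir` is hypothetical.

```python
import os


def resolve_test_data_dir(environ=os.environ, var="PARQUET_TEST_DATA"):
    """Return the directory named by `var`, or None if the variable is
    unset or does not point to an existing directory."""
    path = environ.get(var)
    if path and os.path.isdir(path):
        return path
    return None


if __name__ == "__main__":
    data_dir = resolve_test_data_dir()
    if data_dir is None:
        print("PARQUET_TEST_DATA is not set to a directory; tests that need it would fail")
    else:
        print("Using test data from", data_dir)
```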



Re: [VOTE] Release Apache Arrow 0.12.0 RC4

2019-01-16 Thread Antoine Pitrou


Tested on Ubuntu 18.04 (x86-64).

- The source verification went fine.

- The binaries verification doesn't seem to do anything and tells me
"BINTRAY_PASSWORD is empty"

Regards

Antoine.




[VOTE] Release Apache Arrow 0.12.0 RC4

2019-01-16 Thread Krisztián Szűcs
Hi,

I'd like to propose the 2nd voteable release candidate (RC4) of Apache Arrow
version 0.12.0. This is a major release consisting of 610 resolved JIRAs [1].

We've hit several roadblocks during the release. The most recent issues
were caused by the simultaneous releases of numpy, pandas, and even the
great conda-forge compiler migration, which was finished yesterday [10]:
- RC0: the source archive didn't include required files to build gandiva-glib
  documents, also causing failed binary build for Debian Stretch [6].
- RC1: compatibility issues with the just released pandas version 0.24.rc1 [8]
- RC2: voted, but decided to cut a new one because of faulty safe casts [11]
- RC3: various conda-forge problems and compatibility with pandas 0.22 [9]

This release candidate is based on commit:
8ca41384b5324bfd0ef3d3ed3f728e1d10ed73f0 [2]

The source release rc4 is hosted at [3].
The binary artifacts are hosted at [4].
The changelog is located at [5].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. See [7] for how to validate a release candidate.
Please use the verification script from master, because it required
a patch to work after the recent conda-forge compiler migration [12].
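
The checksum part of that request can be sketched as follows. This is a minimal illustration, not the project's verification script: it assumes a SHA-256 digest is published next to the archive, the helper names are hypothetical, and signature verification (which needs GPG) is not covered.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_hex):
    """Compare the computed digest with the published one, ignoring case
    and surrounding whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat regardless of the archive size.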

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow 0.12.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow 0.12.0 because...

- Krisztian

[1]:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.12.0
[2]:
https://github.com/apache/arrow/commit/8ca41384b5324bfd0ef3d3ed3f728e1d10ed73f0

[3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.12.0-rc4/

[4]: https://bintray.com/apache/arrow

[5]:
https://github.com/apache/arrow/blob/8ca41384b5324bfd0ef3d3ed3f728e1d10ed73f0/CHANGELOG.md

[6]: https://travis-ci.org/kszucs/crossbow/builds/478393330
[7]:
https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
[8]: https://travis-ci.org/kszucs/crossbow/builds/4785586402

[9]: https://github.com/apache/arrow/pull/3410
[10]: https://conda-forge.org/status/
[11]:
https://mail-archives.apache.org/mod_mbox/arrow-dev/201901.mbox/browser
[12]: https://github.com/apache/arrow/pull/3413


[jira] [Created] (ARROW-4273) [Release] Fix verification script to use cf201901 conda-forge label

2019-01-16 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-4273:
--

 Summary: [Release] Fix verification script to use cf201901 
conda-forge label
 Key: ARROW-4273
 URL: https://issues.apache.org/jira/browse/ARROW-4273
 Project: Apache Arrow
  Issue Type: Task
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs
 Fix For: 0.13.0








[jira] [Created] (ARROW-4272) illegal hardware instruction

2019-01-16 Thread Elchin (JIRA)
Elchin created ARROW-4272:
-

 Summary: illegal hardware instruction
 Key: ARROW-4272
 URL: https://issues.apache.org/jira/browse/ARROW-4272
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.11.1
 Environment: Python 3.6.7
PySpark 2.4.0
PyArrow: 0.11.1
Pandas: 0.23.4
NumPy: 1.15.4
OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 
x86_64 x86_64 GNU/Linux
Reporter: Elchin
 Attachments: core

I can't import pyarrow; it crashes:
{code:java}
>>> import pyarrow as pa
[1]    31441 illegal hardware instruction (core dumped)  python3{code}
A core dump is attached to the issue; it may help in understanding the 
problem.

The environment is:

Python 3.6.7
 PySpark 2.4.0
 PyArrow: 0.11.1
 Pandas: 0.23.4
 NumPy: 1.15.4
 OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 
x86_64 x86_64 GNU/Linux
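
Crashes of this kind are often caused by a binary using CPU instructions that the host processor lacks. That is an assumption here, not something the report confirms. Under that assumption, a sketch like the following lists which of a chosen set of feature flags the CPU does not advertise; the required set (`sse4_2`, `popcnt`) and the helper names are illustrative guesses.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, required=("sse4_2", "popcnt")):
    """Return the required flags that the CPU does not advertise.

    The default `required` tuple is an assumption made for illustration.
    """
    have = cpu_flags(cpuinfo_text)
    return [flag for flag in required if flag not in have]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("Missing flags:", missing_flags(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```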


