[NIGHTLY] Arrow Build Report for Job nightly-2020-04-17-0

2020-04-17 Thread Crossbow


Arrow Build Report for Job nightly-2020-04-17-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0

Failed Tasks:
- centos-6-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-github-centos-6-amd64
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-github-test-conda-cpp

Succeeded Tasks:
- centos-7-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-github-centos-7-amd64
- centos-8-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-github-centos-8-amd64
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-azure-conda-win-vs2015-py38
- debian-buster-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-github-debian-buster-amd64
- debian-stretch-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-github-debian-stretch-amd64
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-travis-gandiva-jar-xenial
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-travis-homebrew-r-autobrew
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-github-test-conda-cpp-valgrind
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-azure-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-kartothek-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-circle-test-conda-python-3.7-kartothek-latest
- test-conda-python-3.7-kartothek-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-circle-test-conda-python-3.7-kartothek-master
- test-conda-python-3.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-circle-test-conda-python-3.7-spark-master
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-circle-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-circle-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-azure-test-conda-python-3.7
- test-conda-python-3.8-dask-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-circle-test-conda-python-3.8-dask-master
- test-conda-python-3.8-jpype:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-0-circle-test-conda-python-3.8-jpype
- test-conda-python-3.8-pandas

[jira] [Created] (ARROW-8496) [C++] Refine ByteStreamSplitDecodeScalar

2020-04-17 Thread Yibo Cai (Jira)
Yibo Cai created ARROW-8496:
---

 Summary: [C++] Refine ByteStreamSplitDecodeScalar
 Key: ARROW-8496
 URL: https://issues.apache.org/jira/browse/ARROW-8496
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Yibo Cai
Assignee: Yibo Cai






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release Apache Arrow 0.17.0 - RC0

2020-04-17 Thread Antoine Pitrou


Hi,

I tested the sources on Ubuntu 18.04.  Everything went fine, including
CUDA, except for Javascript (as usual).

My vote: +1 (binding)

Regards

Antoine.


Le 17/04/2020 à 02:26, Krisztián Szűcs a écrit :
> Hi,
> 
> I would like to propose the following release candidate (RC0) of Apache
> Arrow version 0.17.0. This is a release consisting of 582
> resolved JIRA issues[1].
> 
> This release candidate is based on commit:
> 3cbcb7b62c2f2d02851bff837758637eb592a64b [2]
> 
> The source release rc0 is hosted at [3].
> The binary artifacts are hosted at [4][5][6][7].
> The changelog is located at [8].
> 
> Please download, verify checksums and signatures, run the unit tests,
> and vote on the release. See [9] for how to validate a release candidate.
> 
> The vote will be open for at least 72 hours.
> 
> [ ] +1 Release this as Apache Arrow 0.17.0
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow 0.17.0 because...
> 
> [1]: 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.17.0
> [2]: 
> https://github.com/apache/arrow/tree/3cbcb7b62c2f2d02851bff837758637eb592a64b
> [3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.17.0-rc0
> [4]: https://bintray.com/apache/arrow/centos-rc/0.17.0-rc0
> [5]: https://bintray.com/apache/arrow/debian-rc/0.17.0-rc0
> [6]: https://bintray.com/apache/arrow/python-rc/0.17.0-rc0
> [7]: https://bintray.com/apache/arrow/ubuntu-rc/0.17.0-rc0
> [8]: 
> https://github.com/apache/arrow/blob/3cbcb7b62c2f2d02851bff837758637eb592a64b/CHANGELOG.md
> [9]: 
> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
> 


[jira] [Created] (ARROW-8497) [Archery] Add missing component to builds

2020-04-17 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8497:
-

 Summary: [Archery] Add missing component to builds
 Key: ARROW-8497
 URL: https://issues.apache.org/jira/browse/ARROW-8497
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Archery, Developer Tools
Reporter: Francois Saint-Jacques
 Fix For: 1.0.0








[jira] [Created] (ARROW-8498) Schema.from_pandas fails on extension type, while Table.from_pandas works

2020-04-17 Thread Thomas Buhrmann (Jira)
Thomas Buhrmann created ARROW-8498:
--

 Summary: Schema.from_pandas fails on extension type, while 
Table.from_pandas works
 Key: ARROW-8498
 URL: https://issues.apache.org/jira/browse/ARROW-8498
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.16.0
Reporter: Thomas Buhrmann


While Table.from_pandas() seems to work as expected with extension types,
Schema.from_pandas() raises an ArrowTypeError:

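{code:python}
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({
    "x": pd.Series([1, 2, None], dtype="Int8"),
    "y": pd.Series(["a", "b", None], dtype="category"),
    "z": pd.Series(["ab", "bc", None], dtype="string"),
})
# Works: prints the schema inferred during the table conversion.
print(pa.Table.from_pandas(df).schema)
# Fails: raises ArrowTypeError (see output below).
print(pa.Schema.from_pandas(df))
{code}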
 
Results in:

{noformat}
x: int8
y: dictionary
z: string
metadata

{b'pandas': b'{"index_columns": [{"kind": "range", "name": null, "start": 0, "'
b'stop": 3, "step": 1}], "column_indexes": [{"name": null, "field_'
b'name": null, "pandas_type": "unicode", "numpy_type": "object", "'
b'metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "x", "f'
b'ield_name": "x", "pandas_type": "int8", "numpy_type": "Int8", "m'
b'etadata": null}, {"name": "y", "field_name": "y", "pandas_type":'
b' "categorical", "numpy_type": "int8", "metadata": {"num_categori'
b'es": 2, "ordered": false}}, {"name": "z", "field_name": "z", "pa'
b'ndas_type": "unicode", "numpy_type": "string", "metadata": null}'
b'], "creator": {"library": "pyarrow", "version": "0.16.0"}, "pand'
b'as_version": "1.0.3"}'}

---
ArrowTypeError                            Traceback (most recent call last)
...
ArrowTypeError: Did not pass numpy.dtype object
{noformat}

I'd imagine Table.from_pandas(df).schema and Schema.from_pandas(df) should 
result in the exact same object?






[jira] [Created] (ARROW-8499) [C++][Dataset] In ScannerBuilder, batch_size will not work if projecter is not empty

2020-04-17 Thread Hongze Zhang (Jira)
Hongze Zhang created ARROW-8499:
---

 Summary: [C++][Dataset] In ScannerBuilder, batch_size will not 
work if projecter is not empty
 Key: ARROW-8499
 URL: https://issues.apache.org/jira/browse/ARROW-8499
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.16.0
Reporter: Hongze Zhang
Assignee: Hongze Zhang


This is due to incomplete logic in the function ScanOptions::ReplaceSchema(...) [1].

The problem was introduced when the fix for ARROW-7547 was applied.

[1] 
https://github.com/apache/arrow/blob/40cd5b8a81db4fd038d3bcbdcd59cba98f336dd9/cpp/src/arrow/dataset/scanner.cc#L41-L47





[jira] [Created] (ARROW-8500) [C++] Use selection vectors in Filter implementation for record batches, tables

2020-04-17 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8500:
---

 Summary: [C++] Use selection vectors in Filter implementation for 
record batches, tables
 Key: ARROW-8500
 URL: https://issues.apache.org/jira/browse/ARROW-8500
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


The current implementation of {{Filter}} on RecordBatch and Table does redundant
analysis of the filter array. It would be more efficient in most cases (i.e.
whenever there are multiple columns) to convert the boolean array into a
selection vector once and then use {{Take}} on each column.
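
To illustrate the idea, here is a minimal sketch in plain Python rather than Arrow C++. The helper names (mask_to_selection, take, filter_batch) and the dict-of-lists batch are hypothetical stand-ins, not Arrow APIs, and treating null mask entries as "drop" is an assumption made for the example. The boolean mask is analyzed once to build the selection vector, which is then reused for every column:

```python
from typing import Dict, List, Optional, Sequence


def mask_to_selection(mask: Sequence[Optional[bool]]) -> List[int]:
    """Scan the boolean filter once and return the indices of the kept rows.

    None (null) entries are treated as "drop"; this is an assumption.
    """
    return [i for i, keep in enumerate(mask) if keep]


def take(column: Sequence, indices: Sequence[int]) -> list:
    """Gather the values of one column at the given row indices."""
    return [column[i] for i in indices]


def filter_batch(columns: Dict[str, Sequence],
                 mask: Sequence[Optional[bool]]) -> Dict[str, list]:
    """Filter every column using a single, shared selection vector."""
    selection = mask_to_selection(mask)  # the mask is analyzed only here
    return {name: take(col, selection) for name, col in columns.items()}


batch = {"a": [1, 2, 3, 4], "b": ["w", "x", "y", "z"]}
filtered = filter_batch(batch, [True, False, None, True])
```

With a per-column filter, the mask would instead be re-scanned once per column, which is the redundant work described above.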





[NIGHTLY] Arrow Build Report for Job nightly-2020-04-17-1

2020-04-17 Thread Crossbow


Arrow Build Report for Job nightly-2020-04-17-1

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1

Failed Tasks:
- centos-6-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-github-centos-6-amd64
- centos-8-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-github-centos-8-amd64
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-azure-conda-win-vs2015-py37
- debian-buster-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-github-debian-buster-amd64
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-github-test-conda-cpp-valgrind
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-github-test-conda-cpp
- test-ubuntu-18.04-cpp-static:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-github-test-ubuntu-18.04-cpp-static
- ubuntu-bionic-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-github-ubuntu-bionic-amd64
- ubuntu-eoan-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-github-ubuntu-eoan-amd64
- ubuntu-focal-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-github-ubuntu-focal-amd64

Succeeded Tasks:
- centos-7-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-github-centos-7-amd64
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-azure-conda-win-vs2015-py36
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-azure-conda-win-vs2015-py38
- debian-stretch-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-github-debian-stretch-amd64
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-travis-gandiva-jar-xenial
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-travis-homebrew-r-autobrew
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-azure-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-kartothek-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-circle-test-conda-python-3.7-kartothek-latest
- test-conda-python-3.7-kartothek-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-circle-test-conda-python-3.7-kartothek-master
- test-conda-python-3.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-circle-test-conda-python-3.7-spark-master
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-1-circle-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?q

Re: Coordinating / scheduling C++ Parquet-Arrow nested data work (ARROW-1644 and others)

2020-04-17 Thread Wes McKinney
Sounds good.

In general I would say that this is a good opportunity to make
improvements around random data generation. For example, I don't think
we have an API for generating a RecordBatch given a schema and some
options (e.g. probability of nulls, distribution of list sizes), but
that would be a good thing to have to assist both with perf and
correctness testing.
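
As a rough illustration of the kind of API meant here, a sketch in plain Python (not Arrow code). The function names, the string type names, and the dict-of-lists stand-in for a RecordBatch are all hypothetical; list sizes are drawn uniformly, which is only one possible distribution:

```python
import random
from typing import Any, Dict, List


def random_value(type_name: str, rng: random.Random, max_list_size: int) -> Any:
    """Generate one non-null value for a (hypothetical) type name."""
    if type_name == "int64":
        return rng.randint(-1000, 1000)
    if type_name == "list<int64>":
        size = rng.randint(0, max_list_size)  # uniform list sizes (assumption)
        return [rng.randint(-1000, 1000) for _ in range(size)]
    raise ValueError(f"unsupported type: {type_name}")


def random_batch(schema: Dict[str, str], num_rows: int,
                 null_probability: float = 0.1, max_list_size: int = 4,
                 seed: int = 0) -> Dict[str, List[Any]]:
    """Generate one column of num_rows values per schema field.

    Each value is null (None) with probability null_probability.
    A fixed seed makes the output reproducible across test runs.
    """
    rng = random.Random(seed)
    batch = {}
    for name, type_name in schema.items():
        batch[name] = [
            None if rng.random() < null_probability
            else random_value(type_name, rng, max_list_size)
            for _ in range(num_rows)
        ]
    return batch


batch = random_batch({"x": "int64", "ys": "list<int64>"}, num_rows=5, seed=42)
```

Seeding the generator keeps failures reproducible, which matters for both the correctness and the perf use cases.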

On Thu, Apr 16, 2020 at 11:28 PM Micah Kornfield wrote:
>
> Hi Wes,
> Thanks, that seems like a good characterization.  I opened up some JIRA
> subtasks on ARROW-1644 which go into a little more detail on tasks that can
> probably be worked on in parallel (I've only assigned ones to myself that I'm
> actively working on; happy to discuss/collaborate on the finer points on
> the JIRAs).  There will probably be a few more JIRAs to open to do final
> integration work (e.g. a flag to switch between old and new engines).
>
> For unit tests (item B), as noted earlier in the thread, there is already a
> disabled unit test trying to verify basic ability to round-trip, but that
> probably isn't sufficient.
>
> Thanks,
> Micah
>
> On Wed, Apr 15, 2020 at 9:32 AM Wes McKinney wrote:
>>
>> hi Micah,
>>
>> Sounds good. It seems like there are a few projects where people might
>> be able to work without stepping on each other's toes
>>
>> A. Array reassembly from raw repetition/definition levels (I would
>> guess this would be your focus)
>> B. Schema and data generation for round-trip correctness and
>> performance testing (I reckon that the unit tests for A will largely
>> be hand-written examples like you did for the write path)
>> C. Benchmarks, particularly to be able to assess performance changes
>> going from the old incomplete implementations to the new ones
>>
>> Some of us should be able to pitch in to help with this. Might also be
>> a good opportunity to do some cleanup of the test code in
>> cpp/src/parquet/arrow
>>
>> - Wes
>>
>> On Tue, Apr 14, 2020 at 11:19 PM Micah Kornfield wrote:
>> >
>> > Hi Wes,
>> > Yes, I'm making progress and at this point I anticipate being able to
>> > finish it off by the next release, possibly without support for round tripping
>> > fixed size lists.  I've been spending some time thinking about different
>> > approaches and have started coding some of the building blocks, which I
>> > think in the common case (relatively low nesting levels) should be fairly
>> > performant (I'm also going to write some benchmarks to sanity check this).
>> > One caveat is that my schedule is going to change slightly next week,
>> > and it's possible my bandwidth might be more limited. I'll update the list
>> > if this happens.
>> >
>> > I think there are at least two areas that I'm not working on that could be 
>> > parallelized if you or your team has bandwidth.
>> >
>> > 1. It would be good to have some parquet files representing real world 
>> > datasets available to benchmark against.
>> > 2. The higher-level bookkeeping of tracking which def-levels/rep-levels
>> > are needed to compare against for any particular column (i.e. preceding 
>> > repeated parent).  I'm currently working on the code that takes these and 
>> > converts them to offsets/null fields.
>> >
>> > I can go into more details if you or your team would like to collaborate.
>> >
>> > Thanks,
>> > Micah
>> >
>> > On Tue, Apr 14, 2020 at 7:48 AM Wes McKinney wrote:
>> >>
>> >> hi Micah,
>> >>
>> >> I'm glad that we have the write side of nested completed for 0.17.0.
>> >>
>> >> As far as completing the read side and then implementing sufficient
>> >> testing to exercise corner cases in end-to-end reads/writes, do you
>> >> anticipate being able to work on this in the next 4-6 weeks (obviously
>> >> the state of the world has affected everyone's availability /
>> >> bandwidth)? I ask because someone from my team (or me also) may be
>> >> able to get involved and help this move along. It'd be great to have
>> >> this 100% completed and checked off our list for the next release
>> >> (i.e. 0.18.0 or 1.0.0 depending on whether the Java/C++ integration
>> >> tests get completed also)
>> >>
>> >> thanks
>> >> Wes
>> >>
>> >> On Wed, Feb 5, 2020 at 12:12 AM Micah Kornfield wrote:
>> >> >>
>> >> >> Glad to hear about the progress. As I mentioned on #2, what do you
>> >> >> think about setting up a feature branch for you to merge PRs into?
>> >> >> Then the branch can be iterated on and we can merge it back when it's
>> >> >> feature complete and does not have perf regressions for the flat
>> >> >> read/write path.
>> >> >>
>> >> > I'd like to avoid a separate branch if possible.  I'm willing to close
>> >> > the open PR till I'm sure it is needed, but I'm hoping that keeping PRs as
>> >> > small and focused as possible, with performance testing along the way, will
>> >> > be a better reviewer and developer experience here.
>> >> >
>> >> >> The earliest I'd have time to work on this myself would likely be
>> >> >> sometime in March. 

Re: [VOTE] Release Apache Arrow 0.17.0 - RC0

2020-04-17 Thread Francois Saint-Jacques
+1 (binding)

Verified all sources locally on Ubuntu 18.04 (including Javascript).
Verified the binaries; the wheels verification matches the one found in
https://github.com/apache/arrow/pull/6961

François



[jira] [Created] (ARROW-8501) [Packaging][RPM] Upgrade devtoolset to 8 on CentOS 6

2020-04-17 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-8501:
---

 Summary: [Packaging][RPM] Upgrade devtoolset to 8 on CentOS 6
 Key: ARROW-8501
 URL: https://issues.apache.org/jira/browse/ARROW-8501
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Kouhei Sutou


It seems that devtoolset-6 has been removed:

https://github.com/ursa-labs/crossbow/runs/594096124#step:4:3570

{noformat}
No package devtoolset-6 available.
{noformat}





[NIGHTLY] Arrow Build Report for Job nightly-2020-04-17-2

2020-04-17 Thread Crossbow


Arrow Build Report for Job nightly-2020-04-17-2

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2

Failed Tasks:
- centos-6-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-github-centos-6-amd64
- centos-8-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-github-centos-8-amd64
- debian-buster-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-github-debian-buster-amd64
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-github-test-conda-cpp-valgrind
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-github-test-conda-cpp
- test-debian-10-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-github-test-debian-10-cpp
- test-ubuntu-18.04-cpp-static:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-github-test-ubuntu-18.04-cpp-static
- ubuntu-bionic-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-github-ubuntu-bionic-amd64
- ubuntu-eoan-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-github-ubuntu-eoan-amd64
- ubuntu-focal-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-github-ubuntu-focal-amd64

Succeeded Tasks:
- centos-7-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-github-centos-7-amd64
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-azure-conda-win-vs2015-py38
- debian-stretch-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-github-debian-stretch-amd64
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-travis-gandiva-jar-xenial
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-travis-homebrew-r-autobrew
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-azure-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-kartothek-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-circle-test-conda-python-3.7-kartothek-latest
- test-conda-python-3.7-kartothek-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-circle-test-conda-python-3.7-kartothek-master
- test-conda-python-3.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-circle-test-conda-python-3.7-spark-master
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-17-2-circle-test-

Re: Trouble installing archery?

2020-04-17 Thread Micah Kornfield
Yes, this worked for me as well. Thanks everyone.

On Mon, Apr 13, 2020 at 9:22 AM Bryan Cutler wrote:

> I had the same problem and Antoine's suggestion was exactly what was wrong.
>
> On Mon, Apr 13, 2020 at 1:27 AM Antoine Pitrou wrote:
>
> >
> > Le 13/04/2020 à 02:42, Micah Kornfield a écrit :
> > > When I follow the instructions at
> > > https://arrow.apache.org/docs/developers/benchmarks.html
> > >
> > > "pip install -e dev/archery"
> > >
> > > I get a permission denied (error pasted at the end in full).  Are there
> > > additional steps that need to happen when using virtualenv?
> >
> > Hmm, I don't think so.  Did you run `git clean -Xfd` in your checkout?
> > Perhaps there are root-created files lying around... (this often happens
> > with Docker)
> >
> > Regards
> >
> > Antoine.
> >
>


[jira] [Created] (ARROW-8502) [Release][APT][Yum] Ignore all arm64 verifications

2020-04-17 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-8502:
---

 Summary: [Release][APT][Yum] Ignore all arm64 verifications
 Key: ARROW-8502
 URL: https://issues.apache.org/jira/browse/ARROW-8502
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-8503) [Packaging][deb] Can't build apache-arrow-archive-keyring for RC

2020-04-17 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-8503:
---

 Summary: [Packaging][deb] Can't build apache-arrow-archive-keyring 
for RC
 Key: ARROW-8503
 URL: https://issues.apache.org/jira/browse/ARROW-8503
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-8504) [C++] Add a method that takes an RLE visitor for a bitmap.

2020-04-17 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-8504:
--

 Summary: [C++] Add a method that takes an RLE visitor for a bitmap.
 Key: ARROW-8504
 URL: https://issues.apache.org/jira/browse/ARROW-8504
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Micah Kornfield


For nullability data, in many cases nulls are not evenly distributed.  In these
cases it would be beneficial to have a mechanism to understand when runs of
set/unset bits are encountered.  One example of this is translating a
bitmap to Parquet definition levels.

An implementation path could be to add this as a method on Bitmap that makes an
adaptor callback for visiting words.
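
A rough sketch of what such a run visitor could look like, in plain Python for illustration only. The name visit_runs and the (value, length) callback signature are hypothetical, not an existing Arrow API, and the bitmap is modelled as an iterable of booleans rather than packed words. The definition-level mapping assumes a flat optional column (1 = present, 0 = null):

```python
from typing import Callable, Iterable, List, Tuple


def visit_runs(bits: Iterable[bool],
               visitor: Callable[[bool, int], None]) -> None:
    """Call visitor(value, length) once per maximal run of equal bits."""
    current = False
    length = 0
    for bit in bits:
        bit = bool(bit)
        if length and bit == current:
            length += 1  # extend the current run
        else:
            if length:
                visitor(current, length)  # flush the finished run
            current, length = bit, 1
    if length:
        visitor(current, length)  # flush the trailing run


validity = [1, 1, 1, 0, 0, 1]

# Collect the runs themselves.
runs: List[Tuple[bool, int]] = []
visit_runs(validity, lambda value, n: runs.append((value, n)))

# Translate runs into definition levels for a flat optional column:
# each run becomes one bulk append instead of a per-bit branch.
def_levels: List[int] = []
visit_runs(validity, lambda valid, n: def_levels.extend([1 if valid else 0] * n))
```

The benefit comes from handling each run with one bulk operation, which pays off when nulls are clustered rather than evenly spread.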

 

 

 





[jira] [Created] (ARROW-8505) [Release][C#] "sourcelink test" is failed by Apache.Arrow.AssemblyInfo.cs

2020-04-17 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-8505:
---

 Summary: [Release][C#] "sourcelink test" is failed by 
Apache.Arrow.AssemblyInfo.cs
 Key: ARROW-8505
 URL: https://issues.apache.org/jira/browse/ARROW-8505
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#, Packaging
Reporter: Kouhei Sutou
Assignee: Eric Erhardt


{{TEST_DEFAULT=0 TEST_CSHARP=1 dev/release/verify-release-candidate.sh source
0.17.0 0}} runs {{sourcelink test}}, and it fails:

https://github.com/ursa-labs/crossbow/runs/594191708?check_suite_focus=true#step:4:1107

{noformat}
+ sourcelink test artifacts/Apache.Arrow/Release/netstandard1.3/Apache.Arrow.pdb
1 Documents without URLs:
2d7aaf2b48220d3dc82554e449b5737767fadd9cc44bff0b1e6ade3f19f0172c sha256 csharp 
/tmp/.NETStandard,Version=v1.3.AssemblyAttributes.cs
1 Documents with errors:
7632ea7a8a56781eb18f13fd45bd2b20e16af47f39fb4ce98460b5870c9af0ad sha256 csharp 
/tmp/arrow-0.17.0.4Tpxy/apache-arrow-0.17.0/csharp/src/Apache.Arrow/obj/Release/netstandard1.3/Apache.Arrow.AssemblyInfo.cs
https://raw.githubusercontent.com/apache/arrow/3cbcb7b62c2f2d02851bff837758637eb592a64b/csharp/src/Apache.Arrow/obj/Release/netstandard1.3/Apache.Arrow.AssemblyInfo.cs
error: url failed NotFound: Not Found
sourcelink test failed
{noformat}

{{Apache.Arrow.AssemblyInfo.cs}} is an auto-generated file, so it should not be
assigned a URL.

The {{EmbedUntrackedSources}} Source Link configuration
https://github.com/dotnet/sourcelink/blob/master/docs/README.md#embeduntrackedsources
may resolve this, but I'm not sure how to use it.


