[jira] [Created] (ARROW-8250) [C++] Add "random access" / slice read API to RecordBatchFileReader
Wes McKinney created ARROW-8250:
-----------------------------------

Summary: [C++] Add "random access" / slice read API to RecordBatchFileReader
Key: ARROW-8250
URL: https://issues.apache.org/jira/browse/ARROW-8250
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Wes McKinney
Fix For: 1.0.0

If you want to read a small section of a file, it is not possible to easily determine the relevant record batches that need "rehydrating". I would propose the following:

* A way to cheaply read (and cache, so this doesn't have to be done multiple times) all the RecordBatch metadata without deserializing the record batch data structures themselves
* Based on the metadata you can then determine the range of batches that need to be rehydrated and then sliced accordingly to produce the Table of interest

This functionality can be lifted into the Feather read APIs also

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[VOTE] Accept "DoExchange" RPC to Arrow Flight protocol
Hello, David M Li has proposed adding a "bidirectional" DoExchange RPC [1] to the Arrow Flight Protocol [2]. In this client call, datasets (possibly having different schemas) are sent by both the client and server in a single transaction. This can be used to offload computational tasks and other workloads not currently well-supported by the Flight protocol. Please vote whether to accept the addition. The vote will be open for at least 72 hours (since it's Friday, it'll be open for a good deal longer than 72 hours). [ ] +1 Accept this addition to the Flight protocol [ ] +0 [ ] -1 Do not accept the changes because... Here is my vote: +1 Thanks, Wes [1]: https://github.com/apache/arrow/pull/6686
[jira] [Created] (ARROW-8249) [Rust] [DataFusion] Make Table and LogicalPlanBuilder APIs more consistent
Andy Grove created ARROW-8249:
-------------------------------

Summary: [Rust] [DataFusion] Make Table and LogicalPlanBuilder APIs more consistent
Key: ARROW-8249
URL: https://issues.apache.org/jira/browse/ARROW-8249
Project: Apache Arrow
Issue Type: Improvement
Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Fix For: 1.0.0

We now have two similar APIs, Table and LogicalPlanBuilder. Although they are similar, there are some differences, and it would be good to unify them. There is also code duplication, and it most likely makes sense for the Table API to delegate to the query builder API to build logical plans.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
Re: Proposal to use Black for automatic formatting of Python code
+1 for using black On Fri, Mar 27, 2020 at 11:53 AM Joris Van den Bossche < jorisvandenboss...@gmail.com> wrote: > On Fri, 27 Mar 2020 at 18:49, Antoine Pitrou wrote: > > > > > I don't want to be the small minority opposing this so let's go for it. > > One question though: will we continue to check Cython files using > > flake8? > > > > Yes, and I think we can continue to check flake8 for python files as well. > At > least that is what we do in eg pandas. There are a few things that flake8 > checks that Black doesn't fix automatically. For example comments that are > too long are not reformatted by black, so it's good to keep flake8 working > for that. > > Joris > > > > > > Regards > > > > Antoine. > > > > > > On Thu, 26 Mar 2020 20:37:01 +0100 > > Joris Van den Bossche wrote: > > > Hi all, > > > > > > I would like to propose adopting Black as code formatter within the > > python > > > project. There is an older JIRA issue about this ( > > > https://issues.apache.org/jira/browse/ARROW-5176), but bringing it to > > the > > > mailing list for wider attention. > > > > > > Black (https://github.com/ambv/black) is a tool for automatically > > > formatting python code in ways which flake8 and our other linters > approve > > > of (and fill a similar role to clang-format for C++ and cmake-format > for > > > cmake). It can also be added to the linting checks on CI and to the > > > pre-commit hooks like we now run flake8. > > > Using it ensures python code will be formatted consistently, and more > > > importantly automates this formatting, letting you focus on more > > important > > > matters. > > > > > > Black makes some specific formatting choices, and not everybody (me > > > included) will always like those choices (that's how it goes with > > something > > > subjective like formatting). But my experience with using it in some > > other > > > big python projects (pandas, dask) has been very positive. 
You very > > quickly > > > get used to how it looks, while it is much nicer to not have to worry > > about > > > formatting anymore. > > > > > > Best, > > > Joris > > > > > > > > > > > >
[jira] [Created] (ARROW-8248) vcpkg build clobbers arrow.lib from shared (.dll) with static (.lib)
Scott Wilson created ARROW-8248:
---------------------------------

Summary: vcpkg build clobbers arrow.lib from shared (.dll) with static (.lib)
Key: ARROW-8248
URL: https://issues.apache.org/jira/browse/ARROW-8248
Project: Apache Arrow
Issue Type: Bug
Components: C++, Developer Tools
Affects Versions: 0.16.0
Reporter: Scott Wilson

After installing Arrow via vcpkg, build the library per the steps below. CMake builds the shared arrow library (.dll) and then the static arrow library (.lib). It overwrites the shared arrow.lib (exports) with the static arrow.lib. This results in multiple link/execution problems when using the vc projects to build the example projects, until you realize that shared arrow needs to be rebuilt. (This took me two days.) Also, many of the projects added with the extra -D flags (beyond ARROW_BUILD_TESTS) don't build.

***

"C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\Common7\Tools\VsDevCmd.bat" -arch=amd64
cd F:\Dev\vcpkg\buildtrees\arrow\src\row-0.16.0-872c330822\cpp
mkdir build
cd build
cmake .. -G "Visual Studio 15 2017 Win64" -DARROW_BUILD_TESTS=ON -DARROW_BUILD_EXAMPLES=ON -DARROW_PARQUET=ON -DARROW_PYTHON=ON -DCMAKE_BUILD_TYPE=Debug
cmake --build . --config Debug

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
Re: Proposal to use Black for automatic formatting of Python code
On Fri, 27 Mar 2020 at 18:49, Antoine Pitrou wrote: > > I don't want to be the small minority opposing this so let's go for it. > One question though: will we continue to check Cython files using > flake8? > Yes, and I think we can continue to check flake8 for python files as well. At least that is what we do in eg pandas. There are a few things that flake8 checks that Black doesn't fix automatically. For example comments that are too long are not reformatted by black, so it's good to keep flake8 working for that. Joris > > Regards > > Antoine. > > > On Thu, 26 Mar 2020 20:37:01 +0100 > Joris Van den Bossche wrote: > > Hi all, > > > > I would like to propose adopting Black as code formatter within the > python > > project. There is an older JIRA issue about this ( > > https://issues.apache.org/jira/browse/ARROW-5176), but bringing it to > the > > mailing list for wider attention. > > > > Black (https://github.com/ambv/black) is a tool for automatically > > formatting python code in ways which flake8 and our other linters approve > > of (and fill a similar role to clang-format for C++ and cmake-format for > > cmake). It can also be added to the linting checks on CI and to the > > pre-commit hooks like we now run flake8. > > Using it ensures python code will be formatted consistently, and more > > importantly automates this formatting, letting you focus on more > important > > matters. > > > > Black makes some specific formatting choices, and not everybody (me > > included) will always like those choices (that's how it goes with > something > > subjective like formatting). But my experience with using it in some > other > > big python projects (pandas, dask) has been very positive. You very > quickly > > get used to how it looks, while it is much nicer to not have to worry > about > > formatting anymore. > > > > Best, > > Joris > > > > > >
[jira] [Created] (ARROW-8247) [Python] Expose Parquet writing "engine" setting in pyarrow.parquet.write_table
Wes McKinney created ARROW-8247:
---------------------------------

Summary: [Python] Expose Parquet writing "engine" setting in pyarrow.parquet.write_table
Key: ARROW-8247
URL: https://issues.apache.org/jira/browse/ARROW-8247
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Wes McKinney
Fix For: 0.17.0

This is a follow-up to ARROW-7741, so we have a path to the old Parquet writer logic in the event that bugs are reported and we need to give users a workaround.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (ARROW-8246) [C++] Add -Wa,-mbig-obj when compiling with MinGW to avoid linking errors
Wes McKinney created ARROW-8246:
---------------------------------

Summary: [C++] Add -Wa,-mbig-obj when compiling with MinGW to avoid linking errors
Key: ARROW-8246
URL: https://issues.apache.org/jira/browse/ARROW-8246
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Wes McKinney
Fix For: 0.17.0

See https://digitalkarabela.com/mingw-w64-how-to-fix-file-too-big-too-many-sections/

This seems to be the MinGW equivalent of {{/bigobj}} in MSVC.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
Re: Proposal to use Black for automatic formatting of Python code
I don't want to be the small minority opposing this so let's go for it. One question though: will we continue to check Cython files using flake8? Regards Antoine. On Thu, 26 Mar 2020 20:37:01 +0100 Joris Van den Bossche wrote: > Hi all, > > I would like to propose adopting Black as code formatter within the python > project. There is an older JIRA issue about this ( > https://issues.apache.org/jira/browse/ARROW-5176), but bringing it to the > mailing list for wider attention. > > Black (https://github.com/ambv/black) is a tool for automatically > formatting python code in ways which flake8 and our other linters approve > of (and fill a similar role to clang-format for C++ and cmake-format for > cmake). It can also be added to the linting checks on CI and to the > pre-commit hooks like we now run flake8. > Using it ensures python code will be formatted consistently, and more > importantly automates this formatting, letting you focus on more important > matters. > > Black makes some specific formatting choices, and not everybody (me > included) will always like those choices (that's how it goes with something > subjective like formatting). But my experience with using it in some other > big python projects (pandas, dask) has been very positive. You very quickly > get used to how it looks, while it is much nicer to not have to worry about > formatting anymore. > > Best, > Joris >
Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"
Looks like there is consensus about this. I'll start a vote about the format change soon if no further comments. On Mon, Mar 23, 2020 at 7:41 AM David Li wrote: > > Hey Wes, > > Thanks for the review. I've broken out the format change into this PR: > https://github.com/apache/arrow/pull/6686 > > Best, > David > > On 3/22/20, Wes McKinney wrote: > > hi David, > > > > I did a preliminary view and things look to be on the right track > > there. What do you think about breaking out the protocol changes (and > > adding appropriate comments) so we can have a vote on that in > > relatively short order? > > > > - Wes > > > > On Wed, Mar 18, 2020 at 9:06 AM David Li wrote: > >> > >> Following up here, I've submitted a draft implementation for C++: > >> https://github.com/apache/arrow/pull/6656 > >> > >> The core functionality is there, but there are still holes that I need > >> to implement. Compared to the draft spec, the client also sends a > >> FlightDescriptor to begin with, though it's currently not exposed. > >> This provides consistency with DoGet/DoPut which also send a message > >> to begin with to describe the stream to the server. > >> > >> Andy, I hope this helps clarify whether it meets your needs. > >> > >> Best, > >> David > >> > >> On 2/25/20, David Li wrote: > >> > Hey Andy, > >> > > >> > I've been rather busy unfortunately. I had started on an > >> > implementation in C++ to provide as part of this discussion, but it's > >> > not complete. I'm hoping to have more done in March. > >> > > >> > Best, > >> > David > >> > > >> > On 2/25/20, Andy Grove wrote: > >> >> I was wondering if there had been any momentum on this (the > >> >> BiDirectional > >> >> RPC design)? > >> >> > >> >> I'm interested in this for the use case of Apache Spark sending a > >> >> stream > >> >> of > >> >> data to another process to invoke custom code and then receive a > >> >> stream > >> >> back with the transformed data. > >> >> > >> >> Thanks, > >> >> > >> >> Andy. 
> >> >> > >> >> > >> >> > >> >> On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau > >> >> wrote: > >> >> > >> >>> I support moving forward with the current proposal. > >> >>> > >> >>> On Thu, Dec 12, 2019 at 12:20 PM David Li > >> >>> wrote: > >> >>> > >> >>> > Just following up here again, any other thoughts? > >> >>> > > >> >>> > I think we do have justifications for potentially separate streams > >> >>> > in > >> >>> > a call, but that's more of an orthogonal question - it doesn't need > >> >>> > to > >> >>> > be addressed here. I do agree that it very much complicates things. > >> >>> > > >> >>> > Thanks, > >> >>> > David > >> >>> > > >> >>> > On 11/29/19, Wes McKinney wrote: > >> >>> > > I would generally agree with this. Note that you have the > >> >>> > > possibility > >> >>> > > to use unions-of-structs to send record batches with different > >> >>> > > schemas > >> >>> > > in the same stream, though with some added complexity on each > >> >>> > > side > >> >>> > > > >> >>> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau > >> >>> > > > >> >>> > wrote: > >> >>> > >> > >> >>> > >> I'd vote for explicitly not supported. We should keep our > >> >>> > >> primitives > >> >>> > >> narrow. > >> >>> > >> > >> >>> > >> On Wed, Nov 27, 2019, 1:17 PM David Li > >> >>> > >> wrote: > >> >>> > >> > >> >>> > >> > Thanks for the feedback. > >> >>> > >> > > >> >>> > >> > I do think if we had explicitly embraced gRPC from the > >> >>> > >> > beginning, > >> >>> > >> > there are a lot of places where things could be made more > >> >>> > >> > ergonomic, > >> >>> > >> > including with the metadata fields. But it would also have > >> >>> > >> > locked > >> >>> out > >> >>> > >> > us of potential future transports. 
> >> >>> > >> > > >> >>> > >> > On another note: I hesitate to put too much into this method, > >> >>> > >> > but > >> >>> > >> > we > >> >>> > >> > are looking at use cases where potentially, a client may want > >> >>> > >> > to > >> >>> > >> > upload multiple distinct datasets (with differing schemas). > >> >>> > >> > (This > >> >>> is a > >> >>> > >> > little tentative, and I can get more details...) Right now, > >> >>> > >> > each > >> >>> > >> > logical stream in Flight must have a single, consistent > >> >>> > >> > schema; > >> >>> would > >> >>> > >> > it make sense to look at ways to relax this, or declare this > >> >>> > >> > explicitly out of scope (and require multiple calls and > >> >>> > >> > coordination > >> >>> > >> > with the deployment topology) in order to accomplish this? > >> >>> > >> > > >> >>> > >> > Best, > >> >>> > >> > David > >> >>> > >> > > >> >>> > >> > On 11/27/19, Jacques Nadeau wrote: > >> >>> > >> > > Fair enough. I'm okay with the bytes approach and the > >> >>> > >> > > proposal > >> >>> looks > >> >>> > >> > > good > >> >>> > >> > > to me. > >> >>> > >> > > > >> >>> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li > >> >>> > >> > > > >> >>> > >> > > wrote: > >> >>> > >> > > > >> >>> > >> > >> I've updated the
[jira] [Created] (ARROW-8245) [Python] Skip hidden directories when reading partitioned parquet files
Caleb Overman created ARROW-8245:
----------------------------------

Summary: [Python] Skip hidden directories when reading partitioned parquet files
Key: ARROW-8245
URL: https://issues.apache.org/jira/browse/ARROW-8245
Project: Apache Arrow
Issue Type: Bug
Components: Python
Reporter: Caleb Overman

When writing a partitioned parquet file, Spark can create a temporary hidden `.spark-staging` directory within the parquet file. Because it is a directory and not a file, it is not skipped when trying to read the parquet file. Pyarrow currently only skips directories prefixed with `_`.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
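The fix amounts to widening the ignore predicate; a minimal sketch (the function name is illustrative, not pyarrow's internal one):

```python
def is_ignored_dir(name: str) -> bool:
    # pyarrow currently skips names starting with "_" (e.g. "_metadata",
    # "_common_metadata"); the proposal is to also skip hidden directories
    # starting with ".", such as Spark's temporary ".spark-staging" dirs.
    return name.startswith("_") or name.startswith(".")

# Partition directories like "year=2020" must still be traversed.
kept = [d for d in ["_metadata", ".spark-staging-abc", "year=2020"]
        if not is_ignored_dir(d)]
```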
[jira] [Created] (ARROW-8244) Add `write_to_dataset` option to populate the "file_path" metadata fields
Rick Zamora created ARROW-8244:
--------------------------------

Summary: Add `write_to_dataset` option to populate the "file_path" metadata fields
Key: ARROW-8244
URL: https://issues.apache.org/jira/browse/ARROW-8244
Project: Apache Arrow
Issue Type: Wish
Reporter: Rick Zamora

Prior to [dask#6023|https://github.com/dask/dask/pull/6023], Dask has been using the `write_to_dataset` API to write partitioned parquet datasets. That PR is switching to a (hopefully temporary) custom solution, because that API makes it difficult to populate the "file_path" column-chunk metadata fields that are returned within the optional `metadata_collector` kwarg. Dask needs to set these fields correctly in order to generate a proper global `"_metadata"` file.

Possible solutions to this problem:
1. Optionally populate the file-path fields within `write_to_dataset`
2. Always populate the file-path fields within `write_to_dataset`
3. Return the file paths for the data written within `write_to_dataset` (up to the user to manually populate the file-path fields)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (ARROW-8243) [Rust] [DataFusion] Fix inconsistent API in LogicalPlanBuilder
Andy Grove created ARROW-8243:
-------------------------------

Summary: [Rust] [DataFusion] Fix inconsistent API in LogicalPlanBuilder
Key: ARROW-8243
URL: https://issues.apache.org/jira/browse/ARROW-8243
Project: Apache Arrow
Issue Type: Improvement
Reporter: Andy Grove
Assignee: Andy Grove

LogicalPlanBuilder project method takes a whereas other methods take a Vec. It makes sense to take Vec and take ownership of these inputs since they are being used to build the plan.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (ARROW-8242) [C++] GCC 4.8 fails to compile Flight
Krisztian Szucs created ARROW-8242:
------------------------------------

Summary: [C++] GCC 4.8 fails to compile Flight
Key: ARROW-8242
URL: https://issues.apache.org/jira/browse/ARROW-8242
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs

See recent build log https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=8944=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=5b4cc83a-7bb0-5664-5bb1-588f7e4dc05b=2186

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (ARROW-8241) Add convenience methods to Schema
Andy Grove created ARROW-8241:
-------------------------------

Summary: Add convenience methods to Schema
Key: ARROW-8241
URL: https://issues.apache.org/jira/browse/ARROW-8241
Project: Apache Arrow
Issue Type: Improvement
Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
Fix For: 0.17.0

I would like to add the following methods to Schema to make it easier to work with.

{code:java}
pub fn field_with_name(&self, name: &str) -> Result<&Field>;
pub fn index_of(&self, name: &str) -> Result<usize>;
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (ARROW-8240) [Python] New FS interface (pyarrow.fs) does not seem to work correctly for HDFS (Python 3.6, pyarrow 0.16.0)
Yaqub Alwan created ARROW-8240:
--------------------------------

Summary: [Python] New FS interface (pyarrow.fs) does not seem to work correctly for HDFS (Python 3.6, pyarrow 0.16.0)
Key: ARROW-8240
URL: https://issues.apache.org/jira/browse/ARROW-8240
Project: Apache Arrow
Issue Type: Bug
Reporter: Yaqub Alwan

I'll preface this with the limited setup I had to do:

{{export CLASSPATH=$(hadoop classpath --glob)}}
{{export ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.1-1.cdh5.15.1.p0.4/lib64}}

Then I ran the following:

{code}
In [1]: import pyarrow.fs

In [2]: c = pyarrow.fs.HadoopFileSystem()

In [3]: sel = pyarrow.fs.FileSelector('/user/rwiumli')

In [4]: c.get_target_stats(sel)
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
 in
----> 1 c.get_target_stats(sel)

~/tmp/venv/lib/python3.6/site-packages/pyarrow/_fs.pyx in pyarrow._fs.FileSystem.get_target_stats()

~/tmp/venv/lib/python3.6/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()

~/tmp/venv/lib/python3.6/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

OSError: HDFS list directory failed, errno: 2 (No such file or directory)

In [5]: sel = pyarrow.fs.FileSelector('.')

In [6]: c.get_target_stats(sel)
Out[6]: [, , ]

In [7]: !ls
sample.py  sandeep  venv

In [8]:
{code}

It looks like the new hadoop fs interface is doing a local lookup? Ok fine...
{code}
In [8]: sel = pyarrow.fs.FileSelector('hdfs:///user/rwiumli')  # shouldnt have to do this

In [9]: c.get_target_stats(sel)
hdfsGetPathInfo(hdfs:///user/rwiumli): getFileInfo error:
IllegalArgumentException: Wrong FS: hdfs:/user/rwiumli, expected: file:///
java.lang.IllegalArgumentException: Wrong FS: hdfs:/user/rwiumli, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:662)
        at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82)
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:593)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:811)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:588)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:432)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1418)
hdfsListDirectory(hdfs:///user/rwiumli): FileSystem#listStatus error:
IllegalArgumentException: Wrong FS: hdfs:/user/rwiumli, expected: file:///
java.lang.IllegalArgumentException: Wrong FS: hdfs:/user/rwiumli, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:662)
        at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82)
        at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:410)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1566)
        at
[NIGHTLY] Arrow Build Report for Job nightly-2020-03-27-0
Arrow Build Report for Job nightly-2020-03-27-0

All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0

Failed Tasks:
- conda-linux-gcc-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-conda-win-vs2015-py38
- gandiva-jar-trusty:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-travis-gandiva-jar-trusty
- test-conda-cpp-valgrind:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-cpp-valgrind
- test-conda-python-3.7-hdfs-2.9.2:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-debian-ruby:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-debian-ruby
- test-ubuntu-18.04-docs:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-ubuntu-18.04-docs
- wheel-manylinux1-cp35m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-wheel-manylinux1-cp35m
- wheel-manylinux1-cp36m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-wheel-manylinux1-cp36m
- wheel-manylinux1-cp37m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-wheel-manylinux1-cp37m
- wheel-manylinux1-cp38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-wheel-manylinux1-cp38

Succeeded Tasks:
- centos-6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-github-centos-6
- centos-7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-github-centos-7
- centos-8:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-github-centos-8
- debian-buster:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-github-debian-buster
- debian-stretch:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-github-debian-stretch
- gandiva-jar-osx:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-travis-gandiva-jar-osx
- homebrew-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-travis-homebrew-cpp
- macos-r-autobrew:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-travis-macos-r-autobrew
- test-conda-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-cpp
- test-conda-python-3.6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-kartothek-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-python-3.7-kartothek-latest
- test-conda-python-3.7-kartothek-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-python-3.7-kartothek-master
- test-conda-python-3.7-pandas-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-spark-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-python-3.7-spark-master
- test-conda-python-3.7-turbodbc-latest:
  URL:
Re: Proposal to use Black for automatic formatting of Python code
+1 from me too, hopefully cython support will land eventually. On Fri, Mar 27, 2020 at 11:33 AM Rok Mihevc wrote: > > +1 for black > > On Fri, Mar 27, 2020 at 11:11 AM Uwe L. Korn wrote: > > > I'm also very much in favor of this. > > > > For the black / cython support, I think the current state is reflected in > > https://github.com/pablogsal/black/tree/cython. > > > > On Fri, Mar 27, 2020, at 4:40 AM, Micah Kornfield wrote: > > > +1 from me as well. > > > > > > On Thursday, March 26, 2020, Neal Richardson < > > neal.p.richard...@gmail.com> > > > wrote: > > > > > > > I'm also in favor, very much so. Life is too short to hold strong > > opinions > > > > about code style; you get used to whatever you're accustomed to > > seeing. And > > > > I support using automation to remove manual nuisances like this. > > > > > > > > Neal > > > > > > > > On Thu, Mar 26, 2020 at 3:49 PM Wes McKinney > > wrote: > > > > > > > > > I'm in favor of this even though I also probably won't like some of > > > > > the formatting decisions it makes. Is there a sense of how far away > > > > > Black is from having Cython support? I saw it was being worked on a > > > > > while back. > > > > > > > > > > On Thu, Mar 26, 2020 at 2:37 PM Joris Van den Bossche > > > > > wrote: > > > > > > > > > > > > Hi all, > > > > > > > > > > > > I would like to propose adopting Black as code formatter within the > > > > > python > > > > > > project. There is an older JIRA issue about this ( > > > > > > https://issues.apache.org/jira/browse/ARROW-5176), but bringing > > it to > > > > > the > > > > > > mailing list for wider attention. > > > > > > > > > > > > Black (https://github.com/ambv/black) is a tool for automatically > > > > > > formatting python code in ways which flake8 and our other linters > > > > approve > > > > > > of (and fill a similar role to clang-format for C++ and > > cmake-format > > > > for > > > > > > cmake). 
It can also be added to the linting checks on CI and to the > > > > > > pre-commit hooks like we now run flake8. > > > > > > Using it ensures python code will be formatted consistently, and > > more > > > > > > importantly automates this formatting, letting you focus on more > > > > > important > > > > > > matters. > > > > > > > > > > > > Black makes some specific formatting choices, and not everybody (me > > > > > > included) will always like those choices (that's how it goes with > > > > > something > > > > > > subjective like formatting). But my experience with using it in > > some > > > > > other > > > > > > big python projects (pandas, dask) has been very positive. You very > > > > > quickly > > > > > > get used to how it looks, while it is much nicer to not have to > > worry > > > > > about > > > > > > formatting anymore. > > > > > > > > > > > > Best, > > > > > > Joris > > > > > > > > > > > > > >
Re: Proposal to use Black for automatic formatting of Python code
+1 for black

On Fri, Mar 27, 2020 at 11:11 AM Uwe L. Korn wrote:
> I'm also very much in favor of this.
>
> For the black / cython support, I think the current state is reflected in
> https://github.com/pablogsal/black/tree/cython.
>
> On Fri, Mar 27, 2020, at 4:40 AM, Micah Kornfield wrote:
> > +1 from me as well.
> >
> > On Thursday, March 26, 2020, Neal Richardson <neal.p.richard...@gmail.com> wrote:
> > > I'm also in favor, very much so. Life is too short to hold strong opinions
> > > about code style; you get used to whatever you're accustomed to seeing. And
> > > I support using automation to remove manual nuisances like this.
> > >
> > > Neal
> > >
> > > On Thu, Mar 26, 2020 at 3:49 PM Wes McKinney wrote:
> > > > I'm in favor of this even though I also probably won't like some of
> > > > the formatting decisions it makes. Is there a sense of how far away
> > > > Black is from having Cython support? I saw it was being worked on a
> > > > while back.
> > > >
> > > > On Thu, Mar 26, 2020 at 2:37 PM Joris Van den Bossche wrote:
> > > > > Hi all,
> > > > >
> > > > > I would like to propose adopting Black as the code formatter within the
> > > > > python project. There is an older JIRA issue about this
> > > > > (https://issues.apache.org/jira/browse/ARROW-5176), but I am bringing it
> > > > > to the mailing list for wider attention.
> > > > >
> > > > > Black (https://github.com/ambv/black) is a tool for automatically
> > > > > formatting python code in ways which flake8 and our other linters
> > > > > approve of (it fills a similar role to clang-format for C++ and
> > > > > cmake-format for cmake). It can also be added to the linting checks on
> > > > > CI and to the pre-commit hooks, like we now run flake8.
> > > > > Using it ensures python code will be formatted consistently and, more
> > > > > importantly, automates this formatting, letting you focus on more
> > > > > important matters.
> > > > >
> > > > > Black makes some specific formatting choices, and not everybody (me
> > > > > included) will always like those choices (that's how it goes with
> > > > > something subjective like formatting). But my experience with using it
> > > > > in some other big python projects (pandas, dask) has been very positive.
> > > > > You very quickly get used to how it looks, while it is much nicer to not
> > > > > have to worry about formatting anymore.
> > > > >
> > > > > Best,
> > > > > Joris
Re: Proposal to use Black for automatic formatting of Python code
I'm also very much in favor of this.

For the black / cython support, I think the current state is reflected in
https://github.com/pablogsal/black/tree/cython.

On Fri, Mar 27, 2020, at 4:40 AM, Micah Kornfield wrote:
> +1 from me as well.
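For context on the pre-commit integration the thread mentions, wiring Black in alongside the existing flake8 hook would look roughly like the following `.pre-commit-config.yaml` fragment. This is a minimal sketch: the pinned `rev` and the `files` pattern are illustrative assumptions, not what the Arrow project actually adopted.

```yaml
# Hypothetical sketch -- rev and file pattern are illustrative, not Arrow's config.
repos:
  - repo: https://github.com/psf/black
    rev: 19.10b0          # pin a release so CI and local runs format identically
    hooks:
      - id: black
        files: ^python/   # restrict formatting to the Python subproject
```

With such a hook installed (`pre-commit install`), Black runs on staged Python files at commit time, so formatting violations never reach CI in the first place.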
[jira] [Created] (ARROW-8239) [Java] fix param checks in splitAndTransfer method
Prudhvi Porandla created ARROW-8239: --- Summary: [Java] fix param checks in splitAndTransfer method Key: ARROW-8239 URL: https://issues.apache.org/jira/browse/ARROW-8239 Project: Apache Arrow Issue Type: Bug Reporter: Prudhvi Porandla Assignee: Prudhvi Porandla -- This message was sent by Atlassian Jira (v8.3.4#803005)
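The issue body above gives no detail, but `splitAndTransfer(startIndex, length)` style APIs generally need to validate that the requested slice lies within the vector's value count. The following is a hypothetical Java sketch of that kind of parameter check; the class and method names are illustrative, not Arrow's actual code.

```java
// Hypothetical illustration of splitAndTransfer-style parameter validation.
// Not Arrow's actual implementation.
public class SplitCheck {
    static void checkSplitParams(int startIndex, int length, int valueCount) {
        if (startIndex < 0 || length < 0) {
            throw new IllegalArgumentException(
                "startIndex and length must be non-negative");
        }
        // Widen to long so startIndex + length cannot overflow int.
        if ((long) startIndex + length > valueCount) {
            throw new IllegalArgumentException(
                "startIndex + length exceeds valueCount");
        }
    }

    public static void main(String[] args) {
        checkSplitParams(0, 10, 10);   // valid: slice covers the whole vector
        boolean threw = false;
        try {
            checkSplitParams(5, 6, 10); // invalid: 5 + 6 > 10
        } catch (IllegalArgumentException e) {
            threw = true;
        }
        System.out.println(threw);      // prints "true"
    }
}
```

The widening cast before the addition matters: with `int` arithmetic, a large `startIndex` plus `length` could wrap around negative and slip past the bounds check.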
[jira] [Created] (ARROW-8238) [C++][Compute] Failed to build compute tests on windows with msvc2015
Yibo Cai created ARROW-8238: --- Summary: [C++][Compute] Failed to build compute tests on windows with msvc2015 Key: ARROW-8238 URL: https://issues.apache.org/jira/browse/ARROW-8238 Project: Apache Arrow Issue Type: Bug Components: C++ - Compute Reporter: Yibo Cai Build Arrow compute tests on Windows10 with MSVC2015: {code:bash} cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DARROW_COMPUTE=ON -DARROW_BUILD_TESTS=ON .. ninja -j3 {code} Build failed with below message: {code:bash} [311/405] Linking CXX executable release\arrow-misc-test.exe FAILED: release/arrow-misc-test.exe cmd.exe /C "cd . && C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\bin\cmake.exe -E vs_link_exe --intdir=src\arrow\CMakeFiles\arrow-misc-test.dir --rc=C:\PROGRA~2\WI3CF2~1\8.1\bin\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\8.1\bin\x64\mt.exe --manifests -- C:\PROGRA~2\MICROS~1.0\VC\bin\amd64\link.exe /nologo src\arrow\CMakeFiles\arrow-misc-test.dir\memory_pool_test.cc.obj src\arrow\CMakeFiles\arrow-misc-test.dir\result_test.cc.obj src\arrow\CMakeFiles\arrow-misc-test.dir\pretty_print_test.cc.obj src\arrow\CMakeFiles\arrow-misc-test.dir\status_test.cc.obj /out:release\arrow-misc-test.exe /implib:release\arrow-misc-test.lib /pdb:release\arrow-misc-test.pdb /version:0.0 /machine:x64 /NODEFAULTLIB:LIBCMT /INCREMENTAL:NO /subsystem:console release\arrow_testing.lib release\arrow.lib googletest_ep-prefix\src\googletest_ep\lib\gtest_main.lib googletest_ep-prefix\src\googletest_ep\lib\gtest.lib googletest_ep-prefix\src\googletest_ep\lib\gmock.lib C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_filesystem.lib C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_system.lib Ws2_32.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ." 
LINK: command "C:\PROGRA~2\MICROS~1.0\VC\bin\amd64\link.exe /nologo src\arrow\CMakeFiles\arrow-misc-test.dir\memory_pool_test.cc.obj src\arrow\CMakeFiles\arrow-misc-test.dir\result_test.cc.obj src\arrow\CMakeFiles\arrow-misc-test.dir\pretty_print_test.cc.obj src\arrow\CMakeFiles\arrow-misc-test.dir\status_test.cc.obj /out:release\arrow-misc-test.exe /implib:release\arrow-misc-test.lib /pdb:release\arrow-misc-test.pdb /version:0.0 /machine:x64 /NODEFAULTLIB:LIBCMT /INCREMENTAL:NO /subsystem:console release\arrow_testing.lib release\arrow.lib googletest_ep-prefix\src\googletest_ep\lib\gtest_main.lib googletest_ep-prefix\src\googletest_ep\lib\gtest.lib googletest_ep-prefix\src\googletest_ep\lib\gmock.lib C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_filesystem.lib C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_system.lib Ws2_32.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST /MANIFESTFILE:release\arrow-misc-test.exe.manifest" failed (exit code 1169) with the following output: arrow_testing.lib(arrow_testing.dll) : error LNK2005: "public: __cdecl std::vector >::vector >(class std::initializer_list,class std::allocator const &)" (??0?$vector@HV?$allocator@H@std@@@std@@QEAA@V?$initializer_list@H@1@AEBV?$allocator@H@1@@Z) already defined in result_test.cc.obj arrow_testing.lib(arrow_testing.dll) : error LNK2005: "public: __cdecl std::vector >::~vector >(void)" (??1?$vector@HV?$allocator@H@std@@@std@@QEAA@XZ) already defined in result_test.cc.obj arrow_testing.lib(arrow_testing.dll) : error LNK2005: "public: unsigned __int64 __cdecl std::vector >::size(void)const " (?size@?$vector@HV?$allocator@H@std@@@std@@QEBA_KXZ) already defined in result_test.cc.obj release\arrow-misc-test.exe : fatal error LNK1169: one or more multiply defined symbols found [313/405] Building CXX object src\arrow\CMakeFiles\arrow-table-test.dir\table_builder_test.cc.obj ninja: build 
stopped: subcommand failed. {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)